Using TSql100Parser to tokenize TSQL Script

Niladri.Biswas
Posted by in C# category on for Beginner level | Points: 250 | Views : 13203 red flag

In this article we will look into how to tokenize the script.


 Download source code for Using TSql100Parser to tokenize TSQL Script

Introduction

For parsing TSQL script, Microsoft has come out with TSQL100Parser that helps to parse the SQL scripts.In this article we will look into how to tokenize the script.

Stright to Experiment

We will use a simple script as listed under

--If the #Temp object exists in the tempdb, then drop it
IF OBJECT_ID('tempdb..#Temp') IS NOT NULL
BEGIN
Drop Table #Temp 
END

--Create a temporary table
CREATE TABLE #Temp 
(
	Id int, 
	EmpName Varchar(50),
	PhoneNo int 
) 
--Insert records into the Temporary table from the executed stored proc
INSERT INTO #Temp 
( 
	Id
	,EmpName
	,PhoneNo
)
EXEC dbo.Usp_FetchRecords

--Display the records inserted into the temporary table
Select * from #Temp

The first thing is that we need to add the assemblies "Microsoft.Data.Schema.ScriptDom" and "Microsoft.Data.Schema.ScriptDom.Sql"

Note:~It is available with VS2010 Ultimate Edition.

The next step is to read the file

string filePath = @"D:\Test.sql";
string Script = string.Empty;

using (StreamReader streamReader = new StreamReader(filePath))
{
	Script = streamReader.ReadToEnd();
}

Then the TSQL100Parser parses T-SQL code and turns it in to a script fragment.

public static TSqlScript ParseScript(string script, out IList<string> errorsList)
{
	IList<ParseError> parseErrors;
	TSql100Parser tsqlParser = new TSql100Parser(true);
	TSqlFragment fragment;
	using (StringReader stringReader = new StringReader(script))
	{
		fragment = (TSqlFragment)tsqlParser.Parse(stringReader, out parseErrors);
	}
	errorsList = new List<string>();
	if (parseErrors.Count > 0)
	{
		var retMessage = string.Empty;
		foreach (var error in parseErrors)
		{
			retMessage += error.Identifier + " - " + error.Message + " - position: " + error.Offset + "; ";
		}
	}
	return (TSqlScript)fragment;
}

Next loop through the batches followed by the statements which contains the "Tokens" and get the collection as shown under

public List<Token> GetTokenCollection(TSqlScript tsqlScript)
{
	List<Token> lstTokens = new List<Token>();

	if (tsqlScript != null)
	{
		foreach (TSqlBatch batch in tsqlScript.Batches)
		{
			if (batch.Statements.Count == 0) continue;

			foreach (TSqlStatement statement in batch.Statements)
			{
				foreach (var token in statement.ScriptTokenStream)
				{
					lstTokens.Add(new Token { TokenType = token.TokenType.ToString(), Text = token.Text });
				}
			}
		}

	}
	return lstTokens;
}

Thus we will get all the token and it's types.We can further write custom rules to validate from the tokens.The reault is as under

Conclusion

In this article we have seen of parsing SQL Script using TSQL100 parser. Any developer interested in building a rule engine can start from this article.Hope this will be helpful.Thanks for reading.Zipped file is attached.

Page copy protected against web site content infringement by Copyscape

About the Author

Niladri.Biswas
Full Name: Niladri Biswas
Member Level: Platinum
Member Status: Member
Member Since: 10/25/2010 11:04:24 AM
Country: India
Best Regards, Niladri Biswas
http://www.dotnetfunda.com
Technical Lead at HCL Technologies

Login to vote for this post.

Comments or Responses

Login to post response

Comment using Facebook(Author doesn't get notification)