In this article we will look into how to tokenize the script.
Introduction
For parsing TSQL script, Microsoft has come out with TSQL100Parser that helps to parse the SQL scripts.In this article we will look into how to tokenize the script.
Stright to Experiment
We will use a simple script as listed under
--If the #Temp object exists in the tempdb, then drop it
IF OBJECT_ID('tempdb..#Temp') IS NOT NULL
BEGIN
Drop Table #Temp
END
--Create a temporary table
CREATE TABLE #Temp
(
Id int,
EmpName Varchar(50),
PhoneNo int
)
--Insert records into the Temporary table from the executed stored proc
INSERT INTO #Temp
(
Id
,EmpName
,PhoneNo
)
EXEC dbo.Usp_FetchRecords
--Display the records inserted into the temporary table
Select * from #Temp
The first thing is that we need to add the assemblies "Microsoft.Data.Schema.ScriptDom" and "Microsoft.Data.Schema.ScriptDom.Sql"
Note:~It is available with VS2010 Ultimate Edition.
The next step is to read the file
string filePath = @"D:\Test.sql";
string Script = string.Empty;
using (StreamReader streamReader = new StreamReader(filePath))
{
Script = streamReader.ReadToEnd();
}
Then the TSQL100Parser parses T-SQL code and turns it in to a script fragment.
public static TSqlScript ParseScript(string script, out IList<string> errorsList)
{
IList<ParseError> parseErrors;
TSql100Parser tsqlParser = new TSql100Parser(true);
TSqlFragment fragment;
using (StringReader stringReader = new StringReader(script))
{
fragment = (TSqlFragment)tsqlParser.Parse(stringReader, out parseErrors);
}
errorsList = new List<string>();
if (parseErrors.Count > 0)
{
var retMessage = string.Empty;
foreach (var error in parseErrors)
{
retMessage += error.Identifier + " - " + error.Message + " - position: " + error.Offset + "; ";
}
}
return (TSqlScript)fragment;
}
Next loop through the batches followed by the statements which contains the "Tokens" and get the collection as shown under
public List<Token> GetTokenCollection(TSqlScript tsqlScript)
{
List<Token> lstTokens = new List<Token>();
if (tsqlScript != null)
{
foreach (TSqlBatch batch in tsqlScript.Batches)
{
if (batch.Statements.Count == 0) continue;
foreach (TSqlStatement statement in batch.Statements)
{
foreach (var token in statement.ScriptTokenStream)
{
lstTokens.Add(new Token { TokenType = token.TokenType.ToString(), Text = token.Text });
}
}
}
}
return lstTokens;
}
Thus we will get all the token and it's types.We can further write custom rules to validate from the tokens.The reault is as under
Conclusion
In this article we have seen of parsing SQL Script using TSQL100 parser. Any developer interested in building a rule engine can start from this article.Hope this will be helpful.Thanks for reading.Zipped file is attached.