Find values from outside braces using T-SQL and XQuery

Rajnilari2015
Posted by in Sql Server category on for Beginner level | Points: 250 | Views : 1213 red flag
Rating: 4.5 out of 5  
 2 vote(s)

Suppose we have a text like "Hello!!!This is text1 (text1).This is text2(text2).This is text3(text3) and text4(text4)".And the objective is to figure out the value outside the brackets [ ( and ) ].In this article, we will look into one of the techniques as how can we achieve this.


 Download source code for Find values from outside braces using T-SQL and XQuery

Recommendation
Read Perform Android Like Search in TSQL before this article.

Introduction

Suppose we have a text like "Hello!!!This is text1 (text1).This is text2(text2).This is text3(text3) and text4(text4)".

And the objective is to figure out the value outside the brackets [ ( and ) ].In this article, we will look into one of the techniques as how can we achieve this.

Environment Setup

DECLARE @str AS VARCHAR(500)
SET @str= 'Hello!!!This is text1 (text1).This is text2(text2).This is text3(text3) and text4(text4)'

Using the code

The first thing that we will do is to replace the '(' with say '@-' and ')' with say '-@' symbols as shown under

DECLARE @str AS VARCHAR(500)
SET @str= 'Hello!!!This is text1 (text1).This is text2(text2).This is text3(text3) and text4(text4)'

SELECT @str =REPLACE(REPLACE(@str,'(','@-'),')','-@')

SELECT @str

The output will be

The next step is to Replace the '@' delimiter/seperator symbols with </X><X> as under

DECLARE 
 @str AS VARCHAR(500)
 ,@xml AS XML
 ,@delimiter AS VARCHAR(1)

--Set the original string
SET @str= 'Hello!!!This is text1 (text1).This is text2(text2).This is text3(text3) and text4(text4)'

--Set the delimiter
SET @delimiter ='@'

-- Replace the ( and ) with '@-' and '-@' respectively
SELECT @str =REPLACE(REPLACE(@str,'(','@-'),')','-@')

-- Replace the '@' symbols with </X><X>
SET @xml = CAST(('<X>'+REPLACE(@str,@delimiter ,'</X><X>')+'</X>') AS XML)

SELECT @xml

The next step is to query the XML nodes using XQuery as shown under

--Query the XML using XQuery
SELECT 
 N.value('.', 'varchar(500)') AS QueryResult
FROM @xml.nodes('X') AS T(N)

This rather brings up an interesting pattern as the values within the original braces are now appended with - and -. So the final task is to pickup those values which does not starts or ends with '-' pattern

SELECT 
 N.value('.', 'varchar(500)') AS QueryResult 
FROM @xml.nodes('X') AS T(N)
WHERE PATINDEX('%[-]%', N.value('.', 'varchar(100)')) = 0

As can be figure out by using the PATINDEX function we are able to figure out the patterns of our choice.

We can figure out that, the last row is blank.We can remove that by using the below filter

SELECT 
 N.value('.', 'varchar(500)') AS QueryResult 
FROM @xml.nodes('X') AS T(N)
WHERE PATINDEX('%[-]%', N.value('.', 'varchar(100)')) = 0
AND LEN(N.value('.', 'varchar(500)')) > 0

As a final step, we need to merge the rows into one single column.We can do it in the below way

SELECT ExtractedText = 
  STUFF((SELECT '' + QueryResult                       
                 FROM CTE                        
                 FOR XML PATH('')), 
              1, 0,'')

The FOR XML PATH() does the trick with the hep of STUFF() function that helps to extract the relevant text

The final query is as under

DECLARE 
 @str AS VARCHAR(500)
 ,@xml AS XML
 ,@delimiter AS VARCHAR(1)

--Set the original string
SET @str= 'Hello!!!This is text1 (text1).This is text2(text2).This is text3(text3) and text4(text4)'

--Set the delimiter
SET @delimiter ='@'

-- Replace the ( and ) with '@-' and '-@' respectively
SELECT @str =REPLACE(REPLACE(@str,'(','@-'),')','-@')

-- Replace the '@' symbols with </X><X>
SET @xml = CAST(('<X>'+REPLACE(@str,@delimiter ,'</X><X>')+'</X>') AS XML)

--Query the XML using XQuery
;WITH CTE AS(
SELECT 
 N.value('.', 'varchar(500)') AS QueryResult 
FROM @xml.nodes('X') AS T(N)
WHERE PATINDEX('%[-]%', N.value('.', 'varchar(100)')) = 0
AND LEN(N.value('.', 'varchar(500)')) > 0)

SELECT ExtractedText = 
STUFF((SELECT '' + QueryResult                       
       FROM CTE                        
       FOR XML PATH('')), 
1, 0,'')

Reference

PATINDEX

XQuery Language Reference

Conclusion

This article has shown us a technique of filtering the records using PATINDEX and query XML nodes using XQuery.Hope this will be useful.Thanks for reading. Zipped file is attached herewith.

Recommendation
Read Find values within braces using T-SQL and XQuery after this article.
Page copy protected against web site content infringement by Copyscape

About the Author

Rajnilari2015
Full Name: Niladri Biswas (RNA Team)
Member Level: Platinum
Member Status: Member,Microsoft_MVP,MVP
Member Since: 3/17/2015 2:41:06 AM
Country: India
-- Thanks & Regards, RNA Team


Login to vote for this post.

Comments or Responses

Login to post response

Comment using Facebook(Author doesn't get notification)