This article will demonstrate the way to store special character like less than(<), greater than (>), ampersand(&) in XML data type
Introduction
Recently, I came across a situation where we need to store some emails with special characters (e.g. ampersand (&)) into an xml column.In this article we will see some of the approaches of handling the situation
Environment setup
Run the below script
DECLARE @T TABLE (OldEmailAddress VARCHAR(50))
--Declare 5 input variables
DECLARE @SimpleEmail VARCHAR(50) = 'mail2dotnetfunda@dotnetfunda.com'
DECLARE @EmailWith_Ampersand_SpecialCharacter VARCHAR(50) = 'mail2&.netfunda@.netfunda.com'
DECLARE @EmailWith_LessThan_SpecialCharacter VARCHAR(50) = 'mail2<.netfunda@.netfunda.com'
DECLARE @EmailWith_GreaterThan_SpecialCharacter VARCHAR(50) = 'mail2>.netfunda@.netfunda.com'
DECLARE @EmailWith_AllThe3_SpecialCharacter VARCHAR(50) = 'mail2<.net>funda@&.netfunda.com'
--Insert the records into the table
INSERT INTO @T
VALUES(@SimpleEmail)
,(@EmailWith_Ampersand_SpecialCharacter)
,(@EmailWith_LessThan_SpecialCharacter)
,(@EmailWith_GreaterThan_SpecialCharacter)
,(@EmailWith_AllThe3_SpecialCharacter)
--Project the records
SELECT *
FROM @T
Output

What is expected?
We need to create another column say "NewEmailAddress" which will be of type XML and need to import the data from "OldEmailAddress" column in a special format
The format is as under
<EmailCollections>
<Email>[EMAIl ADDRESS CONTENT FROM OldEmailAddress column]</Email>
</EmailCollections>
What happens if we just create another column and try to import the data?
Let us try to do so
DECLARE @T TABLE (OldEmailAddress VARCHAR(50),NewEmailAddress XML)
--Declare 5 input variables
DECLARE @SimpleEmail VARCHAR(50) = 'mail2dotnetfunda@dotnetfunda.com'
DECLARE @EmailWith_Ampersand_SpecialCharacter VARCHAR(50) = 'mail2&.netfunda@.netfunda.com'
DECLARE @EmailWith_LessThan_SpecialCharacter VARCHAR(50) = 'mail2<.netfunda@.netfunda.com'
DECLARE @EmailWith_GreaterThan_SpecialCharacter VARCHAR(50) = 'mail2>.netfunda@.netfunda.com'
DECLARE @EmailWith_AllThe3_SpecialCharacter VARCHAR(50) = 'mail2<.net>funda@&.netfunda.com'
--Try to insert the records into the table
INSERT INTO @T
VALUES(@SimpleEmail,@SimpleEmail)
,(@EmailWith_Ampersand_SpecialCharacter,@EmailWith_Ampersand_SpecialCharacter)
,(@EmailWith_LessThan_SpecialCharacter,@EmailWith_LessThan_SpecialCharacter)
,(@EmailWith_GreaterThan_SpecialCharacter,@EmailWith_GreaterThan_SpecialCharacter)
,(@EmailWith_AllThe3_SpecialCharacter,@EmailWith_AllThe3_SpecialCharacter)
--Project the records
SELECT *
FROM @T
It will throw the below error
Msg 9421, Level 16, State 1, Line 8
XML parsing: line 1, character 7, illegal name character
Why is the error?
It is very obvious. XML has some "special" characters and they need to be escaped or encoded to prevent being misinterpreted.
So what is the solution?
Well, there are indeed solutions for this problem and we will look into them here
Solution 1:Use the Replace function to replace the special characters
DECLARE @T TABLE (OldEmailAddress VARCHAR(50),NewEmailAddress XML)
--Declare 5 input variables
DECLARE @SimpleEmail VARCHAR(50) = 'mail2dotnetfunda@dotnetfunda.com'
DECLARE @EmailWith_Ampersand_SpecialCharacter VARCHAR(50) = 'mail2&.netfunda@.netfunda.com'
DECLARE @EmailWith_LessThan_SpecialCharacter VARCHAR(50) = 'mail2<.netfunda@.netfunda.com'
DECLARE @EmailWith_GreaterThan_SpecialCharacter VARCHAR(50) = 'mail2>.netfunda@.netfunda.com'
DECLARE @EmailWith_AllThe3_SpecialCharacter VARCHAR(50) = 'mail2<.net>funda@&.netfunda.com'
--Try to insert the records into the table
INSERT INTO @T
VALUES(@SimpleEmail,CAST('<EmailCollections><Email>' + @SimpleEmail + '</Email></EmailCollections>' AS XML))
,(@EmailWith_Ampersand_SpecialCharacter,CAST('<EmailCollections><Email>' + REPLACE(@EmailWith_Ampersand_SpecialCharacter, '&', '&') + '</Email></EmailCollections>' AS XML))
,(@EmailWith_LessThan_SpecialCharacter,CAST('<EmailCollections><Email>' + REPLACE(@EmailWith_LessThan_SpecialCharacter, '<', '<') + '</Email></EmailCollections>' AS XML))
,(@EmailWith_GreaterThan_SpecialCharacter,CAST('<EmailCollections><Email>' + REPLACE(@EmailWith_GreaterThan_SpecialCharacter, '>', '>') + '</Email></EmailCollections>' AS XML))
,(@EmailWith_AllThe3_SpecialCharacter,CAST('<EmailCollections><Email>' + REPLACE(REPLACE(REPLACE(@EmailWith_AllThe3_SpecialCharacter, '<', '<'),'>','>'),'&', '&') + '</Email></EmailCollections>' AS XML))
--Project the records
SELECT *
FROM @T
Output

As we can make out that, the REPLACE function though does the job but is not so clean.Specially consider the last case where we used 3 Replace function!!!
.
A slightly cleaner approach would be to create a function that will remove the noises.The function will look as under
CREATE FUNCTION [dbo].[RemoveNoise]
(
-- Add the parameters for the function here
@Input AS Varchar(100)
)
RETURNS VARCHAR(100)
AS
BEGIN
--Create a noise table
DECLARE @Noise TABLE(Noise VARCHAR(100),ReplaceChars VARCHAR(10))
INSERT INTO @Noise(Noise,ReplaceChars)
SELECT '<','<' UNION ALL SELECT '>', '>' UNION ALL
SELECT '&','"&"amp;'
-- Return the result of the function
SELECT @Input = REPLACE(@Input, Noise, ReplaceChars) FROM @Noise
RETURN @Input
END
GO
We can use this function as under
INSERT INTO @T
VALUES(@SimpleEmail,CAST('<EmailCollections><Email>' + dbo.RemoveNoise(@SimpleEmail) + '</Email></EmailCollections>' AS XML))
,(@EmailWith_Ampersand_SpecialCharacter,CAST('<EmailCollections><Email>' + dbo.RemoveNoise(@EmailWith_Ampersand_SpecialCharacter) + '</Email></EmailCollections>' AS XML))
,(@EmailWith_LessThan_SpecialCharacter,CAST('<EmailCollections><Email>' + dbo.RemoveNoise(@EmailWith_LessThan_SpecialCharacter) + '</Email></EmailCollections>' AS XML))
,(@EmailWith_GreaterThan_SpecialCharacter,CAST('<EmailCollections><Email>' + dbo.RemoveNoise(@EmailWith_GreaterThan_SpecialCharacter) + '</Email></EmailCollections>' AS XML))
,(@EmailWith_AllThe3_SpecialCharacter,CAST('<EmailCollections><Email>' + dbo.RemoveNoise(@EmailWith_AllThe3_SpecialCharacter) + '</Email></EmailCollections>' AS XML))
Solution 2:Use CDATA
DECLARE @T TABLE (OldEmailAddress VARCHAR(50),NewEmailAddress XML)
--Declare 5 input variables
DECLARE @SimpleEmail VARCHAR(50) = 'mail2dotnetfunda@dotnetfunda.com'
DECLARE @EmailWith_Ampersand_SpecialCharacter VARCHAR(50) = 'mail2&.netfunda@.netfunda.com'
DECLARE @EmailWith_LessThan_SpecialCharacter VARCHAR(50) = 'mail2<.netfunda@.netfunda.com'
DECLARE @EmailWith_GreaterThan_SpecialCharacter VARCHAR(50) = 'mail2>.netfunda@.netfunda.com'
DECLARE @EmailWith_AllThe3_SpecialCharacter VARCHAR(50) = 'mail2>.net<funda@&.netfunda.com'
--Try to insert the records into the table
INSERT INTO @T
VALUES(@SimpleEmail,CAST('<Emails><Email><![CDATA[' + + @SimpleEmail + ']]></Email></Emails>' AS XML))
,(@EmailWith_Ampersand_SpecialCharacter,CAST('<Emails><Email><![CDATA[' + + @EmailWith_Ampersand_SpecialCharacter + ']]></Email></Emails>' AS XML))
,(@EmailWith_LessThan_SpecialCharacter,CAST('<Emails><Email><![CDATA[' + + @EmailWith_LessThan_SpecialCharacter + ']]></Email></Emails>' AS XML))
,(@EmailWith_GreaterThan_SpecialCharacter,CAST('<Emails><Email><![CDATA[' + + @EmailWith_GreaterThan_SpecialCharacter + ']]></Email></Emails>' AS XML))
,(@EmailWith_AllThe3_SpecialCharacter,CAST('<Emails><Email><![CDATA[' + + @EmailWith_AllThe3_SpecialCharacter + ']]></Email></Emails>' AS XML))
--Project the records
SELECT *
FROM @T
The output remains same.As can be seen that it is quite handy to use CData in this situation since no Replacement is needed.
Conslusion
So in this article, we have seen as how to handle special characters while storing them into XML data type.Hope this will be useful.Zip file is attached