Remove HTML tags using Regular Expression in C#

Posted by Raja under C# category on 7/1/2020 | Points: 40 | Views : 23376


Here is a C# function that removes HTML tags from content. It strips tags and character entities so that only the text is returned.

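using System.Text.RegularExpressions;

public static string RemoveHtml(string source)
{
    // Remove tags (<...>) and character entities (&name; or &#123;).
    return Regex.Replace(source, @"<[^>]*>|&#?\w+;", string.Empty);
}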

This also removes &nbsp; (non-breaking space) entities from the content.
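
The function above deletes entities outright, so text such as "Tom &amp; Jerry" loses its ampersand. If you would rather keep the characters the entities stand for, one possible variation is sketched below. The class and method names (HtmlTextSketch, StripTagsAndDecode) are hypothetical and not part of the original post; WebUtility.HtmlDecode is the standard .NET method in System.Net.

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

public static class HtmlTextSketch
{
    // Hypothetical helper (not from the original post):
    // strips tags first, then decodes entities instead of deleting them.
    public static string StripTagsAndDecode(string source)
    {
        if (string.IsNullOrEmpty(source))
        {
            return string.Empty;
        }

        // Remove anything that looks like a tag: '<', any non-'>' characters, '>'.
        string withoutTags = Regex.Replace(source, "<[^>]*>", string.Empty);

        // Turn entities such as &amp; or &nbsp; into the characters they represent.
        return WebUtility.HtmlDecode(withoutTags);
    }
}
```

For example, HtmlTextSketch.StripTagsAndDecode("<p>Tom &amp; Jerry</p>") returns "Tom & Jerry". Note that &nbsp; is decoded to a non-breaking space character rather than removed, so trim or replace it afterwards if you need ordinary spaces.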

Thanks


Comments or Responses

Posted by: Ishan7 on: 12/15/2020 Level:Starter | Status: [Member] | Points: 10

As has often been said, you should not use regular expressions to process XML or HTML documents. Regular expressions do not handle such documents well, because they cannot express arbitrarily nested structures in a general way.

You could use the following.

String result = Regex.Replace(htmlDocument, @"<[^>]*>", String.Empty);

This will work for most cases, but there will be cases (for example CDATA containing angle brackets) where this will not work as expected.
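
To make the CDATA case concrete, here is a small sketch. StripTags uses the same pattern as above; StripTagsKeepCData is a hypothetical variation (my own assumption, not from the linked answer) that matches a whole CDATA section first and keeps its content. It is only a partial workaround: comments and script blocks containing angle brackets would still be mishandled, and a real HTML parser remains the more robust option.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TagStripLimits
{
    // Same pattern as in the comment above.
    public static string StripTags(string html)
    {
        return Regex.Replace(html, @"<[^>]*>", string.Empty);
    }

    // Hypothetical variation: try to match a complete CDATA section before
    // falling back to the plain tag pattern, and keep the CDATA content.
    public static string StripTagsKeepCData(string html)
    {
        const string pattern = @"<!\[CDATA\[(?<cdata>.*?)\]\]>|<[^>]*>";

        // For a CDATA match the named group holds the inner text;
        // for an ordinary tag the group is empty, so the tag is dropped.
        return Regex.Replace(
            html,
            pattern,
            m => m.Groups["cdata"].Value,
            RegexOptions.Singleline);
    }
}
```

With the input "<![CDATA[ 1 < 2 and 3 > 2 ]]>", StripTags stops at the first '>' inside the section and returns " 2 ]]>", while StripTagsKeepCData returns " 1 < 2 and 3 > 2 ". Ordinary markup such as "<p>Hello <b>world</b></p>" gives "Hello world" with either method.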

Reference: https://stackoverflow.com/a/787951/11954917
