The postings on this site are my own and do not represent my Employer's positions, advice or strategies.

LifeAsBob - Blog

 


Remove comments from an XML file with double dashes -- 12/18/2012 10:41:45 AM

XML is not my favorite format, and cleaning up and validating malformed XML is even worse.

I'm trying to load an XML file that comes from a vendor, but the load fails. Some of its comments contain a double dash (--), which XML does not allow inside a comment, so the file is not well-formed.

I tried removing these comments using approaches from other posts, without success. Here is an example of the XML:

<?xml version="1.0" encoding="ISO-8859-1"?>
<!--MAIN VARIABLES-->
<content type="screwed">
<!--KEEP 19-39 -- SEE HELP.TXT AND THE VIDEO TUTORIALS FOR MORE INFO -->
<!--REGULAR/NON-Regular EXAMPLE --><SomeTag somefile="test.txt3" Name="test"/>
<!-- -->
</content>

I have tried the following without success:

string xmlDocFile = @"c:\server\test.xml"; // verbatim string, so the backslashes are not treated as escapes

XmlReaderSettings readerSettings = new XmlReaderSettings();
readerSettings.IgnoreComments = true;
readerSettings.ProhibitDtd = false;
readerSettings.ValidationType = ValidationType.DTD;
XmlReader reader = XmlReader.Create(xmlDocFile, readerSettings);
XmlDocument myXmlDoc = new XmlDocument();
myXmlDoc.Load(reader);
myXmlDoc.Save(xmlDocFile);
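The solution is to read the file as text and strip the comments with a regular expression before handing it to XmlReader. IgnoreComments does not help here, because the reader still has to parse each comment before it can discard it, and it fails on the embedded double dash.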
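// using System.IO;
// using System.Text.RegularExpressions;
string rawXml = File.ReadAllText(xmlDocFile);
// Singleline lets '.' match newlines, so comments that span several lines are removed too.
string validXml = Regex.Replace(rawXml, "<!--.*?-->", "", RegexOptions.Singleline);

// XmlReader.Create(string) treats its argument as a URI, so wrap the cleaned text in a StringReader.
XmlReader reader = XmlReader.Create(new StringReader(validXml));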
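
For anyone who wants to try this end to end, here is a minimal, self-contained sketch of the same idea. The class name XmlCommentStripper and the helpers StripComments and LoadWithoutComments are names I made up for illustration, and the sample XML is the vendor example from above embedded as a string rather than read from disk.

```csharp
using System;
using System.IO;
using System.Text.RegularExpressions;
using System.Xml;

public static class XmlCommentStripper
{
    // Matches any XML comment; Singleline lets '.' cross line breaks.
    private static readonly Regex CommentPattern =
        new Regex("<!--.*?-->", RegexOptions.Singleline);

    // Hypothetical helper: returns the XML text with every comment removed.
    public static string StripComments(string xml)
    {
        return CommentPattern.Replace(xml, "");
    }

    // Hypothetical helper: strips the comments, then parses what is left.
    public static XmlDocument LoadWithoutComments(string xml)
    {
        XmlDocument doc = new XmlDocument();
        using (XmlReader reader = XmlReader.Create(new StringReader(StripComments(xml))))
        {
            doc.Load(reader);
        }
        return doc;
    }

    public static void Main()
    {
        // The vendor sample from this post, including the invalid "--" comment.
        string xml =
            "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?>\n" +
            "<!--MAIN VARIABLES-->\n" +
            "<content type=\"screwed\">\n" +
            "<!--KEEP 19-39 -- SEE HELP.TXT AND THE VIDEO TUTORIALS FOR MORE INFO -->\n" +
            "<!--REGULAR/NON-Regular EXAMPLE --><SomeTag somefile=\"test.txt3\" Name=\"test\"/>\n" +
            "<!-- -->\n" +
            "</content>";

        XmlDocument doc = LoadWithoutComments(xml);
        Console.WriteLine(doc.DocumentElement.Name); // prints "content"
    }
}
```

One caveat with this approach: the regex works on raw text and knows nothing about XML structure, so it would also remove anything that looks like a comment inside a CDATA section. For the vendor file above that is not a concern.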
 
