c# - Regex matching tags
I have the following text, out of which I want to extract all the <td ???>???</td> tags:
<tr id=row509>
<td id=serv509 align=center class='style1'>Z Deviazion Techniko Home Verso S 24 [Non-USAT]]
<td align=center class='style4'>23</td>
<td align=center class='style10'>22</td>
<td align=center class='style6'>0</td>
<td align=center class='style2'>0</td>
<td id=rowtot509 align=center class='style6'>0</td>
<td align=center class='style6'>0</td>
<td align=center class='style2'>0</td>
<td align=center class='style6'>0</td>
</tr>
The expected result would be:
1. <td id=serv509 align=center class='style1'>Z Deviazion Techniko Home Verso S 24 [Non-USAT]]
2. <td align=center class='style4'>23</td>
3. <td align=center class='style10'>22</td>
[..]
Any help?
What's the problem using an HTML or XML library?
Using XML and XPath, for example, this would be just a matter of doing xml/td, or however the library's API supports it.
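As a rough illustration of the library route, here is a minimal sketch using LINQ to XML with the XPath extension methods. It assumes the input is already well-formed XML (quoted attributes, every tag closed), which the sample in the question is not; the class name TdXPathSketch and the method name ExtractCells are made up for this example.

```csharp
using System.Linq;
using System.Xml.Linq;
using System.Xml.XPath;

static class TdXPathSketch
{
    // Returns each <td> child of the root element, serialized back to text.
    // Assumption: 'xml' is a single well-formed element such as <tr>...</tr>.
    public static string[] ExtractCells(string xml)
    {
        XElement row = XElement.Parse(xml);

        // "td" is a relative XPath: the <td> elements directly under the row.
        return row.XPathSelectElements("td")
                  .Select(td => td.ToString(SaveOptions.DisableFormatting))
                  .ToArray();
    }
}
```

Calling TdXPathSketch.ExtractCells on a cleaned-up row returns one string per cell. The serializer writes the elements back out itself, so details such as the attribute quote style may differ from the input.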
Regex is a poor way to do this, because XML is not a regular language. In particular, tags can nest inside other tags, and that is something regular expressions cannot express. So while the simplest case can be handled (<td.*?</td>), it can easily break if the XML changes slightly.
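To make the fragility concrete, here is a small sketch that applies the simplest-case pattern with System.Text.RegularExpressions. The class and method names are hypothetical, and the IgnoreCase and Singleline options are my own assumption about what the input needs.

```csharp
using System.Linq;
using System.Text.RegularExpressions;

static class TdRegexSketch
{
    // Applies the naive pattern: from "<td" lazily up to the first "</td>".
    public static string[] MatchCells(string html)
    {
        return Regex.Matches(html, @"<td.*?</td>",
                             RegexOptions.IgnoreCase | RegexOptions.Singleline)
                    .Cast<Match>()
                    .Select(m => m.Value)
                    .ToArray();
    }
}
```

For "<td>23</td><td>22</td>" this yields the two cells as expected. For a nested table such as "<td><table><tr><td>a</td></tr></table></td>" it yields a single match, "<td><table><tr><td>a</td>", which stops at the inner closing tag. The same thing happens when a cell is missing its </td>: the match runs on into the next cell.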
It is true that the XML is broken, but you can fix that with a regex. For example, if you replace (\w+)=(\w+) with $1='$2' (or \1='\2' in engines that use that replacement syntax; C# uses the $1 form), you will get valid XML.
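The attribute-quoting replacement could look like this in C#. This is only a sketch: the class and method names are invented for the example, and the pattern is the one given above, so it will also rewrite any name=value text that happens to appear outside a tag.

```csharp
using System.Text.RegularExpressions;

static class AttributeQuotingSketch
{
    // Wraps bare attribute values in single quotes: align=center -> align='center'.
    // Values that are already quoted are left alone, because a quote character
    // is not matched by \w.
    public static string QuoteAttributes(string html)
    {
        return Regex.Replace(html, @"(\w+)=(\w+)", "$1='$2'");
    }
}
```

For example, QuoteAttributes("<td align=center class='style4'>23</td>") returns "<td align='center' class='style4'>23</td>". Quoting alone does not close any tags that were left open, so a cell without its </td> would still need fixing before an XML parser accepts the result.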