Using Ruby and Nokogiri to parse HTML using HTML comments as markers


How can I use Ruby to extract information from a table made up of rows like the one below? Is it possible to detect the comments using Nokogiri?

    <!-- Begin Topic Entry 4134 -->
    <tr>
      <td align="center" class="row2"><img src='style_images/ip.boardpr/f_norm.gif' border='0' alt='new post' /></td>
      <td align="center" width="3%" class="row1">&nbsp;</td>
      <td class="row2">
        <table class='ipbtable' cellspacing="0">
          <tr>
            <td valign="middle"><a href='http://www.xxx.com/index.php?showtopic=4134&amp;view=getnewpost'><img src='style_images/ip.boardpr/newpost.gif' border='0' alt='goto previous unread' title='goto previous unread' hspace=2></a></td>
            <td width="100%">
              <div style='float:right'></div>
              <div><a href="http://www.xxx.com/index.php?showtopic=4134&hl=">EXTRACT LINK 1</a></div>
            </td>
          </tr>
        </table>
        <span class="desc">EXTRACT DESCRIPTION</span>
      </td>
      <td class="row2" width="15%"><span class="forumdesc"><a href="http://www.xxx.com/index.php?showforum=19" title="living">EXTRACT LINK 2</a></span></td>
      <td align="center" class="row1" width='10%'><a href='http://www.xxx.com/index.php?showuser=1642'>Mr. P</a></td>
      <td align="center" class="row2"><a href="javascript:who_posted(4134);">1</a></td>
      <td align="center" class="row1">46</td>
      <td class="row1"><span class="desc">Today, 12:04 am<br /><a href="http://www.xxx.com/index.php?showtopic=4134&amp;view=getlastpost">last post:</a> <b><a href='http://www.xxx.com/index.php?showuser=1649'>Underfoot</a></b></span></td>
    </tr>
    <!-- End Topic Entry 4134 -->

You could use Nokogiri's SAX parser. It is quite fast, and with the SAX model you receive events for elements, attributes, and comments.

Within your parser, you should keep track of state, for example in an instance variable that flags whether you are currently between two marker comments, so that you know which parts to record and which to ignore.

