With HTML Agility Pack, you can parse tables in HTML documents and extract TRs and TDs.

html-agility-pack html-parsing vb.net

Question

I have been assigned a task to change the format of old data stored in an HTML table.

The old (dummy) data looks like this:

<table>
<tr>
<td>Some text 1.</td>
<td>Some text 2.</td>
</tr>
<!-- any number of TRs go here -->
</table>

The new data must be in this format:

Some text 1. - Some text 2.

In short, this is what needs to be done:

Go through every TR in the table, take its first TD, and join it to the second TD with a "-" between them.

I'm using VB.Net with HTML Agility Pack.
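A minimal sketch of the per-row rule described above, assuming the HtmlAgilityPack NuGet package is referenced. The names `RowJoiner` and `JoinRowCells` are hypothetical; `HtmlDocument.LoadHtml`, `SelectNodes`, `InnerText` and `HtmlEntity.DeEntitize` are HTML Agility Pack members. Rows with fewer than two TDs are simply skipped, which is an assumption about how such rows should be handled.

```vbnet
Imports System
Imports System.Collections.Generic
Imports HtmlAgilityPack

' Sketch only: assumes the HtmlAgilityPack NuGet package is referenced.
' RowJoiner / JoinRowCells are hypothetical names, not part of the library.
Module RowJoiner

    ' For every <tr> in every <table>, join the text of its first and second
    ' <td> with " - ". Rows with fewer than two cells are skipped.
    Function JoinRowCells(html As String) As List(Of String)
        Dim doc As New HtmlDocument()
        doc.LoadHtml(html)

        Dim result As New List(Of String)()

        ' SelectNodes returns Nothing (not an empty collection) when nothing matches.
        ' "//table//tr" matches rows whether or not they sit inside a <tbody>.
        Dim rows As HtmlNodeCollection = doc.DocumentNode.SelectNodes("//table//tr")
        If rows Is Nothing Then Return result

        For Each row As HtmlNode In rows
            ' "td" is relative to the row, so only this row's direct <td> children match.
            Dim cells As HtmlNodeCollection = row.SelectNodes("td")
            If cells Is Nothing OrElse cells.Count < 2 Then Continue For

            ' DeEntitize turns entities such as &amp; back into plain characters.
            Dim first As String = HtmlEntity.DeEntitize(cells(0).InnerText).Trim()
            Dim second As String = HtmlEntity.DeEntitize(cells(1).InnerText).Trim()
            result.Add(first & " - " & second)
        Next

        Return result
    End Function

    Sub Main()
        Dim html As String =
            "<table>" &
            "<tr><td>Some text 1.</td><td>Some text 2.</td></tr>" &
            "<tr><td>Some text 3.</td><td>Some text 4.</td></tr>" &
            "</table>"

        ' Prints one joined line per table row.
        For Each line As String In JoinRowCells(html)
            Console.WriteLine(line)
        Next
    End Sub

End Module
```

For the sample table this should print "Some text 1. - Some text 2." followed by "Some text 3. - Some text 4.". Returning a list of strings keeps the extraction separate from however the new data is eventually written out.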

Please offer your assistance.

Regards and thanks.

4/18/2012 5:31:49 PM

Popular Answer

You can use LINQ with HTML Agility Pack to get all the TDs from the table node, collect their InnerText, and build a new TR and TD from it.

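' Requires: Imports System.Collections.Generic, Imports System.Linq, Imports HtmlAgilityPack
' tableNode is the <table> HtmlNode. If you know where the table is, you can find it
' with XPath, e.g. doc.DocumentNode.SelectSingleNode("//table").

' Collect the text of every <td> in the table. String.Join puts " - " only
' between items, so no stray separator is left at the end.
Dim cellTexts As List(Of String) = tableNode.Descendants("td").Select(Function(td) td.InnerText.Trim()).ToList()
Dim joined As String = String.Join(" - ", cellTexts)

' Replace the table's contents with a single <tr><td>...</td></tr>.
tableNode.RemoveAllChildren()

Dim newTrNode As HtmlNode = tableNode.OwnerDocument.CreateElement("tr")
Dim newTdNode As HtmlNode = tableNode.OwnerDocument.CreateElement("td")

newTdNode.InnerHtml = joined
newTrNode.AppendChild(newTdNode)

tableNode.AppendChild(newTrNode)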
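The answer's code merges every TD of the whole table into a single cell. If you instead want to keep one row per original TR, as the question describes, a variant along these lines could work. It is a sketch that assumes the HtmlAgilityPack NuGet package is referenced and that the first `<table>` in the document is the one to convert; `TableRewriter` and `MergeRowCells` are hypothetical names. It uses the HTML Agility Pack members `SelectSingleNode`, `Descendants`, `Elements`, `CreateElement`, `RemoveAllChildren` and `AppendChild`.

```vbnet
Imports System
Imports System.Linq
Imports HtmlAgilityPack

' Sketch only: assumes the HtmlAgilityPack NuGet package is referenced and
' that the first <table> in the document is the one to convert.
' TableRewriter / MergeRowCells are hypothetical names, not part of the library.
Module TableRewriter

    ' Rewrites each <tr> of the table in place so it holds a single <td>
    ' containing "first cell - second cell". Returns the updated table HTML.
    Function MergeRowCells(html As String) As String
        Dim doc As New HtmlDocument()
        doc.LoadHtml(html)

        ' Locate the table with XPath, as the answer's comment suggests.
        Dim tableNode As HtmlNode = doc.DocumentNode.SelectSingleNode("//table")
        If tableNode Is Nothing Then Return html

        ' ToList() snapshots the rows before the tree is modified below.
        For Each row As HtmlNode In tableNode.Descendants("tr").ToList()
            Dim cells = row.Elements("td").ToList()
            If cells.Count < 2 Then Continue For

            ' InnerHtml keeps any inline markup and entities from the original cells.
            Dim merged As HtmlNode = doc.CreateElement("td")
            merged.InnerHtml = cells(0).InnerHtml & " - " & cells(1).InnerHtml

            row.RemoveAllChildren()
            row.AppendChild(merged)
        Next

        Return tableNode.OuterHtml
    End Function

    Sub Main()
        Dim html As String =
            "<table>" &
            "<tr><td>Some text 1.</td><td>Some text 2.</td></tr>" &
            "</table>"

        Console.WriteLine(MergeRowCells(html))
    End Sub

End Module
```

Working on InnerHtml rather than InnerText avoids a decode/re-encode round trip, so characters such as `&amp;` in the original cells stay valid HTML in the merged cell.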

Hope it's useful.

5/3/2012 8:32:47 AM

Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow