With HTML Agility Pack, you can parse tables in HTML documents and extract TRs and TDs.

html-agility-pack html-parsing vb.net

Question

I've been given a job to convert old data from a table format to a new format.

Here is some dummy data in the old format:

<table>
<tr>
<td>Some text 1.</td>
<td>Some text 2.</td>
</tr>
<!-- any number of TRs go here -->
</table>

The problem is that the new data needs to be in this format:

Some text 1. - Some text 2. ....

Summary of what needs to be done here:

Find all TRs in the table. For each TR, take the first TD and concatenate it with the second TD, separated by " - ".

I am using HTML Agility Pack with VB.Net.
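A minimal sketch of the per-row steps summarized above, in VB.NET with HtmlAgilityPack. It assumes the HTML is available as a string, that the rows of interest sit inside `<table>` elements (it selects `//table//tr`, so rows of nested tables are included too), and that rows with fewer than two cells should simply be skipped. The names `TableRowConverter` and `ConvertRows` are hypothetical.

```vb
Imports System
Imports System.Collections.Generic
Imports System.Linq
Imports HtmlAgilityPack

Module TableRowConverter

    ' For every <tr> inside a <table>, joins the text of its first and second
    ' <td> with " - ". Rows with fewer than two cells are skipped (an assumption).
    Function ConvertRows(html As String) As List(Of String)
        Dim doc As New HtmlDocument()
        doc.LoadHtml(html)

        Dim result As New List(Of String)()

        ' SelectNodes returns Nothing (not an empty collection) when nothing matches.
        Dim rows As HtmlNodeCollection = doc.DocumentNode.SelectNodes("//table//tr")
        If rows Is Nothing Then Return result

        For Each tr As HtmlNode In rows
            ' Direct <td> children of this row only.
            Dim cells As List(Of HtmlNode) = tr.Elements("td").ToList()
            If cells.Count < 2 Then Continue For

            ' InnerText can contain HTML entities (e.g. &amp;); DeEntitize decodes them.
            Dim first As String = HtmlEntity.DeEntitize(cells(0).InnerText).Trim()
            Dim second As String = HtmlEntity.DeEntitize(cells(1).InnerText).Trim()
            result.Add(first & " - " & second)
        Next

        Return result
    End Function

    Sub Main()
        ' Dummy input in the old format (the second row is made up for illustration).
        Dim html As String = "<table>" &
            "<tr><td>Some text 1.</td><td>Some text 2.</td></tr>" &
            "<tr><td>Some text 3.</td><td>Some text 4.</td></tr>" &
            "</table>"

        For Each line As String In ConvertRows(html)
            Console.WriteLine(line)
        Next
    End Sub

End Module
```

Returning a list of strings keeps the extraction separate from how the new format is written out; each entry can then be written to a file, a database, or back into the HTML.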

Please help.

Thanks and regards.

4/18/2012 5:31:49 PM

Popular Answer

You can use LINQ and HtmlAgilityPack to get all the td nodes under the table node, join their InnerText values with " - ", and replace the table's contents with a single new TR/TD holding the result.

' tableNode is the <table> HtmlNode. If you know where the table is, you can use XPath to find it,
' e.g. tableNode = doc.DocumentNode.SelectSingleNode("//table")
' (needs Imports HtmlAgilityPack and Imports System.Linq)

' Collect the text of every <td> in the table. String.Join puts " - " only
' between values, so there is no trailing separator.
Dim cellTexts = tableNode.Descendants("td").Select(Function(td) td.InnerText.Trim())
Dim joined As String = String.Join(" - ", cellTexts)

tableNode.RemoveAllChildren()

Dim newTrNode As HtmlNode = tableNode.OwnerDocument.CreateElement("tr")
Dim newTdNode As HtmlNode = tableNode.OwnerDocument.CreateElement("td")

newTdNode.InnerHtml = joined
newTrNode.AppendChild(newTdNode)

tableNode.AppendChild(newTrNode)
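The snippet above merges every cell of the table into one TD. If you instead want one output row per original TR (first TD, " - ", second TD), as the question's summary describes, a variant in the same DOM-rewriting style might look like the sketch below. `TableRewriter` and `CollapseRows` are hypothetical names, and dropping rows with fewer than two cells is an assumption.

```vb
Imports System
Imports System.Collections.Generic
Imports System.Linq
Imports HtmlAgilityPack

Module TableRewriter

    ' Replaces the rows of tableNode with one single-cell row per original <tr>,
    ' holding "first cell - second cell". Rows with fewer than two cells are dropped.
    Sub CollapseRows(tableNode As HtmlNode)
        Dim doc As HtmlDocument = tableNode.OwnerDocument

        ' Build all the texts first, before RemoveAllChildren detaches the old rows.
        Dim lines As New List(Of String)()
        For Each tr As HtmlNode In tableNode.Descendants("tr").ToList()
            Dim cells As List(Of HtmlNode) = tr.Elements("td").ToList()
            If cells.Count >= 2 Then
                lines.Add(cells(0).InnerText.Trim() & " - " & cells(1).InnerText.Trim())
            End If
        Next

        tableNode.RemoveAllChildren()

        For Each line As String In lines
            Dim newTr As HtmlNode = doc.CreateElement("tr")
            Dim newTd As HtmlNode = doc.CreateElement("td")
            ' Assigned as InnerHtml, as in the answer, so entities kept by InnerText are written back unchanged.
            newTd.InnerHtml = line
            newTr.AppendChild(newTd)
            tableNode.AppendChild(newTr)
        Next
    End Sub

    Sub Main()
        Dim doc As New HtmlDocument()
        doc.LoadHtml("<table><tr><td>Some text 1.</td><td>Some text 2.</td></tr>" &
                     "<tr><td>Some text 3.</td><td>Some text 4.</td></tr></table>")
        Dim table As HtmlNode = doc.DocumentNode.SelectSingleNode("//table")
        CollapseRows(table)
        Console.WriteLine(doc.DocumentNode.OuterHtml)
    End Sub

End Module
```

Collecting the strings into a list before calling RemoveAllChildren avoids enumerating nodes that are being removed from the tree.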

I hope it helps.

5/3/2012 8:32:47 AM

Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow