LINQ and HTML Agility Pack filling the parsed HTML table data into a datatable

datatable html-agility-pack linq vb.net

Question

I am using the following query to parse html table data.

Dim q = From table In htmldoc.DocumentNode.SelectNodes("//table[@class='Seller']").Cast(Of HtmlNode)()
                    From row In table.SelectNodes("tr").Cast(Of HtmlNode)()
                    From header In row.SelectNodes("th").Cast(Of HtmlNode)()
                    From cell In row.SelectNodes("td").Cast(Of HtmlNode)()
               Select New With {Key .Table = table.Id, Key .CellText = cell.InnerText, Key .headerText = header.InnerText}

How can i use for each loops how can to fill this into a datatable?

I would create columns first using the header data then use a nested for each loop to fill the cell data in the table, but i am not sure how to, also any suggested changes on the above LINQ query?

Note: The html page contains only one table always.

Accepted Answer

Given the following html

Dim t = <table class='Seller' id='MyTable'>
            <tr>
                <th>FooColumn</th>
                <td>Foo</td>
                <td>Another Foo</td>
            </tr>
            <tr>
                <th>BarColumn</th>
                <td>Bar</td>
                <td>Another Bar</td>
            </tr>
            <tr>
                <th>ThirdColumn</th>
                <td>Third</td>
                <td>Another Third</td>
            </tr>
        </table>

Dim htmldoc = New HtmlAgilityPack.HtmlDocument()
htmldoc.LoadHtml(t.ToString())

and your query

Dim q = From table In htmldoc.DocumentNode.SelectNodes("//table[@class='Seller']")
            From row In table.SelectNodes("tr")
                From header In row.SelectNodes("th")
                From cell In row.SelectNodes("td")
        Select New With {.Table = table.Id, .CellText = cell.InnerText, .headerText = header.InnerText}

you can use GroupBy or ToLookup to group the objects by columns:

Dim grouped = q.ToLookup(Function(a) a.headerText)

and use this grouping to create a DataTable with the appropriate DataColumns:

Dim dt = new DataTable()

For Each h in grouped.Select(Function(g) g.Key)
    dt.Columns.Add(h)
Next

Now, for filling the DataTable, you have to "rotate" the grouping, since each group contains the data for one column, but we want the data for each row. Let's use a little helper method

Function Rotate(Of T, TR)(source As IEnumerable(Of IEnumerable(Of T)), 
                          selector As Func(Of IEnumerable(Of T), IEnumerable(Of TR))) As IEnumerable(Of IEnumerable(Of TR))

    Dim result = new List(Of IEnumerable(Of TR))
    Dim enums = source.Select(Function(e) e.GetEnumerator()).ToArray()
    While enums.All(Function(e) e.MoveNext())
        result.Add(selector(enums.Select(Function(e) e.Current)).ToArray())
    End While

    Return result
End Function

to fill the DataTable.

For Each rrow in Rotate(grouped, Function(row) row.Select(Function(e) e.CellText))
    dt.Rows.Add(rrow.ToArray())
Next 

And now the DataTable will look like this:

enter image description here




Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow
Is this KB legal? Yes, learn why
Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow
Is this KB legal? Yes, learn why