HTML Agility Pack and LINQ

c# html-agility-pack linq web-scraping


I want to use HAP to scrape data from a table on a website, loop through the rows to find a value in a column that matches a predefined string and then store only that row that matches. Then I will have a dictionary with the column header as the key, and the column text for the selected row as the value.

Table ex.

<table id="Table3">
<td>Last Name</td>
<td>First Name</td>
<td>Birth Date</td>

<td>&nbsp;DUNN          &nbsp;</td>
<td>&nbsp;JOE          &nbsp;</td>

<td>&nbsp;SMITH          &nbsp;</td>
<td>&nbsp;MARY          &nbsp;</td>

<td>&nbsp;ROCKFORD          &nbsp;</td>
<td>&nbsp;BILL          &nbsp;</td>


If my DOB date I want to match is 20000320 then I want all of the info on Bill.

Adding the header titles to the list is no problem. I know I don't have the user row written up right. What I have is still trying to get a list of rows instead of one row. Another problem I run into with the user row is the inner text will come back with "&nbsp" in it and I can't just do a .Replace so I need a way to remove the spaces. I'm open to all suggestions. Smarter ways of doing all of this etc.

List<string> headerList = new List<string>();
List<string> userList = new List<string>();

var htmlRows = htmlDoc.DocumentNode.SelectNodes("//*[@id=\"Table3\"]/tbody/tr");
if(htmlRows != null)
     // Add first row which contains column headings
         .Select(td => td.InnerText.Trim())
         .ForEach(header => headerList.Add(header));

     // Add user rows
         .Select(tr => tr.Elements("td")
             .Where(td => td.InnerText.Trim() == dteDOB))
         .ForEach(row => userList.Add(row));

    for(int i = 0; i < headerList.Count; i++)
        if(headerList.Count == userList.Count && userList[i] != null)
            dictValues.Add(headerList[i], userList[i]);                 
2/19/2013 6:42:16 PM

Accepted Answer

you can try to select the whole tr using the value in td I think

//*[@id=\"Table3\"]/tbody/tr[td//text()[contains(., 'targetString')]]

take a look at this

XPath to select a table row that has a cell containing specified text

5/23/2017 10:24:51 AM

Related Questions


Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow
Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow