HTML AgilityPack Not Finding Header

c# html-agility-pack web-scraping

Question

I'm trying to pull data from a website and am having trouble extracting some header details. My code just skips over the headers. It's the <h4 class="district"> element that I am trying to pull out.

Also, different browsers show different data.

For example:

    <section class="results-list">
      <header>
        <h3>U.S. House</h3>
      </header>

      <section class="results-group">
        <header>
          <h4 class="district">Florida 1st congressional district</h4>
        </header>
        <div class="container">
          <div class="row clearfix">



<article class="results fifty">

  <header>
    <h4>Democrat primary</h4>
  </header>

  <section class="results-table">
    <table>
      <tr class="header results-table-row">
        <th class="vote-percent">Percent</th>
        <th class="candidate">Candidate</th>
        <th class="vote-count">Votes</th>
        <th class="winning">Winner</th>
      </tr>

        <tr>
          <td class="vote-percent">55%</td>
          <td class="candidate">Jennifer Zimmerman</td>
          <td class="vote-count">13090</td>
          <td class="winning">WINNER</td>
        </tr>

    </table>
  </section>
</article>

Here is my code.

        foreach (HtmlNode table in doc.DocumentNode.SelectNodes("//table"))
        {
            var temp = table.InnerHtml.ToString();

            foreach (HtmlNode row in table.SelectNodes("tr"))
            {
                ResultsListBox.Items.Add(row.InnerText.ToString());

                foreach (HtmlNode cell in row.SelectNodes("th|td"))
                {
                    ResultsListBox.Items.Add(cell.InnerText.ToString());
                    Console.WriteLine("cell: " + cell.InnerText);
                }
            }
        }

Accepted Answer

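Your loop only visits table elements, but the h4 you want sits in a header outside any table, so it is never reached. Assuming there is only one header you want on the page, the h4 element that has a class attribute, you can select it directly with the following XPath query: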

    var queryHeader = "//section/header/h4[@class]";
    var header = doc.DocumentNode.SelectSingleNode(queryHeader);
    // SelectSingleNode returns null when nothing matches, so guard before use.
    if (header != null)
        Console.WriteLine("header: " + header.InnerText);
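If the page contains more than one district, a single-node query only returns the first header. One possible approach, sketched here under the assumption that every district is wrapped in a section with class "results-group" as in the sample markup, is to iterate over those sections and run relative XPath queries for the header and the table rows inside each one. The class name DistrictScraper and the method Extract are hypothetical names chosen for illustration; the HtmlAgilityPack calls used are LoadHtml, SelectNodes, SelectSingleNode and HtmlEntity.DeEntitize.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack;

// Hypothetical helper, not part of HtmlAgilityPack.
public static class DistrictScraper
{
    // Returns one entry per results-group section: the district header text
    // plus the cell texts of every table row found inside that section.
    public static List<(string District, List<string[]> Rows)> Extract(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var results = new List<(string District, List<string[]> Rows)>();

        // Assumption: each district lives in <section class="results-group">.
        // SelectNodes returns null (not an empty collection) when nothing matches.
        var groups = doc.DocumentNode.SelectNodes("//section[@class='results-group']");
        if (groups == null) return results;

        foreach (HtmlNode group in groups)
        {
            // The leading "." keeps the query relative to the current section.
            HtmlNode header = group.SelectSingleNode("./header/h4[@class='district']");
            string district = header != null
                ? HtmlEntity.DeEntitize(header.InnerText).Trim()
                : "(no header)";

            var rows = new List<string[]>();
            var rowNodes = group.SelectNodes(".//table//tr");
            if (rowNodes != null)
            {
                foreach (HtmlNode row in rowNodes)
                {
                    var cells = row.SelectNodes("th|td");
                    if (cells == null) continue;

                    var texts = new string[cells.Count];
                    for (int i = 0; i < cells.Count; i++)
                        texts[i] = HtmlEntity.DeEntitize(cells[i].InnerText).Trim();
                    rows.Add(texts);
                }
            }

            results.Add((district, rows));
        }

        return results;
    }
}
```

Calling DistrictScraper.Extract(doc.DocumentNode.OuterHtml), or passing the downloaded page source directly, gives back each district name together with its rows, so the header no longer has to be matched up with the tables by position.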



Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow