How Can I Find a Specific Node By Using SelectSingleNode From HtmlAgilityPack

html-agility-pack html-parsing selectsinglenode


Using the HtmlAgilityPack I am trying to obtain the text "9/30/2013" from a node on this website:

Here is the HTML from the website:

<div id="financials-iframe-wrap">
<div class="nextgen thin">
<div class="table-headtag">
<div style="float:left;">
<h3 style="color:#fff;">Quarterly Income Statement (values in 000's)</h3>
<div style="float:right;">
<h3><a id="quotes_content_left_hlswitchtype" href="" style="color:#fff;">Get Annual Data</a></h3>
<div style="clear:both"></div>
<tbody><tr class="tr_BG_Color">
<th class="th_No_BG">Quarter:</th>
<th style="text-align:left;">Trend</th>
<tr class="tr_BG_Color">
<th class="th_No_BG">Quarter Ending:</th>

And here is my code:

Dim wreq As HttpWebRequest = WebRequest.Create("")
    wreq.UserAgent = "Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv: Gecko/20091102 Firefox/3.5.5"
    wreq.Method = "get"
    Dim prox As IWebProxy = wreq.Proxy
    prox.Credentials = CredentialCache.DefaultCredentials
    Dim document As New HtmlAgilityPack.HtmlDocument
    Dim web As New HtmlAgilityPack.HtmlWeb
    web.UseCookies = True
    web.PreRequest = New HtmlAgilityPack.HtmlWeb.PreRequestHandler(AddressOf onPreReq)
    wreq.CookieContainer = cookies
    Dim res As HttpWebResponse = wreq.GetResponse()
    document.Load(res.GetResponseStream, True)
    Dim Page_Most_Recent_Quarter As Date = document.DocumentNode.SelectSingleNode("//*[@id='financials-iframe-wrap']/div/table//tr[2]/th[3]").InnerText

When my code reaches the last line, I get this error: "Object reference not set to an instance of an object."

If I debug using Debug.WriteLine(document.DocumentNode.SelectSingleNode("//*[@id='financials-iframe-wrap']/div/table/tbody/tr[2]/th[3]")), a blank line is printed.

What am I doing wrong?

10/30/2013 8:00:50 PM

Popular Answer

First of all, why are you creating a HttpWebRequest object? Let the Html Agility Pack do the heavy lifting for you:

    Dim web As New HtmlAgilityPack.HtmlWeb()
    web.UseCookies = True

    ' HtmlWeb.Load downloads the page and returns the parsed HtmlDocument.
    Dim doc As HtmlAgilityPack.HtmlDocument = web.Load("")

Once the HtmlDocument is loaded, we will extract the date:

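    ' SelectSingleNode returns Nothing when the XPath matches no node, so check before using the result.
    Dim dateNode As HtmlAgilityPack.HtmlNode = doc.DocumentNode.SelectSingleNode("//*[@id='financials-iframe-wrap']/div/table//tr[2]/th[3]")

    If dateNode IsNot Nothing Then
        ' Convert.ToDateTime parses the text using the current culture's date format.
        Dim Page_Most_Recent_Quarter As Date = Convert.ToDateTime(dateNode.InnerHtml.Trim())
    End If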

I tried this several times, and it works perfectly.
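
For anyone who wants to try the lookup without hitting the live site, here is a minimal, self-contained sketch of the same idea. The sample markup, the module name QuarterDateSketch, and the helper TryGetQuarterEnd are all made up for illustration (the question does not show the full table), and the "M/d/yyyy" format is an assumption based on the "9/30/2013" value. It illustrates that SelectSingleNode returns Nothing when the XPath matches no node, which is what produces the "Object reference not set" error if .InnerText is called on the result directly. It also shows one likely reason an XPath containing /tbody/ can come back empty: browser developer tools usually display a tbody element even when the HTML sent by the server does not contain one, and the Html Agility Pack does not add it.

```vbnet
Imports System
Imports System.Globalization
Imports HtmlAgilityPack

Module QuarterDateSketch
    ' Hypothetical sample markup: the real page's table is not shown in full in the question.
    ' Note that it contains no <tbody> element.
    Public Const SampleHtml As String =
        "<div id='financials-iframe-wrap'><div class='nextgen thin'><table>" &
        "<tr><th>Quarter:</th><th>Trend</th><th>3rd</th></tr>" &
        "<tr><th>Quarter Ending:</th><th></th><th>9/30/2013</th></tr>" &
        "</table></div></div>"

    ' Hypothetical helper: returns False instead of throwing when the node
    ' is missing or its text is not a date in the assumed M/d/yyyy format.
    Function TryGetQuarterEnd(html As String, xpath As String, ByRef result As Date) As Boolean
        Dim doc As New HtmlAgilityPack.HtmlDocument()
        doc.LoadHtml(html)

        ' SelectSingleNode returns Nothing when no node matches the XPath.
        Dim node As HtmlNode = doc.DocumentNode.SelectSingleNode(xpath)
        If node Is Nothing Then Return False

        ' Decode entities such as &nbsp; and trim whitespace before parsing.
        Dim rawText As String = HtmlEntity.DeEntitize(node.InnerText).Trim()
        Return Date.TryParseExact(rawText, "M/d/yyyy", CultureInfo.InvariantCulture,
                                  DateTimeStyles.None, result)
    End Function

    Sub Main()
        Dim withoutTbody As String = "//*[@id='financials-iframe-wrap']/div/table//tr[2]/th[3]"
        Dim withTbody As String = "//*[@id='financials-iframe-wrap']/div/table/tbody/tr[2]/th[3]"
        Dim quarterEnd As Date

        ' This path does not depend on a <tbody> being present.
        If TryGetQuarterEnd(SampleHtml, withoutTbody, quarterEnd) Then
            Console.WriteLine("Found: " & quarterEnd.ToString("yyyy-MM-dd"))
        End If

        ' This path requires a <tbody>, which the sample markup does not have.
        If Not TryGetQuarterEnd(SampleHtml, withTbody, quarterEnd) Then
            Console.WriteLine("No match for the /tbody/ path.")
        End If
    End Sub
End Module
```

Using Date.TryParseExact with the invariant culture here is a design choice rather than a requirement: it keeps the result independent of the machine's regional settings, whereas Convert.ToDateTime follows the current culture.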

11/8/2013 3:52:24 PM

Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow