Using HtmlAgilityPack to extract text that is not between tags and comes after a specific node

c# html html-agility-pack web-scraping xpath

Question

HTML code:

 <b> CAR </b>
    <br></br>
  Car is something you can drive.
    <br></br>
    <br></br>

C# code:

        HtmlAgilityPack.HtmlDocument doc = new HtmlWeb().Load("http://website.com/x.html");

        if (doc != null)
        {
            HtmlNode link = doc.DocumentNode.SelectSingleNode("//b[contains(text(), 'CAR')]");

            webBrowser1.DocumentText = link.InnerText;
            webBrowser1.AllowNavigation = true;

            webBrowser1.ScriptErrorsSuppressed = true;
            webBrowser1.Visible = true;
        }

What I get: CAR

What I need:
CAR
Car is something you can drive.

Any recommendations? I tried extending the XPath as follows, but each attempt failed with a NullReferenceException: "/b[contains(text(), 'CAR')/br]" and "/b[contains(text(), 'CAR')/br/br]".

Thanks. P.S. I would like to avoid regex.

5/10/2013 7:18:37 AM

Accepted Answer

XPath is case-sensitive (for more on this, see the question "Can one disregard case while using xpath and C#?"). Moreover, the second piece of text that contains "Car" is not a child of the B element; it is a separate text node that follows it. Something like this should work:

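HtmlDocument doc = new HtmlWeb().Load("http://website.com/x.html");
// translate() lower-cases each text node so the match ignores case.
HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes("//text()[contains(translate(., 'ABCDEFGHIJKLMNOPQRSTUVWXYZ', 'abcdefghijklmnopqrstuvwxyz'), 'car')]");
// SelectNodes returns null (not an empty collection) when nothing matches.
if (nodes != null)
{
    foreach (HtmlNode node in nodes)
    {
        Console.WriteLine(node.InnerText);
    }
}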

It will produce the following in a console application:

 CAR

  Car is something you can drive.
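
The query above prints every text node on the page that contains "car". If only the text that directly follows the matched B element is wanted, one option is to walk its siblings until the first non-blank text node. The following is a minimal sketch under stated assumptions: FirstTextAfter is a hypothetical helper (not part of HtmlAgilityPack), and the markup is an inline copy of the question's HTML wrapped in a div so that no network call is needed.

```csharp
using System;
using HtmlAgilityPack; // NuGet package: HtmlAgilityPack

static class SiblingText
{
    // Hypothetical helper (not part of HtmlAgilityPack): returns the first
    // non-blank text node that follows `node` at the same level, or null.
    public static string FirstTextAfter(HtmlNode node)
    {
        for (HtmlNode next = node.NextSibling; next != null; next = next.NextSibling)
        {
            // Skip elements such as <br> and whitespace-only text nodes.
            if (next.NodeType == HtmlNodeType.Text && next.InnerText.Trim().Length > 0)
            {
                return next.InnerText.Trim();
            }
        }
        return null;
    }

    static void Main()
    {
        // Inline copy of the question's markup; the wrapping <div> is an assumption.
        var doc = new HtmlDocument();
        doc.LoadHtml("<div><b> CAR </b>\n<br></br>\n  Car is something you can drive.\n<br></br>\n<br></br></div>");

        // Same case-sensitive XPath as in the question.
        HtmlNode bold = doc.DocumentNode.SelectSingleNode("//b[contains(text(), 'CAR')]");
        if (bold == null)
        {
            // SelectSingleNode returns null when nothing matches.
            Console.WriteLine("No matching <b> element found.");
            return;
        }

        Console.WriteLine(bold.InnerText.Trim());
        Console.WriteLine(FirstTextAfter(bold) ?? "(no text after the node)");
    }
}
```

Checking the result of SelectSingleNode for null before using it also avoids the NullReferenceException mentioned in the question, which occurs when an XPath matches nothing.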
5/23/2017 11:49:16 AM


Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow