Using HtmlAgilityPack to extract text that is not between tags and comes after a specific node

c# html html-agility-pack web-scraping xpath

Question

HTML code:

 <b> CAR </b>
    <br></br>
  Car is something you can drive.
    <br></br>
    <br></br>

C# code:

        HtmlAgilityPack.HtmlDocument doc = new HtmlWeb().Load("http://website.com/x.html");

        if (doc != null)
        {
            HtmlNode link = doc.DocumentNode.SelectSingleNode("//b[contains(text(), 'CAR')]");

            webBrowser1.DocumentText = link.InnerText;
            webBrowser1.AllowNavigation = true;

            webBrowser1.ScriptErrorsSuppressed = true;
            webBrowser1.Visible = true;
        }

What I manage to get: CAR

I need to get:
CAR
Car is something you can drive.

Any suggestions? I have tried selecting the following nodes, but that gave me NullReferenceExceptions: "//b[contains(text(), 'CAR')/br]" and "//b[contains(text(), 'CAR')/br/br]"

Thanks in advance. P.S. I would like to avoid Regex.

Accepted Answer

XPath is case-sensitive (for more on this, see: Is it possible to ignore case using xpath and c#?). In addition, the second phrase, which contains 'Car', is not a child of the b element; it is a separate text node that comes after it. You could make it work like this:

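// Requires: using System; using HtmlAgilityPack;
HtmlDocument doc = new HtmlWeb().Load("http://website.com/x.html");
// translate() lower-cases each text node so that the match ignores case.
HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes("//text()[contains(translate(., 'ABCDEFGHIJKLMNOPQRSTUVWXYZ', 'abcdefghijklmnopqrstuvwxyz'), 'car')]");
if (nodes != null) // SelectNodes returns null when nothing matches
{
    foreach (HtmlNode node in nodes)
        Console.WriteLine(node.InnerText);
}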

In a console application, it will output this:

 CAR

  Car is something you can drive.
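
If only the text that directly follows the matching b element is wanted, rather than every text node containing "car", one option is to locate the b node and then walk its siblings. The following is a minimal sketch, not a definitive implementation. The class name, the helper ExtractLabelAndFollowingText and the inline HTML string are hypothetical and used only for illustration. It also assumes that the label contains no apostrophe, since it is pasted straight into the XPath expression.

```csharp
using System;
using HtmlAgilityPack;

static class CarTextExample
{
    // Hypothetical helper: returns the trimmed text of the first <b> whose text
    // contains `label`, followed by the first non-blank text node after it.
    // Assumption: `label` contains no apostrophe (it is embedded in the XPath).
    public static string ExtractLabelAndFollowingText(string html, string label)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        HtmlNode bold = doc.DocumentNode.SelectSingleNode(
            "//b[contains(text(), '" + label + "')]");
        if (bold == null)
            return null; // no matching <b>; avoids a NullReferenceException

        // Walk the siblings after <b>, skipping <br> elements and blank text.
        for (HtmlNode sibling = bold.NextSibling; sibling != null; sibling = sibling.NextSibling)
        {
            if (sibling.NodeType == HtmlNodeType.Text &&
                !string.IsNullOrWhiteSpace(sibling.InnerText))
            {
                return bold.InnerText.Trim() + Environment.NewLine + sibling.InnerText.Trim();
            }
        }

        // No text followed the <b>; return the label text alone.
        return bold.InnerText.Trim();
    }

    static void Main()
    {
        // Sample markup modeled on the question's HTML.
        string html = "<b> CAR </b>\n<br></br>\n  Car is something you can drive.\n<br></br>";
        Console.WriteLine(ExtractLabelAndFollowingText(html, "CAR"));
    }
}
```

The null check matters here because SelectSingleNode returns null when the XPath matches nothing, which is a likely source of the NullReferenceException mentioned in the question.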


Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow