Html Agility Pack, search through site for a specified string of words

c# html-agility-pack

Question

I'm using the Html Agility Pack for this task, basically I've got a URL, and my program should read through the content of the html page on it, and if it finds a line of text (ie: "John had three apples"), it should change a label's text to "Found it".

I tried to do it with contains, but I guess it only checks for one word.

var nodeBFT = doc.DocumentNode.SelectNodes("//*[contains(text(), 'John had three apples')]");

if (nodeBFT != null && nodeBFT.Count != 0)
    myLabel.Text = "Found it";

EDIT: Rest of my code, now with ako's attempt:

if (CheckIfValidUrl(v)) // foreach var v in a list..., checks if the URL works
{
    HtmlWeb hw = new HtmlWeb();
    HtmlDocument doc = hw.Load(v);

    try
    {
        if (doc.DocumentNode.InnerHtml.ToString().Contains("string of words"))
        {
            mylabel.Text = v;
        }
    ...

Accepted Answer

One possible option is using . instead of text(). Passing text() to contains() function the way you did will, as you suspected, effective only when the searched text is the first direct child of the current element :

doc.DocumentNode.SelectNodes("//*[contains(., 'John had three apples')]");

In the other side, contains(., '...') evaluates the entire text content of current element, concatenated. So, just a heads up, the above XPath will also consider the following element for example, as a match :

<span>John had <br/>three <strong>apples</strong></span>

If you need the XPath to only consider cases when the entire keyword contained in a single text node, and therefore considers the above case as a no-match, you can try this way :

doc.DocumentNode.SelectNodes("//*[text()[contains(., 'John had three apples')]]");

If none of the above works for you, please post minimal HTML snippet that contains the keyword but returned no match, so we can examine further what possibly causes that behavior and how to fix it.


Popular Answer

use this:

if (doc.DocumentNode.InnerHtml.ToString().Contains("John had three apples"))
    myLabel.Text="Found it";


Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow
Is this KB legal? Yes, learn why
Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow
Is this KB legal? Yes, learn why