Get HtmlAgilityPack Node using exact HTML search or Converting HTMLElement to HTMLNode

browser c# html-agility-pack

Question

I have created an HtmlElement picker (DOM) using the default .NET WebBrowser control. The user can pick (select) an HtmlElement by clicking on it.

I want to get the HtmlAgilityPack.HtmlNode corresponding to that HtmlElement.

The easiest way (in my mind) would be doc.DocumentNode.SelectSingleNode(EXACTHTMLTEXT), but that does not work because the function only accepts an XPath expression, not raw HTML.
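For contrast, here is a minimal sketch of the kind of argument SelectSingleNode does accept: an XPath expression that describes the node rather than its markup. The class name XPathLookupExample and the helper FindLinkByClass are hypothetical names used only for illustration, and the sketch assumes the HtmlAgilityPack package is referenced.

```csharp
using System;
using HtmlAgilityPack;

public static class XPathLookupExample
{
    // Hypothetical helper: returns the first <a> element whose class attribute
    // equals cssClass, or null when nothing matches.
    // Assumption: cssClass contains no single quote, because it is pasted
    // directly into the XPath string literal.
    public static HtmlNode FindLinkByClass(HtmlDocument doc, string cssClass)
    {
        return doc.DocumentNode.SelectSingleNode("//a[@class='" + cssClass + "']");
    }

    public static void Main()
    {
        var doc = new HtmlDocument();
        doc.LoadHtml("<html><body><a class=\"l\" href=\"http://site.com\"><em>Great!!!</em></a></body></html>");

        HtmlNode link = FindLinkByClass(doc, "l");
        Console.WriteLine(link != null ? link.GetAttributeValue("href", "") : "not found");
    }
}
```

An XPath lookup like this only helps when the element has a distinguishing attribute, which is why it does not solve the general case where the user can click any element.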

How can I do this?

A sample HtmlElement selected by a user looks like this (the OuterHtml):

<a onmousedown="return wow" class="l" href="http://site.com"><em>Great!!!</em> <b>come and see more</b></a>

Of course, any element can be selected, which is why I need a general way to get the HtmlNode.

Accepted Answer

Same concept as the other answer, but a bit simpler because you don't have to know the element type (this needs using System.Linq):

HtmlNode n = doc.DocumentNode.Descendants().FirstOrDefault(x => x.OuterHtml.Equals(text, StringComparison.InvariantCultureIgnoreCase));
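To show the one-liner in context, here is a small self-contained sketch that loads the sample markup from the question and looks it up by its OuterHtml. The class name NodeLookup and the method FindByOuterHtml are hypothetical, and the sketch assumes the HtmlAgilityPack package is referenced and that the string handed in matches the page source character for character apart from letter case.

```csharp
using System;
using System.Linq;
using HtmlAgilityPack;

public static class NodeLookup
{
    // Hypothetical helper: returns the first node (in document order) whose
    // OuterHtml equals outerHtml, ignoring case, or null when none matches.
    public static HtmlNode FindByOuterHtml(HtmlDocument doc, string outerHtml)
    {
        return doc.DocumentNode
                  .Descendants()
                  .FirstOrDefault(node => node.OuterHtml.Equals(
                      outerHtml, StringComparison.InvariantCultureIgnoreCase));
    }

    public static void Main()
    {
        // Sample element taken from the question.
        string picked = "<a onmousedown=\"return wow\" class=\"l\" href=\"http://site.com\">"
                      + "<em>Great!!!</em> <b>come and see more</b></a>";

        var doc = new HtmlDocument();
        doc.LoadHtml("<html><body><div>" + picked + "</div></body></html>");

        HtmlNode found = FindByOuterHtml(doc, picked);
        Console.WriteLine(found != null ? found.XPath : "not found");
    }
}
```

If the same markup appears more than once on the page, this returns only the first occurrence, so it cannot tell identical elements apart.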

Popular Answer

I came up with a solution. I don't know if it's the best one, so I would appreciate it if anybody who knows a better way lets me know.

Here is the method that gets the HtmlNode (doc is the HtmlAgilityPack HtmlDocument loaded with the same page; the method needs using System.Linq):

public HtmlNode GetNode(string text)
        {
            // The OuterHtml of an element must start with an opening tag.
            if (string.IsNullOrEmpty(text) || !text.StartsWith("<"))
            {
                throw new ArgumentException("Invalid HTML Element supplied. The selected HTML element must start with <");
            }

            // Get the type of the element (a, p, div etc.). The name ends at the first
            // space, '>' or '/', so tags without attributes (<b>...</b>) are handled too.
            int end = text.IndexOfAny(new[] { ' ', '>', '/' }, 1);
            if (end <= 1)
            {
                throw new ArgumentException("Cannot determine the element type from the supplied HTML");
            }
            string type = text.Substring(1, end - 1);

            // SelectNodes returns null when no node matches the XPath expression.
            HtmlNodeCollection candidates = doc.DocumentNode.SelectNodes("//" + type);

            // Among the nodes of that type, the one whose OuterHtml equals the
            // HtmlElement's OuterHtml is the node we want to use.
            HtmlNode n = candidates == null ? null : candidates.FirstOrDefault(x => x.OuterHtml == text);
            if (n == null)
            {
                throw new Exception("Cannot find the HTML element in the HTML Page");
            }
            return n;
        }

The idea is that you pass the OuterHtml of the HtmlElement. Example:

HtmlElement el=....
HtmlNode N = GetNode(el.OuterHtml);



Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow