How to get the inner text for a single node using HtmlAgilityPack

c# html-agility-pack

Question

My HTML looks like this:

        <div id="footer">
            <div id="footertext">
                <p> 
                    Copyright &copy; FUCHS Online Ltd, 2013. All Rights Reserved.
                </p>
             </div>
        </div>

I would like to extract this text from the markup and store it as a string in my C# code: "Copyright © FUCHS Online Ltd, 2013. All Rights Reserved."

This is what I have tried:

    public string getvalue()
    {
        HtmlWeb web = new HtmlWeb();
        HtmlAgilityPack.HtmlDocument doc = web.Load("www.fuchsonline.com");
        var link = doc.DocumentNode.SelectNodes("//div[@id='footertext']");
        return link.ToString();
    }

This returns the string "HtmlAgilityPack.HtmlNodeCollection" (the type name of the collection) instead of the footer text. How do I get just the text value?

Popular Answer

You need the value of a single node, so it is better to use the SelectSingleNode method, which returns one HtmlNode rather than a collection.

    HtmlWeb web = new HtmlWeb();
    var doc = web.Load("http://www.fuchsonline.com");
    // SelectSingleNode returns one HtmlNode, or null if nothing matches.
    var link = doc.DocumentNode.SelectSingleNode("//div[@id='footertext']/p");

    string rawText = link.InnerText.Trim();
    // HttpUtility is in System.Web; WebUtility (System.Net) can be used instead.
    string decodedText = HttpUtility.HtmlDecode(rawText);

    return decodedText;

You may also need to decode HTML entities such as &copy;, which is what the HtmlDecode call does.
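For reference, here is a minimal self-contained sketch of the same approach that parses the markup from the question as a string instead of fetching the live page. The class name FooterText and the method name Extract are hypothetical names chosen for illustration, and the null check is an addition that is not part of the answer above. It assumes the HtmlAgilityPack NuGet package is referenced.

```csharp
using System;
using System.Net;
using HtmlAgilityPack;

public static class FooterText
{
    // Hypothetical helper: returns the decoded text of the footer paragraph,
    // or null when the XPath matches nothing.
    public static string Extract(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectSingleNode returns a single HtmlNode, or null if there is no match.
        HtmlNode node = doc.DocumentNode.SelectSingleNode("//div[@id='footertext']/p");
        if (node == null)
        {
            return null;
        }

        // Trim the surrounding whitespace, then turn entities such as &copy; into characters.
        string rawText = node.InnerText.Trim();
        return WebUtility.HtmlDecode(rawText);
    }

    public static void Main()
    {
        const string html =
            "<div id=\"footer\"><div id=\"footertext\"><p> " +
            "Copyright &copy; FUCHS Online Ltd, 2013. All Rights Reserved. " +
            "</p></div></div>";

        Console.WriteLine(Extract(html));
    }
}
```

Checking for null before reading InnerText avoids a NullReferenceException if the page layout changes and the div or paragraph is no longer present.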



Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow