Using HtmlAgilityPack to extract text from HTML on WP7

c# html-agility-pack windows-phone-7


I'm trying to extract text from HTML using HtmlAgilityPack. I added HtmlAgilityPack to my project successfully, then tried the following code to extract the body text:

HtmlAgilityPack.HtmlDocument htmlDoc = new HtmlAgilityPack.HtmlDocument();

// There are various options, set as needed

// filePath is a path to a file containing the html
htmlDoc.Load(filePath);

// Use:  htmlDoc.LoadHtml(xmlString);  to load from a string

// ParseErrors is an ArrayList containing any errors from the Load statement
if (htmlDoc.ParseErrors != null && htmlDoc.ParseErrors.Count > 0)
{
    // Handle any parse errors as required
}

if (htmlDoc.DocumentNode != null)
{
    HtmlAgilityPack.HtmlNode bodyNode = htmlDoc.DocumentNode.SelectSingleNode("//body");

    if (bodyNode != null)
    {
        // Do something with bodyNode
    }
}

I receive the following error when building the project:

Error 1 The type 'System.Xml.XPath.IXPathNavigable' is defined in an assembly that is not referenced. You must add a reference to assembly 'System.Xml.XPath, Version=, Culture=neutral, PublicKeyToken=31bf3856ad364e35'. D:\test\test\MainPage.xaml.cs 58

I should add that I added the System.Xml reference and I still get this error. Can you please help me work out what the issue is? Thanks.

12/10/2011 3:36:21 PM

Accepted Answer

Thanks. I figured out that I had to add a reference to System.Xml.XPath from the Silverlight 4.0 folder, which is located under the Microsoft SDKs folder.

12/11/2011 6:36:53 AM

Popular Answer

With HAP on the phone, you'll have to use LINQ (in the LINQ to XML style) rather than XPath to find things in the parsed HTML. You might also have to build the phone version (HAPPhone) from source.

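// Requires: using System.Collections.Generic; using System.Linq; using HtmlAgilityPack;
public void Hap()
{
    HtmlWeb.LoadAsync("", OnCallback);
}

private void OnCallback(object s, HtmlDocumentLoadCompleted htmlDocumentLoadCompleted)
{
    var htmlDocument = htmlDocumentLoadCompleted.Document;

    // All <select> elements in the document
    var test = htmlDocument.DocumentNode.Descendants("select").ToList();

    // The <select> whose id is "stateDropdown"; GetAttributeValue avoids a
    // NullReferenceException on elements that have no id attribute
    var dropdown = (from h in htmlDocument.DocumentNode.Descendants("select")
                    where h.GetAttributeValue("id", "") == "stateDropdown"
                    select h).FirstOrDefault();

    // Its child nodes, or an empty list if no such element was found
    var test2 = dropdown == null ? new List<HtmlNode>() : dropdown.ChildNodes.ToList();
}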
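
Applied to the original question, the same LINQ approach can get at the body text without SelectSingleNode, so it should not need the XPath reference. This is only a minimal sketch: BodyTextExtractor and GetBodyText are made-up names for illustration, and it assumes the HTML is already available as a string and that your HAP build exposes Descendants and LoadHtml.

```csharp
using System.Linq;
using HtmlAgilityPack;

// Hypothetical helper, not part of HtmlAgilityPack.
public static class BodyTextExtractor
{
    // Returns the text inside <body>, or an empty string if there is no body.
    public static string GetBodyText(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // LINQ replacement for SelectSingleNode("//body")
        var body = doc.DocumentNode.Descendants("body").FirstOrDefault();

        // InnerText concatenates all text nodes under the element
        return body == null ? string.Empty : body.InnerText.Trim();
    }
}
```

Note that InnerText also includes the contents of any script or style elements inside the body, and it leaves HTML entities encoded, so you may want to filter and decode the result before displaying it.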

Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow