Obtaining text from HTML on WP7 using HtmlAgilityPack

c# html-agility-pack windows-phone-7

Question

I'm trying to extract text from HTML using HtmlAgilityPack. I successfully added HtmlAgilityPack to my project. However, I tried the following code to extract the body text:

HtmlAgilityPack.HtmlDocument htmlDoc = new HtmlAgilityPack.HtmlDocument();

// There are various options, set as needed
htmlDoc.OptionFixNestedTags=true;

// filePath is a path to a file containing the html
htmlDoc.Load(filePath);

// Use:  htmlDoc.LoadXML(xmlString);  to load from a string

// ParseErrors is an ArrayList containing any errors from the Load statement
if (htmlDoc.ParseErrors!=null && htmlDoc.ParseErrors.Count>0)
{
    // Handle any parse errors as required
}
else
{
    if (htmlDoc.DocumentNode != null)
    {
        HtmlNode bodyNode = htmlDoc.DocumentNode.SelectSingleNode("//body");

        if (bodyNode != null)
        {
            // Do something with bodyNode
        }
    }
}

and I receive the following error when building the project.

Error 1 The type 'System.Xml.XPath.IXPathNavigable' is defined in an assembly that is not referenced. You must add a reference to assembly 'System.Xml.XPath, Version=2.0.5.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35'. D:\test\test\MainPage.xaml.cs 58

I should add that I added the System.Xml reference and I still get this error. Can you please help me out what this issue? Thanks.

Accepted Answer

Thanks. I figured out that I had to add a reference to the System.Xml.XPath from the Silverlight 4.0 folder available in the Microsoft SDKs parent folder.


Popular Answer

With HAP on the phone you'll have to use Linq2Xml to find stuff in the parsed HTML. And you might have to build the phone version from the source (HAPPhone).

public void Hap()
{
   HtmlWeb.LoadAsync("http://www.page.com", OnCallback);              
}



private void OnCallback(object s, HtmlDocumentLoadCompleted htmlDocumentLoadCompleted)
        {            
            var htmlDocument = htmlDocumentLoadCompleted.Document;

            var test = htmlDocument.DocumentNode.Descendants("select").ToList();


            var test2 = (from h in htmlDocument.DocumentNode.Descendants("select")
                         where h.Attributes["id"].Value == "stateDropdown"
                         select h).FirstOrDefault().ChildNodes.ToList();
        }


Related

Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow
Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow