Using HtmlAgilityPack to parse web page information in C#

c# html html-agility-pack

Question

I'm attempting to parse information from a web page using HTML Agility Pack. Here's my code:

using System;
using HtmlAgilityPack;

namespace htmparsing
{
    class MainClass
    {
        public static void Main (string[] args)
        {
            string url = "https://bugs.eclipse.org";
            HtmlWeb web = new HtmlWeb();
            HtmlDocument doc = web.Load(url);
            foreach(HtmlNode node in doc){
                //do something here with "node"
            }               
        }
    }
}

However, when I try to access doc.DocumentElement.SelectNodes, DocumentElement does not appear in the member list. I added HtmlAgilityPack.dll to the references, but the problem persists.

1
3
4/29/2016 2:35:51 PM

Accepted Answer

I have written an article that shows how to scrape DOM elements with HAP in ASP.NET. It walks through the whole process step by step. You may want to read it and try it out.

Scraping HTML DOM elements using HtmlAgilityPack (HAP) in ASP.NET

As for your code, it works fine for me. I tested it the way you wrote it, with a single change:

string url = "https://www.google.com";
HtmlWeb web = new HtmlWeb();
HtmlDocument doc = web.Load(url);
foreach (HtmlNode node in doc.DocumentNode.SelectNodes("//a")) 
{
    outputLabel.Text += node.InnerHtml;
}

I got the output as expected. The problem is that you are asking for DocumentElement on an HtmlDocument object, when the property you actually need is DocumentNode. Here is a response from an HtmlAgilityPack developer about the problem you are facing:

HtmlDocument.DocumentElement not found in object browser
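
To make the DocumentNode fix concrete, here is a minimal sketch that parses an in-memory string instead of loading a URL, so it runs without network access. The class name LinkExample, the method ExtractLinks, and the sample markup are made up for illustration. It also guards against SelectNodes returning null, which, as far as I know, is what HAP does by default when nothing matches.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack;

static class LinkExample
{
    // Hypothetical helper: returns "href -> text" for every <a> element in the markup.
    public static List<string> ExtractLinks(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var results = new List<string>();

        // DocumentNode is the root of the parsed tree; this is the property
        // to use where the question tried DocumentElement.
        HtmlNodeCollection links = doc.DocumentNode.SelectNodes("//a");

        // With default settings SelectNodes yields null, not an empty
        // collection, when the XPath matches nothing.
        if (links == null)
        {
            return results;
        }

        foreach (HtmlNode link in links)
        {
            results.Add(link.GetAttributeValue("href", "") + " -> " + link.InnerText);
        }
        return results;
    }

    static void Main()
    {
        // Sample markup invented for this example.
        const string sample =
            "<html><body><a href='/one'>One</a><a href='/two'>Two</a></body></html>";

        foreach (string line in ExtractLinks(sample))
        {
            Console.WriteLine(line);
        }
    }
}
```

To run the same helper against a live page, you could pass it the OuterHtml of the document returned by HtmlWeb.Load, or move the SelectNodes call onto that document directly.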

12
11/9/2013 2:32:53 AM

Popular Answer

The behavior you are seeing is correct.

Look at what you are actually calling: http://htmlagilitypack.codeplex.com/SourceControl/latest#Release/1_4_0/HtmlAgilityPack/HtmlNode.cs.

You are asking the top-level element to select nodes that match an XPath expression. Unless the expression begins with //, it is evaluated relative to that element, so it can only match nodes beneath it. The document element cannot match such an expression, because no element is a descendant of itself.
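
The scoping rules are easier to see side by side. The sketch below is only an illustration: the class XPathScopeExample, the helper CountMatches, and the markup are invented, and the markup is parsed from a string so nothing is fetched. It compares an expression that starts with // against relative expressions, all evaluated from the same inner element.

```csharp
using System;
using HtmlAgilityPack;

static class XPathScopeExample
{
    // Sample markup invented for this example.
    public const string Sample =
        "<div id='outer'><span>first</span>" +
        "<div id='inner'><span>second</span></div></div>" +
        "<span>third</span>";

    // Hypothetical helper: number of nodes the XPath selects from the
    // given context node, treating a null result as zero matches.
    public static int CountMatches(HtmlNode context, string xpath)
    {
        HtmlNodeCollection found = context.SelectNodes(xpath);
        return found == null ? 0 : found.Count;
    }

    static void Main()
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(Sample);

        HtmlNode inner = doc.DocumentNode.SelectSingleNode("//div[@id='inner']");

        // Starts with "//": searches the whole document, even though it is
        // called on the inner element.
        Console.WriteLine("//span from inner:      " + CountMatches(inner, "//span"));

        // Relative: only spans beneath the inner element are candidates.
        Console.WriteLine(".//span from inner:     " + CountMatches(inner, ".//span"));

        // Relative and looking for the context element itself: no match,
        // because an element is not its own descendant.
        Console.WriteLine(".//div[@id='inner']:    " + CountMatches(inner, ".//div[@id='inner']"));
    }
}
```

If a query called on a specific node unexpectedly returns results from the rest of the page, a leading // without a dot in front is usually the first thing to check.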



Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow