I'm trying to scrape some information from a website but can't find a solution that works for me. Every code I read on the Internet generates at least one error for me.
Even the example code at their homepage generates errors for me.
My code:
HtmlDocument doc = new HtmlDocument();
doc.Load("https://www.flashback.org/u479804");
foreach(HtmlNode link in doc.DocumentElement.SelectNodes("//a[@href"])
{
HtmlAttribute att = link["href"];
att.Value = FixLink(att);
}
doc.Save("file.htm");
Generates the following error:
'HtmlDocument' is an ambiguous reference between 'System.Windows.Forms.HtmlDocument' and 'HtmlAgilityPack.HtmlDocument' C:*\Form1.cs
Edit: My entire code is located here: http://beta.yapaste.com/55
All help is very appreciated!
Use HtmlAgilityPack.HtmlDocument
:
HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
The compiler is getting confused because two of the namespaces you have imported with using
contain classes called HtmlDocument
- the HTML Agility Pack namespace, and the Windows Forms namespace. You can get around this by specifying which class you want to use explicitly.
this is how i achieved. Note that there is a code error given in main Html Agility Pack Example in foreach line doc.DocumentElement.SelectNodes("//a[@href"]). The correct and tested one is given below.
HtmlWeb hw = new HtmlWeb();
HtmlDocument doc = hw.Load(@"http://adityabajaj.com");
StringBuilder sb = new StringBuilder();
List<string> lstHref = new List<string>();
foreach (HtmlNode link in doc.DocumentNode.SelectNodes("//a[@href]").Distinct())
{
string curHref = link.Attributes["href"].Value;
if(!lstHref.Contains(curHref))
lstHref.Add(curHref);
}
foreach (string str in lstHref)
{
sb.Append(str +"<br />");
}
Response.Write (sb.ToString());
Since it got working for me, I thought I should share.