Get all links inside a div and save them to a list

c# html html-agility-pack

Question

I use WebClient and DownloadString() to download an HTML page, and then I try to collect all of the links inside a specific div into a list.

After many attempts and two hours of work, I sometimes get all the links, sometimes just one, and sometimes none.

Here is my code; I have removed the catch block for readability.

List<string> getLinks = new List<string>();
for (int i = 0; i < wikiUrls.Length; i++)
{
    try
    {
        string download = client.DownloadString(wikiUrls[i]);
        string searchForDiv = "<div class=\"wiki\">";
        int firstCharacter = download.IndexOf(searchForDiv);
        //if the wiki div doesn't exist, go to the next element of the for loop
        if (firstCharacter == -1)
            continue;
        else
        {
            HtmlAgilityPack.HtmlDocument document = new HtmlAgilityPack.HtmlDocument();
            document.LoadHtml(download);
            string nodes = String.Empty;
            var div = document.DocumentNode.SelectSingleNode("//div[@class=\"wiki\"]");
            if (div != null)
            {
                getLinks = div.Descendants("a").Select(node => node.GetAttributeValue("href", "Not found \n")).ToList(); 
                output.Text = string.Join(" ", getLinks);
            }
        }
    }
3/20/2017 1:04:33 PM

Accepted Answer

I figured it out. The problem was this line:

getLinks = div.Descendants("a").Select(node => node.GetAttributeValue("href", "Not found \n")).ToList();

Because getLinks is assigned inside the for loop, its contents were overwritten on every iteration, so only the links from the last processed page survived. The fix is to append instead of assign:

getLinks.AddRange(div.Descendants("a").Select(node => node.GetAttributeValue("href", String.Empty)).ToList()); 
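The difference between assignment and AddRange inside a loop can be illustrated with a minimal, self-contained sketch. The per-page link lists here are made-up placeholders standing in for the href values HtmlAgilityPack would return; no HTML parsing or network access is involved:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical per-page link lists, standing in for the hrefs parsed from each page.
var pages = new List<List<string>>
{
    new List<string> { "/wiki/A", "/wiki/B" },
    new List<string> { "/wiki/C" },
};

// Bug: assignment replaces the whole list each iteration,
// so only the last page's links remain after the loop.
List<string> overwritten = new List<string>();
foreach (var page in pages)
    overwritten = page.ToList();

// Fix: AddRange appends each page's links, accumulating across iterations.
List<string> accumulated = new List<string>();
foreach (var page in pages)
    accumulated.AddRange(page);

Console.WriteLine(overwritten.Count);
Console.WriteLine(accumulated.Count);
```

With the assignment version the final list holds only the single link from the last page, while AddRange collects all three.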
3/20/2017 2:31:36 PM



Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow