C#如何遍歷列表,更新相同的列表,並繼續循環?

c# for-loop html-agility-pack loops web-scraping

我正在寫一個抓取特定網址並將其添加到列表中的webscraper。

using HtmlAgilityPack;

List<string> mylist = new List<string>();
var firstUrl = "http://example.com";
HtmlWeb web = new HtmlWeb();
HtmlDocument document = web.Load(firstUrl);

HtmlNodeCollection nodes = document.DocumentNode.SelectNodes("//div[contains(@class,'Name')]/a");
            foreach (HtmlNode htmlNode in (IEnumerable<HtmlNode>)nodes)
            {
                if (!mylist.Contains(htmlNode.InnerText))
                {
                    mylist.Add(htmlNode.InnerText);

                }

            }

在這一點上我想要做的是循環“mylist”並做同樣的事情並且基本上永遠地繼續。代碼應該採用新解析的URL並將它們添加到列表中。最簡單的方法是什麼?

我嘗試在上面的那個之後創建一個for循環。但它似乎沒有更新列表。它將只會繼續循環遍歷列表中已有的相同項目(因為我將始終小於mylist.Count)

for (int i = 0; i < mylist.Count; i++)
            {
                //the items in mylist are added to the url
                var urls = "http://example.com" + mylist[i];

                HtmlWeb web = new HtmlWeb();
                HtmlDocument document = web.Load(urls);

                HtmlNodeCollection nodes = document.DocumentNode.SelectNodes("//div[contains(@class,'Name')]/a");
                foreach (HtmlNode htmlNode in (IEnumerable<HtmlNode>)nodes)
                {
                    if (!mylist.Contains(htmlNode.InnerText))
                    { 
                        mylist.Add(htmlNode.InnerText);
                    }

                }


            }

謝謝!

一般承認的答案

Queue符合您的要求。

Queue<string> mylist = new Queue<string>();

第一關:

using HtmlAgilityPack;

Queue<string> mylist = new Queue<string>();
var firstUrl = "http://example.com";
HtmlWeb web = new HtmlWeb();
HtmlDocument document = web.Load(firstUrl);

HtmlNodeCollection nodes = document.DocumentNode.SelectNodes("//div[contains(@class,'Name')]/a");
            foreach (HtmlNode htmlNode in (IEnumerable<HtmlNode>)nodes)
            {
                if (!mylist.Contains(htmlNode.InnerText))
                {
                    mylist.Enqueue(htmlNode.InnerText);

                }
            }

現在第二關

while (mylist.Count > 0)
            {
                var url = mylist..Dequeue();
                //the items in mylist are added to the url
                var urls = "http://example.com" + url;

                HtmlWeb web = new HtmlWeb();
                HtmlDocument document = web.Load(urls);

                HtmlNodeCollection nodes = document.DocumentNode.SelectNodes("//div[contains(@class,'Name')]/a");
                foreach (HtmlNode htmlNode in (IEnumerable<HtmlNode>)nodes)
                {
                    if (!mylist.Contains(htmlNode.InnerText))
                    { 
                        mylist.Enqueue(htmlNode.InnerText);
                    }

                }


            }

熱門答案

去NuGet“System.Interactive”,然後執行以下操作:

var found = new HashSet<string>();

var urls =
    EnumerableEx
        .Expand(
            new[] { "http://example.com" },
            url =>
            {
                var web = new HtmlWeb();
                var document = web.Load(url);
                var nodes = document.DocumentNode.SelectNodes("//div[contains(@class,'Name')]/a");
                return
                    nodes
                        .Cast<HtmlNode>()
                        .Select(x => x.InnerText)
                        .Where(x => !found.Contains(x))
                        .Do(x => found.Add(x))
                        .Select(x => "http://example.com" + x);
            });


Related

許可下: CC-BY-SA with attribution
不隸屬於 Stack Overflow
許可下: CC-BY-SA with attribution
不隸屬於 Stack Overflow