Only getting the first element
    • c# html-agility-pack

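      Here is the problem. I have a website with several subpages.

      Subpages: DAMSKIE, MĘSKIE, DZIECIĘCE, SPORT, AKCESORIA, PREMIUM, TOREBKI, WYPRZEDAŻ.

      Each of them has a few categories such as "Półbuty", "Klapki", etc.

      I can get the subpages, but I cannot get the list of category elements (Półbuty, Klapki, etc.). If the list looks like "Półbuty", "Klapki", "Obcasy", my code only gets "Półbuty"; it never gets "Klapki" or "Obcasy".

      [Image of a subpage and the list of elements I am trying to get] [1]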

      using HtmlAgilityPack;
      using System;
      using System.Collections.Generic;
      using System.Linq;
      using System.Net.Http;
      using System.Text;
      using System.Threading.Tasks;
      
      namespace Crawler_Shoes
      {
          public class Crawl
          {
              private static string navBar = "megamenu__item";
              private const string shoesTypes = "sidebar-section__wrapper sidebar-section__wrapper--categories";
              private static string mainSite = "https://www.eobuwie.com.pl/";
              public static List<string> categoriesNames = new List<string>();
              public static List<string> linksNames = new List<string>();
              public static List<string> categoriesOfCategoriesNames = new List<string>();
              private readonly List<Shoes> shoes = new List<Shoes>();
      
              public static async Task<IEnumerable<HtmlNode>> HttpClient(string site, string descendant, string equals)
              {
                  var httpClient = new HttpClient();
                  var html = await httpClient.GetStringAsync(site);
                  var htmlDocument = new HtmlDocument();
                  htmlDocument.LoadHtml(html);
                  return htmlDocument.DocumentNode.Descendants(descendant)
                      .Where(node => node.GetAttributeValue("class", "").Equals(equals)).ToList();
              }
              public static async Task GetCategories()
              {
                  var menu = await HttpClient(mainSite, "li", navBar);                      
                  foreach (var nav in menu)
                  {
                      //links.Add(nav.Descendants("a").FirstOrDefault().ChildAttributes("href").FirstOrDefault().Value);
                      categoriesNames.Add(nav.Descendants("a").FirstOrDefault().InnerText); //gets names of categories
                      linksNames.Add(nav.Descendants("a").FirstOrDefault().ChildAttributes("href").FirstOrDefault().Value); //gets links for categories
                      if(categoriesNames.Last() == "\n\t\t\tWyprzedaż\t\t")
                      {
                          // Drop the entry just added; remove by index so both lists stay in sync.
                          categoriesNames.RemoveAt(categoriesNames.Count - 1);
                          linksNames.RemoveAt(linksNames.Count - 1);
                      }
                  }
                  await Crawl.GetCategoriesofCategories();
              }
              public static async Task GetCategoriesofCategories()
              {
                      for (var i = 0; i <= categoriesNames.Count-1; i++)
                      {
                          var categories = await HttpClient(linksNames.ElementAt(i), "ul", shoesTypes);
                          categoriesOfCategoriesNames.Add(categoriesNames.ElementAt(i));
                          foreach(var li in categories)
                          {
                              categoriesOfCategoriesNames.Add(li.Descendants("a").FirstOrDefault().ChildAttributes("href").FirstOrDefault().Value);
                          }
                      }
      
              }
          }
      }
      

      The problematic part:

          public static async Task GetCategoriesofCategories()
          {
              for (var i = 0; i <= categoriesNames.Count-1; i++)
              {
                  var categories = await HttpClient(linksNames.ElementAt(i), "ul", shoesTypes);
                  categoriesOfCategoriesNames.Add(categoriesNames.ElementAt(i));
                  foreach(var li in categories)
                  {
                      categoriesOfCategoriesNames.Add(li.Descendants("a").FirstOrDefault().ChildAttributes("href").FirstOrDefault().Value);
                  }
              }
          }
      

      Top answer

      I got it working:

      string url = "https://www.eobuwie.com.pl/damskie.html";
      HtmlWeb web = new HtmlWeb();
      HtmlDocument doc = web.Load(url);
      var sidebar = doc.DocumentNode.SelectSingleNode("//ul[@class='sidebar-section__wrapper sidebar-section__wrapper--categories']");
      var categories = sidebar.SelectNodes("li");
      foreach (var category in categories)
      {
          var anchor = category.SelectSingleNode("a");
          string shoeCategory = anchor.InnerText.Trim();
          Console.WriteLine(shoeCategory);
      }
      

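      It works a bit differently from your approach, but I hope you can at least take some hints from it and apply them to your own code.

      If you also need the links, add the following: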

      string shoeCategoryLink = anchor.GetAttributeValue("href", string.Empty);
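
      As for why the question's code returns only one entry: its loop iterates over the matched `ul` nodes (presumably one per page) and calls `Descendants("a").FirstOrDefault()` on each, so it only ever sees the first anchor. Below is a minimal sketch of iterating over the `li` children instead. `CategoryExtractor`, `ExtractCategories`, and the sample markup are hypothetical names and data introduced for illustration; only the class string is taken from the question.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack;

public static class CategoryExtractor
{
    // Class value copied from the question's shoesTypes constant.
    private const string SidebarClass =
        "sidebar-section__wrapper sidebar-section__wrapper--categories";

    // Hypothetical helper (not part of the original code): returns one
    // (name, link) pair per <li> directly under the sidebar <ul>.
    public static List<(string Name, string Link)> ExtractCategories(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var result = new List<(string Name, string Link)>();

        var sidebar = doc.DocumentNode.SelectSingleNode($"//ul[@class='{SidebarClass}']");
        if (sidebar == null) return result;   // sidebar not present on this page

        var items = sidebar.SelectNodes("li");
        if (items == null) return result;     // SelectNodes yields null when nothing matches

        foreach (var li in items)
        {
            var anchor = li.SelectSingleNode("a");
            if (anchor == null) continue;     // skip list items without a direct link
            result.Add((anchor.InnerText.Trim(),
                        anchor.GetAttributeValue("href", string.Empty)));
        }
        return result;
    }

    public static void Main()
    {
        // Made-up markup that only imitates the structure described in the question.
        const string sample =
            "<ul class='sidebar-section__wrapper sidebar-section__wrapper--categories'>" +
            "<li><a href='/polbuty.html'> Półbuty </a></li>" +
            "<li><a href='/klapki.html'>Klapki</a></li>" +
            "<li><a href='/obcasy.html'>Obcasy</a></li>" +
            "</ul>";

        foreach (var (name, link) in ExtractCategories(sample))
        {
            Console.WriteLine($"{name} -> {link}");
        }
    }
}
```

      In the question's `GetCategoriesofCategories`, the same idea would mean looping over the `li` nodes inside each returned `ul` rather than over the `ul` nodes themselves.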
      


      Licensed under: CC-BY-SA with attribution
      Not affiliated with Stack Overflow