Only getting the first item
    • c# html-agility-pack

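      Here is the problem. I have a website with several subpages.

      Subpages: DAMSKIE, MĘSKIE, DZIECIĘCE, SPORT, AKCESORIA, PREMIUM, TOREBKI, WYPRZEDAŻ

      Each of them has a few categories, such as "Półbuty", "Klapki", and so on.

      I can get the subpages, but I can't get the list of category elements (Półbuty, Klapki, etc.). If the list is "Półbuty", "Klapki", "Obcasy", my code only gets "Półbuty" and never "Klapki" or "Obcasy".

      [Image: a subpage and the list of elements I am trying to get] [1]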

      using HtmlAgilityPack;
      using System;
      using System.Collections.Generic;
      using System.Linq;
      using System.Net.Http;
      using System.Text;
      using System.Threading.Tasks;
      
      namespace Crawler_Shoes
      {
          public class Crawl
          {
              private static string navBar = "megamenu__item";
              private const string shoesTypes = "sidebar-section__wrapper sidebar-section__wrapper--categories";
              private static string mainSite = "https://www.eobuwie.com.pl/";
              public static List<string> categoriesNames = new List<string>();
              public static List<string> linksNames = new List<string>();
              public static List<string> categoriesOfCategoriesNames = new List<string>();
              private readonly List<Shoes> shoes = new List<Shoes>();
      
              public static async Task<IEnumerable<HtmlNode>> HttpClient(string site, string descendant, string equals)
              {
                  var httpClient = new HttpClient();
                  var html = await httpClient.GetStringAsync(site);
                  var htmlDocument = new HtmlDocument();
                  htmlDocument.LoadHtml(html);
                  return htmlDocument.DocumentNode.Descendants(descendant)
                      .Where(node => node.GetAttributeValue("class", "").Equals(equals)).ToList();
              }
              public static async Task GetCategories()
              {
                  var menu = await HttpClient(mainSite, "li", navBar);                      
                  foreach (var nav in menu)
                  {
                      //links.Add(nav.Descendants("a").FirstOrDefault().ChildAttributes("href").FirstOrDefault().Value);
                      categoriesNames.Add(nav.Descendants("a").FirstOrDefault().InnerText); //gets names of categories
                      linksNames.Add(nav.Descendants("a").FirstOrDefault().ChildAttributes("href").FirstOrDefault().Value); //gets links for categories
                      if(categoriesNames.Last() == "\n\t\t\tWyprzedaż\t\t")
                      {
                          categoriesNames.Remove(categoriesNames.Last());
                          linksNames.Remove(categoriesNames.Last());
                      }
                  }
                  Crawl.GetCategoriesofCategories();
              }
              public static async Task GetCategoriesofCategories()
              {
                      for (var i = 0; i <= categoriesNames.Count-1; i++)
                      {
                          var categories = await HttpClient(linksNames.ElementAt(i), "ul", shoesTypes);
                          categoriesOfCategoriesNames.Add(categoriesNames.ElementAt(i));
                          foreach(var li in categories)
                          {
                              categoriesOfCategoriesNames.Add(li.Descendants("a").FirstOrDefault().ChildAttributes("href").FirstOrDefault().Value);
                          }
                      }
      
              }
          }
      }
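Two lines in `GetCategories` above are worth fixing before looking at the sidebar. `linksNames.Remove(categoriesNames.Last())` runs after the sale entry has already been removed from `categoriesNames`, so it tries to remove a category name from a list of URLs; nothing is removed and the "Wyprzedaż" link stays in `linksNames`. Also, `Crawl.GetCategoriesofCategories();` is not awaited, so `GetCategories` can return before the subcategory requests have finished; it should read `await GetCategoriesofCategories();`. A minimal sketch of the filtering step, assuming names and links are kept together as pairs (`MenuFilter` and `WithoutSale` are hypothetical names, not from the question):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class MenuFilter
{
    // Keeping each name together with its link means that dropping an entry
    // removes both halves at once, so the two lists can never drift apart.
    // Comparing trimmed text avoids depending on the exact "\n\t\t\t...\t\t"
    // whitespace that the menu markup happens to contain.
    public static List<(string Name, string Link)> WithoutSale(
        IEnumerable<(string Name, string Link)> menu, string excluded = "Wyprzedaż")
    {
        return menu
            .Select(item => (Name: item.Name.Trim(), Link: item.Link))
            .Where(item => !string.Equals(item.Name, excluded, StringComparison.OrdinalIgnoreCase))
            .ToList();
    }
}
```

In `GetCategories` the pairs would come from `(a.InnerText, a.GetAttributeValue("href", ""))` for the first `<a>` of each menu item.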
      

      The problematic part:

          public static async Task GetCategoriesofCategories()
          {
              for (var i = 0; i <= categoriesNames.Count-1; i++)
              {
                  var categories = await HttpClient(linksNames.ElementAt(i), "ul", shoesTypes);
                  categoriesOfCategoriesNames.Add(categoriesNames.ElementAt(i));
                  foreach(var li in categories)
                  {
                      categoriesOfCategoriesNames.Add(li.Descendants("a").FirstOrDefault().ChildAttributes("href").FirstOrDefault().Value);
                  }
              }
          }
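Why only the first item comes back: `HttpClient(linksNames.ElementAt(i), "ul", shoesTypes)` returns the `<ul>` elements whose class matches, not the `<li>` items inside them. On a page with a single category list the `foreach` body therefore runs once (the loop variable is named `li`, but it holds the `<ul>`), and `Descendants("a").FirstOrDefault()` keeps only the first link in that list. The sketch below contrasts that pattern with one that visits every `<a>`; it assumes the HtmlAgilityPack NuGet package is referenced, and `SidebarParser` and its methods are hypothetical names:

```csharp
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

public static class SidebarParser
{
    // Same class string as the question's shoesTypes constant.
    public const string CategoryListClass =
        "sidebar-section__wrapper sidebar-section__wrapper--categories";

    // Finds the matching <ul> elements, exactly like the question's helper does.
    private static IEnumerable<HtmlNode> CategoryLists(HtmlDocument doc) =>
        doc.DocumentNode.Descendants("ul")
            .Where(ul => ul.GetAttributeValue("class", "") == CategoryListClass);

    // The question's pattern: one result per <ul>, because FirstOrDefault()
    // keeps only the first <a> found inside it.
    public static List<string> FirstLinkPerList(HtmlDocument doc) =>
        CategoryLists(doc)
            .Select(ul => ul.Descendants("a").FirstOrDefault())
            .Where(a => a != null)
            .Select(a => a.GetAttributeValue("href", ""))
            .ToList();

    // Fixed pattern: flatten every <a> inside each matching <ul>.
    public static List<string> AllLinks(HtmlDocument doc) =>
        CategoryLists(doc)
            .SelectMany(ul => ul.Descendants("a"))
            .Select(a => a.GetAttributeValue("href", ""))
            .ToList();
}
```

Given a list with three `<li><a>` items, `FirstLinkPerList` returns one href and `AllLinks` returns all three.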
      

      Top answer

      I got it working:

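      string url = "https://www.eobuwie.com.pl/damskie.html";
      HtmlWeb web = new HtmlWeb();
      HtmlDocument doc = web.Load(url);
      // The <ul> holding the category list (exact match on the class attribute)
      var sidebar = doc.DocumentNode.SelectSingleNode("//ul[@class='sidebar-section__wrapper sidebar-section__wrapper--categories']");
      // One node per <li>, i.e. one per category; SelectNodes returns null when nothing matches
      var categories = sidebar?.SelectNodes("li");
      if (categories != null)
      {
          foreach (var category in categories)
          {
              // Each <li> wraps one <a> whose text is the category name
              var anchor = category.SelectSingleNode("a");
              string shoeCategory = anchor.InnerText.Trim();
              Console.WriteLine(shoeCategory);
          }
      }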
      

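      It works a little differently from your approach, but I hope you can at least take some hints from it and apply them to your own code.

      If you also need the links, add the following: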
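Applied to the question's async crawler, the answer's XPath approach might look like the sketch below. Assumptions: HtmlAgilityPack is referenced; `SubcategoryCrawler`, `ParseSubcategories` and `CrawlAsync` are hypothetical names; `names` and `links` are the parallel lists the question builds in `GetCategories`. Parsing is kept separate from downloading so it can be checked against saved HTML, and one `HttpClient` is reused instead of creating a new one per call.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Net.Http;
using System.Threading.Tasks;
using HtmlAgilityPack;

public static class SubcategoryCrawler
{
    // One HttpClient for the whole crawl instead of a new one per request.
    private static readonly HttpClient Http = new HttpClient();

    // Same XPath as the answer: the <ul> whose class attribute is exactly this string.
    private const string SidebarXPath =
        "//ul[@class='sidebar-section__wrapper sidebar-section__wrapper--categories']";

    // Returns a (Name, Href) pair for every <li> in the sidebar list of one page.
    public static List<(string Name, string Href)> ParseSubcategories(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);
        var sidebar = doc.DocumentNode.SelectSingleNode(SidebarXPath);
        // SelectNodes returns null rather than an empty collection when nothing matches.
        var items = sidebar?.SelectNodes("li");
        if (items == null)
        {
            return new List<(string Name, string Href)>();
        }
        return items
            .Select(li => li.SelectSingleNode("a"))
            .Where(a => a != null)
            .Select(a => (Name: HtmlEntity.DeEntitize(a.InnerText).Trim(),
                          Href: a.GetAttributeValue("href", string.Empty)))
            .ToList();
    }

    // Downloads each category page in turn, awaiting every request, and maps
    // the (trimmed) category name to its subcategories.
    public static async Task<Dictionary<string, List<(string Name, string Href)>>> CrawlAsync(
        IReadOnlyList<string> names, IReadOnlyList<string> links)
    {
        var result = new Dictionary<string, List<(string Name, string Href)>>();
        for (var i = 0; i < names.Count && i < links.Count; i++)
        {
            var html = await Http.GetStringAsync(links[i]);
            result[names[i].Trim()] = ParseSubcategories(html);
        }
        return result;
    }
}
```

Because every request is awaited, the dictionary is complete when `CrawlAsync` returns; a caller would write `var byCategory = await SubcategoryCrawler.CrawlAsync(categoriesNames, linksNames);` (a `List<string>` satisfies `IReadOnlyList<string>`).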

      string shoeCategoryLink = anchor.GetAttributeValue("href", string.Empty);
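If those `href` values turn out to be relative (for example starting with `/`), they cannot be passed straight to `HttpClient.GetStringAsync` the way the question passes `linksNames`, because an `HttpClient` without a `BaseAddress` rejects relative URIs. A small helper that resolves them against the page URL, assuming the site's links are plain http(s) paths (`LinkResolver.ToAbsolute` is a hypothetical name):

```csharp
using System;

public static class LinkResolver
{
    // Turns an href taken from a page into an absolute URL usable with HttpClient.
    public static string ToAbsolute(string pageUrl, string href)
    {
        if (string.IsNullOrWhiteSpace(href))
        {
            return string.Empty;
        }
        // Already an absolute web URL: keep it. The scheme check matters because
        // some runtimes (for example .NET on Linux) parse "/path" as an absolute
        // file:// URI.
        if (Uri.TryCreate(href, UriKind.Absolute, out var absolute) &&
            (absolute.Scheme == Uri.UriSchemeHttp || absolute.Scheme == Uri.UriSchemeHttps))
        {
            return absolute.AbsoluteUri;
        }
        // Otherwise resolve it relative to the page it came from.
        return new Uri(new Uri(pageUrl), new Uri(href, UriKind.Relative)).AbsoluteUri;
    }
}
```

For example, `LinkResolver.ToAbsolute(url, shoeCategoryLink)` inside the answer's loop yields a URL that can be fetched directly.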
      



      Licensed under: CC-BY-SA with attribution
      Not affiliated with Stack Overflow