Problem parsing a table with HtmlAgilityPack

asp.net c# html html-agility-pack

Question

I am trying to parse a table that looks like this:

<table><tbody>
<tr><th><a href=""></a></th><th></th></tr>
<tr><td class="v"></td><td class="d"></td><td class="h"></td><td class="a">   </td><td class="o"></td><td class="o"></td><td class="o"></td><td class="p"><table class="p" title="ttt"></table></td></tr>
<tr><td class="v"></td><td class="d"></td><td class="h"></td><td class="a">   </td><td class="o"></td><td class="o"></td><td class="o"></td><td class="p"><table class="p" title="eee"></table></td></tr>
<tr><td class="v"></td><td class="d"></td><td class="h"></td><td class="a">   </td><td class="o"></td><td class="o"></td><td class="o"></td><td class="p"><table class="p" title="rtr"></table></td></tr>
<tr><th><a href=""></a></th><th></th></tr>
<tr><td class="v"></td><td class="d"></td><td class="h"></td><td class="a">   </td><td class="o"></td><td class="o"></td><td class="o"></td><td class="p"><table class="p" title="ouu"></table></td></tr>
<tr><td class="v"></td><td class="d"></td><td class="h"></td><td class="a">   </td><td class="o"></td><td class="o"></td><td class="o"></td><td class="p"><table class="p" title="teee"></table></td></tr>
</tbody></table>

And I am using this code in ASP.NET to get the cells I want from each row:

var getHtmlWeb = new HtmlWeb();
var document = getHtmlWeb.Load(txtbox.Text); 
//get tables
foreach (HtmlNode table in document.DocumentNode.SelectNodes("//table"))
        {
            //get each table row
            foreach (HtmlNode row in table.SelectNodes("tr"))
            {

                Outputlabel.Text += "row: <br />";
//get table head tags that have a link, get the Inner text
                if((row.SelectSingleNode("//th//a").InnerText) != null)
                {

                    Outputlabel.Text += row.SelectSingleNode("//th//a").InnerText + "<br />";
                }
                // get the cells with the classes I want
                    string d = row.SelectSingleNode("//td[@class='d']").InnerText;
                    Outputlabel.Text += row.SelectSingleNode("//td[@class='d']").InnerText + " ";

                    string h = row.SelectSingleNode("//td[@class='h']").InnerText;
                    Outputlabel.Text += row.SelectSingleNode("//td[@class='h']").InnerText + " ";
                    string a = row.SelectSingleNode("//td[@class='a']").InnerText;
                    Outputlabel.Text += row.SelectSingleNode("//td[@class='a']").InnerText + " ";
                    string op = "";
//there are 3 classes in each row to have the class="o"
                    if (row.SelectNodes("//td[@class='o']") != null)
                    {
                        foreach (HtmlNode o in row.SelectNodes("//td[@class='o']"))
                        {
                            op += o.InnerText;
                        }
                        Outputlabel.Text += op + " ";
                    }

                    var probability = row.SelectSingleNode("//td//table[@class='p']");
                    string pr = probability.Attributes["title"].Value;

                    Outputlabel.Text += pr + "<br />";
            }
        }  

I only get the first row of the first table, repeated many times, and I do not get the cells with class "o" or the title of the table with class "p" inside the td with class "p".

Popular Answer

It seems to work this way for the online HTML file:

    HtmlWeb getHtmlWeb = new HtmlWeb();
    HtmlDocument doc = getHtmlWeb.Load(txtbox.Text);

    // XPath expressions for the cells of interest
    string dPath = "//td[@class='d']";
    string hPath = "//td[@class='h']";
    string aPath = "//td[@class='a']";
    string pPath = "//table[@class='p']";

    // Note: SelectNodes returns null (not an empty collection) when nothing matches
    HtmlNodeCollection dNodes = doc.DocumentNode.SelectNodes(dPath);
    HtmlNodeCollection hNodes = doc.DocumentNode.SelectNodes(hPath);
    HtmlNodeCollection aNodes = doc.DocumentNode.SelectNodes(aPath); // "as" is a reserved word in C#
    HtmlNodeCollection pNodes = doc.DocumentNode.SelectNodes(pPath);

    foreach (HtmlNode dNode in dNodes)
    {
        Outputlabel.Text += dNode.InnerHtml + "<br />";
    }

    foreach (HtmlNode hNode in hNodes)
    {
        Outputlabel.Text += hNode.InnerHtml + "<br />";
    }

    foreach (HtmlNode aNode in aNodes)
    {
        // GetAttributeValue returns the fallback instead of throwing when href is missing
        Outputlabel.Text += aNode.GetAttributeValue("href", "") + "<br />";
    }

    foreach (HtmlNode pNode in pNodes)
    {
        Outputlabel.Text += pNode.GetAttributeValue("title", "") + "<br />";
    }
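
A note on the symptom in the question: an XPath expression that starts with `//` is evaluated from the document root even when it is called on a row node, so every row returns the first match in the whole document. Starting the expression with `.` keeps the search inside the current node. The following is only a sketch of a per-row version under that assumption; the class name `RowParser`, the method `ExtractRows`, the `|` separator, and the sample HTML are all made up for illustration and are not part of the original question or answer.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

public static class RowParser
{
    // Hypothetical helper: returns one "d|h|a|o|title" line per data row.
    public static List<string> ExtractRows(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);
        var lines = new List<string>();

        // Select every row that has a td with class 'd', with or without a tbody wrapper.
        HtmlNodeCollection rows = doc.DocumentNode.SelectNodes("//tr[td[@class='d']]");
        if (rows == null) return lines; // SelectNodes returns null when nothing matches

        foreach (HtmlNode row in rows)
        {
            // The leading '.' makes each query relative to the current row.
            string d = row.SelectSingleNode("./td[@class='d']")?.InnerText.Trim() ?? "";
            string h = row.SelectSingleNode("./td[@class='h']")?.InnerText.Trim() ?? "";
            string a = row.SelectSingleNode("./td[@class='a']")?.InnerText.Trim() ?? "";

            // Several cells per row may share class 'o'; join their text.
            HtmlNodeCollection oNodes = row.SelectNodes("./td[@class='o']");
            string o = oNodes == null
                ? ""
                : string.Join(",", oNodes.Select(n => n.InnerText.Trim()));

            // The nested table carries the value of interest in its title attribute.
            string title = row.SelectSingleNode("./td[@class='p']/table[@class='p']")
                              ?.GetAttributeValue("title", "") ?? "";

            lines.Add($"{d}|{h}|{a}|{o}|{title}");
        }
        return lines;
    }

    public static void Main()
    {
        // Made-up sample modelled on the table in the question.
        string html =
            "<table><tbody>" +
            "<tr><th><a href='#'>Group</a></th><th></th></tr>" +
            "<tr><td class='v'></td><td class='d'>d1</td><td class='h'>h1</td><td class='a'>a1</td>" +
            "<td class='o'>1</td><td class='o'>2</td><td class='o'>3</td>" +
            "<td class='p'><table class='p' title='ttt'></table></td></tr>" +
            "<tr><td class='v'></td><td class='d'>d2</td><td class='h'>h2</td><td class='a'>a2</td>" +
            "<td class='o'>4</td><td class='o'>5</td><td class='o'>6</td>" +
            "<td class='p'><table class='p' title='eee'></table></td></tr>" +
            "</tbody></table>";

        foreach (string line in ExtractRows(html))
        {
            Console.WriteLine(line);
        }
    }
}
```

In a WebForms page the same loop body could append to `Outputlabel.Text` instead of building a list; the list is used here only so the sketch can run as a console program.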


Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow