Parsing HTML with HtmlAgilityPack and loading it into a DataTable in C#

c# html html-agility-pack

Question

I have HTML that looks like this:

<body class="style_0">
        <div>
            <div class="style_1">Pending Test List</div>
            <table style=" width: 100%;" id="AUTOGENBOOKMARK_4365445353431356880">
                <col>
                <col>
                <tbody>
                    <tr>
                        <td style="vertical-align: baseline;">
                            <div class="style_4">Pending Test List</div>
                        </td>
                        <td style="vertical-align: baseline;">
                            <div class="style_5">SOME AGENCY Laboratories, Inc.</div>
                        </td>
                    </tr>
                </tbody>
            </table>
            <table class="style_6" style=" width: 4.531in;" id="AUTOGENBOOKMARK_5083738604442918131">
                <col style=" width: 1in;">
                <col class="style_7" style=" width: 0.75in;">
                <col class="style_8" style=" width: 0.6in;">
                <col style=" width: 0.75in;">
                <col style=" width: 2.375in;">
                <tbody>
                    <tr class="style_9" style=" height: 0.5in;">
                        <td style="vertical-align: middle;">
                            <div class="style_10">Report Range:</div>
                        </td>
                        <td style="vertical-align: middle;">
                            <div class="style_11">01/01/2012</div>
                        </td>
                        <td style="vertical-align: middle;">
                            <div class="style_12">through</div>
                        </td>
                        <td style="vertical-align: middle;">
                            <div class="style_13">01/31/2012</div>
                        </td>
                        <td style="vertical-align: middle;">
                            <div class="style_14">(by Date Entered)</div>
                        </td>
                    </tr>
                </tbody>
            </table>
            <table class="style_15" style=" width: 100%;" id="AUTOGENBOOKMARK_7602283385844673591" iid="/526

(QuRs78576248:0)">
                <col style=" width: 0.75in;">
                <col style=" width: 1.25in;">
                <col style=" width: 1in;">
                <col style=" width: 1.5in;">
                <col style=" width: 1.5in;">
                <col style=" width: 1.5in;">
                <col>
                <thead>
                    <tr>
                        <td colspan="4" style="vertical-align: baseline;"></td>
                        <td style="vertical-align: baseline;"></td>
                        <td style="vertical-align: baseline;"></td>
                        <td style="vertical-align: baseline;"></td>
                    </tr>
                    <tr>
                        <td style="vertical-align: baseline;">
                            <div class="style_16">Entered</div>
                        </td>
                        <td style="vertical-align: baseline;">
                            <div class="style_16">Spec. ID</div>
                        </td>
                        <td style="vertical-align: baseline;">
                            <div class="style_16">Batch/Pos.</div>
                        </td>
                        <td style="vertical-align: baseline;">
                            <div class="style_16">Test</div>
                        </td>
                        <td style="vertical-align: baseline;">
                            <div class="style_16">Client ID</div>
                        </td>
                        <td style="vertical-align: baseline;">
                            <div class="style_16">Client Name</div>
                        </td>
                        <td style="vertical-align: baseline;">
                            <div class="style_16">Agency</div>
                        </td>
                    </tr>
                </thead>
                <tbody>
                    <tr>
                        <td class="style_17" style="vertical-align: baseline;">
                            <div class="style_18">1/30/12</div>
                        </td>
                        <td class="style_17" style="vertical-align: baseline;">
                            <div class="style_19">ZZ324sdf</div>
                        </td>
                        <td class="style_17" style="vertical-align: baseline;">
                            <div class="style_18">51446 / 75</div>
                        </td>
                        <td class="style_17" style="vertical-align: baseline;">
                            <div class="style_20">HOLD_DE</div>
                        </td>
                        <td class="style_17" style="vertical-align: baseline;">
                            <div class="style_20">234234</div>
                        </td>
                        <td class="style_17" style="vertical-align: baseline;">
                            <div class="style_20">smith, john</div>
                        </td>
                        <td class="style_17" style="vertical-align: baseline;">
                            <div class="style_20">PPPM-6P - SOME AGENCY</div>
                        </td>
                    </tr>
                    <tr>
                        <td class="style_17" style="vertical-align: baseline;">
                            <div class="style_18">1/31/12</div>
                        </td>
                        <td class="style_17" style="vertical-align: baseline;">
                            <div class="style_19">SFD3434</div>
                        </td>
                        <td class="style_17" style="vertical-align: baseline;">
                            <div class="style_18">51668 / 17</div>
                        </td>
                        <td class="style_17" style="vertical-align: baseline;">
                            <div class="style_20">HOLD_DE</div>
                        </td>
                        <td class="style_17" style="vertical-align: baseline;">
                            <div class="style_20">FOY, EL</div>
                        </td>
                        <td class="style_17" style="vertical-align: baseline;">
                            <div class="style_20">FOY, ALEX</div>
                        </td>
                        <td class="style_17" style="vertical-align: baseline;">
                            <div class="style_20">someagency &amp; Associates LLC</div>
                        </td>
                    </tr>
                    <tr>
                        <td class="style_17" style="vertical-align: baseline;">
                            <div class="style_18">1/31/12</div>
                        </td>
                        <td class="style_17" style="vertical-align: baseline;">
                            <div class="style_19">SFD3434</div>
                        </td>
                        <td class="style_17" style="vertical-align: baseline;">
                            <div class="style_18">51668 / 25</div>
                        </td>
                        <td class="style_17" style="vertical-align: baseline;">
                            <div class="style_20">HOLD_DE</div>
                        </td>
                        <td class="style_17" style="vertical-align: baseline;">
                            <div class="style_20">JAMISON, PA</div>
                        </td>
                        <td class="style_17" style="vertical-align: baseline;">
                            <div class="style_20">JAMISON, ROY</div>
                        </td>
                        <td class="style_17" style="vertical-align: baseline;">
                            <div class="style_20">someagency &amp; Associates LLC</div>
                        </td>
                    </tr>
                    <tr>
                        <td class="style_17" style="vertical-align: baseline;">
                            <div class="style_18">1/31/12</div>
                        </td>
                        <td class="style_17" style="vertical-align: baseline;">
                            <div class="style_19">SFD3434</div>
                        </td>
                        <td class="style_17" style="vertical-align: baseline;">
                            <div class="style_18">51669 / 34</div>
                        </td>
                        <td class="style_17" style="vertical-align: baseline;">
                            <div class="style_20">HOLD_DE</div>
                        </td>
                        <td class="style_17" style="vertical-align: baseline;">
                            <div class="style_20">NEWMAN, SO</div>
                        </td>
                        <td class="style_17" style="vertical-align: baseline;">
                            <div class="style_20">NEWMAN, ALEX</div>
                        </td>
                        <td class="style_17" style="vertical-align: baseline;">
                            <div class="style_20">someagency &amp; Associates LLC</div>
                        </td>
                    </tr>
                </tbody>
                <tfoot>
                    <tr>
                        <td colspan="2" style="vertical-align: baseline;">
                            <div class="style_21">Total Tests:</div>
                        </td>
                        <td style="vertical-align: baseline;">
                            <div class="style_22">4</div>
                        </td>
                        <td style="vertical-align: baseline;"></td>
                        <td style="vertical-align: baseline;"></td>
                        <td style="vertical-align: baseline;"></td>
                        <td style="vertical-align: baseline;"></td>
                    </tr>
                </tfoot>
            </table>
            <table style=" width: 100%;" id="AUTOGENBOOKMARK_8507236727661888074">
                <col>
                <col>
                <col>
                <tbody>
                    <tr>
                        <td style="vertical-align: baseline;">
                            <div class="style_2">
                                <br>Feb 13, 2012 9:37 AM</div>
                        </td>
                        <td style="vertical-align: baseline;">
                            <div class="style_3">
                                <br>
                                <div style="text-align:center;">Page 1</div>
                            </div>
                        </td>
                        <td style="vertical-align: baseline;"></td>
                    </tr>
                </tbody>
            </table>
        </div>
    </body>

When rendered, it looks roughly like this:

[Image: screenshot of the rendered Pending Test List report]

Here is the data I want to extract from it:

Entered | Spec. ID | Batch/Pos. | Test | Client ID | Client Name | Agency
1/30/12 | ZZ324sdf | 51446 / 75 | HOLD_DE | 234234 | smith, john | PPPM-6P - SOME AGENCY
1/31/12 | SFD3434 | 51668 / 17 | HOLD_DE | FOY, EL | FOY, ALEX | someagency & Associates LLC
1/31/12 | SFD3434 | 51668 / 25 | HOLD_DE | JAMISON, PA | JAMISON, ROY | someagency & Associates LLC
1/31/12 | SFD3434 | 51669 / 34 | HOLD_DE | NEWMAN, SO | NEWMAN, ALEX | someagency & Associates LLC

So far I have tried:

// Assumes htmlSnippet is an already loaded HtmlDocument and hrefTags is a List<string>
foreach (HtmlNode link in htmlSnippet.DocumentNode.SelectNodes("//a[@href]"))
{
    HtmlAttribute att = link.Attributes["href"];
    hrefTags.Add(att.Value);
}

but as I understand it, this only extracts the href values of links, and I want to extract the table cells.

How do I do that? Thanks for your help.

Accepted answer

Think about it a little differently: instead of every anchor with an href, you want every row in the body of the table with the class style_15 (select by class, because the table's id looks like it is generated on the fly). Then, for each row, you want every cell.

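var rows = htmlSnippet.DocumentNode.SelectNodes("//table[@class = 'style_15']/tbody/tr");
if (rows != null) // SelectNodes returns null, not an empty collection, when nothing matches
{
    foreach (var row in rows)
    {
        foreach (var cell in row.SelectNodes("td"))
        {
            // Do something, e.g. read the trimmed text with entities such as &amp; decoded
            string value = HtmlEntity.DeEntitize(cell.InnerText).Trim();
        }
    }
}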
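The question title also asks for a DataTable, which the loop above leaves open. Below is a minimal sketch of that last step, assuming the HtmlAgilityPack NuGet package and System.Data are referenced. The class name PendingTestListParser and the method Parse are hypothetical, and taking the column names from the last row of the <thead> is an assumption based on the sample HTML in the question.

```csharp
using System;
using System.Data;
using System.Linq;
using HtmlAgilityPack;

// Hypothetical helper: turns the "style_15" results table into a DataTable.
public static class PendingTestListParser
{
    public static DataTable Parse(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var table = new DataTable("PendingTests");

        // Assumption: the last <thead> row holds the captions ("Entered", "Spec. ID", ...);
        // the first <thead> row in the sample only contains empty spacer cells.
        var headerCells = doc.DocumentNode.SelectNodes(
            "//table[@class='style_15']/thead/tr[last()]/td");
        if (headerCells == null)
            return table; // table not found: return an empty DataTable

        foreach (var cell in headerCells)
            table.Columns.Add(Clean(cell.InnerText), typeof(string));

        var rows = doc.DocumentNode.SelectNodes("//table[@class='style_15']/tbody/tr");
        if (rows == null)
            return table;

        foreach (var row in rows)
        {
            // One value per <td>, capped at the column count so Rows.Add cannot overflow.
            object[] values = row.SelectNodes("td")
                ?.Select(td => (object)Clean(td.InnerText))
                .Take(table.Columns.Count)
                .ToArray() ?? Array.Empty<object>();
            table.Rows.Add(values);
        }
        return table;
    }

    // Decodes entities such as &amp; and trims the indentation whitespace.
    private static string Clean(string text) =>
        HtmlEntity.DeEntitize(text).Trim();
}
```

Every column is stored as a string here; if you need typed data (for example DateTime for "Entered"), converting after the import keeps the parsing step simple and makes bad values easier to spot.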



Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow