How to parse an HTML document using C#

c# html html-agility-pack html-parsing

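I have to parse the document shown below. I am trying HtmlAgilityPack, but I find it very complicated. I need the inner text of this tag: <td style="background: #36461f;color: #ffffff;font-weight: bold;padding: 2px;font-size: 12px;height: 25px;">Mac Bahsi</td> and its child elements: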

<div class="div1" style="width: 288px;" onclick="parent.javaScriptAddSlip('slip', '164518117;;;-;11.25;1;Maç Bahsi;164518117')">
<div class="div1" style="width: 288px;" onclick="parent.javaScriptAddSlip('slip', '164518117;;;-;6.50;0;Maç Bahsi;164518117')">
<div class="div1" style="width: 288px;" onclick="parent.javaScriptAddSlip('slip', '164518117;;;-;1.18;2;Maç Bahsi;164518117')">

<!DOCTYPE HTML>
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
    <style>
        .table1 {
            width: 100%;
            margin: 0px;
            padding: 0px;
            border-collapse: collapse;
            padding: 0px;
        }

        .div1 {
            cursor: pointer;
            margin: 1px;
            border: 1px solid #999999;
            float: left;
            font-size: 12px;
        }

        .td1 {
            text-align: center;
            font-size: 20px;
            font-weight: bold;
            color: #33460E;
            height: 20px;
            padding: 0px;
        }

        .td2 {
            text-align: center;
            font-weight: bold;
            color: #808000;
            padding: 0px;
        }
    </style>
</head>
<body style="background: #FFFFCC;margin: 0px;padding: 0px;font-size: 12px;">
    <p></p>
    <table style="width: 100%" cellpadding="0" cellspacing="0">
        <tr>
            <td style="background: #36461f;color: #ffffff;font-weight: bold;padding: 2px;font-size: 12px;height: 25px;">Mac Bahsi</td>
        </tr>
        <tr>
            <td>
                <div class="div1" style="width: 288px;" onclick="parent.javaScriptAddSlip('slip', '164518117;;;-;11.25;1;Maç Bahsi;164518117')">
                    <table class="table1">
                        <tr class="menuClickable">
                            <td class="td1">11.25</td>
                        </tr>
                        <tr class="menuClickable">
                            <td class="td2">Club America Mexico</td>
                        </tr>
                    </table>
                </div>
                <div class="div1" style="width: 288px;" onclick="parent.javaScriptAddSlip('slip', '164518117;;;-;6.50;0;Maç Bahsi;164518117')">
                    <table class="table1">
                        <tr class="menuClickable">
                            <td class="td1">6.50</td>
                        </tr>
                        <tr class="menuClickable">
                            <td class="td2">Beraberlik</td>
                        </tr>
                    </table>
                </div>
                <div class="div1" style="width: 288px;" onclick="parent.javaScriptAddSlip('slip', '164518117;;;-;1.18;2;Maç Bahsi;164518117')">
                    <table class="table1">
                        <tr class="menuClickable">
                            <td class="td1">1.18</td>
                        </tr>
                        <tr class="menuClickable">
                            <td class="td2">Real Madrid</td>
                        </tr>
                    </table>
                </div>
            </td>
        </tr>
    </table>
    <table style="width: 100%" cellpadding="0" cellspacing="0">
        <tr>
            <td style="background: #36461f;color: #ffffff;font-weight: bold;padding: 2px;font-size: 12px;height: 25px;">Ilk Yari Bahsi</td>
        </tr>
        <tr>
            <td>
                <div class="div1" style="width: 288px;" onclick="parent.javaScriptAddSlip('slip', '164518128;;;-;8.50;1;İlk Yarı Bahsi;164518128')">
                    <table class="table1">
                        <tr class="menuClickable">
                            <td class="td1">8.50</td>
                        </tr>
                        <tr class="menuClickable">
                            <td class="td2">Club America Mexico</td>
                        </tr>
                    </table>
                </div>
                <div class="div1" style="width: 288px;" onclick="parent.javaScriptAddSlip('slip', '164518128;;;-;3.05;0;İlk Yarı Bahsi;164518128')">
                    <table class="table1">
                        <tr class="menuClickable">
                            <td class="td1">3.05</td>
                        </tr>
                        <tr class="menuClickable">
                            <td class="td2">Beraberlik</td>
                        </tr>
                    </table>
                </div>
                <div class="div1" style="width: 288px;" onclick="parent.javaScriptAddSlip('slip', '164518128;;;-;1.50;2;İlk Yarı Bahsi;164518128')">
                    <table class="table1">
                        <tr class="menuClickable">
                            <td class="td1">1.50</td>
                        </tr>
                        <tr class="menuClickable">
                            <td class="td2">Real Madrid</td>
                        </tr>
                    </table>
                </div>
            </td>
        </tr>
    </table>
    <table style="width: 100%" cellpadding="0" cellspacing="0">
        <tr>
            <td style="background: #36461f;color: #ffffff;font-weight: bold;padding: 2px;font-size: 12px;height: 25px;">İkinci Yarı Bahsi</td>
        </tr>
        <tr>
            <td>
                <div class="div1" style="width: 288px;" onclick="parent.javaScriptAddSlip('slip', '164518133;;;-;8.50;1;İkinci Yarı Bahsi;164518133')">
                    <table class="table1">
                        <tr class="menuClickable">
                            <td class="td1">8.50</td>
                        </tr>
                        <tr class="menuClickable">
                            <td class="td2">Club America Mexico</td>
                        </tr>
                    </table>
                </div>
                <div class="div1" style="width: 288px;" onclick="parent.javaScriptAddSlip('slip', '164518133;;;-;3.70;0;İkinci Yarı Bahsi;164518133')">
                    <table class="table1">
                        <tr class="menuClickable">
                            <td class="td1">3.70</td>
                        </tr>
                        <tr class="menuClickable">
                            <td class="td2">Beraberlik</td>
                        </tr>
                    </table>
                </div>
                <div class="div1" style="width: 288px;" onclick="parent.javaScriptAddSlip('slip', '164518133;;;-;1.40;2;İkinci Yarı Bahsi;164518133')">
                    <table class="table1">
                        <tr class="menuClickable">
                            <td class="td1">1.40</td>
                        </tr>
                        <tr class="menuClickable">
                            <td class="td2">Real Madrid</td>
                        </tr>
                    </table>
                </div>
            </td>
        </tr>
    </table>
    <br />
    <br />
    <br />
</body>
</html>

Top answer

First, install the HtmlAgilityPack NuGet package into your project.

Then, as an example:

HtmlAgilityPack.HtmlDocument htmlDoc = new HtmlAgilityPack.HtmlDocument();

// There are various options, set as needed
htmlDoc.OptionFixNestedTags = true;

// filePath is a path to a file containing the HTML
htmlDoc.Load(filePath);

// To load from a string instead, use: htmlDoc.LoadHtml(htmlString);
// (this was previously htmlDoc.LoadXML(xmlString))

// ParseErrors is a collection of any errors from the Load statement.
// Count() is a LINQ extension method, so this needs "using System.Linq;"
if (htmlDoc.ParseErrors != null && htmlDoc.ParseErrors.Count() > 0)
{
    // Handle any parse errors as required

}
else
{

    if (htmlDoc.DocumentNode != null)
    {
        HtmlAgilityPack.HtmlNode bodyNode = htmlDoc.DocumentNode.SelectSingleNode("//body");

        if (bodyNode != null)
        {
            // Do something with bodyNode
        }
    }
}

(Note: this code is only an example and not necessarily the best or only approach. Do not use it blindly in your own application.)

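The HtmlDocument.Load() method also accepts a stream, which is very useful when integrating with other stream-oriented classes in the .NET Framework. HtmlEntity.DeEntitize() is another useful method for processing HTML entities correctly. (Thanks, Matthew.)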
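As a minimal sketch of the two calls just mentioned: the helper below loads HTML from a stream and returns the decoded text of the first <td> it finds. The class name StreamLoadExample, the method name FirstCellText, and the choice of UTF-8 are assumptions made for illustration; they are not part of HtmlAgilityPack or of the original answer.

```csharp
using System.IO;
using System.Text;
using HtmlAgilityPack;

// Hypothetical helper, introduced only for this sketch.
public static class StreamLoadExample
{
    // Loads HTML from a stream and returns the decoded, trimmed text of the
    // first <td> element, or null when the document contains no <td>.
    public static string FirstCellText(Stream htmlStream)
    {
        var doc = new HtmlDocument();

        // Assumption: the input is UTF-8 encoded. Pass a different Encoding if not.
        doc.Load(htmlStream, Encoding.UTF8);

        HtmlNode cell = doc.DocumentNode.SelectSingleNode("//td");
        if (cell == null)
        {
            return null;
        }

        // InnerText keeps entities such as &amp; as written in the source;
        // DeEntitize converts them back to plain characters.
        return HtmlEntity.DeEntitize(cell.InnerText).Trim();
    }
}
```

Any Stream works here, for example a FileStream, a MemoryStream, or a response stream from an HTTP request.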

HtmlDocument and HtmlNode are the classes you will use most. Like an XML parser, HtmlNode provides SelectSingleNode and SelectNodes methods that accept XPath expressions.
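Applied to the markup in the question, those two methods could be used roughly as follows. This is a sketch, not a definitive solution: the types BetOption, BetGroup and BetParser are hypothetical names introduced here, and the XPath expressions assume the page keeps the structure shown in the question (an outer table whose first row holds the heading cell, followed by div elements with class "div1").

```csharp
using System.Collections.Generic;
using HtmlAgilityPack;

// Hypothetical container types, introduced only for this sketch.
public class BetOption
{
    public string Odds;     // text of <td class="td1">
    public string Label;    // text of <td class="td2">
    public string OnClick;  // raw onclick attribute of the enclosing div
}

public class BetGroup
{
    public string Heading;  // text of the heading cell, e.g. "Mac Bahsi"
    public List<BetOption> Options = new List<BetOption>();
}

public static class BetParser
{
    public static List<BetGroup> Parse(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);
        var groups = new List<BetGroup>();

        // Only the outer tables contain div.div1 elements; the nested
        // table.table1 elements do not, so this predicate skips them.
        HtmlNodeCollection tables =
            doc.DocumentNode.SelectNodes("//table[.//div[@class='div1']]");
        if (tables == null)
        {
            return groups; // SelectNodes returns null when nothing matches
        }

        foreach (HtmlNode table in tables)
        {
            var group = new BetGroup();
            group.Heading = Text(table.SelectSingleNode("./tr[1]/td[1]"));

            HtmlNodeCollection divs = table.SelectNodes(".//div[@class='div1']");
            if (divs != null)
            {
                foreach (HtmlNode div in divs)
                {
                    group.Options.Add(new BetOption
                    {
                        Odds = Text(div.SelectSingleNode(".//td[@class='td1']")),
                        Label = Text(div.SelectSingleNode(".//td[@class='td2']")),
                        OnClick = div.GetAttributeValue("onclick", "")
                    });
                }
            }
            groups.Add(group);
        }
        return groups;
    }

    // Returns the decoded, trimmed inner text, or "" for a missing node.
    private static string Text(HtmlNode node)
    {
        return node == null ? "" : HtmlEntity.DeEntitize(node.InnerText).Trim();
    }
}
```

Calling BetParser.Parse(html) on the question's document should yield one BetGroup per outer table. To keep only the "Mac Bahsi" group, filter the result on Heading.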

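Pay attention to the HtmlDocument.Option* boolean properties (for example OptionFixNestedTags). They control how the Load and LoadHtml (formerly LoadXML) methods process your HTML/XHTML.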

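There is also a compiled help file called HtmlAgilityPack.chm, which contains a complete reference for each of the objects. It is normally found in the base folder of the solution.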



Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow