嘗試使用HtmlAgilityPack從網頁中提取數據

c# html-agility-pack web

我正在嘗試從中提取單個數據
http://www.dsebd.org/displayCompany.php?name=NBL
我在附圖中顯示了必需的字段,其中Xpath:/ html / body / table [2] / tbody / tr / td [2] / table / tbody / tr [3] / td 1 / p 1 / table 1 / tbody / tr / td 1 / table / tbody / tr [2] / td [2] / font

錯誤:發生異常並且使用該Xpath找不到數據。 “在HtmlAgilityPack.dll中發生了'System.Net.WebException'類型的未處理異常”

在此處輸入圖像描述

源代碼:

static void Main(string[] args)
    {
        /************************************************************************/
        string tickerid = "Bse_Prc_tick";
        HtmlAgilityPack.HtmlDocument doc = new   HtmlWeb().Load(@"http://www.dsebd.org/displayCompany.php?name=NBL", "GET");

        if (doc != null)
        {
            // Fetch the stock price from the Web page
            string stockprice = doc.DocumentNode.SelectSingleNode(string.Format("./html/body/table[2]/tbody/tr/td[2]/table/tbody/tr[3]/td1/p1/table1/tbody/tr/td1/table/tbody/tr[2]/td[2]/font", tickerid)).InnerText;
            Console.WriteLine(stockprice);
        }
        Console.WriteLine("ReadKey Starts........");
        Console.ReadKey();
}

一般承認的答案

好吧,我查了一下。 XPath我們使用的只是不正確。當您嘗試找出錯誤所在的位置時,真正的樂趣就開始了。

只需查看您正在使用的頁面的源代碼,除了許多阻礙XPath的錯誤,它甚至包含多個HTML標記...

Chrome開發工具和您正在使用的工具適用於由瀏覽器更正的dom樹(全部打包到單個html節點,添加了一些tbody等)。

由於html結構被簡單破解,因此成為HtmlAgilityPack解析。

根據情況,您既可以使用RegExp,也可以只搜索源中的已知元素(速度更快,但靈活性更低)。

例如:

...
using System.Net; //required for Webclient
...
        class Program
        {
            //entry point of console app
            static void Main(string[] args)
            {
                // url to download
                // "var" means I am too lazy to write "string" and let compiler decide typing
                var url = @"http://www.dsebd.org/displayCompany.php?name=NBL";

                // creating object in using makes Garbage Collector delete it when using block ends, as opposed to standard cleaning after whole function ends
                using (WebClient client = new WebClient()) // WebClient class inherits IDisposable
                {

                    // simply download result to string, in this case it will be html code
                    string htmlCode = client.DownloadString(url);
                    // cut html in half op position of "Last Trade:"
                    // searching from beginning of string is easier/faster than searching in middle
                    htmlCode = htmlCode.Substring(
                        htmlCode.IndexOf("Last Trade:")
                        );
                    // select from .. to .. and then remove leading and trailing whitespace characters
                    htmlCode = htmlCode.Substring("2\">", "</font></td>").Trim();
                    Console.WriteLine(htmlCode);
                }
                Console.ReadLine();
            }
        }
        // http://stackoverflow.com/a/17253735/3147740 <- copied from here
        // this is Extension Class which adds overloaded Substring() I used in this code, it does what its comments says
        public static class StringExtensions
        {
            /// <summary>
            /// takes a substring between two anchor strings (or the end of the string if that anchor is null)
            /// </summary>
            /// <param name="this">a string</param>
            /// <param name="from">an optional string to search after</param>
            /// <param name="until">an optional string to search before</param>
            /// <param name="comparison">an optional comparison for the search</param>
            /// <returns>a substring based on the search</returns>
            public static string Substring(this string @this, string from = null, string until = null, StringComparison comparison = StringComparison.InvariantCulture)
            {
                var fromLength = (from ?? string.Empty).Length;
                var startIndex = !string.IsNullOrEmpty(from)
                    ? @this.IndexOf(from, comparison) + fromLength
                    : 0;

                if (startIndex < fromLength) { throw new ArgumentException("from: Failed to find an instance of the first anchor"); }

                var endIndex = !string.IsNullOrEmpty(until)
                ? @this.IndexOf(until, startIndex, comparison)
                : @this.Length;

                if (endIndex < 0) { throw new ArgumentException("until: Failed to find an instance of the last anchor"); }

                var subString = @this.Substring(startIndex, endIndex - startIndex);
                return subString;
            }
        }

熱門答案

將代碼包裝在try-catch中以獲取有關異常的更多信息。




許可下: CC-BY-SA with attribution
不隸屬於 Stack Overflow
這個KB合法嗎? 是的,了解原因
許可下: CC-BY-SA with attribution
不隸屬於 Stack Overflow
這個KB合法嗎? 是的,了解原因