Problems with HtmlAgilityPack

c# html-agility-pack

Question

I am no HtmlAgilityPack expert. I'm trying to extract information from a loaded web page. Specifically, I have the page 1.htm, and I want to get the value from the table cell next to the row labeled "Operating system". (The actual file is attached.) Here is what I do:

private void simpleButton1_Click(object sender, EventArgs e)
{
    // Create an instance of the class
    HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
    // Load the file
    doc.Load(@"D:\(path to the file here)\1.htm");
    // Try to get information from the node, but get null
    HtmlAgilityPack.HtmlNode bodyNode = doc.DocumentNode.SelectSingleNode("//TD[@CLASS=pt]");
    ...

In general I need to extract a lot of data from the file, but I think that once I can get one row, the rest can be done by analogy.

I managed to get the row I needed like this:

private void simpleButton1_Click(object sender, EventArgs e)
{
    // Create an instance of the class
    HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
    // Load the file
    doc.Load(@"D:\(path to the file here)\1.htm");

    // Positional path: 2nd table in body, 8th row, 4th cell
    foreach (HtmlAgilityPack.HtmlNode node in doc.DocumentNode.SelectNodes("//body/table[2]/tr[8]/td[4]"))
    {
        string stroka = node.InnerText;
    }
}

This approach, though, is brute force: it only works as long as the structure of my document does not change. I still don't know how to do the same thing by searching.

5/6/2018 3:29:15 PM

Accepted Answer

This returns a dictionary of tables keyed by name. Within each table, the first column of text (the next-to-last cell of each row) serves as the key and the second (the last cell) serves as the value.

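using System.Collections.Generic;
using System.Linq;
using System.Text;
using HtmlAgilityPack;

// Table name -> (key -> value)
var tables = new Dictionary<string, Dictionary<string, string>>();
var doc = new HtmlDocument();
// Read the file as Windows-1251. On .NET Core / .NET 5+ this code page must first be
// registered with Encoding.RegisterProvider(CodePagesEncodingProvider.Instance).
doc.Load(@"D:\(path to the file here)\1.htm", Encoding.GetEncoding(1251), false);
// Section titles are named anchors inside <td class="pt">; anchors without a name are skipped
var tableNames = doc.DocumentNode.SelectNodes("//td[@class='pt']/a[@name]").Select(a => a.Attributes["name"].Value);
foreach (string name in tableNames)
{
    // The data table is the first table after the one that contains the title anchor
    var table = doc.DocumentNode.SelectSingleNode("//table[.//a[@name='" + name + "']]/following-sibling::table[1]");
    int columns = table.SelectNodes(".//tr[1]/td").Count;

    // Keys come from the next-to-last cell of each row, values from the last cell
    string[] keys = table.SelectNodes(".//tr/td[" + (columns - 1) + "]").Select(n => n.InnerText.Replace("&nbsp;", " ").Trim()).ToArray();
    string[] values = table.SelectNodes(".//tr/td[" + columns + "]").Select(n => n.InnerText.Replace("&nbsp;", " ").Trim()).ToArray();
    var body = new Dictionary<string, string>();
    for (int i = 0; i < keys.Length; i++)
    {
        string key = keys[i];
        if (body.ContainsKey(key))
            body[key] += ", " + values[i];   // repeated key: append to the existing value
        else if (key != "" && values[i] != "")
            body[key] = values[i];           // skip rows with an empty key or value
    }
    tables.Add(name, body);
}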
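
The XPath in the loop relies on each section title sitting in its own table, with the data table as its next sibling. Below is a minimal sketch of that lookup on a made-up fragment; the markup, the anchor name `os`, the cell contents and the helper `FindDataTable` are assumptions for illustration, not taken from the attached file. It also hints at why `//TD[@CLASS=pt]` from the question most likely returns null: the unquoted `pt` is read by XPath as a child element named `pt` rather than the string 'pt', and, as far as I know, HtmlAgilityPack exposes element and attribute names in lower case.

```csharp
using System;
using HtmlAgilityPack;

static class SiblingTableDemo
{
    // Made-up markup: a title table immediately followed by its data table.
    public const string SampleHtml =
        "<body>" +
        "<table><tr><td class='pt'><a name='os'>Operating system</a></td></tr></table>" +
        "<table><tr><td>&nbsp;</td><td>OS name</td><td>Example OS</td></tr></table>" +
        "</body>";

    // Finds the first table that follows the table containing <a name="anchorName">.
    public static HtmlNode FindDataTable(HtmlDocument doc, string anchorName)
    {
        return doc.DocumentNode.SelectSingleNode(
            "//table[.//a[@name='" + anchorName + "']]/following-sibling::table[1]");
    }

    static void Main()
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(SampleHtml);

        // The attribute value is quoted and all names are written in lower case.
        HtmlNode anchor = doc.DocumentNode.SelectSingleNode("//td[@class='pt']/a");
        string name = anchor.GetAttributeValue("name", "");

        // Print the last cell of the first row of the matching data table.
        HtmlNode data = FindDataTable(doc, name);
        Console.WriteLine(data.SelectSingleNode(".//tr[1]/td[3]").InnerText);
    }
}
```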

For instance, tables["power management"] contains 4 entries:

  • [0] {[Текущий источник питания, Электросеть]} System.Collections.Generic.KeyValuePair
  • [1] {[Состояние батарей, Нет батареи]} System.Collections.Generic.KeyValuePair
  • [2] {[Полное время работы от батарей, Неизвестно]} System.Collections.Generic.KeyValuePair
  • [3] {[Оставшееся время работы от батарей, Неизвестно]} System.Collections.Generic.KeyValuePair

and tables["power management"]["Текущий источник питания"] returns:

"Электросеть"

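Note that the indexer throws a KeyNotFoundException when the table name or the key is absent, which is easy to hit if the report layout varies. A small sketch of a safer lookup follows; the helper name `GetOrDefault` and its `fallback` parameter are hypothetical, not part of the answer's code.

```csharp
using System;
using System.Collections.Generic;

static class TableLookup
{
    // Returns the value stored under tableName/key, or fallback when either is missing.
    public static string GetOrDefault(
        Dictionary<string, Dictionary<string, string>> tables,
        string tableName, string key, string fallback = null)
    {
        if (tables.TryGetValue(tableName, out var table) &&
            table.TryGetValue(key, out var value))
        {
            return value;
        }
        return fallback;
    }
}
```

With the dictionary built above, a call would look like TableLookup.GetOrDefault(tables, "power management", "Текущий источник питания", "n/a").
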
To iterate over everything, you can do:

foreach(var tableName in tables.Keys)
{
    var table = tables[tableName];
    foreach(var key in table.Keys)
    {
        string value = table[key];
        Debug.Print(tableName + "/" + key + "/" + value);
    }
}
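
If only a single value is needed, such as the cell next to "Operating system" from the question, searching by the label text avoids both the positional path and building every dictionary. This is a sketch under the assumption that the label and its value sit in adjacent cells of the same row; the helper names `FindValueByLabel` and `Clean` and the sample markup are hypothetical.

```csharp
using System;
using System.Linq;
using HtmlAgilityPack;

static class LabelSearch
{
    // Decodes entities such as &nbsp; and trims surrounding whitespace.
    static string Clean(string text)
    {
        return HtmlEntity.DeEntitize(text).Trim();
    }

    // Returns the text of the cell right after the first cell whose text equals label,
    // or null when no such label or no following cell exists.
    public static string FindValueByLabel(HtmlDocument doc, string label)
    {
        HtmlNode labelCell = doc.DocumentNode.Descendants("td")
            .FirstOrDefault(td => Clean(td.InnerText) == label);
        HtmlNode valueCell = labelCell?.SelectSingleNode("following-sibling::td[1]");
        return valueCell == null ? null : Clean(valueCell.InnerText);
    }

    static void Main()
    {
        var doc = new HtmlDocument();
        // Made-up row for illustration only.
        doc.LoadHtml("<table><tr><td>&nbsp;</td><td>Operating system</td>" +
                     "<td>Example OS&nbsp;</td></tr></table>");
        Console.WriteLine(FindValueByLabel(doc, "Operating system"));
    }
}
```

Comparing the cleaned text in C# rather than inside the XPath string sidesteps quoting problems when the label itself contains an apostrophe.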
6/2/2018 10:30:09 AM


Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow