I am learning HtmlAgilityPack and trying to extract data from a previously saved page. Specifically, there is a page 1.htm, and I want to get the value from the table cell next to the "Operating system" label (the document itself is attached). This is what I do:
private void simpleButton1_Click(object sender, EventArgs e)
{
    // Create an instance of the class
    HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
    // Load the file
    doc.Load(@"D:\(path to the file here)\1.htm");
    // Try to get information from the node, but I get null
    HtmlAgilityPack.HtmlNode bodyNode = doc.DocumentNode.SelectSingleNode("//TD[@CLASS=pt]");
...
In general, I need to extract a lot of information from the file, but I think that once I can get one value, the rest will follow by analogy.
The line that works turned out to be:
private void simpleButton1_Click(object sender, EventArgs e)
{
    // Create an instance of the class
    HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
    // Load the file
    doc.Load(@"D:\(path to the file here)\1.htm");
    foreach (HtmlAgilityPack.HtmlNode node in doc.DocumentNode.SelectNodes("//body/table[2]/tr[8]/td[4]"))
    {
        string stroka = node.InnerText;
    }
}
But this is a brute-force approach: it only works as long as the structure of my document does not change. I have not yet figured out how to do the same thing by searching for the value instead.
This builds a dictionary of tables keyed by table name. Each table is itself a dictionary in which the first of the two data columns is the key and the second is the value (the code takes the last two cells of each row).
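// Requires: using System.Collections.Generic; using System.Linq;
//           using System.Text; using HtmlAgilityPack;
var tables = new Dictionary<string, Dictionary<string, string>>();
var doc = new HtmlDocument();
// The file is in Windows-1251. On .NET Core / .NET 5+ this code page is only
// available after Encoding.RegisterProvider(CodePagesEncodingProvider.Instance).
doc.Load(@"D:\(path to the file here)\1.htm", Encoding.GetEncoding(1251), false);
// Each table title is an <a name="..."> inside a <td class="pt">
var tableNames = doc.DocumentNode.SelectNodes("//td[@class='pt']/a").Select(a => a.Attributes["name"].Value);
foreach (string name in tableNames)
{
    // The data table is the first <table> following the one that holds the title
    var table = doc.DocumentNode.SelectSingleNode("//table[.//a[@name='" + name + "']]/following-sibling::table[1]");
    int columns = table.SelectNodes(".//tr[1]/td").Count();
    // InnerText leaves entities undecoded, so &nbsp; is replaced by hand
    string[] keys = table.SelectNodes(".//tr/td[" + (columns - 1) + "]").Select(n => n.InnerText.Replace("&nbsp;", " ").Trim()).ToArray();
    string[] values = table.SelectNodes(".//tr/td[" + columns + "]").Select(n => n.InnerText.Replace("&nbsp;", " ").Trim()).ToArray();
    var body = new Dictionary<string, string>();
    for (int i = 0; i < keys.Length; i++)
    {
        string key = keys[i];
        if (body.ContainsKey(key))
            body[key] += ", " + values[i];   // repeated key: append the value
        else if (key != "" && values[i] != "")
            body[key] = values[i];
    }
    tables.Add(name, body);
}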
For example, tables["power management"]
returns 4 entries:
- [0] {[Текущий источник питания, Электросеть]} System.Collections.Generic.KeyValuePair
- [1] {[Состояние батарей, Нет батареи]} System.Collections.Generic.KeyValuePair
- [2] {[Полное время работы от батарей, Неизвестно]} System.Collections.Generic.KeyValuePair
- [3] {[Оставшееся время работы от батарей, Неизвестно]} System.Collections.Generic.KeyValuePair
and tables["power management"]["Текущий источник питания"] (the "Current power source" row)
returns:
"Электросеть"
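Note that the dictionary indexer throws a KeyNotFoundException when either the table name or the row label is missing, which can happen if the report layout differs from file to file. A minimal sketch of a safer lookup, assuming the same shape of `tables` dictionary as above; the class name `TableLookup` and the method `TryGetCell` are hypothetical and not part of HtmlAgilityPack:

```csharp
using System.Collections.Generic;

static class TableLookup
{
    // Hypothetical helper: returns false instead of throwing when the
    // table name or the row label is not present in the dictionaries.
    public static bool TryGetCell(
        Dictionary<string, Dictionary<string, string>> tables,
        string tableName, string key, out string value)
    {
        value = null;
        Dictionary<string, string> table;
        // Look up the table first, then the row inside it.
        return tables.TryGetValue(tableName, out table)
            && table.TryGetValue(key, out value);
    }
}
```

With this, `TableLookup.TryGetCell(tables, "power management", "Текущий источник питания", out source)` would fill `source` and return true only when both the table and the row exist.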
To iterate over all tables and their entries:
foreach (var tableName in tables.Keys)
{
    var table = tables[tableName];
    foreach (var key in table.Keys)
    {
        string value = table[key];
        Debug.Print(tableName + "/" + key + "/" + value);
    }
}
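For the original task (reading the cell next to a known label), the value can also be found by searching for the label text rather than by position. Below is a minimal sketch, assuming the label cell contains exactly the label text and that its value sits in the next `td` of the same row; the class name `LabelSearch` and the method `ValueNextTo` are made up for illustration. As a side note, HtmlAgilityPack exposes element and attribute names in lower case and XPath string literals need quotes, which is the likely reason `//TD[@CLASS=pt]` from the question returned null while `//td[@class='pt']` matches.

```csharp
using HtmlAgilityPack;

static class LabelSearch
{
    // Hypothetical helper: finds the first <td> whose text equals `label`
    // and returns the text of the cell immediately after it, or null.
    public static string ValueNextTo(HtmlDocument doc, string label)
    {
        // Assumption: the label contains no double quotes, because it is
        // pasted directly into the XPath string literal.
        string xpath = "//td[normalize-space(.)=\"" + label + "\"]/following-sibling::td[1]";
        HtmlNode cell = doc.DocumentNode.SelectSingleNode(xpath);
        // InnerText keeps entities such as &nbsp;, so decode them before trimming.
        return cell == null ? null : HtmlEntity.DeEntitize(cell.InnerText).Trim();
    }
}
```

Calling `LabelSearch.ValueNextTo(doc, "Operating system")` after `doc.Load(...)` would then give the neighbouring cell's text, or null if no cell with that label is found. Unlike `//body/table[2]/tr[8]/td[4]`, this does not depend on the row and table positions staying the same.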