Parsing HTML page with HtmlAgilityPack using LINQ

c# html-agility-pack linq


How can I parse the HTML of a web page using LINQ and add the extracted values to a string? In a Metro application that uses HtmlAgilityPack, I want to read three values from the page and add them to a string.

The website address is

I want to extract the values listed below:

Balance, Incoming Transactions, and Received

WebResponse x = await req.GetResponseAsync();
HttpWebResponse res = (HttpWebResponse)x;
if (res != null)
{
    if (res.StatusCode == HttpStatusCode.OK)
    {
        string html;
        Stream stream = res.GetResponseStream();
        using (StreamReader reader = new StreamReader(stream))
        {
            html = reader.ReadToEnd();
        }
        HtmlDocument htmlDocument = new HtmlDocument();
        htmlDocument.LoadHtml(html); // parse the downloaded markup

        string appName = htmlDocument.DocumentNode.Descendants // not sure what t
        string a = "Name: " + WebUtility.HtmlDecode(appName);
    }
}
1/2/2014 7:48:03 PM

Accepted Answer

Please give the following a try. You might also consider parsing the table instead, since it is more structured than the free text in the 'p' tag.

Thank you, Aaron.

// download the site content and create a new html document
// NOTE: make this asynchronous etc when considering IO performance
var url = "";
var data = new WebClient().DownloadString(url);
var doc = new HtmlDocument();
doc.LoadHtml(data); // parse the downloaded markup; without this the document is empty

// extract the transactions 'h3' title, the node we want is directly before it
var transTitle = 
    (from h3 in doc.DocumentNode.Descendants("h3")
     where h3.InnerText.ToLower() == "transactions"
     select h3).FirstOrDefault();

// tokenise the summary, one line per 'br' element, split each line by the ':' symbol
var summary = transTitle.PreviousSibling.PreviousSibling;
var tokens = 
    (from row in summary.InnerHtml.Replace("<br>", "|").Split('|')
     where !string.IsNullOrEmpty(row.Trim())
     let line = row.Trim().Split(':')
     where line.Length == 2
     select new { name = line[0].Trim(), value = line[1].Trim() });

// using LINQPad to debug; the Dump() command writes the current variable to the output
tokens.Dump();

This is an example of the output of LINQPad's "Dump()" command, which writes the variable to the output:

  • Balance: 5 LTC
  • Transactions in: 2
  • Received: 5 LTC
  • Transactions out: 0
  • Sent: 0 LTC
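
The tokenising step can also be sketched without HtmlAgilityPack, which makes it easy to try on its own. This is only an illustration under stated assumptions: SummaryParser and Tokenise are hypothetical names, the sample markup is made up, and the regular expression is a substitute for the Replace("<br>", "|") approach so that <br>, <br/> and <br /> are all treated as line breaks. Unlike the query above, it splits each row on the first ':' only, so a value that itself contains a colon is kept rather than skipped.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class SummaryParser
{
    // Hypothetical helper, not part of HtmlAgilityPack: turns the inner HTML
    // of the summary element into name/value pairs.
    public static List<KeyValuePair<string, string>> Tokenise(string innerHtml)
    {
        // Split on <br>, <br/> and <br /> in any letter case.
        var rows = Regex.Split(innerHtml, @"<br\s*/?>", RegexOptions.IgnoreCase);

        return (from row in rows
                let line = row.Trim().Split(new[] { ':' }, 2) // split on the first ':' only
                where line.Length == 2                         // drops empty or label-less rows
                select new KeyValuePair<string, string>(line[0].Trim(), line[1].Trim()))
               .ToList();
    }

    public static void Main()
    {
        // Made-up sample markup standing in for summary.InnerHtml.
        var sample = "Balance: 5 LTC<br>Transactions in: 2<br />Received: 5 LTC<BR/>";

        foreach (var pair in Tokenise(sample))
        {
            Console.WriteLine(pair.Key + " = " + pair.Value);
        }
    }
}
```

In the answer's code, the call would presumably be SummaryParser.Tokenise(summary.InnerHtml), after checking that transTitle and its siblings are not null.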
12/31/2013 3:58:18 PM

Popular Answer

The document you need to parse is not ideal, since many elements lack a class or even an id attribute, but what you want is the content of the second p tag.

You could try this:

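HtmlDocument htmlDocument = new HtmlDocument();
htmlDocument.LoadHtml(html); // html holds the downloaded page source

// take the first three '<br />'-separated lines of the second 'p' element;
// ToArray() materialises the sequence so the lines can be indexed below
var pNodes = htmlDocument.DocumentNode.SelectNodes("//p")[1].InnerHtml
    .Split(new string[] { "<br />" }, StringSplitOptions.None).Take(3).ToArray();

string vl = "Balance:" + pNodes[0].Split(':')[1] + " Transactions in:" + pNodes[1].Split(':')[1]
    + " Received:" + pNodes[2].Split(':')[1];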
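
The same idea can be sketched with LINQ over Descendants instead of XPath, which may suit builds of the library where SelectNodes is not available (that availability is an assumption worth checking against your target platform). SecondParagraphDemo and Summarise are hypothetical names, the sample markup is made up, and the sketch returns an empty string when the page has no second 'p' element rather than throwing.

```csharp
using System;
using System.Linq;
using HtmlAgilityPack;

public static class SecondParagraphDemo
{
    // Hypothetical helper: joins the first three "name: value" lines of the
    // second 'p' element into one string.
    public static string Summarise(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // LINQ alternative to SelectNodes("//p")[1]; null when there is no second 'p'.
        var p = doc.DocumentNode.Descendants("p").ElementAtOrDefault(1);
        if (p == null) return string.Empty;

        var values = p.InnerHtml
            .Split(new[] { "<br />", "<br/>", "<br>" }, StringSplitOptions.RemoveEmptyEntries)
            .Select(l => l.Split(new[] { ':' }, 2))   // split on the first ':' only
            .Where(parts => parts.Length == 2)        // skip lines without a label
            .Take(3)
            .Select(parts => parts[0].Trim() + ": " + parts[1].Trim());

        return string.Join(", ", values);
    }

    public static void Main()
    {
        // Made-up sample page standing in for the downloaded HTML.
        var html = "<html><body><p>intro</p>"
                 + "<p>Balance: 5 LTC<br />Transactions in: 2<br />Received: 5 LTC<br />Sent: 0 LTC</p>"
                 + "</body></html>";

        Console.WriteLine(Summarise(html));
    }
}
```

Splitting on the first colon only and filtering on parts.Length avoids the IndexOutOfRangeException that Split(':')[1] would raise on a line with no colon.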

Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow