Get values from a table in a web page using Html Agility Pack without using "SelectNodes"

c# c#-4.0 html html-agility-pack windows-store-apps


I am trying to get the full value of each "Transaction" entry, along with its URL, using the Html Agility Pack. When I inspect the HTML source in Google Chrome, I can see the full transaction ID together with a URL. My question is: how do I get the full value of every Transaction and the URL associated with it, and add them to my data grid asynchronously? I can't use "SelectNodes" because it is not supported in Windows Store apps.

Here is the URL of the site:

private async void GetTransactions()
{
    url = "";
    string html;

    try
    {
        HttpWebRequest req = (HttpWebRequest)WebRequest.Create(url);
        WebResponse x = await req.GetResponseAsync();
        HttpWebResponse res = (HttpWebResponse)x;
        if (res != null && res.StatusCode == HttpStatusCode.OK)
        {
            Stream stream = res.GetResponseStream();
            using (StreamReader reader = new StreamReader(stream))
            {
                html = reader.ReadToEnd();
            }

            HtmlDocument htmlDocument = new HtmlDocument();
            htmlDocument.LoadHtml(html); // the downloaded HTML must be loaded before its nodes can be walked

            var tsTable = htmlDocument.DocumentNode.ChildNodes["html"].ChildNodes["body"].ChildNodes["div"]. ...

            int n = 2;
            // HtmlNode has no Split method; split the node's text instead
            var tsRow = tsTable.InnerText.Split(Environment.NewLine.ToCharArray()).Skip(n).ToArray();

            for (var index = 1; index < tsRow.Count(); index++)
            {
                ...
            }
        }
    }
    catch (Exception)
    {
        MessageDialog messageDialog =
            new MessageDialog("A tear occurred in the space-time continuum. Please try again when all planets in the solar system are aligned.");
    }
}

<telerikGrid:RadDataGrid Grid.RowSpan="1" ItemsSource="{Binding Data}" IsSynchronizedWithCurrentItem="True" AlternateRowBackground="AliceBlue" Background="White" Grid.Row="2"
                         UserEditMode="Inline" UserGroupMode="Disabled" VerticalAlignment="Bottom" AutoGenerateColumns="False" Height="294" Grid.ColumnSpan="2">
    <telerikGrid:RadDataGrid.GroupDescriptors>
        <telerikGrid:PropertyGroupDescriptor PropertyName="Group"/>
    </telerikGrid:RadDataGrid.GroupDescriptors>
    <telerikGrid:RadDataGrid.Columns>
        <telerikGrid:DataGridNumericalColumn PropertyName="Id" CanUserEdit="False" CanUserFilter="False" Header="#" SizeMode="Fixed" Width="40"/>
        <telerikGrid:DataGridTextColumn PropertyName="pnDate" CanUserFilter="False" Header="Date" CellContentFormat="{}{0,0:dd.MM.yyyy}"/>
        <telerikGrid:DataGridNumericalColumn PropertyName="pnType" CanUserFilter="False" Header="Type"/>
        <telerikGrid:DataGridTextColumn PropertyName="pnAddress" CanUserFilter="False" Header="Address"/>
        <telerikGrid:DataGridNumericalColumn PropertyName="pnAmount" CanUserFilter="False" Header="Amount"/>
    </telerikGrid:RadDataGrid.Columns>
</telerikGrid:RadDataGrid>
1/10/2014 12:26:32 AM

Accepted Answer

SelectNodes (with an XPath query) simply walks the node tree for you and collects the nodes that match. Without it, you have to do that walk by hand: look at the HTML itself and build a path that leads to what you want.
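To make the "walk it by hand" idea concrete, here is a minimal sketch of a depth-first search over `ChildNodes` that collects every element with a given tag name, roughly what an XPath query like `//table` does. The `ManualWalk` class and `FindElements` method are hypothetical names for illustration; only `HtmlNode`, `ChildNodes`, `NodeType` and `Name` come from Html Agility Pack.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack;

// Hypothetical helper: a hand-written tree walk, no XPath involved.
public static class ManualWalk
{
    // Yields every element below `node` whose tag name equals `name`,
    // visiting children in document order (depth-first).
    public static IEnumerable<HtmlNode> FindElements(HtmlNode node, string name)
    {
        foreach (HtmlNode child in node.ChildNodes)
        {
            // Text and comment nodes (e.g. whitespace between tags) are skipped.
            if (child.NodeType != HtmlNodeType.Element)
                continue;

            if (string.Equals(child.Name, name, StringComparison.OrdinalIgnoreCase))
                yield return child;

            // Recurse into the child's own subtree.
            foreach (HtmlNode match in FindElements(child, name))
                yield return match;
        }
    }
}
```

Calling `ManualWalk.FindElements(htmlDocument.DocumentNode, "table")` returns every table on the page. Html Agility Pack's own `Descendants("table")` performs the same kind of walk and is not part of its XPath support, so it should also be usable from a Windows Store app.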

var table = htmlDocument.DocumentNode.ChildNodes["html"].ChildNodes["body"].ChildNodes[0].ChildNodes[0].ChildNodes[0].ChildNodes["table"];

Now that you have the table (and you could have been more specific with the ChildNodes, for example by looking for the div with a specific class attribute value), you can start looking at the rows. The first row holds the headers, so we skip it.
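Being "more specific" could look like the sketch below, which uses Html Agility Pack's LINQ helpers (`Descendants`, `Elements`, `GetAttributeValue`) instead of a hard-coded chain of `ChildNodes` indexes. It assumes the table sits inside a div identified by a CSS class; the class name is a placeholder you would read off the real page, and `TableLocator` is a hypothetical name. Skipping the header by looking for `<th>` cells, rather than by position, is this sketch's own choice.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

public static class TableLocator
{
    // Returns the first <table> inside the first <div> whose class list
    // contains `divClass`, or null if no such div/table exists.
    public static HtmlNode FindTable(HtmlDocument doc, string divClass)
    {
        HtmlNode container = doc.DocumentNode
            .Descendants("div")
            .FirstOrDefault(d => d.GetAttributeValue("class", "")
                .Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries)
                .Contains(divClass));

        // Explicit null check instead of ?. to stay compatible with older C# versions.
        return container == null ? null : container.Descendants("table").FirstOrDefault();
    }

    // Returns the data rows: every <tr> that has no <th> header cells.
    // Descendants (not Elements) also finds rows wrapped in a <tbody>.
    public static List<HtmlNode> DataRows(HtmlNode table)
    {
        return table.Descendants("tr")
            .Where(tr => !tr.Elements("th").Any())
            .ToList();
    }
}
```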

// The first table row is index 0 and looks like this:
// <tr><th>Transaction</th><th>Block</th><th>Approx. Time</th><th>Amount</th><th>Balance</th><th>Currency</th></tr>
// It holds the column headers; each <th> node represents a column. The loop below starts at index 1, the first row of real data...
// Note: ChildNodes also contains text nodes (e.g. whitespace between tags), so these indexes
// assume markup with no whitespace between the <tr>/<td> tags, as in the samples here.
for (var index = 1; index < table.ChildNodes.Count; index++)
{
    // a row of data looks like:
    // <tr><td><a href="../tx/513.cut for space.b4a#o1">5130f066e0...</a></td><td><a href="../block/c3.cut for space.c9c">468275</a></td><td>2013-11-28 09:14:17</td><td>0.3</td><td>0.3</td><td>LTC</td></tr>
    // each <td> node inside the row holds the data for the matching column index...
    var row = table.ChildNodes[index];
    var transactionLink = row.ChildNodes[0].ChildNodes["a"].Attributes["href"].Value;
    var transactionText = row.ChildNodes[0].ChildNodes["a"].InnerText;

    // Other variables for the table row data...
    // Here is one more example
    var approxTime = row.ChildNodes[2].InnerText;
}
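Two details matter when turning a row into data for the grid: positional `ChildNodes` indexes break if the page has whitespace between cells, and the `href` is relative (`../tx/...`). The sketch below handles both. The `TransactionRow` class and its property names are hypothetical (not the asker's `pnDate`/`pnAmount` model), and taking the full transaction ID from the last path segment of the link is an assumption based on the sample `href` shape above.

```csharp
using System;
using System.Linq;
using HtmlAgilityPack;

// Hypothetical model for one data row; rename the properties to match your grid.
public class TransactionRow
{
    public string TransactionId { get; set; }
    public string LinkText { get; set; }
    public string Url { get; set; }
    public string ApproxTime { get; set; }
    public string Amount { get; set; }
}

public static class RowParser
{
    // Parses one data <tr>. Elements("td") returns only <td> element children,
    // so whitespace text nodes between cells cannot shift the column indexes.
    public static TransactionRow Parse(HtmlNode row, Uri pageUri)
    {
        HtmlNode[] cells = row.Elements("td").ToArray();
        HtmlNode link = cells[0].Element("a");

        // The href is relative, so resolve it against the URL the page was downloaded from.
        Uri absolute = new Uri(pageUri, link.GetAttributeValue("href", ""));

        return new TransactionRow
        {
            // Assumption: the full ID is the last path segment (the #fragment is not part of Segments).
            TransactionId = absolute.Segments[absolute.Segments.Length - 1],
            LinkText = link.InnerText.Trim(),
            Url = absolute.AbsoluteUri,
            ApproxTime = cells[2].InnerText.Trim(),
            Amount = cells[3].InnerText.Trim()
        };
    }
}
```

Calling `RowParser.Parse(row, new Uri(url))` for each data row would give objects that could be added to an `ObservableCollection<TransactionRow>` exposed as the grid's `ItemsSource`.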
1/10/2014 1:22:59 AM

Popular Answer

This is one hell of a hack, but if you are absolutely, positively sure you don't want to use the API that @the_lotus mentioned, you could try parsing the page with the following regex:




Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow