c# parsing difficulty with HTML Agility Pack

c# html html-agility-pack

Question

I am having a couple of issues with my code, I am trying to pull every paragraph from a page, but at the moment it is only selecting the last paragraph.

here is my code.

foreach (HtmlNode node in doc.DocumentNode.SelectNodes("//div[@id='body']/p"))
{
  string text = node.InnerText;
  lblTest2.Text = text;
}
1
1
1/20/2011 9:56:29 PM

Accepted Answer

In your loop you are taking the current node innerText and assigning it to the label. You do this to each node, so of course you only see the last one - you are not preserving the previous ones.

Try this:

foreach (HtmlNode node in doc.DocumentNode.SelectNodes("//div[@id='body']/p"))
{
  string text = node.InnerText;
  lblTest2.Text += text + Environment.NewLine;
}
4
1/20/2011 10:01:38 PM

Popular Answer

IMO, XPath is no fun. I'd recommend using LINQ syntax instead:

foreach (var node in doc.DocumentNode
    .DescendantNodes()
    .Single(x => x.Id == "body")
    .DescendantNodes()
    .Where(x => x.Name == "p")) 
{
    string text = node.InnerText;
    lblTest2.Text = text;
}


Related Questions





Related

Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow
Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow