How to grab elements by class or id in HTML Source in C#?

c# html html-agility-pack parsing

Question

I am trying to grab elements from HTML source based on the class or id name, using a C# Windows Forms application. I download the source into a string using WebClient and load it into an HtmlAgilityPack HtmlDocument.

However, all the examples I find for HtmlAgilityPack parse through the document and find items based on tags. I need to find a specific id, say that of a link in the HTML, and retrieve the value inside the tags. Is this possible, and what would be the most efficient way to do it? Everything I have tried for parsing out the ids gives me exceptions. Thanks!

Accepted Answer

You should be able to do this with XPath:

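using HtmlAgilityPack;

// Load from a file; use doc.LoadHtml(html) if the source is already in a string
HtmlDocument doc = new HtmlDocument();
doc.Load(@"file.htm");

// Select the first element, of any tag, whose id attribute matches
HtmlNode node = doc.DocumentNode.SelectSingleNode("//*[@id=\"my_control_id\"]");
string value = (node == null) ? "Error, id not found" : node.InnerHtml;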

A quick explanation of the XPath:

  • // means search anywhere in the document, at any depth. Use SelectNodes instead of SelectSingleNode if the expression can match more than one node
  • * means match any element, regardless of tag name
  • [] defines a "predicate", a condition that each candidate node must satisfy
  • [@id=\"my_control_id\"] means find nodes that have an attribute named "id" with the value "my_control_id"
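
Since the question also asks about classes and starts from a string rather than a file, here is a minimal sketch of how the same approach might be extended. It assumes the HtmlAgilityPack package is referenced. The sample markup, the id home_link, the class name nav, and the ClassXPath helper are all made up for illustration (the helper is not part of the library), and it assumes the class name contains no quote characters.

```csharp
using System;
using HtmlAgilityPack;

public static class ClassAndIdLookup
{
    // Hypothetical helper (not part of HtmlAgilityPack): builds an XPath that
    // matches any element whose space-separated class list contains className.
    // Padding with spaces keeps "nav" from also matching "navbar".
    public static string ClassXPath(string className)
    {
        return "//*[contains(concat(' ', normalize-space(@class), ' '), ' " + className + " ')]";
    }

    public static void Main()
    {
        // Made-up markup standing in for the string downloaded with WebClient.
        string html =
            "<html><body>" +
            "<a id=\"home_link\" class=\"nav primary\" href=\"/home\">Home</a>" +
            "<a class=\"nav\" href=\"/about\">About</a>" +
            "<a class=\"navbar\" href=\"/other\">Other</a>" +
            "</body></html>";

        HtmlDocument doc = new HtmlDocument();
        doc.LoadHtml(html); // LoadHtml parses a string; Load reads a file or stream

        // By id: at most one match is expected, so SelectSingleNode is enough.
        HtmlNode link = doc.DocumentNode.SelectSingleNode("//*[@id='home_link']");
        if (link != null)
        {
            Console.WriteLine(link.InnerText);                     // text between the tags
            Console.WriteLine(link.GetAttributeValue("href", "")); // attribute value, "" if absent
        }

        // By class: several nodes may match, so use SelectNodes.
        // By default SelectNodes returns null, not an empty collection,
        // when nothing matches, so check before iterating.
        HtmlNodeCollection navLinks = doc.DocumentNode.SelectNodes(ClassXPath("nav"));
        if (navLinks != null)
        {
            foreach (HtmlNode n in navLinks)
            {
                Console.WriteLine(n.InnerText);
            }
        }
    }
}
```

A plain [@class='nav'] predicate would only match elements whose class attribute is exactly "nav", which is why the sketch uses the longer contains(...) form. The null checks may also be relevant to the exceptions mentioned in the question: iterating over a null result from SelectNodes, or reading a property of a null node, throws a NullReferenceException.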

Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow