I am trying to grab elements from HTML source based on their class or id name, in a C# Windows Forms application. I download the source into a string using WebClient and load it into the HtmlAgilityPack using HtmlDocument.
However, all the examples I find for the HtmlAgilityPack parse through and find items based on tags. I need to find a specific id, say of a link in the HTML, and retrieve the value inside the tags. Is this possible, and what would be the most efficient way to do it? Everything I have tried for parsing out the ids throws exceptions. Thanks!
You should be able to do this with XPath:
HtmlDocument doc = new HtmlDocument();
doc.Load(@"file.htm");
HtmlNode node = doc.DocumentNode.SelectSingleNode("//*[@id=\"my_control_id\"]");
string value = (node == null) ? "Error, id not found" : node.InnerHtml;
A quick explanation of the XPath:
// means search everywhere in the document, at any depth
* means match any type of node
[] defines a "predicate", which checks properties relative to the current node
[@id=\"my_control_id\"] means find nodes that have an attribute named "id" with the value "my_control_id" (the backslashes only escape the quotes inside the C# string literal)
Use SelectNodes instead of SelectSingleNode if the expression can match multiple nodes
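Since the question loads the HTML from a string and also asks about class names, here is a minimal sketch of both lookups. It assumes the HtmlAgilityPack NuGet package is referenced; the class name HtmlLookup, the helper methods InnerHtmlById and InnerTextByClass, and the sample markup are hypothetical names made up for illustration. The class predicate uses a common XPath 1.0 idiom for matching one class among several space-separated ones, which is a swapped-in technique and not part of the answer above.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack; // third-party NuGet package "HtmlAgilityPack"

public static class HtmlLookup // hypothetical helper class, for illustration only
{
    // Returns the inner HTML of the first element with the given id, or null if there is none.
    public static string InnerHtmlById(string html, string id)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html); // parse from a string instead of a file
        // Single quotes inside the XPath avoid escaping C# double quotes.
        // Assumption: the id itself contains no single quote.
        HtmlNode node = doc.DocumentNode.SelectSingleNode("//*[@id='" + id + "']");
        return node == null ? null : node.InnerHtml;
    }

    // Returns the inner text of every element whose class attribute
    // contains className as one of its space-separated class names.
    public static List<string> InnerTextByClass(string html, string className)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);
        string xpath = "//*[contains(concat(' ', normalize-space(@class), ' '), ' "
                       + className + " ')]";
        HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes(xpath);
        var results = new List<string>();
        if (nodes == null) return results; // SelectNodes can return null when nothing matches
        foreach (HtmlNode n in nodes) results.Add(n.InnerText);
        return results;
    }

    public static void Main()
    {
        // Made-up sample markup; in the question this string would come from WebClient.
        const string html =
            "<html><body>" +
            "<a id='home_link' class='nav main' href='/'>Home</a>" +
            "<span class='nav'>Menu</span>" +
            "</body></html>";

        Console.WriteLine(InnerHtmlById(html, "home_link"));
        Console.WriteLine(InnerHtmlById(html, "missing") ?? "id not found");
        foreach (string text in InnerTextByClass(html, "nav"))
            Console.WriteLine(text);
    }
}
```

Checking for null before touching the result is probably what avoids the exceptions mentioned in the question, since both SelectSingleNode and SelectNodes can return null when nothing matches. For a plain id lookup, HtmlDocument also exposes GetElementbyId (note the lowercase "b"), which may be simpler than building an XPath string.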