Select HTML from specific position using Html Agility Pack

c# html html-agility-pack xpath


I need to get the HTML text nodes that lie between, say, line 64, line position 45 and line 183, line position 22. I'm still learning XPath, so I'm not sure what my options are. How should I go about this? I had the following in mind:

var nodes=doc.DocumentNode.SelectNodes("//text()[position() > startPosition and position() < endPosition]");
2/18/2013 8:26:50 PM

Accepted Answer

The HtmlNode class has two important properties that you need to consider:

  • Line (the line on which the node begins)
  • LinePosition (the position within that line at which the node begins)

You could carry out the following:

var nodes = doc.DocumentNode.Descendants("#text").Where(
    x => (x.Line > 64 || (x.Line == 64 && x.LinePosition >= 45)) &&
         (x.Line < 183 || (x.Line == 183 && x.LinePosition <= 22)));

You can, of course, also do doc.DocumentNode.SelectNodes("//text()").Where(...)

You'll have to deal with the following issue:

The above technique only checks where a node starts, not where it ends, so it can return nodes that end on a line after 183, or on line 183 but at a position greater than 22. You could handle that by taking the node's OuterHtml property and doing some string manipulation (use its length to find the position where the node ends, split on \n to count the lines it spans, etc.).
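
The string manipulation on OuterHtml mentioned above could look roughly like the sketch below. NodeSpan, GetEnd and EndsAtOrBefore are hypothetical names, not part of Html Agility Pack. The sketch assumes that the node's text matches the source document character for character, and it takes the column number of the first character on a line (firstColumn, 0 or 1) as a parameter, because that base should be checked against a known document rather than assumed.

```csharp
using System;

// Hypothetical helper, not part of Html Agility Pack.
public static class NodeSpan
{
    // Returns the line and column of the last character of a node, given
    // where the node starts and its raw text (for example OuterHtml).
    // firstColumn is the column number of the first character on a line;
    // this is an assumption to verify for your parser (0 or 1).
    public static (int Line, int Position) GetEnd(
        int startLine, int startPosition, string text, int firstColumn)
    {
        if (text == null) throw new ArgumentNullException(nameof(text));

        // Counting '\n' covers both "\n" and "\r\n" line endings.
        int newlines = 0;
        int lastNewline = -1;
        for (int i = 0; i < text.Length; i++)
        {
            if (text[i] == '\n') { newlines++; lastNewline = i; }
        }

        if (newlines == 0)
        {
            // The node starts and ends on the same line.
            return (startLine, startPosition + text.Length - 1);
        }

        // Characters after the final newline sit on the node's last line.
        // If the text ends with a newline, tail is 0 and the result sorts
        // just before the first column of the following line.
        int tail = text.Length - lastNewline - 1;
        return (startLine + newlines, firstColumn + tail - 1);
    }

    // True when the end point is at or before the given limit.
    public static bool EndsAtOrBefore(
        (int Line, int Position) end, int limitLine, int limitPosition)
    {
        return end.Line < limitLine ||
               (end.Line == limitLine && end.Position <= limitPosition);
    }
}
```

In the filter above, the upper-bound test could then become something like NodeSpan.EndsAtOrBefore(NodeSpan.GetEnd(x.Line, x.LinePosition, x.OuterHtml, 1), 183, 22), once the column base has been confirmed.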

2/19/2013 1:03:32 AM

Popular Answer

You cannot do this with XPath, because XPath has no knowledge of the line numbers or character positions of nodes in the source XML.

The position() function returns the node's position within the current list of nodes: 1 for the first node, 2 for the second, and so on.
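
To make that concrete, here is a small sketch using System.Xml from the .NET base library rather than Html Agility Pack. PositionDemo and the sample document are made up for illustration. It also shows a detail that is easy to miss: in //text()[position() = 2] the position is counted among the text children of each parent, while (//text())[2] counts across the whole document.

```csharp
using System.Collections.Generic;
using System.Xml;

// Hypothetical demo class; the sample XML is invented for illustration.
public static class PositionDemo
{
    // Text nodes in document order: "one", "two", "x", "y", "z".
    private const string Sample =
        "<r><a>one</a><a>two</a><b>x<i/>y<i/>z</b></r>";

    // Evaluates an XPath expression against the sample and returns the
    // values of the selected nodes.
    public static List<string> Select(string xpath)
    {
        var doc = new XmlDocument();
        doc.LoadXml(Sample);

        var result = new List<string>();
        foreach (XmlNode node in doc.SelectNodes(xpath))
        {
            result.Add(node.Value);
        }
        return result;
    }
}

// PositionDemo.Select("//text()[position() = 2]") selects only "y":
//   the second text child of <b>; no other parent has two text children.
// PositionDemo.Select("(//text())[2]") selects only "two":
//   the second text node in the whole document.
```

Neither form has anything to do with line numbers or columns in the source file.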

If you parse the XML with XElement or XmlReader, you can use the IXmlLineInfo interface to get line and position information.
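
A minimal sketch of the IXmlLineInfo approach with LINQ to XML follows. LineInfoDemo and TextNodesWithPositions are hypothetical names. It only works when the input is well-formed XML, which much real-world HTML is not, and with these load options whitespace-only text between elements is not kept.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml;
using System.Xml.Linq;

// Hypothetical demo class for reading line information from parsed XML.
public static class LineInfoDemo
{
    // Returns every text node together with the line and position that
    // the parser recorded for it.
    public static List<(int Line, int Position, string Text)>
        TextNodesWithPositions(string xml)
    {
        // SetLineInfo asks the loader to keep line information on nodes.
        XDocument doc = XDocument.Parse(xml, LoadOptions.SetLineInfo);

        var result = new List<(int Line, int Position, string Text)>();
        foreach (XText text in doc.DescendantNodes().OfType<XText>())
        {
            // LINQ to XML nodes implement IXmlLineInfo explicitly, so the
            // members are reached through the interface.
            IXmlLineInfo info = text;
            if (info.HasLineInfo())
            {
                result.Add((info.LineNumber, info.LinePosition, text.Value));
            }
        }
        return result;
    }
}
```

The returned list can then be filtered with the same kind of line and position comparison used in the accepted answer.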

However, keep in mind that identifying parts of an XML file by line or character position is fragile: line breaks and spaces are often added to or removed from XML, so the location of a fragment can shift.

Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow