Select HTML from specific position using Html Agility Pack

c# html html-agility-pack xpath


I need to obtain HTML text nodes from, let's say, line 64, position 45 to line 183, position 22. I'm pretty new to XPath and I'm not quite sure what my options are. How should I proceed? I had something like this in mind:

var nodes=doc.DocumentNode.SelectNodes("//text()[position() > startPosition and position() < endPosition]");
2/18/2013 8:26:50 PM

Accepted Answer

The HtmlNode class has two important properties (for what you need to do):

  • Line (the line where the node begins)
  • LinePosition (the position within that line where the node begins)

You could do something like:

// Keep text nodes that start at or after (64, 45) and at or before (183, 22).
var nodes = doc.DocumentNode.Descendants("#text").Where(
    x => (x.Line > 64 || (x.Line == 64 && x.LinePosition >= 45)) &&
         (x.Line < 183 || (x.Line == 183 && x.LinePosition <= 22)));

Of course, you can also use doc.DocumentNode.SelectNodes("//text()").Where(...), but note that SelectNodes returns null when nothing matches.

One problem you'll have to deal with:

Html Agility Pack doesn't tell you where the node ends, so the above solution might give you nodes that end on a line greater than 183, or on line 183 but at a position greater than 22. To handle that, you can use the node's OuterHtml property and do some string manipulation (take its length to find where it ends, split on \n to count how many lines it spans, etc.).
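A minimal sketch of that string manipulation, written without any Html Agility Pack dependency so it can be checked in isolation. NodeSpan, EndOf, EndsAtOrBefore, and the firstColumn parameter are hypothetical names, not part of the library. It assumes OuterHtml matches the raw source text of the node and that lines are separated by \n; whether the first column is numbered 0 or 1 should be verified against your Html Agility Pack version.

```csharp
using System;

public static class NodeSpan
{
    // Returns the (line, position) just past the last character of a node,
    // given where the node starts and its raw markup.
    // firstColumn is the number the parser gives the first column of a line
    // (an assumption; check it against your parser version).
    public static (int Line, int LinePosition) EndOf(
        int startLine, int startLinePosition, string outerHtml, int firstColumn = 1)
    {
        if (outerHtml == null) throw new ArgumentNullException(nameof(outerHtml));

        int lastNewline = outerHtml.LastIndexOf('\n');
        if (lastNewline < 0)
        {
            // The node never leaves its starting line.
            return (startLine, startLinePosition + outerHtml.Length);
        }

        // Each '\n' moves the end one line down.
        int newlineCount = 0;
        foreach (char c in outerHtml)
        {
            if (c == '\n') newlineCount++;
        }

        // Characters after the last '\n' sit on the final line.
        int charsOnLastLine = outerHtml.Length - lastNewline - 1;
        return (startLine + newlineCount, firstColumn + charsOnLastLine);
    }

    // True when the computed end is at or before the given limit.
    public static bool EndsAtOrBefore(
        (int Line, int LinePosition) end, int limitLine, int limitPosition)
    {
        return end.Line < limitLine
            || (end.Line == limitLine && end.LinePosition <= limitPosition);
    }
}
```

Inside the Where clause, the second condition could then become NodeSpan.EndsAtOrBefore(NodeSpan.EndOf(x.Line, x.LinePosition, x.OuterHtml), 183, 22). The returned end is exclusive (one past the last character), so subtract one from the position if you want to compare against the last character itself.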

2/19/2013 1:03:32 AM

Popular Answer

You cannot do this with XPath: it knows nothing about line numbers or character positions within the document.

The position() function returns the position of a node within the current node list: 1 for the first node in the list, 2 for the second, and so forth.
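To illustrate that position() counts nodes rather than characters, here is a small sketch using the .NET XPath extensions on a well-formed XML string. PositionDemo and SecondItemOfEachList are hypothetical names chosen for this example.

```csharp
using System.Collections.Generic;
using System.Xml.Linq;
using System.Xml.XPath;

public static class PositionDemo
{
    // Selects every <li> that is the second <li> child of its parent.
    public static List<string> SecondItemOfEachList(string xml)
    {
        XDocument doc = XDocument.Parse(xml);
        var result = new List<string>();
        foreach (XElement li in doc.XPathSelectElements("//li[position() = 2]"))
        {
            result.Add(li.Value);
        }
        return result;
    }
}
```

For the input <r><ul><li>a</li><li>b</li></ul><ul><li>c</li><li>d</li></ul></r> this yields "b" and "d": the predicate is evaluated per parent, against the list of sibling li nodes, and has nothing to do with where those nodes sit in the source text.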

To get line and position information, you can parse the XML using XElement or XmlReader and then use the IXmlLineInfo interface.
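A hedged sketch of the IXmlLineInfo route, assuming the input is well-formed XML (for example XHTML); ordinary HTML will generally not parse this way. LineInfoDemo and TextNodePositions are hypothetical names for this example.

```csharp
using System.Collections.Generic;
using System.Xml;
using System.Xml.Linq;

public static class LineInfoDemo
{
    // Returns each text node's value with the line and position reported by the parser.
    public static List<(string Text, int Line, int Position)> TextNodePositions(string xml)
    {
        // LoadOptions.SetLineInfo is required; without it HasLineInfo() returns false.
        XDocument doc = XDocument.Parse(xml, LoadOptions.SetLineInfo);
        var result = new List<(string Text, int Line, int Position)>();

        foreach (XNode node in doc.DescendantNodes())
        {
            if (node is XText text)
            {
                // XObject implements IXmlLineInfo explicitly, so go through the interface.
                IXmlLineInfo info = text;
                if (info.HasLineInfo())
                {
                    result.Add((text.Value, info.LineNumber, info.LinePosition));
                }
            }
        }
        return result;
    }
}
```

The resulting list can be filtered with the same line and position comparisons used for HtmlNode.Line and HtmlNode.LinePosition.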

Note, though, that using line and character positions to identify fragments of an XML file is problematic: XML processors routinely reformat XML, adding or removing spaces and line breaks, so the same XML fragment can change position.

Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow