How to find the html tag node position with Html Agility Pack

c#-3.0 html html-agility-pack

Question

I am trying to find the start/end positions of different Html tags inside my Html string by using Html Agility Pack.

Sample html string:

This is a <a href="https://en.wikipedia.org/wiki/Health">custom</a> made html string that will serve as an example for the <a href="http://stackoverflow.com">StackOverflow</a> question described above.

After successfully running the code, I need to get two arrays holding the start and end index of each a tag, as follows:

int[] startIndex = new int[] { 11, 124 };
int[] endIndex = new int[] { 68, 176 };

Here 11 and 124 are the index positions that mark the beginning of each a tag, and 68 and 176 are the corresponding end positions of the same tags.
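As a cross-check on the expected values, the sketch below scans a string for a tags with plain string.IndexOf, without Html Agility Pack. It assumes positions are 1-based and that each end value is the start plus the length of the whole element, which is the convention the numbers in the arrays above appear to follow. ExpectedIndexes and Compute are hypothetical names used only for illustration, and the scan is deliberately naive (it only handles simple, non-nested a tags).

```csharp
using System;
using System.Collections.Generic;

public static class ExpectedIndexes
{
    // Appends the 1-based start and the (start + element length) end of
    // every <a ...>...</a> pair found in html to the two lists.
    public static void Compute(string html, List<int> start, List<int> end)
    {
        const string closeTag = "</a>";
        int from = 0;
        while (true)
        {
            int open = html.IndexOf("<a ", from, StringComparison.Ordinal);
            if (open < 0) break;
            int close = html.IndexOf(closeTag, open, StringComparison.Ordinal);
            if (close < 0) break;

            // Length of the whole element, opening and closing tags included.
            int length = close + closeTag.Length - open;
            start.Add(open + 1);          // IndexOf is 0-based, so shift by one
            end.Add(open + 1 + length);   // start plus element length
            from = close + closeTag.Length;
        }
    }
}
```

Run against the sample string, this yields 11 and 124 for the starts and 68 and 176 for the ends.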

I know that the Html Agility Pack HtmlNode exposes a LinePosition value that will give me the start index, and that together with the InnerHtml.Length of the element I can calculate the end index position of the html element.

I was able to count the a elements by using:

int aNodesCount = htmlDoc.DocumentNode.SelectNodes("//a").Count;

Now I need to iterate through all of them and get the LinePosition value of each one. This is where I find myself stuck.

Popular Answer

Well, that was pretty simple, so I will post an answer for myself and for others who run into the same problem:

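// Use growable lists here, since int[] has no Add method.
var startIndex = new List<int>();
var endIndex = new List<int>();

foreach (HtmlNode aNode in htmlDoc.DocumentNode.SelectNodes("//a"))
{
    startIndex.Add(aNode.LinePosition);
    // The end is the start plus the length of the whole element, tags included.
    endIndex.Add(aNode.LinePosition + aNode.OuterHtml.Length);
}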
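One caveat: LinePosition is reported relative to the line the node sits on, so the loop above gives whole-string offsets only when the HTML is on a single line. A minimal sketch of a variant follows, assuming that HtmlNode.StreamPosition reports the offset into the whole input and that OuterHtml reflects the unmodified source text. AnchorPositions and Find are hypothetical names. Whether the reported positions are 0-based or 1-based is worth checking against the Html Agility Pack version in use.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack;

public static class AnchorPositions
{
    // Appends the start and end position of every a element in html
    // to the two lists; leaves them untouched if there are none.
    public static void Find(string html, List<int> start, List<int> end)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectNodes returns null, not an empty collection, when nothing matches.
        HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes("//a");
        if (nodes == null) return;

        foreach (HtmlNode node in nodes)
        {
            // Assumption: StreamPosition is the offset in the whole document,
            // while LinePosition is only the column within the node's line.
            int position = node.StreamPosition;
            start.Add(position);
            end.Add(position + node.OuterHtml.Length);
        }
    }
}
```

The null check matters in practice: calling .Count or iterating directly on the result of SelectNodes throws when the document contains no a elements.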



Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow