I am trying to find the start and end positions of the HTML tags inside an HTML string using Html Agility Pack.
Sample HTML string:
This is a <a href="https://en.wikipedia.org/wiki/Health">custom</a> made html string that will serve as an example for the <a href="http://stackoverflow.com">StackOverflow</a> question described above.
After successfully running the code, I need to get two arrays holding the start and end index of each a tag, as follows:
int[] startIndex = new int[] { 11, 124 };
int[] endIndex = new int[] { 68, 176 };
Here 11 and 124 are the index positions that mark the beginning of each a tag, and 68 and 176 are the positions where the same tags end (start index plus the length of the whole tag).
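For reference, the expected values can be reproduced with plain string searching, without Html Agility Pack. This is only a sanity check on the arithmetic, not an HTML parser: it assumes 1-based positions, non-nested a tags, and an end index equal to the start index plus the length of the whole tag. The class name ExpectedPositions is made up for this illustration.

```csharp
using System;
using System.Collections.Generic;

class ExpectedPositions
{
    static void Main()
    {
        string html = "This is a <a href=\"https://en.wikipedia.org/wiki/Health\">custom</a> made html string that will serve as an example for the <a href=\"http://stackoverflow.com\">StackOverflow</a> question described above.";

        var startIndex = new List<int>();
        var endIndex = new List<int>();
        const string closeTag = "</a>";

        int from = 0;
        while (true)
        {
            // Find the next opening <a ...> and its matching </a> (assumes no nesting).
            int open = html.IndexOf("<a ", from, StringComparison.Ordinal);
            if (open < 0) break;
            int close = html.IndexOf(closeTag, open, StringComparison.Ordinal);
            if (close < 0) break;

            // Length of the whole element, from '<' of <a ...> through '>' of </a>.
            int length = close + closeTag.Length - open;

            startIndex.Add(open + 1);          // assumption: positions are 1-based
            endIndex.Add(open + 1 + length);   // start index + length of the tag

            from = close + closeTag.Length;
        }

        Console.WriteLine(string.Join(", ", startIndex)); // 11, 124
        Console.WriteLine(string.Join(", ", endIndex));   // 68, 176
    }
}
```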
I know that the Html Agility Pack HtmlNode exposes a LinePosition value that gives me the start index, and together with the OuterHtml.Length of the element I can calculate the end index position of the HTML element.
I was able to count the a elements by using:
int aNodesCount = htmlDoc.DocumentNode.SelectNodes("//a").Count;
Now I need to iterate through all of them and get the LinePosition value of each one. This is where I find myself stuck.
Well, that was pretty simple, so I will post an answer for myself and for others running into the same problem:
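// Lists instead of int[] so that Add is available; call ToArray() afterwards if arrays are needed.
var startIndex = new List<int>();
var endIndex = new List<int>();
foreach (HtmlNode aNode in htmlDoc.DocumentNode.SelectNodes("//a"))
{
    // LinePosition is the position of the tag within its line.
    startIndex.Add(aNode.LinePosition);
    endIndex.Add(aNode.LinePosition + aNode.OuterHtml.Length);
}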
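For anyone who wants something to paste and run, here is a minimal sketch of the same loop as a complete program. It assumes the HtmlAgilityPack NuGet package is referenced and that the HTML sits on a single line. The null check is there because SelectNodes returns null when nothing matches. I have not verified whether LinePosition is zero-based or one-based in every version, so compare the output against your own expected values before relying on it.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack;

class Program
{
    static void Main()
    {
        string html = "This is a <a href=\"https://en.wikipedia.org/wiki/Health\">custom</a> made html string that will serve as an example for the <a href=\"http://stackoverflow.com\">StackOverflow</a> question described above.";

        var htmlDoc = new HtmlDocument();
        htmlDoc.LoadHtml(html);

        var startIndex = new List<int>();
        var endIndex = new List<int>();

        // SelectNodes returns null (not an empty collection) when there is no match.
        HtmlNodeCollection aNodes = htmlDoc.DocumentNode.SelectNodes("//a");
        if (aNodes != null)
        {
            foreach (HtmlNode aNode in aNodes)
            {
                // Start: position of the tag within its line.
                startIndex.Add(aNode.LinePosition);
                // End: start plus the length of the whole element, tags included.
                endIndex.Add(aNode.LinePosition + aNode.OuterHtml.Length);
            }
        }

        Console.WriteLine("start: " + string.Join(", ", startIndex));
        Console.WriteLine("end:   " + string.Join(", ", endIndex));
    }
}
```

If the HTML spans several lines, LinePosition is only relative to the line the tag is on. In that case HtmlNode.StreamPosition, which is relative to the start of the document, is probably the better starting point, with the same OuterHtml.Length arithmetic for the end.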