HTMLAgilityPack and separating on <br/>

html-agility-pack

Question

I have some HTML in which the values are separated by <br/>, e.g.:

Jack Janson
<br/>
309 123 456
<br/>
My Special Street 43

What is the easiest way to retrieve the information in 3 columns?

I am not an XPath expert, so another approach would be to separate the string on the line break, and just work with the array. Is there a smarter way to do it?

Update: Forgot to format the code.

Accepted Answer

In pure XPath over XML, you would use the sibling axes, with an expression like //br/preceding-sibling::text() or //br/following-sibling::text() (see any reference on XPath axes for details).
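As a rough illustration of the pure-XPath route, here is a minimal sketch using the standard System.Xml classes rather than Html Agility Pack. It assumes the markup is well-formed XML and wraps it in a hypothetical <div> root; the class and method names (XmlColumns, Extract) are made up for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

public static class XmlColumns
{
    // Returns the trimmed text found on either side of each <br/>.
    public static List<string> Extract(string xml)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml);

        var values = new List<string>();
        // The union selects text nodes that sit before or after any <br/>,
        // so a text node between two <br/> elements is only returned once.
        XmlNodeList nodes = doc.SelectNodes(
            "//br/preceding-sibling::text() | //br/following-sibling::text()");
        foreach (XmlNode node in nodes)
        {
            string value = node.Value.Trim();
            if (value.Length > 0) values.Add(value);
        }
        return values;
    }

    public static void Main()
    {
        // Assumption: well-formed XML with a hypothetical <div> wrapper.
        string xml = "<div>Jack Janson<br/>309 123 456<br/>My Special Street 43</div>";
        foreach (string value in Extract(xml)) Console.WriteLine(value);
    }
}
```

For the sample above this should yield the name, the phone number and the street. Real-world HTML is often not well-formed XML, which is why the rest of this answer uses Html Agility Pack instead.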

However, the XPath-over-HTML implementation in Html Agility Pack does not support selecting bare text nodes or attribute nodes (for example, //br/text() and //br/@blah do not work). They do work inside filters, so expressions such as //br[text()='blah'] or //br[@att='blah'] are fine.

So, back to the question: you need to combine XPath and code, something like this:

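using System;
using HtmlAgilityPack;

HtmlDocument doc = new HtmlDocument();
doc.Load(myHtmlFile);
// SelectNodes returns null, not an empty collection, when nothing matches.
HtmlNodeCollection breaks = doc.DocumentNode.SelectNodes("//br");
if (breaks != null)
{
    foreach (HtmlNode br in breaks)
    {
        Console.WriteLine(br.PreviousSibling?.InnerText.Trim());
    }
}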

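That will output

Jack Janson
309 123 456

The last value, My Special Street 43, is not printed because no <br/> follows it; it is the NextSibling of the final <br/>.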
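To pick up all three values, including the text after the last <br/>, one option is to skip XPath and filter the container's child nodes by type. The sketch below assumes the values are direct children of a single container (here the document root, since the sample is a bare fragment); BrColumns and Extract are hypothetical names used only for this example.

```csharp
using System;
using System.Linq;
using HtmlAgilityPack;

public static class BrColumns
{
    // Returns the non-empty text nodes that sit between <br/> elements.
    public static string[] Extract(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // Assumption: the values are direct children of the root node.
        // For a real page, select the containing element first and use its ChildNodes.
        return doc.DocumentNode.ChildNodes
            .Where(n => n.NodeType == HtmlNodeType.Text)
            .Select(n => HtmlEntity.DeEntitize(n.InnerText).Trim())
            .Where(s => s.Length > 0)
            .ToArray();
    }

    public static void Main()
    {
        // Sample markup from the question.
        string html = "Jack Janson\n<br/>\n309 123 456\n<br/>\nMy Special Street 43";
        foreach (string column in Extract(html)) Console.WriteLine(column);
    }
}
```

The resulting array can then be treated as the three columns (name, phone, street). This is much the same idea as splitting the string on the line break, but it leaves the parsing of the markup to the library.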


Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow