Parsing parent and descendant tags using HTML Agility Pack

asp.net html html-agility-pack

Question

I am trying to parse an HTML form with HTML Agility Pack. It works fine for markup like this:

<p>Some Text</p>

But suppose I have this:

<p>Some Text in p Tag<span>Some text in span tag.</span> Again some text in p tag</p>

I am looping with HtmlNode nodeItem in htmlDoc.DocumentNode.Descendants(controlName).ToArray() to get all values of a control (in our case p and span). But this only returns the text that is in the span.

How can I get the values of both tags, "p" as well as "span"?

UPDATE: I am trying to develop a multilingual application where resource files and keys are generated through code. In the above example, I need to create 3 keys: 1-"Some Text in p Tag", 2-"Some text in span tag." and 3-"Again some text in p tag." How can I create these keys? Currently, a key is created for the span tag but not for the p tag.

Thanks in advance.

Popular Answer

The question is not very clear. You should have posted more of the relevant code showing how you tried to get the values of <p> and <span>.

This worked just fine for me to get the text in both <p> and <span>:

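using System;
using System.Linq;        // needed for ToArray()
using HtmlAgilityPack;

var html = @"<p>Some Text in p Tag<span>Some text in span tag.</span> Again some text in p tag</p>";
var htmlDoc = new HtmlDocument();
htmlDoc.LoadHtml(html);
// InnerText of <p> is the concatenated text of all its descendants,
// so it also includes the text of the nested <span>.
foreach (HtmlNode nodeItem in htmlDoc.DocumentNode.Descendants("p").ToArray())
{
    Console.WriteLine(nodeItem.InnerText);
}
// InnerText of <span> is only the text inside the span.
foreach (HtmlNode nodeItem in htmlDoc.DocumentNode.Descendants("span").ToArray())
{
    Console.WriteLine(nodeItem.InnerText);
}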

The same output is produced by this single foreach loop:

foreach (HtmlNode nodeItem in 
                htmlDoc.DocumentNode
                       .SelectNodes("//*[name() = 'p' or name() = 'span']"))
{
    Console.WriteLine(nodeItem.InnerText);
}

Or, if you don't care about the tag name, you can get all elements as follows:

foreach (HtmlNode nodeItem in 
                htmlDoc.DocumentNode
                       .SelectNodes("//*"))
{
    Console.WriteLine(nodeItem.InnerText);
}
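
If the goal from the question's update is one resource key per text fragment, InnerText on <p> will not help on its own, because it merges the span's text into the paragraph's text. One possible approach, offered here as a sketch rather than a definitive solution, is to walk the text nodes instead of the elements. The class name TextFragmentDemo and the helper ExtractTextFragments are hypothetical names introduced for illustration; Descendants(), HtmlNodeType.Text and HtmlEntity.DeEntitize are existing HTML Agility Pack members. The sketch assumes that whitespace-only fragments should be skipped and that each fragment should be trimmed.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

public static class TextFragmentDemo
{
    // Hypothetical helper: returns every non-empty text node in document order,
    // with HTML entities decoded and surrounding whitespace trimmed.
    public static List<string> ExtractTextFragments(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        return doc.DocumentNode
                  .Descendants()                                  // all nodes, not only elements
                  .Where(n => n.NodeType == HtmlNodeType.Text)    // keep text nodes only
                  .Select(n => HtmlEntity.DeEntitize(n.InnerText).Trim())
                  .Where(text => text.Length > 0)                 // assumption: skip whitespace-only nodes
                  .ToList();
    }

    public static void Main()
    {
        var html = @"<p>Some Text in p Tag<span>Some text in span tag.</span> Again some text in p tag</p>";

        // Each fragment could then be used as the value of one resource key.
        foreach (var fragment in ExtractTextFragments(html))
        {
            Console.WriteLine(fragment);
        }
    }
}
```

For the sample markup this should yield three separate fragments: the text before the span, the text inside the span, and the text after it. Note that text inside <script> or <style> elements is also stored in text nodes, so real pages may need an extra filter on the parent node's name.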

If none of the above samples is useful for your case, please update the question to clarify the actual problem you're trying to solve.




Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow