C# HTML agility pack, pulling plain text from a div

c# html html-agility-pack

Question

I am attempting to pull short blurbs from a site (LoL).

The HTML I am trying to pull is below.

<div class="field field-name-field-body-medium field-type-text-long field-label-hidden">
The community comics collaboration is back for another heaping helping of Academy fun!
</div>

The code I am currently using, which is not working:

var shortBio = doc.DocumentNode.Descendants("div").Where(p => p.Attributes.Contains("class") && p.Attributes["class"]
         .Value.Contains("field field - name - field - body - medium field - type - text - long field - label - hidden"));


 for (int i = 0; i < 5; i++)
     {
         blurbs[i] = shortBio.ElementAt(i).ToString();
     }

Obviously this is not working, and I am not sure how to pull the text. I keep finding info on just pulling

Thank you in advance.

Accepted Answer

It looks like the parent of your target div has the class teaser-content, which makes a good identifier. The following XPath should return the div you want:

//div[@class='teaser-content']/div

You can then read the div's text from its InnerText property. For example (to get all matching divs instead of just the first one, replace SelectSingleNode() with SelectNodes() and iterate through the result):

var doc = new HtmlWeb().Load("http://na.leagueoflegends.com/en/news/");
var div = doc.DocumentNode.SelectSingleNode("//div[@class='teaser-content']/div");
Console.WriteLine(div.InnerText);

dotnetfiddle demo

Output:

The community comics collaboration is back for another heaping helping of Academy fun!
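
To collect every blurb rather than only the first, the SelectNodes() variant mentioned above could look roughly like the sketch below. It assumes the same teaser-content markup; the BlurbExtractor class and ExtractBlurbs method are hypothetical names used only for illustration, and the null check reflects the fact that SelectNodes() can return null rather than an empty collection when nothing matches.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

public static class BlurbExtractor
{
    // Hypothetical helper (not part of Html Agility Pack): returns the trimmed
    // text of every div that is a direct child of a div whose class attribute
    // is exactly "teaser-content".
    public static List<string> ExtractBlurbs(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectNodes can return null when the XPath matches nothing.
        var nodes = doc.DocumentNode.SelectNodes("//div[@class='teaser-content']/div");
        if (nodes == null)
        {
            return new List<string>();
        }

        // InnerText keeps surrounding whitespace and HTML entities,
        // so decode the entities and trim each result.
        return nodes
            .Select(n => HtmlEntity.DeEntitize(n.InnerText).Trim())
            .ToList();
    }
}
```

To fill a five-element array like the one in the question, something like BlurbExtractor.ExtractBlurbs(html).Take(5).ToArray() should work, and it avoids an out-of-range error when the page has fewer than five matches. Note that @class='teaser-content' is an exact comparison, so it would not match if the parent div carried additional classes.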



Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow