I have an HTML file like the one below:
<div>
<div style="margin-left:0.5em;">
<div class="tiny" style="margin-bottom:0.5em;">
<b><span class="h3color tiny">This review is from: </span>You Meet</b>
</div>
If you know Ron Kaufman as I do ...
<br /><br />Whether you're the CEO....
<br /><br />Written in a distinctive, ...
<br /><br />My advice? Don't just get one copy
<div style="padding-top: 10px; clear: both; width: 100%;"></div>
</div>
<div style="margin-left:0.5em;">
<div class="tiny" style="margin-bottom:0.5em;">
<b><span class="h3color tiny">This review is from: </span>My Review</b>
</div>
I became a fan of Ron Kaufman after reading an earlier book of his years ago...
<div style="padding-top: 10px; clear: both; width: 100%;"></div>
</div>
</div>
I want to get the review text without any HTML tags. I am currently using the code below:
foreach (HtmlNode divReview in doc.DocumentNode.SelectNodes(@"//div[@style='margin-left:0.5em;']"))
{
    if (divReview != null)
    {
        review.Add(divReview.Descendants("div").Where(d => d.Attributes.Contains("style") &&
            d.Attributes["style"].Value.Contains("padding-top: 10px; clear: both; width: 100%;"))
            .Select(d => d.PreviousSibling.InnerText.Trim())
            .SingleOrDefault());
    }
}
This only returns "My advice? Don't just get one copy". How can I get the whole text?
Update: Even if I remove all "br" tags from the HtmlNode, the code above still returns only the "My advice? Don't just get one copy" part. Any ideas?
I've updated the code to this:
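// Find the spacer div, then take every text node before it (not just PreviousSibling).
// GetAttributeValue avoids a NullReferenceException on divs that have no style attribute.
var allText = (reviewDiv.Descendants("div")
    .First(div => div.GetAttributeValue("style", "") == "padding-top: 10px; clear: both; width: 100%;")
    // SelectNodes can return null when nothing matches, so fall back to an empty collection.
    .SelectNodes("./preceding-sibling::text()") ?? new HtmlNodeCollection(null))
    .Select(text => text.InnerText);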
This should return an IEnumerable<string> containing every text node that precedes the div with the long padding-top style, rather than only the one node returned by PreviousSibling.
Without a little more of the surrounding HTML it's hard to tell whether this is exactly what you're after. I'm guessing that you have already selected a div and that this div is the direct parent of the whole block of text (given your reference to a reviewDiv). Your HTML sample doesn't seem to contain this piece of HTML, so I'm making a few assumptions here.
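The difference between PreviousSibling and preceding-sibling::text() can be seen without HtmlAgilityPack at all. The sketch below uses LINQ to XML (System.Xml.Linq) as a stand-in, which only works because this trimmed-down review markup happens to be well-formed XML; it is not the HtmlAgilityPack API. PreviousNode plays the role of PreviousSibling and returns only the last text node, while NodesBeforeSelf().OfType<XText>() plays the role of preceding-sibling::text(). This also suggests why removing the br tags did not help: the remaining text nodes stay separate siblings, so the node immediately before the spacer div is still just the last one.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Sketch only: LINQ to XML stands in for HtmlAgilityPack, and this works
// only because the sample review markup is well-formed XML.
string reviewHtml = @"<div style=""margin-left:0.5em;"">
<div class=""tiny"" style=""margin-bottom:0.5em;"">
<b><span class=""h3color tiny"">This review is from: </span>You Meet</b>
</div>
If you know Ron Kaufman as I do ...
<br /><br />Whether you're the CEO....
<br /><br />Written in a distinctive, ...
<br /><br />My advice? Don't just get one copy
<div style=""padding-top: 10px; clear: both; width: 100%;""></div>
</div>";

XElement review = XElement.Parse(reviewHtml);

// The empty spacer div that closes the review.
XElement spacer = review.Elements("div")
    .First(d => (string?)d.Attribute("style") == "padding-top: 10px; clear: both; width: 100%;");

// Equivalent of PreviousSibling: only the single node right before the spacer.
string lastOnly = ((XText)spacer.PreviousNode!).Value.Trim();

// Equivalent of preceding-sibling::text(): every text node before the spacer,
// in document order. OfType<XText>() skips the <br /> and header div elements.
List<string> allParts = spacer.NodesBeforeSelf()
    .OfType<XText>()
    .Select(t => t.Value.Trim())
    .Where(t => t.Length > 0)
    .ToList();

Console.WriteLine($"PreviousNode only: {lastOnly}");
foreach (string part in allParts)
    Console.WriteLine(part);
```

The first line printed is only "My advice? Don't just get one copy", followed by all four paragraphs of the review.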
With the following input:
<div><div class="tiny" style="margin-bottom:0.5em;"> <b><span class="h3color tiny">This review is from: </span>You Meet</b> </div> If you know Ron Kaufman as I do ... <br /><br />Whether you're the CEO.... <br /><br />Written in a distinctive, ... <br /><br />My advice? Don't just get one copy <div style="padding-top: 10px; clear: both; width: 100%;"></div></div>
It extracts this:
If you know Ron Kaufman as I do ...
Whether you're the CEO....
Written in a distinctive, ...
My advice? Don't just get one copy
To build a single string, I used:
string extractedText = string.Join("", allText);
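Applied to the question's loop over every review div, the same idea yields one string per review. This is again a sketch with LINQ to XML standing in for HtmlAgilityPack, assuming the page markup is well-formed XML; the helper name ReviewText is hypothetical. It trims each text node and joins with a space instead of "", which is a choice rather than a requirement, so that paragraphs do not run together once the surrounding newlines are trimmed away.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Sketch: LINQ to XML stands in for HtmlAgilityPack; assumes well-formed XML markup.
string pageHtml = @"<div>
<div style=""margin-left:0.5em;"">
<div class=""tiny"" style=""margin-bottom:0.5em;"">
<b><span class=""h3color tiny"">This review is from: </span>You Meet</b>
</div>
If you know Ron Kaufman as I do ...
<br /><br />Whether you're the CEO....
<br /><br />Written in a distinctive, ...
<br /><br />My advice? Don't just get one copy
<div style=""padding-top: 10px; clear: both; width: 100%;""></div>
</div>
<div style=""margin-left:0.5em;"">
<div class=""tiny"" style=""margin-bottom:0.5em;"">
<b><span class=""h3color tiny"">This review is from: </span>My Review</b>
</div>
I became a fan of Ron Kaufman after reading an earlier book of his years ago...
<div style=""padding-top: 10px; clear: both; width: 100%;""></div>
</div>
</div>";

const string SpacerStyle = "padding-top: 10px; clear: both; width: 100%;";

// Hypothetical helper: joins every text node that precedes the spacer div.
string ReviewText(XElement reviewDiv)
{
    XElement spacer = reviewDiv.Elements("div")
        .First(d => (string?)d.Attribute("style") == SpacerStyle);
    IEnumerable<string> parts = spacer.NodesBeforeSelf()
        .OfType<XText>()
        .Select(t => t.Value.Trim())
        .Where(t => t.Length > 0);
    // A space separator keeps the trimmed paragraphs from running together.
    return string.Join(" ", parts);
}

XElement page = XElement.Parse(pageHtml);

// One entry per review div, mirroring review.Add(...) in the question's loop.
List<string> reviews = page.Elements("div")
    .Where(d => (string?)d.Attribute("style") == "margin-left:0.5em;")
    .Select(d => ReviewText(d))
    .ToList();

foreach (string r in reviews)
    Console.WriteLine(r);
```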