I am working with some HTML content. It is formatted like this:
<li>
<ul>
<li>Test1</li>
<li>Test2</li>
</ul>
Odd string 1
<ul>
<li>Test3</li>
<li>Test4</li>
</ul>
Odd string 2
<ul>
<li>Test5</li>
<li>Test6</li>
</ul>
</li>
There can be multiple "odd strings" in the HTML content, and I want to collect all of them in an array. Is there an easy way to do this? (I am using C# and HtmlAgilityPack.)
Select the ul elements and take each one's next sibling node, which will be your text:
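using System;
using System.Linq;
using HtmlAgilityPack;

HtmlDocument html = new HtmlDocument();
html.Load(html_file); // or html.LoadHtml(htmlString) for markup already in memory
// For each <ul>, inspect the node that immediately follows it
string[] odds = (from ul in html.DocumentNode.Descendants("ul")
                 let sibling = ul.NextSibling
                 where sibling != null &&
                       sibling.NodeType == HtmlNodeType.Text && // check if text node
                       !String.IsNullOrWhiteSpace(sibling.InnerHtml) // skip whitespace-only nodes
                 select sibling.InnerHtml.Trim()).ToArray();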
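The same selection can also be written as an XPath query, which HtmlAgilityPack supports through SelectNodes. This is a sketch rather than a drop-in replacement: the class and method names are hypothetical, and following-sibling::text()[1] picks the first text node after each ul even if a comment or another element sits in between, which is slightly looser than checking NextSibling directly.

```csharp
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

public static class OddStringXPath
{
    // Hypothetical helper: the NextSibling idea expressed as XPath.
    // For every <ul>, take the first text node among its following siblings.
    public static string[] Extract(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html); // parse from a string instead of a file

        // By default SelectNodes returns null (not an empty collection) when nothing matches
        IEnumerable<HtmlNode> nodes =
            doc.DocumentNode.SelectNodes("//ul/following-sibling::text()[1]")
            ?? Enumerable.Empty<HtmlNode>();

        return nodes
            .Select(n => n.InnerText.Trim())
            .Where(s => s.Length > 0) // drop whitespace-only text nodes
            .ToArray();
    }
}
```

The null check matters: without it, a document with no ul elements would throw a NullReferenceException instead of returning an empty array.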
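Alternatively, a regular expression works for this fixed layout, although it is more brittle than using the parser. Something like: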
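using System.Collections.Generic;
using System.Text.RegularExpressions;

// Capture the text between a closing </ul> and the next opening <ul>
MatchCollection matches = Regex.Matches(HTMLString, "</ul>(.*?)<ul>", RegexOptions.Singleline);
List<string> oddStrings = new List<string>();
foreach (Match match in matches)
{
    // Groups[1] holds only the captured text, without the surrounding tags
    oddStrings.Add(match.Groups[1].Value.Trim());
}
string[] result = oddStrings.ToArray();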
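A self-contained version of the regex approach, as a sketch: the class name OddStringExtractor is hypothetical, and it adds case-insensitive matching so that </UL> is handled too. It assumes every odd string sits between a closing </ul> and the next opening <ul>.

```csharp
using System.Linq;
using System.Text.RegularExpressions;

public static class OddStringExtractor
{
    // Hypothetical helper wrapping the regex approach.
    // Singleline lets '.' match newlines; IgnoreCase (an addition) also accepts </UL> and <UL>.
    private static readonly Regex Between =
        new Regex("</ul>(.*?)<ul>", RegexOptions.Singleline | RegexOptions.IgnoreCase);

    public static string[] Extract(string html)
    {
        return Between.Matches(html)
            .Cast<Match>()                         // MatchCollection -> IEnumerable<Match>
            .Select(m => m.Groups[1].Value.Trim()) // captured text only, trimmed
            .Where(s => s.Length > 0)              // skip whitespace between adjacent lists
            .ToArray();
    }
}
```

For the sample markup in the question, Extract returns "Odd string 1" and "Odd string 2". Any text after the last </ul> would be missed, because no <ul> follows it.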