HtmlAgilityPack XPath: "This is an unclosed string"

.net c# html html-agility-pack xpath

Question

I need to parse a page and get the inner text of a specific text box on that page. But when I run this code:

HtmlAgilityPack.HtmlDocument infoDoc = new HtmlAgilityPack.HtmlDocument();
HtmlNode.ElementsFlags["br"] = HtmlElementFlag.Closed;
infoDoc.LoadHtml(@ProblemPageSource.ToString());
HtmlNode bodyGlobal = @infoDoc.DocumentNode.SelectSingleNode(".//body").SelectSingleNode(".//div[@class='global']");
HtmlNode globalRight = @bodyGlobal.SelectSingleNode(".//div[@class='globalRight']");
HtmlNode formPanel = @globalRight.SelectSingleNode(".//form").SelectSingleNode(".//div[@class='panel]");
ProblemCode = @formPanel.SelectNodes(".//div")[0].SelectSingleNode(".//textarea").OuterHtml.ToString(); //And here is now NullRefEx :(
codeEditor.Text = @ProblemCode.ToString();

I got an exception thrown from XPath with the message "This is an unclosed string". The source of the page I need to parse is hosted at GitHub Gist.

UPD: Minimalistic version: (image: minimalistic version of the code viewed in the Mozilla DevTools)

Can anybody help me, please?

P.S. Sorry for my bad English!
P.P.S. When I checked the code with the W3C Validator, there were no unclosed tags... but many errors (not my problem :) )
P.P.P.S. Yes, I am using CefSharp to view the pages, and I get the sources from it. So, if it autocorrects the HTML, why is this code broken? :(

Accepted Answer

Besides fixing the unclosed single quote in your ".//div[@class='panel]", you will need to call:

HtmlAgilityPack.HtmlNode.ElementsFlags.Remove("form");

Do this before you create an instance of your HtmlDocument, because form elements are allowed to overlap and are therefore handled differently. After that, you'll be able to deal with forms like any other element.

So the following should do:

 // Treat <form> like any other element; this must run before the document is created and loaded.
 HtmlAgilityPack.HtmlNode.ElementsFlags.Remove("form");
 HtmlNode.ElementsFlags["br"] = HtmlElementFlag.Closed;
 var infoDoc = new HtmlAgilityPack.HtmlDocument();
 infoDoc.LoadHtml(ProblemPageSource.ToString());
 HtmlNode bodyGlobal = infoDoc.DocumentNode.SelectSingleNode("//body//div[@class='global']");
 HtmlNode globalRight = bodyGlobal.SelectSingleNode(".//div[@class='globalRight']");
 // Note the closing single quote after panel.
 HtmlNode formPanel = globalRight.SelectSingleNode(".//form//div[@class='panel']");
 // SelectSingleNode returns null when nothing matches, so a step that fails to match surfaces as a NullReferenceException on the next call.
 var ProblemCode = formPanel.SelectSingleNode(".//div/textarea").OuterHtml;
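
The NullReferenceException mentioned in the question's comment comes from chaining calls on a SelectSingleNode result that was null. As a minimal sketch, assuming the page has the same nesting as in the snippets above, the lookup can be written with the null-conditional operator so that a missing step yields null instead of throwing. The class and method names (TextareaFinder, FindPanelTextarea) are hypothetical and used only for illustration.

```csharp
using System;
using HtmlAgilityPack;

public static class TextareaFinder
{
    // Hypothetical helper: returns the <textarea> inside the form panel,
    // or null if any XPath step fails to match.
    public static HtmlNode FindPanelTextarea(string html)
    {
        // ElementsFlags is a static dictionary shared by all documents.
        // Remove simply returns false if "form" is not present.
        HtmlNode.ElementsFlags.Remove("form");

        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // ?. stops the chain at the first null result instead of
        // throwing a NullReferenceException on the next call.
        return doc.DocumentNode
            .SelectSingleNode("//body//div[@class='global']")
            ?.SelectSingleNode(".//div[@class='globalRight']")
            ?.SelectSingleNode(".//form//div[@class='panel']")
            ?.SelectSingleNode(".//div/textarea");
    }
}
```

A null return value then means that one of the XPath steps did not match the parsed document, which is easier to diagnose than an exception in the middle of a chain.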

Popular Answer

Change SelectSingleNode(".//div[@class='panel]"); to SelectSingleNode(".//div[@class='panel']");. The closing single quote after panel is missing, so the XPath string literal is never terminated.
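
A typo like this can be caught before it reaches the HTML parser. The sketch below compiles an expression with the standard System.Xml.XPath.XPathExpression.Compile method and reports the parser's message on failure. It assumes that HtmlAgilityPack passes the expression to the same .NET XPath parser, which the error text in the question suggests. XPathCheck and TryCompile are hypothetical names introduced for this example.

```csharp
using System;
using System.Xml.XPath;

public static class XPathCheck
{
    // Hypothetical helper: returns true if the expression parses;
    // otherwise returns false and passes back the parser's message.
    public static bool TryCompile(string xpath, out string error)
    {
        try
        {
            // Compile parses the expression without evaluating it.
            XPathExpression.Compile(xpath);
            error = null;
            return true;
        }
        catch (XPathException ex)
        {
            error = ex.Message;
            return false;
        }
    }
}
```

Calling XPathCheck.TryCompile(".//div[@class='panel]", out var message) should return false with the parser's complaint in message, while the corrected expression with the closing quote should return true.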



Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow