I need to parse a page and get the inner text of a specific textarea on that page. But when I run this code:
HtmlAgilityPack.HtmlDocument infoDoc = new HtmlAgilityPack.HtmlDocument();
HtmlNode.ElementsFlags["br"] = HtmlElementFlag.Closed;
infoDoc.LoadHtml(@ProblemPageSource.ToString());
HtmlNode bodyGlobal = @infoDoc.DocumentNode.SelectSingleNode(".//body").SelectSingleNode(".//div[@class='global']");
HtmlNode globalRight = @bodyGlobal.SelectSingleNode(".//div[@class='globalRight']");
HtmlNode formPanel = @globalRight.SelectSingleNode(".//form").SelectSingleNode(".//div[@class='panel]");
ProblemCode = @formPanel.SelectNodes(".//div")[0].SelectSingleNode(".//textarea").OuterHtml.ToString(); //And here is now NullRefEx :(
codeEditor.Text = @ProblemCode.ToString();
I get an exception thrown from XPath with the message "this string is unclosed". The source of the page I need to parse is hosted at GitHub Gist. UPD: Minimal version: the minimal version of the code as viewed in the Mozilla DevTools. Can anybody help me, please?
P.S. Sorry for my bad English! P.P.S. When I checked the code with the W3C Validator, it reported no unclosed tags, but many other errors (not my problem :) ). P.P.P.S. Yes, I am using CefSharp to view the pages, and I get the page source from it. So if it auto-corrects the HTML, why is this code broken? :(
Besides fixing the unclosed single quote in your ".//div[@class='panel]" XPath,
you will need to call:
HtmlAgilityPack.HtmlNode.ElementsFlags.Remove("form");
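before you create an instance of your HtmlDocument,
because form
elements are allowed to overlap and are therefore handled differently by the parser, which is why a query such as .//form//div can fail to find nodes nested inside the form. Once the flag is removed, you can deal with forms like any other element.
So the following should work: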
HtmlAgilityPack.HtmlNode.ElementsFlags.Remove("form");
HtmlNode.ElementsFlags["br"] = HtmlElementFlag.Closed;
var infoDoc = new HtmlAgilityPack.HtmlDocument();
infoDoc.LoadHtml(@ProblemPageSource.ToString());
HtmlNode bodyGlobal = infoDoc.DocumentNode.SelectSingleNode("//body//div[@class='global']");
HtmlNode globalRight = @bodyGlobal.SelectSingleNode(".//div[@class='globalRight']");
HtmlNode formPanel = @globalRight.SelectSingleNode(".//form//div[@class='panel']");
var ProblemCode = @formPanel.SelectSingleNode(".//div/textarea").OuterHtml.ToString();
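The NullReferenceException mentioned in the question appears whenever one SelectSingleNode call returns null and the next call is chained onto it. As a minimal sketch (not the original poster's code), the same lookup can be written with null checks so a missing node is reported instead of crashing. The class name TextareaLookup, the method FindTextareaText, and the sample markup are hypothetical; only the class names and element nesting are taken from the question.

```csharp
using System;
using HtmlAgilityPack;

public static class TextareaLookup
{
    // Hypothetical helper: returns the textarea's inner text,
    // or null if any node along the path is missing.
    public static string FindTextareaText(string html)
    {
        // Remove the special handling of <form> before the document is created,
        // so the form's children are parsed like those of any other element.
        HtmlNode.ElementsFlags.Remove("form");

        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // One XPath instead of several chained calls; note the closed quote in 'panel'.
        HtmlNode panel = doc.DocumentNode.SelectSingleNode(
            "//div[@class='global']//div[@class='globalRight']//form//div[@class='panel']");
        if (panel == null)
        {
            return null; // SelectSingleNode returns null when nothing matches
        }

        HtmlNode textarea = panel.SelectSingleNode(".//div/textarea");
        return textarea?.InnerText;
    }

    public static void Main()
    {
        // Made-up sample markup that mirrors the structure described in the question.
        string html =
            "<html><body><div class='global'><div class='globalRight'>" +
            "<form><div class='panel'><div><textarea>some code</textarea></div></div></form>" +
            "</div></div></body></html>";

        Console.WriteLine(FindTextareaText(html) ?? "(textarea not found)");
    }
}
```

Checking the result for null at the call site makes it clear which part of the page structure did not match, rather than failing deep inside a chained expression.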
Change SelectSingleNode(".//div[@class='panel]");
to SelectSingleNode(".//div[@class='panel']"); (the closing single quote after panel was missing).