HTML Agility Pack - Can only load XML files from the file system, not from the internet.

.net c# html-agility-pack scrape

Question

I've previously used HAP to download and parse XHTML pages from the internet without problems. Now, however, I'm trying to load and parse XML documents. HAP will load XML files from my file system, such as "C:\xml\MyXml.xml", but not from a web address (http://www.web.com/doc.xml). Fiddler shows that HAP really does request the XML document over the web and that the server responds with it, but nothing is parsed after that point: the resulting HtmlDocument has no ChildNodes or any other elements. When the same file is loaded from the file system, it is parsed into an HtmlDocument correctly.

Any thoughts?

1/10/2011 10:38:00 AM

Accepted Answer

If you are only working with XML (and not (X)HTML), you don't need HAP at all. .NET has comprehensive XML processing built in:

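using System.IO;
using System.Net;
using System.Xml;

string postUrl = "http://www.web.com/doc.xml";
string result;
// Dispose the response and the reader once the body has been read.
using (WebResponse webResponse = WebRequest.Create(postUrl).GetResponse())
using (StreamReader sr = new StreamReader(webResponse.GetResponseStream()))
{
    result = sr.ReadToEnd().Trim();
}
XmlDocument xdoc = new XmlDocument();
xdoc.LoadXml(result); // parse the downloaded text into an XML DOM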
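To contrast with the question (where HAP's document ended up with no ChildNodes), here is a minimal sketch of walking the DOM once LoadXml has parsed the text. It assumes a hypothetical in-memory XML string in place of the downloaded response; the element and attribute names (catalog, book, id, title) are invented for illustration only.

```csharp
using System;
using System.Xml;

// Hypothetical response body; in practice this is the text read from the web response.
const string responseText =
    "<catalog><book id='1'><title>First</title></book><book id='2'><title>Second</title></book></catalog>";

var xdoc = new XmlDocument();
xdoc.LoadXml(responseText);

// The DOM is populated: DocumentElement is <catalog>, and its ChildNodes are the two <book> elements.
XmlNodeList books = xdoc.SelectNodes("/catalog/book");
foreach (XmlNode book in books)
{
    // Attributes and child elements are reachable by name.
    string id = book.Attributes["id"].Value;
    string title = book.SelectSingleNode("title").InnerText;
    Console.WriteLine($"{id}: {title}");
}
```

XPath via SelectNodes is used here for brevity; iterating xdoc.DocumentElement.ChildNodes directly would visit the same two elements.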
1/10/2011 11:53:19 AM

Popular Answer

Since the XML you are trying to read specifies an XSL stylesheet that transforms it into (X)HTML, I assume you are using HAP because you want to manipulate that HTML in some way.

If that isn't the case and you are only interested in the raw XML structure, use the built-in XmlDocument class and the System.Xml namespace in .NET, as Sebastian's answer suggests.

If you do need to modify the HTML structure of such a document, you must download the XML and apply the XSLT yourself with System.Xml to produce the resulting HTML, before trying to parse it with HAP.
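The XSLT step can be done with System.Xml.Xsl.XslCompiledTransform. This is a minimal sketch, not the answerer's code: both the XML and the stylesheet are hypothetical in-memory strings, whereas in the real case you would download the XML and fetch the stylesheet it references (typically through an xml-stylesheet processing instruction). The helper name TransformToHtml is also hypothetical.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Xsl;

// Hypothetical sample data standing in for the downloaded XML and its stylesheet.
const string sampleXml = "<items><item>Alpha</item><item>Beta</item></items>";
const string sampleXslt = @"<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>
  <xsl:output method='html' indent='no'/>
  <xsl:template match='/items'>
    <html><body><ul>
      <xsl:for-each select='item'><li><xsl:value-of select='.'/></li></xsl:for-each>
    </ul></body></html>
  </xsl:template>
</xsl:stylesheet>";

string html = TransformToHtml(sampleXml, sampleXslt);
Console.WriteLine(html);

// Applies an XSLT 1.0 stylesheet to an XML document (both given as strings)
// and returns the transformed output as a string.
static string TransformToHtml(string xml, string xslt)
{
    var transform = new XslCompiledTransform();
    using (var xsltReader = XmlReader.Create(new StringReader(xslt)))
    {
        transform.Load(xsltReader);
    }

    using var xmlReader = XmlReader.Create(new StringReader(xml));
    using var output = new StringWriter();
    transform.Transform(xmlReader, null, output); // null: no XSLT parameters
    return output.ToString();
}
```

The resulting string is ordinary HTML, so it can then be passed to HAP's HtmlDocument.LoadHtml (for example `htmlDoc.LoadHtml(html)`, where htmlDoc is a hypothetical HtmlDocument instance) and edited as usual.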




Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow