I am trying to use XPath to extract a link from the page at the URL below.
    string url = "http://www.album-cover-art.org/search.php?q=Ruin+-+Live+Album+Version+Lamb+of+God";
My code:
    HtmlAgilityPack.HtmlWeb web = new HtmlAgilityPack.HtmlWeb();
    HtmlAgilityPack.HtmlDocument htmlDoc = new HtmlAgilityPack.HtmlDocument();
    htmlDoc = web.Load(url); //Exception generated here Line 23
    if (htmlDoc.DocumentNode != null)
    {
        HtmlNode linkNode = htmlDoc.DocumentNode.SelectSingleNode(".//*[@id='related_search_row']/img/@src");
        if (linkNode != null)
            Console.WriteLine(linkNode.InnerText);
    }
The code above compiles fine, but it throws an exception when I run it.
Unhandled Exception: System.NullReferenceException: Object reference not set to an instance of an object.
Full stack trace:
System.NullReferenceException: Object reference not set to an instance of an object.
at HtmlAgilityPack.HtmlDocument.ReadDocumentEncoding(HtmlNode node) in C:\Source\htmlagilitypack\Trunk\HtmlAgilityPack\HtmlDocument.cs:line 1916
at HtmlAgilityPack.HtmlDocument.PushNodeEnd(Int32 index, Boolean close) in C:\Source\htmlagilitypack\Trunk\HtmlAgilityPack\HtmlDocument.cs:line 1805
at HtmlAgilityPack.HtmlDocument.Parse() in C:\Source\htmlagilitypack\Trunk\HtmlAgilityPack\HtmlDocument.cs:line 1468
at HtmlAgilityPack.HtmlDocument.Load(TextReader reader) in C:\Source\htmlagilitypack\Trunk\HtmlAgilityPack\HtmlDocument.cs:line 769
at HtmlAgilityPack.HtmlWeb.Get(Uri uri, String method, String path, HtmlDocument doc, IWebProxy proxy, ICredentials creds) in C:\Source\htmlagilitypack\Trunk\HtmlAgilityPack\HtmlWeb.cs:line 1515
at HtmlAgilityPack.HtmlWeb.LoadUrl(Uri uri, String method, WebProxy proxy, NetworkCredential creds) in C:\Source\htmlagilitypack\Trunk\HtmlAgilityPack\HtmlWeb.cs:line 1563
at HtmlAgilityPack.HtmlWeb.Load(String url, String method) in C:\Source\htmlagilitypack\Trunk\HtmlAgilityPack\HtmlWeb.cs:line 1149
at HtmlAgilityPack.HtmlWeb.Load(String url) in C:\Source\htmlagilitypack\Trunk\HtmlAgilityPack\HtmlWeb.cs:line 1107
at ScreenScrapping.Program.Main(String[] args) in c:\Users\ranveer\csharp\ScreenScrapping\ScreenScrapping\Program.cs:line 23
So my question is: why am I getting this exception?
This is a bug in HtmlAgilityPack. The document you are trying to parse contains <meta http-equiv="Content-Type" content="text/html; charset=iso-utf-8">, and its charset value (iso-utf-8) is not parsed as a valid encoding name by the AgilityPack. As Simon Mourier stated, this is a bug that was introduced in 1.4.0.0.
To work around it, load the document from a stream yourself and set the encoding explicitly, like this:
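    // Requires: using System.Net; using System.Text; using HtmlAgilityPack;
    var htmlDoc = new HtmlDocument();
    // Do not let the parser read the encoding from the document itself.
    htmlDoc.OptionReadEncoding = false;
    var request = (HttpWebRequest)WebRequest.Create(url);
    request.Method = "GET";
    using (var response = (HttpWebResponse)request.GetResponse())
    {
        using (var stream = response.GetResponseStream())
        {
            // Decode the response explicitly as UTF-8.
            htmlDoc.Load(stream, Encoding.UTF8);
        }
    }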
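If you would rather not hard-code UTF-8 for every page, one option is to validate a declared charset name yourself and fall back only when .NET does not recognize it. The following is a minimal sketch, not part of HtmlAgilityPack: `CharsetHelper` and `ResolveOrDefault` are hypothetical names introduced here for illustration, and it assumes that `Encoding.GetEncoding` rejects a name such as `iso-utf-8` with an `ArgumentException`.

```csharp
using System;
using System.Text;

// Hypothetical helper, not part of HtmlAgilityPack.
public static class CharsetHelper
{
    // Returns the Encoding for a declared charset name, or the fallback
    // when the name is empty or not recognized by .NET.
    public static Encoding ResolveOrDefault(string charsetName, Encoding fallback)
    {
        if (string.IsNullOrWhiteSpace(charsetName))
            return fallback;

        try
        {
            return Encoding.GetEncoding(charsetName);
        }
        catch (ArgumentException)
        {
            // Encoding.GetEncoding throws ArgumentException for names
            // it does not support; use the fallback in that case.
            return fallback;
        }
    }
}
```

With this in place, the hard-coded `Encoding.UTF8` in the workaround could be replaced by something like `CharsetHelper.ResolveOrDefault(response.CharacterSet, Encoding.UTF8)`, where `response.CharacterSet` is the charset reported by the HTTP response. Whether that header value is trustworthy for a given site is something you would need to check.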