Select "src" value with XPath in HtmlAgilityPack

c# html-agility-pack xpath

Question

I'm developing a crawling engine. My program crawls websites using XPath with HtmlAgilityPack. I need to get the src attribute of some images directly. My simple code below is not working correctly. Thanks in advance!

PS: Please ignore the " character problem; the XPath patterns are supplied by a database.

Agility.DocumentNode.SelectSingleNode("//img[@id="product_photo"]/@src");

And this is the line I need to crawl (the *...* part marks the block to extract):

<img id="product_photo" src="*/images/thumb/4400/10280/st.jpg*">

Some pages provide the image in meta tags, so .Attributes["src"] won't work.

UPDATE: You can see my query and result here.

Popular Answer

You can't get the value of "src", or of any other attribute, directly from:

Agility.DocumentNode.SelectSingleNode(yourXpath);

That is because SelectSingleNode() in HtmlAgilityPack returns an HtmlNode, never an attribute value, so an XPath ending in /@src gives you the element that owns the attribute. Read the value from the returned node instead:

string s = Agility.DocumentNode.SelectSingleNode(yourXpath).Attributes["src"].Value;

Alternatively, run a Regex over the node's OuterHtml after parsing to pull out just the "src" without the rest of the markup.
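
Since the question mentions that the XPath patterns come from a database and sometimes point at meta tags, here is a hedged sketch of one alternative that keeps the attribute step inside the XPath itself. It relies on HtmlDocument.CreateNavigator(), which returns an XPathNavigator whose SelectSingleNode() can land on an attribute node. The sample HTML, the og:image meta tag, and the helper name ExtractValue are assumptions made for illustration, and the sketch assumes a HtmlAgilityPack build with XPath support.

```csharp
using System;
using System.Xml.XPath;
using HtmlAgilityPack;

public static class SrcExtractor
{
    // Hypothetical helper: returns the string value of whatever the XPath
    // selects (an attribute or an element), or null when nothing matches.
    public static string ExtractValue(HtmlDocument doc, string xpath)
    {
        // The navigator evaluates XPath over the parsed HTML tree and,
        // unlike HtmlNode.SelectSingleNode, can stop on an attribute.
        XPathNavigator navigator = doc.CreateNavigator();
        XPathNavigator match = navigator.SelectSingleNode(xpath);
        return match?.Value;
    }

    public static void Main()
    {
        var doc = new HtmlDocument();
        // Assumed sample markup: one <img> and one <meta> carrying an image URL.
        doc.LoadHtml(
            "<html><head><meta property='og:image' content='/images/meta.jpg'></head>" +
            "<body><img id='product_photo' src='/images/thumb/4400/10280/st.jpg'></body></html>");

        // The same helper serves both patterns, because the attribute name
        // lives in the XPath rather than in the C# code.
        Console.WriteLine(ExtractValue(doc, "//img[@id='product_photo']/@src"));
        Console.WriteLine(ExtractValue(doc, "//meta[@property='og:image']/@content"));
    }
}
```

The point of this design is that the crawler never needs to know which attribute a stored pattern targets; a pattern that matches nothing simply yields null, which the caller should check before use.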



Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow
Is this KB legal? Yes, learn why