Parse webpage using HtmlAgilityPack and Json

c# html-agility-pack json web-scraping

Question

I'm attempting to extract the script tag from Hotpads' HTML, but I'm not sure how to turn its contents into a JSON object. I loaded an example link using HtmlAgilityPack, and it fails at the line where it searches for the tag. I'm going to deserialize the JSON afterwards.

Main method:

    private static void ParseSite()
    {
        var url = "https://hotpads.com/308-s-9th-dr-ponte-vedra-beach-fl-32082-syw3eh/building";
        var web = new HtmlWeb();
        var doc = web.Load(url);

        var link = doc.DocumentNode.SelectSingleNode("//a[contains(.,'window.__PRELOADED_STATE__')]");

        if (link != null)
        {
            Console.WriteLine(link.InnerText);
        }
        Console.ReadLine();
    }

Script tag:

<script>
 window.__PRELOADED_STATE__ = {{SOME JSON HERE}}
</script>
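Once the script's text is available (for example from the node's InnerText), the JSON still has to be cut out of the JavaScript assignment. A minimal sketch, assuming the value assigned to window.__PRELOADED_STATE__ is strict JSON that starts with '{' (a general JavaScript object literal with unquoted keys would not parse), and using System.Text.Json, which ships with .NET Core 3.0 and later. PreloadedState and ExtractJson are hypothetical names:

```csharp
using System;
using System.Text;
using System.Text.Json;

public static class PreloadedState
{
    private const string Marker = "window.__PRELOADED_STATE__";

    // Returns the raw JSON object assigned to window.__PRELOADED_STATE__,
    // or null if the marker or an opening brace is not found.
    // Throws JsonException if the text after the '{' is not valid JSON.
    public static string? ExtractJson(string scriptText)
    {
        int markerIndex = scriptText.IndexOf(Marker, StringComparison.Ordinal);
        if (markerIndex < 0) return null;

        // Assumption: the assigned value is an object literal, so start at the first '{'.
        int start = scriptText.IndexOf('{', markerIndex + Marker.Length);
        if (start < 0) return null;

        // JsonDocument.ParseValue reads exactly one JSON value from the reader,
        // so a trailing ';' or further statements in the same <script> are not parsed.
        byte[] utf8 = Encoding.UTF8.GetBytes(scriptText.Substring(start));
        var reader = new Utf8JsonReader(utf8);
        using JsonDocument doc = JsonDocument.ParseValue(ref reader);
        return doc.RootElement.GetRawText();
    }
}
```

Reading a single value instead of trimming the string by hand means braces or semicolons inside JSON string values do not confuse the extraction.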

Model:

public class Contact
{
    public string DATA_MODEL { get; set; }
    public string companyName { get; set; }
    public string contactName { get; set; }
    public string contactPhone { get; set; }
}
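Since the layout of the preloaded state is not shown, one hedged way to get a Contact out of it is to parse the JSON, search it for the first object that has a contactName property, and deserialize only that object. This sketch assumes such an object exists somewhere in the state with string values under the same keys as the model; ContactFinder, FindContact and the choice of contactName as the key are hypothetical. It uses System.Text.Json, whose default property matching is case-sensitive, so the C# property names must match the JSON keys exactly:

```csharp
using System;
using System.Text.Json;

// Same shape as the model in the question.
public class Contact
{
    public string DATA_MODEL { get; set; }
    public string companyName { get; set; }
    public string contactName { get; set; }
    public string contactPhone { get; set; }
}

public static class ContactFinder
{
    // Parses the state JSON and deserializes the first object that has keyProperty.
    public static Contact? FindContact(string stateJson, string keyProperty = "contactName")
    {
        using JsonDocument doc = JsonDocument.Parse(stateJson);
        JsonElement? match = FindObjectWith(doc.RootElement, keyProperty);
        // Deserialize before doc is disposed: JsonElement values point into the document.
        return match is JsonElement element
            ? JsonSerializer.Deserialize<Contact>(element.GetRawText())
            : null;
    }

    // Depth-first search through nested objects and arrays.
    private static JsonElement? FindObjectWith(JsonElement element, string keyProperty)
    {
        switch (element.ValueKind)
        {
            case JsonValueKind.Object:
                if (element.TryGetProperty(keyProperty, out _)) return element;
                foreach (JsonProperty property in element.EnumerateObject())
                {
                    JsonElement? found = FindObjectWith(property.Value, keyProperty);
                    if (found.HasValue) return found;
                }
                break;
            case JsonValueKind.Array:
                foreach (JsonElement item in element.EnumerateArray())
                {
                    JsonElement? found = FindObjectWith(item, keyProperty);
                    if (found.HasValue) return found;
                }
                break;
        }
        return null;
    }
}
```

With Json.NET instead of System.Text.Json, the final step would be JsonConvert.DeserializeObject<Contact>(json) on the located object's text.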
6/12/2018 2:11:11 AM

Popular Answer

I believe you just forgot to change the 'a' tag in your XPath expression to the 'script' tag. I can't test it in code right now, but you can try XPath expressions in Chrome DevTools: open Inspect and type the expression into the search box of the Elements panel (Ctrl+F).

After changing it to use the script tag, it worked for me in Chrome DevTools. This is the XPath I tested on the site:

//script[contains(.,'window.__PRELOADED_STATE__')]
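Putting this XPath into the question's lookup could look like the sketch below. It is untested against the live Hotpads page (the answer only checked the expression in Chrome), and StateScriptFinder and FindStateScript are hypothetical names. The HtmlDocument overload makes it possible to try the expression on a local HTML string loaded with LoadHtml:

```csharp
using System;
using HtmlAgilityPack;

public static class StateScriptFinder
{
    // XPath from the answer: match <script> elements (not <a>) whose text
    // contains the state variable name.
    public const string StateXPath = "//script[contains(.,'window.__PRELOADED_STATE__')]";

    // Returns the text of the first matching <script>, or null if none matches.
    public static string? FindStateScript(HtmlDocument doc)
    {
        HtmlNode? script = doc.DocumentNode.SelectSingleNode(StateXPath);
        return script?.InnerText;
    }

    // Same lookup against a live page, as in the question's ParseSite method.
    public static string? FindStateScript(string url)
    {
        HtmlDocument doc = new HtmlWeb().Load(url);
        return FindStateScript(doc);
    }
}
```

Calling StateScriptFinder.FindStateScript(url) with the URL from the question should return the script's text, or null if the page no longer defines that variable.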
6/14/2018 9:58:17 PM



Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow