Parse webpage using HtmlAgilityPack and Json

c# html-agility-pack json web-scraping

Question

I am trying to parse the HTML from Hotpads and am confused about how to extract the script tag and map part of it into a JSON object. Using HtmlAgilityPack, I have loaded an example URL, but it breaks where it looks for that tag. I plan on deserializing it afterwards.

Main method

    private static void ParseSite()
    {
        var url = "https://hotpads.com/308-s-9th-dr-ponte-vedra-beach-fl-32082-syw3eh/building";
        var web = new HtmlWeb();
        var doc = web.Load(url);

        var link = doc.DocumentNode.SelectSingleNode("//a[contains(.,'window.__PRELOADED_STATE__')]");

        if (link != null)
        {
            Console.WriteLine(link.InnerText);
        }
        Console.ReadLine();
    }

Script tag:

<script>
 window.__PRELOADED_STATE__ = {{SOME JSON HERE}}
</script>

Model:

public class Contact
{
    public string DATA_MODEL { get; set; }
    public string companyName { get; set; }
    public string contactName { get; set; }
    public string contactPhone { get; set; }
}

Popular Answer

I think you just forgot to replace the 'a' tag with the 'script' tag in your XPath expression. I can't verify in code at the moment, but you can test expressions like this in Chrome DevTools: open Inspect and paste the XPath into the search box of the Elements panel.

I modified it to use the script tag instead, and it worked for me in Chrome DevTools. This is the XPath I tried on the page:

//script[contains(.,'window.__PRELOADED_STATE__')]
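
For the deserializing step the question mentions, here is a minimal sketch of how that XPath might be used from C#. It is untested against the live Hotpads page and rests on several assumptions: the inline sample HTML, the "contact" property path, and the sample values are all hypothetical stand-ins; System.Text.Json is my choice of parser, not something the question specifies; and the brace search assumes the preloaded state is the only (or last) object literal in that script tag.

using System;
using System.Text.Json;
using HtmlAgilityPack;

// Model from the question (trimmed to the fields used below).
public class Contact
{
    public string companyName { get; set; }
    public string contactName { get; set; }
    public string contactPhone { get; set; }
}

public static class PreloadedStateExample
{
    // Marker text taken from the script tag shown in the question.
    private const string Marker = "window.__PRELOADED_STATE__";

    // Returns the object literal assigned to the marker, or null if it is not found.
    public static string ExtractPreloadedJson(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // Same XPath as above: a script tag whose text contains the marker.
        var script = doc.DocumentNode.SelectSingleNode("//script[contains(.,'" + Marker + "')]");
        if (script == null) return null;

        var text = script.InnerText;
        var markerIndex = text.IndexOf(Marker, StringComparison.Ordinal);
        if (markerIndex < 0) return null;

        // Naive slice: first '{' after the marker up to the last '}' in the script.
        var start = text.IndexOf('{', markerIndex);
        var end = text.LastIndexOf('}');
        if (start < 0 || end < start) return null;

        return text.Substring(start, end - start + 1);
    }

    public static void Main()
    {
        // Hypothetical stand-in for the HTML that web.Load(url) would return.
        const string html =
            "<html><body><script>window.__PRELOADED_STATE__ = " +
            "{\"contact\":{\"companyName\":\"Acme Realty\",\"contactPhone\":\"555-0100\"}};" +
            "</script></body></html>";

        var json = ExtractPreloadedJson(html);
        if (json == null)
        {
            Console.WriteLine("Preloaded state not found.");
            return;
        }

        using (var parsed = JsonDocument.Parse(json))
        {
            // "contact" is an assumed location; inspect the real JSON to find the actual path.
            var element = parsed.RootElement.GetProperty("contact");
            var contact = JsonSerializer.Deserialize<Contact>(element.GetRawText());
            Console.WriteLine(contact.companyName);
        }
    }
}

With the real page you would pass doc.DocumentNode.OuterHtml (or run the same SelectSingleNode call directly on the doc returned by web.Load(url)) instead of the sample string. If the script contains other statements after the assignment, the last-brace slice will grab too much, so a stricter cut-off would be needed there.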



Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow