Parsing untagged text in HTML with "HTML Agility Pack" in C#

.net c# html html-agility-pack web-scraping

Question

Using HTML Agility Pack, I want to extract text from an HTML document that is not wrapped in any tag of its own. The HTML below is an example of the structure I need to handle, and the text after the last div is the text I want to extract (it begins with "I am selling..." and ends with "...services or offers").

<div class="mapbox">
    <div id="map" class="viewposting" data-latitude="32.965732" data-longitude="-96.882528" data-accuracy="22"></div>
    
    <p class="mapaddress">
        <small>
        (<a target="_blank" href="https://maps.google.com/maps/preview/@32.965732,-96.882528,16z">google map</a>)
        </small>
    </p>
</div>
    <p class="attrgroup">

            <span><b>2012 jeep grand cherokee laredo</b></span>
            <br>
    </p>
    <p class="attrgroup">
            <span>VIN: <b>ask me</b></span>
            <br>
            <span>condition: <b>excellent</b></span>
            <br>
            <span>cylinders: <b>6 cylinders</b></span>
            <br>
            <span>drive: <b>rwd</b></span>
            <br>
            <span>fuel: <b>gas</b></span>
            <br>
            <span>odometer: <b>98000</b></span>
            <br>
            <span>title status: <b>clean</b></span>
            <br>

            <span>transmission: <b>automatic</b></span>
            <br>

    </p>
    
        <div class="print-information print-qrcode-container">
            <p class="print-qrcode-label">QR Code Link to This Post</p>
            <div class="print-qrcode" data-location="east"></div>
        </div>
I am selling my 2012 Jeep Grand Cherokee. The Jeep runs and drives great. Zero issues. Always been well maintained and serviced on time. Very dependable car has never left me stranded. Very healthy. Everything works like it should. This Grand Cherokee would make a great family car or First car.<br>
<br>
*3.6 V6 <br>
*Automatic Transmission <br>
*98,000 Original Miles<br>
*Leather and Heated Seats<br>
*Navigation<br>
*Back Up Camera <br>
*Good Tires<br>
*Cold A/C Hot Heater <br>
*Clean Texas Title<br>
*Clean Carfax<br>
Much More!!<br>
<br>
Call or Text me for anymore information. <br>
 <a href="/fb/dal/cto/6620220745" class="showcontact" title="click to show contact info" rel="nofollow">show contact info</a>
    
            <li>do NOT contact me with unsolicited services or offers</li>

Can anyone tell me how to extract that text using HTML Agility Pack in .NET?

Thanks in advance

Accepted Answer

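After loading the document, use an XPath expression to select the text node that follows a specific element. Here, following-sibling::text()[1] picks the first text node after the QR-code div.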

// doc is an HtmlAgilityPack.HtmlDocument that has already been loaded.
const string xpath = "//div[@class='print-information print-qrcode-container']/following-sibling::text()[1]";
string text = doc.DocumentNode.SelectSingleNode(xpath).InnerText;

returns:

I am selling my 2012 Jeep Grand Cherokee. The Jeep runs and drives great. Zero issues. Always been well maintained and serviced on time. Very dependable car has never left me stranded. Very healthy. Everything works like it should. This Grand Cherokee would make a great family car or First car.
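
The question asks for everything from "I am selling..." through "...services or offers", whereas text()[1] yields only the first text node. The following is a minimal sketch of one way to gather the rest, assuming the wanted content is every sibling that follows the QR-code div. The inline HTML is a shortened stand-in for the page in the question, and on a real page you would probably need a stop condition (for example, a known closing element), which is not shown here.

```csharp
using System;
using System.Text;
using HtmlAgilityPack;

class Program
{
    static void Main()
    {
        // Shortened stand-in for the page in the question (assumption for this demo).
        const string html = @"
<div class=""print-information print-qrcode-container"">
    <p class=""print-qrcode-label"">QR Code Link to This Post</p>
</div>
I am selling my 2012 Jeep Grand Cherokee.<br>
*3.6 V6 <br>
Call or Text me for anymore information. <br>
<a href=""/fb/dal/cto/6620220745"" class=""showcontact"">show contact info</a>
<li>do NOT contact me with unsolicited services or offers</li>";

        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // Locate the element that precedes the untagged text.
        HtmlNode anchor = doc.DocumentNode.SelectSingleNode(
            "//div[@class='print-information print-qrcode-container']");
        if (anchor == null)
        {
            Console.WriteLine("Anchor div not found.");
            return;
        }

        // Walk every following sibling: <br> becomes a line break,
        // comments are skipped, and anything else contributes its text.
        var sb = new StringBuilder();
        for (HtmlNode node = anchor.NextSibling; node != null; node = node.NextSibling)
        {
            if (node.NodeType == HtmlNodeType.Comment)
                continue;

            if (node.Name == "br")
                sb.AppendLine();
            else
                sb.Append(HtmlEntity.DeEntitize(node.InnerText));
        }

        Console.WriteLine(sb.ToString().Trim());
    }
}
```

The source markup already has a newline after each br, so the collected text may contain blank lines or stray spaces; normalize whitespace afterwards if that matters for your use.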

and visca catalunya!



Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow