Using htmlAgilityPack scraping all inner text from tag

.net c# html-agility-pack web-scraping xpath

Question

I want to scrap all the word from the link http://search.freefind.com/siteindex.html?id=59478474&ltr=10240&fwr=0&pid=i&ics=1 I tried something like this:

HtmlWeb web = new HtmlWeb();
HtmlAgilityPack.HtmlDocument doc = web.Load("http://search.freefind.com/siteindex.html?id=59478474&ltr=10240&fwr=0&pid=i&ics=1");
HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes("//font[@class='search-index-font']//a");

if (nodes != null)
{

    foreach (HtmlNode n in nodes)
    {
         link = n.InnerText;
        my_link.Add(link);
        MessageBox.Show(link);
    }

}
else
    MessageBox.Show("no wordfound ");

My expexted output should like

a    
aa
aachhe
aagrashi
aagun
aaj
aam
aanka
aankhi
aar
aashman
abāddhō
abāddhōtā
abadh
..
..

But it didn't work??It shows "no word found" Means it returns null.How can i get all text from < a > tag in that case??? Can anyone tell me What should be in SelectNodes("")???

Accepted Answer

You need to search for the next text node after <script> tag(not <a> tag as you said), inside <font class='search-index-font'>. This xpath expression will do the trick:

//font[@class='search-index-font']/script/following-sibling::text()[1]

And this code:

HtmlWeb web = new HtmlWeb();
HtmlAgilityPack.HtmlDocument doc = web.Load("http://search.freefind.com/siteindex.html?id=59478474&ltr=10240&fwr=0&pid=i&ics=1");
HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes("//font[@class='search-index-font']/script/following-sibling::text()[1]");

will return text nodes you need:

a
aa
aachhe
aagrashi
aagun
aaj
aam
aanka
aankhi
aar
...

Popular Answer

HtmlWeb web = new HtmlWeb();
HtmlAgilityPack.HtmlDocument doc =
    web.Load("http://search.freefind.com/siteindex.html?id=59478474&ltr=10240&fwr=0&pid=i&ics=1");
HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes("//font[@class='search-index-font']");
string link = string.Empty;
if (nodes != null)
{
    foreach (var item in nodes)
    {
        var value =
        item.Elements("script").ToList();
        foreach (var items in value)
        {
            link += items.NextSibling.InnerText+ "\n";
        }
    }
    MessageBox.Show(link);
}
else
    MessageBox.Show("no wordfound ");


Related

Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow
Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow