Using htmlAgilityPack scraping all inner text from tag

.net c# html-agility-pack web-scraping xpath

Question

I want to remove every word from the http://search.freefind.com/siteindex.html?id=59478474&ltr=10240&fwr=0&pid=i&ics=1 link. I made the following attempt:

HtmlWeb web = new HtmlWeb();
HtmlAgilityPack.HtmlDocument doc = web.Load("http://search.freefind.com/siteindex.html?id=59478474&ltr=10240&fwr=0&pid=i&ics=1");
HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes("//font[@class='search-index-font']//a");

if (nodes != null)
{

    foreach (HtmlNode n in nodes)
    {
         link = n.InnerText;
        my_link.Add(link);
        MessageBox.Show(link);
    }

}
else
    MessageBox.Show("no wordfound ");

My anticipated result should be similar

a    
aa
aachhe
aagrashi
aagun
aaj
aam
aanka
aankhi
aar
aashman
abāddhō
abāddhōtā
abadh
..
..

But that wasn't successful? zzz 21 zzz means that it gives null. In that situation, how can I get all text from the a > tag? Can someone please explain me what goes in SelectNodes("")?

1
0
2/22/2013 2:21:34 PM

Accepted Answer

After that, you must look for the subsequent text node.<script> tag(not <a> like you said, within<font class='search-index-font'> . The following xpath expression will work:

//font[@class='search-index-font']/script/following-sibling::text()[1]

Also, this code:

HtmlWeb web = new HtmlWeb();
HtmlAgilityPack.HtmlDocument doc = web.Load("http://search.freefind.com/siteindex.html?id=59478474&ltr=10240&fwr=0&pid=i&ics=1");
HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes("//font[@class='search-index-font']/script/following-sibling::text()[1]");

returns the text nodes you require:

a
aa
aachhe
aagrashi
aagun
aaj
aam
aanka
aankhi
aar
...
0
2/22/2013 2:55:08 PM

Popular Answer

HtmlWeb web = new HtmlWeb();
HtmlAgilityPack.HtmlDocument doc =
    web.Load("http://search.freefind.com/siteindex.html?id=59478474&ltr=10240&fwr=0&pid=i&ics=1");
HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes("//font[@class='search-index-font']");
string link = string.Empty;
if (nodes != null)
{
    foreach (var item in nodes)
    {
        var value =
        item.Elements("script").ToList();
        foreach (var items in value)
        {
            link += items.NextSibling.InnerText+ "\n";
        }
    }
    MessageBox.Show(link);
}
else
    MessageBox.Show("no wordfound ");


Related Questions





Related

Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow
Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow