I want to scrape all the words from the page at http://search.freefind.com/siteindex.html?id=59478474<r=10240&fwr=0&pid=i&ics=1 and I tried something like this:
HtmlWeb web = new HtmlWeb();
HtmlAgilityPack.HtmlDocument doc = web.Load("http://search.freefind.com/siteindex.html?id=59478474<r=10240&fwr=0&pid=i&ics=1");
HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes("//font[@class='search-index-font']//a");
if (nodes != null)
{
    foreach (HtmlNode n in nodes)
    {
        link = n.InnerText;
        my_link.Add(link);
        MessageBox.Show(link);
    }
}
else
    MessageBox.Show("no word found");
My expected output should look like this:
a
aa
aachhe
aagrashi
aagun
aaj
aam
aanka
aankhi
aar
aashman
abÄddhÅ
abÄddhÅtÄ
abadh
..
..
But it doesn't work: it shows "no word found", which means SelectNodes returns null. How can I get all the text from the <a> tags in this case? What should the XPath in SelectNodes("") be?
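You need to search for the first text node after each <script> tag (not the <a> tag, as you assumed) inside <font class='search-index-font'>. HtmlAgilityPack only parses the HTML it downloads and does not execute JavaScript, so in that markup //font[@class='search-index-font']//a matches nothing and SelectNodes returns null. This XPath expression does the trick: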
//font[@class='search-index-font']/script/following-sibling::text()[1]
And this code:
HtmlWeb web = new HtmlWeb();
HtmlAgilityPack.HtmlDocument doc = web.Load("http://search.freefind.com/siteindex.html?id=59478474<r=10240&fwr=0&pid=i&ics=1");
HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes("//font[@class='search-index-font']/script/following-sibling::text()[1]");
returns the text nodes you need:
a
aa
aachhe
aagrashi
aagun
aaj
aam
aanka
aankhi
aar
...
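To check how this XPath behaves without hitting the live page, here is a minimal, self-contained sketch that runs the same expression with System.Xml's XmlDocument (not HtmlAgilityPack) against a small hand-written XHTML fragment. The fragment is hypothetical: it assumes, as this answer describes, that each word is a text node right after a <script> inside <font class="search-index-font">; the real page's markup may differ, and the names sampleXhtml and ExtractWords are illustrative only. It uses C# top-level statements (C# 9 or later).

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

// Hypothetical markup: assumes each word is a text node right after a
// <script> inside <font class="search-index-font"> (not the real page).
const string sampleXhtml =
    "<html><body><font class=\"search-index-font\">" +
    "<script>/* builds a link */</script>a<br/>" +
    "<script>/* builds a link */</script>aa<br/>" +
    "<script>/* builds a link */</script>aachhe<br/>" +
    "</font></body></html>";

// Runs the answer's XPath and returns the contents of the selected text nodes.
List<string> ExtractWords(string xhtml)
{
    var doc = new XmlDocument();
    doc.LoadXml(xhtml);
    var nodes = doc.SelectNodes(
        "//font[@class='search-index-font']/script/following-sibling::text()[1]");
    var found = new List<string>();
    if (nodes == null)
        return found;
    foreach (XmlNode node in nodes)
        found.Add(node.InnerText.Trim());
    return found;
}

Console.WriteLine(string.Join(", ", ExtractWords(sampleXhtml)));
```

It prints `a, aa, aachhe`. Because `following-sibling` is a forward axis, the predicate `[1]` selects the text node closest to each `<script>`, so every script yields exactly the word that follows it.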
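Alternatively, select the <font class='search-index-font'> elements and read the node that follows each <script> child:
HtmlWeb web = new HtmlWeb();
HtmlAgilityPack.HtmlDocument doc =
    web.Load("http://search.freefind.com/siteindex.html?id=59478474<r=10240&fwr=0&pid=i&ics=1");
HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes("//font[@class='search-index-font']");
// StringBuilder avoids rebuilding the string on every iteration
var words = new System.Text.StringBuilder();
if (nodes != null)
{
    foreach (HtmlNode font in nodes)
    {
        foreach (HtmlNode script in font.Elements("script"))
        {
            // The word is the node right after each <script>; skip it if there is none
            HtmlNode next = script.NextSibling;
            if (next != null)
                words.AppendLine(next.InnerText.Trim());
        }
    }
    MessageBox.Show(words.ToString());
}
else
    MessageBox.Show("no word found");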
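The sibling-walking approach in this answer can also be tried offline. The sketch below is a LINQ to XML (System.Xml.Linq) translation, not HtmlAgilityPack code: XNode.NextNode plays the role of HtmlNode.NextSibling. The markup is a hypothetical stand-in that assumes each word directly follows a <script> inside <font class="search-index-font">; pageXhtml and WordsAfterScripts are illustrative names. It uses C# top-level statements (C# 9 or later).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical stand-in for the page: each word is assumed to be the node
// directly after a <script> inside <font class="search-index-font">.
const string pageXhtml =
    "<html><body><font class=\"search-index-font\">" +
    "<script>/* builds a link */</script>a<br/>" +
    "<script>/* builds a link */</script>aa<br/>" +
    "<script>/* builds a link */</script>aachhe<br/>" +
    "</font></body></html>";

// Walks <font class="search-index-font"> -> <script> -> next node,
// mirroring the Elements("script") / NextSibling loop in the answer.
List<string> WordsAfterScripts(string xhtml)
{
    XDocument doc = XDocument.Parse(xhtml);
    var found = new List<string>();
    IEnumerable<XElement> fonts = doc.Descendants("font")
        .Where(f => f.Attribute("class")?.Value == "search-index-font");
    foreach (XElement font in fonts)
    {
        foreach (XElement script in font.Elements("script"))
        {
            // Keep the next node only when it is a text node
            if (script.NextNode is XText text)
                found.Add(text.Value.Trim());
        }
    }
    return found;
}

foreach (string word in WordsAfterScripts(pageXhtml))
    Console.WriteLine(word);
```

It prints `a`, `aa` and `aachhe` on separate lines. The `is XText` check also skips a `<script>` that is the last child or is followed by an element instead of text.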