I want to get the first link inside this div:
<div id="first-tweet-wrapper">
<blockquote class="tweet" lang="en">
<a href="http://link.com"> <!-- this one -->
text </a>
</blockquote>
<a href="http://link2.net" class="click-tracking" target="_blank"
data-tracking-category="discover" data-tracking-action="tweet-the-tweet">
Tweet it! </a>
</div>
I've tried this code, but it doesn't work:
HtmlDocument doc = new HtmlDocument();
doc.LoadHtml(source);
var div = doc.DocumentNode.SelectSingleNode("//div[@id='first-tweet-wrapper']");
if (div != null)
{
    var links = div.Descendants("a")
                   .Select(a => a.InnerText)
                   .ToList();
}
You need to read the value of the anchor element's href attribute using HtmlAgilityPack's GetAttributeValue method. You can reach the single anchor element by first selecting its parent blockquote element like this:
//div[@id='first-tweet-wrapper']/blockquote[@class='twitter-tweet']
Then fetch the single link inside it. A possible solution could look like this (here the username is facebook, but it works with microsoft too):
// requires: using System; using System.Linq; using System.Net; using HtmlAgilityPack;
try
{
    // download the html source
    var webClient = new WebClient();
    var source = webClient.DownloadString(@"https://discover.twitter.com/first-tweet?username=facebook#facebook");
    HtmlDocument doc = new HtmlDocument();
    doc.LoadHtml(source);
    // select the blockquote that wraps the tweet
    var div = doc.DocumentNode.SelectSingleNode("//div[@id='first-tweet-wrapper']/blockquote[@class='twitter-tweet']");
    if (div != null)
    {
        // there is only one link inside the blockquote
        var link = div.Descendants("a").FirstOrDefault();
        if (link != null)
        {
            // take the value of the href attribute ("" if it is missing)
            var href = link.GetAttributeValue("href", "");
            Console.WriteLine(href);
        }
    }
}
catch (Exception exception)
{
    Console.WriteLine(exception.Message);
}
The output is in this case:
Another possibility is to select the anchor element directly using XPath (as @har07 suggested):
var xpath = @"//div[@id='first-tweet-wrapper']/blockquote[@class='twitter-tweet']/a";
var link = doc.DocumentNode.SelectSingleNode(xpath);
if (link != null)
{
    // take the value of the href attribute
    var href = link.GetAttributeValue("href", "");
    Console.WriteLine(href);
}
The output is the same as above.
Assuming your <div> id is "first-tweet-wrapper" instead of "firt", you can use this XPath query to get the <a> element inside the <blockquote>:
//div[@id='first-tweet-wrapper']/blockquote/a
So your code will look something like this:
var a = doc.DocumentNode
           .SelectSingleNode("//div[@id='first-tweet-wrapper']/blockquote/a");
if (a != null)
{
    var text = a.InnerText;
    var link = a.GetAttributeValue("href", "");
}
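To try the XPath above without downloading anything, here is a minimal sketch that runs it against an inline copy of the markup from the question. It assumes the HtmlAgilityPack NuGet package is referenced; the class name FirstTweetLink and the helper Extract are hypothetical names made up for this example, and the sample HTML is a shortened version of the question's snippet, not the live page.

```csharp
using System;
using HtmlAgilityPack; // assumption: the HtmlAgilityPack NuGet package is installed

public static class FirstTweetLink
{
    // Hypothetical helper: returns the href of the <a> that is a direct child
    // of the blockquote inside div#first-tweet-wrapper, or null if no such
    // element exists in the given HTML.
    public static string Extract(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var a = doc.DocumentNode
                   .SelectSingleNode("//div[@id='first-tweet-wrapper']/blockquote/a");

        // SelectSingleNode returns null when nothing matches
        return a == null ? null : a.GetAttributeValue("href", "");
    }

    public static void Main()
    {
        // Shortened copy of the markup from the question
        const string html =
            "<div id=\"first-tweet-wrapper\">" +
            "<blockquote class=\"tweet\" lang=\"en\">" +
            "<a href=\"http://link.com\">text</a>" +
            "</blockquote>" +
            "<a href=\"http://link2.net\" class=\"click-tracking\">Tweet it!</a>" +
            "</div>";

        Console.WriteLine(Extract(html)); // prints http://link.com
    }
}
```

Because the query has no class predicate on the blockquote, it should keep working whether the class is "tweet" or "twitter-tweet". The second link is skipped because it is a child of the div, not of the blockquote.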