How to identify whether a tweet is an original or a retweet when scraping with HtmlAgilityPack?

c# filter html-agility-pack tweetr web-scraping


I want a user's tweets for data analysis. For that I used the HtmlAgilityPack package to scrape Twitter, and it gives me the top 30 tweets.

I located the tweet-text element and fetched all the tweets, but I want to identify whether each one is an original tweet or a retweet. How can I do that?

I have analysed the HTML. A retweet contains an element with the class tweet-context with-icn. But when I query each tweet for that class, it throws a null reference exception, because not every tweet has that element. So what should I look for, and how, to find out whether a tweet is a retweet?


HtmlAgilityPack.HtmlWeb web = new HtmlAgilityPack.HtmlWeb();
HtmlAgilityPack.HtmlDocument doc = web.Load("");

var TweetsNode= doc.DocumentNode.SelectNodes("//tr[@class='tweet-container']").ToList();

foreach (var item in TweetsNode)
    var tweet = new Tweets

In the code above, I tried to fetch the tweets from Barack Obama's profile. I get the top 30 tweets. How can I recognize which ones are retweets?
Thank you.

7/10/2018 3:01:52 PM

Accepted Answer

Scraping Twitter 101

  1. Get all tweets from the page (each one comes in a handy table, <table class='tweet '>):

    HtmlWeb p = new HtmlWeb();
    var doc = p.Load(@"");
    var nodes = doc.DocumentNode.SelectNodes("//table[@class='tweet  ']");
  2. Look in each node for a <span class='context'>, which indicates that the tweet is a retweet.

    List<Tweet> tweets = new List<Tweet>();
    foreach (var node in nodes)
    {
        bool isRetweet = false;
        // SelectSingleNode returns null when the span is absent, so check before using it.
        var spanNode = node.SelectSingleNode(".//span[@class='context']");
        if (spanNode != null && spanNode.InnerHtml.Contains("retweeted"))
            isRetweet = true;
  3. We also want the message text, so scrape the <div class='tweet-text'> next:

        string msg = string.Empty;
        var msgNode = node.SelectSingleNode(".//div[@class='tweet-text']");
        if (msgNode != null)
            msg = msgNode.InnerText.Trim();
        tweets.Add(new Tweet(msg, isRetweet));
    }

Additionally, the Tweet container class:

class Tweet
{
    public Tweet(string message, bool isRetweet)
    {
        Message = message;
        IsRetweet = isRetweet;
    }

    // Public getters so callers can read the values; setters stay private.
    public string Message { get; private set; }
    public bool IsRetweet { get; private set; }
}
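
Putting the steps above together, here is a minimal self-contained sketch. It assumes the HtmlAgilityPack NuGet package is installed, and it parses a small hand-written HTML string (made up for illustration, not real Twitter markup) instead of loading a live page. The helper name TweetParser.Parse is hypothetical. Note that SelectNodes returns null rather than an empty collection when nothing matches, so that case is guarded too.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack;

class Tweet
{
    public Tweet(string message, bool isRetweet)
    {
        Message = message;
        IsRetweet = isRetweet;
    }

    public string Message { get; private set; }
    public bool IsRetweet { get; private set; }
}

static class TweetParser
{
    // Hypothetical helper: parses tweets from markup shaped like the answer describes.
    public static List<Tweet> Parse(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var tweets = new List<Tweet>();

        // SelectNodes returns null (not an empty list) when nothing matches.
        var nodes = doc.DocumentNode.SelectNodes("//table[@class='tweet  ']");
        if (nodes == null)
            return tweets;

        foreach (var node in nodes)
        {
            // A missing context span simply means "not a retweet".
            var spanNode = node.SelectSingleNode(".//span[@class='context']");
            bool isRetweet = spanNode != null && spanNode.InnerHtml.Contains("retweeted");

            var msgNode = node.SelectSingleNode(".//div[@class='tweet-text']");
            string msg = msgNode != null ? msgNode.InnerText.Trim() : string.Empty;

            tweets.Add(new Tweet(msg, isRetweet));
        }
        return tweets;
    }
}

static class Program
{
    static void Main()
    {
        // Made-up sample markup: one retweet, one original tweet.
        const string html =
            "<table class='tweet  '><tr><td>" +
            "<span class='context'>Someone retweeted</span>" +
            "<div class='tweet-text'> A retweeted message </div>" +
            "</td></tr></table>" +
            "<table class='tweet  '><tr><td>" +
            "<div class='tweet-text'> An original message </div>" +
            "</td></tr></table>";

        foreach (var t in TweetParser.Parse(html))
            Console.WriteLine((t.IsRetweet ? "retweet: " : "original: ") + t.Message);
    }
}
```

The null checks are the part that addresses the exception from the question: the retweet marker is optional, so its absence is treated as data instead of an error.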

As you can tell, this is not really rocket science, but you need to understand the basic principles of XPath and scraping.
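
One XPath basic that matters here: a predicate such as @class='context' compares the whole attribute value as a string, so it stops matching as soon as the element carries an extra class or different whitespace. A common XPath 1.0 idiom matches a single class token instead. The sketch below assumes the HtmlAgilityPack NuGet package; the helper name ClassTokenPredicate and the sample markup are made up for illustration.

```csharp
using System;
using HtmlAgilityPack;

static class XPathClassDemo
{
    // Hypothetical helper: builds an XPath 1.0 predicate that matches one class
    // token regardless of other classes or surrounding whitespace.
    public static string ClassTokenPredicate(string name)
    {
        return "contains(concat(' ', normalize-space(@class), ' '), ' " + name + " ')";
    }

    public static void Main()
    {
        var doc = new HtmlDocument();
        // Made-up markup: the second table carries an extra class.
        doc.LoadHtml(
            "<table class='tweet  '><tr><td>one</td></tr></table>" +
            "<table class='tweet promoted'><tr><td>two</td></tr></table>");

        // Exact comparison against the full attribute value.
        var exact = doc.DocumentNode.SelectNodes("//table[@class='tweet  ']");
        // Token comparison: any table whose class list includes "tweet".
        var token = doc.DocumentNode.SelectNodes(
            "//table[" + ClassTokenPredicate("tweet") + "]");

        // SelectNodes returns null when nothing matches, so guard before counting.
        Console.WriteLine("exact match: " + (exact == null ? 0 : exact.Count));
        Console.WriteLine("token match: " + (token == null ? 0 : token.Count));
    }
}
```

The token form is more tolerant of markup changes, at the cost of a longer expression; whether it is needed depends on how stable the scraped page's class attributes are.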

6/11/2018 2:34:55 PM

Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow