get title tag by html agility pack

c# html-agility-pack search-engine

Question


I'm attempting to utilize the HTML Agility Pack to get results' URLs and titles.
I've got this code.

using HtmlAgilityPack;
using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.Data;
using System.Drawing;
using System.IO;
using System.Linq;
using System.Net;
using System.ServiceModel.Syndication;
using System.Text;
using System.Threading.Tasks;
using System.Windows.Forms;
using System.Xml;

namespace Search
{
public partial class Form1 : Form
{
    // load snippet
    HtmlAgilityPack.HtmlDocument htmlSnippet = new HtmlAgilityPack.HtmlDocument();

    public Form1()
    {
        InitializeComponent();
    }

    private void btn1_Click(object sender, EventArgs e)
    {
        listBox1.Items.Clear();
        StringBuilder sb = new StringBuilder();
        byte[] ResultsBuffer = new byte[8192];
        string SearchResults = "http://google.com/search?q=" + txtKeyWords.Text.Trim();
        HttpWebRequest request = (HttpWebRequest)WebRequest.Create(SearchResults);
        HttpWebResponse response = (HttpWebResponse)request.GetResponse();

        Stream resStream = response.GetResponseStream();
        string tempString = null;
        int count = 0;
        do
        {
            count = resStream.Read(ResultsBuffer, 0, ResultsBuffer.Length);
            if (count != 0)
            {
                tempString = Encoding.ASCII.GetString(ResultsBuffer, 0, count);
                sb.Append(tempString);
            }
        }

        while (count > 0);
        string sbb = sb.ToString();

        HtmlAgilityPack.HtmlDocument html = new HtmlAgilityPack.HtmlDocument();
        html.OptionOutputAsXml = true;
        html.LoadHtml(sbb);
        HtmlNode doc = html.DocumentNode;

        foreach (HtmlNode link in doc.SelectNodes("//a[@href]"))
        {
            //HtmlAttribute att = link.Attributes["href"];
            string hrefValue = link.GetAttributeValue("href", string.Empty);
            if (!hrefValue.ToString().ToUpper().Contains("GOOGLE") && hrefValue.ToString().Contains("/url?q=") && hrefValue.ToString().ToUpper().Contains("HTTP://"))
            {
                int index = hrefValue.IndexOf("&");
                if (index > 0)
                {
                    hrefValue = hrefValue.Substring(0, index);
                    listBox1.Items.Add(hrefValue.Replace("/url?q=", ""));
                }
            }
        }
    }
}

}

This code produces links in response to a search, but I also want to obtain the title tag for each link. How can I do that?
Can somebody assist me?

1
2
5/22/2016 10:04:49 AM

Accepted Answer

If by "title" you mean the text that appears when a link is clicked, you may get it fromInnerText attribute of eachHtmlNode link :

foreach (HtmlNode link in doc.SelectNodes("//a[@href]"))
{
    .....
    var title = link.InnerText.Trim();
}
2
5/22/2016 9:47:09 AM

Popular Answer

Links and the titles of the links will be produced from this. This won't give you the link anchor text back. Only the link's title will be shown to you.

foreach (HtmlNode link in doc.SelectNodes("//a[@href]"))
{
   HtmlWeb htmlWeb = new HtmlWeb();
   HtmlAgilityPack.HtmlDocument htmlDocument = htmlWeb.Load(link);
   var title = htmlDocument.DocumentNode.SelectSingleNode("html/head/title").InnerText;
}


Related Questions





Related

Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow
Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow