Visual Basic HTML Agility Pack How to Grab Images from Table Cells

extract html-agility-pack image vb.net

Question

Hope someone can help, as I have spent ages trying to figure this out. I am using the HTML Agility Pack to extract data from a table and put it in a DataGridView (the grid is not important; I am just using it to check that the extraction works). The first column of the table contains thumbnail pictures. I can extract all the text using the code below, but I don't know how to extract the images from the first column. Can anyone help?

PS: I saved the web page as an MHT file because I couldn't extract any data from it directly; I believe it's something to do with the site security/credentials. I don't know whether that has made things easier or harder for myself.

Private Sub Button1_Click(ByVal sender As Object, ByVal e As EventArgs) Handles Button1.Click

    ' original code ***************************************
    Dim Web As New HtmlAgilityPack.HtmlWeb
    Dim Doc As New HtmlAgilityPack.HtmlDocument
    Dim RowCount As Integer = 1



    '   Doc = Web.Load("https://firefly.cardinalnewman.ac.uk/home/my")

    Doc.Load("E:\table.mht")


    Dim tables As HtmlAgilityPack.HtmlNodeCollection = Doc.DocumentNode.SelectNodes("//table")
    Dim img As HtmlAgilityPack.HtmlNodeCollection = Doc.DocumentNode.SelectNodes("//table")
    Dim Links As HtmlAgilityPack.HtmlNodeCollection = Doc.DocumentNode.SelectNodes("//table")
    Dim rows As HtmlAgilityPack.HtmlNodeCollection = tables(0).SelectNodes("//*[@id='HomeMyStudents']")


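    ' Note: ['RowCount'] is a string literal, not the VB variable; as an XPath predicate it is always true, so tr['RowCount'] matches every row.
    For Each table As HtmlAgilityPack.HtmlNode In Doc.DocumentNode.SelectNodes("//*[@id='HomeMyStudents']/tbody/tr['RowCount']/td[1]")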
        RowCount = RowCount + 1

        DGV.Rows.Add(Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing)

    Next
    RowCount = 0
    For Each table As HtmlAgilityPack.HtmlNode In Doc.DocumentNode.SelectNodes("//*[@id='HomeMyStudents']/tbody/tr['RowCount']/td[2]")
        RowCount = RowCount + 1
        '     DGV.Rows(RowCount).Cells(1).Value = somehow insert image
        ' this is the section where I need to grab the image in each cell and either save or place in my datagrid


    Next
    RowCount = 0
    For Each table As HtmlAgilityPack.HtmlNode In Doc.DocumentNode.SelectNodes("//*[@id='HomeMyStudents']/tbody/tr['RowCount']/td[3]")
        RowCount = RowCount + 1
        DGV.Rows(RowCount).Cells(2).Value = table.InnerText
    Next
    RowCount = 0
    For Each table As HtmlAgilityPack.HtmlNode In Doc.DocumentNode.SelectNodes("//*[@id='HomeMyStudents']/tbody/tr['RowCount']/td[4]")
        RowCount = RowCount + 1
        DGV.Rows(RowCount).Cells(3).Value = table.InnerText
    Next
    RowCount = 0
    For Each table As HtmlAgilityPack.HtmlNode In Doc.DocumentNode.SelectNodes("//*[@id='HomeMyStudents']/tbody/tr['RowCount']/td[5]")
        RowCount = RowCount + 1
        DGV.Rows(RowCount).Cells(4).Value = table.InnerText
    Next
    RowCount = 0
    For Each table As HtmlAgilityPack.HtmlNode In Doc.DocumentNode.SelectNodes("//*[@id='HomeMyStudents']/tbody/tr['RowCount']/td[6]")
        RowCount = RowCount + 1
        DGV.Rows(RowCount).Cells(5).Value = table.InnerText
    Next
    RowCount = 0
    For Each table As HtmlAgilityPack.HtmlNode In Doc.DocumentNode.SelectNodes("//*[@id='HomeMyStudents']/tbody/tr['RowCount']/td[7]")
        RowCount = RowCount + 1
        DGV.Rows(RowCount).Cells(6).Value = table.InnerText
    Next
    RowCount = 0
    For Each table As HtmlAgilityPack.HtmlNode In Doc.DocumentNode.SelectNodes("//*[@id='HomeMyStudents']/tbody/tr['RowCount']/td[8]")
        RowCount = RowCount + 1
        DGV.Rows(RowCount).Cells(7).Value = table.InnerText
    Next
    RowCount = 0
    For Each table As HtmlAgilityPack.HtmlNode In Doc.DocumentNode.SelectNodes("//*[@id='HomeMyStudents']/tbody/tr['RowCount']/td[9]")
        RowCount = RowCount + 1
        DGV.Rows(RowCount).Cells(8).Value = table.InnerText
    Next
    RowCount = 0
    For Each table As HtmlAgilityPack.HtmlNode In Doc.DocumentNode.SelectNodes("//*[@id='HomeMyStudents']/tbody/tr['RowCount']/td[10]")
        RowCount = RowCount + 1
        DGV.Rows(RowCount).Cells(9).Value = table.InnerText
    Next
    RowCount = 0
    For Each table As HtmlAgilityPack.HtmlNode In Doc.DocumentNode.SelectNodes("//*[@id='HomeMyStudents']/tbody/tr['RowCount']/td[11]")
        RowCount = RowCount + 1
        DGV.Rows(RowCount).Cells(10).Value = table.InnerText
    Next




End Sub  

Accepted Answer

So, presumably the images look something like:

<img src="whatever.jpg"/> 

in the markup, right?

HAP will allow you to grab image nodes with something like

... .SelectNodes("./img") 

And for the paths:

... .Attributes("src").Value
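Putting those two pieces together, here is a minimal sketch of reading the `src` of the image in one column of each row. The module name, function name, and parameters are hypothetical; the table id `HomeMyStudents` is taken from the question, and the sketch assumes each image sits somewhere inside the `td`, possibly nested in a link.

```vbnet
Imports System.Collections.Generic
Imports HtmlAgilityPack

Module ImageSrcSketch
    ' Hypothetical helper: returns the src of the first <img> found in the given
    ' column of each row ("" when a row has no image in that column).
    ' columnIndex is 1-based, as XPath positions are.
    Function GetThumbnailSources(doc As HtmlDocument, columnIndex As Integer) As List(Of String)
        Dim sources As New List(Of String)

        ' SelectNodes returns Nothing (not an empty collection) when nothing matches.
        Dim rows As HtmlNodeCollection =
            doc.DocumentNode.SelectNodes("//*[@id='HomeMyStudents']//tr")
        If rows Is Nothing Then Return sources

        For Each row As HtmlNode In rows
            ' The leading "." keeps the search inside this row; "//img" also finds
            ' images nested in links or spans within the cell.
            Dim img As HtmlNode =
                row.SelectSingleNode(String.Format("./td[{0}]//img", columnIndex))
            If img Is Nothing Then
                sources.Add("")
            Else
                ' GetAttributeValue returns the default ("") if src is missing.
                sources.Add(img.GetAttributeValue("src", ""))
            End If
        Next

        Return sources
    End Function
End Module
```

Walking the rows once and selecting cells relative to each row keeps the results aligned by row. Separate document-wide queries per column can drift out of step when a cell is missing.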

From there, I'm not aware of any HAP feature that downloads the image itself over HTTP, so you will want a WebClient for that.

Dim wc As New System.Net.WebClient

wc.DownloadFile(StringContainingThatSrcValue, PathToSaveFileTo) 
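As a rough sketch of that download step: the module and procedure names, `pageUrl`, and `targetFolder` below are hypothetical placeholders. It assumes the `src` value may be relative to the page it came from and that the image URL can be fetched without logging in, which may not hold for the site in the question.

```vbnet
Imports System
Imports System.IO
Imports System.Net

Module ImageDownloadSketch
    ' Hypothetical helper: downloads one image to targetFolder.
    ' pageUrl is the address of the page the src value was read from.
    Sub SaveImage(pageUrl As String, src As String, targetFolder As String)
        ' Resolve a relative src (e.g. "images/a.jpg") against the page URL;
        ' an absolute src is used unchanged.
        Dim imageUri As New Uri(New Uri(pageUrl), src)

        ' Use the last path segment of the URL as the local file name.
        Dim fileName As String = Path.GetFileName(imageUri.LocalPath)

        Using wc As New WebClient()
            wc.DownloadFile(imageUri, Path.Combine(targetFolder, fileName))
        End Using
    End Sub
End Module
```

To show the picture in the grid instead of saving it, `wc.DownloadData` returns the raw bytes, which can be loaded into a `System.Drawing.Image` and assigned to a cell in a `DataGridViewImageColumn`.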

HTH!




Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow