Visual Basic HTML Agility Pack How to Grab Images from Table Cells

extract html-agility-pack image vb.net

Question

I've spent a long time trying to figure this out, so perhaps someone can assist. I'm extracting data from a table and putting it on a data grid using the agility pack (the Data grid is not important I am just using it to see if the extraction works). Anyhow, there are thumbnail images in the first column of the table. The code below allows me to extract all the text, but I'm unsure how to do the same for the first column's photos. Can anyone assist?

PS I saved the website as an MHL file since I was unable to extract any data straight from it; I presume this was due to the site's security or login credentials. I'm not sure whether I've made things for myself tougher or easier.

Private Sub Button1_Click(ByVal sender As Object, ByVal e As EventArgs) Handles Button1.Click

    ' '' original cods ***************************************
    Dim Web As New HtmlAgilityPack.HtmlWeb
    Dim Doc As New HtmlAgilityPack.HtmlDocument
    Dim RowCount As Integer = 1



    '   Doc = Web.Load("https://firefly.cardinalnewman.ac.uk/home/my")

    Doc.Load("E:\table.mht")


    Dim tables As HtmlAgilityPack.HtmlNodeCollection = Doc.DocumentNode.SelectNodes("//table")
    Dim img As HtmlAgilityPack.HtmlNodeCollection = Doc.DocumentNode.SelectNodes("//table")
    Dim Links As HtmlAgilityPack.HtmlNodeCollection = Doc.DocumentNode.SelectNodes("//table")
    Dim rows As HtmlAgilityPack.HtmlNodeCollection = tables(0).SelectNodes("//*[@id=HomeMyStudents]")


    For Each table As HtmlAgilityPack.HtmlNode In Doc.DocumentNode.SelectNodes("//*[@id='HomeMyStudents']/tbody/tr['RowCount']/td[1]")
        RowCount = RowCount + 1

        DGV.Rows.Add(Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing)

    Next
    RowCount = 0
    For Each table As HtmlAgilityPack.HtmlNode In Doc.DocumentNode.SelectNodes("//*[@id='HomeMyStudents']/tbody/tr['RowCount']/td[2]")
        RowCount = RowCount + 1
        '     DGV.Rows(RowCount).Cells(1).Value = somehow insert image
        ' this is the section where I need to grab the image in each cell and either save or place in my datagrid


    Next
    RowCount = 0
    For Each table As HtmlAgilityPack.HtmlNode In Doc.DocumentNode.SelectNodes("//*[@id='HomeMyStudents']/tbody/tr['RowCount']/td[3]")
        RowCount = RowCount + 1
        DGV.Rows(RowCount).Cells(2).Value = table.InnerText
    Next
    RowCount = 0
    For Each table As HtmlAgilityPack.HtmlNode In Doc.DocumentNode.SelectNodes("//*[@id='HomeMyStudents']/tbody/tr['RowCount']/td[4]")
        RowCount = RowCount + 1
        DGV.Rows(RowCount).Cells(3).Value = table.InnerText
    Next
    RowCount = 0
    For Each table As HtmlAgilityPack.HtmlNode In Doc.DocumentNode.SelectNodes("//*[@id='HomeMyStudents']/tbody/tr['RowCount']/td[5]")
        RowCount = RowCount + 1
        DGV.Rows(RowCount).Cells(4).Value = table.InnerText
    Next
    RowCount = 0
    For Each table As HtmlAgilityPack.HtmlNode In Doc.DocumentNode.SelectNodes("//*[@id='HomeMyStudents']/tbody/tr['RowCount']/td[6]")
        RowCount = RowCount + 1
        DGV.Rows(RowCount).Cells(5).Value = table.InnerText
    Next
    RowCount = 0
    For Each table As HtmlAgilityPack.HtmlNode In Doc.DocumentNode.SelectNodes("//*[@id='HomeMyStudents']/tbody/tr['RowCount']/td[7]")
        RowCount = RowCount + 1
        DGV.Rows(RowCount).Cells(6).Value = table.InnerText
    Next
    RowCount = 0
    For Each table As HtmlAgilityPack.HtmlNode In Doc.DocumentNode.SelectNodes("//*[@id='HomeMyStudents']/tbody/tr['RowCount']/td[8]")
        RowCount = RowCount + 1
        DGV.Rows(RowCount).Cells(7).Value = table.InnerText
    Next
    RowCount = 0
    For Each table As HtmlAgilityPack.HtmlNode In Doc.DocumentNode.SelectNodes("//*[@id='HomeMyStudents']/tbody/tr['RowCount']/td[9]")
        RowCount = RowCount + 1
        DGV.Rows(RowCount).Cells(8).Value = table.InnerText
    Next
    RowCount = 0
    For Each table As HtmlAgilityPack.HtmlNode In Doc.DocumentNode.SelectNodes("//*[@id='HomeMyStudents']/tbody/tr['RowCount']/td[10]")
        RowCount = RowCount + 1
        DGV.Rows(RowCount).Cells(9).Value = table.InnerText
    Next
    RowCount = 0
    For Each table As HtmlAgilityPack.HtmlNode In Doc.DocumentNode.SelectNodes("//*[@id='HomeMyStudents']/tbody/tr['RowCount']/td[11]")
        RowCount = RowCount + 1
        DGV.Rows(RowCount).Cells(10).Value = table.InnerText
    Next




End Sub  
1
0
11/29/2016 8:55:53 PM

Accepted Answer

Therefore, it is likely that the visuals are similar to:

<img src="whatever.jpg"/> 

presumably in the markup?

HAP enables you to seize picture nodes using tools like

... .SelectNodes("./img") 

Regarding the paths:

... .Attributes("src").Value()

I'm not aware of any specific HAP capabilities that would enable you to execute any HTTP requests like this from there, so you'll need a WebClient for that.

Dim wc as new WebClient 

wc.DownloadFile(StringContainingThatSrcValue, PathToSaveFileTo) 

HTH!

0
1/23/2014 5:05:40 PM


Related Questions





Related

Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow
Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow