Parse HTML Table in PowerShell V3

html-agility-pack powershell

Question

I have an HTML table, available at a URL, that I need to parse.

I tried using HtmlAgilityPack.dll to parse it and convert it to XML, CSV, or a PSObject, but I could not get it to work. Is there any guidance I could follow?


So far I only have the beginning of the code: I can reach the rows, but not the data inside them. I want to convert the table to a PSObject and export it to CSV.

Add-Type -Path C:\Windows\system32\HtmlAgilityPack.dll
$HTML = New-Object HtmlAgilityPack.HtmlDocument
$res = $HTML.Load("C:\Test\Test.html")
$table = $HTML.DocumentNode.SelectNodes("//table/tr/td/nobr")

When I access $table[0..47], I can read the first column of the file using InnerHtml, but not the second or subsequent ones.

Thanks, Ohad

1/24/2013 10:23:31 AM

Accepted Answer

You can try this to get all the HTML inside the <nobr> tags. I'll let you work out the logic to shape the output the way you want.

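# Drive Internet Explorer through COM and wait until the page has loaded
$ie = New-Object -ComObject "InternetExplorer.Application"
$ie.Navigate("http://urltoyourfile.html")
while ($ie.Busy) { Start-Sleep -Milliseconds 100 }
$doc = $ie.Document
# Print the inner HTML of every <nobr> element
$doc.getElementsByTagName("nobr") | ForEach-Object { $_.innerHTML }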

Output:

Lead User&nbsp;&nbsp;
Accesses&nbsp;&nbsp;
Last Accessed&nbsp;&nbsp;
Average&nbsp;&nbsp;
Max&nbsp;&nbsp;
Min&nbsp;&nbsp;
Total&nbsp;&nbsp;
amirt</NOBR>
2
01/20/2013 09:40:47
04:18:17
06:19:26
02:17:09
08:36:35
andream
1
01/20/2013 10:33:01
02:34:37
02:34:37
02:34:37
02:34:37
avnerm
1
01/17/2013 11:34:16
00:30:44
00:30:44
00:30:44
00:30:44
brouria
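
The header cells in this output still carry &nbsp; entities, and a value can come back with a stray closing tag attached. A minimal cleanup sketch, assuming a hypothetical helper named ConvertTo-CellText (not part of the original answer) and .NET 4 or later for System.Net.WebUtility:

```powershell
# Hypothetical helper (not from the original answer): reduce a cell's innerHTML to plain text.
function ConvertTo-CellText {
    param([string]$Html)
    # Drop any leftover tags, such as a stray closing </NOBR>
    $text = $Html -replace '<[^>]+>', ''
    # Decode entities (&nbsp; becomes U+00A0), then turn non-breaking spaces into normal spaces
    $text = [System.Net.WebUtility]::HtmlDecode($text)
    $text = $text.Replace([string][char]0x00A0, ' ')
    return $text.Trim()
}

# Sample values copied from the output above
ConvertTo-CellText 'Lead User&nbsp;&nbsp;'
ConvertTo-CellText '01/20/2013 09:40:47'
```

Stripping tags with a regular expression is only reasonable here because each cell holds a short text value; it is not a general HTML parser.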

One way to parse it:

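# Print the cells seven per line (one table row, per the header output), separated by semicolons
$cpt = 0
$doc.getElementsByTagName("nobr") | ForEach-Object {
    Write-Host -NoNewline ($_.innerHTML + ";")
    $cpt++
    if ($cpt % 7 -eq 0) { Write-Host "" }
}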
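
To get from the flat list of cells to the PSObject and CSV the question asks for, the same cells-per-row grouping can build objects instead of printing. This is a rough sketch rather than a tested solution for the real page: it assumes seven columns with the first seven cells as headers, and it uses a hard-coded sample array named $cells (hypothetical) in place of the live IE document.

```powershell
# Sample cells standing in for the innerHTML values of the <nobr> elements
# (assumption: already cleaned of &nbsp; entities and stray tags).
$cells = @(
    'Lead User', 'Accesses', 'Last Accessed', 'Average', 'Max', 'Min', 'Total',
    'amirt',   '2', '01/20/2013 09:40:47', '04:18:17', '06:19:26', '02:17:09', '08:36:35',
    'andream', '1', '01/20/2013 10:33:01', '02:34:37', '02:34:37', '02:34:37', '02:34:37'
)

$columns = 7                              # assumption: seven columns, per the header row
$headers = $cells[0..($columns - 1)]

# Walk the remaining cells one full row at a time; an incomplete trailing row is skipped
$rows = @(for ($i = $columns; $i + $columns -le $cells.Count; $i += $columns) {
    $props = [ordered]@{}
    for ($c = 0; $c -lt $columns; $c++) {
        $props[$headers[$c]] = $cells[$i + $c]
    }
    New-Object PSObject -Property $props
})

# Emit CSV text; Export-Csv with -NoTypeInformation and a path of your choice writes a file instead
$rows | ConvertTo-Csv -NoTypeInformation
```

With the live document, $cells could be filled from $doc.getElementsByTagName("nobr") | ForEach-Object { $_.innerHTML } before the grouping loop runs.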
1/24/2013 12:36:11 PM



Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow