Parsing with HtmlAgilityPack (table without IDs) in VB.NET

html html-agility-pack html-table vb.net xpath

Question

I hope you can give me some answers.

I use VB.NET and HtmlAgilityPack to retrieve the data, and it works, but not the way I want it to =)

I have this HTML page (an excerpt):


<TABLE WITH=100% BORDER=4>

<TR>
<TH><A HREF="http:/cgi-bin/vplata.py?tgnr=4300&val=Visa+T%C3%A5gnummer&Bek=Visa&sort=Lok" >Lok</A></TH>
<TH><A HREF="http:/cgi-bin/vplata.py?tgnr=4300&val=Visa+T%C3%A5gnummer&Bek=Visa&sort=Avg" >Avg&aring;r</A></TH>
<TH><A HREF="http:/cgi-bin/vplata.py?tgnr=4300&val=Visa+T%C3%A5gnummer&Bek=Visa&sort=AvgS" >Station</A></TH>
<TH><A HREF="http:/cgi-bin/vplata.py?tgnr=4300&val=Visa+T%C3%A5gnummer&Bek=Visa&sort=Ank" >Ankommer</A></TH>
<TH><A HREF="http:/cgi-bin/vplata.py?tgnr=4300&val=Visa+T%C3%A5gnummer&Bek=Visa&sort=AnkS" >Station</A></TH>
<TH>Tjänstetyp</TH>
</TR>
<TR>
<TD><a HREF="/cgi-bin/vplata.py?individ=R1176&val=Visa+Lokindivid&Bek=Visa">R1176</a></TD>
<TD>Mar-20-2013 13:04:00</TD>
<TD><A HREF="/cgi-bin/vplata.py?stn=HBGB&val=Visa+Driftplats&Bek=Visa">HBGB</A></TD>
<TD>Mar-20-2013 21:21:00</TD>
<TD><A HREF="/cgi-bin/vplata.py?stn=ET3&val=Visa+Driftplats&Bek=Visa">ET3</A></TD>
<TD>B1</TD>
</TR>
<TR>
<TD><a HREF="/cgi-bin/vplata.py?individ=R1267&val=Visa+Lokindivid&Bek=Visa">R1267</a></TD>
<TD>Mar-20-2013 13:04:00</TD>
<TD><A HREF="/cgi-bin/vplata.py?stn=HBGB&val=Visa+Driftplats&Bek=Visa">HBGB</A></TD>
<TD>Mar-20-2013 21:21:00</TD>
<TD><A HREF="/cgi-bin/vplata.py?stn=ET3&val=Visa+Driftplats&Bek=Visa">ET3</A></TD>
<TD>B2</TD>
</TR>
<TR>
<TD><a HREF="/cgi-bin/vplata.py?individ=R1267&val=Visa+Lokindivid&Bek=Visa">R1267</a></TD>
<TD>Mar-20-2013 22:05:00</TD>
<TD><A HREF="/cgi-bin/vplata.py?stn=ET3&val=Visa+Driftplats&Bek=Visa">ET3</A></TD>
<TD>Mar-20-2013 22:28:00</TD>
<TD><A HREF="/cgi-bin/vplata.py?stn=KB%&val=Visa+Driftplats&Bek=Visa">KBÄ</A></TD>
<TD>D1</TD>
</TR>
<TR>
<TD><a HREF="/cgi-bin/vplata.py?individ=R1281&val=Visa+Lokindivid&Bek=Visa">R1281</a></TD>
<TD>Mar-21-2013 13:04:00</TD>
<TD><A HREF="/cgi-bin/vplata.py?stn=HBGB&val=Visa+Driftplats&Bek=Visa">HBGB</A></TD>
<TD>Mar-21-2013 21:21:00</TD>
<TD><A HREF="/cgi-bin/vplata.py?stn=ET3&val=Visa+Driftplats&Bek=Visa">ET3</A></TD>
<TD>D1</TD>
</TR>
<TR>
<TD><a HREF="/cgi-bin/vplata.py?individ=R1281&val=Visa+Lokindivid&Bek=Visa">R1281</a></TD>
<TD>Mar-21-2013 22:05:00</TD>
<TD><A HREF="/cgi-bin/vplata.py?stn=ET3&val=Visa+Driftplats&Bek=Visa">ET3</A></TD>
<TD>Mar-21-2013 22:28:00</TD>
<TD><A HREF="/cgi-bin/vplata.py?stn=KB%&val=Visa+Driftplats&Bek=Visa">KBÄ</A></TD>
<TD>B2</TD>
</TR>
<TR>
<TD><a HREF="/cgi-bin/vplata.py?individ=RXXXXX&val=Visa+Lokindivid&Bek=Visa">RXXXXX</a></TD>
<TD>Mar-21-2013 22:05:00</TD>
<TD><A HREF="/cgi-bin/vplata.py?stn=ET3&val=Visa+Driftplats&Bek=Visa">ET3</A></TD>
<TD>Mar-21-2013 22:28:00</TD>
<TD><A HREF="/cgi-bin/vplata.py?stn=KB%&val=Visa+Driftplats&Bek=Visa">KBÄ</A></TD>
<TD>B1\B2</TD>
</TR>
<TR>
<TD><a HREF="/cgi-bin/vplata.py?individ=R1281&val=Visa+Lokindivid&Bek=Visa">R1281</a></TD>
<TD>Mar-25-2013 13:04:00</TD>
<TD><A HREF="/cgi-bin/vplata.py?stn=HBGB&val=Visa+Driftplats&Bek=Visa">HBGB</A></TD>
<TD>Mar-25-2013 21:21:00</TD>
<TD><A HREF="/cgi-bin/vplata.py?stn=ET3&val=Visa+Driftplats&Bek=Visa">ET3</A></TD>
<TD>D1</TD>
</TR>
<TR>
<TD><a HREF="/cgi-bin/vplata.py?individ=R1281&val=Visa+Lokindivid&Bek=Visa">R1281</a></TD>
<TD>Mar-25-2013 22:05:00</TD>
<TD><A HREF="/cgi-bin/vplata.py?stn=ET3&val=Visa+Driftplats&Bek=Visa">ET3</A></TD>
<TD>Mar-25-2013 22:28:00</TD>
<TD><A HREF="/cgi-bin/vplata.py?stn=KB%&val=Visa+Driftplats&Bek=Visa">KBÄ</A></TD>
<TD>D1</TD>
</TR>
<TR>
<TD><a HREF="/cgi-bin/vplata.py?individ=R1254&val=Visa+Lokindivid&Bek=Visa">R1254</a></TD>
<TD>Mar-27-2013 13:04:00</TD>
<TD><A HREF="/cgi-bin/vplata.py?stn=HBGB&val=Visa+Driftplats&Bek=Visa">HBGB</A></TD>
<TD>Mar-27-2013 21:21:00</TD>
<TD><A HREF="/cgi-bin/vplata.py?stn=ET3&val=Visa+Driftplats&Bek=Visa">ET3</A></TD>
<TD>B2</TD>
</TR>
<TR>
<TD><a HREF="/cgi-bin/vplata.py?individ=RXXXXX&val=Visa+Lokindivid&Bek=Visa">RXXXXX</a></TD>
<TD>Mar-27-2013 13:04:00</TD>
<TD><A HREF="/cgi-bin/vplata.py?stn=HBGB&val=Visa+Driftplats&Bek=Visa">HBGB</A></TD>
<TD>Mar-27-2013 21:21:00</TD>
<TD><A HREF="/cgi-bin/vplata.py?stn=ET3&val=Visa+Driftplats&Bek=Visa">ET3</A></TD>
<TD>B1\B2</TD>
</TR>
</TABLE>
<A><A>Senast uppdaterad: Mar-20-2013 18:16:00</A><BR>
<table width="100%" cellpadding="0" cellspacing="0" border="0">
<TR>
<TD width="20%" bgcolor="#009900"  align="left">
<IMG src="http://litmgc101.greencargo.com/bottenbild.jpg" alt="Green Cargo" width=800 height=25 border=0>
</TD>
</TR>
<TR>
</table>

What I want to do is retrieve the parts with (for example) "R1176" and the date "Mar-20-2013 13:04:00". I would prefer NOT to get the time "13:04:00", but I can strip it later in VB.NET if I can't skip it during parsing.
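If the time has to be removed afterwards in VB.NET, a minimal sketch is to parse the cell text with an explicit format and keep only `.Date`. The format string `"MMM-dd-yyyy HH:mm:ss"` is an assumption inferred from the sample rows, and `ParseDateOnly`/`FormatDateOnly` are hypothetical helper names. `CultureInfo.InvariantCulture` is used so that English month abbreviations such as "Mar" parse regardless of the machine's regional settings.

```vbnet
Imports System
Imports System.Globalization

Module DateOnlyHelpers
    ' Assumed cell format, inferred from sample rows like "Mar-20-2013 13:04:00".
    Private Const CellFormat As String = "MMM-dd-yyyy HH:mm:ss"

    ' Hypothetical helper: parses the cell text and drops the time of day.
    Function ParseDateOnly(cellText As String) As Date
        ' InvariantCulture makes "Mar" parse even on a machine with Swedish regional settings.
        Dim full As Date = Date.ParseExact(cellText.Trim(), CellFormat, CultureInfo.InvariantCulture)
        Return full.Date ' same day, time set to 00:00:00
    End Function

    ' Hypothetical helper: returns only the date as text, e.g. for display in a text box.
    Function FormatDateOnly(cellText As String) As String
        Return ParseDateOnly(cellText).ToString("MMM-dd-yyyy", CultureInfo.InvariantCulture)
    End Function
End Module
```

With this, `FormatDateOnly("Mar-20-2013 13:04:00")` gives `"Mar-20-2013"`. If only the text is needed and no `Date` value, splitting the cell text on the space and keeping the first part is a simpler alternative.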

So, to put it simply, this is what I want to do: get every "R1234" together with the date that belongs to it, then put the ID (say "R4321") in one text box and the date in another text box, or something like that.
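Once the (ID, date) pairs are available, filling two text boxes is plain string handling. A minimal sketch, assuming the pairs arrive as `KeyValuePair(Of String, Date)` values; `BuildColumns` is a hypothetical helper name, and it returns one multi-line string per text box so that line i of the ID column belongs with line i of the date column.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Globalization
Imports System.Linq

Module TextBoxContent
    ' Hypothetical helper: turns (id, date) pairs into two multi-line strings,
    ' one for an "ID" text box and one for a "date" text box.
    Function BuildColumns(pairs As IEnumerable(Of KeyValuePair(Of String, Date))) As Tuple(Of String, String)
        ' Materialize once so a lazy LINQ query is not evaluated twice.
        Dim list As List(Of KeyValuePair(Of String, Date)) = pairs.ToList()

        Dim ids As String = String.Join(Environment.NewLine, list.Select(Function(p) p.Key))
        ' Format without the time part, e.g. "Mar-20-2013".
        Dim dates As String = String.Join(Environment.NewLine,
            list.Select(Function(p) p.Value.ToString("MMM-dd-yyyy", CultureInfo.InvariantCulture)))

        Return Tuple.Create(ids, dates)
    End Function
End Module
```

With two text boxes that have `Multiline = True` (hypothetically named `txtLok` and `txtDate`), the result would be assigned as `txtLok.Text = cols.Item1` and `txtDate.Text = cols.Item2`, where `cols` is the return value of `BuildColumns`.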

Accepted answer

In C# I would do something like this:

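// Needs: using System; using System.Collections.Generic; using System.Globalization; using System.Linq; using HtmlAgilityPack;
var result =
    doc.DocumentNode.SelectNodes("//td/a[contains(@href,'Lokindivid')]")
       .Select(node => new KeyValuePair<string, DateTime>(node.InnerText,
           DateTime.ParseExact(node.SelectSingleNode("./ancestor::tr[1]/td[2]").InnerText,
               "MMM-dd-yyyy HH:mm:ss", CultureInfo.InvariantCulture).Date)); // .Date drops the time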

My VB.NET-fu produced the following code (a literal translation), which works with the sample HTML you provided:

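' Needs: Imports System.Globalization and Imports HtmlAgilityPack (System.Linq is a default project import)
Dim result =
    doc.DocumentNode.SelectNodes("//td/a[contains(@href,'Lokindivid')]") _
       .Select(Function(node) New KeyValuePair(Of String, DateTime)(node.InnerText,
           DateTime.ParseExact(node.SelectSingleNode("./ancestor::tr[1]/td[2]").InnerText,
               "MMM-dd-yyyy HH:mm:ss", CultureInfo.InvariantCulture).Date)) ' .Date drops the time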
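One caveat with the one-liner: HtmlAgilityPack's `SelectNodes` returns `Nothing` (null) rather than an empty collection when the XPath matches nothing, so calling `.Select` on it throws a `NullReferenceException` if the page layout ever changes. Below is a minimal, more defensive sketch of the same idea written as an explicit loop. `ExtractLoks` is a hypothetical name, the date format is assumed from the sample rows, and, like the answer, it assumes the departure date is the second cell of the row that contains the `Lokindivid` link.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Globalization
Imports HtmlAgilityPack

Module LokExtractor
    ' Hypothetical helper: returns (locomotive id, departure date) pairs from the page HTML.
    Function ExtractLoks(html As String) As List(Of KeyValuePair(Of String, Date))
        ' Fully qualified to avoid a clash with System.Windows.Forms.HtmlDocument in WinForms projects.
        Dim doc As New HtmlAgilityPack.HtmlDocument()
        doc.LoadHtml(html)

        Dim result As New List(Of KeyValuePair(Of String, Date))()
        ' HtmlAgilityPack lowercases tag and attribute names, so //td and @href match <TD> and HREF.
        Dim links As HtmlNodeCollection = doc.DocumentNode.SelectNodes("//td/a[contains(@href,'Lokindivid')]")
        If links Is Nothing Then Return result ' no match: SelectNodes returns Nothing, not an empty list

        For Each link As HtmlNode In links
            ' Climb to the nearest enclosing row, then take its second cell (departure time).
            Dim dateCell As HtmlNode = link.SelectSingleNode("./ancestor::tr[1]/td[2]")
            Dim departure As Date
            If dateCell Is Nothing OrElse Not Date.TryParseExact(dateCell.InnerText.Trim(),
                    "MMM-dd-yyyy HH:mm:ss", CultureInfo.InvariantCulture, DateTimeStyles.None, departure) Then
                Continue For ' skip rows that do not match the expected layout
            End If
            result.Add(New KeyValuePair(Of String, Date)(link.InnerText.Trim(), departure.Date))
        Next
        Return result
    End Function
End Module
```

Compared with the LINQ version, this returns an empty list instead of throwing when the table is missing, and it skips any row whose second cell is not a date in the expected format.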



Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow