I have scraped a table from a website using C# for my own website and loaded it into a string. There are too many columns, so I was wondering whether there is an easy way to delete some of them, preferably with HTML Agility Pack, or in plain C# if necessary.
The table in the string looks like this:
<table>
<tr>
<th scope="col"> </th>
<th scope="col"> </th>
<th scope="col">P </th>
<th scope="col">W </th>
<th scope="col">L </th>
<th scope="col">T </th>
<th scope="col">NR </th>
<th scope="col">Bat </th>
<th scope="col">Bowl </th>
<th scope="col">Pen </th>
<th scope="col">Pts </th>
</tr>
<tr>
<td>1 </td>
<td><a href="fixbyteam.aspx?clubid=44576&teamid=58170&divid=32181">Rayleigh 2nd</a> </td>
<td>12 </td>
<td>8 </td>
<td>1 </td>
<td>0 </td>
<td>3 </td>
<td>14 </td>
<td>52 </td>
<td>0 </td>
<td>209 </td>
</tr>
<tr>
<td>2 </td>
<td><a href="fixbyteam.aspx?clubid=44612&teamid=58169&divid=32181">Rainham 1st</a> </td>
<td>12 </td>
<td>8 </td>
<td>1 </td>
<td>1 </td>
<td>2 </td>
<td>12 </td>
<td>56 </td>
<td>-15 </td>
<td>199 </td>
</tr>
<tr class="lineAbove">
<td>3 </td>
<td><a href="fixbyteam.aspx?clubid=44571&teamid=58162&divid=32181">Old Chelmsfordians 2nd</a> </td>
<td>12 </td>
<td>5 </td>
<td>5 </td>
<td>0 </td>
<td>2 </td>
<td>10 </td>
<td>48 </td>
<td>0 </td>
<td>148 </td>
</tr>
<tr>
<td>4 </td>
<td><a href="fixbyteam.aspx?clubid=44570&teamid=58161&divid=32181">Little Baddow 2nd</a> </td>
<td>12 </td>
<td>5 </td>
<td>4 </td>
<td>0 </td>
<td>3 </td>
<td>21 </td>
<td>43 </td>
<td>-15 </td>
<td>144 </td>
</tr>
<tr>
<td>5 </td>
<td><a href="fixbyteam.aspx?clubid=44606&teamid=58159&divid=32181">Rayne 1st</a> </td>
<td>12 </td>
<td>5 </td>
<td>4 </td>
<td>0 </td>
<td>3 </td>
<td>6 </td>
<td>39 </td>
<td>0 </td>
<td>140 </td>
</tr>
<tr>
<td>6 </td>
<td><a href="fixbyteam.aspx?clubid=44605&teamid=58158&divid=32181">Terling 1st</a> </td>
<td>12 </td>
<td>4 </td>
<td>5 </td>
<td>1 </td>
<td>2 </td>
<td>12 </td>
<td>35 </td>
<td>0 </td>
<td>129 </td>
</tr>
<tr>
<td>7 </td>
<td><a href="fixbyteam.aspx?clubid=44602&teamid=58154&divid=32181">Willow Herbs 1st</a> </td>
<td>12 </td>
<td>4 </td>
<td>6 </td>
<td>0 </td>
<td>2 </td>
<td>9 </td>
<td>34 </td>
<td>0 </td>
<td>117 </td>
</tr>
<tr>
<td>8 </td>
<td><a href="fixbyteam.aspx?clubid=50925&teamid=68864&divid=32181">Ongar 1st</a> </td>
<td>12 </td>
<td>3 </td>
<td>5 </td>
<td>0 </td>
<td>4 </td>
<td>3 </td>
<td>42 </td>
<td>-5 </td>
<td>108 </td>
</tr>
<tr class="lineAbove">
<td>9 </td>
<td><a href="fixbyteam.aspx?clubid=44607&teamid=58163&divid=32181">Sandon Sports 1st</a> </td>
<td>12 </td>
<td>3 </td>
<td>6 </td>
<td>0 </td>
<td>3 </td>
<td>8 </td>
<td>27 </td>
<td>0 </td>
<td>98 </td>
</tr>
<tr>
<td>10 </td>
<td><a href="fixbyteam.aspx?clubid=44582&teamid=58156&divid=32181">Little Waltham 2nd</a> </td>
<td>12 </td>
<td>1 </td>
<td>9 </td>
<td>0 </td>
<td>2 </td>
<td>14 </td>
<td>25 </td>
<td>0 </td>
<td>65 </td>
</tr>
</table>
I want to delete columns 8-10 (Bat, Bowl and Pen). I'm not really sure where to start, so any pointers would be helpful!
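You would need to iterate over each tr and remove the 8th, 9th and 10th cells from it: th nodes in the header row, td nodes in every other row. Remove them from the highest index down, so that removing one cell does not shift the positions of the cells still to be removed.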
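using HtmlAgilityPack;

// Assumes the scraped table is in a string named html (the name is illustrative).
var doc = new HtmlDocument();
doc.LoadHtml(html);

bool first = true;
foreach (HtmlNode row in doc.DocumentNode.SelectNodes("//tr"))
{
    // The first row holds th header cells; every other row holds td cells.
    string cell = first ? "th" : "td";
    first = false;

    // Go from 10 down to 8 so earlier removals don't shift later positions.
    for (int i = 10; i >= 8; i--)
    {
        HtmlNode node = row.SelectSingleNode(cell + "[" + i + "]");
        if (node != null) row.RemoveChild(node); // skip rows with fewer cells
    }
}

// The table markup without the Bat, Bowl and Pen columns.
string result = doc.DocumentNode.OuterHtml;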
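If the source site ever adds or reorders columns, hard-coded positions will silently remove the wrong cells. As a variation on the loop above, here is a minimal sketch that finds the columns by their header text instead. The class and method names (TableColumnRemover, RemoveColumns) are made up for illustration, and it assumes the first row is the header row and that no cell uses colspan.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

public static class TableColumnRemover
{
    // Hypothetical helper: removes every column whose header text matches one of `headers`.
    public static string RemoveColumns(string tableHtml, params string[] headers)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(tableHtml);

        // SelectNodes returns null (not an empty collection) when nothing matches.
        HtmlNodeCollection rows = doc.DocumentNode.SelectNodes("//tr");
        if (rows == null) return tableHtml;

        // Work out the zero-based positions of the unwanted columns from the header row.
        var unwanted = new HashSet<string>(headers, StringComparer.OrdinalIgnoreCase);
        List<HtmlNode> headerCells = rows[0].Elements("th").ToList();
        List<int> indexes = Enumerable.Range(0, headerCells.Count)
            .Where(i => unwanted.Contains(headerCells[i].InnerText.Trim()))
            .ToList();

        foreach (HtmlNode row in rows)
        {
            // Snapshot the cells first, so removing one does not change the others' indexes.
            List<HtmlNode> cells = row.ChildNodes
                .Where(n => n.Name == "th" || n.Name == "td")
                .ToList();

            foreach (int i in indexes)
            {
                if (i < cells.Count) cells[i].Remove();
            }
        }

        return doc.DocumentNode.OuterHtml;
    }
}
```

With the table from the question in a string, a call such as TableColumnRemover.RemoveColumns(html, "Bat", "Bowl", "Pen") should return the markup without those three columns.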