HtmlAgilityPack (C#) adds a tbody element to tables in the DOM tree after LoadHtml, even when the original HTML text did not contain one. How can I turn this off?
My method generates XPath expressions by walking the DOM tree, and because the tbody element is missing from the original document, SelectNodes cannot find the nodes I need. It took me a long time to realize this.
Is it possible to make SelectNodes take into account the nodes that HtmlAgilityPack has added? Take this HTML as an example:
<table> <tr><td>data</td></tr> </table>
To extract "data", my program generates the following XPath: /table/tbody/tr/td
The tbody step appears in the expression because it was present in the DOM tree after HtmlAgilityPack parsed the HTML, even though it does not exist in the source.
In other words, the parent TagName of the tr element (HtmlElement) is 'TBODY' rather than 'TABLE'. I also parse many different websites, so this is only one example.
SelectNodes appears to search the original HTML rather than the DOM tree that HtmlDocument holds after loading. In other words, it disregards any "virtual" elements added by LoadHtml.
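You don't need to use the whole hierarchy in the XPath.
If the cells are all you need, select them directly with a descendant search.
Otherwise, skip the tbody node and keep the rest of the hierarchy by putting the descendant axis (//) between table and tr, so the expression matches whether or not a tbody sits in between.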
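As a rough sketch of both options, assuming the HtmlAgilityPack NuGet package is referenced: the expressions //td and /table//tr/td are illustrations chosen for the sample table in the question, not text recovered from the original answer, and TbodyDemo and SelectTexts are hypothetical names used only for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack; // third-party NuGet package, assumed to be installed

static class TbodyDemo
{
    // Hypothetical helper: loads the markup and returns the inner text
    // of every node matched by the XPath expression.
    public static List<string> SelectTexts(string html, string xpath)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectNodes returns null (not an empty collection) when nothing matches.
        HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes(xpath);
        return nodes == null
            ? new List<string>()
            : nodes.Select(n => n.InnerText).ToList();
    }

    static void Main()
    {
        // Sample markup from the question; it contains no tbody.
        const string html = "<table> <tr><td>data</td></tr> </table>";

        // Option 1: select the cells directly, ignoring the hierarchy.
        // Option 2: keep table and tr in the path, but let "//" match at
        // any depth so an optional tbody in between does not matter.
        foreach (string xpath in new[] { "//td", "/table//tr/td" })
        {
            List<string> texts = SelectTexts(html, xpath);
            Console.WriteLine($"{xpath}: {texts.Count} match(es): {string.Join(", ", texts)}");
        }
    }
}
```

A generated expression such as /table/tbody/tr/td could be adapted in the same spirit by replacing the tbody step with the descendant axis before passing it to SelectNodes, although whether that is safe for every site being parsed would need checking.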