How can I parse the InnerText of <option> tags?

c# html-agility-pack



I am trying to parse the "Cities" combobox from this page. I have already managed to simulate the request that loads this combobox's data, which is an Ajax call.

Fiddler request:

Connection: keep-alive
Content-Length: 106
X-Requested-With: XMLHttpRequest
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.11 (KHTML, like Gecko)      Chrome/23.0.1271.97 Safari/537.11
Content-Type: application/x-www-form-urlencoded; charset=UTF-8
Accept: */*
Accept-Encoding: gzip,deflate,sdch
Accept-Language: pt-BR,pt;q=0.8,en-US;q=0.6,en;q=0.4
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.3
Cookie: cert_Origin=directo;; auto=automatico=0; searchparameters=bottom=0&btnsite=0&email=&uf=rj&origem=0&nome=&pagina=1&codlogradouro=&predio=213&tiquete=0&localidadeendmap=&codbairro=0&pcount=25&estacionamento=0&letra=&top=&entrega=0&pchave=&info=&logradouro=rua+da+lapa&codtitulo=-1&chave=&zoom=&comercial=0&ddd=0&comib=0&btnemail=0&pgresultado=&localidade=&telefone=&manobrista=0&codlocalidade=21000&site=&cartoes=0&atividade=&bairro=&reserva=0&residencial=0; perfil=logged=1&iduser=2563063& 11:45:00; __utma=70879631.392027796.1355939587.1356014801.1356021821.5; __utmb=70879631.1.10.1356021821; __utmc=70879631; __utmz=70879631.1355939587.1.1.utmcsr=(direct)|utmccn=(direct)|utmcmd=(none)

PostData: state=rj&style=busca_interna&selectedCity=21000&clientId=pch_localidade_select&method=GetSearchCitiesNamed
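The captured request can be replayed from C# by sending the same form fields and the X-Requested-With header the browser sent. The following is a minimal sketch using System.Net.Http, not the asker's actual code. The class and method names (CitiesRequest, Build, FetchAsync) are hypothetical, the endpoint URL is left to the caller because the question does not show it, and whether the server also requires the cookies from the capture is an untested assumption.

```csharp
using System;
using System.Collections.Generic;
using System.Net.Http;
using System.Threading.Tasks;

// Hypothetical helper (not from the question): replays the captured Ajax call.
public static class CitiesRequest
{
    // Builds, but does not send, a POST mirroring the request captured in Fiddler.
    // The endpoint comes from the caller because the real URL is not shown.
    public static HttpRequestMessage Build(Uri endpoint, string state, string selectedCity)
    {
        // Form fields copied from the captured PostData line.
        var fields = new List<KeyValuePair<string, string>>
        {
            new KeyValuePair<string, string>("state", state),
            new KeyValuePair<string, string>("style", "busca_interna"),
            new KeyValuePair<string, string>("selectedCity", selectedCity),
            new KeyValuePair<string, string>("clientId", "pch_localidade_select"),
            new KeyValuePair<string, string>("method", "GetSearchCitiesNamed"),
        };

        var request = new HttpRequestMessage(HttpMethod.Post, endpoint)
        {
            // Encodes the fields and sets Content-Type: application/x-www-form-urlencoded.
            Content = new FormUrlEncodedContent(fields)
        };

        // The browser sent this header; some servers use it to detect Ajax calls.
        request.Headers.Add("X-Requested-With", "XMLHttpRequest");
        return request;
    }

    // Sends the request and returns the response body (the <select> fragment).
    public static async Task<string> FetchAsync(HttpClient client, HttpRequestMessage request)
    {
        using (HttpResponseMessage response = await client.SendAsync(request))
        {
            response.EnsureSuccessStatusCode();
            return await response.Content.ReadAsStringAsync();
        }
    }
}
```

Keeping Build separate from FetchAsync makes it possible to compare the generated body with the Fiddler capture before sending anything. Because the browser advertised gzip and deflate, the HttpClient could be created over an HttpClientHandler with AutomaticDecompression set to DecompressionMethods.GZip | DecompressionMethods.Deflate.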


Here is a fragment of the string returned by this request:

<select name='pch_localidade_select' class='busca_interna' id='pch_localidade_select' tabindex="4"><option value="">Selecione</option><option selected value="21000">Rio de Janeiro</option><option value="21380">Abraão</option><option value="21001">Afonso Arinos</option><option value="21002">Agência Luterback</option><option value="21847">Agriões de Dentro</option>

What I am trying to do is read the InnerText of the option tags ("Rio de Janeiro", "Abraão", ...), but for some reason the InnerText is always empty for every option node found.

Here is the code fragment that is failing:

        // Iterating over nodes to build the dictionary
        foreach (HtmlNode city in citiesNodes)
        {
            string key   = city.InnerText;   // always empty here
            string value = city.Attributes["value"].Value;

            citiesHash.AddCity(key, value);
        }

Technology in place:

I am using HtmlAgilityPack (which supports XPath syntax for node selection) with C#, and Fiddler2 for web debugging.

Thanks in advance

12/20/2012 5:16:08 PM

Accepted Answer

HtmlAgilityPack does not parse these tags the way I expected: the text of each option ends up in the node that follows it rather than inside it. This workaround solved my problem:

        // Iterating over nodes to build the dictionary
        foreach (HtmlNode city in citiesNodes)
        {
            // The city name is parsed as the node that follows <option>, not as its child
            if (city.NextSibling != null)
            {
                string key   = city.NextSibling.InnerText;
                string value = city.Attributes["value"].Value;

                citiesHash.AddCity(key, value);
            }
        }

Instead of reading the text directly from each option node, I read it from the node's NextSibling, while the value attribute is still taken from the option node itself.

12/20/2012 5:20:43 PM

Popular Answer

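Just call HtmlAgilityPack.HtmlNode.ElementsFlags.Remove("option"); before loading the HTML. HtmlAgilityPack (at least in the versions current at the time) registers option in its static ElementsFlags table as an empty element, so by default the option text is parsed as a sibling node instead of a child. Removing that entry makes InnerText return the city name.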


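        // requires: using System.Linq; (for Skip, Select and ToList)
        HtmlAgilityPack.HtmlNode.ElementsFlags.Remove("option"); // must run before LoadHtml
        HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
        doc.LoadHtml(html); // html: the string returned by the Ajax request

        var options = doc.DocumentNode.Descendants("option").Skip(1) // Skip(1) drops "Selecione"
                        .Select(n => new
                        {
                            Value = n.Attributes["value"].Value,
                            Text = n.InnerText
                        })
                        .ToList();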
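If the goal is the question's name-to-code lookup, the same ElementsFlags fix can fill a plain Dictionary<string, string> in place of the question's citiesHash/AddCity type, whose definition is not shown. The following is a sketch under these assumptions: the HtmlAgilityPack NuGet package is referenced, the response string is already in hand, and CityParser/ParseCities are hypothetical names. It skips the placeholder option by its empty value rather than by position, and keeps the first code if two cities share a name instead of throwing as ToDictionary would.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack;

// Hypothetical helper (not from the answers): parses the <select> fragment into name/code pairs.
public static class CityParser
{
    static CityParser()
    {
        // Runs once, before the first ParseCities call: stop treating <option> as an
        // empty element so its text becomes a child node. ElementsFlags is static,
        // so this affects every HtmlDocument parsed afterwards in the process.
        HtmlNode.ElementsFlags.Remove("option");
    }

    public static Dictionary<string, string> ParseCities(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var cities = new Dictionary<string, string>();
        foreach (HtmlNode option in doc.DocumentNode.Descendants("option"))
        {
            string value = option.GetAttributeValue("value", "");
            if (value.Length == 0)
            {
                continue; // skips the "Selecione" placeholder, whose value is empty
            }

            // DeEntitize turns entities such as &atilde; into characters, in case the server sends them.
            string name = HtmlEntity.DeEntitize(option.InnerText).Trim();
            if (!cities.ContainsKey(name))
            {
                cities.Add(name, value); // keeps the first code if a name repeats
            }
        }
        return cities;
    }
}
```

Putting the Remove call in a static constructor means it happens once rather than on every parse, which avoids mutating the shared table while another thread may be parsing.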



Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow