How can I parse the InnerText of <option> tags?

c# html-agility-pack

Question

Context:

I am trying to parse the "Cities" from this page. I already managed to simulate the request that loads the data for this combobox, which is an Ajax call.

Fiddler request:

POST http://www.telelistas.net/AjaxHandler.ashx HTTP/1.1
Host: www.telelistas.net
Connection: keep-alive
Content-Length: 106
Origin: http://www.telelistas.net
X-Requested-With: XMLHttpRequest
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.97 Safari/537.11
Content-Type: application/x-www-form-urlencoded; charset=UTF-8
Accept: */*
Referer: http://www.telelistas.net/
Accept-Encoding: gzip,deflate,sdch
Accept-Language: pt-BR,pt;q=0.8,en-US;q=0.6,en;q=0.4
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.3
Cookie: cert_Origin=directo; email=bdc.testes@gmail.com; auto=automatico=0; searchparameters=bottom=0&btnsite=0&email=&uf=rj&origem=0&nome=&pagina=1&codlogradouro=&predio=213&tiquete=0&localidadeendmap=&codbairro=0&pcount=25&estacionamento=0&letra=&top=&entrega=0&pchave=&info=&logradouro=rua+da+lapa&codtitulo=-1&chave=&zoom=&comercial=0&ddd=0&comib=0&btnemail=0&pgresultado=&localidade=&telefone=&manobrista=0&codlocalidade=21000&site=&cartoes=0&atividade=&bairro=&reserva=0&residencial=0; perfil=logged=1&iduser=2563063&email=bdc.testes@gmail.com&usertype=2&specialsearch=3&siteusernome=BigDataCorp&siteuserdatanasc=15/01/1988&siteusersexo=M&siteuserlocalidade=21000&siteuseruf=RJ&siteuserddd=21&siteusertelefone=94118439&siteuserprofissao=4&siteuserrenda=5000&siteuserformacao=4&siteusernovidades=0&siteusernovidadesrevista=&siteusernovidadesparceiros=0&siteusercpf=10541308769&siteuseracesso=brasil&siteusercep=22631000&siteuseridade=24&siteuserparceiro=telelistas&siteuserconhecimento=2&siteuseroperadora=oi&siteuserurlorigem=http://www.telelistas.net/&siteuserdatacadastro=13/12/2012 11:45:00; __utma=70879631.392027796.1355939587.1356014801.1356021821.5; __utmb=70879631.1.10.1356021821; __utmc=70879631; __utmz=70879631.1355939587.1.1.utmcsr=(direct)|utmccn=(direct)|utmcmd=(none)

PostData : state=rj&style=busca_interna&selectedCity=21000&clientId=pch_localidade_select&method=GetSearchCitiesNamed
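
For readers who want to replay this call from C#, here is a minimal sketch, assuming `HttpClient` from `System.Net.Http`. The class and method names (`CitiesRequest`, `Build`) are made up for illustration; the URL, form fields and `X-Requested-With` header are copied from the Fiddler capture above. Whether the server also needs the cookies is not known, so they are left out.

```csharp
using System;
using System.Collections.Generic;
using System.Net.Http;

// Hypothetical helper: rebuilds the Ajax POST captured in Fiddler.
public static class CitiesRequest
{
    public static HttpRequestMessage Build(string state, string selectedCity)
    {
        var request = new HttpRequestMessage(
            HttpMethod.Post, "http://www.telelistas.net/AjaxHandler.ashx");

        // Headers the browser sent along with the XMLHttpRequest.
        request.Headers.Add("X-Requested-With", "XMLHttpRequest");
        request.Headers.Referrer = new Uri("http://www.telelistas.net/");

        // FormUrlEncodedContent sets the form Content-Type and
        // URL-encodes the pairs in the order given.
        request.Content = new FormUrlEncodedContent(new[]
        {
            new KeyValuePair<string, string>("state", state),
            new KeyValuePair<string, string>("style", "busca_interna"),
            new KeyValuePair<string, string>("selectedCity", selectedCity),
            new KeyValuePair<string, string>("clientId", "pch_localidade_select"),
            new KeyValuePair<string, string>("method", "GetSearchCitiesNamed"),
        });

        return request;
    }
}
```

The request would then be sent with something like `client.SendAsync(CitiesRequest.Build("rj", "21000"))`, and the returned HTML read from `response.Content.ReadAsStringAsync()`.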

Issue:

Here is a fragment of the string returned by this request:

<select name='pch_localidade_select' class='busca_interna' id='pch_localidade_select' tabindex="4"><option value="">Selecione</option><option selected value="21000">Rio de Janeiro</option><option value="21380">Abraão</option><option value="21001">Afonso Arinos</option><option value="21002">Agência Luterback</option><option value="21847">Agriões de Dentro</option>

What I am trying to do is read the InnerText of the option tags ("Rio de Janeiro", "Abraão"...), but for some reason the InnerText is always empty for every option node found.

Here is the code fragment that is failing:

        // Iterating over nodes to build the dictionary
        foreach (HtmlNode city in citiesNodes)
        {
            string key   = city.InnerText;
            string value = city.Attributes["value"].Value;

            citiesHash.AddCity (key,value);
        }

Technology in Place:

I am using C# with HtmlAgilityPack, which supports XPath syntax for selecting nodes, and Fiddler2 for web debugging.
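
The question does not show how `citiesNodes` is obtained. One plausible way, sketched here under that assumption, is an XPath query against the `id` seen in the returned fragment. `CitySelector` and `SelectCityOptions` are illustrative names, not part of the original code.

```csharp
using System.Collections.Generic;
using HtmlAgilityPack;

// Hypothetical helper: one way the option nodes could be selected.
public static class CitySelector
{
    // Returns the <option> nodes of the cities combobox, or an empty list.
    public static IList<HtmlNode> SelectCityOptions(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes(
            "//select[@id='pch_localidade_select']/option");

        // SelectNodes returns null, not an empty collection,
        // when the expression matches nothing.
        if (nodes == null)
        {
            return new List<HtmlNode>();
        }
        return nodes;
    }
}
```

The null check matters because a `foreach` over a null result would throw a `NullReferenceException`.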

Thanks in advance

Accepted Answer

For some reason, HtmlAgilityPack does not handle those tags correctly, so this is what solved my problem:

        // Iterating over nodes to build the dictionary
        foreach (HtmlNode city in citiesNodes)
        {
            if (city.NextSibling != null)
            {
                string key   = city.NextSibling.InnerText;
                string value = city.Attributes["value"].Value;

                citiesHash.AddCity (key,value);
            }
        }

Instead of reading the text from the option node itself, I read it from the node's NextSibling: the parser places the text in the sibling node that follows each option element.
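
A slightly more defensive variant of the same idea, as a sketch only: read `InnerText` first and fall back to the following text node when it is empty, so the loop gives the same result whether or not the parser treats `option` as an empty element. `OptionReader` and `ReadOptions` are hypothetical names, and a plain `Dictionary` stands in for the `citiesHash` object, whose type is not shown in the question.

```csharp
using System.Collections.Generic;
using HtmlAgilityPack;

// Hypothetical helper: maps each option's value attribute to its label.
public static class OptionReader
{
    public static Dictionary<string, string> ReadOptions(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var result = new Dictionary<string, string>();
        foreach (HtmlNode option in doc.DocumentNode.Descendants("option"))
        {
            string value = option.GetAttributeValue("value", "");
            if (value.Length == 0)
            {
                continue; // skip the "Selecione" placeholder
            }

            string text = option.InnerText;

            // If the option was parsed as an empty element, its label
            // is the text node that immediately follows it.
            HtmlNode next = option.NextSibling;
            if (string.IsNullOrWhiteSpace(text)
                && next != null
                && next.NodeType == HtmlNodeType.Text)
            {
                text = next.InnerText;
            }

            result[value] = HtmlEntity.DeEntitize(text).Trim();
        }
        return result;
    }
}
```

Checking `NodeType` avoids picking up the text of an unrelated element that happens to follow an option.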


Popular Answer

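Just call HtmlAgilityPack.HtmlNode.ElementsFlags.Remove("option"); before loading the HTML. In the HtmlAgilityPack versions current when this was written, option is registered in the static HtmlNode.ElementsFlags dictionary as an empty element, so its text is parsed as a separate sibling node rather than as a child. Removing the entry makes the parser treat option as a normal container, and InnerText then works as expected. Because the dictionary is static, the change applies to every document loaded afterwards.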

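using System.Linq; // needed for Skip, Select and ToList

// Must run before LoadHtml so that <option> is no longer parsed as an empty element.
HtmlAgilityPack.HtmlNode.ElementsFlags.Remove("option");

HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
doc.LoadHtml(html);

// Skip(1) drops the leading "Selecione" placeholder option.
var options = doc.DocumentNode.Descendants("option").Skip(1)
                .Select(n => new
                {
                    Value = n.Attributes["value"].Value,
                    Text = n.InnerText
                })
                .ToList();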


Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow