C#: HtmlAgilityPack Descendants

c# html html-agility-pack

Question

Good day. I have a task where I need to convert a Word document to HTML.

This can be done using interop and saving the document as HTML. But I need to clean up the HTML output that interop produces.

But I have a problem with HtmlAgilityPack. I thought it was similar to C#'s XmlDocument.

This is my C# code:

    HtmlDocument doc = new HtmlDocument();
    doc.Load(htmlLocation);
    foreach (var item in doc.DocumentNode.Descendants("p"))
    {
        if (item.HasChildNodes)
        {
            foreach (var itm in item.Descendants("span").ToList())
            {
                Console.WriteLine(itm.InnerText);
            }
        }
    }

This is the HTML.

<html>

<head>
<meta http-equiv=Content-Type content="text/html; charset=windows-1252">
<meta name=Generator content="Microsoft Word 12 (filtered)">

</head>

<body lang=EN-US link="#0066CC" vlink=purple style='text-justify-trim:punctuation'>

<div class=WordSection1>

<p class=Heading61 style='margin-bottom:0in;margin-bottom:.0001pt;text-indent:
.5in;line-height:normal;page-break-after:avoid;background:transparent'><span
class=Heading6><span style='font-size:12.0pt;color:black;background:yellow'>Epilogue</span></span></p>

<p class=MsoBodyText style='line-height:normal;background:transparent'><span
class=BodytextItalic2><span style='font-size:12.0pt;color:black;font-style:
normal'>&nbsp;</span></span></p>

<p class=MsoBodyText style='line-height:normal;background:transparent'><span
class=BodytextItalic2><span style='font-size:12.0pt;color:black;font-style:
normal'>Rebecca sat outside her lodge cradling her infant son in her arms. How
handsome he was, her little warrior, with his dusky skin and thick black hair.
For the first few days after his birth, she had been afraid to let him out of
her sight, out of her arms, for fear she would lose him, but he was a strong
healthy child.</span></span></p>

<p class=MsoBodyText style='text-indent:.5in;line-height:normal;background:
transparent'><span class=BodytextItalic2><span style='font-size:12.0pt;
color:black;font-style:normal'>Looking at him made her heart swell with love
for him and for his father. She had married Wolf Dreamer the day after they
returned to his people. Summer Moon Rising had left the village the following
day.</span></span></p>

</div>

</body>

</html>

This is the output of the code above.

Epilogue
Epilogue
&nbsp;
&nbsp;
Rebecca sat outside her lodge cradling her infant son in her arms. How
handsome he was, her little warrior, with his dusky skin and thick black hair.
For the first few days after his birth, she had been afraid to let him out of
her sight, out of her arms, for fear she would lose him, but he was a strong
healthy child.
Rebecca sat outside her lodge cradling her infant son in her arms. How
handsome he was, her little warrior, with his dusky skin and thick black hair.
For the first few days after his birth, she had been afraid to let him out of
her sight, out of her arms, for fear she would lose him, but he was a strong
healthy child.
Looking at him made her heart swell with love
for him and for his father. She had married Wolf Dreamer the day after they
returned to his people. Summer Moon Rising had left the village the following
day.
Looking at him made her heart swell with love
for him and for his father. She had married Wolf Dreamer the day after they
returned to his people. Summer Moon Rising had left the village the following
day.

What I expect is that the second foreach depends only on the elements of the current item. So why does it repeat the text?

Popular Answer

You have 4 p tags, and each one contains two spans, one nested inside the other. Descendants returns every descendant node with a matching name, so your inner foreach runs once for each of the two spans. Because the outer span's InnerText includes the text of the inner span, the same text is printed twice.

Your inner foreach could be
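The duplication can be reproduced in isolation. The following is a minimal sketch, assuming the HtmlAgilityPack NuGet package is referenced; the inline HTML string and the class names `outer` and `inner` are made up for illustration and stand in for one paragraph of the Word output.

```csharp
using System;
using System.Linq;
using HtmlAgilityPack;

class DescendantsDemo
{
    static void Main()
    {
        // Cut-down stand-in for one paragraph of the Word output (assumption,
        // not the original file).
        const string html =
            "<p class=\"MsoBodyText\">" +
            "<span class=\"outer\"><span class=\"inner\">Epilogue</span></span>" +
            "</p>";

        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        HtmlNode p = doc.DocumentNode.Descendants("p").First();

        // Descendants("span") walks the whole subtree under p, so it yields
        // both the outer span and the span nested inside it.
        foreach (HtmlNode span in p.Descendants("span"))
        {
            string cls = span.GetAttributeValue("class", "");
            // The outer span's InnerText already contains the inner span's
            // text, so both iterations print the same string.
            Console.WriteLine(cls + ": " + span.InnerText);
        }
    }
}
```

As in the question's output, the paragraph text should appear once per matching span, which is why every paragraph is printed twice.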

    foreach (var itm in item.ChildNodes)
    {
      Console.WriteLine(itm.InnerText);
    }
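Keep in mind that ChildNodes returns every direct child, including text nodes, not only spans. If only the direct span children are wanted, one possible variant is `Elements("span")`, which does not descend into nested elements. The sketch below assumes the HtmlAgilityPack NuGet package is referenced and uses placeholder HTML shaped like the Word output; it is an illustration, not necessarily what the answer's author had in mind.

```csharp
using System;
using HtmlAgilityPack;

class DirectChildrenDemo
{
    static void Main()
    {
        // Two paragraphs shaped like the Word output; the text is
        // placeholder data (assumption).
        const string html =
            "<div>" +
            "<p><span class=\"outer\"><span>First paragraph</span></span></p>" +
            "<p><span class=\"outer\"><span>Second paragraph</span></span></p>" +
            "</div>";

        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        foreach (HtmlNode p in doc.DocumentNode.Descendants("p"))
        {
            // Elements("span") returns only the direct span children of p,
            // so the nested span is not visited a second time.
            foreach (HtmlNode span in p.Elements("span"))
            {
                Console.WriteLine(span.InnerText);
            }
        }
    }
}
```

With this shape of input, each paragraph has a single direct span child, so its text is written once.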



Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow