Html Agility Pack: reading from a buffer/stream

asp.net c# html html-agility-pack httpmodule

Question

I am trying to alter an HTML page before it renders in the browser, using an HTTP module. I tried to use the Html Agility Pack parser, but it only seems to read from files.

How can I have it read from a buffer/stream?

public override void Write(byte[] buffer, int offset, int count)
{
  byte[] data = new byte[count];
  Buffer.BlockCopy(buffer, offset, data, 0, count);
  string html = System.Text.Encoding.Default.GetString(data);

  HtmlDocument doc = new HtmlDocument();
  doc.Load(html);
  foreach(HtmlNode link in doc.DocumentElement.SelectNodes("//a[@href]"))
  {
    HtmlAttribute att = link["href"];
    att.Value = FixLink(att);
  }
}

Accepted Answer

You should be able to use a MemoryStream to read in the data:

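public override void Write(byte[] buffer, int offset, int count)
{
  // Wrap only the written slice of the buffer; no copy is needed.
  using (var stream = new MemoryStream(buffer, offset, count))
  {
    HtmlDocument doc = new HtmlDocument();
    doc.Load(stream);

    // Current Html Agility Pack releases expose DocumentNode and Attributes;
    // older samples used DocumentElement and link["href"].
    // SelectNodes returns null when nothing matches.
    HtmlNodeCollection links = doc.DocumentNode.SelectNodes("//a[@href]");
    if (links != null)
    {
      foreach (HtmlNode link in links)
      {
        HtmlAttribute att = link.Attributes["href"];
        att.Value = FixLink(att);
      }
    }
  }
}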
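
If the markup is already in a string, as in the question, LoadHtml may be the simpler route, since Load(string) treats its argument as a file path. The following is only a minimal sketch: the LinkRewriter class, the RewriteLinks method, and the fixLink delegate are hypothetical names standing in for the asker's own FixLink helper, and it assumes a current Html Agility Pack release.

```csharp
using System;
using HtmlAgilityPack;

public static class LinkRewriter
{
    // Hypothetical helper: parses markup held in a string, rewrites every
    // href through the supplied delegate, and returns the resulting markup.
    public static string RewriteLinks(string html, Func<string, string> fixLink)
    {
        var doc = new HtmlDocument();
        // LoadHtml parses the string itself; Load(string) would look for a file.
        doc.LoadHtml(html);

        // SelectNodes returns null when no node matches the XPath.
        HtmlNodeCollection links = doc.DocumentNode.SelectNodes("//a[@href]");
        if (links != null)
        {
            foreach (HtmlNode link in links)
            {
                HtmlAttribute att = link.Attributes["href"];
                att.Value = fixLink(att.Value);
            }
        }

        // Serialize the modified tree back to a string.
        return doc.DocumentNode.OuterHtml;
    }
}
```

A call might look like LinkRewriter.RewriteLinks(html, href => "/prefix" + href); the result would still have to be encoded and written to the underlying response stream, which neither snippet here does.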

Popular Answer

The HtmlDocument.Load() method is overloaded and includes versions that take a stream: Load(Stream), Load(Stream, Boolean), and Load(Stream, Encoding).

You can find the documentation in the Downloads tab at http://htmlagilitypack.codeplex.com/
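
As an illustration of the Load(Stream, Encoding) overload listed above, here is a small sketch that passes the encoding explicitly instead of relying on detection. The StreamLoading class and LoadFromBuffer method are hypothetical names, and choosing the right encoding for the response is left to the caller.

```csharp
using System.IO;
using System.Text;
using HtmlAgilityPack;

public static class StreamLoading
{
    // Hypothetical helper: parses a slice of a byte buffer with a caller-chosen encoding.
    public static HtmlDocument LoadFromBuffer(byte[] buffer, int offset, int count, Encoding encoding)
    {
        var doc = new HtmlDocument();
        // MemoryStream wraps the existing array, so the bytes are not copied.
        using (var stream = new MemoryStream(buffer, offset, count))
        {
            doc.Load(stream, encoding);
        }
        return doc;
    }
}
```

For example, StreamLoading.LoadFromBuffer(bytes, 0, bytes.Length, Encoding.UTF8) would parse the whole array as UTF-8.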




Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow