C# data scraping from websites

c# html-agility-pack

Question

HI I am pretty new in C# sphere. Been in php and JavaScript since the beginning of this year. I want to scrap posts and comments from a blog. The site is http://www.somewhereinblog.net

What I want to do is 1. I want to log in using a software 2. Then download the html 3. Then use regular expressions, xpath whatever comes handy to separate the contents of posts and comments

I been searching all over. Understood very little. Though I am quite sure I need to use 'htmlagilitypack'. I dont know how to add a library to c# console or form application. Can someone give me some help? I badly need this. And I am not too into C# just a week. So would be grateful if there is some detailed information. Waiting eagerly.

Thanks in advance brothers.

Accepted Answer

  1. Using Webclient you can login and download
  2. Instead html-agility-pack I like CsQuery because lets you use jQuery syntax inside a string in C# code, so you can download to a string the html, and search and do things in it like with jQuery and HTML page.


Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow
Is this KB legal? Yes, learn why
Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow
Is this KB legal? Yes, learn why