Using HtmlAgilityPack in C# to get specific data and serialize it to JSON

c# html html-agility-pack json visual-studio

I have downloaded an HTML source file and I am trying to extract some data from it and serialize that data to a JSON file.

Here is the HTML source file: https://drive.google.com/file/d/0BzweTZsfeoxMTWk2LVdnYTJMRUE/view?usp=sharing

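In the HTML there are 2 groups that I want to collect data from.

So far I have managed to grab the markup inside these 2 groups and display it in two panels using labels. My code is as follows: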

using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.Data;
using System.Drawing;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using System.Windows.Forms;
using HtmlAgilityPack;

namespace Parser_Test_1._0
{
    public partial class Form1 : Form
    {
        public Form1()
        {
            InitializeComponent();
        }

        private void Form1_Load(object sender, EventArgs e)
        {

        }

        private void button1_Click(object sender, EventArgs e)
        {
            HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
            doc.Load(@"C:...\bin\Debug\xbFrSourceCode.txt");

            string datacollected1 = doc.DocumentNode.SelectNodes("//*[@id=\"favoritesContent\"]/div[2]/div[2]/ul")[0].InnerHtml;
            string datacollected2 = doc.DocumentNode.SelectNodes("//*[@id=\"friendsContent\"]/div[2]/div[2]")[0].InnerHtml;
            label1.Text = datacollected1;
            label2.Text = datacollected2;
        }      

    }
}

From these two groups I want to collect the users and, for each user, serialize their data to a JSON file.

Each user is delimited by <li ...></li>.

For each user I want to get:

  • Gamertag: data-gamertag="this is the gamertag"
  • Gamerpic: it is inside class="gamerpicWrapper", in src="this is the gamerpic"
  • Real name: <div class="realName">this is the realname</div>
  • PrimaryInfo: <div class="primaryInfo">this is the primaryinfo</div>
  • Online status: <div class="statusIcon">if there is any markup in here, this value will be true in the JSON file</div>

Here is an example of the desired JSON file format (please note that it may be badly written):

{
    "favorites" : 
    [
        {
            "gamertag" : "Gamertag1",
            "gamerpic" : "gamerpicURL",
            "realname" : "",
            "primaryInfo" : "",
            "isOnline" : false
        },
        {
            "gamertag" : "Gamertag2",
            "gamerpic" : "gamerpicURL",
            "realname" : "realname2",
            "primaryInfo" : "primaryinfo2",
            "isOnline" : true
        },
        {
            "gamertag" : "Gamertag3",
            "gamerpic" : "gamerpicURL",
            "realname" : "",
            "primaryInfo" : "",
            "isOnline" : false
        },
        {
            "gamertag" : "Gamertag4",
            "gamerpic" : "gamerpicURL",
            "realname" : "realname4",
            "primaryInfo" : "",
            "isOnline" : true
        }
    ],
    "friends" : 
    [
        {
            "gamertag" : "Gamertag1",
            "gamerpic" : "gamerpicURL",
            "realname" : "",
            "primaryInfo" : "",
            "isOnline" : true
        },
        {
            "gamertag" : "Gamertag2",
            "gamerpic" : "gamerpicURL",
            "realname" : "realname2",
            "primaryInfo" : "primaryinfo2",
            "isOnline" : false
        },
        {
            "gamertag" : "Gamertag3",
            "gamerpic" : "gamerpicURL",
            "realname" : "realname3",
            "primaryInfo" : "",
            "isOnline" : true
        },
        {
            "gamertag" : "Gamertag4",
            "gamerpic" : "gamerpicURL",
            "realname" : "",
            "primaryInfo" : "",
            "isOnline" : false
        }
    ]
}
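
The layout above is one root object with two arrays of identically shaped entries, so it can be modeled with two small classes and handed to a serializer. The following is only a sketch under assumptions: the class names `Gamer`, `GamerLists`, and `GamerJson` are hypothetical, and it uses `System.Text.Json`, which ships with .NET Core 3.0 and later (on the older .NET Framework it would need to be added as a package, or a library such as Json.NET could be used instead).

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;
using System.Text.Json.Serialization;

// Hypothetical model for one entry of the "favorites" or "friends" array.
public class Gamer
{
    [JsonPropertyName("gamertag")] public string Gamertag { get; set; } = "";
    [JsonPropertyName("gamerpic")] public string Gamerpic { get; set; } = "";
    [JsonPropertyName("realname")] public string Realname { get; set; } = "";
    [JsonPropertyName("primaryInfo")] public string PrimaryInfo { get; set; } = "";
    [JsonPropertyName("isOnline")] public bool IsOnline { get; set; }
}

// Hypothetical root object holding both groups.
public class GamerLists
{
    [JsonPropertyName("favorites")] public List<Gamer> Favorites { get; set; } = new List<Gamer>();
    [JsonPropertyName("friends")] public List<Gamer> Friends { get; set; } = new List<Gamer>();
}

public static class GamerJson
{
    // Serializes both groups into a single indented JSON string.
    public static string Serialize(GamerLists lists)
    {
        var options = new JsonSerializerOptions { WriteIndented = true };
        return JsonSerializer.Serialize(lists, options);
    }
}
```

The returned string could then be written to disk with `System.IO.File.WriteAllText`; the output path is left to the caller. A serializer never emits trailing commas, which strict JSON parsers reject.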

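I would be very grateful if someone could show me how to do this.

Top answer

The code below shows a proper use of XPath and HAP. The XPath could be simplified, but you gave me a 4k HTML file and I did not want to learn the structure of all of it. Still, the code gives you everything you want as variables. Your job now is to put those values into the JSON structure; if you know nothing about JSON, consider using XML instead.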

        HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
        doc.OptionFixNestedTags = true;
        doc.Load("damn.html");

        // First we find the nodes we want to collect data from: every <li> in the favorites list.
        // The XPath could be cut down, but it is spelled out in full for simplicity.
        HtmlNodeCollection favoritesContent = doc.DocumentNode.SelectNodes("//div[@id='favoritesContent']/div[@class='personListWrapper']/div[@class='gamerList']/ul//li");

        // SelectNodes returns null (not an empty collection) when nothing matches.
        if (favoritesContent != null)
        {
            foreach (HtmlNode x in favoritesContent)
            {
                // The gamertag is an attribute on <li>. If <li> does not have that attribute,
                // the default value "" (empty string, as specified) is returned.
                string gamerTag = x.GetAttributeValue("data-gamertag", "");

                // SelectSingleNode returns null when there is no match, so guard every lookup.
                HtmlNode temp = x.SelectSingleNode("./a[@class='gamerpicWrapper']/*/img[@class='favorite']");
                string srcOnPic = temp != null ? temp.GetAttributeValue("src", "not found") : "not found";

                HtmlNode realNameNode = x.SelectSingleNode("./descendant::*//div[@class='realName']");
                string realName = realNameNode != null ? realNameNode.InnerText : "";

                HtmlNode primaryInfoNode = x.SelectSingleNode("./descendant::*//div[@class='primaryInfo']");
                string primaryInfo = primaryInfoNode != null ? primaryInfoNode.InnerText : "";

                // The user counts as online when the statusIcon div contains any markup.
                // Declared at loop scope so the value is usable alongside the other variables.
                HtmlNode statusIcon = x.SelectSingleNode("./div[@class='statusIcon']");
                bool online = statusIcon != null && statusIcon.InnerHtml.Length > 0;
            }
        }
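
The loop above handles only the favorites group and leaves the values in local variables. One way to cover both groups is to move the extraction into a method that takes the container id. This is a rough sketch, not a tested solution for the actual file: `GamerScraper`, `ReadGroup`, and `TextOf` are hypothetical names, the looser XPath expressions are assumptions based only on the class names and ids quoted in the question, and treating a whitespace-only `statusIcon` as offline is also an assumption.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack;

public static class GamerScraper
{
    // Returns one dictionary per <li data-gamertag="..."> found anywhere
    // inside the element whose id is containerId.
    public static List<Dictionary<string, object>> ReadGroup(HtmlDocument doc, string containerId)
    {
        var result = new List<Dictionary<string, object>>();
        HtmlNodeCollection items = doc.DocumentNode.SelectNodes(
            "//*[@id='" + containerId + "']//li[@data-gamertag]");
        if (items == null) return result; // SelectNodes yields null when nothing matches

        foreach (HtmlNode li in items)
        {
            // Assumption: the picture is the first <img> under the gamerpicWrapper element.
            HtmlNode img = li.SelectSingleNode(".//*[contains(@class,'gamerpicWrapper')]//img");
            HtmlNode status = li.SelectSingleNode(".//div[@class='statusIcon']");

            result.Add(new Dictionary<string, object>
            {
                ["gamertag"] = li.GetAttributeValue("data-gamertag", ""),
                ["gamerpic"] = img != null ? img.GetAttributeValue("src", "") : "",
                ["realname"] = TextOf(li, "realName"),
                ["primaryInfo"] = TextOf(li, "primaryInfo"),
                // Assumption: whitespace alone does not count as content.
                ["isOnline"] = status != null && status.InnerHtml.Trim().Length > 0
            });
        }
        return result;
    }

    // Text of the first descendant <div class="cssClass">, or "" when it is absent.
    private static string TextOf(HtmlNode li, string cssClass)
    {
        HtmlNode node = li.SelectSingleNode(".//div[@class='" + cssClass + "']");
        return node == null ? "" : HtmlEntity.DeEntitize(node.InnerText).Trim();
    }
}
```

Calling `ReadGroup(doc, "favoritesContent")` and `ReadGroup(doc, "friendsContent")` would give the two lists, which can be placed under the `favorites` and `friends` keys of a root dictionary and passed to whichever JSON serializer the project uses. Plain dictionaries are used here only to keep the sketch short; a dedicated class would give stronger typing.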



Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow