weixin_public_corpus
1.0.0
A collection of articles from WeChat public accounts, crawled from the web. HTML has been stripped, so only plain text remains. The data is stored as JSON, one article per line, with these fields: name is the public account's display name, account is the public account ID, title is the article title, and content is the body text.
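Because each line is a standalone JSON object, the corpus can be streamed one article at a time instead of being loaded whole. The sketch below is a minimal reader, assuming the extracted files are UTF-8 encoded and use exactly the four fields described above; the function names iter_articles and read_corpus are hypothetical helpers, not part of the dataset.

```python
import json
from typing import Dict, Iterable, Iterator

# The four fields documented for each article record.
FIELDS = ("name", "account", "title", "content")


def iter_articles(lines: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Parse JSON-lines text: one article object per non-blank line."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines, e.g. a trailing newline
        record = json.loads(line)
        # Keep only the documented fields; a missing field becomes "".
        yield {key: record.get(key, "") for key in FIELDS}


def read_corpus(path: str) -> Iterator[Dict[str, str]]:
    """Stream articles from an extracted file (UTF-8 assumed) lazily."""
    with open(path, encoding="utf-8") as f:
        yield from iter_articles(f)
```

For example, `for a in read_corpus("articles.json"): print(a["account"], a["title"])` would list every article's account ID and title, where "articles.json" stands in for whatever file name the extracted archive actually contains. Reading line by line keeps memory use flat regardless of how large the corpus grows.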
The data is compressed as a multi-volume (split) zip archive with no password. For a preview, see preview.json.
The dataset is currently about 3 GB and will be updated regularly.
Please use it for research purposes only.
If you have any problems or special requests, please open an issue.
Like-minded people are welcome to join Xiaobao Online and work on interesting things with us! https://www.xiaobaoonline.com/pc/contactjoin