Sina Weibo crawler link:
https://github.com/HUANGZHIHAO1994/weibospider-keyword
Weibo content data structure (JSON document exported from the MongoDB database)
content_example:
[
{'_id': '1177737142_H4PSVeZWD', 'keyword': 'A股', 'crawl_time': '2019-06-01 20:31:13', 'weibo_url': 'https://weibo.com/1177737142/H4PSVeZWD', 'user_id': '1177737142', 'created_at': '2018-11-29 03:02:30', 'tool': 'Android', 'like_num': {'$numberInt': '0'}, 'repost_num': {'$numberInt': '0'}, 'comment_num': {'$numberInt': '0'}, 'image_url': 'http://wx4.sinaimg.cn/wap180/4632d7b6ly1fxod61wktyj20u00m8ahf.jpg', 'content': '#a股观点# 鲍威尔主席或是因为被特朗普总统点名批评后萌生悔改之意,今晚一番讲话被市场解读为美联储或暂停加息步伐。美元指数应声下挫,美股及金属贵金属价格大幅上扬,A50表现也并不逊色太多。对明天A股或有积极影响,反弹或能得以延续。 [组图共2张]'},...
]
Weibo comment data structure (JSON document exported from the MongoDB database)
comment_example:
[
{'_id': 'C_4322161898716112', 'crawl_time': '2019-06-01 20:35:36', 'weibo_url': 'https://weibo.com/1896820725/H9inNf22b', 'comment_user_id': '6044625121', 'content': '没问题,', 'like_num': {'$numberInt': '0'}, 'created_at': '2018-12-28 11:19:21'},...
]
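Both exports wrap numeric fields in MongoDB extended-JSON objects such as {'$numberInt': '0'}. The sketch below is not part of the repo; it shows one way to load the dumps and flatten those wrappers before analysis, assuming each dump is a JSON array like the examples above (the file names content.json and comment.json match the dump names mentioned at the end of this README).

import json

def flatten_numbers(doc):
    # Replace {'$numberInt': '0'}-style wrappers with plain Python ints.
    for key, value in doc.items():
        if isinstance(value, dict) and '$numberInt' in value:
            doc[key] = int(value['$numberInt'])
    return doc

with open('content.json', encoding='utf-8') as f:
    contents = [flatten_numbers(d) for d in json.load(f)]
with open('comment.json', encoding='utf-8') as f:
    comments = [flatten_numbers(d) for d in json.load(f)]

print(contents[0]['like_num'], comments[0]['content'])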
prepro.py, pre_graph.py, senti_pre.py
To support the various analyses, the data must first be preprocessed. See these three .py files for the exact input file types they expect and the structure of the data they output.
PS:
When running prepro.py, modify the code at lines 123, 143, and 166 as needed.
When running pre_graph.py, modify the code at lines 127 and 140 as needed.
When running senti_pre.py, modify the code at line 119 as needed.
zh_wiki.py, langconv.py
These two .py files convert Traditional Chinese to Simplified Chinese and can be used without modification.
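Typical usage (the Converter API below is the one exposed by the widely circulated langconv.py, which reads its mapping tables from zh_wiki.py):

from langconv import Converter

def t2s(text):
    # 'zh-hans' selects the Traditional -> Simplified mapping.
    return Converter('zh-hans').convert(text)

print(t2s('金屬貴金屬價格大幅上揚'))  # -> 金属贵金属价格大幅上扬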
Word cloud: wc.py (requires running prepro.py first)
Modify the code at lines 3, 19, and 26 as needed.
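A minimal sketch of the word-cloud step, assuming the jieba and wordcloud packages and a hypothetical segmented text file produced by prepro.py (the actual input path is one of the things wc.py lets you modify):

import jieba
from wordcloud import WordCloud

with open('content_prepro.txt', encoding='utf-8') as f:  # hypothetical prepro.py output
    text = ' '.join(jieba.cut(f.read()))

wc = WordCloud(font_path='simhei.ttf',  # a Chinese font file is required
               background_color='white', width=800, height=600).generate(text)
wc.to_file('wordcloud.png')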
Popularity map: map.py (requires running prepro.py first)
Modify the code at line 8 as needed.
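For orientation only, a sketch of a province-level popularity map with pyecharts (whether map.py uses pyecharts, and which version, is an assumption; the data pairs are placeholders for the counts prepro.py produces):

from pyecharts import options as opts
from pyecharts.charts import Map

data = [('广东', 1024), ('北京', 896), ('上海', 770)]  # placeholder (province, count) pairs

m = (
    Map()
    .add('Weibo popularity', data, 'china')
    .set_global_opts(title_opts=opts.TitleOpts(title='Popularity by province'),
                     visualmap_opts=opts.VisualMapOpts(max_=1100))
)
m.render('map.html')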
Repost, comment, and like time series: line.py (requires running senti_pre.py and senti_analy.py first)
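A hedged sketch of such a time series, assuming the senti_analy.py output CSV carries created_at, repost_num, comment_num, and like_num columns (the column names are assumptions):

import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv('Senti_Keyword_total.csv', parse_dates=['created_at'])
daily = (df.set_index('created_at')[['repost_num', 'comment_num', 'like_num']]
           .resample('D').sum())  # daily totals

daily.plot(figsize=(12, 5), title='Reposts / comments / likes per day')
plt.savefig('line.png')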
Weibo comment relationship diagram: graph.py (requires running pre_graph.py first; for reference)
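A minimal sketch of a comment relationship graph with networkx, assuming pre_graph.py yields (commenter, poster) user-id pairs; the edges here are placeholders:

import networkx as nx
import matplotlib.pyplot as plt

edges = [('6044625121', '1896820725'), ('6044625121', '1177737142')]  # placeholder pairs

G = nx.DiGraph()
G.add_edges_from(edges)  # edge: comment_user_id -> weibo_user_id

nx.draw(G, with_labels=True, node_size=300, font_size=8)
plt.savefig('graph.png')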
Text clustering: cluster_tfidf.py and cluster_w2v.py (requires running prepro.py first)
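The TF-IDF variant boils down to vectorizing the segmented texts and clustering them; a sketch (parameters illustrative, input file hypothetical):

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

with open('content_prepro.txt', encoding='utf-8') as f:  # hypothetical prepro.py output,
    docs = [line.strip() for line in f if line.strip()]  # one segmented Weibo per line

X = TfidfVectorizer(max_features=5000).fit_transform(docs)
labels = KMeans(n_clusters=5, random_state=0).fit_predict(X)
print(labels[:10])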
LDA topic model analysis: LDA.py (requires running senti_pre.py first); tree.py (requires running senti_analy.py first)
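An illustrative gensim LDA sketch under the same assumption of one space-segmented document per line; num_topics and passes are arbitrary:

from gensim import corpora, models

with open('content_prepro.txt', encoding='utf-8') as f:  # hypothetical segmented corpus
    texts = [line.split() for line in f if line.strip()]

dictionary = corpora.Dictionary(texts)
corpus = [dictionary.doc2bow(t) for t in texts]
lda = models.LdaModel(corpus, id2word=dictionary, num_topics=5, passes=10)
for topic_id, words in lda.print_topics(num_topics=5, num_words=8):
    print(topic_id, words)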
Sentiment analysis (dictionary-based): senti_analy.py (requires running senti_pre.py first); 3Dbar.py and pie.py (require running senti_analy.py first)
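As a toy illustration of dictionary-based scoring (the word lists below are placeholders; senti_analy.py uses its own sentiment dictionaries):

import jieba

POS = {'反弹', '积极', '上扬'}  # placeholder positive words
NEG = {'下挫', '下跌', '批评'}  # placeholder negative words

def senti_score(text):
    # positive hits minus negative hits over the segmented text
    words = list(jieba.cut(text))
    return sum(w in POS for w in words) - sum(w in NEG for w in words)

print(senti_score('美元指数应声下挫,A股反弹或能得以延续'))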
Sentiment analysis (W2V+LSTM): senti_lstm.py in the Sentiment-Analysis-master directory (requires running senti_pre.py first)
Modify the code at line 250 as needed.
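A compressed, Keras 2-style sketch of the W2V+LSTM idea, not the repo's exact network: texts become padded index sequences, the Embedding layer can be initialized from pretrained word2vec vectors, and an LSTM performs binary sentiment classification. All hyperparameters are illustrative.

from keras.models import Sequential
from keras.layers import Embedding, LSTM, Dense

vocab_size, embed_dim, maxlen = 20000, 100, 50  # illustrative values

model = Sequential([
    Embedding(vocab_size, embed_dim, input_length=maxlen),  # optionally load w2v weights
    LSTM(64, dropout=0.2, recurrent_dropout=0.2),
    Dense(1, activation='sigmoid'),  # positive vs. negative
])
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
model.summary()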
Some files are too large and are hosted on Baidu Netdisk:
Link: https://pan.baidu.com/s/1l447d3d6OSd_yAlsF7b_mA Extraction code: og9t
Text similarity analysis: similar.py (for reference only)
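One common way to implement such a similarity check (whether similar.py does exactly this is an assumption) is TF-IDF cosine similarity over jieba-segmented texts:

import jieba
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

a = '美联储或暂停加息步伐'
b = '美联储今晚讲话暗示暂停加息'

X = TfidfVectorizer().fit_transform([' '.join(jieba.cut(a)), ' '.join(jieba.cut(b))])
print(cosine_similarity(X[0], X[1])[0, 0])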
Other scripts available for reference: senti_analy_refer.py, Sentiment_lstm.py
About Senti_Keyword_total_id.csv:
Download item 8, Senti_Keyword_total_id.csv, from the Baidu Netdisk. This file is almost identical to Senti_Keyword_total.csv except for an additional weibo_id column. (The code that generates Senti_Keyword_total_id.csv is not provided here; it can be reproduced by rewriting senti_analy.py to output an extra weibo_id column.) Item 8 on the Netdisk contains both Senti_Keyword_total_id.csv and Senti_Keyword_total.csv, as well as all comments and all contents. Since line.py and the other scripts need the data for all keywords, run senti_analy.py directly on the full comment.json and content.json to generate Senti_Keyword_total.csv, or simply download Senti_Keyword_total_id.csv from the Netdisk and then run line.py, 3Dbar.py, and pie.py.