
TextDescriptives


v2.8.4


A Python library for calculating a large variety of metrics from text using spaCy v3 pipeline components and extensions.

Installation

pip install textdescriptives

News

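We now have a TextDescriptives-powered web app, so you can extract and download metrics without writing a single line of code! Check it out here.
Version 2.0 is out with a new API, a new component, updated documentation, and tutorials! Components are now called by "textdescriptives/{metric_name}". The new coherence component calculates the semantic coherence between sentences. See the documentation for tutorials and more information!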
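The idea behind the coherence component can be sketched in a few lines. Assuming first-order coherence is the mean cosine similarity between the vectors of consecutive sentences, and second-order coherence the same quantity for sentences two positions apart (our reading of the metric names, not code taken from the library), a standard-library version looks like this. The functions cosine and coherence and the toy vectors are hypothetical; TextDescriptives would presumably take sentence vectors from the spaCy model.

```python
import math
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def coherence(vectors: List[Sequence[float]], gap: int) -> float:
    """Mean cosine similarity between sentence vectors `gap` positions apart."""
    sims = [cosine(vectors[i], vectors[i + gap]) for i in range(len(vectors) - gap)]
    return sum(sims) / len(sims) if sims else float("nan")


# Hypothetical 3-dimensional "sentence vectors", one per sentence
sentence_vectors = [
    [1.0, 0.0, 0.0],
    [1.0, 1.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 1.0, 1.0],
]

first_order = coherence(sentence_vectors, gap=1)   # neighbouring sentences
second_order = coherence(sentence_vectors, gap=2)  # sentences two apart
print(f"first-order coherence: {first_order:.4f}")    # → 0.7071
print(f"second-order coherence: {second_order:.4f}")  # → 0.2500
```

Higher values mean that nearby sentences point in more similar directions in vector space, which is the intuition behind calling a text "coherent".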

⚡ Quick Start

Use extract_metrics to quickly extract your desired metrics. To see which metric groups are available, simply run:

import textdescriptives as td
td.get_valid_metrics()
# {'quality', 'readability', 'all', 'descriptive_stats', 'dependency_distance', 'pos_proportions', 'information_theory', 'coherence'}
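Because metric groups are passed as plain strings, a misspelled name is easy to write. A minimal sketch of checking the names before calling extract_metrics, using a hypothetical helper check_metrics and a set copied from the output above (in real code you would use the set returned by td.get_valid_metrics() instead of hard-coding it):

```python
from typing import Iterable, List, Optional

# Copied from the td.get_valid_metrics() output above; not queried from the library
VALID_METRICS = {
    "quality", "readability", "all", "descriptive_stats",
    "dependency_distance", "pos_proportions", "information_theory", "coherence",
}


def check_metrics(requested: Optional[Iterable[str]]) -> Optional[List[str]]:
    """Hypothetical helper: return the requested metric names, raising on unknown ones.

    None is passed through unchanged, because extract_metrics treats metrics=None
    as "extract all metrics".
    """
    if requested is None:
        return None
    requested = list(requested)
    unknown = sorted(set(requested) - VALID_METRICS)
    if unknown:
        raise ValueError(f"Unknown metrics {unknown}; valid options: {sorted(VALID_METRICS)}")
    return requested


print(check_metrics(["readability", "coherence"]))  # → ['readability', 'coherence']
```

The result can then be passed straight on as the metrics argument.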

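Set the spacy_model parameter to specify which spaCy model to use; otherwise, TextDescriptives will automatically download an appropriate one based on lang. If lang is set, spacy_model is not necessary, and vice versa.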

Specify which metrics to extract in the metrics argument. None extracts all metrics.

import textdescriptives as td

text = "The world is changed. I feel it in the water. I feel it in the earth. I smell it in the air. Much that once was is lost, for none now live who remember it."
# will automatically download the relevant model (`en_core_web_lg`) and extract all metrics
df = td.extract_metrics(text=text, lang="en", metrics=None)

# specify spaCy model and which metrics to extract
df = td.extract_metrics(text=text, spacy_model="en_core_web_lg", metrics=["readability", "coherence"])
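Selecting "readability" returns scores such as Flesch reading ease, SMOG and LIX. To show what these scores are, the sketch below applies the standard published formulas to counts for the example text. The variable names are ours. The word, sentence, letter and syllable counts are derived from the extract_df output table further down; the long-word and polysyllable counts are hand counts, and using 3+ syllable words as Gunning fog's "complex words" is a simplification. Whether TextDescriptives counts things exactly this way is an assumption, but for this text the results agree with the readability columns in that table to within rounding.

```python
import math

# Counts for the example text. The first four come from the extract_df table
# (n_tokens, n_sentences, token_length_mean * n_tokens, syllables_per_token_mean * n_tokens);
# the last two are hand counts (an assumption).
n_words = 35
n_sentences = 5
n_letters = 115
n_syllables = 38
n_long_words = 2      # more than 6 letters: "changed", "remember"
n_polysyllables = 1   # 3 or more syllables: "remember"

words_per_sentence = n_words / n_sentences
syllables_per_word = n_syllables / n_words
letters_per_word = n_letters / n_words

scores = {
    "flesch_reading_ease": 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word,
    "flesch_kincaid_grade": 0.39 * words_per_sentence + 11.8 * syllables_per_word - 15.59,
    "smog": 1.0430 * math.sqrt(n_polysyllables * 30 / n_sentences) + 3.1291,
    "gunning_fog": 0.4 * (words_per_sentence + 100 * n_polysyllables / n_words),
    "automated_readability_index": 4.71 * letters_per_word + 0.5 * words_per_sentence - 21.43,
    # Coleman-Liau uses letters per 100 words and sentences per 100 words
    "coleman_liau_index": 0.0588 * (100 * letters_per_word)
    - 0.296 * (100 * n_sentences / n_words)
    - 15.8,
    "lix": words_per_sentence + 100 * n_long_words / n_words,
    "rix": n_long_words / n_sentences,
}

for name, value in scores.items():
    print(f"{name}: {value:.4f}")
```

Note that short, simple text can push grade-level formulas below zero, as with flesch_kincaid_grade here.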

Usage with spaCy

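To integrate with other spaCy pipelines, import the library and add the component(s) to your pipeline using the standard spaCy syntax. The available components are descriptive_stats, readability, dependency_distance, pos_proportions, coherence, and quality, each prefixed with textdescriptives/.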

If you want to add all components, you can use the shorthand textdescriptives/all.

import spacy
import textdescriptives as td
# load your favourite spaCy model (remember to install it first using e.g. `python -m spacy download en_core_web_sm`)
nlp = spacy.load("en_core_web_sm")
nlp.add_pipe("textdescriptives/all")
doc = nlp("The world is changed. I feel it in the water. I feel it in the earth. I smell it in the air. Much that once was is lost, for none now live who remember it.")

# access some of the values
doc._.readability
doc._.token_length

TextDescriptives includes convenience functions for extracting the metrics from a Doc into a pandas DataFrame or a dictionary.

td.extract_dict(doc)
td.extract_df(doc)

	text	first_order_coherence	second_order_coherence	pos_prop_det	pos_prop_noun	pos_prop_aux	pos_prop_verb	pos_prop_punct	pos_prop_pron	pos_prop_adp	pos_prop_adv	pos_prop_sconj	flesch_reading_ease	flesch_kincaid_grade	smog	gunning_fog	automated_readability_index	coleman_liau_index	lix	rix	n_stop_words	alpha_ratio	mean_word_length	doc_length	proportion_ellipsis	proportion_bullet_points	duplicate_line_chr_fraction	duplicate_paragraph_chr_fraction	duplicate_5-gram_chr_fraction	duplicate_6-gram_chr_fraction	duplicate_7-gram_chr_fraction	duplicate_8-gram_chr_fraction	duplicate_9-gram_chr_fraction	duplicate_10-gram_chr_fraction	top_2-gram_chr_fraction	top_3-gram_chr_fraction	top_4-gram_chr_fraction	symbol_#_to_word_ratio	contains_lorem ipsum	passed_quality_check	dependency_distance_mean	dependency_distance_std	prop_adjacent_dependency_relation_mean	prop_adjacent_dependency_relation_std	token_length_mean	token_length_median	token_length_std	sentence_length_mean	sentence_length_median	sentence_length_std	syllables_per_token_mean	syllables_per_token_median	syllables_per_token_std	n_tokens	n_unique_tokens	proportion_unique_tokens	n_characters	n_sentences
0	The world is changed. (...)	0.633002	0.573323	0.097561	0.121951	0.0731707	0.170732	0.146341	0.195122	0.0731707	0.0731707	0.0487805	107.879	-0.0485714	5.68392	3.94286	-2.45429	-0.708571	12.7143	0.4	24	0.853659	2.95122	41	0	0	0	0	0.232258	0.232258	0	0	0	0	0.0580645	0.174194	0	0	False	False	1.77524	0.553188	0.457143	0.0722806	3.28571	3	1.54127	7	6	3.09839	1.08571	1	0.368117	35	23	0.657143	121	5
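Several of the descriptive-statistics columns above can be reproduced by hand, which helps when interpreting them. The sketch below uses only the standard library, with a naive regex sentence splitter and tokenizer standing in for spaCy (an assumption: this is not how TextDescriptives tokenizes, and it will diverge on harder text). For this example it reproduces n_tokens, n_unique_tokens, proportion_unique_tokens, n_characters, n_sentences and the token and sentence length statistics in the table. That suggests, without confirming, that n_tokens excludes punctuation, n_characters counts punctuation but not whitespace, unique tokens are compared case-insensitively, and the _std columns are population standard deviations.

```python
import re
import statistics

text = (
    "The world is changed. I feel it in the water. I feel it in the earth. "
    "I smell it in the air. Much that once was is lost, for none now live who remember it."
)

# Naive stand-ins for spaCy's sentence splitter and tokenizer (assumption)
sentences = re.split(r"(?<=[.!?])\s+", text.strip())
words = re.findall(r"[A-Za-z]+", text)   # alphabetic tokens only
punct = re.findall(r"[^\w\s]", text)     # punctuation characters

token_lengths = [len(w) for w in words]
sentence_lengths = [len(re.findall(r"[A-Za-z]+", s)) for s in sentences]

stats = {
    "n_tokens": len(words),
    "n_unique_tokens": len({w.lower() for w in words}),
    "n_characters": sum(token_lengths) + len(punct),
    "n_sentences": len(sentences),
    "token_length_mean": statistics.mean(token_lengths),
    "token_length_median": statistics.median(token_lengths),
    "token_length_std": statistics.pstdev(token_lengths),     # population std
    "sentence_length_mean": statistics.mean(sentence_lengths),
    "sentence_length_median": statistics.median(sentence_lengths),
    "sentence_length_std": statistics.pstdev(sentence_lengths),
}
stats["proportion_unique_tokens"] = stats["n_unique_tokens"] / stats["n_tokens"]

for name, value in stats.items():
    print(f"{name}: {value:.4f}")
```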

Documentation

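TextDescriptives has detailed documentation as well as a series of Jupyter notebook tutorials. All tutorials are located in the docs/tutorials folder and can also be found on the documentation website.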

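Documentation
Getting started	Guides and instructions on how to use TextDescriptives and its features.
Demo	A live demo of TextDescriptives.
Tutorials	Detailed tutorials on how to make the most of TextDescriptives.
News and changelog	New additions, changes and version history.
API reference	The detailed reference for TextDescriptives' API, including function documentation.
Paper	Preprint of the TextDescriptives paper.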