pysentimiento: A Python Toolkit for Sentiment Analysis and Social NLP Tasks

A Transformer-based library for SocialNLP tasks.

Currently supports:

Task	Languages
Sentiment Analysis	es, en, it, pt
Hate Speech Detection	es, en, it, pt
Irony Detection	es, en, it, pt
Emotion Analysis	es, en, it, pt
NER & POS tagging	es, en
Contextualized Hate Speech Detection	es
Targeted Sentiment Analysis	es
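
The support table above can also be kept as a small lookup when a script needs to check a task and language pair before loading a model. This is only a sketch: `SUPPORTED_LANGUAGES` and `is_supported` are hypothetical names that are not part of pysentimiento, and the keys are the row labels of the table, not the identifiers expected by `create_analyzer`.

```python
# Hypothetical lookup mirroring the support table; not part of pysentimiento.
ALL_FOUR = {"es", "en", "it", "pt"}

SUPPORTED_LANGUAGES = {
    "sentiment analysis": ALL_FOUR,
    "hate speech detection": ALL_FOUR,
    "irony detection": ALL_FOUR,
    "emotion analysis": ALL_FOUR,
    "ner & pos tagging": {"es", "en"},
    "contextualized hate speech detection": {"es"},
    "targeted sentiment analysis": {"es"},
}


def is_supported(task: str, lang: str) -> bool:
    """Return True if the table lists `lang` for `task` (case-insensitive)."""
    return lang.lower() in SUPPORTED_LANGUAGES.get(task.lower(), set())


print(is_supported("Sentiment Analysis", "pt"))  # True
print(is_supported("NER & POS tagging", "it"))   # False
```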

Just run pip install pysentimiento and start using it.

Getting Started

from pysentimiento import create_analyzer

# Sentiment analysis in Spanish
analyzer = create_analyzer(task="sentiment", lang="es")

analyzer.predict("Qué gran jugador es Messi")
# returns AnalyzerOutput(output=POS, probas={POS: 0.998, NEG: 0.002, NEU: 0.000})
analyzer.predict("Esto es pésimo")
# returns AnalyzerOutput(output=NEG, probas={NEG: 0.999, POS: 0.001, NEU: 0.000})
analyzer.predict("Qué es esto?")
# returns AnalyzerOutput(output=NEU, probas={NEU: 0.993, NEG: 0.005, POS: 0.002})

analyzer.predict("jejeje no te creo mucho")
# returns AnalyzerOutput(output=NEG, probas={NEG: 0.587, NEU: 0.408, POS: 0.005})
"""
Emotion Analysis in English
"""

emotion_analyzer = create_analyzer(task="emotion", lang="en")

emotion_analyzer.predict("yayyy")
# returns AnalyzerOutput(output=joy, probas={joy: 0.723, others: 0.198, surprise: 0.038, disgust: 0.011, sadness: 0.011, fear: 0.010, anger: 0.009})
emotion_analyzer.predict("fuck off")
# returns AnalyzerOutput(output=anger, probas={anger: 0.798, surprise: 0.055, fear: 0.040, disgust: 0.036, joy: 0.028, others: 0.023, sadness: 0.019})

"""
Hate Speech (misogyny & racism)
"""
hate_speech_analyzer = create_analyzer(task="hate_speech", lang="es")

hate_speech_analyzer.predict("Esto es una mierda pero no es odio")
# returns AnalyzerOutput(output=[], probas={hateful: 0.022, targeted: 0.009, aggressive: 0.018})
hate_speech_analyzer.predict("Esto es odio porque los inmigrantes deben ser aniquilados")
# returns AnalyzerOutput(output=['hateful'], probas={hateful: 0.835, targeted: 0.008, aggressive: 0.476})

hate_speech_analyzer.predict("Vaya guarra barata y de poca monta es XXXX!")
# returns AnalyzerOutput(output=['hateful', 'targeted', 'aggressive'], probas={hateful: 0.987, targeted: 0.978, aggressive: 0.969})
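
The example outputs above are consistent with two decoding rules: the sentiment and emotion tasks report the single most probable class, while the hate speech task reports every label whose probability passes a cutoff. The following is a minimal sketch of those two rules on plain dictionaries. The helper names `top_label` and `labels_above` and the 0.5 threshold are assumptions made for illustration; they are inferred from the printed outputs and are not taken from the library's API.

```python
from typing import Dict, List


def top_label(probas: Dict[str, float]) -> str:
    """Single-label decoding: return the most probable class."""
    return max(probas, key=probas.get)


def labels_above(probas: Dict[str, float], threshold: float = 0.5) -> List[str]:
    """Multi-label decoding: return every label above the (assumed) threshold."""
    return [label for label, p in probas.items() if p > threshold]


# Probabilities copied from the example outputs above.
sentiment = {"NEG": 0.587, "NEU": 0.408, "POS": 0.005}
hate = {"hateful": 0.835, "targeted": 0.008, "aggressive": 0.476}

print(top_label(sentiment))  # NEG
print(labels_above(hate))    # ['hateful']
```

Under this reading, a borderline case such as "jejeje no te creo mucho" still yields NEG even though NEU is close behind, so the probabilities are worth inspecting when a decision matters.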

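For details on the supported tasks and languages, see the task descriptions and the reported performance of each benchmarked model.

Also, check these notebooks for examples of how to use pysentimiento in each language:

Spanish + English
Italian
Portuguese
Contextualized hate speech - Spanish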

Preprocessing

pysentimiento features a tweet preprocessor specially suited for tweet classification with transformer-based models.

from pysentimiento.preprocessing import preprocess_tweet

# Replaces user handles and URLs with special tokens
preprocess_tweet("@perezjotaeme debería cambiar esto http://bit.ly/sarasa")
# "@usuario debería cambiar esto url"

# Shortens repeated characters
preprocess_tweet("no entiendo naaaaaaaadaaaaaaaa", shorten=2)
# "no entiendo naadaa"

# Normalizes laughter
preprocess_tweet("jajajajaajjajaajajaja no lo puedo creer ajajaj")
# "jaja no lo puedo creer jaja"

# Handles hashtags
preprocess_tweet("esto es #UnaGenialidad")
# "esto es una genialidad"

# Handles emojis
preprocess_tweet("🎉🎉", lang="en")
# 'emoji party popper emoji emoji party popper emoji'
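
To make the first three normalizations above more concrete, here is a simplified regex sketch that reproduces those example results. It is not the library's implementation: `sketch_preprocess` is a hypothetical name, the patterns only cover "ja"-style laughter, and hashtags and emojis are left out.

```python
import re

URL_RE = re.compile(r"https?://\S+")
USER_RE = re.compile(r"@\w+")
# A whole token made only of "j" and "a", at least 4 characters, containing both letters.
LAUGH_RE = re.compile(r"\b(?=[ja]*j)(?=[ja]*a)[ja]{4,}\b", re.IGNORECASE)


def sketch_preprocess(text: str, shorten: int = 2) -> str:
    """Illustrative stand-in for part of preprocess_tweet (assumed behavior)."""
    text = URL_RE.sub("url", text)        # URLs -> "url"
    text = USER_RE.sub("@usuario", text)  # user handles -> "@usuario"
    text = LAUGH_RE.sub("jaja", text)     # laughter variants -> "jaja"
    # Collapse runs longer than `shorten` of the same word character.
    repeated = re.compile(r"(\w)\1{%d,}" % shorten)
    return repeated.sub(lambda m: m.group(1) * shorten, text)


print(sketch_preprocess("@perezjotaeme debería cambiar esto http://bit.ly/sarasa"))
# @usuario debería cambiar esto url
print(sketch_preprocess("no entiendo naaaaaaaadaaaaaaaa", shorten=2))
# no entiendo naadaa
print(sketch_preprocess("jajajajaajjajaajajaja no lo puedo creer ajajaj"))
# jaja no lo puedo creer jaja
```

URLs are replaced before handles so that an "@" inside a link is not mistaken for a mention.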

Instructions for developers

Clone and install

git clone https://github.com/pysentimiento/pysentimiento
cd pysentimiento
pip install poetry
poetry shell
poetry install

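Run the scripts to train models

Check Train.md for further information on how to train your models.

Note: training requires access to the datasets, which are not public for the time being. Send us an email to request access.

Upload models to Huggingface's Model Hub

Check the "Model sharing and upload" instructions in the huggingface docs.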

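License

pysentimiento is an open-source library. However, the models are trained with third-party datasets and are subject to their respective licenses, most of which are for non-commercial use.

TASS Dataset license (license for Sentiment Analysis in Spanish, Emotion Analysis in Spanish & English)
SemEval 2017 Dataset license (Sentiment Analysis in English)
LinCE Datasets (license for NER & POS tagging)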

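Suggestions and bugfixes

Please use the repository issue tracker to report bugs and make suggestions (new models, other datasets, other languages, etc.).

Citation

If you use pysentimiento in your work, please cite this paper: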

@misc{perez2021pysentimiento,
      title={pysentimiento: A Python Toolkit for Opinion Mining and Social NLP tasks},
      author={Juan Manuel Pérez and Mariela Rajngewerc and Juan Carlos Giudici and Damián A. Furman and Franco Luque and Laura Alonso Alemany and María Vanina Martínez},
      year={2023},
      eprint={2106.09462},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}