Article Summarizer Using AI

1.0.0

An AI-based web application that produces concise article summaries using advanced natural language processing (NLP) techniques.

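Introduction

Article-Summarizer-Using-AI is a web application designed to summarize lengthy articles using NLP. Users can upload their own articles or use sample data, and a generative AI model produces summaries in various styles.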

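Data Exploration

Dataset

The dataset used for training and evaluation is the PubMed summarization dataset. It consists of articles from PubMed, each paired with its abstract, which serves as the reference summary.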

Load the dataset:

    from datasets import load_dataset

    # Load only the first 1,000 training examples
    pubmed_data = load_dataset("ccdv/pubmed-summarization", split='train[:1000]')

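Initial data cleaning:

Remove rows with missing values to ensure data quality.

    # Keep only rows that have both an article and an abstract
    pubmed_data = pubmed_data.filter(lambda x: x['article'] is not None and x['abstract'] is not None)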

Exploratory data analysis:
- Examine the distribution of article lengths and abstract lengths.
- Identify common topics and terms in the dataset.
```
print(pubmed_data[0])  # View the first data entry
```
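The two exploration steps above can be sketched with the standard library alone. This is a minimal illustration, not the project's own analysis code: `length_stats`, `top_terms`, and the two sample records are hypothetical names and data introduced here, and lengths are measured in whitespace-separated words.

```python
import statistics
from collections import Counter


def length_stats(texts):
    """Return word-count summary statistics for a list of strings."""
    lengths = [len(text.split()) for text in texts]
    return {
        "min": min(lengths),
        "max": max(lengths),
        "mean": statistics.mean(lengths),
        "median": statistics.median(lengths),
    }


def top_terms(texts, n=10, min_length=4):
    """Return the n most frequent alphabetic tokens with at least min_length characters."""
    counts = Counter(
        word
        for text in texts
        for word in text.lower().split()
        if len(word) >= min_length and word.isalpha()
    )
    return counts.most_common(n)


# Hypothetical stand-in records; in the project these would come from pubmed_data
sample = [
    {"article": "insulin therapy improves glucose control in diabetic patients",
     "abstract": "insulin improves glucose control"},
    {"article": "glucose levels were measured daily",
     "abstract": "daily glucose measurement"},
]

article_stats = length_stats([row["article"] for row in sample])
abstract_stats = length_stats([row["abstract"] for row in sample])
common = top_terms([row["article"] for row in sample], n=3)
```

With the real dataset, the same functions should accept the `article` and `abstract` columns in place of the sample lists. The length filter in `top_terms` is a crude substitute for proper stop word removal.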

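Model Selection

Preprocessing

Text tokenization:

Split the text into sentences and words for detailed analysis.

    from nltk.tokenize import sent_tokenize, word_tokenize

    # article_text holds the raw article string
    sentences = sent_tokenize(article_text)
    # Tokenize each sentence and flatten the result into a single word list
    words = [word for sentence in sentences for word in word_tokenize(sentence)]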

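Stop word removal:

Remove common English words that do not contribute to the summary.

    from nltk.corpus import stopwords

    stop_words = set(stopwords.words('english'))
    # Compare in lowercase so capitalized stop words are removed too
    words = [word for word in words if word.lower() not in stop_words]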

Lemmatization:

Convert words to their base form.

    from nltk.stem import WordNetLemmatizer

    lemmatizer = WordNetLemmatizer()
    words = [lemmatizer.lemmatize(word.lower()) for word in words]

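Generative Model

API configuration:

Use the google.generativeai library for model generation.

    import os

    import google.generativeai as genai

    # Read the API key from the environment variable defined in the .env file
    api_key = os.environ.get('your_api_key')
    genai.configure(api_key=api_key)

Model initialization:
- Set up the generative AI model.
```
model = genai.GenerativeModel()
```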
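The `/summarize` route later in this README calls `preprocess_and_summarize(article_text, style)`, whose body is not shown. The sketch below illustrates one way the generative step might turn a style into a prompt. The style names other than `brief`, the instruction wording, and the helpers `build_prompt` and `summarize_with_model` are all assumptions made for illustration. The only library call relied on is `generate_content()`, whose response exposes a `text` attribute.

```python
# Hypothetical style names and instructions; the project's actual styles are not documented here
STYLE_INSTRUCTIONS = {
    "brief": "Summarize the article in two or three sentences.",
    "detailed": "Write a detailed summary covering methods, results, and conclusions.",
    "bullet": "Summarize the article as a short bulleted list of key points.",
}


def build_prompt(article_text, style="brief"):
    """Combine a style instruction with the article text; unknown styles fall back to 'brief'."""
    instruction = STYLE_INSTRUCTIONS.get(style, STYLE_INSTRUCTIONS["brief"])
    return f"{instruction}\n\nArticle:\n{article_text.strip()}"


def summarize_with_model(model, article_text, style="brief"):
    """Send the prompt to any object exposing generate_content(), such as genai.GenerativeModel."""
    response = model.generate_content(build_prompt(article_text, style))
    return response.text.strip()
```

Passing the model in as an argument keeps the prompt logic testable without an API key, since a stub object with a `generate_content` method can stand in for the real model.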

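Model Fine-Tuning

Training

Fine-tune the model on the PubMed dataset to improve summary quality.

    # Pseudo-code for fine-tuning. `train` is illustrative only:
    # genai.GenerativeModel does not provide a method with this name.
    model.train(dataset=pubmed_data, epochs=10, learning_rate=0.001)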
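Whatever tuning mechanism is used, the dataset first has to be shaped into input/target pairs and split for validation. The sketch below shows that preparation step only. The helper names, the `input` and `target` keys, the truncation limit, and the 10% validation fraction are assumptions, not values taken from the project.

```python
import random


def to_training_pairs(records, max_article_chars=4000):
    """Turn dataset rows into input/target pairs, skipping incomplete rows."""
    return [
        {"input": row["article"][:max_article_chars], "target": row["abstract"]}
        for row in records
        if row.get("article") and row.get("abstract")
    ]


def train_validation_split(pairs, validation_fraction=0.1, seed=42):
    """Shuffle a copy of the pairs reproducibly and hold out a validation slice."""
    shuffled = pairs[:]
    random.Random(seed).shuffle(shuffled)
    n_validation = max(1, int(len(shuffled) * validation_fraction))
    return shuffled[n_validation:], shuffled[:n_validation]
```

Truncating long articles is a blunt way to respect a model's input limit. Any real limit depends on the model being tuned and is not specified here.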

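Extractive Summarization

Approach

For extractive summarization, the application uses traditional NLP techniques to identify the key sentences in an article, without relying on a generative model.

Extractive summary script:
Rename the provided extractive_summary.py to app.py and move it to the project root: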
```
mv /mnt/data/extractive_summary.py app.py
```

Core logic:

The extractive summary script uses statistical and heuristic methods to identify the most important sentences in the text.

    from nltk.tokenize import sent_tokenize

    # Example of extractive summarization
    def extractive_summary(text):
        # Split the text into sentences
        sentences = sent_tokenize(text)
        # Placeholder for ranking: select the first 3 sentences
        summary = ' '.join(sentences[:3])
        return summary
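One common statistical heuristic is to score each sentence by the average frequency of its non-stop words and keep the top scorers in their original order. The sketch below illustrates that idea and is not necessarily the ranking the project's script uses. To stay self-contained it replaces `sent_tokenize` with a naive regex splitter and uses a tiny stop word list, both of which are simplifying assumptions, and `frequency_summary` is a hypothetical name.

```python
import re
from collections import Counter

# Small illustrative stop word list (assumption); NLTK's English list is much larger
STOP_WORDS = {"the", "a", "an", "of", "and", "in", "to", "is", "are", "for",
              "on", "with", "that", "this", "it", "as", "by", "was", "were"}


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def content_words(text):
    """Lowercase alphabetic tokens with stop words removed."""
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOP_WORDS]


def frequency_summary(text, num_sentences=3):
    """Keep the sentences whose words are most frequent across the whole text."""
    sentences = split_sentences(text)
    freq = Counter(content_words(text))

    def score(sentence):
        tokens = content_words(sentence)
        # Average rather than sum, so long sentences are not favoured automatically
        return sum(freq[w] for w in tokens) / len(tokens) if tokens else 0.0

    # Rank sentence indices by score, then restore document order for readability
    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    chosen = sorted(ranked[:num_sentences])
    return " ".join(sentences[i] for i in chosen)
```

For example, `frequency_summary(article_text, num_sentences=3)` returns three sentences, like the placeholder above, but chosen by score rather than by position.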

Integration:

Integrate the extractive summary logic with the Flask application so that users can choose between generative and extractive summaries.

    from flask import render_template, request

    @app.route('/summarize', methods=['POST'])
    def summarize():
        # Prefer an uploaded file; otherwise fall back to a sample from the dataset
        if 'file' in request.files and request.files['file'].filename != '':
            file = request.files['file']
            article_text = file.read().decode("utf-8")
        else:
            sample_index = int(request.form['sample'])
            article_text = pubmed_data[sample_index]['article']

        style = request.form.get('style', 'brief')
        summary_method = request.form.get('method', 'generative')

        # Dispatch to the generative or the extractive summarizer
        if summary_method == 'generative':
            summary_text = preprocess_and_summarize(article_text, style)
        else:
            summary_text = extractive_summary(article_text)

        return render_template('result.html', original=article_text, summary=summary_text)

Evaluation

Evaluate the model's performance using metrics such as ROUGE or BLEU.

    from nltk.translate.bleu_score import sentence_bleu

    # reference_summary is the dataset abstract; generated_summary is the model output
    reference = [reference_summary.split()]
    candidate = generated_summary.split()
    score = sentence_bleu(reference, candidate)
    print(f'BLEU Score: {score}')
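ROUGE is mentioned above but not shown. As a rough illustration, ROUGE-1 can be computed from clipped unigram overlap between the reference and the candidate. The sketch below uses plain whitespace tokenization with no stemming, and `rouge_1` is a hypothetical helper, so its numbers may differ slightly from those of a maintained ROUGE package.

```python
from collections import Counter


def rouge_1(reference, candidate):
    """Return ROUGE-1 precision, recall, and F1 based on clipped unigram overlap."""
    ref_counts = Counter(reference.lower().split())
    cand_counts = Counter(candidate.lower().split())
    # Counter intersection keeps the minimum count of each shared word
    overlap = sum((ref_counts & cand_counts).values())
    recall = overlap / sum(ref_counts.values()) if ref_counts else 0.0
    precision = overlap / sum(cand_counts.values()) if cand_counts else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


scores = rouge_1("the cat sat on the mat", "the cat lay on the mat")
```

Recall measures how much of the reference abstract the summary covers, which is why ROUGE is often preferred over BLEU for summarization.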

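Web Application Development

Backend

Flask setup:

Initialize the Flask application and configure the login manager.

    from flask import Flask
    from flask_login import LoginManager

    app = Flask(__name__)
    # Placeholder value; replace it with your own secret key
    app.secret_key = 'your_secret_key'
    login_manager = LoginManager(app)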

Routes and authentication:

Implement routes for login, registration, summarization, and logout.

    from flask import render_template

    @app.route('/login', methods=['GET', 'POST'])
    def login():
        # login logic here
        return render_template('login.html')

Frontend

Templates:

Create HTML templates for the user interface.

    <!-- templates/index.html -->
    <form action="{{ url_for('summarize') }}" method="post" enctype="multipart/form-data">
        <input type="file" name="file">
        <button type="submit">Summarize</button>
    </form>

User experience:
- Provide a user-friendly interface with clear instructions and feedback.

Installation

Prerequisites

Python 3.7+
Flask
NLTK
A generative AI library (e.g., google.generativeai)
An API key for the generative AI service

Steps

Clone the repository:

    git clone https://github.com/yourusername/Article-Summarizer-Using-AI.git

Navigate to the project directory:
```
cd Article-Summarizer-Using-AI
```

Create a virtual environment:

    python -m venv venv
    source venv/bin/activate  # On Windows use `venv\Scripts\activate`

Install the dependencies:
```
pip install -r requirements.txt
```
Set the environment variables:
- Create a .env file containing your API key.
```
your_api_key=<YOUR_GENERATIVE_AI_API_KEY>
```
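Python does not read a `.env` file on its own, so `os.environ.get('your_api_key')` only finds the key if something loads the file first. How the project does this is not stated; it may rely on a package such as python-dotenv. The sketch below is a minimal standard-library alternative, and `load_env_file` is a hypothetical helper written for illustration.

```python
import os


def load_env_file(path=".env"):
    """Read KEY=VALUE lines into os.environ without overriding variables already set."""
    loaded = {}
    if not os.path.exists(path):
        return loaded
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            # Skip blank lines, comments, and lines without an assignment
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, value = line.split("=", 1)
            key, value = key.strip(), value.strip().strip("'\"")
            loaded[key] = value
            os.environ.setdefault(key, value)
    return loaded
```

Calling `load_env_file()` once at startup, before `genai.configure(...)`, would make the key available to `os.environ.get('your_api_key')`.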
Download NLTK data:
The script downloads the required NLTK data automatically.