An AI-based web application that provides concise summaries of articles using advanced natural language processing (NLP) techniques.
Article-Summarizer-Using-AI is a web application designed to summarize lengthy articles using NLP. The application allows users to upload their own articles or use sample data to generate summaries in various styles, utilizing a generative AI model.
The dataset used for training and evaluation is the PubMed Summarization dataset. It includes articles from PubMed with corresponding abstracts used as summaries.
Loading the Dataset:
from datasets import load_dataset
pubmed_data = load_dataset("ccdv/pubmed-summarization", split='train[:1000]')
Initial Data Cleaning:
pubmed_data = pubmed_data.filter(lambda x: x['article'] is not None and x['abstract'] is not None)
Exploratory Data Analysis:
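Beyond printing a single record, summarizing article and abstract lengths gives a quick feel for the data. A dependency-free sketch over toy records shaped like `pubmed_data` entries (the real dataset requires a download, so the records here are stand-ins):

```python
# Toy records shaped like entries of the PubMed summarization dataset;
# stand-ins so this sketch runs without downloading the real data.
records = [
    {"article": "Aspirin reduces fever. It is widely used. Dosage varies.",
     "abstract": "Aspirin reduces fever."},
    {"article": "Sleep affects memory. Studies show consolidation at night.",
     "abstract": "Sleep aids memory."},
]

def length_stats(records, field):
    """Return (min, max, mean) word counts for a given field."""
    lengths = [len(r[field].split()) for r in records]
    return min(lengths), max(lengths), sum(lengths) / len(lengths)

print("article:", length_stats(records, "article"))
print("abstract:", length_stats(records, "abstract"))
```

On the real dataset, articles run to thousands of words while abstracts are a few hundred, which is what makes summarization worthwhile here.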
print(pubmed_data[0])  # View the first data entry
Text Tokenization:
from nltk.tokenize import sent_tokenize, word_tokenize
sentences = sent_tokenize(article_text)
words = word_tokenize(article_text)
Stop Words Removal:
from nltk.corpus import stopwords
stop_words = set(stopwords.words('english'))
words = [word for word in words if word.lower() not in stop_words]
Lemmatization:
from nltk.stem import WordNetLemmatizer
lemmatizer = WordNetLemmatizer()
words = [lemmatizer.lemmatize(word.lower()) for word in words]
API Configuration:
The application uses the google.generativeai library for model generation.
import google.generativeai as genai
import os
api_key = os.environ.get('your_api_key')
genai.configure(api_key=api_key)
Model Initialization:
model = genai.GenerativeModel('gemini-pro')
Fine-tune the model with the PubMed dataset to improve summary quality.
# Example pseudo-code for fine-tuning
model.train(dataset=pubmed_data, epochs=10, learning_rate=0.001)
For extractive summarization, the application uses traditional NLP techniques to identify key sentences from the article without relying on a generative model.
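One classic way to identify key sentences is word-frequency scoring. This is a pure-Python sketch of that idea (a stand-in for whatever ranking the script uses, with naive punctuation-based sentence splitting instead of NLTK's tokenizer):

```python
import re
from collections import Counter

def frequency_summary(text, n_sentences=2):
    """Score sentences by the average frequency of their words; keep the top n in original order."""
    sentences = [s.strip() for s in re.split(r'(?<=[.!?])\s+', text.strip()) if s.strip()]
    words = re.findall(r'[a-z]+', text.lower())
    freq = Counter(words)

    def score(sentence):
        tokens = re.findall(r'[a-z]+', sentence.lower())
        return sum(freq[t] for t in tokens) / max(len(tokens), 1)

    top = sorted(sentences, key=score, reverse=True)[:n_sentences]
    # Emit the chosen sentences in their original order to keep the summary readable.
    return ' '.join(s for s in sentences if s in top)
```

Sentences that reuse the document's most common words score highest, which is a reasonable first approximation of salience.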
Extractive Summary Script:
Rename the provided extractive_summary.py to app.py and move it to the project root:
mv /mnt/data/extractive_summary.py app.py
Core Logic:
# Example of extractive summarization
def extractive_summary(text):
    # Tokenize the text and rank sentences
    sentences = sent_tokenize(text)
    # Rank and select key sentences (pseudo-code)
    summary = ' '.join(sentences[:3])  # Example: select the first 3 sentences
    return summary
Integration:
@app.route('/summarize', methods=['POST'])
def summarize():
    if 'file' in request.files and request.files['file'].filename != '':
        file = request.files['file']
        article_text = file.read().decode("utf-8")
    else:
        sample_index = int(request.form['sample'])
        article_text = pubmed_data[sample_index]['article']
    style = request.form.get('style', 'brief')
    summary_method = request.form.get('method', 'generative')
    if summary_method == 'generative':
        summary_text = preprocess_and_summarize(article_text, style)
    else:
        summary_text = extractive_summary(article_text)
    return render_template('result.html', original=article_text, summary=summary_text)
Evaluate the model's performance using metrics such as ROUGE or BLEU.
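The BLEU snippet below uses NLTK; ROUGE-1 can be sketched without extra dependencies (whitespace tokenization here is a simplification of what real ROUGE implementations do):

```python
from collections import Counter

def rouge1_f(reference, candidate):
    """Unigram ROUGE: F1 over overlapping word counts between reference and candidate."""
    ref_counts = Counter(reference.lower().split())
    cand_counts = Counter(candidate.lower().split())
    # Counter intersection takes the minimum count of each shared word.
    overlap = sum((ref_counts & cand_counts).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 2 * precision * recall / (precision + recall)
```

A score of 1.0 means the candidate reuses exactly the reference's words; 0.0 means no overlap at all.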
from nltk.translate.bleu_score import sentence_bleu
reference = [reference_summary.split()]
candidate = generated_summary.split()
score = sentence_bleu(reference, candidate)
print(f'BLEU Score: {score}')
Flask Setup:
from flask import Flask, render_template, request
from flask_login import LoginManager
app = Flask(__name__)
app.secret_key = 'your_secret_key'
login_manager = LoginManager(app)
Routes and Authentication:
@app.route('/login', methods=['GET', 'POST'])
def login():
    # Login logic here
    return render_template('login.html')
Templates:
<!-- templates/index.html -->
<!-- The field names match what the /summarize route reads from request.form -->
<form action="{{ url_for('summarize') }}" method="post" enctype="multipart/form-data">
  <input type="file" name="file">
  <input type="number" name="sample" value="0" min="0">
  <select name="style">
    <option value="brief">Brief</option>
    <option value="detailed">Detailed</option>
  </select>
  <select name="method">
    <option value="generative">Generative</option>
    <option value="extractive">Extractive</option>
  </select>
  <button type="submit">Summarize</button>
</form>
User Experience:
Clone the Repository:
git clone https://github.com/yourusername/Article-Summarizer-Using-AI.git
Navigate to the Project Directory:
cd Article-Summarizer-Using-AI
Create a Virtual Environment:
python -m venv venv
source venv/bin/activate  # On Windows use `venv\Scripts\activate`
Install Dependencies:
pip install -r requirements.txt
Set Environment Variables:
Create a .env file in the project root with your API key:
your_api_key=<YOUR_GENERATIVE_AI_API_KEY>
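The pip step above assumes a requirements.txt; from the imports used in this README it would contain at least the following (pinned versions are left to the reader; python-dotenv lets `flask run` pick up the .env file automatically):

```text
flask
flask-login
nltk
datasets
google-generativeai
python-dotenv
```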
Download NLTK Data:
The script handles downloading necessary NLTK data.
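If you ever need to fetch the resources manually, the preprocessing steps rely on NLTK's punkt tokenizer, stop-word list, and WordNet lemmatizer. A guarded sketch (nltk.download caches locally and is safe to re-run):

```python
RESOURCES = ("punkt", "stopwords", "wordnet")

def ensure_nltk_data(resources=RESOURCES):
    """Download the given NLTK resources; no-op if NLTK is unavailable."""
    try:
        import nltk
    except ImportError:
        return []  # NLTK not installed; nothing to download
    return [nltk.download(resource, quiet=True) for resource in resources]
```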
Run the Application:
flask run --port=5001
Access the App:
Open http://127.0.0.1:5001 in your browser.
Login/Register: Create an account or log in to access the summarizer.
Summarize Articles: Upload an article (or pick a sample), choose a summary style and method, and submit.
View Summary: The result page shows the original article alongside the generated summary.
Thank you for using Article-Summarizer-Using-AI! We hope you find it useful for your summarization needs.