doppelganger Download - doppelganger Source code download

doppelganger

Other source code

1.0.0

Download

doppelgänger

Problem: open-source maintainers spend a lot of time managing duplicate/related (doppelgänger) issues & pull requests
Solution: doppelgänger compares newly submitted issues/PRs against existing ones to automatically flag duplicate/related (doppelgänger) issues/PRs

Topics: vector db, github, open-source, embedding search, rag, similarity scores

Screen.Recording.2024-04-27.at.4.57.11.PM.mov

This application is a GitHub App that automatically compares newly opened issues with existing ones, closing and commenting on highly similar issues to reduce duplication. In addition, it comments feedback on PRs based on title and description for points to consider.

Doppelganger Documentation

How it works

Each issue['title'] and issue['body'] is converted into vector representation using MiniLM-L6-v2.

Each vector are persisted in ChromaDB and performs similarity search using ChromaDB's built-in cosine similarity search. Along with each vector are issue_id and issue['title'] stored using ChromaDB's metadata argument.

SIMILARITY_THRESHOLD (i.e. distance d in which we consider "similar") is configurable, and can be set to any decimal between 0 and 1 [1].

Doppelganger will close any issue when the cosine distance d between the newly submitted issue and the most similar issue is greater than this threshold. Otherwise, if the newly submitted issue is greater than (SIMILARTY_THRESHOLD*0.5), it will leave a helpful comment indicating the most similar/related issue.

[1] cosine distance

Issues and pull requests are stored in ChromaDB collections per repository.

Prerequisites

Python 3.8+
A GitHub account
A server or hosting platform to run the app (e.g., Heroku, DigitalOcean, AWS)
ollama

Setup Instructions

1. Create a GitHub App

Go to your GitHub account settings.
Click on "Developer settings" in the left sidebar.
Select "GitHub Apps" and click "New GitHub App".
Fill in the required information:
- GitHub App name: Choose a unique name (e.g., "Issue Similarity Checker")
- Homepage URL: Your app's website or your GitHub profile
- Webhook URL: The URL where your server will be running (e.g., https://your-server.com/webhook)
- Webhook secret: Generate a secure secret and save it for later use
Set permissions:
- Repository permissions:
  - Issues: Read & write
  - Pull requests: Read & Write
  - Webhooks: Read-only
- Subscribe to events:
  - Issues
  - Pull request
Create the app and note down the App ID
Generate a private key and download it (you'll need this later)

2. Prepare Your Environment

Clone this repository:

git clone https://github.com/dannyl1u/doppelganger.git
cd doppelganger

Install dependencies:
```
pip install -r requirements.txt
```
To create a new .env file, run the following command in your terminal:

cp .env.example .env

Open the newly created .env file and update the following variables with your own values:
* APP_ID: Replace your_app_id_here with your actual app ID.
* WEBHOOK_SECRET: Replace your_webhook_secret_here with your actual webhook secret.
* OLLAMA_MODEL: Replace your_chosen_llm_model_here with your chosen LLM model (e.g. "llama3.2"). Note: it must be an Ollama supported model (see: https://ollama.com/library for supported models)
* NGROK_DOMAIN: Replace your_ngrok_domain_here with your ngrok domain if you have one 4. Place the downloaded private key in the project root and name it rsa.pem.