realtime vector embeddings Download - realtime vector embeddings Source code download

realtime vector embeddings

Other source code

1.0.0

Download

Vectors Simplified

Overview

Vectors Simplified This project is created to understand how similar different pieces of text are in a multi-dimensional space. This is a crucial concept in Natural Language Processing (NLP) tasks such as text classification, clustering, and recommendation systems.

Visualization

This project is part of llamapp

Purpose

Vectors Simplified is a web application designed to visualize vector embeddings. This project provides a user-friendly interface to input text, generate vector embeddings, and visualize their similarities using interactive plots. The core of the application demonstrates the concept of cosine similarity between vectors, making it easier to understand how similar different pieces of text are in a multi-dimensional space.

What are Vectors?

Vector Embeddings

Vector embeddings are numerical representations of text that capture semantic meaning in a high-dimensional space. They are used in various Natural Language Processing (NLP) tasks such as text classification, clustering, and recommendation systems. Each vector consists of multiple components that encode information about the text's features.

Vector (Mathematics and Physics)
Understanding Word Embeddings

Cosine Similarity

Cosine similarity is a metric used to measure how similar two vectors are, based on the cosine of the angle between them. It ranges from -1 (completely dissimilar) to 1 (completely similar). It’s widely used in text analysis to determine how similar two pieces of text are based on their vector representations.

Cosine Similarity
Cosine Similarity in Machine Learning

Basic Example of Cosine Similarity

Here’s a simple example of how cosine similarity works with basic vectors:

function calculateCosineSimilarity(vec1, vec2) {
    const dotProduct = vec1.reduce((acc, val, idx) => acc + val * vec2[idx], 0);
    const magnitude1 = Math.sqrt(vec1.reduce((acc, val) => acc + val * val, 0));
    const magnitude2 = Math.sqrt(vec2.reduce((acc, val) => acc + val * val, 0));
    return dotProduct / (magnitude1 * magnitude2);
}

// Example vectors
const vectorA = [1, 2, 3];
const vectorB = [4, 5, 6];

const similarity = calculateCosineSimilarity(vectorA, vectorB);
console.log(`Cosine Similarity: ${similarity.toFixed(2)}`); // Output: Cosine Similarity: 0.9746

Smoothing Function

When visualizing vector embeddings, especially when they contain a large number of items, the graph can become too clustered and difficult to interpret. To address this, a smoothing function is used.

Purpose of Smoothing

The smoothing function helps in reducing the noise in the data by averaging the values over a specified window size. This makes the graph more readable and allows for better visualization of the overall trends in the data.

Implementation of Smoothing

Here's how the smoothing function is implemented in the application:

const smoothData = (data, windowSize) => {
    const smoothed = [];
    for (let i = 0; i < data.length; i++) {
        const start = Math.max(0, i - Math.floor(windowSize / 2));
        const end = Math.min(data.length, i + Math.floor(windowSize / 2) + 1);
        const window = data.slice(start, end);
        const average = window.reduce((sum, val) => sum + val, 0) / window.length;
        smoothed.push(average);
    }
    return smoothed;
};

The smoothing can be toggled on or off using a checkbox in the UI, providing flexibility to the user.

Installation

Prerequisites

Ollama and Nomic-Text-Embed Model
- Download Ollama: Follow the instructions on the Ollama website to download and install Ollama.
- Run Nomic-Text-Embed Model:
  - Ensure Ollama is running and accessible at http://localhost:11434.
  - Run the nomic-embed-text command in the terminal by calling
```
ollama pull nomic-embed-text
```
  - You can find the model details and setup instructions in the Ollama documentation.
Why use Ollama and Nomic-Text-Embed Model?

Ollama helps in running Language models locally, large and small. nomic-embed-text isA high-performing open embedding model with a large token context window.
Install Node.js and npm
- Download and install Node.js from the official website. This will also install npm (Node Package Manager).

Install Packages

Clone the repository:

git clone www.github.com/rajatasusual/realtime-vector-embeddings.git
cd realtime-vector-embeddings

Install dependencies:
```
npm install
```

App Structure

CLI Usage

Run the server:
```
npm run cli
```
Start typing in the terminal to input text and get vector embeddings. The results and plot will be saved as embedding_plot.png in the current directory.
env file can be found in the project root directory with default settings

PORT=3000
EMBEDDINGS_MODEL=nomic-embed-text
EMBEDDINGS_BASE_URL=http://localhost:11434
SMOOTH=TRUE

UI Usage

Help

Start the Web Server:
```
npm start
```
This will start the server with the default port 3000.
Open index.html.
Use the input field to type text and click "Submit" to generate vector embeddings. The results will be displayed on the graph, and you can interact with it to view similarities.
Smoothing Option: Use the "Smooth" checkbox to toggle smoothing on or off for better visualization of the graph.