pgsql search

1.0.0

Key Features

Current and planned features:

PostgreSQL Full Text Search
Vector text-to-image search
Vector image-to-image search
Hybrid search with RRF
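
The last item refers to reciprocal rank fusion (RRF), a way to merge several ranked result lists into one. As a rough illustration of the idea, and not the project's actual implementation, the sketch below fuses a text ranking and a vector ranking by summing 1 / (k + rank) for every item. The function name reciprocal_rank_fusion, the sample image IDs, and the constant k = 60 (a commonly used default) are assumptions made for this example.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked lists of IDs into a single list, best match first.

    Hypothetical helper for illustration; not part of pgsql_search.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        # Ranks start at 1, so the top hit contributes 1 / (k + 1).
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by fused score (descending); break ties by ID for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Made-up result lists standing in for a full text search and a vector search.
text_hits = ["img_3", "img_1", "img_7"]
vector_hits = ["img_1", "img_9", "img_3"]

fused = reciprocal_rank_fusion([text_hits, vector_hits])
print(fused)
```

Items that appear near the top of both lists (here img_1 and img_3) end up ahead of items found by only one search, which is the behaviour hybrid search relies on.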

Installation

This project uses pixi to manage dependencies and environments.

If you're on Linux or macOS, you can install pixi with the following command:

curl -fsSL https://pixi.sh/install.sh | bash

Then clone the repository:

git clone https://github.com/dnth/pgsql-search.git
cd pgsql-search

Install the project:

pixi install

This should install all the dependencies of the project including PostgreSQL, CUDA, PyTorch, and pgvector into a virtual environment.

Tip

Why pixi and not uv?

This project uses a PostgreSQL database, which cannot be installed directly via uv or pip. It can, however, be installed via conda.

Instead of using conda directly, we use pixi to manage the environment and dependencies. pixi also uses uv under the hood to pull Python packages. This gives us the speed of uv for Python packages and the flexibility of conda for system-level dependencies.

Quickstart

Start the local database server:

pixi run configure-db

This initializes the database and starts the server. You should see a folder named mylocal_db in your current directory. This folder contains the database files.

Once the database is set up, let's run the quickstart script:

pixi run quickstart

This script loads a dataset with images and captions, creates a database, inserts the dataset into the database, runs a full text search, and prints the results.

If everything goes well, you should see the results printed in the terminal.

Usage

Currently, we only support Hugging Face datasets. Let's load a dataset with images and captions.

from pgsql_search.loader import HuggingFaceDatasets

ds = HuggingFaceDatasets("UCSC-VLAA/Recap-COCO-30K") # Load the dataset
ds.save_images("../data/images") # Save the images to a local folder
ds = ds.select_columns(["image_filepath", "caption"]) # Select the columns we want to use

ds.dataset is a Hugging Face Dataset object. You are free to perform any operations supported by the datasets package.

ds.dataset

Dataset({
    features: ['image_filepath', 'caption'],
    num_rows: 30504
})

The output shows that the dataset has 30504 rows and 2 columns: image_filepath and caption. Now we can create a database and insert the dataset into it.

from pgsql_search.database import PostgreSQLDatabase, ColumnType

PostgreSQLDatabase.create_database("my_database")

Insert the dataset into the database:

df = ds.dataset.to_pandas()

with PostgreSQLDatabase("my_database") as db:
    db.initialize_table("image_metadata")
    db.add_column("image_filepath", ColumnType.TEXT, nullable=False)
    db.add_column("caption", ColumnType.TEXT, nullable=True)

    db.insert_dataframe(df)

Once completed, we can run a full text search on the database.

from pgsql_search.database import PostgreSQLDatabase

query = "man in a yellow shirt"

with PostgreSQLDatabase("my_database") as db:
    res = db.full_text_search(
        query=query, 
        table_name="image_metadata", 
        search_column="caption", 
        num_results=10
    )
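
For readers curious what a full text search looks like at the SQL level, here is a minimal sketch of a query built from PostgreSQL's built-in to_tsvector, plainto_tsquery, and ts_rank functions. It illustrates the general technique only and is not a description of what db.full_text_search does internally. The helper build_full_text_query, the 'english' text search configuration, and the psycopg-style named placeholders are assumptions made for this example.

```python
def build_full_text_query(table_name: str, search_column: str) -> str:
    """Return a parameterized full text search query (hypothetical helper).

    Table and column names cannot be passed as query parameters, so they
    are checked to be plain identifiers before being placed in the SQL.
    """
    for name in (table_name, search_column):
        if not name.isidentifier():
            raise ValueError(f"unsafe SQL identifier: {name!r}")
    return (
        f"SELECT *, ts_rank(to_tsvector('english', {search_column}), "
        f"plainto_tsquery('english', %(query)s)) AS rank "
        f"FROM {table_name} "
        f"WHERE to_tsvector('english', {search_column}) "
        f"@@ plainto_tsquery('english', %(query)s) "
        f"ORDER BY rank DESC LIMIT %(num_results)s"
    )


sql = build_full_text_query("image_metadata", "caption")
params = {"query": "man in a yellow shirt", "num_results": 10}
print(sql)
```

With a driver that accepts named placeholders in this style, such as psycopg, the pair could be run as cursor.execute(sql, params). The search text and the result limit travel as parameters rather than being pasted into the SQL string.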