book reviews semantic search

Semantic & Full-Text Search Engine for Books

This repository contains code and resources to run a semantic and full-text search engine for books. It utilizes text embeddings and supports harvesting book metadata from various sources, using international standards like MARC21 and ONIX 3.

The application uses Multilingual-E5-small to generate text embeddings and PostgreSQL with pgvector as the vector store. Because the embedding model is multilingual, semantic search supports queries and book metadata in multiple languages.

Technologies

Multilingual-E5-small: This pre-trained model is used for generating text embeddings.
pgvector: A PostgreSQL extension for storing and querying vectors, used as the vector store in the search engine.
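
As a rough illustration of how these two pieces fit together, the sketch below prepares text for an E5 model and formats an embedding for a pgvector query. The "query: " and "passage: " prefixes are the convention used by the E5 model family, and `<=>` is pgvector's cosine distance operator. The `embedding` column name and the helper names are assumptions made for this example and are not taken from the repository.

```python
from typing import Sequence

# multilingual-e5-small produces 384-dimensional embeddings.
EMBEDDING_DIM = 384


def as_query(text: str) -> str:
    """Prefix a search query the way E5 models expect."""
    return f"query: {text.strip()}"


def as_passage(text: str) -> str:
    """Prefix a document (for example title plus description) for indexing."""
    return f"passage: {text.strip()}"


def to_pgvector_literal(vector: Sequence[float]) -> str:
    """Format a vector in pgvector's text form, for example '[1.0,0.5]'."""
    return "[" + ",".join(repr(float(x)) for x in vector) + "]"


# Assumed schema: a table "book" with a pgvector column named "embedding".
# The "<=>" operator orders rows by cosine distance to the bound vector.
NEAREST_BOOKS_SQL = (
    "SELECT id, metadata FROM book "
    "ORDER BY embedding <=> %s::vector LIMIT %s"
)

if __name__ == "__main__":
    print(as_query("  nordic crime fiction "))
    print(to_pgvector_literal([1, 0.5, -0.25]))
```

The SQL string would be executed with a PostgreSQL driver of your choice, binding the vector literal and the result limit as parameters.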

Getting Started

Follow these steps to set up and run the application:

1. Create and run the PostgreSQL database

Run the following command in the project directory:

docker compose up

This will start the PostgreSQL database with pgvector enabled.

2. Configure the Gateway

Select and configure the appropriate gateway and service-uri for harvesting metadata by editing application.yaml. Available options:

oai-pmh
bibbi
bokbasen

3. Start the Application

Run the command below. The first run may take some time because the application downloads the embedding model. Once the model is in place, the application is ready for use.

./gradlew bootRun

4. Use the search engine

Visit http://localhost:8080 in a browser; results appear as the metadata harvesting progresses. For semantic search, enter a query or leave the field blank. A blank query picks a random book as the first hit, followed by the books most semantically similar to it. For full-text search, enter a query.
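
The blank-query behaviour described above can be sketched in a few lines. This is a minimal in-memory illustration with toy vectors, not the application's actual implementation (which stores embeddings in pgvector); the function names and the dictionary layout are assumptions for the example.

```python
import math
import random
from typing import Dict, List, Optional, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def semantic_search(
    books: Dict[str, Sequence[float]],
    query_vector: Optional[Sequence[float]] = None,
    limit: int = 10,
    rng: random.Random = random.Random(),
) -> List[str]:
    """Rank book titles by similarity to the query vector.

    With no query vector, a random book becomes the first hit and the
    remaining hits are the books most similar to it.
    """
    if not books:
        return []
    if query_vector is None:
        seed = rng.choice(sorted(books))
        others = [title for title in books if title != seed]
        others.sort(key=lambda t: cosine(books[seed], books[t]), reverse=True)
        return ([seed] + others)[:limit]
    ranked = sorted(books, key=lambda t: cosine(query_vector, books[t]), reverse=True)
    return ranked[:limit]


if __name__ == "__main__":
    toy = {"a": [1.0, 0.0], "b": [0.9, 0.1], "c": [0.0, 1.0]}
    print(semantic_search(toy, [1.0, 0.0]))
    print(semantic_search(toy))  # first hit is chosen at random
```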

Gateway

The gateway abstracts away the details of the external services and transforms metadata from the external services into a common model. The application supports three gateways: OAI-PMH (MARC21), Bokbasen (ONIX) and Bibbi. Custom mappers can be implemented as needed and activated by configuring the appropriate values in application.yaml.
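
One way to picture the gateway idea is a common record type plus one mapper per external format. The sketch below is illustrative only: `BookRecord`, `MetadataMapper`, `SchemaOrgLikeMapper` and `mapper_for` are hypothetical names, and the input keys (`@id`, `name`, `description`, `genre`) are assumed from Schema.org and JSON-LD conventions rather than taken from the Bibbi API or the repository.

```python
from dataclasses import dataclass
from typing import Any, Dict, Mapping, Protocol, Tuple


@dataclass(frozen=True)
class BookRecord:
    """Hypothetical common model shared by all gateways."""
    external_id: str
    title: str
    description: str = ""
    genre_and_form: Tuple[str, ...] = ()


class MetadataMapper(Protocol):
    """Anything that can turn a raw external record into a BookRecord."""
    def to_common(self, raw: Mapping[str, Any]) -> BookRecord: ...


class SchemaOrgLikeMapper:
    """Maps a Schema.org-style dictionary (assumed shape) to the common model."""

    def to_common(self, raw: Mapping[str, Any]) -> BookRecord:
        genres = raw.get("genre") or []
        if isinstance(genres, str):  # a single genre may arrive as a plain string
            genres = [genres]
        return BookRecord(
            external_id=str(raw["@id"]),
            title=raw.get("name", ""),
            description=raw.get("description", ""),
            genre_and_form=tuple(genres),
        )


# Registry keyed by the gateway name chosen in the configuration.
MAPPERS: Dict[str, MetadataMapper] = {"bibbi": SchemaOrgLikeMapper()}


def mapper_for(gateway: str) -> MetadataMapper:
    """Look up the mapper for a configured gateway name."""
    try:
        return MAPPERS[gateway]
    except KeyError:
        raise ValueError(f"No mapper registered for gateway '{gateway}'") from None


if __name__ == "__main__":
    raw = {"@id": 42, "name": "Example title", "genre": "Krim"}
    print(mapper_for("bibbi").to_common(raw))
```

A custom mapper would be another class with a `to_common` method, added to the registry under its own gateway name.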

OAI-PMH

The OAI-PMH gateway harvests metadata using the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH). It supports retrieving bibliographic data in MARC21 format.

OAI-PMH
MARC21

Additional documentation for OAI-PMH from Biblioteksentralen (https://www.bibsent.no/):

Ája OAI-PMH API (requires no authentication)
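
For readers unfamiliar with the protocol, a harvest is a sequence of `ListRecords` requests: the first names a metadata format, and each later one passes only the `resumptionToken` returned by the previous response. The sketch below builds those request URLs and reads the token from a response. The base URL and the `marc21` prefix value are assumptions (the prefix is repository-specific), and the helper names are made up for this example.

```python
import xml.etree.ElementTree as ET
from typing import Optional
from urllib.parse import urlencode

# Namespace of the OAI-PMH 2.0 response envelope.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def list_records_url(
    base_url: str,
    resumption_token: Optional[str] = None,
    metadata_prefix: str = "marc21",  # assumed value; check the repository's ListMetadataFormats
) -> str:
    """Build a ListRecords URL.

    In OAI-PMH, resumptionToken is an exclusive argument: when it is
    present, metadataPrefix must not be sent again.
    """
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return f"{base_url}?{urlencode(params)}"


def next_token(response_xml: str) -> Optional[str]:
    """Return the resumption token, or None when the list is complete."""
    root = ET.fromstring(response_xml)
    element = root.find(".//oai:resumptionToken", OAI_NS)
    if element is None or not (element.text or "").strip():
        return None
    return element.text.strip()


if __name__ == "__main__":
    sample = (
        '<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">'
        "<ListRecords><resumptionToken>abc123</resumptionToken></ListRecords>"
        "</OAI-PMH>"
    )
    token = next_token(sample)
    print(list_records_url("https://example.org/oai"))
    print(list_records_url("https://example.org/oai", token))
```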

Bokbasen

The Bokbasen gateway uses the ONIX format for metadata, commonly employed in the publishing industry. This is particularly useful for harvesting data from large-scale book vendors.

ONIX 3.0

Additional documentation for ONIX from Bokbasen (https://www.bokbasen.no/):

Bokbasen ONIX API (requires authentication)

Bibbi

The Bibbi gateway is used for integrating with the Bibbi metadata service. The gateway uses a format based on Schema.org.

Schema.org

Additional documentation for Bibbi from Biblioteksentralen (https://www.bibsent.no/):

Bibbi Metadata REST API (requires no authentication)

Text classification

The steps below extract a dataset for fine-tuning a BERT-based model for multi-label classification of book reviews (see https://github.com/torleifg/book-reviews-genre-classification).

Connect to the database with psql:

psql -h localhost -p 5433 -U username -d postgres

Extract an example dataset using genre and form as labels. COPY writes the file on the database server (inside the PostgreSQL container when using docker compose) and requires an absolute output path, so adjust the path below as needed.

copy (
select
	concat(metadata ->>'title', '. ', metadata ->>'description') as text,
	metadata ->>'genreAndForm' as labels
from
	book
where
	metadata->>'description' is not null
	and metadata->>'description' <> ''
	and length(metadata->>'description') > 200
	and metadata->>'genreAndForm' is not null
	and metadata->>'genreAndForm' <> '[]'
) to '/tmp/dataset.csv' with csv header delimiter ';';
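
Before fine-tuning, the exported file has to be turned into text and label vectors. The sketch below reads the semicolon-delimited CSV and builds multi-hot label vectors. It assumes the `labels` column holds a JSON array of strings, which the `<> '[]'` filter suggests but the source does not confirm; the function names and sample rows are made up for illustration.

```python
import csv
import io
import json
from typing import IO, List, Tuple


def read_dataset(fileobj: IO[str]) -> List[Tuple[str, List[str]]]:
    """Read (text, labels) pairs from the exported CSV.

    Assumes the "labels" column contains a JSON array of strings.
    """
    reader = csv.DictReader(fileobj, delimiter=";")
    return [(row["text"], json.loads(row["labels"])) for row in reader]


def multi_hot(rows: List[Tuple[str, List[str]]]) -> Tuple[List[str], List[List[int]]]:
    """Build a sorted label vocabulary and one 0/1 vector per row."""
    vocabulary = sorted({label for _, labels in rows for label in labels})
    index = {label: i for i, label in enumerate(vocabulary)}
    vectors = []
    for _, labels in rows:
        vector = [0] * len(vocabulary)
        for label in labels:
            vector[index[label]] = 1
        vectors.append(vector)
    return vocabulary, vectors


# Made-up sample in the same shape as the export (quotes doubled by CSV).
SAMPLE = (
    "text;labels\n"
    '"First title. A description.";"[""Krim"", ""Roman""]"\n'
    '"Second title. Another description.";"[""Roman""]"\n'
)

if __name__ == "__main__":
    rows = read_dataset(io.StringIO(SAMPLE))
    vocabulary, vectors = multi_hot(rows)
    print(vocabulary)
    print(vectors)
```

For a real export, open the file with `open(path, newline="", encoding="utf-8")` and pass the file object to `read_dataset`.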

Additional Information

Version: 1.0.0
Type: Other source code
Update time: 2025-05-26
Size: 396.66 KB
Source: GitHub
