ragcar

v0.1.4

RAGCAR: Retrieval-Augmented Generative Companion for Advanced Research

RAGCAR is built on the architecture of Pororo, Kakao Brain's natural language processing library, and adds support for large language model (LLM) APIs: OpenAI GPT and NAVER's HyperCLOVA X. It provides the tools needed for retrieval-augmented generation (RAG).

Installation

RAGCAR works in environments with python>=3.8.
You can install the package through the command below.

 pip install ragcar

You can also install it in a local environment as shown below.

 git clone https://github.com/leewaay/ragcar.git
cd ragcar
pip install -e .

Usage

To use Ragcar, first import it with the following command:

>>> from ragcar import Ragcar

After importing, you can check the tools currently supported by Ragcar with the following command.

>>> from ragcar import Ragcar
>>> Ragcar.available_tools()
"Available tools are ['tokenization', 'sentence_embedding', 'sentence_similarity', 'semantic_search', 'text_generation', 'text_segmentation']"

To see which models are supported for a given tool, pass the tool name as follows.

>>> Ragcar.available_models("text_generation")
'Available models for text_generation are ([src]: openai, [model]: gpt-4-turbo-preview, gpt-4, gpt-3.5-turbo, MODELS_SUPPORTED(https://platform.openai.com/docs/models)), ([src]: clova, [model]: YOUR_MODEL(https://www.ncloud.com/product/aiService/clovaStudio))'

To perform a specific task, pass the tool name found above as the tool argument and the model source as the src argument.

>>> from ragcar.utils import PromptTemplate
>>> prompt_template = PromptTemplate("사용자: {input} 수도는?\nAI:")

>>> generator = Ragcar(tool="text_generation", src="openai", prompt_template=prompt_template, formatting=True)

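After creating the object, call it with the input value as follows. Here the prompt asks for the capital of the given country, the input "대한민국" means "Republic of Korea", and the reply reads "The capital of the Republic of Korea is Seoul." Please refer to the Examples for each tool.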

>>> generator(input="대한민국")
{
    'id': 'openai-dad4969f-6f0d-4413-a748-26d05cc0e73d',
    'model': 'gpt-4-turbo-preview',
    'content': '대한민국의 수도는 서울입니다.',
    'finish_reason': 'stop',
    'input_tokens': 23,
    'output_tokens': 15,
    'total_tokens': 38,
    'predicted_cost': 0.0015899999999999998,
    'response_time': 1.0608701705932617
}
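
The value returned in this example is shown as a plain dictionary, so its fields can be read directly. The sketch below reuses the sample output above instead of calling the API. The helper name summarize_usage is hypothetical and not part of ragcar.

```python
# Sample result copied from the example output above (no API call is made).
result = {
    "id": "openai-dad4969f-6f0d-4413-a748-26d05cc0e73d",
    "model": "gpt-4-turbo-preview",
    "content": "대한민국의 수도는 서울입니다.",
    "finish_reason": "stop",
    "input_tokens": 23,
    "output_tokens": 15,
    "total_tokens": 38,
    "predicted_cost": 0.0015899999999999998,
    "response_time": 1.0608701705932617,
}


def summarize_usage(result: dict) -> str:
    """Hypothetical helper: format token usage and latency on one line."""
    return (
        f"{result['model']}: {result['total_tokens']} tokens "
        f"({result['input_tokens']} in, {result['output_tokens']} out), "
        f"{result['response_time']:.2f}s"
    )


print(summarize_usage(result))
```

Logging a line like this per call is one simple way to keep track of token usage and latency across requests.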

How to set up environment variables

Some src values require environment variables (e.g. an API key) that must be kept secure. They can be set in one of the following three ways:

.env file: Create a .env file in the project root directory and enter the required environment variable values.

Export: Declare the required environment variables in the terminal.

export OPENAI_API_KEY='sk-...'

model argument: Pass the required values directly in the model argument. (The same applies when you need to add a model other than the defaults.)

>>> Ragcar.available_customizable_src("text_generation")
"Available customizable src for text_generation are ['clova', 'openai']"

>>> Ragcar.available_model_fields("clova")
'Available fields for clova are ([field]: model_n, [type]: str), ([field]: api_key, [type]: str), ([field]: app_key, [type]: str)'

>>> generator = Ragcar(
    tool="text_generation",
    src="clova",
    model={
        "model_n": "YOUR_API_URL",
        "api_key": "YOUR_APIGW-API-KEY",
        "app_key": "YOUR_CLOVASTUDIO-API-KEY"
    },
    prompt_template=prompt_template,
    formatting=True
)
>>> generator(input="대한민국")
{
    'id': 'clova-3c241fa1-f01e-4738-b208-5bcb35daad42',
    'model': 'HCX-003',
    'content': '대한민국 수도는 서울입니다.',
    'finish_reason': 'stop_before',
    'input_tokens': 12,
    'output_tokens': 8,
    'total_tokens': 20,
    'predicted_cost': 0.6,
    'response_time': 0.7090704441070557,
    'ai_filter': []
}
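
When relying on a .env file or export rather than the model argument, it can help to confirm that the variable is visible to Python before creating the tool. The sketch below uses only the standard library. The helper missing_env_vars is hypothetical and not part of ragcar, and OPENAI_API_KEY is the only variable name taken from this README.

```python
import os


def missing_env_vars(names, environ=None):
    """Hypothetical helper: return the names that are unset or empty."""
    env = os.environ if environ is None else environ
    return [name for name in names if not env.get(name)]


# OPENAI_API_KEY is the variable name used in the export example above.
missing = missing_env_vars(["OPENAI_API_KEY"])
if missing:
    print("Missing environment variables:", ", ".join(missing))
else:
    print("All required environment variables are set.")
```

Passing a dictionary as the second argument lets the check be exercised without touching the real environment.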

Please see the Examples for more detailed usage.

Notes on using the text_generation tool

1. Note on `predicted_cost`

The predicted_cost value of the text_generation tool is expressed in a different currency depending on the API used: US dollars (USD) for OpenAI and Korean won (KRW) for CLOVA, because each service bills differently. The current per-model pricing can be found in the base.py file.
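
Because the two sources report costs in different currencies, predicted_cost values from different src values should not be added together directly. The following is a minimal sketch of keeping separate totals per currency. The helper total_cost_by_currency and the (src, result) pairing are assumptions for illustration, not part of ragcar.

```python
from collections import defaultdict

# Currency per src, as stated in the note above.
CURRENCY_BY_SRC = {"openai": "USD", "clova": "KRW"}


def total_cost_by_currency(runs):
    """Hypothetical helper: sum predicted_cost separately for each currency.

    `runs` is an iterable of (src, result) pairs, where result is a dict
    with a 'predicted_cost' field like the example outputs in this README.
    """
    totals = defaultdict(float)
    for src, result in runs:
        totals[CURRENCY_BY_SRC[src]] += result["predicted_cost"]
    return dict(totals)


# Costs taken from the two example outputs in this README.
runs = [
    ("openai", {"predicted_cost": 0.0015899999999999998}),
    ("clova", {"predicted_cost": 0.6}),
]
print(total_cost_by_currency(runs))
```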

2. Precautions when using NAVER HyperCLOVA X

When using the text_generation tool with the clova src, note that some parameters differ from the official API:

Parameter name changes:
- Please use presence_penalty instead of top_k .
- Please use frequency_penalty instead of repeat_penalty .
Parameter value ranges:
- 0.0 < temperature < 1.0
- 0.0 < top_p < 1.0
- 0 < presence_penalty < 128
- 0.0 < frequency_penalty < 10.0
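
The rules above can be sketched as a small pre-flight check. This is an illustration only: the helper check_clova_params is hypothetical, the bounds are treated as exclusive exactly as written in the list, and this README does not say whether ragcar performs such validation itself.

```python
# Exclusive bounds copied from the list above: lower < value < upper.
CLOVA_PARAM_RANGES = {
    "temperature": (0.0, 1.0),
    "top_p": (0.0, 1.0),
    "presence_penalty": (0, 128),
    "frequency_penalty": (0.0, 10.0),
}

# Official names mapped to the names this README says to use instead.
RENAMED_PARAMS = {
    "top_k": "presence_penalty",
    "repeat_penalty": "frequency_penalty",
}


def check_clova_params(params):
    """Hypothetical helper: return a list of problems found in params."""
    problems = []
    for name, value in params.items():
        if name in RENAMED_PARAMS:
            problems.append(f"use {RENAMED_PARAMS[name]} instead of {name}")
        elif name in CLOVA_PARAM_RANGES:
            low, high = CLOVA_PARAM_RANGES[name]
            if not low < value < high:
                problems.append(f"{name}={value} is outside ({low}, {high})")
    return problems


print(check_clova_params({"temperature": 0.5, "top_k": 40, "frequency_penalty": 12.0}))
```

An empty list means no renamed parameter was used and every listed parameter falls inside its range.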

How to upload a Google Drive model

See the sentence_embedding example.

Documentation

If you have any questions or feedback, please open an issue.

ACKNOWLEDGEMENTS

pororo

@misc{pororo,
  author       = {Heo, Hoon and Ko, Hyunwoong and Kim, Soohwan and
                  Han, Gunsoo and Park, Jiwoo and Park, Kyubyong},
  title        = {PORORO: Platform Of neuRal mOdels for natuRal language prOcessing},
  howpublished = {\url{https://github.com/kakaobrain/pororo}},
  year         = {2021},
}

Sentence-Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}