Simplified Chinese | English
Features | Deployment Guide | User Guide | Configuration | Dialogue Log
Important
Configuration changes significantly after v0.7.0 and is incompatible with previous versions. Configuring through the UI is more convenient and provides more powerful configuration options.
OpenAI-Forward is an efficient forwarding service for large language models. Its core features include request rate control, token rate limiting, intelligent predictive caching, log management, and API key management, with the goal of providing an efficient and convenient model forwarding service. Whether you are proxying a local language model such as LocalAI or a cloud language model such as OpenAI, OpenAI-Forward makes it straightforward. Thanks to libraries such as uvicorn, aiohttp, and asyncio, OpenAI-Forward achieves excellent asynchronous performance.
Proxy service addresses provided by this project, forwarding the original OpenAI service address https://api.openai.com:
https://api.openai-forward.com
https://render.openai-forward.com
Cache-enabled service address (user request results are saved for a period of time):
https://smart.openai-forward.com
Deployment Guide
Install
pip install openai-forward
# or install the webui version:
pip install openai-forward[webui]

Start the service

aifd run
# or start the service with the webui
aifd run --webui

If the .env configuration at the root path is read, you will see the following startup information:
❯ aifd run
╭────── openai-forward is ready to serve ! ───────╮
│ │
│ base url https://api.openai.com │
│ route prefix / │
│ api keys False │
│ forward keys False │
│ cache_backend MEMORY │
╰────────────────────────────────────────────────────╯
╭──────────── ⏱️ Rate Limit configuration ───────────╮
│ │
│ backend memory │
│ strategy moving-window │
│ global rate limit 100/minute (req) │
│ /v1/chat/completions 100/2minutes (req) │
│ /v1/completions 60/minute ; 600/hour (req) │
│ /v1/chat/completions 60/second (token) │
│ /v1/completions 60/second (token) │
╰────────────────────────────────────────────────────╯
INFO: Started server process [191471]
INFO: Waiting for application startup.
INFO: Application startup complete.
INFO: Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit)

By default, aifd run proxies https://api.openai.com.
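For a quick local check, the sketch below (assuming the default route prefix / and your own OpenAI key) sends a request through the local service and on to api.openai.com:

Python
from openai import OpenAI

# point the client at the local forwarding service started by `aifd run`
client = OpenAI(
    base_url="http://localhost:8000/v1",
    api_key="sk-******",  # your OpenAI key; the request is forwarded to api.openai.com
)
completion = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(completion.choices[0].message.content)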
The following example uses the hosted service address https://api.openai-forward.com:
Python
from openai import OpenAI # pip install openai>=1.0.0
client = OpenAI(
    base_url="https://api.openai-forward.com/v1",
    api_key="sk-******"
)

Applicable scenario: used together with projects such as LocalAI and api-for-open-llm.
How to: take LocalAI as an example. If you have deployed a LocalAI service at http://localhost:8080, you only need to set FORWARD_CONFIG=[{"base_url":"http://localhost:8080","route":"/localai","type":"openai"}] in the environment variables or the .env file. You can then use LocalAI by visiting http://localhost:8000/localai.
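For instance, a minimal client sketch (the model name below is a placeholder; use one that your LocalAI instance actually serves):

Python
from openai import OpenAI

# requests to the /localai route are forwarded to the LocalAI instance at http://localhost:8080
client = OpenAI(
    base_url="http://localhost:8000/localai/v1",
    api_key="sk-xxx",  # placeholder; LocalAI typically does not validate the key
)
completion = client.chat.completions.create(
    model="your-local-model",  # placeholder model name
    messages=[{"role": "user", "content": "Hello!"}],
)
print(completion.choices[0].message.content)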
Configure environment variables or .env files as follows:
FORWARD_CONFIG = [{"base_url":"https://generativelanguage.googleapis.com","route":"/gemini","type":"general"}]

Note: after aifd run is started, you can use Gemini Pro by visiting http://localhost:8000/gemini.
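For example, a minimal sketch (it assumes the Google Generative Language REST path v1beta/models/gemini-pro:generateContent and a valid Google API key):

Python
import os
import requests

# the /gemini route forwards transparently to https://generativelanguage.googleapis.com
resp = requests.post(
    "http://localhost:8000/gemini/v1beta/models/gemini-pro:generateContent",
    params={"key": os.environ["GOOGLE_API_KEY"]},  # your Google API key
    json={"contents": [{"parts": [{"text": "Hello!"}]}]},
)
print(resp.json())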
Scenario 1: use general forwarding to forward services from any source and get request rate control and token rate control; note that general forwarding does not support custom keys.
Scenario 2: LiteLLM can convert the API formats of many cloud models to the OpenAI API format, which can then be forwarded in openai style.
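For example, a possible sketch (it assumes a LiteLLM proxy already running locally on port 4000; the port and the /litellm route name are illustrative):

FORWARD_CONFIG = [{"base_url":"http://localhost:4000","route":"/litellm","type":"openai"}]

Requests to http://localhost:8000/litellm are then forwarded in OpenAI style to the LiteLLM proxy.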
Execute aifd run --webui to enter the configuration page (default service address http://localhost:8001)
You can create a .env file in the project's run directory to customize configurations. See the .env.example file in the root directory for a reference configuration.
After caching is enabled, the content of the specified routes is cached. The behavior differs slightly between the two forwarding types, openai and general: with general forwarding, identical requests are answered from the cache by default.
With openai forwarding, once the cache is enabled you can control caching per request with OpenAI's extra_body parameter, for example:
Python
from openai import OpenAI
client = OpenAI(
    base_url="https://smart.openai-forward.com/v1",
    api_key="sk-******"
)
completion = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "user", "content": "Hello!"}
    ],
    extra_body={"caching": True}
)

Curl
curl https://smart.openai-forward.com/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-******" \
  -d '{
    "model": "gpt-3.5-turbo",
    "messages": [{"role": "user", "content": "Hello!"}],
    "caching": true
  }'
See .env file
Use Cases:
import openai
openai.api_base = "https://api.openai-forward.com/v1"
# use the forward key (fk-******) in place of the original OpenAI key (sk-******)
openai.api_key = "fk-******"

See .env.example for the use case of forwarding services with different target addresses to different routes on the same port.
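For illustration, such a configuration might look like the following sketch (the routes and upstream addresses are examples):

FORWARD_CONFIG = [{"base_url":"https://api.openai.com","route":"/","type":"openai"},{"base_url":"http://localhost:8080","route":"/localai","type":"openai"}]

With this configuration, OpenAI requests go to http://localhost:8000/v1/... and LocalAI requests go to http://localhost:8000/localai/v1/... on the same port.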
Conversation logs are saved to Log/openai/chat/chat.log in the current directory.
The record format is:
{'messages': [{'role': 'user', 'content': 'hi'}], 'model': 'gpt-3.5-turbo', 'stream': True, 'max_tokens': None, 'n': 1, 'temperature': 1, 'top_p': 1, 'logit_bias': None, 'frequency_penalty': 0, 'presence_penalty': 0, 'stop': None, 'user': None, 'ip': '127.0.0.1', 'uid': '2155fe1580e6aed626aa1ad74c1ce54e', 'datetime': '2023-10-17 15:27:12'}
{'assistant': 'Hello! How can I assist you today?', 'is_tool_calls': False, 'uid': '2155fe1580e6aed626aa1ad74c1ce54e'}
Convert to JSON format:

aifd convert

This produces chat_openai.json:
[
  {
    "datetime": "2023-10-17 15:27:12",
    "ip": "127.0.0.1",
    "model": "gpt-3.5-turbo",
    "temperature": 1,
    "messages": [
      {
        "user": "hi"
      }
    ],
    "tools": null,
    "is_tool_calls": false,
    "assistant": "Hello! How can I assist you today?"
  }
]

Contributions to this project are welcome; submit a pull request or open an issue in the repository.
OpenAI-Forward is released under the MIT license.