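LIDA is a library for generating data visualizations and data-faithful infographics. LIDA is grammar agnostic (it works with any programming language and visualization library, e.g. matplotlib, seaborn, altair, d3) and works with multiple large language model providers (OpenAI, Azure OpenAI, PaLM, Cohere, Hugging Face). Details on the components of LIDA are described in the LIDA paper and in the tutorial notebook. See the project page for updates!
Note on code execution: to create visualizations, LIDA generates and executes code. Ensure that you run LIDA in a secure environment.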
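One way to act on the note above is to keep generated code out of the main process. The sketch below is not part of LIDA: `run_generated_code` is a hypothetical helper that writes a code string to a temporary directory and runs it in a separate Python interpreter with a timeout, withholding environment variables whose names look like credentials. The list of name markers is an assumption. This only limits the blast radius of bad code; a container or VM is still advisable.

```python
import os
import subprocess
import sys
import tempfile
from pathlib import Path

# Substrings that mark an environment variable as sensitive (an assumption).
SECRET_MARKERS = ("KEY", "TOKEN", "SECRET", "PASSWORD")


def run_generated_code(code: str, timeout_s: float = 10.0) -> subprocess.CompletedProcess:
    """Run a code string in a child Python process (hypothetical helper, not a LIDA API)."""
    # Copy the environment without variables that look like credentials.
    safe_env = {
        name: value
        for name, value in os.environ.items()
        if not any(marker in name.upper() for marker in SECRET_MARKERS)
    }
    with tempfile.TemporaryDirectory() as workdir:
        script = Path(workdir) / "generated.py"
        script.write_text(code, encoding="utf-8")
        # -I starts Python in isolated mode; subprocess.run raises
        # subprocess.TimeoutExpired if the child exceeds timeout_s.
        return subprocess.run(
            [sys.executable, "-I", str(script)],
            cwd=workdir,
            env=safe_env,
            capture_output=True,
            text=True,
            timeout=timeout_s,
        )


if __name__ == "__main__":
    result = run_generated_code("print(1 + 1)")
    print(result.returncode, result.stdout.strip())
```

The child process cannot read the API key from its environment, and a runaway script is stopped by the timeout instead of blocking the caller.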

LIDA treats visualizations as code and provides a clean API for generating, executing, editing, explaining, evaluating and repairing visualization code.

from lida import Manager, llm

lida = Manager(text_gen=llm("openai"))  # palm, cohere ..
summary = lida.summarize("data/cars.csv")
goals = lida.goals(summary, n=2)  # exploratory data analysis
charts = lida.visualize(summary=summary, goal=goals[0])  # exploratory data analysis

Set up and verify that your Python environment is Python 3.10 or higher (preferably using Conda). Install the library via pip.
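
pip install -U lida

LIDA depends on llmx and openai. If you installed these libraries previously, consider updating them.

pip install -U llmx openai

Once the requirements are met, set up your API key. Learn more about setting up keys for other LLM providers here.

export OPENAI_API_KEY=<your key>

Alternatively, you can install the library in dev mode by cloning this repository and running pip install -e . in the repository root.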
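The setup steps above can be checked programmatically before LIDA is first called. This is a minimal sketch that assumes only the two requirements stated here (Python 3.10 or higher, and an OPENAI_API_KEY variable when the OpenAI provider is used). `check_environment` is a hypothetical helper, not part of LIDA.

```python
import os
import sys


def check_environment(version_info=sys.version_info, environ=os.environ) -> list:
    """Return a list of problems found; an empty list means the checks passed."""
    problems = []
    # Compare only (major, minor) against the documented minimum.
    if tuple(version_info[:2]) < (3, 10):
        problems.append(
            f"Python 3.10+ required, found {version_info[0]}.{version_info[1]}"
        )
    # An unset or empty key is treated as missing.
    if not environ.get("OPENAI_API_KEY"):
        problems.append("OPENAI_API_KEY is not set")
    return problems


if __name__ == "__main__":
    for problem in check_environment():
        print("setup issue:", problem)
```

Passing the version and environment in as parameters keeps the check easy to exercise without changing the real interpreter or shell.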
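LIDA comes with an optional bundled UI and web API that you can explore by running the following command:

lida ui --port=8080 --docs

Then navigate to http://localhost:8080/ in your browser. To view the web API specification, add the --docs option to the CLI command and navigate to http://localhost:8080/api/docs in your browser.
The fastest and recommended way to get started after installation is to try the web UI above or run the tutorial notebook.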
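The LIDA web API and UI can also be set up using Docker and the command below (ensure that Docker is installed and that the OPENAI_API_KEY environment variable is set).

docker compose up

Given a dataset, generate a compact summary of the data.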

from lida import Manager

lida = Manager()
summary = lida.summarize("data/cars.json")  # generate data summary

Given a data summary, generate a set of visualization goals.

goals = lida.goals(summary, n=5, persona="ceo with aerodynamics background")  # generate goals

Add a persona parameter to generate goals based on that persona.
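To make the idea of a compact data summary concrete, the sketch below computes a few per-column statistics from CSV text using only the standard library. It illustrates the concept and is not LIDA's summarizer: the function name `summarize_csv` and the output keys are assumptions chosen for the example, and the real summary may contain different or additional information.

```python
import csv
import io


def summarize_csv(text: str) -> dict:
    """Build a small per-column summary of CSV text (illustrative schema)."""
    reader = csv.DictReader(io.StringIO(text))
    rows = list(reader)
    fields = []
    for name in reader.fieldnames or []:
        values = [row[name] for row in rows]
        try:
            # Treat the column as numeric only if every value parses as a float.
            numbers = [float(value) for value in values]
            properties = {"dtype": "number", "min": min(numbers), "max": max(numbers)}
        except ValueError:
            # Non-numeric (or empty) columns keep a few sorted sample values.
            properties = {"dtype": "string", "samples": sorted(set(values))[:3]}
        properties["num_unique_values"] = len(set(values))
        fields.append({"column": name, "properties": properties})
    return {"num_rows": len(rows), "fields": fields}


if __name__ == "__main__":
    demo = "name,mpg\nford,18\nhonda,33.5\nford,21\n"
    print(summarize_csv(demo))
```

A summary of this kind is far smaller than the dataset itself, which is what makes it practical to pass to a language model when asking for goals.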
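Given a data summary and a visualization goal, generate, refine, execute and filter visualization code. Note that LIDA represents visualizations as code.

# generate charts (generate and execute visualization code)
charts = lida.visualize(summary=summary, goal=goals[0], library="matplotlib")  # seaborn, ggplot ..

Given a visualization, edit the visualization using natural language.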
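
# modify chart using natural language
from lida import TextGenerationConfig
code, library = charts[0].code, "matplotlib"  # reuse the chart generated above
textgen_config = TextGenerationConfig(n=1, temperature=0.2)
instructions = ["convert this to a bar chart", "change the color to red", "change y axes label to Fuel Efficiency", "translate the title to french"]
edited_charts = lida.edit(code=code, summary=summary, instructions=instructions, library=library, textgen_config=textgen_config)

Given a visualization, generate a natural language explanation of the visualization code (accessibility, data transformations applied, visualization code).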

# generate explanation for chart
explanation = lida.explain(code=charts[0].code, summary=summary)

Given a visualization, evaluate it to find repair instructions (which may be human authored or generated) and repair the visualization.
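
# evaluate the first chart against the goal it was generated for
evaluations = lida.evaluate(code=charts[0].code, goal=goals[0], library="matplotlib")

Given a dataset, generate a set of recommended visualizations.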
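
# textgen_config is a TextGenerationConfig instance
recommendations = lida.recommend(code=charts[0].code, summary=summary, n=2, textgen_config=textgen_config)

Given a visualization, generate a data-faithful infographic. This method should be considered experimental and uses stable diffusion models from the peacasso library. You will need to run pip install lida[infographics] to install the required dependencies.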

infographics = lida.infographics(visualization=charts[0].raster, n=3, style_prompt="line art")

LIDA uses the llmx library as its interface for text generation. llmx supports multiple local models, including Hugging Face models. You can use the Hugging Face models directly (assuming you have a GPU) or connect to an OpenAI-compatible local model endpoint, e.g. using the excellent vLLM library.

!pip3 install --upgrade llmx==0.0.17a0
# Restart the colab session
from lida import Manager
from llmx import llm

text_gen = llm(provider="hf", model="uukuguy/speechless-llama2-hermes-orca-platypus-13b", device_map="auto")
lida = Manager(text_gen=text_gen)
# now you can call lida methods as above e.g.
summary = lida.summarize("data/cars.csv")  # ....

from lida import Manager, TextGenerationConfig, llm

model_name = "uukuguy/speechless-llama2-hermes-orca-platypus-13b"
model_details = [{"name": model_name, "max_tokens": 2596, "model": {"provider": "openai", "parameters": {"model": model_name}}}]
# assuming your vllm endpoint is running on localhost:8000
text_gen = llm(provider="openai", api_base="http://localhost:8000/v1", api_key="EMPTY", models=model_details)
lida = Manager(text_gen=text_gen)

Naturally, some of the limitations above could be addressed by a much-welcomed PR.
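When several locally served models are in play, the `model_details` structure used for the OpenAI-compatible endpoint above can be produced by a small helper. This is a sketch only: `build_model_details` is a hypothetical function that mirrors the dictionary layout shown in the example, and it makes no claim about which other keys llmx accepts. The default of 2596 tokens is simply the value from that example.

```python
def build_model_details(model_name: str, max_tokens: int = 2596) -> list:
    """Return a one-entry model list in the layout used in the example above."""
    return [
        {
            "name": model_name,
            "max_tokens": max_tokens,
            # The endpoint speaks the OpenAI protocol, so the provider is "openai"
            # even though the model itself is served locally.
            "model": {"provider": "openai", "parameters": {"model": model_name}},
        }
    ]


if __name__ == "__main__":
    details = build_model_details("uukuguy/speechless-llama2-hermes-orca-platypus-13b")
    print(details[0]["name"], details[0]["max_tokens"])
```

The result can be passed as the `models` argument in the same way as the hand-written `model_details` list.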
A short paper describing LIDA was accepted at the ACL 2023 conference:

@inproceedings{dibia2023lida,
    title = "{LIDA}: A Tool for Automatic Generation of Grammar-Agnostic Visualizations and Infographics using Large Language Models",
    author = "Dibia, Victor",
    booktitle = "Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 3: System Demonstrations)",
    month = jul,
    year = "2023",
    address = "Toronto, Canada",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2023.acl-demo.11",
    doi = "10.18653/v1/2023.acl-demo.11",
    pages = "113--126",
}

LIDA builds on insights in the automatic generation of visualizations from an earlier paper, Data2Vis: Automatic Generation of Data Visualizations Using Sequence to Sequence Recurrent Neural Networks.