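LIDA is a library for generating data visualizations and data-faithful infographics. LIDA is grammar agnostic (it works with any programming language and visualization library, e.g. Matplotlib, Seaborn, Altair, D3) and works with multiple large language model providers (OpenAI, Azure OpenAI, PaLM, Cohere, HuggingFace). Details on the components of LIDA are described in the paper and in the tutorial notebook. See the project page for updates.
Note on code execution: to create visualizations, LIDA generates and executes code. Make sure you run LIDA in a secure environment.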

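LIDA treats visualizations as code and provides a clean API for generating, executing, editing, explaining, evaluating and repairing visualization code.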
from lida import Manager, llm
lida = Manager(text_gen=llm("openai"))  # palm, cohere ..
summary = lida.summarize("data/cars.csv")
goals = lida.goals(summary, n=2)  # exploratory data analysis
charts = lida.visualize(summary=summary, goal=goals[0])  # exploratory data analysis
Set up your Python environment and verify that it is Python 3.10 or higher (preferably using Conda). Install the library via pip.
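pip install -U lida
LIDA depends on llmx and openai. If you installed these libraries previously, consider updating them.
pip install -U llmx openai
Once the requirements are met, set up your API key. The linked documentation covers setting up keys for other LLM providers.
export OPENAI_API_KEY=<your key>
Alternatively, you can install the library by cloning this repository and running pip install -e . in the repository root.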
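The two prerequisites above (Python 3.10 or higher and an OPENAI_API_KEY environment variable) can be checked before the first LIDA call. The following is a minimal sketch using only the standard library; the function name check_environment is hypothetical and is not part of LIDA.

```python
import os
import sys


def check_environment(version_info=None, env=None):
    """Return a list of human-readable problems; an empty list means the setup looks fine.

    Hypothetical helper, not part of LIDA. Arguments default to the running
    interpreter and process environment and can be overridden for testing.
    """
    version_info = sys.version_info if version_info is None else version_info
    env = os.environ if env is None else env
    problems = []
    # The installation notes ask for Python 3.10 or higher.
    if tuple(version_info[:2]) < (3, 10):
        problems.append(
            f"Python 3.10+ is required, found {version_info[0]}.{version_info[1]}"
        )
    # The setup step above exports the OpenAI key under this variable name.
    if not env.get("OPENAI_API_KEY"):
        problems.append("OPENAI_API_KEY is not set")
    return problems


if __name__ == "__main__":
    for problem in check_environment():
        print("warning:", problem)
```

If you use a provider other than OpenAI, the key check would need to look for that provider's variable instead.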
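LIDA comes with an optional bundled UI and web API that you can explore by running the following command:
lida ui --port=8080 --docs
Then navigate to http://localhost:8080/ in your browser. To view the web API specification, add the --docs option to the CLI command and navigate to http://localhost:8080/api/docs in your browser.
The fastest and recommended way to get started after installation is to try the web UI above or run the tutorial notebook.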
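The LIDA web API and UI can be set up using Docker and the command below (make sure Docker is installed and the OPENAI_API_KEY environment variable is set).
docker compose up
Given a dataset, generate a compact summary of the data.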
from lida import Manager
lida = Manager()
summary = lida.summarize("data/cars.json")  # generate data summary
Given a data summary, generate a set of visualization goals.
goals = lida.goals(summary, n=5, persona="ceo with aerodynamics background")  # generate goals
Add a persona parameter to generate goals based on that persona.
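To compare how different personas shape the generated goals, the goals call can be repeated once per persona. The sketch below assumes only the goals(summary, n=..., persona=...) call shown above; the helper name goals_by_persona and the second persona string in the usage comment are hypothetical.

```python
def goals_by_persona(manager, summary, personas, n=2):
    """Collect visualization goals for each persona.

    `manager` is expected to be a lida.Manager, or any object exposing a
    compatible `goals(summary, n=..., persona=...)` method. This helper is
    hypothetical and not part of LIDA.
    """
    results = {}
    for persona in personas:
        # One goals() call per persona; each call goes to the configured LLM.
        results[persona] = manager.goals(summary, n=n, persona=persona)
    return results


# Example usage (requires lida and a configured LLM provider):
# from lida import Manager, llm
# lida = Manager(text_gen=llm("openai"))
# summary = lida.summarize("data/cars.csv")
# per_persona = goals_by_persona(
#     lida, summary, ["ceo with aerodynamics background", "mechanical engineer"]
# )
```

Each persona triggers a separate model call, so keep the persona list short while experimenting.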
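Given a data summary and a visualization goal, generate, refine, execute and filter visualization code. Note that LIDA represents visualizations as code.
# generate charts (generate and execute visualization code)
charts = lida.visualize(summary=summary, goal=goals[0], library="matplotlib")  # seaborn, ggplot ..
Given a visualization, edit the visualization using natural language.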
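# modify chart using natural language
from lida import TextGenerationConfig
code, library = charts[0].code, "matplotlib"  # assumed: edit the first chart generated above
textgen_config = TextGenerationConfig(n=1, temperature=0.2)  # assumed generation settings
instructions = ["convert this to a bar chart", "change the color to red", "change y axes label to Fuel Efficiency", "translate the title to french"]
edited_charts = lida.edit(code=code, summary=summary, instructions=instructions, library=library, textgen_config=textgen_config)
Given a visualization, generate a natural language explanation of the visualization code (accessibility, data transformations applied, visualization code).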
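# generate explanation for chart
explanation = lida.explain(code=charts[0].code, summary=summary)
Given a visualization, evaluate it to find repair instructions (which may be human-authored or generated), then repair the visualization.
evaluations = lida.evaluate(code=charts[0].code, goal=goals[0], library="matplotlib")
Given a dataset, generate a set of recommended visualizations.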
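recommendations = lida.recommend(code=charts[0].code, summary=summary, n=2, textgen_config=textgen_config)  # textgen_config: a TextGenerationConfig instance
Given a visualization, generate a data-faithful infographic. This method should be considered experimental and uses Stable Diffusion models from the Peacasso library. You will need to run pip install lida[infographics] to install the required dependencies.
infographics = lida.infographics(visualization=charts[0].raster, n=3, style_prompt="line art")
LIDA uses the llmx library as its interface for text generation. llmx supports multiple local models, including HuggingFace models. You can use HuggingFace models directly (assuming you have a GPU), or connect to an OpenAI-compatible local model endpoint, for example using the excellent vLLM library.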
!pip3 install --upgrade llmx==0.0.17a0
# Restart the colab session
from lida import Manager
from llmx import llm
text_gen = llm(provider="hf", model="uukuguy/speechless-llama2-hermes-orca-platypus-13b", device_map="auto")
lida = Manager(text_gen=text_gen)
# now you can call lida methods as above e.g.
summary = lida.summarize("data/cars.csv")  # ....
To use an OpenAI-compatible endpoint instead (for example, a vLLM server):
from lida import Manager, TextGenerationConfig, llm
model_name = "uukuguy/speechless-llama2-hermes-orca-platypus-13b"
model_details = [{'name': model_name, 'max_tokens': 2596, 'model': {'provider': 'openai', 'parameters': {'model': model_name}}}]
# assuming your vllm endpoint is running on localhost:8000
text_gen = llm(provider="openai", api_base="http://localhost:8000/v1", api_key="EMPTY", models=model_details)
lida = Manager(text_gen=text_gen)
Naturally, some of the above limitations can be addressed by a much-welcomed PR.
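The model_details list in the OpenAI-compatible endpoint example repeats the model name in two places, which is easy to get out of sync when switching models. A small helper can build the same structure from one argument. This is a sketch only: build_model_details is a hypothetical name, and the dictionary layout is copied from the example rather than from a documented llmx schema.

```python
def build_model_details(model_name, max_tokens=2596, provider="openai"):
    """Build the `models` list passed to llm(...) in the endpoint example.

    Hypothetical helper; the layout mirrors the example in this README and
    the default max_tokens is the value used there.
    """
    return [
        {
            "name": model_name,
            "max_tokens": max_tokens,
            "model": {"provider": provider, "parameters": {"model": model_name}},
        }
    ]


# Example usage (requires llmx and a running OpenAI-compatible endpoint):
# from llmx import llm
# details = build_model_details("uukuguy/speechless-llama2-hermes-orca-platypus-13b")
# text_gen = llm(provider="openai", api_base="http://localhost:8000/v1",
#                api_key="EMPTY", models=details)
```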
A short paper describing LIDA (accepted at the ACL 2023 conference) is available.
@inproceedings{dibia2023lida,
    title = "{LIDA}: A Tool for Automatic Generation of Grammar-Agnostic Visualizations and Infographics using Large Language Models",
    author = "Dibia, Victor",
    booktitle = "Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 3: System Demonstrations)",
    month = jul,
    year = "2023",
    address = "Toronto, Canada",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2023.acl-demo.11",
    doi = "10.18653/v1/2023.acl-demo.11",
    pages = "113--126",
}
LIDA builds on insights into automatic visualization generation from an earlier paper, Data2Vis: Automatic Generation of Data Visualizations Using Sequence to Sequence Recurrent Neural Networks.