In data visualization, generating charts that accurately reflect complex data has always been challenging. A chart must not only capture visual elements such as layout, color and text position accurately; those details must also be translated into code that reproduces the intended design. Traditional methods, however, often rely on directly prompting vision-language models (VLMs) such as GPT-4V, which frequently struggle to convert complex visual elements into syntactically correct Python code. Even small mistakes can cause a chart to miss its design goals, a serious problem in areas such as financial analysis, academic research and educational reporting.
To address this problem, a research team from UCLA (University of California, Los Angeles), UC Merced and Adobe proposed a new framework called METAL. The system breaks the chart generation task into a series of focused steps, each managed by a dedicated agent, thereby improving the accuracy and consistency of the generated charts.

The METAL framework includes four key agents: a generation agent, a visual evaluation agent, a code evaluation agent, and a revision agent. The generation agent produces the initial Python code; the visual evaluation agent assesses how closely the rendered chart matches the reference chart; the code evaluation agent reviews the generated code to catch syntax or logic errors; and finally the revision agent adjusts the code based on the evaluation feedback. This modular design lets each agent focus on its own function, ensuring that both the visual and the technical elements of the chart are fully considered and adjusted.
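The four-agent loop described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the names refine_chart_code, Feedback and the toy_* functions are hypothetical, and each agent, which in METAL would be backed by a vision-language model, is replaced here by a trivial stand-in so that the example runs on its own. The reference chart is reduced to a single expected color setting, and the code check only verifies syntax with Python's standard ast module.

```python
import ast
import re
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class Feedback:
    """Critiques from the two evaluators; an empty string means no issue."""
    visual: str
    code: str

    @property
    def ok(self) -> bool:
        return not self.visual and not self.code


def refine_chart_code(
    reference: str,
    generate: Callable[[str], str],
    evaluate_visual: Callable[[str, str], str],
    evaluate_code: Callable[[str], str],
    revise: Callable[[str, Feedback], str],
    max_rounds: int = 3,
) -> Tuple[str, int]:
    """Generate code once, then alternate evaluation and revision.

    Returns the final code and the number of revision rounds used.
    """
    code = generate(reference)
    for round_idx in range(max_rounds):
        # The two evaluators run separately, one per concern.
        feedback = Feedback(
            visual=evaluate_visual(reference, code),
            code=evaluate_code(code),
        )
        if feedback.ok:
            return code, round_idx
        code = revise(code, feedback)
    return code, max_rounds


# --- Toy stand-ins for the four agents (assumptions, not real VLM calls) ---

def toy_generate(reference: str) -> str:
    # Deliberately flawed first draft: wrong color and a missing parenthesis.
    return "plt.bar(x, y, color='blue'"


def toy_evaluate_visual(reference: str, code: str) -> str:
    # Stand-in for comparing the rendered chart with the reference image.
    return "" if reference in code else f"use {reference}"


def toy_evaluate_code(code: str) -> str:
    # Syntax check only; ast.parse does not execute the code.
    try:
        ast.parse(code)
        return ""
    except SyntaxError as exc:
        return f"syntax error: {exc.msg}"


def toy_revise(code: str, feedback: Feedback) -> str:
    if feedback.visual:
        wanted = feedback.visual.split(" ", 1)[1]
        code = re.sub(r"color='[^']*'", wanted, code)
    if feedback.code:
        code += ")"  # naive fix that is sufficient for this toy draft
    return code


final_code, rounds = refine_chart_code(
    "color='red'", toy_generate, toy_evaluate_visual, toy_evaluate_code, toy_revise
)
print(rounds, final_code)
```

Keeping the two evaluators as separate callables mirrors the separation of visual and code evaluation that the framework relies on; in a real system each callable would wrap a model call and the visual check would compare rendered images rather than strings.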
In experiments, METAL was evaluated on the ChartMimic dataset, and the results showed that it outperformed traditional methods in text clarity, chart type accuracy, color consistency, and layout accuracy. Comparisons with the open-source model Llama 3.2-11B and the closed-source model GPT-4o show that the charts generated by METAL match the reference charts more closely. Ablation experiments also underscored the importance of keeping the visual and code evaluation mechanisms separate: performance tends to decline when the two are merged into a single evaluation agent, suggesting that specialized evaluation is critical for high-quality chart generation.

METAL provides a balanced multi-agent approach by breaking the task down into specialized, iterative steps. This approach not only promotes the precise conversion of visual designs into Python code, but also provides a systematic process for error detection and correction. As computing resources increase, METAL's performance also improves nearly linearly, which gives it practical potential in application scenarios with high precision requirements.
Project: https://metal-chart-generation.github.io/
Key points:
The METAL framework was jointly proposed by UCLA, UC Merced and Adobe to optimize the chart generation process.
The framework consists of four dedicated agents that generate, evaluate and revise chart code, ensuring that both visual and technical elements are properly handled.
Experimental results show that METAL outperforms traditional methods in the accuracy and consistency of chart generation, showing good practical potential.