Large language models are expensive and slow behemoths, and evaluating them on gigantic modern datasets only makes it worse.
If only there were a way to select a meaningful (and small) subset of the corpus and still obtain a highly accurate evaluation...
Wait, sounds like Bayesian Optimization!
Bocoel works in the following steps:

1. Encode every entry of the corpus into an embedding with an embedder.
2. Run Bayesian optimization over the embedding space to select a small, representative subset of the corpus.
3. Evaluate the LLM only on the selected entries.
4. Manage the generated evaluations with the provided manager utility.
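To make the flow concrete, here is a toy, self-contained walk-through of the four steps. Everything in it (the random stand-in "embedder", the farthest-point selection standing in for Bayesian optimization, and the fake scores) is purely illustrative and is not bocoel's actual API:

```python
# A toy walk-through of the four steps above. All stand-ins here are
# illustrative only, not bocoel's actual API.
import numpy as np

rng = np.random.default_rng(0)
corpus = [f"question {i}" for i in range(1000)]

# 1. Encode each entry (stand-in: random vectors instead of a real embedder).
embeddings = rng.normal(size=(len(corpus), 8))

# 2. Select a small, spread-out subset of the embedding space
#    (stand-in for Bayesian optimization: greedy farthest-point sampling).
chosen = [0]
for _ in range(15):
    dists = np.linalg.norm(embeddings[:, None] - embeddings[chosen], axis=2)
    chosen.append(int(dists.min(axis=1).argmax()))

# 3. Evaluate the (expensive) LLM only on the chosen entries.
scores = {corpus[i]: rng.random() for i in chosen}  # fake LLM scores

# 4. Manage / persist the evaluations.
print(len(scores), "entries evaluated instead of", len(corpus))
```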
To our knowledge, this is the first work aiming to reduce the computational cost of evaluation (benchmarking) under a (possibly dynamic) budget.
Bocoel supports GPT2, Pythia, LLaMA and more through integration with Hugging Face transformers and datasets.

Like what you see? Please consider giving this a star (★)!
Simply put, Bayesian optimization balances the exploration objective (the purple area in the image, i.e. regions of high uncertainty) with the exploitation objective (the height of the black dots, i.e. regions already known to score well). It uses Gaussian processes as a backbone for inference, and uses an acquisition function to decide where to sample next. See here for a more in-depth introduction.
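For intuition, here is a minimal sketch of a single Bayesian optimization step using scikit-learn's Gaussian process and an upper-confidence-bound acquisition. This illustrates the general technique only; it is not bocoel's internals:

```python
# One Bayesian optimization step: fit a Gaussian process to what we have
# observed so far, then pick the next point by upper confidence bound (UCB).
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

def ucb_step(x_observed, y_observed, candidates, kappa=2.0):
    """Pick the next candidate to evaluate via the UCB acquisition."""
    gp = GaussianProcessRegressor().fit(x_observed, y_observed)
    mean, std = gp.predict(candidates, return_std=True)
    # Exploitation: high predicted mean. Exploration: high uncertainty (std).
    return candidates[np.argmax(mean + kappa * std)]

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=(5, 2))       # points evaluated so far
y = -np.sum(x**2, axis=1)                 # their (expensive) scores
grid = rng.uniform(-1, 1, size=(100, 2))  # candidate pool
print(ucb_step(x, y, grid))               # next point to evaluate
```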
Since Bayesian optimization works well with expensive-to-evaluate black-box models (read: LLMs), it is a perfect fit for this use case. Bocoel uses Bayesian optimization as a backbone for exploring the embedding space given by the corpus, which lets it select a small subset that acts as a mini snapshot of the corpus.
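One way to turn a point that the optimizer proposes in embedding space back into concrete corpus entries is nearest-neighbor retrieval. The sketch below shows the idea with plain NumPy; the function name and setup are hypothetical, not bocoel's API:

```python
# Hypothetical sketch: map a query embedding proposed by the optimizer
# back to concrete corpus entries via nearest-neighbor search.
import numpy as np

def nearest_entries(query, corpus_embeddings, corpus, k=3):
    """Return the k corpus entries whose embeddings are closest to `query`."""
    dists = np.linalg.norm(corpus_embeddings - query, axis=1)
    return [corpus[i] for i in np.argsort(dists)[:k]]

rng = np.random.default_rng(0)
corpus = [f"entry {i}" for i in range(100)]
corpus_embeddings = rng.normal(size=(100, 8))
query = rng.normal(size=8)  # a point chosen by the optimizer
print(nearest_entries(query, corpus_embeddings, corpus))
```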
LLMs are painfully slow, especially generative ones (which is usually what "LLM" refers to), since sequence generation is autoregressive: each token must be produced before the next can be.
Although bocoel requires an embedder to encode the entire corpus, embedders are orders of magnitude faster than LLMs, so that upfront cost is easily recouped by evaluating the LLM on far fewer entries.
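For a sense of scale, encoding a corpus with an off-the-shelf sentence embedder is a single batched pass. A minimal sketch with the sentence-transformers library follows; the model name is just one common choice, not a bocoel requirement:

```python
# Encode a small corpus with a sentence embedder. The model name below is
# one popular choice and is only an example.
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")
corpus = ["What is the capital of France?", "Explain photosynthesis."]
embeddings = model.encode(corpus)  # shape: (len(corpus), embedding_dim)
print(embeddings.shape)
```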
I don't want optional dependencies:

```bash
pip install bocoel
```

Give me the full experience (all optional dependencies):

```bash
pip install "bocoel[all]"
```
See the examples/getting_started folder for a minimal usage of the library; it gets you started with just a few lines of code.

More usage examples are under the examples folder. The API reference can be found here.
Contributors wanted! Don't be shy. Feel free to file issues and PRs. For PRs, please follow the guide on contributing and the code of conduct. Openness and inclusiveness are taken very seriously.
The code is available under the BSD-3-Clause License.
If you find this project helpful in your research, please cite this work as follows:
@misc{bocoel2024,
    title  = {BoCoEL: Bayesian Optimization as a Coverage Tool for Evaluating Large Language Models},
    url    = {https://bocoel.rentruewang.com/research/},
    author = {Wang, RenChu},
    month  = {January},
    year   = {2024}
}