mlc llm
1.0.0
带有ML汇编的通用LLM部署引擎
入门|文档|博客
MLC LLM是用于大型语言模型的机器学习编译器和高性能部署引擎。该项目的任务是使每个人都可以在每个人的平台上开发,优化和部署AI模型。
| AMD GPU | NVIDIA GPU | 苹果GPU | 英特尔GPU | |
|---|---|---|---|---|
| Linux / Win | ✅VULKAN,ROCM | ✅VULKAN,CUDA | N/A。 | ✅VULKAN |
| macos | ✅金属(DGPU) | N/A。 | ✅金属 | ✅金属(IGPU) |
| Web浏览器 | ✅webgpu和wasm | |||
| iOS / iPados | ✅苹果A系列GPU上的金属 | |||
| 安卓 | ✅在Adreno GPU上的OPENCL | Mali GPU上的OPENCL | ||
MLC LLM在MLCengine上编译并运行代码 - 上述平台上的统一高性能LLM推理引擎。 MLCENGINE提供通过REST服务器,Python,JavaScript,iOS和Android的OpenAI兼容API,所有这些都得到了我们不断改进社区的同一引擎和编译器的支持。
请访问我们的文档以开始使用MLC LLM。
如果您觉得有用,请考虑引用我们的项目:
@software { mlc-llm ,
author = { {MLC team} } ,
title = { {MLC-LLM} } ,
url = { https://github.com/mlc-ai/mlc-llm } ,
year = { 2023-2024 }
}MLC LLM的基础技术包括:
@inproceedings { tensorir ,
author = { Feng, Siyuan and Hou, Bohan and Jin, Hongyi and Lin, Wuwei and Shao, Junru and Lai, Ruihang and Ye, Zihao and Zheng, Lianmin and Yu, Cody Hao and Yu, Yong and Chen, Tianqi } ,
title = { TensorIR: An Abstraction for Automatic Tensorized Program Optimization } ,
year = { 2023 } ,
isbn = { 9781450399166 } ,
publisher = { Association for Computing Machinery } ,
address = { New York, NY, USA } ,
url = { https://doi.org/10.1145/3575693.3576933 } ,
doi = { 10.1145/3575693.3576933 } ,
booktitle = { Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 2 } ,
pages = { 804–817 } ,
numpages = { 14 } ,
keywords = { Tensor Computation, Machine Learning Compiler, Deep Neural Network } ,
location = { Vancouver, BC, Canada } ,
series = { ASPLOS 2023 }
}
@inproceedings { metaschedule ,
author = { Shao, Junru and Zhou, Xiyou and Feng, Siyuan and Hou, Bohan and Lai, Ruihang and Jin, Hongyi and Lin, Wuwei and Masuda, Masahiro and Yu, Cody Hao and Chen, Tianqi } ,
booktitle = { Advances in Neural Information Processing Systems } ,
editor = { S. Koyejo and S. Mohamed and A. Agarwal and D. Belgrave and K. Cho and A. Oh } ,
pages = { 35783--35796 } ,
publisher = { Curran Associates, Inc. } ,
title = { Tensor Program Optimization with Probabilistic Programs } ,
url = { https://proceedings.neurips.cc/paper_files/paper/2022/file/e894eafae43e68b4c8dfdacf742bcbf3-Paper-Conference.pdf } ,
volume = { 35 } ,
year = { 2022 }
}
@inproceedings { tvm ,
author = { Tianqi Chen and Thierry Moreau and Ziheng Jiang and Lianmin Zheng and Eddie Yan and Haichen Shen and Meghan Cowan and Leyuan Wang and Yuwei Hu and Luis Ceze and Carlos Guestrin and Arvind Krishnamurthy } ,
title = { {TVM}: An Automated {End-to-End} Optimizing Compiler for Deep Learning } ,
booktitle = { 13th USENIX Symposium on Operating Systems Design and Implementation (OSDI 18) } ,
year = { 2018 } ,
isbn = { 978-1-939133-08-3 } ,
address = { Carlsbad, CA } ,
pages = { 578--594 } ,
url = { https://www.usenix.org/conference/osdi18/presentation/chen } ,
publisher = { USENIX Association } ,
month = oct,
}