Arc Institute and NVIDIA, together with research teams from Stanford University, UC Berkeley, and UC San Francisco, recently launched Evo2, the world's largest biological artificial intelligence model. Evo2 was trained on 9.3 trillion nucleotides drawn from more than 128,000 genomes, a scale comparable to that of today's most powerful generative AI language models, and marks a major leap for biological research.
Evo2's deep learning capabilities let it quickly identify patterns in the gene sequences of different organisms, sparing researchers much of the time such analysis would otherwise take. The model can accurately identify mutations that cause human disease, and it can also design new genomes comparable in length to those of simple bacteria. The development team plans to release details of Evo2 on February 19, 2025, and to launch a user-friendly interface called Evo Designer. Evo2's code has also been published on Arc's GitHub and integrated into NVIDIA's BioNeMo framework to support further research.
Compared with its predecessor, Evo1, Evo2 draws on a much broader range of data, covering bacteria, archaea, viruses, and eukaryotes such as humans and plants. The researchers said Evo2 marks an important milestone for generative biology: it enables machines to “read, write, and think” in the language of nucleotides, opening new possibilities for bioengineering and gene therapy design.
On the technical side, Evo2 was trained on the NVIDIA DGX Cloud AI platform using more than 2,000 NVIDIA H100 GPUs. This computing power enables the model to process sequences of up to 1 million nucleotides at a time, which helps it capture relationships between distant parts of a genome. Its new architecture, "StripedHyena2", allows Evo2 to be trained on 30 times more data than Evo1, further improving its performance.
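To give a sense of what a 1-million-nucleotide context means in practice, the sketch below splits a longer sequence into windows that would each fit within that limit. It is only an illustration: the function names `iter_windows` and `count_windows` and the optional `overlap` parameter are assumptions made for this example and are not part of Evo2's code or preprocessing.

```python
from typing import Iterator, Tuple

# Context length quoted in the article, in nucleotides.
MAX_CONTEXT = 1_000_000


def iter_windows(seq_len: int, window: int = MAX_CONTEXT,
                 overlap: int = 0) -> Iterator[Tuple[int, int]]:
    """Yield half-open (start, end) index pairs that cover a sequence.

    Hypothetical helper for illustration; not Evo2's actual pipeline.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    if not 0 <= overlap < window:
        raise ValueError("overlap must satisfy 0 <= overlap < window")
    step = window - overlap
    start = 0
    while start < seq_len:
        end = min(start + window, seq_len)
        yield start, end
        if end == seq_len:
            break  # the final window reached the end of the sequence
        start += step


def count_windows(seq_len: int, window: int = MAX_CONTEXT,
                  overlap: int = 0) -> int:
    """Count how many windows are needed to cover the sequence."""
    return sum(1 for _ in iter_windows(seq_len, window, overlap))
```

Under these assumptions, a genome of about 4.6 million nucleotides would need five non-overlapping windows, while any sequence of up to 1 million nucleotides fits in a single window.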
Evo2 has a wide range of applications, particularly in analyzing genetic changes that affect protein function and organism fitness. For example, in tests on variants of the breast cancer-associated gene BRCA1, Evo2 achieved more than 90% accuracy in predicting which mutations are benign and which are potentially pathogenic. Findings like these could save considerable laboratory time and funding and accelerate the development of new drugs.
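A percentage like the one reported for BRCA1 is usually read as classification accuracy: the share of variants whose predicted class matches a reference label. The sketch below shows that calculation. The `accuracy` function and the label lists are placeholders invented for illustration; they are not data or code from the Evo2 study.

```python
from typing import Sequence


def accuracy(predicted: Sequence[str], reference: Sequence[str]) -> float:
    """Return the fraction of predictions that match the reference labels."""
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    if not reference:
        raise ValueError("at least one labeled variant is required")
    correct = sum(p == r for p, r in zip(predicted, reference))
    return correct / len(reference)


# Placeholder labels for eight hypothetical variants (NOT real BRCA1 data).
reference = ["benign", "pathogenic", "benign", "pathogenic",
             "benign", "benign", "pathogenic", "benign"]
predicted = ["benign", "pathogenic", "benign", "benign",
             "benign", "benign", "pathogenic", "benign"]

# 7 of the 8 placeholder predictions match their reference labels.
print(f"accuracy = {accuracy(predicted, reference):.1%}")
```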
Evo2 can also help design new biological tools and treatments. For example, scientists could use the model to design gene therapies that act only in specific cell types, avoiding side effects elsewhere. The research team expects that more specialized AI models can be built on top of Evo2 in the future, opening further possibilities for genomic research and bioengineering.
To address ethical and security risks, the researchers excluded pathogens that infect humans and other complex organisms from Evo2's training dataset so that the technology can be developed and deployed responsibly. This measure supports the safe use of the technology and lays a solid foundation for future biological research.
A detailed introduction to Evo2 is available at: https://arcinstitute.org/news/blog/evo2
Key points: Evo2 is the world's largest biological AI model, trained on data from more than 128,000 genomes. It can quickly identify disease-causing mutations and design new genomes, greatly improving research efficiency. Evo2 opens new possibilities for bioengineering and gene therapy design.