DeepSeek, a leading Chinese AI company, closed out its Open Source Week with a major release: 3FS (Fire-Flyer File System), a high-performance parallel file system designed for modern AI workloads, together with Smallpond, a data processing framework built on top of it. The pairing targets the data-handling bottlenecks of AI training and inference. DeepSeek reports an aggregate cluster read throughput of 6.6 TiB/s, a figure it presents as a milestone for distributed storage.

With a disaggregated architecture and strongly consistent semantics, 3FS reaches an aggregate read throughput of 6.6 TiB/s on a 180-node cluster, and peak KVCache lookup throughput exceeds 40 GiB/s on a single node. In the GraySort benchmark, 3FS sustained 3.66 TiB/min on 25 nodes, which DeepSeek presents as a substantial improvement over traditional solutions. The system is tuned for SSDs and RDMA networks, aims to push hardware bandwidth utilization close to its limit, and provides a steady data supply for AI training clusters with thousands of GPUs.
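To put these aggregate numbers in perspective, they can be converted into rough per-node averages. The sketch below is only a back-of-the-envelope calculation: it assumes the node counts quoted above are the right divisor and that load is spread evenly across nodes, neither of which the article states. The helper name per_node_gib_per_s is made up for illustration.

```python
# Back-of-the-envelope per-node averages implied by the quoted figures.
# Assumption: load is spread evenly over the quoted node counts.
TIB = 1024 ** 4
GIB = 1024 ** 3


def per_node_gib_per_s(aggregate_tib_per_s: float, nodes: int) -> float:
    """Average per-node throughput in GiB/s for a given aggregate rate."""
    return aggregate_tib_per_s * TIB / GIB / nodes


# Aggregate read test: 6.6 TiB/s over 180 nodes.
read_per_node = per_node_gib_per_s(6.6, 180)

# GraySort: 3.66 TiB/min over 25 nodes (convert minutes to seconds first).
sort_per_node = per_node_gib_per_s(3.66 / 60, 25)

print(f"read: {read_per_node:.1f} GiB/s per node")  # read: 37.5 GiB/s per node
print(f"sort: {sort_per_node:.2f} GiB/s per node")  # sort: 2.50 GiB/s per node
```

The two results are not directly comparable: the first measures raw reads, while a sort benchmark also writes and shuffles data over the network, so a lower per-node rate is expected.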
As core infrastructure behind DeepSeek-V3 and R1, 3FS is used across key stages including data preprocessing, checkpoint storage, vector search, and inference caching. Its shared storage layer reduces the complexity of distributed application development, while its strong consistency guarantees keep large-scale concurrent operations safe. The open-source Smallpond framework adds lightweight, PB-scale data processing and builds on DuckDB to do data engineering without long-running services, completing the stack from storage to compute.
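The "no long-running services" idea can be illustrated with a small sketch: split the data into partitions by key, then run SQL over each partition with an embedded, in-process engine. This is not Smallpond's API. To stay within the standard library, Python's sqlite3 module stands in for DuckDB, and the sample rows, function names, and partition count are all hypothetical.

```python
import sqlite3
from collections import defaultdict
from zlib import crc32

# Hypothetical sample rows (ticker, price); not taken from the article.
ROWS = [("AAA", 10.0), ("BBB", 7.5), ("AAA", 12.0), ("CCC", 3.0), ("BBB", 9.0)]


def hash_partition(rows, num_partitions):
    """Route each row to a partition using a stable hash of its key."""
    parts = defaultdict(list)
    for ticker, price in rows:
        parts[crc32(ticker.encode()) % num_partitions].append((ticker, price))
    return parts


def aggregate_partition(rows):
    """Run SQL over one partition in an in-process, in-memory database."""
    con = sqlite3.connect(":memory:")  # embedded engine, no server to deploy
    try:
        con.execute("CREATE TABLE t (ticker TEXT, price REAL)")
        con.executemany("INSERT INTO t VALUES (?, ?)", rows)
        return con.execute(
            "SELECT ticker, MIN(price), MAX(price) FROM t GROUP BY ticker"
        ).fetchall()
    finally:
        con.close()


def run(rows, num_partitions=3):
    """Partition by key, aggregate each partition, and merge the results."""
    results = []
    for part in hash_partition(rows, num_partitions).values():
        results.extend(aggregate_partition(part))
    return sorted(results)


result = run(ROWS)
print(result)  # [('AAA', 10.0, 12.0), ('BBB', 7.5, 9.0), ('CCC', 3.0, 3.0)]
```

Because every row for a given key lands in the same partition, each partition can be aggregated independently and the results simply concatenated. In a real deployment the partitions would presumably be files on shared storage processed by separate workers rather than in-memory lists in one process.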
Releasing 3FS and Smallpond together rounds out DeepSeek's five consecutive days of open-source releases. By opening up systems already proven in its own AI workloads, DeepSeek is pushing the industry to break through the storage bottlenecks of data-intensive applications. Analysts believe the stack could seriously challenge traditional distributed storage systems such as Ceph and Lustre, and may open up new approaches for workloads such as large-model training.
Open-source repositories:
3FS → https://github.com/deepseek-ai/3FS
Smallpond (data processing framework on 3FS) → https://github.com/deepseek-ai/smallpond