Imagine that text is no longer confined to a flat, two-dimensional page, but can be splatted into three-dimensional space like paint and interact with the real world. What would that look like? Research teams from Tsinghua University and Harvard University have jointly developed LangSplat, a cutting-edge technique for open-vocabulary text queries over real-world 3D scenes. It builds on 3D Gaussian Splatting to bring language "to life" in three-dimensional space, with far-reaching implications for 3D scene understanding and interaction.
We live in a three-dimensional world, and we use words to describe everything in it and language to explore it. But have you ever wondered what would happen if text could be "splatted" directly into three-dimensional space?
That is the idea behind LangSplat, recently developed by researchers at Tsinghua University and Harvard University. It attaches language information to a 3D Gaussian Splatting representation of a scene, so that the scene can be searched with free-form text.

Project address: https://github.com/minghanqin/LangSplat
Imagine you are playing a 3D game and want to find a hidden sword. You only need to type the word "sword", and LangSplat can pinpoint it within the vast scene. Isn't that amazing?
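To make the "sword" example a little more concrete, here is a minimal sketch of the general idea behind a text query: compare a language feature stored for each pixel with the embedding of the query text, and keep the pixels that match. Everything in it is an assumption made for illustration: the tiny 3-dimensional features, the `locate` helper, the fixed threshold and the use of plain cosine similarity are stand-ins, not LangSplat's actual features or scoring rule.

```python
import math


def cosine(a, b):
    """Cosine similarity between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def locate(feature_map, text_embedding, threshold=0.8):
    """Return the (row, col) pixels whose feature is close to the text embedding.

    `feature_map` is a hypothetical rendered image in which every pixel holds
    a language feature instead of a colour.
    """
    hits = []
    for r, row in enumerate(feature_map):
        for c, feature in enumerate(row):
            if cosine(feature, text_embedding) >= threshold:
                hits.append((r, c))
    return hits


# Toy data: hand-made 3-D features standing in for real language embeddings.
SWORD = [1.0, 0.0, 0.0]
WALL = [0.0, 1.0, 0.0]

feature_map = [
    [WALL, WALL, [0.9, 0.1, 0.0]],
    [WALL, [1.0, 0.05, 0.0], WALL],
]

query = SWORD  # stand-in for the output of a text encoder given "sword"
print(locate(feature_map, query))  # prints [(0, 2), (1, 1)]
```

In a real system the query vector would come from a text encoder and the per-pixel features from rendering the scene, but the matching step has this same shape.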
A double leap in speed and accuracy
The two biggest highlights of LangSplat are its speed and its accuracy.
Speed: At a resolution of 1440 × 1080, LangSplat is about 199 times faster than LERF, the earlier NeRF-based method it is compared against. In practice, this means you get feedback almost instantly instead of waiting for a progress bar.
Accuracy: Through hierarchical semantic learning, it makes the three-dimensional semantic field sharper, so the boundaries of a target object are no longer blurry. It is like examining the scene through a magnifying glass: every detail comes into focus.
The technology under the hood
LangSplat’s core technologies include:
Hierarchical semantic learning: The Segment Anything Model (SAM) is used to learn semantics at multiple levels, from whole objects down to their parts, so that each object can be recognized accurately.
3D Gaussian Splatting: The scene is represented as a collection of 3D Gaussians, and each Gaussian additionally encodes a semantic (language) feature.
Scene autoencoder: Storing a high-dimensional feature on every Gaussian would be very expensive, so LangSplat trains a scene-specific autoencoder that compresses the semantic features to a lower dimension, which both saves memory and improves efficiency.
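The scene autoencoder is the easiest of the three ideas to illustrate in a few lines. The sketch below trains a tiny linear autoencoder with plain gradient descent: an encoder squeezes each feature into a short code, and a decoder expands it back. It is a toy under stated assumptions, not the model LangSplat ships: the 6-dimensional features, the 2-dimensional code, the purely linear layers and all names (`train_autoencoder`, `reconstruction_error`) are made up for illustration.

```python
import random


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, vector)) for row in matrix]


def init_weights(dim, latent_dim, seed=0):
    """Small random encoder (latent_dim x dim) and decoder (dim x latent_dim)."""
    rng = random.Random(seed)
    enc = [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(latent_dim)]
    dec = [[rng.uniform(-0.1, 0.1) for _ in range(latent_dim)] for _ in range(dim)]
    return enc, dec


def reconstruction_error(features, enc, dec):
    """Mean squared reconstruction error over all features."""
    total = 0.0
    for x in features:
        x_hat = matvec(dec, matvec(enc, x))
        total += sum((a - b) ** 2 for a, b in zip(x_hat, x))
    return total / len(features)


def train_autoencoder(features, enc, dec, epochs=300, lr=0.02):
    """Fit encoder and decoder in place with stochastic gradient descent."""
    dim, latent_dim = len(dec), len(enc)
    for _ in range(epochs):
        for x in features:
            z = matvec(enc, x)        # compress
            x_hat = matvec(dec, z)    # reconstruct
            err = [a - b for a, b in zip(x_hat, x)]
            # Gradient of the squared error with respect to the latent code,
            # computed before the decoder weights are changed.
            grad_z = [sum(dec[i][k] * err[i] for i in range(dim))
                      for k in range(latent_dim)]
            for i in range(dim):
                for k in range(latent_dim):
                    dec[i][k] -= lr * err[i] * z[k]
            for k in range(latent_dim):
                for j in range(dim):
                    enc[k][j] -= lr * grad_z[k] * x[j]


# Toy "scene": 6-D features that really only vary along two directions,
# mimicking the redundancy that makes per-scene compression possible.
grid = [-1.0, -0.5, 0.5, 1.0]
features = [[a, b, a, b, a, b] for a in grid for b in grid]

enc, dec = init_weights(dim=6, latent_dim=2)
before = reconstruction_error(features, enc, dec)
train_autoencoder(features, enc, dec)
after = reconstruction_error(features, enc, dec)

print("error before training:", before)
print("error after training:", after)
print("compressed code for the first feature:", matvec(enc, features[0]))
```

The point of the exercise is the storage trade: each Gaussian only needs to keep the short code, and the decoder, trained once per scene, recovers an approximation of the full feature when a query needs it.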
Broad application prospects
The arrival of LangSplat opens a new door for 3D scene understanding. Robot navigation, augmented reality and 3D editing are all areas where it could prove useful.
Imagine playing an immersive VR game in the future and directing a robot to find treasure using nothing but your words. Or imagine designing a 3D model and adjusting its parameters simply by describing what you want. These scenarios are no longer just a dream.
LangSplat marks a significant step forward in how the three-dimensional world and human language interact, and its prospects in games, robotics, AR/VR and other fields are promising. Let us wait and see how the technology develops and where it is applied.