3D-Generalist is a generative graphics framework that combines multiple foundation models and modules to scale up the creation of 3D environments and data that are readily usable for synthetic data generation and embodied AI.
Here are some 3D environments crafted by 3D-Generalist, demonstrating controllable generation over 🎨 materials, 💡 lighting, 🏠 assets, and 📐 layout:
"An international restaurant with vibrant decor."
"A spacious home gym that is fully equipped."
"A bohemian art studio with a vintage easel."
"A chic clothing store with mannequins."
"A colorful arcade with neon signs."
"A modern bar with brick wall and marble bar counter."
"A quaint bookstore."
These examples qualitatively highlight 3D-Generalist's self-correcting behavior.
@article{sun20253d,
title={3D-Generalist: Self-Improving Vision-Language-Action Models for Crafting 3D Worlds},
author={Sun, Fan-Yun and Wu, Shengguang and Jacobsen, Christian and Yim, Thomas and Zou, Haoming and Zook, Alex and Li, Shangru and Chou, Yu-Hsin and Can, Ethem and Wu, Xunlei and others},
journal={arXiv preprint arXiv:2507.06484},
year={2025}
}