Uni-1 is the world's first multimodal reasoning model that generates pixels. Built on Unified Intelligence by Luma Labs, it bridges the gap between language understanding and visual creation — reasoning, imagining, and generating in one unified architecture. Experience the next generation of AI image generation.
Ranked #1 in human preference Elo for Overall, Style & Editing, and Reference-Based Generation
Uni-1 represents a paradigm shift in artificial intelligence. Unlike traditional models that separate language and vision, it grows a mind's eye from a logical brain — jointly modeling time, space, and logic in a single decoder-only autoregressive Transformer. This unified approach enables forms of visual reasoning and image generation that fragmented pipelines simply cannot achieve.
At its core, Uni-1 is a decoder-only autoregressive Transformer where text and images are represented in a single interleaved sequence. This design enables seamless cross-modal reasoning — a fundamental advantage over models that treat language and vision as separate systems.
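The interleaved-sequence idea can be sketched conceptually. The snippet below is an illustrative assumption only, not Luma's implementation: it assumes text tokens and discrete image tokens share one stream that a single causal decoder attends over, and the `TextToken`/`ImageToken` types and `build_interleaved_sequence` helper are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Union

# Hypothetical token types for illustration; Uni-1's actual
# tokenization scheme is not public.
@dataclass
class TextToken:
    id: int

@dataclass
class ImageToken:
    id: int  # index into an assumed discrete visual codebook

Token = Union[TextToken, ImageToken]

def build_interleaved_sequence(prompt_ids: List[int],
                               reference_images: List[List[int]]) -> List[Token]:
    """Flatten text and image tokens into one stream that a
    decoder-only Transformer can process autoregressively."""
    seq: List[Token] = [TextToken(i) for i in prompt_ids]
    for image in reference_images:
        seq.extend(ImageToken(i) for i in image)
    return seq

# One prompt (2 text tokens) plus one reference image (3 image tokens)
# becomes a single 5-token sequence.
seq = build_interleaved_sequence([101, 102], [[7, 8, 9]])
print(len(seq))  # 5
```

Because both modalities live in the same sequence, cross-modal attention needs no special bridging machinery: a text token can attend to an image token exactly as it would to another text token.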
Uni-1 performs structured internal reasoning before and during image synthesis. Given a complex prompt, it decomposes instructions, resolves spatial constraints, plans composition, and renders accordingly — achieving state-of-the-art results on the RISEBench benchmark for reasoning-informed visual editing.
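The stages described above can be sketched as a toy pipeline. Everything here is a simplifying assumption: the stage names (`decompose`, `resolve_spatial`, `plan_composition`) and the toy string-splitting logic are invented for illustration and do not reflect Uni-1's internal reasoning, which is not public.

```python
# Conceptual sketch of a reasoning-then-rendering loop: decompose the
# prompt, resolve spatial constraints, then plan composition. All
# stage names and data shapes are illustrative assumptions.

def decompose(prompt: str) -> list[str]:
    """Split a compound prompt into atomic instructions (toy version)."""
    return [p.strip() for p in prompt.split(",") if p.strip()]

def resolve_spatial(instructions: list[str]) -> dict[str, str]:
    """Assign each instruction a coarse canvas region (toy version)."""
    regions = ["left", "center", "right"]
    return {inst: regions[i % len(regions)] for i, inst in enumerate(instructions)}

def plan_composition(layout: dict[str, str]) -> list[tuple[str, str]]:
    """Order placements so each element can respect the ones before it."""
    return sorted(layout.items(), key=lambda kv: kv[1])

plan = plan_composition(resolve_spatial(decompose("a red cup, a blue plate")))
print(plan)  # [('a blue plate', 'center'), ('a red cup', 'left')]
```

The point of the sketch is the ordering: constraints are made explicit and a composition plan exists before any pixel is rendered, which is what distinguishes reasoning-informed generation from direct prompt-to-pixels decoding.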
Uni-1 demonstrates that learning to generate images materially improves visual understanding. It excels at fine-grained tasks like open-vocabulary object detection (ODinW-13), showing that generation and understanding reinforce each other within the unified framework.
Uni-1 processes text and images in a single interleaved sequence — both as input and output. It can accept text prompts, reference images, and editing instructions all at once, producing pixel-perfect results that reflect deep understanding of every input element.
Uni-1 outperforms competitors across multiple evaluation dimensions. In human preference Elo rankings, it takes first place for Overall quality, Style & Editing, and Reference-Based Generation — and second in Text-to-Image. Here's what makes it the leading choice for intelligent image generation.
Uni-1 delivers a comprehensive suite of AI image generation capabilities — all powered by a single unified Transformer model. Every feature benefits from its reasoning-first architecture.
Generate stunning images from text descriptions. The reasoning engine automatically plans scene composition, spatial layout, lighting, and perspective before rendering each pixel.
Edit images with natural language instructions. Uni-1 decomposes complex edits into logical steps — modifying exactly what's needed while preserving everything else.
Provide up to 8 reference images to guide generation. Identity, style, and compositional constraints are preserved across all references, enabling powerful creative workflows.
Uni-1 understands 3D space, object relationships, and physical plausibility. Objects are placed with correct perspective, depth, and occlusion — creating spatially coherent scenes every time.
The generation capability enhances visual understanding. Objects, regions, and layouts can be identified, localized, and reasoned about with fine-grained precision across diverse visual domains.
Seamlessly transform between artistic styles — from photorealism to watercolor, from manga to oil painting. Subject identity is preserved while adopting any target aesthetic with cultural awareness.
Everything you need to know about Uni-1, the multimodal reasoning model from Luma Labs.
Join thousands of creators exploring the future of AI image generation. Experience what happens when reasoning meets visual creation.