Research | arXiv AI / CL
IV-CoT: Implicit Visual Chain-of-Thought for Structure-Aware Text-to-Image Generation

Summary
Unified multimodal large language models (MLLMs) achieve strong text-to-image generation quality but still struggle with structure-aware prompt following, which requires preserving object counts, spatial relations, attribute bindings, and coarse layouts.
Region: Global
Heat Score: 76
Category: Research
Language: en
