Seedream 2.0: A Native Chinese-English Bilingual Image Generation Foundation Model

Seed Vision Team, ByteDance

Abstract

Rapid advancement of diffusion models has catalyzed remarkable progress in the field of image generation. However, prevalent models such as Flux, SD3.5, and Midjourney still grapple with issues like model bias, limited text rendering capabilities, and insufficient understanding of Chinese cultural nuances. To address these limitations, we present Seedream 2.0, a native Chinese-English bilingual image generation foundation model that excels across diverse dimensions and adeptly manages text prompts in both Chinese and English, supporting bilingual image generation and text rendering. We develop a powerful data system that facilitates knowledge integration, and a caption system that balances accuracy and richness in image description. In particular, Seedream is integrated with a self-developed bilingual large language model (LLM) as a text encoder, allowing it to learn native knowledge directly from massive data. This enables it to generate high-fidelity images with accurate cultural nuances and aesthetic expressions described in either Chinese or English. Besides, Glyph-Aligned ByT5 is applied for flexible character-level text rendering, while a Scaled RoPE generalizes well to untrained resolutions. Multi-phase post-training optimizations, including SFT and RLHF iterations, further improve the overall capability. Through extensive experimentation, we demonstrate that Seedream 2.0 achieves state-of-the-art performance across multiple aspects, including prompt-following, aesthetics, text rendering, and structural correctness. Furthermore, Seedream 2.0 has been optimized through multiple RLHF iterations to closely align its output with human preferences, as revealed by its outstanding ELO score. In addition,