ERNIE 4.5 Technical Report

ERNIE Team, Baidu

Abstract

In this report, we introduce ERNIE 4.5, a new family of large-scale foundation models comprising 10 distinct variants. The model family consists of Mixture-of-Experts (MoE) models with 47B and 3B active parameters (the largest containing 424B total parameters), as well as a dense model with 0.3B parameters. For the MoE architecture, we propose a novel multimodal heterogeneous structure that supports parameter sharing across modalities while also allowing dedicated parameters for each modality. This MoE architecture has the advantage of enhancing multimodal understanding while also improving performance on text-related tasks. All of our models are trained with optimized efficiency using the PaddlePaddle deep learning framework, which enables high-performance inference and streamlined deployment. We achieve 47% Model FLOPs Utilization (MFU) during the pre-training of our largest ERNIE 4.5 language model. Experimental results show that our models achieve state-of-the-art performance across multiple text and multimodal benchmarks, particularly in instruction following, world knowledge memorization, visual understanding, and multimodal reasoning. All models are publicly available under the Apache 2.0 license to support future research and development in the field. Additionally, we open-source the development toolkits for ERNIE 4.5, which feature industrial-grade capabilities, resource-efficient training and inference workflows, and multi-hardware compatibility.

Github: https://

29, 2025
Contents

1 Introduction
2 Architecture
  2.1 Heterogeneous MoE
  2.2 Vision Encoder
  2.3 Adapter
  2.4 Multimodal Position Embedding
3 Pre-Training
  3.1 Pre-Training Data
  3.2 REEAO: Bitwise-Deterministic Pre-Training Data Manager
  3.3 Pre-Training Recipe
    3.3.1 Stage I: Text-Only Training
    3.3.2 Stage II