Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone

Microsoft

Abstract

We introduce phi-3-mini, a 3.8 billion parameter language model trained on 3.3 trillion tokens, whose overall performance, as measured by both academic benchmarks and internal testing, rivals that of models such as Mixtral 8x7B and GPT-3.5 (e.g., phi-3-mini achieves 69% on MMLU and 8.38 on MT-bench), despite being small enough to be deployed on a phone. The innovation lies entirely in our dataset for training, a scaled-up version of the one used for phi-2, composed of heavily filtered web data and synthetic data. The model is also further aligned for robustness, safety, and chat format. We also provide some initial parameter-scaling results with 7B and 14B models trained for 4.8T tokens, called phi-3-small and phi-3-medium, both significantly more capable than phi-3-mini (e.g., respectively 75% and 78% on MMLU, and 8.7 and 8.9 on MT-bench).
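As a rough sanity check on the "deployed on a phone" claim, the sketch below gives a back-of-the-envelope estimate of the weight-memory footprint. It assumes 4-bit weight quantization, which this excerpt does not specify, and ignores the KV cache and runtime overhead:

    # Approximate weight-memory footprint of phi-3-mini.
    # Assumption (not stated in this excerpt): weights quantized to 4 bits;
    # KV cache and runtime overhead are ignored.
    params = 3.8e9            # parameter count reported in the abstract
    bits_per_weight = 4       # assumed 4-bit quantization
    gigabytes = params * bits_per_weight / 8 / 1e9
    print(f"~{gigabytes:.1f} GB of weights")   # prints "~1.9 GB of weights"

Under these assumptions the weights alone fit comfortably within the memory budget of a modern phone.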
1 Introduction

The striking progress of AI in the last few years can be largely attributed to major efforts throughout the world towards scaling up to ever-larger models and datasets. Large Language Models (LLMs) have steadily increased in size, from a mere billion parameters just five years ago (GPT-2 had 1.5 billion parameters [RWC+19]) to trillion parameters today. The impetus for this effort originates in the seemingly predictable improvement one obtains by training large models, the so-called scaling laws [KMH+20, HBM+22, MRB+23].
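For orientation, such laws are commonly expressed in a parametric form similar to the compute-optimal fit of [HBM+22]; the version below is only an illustrative sketch, and the constants $E$, $A$, $B$, $\alpha$, $\beta$ are empirical fits rather than values taken from this report:

    \[ L(N, D) \approx E + \frac{A}{N^{\alpha}} + \frac{B}{D^{\beta}} \]

where $N$ is the number of model parameters, $D$ the number of training tokens, and $L$ the resulting loss, so that performance improves predictably as either quantity is scaled.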
However, these laws assume a “fixed” data source. This assumption is now significantly disrupted by the existence of frontier LLMs themselves, which allow us to interact with data in novel ways. In our previous works on the phi models [GZA+23, LBE+23, JBA+23] it was shown that a combination of LLM-based filtering of web data, and LLM-created synthetic data, enable perf