www.wandb.ai | contact@wandb.ai

Current Best Practices for Training LLMs from Scratch

Authors: Rebecca Li, Andrea Parker, Justin Tenuto

Table of Contents

Introduction
Build vs. Buy Pre-trained LLM Models
The Scaling Laws
Hardware
Memory vs. Compute Efficiency
Techniques for Parallelization
Dataset Collection
Dataset Pre-processing
Dataset Handling
Tokenization
Pre-training Steps
Model Evaluation
Bias and Toxicity
Instruction Tuning
Reinforcement Learning through Human Feedback (RLHF)
Conclusion
References
Appendix
LLM Overview
Transformer Model Architecture
The Original LLM Scaling Laws

Introduction

Although we're only a few years removed from the transformer breakthrough, LLMs have already grown massively in performance, cost, and promise. At W&B, we've been fortunate to see more teams try to build LLMs than anyone else. But many
of the critical details and key decision points are often passed down by word of mouth. The goal of this white paper is to distill the best practices for training your own LLM from scratch. We'll cover everything from scaling and hardware to dataset selection and model training, letting you know which tradeoffs to consider and flagging some potential pitfalls along the way. This is meant to be a fairly exhaustive look at the key steps and considerations you'll make when training an LLM from scratch.

The first question you should ask yourself is whether training one from scratch is right for your organization. As such, we'll start there:

Before starting LLM pre-training, the first question you need to ask is whether you should pre-train an LLM by yourself or use an existing one. There are three basic approaches:

Option 1: Use the API of a commercial LLM, e.g. GPT-3 (OpenAI, 2020), Cohere APIs, AI21 J-1
Option