Miriam Wollenhaupt, Ph.D., Computational Chemist, Bayer AG
Martín Villalba, Ph.D., Expert Applied Mathematics, Bayer AG
Orr Ravitz, Ph.D., Synthesis Planning Solutions, CAS

PREDICTING NEW CHEMISTRY
IMPACT OF HIGH-QUALITY TRAINING DATA ON PREDICTION OF REACTION OUTCOMES

cas.org

In chemical synthesis planning applications, the goal is to generate sets of synthetic routes that are as diverse and as accurate as possible, giving organic chemists many plausible and distinct strategies for making their target molecules. However, data-driven computational applications can only be as good as the underpinning data. The quality of the predicted results depends on the following main properties of the training data:

1. The diversity of the predictions correlates with the breadth of the data source: how many reaction types are represented, and how diverse the products and substrates are in each reaction.
2. The accuracy of the predictions depends on the quality and consistency of the data and its representation, as well as on its depth: the number of examples available for each reaction type and the spectrum of reactants, products, and reaction conditions they cover.

In this study, we demonstrate the significant impact that even a moderately sized set of scientist-curated reactions from the CAS content collection can have on the predictive power of a synthesis planning tool. A broad training set was enriched with examples targeting certain reaction types, which dramatically enhanced the predictive power of the machine learning models. This is a strong indication of the much greater potential of CAS content to drive AI applications in synthesis planning.

Enriching a training set with high-quality, diverse CAS reactions had a significant impact on predictive power.

© 2021 American Chemical Society. All rights reserved.
Co