Certification of Reinforcement Learning: Challenges per Safety Criticality and Autonomy

Marta Ribeiro (1), Fynn Opperman (2)
(1) Assistant Professor, Aerospace Faculty
(2) PhD Candidate, Aerospace Faculty

Current State Machine Learning Certification
[Figure: Europe (2024, 2024, 2020) and USA (2024, 2020, 2023)]

Supervised vs Reinforcement Learning?
Supervised Learning: Labeled Data?
[Diagram: Data (known, pre-processed data), Algorithm, F(x)]
Reinforcement Learning:
[Diagram: Environment, Algorithm (Lookup Table / Neural Network), Action, Reward]

Reinforcement Learning
[Diagram: State, Environment (Simulated / Real), Action]
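The state, action and reward loop named on these slides can be sketched in a few lines. This is a minimal illustration only, not the presenters' method: the corridor environment, the reward values, the hyperparameters and every name below (`step`, `train`, `q_table`) are assumptions made for the example. The "algorithm" here is the lookup-table case, trained with tabular Q-learning in a simulated environment.

```python
import random

# Hypothetical 1-D corridor (an assumption for illustration):
# states 0..4, the episode ends when state 4 is reached.
N_STATES = 5
ACTIONS = (-1, +1)  # index 0 = move left, index 1 = move right


def step(state, action):
    """Simulated environment: returns (next_state, reward, done)."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    reward = 1.0 if done else 0.0  # reward only at the goal
    return next_state, reward, done


def train(episodes=200, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0, max_steps=1000):
    """Tabular Q-learning; the lookup table is q[state][action_index]."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        for _ in range(max_steps):
            # Epsilon-greedy: mostly follow the table, sometimes explore.
            if rng.random() < epsilon:
                a = rng.randrange(2)
            else:
                a = 0 if q[state][0] > q[state][1] else 1
            next_state, reward, done = step(state, ACTIONS[a])
            # The target includes the discounted value of the next state,
            # so earlier actions are credited for future rewards.
            target = reward + (0.0 if done else gamma * max(q[next_state]))
            q[state][a] += alpha * (target - q[state][a])
            state = next_state
            if done:
                break
    return q


q_table = train()
# Greedy action per non-terminal state, read from the learned table.
greedy_policy = [ACTIONS[0 if row[0] > row[1] else 1] for row in q_table[:-1]]
```

Note that no labeled dataset appears anywhere: the only training signal is the reward returned by `step`, and the data the table is fitted to is generated by the agent's own actions during training.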
Reinforcement Learning
[Diagram: Algorithm (Lookup Table / Neural Network), State, Reward, Action, Environment (Simulated / Real)]
Autonomous / Decision Support
Immediate / Delayed Reaction

Reinforcement Learning
- Future rewards: sequence of actions vs immediate action
- A bad action can lead to a good state
- Maximization of reward
- Multi-objective reward formulation

Reinforcement Learning
Autonomous / Decision Support
Immediate / Delayed Reaction
Safeguards against:
- Non-safe/prohibited actions
- Actions that move the system to bad/unknown states

Reinforcement Learning Data?
[Figure source: (1) MLEAP-D4 Final Report]

Reinforcement Learning Data?
Stages: Pre-Training Assurance, Model Training, Post-Training Assurance (Deployment Assurance)
Pre-Training Assurance?
- No pre-existing data
- Running the model to learn about actions taken
- Post-assessment explainability/predictability of actions

Proposal
Model Training
Pre-Training Assurance
Model Training
- Define ODD + Assumptions
- In/Out of Distribution Detection Method
- Runtime Verification
- Training Environment Analysis
- Reward Formulation Analysis
- RL Algorithm Selection
- Hyperp