Shield Synthesis for Safe Reinforcement Learning
You have a reinforcement learning system? Sure, it works great, but does it give you any guarantees? I thought not.
We will describe methods that use reactive synthesis to construct runtime enforcement modules (shields) that ensure a system works correctly, even if the system has bugs. If the system doesn't have too many bugs, the behavior of the shielded system will stay close to the behavior of the original system. We will also show extensions to probabilistic and timed settings.
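To make the idea of a shield concrete, here is a minimal sketch in Python. It is not the speaker's construction. It assumes the result of synthesis is already available as a table mapping each state to its set of safe actions, and that the learning agent proposes actions in order of preference. The names `Shield`, `safe_actions`, and `filter`, as well as the example states, are hypothetical and chosen only for illustration.

```python
from typing import Dict, Hashable, Sequence, Set

State = Hashable
Action = Hashable


class Shield:
    """Runtime filter that passes safe actions through and overrides unsafe ones."""

    def __init__(self, safe_actions: Dict[State, Set[Action]]):
        # Assumption: in a real setting this table would be produced by a
        # reactive synthesis tool from a safety specification. Here it is
        # written by hand.
        self.safe_actions = safe_actions
        self.interventions = 0  # how often the shield changed the agent's top choice

    def filter(self, state: State, ranked_actions: Sequence[Action]) -> Action:
        """Return the agent's most preferred action that is safe in `state`."""
        allowed = self.safe_actions.get(state, set())
        if not allowed:
            raise ValueError(f"no safe action known for state {state!r}")
        for action in ranked_actions:
            if action in allowed:
                if action != ranked_actions[0]:
                    self.interventions += 1
                return action
        # None of the agent's proposals is safe: fall back to some safe action.
        self.interventions += 1
        return min(allowed, key=repr)


# Hand-written safety table for a toy robot (hypothetical example).
shield = Shield({
    "open": {"forward", "left", "right", "stop"},
    "near_wall": {"left", "stop"},
})

print(shield.filter("open", ["forward", "left"]))       # agent's top choice is safe
print(shield.filter("near_wall", ["forward", "left"]))  # top choice is overridden
print(shield.interventions)
```

Because the shield only overrides the agent when its top choice is unsafe, an agent that rarely misbehaves is rarely corrected, which is the sense in which the shielded behavior stays close to the original.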
About the Speaker
Prof. Roderick Bloem received his M.Sc. degree in Computer Science from Leiden University, the Netherlands, in 1996, and his Ph.D. degree in Computer Science from the University of Colorado at Boulder in 2001. He joined Graz University of Technology in 2002. Since 2008, he has been a full professor of Computer Science at the same university, where he is currently head of the department of Computer Science and Biomedical Engineering.
Roderick Bloem has published over 100 peer-reviewed papers in formal verification, reactive synthesis, and security, and is an editor of the Handbook of Model Checking. He led the Austrian National Research Network on Rigorous Systems Engineering and has organized events including the Computer Aided Verification conference and Formal Methods in Computer Aided Design.