Project coordinator: Elena Gaura, Coventry University ([email protected])
Reinforcement learning (RL) originated almost 40 years ago but has had only marginal impact on physical-world applications, despite showing considerable promise: mastering the game of Go (AlphaGo), aligning the output of large language models (part of ChatGPT's training), and supporting Waymo's autonomous driving. RL has the potential to resolve a key weakness of traditional optimal control approaches, which oversimplify human-in-the-loop systems: it admits sophisticated non-linear, stochastic, data-driven models of human behaviour, while also allowing that behaviour to change over time.
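To make that concrete, below is a minimal sketch (in Python; the class name and all parameters are hypothetical, chosen for illustration) of the kind of stochastic, non-stationary human behaviour model RL can learn against: a Markov chain over discrete user actions whose transition probabilities drift over time.

    import numpy as np

    class DriftingHumanModel:
        """Illustrative only: human behaviour as a Markov chain over
        discrete actions whose transition matrix drifts slowly, so the
        modelled behaviour changes over time."""

        def __init__(self, n_actions=3, drift=0.01, seed=0):
            self.rng = np.random.default_rng(seed)
            # Random initial transition matrix, rows normalised to sum to 1.
            self.P = self.rng.random((n_actions, n_actions))
            self.P /= self.P.sum(axis=1, keepdims=True)
            self.drift = drift
            self.state = 0

        def step(self):
            # Sample the next action from the current row of the chain.
            self.state = self.rng.choice(len(self.P), p=self.P[self.state])
            # Perturb and renormalise: the behaviour is non-stationary.
            self.P += self.drift * self.rng.standard_normal(self.P.shape)
            self.P = np.clip(self.P, 1e-6, None)
            self.P /= self.P.sum(axis=1, keepdims=True)
            return self.state

An agent interacting with such a model cannot rely on a fixed behavioural profile, which is precisely the setting where classical optimal control assumptions break down.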
Where RL has been applied successfully, the results have far outshone previous approaches.
Using RL to control complex real-world physical systems, however, is challenging because policies learnt in simulation can fail catastrophically when transferred to the real world. One approach to mitigating this model mismatch is domain randomisation, but it can yield final systems that are overly conservative and poorly optimised. Furthermore, the end-to-end process of bringing RL to production is complex and expensive (e.g., OpenAI's robotic hand). These two research gaps, robust sim-to-real transfer and affordable production deployment, offer a wealth of exploration opportunities leading to robust control solutions for the built environment, the automotive industry, health and agriculture: all domains where humans and machines interact and that interaction needs to be optimised over many dimensions.
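As a sketch of the domain randomisation idea (the toy dynamics, parameter ranges, and function names below are invented for illustration), each training episode resamples the physical parameters of a one-dimensional point-mass simulator, so a policy must perform across the whole parameter range rather than overfitting to one, possibly wrong, simulator setting.

    import numpy as np

    class PointMassEnv:
        """Toy 1-D point mass; mass and friction stand in for simulator
        parameters whose real-world values are uncertain."""

        def __init__(self, mass=1.0, friction=0.1, dt=0.05):
            self.mass, self.friction, self.dt = mass, friction, dt
            self.pos, self.vel = 1.0, 0.0

        def reset(self):
            self.pos, self.vel = 1.0, 0.0
            return np.array([self.pos, self.vel])

        def step(self, force):
            # Simple damped dynamics: a = (F - b*v) / m.
            acc = (force - self.friction * self.vel) / self.mass
            self.vel += acc * self.dt
            self.pos += self.vel * self.dt
            reward = -abs(self.pos)  # drive the mass towards the origin
            return np.array([self.pos, self.vel]), reward

    def randomised_episode(policy, rng):
        # Domain randomisation: resample physics at the start of each episode.
        env = PointMassEnv(mass=rng.uniform(0.5, 2.0),
                           friction=rng.uniform(0.0, 0.3))
        obs, total = env.reset(), 0.0
        for _ in range(200):
            obs, reward = env.step(policy(obs))
            total += reward
        return total

    # Example: a crude proportional-derivative policy evaluated under randomisation.
    rng = np.random.default_rng(1)
    pd_policy = lambda obs: -2.0 * obs[0] - 1.0 * obs[1]
    print(np.mean([randomised_episode(pd_policy, rng) for _ in range(10)]))

The trade-off noted above is visible in such a setup: widening the sampling ranges makes the policy more robust, but drags its performance on any single parameter setting below what a policy tuned to that setting would achieve.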
Topics: reinforcement learning for real-life applications, stochastic models of human behaviour, dynamic Bayesian networks, data-driven system identification, closing the sim2real gap, domain randomisation through residual noise models.
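On the last topic, one possible reading of "domain randomisation through residual noise models" is sketched below (the linear residual form and all function names are assumptions made for illustration): fit the sim-to-real discrepancy from paired transitions, then randomise around that learned residual rather than around arbitrary parameter ranges.

    import numpy as np

    def fit_residual_model(states, sim_next, real_next):
        """Least-squares fit of the sim-to-real gap as a linear function
        of state: real_next ~= sim_next + states @ W. A minimal,
        hypothetical stand-in for a learned residual noise model."""
        residual = real_next - sim_next
        W, *_ = np.linalg.lstsq(states, residual, rcond=None)
        return W

    def corrected_step(sim_step, W, state, action, noise_scale, rng):
        # Simulator step, plus the learned residual, plus injected noise:
        # randomisation is then concentrated on the part of the dynamics
        # the simulator actually gets wrong.
        nxt = sim_step(state, action)
        return nxt + state @ W + noise_scale * rng.standard_normal(nxt.shape)

Compared with blanket parameter randomisation, this keeps the randomised dynamics centred on the observed discrepancy, which is one route towards avoiding the poorly optimised policies noted above.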