At Bonsai, we talk to a lot of Fortune 500 companies who want to explore using Reinforcement Learning to solve their industrial control problems. Often the thorniest part of the conversation is figuring out whether Reinforcement Learning is a good technique to solve their problem. This blog post is meant to shed some light on that subject.
Much of the buzz about Reinforcement Learning (RL) comes from solving so-called “toy problems”. These toy problems, like the ones in OpenAI Gym and OpenAI Universe, are good benchmarks because they provide a standard set of simplified problems to judge AI algorithms against. Enterprise customers, however, face a much more complex set of challenges when using Reinforcement Learning to control or optimize industrial applications.
Industrial control systems like a wind turbine or diesel engine may involve dozens or thousands of variables, require human-intensive calibration or optimization, and generate reams of output data. You may be thinking: machine learning is used to solve complex problems all the time, so why do I need Reinforcement Learning? True, some problems are best solved by supervised or unsupervised machine learning. On the other hand, Reinforcement Learning can help control and optimize some systems that other methods cannot. The following three comparisons share some insights we’ve gained about how to tell a good industrial-strength Reinforcement Learning problem when you see it.
When you play chess, each move that you make completely changes the game for you and your opponent. There are far too many move combinations for there to be a “right answer” move at any stage of the game because each sequential move changes the whole game and there’s no turning back. On the other hand, each question in a trivia game can be independently scored correct or incorrect.
If your system has many “knobs to turn” and turning any of those knobs changes the entire state of the environment, you might have a great application for Reinforcement Learning. On the other hand, if you have a data set where each row in the data can be graded correct or incorrect like a test or trivia game, then your problem is a better fit for supervised or unsupervised learning.
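To make the contrast concrete, here is a minimal sketch in Python. Everything in it is hypothetical and invented for illustration: `grade_rows` stands in for a trivia-style data set where each row is scored on its own, and `TankLevel` is a toy system with a single knob (a valve) whose setting changes the state that every later action has to deal with.

```python
def grade_rows(predictions, labels):
    """Trivia-style grading: each row is right or wrong on its own."""
    return [p == y for p, y in zip(predictions, labels)]


class TankLevel:
    """Toy one-knob system (hypothetical): a tank with a fixed outflow."""

    def __init__(self, level=5.0, target=5.0):
        self.level = level
        self.target = target

    def step(self, valve):
        # The valve setting changes the level, and the new level is the
        # starting point for whatever action comes next.
        self.level += valve - 1.0  # inflow minus a fixed outflow of 1.0
        return -abs(self.level - self.target)  # reward: closer to target is better


def run_episode(actions):
    """Apply a sequence of valve settings and return the total reward."""
    env = TankLevel()
    return sum(env.step(a) for a in actions)


# Rows are graded independently of one another.
print(grade_rows([1, 0, 1], [1, 1, 1]))
# The final action (valve=1.0) is the same in both episodes, but its reward
# depends on what was done before it.
print(run_episode([1.0, 1.0]))
print(run_episode([2.0, 1.0]))
```

In the second episode, overfilling the tank on the first step leaves it off target, so the same follow-up action that was harmless in the first episode now earns a penalty. That dependence on earlier actions is what makes a problem sequential rather than row-by-row.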
We’ve recently come across the following applications of industrial control systems that are a good fit for RL:
Musicians practice at rehearsals. They play musical passages over and over until they get them right. Mistakes are allowed and even encouraged in rehearsal to prepare the musicians for a high-quality performance. Reinforcement Learning, like music rehearsal, involves letting your AI learn by experience, so you need to give the AI a lot of opportunities to practice and fail in order for it to learn.
Some industrial systems allow an AI to practice until it gets things right, while others demand performance-level quality at all times. For example, you wouldn’t want an AI to practice learning control on an expensive CNC machine.
This is where simulations become very valuable. In a simulated environment, the AI can repeat the try-fail-learn cycle many thousands of times safely and quickly. Safely because the AI is not failing in your live environment, and quickly because the simulation can show the AI failure conditions much more frequently than failure occurs in real life. If you have a simulation of your system or a live system that an AI can practice on (for example, assembling a small and inexpensive polymer part repeatedly), then you might have a strong candidate for RL.
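The try-fail-learn cycle can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a description of any particular product: the "simulator" is a made-up corridor of five positions, and the learner is plain tabular Q-learning, chosen here because it is one of the simplest RL algorithms. All names (`simulate_step`, `train`, `greedy_rollout`) and all parameter values are hypothetical.

```python
import random

# Toy stand-in for a simulator (hypothetical): a corridor of positions 0..4.
# Reaching position 4 is a success (+1); reaching position 0 is a failure (-1).
GOAL, FAIL, START = 4, 0, 2
ACTIONS = (-1, +1)  # step left or step right


def simulate_step(state, action):
    """Return (next_state, reward, done) for one simulated move."""
    nxt = state + action
    if nxt == GOAL:
        return nxt, 1.0, True
    if nxt == FAIL:
        return nxt, -1.0, True
    return nxt, 0.0, False


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning: try, fail, and learn entirely inside the simulator."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(FAIL, GOAL + 1) for a in ACTIONS}
    failures = 0
    for _episode in range(episodes):
        state = START
        for _step in range(100):  # cap the number of steps per episode
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)  # explore: try something new
            else:
                action = max(ACTIONS, key=lambda a: q[(state, a)])  # exploit
            nxt, reward, done = simulate_step(state, action)
            best_next = 0.0 if done else max(q[(nxt, a)] for a in ACTIONS)
            # Move the estimate toward the reward plus discounted future value.
            q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
            state = nxt
            if done:
                if reward < 0:
                    failures += 1  # a failure that cost nothing in real life
                break
    return q, failures


def greedy_rollout(q, max_steps=20):
    """Follow the learned policy with no exploration and record the path."""
    state, path = START, [START]
    for _step in range(max_steps):
        action = max(ACTIONS, key=lambda a: q[(state, a)])
        state, _reward, done = simulate_step(state, action)
        path.append(state)
        if done:
            break
    return path


q_table, failures = train()
print("simulated failures during practice:", failures)
print("path taken after training:", greedy_rollout(q_table))
```

Every failure counted here happens inside `simulate_step`, never on real equipment, and the 500 practice episodes run in a fraction of a second. A real industrial simulator would be far richer than this corridor, but the loop around it has the same shape.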
Returning to our chess example: an AI might learn how to play chess well against the one type of opponent that it has been training against, but teaching the AI how to play well against players who employ many different styles is another matter. Some algorithms are well suited to cookie-cutter scenarios where the conditions stay identical over time. When conditions vary, it can become more difficult for those algorithms to optimize or control the system well. This is a job for Reinforcement Learning!
Now let’s return to the examples used in section #1 above to show the kinds of variations in industrial systems that are well solved with Reinforcement Learning:
Hopefully this post gives you a better idea of where Reinforcement Learning should be considered across different types of applications. If you think you have a use case that could be solved with RL, we would love to hear from you at our getting started page. You can also find more details in our Use Case Qualification Worksheet.