September 20, 2017

Programming Intelligence into Industrial Control Systems: Is Reinforcement Learning the Answer?

At Bonsai, we talk to a lot of Fortune 500 companies who want to explore using Reinforcement Learning to solve their industrial control problems. Often the thorniest part of the conversation is figuring out whether Reinforcement Learning is a good technique to solve their problem.  This blog post is meant to shed some light on that subject.

Much of the buzz about Reinforcement Learning (RL) is created by solving so-called “toy problems”.  These toy problems, like the ones on OpenAI Gym and OpenAI Universe, are good benchmarks because they provide a standard set of simplified problems to judge AI algorithms against.  Enterprise customers, however, face a much more complex set of challenges when using reinforcement learning to control or optimize industrial applications.

Industrial control systems like a wind turbine or diesel engine may involve dozens or even thousands of variables, require human-intensive calibration or optimization, and generate reams of output data. You may be thinking: machine learning is used to solve complex problems all the time, so why do I need Reinforcement Learning?  True, some problems are best solved by supervised or unsupervised machine learning.  On the other hand, Reinforcement Learning can help control and optimize some systems that other methods cannot. The following three comparisons share some insights we’ve gained about how to recognize a good industrial-strength Reinforcement Learning problem when you see it.

#1 Is your problem more like chess or multiple-choice trivia?

When you play chess, each move that you make completely changes the game for you and your opponent.  There are far too many move combinations for there to be a “right answer” move at any stage of the game because each sequential move changes the whole game and there’s no turning back.  On the other hand, each question in a trivia game can be independently scored correct or incorrect.

If your system has many “knobs to turn” and turning any of those knobs changes the entire state of the environment, you might have a great application for Reinforcement Learning.  On the other hand, if you have a data set where each row in the data can be graded correct or incorrect like a test or trivia game, then your problem is a better fit for supervised or unsupervised learning.   
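To make the contrast concrete, here is a minimal Python sketch. Everything in it is hypothetical and chosen only for illustration: `grade_trivia` stands in for a data set where each row can be graded on its own, and `run_knob_sequence` stands in for a system whose level is clamped between a lower and an upper limit, so each adjustment changes the state the next adjustment starts from.

```python
def grade_trivia(answers, answer_key):
    """Grade each row on its own; order and history do not matter."""
    return [given == correct for given, correct in zip(answers, answer_key)]


def run_knob_sequence(start_level, adjustments, low=0, high=10):
    """Apply knob adjustments in order to a level clamped to [low, high]."""
    level = start_level
    for delta in adjustments:
        # Each adjustment starts from the state the previous one left behind.
        level = max(low, min(high, level + delta))
    return level


# Trivia: every answer is scored independently of the others.
trivia_scores = grade_trivia(["b", "a", "d"], ["b", "c", "d"])

# Knobs: the same two adjustments in a different order end in a different state,
# because the first adjustment hits the upper limit in one ordering only.
up_then_down = run_knob_sequence(8, [+3, -3])
down_then_up = run_knob_sequence(8, [-3, +3])
```

In the trivia case you can grade any row without looking at the others. In the knob case there is no per-move answer key, because what a move achieves depends on the state left behind by every move before it.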

We’ve recently come across the following applications of industrial control systems that are a good fit for RL:

  • Robotic Control: To grasp and assemble the part correctly, a robot must account for shape, size, mass, and mechanical properties of the part.
  • HVAC Control Systems: To optimize temperature control, power consumption, and reliability, a refrigeration truck needs to account for temperature set point, system power needs, and compressor cycles.
  • Supply Chain: To optimize the restocking policy for warehouse products, a warehouse manager needs to consider warehouse space, minimum order terms, shelf life, and transportation times.
  • Automotive Control Systems: To optimize engine performance, a mechanic needs to tune air-to-fuel ratio, ignition advance and other parameters.
  • Automatic Calibration: To optimize injection moulding machine calibration settings, a technician must measure wear patterns and plate temperature.
  • Network Optimization: To maintain service availability during Distributed Denial of Service (DDoS) attacks, an operator must adapt network configuration during the attack.

#2 Does your system allow practice or require performance quality only?

Musicians practice at rehearsals.  They play musical passages over and over until they get them right.  Mistakes are allowed and even encouraged in rehearsal to prepare the musicians for a high quality performance.  Reinforcement Learning, like music rehearsal, involves letting your AI learn by experience, so you need to give the AI a lot of opportunities to practice and fail in order for it to learn.

Some industrial systems allow an AI to practice getting things right and some systems require performance quality only.  For example, you wouldn’t want an AI to practice learning control on an expensive CNC machine.

This is where simulations become very valuable.  In a simulated environment, the AI can repeat the try-fail-learn cycle many thousands of times safely and quickly.  Safely because the AI is not failing in your live environment, and quickly because the simulation can show the AI failure conditions much more frequently than failure occurs in real life.  If you have a simulation of your system or a live system that an AI can practice on (for example assembling a small and inexpensive polymer part repeatedly), then you might have a strong candidate for RL.
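As a rough illustration of what that practice loop can look like, the Python sketch below runs tabular Q-learning against a toy simulator. It is a sketch under stated assumptions, not a description of any particular product or platform: the `ToyThermostatSim` class, its integer temperature range of 0 to 10, the target of 5, the reward, and all hyperparameters are made up for this example.

```python
import random


class ToyThermostatSim:
    """Hypothetical simulator: integer temperature 0..10, target temperature 5."""

    def __init__(self, rng):
        self.rng = rng
        self.temp = 0

    def reset(self):
        # Every practice episode starts from a random temperature.
        self.temp = self.rng.randint(0, 10)
        return self.temp

    def step(self, action):
        # Actions: 0 = cool (-1), 1 = hold (0), 2 = heat (+1).
        self.temp = max(0, min(10, self.temp + (action - 1)))
        reward = -abs(self.temp - 5)  # closer to the target is better
        return self.temp, reward


def train(episodes=2000, steps=20, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning: try an action, observe the result, update the estimate."""
    rng = random.Random(seed)
    sim = ToyThermostatSim(rng)
    q = [[0.0] * 3 for _ in range(11)]  # q[state][action]
    for _ in range(episodes):
        state = sim.reset()
        for _ in range(steps):
            if rng.random() < epsilon:
                action = rng.randrange(3)  # explore: allowed to make mistakes
            else:
                action = max(range(3), key=lambda a: q[state][a])  # exploit
            next_state, reward = sim.step(action)
            target = reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


def greedy_policy(q):
    """Pick the highest-valued action for each temperature."""
    return [max(range(3), key=lambda a: row[a]) for row in q]


q_table = train()
policy = greedy_policy(q_table)
```

All of the mistakes happen inside the simulator, which is the point of the rehearsal analogy. After training, `policy[temp]` is the action the agent currently rates best at each temperature; near the target it should heat just below it, hold at it, and cool just above it.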

#3 Is your problem cookie cutter or Heinz 57 varieties?

Returning to our chess example: an AI might learn how to play chess well against one type of opponent that it’s been training against, but teaching the AI how to play well against players who employ many different styles is another matter.  Some algorithms are well suited for cookie cutter scenarios where the conditions will be identical over time.  When conditions vary, it can become more difficult for those algorithms to optimize or control the system well. This is a job for Reinforcement Learning!

Now let’s return to the examples used in section #1 above to show the kinds of variations in industrial systems that are well solved with reinforcement learning:

  • Robotic Control: Teach a robot end effector to pick up parts that may be cylindrical, rectangular, or hexagonal in profile.
  • HVAC Control Systems: Teach an HVAC system to optimize temperature control in various climates, weather scenarios, and room occupancy.
  • Supply Chain: Teach a system to optimize warehouse restocking policy across shelf life and transportation times.
  • Automotive Control Systems: Teach a system to tune engine parameters for optimal engine performance across driver preferences and road conditions.
  • Automatic Calibration: Teach a system to optimize injection moulding machine calibration settings across various wear patterns and materials injected.
  • Network Optimization: Teach a system to optimize network configuration to prevent various styles of DDoS attacks.

Hopefully this post gives you a better idea about where Reinforcement Learning should be considered across different types of applications. If you think you have a use case that could be solved with RL, we would love to hear from you at our getting started page.  You can also find more details in our Use Case Qualification Worksheet.

Always. Be. Learning.
