One of the most important but least approachable facets of artificial intelligence is the notion of hyperparameters: the settings that control how a specific algorithm is trained. These crucial aspects of system setup govern the training speed and accuracy of your network, so getting them right is critical to finding the best results. Traditionally, a machine learning expert would need to work through several sophisticated equations to determine the best parameters for training a model; with some models, there can be a dozen or more hyperparameters that must be computed or discovered through repeated experiments. Experts with years of experience can sometimes guess at the values before starting the learning process. As an example, for a Deep Q-Network (DQN) the hyperparameters include exploration decay, among others. For context, exploration decay describes how quickly the agent shifts from exploring novel approaches to a problem (exploration) to exploiting a solution that was computed in the past (exploitation). Here's a great simulator with the exploration rate (also known as epsilon) on a slider: REINFORCEjs Puckworld.
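To make the exploration idea concrete, here is a minimal sketch of epsilon-greedy action selection with a decaying epsilon. The function names, the multiplicative decay schedule, and the default values are assumptions chosen for illustration; they are not taken from any particular DQN implementation.

```python
import random


def decayed_epsilon(step, eps_start=1.0, eps_min=0.05, decay=0.995):
    """Exploration rate after `step` steps of multiplicative decay.

    eps_start, eps_min and decay are illustrative hyperparameters.
    """
    return max(eps_min, eps_start * decay ** step)


def choose_action(q_values, epsilon, rng=random):
    """Epsilon-greedy selection over a list of estimated action values."""
    if rng.random() < epsilon:
        # Explore: pick any action uniformly at random.
        return rng.randrange(len(q_values))
    # Exploit: pick the action with the highest estimated value.
    return max(range(len(q_values)), key=q_values.__getitem__)
```

Early in training epsilon is close to 1, so nearly every action is random; as it decays toward eps_min, the agent relies more and more on what it has already learned. How fast that shift should happen is exactly the kind of value that usually has to be found by experiment.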
At Bonsai, we're abstracting this process by building a hyper-learner that identifies the best configuration for a given task. I sat with Ruofan Kong to get some of the details about how this is done.
The first hyperparameters to select are the learning algorithm and the configuration of the neural network to apply to the task. The component responsible for this is the Architect, which examines your compiled Inkling code to infer the correct number of input and output neurons, and designs the rest of the layers accordingly. We're continually implementing more algorithms that the Architect can employ on new problems. Today's most popular optimizers are Stochastic Gradient Descent and Adam, but there's constant research pushing forward the state of the art.
Each optimizer has hyperparameters of its own. The Instructor sets the parameters around the learning optimization scheme, and like the Architect, it collects new information every time it makes a decision. This keeps the process abstracted away, so you can concentrate on more interesting things.
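As an illustration of what "hyperparameters of its own" means, the sketch below writes out a single update step for plain SGD and for Adam on one scalar weight. It follows the standard published update rules; the function names and default values are assumptions for this example and say nothing about how the Instructor itself is implemented.

```python
import math


def sgd_step(w, grad, lr=0.1):
    """Plain SGD: one hyperparameter, the learning rate."""
    return w - lr * grad


def adam_step(w, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a scalar weight; t is the 1-based step count.

    Adam adds three more hyperparameters (beta1, beta2, eps) on top of
    the learning rate. Returns the new weight and the updated moments.
    """
    m = beta1 * m + (1 - beta1) * grad          # running mean of gradients
    v = beta2 * v + (1 - beta2) * grad * grad   # running mean of squared gradients
    m_hat = m / (1 - beta1 ** t)                # bias correction for early steps
    v_hat = v / (1 - beta2 ** t)
    w = w - lr * m_hat / (math.sqrt(v_hat) + eps)
    return w, m, v
```

Even in this tiny form, SGD exposes one knob and Adam exposes four, and a good value for each depends on the problem being solved.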
Just in case you did want to know what's behind the curtain...
To do this, our hyper-learner studies the problem before assigning hyperparameters: it takes a fingerprint of a sample of the available data by making random predictions. During this time, we collect statistics about the different states that are reported, as well as our results based on the reward function. Based on these specifics, we can make a pretty good guess at the learning parameters that will give the best results. We do this by keeping a tally of the problems we've solved, and how those were best resolved. Naturally, the AI engine has the hive mind to support it: the database of all of the solution statistics that have passed through the public API. Using this platform, we can democratize machine learning and artificial intelligence for developers around the world.
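The fingerprinting idea can be sketched roughly as follows. This is a toy illustration only, not Bonsai's implementation: the `env_step` callback, the choice of summary statistics, the record layout of `history`, and the nearest-neighbour lookup are all assumptions made for the example.

```python
import math
import random
import statistics


def fingerprint(env_step, n_actions, n_steps=200, seed=0):
    """Summarize a task by acting randomly and recording what comes back.

    env_step is a hypothetical callback: action -> (state, reward),
    with a single numeric state value for simplicity.
    """
    rng = random.Random(seed)
    states, rewards = [], []
    for _ in range(n_steps):
        state, reward = env_step(rng.randrange(n_actions))
        states.append(state)
        rewards.append(reward)
    return (statistics.mean(states), statistics.pstdev(states),
            statistics.mean(rewards), statistics.pstdev(rewards))


def nearest_config(fp, history):
    """Return the hyperparameters of the past problem whose fingerprint is closest."""
    best = min(history, key=lambda rec: math.dist(fp, rec["fingerprint"]))
    return best["hyperparameters"]
```

Here `history` plays the role of the tally of solved problems: a list of records, each pairing a stored fingerprint with the hyperparameters that worked best for it. A real system would likely use richer statistics and a smarter similarity measure than Euclidean distance.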
We'll talk to the AI team about training times, and why they're so hard to predict, as well as what our research is finding in this area.