Facebook is open-sourcing Horizon, a reinforcement learning platform created by Facebook AI researchers, recommender systems experts, and engineers.
Horizon is built for deploying AI at scale, so companies or research teams can run jobs that may require thousands of CPUs or GPUs to train on billions of observations. Because it uses Apache Spark for preprocessing and PyTorch for training, Horizon can also be run on a single computer.
Product teams at Facebook have used Horizon for things like M Suggestions, a service that can recommend translations, Spotify songs, Food Network recipes, and a myriad of other things based on words used in conversations on Facebook Messenger.
It’s also been used to determine the bitrate of Facebook 360 videos, and to personalize when the Facebook app chooses to send users notifications.
Reinforcement learning uses rewards to drive the activity of agents to reach a desired goal.
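To make the idea of reward-driven behavior concrete, here is a minimal sketch of tabular Q-learning on a toy five-cell corridor. It is not taken from Horizon; the environment, the function names (`train_corridor_agent`, `greedy_policy`), and all hyperparameters are assumptions chosen purely for illustration. The agent starts in the leftmost cell and receives a reward of 1 only when it reaches the rightmost cell.

```python
import random


def train_corridor_agent(n_states=5, episodes=500, alpha=0.5,
                         gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning on a hypothetical corridor; action 0 = left, 1 = right."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(n_states)]  # q[state][action]
    goal = n_states - 1
    for _ in range(episodes):
        state = 0
        while state != goal:
            if rng.random() < epsilon:
                # Explore: occasionally try a random action.
                action = rng.randrange(2)
            else:
                # Exploit: take the best-known action, breaking ties randomly.
                best = max(q[state])
                action = rng.choice([a for a in (0, 1) if q[state][a] == best])
            next_state = max(0, state - 1) if action == 0 else state + 1
            # The only reward signal: 1 for arriving at the goal cell.
            reward = 1.0 if next_state == goal else 0.0
            # Q-learning update toward reward plus discounted future value.
            target = reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


def greedy_policy(q):
    """Return the highest-valued action for each state."""
    return [max((0, 1), key=lambda a: row[a]) for row in q]


if __name__ == "__main__":
    q_table = train_corridor_agent()
    print(greedy_policy(q_table)[:-1])  # learned actions for non-goal cells
```

After training, the learned policy in every non-goal cell is to move right, even though the agent was never told where the goal is; it inferred that from rewards alone.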
Facebook chose to open-source Horizon to advance the field of reinforcement learning and unsupervised learning methods, both for novice practitioners and students and for large research projects that, like Facebook, need thousands of machines to train AI systems.
“I do think reinforcement learning (RL) is kind of the next frontier when it comes to industrywide, widespread adoption when it comes to machine learning, so we wanted to open-source this to really provide a good platform for people all around the Bay and all around the world to start using RL,” Gauci said.
Facebook is no stranger to open source tools for the training or deployment of AI.
Version 1.0 of the popular deep learning framework PyTorch was released in October with integrations for Google Cloud, AWS, and Azure Machine Learning. There are also Caffe2 and ParlAI, a platform for training AI models. Research from Facebook AI Research is also open-sourced.
In addition to PyTorch and Apache Spark, Horizon uses TensorBoardX for training visualizations and ONNX for serving AI models after training.
Unlike other forms of reinforcement learning at large organizations that may operate live, Horizon trains AI systems offline.
Horizon applies a technique known as counterfactual policy evaluation to estimate an AI system’s performance offline and determine, before going live, whether alternative approaches would perform better.
“We can counterfactually look at these alternative actions and say ‘Oh, maybe this alternative action was better in this circumstance,’” he said. “So using this we can — as opposed to like a lot of RL, where they kind of train online and the model’s always changing — we train offline and we have a stage where we evaluate the model, and we come up with some confidence on the model’s performance, and then engineers can choose to deploy that model or not. And the Horizon platform open-sources all of that and makes it all available.”
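One common way to do this kind of counterfactual evaluation is inverse propensity scoring (IPS), which reweights logged rewards by how much more (or less) likely a candidate policy is to take the logged action than the policy that collected the data. The sketch below illustrates that general idea only; the article does not say which estimators Horizon uses, and the function name `ips_estimate`, the record layout, and the toy data are all assumptions.

```python
def ips_estimate(logged, target_policy):
    """Estimate a target policy's average reward from logged data.

    logged: list of (context, action, reward, logging_prob) tuples, where
        logging_prob is the probability the data-collecting policy gave
        to the action it actually took (must be > 0).
    target_policy: function (context, action) -> probability that the
        candidate policy would take that action in that context.
    """
    total = 0.0
    for context, action, reward, logging_prob in logged:
        # Importance weight: up-weight actions the candidate favors.
        weight = target_policy(context, action) / logging_prob
        total += weight * reward
    return total / len(logged)


# Hypothetical log: a uniform-random policy chose between actions "a" and "b".
logged_data = [
    ("x", "a", 1.0, 0.5),
    ("x", "b", 0.0, 0.5),
    ("x", "a", 1.0, 0.5),
    ("x", "b", 0.0, 0.5),
]


def uniform(context, action):
    return 0.5


def always_a(context, action):
    return 1.0 if action == "a" else 0.0


if __name__ == "__main__":
    print(ips_estimate(logged_data, uniform))   # 0.5
    print(ips_estimate(logged_data, always_a))  # 1.0
```

On this toy log, the estimate suggests that always choosing "a" would have doubled the average reward, without that policy ever having been run live. In practice IPS estimates can have high variance when the logging probabilities are small.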
Horizon is also built to handle normalization when training on large datasets, a commonly encountered issue with reinforcement learning, Gauci said. The platform comes with step-by-step instructions so it can be used by anyone with basic computer science knowledge, not just researchers or experts at companies like Facebook.
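The article does not spell out what normalization involves in Horizon. As a rough illustration of the general idea, the sketch below standardizes a single numeric feature to zero mean and unit variance, one common form of normalization before training. The helper names `fit_normalizer` and `normalize` are hypothetical and are not part of Horizon.

```python
import statistics


def fit_normalizer(values):
    """Compute the mean and standard deviation used to rescale a feature."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    # Guard against constant features, which would otherwise divide by zero.
    return mean, (std if std > 0 else 1.0)


def normalize(values, mean, std):
    """Rescale values so the fitted data has zero mean and unit variance."""
    return [(v - mean) / std for v in values]


if __name__ == "__main__":
    feature = [10.0, 20.0, 30.0]
    mean, std = fit_normalizer(feature)
    print(normalize(feature, mean, std))
```

Fitting the statistics once and reusing them matters at scale: the same mean and standard deviation have to be applied to every batch during training and again when the model is served.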
“Anyone who has any kind of basic Unix experience can generate a dataset and train a model and see how it works, and that’s one of the things. There’s sort of an educational aspect to this; we want to get a lot of people kind of excited about the field,” he said.