Tuesday, December 1, 2020

Salesforce open-sources TransmogrifAI, the machine learning library that powers Einstein

Machine learning models, a form of artificial intelligence (AI) that identifies relationships among hundreds, thousands, or even millions of data points, are rarely easy to architect. Data scientists spend weeks or months not only preprocessing the data on which the models are to be trained, but also extracting useful features (the measurable attributes a model learns from) from that data, narrowing down algorithms, and ultimately building (or attempting to build) a system that performs well not just within the confines of a lab, but in the real world.

Salesforce’s new toolkit aims to ease that burden somewhat. On GitHub today, the San Francisco-based cloud computing company published TransmogrifAI, an automated machine learning library for structured data — the kind of searchable, neatly categorized data found in spreadsheets and databases — that performs feature engineering, feature selection, and model training in just three lines of code.

It’s written in Scala and built on top of Apache Spark (some of the same technologies that power Salesforce AI platform Einstein) and was designed from the ground up for scalability. To that end, it can process datasets ranging from dozens to millions of rows and run on clustered machines on top of Spark or an off-the-shelf laptop.

Mayukh Bhaowal, director of product management for Salesforce Einstein, told VentureBeat in a phone interview that TransmogrifAI essentially transforms raw datasets into custom models. It’s the evolution of Salesforce’s in-house machine learning library, which allowed the Einstein team to deploy custom models for enterprise clients in just hours.

“It’s informed by what our data scientists learned while building Einstein,” Bhaowal explained. Chief among those lessons: Custom-built models beat global, pretrained models. “If you’re using the same model to make predictions for a Fortune 500 company and a mom and pop shop, you’ll have a hard time finding the right pattern.”

Machine learning made easy

TransmogrifAI offers a three-step workflow.

First is feature inference and automated feature selection. It’s a crucial part of model training, as selecting the wrong features could result in an overly optimistic, inaccurate, or biased model.

Using TransmogrifAI, users specify a schema for their data, which the library uses to extract features automatically (phone numbers and zip codes, for example). It also performs statistical tests, automatically treating text fields with low cardinality (that is, a small number of distinct values) as categorical and throwing out features with little-to-no predictive power, or those that are likely to result in hindsight bias (signals that make an outcome look more predictable than it really was at prediction time) and other unwanted signals.
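The checks described above can be sketched in a few lines of plain Python. This is an illustration only, not TransmogrifAI's API (the library itself is written in Scala): the function names, the thresholds, and the use of Pearson correlation as the stand-in "statistical test" are all assumptions made for the example.

```python
from statistics import mean, pstdev

def is_low_cardinality(values, max_unique=10):
    """Treat a text column as categorical when it has few distinct values.
    The max_unique threshold is an assumed, illustrative default."""
    return len(set(values)) <= max_unique

def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either column is constant."""
    sx, sy = pstdev(xs), pstdev(ys)
    if sx == 0 or sy == 0:
        return 0.0
    mx, my = mean(xs), mean(ys)
    cov = mean((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cov / (sx * sy)

def select_features(columns, label, min_corr=0.05, max_corr=0.95):
    """Keep numeric columns whose |correlation| with the label is neither
    negligible nor suspiciously close to perfect (a crude leakage check)."""
    kept, dropped = [], {}
    for name, values in columns.items():
        r = abs(pearson(values, label))
        if r < min_corr:
            dropped[name] = "little predictive power"
        elif r > max_corr:
            dropped[name] = "possible hindsight bias"
        else:
            kept.append(name)
    return kept, dropped

# Hypothetical lead-scoring data: did the lead convert (1) or not (0)?
converted = [0, 1, 0, 1, 1, 0, 1, 0]
columns = {
    "constant_flag": [1, 1, 1, 1, 1, 1, 1, 1],    # carries no signal
    "emails_opened": [1, 4, 2, 5, 3, 1, 4, 2],    # genuinely predictive
    "contract_signed": [0, 1, 0, 1, 1, 0, 1, 0],  # only known after the outcome
}
kept, dropped = select_features(columns, converted)
```

Here only `emails_opened` survives: the constant column is uninformative, and `contract_signed` mirrors the label exactly, which is the kind of too-good-to-be-true signal a hindsight-bias check is meant to flag.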

In a demo, Bhaowal showed how TransmogrifAI could quickly isolate features like job titles, emails, and addresses and figure out whether they’re predictive. Those that aren’t — salutation, in this case — were discarded automatically. “It’s perfect for dimensionality reduction,” he said, referring to the process of reducing the number of features on which the model is trained.

The next step in TransmogrifAI’s flow is automated feature engineering. Drawing on the feature types extracted in the first step, the library transforms structured data into vectors, automatically taking, for example, a list of phone numbers and splitting out the country code to see if a phone number is valid.
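As a rough illustration of the phone-number example, the sketch below splits out a country calling code, checks the remaining digit count, and emits a small numeric vector. It is not TransmogrifAI code; the `NATIONAL_LENGTHS` table, the function names, and the validity rule are simplified assumptions chosen for the example.

```python
import re

# Hypothetical lookup: country calling code -> expected national number length.
# Real numbering plans are more varied; this table is illustrative only.
NATIONAL_LENGTHS = {"1": 10, "44": 10, "91": 10}

def phone_features(raw):
    """Split out the country code and check the remaining digit count."""
    digits = re.sub(r"\D", "", raw)
    code, valid = "", False
    if raw.strip().startswith("+"):
        for length in (3, 2, 1):  # try the longest calling code first
            candidate = digits[:length]
            if candidate in NATIONAL_LENGTHS:
                code = candidate
                valid = len(digits) - length == NATIONAL_LENGTHS[candidate]
                break
    return {"country_code": code, "is_valid": valid}

def to_vector(raw):
    """One-hot encode the country code, then append a validity flag."""
    features = phone_features(raw)
    codes = sorted(NATIONAL_LENGTHS, key=int)  # fixed column order: 1, 44, 91
    one_hot = [1.0 if features["country_code"] == c else 0.0 for c in codes]
    return one_hot + [1.0 if features["is_valid"] else 0.0]

vectors = [to_vector(p) for p in ["+1 (415) 555-0100", "+44 20 7946 0958", "555-0100"]]
```

The point of the sketch is the shape of the transformation: one raw text field becomes several numeric columns that a model can consume directly.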

Once TransmogrifAI has extracted features from the dataset, it’s primed to begin automated model training. At this stage, it runs a set of machine learning algorithms in parallel on the data, automatically selects the best-performing model, and samples the data and recalibrates predictions to compensate for imbalanced classes.
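The selection step can be pictured as "fit several candidates, score each on held-out data, keep the winner." The sketch below does that with toy classifiers and a thread pool. Everything in it is an assumption for illustration: the candidate models are deliberately trivial, and balanced accuracy is swapped in as a simple imbalance-aware metric rather than the sampling and recalibration the article describes.

```python
from concurrent.futures import ThreadPoolExecutor

def fit_majority(rows):
    """Baseline: always predict the most common training label."""
    ones = sum(y for _, y in rows)
    label = 1 if ones * 2 >= len(rows) else 0
    return lambda x: label

def fit_threshold(index):
    """Build a fitter that cuts feature `index` midway between class means."""
    def fit(rows):
        pos = [x[index] for x, y in rows if y == 1]
        neg = [x[index] for x, y in rows if y == 0]
        cut = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2
        return lambda x: 1 if x[index] > cut else 0
    return fit

def balanced_accuracy(model, rows):
    """Mean of per-class recall, so the majority class cannot dominate."""
    recalls = []
    for cls in (0, 1):
        members = [x for x, y in rows if y == cls]
        recalls.append(sum(model(x) == cls for x in members) / len(members))
    return sum(recalls) / 2

def select_model(candidates, train, valid):
    """Fit and score every candidate concurrently; return the best name."""
    def evaluate(item):
        name, fit = item
        return name, balanced_accuracy(fit(train), valid)
    with ThreadPoolExecutor() as pool:
        scores = dict(pool.map(evaluate, candidates.items()))
    return max(scores, key=scores.get), scores

# Hypothetical data: feature 0 is informative, feature 1 is noise.
train = [((1.0, 5.0), 0), ((2.0, 1.0), 0), ((1.5, 3.0), 0),
         ((2.5, 4.0), 0), ((6.0, 2.0), 1), ((7.0, 4.5), 1)]
valid = [((1.2, 4.0), 0), ((2.2, 2.0), 0), ((3.0, 5.0), 0),
         ((6.5, 3.0), 1), ((5.5, 1.0), 1)]
candidates = {
    "majority": fit_majority,
    "threshold_f0": fit_threshold(0),
    "threshold_f1": fit_threshold(1),
}
best, scores = select_model(candidates, train, valid)
```

Note that the majority baseline scores only 0.5 under balanced accuracy even though it would be right 60% of the time on this validation set, which is why an imbalance-aware metric matters when one class is rare.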

Core to TransmogrifAI’s training is what Shubha Nabar, senior director of data science for Salesforce Einstein, calls “model explainability” — transparency about the factors influencing a model’s predictions. “From a trust and data privacy perspective, it’s important that the generated model isn’t a ‘black box’,” she said. “[TransmogrifAI] shows the global effects of each feature.”
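One common way to estimate the global effect of each feature is permutation importance: scramble one column, and see how much the model's accuracy drops. The article does not say which method TransmogrifAI uses, so the sketch below is a generic illustration rather than the library's approach. To keep it deterministic it averages over cyclic shifts of the column instead of random shuffles; the model and data are made up.

```python
def accuracy(model, X, y):
    """Fraction of rows the model labels correctly."""
    return sum(model(x) == label for x, label in zip(X, y)) / len(y)

def rotation_importance(model, X, y):
    """Average accuracy drop when one column is cyclically shifted.
    A deterministic stand-in for shuffle-based permutation importance."""
    baseline = accuracy(model, X, y)
    n, importances = len(X), []
    for col in range(len(X[0])):
        column = [row[col] for row in X]
        drops = []
        for k in range(1, n):  # every non-trivial cyclic shift
            rotated = column[k:] + column[:k]
            X_perm = [row[:col] + (v,) + row[col + 1:] for row, v in zip(X, rotated)]
            drops.append(baseline - accuracy(model, X_perm, y))
        importances.append(sum(drops) / len(drops))
    return importances

# Hypothetical model that only looks at feature 0.
def model(x):
    return 1 if x[0] > 4 else 0

X = [(1, 9), (2, 3), (6, 7), (7, 1)]
y = [0, 0, 1, 1]
importances = rotation_importance(model, X, y)
```

Scrambling feature 0 hurts accuracy badly, while scrambling feature 1 changes nothing because the model never reads it, so a report built this way would show feature 0 as the one driving predictions.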

And that’s just the tip of a very tall iceberg.

TransmogrifAI boasts tools that make it easier to adjust hyperparameters, the settings (such as sampling rate and filters) that govern how a machine learning model is trained and tuned. And within integrated development environments that support it, TransmogrifAI highlights typos and syntax errors, suggests code completion, and “types” features with an extensible hierarchy, allowing users to differentiate between nuanced features (such as emails or phone numbers) and primitive ones (such as plain text or numbers).
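The idea of an extensible feature-type hierarchy can be sketched with ordinary classes: specific types inherit from primitive ones, and processing falls back to the nearest ancestor that has a handler. The class names, handlers, and dispatch rule below are hypothetical and written in Python for brevity; TransmogrifAI's own types are Scala and are not reproduced here.

```python
class FeatureType:
    """Base of the hierarchy; wraps one raw value."""
    def __init__(self, value):
        self.value = value

class Text(FeatureType):   # primitive type
    pass

class Email(Text):         # more nuanced type built on Text
    pass

class Real(FeatureType):   # primitive numeric type
    pass

def describe_text(feature):
    return {"length": len(feature.value)}

def describe_email(feature):
    _, _, domain = feature.value.partition("@")
    return {"domain": domain.lower()}

def describe_real(feature):
    return {"value": float(feature.value)}

HANDLERS = {Text: describe_text, Email: describe_email, Real: describe_real}

def engineer(feature):
    """Use the most specific handler available along the type's ancestry."""
    for cls in type(feature).__mro__:
        if cls in HANDLERS:
            return HANDLERS[cls](feature)
    raise TypeError(f"no handler for {type(feature).__name__}")

# Extending the hierarchy: a new subtype with no handler of its own
# falls back to the Text handler.
class PostalCode(Text):
    pass

email_features = engineer(Email("Ana@Example.com"))
postal_features = engineer(PostalCode("94105"))
```

Because an `Email` is also a `Text`, code that only understands text still works on it, while type-aware code can do something more specific, which is the practical benefit of distinguishing nuanced from primitive features.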

“[TransmogrifAI] has been transformational for us, [reducing] the average turn-around time for training a performant model to a couple of hours and enabling our data scientists to deploy thousands of models in production with minimal hand-tuning,” Bhaowal said. “The goal of democratizing machine learning can only be achieved through an open exchange of ideas and code, and diverse perspectives from the community will make the technology better for everyone.”

Coincidentally, the public launch of TransmogrifAI comes a day after the open-sourcing of Oracle’s GraphPipe, a tool that makes it easier to deploy machine learning models made by frameworks like Google’s TensorFlow, MXNet, Facebook’s Caffe2, and PyTorch in the cloud.

Source: VentureBeat

