4 human-caused biases we need to fix for machine learning

Nitin Naresh October 27, 2018


Bias is an overloaded word. It has multiple meanings, from mathematics to sewing to machine learning, and as a result it’s easily misinterpreted.
When people say an AI model is biased, they usually mean that the model is performing badly. But ironically, poor model performance is often caused by various kinds of actual bias in the data or algorithm.
Machine learning algorithms do precisely what they are taught to do and are only as good as their mathematical construction and the data they are trained on. Algorithms that are biased will end up doing things that reflect that bias.
To the extent that we humans build algorithms and train them, human-sourced bias will inevitably creep into AI models. Fortunately, bias, in every sense of the word as it relates to machine learning, is well understood. It can be detected and it can be mitigated — but we need to be on our toes.
There are four distinct types of machine learning bias that we need to be aware of and guard against.

1. Sample bias

Sample bias is a problem with training data. It occurs when the data used to train your model does not accurately represent the environment that the model will operate in. There is virtually no situation where an algorithm can be trained on the entire universe of data it could interact with.
But there’s a science to choosing a subset of that universe that is both large enough and representative enough to mitigate sample bias. This science is well understood by social scientists, but not all data scientists are trained in sampling techniques.
We can use an obvious but illustrative example involving autonomous vehicles. If your goal is to train an algorithm to autonomously operate cars during the day and night, but train it only on daytime data, you’ve introduced sample bias into your model. Training the algorithm on both daytime and nighttime data would eliminate this source of sample bias.
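As a rough illustration of the daytime and nighttime example, the sketch below compares the mix of conditions in a training set with the mix expected in deployment. The function name `coverage_gaps`, the 10 percent tolerance, and the 50/50 deployment split are all assumptions made for this example, not figures from the article.

```python
from collections import Counter


def coverage_gaps(train_labels, expected_share, tolerance=0.10):
    """Return conditions whose share of the training data differs from the
    expected deployment share by more than `tolerance`."""
    counts = Counter(train_labels)
    total = len(train_labels)
    gaps = {}
    for condition, expected in expected_share.items():
        actual = counts.get(condition, 0) / total if total else 0.0
        if abs(actual - expected) > tolerance:
            gaps[condition] = (actual, expected)
    return gaps


# Hypothetical training set: every frame was recorded during the day.
train_conditions = ["day"] * 1000

# Assumed deployment mix: half day, half night.
expected = {"day": 0.5, "night": 0.5}

gaps = coverage_gaps(train_conditions, expected)
print(gaps)  # {'day': (1.0, 0.5), 'night': (0.0, 0.5)}

# A balanced set passes the same check.
balanced = ["day"] * 500 + ["night"] * 500
print(coverage_gaps(balanced, expected))  # {}
```

A check like this only catches gaps in conditions someone thought to label; it does not replace a proper sampling design.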

2. Prejudice bias

Prejudice bias is a result of training data that is influenced by cultural or other stereotypes. For instance, imagine a computer vision algorithm that is being trained to understand people at work. The algorithm is exposed to thousands of training data images, many of which show men writing code and women in the kitchen.
The algorithm is likely to learn that coders are men and homemakers are women. This is prejudice bias, because women obviously can code and men can cook. The issue here is that training data decisions consciously or unconsciously reflected social stereotypes. This could have been avoided by deliberately not reproducing the statistical relationship between gender and occupation found in the collected images, and instead exposing the algorithm to a more even-handed distribution of examples.
Decisions like these obviously require a sensitivity to stereotypes and prejudice. It’s up to humans to anticipate the behavior the model is supposed to express. Mathematics can’t overcome prejudice.
And the humans who label and annotate training data may have to be trained to avoid introducing their own societal prejudices or stereotypes into the training data.
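One way to approximate a more even-handed distribution without discarding images is to weight each example so that gender and occupation become statistically independent in the weighted data. The sketch below uses a simple reweighting scheme; it is a technique swapped in for illustration, not one the article prescribes, and the counts are invented.

```python
from collections import Counter


def reweigh(examples):
    """Return one weight per (gender, occupation) pair such that, after
    weighting, gender and occupation are statistically independent.

    weight = (count(gender) * count(occupation)) / (n * count(pair))
    """
    n = len(examples)
    gender_counts = Counter(g for g, _ in examples)
    occupation_counts = Counter(o for _, o in examples)
    pair_counts = Counter(examples)
    return {
        (g, o): (gender_counts[g] * occupation_counts[o]) / (n * count)
        for (g, o), count in pair_counts.items()
    }


# Hypothetical, deliberately skewed image labels.
examples = (
    [("man", "coder")] * 80
    + [("woman", "coder")] * 20
    + [("man", "homemaker")] * 20
    + [("woman", "homemaker")] * 80
)

weights = reweigh(examples)
print(weights[("man", "coder")])    # 0.625
print(weights[("woman", "coder")])  # 2.5

# Weighted counts are now equal across all four combinations (50 each).
weighted_counts = {pair: weights[pair] * examples.count(pair) for pair in weights}
```

Weights only rebalance what was collected. Deciding which attributes need balancing in the first place remains the human judgment call the text describes.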

3. Measurement bias

Measurement bias, or systematic value distortion, happens when there’s an issue with the device used to observe or measure. This kind of bias tends to skew the data in a particular direction. As an example, shooting training data images with a camera fitted with a chromatic filter would distort the color in every image in the same way. The algorithm would be trained on image data that systematically failed to represent the environment it will operate in.
This kind of bias can’t be avoided simply by collecting more data. It’s best avoided by having multiple measuring devices, and humans who are trained to compare the output of these devices.
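The small simulation below illustrates both points under assumed numbers: a hypothetical sensor with a constant offset of 2.0 keeps that offset no matter how many readings are averaged, while a comparison against a second, unbiased device exposes it. The sensor model, offset, and noise level are invented for the example.

```python
import random
import statistics


def read_sensor(true_value, offset, noise_sd, n, rng):
    """Simulate n readings: true value + constant offset + random noise."""
    return [true_value + offset + rng.gauss(0.0, noise_sd) for _ in range(n)]


rng = random.Random(0)  # fixed seed so the run is repeatable
TRUE_VALUE = 20.0

# Biased device, sampled a little and then a lot.
small = read_sensor(TRUE_VALUE, offset=2.0, noise_sd=0.5, n=100, rng=rng)
large = read_sensor(TRUE_VALUE, offset=2.0, noise_sd=0.5, n=10_000, rng=rng)

# Second, unbiased device measuring the same quantity.
reference = read_sensor(TRUE_VALUE, offset=0.0, noise_sd=0.5, n=10_000, rng=rng)

# More data shrinks the random noise but leaves the offset in place.
error_small = statistics.mean(small) - TRUE_VALUE
error_large = statistics.mean(large) - TRUE_VALUE

# Comparing the two devices recovers an estimate of the offset.
estimated_offset = statistics.mean(large) - statistics.mean(reference)

print(f"error with 100 readings:    {error_small:.2f}")
print(f"error with 10,000 readings: {error_large:.2f}")
print(f"offset found by comparison: {estimated_offset:.2f}")
```

Both error figures stay close to 2.0, which is why the second device, rather than a larger sample, is what makes the distortion visible.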

4. Algorithm bias

This final type of bias has nothing to do with data. In fact, this type of bias is a reminder that “bias” is overloaded. In machine learning, bias is a mathematical property of an algorithm. The counterpart to bias in this context is variance.
Models with high variance fit the training data closely and can capture complex patterns, but they are sensitive to noise. Models with high bias, on the other hand, are more rigid: they are less sensitive to variations and noise in the data, but prone to missing real complexity. Importantly, data scientists are trained to strike an appropriate balance between these two properties.
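The trade-off can be made concrete with a small experiment. The sketch below, which assumes a noisy sine curve as the underlying truth, repeatedly retrains two deliberately extreme models: a constant predictor (rigid, high bias) and a one-nearest-neighbor predictor (flexible, high variance). It then estimates the squared bias and the variance of each. All function names and settings are illustrative choices.

```python
import math
import random
import statistics


def true_fn(x):
    return math.sin(2 * math.pi * x)


def make_training_set(rng, n=20, noise_sd=0.3):
    xs = [rng.random() for _ in range(n)]
    ys = [true_fn(x) + rng.gauss(0.0, noise_sd) for x in xs]
    return xs, ys


def fit_constant(xs, ys):
    """Rigid model: always predicts the mean training target."""
    mean_y = statistics.mean(ys)
    return lambda x: mean_y


def fit_nearest_neighbor(xs, ys):
    """Flexible model: predicts the target of the closest training point."""
    def predict(x):
        i = min(range(len(xs)), key=lambda j: abs(xs[j] - x))
        return ys[i]
    return predict


def bias_variance(fit, rng, test_xs, trials=200):
    """Estimate average squared bias and variance over resampled training sets."""
    preds = [[] for _ in test_xs]
    for _ in range(trials):
        model = fit(*make_training_set(rng))
        for k, x in enumerate(test_xs):
            preds[k].append(model(x))
    bias_sq = statistics.mean(
        [(statistics.mean(p) - true_fn(x)) ** 2 for p, x in zip(preds, test_xs)]
    )
    variance = statistics.mean([statistics.pvariance(p) for p in preds])
    return bias_sq, variance


rng = random.Random(0)
test_xs = [0.05 + 0.1 * k for k in range(10)]

const_bias, const_var = bias_variance(fit_constant, rng, test_xs)
nn_bias, nn_var = bias_variance(fit_nearest_neighbor, rng, test_xs)

print(f"constant model:   bias^2={const_bias:.3f}  variance={const_var:.3f}")
print(f"nearest neighbor: bias^2={nn_bias:.3f}  variance={nn_var:.3f}")
```

In this setup the constant model shows the larger squared bias and the nearest-neighbor model the larger variance. Practical models sit between the two extremes.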
Data scientists who understand all four types of AI bias will produce better models and better training data. AI algorithms are built by humans; training data is assembled, cleaned, labeled and annotated by humans. Data scientists need to be acutely aware of these biases and of how to avoid them: by taking a consistent, iterative approach, by continuously testing the model, and by bringing in well-trained humans to assist.
Source: The Next Web