
SERIES: AI myths – Gender and racial bias in ML algorithms?

Estimated read time: 7 minutes

by Tanuj Gupta, MD | Rebecca Winokur, MD, MPH

Published on 8/24/2020

Artificial intelligence (AI) and machine learning (ML) are hot topics in most industries, but especially in health care, where there's great hope that they'll usher in major advances. While AI and ML have the potential to transform patient care, quality and outcomes, there are also concerns about the negative impact these technologies could have on human interaction and patient safety.

In the third post of this series, Tanuj Gupta, MD, vice president, Cerner Intelligence, is joined by Rebecca Winokur, MD, Cerner physician executive. They jointly tackle gender and racial bias in ML algorithms and how the industry can overcome the issue.

How can an algorithm be biased?

Broadly speaking, AI is a set of tools and technologies that are put together to mimic human behavior and boost the capacity and efficiency of performing human tasks. ML is a subset of AI that automatically adapts over time based on data and end-user input. Bias can be introduced into AI and ML through things like human behavior and the data we generate.

Traditionally, health care providers consider a patient's unique situation when making decisions, drawing on information sources such as their clinical training and experience as well as published medical research. Now, with ML, we can be more efficient and improve our ability to examine large amounts of data, flag potential problems and suggest next steps for treatment.

While this technology is promising, there are some risks. There's an assumption that ML and rules-based clinical decision support apply objective data to reach objective conclusions. But we're learning this is a myth: while AI and ML are just tools, they have many points of entry that are vulnerable to bias, from inception to end use.

For instance, the ML model may be biased from the start if its assumptions are skewed. Once built, the model is tested against a large data set; if that data set is not appropriate for the model's intended use, the model can become biased. Bias can show up anywhere in the design of the algorithm: the types of data, how the data are collected and used, how the model is tested, who it's intended for or even the question it's asking.

As ML learns and adapts, it’s vulnerable to potentially biased input and patterns. Existing prejudices – especially if they’re unknown – and data that reflects societal or historical inequities can result in bias being baked into the data that’s used to train an algorithm or ML model to predict outcomes.
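One simple check that follows from this (a minimal sketch with hypothetical group labels and target shares, not a complete bias audit) is to compare the demographic mix of a training cohort against the population the model is meant to serve before any training happens, so under-represented groups are visible up front.

```python
# Minimal sketch: compare the demographic mix of a training cohort against
# the population the model is intended to serve, flagging under-represented
# groups before any training happens. Group labels and target shares are
# hypothetical, for illustration only.
import pandas as pd

# Hypothetical target-population proportions (assumption, not real data).
target_population = {"Black": 0.24, "White": 0.55, "Asian": 0.12, "Other": 0.09}

def representation_report(cohort: pd.DataFrame, group_col: str = "race") -> pd.DataFrame:
    """Compare cohort group shares with the intended target population."""
    observed = cohort[group_col].value_counts(normalize=True)
    report = pd.DataFrame({
        "cohort_share": observed,
        "target_share": pd.Series(target_population),
    }).fillna(0.0)
    # A ratio below 1 means the group is under-represented in the training data.
    report["representation_ratio"] = report["cohort_share"] / report["target_share"]
    return report.sort_values("representation_ratio")

# Toy cohort for illustration: Black patients make up 10% of the cohort but
# 24% of the target population, so they would be flagged as under-represented.
cohort = pd.DataFrame({"race": ["White"] * 70 + ["Black"] * 10 + ["Asian"] * 15 + ["Other"] * 5})
print(representation_report(cohort))
```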

If not identified and mitigated, clinical decision-making based on bias could negatively impact patient care and outcomes. Bias can happen any time your view of something is no longer objective. If you don't know that bias is there, and you don't know to look for it, you could cause harm. There are multiple biases you might encounter when working with AI, such as:

  • Racial
  • Gender
  • Religious
  • Sexual identity
  • Age
  • Socioeconomic

What are the consequences of a biased algorithm?

When bias is introduced into an algorithm, certain groups can be targeted unintentionally. Gender and racial biases have been identified in commercial facial recognition systems, which are known to falsely identify Black and Asian faces 10 to 100 times more often than Caucasian faces and have more difficulty identifying women than men. Bias is also seen in natural language processing that identifies topic, opinion and emotion.

Biases in health care AI can perpetuate and worsen health disparities. For example,

  • An AI model intended to improve clinic performance instead disproportionately decreased access to care for Black patients. Patients were scheduled based on their no-show history, which disproportionately identified Black patients as high risk for missing subsequent appointments. Because of factors like poor access to care and persistently lower quality of care, Black patients often mistrust the health care system, which can translate into higher risk for missing appointments. The patients within the higher-risk group who did make their appointments therefore had longer wait times and a more unfavorable experience, increasing their no-show risk at subsequent visits. This negative cycle can perpetuate or further worsen an existing bias in access to care.
  • An algorithm intended to offer additional services to patients at increased risk of disease complications used health care spending as a proxy for health status. As a result, it incorrectly concluded that Black patients were healthier than equally sick Caucasian patients. Unintentionally flagging more White patients than Black patients could influence decisions about who receives additional services and has the potential to negatively affect patient outcomes.

How do we reduce and limit bias in ML?

If we can agree that bias in AI is a problem, then we can act with intention to reduce it. Though this is an emerging field, several approaches from clinical research design can inform the techniques we apply to gender and racial bias in algorithms.

Algorithm design: In clinical research design, we've learned to mitigate bias by diversifying the groups of patients who participate in drug trials and by publishing patient demographics and study methods for transparency. Cross-training models on different data sets from contrasting sites might help detect how diverse populations could affect model performance. The authors of a recent AMIA publication on developing reporting standards for AI in health care suggest four components of AI solutions that should be made transparent (a small illustrative sketch follows the list):

1. Study population and setting – e.g., data sources, type of health care setting used, inclusion/exclusion criteria

2. Patient demographic characteristics – e.g., age, sex, race, socioeconomic status

3. Model architecture – e.g., model features, model output, target users

4. Model evaluation – e.g., internal and external validation methods
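The AMIA authors describe what should be reported rather than how to record it. As one illustrative way to operationalize the list (the field names and example values below are assumptions, not the published standard), the four components could be captured in a structured record that travels with the model artifact.

```python
# Minimal sketch, not the AMIA standard itself: a structured reporting record
# that forces the four transparency components to be written down alongside
# the model. Field names and example values are illustrative assumptions.
from dataclasses import dataclass, asdict
import json

@dataclass
class AIReportingRecord:
    # 1. Study population and setting
    data_sources: list[str]
    care_setting: str
    inclusion_criteria: list[str]
    exclusion_criteria: list[str]
    # 2. Patient demographic characteristics
    demographics_summary: dict[str, str]
    # 3. Model architecture
    model_features: list[str]
    model_output: str
    target_users: str
    # 4. Model evaluation
    internal_validation: str
    external_validation: str

record = AIReportingRecord(
    data_sources=["EHR encounters, 2015-2019"],
    care_setting="ambulatory primary care",
    inclusion_criteria=["adults with at least one completed visit"],
    exclusion_criteria=["patients who opted out of research use"],
    demographics_summary={"age": "18-90, median 54", "sex": "52% female", "race": "reported per site"},
    model_features=["prior no-shows", "comorbidity count", "distance to clinic"],
    model_output="probability of missed appointment",
    target_users="clinic schedulers and care managers",
    internal_validation="5-fold cross-validation on development sites",
    external_validation="held-out site with a different payer mix",
)
print(json.dumps(asdict(record), indent=2))
```

Publishing a record like this alongside the model gives reviewers and end users a consistent place to look for the population, demographics, architecture and evaluation details that determine where bias could enter.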

Algorithm use and interpretation: ML diagnostics are just another form of lab test. For blood-based labs, however, doctors are trained to detect biased results: if a serum potassium level is taken from a hemolyzed blood sample that has been left sitting too long, we know the reading may be falsely elevated. Similarly, if ML algorithms provide transparent reasons for making a recommendation, clinicians have the information to validate the result or to consider how it may be biased given a holistic view of the patient.
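As one hedged illustration of what "transparent reasons" could look like (the features, toy data and model choice below are assumptions for the example, not a specific product's explanation method), a linear risk model can report each feature's contribution to an individual patient's score, giving the clinician something concrete to validate against their holistic view of the patient.

```python
# Minimal sketch, illustrative only: for a linear model, surface per-patient
# feature contributions so a clinician can see why a risk score was raised
# and judge whether the reasons make clinical sense or reflect a biased proxy.
import numpy as np
from sklearn.linear_model import LogisticRegression

feature_names = ["prior_no_shows", "distance_to_clinic_km", "comorbidity_count"]

# Toy training data, for illustration only.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = (X[:, 0] + 0.5 * X[:, 2] + rng.normal(scale=0.5, size=200) > 0).astype(int)

model = LogisticRegression().fit(X, y)

def explain(patient_row: np.ndarray) -> list:
    """Per-feature contribution to the log-odds for one patient, largest first."""
    contributions = model.coef_[0] * patient_row
    return sorted(zip(feature_names, contributions), key=lambda kv: -abs(kv[1]))

for name, value in explain(X[0]):
    print(f"{name:>24s}: {value:+.3f}")
```

If a score turns out to be driven mostly by a proxy feature, the clinician has a visible cue to question whether the recommendation reflects the patient or the data.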

Adverse event detection: Consider these four proactive and reactive ways to detect unintentional effects of AI.

1. Build two different versions of a model – one with demographics included and one without. Comparing the outcomes of both models can proactively assess how much risk is due to demographics in general (a brief sketch of this comparison, combined with the disparity measure in item 3, follows the list).

2. Examine algorithm results for an unexplained minority group effect that is statistically significant when compared to the median.

3. Explicitly create a quality measure of disparity that can be monitored over time. Changes in this quality measure can be investigated after a model is introduced, especially if the model continuously learns from new data.

4. Follow established processes for adverse drug event (ADE) reporting, which define time frames for responding to a patient safety risk, standards for communicating the issue and its resolution to affected parties, and required response times for mitigation. This system might one day be adapted to include responses to unintentional bias risk.
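To make the first and third of these ideas concrete, here is a minimal sketch (the column names, thresholds and toy data are assumptions made for illustration, not a reference implementation): train one version of a no-show model with a demographic feature and one without, and track a simple disparity quality measure such as the gap in positive-flag rates between groups.

```python
# Minimal sketch of detection ideas 1 and 3 above; all names, thresholds and
# data are illustrative assumptions, not an established standard.
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

def flag_rate_gap(df: pd.DataFrame, flag_col: str, group_col: str) -> float:
    """Disparity measure: largest difference in positive-flag rates across groups."""
    rates = df.groupby(group_col)[flag_col].mean()
    return float(rates.max() - rates.min())

# Toy data for illustration only.
rng = np.random.default_rng(1)
n = 1000
df = pd.DataFrame({
    "prior_no_shows": rng.poisson(1.0, n),
    "comorbidity_count": rng.poisson(2.0, n),
    "group": rng.choice(["A", "B"], size=n, p=[0.7, 0.3]),
})
df["missed_appt"] = (rng.random(n) < 0.2 + 0.05 * df["prior_no_shows"]).astype(int)
df["group_B"] = (df["group"] == "B").astype(int)

clinical_features = ["prior_no_shows", "comorbidity_count"]
with_demographics = clinical_features + ["group_B"]

y = df["missed_appt"]
for label, features in [("without demographics", clinical_features),
                        ("with demographics", with_demographics)]:
    model = LogisticRegression().fit(df[features], y)
    df["flag"] = (model.predict_proba(df[features])[:, 1] > 0.3).astype(int)
    gap = flag_rate_gap(df, "flag", "group")
    print(f"{label:>22s}: flag-rate gap between groups = {gap:.3f}")
```

In practice, a disparity measure like this would be computed on real scoring output at regular intervals, so that changes after a continuously learning model is introduced can be investigated, as suggested in item 3.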

Systemic bias: It's worth repeating that bias introduced through things like human behavior and the data we generate will lead to bias in AI and ML. Addressing this problem goes well beyond algorithms. Training clinicians to be more aware of disparities is important. We should also ask what might occur when clinicians encounter patients with mental illness, a history of incarceration or drug abuse, dementia, morbid obesity, economic poverty or other factors that could lead to unconscious bias. Care is time intensive and expensive. Take time away from clinicians by forcing them to see more patients or by overloading them with administrative tasks, and you reduce the time they have to care for the whole person. Our health care system, technologies and tools should be mobilized to give clinicians more awareness of systemic bias and more time to spend with patients.

If the systems in which our AI and ML tools are developed or implemented are biased, then their resulting health outcomes can be biased, which can perpetuate health disparities. While breaking down systemic bias can be challenging, it’s important that we do all we can to identify and correct it in all its manifestations. This is the only way we can optimize AI and ML in health care and ensure the highest quality of patient experience.

Stay tuned for the next article in our series, where we’ll explore what the future may look like if we unlocked the full potential of ML diagnostic algorithms and ML therapeutic algorithms for the health care industry.
