Racial Bias Can Confuse AI for Genomic Studies

  • Beifen Dai
  • Zhihao Xu
  • Hongjue Li
  • Bo Wang
  • Jinsong Cai
  • Xiaomo Liu*
  • *Corresponding author for this work
  • Hubei University
  • Beihang University
  • Peking University

Research output: Contribution to journal › Article › peer-review

Abstract

Large-scale genomic studies are an important way to comprehensively decode the human genome, and they provide valuable insights into the causes of human disease and the development of phenotypes. Such studies need high-throughput bioinformatics analyses to harness and integrate the resulting big data. In this context, artificial intelligence (AI) offers enormous potential to advance genomic studies. However, racial bias is a persistent issue in the data. It usually arises from how datasets are accumulated, a process that inevitably involves subjects of different races. How can racial bias affect the outcomes of AI methods? In this work, we performed comprehensive analyses using The Cancer Genome Atlas (TCGA) project as a case study. We constructed a survival model as well as multiple AI prediction models to analyze the potential confusion caused by racial bias. In the genomic discovery analysis, we show that cancer-associated genes identified in the majority race hardly overlap with those identified in minority races by the same causal gene discovery model. We further show that a biased racial distribution greatly affects which cancer-associated genes are identified, even when racial identity is included as a confounding factor in the model. Prediction models are therefore potentially risky and less accurate when racial bias is present in a project. Cancer genes derived from an overall patient model with strong racial bias are less informative for minority races. Meanwhile, when the racial bias is less severe, the main conclusions from the overall analysis can be less useful even for the majority group.

Original language: English
Article number: A3
Journal: Oncologie
Volume: 23
Issue number: 4
DOIs
State: Published - 2022

UN SDGs

This output contributes to the following UN Sustainable Development Goals (SDGs)

  1. SDG 3 - Good Health and Well-being

Keywords

  • Racial bias
  • artificial intelligence
  • survival analysis
  • The Cancer Genome Atlas (TCGA)
