How Big Data Can Eliminate Racial Bias and Structural Discrimination

Healthcare is utilizing Big Data to to assist in creating systems that can be used to detect health risks, implement preventative care, and provide an overall better experience for patients. However, there are fundmental issues that exist in the creation and implementation of these systems. Medical algorithms and efforts in precision medicine often neglect the structural inequalities that already exist for minorities accessing healthcare and therefore perpetuate bias in the healthcare industry. The author examines current applications of these concepts, how they are affecting minority communities in the United States, and discusses improvements in order to achieve more equitable care in the industry.

Check Report Status Status: final, Type: Report

  • Robert Neubauer, fa20-523-304
  • Edit


Healthcare is utilizing Big Data to to assist in creating systems that can be used to detect health risks, implement preventative care, and provide an overall better experience for patients. However, there are fundmental issues that exist in the creation and implementation of these systems. Medical algorithms and efforts in precision medicine often neglect the structural inequalities that already exist for minorities accessing healthcare and therefore perpetuate bias in the healthcare industry. The author examines current applications of these concepts, how they are affecting minority communities in the United States, and discusses improvements in order to achieve more equitable care in the industry.


Keywords: healthcare, machine learning, data science, racial bias, precision medicine, coronavirus, big data, telehealth, telemedicine, public health.

1. Introduction

Big Data is helping to reshape healthcare through major advancements in telehealth and precision medicine. Due to the swift increase in telehealth services due to the COVID-19 pandemic, researchers at the University of California San Francisco have found that black and hispanic patients use these services less frequently than white patients. Prior to the pandemic, research showed that racial and ethnic minorities were disadvantaged by the digital divide 1. These differences were attributed to disparities in access to technology and digital literacy 2. Studies like these highlight how racial bias in healthcare is getting detected more frequently; However, there are few attempts to eradicate it through the use of similar technology. This has implications in various areas of healthcare including major healthcare algorithms, telehealth, precision medicine, and overall care provision.

From the 1985 Report of the Secretary’s Task Force on Black and Minority Health, ‘Blacks, Hispanics, Native Americans and those of Asian/Pacific Islander heritage have not benefited fully or equitably from the fruits of science or from those systems responsible for translating and using health sciences technology’ 3. The utilization of big data in industries largely acts to automate a process that was carried out by a human. This makes the process quicker to accomplish and the outcomes more precise since human error can now be eliminated. However, whenever people create the algorithms that are implemented, it is common that these algorithms will align with the biases of the human, or system, that created it. An area where this is happening that is especially alarming is the healthcare industry. Structural discrimination has long caused discrepencies in healthcare between white patients and minority patients and, with the introduction of big data to determine who should receive certain kinds of care, the issue has not been resolved but automated. Studies have shown that minority groups that are often at higher risk than white patients receive less preventative care while spending almost equal amounts on healthcare 4. National data also indicates that racial and ethnic minorities also have poorer health outcomes from preventable and treatable diseases such as cardiovascular disease, cancer, asthma, and HIV/AIDS than those in the majority 5.

2. Bias in Medical Algorithms

In a research article published to Science in October of 2019, the researchers uncovered that one of the most used algorithms in healthcare, widely adopted by non- and for-profit medical centers and government agencies, less frequently identified black patients for preventative care than white patients. This algorithm is estimated to be applied to around 200 million people in the United States every year in order to target patients for high-risk care management. These programs seek to improve the care of patients with complex health needs by providing additional resources. The dataset used in the study contained the algorithms predictions, the underlying ingredients that formed the algorithm, and rich data outcomes which allowed for the ability to quantify racial disparities and isolate the mechanisms by which they arise. The sample consisted of 6,079 self-identified black patients and 43,539 self-identified white patients where 71.2% of all patients were enrolled in commercial insurance and 28.8% were on Medicare. On average, the patient age was 50.9 years old and 63% of patients were female. The patients enrolled in the study were classified among risk percentiles, where patients with scores at or above the 97th percentile were auto-enrolled and patients with scores over the 55th percentile were encouraged to enroll 4.

In order to measure health outcomes, they linked predictions to a wide range of outcomes in electronic health records, which included all diagnoses, and key quantitative laboratory studies and vital signs that captured the severity of chronic illnesses. When focusing on a point in the very-high-risk group, which would be patients in the 97th percentile, they were able to quantify the differences between white and black patients, where black patients had 26.3% more chronic illnesses than white patients4. To get a corrected health outcome measurement among white and black patients, the researchers set a specific risk threshold for health outcomes among all patients, and repeated the procedure to replace healthier white patients with sicker black patients. So, for a white patient with a health risk score above the threshold, their data was replaced with a black patient whose score fell below the threshold and this continued until the health risk scores for black and white patients were equal and the predictive gap between patients would be eliminated. The health scores were based on the number of chronic medical conditions. The researchers then compared the data from their corrected algorithm and the original and found that the fraction of black patients at all risk thresholds above the 50th percentile increased when using the corrected algorithm. At the 97th percentile, the fraction of black patients increased to 46.5% from the original 17.7% 4. Black patients are likely to have more severe hypertension, diabetes, renal failure, and anemia, and higher cholesterol. Using data from clinical trials and longitudinal studies, the researchers found that for mortality rates with hypertension and diabetes black patients had a 7.6% and 30% increase, respectively4.

In the original and corrected algorithms, black and white patients spent roughly the same amount on healthcare. However, black patients spent more on emergency care and dialysis while white patients spent more on inpatient surgery and outpatient specialist care4. In a study that tracked black patients with a black versus a white primary care provider, it found the occurrence of a black primary care provider recommending preventative care was significantly higher than recommendations from a white primary care provider. This conclusion sheds additional light on the disparities black patients face in the healthcare system and further adds to the lack of trust black people have in the healthcare system that has been heavily documented since the Tuskegee study 6. The change recommended by the researchers that would correct the gap in the predictive care model was rather simple, shifting from predictions from purely future cost to an index that combined future cost prediction with health prediction. The researchers were able to work with the distributor of the original algorithm in order to make a more equitable algorithm. Since the original and corrected models from the study were both equal in cost but varied significantly in health predictions, they reworked the cost prediction based on health predictions, conditional on the risk factor percentiles. Both of the models excluded race from the predictions, but the algorithm created with the researchers saw an 84% reduction in bias among black patients, reducing the number of excess active chronic conditions in black patients to 7,758.

3. Disparities Found with Data Dashboards

To relate this to a present health issue that is affecting everyone, more black patients are dying from the novel coronavirus than white patients. In the United States, in counties where more than 86% of residents are black, the COVID-19 death rates were 10 times higher than the national average 7. Considering how medical algorithms allocate resources to black patients, similar trends are expected for minorities, people who speak languages other than english, low-income residents, and people without insurance. At Brigham Health, a member of the not-for-profit Mass General Brigham health system, Karthik Sivashanker, Tam Duong, Shauna Ford, Cheryl Clark, and Sunil Eappen created data dashboards in order to assist staff and those in positions of leadership. The dashboards included rates of those who tested positive for COVID-19 sorted into different subgroups based on race, ethnicity, language, sex, insurance status, geographic location, health-care worker status, inpatient and ICU census, deaths, and discharges 7.

Through the use of these dashboards, the COVID-19 equity committee were able to identify emerging risks to incident command leaders, including the discovery that non-English speaking Hispanic patients had higher mortality rates when compared to English speaking Hispanic patients. This led to quality-improvement efforts to increase patient access to language interpreters. While attempting to implement these changes, it was discovered that efforts to reduce clinicians entering patient rooms to maintain social distancing guidelines was impacting the ability for interpreters to join at a patient’s bedside during clinician rounding. The incident command leadership expanded their virtual translation services by purchasing additional iPads to allow interpreters and patients to communicate through online software. The use of the geographic filter, when combined with a visual map of infection-rates by neighborhood, showed that people who lived in historically segregated and red-lined neighborhoods were tested less frequently but tested positive more frequently than those from affluent white neighborhoods 7. In a study conducted with survey data from the Pew Research Center on U.S. adults with internet access, black people were significantly more likely to report using telehealth services. In the same study, black and latino respondents had higher odds of using telehealth to report symptoms 8.

However, COVID-19 is not the only disease that researchers have found to be higher in historically segregated communities. In 1999, Laumann and Youm found that disparities segregation in social and sexual networks explained racial disparities in STDs which, they suggested, could also explain the disparities black people face in the spread of other diseases 3. Prior to 1999 researchers believed that some unexplained characteristic of black people described the spread of such diseases, which shows the pervasiveness of racism in healthcare and academia. Residential segregation may influence health by concentrating poverty, environmental pollutants, infectious agents, and other adverse conditions. In 2006, Morello-Frosch and Jesdale found that segregation increased the risk of cancer related to air pollution 3. Big Data can assess national and local public health for disease prevention. An example is how the National Health Interview Survey is being used to estimate insurance coverage in different areas of the U.S. population and clinical data is being used to measure access and quality-related outcomes. Community-level data can be linked with health care system data using visualization and network analysis techniques which would enable public health officials and clinicians to effectively allocate resources and assess whether all patients are getting the medical services they need 9. This would drastically improve the health of historically segregated and red-lined communities who are already seeing disparities during the COVID-19 pandemic.

4. Effect of Precision Medicine and Predictive Care

Public health experts established that the most important determinant of health throughout a person’s course of life is the environment where they live, learn, work, and play. There exists a discrepancy between electronic health record systems in well-resourced clinical practices and smaller clinical sites, leading to disparities in how they are able to support population health management. For Big Data technology, if patient, family, and community focus were implemented equally in both settings, it has shown that the social determinants of health information would both improve public health among minority communities and minimize the disparities that would arise. Geographic information systems are one way to locate social determinants of health. These help focus public health interventions on populations at greater risk of health disparities. Duke University used this type of system to visualize the distribution of individuals with diabetes across Durham County, NC in order to explore the gaps in access to care and self-management resources. This allowed them to identify areas of need and understand where to direct resources. A novel approach to identify place-based disparities in chronic diseases was used by Young, Rivers, and Lewis where they analyzed over 500 million tweets and found a significant association between the geographic location of HIV-related tweets and HIV prevalence, a disease which is known to predominantly affect the black community 9.

One of the ways researchers call for strengthening the health of the nation is through community-level engagement. This is often ignored when it comes to precision medicine, which is one of the latest ways that big data is influencing healthcare. It has the potential to benefit racial and ethnic minority populations since there is a lack of clinical trial data with adequate numbers of minority populations. It is because of this lack of clinical data that predictions in precision medicine are often made off risks associated with the majority which give preferential treatment to those in the majority while ignoring the risks of minority groups, further widening the gap in the allocation of preventative health resources. These predictive algorithms are rooted in cost/benefit tradeoffs, which were proven to limit resources to black patients from the science magazine article on medical algorithms 10. For the 13th Annual Texas Conference on Health Disparities, the overall theme was “Diversity in the Era of Precision Medicine.” Researchers at the event said diversity should be kept at the forefront when designing and implementing the study in order to increase participation by minority groups 6. Building a trusting relationship with the community is also necessary for increased participation, therefore the institution responsible for recruitment needs to be perceived as trustworthy by the community. Some barriers for participation shared among minority groups are hidden cost of participation, concern about misuse of research data, lack of understanding the consent form and research materials, language barrier, low perceived risk of disease, and fear of discrimination 6. As discussed previously, overall lack of distrust in the research process is rooted in the fact that research involving minority groups often overwhelmingly benefits the majority by comparison. Due to the lack of representation of minority communities, big clinical data can be generated for the means of conducting pragmatic trials with underserved populations and distribute the lack of benefits 9.

4.1 Precision Public Health

The benefit of the majority highlights the issue that one prevention strategy does not account for everyone. This is the motivation behind combining precision medicine and public health to create precision public health. The goal of this is to target populations that would benefit most from an intervention as well as identify which populations the intervention would not be suitable for. Machine learning applied to clinical data has been used to predict acute care use and cost of treatment for asthmatic patients and diagnose diabetes, both of which are known to affect black people at greater rates than white patients 9. This takes into account the aforementioned factors that contribute to a person’s health and combines it with genomic data. Useful information about diseases at the population level are attributed to advancements in genetic epidemiology, through increased genetic and genomic testing. Integration of genomic technologies with public health initiatives have already shown success in preventing diabetes and cancers for certain groups, both of which affect black patients at greater rates than white patients. Specifically, black men have the highest incidence and mortality rates of prostate cancer. The presence of Kaiso, a transcriptional repressors present in human genes, is abundant in those with prostate cancer and, in black populations, it has been shown to increase cancer aggressive and reduce survival rates 10. The greatest challenge affecting advancements made to precision public health is the involvement of all subpopulations required to get effective results. This demonstrates another area where there’s a need for the healthcare industry to prioritize building a stronger relationship with minority communities in order to assist in advancing healthcare.

Building a stronger relationship with patients begins with having an understanding of the patient’s needs and their backgrounds, requiring multicultural understanding on the physicians side. This can be facilitated by the technological advances in healthcare. Researchers from Johns Hopkins University lay out three strategic approaches to improve multicultural communications. The first is providing direct services to minimize the gap in language barriers through the use of interpreters and increased linguistic competency in health education materials. The second is the incorporation of cultural homophily in care through staff who share a cultural background, inclusion of holistic medical suggestions, and the use of community health workers. Lastly, they highlight the need for more institutional accommodation such as increasing the ability of professionals to interact effectively within the culture of the patient population, more flexible hours of operation, and clinic locations 11. These strategic approaches are much easier to incorporate into practice when used in telehealth monitoring, providing more equitable care to minority patients who are able to use these services. There are three main sections of telehealth monitoring which include synchronous, asynchronous, and remote monitoring. Synchronous would be any real-time interaction, whether it be over the telephone or through audio/visual communication via a tablet or smartphone. This could occur when the patient is at their home or they are present with a healthcare professional while consulting with a medical provider virtually. Asynchronous communication occurs when patients communicate with their provider through a secure messaging platform in their patient portal. Remote patient monitoring is the direct transmission of a patient’s clinical measurements to their healthcare provider. Remote access to healthcare would be the most beneficial to those who are medically and socially vulnerable or those without ready access to providers and could also help preserve the patient-provider relationship 12. Connecting a patient to a provider that is from a similar cultural or ethnic background becomes easier through a virtual consultation, a form of synchronous telehealth monitoring. A virtual consultation would also help eliminate the need for transportation and open up the flexibility of meeting times for both the patient and the provider. From this, a way to increase minority patient satisfaction in regards to healthcare during the shift to telehealth services due to COVID-19 restrictions would be a push to increase technology access to these groups by providing them with low-cost technology with remote-monitoring capabilities.

5. Telehealth and Telemedicine Applications

Telehealth monitoring is evolving the patient-provider relationship by extending care beyond the in-person clinical visit. This provides an excellent opportunity to build a more trusting and personal relationship with the patient, which would be critical for minority patients as it would likely increase their trust in the healthcare system. Also, with an increase in transparency and involvement with their healthcare, the patient will be more engaged in the management of their healthcare which will likely have more satisfactory outcomes. Implementing these types of services will create large amounts of new data for patients, requiring big data applications in order to manage it. Similar to the issue of inequality in the common medical algorithm for determination of preventative care, if the data collected from minority groups using this method is not accounted for properly, then the issue of structural discrimination will continue. The data used in healthcare decision-making often comes from a patient’s electronic health record. An issue that presents itself when considering the use of a patient’s electronic health record in the process of using big data to assist with the patient’s healthcare is missing data. In the scope of telehealth monitoring, since the visit and most of the patient monitoring would be done virtually, the electronic health record would need to be updated virtually as well 13.

For telehealth to be viable, the tools that accommodate it need to work seamlessly and be supported by the data streams that are integrated into the electronic health record. Most electronic health record systems are unable to be populated with remote self-monitoring patient-generated data 13. However, the American Telemedicine Association is advocating for remotely-monitored patient-generated data to be incorporated into electronic health records. The SMART Health IT platform is an approach that would allow clinical apps to run across health systems and integrate with electronic health records through the use of a standards-based open-source application programming interface (API) Fast Healthcare Interoperability Resources (FHIR). There are also advancements being made in technology that is capable of integrating data from electronic health records with claims, laboratory, imaging, and pharmacy data 13. There is also a push to include social determinants of health disparities including genomics and socioeconomic status in order to further research underlying causes of health disparities 9.

5.1 Limitations of Teleheath and Telemedicine

The issue of lack of access to the internet and devices that would be necessary for virtual health visits would limit the participation of those from lower socioeconomic backgrounds. From this arises the issue of representativeness in remotely-monitored studies where the participant must have access to a smartphone or tablet. However, much like the Brigham Health group providing iPads in order to assist with language interpretation, there should be an incentive to provide access to these devices for patients in high risk groups in order to boost trust and representation in this type of care. From the article that discussed the survey results that found black and latino patients to be more responsive to using telehealth, the researchers contrasted the findings with another study where 52,000 Mount Sinai patients were monitored between March and May of 2020 that found black patients were less likely to use telehealth than white patients 1. One reason for the discrepancy the researchers introduce is that the Pew survey, while including data from across the country, only focused on adults that had internet access. This brings up the need for expanding broadband access, which is backed by many telehealth experts 8.

The process of providing internet access and devices with internet capabilities to those without them should be similar to that from the science magazine study where patients whose risk scores are above a certain threshold should automatically qualify for technological assistance. Programs such as the Telehealth Network Grant Program would be beneficial for researchers conducting studies with a similar focus, as the grant emphasizes advancements in tele-behavioral health and tele-emergency medical services and providing access to these services to those who live in rural areas. Patients from rural areas are less likely to have access to technology that would enable them to participate in a study requiring remote monitoring. The grant proposal defines tele-emergency as an electronic, two-way, audio/visual communication service between a central emergency healthcare center, the tele-emergency hub, and a remote hospital emergency department designed to provide real-time emergency care consultation 14. This is especially important when considering that major medical algorithms show that black patients often spend more on emergency medical care.

6. Conclusion

Big Data is changing many areas of healthcare and all of the areas that it’s affecting can benefit from making structural changes in order to allow minorities to get equitable healthcare. This includes how the applications are put into place, since Big Data has the ability to demonstrate bias and reinforce structural discrimination in care. It should be commonplace to consider race or ethnicity, socioeconomic status, and other relevant social determinants of health in order to account for this. Several studies have displayed the need for different allocations of resources based on race and ethnicity. From the findings that black patients were often given more equitable treatment when matched with a primary care provider that was black and that COVID-19 has limited in-person resources, such as a bedside interpreter for non-English speaking patients, there should be a development of a resource that allows people to be matched with a primary care provider that aligns with their identity and to connect with them virtually. When considering the lack of trust black people and other minority populations have in the healthcare system, there are a variety of services that would help boost trust in the process of getting proper care. Given the circumstances surrounding COVID-19 pandemic, there is already an emphasis on making improvements within telehealth monitoring as barriers to telehealth have been significantly reduced. Several machine-learning based studies have highlighted the importance of geographic location’s impact on aspects of the social determinants of health, including the effects in segregated communities. Recent work has shown that black and other ethnic minority patients report having less involvement in medical decisions and lower levels of satisfaction of care. This should motivate researchers who are focused on improving big data applications in the healthcare sector to focus on these communities in order to eliminate disparities in care and increase the amount of minority healthcare workers in order to have accurate representation. From the survey data showing that minority populations were more likely to use telehealth services, there needs to be an effort to highlight these communities in future work surrounding telehealth and telemedicine. Several studies have prepared a foundation for what needs to be improved and have already paved the way for additional research. With the progress that these studies have made and continued reports of inadequacies in care, it is only a matter of time before substantial change is implemented and equitable care is available.

7. References

  1. E. Weber, S. J. Miller, V. Astha, T. Janevic, and E. Benn, “Characteristics of telehealth users in NYC for COVID-related care during the coronavirus pandemic,” J. Am. Med. Inform. Assoc., Nov. 2020, doi: 10.1093/jamia/ocaa216. ↩︎

  2. K. Senz, “Racial disparities in telemedicine: A research roundup,” Nov. 30, 2020. (accessed Dec. 07, 2020). ↩︎

  3. C. L. F. Gilbert C. Gee, “STRUCTURAL RACISM AND HEALTH INEQUITIES: Old Issues, New Directions1,” Du Bois Rev., vol. 8, no. 1, p. 115, Apr. 2011, Accessed: Dec. 07, 2020. [Online]. ↩︎

  4. Z. Obermeyer, B. Powers, C. Vogeli, and S. Mullainathan, “Dissecting racial bias in an algorithm used to manage the health of populations,” Science, vol. 366, no. 6464, pp. 447–453, Oct. 2019, Accessed: Dec. 07, 2020. [Online]. ↩︎

  5. J. N. G. Chazeman S. Jackson, “Addressing Health and Health-Care Disparities: The Role of a Diverse Workforce and the Social Determinants of Health,” Public Health Rep., vol. 129, no. Suppl 2, p. 57, 2014, Accessed: Dec. 07, 2020. [Online]. ↩︎

  6. A. Mamun et al., “Diversity in the Era of Precision Medicine - From Bench to Bedside Implementation,” Ethn. Dis., vol. 29, no. 3, p. 517, 2019, Accessed: Dec. 07, 2020. [Online]. ↩︎

  7. “A Data-Driven Approach to Addressing Racial Disparities in Health Care Outcomes,” Jul. 21, 2020. (accessed Dec. 07, 2020). ↩︎

  8. “Study: Black patients more likely than white patients to use telehealth because of pandemic,” Sep. 08, 2020. (accessed Dec. 07, 2020). ↩︎

  9. X. Zhang et al., “Big Data Science: Opportunities and Challenges to Address Minority Health and Health Disparities in the 21st Century,” Ethn. Dis., vol. 27, no. 2, p. 95, 2017, Accessed: Dec. 07, 2020. [Online]. ↩︎

  10. S. A. Ibrahim, M. E. Charlson, and D. B. Neill, “Big Data Analytics and the Struggle for Equity in Health Care: The Promise and Perils,” Health Equity, vol. 4, no. 1, p. 99, 2020, Accessed: Dec. 07, 2020. [Online]. ↩︎

  11. Institute of Medicine (US) Committee on Understanding and Eliminating Racial and Ethnic Disparities, B. D. Smedley, A. Y. Stith, and A. R. Nelson, “PATIENT-PROVIDER COMMUNICATION: THE EFFECT OF RACE AND ETHNICITY ON PROCESS AND OUTCOMES OF HEALTHCARE,” in Unequal Treatment: Confronting Racial and Ethnic Disparities in Health Care, National Academies Press (US), 2003. ↩︎

  12. CDC, “Using Telehealth to Expand Access to Essential Health Services during the COVID-19 Pandemic,” Sep. 10, 2020. (accessed Dec. 07, 2020). ↩︎

  13. "[No title]." (accessed Dec. 07, 2020). ↩︎

  14. “Telehealth Network Grant Program,” Feb. 12, 2020. (accessed Dec. 07, 2020). ↩︎