Peeling back the layers of data science in healthcare.
Healthcare has been a data repository for quite some time. Electronic Health Records (EHRs), patient summaries, clinical test results, medical imaging, and insurance claims represent just a fraction of the diverse and extensive data sources now being generated at an unprecedented rate. As a 2023 Deloitte report highlights, the life sciences and healthcare industries account for 30% of the world’s data. Applying data science methods in this context serves a dual purpose: to help extract new knowledge and to transform healthcare delivery, thus creating new opportunities for better patient outcomes. In this article, we examine the mechanisms through which data science is reshaping the healthcare industry, from increasing the rate of first-time-right diagnoses to curbing the incidence of lifestyle-related diseases.
What we talk about when we talk about data science in healthcare
Data science combines multiple disciplines and techniques that allow healthcare actors to discover areas where progress can be made. While primarily anchored in methods such as big data analytics, data mining, Artificial Intelligence (AI), statistics, advanced Machine Learning, computer vision, and semantic reasoning, its scope isn’t confined to these alone. Such a wide array of data science techniques allows healthcare institutions to break down data silos and refine decision-making. When it comes to healthcare data analytics, there are four primary research perspectives available to data scientists:
- Descriptive analytics helps to scrutinize data in order to discern patterns, trends, and relationships. It’s the storyteller of the data world as it searches for answers to the fundamental question: “What happened?”. In healthcare, descriptive analytics sifts through vast volumes of health data, such as patient medical records or treatment records.
- Predictive analytics answers the question: “What is likely to happen in the future?”. Healthcare professionals utilize sophisticated algorithms and statistical models to pinpoint patterns in historical data that can be indicative of future events, such as disease outbreaks or patient readmissions.
- Prescriptive analytics recommends the best course of action to achieve a desired outcome. It responds to the question, “What should be done to achieve a specific goal?”. For example, it can recommend personalized treatment plans for patients based on their behavioral data, genetic makeup, or environment.
- Diagnostic analytics goes beyond the ‘what’ of descriptive analytics and the ‘what might happen’ of predictive analytics. Instead, it investigates the ‘why’ — uncovering the root causes of specific healthcare outcomes. Diagnostic analytics helps to answer questions like “Why did this patient’s condition worsen?” or “What factors led to the success of this treatment plan?”.
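To make the four perspectives more concrete, here is a minimal Python sketch that asks each question of the same toy dataset. Everything in it is an assumption made for illustration: the records are invented, the field layout and the helper names (`readmission_rate`, `predicted_risk`, `recommend_followup`) are hypothetical, and the "model" is just a subgroup average rather than anything a real healthcare system would rely on.

```python
# Hypothetical toy records, invented for illustration only:
# (age_group, had_followup_call, readmitted_within_30_days)
records = [
    ("65+", False, True),
    ("65+", False, True),
    ("65+", True, False),
    ("65+", True, True),
    ("under_65", False, False),
    ("under_65", False, True),
    ("under_65", True, False),
    ("under_65", True, False),
]


def readmission_rate(rows):
    """Share of rows that ended in a readmission (0.0 if no rows match)."""
    rows = list(rows)
    return sum(r[2] for r in rows) / len(rows) if rows else 0.0


# Descriptive: "What happened?" -> overall readmission rate.
overall_rate = readmission_rate(records)

# Diagnostic: "Why did it happen?" -> compare the outcome across a candidate
# factor. A gap between groups is a lead to investigate, not proof of cause.
rate_by_followup = {
    flag: readmission_rate(r for r in records if r[1] == flag)
    for flag in (False, True)
}


# Predictive: "What is likely to happen?" -> naive risk estimate taken from
# the historical rate of the matching subgroup.
def predicted_risk(age_group, had_followup):
    return readmission_rate(
        r for r in records if r[0] == age_group and r[1] == had_followup
    )


# Prescriptive: "What should be done?" -> recommend the follow-up call only
# if it lowers the predicted risk for this group.
def recommend_followup(age_group):
    return predicted_risk(age_group, True) < predicted_risk(age_group, False)


print(f"Overall readmission rate: {overall_rate:.0%}")
print(f"Rate by follow-up call: {rate_by_followup}")
print(f"Predicted risk (65+, no call): {predicted_risk('65+', False):.0%}")
print(f"Recommend follow-up for 65+: {recommend_followup('65+')}")
```

In practice each step would be far more involved (statistical testing for the diagnostic step, a validated model for the predictive one, clinical constraints for the prescriptive one), but the division of labor between the four questions stays the same.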
One of the main reasons data science holds such promise for healthcare is the unprecedented data deluge that has engulfed the sector. With the rapid digitization of healthcare systems, an immense volume of data is generated daily. EHRs, clinical trial data, health insurance claims, wearable devices, genetic and genomic data, medical imaging, and even patient-generated data from sources like mobile health applications and social media interactions contribute to this vast pool of information (see Figure 1). According to the global investment bank RBC Capital Markets, the compound annual growth rate of healthcare data is set to reach 36% by 2025, outpacing other sectors such as manufacturing, financial services, and media and entertainment.
Figure 1. A list of data sources in the healthcare industry
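To get a feel for what a 36% annual growth rate implies, the short Python sketch below works through the arithmetic. It assumes the rate compounds once per year and stays constant, which is a simplification; the function names are hypothetical and chosen only for this illustration.

```python
import math

ANNUAL_GROWTH = 0.36  # the 36% annual growth rate cited in the text


def growth_factor(years, rate=ANNUAL_GROWTH):
    """How many times larger the data volume is after `years` of compounding."""
    return (1 + rate) ** years


def doubling_time(rate=ANNUAL_GROWTH):
    """Years needed for the data volume to double at a constant rate."""
    return math.log(2) / math.log(1 + rate)


print(f"Growth over 5 years: x{growth_factor(5):.2f}")
print(f"Doubling time: {doubling_time():.2f} years")
```

Under these assumptions, the volume of healthcare data roughly doubles every two and a quarter years and grows more than fourfold over five years.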
This data tsunami can be seen both as a challenge and an opportunity. On the one hand, researchers are often confronted with questions of how to access or gather representative data with high-quality annotations. Moreover, data often comes in an unstructured form from different sources, creating a challenge of interoperability. Privacy and security pose other concerns for those dealing with large data volumes, especially when it comes to vulnerable patient health data. These nuances are just a drop in the ocean of healthcare’s complex data science landscape.
On the other hand, the sheer volume and diversity of data offer the potential for significant discoveries and improvements in patient care, research, and healthcare systems. This wealth of data gives researchers an extensive foundation on which to base their investigations of the human body. It supports advanced drug development strategies, the identification of novel biomarkers, and new approaches to clinical trials. Moreover, healthcare administrators and policymakers can leverage data-driven insights to make informed decisions about resource allocation, infrastructure development, and policy planning.
What’s more, AI, with its ability to analyze complex patterns, has become a linchpin of healthcare data analysis in recent years. It is particularly adept at handling rich data types, such as genetic sequences, medical images, and electronic health records, and can process these multifaceted datasets swiftly and accurately. In a recent Nature survey of more than 1,600 researchers, the majority of respondents said that AI offers faster ways to process data, accelerates computations, and saves time and/or money. Meanwhile, the share of research papers mentioning AI or Machine Learning concepts in their titles or abstracts rose to as much as 5% in life sciences and healthcare between 1983 and 2023.