The healthcare industry generates massive amounts of data, but unlocking its value can be challenging. Although the industry shifted to digitization years ago with electronic health records (EHRs), many obstacles remain, including data exchange and compliance regulations. Machine learning in healthcare now touches many areas, deriving powerful insights and predictions from data models.

With machine learning, the healthcare industry has the potential to improve operations, reduce costs, drive better patient care, understand public health holistically, and do much more.

Applying Machine Learning in Healthcare

So how does healthcare use machine learning? First, it’s important to define machine learning.

Machine learning is the use and development of computer systems that can learn and adapt by using algorithms and statistical models to analyze data patterns.

The process applies those algorithms to structured and unstructured data, uncovering interrelationships, anomalies, and trends. Algorithms learn over time as they ingest more data, imitating human learning and improving their accuracy.
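As a toy illustration of the kind of pattern and anomaly detection described here (the data and threshold are made up, not from any study), a few lines of Python can flag readings that deviate sharply from the rest:

```python
from statistics import mean, stdev

def flag_anomalies(readings, threshold=2.0):
    """Return readings whose z-score exceeds the threshold."""
    mu, sigma = mean(readings), stdev(readings)
    return [x for x in readings if abs(x - mu) / sigma > threshold]

# Hypothetical resting heart rates (bpm); 142 is the outlier we expect to surface.
rates = [62, 70, 68, 74, 66, 71, 69, 142, 65, 72]
print(flag_anomalies(rates))  # [142]
```

Real healthcare models are far more sophisticated, but the principle is the same: as more data is ingested, the estimates of what is "normal" sharpen, and the anomalies stand out.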

This technology can provide many advantages to various fields of healthcare.


Precision Medicine

A common use of machine learning in healthcare is precision medicine, which describes the prediction of a treatment protocol’s likelihood of success based on patient attributes.

One example is a study on the treatment of acute myeloid leukemia. The researchers used data from patients with the disease, incorporating multiomic information, to determine how and why pathologically similar cancers react differently to the same drug regimens. The study concluded that this method performed better than other approaches.

What the researchers learned allowed future patients to receive more personalized care, likely helping clinicians save or prolong the lives of many patients.



Disease Diagnosis

One of the most effective areas of machine learning in healthcare is diagnosis. Many cancers and genetic diseases are hard to detect, even with modern science. IBM Watson Genomics is an example of this, combining cognitive computing with genome-based tumor sequencing.

Machine learning is also proficient at diagnosing diabetes, one of the most deadly and prevalent chronic diseases. Algorithms can detect diabetes early so treatment can begin much sooner, resulting in better outcomes for patients.


Radiology Analysis

Another avenue for machine learning in healthcare is through radiology. There are several applications for machine learning in diagnostic imaging, including:

  • Optimized order scheduling and patient screening, which can reduce the risk of patients missing care
  • Intelligent imaging systems, leading to decreased imaging time, less unnecessary imaging, and improved positioning
  • Automated detection and interpretation of findings for various cancers and other diseases with faster processing speeds and the ability to detect anomalies beyond what the human eye can see
  • Postprocessing, including image segmentation, registration, and quantification
  • Automated clinical decision support and examination protocoling



Electronic Health Records

EHRs are the standard repository for medical records. Although this technology has many benefits, not all clinicians find it to be an efficient or effective tool. Now, machine learning is addressing these issues. For example, algorithms support clinician use by offering clinical decision support, automated image analysis, and integration of telehealth technologies.


Medical Research

Where large amounts of health data are available, machine learning can also play a key role in determining trends in public health. Such data must be anonymized to ensure compliance. Crowdsourcing is one way of collecting medical data for analysis. Findings from this could be significant, especially with the world still in the midst of a pandemic.

Why Healthcare Entities Haven’t Fully Adopted Machine Learning

With all the possibilities that machine learning brings to the table, why isn’t every healthcare organization embracing it?

The key challenges are data governance, data silos, hiring and training data scientists, data integrity, compliance, and costs. Not every organization can launch machine learning initiatives, and it seems out of reach for many, but new platforms are making it easier and more cost-effective.

Machine Learning in Healthcare Made Easier with 3D Visualizations

It is possible to overcome some of healthcare’s biggest obstacles and realize the power of machine learning. New platforms gather, clean, and ingest the data, then deliver visualizations. Dedicated data scientists aren’t necessary, which makes insights more accessible and actionable.


We are proud to have just wrapped up our Data Science Hackathon for Caltech students in which they leveraged the power of Virtualitics AI Platform to solve unique business cases in the Bioinformatics and Predictive Maintenance & IoT spaces.

Each team consisted of 2-5 students competing for $9,000 in prizes and internship and full-time opportunities with the Virtualitics AI and Engineering teams. Within 24 hours of gaining access to VIP, students were able to leverage the speed and ease of use of VIP’s augmented analytical tools to garner key insights.

Virtualitics AI Platform enables you to augment your analytical practices to easily find, visualize, and share insights from complex data sets 100X faster than traditional business intelligence tools. VIP creates a no-code/low-code environment where users can leverage patented ML & AI models to analyze data at the speed of now. VIP helps detect correlations within your data at a record pace.

🏆 Check out the winners!


Predictive Maintenance & IoT

🥇 Solving Supply Chain Inefficiencies using VIP (Team 11)

Team Members: Esmir Mesic, Ian Fowler, Devin Hartzell, Krish Mehta

Caltech students leveraged Virtualitics AI Platform to identify inefficiencies in a company’s supply chain and provided recommendations on alternative warehouse locations.

A global logistics company needed to analyze its supply chain records. The students analyzed this data to identify and eliminate inefficiencies in the supply chain.

The data is geospatial and has many features to analyze, some of which are categorical. Either of these factors makes the data hard to visualize through common methods such as Excel and Matplotlib.

The team constructed features in Python to better analyze the tardiness of warehouse orders and geographically categorize them, and then used VIP to segregate the data by category to pinpoint inefficiencies in the company’s supply chain. For instance, they found that moving operations for Men’s Footwear from Moldova to Italy would reduce lateness in orders.
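The team’s exact feature code isn’t shown, but the kind of construction described can be sketched in plain Python; the field names and records below are hypothetical, not the team’s schema:

```python
from collections import defaultdict

# Hypothetical order records; field names and values are illustrative only.
orders = [
    {"warehouse": "Moldova", "category": "Men's Footwear", "promised_days": 5, "actual_days": 9},
    {"warehouse": "Moldova", "category": "Men's Footwear", "promised_days": 5, "actual_days": 8},
    {"warehouse": "Italy",   "category": "Men's Footwear", "promised_days": 5, "actual_days": 5},
    {"warehouse": "Italy",   "category": "Accessories",    "promised_days": 4, "actual_days": 6},
]

# Feature: tardiness in days (clipped at zero), grouped by warehouse and category.
tardiness = defaultdict(list)
for o in orders:
    late = max(0, o["actual_days"] - o["promised_days"])
    tardiness[(o["warehouse"], o["category"])].append(late)

avg_tardiness = {k: sum(v) / len(v) for k, v in tardiness.items()}
for key, days in sorted(avg_tardiness.items(), key=lambda kv: -kv[1]):
    print(key, f"{days:.1f} days late on average")
```

Features like these, once attached to each record, are what make it possible to segment the data by category and location inside VIP.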

VIP capabilities used:
2D and 3D histograms, bar, scatter, and violin plots: generating these within VIP allowed the team to quickly conduct exploratory data analysis, investigate interesting parts of the data, and begin to track down the root issue.
Virtualitics Python API: allowed the team to seamlessly generate and visualize network graphs from geospatial data. This was the most intuitive way to view the data and essential to reaching the final insights.
Network explainability: helped the team understand the data better.




Bioinformatics

🥇 Drug Repurposing Using Network Analysis (Team 3)

Team Members: Kevin Huang, Anish Shenoy, Grace Lu, Jae Yoon Kim, Wesley Huang

Caltech students leveraged Virtualitics AI Platform to predict which drugs could be repurposed to treat certain diseases while accounting for drug-drug interactions.

The objective was to identify which drugs could potentially be repurposed to treat different diseases based on network analysis of drug-target-interaction data.

The challenge was that the drug-bank data was unstructured and difficult to process, and candidate drugs had no known use for a given class of disease.

To solve this problem, Caltech students first processed their data to get the similarity between drugs based on their shared targets. They clustered their data to locate the candidates and verified their accuracy by comparing older and newer versions of their drug-target-interaction data to see if their candidates were indeed repurposed after medical research.
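A rough sketch of the similarity-then-cluster step: Jaccard similarity over toy drug-target sets, with simple connected-components grouping standing in for VIP’s community detection. The drug and target names are invented for illustration:

```python
from itertools import combinations

# Toy drug -> target sets; real data would come from a drug-target interaction database.
targets = {
    "drugA": {"T1", "T2", "T3"},
    "drugB": {"T2", "T3"},
    "drugC": {"T7", "T8"},
    "drugD": {"T7", "T8", "T9"},
}

def jaccard(a, b):
    """Similarity of two target sets: shared targets over all targets."""
    return len(a & b) / len(a | b)

# Link drugs whose target sets are sufficiently similar.
THRESHOLD = 0.5
edges = {d: set() for d in targets}
for d1, d2 in combinations(targets, 2):
    if jaccard(targets[d1], targets[d2]) >= THRESHOLD:
        edges[d1].add(d2)
        edges[d2].add(d1)

# Cluster by connected components of the similarity graph.
clusters, seen = [], set()
for drug in targets:
    if drug in seen:
        continue
    stack, component = [drug], set()
    while stack:
        d = stack.pop()
        if d not in component:
            component.add(d)
            stack.extend(edges[d] - component)
    seen |= component
    clusters.append(component)

print(clusters)  # two clusters: {drugA, drugB} and {drugC, drugD}
```

Drugs landing in a cluster dominated by treatments for a disease they have no known use against become the repurposing candidates.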

VIP capabilities used:
Network Graph Community Detection: clustering the data helped group drugs with similar properties.
Histograms: the histogram feature helped identify the leading drug property of each cluster, flag drugs that didn’t match that property, and label them for repurposing.
Eccentricity: after displaying which repurposed drugs were successfully verified, the team ran eccentricity to check for a pattern among them.

🥈  Multiplexed single cell RNA-seq for network-based analysis of COVID-19 inflammatory response (Team 10) 

Team Members: Saehui Hwang, Liam Silvera, Archie Shahidullah

Caltech students leveraged the Virtualitics Immersive Platform (VIP) to create network graphs visualizing how different conditions affect gene expression in multiplexed single cells during the COVID-19 inflammatory response.

Honorable Mention: Differential Causal Gene Networks in Asthma (Team 1)

Team Members: Agnim Agarwal, Pranav Patil, Brandon Guo, Neha Dalia

Caltech students leveraged Virtualitics Immersive Platform to identify genes independently correlated with asthma to determine underlying causal relationships.

Healthcare has become one of the most demanding and quickly evolving industries. This has been driven largely by the rapidly changing technology landscape, where efforts are leading to faster and better research, development, and overall outcomes. The catalyst for this transformation has been access to large troves of data and organizations’ ability to improve both data quality and overall data utilization.


The Challenges and Apparent Solutions

Data quality and utilization improvements have sparked better research and development in healthcare, driven by the pairing of decision-making with the ability to leverage all types of data in the analytics process. Organizations today understand their data ecosystem, and the problem of accessibility has largely been solved through the Internet of Things (IoT), ETL, and scalable cloud data stores. This was made even more impactful through the massive adoption of the dashboard.

The image above details the before and after workflow of my time as an analyst. I joined an analytics team prior to the dashboard adoption and spent even more time thereafter helping build different views for different departments in my organization. My time as an analyst consisted of working closely with both Data Infrastructure (DI) and Information Technology (IT) to ensure I would have access to the right data. If that data wasn’t being collected, that would kick off an ancillary project for the DI team to start collecting. Once we were able to confirm the right data was being collected, the DI team would provide me access to a copy of the data, typically in an Amazon S3 bucket or Amazon Redshift Cluster. 

Analyzing the Data

At this point access had been attained, and it was my responsibility to start exploring the data. A few team members used a query service like Amazon Athena to start querying the databases. As analysts, not all of us had basic SQL knowledge, so we spent time training and learning with a few of the DI team members. We ended up building a library of queries, curated by the DI team, that helped us get the basic pieces of data we wanted. Anything more sophisticated required more time with the DI team or a lot of time figuring it out on our own. Getting the right data was just half the battle. Once we had it, it went straight into an Excel spreadsheet, where the hardest work began. We would spend days and weeks combing through the data without ever fully realizing insights. The biggest challenge was that we needed faster computations and a way to view the data other than spreadsheets and data tables.
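A curated query library like the one the DI team built can be as simple as named SQL strings. The sketch below uses Python’s built-in sqlite3 and an invented schema purely for illustration; the real setup ran against Athena:

```python
import sqlite3

# A tiny curated query library: named, vetted SQL that analysts can reuse.
QUERIES = {
    "monthly_orders": """
        SELECT strftime('%Y-%m', ordered_at) AS month, COUNT(*) AS n
        FROM orders GROUP BY month ORDER BY month
    """,
}

# In-memory stand-in for the data warehouse, with illustrative rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, ordered_at TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(1, "2021-01-05"), (2, "2021-01-20"), (3, "2021-02-02")],
)

rows = conn.execute(QUERIES["monthly_orders"]).fetchall()
print(rows)  # [('2021-01', 2), ('2021-02', 1)]
```

The value of the library wasn’t the SQL itself but that the DI team had vetted it, so analysts could pull the basic slices of data without writing queries from scratch.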

Visualizing the Data

The dashboard arrived in the form of Looker. The team shifted time to building out an environment that democratized data by providing access through an interface that was more visual and easier to navigate. My skills and my team’s skills now shifted to learning a new querying language (LookML) in order to build the right dashboards and views for our organization. The dashboard was an extremely interesting solution to the problem of accessibility. It solved a very real and tangible problem: not being able to see the data. The intuitive and highly visual UI solved this. But a deep-rooted problem still existed. It didn’t matter that I had access to the data; I didn’t know what to do with it. This challenge propelled the need for deeper knowledge and specialties in working with complex data. We quickly reached this point at my organization, which led to the massive growth of our data science team.

Communicating Insights

Traversing this data requires certain specialties for the time spent exploring to be guided and maximized. Organizations and teams require rich data science skills to reach meaningful results in a timely fashion. Oftentimes, this leads to bottlenecks and unmanageable backlogs of projects because teams are unequipped to tackle them effectively on the spot. With the landscape of data growing and organizations pressuring analytics teams to uncover more, the lack of timely insights will unfortunately cause organizations to fall behind. With data-driven quality and utilization improvements at the core of healthcare transformation, it is important that analytics teams are empowered to work outside of traditional, heavy-duty analytics workflows.

The image above details a specific hypothesis around solving problems on the spot, or at the very least, learning something meaningful on the spot that helps kick off a further investigation that is not starting from square one. The hypothesis is that teams solving more problems on the spot will reach meaningful, business-impacting insights faster than teams that consistently outsource pieces of work to data science, data infrastructure, or information technology teams.


Virtualitics Immersive Platform Helps Analysts Solve Problems Right Now

Virtualitics is an augmented analytics company that helps enable analysts and domain experts, enhancing the value generated by existing teams without requiring extensive IT or data science support. The embedded no-code AI equips analysts with skills and techniques to quickly explore data and develop dynamic storytelling that targets decision makers directly. Virtualitics AI Platform focuses on the most critical components of analysts’ core workflows and requirements so they can work at their best.


1. Querying data

One of the most critical things for an analyst is to scale up and maintain their database knowledge. This type of knowledge is typically gained through Entity-Relationship Diagrams (ERDs) or hours of querying. The visual aspect of this is extremely limited, and if the data isn’t organized and clean, it can become almost impossible. IBM estimates that 80% of data is unstructured, which heavily impacts database knowledge and the overall effectiveness of data utilization. VIP is a platform rooted in visualizations, helping analysts understand relationships in their data within seconds. Analysts quickly gain database knowledge and are also granted a mechanism for communicating that knowledge to stakeholders who are not as familiar with the data or equipped with the analyst’s skills.

2. Merging domain expertise with data science for deeper understanding

It is not one or the other. One of an analyst’s most critical assets is their command of the domain and business. They spend much of their time getting as close to the business problems as possible. You can think of them as the front line for exploring business problems, but merging that knowledge with analytics usually requires other resources. After researching and speaking with analysts in the healthcare industry, I quickly learned how critical it was for them to have a very intimate understanding of the data they would be using. The vast nature and complexity of the data made that a huge daily challenge for analysts. Virtualitics AI Platform eliminates the initial need for other resources, empowering analysts to learn more and identify insights that can meaningfully impact the problem or guide subsequent analysis by other teams. The platform provides no-code AI routines that enable an analyst to deploy ML models on their data and quickly identify relationships or characterize prime drivers in any of their existing models.

3. Improving communication and insights presentation

My research and interviews also showed that most analysts spend their upskilling time on data analytics capabilities, typically focused on learning at least one coding language. Almost none of the analysts mentioned presentation as something they would immediately improve. One of the biggest challenges in the industry is finding a common language between highly technical and non-technical audiences. Part of the decision-making process is setting a foundational understanding and reaching alignment. Analytics teams today spend more time trying to convince stakeholders of one thing than implementing and steering actionable change. Part of the reason for this is the knowledge gap and how “the why” of insights is communicated.

VIP addresses this through core pillars of accessibility, visualization, and explainability. The concept is that VIP can sit at the center of an organization, offering multiple access points based on your ecosystem, whether that is naturally using Python in a Jupyter notebook or simply using a web application in your browser. Using these pillars, organizations can collaborate in the environments where they feel most comfortable, using no-code AI to guide the analytics experience and multi-dimensional visualizations as a common language for explaining “the why” behind the analytics.

Biotech Firm IsoPlexis announces Duomic platform that utilizes Virtualitics AI Platform to gain faster and more robust insights between genetic and proteomic connections at the single-cell level.

IsoPlexis, a biotech leader in single-cell proteomics, recently announced Duomic, their integrated single-cell functional multi-omic biology platform. IsoPlexis partnered with Virtualitics to leverage the Virtualitics AI Platform to generate more robust insights around proteomic connections.

Virtualitics AI Platform enables companies to easily find, visualize, and share insights from complex data sets 100X faster than traditional business intelligence tools. Leveraging patented ML & AI models, the platform gives organizations the power to analyze data at the speed of now. Virtualitics AI Platform helps detect correlations within an organization’s data at a record pace.

“The ability to detect the functional proteome and genetic drivers of these proteins from each single cell will allow deeper precision into connecting the mechanisms and outcomes of therapies and disease. We believe this connection is key to accelerating the discovery of advanced medicines,” said Sean Mackay, CEO and Co-Founder of IsoPlexis.

IsoPlexis presented the first proof-of-concept data from its novel functional proteomic + transcriptomic platform, Duomic, at the Advances in Genome Biology and Technology (AGBT) – Precision Health conference on Coronado Island.

To learn more about IsoPlexis and how VIP helps accelerate the future of healthcare please visit:

Lung cancer is the leading cause of cancer-related death, making up almost 25 percent of deaths from all types of cancer in the U.S. However, thanks to positive lifestyle changes such as quitting smoking and advances in research, detection, and treatments, that number is dropping.

Individuals with cancer are at greater risk of developing complications and dying from common illnesses and viruses, including influenza. So, when a research group from Columbia Medical School launched a study to explore the link between lung cancer mortality and flu epidemics, Virtualitics joined the team to provide state-of-the-art data analytics and multidimensional visualization capabilities.

The purpose of this study was first to determine whether a link existed between death from lung cancer and influenza and then to apply any significant findings to increase patient and provider awareness of the danger of influenza infection for patients with lung cancer.


How the Study Was Conducted

The study was conducted using 195,038 patients with non-small cell lung cancer (NSCLC) located across 13 states using data obtained from the Surveillance, Epidemiology, and End Results (SEER) Program and the Centers for Disease Control and Prevention (CDC).  

The research team compared monthly mortality rates during high and low flu months between 2009 and 2015 for all at-risk patients in the study, as well as for newly diagnosed patients. Influenza severity was defined by the percentage of outpatient visits to healthcare providers for influenza-like illness, and CDC flu activity levels were matched with SEER data by state and month.

The data was then analyzed using high-dimensional visualization coupled with AI routines so researchers could fully understand and explore the complex relationships within the data.
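The core comparison can be sketched in a few lines of Python. The records, field names, and mortality values below are invented for illustration; they are not SEER or CDC data:

```python
from statistics import mean

# Toy monthly records keyed by state and month: a flu-severity label and an
# illustrative NSCLC mortality rate per 1,000 at-risk patients.
records = [
    {"state": "NY", "month": "2014-01", "flu": "high", "mortality": 3.1},
    {"state": "NY", "month": "2014-07", "flu": "low",  "mortality": 2.4},
    {"state": "CA", "month": "2014-02", "flu": "high", "mortality": 2.9},
    {"state": "CA", "month": "2014-08", "flu": "low",  "mortality": 2.3},
]

# Compare average monthly mortality in high- versus low-flu months.
high = mean(r["mortality"] for r in records if r["flu"] == "high")
low = mean(r["mortality"] for r in records if r["flu"] == "low")
print(f"high-flu months: {high:.2f}, low-flu months: {low:.2f}")
```

The actual study worked with 195,038 patients across 13 states and used high-dimensional visualization and AI routines on top of this kind of state-and-month matched comparison.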

What the Study Found

The research team observed a significant difference between the monthly mortality rate in patients during high flu months compared with low flu months.    

The positive relationship between flu severity and mortality was observed across all study participants, as well as at the individual state level and among new patients, specifically.

The results of the study, which were published in the Journal of Clinical Oncology, showed that increased influenza severity was positively associated with higher mortality rates for NSCLC patients. At the conclusion of the study, researchers indicated that the study’s findings support the need for future research to determine the impact of influenza vaccines on reduced mortality for patients with lung cancer.


3D Visualization and AI-Driven Analytics Enable Research Teams to Get the Most Valuable Information from Data

The Columbia Medical School research team was able to determine that a correlation exists between influenza severity and patient mortality by using three-dimensional visualization and artificial intelligence to analyze and model data. This type of visualization has several key benefits that improve data analysis capabilities across not just research but all industries.


Increase Visibility

Unlike 2D models, multidimensional visualization allows researchers to add and remove variables to measure the impact on the subject and get deeper and more complete insight into the relationships between datasets. For example, during the lung cancer study, researchers examined flu severity level by month and year simultaneously. 

Identify Interrelationships

Immersive data analytics technology is essential for identifying patterns, trends, and outliers that researchers can easily miss when data is presented in only two dimensions. Without the additional plane, critical data points that could make a significant difference in a study’s results may be occluded.

Three-dimensional visualizations can also be applied to unstructured data to enable research teams to gain valuable insight and identify relationships that exist between datasets from otherwise hard-to-analyze, disparate data sources.

Improve Communication

Applying multidimensional visualizations to complex AI and machine learning models allows researchers to display data relationships in a variety of geospatial and graphical visualizations that are easily understood by researchers, data scientists, and the non-scientific community alike.

AI-driven data analytics and 3D visualization tools also enable distributed research teams to collaborate across geographic locations. With collaboration and presentation tools available in both desktop and VR, researchers can work together securely to find, visualize, and share data insights, anomalies, and patterns, no matter where they are located.

Multidimensional visualization paired with the power of AI and machine learning is driving advances in healthcare and medicine. When the Columbia Medical School research team and Virtualitics partnered to tackle the world’s deadliest cancer, they were able to look at the data in a whole new way.

Contact us to learn how Virtualitics AI Platform can help your organization get the most information and value from your data. 

Detailed paper: 

Influenza and mortality for non-small cell lung cancer. Authors: Connor J. Kinslow, Yuankun Wang, Yi Liu, Konstantin M. Zuev, Tony J. C. Wang, Ciro Donalek, Michael Amori, and Simon Cheng. Published in the Journal of Clinical Oncology, Volume 37, Issue 15.