“HUMANS ARE A POWERFUL PATTERN RECOGNITION SYSTEM, BUT…”
We acquire more information through vision than through all the other senses combined. Together, the eye and the visual cortex can be seen as a massively parallel processor that provides the highest-bandwidth channel into human cognitive centers. That’s why the words “understanding” and “seeing” are often used synonymously.
Powerful visualizations carry many advantages: they provide an ability to understand huge amounts of data; they help find properties that were not anticipated; they allow for understanding of both large-scale and small-scale features of data; and they help us understand how data was collected.
While the human visual system is probably still the most powerful pattern recognition system, unfortunately, we don’t scale!
“MACHINE LEARNING: A VISUALIZATION-CENTRIC APPROACH”
As a data scientist I have always been faced with two major tasks: 1) Having a “quick but informative” look at the data before choosing the appropriate model and 2) Conveying the results to non-data scientists (e.g., domain experts, upper management…).
Both of these tasks can be accomplished with the right visualization tools, which represent abstract data in a form that facilitates human interaction for exploration and understanding. All of us working in Machine Learning know that visualization is involved in every aspect of the KDD (Knowledge Discovery in Databases) process, from data gathering to the extraction of knowledge. That’s why even the whole KDD process itself can be viewed in a visualization-centric way.
“OVERVIEW FIRST, ZOOM AND FILTER, THEN DETAILS ON DEMAND”
“Overview first, zoom and filter, then details on demand” is the famous Visual Information-Seeking Mantra from Shneiderman (1996), which describes how an effective data visualization tool should be built. Arguably, in the majority of use cases, “static” visualization is no longer enough. Users need new ways to interact with their data and to drill down into information easily and intuitively in order to gain more insight. But we are now at a stage where even “interactive” may not be enough. We are at the start of the “immersive data visualization era”.
“RE-IMAGINE THE WAY WE REPRESENT AND INTERACT WITH DATA”
As humans, we are fully trained to live in a 3D world. In 3D we can grasp many more dimensions in a very intuitive way (e.g., how big objects are, relative distances, shapes, colors, etc.).
Experiments conducted at Caltech and JPL showed that users make fewer errors when immersed in a 3D environment, and that the discovery process is much more powerful when they can interact with each other while immersed in their data.
While it is true that we are used to “standard” plots (e.g., bar charts, 2D scatter plots), they may hide useful and crucial information that is visible only at the intersection of many dimensions. That’s why the complexity of data sets nowadays calls for new types of visualization and interaction. In order to solve the challenges faced in gaining precious insights and competitive advantages from the data we need to re-imagine the way we represent the data.
That’s where Virtual Reality, coupled with easy-to-use Machine Learning tools, becomes a great asset. It allows us to build robust, immersive environments that can be used for collaborative data exploration and to better understand and convey the results.
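The point above, that standard 2D plots may hide information visible only at the intersection of many dimensions, can be illustrated with a small numerical sketch. Everything in it is an assumption made for illustration and not taken from any real data set: the synthetic "parity" data, the helper names `parity_dataset` and `best_sign_accuracy`, and the choice of three dimensions. The label depends on the sign of the product of all three features, so any view restricted to two of them should look like noise.

```python
import numpy as np


def parity_dataset(n=4000, seed=0):
    """Hypothetical synthetic data: the label is 1 when x0 * x1 * x2 > 0."""
    rng = np.random.default_rng(seed)
    X = rng.uniform(-1.0, 1.0, size=(n, 3))
    y = (np.prod(X, axis=1) > 0).astype(int)
    return X, y


def best_sign_accuracy(X, y, dims):
    """Accuracy of the best rule that only sees the signs of the chosen dims.

    Points are grouped by their sign pattern (quadrant or octant), and each
    group is assigned its majority label.
    """
    signs = (X[:, dims] > 0).astype(int)
    cell = signs @ (2 ** np.arange(len(dims)))  # one integer id per sign pattern
    correct = 0
    for c in np.unique(cell):
        labels = y[cell == c]
        ones = labels.sum()
        correct += max(ones, len(labels) - ones)
    return correct / len(y)


X, y = parity_dataset()

# What each pairwise "2D scatter plot" view could reveal at best.
acc_2d = {pair: best_sign_accuracy(X, y, list(pair))
          for pair in [(0, 1), (0, 2), (1, 2)]}

# What the full three-dimensional view reveals.
acc_3d = best_sign_accuracy(X, y, [0, 1, 2])

for pair, acc in acc_2d.items():
    print(f"dims {pair}: accuracy {acc:.3f}")
print(f"dims (0, 1, 2): accuracy {acc_3d:.3f}")
```

In this toy setting every pairwise view stays close to chance level (about 0.5), while the three-dimensional view separates the two classes perfectly. Real data is rarely this clean, but the sketch shows why a pattern can be invisible in each standard 2D projection and still be obvious once all dimensions are seen together.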
Ciro Donalek is co-founder and CTO of Virtualitics Inc. He has spent over ten years as a data scientist at Caltech, authoring numerous papers in Machine Learning, Immersive Visualization and Virtual Reality (published in Nature, IEEE Big Data, MNRAS, Neural Networks, etc.).