“Via dynamic and interactive graphics, today’s technology allows students to visualize externally what they have difficulty representing mentally.”
Research on human perception has taught us a great deal about how to communicate data more effectively through illustrations and graphics. The visualization of data-driven results continues to improve through psychological research and technological innovation. Today, researchers have a greater capacity to collect higher-quality data in larger quantities (e.g., through online surveys, cell phone applications, wearable technologies, social media, and Google Analytics). To preserve the integrity and purpose of our graphs, data visualizations must be able to accommodate the surge in information volume and complexity that characterizes the digital age. Yet, despite the exponential growth in computing power and software tools, psychology researchers continue to rely mostly on static data visualizations.
Static visualizations are entirely acceptable and appropriate for
many purposes (as, I hope, are the figures in this paper). However, as we
transition into the big data era, relying only on static graphics may
not be enough (Heer
& Kandel, 2012). Static graphs can limit the type and amount of
information one wishes to communicate, or the speed and comprehension of
details the reader is meant to perceive; certain types of static graphs
can often be no more than “visual tables” (Weissgerber et al.,
2015, p. 1). In a 2014 systematic review of research articles
published in the top 25% of physiology journals, the authors found that the
most common type of data visualization was the static graph, and
specifically the static bar graph (Weissgerber et al.,
2015). Although a recent systematic review in psychological research
has yet to be published (to my knowledge), it appears that psychology
suffers from the same graph use issues as physiology and other
disciplines. The #barbarplots
project (2017) examined 131 research articles from the first six
months of 2016 in four high-impact psychology journals and found a
substantial presence of static bar graphs. Specifically, of the
104 graphs presented, 55% were static bar graphs.
Static bar graphs have been heavily criticized (especially with
continuous data) for increasing the risk of misinterpreting research
findings and providing limited information (Cooper et al., 2002;
Gelman,
2017; Lane & Sándor,
2009; Schriger et
al., 2006; Saxon, 2015; Weissgerber et al.,
2015). For example, a static bar graph portrays potentially
countless different distributions (e.g., bimodal, skewed, containing
outliers, differing in sample size) in the same way if their means and
variability are similar (see Figure 1).
Figure 1. Many different datasets can lead to the same static bar graph. The full data may suggest different conclusions from the summary statistics. The means and standard errors (SEs) for the four example datasets shown in Panels B–E are all within 0.5 units of the means and SEs shown in the bar graph (Panel A). In Panel B, the distribution in both groups appears symmetric. Although the data suggest a small difference between groups, there is substantial overlap between groups. In Panel C, the apparent difference between groups is driven by an outlier. Panel D suggests a possible bimodal distribution. Additional data are needed to confirm that the distribution is bimodal and to determine whether this effect is explained by a covariate. In Panel E, the smaller range of values in group two may simply be because there are only three observations. Additional data for group two would be needed to determine whether the groups are different. Figure and caption adapted from Weissgerber et al. (2015).
This can mislead readers into making a set of assumptions (e.g., normality, equal sample sizes and variability, no outliers) that in reality might never have been met. In addition, if non-independent – or paired – data are presented using static bar graphs, readers might falsely infer independence and overlook within-subject differences (or the lack thereof; see Figure 2).
Figure 2. Additional problems with using static bar graphs to show paired data. The bar graph (mean ± SE) suggests that the groups are independent and provides no information about whether changes are consistent across individuals (Panel A). The scatterplots shown in the Panels B–D clearly demonstrate that the data are paired. Each scatterplot reveals very different patterns of change, even though the means and SEs differ by less than 0.3 units. The lower scatterplots showing the differences between measurements allow readers to quickly assess the direction, magnitude, and distribution of the changes. The solid lines show the median difference. In Panel B, values for every subject are higher in the second condition. In Panel C, there are no consistent differences between the two conditions. Panel D suggests that there may be distinct subgroups of “responders” and “non-responders.” Figure and caption taken from Weissgerber et al. (2015).
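The point illustrated in Figure 1, that differently shaped samples can produce the same bar graph, can be checked numerically. The following is only a minimal sketch in Python: the three raw samples are hypothetical values invented for illustration (they are not the data from Weissgerber et al., 2015), and the helper names `rescale` and `mean_and_se` are likewise my own. Each sample is linearly rescaled to the same mean and standard deviation, so a mean ± SE bar graph would draw all three identically.

```python
import math
import statistics


def rescale(values, target_mean, target_sd):
    """Linearly transform values so they have exactly the target mean and sample SD."""
    current_mean = statistics.mean(values)
    current_sd = statistics.stdev(values)
    return [target_mean + (v - current_mean) * target_sd / current_sd for v in values]


def mean_and_se(values):
    """Return the mean and the standard error of the mean."""
    return statistics.mean(values), statistics.stdev(values) / math.sqrt(len(values))


# Hypothetical raw shapes, invented for illustration only.
raw_shapes = {
    "symmetric": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    "outlier": [1, 1, 2, 2, 2, 3, 3, 3, 4, 20],
    "bimodal": [1, 1, 2, 2, 2, 9, 9, 10, 10, 10],
}

# Force every sample to mean 10 and SD 2; the shape of each sample is preserved.
datasets = {name: rescale(values, 10.0, 2.0) for name, values in raw_shapes.items()}
summaries = {name: mean_and_se(values) for name, values in datasets.items()}

for name, (mean, se) in summaries.items():
    values = datasets[name]
    print(f"{name:10s} mean={mean:.2f} SE={se:.2f} "
          f"min={min(values):.2f} max={max(values):.2f}")
```

All three samples report a mean of 10.00 and an SE of 0.63, yet their minimum and maximum values differ, which is exactly the information a bar graph of means and SEs discards.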
Of course, there are numerous types of static data visualizations
other than bar graphs that might be better suited to the researcher’s or
data scientist’s needs (e.g., scatterplots, line graphs, violin plots).
Yet, static graphs (unintentionally) force the reader to look at the
data through a single lens – only the one that the authors intended. The
authors have complete and utter control over the type of graph, which
variables go in, the scaling or truncation of axes, the angle or point
of view (e.g., in a 3D plot), the colours, sizes, and more. Many static
graphs can also be inaccessible to some readers: for example, graphs that
rely on colour or depth cues can exclude readers who are colour blind or
have difficulty with depth perception, and graphs with very small or too
many labels can be hard for anyone to read.
Furthermore,
static figures prevent readers from exploring the visualized data
independently and assessing data-driven conclusions impartially: readers
cannot manipulate the graph, inspect particular observations, rescale
the axes, or visualize different analyses. By relying on static figures, we
unintentionally limit the information presented and, therefore,
transparency. This might sound normal; it is the default, after all, and we
have been seeing static figures in academic journals for decades. But we
now have the capacity to let readers share control over how the
data are visualized, so why not make that the new “normal”?
This unintentional lack of transparency is as problematic in data
visualization as it is in other aspects of research, such as reporting
practices (e.g., limited or ambiguous reporting, excluding analyses due to
lack of statistical significance). Reporting practices, among
other research facets, have been directly addressed and are improving
thanks to various task forces, the Open Science
Framework (OSF), and more. Given recent concerns about the
replication crisis, publication bias, and lack of transparency in
psychological research, a reform in data visualization should also be
considered.
Interactive and dynamic data visualizations can accommodate a virtually
endless amount and variety of information. In fact, with interactive
dynamic visualizations, readers can tailor and modify the graphic to
display the details of their choosing. But before we go deeper into why
we should implement these data visualization methods, we first need to
define what interactive and dynamic visualizations are. In this paper, I
define three types of non-static data visualizations: interactive,
dynamic, and interactive dynamic data visualizations. However, I will
focus mostly on the last type, interactive dynamic, due to its advanced
features and benefits.
In an interactive data
visualization (IDV), the user can probe the presentation to view certain
aspects of the data (e.g., value on another variable), and thus interact
with the display directly instead of using menus or cross-referencing a
table (Ward et al.,
2015). Other interactive features may include scaling or zooming in
and out on a particular slice of data, highlighting certain categories
or values, rotating (in 3D graphics) or panning over, and more. An
example of IDV is presented in Figure 3 and can be accessed here.
Figure 3. Interactive data visualization. This is a 3D scatter plot with interactive features. The dataset presented is ‘mtcars’ from the ‘datasets’ R package (R Core Team, 2022). The plot displays the relationship between transmission type (i.e., automatic in red and manual in blue), weight (x axis), the time it takes to drive a quarter of a mile in seconds (y axis), and horsepower (z axis). The user can rotate the plot 360°, zoom in and out, pan the “camera” over, highlight one or more categories (e.g., red; automatic), hover over any individual observation to see its coordinates, and download a snapshot of the current position of the plot.
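To make the structure of such a figure concrete, the sketch below builds a figure description for a 3D scatter plot like the one in Figure 3. It is not the code used to produce Figure 3, and several things are assumptions on my part: the plotting library (the description follows the figure format used by Plotly, with `scatter3d` traces), the function name `make_trace`, the axis labels, and the use of only six hand-copied rows of ‘mtcars’ rather than the full dataset. The rotating, zooming, panning, and hovering described in the caption are supplied by the rendering library, not by this code.

```python
import json

# Six rows of R's 'mtcars' dataset, copied by hand for illustration:
# (model, weight in 1000 lbs, quarter-mile time in seconds, horsepower, transmission)
CARS = [
    ("Mazda RX4", 2.620, 16.46, 110, "manual"),
    ("Mazda RX4 Wag", 2.875, 17.02, 110, "manual"),
    ("Datsun 710", 2.320, 18.61, 93, "manual"),
    ("Hornet 4 Drive", 3.215, 19.44, 110, "automatic"),
    ("Hornet Sportabout", 3.440, 17.02, 175, "automatic"),
    ("Valiant", 3.460, 20.22, 105, "automatic"),
]

# Colour coding follows the caption: automatic in red, manual in blue.
COLOURS = {"automatic": "red", "manual": "blue"}


def make_trace(transmission):
    """Build one 3D scatter trace (one legend entry) for a transmission type."""
    rows = [car for car in CARS if car[4] == transmission]
    return {
        "type": "scatter3d",
        "mode": "markers",
        "name": transmission,
        "x": [car[1] for car in rows],     # weight
        "y": [car[2] for car in rows],     # quarter-mile time
        "z": [car[3] for car in rows],     # horsepower
        "text": [car[0] for car in rows],  # model name shown on hover
        "marker": {"color": COLOURS[transmission], "size": 5},
    }


figure_spec = {
    "data": [make_trace(transmission) for transmission in COLOURS],
    "layout": {
        "scene": {
            "xaxis": {"title": {"text": "Weight (1000 lbs)"}},
            "yaxis": {"title": {"text": "Quarter-mile time (s)"}},
            "zaxis": {"title": {"text": "Horsepower"}},
        }
    },
}

# Serialize the description so it can be handed to a renderer.
print(json.dumps(figure_spec, indent=2))
```

Assuming the Plotly library for Python is installed, the description could be rendered with `plotly.graph_objects.Figure(figure_spec).show()`; keeping one trace per transmission type is what allows a reader to toggle a category from the legend.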
Dynamic data visualizations (DDVs) are representations of data that can change their graphical make-up while presented (Ploetzner & Lowe, 2004; Schnotz et al., 1999). Kaput (1992) considered time as a dimension in DDVs. This is, of course, true when visualizing time-series, hierarchical, or longitudinal designs, or when depicting a process (e.g., neuron firing). However, I argue that in other designs any variable, whether categorical or continuous, can serve as a dimension in a DDV (e.g., treatment condition, IQ). In a DDV, the displayed values of the outcome can change (while still presented) as a function of another variable (e.g., time, group). In a linear, or automatic, DDV, changes transpire on their own and cannot be altered by the user or reader. An example of a DDV is presented in Figure 4 and is available here.