“Via dynamic and interactive graphics, today’s technology allows
students to visualize externally what they have difficulty representing
mentally.”
— David
Moreau (2015), p. 2.
Research on human perception has taught us a lot about how to better
communicate data through illustrations and graphics. Visualizing
data-driven results continues to improve through psychological research
and technological innovations. Today, researchers have greater capacity
to collect better-quality data and in larger quantities (e.g., online
surveys, cell phone applications, wearable technologies, social media,
Google Analytics, etc.). In order to preserve the integrity and purpose
of our graphs, data visualizations must be able to accommodate the
surge in the volume and complexity of information in the digital age. Yet,
despite the exponential growth in computing power and software tools,
psychology researchers continue to rely mostly on static data
visualizations.
The (Optical) Illusion of Current Data Visualization Practices
Static visualizations are entirely acceptable and appropriate for
many purposes (as, I hope, are the figures in this paper). However, as we
transition into the big data era, relying only on static graphics may
not be enough (Heer & Kandel, 2012). Static graphs can limit the type and amount of
information one wishes to communicate, or the speed and comprehension of
details the reader is meant to perceive; certain types of static graphs
can often be no more than “visual tables” (Weissgerber et al.,
2015, p. 1). In a 2014 systematic review of research articles
published in the top 25% of physiology journals, the authors found that the
most common type of data visualization was the static graph, and
specifically the static bar graph (Weissgerber et al.,
2015). Although a recent systematic review in psychological research
has yet to be published (to my knowledge), it appears that psychology
suffers from the same graph use issues as physiology and other
disciplines. The #barbarplots project (2017) examined 131 research
articles from the first six months of 2016 in four high-impact psychology
journals and found a substantial presence of static bar graphs.
Specifically, of the 104 graphs presented, 55% were static bar graphs.
Static bar graphs have been heavily criticized (especially with
continuous data) for increasing the risk of misinterpreting research
findings and providing limited information (Cooper et al., 2002;
Gelman, 2017; Lane & Sándor, 2009; Schriger et al., 2006; Saxon, 2015;
Weissgerber et al., 2015). For example, a static bar graph can portray
potentially countless different distributions (e.g., bimodal, skewed,
containing outliers, differing in sample size) the same way if their means and
variability are similar (see Figure 1).
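As a rough illustration of this point (a sketch with made-up numbers, not data from any of the cited studies), the Python snippet below rescales three hypothetical samples so that they share the same mean and standard deviation. The sample values, the `rescale` helper, and the target mean of 50 and SD of 10 are all assumptions chosen for illustration. A bar graph with error bars would draw all three samples identically, even though their medians and ranges differ.

```python
import statistics


def rescale(values, target_mean, target_sd):
    """Linearly rescale values so they have the given mean and sample SD."""
    m = statistics.mean(values)
    s = statistics.stdev(values)
    return [target_mean + target_sd * (v - m) / s for v in values]


# Three hypothetical samples with very different shapes (illustrative only).
raw = {
    "symmetric": [1, 2, 3, 4, 5, 6, 7, 8, 9],
    "bimodal": [1, 1, 1, 2, 5, 8, 9, 9, 9],
    "outlier": [1, 1, 1, 1, 1, 1, 1, 1, 30],
}

# After rescaling, every sample has mean 50 and SD 10 (up to rounding error),
# which is all that a bar with an error bar would display.
samples = {name: rescale(vals, 50, 10) for name, vals in raw.items()}

for name, vals in samples.items():
    print(
        name,
        "mean:", round(statistics.mean(vals), 2),
        "sd:", round(statistics.stdev(vals), 2),
        "median:", round(statistics.median(vals), 2),
        "range:", (round(min(vals), 2), round(max(vals), 2)),
    )
```

The printed medians and ranges are the kind of detail that a bar graph hides and that a plot of the individual observations would reveal.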
This can mislead readers to make a certain set of assumptions (e.g.,
normality, equal sample size and variability, no outliers, etc.) which
in reality might have never been met. In addition, if non-independent –
or paired – data are presented using static bar graphs, readers might
falsely infer independence and ignore within-subject differences (or
lack thereof; see Figure 2).
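The paired-data problem can be sketched in the same way. In the hypothetical example below (invented scores, not taken from any study), two post-test conditions contain exactly the same values, so their bars and error bars would be identical; only the pairing with the pre-test scores differs. The `summarize` helper and the variable names are assumptions made for illustration.

```python
import statistics

# Hypothetical pre-test scores for five participants.
pre = [10, 12, 14, 16, 18]

# Two hypothetical post-test outcomes containing the same five values.
post_consistent = [12, 14, 16, 18, 20]  # every participant improves by 2
post_mixed = [20, 18, 16, 14, 12]       # same values, paired differently


def summarize(before, after):
    """Group-level summaries (what a bar graph shows) plus paired differences."""
    diffs = [a - b for a, b in zip(after, before)]
    return {
        "mean_after": statistics.mean(after),
        "sd_after": statistics.stdev(after),
        "mean_diff": statistics.mean(diffs),
        "sd_diff": statistics.stdev(diffs),
    }


consistent = summarize(pre, post_consistent)
mixed = summarize(pre, post_mixed)

print("consistent:", consistent)
print("mixed:     ", mixed)
```

Both outcomes share the same post-test mean and SD and the same mean difference, but the spread of the within-subject differences is zero in one case and large in the other, which a bar graph of the two group means cannot show.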
Of course, there are numerous types of static data visualizations,
other than bar graphs, that might be better suited for the researcher or
data scientist’s needs (e.g., scatterplots, line graphs, violin plots).
Yet, static graphs (unintentionally) force the reader to look at the
data through a single lens – only the one that the authors intended. The
authors have complete and utter control over the type of graph, which
variables go in, the scaling or truncation of axes, the angle or point
of view (e.g., in a 3D plot), the colours, sizes, and more. Many static
graphs can also be inaccessible to some readers. For example, some graphs
are difficult for readers who are colour blind or have difficulty with
depth perception, and graphs that include very small or too many labels
are hard to read.
Furthermore, static figures prevent readers from exploring the visualized
data independently and assessing data-driven conclusions impartially: readers
cannot manipulate the graph, inspect particular observations, rescale
the axes, or visualize different analyses. By relying on static figures, we
unintentionally limit the information presented and, therefore,
transparency. This might sound normal; it is the default, after all, and we
have been seeing static figures in academic journals for decades. But we
now have the capacity to let readers share control over how the
data is visualized, so why do we not make it the new “normal”?
This unintentional lack of transparency is as problematic in data
visualization as it is in other aspects of research, such as reporting
practices (e.g., limited/ambiguous reporting, excluding analyses due to
lack of statistical significance, etc.). Reporting practices, among
other research facets, have been directly addressed and are improving
thanks to various taskforces, the Open Science
Framework (OSF), and more. Given recent concerns about the
replication crisis, publication bias, and lack of transparency in
psychological research, a reform in data visualization should also be
considered.
Enter Beyond-Static Graphics
Interactive and dynamic data visualizations can accommodate a virtually
endless amount and variety of information. In fact, with interactive
dynamic visualizations, readers can tailor and modify the graphic to
display the details of their choosing. But before we go deeper into why
we should implement these data visualization methods, we first need to
define what interactive and dynamic visualizations are. In this paper, I
define three types of non-static data visualizations: interactive,
dynamic, and interactive dynamic data visualizations. However, I will
mostly focus on the last type, interactive dynamic, due to its advanced
features and benefits.
In an interactive data visualization (IDV), the user can probe the
presentation to view certain aspects of the data (e.g., value on another
variable), and thus interact with the display directly instead of using
menus or cross-referencing a table (Ward et al., 2015). Other interactive
features may include scaling or zooming in
and out on a particular slice of data, highlighting certain categories
or values, rotating (in 3D graphics) or panning over, and more. An
example of IDV is presented in Figure 3 and can be accessed here.
Figure 3. Interactive data visualization. This is a
3D scatter plot with interactive features. The dataset presented is
‘mtcars’ from the ‘datasets’ R package (R Core Team, 2022). The plot
displays the relationship between transmission type (i.e.,
automatic in red and manual in blue), weight (x axis), time it takes to
drive a quarter of a mile in seconds (y axis), and horsepower (z axis).
The user can rotate the plot 360°, zoom in and out, pan the “camera”
over, highlight one or more categories (e.g., red; automatic), hover
over any individual observation to see its coordinates, and download a
snapshot of the current position of the plot.
Dynamic data visualizations (DDV) are representations of data that can
change their graphical make-up while being presented (Ploetzner & Lowe,
2004; Schnotz et al., 1999). Kaput (1992) considered time
as a dimension in DDV. This is, of course, true when visualizing
time-series, hierarchical, or longitudinal designs, or to depict a
process (e.g., neuron firing). However, in other designs, I argue, any
variable – whether categorical or continuous – can serve as a dimension
in DDV (e.g., treatment condition, IQ, etc.). In DDV, displayed values
of the outcome can change (while still presented) as a function of
another variable (e.g., time, group). In a linear, or automatic DDV,
changes transpire on their own and cannot be altered by the user or
reader. An example of DDV is presented in Figure 4 and is available here.
Figure 4. Dynamic data visualization. This is a
scatter plot with dynamic features. The dataset presented comes from the
‘gapminder’ R package (Bryan, 2017). The plot
displays the relationship between gross domestic product per capita
(gdpPercap; x axis) and life expectancy in years (lifeExp; y axis)
across different times, from 1952 to 2007 (bottom scale). The user can
zoom in and out, highlight one or more categories (e.g., blue; Europe),
hover over any individual observation to see specific details (as seen
in the red box), and when the “play” button is pressed (bottom left
corner), changes across the years can be seen via animation.
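The idea that any variable can serve as the "dimension" of a dynamic visualization can be sketched as a simple data structure: the records are split into ordered frames, and the display steps through them. The snippet below is a minimal illustration under stated assumptions; the records are invented (they are not values from the ‘gapminder’ package), and `build_frames` is a hypothetical helper, not part of any plotting library.

```python
from collections import defaultdict

# Hypothetical records (not real gapminder values).
records = [
    {"country": "A", "year": 1952, "gdp_percap": 1000.0, "life_exp": 40.0},
    {"country": "B", "year": 1952, "gdp_percap": 5000.0, "life_exp": 60.0},
    {"country": "A", "year": 1977, "gdp_percap": 1800.0, "life_exp": 50.0},
    {"country": "B", "year": 1977, "gdp_percap": 9000.0, "life_exp": 68.0},
    {"country": "A", "year": 2007, "gdp_percap": 3500.0, "life_exp": 62.0},
    {"country": "B", "year": 2007, "gdp_percap": 20000.0, "life_exp": 77.0},
]


def build_frames(rows, by):
    """Split rows into ordered frames, one per distinct value of `by`."""
    frames = defaultdict(list)
    for row in rows:
        frames[row[by]].append(row)
    return [(value, frames[value]) for value in sorted(frames)]


# "Playing" the animation means drawing one frame after another.
for year, rows in build_frames(records, by="year"):
    points = [(r["gdp_percap"], r["life_exp"]) for r in rows]
    print(year, points)
```

Passing a different variable as `by` (e.g., `"country"`) would produce frames along that variable instead of time, which is the sense in which a categorical or continuous variable can play the role of the dynamic dimension.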
Lastly, as their name suggests, interactive dynamic data
visualizations (IDDV) combine elements from both IDV and DDV
and are therefore the best of both worlds. In IDDV, any change in the
graphical structure follows a direct action from the user. This way,
users have control over, or can manipulate, how the data is displayed
(Ploetzner & Lowe, 2004). In many cases, the viewer even has control over
how the data is analyzed (see the example by Ellis & Merdian, 2015,
below). Schwan and Riempp (2004) noted that IDDV “enable the user to adapt the
presentation to [their] individual cognitive needs” (p. 296). An example
of IDDV is presented in Figure 5 and is available here.
Figure 5. Interactive dynamic data visualization.
This app, created by Parker (2022), allows users to visualize and analyze
COVID-19 and SARS geographic and descriptive data across time using both
interactive and dynamic features.
Users can browse the different tabs at the top, adjust the date on the
left blue scale, hover over any region or observation, apply
filters/masks, pan over, zoom, view the raw data, and create a
customized line graph.
Although there are mixed conclusions about the effectiveness of IDV
and DDV (e.g., Hegarty, 2004; Hood et al., 2019; Ploetzner & Lowe, 2004;
Rolfes et al., 2020; Schnotz et al., 1999; Schwan & Riempp, 2004), there
is a growing body of literature advocating for the use of IDDV in both
education and research settings (e.g., Ellis & Merdian, 2015; Heer &
Shneiderman, 2012; Rolfes et al., 2020; Ward et al., 2015).
Why We Should Use Interactive Dynamic Graphics in Psychology
There are many clear advantages to incorporating IDDV in academic
articles and teaching. In this paper, I focus only on a few key reasons
relating to open research practices, live data updates after
visualization deployment, and effective data communication:
IDDV Incite Transparent Research Practices
Concerns about the credibility of previously published results in
psychological research inspired new standards of recommended practices
such as preregistration, registered reports, open data, shared analyses,
etc. IDDV are much more aligned with the new standards of transparent
research than static graphs. Thoughtful IDDV allow careful inspection of
every dimension of the data, whereas with static visualizations, this is
usually impossible or can make the graph too complex to comprehend. More
advanced IDDV can even grant the reader the ability to explore and
reanalyze the data on their own. This is perhaps one of the highest
levels of open research practice. For example, in the IDDV
created by Ellis and Merdian (2015) as an R Shiny app, the user
can select the variables of their choice, differentiate by group, apply
different filters or masks, and recalculate the regression equation and
correlation coefficient (see Figure 6).
Figure 6. Interactive dynamic example with exploratory data
analysis applications. This interactive dynamic figure allows
the user to select the variables of their choice (from the dropdown
menus at the top left), differentiate by group (male, female), apply
different filters or masks (previous victim, not previous victim), and
recalculate the regression equation and correlation coefficient (bottom
right table). The user can also inspect the data using a different graph
type (e.g., boxplot) by switching to the Boxplot tab above the
figure.
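The kind of recalculation described above can be sketched in a few lines. The example below is written in Python rather than R Shiny and is not the code behind Ellis and Merdian's app; the records, the variable names (`sex`, `age`, `fear`), and the `fit` and `explore` helpers are all hypothetical. It shows how choosing variables and applying a filter changes the regression equation and the correlation coefficient.

```python
import math

# Hypothetical records; the variable names are invented for illustration.
rows = [
    {"sex": "female", "age": 20, "fear": 2.0},
    {"sex": "female", "age": 30, "fear": 3.0},
    {"sex": "female", "age": 40, "fear": 4.0},
    {"sex": "male", "age": 20, "fear": 4.0},
    {"sex": "male", "age": 30, "fear": 3.0},
    {"sex": "male", "age": 40, "fear": 2.0},
]


def fit(points):
    """Least-squares slope and intercept, plus Pearson's r, for (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    syy = sum((y - mean_y) ** 2 for _, y in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)
    return slope, intercept, r


def explore(data, x, y, **filters):
    """Keep rows matching every filter, then refit y on x."""
    selected = [row for row in data if all(row[k] == v for k, v in filters.items())]
    return fit([(row[x], row[y]) for row in selected])


for label, filters in [("all", {}), ("female", {"sex": "female"}), ("male", {"sex": "male"})]:
    slope, intercept, r = explore(rows, "age", "fear", **filters)
    print(f"{label}: fear = {intercept:.2f} + {slope:.2f} * age, r = {r:.2f}")
```

In this invented dataset the pooled relationship is flat while the two groups show opposite trends, which is the sort of pattern a reader could only discover if the figure lets them filter and refit.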
Furthermore, with tools such as R Shiny, where the data and analyses
are stored on the author’s server, users can have all the benefits of
having access to the full dataset without compromising participant
confidentiality. One of the main factors preventing researchers from
making their data openly available to external evaluators is the risk of
breaching ethical protocols and exposing private information. However,
using IDDV, the user can still investigate and reanalyze the data
without the authors actually having to share the raw data. Note that we could
explore and visualize data in the Ellis and Merdian
(2015) example above without having access to the Fear of Crime
dataset.
Live Data Visualizations
IDDV can continue to receive new data and update their visual structure
accordingly, even after deployment. Normally, once a static plot or
graph is produced, the figure cannot be changed. This can influence our
conclusions, too: once a paper is finalized and the figures are produced,
scientists publish their paper and (usually) never look back. However,
in the current era of big data and cloud technology, it is possible and
often useful to stream and visualize our data live as it is being
collected. Imagine you are conducting a study looking at various
transactions on a social media platform (e.g., Facebook, Twitter,
Instagram). Using IDDV, researchers can follow and visualize their data
in real time as social transactions take place. For example, there is an
existing dashboard visualizing a livestream of downloads of R packages from
CRAN (Sievert et al., 2022; see Figure 7). A similar app was attempted by
Hadley Wickham (2019). This novel approach opens many investigative avenues
and provides useful tools to expand the arsenal of data visualization and
research capabilities.
Figure 7. Interactive dynamic live data
visualization. Livestream of download logs from cran.rstudio.com. This Shiny app
illustrates the names and frequencies of R packages downloaded by users.
Viewers can also manipulate the rate and capacity of the visualization,
scan the raw logs in the top tab, review the source code, and share this
app on various social media platforms.
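A live visualization of this kind needs a summary that can be updated as each new observation arrives. The sketch below shows one simple way to do this, keeping package counts over the most recent entries of a stream. It is an assumption-laden illustration, not the implementation of the Shiny app in Figure 7; the `LiveCounts` class, its `capacity` parameter, and the simulated stream of package names are hypothetical.

```python
from collections import Counter, deque


class LiveCounts:
    """Package counts over the most recent `capacity` log entries."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.window = deque()
        self.counts = Counter()

    def add(self, package):
        """Register one new download and drop the oldest entry if needed."""
        self.window.append(package)
        self.counts[package] += 1
        if len(self.window) > self.capacity:
            oldest = self.window.popleft()
            self.counts[oldest] -= 1
            if self.counts[oldest] == 0:
                del self.counts[oldest]

    def top(self, n):
        """The n most frequently downloaded packages in the current window."""
        return self.counts.most_common(n)


# Simulated stream of download events (illustrative only).
stream = ["ggplot2", "dplyr", "ggplot2", "shiny", "ggplot2", "dplyr"]

live = LiveCounts(capacity=4)
for package in stream:
    live.add(package)
    print(package, "->", live.top(1))  # what the display would redraw
```

Each call to `add` does a constant amount of work, so the display can be refreshed after every event without recomputing the summary from the full log.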
IDDV are More Effective at Communicating Data
As Moreau beautifully put it, “[v]ia dynamic and interactive graphics,
today’s technology allows students to visualize externally what they have
difficulty representing mentally” (2015, p. 2). In a recent study,
Hood and colleagues (2020) found that IDDV figures in digital publications
were more effective at communicating main effects and null relationships
than static graphs. In another study by Rolfes et al. (2020), students who
were given materials visualized using IDDV performed significantly better
than students in the static visualization group. These results suggest
that IDDV might be more effective in communicating data. Furthermore, IDDV
make active exploration possible: readers are free to interact with the
data rather than trying to absorb it passively, and functional engagement
improves learning, interest, and comprehension (Bodemer et al., 2004).
This is particularly helpful because direct interaction with visual
content facilitates the involvement of the motor system (Wraga et al.,
2003), which is essential for achieving deeper levels of
understanding (Moreau, 2015).
