Statistical Literacy – the Golden Rules
A Review of Tim Harford’s The Data Detective: Ten Easy Rules to Make Sense of
Statistics
Stephanie Budgett¹
The University of Auckland
Amy Renelle²
The University of Auckland
Curiosity is the wick in the candle of learning
~William Arthur Ward
Setting the Scene
As statistics educators, our overarching goal is to instill in our students an appreciation
for the stories that can be told from data. Storytelling, however, is never straightforward.
Surely, we can remember times when we have been in thrall to an experienced
raconteur with a penchant for up-selling narratives with a healthy sprinkling of imagination
and exaggeration in order to pique the interest of their readers or listeners. Given we are
living in a “post-truth” world, where “alternative facts” and “fake news” permeate the
airwaves, equipping students with the skills to distinguish fact from fiction is more
important than ever (Ridgway, Nicholson, & Stern, 2017).
Seventy years ago, in his presidential address to the American Statistical Association,
Samuel S. Wilks commented, “Perhaps H. G. Wells was right when he said, “statistical
thinking will one day be as necessary for efficient citizenship as the ability to read and
write”!” (Wilks, 1951, p. 5). Fast-forward four decades to Katherine Wallman’s
presidential address where she offered her perspective on the breadth of views about the
definition of statistical literacy from within and beyond the statistics education discipline:
““Statistical Literacy” is the ability to understand and critically evaluate statistical results
that permeate our daily lives – coupled with the ability to appreciate the contributions
that statistical thinking can make in public and private, professional and personal
decisions” (Wallman, 1993, p. 1).
¹ s.budgett@auckland.ac.nz
² alin717@aucklanduni.ac.nz
The Mathematics Enthusiast, ISSN 1551-3440, vol. 20, nos. 1, 2 & 3, pp. 256-265.
2023 © The Authors & Department of Mathematical Sciences, University of Montana
Since then, there has been much debate regarding the definition of statistical literacy.
Jane Watson (1997) initially developed a view of statistical literacy that was centered on
media reports and focused on the data consumer, later widening her definition to
incorporate awareness and experience of how data is produced (Watson, 2013). In 2002,
Iddo Gal conceptualized statistical literacy by proposing a framework comprised of
knowledge elements and dispositional elements. He contended that statistical literacy
applied to data consumers and described the ability to interpret and critically evaluate
statistically based information from a wide range of sources and to articulate a reasoned
opinion based on such information. He stated “[I]t follows that adults should maintain in
their minds a list of “worry questions” regarding statistical information being
communicated or displayed” (Gal, 2002, p. 17). Many others have contributed to the
statistical literacy debate (e.g., Garfield & Ben-Zvi, 2007; Rumsey, 2002; Sharma, 2017;
Snell, 2003; Utts, 2003; Weiland, 2017).
Rob Gould’s (2010) visionary reflection on the style and content of statistics courses noted
that things needed to change if we were to move with the times. His three challenges to the
statistics education community were to (1) Redefine Data – recognizing that data need
not be confined to ‘numbers’, (2) Create Citizen Statisticians – recognizing that the
separation of consumer and producer makes little sense in today’s world when students
are both consuming and producing data, and (3) Teach Technology – recognizing that, in
order to access and make sense of the plethora of freely-available data, some coding and
programming skills are required. More recently, he proposed an augmented definition of
statistical literacy incorporating data literacy, in recognition of the growing significance
of data in our daily lives (Gould, 2017).
Yes, data and information permeated our daily lives in 1993. But today, data is
omnipresent, and the types of data to which we are exposed are changing constantly, as
noted by Gould (2017). We are living in a world awash with data and information. Every
day we consume data and information to make everyday decisions. It cannot be avoided.
We consult interactive websites and information dashboards, manipulating complex
multivariate data to discover things about the world around us. Given the information-
laden society we now live in, everyone needs to have a level of statistical literacy to make
sense of data. In Gould’s words, there is a need for Citizen Statisticians. We are no longer
required merely to consume data and information passively: “[I]n the future, everyone will need
some data analysis skills” (reflection of Roger Peng in Gould et al., 2018).
All the above raises the question: How do we equip citizens to be statistically literate?
Unfortunately, we don’t have the answer. However, respected and talented
communicators, such as Tim Harford, can help us on our journey to improve everyone’s
statistical literacy.
Harford is the recipient of multiple awards, largely thanks to his many contributions to
improving public understanding of economics. His engaging and entertaining prose
draws the curious reader in, effectively bridging the gap between academic statistics
education researchers and Joe Public. He was awarded an OBE in 2019 for “services to
improving economic understanding”. As a senior columnist at the Financial Times, and
presenter of BBC Radio’s popular investigative programme More or Less, he is a familiar media
figure in the UK. Others will know him from his previous books such as The Undercover
Economist and Fifty Things That Made the Modern Economy.
In his most recent book The Data Detective: Ten easy rules to make sense of statistics
(the North American edition; for the worldwide edition, look for How to Make the World
Add Up: Ten Rules for Thinking Differently About Numbers), Tim Harford provides his
audience with a list of ‘rules’ to follow when dealing with statistical claims. The rules are
cleverly illustrated via both entertaining anecdotes and respected academic research.
Much of what is covered by Harford’s ten rules aligns with the human behaviours
identified by academics to be crucial in the development of statistical literacy.
Let’s take a closer look, through the collective lenses of two statistics educators.
Harford’s Rules
Harford’s first two rules, Rule 1: Search Your Feelings, and Rule 2: Ponder Your Personal
Experience, reference our emotions and our individual knowledge. Going somewhat
against the grain, Rule 1 proposes that, when presented with a new piece of information,
we should ask ourselves: “How does it make me feel?” This advice seems contrary to the
belief that decision-making should be based on statistical evidence rather than on one’s
emotions. Indeed, much is known about the biases we succumb to when we use heuristic
reasoning for making decisions under uncertainty (Tversky & Kahneman, 1974). If a new
piece of information with which we are presented appeals, or aligns with our prior beliefs,
we are likely to look for reasons to believe it. If not, we are likely to look for reasons to
challenge it. Rule 1 suggests that, rather than blindly reacting on the basis of a particular
emotion, we take time to understand where that emotion came from.
Rule 2, to Ponder Your Personal Experience, highlights the fact that an individual worm’s-
eye view can often be in conflict with the bird’s-eye view that statistics can offer.
Harford’s own commuting experience, for example, seemed at odds with the occupancy statistics
provided by Transport for London. As noted by Hans Rosling, our
instincts, based on personal experience, can serve to distort our view of the world (Rosling
et al., 2018). Relying on the worm’s-eye view is likely to give us a warped sense of reality.
Somehow amalgamating the worm’s-eye and bird’s-eye views can contribute to a deeper
understanding of the underlying situation.
Rule 3, to Avoid Premature Enumeration, aligns with the important idea of defining
measurements. Let’s start off with a deceptively simple question: how would you define
‘sheep’? Rule 3 is reminiscent of Jessica Utts’ Critical Component 3, claiming that sound
statistical studies and accompanying media reports should detail “[the] exact nature of
the measurements made, or the questions asked” (Utts, 2014, italics as in original, p. 20).
Harford provides an entertaining example of how complex defining measures can be,
courtesy of Michael Blastland, creator of BBC Radio 4’s More or Less. How many sheep
are in the image below (Figure 1)? Is a lamb a sheep? At what point is an unborn lamb a
sheep? Is the correct answer one? Or two? Maybe two and a half? Or three? Figuring out
what reported statistics are referring to is a vital first step in being able to draw
appropriate conclusions. Added to this, it is also important to pay attention to definitions
when comparisons are being made. If comparisons happen over time, have definitions
changed? If comparisons occur between countries, do those countries share common
definitions of what is being compared?
Figure 1. Defining ‘sheep’?
Harford describes several scenarios in which issues with definitions contributed to
inaccurate claims and conclusions. Examples include infant mortality, violence, gun
deaths and self-harm, not to mention ‘sheep’! Yet, the question remains: how can we, as
statistics educators, encourage consumers of statistics (i.e., everyone!) to be curious about
definitions?
Once the definitional aspect is sorted and we know what we are looking at, we need to
figure out how closely to look at it. Focusing on the difference between taking an
individual view, compared with an aggregate standpoint, Rule 4: Step Back and Enjoy the
View demonstrates how our inference may change depending on how closely we examine
the picture. What we see is often related to how frequently we look; for example,
unemployment in the UK (once we understand the definition being used) may have increased
this past month, but decreased since the 1980s, yet increased since the end of WW2.
Politicians are notorious for carefully picking two points in time that conveniently portray
the statistical story that supports their argument. See Young (2012) for a media report
describing precisely this. Harford describes the media hype surrounding a story, in April
2018, that London’s murder rate had surpassed that of New York. While true at that
specific point in time, taking a step back demonstrated that monthly fluctuations can
paint a very different picture to that of trends across a longer period of time. In the
absence of rules or guidelines indicating how to pick points in time to make comparisons,
the only way to avoid the trap of succumbing to media spin is for the reader (or listener)
to ask questions. Why compare these specific times? What does the overall trend look
like? Critically questioning claims is a key attribute of a statistically literate person (e.g.,
Gal, 2002).
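For classroom purposes, the effect of choosing different comparison points can be sketched in a few lines of Python. This is an illustrative sketch only: the monthly counts are invented (they are not London, New York, or unemployment figures), and the names series and change are our own hypothetical choices rather than anything taken from Harford's book.

```python
# Hypothetical monthly counts, invented purely for illustration (not real data).
series = [12, 9, 11, 8, 10, 7, 9, 6, 8, 5, 7, 9]


def change(values, start, end):
    """Return the change between two chosen time points (end minus start)."""
    return values[end] - values[start]


# Comparing only the two most recent months suggests a rise...
last_month_change = change(series, -2, -1)

# ...whereas comparing the first and last months suggests a fall.
full_period_change = change(series, 0, -1)

# Every possible pair of (earlier, later) months, with the change between them.
all_changes = {
    (i, j): change(series, i, j)
    for i in range(len(series))
    for j in range(i + 1, len(series))
}
biggest_rise = max(all_changes.values())
biggest_fall = min(all_changes.values())

print("Most recent month vs the one before:", last_month_change)
print("Last month vs first month:", full_period_change)
print("Largest rise and largest fall available:", biggest_rise, biggest_fall)
```

The same invented series supports a story of increase or of decrease depending on which two months are compared, which is why asking "why these specific times?" matters.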
Harford’s fifth rule, Rule 5: Get the Backstory, casts the spotlight on the idea that, just as
happens in the media, novel and exciting scientific findings are more likely to be
published in academic journals than dull and uninteresting ones. The media rarely, if
ever, provide the backstory to a scientific finding. Without doing one’s own background
research, the reader is ill-equipped to place specific findings in a wider context, or to
consider how those findings compare or contrast to previous related discoveries. To
Harford’s credit, he considers the findings from the numerous studies he has presented
in his previous chapters and asks how he knows that those studies were credible. His
answer? “I cannot be certain” (p. 131). He suggests that discerning the good from the bad,
in terms of science journalism, may be possible by asking a few questions which will be
familiar to those with some knowledge of the statistical literacy research base (e.g., Gal’s
“worry questions” (2002) and Utts’ Seven Critical Components (2004)).
Rule 6: Ask Who is Missing captures the spirit of the statistical concept of
representativeness. Through a series of illuminating examples, Harford demonstrates
that much of ‘research-based accepted wisdom’ may not be entirely what we thought it
was. An increasing awareness of studies involving WEIRD subjects (Western, Educated,
from Industrialised Rich Democracies) brings into question the relevance of the findings
for non-WEIRD groups. Harford describes the impact of not paying enough attention to
the missing people (selection bias) and the missing responses (non-response bias). This
particular idea corresponds to the second of Gal’s (2002) “worry questions” which he
promoted as supporting “the process of critical evaluation of statistical messages and
[leading] to the creation of more informed interpretations and judgments” (p. 17).
Harford cautions the reader not to be seduced by big data, highlighting that a thirst for
N=All might lead us to an acceptance of N=Everyone who has signed up for a particular
service. Such a compromise is risky and, despite perhaps having a huge dataset, will
inevitably lead to misleading findings. For example, a sentiment analysis of tweets on
Twitter will only give us a snapshot of Twitter users’ thoughts on a topic of interest, and
the snapshot is unlikely to resemble that of non-Twitter users.
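The arithmetic behind this warning can be sketched briefly. The following Python fragment is an illustration under stated assumptions only: the group sizes and the shares holding a favourable view are invented, and the names platform_users, non_users and favourable_share are hypothetical.

```python
# Hypothetical population, invented for illustration: each group is
# (number of people, share holding a favourable view of some topic).
platform_users = (2_000_000, 0.70)
non_users = (8_000_000, 0.40)


def favourable_share(*groups):
    """Return the share of people with a favourable view across the given groups."""
    total = sum(n for n, _ in groups)
    favourable = sum(n * share for n, share in groups)
    return favourable / total


# What "N = everyone who has signed up" would tell us.
users_only = favourable_share(platform_users)

# What "N = All" would tell us, if we could observe it.
everyone = favourable_share(platform_users, non_users)

print(f"Platform users only: {users_only:.2f}")
print(f"Whole population:    {everyone:.2f}")
```

In this invented setting the users-only figure overstates the population figure simply because the two groups differ; collecting more data from the same platform would not close the gap.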
Examining the black hole of big data, Rule 7: Demand Transparency When the Computer
Says No reminds us that we need to exercise caution when interpreting the output of
‘mysterious’ algorithms which, having been fed with large amounts of data, are
increasingly being used in decision-making. Harford provides several stories where
questionable, and often damaging, judgments were made. Quoting respected statistician
and fellow OBE Sir David Spiegelhalter, “There are a lot of small data problems that occur
in big data. They don’t disappear because you’ve got lots of the stuff. They get worse.” (in
Harford, 2014, p. 15). This echoes Gould’s (2017) enhanced definition of statistical literacy
(SL), in which he argues that “Big data are ubiquitous in our society, and developing SL in the context of big data
is equally important as developing SL with more traditional data types” (p. 24). Big data
therefore deserves scrutiny in all statistics classrooms.
Harford’s Rule 8: Don’t Take Statistical Bedrock for Granted opens with the story of how
and why a woman, Alice Rivlin, would become the first director of the Congressional
Budget Office (CBO), the agency which would provide budgetary advice to Congress.
Agencies such as the CBO tend to be taken for granted, likened in many ways to our
sewerage systems which are inclined to suffer from neglect until a problem arises. We
learn how global leaders distrust statistical agency predictions that don’t conform to their
own beliefs, much like the biases mentioned earlier, but perhaps with more disastrous
consequences. What we also discover is that independent statistical agencies are essential
if we are to understand the world in which we live. Today, with increased access to data,
there is a need to engage with the proposal of Gal and Ograjenšek (2017) to further
conceptualize the skills required to develop official statistics literacy. It would be fair to
say that official statistics are not perfect; they can certainly be tweaked and distorted by
those in power. However, we need to stand in unity with those honest and forthright
statisticians who have been threatened and commend them for staying true to their cause.
The deceptive beauty of data visualizations comes under the microscope in Rule 9:
Remember that Misinformation Can Be Beautiful, Too. “Familiarity with graphical and
tabular displays and their interpretation” is the third component of the statistical
knowledge base outlined by Gal (2002, p. 11), otherwise known as “Document Literacy
tasks [which] require people to identify, interpret, and use information given in lists,
tables, indexes, schedules, charts and graphical displays” (p. 8). Creating graphs and
interpreting them is commonplace in statistics classrooms. But do we spend enough time
highlighting how we can be manipulated by crafty data visualizations? Referencing WW1
battleships and their clever attempts at misdirection, Harford suggests that visualizations
can use “[dazzle] camouflage [that is] intended to provoke misjudgments” (p. 219). Dazzle
camouflage, in a statistics context, refers to a graph so stunning that you forget it’s telling
you a load of nonsense. From the dodgy pie chart to emotive renditions of a simple bar
graph, colour, scale, units, and other cunning manipulations take a statistician’s best
friend and turn it into a beautifully photoshopped catastrophe. But, when something
looks so good, how can we motivate anyone to take their eyes off the shine for long enough
to recognise they are being serenaded by a statistical sea siren?
Harford’s final rule, to Keep an Open Mind, draws many threads together effectively
through a simple example. Imagine you are at a wedding and, in conversation with those
at your table, you predict whether or not you think the marriage will last. The tendency is
for us to consider what we know about the couple, searching our feelings and pondering
our personal experience. Harford advocates for us to activate Rule 4 and to Step Back and
Enjoy the View by considering the rate of failed marriages in the population of interest.
Not an easy feat. How to define ‘marriage’? And should one consider marriages between
people of the same age, education level, …, as the couple in question?
Wrapping Up
The common thread throughout this review, and indeed throughout Tim Harford’s
engaging book, is the need to inspire curiosity. We see this in statistics education all the
time: the ultimate challenge is how to motivate statistical consumers to take a critical
stance such that “...adults hold a propensity to adopt, without external cues, a questioning
attitude towards quantitative messages that may be misleading, one-sided, biased, or
incomplete in some way, whether intentionally or unintentionally” (Gal, 2002, p. 18).
Harford’s Golden Rule: Be Curious emphasises this goal. Everyone should read The Data
Detective (it’s a valuable and highly accessible resource) but the chances are that, until
curiosity is piqued in all statistical consumers, Harford may fail to reach his intended,
broad audience. If so, the task of encouraging curiosity among statistical consumers falls
to us, the statistics educators. For us, at least, The Data Detective will represent a highly
practical and engaging tool in our statistical literacy education toolbox.