Monday, October 21, 2024

Review of Tim Harford’s The Data Detective

Statistical Literacy – the Golden Rules

A Review of Tim Harford’s The Data Detective: Ten Easy Rules to Make Sense of Statistics

Stephanie Budgett1

The University of Auckland

Amy Renelle2

The University of Auckland

Curiosity is the wick in the candle of learning

~William Arthur Ward

Setting the Scene

As statistics educators, our overarching goal is to instill in our students an appreciation for the stories that can be told from data. Storytelling, however, is never straightforward. Surely we can all remember times when we have been in thrall to an experienced raconteur with a penchant for embellishing narratives with a healthy sprinkling of imagination and exaggeration in order to pique the interest of their readers or listeners. Given that we are living in a “post-truth” world, where “alternative facts” and “fake news” permeate the airwaves, equipping students with the skills to distinguish fact from fiction is more important than ever (Ridgway, Nicholson, & Stern, 2017).

Seventy years ago, in his presidential address to the American Statistical Association, Samuel S. Wilks commented, “Perhaps H. G. Wells was right when he said, ‘statistical thinking will one day be as necessary for efficient citizenship as the ability to read and write’!” (Wilks, 1951, p. 5). Fast-forward four decades to Katherine Wallman’s presidential address, in which she offered her perspective on the breadth of views about the definition of statistical literacy from within and beyond the statistics education discipline: “‘Statistical Literacy’ is the ability to understand and critically evaluate statistical results that permeate our daily lives – coupled with the ability to appreciate the contributions that statistical thinking can make in public and private, professional and personal decisions” (Wallman, 1993, p. 1).

1 s.budgett@auckland.ac.nz

2 alin717@aucklanduni.ac.nz

The Mathematics Enthusiast, ISSN 1551-3440, vol. 20, nos. 1, 2 & 3, pp. 256-265.

2023 © The Authors & Department of Mathematical Sciences, University of Montana

Since then, there has been much debate regarding the definition of statistical literacy. Jane Watson (1997) initially developed a view of statistical literacy that was centered on media reports and focused on the data consumer, later widening her definition to incorporate awareness and experience of how data is produced (Watson, 2013). In 2002, Iddo Gal conceptualized statistical literacy by proposing a framework composed of knowledge elements and dispositional elements. He contended that statistical literacy applied to data consumers, and described it as the ability to interpret and critically evaluate statistically based information from a wide range of sources and to articulate a reasoned opinion based on such information. He stated, “[I]t follows that adults should maintain in their minds a list of ‘worry questions’ regarding statistical information being communicated or displayed” (Gal, 2002, p. 17). Many others have contributed to the statistical literacy debate (e.g., Garfield & Ben-Zvi, 2007; Rumsey, 2002; Sharma, 2017; Snell, 2003; Utts, 2003; Weiland, 2017).

Rob Gould’s (2010) visionary reflection on the style and content of statistics courses noted that things needed to change if we were to move with the times. His three challenges to the statistics education community were to (1) Redefine Data, recognizing that data need not be confined to ‘numbers’; (2) Create Citizen Statisticians, recognizing that the separation of consumer and producer makes little sense in today’s world, when students are both consuming and producing data; and (3) Teach Technology, recognizing that, in order to access and make sense of the plethora of freely available data, some coding and programming skills are required. More recently, he proposed an augmented definition of statistical literacy incorporating data literacy, in recognition of the growing significance of data in our daily lives (Gould, 2017).

Yes, data and information permeated our daily lives in 1993. But today, data is omnipresent, and the types of data to which we are exposed are changing constantly, as noted by Gould (2017). We are living in a world awash with data and information. Every day we consume data and information to make everyday decisions; it cannot be avoided. We consult interactive websites and information dashboards, manipulating complex multivariate data to discover things about the world around us. Given the information-laden society we now live in, everyone needs a level of statistical literacy to make sense of data. In Gould’s words, there is a need for Citizen Statisticians. Nor is it enough to consume data and information passively: as Roger Peng reflects, “[I]n the future, everyone will need some data analysis skills” (Gould et al., 2018).

All of the above raises the question: how do we equip citizens to be statistically literate? Unfortunately, we don’t have the answer. However, respected and talented communicators, such as Tim Harford, can help us on our journey to improve everyone’s statistical literacy.

Harford is the recipient of multiple awards, largely thanks to his many contributions to improving public understanding of economics. His engaging and entertaining prose draws the curious reader in, effectively bridging the gap between academic statistics education researchers and Joe Public. He was awarded an OBE in 2019 for “services to improving economic understanding”. As a senior columnist at the Financial Times and presenter of BBC Radio 4’s popular investigative programme More or Less, he is a familiar media figure in the UK. Others will know him from his previous books, such as The Undercover Economist and Fifty Things That Made the Modern Economy.

In his most recent book, The Data Detective: Ten Easy Rules to Make Sense of Statistics (the North American edition; elsewhere it is published as How to Make the World Add Up: Ten Rules for Thinking Differently About Numbers), Tim Harford provides his audience with a list of ‘rules’ to follow when dealing with statistical claims. The rules are cleverly illustrated with both entertaining anecdotes and respected academic research. Much of what is covered by Harford’s ten rules aligns with the human behaviours that academics have identified as crucial in the development of statistical literacy.

Let’s take a closer look, through the collective lenses of two statistics educators.

Harford’s Rules

Harford’s first two rules, Rule 1: Search Your Feelings, and Rule 2: Ponder Your Personal Experience, refer to our emotions and our individual knowledge. Going somewhat against the grain, Rule 1 proposes that, when presented with a new piece of information, we should ask ourselves: “How does it make me feel?” This advice seems contrary to the belief that decision-making should be based on statistical evidence rather than on one’s emotions. Indeed, much is known about the biases we succumb to when we use heuristic reasoning to make decisions under uncertainty (Tversky & Kahneman, 1974). If a new piece of information appeals to us or aligns with our prior beliefs, we are likely to look for reasons to believe it; if not, we are likely to look for reasons to challenge it. Rule 1 suggests that, rather than reacting blindly on the basis of a particular emotion, we take time to understand where that emotion came from.

Rule 2, Ponder Your Personal Experience, highlights the fact that an individual worm’s-eye view can often conflict with the bird’s-eye view that statistics can offer. Harford’s own experience of commuting seemed at odds with the occupancy statistics provided by Transport for London. As Hans Rosling noted, our instincts based on personal experience can serve to distort our view of the world (Rosling et al., 2018). Relying on the worm’s-eye view alone is likely to give us a warped sense of reality; somehow amalgamating the worm’s-eye and bird’s-eye views can contribute to a deeper understanding of the underlying situation.

Rule 3, Avoid Premature Enumeration, aligns with the important idea of defining measurements. Let’s start with a deceptively simple question: how would you define ‘sheep’? Rule 3 is reminiscent of Jessica Utts’ Critical Component 3, which holds that sound statistical studies and accompanying media reports should detail “[the] exact nature of the measurements made, or the questions asked” (Utts, 2014, p. 20, italics as in original). Harford provides an entertaining example, courtesy of Michael Blastland, creator of BBC Radio 4’s More or Less, of how complex defining measures can be. How many sheep are in the image below (Figure 1)? Is a lamb a sheep? At what point is an unborn lamb a sheep? Is the correct answer one? Or two? Maybe two and a half? Or three? Figuring out what reported statistics are referring to is a vital first step in being able to draw appropriate conclusions. In addition, it is important to pay attention to definitions when comparisons are being made. If comparisons are made over time, have definitions changed? If comparisons are made between countries, do those countries share common definitions of what is being compared?

Figure 1. Defining ‘sheep’?

Harford describes several scenarios in which issues with definitions contributed to inaccurate claims and conclusions. Examples include infant mortality, violence, gun deaths and self-harm, not to mention ‘sheep’! Yet, the question remains: how can we, as statistics educators, encourage consumers of statistics (i.e., everyone!) to be curious about definitions?

Once the definitional aspect is sorted and we know what we are looking at, we need to figure out how closely to look at it. Focusing on the difference between taking an individual view and an aggregate standpoint, Rule 4: Step Back and Enjoy the View demonstrates how our inference may change depending on how closely we examine the picture. What we see often depends on the time frame we look at; for example, UK unemployment (once we understand the definition being used) may have increased over the past month but decreased since the 1980s, yet increased since the end of WW2. Politicians are notorious for carefully picking two points in time that conveniently portray the statistical story that supports their argument; see Young (2012) for a media report describing precisely this. Harford describes the media hype surrounding a story, in April 2018, that London’s murder rate had surpassed that of New York. While the claim was true at that specific point in time, stepping back showed that monthly fluctuations can paint a very different picture from that of trends across a longer period of time. In the absence of rules or guidelines indicating how to pick points in time to make comparisons, the only way to avoid succumbing to media spin is for the reader (or listener) to ask questions. Why compare these specific times? What does the overall trend look like? Critically questioning claims is a key attribute of a statistically literate person (e.g., Gal, 2002).
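To make the point about cherry-picked comparison points concrete, here is a minimal Python sketch. The unemployment figures are invented for illustration (they are not real UK statistics), and the helper `change_since` is a hypothetical name; the only claim is that the same current value can look like a rise or a fall depending on which earlier point it is compared with.

```python
# Hypothetical unemployment rates (%) at a few chosen points in time.
# These numbers are INVENTED for illustration; they are not real UK data.
rates = {
    "end of WW2": 2.0,
    "mid-1980s": 11.0,
    "last month": 4.0,
    "this month": 4.2,
}


def change_since(series, start, end="this month"):
    """Return the percentage-point change from `start` to `end`, rounded to 0.1."""
    return round(series[end] - series[start], 1)


# The same current figure, compared against three different starting points.
for start in ["last month", "mid-1980s", "end of WW2"]:
    diff = change_since(rates, start)
    direction = "up" if diff > 0 else "down"
    print(f"Since {start}: {direction} {abs(diff)} points")
```

Running it prints "Since last month: up 0.2 points", "Since mid-1980s: down 6.8 points" and "Since end of WW2: up 2.2 points". A reader shown only one of these lines hears only one of three stories, which is why asking "why these two points?" matters.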

Harford’s fifth rule, Rule 5: Get the Backstory, casts the spotlight on the idea that, just as happens in the media, novel and exciting scientific findings are more likely to be published in academic journals than dull and uninteresting ones. The media rarely, if ever, provide the backstory to a scientific finding. Without doing their own background research, readers are ill-equipped to place specific findings in a wider context, or to consider how those findings compare or contrast with previous related discoveries. To Harford’s credit, he considers the findings from the numerous studies he has presented in his previous chapters and asks how he knows that those studies were credible. His answer? “I cannot be certain” (p. 131). He suggests that discerning the good from the bad in science journalism may be possible by asking a few questions, which will be familiar to those with some knowledge of the statistical literacy research base (e.g., Gal’s (2002) “worry questions” and Utts’ (2004) Seven Critical Components).

Rule 6: Ask Who Is Missing captures the spirit of the statistical concept of representativeness. Through a series of illuminating examples, Harford demonstrates that much ‘research-based accepted wisdom’ may not be entirely what we thought it was. An increasing awareness of studies involving WEIRD subjects (Western, Educated, from Industrialised Rich Democracies) calls into question the relevance of the findings for non-WEIRD groups. Harford describes the impact of not paying enough attention to the missing people (selection bias) and the missing responses (non-response bias). This particular idea corresponds to the second of Gal’s (2002) “worry questions”, which he promoted as supporting “the process of critical evaluation of statistical messages and [leading] to the creation of more informed interpretations and judgments” (p. 17).

Harford cautions the reader not to be seduced by big data, highlighting that a thirst for N=All might lead us to accept N=Everyone who has signed up for a particular service. Such a compromise is risky and, despite perhaps yielding a huge dataset, is likely to lead to misleading findings. For example, a sentiment analysis of tweets on Twitter will only give us a snapshot of Twitter users’ thoughts on a topic of interest, and that snapshot is unlikely to resemble the views of non-Twitter users.
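The difference between N=All and N=Everyone who has signed up can be illustrated with a small Python sketch. The population below is entirely hypothetical (1,000 invented people, with invented opinions and platform usage); it simply shows how a dataset drawn only from a platform’s users can misstate the wider population’s view.

```python
# A hypothetical population of 1,000 people. Each record is
# (in_favour, uses_platform); all counts are INVENTED for illustration.
population = (
    [(True, True)] * 300      # platform users in favour
    + [(False, True)] * 100   # platform users against
    + [(True, False)] * 200   # non-users in favour
    + [(False, False)] * 400  # non-users against
)


def share_in_favour(people):
    """Proportion of the given people who are in favour."""
    return sum(in_favour for in_favour, _ in people) / len(people)


# "N = everyone who signed up": only the people observable on the platform.
platform_users = [p for p in population if p[1]]

print(f"Whole population in favour: {share_in_favour(population):.0%}")
print(f"Platform users in favour:   {share_in_favour(platform_users):.0%}")
```

Here the platform data report 75% in favour while the whole population sits at 50%. Multiplying every count by 1,000 would leave both proportions unchanged: collecting more of the same selected sample makes the dataset bigger but does not remove the selection bias.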

Examining the black hole of big data, Rule 7: Demand Transparency When the Computer Says No reminds us that we need to exercise caution when interpreting the output of ‘mysterious’ algorithms which, having been fed with large amounts of data, are increasingly being used in decision-making. Harford provides several stories in which questionable, and often damaging, judgments were made. He quotes respected statistician and fellow OBE Sir David Spiegelhalter: “There are a lot of small data problems that occur in big data. They don’t disappear because you’ve got lots of the stuff. They get worse.” (in Harford, 2014, p. 15). As Gould (2017) argues in his enhanced definition of statistical literacy (SL), “Big data are ubiquitous in our society, and developing SL in the context of big data is equally important as developing SL with more traditional data types” (p. 24). Big data therefore deserves scrutiny in all statistics classrooms.

Harford’s Rule 8: Don’t Take Statistical Bedrock for Granted opens with the story of how and why a woman, Alice Rivlin, came to be the first director of the Congressional Budget Office (CBO), the agency that provides budgetary advice to Congress. Agencies such as the CBO tend to be taken for granted; in many ways they are like our sewerage systems, which are inclined to suffer from neglect until a problem arises. We learn how some global leaders distrust statistical agency predictions that don’t conform to their own beliefs, much like the biases mentioned earlier, but perhaps with more disastrous consequences. What we also discover is that independent statistical agencies are essential if we are to understand the world in which we live. Today, with increased access to data, there is a need to engage with the proposal of Gal and Ograjenšek (2017) to further conceptualize the skills required to develop official statistics literacy. It would be fair to say that official statistics are not perfect; they can certainly be tweaked and distorted by those in power. However, we need to stand in unity with those honest and forthright statisticians who have been threatened, and commend them for staying true to their cause.

The deceptive beauty of data visualizations comes under the microscope in Rule 9: Remember that Misinformation Can Be Beautiful, Too. “Familiarity with graphical and tabular displays and their interpretation” is the third component of the statistical knowledge base outlined by Gal (2002, p. 11), and is closely tied to what he calls “Document Literacy tasks [which] require people to identify, interpret, and use information given in lists, tables, indexes, schedules, charts and graphical displays” (p. 8). Creating graphs and interpreting them is commonplace in statistics classrooms. But do we spend enough time highlighting how we can be manipulated by crafty data visualizations? Referencing WW1 warships and their clever attempts at misdirection, Harford suggests that visualizations can use “[dazzle] camouflage [that is] intended to provoke misjudgments” (p. 219). Dazzle camouflage, in a statistics context, refers to a graph so stunning that you forget it’s telling you a load of nonsense. From the dodgy pie chart to emotive renditions of a simple bar graph, colour, scale, units, and other cunning manipulations take a statistician’s best friend and turn it into a beautifully photoshopped catastrophe. But when something looks so good, how can we motivate anyone to take their eyes off the shine for long enough to recognise they are being serenaded by a statistical sea siren?

Harford’s final rule, Rule 10: Keep an Open Mind, draws many threads together effectively through a simple example. Imagine you are at a wedding and, in conversation with those at your table, you predict whether the marriage will last. The tendency is for us to consider what we know about the couple, searching our feelings and pondering our personal experience. Harford advocates that we activate Rule 4, Step Back and Enjoy the View, by considering the rate of failed marriages in the population of interest. Not an easy feat. How do we define ‘marriage’? And should one consider marriages between people of the same age, education level, …, as the couple in question?

Wrapping Up

The common thread throughout this review, and indeed throughout Tim Harford’s engaging book, is the need to inspire curiosity. We see this in statistics education all the time: the ultimate challenge is how to motivate statistical consumers to take a critical stance such that “...adults hold a propensity to adopt, without external cues, a questioning attitude towards quantitative messages that may be misleading, one-sided, biased, or incomplete in some way, whether intentionally or unintentionally” (Gal, 2002, p. 18). Harford’s Golden Rule: Be Curious emphasises this goal. Everyone should read The Data Detective, as it is a valuable and highly accessible resource, but the chances are that, until curiosity is piqued in all statistical consumers, Harford may fail to reach his intended, broad audience. If so, the task of encouraging curiosity among statistical consumers falls to us, the statistics educators. For us, at least, The Data Detective will represent a highly practical and engaging tool in our statistical literacy education toolbox.