Guidelines for Reporting Data (How Not to Lie with Graphs)

Karen Scheltema, MA, MS, Statistician, HealthEast, St. Paul, MN
Mary Larweck, BSN, MS, CIC, CPHQ, Consultant, Emerald Quality Services, Minneapolis, MN

We've all heard the phrase, "There are lies, damned lies, and statistics."
It is not surprising that there are many ways to "lie" with graphs.
We believe the best way to avoid "lying" to ourselves is to know
how data can be distorted, so that we know what to look for and what
to avoid.

Data are used to support decision making every day. We all attend
meetings where tables of data are met with the question,
"What does this mean?" This query can be followed by as many
interpretations as there are people in the room. When data are
displayed in graphs or charts, misunderstandings can be minimized.
Graphs can improve the quality of shared information and
therefore the quality of decisions. The following paragraphs
are guides for the use of graphics in reporting data.

A visual example of the benefit of graphic data display is provided
in Figures 1 and 2 (Anscombe's Quartet). Figure 1 shows four tables
of numbers, all with the same number of items, the same means, and
the same standard deviations. Figure 2 is the graphic display of the
same data. Which format do you find more informative?

Figure 1 - Anscombe's Quartet: Table

   Data 1          Data 2          Data 3          Data 4
   X       Y       X       Y       X       Y       X       Y
  10    8.04      10    9.14      10    7.46       8    6.58
   8    6.95       8    8.14       8    6.77       8    5.76
  13    7.58      13    8.74      13   12.74       8    7.71
   9    8.81       9    8.77       9    7.11       8    8.84
  11    8.33      11    9.26      11    7.81       8    8.47
  14    9.96      14    8.10      14    8.84       8    7.04
   6    7.24       6    6.13       6    6.08       8    5.25
   4    4.26       4    3.10       4    5.39      19   12.50
  12   10.84      12    9.13      12    8.15       8    5.56
   7    4.82       7    7.26       7    6.42       8    7.91
   5    5.68       5    4.74       5    5.73       8    6.89

For all four data sets: N=11, X has a mean
of 9.0 and standard deviation of 3.2, and Y has a mean of 7.5 and
standard deviation of 1.9.
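
The summary figures above can be checked directly from the table. The following is a minimal Python sketch, not part of the original article, that uses only the standard library. It assumes the quoted standard deviations are population values (divide by N), since that is the formula that reproduces 3.2 and 1.9 after rounding; the name `summarize` is made up for this illustration.

```python
from statistics import fmean, pstdev

# Anscombe's Quartet, transcribed from Figure 1.
x_shared = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
quartet = {
    "Data 1": (x_shared, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96,
                          7.24, 4.26, 10.84, 4.82, 5.68]),
    "Data 2": (x_shared, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10,
                          6.13, 3.10, 9.13, 7.26, 4.74]),
    "Data 3": (x_shared, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84,
                          6.08, 5.39, 8.15, 6.42, 5.73]),
    "Data 4": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
               [6.58, 5.76, 7.71, 8.84, 8.47, 7.04,
                5.25, 12.50, 5.56, 7.91, 6.89]),
}


def summarize(x, y):
    """Return (n, mean_x, sd_x, mean_y, sd_y), rounded to one decimal place.

    Assumption: population standard deviation (pstdev), not sample.
    """
    return (len(x),
            round(fmean(x), 1), round(pstdev(x), 1),
            round(fmean(y), 1), round(pstdev(y), 1))


summaries = {name: summarize(x, y) for name, (x, y) in quartet.items()}
for name, stats in summaries.items():
    print(name, stats)
```

All four data sets produce the same rounded summary, which is why the table alone cannot distinguish them.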

Figure 2 - Anscombe's Quartet: Graphs


Anscombe, F.J. (1973). Graphs in Statistical Analysis. American Statistician, 27, 17-21.

When using graphs, keep these guidelines in mind:

Use graphs rather than tables wherever possible.

Keep size, type, scale, and labels consistent when graphs appear in the same article or on the same page.

Use run charts rather than bar charts whenever the data allow; run charts show trends and subtle changes very well.

Understand the difference between means and medians. It may be helpful to display both.

Graphing meaningful data requires more than two data points!

Always indicate the number in the sample or population represented in the graph.

Consistency:
Graphs without titles and axis labels can be difficult to understand.
The y-axis should be labeled with its unit (e.g., percentage, days, infections).

Data can appear distorted when inconsistent graph formats are
used in the same article or on the same page. When graphs are
embedded in text, it is helpful for them to be the same size. When
graphs of differing sizes are presented together, the larger graphs
have more impact: readers' eyes are naturally drawn to them, and
differences can appear more significant than they really are.

Another way to make a graph stand out is
to give it a smaller range on the y-axis. Conversely, if the message
you wish to convey is that a difference is not important, use a
larger range on the y-axis. The three graphs below illustrate how
manipulating the range of the y-axis changes the visual interpretation
of the graph.

Figure 3 - Three graphs of the same data with different y-axes.
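
The effect of the y-axis range can also be put in numbers. The sketch below is an illustration only, with made-up values: it computes what fraction of the plotted axis height is taken up by the gap between two data points. The function name `visual_share` and the 20% and 25% rates are assumptions for this example, not figures from the article.

```python
def visual_share(low, high, y_min, y_max):
    """Fraction of the plotted axis height occupied by the gap between two values."""
    if not (y_min <= low <= high <= y_max):
        raise ValueError("values must lie inside the axis range")
    return (high - low) / (y_max - y_min)


# Hypothetical data: a rate that rises from 20% to 25%.
before, after = 20, 25

# Three candidate y-axis ranges for the same two points.
for y_min, y_max in [(0, 100), (0, 50), (15, 35)]:
    share = visual_share(before, after, y_min, y_max)
    print(f"y-axis {y_min}-{y_max}: gap fills {share:.0%} of the axis height")
```

The same five-point difference fills 5%, 10%, or 25% of the axis height depending only on the range chosen, which is the distortion the three graphs illustrate.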


Three-dimensional graphs can be misleading
in several ways. A data point displayed as a 3D bar has extra
depth, which is drawn as added height, leaving the reader with
the visual impression that the value is higher than it really
is. 3D graphs can also leave the impression
that a difference is smaller than it really is. Lastly, 3D graphs
make it difficult to estimate the values of the points in the graph. Look
at the 3D graph and estimate the 1996 value. Now look at the 2D
graph and estimate the 1996 value. Both are 30%.

Figure 4 - A 2D and a 3D graph of the same data.

Do I Use the Mean or the Median?
Too often, this question is not asked. Means are reported without
considering whether they are appropriate measures of central
tendency. One outlier can pull the mean far away from
where the bulk of the sample lies. An example can be seen in
data about lengths of stay for newborns. The typical stay is two
days. However, in a few cases the length of stay
is several weeks because of premature birth. The data below illustrate
this example:
1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 40
Of 20 births, 18 stays were two days, one was one day, and one
was 40 days. The mean of these data is 3.85 days. Clearly, that is
not the usual experience for a newborn in the hospital. The median
of 2 days more accurately represents the typical experience.
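
The arithmetic above can be reproduced with Python's standard `statistics` module. This is a small sketch added for illustration; the variable name `stays` is an assumption.

```python
from statistics import mean, median

# Length of stay in days for the 20 newborns listed above:
# one 1-day stay, eighteen 2-day stays, and one 40-day stay.
stays = [1] + [2] * 18 + [40]

print(len(stays))     # 20
print(mean(stays))    # 3.85
print(median(stays))  # 2.0
```

The single 40-day stay accounts for more than half of the 77 total days, which is why the mean lands above 19 of the 20 observations.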

The bell curve is the visual depiction of statistical normality.
The median is in the middle of the curve, and in fact the mean
is the same as the median. The tails on either side of the mean
and median are symmetrical. A skewed distribution, on the other
hand, has one tail longer than the other, and the mean does not equal
the median.

Figure 5 - A normal and a skewed curve.

A general rule of thumb is that when the data are normally distributed
(the tails are symmetrical, and the mean and median are close),
report the mean. When the data are skewed (one tail is longer than
the other), report the median.
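
One rough way to apply this rule of thumb in code is sketched below. The cutoff is an assumption made for illustration, not a rule from this article: the data are treated as roughly symmetric when the mean and median differ by no more than a tenth of a standard deviation. The function name `suggest_center` and the `symmetric` sample are hypothetical. A histogram remains the better check.

```python
from statistics import fmean, median, pstdev


def suggest_center(values, tolerance=0.1):
    """Suggest 'mean' or 'median' using a crude symmetry check.

    Assumption (not from the article): data count as roughly symmetric
    when |mean - median| is at most `tolerance` standard deviations.
    """
    spread = pstdev(values)
    if spread == 0:
        return "mean"  # all values identical, so mean and median coincide
    gap = abs(fmean(values) - median(values)) / spread
    return "mean" if gap <= tolerance else "median"


symmetric = [1, 2, 2, 3, 3, 3, 4, 4, 5]  # hypothetical, symmetric about 3
newborns = [1] + [2] * 18 + [40]         # newborn length-of-stay data

print(suggest_center(symmetric))  # mean
print(suggest_center(newborns))   # median
```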

Graphing Data Points: Use of Run Charts vs. Bar Charts
When graphing data before and after
a change, graph the process over time, not just one data point before
and one after.

Figure 6 - Bar charts before and after a change.

Figure 7 - Run charts: Improvement not sustained.

Figure 8 - Run charts: Process improvement before the change.


All three graphs show the same results for Week 3 and
Week 7, but knowing what happened over time tells much more about what
is happening with the process. Is the improvement sustained? Is
it normal variation and not really improvement? Did the process
get better before the change?
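
The point can be made with a few invented series. In the sketch below, all numbers are hypothetical, including the assumption that the change is introduced after Week 5. Three different processes share the same Week 3 and Week 7 values, so a before-and-after bar chart would draw them identically.

```python
# Hypothetical weekly results (for example, percent of cases meeting a target).
# Assumption: the change is introduced after Week 5.
scenarios = {
    "sustained improvement":     [60, 61, 60, 59, 61, 79, 80, 81, 80],
    "improvement not sustained": [59, 61, 60, 60, 62, 85, 80, 70, 61],
    "improving before change":   [50, 55, 60, 65, 70, 75, 80, 85, 90],
}


def before_after(series, before_week=3, after_week=7):
    """Return the two values a before-and-after bar chart would display."""
    return series[before_week - 1], series[after_week - 1]


for name, series in scenarios.items():
    print(f"{name:27s} bars: {before_after(series)}  run: {series}")
```

Every scenario reduces to the same pair of bars, (60, 80). Only the full weekly series separates them.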

Run charts that display data on a monthly or quarterly basis quickly
give a picture of variation and trends. Annual summaries often obscure
process variation. When implementing an intervention, it is tempting
to measure only two time points, pre and post. Measuring data over
several time points allows clearer analysis of the effects of the
intervention. Figure 9 shows how yearly bar charts miss important
changes in variation.

Figure 9 - Bar chart and run chart of the same data.
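
A short sketch with invented monthly counts shows how an annual summary can hide a change in variation. The data and the labels "Year 1" and "Year 2" are hypothetical and are not the values behind Figure 9.

```python
from statistics import fmean

# Hypothetical monthly infection counts for two consecutive years.
monthly = {
    "Year 1": [10, 10, 11, 9, 10, 10, 10, 11, 9, 10, 10, 10],
    "Year 2": [4, 16, 5, 15, 3, 17, 6, 14, 2, 18, 8, 12],
}

# What a yearly bar chart shows (one number per year) versus
# what a monthly run chart would reveal (the spread within the year).
annual_mean = {year: fmean(counts) for year, counts in monthly.items()}
monthly_range = {year: (min(counts), max(counts)) for year, counts in monthly.items()}

for year in monthly:
    print(year, "annual mean:", annual_mean[year], "monthly range:", monthly_range[year])
```

Both years have the same annual mean, so yearly bars would be identical, yet the month-to-month swings in the second year are far larger.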

Conclusion:

When using graphs, keep these guidelines in mind:

Avoid 3D graphs.

Label all axes.

State the sample or population size.

When multiple graphs are on the same page, use the same y-axis range as much as possible.

Understand the difference between mean and median. Displaying both is acceptable where indicated.

Graph data over time to show what is happening to a process. Two data points are not usually sufficient.

Run charts depict processes and the effects of changes better than bar charts.

