Emerald Quality Service - Home
"E" Data Survey Analysis
Presentations and Seminars
Calendar and Events
Contact Emerald Quality Services

Guidelines for Reporting Data (How Not to Lie with Graphs)

Karen Scheltema, MA, MS - Statistician, HealthEast, St. Paul, MN
Mary Larweck, BSN, MS, CIC, CPHQ - Consultant, Emerald Quality Services, Mpls, MN
 

We've all heard the phrase, "There's lies, damned lies and statistics." It is not surprising that there are many ways to "lie" with graphs. We believe the best way to avoid "lying" to ourselves is to understand how data can be distorted, so that we know what to look for and what to avoid.

Data are used to support decision making every day. We all attend meetings where displays of tables of data are met with the question, "What does this mean?" This query can be followed with as many interpretations as there are people in the room. When data are displayed in graphs or charts, misunderstandings can be minimized. Use of graphs can improve the quality of shared information and therefore improve the quality of decisions. The following paragraphs are guides for the use of graphics in reporting data.

A visual example of the benefit of graphic data display is provided in Figures 1 and 2 (Anscombe's Quartet). Figure 1 shows four tables of numbers, all with the same number of items, the same means, and the same standard deviations. Figure 2 shows the same data as graphs. Which format do you find the most informative?

Figure 1 - Anscombe's Quartet - Table (four data sets: Data 1, Data 2, Data 3, Data 4)
For all four data sets: N=11, X has a mean of 9.0 and standard deviation of 3.2, and Y has a mean of 7.5 and standard deviation of 1.9.
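The shared summary figures can be checked with a short script. The sketch below is in Python and uses the values as published in Anscombe (1973). It assumes the standard deviations quoted above are population standard deviations, since that is what reproduces the rounded 3.2 and 1.9.

```python
from statistics import mean, pstdev

# Values as published in Anscombe (1973). Data sets 1-3 share the same x values.
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
x4 = [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8]

quartet = {
    "Data 1": (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "Data 2": (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "Data 3": (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "Data 4": (x4, [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def summarize(xs, ys):
    """Return (n, mean of x, SD of x, mean of y, SD of y), rounded to one decimal."""
    # pstdev is the population standard deviation (an assumption, see lead-in).
    return (
        len(xs),
        round(mean(xs), 1),
        round(pstdev(xs), 1),
        round(mean(ys), 1),
        round(pstdev(ys), 1),
    )


summaries = {name: summarize(xs, ys) for name, (xs, ys) in quartet.items()}
for name, summary in summaries.items():
    print(name, summary)
```

All four data sets produce the same rounded summary, which is why the tables alone give no hint of how different the data look when graphed.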
 
Figure 2 - Anscombe's Quartet - Graphs

Anscombe, F.J. (1973). Graphs in Statistical Analysis. American Statistician, 27, 17-21.
 
When using graphs keep these guidelines in mind:
Use more graphs instead of tables.
Use consistency of size, type, scale & labels when graphs are in the same article and/or on the same page.
Use run charts rather than bar charts whenever the data allow - run charts show trends & subtle changes very well.
Understand the difference between means and medians. It may be helpful to display both.
Graphing meaningful data requires more than two data points!
Always indicate the number in the sample or population represented in the graph.
 

Consistency:

Graphs without titles and axis labels can be difficult to understand. The y-axis label should state the unit of measurement (e.g., percentage, days, infections).

Data can appear distorted when inconsistent graph formats are used in the same article and/or on the same page. When graphs are embedded in text it is helpful for them to be the same size. When multiple graphs of differing sizes are presented, larger graphs have more impact. Readers' eyes are drawn to larger graphs, and differences can appear more significant than they really are.

A graph can also be made to stand out by giving it a smaller range on the y-axis, which makes differences look bigger. Conversely, if the message you wish to convey is that a difference is not important, use a larger range on the y-axis. The three graphs below illustrate how manipulating the range of the y-axis changes the visual interpretation of the graph.
 

Figure 3 - Three graphs of the same data with different y-axes.
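The effect of the y-axis range can also be put in numbers. The sketch below is a rough illustration in Python, not the data behind Figure 3: the two values and the three axis ranges are hypothetical, as is the helper name `share_of_axis`. It computes how much of the axis height the same difference occupies under each range.

```python
def share_of_axis(low, high, y_min, y_max):
    """Fraction of the y-axis height taken up by the gap between two values."""
    return (high - low) / (y_max - y_min)


# Hypothetical values: a rate that rises from 28% to 32%.
before, after = 28, 32

# Three hypothetical y-axis ranges applied to the same pair of values.
ranges = {
    "narrow (25 to 35)": (25, 35),
    "medium (0 to 50)": (0, 50),
    "full (0 to 100)": (0, 100),
}

shares = {
    label: share_of_axis(before, after, y_min, y_max)
    for label, (y_min, y_max) in ranges.items()
}
for label, share in shares.items():
    print(f"{label}: the difference fills {share:.0%} of the axis height")
```

The same four-point difference fills a large part of the narrow axis and only a sliver of the full one, which is the distortion to watch for when graphs with different ranges share a page.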


 
Three-dimensional graphs can be misleading in several ways. When a data point is displayed as a 3-D bar, the extra depth reads as extra height, leaving the visual impression that the value is higher than it really is. A 3-D display can also leave the impression that the difference between values is smaller than it really is. Lastly, 3-D graphs make it difficult to estimate the values of the points in the graph. Look at the 3-D graph and estimate the 1996 value. Now look at the 2-D graph and estimate the 1996 value. Both are 30%.
 
Figure 4 - A 2-D and a 3-D graph of the same data.

 

Do I Use the Mean or the Median?

Too often, this question is not asked. Means are reported without consideration of whether they are appropriate measures of central tendency. One outlier can pull the average far away from where the bulk of the sample lies. An example of this can be seen in data about lengths of stay for newborns. The typical stay is two days long. However, there are a few cases where the length of stay is several weeks because of premature births. The data below illustrate this example:

1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 40

Of 20 births, 18 stays were two days, one was one day, and one was 40 days. The mean of these data is 3.85 days. Clearly, that is not the usual experience for a newborn in the hospital. The median of 2 days more accurately represents the typical experience.
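The arithmetic for this example can be reproduced in a few lines. This is a minimal sketch in Python using the standard `statistics` module; the variable names are our own.

```python
from statistics import mean, median

# Lengths of stay in days for the 20 births described above.
stays = [1] + [2] * 18 + [40]

average = mean(stays)   # pulled upward by the single 40-day stay
middle = median(stays)  # unaffected by the outlier

print(f"n = {len(stays)}, mean = {average}, median = {middle}")
```

Removing the 40-day stay from the list drops the mean to about 1.95 days while the median stays at 2, which shows how much one outlier drives the average.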

The bell curve is the visual depiction of statistical normality. The median is in the middle of the curve, and in fact, the mean is the same as the median. The tails on either side of the mean and median are symmetrical. A skewed distribution, on the other hand, has one tail longer than the other and the mean does not equal the median.
 

Figure 5 - A normal and a skewed curve.

A general rule of thumb is that when the data are normally distributed (the tails are symmetrical, and the mean and median are close), report the mean. When the data are skewed (one tail is longer than the other), report the median.
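One way to turn this rule of thumb into a quick check is sketched below in Python. It uses Pearson's median skewness, 3 x (mean - median) / standard deviation, as a rough measure of how far apart the mean and median are. The function name `suggest_center` and the cutoff of 0.5 are hypothetical choices for illustration, not an established standard, and a look at the graphed distribution should still come first.

```python
from statistics import mean, median, pstdev


def suggest_center(values, threshold=0.5):
    """Suggest 'mean' or 'median' based on Pearson's median skewness.

    The threshold of 0.5 is a hypothetical cutoff chosen for illustration.
    """
    spread = pstdev(values)  # population standard deviation
    if spread == 0:
        return "mean"  # all values are identical, so mean and median coincide
    skew = 3 * (mean(values) - median(values)) / spread
    return "median" if abs(skew) > threshold else "mean"


newborn_stays = [1] + [2] * 18 + [40]  # the skewed example from this article
symmetric = [1, 2, 3, 4, 5]            # hypothetical symmetric data

print("newborn stays:", suggest_center(newborn_stays))
print("symmetric data:", suggest_center(symmetric))
```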

Graphing Data Points:

Use of Run Charts vs. Bar Charts

When graphing data before and after a change, you need to graph the process over time, not just a before and an after data point.
 
Figure 6 - Bar charts before and after a change.
 

 
Figure 7 - Run charts - Improvement not sustained.
Figure 8 - Run charts - Process improvement before the change.

These three graphs show the same results for Week 3 and Week 7, but knowing what happened over time tells a lot more about what is happening with the process. Is the improvement sustained? Is it normal variation and not really improvement? Did the process get better before the change?
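The point can be illustrated with made-up numbers. In the Python sketch below, the three weekly series are hypothetical (they are not the data behind Figures 6 to 8), and the change is assumed to happen after Week 5. A before/after comparison of Week 3 and Week 7 sees the same two values in every case, while the full series tell three different stories.

```python
# Hypothetical weekly values for weeks 1-9; a change is assumed after week 5.
processes = {
    "improvement not sustained": [50, 52, 50, 51, 49, 33, 30, 42, 50],
    "improving before the change": [60, 55, 50, 45, 40, 35, 30, 25, 20],
    "normal variation": [40, 30, 50, 35, 45, 50, 30, 45, 35],
}


def before_after(values, before_week=3, after_week=7):
    """Return the two values a before/after bar chart would show."""
    return values[before_week - 1], values[after_week - 1]


snapshots = {name: before_after(values) for name, values in processes.items()}

for name, values in processes.items():
    print(f"{name}: bar chart shows {snapshots[name]}, run chart shows {values}")
```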

Run charts that display data on a monthly or quarterly basis quickly give a picture of variation and trends. Annual summaries often obscure process variation. When implementing an intervention, it is tempting to measure only two time points, pre and post. Measuring data over several time points allows clearer analysis of the effects of the intervention. Figure 9 shows how yearly bar charts miss important changes in variation.
 

Figure 9 - Bar chart and run chart of the same data.
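A small numeric sketch shows how an annual summary can hide variation. The monthly values below are hypothetical and are not the data behind Figure 9. Both years have the same annual average, so yearly bars would look identical, yet the month-to-month spread is very different.

```python
from statistics import mean

# Hypothetical monthly values for two years with the same annual average.
monthly = {
    "Year 1": [20, 21, 19, 20, 20, 21, 19, 20, 20, 21, 19, 20],
    "Year 2": [10, 30, 12, 28, 15, 25, 20, 35, 5, 22, 18, 20],
}

summary = {
    year: {
        "annual mean": mean(values),                  # what a yearly bar shows
        "monthly range": max(values) - min(values),   # what a run chart reveals
    }
    for year, values in monthly.items()
}

for year, stats in summary.items():
    print(year, stats)
```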


 

Conclusion:

When using graphs keep these guidelines in mind:
Avoid 3-D graphs.
Label all axes.
State the sample or population size.
When multiple graphs are on the same page, use the same y-axis range as much as possible.
Understand the difference between mean and median. Displaying both is acceptable where indicated.
Graph data over time to show what is happening to a process over time. Two data points are not usually sufficient.
Run charts depict processes and the effects of changes better than bar charts.
 

 