Also known as a Box and Whisker Plot or a Whisker Plot.
A box plot is a graph that shows the distribution of numeric data values, in particular their center and spread. It is mainly used to explore data and to present it in a simple, understandable manner. Box plots are one of the easiest and most useful ways to understand and compare continuous data sets. They are widely used in statistics, scientific research, higher education, process improvement, and the social and human sciences.
Box plots are similar to histograms and provide similar information, but in a different graphical format. They provide a quick way of examining the central tendency, the amount of variation in the data, and the presence of unusual data points and outliers (outliers are values in the data set that are not consistent with the other observed values). For example, a box plot that spans a wider range indicates more variability.
In a box plot, the data is plotted in such a way that the bottom 25% and the top 25% of the data points are represented by the two whiskers, whereas the middle 50% of the data points are represented by the box. Key statistics can be indicated, including the median of the data, the maximum and minimum values, and the lower and upper quartiles. Outliers are usually plotted as asterisks.
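The statistics behind the box and whiskers can be sketched in a few lines of Python. This is a minimal illustration rather than a definitive method: the function names and the sample values are made up for this example, the quartiles use the "inclusive" interpolation of Python's statistics module (other tools may give slightly different quartiles), and outliers are flagged with the common 1.5 × IQR convention, which is an assumption not stated in the text above.

```python
import statistics

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) for a list of numbers."""
    data = sorted(values)
    # n=4 splits the data into quarters and returns the three cut points
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return data[0], q1, median, q3, data[-1]

def find_outliers(values, k=1.5):
    """Flag values more than k * IQR beyond the quartiles (assumed convention)."""
    _, q1, _, q3, _ = five_number_summary(values)
    iqr = q3 - q1  # interquartile range: the length of the box
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

# Hypothetical sample, chosen only to illustrate the calculation
sample = [1, 2, 3, 4, 5, 6, 7, 8, 9, 30]
print(five_number_summary(sample))  # (1, 3.25, 5.5, 7.75, 30)
print(find_outliers(sample))        # [30]
```

When outliers are drawn separately, many tools end the whisker at the most extreme value that is not an outlier, so the whisker in this sample would stop at 9 rather than 30.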
Box plots are most useful when comparing several data sets. They make it possible to compare the central tendency as well as the variability of multiple data sets. They are less detailed than histograms and take up less space, which makes them more practical when comparing multiple data sets. More advanced statistical tests can then be conducted to test the significance of the differences in terms of central tendency and variability.
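As a rough numeric counterpart to comparing box plots side by side, the sketch below prints the median (central tendency) and the interquartile range (variability) for each data set. The group names and values are hypothetical, and the summarize helper is an illustrative name, not part of any library.

```python
import statistics

def summarize(values):
    """Return the median and interquartile range of a list of numbers."""
    q1, median, q3 = statistics.quantiles(sorted(values), n=4, method="inclusive")
    return {"median": median, "iqr": q3 - q1}

# Hypothetical data sets, invented for illustration only
groups = {
    "group_a": [10, 12, 14, 16, 18],
    "group_b": [5, 10, 20, 30, 35],
}

for name, values in groups.items():
    s = summarize(values)
    print(f"{name}: median={s['median']}, IQR={s['iqr']}")
```

Here group_b has both a higher median and a much longer box (larger IQR) than group_a, which is the kind of difference a pair of box plots makes visible at a glance.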
Like histograms, box plots are best suited to moderate to large amounts of data. With a small data set, the size of the box plot can vary significantly from one sample to another. Individual value plots are preferred over box plots when representing small amounts of data.
Box plots can also show whether the distribution is symmetrical or skewed. In a symmetric distribution, the mean and median are nearly the same, and the two whiskers have almost the same length.
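The symmetry check described above can be illustrated numerically. The following is a small sketch under stated assumptions: the data sets are invented, symmetry_check is a hypothetical helper name, and the whiskers are taken to run all the way to the minimum and maximum (no outliers are separated out).

```python
import statistics

def symmetry_check(values):
    """Return the mean, median, and whisker lengths of a list of numbers."""
    data = sorted(values)
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return {
        "mean": statistics.mean(data),
        "median": median,
        "lower_whisker": q1 - data[0],   # from the minimum up to the box
        "upper_whisker": data[-1] - q3,  # from the box up to the maximum
    }

# Hypothetical data sets, invented for illustration only
symmetric = [2, 4, 6, 8, 10, 12, 14]
right_skewed = [2, 3, 4, 5, 6, 9, 20]

print(symmetry_check(symmetric))
print(symmetry_check(right_skewed))
```

For the symmetric set the mean equals the median and both whiskers have the same length. For the right-skewed set the mean is pulled above the median and the upper whisker is much longer than the lower one.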
Example – Crop Yield
The following box plots display the yield of a crop after applying two different fertilizers.
Fertilizer 2 appears to have a higher yield than Fertilizer 1. What other comments would you make about the above box plots? Think about the variation as well as the presence of any unusual values.
Example – Diabetes Test
The box plots below illustrate an analysis that was conducted to diagnose the presence of diabetes at a workplace.
It is evident that females generally have higher glucose levels than males. ANOVA can be used to test the significance of the difference between the two means.
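To give a sense of what such a test computes, here is a minimal sketch of the one-way ANOVA F statistic: the variation between group means divided by the variation within groups. The function name and the numbers are hypothetical and are not taken from the analysis described above. In practice a statistics package would also report the p-value, which this sketch does not attempt.

```python
def one_way_anova_f(*groups):
    """Return the one-way ANOVA F statistic for two or more groups."""
    all_values = [v for g in groups for v in g]
    grand_mean = sum(all_values) / len(all_values)
    # Between-group sum of squares: distance of each group mean from the grand mean
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: spread of each value around its own group mean
    ss_within = sum((v - sum(g) / len(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)

# Hypothetical measurements, invented for illustration only
group_1 = [90, 95, 100, 105, 110]
group_2 = [100, 105, 110, 115, 120]
print(one_way_anova_f(group_1, group_2))  # 4.0
```

A larger F value means the difference between the group means is large relative to the scatter inside each group. Whether it is statistically significant depends on comparing it with the F distribution for the given degrees of freedom.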
There are many applications and online services that allow the creation of box plots quickly and automatically (such as Minitab). One of the simplest ways is to use this template.