Percentiles are the 99 cut points that divide a population, ordered by value, into 100 equal groups. A percentile can be any number between 1 and 99: whichever percentile X you pick, X% of the observations should fall below it. For example, if you’re in the 60th percentile, your value is greater than that of 60% of the observations.
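As a rough illustration of the "X% fall below" idea, the sketch below counts how many observations are strictly smaller than a given value. The function name `percentile_rank` and the sample heights are made up for this example, and there are several slightly different conventions for percentiles; this is only the simplest one.

```python
def percentile_rank(values, x):
    """Percentage of observations in `values` that are strictly below `x`."""
    below = sum(1 for v in values if v < x)
    return 100 * below / len(values)


# Hypothetical sample: heights in cm of ten people.
heights = [150, 155, 160, 162, 165, 168, 170, 175, 180, 190]

# Six of the ten heights are below 170, so 170 sits at the 60th percentile.
rank = percentile_rank(heights, 170)
print(rank)
```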
In the below example, 20 people take a test, which has possible scores from 0 to 5. The test is pretty easy, so everybody scores 4 or above. That means that taking the bottom 25% of the data (everyone at or below the 25th percentile) will still pick out people, even though they scored 4/5.
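A small sketch of that situation is below. The exact split of scores is an assumption (12 people scoring 4 and 8 scoring 5), chosen only so that there are 20 people and nobody scores under 4.

```python
# Hypothetical scores: 20 people, everyone scores 4 or 5 out of 5.
scores = [4] * 12 + [5] * 8

# The bottom 25% of 20 people is the 5 lowest scorers.
ranked = sorted(scores)
bottom_quarter = ranked[: len(ranked) // 4]

# Every one of them scored 4/5, which is a good score in absolute terms.
print(bottom_quarter)
```

Being in the bottom quarter here says something about your position relative to everyone else, not about how well you did on the test.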
We often use quartiles rather than percentiles. The 1st quartile = 25th percentile; the 2nd quartile = 50th percentile (also the median); and the 3rd quartile = 75th percentile.
In order to calculate the quartiles, we:
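In the below example, the dataset has an even number of values, so the median falls between the two middle values, 3 and 4, and becomes 3.5. We therefore keep 3 in the lower dataset and 4 in the upper dataset when calculating the medians of those two halves.
This is not the case when the dataset has an odd number of values. The median is then one of the observations itself, and in this method that number is not used to calculate the 1st and 3rd quartiles.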
The interquartile range (IQR) is the 3rd quartile minus the 1st quartile (in the above, 6 - 2 = 4).
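The median-of-halves approach described above can be sketched as follows. The function name `quartiles` and the datasets are assumptions for illustration; the even-length dataset was picked only because it reproduces the numbers quoted in the text (median 3.5, quartiles 2 and 6). Other tools may use a different quartile convention and give slightly different answers.

```python
from statistics import median


def quartiles(values):
    """Return (Q1, Q2, Q3) using the median-of-halves method.

    Needs at least two values. With an odd number of values, the
    median itself is left out of both halves.
    """
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    lower = ordered[:mid]
    upper = ordered[mid:] if n % 2 == 0 else ordered[mid + 1:]
    return median(lower), median(ordered), median(upper)


# Hypothetical even-length dataset: the median falls between 3 and 4.
even_data = [1, 2, 2, 3, 4, 6, 6, 7]
q1, q2, q3 = quartiles(even_data)
iqr = q3 - q1
print(q1, q2, q3, iqr)

# Hypothetical odd-length dataset: the median (4) is excluded from both halves.
odd_data = [1, 2, 3, 4, 5, 6, 7]
print(quartiles(odd_data))
```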
So, what about box plots? Let’s start with an example. We have the below features in our dataset.
This leads us to produce the below box plot.
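A basic box plot is typically drawn from five numbers: the minimum, the 1st quartile, the median, the 3rd quartile and the maximum. The sketch below computes them with Python's standard `statistics.quantiles`. The dataset and the function name `five_number_summary` are made up for illustration, and `statistics.quantiles` interpolates between values, so on other datasets it may not match a hand calculation exactly.

```python
from statistics import quantiles


def five_number_summary(values):
    """Return the five numbers a basic box plot is drawn from."""
    q1, q2, q3 = quantiles(values, n=4)  # the three quartile cut points
    return {
        "min": min(values),
        "q1": q1,
        "median": q2,
        "q3": q3,
        "max": max(values),
    }


# Hypothetical dataset, not the one pictured in the article.
data = [1, 2, 2, 3, 4, 6, 6, 7]
summary = five_number_summary(data)
print(summary)
```

The box spans `q1` to `q3`, with a line at the median, and in this simple form the whiskers reach out to `min` and `max`.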
This shows us how closely clustered the central 50% of data is around the median and gives us a pictorial view of the entire data range. Other examples include: