Pareto Chart: Expose the Vital Few Contributors in Your Data Visualization

In my previous post (Make Vital Few Contributors Stand Out in Your Visuals Using Treemaps and Maps) I presented a use case of COVID19 2021 data where Treemaps and Maps were leveraged to make the vital few information stand out from other trivial many. And I pointed out another chart which is commonly used in improvement projects but can be deployed in this context- Pareto Chart.

Pareto principle- referred to as the 80/20 rule- states that roughly 80% of consequences come from 20% of causes. However, what matters is the principle itself of how we can expose the few significant contributors rather than the ratio per se.

What makes Pareto chart unique is the cumulative percentage line-Pareto Line. The chart consists of an x-axis of categories (e.g., countries) and two y-axes: primary and secondary. The primary axis represents the values (e.g., number of COVID cases), while the secondary one depicts percentages from 0-100%. Pareto Line is formed by adding up the value contribution (%) of each category from left to right creating a line of cumulative percentages.

The line helps to identify the vital few that contribute to a specific percentage from the whole. For instance, if USA, India, and Brazil recorded COVID cases of 17%, 12%, and 7%, respectively, from the entire cases, then Pareto Line will show a cumulative percentage of 36% contributed by these countries. Here we say that 1.4% (3 out of 212 countries) recorded 36% of the entire cases.

There are two methods to build Pareto Chart in Excel. First is by inserting a bar chart of type “Pareto” from the Charts group. The second is more complex but effective in communicating the full message, especially if used on dashboards.

The first method is sufficient for exploratory analysis to get an overview of data patterns. But it lacks a couple features that render the chart confusing sometimes. There’s no straightforward way to add Data Labels to the Pareto Line, which makes spotting cumulative percentages difficult at any point. More importantly, if you have “Others” category, which is the case in most Pareto analyses involving hundreds of categories, it will be sorted according to its value, rather than being pushed to the far right as seen in Pareto charts. This defeats the purpose of Pareto Line.

For that, I’d rather use the second method in reports and dashboards that overcomes the above shortcomings of the quick method. Yet, it involves adding a new column in data table to calculate the cumulative percentages.

The movie of this post illustrates both methods with annotations of steps. Your thoughts are welcome.

NB: Data source: covid.ourworldindata.org

Pareto Chart

Comments

No comments yet. Why don’t you start the discussion?

Leave a Reply