THE CAP CURVE (The Cumulative Accuracy Profile)

Ojo Olawale
Jul 6, 2020


Before reading this article, I would advise you to get a good understanding of ML classification.

A brief rundown on ML classification: classification is the process of assigning a data point, or observation, to a specific category. ML classification may be grouped into 2 types: binary classification and multiclass classification.

Binary classification is when you are classifying your data points into exactly 2 categories. For example, consider a dataset of people who go into a store where a certain item is sold: some walk in and do not purchase the item (category 1), while others purchase it (category 2).

Multiclass classification is when you are classifying a dataset into more than 2 categories. For example, we might have different types of food and need to classify them into the 6 classes of food, i.e. protein, carbohydrate, fat and oil, mineral salt, vitamins and water (6 categories).

In machine learning, there are different algorithms that can be used to solve classification problems; they include Logistic Regression, K-Nearest Neighbors, Support Vector Machines, Naïve Bayes, etc.

As data scientists, it is important to evaluate our models to know how good they are. There are different ways to do this, and one of them is the CAP, which we are going to talk about in this article.

The Cumulative Accuracy Profile

The cumulative accuracy profile (CAP) is used in data science to visualize the discriminative power of a model. The CAP of a model plots the cumulative number of positive outcomes along the y-axis against the corresponding cumulative number of observations, ordered by the classifying parameter, along the x-axis.

To understand this, you need to follow me on these case studies.

CASE 1: THE RANDOM

Let’s say you’re a data scientist invited to work on a store’s data for a single item. The data says that 100,000 customers walk into the store on a daily basis. The item is advertised to all of them, but only 10% actually purchase it.

At the moment when 0 customers have come into the store (say, when the store is closed), 0 customers have purchased the item. When 20,000 customers come in (perhaps the first hour), we have 2,000 purchasers, since 10% of the customers who walk in buy the item. Continuing in the same manner: when 40,000 customers walk in, 4,000 purchase; at 60,000, 6,000 purchase; and so on until we get to 100,000, of whom 10,000 purchase. We would have a graph that looks like the figure below:

This gives a straight-line graph with a slope of 10%.
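The random case above can be sketched in a few lines of Python. This is a minimal illustration, not code from the article; the numbers follow the store example, and `purchases_at` is a hypothetical helper name:

```python
# Case 1: customers arrive in no particular order, and 10% of them buy.
total_customers = 100_000
buy_rate = 0.10  # 10% of walk-ins purchase the item

def purchases_at(n_customers):
    """Expected cumulative purchases after n randomly ordered customers."""
    return buy_rate * n_customers

# Reproduce the walkthrough: 0 -> 0, 20,000 -> 2,000, ..., 100,000 -> 10,000
for n in range(0, total_customers + 1, 20_000):
    print(f"{n:>7,} customers -> {purchases_at(n):>7,.0f} purchases")
```

Plotting these points gives exactly the straight diagonal line with slope 10% described above.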

Now we ask ourselves: can we improve this experience? Can we get more people to buy the item when they come into the store? Could we target our customers so that people who come in intending to purchase this item always get it before they leave? Rather than placing the item just anywhere in the store, we could strategically place it where it is most visible to customers, where our target customers can reach it, or next to similar products alongside which it is usually considered. For example, if a pack of disposable cups is our item and we place it where the drinks are, it would get more attention there than among the kitchen utensils.

We can do all this using data gathered from experiments on the item’s position in the store, together with data on the people who purchase the item, to train a machine learning model that predicts who purchases and who doesn’t. This brings us to the second case.

CASE 2: THE GOOD MODEL

This time there would be an improvement in the number of customers who purchase our item. With 0 customers we would still have 0 purchases, but with 20,000 customers we would have more than 2,000 buyers, say about 5,000, because we are following the data scientist’s recommendations in presenting our item. By 40,000 customers we would have about 8,000 purchasing our item, and at 60,000 about 9,500. We know we can’t exceed 10% of the total people walking into the store, so we can’t exceed 10,000 sold items: with 80,000 customers we might get about 9,700 purchases, and with all 100,000 customers we would have our 10,000 purchases, like in the graph below.

And we get a curve like the one below, which is called the cumulative accuracy profile of the classification model.

Note that the better your model, the larger the area between its CAP curve and the random case, i.e. between the red and the blue lines.
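In practice, a model's CAP curve is built by ranking observations by the model's predicted score and accumulating the positives. Here is a hedged sketch: the data is simulated (a stand-in for a real trained model's output), and `cap_curve` is a hypothetical helper, not a standard library function:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data: 1,000 customers, ~10% buyers, plus a score from some
# classifier. Buyers tend to score higher, mimicking a decent model.
n = 1_000
y = (rng.random(n) < 0.10).astype(int)   # 1 = purchased, 0 = did not
scores = y * 0.5 + rng.random(n)         # stand-in for model probabilities

def cap_curve(y_true, y_score):
    """Fraction of buyers found after contacting customers in score order."""
    order = np.argsort(-y_score)                       # highest score first
    cum_pos = np.cumsum(y_true[order])
    x = np.arange(1, len(y_true) + 1) / len(y_true)    # fraction contacted
    yv = cum_pos / cum_pos[-1]                         # fraction of buyers
    return x, yv

x, yv = cap_curve(y, scores)
# The curve starts near (0, 0), ends at (1, 1), and for any model better
# than random it sits above the diagonal y = x.
```

Plotting `yv` against `x` (e.g. with matplotlib) reproduces the red "good model" curve, while the diagonal `y = x` is the blue random line.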

Assuming we have a model that performs worse than ours, we would get something like the green line in the diagram below:

CASE 3: THE IDEAL MODEL

This happens only when you are like Dr. Strange in the Avengers (a Marvel movie) and can see the future, so you know exactly what to do to ensure that the first 10,000 people who come into the store purchase the item.

The graph shows that at 10% of customers (10,000) walking into the store, we would have 100% of items (10,000) purchased.

We would have a curve that looks like the labelled crystal ball below:
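The crystal-ball curve has a simple closed form: it rises linearly until every buyer is found, then stays flat at 100%. A minimal sketch, using the 10% buy rate from the example (`ideal_cap` is a hypothetical helper name):

```python
# Case 3: the ideal ("crystal ball") CAP. With 10% buyers, a perfect
# ranking finds every buyer within the first 10% of customers contacted.
buy_rate = 0.10

def ideal_cap(x_fraction):
    """Fraction of buyers found after contacting x_fraction of customers,
    assuming a perfect ranking of who will buy."""
    return min(x_fraction / buy_rate, 1.0)

for frac in (0.0, 0.05, 0.10, 0.50, 1.0):
    print(f"{frac:4.0%} contacted -> {ideal_cap(frac):4.0%} of buyers found")
```

The kink at 10% contacted is exactly the point (10,000 customers, 10,000 purchases) described above.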

ANALYSING CAP

We can analyze the cap curve in 2 ways.

The first uses the accuracy ratio: the area between the good model’s curve and the random line (aR), divided by the area between the ideal (crystal ball) curve and the random line (aP). The ratio is always going to be less than 1, but the closer it is to 1 the better, and the closer it is to 0 the worse it gets.

However, this is not visible at a glance from the plot, and calculating these areas by hand can be quite difficult, though statistical tools can do it for you. Hence the other approach.
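Those areas are straightforward to compute numerically, for instance with the trapezoidal rule. The sketch below is one possible implementation, not the article's code; `accuracy_ratio` and the tiny data set are hypothetical:

```python
import numpy as np

def _trapezoid(yv, xv):
    """Trapezoidal integral of yv over xv (written out by hand to stay
    compatible across NumPy versions)."""
    return float(np.sum((yv[1:] + yv[:-1]) * np.diff(xv)) / 2.0)

def accuracy_ratio(y_true, y_score):
    """aR / aP: area between the model's CAP and the random line, divided
    by the area between the perfect (crystal ball) CAP and the random line."""
    y_true = np.asarray(y_true, dtype=float)
    y_score = np.asarray(y_score, dtype=float)
    n = len(y_true)
    pos = y_true.sum()
    order = np.argsort(-y_score)            # best-scored customers first
    x = np.arange(n + 1) / n                # fraction of customers contacted
    model = np.concatenate(([0.0], np.cumsum(y_true[order]) / pos))
    perfect = np.minimum(x * n / pos, 1.0)  # crystal-ball curve
    a_r = _trapezoid(model - x, x)          # model vs. random line
    a_p = _trapezoid(perfect - x, x)        # perfect vs. random line
    return a_r / a_p

# Tiny hand-checkable example: 2 buyers among 6 customers. A score that
# ranks both buyers first is a perfect ranking, so the ratio is ~1.
ar = accuracy_ratio([1, 1, 0, 0, 0, 0], [0.9, 0.8, 0.4, 0.3, 0.2, 0.1])
print(ar)
```

A ratio near 0 means the model is barely better than random; a negative ratio would mean it ranks buyers worse than chance.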

To use this approach, go to 50% on the horizontal axis, trace upward until you hit the good model’s curve, then trace sideways to the vertical axis. This tells you, at 50,000 customers, how many people (or what percentage of buyers) would purchase the item. Whatever the value is, we evaluate it against this metric:

  1. If your value is less than 60% of the positives, the model is at its worst (possibly underfitting).
  2. If it is between 60% and 70%, the model is fair.
  3. If it is between 70% and 80%, the model is good.
  4. If it falls between 80% and 90%, then to me the model is at its best as a predictive model.
  5. Anything between 90% and 100% is at Dr. Strange’s level (seeing the future), which isn’t realistic in data science. We have to check whether we included a forward-looking variable, or it could be overfitting. At 90% to 97% you could actually have a very, very good model, but let’s always check to be sure we are not just too eager to have the best model.
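The rule of thumb above can be sketched as a small lookup function. This is just the list restated in code; `rate_model` is a hypothetical name, and the input is the percentage of all buyers captured at 50% of customers:

```python
# The 50% rule: read the CAP value at half the customers and bucket it.
def rate_model(pct_positives_at_50):
    """pct_positives_at_50: % of all buyers captured at 50% of customers."""
    if pct_positives_at_50 < 60:
        return "poor (possibly underfitting)"
    elif pct_positives_at_50 < 70:
        return "fair"
    elif pct_positives_at_50 < 80:
        return "good"
    elif pct_positives_at_50 < 90:
        return "very good"
    else:
        return "too good: check for overfitting or a forward-looking variable"

print(rate_model(95))   # lands in the suspicious 90-100% bucket
```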

The CAP is distinct from the receiver operating characteristic (ROC), which plots the true-positive rate against the false-positive rate.

Thanks for reading! I hope it was impactful, and if it was, let’s give as many claps as suit our satisfaction 😍

References:

Machine Learning A to Z: Hands-On Python and R in Data Science, Udemy course by SuperDataScience.
