Imagine you have a data set which contains every page of your website and two corresponding key performance indicators (KPI). The first KPI are your page impressions which are defined as number of times a specific page has been accessed. The second KPI is the bounce rate and represents the percentage of visitors who enter your website on a specific page and then leave (“bounce”) directly rather than continuing on to view other pages within your website. The bounce rate is a measure of the effectiveness of a website.

Page |
Page_Impressions |
Bounce_Rate |

Page_483 | 883302 | 0.16 |

Page_186 | 459298 | 0.1 |

Page_307 | 366320 | 0.0 |

Page_388 | 312929 | 0.01 |

Page_308 | 261719 | 0.0 |

Page_277 | 226548 | 0.03 |

Page_31 | 212580 | 0.02 |

Page_129 | 189652 | 0.12 |

Page_337 | 176998 | 0.02 |

… | … | … |

So far so good, but if you sort your table via bounce rate you have not achieved much, as most of the pages with a high bounce rate will have a small amount of page impressions. The underlying question is the following: How can I get pages with a high bounce rate but also a high amount of page impressions?

Page |
Page_Impressions |
Bounce_Rate |

Page_108 | 1 | 1.00 |

Page_121 | 1 | 1.00 |

Page_83 | 3 | 1.00 |

Page_45 | 3 | 0.67 |

Page_112 | 4 | 0.67 |

Page_11 | 12 | 0.57 |

Page_140 | 388 | 0.56 |

Page_89 | 2 | 0.5 |

Page_133 | 2 | 0.5 |

… | … | … |

Google developed a weighted sort algorithm that will help you. It is based on calculating the Estimated True Value (ETV) and using the ETV as a score on which to sort. The ETV computation is performed for each page as follows:

ETV(page x) = (PI / maxPI * BR) + ((1 – PI / maxPI)) * avgBR)BR = Bounce rate for the page xPI = Page impressions for the page xmaxPI = The highest amount of page impressions found for any pageavgBR = The Bounce rate for the whole table

The ETV calculation essentially says, “I need to figure out whether the row of data (page x) that I’m looking at is a large or a small percentage of the data in the whole table. If it’s a large percentage, then I need to take this row seriously in the sense that the data is probably meaningful. If it’s a small percentage, then I shouldn’t take this row seriously, and the reality is that I’d probably be more ‘correct’ if I just leaned towards using the average Bounce Rate for the whole table.“ Now you can sort your table via ETV and you will have the most interesting pages on top of the table.

Page |
Page_Impression |
Bounce_Rate |
ETV |

Page_483 | 883302 | 0.16 | 0.16 |

Page_186 | 459298 | 0.1 | 0.07557256 |

Page_129 | 189652 | 0.12 | 0.06433205 |

Page_35 | 58882 | 0.17 | 0.05717034 |

Page_37 | 81772 | 0.13 | 0.05660003 |

Page_179 | 126322 | 0.09 | 0.05495925 |

Page_274 | 19304 | 0.3 | 0.05459478 |

Page_377 | 174237 | 0.07 | 0.05323211 |

Page_275 | 58731 | 0.1 | 0.05249535 |

… | … | … | … |

Find the R code below to compute the ETV. It is also on my Github.

# calculate max visits max_visit = max(dataset$Page_Impressions) dataset$prop_visits = dataset$Page_Impressions/max_visit # calculate average weighted bounce rate weighted_avg_br = sum(dataset$Page_Impressions * dataset$Bounce_Rate)/sum(dataset$Page_Impressions) # calculate Estimated True Value (ETV) dataset$ETV = (dataset$prop_visits*dataset$Bounce_Rate)+((1-(dataset$prop_visits))*weighted_avg_br)

And of course the weighted sort algorithm can be implemented for any kind of problem, not only in the field of web analytics.