Build simple but nifty cohorts in R


Cohorts are a great way to split a group into segments and get a deeper view of whatever you are looking at. Imagine you have an online shop and would like to know how your user retention has developed over the last few weeks. I will explain cohorts further down, after we have created some data to build one.

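# get packages
library(reshape2)  # melt()
library(ggplot2)   # ggplot(), geom_tile()
library(viridis)   # scale_fill_viridis()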

# simulate cohort data
mydata = replicate(15, sort(runif(15, 1, 100), decreasing = TRUE))
mydata[lower.tri(mydata)] = NA

# convert to df and add cohort label
mydata = t(mydata)
mydata = as.data.frame(mydata)
mydata$cohort = as.factor(c(15:1))

# reshape and reorder
mydata = na.omit(melt(mydata, id.vars = "cohort"))
mydata$variable = as.numeric(gsub("V","",mydata$variable))
mydata$cohort = factor(mydata$cohort, levels=rev(levels(mydata$cohort)))

# plot cohort
ggplot(mydata, aes(variable, cohort)) +
 theme_minimal() +
 xlab('Week') +
 ylab('Cohort') +
 geom_tile(aes(fill = value), color='white') +
 scale_fill_viridis(direction = -1) +
 scale_x_continuous(breaks = round(seq(min(mydata$variable), max(mydata$variable), by = 1)))

With the code above you can simulate fifteen cohorts over a maximum period of fifteen weeks (or whatever the period might be). After creating some data you can easily use ggplot to build your cohort diagram. I have used a minimal theme and a neat viridis color palette.
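To see what the simulation actually produces, here is a minimal sketch of a smaller 4 x 4 version of the same steps. The seed, the size `n` and the helper name `weeks_observed` are my own assumptions for illustration and are not part of the code above.

```r
# Smaller version of the simulation above (n and the seed are assumptions)
set.seed(42)
n = 4

# each column is one cohort: random values between 1 and 100, sorted decreasing
sim = replicate(n, sort(runif(n, 1, 100), decreasing = TRUE))

# blank out the weeks a cohort has not reached yet
sim[lower.tri(sim)] = NA

# one row per cohort, columns V1..Vn are the weeks
sim = as.data.frame(t(sim))
sim$cohort = as.factor(n:1)

# count how many weeks of data each cohort has
weeks_observed = rowSums(!is.na(sim[, paste0("V", 1:n)]))
names(weeks_observed) = as.character(sim$cohort)

print(round(sim[, 1:n]))
print(weeks_observed)
```

The oldest cohort (label 1) ends up with a value for every week, the newest one with a single value, and within each row the values fall from left to right. That triangle is what the tile plot draws.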


The diagram above shows the retention rate of fifteen different groups. For example, about 25 percent of the people from cohort one came back to visit our online shop 15 weeks after their first visit. Cohort fifteen visited the online shop for the first time this week, which is why we only have one week of data for it. With this principle in mind you can analyze your retention rates over time.
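With real data you would compute these percentages instead of simulating them. Below is a rough sketch in base R, assuming a hypothetical visit log with one row per user and week. The data frame `visits`, its columns and all values are made up for illustration.

```r
# Hypothetical visit log: one row per user and calendar week with a visit
visits = data.frame(
  user = c("a", "a", "a", "b", "b", "c", "c", "d"),
  week = c(1, 2, 3, 1, 3, 2, 3, 3)
)
max_week = max(visits$week)

# cohort = week of each user's first visit
first_week = tapply(visits$week, visits$user, min)
visits$cohort = as.integer(first_week[as.character(visits$user)])

# weeks since the first visit, starting at 1 like the x axis of the plot
visits$age = visits$week - visits$cohort + 1

# distinct active users per cohort (rows) and age (columns)
visits = unique(visits[, c("user", "cohort", "age")])
active = table(visits$cohort, visits$age)

# share of each cohort that is active, relative to its first week
retention = round(100 * active / active[, "1"])

# weeks a cohort has not reached yet are unknown, not zero
unobserved = outer(as.numeric(rownames(active)),
                   as.numeric(colnames(active)), "+") - 1 > max_week
retention[unobserved] = NA

print(retention)
```

The result has the same triangular shape as the simulated data, so after converting it to a data frame with a cohort column it could be reshaped and plotted the same way.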

And of course this little plot can be used for all kinds of different tasks. Make sure you check out the code on my GitHub along with other projects. I also recommend for really good R related marketing content.

Author: inside data blog

data analysis & visualization blog

