Build simple but nifty cohorts in R

Cohorts are a great way to split a group into segments and get a deeper view of whatever you are looking at. Imagine you have an online shop and would like to know how your user retention has developed over the last few weeks. I will explain cohorts further down, after we have created some data to build one from.

# get packages
library(ggplot2)
library(reshape2)
library(viridis)

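# simulate cohort data: each column holds 15 random values sorted in decreasing order
mydata = replicate(15, sort(runif(15, 1, 100), decreasing = TRUE))
# blank out the lower triangle so each cohort only has data for the weeks it has existed
mydata[lower.tri(mydata)] = NA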

# transpose so each row is a cohort, convert to a data frame and add the cohort label
mydata = t(mydata)
mydata = as.data.frame(mydata)
mydata$cohort = as.factor(15:1)

# reshape to long format, turn the V1..V15 column names into week numbers,
# and reverse the cohort order for plotting
mydata = na.omit(melt(mydata, id.vars = "cohort"))
mydata$variable = as.numeric(gsub("V", "", mydata$variable))
mydata$cohort = factor(mydata$cohort, levels = rev(levels(mydata$cohort)))

# plot cohort
ggplot(mydata, aes(variable, cohort)) +
 theme_minimal() +
 xlab('Week') +
 ylab('Cohort') +
 geom_tile(aes(fill = value), color='white') +
 scale_fill_viridis(direction = -1) +
 scale_x_continuous(breaks = round(seq(min(mydata$variable), max(mydata$variable), by = 1)))

With the code above you can simulate fifteen cohorts over a maximum period of fifteen weeks (or whatever the period might be). After creating some data you can easily use ggplot to build your cohort diagram. I have used a minimal theme and a neat viridis color palette.
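If you want a different number of cohorts, the simulation can be wrapped in a small function. The following is only a sketch in base R: `simulate_cohorts()` and its `seed` argument are names made up for this example and are not part of the code above, and it builds the long format by hand instead of using `melt()`.

```r
# Hypothetical helper: simulate n cohorts observed over at most n periods.
simulate_cohorts <- function(n, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)  # optional, for reproducible output

  # One column per cohort; sorting makes the values decay over time.
  m <- replicate(n, sort(runif(n, 1, 100), decreasing = TRUE))

  # Keep only the weeks each cohort has existed, then put cohorts in rows.
  m[lower.tri(m)] <- NA
  m <- t(m)

  # Long format: as.vector() walks the matrix column by column,
  # so the week index repeats n times while the cohort labels cycle.
  long <- data.frame(
    cohort   = factor(rep(n:1, times = n), levels = n:1),
    variable = rep(seq_len(n), each = n),
    value    = as.vector(m)
  )
  long[!is.na(long$value), ]
}

small <- simulate_cohorts(4, seed = 42)
table(small$cohort)  # cohort 1 has 4 weeks of data, cohort 4 has 1
```

The result has the same `cohort`, `variable` and `value` columns as `mydata`, so it should work with the ggplot call above if you swap in the new data frame.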

[Figure: cohort heatmap produced by the code above (rplot03)]

The diagram above shows the retention rate of fifteen different groups. For example, about 25 percent of the people from cohort one came back to visit our online shop 15 weeks after their first visit (the data is random, so your numbers will differ). Cohort fifteen visited the online shop for the first time this week, which is why we only have one week of data for it. With this principle in mind you can analyze your retention rates over time.
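The simulated values stand in for real retention rates. As a rough sketch of how such rates could be derived from raw data, assume a visit log with one row per user and week. The `visits` data frame, its `user` and `week` columns and the toy values are all made up for this example.

```r
# Hypothetical visit log: one row per user and week with at least one visit.
visits <- data.frame(
  user = c("a", "a", "a", "b", "b", "c", "c", "d"),
  week = c(1, 2, 3, 1, 3, 2, 3, 3)
)

# A user's cohort is the week of their first visit.
visits$cohort <- ave(visits$week, visits$user, FUN = min)

# Weeks since joining, counted from 1 to match the x-axis of the plot.
visits$period <- visits$week - visits$cohort + 1

# Number of active users per cohort and period.
counts <- table(cohort = visits$cohort, period = visits$period)

# Divide each row by its cohort size (the period 1 count) to get percentages.
retention <- round(100 * counts / counts[, "1"], 1)

# Long format for ggplot, dropping periods that have not happened yet.
cohort_df <- as.data.frame(as.table(retention), responseName = "value")
cohort_df$cohort <- as.numeric(as.character(cohort_df$cohort))
cohort_df$period <- as.numeric(as.character(cohort_df$period))
cohort_df <- cohort_df[cohort_df$cohort + cohort_df$period - 1 <= max(visits$week), ]
```

Here users "a" and "b" form cohort 1, and only "a" returns in the second week, so that cell is 50 percent. Unlike the simulated data, the first period of every cohort is 100 percent by construction.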

And of course this little plot can be used for all kinds of different tasks. Make sure you check out the code on my GitHub along with other projects. I also recommend analyzecore.com for really good R-related marketing content.

Author: inside data blog

data analysis & visualization blog