Recently I thought about how to visualize the result of a cluster analysis. I do not mean the visualization of the clusters themselves but the results in terms of content and variable description: something you could hand to someone who does not understand the mechanics of cluster algorithms and just wants to see a description of the resulting clusters. I came up with a fairly easy ggplot solution, but let's get some data before we go into that.
# load packages
library(reshape2)
library(ggplot2)
library(viridis)
library(dplyr)

# get the data
url <- 'http://www.biz.uiowa.edu/faculty/jledolter/DataMining/protein.csv'
food <- read.csv(url)

# filter on specific countries
food <- subset(food, Country %in% c("Albania", "Belgium", "Denmark", "France", "Romania",
                                    "USSR", "W Germany", "Finland", "UK"))
The code above fetches example data on 25 European countries and their protein consumption (in percent) from nine major food sources, then reduces the data set by filtering on nine countries. The code below transforms the data into a long table format, which is required for plotting.
# melt data
DT1 = melt(food, id.vars = "Country")

# plot data
ggplot(DT1, aes(Country, value)) +
  geom_bar(aes(fill = Country), position = "dodge", stat = "identity") +
  facet_wrap(~variable, scales = "free") +
  xlab("") +
  ylab("protein intake (in %)") +
  theme(axis.text.x = element_blank()) +
  scale_fill_viridis(discrete = TRUE)
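To make the wide-to-long step more concrete, here is a minimal sketch of what melt does on a toy table. The data frame `wide` and its values are made up purely for illustration and are not taken from the protein data set.

```r
library(reshape2)

# Toy wide table: one row per country, one column per food source.
# Hypothetical values, for illustration only.
wide <- data.frame(
  Country = c("A", "B"),
  RedMeat = c(10, 13),
  Fish    = c(0.2, 4.5)
)

# melt() keeps the id column and stacks every other column
# into a pair of columns: variable (old column name) and value
long <- melt(wide, id.vars = "Country")
print(long)
```

Each country now appears once per food source, which is the shape ggplot needs to map `variable` to facets and `value` to bar height.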
The plot itself needs only a few classic ggplot commands: a grouped bar plot with a facet wrap and some neat coloring from the viridis palette.
I think this plot is perfect for seeing the differences between the countries (clusters) in just one diagram. Find the full code on my GitHub along with other projects.
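In the example the countries stand in for clusters. If you start from an actual cluster analysis, the same plot might be built from per-cluster means. The following is only a sketch under a few assumptions of mine: the built-in iris data replaces your own data, k-means with three clusters is picked arbitrarily, and the names `km`, `profiles` and `DT2` are hypothetical.

```r
library(reshape2)
library(ggplot2)
library(viridis)

# Assumption: iris stands in for your own data, k = 3 is chosen for illustration
set.seed(42)                               # k-means starts from random centers
X  <- scale(iris[, 1:4])                   # standardize so no variable dominates
km <- kmeans(X, centers = 3, nstart = 25)

# Mean of each original variable per cluster: the "profile" of a cluster
profiles <- aggregate(iris[, 1:4],
                      by = list(Cluster = factor(km$cluster)),
                      FUN = mean)

# Same long format as before, with Cluster in place of Country
DT2 <- melt(profiles, id.vars = "Cluster")

p <- ggplot(DT2, aes(Cluster, value)) +
  geom_bar(aes(fill = Cluster), position = "dodge", stat = "identity") +
  facet_wrap(~variable, scales = "free") +
  xlab("") +
  ylab("cluster mean") +
  scale_fill_viridis(discrete = TRUE)
print(p)
```

Averaging the unscaled variables keeps the bars in the original units, which tends to be easier to explain to readers who do not care about the clustering mechanics.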