Top 16% Solution to Kaggle’s Product Classification Challenge

Kaggle is a platform for predictive modelling and analytics competitions. Companies and researchers post their data, and statisticians and data miners from all over the world compete to produce the best models. As of May 2016, Kaggle had over 536,000 registered users ("Kagglers") across 194 countries, which makes it the largest and most diverse data community in the world (Wikipedia).

One of my first Kaggle competitions was the OTTO product classification challenge. OTTO is one of the world’s biggest e-commerce companies. For this competition, OTTO provided a dataset of more than 200,000 products, each described by 93 obfuscated features. The objective was to build a predictive model that assigns each product to one of OTTO's nine main product categories.
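To make the shape of the problem concrete, here is a minimal sketch in Python. It is not the R script mentioned in this post, and it does not use the OTTO data: the rows are synthetic, the row counts are toy-sized assumptions, and the nearest-centroid classifier is only a stand-in baseline chosen because it needs no external libraries. Only the 93 features and nine classes come from the competition description.

```python
import random

N_FEATURES = 93       # from the post: 93 obfuscated features
N_CLASSES = 9         # from the post: nine product categories
TRAIN_PER_CLASS = 40  # assumption: toy size, far below the real 200,000+ rows
TEST_PER_CLASS = 10   # assumption: toy hold-out size

rng = random.Random(0)  # fixed seed so the run is reproducible

# Build synthetic data: each class gets its own random mean vector,
# and rows are that mean plus Gaussian noise. This is NOT the OTTO data.
X_train, y_train, X_test, y_test = [], [], [], []
for label in range(N_CLASSES):
    mean = [rng.uniform(0, 5) for _ in range(N_FEATURES)]
    for i in range(TRAIN_PER_CLASS + TEST_PER_CLASS):
        row = [m + rng.gauss(0, 1) for m in mean]
        if i < TRAIN_PER_CLASS:
            X_train.append(row)
            y_train.append(label)
        else:
            X_test.append(row)
            y_test.append(label)


def fit_centroids(X, y):
    """Return one mean feature vector (centroid) per class."""
    sums = [[0.0] * N_FEATURES for _ in range(N_CLASSES)]
    counts = [0] * N_CLASSES
    for row, label in zip(X, y):
        counts[label] += 1
        for j, value in enumerate(row):
            sums[label][j] += value
    return [[s / counts[k] for s in sums[k]] for k in range(N_CLASSES)]


def predict(centroids, row):
    """Assign the row to the class whose centroid is closest (squared Euclidean)."""
    def sq_dist(centroid):
        return sum((a - b) ** 2 for a, b in zip(row, centroid))
    return min(range(N_CLASSES), key=lambda k: sq_dist(centroids[k]))


centroids = fit_centroids(X_train, y_train)
correct = sum(predict(centroids, row) == label for row, label in zip(X_test, y_test))
accuracy = correct / len(y_test)
print(f"held-out accuracy on synthetic data: {accuracy:.2f}")
```

The synthetic classes are deliberately well separated, so the accuracy printed here says nothing about how hard the real competition data is. The sketch only illustrates the input and output of the task: a 93-number row goes in, one of nine category labels comes out.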

Kaggle also allows users to publicly share their code on each competition page. Reading other people’s code before getting started helped me a lot. You can find my R script for the OTTO product classification challenge on my GitHub.

Author: inside data blog

data analysis & visualization blog
