Project Management

This post is a guideline for managing a machine learning project. It assumes you have already chosen your dataset.

Pull existing Kernels

To get your project moving, pull existing kernels from the dataset owner or Kaggle and see what people have done. Play around with them a little.

Analyze the data

Super important. It’s very easy to dive straight into algorithms and treat the data as just digits, but it is crucial that you comprehend your data: distribution, symmetry, correlation, pixel aggregation (for visual data), etc. Visualize with histograms, violin plots, swarm plots, or heatmaps.

< Correlation heatmap of Fashion-MNIST >

< Pixel intensity distribution of Fashion-MNIST >

< Symmetry per category of Fashion-MNIST >

< Pixel aggregation per category of Fashion-MNIST >
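
The statistics behind plots like the ones captioned above can be computed in a few lines of NumPy. The sketch below is only an illustration, not the code that produced those figures. It uses random arrays as a stand-in for Fashion-MNIST (28x28 grayscale images, 10 classes), every function name is hypothetical, and the correlation between class-mean images is just one possible reading of the "correlation heatmap".

```python
import numpy as np

def pixel_intensity_histogram(images, bins=16):
    """Count pixel values over the whole dataset in equal-width bins on [0, 255]."""
    counts, edges = np.histogram(images, bins=bins, range=(0, 255))
    return counts, edges

def mean_image_per_class(images, labels, num_classes):
    """Pixel aggregation: the average image of each class, shape (num_classes, H, W)."""
    return np.stack([images[labels == c].mean(axis=0) for c in range(num_classes)])

def symmetry_per_class(images, labels, num_classes):
    """Left-right symmetry score per class.

    Score = 1 - mean absolute difference between an image and its
    horizontal mirror, scaled to [0, 1]. A score of 1.0 means the
    images of that class are perfectly symmetric.
    """
    flipped = images[:, :, ::-1]
    diff = np.abs(images.astype(np.float64) - flipped).mean(axis=(1, 2)) / 255.0
    return np.array([1.0 - diff[labels == c].mean() for c in range(num_classes)])

def class_correlation(mean_images):
    """Correlation matrix between the flattened class-mean images."""
    flat = mean_images.reshape(len(mean_images), -1)
    return np.corrcoef(flat)

# Assumption: random data standing in for the real dataset.
rng = np.random.default_rng(0)
images = rng.integers(0, 256, size=(1000, 28, 28), dtype=np.uint8)
labels = np.arange(1000) % 10  # every class appears 100 times

counts, edges = pixel_intensity_histogram(images)
means = mean_image_per_class(images, labels, 10)
symmetry = symmetry_per_class(images, labels, 10)
corr = class_correlation(means)
```

Each result maps onto one plot type: `counts` onto a histogram, `corr` onto a heatmap, `means` onto one image per class, and `symmetry` onto a bar chart per category.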

Tune your hyperparameters

In a neural network, you have to make several choices:

  • optimization methods
  • cost function
  • hidden activation function (e.g. ReLU)
  • output activation function (e.g. softmax)
  • regularization parameters
  • depth of network
  • width of each hidden layer

This is a very time-consuming phase. You could run a grid search, but it can be exhausting. Instead, you could tune step by step. For a CNN model on Fashion-MNIST, my team tuned in the following order:

  1. regularization parameters
  2. activation functions
  3. dropout rates
  4. optimization methods
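
A step-by-step search in that order can be sketched as follows. This is a minimal illustration, not the team's actual code: the hyperparameter names, candidate values, and `toy_evaluate` (a made-up scoring function standing in for "train the model and return validation accuracy") are all assumptions.

```python
def stepwise_tune(search_space, order, evaluate, defaults):
    """Tune one hyperparameter at a time, keeping the best value found so far.

    search_space: dict mapping name -> list of candidate values
    order:        names in the order they should be tuned
    evaluate:     callable(config) -> validation score, higher is better
    defaults:     starting configuration
    """
    config = dict(defaults)
    best_score = evaluate(config)
    trials = 1
    for name in order:
        for value in search_space[name]:
            if value == config[name]:
                continue  # the current setting has already been scored
            candidate = {**config, name: value}
            score = evaluate(candidate)
            trials += 1
            if score > best_score:
                config, best_score = candidate, score
    return config, best_score, trials

# Hypothetical search space, tuned in the order listed above.
search_space = {
    "l2_penalty": [0.0, 1e-4, 1e-3],
    "activation": ["relu", "tanh", "elu"],
    "dropout": [0.0, 0.25, 0.5],
    "optimizer": ["sgd", "adam", "rmsprop"],
}
order = ["l2_penalty", "activation", "dropout", "optimizer"]
defaults = {"l2_penalty": 0.0, "activation": "relu", "dropout": 0.0, "optimizer": "sgd"}

def toy_evaluate(config):
    """Made-up score; replace with real training plus validation."""
    score = 0.80
    score += {0.0: 0.0, 1e-4: 0.02, 1e-3: 0.01}[config["l2_penalty"]]
    score += {"relu": 0.03, "tanh": 0.0, "elu": 0.02}[config["activation"]]
    score += {0.0: 0.0, 0.25: 0.02, 0.5: -0.01}[config["dropout"]]
    score += {"sgd": 0.0, "adam": 0.03, "rmsprop": 0.01}[config["optimizer"]]
    return score

best_config, best_score, trials = stepwise_tune(search_space, order, toy_evaluate, defaults)
grid_size = 1
for values in search_space.values():
    grid_size *= len(values)
```

With three candidates per hyperparameter, the step-by-step search evaluates 9 configurations where a full grid would need 81. The trade-off is that tuning one parameter at a time ignores interactions between parameters, so it can miss the best combination.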

Check which classes are hard to train

Check per-class performance to find the classes your model struggles with, then fix the cause.
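
One way to find the hard classes is to compute accuracy per class and count which pairs of classes get confused most often. The sketch below is a minimal pure-Python example. The labels and predictions are made up for illustration, and the function names are hypothetical.

```python
from collections import Counter

def per_class_accuracy(y_true, y_pred):
    """Fraction of correctly predicted samples for each true class."""
    total = Counter(y_true)
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {c: correct[c] / total[c] for c in total}

def most_confused_pairs(y_true, y_pred, top=3):
    """Most frequent (true class, predicted class) errors."""
    errors = Counter((t, p) for t, p in zip(y_true, y_pred) if t != p)
    return errors.most_common(top)

# Made-up labels and predictions, for illustration only.
y_true = ["shirt", "shirt", "shirt", "coat", "coat", "bag", "bag", "bag"]
y_pred = ["shirt", "coat", "coat", "coat", "shirt", "bag", "bag", "bag"]

accuracy = per_class_accuracy(y_true, y_pred)
hardest = min(accuracy, key=accuracy.get)           # class with the lowest accuracy
confusions = most_confused_pairs(y_true, y_pred)    # which classes it is mistaken for
```

In this toy data, "shirt" is the hardest class and is most often mistaken for "coat", which points at where to look first, for example at the samples of those two classes.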

Make a Telegram notification bot for training models

Have it send a message with the output file name and plots of the data. It saves you time, and it’s also a good log.
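
A text notification can be sent through the Telegram Bot API's `sendMessage` method using only the standard library. The sketch below is a minimal example under stated assumptions: the token, chat ID, and file name are placeholders, and the helper names are hypothetical. Attaching the plot image itself would use the API's `sendPhoto` method with a multipart upload, which is not shown here.

```python
import urllib.parse
import urllib.request

API_ROOT = "https://api.telegram.org"

def build_message_request(token, chat_id, text):
    """Build a POST request for the Bot API `sendMessage` method."""
    url = f"{API_ROOT}/bot{token}/sendMessage"
    data = urllib.parse.urlencode({"chat_id": chat_id, "text": text}).encode("utf-8")
    return urllib.request.Request(url, data=data, method="POST")

def notify(token, chat_id, text, timeout=10):
    """Send the message; returns True if Telegram answered with HTTP 200."""
    request = build_message_request(token, chat_id, text)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status == 200

# Example call at the end of a training run (placeholder credentials):
# notify("<your-bot-token>", "<your-chat-id>", "Finished: run_01.h5, val_acc=0.91")
```

Keeping the request construction separate from the network call makes it possible to check the message without actually sending anything.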

Benchmark your model on a different dataset

If you are doing image classification, run it on MNIST or CIFAR-10.