ML Notes: Cloud setup and best practices.

I started working on machine learning two years ago. Initially my laptop was sufficient: with around 100 MB of data I usually got results in 10-15 minutes. It could train SVMs, random forests, and other classical models without trouble. Now it struggles, as I want to tackle non-linear datasets larger than 1 GB, and things became really difficult when I started learning deep learning. Training a CNN with 5 hidden layers on the MNIST dataset took 8-10 hours, and larger networks failed with memory errors.

I had never worked with cloud instances, but at this point a cloud service had become necessary. Two cloud providers offer free credits: Google Cloud Platform (GCP) and AWS Student. Since GCP offers $300 in credits, I decided to use it instead of AWS.

Thanks to the free cloud credits, I was able to follow deep learning courses and practice along with them. The idea of this post is to blog my notes and key points on installing, running, and maintaining my cloud instance.

There are many well-documented setup guides, and you can choose any of them. I used the guide from Stanford's CS231n. It covers setups both with and without a GPU, and it comes with many of the packages and tools required for ML tasks. The VM instance specs it suggests also cover my CPU, GPU, and disk requirements nicely.
- GPUs are not available in every server location, and you can't add a GPU to a free trial account. You can, however, upgrade your account and still use the trial account's free credits. This means that once your free credits run out, you will be billed according to your usage.
- It's helpful to use the Google Cloud SDK shell to open an SSH connection to your cloud instance.
- I keep a sticky note with four items: the gcloud command to connect to the instance, the command to activate my virtual environment, the command to start Jupyter Notebook, and the static IP for my Jupyter notebook. I use all four every time I run my GCP instance for training.
- Set up WinSCP. Sometimes I need to transfer large datasets, and it is really helpful for that.
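
The sticky note from the list above could also live in a small script. This is only a sketch: the instance name, zone, virtual environment folder, IP address, and port are all hypothetical placeholders (none of them come from my actual setup), and it assumes Jupyter on the instance is already configured to accept remote connections.

```python
# Print the four "sticky note" items for a GCP training session.
# Every value below is a hypothetical placeholder; replace with your own.
INSTANCE = "my-dl-instance"   # hypothetical instance name
ZONE = "us-west1-b"           # hypothetical zone
VENV_DIR = ".env"             # hypothetical virtual environment folder
STATIC_IP = "203.0.113.10"    # documentation-only address, not a real server
PORT = 8888                   # assumed Jupyter port


def sticky_note(instance, zone, venv_dir, static_ip, port):
    """Return the four items in the order they are used."""
    return [
        # 1. Connect to the instance from the Cloud SDK shell.
        f"gcloud compute ssh {instance} --zone {zone}",
        # 2. Activate the virtual environment on the instance.
        f"source {venv_dir}/bin/activate",
        # 3. Start Jupyter Notebook without opening a browser on the server.
        f"jupyter notebook --no-browser --port={port}",
        # 4. Address to open in the local browser.
        f"http://{static_ip}:{port}",
    ]


if __name__ == "__main__":
    for item in sticky_note(INSTANCE, ZONE, VENV_DIR, STATIC_IP, PORT):
        print(item)
```
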
Never forget to shut down your instance. An idle instance that is left running still uses up your credits.
I generally run one epoch of the program on my local machine with a subset of the data, and fix the bugs and errors in the code. Once satisfied, I move my code to GCP and let it train on the entire dataset.
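
A minimal sketch of that local check, assuming the dataset fits in a Python list and that training can be wrapped in a function that takes the samples for one epoch. The names `take_subset`, `smoke_test`, and `dummy_epoch` are made up for illustration; `dummy_epoch` is only a stand-in for a real training step.

```python
import random


def take_subset(samples, fraction, seed=0):
    """Return a reproducible random subset of `samples`."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    rng = random.Random(seed)  # fixed seed so a bug can be reproduced
    k = max(1, int(len(samples) * fraction))
    return rng.sample(samples, k)


def smoke_test(train_one_epoch, samples, fraction=0.01):
    """Run a single epoch on a small subset to catch bugs early."""
    subset = take_subset(samples, fraction)
    return train_one_epoch(subset)


def dummy_epoch(batch):
    # Stand-in for a real epoch: returns the mean of the values.
    return sum(batch) / len(batch)


if __name__ == "__main__":
    data = list(range(1000))  # placeholder for a real dataset
    print("subset size:", len(take_subset(data, 0.05)))
    print("dummy epoch result:", smoke_test(dummy_epoch, data, fraction=0.05))
```

If the single epoch finishes without errors on the laptop, the same code goes to the cloud instance with the full dataset.
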
We can set up Jupyter in the cloud and access it from our own system for a more interactive coding experience. But with large datasets training takes quite some time, and if your SSH session is disconnected before training ends, you will lose the computation and have to start again. Once, I scheduled my training and went to sleep. The next morning I found that my SSH session had been disconnected after 2 hours of running. Advice: create a shell script and append the shutdown command at the end. Now I don't have to babysit the training session.
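
The same idea can be sketched in Python instead of a shell script. This is an illustration under assumptions, not the exact script I use: `run_then_shutdown` is a made-up helper, `train.py` is a hypothetical training script, and `sudo shutdown -h now` is assumed to be the right shutdown command for the instance.

```python
import subprocess
import sys


def run_then_shutdown(train_cmd, shutdown_cmd, log_path="train.log"):
    """Run the training command, log its output, then always shut down.

    The shutdown command runs even if training fails, so a crashed job
    does not leave an idle instance burning credits.
    """
    try:
        with open(log_path, "w") as log:
            result = subprocess.run(
                train_cmd, stdout=log, stderr=subprocess.STDOUT
            )
        return result.returncode
    finally:
        subprocess.run(shutdown_cmd)


# Example call (commented out on purpose, since it powers off the machine):
# run_then_shutdown(
#     [sys.executable, "train.py"],        # hypothetical training script
#     ["sudo", "shutdown", "-h", "now"],   # assumed shutdown command
# )
```

To keep the job alive after an SSH disconnect, the wrapper itself would probably need to be started detached from the session, for example with `nohup` or inside `tmux`; I am assuming one of these is available on the instance.
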
Refresh your knowledge of terminal commands and shell scripting. The basics are enough and go a long way in helping you manage your cloud instance.

These are a handful of suggestions from my own experience while taking a deep learning course and participating in a Kaggle competition. If you have good tips for using a cloud platform for machine learning, do share.