Wednesday, July 29, 2015

Deploy edX Spark environment to DigitalOcean

This summer I took the two Spark courses on edX, CS100 and CS190, and had a wonderful experience.
Both classes use a Vagrant virtual machine containing Spark and all the teaching materials. The virtual machine poses two challenges:
  1. The labs usually take a long time to finish, say 8-10 hours. If the host machine is shut down, the RDDs are lost and the pipeline has to be run again.
  2. Some RDD operations, such as groupByKey and distinct, take a lot of computation and communication power. Many of my 50k classmates complained about the waiting time. Also, the laptop I use most is a Chromebook, which doesn't even have the option of installing VirtualBox.
Deploying the learning environment to a cloud may be an alternative. DigitalOcean is a good choice because it uses mirrors for most packages, and the network speed is amazingly fast, almost 100 MB/s (thanks to the SSD infrastructure DigitalOcean uses for its cloud; otherwise the hard disks might not withstand such rapid IO; see my deployment records on GitHub).

I found that a Linux box at DigitalOcean with 1 GB of memory and 1 CPU, which costs 10 dollars a month, handles most labs fairly easily with IPython and Spark. A droplet with 2 GB of memory and 2 CPUs would be ideal, since that is the minimum requirement for a simulated cluster. It costs 20 dollars a month, but that is still much cheaper than the $100 it costs to earn the big data certificate ($50 for each course). I just need to write Python scripts to install IPython notebook with SSL and to download Spark and the course materials.
  • The DevOps tool is Fabric, and the fabfile is on GitHub.
  • The deployment pipeline is also on GitHub.
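
The install steps described above (IPython notebook with SSL, plus a Spark download) might look roughly like the sketch below. This is not the fabfile from GitHub. It is plain Python that only builds the shell commands, so it runs without Fabric installed. The Spark version, the download URL, the Ubuntu package names, the certificate path and the helper names provisioning_commands and ssh_command are all assumptions made for illustration.

```python
# A minimal sketch, NOT the author's fabfile: it only assembles the shell
# commands that a provisioning script might run on a fresh Ubuntu droplet.

# Assumed values (not taken from the original post).
SPARK_VERSION = "1.3.1"
SPARK_TGZ = "spark-{v}-bin-hadoop2.6.tgz".format(v=SPARK_VERSION)
SPARK_URL = "https://archive.apache.org/dist/spark/spark-{v}/{f}".format(
    v=SPARK_VERSION, f=SPARK_TGZ
)


def provisioning_commands(cert_path="~/notebook.pem", install_dir="~/spark"):
    """Return the shell commands to run on the droplet, in order."""
    return [
        "sudo apt-get update",
        # Package names are assumptions for an Ubuntu 14.04-era image.
        "sudo apt-get install -y python-pip python-dev openjdk-7-jre-headless",
        "sudo pip install 'ipython[notebook]'",
        # Self-signed certificate so the notebook can be served over SSL.
        "openssl req -x509 -nodes -days 365 -newkey rsa:2048 "
        "-subj /CN=localhost -keyout {c} -out {c}".format(c=cert_path),
        "mkdir -p {d}".format(d=install_dir),
        "wget -q {u} -O /tmp/{f}".format(u=SPARK_URL, f=SPARK_TGZ),
        "tar -xzf /tmp/{f} -C {d} --strip-components=1".format(
            f=SPARK_TGZ, d=install_dir
        ),
    ]


def ssh_command(host, command, user="root"):
    """Build an argv list that runs one command on the droplet over ssh."""
    return ["ssh", "{u}@{h}".format(u=user, h=host), command]


if __name__ == "__main__":
    # Print the commands instead of executing them, so the sketch is safe to run.
    for cmd in provisioning_commands():
        print(cmd)
```

Each argv list from ssh_command could be handed to subprocess.check_call to run a step remotely. With Fabric, the same command strings would be passed to its run and sudo helpers instead.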

Comments:

  1. I took both courses this summer and had a great experience working with Spark. I used your fabfile and was able to set up the environment pretty quickly. Thanks again.

  2. I really loved reading your blog. It was very well authored and easy to understand.
