Tuesday, April 9, 2013

Hadoop installation in AWS EC2

I had been using the CDH VM image on my local machine for hands-on experience with Hadoop, and I wanted to try it on AWS to see how smooth the process is.

I started by following Cloudera's blog post on the subject, but it had a lot of issues and things didn't work as outlined. I got a little more help from this blog.

Here is the setup in brief, starting from the point where the instance has been created. As both posts suggest, I used Whirr to install the cluster and avoid a manual setup.

Step 1: Get the latest Whirr binary.
Step 2: Set up the Whirr config file. You can copy the contents below and update the AWS Access Key ID and Secret Access Key accordingly.
Step 3: Install Java.
Step 4: Generate the SSH key pair. I just pressed Enter at the first prompt.
Step 5: Launch the cluster. Wait until Whirr prints the instructions for connecting to the nodes over ssh.
Step 6: ssh to the nodes.
Step 7: Verify that the Hadoop installation works. We'll look at this in more detail later.
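
As a rough sketch of steps 2, 4 and 5, the script below writes a minimal Whirr config and then, only if asked to, generates the key pair and launches the cluster. Several things here are my assumptions rather than the exact setup from the posts: the file name hadoop.properties, the cluster name, the two-node instance template, and the WHIRR_HOME variable pointing at the unpacked Whirr directory. It also reads the AWS keys from environment variables instead of pasting them into the file.

```shell
#!/bin/sh
# Sketch of steps 2, 4 and 5. The file name, cluster name and WHIRR_HOME
# are illustrative assumptions, not values taken from the original posts.
set -eu

# Step 2: write a minimal Whirr config. The quoted 'EOF' stops the shell from
# expanding ${...}; Whirr resolves those placeholders when it reads the file.
cat > hadoop.properties <<'EOF'
whirr.cluster-name=myhadoopcluster
whirr.instance-templates=1 hadoop-namenode+hadoop-jobtracker,1 hadoop-datanode+hadoop-tasktracker
whirr.provider=aws-ec2
whirr.identity=${env:AWS_ACCESS_KEY_ID}
whirr.credential=${env:AWS_SECRET_ACCESS_KEY}
whirr.private-key-file=${sys:user.home}/.ssh/id_rsa
whirr.public-key-file=${whirr.private-key-file}.pub
EOF

# Steps 4 and 5 write to ~/.ssh and start billable EC2 instances, so they
# only run when RUN_WHIRR=yes is set explicitly.
if [ "${RUN_WHIRR:-no}" = "yes" ]; then
    # Step 4: generate an RSA key pair with an empty passphrase.
    # ssh-keygen still asks where to save the key; Enter accepts the default.
    ssh-keygen -t rsa -P ''

    # Step 5: launch the cluster described by the config file.
    # WHIRR_HOME is a hypothetical path to the unpacked Whirr binary.
    "${WHIRR_HOME:-./whirr}/bin/whirr" launch-cluster --config hadoop.properties
fi
```

Run it once as is to produce the config, export AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY, then run it again with RUN_WHIRR=yes. When you are done with the cluster, whirr destroy-cluster --config hadoop.properties should shut the instances down so they stop costing money.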