Configuring and Running

Setting Environment Variables to Specify AWS Credentials

You must specify your AWS credentials when using the cloud scripts (see How do I find my cloud credentials?). The simplest way to do this is to set the environment variables (see this page for other options):
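A minimal sketch of setting the credentials in a Bourne-style shell, assuming the scripts read the standard AWS variable names AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY. The names are not given in this section, and the values shown are placeholders.

```shell
# Assumed variable names: the standard ones read by AWS client libraries.
# Replace the placeholder values with your own credentials.
export AWS_ACCESS_KEY_ID="your-access-key-id"
export AWS_SECRET_ACCESS_KEY="your-secret-access-key"
```

Variables exported this way apply only to the current shell session and to commands started from it.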

Configuring the Python Cloud Scripts

To configure the scripts, create a directory called .hadoop-cloud in your home directory (note the leading period "."). In that directory, create a file called clusters.cfg that contains one section for each cluster you want to control. The following example defines a section for a cluster named my-hadoop-cluster that uses an i386 Ubuntu AMI:

[my-hadoop-cluster]
image_id=ami-ed59bf84
instance_type=c1.medium
key_name=tom
availability_zone=us-east-1c
private_key=/path/to/private/key/file
ssh_options=-i %(private_key)s -o StrictHostKeyChecking=no
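
The setup steps above can be sketched as shell commands. This is one possible way to create the file, not the only one. The CONFIG_DIR variable is introduced here for illustration, and the key name and private key path are the same placeholders used in the example.

```shell
# Hypothetical helper variable pointing at the directory the scripts read.
CONFIG_DIR="$HOME/.hadoop-cloud"
mkdir -p "$CONFIG_DIR"

# The quoted 'EOF' delimiter stops the shell from expanding anything in the
# body, so %(private_key)s reaches the file literally.
# Note: this overwrites any existing clusters.cfg.
cat > "$CONFIG_DIR/clusters.cfg" <<'EOF'
[my-hadoop-cluster]
image_id=ami-ed59bf84
instance_type=c1.medium
key_name=tom
availability_zone=us-east-1c
private_key=/path/to/private/key/file
ssh_options=-i %(private_key)s -o StrictHostKeyChecking=no
EOF
```

The %(private_key)s token follows Python's configuration-file interpolation syntax, so the scripts presumably substitute the private_key value from the same section when building the ssh command line.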

You can select a suitable AMI from the following table:

AMI (bucket/name)                                                  ID            OS
cloudera-ec2-hadoop-images/cloudera-hadoop-ubuntu-20090623-i386    ami-ed59bf84  Ubuntu 8.10 (Intrepid)
cloudera-ec2-hadoop-images/cloudera-hadoop-ubuntu-20090623-x86_64  ami-8759bfee  Ubuntu 8.10 (Intrepid)
cloudera-ec2-hadoop-images/cloudera-hadoop-fedora-20090623-i386    ami-6159bf08  Fedora release 8 (Werewolf)
cloudera-ec2-hadoop-images/cloudera-hadoop-fedora-20090623-x86_64  ami-2359bf4a  Fedora release 8 (Werewolf)

If you wish to use CDH instead of Apache Hadoop, use the following configuration:

[my-hadoop-cluster]
image_id=ami-2d4aa444
instance_type=c1.medium
key_name=tom
availability_zone=us-east-1c
private_key=/path/to/private/key/file
ssh_options=-i %(private_key)s -o StrictHostKeyChecking=no
user_data_file=http://archive.cloudera.com/cloud/ec2/cdh3/hadoop-ec2-init-remote.sh

Note that this example uses CDH3, as specified by the user_data_file property (the version of Hadoop to install is determined by this script). For CDH, use one of the AMIs from this table:

AMI (bucket/name)                                         ID            OS                    Notes
ubuntu-images/ubuntu-lucid-10.04-i386-server-20100427.1   ami-2d4aa444  Ubuntu 10.04 (Lucid)  This AMI is suitable for use with CDH3b2 onwards. See http://alestic.com/
ubuntu-images/ubuntu-lucid-10.04-amd64-server-20100427.1  ami-fd4aa494  Ubuntu 10.04 (Lucid)  This AMI is suitable for use with CDH3b2 onwards. See http://alestic.com/

Running a Basic Cloud Script

After specifying an AMI, you can run the hadoop-ec2 script. Invoked without arguments, it displays usage instructions.

You can test that the script can connect to your cloud provider by typing:

% hadoop-ec2 list