You must specify your AWS credentials when using the cloud scripts (see How do I find my cloud credentials?). The simplest way to do this is to set them as environment variables (see this page for other options).
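As a minimal sketch, assuming the scripts read the standard AWS variable names `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` (the names are an assumption here, and the values shown are placeholders), the credentials could be exported like this:

```shell
# Assumed variable names: the standard AWS credential variables.
# The values are placeholders; substitute your own credentials.
export AWS_ACCESS_KEY_ID="your-access-key-id"
export AWS_SECRET_ACCESS_KEY="your-secret-access-key"
```

Putting these lines in your shell profile makes them available in every new session, at the cost of storing the secret key in a plain-text file.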
To configure the scripts, create a directory called .hadoop-cloud in your home directory (note the leading period "."). In that directory, create a file called clusters.cfg that contains a section for each cluster you want to control. The following example clusters.cfg section selects the i386 Ubuntu AMI:
    [my-hadoop-cluster]
    image_id=ami-ed59bf84
    instance_type=c1.medium
    key_name=tom
    availability_zone=us-east-1c
    private_key=/path/to/private/key/file
    ssh_options=-i %(private_key)s -o StrictHostKeyChecking=no
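The directory and file described above can be created from a shell. This is only a sketch: the `config_dir` variable is a name chosen for illustration, and the `chmod` step is an optional precaution that the original instructions do not mention.

```shell
# Directory and file names are the ones given in the text.
config_dir="$HOME/.hadoop-cloud"

# Create the configuration directory (no error if it already exists).
mkdir -p "$config_dir"

# Create an empty clusters.cfg if there is none yet; an existing file keeps its contents.
touch "$config_dir/clusters.cfg"

# Optional (an addition, not from the original instructions):
# restrict access to the directory to your own user.
chmod 700 "$config_dir"
```

After this, open clusters.cfg in an editor and add one section per cluster.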
You can select a suitable AMI from the following table:
AMI (bucket/name) | ID | OS |
---|---|---|
cloudera-ec2-hadoop-images/cloudera-hadoop-ubuntu-20090623-i386 | ami-ed59bf84 | Ubuntu 8.10 (Intrepid) |
cloudera-ec2-hadoop-images/cloudera-hadoop-ubuntu-20090623-x86_64 | ami-8759bfee | Ubuntu 8.10 (Intrepid) |
cloudera-ec2-hadoop-images/cloudera-hadoop-fedora-20090623-i386 | ami-6159bf08 | Fedora release 8 (Werewolf) |
cloudera-ec2-hadoop-images/cloudera-hadoop-fedora-20090623-x86_64 | ami-2359bf4a | Fedora release 8 (Werewolf) |
If you wish to use CDH instead of Apache Hadoop, use the following configuration:
    [my-hadoop-cluster]
    image_id=ami-2d4aa444
    instance_type=c1.medium
    key_name=tom
    availability_zone=us-east-1c
    private_key=/path/to/private/key/file
    ssh_options=-i %(private_key)s -o StrictHostKeyChecking=no
    user_data_file=http://archive.cloudera.com/cloud/ec2/cdh3/hadoop-ec2-init-remote.sh
Note that this example uses CDH3, as specified by the user_data_file property (the version of Hadoop to install is determined by this script). For CDH, use one of the AMIs from this table:
AMI (bucket/name) | ID | OS | Notes |
---|---|---|---|
ubuntu-images/ubuntu-lucid-10.04-i386-server-20100427.1 | ami-2d4aa444 | Ubuntu 10.04 (Lucid) | This AMI is suitable for use with CDH3b2 onwards. See http://alestic.com/ |
ubuntu-images/ubuntu-lucid-10.04-amd64-server-20100427.1 | ami-fd4aa494 | Ubuntu 10.04 (Lucid) | This AMI is suitable for use with CDH3b2 onwards. See http://alestic.com/ |
After specifying an AMI, you can run the hadoop-ec2 script. It will display usage instructions when you invoke it without arguments.
You can test that the script can connect to your cloud provider by typing:
% hadoop-ec2 list
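If the connection test fails, it can help to rule out local setup problems first. The following is a hedged sketch of such a check: `preflight` is a hypothetical helper that is not part of the hadoop-ec2 distribution, and the environment variable names are the standard AWS ones, which is an assumption about what the scripts read.

```shell
# Hypothetical helper (not shipped with hadoop-ec2): reports missing
# prerequisites on stderr and returns non-zero if any check fails.
preflight() {
  preflight_status=0

  # Assumed variable names: the standard AWS credential variables.
  if [ -z "${AWS_ACCESS_KEY_ID:-}" ]; then
    echo "AWS_ACCESS_KEY_ID is not set" >&2
    preflight_status=1
  fi
  if [ -z "${AWS_SECRET_ACCESS_KEY:-}" ]; then
    echo "AWS_SECRET_ACCESS_KEY is not set" >&2
    preflight_status=1
  fi

  # Configuration file location as described in the text.
  if [ ! -f "$HOME/.hadoop-cloud/clusters.cfg" ]; then
    echo "$HOME/.hadoop-cloud/clusters.cfg does not exist" >&2
    preflight_status=1
  fi

  # The script itself must be reachable through PATH.
  if ! command -v hadoop-ec2 >/dev/null 2>&1; then
    echo "hadoop-ec2 is not on PATH" >&2
    preflight_status=1
  fi

  return "$preflight_status"
}
```

With the function defined in your shell, `preflight && hadoop-ec2 list` runs the connection test only when all local checks pass.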