Frequently Asked Questions

How do I find my cloud credentials?

On EC2:

  1. Go to http://aws-portal.amazon.com/gp/aws/developer/account/index.html?action=access-key
  2. Log in, if prompted
  3. Find your Access Key ID and Secret Access Key in the "Access Credentials" section, under the "Access Keys" tab. You will have to click "Show" to see the text of your secret access key.

Another good resource is Understanding Access Credentials for AWS/EC2 by Eric Hammond.

Can I specify my own private key?

Yes, by setting whirr.private-key-file (or --private-key-file on the command line). You should also set whirr.public-key-file (--public-key-file) at the same time.

Private keys must not have a passphrase associated with them. You can check this with:

grep ENCRYPTED ~/.ssh/id_rsa

If there is no passphrase then there will be no match.

How do I access my cluster from a different network?

By default, access to clusters is restricted to the single IP address of the machine starting the cluster, as determined by Amazon's check IP service. However, some networks report multiple origin IP addresses (e.g. they round-robin between them by connection), which may cause problems if the address used for later connections is different to the one reported at the time of the first connection.

A related problem is when you wish to access the cluster from a different network to the one it was launched from.

In these cases you can specify the IP addresses of the machines that may connect to the cluster by setting the client-cidrs property to a comma-separated list of CIDR blocks.

For example, 208.128.0.0/16,38.102.147.107/32 would allow access from the 208.128.0.0 class B network, and the (single) IP address 38.102.147.107.

How can I start a cluster in a particular location?

By default clusters are started in an arbitrary location (e.g. region or data center). You can control the location by setting location-id (see the configuration guide for details).

For example, in EC2, setting location-id to us-east-1 would start the cluster in the US-East region, while setting it to us-east-1a (note the final a) would start the cluster in that particular availability zone (us-east-1a) in the US-East region.

How can I use a custom image? How can I control the cloud hardware used?

The default image used is dependent on the Cloud provider, the hardware, and the service.

Use image-id to specify the image used, and hardware-id to specify the hardware. Both are cloud-specific.

In addition, on EC2 you need to set jclouds.ec2.ami-owners to include the AMI owner if it is not Amazon, Alestic, Canonical, or RightScale.

How do I log in to a node in the cluster?

On EC2, if you know the node's address you can do

ssh -i ~/.ssh/id_rsa ec2-user@host

This assumes that you use the default private key; if this is not the case then specify the one you used at cluster launch.

The Amazon Linux AMI requires that you login as ec2-user. If needed, you can become root by doing sudo su - after logging in.

Different AMIs may have different login users. Check with the documentation for the AMI.

How can I modify the instance installation and configuration scripts?

The scripts to install and configure cloud instances are downloaded from an S3 bucket by the instances at, or after, boot time. The base URL defaults to http://whirr.s3.amazonaws.com/VERSION/, where VERSION is the version of Whirr. (Note that S3 buckets are not browsable, so you can't use a browser to look at these scripts unless you know the URL.)

If you want to change the scripts then you can place a modified copy of the scripts in the scripts directory of the distribution on a webserver (such as S3) and change the base URL, by setting the run-url-base property. You need to copy all the scripts, not just the ones you have changed.

For example, by setting run-url-base to http://example.org/ the scripts would be loaded from the example.org domain. The Java install script, for instance, would be requested from http://example.org/sun/java/install.

Scripts have to be publicly readable, so on S3 you have to set the ACL to give everyone read access. S3Fox is a useful Firefox extension for uploading and managing script files on S3.

You can debug the scripts that run on a cloud instance without having to log into the instance, since the output is sent to whirr.log in the directory from which you launched the whirr CLI.

How do I specify the service version and other service properties?

Currently the only way to do this is to modify the scripts to install a particular version of the service, or to change the service properties from the defaults.

See "How to modify the instance installation and configuration scripts" above for details on how to do this.

How can I install custom packages?

You can install extra software by modifying the scripts that run on the cloud instances. See "How to modify the instance installation and configuration scripts" above.

How do I run Cloudera's Distribution for Hadoop?

You can run CDH rather than Apache Hadoop by running the hadoop service and setting the whirr.hadoop-install-runurl and whirr.hadoop-configure-runurl properties. See the recipes directory in the distribution for samples.

How do I run a ZooKeeper cluster?

See the recipes directory in the distribution for samples.

How do I run a Cassandra cluster?

See the recipes directory in the distribution for samples.

How do I automatically tear down a cluster after a fixed time?

It's often convenient to terminate a cluster a fixed time after launch. This is the case for test clusters, for example. You can achieve this by scheduling the destroy command using the at command from your local machine.

WARNING: The machine from which you issued the atcommand must be running (and able to contact the cloud provider) at the time it runs.

% echo 'bin/whirr destroy-cluster --config hadoop.properties' \
    | at 'now + 50 min'

Note that issuing a shutdown command on an instance may simply stop the instance, which is not sufficient to fully terminate the instance, in which case you would continue to be charged for it. This is the case for EBS boot instances, for example.

You can read more about this technique on Eric Hammond's blog.

Also, Mac OS X users might find this thread a useful reference for the at command.