After you launch a cluster, a hadoop-site.xml file is created in the directory ~/.hadoop-cloud/<cluster-name>. You can connect to the cluster by pointing the HADOOP_CONF_DIR environment variable at this directory. (Alternatively, you can specify the configuration file to use on a per-command basis by passing it to Hadoop tools with the -conf option.)
% export HADOOP_CONF_DIR=~/.hadoop-cloud/my-hadoop-cluster
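As a sketch of the -conf alternative mentioned above (assuming the same cluster name), you could list the cluster's HDFS without setting the environment variable at all:
% hadoop fs -conf ~/.hadoop-cloud/my-hadoop-cluster/hadoop-site.xml -ls /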
To browse HDFS:
% hadoop fs -ls /
Note that the version of Hadoop installed locally should match the version installed on the cluster.
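A quick sanity check (not part of the original workflow) is to compare the output of the version command locally with the same command run on the cluster after logging in, as described later in this section:
% hadoop version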
To run a job locally:
% hadoop fs -mkdir input                           # create an input directory
% hadoop fs -put $HADOOP_HOME/LICENSE.txt input    # copy a file there
% hadoop jar $HADOOP_HOME/hadoop-*examples*.jar wordcount input output
% hadoop fs -cat output/part-* | head
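If you rerun the example, delete the output directory first, since MapReduce will not write to an output directory that already exists (hadoop fs -rmr is the recursive-delete command in the Hadoop versions these scripts target):
% hadoop fs -rmr output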
The preceding examples assume that you installed Hadoop on your local machine. But you can also run jobs within the cluster.
To run jobs within the cluster:
1. Log into the Namenode:
% hadoop-ec2 login my-hadoop-cluster
2. Run the job:
# hadoop fs -mkdir input
# hadoop fs -put /etc/hadoop/conf/*.xml input
# hadoop jar /usr/lib/hadoop/hadoop-*-examples.jar grep input output 'dfs[a-z.]+'
# hadoop fs -cat output/part-* | head