Hadoop: Setup a cluster test system
Tested with stable release 1.0.3 on Ubuntu 12.04.
Installation
Ensure the following:

- Install Hadoop on all systems (tar zxvf the hadoop tarball into /opt)
- Have passwordless ssh between all hosts
- Install pdsh (useful for running commands across all hosts) and csync2 (to keep the Hadoop configuration files in sync)
- Ensure Java is installed on all nodes (apt-get install openjdk-7-jre on 12.04)
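A minimal sketch of the passwordless-ssh and pdsh prerequisites, assuming a root account on every host and placeholder hostnames node01..node22:

# generate a key on the master and push it to each node (hostnames are placeholders)
ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa
ssh-copy-id root@node01
# with pdsh installed, run a quick sanity check across the whole cluster
pdsh -w node[01-22] uptime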
Default Settings in Hadoop
The following files contain all the default settings for Hadoop. These can be changed in the site-specific configuration files.
- src/core/core-default.xml
- src/hdfs/hdfs-default.xml
- src/mapred/mapred-default.xml

The following files can be used to override any of the default parameters in the files above (the site configuration files):
- conf/core-site.xml
- conf/hdfs-site.xml
- conf/mapred-site.xml

Setup Hadoop Cluster
Set up the local environment and control Hadoop startup variables (at the very least, set $JAVA_HOME!)
conf/hadoop-env.sh
..
# point Hadoop at the installed JRE (path shown is openjdk-7 on armhf; adjust per host)
export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-armhf/jre
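If you are unsure where the JRE lives on a particular node, resolving the java binary shows it (JAVA_HOME is the directory above bin/):

# prints e.g. /usr/lib/jvm/java-7-openjdk-armhf/jre/bin/java
readlink -f $(which java)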
conf/core-site.xml
<xml>
<configuration>
  <property>
    <name>fs.default.name</name>
    <!-- replace hostname with the namenode host -->
    <value>hdfs://hostname:9000</value>
  </property>
</configuration>
</xml>
conf/hdfs-site.xml
Note: in this configuration we are only using one disk per host; /data/hadoop/dfs is the Hadoop data location (dfs.name.dir holds the namenode metadata, dfs.data.dir the actual blocks).
<xml>
<configuration>
  <property>
    <name>dfs.replication</name>
    <!-- a single replica is enough for a test cluster -->
    <value>1</value>
  </property>
  <property>
    <name>dfs.name.dir</name>
    <value>/data/hadoop/dfs/name</value>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>/data/hadoop/dfs/data</value>
  </property>
  <property>
    <name>fs.checkpoint.dir</name>
    <value>/data/hadoop/dfs/namesecondary</value>
  </property>
</configuration>
</xml>
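The data location has to exist on every node before the daemons start; a quick way to create it with pdsh (the host range is a placeholder for your own nodes):

# create the base data directory across all hosts
pdsh -w node[01-22] mkdir -p /data/hadoop/dfs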
conf/mapred-site.xml
<xml>
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <!-- replace hostname with the jobtracker host -->
    <value>hostname:9001</value>
  </property>
  <property>
    <name>mapred.local.dir</name>
    <value>/data/hadoop/mapred/local</value>
  </property>
  <property>
    <name>mapred.system.dir</name>
    <value>/data/hadoop/mapred/system</value>
  </property>
  <property>
    <name>mapred.tasktracker.map.tasks.maximum</name>
    <value>512</value>
    <description>The maximum number of map tasks that will be run
    simultaneously by a task tracker.
    </description>
  </property>
  <property>
    <name>mapred.tasktracker.reduce.tasks.maximum</name>
    <value>512</value>
    <description>The maximum number of reduce tasks that will be run
    simultaneously by a task tracker.
    </description>
  </property>
</configuration>
</xml>
- conf/slaves: a newline-separated list of the hosts that will act as data nodes. Ensure passwordless ssh access between all hosts before starting the cluster.
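With the site files synced out, the remaining steps are to format HDFS and start the daemons. A sketch of the usual Hadoop 1.0.3 sequence, run from the master (the pdcp host range is a placeholder):

cd /opt/hadoop-1.0.3
# push the edited configuration to every node (pdcp ships with pdsh)
pdcp -w node[01-22] conf/*.xml conf/hadoop-env.sh conf/slaves /opt/hadoop-1.0.3/conf/
# one-time step: initialize the namenode metadata in dfs.name.dir
./bin/hadoop namenode -format
# start HDFS (the namenode plus the datanodes in conf/slaves), then MapReduce
./bin/start-dfs.sh
./bin/start-mapred.sh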
Verify Hadoop is working OK
Check that HDFS is working as expected
root@calx2:~# cd /opt/hadoop-1.0.3/
root@calx2:/opt/hadoop-1.0.3# ./bin/hadoop dfsadmin -report
Configured Capacity: 5533761699840 (5.03 TB)
Present Capacity: 5144346378240 (4.68 TB)
DFS Remaining: 5144345747456 (4.68 TB)
DFS Used: 630784 (616 KB)
DFS Used%: 0%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0
-------------------------------------------------
Datanodes available: 22 (22 total, 0 dead)
Name: 172.28.0.190:50010
Decommission Status : Normal
Configured Capacity: 251534622720 (234.26 GB)
DFS Used: 28672 (28 KB)
Non DFS Used: 17711915008 (16.5 GB)
DFS Remaining: 233822679040(217.76 GB)
DFS Used%: 0%
DFS Remaining%: 92.96%
Last contact: Tue Jul 24 04:11:55 CDT 2012
..
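Finally, a quick end-to-end smoke test is to run one of the example jobs bundled with the release tarball:

cd /opt/hadoop-1.0.3
# estimate pi with 10 maps of 100 samples each; exercises HDFS and MapReduce together
./bin/hadoop jar hadoop-examples-1.0.3.jar pi 10 100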