Hadoop: Setup a single host test system
Latest revision as of 13:47, 25 July 2012
Tests performed on a single Calxeda SoC running Ubuntu 12.10.
Prerequisites
Install Java/JRE
apt-get update
apt-get install default-jre openjdk-7-jre
Setup Passwordless Access
Set up passwordless SSH for the user that will run Hadoop (I used root in this example; a separate hadoop user should really be set up!)
ssh-keygen -t rsa
# don't enter a passphrase, just hit enter twice for a blank passphrase
cd ~/.ssh
cat id_rsa.pub >> authorized_keys
chmod 600 authorized_keys
Install Hadoop
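Before going further, it may be worth confirming that the passwordless SSH setup from the previous section is in place, since start-all.sh logs in to localhost over SSH to launch the daemons. The function below is a minimal sketch, not part of Hadoop or OpenSSH: `check_ssh_perms` is a hypothetical name, and the 700/600 modes it insists on are a conservative assumption (sshd itself is usually more lenient). It also assumes GNU `stat`, as shipped on Ubuntu.

```shell
# Hypothetical helper: report whether the key directory and authorized_keys
# carry the conservative modes assumed here (700 for the directory, 600 for
# the file). Defaults to ~/.ssh when no directory is given.
check_ssh_perms() {
    dir="${1:-$HOME/.ssh}"
    keys="$dir/authorized_keys"
    if [ ! -f "$keys" ]; then
        echo "missing: $keys"
        return 1
    fi
    # GNU stat prints the octal permission bits with -c '%a'.
    dir_mode=$(stat -c '%a' "$dir")
    keys_mode=$(stat -c '%a' "$keys")
    if [ "$dir_mode" != "700" ] || [ "$keys_mode" != "600" ]; then
        echo "unexpected modes: $dir=$dir_mode $keys=$keys_mode"
        return 1
    fi
    echo "ok: $keys"
}
```

Running `check_ssh_perms` with no argument inspects ~/.ssh. For an end-to-end check, `ssh -o BatchMode=yes localhost true` should return without prompting for a password.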
Get latest stable release
The latest release is available from: http://ftp.heanet.ie/mirrors/www.apache.org/dist/hadoop/common/stable/
wget http://ftp.heanet.ie/mirrors/www.apache.org/dist/hadoop/common/stable/hadoop-1.0.3.tar.gz
cd /opt
tar zxvf /path/to/download/hadoop-1.0.3.tar.gz
Setup Config Files
All files in question here are found in /opt/hadoop-1.0.3
conf/core-site.xml:
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
conf/hdfs-site.xml:
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
conf/mapred-site.xml:
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>
</configuration>
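If the setup has to be repeated on several nodes, the three XML files above could be generated by a script rather than edited by hand. This is only a sketch: `write_site_file` and `write_single_node_conf` are hypothetical names, the property names and values are the ones listed above, and any existing files in the target directory are overwritten.

```shell
# Hypothetical helper: write one *-site.xml file holding a single property.
# $1 = target file, $2 = property name, $3 = property value
write_site_file() {
    cat > "$1" <<EOF
<?xml version="1.0"?>
<configuration>
  <property>
    <name>$2</name>
    <value>$3</value>
  </property>
</configuration>
EOF
}

# Hypothetical helper: write the three single-node config files into the
# given conf directory (defaults to the install path used on this page).
write_single_node_conf() {
    conf_dir="${1:-/opt/hadoop-1.0.3/conf}"
    write_site_file "$conf_dir/core-site.xml"   fs.default.name    hdfs://localhost:9000
    write_site_file "$conf_dir/hdfs-site.xml"   dfs.replication    1
    write_site_file "$conf_dir/mapred-site.xml" mapred.job.tracker localhost:9001
}
```

Calling `write_single_node_conf` with no argument writes to /opt/hadoop-1.0.3/conf; passing another directory first makes it easy to review the files before copying them into place.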
conf/hadoop-env.sh:
export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-armhf/jre
Format the namenode
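Before formatting, it can help to confirm that the JAVA_HOME set in hadoop-env.sh matches the Java actually installed, because the armhf path used here will differ on other architectures. A minimal sketch, assuming the usual layout where the java binary lives in `$JAVA_HOME/bin/java`; `java_home_from_binary` is a hypothetical helper name.

```shell
# Hypothetical helper: given the resolved path of a java binary, print the
# directory JAVA_HOME should point at (the parent of its bin/ directory).
java_home_from_binary() {
    java_bin="$1"
    case "$java_bin" in
        */bin/java)
            # Strip the trailing /bin/java to get the home directory.
            printf '%s\n' "${java_bin%/bin/java}"
            ;;
        *)
            echo "not a java binary path: $java_bin" >&2
            return 1
            ;;
    esac
}
```

On the target machine, `java_home_from_binary "$(readlink -f "$(command -v java)")"` follows the /usr/bin/java symlink chain and prints the value to put in hadoop-env.sh.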
./bin/hadoop namenode -format
Start Hadoop
./bin/start-all.sh
Verify Hadoop
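A quick first check is whether all five Hadoop 1.x daemons came up after start-all.sh. The sketch below reads `jps` output on stdin and names any daemon that is missing; `missing_daemons` is a hypothetical helper name. Note that `jps` ships with the JDK rather than the JRE packages installed above, so this assumes a JDK is also present.

```shell
# Hypothetical helper: read `jps` output on stdin and report which of the
# five Hadoop 1.x daemons are not listed.
missing_daemons() {
    jps_output=$(cat)
    missing=""
    for daemon in NameNode SecondaryNameNode DataNode JobTracker TaskTracker; do
        # jps prints "<pid> <main class>"; compare the second field exactly
        # so that SecondaryNameNode is not mistaken for NameNode.
        if ! printf '%s\n' "$jps_output" |
            awk -v d="$daemon" '$2 == d { found = 1 } END { exit !found }'; then
            missing="$missing $daemon"
        fi
    done
    if [ -n "$missing" ]; then
        echo "missing:$missing"
        return 1
    fi
    echo "all daemons running"
}
```

Typical use is `jps | missing_daemons`. If a daemon is reported missing, its log file under the logs directory of the Hadoop install is the first place to look.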
Check available tests
root@cal4:/opt/hadoop-1.0.3$ ./bin/hadoop jar hadoop-test-1.0.3.jar
An example program must be given as the first argument.
Valid program names are:
DFSCIOTest: Distributed i/o benchmark of libhdfs.
DistributedFSCheck: Distributed checkup of the file system consistency.
MRReliabilityTest: A program that tests the reliability of the MR framework by injecting faults/failures
TestDFSIO: Distributed i/o benchmark.
dfsthroughput: measure hdfs throughput
filebench: Benchmark SequenceFile(Input|Output)Format (block,record compressed and uncompressed), Text(Input|Output)Format (compressed and uncompressed)
loadgen: Generic map/reduce load generator
mapredtest: A map/reduce test check.
mrbench: A map/reduce benchmark that can create many small jobs
nnbench: A benchmark that stresses the namenode.
testarrayfile: A test for flat files of binary key/value pairs.
testbigmapoutput: A map/reduce program that works on a very big non-splittable file and does identity map/reduce
testfilesystem: A test for FileSystem read/write.
testipc: A test for ipc.
testmapredsort: A map/reduce program that validates the map-reduce framework's sort.
testrpc: A test for rpc.
testsequencefile: A test for flat files of binary key value pairs.
testsequencefileinputformat: A test for sequence file input format.
testsetfile: A test for flat files of binary key/value pairs.
testtextinputformat: A test for text input format.
threadedmapbench: A map/reduce benchmark that compares the performance of maps with multiple spills over maps with 1 spill
Run the DFSIO Test
root@cal4:/opt/hadoop-1.0.3$ ./bin/hadoop jar hadoop-test*.jar TestDFSIO -write -nrFiles 10 -fileSize 100 TestDFSIO.0.0.4
12/07/18 12:18:43 INFO fs.TestDFSIO: nrFiles = 10
12/07/18 12:18:43 INFO fs.TestDFSIO: fileSize (MB) = 100
12/07/18 12:18:43 INFO fs.TestDFSIO: bufferSize = 1000000
12/07/18 12:18:45 INFO fs.TestDFSIO: creating control file: 100 mega bytes, 10 files
12/07/18 12:18:45 INFO fs.TestDFSIO: created control files for: 10 files
12/07/18 12:18:46 INFO mapred.FileInputFormat: Total input paths to process : 10
12/07/18 12:18:46 INFO mapred.JobClient: Running job: job_201207171641_0004
12/07/18 12:18:48 INFO mapred.JobClient: map 0% reduce 0%
12/07/18 12:19:10 INFO mapred.JobClient: map 20% reduce 0%
12/07/18 12:19:22 INFO mapred.JobClient: map 40% reduce 6%
12/07/18 12:19:31 INFO mapred.JobClient: map 40% reduce 13%
12/07/18 12:19:34 INFO mapred.JobClient: map 60% reduce 13%
12/07/18 12:19:46 INFO mapred.JobClient: map 80% reduce 20%
12/07/18 12:19:52 INFO mapred.JobClient: map 80% reduce 26%
12/07/18 12:19:58 INFO mapred.JobClient: map 100% reduce 26%
12/07/18 12:20:07 INFO mapred.JobClient: map 100% reduce 100%
12/07/18 12:20:15 INFO mapred.JobClient: Job complete: job_201207171641_0004
12/07/18 12:20:16 INFO mapred.JobClient: Counters: 30
12/07/18 12:20:16 INFO mapred.JobClient: Job Counters
12/07/18 12:20:16 INFO mapred.JobClient: Launched reduce tasks=1
12/07/18 12:20:16 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=119264
12/07/18 12:20:16 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
12/07/18 12:20:16 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
12/07/18 12:20:16 INFO mapred.JobClient: Launched map tasks=10
12/07/18 12:20:16 INFO mapred.JobClient: Data-local map tasks=10
12/07/18 12:20:16 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=56575
12/07/18 12:20:16 INFO mapred.JobClient: File Input Format Counters
12/07/18 12:20:16 INFO mapred.JobClient: Bytes Read=1120
12/07/18 12:20:16 INFO mapred.JobClient: File Output Format Counters
12/07/18 12:20:16 INFO mapred.JobClient: Bytes Written=78
12/07/18 12:20:16 INFO mapred.JobClient: FileSystemCounters
12/07/18 12:20:16 INFO mapred.JobClient: FILE_BYTES_READ=851
12/07/18 12:20:16 INFO mapred.JobClient: HDFS_BYTES_READ=2360
12/07/18 12:20:16 INFO mapred.JobClient: FILE_BYTES_WRITTEN=238588
12/07/18 12:20:16 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=1048576078
12/07/18 12:20:16 INFO mapred.JobClient: Map-Reduce Framework
12/07/18 12:20:16 INFO mapred.JobClient: Map output materialized bytes=905
12/07/18 12:20:16 INFO mapred.JobClient: Map input records=10
12/07/18 12:20:16 INFO mapred.JobClient: Reduce shuffle bytes=815
12/07/18 12:20:16 INFO mapred.JobClient: Spilled Records=100
12/07/18 12:20:16 INFO mapred.JobClient: Map output bytes=745
12/07/18 12:20:16 INFO mapred.JobClient: Total committed heap usage (bytes)=1626120192
12/07/18 12:20:16 INFO mapred.JobClient: CPU time spent (ms)=58680
12/07/18 12:20:16 INFO mapred.JobClient: Map input bytes=260
12/07/18 12:20:16 INFO mapred.JobClient: SPLIT_RAW_BYTES=1240
12/07/18 12:20:16 INFO mapred.JobClient: Combine input records=0
12/07/18 12:20:16 INFO mapred.JobClient: Reduce input records=50
12/07/18 12:20:16 INFO mapred.JobClient: Reduce input groups=5
12/07/18 12:20:16 INFO mapred.JobClient: Combine output records=0
12/07/18 12:20:16 INFO mapred.JobClient: Physical memory (bytes) snapshot=1853804544
12/07/18 12:20:16 INFO mapred.JobClient: Reduce output records=5
12/07/18 12:20:16 INFO mapred.JobClient: Virtual memory (bytes) snapshot=4109959168
12/07/18 12:20:16 INFO mapred.JobClient: Map output records=50
12/07/18 12:20:16 INFO fs.TestDFSIO: ----- TestDFSIO ----- : write
12/07/18 12:20:16 INFO fs.TestDFSIO: Date & time: Wed Jul 18 12:20:16 BST 2012
12/07/18 12:20:16 INFO fs.TestDFSIO: Number of files: 10
12/07/18 12:20:16 INFO fs.TestDFSIO: Total MBytes processed: 1000
12/07/18 12:20:16 INFO fs.TestDFSIO: Throughput mb/sec: 21.530379365284418
12/07/18 12:20:16 INFO fs.TestDFSIO: Average IO rate mb/sec: 21.541706085205078
12/07/18 12:20:16 INFO fs.TestDFSIO: IO rate std deviation: 0.4955591491226172
12/07/18 12:20:16 INFO fs.TestDFSIO: Test exec time sec: 90.213
12/07/18 12:20:16 INFO fs.TestDFSIO:
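Only the last few lines of that output matter when comparing runs. As a rough sketch, the headline figures can be pulled out of a saved log with awk; `dfsio_summary` is a hypothetical helper name and it assumes the summary line labels shown above.

```shell
# Hypothetical helper: read a TestDFSIO log on stdin and print the total
# size, throughput and wall-clock time on one line.
dfsio_summary() {
    awk -F': +' '
        /Total MBytes processed/ { total = $NF + 0; seen++ }
        /Throughput mb\/sec/     { tput  = $NF + 0; seen++ }
        /Test exec time sec/     { secs  = $NF + 0; seen++ }
        END {
            # Fail if any of the three summary lines was not found.
            if (seen < 3) exit 1
            printf "total_mb=%s throughput_mb_s=%.2f exec_time_s=%s\n", total, tput, secs
        }'
}
```

For example, `dfsio_summary < dfsio-write.log` (file name assumed) prints one line per log. The write test leaves its files in HDFS; running the same jar with `TestDFSIO -clean` should remove them.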