Hadoop: Setup a single host test system

From Define Wiki

Tests were performed on a single Calxeda SoC running Ubuntu 12.10.

Prerequisites

Install Java/JRE

  apt-get update
  apt-get install default-jre

Setup Passwordless Access

Set up passwordless SSH for the user that will run Hadoop (root is used in this example; a separate hadoop user should really be set up!)

  ssh-keygen -t rsa
  # don't enter a passphrase, just hit enter twice for a blank passphrase
  cd ~/.ssh
  cat id_rsa.pub >> authorized_keys
  chmod 600 authorized_keys
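
The same steps can be dry-run against a scratch directory first, which makes it easy to eyeball the resulting file layout and permissions before touching the real ~/.ssh (the scratch path is illustrative only):

```shell
# Sketch: generate a throwaway RSA key pair in a scratch dir to illustrate
# the key/authorized_keys layout; the real run targets ~/.ssh instead.
dir=$(mktemp -d)
ssh-keygen -t rsa -N "" -f "$dir/id_rsa" -q   # -N "" supplies the blank passphrase
cat "$dir/id_rsa.pub" >> "$dir/authorized_keys"
chmod 600 "$dir/authorized_keys"
ls -l "$dir"
```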

Install Hadoop

Get the latest stable release

The latest release is available from: http://ftp.heanet.ie/mirrors/www.apache.org/dist/hadoop/common/stable/

  wget http://ftp.heanet.ie/mirrors/www.apache.org/dist/hadoop/common/stable/hadoop-1.0.3.tar.gz
  cd /opt
  tar zxvf /path/to/download/hadoop-1.0.3.tar.gz

Setup Config Files

All paths below are relative to /opt/hadoop-1.0.3.

conf/core-site.xml:

  <configuration>
      <property>
          <name>fs.default.name</name>
          <value>hdfs://localhost:9000</value>
      </property>
  </configuration>

conf/hdfs-site.xml:

  <configuration>
      <property>
          <name>dfs.replication</name>
          <value>1</value>
      </property>
  </configuration>

conf/mapred-site.xml:

  <configuration>
      <property>
          <name>mapred.job.tracker</name>
          <value>localhost:9001</value>
      </property>
  </configuration>
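
Editing the three files by hand works fine; for repeatable installs the same settings can also be written non-interactively. A minimal sketch for core-site.xml (CONF here is a scratch directory for illustration; on the real host it would be /opt/hadoop-1.0.3/conf, and hdfs-site.xml/mapred-site.xml follow the same pattern):

```shell
# Sketch: write core-site.xml non-interactively via a heredoc.
# CONF is a scratch directory; on the real host point it at
# /opt/hadoop-1.0.3/conf instead.
CONF=$(mktemp -d)

cat > "$CONF/core-site.xml" <<'EOF'
<configuration>
    <property>
        <name>fs.default.name</name>
        <value>hdfs://localhost:9000</value>
    </property>
</configuration>
EOF

# quick sanity check that the value landed
grep 'hdfs://localhost:9000' "$CONF/core-site.xml"
```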

conf/hadoop-env.sh:

  export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-armhf/jre

Format the namenode and start the daemons

  cd /opt/hadoop-1.0.3
  ./bin/hadoop namenode -format
  ./bin/start-all.sh

Verify Hadoop

Check available tests

root@cal4:/opt/hadoop-1.0.3# ./bin/hadoop jar hadoop-test-1.0.3.jar
An example program must be given as the first argument.
Valid program names are:
  DFSCIOTest: Distributed i/o benchmark of libhdfs.
  DistributedFSCheck: Distributed checkup of the file system consistency.
  MRReliabilityTest: A program that tests the reliability of the MR framework by injecting faults/failures
  TestDFSIO: Distributed i/o benchmark.
  dfsthroughput: measure hdfs throughput
  filebench: Benchmark SequenceFile(Input|Output)Format (block,record compressed and uncompressed), Text(Input|Output)Format (compressed and uncompressed)
  loadgen: Generic map/reduce load generator
  mapredtest: A map/reduce test check.
  mrbench: A map/reduce benchmark that can create many small jobs
  nnbench: A benchmark that stresses the namenode.
  testarrayfile: A test for flat files of binary key/value pairs.
  testbigmapoutput: A map/reduce program that works on a very big non-splittable file and does identity map/reduce
  testfilesystem: A test for FileSystem read/write.
  testipc: A test for ipc.
  testmapredsort: A map/reduce program that validates the map-reduce framework's sort.
  testrpc: A test for rpc.
  testsequencefile: A test for flat files of binary key value pairs.
  testsequencefileinputformat: A test for sequence file input format.
  testsetfile: A test for flat files of binary key/value pairs.
  testtextinputformat: A test for text input format.
  threadedmapbench: A map/reduce benchmark that compares the performance of maps with multiple spills over maps with 1 spill

Run the DFSIO Test

root@cal4:/opt/hadoop-1.0.3# ./bin/hadoop jar hadoop-test*.jar TestDFSIO -write -nrFiles 10 -fileSize 100
TestDFSIO.0.0.4
12/07/18 12:18:43 INFO fs.TestDFSIO: nrFiles = 10
12/07/18 12:18:43 INFO fs.TestDFSIO: fileSize (MB) = 100
12/07/18 12:18:43 INFO fs.TestDFSIO: bufferSize = 1000000
12/07/18 12:18:45 INFO fs.TestDFSIO: creating control file: 100 mega bytes, 10 files
12/07/18 12:18:45 INFO fs.TestDFSIO: created control files for: 10 files
12/07/18 12:18:46 INFO mapred.FileInputFormat: Total input paths to process : 10
12/07/18 12:18:46 INFO mapred.JobClient: Running job: job_201207171641_0004
12/07/18 12:18:48 INFO mapred.JobClient:  map 0% reduce 0%
12/07/18 12:19:10 INFO mapred.JobClient:  map 20% reduce 0%
12/07/18 12:19:22 INFO mapred.JobClient:  map 40% reduce 6%
12/07/18 12:19:31 INFO mapred.JobClient:  map 40% reduce 13%
12/07/18 12:19:34 INFO mapred.JobClient:  map 60% reduce 13%
12/07/18 12:19:46 INFO mapred.JobClient:  map 80% reduce 20%
12/07/18 12:19:52 INFO mapred.JobClient:  map 80% reduce 26%
12/07/18 12:19:58 INFO mapred.JobClient:  map 100% reduce 26%
12/07/18 12:20:07 INFO mapred.JobClient:  map 100% reduce 100%
12/07/18 12:20:15 INFO mapred.JobClient: Job complete: job_201207171641_0004
12/07/18 12:20:16 INFO mapred.JobClient: Counters: 30
12/07/18 12:20:16 INFO mapred.JobClient:   Job Counters
12/07/18 12:20:16 INFO mapred.JobClient:     Launched reduce tasks=1
12/07/18 12:20:16 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=119264
12/07/18 12:20:16 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
12/07/18 12:20:16 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
12/07/18 12:20:16 INFO mapred.JobClient:     Launched map tasks=10
12/07/18 12:20:16 INFO mapred.JobClient:     Data-local map tasks=10
12/07/18 12:20:16 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=56575
12/07/18 12:20:16 INFO mapred.JobClient:   File Input Format Counters
12/07/18 12:20:16 INFO mapred.JobClient:     Bytes Read=1120
12/07/18 12:20:16 INFO mapred.JobClient:   File Output Format Counters
12/07/18 12:20:16 INFO mapred.JobClient:     Bytes Written=78
12/07/18 12:20:16 INFO mapred.JobClient:   FileSystemCounters
12/07/18 12:20:16 INFO mapred.JobClient:     FILE_BYTES_READ=851
12/07/18 12:20:16 INFO mapred.JobClient:     HDFS_BYTES_READ=2360
12/07/18 12:20:16 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=238588
12/07/18 12:20:16 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=1048576078
12/07/18 12:20:16 INFO mapred.JobClient:   Map-Reduce Framework
12/07/18 12:20:16 INFO mapred.JobClient:     Map output materialized bytes=905
12/07/18 12:20:16 INFO mapred.JobClient:     Map input records=10
12/07/18 12:20:16 INFO mapred.JobClient:     Reduce shuffle bytes=815
12/07/18 12:20:16 INFO mapred.JobClient:     Spilled Records=100
12/07/18 12:20:16 INFO mapred.JobClient:     Map output bytes=745
12/07/18 12:20:16 INFO mapred.JobClient:     Total committed heap usage (bytes)=1626120192
12/07/18 12:20:16 INFO mapred.JobClient:     CPU time spent (ms)=58680
12/07/18 12:20:16 INFO mapred.JobClient:     Map input bytes=260
12/07/18 12:20:16 INFO mapred.JobClient:     SPLIT_RAW_BYTES=1240
12/07/18 12:20:16 INFO mapred.JobClient:     Combine input records=0
12/07/18 12:20:16 INFO mapred.JobClient:     Reduce input records=50
12/07/18 12:20:16 INFO mapred.JobClient:     Reduce input groups=5
12/07/18 12:20:16 INFO mapred.JobClient:     Combine output records=0
12/07/18 12:20:16 INFO mapred.JobClient:     Physical memory (bytes) snapshot=1853804544
12/07/18 12:20:16 INFO mapred.JobClient:     Reduce output records=5
12/07/18 12:20:16 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=4109959168
12/07/18 12:20:16 INFO mapred.JobClient:     Map output records=50
12/07/18 12:20:16 INFO fs.TestDFSIO: ----- TestDFSIO ----- : write
12/07/18 12:20:16 INFO fs.TestDFSIO:            Date & time: Wed Jul 18 12:20:16 BST 2012
12/07/18 12:20:16 INFO fs.TestDFSIO:        Number of files: 10
12/07/18 12:20:16 INFO fs.TestDFSIO: Total MBytes processed: 1000
12/07/18 12:20:16 INFO fs.TestDFSIO:      Throughput mb/sec: 21.530379365284418
12/07/18 12:20:16 INFO fs.TestDFSIO: Average IO rate mb/sec: 21.541706085205078
12/07/18 12:20:16 INFO fs.TestDFSIO:  IO rate std deviation: 0.4955591491226172
12/07/18 12:20:16 INFO fs.TestDFSIO:     Test exec time sec: 90.213
12/07/18 12:20:16 INFO fs.TestDFSIO:
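
When comparing several benchmark runs, the headline figures can be pulled out of the TestDFSIO log with a little awk. A sketch, with the sample lines copied from the run above:

```shell
# Sketch: extract throughput and exec time from TestDFSIO summary lines.
# The log text below is taken from the run shown above.
log='12/07/18 12:20:16 INFO fs.TestDFSIO: Total MBytes processed: 1000
12/07/18 12:20:16 INFO fs.TestDFSIO: Throughput mb/sec: 21.530379365284418
12/07/18 12:20:16 INFO fs.TestDFSIO: Average IO rate mb/sec: 21.541706085205078
12/07/18 12:20:16 INFO fs.TestDFSIO: Test exec time sec: 90.213'

# split each line on ": " and take the last field as the value
throughput=$(printf '%s\n' "$log" | awk -F': ' '/Throughput mb\/sec/ {print $NF}')
exec_time=$(printf '%s\n' "$log" | awk -F': ' '/Test exec time sec/ {print $NF}')
echo "throughput=${throughput} mb/sec, exec_time=${exec_time} s"
```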