Hortonworks HDP: Using the command line to manage files on HDFS
Assumes a working HDP Installation
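All examples below call the full /usr/lib/hadoop/bin/hadoop path. If hadoop is already on your PATH (typical on an HDP node), the short forms behave the same; in Hadoop 2, 'hdfs dfs' is the equivalent of 'hadoop fs' for HDFS paths. A sketch:
[boston@compute-0-0 ~]$ hadoop fs -ls /
[boston@compute-0-0 ~]$ hdfs dfs -ls /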
Using the command line to manage files on HDFS
Perform a directory listing
[boston@compute-0-0 ~]$ /usr/lib/hadoop/bin/hadoop fs -ls /
Found 5 items
drwxr-xr-x - hdfs hdfs 0 2014-09-09 17:11 /apps
drwxr-xr-x - mapred hdfs 0 2014-09-09 17:06 /mapred
drwxr-xr-x - hdfs hdfs 0 2014-09-09 17:06 /mr-history
drwxrwxrwx - hdfs hdfs 0 2014-09-09 17:12 /tmp
drwxr-xr-x - hdfs hdfs 0 2014-09-09 17:11 /user

[boston@compute-0-0 ~]$ /usr/lib/hadoop/bin/hadoop fs -ls /user
Found 4 items
drwxrwx--- - ambari-qa hdfs 0 2014-09-09 17:16 /user/ambari-qa
drwxr-xr-x - hcat hdfs 0 2014-09-09 17:11 /user/hcat
drwx------ - hive hdfs 0 2014-09-09 17:07 /user/hive
drwxrwxr-x - oozie hdfs 0 2014-09-09 17:08 /user/oozie
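To list everything beneath a path in a single command, ls takes a recursive flag in Hadoop 2. A quick sketch:
[boston@compute-0-0 ~]$ /usr/lib/hadoop/bin/hadoop fs -ls -R /user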
Create a directory

NOTE: By default, standard users won't be able to create a directory under /user. Use the 'hdfs' superuser to open the permissions with a chmod:
[root@compute-0-0 ~]# /usr/lib/hadoop/bin/hadoop fs -mkdir /user/boston
mkdir: Permission denied: user=root, access=WRITE, inode="/user":hdfs:hdfs:drwxr-xr-x
[root@compute-0-0 ~]# su -l hdfs
[hdfs@compute-0-0 ~]$ /usr/lib/hadoop/bin/hadoop fs -chmod 777 /user
[hdfs@compute-0-0 ~]$

Now the directory can be created with a standard user
[boston@compute-0-0 ~]$ /usr/lib/hadoop/bin/hadoop fs -mkdir /user/boston
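Opening /user to 777 works but is permissive. A narrower alternative (a sketch, run as the hdfs superuser) is to create the user's home directory and hand ownership to that user:
[hdfs@compute-0-0 ~]$ /usr/lib/hadoop/bin/hadoop fs -mkdir /user/boston
[hdfs@compute-0-0 ~]$ /usr/lib/hadoop/bin/hadoop fs -chown boston:hdfs /user/boston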
Upload a file to HDFS
Here we create a file locally and 'put' it into HDFS:
[boston@compute-0-0 hadoop]$ echo "Sample Text" > filename.txt
[boston@compute-0-0 hadoop]$ /usr/lib/hadoop/bin/hadoop fs -put filename.txt /user/boston/
[boston@compute-0-0 hadoop]$ /usr/lib/hadoop/bin/hadoop fs -ls /user/boston/
Found 1 items
-rw-r--r-- 3 boston hdfs 12 2014-09-10 11:49 /user/boston/filename.txt
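The counterpart for downloading is get, which copies an HDFS file back to the local filesystem. A sketch; the local name filename-local.txt is arbitrary:
[boston@compute-0-0 hadoop]$ /usr/lib/hadoop/bin/hadoop fs -get /user/boston/filename.txt ./filename-local.txt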
Upload multiple files. Note that put will not overwrite an existing file, so re-uploading filename.txt below fails with "File exists" while the new files are still copied:

[boston@compute-0-0 hadoop]$ touch multiplefile{1..10}.txt
[boston@compute-0-0 hadoop]$ ls
filename.txt multiplefile1.txt multiplefile3.txt multiplefile5.txt multiplefile7.txt multiplefile9.txt
multiplefile10.txt multiplefile2.txt multiplefile4.txt multiplefile6.txt multiplefile8.txt
[boston@compute-0-0 hadoop]$ /usr/lib/hadoop/bin/hadoop fs -put filename.txt multiplefile* /user/boston/
put: `/user/boston/filename.txt': File exists
[boston@compute-0-0 hadoop]$ /usr/lib/hadoop/bin/hadoop fs -ls /user/boston/
Found 11 items
-rw-r--r-- 3 boston hdfs 12 2014-09-10 11:49 /user/boston/filename.txt
-rw-r--r-- 3 boston hdfs 0 2014-09-10 12:37 /user/boston/multiplefile1.txt
-rw-r--r-- 3 boston hdfs 0 2014-09-10 12:37 /user/boston/multiplefile10.txt
-rw-r--r-- 3 boston hdfs 0 2014-09-10 12:37 /user/boston/multiplefile2.txt
-rw-r--r-- 3 boston hdfs 0 2014-09-10 12:37 /user/boston/multiplefile3.txt
-rw-r--r-- 3 boston hdfs 0 2014-09-10 12:37 /user/boston/multiplefile4.txt
-rw-r--r-- 3 boston hdfs 0 2014-09-10 12:37 /user/boston/multiplefile5.txt
-rw-r--r-- 3 boston hdfs 0 2014-09-10 12:37 /user/boston/multiplefile6.txt
-rw-r--r-- 3 boston hdfs 0 2014-09-10 12:37 /user/boston/multiplefile7.txt
-rw-r--r-- 3 boston hdfs 0 2014-09-10 12:37 /user/boston/multiplefile8.txt
-rw-r--r-- 3 boston hdfs 0 2014-09-10 12:37 /user/boston/multiplefile9.txt

Check the disk usage on HDFS
[boston@compute-0-0 hadoop]$ /usr/lib/hadoop/bin/hadoop fs -du /user/boston
12 /user/boston/filename.txt
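For a single human-readable total rather than a per-file breakdown, du takes -s and -h in Hadoop 2. A sketch:
[boston@compute-0-0 hadoop]$ /usr/lib/hadoop/bin/hadoop fs -du -s -h /user/boston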
Some Advanced Features
Use getmerge to concatenate files
This example takes the contents of all files in an HDFS directory and merges them into a single file on your local system (the merged file is not created on HDFS):
[boston@compute-0-0 hadoop]$ /usr/lib/hadoop/bin/hadoop fs -mkdir /user/boston/mergetest
[boston@compute-0-0 hadoop]$ touch merge{1..5}.txt
[boston@compute-0-0 hadoop]$ echo content1 > merge1.txt
[boston@compute-0-0 hadoop]$ echo content2 > merge2.txt
[boston@compute-0-0 hadoop]$ echo content3 > merge3.txt
[boston@compute-0-0 hadoop]$ echo content4 > merge4.txt
[boston@compute-0-0 hadoop]$ echo content5 > merge5.txt
[boston@compute-0-0 hadoop]$ /usr/lib/hadoop/bin/hadoop fs -put merge*.txt /user/boston/mergetest
[boston@compute-0-0 hadoop]$ /usr/lib/hadoop/bin/hadoop fs -ls /user/boston/mergetest
Found 5 items
-rw-r--r-- 3 boston hdfs 9 2014-09-10 12:43 /user/boston/mergetest/merge1.txt
-rw-r--r-- 3 boston hdfs 9 2014-09-10 12:43 /user/boston/mergetest/merge2.txt
-rw-r--r-- 3 boston hdfs 9 2014-09-10 12:43 /user/boston/mergetest/merge3.txt
-rw-r--r-- 3 boston hdfs 9 2014-09-10 12:43 /user/boston/mergetest/merge4.txt
-rw-r--r-- 3 boston hdfs 9 2014-09-10 12:43 /user/boston/mergetest/merge5.txt
[boston@compute-0-0 hadoop]$ /usr/lib/hadoop/bin/hadoop fs -getmerge /user/boston/mergetest/ ./LocalMergeFile.txt
[boston@compute-0-0 hadoop]$ cat LocalMergeFile.txt
content1
content2
content3
content4
content5
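For small directories like this one, catting the files with a shell glob and redirecting locally gives the same result. A sketch using the paths from the example above:
[boston@compute-0-0 hadoop]$ /usr/lib/hadoop/bin/hadoop fs -cat /user/boston/mergetest/* > LocalMergeFile.txt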
distcp for large internal copies

- Copies files or directories recursively
- Is a tool for large inter- and intra-cluster copying
- Uses MapReduce to effect its distribution, error handling and recovery, and reporting
[boston@compute-0-0 hadoop]$ /usr/lib/hadoop/bin/hadoop fs -mkdir /user/boston/mergecopy
[boston@compute-0-0 hadoop]$ /usr/lib/hadoop/bin/hadoop distcp /user/boston/mergetest /user/boston/mergecopy
14/09/10 13:15:16 INFO tools.DistCp: Input Options: DistCpOptions{atomicCommit=false, syncFolder=false, deleteMissing=false, ignoreFailures=false, maxMaps=20, sslConfigurationFile='null', copyStrategy='uniformsize', sourceFileListing=null, sourcePaths=[/user/boston/mergetest], targetPath=/user/boston/mergecopy}
14/09/10 13:15:16 INFO client.RMProxy: Connecting to ResourceManager at compute-0-14.local/10.1.255.238:8050
14/09/10 13:15:17 INFO Configuration.deprecation: io.sort.mb is deprecated. Instead, use mapreduce.task.io.sort.mb
14/09/10 13:15:17 INFO Configuration.deprecation: io.sort.factor is deprecated. Instead, use mapreduce.task.io.sort.factor
14/09/10 13:15:19 INFO client.RMProxy: Connecting to ResourceManager at compute-0-14.local/10.1.255.238:8050
14/09/10 13:15:21 INFO mapreduce.JobSubmitter: number of splits:5
14/09/10 13:15:22 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1410279084309_0005
14/09/10 13:15:22 INFO impl.YarnClientImpl: Submitted application application_1410279084309_0005
14/09/10 13:15:22 INFO mapreduce.Job: The url to track the job: http://compute-0-14.local:8088/proxy/application_1410279084309_0005/
14/09/10 13:15:22 INFO tools.DistCp: DistCp job-id: job_1410279084309_0005
14/09/10 13:15:22 INFO mapreduce.Job: Running job: job_1410279084309_0005
14/09/10 13:15:27 INFO mapreduce.Job: Job job_1410279084309_0005 running in uber mode : false
14/09/10 13:15:27 INFO mapreduce.Job: map 0% reduce 0%
14/09/10 13:15:32 INFO mapreduce.Job: map 20% reduce 0%
14/09/10 13:15:34 INFO mapreduce.Job: map 40% reduce 0%
14/09/10 13:15:35 INFO mapreduce.Job: map 60% reduce 0%
14/09/10 13:15:36 INFO mapreduce.Job: map 100% reduce 0%
14/09/10 13:15:41 INFO mapreduce.Job: Job job_1410279084309_0005 completed successfully
14/09/10 13:15:41 INFO mapreduce.Job: Counters: 33
    File System Counters
        FILE: Number of bytes read=0
        FILE: Number of bytes written=520330
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=2585
        HDFS: Number of bytes written=45
        HDFS: Number of read operations=95
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=21
    Job Counters
        Launched map tasks=5
        Other local map tasks=5
        Total time spent by all maps in occupied slots (ms)=19798
        Total time spent by all reduces in occupied slots (ms)=0
        Total time spent by all map tasks (ms)=19798
        Total vcore-seconds taken by all map tasks=19798
        Total megabyte-seconds taken by all map tasks=20273152
    Map-Reduce Framework
        Map input records=5
        Map output records=0
        Input split bytes=580
        Spilled Records=0
        Failed Shuffles=0
        Merged Map outputs=0
        GC time elapsed (ms)=101
        CPU time spent (ms)=4270
        Physical memory (bytes) snapshot=1087881216
        Virtual memory (bytes) snapshot=14741442560
        Total committed heap usage (bytes)=5211422720
    File Input Format Counters
        Bytes Read=1960
    File Output Format Counters
        Bytes Written=0
    org.apache.hadoop.tools.mapred.CopyMapper$Counter
        BYTESCOPIED=45
        BYTESEXPECTED=45
        COPY=5
[boston@compute-0-0 hadoop]$ /usr/lib/hadoop/bin/hadoop fs -ls /user/boston/mergecopy
Found 1 items
drwxr-xr-x - boston hdfs 0 2014-09-10 13:15 /user/boston/mergecopy/mergetest
[boston@compute-0-0 hadoop]$ /usr/lib/hadoop/bin/hadoop fs -ls /user/boston/mergecopy/mergetest
Found 5 items
-rw-r--r-- 3 boston hdfs 9 2014-09-10 13:15 /user/boston/mergecopy/mergetest/merge1.txt
-rw-r--r-- 3 boston hdfs 9 2014-09-10 13:15 /user/boston/mergecopy/mergetest/merge2.txt
-rw-r--r-- 3 boston hdfs 9 2014-09-10 13:15 /user/boston/mergecopy/mergetest/merge3.txt
-rw-r--r-- 3 boston hdfs 9 2014-09-10 13:15 /user/boston/mergecopy/mergetest/merge4.txt
-rw-r--r-- 3 boston hdfs 9 2014-09-10 13:15 /user/boston/mergecopy/mergetest/merge5.txt
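Note where the data landed: because /user/boston/mergecopy already existed, distcp placed the source directory itself underneath it, giving /user/boston/mergecopy/mergetest. To copy the contents of the source into the target instead (skipping files that are already present), distcp takes -update. A sketch:
[boston@compute-0-0 hadoop]$ /usr/lib/hadoop/bin/hadoop distcp -update /user/boston/mergetest /user/boston/mergecopy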