HDFS Commands
General purpose commands for HDFS
Find help on the ls command
$ hadoop fs -help ls
List files under HDFS root
$ hadoop fs -ls /
List files under the current user’s home folder, which is /user/<username>
$ hadoop fs -ls
Note: HDFS has no concept of a working directory (pwd) like Linux. Any file or directory can be specified either absolutely, starting at the root (/), or relative to the user’s home directory (/user/<username>)
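For example, assuming the user cloudera and a hypothetical directory foo, the following two commands list the same location
$ hadoop fs -ls foo
$ hadoop fs -ls /user/cloudera/foo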
Create a directory named sfpd in your HDFS home folder.
$ hadoop fs -mkdir -p sfpd
Find ~/Downloads/datasets/Map__Crime_Incidents_-_from_1_Jan_2003.csv inside the VM and upload it to the sfpd directory you created above. Alternatively, you can download sample data from the original source.
$ hadoop fs -put ~/Downloads/datasets/Map__Crime_Incidents_-_from_1_Jan_2003.csv sfpd
Read first few lines from the file
$ hadoop fs -cat sfpd/Map__Crime_Incidents_-_from_1_Jan_2003.csv | head
Read last few lines from the file (last KB)
$ hadoop fs -tail sfpd/Map__Crime_Incidents_-_from_1_Jan_2003.csv
Download the file to the local directory. You may have to rename, move, or delete any existing local file with the same name.
$ hadoop fs -get sfpd/Map__Crime_Incidents_-_from_1_Jan_2003.csv
Note: You can download a directory as well using the same command.
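For example, to download the whole sfpd directory to the current local directory
$ hadoop fs -get sfpd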
Similarly, upload the entire directory ~/Downloads/datasets/weblogs to your HDFS home directory, as sketched below. Note that the files are in compressed format; upload them as they are.
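A minimal sketch using the same -put command, assuming the default cloudera home directory
$ hadoop fs -put ~/Downloads/datasets/weblogs /user/cloudera/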
How many files are there?
$ hadoop fs -ls hdfs:///user/cloudera/weblogs
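Alternatively, -count prints the number of directories, files, and total bytes under a path
$ hadoop fs -count weblogs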
View first few lines from /user/cloudera/weblogs/access_log_1.gz file
$ hadoop fs -text /user/cloudera/weblogs/access_log_1.gz | head
Note: since the file is compressed, you cannot use the -cat command to view its content; -text decompresses it on the fly.
What is the size of the file in KB?
$ hadoop fs -ls -h /user/cloudera/weblogs/
Copy the directory /user/cloudera/weblogs under /internal. Create the directory /internal if it does not exist.
$ hadoop fs -mkdir /internal
$ hadoop fs -cp /user/cloudera/weblogs /internal
For copying large amounts of data, you can use the distcp command, which runs a MapReduce job to execute the copy operation in parallel.
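For example, the copy above could also be performed as
$ hadoop distcp /user/cloudera/weblogs /internal/weblogs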
Change the permissions of /internal/weblogs, and everything under it, to 777
$ hadoop fs -chmod -R 777 /internal/weblogs
Size of /internal/weblogs directory?
$ hadoop fs -du -s -h /internal/weblogs
Find files whose name contains weblogs (case-insensitive)
$ hadoop fs -find / -iname "*weblogs*"
Delete /internal/weblogs directory
$ hadoop fs -rm -r /internal/weblogs
Create a file based on input from stdin. Press <Ctrl-D> to end the input.
$ hadoop fs -appendToFile - sample.txt
Note: Useful for quickly creating a sample file for testing. You can also use this command to append a local file to an existing file in HDFS.
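For example, to append a local file (local.txt here is hypothetical) to sample.txt in HDFS
$ hadoop fs -appendToFile local.txt sample.txt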
Advanced HDFS Commands
Upload the file Map__Crime_Incidents_-_from_1_Jan_2003.csv from the local filesystem to HDFS with a block size of 32 MB.
$ hadoop fs -rm -r /user/cloudera/sfpd
$ hadoop fs -mkdir /user/cloudera/sfpd
$ hadoop fs -D dfs.blocksize=33554432 -put Map__Crime_Incidents_-_from_1_Jan_2003.csv /user/cloudera/sfpd
Note: dfs.blocksize is the current property name; the older dfs.block.size key still works but is deprecated.
How many blocks are there? You can check the information in the NameNode Web UI (http://<namenode-server>:50070, localhost on the VM) or find the number of blocks using the fsck command
$ hadoop fsck /user/cloudera/sfpd/Map__Crime_Incidents_-_from_1_Jan_2003.csv -files -blocks -locations
Set the replication factor of an existing file to 2.
$ hadoop fs -setrep 2 /user/cloudera/sfpd/Map__Crime_Incidents_-_from_1_Jan_2003.csv
View documentation for all the available commands
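Running -help without arguments prints usage for every command
$ hadoop fs -help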
The Web UI for HDFS is exposed at http://<namenode-server>:50070
You can use this web address to view file blocks, block IDs, the number of data nodes, used storage, etc. You can also view the HDFS configuration at http://<namenode-server>:50070/conf
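For example, assuming the NameNode is running locally on port 50070, you can fetch the same configuration from the shell
$ curl -s http://localhost:50070/conf | grep dfs.blocksize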
NameNode Files (*** DO NOT make any changes. ***)
Go to the web UI conf page, http://localhost:50070/conf, and search for dfs.namenode.name.dir. The folder it points to contains the fsimage and edit logs.
Explore the directory
$ tree /var/lib/hadoop-hdfs/cache/hdfs/dfs/name/current
The VERSION file contains the Hadoop version and the block pool ID, which is unique for an HDFS cluster.
$ cat /var/lib/hadoop-hdfs/cache/hdfs/dfs/name/current/VERSION
The fsimage file contains the serialized form of the directory structure and file inode metadata such as replication level, modification and access times, permissions, block size, and the IDs of the blocks that make up each file.
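If you want to inspect an fsimage yourself, the offline image viewer (hdfs oiv) can dump it to XML. A minimal sketch, with a hypothetical fsimage file name (pick a real one from the current directory above)
$ hdfs oiv -p XML -i /var/lib/hadoop-hdfs/cache/hdfs/dfs/name/current/fsimage_0000000000000000000 -o /tmp/fsimage.xml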
If you want to apply the edit logs to the fsimage (i.e., save a checkpoint of the namespace)
$ hdfs dfsadmin -safemode enter
$ hdfs dfsadmin -saveNamespace
$ hdfs dfsadmin -safemode leave
Check HDFS cluster utilization report
$ hdfs dfsadmin -report