HDFS Commands

General-purpose commands for HDFS

Show help for the HDFS ls command

$ hadoop fs -help ls

List files under HDFS root

$ hadoop fs -ls /

List files in the current user's home directory, /user/<username>

$ hadoop fs -ls

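Note: the hadoop fs shell has no working directory to change into, so there is no equivalent of Linux's pwd or cd. A path is either absolute, starting at the root (/), or relative, in which case it is resolved against the user's home directory (/user/<username>).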
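As a rough illustration of that rule, the hypothetical shell function hdfs_abs_path below mimics how a path argument to hadoop fs is resolved. It assumes the working directory is the default home directory /user/<username>, treats full URIs such as hdfs:///user/cloudera/weblogs as already absolute, and does not normalize . or .. components. It is a sketch, not Hadoop's actual resolution code.

```shell
#!/bin/sh
# Hypothetical helper: mimic how an HDFS path argument is resolved.
# Assumes the working directory is the default home, /user/<username>.
# Does not normalize "." or ".." components.
hdfs_abs_path() {
  path=$1
  user=$2
  case $path in
    "")         printf '%s\n' "/user/$user" ;;       # no path: the home directory
    /* | *://*) printf '%s\n' "$path" ;;             # absolute path or full URI: as-is
    *)          printf '%s\n' "/user/$user/$path" ;; # relative: resolved against home
  esac
}

hdfs_abs_path sfpd cloudera       # prints /user/cloudera/sfpd
hdfs_abs_path /internal cloudera  # prints /internal
```

So, logged in as cloudera, hadoop fs -ls sfpd and hadoop fs -ls /user/cloudera/sfpd refer to the same directory.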

Create an sfpd directory in your HDFS home directory. The -p flag creates any missing parent directories and does not fail if the directory already exists.

$ hadoop fs -mkdir -p sfpd

Find ~/Downloads/datasets/Map__Crime_Incidents_-_from_1_Jan_2003.csv inside the VM and upload it to the sfpd directory you just created. Alternatively, download the sample data from its original source.

$ hadoop fs -put ~/Downloads/datasets/Map__Crime_Incidents_-_from_1_Jan_2003.csv sfpd

Read the first few lines of the file

$ hadoop fs -cat sfpd/Map__Crime_Incidents_-_from_1_Jan_2003.csv | head

Read the last few lines of the file (the last kilobyte)

$ hadoop fs -tail sfpd/Map__Crime_Incidents_-_from_1_Jan_2003.csv

Download the file to the current local directory. The command fails if a file with the same name already exists there, so rename, move, or delete that file first.

$ hadoop fs -get sfpd/Map__Crime_Incidents_-_from_1_Jan_2003.csv

Note: you can download a directory with the same command.

Similarly, upload the entire directory ~/Downloads/datasets/weblogs to your HDFS home directory. The files are gzip-compressed; upload them as they are.

$ hadoop fs -put ~/Downloads/datasets/weblogs /user/cloudera

How many files are there?

$ hadoop fs -ls hdfs:///user/cloudera/weblogs 
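Instead of counting the lines of -ls output by hand, you can use hadoop fs -count, which prints one line per path with the columns DIR_COUNT, FILE_COUNT, CONTENT_SIZE and PATHNAME. The small filter below (file_count is a hypothetical helper name) extracts the FILE_COUNT column; running it against real output assumes the cluster is up.

```shell
#!/bin/sh
# Hypothetical helper: read `hadoop fs -count` output on stdin and print
# the FILE_COUNT column (2nd field).
# Columns of -count: DIR_COUNT FILE_COUNT CONTENT_SIZE PATHNAME
file_count() {
  awk '{ print $2 }'
}

# Usage on the VM:
#   hadoop fs -count /user/cloudera/weblogs | file_count
```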

View the first few lines of the /user/cloudera/weblogs/access_log_1.gz file

$ hadoop fs -text /user/cloudera/weblogs/access_log_1.gz | head

Note: the file is gzip-compressed, so -cat would print the raw compressed bytes. -text decompresses it before printing.

What is the size of the file in KB?

$ hadoop fs -ls -h /user/cloudera/weblogs/
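-ls -h rounds sizes to human-readable units. To get an exact figure in KB, one option is to run hadoop fs -stat %b on the file and divide the result by 1024. This assumes your Hadoop version's -stat supports the %b format, which gives the file size in bytes. The hypothetical bytes_to_kb function below handles only the conversion, treating 1 KB as 1024 bytes.

```shell
#!/bin/sh
# Hypothetical helper: convert a byte count to KB (1 KB = 1024 bytes),
# printed with two decimal places.
bytes_to_kb() {
  awk -v b="$1" 'BEGIN { printf "%.2f\n", b / 1024 }'
}

bytes_to_kb 2048   # prints 2.00

# Usage on the VM (assumes -stat supports %b):
#   bytes_to_kb "$(hadoop fs -stat %b /user/cloudera/weblogs/access_log_1.gz)"
```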

Copy the directory /user/cloudera/weblogs under /internal, creating /internal first if it does not exist. The -p flag makes mkdir succeed even if the directory is already there.

$ hadoop fs -mkdir -p /internal
$ hadoop fs -cp /user/cloudera/weblogs /internal

For large copies, use hadoop distcp instead. It runs a MapReduce job that copies the files in parallel.

Change the permissions of /internal/weblogs and everything under it (-R) to 777

$ hadoop fs -chmod -R 777 /internal/weblogs
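Each octal digit of a mode like 777 applies to owner, group, and others, in that order. Each digit is the sum of read (4), write (2), and execute (1). The hypothetical octal_to_rwx function below expands a mode into the nine permission characters that hadoop fs -ls shows after the file-type character. You can use it to check what a chmod will grant before you run it.

```shell
#!/bin/sh
# Hypothetical helper: expand an octal mode (e.g. 777) into rwx form.
# Digits are owner, group, others; bits are 4 = r, 2 = w, 1 = x.
octal_to_rwx() {
  mode=$1
  out=""
  while [ -n "$mode" ]; do
    d=${mode%"${mode#?}"}   # first remaining digit
    mode=${mode#?}          # drop that digit
    if [ $((d & 4)) -ne 0 ]; then out="${out}r"; else out="${out}-"; fi
    if [ $((d & 2)) -ne 0 ]; then out="${out}w"; else out="${out}-"; fi
    if [ $((d & 1)) -ne 0 ]; then out="${out}x"; else out="${out}-"; fi
  done
  printf '%s\n' "$out"
}

octal_to_rwx 777   # prints rwxrwxrwx
octal_to_rwx 755   # prints rwxr-xr-x
```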

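What is the total size of the /internal/weblogs directory? The -s flag prints one summary line instead of one line per file, and -h uses human-readable units.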

$ hadoop fs -du -s -h /internal/weblogs

Find files whose name contains crime, ignoring case

$ hadoop fs -find / -iname "*crime*"
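The -iname test compares each file's base name (not its full path) against a glob pattern and ignores case. Quote the pattern, as in "*crime*", so that your local shell does not expand it against local file names before hadoop sees it. The hypothetical iname_match function below approximates that matching in plain sh. It is a local illustration only, not how HDFS implements -find.

```shell
#!/bin/sh
# Hypothetical helper: approximate `-iname PATTERN` for one HDFS path.
# Returns 0 if the base name matches the glob, ignoring case.
iname_match() {
  pat=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  name=$(printf '%s' "${2##*/}" | tr '[:upper:]' '[:lower:]')  # base name only
  case $name in
    $pat) return 0 ;;   # unquoted, so $pat is used as a glob pattern
    *)    return 1 ;;
  esac
}

iname_match '*crime*' /user/cloudera/sfpd/Map__Crime_Incidents_-_from_1_Jan_2003.csv && echo match
```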

Delete the /internal/weblogs directory

$ hadoop fs -rm -r /internal/weblogs

Create a file from standard input. Type the content, then press Ctrl-D at the start of an empty line to signal end of input and finish the file.

$ hadoop fs -appendToFile - sample.txt

Note: this is a quick way to create a sample file for testing. The same command can also append the contents of a local file to an existing file in HDFS.

Advanced HDFS Commands

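Upload Map__Crime_Incidents_-_from_1_Jan_2003.csv from the local file system to HDFS with a block size of 32 MB (32 × 1024 × 1024 = 33554432 bytes). The first two commands replace the existing sfpd directory. The upload assumes the CSV is in your current local directory, for example after the -get step above. dfs.block.size is the older, deprecated name of the dfs.blocksize property, but Hadoop 2.x still accepts it.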

$ hadoop fs -rm -r /user/cloudera/sfpd

$ hadoop fs -mkdir /user/cloudera/sfpd

$ hadoop fs -Ddfs.block.size=33554432 -put Map__Crime_Incidents_-_from_1_Jan_2003.csv /user/cloudera/sfpd

How many blocks are there? Check the NameNode web UI (http://<namenode-server>:50070, or http://localhost:50070 on the VM), or run fsck on the file. hadoop fsck also still works, but it prints a deprecation warning.

$ hdfs fsck /user/cloudera/sfpd/Map__Crime_Incidents_-_from_1_Jan_2003.csv -files -blocks -locations
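The expected block count follows from the file size: a file occupies ceil(size / block size) blocks. The hypothetical block_count function below computes that with integer arithmetic. It defaults to the 33554432-byte (32 MB) block size used for this upload, so you can compare its result with the total block count in the fsck report. The size can come from the local copy of the file, for example with wc -c.

```shell
#!/bin/sh
# Hypothetical helper: number of blocks for a file of SIZE bytes,
# i.e. ceil(SIZE / BLOCK_SIZE) using integer arithmetic.
# BLOCK_SIZE defaults to 33554432 bytes (32 MB).
block_count() {
  size=$1
  bs=${2:-33554432}
  echo $(( (size + bs - 1) / bs ))
}

block_count 33554432   # prints 1
block_count 33554433   # prints 2

# Usage with the local copy of the CSV:
#   block_count "$(wc -c < Map__Crime_Incidents_-_from_1_Jan_2003.csv)"
```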

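Set the replication factor of an existing file to 2. On a cluster with only one DataNode, such as a single-node VM, the second replica has nowhere to go, so fsck will report the file's blocks as under-replicated.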

$ hadoop fs -setrep 2 /user/cloudera/sfpd/Map__Crime_Incidents_-_from_1_Jan_2003.csv

To see documentation for every available command, run hadoop fs -help with no arguments, or consult the HDFS commands reference.

The NameNode web UI is served at http://<namenode-server>:50070 (the default port in Hadoop 2.x).

Use it to view file blocks, block IDs, the number of DataNodes, storage used, and more. The HDFS configuration is available at http://<namenode-server>:50070/conf.

NameNode Metadata Files (DO NOT make any changes)

Open the configuration page at http://localhost:50070/conf and search for dfs.namenode.name.dir. The directory it names holds the fsimage files and the edit logs.

Explore the directory

$ tree /var/lib/hadoop-hdfs/cache/hdfs/dfs/name/current

The VERSION file records the storage layout version (not the Hadoop release version), the namespace and cluster IDs, and the block pool ID that identifies this namespace's blocks.

$ cat /var/lib/hadoop-hdfs/cache/hdfs/dfs/name/current/VERSION
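VERSION is a small properties-style text file of key=value lines, with keys such as namespaceID, clusterID, blockpoolID and layoutVersion. The hypothetical version_field function below prints the value of a single key, which is useful in scripts. Reading the real file may require sudo.

```shell
#!/bin/sh
# Hypothetical helper: print the value of KEY from a key=value file
# such as the NameNode's VERSION file. Prints nothing if KEY is absent.
version_field() {
  file=$1
  key=$2
  sed -n "s/^${key}=//p" "$file"
}

# Usage on the VM (may need sudo to read the file):
#   version_field /var/lib/hadoop-hdfs/cache/hdfs/dfs/name/current/VERSION blockpoolID
```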

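The fsimage file is a serialized snapshot of the namespace. It holds the directory tree plus per-file inode metadata such as the replication factor, modification and access times, permissions, block size, and the IDs of the blocks that make up the file. Changes made since the last checkpoint are recorded in the edit log files (edits_*).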

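To merge the edit logs into a new fsimage (a checkpoint), enter safe mode, save the namespace, then leave safe mode. saveNamespace works only in safe mode. dfsadmin commands require HDFS superuser privileges, so on the Cloudera VM you may need to prefix them with sudo -u hdfs.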

$ hdfs dfsadmin -safemode enter
$ hdfs dfsadmin -saveNamespace
$ hdfs dfsadmin -safemode leave

Print a report of overall HDFS capacity and usage, including per-DataNode details

$ hdfs dfsadmin -report