
Command-line access to the HDFS filesystem
Within the Hadoop distribution, there is a command-line utility called hdfs, which is the primary way to interact with the filesystem from the command line. Run it without any arguments to see the various subcommands available. There are many; several are used for tasks such as starting or stopping various HDFS components. The general form of the hdfs command is:
hdfs <sub-command> <command> [arguments]
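To make this pattern concrete, here is how an invocation we will use shortly breaks down (the annotations are ours):
$ hdfs dfs -ls /
# sub-command: dfs (general filesystem access)
# command:     -ls (list a directory)
# argument:    /   (the path to list)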
The two main subcommands we will use in this book are:
dfs: This is used for general filesystem access and manipulation, including reading/writing and accessing files and directories.
dfsadmin: This is used for administration and maintenance of the filesystem. We will not cover this command in detail, though. Have a look at the -report command, which gives a listing of the state of the filesystem and all DataNodes:
$ hdfs dfsadmin -report
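The dfsadmin subcommand also offers narrower read-only queries that are safe to experiment with; one example (not otherwise used in this book) checks whether the NameNode is currently in safe mode:
$ hdfs dfsadmin -safemode get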
Exploring the HDFS filesystem
Run the following to get a list of the available commands provided by the dfs subcommand:
$ hdfs dfs
As the output of the preceding command shows, many of these look similar to standard Unix filesystem commands and, not surprisingly, they work as expected. In our test VM, we have a user account called cloudera. Using this user, we can list the root of the filesystem as follows:
$ hdfs dfs -ls /
Found 7 items
drwxr-xr-x   - hbase  hbase               0 2014-04-04 15:18 /hbase
drwxr-xr-x   - hdfs   supergroup          0 2014-10-21 13:16 /jar
drwxr-xr-x   - hdfs   supergroup          0 2014-10-15 15:26 /schema
drwxr-xr-x   - solr   solr                0 2014-04-04 15:16 /solr
drwxrwxrwt   - hdfs   supergroup          0 2014-11-12 11:29 /tmp
drwxr-xr-x   - hdfs   supergroup          0 2014-07-13 09:05 /user
drwxr-xr-x   - hdfs   supergroup          0 2014-04-04 15:15 /var
The output is very similar to that of the Unix ls command. The file attributes work the same as the user/group/world attributes on a Unix filesystem (including the t sticky bit, as can be seen), plus details of the owner, group, and modification time of the directories. The column between the group name and the modification date is the size; this is 0 for directories but will have a value for files, as we'll see in the code following the next information box:
Note
If relative paths are used, they are taken from the home directory of the user. If there is no home directory, we can create it using the following commands:
$ sudo -u hdfs hdfs dfs -mkdir /user/cloudera
$ sudo -u hdfs hdfs dfs -chown cloudera:cloudera /user/cloudera
The mkdir and chown steps require superuser privileges (sudo -u hdfs).
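As a quick aside (assuming the home directory now exists): a listing with no path argument resolves against the home directory, so the following two commands are equivalent:
$ hdfs dfs -ls
$ hdfs dfs -ls /user/cloudera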
We can now create a directory using a relative path and see it appear in our home directory:
$ hdfs dfs -mkdir testdir
$ hdfs dfs -ls
Found 1 items
drwxr-xr-x   - cloudera cloudera          0 2014-11-13 11:21 testdir
Then, we can create a file, copy it to HDFS, and read its contents directly from its location on HDFS, as follows:
$ echo "Hello world" > testfile.txt $ hdfs dfs -put testfile.txt testdir
Note that there is an older command called -copyFromLocal, which works in the same way as -put; you might see it in older documentation online. Now, run the following command and check the output:
$ hdfs dfs -ls testdir
Found 1 items
-rw-r--r--   3 cloudera cloudera         12 2014-11-13 11:21 testdir/testfile.txt
Note the new column between the file attributes and the owner; this is the replication factor of the file. Now, finally, run the following command:
$ hdfs dfs -tail testdir/testfile.txt
Hello world
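A few related dfs commands are worth trying at this point; the commands themselves are standard, though the local filename used below is just illustrative:
# Print the whole file rather than just its end:
$ hdfs dfs -cat testdir/testfile.txt
# Download a file back to the local filesystem ("local-copy.txt" is an
# illustrative name); -copyToLocal is the older equivalent of -get,
# mirroring the -put/-copyFromLocal pair:
$ hdfs dfs -get testdir/testfile.txt local-copy.txt
# Change the replication factor shown in the -ls output above
# (reducing it to 1 suits a single-node test VM):
$ hdfs dfs -setrep 1 testdir/testfile.txt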
Most of the remaining dfs subcommands are fairly intuitive; play around with them, perhaps starting with the sketch below. We'll explore snapshots and programmatic access to HDFS later in this chapter.
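For instance, the following sketch exercises a few of the more common dfs commands against the files created above (the target names are illustrative):
# Copy and rename within HDFS:
$ hdfs dfs -cp testdir/testfile.txt testdir/testfile-copy.txt
$ hdfs dfs -mv testdir/testfile-copy.txt testdir/renamed.txt
# Show per-file space usage within a directory:
$ hdfs dfs -du testdir
# Delete a file, then a directory and its contents recursively:
$ hdfs dfs -rm testdir/renamed.txt
$ hdfs dfs -rm -r testdir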