Situation
Directories on Linux let you group files in distinct, separate collections. The downside is it becomes tedious moving from directory to directory to perform a repetitive task. Here’s how to automate that.
The first command you learn when you’re introduced to Linux is probably ls, but cd won’t be far behind it. Understanding directories and how to move around them, particularly nested subdirectories, is a fundamental part of understanding how Linux organizes itself, and how you can organize your own work into files, directories, and subdirectories.
Grasping the concept of a tree of directories, and how to move between them, is one of the many little milestones you pass as you familiarize yourself with the landscape of Linux. Using cd with a path takes you to that directory. Shortcuts like cd ~ or cd on its own take you back to your home directory, and cd .. moves you up one level in the directory tree. Simple.
However, there isn’t an equally simple means of running a command in all directories of a directory tree. There are different ways we can achieve that functionality, but there isn’t a standard Linux command dedicated to that purpose.
Some commands, such as ls, have command-line options that force them to operate recursively, meaning they start in one directory and methodically work through the entire directory tree below that directory. For ls, it’s the -R (recursive) option.
If you need to use a command that doesn’t support recursion, you have to provide the recursive functionality yourself.
Solution
The tree Command
The tree command won’t help us with the task at hand, but it does make it easy to see the structure of a directory tree. It draws the tree in a terminal window so that we can get an instant overview of the directories and subdirectories that make up the directory tree, and their relative positions in the tree.
You’ll need to install tree.
On Ubuntu you need to type:
sudo apt install tree
On Fedora, use:
sudo dnf install tree
On Manjaro, the command is:
sudo pacman -Sy tree
Using tree with no parameters draws out the tree below the current directory.
tree
You can pass a path to tree on the command line.
tree work
The -d (directories) option excludes files and only shows directories.
tree -d work
This is the most convenient way to get a clear view of the structure of a directory tree. The directory tree shown here is the one used in the following examples. There are five text files and eight directories.
Your first thought might be, if ls can recursively traverse a directory tree, why not use ls to do just that and pipe the output into some other commands that parse the directories and perform some actions?
Parsing the output of ls is considered bad practice. Because Linux lets you create file and directory names containing all sorts of strange characters, it becomes very difficult to create a generic, universally correct parser.
You might never knowingly create a preposterous directory name, but a mistake in a script or an application might.
Parsing legitimate but poorly considered file and directory names is error-prone. There are other methods we can use that are safer and much more robust than relying on interpreting the output of ls.
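To make the risk concrete, here is a minimal sketch that compares counting directories by parsing ls with counting them from NUL-delimited find output. The temporary demo directory, the directory names, and the variable names are all invented for illustration, and the loop relies on Bash features (read -d and process substitution).

```shell
#!/bin/bash
# Throwaway demo directory (hypothetical example data).
demo_dir=$(mktemp -d)
# Two directories; the second has a newline embedded in its name.
mkdir "$demo_dir/plain" "$demo_dir/"$'two\nlines'

# Counting by parsing ls: the embedded newline makes one name span two lines.
ls_count=$(ls "$demo_dir" | wc -l)

# Counting with NUL-delimited find output: each name arrives intact,
# because a NUL byte cannot appear inside a file name.
find_count=0
while IFS= read -r -d '' dir; do
  find_count=$((find_count + 1))
done < <(find "$demo_dir" -mindepth 1 -maxdepth 1 -type d -print0)

echo "ls lines: $ls_count, directories seen by find: $find_count"
rm -rf "$demo_dir"
```

The two counts disagree because a line-based parser has no way to tell where one name ends and the next begins.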
Using the find Command
The find command has in-built recursive capabilities, and it also has the ability to run commands for us. This lets us build powerful one-liners. If it’s something you’re likely to want to use in the future, you can turn your one-liner into an alias or a shell function.
This command recursively loops through the directory tree, looking for directories. Each time it finds a directory it prints out the name of the directory and repeats the search inside that directory. Having completed searching one directory, it exits that directory and resumes the search in its parent directory.
find work -type d -execdir echo "In:" {} \;
You can see by the order the directories are listed in, how the search progresses through the tree. By comparing the output from the tree command to the output from the find one-liner, you’ll see how find searches each directory and subdirectory in turn until it hits a directory with no subdirectories. It then goes back up a level and resumes the search at that level.
Here’s how the command is made up.
- find: The find command.
- work: The directory to start the search in. This can be a path.
- -type d: We’re looking for directories.
- -execdir: We’re going to execute a command for each directory we find. find runs the command from the directory that contains the match, which is the parent of the directory it found.
- echo "In:" {}: This is the command. We’re simply echoing the name of the directory to the terminal window. The "{}" is replaced with the name of the current match.
- \;: This is a semicolon used to terminate the command. We need to escape it with the backslash so that Bash doesn’t interpret it directly.
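If you expect to reuse the one-liner, it can be wrapped in a shell function. The sketch below is only one way to do it: the function name in_dirs, the default of the current directory, and the temporary demo tree are assumptions made for illustration.

```shell
#!/bin/bash
# Hypothetical helper: report every directory at or below a starting point.
# Usage: in_dirs [start_dir]   (defaults to the current directory)
in_dirs() {
  find "${1:-.}" -type d -execdir echo "In:" {} \;
}

# Throwaway tree so the sketch can be run anywhere.
demo_dir=$(mktemp -d)
mkdir -p "$demo_dir/work/dev/src" "$demo_dir/work/docs"

dir_report=$(in_dirs "$demo_dir/work")
echo "$dir_report"
rm -rf "$demo_dir"
```

Placing the function definition in your shell startup file, such as ~/.bashrc, would make it available in every new terminal session.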
With a slight change, we can make the find command return files that match a search pattern. We need to include the -name option and the pattern. In this example, we’re looking for text files that match "*.txt", and echoing their names to the terminal window.
find work -name "*.txt" -type f -execdir echo "Found:" {} \;
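Whether you search for files or directories depends on what you want to achieve. To act on each directory, use -type d. To act on each matching file, use -type f. Bear in mind that -execdir runs the command from the directory that contains the match, so with -type d the command runs in the parent of each directory found, not inside it.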
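Because -execdir starts the command in the directory that contains each match, running a command from inside every directory found needs a slightly different approach. One possible sketch uses -exec with a small sh -c wrapper that changes into the directory first. The temporary demo tree and the per_dir variable are assumptions made for illustration, and pwd stands in for whatever command you want to run.

```shell
#!/bin/bash
# Throwaway tree (hypothetical example data).
demo_dir=$(mktemp -d)
mkdir -p "$demo_dir/work/dev" "$demo_dir/work/docs"

# For each directory found, start a small shell, change into that directory,
# and run a command there. The path is handed over as $1 instead of being
# pasted into the script text, so unusual names are not interpreted as code.
per_dir=$(find "$demo_dir/work" -type d -exec sh -c 'cd "$1" && pwd' sh {} \;)

echo "$per_dir"
rm -rf "$demo_dir"
```

The extra "sh" after the quoted script fills the $0 slot of the inner shell, which is why the directory arrives as $1.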
This command counts the lines in all text files in the starting directory and subdirectories.
find work -name "*.txt" -type f -execdir wc -l {} \;
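The command above prints one count per file. If a single total is wanted instead, one possible variation is to concatenate every match and count the combined stream. This is a sketch under assumptions: the temporary demo tree, the file contents, and the total_lines variable are invented for illustration.

```shell
#!/bin/bash
# Throwaway tree with three lines of text spread over two files.
demo_dir=$(mktemp -d)
mkdir -p "$demo_dir/work/dev"
printf 'one\ntwo\n' > "$demo_dir/work/a.txt"
printf 'three\n' > "$demo_dir/work/dev/b.txt"

# "-exec cat {} +" passes many file names to each cat invocation,
# and wc -l counts the lines of the combined output.
total_lines=$(find "$demo_dir/work" -name "*.txt" -type f -exec cat {} + | wc -l)

echo "Total lines: $total_lines"
rm -rf "$demo_dir"
```

Ending the -exec clause with + instead of \; also starts far fewer processes on large trees, because find batches the file names.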
Traversing Directory Trees With a Script
If you need to traverse directories inside a script you could use the find command inside your script. If you need to, or just want to, do the recursive searches yourself, you can do that too.
#!/bin/bash

shopt -s dotglob nullglob

function recursive {
  local current_dir dir_or_file

  for current_dir in "$1"; do
    # Commands to run once per directory go here
    echo "Directory command for:" "$current_dir"

    for dir_or_file in "$current_dir"/*; do
      if [[ -d $dir_or_file ]]; then
        recursive "$dir_or_file"
      else
        # Commands to run once per file go here
        wc "$dir_or_file"
      fi
    done
  done
}

recursive "$1"
Copy the text into an editor and save it as “recurse.sh”, then use the chmod command to make it executable.
chmod +x recurse.sh
The script sets two shell options, dotglob and nullglob.
The dotglob setting means file and directory names that start with a period “.” will be returned when wildcard search terms are expanded. This effectively means we’re including hidden files and directories in our search results.
The nullglob setting means search patterns that don’t find any results are treated as an empty or null string. They don’t default to the search term itself. In other words, if we’re searching for everything in a directory by using the asterisk wildcard “*”, but there are no results, we’ll receive a null string instead of a string containing an asterisk. This prevents the script from inadvertently trying to open a directory called “*”, or treating “*” as a file name.
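The effect of nullglob is easy to check on an empty directory. The following is a minimal sketch, with the temporary directory and the two counter variables invented for illustration. It assumes Bash with the failglob option left at its default (off).

```shell
#!/bin/bash
empty_dir=$(mktemp -d)

# Without nullglob, an unmatched pattern is left as literal text,
# so the loop runs once with the pattern itself as the "name".
shopt -u nullglob
without=0
for entry in "$empty_dir"/*; do without=$((without + 1)); done

# With nullglob, the unmatched pattern expands to nothing,
# so the loop body never runs.
shopt -s nullglob
with=0
for entry in "$empty_dir"/*; do with=$((with + 1)); done

echo "iterations without nullglob: $without, with nullglob: $with"
rmdir "$empty_dir"
```

The first loop iterates once over a name that does not exist, which is exactly the case the script avoids by setting nullglob.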
Next, it defines a function called recursive. This is where the interesting stuff happens.
Two variables are declared, called current_dir and dir_or_file. These are local variables, and can only be referenced within the function.
A variable called $1 is also used within the function. This is the first (and only) parameter passed to the function when it is called.
The script uses two for loops, one nested inside the other. The first (outer) for loop is used for two things.
One is to run whatever command you want to have performed in each directory. All we’re doing here is echoing the name of the directory to the terminal window. You could of course use any command or sequence of commands, or call another script function.
The second thing the outer for loop does is to check all file system objects it can find, which will be either files or directories. This is the purpose of the inner for loop. In turn, each file or directory name is passed into the dir_or_file variable.
The dir_or_file variable is then tested in an if statement to see if it is a directory.
- If it is, the function calls itself and passes the name of the directory as a parameter.
- If the dir_or_file variable is not a directory, then it must be a file. Any commands that you wish to have applied to the file can be called from the else clause of the if statement. You could also call another function within the same script.
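As an illustration of calling another function from the else clause, here is a compact sketch of the same traversal with the per-file work moved into its own function. The names walk and process_file, the temporary demo tree, and the choice of counting lines are assumptions for this example, not part of the script above.

```shell
#!/bin/bash
shopt -s dotglob nullglob

# Hypothetical per-file action: print the line count of one file.
process_file() {
  wc -l < "$1"
}

# Walk the tree, handing every entry that is not a directory to process_file.
walk() {
  local entry
  for entry in "$1"/*; do
    if [[ -d $entry ]]; then
      walk "$entry"
    else
      process_file "$entry"
    fi
  done
}

# Throwaway tree: one ordinary file and one hidden file.
demo_dir=$(mktemp -d)
mkdir -p "$demo_dir/work/dev"
printf 'a\nb\n' > "$demo_dir/work/notes.txt"
printf 'c\n' > "$demo_dir/work/dev/.hidden.txt"

# One line of output per file visited.
file_count=$(walk "$demo_dir/work" | wc -l)
echo "Files visited: $file_count"
rm -rf "$demo_dir"
```

Keeping the per-file work in its own function means the traversal logic stays unchanged when the action changes. Note that the -d test follows symbolic links, so a link that points back up the tree would make a traversal like this loop.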
The final line in the script calls the recursive function and passes in the first command line parameter $1 as the starting directory to search in. This is what kicks off the whole process.
Let’s run the script.
./recurse.sh work
The directories are traversed, and the point in the script where a command would be run in each directory is indicated by the “Directory command for:” lines. Files that are found have the wc command run on them to count lines, words, and bytes.
The first directory processed is “work”, followed by each nested directory branch of the tree.
An interesting point to note is you can change the order that the directories are processed in, by moving the directory-specific commands from being above the inner for loop to being below it.
Let’s move the “Directory command for:” line to after the done of the inner for loop.
#!/bin/bash

shopt -s dotglob nullglob

function recursive {
  local current_dir dir_or_file

  for current_dir in "$1"; do
    for dir_or_file in "$current_dir"/*; do
      if [[ -d $dir_or_file ]]; then
        recursive "$dir_or_file"
      else
        # Commands to run once per file go here
        wc "$dir_or_file"
      fi
    done

    # Commands to run once per directory now come after its contents
    echo "Directory command for:" "$current_dir"
  done
}

recursive "$1"
Now we’ll run the script once more.
./recurse.sh work
This time the directories have the commands applied to them from the deepest levels first, working back up the branches of the tree. The directory passed as the parameter to the script is processed last.
If it is important to have deeper directories processed first, this is how you can do it.
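If you are using find instead of a script, a comparable ordering is available through its -depth option, which handles the contents of a directory before the directory itself. The sketch below assumes a small temporary tree created only for the demonstration. The depth_order variable is likewise invented.

```shell
#!/bin/bash
# Throwaway tree with a single chain of nested directories.
demo_dir=$(mktemp -d)
mkdir -p "$demo_dir/work/dev/src"

# -depth makes find report a directory's contents before the directory itself,
# so the deepest directory is listed first and the starting directory last.
depth_order=$(cd "$demo_dir" && find work -depth -type d)

echo "$depth_order"
rm -rf "$demo_dir"
```

With this tree the listing starts at work/dev/src and ends at work, mirroring the behavior of the modified script.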
Recursion Is Weird
It’s like calling yourself on your own phone, and leaving a message for yourself to tell yourself when you next meet you—repeatedly. It can take some effort before you grasp its benefits, but when you do you’ll see it is a programmatically elegant way to tackle hard problems.