Shell Scripting in Rust?

Jason McCampbell · Published in ITNEXT · Apr 29, 2021

tl;dr: Yes, for a performance-oriented, system-level language, scripting in Rust is surprisingly practical, on par with Python in this case. The Cargo build/package manager offers three big advantages compared to using a language like C or C++.

Maybe overkill for a garden… but looks like fun! Photo by omid roshan on Unsplash

Gphoto-sort

Gphoto-sort started as a small Bash script back when Google killed the link between Google Photos and Google Drive. I had used the Google Drive sync tool to easily back up all of our photos. Then it was replaced by Google Takeout, which lets me download all of my Photos. Every. Single. Time. In a different directory structure.

Most photos have a timestamp in the file name, so I figured I could easily fix it with a script that moves files from the Takeout archive into the existing tree if the file doesn’t already exist. That way I keep my directory structure and don’t have to do a full cloud backup again.

Except that I really wanted to also include my wife’s Takeout archive and de-duplicate across the two, which meant comparing file checksums. And so on… As with most software projects, the scope grew and suddenly my Bash-foo was not up to the task. Typically I would fail over to Python for scripts like this, but I was also learning the Rust language and got curious: how practical would this be in Rust, a system-level language like C/C++?

I have spent most of my career writing C++ code, and I really like writing performance-oriented, close-to-the-metal code. But I would never do this in C++. Even if I already had Boost installed and built at home, and a Makefile structure set up, I’d just use Python because there are higher-level libraries to build upon.

Really, scripting in Rust?

As it turns out, it worked well. Really well. And I attribute that to three advantages Rust has in this context: Cargo, Cargo, and Cargo.

Cargo makes code reuse practical

If you aren’t familiar with it, Cargo is Rust’s package manager and build tool. It handles downloading, updating, and building dependencies, and then compiling and linking the final executable. If you are coming from the JavaScript world with npm or using Ruby with RubyGems, this is old hat. Compared to C++, and even Python to some degree, it is the equivalent of being given a washing machine and not having to lug the dirty laundry down to the nearest stream to beat it against some rocks.

For example, I add the following line to the [dependencies] section of my Cargo.toml:

rust-crypto = "0.2"

and suddenly I have an MD5 hashing library (crate) for computing the checksum of files. That’s it. No need to figure out the idiosyncrasies of each package’s build system and its dependencies.
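
As a rough sketch of what using that crate looks like (the md5_of_file helper name is mine, just for illustration, not code from gphoto-sort), computing a file’s checksum can be as simple as:

use crypto::digest::Digest;
use crypto::md5::Md5;
use std::{fs, io};

// Read the whole file and return its MD5 digest as a hex string.
fn md5_of_file(path: &str) -> io::Result<String> {
    let bytes = fs::read(path)?;   // load the file contents
    let mut hasher = Md5::new();
    hasher.input(&bytes);          // feed the bytes to the hasher
    Ok(hasher.result_str())        // hex-encoded digest
}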

Cargo enables a rich ecosystem of Crates

Making it easy to reuse even small modules means a rich ecosystem of support crates has quickly developed. Excellent crates such as Walkdir and Clap (command-line parsing) meant it took minutes to assemble the building blocks for a file-processing utility. Again, this is nothing new to many high-level languages, but it isn’t typical at the system level.
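
To give a flavor of Clap, the argument parsing for a utility like this might look roughly like the following, using the clap 2.x style API that was current at the time. The argument names here are hypothetical, not gphoto-sort’s actual flags:

use clap::{App, Arg};

// Hypothetical arguments: where the Takeout archive lives and where the
// existing photo tree is.
let matches = App::new("gphoto-sort")
    .arg(Arg::with_name("takeout_dir")
        .help("Google Takeout directory to import from")
        .required(true))
    .arg(Arg::with_name("photo_dir")
        .help("Existing photo tree to merge into")
        .required(true))
    .get_matches();

let takeout_dir = matches.value_of("takeout_dir").unwrap();
let photo_dir = matches.value_of("photo_dir").unwrap();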

Cargo encourages small Crates, thus smaller module interfaces

The fine granularity of modules in the Rust Crate ecosystem surprised me. For example, Walkdir is not very big itself, and has three dependencies. There is even a (useful!) single-function crate: GetHostName. Why?

Part of it, I think, is due to the module support in Rust, which makes it easy to define even small modules with a well-defined interface. However, that isn’t all that different from creating a C++ class, or a public header file plus an internal-details header.
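
For instance, a toy module (not from gphoto-sort) that exposes a single public function looks like this; anything not marked pub stays private to the module:

// A tiny module exposing one function; the helper below it stays private.
mod checksum {
    pub fn hex_digest(bytes: &[u8]) -> String {
        bytes.iter().map(to_hex).collect()
    }

    fn to_hex(b: &u8) -> String {
        format!("{:02x}", b)
    }
}

// Callers only ever see the public interface:
let digest = checksum::hex_digest(b"hello");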

What is different is that splitting a C++ class out into a separate directory, with a separate build target, is (in my experience) considerably more cumbersome: every consumer has to link the .o files, the static library, or the shared library. But then each consumer also needs to know of any dependencies to link, and keep those up-to-date from then on. This is where Cargo is game-changing because suddenly my build focuses only on my direct dependencies. This allows what would be internal modules to be broken off into reusable components which may be modified independently of the main application.
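
To make that concrete, the entire dependency declaration for a tool like this is just the handful of direct crates; Cargo resolves, fetches, and builds everything those crates depend on. The version numbers below are illustrative:

[dependencies]
walkdir = "2"
clap = "2"
rayon = "1"
rust-crypto = "0.2"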

Examples

As an illustration, here is the code needed to recursively walk the directory tree under the current directory and print all of the files and directories found:

use walkdir::WalkDir;

for ent in WalkDir::new(".") {
    if let Ok(ent) = ent {
        let path = ent.path();
        if path.is_dir() {
            println!("{:?} is a directory", path);
        } else {
            println!("{:?} is a file", path);
        }
    }
}

The if let Ok(ent) = ent expression matches each entry that is Ok (not an error) and rebinds the unwrapped value to ent. Then the code just checks whether the path is a file or a directory and prints it.

A similar example could instead be written as a pipeline using Rust’s iterator support, like this:

let count = WalkDir::new(".")
    .into_iter()
    .filter_map(|e| e.ok())
    .filter(|e| e.path().is_file())
    .count();
println!("Found {} files", count);

This example uses filter_map to remove any error entries, then filters to keep only entries which are files, and finally produces a count of those files.

The really interesting bit is that this iterator-based version plays well with a data-parallelism crate called Rayon. Rayon can take a pipeline such as this, spread the processing across multiple CPU cores using a thread pool, and aggregate the final results almost magically. To parallelize our simple example we just add one call, .par_bridge() (plus a use rayon::prelude::*; import):

use rayon::prelude::*;

let count = WalkDir::new(".")
    .into_iter()
    .filter_map(|e| e.ok())
    .par_bridge()
    .filter(|e| e.path().is_file())
    .count();
println!("Found {} files", count);

Now the “heavy processing” of calling is_file on thousands of files can be shared across multiple CPU cores. It doesn’t make sense here, of course, but if the filtering step were more computationally expensive it could make a big difference.
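
For instance, swapping the is_file check for the MD5 hashing from earlier gives a sketch of a genuinely parallel workload. Again, this is an illustration, not the actual gphoto-sort code:

use crypto::digest::Digest;
use crypto::md5::Md5;
use rayon::prelude::*;
use std::fs;
use walkdir::WalkDir;

// Hash every file under the current directory, spreading the work across
// CPU cores; files that fail to read are simply skipped.
let digests: Vec<(String, String)> = WalkDir::new(".")
    .into_iter()
    .filter_map(|e| e.ok())
    .par_bridge()
    .filter(|e| e.path().is_file())
    .filter_map(|e| {
        let bytes = fs::read(e.path()).ok()?;
        let mut hasher = Md5::new();
        hasher.input(&bytes);
        Some((e.path().display().to_string(), hasher.result_str()))
    })
    .collect();
println!("Hashed {} files", digests.len());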

Summary

In short, I am highly impressed with Rust’s “dynamic range” in being a system-level language that also allowed me to write what amounts to a glorified shell script in under 300 lines — comments, unit tests, and all. This really reflects the quality of the crates out there and the thought that has gone into Cargo.

Was it necessary to write this in a low-level language? Not at all! However, this sort of file-system-level manipulation comes up surprisingly often in larger applications. Being able to do it efficiently even at the lowest levels of the application stack is a nice productivity boost.

