DEV Community

Alexis Kypridemos

Sort Items in a Directory by Descending Size Using Python, PowerShell, C#, or Go

A simple program to sort all the items in a directory, be they file or folder, by descending size.

Sorting items in a directory by descending size is not as straightforward as you might think, whether you are using a graphical file browser or the command line, because operating systems do not calculate the total size of a directory's contents when browsing a directory tree. This article offers complete working programs to overcome this on most operating systems.

Problem

Perhaps you will find the following familiar:

Whether for work or for personal projects, I like to organize my digital assets by creating a parent directory, let's say one called Projects, and storing all the content for the individual projects in there. If a project is small and doesn't involve a lot of content, I'll use a single file, usually a text file. If a project involves more content, say a text file as well as a couple of screenshots, I'll create a folder for that project and place all the related assets in there. So, from my perspective, the single text file and the folder are equivalent in the sense that each represents a project. The only difference is that the folder represents a bigger project, one with more stuff.

Sometimes I want to see which of my projects is currently the largest, that is, which has the most stuff. This usually happens because I haven't worked on a particular area for some time, so when I come back to it, I want to see which project has the most content. My reasoning is that the project with the most content should be the most complete, and therefore probably the one I should start working on first, as it will be the easiest to finish.

For example, consider a directory with the following contents:

Name                 Type     Size
Huge Project.txt     File     2.6KB
Larger Project       Folder   1.07KB
0 - Tiny Project     Folder   0KB
Basic Project.txt    File     0.36KB
Big Project.txt      File     2.11KB

Sorting the above directory by descending size should output:

Huge Project.txt        2.6KB
Big Project.txt         2.11KB
Larger Project          1.07KB
Basic Project.txt       0.36KB
0 - Tiny Project        0KB

However, this is not what we get when we click the Size column header in graphical file browsers on Windows, Mac, and Linux.

Windows

Folder items sorted by size in Windows File Explorer

Windows File Explorer - The files are sorted by descending size, and the folders are displayed underneath, in ascending alphabetical order.

Mac

Folder items sorted by size in macOS Finder.

macOS Finder - The directory contents are sorted the same as on Windows.

Linux

Folder items sorted by size in the Ubuntu Files app.

Linux (Ubuntu) Files app - The folders and files are sorted correctly, but individually; folders first, then files. So the item that appears first on the list is not actually the largest item in the directory.

Using the command line provides output that is somewhat closer to the desired one, but still not entirely correct:

Windows

dir /b /o:-d

Output:

Larger Project
0 - Tiny Project
Huge Project.txt
Big Project.txt
Basic Project.txt


Mac and Linux

There are various command combinations for directory content sorting on UNIX-based systems such as Mac and Linux. Most involve using du, sort, and ls. Other examples I found online threw find and grep into the mix as well.

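Here are the ones I tried. Note that the du example uses GNU options, as found on Linux; the BSD du that ships with macOS takes -d 1 instead of --max-depth=1.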

du | sort

du -a -h --max-depth=1 | sort -hr

Output:

32K     .
8.0K    ./Larger Project
8.0K    ./0 - Tiny Project
4.0K    ./Huge Project.txt
4.0K    ./Big Project.txt
4.0K    ./Basic Project.txt


ls

Using the -S switch on the ls command is supposed to do exactly what I'm looking for: sort items by descending size.

ls -S

Output:

'0 - Tiny Project'  'Larger Project'  'Huge Project.txt'  'Big Project.txt'  'Basic Project.txt'

The output is still off. I tried adding the -l (long) switch.

ls -lS

Output:

total 20
drwx---r-x 2 admin admin 4096 Sep 20 21:49 '0 - Tiny Project'
drwx---r-x 2 admin admin 4096 Sep 20 21:49 'Larger Project'
-rw-rw-r-- 1 admin admin 2667 Sep 20 21:49 'Huge Project.txt'
-rw-rw-r-- 1 admin admin 2164 Sep 20 21:49 'Big Project.txt'
-rw-rw-r-- 1 admin admin  368 Sep 20 21:49 'Basic Project.txt'

The output includes more detail, as expected, but the sort order is the same as before.

Root Cause

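Although none of these commands produces the desired output, they do highlight the root cause of the problem. When listing a directory, file browsers and commands such as ls do not recurse into folders to calculate the total size of their contents. Instead, they report the size of the directory entry itself, so all folders appear to have the same fixed size. On many Linux file systems this is a single block, commonly 4096 bytes (4KB). du does recurse, but by default it reports disk usage in whole blocks rather than the byte size of the contents, which is why every small file in its output above shows up as 4.0K.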
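
The point about folder sizes can be checked directly. Here is a minimal Python sketch, assuming a throwaway temporary directory; the file name notes.txt and its 3000-byte payload are made up for illustration. It compares the size the file system reports for the folder entry itself with the sum of the sizes of the files inside it.

```python
import os
import tempfile


def entry_size(path):
    """Size reported for the entry itself; no recursion happens here."""
    return os.stat(path).st_size


def contents_size(path):
    """Sum of the sizes of all regular files below path."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(path):
        for name in filenames:
            total += os.path.getsize(os.path.join(dirpath, name))
    return total


def demo():
    # Hypothetical layout: one folder holding a single 3000-byte file.
    with tempfile.TemporaryDirectory() as root:
        folder = os.path.join(root, "Larger Project")
        os.mkdir(folder)
        with open(os.path.join(folder, "notes.txt"), "wb") as f:
            f.write(b"x" * 3000)
        return entry_size(folder), contents_size(folder)


if __name__ == "__main__":
    own, total = demo()
    print("folder entry:", own, "bytes")
    print("folder contents:", total, "bytes")
```

The second number is always 3000. The first depends on the file system (on ext4 it is typically 4096) and does not change with what the folder holds, which is why sorting on it is not useful.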

Solution

There must be at least a dozen free tools out there that solve this problem, but to be honest, I didn't even look. Writing a script/program that does the same thing and then sharing it here felt like it would be easier, involve less bloat, hopefully be useful to others, and definitely be more fun.

I've waffled on long enough. Here is the code:

Python

#!/usr/bin/env python3

import os
import sys

def get_dir_items(path):
    """Map each item directly inside path to its size in bytes.

    Folders are sized recursively, by adding up everything they contain.
    """
    results = {}
    for item in os.scandir(path):
        if item.is_file():
            results[item.name] = item.stat().st_size
        elif item.is_dir():
            # A folder's size is the sum of the sizes of its own items.
            results[item.name] = sum(get_dir_items(item.path).values())
    return results

if __name__ == "__main__":
    if len(sys.argv) <= 1:
        print("Specify a path as the first argument.")
    else:
        root_path = sys.argv[1]
        # os.scandir() only works on directories, so check for one explicitly.
        if not os.path.isdir(root_path):
            print(root_path, "is not a valid path")
        else:
            results = get_dir_items(root_path)
            # Sort the (name, size) pairs by size, largest first.
            results_sorted = sorted(results.items(), key=lambda item: item[1], reverse=True)
            for name, size in results_sorted:
                print(name, "\t", round(size / 1024, 2), "KB")

PowerShell

if (!$args[0]){
    echo "Specify a path as the first argument."
} else {
    $root_path = $args[0]
    if (!(test-path $root_path)){
        echo ($root_path + " is not a valid path")
    } else {
        $results = @{}
        $items = gci $root_path
        foreach($item in $items){
            if ($item -is [System.IO.DirectoryInfo]){
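                # Sum the Length of every file below the folder. -File (PowerShell 3.0+) skips subfolder objects, which have no Length property.
                $results[$item.Name] = (gci $item.FullName -recurse -file | measure length -sum).Sum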
            } else {
                $results[$item.Name] = $item.Length
            }
        }
        $results_sorted = $results.GetEnumerator() | sort value -descending
        foreach($result in $results_sorted){
            echo ($result.Name + "`t" + [math]::Round(($result.Value/1024), 2) + "KB")
        }
    }
}

C#

using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
namespace Program {
    class Program {

        private static Dictionary<string, long> getDirItems(string path) {
            Dictionary<string, long> results = new Dictionary<string, long>();
            DirectoryInfo directoryInfo = new DirectoryInfo(path);
            FileInfo[] files = directoryInfo.GetFiles();
            foreach(FileInfo file in files){
                results.Add(file.Name, file.Length);
            }
            DirectoryInfo[] directories = directoryInfo.GetDirectories();
            foreach(DirectoryInfo directory in directories){
                var dirResults = getDirItems(directory.FullName);
                long dirSize = 0;
                foreach(var dirResult in dirResults){
                    dirSize += dirResult.Value;
                }
                results.Add(directory.Name, dirSize);
            }
            return results;
        }

        static void Main (string[] args) {
            if (args.Length < 1){
                Console.WriteLine("Specify a path as the first argument.");
            } else {
                string rootPath = args[0];
                if (!Directory.Exists(rootPath)){
                    Console.WriteLine("{0} is not a valid path", rootPath);
                } else {
                    var results = getDirItems(rootPath);
                    // Iterate the ordered sequence directly: Dictionary<TKey, TValue> does not guarantee enumeration order.
                    var resultsSorted = results.OrderByDescending(x => x.Value);
                    foreach(var result in resultsSorted){
                        Console.WriteLine("{0}\t{1}KB", result.Key, Math.Round((decimal)result.Value / 1024, 2));
                    }
                }
            }
        }
    }
}

Go

package main

import(
    "fmt"
    "os"
    "path/filepath"
    "sort"
)

func getDirItems(path string) map[string]int64 {
    results := make(map[string]int64)
    items, e1 := os.ReadDir(path)
    if e1 != nil {
        fmt.Println("Error:", e1)
        os.Exit(1)
    }
    for _, item := range items {
        if item.IsDir(){
            var dirSize int64 = 0
            var dirResults = getDirItems(filepath.Join(path, item.Name()))
            for _, dirResult := range dirResults {
                dirSize += dirResult
            }
            results[item.Name()] = dirSize          
        } else {
            itemInfo, e2 := item.Info()
            if e2 != nil {
                fmt.Println("Error:", e2)
                os.Exit(1)
            }
            results[item.Name()] = itemInfo.Size()
        }
    }
    return results
}

func main(){
    if len(os.Args) < 2 {
        fmt.Println("Specify a path as the first argument.")
    } else {
        rootPath := os.Args[1]
        _, e1 := os.Stat(rootPath)
        if e1 != nil {
            fmt.Println(rootPath, "is not a valid path")
        } else {
            results := getDirItems(rootPath)
            var resultKeys []string
            for key := range results {
                resultKeys = append(resultKeys, key)
            }
            sort.Slice(resultKeys, func(i, j int) bool{
                return results[resultKeys[i]] > results[resultKeys[j]]
            })
            for _, key := range resultKeys {
                fmt.Printf("%s\t%.2fKB\n", key, float64(results[key])/1024)
            }
        }
    }
}

There are some minor differences between the four implementations, but the general approach used for all four is the same:

  1. Create a recursive function that returns a collection of key-value pairs of item (file or folder) name and size.
  2. In the main function or block, do some basic input validation, and if the user has provided a valid path, run the recursive function on that path.
  3. Sort the output of the recursive function by value (size), in descending order.
  4. Print the sorted output to the console. Each line printed adheres to the format: the item name, followed by a tab character, followed by the item size divided by 1024 and rounded to two decimal places to get the size in kilobytes, followed by "KB" to denote the size unit.
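
The four steps above can be sketched as small, separately testable functions. This is an illustrative variant rather than the code from the listings: the function names are made up, main() returns its lines instead of printing them, and symbolic links are skipped (an assumption added here so that a link pointing back up the tree cannot cause endless recursion).

```python
import os
import sys


def total_size(path):
    """Step 1: recursively add up the sizes of the files below path."""
    total = 0
    for entry in os.scandir(path):
        if entry.is_dir(follow_symlinks=False):
            total += total_size(entry.path)
        elif entry.is_file(follow_symlinks=False):
            total += entry.stat(follow_symlinks=False).st_size
    return total


def sorted_items(path):
    """Steps 1 and 3: (name, size) pairs for path's direct children, largest first."""
    sizes = []
    for entry in os.scandir(path):
        if entry.is_dir(follow_symlinks=False):
            sizes.append((entry.name, total_size(entry.path)))
        elif entry.is_file(follow_symlinks=False):
            sizes.append((entry.name, entry.stat(follow_symlinks=False).st_size))
    return sorted(sizes, key=lambda pair: pair[1], reverse=True)


def format_line(name, size):
    """Step 4: name, tab, size in kilobytes rounded to two places, then 'KB'."""
    return f"{name}\t{round(size / 1024, 2)}KB"


def main(argv):
    # Step 2: basic input validation before doing any work.
    if len(argv) < 2:
        return ["Specify a path as the first argument."]
    if not os.path.isdir(argv[1]):
        return [f"{argv[1]} is not a valid path"]
    return [format_line(name, size) for name, size in sorted_items(argv[1])]


if __name__ == "__main__":
    print("\n".join(main(sys.argv)))
```

Returning the lines from main() keeps the sorting and formatting easy to check without capturing console output.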

Usage

On the command line, pass the path to the directory you want to sort as the first parameter. I won't list all the possible examples, but here are a couple, assuming you've copied the code and saved it as a file named dir_desc, short for "directory descending", plus the appropriate file extension:

Using Python on Mac or Linux:

python3 dir_desc.py <some path>

Using PowerShell on Windows:

powershell -f dir_desc.ps1 <some path>

Differences Between Languages and Implementations

  • Python and Go resemble C and other C-like languages in that the first command line argument is the second item in the args array. In the .NET languages, PowerShell and C#, the first argument is the first item in the args array.
  • In PowerShell, there is no need to create a separate recursive function, because the desired result can be more easily achieved by using the built-in Get-ChildItem (gci) and Measure-Object (measure) cmdlets.
  • In Go, sorting a collection of key-value pairs (map) by value requires a couple more lines of code than in other languages, as the built-in sorting functions are designed to work with arrays/slices, not maps.
  • In Go, rounding a floating point number to X decimal places is handled when printing the output, using the fmt.Printf() function, as opposed to using a function like, say, math.Round() (which wouldn't do what we want anyway, as it rounds to the nearest integer). If you have a C background, this is probably intuitive. For the rest of us, it's a bit bizarre, but works fine.
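
The last point, rounding the number versus rounding at print time, can be illustrated in Python, which supports both styles. This is a small sketch with made-up function names; the byte counts are the file sizes from the ls -lS output earlier in the article.

```python
def kb_rounded(size_bytes):
    """Round the number first; trailing zeros are dropped when it is printed."""
    return f"{round(size_bytes / 1024, 2)}KB"


def kb_formatted(size_bytes):
    """Round at print time with a format spec, similar to Go's %.2f: always two decimals."""
    return f"{size_bytes / 1024:.2f}KB"


if __name__ == "__main__":
    for size in (2667, 2164, 368, 0):
        print(size, kb_rounded(size), kb_formatted(size))
```

For 2667 bytes the first function gives 2.6KB and the second gives 2.60KB, so the format-spec style produces more uniform columns.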

I ported my original approach in Python to a few other languages, so that there is at least one version that should work on each of the three major operating systems:

  • Mac and Linux: should have the python3 interpreter installed by default. If not, you can use the Go version. Some Linux systems may have a version of gcc installed by default that can compile Go, but most systems will not, so you will need to download the Go compiler.
  • Windows: the PowerShell version should work out of the box on systems with Windows 10 or later. For older systems, the C# version is probably the better choice. You can use Windows' built-in C# compiler to compile the code.

And that's it. Another yak, shaved. I hope you found this useful.
