Introduction

ltrash_purge (LU-19598) is a user-space tool that automatically manages and cleans up files in the Lustre Trash Can from the client side, based on configurable policies. It reclaims filesystem space by purging trashed files when storage usage exceeds a threshold, when files exceed their retention period, or when files match user ID, group ID, or project ID filters. It periodically scans the trash directories across all MDTs, examines each deleted file, and applies a series of policy checks to decide whether the file should be purged. Policies can be combined, so administrators can build cleanup strategies that match their organization's needs and usage patterns.

This tool will be released in Lustre version 2.17.x. It is an enhancement to the Lustre Trash Can feature that brings automated lifecycle management to deleted files.

Features

ltrash_purge is designed with the following features.

Multiple Purging Policies

Space-based policy: Purge files when MDT or OST usage exceeds thresholds (default: 90%)
Age-based policy: Purge files older than a specified age (default: 7 days)
User-based policy: Purge files based on UID/GID/PROJID filters

Operational Modes

Daemon mode: Continuous monitoring with configurable scan intervals (default: 60s)
Dry-run mode: Scan and report without actually deleting files
Empty mode: Purge all files from Trash Can regardless of filters

Performance Optimizations

Efficient traversal using llapi_find_with_cb() with custom callbacks
Parallel scanning and purging (default: 4 threads)
Auto-selection of the MDTs and OSTs that exceed the usage threshold, so that only files on those targets are purged (default: off)
Usage checks every 10000 deletions by default to avoid excessive overhead

Configuration Management

Default config file: /etc/lustre/ltrash_purge.conf
Command-line options override config file settings
PID file locking prevents multiple instances (default: /var/run/ltrash_purge.pid)
Configurable dump file

Statistics and Monitoring

Optional statistics tracking (disabled by default)
YAML-formatted statistics dump on SIGUSR1 signal
Tracks scanned/purged objects, freed bytes, and work rate

Signal Handling

SIGUSR1: Dump statistics to file
SIGINT/SIGTERM: Graceful shutdown

Installation and Configuration

The ltrash_purge utility is built and installed as part of the Lustre utilities package. It can be configured through command-line options or through a config file.

# Verify installation
which ltrash_purge

Command Line

Running "ltrash_purge --help" (or -h) prints the usage:

# ltrash_purge -h
Usage: ltrash_purge [options] [lustre_mount_point]
    -a, --max-age=SECONDS, purge files older than this (default: 604800)
    -c, --conf=FILE, load config file (default: /etc/lustre/ltrash_purge.conf)
    -C, --del-count=NUM, usage check every N deletions (default: 10000)
    -d, --dump=FILE, dump stats to FILE when signal USR1 is received
    -D, --dry-run, scan but do not delete files
    -E, --empty, empty Trash Can without any filter check
    -g, --gid=RANGE, only purge files from these GIDs (e.g. '0-100,1000')
    -h, --help, print this help message
    -l, --log-level={debug|info|warn|error|off}, set log level
    -m, --mdt-usage=NUM, % of MDT space to start purging (default: 90)
    -M, --auto-mdt, auto-select MDTs above mdt-usage threshold
    -n, --threads=NUM, scanning threads (default: 4)
    -o, --ost-usage=NUM, % of OST space to start purging (default: 90)
    -O, --auto-ost, auto-select OSTs above ost-usage threshold
    -p, --projid=RANGE, only purge files from these PROJIDs (e.g. '100-200')
    -P, --pidfile=FILE, the pidfile name, (default: /var/run/ltrash_purge.pid)
    -s, --enable-stats, enable stats collection(default: off)
    -t, --interval=NUM, seconds between checks (default: 60)
    -u, --uid=RANGE, only purge files from these UIDs (e.g. '1000-2000,5000')
    -v, --verbose, print more logs

Config File

ltrash_purge can also be configured through a config file. By default, the config file is /etc/lustre/ltrash_purge.conf, written in YAML format.
Here is an example:

# Lustre (subdir) mount point (required)
mount: /mnt/lustre

# Space watermark (percentage, default: 90)
# Start purging when usage exceeds this threshold on any individual MDT/OST
mdt_usage: 90
ost_usage: 85

# Auto-select targets for purging (default: off)
# When enabled, automatically purge files from the targets above usage.
# This is useful for automatically balancing space across the targets.
auto_ost: on
auto_mdt: off

# Age-based policy (in seconds, default: 604800)
max_age: 604800     # 7 days

# User/Group/Project filtering
# Only purge files from these UIDs/GIDs/PROJIDs
#uid: 500-2000,5000
#gid: 100-200,501
#projid: 1,2,3-7

# Scanning options
scan_interval: 60  # Seconds between checks (default: 60)
scan_threads: 4    # Number of parallel scanning threads (default: 4)
del_count: 10000   # Check filesystem usage every N deletions (default: 10000)

# Statistics
#dump: /var/log/ltrash_purge.stats  # output to the screen if no dumpfile is specified
#pidfile: /var/run/ltrash_purge.pid # PID file location
enable_stats: on # Enable statistics output (default: off)

# Logging level
# debug(6)|info(5)|normal(4)|warn(3)|error(2)|fatal(1)|off(0)
log_level: 6

Use Cases

Here are some use cases to help the user understand this tool.

Dry-run and Empty Mode

In dry-run mode, you can see what would be purged without actually deleting. It's useful for testing.

ltrash_purge --dry-run <options> /mnt/lustre

Empty mode purges all files in the Trash Can, regardless of age, usage thresholds, or ID filters.

ltrash_purge --empty --enable-stats /mnt/lustre

Space-based Purging with Auto-Selection

Automatically purge files from the OSTs (first command) or the MDTs (second command) that exceed 85% usage

ltrash_purge --auto-ost --ost-usage=85 --enable-stats /mnt/lustre

ltrash_purge --auto-mdt --mdt-usage=85 /mnt/lustre
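
To preview which targets are currently above a threshold before enabling auto-selection, the output of "lfs df" can be filtered. The sketch below is an illustration, not part of ltrash_purge. The function name targets_over_threshold is hypothetical, and it assumes the usual "lfs df" column layout (UUID, 1K-blocks, Used, Available, Use%, Mounted on).

```shell
# Read "lfs df"-style lines on stdin and print the targets of the given
# kind (MDT or OST) whose Use% column is strictly above the threshold.
# Hypothetical helper; assumes Use% is the 5th column of each target line.
targets_over_threshold() {
    kind=$1
    threshold=$2
    awk -v kind="$kind" -v limit="$threshold" '
        $1 ~ ("-" kind "[0-9a-fA-F]+_UUID$") {
            pct = $5
            sub(/%$/, "", pct)          # strip the trailing percent sign
            if (pct + 0 > limit + 0)
                print $1, pct "%"
        }'
}
```

A possible invocation is "lfs df /mnt/lustre | targets_over_threshold OST 85".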

Age-based Purging

Purge files older than 3 days (259200 seconds)

ltrash_purge --max-age=259200 --mdt-usage=0 --ost-usage=0 /mnt/lustre
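
Since --max-age takes seconds, a small conversion helper avoids arithmetic mistakes when thinking in days. The function name days_to_seconds is hypothetical and only shown as a convenience.

```shell
# Convert a whole number of days to seconds (1 day = 24 * 60 * 60 seconds).
# Hypothetical convenience helper; not shipped with ltrash_purge.
days_to_seconds() {
    echo $(( $1 * 24 * 60 * 60 ))
}
```

It could be used as: ltrash_purge --max-age="$(days_to_seconds 3)" --mdt-usage=0 --ost-usage=0 /mnt/lustre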

ID-based Purging

Purge files only from specific UIDs

ltrash_purge --uid=1000-1201 --mdt-usage=0 --ost-usage=0 /mnt/lustre

Purge files from specific GIDs

ltrash_purge --gid=100,101 --mdt-usage=0 --ost-usage=0 /mnt/lustre

Purge files from specific project IDs

ltrash_purge --projid=1-3,7 --mdt-usage=0 --ost-usage=0 /mnt/lustre

Daemon Mode with Configuration File

Run as daemon using config file

ltrash_purge --conf=/etc/lustre/ltrash_purge.conf /mnt/lustre

Run in background

ltrash_purge --enable-stats /mnt/lustre &

Statistics Monitoring

Enable stats and dump to file on SIGUSR1

ltrash_purge --enable-stats --dump=/var/log/ltrash_stats.yaml /mnt/lustre &
PURGE_PID=$!

# Trigger stats dump
kill -USR1 $PURGE_PID

# View stats
cat /var/log/ltrash_stats.yaml
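
When ltrash_purge was not started from the current shell, $! is not available, and the PID can be taken from the PID file instead. The following is a minimal sketch under one assumption: that the PID file holds the decimal PID on its first line. The function name signal_from_pidfile is hypothetical.

```shell
# Send a signal (default: USR1) to the process recorded in a PID file.
# Assumption: the PID file contains the decimal PID on its first line.
signal_from_pidfile() {
    pidfile=$1
    sig=${2:-USR1}
    if [ ! -r "$pidfile" ]; then
        echo "cannot read $pidfile" >&2
        return 1
    fi
    pid=$(head -n 1 "$pidfile" | tr -d '[:space:]')
    case $pid in
    ''|*[!0-9]*)
        echo "no valid PID in $pidfile" >&2
        return 1
        ;;
    esac
    # kill -0 sends no signal; it only checks that the process can be signalled
    if ! kill -0 "$pid" 2>/dev/null; then
        echo "process $pid is not running" >&2
        return 1
    fi
    kill -s "$sig" "$pid"
}
```

With the default PID file, a stats dump could be requested with: signal_from_pidfile /var/run/ltrash_purge.pid USR1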

Test Plan

Test Environment Requirements

Some simple test cases are available in sanityn.sh (test_119) for reference.

Hardware Requirements

Lustre filesystem with sufficient storage space, at least:
- 2+ MDTs
- 4+ OSTs
- 2+ clients

Software Requirements

- Trash Can feature enabled
- ltrash_purge utility installed
- YAML library support
- Project quota support (for PROJID tests)
- Root and multiple non-root users for UID/GID testing

Functionality Testing

Basic Operations

Verify that each ltrash_purge option works correctly, including:

Run mode: --dry-run, --empty and daemon
Purge policy: --mdt-usage, --ost-usage, --uid, --gid, --projid, --max-age
Scan control: --interval, --threads
Optimization: --auto-ost, --auto-mdt, --del-count, --enable-stats
Config: --conf
Others: --pidfile, --dump, --log-level, --verbose, --help

Combination Operations

Verify that combinations of different options work correctly, especially when several policies are used together.

Note: in this initial version the goal is to cover the most common cases, so a file is purged as soon as it matches any one policy.

Performance Testing

Several options directly affect performance:

--threads: 4 by default
--del-count: 10000 by default
--auto-ost: An important option that works with --ost-usage to auto-select the OSTs that exceed the usage threshold. It releases space efficiently and helps balance space among OSTs.
--auto-mdt: Similar to --auto-ost, but MDT space is usually under less pressure than OST space.
--interval: 60s by default

Populate the trash with files in a mix of sizes, both small and large, to produce different levels of space usage. For example, 1000000 small files and 10000 large (>=1GB) files.
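
The small-file part of such a data set could be generated with a sketch like the one below. The function name make_small_files and the directory layout (subdirectories d0, d1, ...) are hypothetical choices for illustration. The sketch only creates the files. They still have to be removed afterwards to land in the trash, assuming the Trash Can feature is enabled on the filesystem.

```shell
# Create COUNT small files under DIR, spread over subdirectories
# d0, d1, ... holding PER_DIR files each (default: 1000).
# Hypothetical helper for preparing test data.
make_small_files() {
    dir=$1
    count=$2
    per_dir=${3:-1000}
    i=0
    while [ "$i" -lt "$count" ]; do
        sub="$dir/d$((i / per_dir))"
        # Create each subdirectory once, when its first file is due
        if [ $((i % per_dir)) -eq 0 ]; then
            mkdir -p "$sub"
        fi
        echo "x" > "$sub/f$i"
        i=$((i + 1))
    done
}
```

For example, "make_small_files /mnt/lustre/purge_test 1000000" followed by "rm -rf /mnt/lustre/purge_test" would fill the trash with one million small files (the path is only an example).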

Large Number of Files

Because ltrash_purge scans and purges objects in a single pass, the two phases cannot be timed separately. Instead, measure whether the reported work rate scales linearly with the thread count (1, 2, 4, 8, 16, ...), both for a large number of small files and for a large number of large files.
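
One way to quantify linear scaling is parallel efficiency: the speedup over the single-thread run divided by the thread count, where 100% means perfectly linear. The helper below is a sketch. The name scaling_efficiency is hypothetical, and the work rates are whatever values the tester reads from the statistics dump.

```shell
# Print the parallel efficiency in percent:
#   efficiency = (rate_n / rate_1) / n * 100
# Arguments: single-thread work rate, work rate with N threads, N.
# Hypothetical helper for evaluating test results.
scaling_efficiency() {
    awk -v base="$1" -v rate="$2" -v n="$3" \
        'BEGIN { printf "%.1f\n", rate * 100 / (base * n) }'
}
```

For instance, if one thread reaches 1000 objects/s and four threads reach 3600 objects/s, "scaling_efficiency 1000 3600 4" prints 90.0.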

Deep Directory Hierarchy

Measure performance on deep directory structures using empty mode.

Stress Testing

Verify stability under sustained load: keep deleting files from the Lustre file system into the trash while ltrash_purge concurrently purges them, over a long period, e.g. several hours or even days.

Scalability Testing on Multiple Client Nodes

In theory, ltrash_purge can run on multiple client nodes at once, but the policies must be partitioned across the nodes to avoid conflicts. For example, subdirectory mounts can be used to isolate each instance. Verify this setup and measure whether performance improves.