Introduction

ltrash_purge (LU-19598) is a user-space tool that automatically manages and cleans up files in the Lustre Trash Can from the client side, based on configurable policies. It helps manage filesystem space by purging trashed files when storage becomes full, when files exceed their retention period, or when they match a given user ID, group ID, or project ID. It periodically scans the trash directories across all MDTs, examines each deleted file, and applies a series of policy checks to determine whether the file should be purged. These policies can be combined, allowing administrators to build cleanup strategies that match their organization's needs and usage patterns.

This tool will be released in Lustre version 2.17.x and represents an enhancement to the Lustre Trash Can feature, bringing automated lifecycle management to deleted files.

Features

ltrash_purge is designed with the following features.

Multiple Purging Policies

Operational Modes

Performance Optimizations

Configuration Management

Statistics and Monitoring

Signal Handling

Installation and Configuration

The ltrash_purge utility is built and installed as part of the Lustre utilities package. It can be configured through command-line options or a config file.

# Verify installation
which ltrash_purge

Command Line

Running "ltrash_purge --help" prints the usage:

# ltrash_purge -h
Usage: ltrash_purge [options] [lustre_mount_point]
    -a, --max-age=SECONDS, purge files older than this (default: 604800)
    -c, --conf=FILE, load config file (default: /etc/lustre/ltrash_purge.conf)
    -C, --del-count=NUM, usage check every N deletions (default: 10000)
    -d, --dump=FILE, dump stats to FILE when signal USR1 is received
    -D, --dry-run, scan but do not delete files
    -E, --empty, empty Trash Can without any filter check
    -g, --gid=RANGE, only purge files from these GIDs (e.g. '0-100,1000')
    -h, --help, print this help message
    -l, --log-level={debug|info|warn|error|off}, set log level
    -m, --mdt-usage=NUM, % of MDT space to start purging (default: 90)
    -M, --auto-mdt, auto-select MDTs above mdt-usage threshold
    -n, --threads=NUM, scanning threads (default: 4)
    -o, --ost-usage=NUM, % of OST space to start purging (default: 90)
    -O, --auto-ost, auto-select OSTs above ost-usage threshold
    -p, --projid=RANGE, only purge files from these PROJIDs (e.g. '100-200')
    -P, --pidfile=FILE, the pidfile name, (default: /var/run/ltrash_purge.pid)
    -s, --enable-stats, enable stats collection(default: off)
    -t, --interval=NUM, seconds between checks (default: 60)
    -u, --uid=RANGE, only purge files from these UIDs (e.g. '1000-2000,5000')
    -v, --verbose, print more logs
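
The RANGE arguments accepted by --uid, --gid, and --projid (for example '0-100,1000') can be read as a comma-separated list of single IDs and low-high ranges. The Python sketch below shows one way to interpret such a string. It is an illustration only, not the tool's parser: the function names are hypothetical, and treating both ends of a range as inclusive is an assumption.

```python
from typing import List, Tuple


def parse_id_ranges(spec: str) -> List[Tuple[int, int]]:
    """Parse a spec such as '1000-2000,5000' into (low, high) pairs.

    Assumption: both ends of a range are inclusive.
    """
    ranges = []
    for part in spec.split(","):
        low_text, dash, high_text = part.strip().partition("-")
        low = int(low_text)                     # ValueError on malformed input
        high = int(high_text) if dash else low  # a single ID is a one-element range
        if low > high:
            raise ValueError(f"descending range in {spec!r}: {part!r}")
        ranges.append((low, high))
    return ranges


def id_matches(value: int, ranges: List[Tuple[int, int]]) -> bool:
    """Return True if value falls inside any of the parsed ranges."""
    return any(low <= value <= high for low, high in ranges)
```

With this sketch, parse_id_ranges("0-100,1000") yields [(0, 100), (1000, 1000)].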



Config File

ltrash_purge can also be configured through a config file in YAML format. By default, the config file is located at /etc/lustre/ltrash_purge.conf.
Here is an example:

# Lustre (subdir) mount point (required)
mount: /mnt/lustre

# Space watermark (percentage, default: 90)
# Start purging when usage exceeds this threshold on any individual MDT/OST
mdt_usage: 90
ost_usage: 85

# Auto-select targets for purging (default: off)
# When enabled, automatically purge files from targets whose usage is above the threshold.
# This is useful for automatically balancing space across the targets.
auto_ost: on
auto_mdt: off

# Age-based policy (in seconds, default: 604800)
max_age: 604800     # 7 days

# User/Group/Project filtering
# Only purge files from these UIDs/GIDs/PROJIDs
#uid: 500-2000,5000
#gid: 100-200,501
#projid: 1,2,3-7

# Scanning options
scan_interval: 60  # Seconds between checks (default: 60)
scan_threads: 4    # Number of parallel scanning threads (default: 4)
del_count: 10000   # Check filesystem usage every N deletions (default: 10000)

# Statistics
#dump: /var/log/ltrash_purge.stats  # output to the screen if no dumpfile is specified
#pidfile: /var/run/ltrash_purge.pid # PID file location
enable_stats: on # Enable statistics output (default: off)

# Logging level
# debug(6)|info(5)|normal(4)|warn(3)|error(2)|fatal(1)|off(0)
log_level: 6
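
Most config keys in the example above appear to correspond to a long option in the usage text. The Python sketch below makes that correspondence explicit by turning an already-loaded config (a plain dict) into an equivalent command line, which can be handy when reproducing a config-file run by hand. The key-to-option mapping is inferred from the names and descriptions in this document and is an assumption, not something the tool guarantees. log_level is left out because the config example uses a numeric level while --log-level takes a name. The function name is hypothetical.

```python
from typing import Dict, List, Union

# Config keys that carry a value, mapped to the long option (assumed pairing).
VALUE_OPTIONS = {
    "mdt_usage": "--mdt-usage",
    "ost_usage": "--ost-usage",
    "max_age": "--max-age",
    "uid": "--uid",
    "gid": "--gid",
    "projid": "--projid",
    "scan_interval": "--interval",
    "scan_threads": "--threads",
    "del_count": "--del-count",
    "dump": "--dump",
    "pidfile": "--pidfile",
}

# Config keys that are on/off switches, mapped to a flag (assumed pairing).
FLAG_OPTIONS = {
    "auto_ost": "--auto-ost",
    "auto_mdt": "--auto-mdt",
    "enable_stats": "--enable-stats",
}


def config_to_argv(config: Dict[str, Union[str, int, bool]]) -> List[str]:
    """Build an ltrash_purge command line from a config dict.

    'mount' is required and becomes the trailing positional argument.
    """
    argv = ["ltrash_purge"]
    for key, value in config.items():
        if key == "mount":
            continue
        if key in FLAG_OPTIONS:
            if value:  # emit the flag only when the switch is on
                argv.append(FLAG_OPTIONS[key])
        elif key in VALUE_OPTIONS:
            argv.append(f"{VALUE_OPTIONS[key]}={value}")
        else:
            raise KeyError(f"unrecognized config key: {key}")
    argv.append(str(config["mount"]))
    return argv
```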

Use Cases

The following use cases illustrate how to use the tool.

Dry-run and Empty Mode

In dry-run mode, ltrash_purge reports what would be purged without actually deleting anything. This is useful for testing a policy before applying it.

ltrash_purge --dry-run <options> /mnt/lustre

In empty mode, ltrash_purge purges all files in the Trash Can regardless of age or usage thresholds.

ltrash_purge --empty --enable-stats /mnt/lustre

Space-based Purging with Auto-Selection

Automatically purge files from OSTs or MDTs exceeding 85% usage

ltrash_purge --auto-ost --ost-usage=85 --enable-stats /mnt/lustre
ltrash_purge --auto-mdt --mdt-usage=85 /mnt/lustre
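
Conceptually, auto-selection is a filter over per-target usage: only targets above the threshold are chosen for purging. The Python sketch below illustrates that idea. It is not the tool's implementation. The function name and the target labels are hypothetical, and using a strict "greater than" comparison is an assumption based on the wording "exceeds this threshold".

```python
from typing import Dict, List


def select_targets(usage_percent: Dict[str, float], threshold: float) -> List[str]:
    """Return the names of targets whose usage is above threshold, sorted.

    usage_percent maps a target label (hypothetical, e.g. 'OST0000')
    to its space usage as a percentage.
    """
    return sorted(name for name, used in usage_percent.items() if used > threshold)
```

Under the strict-comparison assumption, a threshold of 0 selects every target that has any data on it.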

Age-based Purging

Purge files older than 3 days (259200 seconds)

ltrash_purge --max-age=259200 --mdt-usage=0 --ost-usage=0 /mnt/lustre
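
Since --max-age takes seconds, a retention period expressed in days or hours has to be converted first. A small Python helper (the function name is hypothetical) avoids arithmetic slips:

```python
from datetime import timedelta


def max_age_seconds(days: int = 0, hours: int = 0) -> int:
    """Convert a retention period to the seconds value expected by --max-age."""
    return int(timedelta(days=days, hours=hours).total_seconds())
```

For example, max_age_seconds(days=3) gives 259200, and max_age_seconds(days=7) gives the default of 604800.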

ID-based Purging

Purge files only from specific UIDs

ltrash_purge --uid=1000-1201 --mdt-usage=0 --ost-usage=0 /mnt/lustre


Purge files from specific GIDs

ltrash_purge --gid=100,101 --mdt-usage=0 --ost-usage=0 /mnt/lustre


Purge files from specific project IDs

ltrash_purge --projid=1-3,7 --mdt-usage=0 --ost-usage=0 /mnt/lustre

Daemon Mode with Configuration File

Run as daemon using config file

ltrash_purge --conf=/etc/lustre/ltrash_purge.conf /mnt/lustre


Run in background

ltrash_purge --enable-stats /mnt/lustre &

Statistics Monitoring

Enable stats and dump to file on SIGUSR1

ltrash_purge --enable-stats --dump=/var/log/ltrash_stats.yaml /mnt/lustre &
PURGE_PID=$!

# Trigger stats dump
kill -USR1 $PURGE_PID

# View stats
cat /var/log/ltrash_stats.yaml
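
When the purger was not started from the current shell, the PID can be taken from the pidfile instead. The Python sketch below does that and sends SIGUSR1. It assumes the pidfile contains the PID as plain decimal text, which this document does not state explicitly. The function name is hypothetical.

```python
import os
import signal


def request_stats_dump(pidfile: str = "/var/run/ltrash_purge.pid") -> int:
    """Send SIGUSR1 to the process recorded in pidfile and return its PID.

    Assumption: the pidfile holds the PID as decimal text.
    """
    with open(pidfile) as f:
        pid = int(f.read().strip())
    os.kill(pid, signal.SIGUSR1)  # asks ltrash_purge to dump its statistics
    return pid
```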

Test Plan

Test Environment Requirements

Some simple test cases are available in sanityn.sh (the test_119 cases) for reference.

Hardware Requirements

Lustre filesystem with sufficient storage space, at least:
  - 2+ MDTs
  - 4+ OSTs
  - 2+ clients

Software Requirements

- Trash Can feature enabled
- ltrash_purge utility installed
- YAML library support
- Project quota support (for PROJID tests)
- Root and multiple non-root users for UID/GID testing

Functionality Testing

Basic Operations

Verify that each ltrash_purge option works correctly, including

Combination Operations

Verify that combinations of options work correctly, especially when different policies are used together.

Note: the initial version aims to cover the most common cases, so a file is purged as soon as it matches any policy.
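
The "any policy matches" rule in the note above can be sketched in Python as a logical OR over the active policy checks. This is an illustration for writing test expectations, not the tool's code: the TrashEntry type, its fields, and the helper functions are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Collection, Iterable


@dataclass
class TrashEntry:
    """Hypothetical record describing one file found in the trash."""
    uid: int
    gid: int
    projid: int
    age_seconds: int


Policy = Callable[[TrashEntry], bool]


def older_than(max_age: int) -> Policy:
    """Age policy: match files that have been in the trash longer than max_age."""
    return lambda entry: entry.age_seconds > max_age


def uid_in(uids: Collection[int]) -> Policy:
    """ID policy: match files owned by one of the given UIDs."""
    return lambda entry: entry.uid in uids


def should_purge(entry: TrashEntry, policies: Iterable[Policy]) -> bool:
    """A file is purged as soon as any one policy matches it."""
    return any(policy(entry) for policy in policies)
```

Under this rule, combining an age policy with a UID policy purges a recent file owned by a listed UID as well as an old file owned by anyone, which is worth keeping in mind when writing the combination tests.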

Performance Testing

Several options relate directly to performance:

Populate the trash with files in a range of sizes, both small and large, to produce different levels of space usage. For example, 1,000,000 small files and 10,000 large (>=1GB) files.

Large Number of Files

Since ltrash_purge scans and purges objects in one pass, the two phases cannot be timed separately. Instead, measure whether the work rate it reports scales linearly with the number of threads (1, 2, 4, 8, 16, ...), both for a large number of small files and for large files.
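
One way to judge linearity is to compute the parallel efficiency for each thread count: the speedup relative to the smallest run divided by the increase in threads. A value near 1.0 means near-linear scaling. The Python helper below is a sketch for post-processing measured rates. The function name is hypothetical, and the unit of the rate (for example files per second) is whatever the tool reports.

```python
from typing import Dict


def scaling_efficiency(rates: Dict[int, float]) -> Dict[int, float]:
    """Map each thread count to its parallel efficiency.

    rates maps a thread count to the measured work rate. Efficiency is
    (rate / base_rate) / (threads / base_threads), where the base is the
    run with the fewest threads.
    """
    base_threads = min(rates)
    base_rate = rates[base_threads]
    return {
        threads: (rate / base_rate) / (threads / base_threads)
        for threads, rate in sorted(rates.items())
    }
```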

Deep Directory Hierarchy

Measure performance with deep directory structures using empty mode.

Stress Testing

Verify stability under sustained load: continuously delete files from the Lustre file system into the trash while ltrash_purge purges them from the trash, for a long time, e.g. several hours or even days.
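
The deletion side of this load can be produced by a simple create-and-unlink loop running next to ltrash_purge. The Python sketch below is one hypothetical generator. The function name, file naming, and default sizes are illustrative choices, and the directory is expected to be on a Lustre mount with the Trash Can enabled so that each unlink lands in the trash.

```python
import os
import time


def churn(directory: str, duration_seconds: float,
          file_size: int = 4096, batch: int = 100) -> int:
    """Create and delete files in directory until the duration elapses.

    Returns the number of files deleted. Files are created in batches
    and then unlinked, so the directory is empty again after each round.
    """
    payload = b"\0" * file_size
    deadline = time.monotonic() + duration_seconds
    deleted = 0
    round_no = 0
    while time.monotonic() < deadline:
        paths = []
        for i in range(batch):
            path = os.path.join(directory, f"churn-{round_no}-{i}")
            with open(path, "wb") as f:
                f.write(payload)
            paths.append(path)
        for path in paths:
            os.unlink(path)  # with the Trash Can enabled, this moves the file to the trash
            deleted += 1
        round_no += 1
    return deleted
```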

Scalability Testing on Multiple Client Nodes

In theory, ltrash_purge can run on multiple client nodes, but the policies must be partitioned carefully across the nodes to avoid conflicts. For example, subdirectory mounts can provide some isolation. Verify this setup and check whether performance improves.