Introduction
If files are accidentally or maliciously deleted from a file system, the user data may be permanently lost. The Trash Can is a useful feature in file systems that acts as a temporary holding area, allowing users to store deleted files for a short time before they are permanently deleted. It provides a mechanism to restore or retrieve deleted files if needed, and automatically deletes the files once they become too old or the filesystem is too full.
...
A file (or directory) is deleted from the Trash Can. In other words, it has been deleted twice. The first deletion only moves the file to the Trash Can. The second deletion actually removes the file from the file system.
The Trash Can is emptied of all of its contents.
Design and Implementation
The design for the Trash Can feature in Lustre is relatively straightforward.
...
The implementation borrows ideas from orphan and volatile files in Lustre, which normally stores deleted files in the "ROOT/PENDING" directory on each MDT. After the initial setup and mount, each MDT creates a "ROOT/.lustre/Trash/MDTxxxx" directory as a Trash Can to store deleted files, if it does not already exist.
Configuration for the Trash Can
An administrator can enable/disable the Trash Can feature globally on a specified MDT via:
lctl set_param mdd.*.trash_can_enable
- The UID/GID/PROJID of files in the Trash Can are configured globally via mdd.*.trash_can_uid, mdd.*.trash_can_gid, and mdd.*.trash_can_projid; see Space and Quota Accounting below for details.
Delete a file into the Trash Can
When a file or empty subdirectory is deleted (the last link in the namespace is removed), a number of steps are performed for the file. Some of them are "one time only" for the user or directory, while others are done for each file.
...
Where ltx_uid/ltx_gid/ltx_projid are the original UID/GID/PROJID of the deleted file, mainly used for the restore operation. ltx_timestamp is the time that the file was moved into the Trash Can. It is used to determine whether the file has exceeded the specified retention period and thus should be purged from the Trash Can. It may be possible to use the inode ctime for this purpose instead of storing a separate timestamp, which would reduce the size of the xattr.
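The role of these fields in the expiry decision can be sketched as follows. This is only an illustration: the `TrashXattr` class, the packed byte layout, and the `retention_seconds` parameter are assumptions made for the example, not the actual on-disk xattr format.

```python
import struct
from dataclasses import dataclass

# Hypothetical packed layout: three 32-bit IDs followed by a 64-bit timestamp.
_FMT = "<IIIQ"


@dataclass(frozen=True)
class TrashXattr:
    ltx_uid: int        # original owner, needed for restore
    ltx_gid: int        # original group, needed for restore
    ltx_projid: int     # original project ID, needed for restore
    ltx_timestamp: int  # seconds since the epoch when the file entered the Trash Can

    def pack(self) -> bytes:
        return struct.pack(_FMT, self.ltx_uid, self.ltx_gid,
                           self.ltx_projid, self.ltx_timestamp)

    @classmethod
    def unpack(cls, raw: bytes) -> "TrashXattr":
        return cls(*struct.unpack(_FMT, raw))


def is_expired(xattr: TrashXattr, now: int, retention_seconds: int) -> bool:
    """True once the file has been in the Trash Can for the whole retention period."""
    return now - xattr.ltx_timestamp >= retention_seconds
```

If the inode ctime were used instead, `ltx_timestamp` would drop out of the packed record and `is_expired()` would take the ctime as its input.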
Delete a directory into the Trash Can
When a directory is deleted into the Trash Can, it is desirable to preserve the directory hierarchy of the original directory tree, so that accidental "rm -rf" (or equivalent) does not result in millions of files or directories in the top level of the .lustre/Trash/MDTxxxx/UID directory. The directory deletion will perform the following actions:
...
There is no single "delete directory tree" command in POSIX, since that may normally take a very long time to complete while processing billions of files. With Trash Can it may be desirable to offer such an interface, since the whole directory tree could be moved into the Trash Can in a single operation. It would necessitate background operations to annotate files with the FS_UNRM_FL attribute, store the original UID/GID/PROJID into the trusted.unrm xattr, and change the UID/GID/PROJID into the trash_can_* equivalents. This would still be more efficient than deleting (renaming) thousands or millions of individual files and subdirectories.
List "undeleted" files within a Trash Can
- The .lustre/trash/MDTxxxx (where xxxx is the hexadecimal MDT index) directory tree is local to each MDT. In this way, users can access the "undeleted" files in read-only mode under the Trash Can directory on a given MDTnnnn via the POSIX file system API. However, these files cannot be accessed from a fileset subdirectory mount. The following commands can be run from a Lustre namespace (mount point of "/mnt/lustre") on a client:
...
Internally, the lfs trash list command looks up the FID and MDT of the current directory, or the directory specified by DIR, and then lists the respective directory under $MOUNT/.lustre/trash/MDTxxxx/pFID/ or the directory file descriptor returned via llapi_open_by_fid() if the .lustre/trash directory is not available. This is mainly for debugging, since users will generally use the virtual .Trash directory to interact with the Trash Can and restore files.
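The path that the command ends up listing can be sketched as below. The function name is hypothetical, and the parent FID is treated as an opaque string because the exact spelling of the pFID directory name is not specified here; the four-digit hexadecimal MDT index follows the MDTxxxx convention described above.

```python
def trash_list_path(mount: str, mdt_index: int, parent_fid: str) -> str:
    """Build $MOUNT/.lustre/trash/MDTxxxx/pFID/ for a parent directory.

    parent_fid is passed through unchanged (assumption: the stub directory
    is named with the textual form of the parent FID).
    """
    return f"{mount.rstrip('/')}/.lustre/trash/MDT{mdt_index:04x}/{parent_fid}/"
```

For example, a parent directory located on MDT index 10 maps to a directory under `MDT000a`.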
Deleting a file or directory in the Trash Can
To remove the temporary file under "ROOT/.lustre/Trash" and free the data space on Lustre OSTs permanently:
# lfs trash {delete|rm} [DIR/]FILE ...
Empty a Trash Can:
# lfs trash clear DIR ...
Restore a file from the Trash Can
on a given MDT. It will restore the file and its content according to the saved full path and then delete the stub on the Trash Can.
...
Provide the functionality to restore/delete all files within a given directory. This can be achieved by combining "lfs trash list" with "lfs trash restore" or "lfs trash delete" to filter the files by the full path attribute under a given directory.
Space and Quota Accounting
In order to separate space and quota accounting for a user, group, or project's files, the original UID, GID, and PROJID of the file cannot be used for files in the Trash Can. Otherwise, there would be confusion on the part of the user when they delete files and their quota usage does not decrease. Similarly, the free and used space and inodes reported by df should not contain the space consumed by files in the Trash Can, since users would be confused by the fact that deleting files does not reduce the amount of space used in the filesystem.
...
A new option "lfs df --trash" should show the actual space usage for the filesystem, so that it is possible for an administrator to diagnose issues with the Trash Can space usage.
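The difference between the default view and the administrator view can be illustrated with a small sketch. The `SpaceUsage` type and `df_view()` function are hypothetical names for this example; it assumes the space charged to the trash_can_* IDs is known separately from total usage.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class SpaceUsage:
    total_blocks: int
    used_blocks: int   # everything allocated, including Trash Can contents
    trash_blocks: int  # portion of used_blocks charged to the trash_can_* IDs


def df_view(usage: SpaceUsage, include_trash: bool = False) -> Tuple[int, int]:
    """Return (used, free) as reported to the caller.

    By default Trash Can space is hidden, so deleting a file visibly frees
    space. With include_trash=True the actual usage is reported instead.
    """
    used = usage.used_blocks
    if not include_trash:
        used -= usage.trash_blocks
    return used, usage.total_blocks - used
```

With 1000 total blocks, 700 allocated, and 200 of those in the Trash Can, the default view reports 500 used and 500 free, while the view that includes trash reports 700 used and 300 free.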
Clean up files from the Trash Can
A mechanism is needed to automatically clean up files from the trash can when the filesystem becomes full. It cannot be that the user has to delete every file twice, and it cannot be that the filesystem is allowed to get 100% full (or even 90% full) due to files in the trash. There needs to be an automatic mechanism to clean up the trash to ensure that the filesystem performance does not degrade when users thought they deleted files.
...
The MDS is already monitoring the OST fullness every 5s to make object allocation decisions, so it can also make decisions about files to delete. Therefore, the MDT can periodically monitor the space usage of the trash user (quota) and the space usage of the entire file system and, taking into account the retention period and the deletion timestamp of each file, choose the candidates to be deleted permanently to free up space.
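One possible selection policy combining these inputs is sketched below. It is an assumption, not the designed algorithm: expired files are always purged, and if usage is still above a high watermark the oldest remaining files are purged until usage drops below it. All names (`TrashEntry`, `select_purge_candidates`, `high_watermark_pct`) are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class TrashEntry:
    name: str
    deleted_at: int  # time the file was moved into the Trash Can
    size: int        # space that purging this file would release


def select_purge_candidates(entries: Iterable[TrashEntry], now: int,
                            retention_seconds: int, used: int, capacity: int,
                            high_watermark_pct: int) -> List[TrashEntry]:
    """Pick files to delete permanently, oldest first."""
    limit = capacity * high_watermark_pct // 100
    remaining_used = used
    victims = []
    for entry in sorted(entries, key=lambda e: e.deleted_at):
        expired = now - entry.deleted_at >= retention_seconds
        over_limit = remaining_used > limit
        if expired or over_limit:
            victims.append(entry)
            remaining_used -= entry.size
    return victims
```

Sorting by deletion time means expired files are always visited first, so space pressure only reaches files that are still inside their retention period after all expired files have been counted.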
Per-User Trash Can
A per-user Trash/MDTxxxx/UID/ directory that is owned by that UID and has mode 0700 should always be created in the top-level directory to avoid world-readable access to deleted files, and to de-conflict files/directories of the same name created by different users (e.g. tmp/ or data/ or Documents/ or similar). That avoids exposing files to other users that may be private, and also allows tracking space usage more clearly for each UID, so that a user's data can be found and purged more quickly if they are exceeding their quotas.
In some uncommon cases, it may be that a parent directory has files owned by multiple different users (different UIDs). This would likely only happen for top-level directories like scratch/ or home/ that contain directories from multiple users. When those directories were deleted they created separate .lustre/Trash/MDTxxxx/UID/pFID/ stub directories. In this case, the deleted parent directory should only be created in the .../UID/ directory with the inode->i_uid of the parent directory.
Per-Tenant Trash Can
Files and directories deleted from within a subdirectory mount of a Nodemap should be stored in a Trash/MDTxxxx/NODEMAP/UID/ directory to isolate the files/directories from different tenants. The NODEMAP/ directory name is the configured name of the nodemap for that tenant, and can be found from the client export used to perform the final unlink operation. The UID/ directory name should be the client UID of the user, so that the visible directory name matches the user expectation. The UID directory ownership should be the server UID of the user, so that proper file access controls can be maintained. By having the multi-level NODEMAP/UID/ naming, it isolates the UID directory names from other tenants that may have the same mapped UID directory name.
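The naming and ownership rules for the per-user and per-tenant directories can be summarized in a short sketch. The `TrashUserDir` type and `per_user_trash_dir()` function are hypothetical, and the path is shown relative to the directory containing Trash/; only the client UID as the visible name, the server UID as owner, and mode 0700 come from the description above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TrashUserDir:
    path: str       # Trash/MDTxxxx/[NODEMAP/]UID/
    owner_uid: int  # server-side UID, used for access control
    mode: int


def per_user_trash_dir(mdt_index: int, client_uid: int, server_uid: int,
                       nodemap: Optional[str] = None) -> TrashUserDir:
    """Describe the top-level Trash Can directory for one user.

    The directory is named by the UID the client sees, but owned by the
    UID the server maps it to. A nodemap name, when given, adds one
    directory level so tenants with the same mapped UID do not collide.
    """
    parts = ["Trash", f"MDT{mdt_index:04x}"]
    if nodemap is not None:
        parts.append(nodemap)
    parts.append(str(client_uid))
    return TrashUserDir("/".join(parts) + "/", server_uid, 0o700)
```

Without a nodemap the client and server UIDs would normally be the same value, and the result reduces to the per-user Trash/MDTxxxx/UID/ layout.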
...
When running the df command, the statfs() output should add in the space used by the trash_can_projid for the nodemap, so that the space and inode usage reported does not reflect the space used by the Trash Can.
Repeated deletion of same filename
If the same filename is repeatedly created and deleted within the same parent directory, then the deleted files will have conflicts when moved into the pFID directory in the Trash Can. This may happen for files that are edited by the user, and a temporary file like .FILENAME.tmp12345 is created and written by the editor, and then renamed over the original FILENAME (causing it to be deleted) so that the file contents are not lost if the new file is only partially written. In such cases, the same FILENAME may be deleted many times.
Conflicting filenames in the Trash should be disambiguated by appending a timestamp to the filename, like FILENAME.2025-04-03-00:11:24, possibly adding .microseconds if there is still a conflict. It isn't totally clear whether it would be better to use the timestamp from when the file was deleted, or when the file was created. Both have some value to help users distinguish between the different versions.
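A minimal sketch of this naming scheme follows. The `trash_name()` function and the `exists` callback (standing in for a single name lookup in the pFID directory) are assumptions for illustration; whether `when` is the deletion or creation time is left to the caller, since that choice is still open.

```python
from datetime import datetime
from typing import Callable


def trash_name(filename: str, exists: Callable[[str], bool],
               when: datetime) -> str:
    """Choose a name for a deleted file inside its pFID directory.

    The plain name is used when free. Otherwise a timestamp suffix is
    appended, and microseconds are added only if that still conflicts.
    """
    if not exists(filename):
        return filename
    stamped = f"{filename}.{when.strftime('%Y-%m-%d-%H:%M:%S')}"
    if not exists(stamped):
        return stamped
    return f"{stamped}.{when.microsecond:06d}"
```

Each call needs at most two name lookups and never a full directory listing.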
In order to avoid overwhelming the Trash Can with files that are rapidly created and deleted (e.g. short-lived temporary files), it would be desirable to impose an upper limit on the number of versions that will be saved in the trash can. Some complexity exists in implementing this, because the MDS shouldn't need to do a full directory listing to determine if there are multiple versions of a file in the trash.
Avoid preserving temporary files
Files that only exist for a very short time (e.g. temporary files) should not necessarily be preserved in the Trash Can, or they can quickly overwhelm the available capacity of the filesystem, and result in important files being purged from trash and/or filling the trash faster than files can be cleaned up. Files marked with the I_LINKABLE flag on the MDS (from O_TMPFILE, or Lustre Volatile files, see LU-18844) should not be preserved in the Trash Can if they are not linked into a file in the namespace. It would be useful to have a tunable parameter that sets a minimum age for files to be preserved in the Trash Can (e.g. 65 minutes?) so that files that are frequently created and deleted are not preserved, since they could consume a considerable amount of space.
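The two rules above amount to a simple predicate, sketched here. The function and parameter names are hypothetical; `unlinked_tmpfile` stands for an I_LINKABLE file that was never linked into the namespace, and the minimum age is whatever the tunable would be set to.

```python
def should_preserve(created_at: int, deleted_at: int, min_age_seconds: int,
                    unlinked_tmpfile: bool = False) -> bool:
    """Decide whether a deleted file goes to the Trash Can or is removed outright."""
    if unlinked_tmpfile:
        # Never linked into the namespace, so there is nothing a user could restore by name.
        return False
    # Only files that lived at least the minimum age are worth keeping.
    return deleted_at - created_at >= min_age_seconds
```

With the 65 minute example value, `min_age_seconds` would be 3900.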
JobID of process deleting a file
In LU-13031 the JobID of the process that first creates a file is stored in the user.job xattr on the MDT inode, for diagnostic purposes and to allow determining provenance of each file later on. For the Trash Can, LU-17648 describes storing the JobID of the process that is deleting the file, for diagnostics such as determining rogue processes that are deleting files in the filesystem. Something like user.del would be a reasonable default xattr name. The actual xattr name can be configured with the mdt.*.job_xattr_del parameter.
Backup and Restore of MDT with Trash Can
For filesystem-level/namespace backup and restore, the .Trash directory will not be visible to the backup utility during namespace traversal, so deleted files in the Trash will not be backed up. However, if a deleted directory is restored from Trash then if it has a new FID (== new inode number) the backup utilities may consider this to be a new directory and back it up again. At a minimum, it would be desirable to preserve the inode number. Similarly, HSM integration may depend on the FID of the file, so it may be desirable to preserve the original parent FID of the directory when it is restored from Trash. A deleted directory could only be restored to use the same FID if the original directory had been deleted.
For MDT-level/device backup and restore, the Trash directory and its contents will be backed up in the same way as any other directory in the filesystem. The stub pFID directory is named by the parent directory FID, so it will be backed up and restored as-is (its FID is not really important, but should be treated normally). The FID on the actual parent directory will be preserved in the trusted.lma xattr, but the FID lookup in the restored OI will not be possible until after the initial post-restore OI Scrub has completed. Otherwise, the FID lookup in the restored OI will reference the old inode number, which will reference a wrong or unused inode number/generation, and should return -EINPROGRESS or possibly -ESTALE.
It may be necessary to add special support to OI Scrub/LFSCK to handle the Trash directory on each MDT, so that it can do a top-level scan of the stub pFID directories to confirm that the parent FID still references a valid parent directory.
Trash Can and fscrypt Files and Directories
When an fscrypt directory is deleted, the encryption context on the directory must be copied from the parent directory to the new pFID stub directory when it is first created, so that the encrypted filenames and contents can be accessed properly via the .Trash/ directory, as well as if an encrypted file or directory is undeleted. The fscrypt context is partly derived from the parent directory, as well as a unique per-inode cryptographic nonce value that ensures the encrypted data is unique, even if the same data is encrypted multiple times.
Virtual .Trash Directory Support
Virtual .Trash directory
It would be useful to implement a virtual ".Trash" subdirectory accessible in each directory in the filesystem, so that users can easily browse the files/directories in the Trash Can that were deleted under the current directory and access them for recovery.
...
There are two solutions for handling striped directories in the Trash Can:
- Solution 1:
Create a stub dir in the Trash Can upon the first file deletion in the striped directory. The stub dir uses the master FID of the striped directory as its name and the LMV layout of the striped directory as its layout configuration template.
When deleting a file or directory under the striped directory, the MDT first locates the stripe index of the shard from which the file is being deleted, then locates the corresponding stub shard of the stub directory and moves the file into that stub shard object.
Under this design, the user can directly access the files or directories in the Trash Can via ".Trash" or ".lustre/trash/MDTXXXX/[uid]/pFID". However, the first deletion under a striped directory may cause a distributed metadata transaction, and each deletion under the striped directory needs to locate the stub shard object and may need to obtain the LMV layout of the stub master striped dir remotely.
- Solution 2:
Do not create the stub striped dir in the Trash Can upon the first file deletion in the striped directory. The MDT just creates a stub shard dir named with the corresponding shard FID of the parent directory and moves the file into this stub shard dir in the Trash Can.
When accessing trash files via ".Trash", the client will construct a virtual file with the same FID as the parent dir, but with the FID version set to FID_VERSION_TRASH = 1.
The shard FIDs are constructed from the corresponding FIDs with the same FID names in the Trash Can.
Under this design, the user can only access the trash files for striped directories via ".Trash", and cannot access them via ".lustre/MDTXXXX/pFID", because the master stub directory does not exist until the parent directory is deleted.
When deleting the parent directory, the MDT will create a striped stub directory in the Trash Can whose shard FIDs are the FIDs of the corresponding stub shard objects.
Under this design, file deletion is simpler: it only needs to create a stub shard object named with the FID of the corresponding shard of the parent striped directory. However, access via ".Trash" and deletion of the striped directory are more complex.
For a striped directory, its ".Trash" directory is also a virtual striped directory with each stripe in the same location (MDT), where the shard FID is the FID of the corresponding stub directory on that MDT. If the stub directory on a certain MDT does not exist (or has not been created yet), the client lookup() or readdir() under the ".Trash" directory should skip that stripe. The master FID of the virtual ".Trash" directory could be the same as the FID of the parent directory but with f_ver set to 1 (FID_VERSION_TRASH = 1) to distinguish them.
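The FID derivation for the virtual ".Trash" directory can be sketched as follows. The `Fid` class is a simplified stand-in with sequence, object ID, and version fields, and `trash_fid()` and `is_trash_fid()` are hypothetical helper names; only the value FID_VERSION_TRASH = 1 and the use of f_ver come from the text above.

```python
from dataclasses import dataclass, replace

FID_VERSION_TRASH = 1  # value given in the design text


@dataclass(frozen=True)
class Fid:
    f_seq: int
    f_oid: int
    f_ver: int = 0

    def __str__(self) -> str:
        return f"[{self.f_seq:#x}:{self.f_oid:#x}:{self.f_ver:#x}]"


def trash_fid(parent: Fid) -> Fid:
    """FID of the virtual .Trash directory: same seq/oid as the parent, trash version."""
    return replace(parent, f_ver=FID_VERSION_TRASH)


def is_trash_fid(fid: Fid) -> bool:
    """Distinguish a virtual .Trash FID from the real parent directory FID."""
    return fid.f_ver == FID_VERSION_TRASH
```

Because only the version field changes, the parent FID can be recovered from the virtual one by resetting f_ver, under the assumption that regular directory FIDs use version 0.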
...