Introduction
If files are accidentally or maliciously deleted from a file system, an application may be interrupted and user data may be permanently lost. The Trash Can is a useful feature in file systems that acts as a temporary holding area, allowing users to store deleted files for a short time before they are permanently deleted. It provides a mechanism to restore or retrieve deleted files if needed, and automatically deletes the files once they become too old or the filesystem is too full.
When the Trash Can feature is enabled and a user deletes a file from a file system, the file is not immediately deleted but moved to the Trash Can. Files and directories in the Trash Can may be restored or retrieved individually or in bulk if they are still available. The Trash Can may be manually emptied, or once the filesystem is nearly full the system will automatically empty files from the Trash, taking into account which users and projects are consuming the most space.
Trash Can Functionalities
The Trash Can should include the following functionalities:
- Files should be added to the Trash Can during normal usage (e.g. the rm or rmdir utilities, the unlink() and rmdir() syscalls, or when a file is renamed onto another one).
- There should be a per-UID space for files in the Trash Can, so that administrative tools can easily find files for each user.
- Restore a file in the Trash Can. This will restore a file or directory to its original path. The corresponding User, Group, and Project quota usage should also be increased again.
- List "undeleted" files in the Trash Can.
- After a file is deleted and moved into the Trash Can, the User, Group, and Project quota for this file should be accounted and reduced accordingly.
- A file in the Trash Can is not visible in the namespace of the file system.
- A file or directory in the Trash Can is marked with a TRASH flag, so that tools like lipe_find3 can optionally skip/find deleted files.
- A file in the Trash Can marked with the TRASH flag cannot be read by a user or application, to prevent applications from continuing to read these files.
- Permanently remove a file in the Trash Can. This will remove the file from the Trash Can and destroy the OST objects to free the used space. The file is now unrecoverable.
- Empty the Trash Can. This will remove all files in the Trash Can.
- Files should be deleted from the Trash Can after a specified retention period, such as 7 days.
- Files should be deleted from the Trash Can when an OST approaches a capacity threshold (over 80% for HDD, or 90% for SSD) to avoid performance impact or the risk of running out of space if a large number of files are written at once.
- Have a tunable parameter to enable/disable the Trash Can feature on an entire file system.
- An administrator can enable/disable the Trash Can feature on a specified file or directory by setting the NOTRASH flag on the file.
Deleted files can no longer be restored from the Trash Can when:
- A file (or directory) is deleted from the Trash Can. In other words, it has been deleted twice. The first deletion only moves the file to the Trash Can. The second deletion actually removes the file from the file system.
- The Trash Can is emptied of all of its contents.
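The capacity-threshold requirement in the list above (over 80% for HDD, 90% for SSD) can be sketched as a small check. This is only an illustration: the enum, the function names, and the idea of passing the media type explicitly are assumptions, not part of the design.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical media types; the names are illustrative only. */
enum ost_media { OST_MEDIA_HDD, OST_MEDIA_SSD };

/* Thresholds taken from the requirement: 80% for HDD, 90% for SSD. */
static unsigned int trash_purge_threshold_pct(enum ost_media media)
{
    assert(media == OST_MEDIA_HDD || media == OST_MEDIA_SSD);
    return media == OST_MEDIA_SSD ? 90 : 80;
}

/* Return true when usage is strictly over the threshold for this media. */
static bool ost_needs_trash_purge(uint64_t used_bytes, uint64_t total_bytes,
                                  enum ost_media media)
{
    uint64_t pct = trash_purge_threshold_pct(media);
    uint64_t limit;

    if (total_bytes == 0)
        return false;
    /* floor(total * pct / 100), computed without overflowing 64 bits */
    limit = total_bytes / 100 * pct + (total_bytes % 100) * pct / 100;
    return used_bytes > limit;
}
```

An OST at exactly 80% on HDD is not yet over the threshold in this sketch; purging starts only once usage exceeds it.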
Design and Implementation
The design for the Trash Can feature in Lustre is relatively straightforward.
On the server side, the MDS implements the basic functionality, such as moving "deleted" files into the Trash Can and providing an interface to traverse them. On the client side, basic utility tools interact with the Trash Can ("lfs trash set|clear|list|delete|restore FILE|DIR"), including:
- Set or clear the TRASH flag on a given file or directory
- List files in the Trash Can on a given MDT
- Permanently delete a file or directory within the Trash Can on a given MDT
- Empty the Trash Can on a given MDT
- Restore a file within the Trash Can on a given MDT
Our mechanism only moves regular files into the Trash Can upon their last unlink, and ignores directories.
It borrows many ideas from orphan and volatile files in Lustre (which are stored in the "ROOT/PENDING" directory on each MDT). During format and setup, each MDT creates a "ROOT/TRASH" directory as a Trash Can to store "undeleted" files.
The POSIX API is used to traverse the files in the Trash Can on a given MDT. First, a client gets the FID of the Trash Can directory ROOT/TRASH on the MDT. Then the client obtains a file handle by opening that FID: dir_fd = llapi_open_by_fid(). After that, the "undeleted" files within the Trash Can can be traversed via readdir(dir_fd); each one can be opened with openat(dir_fd, ent->d_name), and its "trusted.unrm" XATTR, which contains the information necessary to restore it, can be read via fgetxattr(fd, "trusted.unrm"). The client can even read the data or swap layouts of the "undeleted" file in the Trash Can for restore. The overall call sequence is opendir()/readdir()/openat()/fgetxattr("trusted.unrm")/close()/closedir().
The workflow for the Trash Can is as follows:
An administrator can enable/disable the Trash Can feature on a specified MDT via:
lctl set_param mdd.*.enable_trash_can
An administrator can enable/disable the Trash Can feature on a specified directory or file via the file flag FS_UNRM_FL. All files under a directory flagged with FS_UNRM_FL inherit this flag:
# lfs trash set $file|$dir
# lfs trash clear $file|$dir
Move a file being deleted into the Trash Can. When a regular file marked with FS_UNRM_FL is deleted, upon its last unlink the file is first moved into the Trash Can directory "ROOT/TRASH" with its FID as its name. Then a "trusted.unrm" XATTR is set on the "undeleted" file in the Trash Can. The XATTR contains the following information:
struct ll_trash_xattr {
    __u32 ltx_flags;
    __u32 ltx_uid;       // UID of the deleted file, used for quota accounting
    __u32 ltx_gid;       // GID of the deleted file, used for quota accounting
    __u32 ltx_projid;    // PROJID of the deleted file, used for quota accounting
    __u64 ltx_timestamp; // Time at which the file was moved into the Trash Can, maybe we could use ctime here
};
Where ltx_uid/ltx_gid/ltx_projid are the original UID/GID/PROJID of the deleted file, mainly used for quota accounting for the restore operation; ltx_timestamp is the time at which the file was moved into the Trash Can. It is used to determine whether the file has exceeded the specified retention period and should therefore be removed from the Trash Can permanently (maybe we could also use the inode ctime for this purpose instead of storing a separate timestamp?). While deleting the file, the full path information can be obtained in a way similar to fid2path().
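The retention check described above can be sketched in userspace C as follows. The struct is a stand-in for the XATTR layout using fixed-width types; the struct name, the helper name, and the assumption that ltx_timestamp is in seconds are illustrative and not confirmed by the design.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace stand-in for the XATTR layout, with fixed-width types. */
struct trash_xattr_sketch {
    uint32_t ltx_flags;
    uint32_t ltx_uid;
    uint32_t ltx_gid;
    uint32_t ltx_projid;
    uint64_t ltx_timestamp; /* assumed: seconds since the epoch */
};

/* Return true once the entry has been in the trash longer than retention. */
static bool trash_entry_expired(const struct trash_xattr_sketch *ltx,
                                uint64_t now, uint64_t retention_secs)
{
    assert(ltx != NULL);
    /* A timestamp in the future (clock skew) is treated as not expired. */
    if (now < ltx->ltx_timestamp)
        return false;
    return now - ltx->ltx_timestamp > retention_secs;
}
```

With a 7-day retention period, retention_secs would be 7 * 86400, and an entry is only considered expired once its age strictly exceeds that value.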
List "undeleted" files within a Trash Can. By default it will list files/directories deleted relative to the current working directory. If DIR is provided, then list deleted files/directories relative to that directory, in the same format as ls:
# lfs trash {list|ls} [DIR|FILE]
MDT index: 1
uid gid size delete time FID Fullpath
0 0 4096 Nov 14 08:11 [0x200034021:0x1:0x0]->/mnt/lustre/f1
0 0 32104 Nov 14 08:07 [0x200034021:0x2:0x0]->/mnt/lustre/dir/f2
...
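The FID column in the listing above uses the bracketed seq:oid:ver form. A tool that post-processes this output might parse it roughly as below; the struct and function names are hypothetical, and the field widths (64/32/32 bits) are an assumption based on the usual Lustre FID layout.

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical userspace FID holder: sequence, object ID, version. */
struct fid_sketch {
    uint64_t seq;
    uint32_t oid;
    uint32_t ver;
};

/* Parse "[seq:oid:ver]" or "seq:oid:ver" (hex); 0 on success, -1 on error. */
static int fid_parse(const char *str, struct fid_sketch *fid)
{
    assert(str != NULL && fid != NULL);
    if (*str == '[')
        str++;
    /* %x conversions accept an optional "0x" prefix */
    if (sscanf(str, "%" SCNx64 ":%" SCNx32 ":%" SCNx32,
               &fid->seq, &fid->oid, &fid->ver) != 3)
        return -1;
    return 0;
}
```

The same helper accepts the unbracketed form used as the entry name inside the trash directory.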
Internally, the lfs trash list command looks up the FID and MDT of the current directory, or of the directory specified by DIR, and then lists the respective directory under $MOUNT/.lustre/trash/MDTxxxx/DIRFID/, where any files deleted from this directory would be moved, or the directory file descriptor returned via llapi_recycle_fid_get(MNTPT, mdt) if the .lustre/trash directory is not available.
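The path lookup described above could be assembled along these lines. This is a sketch under assumptions: the helper name is made up, the MDT index is assumed to be printed as four hex digits (consistent with the MDT0002 examples on this page), and the directory FID is passed in as an already formatted string.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Build "$MOUNT/.lustre/trash/MDTxxxx/DIRFID" into buf.
 * Returns 0 on success, -1 on empty input or if buf is too small.
 */
static int trash_dir_path(char *buf, size_t size, const char *mount,
                          unsigned int mdt_index, const char *dirfid)
{
    int len;

    assert(buf != NULL && mount != NULL && dirfid != NULL);
    if (strlen(mount) == 0 || strlen(dirfid) == 0)
        return -1;
    len = snprintf(buf, size, "%s/.lustre/trash/MDT%04x/%s",
                   mount, mdt_index, dirfid);
    if (len < 0 || (size_t)len >= size)
        return -1; /* output error or truncated path */
    return 0;
}
```

The resulting path can then be handed to opendir() like any other directory, which is what makes the ls-style listing possible without a dedicated RPC.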
Deleting a file or directory in the Trash Can will remove the temporary file under "ROOT/TRASH" and free the data space on Lustre OSTs permanently.
List "undeleted" files within a recycle bin on a given MDT:
# lctl recycle list <--mdt|-m mdt> MNTPT
MDT index: 1
uid gid size delete time FID Fullpath
0 0 4096 Nov 14 08:11 [0x200034021:0x1:0x0]->/mnt/lustre/f1
0 0 32104 Nov 14 08:07 [0x200034021:0x2:0x0]->/mnt/lustre/dir/f2
...
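The pseudo code:
rbin_fid = llapi_recycle_fid_get(MNTPT, mdt);
dir_fd = llapi_open_by_fid(MNTPT, rbin_fid, O_RDONLY | O_DIRECTORY);
dir = fdopendir(dir_fd);    /* readdir() needs a DIR *, not a file descriptor */
while ((ent = readdir(dir)) != NULL) {
    if (strcmp(ent->d_name, ".") == 0 || strcmp(ent->d_name, "..") == 0)
        continue;           /* skip the directory itself and its parent */
    fd = openat(dir_fd, ent->d_name, O_RDONLY);
    fgetxattr(fd, "trusted.unrm", xattr_buf, sizeof(xattr_buf));
    print_one(ent->d_name, xattr_buf);
    close(fd);
}
closedir(dir);              /* also closes dir_fd */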
Deleting a file in the recycle bin will remove the temporary file under "ROOT/RECYCLE" and free the data space on Lustre OSTs permanently.
# lctl recycle delete <--mdt|-m mdt> MNTPT FID
The pseudo code:
rbin_fid = llapi_recycle_fid_get(MNTPT, mdt);
dir_fd = llapi_open_by_fid(MNTPT, rbin_fid);
unlinkat(dir_fd, "FID", 0);
close(dir_fd);
Empty a recycle bin:
# lctl recycle clear <--mdt|-m mdt> MNTPT FID
The pseudo code:
rbin_fid = llapi_recycle_fid_get(MNTPT, mdt);
dir_fd = llapi_open_by_fid(MNTPT, rbin_fid);
while ((ent = readdir(dir_fd)) != NULL) {
    unlinkat(dir_fd, ent->d_name, 0);
}
close(dir_fd);
Permanently delete a file or directory in the Trash Can:
# lfs trash {delete|rm} [DIR/]FILE ...
Empty a Trash Can:
# lfs trash clear DIR ...
Restore a file in the Trash Can on a given MDT. This restores the file and its content according to the saved full path and then deletes the stub in the Trash Can.
# lfs trash {restore|unrm} [DIR/]FILE ...
A utility periodically scans the files under the Trash Can directory "ROOT/TRASH" and deletes files whose grace time has expired.
Provide the functionality to manually scan for files in the trash on all MDTs that exceed the specified age:
# lfs trash find ... -ctime +time [DIR]
Restore a file in the recycle bin on a given MDT. This restores the file and its content according to the saved full path and then deletes the stub in the recycle bin.
# lctl recycle restore <--mdt|-m mdt> MNTPT FID
The pseudo code:
rbin_fid = llapi_recycle_fid_get(MNTPT, mdt);
dir_fd = llapi_open_by_fid(MNTPT, rbin_fid);
fd = openat(dir_fd, FID, O_RDONLY);
fgetxattr(fd, "trusted.unrm", xattr_buf, sizeof(xattr_buf));
/* recreate the missing parent directories, as "mkdir -p" would */
mkdir_p(dirname(xattr_buf.path));
/* way 1: copy the file data via read()/write() syscalls */
{
    dst_fd = open(xattr_buf.path, O_CREAT | O_WRONLY, mode);
    copy_data(dst_fd, fd);
    close(dst_fd);
    unlinkat(dir_fd, FID, 0);
}
/* way 2: create an empty file and swap layouts with the trash entry */
{
    mknod(xattr_buf.path, S_IFREG | mode, 0);
    dst_fid = path2fid(xattr_buf.path);
    swap_layouts(dst_fid, FID);
    unlinkat(dir_fd, FID, 0);
}
/* way 3: move the trash entry back under its parent directory on the MDT */
{
    parent_fid = path2fid(dirname(xattr_buf.path));
    /* inside the ioctl(), the MDT moves FID into parent_fid */
    ioctl(IOCTL_RECYCLE_RESTORE, parent_fid, FID);
}
close(fd);
close(dir_fd);
LFSCK periodically scans the files under the recycle bin directory "ROOT/RECYCLE" and deletes files whose grace time has expired.
Provide the functionality to manually scan for "undeleted" files on all MDTs whose grace time has expired and delete all of them.
# lctl recycle check [--expire_time|-E time] MNTPT
Provide the functionality to restore/delete all files within a given directory. This can be achieved by combining "lctl recycle list" with "lctl recycle restore" or "lctl recycle delete" to filter the files by the full path attribute under a given directory.
Provide a .lustre/recycle/MDT[N] (where N is the MDT index) filesystem namespace. In this way, users can access the "undeleted" files in read-only mode under the recycle bin directory on a given MDT[N] via the POSIX file system API. However, these files cannot be accessed from a fileset subdirectory mount. The following commands can be run from a Lustre namespace (mount point "/mnt/lustre") on a client:
# ls /mnt/lustre/.lustre/recycle/MDT0002
0x200034021:0x1:0x0
0x200034021:0x2:0x0
...
# cat /mnt/lustre/.lustre/recycle/MDT0002/0x200034021:0x1:0x0
# lctl recycle info /mnt/lustre/.lustre/recycle/MDT0002/0x200034021:0x1:0x0
0 0 4096 Nov 14 08:11 [0x200034021:0x1:0x0]->/mnt/lustre/f1
# lctl recycle list /mnt/lustre/.lustre/recycle/MDT0002
MDT index: 1
uid gid size delete time FID Fullpath
0 0 4096 Nov 14 08:11 [0x200034021:0x1:0x0]->/mnt/lustre/f1
0 0 32104 Nov 14 08:07 [0x200034021:0x2:0x0]->/mnt/lustre/dir/f2
...
Flashback
The recycle bin feature can only recover data for regular files. It cannot recover all metadata changes, especially for empty directories.
Combining the recycle bin feature with the extended Lustre changelog makes it possible to implement a flashback feature (LU-18457) for Lustre, similar to flashback in the Oracle database. With the flashback feature, a user can rewind the metadata of the whole file system to a target time, SCN, or restore point.
Flashback undoes changes made by users. It can fix logical failures, but not physical failures. As a result, a user cannot use the flashback command to recover from disk failures, but can recover from the accidental deletion of data files or directories when it is combined with the recycle bin feature.
The MDT undoes the changes according to the changelog in reverse order.
The MDT can record all necessary information (such as names and FIDs) for each metadata update operation (such as create, mkdir, rmdir, unlink, rename, chmod, setattr, et al.) into the changelog. It can then perform the reverse (undo) operation for each changelog record to recover the file system back to a target time, SCN, or restore point with a cluster-wide consistent view.
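The reverse-order undo can be illustrated with a toy planner. Everything here is hypothetical: the record layout, the four operation types, and the mapping from each operation to its inverse are simplifications chosen for the sketch, and a real changelog record would also carry names and FIDs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, heavily simplified changelog operation types. */
enum cl_op { OP_CREATE, OP_MKDIR, OP_UNLINK, OP_RMDIR };

struct cl_rec {
    uint64_t   time; /* when the operation happened */
    enum cl_op op;
};

/* Map each operation to the operation that undoes it. */
static enum cl_op undo_of(enum cl_op op)
{
    switch (op) {
    case OP_CREATE: return OP_UNLINK;
    case OP_MKDIR:  return OP_RMDIR;
    case OP_UNLINK: return OP_CREATE; /* e.g. restore from the recycle bin */
    default:        return OP_MKDIR;  /* OP_RMDIR */
    }
}

/*
 * Walk a time-ordered log from newest to oldest and emit the undo
 * operation for every record newer than target_time.
 * out must have room for n entries; returns the number emitted.
 */
static size_t flashback_plan(const struct cl_rec *log, size_t n,
                             uint64_t target_time, enum cl_op *out)
{
    size_t count = 0;

    assert(log != NULL && out != NULL);
    for (size_t i = n; i-- > 0; ) {
        if (log[i].time <= target_time)
            break; /* everything older is already at the target state */
        out[count++] = undo_of(log[i].op);
    }
    return count;
}
```

Walking newest-first matters because later operations may depend on earlier ones, for example a file created inside a directory must be removed before the mkdir itself is undone.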
Provide the functionality to restore/delete all files within a given directory. This can be achieved by combining "lfs trash list" with "lfs trash restore" or "lfs trash delete" to filter the files by the full path attribute under a given directory.
Provide a .lustre/trash/MDTnnnn (where nnnn is the MDT index) filesystem namespace. In this way, users can access the "undeleted" files in read-only mode under the Trash Can directory on a given MDTnnnn via the POSIX file system API. However, these files cannot be accessed from a fileset subdirectory mount. The following commands can be run from a Lustre namespace (mount point "/mnt/lustre") on a client:
# ls /mnt/lustre/.lustre/trash/MDT0002
0x200034021:0x1:0x0
0x200034021:0x2:0x0
...
# lfs trash ls /mnt/lustre/.lustre/trash/MDT0002/0x200034021:0x1:0x0
0 0 4096 Nov 14 08:11 [0x200034021:0x1:0x0]->/mnt/lustre/f1
# lfs trash list /mnt/lustre/.lustre/trash/MDT0002
MDT index: 1
uid gid size delete time FID Fullpath
0 0 4096 Nov 14 08:11 [0x200034021:0x1:0x0]->/mnt/lustre/f1
0 0 32104 Nov 14 08:07 [0x200034021:0x2:0x0]->/mnt/lustre/dir/f2
...
Clean up files from the trash
The trash needs to be cleaned up automatically when the filesystem becomes full. Users must not have to delete every file twice, and the filesystem must not be allowed to become 100% full (or even 90% full) due to files in the trash. There needs to be an automatic mechanism to clean up the trash to ensure that filesystem performance does not degrade when users thought they had deleted files.
Files in the trash can be assigned the UID/GID/PROJID of a trash user so that their quota is not accounted against the end user, with the original UID/GID/PROJID kept in the XATTR "trusted.unrm".
This design does not depend on a userspace utility for a function as critical as cleaning up files from the trash when the filesystem is nearly full, since that utility may never be started, or the client may be evicted, or similar. If that happens, the filesystem would become full and unusable, even though the user had already deleted files from the filesystem. This needs to be bulletproof and run automatically when the OSTs (or MDTs) are getting full.
The MDS already monitors OST fullness every 5s to make object allocation decisions, so it can also make decisions about which files to delete. Thus the MDT can periodically monitor the space usage of the trash user (quota) and the space usage of the entire file system and, taking into account the retention period and deletion timestamp of each file, choose the candidates to be deleted permanently to free up space.
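One possible candidate-selection policy is sketched below: purge everything past the retention period, then continue with the oldest remaining entries until enough space has been reclaimed. The entry struct, the function names, and the oldest-first policy itself are assumptions for illustration; the design above does not fix a specific policy, and per-user or per-project weighting is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical in-memory view of one trash entry. */
struct trash_entry {
    uint64_t deleted_at; /* deletion timestamp */
    uint64_t size;       /* space freed if purged, in bytes */
    int      purge;      /* set to 1 when chosen for permanent deletion */
};

/* qsort() comparator: oldest deletion first. */
static int by_age(const void *a, const void *b)
{
    const struct trash_entry *x = a, *y = b;

    return (x->deleted_at > y->deleted_at) - (x->deleted_at < y->deleted_at);
}

/* Mark candidates in place and return the number of bytes they free. */
static uint64_t trash_pick_candidates(struct trash_entry *e, size_t n,
                                      uint64_t now, uint64_t retention,
                                      uint64_t need_bytes)
{
    uint64_t freed = 0;

    assert(e != NULL);
    qsort(e, n, sizeof(*e), by_age);
    for (size_t i = 0; i < n; i++) {
        int expired = now >= e[i].deleted_at &&
                      now - e[i].deleted_at > retention;

        /* Entries are sorted by age, so nothing after this one is expired. */
        if (!expired && freed >= need_bytes)
            break;
        e[i].purge = 1;
        freed += e[i].size;
    }
    return freed;
}
```

Passing need_bytes as 0 reduces this to a plain retention sweep, while a large need_bytes models the nearly-full case where even unexpired files are purged.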
Also, there needs to be some accounting of files in the trash, so that "df" does not show the filesystem as 100% or 90% full all the time, but rather shows only the non-trash space usage (= real usage - trash usage).
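The df adjustment is simple arithmetic, sketched here with hypothetical helper names. The clamp to zero is an added safety assumption for the case where trash accounting briefly runs ahead of the real usage counters.

```c
#include <assert.h>
#include <stdint.h>

/* Usage reported to "df": real usage minus trash usage, never negative. */
static uint64_t df_reported_used(uint64_t real_used, uint64_t trash_used)
{
    return trash_used > real_used ? 0 : real_used - trash_used;
}

/* Reported usage as a whole percentage of the total capacity. */
static unsigned int df_reported_pct(uint64_t real_used, uint64_t trash_used,
                                    uint64_t total)
{
    assert(total > 0);
    return (unsigned int)(df_reported_used(real_used, trash_used) /
                          (total / 100.0));
}
```

For example, a filesystem with 900 units really used, of which 300 are in the trash, would report 600 units used.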
Per-user Trash Can
A per-user TRASH/UID directory can be defined that is owned by that UID with mode 0600 to avoid world-readable access. That avoids exposing potentially private files to other users, and also allows space usage to be tracked more clearly for each UID, so that a user's data can be purged more quickly if they are exceeding their quotas.