...
The btrfs-debug-tree tool can be used to dump the btrees. Here is an output example.
All btrfs metadata structures (and, to some extent, even data if it fits in a btree leaf) are stored as items inside a btree.
...
We can then have an inode item sitting next to an EA item sitting next to an extent item, all inside the same leaf block. That's a very space-efficient approach.
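The reason unrelated item types can share a leaf is that items are sorted by their key, and the key starts with the object id. A minimal sketch of that ordering follows. The struct and the demo_* names are made up for illustration and are not the kernel's definitions; the three type values are believed to match the btrfs on-disk format and should be checked against your kernel tree.

```c
#include <stdint.h>
#include <stdlib.h>

/* Simplified in-memory key. On disk, btrfs uses a packed
 * (objectid, type, offset) triple; this struct is only a stand-in. */
struct demo_key {
    uint64_t objectid; /* e.g. the inode number */
    uint8_t  type;     /* item type */
    uint64_t offset;   /* meaning depends on the item type */
};

/* Item type values (assumed to match the on-disk format). */
enum {
    DEMO_INODE_ITEM_KEY  = 1,
    DEMO_XATTR_ITEM_KEY  = 24,
    DEMO_EXTENT_DATA_KEY = 108,
};

/* Compare by objectid first, then type, then offset. */
int demo_key_cmp(const void *a, const void *b)
{
    const struct demo_key *ka = a;
    const struct demo_key *kb = b;

    if (ka->objectid != kb->objectid)
        return ka->objectid < kb->objectid ? -1 : 1;
    if (ka->type != kb->type)
        return ka->type < kb->type ? -1 : 1;
    if (ka->offset != kb->offset)
        return ka->offset < kb->offset ? -1 : 1;
    return 0;
}

/* Sort keys the way they would be laid out across leaves. */
void demo_sort_keys(struct demo_key *keys, size_t n)
{
    qsort(keys, n, sizeof(*keys), demo_key_cmp);
}
```

Sorting a mix of keys for inodes 257 and 258 with demo_sort_keys() groups everything belonging to inode 257 together (inode item, then xattr item, then extent data) before the first item of inode 258, which is why these items tend to end up in the same leaf.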
*************
* Directory *
*************
...
If space reservation for either data or metadata cannot be satisfied, the write fails with ENOSPC.
Otherwise, the reserved space is released when the new btree root is written to disk (transaction commit) through the following code path:
__extent_writepage()
-> run_delalloc_range()
-> cow_file_range()
-> extent_clear_unlock_delalloc()
-> clear_extent_bit(...EXTENT_DELALLOC)
-> btrfs_delalloc_release_metadata()
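The reserve-or-fail behaviour described above can be illustrated with a toy model. This is not the kernel implementation: the struct and the demo_* functions are hypothetical, and the model only tracks a reservation counter. It does not model how reserved bytes become used bytes once extents are actually allocated.

```c
#include <errno.h>
#include <stdint.h>

/* Toy space accounting for one class of space (data or metadata). */
struct demo_space_info {
    uint64_t total;    /* bytes available in this class */
    uint64_t used;     /* bytes already allocated */
    uint64_t reserved; /* bytes promised to pending writes */
};

/* Reserve bytes, or fail with -ENOSPC if they do not fit. */
int demo_reserve(struct demo_space_info *s, uint64_t bytes)
{
    if (bytes > s->total - s->used - s->reserved)
        return -ENOSPC;
    s->reserved += bytes;
    return 0;
}

/* Drop a reservation (in the text above: once the transaction commits). */
void demo_release(struct demo_space_info *s, uint64_t bytes)
{
    s->reserved -= (bytes < s->reserved) ? bytes : s->reserved;
}

/* A write needs both reservations; if either fails, nothing stays reserved. */
int demo_write_reserve(struct demo_space_info *data,
                       struct demo_space_info *meta,
                       uint64_t data_bytes, uint64_t meta_bytes)
{
    int ret = demo_reserve(data, data_bytes);

    if (ret)
        return ret;
    ret = demo_reserve(meta, meta_bytes);
    if (ret) {
        demo_release(data, data_bytes); /* roll back the data half */
        return ret;
    }
    return 0;
}
```

The point of the sketch is the all-or-nothing rule: a write that cannot get its metadata reservation gives back its data reservation and returns -ENOSPC to the caller.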
btrfs-debug-tree output on:
* an empty filesystem which has just been formatted (1 single device, default options)
* a filesystem with one file (with inum/objid 257) created
* the same filesystem with a subvolume (subvolid 256) created
$ btrfs subvolume list /mnt/
ID 256 top level 5 path subvol
************
* Checksum *
************
Btrfs checksums both data and metadata. Data checksums for extents are stored in a dedicated btree (see btrfs_csum_item).
In-line data and metadata are protected by the btree block checksum stored in btrfs_header (a 256-bit checksum field).
Only one checksum type, crc32c, is supported for now, but new checksum types can easily be added.
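For readers unfamiliar with crc32c, here is a minimal bitwise sketch of the standard CRC-32C (Castagnoli) algorithm. The function name demo_crc32c is made up for illustration, and the kernel uses a table-driven or hardware-accelerated implementation instead. Whether the result matches the stored on-disk value bit for bit depends on how btrfs seeds and finalizes the CRC, which this sketch does not try to reproduce.

```c
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-32C: reflected polynomial 0x82F63B78,
 * initial value 0xFFFFFFFF, final XOR 0xFFFFFFFF. */
uint32_t demo_crc32c(const void *buf, size_t len)
{
    const uint8_t *p = buf;
    uint32_t crc = 0xFFFFFFFFu;

    while (len--) {
        crc ^= *p++;
        for (int bit = 0; bit < 8; bit++) {
            /* Shift right; XOR in the polynomial if the low bit was set. */
            crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
        }
    }
    return crc ^ 0xFFFFFFFFu;
}
```

As a sanity check, the standard test vector for CRC-32C is the ASCII string "123456789", for which demo_crc32c() returns 0xE3069283. A 32-bit result only fills the first 4 bytes of the 256-bit field, which leaves room for wider checksum types.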
...
Like jbd, btrfs has a dedicated thread (namely btrfs-transaction) in charge of transaction commit.
By default, it commits the new tree every 30s (see transaction_kthread()).
btrfs exports transactions to userspace through 2 ioctls (BTRFS_IOC_TRANS_START and BTRFS_IOC_TRANS_END).
This API is used by Ceph's OSD but does not allow ENOSPC to be handled correctly.
A new API was proposed; more information is available here: http://lwn.net/Articles/361457/