Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

Lustre Quality

A lot of work has been put into stabilizing Lustre over the years, and we have evolved processes to ensure we maintain the stability that so many sites worldwide now rely on for their site-wide production filesystems. In many sites, when Lustre stops working many millions of dollars of computing equipment sits idle, so stability of Lustre is the top criterion for all development.

The processes described below ensure that careful attention is paid prior to proposed changes being landed, whether the engineer works for Intel or some other community organization.

This presentation by Eric Barton (from SC10) elaborates on these themes.

Landing a Feature to a Community Release

The community produces feature releases every 6 months. Check the Community Lustre Roadmap to find out when the next one is scheduled. At the beginning of the development cycle, the features that will be included into the upcoming release are decided, and a landing schedule is worked out to ensure that not all of the features try to land in the week before the code freeze. The feature code freeze may be as early as 3 months prior to the release date, depending on the number and scope of features that are to be landed.

First Steps

Before starting to think about the logistics associated with developing your feature it is imperative to share your plans with the Lustre community before you start work.

You should check the community development wiki to see if anyone is already working on something similar. If someone is then add yourself as a watcher to the JIRA ticket and offer to collaborate.

If you are unable to find a match, then open a new JIRA ticket outlining your plans, including the intended purpose of the development and any initial thoughts on design. Then, mail the lustre-devel mailing list to draw attention to the ticket. This will alert other community members of your intentions and may well result in potential collaborators stepping forward.

If you choose not to do this you may find that you are either duplicating work with someone else, or that your code needs to be reworked to accommodate other changes occurring in the same part of Lustre code.

While features will not be scheduled for landing into a release until they are already close to completion, it is still important that the features themselves be discussed before or during early development. This allows developers to take into account other changes that are being worked on, to avoid conflicts in network protocol changes, code restructuring, and to ensure interoperability between releases.

It is also strongly suggested that you gain experience in the Lustre landing process by fixing one or more bugs for a maintenance release before attempting to tackle writing a Lustre feature. Feel free to ask for suggestions on the lustre-devel mailing list for a suitable bug to get started with.

Schedule and Timing

Community Lustre releases operate to a "train model". The schedule is fixed and will not wait for features that are not ready in time - they are deferred to the next release.

History has shown that a lengthy stabilization period is needed after all features have landed to work through any bugs introduced by the new code that were not caught by normal regression testing. If there is sufficient testing of intermediate development releases at a large enough scale, and the release branch is stable, additional features may be landed as time permits.

For a feature release scheduled for release in month T the schedule is roughly as follows. For more precise dates, keep up to date on the lustre-devel mailing list .

  1. T-7 A call for features is sent out to the lustre-devel mailing list. The amount of change that can be landed for a given release is limited so it is prudent to respond early if you feel that you will have a feature that warrants consideration for inclusion. Expect to be asked to provide the information on the Feature Landing Checklist below - either completed or with estimates as to when any missing portions will be completed. Typically, feature development is already well underway before a feature is scheduled for landing.
  2. T-6 Initial review of candidate features to define the scope of the release. A test plan is created and the Community Lustre Roadmap is updated. A landing schedule is created so that feature landings are spaced out to make it easier for intermediate testing to identify when features introduce regressions. If serious regressions are found when a feature is landed then it will be reverted from master until the problems have been addressed. It should be obvious that not all changes can land in the last weeks before the feature freeze, hence the requirement that features already be close to completion before scheduling them for landing.
  3. T-3 Feature Freeze - feature landing is finished, bug fixes only from now on. The Community Lustre Roadmap is updated.
  4. T-1 Code Freeze - critical bug fixes only from now on. A release candidate (RC) is tagged and release testing commences
  5. T0 GA announced and RPMs available for download from the Whamcloud download site

Feature Landing Checklist

  1. High level design has been reviewed and signed off by a senior Intel engineer
  2. Test plan has been reviewed and signed off by a senior Intel engineer. The test plan should include performance testing, interoperability, and any feature-specific tests that may fall outside normal testing.
  3. Results from executing the test plan uploaded into Maloo
  4. Proposed revisions to the manual have been provided
  5. The criteria from the Patch Landing Process Summary are met

Seeking Guidance

Please alert the community via the lustre-devel mailing list if you need some extra guidance in getting your patch submitted for the release

Gerrit Workflow

We're using Gerrit for code inspections and for managing changes to branches. The workflow for a change in Gerrit begins with an inspection request, and once the request has been approved, it is pushed directly to the branch by the gatekeeper. We don't need to manage patches externally, or manage our landing process using other tools - all of it can be done in Gerrit. Please review the Using Gerrit page for more details on how Gerrit and Git are used together.

Requirements for patch submission

All patches that land on the main release branches in the fs/lustre-release repository must conform to the Lustre Coding Guidelines, pass the standard Lustre regression tests. All patches should be associated with a Lustre Jira ticket, and by signed off by the developer. Bug fixes should include additional regression tests. Feature patches should include new regression tests to exercise the feature, and potentially performance or load tests as appropriate for the change. All changes submitted to the fs/lustre-release repository will be tested automatically with the Lustre regression tests in lustre/tests and a link to the test results (saved in Maloo) and a pass/fail review comment will be added to the Gerrit change. For additional tests run outside of autotest, the results can either be sent to Maloo manually, or test results can be posted into the Jira ticket associated with the patch.

If there are a series of changes being made to the code (e.g. code style fixes, renaming some functions, adding function parameters, moving code from one file to another, making the actual fix for a bug, making some additional fixes to other defects found during coding), these should all go into separate patches for inspection and testing. Having a series of smaller patches, each of which makes a single, clearly understood change speeds up inspection significantly. Ordering uncontroversial and easy-to-inspect changes first in a patch series ensures that they are only inspected a single time, and can land quickly. Ordering complex changes at the end of the patch series ensures that if changes are requested by the inspectors for these patches, the earlier ones do not need to be re-inspected and re-tested for each update. Following this patch submission model speeds up patch landing, and reduces the burden on the patch inspector, and test systems, and reduces the time that the patch submitter needs to spend waiting or rebasing in order to get their patches accepted.

If there are a number of patches on a git branch, pushing the changes to Gerrit will result in the entire series of patches being sent, with a separate change for each patch. Gerrit correctly tracks dependencies for these patches. If the patches are rebased, either on a new Lustre version, or one of the earlier patches in the series is modified, then all modified patches will be resubmitted to Gerrit for re-inspection, build, and testing.

Any patches that require updates to the Lustre manual will also need to submit documentation updates when the patch is ready for submission. See the link Making changes to the Lustre Manual.

For code that requires modifications or additions to the existing suite of automated Lustre tests, the new or modified test scripts must be submitted with the patch. If there is a defect in the existing code, it means that there is no regression test that covers this code path. A new test should be written that confirms the problem be hit by the test before the fix is landed, and verifies the patch resolves the problem afterward. Any additional testing that cannot be covered by the existing automated Lustre tests must also be documented with a clear, step-by-step test plan, and arrangements must be made with the Lustre test team to execute these plans.

Signed-off-by

We are using the a similar sign-off process that is used for contributions to the Linux kernel. This means developers making contributions certify that they wrote the patch or have the right to pass it on. To certify this, you include a Signed-off-by: line like this one at the bottom of your commit comments:

No Format

Signed-off-by: Random J Developer <random@developer.example.org>

Adding this line means that you certify the following:

No Format

Developer's Certificate of Origin 1.1

By making a contribution to this project, I certify that:

(a) The contribution was created in whole or in part by me and I
    have the right to submit it under the open source license
    indicated in the file; or

(b) The contribution is based upon previous work that, to the best
    of my knowledge, is covered under an appropriate open source
    license and I have the right under that license to submit that
    work with modifications, whether created in whole or in part
    by me, under the same open source license (unless I am
    permitted to submit under a different license), as indicated
    in the file; or

(c) The contribution was provided directly to me by some other
    person who certified (a), (b) or (c) and I have not modified
    it.

(d) I understand and agree that this project and the contribution
    are public and that a record of the contribution (including all
    personal information I submit with it, including my sign-off) is
    maintained indefinitely and may be redistributed consistent with
    this project or the open source license(s) involved.

Changes submitted to to the release branch without the Signed-off-by: line will be rejected.

Commit comment format

The first line of the commit comment is the commit summary of the change. Changes submitted to the fs/lustre-release branch also require a Lustre Jira issue tag at the beginning of the first line of the commit summary. A Lustre Jira ticket is one that begins with LU and is therefore part of the Lustre project within Jira. If the patch is submitted for the fs/lustre-release repository without a Lustre Jira ID in the first line, then it will automatically receive a -2 review which will prevent the patch from being submitted to a release branch. You will get an e-mail about this rejection at which time you can add the line and resubmit the patch. The commit summary should also have a component: field immediately following the Jira issue tag that indicates which Lustre subsystem that the commit is related to. Example Lustre subsystems relate to modules like llite, lov, osc, mdc, lmv, ldlm, ptlrpc, mds, oss, mdd, osd, ldiskfs, lnet, libcfs, socklnd, o2iblnd; functional components like recovery, quota, grant; or auxiliary components like build, tests, docs. This subsystem list is not exhaustive, but provides a good guideline for consistency.

The rest of the comments should be wrapped to 70 columns or less. This allows for the first line to be used a subject in emails, and also for the entire body to be displayed using tools like git log or git shortlog in an 80 column window.

Please make sure the Change-Id: line and Signed-off-by: lines are at the bottom of the comment. This is required by the fs/lustre-release project (at least) in order for the patch to be approved. If your comments are missing either of these lines the patch will be automatically be rejected at submission time. See Using Gerrit for a description of the Change-Id: line .

Commit comment template
No Format

LU-999 component: short description of change under 64 columns

A more detailed explanation of the change being made.  This can be
as detailed as you'd like, possibly several paragraphs in length.

Please provide a detailed explanation of what problem was solved,
a good high-level description of how it was solved, and which
parts of the code were affected (including function names as
needed, for easier searching).  Wrap lines at 72 columns or less.

Signed-off-by: Random J Developer <random@developer.example.org>
Change-Id: Ica9ed1612eab0c4673dee088f8b441d806c64932

Examples of good commit comments are:

No Format

LU-477 ldiskfs: allocate s_group_desc/s_group_info by vmalloc()

Add the patch to the RHEL6 ldiskfs patch series.

Large kmalloc() for sbi->s_group_desc and sbi->s_group_info can fail
for large filesystems (typically over 16TB), which will cause the
"not enough memory" error while mounting.  This patch makes allocation
fall back to vmalloc() if the kmalloc() failed, as what was done for
sbi->s_flex_groups.

To avoid colliding with an valid on-disk inode number, EXT4_BAD_INO
is used as the number of the buddy cache inode.

The patch also incorporates the following upstream kernel fix:

commit 32a9bb57d7c1fd04ae0f72b8f671501f000a0e9f
ext4: fix missing iput() of root inode for some mount error paths
https://bugzilla.kernel.org/show_bug.cgi?id=26752



Signed-off-by: Yu Jian <yujian whamcloud.com>
Change-Id: I3950425835ea7f2968ceb2edbc622e3ff3ed8545
No Format

LU-723 Enhance lustre/ldiskfs build system


Enhance the lustre/ldiskfs build system so it is more robust, flexible,
and consistent with lustre/zfs build system. This change is being made
in the interest of standardizing the infra-structure around backend
filesystems.

This change does not effect the current behavior of the --with-ldiskfs,
--enable-ext4, or --with-ldiskfsprogs configure options. However, it
does remove the obsolete --with-ldiskfs-inkernel configure option which
was only used by LLNL. It also adds the --with-ldiskfs-obj configure
option which improves flexibility. And the --enable-ldiskfs-build
configure option to support building against the lustre-ldiskfs-devel
package. The behavior of these options is consistent with their ZFS
counterparts, see commit 8c7266c for further details.

--enable-ext4 enable ldiskfs build using ext4
--enable-ldiskfs-build enable ldiskfs configure/make
--with-ldiskfs=path set path to ldiskfs source
--with-ldiskfs-obj=path set path to ldiskfs objects
--with-ldiskfsprogs use alternate names for ldiskfs-enabled e2fsprogs

Sample ./configure results when building lustre and ldiskfs using
the kernel-devel and kernel-debuginfo-common packages.

checking whether to enable ldiskfs... yes
checking ldiskfs source directory... /home/behlendo/src/git/lustre/ldiskfs
checking ldiskfs object directory... /home/behlendo/src/git/lustre/ldiskfs
checking ldiskfs module symbols... Module.symvers
checking ldiskfs source release... 3.3.0
checking whether to use ext3 or ext4 source... ext4
checking ext4 source directory... /usr/src/debug/.../fs/ext4
checking whether to build ldiskfs... yes
checking for /home/behlendo/src/git/lustre/ldiskfs/configure... yes
checking for /usr/src/debug/.../fs/ext4/dir.c... yes
checking for /usr/src/debug/.../fs/ext4/file.c... yes
checking for /usr/src/debug/.../fs/ext4/inode.c... yes
checking for /usr/src/debug/.../fs/ext4/super.c... yes
checking if ext4_ext_walk_space() takes i_data_sem... yes
checking if LDISKFS_SINGLEDATA_TRANS_BLOCKS takes sb as argument... yes
checking if ldiskfs_discard_preallocations defined... yes
checking if ldiskfs_ext_insert_extent needs 5 arguments... yes

In the context of this change additional cleanup has been done and
all of the ldiskfs specific code relocated to lustre-build-ldiskfs.m4.

Note that this change moves us closer to supporting patchless Lustre
servers with ldiskfs. Once the remaining kernel patches for Lustre
are dropped you will be able to build Lustre using the distribution
provided kernel-devel and kernel-debuginfo-common packages.

This change also incorperates ORI-340, commit f604951, which ensures
that the Module.symvers file will cleanly include the symbols for all
enabled Lustre backends. While the only backend supported by master
right now is ldiskfs this brings the master and orion branchs in to
sync in this regard.



Change-Id: I6f13f266944ec6967f4d7705a30b83ab8e577b15
Signed-off-by: Brian Behlendorf <behlendorf1 llnl.gov>
Signed-off-by: Prakash Surya <surya1 llnl.gov>

Submitting a change to a Lustre Release

The git URLs here assume you have updated your ~/.ssh/config using the suggestions in Using Gerrit.

The repository we are using for releases is fs/lustre-release. This is the repository to push to when you request an inspection for a change that is ready for a release. There are currently only two branches here, b1_8 and master, which is the branch for 2.X releases.

Code Block

git clone ssh://review/fs/lustre-release

Push the change to the fs/lustre-release repository for inspection and eventual submission to the master branch:

Code Block

git push origin HEAD:refs/for/master

Once a change request has received at least two positive inspections (and ideally no negatives), the gatekeeper for the branch will, at her discretion, submit the change directly from Gerrit. If a patch no longer applies cleanly to the branch, then it will need to be updated by the developer, and the gatekeeper should be notified when the patch is ready again. The gatekeeper will decide if additional inspections are required or if it is ready to land.

Patch landing process

Brief summary for landing patch to lustre in Whamcloud:

...

.