
Introduction
Currently there is no dedicated functional test tool in the Lustre test suites for LNet testing. The Lustre Unit Test Framework (LUTF) fills that gap by providing a means for testing existing LNet features as well as new features that may be added in the future. It provides an easy way to add new test cases/scripts to test any new LNet feature.
Objectives
This High Level Design document describes the current LUTF design, its code base, the infrastructure requirements for its setup, and the new features that can be added on top of the current design.
Reference Documents
Document Structure
This document is made up of the following sections:
- Design Overview
- Building the LUTF
- LUTF-Autotest Integration
- Infrastructure
LUTF Design Overview
The LUTF is designed with a Master-Agent approach to test LNet. The Master and Agent LUTF instances use a Python telnet module to communicate with each other, and more than one Agent can communicate with a single Master instance at the same time. The Master instance controls the execution of the Python test scripts that test LNet on the Agent instances. It collects the results of all the tests run on the Agents and writes them to a YAML file. It also controls the synchronization mechanism between test scripts running on different Agents.
The diagram below shows how the LUTF interacts with LNet.

Figure 1: System Level Diagram
Building the LUTF
Building the LUTF first requires setting up an environment with all the required packages installed; the LUTF is then built with the GNU build system, in the same way the Lustre tree is built.
The following subsections outline the steps of the building process.
- Python 2.7.5 is required, along with some other Python packages:
- netifaces
- PyYAML
- paramiko (some Multi-Rail (MR) test scripts are written using paramiko, so it must be installed as well)
- SWIG (Simplified Wrapper and Interface Generator) is required to generate the glue code that allows the Python test scripts to call the DLC APIs.
- Password-less SSH - nodes running the LUTF must be set up with password-less SSH to each other.
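The package requirements above could be verified with a small pre-flight script before any test run. The sketch below is illustrative only (it uses Python 3 syntax, although the document targets Python 2.7.5; the module names match the list above, with PyYAML importing as `yaml`):

```python
# Hypothetical pre-flight check for the LUTF's Python prerequisites.
# Reports which of the required modules cannot be imported.
import importlib.util

REQUIRED_MODULES = ["netifaces", "yaml", "paramiko"]  # PyYAML imports as "yaml"

def missing_modules(names):
    """Return the subset of top-level module names that are not importable."""
    return [n for n in names if importlib.util.find_spec(n) is None]

if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing LUTF prerequisites: %s" % ", ".join(missing))
    else:
        print("All LUTF Python prerequisites are installed.")
```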
Build along Lustre tree using GNU tools
- All the other Lustre test suites/scripts are placed under the lustre/tests/ directory. Place the LUTF under lustre/tests/ as well.
- List LUTF as a subdirectory to be built in lustre/tests/Makefile.am.
- Create an autoMakefile.am under lustre/tests/ and another under lustre/tests/lutf/.
- Create a Makefile.am under lustre/tests/lutf/ to generate the required binaries and SWIG files.
- Modify configure.ac in the Lustre tree's top-level directory to add the Python path and other dependencies.
- Add the LTLIBRARIES and SOURCES entries to generate the SWIG wrapper files.
- Run "make distclean" to clean up any residual build artifacts.
- cd to the Lustre tree's top-level directory and run "sh autogen.sh"
- Run "./configure"
- Run "make"
LUTF/AT Integration
For LUTF-Autotest (AT) integration, the first step is to build the LUTF along with Lustre, just as the other test suites are built; the "Build along Lustre tree using GNU tools" section above fulfills this purpose. Once the LUTF is built along with Lustre, all the binaries and SWIG-generated wrapper files needed to run the Python test scripts are available. After this:
- The config file (similar to the one Auster uses), provided for the LUTF under lustre/tests/cfg/, is used to identify the nodes involved in the test suite and to set up environment variables.
- AT runs the Master script, which reads the config file, sets up the LUTF on the identified nodes, and triggers execution of the test suite on the Agent nodes.
- AT collects the results of the test suite as a YAML file (similar to Auster) and passes the results to Maloo.
Infrastructure
Automatic Deployment
With LUTF-Autotest integration, an infrastructure is created that lets AT deploy the LUTF on the test nodes, collect the results of the tests run, and pass the test results to Maloo to be displayed there.
Deploy LUTF
- A config file provided by AT defines and sets the environment variables.
- A Master script reads the IP addresses of the nodes involved in the test suite from the config file and runs the LUTF on the identified Agent and Master nodes.
- The Master script also triggers a child script that fetches information about the network interfaces (NIDs) on all the nodes involved in the test suite.
- This NID information can then be provided to each batch test (all the similar tests related to one feature, bundled together as scripts) for execution.
- The Master script then triggers the batch test script to run on the Agent nodes through the Master node identified for the test suite.
A sample config file used by Auster

Sample LUTF config file:

    #!/bin/bash
    # Key Exports
    export master_HOST=onyx-15vm1
    export agent1_HOST=onyx-16vm1
    export agent2_HOST=onyx-17vm1
    export agent3_HOST=onyx-18vm1
    export AGENTCOUNT=3
    VERBOSE=true
    # ports for LUTF Telnet connection
    export MASTER_PORT=8494
    export AGENT_PORT=8094
    # script and result paths
    script_DIR=$LUSTRE/tests/lutf/python/test/dlc/
    output_DIR=$LUSTRE/tests/lutf/python/tests/
Collect Results
- The results YAML file also points to the path where the test result file for each test is stored.
- This YAML file is then passed to AT which further passes it to Maloo.
Sample LUTF result YAML file:

    TestGroup:
        test_group: review-ldiskfs
        testhost: trevis-13vm5
        submission: Mon May 8 15:54:41 UTC 2017
        user_name: root
        autotest_result_group_id: 5e11dc5b-7dd7-48a1-b4a3-74a333acd912
        test_sequence: 1
        test_index: 10
        session_group_id: cfeff6b3-60fc-438a-88ef-68e65a08694f
        enforcing: true
        triggering_build_number: 45090
        triggering_job_name: lustre-reviews
        total_enforcing_sessions: 5
        code_review:
            type: Gerrit
            url: review.whamcloud.com
            project: fs/lustre-release
            branch: multi-rail
            identifiers:
                - id: 3fbd25eb0fe90e4f34e36bad006c73d756ef8499
        issue_tracker:
            type: Jira
            url: jira.hpdd.intel.com
            identifiers:
                - id: LU-9119
    Tests:
        - name: dlc
          description: lutf dlc
          submission: Mon May 8 15:54:43 UTC 2017
          report_version: 2
          result_path: lustre-release/lustre/tests/lutf/python/tests/
          SubTests:
              - name: test_01
                status: PASS
                duration: 2
                return_code: 0
                error:
              - name: test_02
                status: PASS
                duration: 2
                return_code: 0
                error:
          duration: 5
          status: PASS
        - name: multi-rail
          description: lutf multi-rail
          submission: Mon May 8 15:59:43 UTC 2017
          report_version: 2
          result_path: lustre-release/lustre/tests/lutf/python/tests/
          SubTests:
              - name: test_01
                status: PASS
                duration: 2
                return_code: 0
                error:
              - name: test_02
                status: PASS
                duration: 2
                return_code: 0
                error:
          duration: 5
          status: PASS
Maloo
- A separate section is to be created in Maloo to display LUTF test results.
- The results from output YAML file passed from AT are displayed in the LUTF results section.
- A test parameter specifically for LUTF tests is to be defined that allows running only the LUTF tests. This will help avoid running unnecessary tests for changes that only affect LNet.
C Backend
This allows for the setup of TCP connections (TCP sockets) to connect the Master and Agent nodes (lutf.c). The LUTF can run on a node in either Master mode or Agent mode.
Master mode:
- Spawns a listener thread (lutf_listener_main) to listen for Agent connections (lutf.c).
- Maintains a list of the Agents, checks on the health of the Agents, and associates and disassociates with Agents (liblutf_agent.c).
- Starts up a Python interpreter (lutf_python.c).
Agent mode:
- Spawns a heartbeat thread (lutf_heartbeat_main) to send a heartbeat to the Master every 2 seconds. The Master uses this heartbeat signal to determine the liveness of the Agents (lutf.c).
- Starts up a Python interpreter (lutf_python.c).
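The heartbeat mechanism above can be sketched in Python for illustration. The real implementation is C code in lutf.c; the message format, function names, and the shortened interval used in this demo are invented:

```python
# Illustrative sketch of the Agent heartbeat: the Agent periodically sends a
# small message over its TCP connection, and the Master records when it last
# heard from each Agent to judge its liveness.
import socket
import threading
import time

HEARTBEAT_INTERVAL = 2            # seconds, as described above

def master_listener(server_sock, last_seen):
    """Accept one Agent connection and record when heartbeats arrive."""
    conn, addr = server_sock.accept()
    conn.settimeout(5)
    with conn:
        while True:
            data = conn.recv(64)
            if not data:          # Agent closed the connection
                break
            last_seen[addr[0]] = time.time()

def agent_send_heartbeats(host, port, count):
    """Connect to the Master and send `count` heartbeat messages."""
    with socket.create_connection((host, port)) as s:
        for _ in range(count):
            s.sendall(b"HB")
            time.sleep(0.01)      # the real interval is HEARTBEAT_INTERVAL

last_seen = {}
server = socket.socket()
server.settimeout(5)
server.bind(("127.0.0.1", 0))     # ephemeral port for the demo
server.listen(1)
port = server.getsockname()[1]
listener = threading.Thread(target=master_listener, args=(server, last_seen))
listener.start()
agent_send_heartbeats("127.0.0.1", port, count=3)
listener.join()
server.close()
```

In the real framework the Master would mark an Agent dead once no heartbeat has arrived for some multiple of the interval.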
Python
Script execution and result collection
- A telnet connection is established from the Master to an Agent when a Script instance is created by running lutf_script.Script('local_intf', 'script_path', 'output_dir') (lutf_script.py).
- The scripts from 'script_path' are copied over to the Agent using scp (lutf_agent_ctrl.py).
- The copied scripts are then executed by calling run_script() on the Script instance that was created (lutf_agent_ctrl.py).
- If an 'output_dir' path is specified, the results of the script execution are copied to that path by calling push_results(). If no 'output_dir' path is provided, the results are ignored.
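The copy/run/collect flow above can be illustrated with a simplified stand-in class. Local file copies replace the Telnet/scp transport; the method names mirror run_script() and push_results(), but FakeScript is a hypothetical sketch, not the real lutf_script.Script:

```python
# Simplified local stand-in for the Master-to-Agent script execution flow.
import os
import shutil
import subprocess
import sys
import tempfile

class FakeScript:
    def __init__(self, script_path, output_dir=None):
        self.script_path = script_path
        self.output_dir = output_dir
        # "copy the script to the Agent": here, a local staging directory
        self.staging = tempfile.mkdtemp(prefix="lutf_agent_")
        self.staged = shutil.copy(script_path, self.staging)
        self.result = None

    def run_script(self):
        """Execute the staged script and capture its output as the result."""
        out = subprocess.run([sys.executable, self.staged],
                             capture_output=True, text=True, check=True)
        self.result = out.stdout
        return self.result

    def push_results(self):
        """Copy results to output_dir, or discard them if none was given."""
        if self.output_dir is None:
            return None
        path = os.path.join(self.output_dir, "result.txt")
        with open(path, "w") as f:
            f.write(self.result)
        return path
```

A usage sketch: create the instance, call run_script(), then push_results() to store the output.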
Improvements
- Currently the LUTF's Python infrastructure establishes a Telnet connection so that the Master can scp the test scripts to an Agent and then execute them. The Telnet approach can be improved upon by using SSH instead.
- A synchronization mechanism can be added to synchronize the different parts of one test script running on different Agents, by providing an API that uses a notification mechanism. The Master node will control this synchronization between the different Agent nodes used to run a test script. An example scenario: if a test script requires operations on more than one Agent node, then as one part of the test script runs to completion on one Agent, it notifies the Master of its status by calling this API; the Master then redirects this event to the main script waiting on it, which triggers the other part (operation) to start executing on another Agent node.
- All the similar test scripts (pertaining to one feature, such as Multi-Rail or Dynamic Discovery) are bundled into one auto-test script which, when executed, runs all the test scripts listed in it and then posts the cumulative results.
There is an auto-test script for each bundle of test scripts related to one feature.
The result file for each individual test script is also placed in the lutfTMP directory on the Agent node.
- The above design can be changed so that all the test scripts related to a feature are placed in a separate directory under LUTF/python/tests/, with a single auto-test script that triggers the execution of all the test scripts under one folder. The name of the folder can be passed as a parameter to this auto-test script.
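The proposed notification API could look like the following sketch. All names here (SyncController, notify, wait_for) are invented for illustration, and a threading.Event stands in for the Master-mediated event delivery between Agents:

```python
# Hypothetical sketch of the proposed synchronization API: one part of a test
# notifies the Master when it completes, and the part waiting on another
# Agent blocks until that notification arrives.
import threading

class SyncController:
    """Master-side registry of named synchronization events."""
    def __init__(self):
        self._events = {}
        self._lock = threading.Lock()

    def _event(self, name):
        with self._lock:
            return self._events.setdefault(name, threading.Event())

    def notify(self, name):
        """Called (via the Master) when one Agent finishes its part."""
        self._event(name).set()

    def wait_for(self, name, timeout=None):
        """Called by the part of the test waiting on another Agent."""
        return self._event(name).wait(timeout)

sync = SyncController()

def agent_a():
    # ... perform the first operation on Agent A ...
    sync.notify("step1_done")

def agent_b(results):
    # block until Agent A reports completion, then run the second operation
    if sync.wait_for("step1_done", timeout=5):
        results.append("step2 ran after step1")

results = []
waiter = threading.Thread(target=agent_b, args=(results,))
waiter.start()
agent_a()
waiter.join()
```

In the real framework the events would travel over the Master's network connections to the Agents rather than in-process.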