Introduction
The Lustre test suites currently have no dedicated functional test tool for LNet. The LNet Unit Test Framework (LUTF) fills that gap by providing a means of testing existing LNet features as well as features added in the future. It also makes it easy to add new test cases and scripts for any new LNet feature.
Objectives
This High Level Design Document describes the current LUTF design, code base, infrastructure requirements for its setup and the new features that can be added on top of the current design.
Reference Documents
Document Link |
---|
LNet Unit Test Infrastructure (LUTF) Requirements |
Document Structure
This document is made up of the following sections:
- Design Overview
- Building the LUTF
- LUTF-Autotest Integration
- Infrastructure
LUTF Design Overview
The LUTF is designed with a Master-Agent approach to test LNet. The Master and Agent LUTF instances use a python telnet module to communicate with each other, and more than one Agent can communicate with a single Master instance at the same time. The Master instance controls the execution of the python test scripts that test LNet on the Agent instances. It collects the results of all the tests run on the Agents and writes them to a YAML file. It also controls the synchronization mechanism between test scripts running on different Agents.
The diagram below shows how the LUTF interacts with LNet.
Figure 1: System Level Diagram
Building the LUTF
The LUTF shall be integrated with the Lustre tests under lustre/tests/lutf. The LUTF will be built and packaged with the standard build commands:
    sh ./autogen.sh
    ./configure --with-linux=<kernel path>
    make
    # optionally
    make rpms
    # optionally
    make install
The make system will build the following items:
- lutf binary
- liblutf_agent.so: shared library to communicate with the LUTF backend.
- clutf_agent.py and _clutf_agent.so: glue code that allows python to call functions in liblutf_agent.so
- lnetconfig.py and _lnetconfig.so: glue code to allow python test scripts to utilize the DLC interface.
The build process will check that python 2.7.5 and SWIG 2.0 or higher are installed before building. If these requirements are not met, the LUTF will not be built.
If the LUTF is built, it will be packaged in the lustre-tests rpm and installed in /usr/lib64/lustre/tests/lutf.
Test Environment Set-Up
Each node which will run the LUTF will need to have the following installed:
- ncurses library
    yum install ncurses-devel
- readline library
    yum install readline-devel
- python 2.7.5
  - https://www.python.org/download/releases/2.7.5/
  - Build and install it. Installing in the standard system path is recommended.
    ./configure --prefix=<> --enable-shared
    make; make install
- setuptools
  - https://pypi.python.org/pypi/setuptools
  - Download the package and untar it, then run:
    python2.7 setup.py install
- psutil
  - https://pypi.python.org/pypi?:action=display&name=psutil
  - Untar the package, cd to the untarred directory, then run:
    python2.7 setup.py install
- netifaces
- PyYAML
The LUTF will also require that passwordless ssh is set up on all the nodes which run the LUTF. This task will fall on the admin.
LUTF/AT Integration
LUTF Deployment
The LUTF will provide a deployment script, lutf_deploy.py, which will download and install all the necessary elements defined above.
The LUTF will provide a start script, lutf_launch.py, which will start the master and agent nodes given the appropriate configuration files, described later in this document.
AT Integration
The LUTF will provide a script similar to auster, lutf_perform_test.py. The purpose of the script is to manage which nodes the LUTF will be deployed on. Only the AT has knowledge of the nodes available; therefore the script will perform the following steps:
- Take as input the following parameters. NOTE: These parameters can be provided as a set of environment variables, or can be placed in a YAML file whose path is then passed to the lutf_perform_test.py script. The second option will be assumed in this HLD.
  - IP address of node to be used for master
  - IP addresses of nodes to be used as agents
  - Two YAML configuration files for the Master and Agent nodes.
  - YAML configuration file describing the tests to run.
- Call the lutf_deploy.py script for each of the nodes provided.
- Call the lutf_launch.py script for each of the nodes provided. Pass the Master YAML LUTF configuration file to the master node and the agent configuration file to the agent nodes.
- Query the LUTF master to ensure the expected number of agents are connected.
- If everything is correct, then continue with the tests; otherwise build a YAML block describing the error.
- Send the test YAML configuration file to the LUTF master and wait.
- Once the tests are completed, the LUTF master will return a YAML block describing the test results, described below.
  - The LUTF Master will provide an API based around paramiko. The API is described below.
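The steps above can be sketched in python as follows. This is only an illustration of the control flow, not the actual lutf_perform_test.py: the function name and the four callables (deploy, launch, query_agents, run_tests) are hypothetical hooks standing in for lutf_deploy.py, lutf_launch.py and the Master API, and the setup argument is assumed to be the already-parsed config block of the setup YAML file.

```python
def perform_test(setup, deploy, launch, query_agents, run_tests):
    """Drive one LUTF run. All four callables are hypothetical hooks."""
    master = setup["master"]
    # The setup file maps an index to each agent IP; keep them in index order.
    agents = [setup["agent"][key] for key in sorted(setup["agent"])]

    # Deploy on every node, then start the master before the agents.
    for node in [master] + agents:
        deploy(node)
    launch(master, setup.get("master_cfg"))
    for node in agents:
        launch(node, setup.get("agent_cfg"))

    # Continue only if the master sees the expected number of agents.
    connected = query_agents(master)
    if len(connected) != len(agents):
        return {"status": "Failure",
                "error": "expected {} agents, found {}".format(
                    len(agents), len(connected))}

    # Hand the test configuration to the master and wait for its result.
    return run_tests(master, setup.get("test_cfg"))
```

Passing the hooks in as arguments keeps the sketch independent of how deployment and launch are really implemented, and lets the flow be exercised with stub functions.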
LUTF Configuration Files
Setup YAML Configuration File
This file is passed to lutf_perform_test.py. It describes the test system so that the LUTF can be deployed correctly.
    config:
      type: test-setup
      master: <ip of master>
      agent:
        0: <ip of 1st agent>
        1: <ip of 2nd agent>
        ...
        N: <ip of Nth agent>
      master_cfg: <path to master config file>
      agent_cfg: <path to agent config file>
      test_cfg: <path to test config file>
      result_dir: <path to the directory to store the test results in>
Master YAML Configuration File
This configuration file describes the information the master needs in order to start:
    config:
      type: master
      mport: <OPTIONAL: master port. Default: 8494>
      dport: <master daemon port. Used to communicate with master>
      base_path: <OPTIONAL: base path to the LUTF directory. Default: /usr/lib64/lustre/tests>
      extra_py: <OPTIONAL: extra python paths>
Agent YAML Configuration File
This configuration file describes the information the agent needs in order to start:
    config:
      type: agent
      maddress: <master address>
      mport: <OPTIONAL: master port. Default: 8494>
      dport: <OPTIONAL: agent daemon port. Default: 8094>
      base_path: <OPTIONAL: base path to the LUTF directory. Default: /usr/lib64/lustre/tests>
      extra_py: <extra python paths>
The agent's maddress can be inserted automatically, since it's already defined in the setup configuration file.
Both the Master and Agent configuration files can be optional. If nothing is provided all the parameters will be defaulted. In the absence of an agent configuration file one will be automatically created that only has the maddress field. Example below:
    config:
      type: agent
      maddress: <master address as provided in the setup file>
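The defaulting rules for the agent configuration could look roughly like the sketch below. The function and constant names are hypothetical; the default values are the ones listed in this section, and the input is assumed to be the dictionary a YAML parser would return for the agent configuration file (or None when no file was provided).

```python
# Defaults taken from the agent configuration description above.
AGENT_DEFAULTS = {
    "mport": 8494,
    "dport": 8094,
    "base_path": "/usr/lib64/lustre/tests",
}


def build_agent_config(master_address, agent_cfg=None):
    """Return a complete agent config; agent_cfg may be None (no file given)."""
    cfg = dict(AGENT_DEFAULTS)
    # Values from the file, when present, override the defaults.
    cfg.update((agent_cfg or {}).get("config") or {})
    cfg["type"] = "agent"
    # maddress comes from the setup file unless the agent file sets it.
    cfg.setdefault("maddress", master_address)
    return {"config": cfg}
```

For example, build_agent_config("10.0.0.1") yields a configuration containing only the master address plus the default ports and base path.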
Test YAML Configuration File
This configuration file describes the list of tests to run:
    config:
      type: tests
      testsID: <test id>
      timeout: <how long to wait before the test is considered a failure. If not provided then the script will wait until killed by the AT>
      tests:
        0: <test set name>
        1: <test set name>
        2: <test set name>
        ....
        N: <test set name>
    # "test set name" is the name of the directory under lutf/python/tests
    # which includes the tests to be run. For example: dlc, multi-rail, etc
LUTF Result file
This YAML result file describes the results of the tests that were requested to run (TODO: it's not clear exactly what the result file will look like. What definitely will be needed is the results zip file generated by the LUTF master. This will need to be available from Maloo to be able to understand which tests failed, and why)
    TestGroup:
      test_group: review-ldiskfs
      testhost: trevis-13vm5
      submission: Mon May 8 15:54:41 UTC 2017
      user_name: root
      autotest_result_group_id: 5e11dc5b-7dd7-48a1-b4a3-74a333acd912
      test_sequence: 1
      test_index: 10
      session_group_id: cfeff6b3-60fc-438a-88ef-68e65a08694f
      enforcing: true
      triggering_build_number: 45090
      triggering_job_name: lustre-reviews
      total_enforcing_sessions: 5
      code_review:
        type: Gerrit
        url: review.whamcloud.com
        project: fs/lustre-release
        branch: multi-rail
        identifiers:
          - id: 3fbd25eb0fe90e4f34e36bad006c73d756ef8499
      issue_tracker:
        type: Jira
        url: jira.hpdd.intel.com
        identifiers:
          - id: LU-9119
    Tests:
      - name: dlc
        description: lutf dlc
        submission: Mon May 8 15:54:43 UTC 2017
        report_version: 2
        result_path: lustre-release/lustre/tests/lutf/python/tests/
        SubTests:
          - name: test_01
            status: PASS
            duration: 2
            return_code: 0
            error:
          - name: test_02
            status: PASS
            duration: 2
            return_code: 0
            error:
        duration: 5
        status: PASS
      - name: multi-rail
        description: lutf multi-rail
        submission: Mon May 8 15:59:43 UTC 2017
        report_version: 2
        result_path: lustre-release/lustre/tests/lutf/python/tests/
        SubTests:
          - name: test_01
            status: PASS
            duration: 2
            return_code: 0
            error:
          - name: test_02
            status: PASS
            duration: 2
            return_code: 0
            error:
        duration: 5
        status: PASS
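Since the exact result format is still open, the following is only a sketch of how a consumer such as Maloo or the AT might summarize a result file shaped like the example above. The function name is hypothetical, and the input is assumed to be the dictionary produced by parsing the YAML; the sample data is hand-built, with test_02 deliberately marked as failed.

```python
def summarize(results):
    """Return per-test-set totals from a parsed result file (a dict)."""
    summary = {}
    for test in results.get("Tests") or []:
        subtests = test.get("SubTests") or []
        # Anything other than PASS is reported as a failed subtest.
        failed = [s["name"] for s in subtests if s.get("status") != "PASS"]
        summary[test["name"]] = {
            "status": test.get("status"),
            "total": len(subtests),
            "failed": failed,
        }
    return summary


# Hand-built stand-in for the parsed YAML; test_02 is made to fail here.
sample = {
    "Tests": [
        {"name": "dlc", "status": "FAIL", "SubTests": [
            {"name": "test_01", "status": "PASS"},
            {"name": "test_02", "status": "FAIL"},
        ]},
    ],
}
report = summarize(sample)
```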
LUTF Master API
There are two ways to start the LUTF Master.
- In interactive mode
- This is useful for interactive testing
- In Daemon mode
- This is useful for automatic testing
In either of these modes the Master instance can process the following requests:
- Query the status of the LUTF master and its agents
- Run tests
- Collect results
A C API will be provided, wrapped with SWIG so that it can be called from python. The API will send messages to the identified LUTF Master instance to perform the above tasks, and then wait indefinitely until the request completes.
Query Status
- Send a QUERY message to the LUTF Master
- LUTF Master will look up all the agents currently connected.
- LUTF Master will bundle the information and send it back.
- The result is examined against expected values
- The script succeeds or fails.
Run Tests
- Send a RUN_TESTS message to the LUTF Master
- Include a buffer containing the YAML block identifying the tests to run
- LUTF master will run the tests
- For each individual test run a result file is generated
- An overall test run result file will also be generated
- Once the LUTF master finishes running the tests it will ZIP up the results and return a path to the results to the caller.
- The script will then collect the results
Collect Results
- Send a COLLECT_RESULTS message with the test ID to collect
- The LUTF Master zips up the test results and returns them to the caller.
- The script can then collect the results.
Message Structure
    typedef enum {
        EN_MSG_TYPE_HB = 0,
        EN_MSG_TYPE_QUERY_STATUS,
        EN_MSG_TYPE_RUN_TESTS,
        EN_MSG_TYPE_COLLECT_RESULTS,
        EN_MSG_TYPE_YAML_INFO,
        EN_MSG_TYPE_MAX
    } lutf_msg_type_t;

    typedef struct lutf_message_hdr_s {
        lutf_msg_type_t type;
        unsigned int len;
        struct in_addr ip;
        unsigned int version;
    } lutf_message_hdr_t;
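For illustration, a python peer could encode and decode this header with the struct module as sketched below. The wire layout is an assumption, not something this design specifies: the sketch treats the header as four 32-bit fields in network byte order, with the enum sent as an unsigned 32-bit integer. The helper names are hypothetical.

```python
import socket
import struct

# Assumed wire layout: type, len, ip, version as four 32-bit fields,
# network byte order. The real LUTF layout may differ.
HDR = struct.Struct("!II4sI")

# Index in this list matches the enum value in lutf_msg_type_t.
MSG_TYPES = ["HB", "QUERY_STATUS", "RUN_TESTS", "COLLECT_RESULTS", "YAML_INFO"]


def pack_header(msg_type, length, ip, version):
    """Build a header for a payload of `length` bytes."""
    return HDR.pack(MSG_TYPES.index(msg_type), length,
                    socket.inet_aton(ip), version)


def unpack_header(data):
    """Decode the first HDR.size bytes of `data` into a tuple."""
    type_id, length, raw_ip, version = HDR.unpack(data[:HDR.size])
    return MSG_TYPES[type_id], length, socket.inet_ntoa(raw_ip), version
```

A receiver would read HDR.size bytes first, decode them with unpack_header, and then read the number of payload bytes given by the len field.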
YAML Response
For each of the three requests identified above, the LUTF Master will respond with a YAML block. The python script can use the python YAML parser to extract relevant information.
    master_response:
      status: <[Success | Failure]>
      agents:
        - name: <agent name>
          ip: <agent ip address>
        - name: <agent name>
          ip: <agent ip address>
      test_results: <path to zipped results>
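As a sketch of how a calling script might examine a query response against the expected agents, the function below works on the dictionary obtained by parsing the YAML block. The function name is hypothetical, and treating a non-Success status as "all agents missing" is an assumption made for the example.

```python
def missing_agents(response, expected_ips):
    """Return the expected agent IPs that the master did not report."""
    master = response.get("master_response") or {}
    if master.get("status") != "Success":
        # Assumption: a failed query means no agent can be trusted as connected.
        return sorted(expected_ips)
    connected = set(agent["ip"] for agent in master.get("agents") or [])
    return sorted(set(expected_ips) - connected)


# Stand-in for a parsed response in which only one agent is connected.
example_response = {
    "master_response": {
        "status": "Success",
        "agents": [{"name": "agent1", "ip": "10.0.0.2"}],
    }
}
example_missing = missing_agents(example_response, ["10.0.0.2", "10.0.0.3"])
```

An empty list means every expected agent is connected and the tests can proceed.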
Network Interface Discovery
The LUTF test scripts will need to be implemented in a generic way. This means that each test script which requires the use of network interfaces will need to discover the interfaces available to it on the node. If there is a sufficient number of interfaces of the correct type, the test can continue; otherwise the test will be skipped and reported as such in the final result.
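The skip decision could be sketched as follows. The function name is hypothetical, and identifying the interface type by a name prefix (for example "eth" or "ib") is an assumption made only for this example; a real test script may need a more reliable way to classify interfaces.

```python
def select_interfaces(available, needed, prefix):
    """Pick `needed` interfaces whose name starts with `prefix`.

    Returns None when there are too few, which the caller should
    treat as "skip this test".
    """
    matching = sorted(name for name in available if name.startswith(prefix))
    if len(matching) < needed:
        return None
    return matching[:needed]


# Example with a hand-written interface list.
chosen = select_interfaces(["lo", "eth1", "eth0", "ib0"], 2, "eth")
skipped = select_interfaces(["lo", "eth0"], 2, "ib")
```

On a real node the list of names could come from netifaces.interfaces(), since netifaces is already one of the required packages.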
Maloo
- A separate section is to be created in Maloo to display LUTF test results.
- The results from output YAML file passed from AT are displayed in the LUTF results section.
- A test parameter specific to the LUTF is to be defined that allows only the LUTF tests to be run. This avoids running unnecessary tests for changes that only affect LNet.
Improvements
- Currently the LUTF is designed to have the Python infrastructure establish a Telnet connection, which the Master uses to scp the test scripts to an Agent and then execute them. The Telnet approach can be improved upon by using SSH instead.
- A synchronization mechanism can be added to synchronize the different parts of one test script running on different Agents, by providing an API that uses a notification mechanism. The Master node will control this synchronization between the Agent nodes used for running a test script. An example scenario: a test script needs to perform operations on more than one Agent node. When one part of the test script runs to completion on one Agent, it notifies the Master of its status by calling this API. The Master then redirects this event to the main script waiting on it, which triggers the next part (operation) to start executing on another Agent node.
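A minimal, in-process sketch of such a notification API is shown below. It is not part of the current design: the class and method names are hypothetical, and it uses threading.Event within a single process to stand in for what would really be messages exchanged between the Agents and the Master.

```python
import threading


class SyncPoint(object):
    """Hypothetical Master-side registry of named synchronization events."""

    def __init__(self):
        self._lock = threading.Lock()
        self._events = {}

    def _get(self, name):
        # Create the event on first use so notify and wait can race safely.
        with self._lock:
            return self._events.setdefault(name, threading.Event())

    def notify(self, name):
        """Called when one part of a test script has completed."""
        self._get(name).set()

    def wait(self, name, timeout=None):
        """Block until `name` is notified; returns False on timeout."""
        return self._get(name).wait(timeout)
```

In the scenario above, the part running on the first Agent would call notify("part1_done"), and the main script would call wait("part1_done") before starting the next part on another Agent.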
Misc
Some Sample files from Auster
A sample Config file used by Auster | A sample result YAML file from Auster |
---|---|
results.yml |
Another proposal for passing information to the LUTF, in case it cannot be passed via a YAML config file as described above:
    #!/bin/bash

    #Key Exports
    export master_HOST=onyx-15vm1
    export agent1_HOST=onyx-16vm1
    export agent2_HOST=onyx-17vm1
    export agent3_HOST=onyx-18vm1
    export AGENTCOUNT=3
    VERBOSE=true

    # ports for LUTF Telnet connection
    export MASTER_PORT=8494
    export AGENT_PORT=8094

    # script and result paths
    script_DIR=$LUSTRE/tests/lutf/python/test/dlc/
    output_DIR=$LUSTRE/tests/lutf/python/tests/
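If the environment variable approach were used, the python side could read the exported values roughly as sketched below. The function name and the shape of the returned dictionary are hypothetical; the variable names are the exported ones from the script above, and the port defaults are the ones given in the configuration file sections.

```python
import os


def config_from_env(env=None):
    """Collect the LUTF node and port settings from environment variables."""
    env = os.environ if env is None else env
    count = int(env.get("AGENTCOUNT", "0"))
    return {
        "master": env["master_HOST"],
        # Agents are numbered from 1: agent1_HOST, agent2_HOST, ...
        "agents": [env["agent{}_HOST".format(i)] for i in range(1, count + 1)],
        "mport": int(env.get("MASTER_PORT", "8494")),
        "dport": int(env.get("AGENT_PORT", "8094")),
    }


# Example using a plain dict in place of the real environment.
example_env = {
    "master_HOST": "onyx-15vm1",
    "agent1_HOST": "onyx-16vm1",
    "agent2_HOST": "onyx-17vm1",
    "AGENTCOUNT": "2",
}
example_setup = config_from_env(example_env)
```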