...
This High Level Design Document describes the current LUTF design, the code base, the infrastructure requirements for its setup, and the new features that can be added on top of the current design.
Reference Documents
Document Link |
---|
LNet Unit Test Infrastructure (LUTF) Requirements |
...
lutf
- the LUTF binary.
liblutf_agent.so
- shared library to communicate with the LUTF backend.
clutf_agent.py and _clutf_agent.so
- glue code that allows python to call functions in liblutf_agent.so.
lnetconfig.py and _lnetconfig.so
- glue code to allow python test scripts to utilize the DLC interface.
The build process will check that python 2.7.5 and SWIG 2.0 or higher are installed before building. If these requirements are not met, the LUTF will not be built.
If the LUTF is built, it will be packaged in the lustre-tests rpm and installed in /usr/lib64/lustre/tests/lutf.
...
The LUTF will also require that passwordless ssh is set up for all the nodes which run the LUTF. This task will fall on the admin.
LUTF/AT Integration
LUTF Deployment
The LUTF will provide a deployment script, lutf_deploy.py, which will download and install all the necessary elements defined above.
The LUTF will provide a start script, lutf_launch.py, which will start the LUTF master and agent nodes given the appropriate LUTF YAML configuration files, described later in this document.
AT Integration
A script similar to auster will be provided by the LUTF, lutf_perform_test.py. The purpose of the script is to manage which nodes the LUTF will be deployed on. Only the AT has knowledge of the nodes available; therefore the script will perform the following steps:
- Take as input the following parameters. NOTE: These parameters can be provided as a set of environment variables, or can be placed in a YAML file whose path is then passed to the lutf_perform_test.py script. The second option will be assumed in this HLD.
  - IP address of the node to be used as master
  - IP addresses of the nodes to be used as agents
  - Two YAML configuration files, for the Master and Agent nodes.
  - YAML configuration file describing the tests to run.
- Call the lutf_deploy.py script for each of the nodes provided.
- Call the lutf_launch.py script for each of the nodes provided. It will pass the Master YAML LUTF Configuration file to the master node and the agent configuration file to the agent nodes.
- Query the LUTF master to ensure the expected number of agents are connected.
  - The LUTF Master will provide an API based around paramiko. The API is described below.
- If everything is correct, then continue with the tests, otherwise build a YAML block describing the error.
- Send the test YAML configuration file to the LUTF master and wait.
- Once the tests are completed the LUTF master will return a YAML block describing the test results, described below.
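The steps above can be sketched as follows. This is a hypothetical outline, not the real lutf_perform_test.py: the configuration is shown as an already-parsed dictionary rather than a YAML file, and the deploy/launch commands are returned rather than executed.

```python
def perform_test(cfg):
    """Sketch of the orchestration flow; NOT the real lutf_perform_test.py."""
    nodes = [cfg["master"]] + list(cfg["agents"].values())
    cmds = []
    # Deploy the LUTF on every node provided.
    for node in nodes:
        cmds.append(["lutf_deploy.py", node])
    # Launch, passing the master/agent configuration file as appropriate.
    cmds.append(["lutf_launch.py", cfg["master"], cfg["master_cfg"]])
    for agent in cfg["agents"].values():
        cmds.append(["lutf_launch.py", agent, cfg["agent_cfg"]])
    # The real script would execute each command (e.g. via subprocess.run),
    # then query the master, send the test YAML, and wait for the results.
    return cmds

# Example input, mirroring the Setup YAML configuration file described below.
cfg = {
    "master": "10.0.0.1",
    "agents": {0: "10.0.0.2", 1: "10.0.0.3"},
    "master_cfg": "/tmp/master.yaml",
    "agent_cfg": "/tmp/agent.yaml",
}
```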
LUTF Configuration Files
...
Setup YAML Configuration File
This configuration file is passed to lutf_perform_test.py. It describes the test system so that the LUTF can be deployed correctly.
Code Block |
---|
config:
    type: test-setup
    master: <ip of master>
    agent:
        0: <ip of 1st agent>
        1: <ip of 2nd agent>
        ...
        N: <ip of Nth agent>
    master_cfg: <path to master config file>
    agent_cfg: <path to agent config file>
    test_cfg: <path to test config file> |
Master YAML Configuration File
This configuration file describes the information the master needs in order to start
Code Block |
---|
config:
    type: master
    mport: <OPTIONAL: master port. Default: 8494>
    dport: <master daemon port. Used to communicate with master>
    base_path: <OPTIONAL: base path to the LUTF directory. Default: /usr/lib64/lustre/tests>
    extra_py: <OPTIONAL: extra python paths> |
...
Agent YAML Configuration File
This configuration file describes the information the agent needs in order to start
Code Block |
---|
config:
    type: agent
    maddress: <master address>
    mport: <OPTIONAL: master port. Default: 8094>
    dport: <OPTIONAL: agent daemon port>
    base_path: <OPTIONAL: base path to the LUTF directory. Default: /usr/lib64/lustre/tests>
    extra_py: <extra python paths> |
Test YAML Configuration File
This configuration file describes the list of tests to run
Code Block |
---|
config:
    type: tests
    ID: <test id>
    tests:
        0: <test set name>
        1: <test set name>
        2: <test set name>
        ....
        N: <test set name>
# "test set name" is the name of the directory under lutf/python/tests
# which includes the tests to be run. For example: dlc, multi-rail, etc |
LUTF Result file
This YAML result file describes the results of the tests that were requested to run (TODO: it's not clear exactly what the result file will look like. What definitely will be needed is the results zip file generated by the LUTF master. This will need to be available from Maloo to be able to understand which tests failed, and why)
Code Block |
---|
TestGroup:
    test_group: review-ldiskfs
    testhost: trevis-13vm5
    submission: Mon May 8 15:54:41 UTC 2017
    user_name: root
    autotest_result_group_id: 5e11dc5b-7dd7-48a1-b4a3-74a333acd912
    test_sequence: 1
    test_index: 10
    session_group_id: cfeff6b3-60fc-438a-88ef-68e65a08694f
    enforcing: true
    triggering_build_number: 45090
    triggering_job_name: lustre-reviews
    total_enforcing_sessions: 5
    code_review:
        type: Gerrit
        url: review.whamcloud.com
        project: fs/lustre-release
        branch: multi-rail
        identifiers:
            - id: 3fbd25eb0fe90e4f34e36bad006c73d756ef8499
    issue_tracker:
        type: Jira
        url: jira.hpdd.intel.com
        identifiers:
            - id: LU-9119
Tests:
    - name: dlc
      description: lutf dlc
      submission: Mon May 8 15:54:43 UTC 2017
      report_version: 2
      result_path: lustre-release/lustre/tests/lutf/python/tests/
      SubTests:
          - name: test_01
            status: PASS
            duration: 2
            return_code: 0
            error:
          - name: test_02
            status: PASS
            duration: 2
            return_code: 0
            error:
      duration: 5
      status: PASS
    - name: multi-rail
      description: lutf multi-rail
      submission: Mon May 8 15:59:43 UTC 2017
      report_version: 2
      result_path: lustre-release/lustre/tests/lutf/python/tests/
      SubTests:
          - name: test_01
            status: PASS
            duration: 2
            return_code: 0
            error:
          - name: test_02
            status: PASS
            duration: 2
            return_code: 0
            error:
      duration: 5
      status: PASS |
...
LUTF Master API
There are two ways to start the LUTF Master.
- In interactive mode
- This is useful for interactive testing
- In Daemon mode
- This is useful for automatic testing
In either of these modes the Master instance can process the following requests:
- Query the status of the LUTF master and its agents
- Run tests
- Collect results
A C API, SWIG-wrapped so that it can be called from python, will be provided. The API will send messages to the identified LUTF Master instance to perform the above tasks, and then wait indefinitely until the request completes.
Query Status
- Send a QUERY message to the LUTF Master
- LUTF Master will look up all the agents currently connected.
- LUTF Master will bundle the information and send it back.
- The result is examined against expected values
- The script succeeds or fails.
Run Tests
- Send a RUN_TESTS message to the LUTF Master
- Include a buffer containing the YAML block identifying the tests to run
- LUTF master will run the tests
- For each individual test run a result file is generated
- An overall test run result file will also be generated
- Once the LUTF master finishes running the tests it will ZIP up the results and return a path to the results to the caller.
- The script will then collect the results
Collect Results
- Send a COLLECT_RESULTS with the test ID to collect
- The LUTF Master ZIPs up the test results and returns the path back to the caller.
- The script can then collect the results.
Message Structure
Code Block |
---|
typedef enum {
	EN_MSG_TYPE_HB = 0,          /* agent heart beat */
	EN_MSG_TYPE_QUERY_STATUS,    /* query the status of the master and its agents */
	EN_MSG_TYPE_RUN_TESTS,       /* run the tests identified in a YAML block */
	EN_MSG_TYPE_COLLECT_RESULTS, /* collect the results for a test ID */
	EN_MSG_TYPE_YAML_INFO,       /* YAML information block */
	EN_MSG_TYPE_MAX
} lutf_msg_type_t;
typedef struct lutf_message_hdr_s {
	lutf_msg_type_t type;  /* message type */
	unsigned int len;      /* message length */
	struct in_addr ip;     /* IP address of the sender */
	unsigned int version;  /* protocol version */
} lutf_message_hdr_t; |
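For illustration, a header like this could be serialized and parsed from python with the struct module. The big-endian layout and 16-byte size below are assumptions made for this sketch; the actual wire format is not dictated by the C definition above, whose in-memory layout is compiler-dependent.

```python
import socket
import struct

# Hypothetical wire layout: type, len, ip (4 raw bytes), version, big-endian.
HDR_FMT = "!II4sI"

# Values mirror the lutf_msg_type_t enum above.
EN_MSG_TYPE_HB = 0
EN_MSG_TYPE_QUERY_STATUS = 1

def pack_hdr(msg_type, length, ip_str, version):
    """Serialize a message header to bytes."""
    return struct.pack(HDR_FMT, msg_type, length, socket.inet_aton(ip_str), version)

def unpack_hdr(data):
    """Parse a serialized header back into (type, len, ip, version)."""
    msg_type, length, ip_raw, version = struct.unpack(HDR_FMT, data)
    return msg_type, length, socket.inet_ntoa(ip_raw), version
```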
YAML Response
For each of the three requests identified above, the LUTF Master will respond with a YAML block. The python script can use the python YAML parser to extract relevant information.
Code Block |
---|
master_response:
status: <[Success | Failure]>
agents:
- name: <agent name>
ip: <agent ip address>
- name: <agent name>
ip: <agent ip address>
test_results: <path to zipped results> |
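Once parsed, the YAML response reduces to a dictionary. A minimal sketch of checking it against the expected set of agents might look like this; the helper name and the sample agent names are invented for illustration:

```python
def check_master_response(response, expected_agents):
    """Return True if the master reports success and all expected agents are connected."""
    r = response["master_response"]
    if r["status"] != "Success":
        return False
    connected = {a["name"] for a in r.get("agents", [])}
    return expected_agents <= connected  # every expected agent must be present

# A parsed master_response block, as a python dictionary.
response = {
    "master_response": {
        "status": "Success",
        "agents": [{"name": "agent-0", "ip": "10.0.0.2"},
                   {"name": "agent-1", "ip": "10.0.0.3"}],
        "test_results": "/tmp/results.zip",
    }
}
```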
Network Interface Discovery
The LUTF test scripts will need to be implemented in a generic way, which means that each test script which requires the use of interfaces will need to discover the interfaces available to it on the node. If there is a sufficient number of interfaces of the correct type, the test can continue; otherwise the test will be skipped and reported as such in the final result.
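As a sketch of this discovery step, a stdlib-only python helper could look like the following. The minimum-count parameter and the loopback exclusion are assumptions for this sketch, and filtering by interface type (e.g. tcp vs. o2ib) is omitted:

```python
import socket

def discover_interfaces(min_count, exclude=("lo",)):
    """Return non-loopback interface names if at least min_count exist.

    An empty list means the test should be skipped and reported as such.
    """
    names = [name for _, name in socket.if_nameindex() if name not in exclude]
    return names if len(names) >= min_count else []
```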
...
Maloo
- A separate section is to be created in Maloo to display LUTF test results.
- The results from the output YAML file passed from the AT are displayed in the LUTF results section.
- A test parameter specific to LUTF tests will be defined that allows running only the LUTF tests. This will help avoid running unnecessary tests for changes that only affect LNet.
C Backend
This allows for the setup of TCP connections (TCP sockets) to connect the Master and Agent nodes (lutf.c). The LUTF can be run on a node in either Master mode or Agent mode.
Master mode:
- Spawns a listener thread (lutf_listener_main) to listen for Agent connections (lutf.c).
- Maintains a list of the Agents, checks on the health of the Agents, and associates and disassociates with Agents (liblutf_agent.c).
- Starts up a python interpreter (lutf_python.c).
Agent mode:
- Spawns a heartbeat thread (lutf_heartbeat_main) to send a heartbeat to the master every 2 seconds. The master uses this heartbeat signal to determine the liveness of the agents (lutf.c).
- Starts up a python interpreter (lutf_python.c).
Python
Script execution and result collection
- A telnet connection is established from Master to Agent when we create a Script instance by running lutf_script.Script('local_intf', 'script_path', 'output_dir') (lutf_script.py).
- The scripts from 'script_path' are copied over to the Agent using scp (lutf_agent_ctrl.py).
- The copied scripts are then executed by calling run_script() on the Script instance created (lutf_agent_ctrl.py).
- If an 'output_dir' path is specified then the results of the script execution are copied to the given path by calling push_results(). If no 'output_dir' path is provided then the results are ignored.
Improvements
- Currently the LUTF has the Python infrastructure establish a Telnet connection so that the Master can scp the test scripts to the Agent and then execute them. The Telnet approach can be improved upon by using SSH instead.
- A synchronization mechanism can be added to coordinate the parts of one test script running on different Agents, by providing an API built on a notification mechanism. The Master node will control this synchronization between the Agent nodes used for running a test script. For example, if a test script requires operations on more than one Agent node, then as one part of the test script runs to completion on one Agent, it notifies the Master of its status through this API; the Master then redirects this event to the main script waiting on it, which triggers the next operation to start executing on another Agent node.
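A minimal sketch of what such a Master-side notification mechanism could look like, using python threading; the class and method names are invented for illustration:

```python
import threading

class SyncPoint:
    """Minimal synchronization point: agents notify, the waiting script proceeds."""

    def __init__(self, expected):
        self.expected = expected  # number of agent notifications to wait for
        self.count = 0
        self.cond = threading.Condition()

    def notify(self, agent_name):
        # Called when one part of the test completes on an agent.
        with self.cond:
            self.count += 1
            self.cond.notify_all()

    def wait(self, timeout=10):
        # The main script blocks until all expected agents have reported.
        with self.cond:
            return self.cond.wait_for(lambda: self.count >= self.expected, timeout)
```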
Batch test
- All similar test scripts (pertaining to one feature, like multi-rail or Dynamic Discovery) are bundled in one auto-test script which, when executed, runs all the test scripts listed in it and then posts the cumulative results.
There is an auto-test script for each bundle of test scripts related to one feature.
The result file for each individual test script is also placed in the lutfTMP directory on the Agent node.
Improvements
- The above design can be changed to have all the test scripts related to a feature placed in a separate directory under LUTF/python/tests/, with a single auto-test script that triggers the execution of all the test scripts under one folder. The name of the folder can be passed as a parameter to this auto-test script.
Misc
Some Sample files from Auster
A sample Config file used by Auster | A sample result YAML file from Auster |
---|---|
results.yml |
Another proposal for passing information to the LUTF, if it cannot be passed via a YAML config file as described above:
Code Block |
---|
#!/bin/bash
#Key Exports
export master_HOST=onyx-15vm1
export agent1_HOST=onyx-16vm1
export agent2_HOST=onyx-17vm1
export agent3_HOST=onyx-18vm1
export AGENTCOUNT=3
VERBOSE=true
# ports for LUTF Telnet connection
export MASTER_PORT=8494
export AGENT_PORT=8094
# script and result paths
script_DIR=$LUSTRE/tests/lutf/python/test/dlc/
output_DIR=$LUSTRE/tests/lutf/python/tests/ |
...