The LUTF is meant to cover the following test use cases:

Each use case below is listed with its requirements, followed by a description.

Single node configuration
  • Exercise the liblnetconfig API directly to configure LNet
  • Exercise the lnetctl utility to configure LNet

All tests are run on one node.

Multi-node/no File system testing
  • Configure one or more nodes
  • Run lnet_selftest
  • Ensure traffic conforms to configuration
  • Repeat the above

These tests require node synchronization. For example, if a script is configuring node A, node B cannot start traffic until node A has finished its configuration.

Multi-node/File system testing
  • Start file system traffic
  • Perform some configuration changes which would change LNet behavior
  • Ensure that configuration changes are honored

These tests require node synchronization.

Error Injection testing
  • Either with file system mount or not
  • Inject various types of errors on different nodes on the setup
  • Monitor statistics to determine how LNet is handling faults

These tests require node synchronization.

LUTF Design Overview

The LUTF uses a Master-Agent design to test LNet. The Master and Agent LUTF instances communicate with each other through a telnet Python module, and more than one Agent can communicate with a single Master instance at the same time. The Master instance controls the execution of the Python test scripts that test LNet on the Agent instances. It collects the results of all tests run on the Agents and writes them to a YAML file. It also controls the synchronization mechanism between test scripts running on different Agents.

The diagram below shows how the LUTF interacts with LNet.

[Diagram: LUTF design]

Figure 1: System Level Diagram

LUTF Data Flow

[Diagram: LUTF Data Flow]

LUTF Deployment

The LUTF will provide a dependency script, lutf_dep.py, which will download and install all the necessary elements defined above.

The LUTF will integrate with auster and should run like any other Lustre test. A bash wrapper script, lutf.sh, will be created to execute the LUTF.

SIDE NOTE: Since the LUTF simply runs Python scripts, it can run any test, including Lustre tests.

Auster

auster configuration scripts set up the environment variables required for the tests to run. These environment variables include:

  1. The nodes involved in the tests
  2. The devices to use for storage
  3. The clients
  4. The PDSH command to use

It also sets a host of specific Lustre environment variables.

It then executes the test scripts, e.g. sanity.sh.

sanity.sh can then run scripts utilizing the information provided in the environment variables.

LUTF and Auster

The LUTF will build on the existing test infrastructure.

An lutf.sh script will be created, which will be executed from auster.

auster will continue to set up the environment variables it sets up as of the time of this writing. lutf.sh will run the LUTF. Since the LUTF runs within the auster context, the Python test scripts will have access to these environment variables and can use them the same way the bash test scripts do. If LUTF Python scripts are executed on a remote node, the necessary information from the environment variables is delivered to these scripts.
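As an illustration of how a Python test script might pick up the auster environment, here is a minimal sketch. The variable names `CLIENTS` and `PDSH` are assumptions chosen for the example, and `collect_test_env` is a hypothetical helper, not part of the LUTF.

```python
import os

# Assumed variable names, for illustration only; the real set is whatever
# the auster configuration scripts export.
ASSUMED_VARS = ("CLIENTS", "PDSH")


def collect_test_env(environ=None, names=ASSUMED_VARS):
    """Return (found, missing) for the environment variables a test needs.

    Missing variables are reported instead of being silently defaulted,
    so a script can fail early with a clear message.
    """
    environ = os.environ if environ is None else environ
    found = {name: environ[name] for name in names if name in environ}
    missing = [name for name in names if name not in environ]
    return found, missing


if __name__ == "__main__":
    found, missing = collect_test_env()
    print("found:", found)
    print("missing:", missing)
```

The `found` dictionary is the kind of information that would have to be forwarded to a script running on a remote node, since that script does not inherit the auster environment directly.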

Test Prerequisites

Before each test, lutf.sh will provide functions to perform the following checks:

  1. If the master hasn't started, start it.
  2. If the agents on the specified nodes haven't started, start them.
  3. Verify that the system is ready to start, i.e. the master and all agents are started.

Test Post-requisites

  1. Provide test results in YAML format.

It is the responsibility of the test scripts to ensure that the system is in an expected state, e.g. file system unmounted and modules unloaded.

LUTF Threading Overview

[Diagram: Threading Overview]

Thread Description

  • Listener: Listens for connections and heartbeats from LUTF Agents in order to monitor whether the Agents are alive.
  • HeartBeat: Sends a periodic heartbeat to the LUTF Master to inform it that the Agent is still alive.
  • Python Interpreter: Executes Python test scripts, which can call into any of the C/Python APIs provided.
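The Listener and HeartBeat roles can be sketched as simple bookkeeping on the Master side. This is a sketch under assumptions: the `AgentRegistry` class, its method names, and the timeout policy are hypothetical, and timestamps are passed in explicitly so the logic is easy to follow.

```python
class AgentRegistry:
    """Tracks the last heartbeat time seen from each Agent (Master side)."""

    def __init__(self, timeout):
        # Seconds without a heartbeat before an Agent is considered dead.
        self.timeout = timeout
        self.last_seen = {}

    def record_heartbeat(self, agent_id, now):
        """Called by the listener whenever a heartbeat arrives."""
        self.last_seen[agent_id] = now

    def alive_agents(self, now):
        """Agents whose last heartbeat is within the timeout window."""
        return sorted(
            a for a, t in self.last_seen.items() if now - t <= self.timeout
        )

    def dead_agents(self, now):
        """Agents whose last heartbeat is older than the timeout window."""
        return sorted(
            a for a, t in self.last_seen.items() if now - t > self.timeout
        )
```

In a running system `now` would typically come from a monotonic clock such as `time.monotonic()`, so that wall-clock adjustments do not cause false timeouts.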

C/Python APIs

C/Python Management API

  1. Parse configuration
  2. Provide status on the LUTF Agents
  3. Provide status on executing scripts
  4. Store results

C/Python Synchronization APIs

  1. Assign work to LUTF Agents from LUTF Master
    1. This will result in a YAML RPC block being sent to the LUTF Agent
  2. Wait for work completion events from LUTF Agents
  3. Register for asynchronous events
    1. Asynchronous events come in the form of YAML blocks.
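The assign-then-wait pattern listed above can be sketched with the standard library. In this sketch local threads stand in for remote Agents and a queue carries the completion events; the function names are hypothetical and no network transport is involved.

```python
import queue
import threading


def agent_worker(agent_id, work, completions):
    """Stand-in for an Agent: run the assigned callable, then report back."""
    result = work()
    completions.put({"agent": agent_id, "result": result})


def assign_and_wait(assignments, timeout=5.0):
    """Assign one callable per agent and block until all have completed.

    `assignments` maps an agent id to a zero-argument callable.
    Returns a dict mapping agent id to the callable's result.
    """
    completions = queue.Queue()
    threads = [
        threading.Thread(target=agent_worker, args=(agent_id, work, completions))
        for agent_id, work in assignments.items()
    ]
    for t in threads:
        t.start()

    results = {}
    for _ in assignments:
        # Raises queue.Empty if an agent does not report within the timeout.
        event = completions.get(timeout=timeout)
        results[event["agent"]] = event["result"]
    for t in threads:
        t.join()
    return results
```

Waiting on completion events is what allows a script to hold back traffic on one node until another node has finished its configuration.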

C/Python liblnetconfig APIs

  1. These are the configuration APIs in lnet/utils/lnetconfig/liblnetconfig.h

Other APIs can be wrapped with SWIG and exposed for the LUTF Python test scripts to call.

LUTF Test Scripts Design Overview

  • The test scripts will be deployed on all nodes under test as well as on the test master.
  • Each test script will need to provide a run function.
    • This function is intended to be executed by the test master.
  • The LUTF will provide at least one other function to perform the actual testing.
    • This function will be called remotely and will execute on the test node.
  • Each test, which can be composed of arbitrary Python code, must return a YAML text block to the test master reporting the results of the operation.
  • Every function should take a dictionary as its input parameter and return a dictionary as its result.
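A test script following these rules might look like the sketch below. The worker name `configure_node`, the `nets` parameter, and the `PASS`/`FAIL` status values are assumptions made for illustration. The worker is called locally here so that the example runs on its own, and serializing the result dictionary to YAML is left to the framework.

```python
def configure_node(params):
    """Hypothetical worker function; would execute on the test node."""
    nets = params.get("nets", [])
    # A real test would call into liblnetconfig or lnetctl at this point.
    return {"status": "PASS" if nets else "FAIL", "configured": list(nets)}


def run(params):
    """Entry point, intended to be executed by the test master."""
    # In the LUTF this call would be dispatched to a remote Agent;
    # it is invoked locally here so the sketch is self-contained.
    result = configure_node({"nets": params.get("nets", [])})
    return {"test": "configure_node", "result": result}
```

Keeping every function dictionary-in and dictionary-out means the same data can be passed unchanged between the master and the test node.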

LUTF Communication Protocol


Code Block
rpc:
   target: agent_id
   type: function_call
   fname: function_name
   parameters:
      param0: value
      param1: value2
      param2: 1
      param3: [1, 2, 3]
      param4: 1.4
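
To make the shape of this block concrete, here is a minimal sketch that builds the same nested structure as a Python dictionary and dispatches it on the receiving side. The helper names `build_rpc` and `handle_rpc` and the function table are hypothetical, and YAML encoding and transport are deliberately left out.

```python
def build_rpc(target, fname, parameters):
    """Build the nested structure of the RPC block as a dictionary."""
    return {
        "rpc": {
            "target": target,
            "type": "function_call",
            "fname": fname,
            "parameters": dict(parameters),
        }
    }


def handle_rpc(message, functions):
    """Look up the requested function by name and call it.

    `functions` maps function names to callables that take the
    parameters dictionary and return a result dictionary.
    """
    rpc = message["rpc"]
    if rpc["type"] != "function_call":
        raise ValueError("unsupported rpc type: %s" % rpc["type"])
    func = functions[rpc["fname"]]
    return func(rpc["parameters"])
```

Passing the whole `parameters` mapping to the function matches the rule that test functions take a dictionary as input and return a dictionary.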


Test Environment Set-Up

Each node which will run the LUTF will need to have the following installed

...