...

This document is made up of the following sections:

  • Use Cases
  • Design Overview
  • Building the LUTF
  • LUTF-Autotest Integration
  • Infrastructure

...

Use Case / Description
Single node configuration
  • Exercise the liblnetconfig  API directly to configure LNet
  • Exercise the lnetctl  utility to configure LNet

All tests are run on one node.

Multi-node/no File system testing
  • Configure one or more nodes
  • Run lnet_selftest
  • Ensure traffic conforms to configuration
  • Repeat the above

These tests require node synchronization. For example, if a script is configuring node A, node B cannot start traffic until node A has finished its configuration.

Multi-node/File system testing
  • Start file system traffic
  • Perform some configuration changes which would change LNet behaviour
  • Ensure that configuration changes are honoured

These tests require node synchronization.

Error Injection testing
  • With the file system either mounted or not mounted
  • Inject various types of errors on different nodes in the setup
  • Monitor statistics to determine how LNet is handling faults

These tests require node synchronization.
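The node-synchronization requirement called out in the use cases above can be illustrated with a local analogy, using threads as stand-ins for nodes: "node B" must not start traffic until "node A" signals that its configuration is finished. The real LUTF would coordinate this between nodes over its RPC mechanism; the names here are purely illustrative.

```python
import threading

# "node B" must wait for "node A" to finish configuring before starting traffic.
config_done = threading.Event()
order = []

def node_a_configure():
    order.append("A: configure LNet")
    config_done.set()  # signal that configuration is complete

def node_b_traffic():
    config_done.wait()  # block until node A finishes configuring
    order.append("B: start traffic")

b = threading.Thread(target=node_b_traffic)
a = threading.Thread(target=node_a_configure)
b.start()
a.start()
a.join()
b.join()
print(order)
```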

LNet Operations

...

  1. LNet Configuration steps
    1. Via API directly. LUTF will provide a C/Python API to call the liblnetconfig API
    2. Via lnetctl utility. LUTF will provide a simple wrapper class to call lnetctl.
  2. Provisioning/Unprovisioning a File System
    1. LUTF will provide an API to provision a file system.
    2. LUTF will provide an API to clean the cluster and get it in a state ready for the next test
      1. LUTF will do this automatically before running a test. It'll ensure that the cluster has no FS mounted and no lustre modules loaded. This way a test starts from a clean slate.
      2. LUTF will provide a way to override this feature
  3. Verification
    1. This will be the responsibility of each test
  4. Running traffic using selftest
    1. LUTF will provide a wrapper class to run selftest, so that the test writer doesn't need to know about selftest specific scripts.
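A minimal sketch of what the lnetctl wrapper mentioned in step 1.2 could look like, assuming a subprocess-based design. The class only builds argv lists (and can optionally run them); the `lnetctl net add --net <net> --if <intf>` spelling follows common lnetctl usage but should be checked against the installed utility, and the class name is an assumption, not the actual LUTF API.

```python
import subprocess

class LnetctlWrapper:
    """Hypothetical wrapper that builds and runs lnetctl command lines."""
    def __init__(self, binary="lnetctl"):
        self.binary = binary

    def build_cmd(self, subcommand, options):
        # e.g. build_cmd("net add", {"net": "tcp", "if": "eth0"})
        cmd = [self.binary] + subcommand.split()
        for key, value in options.items():
            cmd += ["--" + key, str(value)]
        return cmd

    def run(self, subcommand, options):
        # execute the built command and return (rc, stdout)
        proc = subprocess.run(self.build_cmd(subcommand, options),
                              capture_output=True, text=True)
        return proc.returncode, proc.stdout

ctl = LnetctlWrapper()
cmd = ctl.build_cmd("net add", {"net": "tcp", "if": "eth0"})
print(cmd)
```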

...

  1. Provide a python interface to run scripts
    1. Automatically figure out all the suites
    2. Automatically figure out all the tests in each suite
    3. Provide a method to run a script.
      1. Code Block
        # Manually running the lutf
        # lutf.sh is a wrapper script to run the lutf. It can be called manually or through Auster.
        # Takes the following parameters
        #   -c config: configuration file with all the environment variables in the same
        #              format as what Auster takes.
        #              If -c is not provided, it'll assume environment variables are already set.
        #   -s: run in shell mode (IE access to python shell)
        #       if not provided then run in daemon mode.
        # lutf.sh will have the default communication ports hard coded in the script and will start the agents and the master
        #   >> pdsh -w <hostname> <lutf agent start command>
        #   >> <lutf bin> <parameters>
        >> ./lutf.sh
        
        # when you enter the LUTF python interface, it'll have the lutf library already imported
        
        # get the environment for the test
        lutf.get_environment()
        
        # get connected agents
        lutf.get_agents()
        
        #print all available suites
        lutf.suites
        
        #print all available scripts in the suite
        lutf.suites['suite name'].scripts
        
        # reload the suites and the scripts if they have changed
        lutf.suites.reload()
        
        # run a script
        lutf.suites['suite name'].scripts['script name'].run()
        
        # reload a script after making changes
        lutf.suites['suite name'].scripts['script name'].reload()
  2. Provision LNet configuration
  3. Provision FS configuration
  4. When running a test script it always makes sure it cleans the cluster
  5. Grab common logs
    1. lctl dk
    2. syslog
    3. crash log
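The "grab common logs" step above can be sketched as a helper that builds the remote collection commands. This only constructs the pdsh command strings; the pdsh path, node names and log locations are illustrative assumptions, and nothing is executed here. `lctl dk` dumps the Lustre kernel debug log.

```python
# Build (but do not run) the commands that would collect each log source.
def build_log_commands(pdsh, nodes, logdir="/tmp/lutf_logs"):
    targets = ",".join(nodes)
    return [
        f"{pdsh} -w {targets} 'lctl dk > {logdir}/lustre_debug.log'",  # lctl dk
        f"{pdsh} -w {targets} 'cp /var/log/messages {logdir}/syslog'",  # syslog
        f"{pdsh} -w {targets} 'cp -r /var/crash {logdir}/crash'",       # crash logs
    ]

cmds = build_log_commands("pdsh", ["node1", "node2"])
print(cmds[0])
```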

...

Gliffy Diagram: LUTF Data Flow

LUTF

...


LUTF Threading Overview

Gliffy Diagram: Threading Overview

Thread Description

  • Listener: Listens for connections from LUTF Agents and for heartbeats to monitor the aliveness of the Agents.
  • Heartbeat: Sends a periodic heartbeat to the LUTF Master to inform it that the agent is still alive.
  • Python Interpreter: Executes python test scripts which can call into one of the C/Python APIs provided.
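The Heartbeat thread described above can be sketched as a small periodic-sender class. In the real LUTF the callback would send the heartbeat message to the Master over the agent's connection; here it is just a function parameter, and the class name is illustrative.

```python
import threading
import time

class HeartBeat:
    """Daemon thread that periodically invokes a send callback until stopped."""
    def __init__(self, send, interval=1.0):
        self.send = send
        self.interval = interval
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._loop, daemon=True)

    def _loop(self):
        # wait() doubles as the sleep; it returns True once stop() is called
        while not self._stop.wait(self.interval):
            self.send()

    def start(self):
        self._thread.start()

    def stop(self):
        self._stop.set()
        self._thread.join()

beats = []
hb = HeartBeat(lambda: beats.append("hb"), interval=0.01)
hb.start()
time.sleep(0.1)
hb.stop()
print(len(beats) >= 1)
```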

C/Python APIs

C/Python Management API

  1. Parse configuration
  2. Provide status on the LUTF Agents
  3. Provide status on executing scripts
  4. Store results

C/Python Synchronization APIs

  1. Execute tests on the LUTF
    1. This will result in a YAML rpc block being sent to the LUTF agent
  2. Wait for work completion events from LUTF Agents
  3. Register for asynchronous events
    1. Asynchronous events come in the form of YAML blocks.

C/Python liblnetconfig APIs

  1. These are the configuration APIs in lnet/utils/lnetconfig/liblnetconfig.h

Other APIs can be wrapped in SWIG and exposed for the LUTF python test scripts to call



LUTF Block view

Gliffy Diagram: LUTF Layers

  • C/Python APIs are described above
  • Python LUTF test execution APIs: These are a set of classes which allow the abstraction of the execution of python methods on remote nodes
  • Python Cluster Management APIs: These are a set of classes which allow the scripts to manage the cluster: provision it, deploy an LNet configuration, deploy a FS configuration, collect logs, etc.
  • Python LUTF test Management APIs: These are a set of classes which allow the user to query and execute the LUTF suites and scripts available.
  • lutf.sh: A wrapper script which is responsible for starting the LUTF instances on the provisioned cluster.

LUTF Deployment

The LUTF will provide a dependency script, lutf_dep.py, which will download and install all the necessary elements defined above.

The LUTF will integrate with auster. LUTF should just run like any other Lustre test. A bash wrapper script will be created to execute the LUTF, lutf.sh .

SIDE NOTE: Since the LUTF simply runs python scripts, it can run any test, including Lustre tests.

Auster

auster configuration scripts set up the environment variables required for the tests to run. These environment variables include:

  1. The nodes involved in the tests
  2. The devices to use for storage
  3. The clients
  4. The PDSH command to use

It also sets a host of specific Lustre environment variables.

It then executes the test scripts, e.g. sanity.sh

sanity.sh can then run scripts utilizing the information provided in the environment variables.

LUTF and Auster

The LUTF will build on the existing test infrastructure.

An lutf.sh script will be created, which will be executed from auster.

auster will continue to set up the environment variables it does as of the time of this writing. lutf.sh will then run the LUTF. Since the LUTF is run within the auster context, the python test scripts will have access to these environment variables and can use them the same way the bash test scripts do. If LUTF python scripts are executed on a remote node, the necessary information from the environment variables is delivered to these scripts.

Auster will run the LUTF as follows

Code Block
./auster -f lutfcfg -rsv -d /opt/results/ lutf [--suite <test suite name>] [--only <test case name>]
example:
./auster -f lutfcfg -rsv -d /opt/results/ lutf --suite samples --only sample_02


Test Prerequisites

Before each test the lutf.sh will provide functions to perform the following checks:

  1. If the master hasn't started, start it.
  2. If the agents on the nodes specified haven't started, then start them.
  3. Verify the system is ready to start. IE: master and agents are all started.

Test Post-requisites

  1. Provide test results in YAML format.

It's the responsibility of the test scripts to ensure that the system is in an expected state; ie: file system unmounted, modules unloaded, etc.

LUTF Test Scripts Design Overview

  • The test scripts will be deployed on all nodes under test as well as the test master.
  • Each test script will need to provide a run function
    • This function is intended to be executed by the test master
  • The LUTF will provide a method to do remote procedure calls.
  • Each test, which can be composed of arbitrary python code, must return a YAML text block to the test master reporting the results of the operation.
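The bullets above imply a minimal test-script shape: a module-level run function, executed by the test master, that returns its results as a plain dictionary which the framework can then encode as a YAML block. The key names ("name", "status", ...) below are illustrative assumptions, not a fixed LUTF schema.

```python
# Minimal sketch of a LUTF test script's entry point (key names assumed).
def run(environment, results):
    outcome = {"name": "sample_01",
               "status": "PASS",
               "details": "configured LNet and verified the settings"}
    results.append(outcome)  # append to the global results
    return outcome

results = []
run({"MDS": "node1"}, results)
print(results[0]["status"])  # PASS
```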

LUTF Communication Protocol

The Master and the Agent need to exchange information on which scripts to execute and the results of those scripts. Luckily, YAML provides an easy way to transport information. The Python YAML parser converts YAML blocks into dictionaries, which are in turn easy to handle in Python code. Therefore YAML is a good way to define Remote Procedure Calls. It is understood that there are other libraries which implement RPCs; however, the intent is to keep the LUTF as simple and easily debuggable as possible. The protocol is designed to allow master-agent, agent-agent or master-master communication; for the first phase of the implementation only master-agent communication will be implemented.

To execute a function call on a remote node, the following RPC YAML block is sent:

Code Block
rpc:
   dst: agent_id # name of the agent to execute the function on
   src: source_name # name of the originator of the rpc
   type: function_call # Type of the RPC
   script: script_path # Path to the script which includes the function to execute
   fname: function_name # Name of the function to execute
   parameters: # Parameters to pass to the function
      param0: value # parameters can be string, integer, float or list
      param1: value2
      paramN: valueN
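Once the YAML text of such a function_call block has been parsed into a dictionary, the agent-side handling amounts to locating the class, instantiating it, and invoking the named method with the supplied parameters. The real LUTF loads the class from the script path carried in the RPC; the in-memory REGISTRY and the "class" key below are illustrative stand-ins for that lookup.

```python
# Agent-side dispatch sketch for a parsed function_call RPC dictionary.
class SampleTest:
    def add(self, param0, param1):
        return param0 + param1

REGISTRY = {"SampleTest": SampleTest}  # stand-in for loading rpc["script"]

def dispatch(rpc):
    call = rpc["rpc"]
    assert call["type"] == "function_call"
    cls = REGISTRY[call["class"]]           # locate the class
    method = getattr(cls(), call["fname"])  # resolve the method by name
    return method(**call["parameters"])     # call with the YAML parameters

rpc = {"rpc": {"dst": "agent-1", "src": "master", "type": "function_call",
               "class": "SampleTest", "fname": "add",
               "parameters": {"param0": 2, "param1": 3}}}
result = dispatch(rpc)
print(result)  # 5
```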

To return the results of the script execution

Code Block
rpc:
   dst: master_id # name of the master which originated the function call
   src: agent_id # name of the agent which executed the function
   type: results # Type of the RPC
   results:
      script: script_path # Path to the script which was executed
      return_code: python_object # return code of the function, which is a python object
A python class will wrap the RPC protocol, such that the scripts do not need to form the RPC YAML block manually.

Code Block
####### Part of the LUTF infrastructure ########
# The BaseTest class is provided by the LUTF infrastructure.
# The rpc machinery of the BaseTest class will take the parameters,
# serialize them into a YAML block and send it to the target specified.
class BaseTest(lutfrpc):
   def __init__(self, target=None):
      self.remote = False
      if target:
         self.remote = True
         self.target = target

   def __getattribute__(self, name):
      attr = object.__getattribute__(self, name)
      if hasattr(attr, '__call__'):
         def newfunc(*args, **kwargs):
            if self.remote:
               # execute on the remote defined by:
               #    self.target
               #    attr.__name__ = name of the function
               #    type(self).__name__ = name of the class
               result = lutfrpc.send_rpc(self.target, attr.__name__,
                                         type(self).__name__, *args, **kwargs)
            else:
               result = attr(*args, **kwargs)
            return result
         return newfunc
      else:
         return attr

###### In the test script ######
# Each test case will inherit from the BaseTest class.
class Test_1a(BaseTest):
   def __init__(self, target):
      # call the base constructor
      super(Test_1a, self).__init__(target)
   def methodA(self, parameters):
      # do some test logic
      pass
   def methodB(self, parameters):
      # do some more test logic
      pass

# The run function will be executed by the LUTF master.
# It will instantiate the Test or the step of the test to run,
# then call the class' methods, providing them with a dictionary
# of parameters.
def run(dictionary, results):
   target = lutf.get_target('mds')
   # do some logic
   test_1a = Test_1a(target)
   result = test_1a.methodA(params)
   if result:  # test for result success
       result2 = test_1a.methodB(more_params)
   # append the results_yaml to the global results
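The __getattribute__ interception technique used by the BaseTest sketch can be exercised standalone: every method access is wrapped so the call can be redirected. The fake_send_rpc stub below just records what would have been sent over the wire instead of performing a real RPC; all names here are illustrative, not the LUTF API.

```python
# Runnable illustration of method interception via __getattribute__.
sent = []

def fake_send_rpc(target, fname, cname, *args, **kwargs):
    # stand-in for the real RPC send: record the call, return a canned result
    sent.append((target, cname, fname, args))
    return "remote-result"

class Proxy:
    def __init__(self, target=None):
        self.remote = target is not None
        self.target = target

    def __getattribute__(self, name):
        attr = object.__getattribute__(self, name)
        # wrap only ordinary methods, not data attributes or dunders
        if callable(attr) and not name.startswith("__"):
            def newfunc(*args, **kwargs):
                if self.remote:
                    return fake_send_rpc(self.target, name,
                                         type(self).__name__, *args, **kwargs)
                return attr(*args, **kwargs)
            return newfunc
        return attr

class MyTest(Proxy):
    def ping(self, value):
        return "local:%d" % value

local = MyTest()
remote = MyTest(target="mds")
print(local.ping(1))   # local:1
print(remote.ping(2))  # remote-result
```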

To simplify matters, test parameters take only a dictionary as input. The dictionary can include arbitrary data, which can eventually be encoded in YAML.

Communication Infrastructure

Gliffy Diagram: CallFlow

The LUTF-provided RPC communication relies on a simple socket implementation.

  1. The LUTF Python RPC call will package the following into a YAML block:
    1. absolute file path
    2. class name
    3. function name
    4. arguments passed to the function
  2. The LUTF Python RPC call will call into an LUTF-provided C API to send the RPC text block to the target specified and block for the response
  3. The LUTF slave listener will receive the RPC YAML text block and pass it up to the python layer
  4. The python layer will parse the RPC YAML text block into a python dictionary, instantiate the class specified and call the method
  5. It'll take the return values from the executed method, pack them up in an RPC YAML block and call the same C API to send the YAML block back to the waiting master.
  6. The master will receive the RPC YAML text block and pass it up to the python RPC layer
  7. The python RPC layer will decode the YAML text block into a python dictionary and return the results

This mechanism will also allow the test class methods to be executed locally, by not providing a target.

The LUTF can read all the environment variables provided and encode them into the YAML being sent to the node under test. This way the node under test has all the information it needs to execute.
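The send/block-for-response steps above need a framing convention on the socket so the receiver knows where one RPC text block ends. A common minimal choice, shown here as an assumption rather than what the LUTF C layer necessarily does, is a 4-byte length prefix followed by the payload:

```python
import socket
import struct

def recv_exact(sock, n):
    # read exactly n bytes, looping over short reads
    data = b""
    while len(data) < n:
        chunk = sock.recv(n - len(data))
        if not chunk:
            raise ConnectionError("peer closed connection")
        data += chunk
    return data

def send_block(sock, text):
    # 4-byte big-endian length prefix followed by the UTF-8 payload
    payload = text.encode("utf-8")
    sock.sendall(struct.pack("!I", len(payload)) + payload)

def recv_block(sock):
    (length,) = struct.unpack("!I", recv_exact(sock, 4))
    return recv_exact(sock, length).decode("utf-8")

# demo over a local socket pair
a, b = socket.socketpair()
send_block(a, "rpc:\n   type: function_call\n")
received = recv_block(b)
a.close()
b.close()
print(received)
```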

Test Environment Set-Up

Each node which will run the LUTF will need to have the following installed

  1. ncurses library
    1. yum install ncurses-devel
  2. readline library
    1. yum install readline-devel
  3. rlwrap: Used when telnetting into the LUTF telnet server. Allows using up/down arrows and other readline features
    1. yum install rlwrap 
  4. python 3.6+
    1. yum install python3
  5. paramiko
    1. pip3 install paramiko 
  6. netifaces
    1. pip3 install netifaces 
  7. Install PyYAML
    1. pip3 install pyyaml 

The LUTF will also require that passwordless ssh is setup for all the nodes which run the LUTF. This task is already done when the AT sets up the test cluster.

Building the LUTF

The LUTF shall be integrated with the Lustre tests under lustre/tests/lutf. The LUTF will be built and packaged with the standard build steps:

Code Block
sh ./autogen.sh
./configure --with-linux=<kernel path>
make
# optionally
make rpms
# optionally
make install

The make system will build the following items:

  1. lutf binary
  2. liblutf_agent.so - shared library to communicate with the LUTF backend.
  3. clutf_agent.py and _clutf_agent.so: glue code that allows python to call functions in liblutf_agent.so
  4. clutf_global.py  and _clutf_global.so : glue code that allows python to call functions in liblutf_global.so
  5. lnetconfig.py and _lnetconfig.so  - glue code to allow python test scripts to utilize the DLC interface.

The build process will check that python 3.6 and SWIG 3.0 or higher are installed before building. If these requirements are not met, the LUTF will not be built.

If the LUTF is built it will be packaged in the lustre-tests rpm and installed in /usr/lib64/lustre/tests/lutf.

Tasks

Task / Description
C infrastructure
  • lutf binary
  • listener thread
  • Heart beat
  • python integration
    • Look into having a choice between python 3.x and python 2.7.x
  • IPC
    • Manage connections between the master and the agents
    • Track the agents
    • Provide APIs for Request/Response Pair
      • These APIs will block in the calling thread until a response is received
      • TODO: What happens if we're calling these APIs from separate Python threads?
        • What I'm trying to get at is to see how a script can spawn python threads. These threads can do RPC. While the main test thread can continue doing other test logic.
  • API for managing and querying the state kept by the C infrastructure
    • agent information
SWIG
  • SWIG infrastructure to call C APIs
    • liblnetconfig
    • LUTF Agent Management
    • LUTF RPC
lutf.sh
  • Spawn the master and agents appropriately
  • Pass to the master the suite or specific test to run. If nothing is provided all suites are run.
  • Waits on the master until it exits after running the tests
lutf Python Library
  • Association between Agents and node roles (MGS/MDS/etc)
    • IE build a view of the cluster as identified by the provided environment variables.
  • API for querying the Agents
  • Automatically loaded and initialized
  • API for suites and scripts management and execution
  • Use the lutf Provisioning Library to clean the cluster before running each test.
lutf Provisioning Library
  • API to provision LNet and lnet_selftest
  • API to provision the Lustre File System
    • API should take a dictionary of the different nodes and based on the node types it spawns a simple File system
  • Both APIs can be used together.
    • use the LNet provisioning API to provision and configure LNet
    • use the Lustre FS provisioning API to provision the File system on top of the configured LNet 
  • API to un-provision a cluster described in a python dictionary
lutf logging infrastructure
  • Set lustre logging levels
  • Collect lustre logs
  • collect syslogs
  • Provide debugging level infrastructure for the test scripts (probably just use the provided Python logging)
  • API for storing YAML results.


OLD INFORMATION

TODO: Below is old information still being cleaned up

Test Environment Set-Up

Each node which will run the LUTF will need to have the following installed

  1. ncurses library
    1. yum install ncurses-devel
  2. readline library
    1. yum install readline-devel
  3. python 2.7.5
    1. https://www.python.org/download/releases/2.7.5/
    2. ./configure --prefix=<> --enable-shared # it is recommended to install in standard system path
    3. make; make install
  4. setuptools
    1. https://pypi.python.org/pypi/setuptools
    2. The way it worked for me:
      1. Download package and untar
      2. python2.7 setup.py install
  5. psutils
    1. https://pypi.python.org/pypi?:action=display&name=psutil
      1. untar
      2. cd to untared directory
      3. python2.7 setup.py install
  6. netifaces
    1. https://pypi.python.org/pypi/netifaces
  7. Install PyYAML
    1. pip install pyyaml

The LUTF will also require that passwordless ssh is setup for all the nodes which run the LUTF. This task is already done when the AT sets up the test cluster.


Building the LUTF

The LUTF shall be integrated with the Lustre tests under lustre/tests/lutf. The LUTF will be built and packaged with the standard

Code Block
sh ./autogen.sh
./configure --with-linux=<kernel path>
make
# optionally
make rpms
# optionally
make install

The make system will build the following items:

  1. lutf binary
  2. liblutf_agent.so - shared library to communicate with the LUTF backend.
  3. clutf_agent.py and _clutf_agent.so: glue code that allows python to call functions in liblutf_agent.so
  4. lnetconfig.py and _lnetconfig.so - glue code to allow python test scripts to utilize the DLC interface.

The build process will check if python 2.7.5 and SWIG 2.0 or higher is installed before building. If these requirements are not met the LUTF will not be built

If the LUTF is built it will be packaged in the lustre-tests rpm and installed in /usr/lib64/lustre/tests/lutf.


LUTF Configuration Files

Setup YAML Configuration File

...