Introduction

The Lustre test suites currently have no dedicated tool for testing LNet. The Lustre Unit Test Framework (LUTF) fills that gap by providing a means of testing existing LNet features as well as features added in the future. It also makes it easy to add new test cases and scripts for any new LNet feature.

Objectives

This High Level Design document describes the current LUTF design and code base, the infrastructure required to set it up, and the new features that can be added on top of the current design.

Reference Documents

Document Link
LNet Unit Test Infrastructure (LUTF) Requirements

Document Structure

This document is made up of the following sections:

Design Overview: Describes the existing infrastructure, code base and components of LUTF.

Setup and Usage: Describes how to set up and run LUTF on test nodes.

New Features: Describes the new features to be added to the current design.

LUTF Design Overview

The diagram below shows how the LUTF interacts with LNet.

Figure 1: System Level Diagram

The LUTF uses a Master-Agent design to test LNet. Master and Agent LUTF instances communicate with each other through a Python telnet module, and more than one Agent can communicate with a single Master instance at the same time. The Master instance controls all the Agents connected to it. It controls the execution of the Python test scripts that test LNet on the Agent instances, collects the results of all tests run on the Agents, and writes them to a YAML file. It also controls the synchronization mechanism between test scripts running on different Agents.
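As a rough illustration of the result-collection step described above, the sketch below merges per-Agent test results into one structure and renders it as YAML text. The function names (`merge_results`, `to_yaml`), the result layout, and the Agent and test names are assumptions made for this example and are not taken from the LUTF code base. It also assumes that names and statuses are plain strings that need no YAML quoting.

```python
def merge_results(reports):
    """Combine (agent, test, status) tuples into {agent: {test: status}}."""
    merged = {}
    for agent, test, status in reports:
        merged.setdefault(agent, {})[test] = status
    return merged


def to_yaml(merged):
    """Render the merged results as a two-level YAML mapping.

    Assumes keys and values are simple strings that need no quoting.
    """
    lines = []
    for agent in sorted(merged):
        lines.append(f"{agent}:")
        for test in sorted(merged[agent]):
            lines.append(f"  {test}: {merged[agent][test]}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical reports, as the Master might receive them from two Agents.
    reports = [
        ("agent1", "test_ping", "PASS"),
        ("agent2", "test_ping", "FAIL"),
        ("agent1", "test_route", "PASS"),
    ]
    print(to_yaml(merge_results(reports)), end="")
```

Keeping the merge step separate from the rendering step means the same merged structure could be written with a full YAML library instead of this hand-rolled emitter.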

Building the LUTF

TBD: How is the LUTF built and where is it built

LUTF/AT Integration

TBD: How the LUTF integrates into the AT

Infrastructure

Automatic Deployment

TBD: How does the AT deploy the LUTF, collect results, show results in Maloo

C Backend

TBD: How does the C backend work? How does it glue with Python?

Python

Script execution and result collection

How are scripts deployed from the Master to the Agent?

How are the scripts executed?

How are the results collected?

Batch test

How should we execute a collection of tests? You can discuss how it's currently done, and whether it can be improved.

Components of LUTF

The LUTF is composed of two components:

C back-end infrastructure
This sets up the TCP communication between the Master and Agent nodes (lutf.c).
1. Master mode:
  1. Starts a listener thread to listen for Agent connections (lutf_listener.c).
  2. Maintains a list of the Agents.
  3. Starts up a Python interpreter (lutf_python.c).
  4. Provides a library which is SWIG-wrapped and callable from Python scripts (liblutf_agent.c).
2. Agent mode:
  1. Starts a thread to maintain a heartbeat with the Master. The Master uses the heartbeat to determine whether the Agents are alive (lutf.c).
  2. Starts up a Python interpreter through telnet (lutf_python.c).
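The heartbeat check described for Agent mode can be sketched as follows. The real logic lives in the C back-end (lutf.c); this Python version only illustrates the idea. The class name `AgentRegistry`, its methods, and the 5-second timeout are all assumptions made for the example, not part of the LUTF code.

```python
import time


class AgentRegistry:
    """Tracks the last heartbeat time of each Agent (hypothetical helper)."""

    def __init__(self, timeout=5.0, clock=time.monotonic):
        self.timeout = timeout  # seconds of silence before an Agent counts as dead
        self.clock = clock      # injectable clock, so the logic can be tested without sleeping
        self.last_seen = {}     # Agent name -> time of its last heartbeat

    def heartbeat(self, agent):
        """Record a heartbeat received from an Agent."""
        self.last_seen[agent] = self.clock()

    def is_alive(self, agent):
        """An Agent is alive if it sent a heartbeat within the timeout."""
        seen = self.last_seen.get(agent)
        return seen is not None and self.clock() - seen <= self.timeout

    def alive_agents(self):
        """Return the sorted names of all Agents currently considered alive."""
        return sorted(a for a in self.last_seen if self.is_alive(a))


if __name__ == "__main__":
    registry = AgentRegistry(timeout=5.0)
    registry.heartbeat("agent1")
    print(registry.alive_agents())
```

A monotonic clock is used so that wall-clock adjustments on the Master cannot make a healthy Agent appear dead.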
Python Test infrastructure
1. Infrastructure Level 1:
  A Python master script for this infrastructure facilitates the following:
  1. Deploys the LUTF on all the Agent nodes and the Master node.
  2. Provides a telnet server and client for Master<->Agent communication.
  3. Provides a mechanism to query the IP addresses and network interfaces (NIs) on the Agents. Test scripts can then fetch this information on demand through an API.
  4. Facilitates running individual Python test scripts on the Agents and collecting results.
  5. Facilitates running the auto-test script, which is a test suite of all the test scripts related to one particular feature.
  6. Facilitates synchronization between tests running on different Agents by providing an API that uses a notification mechanism. For example, when a test script runs to completion on an Agent, it notifies the Master of its status by calling this API, and the Master can then redirect this event to any script waiting on it.
2. Infrastructure Level 2:
  The functions used by multiple test scripts are defined in a base test infrastructure file (lnet_test_infra_utils.py), which is then imported into each test script. This eases the process of writing new test scripts and avoids code redundancy.
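The notification-based synchronization listed under Infrastructure Level 1 could look roughly like the sketch below, in which the Master records events reported by Agents and wakes any script waiting on them. The class `EventRouter`, its `notify` and `wait_for` methods, and the event naming are hypothetical. The sketch runs in a single process with threads, whereas the LUTF routes these events between nodes.

```python
import threading


class EventRouter:
    """Master-side routing of test events to waiting scripts (hypothetical)."""

    def __init__(self):
        self._cond = threading.Condition()
        self._events = {}  # event name -> status reported by an Agent

    def notify(self, event, status):
        """Called when an Agent reports that a script reached `event`."""
        with self._cond:
            self._events[event] = status
            self._cond.notify_all()

    def wait_for(self, event, timeout=None):
        """Block until `event` is reported; return its status, or None on timeout."""
        with self._cond:
            reported = self._cond.wait_for(lambda: event in self._events, timeout)
            return self._events[event] if reported else None


if __name__ == "__main__":
    router = EventRouter()
    # Simulate an Agent reporting completion shortly after the waiter starts.
    reporter = threading.Timer(0.1, router.notify, args=("agent1:test_setup", "done"))
    reporter.start()
    print(router.wait_for("agent1:test_setup", timeout=2.0))
    reporter.join()
```

Storing the reported status, rather than only signalling, lets a script that starts waiting after the event has already fired still observe it.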

SWIG & DLC Library

SWIG is used to wrap the DLC library and make it callable from Python. This allows Python test scripts to call DLC APIs directly to test LNet.