...
- transaction_timeout and retry_count should be consistent across all nodes
- discovery: it should be clear why this is on or off on a particular node
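One way to compare these settings between nodes is to print them from the global LNet configuration on each node and diff the results. The sketch below assumes that "lnetctl global show" reports transaction_timeout, retry_count and discovery as YAML keys, which may vary by Lustre version. The helper name show_lnet_globals and the LNETCTL variable are hypothetical and exist only for this example.

```shell
# Hypothetical helper; LNETCTL is an illustrative override so the filter
# can be tried on a machine where LNet is not loaded.
LNETCTL="${LNETCTL:-lnetctl}"

show_lnet_globals() {
    # Print only the three settings discussed above from the global
    # LNet configuration (assumed to be emitted as "key: value" lines).
    "$LNETCTL" global show |
        grep -E '^[[:space:]]*(transaction_timeout|retry_count|discovery):'
}
```

Running show_lnet_globals on every node and comparing the output should make a mismatched timeout or retry count, or an unexpected discovery setting, easy to spot.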
Debug Logs
System logs contain only warning and error messages, so some problems cannot be diagnosed without debug logs.
- When debugging a connection issue, make sure to obtain debug logs from both sides of the connection
- When debugging LNet, enable at least net debug logging: "lctl set_param debug=+net"
- If possible, reproduce on a "quiet" system to reduce the amount of data in the log.
- Make sure that the debug log buffer is big enough, especially on a busy server, and that there is sufficient memory to hold the buffer.
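The points above can be combined into a short capture sequence, sketched here under some assumptions: the debug buffer is sized through the debug_mb parameter, "lctl clear" empties the buffer, and "lctl dk" dumps it to a file. The helper name collect_lnet_debug, the LCTL and DEBUG_MB variables, and the 1024 MB default are hypothetical and should be adapted to the node's available memory.

```shell
# Hypothetical helper; names and defaults are illustrative only.
LCTL="${LCTL:-lctl}"          # lctl binary; overridable for dry runs
DEBUG_MB="${DEBUG_MB:-1024}"  # assumed buffer size in MB, size it to the node

collect_lnet_debug() {
    out="$1"; shift                                     # file that receives the dump
    "$LCTL" set_param debug=+net || return 1            # add net to the debug mask
    "$LCTL" set_param debug_mb="$DEBUG_MB" || return 1  # enlarge the debug buffer
    "$LCTL" clear || return 1                           # drop entries from before the test
    "$@"                                                # run the reproduction command
    "$LCTL" dk "$out"                                   # dump the kernel debug log to $out
}
```

For a connection issue, the same sequence would be run on both sides of the connection, for example as "collect_lnet_debug /tmp/lnet-debug.log <reproduction command>". The sketch leaves the net flag enabled afterwards; "lctl set_param debug=-net" removes it again.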
Scenarios
Connectivity
This section describes the procedure for testing connectivity between nodes running Lustre.
...