Problem Statement

Currently, scheduler threads are grouped into blocks (ksock_sched_info), one per configured CPT. Each ksock_sched_info defaults to having cfs_cpt_weight() threads; if ksnd_nscheds is configured, the smaller of the two values is used. The idea is never to have more threads than cfs_cpt_weight().
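The thread-count rule above can be sketched as a small helper. This is only an illustration, not the socklnd code: sched_threads_for_cpt() is a hypothetical name, cpt_weight stands in for the result of cfs_cpt_weight() for one CPT, and treating a value of zero or less as "ksnd_nscheds not configured" is an assumption.

```c
#include <assert.h>

/*
 * Hypothetical helper (not part of socklnd): number of scheduler
 * threads to start for one CPT.
 *
 * cpt_weight - stand-in for the value cfs_cpt_weight() returns for the CPT
 * nscheds    - stand-in for the ksnd_nscheds tunable; a value <= 0 is
 *              ASSUMED here to mean "not configured"
 */
static int sched_threads_for_cpt(int cpt_weight, int nscheds)
{
    assert(cpt_weight > 0);

    /* Tunable not set: default to one thread per CPU in the CPT. */
    if (nscheds <= 0)
        return cpt_weight;

    /* Tunable set: take the smaller value, so the thread count
     * never exceeds the CPT weight. */
    return nscheds < cpt_weight ? nscheds : cpt_weight;
}
```

For example, with a CPT weight of 8 and the tunable set to 4 this yields 4 threads, while a tunable of 16 on a CPT of weight 4 is capped at 4.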

When a connection is created, it is associated with a thread selected from the ksock_sched_info on the CPT derived by lnet_cpt_of_nid(peer_nid).

The connection remains associated with that thread for the duration of its life.

The life span of a TCP connection covers multiple transmits, so the same scheduler thread is used for all of them. The scheduler is changed only when the connection is torn down. In an lnet_selftest run, and in other filesystem tests, one scheduler thread takes up all the CPU resources, causing a severe drop in performance.

Solution

Each transmit/receive should be handled by a scheduler thread selected round robin from the scheduler threads available in the ksock_sched_info, rather than by the one thread bound to the connection. This way the work is spread evenly over all the schedulers available on the CPT.
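The round-robin selection described above could look roughly like the following. This is a minimal user-space sketch under stated assumptions, not the proposed socklnd change: struct sched_info_sketch and sched_pick_round_robin() are hypothetical names, the structure only models the thread count of a ksock_sched_info, and a C11 atomic counter is used here in place of whatever locking the real implementation would need.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical, simplified stand-in for a per-CPT ksock_sched_info. */
struct sched_info_sketch {
    unsigned int nthreads; /* scheduler threads available on this CPT */
    atomic_uint  next;     /* ticket counter shared by all callers */
};

/*
 * Pick the scheduler thread index for ONE transmit/receive.
 * Each call takes the next ticket, so consecutive operations are
 * spread over all threads instead of staying on a single one.
 */
static unsigned int sched_pick_round_robin(struct sched_info_sketch *info)
{
    assert(info->nthreads > 0);

    /* atomic_fetch_add returns the value held before the increment. */
    unsigned int ticket = atomic_fetch_add(&info->next, 1u);

    return ticket % info->nthreads;
}
```

With three threads, successive calls return the indices 0, 1, 2, 0, and so on. An atomic counter keeps the selection itself lock-free; it does not address the lock handling changes that follow from decoupling the scheduler from the connection.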

This solution will require infrastructure changes in socklnd, as the scheduler can no longer be associated with a connection; it is instead associated with a single transmit. This impacts the lock handling. The solution is detailed in the next section.

Implementation Details

