...

Currently, scheduler threads are grouped into blocks (struct ksock_sched_info), divided by the number of configured CPTs. Each ksock_sched_info defaults to having cfs_cpt_weight() struct ksock_sched entries, or, if ksnd_nscheds is configured, the lesser of the two. The idea is not to have more threads than cfs_cpt_weight().

There is a 1:1 relationship between a struct ksock_sched and a scheduler thread: each ksock_sched is associated with exactly one thread, so any transmits or receives queued on a ksock_sched will be served by that thread.

When a connection is created, it is associated with a ksock_sched selected from the ksock_sched_info on the CPT derived by lnet_cpt_of_nid(peer_nid).

The connection remains associated with that ksock_sched for the duration of its life.

...

The life span of a TCP connection spans multiple transmits/receives. This means that the same scheduler thread is used for all of these operations; the scheduler is changed only when the connection is torn down. In an lnet_selftest run, and in other filesystem tests, one scheduler thread takes up all the CPU resources, causing a severe drop in performance.

...

[Gliffy Diagram: socklnd_sched_new]

Implementation Details

Locking

Locking was previously per thread; now a single lock covers all the threads in the scheduler. The impact of that needs to be worked out.

Sending/Receiving Messages

Again, each thread previously had its own buffers to hold the data being received or sent. These can't simply be shared across all the threads, as the threads could then overwrite each other's data. Some investigation is needed in this area.