...
Currently scheduler threads are grouped into blocks (struct ksock_sched_info), divided by the number of configured CPTs, i.e. one block per CPT. Each ksock_sched_info defaults to cfs_cpt_weight() threads (struct ksock_sched), or, if ksnd_nscheds is configured, the lesser of the two. The idea is never to run more threads than cfs_cpt_weight() reports for the CPT.
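As a minimal sketch, assuming a tunable named ksnd_nscheds and the libcfs cfs_cpt_weight() helper, the per-CPT thread count could be computed like this (the clamping in the real ksocklnd code may differ):

```c
#include <linux/kernel.h>	/* min() */

/* Sketch: number of scheduler threads to start for one CPT.
 * Defaults to one thread per CPU in the CPT; a configured
 * ksnd_nscheds only ever lowers that number. */
static int
sched_nthreads(struct cfs_cpt_table *cptab, int cpt, int ksnd_nscheds)
{
	int nthrs = cfs_cpt_weight(cptab, cpt);

	if (ksnd_nscheds > 0)
		nthrs = min(nthrs, ksnd_nscheds);

	return nthrs;
}
```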
A struct ksock_sched is associated with exactly one thread: there is a 1:1 relationship between a ksock_sched and a scheduler thread, so any transmits or receives queued on a ksock_sched are served by that thread.
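The per-thread state described above looks roughly like the following sketch; the field names follow ksocklnd conventions but are illustrative rather than a verbatim copy of socklnd.h:

```c
/* Sketch of one scheduler's state; owned by exactly one thread. */
struct ksock_sched {
	spinlock_t		kss_lock;	/* serialises the queues below */
	struct list_head	kss_rx_conns;	/* connections with pending receives */
	struct list_head	kss_tx_conns;	/* connections with pending transmits */
	wait_queue_head_t	kss_waitq;	/* scheduler thread sleeps here */
	int			kss_nconns;	/* connections bound to this scheduler */
};
```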
When a connection is created it is associated with a thread, i.e. a ksock_sched, selected from the ksock_sched_info on the CPT derived by lnet_cpt_of_nid(peer_nid). The connection remains associated with that ksock_sched for the duration of its life.
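A sketch of that selection at connection-creation time is below. The ksocknal_data/ksi_* names and the least-loaded choice are assumptions for illustration; the single-argument lnet_cpt_of_nid() follows the usage above:

```c
/* Sketch: bind a new connection to a scheduler.  The CPT comes from
 * the peer NID; within that CPT's block we take the scheduler with
 * the fewest connections.  The result is kept for the connection's
 * whole life. */
static struct ksock_sched *
choose_sched(lnet_nid_t peer_nid)
{
	int cpt = lnet_cpt_of_nid(peer_nid);
	struct ksock_sched_info *info = ksocknal_data.ksnd_sched_info[cpt];
	struct ksock_sched *sched = &info->ksi_scheds[0];
	int i;

	for (i = 1; i < info->ksi_nthreads; i++)
		if (info->ksi_scheds[i].kss_nconns < sched->kss_nconns)
			sched = &info->ksi_scheds[i];

	return sched;
}
```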
...
The life span of a TCP connection covers many transmits/receives, and the same scheduler thread is used for all of these operations; the scheduler is changed only when the connection is torn down. As a result, in an lnet_selftest run, and in other filesystem tests, one scheduler thread takes up all the CPU resources, causing a severe drop in performance.
...
Implementation Details
Locking
Locking was previously per thread; now one lock covers all the threads in the scheduler block. I need to work out the impact of that.
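As a sketch, assuming the per-CPT block is struct ksock_sched_info, the change moves the spinlock out of ksock_sched and into the block, so all of the CPT's threads contend on one lock:

```c
/* Sketch: the lock moves up one level.  Every scheduler thread in the
 * CPT now takes ksi_lock to dequeue work, so connections are no longer
 * pinned to a single thread's queues. */
struct ksock_sched_info {
	spinlock_t		ksi_lock;	/* covers all schedulers below */
	struct ksock_sched	*ksi_scheds;	/* per-thread state, no private lock */
	int			ksi_nthreads;
};
```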
Sending/Receiving Messages
Again, each thread previously had its own buffers to stage received or transmitted data. These cannot simply be shared across all the threads, since the threads would then overwrite each other's data. Some investigation is needed in this area.
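One plausible direction, purely an assumption at this point, is to keep the staging buffers strictly per thread even once the work queues become shared, for example:

```c
/* Sketch: private per-thread scratch space for rx/tx staging, so two
 * threads draining a shared queue can never write into the same buffer.
 * The struct name and the LNET_MAX_IOV sizing are illustrative. */
struct sched_thread_buf {
	struct kvec	stb_iov[LNET_MAX_IOV];	/* owned by one thread only */
};
```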