...
When a connection is created, it is associated with a thread selected from the ksock_sched_info on the CPT derived by lnet_cpt_of_nid(peer_nid).
The connection remains associated with that thread for the duration of its life.
[Gliffy diagram]
The life span of a TCP connection covers multiple transmits, so the same scheduler thread is used for all of them. The scheduler is changed only when the connection is torn down. In an lnet_selftest run, and in other filesystem tests, one scheduler thread takes up all the CPU resources, causing a severe drop in performance.
Solution
Each transmit/receive should be handled by a separate scheduler thread, selected round-robin from the scheduler threads available in the ksock_sched_info. This way the work is spread evenly over all the schedulers available on the CPT.
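The difference between the two policies can be sketched in user space. This is only an illustration under stated assumptions: `struct sched_info`, `sched_pick_round_robin()` and `simulate()` are hypothetical names, not the socklnd API, and the real ksock_sched_info layout is not reproduced here.

```c
#include <assert.h>
#include <stdatomic.h>

#define MAX_THREADS 8

/* Hypothetical stand-in for the per-CPT scheduler info; this is NOT
 * the real struct ksock_sched_info layout. */
struct sched_info {
        atomic_uint  si_next;               /* round-robin cursor */
        unsigned int si_nthreads;           /* threads on this CPT */
        unsigned int si_load[MAX_THREADS];  /* transmits handled per thread */
};

static void sched_info_init(struct sched_info *si, unsigned int nthreads)
{
        assert(nthreads > 0 && nthreads <= MAX_THREADS);
        atomic_init(&si->si_next, 0);
        si->si_nthreads = nthreads;
        for (unsigned int i = 0; i < MAX_THREADS; i++)
                si->si_load[i] = 0;
}

/* Pick a thread index.  The atomic increment hands every concurrent
 * caller a distinct cursor value, so picks rotate over the threads. */
static unsigned int sched_pick_round_robin(struct sched_info *si)
{
        return atomic_fetch_add(&si->si_next, 1) % si->si_nthreads;
}

/* Simulate ntx transmits on a single connection.
 * pinned != 0: the thread is chosen once, at connection creation,
 *              and reused for every transmit (current behaviour).
 * pinned == 0: a thread is chosen per transmit (proposed behaviour). */
static void simulate(struct sched_info *si, unsigned int ntx, int pinned)
{
        unsigned int conn_thread = sched_pick_round_robin(si);

        for (unsigned int i = 0; i < ntx; i++) {
                unsigned int t = pinned ? conn_thread
                                        : sched_pick_round_robin(si);
                si->si_load[t]++;
        }
}
```

With four threads and 1000 transmits on one connection, the pinned policy puts all 1000 transmits on a single thread, while the per-transmit policy gives each thread 250.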
This solution will require infrastructure changes in socklnd, because the scheduler can no longer be associated with a connection, only with a single transmit. This impacts the lock handling. This solution is detailed in the next section.
...
The fundamental issue with the socklnd scheduler design is that there is a 1:1 relationship between a scheduler thread and a connection. There is no reason to have such a 1:1 relationship, when any scheduler thread bound to the desired CPT can be used. The best case scenario is to let the kernel do the thread scheduling. This is how o2iblnd works.
In o2iblnd the scheduling code is as follows:
```c
/*
 * Allocate and determine the number of threads for each
 * scheduler
 */
        kiblnd_data.kib_scheds = cfs_percpt_alloc(lnet_cpt_table(),
                                                  sizeof(*sched));
        if (kiblnd_data.kib_scheds == NULL)
                goto failed;

        cfs_percpt_for_each(sched, i, kiblnd_data.kib_scheds) {
                int     nthrs;

                spin_lock_init(&sched->ibs_lock);
                INIT_LIST_HEAD(&sched->ibs_conns);
                init_waitqueue_head(&sched->ibs_waitq);

                nthrs = cfs_cpt_weight(lnet_cpt_table(), i);
                if (*kiblnd_tunables.kib_nscheds > 0) {
                        nthrs = min(nthrs, *kiblnd_tunables.kib_nscheds);
                } else {
                        /* max to half of CPUs, another half is reserved for
                         * upper layer modules */
                        nthrs = min(max(IBLND_N_SCHED, nthrs >> 1), nthrs);
                }

                sched->ibs_nthreads_max = nthrs;
                sched->ibs_cpt = i;
        }

/*
 * start schedulers
 */
static int
kiblnd_dev_start_threads(struct kib_dev *dev, int newdev, u32 *cpts, int ncpts)
{
...
        for (i = 0; i < ncpts; i++) {
                struct kib_sched_info *sched;
...
                rc = kiblnd_start_schedulers(kiblnd_data.kib_scheds[cpt]);
...
}

static int
kiblnd_start_schedulers(struct kib_sched_info *sched)
{
...
        for (i = 0; i < nthrs; i++) {
                long    id;
                char    name[20];
                id = KIB_THREAD_ID(sched->ibs_cpt, sched->ibs_nthreads + i);
                snprintf(name, sizeof(name), "kiblnd_sd_%02ld_%02ld",
                         KIB_THREAD_CPT(id), KIB_THREAD_TID(id));
                rc = kiblnd_thread_start(kiblnd_scheduler, (void *)id, name);
                if (rc == 0)
                        continue;

                CERROR("Can't spawn thread %d for scheduler[%d]: %d\n",
                       sched->ibs_cpt, sched->ibs_nthreads + i, rc);
                break;
        }

        sched->ibs_nthreads += i;
}
```
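The thread-count rule in the o2iblnd excerpt can be worked through in isolation. The sketch below mirrors only the arithmetic; `sched_nthreads()` and its parameters are hypothetical names, and the value 2 for `IBLND_N_SCHED` is an assumption made for this example.

```c
#include <assert.h>

#define IBLND_N_SCHED 2   /* assumed value, for illustration only */

static int min_int(int a, int b) { return a < b ? a : b; }
static int max_int(int a, int b) { return a > b ? a : b; }

/* cpt_weight:      number of CPUs in the partition (cfs_cpt_weight()).
 * nscheds_tunable: the nscheds tunable; 0 means "not set". */
static int sched_nthreads(int cpt_weight, int nscheds_tunable)
{
        assert(cpt_weight > 0);

        /* An explicit tunable caps the count at the CPT weight. */
        if (nscheds_tunable > 0)
                return min_int(cpt_weight, nscheds_tunable);

        /* Otherwise use half of the CPUs, but at least IBLND_N_SCHED,
         * and never more than the CPUs actually in the partition. */
        return min_int(max_int(IBLND_N_SCHED, cpt_weight >> 1), cpt_weight);
}
```

Under these assumptions, a 16-CPU partition with no tunable gets 8 threads, the same partition with the tunable set to 4 gets 4, and a single-CPU partition gets 1.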
In the o2iblnd code above, each struct kib_sched_info has multiple threads. When a connection is put on the scheduler list, any of the threads in that scheduler can pick up the connection and work on it.
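The pattern of several threads serving one scheduler list can be sketched in user space with POSIX threads. This is an analogue under stated assumptions, not kernel code: `struct sched`, `sched_enqueue()`, `sched_thread()` and `sched_run()` are hypothetical names, and a mutex plus condition variable stand in for ibs_lock and ibs_waitq.

```c
#include <assert.h>
#include <pthread.h>

#define NCONNS      64
#define MAX_THREADS 16

/* Hypothetical user-space analogue of one scheduler: a lock, a wait
 * queue, and a list of connections that have work pending. */
struct sched {
        pthread_mutex_t lock;
        pthread_cond_t  waitq;
        int             pending[NCONNS];  /* FIFO of connection ids */
        int             head, tail;
        int             shutdown;
        int             handled;          /* connections serviced so far */
};

/* Put a connection on the scheduler list and wake one idle thread. */
static void sched_enqueue(struct sched *s, int conn_id)
{
        pthread_mutex_lock(&s->lock);
        assert(s->tail < NCONNS);
        s->pending[s->tail++] = conn_id;
        pthread_cond_signal(&s->waitq);
        pthread_mutex_unlock(&s->lock);
}

/* Thread body: whichever thread holds the lock next takes the next
 * connection, so no connection is tied to a particular thread. */
static void *sched_thread(void *arg)
{
        struct sched *s = arg;

        pthread_mutex_lock(&s->lock);
        for (;;) {
                while (s->head == s->tail && !s->shutdown)
                        pthread_cond_wait(&s->waitq, &s->lock);
                if (s->head == s->tail)
                        break;            /* shutting down, list drained */
                int conn_id = s->pending[s->head++];
                pthread_mutex_unlock(&s->lock);
                (void)conn_id;            /* real code would do tx/rx here */
                pthread_mutex_lock(&s->lock);
                s->handled++;
        }
        pthread_mutex_unlock(&s->lock);
        return NULL;
}

/* Run nthreads threads over nconns queued connections and return the
 * number of connections that were serviced. */
static int sched_run(int nthreads, int nconns)
{
        struct sched s = { .head = 0, .tail = 0, .shutdown = 0, .handled = 0 };
        pthread_t tid[MAX_THREADS];

        assert(nthreads > 0 && nthreads <= MAX_THREADS && nconns <= NCONNS);
        pthread_mutex_init(&s.lock, NULL);
        pthread_cond_init(&s.waitq, NULL);
        for (int i = 0; i < nthreads; i++)
                pthread_create(&tid[i], NULL, sched_thread, &s);
        for (int i = 0; i < nconns; i++)
                sched_enqueue(&s, i);

        /* Ask the threads to exit once the list is empty. */
        pthread_mutex_lock(&s.lock);
        s.shutdown = 1;
        pthread_cond_broadcast(&s.waitq);
        pthread_mutex_unlock(&s.lock);
        for (int i = 0; i < nthreads; i++)
                pthread_join(tid[i], NULL);

        pthread_cond_destroy(&s.waitq);
        pthread_mutex_destroy(&s.lock);
        return s.handled;
}
```

Calling `sched_run(4, 64)` services all 64 queued connections regardless of which of the four threads picks up each one. On older toolchains the sketch may need to be built with `-pthread`.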
The socklnd needs to use the same mechanism.
This will result in the following structure:
[Gliffy diagram]