The SRCU flavor of RCU uses per-cpu counters to detect that every CPU has
passed through a quiescent state for a particular SRCU lock instance
(srcu_struct
).
There’s are total of 4 counters per-cpu. One pair for locks, and another for
unlocks. You can think of the SRCU instance to be split into 2 parts. The
readers sample srcu_idx
and decided which part to use. Each part corresponds
to one pair of lock and unlock counters. A reader increments a part’s lock
counter during locking and likewise for unlock.
During an update, the updater flips srcu_idx
(thus attempting to force new
readers to use the other part) and waits for the lock/unlock counters on the
previous value of srcu_idx
to match. Once the sum of the lock counters of
all CPUs match that of unlock, the system knows all pre-existing read-side
critical sections have completed.
Things are not that simple, however. It is possible that a reader samples the
srcu_idx
, but before it can increment the lock counter corresponding to it,
it undergoes a long delay. We thus we end up in a situation where there are
readers in both srcu_idx = 0
and srcu_idx = 1
.
To prevent such a situation, a writer has to wait for readers corresponding to
both srcu_idx = 0
and srcu_idx = 1
to complete. This depicted with ‘A MUST’
in the below pseudo-code:
reader 1 writer reader 2
-------------------------------------------------------
// read_lock
// enter
Read: idx = 0;
<long delay> // write_lock
// enter
wait_for lock[1]==unlock[1]
idx = 1; /* flip */
wait_for lock[0]==unlock[0]
done.
Read: idx = 1;
lock[0]++;
lock[1]++;
// write_lock
// return
// read_lock
// return
/**** NOW BOTH lock[0] and lock[1] are non-zero!! ****/
// write_lock
// enter
wait_for lock[0]==unlock[0] <- A MUST!
idx = 0; /* flip */
wait_for lock[1]==unlock[1] <- A MUST!
NOTE: QRCU has a similar issue. However it overcomes such a race in the reader
by retrying the sampling of its srcu_idx
equivalent.
Q: If you have to wait for readers of both srcu_idx = 0
, and 1
, then why
not just have a single counter and do away with the “flipping” logic?
Ans:
Because of updater forward progress. If we had a single counter, then it is
possible that new readers would constantly increment the lock counter, thus
updaters would be waiting all the time. By using the ‘flip’ logic, we are able
to drain pre-existing readers using the inactive part of srcu_idx
to be
drained in a bounded time. The number of readers of a ‘flipped’ part would only
monotonically decrease since new readers go to its counterpart.
2023 update: I have more detailed notes with diagrams and such on this and other cases. Just reach out to me if you want to take a look at those.