In this post I will try to explain how locality of reference can be exploited by grouping operations according to the memory they access. In the SAT solver program I write, there is a loop that does something called propagation. The goal of propagation is to set a number of variables to given values through a pretty cache-intensive process. While doing this, as the variables are set, some statistics are calculated. For example, I used to have something like this, called for every variable set:
void enqueueNewFact(Lit lit, PropagatedBy someData) {
    trail.push_back(lit);              // record what was set, in order
    value[lit.var()] = lit.sign();     // the assignment itself
    agility *= conf.agilityG;          // statistics from here on
    if (varPolarity[lit.var()] != lit.sign()) agility += 0.0001;
    varPolarity[lit.var()] = lit.sign();
    varSet[lit.var()]++;
    level[lit.var()] = getLevel();
}
The problem with this function is that it accesses a lot of pieces of data: trail[], value[], level[], varPolarity[] and varSet[]. All of these must be in cache to be updated, and if they aren't, they must be pulled in from lower levels of cache or from main memory. Since enqueueNewFact() is called often, these arrays are in cache essentially all the time, taking space away from what really needs it: the data used by the propagate() routine.
What I have just recently realised is that I don't use the values of level[], varPolarity[], varSet[] or agility at all during the propagate() routine. Instead, they are used once the recursive calling of propagate() has finished and a certain technical point (called a conflict) is reached. If that is the case, why are we updating them immediately, every time enqueueNewFact() is called? If you look at the code, for various reasons we must save a trail (called, conveniently, trail[]) of what we have done. This trail contains all the information the statistics-update functions need: the var() and the sign(), both contained in the Lit, are all that's necessary.
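For those not familiar with MiniSat-style solvers, here is a minimal sketch of what such a Lit could look like. The exact encoding (variable index in the upper bits, sign in the lowest bit) is an assumption on my part, the common MiniSat one; the solver's real Lit may differ in detail:
#include <cstdint>

class Lit {
    uint32_t x;   // variable index shifted left by one, sign in bit 0
public:
    Lit(uint32_t var, bool sign) : x((var << 1) | (uint32_t)sign) {}
    uint32_t var() const { return x >> 1; }   // which variable was set
    bool sign() const { return x & 1; }       // to which polarity
};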
So, I have made a new function, delayedEnqueueUpdate(), that calculates all the statistics only when they are needed: when a conflict is reached. It has a simple piece of state, lastDelayedEnqueueUpdate, that saves the trail position at which it was last called. The default is trail position 0, and whenever we reach a conflict, we calculate the stats and then update it:
void delayedEnqueueUpdate() {
    int pseudoLevel = lastDelayedEnqueueUpdateLevel;
    for (int i = lastDelayedEnqueueUpdate; i < (int)trail.size(); i++) {
        Lit lit = trail[i];
        agility *= conf.agilityG;
        if (varPolarity[lit.var()] != lit.sign()) agility += 0.0001;
        varPolarity[lit.var()] = lit.sign();
        varSet[lit.var()]++;
        // trail_lim[] marks where each decision level starts on the trail,
        // so we can reconstruct the level each literal was set at
        if (trail_lim[pseudoLevel] == i) pseudoLevel++;
        level[lit.var()] = pseudoLevel;
    }
    lastDelayedEnqueueUpdate = trail.size();
    lastDelayedEnqueueUpdateLevel = decisionLevel();
}
This code, apart from pseudoLevel and lastDelayedEnqueueUpdateLevel (both of which are SAT solver technicalities), is rather easy to understand. Basically, we take the position we last updated to, lastDelayedEnqueueUpdate, and work our way from there to the end of the trail, trail.size(). Once we have worked our way through, we set lastDelayedEnqueueUpdate to the current end point, trail.size().
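To show where this fits in, here is a hedged sketch of the call site; the Clause type and the analyze() call are MiniSat-style assumptions, not necessarily the solver's exact code:
Clause* confl = propagate();    // cache-intensive work, touches no statistics
if (confl != NULL) {            // a conflict was reached
    delayedEnqueueUpdate();     // catch up on agility, varPolarity[], varSet[], level[]
    analyze(confl);             // conflict analysis can now rely on fresh level[] data
}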
All this would seem useless if we didn't take cache usage into consideration: we are not saving any CPU instructions. If anything, we add CPU instructions to execute, as we now need an extra for loop (propagate() is essentially a recursive for loop). However, from a cache usage point of view we are saving a lot. While propagation is running, the propagate() routine can have all the cache it needs. Then, when that cache-intensive process has finished, we pull in the set of arrays varPolarity[], varSet[] and level[], update them all in one go, and continue operation.
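For comparison, enqueueNewFact() can now shrink to something like this sketch, touching only trail[] and value[] during propagation:
void enqueueNewFact(Lit lit, PropagatedBy someData) {
    trail.push_back(lit);            // enough information to replay the stats later
    value[lit.var()] = lit.sign();   // the assignment itself
}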
We have managed to group operations according to what pieces of memory they use and update. This makes the program more cache-friendly, which almost always leads to a measurable speed increase.