Come to Hackito Ergo Sum’11

There is going to be a nice hacker conference in Paris real soon now: Hackito Ergo Sum, between the 7th and the 11th of April. There will be plenty of interesting talks; among them, I will be presenting some fun crypto-breaking :) The schedule is here; I am scheduled for the first day, Thursday, at 11:30 AM. If you are in or near Paris, come around for the conference, I am sure it will be a really nice one :) If you would like to meet me, we can also meet elsewhere in Paris, just drop me a mail; better yet, come and chat with me at the conf, I will be there all day long.

The HES conference mainly focuses on attacks: against software, crypto, infrastructure, hardware, and the like. If you are interested in what the current trends in attacking these systems are, and/or you want to make your organisation safer from these attacks, it’s a good idea to come and visit to get the hang of what’s happening. It’s also a great opportunity to meet the organisations and people who are driving the change in making software and hardware systems more secure by demonstrating their weaknesses and presenting possibilities for securing them.

Meet you at HES’11 ;)

Recovering XORs from a CNF

Let’s suppose you are looking to recover XORs from a CNF. Recovering XORs is fun, because you can XOR the recovered XORs together to obtain new information, like this:

a \oplus b \oplus c = 0
a \oplus d \oplus b = 0
XORing the two together, a and b cancel out, therefore
c \oplus d = 0
therefore
c = d

Simple XOR recovery

The simplest way to recover an XOR clause is to look for the set of clauses that encodes it. For example, a \oplus b \oplus c = 0 is encoded by the following set of clauses:

\neg a \vee b \vee c
a \vee \neg b \vee c
a \vee b \vee \neg c
\neg a \vee \neg b \vee \neg c

Finding such a set is not that hard: first sort the literals within each clause, then sort the list of clauses by size and variable content, and finally go through the list of clauses linearly. If you think through the first two steps, you will see that they ensure that the above 4 clauses end up next to each other in the list of clauses, making it trivial to find them.
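To make the three steps concrete, below is a minimal sketch of this simple recovery. All the names (Lit, Clause, Xor, recoverXors) are made up for this illustration and are not CryptoMiniSat’s real types; the sketch also assumes the CNF contains no duplicate clauses.

#include <algorithm>
#include <cstdint>
#include <vector>

// Made-up, minimal types for this sketch.
struct Lit {
    uint32_t var;
    bool neg;  // true if the literal is negated
    bool operator<(const Lit& o) const {
        return var < o.var || (var == o.var && neg < o.neg);
    }
};
using Clause = std::vector<Lit>;
struct Xor { std::vector<uint32_t> vars; bool rhs; };

// Variables of a clause, in order (literals are assumed already sorted).
static std::vector<uint32_t> varsOf(const Clause& cl) {
    std::vector<uint32_t> vs;
    for (const Lit& l : cl) vs.push_back(l.var);
    return vs;
}

std::vector<Xor> recoverXors(std::vector<Clause> clauses) {
    // Step 1: sort the literals within each clause.
    for (Clause& cl : clauses) std::sort(cl.begin(), cl.end());

    // Step 2: sort the clause list by size, then by variable content,
    // so that clauses over the same variables become neighbours.
    std::sort(clauses.begin(), clauses.end(),
              [](const Clause& a, const Clause& b) {
                  if (a.size() != b.size()) return a.size() < b.size();
                  return varsOf(a) < varsOf(b);
              });

    // Step 3: one linear scan over runs of clauses with equal variables.
    std::vector<Xor> xors;
    for (size_t i = 0; i < clauses.size();) {
        size_t j = i + 1;
        while (j < clauses.size() && varsOf(clauses[j]) == varsOf(clauses[i]))
            j++;

        const size_t n = clauses[i].size();
        if (n >= 3) {  // we only care about non-binary XORs here
            // An n-variable XOR is encoded by 2^(n-1) clauses: the ones with
            // an odd number of negations encode rhs = 0, the even ones
            // encode rhs = 1. Count both parities within the run.
            size_t odd = 0, even = 0;
            for (size_t k = i; k < j; k++) {
                size_t negs = 0;
                for (const Lit& l : clauses[k]) negs += l.neg;
                (negs % 2 ? odd : even)++;
            }
            const size_t needed = size_t(1) << (n - 1);
            if (odd == needed)  xors.push_back({varsOf(clauses[i]), false});
            if (even == needed) xors.push_back({varsOf(clauses[i]), true});
        }
        i = j;
    }
    return xors;
}

The parity trick in step 3 can be checked against the 4 clauses above: each of them contains an odd number of negated literals (1, 1, 1 and 3), and together they are exactly the 4 = 2^(3-1) clauses needed to encode a \oplus b \oplus c = 0.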

Improved XOR recovery

There is only one small glitch with the algorithm above: other sets of clauses can also make up XORs. For example, this set of clauses:
\neg a \vee b \vee c
a \vee \neg b \vee c
a \vee b \vee \neg c
\neg a \vee \neg b

Notice that all I took away was one literal from one clause. Now, if a CNF implies \neg a \vee \neg b, it must surely also imply \neg a \vee \neg b \vee \neg c, since the latter is less stringent than the former. So we could easily have the longer clause inside our CNF, but we don’t, for reasons of laziness: who wants to keep around a clause that is subsumed by a shorter one already? However, keeping in mind that the longer, less stringent clause is implied by the shorter one is important: it can lead to the discovery of more XORs.
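To exploit this, the matching step can be relaxed: when one of the clauses required for an XOR is missing, any shorter clause that subsumes it is just as good. A minimal sketch of the check, using the same made-up types as above:

#include <algorithm>

// True if 'shorter' subsumes 'needed', i.e. every literal of 'shorter'
// also occurs in 'needed'. Both clauses must be sorted by Lit::operator<;
// std::includes then checks exactly this subset relation.
bool subsumes(const Clause& shorter, const Clause& needed) {
    return std::includes(needed.begin(), needed.end(),
                         shorter.begin(), shorter.end());
}

In the example above, the missing \neg a \vee \neg b \vee \neg c would be matched by \neg a \vee \neg b, and the XOR is recovered as before, since the subsumed clause is implied anyway.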

Results

Let’s move on to the magical territory of practical SAT solving, and first take a typical industrial problem, UTI-20-10p0:

$./cryptominisat_old UTI-20-10p0.cnf
[...]
c Finding non-binary XORs:     0.10 s (found:       0, avg size: -nan)

The old, 2.9.0 version of CryptoMiniSat seems to have no luck finding any XOR clauses. This is very typical: with the simple recovery algorithm, industrial instances seem not to contain any XOR clauses at all. Let’s look at the new algorithm now:

$./cryptominisat_new UTI-20-10p0.cnf
[...]
c XOR finding finished. Num XORs:   9288 

All right, that seems better.

Edited to add: Thanks to Vegard Nossum for telling me something was odd with XOR recovery. As is usually the case, answering a question is much easier than knowing what question to ask.

Edited to add2: The above algorithm can be easily improved by using cached literals and stamping.

CryptoMiniSat in SAT Competition’11

I have submitted three versions of CryptoMiniSat to the 2011 SAT Competition, all available here. The releases are commonly called “Strange Night”, after the music track of the same name. Inside the archive you will find 5 binaries and a PDF description. The five binaries are made up of two single-threaded versions, two multi-threaded versions, and a unified single- and multi-threaded binary. The latter is called techdemo, as it is more of a technological demonstrator than anything else, and was developed in collaboration with George Katsirelos and Laurent Simon. All versions are collaborative, however, as they all bear the hands of people from the CryptoMiniSat development mailing list.

To be honest, I am not overly optimistic about these binaries, especially since I couldn’t get the 3.0 version of CryptoMiniSat, which I have been working on for the past two months, ready for the competition. I have tried to backport some of its features to the 2.9.0 branch to make these releases, and so the resulting executables are pretty unstable. The techdemo version is a broken 3.0 release, as it is missing the signature feature of fully shared clauses between threads. Since I have finally obtained a 4-core CPU (with hyper-threading, giving 8 “independent” threads), I am all the more optimistic about a fully shared scheme. The upcoming 3.0 release will be specifically made to work in multi-threaded environments, and will suffer if launched in single-threaded mode. This architectural decision was made because multi-threading nowadays no longer seems optional: it’s a must. On the CryptoMiniSat mailing list, some people are reporting performance on 64-core machines; optimising for a single-threaded environment (and having multi-threading as an afterthought) in these times seems like a broken approach.

For those interested in the actual source code of what has been submitted, everything is available from GIT, as usual:

  1. Strange Night1 – single-threaded
  2. Strange Night1 – multi-threaded
  3. Strange Night2 – single-threaded
  4. Strange Night2 – multi-threaded
  5. Tech Demo

Since these are pretty unstable, I wouldn’t use them in a production environment… happy bug-hunting ;)

Visiting Linz

Lately I had the pleasure of visiting Linz, Armin Biere’s workplace, where I gave a quick talk on SAT solver architectures. To me, it was really interesting to think through that presentation, not because it was entirely new or exciting, but because it recapped many issues that have been bothering me lately. Namely, that it’s difficult to make a really versatile SAT solver, because the low-level choices that must be made (e.g. the watchlist scheme) determine so much when one makes higher-level architectural decisions such as clause sharing, or even something as simple as hyper-binary resolution. As for the latter, thanks to Armin Biere’s thoughts I have finally managed to realise why my hyper-binary resolution was so slow: I recently decided not to fully propagate binary clauses before propagating normal (i.e. longer) clauses, which made hyper-binary resolution much slower, as I had to re-calculate the binary implication graph. Not fully propagating binary clauses before normal clauses also seems to have influenced my much higher-level choice of using cached implications, as they (intuitively, and also in practice) help much more if binary clauses are not fully propagated first. This influence is interesting to think through, as something this trivial shouldn’t, at least in principle, affect such a high-level decision.
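For reference, here is a minimal sketch, with entirely hypothetical names, of the “binary clauses first” propagation order mentioned above: all pending binary implications are drained to fixpoint before any longer clause is examined, so the binary implication graph is always fully propagated when something like hyper-binary resolution wants to rely on it.

#include <cstdint>
#include <vector>

// Entirely hypothetical, minimal solver state for this sketch.
using Lit = uint32_t;                      // literal encoded as 2*var + sign
inline Lit negate(Lit l) { return l ^ 1; }

std::vector<std::vector<Lit>> binWatches;  // binWatches[l]: literals implied
                                           // by l through a binary clause
std::vector<int> value;                    // per-literal: 1 true, -1 false, 0 unset
std::vector<Lit> trail;                    // assignments in the order made
size_t qheadBin = 0, qheadLong = 0;        // one queue head per tier

// Assign l to true; report a conflict if it is already false.
bool enqueue(Lit l) {
    if (value[l] == -1) return false;      // conflict
    if (value[l] == 1) return true;        // already set, nothing to do
    value[l] = 1; value[negate(l)] = -1;
    trail.push_back(l);
    return true;
}

// Stand-in for the usual watchlist propagation of longer clauses.
bool propagateLongClauses(Lit p) { (void)p; return true; }

// Two-tier propagation: binary clauses always go first, to fixpoint.
bool propagate() {
    while (qheadLong < trail.size()) {
        // Tier 1: drain ALL pending binary implications.
        while (qheadBin < trail.size()) {
            Lit p = trail[qheadBin++];
            for (Lit q : binWatches[p])
                if (!enqueue(q)) return false;
        }
        // Tier 2: only now examine one longer clause's watchlist; anything
        // it enqueues sends us back to tier 1 on the next iteration.
        if (qheadLong < trail.size())
            if (!propagateLongClauses(trail[qheadLong++])) return false;
    }
    return true;
}

Even this toy version shows the architectural coupling: there are two queue heads instead of one, and every higher-level feature has to know which of the two tiers it can trust to be up to date.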

Also thanks to Armin Biere, I have managed to gain a better understanding of lingeling and its superior watchlist scheme. Some of the architectural elements of lingeling’s watchlist scheme are really interesting, and when they get published I will definitely port some of them to CryptoMiniSat. It seems to use much less space and stores information in a more cache-friendly manner, aiding the processor in its job. A related, interesting pointer that I picked up is this paper, which originally introduced blocking literals but also discusses a range of other ideas that can help. All in all, it was great to visit Linz and the group of Armin Biere, as I managed to learn a lot from him and his colleagues.

Delayed updates

In this post I will try to explain how locality of reference can be exploited by grouping operations according to the memory locations they access. In the SAT solver I am writing, there is a loop that does something called propagation. The goal of propagation is to set a number of variables to a given value through a pretty cache-intensive process. While doing this, as the variables are set, some statistics are calculated. For example, I used to have something like this called for every variable set:

void enqueueNewFact(Lit lit, PropagatedBy someData) {
    trail.push_back(lit);                 // record the assignment itself
    value[lit.var()] = lit.sign();        // actually set the variable
    agility *= conf.agilityG;             // decay the agility statistic
    if (varPolarity[lit.var()] != lit.sign()) agility += 0.0001;
    varPolarity[lit.var()] = lit.sign();  // remember the last polarity
    varSet[lit.var()]++;                  // count how often the var is set
    level[lit.var()] = getLevel();        // decision level of the assignment
}

The problem with this function is that it accesses a lot of data: trail[], value[], level[], varPolarity[] and varSet[]. All of these must be in cache to be updated, and if they aren’t, they must be pulled in from lower levels of cache or from main memory. Since enqueueNewFact() is called often, these arrays are mostly in cache all the time, taking space away from what really needs it: the data used during the propagate() routine.

What I have just recently realised is that I don’t use the values of level[], varPolarity[], varSet[] or agility at all during the propagate() routine. Instead, they are used when the recursive calling of propagate() has finished and a certain technical point (called a conflict) is reached. If this is indeed the case, why are we updating them all the time, immediately when enqueueNewFact() is called? If you look at the code, for various reasons we must save a trail (called, conveniently, trail[]) of what we have done. This trail contains all the information needed by the statistics-update functions: var() and sign(), both contained in Lit, are all that’s necessary.

So, I have made a new function, delayedEnqueueUpdate(), that calculates all the statistics only when they are needed: when a conflict is reached. It keeps a simple piece of state, lastDelayedEnqueueUpdate, that records the trail position at which it was last called. The default is trail position 0, and whenever we reach a conflict, we update it once the stats have been calculated:

void delayedEnqueueUpdate() {
    int pseudoLevel = lastDelayedEnqueueUpdateLevel;
    for (size_t i = lastDelayedEnqueueUpdate; i < trail.size(); i++) {
        Lit lit = trail[i];
        agility *= conf.agilityG;             // decay the agility statistic
        if (varPolarity[lit.var()] != lit.sign()) agility += 0.0001;
        varPolarity[lit.var()] = lit.sign();  // remember the last polarity
        varSet[lit.var()]++;                  // count how often the var is set

        // Re-derive the decision level: trail_lim[k] holds the trail
        // position where level k+1 starts, so step pseudoLevel forward
        // every time we walk past such a boundary.
        if (pseudoLevel < (int)trail_lim.size() && trail_lim[pseudoLevel] == i)
            pseudoLevel++;
        level[lit.var()] = pseudoLevel;
    }
    lastDelayedEnqueueUpdate = trail.size();
    lastDelayedEnqueueUpdateLevel = decisionLevel();
}

This code, apart from pseudoLevel and lastDelayedEnqueueUpdateLevel (both of which are SAT solver technicalities), is rather easy to understand. Basically, we take the position where we finished last time, lastDelayedEnqueueUpdate, and work our way from there to the end of the trail, trail.size(). Once we have worked our way through, we set lastDelayedEnqueueUpdate to the current end point, trail.size().
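As a usage illustration (the names around the call are hypothetical, not the solver’s actual code), the single call site is right where a conflict has been detected, just before conflict analysis needs the statistics:

// Inside the search loop, right after propagation. In this sketch,
// propagate() returns the conflicting clause, or nullptr if none.
Clause* confl = propagate();
if (confl != nullptr) {
    delayedEnqueueUpdate();  // one cache-friendly pass over the new trail
                             // entries updates agility, varPolarity[],
                             // varSet[] and level[]
    analyze(confl);          // conflict analysis can now read the stats
}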

All this seems useless if we don't take cache usage into consideration: we are not saving any CPU instructions. If anything, we add CPU instructions to execute, as we now need to execute an extra for loop (propagate() is essentially a recursive for loop). However, from a cache usage point of view, we are saving a lot. When doing propagation, the propagate() routine can have all the cache it needs. Then, when that cache-intensive process has finished, we pull in a set of arrays, varPolarity[], varSet[], level[], update them all in one go, and then continue operation.

We have managed to group operations according to which memory areas they use and update. This makes the program more cache-friendly, which almost always leads to a measurable speed increase.