Category Archives: SAT

GCC 4.5.2 at the SAT Competition’13

(Note update below)

Just a short note. All MiniSat-derived SAT solvers might give wrong results during the competition. No, MiniSat doesn’t have a bug. GCC has a giant bug. Vegard reported this years ago — I personally put about 12h tracking it down, posted it to the CryptoMiniSat mailing list, and Vegard took it up, stripped it down and reported it.

This is a well-known bug, there is a thread about it on the MiniSat mailing list and I personally asked the competition organizers back in 2011 not to use the 4.5.2(or .3?) they wanted to use. I used to have code to check for this bug (there is a handy check attached to the bug report), but I removed it, though maybe I should add it back. It’s fixed in 4.5.4+, still present in 4.5.3 where Vegard&me found it. I hope I will be able to convince the organisers this time around, too :) By the way, the option you need to pass to gcc to avoid the bug is “-fno-tree-pre” — though of course this turns off an optimisation and so slows down the final executable.

Update: The organisers decided to move to gcc 4.8.0. This is very good news. They also note that the speed of the solvers is better, which is also very good news. I am happy I contributed to this welcome improvement and I hope it doesn’t cause any delays or inconveniences to anyone.

Certified UNSAT and CryptoMiniSat

Marijn Heule kindly sent me an email on the 10th of April about DRUP, the new system used this year in the SAT Competition’13 for the UNSAT track. He kindly encouraged me to implement the DRUP system. He personally implemented it into Minisat which was a very helpful lead for me. In this post I will talk about my experiences in implementing DRUP into CryptoMiniSat within a span of 3 days.

Implementation complexity

It took only 3 days, about 1400 lines of code to implement DRUP:

git diff f27c74bbd  c0b6ccc10 --shortstat
 25 files changed, 1372 insertions(+), 307 deletions(-)

It turns out that the biggest problem is that whenever I shorten a clause, I first have to add the shortened version, and then delete the old one. Since I always do in-place literal deletion, this means I have to save the old clause into a temporary place, add the new one and finally delete the old one that has been saved. I will eventually write a C++ wrapper that does this for me, but currently, it’s a lot of

vector origCl(cl.size());
std::copy(cl.begin(), cl.end(), origCl.begin());
[blah...]
drup << cl << " 0" << endl;
drup << "d " << origCl << " 0" << endl;

So, it's a bit messy code. Other than this, the implementation went very smoothly. The biggest pain was not to forget to add to the DRUP output all changed clauses. Since I have implicit binary and tertiary clauses and I manipulate them in-place, they are changed in quite complicated code paths.

If you don't have such complicated code paths, you should be able to implement DRUP within a day or less. I encourage you to do so, it's quite fun!

Remaining uncertainties

I am a bit confused about whether some of the optimisations in CryptoMiniSat work with DRUP. I have been fuzzing the DRUP implementation for about ~1000 CPU hours, but not with all optimisations turned on. Some are a bit shaky. In particular, XOR and stamping&caching come to mind.

I cannot turn DRUP on for the top-level XOR manipulation because otherwise I would need to tell DRUP every Gaussian elimination step. Not funny, and not fast. Well, XOR is not such a big thing, and it is no longer natively implemented in CryptoMiniSat, so not a big deal, really.

The other, more troubling one is stamping and implied literal caching. Luckily I have on-the-fly hyper-binary resolution (this is needed for DRUP with Stalmarck if you think about it), so the binary clauses stored by caching and stamping are there... but they may get deleted by variable elimination, blocked clause elimination and... well, maybe nothing else. Hopefully not. Anyway, I never block binary clauses (does clause blocking ever help? I am confused) and I can of course not delete binary eliminated clauses from DRUP. However... that may make the verification very slow. So, I am at crossroads here. I think I will submit a version with stamping&caching and one without.

In the end, every optimisation can be turned on except for XOR. I find that exceptionally good given the number of tweaks/hacks used by CryptoMiniSat.

Long-term advatages of having DRUP

I think DRUP allows for a lot of possibilities. Naturally I first want to draw resolution graphs. There are plenty of libraries for 3D drawing, and I have already ordered the LEAP controller (a 3D controller), which will come handy to play with the resolution graphs (zoom&out, rotate, etc.).

From there, I want to get stats out of the graph, and I want to present it next to/with the stats that I already generate. For example, how many of the deleted clauses get re-learnt later? How many clauses get used in the resolution graph with the empty node? How often when cleaning with glues? How often when cleaning with activities? For which types of instances?

Linking this with real-world instances by coloring the graph points according to e.g. filter functions in stream ciphers is not very hard and should be quite a lot of fun.

Acknowledgements

I think Marijn Heule deserves a lot of thanks for the work he has put into DRUP (webpage, example, DIFF for MiniSat) and all the help he has given me. I had some initial doubts about whether it's possible to implement at all and I had some minor problems with the checker --- he always replied kindly and promptly. Thanks!

A variable elimination improvement

Lately, I have been thinking about how to improve variable elimination. It’s one of the most important things in SAT solvers, and it’s not exactly easy to do right.

Variable elimination

Variable elimination simply resolves every occurrence of a literal v1 with every occurrence of the literal \neg v1 , removes the original clauses and adds the resolvents. For example, let’s take the clauses

v1 \vee v2 \vee v3
v1 \vee v4 \vee v5
\neg v1 \vee v10 \vee v11
\neg v1 \vee v12 \vee v13

When v1 gets eliminated the resolvents become

v2 \vee v3 \vee v10 \vee v11
v2 \vee v3 \vee v12 \vee v13
v4 \vee v5 \vee v10 \vee v11
v4 \vee v5 \vee v12 \vee v13

The fun comes when the resolvents are tautological. This happens in this case for example:

v1 \vee v4
\neg v1 \vee v5\vee \neg v4

The resolvent is the clause

v4 \vee \neg v4 \vee v5

Which contains both a literal and its negation and is therefore always true. It’s good to find variables we can eliminate without and side-effects, i.e. variables that eliminate without leaving any resolvents behind. However, it’s not so cheap to find these. Until now.

A fast procedure for calculating the no. of non-tautological resolvents

The method I came up with is the following. For every clause where v1 is inside, I go through every literal and in an array the size of all possible literals, I set a bit. For every clause, I set a different bit. When going through every clause of every literal where \neg v1 is present, I calculate the hamming weight (a popcount(), a native ASM instruction on modern CPUs) of the array’s inverse literals and substruct this hamming weight from the number of clauses v1 was inside. I sum up all these and then the final count will be the number of non-tautological resolvents. Here is a pseudo-code:

mybit = 1
num = 0
for clause in clauses[v1]:
    for l in clause:
        myarray[l] |= mybit

    mybit = mybit << 1
    num += 1

count = 0
for clause in clauses[not v1]:
    tmp = 0
    for l in clause:
        tmp |= myarray[not l]
    count += num - popcount(tmp)

print "number of non-tautological resolvents: %d" % count

I think this is pretty neat. Notice that this is linear in the number of literals in the clauses where v1 and \neg v1 is present. The only limitation of this approach is that ‘myarray’ has to have enough bits in its elements to hold ‘num’ number of bits. This is of course non-trivial and can be expensive in terms of memory (and cache-misses) but I still find this approach rather fun.

Using this procedure, I can check whether all resolvents are tautological, and if so, remove all the clauses and not calculate anything at all. Since this happens very often, I save a lot of calculation.

CryptoMinisat 3.1 released

CryptoMinisat 3.1 has been released. The short changelog is:

$ git diff cryptoms-3.0 cryptoms-3.1 --shortstat
 84 files changed, 3079 insertions(+), 2751 deletions(-)

The changes made were threefold. First, memory usage has been greatly reduced. This is crucial, because memory usage was over 7GB on certain instances. Secondly, the implication cache wasn’t very well-used and an idea that came to my mind greatly improved performance on most problems. Finally, time limiting of some inprocessing techniques on certain types of problems has been improved.

Memory usage reduction

On instances that produced a lot of long learnt clauses the memory usage was very high. These learnt clauses were all automatically linked in to the occurrence list and consequently took large amounts of memory, sometimes up to 10GB. On other instances, the original clauses were too numerous and too large, so putting even them into the occurrence list was too much. On these instances, variable elimination is not carried out (or carried out only later, when enough original clauses have been removed/shortened). To debug some of these problems, I wrote a fuzzer that generates extremely large problems with many binary and many long clauses, it’s available here as “largefuzzer”. It’s actually quite nice with many-many binary clauses so it also can fuzz the problems encountered with probing of extremely weird and large instances.

Implied literal usage improvement

CryptoMiniSat uses implied literals, i.e. caches what literals were propagated by each literal during probing. It then re-uses this information to subsume and/or strengthen clauses. This is kind of similar to stamping though uses more memory. It is actually useful to have alongside stamping, and I now do both — propagating DFS that stamping requires is expensive though updating cache during DFS is just as easy as during quasi-BFS.

The trick I discovered while playing around with cached implied literals is that if literal L1 propagates L2 and also !L2 then that means there are conceptually two binary clauses in the solver (!L1, L2), (!L1, !L2), so !L1 is TRUE. This is of course trivial, but I never checked for this. The question most would raise is: why would L1 propagate both L2 and !L2 and not fail? The answer is kind of tricky, but very interesting. Let’s say at one point, L1 propagates L2 due to a learnt clause, but that learnt clause is then removed. A new learnt clause is then later learnt, and with that learnt clause in place, L1 propagates !L2. Now, without caching, this would be ignored. Caching memorizes past conceptual binary clauses and re-uses this information.

This is not an optimization that only looks good on paper, it is very good to have. With this one optimization, I gained 5 instances from the SAT Comp’09 instances with a 1000s timeout (196 solved -> 201 solved). I can’t right now imagine how this could be done with stamping effectively, but that doesn’t mean it’s not possible. Though, according to my experience, stamping doesn’t preserve that much information over time as it’s being updated (renumbered) frequently while the cache is only improved over time, never shrunk. A possibility would be to have more than one stamp system and round-robin selecting them. However that would mean that sorting of clauses (for shrinking) would need to be done more than once, and sorting them is already relatively expensive. I sometimes feel that what stamping gains in memory it looses on sorting (i.e. processing time) and lower coverage (re-numbering).

More precise time-limiting

Martin Maurer has been kind enough to file a lot of bug reports about probing and variable elimination taking too much time, sometimes upwards of 150s when they should take around 20-30s maximum. While investigating, it tuned out that the problem was very weird indeed. While trying to eliminate or probe one variable the time for that one variable took upwards of 100s. This was completely unexpected as the code only checked for timeouts on a per-variable basis. In the end, the code had to be improved to track time on an intra-variable basis in both systems. While at it, I also added intra-variable time-tracking to implicit clause subsumption and strengthening too. So, over-times should less prevalent from now on. As an interesting side-note, time-limiting on probing is now so fine-grained that a 32-bit unsigned integer would overflow within 15s if used as the time-tracker.

CryptoMiniSat 3.0 released

CryptoMiniSat 3.0 has been released. I could talk about how it’s got a dynamic, web-based statistics interface, how it has more than 80 options, how it uses no glues for clause-cleaning and all the other goodies, but unfortunately these don’t much matter if the speed is not up to par. So, here is the result for the 2009 SAT Competition problems on a 1000s timeout with two competing solvers, lingeling and glucose:

cryptoms_speed

This of course does not mean that CryptoMiniSat is faster than the other solvers in general. In fact it is slower on a number of instances. What it means is that in general it’s OK and that’s good enough for the moment. It would be awesome to run the above experiment (or a similar one) with a longer timeout. Unfortunately, I don’t have a cluster to do that. However, if you have access to one, and would be willing to help with running the 3 solvers on a larger timeout, please do, I will post the updated graph here.

Update Norbert Manthey kindly ran all the above solvers on the TU Dresden cluster, thanks! He also kindly included one more solver, Riss 3g. The cluster was an AMD Bulldozer architecture with 2cores/solver with an extreme, 7200s timeout. The resulting graph is here:

cryptoms_speed

Riss 3g is winning this race, with CryptoMiniSat being second, third is glucose, and very intriguingly lingeling the 4th. Note that CryptoMiniSat leads the pack most of the time. Also note, this is the first time CryptoMiniSat 3.0 has been run for such a long time, while all the other competing solvers’ authors (lingeling, glucose, riss) have clusters available for research purposes.

Licensing

For those wondering if they could use this as a base for SAT Competition 2013, the good news is that the licence is LGPL so you can do whatever you want with it, provided you publish the changes you made to the code. However, I would prefer that you compete with a name such as “cms-MYNAME” unless you change at least 10% of the code, i.e. ~2000 lines. For the competitions after 2013, though, it’s all up for grabs. As for companies, it’s LGPL, so you can link it with your code, it’s safe, you only have to publish what you change in the library, you don’t have to publish your own code that uses the library.

Features

CryptoMiniSat has been almost completely rewritten from scratch. It features among other things:

  • 4 different ways to propagate
  • Implicit binary&tertiary clauses
  • Cached implied literals
  • Stamping
  • Blocking of long clauses
  • Extended XOR detection and top-level manipulation
  • Gate detection and manipulation
  • Subsumption, variable elimination, strengthening
  • 4 different ways to clear clauses
  • 4 different ways to restart
  • Large amounts of statistics data, both into console and optionally to MySQL
  • Web-based dynamic display of gathered statistics
  • 3 different ways to calculate optimal variable elimination order
  • On-the-fly variable elimination order update
  • Super-fast binary&tertiary subsumption&strengthening thanks to implicit bin&tri
  • On-the-fly hyper-binary resolution with precise time-control
  • On-the-fly transitive reduction with precise time-control
  • Randomised literal dominator braching
  • Internal variable renumbering
  • Vivification
  • On-the-fly clause strengthening
  • Cache&stamp-based learnt clause minimisation
  • Dynamic strongly connected component check and equivalent literal replacement

Code layout

As for those wondering how large the code base is, it’s about 20KLOC of code, organised as:

cryptoms_overview