All posts by msoos

On variable renumbering

Variable renumbering in SAT solvers keeps a mapping between the external variable numbers that are visible to the user and the internal variable numbers that are visible to the system. The trivial mapping that most SAT solvers use is the one-to-one mapping, where there is no difference between outer and internal variables. A smart mapping, by contrast, doesn’t keep track of data related to variables that have been set or eliminated internally, so the internal data structures can be smaller.

Advantages

Having smaller internal data structures helps in achieving a lower memory footprint and better cache usage.

The memory savings are useful because some CNFs have tens of millions of variables. If the solver uses the typical watched literal scheme, it needs 2 arrays for each variable. If we are using 64-bit pointers and 32-bit array sizes, that’s 32B for each variable, so 32MB for every million variables only to keep the watched literal arrays(!). I have seen people complaining that their 100M-variable problem runs out of memory — if we count that right, that’s 3.2GB of memory only to hold the watched literal array pointers and sizes, without any actual data.
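To make the arithmetic concrete, here is a sketch of the kind of per-variable bookkeeping I mean, assuming a hand-rolled watchlist with a 64-bit data pointer plus 32-bit size and capacity fields; exact layouts differ between solvers:

#include <cstdint>
#include <cstdio>

// One watchlist: 8B pointer + 4B size + 4B capacity = 16B of overhead.
struct Watchlist {
    void*    data;     // 64-bit pointer to the watched-clause entries
    uint32_t size;     // number of entries in use
    uint32_t capacity; // number of entries allocated
};

int main() {
    // Two watchlists per variable (one per literal polarity) = 32B/var.
    const uint64_t num_vars = 100ULL * 1000 * 1000; // a 100M-variable CNF
    const uint64_t bytes = num_vars * 2 * sizeof(Watchlist);
    printf("%.1f GB just for the watchlist headers\n", bytes / 1e9);
    return 0;
}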

As for the CPU cache benefits: modern CPUs work using cache lines which are e.g. 64B long on Intel Sandy Bridge. If half of the variables are set already, the array holding the variable values — which will be accessed non-stop during propagation — will contain 50% useless data. In practice the speedup achieved can be upwards of 10%.

The simple problems

One problem with having a renumbering scheme is that you need to keep track of which data structure is numbered in which way. The easy solution is to renumber absolutely everything. This is costly, however, as the mapping has to change every once in a while when new variables have been set: if everything is renumbered, the eliminated variables’ data needs to be updated according to the new mapping every time, which can be quite expensive. So it’s best not to renumber that data. Similarly, if disconnected component analysis is used, the disconnected components’ saved solutions would need to be renumbered as well, along with the clauses that have been moved to the components.

An approach I have found satisfactory is to keep every dynamic data structure — variable states (eliminated/decomposed/etc.), variable values (True/False/Unknown), clauses’ literals, etc. — renumbered, while keeping mostly static data structures such as eliminated clauses or the equivalent literal map non-renumbered. This works very well in practice, as it allows the main system to shuffle the mapping around while not caring about the other systems’ data.
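A minimal sketch of such a two-way mapping, with made-up names; the real code has to cover many more cases:

#include <cstdint>
#include <vector>

typedef uint32_t Var;

struct VarMap {
    std::vector<Var> outer_to_inner; // indexed by outer variable number
    std::vector<Var> inner_to_outer; // indexed by inner variable number

    // Dynamic data (variable values, watchlists, clause literals, ...)
    // is indexed by inner numbers, so it stays compact.
    Var to_inner(Var outer) const { return outer_to_inner[outer]; }

    // Mostly static data (eliminated clauses, equivalent literal map, ...)
    // keeps outer numbers, so it never has to be touched when the
    // mapping is shuffled around.
    Var to_outer(Var inner) const { return inner_to_outer[inner]; }
};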

The hard problem

The above is all fine and dandy until bounded variable addition (BVA) enters the scene. This technique adds new variables to the problem to simplify it. These new variables will look like new outer variables, which seems fine at first sight: the system could simply print the solution to all variables except the last N that were added by BVA and are not part of the original problem. However, if the caller adds new variables after the call to solve(), what can we do? The variables added by the caller and the BVA variables will be mixed up: a bunch of original variables, then some BVA ones, then some more original variables, then some more BVA…

The trivial solution is to have another mapping, one that translates variable numbers between the BVA and non-BVA numbering. As you might imagine, this complicates everything. Another solution is to forcibly eliminate all BVA variables after the call to solve(), let the user add the new clauses, and perform BVA again. An even more complicated solution is to keep track of the variables being added, then renumber all variables inside all data structures to move the BVA variables to the end of the variable array. This is expensive, but only needs to be done once after the call to solve(), which may be acceptable. Currently, CryptoMiniSat uses the trivial scheme. Maybe I’ll move to the last (and most complicated) scheme later on.

Conclusion

Variable renumbering is not for the faint of heart. Bugs become significantly harder to track, as all debug messages need to be translated into a common variable numbering or they make no sense at all. It’s also very easy to introduce bugs through variable renumbering. A truly difficult bug I had was when the disconnected component finder’s sub-solver renumbered its internal variables: when I tried to import some values from the sub-solver back into the main solver, I used the wrong variable numbers.

A note on learnt clauses

Learnt clauses are clauses derived by a SAT solver while searching for a solution to a CNF. They are at the heart of every modern so-called “CDCL” or “Conflict-Driven Clause-Learning” SAT solver. SAT solver writers make a very important distinction between learnt and original clauses. In this blog post I’ll talk a little bit about this distinction, why it is important to make it, and why we might want to relax it in the future.

A bit of terminology

First, let me call “learnt” clauses “reducible” and original clauses “irreducible”. This terminology was invented by Armin Biere, I believe, and it is conceptually very important.

If a clause is irreducible, it means that if I remove that clause from the clause database and solve the remaining system of constraints, I might end up with a solution that is not a solution to the original problem. However, these clauses might not be the “original” clauses — they might have been shortened, changed, or otherwise manipulated, e.g. through equivalent literal replacement or strengthening.

Reducible clauses, on the other hand, are clauses that I can freely remove from the clause database without the risk of finding a solution that doesn’t satisfy the original set of constraints. These clauses could be called “learnt”, but strictly speaking they might not have been learnt through the first-UIP learning process. They could have been added through hyper-binary resolution, they could be 1UIP clauses that have since been shortened/changed, or clauses obtained through other means such as Gaussian elimination or other high-level methods.
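In code, the distinction typically boils down to a single flag on the clause. A minimal sketch — the field name red (for “redundant”) follows the convention of several solvers, but the exact layout here is illustrative:

#include <cstdint>
#include <vector>

typedef uint32_t Lit;

struct Clause {
    bool red; // true = reducible (may be deleted), false = irreducible
    std::vector<Lit> lits;
};

// A reducible clause may be deleted at any time without losing
// satisfiability-equivalence with the original problem.
bool can_delete(const Clause& c) { return c.red; }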

The distinction

Reducible clauses are typically handled “without care” in a SAT solver. For example, during bounded variable elimination (BVE), resolutions are not carried out with reducible clauses: only irreducible clauses are resolved with each other and added back to the clause database. This means that during variable elimination, information is lost. For this reason, when bounded variable addition (BVA) is carried out, one would not count the simplification obtained through the removal of reducible clauses, as BVE could then completely undo BVA. Naturally, the heuristics in both of these systems only count irreducible clauses.
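Reusing the Clause struct from the sketch above, the counting side of such a heuristic might look like this; the real heuristics are of course more involved:

// Count only irreducible clauses containing variable v; reducible
// occurrences are ignored, since BVE drops them during elimination
// and BVA must not count savings that BVE would then undo.
size_t count_irred_occurrences(const std::vector<Clause>& occs_of_v) {
    size_t cnt = 0;
    for (const Clause& c : occs_of_v)
        if (!c.red)
            cnt++;
    return cnt;
}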

Reducible clauses are also regularly removed, or ‘cleaned’, from the clause database. The heuristics used to perform this have been a hot topic in past years and continue to be a very interesting research problem. In particular, the solver Glucose has won multiple competitions mostly by tuning this heuristic. Reducible clauses need to be cleaned from the clause database so that they don’t slow the solver down too much. Although they represent information, if too many of them are present, propagation speed grinds to a near-halt. A balance must be achieved, and lately the balance has shifted strongly towards the “clean as much as possible” side — one only needs to compare the percentage of clauses cleaned by MiniSat and by recent Glucose to confirm this.

An observation about glues

Glues (used first by Glucose) are an interesting heuristic in that they are static in a certain way: they never degrade. Once a clause achieves glue status 2 (the lowest, and best), it can never lose this status. This is not true of dynamic heuristics such as clause activities (MiniSat) or other usability metrics (CryptoMiniSat 3), which are highly dynamic and will eventually delete a clause if it fails to perform well for a while. This makes a big difference: with glues, some reducible clauses will never be deleted from the clause database, as they have achieved a high enough status that most new clauses will have a lower status (a higher glue) and will be deleted instead in the next cleaning run.

Since Glucose doesn’t perform variable elimination (or basically any other optimization that could forcibly remove reducible clauses), some reducible clauses are essentially “locked” into the clause database, and are never removed. These reducible clauses act as if they were irreducible.

It’s also interesting to note that glues are not entirely static: they are in fact updated. The way they are updated, however, is very particular: a clause can only obtain a lower glue number (a higher chance of not being knocked out), through chance encounters while propagating. So, if a clause is propagated often enough, it has a better chance of obtaining a lower glue number — and essentially a better chance of being locked into the database.
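A sketch of this one-directional update, assuming the glue is simply stored on the clause; the details differ between Glucose versions:

#include <cstdint>

// One-way glue update during conflict analysis: a better (lower) glue
// sticks, a worse one is ignored, so a clause's glue never degrades.
void update_glue(uint32_t& stored_glue, uint32_t recomputed_glue) {
    if (recomputed_glue < stored_glue)
        stored_glue = recomputed_glue;
}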

Some speculation about glues

What if these reducible clauses that are locked into the clause database are an important ingredient in giving glues their edge? In other words, what if it’s not only the actual glue number that is so wildly good at guessing the usefulness of a reducible clause, but also the fact that the calculation method never allows some reducible clauses to be removed?

To me, this sounds like a possibility. While searching and performing conflict analysis SAT solvers are essentially building a chain of lemmas, a proof. In a sense, constantly removing reducible clauses is like building a house and then knocking a good number of bricks out every once in a while. If those bricks are at the foundation of the system, what’s above might collapse. If there are however reducible clauses that are never “knocked out”, they can act as a strong foundation. Of course, it’s a good idea to be able to predict what is a good foundation, and I believe glues are good at that (though I think there could be other, maybe better measures invented). However, the fact that some of them are never removed may also play a significant role in their success.

Locking clauses

Bounded variable addition is potentially a very strong system that could help in shortening proofs. However, due to the original heuristics of BVE, it cannot be applied if the clauses it would remove are only reducible. So it can only shorten the description of the original problem (and maybe, incidentally, some of the reducible clauses), but not the set of reducible clauses on its own. This is clearly not optimal for shortening the proof. I don’t know how lingeling performs BVA and BVE, but I wouldn’t be surprised if it had some heuristic that treats certain reducible clauses as irreducible (thereby locking them) so that it can leverage the compression function of BVA over the field of reducible clauses.

Unfortunately, the lingeling code is hard to read, and it’s proprietary, so I’d rather not read it in case licensing problems turn up later. No other SAT solver performs BVA as an in-processing method (riss performs it only as pre-processing, though it is capable of performing BVA as in-processing), so I’m left to guess this on my own and code accordingly.

UPDATE: According to Norbert Manthey lingeling doesn’t perform BVA at all. This is more than a little surprising.

End notes

I believe it was Vegard Nossum who first put into my head the idea of locking some reducible clauses into the database. It only occurred to me later that glues achieve this automatically, and furthermore, that they seem to automatically lock oft-propagated reducible clauses.

There are some problems with the above logic, though. I believe lingeling increments the glue counter of some (all?) reducible clauses on a regular basis, and lingeling is a good solver. That would defeat the above logic, though the precise way glues are incremented (and the way clauses are cleaned) in lingeling is not entirely clear to me, so some of the above could still hold. Furthermore, lingeling could be well-performing for other reasons — there is more to SAT solvers than just search and resolution. Lately, 50% or more of the time spent in modern SAT solvers can go to actions other than search.

MiniSat in your browser

CLICK HERE to run CryptoMiniSat in your browser


Lately, I have become an LLVM nut. LLVM is amazing, it can do a lot, and, weirdly enough, a system developed from it is probably one of the biggest users of SAT solvers. I’m talking about KLEE, the symbolic virtual execution machine. It takes source code, compiles it with LLVM, translates the LLVM IR into SMT, runs STP on it — which in turn runs CryptoMiniSat — and voilà… the user gets back a free bug report. Or 1’200 bug reports.

Emscripten

Another great use of LLVM is emscripten, which takes LLVM IR (again) and turns it into JavaScript. This of course means we can now run SAT solvers in the browser, which is so much fun. Here you can play with MiniSat “core”, from the golden old days of 2009. Just type in any CNF in DIMACS format and enjoy :) You can now solve the most basic NP-complete problem, right from your browser!

MiniSat in javascript

Please click here to run CryptoMiniSat in your browser — much better!!

Epilogue

Yes, it’s actually running in your browser. The default CNF describes the HiTag2 cipher, with 10 output bits and 2 help bits given. The instance was generated using the Grain of Salt system. Note that the web page hangs until MiniSat finishes. Try typing in your own CNFs, play as much as you like. If you are patient, it can solve many problems, including reversing modern ciphers, finding SHA-1 collisions (PDF), verifying hardware and software… you just have to be patient enough.

Probably unbeknownst to you, you use the products of SAT solvers in your daily life: CPUs are verified using SAT solver-based techniques, airplane software is formally verified using SAT solvers, FPGA and CPU layouts are optimized with them, and if you are lucky, your car’s safety-critical systems are also verified using formal techniques, which basically means SAT solvers. Even train schedules and public transport schedules are created using SAT solvers — I know for a fact that many trains in Europe are scheduled using these techniques. All of this can be done using the very simple problem description, the CNF, that you see above — and many of these systems use MiniSat.

How I did it and what could be improved

It’s relatively easy to compile any C- or C++-based SAT solver to JavaScript using emscripten. Here is my GitHub repo, with an included HOWTO, for MiniSat. I would love to compile the current lingeling, but its license doesn’t seem to allow emscripten to even think about compiling it. I’ll compile CryptoMiniSat later; it’s a bit hairy right now, as I’m trying to refactor the code since it’s a mess. Until then, I hope you’ll enjoy playing with MiniSat!

Acknowledgements

A big thanks to my colleagues who pushed me to become an LLVM nut. Also, big thanks to Niklas Eén and Niklas Sörensson for MiniSat!

My SAT solver fuzzing setup

Since the SAT solver fuzzing paper by Brummayer, Lonsing and Biere, SAT solver fuzzing has been quite the thing in the SAT community. Many bugs have since been discovered in almost all SAT solvers, including those of this author. In this blog post I will detail my SAT fuzzing setup: what fuzzers I use and how I use them.

My fuzzers

The most obvious fuzzers to use are those presented in the paper and on the accompanying website: CNFFuzz and FuzzSAT. These fuzzers randomly generate high-quality, structured problem instances. The emphasis here is on the word structured: purely random instances, as the authors of the original paper note, do not exercise SAT solvers enough. SAT solvers are written with very specific problem types in mind: problems containing long chains of binary clauses, gates, etc.

Over the year(s) I have found, however, that these fuzzers are not enough. One of the revelations came when Vegard sent me lots of bug reports about CryptoMiniSat failing when Gaussian elimination was enabled. It turned out that although I ran the fuzzer for long hours, I only ran it with the default options, i.e. with Gaussian elimination turned off. So default options are sometimes not enough, and once off the beaten track, systems can fail.

Vegard then sent me a fuzzer that wasn’t using CNFFuzz or FuzzSAT, but instead his own CNF generator for the SHA-1 hash function. Instances thus generated are not only structured, they are also very picky about the solution: they have either exactly one or none. This helped fuzz solution reconstruction, which after blocked clause elimination is not exactly easy. In fact, it’s so good at fuzzing that it finds a bug in CryptoMiniSat about binary clauses that I have never been able to fix, to this day: whenever I block them, I cannot reconstruct the solution, ever, unless I disable equivalent literal replacement. The authors of the blocked clause elimination paper give a pretty complicated proof of why it should work in the presence of equivalent literal replacement, which I still haven’t understood.

When I implemented disconnected component analysis in CryptoMiniSat 3, I knew I had to be careful. In the SAT Competition’11 I failed terribly with it: CryptoMiniSat 2 memory-outed and crashed on many instances with disconnected components. The solution I came up with was a program that concatenates two or more problems into one, renumbering the variables. This meta-fuzzer (one that uses fuzzers as inputs) was then used to fuzz the disconnected component handling system. Although this sounds sufficient, it is in fact far from it. Firstly, the problems generated are disconnected at top-level, whereas real-world problems sometimes only disconnect later, during the search. Secondly, it took 2, maybe 4 CNF inputs, but real problems sometimes have thousands of disconnected components. Both of these issues had to be addressed.
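A minimal sketch of the renumbering at the heart of such a concatenator, with made-up type and function names; the real tool of course also parses and writes DIMACS:

#include <vector>

// A CNF as a number of variables plus clauses of DIMACS-style literals.
struct Cnf {
    int num_vars;
    std::vector<std::vector<int>> clauses;
};

// Concatenate CNFs so that each input occupies its own, disjoint
// variable range: the result is disconnected at top-level by construction.
Cnf concatenate(const std::vector<Cnf>& inputs) {
    Cnf out{0, {}};
    for (const Cnf& in : inputs) {
        const int offset = out.num_vars; // shift this input's variables
        for (const std::vector<int>& cl : in.clauses) {
            std::vector<int> shifted;
            for (int lit : cl)
                shifted.push_back(lit > 0 ? lit + offset : lit - offset);
            out.clauses.push_back(shifted);
        }
        out.num_vars += in.num_vars;
    }
    return out;
}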

One of the most difficult parts of modern SAT solvers is the time-outs of complicated algorithms. Time-outs sometimes make code quite convoluted, and are thus a source of bugs. However, most fuzzers generate problems that are too small for any time-outs to occur, severely limiting the scope of the fuzzer: it never exercises a part of the system that is fully exercised by real-world problems. To fix this, I have written a short fuzzer script that generates enormous but simple problems with millions of variables. This helped me fix a lot of issues in time-outs. I advise having a RAM disk when using it, though, as it writes gigabyte-size CNFs.
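As an illustration, a generator in this spirit can be just a few lines — this is not my actual script, only a sketch of the idea: a huge, trivially satisfiable problem.

#include <cstdio>

// Emit a huge but trivially satisfiable CNF: a unit clause plus a long
// implication chain x1 -> x2 -> ... -> xn. Millions of variables, yet
// easy to solve; the point is only to push expensive in-processing
// algorithms into their time-out code paths.
int main() {
    const int n = 5 * 1000 * 1000;  // 5M variables
    printf("p cnf %d %d\n", n, n);  // 1 unit + (n-1) binary clauses
    printf("1 0\n");
    for (int i = 1; i < n; i++)
        printf("-%d %d 0\n", i, i + 1);
    return 0;
}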

I also use some other fuzzers, namely one that exercises XOR manipulation, and the sgen4 CNF generator by Ivor Spence (PDF, pages 85-86), which generates problems of tunable hardness.

My fuzzing system

Fuzz CNF generators are nice, but not enough on their own: a system around them is necessary to realize the full benefit. Writing one sounds trivial at first, but my system around the fuzzers is about 1000 lines of Python code, so maybe it’s not so trivial after all. It has to do a number of things to be truly useful.

First, the system needs to check the solutions given by the solver. If the answer is SAT, it needs to read the (gzipped) CNF file and check the (partial) solution against it. If the answer is UNSAT, it needs to call the DRUP checker to verify the proof, and/or call another solver to see whether it finds a solution — and verify that the other solver’s solution is correct. I actually run both the DRUP checker and another solver; you can never be sure enough. Note that solution verification also complicates time-limiting: we don’t want to wait 30 minutes just because a fuzz CNF is too complicated, and we don’t want to indicate an error just because the problem is hard.
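A minimal sketch of the SAT-side check; the real system also gunzips the CNF, parses it, and enforces time limits:

#include <map>
#include <vector>

// Check a (possibly partial) model against a CNF: every clause must
// contain at least one literal that the model sets to true. Variables
// absent from the map are unassigned.
bool check_model(const std::vector<std::vector<int>>& clauses,
                 const std::map<int, bool>& model) {
    for (const std::vector<int>& cl : clauses) {
        bool satisfied = false;
        for (int lit : cl) {
            auto it = model.find(lit > 0 ? lit : -lit);
            if (it != model.end() && it->second == (lit > 0)) {
                satisfied = true;
                break;
            }
        }
        if (!satisfied)
            return false; // clause falsified or left unsatisfied
    }
    return true;
}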

Secondly, fuzzing is not only about generating CNFs that trigger bugs, but also about saving these problematic cases for later, as a form of regression test. These regression tests need to be run regularly so that bugs that came up before don’t come up again. I have a large set of regression tests that I have accumulated over the years. They help me increase confidence that when I improve the solver, I don’t accidentally re-introduce bugs. My fuzzer system helps manage and run these regression tests.

Finally, we don’t just want to fuzz the solver through the CNF interface. SAT solvers typically expose a library interface where one can add clauses, solve (with assumptions), and check the conflict returned in terms of the assumptions given. For now, CryptoMiniSat’s library is fuzzed through a specialized CNF format where “c Solver::solve()” lines are interspersed with the regular clauses. These lines instruct the solver to call the solve() method and output the solution to a file. This file is then read by the fuzzing system and verified by copying parts of the file into another CNF and using the DRUP checker and another SAT solver to verify the results. This simulates library usage without assumptions. Thanks to Lukas Prokop, who alerted me to bugs in assumptions usage, assumptions-based solving is also simulated, through “c Solver::solve( LIT LIT LIT…)” calls, where the LITs are replaced with random literals.
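For illustration, an input in this specialized format might look like the following; the exact literals (and header-handling details) here are made up:

p cnf 4 4
1 2 0
-1 3 0
c Solver::solve()
-3 4 0
c Solver::solve( 2 -4 )
-2 -4 0
c Solver::solve()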

Future work

I want to do two things related to testing and fuzzing. First, I want to improve the checker so that it checks for more problems: I want to check the conflict returned when assumptions are used. Further, I want to check time-outs — this will require some coordination between the fuzzer system and the solver, where times are printed by the solver and later checked for correctness. Some optimisations must never take more than N seconds on a fast enough machine, and this could be checked relatively easily.

Second, I want to move CryptoMiniSat to a test-driven system. This will require quite extensive code re-writing so that test harnesses will be easier to write. However, it will help fix one issue that hasn’t been talked about — kind of the elephant in the room: performance bugs. They are quite prevalent and cannot be caught by simple fuzzing. They plague many SAT solvers, I’m sure, as I keep finding them at every corner. They are the silent bugs that prevent solvers from being more performant, yet they don’t diminish the resulting solution — only the time it takes to find it.

Why CryptoMiniSat 3.3 doesn’t have XORs

I’ve had a number of inquiries about why CryptoMiniSat 3.3 doesn’t have native XOR clause support. Let me explain what I think the benefits and the drawbacks are of having native XOR support and why I made the choice of removing them.

Benefits

The most trivial benefit is that users don’t have to translate XORs into plain CNF. Although the transformation is rather easy, some prefer not to go through the hassle. The principle is to cut the XOR into smaller XORs by introducing new variables, and then to represent the cut-down XORs by banning all the wrong combinations. For example, the XOR x+y+z+w=0 is cut into two XORs x+y+r=0, z+w+r=0, and then x+y+r=0 is represented as \neg x \vee y \vee r, x \vee \neg y \vee r, \ldots.
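Here is a minimal sketch of this cut-and-ban translation; the function names are made up, and the cutting chains fresh variables rather than cutting in half as in the example above, but the principle is the same:

#include <cstdio>
#include <vector>

// Emit the CNF clauses for the XOR vars[0] + ... + vars[k-1] = rhs.
// Each clause bans exactly one assignment, whose parity equals the
// number of negated literals mod 2; so we emit every clause whose
// banned assignment has the wrong parity (!= rhs).
void xor_to_cnf(const std::vector<int>& vars, bool rhs) {
    const unsigned k = (unsigned)vars.size();
    for (unsigned pat = 0; pat < (1u << k); pat++) {
        if ((__builtin_popcount(pat) & 1) == (int)rhs)
            continue; // this clause would ban a satisfying assignment
        for (unsigned i = 0; i < k; i++)
            printf("%d ", ((pat >> i) & 1) ? -vars[i] : vars[i]);
        printf("0\n");
    }
}

// Cut xs[0] + ... + xs[n-1] = 0 (n >= 2) into 3-long pieces by chaining
// fresh variables: each fresh r equals the XOR of the prefix so far.
void cut_xor(const std::vector<int>& xs, int& next_fresh_var) {
    int acc = xs[0];
    for (size_t i = 1; i + 1 < xs.size(); i++) {
        const int r = next_fresh_var++;
        xor_to_cnf({acc, xs[i], r}, false); // acc + xs[i] + r = 0
        acc = r;
    }
    xor_to_cnf({acc, xs.back()}, false); // closes the chain
}

Called on x+y+z+w=0, this emits x+y+r_1=0, r_1+z+r_2=0, r_2+w=0 — slightly different from the two-piece cut above, but equivalent.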

A more substantial benefit is that propagation and conflict generation can be done natively. This is quite advantageous, as it means that far fewer clauses and variables need to be visited while propagating and generating the UIP conflict. Fewer variables and clauses visited means fewer cache misses, and, due to the more compact representation, less memory overhead and thus even better cache usage. (For the aficionados: an interesting side-effect of XORs is that blocked literals cannot be used for XOR clauses.)

The biggest benefit of having native XORs is that Gaussian elimination can be carried out at every level of the search tree, not just at the top-level. Although most people I have talked with still think that CryptoMiniSat 2 performs Gaussian elimination only at the top-level, this is not the case. Performing Gauss at top-level is a two-day hack. Performing it at every level took me about half a year to implement (and about 5K lines of code): it requires leaving traces of past computations and computing reasons for the propagations and conflicts indicated by the Gaussian elimination algorithm. According to my highly unofficial count, exactly 2 people use Gaussian elimination at deeper than top-level: Vegard Nossum and Kuldeep S. Meel. Out of these two users, only the latter has indicated speedups. It speeds up crypto problems, but not many others, so it’s completely disabled by default.

Drawbacks

The main drawback of having native XOR clauses alongside normal CNF clauses is the loss of simplicity and universality. Let me explain in detail.

Loss of simplicity. Once a variable is present in both a normal and an XOR clause, it cannot be eliminated in a simple way. I don’t even want to try to express the clauses that would result from doing variable elimination in such cases — it’s probably horribly convoluted. All algorithms in the code now need to check for XORs. On-the-fly clause strengthening cannot strengthen an XOR with a normal clause, of course. Clause cleaning has to take into account that when a 3-long XOR is shortened to a 2-long one, it indicates a new equivalent literal, which may lead to immediate UNSAT. I could go on; the list is long.

Another problem is that XORs make the state more complex. Until now there were only clauses of the form a \vee \neg b \ldots, some of them specially stored, such as binary clauses. The state of the solver goes through many transformations — think of them as LLVM passes — while it is in-processed and solved. The more complex the state, the larger and more complex the code. While a simple clause-cleaning algorithm can be expressed in about 20 lines of code without implicit binary & tertiary clauses, the same algorithm blows up to about 100 lines of code with those optimizations. Once you add XOR clauses, it jumps to around double that. We just made our code 10x larger. Imagine adding native support for, e.g., at-most-n constraints.

Loss of universality. If a variable occurs only in XOR clauses, it can now be eliminated, but only at the XOR level. A pure variable at the XOR level indicates an XOR clause that can be removed and later always satisfied. XOR subsumption is when one XOR subsumes another: the larger one can be removed and their XOR added in. All of these are implemented in CMS 2, and there are probably many others out there. We could try to implement them all, but this is a whole new universe that may open the gates of hell. Worst of all, most of these techniques on XORs are probably already simulated by in-processing techniques at the CNF level: pure XOR variables are simulated by blocked clause elimination, and XOR-based variable elimination, while it doesn’t make practical sense at the CNF level, could be simulated by normal variable elimination if heuristics were ignored. I could go on.

What CryptoMiniSat 3.3 does

The compromise I came up with for CryptoMiniSat 3.0 is that I regularly search for XOR clauses at the top-level, perform top-level Gaussian elimination, and add the resulting new information back to the CNF. Binary and unitary XOR clauses are the simplest to add back.

In the SAT’11 application benchmark set, 256 out of 300 problems contain XOR clauses. The number of XOR clauses found, when they are found, is on average 2905, with an average size of 4.2 literals. Using Gaussian elimination, and counting only when something was extracted, the average number of literal equivalences (binary XORs) extracted is 130, while the average number of unitary clauses extracted is 0.15.

Here is a typical example output:

c Reading file 'valves-gates-1-k617-unsat.shuffled-as.sat03-412.cnf.gz'
c -- header says num vars:         985042
c -- header says num clauses:     3113540
[.. after about 124K conflicts ..]
c XOR finding  Num XORs: 104091 avg size:  3.0 T: 2.11
c Cut XORs into 2185 block(s) sum vars: 180818 T: 0.11
c Extracted XOR info. Units: 0 Bins: 1275 0-depth-assigns: 0 T: 8.34

Here, about 104K XOR clauses were found (using a sophisticated algorithm) within 2.11s, all of size 3. These XOR clauses were subjected to disconnected component analysis (within the XOR sphere) and were found to belong to 2185 components. It is important to cut the XORs into groups, because Gaussian elimination is an O(n^{\approx 2.5}) algorithm, and if we can reduce n by having more than one matrix, the speed is significantly increased. Next, these 2185 matrices were Gauss-eliminated and found to contain 1275 binary XOR clauses, which were promptly marked as equivalent literals.

To perform Gaussian elimination at top-level, CryptoMiniSat uses the excellent m4ri library, maintained by my old colleague and friend Martin Albrecht. It’s a lightweight yet extremely fast Gaussian elimination system that uses all sorts of nifty tricks to do its job. Unfortunately, for large matrices it can still take a long while, as the algorithm itself is not cheap, so I have a cut-off for matrices that are too large.
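For the curious, here is a toy example of feeding a few XORs to m4ri, to the best of my memory of its basic API (mzd_init, mzd_write_bit, mzd_echelonize); CryptoMiniSat’s real integration is of course far more involved:

#include <m4ri/m4ri.h>
#include <cstdio>

// Build one row per XOR over GF(2); the extra last column holds the
// right-hand side. After echelonization, short rows correspond to
// units (one variable bit set) or equivalent literals (two bits set).
// Compile with e.g.: g++ gauss.cpp -lm4ri
int main() {
    const rci_t rows = 3, vars = 3;
    mzd_t* A = mzd_init(rows, vars + 1); // +1 for the rhs column

    // x1 + x2 = 0
    mzd_write_bit(A, 0, 0, 1); mzd_write_bit(A, 0, 1, 1);
    // x2 + x3 = 0
    mzd_write_bit(A, 1, 1, 1); mzd_write_bit(A, 1, 2, 1);
    // x1 + x2 + x3 = 1
    mzd_write_bit(A, 2, 0, 1); mzd_write_bit(A, 2, 1, 1);
    mzd_write_bit(A, 2, 2, 1); mzd_write_bit(A, 2, 3, 1);

    const rci_t rank = mzd_echelonize(A, 1); // reduced row echelon form
    printf("rank: %d\n", (int)rank);         // here: 3, i.e. x1=x2=x3=1
    mzd_free(A);
    return 0;
}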

What a future CryptoMiniSat 3.x could do

Once we have extracted the XORs, we could just as well keep them around during the search and perform Gaussian elimination at levels below the top-level. This would bring us close to the old system in terms of real performance — native propagation (which would still be unavailable) can only give a 1.5x-2x speedup, but Gaussian elimination at every level can give much more. I would need to clean up a lot of code, and then, maybe, this would work. Maybe I’ll do it one day. Though after spending weeks on it, people will probably still believe CryptoMiniSat only does top-level Gauss — which, at least right now, is the case.

Conclusions

Implementing a new theory such as XOR deep in the guts of a SAT solver is neither easy nor does it provide a clear advantage in most situations. Those who push for these native theories have either not tried implementing them in a complicated solver such as lingeling/SatELite/CryptoMiniSat/clasp, or have already bitten the bullet, as the clasp group and I did, and are probably being limited by that decision when extending the system with new techniques. The resulting larger internal state leads to edge cases, exceptions, and much, much more code.