Category Archives: SAT

Implicit binary clauses

October 28, 2010SATbug, Optimisation, subsumptionmsoos

I have lately been trying to get CryptoMiniSat to use implicit binary clauses. The idea is that since binary clauses are very trivial (just two literals), and they don’t really need to keep state (like clause activity), they don’t really need to be stored at a separate location. Instead, they can be stored directly in the watchlists. There are a number of advantages and disadvantages that come with this approach.

The main advantage is a notable reduction of memory usage and memory fragmentation. The first is obvious: since we don’t allocate space for binary clauses separately, the memory usage of the program should go down. This is especially true since SAT problems usually contain a huge number of binary clauses. The secondary benefit, that of reduced memory fragmentation is not really that much of an advantage if someone uses, e.g. the boost pool library.

The disadvantages are mainly twofold. Firstly, bugs are very difficult to find. Since there is not one central database of binary clauses (as before), it becomes difficult to check the consistency of the watchlists. However, if inconsistencies creep in, then the solution found by the SAT solver could be wrong. Worst of all, consistency is difficult to keep, as binary clauses often need to be worked on by e.g. subsumption, variable elimination, etc. The second biggest disadvatage is that if a new algorithm comes up that needs a database of binary clauses, this database would need to be re-built every time to run that algorithm, which can be very costly in terms of time.

All in all, it took me about 1 day to implement implicit binary clauses, and about 3 days to debug it. Surprisingly, it came with some very positive side-effects. Firstly, the debugging session took out some very long-standing bugs in CryptoMiniSat. Secondly, since binary clauses represent most clauses in typical SAT problems, and since binary clauses cannot be subsumed by anything other than binary clauses, the subsumption algorithm has been notably speeded up and its memory usage has been reduced. The worst part of using implicit binary clauses has been the fact that I can no longer use binary clause-sorting to find binary xor clauses, and I must resort back to Tarjan’s strongly connected component finding algorithm. This algorithm is algorithmically faster (only O(V+E) versus O(n*logn) ), but practically slower, since I need to carry it out repeatedly, and I cannot save state. Furthermore, I haven’t yet coded it down, so I am using boost’s graph algorithm, which means that I now have a dependency in CryptoMiniSat. This will eventually be corrected, but it is already causing some trouble.

In the meantime, I have been parallelising CryptoMiniSat. Funnily enough, it took me about 2 days in total to multi-thread CryptoMiniSat versus the 4 days it took to implement implicit binary clauses. Oh well.

Wasting cluster time

October 15, 2010SATcluster, programmingmsoos

Today marked the 100th cluster experiment I have conducted on the Grid’5000 cluster in Grenoble. The computers in this cluster are very powerful beasts, equipped with two Xeon Core i7-based processors, each with 4 cores for 8 cores in total with some impressive memory size and connectivity. The sad part is that since CryptoMiniSat is sequential software, I only use 1 core from the 8 total, since I cannot allow to have fluctuating test results due to memory-bandwidth problems when running multiple instances of the solver on the same physical computer. I have just counted, and using a very conservative estimate, I have wasted 41.8 years of CPU core time to get CryptoMiniSat up to speed.

Was it worth wasting such immense amount of CPU core time to speed up a SAT solver? I think yes. First of all, I only used the cluster during night and during the weekends, when load was negligible. So, in a way I wasn’t wasting much apart from my own time launching the cluster experiments, collecting and analysing the results. Secondly, the feedback from the cluster immensely influenced my research and engineering decisions, and taught me many things regarding SAT. The best thing I have learnt is: never trust your intuitions when it comes to SAT. Secondly, good ideas on paper rarely turn out to be good ideas when first implemented, but with heavy tuning, they usually turn out to work well. Let me detail two examples.

Clause Vivification (I called it asymmetric branching) is a relatively simple example of why tuning is always necessary. This optimisation shortens clauses it is fed to, however, it is slow. The idea I used when tuning this optimisation is that the longest clauses are probably responsible for the slow solving and the long generated conflict clauses. So, I sort clauses according to size first, and then do clause vivification on the topmost (i.e. longest) clauses. Furthermore, I realised that enqueueing literals 2-by-2 on the stack (instead of enqueueing 1-by-1) saves significant time, and is almost as good as enqueueing them 1-by-1 — the difference is at most 1 literal in the final simplified clause.

A more complex example is that of speeding up self-subsuming resolution. The problem was that keeping all clauses attached in the watchlists slowed down self-subsuming resolution. However, the obvious solution of detaching all clauses and reattaching them after self-subs. resolution was incomplete, strangely enough. With multiple cluster experiments it became apparent that keeping clauses attached is important, because some propagations are made during the running of the algorithm, and propagating these (very few!) variables is crucial. The solution was to keep natively stored (2- and 3-long clauses) in the watchlists. These natively kept clauses don’t have pointers in the watchlists, and therefore their representation at the pointer’s location can be updated. The watchlists are therefore partially cleaned, then self-subsuming resolution is carried out, then the watchlists are fully cleaned and finally the watchlists are fully populated — not exactly a trivial algorithm, but it achieves both goals of fast and complete self-subsuming resolution.

—
Off-topic: For those who like contemporary art, here is a gem

CryptoMiniSat 2.7.1 released

October 13, 2010SATReleasemsoos

It has been a long and winding road, but finally, CryptoMiniSat 2.7.1 has been released. The main additions relative to version 2.6.0 are:

Lots of documentation. I have added ~1’500 lines of comments to aid developers and user alike
Transitive on-the-fly self-subsuming resolution — an optimisation I have blogged about
Memory allocator has been fixed and updated
DIMACS parsing is now much better, which should allow for very interesting usage scenarios

Although the above list doesn’t seem to talk about performance, let me talk about it first. My personal point of view on performance is that different applications have different needs, so even if overall some solver is the best, it might not mean that it is the best for a particular application. I compared the performance of CryptoMiniSat 2.7.1 to the other mainline solvers, namely MiniSat 2.2, PrecoSat465, and lingeling. I ran the problem instances of the 2009 SAT Competition, with a timeout of 6’000 seconds on some powerful computers. The results are the following:

CryptoMiniSat could solve 225 problem instances (223 within SATComp’09 limits), and the second best solver, lingeling, could solve 211 (207 within SATComp’09 limits). Interestingly, both MiniSat 2.2 and lingeling were better than CryptoMiniSat 2.7.1 at low timeouts, and lingeling kept this advantage the longest, until ~500 seconds, above which CryptoMiniSat took over. Not being too fast at the beginning is a conscious choice by me, as I think it helps in the long run, though I might be wrong. There is much to be improved, and I hope that the immense amount of documentation added will help those willing to try to iron out these shortcomings.

As for the DIMACS parsing update, it basically allows to add information to clauses to distinguish between learnt and non-learnt clauses. This helps, because it is now possible to simply dump the learnt clauses at any moment during the search, and then re-start the search more-or-less where it was stopped. The command-line options for this are “–dumplearnts=FILENAME –maxdumplearnts=3” to dump, and “–alsoread=FILENAME” to re-read. This is currently quite a crude way of stopping-and-starting, but hopefully it will improve in the future.

All in all, I am quite happy about the new release. I have put lots of time into it, and I hope you will enjoy using it :)

Configuration management at IRILL days

October 4, 2010SATconfiguration management, free softwaremsoos

Today I attended the IRILL days here in Paris. IRILL’s goal is to bring together high level researchers and scientists, FOSS (Free and Open Source) developers, and FOSS industry players to tackle the three fundamental challenges that FOSS poses:

scientific: study problems raised by the development and maintenance of FOSS code
educational: adapt the teaching curriculum to FOSS
economic: help create a sustainable ecosystem for FOSS innovations

The most interesting part of the conference for me was Debian package dependency management, which currently is done by a number of ad-hoc algorithms, a problem that the Mancoosi Project is tacking. Essentially, there are a number of program packages that need different libraries and services, and we need to decide which ones we need to install in order to satisfy those dependencies when installing/upgrading a package. This dependency graph and the package to be installed/upgraded is capture in a language called CUDF (complete PDF description here). When deciding which packages to retain,upgrade or install, there are some goals which we must achieve, such as minimising the number of new packages as much a possible: the different optimisation criterias are described here. There is a competition associated with this computationally difficult problem, the MISC Competition, which was first conducted this year, during the FLoC conference.

Interestingly, the problem of package dependencies maps very well to problems faced by industry: configuration management. For instance, many high-end car models have thousands of options. Some of these options are incompatible, and some of them depend on other options being taken. For example, it may be necessary to take a certain color scheme to apply certain figurative patterns on the car. Although this sounds trivial, once the number of options go up, the complexity increases at a typically exponential rate, after which finding suitable solutions for customers might become a real challenge. For example, I wouldn’t be surprised if a modern airplane such as the A380 had customer options approaching the tens of thousands. Managing this automatically is a really interesting and a very real challenge, in the solution of which I hope CryptoMiniSat will be able to participate in.

Open source software? Free software?

September 30, 2010SATCollaborative effort, free software, linuxmsoos

Today I attended the Open World Forum conference here in Paris. Basically, it’s a business-oriented conference to do networking for folks in the free/open source industry. Some of the panelists were sometimes really boring, such as the “French Secretary of State responsible for the Digital Economy” who seemed to have deeply confused “free as in speech” and “open source” software — a grave mistake by my book. The panelist, who knew the difference of course, regularly overemphasised the use of “open source” software, but the notion of “free as in speech” was lost, and mentioned rarely, with the notable exception of the Red Hat folks. With the release of CryptoMiniSat, which I explicitly released under a free software licence, GPLv3, I of course disagreed.

The highlight of the conference for me was meeting the current Debian Leader, Stefano Zacchiroli, and researcher Roberto Di Cosmo. I have been using Debian for a very long time, and I always wanted to contribute. However, the best way to contribute is always with your expertise, which for me is SAT solvers. So, I approached Stefano with the idea of configuration management in Debian (dpkg), for which CryptoMiniSat would be a good fit, I think: complex package dependencies could be resolved with ease using CryptoMiniSat. If included in dpkg, CryptoMiniSat could take the prize of the most deployed SAT solver away from SAT4J, which currently holds this title due to its inclusion in the Eclipse development package. Fingers crossed… and lots of work is ahead.

Wonderings of a SAT geek

A blog about SAT solving and cryptography