All posts by msoos

CCC madness

The Chaos Computer Club (CCC) is having its yearly conference starting today. It’s a madhouse here, which is great for ideas, so I have been having this rush of ideas implemented. First off, it seems that complex problems with few variables are way too difficult to solve even with distributed solving. I have been trying to solve some difficult problems, but no matter how much computer power I throw at them (and I have been throwing >2 years’ worth, with >300 CPU cores running), not much progress is being made. I have lately been attributing this “failure” to one prime problem: lack of expressiveness.

Basically, I feel like the SAT solver is trying to express some complex functions with learnt clauses. For instance, an adder. However, the only thing it can do, is describe this function using a Karnaugh Map. Anybody who has tried to express an adder without introducing new variables is well aware that that’s a difficult task. So, we need new variables. This is the point where some collaboration I have lately been doing with some researchers (Laurent Simon and George Katsirelos) comes into play. Following their footsteps, I realised we could introduce new definitions, using a new targeting method we developed. So, I did just that, and here is a nice result for a very difficult problem:

num: 7750 lit1: 741 lit2: 4641
num: 2339 lit1: 1396 lit2: 1670
num: 2172 lit1: -719 lit2: 741
num: 2169 lit1: -719 lit2: 4641
num: 2061 lit1: 1669 lit2: 1670
num: 2010 lit1: 1670 lit2: 1734
num: 1973 lit1: 1670 lit2: 1882
...
time: 2.53 seconds

Where literals lit1 and lit2 are both present in num number of clauses. So, for example literals 741 and 4641 are both present in 7750 clauses. Which means that if we introduce a definition with a new variable 10000:

10000 = 741 OR 4641

(which requires 2 binary clauses and one 3-long clause) we can remove 7750 literals from the problem, while increasing the propagation potential. For example, if there was a clause

741 OR 4641 OR 2 OR 3 = TRUE

we can make it into

10000 OR 2 OR 3 = TRUE

What I have just described is a (dumb) version of extended resolution, the holy grail that many have tried and only few have succeeded at. It is a holy grail because it is provably more powerful than normal resolution, but heuristics of when and how to introduce the new variables have been difficult to come up with. What I have just described at least would reduce the problem size: removing e.g. ~7700 literals while only introducing 3 short clauses seems to worth the effort. It might not simulate extended resolution (ER), but it should speed up the solving, while giving a (lowly) shot at ER.

PS: If all of this sounds mad, that may be attributed to the ~500 people around me loudly developing and discussing issues ranging from peer-to-peer routing to FPGA programming

Distributed SAT solving is fun

I have lately been trying to keep my promise I made on the webpage of CryptoMiniSat: “The long-term goals of CryptoMiniSat are to be an efficient sequential, parallel and distributed solver”. The first step was to make a relatively efficient sequential solver. With the winning of the SAT Race’10, that was at least partially accomplished. The next step was to make the program parallel. Now, I have made it distributed. The parallel version was made using OpenMP, and you can try it using the “master” branch from the git repository. The distributed version uses MPI, which means it can probably work on most cluster/supercomputer architectures. This distributed version is currently alpha, and is not yet in the git repo.

My first impression of the distributed version is: it’s awsome to solve using 10-20 machines in parallel. The speed is just unbelievable. It’s also great to have the multi-threaded and distributed option in one program, as I can now launch just one MPI process on each machine, and the 8 cores on each machine can use threads to work together, instead of using the somewhat slower MPI. I haven’t measured the speedup, and I haven’t measured the MPI load, so I cannot say anything yet about efficiency. All I can say, is that using 160 threads is a little faster than using only one.

As for technical details, I am only sharing unitary and binary clauses. It’s a trivial and lazy choice, but it makes it easier to check for redundant clauses (thanks to Armin Biere for highlighting to me that this is important), and works well with CryptoMiniSat’s expand-contract loop, where a set of simplification operations are carried out at regular intervals to squeeze the most unitary and binary clauses out of the current knowledge stored inside the clause database.

I am looking forward to breaking a somewhat problematic crypto-problem with the new distributed CryptoMiniSat. Hopefully this will work, though maybe it’s some way off — I think the original time needed to break the problem might be in the hundreds of days with single-threaded non-distributed CryptoMiniSat. Fingers are crossed that the speedup is non-linear with the distributed version, and I can allocate enough computers for enough time to the job.

Edited to add — The solver has now been running stable on 100 computers, distributed. This is getting to be real fun. Next step is to deploy it on a 342-node cluster, each node double-core. Having 684 threads collaborate will feel mighty ;) Also, it will inspire me to build more heterogeneity into the architecture. It was never designed to run on this many nodes…

Hyper-binary resolution: I was wrong, again

I am not perfect, so I make mistakes. One of the more memorable mistakes I have made has been regarding hyper-binary resolution. More specifically, in this post I wrote that I cannot add as many binary clauses using hyper-binary resolution as Armin Biere, one of the leading SAT solver experts, and I proposed a reason that would have meant that CryptoMiniSat was doing things differently and thus better. Well, I was wrong, on multiple accounts.

First of all, there was an awful bug in the code that meant that hyper-binary resolution was not carried out on negated variables. Second, when it was decided that multiple implications need to be attached to a literal, I didn’t check for redundancy. For example, if v1 had to be connected to v2 and v3, I simply added the clauses
-v1 OR v2 (1)
-v1 OR v3 (2)
However, this may not be very efficient, since if there is already a binary clause
-v2 OR v3 (3)
then clause (2) doesn’t need to be added. The added redundant binary clauses reduced the speed of solving while adding no extra knowledge. There were many of them, too: approx 2/3rd of all binary clauses added were redundant.

Hyper-binary resolution is conceptually not too difficult, but takes a lot of thinking to code efficiently, is very time consuming and its benefits are not clear-cut. I believe the problem is that many binary clauses added this way are ultimately useless, since most are connected to variables that will soon be proved to be of a fixed value. Another possibility is that since problems are pretty structured, and it’s usually best to attack problems in a specific way (which is normally correctly guessed by the VSIDS and polarity-guessing&caching heuristics), the binary clauses added by hyper-binary resolution do not help resolving the problem given the (typically correct) attack method employed by SAT solvers. In other words, the information added is nice, but mostly unused. This is just wild speculation, and I may only think this because my code is slow — I believe Armin’s code is faster, so I should have a look.

Dependent Literals

I have lately been trying to work on branching heuristics. The idea is relatively simple. If we have a cache of what propagates what, we can try to branch on literals that propagate a lot of other literals. The algorithm boils down to the following:

  1. For each literal L, we select a literal DL that propagates L and has the largest tree of propagated literals.. We call this DL the dominating literal. Naturally, some literals don’t have a dominating literal, and some literals appear as dominating literals for many other literals.
  2. When branching, if we are supposed to branch on a literal that has a dominator, we pick the dominator instead of the literal. This not only ensures that the literal will be set, but it will probably also set a number of other literals.
  3. Using the heuristic of step 2, we should reach a conflict earlier, with less decisions taken. This should lead to faster conflicting, and thus faster solving.

Of course things aren’t as simple as they seem. First of all, as George has indicated to me, Jarvisalo and Junttila have published  a paper entitled Limitations of restricted branching in clause learning, which deals with issues related to this branching heursitic. The baseline of that paper is that it’s a bad idea to branch only on dependent (they call them input) variables. I haven’t really got very deep into this, but there are some subtle differences between the heuristic above and dependent variables, notably that I have treated dependent literals above, while Matti and Tommi are treating dependent variables.

As for the advantages of the above heuristic, it seems to work on some instances in practice. In particular, it allows me to solve some cryptographic instances significantly faster — in fact, some instances that have been extremely hard to solve are now solveable. To me, this looks like an indication that the heuristic is not entirely useless. There are some subtle problems, but it seems that a mixture of randomly branching either on the literal or on its dominator fixes most problems. Indeed, I think CryptoMiniSat 3.0.0 will be released with this experimental feature turned on by default.

PS: CryptoMiniSat 3.0.0 is coming soon, I believe. It’s going to provide a fully multi-threaded library&executable, with a MiniSat-compatible syntax, so anyone will be able to multi-thread his or her SMT solver at a whim.

Caching

“Representation is the essence of programming”
Frederic P. Brooks Jr., “The Mythical Man-month”

Looking through the literature on SAT solvers, it is rare to find any algorithm that uses a form of time-memory trade-off. In fact, it is rare to find any algorithm that uses too much memory. This also shows up in practice, as it’s rare to find a SAT solver using too much memory. This translates to the following: if a SAT solver can in an effective way utilize the typically abundant memory, that SAT solver will have an advantage. Based on this logic, I think CryptoMiniSat is doing good.

What I have realised is that it’s great to know all the literals we can reach from any given literal. Let me explain. Let’s say we have a look at literal “v1”, i.e. “v1 = true”. We propagate this literal as usual, and we reach a set of literals that are implied. For example, we reach “v2, v3, -v4”. This knowledge is regularly computed by modern SAT solvers, but is also quickly thrown away. I advocate keeping this, and here is why. There are a number of very interesting things we can do with these, three of which I have found useful.

Number one. This is the best, and the least obvious one. The algorithm used for computing equivalent literals (i.e. “v1 = -v2” or “v1 = v4”) is a variation of Tarjan’s algorithm for finding strongly connected components (SCC). This algorithm requires a set of binary clauses such as “-v1 v2” and “v1 -v2” as input to find equivalent literals, in the example case “v1 = -v2”. The most obvious way to “feed” this algorithm with information is to give it all the binary clauses we have. But that misses out on all the binary clauses that are not directly apparent in the problem, but  could be derived from the problem. For example, if “-v1 -v4” was not directly in the problem, SCC cannot find the equivalence “v1 = v4”. Naturally, by using our cache, we can be sure that “-v1 -v4” is part of the problem (since “v1” propagates “-v4”). So, using this cache, we can substantially increase the power of SCC. More equivalent literals lead to less variables, and an overall faster solving.

Number two. Clause vivification is a method to make the original clauses of the problem shorter. It’s a relatively simple algorithm that enqueues the literals in the clauses one-by-one and checks if their propagation leads to a conflict. If so, then the clause can be shortened. With the cache, we can do something similar, though less powerful in terms of reasoning power, but far more effective in terms of speed. The trick is to simply try every single literal in the clause: if a literal propagates the inverse of another literal in the clause, we can remove it. For example, if the clause is “a b -c”, and in the cache for “b” there is “f,g,h,j,-c”, then we know that conceptually there exists a clause “-b -c”, which means we can now remove “b” from the original clause, using the the self-subsuming resolution rule. This form of vivification is, although technically less strong than normal vivification, is typically 50-100x faster than normal vivification and is almost as powerful. This kind of speed advantage means it can essentially be carried out without (time) penalty.

Number three. When generating conflict clauses, MiniSat-type conflict minimisation has now become commonplace. This uses the clauses involved in the conflict to carry out self-subsuming resolution on the conflict clause generated. Just as with clause vivification, we can use our cache to carry out self-subsuming resolution with the (conceptually binary) clauses stored in the cache. This is not much different from clause vivification, but it allows us to do simplification in the middle of a restart, instead of patiently waiting for the restart to end. Furthermore, it can uncover that certain learnt clauses can become binary, thus alleviating the problem of cleaning “useless” learnt clauses that could have become binary through, e.g. clause vivification.

I am aware that all the above three could be carried out without a cache — in fact, CryptoMiniSat could do (and most of the time did) all of the above without it. The advantage of having the cache is speed: things that used to take immense amounts of time can now be done really fast. However, what interests me the most in this cache, is the new uses that will come of it. I was originally only aware of number (3), then when I realised (1), I dived deep and realised that (2) can be done. I think more uses will eventually emerge.