Satisfiability problem solvers, or SAT solvers for short, try to find a solution to decidable, finite problems such as cryptography, planning, scheduling, and the like. They are very finely tuned engines that can be looked at in two main ways . One is to see them as proof generators, where the SAT solver is building a proof of unsatisfiability as it runs, i.e. it tries to prove that there is no solution to the problem. Another way is to see SAT solvers as smart search engines. In this blog post, I’ll take this latter view and try to explain why I think intermediary variables are important. So, for the sake of argument, let’s forget that SAT solvers sometimes restart the search (forgetting where they were before) and learn clauses (cutting down the search space and remembering where not go again). Let’s just pretend all they do is search.
Searching
The CryptoMiniSat SAT solver used to be able to generate graphs that show how a search through the search space went. Search spaces in these domains are exponential in size, say, 2^n in case there are n variables involved. I don’t have the search visualization code anymore but below is an example of such a search tree. The search starts at the very top not far from the middle, it descends towards the bottom left, then iteratively backtracks all the way to the top, and then descends towards the bottom right. Every pentagon at the bottom of a line is a place where the SAT solver backtracked. Observe that it never goes all the way back to the top — except once, when the top assignment needs to be flipped. Instead, it only goes back some way, partially unassigning variables. The bottom right corner is where the solution is found after many-many partial backtracks and associated partial unassignements:
What I want you to take away from this graph is the following: the solver iteratively tries to set a variable a value, calculates forward, and if it doesn’t work, it will partially backtrack, flip its value to its opposite, then descend again.
Brute force search vs. SAT solving
Trying one value and then trying the other sounds suspiciously like brute force search. Brute force search does exactly that, in a systematic and incredibly efficient way. We can build highly specialized executables and even hardware, to perform this task. If you look at e.g. Bitcoin mining, you will see a lot of specialized hardware, ASICs, doing brute-force search. And if you look at rainbow tables, you’ll see a lot of bit slicing.
So why waste our time doing all this fancy value propagation and backtracking when we could use a much more effective, systematic search system? The answer is, if you generated your problem description wrongly, then basically, for no good reason, and you are probably better off doing brute-force search. But if you did well, then a SAT solver can perform a significantly better search than brute-force. The trick lies in intermediary variables, and partial value assignments.
Partial value assignments
So let’s say that your brute force engine is about to check one input variable setting. It sets the input variables, runs the whole algorithm, and computes the output. The output is wrong though. Here is where things go weird. The brute force engine now completely erases its state, takes another input and runs the whole algorithm again.
So, brute force does the whole calculation again, starting from a clean state, every time. What we have to recognize is that this is actually a design choice. Another design choice is to calculate what variables were affected by one of the input bits, unset these variables, flip the input bit value, and continue running the calculation. This has the following requirements:
- A way to quickly determine which intermediate values depend on which other ones so we can unset variables and know which intermediate, already calculated, dependent variables also need to be unset.
- A way to quickly unset variables
- A good set of intermediary values so we can keep as much state of the calculation as possible
If you think about it, the above is what SAT solvers do, well mostly. In fact, they do (1) only partially: they allow variables only to be unset in reverse chronological order. Calculating and maintaining a complete dependency graph seems too expensive. So we unset more variables than we need to. But we can unset them quickly and correctly and we compensate for the lack of correct dependency check in (1) by caching polarities. This caches the independent-but-nevertheless-unset variables’ values and then hopes to reassign them later to the correct value. Not perfect, but not too shabby either.
Modeling and intermediary variables
To satisfy requirement (3) one must have a good set of intermediary variables in the input problem (described in DIMACS format), so the SAT solver can both backtrack and evaluate partially. Unfortunately, this is not really in the hands of the SAT solver. It is in the hands of the person describing the problem. Modeling is the art of transforming a problem that is usually expressed in natural language (such as “A person cannot be scheduled to be on a night shift twice in a row”) into a problem that can be given to a SAT solver.
Modeling has lots of interesting constraints, one of which I often hear and I am confused by: that it should minimize the number of variables. Given the above, that SAT solvers can be seen at as partial evaluation engines that thrive on the fact that they can partially evaluate and partially backtrack, why would anyone try to minimize the number of variables? If the solver has no intermediary variables to backtrack to, the solver will simply backtrack all the way to the beginning every time, thus becoming a really bad brute-force engine that incidentally tracks a dependency graph and is definitely non-optimized for the task at hand.
Some final thoughts
In the above I tried to take a premise, i.e. that SAT solvers are just search engines, and ran with it. I don’t think the results are that surprising. Of course, nothing is black-and-white. Having hundreds of millions of variables in your input is not exactly optimal. But minimizing the number of variables given to a SAT solver at the expense of expressive intermediate variables is a huge no-no.