Category Archives: SAT

CryptoMiniSat 3.0 released

April 1, 2013Development, SATRelease, SAT, SAT Racemsoos

CryptoMiniSat 3.0 has been released. I could talk about how it’s got a dynamic, web-based statistics interface, how it has more than 80 options, how it uses no glues for clause-cleaning and all the other goodies, but unfortunately these don’t much matter if the speed is not up to par. So, here is the result for the 2009 SAT Competition problems on a 1000s timeout with two competing solvers, lingeling and glucose:

cryptoms_speed

This of course does not mean that CryptoMiniSat is faster than the other solvers in general. In fact it is slower on a number of instances. What it means is that in general it’s OK and that’s good enough for the moment. It would be awesome to run the above experiment (or a similar one) with a longer timeout. Unfortunately, I don’t have a cluster to do that. However, if you have access to one, and would be willing to help with running the 3 solvers on a larger timeout, please do, I will post the updated graph here.

Update Norbert Manthey kindly ran all the above solvers on the TU Dresden cluster, thanks! He also kindly included one more solver, Riss 3g. The cluster was an AMD Bulldozer architecture with 2cores/solver with an extreme, 7200s timeout. The resulting graph is here:

cryptoms_speed

Riss 3g is winning this race, with CryptoMiniSat being second, third is glucose, and very intriguingly lingeling the 4th. Note that CryptoMiniSat leads the pack most of the time. Also note, this is the first time CryptoMiniSat 3.0 has been run for such a long time, while all the other competing solvers’ authors (lingeling, glucose, riss) have clusters available for research purposes.

Licensing

For those wondering if they could use this as a base for SAT Competition 2013, the good news is that the licence is LGPL so you can do whatever you want with it, provided you publish the changes you made to the code. However, I would prefer that you compete with a name such as “cms-MYNAME” unless you change at least 10% of the code, i.e. ~2000 lines. For the competitions after 2013, though, it’s all up for grabs. As for companies, it’s LGPL, so you can link it with your code, it’s safe, you only have to publish what you change in the library, you don’t have to publish your own code that uses the library.

Features

CryptoMiniSat has been almost completely rewritten from scratch. It features among other things:

4 different ways to propagate
Implicit binary&tertiary clauses
Cached implied literals
Stamping
Blocking of long clauses
Extended XOR detection and top-level manipulation
Gate detection and manipulation
Subsumption, variable elimination, strengthening
4 different ways to clear clauses
4 different ways to restart
Large amounts of statistics data, both into console and optionally to MySQL
Web-based dynamic display of gathered statistics
3 different ways to calculate optimal variable elimination order
On-the-fly variable elimination order update
Super-fast binary&tertiary subsumption&strengthening thanks to implicit bin&tri
On-the-fly hyper-binary resolution with precise time-control
On-the-fly transitive reduction with precise time-control
Randomised literal dominator braching
Internal variable renumbering
Vivification
On-the-fly clause strengthening
Cache&stamp-based learnt clause minimisation
Dynamic strongly connected component check and equivalent literal replacement

Code layout

As for those wondering how large the code base is, it’s about 20KLOC of code, organised as:

cryptoms_overview

On benchmark randomization

March 10, 2013Research, SATBCP, CNF, SAT Race, simplificationmsoos

As many of you have heard, the SAT Competition for this year has been announced. You can send in your benchmarks between the 12th and the 22nd of April, so get started. I have a bunch of benchmarks I have already submitted about 2 years ago, still waiting for any reply from those organizers — but the organizers are different this year, so fingers crossed.

What I want to talk about today is benchmark randomization. This is a very-very touchy topic. However, I fear that it’s touchy for the wrong reasons, and so I think it’s important to talk about it in detail.

What is benchmark randomization?

Benchmark randomization is when a benchmark that is submitted is shuffled around a bit. There are many ways to shuffle a problem, and I will discuss this in a bit, but the point is that the problem at hand that is described by the benchmark CNF should not be changed, or changed only in a very-very minor way, such that everyone agrees that it doesn’t affect the core problem itself as described by the CNF.

Why do we need shuffling?

We need shuffling because simply put, there aren’t enough good benchmarks and so the benchmarks of yesteryear (and the year before, and before, and…) re-appear often. This would be OK if SAT solvers couldn’t be tuned to solving specific problems faster. Note that I am not suggesting that SAT solvers are intentionally manipulated to solve specific problems faster by unscrupulous researchers. Instead, the following happens.

Unintentional random seed improvements

Researchers test the performance of their SAT solvers on specific instances and then tune their solvers, testing the performance again and again on the same instances to check if they have improved performance. Logically this is the best way to test and improve performance: use the same well-defined test-set all the time for meaningful comparison. Since the researcher wants to use the instances that he/she thinks is the current use-case of SAT solvers, he naturally uses the instances of SAT competitions, since those are representative. I did and still do the same.

So, researchers add their idea to a SAT solver, and test. If the idea is not improving things then some change is made and tested again. Since modern CDCL SAT solvers behave quite randomly, and since any change in the source code changes the behaviour quite significantly, a small change in the source code (tuning of a parameter, for example) will change the behaviour. And since the set of problems tested on is fixed, there is a chance that more problems will be solved. If more are solved, the researcher might correctly interpret this as a general improvement, not specific to the problem set. However, it may very well be generic, it is also specific.

The above suggests that the randomness of the SAT solver is completely unintentionally tuned to specific problems — a subset of which will appear next year in the competition.

Easy fixes

Since there aren’t enough benchmark problems, and in particular some benchmark types are rare, I suggest to fix the unintentional tuning of solvers to specific problems by changing the benchmarks in minor ways. Here is a list, with an explanation why I think it’s OK to perform the manipulation:

Propagate variables. Unitary clauses are often part of benchmarks. Propagating some of these, some recursively, gives quite a bit of problem space variation. Propagation is performed by every CDCL SAT solver, and I think many would be surprised if it didn’t help SAT solvers that worked differently than current SAT solvers. Agreeing on performing partial propagation is something that shouldn’t be too difficult.
Renumber variables. For some variable X that is not used (or is fixed to a value that has been propagated), every variable that is higher than X is decremented by one, and the CNF header is fixed to reflect this change. Such a minor renumbering may be approved by every researcher as something that doesn’t change the problem or its structure. Note that if partial propagation is performed there should be quite a number of variables that can be removed. Renumbering some, but not others is a way to shuffle the problem. A more radical way of renumbering variables would be to completely shuffle them, however that would change the way the problem is described in quite a radical way, so some would correctly object and it’s not necessary anyway.
Replace equivalent literals. Perform strongly connected component analysis and replace equivalent literals. This has been shown to significantly improve performance and I have never seen a case where it doesn’t. Since equivalent literal replacement can be performed with a lot of freedom, this is quite a bit of shuffling space. For example, if v1=v2=v3, then any of the v1, v2, v3 can be the one that replaces the rest in the CNF. Picking one randomly is a way to shuffle the instance

There are other ways of shuffling, but either they change the instance too much (e.g. blocked clause removal), or can be undone quite easily (e.g. shuffling the order of the clauses). In fact, (3) is already quite a touchy issue I think, but with (1) and (2) all could agree on. Neither requires the order of the literals or the order of the clauses to change — some clauses (e.g. unitary ones) and literals (some of those that are set) would be removed, but that’s all. The problem remains essentially unchanged such that most probably even the original problem author would easily recognize it. However, it would be different from a SAT solver point of view: these changes would change the random seed of the solver, forcing the solver to behave in a way that is less tuned to this specific problem instance.

Conclusion

SAT solvers are currently tuned too much to specific instances. This is not intentional by the researchers, however it still affects the results. To obtain better, less biased results we should shuffle the problem instances we have. Above, I suggested three ways to shuffle the instances in such a way that most would agree they don’t disturb or change the complexity of the underlying problem described by the instance. I hope that some of these suggestions will be employed, if not this year then for next year’s SAT competition such that we could reach better, more meaningful results.

Collecting solver data

November 21, 2012Research, SATSQL, statistics, visualisationmsoos

Lately, I have been putting a lot of effort into collecting data about solver behaviour and dumping it on-the-fly to a MySQL database. Some of this data is then presented by a PHP&javascript interface, such as this.The goal is to better understand how solving works, and thus to possibly come up with better strategies for solving. The requirements of the data gathered are: speed to calculate, size to store and usefulness.

Gathered data

There are some obvious statistics that one would like to collect, such as number of conflicts, propagations and conflicts per second, etc — these are all gathered by almost every SAT solver out there. However, it would be interesting to have more. Here is what CryptoMiniSat now gathers and dumps using highly efficient prepared bulk insert statements, using at most 5% of time.

For each variable, dumped once in a while:

no. times propagated false, and true (separately)
no. times decided false and true (sep.)
no. times flipped polarity
avg & variance of decision level
avg & variance of propagation level

For each clause larger than 3-long, dumped once in a while:

activity
conflict at which it was introduced (if learnt)
number of propagations made
number of conflicts made
number of times any of its literal was looked at during BCP
number of times it was looked at (dereferenced) during BCP
number of times used to resolve with during learnt clause generation
number of resolutions needed to during its generation (if learnt clause)

For earch restart:

no. of reducible&irreducible (i.e. learnt&non-learnt) binary clauses
no. of red&irred tertiary clauses (separately)
no. of red&irred long clauses (sep)
avg, variance, min and max of learnt clause glues
avg, var, min, max of learnt clause sizes
avg, var, min, max of number of resolutions for 1st UIP
avg,var,min,max of branch depths
avg,var,min,max of backjump length –in the number of literals unassigned
avg,var,min,max of backjump lenght — in the number of levels backjumped
avg, var, min, max times a conflict followed a conflict without decisions being made
avg,var,min,max of agility
no. of propagations by red&irred binary clauses
no. of props by red&irred tertiary clauses
no. of props by red&irred long clauses
no. of conflicts by red&irred binary clauses (separately)
no. of conflicts by red&irred tertiary clauses (sep)
no. of conflicts by red&irred long clauses (sep)
no. of learnt unit, binary, tertiary, and long clauses (sep)
avg,var, min,max of watchlist size traversed during BCP
time spent
no. of decisions
no. of propagations
no. of variables flipped
no. of variables set positive&negative (sep)
no. of variables currently unset
no. of variables currently replaced
no. of variables currently eliminated
no. of variables currently set

For each learnt clause database cleaning:

for reducible clauses removed: all the data in the “clauses larger than 3-long” data above, summed up
for reducible clauses not removed: all the data in the “clauses larger than 3-long” data above, summed up
no. of clauses removed
no. of clauses not removed
for all irred clauses (these are not touched): all the data in the “clauses larger than 3-long” data above, summed up

For every N conflicts:

clause size distribution
clause glue distribution
clause size and glue scatter data

This is all, and is not all mine. Suggestions were made by many, some as much as a year ago: George Katsirelos, Luca Melette, Vegard Nossum, Valentin-Mayer Eichberger, Ben Schlabs, Said Jabbour and Niklas Sörensson. Naturally, none of these people necessarily approve of how I gather data or what I do with the data gathered, but they helped, so listing them is only fair.

What’s not yet or cannot be gathered

Two suggestions are still on the TODO list:

counting the number of conflicts done while a learnt clause was “locked”, i.e. has propagated in the current search tree. This stat of a learnt clause could tell us if the clause seemed essential to the search or not. If a clause propagated once, at the bottom of the search tree, and then the variable propagated was quickly unset, it’s not the same as if the same clause propagated at the top of the search tree, and then the variable it set was essentially never unset.
for each variable, counting the number of conflicts done while the variable was set. This is interesting, because if a variable was propagated only once, at top-level, it will seem like it was never set (set only once), yet it may have had a huge influence on the search tree and consequently on the learnt clauses.

Both of these require a change in the algorithm used to backjump and although I am a bit worried about the performance, I will add these soon.

Unfortunately, the data about variables can’t really be dumped so often, because it’s huge for large problems with millions of variables. So I guess I will only dump that for the most active N variables, for some sane definition of “active”. Similarly, the data about individual clauses is not dumped, only in a summed-up way during database cleaning.

Suggestions?

In case you have any ideas what else could be interesting to dump, please put it as a comment. The idea is to dump data that is cheap to compute and cheap to dump yet would be interesting for some reason or another. I am prepared to add stuff to datastructures, as you have probably guessed from the above. Yes, it makes the clause database less space-efficient, and so increases cache-miss. But on large problems, it’s going to be a cache-miss most of the time anyway, and a cache-fetch brings in 128B of data, which should be plenty for both the stats and the clause. Similarly with variables. So, don’t refrain from suggesting some stuff that takes space. Also, weird stuff is interesting. The most weird stat on the list above is definitely the “conflict after a conflict without decisions” (suggested by Said Jabbour) which I would have guessed to be zero, or near-zero, yet is pretty high, in the 30+% range.

Suggestions are welcome!

How to use CryptoMiniSat 2.9 as a library

June 20, 2012Development, SATlibrary, programmingmsoos

There have been some questions asked about how to use CryptoMiniSat as a library. Here, I give a simple example usage to help any future potential users. If there are any question that seems unanswered, please feel free to add them as a comment to this post.

First, we initialize the solver with high verbosity:

#include "cmsat/Solver.h"
using namespace CMSat;

SolverConf conf;
conf.verbosity = 2;
Solver solver(conf);

Then we add the variables 0..9:

for(int i = 0; i < 10; i++) {
  solver.newVar();
}

And add a clause that would look in the DIMACS input format as "2 -5 10 0":

vec clause;
clause.push(Lit(1, false));
clause.push(Lit(4, true));
clause.push(Lit(9, false));
solver.addClause(clause);

We now create the assumptions that would look in DIMACS input format as "1 0" and "-5 0":

vec assumps;
assumps.push(Lit(0, false));
assumps.push(Lit(4, true));

We now solve with these assumptions:

lbool ret = solver.solve(assumps);

Let's see if the state of the solver after solving is UNSAT regardless of the assumptions:

if (!solver.okay()) {
  printf("UNSAT!\n");
  exit(0);
}

Note that okay() can only be FALSE if "ret" was "l_False". However, it is important to distinguish between two kinds of l_False: one where only the assumptions couldn't be satisfied, and one where no matter what the assumptions are, the problem is unsatisfiable. See below for more.

Now let's check that the solution is UNSAT relative to the assumptions we have given. In this case, let's print the conflicting clause:

if (ret == l_False) {
  printf("Conflict clause expressed in terms of the assumptions:\n");
  for(unsigned i = 0; i != solver.conflict.size(); i++) {
    printf("%s%d ", (solver.conflict[i].sign()) ? "" : "-", solver.conflict[i].var()+1);
  }
  printf("0\n");
}

The above will print an empty clause if solver.okay() is FALSE.

Now, let's see if the solution is SAT, and if so, print the solution to console, using DIMACS notation:

if (ret == l_True) {
  printf("We have the solution, it's here: \n");
  for (unsigned var = 0; var != solve.nVars(); var++)
    if (S.model[var] != l_Undef) {
      printf("%s%d ", (solver.model[var] == l_True)? "" : "-", var+1);
      printf("0\n");
    }
}

If solver.okay() is still TRUE (which it must be, if the ret was l_True), we can now add new clauses and new variables without restriction, and solve again (potentially with different assumptions). Note that most of the above applies to MiniSat as well.

When compiling the code, you have to pass the parameters such as:

$ g++ -fopenmp -lcryptominisat myprogram.cpp -o myprogram

PS: Thanks go to Martin Albrecht for pointing out some ambiguities in my explanation of the library usage, which have now been fixed.

On hyper-binary resolution

May 17, 2012Research, SATbug, hyper-binary resolution, transitive reductionmsoos

Hyper-binary resolution is actually quite straightforward, or at least appears to be. Let’s take the following example. The clauses in our database are the following:

-a V b
-a V c
-b V -c V g
-b V -c V d
-d V g

Let’s set a to true, and see what happens. Immediately, b and c get set to true through binary clauses. If we now propagate g through the clause -b V -c V g, we ought to do hyper-binary resolution straight away, and add the clause -a V g — some call this lazy hyper-binary resolution. Good, one more binary clause!

But then… So now we have nothing to propagate using only binary clauses, we have to propagate using a long clause, -b V -c V d. As good citizens, we also do (lazy) hyper-binary resolution, coming up with the clause -a V d. Good, one more binary clause! One slight glitch now… d propagates to g, using a binary clause. But this means that setting a can propagate to g without -a V g, the first hyper-binary clause we added! So the first hyper-binary clause we added is in fact useless, it needs to be removed. If we applied transitive reduction, it would remove the first hyper-binary clause -a V g automatically.

Let’s go a bit deeper here. How could we have avoided adding the first hyper-binary clause? The obvious answer is: we should have started with -b V -c V d instead of -b V -c V g. But how easy would have it been to know (i.e. calculate) that starting with that different long clause would have made our work easier? I am not sure it would have been easy to know. And of course the example above is very trivial. It could be made much-much more complicated: g could have been reached with any number of hyper-binary resolutions from d — so simple binary look-ahead would not have helped.

I am stuck here. Any ideas?

Wonderings of a SAT geek

A blog about SAT solving and cryptography