Lately, I have been working quite hard on writing module tests for CryptoMiniSat using GoogleTest. I’d like to share what I’ve learnt and what surprised me most about this exercise.
An example
First of all, let me show what a typical test looks like:
TEST_F(intree, fail_1)
{
    s->add_clause_outer(str_to_cl(" 1, 2"));
    s->add_clause_outer(str_to_cl("-2, 3"));
    s->add_clause_outer(str_to_cl("-2, -3"));
    inp->intree_probe();
    check_zero_assigned_lits_contains(s, "-2");
}
Here we are checking that intree probing finds that this set of three binary clauses causes a failure, and that it enqueues “-2” at the top level. At first glance it’s a fairly trivial test, and it turns out that most tests are, in fact, fairly trivial if the system is set up well. This test’s setup is the following test fixture:
struct intree : public ::testing::Test {
    intree()
    {
        must_inter = false;
        s = new Solver(NULL, &must_inter);
        s->new_vars(30);
        inp = s->intree;
    }
    ~intree()
    {
        delete s;
    }
    Solver* s;
    InTree* inp;
    bool must_inter;
};
The fixture hides no complexity, either. In fact, what’s surprising about this test is how easy it really is to write.
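The tests also lean on two tiny helpers, str_to_cl and check_zero_assigned_lits_contains, whose bodies aren’t shown above. Roughly, they look like the sketch below, assuming CryptoMiniSat’s internal Lit(var, sign) type with 0-based variables and a get_zero_assigned_lits() accessor on Solver; the real versions live in the test utilities and may differ in the details:

// Sketch only: Lit, Solver and get_zero_assigned_lits() are assumed here
#include <vector>
#include <string>
#include <sstream>
#include <cstdlib>
#include <algorithm>
#include <gtest/gtest.h>

std::vector<Lit> str_to_cl(const std::string& s)
{
    // "-2, 3" -> {Lit(1, true), Lit(2, false)}: variables are 1-based in
    // the string, 0-based internally; a leading '-' marks negation
    std::vector<Lit> cl;
    std::stringstream ss(s);
    std::string tok;
    while (std::getline(ss, tok, ',')) {
        const int n = std::stoi(tok); // stoi skips leading whitespace
        cl.push_back(Lit(std::abs(n) - 1, n < 0));
    }
    return cl;
}

void check_zero_assigned_lits_contains(Solver* s, const std::string& lit)
{
    // Fails the current test unless 'lit' was assigned at decision level 0
    const std::vector<Lit> zero = s->get_zero_assigned_lits();
    const Lit l = str_to_cl(lit).at(0);
    EXPECT_TRUE(std::find(zero.begin(), zero.end(), l) != zero.end());
}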
The fiddly parts
Looking at the test code, it’s easy to see when a module implements a bit too much:
TEST_F(probe, uip_fail_1)
{
    s->add_clause_outer(str_to_cl(" 1, 2"));
    s->add_clause_outer(str_to_cl("-2, 3"));
    s->add_clause_outer(str_to_cl("-2, -3"));
    s->conf.doBothProp = false;
    s->conf.doStamp = false;
    s->conf.otfHyperbin = false;
    vars = tovars("1");
    p->probe(&vars);
    check_zero_assigned_lits_contains(s, "-2"); // 1UIP is -2
}
This is testing the same principle, probing, but using the non-intree version. This probing type has been in CryptoMiniSat for a long while, and the feature creep shows. Stamping, hyper-binary resolution and Stalmarck are all in the module, each separately controllable. This test needs none of these functionalities; it is only interested in whether the first UIP (i.e. -2) will be caught when “-1” is probed. Note that Stalmarck could set -2 on its own, since probing 3 forces -2 through the clause (-2, -3) and probing -3 forces -2 through (-2, 3), so we need to make sure to turn it off.
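For completeness, the probe fixture mirrors the intree one shown earlier. A sketch, where the s->prober member and the vars field are assumptions rather than verbatim code:

struct probe : public ::testing::Test {
    probe()
    {
        must_inter = false;
        s = new Solver(NULL, &must_inter);
        s->new_vars(30);
        p = s->prober; // assumed name of the Prober instance
    }
    ~probe()
    {
        delete s;
    }
    Solver* s;
    Prober* p;
    bool must_inter;
    std::vector<uint32_t> vars; // filled via tovars() in each test
};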
What I learned
The surprising thing about this module testing exercise was that it’s much easier than I first thought. I have written about 100 tests for 11 modules in just a week. The vast majority of the tests are very simple to set up, execute and check. However, for tests to be easy to write and reliable to run, there are some prerequisites. Any time the tests were flaky, weird or plain wrong, either the module under test didn’t expose enough control or it had too many functionalities. In other words, if the module is not self-contained and manageable, it cannot be tested properly. What’s interesting is that one can discern the quality of the code under test based solely on the test code. That, I think, was my main takeaway.
Interestingly, I have found no actual bugs in the code. However, while trying to hunt down why a test wasn’t working, I have often found places to clean up, fix up, and generally make more sane and readable. In other words, even though the tests haven’t caught any real bugs, they were far from useless. I find this fascinating, though it only takes a cursory look at the testing literature to see that there is more to testing than quality, risk and performance metrics. It’s fun to see that first-hand.
Future Work
I have written tests for all the main modules of CryptoMiniSat except for the occurrence-based systems. These are up next, and they are the most hairy and the most interdependent. For example, bounded variable elimination (BVE) calls backward-subsumption for each clause it creates, so testing BVE in isolation will need either some stubbing or more control parameters to turn off subsystems. Self-subsuming resolution is similarly interdependent with subsumption. Handling these interdependencies in a sane way will be a fun challenge.
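To give a flavour of the control-parameter route, the idea is simply to gate the subsumption call inside BVE behind a config flag that the tests can switch off. A sketch with illustrative names, where OccSimplifier, do_bve_subsumption, link_in_clause and backward_subsume are not the real identifiers:

// Sketch of the "more control parameters" idea: module tests set
// do_bve_subsumption = false so BVE can be checked in isolation.
// All names here are illustrative.
void OccSimplifier::add_bve_resolvent(Clause* resolvent)
{
    link_in_clause(*resolvent); // insert into occurrence lists
    if (solver->conf.do_bve_subsumption) {
        backward_subsume(resolvent);
    }
}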