Computing Tricky Probabilities | Wonderings of a SAT geek

Probabilities of certain events are really hard to estimate sometimes. Often, it’s because we lack information of the underlying causal chains, but sometimes, it’s because the causes are so intertwined that even if we know the underlying probabilities of certain events happening along with the causal chains, it’s still impossible for us to untangle the web.

Let’s say there’s an event X has a probability of 0.4, Y has a probability of 0.6, and Z happens if either X or Y happens, but X and Y can’t happen at the same time, with a probability of 0.8. What’s the probability of Z happening?

             0.4                  0.6          
         ┌───────┐            ┌───────┐       
         │       │    0.8     │       │       
         │   X   │◄────!─────►│   Y   │       
         │       │            │       │       
         └──┬────┘            └────┬──┘       
            │                      │                    
            │      ┌───────┐       │          
            │      │       │       │          
            └─────►│   Z   │◄──────┘          
                   │       │                  
                   └───────┘

Translating to a Model Counting Problem

The solution is easy to compute if you know how to use propositional model counters. They allow you to create a set of rules, such as:

X = 0.4
Y = 0.6
X or Y = True
0.8 -> NOT (X & Y)

The above perfectly describes the probability landscape, by using “->”, i.e. implication (if the left side is true, the right side must also be true), and using “&” (i.e. logical AND). Basically, for Z to happen, X or Y has to happen. And with probability 0.8, they must not happen at the same time. To solve the above, we add a helper variable H1, that will allow us to translate the last equation into an implication:

X = 0.4
Y = 0.6
X or Y = True
H1 = 0.8
H1 -> NOT (X & Y)

To translate this to propositional model counting, we can do the following, where “-” is negation:

weight X = 0.4
weight Y = 0.6
weight H1 = 0.8
X or Y = True
-H1 or -X or -Y = True

The highlighted lines are what’s call the Tseitin transformation, which basically means that when H1 is TRUE, (NOT X or NOT y) must be TRUE, i.e. at least one of them has to be FALSE, i.e. they can’t happen at the same time (but it’s possible that neither of them happens). The above is what’s called a conjunctive normal form, or CNF for short.

The model counting DIMACS representation (formal description here) of the above is:

c t pwmc
p cnf 3 2
c comment X = 1, Y = 2, H1 = 3
c p weight 1 0.4 0
c p weight 2 0.6 0
c p weight 3 0.8 0
1 2 0
-3 -1 -2 0
c p show 1 2 3 0

Which is a straightforward 1-to-1 translation of the above, except we have to tell the counter that this is a projected weighted model counting problem (“pwmc”) and that it has 3 variables, and 2 constraints (“p cnf 3 2”).

Solving with a Model Counter

Once you have a CNF representation, you can run a modern propositional weighted model counter to count the number of weighted solutions, i.e. the probability of Z. I recommend the one I develop, ganak (linux binary here, open source code will be available soon, ask me if you need it):

./ganak problem.cnf
c s type pwmc
s SATISFIABLE
c s exact arb float 5.68E-01

So Z has a probability of 0.568. This can easily be verified to be correct:

Python 3.13.1
>>> 0.6*(1-.4)+0.4*(1-0.6)+0.4*0.6*(1-0.8)
0.568

Which adds up the probabilities that the left fires (but not the right), the right fires (but not the left), and that both fire, but the conflict does not happen (with 1-0.8 probability).

Of course, the above is a rather simple problem. However, with the above steps, one can create arbitrarily complicated causal chains, with complex constraints between partial, or intermediate elements of the chain, conditional probabilities, etc. Although there are limits to scalability, it’s many-many orders of magnitude faster than anything by hand, or by some form of brute-force checking/estimation.

Use-cases

In certain scenarios, it’s really important to know what the probability of some event is. For example, if you want to run a hazardous process such as a nuclear reactor or a chemical plant, or you want to provide insurance for some complicated (financial) product, accurately computing the probabilities of bad events happening is extremely important. I think it’s rather obvious that nuclear plants have very complicated causal chains for bad events happening — and they tend to be intertwined. If the external power lines are damaged, then likely it’s a major event (e.g. earthquake) and you likely also lost the majority of your transport capabilities, and along them, much of potential external help.

Similarly, if you want to bet for/against market moves, where you know or have a good idea of the underlying causal chains and associated probabilities, you can build quant trading, or expert advice systems based on the above. In other words, weighted model counting is not only useful for party tricks. Complex probability chains with thousands of interdependent random variables can easily be analyzed using the above system.