2014-04-24

plaidCTF 2014 - graphs (crypto200)

By jix
Filed under plaidCTF2014 cryptography graphs sat

For PlaidCTF2014, Eindbazen and fail0verflow joined forces as 0xffa, the Final Fail Alliance.
Don't miss out on other write-ups at Eindbazen's site!

This challenge was about breaking a custom public key encryption system.

graphs
Cryptography (200 pts)
--------------
In this era, block ciphers hadn't even been invented. The Plague created this
system based on problems he knew to be NP hard, but there must be something you
can do to decode his messages.

We were given a python implementation of the system, the Plague’s public key and an encrypted message. The implementation includes encryption, decryption (given a private key) and key generation.

The implementation comes with a nice comment explaining the system:

Encrypts a message using super secure, NP-hard graph stuff.

Encryption works like this: take a graph, split your message up into
N numbers, such that the sum of all the numbers is equal to your message.
Assign each vertex to one of these N numbers.

Then ciphertext[v] is the sum of v's N, and all of the N's from v's neighbors

A Look at the Private Parts

The description omits some details about the private key, but everything can be inferred from that comment alone:

The public key is a graph with N vertices.
To prepare a message for encryption we first split it into N numbers that sum to the original message and assign them to the vertices.
To encrypt, we add to each vertex the sum of its adjacent vertices.
The encrypted message consists of the values with their assigned vertices.
To decrypt we compute the sum of a subset of vertices selected so that each of the initial N numbers is added once to the final sum.
This subset is the private key.

The vertex subset that is used as private key has to have the following property: For every vertex A in the graph, the subset contains exactly one vertex out of A together with the vertices adjacent to A, or equivalently, the sets containing the private key vertices and their neighbors form a partition of the vertex set.

We can see how this works using a small example. We will encrypt the message 42 using the following graph, where the private key is marked in red, and the partition defined by the private key using green lines:

Graph Keypair {: .graph }

First we split 42 into 2 + 7 + 2 + 0 + 1 + 9 + 11 + 10 and assign the numbers to the vertices:

Graph Keypair {: .graph }

Next we compute the sum of the all neighbors and add it to each vertex’ value:

Graph Keypair {: .graph }

The encrypted message is the sequence 11, 10, 4, 18, 29, 9, 29, 22. To decrypt it we just add the numbers corresponding to private key vertices: 11 + 9 + 22 = 42.

The problem of deriving a private key given just the public key is similar to, but not quite like many well known NP-hard graph problems including minimal vertex covers, maximal independent set, and so on. But there are also some similar problems like a maximum matching that are not NP-hard (assuming P ≠ NP). I haven’t thought too hard about whether this specific problem is NP-hard or not, I wouldn’t be surprised either way. Luckily we don’t have to be concerned about this. This NP business is about asymptotic worst-case behavior, to crack this we care about specific problem instances, unlikely to be a worst case instances.

If All You Have is a Hammer, …

… everything looks like a boolean satisfiability problem. This problem, also called SAT, is NP-complete, which means that every instance of any problem in NP can be efficiently reduced to a SAT instance. And, importantly, there are SAT solvers out there which are very efficient for many problems encountered in practice.

What SAT solvers do in practice is this: You give them a boolean formula of some variables, they give you back an assignment of variables that makes the complete formula true or tell you if no such assignment exists. We don’t need to know anything about the magic that makes them work fast. To make things simpler SAT solvers expect a formula in conjunctive normal form (CNF), which means that the formula is a conjunction (AND-connective) of clauses. Clauses in turn are a disjunction (OR-connective) of literals. Literals are either a variable or a negation of a variable. This makes it easy to represent a CNF formula in textual form:

* A non-inverted literal, i.e. a variable, is a represented as a positive decimal integer.
* An inverted literal is represented as a negative decimal integer, i.e. <code>-5</code> is the negation of <code>5<code>.
* A clause consists of multiple literals terminated with a 0 value, separated with space.
* A formula is just a file with multiple lines.

The numbers are just identifiers for the variables, just think of v1, v2, v3, … instead of 1, 2, 3, …

¡Viva la Reducción!

We can now write a small python script that reduces our graph problem to a boolean formula in CNF.

import subprocess
import tempfile
import genkey # provided for this challenge

def crack_key(gc):
    """compute the private key of a given Graphcrypt instance"""
    with tempfile.NamedTemporaryFile(mode='w') as formula:
...

A Grapcrypt instance has a pubkey attribute which is list with an element for each graph vertex. The elements are lists of indices of adjacent vertices. The example graph from above would be encoded as [[1, 2], [0, 3, 4], [0, 3], [1, 2, 5], [1, 6, 7], [3], [1, 4, 7], [4, 6]]. Python list indices start at zero, but the CNF variables start at 1, thus we need to convert between those. Another problem is that 1 and -1 are distinct, but 0 and -0 are the same. Instead we will use 0, 1, 2, … and ~0, ~1, ~2, … (-1, -2, -3, …) as literals. When outputting a clause we convert between those representations:

...
        def add_clause(vertices):
            formula.write('%s 0\n' %
                            ' '.join(str(v if v < 0 else v + 1) for v in vertices))
...

To make sure every vertex of the graph is in the subset or has a neighbor who is, we simply add a clause for each vertex, consisting of all those vertices (vertex itself and neighbors) as positive literals. (Remember: If none of them are true, the clause is false and thus the complete formula.)

...
        for vertex, neighbors in enumerate(gc.pubkey):
            add_clause([vertex] + neighbors)
...

The first few clauses for the example are:

v0 or v1 or v2 or v3
v1 or v0 or v3 or v4
v2 or v0 or v3
...

We also need to make sure that no two adjacent vertices are in the subset. We can simply add a clause containing both vertices negated for every two adjacent vertices:

...
            for neighbor in neighbors:
                add_clause([~vertex, ~neighbor])
...

Again, some clauses for the example:

(not v0) or (not v1) == not (v0 and v1)
(not v0) or (not v2) == not (v0 and v2)
(not v1) or (not v0) == not (v1 and v0)
...

Finally we need to make sure that no vertex has two or more neighbors in the subset. For this, we add a clause containing each pair of distinct neighbors negated:

...
                for neighbor2 in neighbors:
                    if neighbor != neighbor2:
                        add_clause([~neighbor, ~neighbor2])
...

And some clauses for the example:

(not v1) or (not v2) == not (v1 and v2)
(not v2) or (not v1) == not (v2 and v1)
(not v0) or (not v3) == not (v0 and v3)
...

We don’t have to worry about duplicated or redundant clauses, SAT solvers handle them just fine.

Don’t Forget to Flush

This completes the reduction and we can feed the formula to our favorite SAT solver.

...
        formula.flush()
        solver = subprocess.Popen(['cryptominisat', formula.name],
                                    stdout=subprocess.PIPE)
        solver.wait()
        output = solver.stdout.read().split('\n')
...

Now we need to parse the variable assignment from the SAT solver output. The format is the same as used for the clauses, prefixed with ‘v ‘ to indicate the line with the variable assignments.

...
        solution = [line for line in output if line.startswith('v ')][0]
        gc.privkey = [ int(var) - 1
                        for var in solution.split()[1:-2]
                        if not var.startswith('-') ]
...

Finally, we can load the public key, crack it (takes a few seconds, mostly to generate the large CNF formula), and decipher the ciphertext:

...
gc = genkey.Graphcrypt(fil='pubkey')
crack_key(gc)
with open('ciphertext', 'rb') as ciphertext:
    print ('%x' % gc.decrypt(ciphertext.read())).decode('hex')
...

And we get The flag is: 3veryb0dy_poops~.

Here is the complete solution code:

import subprocess
import tempfile
import genkey # provided for this challenge

def crack_key(gc):
    """compute the private key of a given Graphcrypt instance"""
    with tempfile.NamedTemporaryFile(mode='w') as formula:


        def add_clause(vertices):
            formula.write('%s 0\n' %
                            ' '.join(str(v if v < 0 else v + 1) for v in vertices))


        for vertex, neighbors in enumerate(gc.pubkey):
            add_clause([vertex] + neighbors)


            for neighbor in neighbors:
                add_clause([~vertex, ~neighbor])


                for neighbor2 in neighbors:
                    if neighbor != neighbor2:
                        add_clause([~neighbor, ~neighbor2])


        formula.flush()
        solver = subprocess.Popen(['cryptominisat', formula.name],
                                    stdout=subprocess.PIPE)
        solver.wait()
        output = solver.stdout.read().split('\n')


        solution = [line for line in output if line.startswith('v ')][0]
        gc.privkey = [ int(var) - 1
                        for var in solution.split()[1:-2]
                        if not var.startswith('-') ]


gc = genkey.Graphcrypt(fil='pubkey')
crack_key(gc)
with open('ciphertext', 'rb') as ciphertext:
    print ('%x' % gc.decrypt(ciphertext.read())).decode('hex')