banner
social-twittersocial-githubsocial-github

A practical introduction to the PCP class

This is a post motivating and demonstrating how to write a probabilistically checkable proof for the graph non isomorphism (GNIGNI) language. For the sake of practicality and beginner friendliness, definitions will be somewhat handwaved.

Still, this post assumes that you know enough about computational complexity to understand terms as "language", "complexity class" or "turing machine" and that you already heard a bit about the PCPPCP class before.

The definition of PCP

The class PCPPCP is built upon letting a verifier VV, running in polynomial time, having probabilistic access (i.e. VV has access to a source of randomness) to a proof string π\pi provided by a prover PP.

Some key definitional points about what the class PCP[r(n),q(n)]PCP[r(n), q(n)] means:

  1. A verifier VV of a statement x{0,1}nx \in \{0, 1\}^n for a language LPCP[r(n),q(n)]L \in PCP[r(n), q(n)] must always accept any valid proof π\pi. If π\pi is invalid, VV accepts it with probability less than 1/21/2
  2. What r(n),q(n)r(n), q(n) mean is that VV can use r(n)r(n) random coins and q(n)q(n) non-adaptive queries to π\pi. A non-adaptive query is a query whose content is independent of the answers VV gets when querying π\pi
  3. The proof π\pi is of length at most q(n)2r(n)q(n) 2 ^ {r(n)}
  4. No assumption is made about PP's computational power - i.e. PP is not assumed to be a polynomial time running prover

PCP theorem

A landmark achievement of complexity theory is the PCPPCP theorem. It states that NP=PCP[log(n),1]NP = PCP[log(n), 1], meaning that for any language in NPNP and a statement x0,1nx \in {0, 1}^n, a proof system can be built with:

  1. Completeness 11 and soundness 1/2\leq 1/2
  2. A proof length of at most 2log(n)2^{log(n)}
  3. VV running in polynomial time
  4. VV needing only a constant amount of non-adaptive queries and a logarithmic amount of coin tosses in the size of the statement xx.

For a result applying to the whole NPNP class, this makes up for a pretty effective out-of-the box verifier! Above all, it characterizes NPNP in a whole new light, using probabilistic and interactive turing machines, two notions which were absent from the initial definition of NPNP.

The history of the PCP theorem is pretty interesting. At the time, there was some kind of race to be the first to be able to define NPNP in the most efficient way possible in terms of the PCPPCP class, using the smallest number of random coins and queries. It was this race that eventually led up to the PCP theorem, published in 92, and that we know today.

However, building PCP proofs for a language is non trivial. There is still a language for which building a PCP proof is actually pretty easy: the graph non-isomorphism language aka GNIGNI.

The graph non-isomorphism language

We say that two graphs are isomorphic if one is a permutation of vertices of the other. In fact, the language GIGI of deciding whether two graphs are isomorphic is in NPNP: the permutation is a certificate for the isomorphism. Using this certificate, the verifier can verify in polynomial time that the isomorphism claim is correct.

The complement of GIGI is GNIGNI, graph non-isomorphism. If you pause for a minute, you will realise that it's pretty darn hard to think of a short proof (of length polynomial in the inputs) when claiming that two graphs are not isomorphic. What certificate can be used to prove that for all the permutations of the vertices of a given graph G0G_0, none is equivalent to a second graph G1G_1? I.e. GNIGNI is not in NPNP.

Still, this example is a good a motivation for building PCP proofs. While there might not be a deterministic, short proof based verifier for GNIGNI, the PCP class still lets us trade additional bits in the proof and access to randomness for a polynomial time probabilistic verifier. That is pretty cool!

An astute reader might note that GNIGNI is in IPIP since there exists an interactive proof system for a prover PP to convince a verifier VV that two graphs G0G_0, G1G_1 are not isomorphic. The protocol for it is straightforward and consists, in broad strokes, into letting VV choosing to permute (randomly) either G0G_0 or G1G_1 and asking PP whether G0G_0 or G1G_1 was permuted.

But, to the difference of IPIP, the PCP class even lets us even reduce the interaction phase to a single message between VV and PP, using only the proof π\pi.

As mentioned above, GNIGNI is not in NPNP. Let me now tell you that GNIGNI is in PCP[poly(n),1]PCP[poly(n), 1]. This means that any non-isomorphism claim about two graphs has a probabilistically checkable proof of length polynomial in the size of the inputs (i.e. the number of vertices in the checked graphs), checkable by VV using a number of random coins polynomial in the length of the inputs and a constant number of queries.

Again, note that this is quite counter-intuitive for a language with no short proof to have such an efficient verification procedure. The key lays in letting VV access randomness and allowing PP to append additional bits to his proof.

A PCP for GNI

The intuition behind GNIGNI's PCP is simple. First, PP will build:

  1. A list of all non isomorphic graphs of nn vertices.
  2. A list of all possible graphs of nn vertices

Recall that no computational assumption has been made regarding PP. So we assume doing the above is possible for PP. Also, note that those two lists are different. There exists 1024 possible graphs with 5 vertices, out of which only 34 are not isomorphic to one another.

The proof PP will hand over to VV will consist in an array indexed by all possible graphs of nn vertices. Each value in the array will be set to 0 or 1 following whether the graph at this index is isomorphic to G0G_0 or G1G_1. If the graph being indexed is not isomorphic to both, PP will set the value randomly.

Upon receiving (G0,G1,π)(G_0, G_1, \pi), the check that VV does is straightforward. VV randomly chooses a bit b=0b=0 or b=1b=1, takes GbG_b, permutes it to obtain a graph GpG_p and accesses the array at the position of GpG_p. If the value at this position is bb then VV accepts, otherwise VV rejects.

Recall that by the definition of the PCP class, we should have for proof length something in the order of 2poly(n)2^{poly(n)}. That is the case, since for nn vertices, there exists 2(n(n1))/22^{(n * (n-1))/2} possible graphs, which is the length of the proof that PP will send to VV. You can also see that the number of queries VV does to obtain the required soundness level is constant.

The soundness of this proof is straightforward. Suppose that G0G_0 and G1G_1 are permutations of each other. After VV samples a random bit bb, the permutation of GbG_b, which is GpG_p will get VV to access a position in the array whose value is bb with probability 0.5, and reject if the two bits differ. If VV wants to reduce the soundness error, he will just repeat his verification process many times.

For completeness, since if G0G_0 and G1G_1 are not isomorphic, randomly permuting them will make VV always access the array at an index whose value will be equal to the sampled bit bb.

Scripting a PCP for GNI

Let's build this PCP system for GNIGNI. I will be omitting imports and functions' definitions when comments are sufficient to understand what's going on. This is running with sage 9.8.

First, we build a list with all the possible graphs composed of 55 vertices. Since sage has a database for non-isomorphic graphs, let's use it directly.

n = 5
vertices = range(n)
possible_edges = list(combinations(vertices, 2))
powerset_all_possible_edges = list(powerset(possible_edges))

all_graphs = []
for g in powerset_all_possible_edges:
    d = defaultdict(list, {k: [] for k in vertices})
    for edge in g:
        d[edge[0]].append(edge[1])
    all_graphs.append(Graph(d, immutable=True))

# we have 2^{(n * (n-1))/2} possible graphs with n vertices (
# the proof is of polynomial length in the number of vertices
# n vertices --> n(n-1)/2 pairs of possible points 
# all possible subsets of those pairs of points = 2^{n(n-1)/2}
assert(len(all_graphs) == 2**((n**2 - n)/2))

# also, sage has a db of all non isomorphic graphs of n vertices
# list of all non isomorphic graphs with n vertices, using `graphs()`
non_isomorphic_graphs = [g.copy(immutable=True) for g in graphs(len(vertices))]

We will randomly sample two non-isomorphic graphs and build a proof for it.

# generating a proof requires two non-isomorphic graphs and 
# a list of all possible graphs of n vertices
def prove(g0, g1, graphs):
    pi = {} # the proof is an array of bits, indexed by all possible n-vertex graphs
    for g in graphs:
        if g.is_isomorphic(g0):
            pi[g] = 0
        elif g.is_isomorphic(g1):
            pi[g] = 1
        else:
            # the choice does not matter if neither are isomorphic to g
            # this bit will never be inspected by the verifier
            pi[g] = random.choice([0, 1])
    return pi

g0, g1 = sample_graph_pair(non_isomorphic_graphs)
pi = prove(g0, g1, all_graphs)

The verifier can check π\pi:

# Soundness is a parameter indicating how many checks the verifier should do on the proof
def verify(pi, g0, g1, soundness):
    for i in range(soundness):
        b = random.choice([0, 1])
        gb = [g0, g1][b]
        h = get_permuted_graph(gb)
        if pi[h] != b:
            return False
    return True

proof_is_valid = verify(pi, g0, g1, 10000)
assert(proof_is_valid)

You can also try to cheat the verifier:

# try to cheat verifier with two isomorphic graphs
g1 = get_permuted_graph(g0) # g1 is a permutation of g0
pi = prove(g0, g1, all_graphs)
# if you set soundness to 1, there is a 1/2 prob that this check passes
proof_is_valid = verify(pi, g0, g1, 1)
assert(proof_is_valid == False)