A practical introduction to the PCP class

This is a post motivating and demonstrating how to write a probabilistically checkable proof for the graph non-isomorphism ( $GNI$ ) language. For the sake of practicality and beginner friendliness, definitions will be somewhat hand-wavy.

Still, this post assumes that you know enough about computational complexity to understand terms such as "language", "complexity class" or "Turing machine" and that you have already heard a bit about the $PCP$ class.

The definition of PCP

The class $PCP$ is built around a verifier $V$ that runs in polynomial time, has access to a source of randomness, and uses it to probabilistically query a proof string $\pi$ provided by a prover $P$ .

Some key definitional points about what the class $PCP[r(n), q(n)]$ means:

A verifier $V$ of a statement $x \in \{0, 1\}^n$ for a language $L \in PCP[r(n), q(n)]$ must always accept a valid proof $\pi$ when $x \in L$ . If $x \notin L$ , then whatever proof $\pi$ is supplied, $V$ accepts it with probability at most $1/2$
What $r(n), q(n)$ mean is that $V$ can use $r(n)$ random coins and $q(n)$ non-adaptive queries to $\pi$ . A non-adaptive query is a query whose content is independent of the answers $V$ gets when querying $\pi$
The proof $\pi$ is of length at most $q(n) 2 ^ {r(n)}$
No assumption is made about $P$ 's computational power - i.e. $P$ is not assumed to be a polynomial time running prover
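The length bound on $\pi$ follows from a counting argument: $r(n)$ coins give $2^{r(n)}$ possible coin sequences, and a non-adaptive verifier reads at most $q(n)$ positions for each of them. Here is a minimal sketch of that count. The function `query_positions` is a made-up toy query rule used only for illustration, not the query rule of any real PCP.

```python
from itertools import product

def query_positions(coins, q):
    """Hypothetical toy rule mapping a coin sequence to the q positions it reads.

    Non-adaptive: the positions depend on the coins only, never on proof bits.
    """
    seed = sum(bit << i for i, bit in enumerate(coins))
    return [seed * q + j for j in range(q)]

def touched_positions(r, q):
    """Collect every proof position read under at least one coin sequence."""
    touched = set()
    for coins in product([0, 1], repeat=r):
        touched.update(query_positions(coins, q))
    return touched

r, q = 4, 3
positions = touched_positions(r, q)
# Bits that are never read can be dropped from the proof, so its useful
# part holds at most q * 2^r bits.
print(len(positions), "<=", q * 2**r)
```

Any other non-adaptive query rule would satisfy the same inequality, since the set of touched positions is a union of $2^{r}$ sets of size at most $q$.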

PCP theorem

A landmark achievement of complexity theory is the $PCP$ theorem. It states that $NP = PCP[O(\log n), O(1)]$ , meaning that for any language in $NP$ and a statement $x \in \{0, 1\}^n$ , a proof system can be built with:

Completeness $1$ and soundness $\leq 1/2$
A proof length of at most $2^{O(\log n)}$ , i.e. polynomial in $n$
$V$ running in polynomial time
$V$ needing only a constant number of non-adaptive queries and a logarithmic number of coin tosses in the size of the statement $x$ .
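To see why the proof stays short, one can plug a logarithmic number of coins and a constant number of queries into the length bound $q(n) 2^{r(n)}$ from the definition. In the line below, $c$ and $q$ stand for the constants hidden in "logarithmic" and "constant" (they are names chosen here, not values fixed by the theorem), and the logarithm is taken in base 2:

$$
|\pi| \;\le\; q \cdot 2^{r(n)} \;=\; q \cdot 2^{c \log_2 n} \;=\; q \cdot n^{c}
$$

So the proof is of polynomial length, just like a classical $NP$ certificate.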

For a result applying to the whole $NP$ class, this makes for a pretty effective out-of-the-box verifier! Above all, it characterizes $NP$ in a whole new light, using probabilistic and interactive Turing machines, two notions which were absent from the initial definition of $NP$ .

The history of the PCP theorem is pretty interesting. At the time, there was a race of sorts to characterize $NP$ as efficiently as possible in terms of the $PCP$ class, using the smallest number of random coins and queries. This race eventually led to the PCP theorem as we know it today, published in 1992.

However, building PCP proofs for a language is nontrivial. Still, there is a language for which building a PCP proof is actually pretty easy: the graph non-isomorphism language, aka $GNI$ .

The graph non-isomorphism language

We say that two graphs are isomorphic if one can be obtained from the other by permuting (relabeling) its vertices. The language $GI$ of pairs of isomorphic graphs is in $NP$ : the permutation is a certificate for the isomorphism. Using this certificate, the verifier can check in polynomial time that the isomorphism claim is correct.
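Checking such a certificate only takes a relabeling and a set comparison. The sketch below assumes graphs are given as edge lists over vertices `0..n-1` and the certificate as a list `perm` sending vertex `i` to `perm[i]`. Both the representation and the function names are choices made for this illustration.

```python
def apply_permutation(edges, perm):
    """Relabel every edge {u, v} as {perm[u], perm[v]}."""
    return {frozenset((perm[u], perm[v])) for u, v in edges}

def verify_isomorphism(edges0, edges1, perm):
    """Check in polynomial time that perm maps G_0 exactly onto G_1."""
    # The certificate must be a bijection on the vertices 0..n-1.
    if sorted(perm) != list(range(len(perm))):
        return False
    return apply_permutation(edges0, perm) == {frozenset(e) for e in edges1}

g0 = [(0, 1), (1, 2)]    # the path 0 - 1 - 2
g1 = [(1, 0), (0, 2)]    # the path 1 - 0 - 2
certificate = [1, 0, 2]  # swap vertices 0 and 1

print(verify_isomorphism(g0, g1, certificate))  # True
print(verify_isomorphism(g0, g1, [0, 1, 2]))    # False
```

The verifier never searches for the permutation. It only checks the one it is handed, which is what keeps the check polynomial.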

The complement of $GI$ is $GNI$ , graph non-isomorphism. If you pause for a minute, you will realise that it's pretty darn hard to think of a short proof (of length polynomial in the inputs) when claiming that two graphs are not isomorphic. What certificate can be used to prove that among all the permutations of the vertices of a given graph $G_0$ , none maps it to a second graph $G_1$ ? No such certificate is known, i.e. $GNI$ is not known to be in $NP$ .

Still, this example is a good motivation for building PCP proofs. While there might not be a deterministic verifier for $GNI$ based on short proofs, the PCP class still lets us trade additional bits in the proof and access to randomness for a polynomial time probabilistic verifier. That is pretty cool!

An astute reader might note that $GNI$ is in $IP$ since there exists an interactive proof system for a prover $P$ to convince a verifier $V$ that two graphs $G_0$ , $G_1$ are not isomorphic. The protocol is straightforward and consists, in broad strokes, of letting $V$ randomly permute either $G_0$ or $G_1$ and ask $P$ which of the two was permuted.
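This interactive protocol can be simulated in a few lines of plain Python. The sketch below is an illustration under simplifying assumptions: graphs are edge lists on vertices `0..n-1`, the unbounded prover is played by a brute-force search over all permutations, and every function name is invented for this example.

```python
import random
from itertools import permutations

def relabel(edges, perm):
    """Apply a vertex permutation to a graph given as a collection of edges."""
    return frozenset(frozenset((perm[u], perm[v])) for u, v in edges)

def is_isomorphic(graph_a, graph_b, n):
    """Brute force over all n! permutations, fine for an unbounded prover."""
    target = relabel(graph_b, range(n))
    return any(relabel(graph_a, p) == target for p in permutations(range(n)))

def prover_answer(h, g0, n):
    """Honest prover strategy: answer 0 if H is isomorphic to G_0, else 1."""
    return 0 if is_isomorphic(h, g0, n) else 1

def run_round(g0, g1, n, rng):
    """One round: V hides b behind a random relabeling, P must recover b."""
    b = rng.randrange(2)
    perm = list(range(n))
    rng.shuffle(perm)
    h = relabel([g0, g1][b], perm)
    return prover_answer(h, g0, n) == b

rng = random.Random(0)
path = [(0, 1), (1, 2), (2, 3)]
star = [(0, 1), (0, 2), (0, 3)]
twisted_path = [(2, 0), (0, 3), (3, 1)]  # the same path, relabeled

# Non-isomorphic pair: the prover recovers b in every round.
print(all(run_round(path, star, 4, rng) for _ in range(100)))
# Isomorphic pair: the permuted graph carries no information about b,
# so the prover is right only about half of the time.
print(sum(run_round(path, twisted_path, 4, rng) for _ in range(100)))
```

When the graphs are isomorphic, no prover strategy does better than guessing, because the graph $V$ sends is distributed identically for $b=0$ and $b=1$.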

But, unlike $IP$ , the PCP class even lets us reduce the interaction phase to a single message from $P$ to $V$ : the proof $\pi$ .

As discussed above, no short certificate is known for $GNI$ . Let me now tell you that $GNI$ is in $PCP[poly(n), O(1)]$ . This means that any non-isomorphism claim about two graphs has a probabilistically checkable proof of length at most exponential in the size of the inputs (i.e. the number of vertices in the checked graphs), checkable by $V$ in polynomial time using a number of random coins polynomial in the length of the inputs and a constant number of queries.

Again, note that it is quite counter-intuitive for a language with no known short proof to have such an efficient verification procedure. The key lies in letting $V$ access randomness and allowing $P$ to add (many) extra bits to the proof.

A PCP for GNI

The intuition behind $GNI$ 's PCP is simple. First, $P$ will build:

A list of all graphs of $n$ vertices up to isomorphism (one representative per isomorphism class).
A list of all possible graphs of $n$ vertices.

Recall that no computational assumption has been made regarding $P$ , so we assume that $P$ is able to do the above. Also, note that those two lists are different. For instance, there exist 1024 possible graphs with 5 vertices, out of which only 34 are pairwise non-isomorphic.
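Both counts can be double-checked without any graph library. The sketch below enumerates every edge subset on `n` labeled vertices and groups the resulting graphs by a brute-force canonical form (the smallest relabeled edge list). This canonical form is a simple choice made for this illustration and is only practical for very small `n`.

```python
from itertools import combinations, permutations

def canonical_form(edges, n):
    """Smallest sorted edge list over all relabelings: equal iff isomorphic."""
    best = None
    for perm in permutations(range(n)):
        relabeled = tuple(sorted(tuple(sorted((perm[u], perm[v]))) for u, v in edges))
        if best is None or relabeled < best:
            best = relabeled
    return best

def count_graphs(n):
    """Return (number of labeled graphs, number of isomorphism classes)."""
    pairs = list(combinations(range(n), 2))
    labeled = 0
    classes = set()
    for mask in range(2 ** len(pairs)):
        # Bit i of the mask decides whether the i-th vertex pair is an edge.
        edges = [pairs[i] for i in range(len(pairs)) if (mask >> i) & 1]
        labeled += 1
        classes.add(canonical_form(edges, n))
    return labeled, len(classes)

print(count_graphs(5))  # (1024, 34)
```

The first number grows as $2^{n(n-1)/2}$ , which is why the array built by $P$ gets large so quickly.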

The proof that $P$ hands over to $V$ consists of an array indexed by all possible graphs of $n$ vertices. Each value in the array is set to 0 if the graph at this index is isomorphic to $G_0$ , and to 1 if it is isomorphic to $G_1$ . If the graph being indexed is isomorphic to neither, $P$ sets the value randomly.

Upon receiving $(G_0, G_1, \pi)$ , the check that $V$ does is straightforward. $V$ randomly chooses a bit $b=0$ or $b=1$ , takes $G_b$ , applies a uniformly random permutation to its vertices to obtain a graph $G_p$ and accesses the array at the position of $G_p$ . If the value at this position is $b$ then $V$ accepts, otherwise $V$ rejects.

Recall that by the definition of the PCP class, the proof length should be on the order of $2^{poly(n)}$ . That is the case, since for $n$ vertices, there exist $2^{n(n-1)/2}$ possible graphs, which is the length of the proof that $P$ will send to $V$ . You can also see that the number of queries $V$ makes to obtain the required soundness level is constant.

The soundness of this proof system is straightforward. Suppose that $G_0$ and $G_1$ are isomorphic. Then a random permutation of $G_0$ and a random permutation of $G_1$ follow the same distribution, so $G_p$ reveals nothing about $b$ : whatever $P$ wrote in the array, the value at position $G_p$ equals $b$ with probability exactly $1/2$ , and $V$ rejects when the two bits differ. If $V$ wants to reduce the soundness error, it just repeats the check several times with fresh randomness.
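The $1/2$ bound can be written out for an arbitrary proof string $\pi$ , not only the honestly built one. In the derivation below, $\sigma$ denotes the uniformly random permutation chosen by $V$ , and $p$ is shorthand (introduced here) for the probability that the array holds a 0 at the position of a random permutation of $G_0$ . Since $G_0$ and $G_1$ are isomorphic, $\sigma(G_0)$ and $\sigma(G_1)$ are identically distributed, so the same $p$ applies to both:

$$
\Pr[V \text{ accepts}] \;=\; \tfrac{1}{2} \Pr_\sigma\big[\pi[\sigma(G_0)] = 0\big] + \tfrac{1}{2} \Pr_\sigma\big[\pi[\sigma(G_1)] = 1\big] \;=\; \tfrac{1}{2}\, p + \tfrac{1}{2}\,(1 - p) \;=\; \tfrac{1}{2}
$$

With $k$ independent repetitions, a proof for two isomorphic graphs survives all checks with probability $2^{-k}$ .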

Completeness is just as direct: if $G_0$ and $G_1$ are not isomorphic, no graph is isomorphic to both, so the randomly permuted graph $G_p$ always makes $V$ access the array at an index whose value equals the sampled bit $b$ .
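For very small graphs, both properties can be checked exactly by enumerating every choice of $b$ and every permutation instead of sampling. The sketch below does this in plain Python for 4 vertices. The edge-list representation and all function names are assumptions made for this illustration, and the honest prover fills unused entries with 0 rather than a random bit to keep the output deterministic.

```python
from fractions import Fraction
from itertools import combinations, permutations

N = 4  # number of vertices, kept tiny so exhaustive enumeration is instant
PAIRS = list(combinations(range(N), 2))
PERMS = list(permutations(range(N)))

def relabel(graph, perm):
    """Apply a vertex permutation to a graph given as a collection of edges."""
    return frozenset(frozenset((perm[u], perm[v])) for u, v in graph)

def orbit(graph):
    """All graphs isomorphic to `graph`, i.e. its isomorphism class."""
    return {relabel(graph, p) for p in PERMS}

def all_graphs():
    """Every graph on N labeled vertices, one per subset of vertex pairs."""
    for k in range(len(PAIRS) + 1):
        for edges in combinations(PAIRS, k):
            yield frozenset(frozenset(e) for e in edges)

def prove(g0, g1):
    """Honest proof table: 0 on the class of G_0, 1 on the class of G_1."""
    class0, class1 = orbit(g0), orbit(g1)
    pi = {}
    for g in all_graphs():
        if g in class0:
            pi[g] = 0
        elif g in class1:
            pi[g] = 1
        else:
            pi[g] = 0  # never queried by the verifier, any value works
    return pi

def acceptance_probability(pi, g0, g1):
    """Exact probability of acceptance over the bit b and the permutation."""
    accepted = sum(pi[relabel([g0, g1][b], p)] == b for b in (0, 1) for p in PERMS)
    return Fraction(accepted, 2 * len(PERMS))

path = [(0, 1), (1, 2), (2, 3)]
star = [(0, 1), (0, 2), (0, 3)]
twisted_path = [(2, 0), (0, 3), (3, 1)]  # the same path, relabeled

print(acceptance_probability(prove(path, star), path, star))                  # 1
print(acceptance_probability(prove(path, twisted_path), path, twisted_path))  # 1/2
```

The first line is completeness (a true claim is always accepted), the second is the soundness error of a single check on a false claim.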

Scripting a PCP for GNI

Let's build this PCP system for $GNI$ . I will omit most imports and helper function definitions when comments are sufficient to understand what's going on. This runs with Sage 9.8.

First, we build a list with all the possible graphs composed of $5$ vertices. For the list of non-isomorphic graphs, Sage can enumerate them for us, so let's use it directly.

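from collections import defaultdict
from itertools import combinations
import random

n = 5
vertices = range(n)
possible_edges = list(combinations(vertices, 2))
# `powerset` (like `Graph` and `graphs` below) is assumed to come from the Sage session
powerset_all_possible_edges = list(powerset(possible_edges))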

all_graphs = []
for g in powerset_all_possible_edges:
    d = defaultdict(list, {k: [] for k in vertices})
    for edge in g:
        d[edge[0]].append(edge[1])
    all_graphs.append(Graph(d, immutable=True))

# n vertices --> n(n-1)/2 possible edges (pairs of vertices)
# every subset of those pairs is a graph --> 2^{n(n-1)/2} graphs on n labeled vertices
# so the proof length is exponential in the number of vertices
assert len(all_graphs) == 2**(n * (n - 1) // 2)

# also, sage has a db of all non isomorphic graphs of n vertices
# list of all non isomorphic graphs with n vertices, using `graphs()`
non_isomorphic_graphs = [g.copy(immutable=True) for g in graphs(len(vertices))]

We will randomly sample two non-isomorphic graphs and build a proof for it.

# generating a proof requires two non-isomorphic graphs and 
# a list of all possible graphs of n vertices
def prove(g0, g1, candidate_graphs):  # parameter renamed so it does not shadow Sage's `graphs`
    pi = {} # the proof is an array of bits, indexed by all possible n-vertex graphs
    for g in candidate_graphs:
        if g.is_isomorphic(g0):
            pi[g] = 0
        elif g.is_isomorphic(g1):
            pi[g] = 1
        else:
            # the choice does not matter if neither are isomorphic to g
            # this bit will never be inspected by the verifier
            pi[g] = random.choice([0, 1])
    return pi

g0, g1 = sample_graph_pair(non_isomorphic_graphs)
pi = prove(g0, g1, all_graphs)

The verifier can check $\pi$ :

# `soundness` is the number of independent checks the verifier runs on the proof
def verify(pi, g0, g1, soundness):
    for _ in range(soundness):
        b = random.choice([0, 1])
        gb = [g0, g1][b]
        h = get_permuted_graph(gb)
        if pi[h] != b:
            return False
    return True

proof_is_valid = verify(pi, g0, g1, 10000)
assert(proof_is_valid)

You can also try to cheat the verifier:

# try to cheat verifier with two isomorphic graphs
g1 = get_permuted_graph(g0) # g1 is a permutation of g0
pi = prove(g0, g1, all_graphs)
# with a single check, the cheating proof would still pass with probability 1/2,
# so we run 50 checks: it now passes with probability 2^-50 only
proof_is_valid = verify(pi, g0, g1, 50)
assert not proof_is_valid