Elena-Madalina Persu ‘13

Harvard School of Engineering and Applied Sciences, Computer Science

This paper presents an algorithm for efficiently evolving 2-CNF when the underlying variables are uniformly distributed. This contributes to the new Evolvability model proposed by Valiant [4] and developed by Feldman [1]. It is still an open question whether 2-CNF is evolvable under unknown distributions.

**Motivation**

From a biological point of view, the basis of evolution is natural selection as introduced by Charles Darwin. Natural selection has four components:

- Variation. Organisms within populations exhibit individual variation. However, this does not assume total variation, since some traits may show little or no variation among individuals.
- Inheritance. Traits are consistently passed on from parent to offspring.
- High rate of population growth.
- Survival of the fittest. Individuals possessing traits well-suited to the environment will have more offspring, such that in the end deleterious traits perish.

The theoretical computer science model for evolvability introduced by Valiant [4] emulates these four concepts. As biological traits can be analyzed, we analyze different functions and characterize them as evolvable or not according to the model.

**Learning models**

**PAC learning**

Evolvability is actually a restricted form of Probably Approximately Correct (PAC) Learning, first introduced by Valiant [3].

Under the PAC model we are trying to learn a hidden function *f* from a concept class *C*. For this paper we assume *f* : {−1, 1}^{n} → {−1, 1} has the form of a 2-CNF over *n* variables, with literal set *X* = {*x*_{1}, ¬*x*_{1}, *x*_{2}, ¬*x*_{2}, …, *x*_{n}, ¬*x*_{n}}, i.e. *f* = ˄ (*y*_{i} ˅ *y*_{j}), where *y*_{i}, *y*_{j} ∈ *X*. The learning algorithm is given access to an example oracle *EX*(*f*, *D*_{n}) that upon being invoked returns an example (*x*, *f*(*x*)), where *x* is chosen randomly with respect to a distribution *D*_{n}.
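Under the uniform distribution, such an example oracle is easy to simulate; a minimal sketch in Python, using a {−1, 1} encoding and clause/literal representations that are ours, not the paper's:

```python
import random

# A 2-CNF over n variables with values in {-1, +1}: each clause is a pair of
# literals, a literal being (index, sign) with sign = +1 for x_i, -1 for ¬x_i.
def literal_value(x, lit):
    i, sign = lit
    return x[i] * sign  # +1 if the literal is satisfied, -1 otherwise

def evaluate_2cnf(clauses, x):
    # f(x) = 1 iff every clause has at least one true literal
    ok = all(literal_value(x, a) == 1 or literal_value(x, b) == 1
             for a, b in clauses)
    return 1 if ok else -1

def EX(clauses, n, rng=random.Random(0)):
    """Example oracle EX(f, D_n) under the uniform distribution on {-1, 1}^n."""
    x = [rng.choice([-1, 1]) for _ in range(n)]
    return x, evaluate_2cnf(clauses, x)
```

For instance, with *f* = (*x*_{1} ∨ *x*_{2}) encoded as `[((0, 1), (1, 1))]`, each call to `EX` returns a uniformly random assignment together with its label.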

An algorithm is said to efficiently PAC learn a class of functions *C* if for every ε, *δ* > 0, *f* ∈ *C* and distribution *D*_{n}, having access to *EX*(*f*, *D*_{n}) it outputs, with probability at least 1 − *δ* and in time polynomial in 1/ε, 1/*δ*, *n* and the size of a concept in *C*, a hypothesis *h* for which *P*_{Dn}[*f*(*x*) ≠ *h*(*x*)] ≤ ε.

We say that a class of functions *C*_{n} is PAC-learnable if there exists an algorithm which efficiently PAC learns it.

#### SQ-Learning

The Statistical Query Learning model, introduced by Kearns [2], has the same structure as the PAC learning model; the only difference is that it has access to a different type of oracle instead of *EX*(*f*, *D*_{n}). SQ-Learning uses an oracle, *STAT*(*D*_{n}, *f*, τ), which has the property that upon query with any function *g*(*x*, *f*) it outputs an approximation to *P*(*g*(*x*, *f*) = 1) within an error of τ. Note that *g* is a function of both the *n* variables and the hidden function *f*.

Similar to PAC learnability, a class *C*_{n} is said to be SQ-learnable if there is an algorithm which, taking as input ε and τ and having access to *STAT*(*D*_{n}, *f*, τ), outputs a hypothesis *h* such that *P*_{Dn}[*f*(*x*) ≠ *h*(*x*)] ≤ ε, in time polynomial in 1/ε, 1/τ and *n*. Note the disappearance of the uncertainty given by *δ* from the PAC case.
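A STAT oracle can itself be simulated from random examples by sampling. A minimal sketch under the uniform distribution, with a sample size chosen by a standard Chernoff-style bound (the sample count and all names here are illustrative assumptions):

```python
import random

def STAT(g, f, n, tau, rng=None):
    """Simulated STAT(D_n, f, tau): estimate P[g(x, f(x)) = 1] by sampling
    uniform x in {-1, 1}^n. O(1/tau^2) samples give accuracy tau with high
    probability; this sketch ignores the small failure probability."""
    rng = rng or random.Random(0)
    m = int(4 / tau ** 2) + 1  # enough samples for tolerance tau, w.h.p.
    hits = 0
    for _ in range(m):
        x = [rng.choice([-1, 1]) for _ in range(n)]
        if g(x, f(x)) == 1:
            hits += 1
    return hits / m
```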

#### Evolvability

The evolvability model transposes into theoretical computer science terms the concept of natural selection. For a complete characterization of it see [4].

As in the previous cases, there is a hidden concept *f* that we try to approximate. For any function *g* from the concept class *C* we define a subset *N*(*g*) of *C*, the neighbors of *g*, which presumably do not differ very much from *g* according to some pre-established criterion. All functions in *C* are characterized by a performance function which indicates how close they are to *f*. *N*(*g*) is subsequently divided with respect to the performance function into three disjoint sets: beneficial, neutral and deleterious neighbors. At any step, *g* is allowed to mutate into a beneficial neighbor, the probability of mutating into a given beneficial neighbor being proportional to its fitness with respect to the performance function. If no beneficial neighbors exist, *g* is allowed to mutate into a neutral neighbor, again with probability proportional to fitness.

In the end, a class *C* is said to be evolvable if, starting from any *g* in *C*, we can arrive at an approximation of the hidden function *f* within error ε in a polynomial number of mutations.
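One selection step of this process can be sketched as follows, with the empirical performance taken over a fixed sample, a tolerance *t* separating beneficial from neutral neighbors and, for brevity, a uniform rather than fitness-proportional choice among them (all names and these simplifications are ours, not Valiant's exact definitions):

```python
import random

def perf(g, f, xs):
    # Empirical performance of hypothesis g against the hidden f
    return sum(g(x) * f(x) for x in xs) / len(xs)

def mutation_step(g, neighbors, f, xs, t, rng=random.Random(2)):
    """One illustrative selection step: neighbors beating g's performance by
    more than t are beneficial, those within t are neutral; mutate into a
    beneficial neighbor if any exists, otherwise into a neutral one."""
    base = perf(g, f, xs)
    scored = [(h, perf(h, f, xs)) for h in neighbors]
    beneficial = [h for h, p in scored if p > base + t]
    neutral = [h for h, p in scored if abs(p - base) <= t]
    if beneficial:
        return rng.choice(beneficial)
    if neutral:
        return rng.choice(neutral)
    return g
```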

#### CSQ-Learning

Feldman [1] showed that the Evolvability model introduced by Valiant is equivalent to the Correlational Statistical Query (CSQ) model. This model is a restriction of the SQ model in the sense that now *STAT*(*D*_{n}, *f*, τ) cannot accept an arbitrary function *g*(*x*, *f*). Instead, the queries must be of the form *g*(*x*) and the returned result is an approximation of *E*_{Dn}[*f*(*x*)*g*(*x*)] within τ. Note that *g* can be any function of the *n* variables.

We use this different characterization to show that the concept class of 2-CNF is evolvable.
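For small *n*, an idealized (zero-tolerance) CSQ oracle under the uniform distribution can be simulated by exhaustive enumeration; a sketch with illustrative names:

```python
from itertools import product

def csq_exact(g, f, n):
    """E_uniform[f(x) g(x)] computed exactly over {-1, 1}^n (small n only).
    A real CSQ oracle returns this value only up to a tolerance tau;
    exact enumeration stands in for it in this sketch."""
    total = sum(f(x) * g(x) for x in product([-1, 1], repeat=n))
    return total / 2 ** n
```

For example, `csq_exact(lambda x: x[0], lambda x: x[0], 1)` is 1 (perfect correlation), while querying with a constant *g* ≡ 1 against *f*(*x*) = *x*_{1} gives 0.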

**Evolving 2-CNF**

This section shows an algorithm for evolving 2-CNF under the uniform distribution. We assume that the true function is a 2-CNF that we call *f* and that we have *n* literals *L* = {*x*_{1}, *x*_{2}, …, *x*_{n}}. Let *L̄* = {¬*x*_{1}, ¬*x*_{2}, …, ¬*x*_{n}} be the set of negations of all the variables. Note that given *d* = *a*_{1} ∨ *a*_{2}, where *a*_{i} ∈ *L* ∪ *L̄* and *a*_{1} ≠ *a*_{2}, we can make use of the function *g*_{d} = *d* − 1. Hence, if at least one of *a*_{1}, *a*_{2} is true, the value of *g*_{d} will be 0, and if they are both false *g*_{d} will be −2. The key to the algorithm is the observation that if *d* is in *f*, then *E*[*g*_{d}*f*] = 1/2, while if *d* is not part of *f*, then *E*[*g*_{d}*f*] is significantly different from 1/2. In a similar manner, for any *d′* = *a* ∨ *a* with *a* ∈ *L* ∪ *L̄*, and *g*_{d′} = *d′* − 1, if *d′* is in *f* then *E*[*g*_{d′}*f*] = 1, while if *d′* is not part of *f* then *E*[*g*_{d′}*f*] is significantly different from 1. By querying the CSQ oracle, for every *d* and *d′*, for *E*[*g*_{d}*f*] and *E*[*g*_{d′}*f*], we will be able to identify all *d*, *d′* that are in *f*. Our algorithm falsely characterizes some *d*'s and *d′*'s which are not part of *f* as being part of *f*, but we show that the error that arises is less than ε.
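These two expectations can be checked by brute force on a small example (a self-check under the uniform distribution; for brevity this version handles positive literals only, and the representation is ours):

```python
from itertools import product

def clause_val(x, i, j):
    # value of (x_i or x_j) in the {-1, 1} convention; i == j gives (a or a)
    return 1 if (x[i] == 1 or x[j] == 1) else -1

def expect_gd_f(clauses, n, i, j):
    """E[(d - 1) f] under the uniform distribution for d = (x_i or x_j),
    with f given as a list of positive-literal clauses, by enumeration."""
    total = 0
    for x in product([-1, 1], repeat=n):
        f = 1 if all(clause_val(x, a, b) == 1 for a, b in clauses) else -1
        total += (clause_val(x, i, j) - 1) * f
    return total / 2 ** n
```

With *f* = (*x*_{1} ∨ *x*_{2}), the clause itself gives exactly 1/2; a single-literal *d′* in *f* gives exactly 1; a clause outside *f* lands away from those values.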

The algorithm is provided first, followed by an explanation and proof of correctness.

**Input:** A set of literals *L* = {*x*_{1}, *x*_{2}, …, *x*_{n}}

**Input:** A CSQ oracle *O* with tolerance *τ*(*n*)

**Input:** A desired error ε

**Output:** A learned conjunction *h*

Let *S* = {(*a*_{1} ˅ *a*_{2}) | *a*_{i} ∈ *L* ∪ *L̄*, *a*_{1} ≠ *a*_{2}}.

Let *T* = {(*a* ˅ *a*) | *a* ∈ *L* ∪ *L̄*}.

**foreach** disjunction *d* ∈ *S* **do**
  Let *g*_{d} = *d* − 1.
  Query the CSQ oracle for *E*[*g*_{d}*f*] with *τ* = ε/(6*n*^{2}); let *α*(*d*) be the result obtained.
  **if** 1/2 − *α*(*d*) > ε/(6*n*^{2}) **then** remove *d* from *S*.

**foreach** disjunction *d′* ∈ *T* **do**
  Let *g*_{d′} = *d′* − 1.
  Query the CSQ oracle for *E*[*g*_{d′}*f*] with *τ* = ε/(6*n*^{2}); let *β*(*d′*) be the result obtained.
  **if** 1 − *β*(*d′*) > ε/(6*n*^{2}) **then** remove *d′* from *T*.

Compute the hypothesis *h* = ˄_{d ∈ S ∪ T} *d*.

Return *h*.
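A compact Python rendering of the algorithm, run against an idealized exact CSQ oracle computed by enumeration (the signed-index literal encoding and all helper names are our own; a real run would only see tolerance-*τ* answers):

```python
from itertools import product, combinations

def make_f(clauses, n):
    """Build f and a clause evaluator from clauses given as pairs of literals,
    a literal being (index, sign) with sign +1 for x_i and -1 for ¬x_i."""
    def lit(x, l):
        i, s = l
        return x[i] * s
    def clause(x, d):
        a, b = d
        return 1 if (lit(x, a) == 1 or lit(x, b) == 1) else -1
    def f(x):
        return 1 if all(clause(x, c) == 1 for c in clauses) else -1
    return f, clause

def learn_2cnf(n, clauses_of_f, eps):
    """Sketch of the algorithm above with an exact (enumeration) CSQ oracle."""
    f, clause = make_f(clauses_of_f, n)
    def csq(d):  # exact E[(d - 1) f] under the uniform distribution
        tot = sum((clause(x, d) - 1) * f(x)
                  for x in product([-1, 1], repeat=n))
        return tot / 2 ** n
    tau = eps / (6 * n ** 2)
    lits = [(i, s) for i in range(n) for s in (1, -1)]
    # Keep d in S only if E[(d-1)f] is within tau of 1/2; same for T with 1.
    S = [d for d in combinations(lits, 2) if 0.5 - csq(d) <= tau]
    T = [(a, a) for a in lits if 1.0 - csq((a, a)) <= tau]
    kept = S + T  # h is the conjunction of the surviving disjunctions
    def h(x):
        return 1 if all(clause(x, d) == 1 for d in kept) else -1
    return kept, h
```

Running it with hidden *f* = (*x*_{1} ∨ *x*_{2}) over two variables recovers that clause and a hypothesis agreeing with *f* on every assignment.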

#### Proof

*Initial remarks*

There are many equivalent 2-CNF formulas. For example, (*a* ∨ *c*) ∧ (*a* ∨ ¬*c*) is equivalent to (*a* ∨ *a*), since one of *c* and ¬*c* is true and the other is false, hence *a* needs to be true for the first formula to be true. This is obviously true for the second one as well, so they are equivalent. We can assume then that *f* is given in any of its equivalent forms, and we choose the longest of all 2-CNF formulas equivalent to it that don't have duplicate disjunctions. Note that since we don't allow duplicates, the length of any given *f* will be at most *n*(2*n* − 1) + 2*n* = *n*(2*n* + 1), so it remains polynomial in *n*.
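The equivalence in this example can be confirmed by enumerating all assignments (a small self-check in the paper's {−1, 1} convention; the helper is ours):

```python
from itertools import product

def disj(u, v):
    # value of a disjunction of two literal values in the {-1, 1} convention
    return 1 if (u == 1 or v == 1) else -1

# (a or c) and (a or not c)  vs  (a or a), over all four assignments
for a, c in product([-1, 1], repeat=2):
    lhs = 1 if (disj(a, c) == 1 and disj(a, -c) == 1) else -1
    assert lhs == disj(a, a)
```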

Now we have to show that the described algorithm works; that is, given an input error ε, the algorithm returns a 2-CNF *h* such that *P*(*h*(*x*) ≠ *f*(*x*)) < ε, in time polynomial in *n* and 1/ε.

Since we have access to a CSQ oracle, we can find *E*[(*d − *1)*f*]* *within an error of *τ*(*n*), for any disjunction *d*.

We analyze two cases: what happens with disjunctions that are part of *f* and with the ones that aren't. We treat in parallel the subcases given by the two sets *S* and *T*, picking one disjunction from *S* and one from *T*.

*Case 1*.

If *d* ∈ *S*, *d′* ∈ *T* and *d*, *d′* ∈ *f*, then *E*[*g*_{d}*f*] = 1/2 and *E*[*g*_{d′}*f*] = 1.

*Proof:* If *d* = 1 (similarly if *d′* = 1), then regardless of the value of *f* (either 1 or −1), (*d* − 1)*f* = 0 and (*d′* − 1)*f* = 0, hence this case does not contribute to the expected value.

If *d* = −1 or if *d′* = −1, since *d* ∈ *f* and *d′* ∈ *f*, this means *f* = −1, so

*E*[*g*_{d}*f*] = 2 · *P*(*d* = −1) = 2 · (1/2)^{2} = 1/2

*E*[*g*_{d′}*f*] = 2 · *P*(*d′* = −1) = 2 · (1/2) = 1

Note that *P*(*d* = −1) = 1/4 and *P*(*d′* = −1) = 1/2 because the literals that appear in *d* are distinct and uniformly distributed; they all need to be −1, hence the probabilities are 1/4 and 1/2. Since the true expectations are 1/2 and 1, when the CSQ oracle is queried with *τ* = ε/(6*n*^{2}), we have

1/2 − *α*(*d*) ≤ *τ* = ε/(6*n*^{2}).

Similarly,

1 − *β*(*d′*) ≤ *τ* = ε/(6*n*^{2}).

Hence our algorithm doesn’t remove *d* and *d’* from *S *and *T*, respectively, so *d *and *d’* will be part of *h*. This proves that all disjunctions from* S *and *T *which are in *f* will also be in the output hypothesis.

*Case 2*.

If *d* = *a*_{1} ∨ *a*_{2} ∉ *f*, let *f* = *A* ∧ *R*, where *A* is the conjunction of the disjunctions in *f* which contain one of the literals in *d*, and *R* is the conjunction of the rest. Let *D* be the set of distinct literals from *L* ∪ *L̄* which appear in *A*, excluding *a*_{1}, *a*_{2}, and let *t* be its number of elements. Let *p* be the event that all those literals are true.

Again we have that if *d* = 1 then (*d* − 1)*f* = 0 regardless of *f*. The probability that *d* = −1 is 1/4, as discussed above. Hence

*E*[*g*_{d}*f*] = (1/4) · (−2) · *E*[*f* | *d* = −1] = 1/2 − *P*(*f* = 1 | *d* = −1).

For *f* to be 1 when *d* = −1, we need *p* to happen or else a disjunction from *A* will be false.

*Key idea:* We claim that *D* cannot contain both forms, affirmative and negative, of any literal *y*.

Assume for the sake of contradiction that this were the case. Then if *d* = −1, we have *f* = −1, because for each example one of *y*, ¬*y* will be −1, so one disjunction in *A* is false, hence *f* is −1. This means that the 2-CNF *f′* = *f* ∧ *d* is equivalent to *f*. If *f′* is true, then *f*, being a subset of its clauses, needs to be true; if *f′* is false, then either *f* or *d* is false, and since *d* being false implies *f* is false, *f′* = −1 ⇒ *f* = −1. Conversely, if *f* is false, then *f′* is false, and if *f* is true, then *d* needs to be true, hence *f′* is true. However, note that we are in the case *d* ∉ *f*, so we have obtained a 2-CNF equivalent to *f* of greater length. This contradicts our assumption that *f* is the longest among the 2-CNF formulas equivalent to it.

Overall, we get that *D* doesn't contain both forms of any literal, so all literals appearing in *D* are independent. This means that the probability of *p* (all literals in *D* being true) is just (1/2)^{t}. We obtain

*P*(*f* = 1 | *d* = −1) = (1/2)^{t} · *P*(*R* = 1 | *p*).

Overall we have:

*E*[*g*_{d}*f*] = 1/2 − (1/2)^{t} · *P*(*R* = 1 | *p*).

Let *err* = (1/2)^{t}*P*(*R* = 1|*p*).

If *err* > 2*τ*(*n*), then our algorithm will correctly identify *d* as not being part of *f*, since the CSQ oracle will output an answer less than 1/2 − *τ*, whereas if *d* is in *f* the oracle outputs an answer of at least 1/2 − *τ*.

An identical argument holds for the case when *d′* ∈ *T* and *d′* ∉ *f*; we make the same error considerations, but with respect to 1 instead of 1/2.

As proven in the first case, no disjunction of *f* will be removed from *S* or *T*, hence *h* = *f* ∧ *f′*, where *f′* is a (possibly empty) 2-CNF. The error in our hypothesis *h* comes from misclassifying disjunctions *d*, *d′* as being part of *f* when they aren't, which happens only if *err* < 2*τ*. In this case, for such a disjunction *d* = *a*_{1} ∨ *a*_{2}, the difference arises when *d* = −1, so *h* is −1 while *f* can be 1.

As shown above, if *d* = −1 and *d* ∉ *f*, then

*P*(*f* = 1, *d* = −1) = (1/4) · *err* < (1/4) · 2*τ* = ε/(12*n*^{2}).

Now, overall, we have |*S*| = *n*(2*n* − 1) and |*T*| = 2*n*, i.e. *n*(2*n* − 1) + 2*n* = *n*(2*n* + 1) different disjunctions, so we have at most *n*(2*n* + 1) values of *d* from which we get errors on *h*. By a union bound, the total error is at most

*n*(2*n* + 1) · ε/(12*n*^{2}) = ((2*n* + 1)/(12*n*)) · ε ≤ ε.

Hence, *h* is a good approximation for *f,* giving an error of at most *ε*, as desired.

Note also that *h* was found in time polynomial in *n* and 1/ε. In order to find *h* we queried the CSQ oracle *n*(2*n* + 1) times with tolerance *τ* = ε/(6*n*^{2}). Since the CSQ oracle answers in time polynomial in *n* and 1/*τ*, our algorithm is therefore polynomial in *n* and 1/ε.

**Conclusions and future work**

We presented an algorithm for learning 2-CNF under the CSQ model when the underlying variables are uniformly distributed. This shows that 2-CNF formulas are evolvable under uniform distributions, because the CSQ model is equivalent to the evolvability model.

It would be interesting to generalize our algorithm to the case of *k*-CNF and find a direct algorithm which proves evolvability, as described by Valiant [4], without using CSQ.

**Acknowledgements**

I would like to thank Prof. Leslie Valiant for everything I learned in CS 228 and Varun Kanade for useful discussions on this research.

**References**

[1] Vitaly Feldman. Evolvability from learning algorithms. In COLT, 2008.

[2] Michael Kearns. Efficient noise-tolerant learning from statistical queries. J. ACM, 1998.

[3] Leslie Valiant. A theory of the learnable. Commun. ACM, 1984.

[4] Leslie Valiant. Evolvability. In 32nd International Symposium on Mathematical Foundations of Computer Science, 2007.