Elena-Madalina Persu ‘13

Harvard School of Engineering and Applied Sciences, Computer Science

This paper presents an algorithm for efficiently evolving 2-CNF when the underlying variables are uniformly distributed. This contributes to the evolvability model recently proposed by Valiant [4] and developed by Feldman [1]. Whether 2-CNF is evolvable under an unknown distribution remains an open question.

Motivation

From a biological point of view, the basis of evolution is natural selection as introduced by Charles Darwin. Natural selection has four components:

  • Variation. Organisms within populations exhibit individual variation. However, variation need not be total: some traits show little or no variation among individuals.
  • Inheritance. Traits are consistently passed on from parent to offspring.
  • High rate of population growth.
  • Survival of the fittest. Individuals possessing the traits well-suited for the environment will have more offspring such that in the end, deleterious traits perish.

The theoretical computer science model for evolvability introduced by Valiant [4] emulates these four concepts. Just as biological traits can be analyzed, we can analyze different functions and characterize them as evolvable or not according to the model.

Learning models

PAC learning

Evolvability is actually a restricted form of Probably Approximately Correct (PAC) Learning, first introduced by Valiant [3].

Under the PAC model we are trying to learn a hidden function f from a concept class C. For this paper we assume f : {−1, 1}ⁿ → {−1, 1} has the form of a 2-CNF over n variables, with literal set X = {x1, ¬x1, x2, ¬x2, …, xn, ¬xn}, that is, f = ⋀ (yi ∨ yj), where yi, yj ∈ X. The learning algorithm is given access to an example oracle EX(f, Dn) that upon being invoked returns an example (x, f(x)), where x is chosen randomly with respect to a distribution Dn.

An algorithm is said to efficiently PAC learn a class of functions C if for every ε, δ > 0, every f ∈ C and every distribution Dn, the algorithm, given access to EX(f, Dn), outputs with probability at least 1 − δ, in time polynomial in 1/ε, 1/δ, n and the size of a concept in C, a hypothesis h for which PDn[f(x) ≠ h(x)] ≤ ε.
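
For concreteness, here is a minimal Python sketch of the example oracle for a hidden 2-CNF under the uniform distribution. The ±1 encoding follows the definition above; the clause representation (a pair of signed indices, +i for xi and −i for ¬xi) and the helper names lit_value, eval_2cnf and ex_oracle are our own illustrative choices, not part of the model.

import random

def lit_value(x, lit):
    # Value in {-1, 1} of literal lit under assignment x; a literal is a
    # signed index: +i stands for x_i, -i for its negation (1-indexed).
    v = x[abs(lit) - 1]
    return v if lit > 0 else -v

def eval_2cnf(f, x):
    # f(x) = 1 iff every clause (a, b) has at least one true literal.
    return 1 if all(lit_value(x, a) == 1 or lit_value(x, b) == 1
                    for (a, b) in f) else -1

def ex_oracle(f, n):
    # EX(f, Dn) for the uniform distribution: one labeled example (x, f(x)).
    x = [random.choice([-1, 1]) for _ in range(n)]
    return x, eval_2cnf(f, x)

# Example: f = (x1 v ~x2) ^ (x2 v x3) over n = 3 variables.
print(ex_oracle([(1, -2), (2, 3)], 3))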

We say that a class of functions Cn is PAC-learnable if there exists an algorithm which efficiently PAC learns it.

SQ-Learning

The Statistical Query Learning model, introduced by Kearns [2], has the same structure as the PAC learning model; the only difference is that it has access to a different type of oracle in place of EX(Dn, f). SQ-learning uses an oracle STAT(Dn, f, τ) which, upon being queried with any function g(x, f), outputs an approximation to P(g(x, f) = 1) within an error of τ. Note that g is a function of both the n variables and the hidden function f.

Similar to PAC learnability, a class Cn is said to be SQ-learnable if there is an algorithm which takes as input ε and τ, has access to STAT(Dn, f, τ), and outputs a hypothesis h such that PD[f(x) ≠ h(x)] ≤ ε, in time polynomial in 1/ε, 1/τ and n. Note the disappearance of the uncertainty parameter δ from the PAC case.
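
As an illustration, a STAT oracle can be simulated from EX by sampling; the sample size below comes from the standard Hoeffding bound, and stat_oracle, like the helpers above, is a hypothetical sketch rather than part of the formal definition.

import math

def stat_oracle(f, n, g, tau, delta=0.01):
    # Estimate P[g(x, f(x)) = 1] within tau, with probability >= 1 - delta,
    # from m random examples; m grows as 1/tau^2 (illustrative only).
    m = math.ceil(math.log(2 / delta) / (2 * tau ** 2))  # Hoeffding bound
    hits = sum(1 for _ in range(m) if g(*ex_oracle(f, n)) == 1)
    return hits / m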

Evolvability

The evolvability model transposes the concept of natural selection into theoretical computer science terms. For a complete characterization, see [4].

As in the previous cases, there is a hidden concept f that we try to approximate. For any function g in the concept class C we define a subset N(g) of C, the neighbors of g, each of which differs only slightly from g according to some pre-established criterion. All functions in C are scored by a performance function which indicates how close they are to f. With respect to this performance function, N(g) is divided into three disjoint sets: beneficial, neutral and deleterious neighbors. At any step, g is allowed to mutate into a beneficial neighbor; the probability of mutating into a given beneficial neighbor is proportional to that neighbor's fitness under the performance function. If no beneficial neighbors exist, g is allowed to mutate into a neutral neighbor, again with probability proportional to fitness.

In the end, a class C is said to be evolvable if, starting from any g in C, we can arrive at an approximation of the hidden function f within ε error in a polynomial number of mutations.

CSQ-Learning

Feldman [1] showed that the evolvability model introduced by Valiant is equivalent to the Correlational Statistical Query (CSQ) model. This model is a restriction of the SQ model in the sense that STAT(Dn, f, τ) no longer accepts an arbitrary function g(f, x); instead, the queries must be functions g(x) of the variables alone, and the returned result approximates EDn[f(x)g(x)]. Note that g can still be any function of the n variables.
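
Correspondingly, here is a sketch of a CSQ oracle. For the small n used in these illustrations we can afford to compute EDn[f(x)g(x)] exactly by enumerating all 2^n assignments; a real oracle only promises the value within tolerance τ.

from itertools import product

def csq_exact(f, n, g):
    # E[f(x)g(x)] under the uniform distribution, by full enumeration.
    total = sum(eval_2cnf(f, list(x)) * g(list(x))
                for x in product([-1, 1], repeat=n))
    return total / 2 ** n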

We use this different characterization to show that the concept class of 2-CNF is evolvable under the uniform distribution.

Evolving 2-CNF

This section shows an algorithm for evolving 2-CNF under the uniform distribution. We assume that the true function is a 2-CNF that we call f, over the n variables L = {x1, x2, …, xn}. Let \overline{L} = {¬x1, ¬x2, …, ¬xn} be the set of negations of all the variables. Given a disjunction d = a1 ∨ a2, where a1, a2 ∈ L ∪ \overline{L} and a1 ≠ a2, we make use of the function gd = d − 1, viewed as a function of the assignment. Hence, if at least one of a1, a2 is true, the value of gd is 0, and if both are false, gd is −2. The key to the algorithm is the observation that if d is in f, then E[gdf] = 1/2, while if d is not part of f, then E[gdf] is smaller than 1/2; whenever the gap is non-negligible, d can be ruled out. In a similar manner, for any d’ = a ∨ a and gd’ = d’ − 1, with a ∈ L ∪ \overline{L}, if d’ is in f then E[gd’f] = 1, while if d’ is not part of f then E[gd’f] is smaller than 1. By querying the CSQ oracle for E[gdf] and E[gd’f] for every d and d’, we can identify all d, d’ that are in f. Our algorithm may falsely characterize some disjunctions d and d’ which are not part of f as being part of f, but we show that the resulting error is less than ε.

The algorithm is provided first, followed by an explanation and proof of correctness.

Input: A set of variables L = {x1, x2, …, xn}
Input: A CSQ oracle O with tolerance τ(n)
Input: A desired error ε
Output: A learned 2-CNF hypothesis h

Let S = { (a1 ∨ a2) | a1, a2 ∈ L ∪ \overline{L}, a1 ≠ a2 }.
Let T = { (a ∨ a) | a ∈ L ∪ \overline{L} }.

foreach disjunction d ∈ S do

    Let gd = d − 1
    Query the CSQ oracle for E[gdf] with τ = ε/(6n²)
    Let α(d) be the result obtained
    if 1/2 − α(d) > ε/(6n²) then remove d from S.

foreach disjunction d’ ∈ T do

    Let gd’ = d’ − 1
    Query the CSQ oracle for E[gd’f] with τ = ε/(6n²)
    Let β(d’) be the result obtained
    if 1 − β(d’) > ε/(6n²) then remove d’ from T.

Compute the hypothesis h = ⋀_{d ∈ S ∪ T} d
Return h.
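
The following Python sketch mirrors the pseudocode above, reusing the helpers from the earlier sketches (lit_value, eval_2cnf, csq_exact). Because csq_exact returns exact correlations rather than τ-approximate ones, the thresholds here never fire on oracle noise; the structure of the algorithm is otherwise the same.

from itertools import combinations

def clause_value(x, c):
    # Value in {-1, 1} of the disjunction c = (a, b) under assignment x.
    a, b = c
    return 1 if lit_value(x, a) == 1 or lit_value(x, b) == 1 else -1

def evolve_2cnf(f, n, eps):
    tau = eps / (6 * n ** 2)
    lits = [s * i for i in range(1, n + 1) for s in (1, -1)]
    S = list(combinations(lits, 2))    # all (a1 v a2) with a1 != a2
    T = [(a, a) for a in lits]         # all single-literal disjunctions
    kept = []
    for d in S:                        # keep d only if E[g_d f] is near 1/2
        alpha = csq_exact(f, n, lambda x, d=d: clause_value(x, d) - 1)
        if 0.5 - alpha <= tau:
            kept.append(d)
    for d in T:                        # for unit clauses the target is 1
        beta = csq_exact(f, n, lambda x, d=d: clause_value(x, d) - 1)
        if 1.0 - beta <= tau:
            kept.append(d)
    return kept                        # h is the conjunction of kept clauses

# Example: with an exact oracle, h consists of f's clauses plus any clause
# implied by f; implied clauses also have correlation 1/2 and are harmless.
h = evolve_2cnf([(1, -2), (2, 3)], 3, 0.1)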

Proof

Initial remarks

There are many equivalent 2-CNF. For example, (a ∨ c) ∧ (a ∨ \overline{c}) is equivalent to (a ∨ a) since one of c and \overline{c} is true and the other is false, hence a needs to be true for the first formula to be true. This is obviously true for the second one as well, so they are equivalent. We can therefore assume that f is given in any of its equivalent forms, and we choose the longest of all 2-CNF equivalent to it that have no duplicate disjunctions (we may also exclude tautological clauses of the form (a ∨ ¬a), since they never affect the value of f). Note that since we don’t allow duplicates, the length of any given f is at most \binom{2n}{2} + 2n, so it remains polynomial in n.
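
For small n, this canonical "longest equivalent" form can be computed by brute force: a candidate disjunction belongs to it exactly when it is true on every satisfying assignment of f. The sketch below reuses the helpers defined earlier, assumes f is satisfiable, and is exponential in n, so it is purely illustrative.

from itertools import combinations, product

def longest_equivalent(f, n):
    # All non-duplicate, non-tautological disjunctions implied by f.
    sat = [list(x) for x in product([-1, 1], repeat=n)
           if eval_2cnf(f, list(x)) == 1]
    lits = [s * i for i in range(1, n + 1) for s in (1, -1)]
    cands = [c for c in combinations(lits, 2) if abs(c[0]) != abs(c[1])]
    cands += [(a, a) for a in lits]    # the unit clauses (a v a)
    return [c for c in cands if all(clause_value(x, c) == 1 for x in sat)]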

Now we have to show that the described algorithm works: given an input error ε, the algorithm returns a 2-CNF h such that Prob(h(x) ≠ f(x)) < ε, in time polynomial in n and 1/ε.

Since we have access to a CSQ oracle, we can find E[(d − 1)f] within an error of τ(n), for any disjunction d.

We analyze two cases: disjunctions that are part of f, and those that are not. We treat the subcases given by the two sets S and T in parallel, picking one disjunction from S and one from T.

Case 1.

If d ∈ S, d’ ∈ T and d, d’ ∈ f, then E[gdf] = 1/2 and E[gd’f] = 1.

Proof: If d = 1 (similarly if d’ = 1), then regardless of the value of f (either 1 or −1), (d − 1)f = 0 and (d’ − 1)f = 0, hence it doesn’t contribute to the expected value.

If d = −1 (or if d’ = −1), then since d ∈ f (respectively d’ ∈ f) we have f = −1, so

           E[g_df] = 2P(d = -1) = 2 \cdot (\frac{1}{2})^2 = \frac{1}{2}

           E[g_{d'}f] = 2P(d' = -1) = 2 \cdot \frac{1}{2} = 1

Note that P(d = −1) = 1/4 and P(d’ = −1) = 1/2: the two literals appearing in d are on different variables and uniformly distributed, and both need to be −1 for d = −1, hence probability 1/4, while d’ = −1 requires only its single literal to be −1, hence probability 1/2. Since the true expectations are 1/2 and 1, when the CSQ oracle is queried with τ = ε/(6n²), we have

|\frac{1}{2} - \alpha(d)| < \frac{\epsilon}{6n^2} \Rightarrow \frac{1}{2} - \alpha(d) < \frac{\epsilon}{6n^2}

Similarly,

|1 - \beta(d')| < \frac{\epsilon}{6n^2} \Rightarrow 1 - \beta(d') < \frac{\epsilon}{6n^2}

Hence our algorithm doesn’t remove d and d’ from S and T, respectively, so d and d’ will be part of h. This proves that all disjunctions from S and T which are in f will also be in the output hypothesis.
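
Case 1 is easy to check numerically with the earlier sketches: for a sample target f, every two-variable clause actually in f has exact correlation 1/2 (and a unit clause in f would have correlation 1).

f0 = [(1, -2), (2, 3)]     # f = (x1 v ~x2) ^ (x2 v x3), our own sample target
for d in f0:
    corr = csq_exact(f0, 3, lambda x, d=d: clause_value(x, d) - 1)
    print(d, corr)         # prints 0.5 for each clause of f0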

Case 2.

If d = a1 ∨ a2 ∉ f, write f = A ∧ R, where A is the conjunction of the disjunctions in f which contain one of the literals of d, and R is the rest of f. Let D be the set of distinct literals from L ∪ \overline{L} which appear in A, excluding a1, a2 and their negations (a clause of A whose other literal is the negation of a1 or a2 is automatically satisfied when d = −1), and let t be the number of elements of D. Let p be the event that all literals in D are true.

Again we have that if d = 1 then (d − 1)f = 0 regardless of f. The probability that d = −1 is 1/4 as discussed above. Hence

E[g_df] = \frac{1}{4} (-2) [P(f=1|d=-1)-P(f=-1|d=-1)]

For f to be 1 when d = −1, we need p to happen or else a disjunction from A will be false.

Key idea: I claim that D cannot contain both forms, affirmative and negative, of any variable y.

Assume for the sake of contradiction that this were the case. Then if d = −1, we have f = −1: in every example one of y, ȳ is −1, so one disjunction in A is false, hence f is −1. This means that the 2-CNF f’ = f ∧ d is equivalent to f. If f’ is true, then f is true, since the clauses of f are a subset of those of f’; if f’ is false, then either f or d is false, and since d being false implies f is false, f’ = −1 ⇒ f = −1. Conversely, if f is false, then f’ is false, and if f is true, then d must be true (by the contrapositive of the observation above), hence f’ is true. However, note that we are in the case d ∉ f, so we have obtained a 2-CNF equivalent to f of greater length. This contradicts our assumption that f is the longest among the 2-CNF equivalent to it.

Overall, we get that D doesn’t contain both forms of any variable, so the literals appearing in D are mutually independent. This means that the probability of p (all t literals in D are true) is just (1/2)^t. We obtain

P(f=1|d=-1) = (\frac{1}{2})^tP(R=1|p)

while f = −1 given d = −1 occurs either when p fails or when p holds but R = −1:

P(f=-1|d=-1) = (1-(\frac{1}{2})^t) + (\frac{1}{2})^t(1-P(R=1|p))

P(f=-1|d=-1) = 1-(\frac{1}{2})^tP(R=1|p)

Overall we have:

E[(d-1)f] = -\frac{1}{2}(2 (\frac{1}{2})^tP(R=1|p)-1)

E[(d-1)f] = \frac{1}{2}-(\frac{1}{2})^tP(R=1|p)

Let err = (1/2)^t P(R = 1|p).

If err > 2τ(n), then our algorithm correctly identifies d as not being part of f, since the CSQ oracle outputs an answer less than 1/2 − τ; if d is in f, the oracle outputs an answer of at least 1/2 − τ.
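
The formula for E[(d − 1)f] can also be checked on the running example: for f = (x1 ∨ ¬x2) ∧ (x2 ∨ x3) and d = x1 ∨ x2 ∉ f, every clause of f contains a literal of d, so A = f, R is empty, D = {x3} and t = 1, predicting a correlation of 1/2 − (1/2)·1 = 0.

f0 = [(1, -2), (2, 3)]
d = (1, 2)                 # x1 v x2: not in f0 and not implied by f0
corr = csq_exact(f0, 3, lambda x: clause_value(x, d) - 1)
print(corr)                # 0.0, matching 1/2 - (1/2)^t * P(R=1|p)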

An identical argument holds for the case when d’ \in T and d’ \notin f. We make the same error considerations, but with respect to 1 as opposed to 1/2.

As proven in the first case, no d ∈ f is removed from S or T, hence h = f ∧ f’, where f’ is a 2-CNF, possibly empty. The error in our hypothesis h comes from misclassifying disjunctions d, d’ as being part of f when they are not, which happens only if err ≤ 2τ. For such a disjunction d = a1 ∨ a2, h can differ from f only on inputs where d = −1: there h is −1 while f can be 1.

As shown above, if d = −1 and d ∉ f, then

P(f=1 \mid d=-1) = (\frac{1}{2})^tP(R=1|p) = err \le 2\tau = \frac{\epsilon}{3n^2}

Now, overall, we have |S| = \binom{2n}{2} and |T| = 2n, for a total of \binom{2n}{2} + 2n = n(2n + 1) candidate disjunctions, so at most n(2n + 1) disjunctions can contribute an error to h. By a union bound, the total error is at most

n(2n+1)\frac{\epsilon}{3n^2} = \frac{(2n+1)\epsilon}{3n} \le \epsilon

Hence, h is a good approximation for f, giving an error of at most ε, as desired.

Note also that h was found in time polynomial in n and 1/ε. In order to find h we queried the CSQ oracle n(2n + 1) times with tolerance τ = ε/(6n²). Since each query takes time polynomial in n and 1/τ, our algorithm is polynomial in n and 1/ε.

Conclusions and future work

We presented an algorithm for learning 2-CNF in the CSQ model when the underlying variables are uniformly distributed. Because the CSQ model is equivalent to the evolvability model, this shows that 2-CNF is evolvable under the uniform distribution.

It would be interesting to generalize our algorithm to the case of k-CNF and find a direct algorithm which proves evolvability, as described by Valiant [4], without using CSQ.

Acknowledgements

I would like to thank Prof. Leslie Valiant for everything I learned in CS 228 and Varun Kanade for useful discussions on this research.

References

[1] Vitaly Feldman. Evolvability from learning algorithms. In COLT, 2008.

[2] Michael Kearns. Efficient noise-tolerant learning from statistical queries. Journal of the ACM, 1998.

[3] Leslie Valiant. A theory of the learnable. Communications of the ACM, 1984.

[4] Leslie Valiant. Evolvability. In 32nd International Symposium on Mathematical Foundations of Computer Science, 2007.

