
Thursday, May 23, 2013

Arbitrary girth and chromatic number

In 1959, using a probabilistic argument, Paul Erdős proved that there exist graphs with arbitrarily large girth and chromatic number. In this blog post, we prove this theorem.

Statement of the theorem

For any number $k\ge 3$, there exists a graph with girth at least $k$ and chromatic number at least $k$.

Idea behind the proof

In a random graph $G\in\mathcal{G}_{n,p}$, as the edge probability $p$ increases, so does the expected number of edges in the graph $G$. An increase in the number of edges does two things:
  1. It increases the chromatic number $\chi(G)$ of the graph. We want this to prove the theorem.
  2. It decreases the girth $g(G)$ of the graph, since more edges mean more chances for a small cycle. We want the opposite of this to prove the theorem.
To get both $\chi(G)\geq{k}$ and $g(G)\geq{k}$, we need a $p$ that is large enough to have the required chromatic number $\chi(G)\geq{k}$, but small enough to have the required girth $g(G)\geq{k}$.

Sketch of the proof

We will implement the above idea as follows:
  1. We will construct a random graph $G$ by fixing a value for $p$ such that $G$ has a small number of cycles of length less than $k$, for any large $n$. We will then delete a vertex from each of these cycles, giving us a graph $G^\prime$ that has no cycles of length less than $k$. In other words, $g(G^\prime)\geq{k}$. Aha!
  2. Keeping $p$ fixed, we will then increase $n$ to get $\chi(G^{\prime\prime})\ge k$. Note that keeping $p$ fixed and increasing $n$ increases the chromatic number of the graph, since $\chi(G^{\prime\prime})\ge v(G^{\prime\prime})/ \alpha(G^{\prime\prime})$ where
    • $v(G^{\prime\prime})=O(n)$ (the number of vertices in the graph)
    • $\alpha(G^{\prime\prime})=O(\log{n})$ (the maximum size of any color class in the graph).
The interesting thing about this approach is that every graph in the family $\mathcal{G}_{n,p}$ with the above values for $n$ and $p$ will almost surely have both girth and chromatic number $\geq k$.

The proof

  1. Let $X_i$ be a random variable denoting the number of cycles of length $i$. What is the expected value $E(X_i)$?
    1. What is the possible number of cycles of length $i$ over $n$ vertices?
      1. Number of ways of choosing $i$ vertices of a cycle: $n \choose i$
      2. Number of ways of creating a cycle out of $i$ chosen vertices: $(i-1)!/2$. There are $i!$ orderings of the $i$ vertices, but rotations of an ordering give the same cycle (hence the division by $i$), and clockwise and counter-clockwise orderings give the same cycle (hence the division by 2).
      3. So the possible number of cycles of length $i$ over $n$ vertices is $ {n\choose i} (i-1)!/2$.
    2. If each edge occurs with probability $p$, then the probability of $i$ edges of the cycle in the graph is $p^i$.
    3. So $E(X_i)={n\choose i}(i-1)! p^i/2$
  2. Let's call cycles of length less than $k$ "small cycles".
  3. Let $X=\sum_{i=3}^{k-1}X_i$ be a random variable denoting the number of small cycles. What is the expected number of small cycles? \[E(X) = E\left(\sum_{i=3}^{k-1}X_i\right) = \sum_{i=3}^{k-1}E(X_i) = \sum_{i=3}^{k-1} {n\choose i}\frac{(i-1)!}{2}p^i \le \sum_{i=3}^{k-1}(np)^i \le \sum_{i=0}^{k-1}(np)^i = \frac{(np)^k-1}{np-1}\] where we used ${n\choose i}(i-1)!/2\le n^i$.
  4. What is the probability that the number of such small cycles is more than $n/2$? By Markov's inequality, $\mathbf{Pr}(X>n/2) \le \frac{E(X)}{n/2}\le\frac{2\left((np)^k-1\right)}{n(np-1)}$.
  5. By choosing $np=n^{1/k}$, or in other words, $p=p_0:=n^{-(k-1)/k}$, we get \[\lim_{n\rightarrow\infty}\mathbf{Pr}(X>n/2)=0\] This means that when $p=p_0$, almost surely, the graph has at most $n/2$ small cycles.
  6. By deleting one vertex from each of these (at most $n/2$) small cycles, we obtain $G^\prime$, which has girth at least $k$. In the process, we delete no more than $n/2$ vertices, so $v(G^\prime)\ge n/2$.
  7. We have now fixed a value of $p$ for which the graph $G^\prime$ has girth $g(G^\prime)\geq k$. This completes part (1) from the sketch of the proof. Now onto part (2).
  8. For $G\in\mathcal{G}_{n,p}$, almost surely, the size of the maximum independent set $\alpha(G)$ is bounded by $\alpha(G)\leq (1/p)\ln(n^2)$ for large $n$. We also note that deleting vertices does not increase the size of the maximum independent set. This implies that for $G^\prime$, $\alpha(G^\prime)\leq (1/p)\ln(n^2)$ is still valid (where $n$ is the number of vertices in the original graph $G$).
  9. The chromatic number for $G^\prime$ \[\chi(G^\prime)\geq v(G^\prime)/\alpha(G^\prime) \geq \frac{(n/2)}{(1/p)\ln(n^2)} = \frac{np}{4\ln(n)} = \frac{n^{1/k}}{4\ln(n)}\] By choosing a large enough $n$, we can get $\chi(G^\prime)\ge k$.
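To get a feel for how these bounds interact, here is a small Python sketch (my own illustration, not part of the original proof) that, for a given $k$ and $n$, evaluates the formulas from the steps above: $p_0=n^{-(k-1)/k}$, the Markov bound on $\mathbf{Pr}(X>n/2)$, and the lower bound $n^{1/k}/(4\ln n)$ on $\chi(G^\prime)$. It illustrates how large $n$ must get before the chromatic bound exceeds $k$.

```python
import math

def erdos_bounds(k: int, n: int):
    """Evaluate the bounds used in the proof for girth and chromatic number >= k.

    Returns (p0, markov_bound, chi_lower) where
      p0           = n^{-(k-1)/k}, the edge probability from step 5,
      markov_bound = 2((np)^k - 1) / (n(np - 1)), the bound on Pr(X > n/2) from step 4,
      chi_lower    = n^{1/k} / (4 ln n), the lower bound on chi(G') from step 9.
    """
    p0 = n ** (-(k - 1) / k)
    np_ = n * p0                                   # equals n^{1/k}
    markov_bound = 2 * (np_ ** k - 1) / (n * (np_ - 1))
    chi_lower = n ** (1 / k) / (4 * math.log(n))
    return p0, markov_bound, chi_lower

if __name__ == "__main__":
    k = 4
    for n in (10 ** 6, 10 ** 12, 10 ** 24):
        p0, markov, chi = erdos_bounds(k, n)
        print(f"k={k}, n={n:.0e}: p0={p0:.2e}, "
              f"Pr(X > n/2) <= {markov:.2e}, chi(G') >= {chi:.2f}")
```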

Application

With the above method, for $k=10$, we need a graph $G\in\mathcal{G}_{n,p}$ where $n$ is around $10^{20}$ vertices and probability $p=10^{-18}$. All resulting graphs will almost surely have both girth and chromatic number greater than 10.

Sunday, November 18, 2012

Redei's theorem

Redei's theorem states that every tournament has a directed Hamilton path. We prove this theorem in this blog post.

Background


Tournament

A tournament is a complete graph with oriented edges.
It can be viewed as the result of a round-robin tournament, where every team plays every other team and there is always a winner and a loser in every match (no ties). The direction of each edge is from the winner to the loser of that match. In the above graph, team 1 beat team 2, hence the edge $1\rightarrow 2$ and so on.

Hamilton path

A Hamilton path is a path connecting all vertices of a graph (once and only once).
The directed path shown above in red is a Hamilton path, since it connects all vertices of the graph.

Now we are ready for the theorem and its proof.

Redei's theorem

Every tournament has a directed Hamilton path. This was first proven by Laszlo Redei in 1934.

Proof by induction

For the base case, consider a tournament on 2 vertices, say with the edge $v_1\rightarrow v_2$. This edge is also a Hamilton path, since it covers both vertices. So the statement holds true for our base case.

For the inductive step, we assume that each tournament on $(n-1)$ vertices has a Hamilton path. Assume that this path is $\{v_1,\cdots,v_{n-1}\}$ as shown in the graphs below. We consider 3 different scenarios for the new vertex $v$ added to this graph.

  1. In the first scenario, we have an edge $v\rightarrow v_1$ as shown by the red edge in the graph below. The new path $\{v,v_1,\cdots,v_{n-1}\}$ is a Hamilton path. So for this scenario, a tournament on $n$ vertices does have a Hamilton path.
  2. In the second scenario, we have an edge $v_{n-1}\rightarrow v$ as shown by the red edge in the graph below. The new path $\{v_1,\cdots,v_{n-1},v\}$ is a Hamilton path. So for this scenario too, a tournament on $n$ vertices does have a Hamilton path.
  3. In the final scenario, different from the previous two, we have both $v_1\rightarrow v$ and $v\rightarrow v_{n-1}$ as shown in the graph below. In this case, let $v_i$ be the first vertex along the path with an edge $v\rightarrow v_i$ (shown as a dotted edge); since $v_1\rightarrow v$, we have $i\ge 2$ and $v_{i-1}\rightarrow v$. Inserting $v$ between $v_{i-1}$ and $v_i$ gives the Hamilton path $\{v_1,\cdots,v_{i-1},v,v_i,\cdots,v_{n-1}\}$. (Note that $i$ could be $n-1$ (the last vertex) if all edges preceding it go into $v$.) So for this scenario too, a tournament on $n$ vertices has a Hamilton path.

The cases above cover all the scenarios for the inductive step, completing an inductive proof of Redei's theorem that every tournament has a directed Hamilton path.
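The inductive proof is also a construction: insert vertices one at a time into the path using the three cases above. Here is a minimal Python sketch of that construction (my own illustration, not from the original post); `beats(u, v)` is an assumed helper that returns True when the tournament has the edge $u\rightarrow v$.

```python
def hamilton_path(vertices, beats):
    """Build a directed Hamilton path in a tournament.

    `vertices` is any iterable of vertices; `beats(u, v)` must return True
    iff the tournament has the edge u -> v (exactly one of beats(u, v) and
    beats(v, u) is True for u != v).
    """
    path = []
    for v in vertices:
        if not path:
            path.append(v)
        elif beats(v, path[0]):             # case 1: v -> first vertex
            path.insert(0, v)
        elif beats(path[-1], v):            # case 2: last vertex -> v
            path.append(v)
        else:                               # case 3: first v_i with v -> v_i
            i = next(j for j, u in enumerate(path) if beats(v, u))
            path.insert(i, v)
        # invariant: `path` is a Hamilton path of the tournament seen so far
    return path

# Example: teams 1..4 where the lower-numbered team always beats the higher one.
print(hamilton_path([3, 1, 4, 2], lambda u, v: u < v))   # [1, 2, 3, 4]
```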

Conclusion

Using the analogy of matches in a round-robin tournament between $n$ teams, Redei's theorem says that it is always possible to find $n-1$ matches, such that team A beat team B, which beat team C and so on, which beat team N. Now that was not obvious before! (Note: team A doesn't mean team 1. $\{A, B, \cdots, N\}$ is some permutation of $\{1, 2, \cdots, n\}$.)

References

  1. Bondy, J.A., Murty, U.S.R., Graph Theory, 2008.

Friday, September 14, 2012

Gale-Shapley algorithm as a picture

Here is a pictorial depiction of the Gale-Shapley algorithm:

Gale-Shapley algorithm


  • Each man $M_i$ is drawn as a square and each woman $W_j$ as a circle.
  • Each person's choices are arranged clockwise in ascending order of desirability, i.e. the bottom choice first, followed by better and better choices, ending with the top choice.
  • According to the algorithm
    • A man proposes to the highest-ranked woman he has not yet proposed to. So, in the diagram above, he goes through his choices anti-clockwise.
    • A woman accepts a proposal if she has no fiance yet, or if the man is better (by her ranking) than her current fiance. So, in the diagram above, she goes through her choices clockwise.
The following is apparent from the picture:
  1. As the algorithm progresses, every woman can only improve her situation. In other words, the worst she can do is her current fiance - it only gets better for her!
  2. As the algorithm progresses, every man's position can only get worse. In other words, "it is all downhill" for him.
  3. The reason for rejecting a man's proposal is always that the woman's current choice is better than him (for her) and it can only get better for her, implying that he cannot introduce instability for those women who have rejected him.
  4. Since a woman can be proposed to only once by each man, she can get no more than $|M|$ proposals. Since there are $|W|$ women and some woman is proposed to at each step, the algorithm has to terminate in $O(|M|\times|W|)$ time.
  5. When we have an equal number of men and women, each woman will be proposed to (eventually). This means that each woman will have a fiance (she must accept a proposal if she has no fiance). Since $|M|=|W|$ in this case, this algorithm runs in $O(n^2)$ time.
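For concreteness, here is a minimal Python sketch of the men-propose version described above (my own illustration; the function name and data structures are assumptions, not from the post). Preference lists are given from most to least desirable.

```python
from collections import deque

def gale_shapley(men_prefs, women_prefs):
    """Stable matching, men proposing.

    men_prefs[m]   = list of women, most desirable first.
    women_prefs[w] = list of men, most desirable first.
    Returns a dict mapping each woman to her final fiance.
    """
    # rank[w][m] = position of man m in woman w's list (lower is better)
    rank = {w: {m: i for i, m in enumerate(prefs)} for w, prefs in women_prefs.items()}
    next_choice = {m: 0 for m in men_prefs}    # index of next woman to propose to
    fiance = {}                                # woman -> current fiance
    free_men = deque(men_prefs)
    while free_men:
        m = free_men.popleft()
        w = men_prefs[m][next_choice[m]]       # best woman m hasn't proposed to yet
        next_choice[m] += 1
        if w not in fiance:                    # she accepts if she has no fiance
            fiance[w] = m
        elif rank[w][m] < rank[w][fiance[w]]:  # she trades up; it only gets better for her
            free_men.append(fiance[w])
            fiance[w] = m
        else:                                  # rejected; the man's position only gets worse
            free_men.append(m)
    return fiance

men = {"M1": ["W1", "W2"], "M2": ["W1", "W2"]}
women = {"W1": ["M2", "M1"], "W2": ["M1", "M2"]}
print(gale_shapley(men, women))   # {'W1': 'M2', 'W2': 'M1'}
```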
Friday, August 17, 2012

Schwartz-Zippel Lemma

The intuition behind this lemma seems rather straightforward. A single-variable polynomial equation of degree $d$ can have no more than $d$ roots. In other words, a maximum of $d$ values (or elements from a field, if you will) can satisfy that equation (unless, of course, the polynomial is identically zero, in which case any element satisfies it). The Schwartz-Zippel lemma extends this intuition to multivariate polynomials of total degree $d$.

Examples
  • $x^2-3x+5$ has total degree 2. So it can have no more than 2 roots.
  • $x^3+x^2y^2+y^3$ has total degree 4 (from the $x^2y^2$ term). By the lemma, for any finite set $S$, at most a $4/|S|$ fraction of the pairs in $S\times S$ can be roots.

With the above intuition, it seems "obvious" that for any given finite set of elements $S$ (no matter how large or small), at most a $d/|S|$ fraction of the tuples with entries drawn from that set can satisfy the given equation. So, the probability that a random tuple of elements from this set satisfies the equation is no more than $d/|S|$. And that is exactly the statement of this lemma.

Statement
Let $Q(x_1,\cdots,x_n)\in\mathbb{F}[x_1,\cdots,x_n]$ be a multivariate polynomial of total degree $d$. Fix any finite set $S\subseteq\mathbb{F}$, and let $r_1,\cdots,r_n$ be chosen independently and uniformly at random from $S$. Then \[\mathbf{Pr}[Q(r_1,\cdots,r_n)=0\hspace{1mm}|\hspace{1mm}Q(x_1,\cdots,x_n)\not\equiv 0]\leq\frac{d}{|S|}\]
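A typical use of the lemma is randomized polynomial identity testing: to check whether two polynomials agree, evaluate both at a random point and compare. Here is a minimal Python sketch (my own illustration, not from the post), testing whether $(x+y)^2$ and $x^2+2xy+y^2$ are identical, with points drawn from $S=\{0,\cdots,10^6-1\}$.

```python
import random

def probably_identical(f, g, num_vars, d, S_size=10**6, trials=20):
    """Randomized identity test based on the Schwartz-Zippel lemma.

    If f and g have total degree <= d and f != g, a single trial wrongly
    reports "identical" with probability <= d / S_size; repeating `trials`
    times drives the error probability down further.
    """
    for _ in range(trials):
        point = [random.randrange(S_size) for _ in range(num_vars)]
        if f(*point) != g(*point):
            return False            # found a witness: definitely not identical
    return True                     # identical with high probability

f = lambda x, y: (x + y) ** 2
g = lambda x, y: x ** 2 + 2 * x * y + y ** 2
h = lambda x, y: x ** 2 + x * y + y ** 2

print(probably_identical(f, g, num_vars=2, d=2))   # True
print(probably_identical(f, h, num_vars=2, d=2))   # almost surely False
```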
Sunday, July 1, 2012

Verifying equality of large data

The general pattern of verifying equality of two pieces of data, say $a$ and $b$, is comparing each bit in sequence, until we either:
  • find a mismatch (in which case, we declare that $a\neq{b}$) or
  • we reach the end (in which case, we declare that $a=b$).
With this approach, our answer is correct with 100% confidence. However, for this absolute confidence, we need $O(n)$ comparisons, which may be inconvenient as $a$ and $b$ can be arbitrarily large, e.g. 1 PB of data. Additionally, $a$ and $b$ may not be available in the same place (e.g. $a$ and $b$ are two databases that are geographically separated), so comparing bits incurs network overhead.

The question we are trying to address here is if there is a way to verify equality with high confidence (say 99% confidence) without paying the penalty of comparing each bit or the associated network overhead. We describe a randomized algorithm that does this, with significantly less than $O(n)$ comparisons. Conceptually, the algorithm hashes the data and compares the hashed values instead of the original data (hardly a novel idea). The interesting part of the algorithm is how it comes up with a provably good hash function.

Algorithm

  1. Treat the bits of data as the bits of a large number. So 1 petabyte (PB) of data becomes a single number with $2^{53}$ bits, meaning its value can be of the order of $2^{2^{53}}$.
  2. Since we want to compare two pieces of data, $a$ and $b$, conceptually we are looking at the number $c=|a-b|$. In particular, we are interested in whether $c=0$, because $c=0$ if and only if $a=b$.
  3. A number $c$ cannot have more than $\lfloor{\log_2{c}}\rfloor$ unique prime divisors. So a number representing 1 PB of data has no more than $2^{53}\approx 8\times{10^{15}}$ unique prime divisors.
  4. Choose $\tau=2(\log_2{c})^2\log_2({\log_2{c}})$. For 1 PB, $\tau\approx{7\times 10^{32}}$. By the prime number theorem, about $10^{31}$ primes are less than $\tau$. These are enough primes that if we draw one at random from them, the probability that it is one of the prime divisors of $c$ is less than $1/\log_2{c}$. In other words, the number of primes less than $\tau$ is about the square of the number of prime divisors of $c$. For our example, this probability is less than $1/2^{53}\approx1.25 \times 10^{-16}$, if I got the calculations correct.
  5. Choose a prime number less than $\tau$ at random. Note that this involves choosing $O(\log\tau)$ random bits. In our example, this is about 110 bits.
  6. Construct a hash function $F_p(x) = x \mod p$. 
  7. Determine $F_p(a)$ and $F_p(b)$. Since these hash functions are mod functions, the output is no more than $\log_2(\tau)$ bits, in our case about 110 bits.
  8. If $F_p(a)=F_p(b)$, then we say that $a=b$. Otherwise, we say that $a\neq b$. So we compared $O(\log{\tau})$ number of bits, instead of $O(n)$.
  9. The probability that this algorithm errs is less than $1/\log_2{c}$. So in our case, it is $1.25 \times 10^{-16}$, which is practically zero.
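Here is a minimal Python sketch of the algorithm above (my own illustration; the Miller-Rabin helper and all function names are assumptions, not from the post). It treats each blob as a big integer and compares fingerprints modulo a random prime below $\tau$.

```python
import random

def is_probable_prime(n, rounds=40):
    """Miller-Rabin probabilistic primality test."""
    if n < 2:
        return False
    for p in (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37):
        if n % p == 0:
            return n == p
    d, s = n - 1, 0
    while d % 2 == 0:
        d //= 2
        s += 1
    for _ in range(rounds):
        a = random.randrange(2, n - 1)
        x = pow(a, d, n)
        if x in (1, n - 1):
            continue
        for _ in range(s - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            return False
    return True

def random_prime_below(tau):
    """Pick a random prime < tau by rejection sampling (step 5)."""
    while True:
        candidate = random.randrange(3, tau) | 1   # random odd candidate
        if candidate < tau and is_probable_prime(candidate):
            return candidate

def probably_equal(a_bytes, b_bytes, tau):
    """Fingerprint equality check: compare a mod p and b mod p for a random prime p < tau."""
    a = int.from_bytes(a_bytes, "big")   # step 1: data as a large number
    b = int.from_bytes(b_bytes, "big")
    p = random_prime_below(tau)
    return a % p == b % p                # steps 6-8: compare O(log tau)-bit fingerprints

tau = 2 * (2 ** 20) ** 2 * 20            # step 4 formula, assuming inputs of about 2^20 bits
print(probably_equal(b"some large blob", b"some large blob", tau))   # True
print(probably_equal(b"some large blob", b"some large blo0", tau))   # almost surely False
```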

Conclusion and notes

  • To determine equality of 1PB of data, instead of comparing $2^{53}$ bits of data, we compared 110 bits of data. This introduced a minuscule probability of error.
  • However, finding a large random prime takes some computational work. That work is purely CPU intensive, so we have effectively replaced a network-intensive operation with a CPU-intensive one, at the cost of a small probability of error.
  • Computing hash function $F_p(x)$ needs access to the entire content of $x$, which is potentially a disk intensive operation.

References

  • Motwani, R. and Raghavan, P., Randomized Algorithms, 2007.

Friday, June 15, 2012

On Weak and Strong Laws of Large Numbers

Introduction

In statistics, we use the mean calculated from a sample as an estimate for the mean of the population. E.g. if the average height of a random sample of a thousand people from a region is 6 feet, then we estimate that the average height of all people in that region is 6 feet. Why does this work? The weak and strong laws of large numbers provide a theoretical basis for why it does. Given below are the two laws and the difference between them, which also justifies their names.


Notation

  1. Let $X_1$, $X_2$, $\cdots$, $X_n$ be independent and identically distributed random variables. (In layman terms, $X_1$ is the first observation, $X_2$ is the second and so on.)
  2. Let $M_n$ be a random variable denoting the mean of $X_1$, $X_2$, $\cdots$, $X_n$. In other words, \[M_n=\frac{1}{n}\sum_{i=1}^{n}X_i\]So this is the mean of the sample.
  3. Let $\mu$ be the mean of each of $X_1$, $X_2$, $\cdots$, $X_n$. In other words, $\mathbf{E}(X_i)=\mu$ for each $i$. So $\mu$ is the mean of the population (usually unknown, which is why we want to estimate it!).

Weak Law of Large Numbers

This law states that for any $\epsilon>0$,
\[\lim_{n\to\infty}\mathbf{P}(|M_n-\mu|>\epsilon)=0\]
Interpretation
  • For large values of $n$ (i.e. $n>n_0$ for some $n_0$), the probability that the value of $M_n$ (the sample mean) differs from the population mean $\mu$ by more than any given number $\epsilon$ is vanishingly small.
  • Alternatively, for large $n$, almost all the probability is concentrated in an $\epsilon$-interval around $\mu$.
  • Alternatively, with high probability, for large samples, the sample mean is within an $\epsilon$-neighborhood of the population mean.

Strong Law of Large Numbers

This law states that
\[\mathbf{P}(\lim_{n\to\infty}M_n=\mu)=1\]Interpretation

  • The probability that the sequence of sample means $M_n$ fails to converge to the population mean $\mu$ is 0.
  • Alternatively, in the limit, all probability is concentrated at $\mu$.
  • Alternatively, almost surely, the sample mean converges to the population mean as the sample size grows.

Difference between the two laws

  • The strong law is stronger because it asserts that the sequence of sample means $M_n$ converges to $\mu$ with probability 1 (almost sure convergence), while the weak law only asserts that, for each fixed large $n$, $M_n$ is unlikely to be far from $\mu$ (convergence in probability).
  • Per the strong law, in the limit all probability is concentrated at $\mu$; per the weak law, for each large $n$, most of the probability is concentrated in the interval $(\mu-\epsilon,\mu+\epsilon)$.
  • Under the strong law, for any $\epsilon>0$, almost surely the event $|M_n-\mu|>\epsilon$ happens for only finitely many values of $n$. Now that is a very strong statement!
  • The weak law, by itself, leaves open the possibility that $|M_n-\mu|>\epsilon$ keeps happening infinitely often (just with ever smaller probability at each $n$). This is clearly weaker than the previous statement.
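A quick simulation makes the weak law tangible. The Python sketch below (my own illustration, not from the post) draws samples from a fair coin ($\mu=0.5$) and shows the sample mean $M_n$ settling near $\mu$ as $n$ grows.

```python
import random

def sample_mean(n):
    """Mean of n independent fair coin flips (each X_i is 0 or 1, so mu = 0.5)."""
    return sum(random.randint(0, 1) for _ in range(n)) / n

random.seed(0)
for n in (10, 100, 10_000, 1_000_000):
    print(f"n = {n:>9,}: M_n = {sample_mean(n):.4f}")
# The printed M_n values cluster closer and closer to mu = 0.5 as n grows.
```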

Friday, May 18, 2012

Freivalds' Algorithm

Introduction

Freivalds' algorithm is used to verify correctness of matrix multiplication. It is a randomized Monte Carlo algorithm and has a small probability of error. By repeatedly running this algorithm, one can reduce the probability of error below any threshold.


Motivation

One naive way to verify correctness of matrix multiplication is to actually redo the multiplication and compare the output with the given input. However, current matrix multiplication algorithms are quite slow - the fastest one being $O(n^{2.376})$. Freivalds' algorithm takes $\Theta(n^2)$ time to verify this multiplication and errs with probability $\leq 1/2$, only when the multiplication is incorrect. It therefore runs faster than matrix multiplication itself and is usually correct.


Freivalds' algorithm

  1. Input: $n\times{n}$ matrices $A$, $B$ and $C$. We would like to verify if $A\times{B}=C$.
  2. Choose an $n\times{1}$ column vector $\overrightarrow{r}\in\{0,1\}^n$ uniformly at random. In other words, each component of the vector is either 0 or 1, chosen independently and uniformly at random.
  3. Compute $A\cdot(B\cdot\overrightarrow{r})$ and $C\cdot\overrightarrow{r}$. This takes $\Theta(n^2)$ time.
  4. Now comes the output
    1. If $A \cdot (B\cdot\overrightarrow{r}) \neq {C\cdot\overrightarrow{r}}$, output FALSE. In other words, we say that the given matrix multiplication is incorrect. This has a probability of correctness = 1.
    2. If $A \cdot (B\cdot\overrightarrow{r}) = {C\cdot\overrightarrow{r}}$, output TRUE. In other words, we say that the given matrix multiplication is correct. This has a probability of correctness $\geq{1/2}$. (See references for details.)
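Here is a minimal Python sketch of the steps above (my own illustration, not from the post), including the $k$-fold repetition mentioned in the discussion that follows. Matrices are plain lists of lists; only matrix-vector products are computed, so each trial is $\Theta(n^2)$.

```python
import random

def mat_vec(M, r):
    """Matrix-vector product, Theta(n^2)."""
    return [sum(M[i][j] * r[j] for j in range(len(r))) for i in range(len(M))]

def freivalds(A, B, C, k=20):
    """Return True if A*B == C (probably), False if definitely not.

    Each trial picks r in {0,1}^n at random and compares A(Br) with Cr.
    If A*B != C, a single trial detects it with probability >= 1/2, so
    k independent trials err with probability <= 1/2^k.
    """
    n = len(A)
    for _ in range(k):
        r = [random.randint(0, 1) for _ in range(n)]
        if mat_vec(A, mat_vec(B, r)) != mat_vec(C, r):
            return False          # certainly A*B != C
    return True                   # A*B == C with probability >= 1 - 1/2^k

A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
C_good = [[19, 22], [43, 50]]
C_bad = [[19, 22], [43, 51]]
print(freivalds(A, B, C_good))   # True
print(freivalds(A, B, C_bad))    # almost surely False
```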

Discussion

  • If Freivalds' algorithm says that:
    • $A\times{B}\neq{C}$, then this statement is always correct.
    • $A\times{B}={C}$, because we are only doing a small number of checks ($n$ - one for each row of the result), it is possible that $A\times{B}\neq{C}$, but the algorithm cannot determine it in the amount of time it has. (See references for details.)
  • Stated another way:
    • when $A\times{B}={C}$, Freivalds' algorithm is always correct.
    • When $A\times{B}\neq{C}$, the algorithm errs with probability $\leq{1/2}$.
  • By re-running this algorithm $k$ times and choosing a random $\overrightarrow{r}$ each time, we can reduce the probability of error to $\leq{1/2^{k}}$.
  • In other words, in $\Theta({kn^2})$ time, Freivalds' algorithm guarantees that the probability of error in verifying the multiplication of two $n\times{n}$ matrices is $\leq{1/2^{k}}$.

References

  1. Wikipedia - Freivalds' algorithm
  2. Motwani, R. and Raghavan, P., Randomized Algorithms, 2007.