Peter Zhang

CS 170 Final

Good luck on finals! Also check out the notes for topics from midterm 1 and midterm 2. Notes based on chapters 7 of Algorithms by Papadimitriou et al.

Linear Programming

Zero-sum Games

Suppose two players, Row and Column, are playing rock, paper, scissor. The matrix of outcomes looks as follows:

Clearly, if either player knows their opponent’s move ahead of time, then they can always win. Suppose instead that Row has a mixed strategy, picking \(r, p, s\) with probabilities \(x_1, x_2, x_3\), respectively. We denote Row’s mixed strategy \(x = (x_1, x_2, x_3)\) and Column’s mixed strategy \(y = (y_1, y_2, y_3)\). The expected payoff for a given round is: \(\mathbb{E}[P] = \sum_{i, j} G_{ij}x_iy_j\) If Row adopts a complete random strategy with \(x = (1/3, 1/3, 1/3)\), then it is easy to show that for any choice Column makes, the expected value is 0. In fact, no matter which player goes first, the other can pick \((1/3, 1/3, 1/3)\) and set the EV to 0.

Now consider an asymmetric game, where Row chooses between \(e\) and \(s\) while Column chooses between \(m\) and \(t\).

Notice that for any fixed Row strategy—say, \(x = (1/2, 1/2)\)—there is a pure strategy from Column that is optimal—in this case, \(y = (0, 1)\). Therefore, Row should pick a strategy that maximizes the optimal response from Column: \(\max_{x_1, x_2} \min(3x_1 - 2x_2, -x_1 + x_2)\) But, this is an LP problem: \(\max z\\ -3x_1 + 2x_2 \leq z\\ x_1 - x_2 \leq z\\ x_1 + x_2 = 1\\ x_1, x_2 \geq 0\) Now, notice that Column faces a symmetric problem: \(\min_{y_1 , y_2} \max(3y_1 - y_2, 2y_1 + y_2)\) which we can rewrite as \(\min w\\ 3y_1 - y_2 \leq w\\ -2y_1 + y_2 \leq w\\ y_1 + y_2 = 1\\ y_1, y_2 \geq 1\) These two LPS are dual to one another, so they produce the same optimum \(V\), which is the value of the game.

A more general game theory result is the min-max theorem, which states that \(\max_x \min_y \sum_{i,j} G_{ij}x_iy_j = \min_y \max_x \sum_{i, j} \sum G_{i, j} x_i y_j\)

Multiplicative Weights

Each day, we allocate \(x_i^{(t)}\) of our capital to stock \(i\) on day \(t\). Let \(l_i^{(t)}\) denote the loss we incur on day \(t\) if we invested everything on stock \(i\). At the end of day \(T\) we have lost a total of \(\sum^T_{t=1} \sum^{n}_{i=1} x_i^{(t)}l_i^{(t)}\) On each day, we must pick an array \(x^{(t)} = [x_1^{(t)}, ..., x_n^{(t)}]\) with non-negative \(x_i^{(t)}\) that sum to 1.

Define the regret \(R_T\) on day \(T\) as the difference between the overall loss and the optimal fixed allocation strategy: \(R_T = \sum^T_{t=1} \sum^n_{i=1} x_i^{(t)} \cdot l_i^{(t)} - \min_{x \text{ probability distribution}} \sum^T_{t=1} \sum^n_{i=1} x_i l_i^{(t)} = \sum^T_{t=1} \sum^n_{i=1} x_i^{(t)} \cdot l_i^{(t)} - \min_{i = 1, ..., n} \sum^T_{t=1} l_i^{(t)}\) The multiplicative weight algorithms bounds \(R_T \leq 2\sqrt{T \ln n}\), noticing that \(\frac{R_T}{T} = O(\sqrt{\frac{\ln n}{T}})\). Begin by setting a parameter \(0 \leq \epsilon \leq 1/2\) with weights \(w^{(t)} = [w_1^{(t)},...,w_n^{(t)}]\) where: \(w_t^{(0)} = 1\\ w_i^{(t+1)} = w_i^{(t)} \cdot (1-\epsilon)^{l_t^{(t)}}\) which causes strategies with high losses to lose weight. At each step, the algorithm allocates proportionately to the weights: \(x_i^{(ta)} = \frac{w_i^{(t)}}{\sum_j w_j^{(t)}}\) Let \(L^*\) denote the offline optimum. First, notice that \(W_{T+1} \geq (1-\epsilon)^{L^*}\). Second, we can prove that \(W_{t+1} \leq W_t \cdot (1-\epsilon L_t)\). Taken together, we have: \(\sum^T_{i=1} L_t - L^* \leq \epsilon L^* + \frac{\ln n}{\epsilon} \leq \epsilon T + \frac{\ln } { \epsilon}\) were we can set the terms equal with \(\epsilon = \sqrt{\frac{\ln n}{T}}\), yielding our regret bound.


When a subroutine of task \(Q\) is used to solve \(P\), then \(P\) reduces to \(Q\). Visually,

A fast algorithm for \(Q\) provides a fast algorithm for \(P\). But, if \(P\) is hard, then \(Q\) must also be hard.

Bipartite Matching

Suppose in a graph with an equal number of boys and girls, an edge a heterosexual crush. A perfect matching is a bipartite matching of the graph and can be reduced the following network flow. The solution is integer since all the edge weights are also integer.

Circuit Value Problem

Suppose we have a Boolean circuit with input gates (value true or false), AND/OR gates with indegree 2, and NOT gates with indegree 1. One gate is called the output. The problem can be reduced to a linear program by creating constructs for each gate.

Surprisingly, since any algorithm is ultimately a Boolean circuit, all problems that can be solved in polynomial time also reduce to linear programming.


Matrix multiplication (MM) and matrix inversion (MI) reduce to one another.

First, notice that MM reduces to MI because to multiply matrices \(A\) and \(B\), we construct a larger matrix and compute: \(X = \begin{bmatrix}I & A & 0\\ 0 & I & B\\0 & 0 & I \end{bmatrix}\\ X^{-1} = \begin{bmatrix}I & A & AB\\ 0 & I & B\\0 & 0 & I \end{bmatrix}\) yielding our desired result.

Second, we reduce MI to MM through Gaussian elimination: \(\begin{bmatrix}A & B\\ C & D\end{bmatrix} = \begin{bmatrix}I & 0 \\ CA^{-1} & I\end{bmatrix} \cdot \begin{bmatrix}A & B \\ 0 & D - C \cdot A^{-1} \cdot B\end{bmatrix}\) Let \(X = D - C \cdot A^-1 \cdot B\), \(Y = CA^-1\), and \(Z = -A^{-1} \cdot B \cdot S^{-1}\). We easily compute the inverse as: \(\begin{bmatrix}A & B\\ C & D\end{bmatrix}^{-1} = \begin{bmatrix}A & B\\ 0 & S\end{bmatrix}^{-1} \cdot \begin{bmatrix}I & 0\\ Y & I\end{bmatrix}^{-1} = \begin{bmatrix}A^{-1} - Z \cdot Y & Z\\ -S^{-1} \cdot Y & S^{-1}\end{bmatrix}\) Computing \(A^{-1}\) and \(S^{-1}\) are \(F(n/2)\), while computing \(Y\), \(YB\), \(S\), \(Z\), \(Z \cdot Y\) and \(S^-1 \cdot Y\) each take \(O(n^2)\). \(F(n) = 2F(n/2) + O(n^2) = O(n^2)\) by the recurrence master theorem.


Search Problems

A search problem involves an instance \(I\) with a target solution \(S\) that can be quickly verified by a checking algorithm \(C\). The running time is polynomial in \(I\). The class of all search problems is called NP (non-deterministic polynomial time). Meanwhile, the class of all search problems solvable in polynomial time is P. We can leverage reductions to show that certain problems are NP-complete.


SAT (or satisfiability) involves a boolean formula in conjunctive normal form where several or clauses are and-ed together. We can quickly check if a particular assignment is valid, but it’s hard to check all assignments. If all clauses contain at most one positive literal, then we have a Horn formula which we can check with a greed algorithm. In 2-SAT, where each has only two literals, we can solve it in linear time with SCCs.

We can reduce 3-SAT to independent set as follows. Create a graph with a triangle for each clause and edges between opposite literals. Set the goal \(g\) to the number of clauses. Any solution \(S\) will never contain a contradiction, so it must solve 3-SAT. Conversely, for any solution to 3-SAT, simply add any true literal in each clause to the set \(S\).

We can reduce SAT to 3-SAT as follows. Whenever a clause has more than three literals such as \((a_1 \lor a_2 \lor ... \lor a_k)\), replace it with \((a_1 \lor a_2 \lor y_1) (\bar{y_1} \lor a_3 \lor y_2) (\bar{y_2} \lor a_4 \lor y_3) ... (\bar{y}_{k-3} \lor a_{k-1} \lor a_k)\) If this version is satisfied, then it is easy to see that the original must be satisfied. Conversely, if some \(a_i\) is true, then we can set \(y_1, ..., y_{i-2}\) to true and the rest to false.

Any problem in NP reduces to SAT because we can generalize SAT to Circuit SAT, which has AND/OR/NOT gates, known inputs, and unknown inputs. SAT can trivially be represented as a Circuit SAT with no known input gates. In the other way around, we can represent OR, AND, and NOT by breaking them down into one or more clauses.


The Traveling Salesman Problem looks for a tour among \(n\) vertices with total cost constrained by budget \(b\). We introduce the constraint so we can reduce the optimization problem (which can’t be checked in polynomial time) to a search problem. We can use a binary search on the budget to find the optimal cost. There are only exponential time algorithms for TSP.


Two twin problems are the Rudrata cycle–which visits each vertex exactly once—and Euler cycle—which traverse each edge exactly once. Rudrata cycle is not solvable in polynomial time because it focuses on vertices.

Rudrata path is equally hard. Suppose we have an instance of Rudrata path from \(s\) to \(t\). Create new edges \((s, x), (x, t)\). This trivially reduces to Rudrata path.

Rudrata can be reduced to TSP as follows. Construct an instance of TSP with unit length edges and \(1+\alpha\) length edges where they aren’t suppose to exist. For \(\alpha = 1\), we have the triangle inequality.


There is no efficient algorithm for integer linear programming, where we are given a set of linear inequalities \(AX \leq b\).

ZOE can be reduced to subset sum by turning each column’s elements into digits. ZOE can be reduced to ILP by constraining each variable to be between 0 and 1.

3D Matching

Given a tripartite graph with \(n\) nodes on each side, find \(n\) disjoint edge triples or decide that none exists. We can reduce 3-SAT to 3D matching by using this gadget.

We can then link up clauses by pairing the pets and adding extra boys and girls to fill in the gaps.

We can easily reduce 3D matching to ZOE via the following matrix.

Set, Cover, Clique

In Independent Set, given a graph and integer \(g\), find \(g\) independent vertices (which share no edge). It can be solved efficiently on trees, but not for general graphs. In Vertex Cover, the goal is to find \(b\) vertices which \(b\) vertices that touch every edge. Vertex Cover and 3D matching are both special cases of Set Cover.

We reduce Independent Set to Vertex Cover by computing the vertex cover on $$ V _g$$ nodes. Notice that the complement must be an independent set.

We reduce Independent Set to Clique by checking if a set of nodes \(S\) in \(G\) is a clique in \(\bar{G}\).

Longest Path

Find a simple path from \(s\) to \(t\) with weight at least \(g\).


Knapsack is exponential in its input size since it looks at every subset. The unary knapsack variant has an efficient solution. A seemingly simple variant called Subset Sum—where each item’s value is its weight and we wish to reach a capacity—is also hard.

Dealing with NP-Hard


For any minimization problem, the approximation ratio of the algorithm is defined to be \(\alpha_A = \max_I \frac{A(i)}{\text{OPT}(I)}\) which is to say, a factor by the which the output exceeds the optimal solution.

We can approximate vertex cover with a matching, a subset of edges with no vertices in common. Repeatedly add disjoint edges until no longer possible. It is guaranteed to have at most 100% error, so \(\alpha_A = 2\).

Not all NP-hard problems have approximation ratio 2 . For example, if TSP has an approximation ratio of 2, then we could solve Hamiltonian Cycle by adding a length 2n edge on all non-existing edges. An algorithm with approx. ratio 2 could tell the difference.


To find the maximum of a function \(f\), we should follow the up direction, which is called gradient ascent. The idea is to pick a random starting point \(z\) and try to find a better point \(z'\) near \(z\) over \(M\) iterations. It gets close to gradient descent when we choose worse option according to an exponential decaying temperature schedule.

Randomized Algorithms

Consider a quicksort of array \(A\) with random pivots. The worst case will have an empty left or right and fail to divide. The best case is divide and conquer. Using an indicator variable, we can show that the expected value of each comparison \(X_{ij}\) between entry \(i\) and \(j\) has an expected value \(\mathbb{E}(X_{ij}) = \frac{2}{j-i+1}\). Summing across all \(i, j\), we have \(2n\log n\).

Freivalds’ Algorithm gives a quick way of testing if \(C = A \times B\) by picking a uniformly random \(x \in {0, 1}^n\). If \(C = A \times B\), then \mathbb{P}(ABx = Cx) = 1\(. Otherwise,\)\mathbb{ABx = Cx} \leq 1/2\(. To show the latter, consider\)D := AB - C\(and notice that at most half of all\)Dx\(could be 0. If we pick\)N\(random and independent vectors, then the probability of failure is\)\frac{1}{2^N}$$.

Karger’s Global Minimum Cut algorithm uses contractions. The algorithm uniformly contracts an edge (replacing it with a supervertex) \(V-2\) times. The probability that Contraction returns the global minimum is at least \(\frac{1}{n \choose 2}\). This happens if no edge crossing the cut is contracted. The probability expression is a telescoping product that equals \(\frac{1}{n \choose 2}\). Using the fact that \(1+x \leq e^x\), we can compute that repeating the algorithm \(N\) times has a \(O(mN) = O(mn^2\log(1/p))\) runtime.


Suppose we want a data structure that can approximately count with error bound \(\mathbb{P}(\text{abs}(\tilde{n} - n) > \epsilon n) < \delta\) A trivial algorithm would use log(n) memory. While we could instead only increment \(X\) with probability \(1/2\), the memory saving is small and the approximation is bad for small \(X\). Morris devised this algorithm:

  1. Initialize \(X = 0\)
  2. For each update, increment \(X\) with probability \(\frac{1}{2^X}\).
  3. Output \(\tilde{n} = 2^X - 1\).

Using induction, we can show \(\mathbb{E} 2^{X_n} = n+1\). By using \(s\) copies of Morris and averaging them, we can reach \(\mathbb{P} (\text{abs}(\tilde{n} - n) > \epsilon n) < \frac{1}{2s\epsilon^2}\).

Lower Bounds

Any comparison based sorting algorithm takes at least \(\omega(n \log n)\) time. Since there are \(n!\) possibilities for the true sorted order, there are at least \(\log (n!)\) decisions it needs to make.

We haven’t found a superlinear lower bound for any explicit problem. Cell probe lower bounds are the gold standard since they only count read and write. Everything else is free!


We hash using \(h_a (x_1, x_2..., x_k) = \sum^k_{i=1} a_i x_i \mod n\) and are guaranteed a chance of collision of 1/n. We picked a random hash function by drawing the \(a_i\), so they are universal.


Markov’s inequality for a nonnegative random variable \(X\): \(P(X \geq a) \leq \frac{E(x)}{a}\) Chebyshev’s inequality guarantees that \(P(|X-\mu| \geq c) = P((X-\mu)^2 \geq c^2) \geq \frac{\sigma^2}{c^2}\)

Built with Jekyll on the Swiss theme.