Peter Zhang

CS 170 MT1


New concept: taking notes for midterms and uploading them to the site. A natural way to hold myself accountable for producing good notes. Visiting my blog is also a convenient and morally justifiable (?) way to cheat on a test. To anyone actually using this to study: no promises, bud.

These notes are for Computer Science 170: Efficient Algorithms and Intractable Problems, based on Algorithms by Dasgupta et al.

Big-O Notation


Runtime is expressed in terms of basic computer steps. For functions \(f(n)\) and \(g(n)\), we have:

\[f = O(g) \implies f(n) \leq c * g(n)\\ f = \Omega(g) \implies f(n) \geq c * g(n)\\ f = \Theta(g) \implies O(g), \Omega(g)\]

Use the following rules:

  1. Multiplicative constants are dropped.
  2. For all \(a > b\), \(n^a\) dominates \(n^b\).
  3. Any exponential dominates any polynomial.
  4. Any polynomial dominates any logarithm.

To demonstrate an asymptotic, compute the limit \(\lim_{n \to \infty} \frac{f(n)}{g(n)}\).

Divide and Conquer

Master Theorem

The general format for regression is as follows, where \(a\) is the number of children, \(b\) is the reduction in problem complexity, and \(c\) and \(d\) describe the complexity at each level.

\[T(n) = a*T(\frac{n}{b}) + c*n^d, T(1) = C\]

These can be evaluated as follows:

\[\frac{a}{b^d} < 1 \implies T(n) = O(n^d)\\ \frac{a}{b^d} = 1 \implies T(n) = O(n^d\log n)\\ \frac{a}{b^d} > 1 \implies T(n) = O(n^{\log_b a})\]

For complicated recurrence relationships, substitute and expand, examining patterns.


To select the \(k\)th smallest element in an array, repeatedly partition the array by a randomly selected element \(v\). Since there is a 50% chance that the \(v\) is between the 25th and 75th percentile, we expect the array to shrink by 25% after two partition, so the expected running time is

\[T(n) = T(\frac{3n}{4}) + O(n)\]

which is \(O(n)\) by the master theorem.

Matrix Multiplication

Subdivide two matrices and multiply as:

\[XY = \begin{bmatrix} A & B\\ C & D \end{bmatrix} \begin{bmatrix} E & F\\ G & H \end{bmatrix} = \begin{bmatrix} AE+BG & AF+BH\\ CE+DG & CF+DH \end{bmatrix}\]

This is \(O(n^3)\), which is inefficient. Strassen discovered a way to compute

\[XY = \begin{bmatrix} P_5 + P_4 - P_2 + P_6 & P_1 + P_2\\ P_3 + P_4 & P_1 + P_5 - P_3 - P_7 \end{bmatrix}\]


\[P_1 = A(F-H)\\ P_2 = (A+B)H\\ P_3 = (C+D)E\\ P_4 = D(G-E)\\ P_5 = (A+D)(E+H)\\ P_6 = (B-D)(G+H)\\ P_7 = (A-C)(E+F)\]

resulting in a new running time of \(T(n) = 7T(\frac{n}{2}) + O(n^2)\) which by the master theorem is \(O(n^{\log_27})\).

Fast Fourier Transform

The FFT expedites the multiplication of two polynomials. For some polynomial \(A(x)\), we pad the polynomial to have \(n = 2^k\) coefficients \(a_0, a_1,..., a_{n-1}\). The FFT outputs the values of \(A(x)\) on the $n$th roots of unity, \(\omega^0, \omega^1,..., \omega^{n-1}\) where \(\omega = e^{2\pi i/n}\).

The algorithm divides the polynomial into even coefficients \(A_e = a_0, a_2, ..., a_{n-2}\) and odd coefficients \(A_o = a_1, a_2, ..., a_{n-1}\). We recursively call \(FFT(A_e, \omega^2)\) and \(FFT(A_o, \omega^2)\). Notice that \(\omega^2\) corresponds to the \(n/2\)th roots of unity, so we can assign the results to \(A_e^{k}\) and \(A_o^{k}\) for \(0 \leq k \leq n/2\).

The results are recombined as \((A^i) = A_e(\omega^{2(i \mod n/2)}) + \omega^{i}A_o(\omega^{2(i \mod n/2)})\) for \(0 \leq i \leq n\).

Now, for two polynomials \(A(x)\) and \(B(x)\), we compute the polynomials on the \(n\) roots of unity using \(FFT\) and compute \(C(x)\) on those points. Surprisingly, we can interpolate by computing \(FFT(C, \omega^{-1})\), yielding a result in \(O(n, \log n)\) operations.



A graph is most commonly represented as an adjacency matrix or adjacency list. For a graph of \(n\) vertices and \(e\) edges, a matrix requires \(O(n^2)\) space but provides constant access. A list requires just \(O(e)\) and \(O(n)\) access.


Depth first search (DFS) involves repeatedly calling the explore procedure on unvisited starting points until the entire graph has been traversed. It marks each visit with a pre/post-visit number and explores every edge, resulting in a runtime of \(O(v+e)\). In an undirected graph, DFS can be used to identify connected components.

In DFS, for any two intervals \([pre(u), post(u)]\) and \([pre(v), post(v)]\), either one contains the other or they are disjoint. In the case of the former, the node of the containing interval is the ancestor of the other node.


A cycle is a circular path. A DAG is a directed, acyclic graph. If a DFS reveals a back-edge, then a directed graph contains a cycle. In a DAG, every edge leads to a vertex with a lower \(post\) number. Every DAG must contain at least one source—no incoming edges—and one sink—no outgoing edges.

In a directed graph, a strongly connected component has the property that for any two nodes \(u\) and \(v\), there exists a path from \(u\) to \(v\) and a path from \(v\) to \(u\). Shrinking a directed graph’s SCCs to nodes produces a DAG since any cycle would constitute a SCC. Therefore, every direct graph is a DAG of its SCCs.


Breadth-first search partitions a graph into layers emanating from a starting vertex \(s\). For each new layer, the nodes are processed via FIFO. The overall running time is identical to DFS \(O(v+e)\). BFS does not restart in other connected components, so nodes unreachable from \(s\) are ignored.

When edges have unit length, an edge of length \(l\) could just be broken down into \(l\) weightless edges. But, this is silly. Dijkstra’s algorithm uses a priority queue to repeatedly explore the most proximate edge. The running time is identical to BFS, but is slowed down by the use of a priority queue. With a heap, the running time is \(O((v+e)\log v)\).

When there are negative edges, we can no longer use Dijsktra’s, since distance is no longer strictly increasing. Instead, we use Bellman-Ford, which “updates” every edge \(v-1\) times. If there is a finite shortest path, then it must be reachable with \(v-1\) nodes. If there a negative cycle, then it doesn’t make sense to ask about a shortest path. We know there is a shortest cycle is distance is reduced on the \(v-1\)th update.


Some problems can be simplified by “stacking” multiple copies of a given graph. For example, suppose we want to construct a walk on graph \(G\) that crosses a given edge \((u, v)\) four times before returning to starting node \(s\). We can create five copies of \(G\), with each edge replaced wtih \((u_i, v_{i+1})\), and run Dijkstra’s the compute the shortest distance from \(s_0\) to \(s_4\).

Greedy Algorithms


Greedy algorithms solve a problem by making immediately optimal decisions at each step. In the cases of MSTs, Huffman encoding, and Horn formulas, greedy produces optimal solutions. In the case of set cover, it produces a decent approximation. And in the case of chess, it’s a shit strategy (I can prove this fact with first-hand experience).

To prove a greedy algorithm is optimal, use a proof by contradiction. Assume that a given piece of the optimal solution disagrees with the greedy solution and show that it leads to a contradiction.


Greedy algorithms can correctly construct minimum spanning trees, an algorithm called Krustal’s algorithm. Begin at any node and repeatedly add the edge of lowest weight that doesn’t produce a cycle.

Correctness is demonstrated by the cut property. Across any cut of an MST, the lowest weight edge \(e\) is included in a valid MST. This must be the case, since \(e\) is of equal or lower weight the current edge over the cut, \(e'\).

Built with Jekyll on the Swiss theme.