Notes are for Computer Science 170: Efficient Algorithms and Intractable Problems, based on *Algorithms* by Dasgupta et al. If you’re studying for the midterm on Tuesday, good luck! If you’re just reading along for fun, the first set of notes is here.

### Greedy

#### Huffman Encoding

Tasks like audio compression involve efficiently representing a string. Suppose it need to encode an alphabet of characters \(A\), \(B\), \(C\), and \(D\). We could create a unique 2-bit representation of each. But if one of the character is predominant in the string, such an encoding would be inefficient. We want to define binary codewords of different lengths, and we can do so with a full binary string.

For a given tree of \(n\) symbols, each with frequency \(f_i\) and depth \(d_i\), we can compute the cost of the tree as \(\sum^n_{i=1} f_i * d_i\). Equivalently, the cost is the sum of the frequencies of all leaves and internal nodes, excluding the root.

We can construct this tree greedily using Huffman encoding. Iterate through a priority queue of frequencies in ascending order. Remove the two smallest frequencies \(f_i, f_j\) and add \(f_k = f_j + f_i\) to the priority queue. Then, create a node \(k\) with children \(i, j\).

#### Horn Formulas

A Boolean variable is either true or false and a literal is either a variable or its negation. Horn formulas store knowledge in two forms of clauses. Implication clauses state that a combination of positive literals imply a single positive literal, e.g. \((z \land w) \implies u\). To dictate that a variable is always true, remove the predicate, e.g. \(\implies x\). Negation clauses take the form of “at least one must be false”, e.g. \(\neg a \lor \neg b \lor \neg c\).

We can use a greedy algorithm to find a satisfying assignment of true and false values. Begin by setting everything false. Then, while there exists an unsatisfied implication, set the consequence to true. If all negative clauses are met, return the assignment. Otherwise, the clauses are inconsistent.

#### Set Cover

Suppose we are given a group of sets \(S_0, S_1, S_2, ..., S_n\) and a target set of elements \(B\). How can we select the minimal number of sets to cover the target set? A greedy approach would repeatedly pick the smallest set \(S_i\), but it doesn’t guarantee optimality. To see this, consider the following set.

Fortunately, greedy can still provide a good approximation. Suppose \(B\) contains \(n\) elements and the optimal cover includes \(k\) sets. In picking each set, the number of remaining uncovered elements \(n_t\) decreases by at least \(\frac{n_r}{k}\). This implies that \(n_t \leq n(1-1/k)^t \leq ne^{-t/k}\).

### Dynamic Programming

Dynamic programming and linear programming are “sledgehammers,” techniques that provide broad applicability at the expense of efficiency.

#### Shortest Paths

Shortest paths are easy to find in DAGs because the nodes can be linearized, arranging them on a line so that the edges go left to right. The algorithm happens to be an example of dynamic programming. For a given node, we break the problem down by finding the shortest path to every incoming node.

#### Longest Increasing Subsequence

Given a sequence of numbers \(a_1, a_2, ..., a_n\), an increasing subsequence is any strictly increasing ordered subset. Notice that comparisons produce a similar structure to a dag.

We use a similar approach. Begin the subsequence at each index and consider the longest subsequence from each subsequent node of greater value.

#### Edit Distance

Suppose we have two strings. The edit distance between the strings is the minimum number of insertions, deletions, and substitutions to get from the first string to the second.

We can break this down into subproblems by considering the edit distance between the first \(i\) characters of the first string and the first \(j\) characters of the second string. We can either delete the \(i\)th character of string one, add the \(j\)th character of string two, or substitute the \(i\)th character. It corresponds to the following recurrence relationship: \(E(i, j) = \min[1 + E(i-1, j), 1+E(i, j-1), \text{diff}(i, j) + E(i-1, j-1)]\) Underlying the solution is a DAG with edges from different combinations of \((i, j)\).

#### Knapsack Problem

We have a set of \(n\) items form which to choose, each of which has weight \(w_i\) and value \(v_i\). We want to maximize the value subject to the capacity \(w\). If we are allowed to repeatedly pick items, then we can repeatedly pick the item which maximizes the value, notated as
\(K(w) = \max_{i:w_i\leq w} \{K(w-w_i) + v_i \}\)
But, if we are not allowed to repeat items, then we must introduce a second parameter \(0 \leq j \leq n\), where \(K(w, j)\) is the maximum value achievable using a knapsack of capacity \(w\) and items \(1, ..., j\). We have the new recursive relationship:
\(K(w, j) = \max \{K(w-w_j, j-1) + v_j, K(w, j-1) \}\)
To avoid repetition of subproblems, we use **memorization**, storing the solutions to subproblems that we’ve already seen. The running time is \(O(nW)\).

#### Chain Matrix Multiplication

Suppose we have a chain of matrices \(A_1 \times A_2 \times ... \times A_n\) with dimensions \(m_0 \times m_1, m_1 \times m_2, ..., m_{n-1} \times m_n\). We can represent the order of multiplication with a binary tree, such as below.

Given a list of dimensions, we can set up a recurrence relationship for matrices \(i\) through \(j\) by choosing the cheapest pair \(k, k+1\) to multiply first: \(C(i, j) = \min_{i\leq k \leq j} \{C(i, k) + C(k+1, j) + m_{i-1} \cdot m_k \cdot{} m_j\}\) The algorithm runs in \(O(n^3)\), with \(O(n^2)\) subproblems that take \(O(n)\) to compute each.

#### Shortest Paths

Suppose we want to find the shortest path from node \(s\) to every other path with at most \(k\) edges. We can modify the Dijkstra’s update function as follows: \(\text{dist}(v, i) = \min_{(v, u)\in E} \{\text{dist}(u, i-1) + l(u, v) \}\) But if we want to find the shortest path between every pair of vertices, we can use a faster algorithm. We instead let \(\text{dist}(i, j, k)\) denote the length of the shortest path from \(i\) to \(j\) where only the nodes \({1, 2, ..., k}\) can be used as intermediates. Updating is easy, since we just to need examine whether adding the \(k\)‘th node presents a shorter path for some \(i\) and \(j\) with an edge to \(k\). In other words, we update: \(\text{dist}(i, j, k) = \min \{\text{dist}(i, k, k-1) + \text{dist}(k, j, k-1), \text{dist}(i, j, k-1)\}\) with a runtime of \(O(\|V\|^3)\), which is better than the \(O(\|V\|^2 \|E\|)\) we’d get from running Dijkstra’s from each node.

#### Traveling Salesman

Suppose we want to find a tour between \(n\) nodes which visits every node exactly once and has minimum total length. A brute force solution checking every possible tour would take \(O(n!)\) time. Instead, we can evaluate every subset of size \(s\) that contains a starting node \(j\). We iterate through subsets \(S\) in ascending order of size \(s\) and all nodes, updating: \(C(S, j) = \min \{C(S - \{j\}, i) + d_{ij} : i\in S, i\neq j\}\) The \(2^n \cdot n\) subproblems each are linear time, so the running time is \(O(n^2 2^n)\).

### Linear Programming

#### Overview

An optimization problem consists of \(n\) variables \(x_1, x_2, ..., x_n \in \mathbb{R}\), an objective function \(f(x_1, x_2, ..., x_n) \in \mathbb{R}\), and \(m\) constraints \(C_1, c_2, ..., c_m\) with \(c_i(x_1, x_2, ..., x_n) \in \mathbb{R}\). The goal is to maximize \(F\) subject to every \(c_i\).

A linear programming problem involves a linear \(f(x_1, x_2, ..., x_n) = a_1x_1 + ... + a_nx_n\) with \(a_1, ..., a_n \in \mathbb{R}\) and a set of linear constraints (e.g. \(x_1 + x_2 = 1\) or \(x_3 - 4x_4 \geq 5\)). Strict inequalities are not allowed because the set of possible solutions would not be topologically closed. A generic setup can be written as: \(\max e^T x\\ Ax \leq b\\ x \geq 0\)

Visually, a set of constraints forms a region of feasible solutions. This region could be empty or unbounded. If it isn’t, then the region is convex. This is means that if \(\vec{x}\) and \(\vec{y}\) are feasible then so is any \(\vec{z} = \lambda \vec{x} + (1-\lambda) \vec{y}, \forall \lambda \in [0, 1]\). The conclusion is that any point on the interior can improve and any point on the edge can move along to a vertex.

#### Simplex

Simplex is a greedy strategy that begins with any vertex and moves to adjacent vertices. Each vertex is defined by \(n\) constraints at equality, and an adjacent vertex involves relaxing a constraint and tightening another constraint.

The correctness follows from convexity. Suppose that at some vertex \(v\), each of its neighbors has the same or worse value. Then, the feasible region is below the profit line going through \(v\).

The runtime is exponential. There are \({m \choose n}\) vertices, so while simplex halts in finite time, the above bound is exponential.

#### Network Flow

Suppose we wish to maximize the “flow” through a network subject to constraints on flow through each edge.

We reduce the problem to linear programming for the variables \({f(e)}_{e \in E}\) with constraints \(0 \leq f(e) \leq c(e)\) and \(\sum f(u, v) = \sum f(v, w)\) for all \(e\) and \(v\) with an objective function of \(\sum f(s, u)\) where \(s\) is the source. Solving the generic LP problem gives us a polynomial time solution.

A direct algorithm can solve the problem by modifying greedy. Begin with zero flow and repeatedly pick any path, increasing the flow by the maximum amount possible and canceling out existing flow when they travel in opposing directions. Each iteration is \(O(E)\). By the Ford-Fulkerson algorithm with random paths, there are at most \(f_{\max}\) iterations, so the runtime is \(O(E \cdot f_{\max})\). By the Edmunds-Korp algorithm based on BFS, there are at most \(O(V\cdot E)\) iterations and the overall runtime is \(O(V \cdot E^2)\).

The direct algorithm is correct by the Max-Flow Min-Cut Theorem. \(\max f = \min_{\text{cut}(L, R)} \text{capacity}(L, R)\) To see this, suppose there is some max flow \(F^*\). Notice that since it is optimal, there is no path \((s, t)\) in \(F^*\). Let \(L\) be every vertex reachable from \(s\) and \(R\) constitute the rest. The capacity of the cut must be \(f*\), since otherwise there would be another path from \(s\) to \(R\), which by construction is impossible.

#### Duality

In a primal LP problem, we have \(I\) inequalities and \(E\) equalities over \(n\) variables with \(N\) constrained to be non-negative. The **dual** has \(m = I + E\) variables with only \(I\) non-negative constraints.

The primal LP takes the generic form:
\(\max c_1x_1 + ... + c_nx_n\\
a_{i1}x_1 + ... + a_{in}x_n \leq b_i \text{ for } i\in I\\
a_{i1}x_1 + ... + a_{in}x_n = b_i \text{ for } i\in E\\
x_j \geq 0 \text{ for } j \in N\)
While the dual LP takes the form:
\(\min b_1y_1 + ... + b_ny_m\\
b_{j1}y_1 + ... + b_{jm}y_m \geq c_j \text{ for } j\in N\\
b_{j1}y_1 + ... + b_{jm}y_m = c_j \text{ for } j\not\in N\\
y_i \geq 0 \text{ for } i \in I\)
**Weak duality** states that the solution to the primal LP problem is always less than or equal to the solution to the dual LP problem. **Strong duality** states that the solutions are equal.

To get the intuition for why this works, consider the shortest path problems between two nodes \(S\) and \(T\) in a graph. Suppose we built a physical model where physical nodes are connected via loose strings. Clearly, we can find the shortest distance by pulling \(S\) and \(T\) away from one another, turning a minimization problem into a maximization problem.