
Convex Optimization Problems


Optimization Terminology

A convex optimization program can be written as a convex minimization in standard form:

$$\min_{x \in D} f(x) \quad \text{subject to} \quad g_i(x) \leq 0, \; i=1,\cdots m, \quad Ax = b$$

where $f$ and $g_i, i=1,\cdots m$ are all convex and the equality constraints are affine.

Convex minimization can be reposed as concave maximization, and both are called convex optimization problems:

| notation/terminology | comment |
|---|---|
| $f(x)$: objective function, criterion | the convex function we are minimizing |
| $D$: optimization domain | implicit domain, $D=\operatorname{dom}(f) \cap \bigcap_i \operatorname{dom}\left(g_i\right)$ |
| $g_i(x)$: inequality constraints | convex functions; a solution must satisfy $g_i(x) \leq 0$ |
| $f^\star$: optimal value | the minimum of $f(x)$ over all feasible points $x$ |
| $X_{\text{opt}}$: set of solutions | all $x$'s that solve the convex problem; property: it is a convex set |
| $x^\star$: solution, optimal point | an $x$ at which $f$ achieves the optimal value, i.e. $f(x)=f^\star$; also called a minimizer |
| feasible points | $x \in D$ with $g_i(x)\leq 0$ and $Ax=b$ |
| $\epsilon$-suboptimal point | a feasible $x$ for which $f(x) \leq f^\star + \epsilon$ |
| $g_i$ is active at $x$ | $x$ is feasible and $g_i(x)=0$ |
| locally optimal | optimal within some ball around the point (defined below) |
| globally optimal | optimal over the entire feasible set |
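To make the terminology concrete, here is a minimal sketch (assuming the cvxpy package; the problem data below are made-up placeholders) that solves a small problem in this standard form and reads off the optimal value $f^\star$, an optimal point $x^\star$, and the suboptimality gap of another feasible point:

```python
import cvxpy as cp
import numpy as np

# Hypothetical toy instance of the standard form (data made up for illustration):
# minimize a convex quadratic subject to one convex inequality and one affine equality.
c = np.array([1.0, 2.0, 3.0])
A = np.array([[1.0, 1.0, 1.0]])          # equality constraint A x = b
b = np.array([1.0])

x = cp.Variable(3)
objective = cp.Minimize(cp.sum_squares(x - c))   # f(x)
constraints = [cp.norm(x, 2) <= 2.0,             # g(x) <= 0
               A @ x == b]                       # A x = b
prob = cp.Problem(objective, constraints)

f_star = prob.solve()        # optimal value f*
x_star = x.value             # an optimal point x*

# Any other feasible point is epsilon-suboptimal with epsilon = f(x) - f*.
x_feas = np.array([1.0, 0.0, 0.0])       # feasible: ||x||_2 <= 2 and sum(x) = 1
print(f_star, np.sum((x_feas - c) ** 2) - f_star)
```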

$X_{\text {opt }}$ is a convex set

Proof: use the definitions. If $x, y$ are solutions, then for $0 \leq \theta \leq 1$:

- $\theta x+(1-\theta) y \in D$
- $g_i(\theta x+(1-\theta) y) \leq \theta g_i(x)+(1-\theta) g_i(y) \leq 0$
- $A(\theta x+(1-\theta) y)=\theta A x+(1-\theta) A y=b$
- $f(\theta x+(1-\theta) y) \leq \theta f(x)+(1-\theta) f(y)=f^{\star}$

Therefore $\theta x+(1-\theta) y$ is also a solution.

Another key property: if the criterion $f$ is strictly convex, then the solution is unique, i.e. $X_{\mathrm{opt}}$ contains at most one element.
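A small numerical illustration of both properties (a hypothetical toy problem, not from the notes): with $f(x)=(x_1-1)^2$ over the box $[0,2]^2$ the solution set is the segment $\{1\}\times[0,2]$, which is convex, while adding $(x_2-1)^2$ makes the criterion strictly convex and the minimizer unique.

```python
import numpy as np

# Feasible set: the box C = [0, 2] x [0, 2] (a convex set).
in_C = lambda x: np.all(x >= 0.0) and np.all(x <= 2.0)

# Convex but NOT strictly convex criterion: it ignores x2 entirely.
f = lambda x: (x[0] - 1.0) ** 2

# Two distinct solutions, both attaining the optimal value f* = 0 ...
x_sol, y_sol = np.array([1.0, 0.0]), np.array([1.0, 2.0])
# ... and any convex combination of them is again feasible and optimal.
theta = 0.3
z = theta * x_sol + (1 - theta) * y_sol
print(in_C(z), f(x_sol), f(y_sol), f(z))       # True 0.0 0.0 0.0

# With a strictly convex criterion the minimizer over C is unique: x = (1, 1).
f_strict = lambda x: (x[0] - 1.0) ** 2 + (x[1] - 1.0) ** 2
```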

Example:

lasso

Given $y \in \mathbb{R}^n, X \in \mathbb{R}^{n \times p}$, consider the lasso problem (in constrained form):

$$\min_{\beta} \|y - X\beta\|_2^2 \quad \text{subject to} \quad \|\beta\|_1 \leq s$$
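A minimal sketch of this constrained-form lasso using cvxpy (the data and the tuning parameter $s$ below are hypothetical placeholders):

```python
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(0)
n, p, s = 50, 20, 1.0                    # hypothetical sizes and l1 budget s
X = rng.standard_normal((n, p))
y = rng.standard_normal(n)

beta = cp.Variable(p)
prob = cp.Problem(cp.Minimize(cp.sum_squares(y - X @ beta)),
                  [cp.norm(beta, 1) <= s])             # convex constraint ||beta||_1 <= s
prob.solve()
print(prob.value, np.sum(np.abs(beta.value) > 1e-6))   # optimal value, nonzero count
```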

Huber loss

Locally optimal

A feasible point $x$ is called locally optimal if there is some $R > 0$ such that

$$f(x) \leq f(y) \quad \text{for all feasible } y \text{ with } \|x - y\|_2 \leq R$$

For convex optimization problems, local optima are global optima.

proof:

  1. Suppose $x$ is a local optimum but not a global optimum: there is a feasible $y \neq x$ with $f(y) < f(x)$. By the definition of local optimality, $f(x) \leq f(z)$ for all feasible $z$ with $\|x-z\|_2 \leq R$.
  2. Let $\theta = 1-\frac{R}{2l}$ with $l = \|x-y\|_2$ (if $l \leq R$ the contradiction is immediate, so assume $l > R$, hence $\theta \in (0,1)$).
  3. Let $z = \theta x + (1-\theta) y$; then $z$ is feasible, $\|x-z\|_2 = (1-\theta)\,l = \frac{R}{2} \leq R$, and by convexity $f(z) \leq \theta f(x) + (1-\theta) f(y) < f(x)$.
  4. This contradicts the local optimality of $x$, which requires $f(x) \leq f(z)$.

Rewrite Constraints

The initial problem can be rewritten as

$$\min_x f(x) \quad \text{subject to} \quad x \in C$$

where $C=\{x : g_i(x)\leq 0,\ i=1,\cdots m,\ Ax = b\}$ is the feasible set. The above formulation is completely general.

It can also be rewritten in unconstrained form as

$$\min_x f(x) + I_C(x)$$

where $I_C$ is the indicator function of $C$ ($I_C(x)=0$ if $x \in C$, $\infty$ otherwise).
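As a sketch of this rewriting (a made-up one-dimensional example): the constrained problem $\min\{(x-3)^2 : 0 \leq x \leq 1\}$ and the unconstrained problem $\min_x (x-3)^2 + I_{[0,1]}(x)$ share the same minimizer $x=1$.

```python
import numpy as np

f = lambda x: (x - 3.0) ** 2                 # convex objective
C = (0.0, 1.0)                               # feasible interval C = [0, 1]

def I_C(x):
    """Indicator function of C: 0 inside C, +infinity outside."""
    return 0.0 if C[0] <= x <= C[1] else np.inf

# Crude grid search over the unconstrained form f(x) + I_C(x).
grid = np.linspace(-2.0, 4.0, 6001)
vals = np.array([f(x) + I_C(x) for x in grid])
print(grid[np.argmin(vals)])                 # ~1.0: the constrained minimizer
```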

Properties and first-order optimality

first-order condition for optimality

For a convex problem

$$\min_x f(x) \quad \text{subject to} \quad x \in C$$

with differentiable $f$, a feasible point $x$ is optimal if and only if

$$\nabla f(x)^T (y - x) \geq 0 \quad \text{for all } y \in C$$

In words: all feasible directions from $x$ are aligned with the gradient $\nabla f(x)$, so moving from $x$ in any feasible direction cannot decrease $f$ to first order.

$\nabla f(x)$: the gradient direction within the feasible set, perpendicular to the level curves (contour lines); the gradient is the projection of the tangent direction onto the domain.

The meaning of this condition:

1. if you move from $x$ towards any feasible $y$, you will increase $f$ locally;
2. the vector $-\nabla f(x)$ (assuming it is nonzero) is the normal of a hyperplane that "supports" the feasible set $\Omega$ at $x$.

first_optimality_for_convex_optimization.svg

Note:

- If $f$ is not convex, the condition is only a necessary one; if $f$ is convex, it is both necessary and sufficient.
- If $C=\mathbb{R}^n$ (unconstrained optimization), the optimality condition reduces to the familiar $\nabla f(x) = 0$.
- If the optimal point lies in the interior of the feasible set, then $\nabla f(x) = 0$ (take $y=x-\alpha \nabla f(x)$ for $\alpha$ small enough).

Proof

Proof of sufficiency: according to the first-order characterization of convex functions,

$$f(y) \geq f(x)+\nabla f(x)^T(y-x) \quad \text{for all } y,$$

so if $\nabla f(x)^T(y-x) \geq 0$ for every feasible $y$, then $f(y) \geq f(x)$ for every feasible $y$, i.e. $x$ is optimal.

Proof of necessity (if the inner product were negative, there would be points near $x$ with a smaller value of $f$): suppose $x$ is optimal but for some $y \in \Omega$ we had

$$\nabla f(x)^T (y - x) < 0.$$

Consider $g(\alpha):=f(x+\alpha(y-x))$. Because $\Omega$ is convex, $x+\alpha(y-x) \in \Omega$ for all $\alpha \in[0,1]$. Observe that

$$g'(0) = \nabla f(x)^T (y - x) < 0.$$

This implies that

$$f(x+\alpha(y-x)) < f(x) \quad \text{for all sufficiently small } \alpha > 0.$$

But this contradicts the optimality of $x$.
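A quick numerical check of the condition on a hypothetical example (not from the notes): minimize the linear function $c^T x$ over the unit ball; the minimizer is $x^\star=-c/\|c\|_2$, and $\nabla f(x^\star)^T(y-x^\star)\geq 0$ for every sampled feasible $y$.

```python
import numpy as np

# Hypothetical instance: minimize f(x) = c^T x over the unit ball C = {x : ||x||_2 <= 1}.
c = np.array([1.0, -2.0, 0.5])
x_star = -c / np.linalg.norm(c)              # minimizer of c^T x over the unit ball
grad = c                                     # gradient of a linear f is constant

# First-order optimality: grad f(x*)^T (y - x*) >= 0 for every feasible y.
rng = np.random.default_rng(0)
ys = rng.standard_normal((1000, 3))
ys /= np.maximum(1.0, np.linalg.norm(ys, axis=1, keepdims=True))   # map samples into C
print(((ys - x_star) @ grad).min())          # >= 0, approaching 0 only near y = x*
```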

Example 1: equality-constrained minimization

Consider the optimization problem

$$\min_x f(x) \quad \text{subject to} \quad Ax = b$$

The point $x$ is optimal if and only if it is feasible and there exists $\mu\in \mathbb{R}^m$ such that

$$\nabla f(x) + A^T \mu = 0$$

Proof: according to first-order optimality, a solution $x$ satisfies $Ax=b$ and

$$\nabla f(x)^T(y-x) \geq 0 \quad \text{for all } y \text{ with } Ay=b.$$

This is equivalent to

$$\nabla f(x)^T v = 0 \quad \text{for all } v \in \operatorname{null}(A),$$

i.e. $\nabla f(x) \perp \operatorname{null}(A)$. The result follows since $\operatorname{null}(A)^{\perp} = \operatorname{row}(A)$.

The condition means that $\nabla f(x)$ is a linear combination of the columns of $A^T$ (the rows of $A$), i.e. it lies in the row space of $A$; vectors in the row space are orthogonal to vectors in the null space. This is the same result obtained with the method of Lagrange multipliers.
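A numerical sketch of this condition on made-up data: for $f(x)=\|x-c\|_2^2$ with constraint $Ax=b$ the solution has a closed form, and we can verify that $\nabla f(x^\star)$ is orthogonal to $\operatorname{null}(A)$, i.e. it lies in $\operatorname{row}(A)$.

```python
import numpy as np
from scipy.linalg import null_space

rng = np.random.default_rng(0)
A = rng.standard_normal((2, 5))              # equality constraint A x = b
b = rng.standard_normal(2)
c = rng.standard_normal(5)                   # objective f(x) = ||x - c||_2^2

# Closed-form solution: projection of c onto the affine set {x : A x = b}.
mu = np.linalg.solve(A @ A.T, A @ c - b)
x_star = c - A.T @ mu

grad = 2.0 * (x_star - c)                    # gradient of f at x*: equals -2 A^T mu
N = null_space(A)                            # columns form a basis of null(A)

print(np.allclose(A @ x_star, b))            # feasibility
print(np.allclose(N.T @ grad, 0.0))          # grad f(x*) is orthogonal to null(A), i.e. in row(A)
```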

Example 2: projection onto a convex set

Consider projecting a point $a$ onto a convex set $C$:

$$\min_x \|a - x\|_2^2 \quad \text{subject to} \quad x \in C$$

According to the first-order optimality condition, the optimal point $x$ satisfies

$$2(x - a)^T (y - x) \geq 0 \quad \text{for all } y \in C,$$

that is, $(a - x)^T(y - x) \leq 0$ for all $y \in C$. Equivalently, this says that $a-x\in \mathcal{N}_C(x)$, the normal cone to $C$ at $x$:

$$\mathcal{N}_C(x) = \{ g : g^T(y - x) \leq 0 \ \text{for all } y \in C \}$$

From the definition: if the point is in the interior of $C$, the normal cone is just $\{0\}$. If the point is on the boundary, the normal cone describes the directions leading away from the convex set at that point: if the boundary is smooth there, there is only one such direction (up to scaling); if it is not smooth, e.g. at a corner of a polygon, the directions form a cone.

normal_cone_for_convex_set.svg
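A numerical sketch (made-up example): project a point $a$ onto the box $C=[0,1]^2$ and verify the normal-cone characterization $(a-x)^T(y-x)\leq 0$ for sampled $y\in C$.

```python
import numpy as np

a = np.array([2.0, -0.5])                    # point to be projected
x = np.clip(a, 0.0, 1.0)                     # projection of a onto the box C = [0, 1]^2

# Normal cone condition at the projection: (a - x)^T (y - x) <= 0 for all y in C.
rng = np.random.default_rng(1)
ys = rng.uniform(0.0, 1.0, size=(1000, 2))   # random feasible points y in C
print(((ys - x) @ (a - x)).max())            # <= 0
```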

Equivalent transformations

Partial optimization

Reminder: $g(x)=\min _{y \in C} f(x, y)$ is convex in $x$, provided that $f$ is convex in $(x, y)$ and $C$ is a convex set

Therefore we can always partially optimize a convex problem and retain convexity

E.g., if we decompose $x=\left(x_1, x_2\right) \in \mathbb{R}^{n_1+n_2}$, then

$$\min_{x_1, x_2} f(x_1, x_2) \ \ \text{subject to}\ \ g_1(x_1) \leq 0,\ g_2(x_2) \leq 0 \qquad \Longleftrightarrow \qquad \min_{x_1} \tilde{f}(x_1) \ \ \text{subject to}\ \ g_1(x_1) \leq 0$$

where $\tilde{f}\left(x_1\right)=\min \left\{f\left(x_1, x_2\right): g_2\left(x_2\right) \leq 0\right\}$. The right-hand problem is convex if the left-hand problem is.

Example: Recall the SVM problem

$$\min_{\beta, \beta_0, \xi}\ \frac{1}{2}\|\beta\|_2^2 + C \sum_{i=1}^n \xi_i \quad \text{subject to} \quad \xi_i \geq 0,\ \ y_i\left(x_i^T \beta+\beta_0\right) \geq 1-\xi_i,\ \ i=1,\cdots n$$

Rewrite the constraints as $\xi_i \geq \max \left\{0,\ 1-y_i\left(x_i^T \beta+\beta_0\right)\right\}$. Indeed we can argue that we have $=$ at the solution (since we are minimizing over $\xi$, the constraint $\xi_i \geq \max\{\cdots\}$ is tight, i.e. holds with equality).

Therefore plugging in the optimal $\xi$ gives the hinge form of the SVM:

$$\min_{\beta, \beta_0}\ \frac{1}{2}\|\beta\|_2^2 + C \sum_{i=1}^n \left[1-y_i\left(x_i^T \beta+\beta_0\right)\right]_{+}$$

where $a_{+}=\max \{0, a\}$ is called the hinge function.
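A cvxpy sketch comparing the two forms on tiny synthetic data (the data and the cost parameter $C$ are hypothetical, and the standard soft-margin objective written above is assumed): both problems return the same optimal value.

```python
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(0)
n, p, C = 40, 2, 1.0                                   # hypothetical data sizes and cost C
X = rng.standard_normal((n, p))
y = np.sign(X[:, 0] + 0.3 * rng.standard_normal(n))    # labels in {-1, +1}

beta, beta0, xi = cp.Variable(p), cp.Variable(), cp.Variable(n)
margins = cp.multiply(y, X @ beta + beta0)

# Original form with slack variables xi.
p1 = cp.Problem(cp.Minimize(0.5 * cp.sum_squares(beta) + C * cp.sum(xi)),
                [xi >= 0, xi >= 1 - margins])
# Hinge form obtained by partially optimizing over xi.
p2 = cp.Problem(cp.Minimize(0.5 * cp.sum_squares(beta) + C * cp.sum(cp.pos(1 - margins))))

print(p1.solve(), p2.solve())                          # the two optimal values coincide
```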

Transformations and change of variables

If $h:\mathbb{R}\rightarrow \mathbb{R}$ is a monotone increasing transformation, then

$$\min_x f(x) \ \text{subject to}\ x \in C \quad \Longleftrightarrow \quad \min_x h(f(x)) \ \text{subject to}\ x \in C$$

For example, minimizing $\|Ax-b\|_2$ over $C$ is equivalent to minimizing $\|Ax-b\|_2^2$ over $C$, since $t \mapsto t^2$ is increasing on the nonnegative values taken by the norm.

If $\phi:\mathbb{R}^n \rightarrow \mathbb{R}^m$ is one-to-one and its image covers the feasible set $C$, then the change of variables $x = \phi(y)$ gives the equivalent problem

$$\min_y f(\phi(y)) \ \text{subject to}\ \phi(y) \in C$$

Similarly, inequality or equality constraints can be transformed to yield equivalent optimization problems. This can be used to reveal the hidden convexity of a problem.

Eliminating equality constraints

We can express any feasible point as $x = My+x_0$, where $Ax_0=b$ and $\operatorname{col}(M) = \operatorname{null}(A)$ (i.e. a feasible point is a particular solution plus an element of the null space). Hence the above problem is equivalent to

$$\min_y f(My + x_0) \quad \text{subject to} \quad g_i(My + x_0) \leq 0,\ i=1,\cdots m$$

Note: fully general but not always a good idea
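A numerical sketch of this reparametrization (made-up data): build a particular solution $x_0$ and a null-space basis $M$ with scipy, and check that $x=My+x_0$ satisfies $Ax=b$ for an arbitrary $y$.

```python
import numpy as np
from scipy.linalg import null_space

rng = np.random.default_rng(0)
A = rng.standard_normal((2, 5))              # equality constraint A x = b
b = rng.standard_normal(2)

x0 = np.linalg.lstsq(A, b, rcond=None)[0]    # a particular solution: A x0 = b
M = null_space(A)                            # columns of M span null(A)

y = rng.standard_normal(M.shape[1])          # any y gives a feasible point
x = M @ y + x0
print(np.allclose(A @ x, b))                 # True: the equality constraint is eliminated
```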

Introducing slack variables

Transform the above inequality constraints by introducing slack variables $s_i$:

$$\min_{x, s}\ f(x) \quad \text{subject to} \quad s_i \geq 0,\ \ g_i(x)+s_i=0,\ i=1,\cdots m,\ \ Ax=b$$

Note: this is no longer a convex problem unless each $g_i(x) = c_i^T x + d_i$ is an affine function (otherwise $g_i(x)+s_i=0$ is a nonaffine equality constraint).

Relaxing nonaffine equality constraints

Given an optimization problem

$$\min_x f(x) \quad \text{subject to} \quad x \in C,$$

we can always take an enlarged constraint set $\tilde{C} \supseteq C$ and consider

$$\min_x f(x) \quad \text{subject to} \quad x \in \tilde{C}.$$

This is called a relaxation, and its optimal value is always smaller than or equal to that of the original problem (with a larger feasible set, the optimal value can only decrease; for example, an equality constraint can be relaxed to an inequality).

Important special case: relaxing nonaffine equality constraints, i.e. constraints

$$h_j(x) = 0, \quad j=1,\cdots r,$$

where $h_j,\ j=1,\cdots r$ are convex but nonaffine, are replaced with

$$h_j(x) \leq 0, \quad j=1,\cdots r.$$
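For instance (a standard illustration, assumed here rather than taken from the original notes): the constraint set $\{x : \|x\|_2^2 = 1\}$ (the unit sphere) is not convex even though $h(x)=\|x\|_2^2-1$ is convex; replacing the equality by an inequality gives the convex relaxation

$$\{x : \|x\|_2^2 = 1\} \ \subseteq\ \{x : \|x\|_2^2 \leq 1\},$$

i.e. the unit ball. If the solution of the relaxed problem happens to satisfy the original equality constraint, it also solves the original problem.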