YeeKal
optimization

duality


intro

Three main dual formulations and implementations: Fenchel, Wolfe, and Lagrangian.

  • primal problem
  • dual problem

With Lagrange multipliers, penalty terms fold the constraint functions into the objective function, so the original problem is transformed into another problem.

  • inf: infimum, the greatest lower bound
  • sup: supremum, the least upper bound

inf & sup

$\inf,\sup,\min,\max$

  • inf (infimum): The infimum or greatest lower bound of a set is the largest number that is less than or equal to every element in the set. It may not necessarily belong to the set itself.
  • sup (supremum): The supremum or least upper bound of a set is the smallest number that is greater than or equal to every element in the set. Like the infimum, it may not necessarily belong to the set.
  • min (minimum): The minimum of a set is the smallest element within the set. Unlike the infimum, it must be a member of the set.
  • max (maximum): The maximum of a set is the largest element within the set. Like the minimum, it must be a member of the set.

Note: the inf or sup of a set need not belong to the set itself.
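A small illustration (mine, not from the notes): the set $\{1/n : n \geq 1\}$ has infimum $0$, which it never attains, so it has no minimum, while its supremum $1$ is attained and is therefore also its maximum:

```python
# Illustration: the set {1/n : n = 1, 2, ...} has infimum 0, but 0 is not
# a member, so the set has no minimum. Its supremum 1 IS a member, so
# sup = max = 1.
def partial_min(n_terms):
    """Smallest element among the first n_terms of {1/n}."""
    return min(1.0 / n for n in range(1, n_terms + 1))

print(partial_min(10))      # 0.1
print(partial_min(10**6))   # 1e-06 -> approaches the infimum 0, never reaches it
print(max(1.0 / n for n in range(1, 101)))  # 1.0 = sup, attained
```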

operations on sets

Sum of sets: The Minkowski sum of two sets $A$ and $B$ of real numbers is the set $A + B := \{a + b : a \in A,\ b \in B\}$, consisting of all possible arithmetic sums of pairs of numbers, one from each set. The infimum and supremum of the Minkowski sum satisfy

$$\inf(A + B) = \inf A + \inf B \quad \text{and} \quad \sup(A + B) = \sup A + \sup B$$

Product of sets: The multiplication of two sets $A$ and $B$ of real numbers is defined similarly to their Minkowski sum: $A \cdot B := \{a \cdot b : a \in A,\ b \in B\}$. If $A$ and $B$ are nonempty sets of positive real numbers then $\inf(A \cdot B) = (\inf A) \cdot (\inf B)$, and similarly for suprema, $\sup(A \cdot B) = (\sup A) \cdot (\sup B)$.

Scalar product of a set: The product of a real number $r$ and a set $B$ of real numbers is the set $rB := \{rb : b \in B\}$. If $r \geq 0$ then $\inf(rB) = r \inf B$ and $\sup(rB) = r \sup B$, while if $r \leq 0$ then $\inf(rB) = r \sup B$ and $\sup(rB) = r \inf B$. Using $r = -1$ and the notation $-A := (-1)A = \{-a : a \in A\}$, it follows that $\inf(-A) = -\sup A$ and $\sup(-A) = -\inf A$.

Multiplicative inverse of a set: For any set $S$ that does not contain $0$, let $\frac{1}{S} := \{\frac{1}{s} : s \in S\}$.

If $S \subseteq (0, \infty)$ is non-empty then

$$\frac{1}{\sup S} = \inf \frac{1}{S}$$

where this equation also holds when $\sup S = \infty$ if the definition $\frac{1}{\infty} := 0$ is used. This equality may alternatively be written as $\frac{1}{\sup_{s \in S} s} = \inf_{s \in S} \frac{1}{s}$. Moreover, $\inf S = 0$ if and only if $\sup \frac{1}{S} = \infty$, and if $\inf S > 0$, then $\frac{1}{\inf S} = \sup \frac{1}{S}$.
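These identities can be verified numerically on finite sets, where $\inf = \min$ and $\sup = \max$ (my own illustration, not from the notes):

```python
# Numeric check (finite sets stand in for the general statement) of the
# identities above: inf/sup of Minkowski sums, products, scalar multiples,
# and the reciprocal set.
A = {0.5, 2.0, 3.0}
B = {1.0, 4.0}

sum_AB  = {a + b for a in A for b in B}
prod_AB = {a * b for a in A for b in B}   # A, B positive
negA    = {-a for a in A}
invA    = {1.0 / a for a in A}            # 0 not in A

assert min(sum_AB) == min(A) + min(B) and max(sum_AB) == max(A) + max(B)
assert min(prod_AB) == min(A) * min(B) and max(prod_AB) == max(A) * max(B)
assert min(negA) == -max(A) and max(negA) == -min(A)
assert 1.0 / max(A) == min(invA) and 1.0 / min(A) == max(invA)
print("all identities hold on these finite sets")
```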

Duality

For subsets of the real numbers: $\inf S = -\sup(-S)$ and $\sup S = -\inf(-S)$.

For any functionals $f$ and $g$: $\inf_x (f(x) + g(x)) \geq \inf_x f(x) + \inf_x g(x)$ and $\sup_x (f(x) + g(x)) \leq \sup_x f(x) + \sup_x g(x)$.
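These inequalities for functions can be checked on a sampled domain (my own illustration): the point is that $f$ and $g$ need not attain their extrema at the same point.

```python
# Check on a grid: inf(f+g) >= inf f + inf g and sup(f+g) <= sup f + sup g,
# because f and g need not attain their extrema at the same point.
xs = [i / 100.0 for i in range(-300, 301)]   # grid on [-3, 3]

def f(x): return (x - 1.0) ** 2   # minimized at x = 1
def g(x): return (x + 1.0) ** 2   # minimized at x = -1

inf_sum = min(f(x) + g(x) for x in xs)
sup_sum = max(f(x) + g(x) for x in xs)
assert inf_sum >= min(f(x) for x in xs) + min(g(x) for x in xs)
assert sup_sum <= max(f(x) for x in xs) + max(g(x) for x in xs)
print(inf_sum)   # 2.0: attained at x = 0, strictly above inf f + inf g = 0
```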

Complementary slackness

  1. Form the Lagrangian with multipliers.
  2. Use the zero-gradient condition (or direct minimization) to eliminate $x$, i.e. compute $g(\lambda, \nu) = \inf_{x \in \mathcal{D}} L(x, \lambda, \nu)$.
  3. Form the dual problem in the multipliers, i.e. solve $\sup_{\lambda, \nu} g(\lambda, \nu)$.
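As a worked sketch of these three steps (my own toy example, not from the notes): minimize $x^2$ subject to $x \geq 1$. The Lagrangian is $L(x, \lambda) = x^2 + \lambda(1 - x)$; setting $\partial L / \partial x = 0$ gives $x = \lambda/2$ and $g(\lambda) = \lambda - \lambda^2/4$, which is maximized at $\lambda = 2$ with $g(2) = 1 = p^*$. A grid search confirms this:

```python
# Toy example: dual of  min x^2  s.t.  x >= 1.
# Lagrangian: L(x, lam) = x^2 + lam*(1 - x), lam >= 0.
# Step 2: dL/dx = 2x - lam = 0  ->  x = lam/2,  g(lam) = lam - lam^2/4.
# Step 3: maximize g over lam >= 0: g'(lam) = 1 - lam/2 = 0 -> lam = 2, g = 1.
def g(lam):
    x = lam / 2.0                 # minimizer of L over x
    return x**2 + lam * (1 - x)   # = lam - lam**2/4

lams = [i / 1000.0 for i in range(10001)]   # grid on [0, 10]
best = max(lams, key=g)
print(best, g(best))   # -> 2.0 1.0: d* = 1 = p*, so strong duality holds here
```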

hyperplane

$\inf_{x \in \mathcal{D}} L(x, \lambda, \nu)$

Two approaches: 1. take the gradient; 2. use the Fenchel conjugate.

On taking the gradient: this is justified via the subdifferential.

the optimality condition for minimizing over $y$ can be written as:

$$0 \in \partial g(y)$$

where $\partial g(y)$ is the subdifferential of $g$ at $y$, defined as the set of supporting vectors:

$$\partial g(y) = \{z : g(y') \geq g(y) + z^T (y' - y) \ \ \forall y'\}$$

Then this can be proved: $\partial \|y\| = \{z : z^T y = \|y\|,\ \|z\|_* \leq 1\}$. (proofs pdf)
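For the Euclidean norm, which is its own dual norm, this characterization can be sanity-checked numerically (my own illustration; `norm` is a helper defined here, not from the notes). For $y \neq 0$ the subdifferential of $\|\cdot\|_2$ is the single point $z = y / \|y\|$:

```python
# Sanity check of  ∂‖y‖ = {z : zᵀy = ‖y‖, ‖z‖* ≤ 1}  for the Euclidean
# norm, which is its own dual. For y != 0 the subdifferential is z = y/‖y‖.
import math

def norm(v):
    return math.sqrt(sum(x * x for x in v))

y = [3.0, 4.0]
z = [x / norm(y) for x in y]           # candidate subgradient

inner = sum(a * b for a, b in zip(z, y))
assert abs(inner - norm(y)) < 1e-12    # zᵀy = ‖y‖
assert norm(z) <= 1 + 1e-12            # ‖z‖* = ‖z‖₂ ≤ 1
# z is also a global under-estimator slope: ‖y'‖ ≥ ‖y‖ + zᵀ(y' - y)
yp = [-1.0, 2.0]
assert norm(yp) >= norm(y) + sum(a * (b - c) for a, b, c in zip(z, yp, y))
print("z =", z)   # z = [0.6, 0.8]
```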

Fenchel conjugate

convex conjugate

The Fenchel conjugate is also known as the Fenchel–Legendre transform.

The conjugate of a function $f: \mathbb{R}^n \rightarrow \mathbb{R}$ is:

$$f^*(y) = \sup_{x \in \operatorname{dom} f} \left( y^T x - f(x) \right)$$

the dual norm $\|\cdot\|_*$ of a norm $\|\cdot\|$ is defined as:

$$\|z\|_* = \sup \{ z^T x : \|x\| \leq 1 \}$$

Fenchel's inequality:

$$f(x) + f^*(y) \geq x^T y$$

  1. Absolute value $f(x) = |x|$: its conjugate is $f^*(y) = \begin{cases} 0 & |y| \leq 1 \\ \infty & \text{otherwise} \end{cases}$

as: $\inf_y \{\|y\| - \mu^T y\} = -\sup_y (\mu^T y - \|y\|) = -f^*(\mu)$, where $f(y) = \|y\|$
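The conjugate of the absolute value can be checked numerically; the following sketch (my own illustration, grid-based and therefore only approximate) evaluates $f^*(y) = \sup_x (yx - |x|)$ on a bounded grid:

```python
# Numeric check: the conjugate of f(x) = |x| is the indicator of [-1, 1]:
# f*(y) = sup_x (y*x - |x|) = 0 if |y| <= 1, else +inf.
def conjugate_abs(y, grid_half_width=100.0, steps=20001):
    xs = (i * 2 * grid_half_width / (steps - 1) - grid_half_width
          for i in range(steps))
    return max(y * x - abs(x) for x in xs)

print(conjugate_abs(0.5))   # 0.0  (sup attained at x = 0)
print(conjugate_abs(2.0))   # 100.0 -> grows with the grid width, i.e. +inf
```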

Primal problem: $\max_{\alpha \geq 0, \beta} L(w, \alpha, \beta) = \begin{cases} J(w) & \text{if } w \text{ is feasible} \\ +\infty & \text{otherwise} \end{cases}$

Primal optimal value: $J^* = \min_w \max_{\alpha \geq 0, \beta} L(w, \alpha, \beta)$

Dual problem: $D(\alpha, \beta) = \min_w L(w, \alpha, \beta)$

Relation between primal and dual: $D(\alpha, \beta) \leq J(w)$

Lagrangian

Consider the primal problem $\min_x f(x)$ subject to $f_i(x) \leq 0$, $i = 1, \dots, m$, and $h_j(x) = 0$, $j = 1, \dots, p$. The Lagrangian of this problem is:

$$L(x, \lambda, \nu) = f(x) + \sum_{i=1}^{m} \lambda_i f_i(x) + \sum_{j=1}^{p} \nu_j h_j(x)$$

The Lagrangian dual is:

$$g(\lambda, \nu) = \inf_{x \in \mathcal{D}} L(x, \lambda, \nu)$$

$g$ is an infimum, and since $L \leq f$ on the feasible set (for $\lambda \geq 0$), we want the gap between $L$ and $f$ to be as small as possible; hence we look for the optimal $\lambda, \nu$ that make $g$ as large as possible.

The Lagrangian dual problem, in turn, is:

$$\max_{\lambda \geq 0, \nu} g(\lambda, \nu)$$

Note that, by the max–min inequality:

$$\sup_{\lambda \geq 0, \nu} \inf_{x} L(x, \lambda, \nu) \leq \inf_{x} \sup_{\lambda \geq 0, \nu} L(x, \lambda, \nu)$$

This means that

$$p^* = \inf_{x} \sup_{\lambda \geq 0, \nu} L(x, \lambda, \nu)$$

is the optimal value of the primal problem. Moreover, by definition,

$$d^* = \sup_{\lambda \geq 0, \nu} \inf_{x} L(x, \lambda, \nu)$$

is the optimal value of the dual problem, so $d^* \leq p^*$ (weak duality).

If strong duality holds, then $d^* = p^*$.
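Weak duality $d^* \leq p^*$ is an instance of the max–min inequality, which in finite form can be checked on any matrix (my own illustration):

```python
# Weak duality in finite form: for any matrix A,
#   max_j min_i A[i][j]  <=  min_i max_j A[i][j]   (max-min inequality).
import random

random.seed(0)
A = [[random.uniform(-1, 1) for _ in range(5)] for _ in range(4)]

maxmin = max(min(A[i][j] for i in range(4)) for j in range(5))
minmax = min(max(A[i][j] for j in range(5)) for i in range(4))
assert maxmin <= minmax
print(maxmin, minmax)
```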

(An alternative explanation of) the dual problem:

For each $\boldsymbol{y}$, the Lagrangian $L(\boldsymbol{y}, \boldsymbol{x}, \boldsymbol{s})$ is linear in $(\boldsymbol{x}, \boldsymbol{s})$ and hence also concave in them. Hence the dual function $g(\boldsymbol{x}, \boldsymbol{s}) = \min_{\boldsymbol{y}} L(\boldsymbol{y}, \boldsymbol{x}, \boldsymbol{s})$ is concave, because it is the pointwise minimum (over $\boldsymbol{y}$) of a collection of concave functions of $(\boldsymbol{x}, \boldsymbol{s})$.

This also means that the dual problem is really a convex optimization problem in disguise: flipping the sign gives the convex function $-g(\boldsymbol{x}, \boldsymbol{s})$, and minimizing it is equivalent to maximizing $g(\boldsymbol{x}, \boldsymbol{s})$.
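That a pointwise minimum of linear functions is concave can be checked numerically; a small sketch (my own illustration, testing midpoint concavity only):

```python
# A pointwise minimum of linear functions is concave:
# g(t) = min_k (a_k * t + b_k) satisfies midpoint concavity
#   g((t1 + t2) / 2)  >=  (g(t1) + g(t2)) / 2.
lines = [(1.0, 0.0), (-2.0, 3.0), (0.5, 1.0)]   # (slope, intercept) pairs

def g(t):
    return min(a * t + b for a, b in lines)

for t1, t2 in [(-3.0, 4.0), (0.0, 2.0), (-1.5, -0.5)]:
    assert g((t1 + t2) / 2) >= (g(t1) + g(t2)) / 2
print("midpoint concavity holds at the sampled points")
```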

First compute a family of "minima" (over the primal variable), then pick the "maximum" among them.

Slater's condition: for a convex problem it is a sufficient, but not necessary, condition for strong duality.

(KKT stationarity: at an optimum, the gradient of the objective is a linear combination of the gradients of the constraints.)

Ref