Convexity

We conclude this section by giving a geometrical condition which enables us to guarantee solutions are available using the above techniques. One way to guarantee good behaviour is to work with convex functions. A convex set is geometrically very simple. A set C is convex if x1 $ \in$ C, x2 $ \in$ C means that $ \lambda$x1 + (1 - $ \lambda$)x2 $ \in$ C whenever 0$ \le$$ \lambda$$ \le$1. In other words, if any line joining two points in c lies entirely in C.

You may already have met a convex function -- one in which the points above the graph of the function (the ``supergraph'') form a convex set. Equivalently, a function is convex if the line joining two points on the graph of f always lies above the graph of f.

An obvious way to generate a convex set is as the intersection of a collection of half planes; hence the set of feasible solutions of a linear programming problem forms a (possibly empty) convex set.

There are other very natural ways to generate convex sets. Let x1, x2, ..., xk be elements of Rn. It is essentially trivial to show that the set of all convex combinations of x1, x2, ..., xk is a convex set. Here is the argument: let u = $\l $1x1 + $\l $2x2 +...+ $\l $kxk and v = $ \mu_{1}^{}$x1 + $ \mu_{2}^{}$x2 +...+ $ \mu_{k}^{}$xk be two convex combinations of x1, x2, ..., xk. Write down $\l $u + (1 - $\l $)v and show that, when 0 $ \leq$ $\l $ $ \leq$ 1, this is also a convex combination of x1, x2, ...,  xk.

It is of interest that the set of optimal solutions of a linear programming problem is a convex set. The next two examples discuss this:

Example 7..17   Show that if x1, x2, ..., xk are optimal solutions of a linear programming problem, then any convex combination of x1, x2, ..., xk is also an optimal solution.

Solution Let $\l $1x1 + $\l $2x2 +...+ $\l $kxk be any convex combination of x1, x2, ...,  xk. Let the objective function be c1x1 + c2x2 +...+ cnxn. Since x1, x2, ...,  xk all have the same objective value, z say, then, for all i,

c1x1i + c2x2i +...+ cnxni = z,

where xi = (x1i, x2i,..., xni). So (why ?),

c1($\l $1x11 +...+ $\l $kx1k) +...+ cn($\l $1xn1 +...+ $\l $kxnk) = z,

which shows (why ?) that $\l $1x1 + $\l $2x2 +...+ $\l $kxk also has objective value z. Hence the result.

This solution can be written more simply by using matrix notation, letting the objective function be cTx.


Example 7..18   Prove (by induction on k) that, if x1, x2, ..., xk belong to a convex set C, then any convex combination of x1, x2, ..., xk belongs to C.

Solution Let v = $\l $1x1 + $\l $2x2 +...+ $\l $kxk be any convex combination of x1, x2, ...,  xk. Write v = (1 - $\l $k)u + $\l $kxk, where

u = $\displaystyle {\frac{{\l _1}}{{1-\l _k}}}$x1 + $\displaystyle {\frac{{\l _2}}{{ 1-\l _k}}}$x2 +...+ $\displaystyle {\frac{{\l _{k-1}}}{{ 1-\l _k}}}$xk-1.

By the induction hypothesis (why ?), u belongs to C and hence v belongs to C.


We now return to our main thread; that convexity gives us a handle on uniqueness. It is geometrically clear that a (strictly) convex function cannot have two different local minima, while if the same value occurs as a local minimum at different places, the graph must be flat between them, so neither minimum is isolated (and f can't be strictly convex).

Say that f is concave if - f is convex. To help remember the difference, note that x2 is convex, while - x2 is concave.

Taylors theorem in its general form allows us to analyse f. At the point x, we have

\begin{displaymath}
f(\mathbf{x}+ \mathbf{h}) = f(\mathbf{x}) +\mathbf{h}.\ensur...
...\mathbf{h}^TG(\mathbf{x})\mathbf{h}+
\text{higher order terms}
\end{displaymath}

exactly as in the two variable case. We have written

G = $\displaystyle \begin{pmatrix}
\displaystyle \frac{\partial^{2}f}{\partial x_1^...
... }&\ldots&
\displaystyle \frac{\partial^{2}f}{\partial x_n^{2}}
\end{pmatrix}$

With our usual assumptions, G(x) is symmetric and so it can be diagonalised; let $\l $1, $\l $2,...,$\l $n be the eigenvalues of G(x) and write D(x) = D = diag($\l $1,$\l $2,...,$\l $n). Then there is an orthogonal P (ie PT = P-1 such that G(x) = PTDP, and we have

\begin{displaymath}
f(\mathbf{x}+ \mathbf{h}) = f(\mathbf{x}) +\mathbf{h}.\ensur...
...{h}^TP^TD(\mathbf{x})(P\mathbf{h}) +
\text{higher order terms}
\end{displaymath}

At a critical point, $\ensuremath{\nabla}{} f(\mathbf{x}) = 0$ and so

f (x + h) = f (x) + $\displaystyle {\frac{{1}}{{2}}}$$\displaystyle \left(\vphantom{\l _1u_1^2 + \l _2u_2^2 + \dots + \l _nu_n^2 }\right.$$\l $1u12 + $\l $2u22 + ... + $\l $nun2$\displaystyle \left.\vphantom{\l _1u_1^2 + \l _2u_2^2 + \dots + \l _nu_n^2 }\right)$ + higher order terms

where Ph = u. Thus f is convex iff $\l $i$ \ge$ 0 for all i; we can read the behaviour of f from the set of eigenvalues of the Hessian. This leads to:

Theorem 7..19 (Second Derivative Theorem)   Assume that f is twice differentiable on the convex set D. Then f is convex on D if its Hessian G is positive semidefinite, so xTGx$ \ge$ 0 for each x.

Note that if f is linear then it is both convex and concave. The function f (x) = x12 + ... + xn2 is strictly convex, since

G = $\displaystyle \begin{pmatrix}
2&0&\dots&0\\
0&2&\dots&0\\
\hdotsfor[1.5]{4}\\
0&0&\dots&2
\end{pmatrix}$

Theorem 7..20 (Second Kuhn-Tucker)   Let f : D$ \to$$ \mathbb {R}$n be a convex function and suppose that each constraint cj is concave on D. Assume that each of the functions involved is twice continuously differentiable. Then every solution of the Kuhn-Tucker conditions is a global minimum of the corresponding NLCO problem.

Ian Craw 2002-09-11