Catastrophe theory is the classification of isolated degenerate critical points. The theory is important for a wide range of fields in mathematics and science. However, it can be difficult to get started. While the theorems can be stated in elementary terms, and there are many expositions of the applications, the proofs require a solid foundation in mathematics. In these notes, I will state the main theorems and give a flavour of the underlying mathematics. For a complete, accessible, and rigorous treatment of catastrophe theory, I recommend Catastrophe Theory by Domenico Castrigiano and Sandra Hayes, to which I will refer throughout these notes.
A smooth function \(f: U \to \mathbb{R}\), with \(U\) an open subset of \(\mathbb{R}^n\) including the origin \(\boldsymbol{0}\), has a critical point at the origin when its gradient vanishes at this point, i.e.,
\[D f(\boldsymbol{0}) = (\partial_1 f(\boldsymbol{0}), \dots, \partial_n f(\boldsymbol{0})) =\boldsymbol{0},\]
with the partial derivative \(\partial_i f = \frac{\partial f}{\partial x_i}.\) The character of the critical point is often determined by the second-order derivative in the form of the Hessian matrix
\[D^2 f(\boldsymbol{0}) = \begin{pmatrix} \partial_{1}^2 f(\boldsymbol{0}) & \partial_1 \partial_2 f(\boldsymbol{0}) & \dots & \partial_1 \partial_n f(\boldsymbol{0})\\ \partial_2 \partial_1 f(\boldsymbol{0}) & \partial_{2}^2 f(\boldsymbol{0}) & \dots & \partial_2 \partial_n f(\boldsymbol{0})\\ \vdots & \vdots & \ddots & \vdots\\ \partial_n \partial_1 f(\boldsymbol{0}) & \partial_n \partial_2 f(\boldsymbol{0}) & \dots & \partial_{n}^2 f(\boldsymbol{0}) \end{pmatrix}. \]
The rank – defined as the number of linearly independent rows or columns – and the index – defined as the number of negative eigenvalues – of the Hessian matrix at a critical point are independent of the choice of coordinates. That is to say, a coordinate transformation \(g(\boldsymbol{x}) = f(\psi(\boldsymbol{x}))\) – with a diffeomorphism \(\psi:U \to V\) between open sets \(U\) and \(V\) in \(\mathbb{R}^n\) that fixes the origin, \(\psi(\boldsymbol{0})=\boldsymbol{0}\) – preserves the vanishing of the gradient, and the rank and index of the Hessian matrix at the critical point,
\[ \begin{align} D g(\boldsymbol{0}) &= D f(\boldsymbol{0}) D \psi(\boldsymbol{0}) = \boldsymbol{0},\\ D^2g(\boldsymbol{0}) &=(D \psi(\boldsymbol{0}))^T D^2f(\boldsymbol{0}) (D \psi(\boldsymbol{0})), \end{align} \]
where the first-order derivative (the Jacobian matrix) of the diffeomorphism \(\psi(\boldsymbol{x}) = (\psi_1(\boldsymbol{x}), \dots, \psi_n(\boldsymbol{x}))\) is defined as
\[D \psi(\boldsymbol{0}) = \begin{pmatrix} \partial_{1} \psi_1(\boldsymbol{0}) & \partial_2 \psi_1(\boldsymbol{0}) & \dots & \partial_n \psi_1(\boldsymbol{0})\\ \partial_1 \psi_2(\boldsymbol{0}) & \partial_{2} \psi_2(\boldsymbol{0}) & \dots & \partial_n \psi_2(\boldsymbol{0})\\ \vdots & \vdots & \ddots & \vdots\\ \partial_1 \psi_n(\boldsymbol{0}) & \partial_2 \psi_n(\boldsymbol{0}) & \dots & \partial_{n} \psi_n(\boldsymbol{0}) \end{pmatrix}. \]
The rank determines whether the critical point is degenerate or nondegenerate.
Definition: A critical point is nondegenerate when the Hessian matrix \(D^2 f(\boldsymbol{0})\) is invertible (the rank \(r = n\)). When the Hessian matrix is not invertible, the critical point is degenerate (the rank \(r < n\)).
For a nondegenerate critical point, the index distinguishes between the maximum, minimum and saddle points.
Definition: A nondegenerate critical point is a minimum when the index \(s=0\), a maximum when the index \(s=n\), and a saddle point of index \(s\) when \(0 < s < n\).
As we will show below, the index of a nondegenerate critical point fully classifies the critical point up to coordinate transformations. For degenerate critical points, both the rank and the index play an important role in the classification.
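As a brief illustration of these definitions (the two functions below are chosen here for illustration and do not appear elsewhere in these notes), compare \(f_1(x,y) = x^2 - y^2\) and \(f_2(x,y) = x^2 + y^3\), both of which have a critical point at the origin:
\[ \begin{align} D^2 f_1(\boldsymbol{0}) &= \begin{pmatrix} 2 & 0 \\ 0 & -2 \end{pmatrix}, & r &= 2,\ s = 1,\\ D^2 f_2(\boldsymbol{0}) &= \begin{pmatrix} 2 & 0 \\ 0 & 0 \end{pmatrix}, & r &= 1,\ s = 0. \end{align} \]
The first critical point is nondegenerate and, since \(0 < s < n = 2\), a saddle point of index \(1\). The second Hessian matrix is not invertible, so the origin is a degenerate critical point of \(f_2\), and the index alone does not determine its character.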
The classification of nondegenerate critical points is expressed by the Morse lemma, first proven by Marston Morse in 1934. The result was already anticipated by Arthur Cayley (1859) and James Clerk Maxwell (1870) in their studies of contour lines in topography.
Morse lemma: Let \(f\) be a smooth function of \(n\) variables with a critical point at the origin with index \(s\). The critical point is nondegenerate if and only if there exists a local diffeomorphism \(\psi\) which preserves the origin \(\psi(\boldsymbol{0})=\boldsymbol{0}\) for which
\[f(\psi(\boldsymbol{x})) =f(\boldsymbol{0}) - x_1^2 - \dots - x_s^2 + x_{s+1}^2 + \dots + x_n^2\]
in the vicinity of the origin.
Proof: The Morse lemma can be proven using the diagonalization of symmetric matrices and the implicit function theorem. I provide a detailed proof in the next section. One can also find a proof in chapter one of Catastrophe Theory by Domenico Castrigiano and Sandra Hayes.
In the one-dimensional case, the lemma states that any nondegenerate critical point at the origin is locally equivalent to either a maximum \(f =f(0) -x^2\) or a minimum \(f= f(0) + x^2\) (see figure 1).
More generally, the lemma states that for nondegenerate critical points, there exist coordinates for which the function locally coincides with its second Taylor polynomial at the origin. The index of the Hessian matrix completely classifies the nondegenerate critical points of a smooth real-valued function on \(\mathbb{R}^n\).
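To see the lemma at work in one dimension, consider \(f(x) = \cos x\), an example chosen here for illustration. It has a nondegenerate maximum at the origin with \(f(0)=1\) and \(f''(0) = -1\), so the lemma with \(s=1\) predicts coordinates in which \(f\) becomes \(1 - x^2\). In this case the coordinate change can be written down explicitly using the half-angle identity \(\cos\theta = 1 - 2\sin^2(\theta/2)\):
\[ \begin{align} \psi(x) &= 2 \arcsin\left(\frac{x}{\sqrt{2}}\right), \qquad \psi(0) = 0, \qquad \psi'(0) = \sqrt{2} \neq 0,\\ f(\psi(x)) &= 1 - 2 \sin^2\left(\frac{\psi(x)}{2}\right) = 1 - x^2. \end{align} \]
Since \(\psi'(0) \neq 0\), the map \(\psi\) is a local diffeomorphism near the origin. The factor \(\sqrt{2}\) absorbs the coefficient \(1/2\) of the quadratic term in the Taylor series of the cosine.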
A function with only nondegenerate critical points is known as a Morse function. Morse functions are dense in the space of smooth functions, i.e., a generic function has only nondegenerate critical points, and a degenerate critical point decomposes into nondegenerate ones under an arbitrarily small perturbation of the function. As such, Morse functions play a fundamental role in both mathematics and many applications. Given these observations, one might doubt the relevance of the classification of degenerate points. However, as it turns out, degenerate critical points do frequently occur in applications when considering a family of continuously varying functions. Catastrophe theory extends the Morse lemma to degenerate critical points. It describes the possible equivalence classes, determines how many terms in the Taylor series are required to distinguish a critical point, and shows how a critical point unfolds into nondegenerate critical points under a general perturbation.
Consider a continuous family of smooth functions \(f_{\boldsymbol{u}}:U \to \mathbb{R}\) with \(U\) an open subset of \(\mathbb{R}^n\) and with the external parameter \(\boldsymbol{u} = (u_1,u_2,\dots, u_d)\). René Thom’s main theorem of catastrophe theory states that when the number of external parameters does not exceed four (\(d \leq 4\)), each ‘stable’ degenerate critical point is equivalent to one of the \(\boldsymbol{7}\) elementary catastrophes
\[x^3, x^4, x^5, x^6, x^3 - x y^2, x^3 + y^3, \text{ and } x^2 y + y^4,\]
up to coordinate transformations. These elementary catastrophes are known as the fold, cusp, swallowtail, butterfly, elliptic umbilic, hyperbolic umbilic, and parabolic umbilic catastrophes, respectively. The second major theorem of catastrophe theory describes how these degenerate critical points decompose into nondegenerate ones upon small perturbations. The unfolding of a critical point is versal when it includes all the possible ways in which the degenerate critical point can decompose. The unfolding is universal when it is versal and uses the minimal number of parameters.
Before describing the proofs of these theorems, we colloquially apply the theory to several physical toy models to introduce the relevant concepts.
Catastrophe theory is manifest in many mechanical systems which reside in equilibrium, locally minimizing the potential energy. This is known as quasistatics. We will here describe a few applications for which the potential energy includes degenerate critical points, points in configuration space at which a small perturbation can lead to catastrophic changes in the behaviour. However, catastrophe theory can also be found in other systems, including more general problems in mechanics and optics. This is not surprising as classical physical systems extremize the action and optical systems extremize the travel time.
Consider an inhomogeneous cylinder on an inclined slope for which the centre of mass is located off centre. For convenience, let’s consider a unit cylinder with unit mass, for which the centre of mass lies at a radius \(0 < r <1\) from the centre of the cylinder, on an inclined hill with slope \(\alpha\). In this problem, the angle \(\Theta\) between the horizontal line and the line joining the centre of mass \(C\) and the centre of the cylinder is the dynamical variable. Note that the angle \(\Theta\) uniquely determines the position of the cylinder on the slope. The angle of the slope \(\alpha\) is the external parameter.
Depending on the slope \(\alpha\), we can distinguish four configurations. For a shallow slope, the system has two equilibrium configurations for which the centre of mass \(C\) is above the point of contact \(P\) (two left panels of figure 2). The configuration for which the centre of mass is below (above) the centre of the cylinder corresponds to a local minimum (maximum) of the potential energy. When the slope is increased above a critical value, the equilibrium configurations disappear (the right panel of figure 2). The cylinder will roll down the hill. At the critical slope \(\alpha_c\), the two equilibrium configurations merge (see the central right panel of figure 2). This configuration, known as a fold catastrophe, is special, as an infinitesimal increase in the angle \(\alpha\) leads to a catastrophic change in the dynamics of the system. This is the proverbial straw that broke the camel’s back.
To quantify the system, note that the potential energy of the system \(V\) is the gravitational potential of the centre of mass \(C\). When the cylinder is rolled uphill by an angle \(\Theta_0\), we observe that the centre of the cylinder is raised by a distance \(\Theta_0 \sin \alpha\) while the centre of mass moves down with respect to the centre of the cylinder by a distance \(r\sin \Theta_0\). The potential energy of the system is
\[V(\Theta, \alpha) = \Theta \sin \alpha - r \sin \Theta + \cos \alpha.\]
According to the equation of motion
\[\frac{\mathrm{d}^2\Theta}{\mathrm{d}t^2} = -\frac{\partial V}{\partial\Theta} = -\sin \alpha +r \cos \Theta\]
the cylinder is in an equilibrium configuration when
\[\cos \Theta = \frac{\sin \alpha}{r}.\]
For \(\alpha < \alpha_c\), with the critical angle \(\alpha_c=\arcsin r\), this equation has two real solutions (modulo \(2\pi\)), corresponding to a local maximum and a local minimum of \(V\). For \(\alpha > \alpha_c\), there is no equilibrium configuration. Expanding the potential energy around the critical angle \(\alpha = \alpha_c\) and the corresponding equilibrium \(\Theta = 0\),
\[V(\Theta, \alpha) = \frac{r}{6 }\Theta^3 + (\alpha - \alpha_c)\Theta \cos \alpha_c + \cos \alpha + \mathcal{O}(\Theta^5) + \mathcal{O}\left((\alpha-\alpha_c)^2\, \Theta\right),\]
we discover the degenerate critical point corresponding to the fold catastrophe. As it turns out, this is the universal unfolding of the fold catastrophe. The normal form of the universal unfolding of the fold catastrophe is
\[x^3+ ux,\]
plotted in figure 3.
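The critical points of this normal form can be worked out by hand, which makes the disappearance of the equilibria explicit. The following short computation uses only the normal form \(x^3 + ux\):
\[ \begin{align} \frac{\partial}{\partial x}\left(x^3 + u x\right) &= 3x^2 + u = 0 \quad \Longrightarrow \quad x_\pm = \pm\sqrt{-u/3} \quad \text{for } u \leq 0,\\ \frac{\partial^2}{\partial x^2}\left(x^3 + u x\right)\Big|_{x_\pm} &= 6 x_\pm = \pm 6 \sqrt{-u/3}. \end{align} \]
For \(u<0\) there is a minimum at \(x_+\) and a maximum at \(x_-\); at \(u=0\) they merge into the degenerate critical point \(x=0\); for \(u>0\) there are no real critical points. For the cylinder, after dropping the \(\Theta\)-independent term and the higher-order corrections, the substitution \(x = (r/6)^{1/3}\,\Theta\) and \(u = (6/r)^{1/3} (\alpha - \alpha_c) \cos\alpha_c\) brings the leading terms of the expanded potential into this form, so \(u > 0\) corresponds to slopes steeper than the critical one.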
The Zeeman catastrophe machine, first introduced for this purpose by Christopher Zeeman (1969), is a good example of a mechanical system illustrating the cusp catastrophe. Consider a disk with the centre \(C\) (see figure 4). We allow the disk to rotate freely around its centre. An elastic string connects a fixed point \(A\) to a point \(S\) on the perimeter of the disk and continues to a point \(P\) (see figure 4). The angle \(\Theta\) between the point \(A\), the centre of the disk \(C\), and the point \(S\) is the dynamical variable of the system. We observe the behaviour of the system as we change the position of the point \(P\), which is the external parameter of the system. We will assume the system to minimize the potential energy of the string.
Let’s gradually vary the position of the point \(P\). When the point \(P\) is in the far upper right, the point \(S\) will be above the centre \(C\). When we lower the point \(P\) to the line \(AC\) while crossing the first red curve, the angle \(\Theta\) will gradually change but the point \(S\) will remain above the centre \(C\). As we lower the point further, we will observe a dramatic change in the angle \(\Theta\) when we pass the second red curve. The point \(S\) will all of a sudden move to the lower half of the disk. This is a fold catastrophe, forming the fold line (red). In addition, we can observe that the fold line has four non-differentiable points. These points are examples of the cusp catastrophe.
The potential energy of the Zeeman machine is given by
\[ \begin{align} V(\Theta, u, v) =&\ \left(\left[ \frac{17}{4} - 2 \cos \Theta\right]^{1/2} -1\right)^2 \\ &\ + \left(\left[(u+a)^2 + v^2 + \frac{1}{4} + (u+a)\cos\Theta - v \sin \Theta\right]^{1/2} - 1\right)^2, \end{align} \]
with the distance \(a\) between the points \(A\) and \(C\). When we Taylor expand the potential energy around the point \((u,v)=(0,0)\) we will discover that the potential energy is quartic in \(\Theta\), corresponding to the cusp catastrophe.
The universal unfolding of the cusp catastrophe is of the form
\[x^4 - u x^2 + v x\]
with up to three critical points satisfying the cubic equation
\[4 x^3 - 2u x + v = 0.\]
Figure 5 shows the geometry of this unfolding. As with Zeeman’s machine, the cusp catastrophe exhibits hysteresis. When entering the region in configuration space corresponding to two stable angles \(\Theta\), we arrive on either the lower or the upper sheet depending on whether we approached the region from positive or negative \(v\).
This can in particular be observed in the shape of the potential energy for the different configurations (see figure 6). For negative \(u\) there is only a single minimum. For positive \(u\), we observe either a single minimum or a set of three critical points, two minima separated by a maximum. The two minima correspond to the two possible equilibrium configurations. When we cross a fold line (red), one of the minima merges with the maximum and disappears. If the system resided in this minimum before crossing the fold line, it will suddenly roll to the remaining minimum.
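The location of the fold lines follows from a short calculation with the unfolding \(V(x) = x^4 - u x^2 + v x\). A critical point is degenerate when the first and second derivatives vanish simultaneously:
\[ \begin{align} V'(x) &= 4x^3 - 2ux + v = 0, \qquad V''(x) = 12 x^2 - 2u = 0\\ &\Longrightarrow \quad u = 6x^2, \quad v = 2ux - 4x^3 = 8x^3 \quad \Longrightarrow \quad 8u^3 = 27 v^2. \end{align} \]
The fold lines thus form the semicubical parabola \(8u^3 = 27v^2\) in the \((u,v)\)-plane, with the cusp point at \((u,v) = (0,0)\). The cubic equation has three real roots, two minima separated by a maximum, when \(8u^3 > 27v^2\), which requires \(u > 0\), and a single real root, a minimum, when \(8 u^3 < 27 v^2\).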
Another physical system with catastrophes, in the form of caustics, can be found in optics. These caustics are regions in space where the intensity spikes, and as such they play a fundamental role in both lensing and reflection problems. To demonstrate the emergence of the fold and the cusp caustics in optics, consider the reflection of a beam of light by the right half of a circular mirror in the geometric approximation (figure 7). For the reflection by a complete circular mirror see figure 8. The caustic occurs as the envelope of the light paths. This caustic can be observed in the reflection of light in a coffee cup. For more details on the coffeecup caustic see Symmetries of the Coffeecup Caustic by Brendan Guilfoyle and Wilhelm Klingenberg. This caustic is incidentally also relevant to the formation of rainbows. Each raindrop reflects the sun’s rays at its inner surface to form a cone-like caustic. Different wavelengths lead to slightly different cones due to the refractive nature of water. This refractive effect leads to the separation of colours which we observe in the sky. There is of course a lot more to the formation of rainbows. For a detailed description see Catastrophe theory and its applications by Tim Poston and Ian Stewart.
Fermat’s principle of least time provides a natural explanation for the relevance of catastrophes and caustics in optics. A path from the source to a point in space is a classical ray if and only if it is a real critical point of the time it takes to travel from the source to that point. When the time function \(t\) has \(m\) critical points, the point receives light from \(m\) light rays. The caustics at which the intensity spikes directly correspond to the degenerate critical points of this time function. When the time function is cubic in the path variable at the degenerate critical point, as in the germ \(x^3\), it forms a fold caustic. The fold caustics form lines in configuration space. When the time function is quartic at the degenerate critical point, as in the germ \(x^4\), the point corresponds to a cusp caustic. The cusp is realized as a non-differentiable point on a fold line.
We here give a sketch of catastrophe theory. We will explain the concepts in several steps.
The Morse lemma shows that for any nondegenerate critical point of a function of \(n\) variables there exist local coordinates for which the function is a sum of quadratic terms, its normal form. Catastrophe theory mirrors this theorem for degenerate critical points. The reduction lemma (first proved in 1969 by D. Gromoll and W. Meyer) provides the first step towards this goal.
Reduction lemma: Around a degenerate critical point \(\boldsymbol{0}\) of a smooth function \(f\) of \(n\) variables with rank \(r\) and index \(s\), there exists a local diffeomorphism \(\psi\) preserving the origin, such that the function can be expressed as the sum
\[f(\psi(\boldsymbol{x})) = f(\boldsymbol{0}) + q_{sr}(x_1,\dots, x_r) +g(x_{r+1},\dots,x_{n}),\]
where \(q_{sr}\) is a nondegenerate normal form in \(r\) variables, vanishing and with a critical point at the origin with index \(s\), and \(g\) is a smooth function of \(n-r\) variables, vanishing and with a critical point at the origin that is totally degenerate (has vanishing rank). Note that using the Morse lemma, we can select coordinates for which
\[q_{sr}(x_1,\dots, x_r) = -x_1^2 -\dots - x_s^2 + x_{s+1}^2+\dots + x_{r}^2.\]
The rank and the index completely classify the nondegenerate part of the critical point. The function \(g\) is known as the residual singularity of \(f\) at \(\boldsymbol{0}\).
Proof: The proof of the reduction lemma is based on the diagonalization of symmetric matrices and can be found in many textbooks on catastrophe theory. I will here simply refer to chapter three of Catastrophe Theory by Domenico Castrigiano and Sandra Hayes.
Using the reduction lemma, we can always write a degenerate critical point as the sum of a nondegenerate part, classified by the Morse lemma, and a totally degenerate part. This reduces the classification of degenerate critical points to the classification of totally degenerate ones. Note that when a degenerate critical point has rank \(r\), the residual singularity \(g\) is a function of \(n-r\) variables. The number \(n-r\) plays an important role in catastrophe theory and is known as the corank of \(f\). The corank is a measure of the degree of degeneracy of the critical point. When the corank vanishes, the critical point is nondegenerate. When the corank is unity, it suffices to classify a totally degenerate critical point in a single variable. When the corank is two, we need to classify a totally degenerate critical point in two variables.
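As a small worked example of the reduction lemma (the function is chosen here for illustration), take \(f(x,y) = x^2 + 2xy^2 + y^4 + y^3\). Its Hessian matrix at the origin has rank \(r=1\) and index \(s=0\), so the corank is \(1\). Completing the square suggests the coordinate change \(\psi(x,y) = (x - y^2, y)\), which preserves the origin and has \(D\psi(\boldsymbol{0})\) equal to the identity matrix:
\[ \begin{align} f(x,y) &= (x+y^2)^2 + y^3,\\ f(\psi(x,y)) &= \left((x-y^2) + y^2\right)^2 + y^3 = x^2 + y^3. \end{align} \]
In the notation of the lemma, \(q_{01}(x) = x^2\) is the nondegenerate part and \(g(y) = y^3\) is the residual singularity, a totally degenerate critical point in \(n - r = 1\) variable.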
To develop a classification of totally degenerate critical points up to coordinate transformations, we need to develop some algebraic notation. First, we will develop a notion of locality known as the germ of the function. The germ allows us to restrict attention to the behaviour of the function in the vicinity of a point \(\boldsymbol{p}\).
Definition: For \(m,n\in \mathbb{N}\) and \(\boldsymbol{p} \in \mathbb{R}^n\), consider the set of smooth functions
\[\{f:U \to \mathbb{R}^m \mid U \text{ an open set in } \mathbb{R}^n \text{ including the point } \boldsymbol{p}\}.\]
Two functions in this set, \(f_1:U_1 \to \mathbb{R}^m\) and \(f_2:U_2\to \mathbb{R}^m\), are defined to be equivalent, \(f_1 \sim f_2\), when they coincide on some open neighbourhood \(W \subset U_1 \cap U_2\) of \(\boldsymbol{p}\), i.e., \(f_1|_{W} =f_2|_{W}.\) Locally, \(f_1\) and \(f_2\) are indistinguishable. Using this equivalence relation, we construct the germ
\[[f] =\{ g:U \to \mathbb{R}^m \mid g \sim f\}\]
of functions locally indistinguishable from \(f\). We define \(\boldsymbol{\mathcal{E}}_{n,m}\) as the set of germs of functions \(\mathbb{R}^n\to \mathbb{R}^m\) at the origin \(\boldsymbol{0}\). For convenience we use the shorthand \(\boldsymbol{\mathcal{E}}\) when \(m=1\).
It can be easily shown that the space of germs \(\boldsymbol{\mathcal{E}}\) inherits the algebraic structure of the space of functions. Given the constant \(\alpha \in \mathbb{R}\) and the functions \(f_1:U_1 \to \mathbb{R}\) and \(f_2:U_2 \to \mathbb{R}\), we define the addition
\[ [f_1] + \alpha [f_2] \equiv [f_1|_V + \alpha f_2|_V]\]
and multiplication of germs as
\[ [f_1][f_2] = [ (f_1|_V) (f_2|_V)],\]
with \(f_i|_V: V \to \mathbb{R}\) the restriction of \(f_i\) to the overlap \(V= U_1 \cap U_2\). From these definitions it follows that the algebra \(\boldsymbol{\mathcal{E}}\) has a zero germ \([0]\) and a unit germ \([1]\), where \(0\) and \(1\) are interpreted as the constant functions \(\mathbb{R}^n \to \mathbb{R}\) defined by \(\boldsymbol{x} \mapsto 0\) and \(\boldsymbol{x} \mapsto 1\). A germ \([f] \in \boldsymbol{\mathcal{E}}\) is invertible with the inverse \([1/f]\) if and only if \(f\) does not vanish at the origin. In this case, there exists an open neighbourhood of the origin on which the reciprocal \(1/f\) is well defined.
The notion of a germ is a very powerful algebraic construct, as it implements a very concrete notion of locality. However, germs can be difficult to work with in practice. For smooth functions, it is natural to develop a more practical notion of locality using the famous Taylor series,
\[T(f) \equiv \sum_{\boldsymbol{\nu} \in \mathbb{N}_0^n} \frac{1}{\boldsymbol{\nu}!} D^{\boldsymbol{\nu}} f(\boldsymbol{0}) \boldsymbol{x}^{\boldsymbol{\nu}},\]
where \(\boldsymbol{\nu}\) is a multi-index \(\boldsymbol{\nu}=(\nu_1,\dots, \nu_n)\) with the factorial defined as \(\boldsymbol{\nu}! = \nu_1!\dots \nu_n!\), the natural power defined as \(\boldsymbol{x}^{\boldsymbol{\nu}} = x_1^{\nu_1} \dots x_n^{\nu_n}\) and the derivative \(D^{\boldsymbol{\nu}}\) defined as the product of the \(\nu_i\)-th partial derivatives in the \(x_i\) direction for \(i=1,\dots, n\). The Taylor series of two equivalent functions in a germ are identical, since if \(f \sim g\) then \(D^{\boldsymbol{\nu}} f(\boldsymbol{0}) = D^{\boldsymbol{\nu}} g(\boldsymbol{0})\) for all \(\boldsymbol{\nu}\). However, note that two functions with equal Taylor series are not necessarily in the same germ. For example, the Taylor series of the constant function \(0\) and the function \(e^{-1/x^2}\) (extended by \(0\) at \(x=0\)) coincide, but the two functions are not in the same germ at \(x=0\). Conversely, the two functions \(f(x)=x\) and \(g(x)=2(e^x-1)\) are related by a local coordinate transformation, \(g = f \circ \psi\) with \(\psi(x) = 2(e^x-1)\), but do not have the same Taylor series.
Definition: Two germs \([f]\) and \([g]\) in \(\boldsymbol{\mathcal{E}}\) are \(k\)-equivalent if their \(k\)-th order Taylor polynomials coincide, i.e., \(D^{\boldsymbol{\nu}} f(\boldsymbol{0}) =D^{\boldsymbol{\nu}} g(\boldsymbol{0})\) for all \(n\)-tuples \(\boldsymbol{\nu} \in \mathbb{N}_0^n\) for which \(|\boldsymbol{\nu}| = \nu_1 + \dots + \nu_n \leq k\). The set of all germs \(k\)-equivalent to the germ \([f]\) is called the \(k\)-jet of \([f]\). We write the \(k\)-jet as \(j^k[f]\). The set of all \(k\)-jets in \(\boldsymbol{\mathcal{E}}\) is the jet space \(J^k\).
Two germs are in the same \(k\)-jet when their \(k\)-th order Taylor polynomials coincide. Equivalently, we can describe the jet \(j^k[f]\) as the set of germs of smooth functions with the \(k\)-th order Taylor polynomial
\[T^k(f) \equiv \sum_{|\boldsymbol{\nu}| \leq k} \frac{1}{\boldsymbol{\nu}!} D^{\boldsymbol{\nu}} f(\boldsymbol{0}) \boldsymbol{x}^{\boldsymbol{\nu}}.\]
This is a more practical notion of locality as it is fully determined by a finite number of coefficients.
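For instance, in one variable the germs of \(\sin x\) and \(x\) (an example chosen here for illustration) agree up to second order but not up to third order:
\[ \begin{align} T^2(\sin x) &= x = T^2(x), & j^2[\sin x] &= j^2[x],\\ T^3(\sin x) &= x - \frac{x^3}{6} \neq x = T^3(x), & j^3[\sin x] &\neq j^3[x]. \end{align} \]
The two germs are therefore \(2\)-equivalent but not \(3\)-equivalent, and each jet is pinned down by finitely many Taylor coefficients.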
The germs and jet spaces provide a notion of locality for functions. To classify the critical points of a function up to local coordinate transformations, we need a local notion for the coordinate transformations as well. Define the group \(\boldsymbol{\mathcal{G}}\) of germs \([\psi]\) of local diffeomorphisms \(\psi\) at the origin leaving the origin invariant. The product of two such germs \([\varphi], [\psi] \in \boldsymbol{\mathcal{G}}\) is defined as the germ of the composition \([\varphi][\psi] = [\varphi \circ{} \psi]\), where the diffeomorphisms are restricted to a sufficiently small open neighbourhood of the origin. We can easily prove that the set \(\boldsymbol{\mathcal{G}}\) is indeed a group, with the identity element \([\text{id}]\) defined by \(\text{id}:\boldsymbol{x} \mapsto \boldsymbol{x}\) and where every element \([\psi]\) has an inverse \([\psi^{-1}]\).
The group of local diffeomorphism germs \(\boldsymbol{\mathcal{G}}\) acts on the space of function germs \(\boldsymbol{\mathcal{E}}\) by the composition action
\[\boldsymbol{\mathcal{E}} \times \boldsymbol{\mathcal{G}} \to \boldsymbol{\mathcal{E}}: ([f],[\psi]) \mapsto [f\circ{} \psi],\]
where we restrict the function \(f\) and the diffeomorphism \(\psi\) to the appropriate overlaps. This finally allows us to define the appropriate notion of equivalence up to coordinate transformations.
Definition: Two function germs \([f]\) and \([g]\) in \(\boldsymbol{\mathcal{E}}\) are equivalent if there is a diffeomorphism germ \([\psi]\) in \(\boldsymbol{\mathcal{G}}\) such that \([g] = [f][\psi] = [f \circ{} \psi]\). The set of all function germs equivalent to \([f]\) is known as the orbit of \([f]\) under \(\boldsymbol{\mathcal{G}}\),
\[[f]\boldsymbol{\mathcal{G}} \equiv \{[f][\psi] | [\psi] \in \boldsymbol{\mathcal{G}}\}.\]
In terms of this more sophisticated language, we see that the Morse lemma states that the germ of a nondegenerate critical point at the origin, vanishing at the origin, is in the orbit of one of the normal quadratic forms \(q_{sn}(\boldsymbol{x})=-x_1^2 - \dots - x_{s}^2 + x_{s+1}^2 + \dots + x_n^2\).
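A degenerate example of an orbit (chosen here for illustration) is provided by the germs \([x^3]\) and \([x^3 + x^4]\) in one variable. Factoring out \(x^3\) suggests the diffeomorphism germ \(\psi(x) = x(1+x)^{1/3}\), which is smooth near the origin and satisfies \(\psi(0)=0\) and \(\psi'(0)=1\):
\[ x^3 + x^4 = x^3 (1+x) = \left(x (1+x)^{1/3}\right)^3 = \psi(x)^3. \]
Hence \([x^3+x^4] = [f \circ \psi]\) with \(f(x) = x^3\), so the germ lies in the orbit of \([x^3]\), even though the Taylor series of the two germs differ.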
Given the two notions of locality, we can study when the two notions agree.
Definition: A germ \([f] \in \boldsymbol{\mathcal{E}}\) is called \(k\)-determined if every germ that is \(k\)-equivalent to \([f]\) is equivalent to \([f]\). A germ is finitely determined if it is \(k\)-determined for some \(k < \infty\). When a germ \([f]\) is finitely determined, the smallest \(k\) for which \([f]\) is \(k\)-determined is referred to as the determinacy of \([f]\) and denoted by \(\text{det}[f]\). When the germ \([f]\) is not finitely determined, we write \(\text{det}[f] = \infty\).
To study germs in terms of jet-spaces we need to know: when is a germ \([f]\in \boldsymbol{\mathcal{E}}\) finitely determined and how do we determine the determinacy of \([f]\)? When we can answer these questions, we have reduced the notion of locality to the study of a finite number of coefficients in the \(k\)-th order Taylor series!
We can answer this using the algebraic notion of the maximal and the Jacobi ideals.
Definition: The maximal ideal \(m\) of \(\boldsymbol{\mathcal{E}}\) consists of all germs vanishing at the origin. Its \(k\)-th power \(m^k\) consists of all germs for which the derivatives up to order \(k-1\) vanish at the origin,
\[m^k=\{[f] \in \boldsymbol{\mathcal{E}} \mid D^{\boldsymbol{\nu}} f(\boldsymbol{0}) = 0\text{ for all } |\boldsymbol{\nu}|<k\}.\]
We can also write this ideal as \(m^k = \langle \boldsymbol{x}^{\boldsymbol{\nu}} \mid |\boldsymbol{\nu}| = k\rangle\). Note that \(m^{k+l} = \langle m^k m^l\rangle\).
Definition: The Jacobi ideal \(\boldsymbol{\mathcal{J}}[f]\) of a germ \([f] \in \boldsymbol{\mathcal{E}}\) is the ideal of \(\boldsymbol{\mathcal{E}}\) generated by the germs of the partial derivatives \(D_if\) for \(i=1,\dots, n\), i.e.,
\[ \begin{align} \boldsymbol{\mathcal{J}}[f] &= \langle D_1f,\dots,D_n f\rangle\\ &= \{ [ D_1f] [g_1] + \dots + [D_n f] [g_n]| [g_i] \in \boldsymbol{\mathcal{E}}\}. \end{align} \]
This finally allows us to link the equivalence in terms of germs to the equivalence in terms of the Taylor series.
Theorem: The germ \([f]\) is \(k\)-determined when the ideal \(m^k\) is contained in the ideal \(\langle m \boldsymbol{\mathcal{J}}[f] \rangle\), i.e., \(m^k \subset \langle m \boldsymbol{\mathcal{J}}[f] \rangle\).
Proof: For the proof I refer to chapter four of Catastrophe Theory by Domenico Castrigiano and Sandra Hayes.
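To illustrate how the criterion is used, consider the germs \([x^3]\) in one variable and \([x^3+y^3]\) in two variables. The computation below is a sketch that only lists generators of the ideals:
\[ \begin{align} f = x^3:& \quad \boldsymbol{\mathcal{J}}[f] = \langle x^2 \rangle, \quad \langle m \boldsymbol{\mathcal{J}}[f] \rangle = \langle x^3 \rangle = m^3,\\ f = x^3 + y^3:& \quad \boldsymbol{\mathcal{J}}[f] = \langle x^2, y^2 \rangle, \quad \langle m \boldsymbol{\mathcal{J}}[f] \rangle = \langle x^3, x^2 y, x y^2, y^3 \rangle = m^3. \end{align} \]
In both cases \(m^3 \subset \langle m \boldsymbol{\mathcal{J}}[f]\rangle\), so both germs are \(3\)-determined: adding any terms of order four or higher does not change the germ up to a coordinate transformation.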
Thom’s classification of critical points applies to germs which are at most \(4\)-codimensional. The codimension of a germ is here the number of parameters required to describe the universal unfolding of the germ. For germs with codimension higher than \(7\), the classification becomes more involved as we are required to consider families of catastrophes, known as unimodular germs. Formally,
Definition: Let \([f]\) be a germ in \(m^2\). The codimension of the germ is defined as
\[\text{cod}[f] \equiv \text{dim } m / \boldsymbol{\mathcal{J}}[f].\]
It can be shown that the notion of codimension is independent of the choice of coordinates, i.e., two equivalent germs (which can be transformed into each other with a diffeomorphism germ) have the same codimension. Not surprisingly, the codimension is related to the corank of a function. When the codimension of a degenerate critical point is between \(1\) and \(4\), the corank will be either \(1\) or \(2\). Using the reduction lemma, we can thus reduce the classification to totally degenerate critical points in one and two variables.
Using these tools, it can be proven that every degenerate critical point with at most codimension \(4\) is equivalent to one of the \(\boldsymbol{7}\) elementary catastrophes:
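As an illustration, the codimension of the germs \([x^4]\) and \([x^3+y^3]\) can be read off by listing the monomials in \(m\) that are not contained in the Jacobi ideal. The following is a sketch that takes for granted that the classes of these monomials form a basis of the quotient:
\[ \begin{align} f = x^4:& \quad \boldsymbol{\mathcal{J}}[f] = \langle x^3 \rangle, \quad m / \boldsymbol{\mathcal{J}}[f] = \text{span}\{x, x^2\}, \quad \text{cod}[f] = 2,\\ f = x^3+y^3:& \quad \boldsymbol{\mathcal{J}}[f] = \langle x^2, y^2 \rangle, \quad m / \boldsymbol{\mathcal{J}}[f] = \text{span}\{x, y, xy\}, \quad \text{cod}[f] = 3. \end{align} \]
The first germ has corank \(1\) and the second has corank \(2\).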
Name | Symbol | Germ | Corank | Codimension | Determinacy |
---|---|---|---|---|---|
Fold | \(A_2\) | \([x^3]\) | \(1\) | \(1\) | \(3\) |
Cusp | \(A_3\) | \([x^4]\) | \(1\) | \(2\) | \(4\) |
Swallowtail | \(A_4\) | \([x^5]\) | \(1\) | \(3\) | \(5\) |
Butterfly | \(A_5\) | \([x^6]\) | \(1\) | \(4\) | \(6\) |
Elliptic umbilic | \(D_{4}^-\) | \([x^3-xy^2]\) | \(2\) | \(3\) | \(3\) |
Hyperbolic umbilic | \(D_{4}^+\) | \([x^3+y^3]\) | \(2\) | \(3\) | \(3\) |
Parabolic umbilic | \(D_{5}\) | \([x^2 y+y^4]\) | \(2\) | \(4\) | \(4\) |
The universal unfoldings of these \(\boldsymbol{7}\) elementary catastrophes are given by
Name | Universal unfolding |
---|---|
Fold | \(x^3/3+\mu x\) |
Cusp | \(x^4/4+\mu_2 x^2/2 + \mu_1 x\) |
Swallowtail | \(x^5/5+\mu_3 x^3/3 + \mu_2 x^2/2 + \mu_1 x\) |
Butterfly | \(x^6/6+\mu_4 x^4/4 + \mu_3 x^3/3 + \mu_2 x^2/2 + \mu_1 x\) |
Elliptic umbilic | \(x^3 -3 x y^2 - \mu_3(x^2+y^2)-\mu_2 y - \mu_1 x\) |
Hyperbolic umbilic | \(x^3 + y^3 - \mu_3 x y - \mu_2 y - \mu_1 x\) |
Parabolic umbilic | \(x^4 + x y^2 + \mu_4 y^2 + \mu_3 x^2 + \mu_2 y + \mu_1 x\) |
This concludes Thom’s original formulation of catastrophe theory. It is possible to extend the classification to critical points with higher codimension using the same techniques.
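As a consistency check on the table of universal unfoldings above, sketched here for two rows, the parameter-dependent terms of a universal unfolding are multiples of monomials whose classes span the quotient \(m/\boldsymbol{\mathcal{J}}[f]\), so that their number equals the codimension:
\[ \begin{align} \text{Cusp:}& \quad \boldsymbol{\mathcal{J}}[x^4] = \langle x^3\rangle, \quad \{x, x^2\} \quad \longrightarrow \quad \mu_1 x + \mu_2 x^2/2,\\ \text{Hyperbolic umbilic:}& \quad \boldsymbol{\mathcal{J}}[x^3+y^3] = \langle x^2, y^2\rangle, \quad \{x, y, xy\} \quad \longrightarrow \quad -\mu_1 x - \mu_2 y - \mu_3 xy. \end{align} \]
The numerical factors and signs in front of the parameters are conventions and can be absorbed by rescaling the parameters.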
To complete the discussion without interrupting the flow of the reasoning above, we here give proofs of some of the lemmas and theorems.
Morse lemma: Let \(f\) be a smooth function of \(n\) variables with a critical point at the origin with index \(s\). The critical point is nondegenerate if and only if there exists a local diffeomorphism \(\psi\) which preserves the origin \(\psi(\boldsymbol{0})=\boldsymbol{0}\) for which
\[f(\psi(\boldsymbol{x})) =f(\boldsymbol{0}) - x_1^2 - \dots - x_s^2 + x_{s+1}^2 + \dots + x_n^2\]
in the vicinity of the origin.
Proof: For \(f\) in the given form, the function indeed has a nondegenerate critical point at the origin with the index \(s\). Conversely, write \(f\) as
\[ \begin{align} f(x_1,\dots, x_n) &= f(\boldsymbol{0}) + \int_0^1 \frac{\mathrm{d}}{\mathrm{d}t}f(tx_1,\dots, t x_n)\, \mathrm{d}t\\ &=f(\boldsymbol{0}) + \int_0^1 \sum_{i=1}^n \frac{\partial f}{\partial x_i} (tx_1,\dots, t x_n)\, x_i\, \mathrm{d}t, \end{align} \]
using the chain rule. Define the new functions \(g_i(x_1,\dots,x_n) = \int_0^1 \frac{\partial f}{\partial x_i} (tx_1,\dots, tx_n)\mathrm{d}t\) and write \(f\) as
\[f(x_1,\dots, x_n) = f(\boldsymbol{0}) + \sum_{i=1}^n x_i g_i(x_1,\dots, x_n).\]
Since \(f\) is assumed to have a critical point at the origin, we find \(\frac{\partial f(\boldsymbol{0})}{\partial x_i} = g_i(\boldsymbol{0}) = 0\). Repeating the above described process for \(g_i\), we obtain the formula
\[g_i(x_1,\dots,x_n) = \sum_{j=1}^n x_j h_{ij}(x_1,\dots,x_n),\]
for smooth functions \(h_{ij}\). We can thus write \(f\) as
\[f(x_1,\dots,x_n) = f(\boldsymbol{0}) + \sum_{i,j=1}^n x_i x_j \bar{h}_{ij}(x_1,\dots,x_n),\]
with the symmetrized functions \(\bar{h}_{ij} = \frac{1}{2}(h_{ij}+h_{ji}).\)
The matrix \((\bar{h}_{ij}(\boldsymbol{0}))\) equals half the Hessian matrix \(D^2 f(\boldsymbol{0})\) and is therefore nonsingular. We have thus obtained a real symmetric form for \(f\), which we can diagonalise in a neighbourhood of the origin by induction. After a linear change of coordinates, if necessary, we may assume \(\bar{h}_{11}(\boldsymbol{0}) \neq 0\), so that \(\bar{h}_{11}\) does not vanish near the origin. For the first term, we can use the change of coordinates
\[y_1 = |\bar{h}_{11}|^{1/2}\left(x_1 + \sum_{i=2}^n x_i \frac{\bar{h}_{i1}}{\bar{h}_{11}}\right)\]
and \(y_i = x_i\) for \(i=2,\dots, n\), to replace the dependence on \(x_1\) by the simple quadratic form \(\pm y_1^2\), with the sign of \(\bar{h}_{11}(\boldsymbol{0})\). Suppose we have carried this forward up to the \((m-1)\)-th term,
\[f = f(\boldsymbol{0}) \pm y_1^2 \pm \dots \pm y_{m-1}^2 + \sum_{i,j = m}^n y_i y_j H_{ij}(y_1,\dots, y_n)\]
where the \(H_{ij}\) are new smooth functions, symmetric in \(i\) and \(j\), with \(H_{mm}(\boldsymbol{0}) \neq 0\) after a linear change of the remaining coordinates if necessary. Changing coordinates again to \(v_i = y_i\) when \(i \neq m\) and
\[v_m = |H_{mm}|^{1/2}\left(y_m + \sum_{i=m+1}^n y_i \frac{H_{im}}{H_{mm}}\right),\]
we obtain the expression
\[f=f(\boldsymbol{0}) \pm v_1^2 \pm \dots \pm v_m^2 + \sum_{i,j=m+1}^n v_i v_j H'_{ij}(v_1,\dots, v_n)\]
with again a symmetric function \(H'_{ij}\). Continuing this process till we have eliminated the summation symbol gives the required quadratic form after a simple reordering.
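The induction step amounts to completing the square. As a sketch, consider the quadratic function \(f(x_1,x_2) = -x_1^2 + 4x_1x_2 + x_2^2\) (chosen here for illustration), for which the coefficient functions are constant, with \(\bar{h}_{11} = -1\), \(\bar{h}_{12} = \bar{h}_{21} = 2\) and \(\bar{h}_{22} = 1\):
\[ \begin{align} f &= -\left(x_1 - 2x_2\right)^2 + 5x_2^2 & &\text{with } y_1 = x_1 - 2x_2,\ y_2 = x_2,\\ &= -v_1^2 + v_2^2 & &\text{with } v_1 = y_1,\ v_2 = \sqrt{5}\, y_2. \end{align} \]
The result is the normal form with index \(s=1\), in agreement with the Hessian matrix \(D^2 f(\boldsymbol{0}) = \begin{pmatrix} -2 & 4\\ 4 & 2\end{pmatrix}\), which has negative determinant and hence exactly one negative eigenvalue.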