In this post we will study differentiation in abstract spaces.
Definition of Derivatives
Let \(E\) be a normed linear space and \(K\) the closed interval \([0,\,1]\) of the real line. We consider an operator \(x = x(t),\) not necessarily linear, which maps \(K\) into \(E.\) In the following, we will call such an operator an abstract function on the interval \([0,\,1].\)
For these functions, we shall define and deduce properties of the fundamental operations of analysis.
Definition 1. We consider the function \(x = x(t)\) with \(x\in E\) and \(t\in [0,\,1].\) We define the derivative \(x ' (t)\) by \[x ' (t) := \frac{d}{dt} x(t) := \lim_{\varDelta t \to 0} \frac{1}{\varDelta t}[x(t+\varDelta t) - x(t)],\tag{1}\] provided the limit on the right-hand side exists.
It follows that \[x ' (t) = \frac{1}{\varDelta t}[x(t+\varDelta t) - x(t)] + \alpha (t,\, \varDelta t),\] where \(\alpha (t,\,\varDelta t) \to 0\) as \(\varDelta t \to 0 .\) Therefore \[x(t+\varDelta t) - x(t) = x ' (t) \varDelta t - \alpha (t,\,\varDelta t) \varDelta t .\tag{2}\] As \(\varDelta t \to 0,\) the right-hand side of equation (2) tends to \(0.\) Hence, if \(x(t)\) has a derivative with respect to \(t,\) then \(x(t)\) is continuous at the point \(t.\)
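For a simple concrete case, let \(a,\) \(b,\) \(c\) be fixed elements of \(E\) and put \(x(t) = a + tb + t^2 c.\) Then \[\frac{1}{\varDelta t}[x(t+\varDelta t) - x(t)] = b + (2t + \varDelta t)\, c \,\longrightarrow\, b + 2tc \quad (\varDelta t \to 0),\] so \(x ' (t) = b + 2tc,\) exactly as for an ordinary polynomial.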
As an example we mention differentiation of a vector function \(x_n (t).\) If we interpret \(t\) as time, then \(x_n ' (t)\) is the velocity vector.
To the transition from the mechanics of a system of points to the mechanics of a continuum there corresponds the transition from an \(n\)-dimensional vector \(x_n (t)\) to a time-dependent element \(x(t)\) of a certain function space. Then \(x ' (t)\) is the velocity.
In one-dimensional problems (string, rod, etc.) we consider \(x(t)\) and \(x ' (t)\) as elements of the space \(C[0,\,1].\)
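As a minimal numerical sketch of Definition 1 (assuming, for concreteness, \(E = C[0,\,1]\) with the sup norm approximated on a finite grid, and the illustrative abstract function \(x(t)(s) = \sin(ts)\) with derivative \(x ' (t)(s) = s\cos(ts)\)), the difference quotient from (1) can be checked to approach the derivative in norm:

```python
# Minimal numerical sketch (illustration only): E = C[0,1] with the sup norm,
# approximated on a grid; x(t)(s) = sin(t*s), with derivative x'(t)(s) = s*cos(t*s).
import numpy as np

s = np.linspace(0.0, 1.0, 1001)       # grid approximating the interval [0, 1]

def x(t):
    """The element x(t) of C[0,1], sampled on the grid."""
    return np.sin(t * s)

def x_prime(t):
    """The expected derivative x'(t), sampled on the grid."""
    return s * np.cos(t * s)

t = 0.7
for dt in [1e-1, 1e-2, 1e-3, 1e-4]:
    quotient = (x(t + dt) - x(t)) / dt              # difference quotient from (1)
    error = np.max(np.abs(quotient - x_prime(t)))   # sup-norm distance
    print(f"dt = {dt:.0e}:  sup-norm error ≈ {error:.2e}")
```

The sup-norm error decreases roughly in proportion to \(\varDelta t,\) as expected from (2).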
We easily recognize the following properties of differentiation:
1. \([x(t) + y(t)] ' = x ' (t) + y ' (t).\)
2. \([\lambda x(t) ] ' = \lambda x ' (t)\) for all numbers \(\lambda .\)
3. Let there be defined for the elements \(x\in E\) a left-sided (right-sided) multiplication with elements \(y\in E.\) Suppose also that it is continuous, distributive with respect to addition, and commutative with scalar multiplication. Then \[[yx(t)] ' = yx ' (t) , \tag{3}\] i.e., constant factors can be removed from under the differentiation symbol.
Properties 1 and 2 are clear. Property 3 follows from the continuity and distributivity of left-sided multiplication, namely \[\begin{align} yx ' (t) &= y \lim_{\varDelta t \to 0} \frac{1}{\varDelta t} [x(t+ \varDelta t) - x(t)] \\[6pt] &= \lim_{\varDelta t \to 0} y\left\{ \frac{1}{\varDelta t} [ x(t+ \varDelta t) - x(t)] \right\} \\[6pt] &= \lim_{\varDelta t \to 0} \frac{1}{\varDelta t}[yx(t+\varDelta t) - yx(t) ] \\[6pt] &= \frac{d}{dt} [yx(t)]. \end{align}\] Analogously, we obtain \[[x(t) y] ' = x ' (t) y \tag{3a}\] for right-sided multiplication.
Example 1. Let \(x = x(t)\) with \(x\in E ,\) and let \(A\) be an operator from \((E \to E_1 ),\) i.e., a continuous linear operator mapping \(E\) into \(E_1 .\) Then \[[ Ax(t) ] ' = Ax ' (t) . \tag{4}\] If \(A = A(t) \in (E \to E_1 )\) and \(x\in E\) is fixed, then \[[A(t)x] ' = A ' (t) x . \tag{4a}\] In particular, for a linear functional \(f\in \overline{E} :\) \[\begin{align} \left\{ f[x(t)] \right\} ' &= f[x ' (t)] ,\tag{5} \\[6pt] \left\{ f(t)(x) \right\} ' &= f ' (t)(x) .\tag{5a} \end{align}\]
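As a concrete instance of (5): take \(E = C[0,\,1]\) and let \(f\) be the evaluation functional \(f(x) = x(s_0 )\) at a fixed point \(s_0 \in [0,\,1],\) which is linear and continuous. Then (5) states that \[\left\{ x(t)(s_0 ) \right\} ' = x ' (t)(s_0 ) ,\] i.e., the ordinary derivative of the numerical function \(t \mapsto x(t)(s_0 )\) is obtained by evaluating the abstract derivative \(x ' (t)\) at the point \(s_0 .\)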
Derivatives of Higher Order
We will now define derivatives of higher order. As in the case of ordinary functions, we can give two definitions of the \(n\)th derivative of an abstract function \(x=x(t).\) The expression \[ {\overline{\varDelta}}_{\varDelta t} ^n x(t) =\sum_{k=0}^n (-1)^{n-k} \binom{n}{k} x(t+k \varDelta t)\] is called the \(n\)th difference of \(x(t)\) at the point \(t.\) Further, we call \[\varDelta _{\varDelta t}^n x(t) ={\overline{\varDelta}}_{\varDelta t}^n x \left( t- \frac{n}{2} \varDelta t \right)\] the \(n\)th central difference.
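To make the notation concrete, for \(n=2\) these formulas give \[{\overline{\varDelta}}_{\varDelta t}^{2} x(t) = x(t+2\varDelta t) - 2x(t+\varDelta t) + x(t), \qquad \varDelta_{\varDelta t}^{2} x(t) = x(t+\varDelta t) - 2x(t) + x(t-\varDelta t),\] so the central difference is the ordinary second difference recentered at the point \(t.\)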
The expression \[x^{[n]}(t) = \lim_{\varDelta t \to 0} \frac{1}{(\varDelta t)^n} \varDelta_{\varDelta t}^{n} x(t) \tag{6}\] is called the \(n\)th difference-derivative of the function \(x(t)\) at the point \(t,\) provided this limit exists.
If the limit process in (6) is uniform in the neighborhood of every point \(t,\) then \(x^{[n]}(t)\) is called the uniform \(n\)th difference-derivative.
The \(n\)th derivative \(x^{(n)}(t)\) is defined by differentiating successively \(n\) times: \[\begin{align} x ' (t)_0 &= \frac{d}{dt} x(t),\\[6pt] x ' ' (t)_0 &= \frac{d}{dt} [x ' (t)_0 ] ,\\[6pt] &\,\,\,\vdots \\[6pt] x^{(n)} (t)_0 &= \frac{d}{dt} [x^{(n-1)}(t)_0 ]. \end{align}\]
Theorem 1. If the continuous \(n\)th derivative \(x^{(n)}(t)_0\) exists in a neighborhood of a point \(t,\) then in this neighborhood the uniform \(n\)th difference-derivative \(x^{[n]}(t)\) exists too, and \[x^{[n]}(t) = x^{(n)}(t)_0 .\] Conversely, if in a neighborhood of the point \(t\) the uniformly continuous, uniform difference-derivative \(x^{[n]}(t)\) exists, then there exists in this neighborhood the \(n\)th derivative \(x^{(n)} (t)_0 .\)
These statements hold for functions whose ranges consist of numbers. The transition to abstract functions is accomplished by a method which is often applied in functional analysis. We shall carry this out for the first statement.
For an arbitrary linear functional \(f\in \overline{E} ,\) \[\varphi (t) = f[x(t)]\] is a function whose domain of definition and range consist of numbers. Because of (5), we obtain \[\begin{align} f[x ' (t)_0 ] &= \left\{ f [ x(t) ] \right\} ' = \varphi ' (t), \\[6pt] f[x ' ' (t)_0 ] &= \left\{ f [ x ' (t)_0 ] \right\} ' = \left\{ \varphi ' (t) \right\} ' = \varphi ' ' (t) ,\\[6pt] &\,\,\,\vdots\\[6pt] f[x^{(n)}(t)_0 ] &= \left\{ f[x^{(n-1)}(t)_0 ] \right\} ' = \left\{ \varphi^{(n-1)} (t) \right\} ' = \varphi^{(n)} (t) . \end{align}\] Furthermore, by the mean value theorem for \(n\)th differences of numerical functions, \[\begin{align} f \left[ \frac{1}{(\varDelta t)^n} \varDelta_{\varDelta t}^{n} x(t) \right] &= \frac{1}{(\varDelta t)^n} \sum_{k=0}^{n} (-1)^{n-k} \binom{n}{k} \varphi \left( t+ \left( k- \frac{n}{2} \right) \varDelta t \right) \\[6pt] &= \frac{1}{(\varDelta t)^n} \varDelta_{\varDelta t}^n \varphi (t) \\[6pt] &= \varphi^{(n)} (t+ \theta \varDelta t ) \\[6pt] &=f[x^{(n)} (t+\theta \varDelta t)_0 ] \end{align}\] where \(-\frac{n}{2} \le \theta \le \frac{n}{2} .\) Since \(x^{(n)} (t)_0 ,\) according to the hypothesis, is continuous in a neighborhood of the point \(t,\) \[\lVert x^{(n)} (t+ \theta \varDelta t)_0 - x^{(n)} (t)_0 \rVert \le \epsilon_{\varDelta t} ,\] where \(\epsilon_{\varDelta t} \to 0\) as \(\varDelta t \to 0\) uniformly in a neighborhood of the point \(t.\)
From this we have \[\left\lvert f\left[ \frac{1}{(\varDelta t)^n} \varDelta_{\varDelta t}^{n} x(t) \right] - f[x^{(n)} (t)_0 ] \right\rvert \le \epsilon_{\varDelta t} \lVert f \rVert . \tag{7}\] The inequality (7) holds for arbitrary \(f\in \overline{E} ,\) and since, by a corollary of the Hahn-Banach theorem, the norm of an element of \(E\) equals the supremum of the values \(\lvert f(\cdot) \rvert\) over all \(f\) with \(\lVert f \rVert \le 1 ,\) it follows that \[\left\lVert \frac{1}{(\varDelta t)^n} \varDelta_{\varDelta t}^{n} x(t) - x^{(n)} (t)_0 \right\rVert \le \epsilon_{\varDelta t}\] and consequently \[x^{(n)}(t)_0 = \lim_{\varDelta t \to 0} \frac{1}{(\varDelta t)^n} \varDelta_{\varDelta t}^{n} x(t) .\] In this case, the convergence is uniform in a neighborhood of every point \(t.\)
This proves the first statement of the theorem.
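As a small numerical check of Theorem 1 in the case \(n=2\) (again assuming, for illustration, \(E = C[0,\,1]\) approximated on a grid and the function \(x(t)(s) = \sin(ts),\) whose second successive derivative is \(x ' ' (t)_0 (s) = -s^2 \sin(ts)\)), the second central difference quotient from (6) converges to it in the sup norm:

```python
# Numerical check of Theorem 1 for n = 2 (illustration only):
# E = C[0,1] on a grid, x(t)(s) = sin(t*s), second derivative -s**2*sin(t*s).
import numpy as np

s = np.linspace(0.0, 1.0, 1001)

def x(t):
    return np.sin(t * s)

def x_second(t):
    return -(s ** 2) * np.sin(t * s)

t = 0.7
for dt in [1e-1, 1e-2, 1e-3]:
    central = x(t + dt) - 2.0 * x(t) + x(t - dt)    # 2nd central difference
    quotient = central / dt ** 2                    # difference quotient from (6)
    error = np.max(np.abs(quotient - x_second(t)))  # sup-norm distance
    print(f"dt = {dt:.0e}:  sup-norm error ≈ {error:.2e}")
```

The error decreases like \((\varDelta t)^2 ,\) consistent with \(x^{[2]}(t) = x^{(2)}(t)_0 .\)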
Partial Derivatives
We introduce the concept of the partial derivative of an abstract function. To do this, we consider a function of \(n\) real variables \(t_1 ,\) \(t_2 ,\) \(\cdots ,\) \(t_n\) with a range lying in a normed linear space \(E:\) \[y = f(t_1 ,\, t_2 ,\, \cdots,\, t_n ) \in E .\] We can interpret \(t_1 ,\) \(t_2 ,\) \(\cdots ,\) \(t_n\) as components of the \(n\)-dimensional vector \[T = \sum_{i=1}^{n} t_i e_i ,\] where the \(e_i\) are orthonormal basis vectors, i.e., \(n\)-dimensional mutually orthogonal unit vectors.
We now define the \(n\)th partial difference-derivative \[\frac{\partial ^n}{\partial t_1 \,\partial t_2 \cdots \partial t_n} f(t_1 ,\, t_2 ,\, \cdots ,\, t_n )\] at the point \(T_0 = \sum_{i=1}^{n} t_i^{(0)} e_i .\)
To do this, we form the \(n\)th partial difference \[{\overline{\varDelta}}_{t_1 ,\, \cdots ,\, t_n ;\, \varDelta t}^{n} f(T_0 ) = \sum_{(i_1 ,\, \cdots ,\, i_k )} (-1)^{n-k} f[T_0 + \varDelta t(e_{i_1} + \cdots + e_{i_k} )],\] where \(1\le i_1 < i_2 < \cdots < i_k \le n ,\) so that \((i_1 ,\, i_2 ,\, \cdots ,\, i_k )\) is a subset of \((1,\,2,\,\cdots,\,n ).\) The summation extends over all such subsets. For the empty set we put \(k=0\) and take \(T_0\) as the argument. The \(n\)th central partial difference at the point \(T_0 ,\) \[\varDelta_{t_1 ,\, \cdots ,\, t_n ;\, \varDelta t}^n f(T_0 ),\] is the \(n\)th partial difference at the point \[T_0 ' = T_0 - \frac{1}{2} \varDelta t \sum_{i=1}^{n} e_i ,\] that is, \[\varDelta_{t_1 ,\, \cdots ,\, t_n ;\, \varDelta t}^{n} f(T_0 ) = {\overline{\varDelta}}_{t_1 ,\,\cdots,\,t_n ;\, \varDelta t}^n f\left( T_0 - \frac{1}{2} \varDelta t \sum_{i=1}^{n} e_i \right).\] Then the limit as \(\varDelta t \to 0\) of \[\frac{1}{(\varDelta t)^n} \varDelta_{t_1 ,\, \cdots ,\,t_n ;\, \varDelta t}^n f(T_0 ),\] in case it exists, is called the \(n\)th partial difference-derivative \[\frac{\partial ^n}{\partial t_1 \, \partial t_2 \cdots \partial t_n} f(t_1^{(0)} ,\, t_2^{(0)} ,\, \cdots ,\, t_n^{(0)})\] of the function \(f\) at the point \(T_0 = (t_1^{(0)} ,\, t_2^{(0)} ,\, \cdots ,\, t_n^{(0)}).\)
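For \(n=2\) the definitions read \[{\overline{\varDelta}}_{t_1 ,\, t_2 ;\, \varDelta t}^{2} f(T_0 ) = f(T_0 + \varDelta t\, e_1 + \varDelta t\, e_2 ) - f(T_0 + \varDelta t\, e_1 ) - f(T_0 + \varDelta t\, e_2 ) + f(T_0 ),\] and the central difference is this expression evaluated at \(T_0 - \frac{1}{2} \varDelta t (e_1 + e_2 );\) dividing by \((\varDelta t)^2\) and letting \(\varDelta t \to 0\) yields \(\dfrac{\partial ^2}{\partial t_1 \, \partial t_2} f\) at the point \(T_0 .\)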
Parallel to this, we can define the \(n\)th partial derivative. It is the result of carrying out successive differentiations of the function \(f(t_1 ,\, t_2 ,\, \cdots ,\, t_n )\) with respect to \(t_{k_n} ,\) then \(t_{k_{n-1}} ,\) and finally \(t_{k_1} ,\) where \(k_1 ,\) \(k_2 ,\) \(\cdots ,\) \(k_n\) is an arbitrary permutation of the indices \(1,\) \(2,\) \(\cdots,\) \(n.\) Of course, we have to assume that the derivatives \[\begin{align} \frac{\partial}{\partial t_{k_n}} & f(t_1 ,\, t_2 ,\, \cdots ,\, t_n ) , \\[6pt] \frac{\partial}{\partial t_{k_{n-1}}} & \left( \frac{\partial}{\partial t_{k_n}} f(t_1 ,\, t_2 ,\, \cdots ,\, t_n ) \right) ,\\[6pt] &\vdots \\[6pt] \frac{\partial}{\partial t_{k_i}} & \left\{ \frac{\partial}{\partial t_{k_{i+1}}} \left( \cdots \frac{\partial}{\partial t_{k_n}} f(t_1 ,\, t_2 ,\, \cdots ,\, t_n ) \right) \right\}\\[6pt] &\vdots \end{align}\] obtained successively exist in a neighborhood of \(T_0 .\)
Theorem 2. If in a neighborhood of the point \(T_0 = (t_1 ^{(0)} ,\, t_2 ^{(0)} ,\, \cdots ,\, t_n^{(0)})\) there exists an \(n\)th partial derivative of the function \(f\) and if this derivative is continuous at \(T_0 ,\) then there also exists at \(T_0\) the \(n\)th partial difference-derivative and both derivatives coincide.
Proof
Let \(L \in \overline{E}\) be an arbitrary linear functional. Then \[L[f(t_1 ,\, t_2 ,\, \cdots ,\, t_n )] = L f(t_i )\] is a function of \(t_1 ,\) \(t_2 ,\) \(\cdots ,\) \(t_n ,\) whose range consists of numbers. For such numerical functions we have \[\frac{1}{(\varDelta t)^n} \varDelta_{t_1 ,\, \cdots ,\, t_n ;\, \varDelta t}^n L f(t_i^{(0)} ) = \frac{\partial}{\partial t_{k_1}} \left\{ \cdots \frac{\partial}{\partial t_{k_n}} L f(t_i^{(0)} + \theta_i \varDelta t ) \right\}\] for every permutation \(k_1 ,\) \(k_2 ,\) \(\cdots ,\) \(k_n\) of the indices \(1,\) \(2,\) \(\cdots ,\) \(n ,\) where the \(\theta_i\) remain bounded as \(\varDelta t \to 0 .\) Hence \[\begin{align} L \left[ \frac{1}{(\varDelta t)^n} \varDelta_{t_1 ,\, \cdots ,\, t_n ;\, \varDelta t}^n f(t_i ^{(0)}) \right] &= \frac{1}{(\varDelta t)^n} \varDelta_{t_1 ,\, \cdots ,\, t_n ;\, \varDelta t}^n L f(t_i^{(0)} ) \\[6pt] &= \frac{\partial}{\partial t_{k_1}} \left\{ \cdots \frac{\partial}{\partial t_{k_n}} L f(t_i ^{(0)} + \theta_i \varDelta t ) \right\} \\[6pt] &= L \left[ \frac{\partial}{\partial t_{k_1}} \left\{ \cdots \frac{\partial}{\partial t_{k_n}} f(t_i ^{(0)} + \theta_i \varDelta t ) \right\} \right] , \end{align}\] because the linear functional \(L\) can be removed from under the derivative sign.
According to the hypothesis, the \(n\)th partial derivative of \(f(t_i )\) is continuous at the point \((t_i ^{(0)} ),\) so that \[\left\lVert \frac{\partial}{\partial t_{k_1}} \left\{ \cdots \frac{\partial}{\partial t_{k_n}} f(t_i ^{(0)} + \theta_i \varDelta t ) \right\} - \frac{\partial}{\partial t_{k_1}} \left\{ \cdots \frac{\partial}{\partial t_{k_n}} f(t_i ^{(0)} )\right\} \right\rVert \le \epsilon_{\varDelta t}\] for sufficiently small \(\lvert \varDelta t \rvert ,\) where \(\epsilon_{\varDelta t} \to 0\) as \(\varDelta t \to 0 .\) Hence \[\begin{align} \,& \left\lvert L \left[ \frac{1}{(\varDelta t)^n} \varDelta_{t_1 ,\, \cdots ,\, t_n ;\, \varDelta t}^n f(t_i^{(0)}) \right] - L \left[ \frac{\partial}{\partial t_{k_1}} \left\{ \cdots \frac{\partial}{\partial t_{k_n}} f(t_i ^{(0)} ) \right\} \right] \right\rvert \\[6pt] =& \left\lvert L \left[ \frac{\partial}{\partial t_{k_1}} \left\{ \cdots \frac{\partial}{\partial t_{k_n}} f(t_i ^{(0)} + \theta_i \varDelta t ) \right\} \right] - L \left[ \frac{\partial}{\partial t_{k_1}} \left\{ \cdots \frac{\partial}{\partial t_{k_n}} f(t_i ^{(0)} ) \right\} \right] \right\rvert \\[6pt] \le & \lVert L \rVert \left\lVert \frac{\partial}{\partial t_{k_1}} \left\{ \cdots \frac{\partial}{\partial t_{k_n}} f(t_i ^{(0)} + \theta_i \varDelta t ) \right\} - \frac{\partial}{\partial t_{k_1}} \left\{ \cdots \frac{\partial}{\partial t_{k_n}} f(t_i ^{(0)} ) \right\} \right\rVert\\[6pt] \le & \lVert L \rVert \epsilon_{\varDelta t} . \end{align}\] This inequality is valid for every \(L \in \overline{E},\) therefore, as in the proof of Theorem 1, \[\left\lVert \frac{1}{(\varDelta t)^n} \varDelta_{t_1 ,\, \cdots ,\, t_n ;\, \varDelta t}^n f(t_i ^{(0)}) - \frac{\partial}{\partial t_{k_1}} \left\{ \cdots \frac{\partial}{\partial t_{k_n}} f(t_i ^{(0)} ) \right\} \right\rVert \le \epsilon_{\varDelta t} .\] It follows that \[\frac{1}{(\varDelta t)^n} \varDelta_{t_1 ,\, \cdots ,\, t_n ;\, \varDelta t}^n f(t_i ^{(0)}) \,\to\, \frac{\partial}{\partial t_{k_1}} \left\{ \cdots \frac{\partial}{\partial t_{k_n}} f(t_i ^{(0)} ) \right\}\] as \(\varDelta t \to 0,\) and we have proved the existence of the limit of the expression \[\frac{1}{(\varDelta t)^n} \varDelta_{t_1 ,\, \cdots ,\, t_n ;\, \varDelta t}^n f(t_i^{(0)})\] and its equality with the \(n\)th partial derivative.
Corollary. Two \(n\)th partial derivatives which correspond to different permutations of the indices \(1,\) \(2,\) \(\cdots,\) \(n,\) coincide at the points where they both are continuous. This means that the \(n\)th partial derivative does not depend on the order of the differentiations.
The equality \[\frac{\partial}{\partial t_{k_1}} \left\{ \cdots \frac{\partial}{\partial t_{k_n}} f(t_i) \right\} = \lim_{\varDelta t \to 0} \frac{1}{(\varDelta t)^n} \varDelta_{t_1 ,\, \cdots ,\, t_n ;\, \varDelta t}^n f(t_i )\] holds in this case. The right-hand side, however, does not depend on the permutation \(k_1 ,\) \(\cdots ,\) \(k_n .\)
In the sequel, if we speak about the \(n\)th partial derivative in a certain region, we shall assume also that it is continuous in this region. Therefore, both definitions give the same result and we shall designate this derivative by the symbol \[\frac{\partial ^n}{\partial t_1 \,\partial t_2 \cdots \partial t_n} f(t_i ) .\] Now let \(x\) and \(h_1 ,\) \(\cdots ,\) \(h_n\) be arbitrary elements of the space \(E.\) If \[y = f \left( x + \sum_{i=1}^{n} t_i h_i \right) \in E \] holds, then \(y\) is a function of the \(n\) real parameters \(t_1 ,\) \(t_2 ,\) \(\cdots ,\) \(t_n .\) Suppose that the function \[f \left( x + \sum_{i=1}^n t_i h_i \right)\] has, for arbitrary \(h_1 ,\) \(\cdots ,\) \(h_n ,\) an \(n\)th partial derivative \[\left[ \frac{\partial ^n}{\partial t_1 \, \partial t_2 \cdots \partial t_n} f\left( x + \sum_{i=1}^{n} t_i h_i \right) \right]_{t_1 = \cdots = t_n =0 } .\] If \(h_1 = \cdots = h_n = h ,\) then it is not difficult to show that \[\left[ \frac{\partial ^n}{\partial t_1 \, \partial t_2 \cdots \partial t_n} f \left( x+\sum_{i=1}^{n} t_i h_i \right) \right]_{t_1 = \cdots = t_n =0} = \left[ \frac{d^n}{dt^n} f(x+th) \right]_{t=0}\] holds.
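For \(n=2\) the last identity can be verified directly: with \(h_1 = h_2 = h ,\) put \(F(t) = f(x+th),\) so that \(f(x + t_1 h + t_2 h ) = F(t_1 + t_2 ).\) Then \[\frac{\partial ^2}{\partial t_1 \, \partial t_2} F(t_1 + t_2 ) = F ' ' (t_1 + t_2 ),\] and setting \(t_1 = t_2 = 0\) gives \(F ' ' (0) = \left[ \dfrac{d^2}{dt^2} f(x+th) \right]_{t=0},\) which is the right-hand side.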
Reference
- L. A. Liusternik and V. J. Sobolev, *Elements of Functional Analysis* (translated by Anthony E. Labarre), Frederick Ungar Publishing Company (New York), pp. 168-173.