% LaTeX file of chapter 1 of the computing in finance notes.
% It was written starting January 20, 1996, by Jonathan Goodman (see the
% author line below). Goodman retains the copyright to these notes.
% He does not give anyone permission to copy the computer files related
% to them (the .tex files, .dvi files, .aux files, .ps files, etc.).
% If you want a copy of these notes, ask Jonathan Goodman. His email
% address on the internet is goodman@cims.nyu.edu.
\documentstyle[12pt]{article}
% Somewhat wider and taller page than in art12.sty
\topmargin -0.1in \headsep 0in \textheight 8.9in \footskip 0.6in
\oddsidemargin 0in \evensidemargin 0in \textwidth 6.5in
% Definitions of commonly used symbols.
\begin{document}
\title{Basic Numerical Analysis}
\author{Jonathan Goodman}
\maketitle
{\scriptsize
\noindent
These are course notes for Scientific Computing, given
Spring 1996 at the Courant Institute of Mathematical
Sciences at
New York University by Jonathan Goodman. Professor
Goodman retains the copyright to these notes. He does not give
anyone permission to copy computer files related to them.
Send email to goodman@cims.nyu.edu.
}
\vspace{1cm}
\section{Introduction}
This chapter discusses the most basic ideas in numerical analysis and explains
how they are applied to validating complex numerical software and to developing
efficient adaptive computational algorithms. The main idea is to manipulate
asymptotic error expansions that are derived from Taylor series expansions of
the functions being operated on.

We will be concerned with efficiency. Here, efficiency means finding the
derivative
accurately using a large step size, finding an integral accurately using
a small number of function evaluations, etc. For casual computation such
concern is probably a waste of time; more time
is spent optimizing the code than could possibly be saved in a computation
that takes less than a second on the computer. However, we will see that
these simple operations are at the heart of algorithms that solve partial
differential equations, algorithms whose running time and accuracy are of
serious concern.

We will often use the phrase ``for sufficiently small $h$'' without clarifying
how small is small enough. This issue is too problem dependent to settle in a
simple general way. When $h$ is small enough, we say that ``$h$ is in the
asymptotic range'', the range in which an asymptotic expansion gives
useful information. It is often clear from the problem roughly how large
the asymptotic range is likely to be, and that range is usually large enough
to cover practical $h$ values. If a code is unable to produce results
consistent with an asymptotic error analysis, it is generally not because the
asymptotic range is inaccessible. Look instead for a bug in the code, in the
method, or in the error analysis.

The three specific problems discussed in this chapter are differentiation,
integration, and interpolation. Numerical differentiation is the problem
of finding (as accurately as necessary) a derivative of a function, given
several values of the function itself. Numerical integration is the
problem of computing the integral. Interpolation is the problem of evaluating
a function at some value of its arguments given as data function values
at other values of the arguments. Each of these topics is treated from the
point of view of asymptotic error expansions. The chapter also discusses
asymptotic error expansions, and their uses, in general.
\section{Numerical Differentiation}
We start with the simplest case. Suppose we have a smooth function, $f(x)$,
of a single variable, $x$. If we can evaluate $f$ but not
$f^{\prime} = \frac{df}{dx}$, we could approximate $f^{\prime}$ by a
difference quotient. Several common approximations are
\begin{equation} \left . \begin{array}{cclc}
f^{\prime}(x) & \approx &
\displaystyle \frac{f(x+h) - f(x)}{h} & \;\;\;\; (a) \\
& & & \\
f^{\prime}(x) & \approx &
\displaystyle \frac{f(x) - f(x-h)}{h} & \;\;\;\; (b) \\
& & & \\
f^{\prime}(x) & \approx &
\displaystyle \frac{f(x+h) - f(x-h)}{2h} & \;\;\;\; (c) \\
& & & \\
f^{\prime}(x) & \approx &
\displaystyle \frac{- f(x+2h) + 4 f(x+h) - 3f(x)}{2h} & \;\;\;\; (d) \\
& & & \\
f^{\prime}(x) & \approx &
\displaystyle \frac{- f(x+2h) + 8f(x+h )
-8f(x-h ) + f(x-2h) } {12h} & \;\;\;\; (e) \end{array}
\;\;\;\; \right\}
\end{equation}
The first three have simple geometric interpretations as the slope of
lines connecting nearby points on the graph of $f(x)$. The last two
are more technical.
The advantage of the last one over the others is its greater accuracy.
If $h$ is small enough, the error in (1e) is likely to be much less
than the error in the others. This often (but not always) has the
consequence that (1e) achieves a given level of accuracy (say $1\%$)
for a larger $h$ value, a property that leads to greater efficiency
when solving differential equations.
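For concreteness, the formulas can be compared numerically. The following
Python sketch (an illustration, not part of the notes) uses the test function
$f(x) = \sin x$, whose derivative $\cos x$ is known, and compares the one
sided formula (1a), the centered formula (1c), and the standard fourth order
five point centered formula:

```python
import math

def d1_forward(f, x, h):
    # first order one sided difference, formula (1a)
    return (f(x + h) - f(x)) / h

def d1_centered(f, x, h):
    # second order centered difference, formula (1c)
    return (f(x + h) - f(x - h)) / (2.0 * h)

def d1_five_point(f, x, h):
    # standard fourth order five point centered difference
    return (-f(x + 2*h) + 8*f(x + h) - 8*f(x - h) + f(x - 2*h)) / (12.0 * h)

x, h = 1.0, 0.01
exact = math.cos(x)                      # d/dx sin(x) = cos(x)
e_a = abs(d1_forward(math.sin, x, h) - exact)
e_c = abs(d1_centered(math.sin, x, h) - exact)
e_e = abs(d1_five_point(math.sin, x, h) - exact)
print(e_a, e_c, e_e)                     # errors shrink as the order grows
```

With $h = 0.01$ the one sided error is roughly $4\times 10^{-3}$, the
centered error roughly $10^{-5}$, and the five point error below $10^{-9}$,
in line with first, second, and fourth order accuracy.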
To perform a basic error analysis, we expand $f(x+h)$ about $h=0$
using Taylor series.
\begin{displaymath}
f(x+h) = f(x) + h f^{\prime}(x) + h^2 \frac{f^{\prime \prime}(x)}{2} +
\cdots + h^n \frac{f^{(n)}(x)}{n!} + \cdots \;\; .
\end{displaymath}
Substituting this into (1a) gives
\begin{displaymath}
\frac{f(x+h) - f(x)}{h} = f^{\prime}(x)
+ h \frac{f^{\prime \prime}(x)}{2}
+ h^2 \frac{f^{\prime \prime \prime}(x)}{6}
+ \cdots \;\; .
\end{displaymath}
Now we can express the error in formula (1a) by:
\begin{displaymath}
f^{\prime}(x) = \frac{f(x+h) - f(x)}{h} - E_a(h) \;\; ,
\end{displaymath}
where
\begin{equation}
E_a(h) = \frac{h}{2} f^{\prime \prime}(x) +
\frac{h^2}{6} f^{\prime\prime\prime}(x) + \cdots \;\; .
\end{equation}
If $h$ is small, then $h^2$ is smaller still; $h^2$ is smaller than
$h$ by a factor of $h$. Thus,
$\frac{h^2}{6} f^{\prime\prime\prime}(x)$ will be much smaller
than $ \frac{h}{2} f^{\prime \prime}(x)$ unless
$ f^{\prime\prime\prime}(x)$ is much larger than
$f^{\prime \prime}(x)$. When $h$ is small enough, we may
approximate the error by
\begin{displaymath}
E_a(h) \approx \frac{h}{2} f^{\prime \prime}(x) \;\; .
\end{displaymath}
This tells the story about approximation (1a). The error
depends roughly linearly on $h$ (for small enough $h$).
Cutting $h$ in half reduces the error roughly in half.
The same is true of the related approximation (1b). The approximations
(1a) and (1b) are called ``first order one sided difference approximations''
to the derivative.
Taylor series analysis applied to the centered difference
approximation (1c) leads to
\begin{displaymath}
f^{\prime}(x) = \frac{f(x+h) - f(x-h)}{2h} - E_c(h)
\end{displaymath}
where
\begin{eqnarray}
E_c(h) & = & \frac{h^2}{6} f^{\prime\prime\prime}(x)
+ \frac{h^4}{120} f^{(5)}(x) + \cdots \\
& \approx & \frac{h^2}{6} f^{\prime\prime\prime}(x)
\;\;\;\mbox{for $h$ small enough.} \nonumber
\end{eqnarray}
In this case, the error depends quadratically on $h$ instead of
linearly. Moreover the error from (1c) is much smaller
than the error from (1a) or (1b). That is\footnote{The notation
$A \ll B$ is read ``$A$ is much smaller than $B$''.},
\begin{displaymath}
\frac{h^2}{6} f^{\prime\prime\prime}(x)
\ll
\frac{h}{2} f^{\prime\prime}(x) \;\;\;\;
\mbox{i.e.} \;\;\;\; E_c(h) \ll E_a(h) \;\; .
\end{displaymath}
When the error is on the order of $h$ for small $h$ we say that the
approximation is first order accurate. When it is of the order of
$h^2$ we call it second order accurate, and similarly for third,
fourth, and higher powers of $h$. A Taylor series analysis shows that
(1d) is second order accurate while (1e) is fourth order. If $h$ is
small and the derivatives of $f$ up to fifth order are not extremely large
then (1c) and (1d) are more accurate than (1a) or (1b), and (1e) is more
accurate than any of the others. Warning: mathematicians use
the phrase ``order of magnitude'' to refer to powers of $h$ rather
than to powers of ten. We might say that $E_a(h)$ is of the order of
$h$ even if $f^{\prime \prime}(x) = 100$. We are referring to scaling
behavior, not to specific numbers. The approximation (1c) is called the
``second order three point centered difference approximation'' to the
derivative; (1d) is the ``three point one sided second order'' difference
approximation; (1e) is the ``five point centered'' approximation.

It is important to bear in mind that a difference approximation may not
achieve its expected order of accuracy if the requisite derivatives
are infinite or do not exist. As an example of this, let $f(x)$ be the
function
\begin{displaymath}
f(x) = \left\{ \begin{array}{cl}
0 & \;\;\; \mbox{if}\;\; x\leq 0 \\
x^2 & \;\;\; \mbox{if}\;\; x\geq 0 \;\; . \end{array} \right.
\end{displaymath}
If we want $f^{\prime}(0)$, the formulas (1c) and (1e) are only first
order accurate despite their higher accuracy for smoother functions.
This $f$ has a mild singularity, a discontinuity in its second derivative.
Such a singularity is hard to spot on a graph. Nevertheless, it has a
drastic effect on the numerical analysis of the function.
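This loss of accuracy is easy to see numerically. A small Python check
(illustrative) of the centered difference (1c) at $x = 0$ for this $f$:

```python
def f(x):
    # the piecewise function above: 0 for x <= 0, x*x for x >= 0
    return x * x if x > 0.0 else 0.0

def centered(x, h):
    # the centered difference approximation (1c)
    return (f(x + h) - f(x - h)) / (2.0 * h)

# exact f'(0) = 0, so the centered difference itself is the error;
# here E(h) = h/2 exactly, so halving h halves the error: first order
e1 = abs(centered(0.0, 0.1) - 0.0)
e2 = abs(centered(0.0, 0.05) - 0.0)
print(e1 / e2)   # close to 2, not the 4 expected of a second order method
```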
We can use finite differences to approximate higher derivatives, such as
\begin{displaymath}
f^{\prime\prime}(x) = \frac{f(x+h) - 2f(x) + f(x-h)}{h^2}
- \frac{h^2}{12} f^{(4)}(x) + \cdots \;\; ,
\end{displaymath}
and to estimate partial derivatives of functions depending on several
variables, such as
\begin{displaymath}
\frac{\partial}{\partial x} f(x,y) = \frac{f(x+h,y) - f(x-h,y)}{2h}
- \frac{h^2}{6} \frac{\partial^3 f}{\partial x^3}(x,y) + \cdots \;\; .
\end{displaymath}
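Both stencils can be checked in the same way. A Python sketch (with
illustrative test functions, not from the notes):

```python
import math

def d2(f, x, h):
    # centered second difference approximation to f''(x)
    return (f(x + h) - 2.0 * f(x) + f(x - h)) / h**2

def dx(g, x, y, h):
    # centered difference in x with y held fixed
    return (g(x + h, y) - g(x - h, y)) / (2.0 * h)

g = lambda x, y: math.sin(x) * math.exp(y)
# (sin x)'' = -sin x, and (d/dx) sin(x) e^y = cos(x) e^y
err2 = abs(d2(math.sin, 1.0, 1e-3) - (-math.sin(1.0)))
errx = abs(dx(g, 1.0, 0.5, 1e-3) - math.cos(1.0) * math.exp(0.5))
```

Both errors are on the order of $h^2$ times a modest derivative, far below
the function values themselves.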
We can approximate sums of partial derivatives by adding the separate
difference approximations. For example
\begin{displaymath}
\bigtriangleup f = \frac{\partial^2 f}{\partial x^2}
+ \frac{\partial^2 f}{\partial y^2}
\approx
\frac{f(x+h,y) + f(x-h,y) + f(x,y+h) + f(x,y-h) - 4f(x,y)}{h^2} \;\; .
\end{displaymath}
It is not necessary to approximate $\bigtriangleup f$ by approximating
the second partials separately in this way. Another second order
accurate approximation is
\begin{displaymath}
\bigtriangleup f(x,y) \approx
\frac{f(x+h,y+h) + f(x+h,y-h) + f(x-h,y+h) + f(x-h,y-h) - 4f(x,y)}{2h^2}
\;\; .
\end{displaymath}
This approximation is rarely preferable to the previous, standard, one.
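The two stencils are easy to compare on a function whose Laplacian is known;
for example $g(x,y) = x^4 + y^4$ has $\bigtriangleup g = 12x^2 + 12y^2$. A
Python sketch (illustrative):

```python
def lap_plus(g, x, y, h):
    # the standard five point stencil for the Laplacian
    return (g(x + h, y) + g(x - h, y) + g(x, y + h) + g(x, y - h)
            - 4.0 * g(x, y)) / h**2

def lap_cross(g, x, y, h):
    # the diagonal stencil above, also second order accurate
    return (g(x + h, y + h) + g(x + h, y - h) + g(x - h, y + h)
            + g(x - h, y - h) - 4.0 * g(x, y)) / (2.0 * h**2)

g = lambda x, y: x**4 + y**4        # Laplacian of g is 12x^2 + 12y^2
exact = 12.0 * 0.3**2 + 12.0 * 0.7**2
h = 1e-3
e_plus = abs(lap_plus(g, 0.3, 0.7, h) - exact)
e_cross = abs(lap_cross(g, 0.3, 0.7, h) - exact)
```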
A multidimensional complication is that we might use different step
sizes in different coordinate directions. An important example of
this arises when we compute a function, $f(x,t)$ that satisfies a
partial differential equation such as
\begin{displaymath}
\frac{\partial f}{\partial t} = \frac{\partial ^2 f}{\partial x^2} \;\; .
\end{displaymath}
To evaluate the combination
\begin{equation}
\frac{\partial f}{\partial t} - \frac{\partial ^2 f}{\partial x^2} \;\; ,
\end{equation}
we need a ``time step'', $\Delta t$, and a ``space step'', $\Delta x$.
For example, we might use
\begin{equation}
\frac{\partial f}{\partial t} \approx \frac{f(x,t+\Delta t) - f(x,t)}{\Delta t}
\;\;
\end{equation}
and
\begin{equation}
\frac{\partial^2 f}{\partial x^2} \approx
\frac{f(x+\Delta x,t) - 2 f(x,t) + f(x-\Delta x,t)}{\Delta x^2} \;\; .
\end{equation}
The approximation (5) has error roughly
$\Delta t \frac{\partial^2 f}{\partial t^2}/2$ which is first order
in $\Delta t$. The approximation (6) has error roughly
$\Delta x^2 \frac{\partial^4 f}{\partial x^4}/12$, which is second
order in $\Delta x$. If we now take $\Delta x = h$ and
$\Delta t = h^2$ then the principal term in the error in the
approximation to (4) is
\begin{displaymath}
h^2 \left( \frac{1}{2} \frac{\partial^2 f}
{\partial t^2} - \frac{1}{12} \frac{\partial^4 f}{\partial x^4}
\right) \;\; .
\end{displaymath}
This example illustrates the point that the overall order of accuracy
depends not only on the order of accuracy of the individual approximations,
but also on the choice of step sizes in various directions.
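This combined scaling can be checked on an exact solution of the heat
equation, $f(x,t) = e^{-t}\sin x$, for which expression (4) is exactly zero;
the discrete residual is then pure error. A Python sketch (illustrative):

```python
import math

def residual(h):
    # evaluate expression (4) by differences with dt = h**2, dx = h,
    # for the exact heat equation solution f(x,t) = exp(-t) sin(x)
    f = lambda x, t: math.exp(-t) * math.sin(x)
    x, t, dt, dx = 1.0, 0.0, h**2, h
    ft = (f(x, t + dt) - f(x, t)) / dt
    fxx = (f(x + dx, t) - 2.0 * f(x, t) + f(x - dx, t)) / dx**2
    return ft - fxx     # exactly zero for the true solution, so pure error

r1, r2 = residual(0.1), residual(0.05)
print(r1 / r2)   # about 4: overall second order in h
```

Halving $h$ cuts the residual by roughly a factor of $4$, confirming overall
second order accuracy when $\Delta t = h^2$.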
%*******************************************************************
\section{Error Expansions and Richardson Extrapolation}
%*******************************************************************
The error expansions (2) and (3) above are instances of a common situation
that we now describe more systematically and abstractly.
Suppose we are trying to compute a number, $A$, which is the ``Answer''
to some computational problem. We have an approximation
to $A$ that depends on a step size, $h$:
\begin{displaymath}
A \approx A(h) \;\; .
\end{displaymath}
The approximation becomes increasingly accurate as $h$ becomes small:
\begin{displaymath}
A(h) \rightarrow A \;\; \mbox{as}\;\; h \rightarrow 0\;\; .
\end{displaymath}
The error is $E(h) = A(h) - A$.
An ``asymptotic error expansion'' in powers of $h$ can be expressed either by
\begin{equation}
A(h) \approx A + h^{p_1} A_1 + h^{p_2}A_2 + \cdots \;\;\;\; ,
\end{equation}
or, equivalently, by
\begin{displaymath}
E(h) \approx h^{p_1} A_1 + h^{p_2}A_2 + \cdots \;\;\;\; .
\end{displaymath}
The expression (7) does not imply that the series on the right converges
to $A(h)$, but rather it is a substitute for the statements
\begin{equation} \left . \begin{array}{lcl}
\displaystyle \frac{A(h) - \left( A + h^{p_1}A_1 \right)}{h^{p_1}}
\rightarrow 0
& \mbox{as $h \rightarrow 0$,} & \;\;\;\; (a) \\
& & \\
\displaystyle \frac{A(h) - \left( A + h^{p_1}A_1 + h^{p_2}A_2\right)}{h^{p_2}}
\rightarrow 0
& \;\;\mbox{as $h \rightarrow 0$,} & \;\;\;\; (b) \\
& & \\
\mbox{and so on.} & & \vdots
\end{array}
\;\;\;\; \right\}
\end{equation}
The statement (8a) says not only that $A + h^{p_1}A_1$ is a good approximation
to $A(h)$, but that the error in the approximation goes to zero faster than
$h^{p_1}$ as $h \rightarrow 0$. The statement (8b) says that
$A+h^{p_1}A_1 + h^{p_2}A_2$ is a better approximation to $A(h)$ in that its
error goes to zero faster than $h^{p_2}$, which itself is much smaller than
$h^{p_1}$ for small $h$.
It goes without saying that $0 < p_1 < p_2 < \cdots$. In many cases, the
powers and coefficients are found by Taylor series manipulations. For the
approximations (1a) and (1b), $p_1 = 1$, $p_2 = 2$, $p_3 = 3$, and so
on. For (1c), $p_1 = 2$, $p_2 = 4$, $p_3 = 6$, and so on. Although
(1d) has the same order as (1c) ($p_1 = 2$ in both cases), its asymptotic
error expansion, with $p_2=3$, $p_3=4$, etc., is different (work this out!).
In all cases the principal error term is dominant when $h$ is small enough.
How small is small enough depends on the coefficients, $A_1$, $A_2$, etc.
The leading power, $p_1$, determines the order of accuracy, as we have
seen. This order need not be an integer. Methods with fractional order
of accuracy sometimes arise, particularly in simulation of stochastic
differential equations. Two methods may have the same order of accuracy
but different asymptotic error expansions, as we have seen.

We sometimes use the ``big O'' notation. We say $E(h) = O(h^p)$ if
there is some constant, $C$, so that
\begin{equation}
\left| E(h) \right| \leq C h^p \;\;\;\;
\mbox{whenever $\left|h\right| \leq h_0$.}
\end{equation}
In this case we may say that the method is $p^{th}$ order accurate.
It may happen that a method is $p^{th}$ order accurate in the sense
(9) without having an asymptotic error expansion of the form (7).
An example of this is coming.
In numerical analysis, it can be valuable to know the existence of
an error expansion of the form (7) even if the coefficients, $A_1$,
$A_2$, $\ldots$, cannot be determined. The two main applications
are {\bf convergence analysis} and {\bf Richardson extrapolation}.
In convergence analysis, we verify empirically the supposed order
of accuracy of numbers produced by a computer code that may be large,
unknown, and contain bugs. In Richardson extrapolation, we combine
approximations for several values of $h$ to produce a new approximation
that has greater order of accuracy than $A(h)$.
\begin{figure}
\begin{center}
\begin{tabular}{|c|c|c|} \hline
$h$ & Error: $E(h)$ & Ratio: $E(h)/E(h/2)$ \\
\hline \hline
.1 & 4.8756e-04 & 3.7339e+00 \\ \hline
.05 & 1.3058e-04 & 6.4103e+00 \\ \hline
.025 & 2.0370e-05 & 7.3018e+00 \\ \hline
.0125 & 2.7898e-06 & 7.6717e+00 \\ \hline
6.2500e-03 & 3.6364e-07 & 7.8407e+00 \\ \hline
3.1250e-03 & 4.6379e-08 & 7.9215e+00 \\ \hline
1.5625e-03 & 5.8547e-09 & 7.9611e+00 \\ \hline
7.8125e-04 & 7.3542e-10 & ---------- \\ \hline
\end{tabular}
\end{center}
\caption{Convergence Study for a third order accurate approximation}
\end{figure}
\begin{figure}
\begin{center}
\begin{tabular}{|c|c|c|} \hline
$h$ & Error: $E(h)$ & Ratio: $E(h)/E(h/2)$ \\
\hline \hline
.1 & 1.9041e-02 & 2.4014e+00 \\ \hline
.05 & 7.9289e-03 & 1.4958e+01 \\ \hline
.025 & 5.3008e-04 & -1.5112e+00 \\ \hline
.0125 & -3.5075e-04 & 3.0145e+00 \\ \hline
6.2500e-03 & -1.1635e-04 & 1.9880e+01 \\ \hline
3.1250e-03 & -5.8529e-06 & -8.9173e-01 \\ \hline
1.5625e-03 & 6.5635e-06 & 2.8250e+00 \\ \hline
7.8125e-04 & 2.3233e-06 & ---------- \\ \hline
\end{tabular}
\end{center}
\caption{Convergence Study for a method that has no asymptotic expansion}
\end{figure}
%*********************************************************************
\subsection{Richardson extrapolation}
%***********************************************************************
Richardson extrapolation is a method that increases the order of accuracy
of an approximation provided that the approximation has an asymptotic
error expansion of the form (7). In its simplest form, we compute $A(h)$
and $A(2h)$ and then form a linear combination that eliminates the
largest error term, $h^{p_1}A_1$. Since
\begin{eqnarray*}
A(2h) & = &
A + \left( 2h\right)^{p_1}A_1 + \left( 2h\right)^{p_2}A_2 + \cdots \\
& = &
A + 2^{p_1}h^{p_1} A_1 + 2^{p_2}h^{p_2} A_2 + \cdots \;\; ,
\end{eqnarray*}
we find that
\begin{displaymath}
\frac{2^{p_1}A(h) - A(2h)}{2^{p_1}-1}
= A + \frac{2^{p_1}-2^{p_2}}{2^{p_1}-1}h^{p_2} A_2
+ \frac{2^{p_1}-2^{p_3}}{2^{p_1}-1}h^{p_3} A_3 + \cdots \;\; .
\end{displaymath}
In other words, the extrapolated approximation
\begin{equation}
A^{(1)}(h) = \frac{2^{p_1}A(h) - A(2h)}{2^{p_1}-1}
\end{equation}
has order of accuracy $p_2 > p_1$ and asymptotic error expansion
\begin{displaymath}
A^{(1)}(h) = A + h^{p_2}A^{(1)}_2 + h^{p_3}A^{(1)}_3 + \cdots \;\; ,
\end{displaymath}
where $\displaystyle A^{(1)}_2 = \frac{2^{p_1}-2^{p_2}}{2^{p_1}-1}A_2$, and so on.
Richardson extrapolation can be repeated to remove more asymptotic error
terms. For example,
\begin{displaymath}
A^{(2)}(h) = \frac{2^{p_2}A^{(1)}(h) - A^{(1)}(2h)}{2^{p_2}-1}
\end{displaymath}
has order $p_3$. Since $A^{(1)}$ depends on $A(h)$ and $A(2h)$, $A^{(2)}$
depends on $A(h)$, $A(2h)$, and $A(4h)$. It is not necessary to use
powers of $2$, but this is natural in many applications. Richardson
extrapolation {\em will not work} if the underlying approximation, $A(h)$,
has accuracy of order $h^p$ in the $O(h^p)$ sense without at least one term
of an asymptotic expansion.

As an example, we derive higher order difference approximations from low
order ones. Start, for example, with the first order one sided approximation
to $f^{\prime}(x)$ given by (1a). Taking $p_1=1$ in (10) leads to the second
order approximation
\begin{eqnarray*}
f^{\prime}(x) & \cong & 2 \cdot \frac{f(x+h) - f(x)}{h}
- \frac{f(x+2h) - f(x)}{2h} \\
& = & \frac{-f(x+2h) + 4f(x+h) -3f(x)}{2h} \;\; ,
\end{eqnarray*}
which is the second order three point one sided difference approximation (1d).
Starting with the second order centered approximation (1c) (with $p_1=2$ and
$p_2=4$) leads to the fourth order approximation (1e).
Richardson extrapolation may also be applied to the output of a complex code.
Run it with step size $h$ and $2h$ and apply (10) to the output. This is
sometimes applied to stochastic differential equations as an alternative
to making up high order schemes from scratch, which can be time consuming
and intricate.
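The same recipe works directly on numbers. A Python sketch (illustrative)
applying (10) with $p_1 = 2$ to the centered difference approximation (1c):

```python
import math

def centered(f, x, h):
    # the second order approximation (1c): p1 = 2
    return (f(x + h) - f(x - h)) / (2.0 * h)

def richardson(f, x, h, p1=2):
    # formula (10): combine A(h) and A(2h) to kill the h**p1 error term
    return (2**p1 * centered(f, x, h) - centered(f, x, 2.0 * h)) / (2**p1 - 1)

x, h = 1.0, 0.01
exact = math.cos(x)                    # derivative of sin
e_plain = abs(centered(math.sin, x, h) - exact)
e_extrap = abs(richardson(math.sin, x, h) - exact)
```

The extrapolated value is fourth order accurate, and its error here is
smaller than the plain centered difference error by several orders of
magnitude.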
When Richardson extrapolation is applied to integration using the trapezoid
rule, the resulting integration method is called Romberg integration.
%***********************************************************************
\subsection{Convergence analysis}
%***********************************************************************
It is important to find ways to test whether a code and the algorithm it
is based on are correct. A very useful test, ``convergence analysis'',
is based on the asymptotic error expansion (7). It tests not only whether
the error is going to zero as $h \rightarrow 0$, but whether it does so as
predicted by theory. Suppose, for example, we program the formula (1e) but
mistype one of its coefficients. The resulting approximation may still
converge as $h \rightarrow 0$, but with the wrong order of accuracy, or to
the wrong answer entirely. A convergence analysis would uncover this
immediately.
There are two cases, the case where the exact answer is known and the
case where it is not known. While we probably would not write a code
for a problem to which we know the answer, it is often possible to
apply a code to a problem with a known answer for debugging. In fact,
a code should be written modularly so that it is easy to apply it to
a range of problems broad enough to include a nontrivial
problem\footnote{A trivial problem is one that is too simple to test
the code fully. For example, if you compute the derivative of a
linear function, any of the formulae (1a) -- (1e) would give the
exact answer. There would be no truncation error for the convergence
analysis to measure. The fourth order approximation (1e) gives the
exact answer for any polynomial of degree less than five.} with a
known answer.
If $A$ is known, we can run the code with step size $h$ and $2h$ and,
from the resulting approximations, $A(h)$ and $A(2h)$, compute
\begin{eqnarray*}
E(h) & \cong & A_1h^{p_1} + A_2 h^{p_2} + \cdots \;\; , \\
E(2h) & \cong & 2^{p_1} A_1 h^{p_1} + 2^{p_2} A_2 h^{p_2} + \cdots \;\; .
\end{eqnarray*}
For small $h$ the first term is a good enough approximation so that
the ratio should be approximately the characteristic value
\begin{displaymath}
R(h) = \frac{E(2h)}{E(h)} \cong 2^{p_1} \;\; .
\end{displaymath}
Figure 1 is a computational illustration of this phenomenon. Figure 2
shows what may happen when we apply this convergence analysis to an
approximation that is second order accurate in the big O sense without
having an asymptotic error expansion. The error gets very small but
the error ratio does not have simple behavior as in Figure 1.
Convergence analysis can be applied even when $A$ is not known. In this case we need three approximations, $A(4h)$, $A(2h)$, and $A(h)$. Again assuming the existence of an asymptotic error expansion (7), we get, for small $h$,
\begin{displaymath}
R^{\prime}(h) = \frac{A(4h) - A(2h)}{A(2h) - A(h)} \approx 2^{p_1} \;\; .
\end{displaymath}
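Both ratios are easy to compute. A Python sketch (illustrative), using the
centered difference approximation to the derivative of $\sin x$ at $x=1$,
for which $p_1 = 2$:

```python
import math

def A(h):
    # centered difference approximation to the derivative of sin at x = 1
    return (math.sin(1.0 + h) - math.sin(1.0 - h)) / (2.0 * h)

h = 0.01
exact = math.cos(1.0)
R = (A(2*h) - exact) / (A(h) - exact)       # known-answer ratio E(2h)/E(h)
Rp = (A(4*h) - A(2*h)) / (A(2*h) - A(h))    # three-approximation ratio
print(R, Rp)   # both close to 2**p1 = 4
```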
%*************************************************************************
\section{Integration}
%*************************************************************************
Here the Answer we want is the integral
\begin{displaymath}
I = \int_a^b f(x) dx \;\; .
\end{displaymath}
We discuss only ``panel methods'' here. More elegant methods such as Gauss
quadrature are discussed in Dahlquist and Bj\"{o}rck and other places.
In a panel method, the integration interval, $[a,b]$, is divided into
$n$ subintervals, or panels, $P_k = [x_k,x_{k+1}]$, where
$a=x_0 < x_1 < \cdots < x_n = b$.