Dynamic programming is a method for valuing American-style options and
other financial instruments that allow the holder to make decisions
that affect the ultimate payout. The idea is to define the appropriate
value function, *f*(*x*,*t*), that satisfies a nonlinear version of the
backwards evolution equation (7). I will explain the idea in a simple but
somewhat abstract situation. As in the previous section, it is possible
to use these ideas to treat other related problems.
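To make the motivating example concrete, here is a minimal sketch of the kind of nonlinear backward recursion meant above, under assumptions that are not part of these notes: an American put valued on a Cox-Ross-Rubinstein binomial tree, with a risk-neutral up probability *p* and discounting at a constant rate each step. The function name `american_put_binomial` and its parameters are hypothetical. At each node the holder takes the larger of exercising now and continuing, and that max is what makes the recursion nonlinear.

```python
import math


def american_put_binomial(s0, strike, rate, sigma, maturity, steps):
    """Value an American put by backward induction on a binomial tree.

    Assumed model (not from these notes): Cox-Ross-Rubinstein up/down
    factors, risk-neutral up probability p, continuous discounting per step.
    """
    dt = maturity / steps
    u = math.exp(sigma * math.sqrt(dt))
    d = 1.0 / u
    disc = math.exp(-rate * dt)
    p = (math.exp(rate * dt) - d) / (u - d)  # assumed to lie in (0, 1)

    # Payout at the final time: the "initial data" for the recursion.
    # Index j counts the number of up-moves.
    values = [max(strike - s0 * u**j * d**(steps - j), 0.0)
              for j in range(steps + 1)]

    # Step backwards in time from steps-1 down to 0.
    for n in range(steps - 1, -1, -1):
        new_values = []
        for j in range(n + 1):
            spot = s0 * u**j * d**(n - j)
            # Children of node (n, j) are (n+1, j+1) (up) and (n+1, j) (down).
            continuation = disc * (p * values[j + 1] + (1 - p) * values[j])
            exercise = max(strike - spot, 0.0)
            # The holder's decision turns the linear recursion into a max.
            new_values.append(max(exercise, continuation))
        values = new_values
    return values[0]
```

For example, `american_put_binomial(100.0, 100.0, 0.05, 0.2, 1.0, 200)` values an at-the-money one-year put. With `rate=0.0` early exercise of a put is never strictly better than continuing, so the result coincides with the European binomial price, which is a useful sanity check.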

We have a Markov chain as before, but now the transition probabilities depend on a "control parameter", *ξ*. That is,
$$ P(x,y,\xi) = \Pr\left( X(t+1) = y \mid X(t) = x,\ \xi(t) = \xi \right) . \tag{8} $$

In the "stochastic control problem", we are allowed to choose the
control parameter at time *t*, *ξ*(*t*), knowing the value of *X*(*t*)
but nothing more about the future than the transition probabilities.
Because the system is a Markov chain, knowledge of earlier values,
*X*(*t*-1), *X*(*t*-2), ..., will not help predict or control the future.
Choosing *ξ*(*t*) as a function of *X*(*t*) and *t* is called "feedback
control" or a "decision strategy". The point here is
that an optimal control policy can always be taken to be a feedback control. That is,
instead of trying to choose a whole control trajectory, *ξ*(*t*) for
*t* = 0, 1, ..., *T*-1, we instead try to choose the feedback functions
*ξ*(*x*,*t*). We will write *ξ* for such a decision strategy.
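A controlled Markov chain and a feedback strategy can be sketched in a few lines. The chain below is a hypothetical example, not taken from these notes: states 0, ..., *n*-1 and two controls, *ξ* = 0 ("stay put") and *ξ* = 1 ("gamble", moving up or down one step with probability 1/2 each, reflecting at the ends). `P[xi][x][y]` stores *P*(*x*,*y*,*ξ*), and a decision strategy is any Python function `policy(x, t)`; the names `stay_or_gamble` and `simulate` are illustrative only.

```python
import random


def stay_or_gamble(n):
    """Hypothetical controlled chain on states 0..n-1 with controls {0, 1}.

    xi = 0: X(t+1) = X(t) with probability 1.
    xi = 1: move up or down by one with probability 1/2 each, reflecting
    at the boundaries. Returns P with P[xi][x][y] = P(x, y, xi).
    """
    stay = [[1.0 if y == x else 0.0 for y in range(n)] for x in range(n)]
    gamble = [[0.0] * n for _ in range(n)]
    for x in range(n):
        gamble[x][min(x + 1, n - 1)] += 0.5
        gamble[x][max(x - 1, 0)] += 0.5
    return {0: stay, 1: gamble}


def simulate(P, policy, x0, T, rng):
    """Simulate X(0), ..., X(T) under the feedback strategy xi = policy(x, t).

    Only X(t) and t are passed to the policy; by the Markov property,
    earlier states would not help predict or control the future anyway.
    """
    path = [x0]
    for t in range(T):
        x = path[-1]
        row = P[policy(x, t)][x]  # transition probabilities out of x
        path.append(rng.choices(range(len(row)), weights=row)[0])
    return path
```

For instance, `simulate(stay_or_gamble(5), lambda x, t: 1 if x < 4 else 0, 2, 10, random.Random(0))` gambles until the top state is reached and then stays there.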

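Any given strategy has an expected payout, which we write
$$ E_\xi\left[ V(X(T)) \right] , \tag{9} $$
where *V*(*x*) is the payout if the chain is in state *x* at the final time *T*.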

Our object is to compute the value of the financial instrument under the optimal decision strategy,
$$ f_* = \max_{\xi} \, E_\xi\left[ V(X(T)) \right] , \tag{10} $$
and the optimal strategy that achieves this.
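For a very small problem, (9) and (10) can be computed directly, which also shows why a better method is needed. The sketch below uses the same hypothetical stay-or-gamble chain (an assumption, not part of these notes). It evaluates the expected payout of a fixed feedback strategy with a linear backward recursion like (7), using the transition probabilities *P*(*x*,*y*,*ξ*(*x*,*t*)) at each step, and then maximizes over every feedback strategy by brute force. The helper names are illustrative only.

```python
import itertools


def stay_or_gamble(n):
    """Hypothetical chain: xi = 0 stays put, xi = 1 steps +/-1 (reflecting)."""
    stay = [[1.0 if y == x else 0.0 for y in range(n)] for x in range(n)]
    gamble = [[0.0] * n for _ in range(n)]
    for x in range(n):
        gamble[x][min(x + 1, n - 1)] += 0.5
        gamble[x][max(x - 1, 0)] += 0.5
    return {0: stay, 1: gamble}


def expected_payout(P, policy, V, x0, T):
    """E_xi[V(X(T)) | X(0) = x0] for the feedback strategy policy(x, t).

    Linear backward recursion: g(x, T) = V(x) and
    g(x, t) = sum_y P(x, y, policy(x, t)) g(y, t+1).
    """
    n = len(V)
    g = list(V)
    for t in range(T - 1, -1, -1):
        g = [sum(P[policy(x, t)][x][y] * g[y] for y in range(n))
             for x in range(n)]
    return g[x0]


def brute_force_optimum(P, controls, V, x0, T):
    """Maximize (10) by trying every feedback strategy xi(x, t).

    There are len(controls) ** (n * T) strategies, so this is only
    feasible for tiny problems.
    """
    n = len(V)
    best_value, best_table = float("-inf"), None
    for choice in itertools.product(controls, repeat=n * T):
        table = {(x, t): choice[t * n + x] for t in range(T) for x in range(n)}
        value = expected_payout(P, lambda x, t, tbl=table: tbl[(x, t)], V, x0, T)
        if value > best_value:
            best_value, best_table = value, table
    return best_value, best_table
```

With *n* states, *T* periods and two controls there are 2^(*nT*) feedback strategies. For *n* = 3, *T* = 2 and *V*(*x*) = *x*² that is only 64, and `brute_force_optimum(stay_or_gamble(3), (0, 1), [0, 1, 4], 0, 2)` finds a best expected payout of 1.25, but the count grows exponentially with *nT*.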

The appropriate collection of values for this is the "cost to go" function
$$ f(x,t) = \max_{\xi} \, E_\xi\left[ V(X(T)) \mid X(t) = x \right] . \tag{11} $$

As before, we have "initial data" *f*(*x*,*T*) = *V*(*x*). We need to
compute the values *f*(*x*,*t*) in terms of the already computed values *f*(*y*,*t*+1).
For this, we suppose that the optimal decision strategy at time *t* is not yet
known but those at later times are already computed. If we use control
variable *ξ* at time *t*, and the optimal control thereafter, we get the
payout *f*(*X*(*t*+1),*t*+1), which depends on the state at time *t*+1. Its expected value is
$$ E\left[ f(X(t+1),t+1) \mid X(t) = x,\ \xi(t) = \xi \right] = \sum_y P(x,y,\xi)\, f(y,t+1) . \tag{12} $$

Maximizing this expected payout over *ξ* gives the optimal expected
payout at time *t*:
$$ f(x,t) = \max_{\xi} \sum_y P(x,y,\xi)\, f(y,t+1) . \tag{13} $$

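This is the principle of dynamic programming. We replace the "multiperiod optimization problem" (11) with a sequence of hopefully simpler "single period" optimization problems (13) for the cost to go function, one for each *t* = *T*-1, *T*-2, ..., 0. The maximizing *ξ* in (13) at each (*x*,*t*) gives the optimal feedback strategy *ξ*(*x*,*t*).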
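The recursion (13) translates directly into a backward loop over *t*. The sketch below, again on the hypothetical stay-or-gamble chain (an assumption, not part of these notes), computes the cost to go table starting from the initial data *f*(*x*,*T*) = *V*(*x*) and records a maximizing control at each (*x*,*t*), which is an optimal feedback strategy. `dynamic_programming` is an illustrative name, not a library function.

```python
def stay_or_gamble(n):
    """Hypothetical chain: xi = 0 stays put, xi = 1 steps +/-1 (reflecting)."""
    stay = [[1.0 if y == x else 0.0 for y in range(n)] for x in range(n)]
    gamble = [[0.0] * n for _ in range(n)]
    for x in range(n):
        gamble[x][min(x + 1, n - 1)] += 0.5
        gamble[x][max(x - 1, 0)] += 0.5
    return {0: stay, 1: gamble}


def dynamic_programming(P, controls, V, T):
    """Backward induction for the cost to go f(x, t).

    P[xi][x][y] is P(x, y, xi) and V[x] is the payout at time T.
    Returns f with f[t][x] = f(x, t), and policy with policy[t][x] a
    maximizing control at (x, t), i.e. an optimal feedback strategy.
    """
    n = len(V)
    f = [None] * (T + 1)
    policy = [None] * T
    f[T] = list(V)  # "initial data" f(x, T) = V(x)
    for t in range(T - 1, -1, -1):
        f[t], policy[t] = [], []
        for x in range(n):
            # Expected payout (12) for each control, then the max in (13).
            values = {xi: sum(P[xi][x][y] * f[t + 1][y] for y in range(n))
                      for xi in controls}
            best = max(controls, key=values.__getitem__)  # first maximizer wins ties
            f[t].append(values[best])
            policy[t].append(best)
    return f, policy
```

On the three-state example with *V*(*x*) = *x*², `dynamic_programming(stay_or_gamble(3), (0, 1), [0, 1, 4], 2)` gives *f*(·,0) = (1.25, 2.25, 4), which matches a brute-force search over all 64 feedback strategies. The work is proportional to *T* · *n*² · (number of controls) rather than exponential in *nT*.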

Tue Sep 15 17:12:32 EDT 1998