Next: About this document Up: Computational Methods in Finance Previous: Duality and Qualitative Properties

Dynamic Programming

Dynamic programming is a method for valuing American style options and other financial instruments that allow the holder to make decisions that affect the ultimate payout. The idea is to define the appropriate value function, f(x,t), that satisfies a nonlinear version of the backwards evolution equation (7). I will explain the idea in a simple but somewhat abstract situation. As in the previous section, it is possible to use these ideas to treat other related problems.

We have a Markov chain as before, but now the transition probabilities depend on a ``control parameter'', $\xi$. That is,

$$ P_{xy}(\xi) = \Pr\bigl( X(t+1) = y \mid X(t) = x, \ \text{control } \xi \text{ applied at time } t \bigr) . $$

In the ``stochastic control problem'', we are allowed to choose the control parameter at time t, $\xi(t)$, knowing the value of X(t) but not any more about the future than the transition probabilities. Because the system is a Markov chain, knowledge of earlier values, X(t-1), $\ldots$, will not help predict or control the future. Choosing $\xi$ as a function of X(t) and t is called ``feedback control'' or a ``decision strategy''. The point here is that the optimal control policy is a feedback control. That is, instead of trying to choose a whole control trajectory, $\xi(t)$ for $t = 0, 1, \ldots, T-1$, we try to choose the feedback functions $\xi(x,t)$. We will write $\Xi$ for such a decision strategy.
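
The feedback idea can be illustrated with a small simulation. This is only a sketch under assumptions that are not in the text: a two-state chain with two admissible controls, made-up transition probabilities stored in a hypothetical table `P[u][x][y]`, and an arbitrary rule `feedback(x, t)`. The one feature it is meant to show is that the control at each step is computed from the current state and time alone.

```python
import random

# Hypothetical controlled chain (all numbers are illustrative):
# P[u][x][y] = probability of moving from state x to state y under control u.
P = {
    0: [[0.9, 0.1], [0.5, 0.5]],   # transition matrix under control 0
    1: [[0.6, 0.4], [0.7, 0.3]],   # transition matrix under control 1
}

def feedback(x, t):
    """Illustrative feedback rule: the control depends only on (x, t)."""
    return 1 if x == 0 else 0

def simulate(x0, T, policy, rng):
    """Simulate X(0), ..., X(T), choosing each control from the current state."""
    path = [x0]
    x = x0
    for t in range(T):
        u = policy(x, t)  # decision uses X(t) and t, nothing earlier
        x = rng.choices([0, 1], weights=P[u][x])[0]
        path.append(x)
    return path

path = simulate(x0=0, T=5, policy=feedback, rng=random.Random(0))
print(path)
```

A control trajectory fixed in advance would correspond to a `policy` that ignores its first argument.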

Any given strategy, $\Xi$, has an expected payout, which we write

$$ E_{\Xi}\bigl[ V(X(T)) \bigr] . $$

Our objective is to compute the value of the financial instrument under the optimal decision strategy:

$$ \max_{\Xi} E_{\Xi}\bigl[ V(X(T)) \bigr] , \qquad (11) $$

and the optimal strategy that achieves this.
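
Before optimizing, it may help to see how the expected payout of one fixed strategy could be computed. The sketch below assumes a payout that depends only on the state at the final time, and uses hypothetical names throughout (`P`, `V`, `expected_payout`) with made-up numbers. For a fixed feedback rule the backward recursion is linear, just as for an uncontrolled chain.

```python
# Hypothetical data: P[u][x][y] is the probability of x -> y under control u,
# and V[x] is the payout received if the chain is in state x at the final time.
P = {
    0: [[0.9, 0.1], [0.5, 0.5]],
    1: [[0.6, 0.4], [0.7, 0.3]],
}
V = [0.0, 1.0]

def expected_payout(policy, x0, T):
    """Expected final payout starting from x0 at time 0 under a fixed feedback rule."""
    n = len(V)
    g = list(V)  # g[x] = expected payout given the state x at the current time
    for t in range(T - 1, -1, -1):
        # One linear backward step using the control the rule picks in state x.
        g = [sum(P[policy(x, t)][x][y] * g[y] for y in range(n)) for x in range(n)]
    return g[x0]

always_zero = lambda x, t: 0
print(expected_payout(always_zero, x0=0, T=2))
```

Comparing this number across different rules is the multiperiod problem in its brute-force form; the number of feedback rules grows exponentially with the number of states and time steps.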

The appropriate collection of values for this is the ``cost to go'' function

$$ f(x,t) = \max_{\Xi} E_{\Xi}\bigl[ V(X(T)) \mid X(t) = x \bigr] . $$

As before, we have ``initial data'' $f(x,T) = V(x)$. We need to compute the values f(x,t) in terms of already computed values f(x,t+1). For this, we suppose that the optimal decision strategy at time t is not yet known but those at later times are already computed. If we use control variable $\xi(t)$ at time t, and the optimal control thereafter, we get payout depending on the state at time t+1:

$$ E\bigl[ f(X(t+1),t+1) \mid X(t) = x, \ \xi(t) \bigr] = \sum_y P_{xy}(\xi(t)) \, f(y,t+1) . $$

Maximizing this expected payout over $\xi(t)$ gives the optimal expected payout at time t:

$$ f(x,t) = \max_{\xi(t)} \sum_y P_{xy}(\xi(t)) \, f(y,t+1) . \qquad (13) $$

This is the principle of dynamic programming. We replace the ``multiperiod optimization problem'' (11) with a sequence of hopefully simpler ``single period'' optimization problems (13) for the cost to go function.
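
The backward sweep can be sketched in a few lines. This is a minimal illustration, not a definitive implementation: it assumes finitely many states and controls, a payout `V[x]` received only at the final time `T`, and a hypothetical table `P[u][x][y]` of controlled transition probabilities with made-up numbers. The function name `solve_dp` is likewise invented for the example.

```python
# Hypothetical data: P[u][x][y] is the probability of x -> y under control u,
# and V[x] is the payout in state x at the final time.
P = {
    0: [[0.9, 0.1], [0.5, 0.5]],
    1: [[0.6, 0.4], [0.7, 0.3]],
}
V = [0.0, 1.0]

def solve_dp(P, V, T):
    """Return the cost-to-go table f[t][x] and an optimal feedback rule policy[t][x]."""
    n = len(V)
    controls = sorted(P)
    f = [None] * (T + 1)
    policy = [None] * T
    f[T] = list(V)  # "initial data" at the final time
    for t in range(T - 1, -1, -1):
        f[t], policy[t] = [], []
        for x in range(n):
            # Single-period problem: expected value at t+1 under each control.
            q = {u: sum(P[u][x][y] * f[t + 1][y] for y in range(n)) for u in controls}
            best = max(controls, key=lambda u: q[u])
            policy[t].append(best)
            f[t].append(q[best])
    return f, policy

f, policy = solve_dp(P, V, T=2)
print(f[0], policy)
```

Each time step costs one small maximization per state, so the work grows linearly in `T`, whereas enumerating whole strategies grows exponentially.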


Jonathan Goodman
Tue Sep 15 17:12:32 EDT 1998