We propose a new approach to the problem of searching a space of stochastic
controllers for a Markov decision process (MDP) or a partially observable 
Markov decision process (POMDP).  Following several other authors, our 
approach is based on searching in parameterized families of policies 
(for example, via gradient descent) to optimize solution quality.  However,
rather than trying to estimate the values and derivatives of a policy directly, 
we do so indirectly using estimates for the probability densities that the 
policy induces on states at the different points in time.  This enables
our algorithms to exploit the many techniques for efficient and robust 
approximate density propagation in stochastic systems.  We show how our 
techniques can be applied both to deterministic propagation schemes (where 
the MDP's dynamics are given explicitly in compact form) and to stochastic 
propagation schemes (where we have access only to a generative model, 
or simulator, of the MDP). We present empirical results for both 
of these variants on complex problems.
<p>