To monitor or control a stochastic dynamic system, we need to reason about its current state. Exact inference for this task requires that we maintain a complete joint probability distribution over the possible states, an impossible requirement for most processes. Stochastic simulation algorithms provide an alternative solution by approximating the distribution at time t via a (relatively small) set of samples. The time t samples are used as the basis for generating the samples at time t+1. However, since only existing samples are used as the basis for the next sampling phase, new parts of the space are never explored. We propose an approach whereby we try to generalize from the time t samples to unsampled regions of the state space. Thus, these samples are used as data for learning a distribution over the states at time t, which is then used to generate the time t+1 samples. We examine different representations for a distribution, including density trees, Bayesian networks, and tree-structured Bayesian networks, and evaluate their appropriateness to the task. The machine learning perspective allows us to examine issues such as the tradeoffs of using more complex models, and to utilize important techniques such as regularization and priors. We validate the performance of our algorithm on both artificial and real domains, and show significant improvement in accuracy over the existing approach.