Information Theory: T. Cover and J. Thomas, "Elements of Information Theory", Chapters 2, 3
Convex Optimization: S. Boyd and L. Vandenberghe, "Convex Optimization", Chapter 3.
Linear Algebra background: "Convex Optimization", Appendix
Statistics: Daphne Koller and Nir Friedman, "Bayesian Networks and Beyond" (BN Beyond), Draft book, Chapter 2
Representation: BN Beyond
Parameter Estimation: BN Beyond.
Expectation Maximization: R. Neal, G. Hinton. A view of the EM algorithm that justifies incremental, sparse, and other variants.
Exact Inference: BN Beyond.
Variational Inference: Martin Wainwright and Michael Jordan, A Variational Principle for Graphical Models.
Sampling 1: BN Beyond.
Sampling 2: R. Neal, Probabilistic inference using MCMC methods, Chapter 3
Chris Bishop, Pattern Recognition and Machine Learning (the entire book)
(This book comprehensively covers most of the topics in machine learning)
Book 1: Visual Perception: Key Readings
Edited by Steven Yantis (Psychology Press) 2000.
(A collection of 25 essential papers in vision science/neuroscience.)
Book 2: Foundations of Vision by Brian A. Wandell (Sinauer) 1995.
Deep belief networks:
Hinton, G. E. and Salakhutdinov, R. R., Reducing the dimensionality of data with neural networks. Science, Vol. 313, no. 5786, pp. 504-507, 28 July 2006.
Geoffrey E. Hinton, What kind of a graphical model is the brain?
Olshausen BA, Field DJ (2004). Sparse Coding of Sensory Inputs, Current Opinion in Neurobiology, 14: 481-487.
A. Hyvärinen and P.O. Hoyer. Emergence of phase and shift invariant features by decomposition of natural images into independent feature subspaces. Neural Computation, 12(7):1705-1720, 2000.
Slow feature analysis: Berkes, Pietro and Wiskott, Laurenz. Slow feature analysis yields a rich repertoire of complex cell properties. Journal of Vision, 5(6):579-602, 2005.