Information Theory: T. Cover and J. Thomas, "Elements of Information Theory", Chapters 2, 3
Convex Optimization: S. Boyd and L. Vandenberghe, "Convex Optimization", Chapter 3 (available online).
Linear Algebra background: "Convex Optimization", Appendix
Statistics: Daphne Koller and Nir Friedman, "Bayesian Networks and Beyond" (BN Beyond), draft book, Chapter 2.
Representation: BN Beyond
Learning:
Parameter Estimation: BN Beyond.
Expectation Maximization: R. Neal, G. Hinton. A view of the EM algorithm that justifies incremental, sparse, and other variants.
Inference:
Exact Inference: BN Beyond.
Variational Inference: Martin Wainwright and Michael Jordan, A Variational Principle for Graphical Models.
Sampling 1: BN Beyond.
Sampling 2: R. Neal, Probabilistic inference using MCMC methods, Chapter 3.
Chris Bishop, Pattern Recognition and Machine Learning (the entire book).
(This book comprehensively covers most topics in machine learning.)
Book 1: Visual Perception: Key Readings, edited by Steven Yantis (Psychology Press) 2000.
(A collection of 25 essential papers in vision science/neuroscience.)
Book 2: Foundations of Vision by Brian A. Wandell (Sinauer) 1995.
Deep belief networks:
Hinton, G. E. and Salakhutdinov, R. R., Reducing the dimensionality of data with neural networks. Science, Vol. 313, no. 5786, pp. 504-507, 28 July 2006.
Geoffrey E. Hinton, What kind of a graphical model is the brain?
Sparse coding/ICA:
Olshausen BA, Field DJ (2004). Sparse Coding of Sensory Inputs, Current Opinion in Neurobiology, 14: 481-487.
A. Hyvärinen and P.O. Hoyer. Emergence of phase and shift invariant features by decomposition of natural images into independent feature subspaces. Neural Computation, 12(7):1705-1720, 2000.
Slow feature analysis: Berkes, Pietro and Wiskott, Laurenz. Slow feature analysis yields a rich repertoire of complex cell properties. Journal of Vision, 5(6):579-602, 2005.