**Information Theory**: T. Cover and J. Thomas, "Elements of Information Theory", Chapters 2, 3

**Convex Optimization**: S. Boyd and L. Vandenberghe, "Convex Optimization", Chapter 3 (available online).

**Linear Algebra background**: "Convex Optimization", Appendix

**Statistics**: Daphne Koller and Nir Friedman, "Bayesian Networks and Beyond" (BN Beyond), draft book, Chapter 2

**Representation**: BN Beyond

**Learning**:

**Parameter Estimation**: BN Beyond.

**Expectation Maximization**: R. Neal, G. Hinton. A view of the EM algorithm that justifies incremental, sparse, and other variants.

**Inference**:

**Exact Inference**: BN Beyond.

**Variational Inference**: Martin Wainwright and Michael Jordan, A Variational Principle for Graphical Models.

**Sampling 1**: BN Beyond.

**Sampling 2**: R. Neal, Probabilistic inference using MCMC methods, Chapter 3

Chris Bishop, Pattern Recognition and Machine Learning, (the entire book)

(This book comprehensively covers most of the topics in machine learning)

**Book 1**: Visual Perception: Key Readings, edited by Steven Yantis (Psychology Press), 2000.

(A collection of 25 essential papers in vision science/neuroscience.)

**Book 2**: Foundations of Vision by Brian A. Wandell (Sinauer), 1995.

**Deep belief networks**:

Hinton, G. E. and Salakhutdinov, R. R. Reducing the dimensionality of data with neural networks. Science, Vol. 313, no. 5786, pp. 504-507, 28 July 2006.

Geoffrey E. Hinton, What kind of a graphical model is the brain?

**Sparse coding/ICA**:

Olshausen BA, Field DJ (2004). Sparse Coding of Sensory Inputs. Current Opinion in Neurobiology, 14: 481-487.

A. Hyvärinen and P.O. Hoyer. Emergence of phase and shift invariant features by decomposition of natural images into independent feature subspaces. Neural Computation, 12(7):1705-1720, 2000.

**Slow feature analysis**:

Berkes, Pietro and Wiskott, Laurenz. Slow feature analysis yields a rich repertoire of complex cell properties. Journal of Vision, 5(6):579-602, 2005.