The mutual information (MI) between two random variables, such as
stimuli S and neural responses R, is defined in terms of their joint
distribution $p(S,R)$. When this distribution is known exactly, the MI
can be calculated as
$$I(S;R) = \sum_{s,r} p(s,r) \log \frac{p(s,r)}{p(s)\,p(r)} \qquad (1)$$
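As a minimal sketch of the MI calculation for a fully known joint distribution, the following function evaluates the sum over all stimulus-response pairs. The function name `mutual_information`, the list-of-rows layout, and the choice of a base-2 logarithm (so that the result is in bits) are assumptions made for illustration.

```python
import math

def mutual_information(joint):
    """MI in bits of a joint distribution given as a list of rows.

    joint[i][j] is p(s_i, r_j); entries are non-negative and sum to 1.
    The base-2 logarithm is an assumption made for this sketch.
    """
    p_s = [sum(row) for row in joint]          # marginal over responses
    p_r = [sum(col) for col in zip(*joint)]    # marginal over stimuli
    mi = 0.0
    for i, row in enumerate(joint):
        for j, p in enumerate(row):
            if p > 0:                          # 0 * log 0 is taken as 0
                mi += p * math.log2(p / (p_s[i] * p_r[j]))
    return mi

# Two equiprobable stimuli with perfectly distinct responses.
perfect = [[0.5, 0.0], [0.0, 0.5]]
# Responses that are independent of the stimulus.
independent = [[0.25, 0.25], [0.25, 0.25]]

print(mutual_information(perfect))      # 1.0
print(mutual_information(independent))  # 0.0
```

The two toy distributions mark the extremes: one bit when the response identifies one of two equiprobable stimuli, and zero when the response carries no information about the stimulus.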
Estimating MI from empirical data commonly involves two steps: first, estimating the joint distribution of stimuli and simplified responses, and second, calculating the MI from this estimated distribution. The first step requires estimating the distribution of neural responses for each stimulus. For example, when the quantity of interest is the information in spike counts, one calculates the distribution of the number of spikes in the responses, as measured across repeated presentations of each stimulus separately. Repeating this calculation for each stimulus yields the joint distribution of stimuli and responses.

An example of this procedure (using what is known as the maximum likelihood estimator) is given in Fig. 3 of the paper. Figure 3b shows raster plots of the responses to five different stimuli, and the number of spikes in each presentation of the first stimulus is given in Table 1a below. The corresponding distribution of spike counts for the first stimulus is given in Table 1b below, and the distributions of spike counts for five representative stimuli are depicted in Fig. 3c. Figure 3d assembles all of these distributions, forming the empirical joint distribution of stimuli and spike counts.

Other statistics of spike patterns can be used instead of spike counts. For example, spike trains can be viewed as binary ``words'' of some fixed length, and their distribution can be estimated in the same way as the spike count distribution, by counting the number of appearances of each word across repeated stimulus presentations (Fig. 3e).
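Table 1. (a) Number of spikes in each presentation of the first stimulus. (b) Distribution of spike counts for the first stimulus.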
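The first step, building the empirical joint distribution by counting, can be sketched as follows. The function name `empirical_joint` and the spike counts are made up for illustration and are not data from the paper. The sketch also assumes that each stimulus is weighted by its number of presentations, so the stimulus probabilities are uniform when all stimuli are repeated equally often.

```python
from collections import Counter

def empirical_joint(counts_per_stimulus):
    """Maximum likelihood (frequency) estimate of the joint distribution.

    counts_per_stimulus[i] lists the spike counts observed over the
    repeated presentations of stimulus i. Returns (joint, values), where
    joint[i][j] is the fraction of all trials in which stimulus i was
    presented and the spike count was values[j].
    """
    n_total = sum(len(trials) for trials in counts_per_stimulus)
    # All distinct responses observed for any stimulus, in sorted order.
    values = sorted({c for trials in counts_per_stimulus for c in trials})
    joint = []
    for trials in counts_per_stimulus:
        tally = Counter(trials)                 # missing values count as 0
        joint.append([tally[v] / n_total for v in values])
    return joint, values

# Hypothetical spike counts: two stimuli, four presentations each.
trials = [[0, 1, 1, 2], [2, 3, 3, 3]]
joint, values = empirical_joint(trials)

print(values)  # [0, 1, 2, 3]
print(joint)   # [[0.125, 0.25, 0.125, 0.0], [0.0, 0.0, 0.125, 0.375]]
```

The same counting applies to binary words if each response is represented as a tuple of 0s and 1s rather than an integer count, since tuples can be tallied and sorted in the same way.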

The second step is to calculate MI from the joint distribution. When
the number of samples is very large relative to the number of bins in
the joint distribution matrix, the observed empirical joint
distribution $\hat{p}(S,R)$ provides a good estimate of the true underlying
distribution, and the MI can be calculated by plugging the
empirical distribution into the MI formula,
$$\hat{I}(S;R) = \sum_{s,r} \hat{p}(s,r) \log \frac{\hat{p}(s,r)}{\hat{p}(s)\,\hat{p}(r)} \qquad (2)$$
(3) 
(4) 
Since the bias is roughly proportional to the number of bins in the joint distribution matrix, we applied a procedure that iteratively unites rows or columns of the matrix. At each step, the row or column with the minimum marginal probability was united with whichever of its neighbours had the lower marginal probability. The MI was taken to be the largest bias-corrected estimate among all tested reduced matrices. This matrix reduction reduces the information in the matrix, but at the same time reduces the bias, and therefore makes it possible to obtain higher and more reliable estimates of the MI. The performance of this algorithm is discussed in detail in [3].
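The reduction procedure can be sketched as below. This is an illustration under stated assumptions, not the implementation evaluated in [3]: the bias term `(rows - 1) * (cols - 1) / (2 N ln 2)` is a commonly used first-order correction adopted here as a stand-in, the reduction stops once the matrix reaches 2 x 2, and ties are broken by position. All function names are hypothetical.

```python
import math

def plugin_mi(counts):
    """Plug-in MI estimate (bits) from a matrix of trial counts."""
    n = sum(map(sum, counts))
    row_tot = [sum(r) for r in counts]
    col_tot = [sum(c) for c in zip(*counts)]
    mi = 0.0
    for i, row in enumerate(counts):
        for j, c in enumerate(row):
            if c > 0:
                mi += (c / n) * math.log2(c * n / (row_tot[i] * col_tot[j]))
    return mi

def corrected_mi(counts):
    """Plug-in MI minus an ASSUMED first-order bias term."""
    n = sum(map(sum, counts))
    bias = (len(counts) - 1) * (len(counts[0]) - 1) / (2 * n * math.log(2))
    return plugin_mi(counts) - bias

def transpose(m):
    return [list(c) for c in zip(*m)]

def reduce_step(counts):
    """Unite the row or column with the smallest marginal with its
    lower-marginal neighbour. Returns None when no dimension has more
    than two lines (an assumed stopping rule)."""
    candidates = []                              # (marginal, axis, index)
    for axis, mat in ((0, counts), (1, transpose(counts))):
        if len(mat) > 2:
            totals = [sum(line) for line in mat]
            i = min(range(len(totals)), key=totals.__getitem__)
            candidates.append((totals[i], axis, i))
    if not candidates:
        return None
    _, axis, i = min(candidates)
    mat = counts if axis == 0 else transpose(counts)
    totals = [sum(line) for line in mat]
    neighbours = [k for k in (i - 1, i + 1) if 0 <= k < len(mat)]
    j = min(neighbours, key=totals.__getitem__)
    lo, hi = min(i, j), max(i, j)
    merged = [a + b for a, b in zip(mat[lo], mat[hi])]
    mat = mat[:lo] + [merged] + mat[hi + 1:]
    return mat if axis == 0 else transpose(mat)

def best_reduced_mi(counts):
    """Largest bias-corrected estimate over all tested reduced matrices."""
    best = corrected_mi(counts)
    current = reduce_step(counts)
    while current is not None:
        best = max(best, corrected_mi(current))
        current = reduce_step(current)
    return best

# Hypothetical counts: 2 stimuli (rows) x 4 spike-count bins (columns).
counts = [[5, 1, 0, 0], [0, 0, 1, 5]]
print(reduce_step(counts))  # [[5, 1, 0], [0, 1, 5]]
print(best_reduced_mi(counts))
```

Working on trial counts rather than probabilities keeps the total number of samples N available for the bias term, and uniting two lines is then a simple elementwise sum that leaves N unchanged.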