The MI value can be interpreted in a number of equivalent ways:
(i) The MI is the reduction in uncertainty about the stimulus after a response is observed. This is the standard information-theoretic interpretation. In the context studied here, before the response is observed, the stimulus can be any one of the stimuli, resulting in an uncertainty that is quantified by the entropy of the set of stimuli, here equal to 3.9 bits. If the mutual information between a neuron and the stimuli is, e.g., 0.5 bit, observing the responses of the neuron reduces this entropy to 3.4 bits. Thus, the a-posteriori distribution over possible stimuli is less variable than the initial distribution. Observing more non-redundant neurons would reduce this uncertainty even further. If the uncertainty about the stimulus is 0 bits, the stimulus is known with precision. Thus, theoretically, the responses of about 8 totally non-redundant neurons (3.9/0.5 ≈ 8), each with 0.5 bit/stimulus, are sufficient to fully specify the stimulus. In practice, neurons may be redundant and the actual number of neurons required to uniquely identify the stimulus may be substantially higher.
(ii) The MI is the logarithm (base 2) of the number of different classes into which the stimuli can be subdivided after observing a response. For example, an MI of 1 bit corresponds to sorting the stimuli into two equally likely classes. This interpretation is tightly linked to the previous one, and is a concrete interpretation of the reduction in uncertainty.
(iii) The MI quantifies the differences between the responses to the various stimuli. The MI can be formally written as the average divergence between the distribution of responses to a specific stimulus and the unconditional response distribution (the superposition of the individual response distributions to each of the specific stimuli). The divergence used here is the Kullback-Leibler (KL) divergence, $D_{KL}[p \,\|\, q] = \sum_r p(r) \log_2 \frac{p(r)}{q(r)}$. Thus, if the responses are independent of the stimulus, the response distribution for any of the stimuli will be very similar (up to sampling issues) to the average distribution, and the KL divergence will be small, resulting in a low MI. Conversely, when the responses strongly depend on the stimulus, the average response distribution is very different from the response distribution to any specific stimulus, and as a result the KL divergence is large and so is the MI.
In this view, the MI quantifies the stimulus effects on the responses. This view is closer to standard statistical tests such as one-way ANOVA (which tests for stimulus effects on the mean responses, assuming equal variance). However, standard statistical tests often make strong assumptions about the distributions of the responses, whereas the MI can be interpreted without any distributional assumptions.
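The equivalence of these interpretations can be checked numerically. The following is a minimal sketch, not taken from the original analysis: the toy stimulus set (4 equiprobable stimuli, 2 response classes), the probabilities, and all function names are illustrative assumptions. It computes the MI once as the entropy reduction of interpretation (i) and once as the average KL divergence of interpretation (iii), and then converts the result into the effective number of classes of interpretation (ii).

```python
import math


def entropy(p):
    """Shannon entropy (in bits) of a discrete probability vector."""
    return -sum(x * math.log2(x) for x in p if x > 0)


def kl_divergence(p, q):
    """KL divergence D_KL[p || q] in bits; assumes q > 0 wherever p > 0."""
    return sum(x * math.log2(x / y) for x, y in zip(p, q) if x > 0)


def mutual_information_views(p_s, p_r_given_s):
    """Return the MI computed as (entropy reduction, average KL divergence).

    p_s          -- prior probabilities of the stimuli
    p_r_given_s  -- one response distribution (list) per stimulus
    """
    n_r = len(p_r_given_s[0])

    # Unconditional response distribution: mixture of per-stimulus distributions.
    p_r = [sum(ps * row[r] for ps, row in zip(p_s, p_r_given_s))
           for r in range(n_r)]

    # Interpretation (i): H(S) - H(S|R), the average reduction in uncertainty.
    h_s_given_r = 0.0
    for r in range(n_r):
        if p_r[r] == 0:
            continue
        posterior = [ps * row[r] / p_r[r] for ps, row in zip(p_s, p_r_given_s)]
        h_s_given_r += p_r[r] * entropy(posterior)
    mi_entropy = entropy(p_s) - h_s_given_r

    # Interpretation (iii): average over stimuli of D_KL[p(r|s) || p(r)].
    mi_kl = sum(ps * kl_divergence(row, p_r)
                for ps, row in zip(p_s, p_r_given_s))

    return mi_entropy, mi_kl


# Hypothetical toy data: two stimuli mostly evoke response 0, two mostly response 1.
p_s = [0.25, 0.25, 0.25, 0.25]
p_r_given_s = [[0.9, 0.1], [0.9, 0.1], [0.1, 0.9], [0.1, 0.9]]

mi_entropy, mi_kl = mutual_information_views(p_s, p_r_given_s)

# Interpretation (ii): effective number of distinguishable stimulus classes.
n_classes = 2 ** mi_kl

print(f"MI as entropy reduction: {mi_entropy:.3f} bits")
print(f"MI as average KL:        {mi_kl:.3f} bits")
print(f"Effective classes:       {n_classes:.2f}")
```

In this toy case both computations give roughly 0.53 bit, so a single response leaves the observer, on average, about 0.53 bit less uncertain than the 2 bits of prior stimulus entropy. The sketch works on known probabilities; estimating the MI from a finite sample of responses raises additional bias issues that it does not address.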