On this page, we explain in further detail the algorithm used to visualize optimal stimuli and invariances, which is mentioned in Visualization of optimal stimuli and invariances for Tiled Convolutional Neural Networks.
Note that for the equations on this page to display correctly, JavaScript must be enabled, as we make use of MathJax.
The algorithm we implement is an extension of the method described in [1] to arbitrary activation functions. For consistency, we borrow their notation here. Denote the unit sphere by $S$. Given a neuron with activation function $g$, we proceed as follows:

1. Find the optimal stimulus $x^{+}=\text{argmax}_{x\in S}\, g(x)$.

2. Compute the Hessian $H$ of $g$ at $x^{+}$, and let $B$ be a matrix whose columns form an orthonormal basis of the subspace orthogonal to $x^{+}$. The eigenvectors of $\tilde{H}=B^{T}HB$ with the largest and smallest eigenvalues give the most and least invariant directions of $g$ at $x^{+}$, respectively.
For many common activation functions, step 1 can be performed by finding $x^{+}=\text{argmax}_{||x||\leq1}\, g(x)$. As the norm ball is convex, this optimization is often easier to perform, and for single-layered TCNNs, it can be done analytically. For multi-layered TCNNs, we perform both steps 1 and 2 numerically.
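As a minimal sketch of performing step 1 numerically, one can run projected gradient ascent on the unit sphere. The activation $g(x)=\tanh(f^{T}x)$ and the filter $f$ below are hypothetical stand-ins chosen so that the maximizer on the sphere is known (it is $f$ itself), not the actual TCNN neuron:

```python
import numpy as np

def optimal_stimulus(grad_g, n, steps=2000, lr=0.1, seed=0):
    """Numerically approximate x+ = argmax_{x in S} g(x) by projected
    gradient ascent: take a gradient step, then renormalize back onto
    the unit sphere."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(n)
    x /= np.linalg.norm(x)
    for _ in range(steps):
        x = x + lr * grad_g(x)
        x /= np.linalg.norm(x)  # project back onto S
    return x

# Hypothetical activation g(x) = tanh(f . x) for a unit-norm filter f;
# its maximizer on the unit sphere is f itself.
f = np.array([0.6, 0.8])
grad_g = lambda x: (1.0 - np.tanh(f @ x) ** 2) * f
x_plus = optimal_stimulus(grad_g, n=2)
# x_plus approaches f = [0.6, 0.8]
```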
To create the videos shown on this site, we chose $c=0.7$.
Here, we show that the algorithm above finds the most (and least) invariant direction of $g$ at $x^{+}$, where the most invariant direction is defined as the direction in which $g$ changes the least in a small neighborhood of $x^{+}$. To do this, we study the geodesic $\varphi(t)=\cos t\cdot x^{+}+\sin t\cdot w$, where $w$ is a unit vector orthogonal to $x^{+}$; note that $\varphi(0)=x^{+}$ and that $\varphi(t)$ remains on $S$. Since $x^{+}$ maximizes $g$ on $S$, we have $\frac{d}{dt}(g\circ\varphi)(0)=0$ regardless of $w$, so we can find the most invariant direction $w^{+}$ by finding the $w$ for which $\frac{d^{2}}{dt^{2}}(g\circ\varphi)(0)$ is largest, i.e., least negative. The proof that follows is a generalization of equations (34) to (44) in [1].
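Before giving the proof, a quick numerical sanity check that the first derivative indeed vanishes for every admissible $w$. The activation $g(x)=\tanh(f^{T}x)$ and filter $f$ below are hypothetical stand-ins whose maximizer on $S$ is known to be $f$:

```python
import numpy as np

# Hypothetical activation g(x) = tanh(f . x) with unit-norm f, so that
# its maximizer on the unit sphere is x+ = f.
f = np.array([0.6, 0.8, 0.0])
g = lambda x: np.tanh(f @ x)
x_plus = f

def phi(t, w):
    """Geodesic through x+ in direction w (w unit and orthogonal to x+)."""
    return np.cos(t) * x_plus + np.sin(t) * w

# Central differences: d/dt (g o phi)(0) vanishes for any such w.
h = 1e-5
for w in (np.array([-0.8, 0.6, 0.0]), np.array([0.0, 0.0, 1.0])):
    d1 = (g(phi(h, w)) - g(phi(-h, w))) / (2 * h)
    assert abs(d1) < 1e-8
```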
Denote the Jacobian of any function $f$ by $Df$, and denote by $H=D^{2}g\left(\varphi(t)\right)$ the Hessian of $g$ evaluated at $\varphi(t)$. We thus have \begin{eqnarray*} \varphi(t) & = & \cos t\cdot x^{+}+\sin t\cdot w,\\ D\varphi(t)=\left(\begin{array}{c} \frac{d\varphi_{1}(t)}{dt}\\ \vdots\\ \frac{d\varphi_{n}(t)}{dt}\end{array}\right) & = & -\sin t\cdot x^{+}+\cos t\cdot w,\\ D^{2}\varphi(t)=\left(\begin{array}{c} \frac{d^{2}\varphi_{1}(t)}{dt^{2}}\\ \vdots\\ \frac{d^{2}\varphi_{n}(t)}{dt^{2}}\end{array}\right) & = & -\varphi(t),\end{eqnarray*} and by applying the chain rule, we find \begin{eqnarray*} \frac{d}{dt}(g\circ\varphi)(t) & = & Dg\left(\varphi(t)\right)D\varphi(t)\\ & = & \sum_{i=1}^{n}\frac{\partial g\left(\varphi(t)\right)}{\partial x_{i}}\cdot\left(D\varphi(t)\right)_{i},\end{eqnarray*} \begin{eqnarray*} \frac{d^{2}}{dt^{2}}(g\circ\varphi)(t) & = & \frac{d}{dt}\left(\sum_{i=1}^{n}\frac{\partial g\left(\varphi(t)\right)}{\partial x_{i}}\cdot\left(D\varphi(t)\right)_{i}\right)\\ & = & \sum_{i=1}^{n}\left(\frac{d}{dt}\frac{\partial g\left(\varphi(t)\right)}{\partial x_{i}}\right)\cdot\left(D\varphi(t)\right)_{i}+\sum_{i=1}^{n}\frac{\partial g\left(\varphi(t)\right)}{\partial x_{i}}\cdot\left(D^{2}\varphi(t)\right)_{i}\\ & = & \sum_{i=1}^{n}\sum_{j=1}^{n}\frac{\partial^{2}g\left(\varphi(t)\right)}{\partial x_{j}\partial x_{i}}\cdot\left(D\varphi(t)\right)_{i}\cdot\left(D\varphi(t)\right)_{j}-Dg\left(\varphi(t)\right)\varphi(t)\\ & = & \left(D\varphi(t)\right)^{T}H\left(D\varphi(t)\right)-Dg\left(\varphi(t)\right)\varphi(t)\\ & = & \cos^{2}t\cdot w^{T}Hw+\sin^{2}t\cdot x^{+T}Hx^{+}-\sin2t\cdot x^{+T}Hw-Dg\left(\varphi(t)\right)(\cos t\cdot x^{+}+\sin t\cdot w).\end{eqnarray*}
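The final expression can be sanity-checked numerically: the computation above is pure calculus, so it holds for any unit $x^{+}$ and unit $w\perp x^{+}$, at any $t$. Below, a hypothetical smooth $g$ with analytic gradient and Hessian is compared against a finite-difference second derivative of $g\circ\varphi$:

```python
import numpy as np

# Hypothetical smooth g(x) = tanh(f . x) + 0.3 sin(a . x), with
# Dg(x)   = sech^2(f.x) f + 0.3 cos(a.x) a
# D^2g(x) = -2 tanh(f.x) sech^2(f.x) f f^T - 0.3 sin(a.x) a a^T
f = np.array([0.6, 0.8, 0.0])
a = np.array([0.0, 0.5, 1.0])
g = lambda x: np.tanh(f @ x) + 0.3 * np.sin(a @ x)
grad = lambda x: (1 - np.tanh(f @ x) ** 2) * f + 0.3 * np.cos(a @ x) * a
hess = lambda x: (-2 * np.tanh(f @ x) * (1 - np.tanh(f @ x) ** 2) * np.outer(f, f)
                  - 0.3 * np.sin(a @ x) * np.outer(a, a))

x_plus = f                      # unit vector
w = np.array([0.0, 0.0, 1.0])   # unit, orthogonal to x_plus
phi = lambda t: np.cos(t) * x_plus + np.sin(t) * w

t, h = 0.3, 1e-4
# Left-hand side: finite-difference second derivative of g o phi at t.
lhs = (g(phi(t + h)) - 2 * g(phi(t)) + g(phi(t - h))) / h**2
# Right-hand side: the closed-form expression, with H evaluated at phi(t).
H = hess(phi(t))
rhs = (np.cos(t) ** 2 * (w @ H @ w)
       + np.sin(t) ** 2 * (x_plus @ H @ x_plus)
       - np.sin(2 * t) * (x_plus @ H @ w)
       - grad(phi(t)) @ phi(t))
assert np.isclose(lhs, rhs, atol=1e-6)
```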
Now, at the optimal point $x^{+}$, which corresponds to $t=0$, we have \[ \frac{d^{2}}{dt^{2}}(g\circ\varphi)(0)=w^{T}Hw-\left(Dg(x^{+})\right)x^{+},\] and since $\left(Dg(x^{+})\right)x^{+}$ does not depend on $w$, we can maximize $\frac{d^{2}}{dt^{2}}(g\circ\varphi)(0)$ with respect to $w$ by maximizing $w^{T}Hw$ alone. Since $w$ is constrained to be orthogonal to $x^{+}$, this corresponds to finding the eigenvector of $\tilde{H}=B^{T}HB$ that has the largest eigenvalue, where the columns of $B$ form an orthonormal basis of the subspace orthogonal to $x^{+}$. Hence, the algorithm in the above section finds the most invariant direction of $g$ at $x^{+}$.
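This eigenvector step can be sketched with a numerically estimated Hessian. The activation below is a hypothetical example constructed so that the most invariant direction at $x^{+}$ is known in closed form:

```python
import numpy as np

# Hypothetical activation: a filter response plus a small penalty on the
# third coordinate. Its maximizer on the unit sphere is x+ = f, and at x+
# the direction [-0.8, 0.6, 0] is most invariant (restricted eigenvalue 0)
# while [0, 0, 1] is least invariant (restricted eigenvalue -0.1).
f = np.array([0.6, 0.8, 0.0])
g = lambda x: np.tanh(f @ x) - 0.05 * x[2] ** 2
x_plus = f

# Hessian of g at x+ by central finite differences.
n, h = 3, 1e-4
H = np.zeros((n, n))
E = np.eye(n)
for i in range(n):
    for j in range(n):
        H[i, j] = (g(x_plus + h*E[i] + h*E[j]) - g(x_plus + h*E[i] - h*E[j])
                   - g(x_plus - h*E[i] + h*E[j]) + g(x_plus - h*E[i] - h*E[j])) / (4*h*h)

# Columns of B: an orthonormal basis of the subspace orthogonal to x+.
U, _, _ = np.linalg.svd(x_plus.reshape(-1, 1), full_matrices=True)
B = U[:, 1:]

# Most invariant direction: top eigenvector of H~ = B^T H B, mapped back.
evals, evecs = np.linalg.eigh(B.T @ H @ B)   # eigenvalues in ascending order
w_plus = B @ evecs[:, -1]
# w_plus is (up to sign) [-0.8, 0.6, 0]
```

Any orthonormal basis works for $B$; here it is read off the full SVD of $x^{+}$ viewed as an $n\times1$ matrix.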
Note that for the special case of $g(x)=\frac{1}{2}x^{T}Hx+f^{T}x+c$, this gives $\frac{d^{2}}{dt^{2}}(g\circ\varphi)(0)=w^{T}Hw-x^{+T}Hx^{+}-f^{T}x^{+}$, as is derived in [1]. A technicality: while the analysis in [1] deals with $\tilde{g}=g|_{S}$, the two approaches are equivalent, as $\varphi([0,2\pi])\subset S$ implies $g\circ\varphi=\tilde{g}\circ\varphi$.
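This special case can be verified numerically for a random inhomogeneous quadratic form. Here $x^{+}$ is just an arbitrary unit vector, since the identity at $t=0$ follows from the chain-rule computation above without using optimality:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4
# A random quadratic form g(x) = 1/2 x^T H x + f^T x + c0.
A = rng.standard_normal((n, n))
H = (A + A.T) / 2                      # symmetric Hessian of g
f = rng.standard_normal(n)
c0 = rng.standard_normal()
g = lambda x: 0.5 * x @ H @ x + f @ x + c0

# Arbitrary unit x+ and unit w orthogonal to it.
x_plus = rng.standard_normal(n)
x_plus /= np.linalg.norm(x_plus)
w = rng.standard_normal(n)
w -= (w @ x_plus) * x_plus
w /= np.linalg.norm(w)

phi = lambda t: np.cos(t) * x_plus + np.sin(t) * w
h = 1e-4
d2 = (g(phi(h)) - 2 * g(phi(0.0)) + g(phi(-h))) / h**2   # finite difference
closed_form = w @ H @ w - x_plus @ H @ x_plus - f @ x_plus
assert np.isclose(d2, closed_form, atol=1e-5)
```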
[1] Berkes, P. and Wiskott, L. On the analysis and interpretation of inhomogeneous quadratic forms as receptive fields. Neural Computation, 2006.