Meeting
points in a triangle: orthocenter (altitudes), incenter (angle bisectors;
inscribed circle), centroid

(medians), circumcenter (perpendicular
bisectors; circumscribed circle)

Parabola:
x^{2} = 4py ; focus (0,p), directrix y = -p

Ellipse:
x^{2}/a^{2} + y^{2}/b^{2} = 1; foci (+-sqrt(a^{2}-b^{2}),0),
vertices (+-a,0)

Hyperbola:
x^{2}/a^{2} - y^{2}/b^{2} = 1
; foci (+-sqrt(a^{2}+b^{2}),0), vertices (+-a,0),
asymptotes y = +-(b/a)x

abscissa
(x), ordinate (y)

cos(pi/3)
= 1/2

sin(2x)
= 2sin(x)cos(x); cos^{2}(x) = (1+cos(2x))/2; sin^{2}(x) =
(1-cos(2x))/2

Law
of Cosines: a^{2} = b^{2} + c^{2} -2*b*c*cosA

Law
of Sines: (sinA)/a = (sinB)/b = (sinC)/c

Dot
(scalar) product: dot(a,b) = |a||b|cos(q)

Cross
(vector) product: cross(a,b) = |a||b|sin(q)n

Scalar
projection: comp_{a}b = dot(a,b)/|a|

Vector
projection: proj_{a}b = dot(a,b)/|a|^{2}*a

Squeeze
Theorem: limit of a function between two functions

Isaac
Newton (British, 1642) attended lectures given by Isaac Barrow and published *Principia
*

*Mathematica* in 1687 due to the
encouragement of Halley.

Augustin-Louis
Cauchy (French, 1789) developed the rigorous definition of limits.

Intermediate
Value Theorem: for any N between f(a) and f(b), there is a c in [a,b] such that f(c) = N

f'(x)
= lim (f(x+h)-f(x))/h

Gottfried
Leibniz (German, 1646) worked as a diplomat and developed calculus, publishing
it in 1684, and

used the Leibnez notation
(d/dx).

Binomial
Theorem: (x+y)^{n} = x^{n} + nx^{n-1}y
+ . . . + (n,k)x^{n-k}y^{k} + . . . + nxy^{n-1} + y^{n}

Poiseuille
developed the law of laminar flow, relating velocity and radius in fluid flow
in a tube; flux is

proportional to the fourth
power of the radius.

Derivatives:
sin = cos, cos = -sin, tan = sec^{2}, csc = -csc*cot, sec = sec*tan,
cot = -csc^{2}

Chain
Rule: dy/dx = (dy/du)*(du/dx)

Niels
Abel (Norway, 1800s) showed that there is no general formula for a fifth-degree
equation.

A
cycloid is the curve traced out by a point on the circumference of a circle as
the circle rolls along a line.

Newton-Raphson
method for approximating zeros of a function.

Hadamard
and de la Vallee Poussin proved the Prime Number Theorem, proposed by Gauss.

Newton's
law of cooling states that the rate of cooling of an object is proportional to
the temperature

difference between the
object and its surroundings.

Catenary
curve: y = C+a*cosh(x/a)

L'Hospital's
rule: lim(f/g) = lim(f'/g')

Extreme
Value Theorem: f has an absolute minimum and maximum in [a,b]

Fermat's
Theorem: f'(c) = 0 at max or min point c

Pierre
Fermat (French, 1601) was a lawyer who developed analytical geometry with Rene
Descarte.

Critical
number: f'(c) = 0 or does not exist

Mean
Value Theorem: f'(c) = (f(b)-f(a))/(b-a)

Joseph-Louis
Lagrange (French, 1736) was born in Italy, developed the Mean Value Theorem,
applied

calculus to the analysis of
the solar system, and worked at the Berlin Academy under Frederick the Great

and the Louvre under Louis
XVI.

First
and Second Derivative Tests for local minima or maxima

Bernhard
Riemann (Germann, 1826), studied under Gauss at Gottingen, defined integrals in
terms of the

limit of an infinite
summation (Riemann sum), and died of TB at 39.

Midpoint
(error over 24n^{2}), Trapezoid (error over 12n^{2}), and
Simpson's (error over 180n^{4}) rules for

approximating integrals

Fundamental
Theorem of Calculus: if g(x) = integral of f(t) from a to x, g'(x) = f(x) ; integral of f(x) from

a to b equals F(b) - F(a)
with F the antiderivative of f

Augustin
Fresnel (French, 1788) developed the Fresnel function (integral of sin(pi*t^{2}/2)
from 0 to x), used

in optics and highway
construction.

Substitution
Rule for integrals

ln(x)
= integral of 1/t from 1 to x

Disk
method for volumes of revolution: integral of pi times f(x)^{2}

Washer
method for volumes of revolution: pi times the integral of f(x)^{2} -
g(x)^{2}

Cylindrical
shells method for volumes: integral of 2 times pi times x times f(x)

Average
value of a function: 1/(b-a) times integral of f(x) from a to b

Mean
Value Theorem for Integrals: integral of f(x) from a to b = f(c)*(b-a)

James
Bernoulli (Swiss, 1700s) introduced Bernoulli numbers: (1/n!) times the
summation from 0 to n of

(n,k)
times b_{k} times x^{n-k} and developed the technique for
solving separable ODEs.

Integration
by parts: integral of u*dv = u*v - integral of v*du

Thomas
Simpson (British, 1700s) was a weaver who studied math and popularized
Simpson's Rule.

An
integral over an infinite region or containing an infinite discontinuity is an
improper integral.

Planck's
Law of Radiation models blackbody radiation.

Logistic
growth: dy/dt = k*y*(M-y)

Arc
length: integral from a to b of square root of one plus (dy/dx)^{2}

Christopher
Wren (British, 1600s) proved that the length of one arch of a cycloid is eight
times the radius

of the generating circle.

Moment:
sum of the mass times position of each particle

The
centroid is the center of mass.

Theorem
of Pappus: The volume of a solid formed by rotating a plane about a line is the
product of the area

of the plane and the
distance traveled by its centroid.

Chebyshev
polynomials: T_{n}(x) = cos(n*arcos(x))

A
monotonic sequence is always increasing (or always decreasing).

Every
bounded, monotonic sequence is convergent.

Sum
of infinite geometric series: a/(1-r)

In
a telescoping sum, the terms cancel in pairs, leaving just the first and the
last, such as the sum of

1/(n*(n+1)).

Harmonic
series: sum of (1/n) for n from 1 to infinity

Tests
for series convergence/divergence: Divergence test, integral test, comparison
test, p-series test (sum

of 1/n^{p} is
convergent if p>1), limit comparison (if lim(a/b) = c, both diverge or both converge).

Alternating
series: terms alternatingly positive and negative, such as the sum of (-1)^{n}^{-1}/n; the alternating

series test says it is
convergent if magnitudes are decreasing and the limit of the non-alternating
sequence

is zero,
and is absolutely convergent if the absolute value of the series
converges.

Ratio
test for absolute convergence (lim(|a_{n+1}/a_{n}|)

Power
series: summation of c_{n}x^{n} ; may
have a radius of convergence

Friedrich
Bessel (German, 1784) developed Bessel functions to solve Kepler's equation for
planetary

motion; Bessel functions are sums
of power series.

James
Gregory (Scottish, 1638) developed the power series for arctan, which leads to
the Leibniz formula

for pi: pi/4 = 1 - 1/3 +1/5
- 1/7 + . . .

Taylor
series: f(x) = summation of f^{(n)}(a)*(x-a)^{n}/n!
; if a = 0, it is the Maclaurin series.

Brook
Taylor (English, 1685) and Colin Maclaurin (Scottish, 1698) represented
functions as sums of power

series, as Gregory and
Bernoulli had done earlier.

e =
1 + 1/1! + 1/2! + 1/3! + . . .

sin(x)
= x - x^{3}/3! + x^{5}/5! - . . .

Parallelepiped
volume: dot(a,cross(b,c))

Ellipsoids:
x^{2}/a^{2} + y^{2}/b^{2} + z^{2}/c^{2}
= 1

Hyperboloids:
x^{2}/a^{2} + y^{2}/b^{2} - z^{2}/c^{2}
= 1 (one sheet) or -x^{2}/a^{2} - y^{2}/b^{2} +
z^{2}/c^{2} = 1 (two sheets)

Cones:
z^{2}/c^{2} = x^{2}/a^{2} + y^{2}/b^{2}

Paraboloids:
z/c = x^{2}/a^{2} + y^{2}/b^{2} (elliptic) or
z/c = x^{2}/a^{2} - y^{2}/b^{2} (hyperbolic)

Cylinders:
x^{2}/a^{2} = y^{2}/b^{2} = 1

A
sharp corner in a vector function graph is a cusp.

Arc
length: integral of r'(t) from a to b

Curvature
(kappa): |dT/ds| = |T'(t)|/|r'(t)| = |r'(t)*r"(t)|/|r'(t)|^{3}

Principal
unit normal vector: T'(t)/|T'(t)|

Binormal
vector: T(t) * N(t)

The
normal plane contains N and B, the osculating plane contains T and N, and the
osculating circle is in

the osculating plane and has
radius 1/curvature.

Level
curves: f(x,y) = k

Clairaut's
Theorem: order of taking partial derivatives does not matter.

Alexis
Clairaut (French, 1713) published the first systematic treatise on 3D analytic
geometry, including

the calculus of space
curves, and developed Clairaut's Theorem.

Solutions
to the Laplace equation (d^{2}u/dx^{2} + d^{2}u/dy^{2}
= 0) are harmonic functions.

Wave
equation: d^{2}u/dt^{2} = a^{2}(d^{2}u/dx^{2})

Chain
Rule: dz/dt = (dz/dx)(dx/dt) + (dz/dy)(dy/dt)

Implicit
Function Theorem: dy/dx = -F_{x}/F_{y}

The
gradient is the vector of partial derivatives of a function and is used with
the del operator.

Second
Derivative test uses sign of f_{xx}f_{yy}-f_{xy}^{2}
and f_{xx} to tell if critical point is min (both > 0) or max (first
>

0 and second < 0)or saddle (first < 0)

Lagrange
multipliers are used to minimize or maximize a function subject to a
constraint; del(f) =

lambda*del(g) and g=k

Fubini's
Theorem: order of taking multiple integrals does not matter.

Guido
Fubini (Italian, 1879) proved a general version of Fubini's Theorem.

A
multiple definite integral is an iterated integral.

A
plane region of type I lies between the graphs of two continuous functions of
x.

Polar
coordinates: x = r*cos(q) ; y = r*sin(q) ; r^{2} = x^{2} + y^{2}
; tan(q) = y/x ; (for cylindrical
add z = z)

Cardioid:
r = c + sin(q)

Rose:
r = cos(aq)

Limacon
(snail): r = 1+c*sin(q)

Conchoid:
r = a + b*sec(q)

Moment
of inertia: mass times square of distance to axis

Radius
of gyration of a lamina about an axis: mR^{2} = I

Surface
area: double integral of square root of f_{x}^{2} + f_{y}^{2}
+ 1

Spherical
coordinates: x = rcos(q)sin(f) ; y = rsin(q)sin(f) ; z = rcos(f) ; r^{2} = x^{2} + y^{2}
+ z^{2}

The
Jacobian is the determinant of the matrix of partial derivatives of a system of
functions.

Jacobians:
r for polar and spherical, r^{2}*sin(f) for spherical

Carl
Jacobi (German, 1804) developed Jacobians for evaluating multiple integrals.

A
vector field is conservative if it is the gradient of some scalar function.

The
fundamental theorem for line integrals: the line integral of the gradient of f
over a smooth curve from a

to b is f(r(b)) - f(r(a))

For
a conservative vector field F = Pi + Qj, dP/dy = dQ/dx

Green's
Theorem: The line integral over a closed line of Pdx + Qdy equals the double
integral of dQ/dx -

dP/dy over the area enclosed
by the curve.

George
Green (English, 1793) was a baker but went to Cambridge at age 40 but died 4
years after

graduation; Lord Kelvin reprinted his
findings.

The
curl of a vector F is the cross product of the gradient and F; the curl of the
gradient of f is zero.

The
divergence of a vector F is the dot product of the gradient and F ; the divergence of the curl of F is 0.

The
divergence of the gradient of a function is the Laplace operator, del^{2}
= del dot del

The
flux of a vector field across a surface is the surface integral of the vector
field over the surface.

Stokes'
Theorem: The line integral over a closed line of a vector field F equals the
double integral of the

curl of F over a surface
bounded by the line.

George
Stokes (Irish, 1819) was a Cambridge professor, studied fluid flow and light
and learned Stokes'

Theorem from Lord Kelvin.

The
line integral of the velocity field dotted with the unit tangent vector is the
circulation.

Gauss's
Theorem (Divergence Theorem or Ostrogradsky's Theorem): the surface integral of
a vector field

F over a closed surface
equals the triple integral of the divergence of F over the volume enclosed by

the surface.

Karl
Friedrich Gauss (German, 1777) studied electrostatics and developed the
Divergence Theorem.

Spring
equation: m(d^{2}x/dt^{2}) + c(dx/dt) + kx = F(t); Kirchoff's
Law: L(d^{2}Q/dt^{2}) + R(dQ/dt) + Q/C = E(t)

(displacement - charge,
velocity - current, mass - inductance, damping constant - resistance, spring

constant - elastance,
external force - electromotive force)

P(n,r) = n!/(n-r)!

Arrangements
of n objects of r types: n!/(n_{1}!*n_{2}!*.
. .*n_{r}!)

C(n,r) = n!/(r!(n-r)!)

Binomial
Theorem: (x+y)^{n} = summation of C(n,k)*x^{k}y^{n-k}
for k from o to n

Combinations
with repetition: C(n+r-1,r) ; distribute r identical
objects n ways

Catalan
Numbers (by Belgian Eugene Catalan, 1814): number of ways to parenthesize a
product or go from

origin to (n,n) below y=x; b_{n} = C(2n,n)/(n+1); 1,1,2,5,14,42,132

DeMorgan's
Laws: -(p v q) = -p A -q ; -(p A q) = -p v -q

Idempotent
Laws: p v p = p ; p A p = p

Domination
Laws: p v T = T ; p A F = F

Absorption
Laws: p v (p A q) = p ; p A (p v q) = p

Identity
Laws: p v F = p ; p A T = T

Inverse
Laws: p v -p = T ; p A -p; = F

Contrapositive
of p -> q : -q -> -p

Converse
of p -> q : q -> p

Inverse
of p -> q : -p -> -q

Modus
Ponens (Rule of Detachment): p, p ->q : q

Modus
Tollens: p -> q, -q : -p

Law
of the Syllogism: p -> q, q -> r : p -> r

Rule
of Disjunctive Syllogism: p v q, -p : q

Quantifiers
for all and for each

Rule
of Universal Specification: If an open statement is true for all replacements,
then it is true for each

specific member

Rule
of Universal Generalization: If an open statement is true for any arbitrary
replacement, it is true for all

replacements.

George
Boole established mathematical logic, Augustus DeMorgan extended it, and
Charles Sanders Peirce

introduced quantifiers.

The
power set of A is the set of all subsets of A and has
2^{n} elements.

The
dual of a set expression interchanges universal sets and null sets, and unions
and intersections.

Venn
diagrams

Georg
Cantor (Russian, 1845) defined sets, Bertrand Russell (English, 1872)
introduced Russell's paradox

showing inconsistencies in
set theory, and Kurt Godel (Austrian, 1906) showed that some things must either

be unprovable or
contradictory.

Well-Ordering
Principle: every nonempty subset of the positive integers contains a smallest
element.

Mathematical
Induction: If S(1) is true, and whenever S(k) is true,
S(k+1) is true, then S(n) is true for all

positive integers k; basis
and inductive steps.

Euclid's
theorem that there are infinitely many primes.

Euclid's
algorithm for finding greatest common divisor.

A
Diophantine equation is a linear equation requiring integer solutions.

Fundamental
Theorem of Arithmetic: Every integer greater than one can be written as a
unique product of primes.

Algorithm
and algebra come from the Islamic mathematician al-Khowarizimi.

Augusta
Ada Byron Lovelace developed the first computer algorithm.

The
Cartesian product of two sets combines all elements, of size the product of the
sizes of the two sets.

There
are 2^{|A|*|B|} relations from A to B.

In
a function from A to B, every element of A appears exactly once as the first
component of an ordered

pair in the relation.

There
are |B|^{|A|} functions from A to B.

In
an injective, or one-to-one function, each element of B appears at most once as
the image of an element of A.

There
are P(|B|,|A|) injective functions from A to B.

The
domain of A can be restricted or extended to form images.

In
a surjective, or onto function, for all elements b of B there is at least one
element a of A such that f(a)=b.

There
are the summation of (-1)^{k}(n,n-k)(n-k)^{m}
for k from 0 to n onto functions from A to B.

Stirling
numbers of the second kind are 1/n! times the number of onto functions, and are
the number of

ways to distribute m
distinct objects into n identical containers.

There
are |A|^{|A|*|A|}_{ }closed binary operations on A.

Pigeonhole
Principle: If m pigeions occupy n pigeonholes and m>n, then at least one
pigeonhole has two or

more pigeons roosting in it.

A
function is bijective if it is both one-to-one and onto.

A
function is invertible if and only if it is one-to-one and onto; there exists a
function from its codomain to

its domain such that
compoing in either order results in the identity.

The
preimage is like the inverse for functions that are not invertible.

If
|f(n)| <= m|g(n)| for all n>=k, then f is an element of O(g).

Peter
Dirichlet formulated the pigeonhole principle.

Paul
Bachmann introduced big-Oh notation.

Deterministic
finite automata accept regular languages, push-down automata accept
context-free languages,

and Turing machines accept
recursively ennumerable languages.

Logistic
population model: dP/dt = k(1-P/N)*P

A
differential equation with an initial condition of the solution function is an
initial-value problem.

A
differential equation is separable if it can be written as the product of one
function depending only on

time and another that depends
only on the dependent variable.

An
ODE dependent only on the dependent variable is autonomous.

An
implicit form of a solution is not solved explicitly for the dependent
variable.

A
slope field shows the approximate direction (slope) of solutions at points in
the graph.

Euler's
Method: y_{k+1} = y_{k} + f(t_{k},y_{k})*dt

The
Existence and Uniqueness Theorem states that solutions exist for all
continuously

defined single ODEs dy/dt =
f(t,y), and that they are unique if df/dy is
continuous .

Points
at which the function is not changing (f(t,y) = 0) are
equilibrium points.

Phase
lines show the direction of solutions between equilibrium points on a straight
line.

An equilibrium
point may be a source (solutions go towards it), sink (solutions go away from
it), or node.

The
Linearization Theorem classifies equilibrium points according to f'(y_{0}).

A
bifurcation is when a small change in a parameter causes a drastic change in
the long-term behavior of

solutions; a bifurcation diagram shows
the phase lines near a bifurcation value.

Integrating
factor: I = e^{integral of a(t)} for ODE in form dy/dt + a(t)y = k

The
phase portrait graphs solutions to systems of two equations with the two
dependent variables as axes.

Undamped
spring harmonic oscillator: d^{2}y/dt^{2} + (k/m)y = 0

The
vector field is a slope field for a system of ODEs on the phase plane.

A
system of ODEs in which the rate of change of one or more dependent variables
depends only on its own

value decouples.

The
P-Delta effect (gravity-overhang) makes oscillating buildings fall.

The
Lorenz System of three ODEs led to the development of Chaos Theory.

The
Linearity Principle states that if Y(t) is a solution to dY/dt = AY, kY(t) is a
solution, and if Y_{1}(t) and

Y_{2}(t) are
solutions, Y_{1}(t) + Y_{2}(t) is a solution.

AV
= lV for eigenvalue l and eigenvector V.

The
roots of the characteristic polynomial are the eigenvalues of a matrix.

Linear
system equilibrium points may be sources (both eigenvalues +), sinks (both -),
or saddles (+ and -).

Euler's
Formula: e^{a+ib} = e^{a}cosb + ie^{a}sin
; spiral source (a>0), spiral sink (a<0), periodic center (a=0)

Damped
harmonic oscillator: m(d^{2}y/dt^{2}) + b(dy/dt) + ky = 0 ; b^{2} - 4km determines if it is

under/over/critically
damped.

The
trace of [a,b;c,d] is a+d; the trace-determinant plane
is a parameter plane that may be graphed.

In
a homogeneous second-order equation, there is no term without some form of the
dependent variable.

The
Method of Undetermined Coefficients tries to solve forced (nonhomogenous)
second-order equations

by guessing a solution with
a parameter, substituting in, and solving for the parameter.

Van
der Pol Equation: dx/dt = y, dy/dt = -x + (1-x^{2})y

Linearization
neglects nonlinear terms near equilibrium points.

Separatrices
tend toward a saddle equilibrium point as t goes to +- infinity, separating
solutions with

different behaviors.

Nullclines
are lines or curves along which the derivative of one of the dependent
variables is zero.

For
a Hamiltonian system, dx/dt = dH/dy and dy/dt = -dH/dx, including the harmonic
oscillator and the idel

pendulum (dx/dt = y, dy/dt =
-g*sin(x)).

Fourier
transform: integral of y(t)*e^{-iwt}dt from - infinity to + infinity

Laplace
transform: integral of y(t)*e^{-st}dt from 0 to infinity

L(dy/dt) = sL(y) - y(0)

L(e^{at}) = 1/(s-a)

L(t^{n}) = n!/s^{n+1}

L(cf) = cL(f)

L(cos(wt)) = s/(s^{2}+w^{2})

L(sin(wt)) = w/(s^{2}+w^{2})

A
Heaviside function is a step function; it has a discontinuity in its definition.

Convolution:
(f*g)(t) = integral of f(t-u)g(u)du from 0 to t

Triangular
form: in kth equation coefficients of first k-1 variables are zero

Elementary
row ops: interchange, multiply by scalar, replace row with multiple of another
row

Row
echelon form: first nonzero entry in each row is one, and number of leading
zeros in each row is more

than in previous row

Gaussian
elimination: perform row ops to transform matrix into row echelon form

Reduced
row echelon form: the first nonzero entry in each row is the only nonzero entry
in its column

Gauss-Jordan
elimination: perform row ops to transform matrix into reduced row echelon form

Homogenous
system of linear equations: constants on right side are all zero

Euclidean
n-space: set of all n*1 matrices of real numbers

For
a nonsingular (invertible) matrix A, there is a multiplicative inverse B such
that AB = BA = I (identity)

Transpose:
switch columns and rows

Row
equivalent matrices: can be transformed into each other by elementary row ops

The
minor of an element is the determinant of the matrix formed by deleting the row
and column of that

element, and its cofactor is
(-1)^{i}^{+j} times that determinant.

Interchanging
rows changes determinant sign, multiplying a row by a scalar multiplies the
determinant by

that scalar,
and adding a multiple of one row to another does not change the
determinant.

Cramer's
Rule: For Ax = b, x_{i} = det (A_{i})/det(A)

Adjoint:
replace each term by its cofactor and transpose; adj(A) = det(A)*A^{-1}

A
vector space is a set on which the operations of addition and scalar
multiplication are defined and adhere

to the vector space axioms.

A
subset of a vector space is a nonempty subset which is closed under addition
and scalar multiplication.

The
nullspace of a matrix A is the set of all solutions to Ax = 0.

The
set of all linear combinations of a set of vectors is the span of those
vectors.

Vectors
are linearly dependent if their sum, with each multiplied by a nonzero
coefficient, can be zero.

The
Wronskian of a set of n functions is a matrix of their first through nth
derivatives.

A
set of linearly independent vectors that span a vector space form a basis.

The
dimension of a vector space is the minimum number of vectors that form a basis.

The
row (column) space of a matrix is the subspace spanned by its row (column)
vectors.

The
rank of a matrix is the dimension of the row space of the matrix.

The
nullity of a matrix is the dimension of its nullspace.

A
linear tranformation is a mapping from one vector space to another.

The
kernel of a linear transformation is the set of vectors in the original vector
space that are mapped to 0.

The
image of a subspace of the original vector space of a linear transform is the
set of vectors in the new

vector space mapped from the
original subspace; the image of the entire vector space is the range of the

linear transform.

Rotations
may be done by the linear transform L(x) = [cos,-sin;
sin,cos]x

A
matrix B is similar to a matrix A if there is a
nonsingular matrix S such that B = S^{-1}AS.

Cauchy-Schwartz
Inequality: |x^{T}y| <= ||x|| ||y||

Orthogonal:
x^{T}y = 0

Dot
(scalar) product: dot(a,b) = |a||b|cos(q)

Cross
(vector) product: cross(a,b) = |a||b|sin(q)n

Scalar
projection: comp_{a}b = dot(a,b)/|a|

Vector
projection: proj_{a}b = dot(a,b)/|a|^{2}*a

If a
vector space can be written as a sum of two subspaces, it is a direct sum of
those subspaces.

An
inner product is an operation on a vector space that assigns a real number to
each of its pairs of vectors.

Pythagorean
Law: ||u+v||^{2} = ||u||^{2} + ||v||^{2} for orthogonal
u and v in an inner product space

AM
Legendre and Carl Gauss developed the technique of least squares to approximate
overdetermined systems.

If
the product of any two vectors (one of them transposed) in an inner product
space is 0, it is an orthogonal set.

An
orthonormal set of vectors in an orthogonal set of unit vectors.

The
column vectors of an orthogonal matrix form an orthonomal set.

Parseval's
Formula for orthonormal basis for inner product spaces

Cooley
and Tukey developed the Fast Fourier Transform to process signals.

The
Gram-Schmidt orthogonalization process constructs an orthonormal basis for an
inner product space.

A
matrix can be factored into QR, with Q having orthonormal columns and R being
upper triangular and

invertible.

Jacobi
(including Legendre and Tchebycheff), Hermite, and Laguerre polynomials are
orthogonal with

respect to an inner product

AV
= lV for eigenvalue l and eigenvector V

Characteristic
polynomial: p(l) = det(A - lI)

A
matrix is diagonalizable if there is a matrix X such that X^{-1}AX is a
diagonal matrix.

A
stochastic process depends on chance, and a Markov process is a stochastic
process with a finite set of

possible outcomes where the
probability of next outcome depends only on previous outcome and the

probabilities are constant
with time.

A
matrix is Hermitian if it is unchanged by conjugating each entry and then
taking the transpose.

A
matrix is unitary if its column vectors form an orthonormal set.

Schur's
Theorem says that there is a matrix U such that U^{H}AU is upper
triangular.

The
Spectral Theorem says that there is a unitary matrix that diagonalizes a
Hermitian matrix.

A
matrix is normal if AA^{H} = A^{H}A.

A
matrix is positive definite if all eigenvalues are positive.

Cholesky
decomposition: A symmetrix positive definite matrix may be factored into LL^{T}
with lower triangular L.

Descriptive
and inferential statistics

Stem-and-leaf
displays: leading digit(s) are stem and trailing digits leaves

Histograms,
mean, median, mode, quartiles, boxplots

Sample
variance: sum(x-mean)^{2}/(n-1)

Sample
standard deviation = sqrt(sample variance)

Outlier
> 1.5f, extreme > 3f

P(A)
= 1 - P(A')

P(A
u B) = P(A) + P(B) - P(A A B)

Permutations:
ordered sequence of r from n: P(n,r) = n!/(n-r)!

Combinations:
Unordered subset of r from n: C(n,r) = n!/(r!(n-r)!)

Conditional
Probability: P(A|B) = P(A A B)/P(B)

Multiplication
Rule: P(A A B) = P(A|B)*P(B)

Law
of Total Probability: P(B) = P(B|A_{1}) + . . . + P(B|A_{k})
for mutually exclusive and exhaustive events

A_{1} . . . A_{k}

Bayes'
Theorem: P(A_{j}|B) = P(A_{j} A B)/P(B) = P(B|A_{j})*P(A_{j})/(P(B|A_{1})
+ . . . + P(B|A_{k}))

P(A
A B) = P(A)*P(B) if A and B are independent

A
random variable is any rule that associates a number with each outcome in the
sample space; can be

discrete or continuous; a
Bernoulli random variable is either 0 or 1

Probability
mass function: p(x) = P(X=x)

Cumulative
distribution function: F(x) = P(X <= x)

Expected
value (mean value): E(X) = summation of x times p(x)

Variance:
V(X) = summation of (x - expected value)^{2}*p(x) = E[(X-expected
value)^{2}]

Standard
deviation: SD = sqrt(variance)

Binomial
probability: (n,x)p^{x}(1-p)^{n-x} ;
probability of x successes out of n, each with probability p

E(X) = np, V(X) = np(1-p)

Hypergeometric
distribution is exact form of binomial; negative binomial distribution lets
number of trials

vary with successes fixed

Poisson
distribution: p(x;L) = e^{-L}L^{x}/x!
; E(X) = V(X) = L ; Poisson process with L = at

Use
integrals rather than summations for continuous random variables

Normal
Distribution: (1/(sqrt(2*pi)*sigma)*e^{-(x-u)*(x-u)/(2*sigma*sigma)}
; in standard normal distribution,

mean u = 0 and standard
deviation sigma = 1

Tail
area z_{a} is the value for which a of the area lies to the right

Standardize:
Z = (X-u)/sigma

About
68% within 1 SD, 95% within 2 SD, 99.7% within 3 SD

Gamma
Distribution: 1/(B^{a}L(a)) x^{a-1}e^{-x/B ;} E(X) = aB, V(X) = aB^{2}

Exponential
Distribution: Le^{-Lx
;} E(X) = 1/L, V(X) = 1/L^{2}

Chi-Squared
Distribution: Gamma distribution with a = v/2 and B = 2, with v = number of
degrees of

freedom

Other
Continuous Distributions: Weibull, Lognormal, Beta

Joint
probability mass function: double sum or double integral of p(x,y) or f(x,y)

Marginal
probability density functions: f_{x}(x) = integral f(x,y)dy
; f_{y}(y) = f(x,y)dx from -inf. to +inf.

Expected
value: E[h(X,Y)] = double integral of
h(x,y)*f(x,y)dxdy from -inf. to +inf.

Covariance:
Cov(X,Y) = E[(X-u_{x})(Y-u_{y})] =
double integral of (x-u_{x})(y-u_{y})*f(x,y)dxdy from -inf. to
+inf

= E(XY) - u_{x}u_{y}

Correlation
coefficient: p_{x,y} = Cov(X,Y)/(sigma_{x}*sigma_{y})
; = 0 if X and Y are independent

100(1-a)% confidence interval for the mean u is sample mean +- z_{a/2}
* (sigma/sqrt(n))

Sample
size needed for an interval width w: n = (2*z_{a/2} * (sigma/w))^{2}

Prediction
interval: sample mean +- t_{a/2,n-1}
*s*sqrt(1+1/n) with t_{a,v} the value for which the area under the

curve with v degrees of
freedom to the right is a

Hypothesis
testing: null hypothesis rejected if test statistic is in rejection region; type
I error wrongly

rejects, type II wrongly
does not reject

Test
statistic: z = (sample mean - actual mean)/(sigma/sqrt(n))

ANOVA:
analysis of variance

Group
- a set with associative multiplication, an identity element, and an inverse
operation

Field
- a group on which addition, subtraction, and division (except by zero) are
universally defined

Ring
- a group with commutative and associative addition, a zero, negatives of all
elements, and a

distributive law of
multiplication and addition.

An
Abelian group is commutative.

Galois
theory - the conditions necessary for an equation to be solvable by radicals

Modulus
of complex number is its magnitude

DeMoivre's
Theorem: z^{n} = r^{n}(cos(nq) + isin(nq))

Euler's
Formula: e^{iy} = cos(y) + isin(y)

Claude
Shannon wrote "Statistical Theory of Communication"

Dick
Hamming developed Hamming codes for error correction; with n parity bits you
can send 2^{n}-n-1

message bits, correcting one
error.

Golay
code correct 3 errors with 12 message bits and 11 parity bits.

Hamming
distance: how many bits are different

|message
space|*|ball size| = |sending space|

Tufte
wrote books about the visual display of information

Shannon's
definition of information: Information(A) = -log(P(A)); entropy = - integral of
p log p ;

entropy of English is about
3.9; maximum for n letters in 2^{n}

Huffman
codes for data compression; no code can be a prefix of another code; # bits
>=

(# symbols sent )*(entropy)

Distribution
of noise: Gauss (law of large numbers), Poisson (law of rare events)

Primes

Fermat prime: p = 1+2^{2^N}
for some nonnegative integer N

Mersenne prime: p = -1 + 2^{q}
for some prime q

Prime
moduli

Fermat's Theorem:
a^{m-1} = 1 mod m for prime m

Euler's Totient Theorem: t(p^{a})
= (p-1)*p^{a-1} if p is prime

t(a*b) = t(a)*t(b) ; if gcd(a,b) = 1

Carmichael's universal
exponent function

Euclidean
method for finding gcd by successive divisions, added to by Aho/Hopcroft/Ullman
with matrices

and row operations

Suntze's
Theorem (the Chinese Remainder Theorem) outputs a y such that x = y mod (ab),

given x = umod a and x = w
mod b

Merkel
and Hellman developed the trapdoor knapsack encryption system
but Shamir broke it

RSA
(Riuest - Shamir - Adleman) pick two primes and publicize their product;
breaking it requires solving

a discrete log problem

Diffie
and Helman defined digital signatures

Shamir
and Blakley threshold schemes for requiring k pieces of information to
determine secret

Purdy
developed high security log-in ids

Data
Encryption Standard (DES) uses a 64-bit plaintext and 56-bit key, 16
transformations

Truncation
vs. Rounding error

Condition
number = |relative change in solution|/|relative change in input data|

IEEE
SP 24 bit precision, IEEE DP 53 bit precision

Machine
precision = B^{1-t}

Solving
Linear Systems

Gaussian elimination for LU
factorization with partial pivoting

Gauss-Jordan elimination to
diagonal form

Sherman-Morrison formula for
inverse of a matrix with a rank-one change to matrix with known inverse

Woodbury formula for inverse
of a matrix with a rank-k change to matrix with known inverse

One-norm (Manhattan norm) is
sum of absolute values; p-norm is pth root of sum of (values)^{p}

Matrix vector one-norm is
max absolute column sum; infinity-norm is max absolute row sum

Condition number of matrix = ||A||*||A^{-1}||

A symmetric (A = A^{T})
positive definite (x^{T}Ax > 0 for all x) matrix can be factored A =
LL^{T} by

Cholesky factorization,
scaling columns by square root of diagonal entry

Band systems are nearly
diagonal; tridiagonal is a band with bandwidth one

Linear
Least Squares (Ax ~ b)

For overdetermined systems; has unique solution if the columns of A are
linearly independent

For orthogonal vectors, y^{T}z
= 0

Normal equations method:
compute Cholesky factorization of A^{T}A

Augmented Systems Method:
variation on normal equations method

Orthogonal matrices have
orthonormal columns, and orthogonal transformations preserve the

Euclidean norm

QR orthogonal
transformations

Householder: H = I - 2vv^{T}/(v^{T}v)

Givens Rotation: compute
sine and cosine

Gram-Schmidt
Orthogonalization: determine two orthonormal vectors than span the same

subspace as two given
vectors by orthogonalizing one against the other

Solution is not unique if
matrix is rank deficient

Eigenvalues
and Singular Values

Spectrum is all eigenvalues
and spectral radius is maximal eigenvalue

Characteristic polynomial:
det(A-LI) = 0

Symmetric (A = A^{T}),
Hermitian (A = A^{H}), Orthogonal (A^{T}A = AA^{T} =
I), Unitary (A^{H}A = AA^{H} =

I), Normal (A^{H}A =
AA^{H})

Similarity transforms: B is similar to A if B = T^{-1}AT, and eigenvalues are
preserved

Jacobi method for symmetric
matrices: iterate A_{k+1} = J_{k}^{T}A_{k}J_{k}^{
}using [cos,sin; -sin,cos] plane

rotation matrix

QR Iteration: A_{k+1}
= R_{k}Q_{k}

Preliminary reduction to
Hessenberg (triangular except one extra adjacent nonzero diagonal) by

Householder

Power method for largest
eigenvalue: x_{k} = Ax_{,k}_{-1}
, optionally with normalization, shifts, and deflation

Inverse iteration of power
method for smallest or given approximated eigenvalue

Rayleigh Quotient Iteration;
Rayleigh-Ritz procedure for approximating eigenvalue over subspace

of higher dimension

Lanczos and spectrum-slicing
methods for symmetric matrices

Singular value decomposition
for rectangular matrices

Nonlinear
Equations

Bisection method: reducing
search interval by half each step

Fixed-point iteration: use
fixed point where x = g(x)

Newton's method: x_{k+1}
= x_{k} - f(x_{k})/f'(x_{k})

Secant method: like Newton's
method but approximates derivative

Inverse Interpolation: fit
quadratic to three points

Linear fractional
interpolation: solve three-equation system, with horizontal and vertical
asymptotes

Broyden's Method for
nonlinear systems: updates approximate Jacobian matrix

Optimization

If all functions are linear,
this is a linear programming problem

Hessian matrix is matrix of
second partial derivatives

Golden section search:
choose relative positions of search boundary points as t and 1-t, with t =

(sqrt(5)-1)/2
~ 0.618

Successive Parabolic
Interpolation: iteratively fit three points to parabola and find minimizer

Newton's method: x_{,k}_{+1} = x_{k} - f'(x_{k})/f"(x_{k})

Steepest Descent Method:
iteratively subtract line search parameter times gradient

Newton's method (for
multidimensional unconstrained opt.): use gradient and Hessian

Secant updating methods
including BFGS method, approximating Hessian

Conjugate gradient method,
quasi-Newton methods, and truncated Newton methods

Gauss-Newton method for
nonlinear least squares using Jacobian

Levenberg-Marquardt method,
weighted combination of steepest descent and Gauss-Newton

Lagrange multipliers for
constrained optimization

Simplex method for linear
programming

Interpolation

A matrix whose columns are
successive powers of an independent variable is a Vandermonde

Monomial basis: find
polynomial of degree n-1 through n points

Lagrange interpolation

Newton interpolation

Incremental Newton
interpolation using divided differences

Hermite (osculatory) cubic
interpolation using values and derivatives at data points

Cubic spline (piecewise
polynomial of degree 3 continuously differentiable 2 times) interpolation,

using equations for each
derivate to get system of linear equations

B-splines form a basis for
the family of spline functions of a given degree

Numerical
Integration and Differentiation

Numerical quadrature is the
approximation of definite integrals

Newton-Cotes Quadrature
Rules: Midpoint, Trapezoid, and Simpson's rules

Method of undetermined
coefficients creates a Vandermonde system and solves for coefficients

Gaussian Quadrature Rules
are based on interpolation, bunching nodes at endpoints instead of

evenly, often using Legendre
polynomial; Gauss-Kronrod Quadrature Rules

Finite difference
approximations for smooth function derivatives; centered difference formula

Automatic Differentiation

Richard extrapolation
improves accuracy of numerical integration or differentiation

Romberg integration uses
Richardson extrapolation with trapezoid quadrature rule

Initial
Value ODEs

Euler's Method: y_{k+1}
= y_{k} + f(t_{k},y_{k})*dt ; stability
interval (-2,0)

Implicit backward Euler
method: y_{k+1} = y_{k} + f(t_{k+1,yk+1})h_{k}

Taylor series method use
high-order derivatives

Runge-Kutta methods,
including Heun's second-order, and classical
fourth-order (which reduces

to Simpson's rule if f
depends only on t)

Extrapolation, multistep,
predictor-corrector, and multivalue methods

Boundary
Value ODEs

Shooting method: replace bv
problem with a sequence of iv problems whose solutions converge

to bv problem

Superposition method: also
replaces bv problem with iv problems

Finite difference method:
convert bv problem into algebraic system of equations, replacing

derivatives with finite
difference approximations

Finite element method:
approximate bv problem by a linear combination of piecewise polynomial

basis functions; weighted
residual methods

Collocation: residual is
zero

Galerkin: residual is
orthogonal to space spanned by basis functions

Rayleigh-Ritz: residual is
minimized in weighted least squares sense

Partial
Differential Equations

Hyperbolic (time-dependent
physical processes not evolving to steady state), parabolic (time-

dependent physical processes
evolving toward steady state), and elliptic (time-independent,

already at steady state)
classifications of PDEs

Semidiscrete methods with
finite differences or finite elements

Fully discrete methods

Jacobi, Gauss-Seidel, Successive
Over-Relaxation, Conjugate Gradient, and Multigrid methods

Fast
Fourier Transform

Trig interpolation
represents a periodic function as a linear combination of sines and cosines

Continuous Fourier
Transform, Fourier series expansion, and Discrete Fourier Transform

Random
Numbers and Stochastic Simulation

Stochastic simulation mimics
a system by exploiting randomness to obtain a statistical sample of

possible outcomes

Random number generators:
congruential (using seed and modulus), Fibonaci, and nonuniform