Meeting
points in a triangle: orthocenter
(altitudes), incenter (angle bisectors; inscribed circle), centroid

(medians), circumcenter (perpendicular bisectors;
circumscribed circle)

Parabola:
x^{2} = 4py ; focus
(0,p), directrix y = -p

Ellipse:
x^{2}/a^{2} + y^{2}/b^{2} = 1; foci (+-sqrt(a^{2}-b^{2}),0),
vertices (+-a,0)

Hyperbola:
x^{2}/a^{2} - y^{2}/b^{2} = 1 ; foci (+-sqrt(a^{2}+b^{2}),0),
vertices (+-a,0), asymptotes y = +-(b/a)x

abscissa
(x), ordinate (y)

cos(pi/3)
= 1/2

sin(2x)
= 2sin(x)cos(x); cos^{2}(x)
= (1+cos(2x))/2; sin^{2}(x)
= (1-cos(2x))/2

Law
of Cosines: a^{2} = b^{2} + c^{2} -2*b*c*cosA

Law
of Sines: (sinA)/a = (sinB)/b = (sinC)/c

Dot
(scalar) product: dot(a,b) = |a||b|cos(q)

Cross
(vector) product: cross(a,b) = |a||b|sin(q)n

Scalar
projection: comp_{a}b = dot(a,b)/|a|

Vector
projection: proj_{a}b = dot(a,b)/|a|^{2}*a

Squeeze
Theorem: limit of a function between two functions

Isaac
Newton (British, 1642) attended lectures given by Isaac Barrow and published *Principia
*

*Mathematica* in 1687 due to the encouragement of Halley.

Augustin-Louis
Cauchy (French, 1789) developed the rigorous definition of limits.

Intermediate
Value Theorem: for any N between f(a) and f(b), there is a c in [a,b] such
that f(c) = N

f'(x)
= lim (f(x+h)-f(x))/h

Gottfried
Leibniz (German, 1646) worked as a diplomat and developed calculus, publishing
it in 1684, and

used the Leibnez notation (d/dx).

Binomial
Theorem: (x+y)^{n} = x^{n} + nx^{n-1}y + . . . + (n,k)x^{n-k}y^{k}
+ . . . + nxy^{n-1} + y^{n}

Poiseuille
developed the law of laminar flow, relating velocity and radius in fluid flow
in a tube; flux is

proportional to the fourth power of the radius.

Derivatives:
sin = cos, cos = -sin, tan = sec^{2}, csc = -csc*cot, sec = sec*tan,
cot = -csc^{2}

Chain
Rule: dy/dx = (dy/du)*(du/dx)

Niels
Abel (Norway, 1800s) showed that there is no general formula for a
fifth-degree equation.

A
cycloid is the curve traced out by a point on the circumference of a circle as
the circle rolls along a line.

Newton-Raphson
method for approximating zeros of a function.

Hadamard
and de la Vallee Poussin proved the Prime Number Theorem, proposed by Gauss.

Newton's
law of cooling states that the rate of cooling of an object is proportional to
the temperature

difference between the object and its surroundings.

Catenary
curve: y = C+a*cosh(x/a)

L'Hospital's
rule: lim(f/g) = lim(f'/g')

Extreme
Value Theorem: f has an absolute minimum and maximum in [a,b]

Fermat's
Theorem: f'(c) = 0 at max or min point c

Pierre
Fermat (French, 1601) was a lawyer who developed analytical geometry with Rene
Descarte.

Critical
number: f'(c) = 0 or does not exist

Mean
Value Theorem: f'(c) = (f(b)-f(a))/(b-a)

Joseph-Louis
Lagrange (French, 1736) was born in Italy, developed the Mean Value Theorem,
applied

calculus to the analysis of the solar system, and
worked at the Berlin Academy under Frederick the Great

and the Louvre under Louis XVI.

First
and Second Derivative Tests for local minima or maxima

Bernhard
Riemann (Germann, 1826), studied under Gauss at Gottingen, defined integrals
in terms of the

limit of an infinite summation (Riemann sum), and
died of TB at 39.

Midpoint
(error over 24n^{2}), Trapezoid (error over 12n^{2}), and
Simpson's (error over 180n^{4}) rules for

approximating integrals

Fundamental
Theorem of Calculus: if g(x) = integral of f(t) from a to x, g'(x) = f(x) ;
integral of f(x) from

a to b equals F(b) - F(a) with F the antiderivative
of f

Augustin
Fresnel (French, 1788) developed the Fresnel function (integral of sin(pi*t^{2}/2)
from 0 to x), used

in optics and highway construction.

Substitution
Rule for integrals

ln(x)
= integral of 1/t from 1 to x

Disk
method for volumes of revolution: integral of pi times f(x)^{2}

Washer
method for volumes of revolution: pi times the integral of f(x)^{2} -
g(x)^{2}

Cylindrical
shells method for volumes: integral of 2 times pi times x times f(x)

Average
value of a function: 1/(b-a) times integral of f(x) from a to b

Mean
Value Theorem for Integrals: integral of f(x) from a to b = f(c)*(b-a)

James
Bernoulli (Swiss, 1700s) introduced Bernoulli numbers: (1/n!) times the
summation from 0 to n of

(n,k) times b_{k} times x^{n-k} and
developed the technique for solving separable ODEs.

Integration
by parts: integral of u*dv = u*v - integral of v*du

Thomas
Simpson (British, 1700s) was a weaver who studied math and popularized
Simpson's Rule.

An
integral over an infinite region or containing an infinite discontinuity is an
improper integral.

Planck's
Law of Radiation models blackbody radiation.

Logistic
growth: dy/dt = k*y*(M-y)

Arc
length: integral from a to b of square root of one plus (dy/dx)^{2}

Christopher
Wren (British, 1600s) proved that the length of one arch of a cycloid is eight
times the radius

of the generating circle.

Moment:
sum of the mass times position of each particle

The
centroid is the center of mass.

Theorem
of Pappus: The volume of a solid formed by rotating a plane about a line is
the product of the area

of the plane and the distance traveled by its
centroid.

Chebyshev
polynomials: T_{n}(x) = cos(n*arcos(x))

A
monotonic sequence is always increasing (or always decreasing).

Every
bounded, monotonic sequence is convergent.

Sum
of infinite geometric series: a/(1-r)

In
a telescoping sum, the terms cancel in pairs, leaving just the first and the
last, such as the sum of

1/(n*(n+1)).

Harmonic
series: sum of (1/n) for n from 1 to infinity

Tests
for series convergence/divergence: Divergence test, integral test, comparison
test, p-series test (sum

of 1/n^{p} is convergent if p>1), limit
comparison (if lim(a/b) = c, both diverge or both converge).

Alternating
series: terms alternatingly positive and negative, such as the sum of (-1)^{n-1}/n;
the alternating

series test says it is convergent if magnitudes are
decreasing and the limit of the non-alternating sequence

is zero, and is absolutely convergent if the absolute
value of the series converges.

Ratio
test for absolute convergence (lim(|a_{n+1}/a_{n}|)

Power
series: summation of c_{n}x^{n} ; may have a radius of
convergence

Friedrich
Bessel (German, 1784) developed Bessel functions to solve Kepler's equation
for planetary

motion; Bessel functions are sums of power series.

James
Gregory (Scottish, 1638) developed the power series for arctan, which leads to
the Leibniz formula

for pi: pi/4 = 1 - 1/3 +1/5 - 1/7 + . . .

Taylor
series: f(x) = summation of f^{(n)}(a)*(x-a)^{n}/n! ;
if a = 0, it is the Maclaurin series.

Brook
Taylor (English, 1685) and Colin Maclaurin (Scottish, 1698) represented
functions as sums of power

series, as Gregory and Bernoulli had done earlier.

e
= 1 + 1/1! + ½! + 1/3! + . . .

sin(x)
= x - x^{3}/3! + x^{5}/5! - . . .

Parallelepiped
volume: dot(a,cross(b,c))

Ellipsoids:
x^{2}/a^{2} + y^{2}/b^{2} + z^{2}/c^{2}
= 1

Hyperboloids:
x^{2}/a^{2} + y^{2}/b^{2} - z^{2}/c^{2}
= 1 (one sheet) or -x^{2}/a^{2} - y^{2}/b^{2}
+ z^{2}/c^{2} = 1 (two sheets)

Cones:
z^{2}/c^{2} = x^{2}/a^{2} + y^{2}/b^{2}

Paraboloids:
z/c = x^{2}/a^{2} + y^{2}/b^{2} (elliptic) or
z/c = x^{2}/a^{2} - y^{2}/b^{2} (hyperbolic)

Cylinders:
x^{2}/a^{2} = y^{2}/b^{2} = 1

A
sharp corner in a vector function graph is a cusp.

Arc
length: integral of r'(t) from a to b

Curvature
(kappa): |dT/ds| = |T'(t)|/|r'(t)| = |r'(t)*r"(t)|/|r'(t)|^{3}

Principal
unit normal vector: T'(t)/|T'(t)|

Binormal
vector: T(t) * N(t)

The
normal plane contains N and B, the osculating plane contains T and N, and the
osculating circle is in

the osculating plane and has radius 1/curvature.

Level
curves: f(x,y) = k

Clairaut's
Theorem: order of taking partial
derivatives does not matter.

Alexis
Clairaut (French, 1713) published the first systematic treatise on 3D analytic
geometry, including

the calculus of space curves, and developed
Clairaut's Theorem.

Solutions
to the Laplace equation (d^{2}u/dx^{2} + d^{2}u/dy^{2}
= 0) are harmonic functions.

Wave
equation: d^{2}u/dt^{2} = a^{2}(d^{2}u/dx^{2})

Chain
Rule: dz/dt = (dz/dx)(dx/dt) + (dz/dy)(dy/dt)

Implicit
Function Theorem: dy/dx = -F_{x}/F_{y}

The
gradient is the vector of partial derivatives of a function and is used with
the del operator.

Second
Derivative test uses sign of f_{xx}f_{yy}-f_{xy}^{2}
and f_{xx} to tell if critical point is min (both > 0) or max
(first >

0 and second < 0)or saddle (first < 0)

Lagrange
multipliers are used to minimize or maximize a function subject to a
constraint; del(f) =

lambda*del(g) and g=k

Fubini's
Theorem: order of taking multiple integrals does not matter.

Guido
Fubini (Italian, 1879) proved a general version of Fubini's Theorem.

A
multiple definite integral is an iterated integral.

A
plane region of type I lies between the graphs of two continuous functions of
x.

Polar
coordinates: x = r*cos(q)
; y = r*sin(q) ; r^{2} = x^{2} + y^{2}
; tan(q)
= y/x (for cylindrical add
z = z)

Cardioid:
r = c + sin(q)

Rose:
r = cos(aq)

Limacon
(snail): r = 1+c*sin(q)

Conchoid:
r = a + b*sec(q)

Moment
of inertia: mass times square of distance to axis

Radius
of gyration of a lamina about an axis: mR^{2} = I

Surface
area: double integral of square root of f_{x}^{2} + f_{y}^{2}
+ 1

Spherical
coordinates: x = rcos(q)sin(f) ; y = rsin(q)sin(f) ; z
= rcos(f)
; r^{2}
= x^{2} + y^{2} + z^{2}

The
Jacobian is the determinant of the matrix of partial derivatives of a system
of functions.

Jacobians:
r for polar and spherical, r^{2}*sin(f) for spherical

Carl
Jacobi (German, 1804) developed Jacobians for evaluating multiple integrals.

A
vector field is conservative if it is the gradient of some scalar function.

The
fundamental theorem for line integrals: the line integral of the gradient of f
over a smooth curve from a

to b is f(r(b)) - f(r(a))

For
a conservative vector field F = Pi + Qj, dP/dy = dQ/dx

Green's
Theorem: The line integral over a closed line of Pdx + Qdy equals the double
integral of dQ/dx -

dP/dy over the area enclosed by the curve.

George
Green (English, 1793) was a baker but went to Cambridge at age 40 but died 4
years after

graduation; Lord Kelvin reprinted his findings.

The
curl of a vector F is the cross product of the gradient and F;
the curl of the gradient of f is zero.

The
divergence of a vector F is the dot product of the gradient and F ; the
divergence of the curl of F is 0.

The
divergence of the gradient of a function is the Laplace operator, del^{2}
= del dot del

The
flux of a vector field across a surface is the surface integral of the vector
field over the surface.

Stokes'
Theorem: The line integral over a closed line of a vector field F equals the
double integral of the

curl of F over a surface bounded by the line.

George
Stokes (Irish, 1819) was a Cambridge professor, studied fluid flow and light
and learned Stokes'

Theorem from Lord Kelvin.

The
line integral of the velocity field dotted with the unit tangent vector is the
circulation.

Gauss's
Theorem (Divergence Theorem or Ostrogradsky's Theorem):
the surface integral of a vector field

F over a closed surface equals the triple integral of
the divergence of F over the volume enclosed by

the surface.

Karl
Friedrich Gauss (German, 1777) studied electrostatics and developed the
Divergence Theorem.

Spring
equation: m(d^{2}x/dt^{2})
+ c(dx/dt) + kx = F(t); Kirchoff's
Law: L(d^{2}Q/dt^{2}) + R(dQ/dt) + Q/C = E(t)

(displacement - charge, velocity - current, mass -
inductance, damping constant - resistance, spring

constant - elastance, external force - electromotive
force)

P(n,r)
= n!/(n-r)!

Arrangements
of n objects of r types: n!/(n_{1}!*n_{2}!*. . .*n_{r}!)

C(n,r)
= n!/(r!(n-r)!)

Binomial
Theorem: (x+y)^{n} = summation of C(n,k)*x^{k}y^{n-k}
for k from o to n

Combinations
with repetition: C(n+r-1,r) ; distribute r identical objects n ways

Catalan
Numbers (by Belgian Eugene Catalan, 1814): number of ways to parenthesize a
product or go from

origin to (n,n) below y=x; b_{n} =
C(2n,n)/(n+1); 1,1,2,5,14,42,132

DeMorgan's
Laws: -(p v q) = -p A -q ; -(p A q) = -p v -q

Idempotent
Laws: p v p = p ; p A p = p

Domination
Laws: p v T = T ; p A F = F

Absorption
Laws: p v (p A q) = p ;
p A (p v q) = p

Identity
Laws: p v F = p ; p A T = T

Inverse
Laws: p v -p = T ; p A -p; = F

Contrapositive
of p -> q : -q -> -p

Converse
of p -> q : q -> p

Inverse
of p -> q : -p -> -q

Modus
Ponens (Rule of Detachment): p, p ->q
: q

Modus
Tollens: p -> q, -q : -p

Law
of the Syllogism: p -> q, q -> r : p -> r

Rule
of Disjunctive Syllogism: p v q, -p : q

Quantifiers
for all and for each

Rule
of Universal Specification: If an open statement is true for all replacements,
then it is true for each

specific member

Rule
of Universal Generalization: If an open statement is true for any arbitrary
replacement, it is true for all

replacements.

George
Boole established mathematical logic, Augustus DeMorgan extended it, and
Charles Sanders Peirce

introduced quantifiers.

The
power set of A is the set of all subsets of A and has 2^{n} elements.

The
dual of a set expression interchanges universal sets and null sets, and unions
and intersections.

Venn
diagrams

Georg
Cantor (Russian, 1845) defined sets, Bertrand Russell (English, 1872)
introduced Russell's paradox

showing inconsistencies in set theory, and Kurt Godel
(Austrian, 1906) showed that some things must either

be unprovable or contradictory.

Well-Ordering
Principle: every nonempty subset of the positive integers contains a smallest
element.

Mathematical
Induction: If S(1) is true, and whenever S(k) is true, S(k+1) is true, then
S(n) is true for all

positive integers k; basis and inductive steps.

Euclid's
theorem that there are infinitely many primes.

Euclid's
algorithm for finding greatest common divisor.

A
Diophantine equation is a linear equation requiring integer solutions.

Fundamental
Theorem of Arithmetic: Every integer greater than one can be written as a
unique product of

primes.

Algorithm
and algebra come from the Islamic mathematician al-Khowarizimi.

Augusta
Ada Byron Lovelace developed the first computer algorithm.

The
Cartesian product of two sets combines all elements, of size the product of
the sizes of the two sets.

There
are 2^{|A|*|B|} relations from A to B.

In
a function from A to B, every element of A appears exactly once as the first
component of an ordered

pair in the relation.

There
are |B|^{|A|} functions from A to B.

In
an injective, or one-to-one function, each element of B appears at most once
as the image of an element

of A.

There
are P(|B|,|A|) injective functions from A to B.

The
domain of A can be restricted or extended to form images.

In
a surjective, or onto function, for all elements b of B there is at least one
element a of A such that f(a)=b.

There
are the summation of (-1)^{k}(n,n-k)(n-k)^{m} for k from 0 to
n onto functions from A to B.

Stirling
numbers of the second kind are 1/n! times the number of onto functions, and
are the number of

ways to distribute m distinct objects into n
identical containers.

There
are |A|^{|A|*|A|}_{ }closed binary operations on A.

Pigeonhole
Principle: If m pigeions occupy n pigeonholes and m>n, then at least one
pigeonhole has two or

more pigeons roosting in it.

A
function is bijective if it is both one-to-one and onto.

A
function is invertible if and only if it is one-to-one and onto; there exists
a function from its codomain to

its domain such that compoing in either order results
in the identity.

The
preimage is like the inverse for functions that are not invertible.

If
|f(n)| <= m|g(n)| for all n>=k, then f is an element of O(g).

Peter
Dirichlet formulated the pigeonhole principle.

Paul
Bachmann introduced big-Oh notation.

Deterministic
finite automata accept regular languages, push-down automata accept
context-free languages,

and Turing machines accept recursively ennumerable
languages.

Logistic
population model: dP/dt = k(1-P/N)*P

A
differential equation with an initial condition of the solution function is an
initial-value problem.

A
differential equation is separable if it can be written as the product of one
function depending only on

time and another that depends only on the dependent
variable.

An
ODE dependent only on the dependent variable is autonomous.

An
implicit form of a solution is not solved explicitly for the dependent
variable.

A
slope field shows the approximate direction (slope) of solutions at points in
the graph.

Euler's
Method: y_{k+1} = y_{k} + f(t_{k},y_{k})*dt

The
Existence and Uniqueness Theorem states that solutions exist for all
continuously

defined single ODEs dy/dt = f(t,y), and that they are
unique if df/dy is continuous .

Points
at which the function is not changing (f(t,y) = 0) are equilibrium points.

Phase
lines show the direction of solutions between equilibrium points on a straight
line.

An
equilibrium point may be a source (solutions go towards it), sink (solutions
go away from it), or node.

The
Linearization Theorem classifies equilibrium points according to f'(y_{0}).

A
bifurcation is when a small change in a parameter causes a drastic change in
the long-term behavior of

solutions; a bifurcation diagram shows the phase
lines near a bifurcation value.

Integrating
factor: I = e^{integral of a(t)}
for ODE in form dy/dt + a(t)y = k

The
phase portrait graphs solutions to systems of two equations with the two
dependent variables as axes.

Undamped
spring harmonic oscillator: d^{2}y/dt^{2} + (k/m)y = 0

The
vector field is a slope field for a system of ODEs on the phase plane.

A
system of ODEs in which the rate of change of one or more dependent variables
depends only on its own

value decouples.

The
P-Delta effect (gravity-overhang) makes oscillating buildings fall.

The
Lorenz System of three ODEs led to the development of Chaos Theory.

The
Linearity Principle states that if Y(t) is a solution to dY/dt = AY, kY(t) is
a solution, and if Y_{1}(t) and

Y_{2}(t) are solutions, Y_{1}(t) + Y_{2}(t)
is a solution.

AV
= lV for eigenvalue l and eigenvector V.

The
roots of the characteristic polynomial are the eigenvalues of a matrix.

Linear
system equilibrium points may be sources (both eigenvalues +), sinks (both -),
or saddles (+ and -).

Euler's
Formula: e^{a+ib} = e^{a}cosb + ie^{a}sin ;
spiral source (a>0), spiral sink (a<0), periodic center (a=0)

Damped
harmonic oscillator: m(d^{2}y/dt^{2}) + b(dy/dt) + ky = 0 ;
b^{2} - 4km determines if it is

under/over/critically damped.

The
trace of [a,b;c,d] is a+d; the
trace-determinant plane is a parameter plane that may be graphed.

In
a homogeneous second-order equation, there is no term without some form of the
dependent variable.

The
Method of Undetermined Coefficients tries to solve forced (nonhomogenous)
second-order equations

by guessing a solution with a parameter, substituting
in, and solving for the parameter.

Van
der Pol Equation: dx/dt = y, dy/dt = -x + (1-x^{2})y

Linearization
neglects nonlinear terms near equilibrium points.

Separatrices
tend toward a saddle equilibrium point as t goes to +- infinity, separating
solutions with

different behaviors.

Nullclines
are lines or curves along which the derivative of one of the dependent
variables is zero.

For
a Hamiltonian system, dx/dt = dH/dy and dy/dt = -dH/dx, including the harmonic
oscillator and the idel

pendulum (dx/dt = y, dy/dt = -g*sin(x)).

Fourier
transform: integral of y(t)*e^{-iwt}dt from - infinity to + infinity

Laplace
transform: integral of y(t)*e^{-st}dt from 0 to infinity

L(dy/dt) = sL(y) - y(0)

L(e^{at}) = 1/(s-a)

L(t^{n}) = n!/s^{n+1}

L(cf) = cL(f)

L(cos(wt)) = s/(s^{2}+w^{2})

L(sin(wt)) = w/(s^{2}+w^{2})

A
Heaviside function is a step function; it has a discontinuity in its
definition.

Convolution:
(f*g)(t) = integral of f(t-u)g(u)du from 0 to t

Triangular
form: in kth equation coefficients of first k-1 variables are zero

Elementary
row ops: interchange, multiply by scalar, replace row with multiple of another
row

Row
echelon form: first nonzero entry in each row is one, and
number of leading zeros in each row is more

than in previous row

Gaussian
elimination: perform row ops to transform matrix into row echelon form

Reduced
row echelon form: the first nonzero entry in each row is the only nonzero
entry in its column

Gauss-Jordan
elimination: perform row ops to transform matrix into reduced row echelon form

Homogenous
system of linear equations: constants on right side are all zero

Euclidean
n-space: set of all n*1 matrices of real numbers

For
a nonsingular (invertible) matrix A, there is a multiplicative inverse B such
that AB = BA = I (identity)

Transpose:
switch columns and rows

Row
equivalent matrices: can be transformed into each other by elementary row ops

The
minor of an element is the determinant of the matrix formed by deleting the
row and column of that

element, and its cofactor is (-1)^{i+j} times
that determinant.

Interchanging
rows changes determinant sign, multiplying a row by a scalar multiplies the
determinant by

that scalar, and adding a multiple of one row to
another does not change the determinant.

Cramer's
Rule: For Ax = b, x_{i} = det (A_{i})/det(A)

Adjoint:
replace each term by its cofactor and transpose; adj(A) = det(A)*A^{-1}

A
vector space is a set on which the operations of addition and scalar
multiplication are defined and adhere

to the vector space axioms.

A
subset of a vector space is a nonempty subset which is closed under addition
and scalar multiplication.

The
nullspace of a matrix A is the set of all solutions to Ax = 0.

The
set of all linear combinations of a set of vectors is the span of those
vectors.

Vectors
are linearly dependent if their sum, with each multiplied by a nonzero
coefficient, can be zero.

The
Wronskian of a set of n functions is a matrix of their first through nth
derivatives.

A
set of linearly independent vectors that span a vector space form a basis.

The
dimension of a vector space is the minimum number of vectors that form a
basis.

The
row (column) space of a matrix is the subspace spanned by its row (column)
vectors.

The
rank of a matrix is the dimension of the row space of the matrix.

The
nullity of a matrix is the dimension of its nullspace.

A
linear tranformation is a mapping from one vector space to another.

The
kernel of a linear transformation is the set of vectors in the original vector
space that are mapped to 0.

The
image of a subspace of the original vector space of a linear transform is the
set of vectors in the new

vector space mapped from the original subspace; the
image of the entire vector space is the range of the

linear transform.

Rotations
may be done by the linear transform L(x) = [cos,-sin; sin,cos]x

A
matrix B is similar to a matrix A if there is a nonsingular matrix S such that
B = S^{-1}AS.

Cauchy-Schwartz
Inequality: |x^{T}y| <= ||x|| ||y||

Orthogonal:
x^{T}y = 0

Dot
(scalar) product: dot(a,b) = |a||b|cos(q)

Cross
(vector) product: cross(a,b) = |a||b|sin(q)n

Scalar
projection: comp_{a}b = dot(a,b)/|a|

Vector
projection: proj_{a}b = dot(a,b)/|a|^{2}*a

If
a vector space can be written as a sum of two subspaces, it is a direct sum of
those subspaces.

An
inner product is an operation on a vector space that assigns a real number to
each of its pairs of vectors.

Pythagorean
Law: ||u+v||^{2} = ||u||^{2} + ||v||^{2} for
orthogonal u and v in an inner product space

AM
Legendre and Carl Gauss developed the technique of least squares to
approximate overdetermined

systems.

If
the product of any two vectors (one of them transposed) in an inner product
space is 0, it is an orthogonal

set.

An
orthonormal set of vectors in an orthogonal set of unit vectors.

The
column vectors of an orthogonal matrix form an orthonomal set.

Parseval's
Formula for orthonormal basis for inner product spaces

Cooley
and Tukey developed the Fast Fourier Transform to process signals.

The
Gram-Schmidt orthogonalization process constructs an orthonormal basis for an
inner product space.

A
matrix can be factored into QR, with Q having orthonormal columns and R being
upper triangular and

invertible.

Jacobi
(including Legendre and Tchebycheff), Hermite, and Laguerre
polynomials are orthogonal with

respect to an inner product

AV
= lV for eigenvalue l and eigenvector V

Characteristic
polynomial: p(l) = det(A - lI)

A
matrix is diagonalizable if there is a matrix X such that X^{-1}AX is
a diagonal matrix.

A
stochastic process depends on chance, and a Markov process is a stochastic
process with a finite set of

possible outcomes where the probability of next
outcome depends only on previous outcome and the

probabilities are constant with time.

A
matrix is Hermitian if it is unchanged by conjugating each entry and then
taking the transpose.

A
matrix is unitary if its column vectors form an orthonormal set.

Schur's
Theorem says that there is a matrix U such that U^{H}AU is upper
triangular.

The
Spectral Theorem says that there is a unitary matrix that diagonalizes a
Hermitian matrix.

A
matrix is normal if AA^{H} = A^{H}A.

A
matrix is positive definite if all eigenvalues are positive.

Cholesky
decomposition: A symmetrix positive definite matrix may be factored into LL^{T}
with lower

triangular L.

Descriptive
and inferential statistics

Stem-and-leaf
displays: leading digit(s) are stem and trailing digits leaves

Histograms,
mean, median, mode, quartiles, boxplots

Sample
variance: sum(x-mean)^{2}/(n-1)

Sample
standard deviation = sqrt(sample variance)

Outlier
> 1.5f, extreme > 3f

P(A)
= 1 - P(A')

P(A
u B) = P(A) + P(B) - P(A A B)

Permutations:
ordered sequence of r from n: P(n,r) = n!/(n-r)!

Combinations:
Unordered subset of r from n: C(n,r) = n!/(r!(n-r)!)

Conditional
Probability: P(A|B) = P(A A B)/P(B)

Multiplication
Rule: P(A A B) = P(A|B)*P(B)

Law
of Total Probability: P(B) = P(B|A_{1}) + . . . + P(B|A_{k})
for mutually exclusive and exhaustive events

A_{1} . . . A_{k}

Bayes'
Theorem: P(A_{j}|B) = P(A_{j} A B)/P(B) = P(B|A_{j})*P(A_{j})/(P(B|A_{1})
+ . . . + P(B|A_{k}))

P(A
A B) = P(A)*P(B) if
A and B are independent

A
random variable is any rule that associates a number with each outcome in the
sample space; can be

discrete or continuous; a Bernoulli random variable
is either 0 or 1

Probability
mass function: p(x) = P(X=x)

Cumulative
distribution function: F(x) = P(X <= x)

Expected
value (mean value): E(X) = summation of x times p(x)

Variance:
V(X) = summation of (x - expected value)^{2}*p(x) = E[(X-expected
value)^{2}]

Standard
deviation: SD = sqrt(variance)

Binomial
probability: (n,x)p^{x}(1-p)^{n-x} probability of x successes out of n, each with
probability p

E(X) = np, V(X) = np(1-p)

Hypergeometric
distribution is exact form of binomial; negative binomial distribution lets
number of trials

vary with successes fixed

Poisson
distribution: p(x;L) = e^{-L}L^{x}/x! ;
E(X) = V(X) = L ; Poisson
process with L = at

Use
integrals rather than summations for continuous random variables

Normal
Distribution: (1/(sqrt(2*pi)*sigma)*e^{-(x-u)*(x-u)/(2*sigma*sigma)} ;
in standard normal distribution,

mean u = 0 and standard deviation sigma = 1

Tail
area z_{a} is the value for which a of the area lies to the right

Standardize:
Z = (X-u)/sigma

About
68% within 1 SD, 95% within 2 SD, 99.7% within 3 SD

Gamma
Distribution: 1/(B^{a}L(a)) x^{a-1}e^{-x/B}
; E(X) = aB, V(X) = aB^{2}

Exponential
Distribution: Le^{-Lx} ;
E(X) = 1/L, V(X) = 1/L^{2}

Chi-Squared
Distribution: Gamma distribution with a = v/2 and B = 2, with v = number of
degrees of

freedom

Other
Continuous Distributions: Weibull, Lognormal, Beta

Joint
probability mass function: double sum or double integral of p(x,y) or f(x,y)

Marginal
probability density functions: f_{x}(x) = integral f(x,y)dy ; f_{y}(y)
= f(x,y)dx from -inf. to +inf.

Expected
value: E[h(X,Y)] = double integral of h(x,y)*f(x,y)dxdy from -inf. to +inf.

Covariance:
Cov(X,Y) = E[(X-u_{x})(Y-u_{y})] = double integral of (x-u_{x})(y-u_{y})*f(x,y)dxdy
from -inf. to +inf

= E(XY) - u_{x}u_{y}

Correlation
coefficient: p_{x,y} = Cov(X,Y)/(sigma_{x}*sigma_{y})
; = 0 if X and Y are
independent

100(1-a)%
confidence interval for the mean u is sample mean +- z_{a/2} * (sigma/sqrt(n))

Sample
size needed for an interval width w: n = (2*z_{a/2} * (sigma/w))^{2}

Prediction
interval: sample mean +- t_{a/2,n-1} *s*sqrt(1+1/n)
with t_{a,v} the value for which the area under the

curve with v degrees of freedom to the right is a

Hypothesis
testing: null hypothesis rejected
if test statistic is in rejection region;
type I error wrongly

rejects, type II wrongly does not reject

Test
statistic: z = (sample mean - actual mean)/(sigma/sqrt(n))

ANOVA:
analysis of variance

Group
- a set with associative multiplication, an identity element, and an inverse
operation

Field
- a group on which addition, subtraction, and division (except by zero) are
universally defined

Ring
- a group with commutative and associative addition, a zero, negatives of all
elements, and a

distributive law of multiplication and addition.

An
Abelian group is commutative.

Galois
theory - the conditions necessary for an equation to be solvable by radicals

Modulus
of complex number is its magnitude

DeMoivre's
Theorem: z^{n} = r^{n}(cos(nq)
+ isin(nq))

Euler's
Formula: e^{iy} = cos(y) + isin(y)

Claude
Shannon wrote "Statistical Theory of Communication"

Dick
Hamming developed Hamming codes for error correction; with n parity bits you
can send 2^{n}-n-1

message bits, correcting one error.

Golay
code correct 3 errors with 12 message bits and 11 parity bits.

Hamming
distance: how many bits are different

|message
space|*|ball size| = |sending space|

Tufte
wrote books about the visual display of information

Shannon's
definition of information: Information(A) = -log(P(A));
entropy = - integral of p log p ;

entropy of English is about 3.9; maximum
for n letters in 2^{n}

Huffman
codes for data compression; no
code can be a prefix of another code; # bits >=

(# symbols sent )*(entropy)

Distribution
of noise: Gauss (law of large numbers), Poisson (law of rare events)

Primes

Fermat prime: p = 1+2^{2^N}
for some nonnegative integer N

Mersenne prime: p = -1 + 2^{q}
for some prime q

Prime
moduli

Fermat's Theorem: a^{m-1} = 1 mod m
for prime m

Euler's Totient Theorem: t(p^{a}) = (p-1)*p^{a-1}
if p is prime

t(a*b) = t(a)*t(b) if
gcd(a,b) = 1

Carmichael's universal exponent function

Euclidean
method for finding gcd by successive divisions, added to by Aho/Hopcroft/Ullman
with matrices

and row operations

Suntze's
Theorem (the Chinese Remainder Theorem) outputs a y such that x = y mod (ab),

given x = umod a and x = w mod b

Merkel
and Hellman developed the trapdoor knapsack encryption system but Shamir broke
it

RSA
(Riuest - Shamir - Adleman) pick two primes and publicize their product;
breaking it requires solving

a discrete log problem

Diffie
and Helman defined digital signatures

Shamir
and Blakley threshold schemes for requiring k pieces of information to
determine secret

Purdy
developed high security log-in ids

Data
Encryption Standard (DES) uses a 64-bit plaintext and 56-bit key, 16
transformations

Truncation
vs. Rounding error

Condition
number = |relative change in solution|/|relative change in input data|

IEEE
SP 24 bit precision, IEEE DP 53 bit precision

Machine
precision = B^{1-t}

Solving
Linear Systems

Gaussian elimination for LU factorization with partial pivoting

Gauss-Jordan elimination to diagonal form

Sherman-Morrison formula for inverse of a matrix with a rank-one change
to matrix with known

inverse

Woodbury formula for inverse of a matrix with a rank-k change to matrix
with known inverse

One-norm (Manhattan norm) is sum of absolute values; p-norm is pth root
of sum of (values)^{p}

Matrix vector one-norm is max absolute column sum; infinity-norm is max
absolute row sum

Condition number of matrix = ||A||*||A^{-1}||

A symmetric (A = A^{T}) positive definite (x^{T}Ax
> 0 for all x) matrix can be factored A = LL^{T} by

Cholesky
factorization, scaling columns by square root of diagonal entry

Band systems are nearly diagonal; tridiagonal is a
band with bandwidth one

Linear
Least Squares (Ax ~ b)

For overdetermined systems; has unique solution if the columns of A are
linearly independent

For orthogonal vectors, y^{T}z = 0

Normal equations method: compute Cholesky factorization of A^{T}A

Augmented Systems Method: variation on normal equations method

Orthogonal matrices have orthonormal columns, and orthogonal
transformations preserve the

Euclidean
norm

QR orthogonal transformations

Householder: H = I - 2vv^{T}/(v^{T}v)

Givens Rotation: compute sine and cosine

Gram-Schmidt Orthogonalization: determine two orthonormal vectors than
span the same

subspace
as two given vectors by orthogonalizing one against the other

Solution is not unique if matrix is rank deficient

Eigenvalues
and Singular Values

Spectrum is all eigenvalues and spectral radius is maximal eigenvalue

Characteristic polynomial: det(A-LI) = 0

Symmetric (A = A^{T}), Hermitian (A = A^{H}),
Orthogonal (A^{T}A = AA^{T} = I), Unitary (A^{H}A = AA^{H}
=

I),
Normal (A^{H}A = AA^{H})

Similarity transforms: B is similar to A if B = T^{-1}AT, and
eigenvalues are preserved

Jacobi method for symmetric matrices:
iterate A_{k+1} = J_{k}^{T}A_{k}J_{k}^{
}using [cos,sin; -sin,cos] plane

rotation
matrix

QR Iteration: A_{k+1}
= R_{k}Q_{k}

Preliminary reduction to Hessenberg (triangular except one extra
adjacent nonzero diagonal) by

Householder

Power method for largest eigenvalue: x_{k} = Ax_{,k-1}
, optionally with normalization, shifts, and

deflation

Inverse iteration of power method for smallest or given approximated
eigenvalue

Rayleigh Quotient Iteration; Rayleigh-Ritz
procedure for approximating eigenvalue over subspace

of
higher dimension

Lanczos and spectrum-slicing methods for symmetric matrices

Singular value decomposition for rectangular matrices

Nonlinear
Equations

Bisection method: reducing search interval by half each step

Fixed-point iteration: use fixed point where x = g(x)

Newton's method: x_{k+1} = x_{k} - f(x_{k})/f'(x_{k})

Secant method: like Newton's method but approximates derivative

Inverse Interpolation: fit quadratic to three points

Linear fractional interpolation: solve three-equation system, with
horizontal and vertical

asymptotes

Broyden's Method for nonlinear systems: updates approximate Jacobian
matrix

Optimization

If all functions are linear, this is a linear programming problem

Hessian matrix is matrix of second partial derivatives

Golden section search: choose relative positions of search boundary
points as t and 1-t, with t =

(sqrt(5)-1)/2
~ 0.618

Successive Parabolic Interpolation: iteratively fit three points to
parabola and find minimizer

Newton's method: x_{,k+1} = x_{k} - f'(x_{k})/f"(x_{k})

Steepest Descent Method: iteratively subtract line search parameter
times gradient

Newton's method (for multidimensional unconstrained opt.): use gradient
and Hessian

Secant updating methods including BFGS method, approximating Hessian

Conjugate gradient method, quasi-Newton methods, and truncated Newton
methods

Gauss-Newton method for nonlinear least squares using Jacobian

Levenberg-Marquardt method, weighted combination of steepest descent
and Gauss-Newton

Lagrange multipliers for constrained optimization

Simplex method for linear programming

Interpolation

A matrix whose columns are successive powers of an independent variable
is a Vandermonde

Monomial basis: find polynomial of degree n-1 through n points

Lagrange interpolation

Newton interpolation

Incremental Newton interpolation using divided differences

Hermite (osculatory) cubic interpolation using values and derivatives
at data points

Cubic spline (piecewise polynomial of degree 3 continuously
differentiable 2 times) interpolation,

using
equations for each derivate to get system of linear equations

B-splines form a basis for the family of spline functions of a given
degree

Numerical
Integration and Differentiation

Numerical quadrature is the approximation of definite integrals

Newton-Cotes Quadrature Rules: Midpoint,
Trapezoid, and Simpson's rules

Method of undetermined coefficients creates a Vandermonde system and
solves for coefficients

Gaussian Quadrature Rules are based on interpolation, bunching nodes at
endpoints instead of

evenly,
often using Legendre polynomial; Gauss-Kronrod
Quadrature Rules

Finite difference approximations for smooth function derivatives;
centered difference formula

Automatic Differentiation

Richard extrapolation improves accuracy of numerical integration or
differentiation

Romberg integration uses Richardson extrapolation with trapezoid
quadrature rule

Initial
Value ODEs

Euler's Method: y_{k+1} = y_{k} + f(t_{k},y_{k})*dt ; stability
interval (-2,0)

Implicit backward Euler method: y_{k+1} = y_{k} + f(t_{k+1},y_{k+1})h_{k}

Taylor series method use high-order derivatives

Runge-Kutta methods, including Heun's second-order, and classical
fourth-order (which reduces

to
Simpson's rule if f depends only on t)

Extrapolation, multistep, predictor-corrector, and
multivalue methods

Boundary
Value ODEs

Shooting method: replace bv problem with a sequence of iv problems
whose solutions converge

to
bv problem

Superposition method: also replaces bv problem with iv problems

Finite difference method: convert bv problem into algebraic system of
equations, replacing

derivatives
with finite difference approximations

Finite element method: approximate bv problem by a linear combination
of piecewise polynomial

basis
functions; weighted residual
methods

Collocation: residual is zero

Galerkin: residual is orthogonal to space spanned by basis functions

Rayleigh-Ritz: residual is minimized in weighted least squares sense

Partial
Differential Equations

Hyperbolic (time-dependent physical processes not evolving to steady
state), parabolic (time-

dependent physical processes evolving toward steady
state), and elliptic (time-independent,

already at steady state) classifications of PDEs

Semidiscrete methods with finite differences or finite elements

Fully discrete methods

Jacobi, Gauss-Seidel, Successive Over-Relaxation, Conjugate Gradient,
and Multigrid methods

Fast
Fourier Transform

Trig interpolation represents a periodic function as a linear
combination of sines and cosines

Continuous Fourier Transform, Fourier series expansion, and Discrete
Fourier Transform

Random
Numbers and Stochastic Simulation

Stochastic simulation mimics a system by exploiting randomness to
obtain a statistical sample of

possible
outcomes

Random number generators: congruential (using seed and modulus),
Fibonaci, and nonuniform