Meeting
points in a triangle: orthocenter (altitudes), incenter (angle bisectors;
inscribed circle), centroid
(medians), circumcenter (perpendicular
bisectors; circumscribed circle)
Parabola: x^2 = 4py ; focus (0,p), directrix y = -p
Ellipse: x^2/a^2 + y^2/b^2 = 1 ; foci (+-sqrt(a^2-b^2),0), vertices (+-a,0)
Hyperbola: x^2/a^2 - y^2/b^2 = 1 ; foci (+-sqrt(a^2+b^2),0), vertices (+-a,0), asymptotes y = +-(b/a)x
abscissa
(x), ordinate (y)
cos(pi/3) = 1/2
sin(2x) = 2sin(x)cos(x); cos^2(x) = (1+cos(2x))/2; sin^2(x) = (1-cos(2x))/2
Law of Cosines: a^2 = b^2 + c^2 - 2*b*c*cos(A)
Law
of Sines: (sinA)/a = (sinB)/b = (sinC)/c
Dot (scalar) product: dot(a,b) = |a||b|cos(q)
Cross (vector) product: cross(a,b) = |a||b|sin(q)n
Scalar projection: comp_a(b) = dot(a,b)/|a|
Vector projection: proj_a(b) = (dot(a,b)/|a|^2)*a
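A quick numerical check of these vector formulas (an illustrative Python/numpy sketch; the vectors a and b are arbitrary examples):

```python
import numpy as np

a = np.array([3.0, 0.0, 0.0])
b = np.array([1.0, 2.0, 2.0])

dot = np.dot(a, b)                       # |a||b|cos(q)
cross = np.cross(a, b)                   # magnitude |a||b|sin(q), direction n
comp = dot / np.linalg.norm(a)           # scalar projection of b onto a
proj = dot / np.linalg.norm(a)**2 * a    # vector projection of b onto a

print(dot, cross, comp, proj)
```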
Squeeze Theorem: a function squeezed between two functions with the same limit has that limit
Isaac
Newton (British, 1642) attended lectures given by Isaac Barrow and published Principia
Mathematica in 1687 due to the
encouragement of Halley.
Augustin-Louis
Cauchy (French, 1789) developed the rigorous definition of limits.
Intermediate Value Theorem: for continuous f and any N between f(a) and f(b), there is a c in [a,b] such that f(c) = N
f'(x) = lim as h->0 of (f(x+h)-f(x))/h
Gottfried Leibniz (German, 1646) worked as a diplomat and developed calculus, publishing it in 1684, and used the Leibniz notation (d/dx).
Binomial Theorem: (x+y)^n = x^n + n*x^(n-1)*y + . . . + C(n,k)*x^(n-k)*y^k + . . . + n*x*y^(n-1) + y^n
Poiseuille
developed the law of laminar flow, relating velocity and radius in fluid flow
in a tube; flux is
proportional to the fourth
power of the radius.
Derivatives: sin' = cos, cos' = -sin, tan' = sec^2, csc' = -csc*cot, sec' = sec*tan, cot' = -csc^2
Chain
Rule: dy/dx = (dy/du)*(du/dx)
Niels
Abel (Norway, 1800s) showed that there is no general formula for a fifth-degree
equation.
A
cycloid is the curve traced out by a point on the circumference of a circle as
the circle rolls along a line.
Newton-Raphson
method for approximating zeros of a function.
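A minimal sketch of the Newton-Raphson iteration x_(k+1) = x_k - f(x_k)/f'(x_k), here approximating sqrt(2) as a zero of f(x) = x^2 - 2 (the example function and tolerance are my own choices):

```python
def newton(f, fprime, x0, tol=1e-12, max_iter=50):
    x = x0
    for _ in range(max_iter):
        step = f(x) / fprime(x)
        x -= step
        if abs(step) < tol:       # stop when the update is negligible
            break
    return x

print(newton(lambda x: x*x - 2, lambda x: 2*x, 1.0))  # ~1.41421356
```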
Hadamard
and de la Vallee Poussin proved the Prime Number Theorem, proposed by Gauss.
Newton's
law of cooling states that the rate of cooling of an object is proportional to
the temperature
difference between the
object and its surroundings.
Catenary
curve: y = C+a*cosh(x/a)
L'Hospital's
rule: lim(f/g) = lim(f'/g')
Extreme Value Theorem: a continuous f has an absolute minimum and maximum on [a,b]
Fermat's Theorem: f'(c) = 0 at a local max or min point c (when f'(c) exists)
Pierre Fermat (French, 1601) was a lawyer who developed analytical geometry with Rene Descartes.
Critical
number: f'(c) = 0 or does not exist
Mean
Value Theorem: f'(c) = (f(b)-f(a))/(b-a)
Joseph-Louis
Lagrange (French, 1736) was born in Italy, developed the Mean Value Theorem,
applied
calculus to the analysis of
the solar system, and worked at the Berlin Academy under Frederick the Great
and the Louvre under Louis
XVI.
First
and Second Derivative Tests for local minima or maxima
Bernhard Riemann (German, 1826) studied under Gauss at Gottingen, defined integrals in terms of the limit of an infinite summation (Riemann sum), and died of TB at 39.
Midpoint (error bound K(b-a)^3/(24n^2)), Trapezoid (error bound K(b-a)^3/(12n^2)), and Simpson's (error bound K(b-a)^5/(180n^4)) rules for approximating integrals
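A sketch comparing the three rules on the integral of x^2 from 0 to 1 (exact value 1/3); the helper functions and the choice of n = 10 subintervals (even, as Simpson's rule requires) are illustrative:

```python
def midpoint(f, a, b, n):
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

def trapezoid(f, a, b, n):
    h = (b - a) / n
    return h * (f(a)/2 + sum(f(a + i*h) for i in range(1, n)) + f(b)/2)

def simpson(f, a, b, n):          # n must be even
    h = (b - a) / n
    return h/3 * (f(a) + f(b)
                  + 4*sum(f(a + i*h) for i in range(1, n, 2))
                  + 2*sum(f(a + i*h) for i in range(2, n, 2)))

f = lambda x: x**2
print(midpoint(f, 0, 1, 10), trapezoid(f, 0, 1, 10), simpson(f, 0, 1, 10))
```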
Fundamental
Theorem of Calculus: if g(x) = integral of f(t) from a to x, g'(x) = f(x) ; integral of f(x) from
a to b equals F(b) - F(a)
with F the antiderivative of f
Augustin Fresnel (French, 1788) developed the Fresnel function (integral of sin(pi*t^2/2) from 0 to x), used in optics and highway construction.
Substitution
Rule for integrals
ln(x)
= integral of 1/t from 1 to x
Disk method for volumes of revolution: integral of pi times f(x)^2
Washer method for volumes of revolution: pi times the integral of f(x)^2 - g(x)^2
Cylindrical
shells method for volumes: integral of 2 times pi times x times f(x)
Average
value of a function: 1/(b-a) times integral of f(x) from a to b
Mean
Value Theorem for Integrals: integral of f(x) from a to b = f(c)*(b-a)
James Bernoulli (Swiss, 1700s) introduced Bernoulli numbers ((1/n!) times the summation from 0 to n of C(n,k)*b_k*x^(n-k)) and developed the technique for solving separable ODEs.
Integration
by parts: integral of u*dv = u*v - integral of v*du
Thomas
Simpson (British, 1700s) was a weaver who studied math and popularized
Simpson's Rule.
An
integral over an infinite region or containing an infinite discontinuity is an
improper integral.
Planck's
Law of Radiation models blackbody radiation.
Logistic
growth: dy/dt = k*y*(M-y)
Arc length: integral from a to b of the square root of 1 + (dy/dx)^2
Christopher
Wren (British, 1600s) proved that the length of one arch of a cycloid is eight
times the radius
of the generating circle.
Moment:
sum of the mass times position of each particle
The
centroid is the center of mass.
Theorem of Pappus: the volume of a solid formed by rotating a plane region about a line is the product of the area of the region and the distance traveled by its centroid.
Chebyshev polynomials: T_n(x) = cos(n*arccos(x))
A
monotonic sequence is always increasing (or always decreasing).
Every
bounded, monotonic sequence is convergent.
Sum of infinite geometric series: a/(1-r) for |r| < 1
In
a telescoping sum, the terms cancel in pairs, leaving just the first and the
last, such as the sum of
1/(n*(n+1)).
Harmonic
series: sum of (1/n) for n from 1 to infinity
Tests for series convergence/divergence: divergence test, integral test, comparison test, p-series test (the sum of 1/n^p is convergent if p > 1), limit comparison (if lim(a/b) = c with c finite and positive, both diverge or both converge).
Alternating series: terms alternately positive and negative, such as the sum of (-1)^(n-1)/n; the alternating series test says such a series is convergent if the magnitudes are decreasing and the limit of the non-alternating sequence is zero; a series is absolutely convergent if the sum of the absolute values converges.
Ratio test for absolute convergence: the series converges absolutely if lim |a_(n+1)/a_n| < 1
Power series: summation of c_n*x^n ; may have a radius of convergence
Friedrich
Bessel (German, 1784) developed Bessel functions to solve Kepler's equation for
planetary
motion; Bessel functions are sums
of power series.
James
Gregory (Scottish, 1638) developed the power series for arctan, which leads to
the Leibniz formula
for pi: pi/4 = 1 - 1/3 +1/5
- 1/7 + . . .
Taylor series: f(x) = summation of f^(n)(a)*(x-a)^n/n! ; if a = 0, it is the Maclaurin series.
Brook
Taylor (English, 1685) and Colin Maclaurin (Scottish, 1698) represented
functions as sums of power
series, as Gregory and
Bernoulli had done earlier.
e = 1 + 1/1! + 1/2! + 1/3! + . . .
sin(x) = x - x^3/3! + x^5/5! - . . .
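Partial sums of these Maclaurin series, checked against math.e and math.sin (a small illustrative sketch; the term counts are arbitrary):

```python
import math

e_approx = sum(1 / math.factorial(n) for n in range(15))

def sin_series(x, terms=10):
    return sum((-1)**n * x**(2*n + 1) / math.factorial(2*n + 1)
               for n in range(terms))

print(e_approx, math.e)                # both ~2.718281828
print(sin_series(1.0), math.sin(1.0))  # both ~0.841470985
```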
Parallelepiped
volume: dot(a,cross(b,c))
Ellipsoids: x^2/a^2 + y^2/b^2 + z^2/c^2 = 1
Hyperboloids: x^2/a^2 + y^2/b^2 - z^2/c^2 = 1 (one sheet) or -x^2/a^2 - y^2/b^2 + z^2/c^2 = 1 (two sheets)
Cones: z^2/c^2 = x^2/a^2 + y^2/b^2
Paraboloids: z/c = x^2/a^2 + y^2/b^2 (elliptic) or z/c = x^2/a^2 - y^2/b^2 (hyperbolic)
Cylinders: x^2/a^2 + y^2/b^2 = 1
A
sharp corner in a vector function graph is a cusp.
Arc length: integral of |r'(t)| from a to b
Curvature (kappa): |dT/ds| = |T'(t)|/|r'(t)| = |cross(r'(t),r''(t))|/|r'(t)|^3
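A numerical sketch of kappa = |cross(r',r'')|/|r'|^3 for the helix r(t) = (cos t, sin t, t), whose curvature is the constant 1/2 (the helix and evaluation point are my own example):

```python
import numpy as np

def curvature(rp, rpp):
    return np.linalg.norm(np.cross(rp, rpp)) / np.linalg.norm(rp)**3

t = 0.7
rp = np.array([-np.sin(t), np.cos(t), 1.0])    # r'(t)
rpp = np.array([-np.cos(t), -np.sin(t), 0.0])  # r''(t)
print(curvature(rp, rpp))                      # 0.5
```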
Principal
unit normal vector: T'(t)/|T'(t)|
Binormal vector: B(t) = cross(T(t),N(t))
The
normal plane contains N and B, the osculating plane contains T and N, and the
osculating circle is in
the osculating plane and has
radius 1/curvature.
Level
curves: f(x,y) = k
Clairaut's
Theorem: order of taking partial derivatives does not matter.
Alexis
Clairaut (French, 1713) published the first systematic treatise on 3D analytic
geometry, including
the calculus of space
curves, and developed Clairaut's Theorem.
Solutions to the Laplace equation (d^2u/dx^2 + d^2u/dy^2 = 0) are harmonic functions.
Wave equation: d^2u/dt^2 = a^2*(d^2u/dx^2)
Chain
Rule: dz/dt = (dz/dx)(dx/dt) + (dz/dy)(dy/dt)
Implicit
Function Theorem: dy/dx = -Fx/Fy
The
gradient is the vector of partial derivatives of a function and is used with
the del operator.
Second Derivative Test uses D = f_xx*f_yy - (f_xy)^2: a critical point is a min if D > 0 and f_xx > 0, a max if D > 0 and f_xx < 0, and a saddle if D < 0.
Lagrange
multipliers are used to minimize or maximize a function subject to a
constraint; del(f) =
lambda*del(g) and g=k
Fubini's
Theorem: order of taking multiple integrals does not matter.
Guido
Fubini (Italian, 1879) proved a general version of Fubini's Theorem.
A
multiple definite integral is an iterated integral.
A
plane region of type I lies between the graphs of two continuous functions of
x.
Polar coordinates: x = r*cos(q) ; y = r*sin(q) ; r^2 = x^2 + y^2 ; tan(q) = y/x ; (for cylindrical add z = z)
Cardioid: r = 1 + sin(q)
Rose:
r = cos(aq)
Limacon
(snail): r = 1+c*sin(q)
Conchoid:
r = a + b*sec(q)
Moment
of inertia: mass times square of distance to axis
Radius of gyration of a lamina about an axis: m*R^2 = I
Surface area: double integral of the square root of f_x^2 + f_y^2 + 1
Spherical coordinates: x = r*cos(q)*sin(f) ; y = r*sin(q)*sin(f) ; z = r*cos(f) ; r^2 = x^2 + y^2 + z^2
The
Jacobian is the determinant of the matrix of partial derivatives of a system of
functions.
Jacobians: r for polar and cylindrical, r^2*sin(f) for spherical
Carl
Jacobi (German, 1804) developed Jacobians for evaluating multiple integrals.
A
vector field is conservative if it is the gradient of some scalar function.
The
fundamental theorem for line integrals: the line integral of the gradient of f
over a smooth curve from a
to b is f(r(b)) - f(r(a))
For
a conservative vector field F = Pi + Qj, dP/dy = dQ/dx
Green's
Theorem: The line integral over a closed line of Pdx + Qdy equals the double
integral of dQ/dx -
dP/dy over the area enclosed
by the curve.
George Green (English, 1793) was a miller who went to Cambridge at age 40 but died 4 years after graduation; Lord Kelvin reprinted his findings.
The curl of a vector field F is the cross product of del and F; the curl of the gradient of f is zero.
The divergence of a vector field F is the dot product of del and F; the divergence of the curl of F is 0.
The divergence of the gradient of a function is the Laplace operator, del^2 = del dot del
The
flux of a vector field across a surface is the surface integral of the vector
field over the surface.
Stokes'
Theorem: The line integral over a closed line of a vector field F equals the
double integral of the
curl of F over a surface
bounded by the line.
George
Stokes (Irish, 1819) was a Cambridge professor, studied fluid flow and light
and learned Stokes'
Theorem from Lord Kelvin.
The
line integral of the velocity field dotted with the unit tangent vector is the
circulation.
Gauss's
Theorem (Divergence Theorem or Ostrogradsky's Theorem): the surface integral of
a vector field
F over a closed surface
equals the triple integral of the divergence of F over the volume enclosed by
the surface.
Carl Friedrich Gauss (German, 1777) studied electrostatics and developed the Divergence Theorem.
Spring equation: m(d^2x/dt^2) + c(dx/dt) + kx = F(t); Kirchhoff's Law: L(d^2Q/dt^2) + R(dQ/dt) + Q/C = E(t) (displacement - charge, velocity - current, mass - inductance, damping constant - resistance, spring constant - elastance, external force - electromotive force)
P(n,r) = n!/(n-r)!
Arrangements of n objects of r types: n!/(n_1!*n_2!*. . .*n_r!)
C(n,r) = n!/(r!(n-r)!)
Binomial Theorem: (x+y)^n = summation of C(n,k)*x^k*y^(n-k) for k from 0 to n
Combinations with repetition: C(n+r-1,r) ; distribute r identical objects among n distinct containers
Catalan Numbers (by Belgian Eugene Catalan, 1814): number of ways to parenthesize a product or to go from the origin to (n,n) staying below y = x; b_n = C(2n,n)/(n+1); 1, 1, 2, 5, 14, 42, 132
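The closed form above in a few lines of Python (illustrative sketch):

```python
from math import comb

def catalan(n):
    return comb(2*n, n) // (n + 1)     # b_n = C(2n,n)/(n+1)

print([catalan(n) for n in range(7)])  # [1, 1, 2, 5, 14, 42, 132]
```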
DeMorgan's
Laws: -(p v q) = -p A -q ; -(p A q) = -p v -q
Idempotent
Laws: p v p = p ; p A p = p
Domination
Laws: p v T = T ; p A F = F
Absorption
Laws: p v (p A q) = p ; p A (p v q) = p
Identity
Laws: p v F = p ; p A T = T
Inverse Laws: p v -p = T ; p A -p = F
Contrapositive
of p -> q : -q -> -p
Converse
of p -> q : q -> p
Inverse
of p -> q : -p -> -q
Modus
Ponens (Rule of Detachment): p, p ->q : q
Modus
Tollens: p -> q, -q : -p
Law
of the Syllogism: p -> q, q -> r : p -> r
Rule
of Disjunctive Syllogism: p v q, -p : q
Quantifiers: for all (universal) and there exists (existential)
Rule
of Universal Specification: If an open statement is true for all replacements,
then it is true for each
specific member
Rule
of Universal Generalization: If an open statement is true for any arbitrary
replacement, it is true for all
replacements.
George
Boole established mathematical logic, Augustus DeMorgan extended it, and
Charles Sanders Peirce
introduced quantifiers.
The power set of A is the set of all subsets of A and has 2^|A| elements.
The
dual of a set expression interchanges universal sets and null sets, and unions
and intersections.
Venn
diagrams
Georg
Cantor (Russian, 1845) defined sets, Bertrand Russell (English, 1872)
introduced Russell's paradox
showing inconsistencies in
set theory, and Kurt Godel (Austrian, 1906) showed that some things must either
be unprovable or
contradictory.
Well-Ordering
Principle: every nonempty subset of the positive integers contains a smallest
element.
Mathematical Induction: if S(1) is true, and whenever S(k) is true, S(k+1) is true, then S(n) is true for all positive integers n; basis and inductive steps.
Euclid's
theorem that there are infinitely many primes.
Euclid's
algorithm for finding greatest common divisor.
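Euclid's algorithm by successive divisions, sketched in Python (the inputs are arbitrary examples):

```python
def gcd(a, b):
    while b:
        a, b = b, a % b   # replace (a, b) with (b, a mod b)
    return a

print(gcd(252, 198))  # 18
```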
A Diophantine equation is a polynomial (often linear) equation requiring integer solutions.
Fundamental
Theorem of Arithmetic: Every integer greater than one can be written as a
unique product of primes.
Algorithm and algebra come from the Islamic mathematician al-Khwarizmi.
Augusta
Ada Byron Lovelace developed the first computer algorithm.
The Cartesian product of two sets is the set of all ordered pairs of their elements; its size is the product of the sizes of the two sets.
There are 2^(|A|*|B|) relations from A to B.
In
a function from A to B, every element of A appears exactly once as the first
component of an ordered
pair in the relation.
There are |B|^|A| functions from A to B.
In
an injective, or one-to-one function, each element of B appears at most once as
the image of an element of A.
There
are P(|B|,|A|) injective functions from A to B.
The domain of a function can be restricted or extended, changing the resulting images.
In
a surjective, or onto function, for all elements b of B there is at least one
element a of A such that f(a)=b.
There are summation of (-1)^k*C(n,n-k)*(n-k)^m for k from 0 to n onto functions from A to B, where |A| = m and |B| = n.
Stirling
numbers of the second kind are 1/n! times the number of onto functions, and are
the number of
ways to distribute m
distinct objects into n identical containers.
There are |A|^(|A|*|A|) closed binary operations on A.
Pigeonhole Principle: if m pigeons occupy n pigeonholes and m > n, then at least one pigeonhole has two or more pigeons roosting in it.
A
function is bijective if it is both one-to-one and onto.
A function is invertible if and only if it is one-to-one and onto; there exists a function from its codomain to its domain such that composing in either order results in the identity.
The
preimage is like the inverse for functions that are not invertible.
If
|f(n)| <= m|g(n)| for all n>=k, then f is an element of O(g).
Peter
Dirichlet formulated the pigeonhole principle.
Paul
Bachmann introduced big-Oh notation.
Deterministic finite automata accept regular languages, push-down automata accept context-free languages, and Turing machines accept recursively enumerable languages.
Logistic
population model: dP/dt = k(1-P/N)*P
A
differential equation with an initial condition of the solution function is an
initial-value problem.
A
differential equation is separable if it can be written as the product of one
function depending only on
time and another that depends
only on the dependent variable.
An
ODE dependent only on the dependent variable is autonomous.
An
implicit form of a solution is not solved explicitly for the dependent
variable.
A
slope field shows the approximate direction (slope) of solutions at points in
the graph.
Euler's Method: y_(k+1) = y_k + f(t_k,y_k)*dt
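A sketch of Euler's method on dy/dt = y with y(0) = 1, whose exact value at t = 1 is e ~ 2.71828 (the test equation and step size are my own choices):

```python
def euler(f, t0, y0, dt, steps):
    t, y = t0, y0
    for _ in range(steps):
        y += f(t, y) * dt   # y_(k+1) = y_k + f(t_k, y_k)*dt
        t += dt
    return y

print(euler(lambda t, y: y, 0.0, 1.0, 0.001, 1000))  # ~2.7169, slightly under e
```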
The Existence and Uniqueness Theorem states that solutions exist for a single ODE dy/dt = f(t,y) with f continuous, and that they are unique if df/dy is also continuous.
Points
at which the function is not changing (f(t,y) = 0) are
equilibrium points.
Phase
lines show the direction of solutions between equilibrium points on a straight
line.
An equilibrium point may be a source (solutions move away from it), a sink (solutions move toward it), or a node.
The Linearization Theorem classifies equilibrium points according to f'(y_0).
A
bifurcation is when a small change in a parameter causes a drastic change in
the long-term behavior of
solutions; a bifurcation diagram shows
the phase lines near a bifurcation value.
Integrating factor: I = e^(integral of a(t) dt) for an ODE in the form dy/dt + a(t)y = k
The
phase portrait graphs solutions to systems of two equations with the two
dependent variables as axes.
Undamped spring harmonic oscillator: d^2y/dt^2 + (k/m)y = 0
The
vector field is a slope field for a system of ODEs on the phase plane.
A
system of ODEs in which the rate of change of one or more dependent variables
depends only on its own
value decouples.
The
P-Delta effect (gravity-overhang) makes oscillating buildings fall.
The
Lorenz System of three ODEs led to the development of Chaos Theory.
The Linearity Principle states that if Y(t) is a solution to dY/dt = AY, kY(t) is a solution, and if Y_1(t) and Y_2(t) are solutions, Y_1(t) + Y_2(t) is a solution.
AV = lambda*V for eigenvalue lambda and eigenvector V.
The
roots of the characteristic polynomial are the eigenvalues of a matrix.
Linear
system equilibrium points may be sources (both eigenvalues +), sinks (both -),
or saddles (+ and -).
Euler's Formula: e^(a+ib) = e^a*cos(b) + i*e^a*sin(b) ; spiral source (a > 0), spiral sink (a < 0), periodic center (a = 0)
Damped harmonic oscillator: m(d^2y/dt^2) + b(dy/dt) + ky = 0 ; b^2 - 4km determines if it is under/over/critically damped.
The
trace of [a,b;c,d] is a+d; the trace-determinant plane
is a parameter plane that may be graphed.
In
a homogeneous second-order equation, there is no term without some form of the
dependent variable.
The Method of Undetermined Coefficients tries to solve forced (nonhomogeneous) second-order equations by guessing a solution with a parameter, substituting in, and solving for the parameter.
Van der Pol Equation: dx/dt = y, dy/dt = -x + (1-x^2)*y
Linearization
neglects nonlinear terms near equilibrium points.
Separatrices
tend toward a saddle equilibrium point as t goes to +- infinity, separating
solutions with
different behaviors.
Nullclines
are lines or curves along which the derivative of one of the dependent
variables is zero.
For a Hamiltonian system, dx/dt = dH/dy and dy/dt = -dH/dx, including the harmonic oscillator and the ideal pendulum (dx/dt = y, dy/dt = -g*sin(x)).
Fourier transform: integral of y(t)*e^(-iwt) dt from -infinity to +infinity
Laplace transform: integral of y(t)*e^(-st) dt from 0 to infinity
L(dy/dt) = sL(y) - y(0)
L(e^(at)) = 1/(s-a)
L(t^n) = n!/s^(n+1)
L(cf) = cL(f)
L(cos(wt)) = s/(s^2+w^2)
L(sin(wt)) = w/(s^2+w^2)
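A numeric spot-check of L(sin(wt)) = w/(s^2+w^2), truncating the integral at a large T with the midpoint rule (an illustrative sketch, not a rigorous evaluation; w, s, T, and n are arbitrary):

```python
import math

def laplace_sin(w, s, T=50.0, n=100000):
    h = T / n
    return sum(math.sin(w * (i + 0.5) * h) * math.exp(-s * (i + 0.5) * h) * h
               for i in range(n))

w, s = 2.0, 1.0
print(laplace_sin(w, s), w / (s**2 + w**2))  # both ~0.4
```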
A
Heaviside function is a step function; it has a discontinuity in its definition.
Convolution:
(f*g)(t) = integral of f(t-u)g(u)du from 0 to t
Triangular
form: in kth equation coefficients of first k-1 variables are zero
Elementary row ops: interchange two rows, multiply a row by a nonzero scalar, add a multiple of one row to another row
Row
echelon form: first nonzero entry in each row is one, and number of leading
zeros in each row is more
than in previous row
Gaussian
elimination: perform row ops to transform matrix into row echelon form
Reduced
row echelon form: the first nonzero entry in each row is the only nonzero entry
in its column
Gauss-Jordan
elimination: perform row ops to transform matrix into reduced row echelon form
Homogeneous system of linear equations: constants on right side are all zero
Euclidean
n-space: set of all n*1 matrices of real numbers
For
a nonsingular (invertible) matrix A, there is a multiplicative inverse B such
that AB = BA = I (identity)
Transpose:
switch columns and rows
Row
equivalent matrices: can be transformed into each other by elementary row ops
The minor of an element is the determinant of the matrix formed by deleting the row and column of that element, and its cofactor is (-1)^(i+j) times that determinant.
Interchanging
rows changes determinant sign, multiplying a row by a scalar multiplies the
determinant by
that scalar,
and adding a multiple of one row to another does not change the
determinant.
Cramer's Rule: for Ax = b, x_i = det(A_i)/det(A)
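A 2x2 sketch of Cramer's rule, where A_i replaces column i of A with b, checked against numpy's solver (the system is an arbitrary example):

```python
import numpy as np

A = np.array([[2.0, 1.0], [1.0, 3.0]])
b = np.array([3.0, 5.0])

def cramer(A, b):
    d = np.linalg.det(A)
    x = np.empty(len(b))
    for i in range(len(b)):
        Ai = A.copy()
        Ai[:, i] = b                        # replace column i with b
        x[i] = np.linalg.det(Ai) / d
    return x

print(cramer(A, b), np.linalg.solve(A, b))  # both [0.8, 1.4]
```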
Adjoint: replace each term by its cofactor and transpose; adj(A) = det(A)*A^(-1)
A
vector space is a set on which the operations of addition and scalar
multiplication are defined and adhere
to the vector space axioms.
A subspace of a vector space is a nonempty subset that is closed under addition and scalar multiplication.
The
nullspace of a matrix A is the set of all solutions to Ax = 0.
The
set of all linear combinations of a set of vectors is the span of those
vectors.
Vectors are linearly dependent if some linear combination of them with coefficients not all zero equals zero.
The Wronskian of a set of n functions is the determinant of the matrix of the functions and their first n-1 derivatives.
A
set of linearly independent vectors that span a vector space form a basis.
The dimension of a vector space is the number of vectors in any basis.
The
row (column) space of a matrix is the subspace spanned by its row (column)
vectors.
The
rank of a matrix is the dimension of the row space of the matrix.
The
nullity of a matrix is the dimension of its nullspace.
A linear transformation is a mapping from one vector space to another that preserves addition and scalar multiplication.
The
kernel of a linear transformation is the set of vectors in the original vector
space that are mapped to 0.
The
image of a subspace of the original vector space of a linear transform is the
set of vectors in the new
vector space mapped from the
original subspace; the image of the entire vector space is the range of the
linear transform.
Rotations
may be done by the linear transform L(x) = [cos,-sin;
sin,cos]x
A matrix B is similar to a matrix A if there is a nonsingular matrix S such that B = S^(-1)AS.
Cauchy-Schwarz Inequality: |x^T*y| <= ||x|| ||y||
Orthogonal: x^T*y = 0
If a
vector space can be written as a sum of two subspaces, it is a direct sum of
those subspaces.
An
inner product is an operation on a vector space that assigns a real number to
each of its pairs of vectors.
Pythagorean Law: ||u+v||^2 = ||u||^2 + ||v||^2 for orthogonal u and v in an inner product space
A. M. Legendre and Carl Gauss developed the technique of least squares to approximate overdetermined systems.
If the inner product of every pair of distinct vectors in a set is 0, it is an orthogonal set.
An orthonormal set of vectors is an orthogonal set of unit vectors.
The column vectors of an orthogonal matrix form an orthonormal set.
Parseval's
Formula for orthonormal basis for inner product spaces
Cooley
and Tukey developed the Fast Fourier Transform to process signals.
The
Gram-Schmidt orthogonalization process constructs an orthonormal basis for an
inner product space.
A
matrix can be factored into QR, with Q having orthonormal columns and R being
upper triangular and
invertible.
Jacobi
(including Legendre and Tchebycheff), Hermite, and Laguerre polynomials are
orthogonal with
respect to an inner product
AV = lambda*V for eigenvalue lambda and eigenvector V
Characteristic polynomial: p(lambda) = det(A - lambda*I)
A matrix is diagonalizable if there is a matrix X such that X^(-1)AX is a diagonal matrix.
A
stochastic process depends on chance, and a Markov process is a stochastic
process with a finite set of
possible outcomes where the
probability of next outcome depends only on previous outcome and the
probabilities are constant
with time.
A
matrix is Hermitian if it is unchanged by conjugating each entry and then
taking the transpose.
A
matrix is unitary if its column vectors form an orthonormal set.
Schur's Theorem says that for any square matrix A there is a unitary matrix U such that U^H*A*U is upper triangular.
The
Spectral Theorem says that there is a unitary matrix that diagonalizes a
Hermitian matrix.
A matrix is normal if A*A^H = A^H*A.
A
matrix is positive definite if all eigenvalues are positive.
Cholesky decomposition: a symmetric positive definite matrix may be factored into L*L^T with lower triangular L.
Descriptive
and inferential statistics
Stem-and-leaf
displays: leading digit(s) are stem and trailing digits leaves
Histograms,
mean, median, mode, quartiles, boxplots
Sample variance: sum((x - mean)^2)/(n-1)
Sample
standard deviation = sqrt(sample variance)
Outlier: more than 1.5 fourth-spreads (f_s) beyond the quartiles; extreme: more than 3 f_s
P(A)
= 1 - P(A')
P(A
u B) = P(A) + P(B) - P(A A B)
Permutations:
ordered sequence of r from n: P(n,r) = n!/(n-r)!
Combinations:
Unordered subset of r from n: C(n,r) = n!/(r!(n-r)!)
Conditional
Probability: P(A|B) = P(A A B)/P(B)
Multiplication
Rule: P(A A B) = P(A|B)*P(B)
Law of Total Probability: P(B) = P(B|A_1)*P(A_1) + . . . + P(B|A_k)*P(A_k) for mutually exclusive and exhaustive events A_1 . . . A_k
Bayes' Theorem: P(A_j|B) = P(A_j A B)/P(B) = P(B|A_j)*P(A_j)/(P(B|A_1)*P(A_1) + . . . + P(B|A_k)*P(A_k))
P(A
A B) = P(A)*P(B) if A and B are independent
A
random variable is any rule that associates a number with each outcome in the
sample space; can be
discrete or continuous; a
Bernoulli random variable is either 0 or 1
Probability
mass function: p(x) = P(X=x)
Cumulative
distribution function: F(x) = P(X <= x)
Expected
value (mean value): E(X) = summation of x times p(x)
Variance: V(X) = summation of (x - expected value)^2*p(x) = E[(X - expected value)^2]
Standard
deviation: SD = sqrt(variance)
Binomial probability: C(n,x)*p^x*(1-p)^(n-x) ; probability of x successes out of n, each with probability p
E(X) = np, V(X) = np(1-p)
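A small sketch of the binomial pmf, checking that the probabilities sum to 1 and that the mean matches E(X) = np (n = 10 and p = 0.3 are arbitrary):

```python
from math import comb

n, p = 10, 0.3
pmf = [comb(n, x) * p**x * (1 - p)**(n - x) for x in range(n + 1)]

print(sum(pmf))                                      # ~1.0
print(sum(x * q for x, q in enumerate(pmf)), n * p)  # both 3.0
```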
Hypergeometric
distribution is exact form of binomial; negative binomial distribution lets
number of trials
vary with successes fixed
Poisson distribution: p(x;L) = e^(-L)*L^x/x! ; E(X) = V(X) = L ; Poisson process with L = a*t
Use
integrals rather than summations for continuous random variables
Normal Distribution: (1/(sqrt(2*pi)*sigma))*e^(-(x-u)^2/(2*sigma^2)) ; in the standard normal distribution, mean u = 0 and standard deviation sigma = 1
Tail area: z_a is the value for which a of the area lies to the right
Standardize:
Z = (X-u)/sigma
About
68% within 1 SD, 95% within 2 SD, 99.7% within 3 SD
Gamma Distribution: (1/(B^a*Gamma(a)))*x^(a-1)*e^(-x/B) ; E(X) = a*B, V(X) = a*B^2
Exponential Distribution: L*e^(-L*x) ; E(X) = 1/L, V(X) = 1/L^2
Chi-Squared
Distribution: Gamma distribution with a = v/2 and B = 2, with v = number of
degrees of
freedom
Other
Continuous Distributions: Weibull, Lognormal, Beta
Joint
probability mass function: double sum or double integral of p(x,y) or f(x,y)
Marginal probability density functions: f_x(x) = integral of f(x,y)dy ; f_y(y) = integral of f(x,y)dx, each from -inf. to +inf.
Expected
value: E[h(X,Y)] = double integral of
h(x,y)*f(x,y)dxdy from -inf. to +inf.
Covariance: Cov(X,Y) = E[(X-u_x)(Y-u_y)] = double integral of (x-u_x)(y-u_y)*f(x,y)dxdy from -inf. to +inf. = E(XY) - u_x*u_y
Correlation coefficient: p_(x,y) = Cov(X,Y)/(sigma_x*sigma_y) ; = 0 if X and Y are independent
100(1-a)% confidence interval for the mean u is sample mean +- z_(a/2)*(sigma/sqrt(n))
Sample size needed for an interval width w: n = (2*z_(a/2)*sigma/w)^2
Prediction interval: sample mean +- t_(a/2,n-1)*s*sqrt(1+1/n), with t_(a,v) the value for which the area under the curve with v degrees of freedom to the right is a
Hypothesis
testing: null hypothesis rejected if test statistic is in rejection region; type
I error wrongly
rejects, type II wrongly
does not reject
Test
statistic: z = (sample mean - actual mean)/(sigma/sqrt(n))
ANOVA:
analysis of variance
Group
- a set with associative multiplication, an identity element, and an inverse
operation
Field - a set on which addition, subtraction, multiplication, and division (except by zero) are universally defined
Ring
- a group with commutative and associative addition, a zero, negatives of all
elements, and a
distributive law of
multiplication and addition.
An
Abelian group is commutative.
Galois
theory - the conditions necessary for an equation to be solvable by radicals
Modulus
of complex number is its magnitude
DeMoivre's Theorem: z^n = r^n*(cos(nq) + i*sin(nq))
Euler's Formula: e^(iy) = cos(y) + i*sin(y)
Claude Shannon wrote "A Mathematical Theory of Communication".
Richard Hamming developed Hamming codes for error correction; with n parity bits you can send 2^n - n - 1 message bits, correcting one error.
The Golay code corrects 3 errors with 12 message bits and 11 parity bits.
Hamming
distance: how many bits are different
|message
space|*|ball size| = |sending space|
Tufte
wrote books about the visual display of information
Shannon's definition of information: Information(A) = -log(P(A)); entropy = -integral of p log p ; the entropy of English is about 3.9 bits per letter; the maximum for n equally likely letters is log2(n).
Huffman
codes for data compression; no code can be a prefix of another code; # bits
>=
(# symbols sent )*(entropy)
Distribution of noise: Gauss (central limit theorem), Poisson (law of rare events)
Primes
Fermat prime: p = 2^(2^N) + 1 for some nonnegative integer N
Mersenne prime: p = 2^q - 1 for some prime q
Prime
moduli
Fermat's Little Theorem: a^(m-1) = 1 mod m for prime m and a not divisible by m
Euler's Totient Theorem: t(p^a) = (p-1)*p^(a-1) if p is prime
t(a*b) = t(a)*t(b) if gcd(a,b) = 1
Carmichael's universal
exponent function
Euclidean
method for finding gcd by successive divisions, added to by Aho/Hopcroft/Ullman
with matrices
and row operations
Sunzi's Theorem (the Chinese Remainder Theorem) outputs a y such that x = y mod (ab), given x = u mod a and x = w mod b
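A sketch of the Chinese Remainder reconstruction for coprime moduli a and b (requires Python 3.8+ for pow(a, -1, b); the numbers are arbitrary examples):

```python
def crt(u, a, w, b):
    inv = pow(a, -1, b)                            # inverse of a mod b
    return (u + a * ((w - u) * inv % b)) % (a * b)

print(crt(2, 3, 3, 5))  # 8, since 8 mod 3 = 2 and 8 mod 5 = 3
```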
Merkle and Hellman developed the trapdoor knapsack encryption system, but Shamir broke it
RSA (Rivest - Shamir - Adleman) picks two primes and publicizes their product; breaking it requires factoring that product
Diffie and Hellman defined digital signatures
Shamir
and Blakley threshold schemes for requiring k pieces of information to
determine secret
Purdy
developed high security log-in ids
Data Encryption Standard (DES) uses 64-bit plaintext blocks and a 56-bit key, with 16 rounds of transformations
Truncation
vs. Rounding error
Condition
number = |relative change in solution|/|relative change in input data|
IEEE
SP 24 bit precision, IEEE DP 53 bit precision
Machine precision = B^(1-t)
Solving
Linear Systems
Gaussian elimination for LU
factorization with partial pivoting
Gauss-Jordan elimination to
diagonal form
Sherman-Morrison formula for
inverse of a matrix with a rank-one change to matrix with known inverse
Woodbury formula for inverse
of a matrix with a rank-k change to matrix with known inverse
One-norm (Manhattan norm) is sum of absolute values; p-norm is pth root of sum of |values|^p
Matrix one-norm is max absolute column sum; infinity-norm is max absolute row sum
Condition number of matrix = ||A||*||A^(-1)||
A symmetric (A = A^T) positive definite (x^T*A*x > 0 for all x) matrix can be factored A = L*L^T by Cholesky factorization, scaling columns by square root of diagonal entry
Band systems are nearly
diagonal; tridiagonal is a band with bandwidth one
Linear
Least Squares (Ax ~ b)
For overdetermined systems; has unique solution if the columns of A are
linearly independent
For orthogonal vectors, y^T*z = 0
Normal equations method: compute Cholesky factorization of A^T*A
Augmented Systems Method:
variation on normal equations method
Orthogonal matrices have
orthonormal columns, and orthogonal transformations preserve the
Euclidean norm
QR orthogonal
transformations
Householder: H = I - 2*v*v^T/(v^T*v)
Givens Rotation: compute
sine and cosine
Gram-Schmidt Orthogonalization: determine two orthonormal vectors that span the same subspace as two given vectors by orthogonalizing one against the other
Solution is not unique if
matrix is rank deficient
Eigenvalues
and Singular Values
Spectrum is all eigenvalues, and spectral radius is the maximum absolute value of the eigenvalues
Characteristic polynomial: det(A - lambda*I) = 0
Symmetric (A = A^T), Hermitian (A = A^H), Orthogonal (A^T*A = A*A^T = I), Unitary (A^H*A = A*A^H = I), Normal (A^H*A = A*A^H)
Similarity transforms: B is similar to A if B = T^(-1)*A*T, and eigenvalues are preserved
Jacobi method for symmetric matrices: iterate A_(k+1) = J_k^T*A_k*J_k using the [cos,sin; -sin,cos] plane rotation matrix
QR Iteration: factor A_k = Q_k*R_k, then A_(k+1) = R_k*Q_k
Preliminary reduction to
Hessenberg (triangular except one extra adjacent nonzero diagonal) by
Householder
Power method for largest eigenvalue: x_k = A*x_(k-1), optionally with normalization, shifts, and deflation
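A power-iteration sketch with per-step normalization, using a Rayleigh quotient for the eigenvalue estimate (the matrix is an arbitrary example with dominant eigenvalue 3):

```python
import numpy as np

A = np.array([[2.0, 1.0], [1.0, 2.0]])   # eigenvalues 3 and 1
x = np.array([1.0, 0.0])
for _ in range(50):
    x = A @ x
    x /= np.linalg.norm(x)               # normalize each step

print(x @ A @ x, x)                      # ~3.0, ~[0.707, 0.707]
```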
Inverse iteration of power
method for smallest or given approximated eigenvalue
Rayleigh Quotient Iteration;
Rayleigh-Ritz procedure for approximating eigenvalue over subspace
of higher dimension
Lanczos and spectrum-slicing
methods for symmetric matrices
Singular value decomposition
for rectangular matrices
Nonlinear
Equations
Bisection method: reducing
search interval by half each step
Fixed-point iteration: use
fixed point where x = g(x)
Newton's method: x_(k+1) = x_k - f(x_k)/f'(x_k)
Secant method: like Newton's
method but approximates derivative
Inverse Interpolation: fit
quadratic to three points
Linear fractional
interpolation: solve three-equation system, with horizontal and vertical
asymptotes
Broyden's Method for
nonlinear systems: updates approximate Jacobian matrix
Optimization
If all functions are linear,
this is a linear programming problem
Hessian matrix is matrix of
second partial derivatives
Golden section search:
choose relative positions of search boundary points as t and 1-t, with t =
(sqrt(5)-1)/2
~ 0.618
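A golden-section search sketch with t = (sqrt(5)-1)/2, minimizing the unimodal example f(x) = (x-2)^2 on [0, 5] (the function, interval, and tolerance are my own choices):

```python
import math

def golden_section(f, a, b, tol=1e-8):
    t = (math.sqrt(5) - 1) / 2             # ~0.618
    x1, x2 = b - t*(b - a), a + t*(b - a)
    f1, f2 = f(x1), f(x2)
    while b - a > tol:
        if f1 > f2:                        # minimum lies in [x1, b]
            a, x1, f1 = x1, x2, f2
            x2 = a + t*(b - a)
            f2 = f(x2)
        else:                              # minimum lies in [a, x2]
            b, x2, f2 = x2, x1, f1
            x1 = b - t*(b - a)
            f1 = f(x1)
    return (a + b) / 2

print(golden_section(lambda x: (x - 2)**2, 0.0, 5.0))  # ~2.0
```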
Successive Parabolic
Interpolation: iteratively fit three points to parabola and find minimizer
Newton's method: x_(k+1) = x_k - f'(x_k)/f''(x_k)
Steepest Descent Method:
iteratively subtract line search parameter times gradient
Newton's method (for
multidimensional unconstrained opt.): use gradient and Hessian
Secant updating methods
including BFGS method, approximating Hessian
Conjugate gradient method,
quasi-Newton methods, and truncated Newton methods
Gauss-Newton method for
nonlinear least squares using Jacobian
Levenberg-Marquardt method,
weighted combination of steepest descent and Gauss-Newton
Lagrange multipliers for
constrained optimization
Simplex method for linear
programming
Interpolation
A matrix whose columns are successive powers of an independent variable is a Vandermonde matrix
Monomial basis: find
polynomial of degree n-1 through n points
Lagrange interpolation
Newton interpolation
Incremental Newton
interpolation using divided differences
Hermite (osculatory) cubic
interpolation using values and derivatives at data points
Cubic spline (piecewise polynomial of degree 3, continuously differentiable 2 times) interpolation, using equations for each derivative to get a system of linear equations
B-splines form a basis for
the family of spline functions of a given degree
Numerical
Integration and Differentiation
Numerical quadrature is the
approximation of definite integrals
Newton-Cotes Quadrature
Rules: Midpoint, Trapezoid, and Simpson's rules
Method of undetermined
coefficients creates a Vandermonde system and solves for coefficients
Gaussian Quadrature Rules
are based on interpolation, bunching nodes at endpoints instead of
evenly, often using Legendre
polynomial; Gauss-Kronrod Quadrature Rules
Finite difference
approximations for smooth function derivatives; centered difference formula
Automatic Differentiation
Richardson extrapolation improves accuracy of numerical integration or differentiation
Romberg integration uses
Richardson extrapolation with trapezoid quadrature rule
Initial
Value ODEs
Euler's Method: y_(k+1) = y_k + f(t_k,y_k)*dt ; stability interval (-2,0)
Implicit backward Euler method: y_(k+1) = y_k + f(t_(k+1),y_(k+1))*h_k
Taylor series methods use high-order derivatives
Runge-Kutta methods,
including Heun's second-order, and classical
fourth-order (which reduces
to Simpson's rule if f
depends only on t)
Extrapolation, multistep,
predictor-corrector, and multivalue methods
Boundary
Value ODEs
Shooting method: replace bv
problem with a sequence of iv problems whose solutions converge
to bv problem
Superposition method: also
replaces bv problem with iv problems
Finite difference method:
convert bv problem into algebraic system of equations, replacing
derivatives with finite
difference approximations
Finite element method:
approximate bv problem by a linear combination of piecewise polynomial
basis functions; weighted
residual methods
Collocation: residual is
zero
Galerkin: residual is
orthogonal to space spanned by basis functions
Rayleigh-Ritz: residual is
minimized in weighted least squares sense
Partial
Differential Equations
Hyperbolic (time-dependent
physical processes not evolving to steady state), parabolic (time-
dependent physical processes
evolving toward steady state), and elliptic (time-independent,
already at steady state)
classifications of PDEs
Semidiscrete methods with
finite differences or finite elements
Fully discrete methods
Jacobi, Gauss-Seidel, Successive
Over-Relaxation, Conjugate Gradient, and Multigrid methods
Fast
Fourier Transform
Trig interpolation
represents a periodic function as a linear combination of sines and cosines
Continuous Fourier
Transform, Fourier series expansion, and Discrete Fourier Transform
Random
Numbers and Stochastic Simulation
Stochastic simulation mimics
a system by exploiting randomness to obtain a statistical sample of
possible outcomes
Random number generators: congruential (using seed and modulus), Fibonacci, and nonuniform