Basic Math


Meeting points in a triangle:  orthocenter (altitudes), incenter (angle bisectors; inscribed circle), centroid

(medians), circumcenter (perpendicular bisectors; circumscribed circle)

Parabola: x2 = 4py ;  focus (0,p), directrix y = -p

Ellipse: x2/a2 + y2/b2 = 1; foci (+-sqrt(a2-b2),0), vertices (+-a,0)

Hyperbola: x2/a2 - y2/b2 = 1 ; foci (+-sqrt(a2+b2),0), vertices (+-a,0), asymptotes y = +-(b/a)x

abscissa (x), ordinate (y)

cos(pi/3) = 1/2

sin(2x) = 2sin(x)cos(x);  cos2(x) = (1+cos(2x))/2;  sin2(x) = (1-cos(2x))/2

Law of Cosines: a2 = b2 + c2 -2*b*c*cosA

Law of Sines: (sinA)/a = (sinB)/b = (sinC)/c





Dot (scalar) product: dot(a,b) = |a||b|cos(q)

Cross (vector) product: cross(a,b) = |a||b|sin(q)n

Scalar projection: compab = dot(a,b)/|a|

Vector projection: projab = dot(a,b)/|a|2*a

Squeeze Theorem: limit of a function between two functions

Isaac Newton (British, 1642) attended lectures given by Isaac Barrow and published Principia

Mathematica in 1687 due to the encouragement of Halley.

Augustin-Louis Cauchy (French, 1789) developed the rigorous definition of limits.

Intermediate Value Theorem: for any N between f(a) and f(b), there is a c in [a,b] such that f(c) = N

f'(x) = lim (f(x+h)-f(x))/h 

Gottfried Leibniz (German, 1646) worked as a diplomat and developed calculus, publishing it in 1684, and

used the Leibnez notation (d/dx).

Binomial Theorem: (x+y)n = xn + nxn-1y + . . . + (n,k)xn-kyk + . . . + nxyn-1 + yn

Poiseuille developed the law of laminar flow, relating velocity and radius in fluid flow in a tube; flux is

proportional to the fourth power of the radius.

Derivatives: sin = cos, cos = -sin, tan = sec2, csc = -csc*cot, sec = sec*tan, cot = -csc2

Chain Rule: dy/dx = (dy/du)*(du/dx)

Niels Abel (Norway, 1800s) showed that there is no general formula for a fifth-degree equation.

A cycloid is the curve traced out by a point on the circumference of a circle as the circle rolls along a line.

Newton-Raphson method for approximating zeros of a function.

Hadamard and de la Vallee Poussin proved the Prime Number Theorem, proposed by Gauss.

Newton's law of cooling states that the rate of cooling of an object is proportional to the temperature

difference between the object and its surroundings.

Catenary curve: y = C+a*cosh(x/a)

L'Hospital's rule:  lim(f/g) = lim(f'/g')

Extreme Value Theorem: f has an absolute minimum and maximum in [a,b]

Fermat's Theorem: f'(c) = 0 at max or min point c

Pierre Fermat (French, 1601) was a lawyer who developed analytical geometry with Rene Descarte.

Critical number: f'(c) = 0 or does not exist

Mean Value Theorem: f'(c) = (f(b)-f(a))/(b-a)

Joseph-Louis Lagrange (French, 1736) was born in Italy, developed the Mean Value Theorem, applied

calculus to the analysis of the solar system, and worked at the Berlin Academy under Frederick the Great

and the Louvre under Louis XVI.

First and Second Derivative Tests for local minima or maxima

Bernhard Riemann (Germann, 1826), studied under Gauss at Gottingen, defined integrals in terms of the

limit of an infinite summation (Riemann sum), and died of TB at 39.

Midpoint (error over 24n2), Trapezoid (error over 12n2), and Simpson's (error over 180n4) rules for

approximating integrals

Fundamental Theorem of Calculus: if g(x) = integral of f(t) from a to x, g'(x) = f(x) ;  integral of f(x) from

a to b equals F(b) - F(a) with F the antiderivative of f

Augustin Fresnel (French, 1788) developed the Fresnel function (integral of sin(pi*t2/2) from 0 to x), used

in optics and highway construction.

Substitution Rule for integrals

ln(x) = integral of 1/t from 1 to x

Disk method for volumes of revolution: integral of pi times f(x)2

Washer method for volumes of revolution: pi times the integral of f(x)2 - g(x)2

Cylindrical shells method for volumes: integral of 2 times pi times x times f(x)

Average value of a function: 1/(b-a) times integral of f(x) from a to b

Mean Value Theorem for Integrals: integral of f(x) from a to b = f(c)*(b-a)

James Bernoulli (Swiss, 1700s) introduced Bernoulli numbers: (1/n!) times the summation from 0 to n of

(n,k) times bk times xn-k and developed the technique for solving separable ODEs.

Integration by parts: integral of u*dv = u*v - integral of v*du

Thomas Simpson (British, 1700s) was a weaver who studied math and popularized Simpson's Rule.

An integral over an infinite region or containing an infinite discontinuity is an improper integral.

Planck's Law of Radiation models blackbody radiation.

Logistic growth: dy/dt = k*y*(M-y)

Arc length: integral from a to b of square root of one plus (dy/dx)2

Christopher Wren (British, 1600s) proved that the length of one arch of a cycloid is eight times the radius

of the generating circle.

Moment: sum of the mass times position of each particle

The centroid is the center of mass.

Theorem of Pappus: The volume of a solid formed by rotating a plane about a line is the product of the area

of the plane and the distance traveled by its centroid.

Chebyshev polynomials: Tn(x) = cos(n*arcos(x))

A monotonic sequence is always increasing (or always decreasing).

Every bounded, monotonic sequence is convergent.

Sum of infinite geometric series:  a/(1-r)

In a telescoping sum, the terms cancel in pairs, leaving just the first and the last, such as the sum of


Harmonic series: sum of (1/n) for n from 1 to infinity

Tests for series convergence/divergence: Divergence test, integral test, comparison test, p-series test (sum

of 1/np is convergent if p>1), limit comparison (if lim(a/b) = c, both diverge or both converge).

Alternating series: terms alternatingly positive and negative, such as the sum of (-1)n-1/n;  the alternating

series test says it is convergent if magnitudes are decreasing and the limit of the non-alternating sequence

is zero, and is absolutely convergent if the absolute value of the series converges.

Ratio test for absolute convergence (lim(|an+1/an|)

Power series: summation of cnxn ; may have a radius of convergence

Friedrich Bessel (German, 1784) developed Bessel functions to solve Kepler's equation for planetary

motion; Bessel functions are sums of power series.

James Gregory (Scottish, 1638) developed the power series for arctan, which leads to the Leibniz formula

for pi: pi/4 = 1 - 1/3 +1/5 - 1/7 + . . .

Taylor series: f(x) = summation of f(n)(a)*(x-a)n/n! ;   if a = 0, it is the Maclaurin series.

Brook Taylor (English, 1685) and Colin Maclaurin (Scottish, 1698) represented functions as sums of power

series, as Gregory and Bernoulli had done earlier.

e = 1 + 1/1! + ! + 1/3! + . . .

sin(x) = x - x3/3! + x5/5! - . . .

Parallelepiped volume: dot(a,cross(b,c))

Ellipsoids: x2/a2 + y2/b2 + z2/c2 = 1

Hyperboloids: x2/a2 + y2/b2 - z2/c2 = 1 (one sheet) or -x2/a2 - y2/b2 + z2/c2 = 1 (two sheets)

Cones: z2/c2 = x2/a2 + y2/b2

Paraboloids: z/c = x2/a2 + y2/b2 (elliptic) or z/c = x2/a2 - y2/b2 (hyperbolic)

Cylinders: x2/a2 = y2/b2 = 1

A sharp corner in a vector function graph is a cusp.

Arc length: integral of r'(t) from a to b

Curvature (kappa): |dT/ds| = |T'(t)|/|r'(t)| = |r'(t)*r"(t)|/|r'(t)|3

Principal unit normal vector: T'(t)/|T'(t)|

Binormal vector: T(t) * N(t)

The normal plane contains N and B, the osculating plane contains T and N, and the osculating circle is in

the osculating plane and has radius 1/curvature.

Level curves: f(x,y) = k

Clairaut's Theorem:  order of taking partial derivatives does not matter.

Alexis Clairaut (French, 1713) published the first systematic treatise on 3D analytic geometry, including

the calculus of space curves, and developed Clairaut's Theorem.

Solutions to the Laplace equation (d2u/dx2 + d2u/dy2 = 0) are harmonic functions.

Wave equation: d2u/dt2 = a2(d2u/dx2)

Chain Rule: dz/dt = (dz/dx)(dx/dt) + (dz/dy)(dy/dt)

Implicit Function Theorem: dy/dx = -Fx/Fy

The gradient is the vector of partial derivatives of a function and is used with the del operator.

Second Derivative test uses sign of  fxxfyy-fxy2 and fxx to tell if critical point is min (both > 0) or max (first >

0 and second < 0)or saddle (first < 0)

Lagrange multipliers are used to minimize or maximize a function subject to a constraint; del(f) =

lambda*del(g) and g=k

Fubini's Theorem: order of taking multiple integrals does not matter.

Guido Fubini (Italian, 1879) proved a general version of Fubini's Theorem.

A multiple definite integral is an iterated integral.

A plane region of type I lies between the graphs of two continuous functions of x.

Polar coordinates: x = r*cos(q) ;  y = r*sin(q)  ; r2 = x2 + y2  ; tan(q) = y/x   (for cylindrical add z = z)

Cardioid: r = c + sin(q)

Rose: r = cos(aq)

Limacon (snail): r = 1+c*sin(q)

Conchoid: r = a + b*sec(q)

Moment of inertia: mass times square of distance to axis

Radius of gyration of a lamina about an axis: mR2 = I

Surface area: double integral of square root of fx2 + fy2 + 1

Spherical coordinates: x = rcos(q)sin(f)  ;  y = rsin(q)sin(f) ;  z = rcos(f) ;  r2 = x2 + y2 + z2

The Jacobian is the determinant of the matrix of partial derivatives of a system of functions.

Jacobians:  r for polar and spherical,  r2*sin(f) for spherical

Carl Jacobi (German, 1804) developed Jacobians for evaluating multiple integrals.

A vector field is conservative if it is the gradient of some scalar function.

The fundamental theorem for line integrals: the line integral of the gradient of f over a smooth curve from a

to b is f(r(b)) - f(r(a))

For a conservative vector field F = Pi + Qj, dP/dy = dQ/dx

Green's Theorem: The line integral over a closed line of Pdx + Qdy equals the double integral of dQ/dx -

dP/dy over the area enclosed by the curve.

George Green (English, 1793) was a baker but went to Cambridge at age 40 but died 4 years after

graduation; Lord Kelvin reprinted his findings.

The curl of a vector F is the cross product of the gradient and F;  the curl of the gradient of f is zero.

The divergence of a vector F is the dot product of the gradient and F ; the divergence of the curl of F is 0.

The divergence of the gradient of a function is the Laplace operator, del2 = del dot del

The flux of a vector field across a surface is the surface integral of the vector field over the surface.

Stokes' Theorem: The line integral over a closed line of a vector field F equals the double integral of the

curl of F over a surface bounded by the line.

George Stokes (Irish, 1819) was a Cambridge professor, studied fluid flow and light and learned Stokes'

Theorem from Lord Kelvin.

The line integral of the velocity field dotted with the unit tangent vector is the circulation.

Gauss's Theorem (Divergence Theorem or Ostrogradsky's Theorem):  the surface integral of a vector field

F over a closed surface equals the triple integral of the divergence of F over the volume enclosed by

the surface.

Karl Friedrich Gauss (German, 1777) studied electrostatics and developed the Divergence Theorem.

Spring equation:  m(d2x/dt2) + c(dx/dt) + kx = F(t);  Kirchoff's Law: L(d2Q/dt2) + R(dQ/dt) + Q/C = E(t)

(displacement - charge, velocity - current, mass - inductance, damping constant - resistance, spring

constant - elastance, external force - electromotive force)



Discrete Math


P(n,r) = n!/(n-r)!

Arrangements of n objects of r types: n!/(n1!*n2!*. . .*nr!)

C(n,r) = n!/(r!(n-r)!)

Binomial Theorem: (x+y)n = summation of C(n,k)*xkyn-k for k from o to n

Combinations with repetition: C(n+r-1,r) ; distribute r identical objects n ways

Catalan Numbers (by Belgian Eugene Catalan, 1814): number of ways to parenthesize a product or go from

origin to (n,n) below y=x; bn = C(2n,n)/(n+1);  1,1,2,5,14,42,132

DeMorgan's Laws: -(p v q) = -p A -q ; -(p A q) = -p v -q

Idempotent Laws: p v p = p ; p A p = p

Domination Laws: p v T = T ; p A F = F

Absorption Laws: p v (p A q) = p  ;  p A (p v q) = p

Identity Laws: p v F = p ; p A T = T

Inverse Laws: p v -p = T ; p A -p; = F

Contrapositive of p -> q : -q -> -p

Converse of p -> q : q -> p

Inverse of p -> q : -p -> -q

Modus Ponens (Rule of Detachment): p, p ->q  :  q

Modus Tollens: p -> q, -q : -p

Law of the Syllogism: p -> q, q -> r : p -> r

Rule of Disjunctive Syllogism: p v q, -p : q

Quantifiers for all and for each

Rule of Universal Specification: If an open statement is true for all replacements, then it is true for each

specific member

Rule of Universal Generalization: If an open statement is true for any arbitrary replacement, it is true for all


George Boole established mathematical logic, Augustus DeMorgan extended it, and Charles Sanders Peirce

introduced quantifiers.

The power set of A is the set of all subsets of A and has 2n elements.

The dual of a set expression interchanges universal sets and null sets, and unions and intersections.

Venn diagrams

Georg Cantor (Russian, 1845) defined sets, Bertrand Russell (English, 1872) introduced Russell's paradox

showing inconsistencies in set theory, and Kurt Godel (Austrian, 1906) showed that some things must either

be unprovable or contradictory.

Well-Ordering Principle: every nonempty subset of the positive integers contains a smallest element.

Mathematical Induction: If S(1) is true, and whenever S(k) is true, S(k+1) is true, then S(n) is true for all

positive integers k;  basis and inductive steps.

Euclid's theorem that there are infinitely many primes.

Euclid's algorithm for finding greatest common divisor.

A Diophantine equation is a linear equation requiring integer solutions.

Fundamental Theorem of Arithmetic: Every integer greater than one can be written as a unique product of


Algorithm and algebra come from the Islamic mathematician al-Khowarizimi.

Augusta Ada Byron Lovelace developed the first computer algorithm.

The Cartesian product of two sets combines all elements, of size the product of the sizes of the two sets.

There are 2|A|*|B| relations from A to B.

In a function from A to B, every element of A appears exactly once as the first component of an ordered

pair in the relation.

There are |B||A| functions from A to B.

In an injective, or one-to-one function, each element of B appears at most once as the image of an element

of A.

There are P(|B|,|A|) injective functions from A to B.

The domain of A can be restricted or extended to form images.

In a surjective, or onto function, for all elements b of B there is at least one element a of A such that f(a)=b.

There are the summation of (-1)k(n,n-k)(n-k)m for k from 0 to n onto functions from A to B.

Stirling numbers of the second kind are 1/n! times the number of onto functions, and are the number of

ways to distribute m distinct objects into n identical containers.

There are |A||A|*|A| closed binary operations on A.

Pigeonhole Principle: If m pigeions occupy n pigeonholes and m>n, then at least one pigeonhole has two or

more pigeons roosting in it.

A function is bijective if it is both one-to-one and onto.

A function is invertible if and only if it is one-to-one and onto; there exists a function from its codomain to

its domain such that compoing in either order results in the identity.

The preimage is like the inverse for functions that are not invertible.

If |f(n)| <= m|g(n)| for all n>=k, then f is an element of O(g).

Peter Dirichlet formulated the pigeonhole principle.

Paul Bachmann introduced big-Oh notation.

Deterministic finite automata accept regular languages, push-down automata accept context-free languages,

and Turing machines accept recursively ennumerable languages.



Differential Equations


Logistic population model: dP/dt = k(1-P/N)*P

A differential equation with an initial condition of the solution function is an initial-value problem.

A differential equation is separable if it can be written as the product of one function depending only on

time and another that depends only on the dependent variable.

An ODE dependent only on the dependent variable is autonomous.

An implicit form of a solution is not solved explicitly for the dependent variable.

A slope field shows the approximate direction (slope) of solutions at points in the graph.

Euler's Method: yk+1 = yk + f(tk,yk)*dt

The Existence and Uniqueness Theorem states that solutions exist for all continuously

defined single ODEs dy/dt = f(t,y), and that they are unique if df/dy is continuous .

Points at which the function is not changing (f(t,y) = 0) are equilibrium points.

Phase lines show the direction of solutions between equilibrium points on a straight line.

An equilibrium point may be a source (solutions go towards it), sink (solutions go away from it), or node.

The Linearization Theorem classifies equilibrium points according to f'(y0).

A bifurcation is when a small change in a parameter causes a drastic change in the long-term behavior of

solutions; a bifurcation diagram shows the phase lines near a bifurcation value.

Integrating factor: I = eintegral of a(t)  for ODE in form dy/dt + a(t)y = k

The phase portrait graphs solutions to systems of two equations with the two dependent variables as axes.

Undamped spring harmonic oscillator: d2y/dt2 + (k/m)y = 0

The vector field is a slope field for a system of ODEs on the phase plane.

A system of ODEs in which the rate of change of one or more dependent variables depends only on its own

value decouples.

The P-Delta effect (gravity-overhang) makes oscillating buildings fall.

The Lorenz System of three ODEs led to the development of Chaos Theory.

The Linearity Principle states that if Y(t) is a solution to dY/dt = AY, kY(t) is a solution, and if Y1(t) and

Y2(t) are solutions, Y1(t) + Y2(t) is a solution.

AV = lV for eigenvalue l and eigenvector V.

The roots of the characteristic polynomial are the eigenvalues of a matrix.

Linear system equilibrium points may be sources (both eigenvalues +), sinks (both -), or saddles (+ and -).

Euler's Formula: ea+ib = eacosb + ieasin ;   spiral source (a>0), spiral sink (a<0), periodic center (a=0)

Damped harmonic oscillator: m(d2y/dt2) + b(dy/dt) + ky = 0 ;  b2 - 4km determines if it is

under/over/critically damped.

The trace of [a,b;c,d] is a+d;  the trace-determinant plane is a parameter plane that may be graphed.

In a homogeneous second-order equation, there is no term without some form of the dependent variable.

The Method of Undetermined Coefficients tries to solve forced (nonhomogenous) second-order equations

by guessing a solution with a parameter, substituting in, and solving for the parameter.

Van der Pol Equation: dx/dt = y, dy/dt = -x + (1-x2)y

Linearization neglects nonlinear terms near equilibrium points.

Separatrices tend toward a saddle equilibrium point as t goes to +- infinity, separating solutions with

different behaviors.

Nullclines are lines or curves along which the derivative of one of the dependent variables is zero.

For a Hamiltonian system, dx/dt = dH/dy and dy/dt = -dH/dx, including the harmonic oscillator and the idel

pendulum (dx/dt = y, dy/dt = -g*sin(x)).

Fourier transform: integral of y(t)*e-iwtdt from - infinity to + infinity

Laplace transform: integral of y(t)*e-stdt from 0 to infinity

                L(dy/dt) = sL(y) - y(0)

                L(eat) = 1/(s-a)

                L(tn) = n!/sn+1

                L(cf) = cL(f)

                L(cos(wt)) = s/(s2+w2)

                L(sin(wt)) = w/(s2+w2)

A Heaviside function is a step function; it has a discontinuity in its definition.

Convolution: (f*g)(t) = integral of f(t-u)g(u)du from 0 to t



Linear Algebra


Triangular form: in kth equation coefficients of first k-1 variables are zero

Elementary row ops: interchange, multiply by scalar, replace row with multiple of another row

Row echelon form: first nonzero entry in each row is one, and  number of leading zeros in each row is more

than in previous row

Gaussian elimination: perform row ops to transform matrix into row echelon form

Reduced row echelon form: the first nonzero entry in each row is the only nonzero entry in its column

Gauss-Jordan elimination: perform row ops to transform matrix into reduced row echelon form

Homogenous system of linear equations: constants on right side are all zero

Euclidean n-space: set of all n*1 matrices of real numbers

For a nonsingular (invertible) matrix A, there is a multiplicative inverse B such that AB = BA = I (identity)

Transpose: switch columns and rows

Row equivalent matrices: can be transformed into each other by elementary row ops

The minor of an element is the determinant of the matrix formed by deleting the row and column of that

element, and its cofactor is (-1)i+j times that determinant.

Interchanging rows changes determinant sign, multiplying a row by a scalar multiplies the determinant by

that scalar, and adding a multiple of one row to another does not change the determinant.

Cramer's Rule: For Ax = b, xi = det (Ai)/det(A)

Adjoint: replace each term by its cofactor and transpose; adj(A) = det(A)*A-1

A vector space is a set on which the operations of addition and scalar multiplication are defined and adhere

to the vector space axioms.

A subset of a vector space is a nonempty subset which is closed under addition and scalar multiplication.

The nullspace of a matrix A is the set of all solutions to Ax = 0.

The set of all linear combinations of a set of vectors is the span of those vectors.

Vectors are linearly dependent if their sum, with each multiplied by a nonzero coefficient, can be zero.

The Wronskian of a set of n functions is a matrix of their first through nth derivatives.

A set of linearly independent vectors that span a vector space form a basis.

The dimension of a vector space is the minimum number of vectors that form a basis. 

The row (column) space of a matrix is the subspace spanned by its row (column) vectors.

The rank of a matrix is the dimension of the row space of the matrix.

The nullity of a matrix is the dimension of its nullspace.

A linear tranformation is a mapping from one vector space to another.

The kernel of a linear transformation is the set of vectors in the original vector space that are mapped to 0.

The image of a subspace of the original vector space of a linear transform is the set of vectors in the new

vector space mapped from the original subspace; the image of the entire vector space is the range of the

linear transform.

Rotations may be done by the linear transform L(x) = [cos,-sin; sin,cos]x

A matrix B is similar to a matrix A if there is a nonsingular matrix S such that B = S-1AS.

Cauchy-Schwartz Inequality: |xTy| <= ||x|| ||y||

Orthogonal: xTy = 0

Dot (scalar) product: dot(a,b) = |a||b|cos(q)

Cross (vector) product: cross(a,b) = |a||b|sin(q)n

Scalar projection: compab = dot(a,b)/|a|

Vector projection: projab = dot(a,b)/|a|2*a

If a vector space can be written as a sum of two subspaces, it is a direct sum of those subspaces.

An inner product is an operation on a vector space that assigns a real number to each of its pairs of vectors.

Pythagorean Law: ||u+v||2 = ||u||2 + ||v||2 for orthogonal u and v in an inner product space

AM Legendre and Carl Gauss developed the technique of least squares to approximate overdetermined


If the product of any two vectors (one of them transposed) in an inner product space is 0, it is an orthogonal


An orthonormal set of vectors in an orthogonal set of unit vectors.

The column vectors of an orthogonal matrix form an orthonomal set.

Parseval's Formula for orthonormal basis for inner product spaces

Cooley and Tukey developed the Fast Fourier Transform to process signals.

The Gram-Schmidt orthogonalization process constructs an orthonormal basis for an inner product space.

A matrix can be factored into QR, with Q having orthonormal columns and R being upper triangular and


Jacobi (including Legendre and Tchebycheff), Hermite, and Laguerre  polynomials are orthogonal with

respect to an inner product

AV = lV for eigenvalue l and eigenvector V

Characteristic polynomial: p(l) = det(A - lI)

A matrix is diagonalizable if there is a matrix X such that X-1AX is a diagonal matrix.

A stochastic process depends on chance, and a Markov process is a stochastic process with a finite set of

possible outcomes where the probability of next outcome depends only on previous outcome and the

probabilities are constant with time.

A matrix is Hermitian if it is unchanged by conjugating each entry and then taking the transpose.

A matrix is unitary if its column vectors form an orthonormal set.

Schur's Theorem says that there is a matrix U such that UHAU is upper triangular.

The Spectral Theorem says that there is a unitary matrix that diagonalizes a Hermitian matrix.

A matrix is normal if AAH = AHA.

A matrix is positive definite if all eigenvalues are positive.

Cholesky decomposition: A symmetrix positive definite matrix may be factored into LLT with lower

triangular L.





Descriptive and inferential statistics

Stem-and-leaf displays: leading digit(s) are stem and trailing digits leaves

Histograms, mean, median, mode, quartiles, boxplots

Sample variance: sum(x-mean)2/(n-1)

Sample standard deviation = sqrt(sample variance)

Outlier > 1.5f, extreme > 3f

P(A) = 1 - P(A')

P(A u B) = P(A) + P(B) - P(A A B)

Permutations: ordered sequence of r from n: P(n,r) = n!/(n-r)!

Combinations: Unordered subset of r from n: C(n,r) = n!/(r!(n-r)!)

Conditional Probability: P(A|B) = P(A A B)/P(B)

Multiplication Rule: P(A A B) = P(A|B)*P(B)

Law of Total Probability: P(B) = P(B|A1) + . . . + P(B|Ak)  for mutually exclusive and exhaustive events

A1 . . . Ak

Bayes' Theorem: P(Aj|B) = P(Aj A B)/P(B) = P(B|Aj)*P(Aj)/(P(B|A1) + . . . + P(B|Ak))

P(A A B) = P(A)*P(B)     if A and B are independent

A random variable is any rule that associates a number with each outcome in the sample space; can be

discrete or continuous; a Bernoulli random variable is either 0 or 1

Probability mass function: p(x) = P(X=x)

Cumulative distribution function: F(x) = P(X <= x)

Expected value (mean value): E(X) = summation of x times p(x)

Variance: V(X) = summation of (x - expected value)2*p(x) = E[(X-expected value)2]

Standard deviation: SD = sqrt(variance)

Binomial probability: (n,x)px(1-p)n-x   probability of x successes out of n, each with probability p

                E(X) = np, V(X) = np(1-p)

Hypergeometric distribution is exact form of binomial; negative binomial distribution lets number of trials

vary with successes fixed

Poisson distribution: p(x;L) = e-LLx/x! ;  E(X) = V(X) = L  ;  Poisson process with L = at

Use integrals rather than summations for continuous random variables

Normal Distribution: (1/(sqrt(2*pi)*sigma)*e-(x-u)*(x-u)/(2*sigma*sigma) ;  in standard normal distribution,

mean u = 0 and standard deviation sigma = 1

Tail area za is the value for which a of the area lies to the right

Standardize: Z = (X-u)/sigma

About 68% within 1 SD, 95% within 2 SD, 99.7% within 3 SD

Gamma Distribution: 1/(BaL(a)) xa-1e-x/B  ;  E(X) = aB, V(X) = aB2

Exponential Distribution: Le-Lx  ;  E(X) = 1/L,  V(X) = 1/L2

Chi-Squared Distribution: Gamma distribution with a = v/2 and B = 2, with v = number of degrees of


Other Continuous Distributions: Weibull, Lognormal, Beta

Joint probability mass function: double sum or double integral of p(x,y) or f(x,y)

Marginal probability density functions: fx(x) = integral f(x,y)dy ; fy(y) = f(x,y)dx  from -inf. to +inf.

Expected value: E[h(X,Y)] = double integral of h(x,y)*f(x,y)dxdy from -inf. to +inf.

Covariance: Cov(X,Y) = E[(X-ux)(Y-uy)] = double integral of (x-ux)(y-uy)*f(x,y)dxdy from -inf. to +inf

                = E(XY) - uxuy

Correlation coefficient: px,y = Cov(X,Y)/(sigmax*sigmay) ;    = 0 if X and Y are independent

100(1-a)% confidence interval for the mean u is sample mean +- za/2 * (sigma/sqrt(n))

Sample size needed for an interval width w: n = (2*za/2 * (sigma/w))2

Prediction interval: sample mean +- ta/2,n-1 *s*sqrt(1+1/n)   with ta,v the value for which the area under the

curve with v degrees of freedom to the right is a

Hypothesis testing:  null hypothesis rejected if test statistic is in rejection region;  type I error wrongly

rejects, type II wrongly does not reject

Test statistic: z = (sample mean - actual mean)/(sigma/sqrt(n))

ANOVA:  analysis of variance



Number Theory and Cryptography


Group - a set with associative multiplication, an identity element, and an inverse operation

Field - a group on which addition, subtraction, and division (except by zero) are universally defined

Ring - a group with commutative and associative addition, a zero, negatives of all elements, and a

distributive law of multiplication and addition.

An Abelian group is commutative.

Galois theory - the conditions necessary for an equation to be solvable by radicals

Modulus of complex number is its magnitude

DeMoivre's Theorem: zn = rn(cos(nq) + isin(nq))

Euler's Formula: eiy = cos(y) + isin(y)

Claude Shannon wrote "Statistical Theory of Communication"

Dick Hamming developed Hamming codes for error correction; with n parity bits you can send 2n-n-1

message bits, correcting one error.

Golay code correct 3 errors with 12 message bits and 11 parity bits.

Hamming distance: how many bits are different

|message space|*|ball size| = |sending space|

Tufte wrote books about the visual display of information

Shannon's definition of information: Information(A) = -log(P(A));  entropy = - integral of p log p ;

                entropy of English is about 3.9;  maximum for n letters in 2n

Huffman codes for data compression;  no code can be a prefix of another code; # bits >=

(# symbols sent )*(entropy)

Distribution of noise: Gauss (law of large numbers), Poisson (law of rare events)


                Fermat prime: p = 1+22^N  for some nonnegative integer N

                Mersenne prime: p = -1 + 2q  for some prime q

Prime moduli

                Fermat's Theorem: am-1 = 1 mod m  for prime m

                Euler's Totient Theorem: t(pa) = (p-1)*pa-1   if p is prime

                                t(a*b) = t(a)*t(b)    if gcd(a,b) = 1

                Carmichael's universal exponent function

Euclidean method for finding gcd by successive divisions, added to by Aho/Hopcroft/Ullman with matrices

and row operations

Suntze's Theorem (the Chinese Remainder Theorem) outputs a y such that x = y mod (ab),

given x = umod a and x = w mod b

Merkel and Hellman developed the trapdoor knapsack encryption system but Shamir broke it

RSA (Riuest - Shamir - Adleman) pick two primes and publicize their product; breaking it requires solving

a discrete log problem

Diffie and Helman defined digital signatures

Shamir and Blakley threshold schemes for requiring k pieces of information to determine secret

Purdy developed high security log-in ids

Data Encryption Standard (DES) uses a 64-bit plaintext and 56-bit key, 16 transformations



Numerical Analysis


Truncation vs. Rounding error

Condition number = |relative change in solution|/|relative change in input data|

IEEE SP 24 bit precision, IEEE DP 53 bit precision

Machine precision = B1-t

Solving Linear Systems

                Gaussian elimination for LU factorization with partial pivoting

                Gauss-Jordan elimination to diagonal form

                Sherman-Morrison formula for inverse of a matrix with a rank-one change to matrix with known


                Woodbury formula for inverse of a matrix with a rank-k change to matrix with known inverse

                One-norm (Manhattan norm) is sum of absolute values; p-norm is pth root of sum of (values)p

                Matrix vector one-norm is max absolute column sum; infinity-norm is max absolute row sum

Condition number of matrix = ||A||*||A-1||

A symmetric (A = AT) positive definite (xTAx > 0 for all x) matrix can be factored A = LLT by

Cholesky factorization, scaling columns by square root of diagonal entry

Band systems are nearly diagonal; tridiagonal is a band with bandwidth one

Linear Least Squares  (Ax ~ b)

                For overdetermined systems; has unique solution if the columns of A are linearly independent

                For orthogonal vectors, yTz = 0

                Normal equations method: compute Cholesky factorization of ATA

                Augmented Systems Method: variation on normal equations method

                Orthogonal matrices have orthonormal columns, and orthogonal transformations preserve the

Euclidean norm

QR orthogonal transformations

                Householder:  H = I - 2vvT/(vTv)

                Givens Rotation: compute sine and cosine

                Gram-Schmidt Orthogonalization: determine two orthonormal vectors than span the same

subspace as two given vectors by orthogonalizing one against the other

Solution is not unique if matrix is rank deficient

Eigenvalues and Singular Values

                Spectrum is all eigenvalues and spectral radius is maximal eigenvalue

                Characteristic polynomial: det(A-LI) = 0

                Symmetric (A = AT), Hermitian (A = AH), Orthogonal (ATA = AAT = I), Unitary (AHA = AAH =

I), Normal (AHA = AAH)

                Similarity transforms: B is similar to A if B = T-1AT, and eigenvalues are preserved

                Jacobi method for symmetric matrices:  iterate Ak+1 = JkTAkJk using [cos,sin; -sin,cos] plane

rotation matrix

                QR Iteration:  Ak+1 = RkQk

                Preliminary reduction to Hessenberg (triangular except one extra adjacent nonzero diagonal) by


                Power method for largest eigenvalue: xk = Ax,k-1 , optionally with normalization, shifts, and


                Inverse iteration of power method for smallest or given approximated eigenvalue

                Rayleigh Quotient Iteration;  Rayleigh-Ritz procedure for approximating eigenvalue over subspace

of higher dimension

                Lanczos and spectrum-slicing methods for symmetric matrices  

                Singular value decomposition for rectangular matrices

Nonlinear Equations

                Bisection method: reducing search interval by half each step

                Fixed-point iteration: use fixed point where x = g(x)

                Newton's method: xk+1 = xk - f(xk)/f'(xk)

                Secant method: like Newton's method but approximates derivative

                Inverse Interpolation: fit quadratic to three points

                Linear fractional interpolation: solve three-equation system, with horizontal and vertical


                Broyden's Method for nonlinear systems: updates approximate Jacobian matrix


                If all functions are linear, this is a linear programming problem

                Hessian matrix is matrix of second partial derivatives

                Golden section search: choose relative positions of search boundary points as t and 1-t, with t =

(sqrt(5)-1)/2 ~ 0.618

                Successive Parabolic Interpolation: iteratively fit three points to parabola and find minimizer

                Newton's method: x,k+1 = xk - f'(xk)/f"(xk)

                Steepest Descent Method: iteratively subtract line search parameter times gradient

                Newton's method (for multidimensional unconstrained opt.): use gradient and Hessian

                Secant updating methods including BFGS method, approximating Hessian

                Conjugate gradient method, quasi-Newton methods, and truncated Newton methods

                Gauss-Newton method for nonlinear least squares using Jacobian

                Levenberg-Marquardt method, weighted combination of steepest descent and Gauss-Newton

                Lagrange multipliers for constrained optimization

                Simplex method for linear programming


                A matrix whose columns are successive powers of an independent variable is a Vandermonde

                Monomial basis: find polynomial of degree n-1 through n points

                Lagrange interpolation

                Newton interpolation

                Incremental Newton interpolation using divided differences

                Hermite (osculatory) cubic interpolation using values and derivatives at data points

                Cubic spline (piecewise polynomial of degree 3 continuously differentiable 2 times) interpolation,

using equations for each derivate to get system of linear equations

                B-splines form a basis for the family of spline functions of a given degree

Numerical Integration and Differentiation

                Numerical quadrature is the approximation of definite integrals

                Newton-Cotes Quadrature Rules:  Midpoint, Trapezoid, and Simpson's rules

                Method of undetermined coefficients creates a Vandermonde system and solves for coefficients

                Gaussian Quadrature Rules are based on interpolation, bunching nodes at endpoints instead of

evenly, often using Legendre polynomial;  Gauss-Kronrod Quadrature Rules

                Finite difference approximations for smooth function derivatives; centered difference formula

                Automatic Differentiation

                Richard extrapolation improves accuracy of numerical integration or differentiation

                Romberg integration uses Richardson extrapolation with trapezoid quadrature rule

Initial Value ODEs

                Euler's Method: yk+1 = yk + f(tk,yk)*dt  ;  stability interval (-2,0)

                Implicit backward Euler method: yk+1 = yk + f(tk+1,yk+1)hk

                Taylor series method use high-order derivatives

                Runge-Kutta methods, including Heun's second-order, and classical fourth-order (which reduces

to Simpson's rule if f depends only on t)

Extrapolation, multistep, predictor-corrector, and multivalue methods

Boundary Value ODEs

                Shooting method: replace bv problem with a sequence of iv problems whose solutions converge

to bv problem

                Superposition method: also replaces bv problem with iv problems

                Finite difference method: convert bv problem into algebraic system of equations, replacing

derivatives with finite difference approximations

                Finite element method: approximate bv problem by a linear combination of piecewise polynomial

basis functions;  weighted residual methods

                                Collocation: residual is zero

                                Galerkin: residual is orthogonal to space spanned by basis functions

                                Rayleigh-Ritz: residual is minimized in weighted least squares sense

Partial Differential Equations

                Hyperbolic (time-dependent physical processes not evolving to steady state), parabolic (time-

dependent physical processes evolving toward steady state), and elliptic (time-independent,

already at steady state) classifications of PDEs

                Semidiscrete methods with finite differences or finite elements

                Fully discrete methods

                Jacobi, Gauss-Seidel, Successive Over-Relaxation, Conjugate Gradient, and Multigrid methods

Fast Fourier Transform

                Trig interpolation represents a periodic function as a linear combination of sines and cosines

                Continuous Fourier Transform, Fourier series expansion, and Discrete Fourier Transform

Random Numbers and Stochastic Simulation

                Stochastic simulation mimics a system by exploiting randomness to obtain a statistical sample of

possible outcomes

                Random number generators: congruential (using seed and modulus), Fibonaci, and nonuniform