Course Description

Genomics is a new and very active application area of computer science. Over the past ten years there has been an explosion of genomics data: the entire DNA sequences of several organisms, including human, are now available. These are long strings of nucleotides (A, C, G, T) containing all the information necessary for an organism's development and life. Computer science is playing a central role in genomics: from sequencing and assembling DNA to analyzing genomes in order to locate genes, repeat families, and similarities between sequences of different organisms, among several other applications. The area of computational genomics includes both applications of older methods and the development of novel algorithms for the analysis of genomic sequences. This course aims to present some of the most basic and useful algorithms for sequence analysis, together with the minimal biological background necessary for a computer science student to appreciate their application to current genomics research. Sequence alignments, hidden Markov models, multiple alignment algorithms and heuristics such as Gibbs sampling, and the probabilistic interpretation of alignments will be covered. Applications of these tools to sequence analysis will be presented: comparing genomes of different species, gene finding, gene regulation, and whole genome sequencing and assembly. Whenever possible, examples will be drawn from the most current developments in genomics research.


Prerequisites

CS161 (Design and Analysis of Algorithms), or equivalent familiarity with algorithms and data structures. Basic knowledge of genetics and biology is helpful, but not required.

Requirements and Grading

The course will have four challenging problem sets of equal size and grading weight. These must be handed in at the beginning of class on the due date, which will usually be two weeks after they are handed out. Collaboration is allowed on the homework.

Recognizing that students may face unusual circumstances and require some flexibility in the course of the quarter, each student will have a total of five free late days (weekends are NOT counted) to use as s/he sees fit. Once these late days are exhausted, any homework turned in late will be penalized at the rate of 20% per late day (or fraction thereof). Under no circumstances will a homework be accepted more than three days after its due date.
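The late policy above can be read as a small calculation. The sketch below is one possible interpretation, not an official grading script: it assumes that days_late already excludes weekends, that a fraction of a day is rounded up to a full day for free late days as well as for the penalty, that the penalty is taken as a fraction of the earned score, and that the three-day cutoff is counted the same way. The function name and parameters are hypothetical.

```python
import math

PENALTY_PER_DAY = 0.20  # 20% per late day once free late days are exhausted
MAX_DAYS_LATE = 3       # homework is not accepted after this many days


def late_multiplier(days_late, free_days_left):
    """Return (score multiplier, free late days remaining afterwards).

    The multiplier is None if the homework is too late to be accepted.
    Assumption: days_late is measured in non-weekend days.
    """
    if days_late <= 0:
        return 1.0, free_days_left           # on time: no cost
    if days_late > MAX_DAYS_LATE:
        return None, free_days_left          # past the hard cutoff
    charged = math.ceil(days_late)           # "or fraction thereof"
    covered = min(charged, free_days_left)   # spend free late days first
    penalized = charged - covered            # remaining days are penalized
    multiplier = max(0.0, 1.0 - PENALTY_PER_DAY * penalized)
    return multiplier, free_days_left - covered
```

Under these assumptions, a homework turned in a day and a half late by a student with all five free late days left costs two late days and no points, while the same submission with one free late day left is penalized for one day.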

Late homework should be turned in to a member of the course staff, or, if none are available, placed under the door of S266 Clark Center. You must write the time and date of submission on the assignment. It is an honor code violation to write down the wrong time.

Students with biological and computational backgrounds are encouraged to work together.

Optionally, a student can scribe one lecture. Lecture notes will be due one week after the lecture date, and the grade on the lecture notes will replace the scores of the two lowest-scoring problems in the homeworks. To ensure even coverage of the lectures, please sign up to scribe beforehand with one of the course staff.

There will be a take-home final, which will generally be simpler than the homeworks. Collaboration is not allowed on the final. The homework will count for 80% of the grade, and the final will count for 20%.
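As a rough illustration of the weighting, the sketch below combines four equally weighted problem sets (80% in total) with the final (20%). It assumes every score is already expressed as a fraction between 0 and 1 and ignores late penalties and the optional scribing substitution; the function name is hypothetical.

```python
HOMEWORK_WEIGHT = 0.80
FINAL_WEIGHT = 0.20


def course_grade(homework_scores, final_score):
    """Weighted course grade from equally weighted homeworks and the final.

    Assumption: all scores are fractions in [0, 1].
    """
    homework_average = sum(homework_scores) / len(homework_scores)
    return HOMEWORK_WEIGHT * homework_average + FINAL_WEIGHT * final_score
```

With four problem sets, each one contributes 20% of the course grade, the same as the final.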

Collaboration and Honor Code

Students may discuss and work on problems in groups but must write up their own solutions. When writing up the solutions, students should list the names of the people with whom they discussed the assignment, and should not use written notes from group work.