Course Description
Genomics is a new and very active application area of computer science. Over the past ten years there has been an explosion of genomic data: the entire DNA sequences of several organisms, including human, are now available. These are long strings of base pairs (A, C, G, T) containing all the information necessary for an organism's development and life. Computer science plays a central role in genomics, from sequencing and assembling DNA to analyzing genomes in order to locate genes, repeat families, and similarities between sequences of different organisms, among other applications. The area of computational genomics includes both applications of older methods and the development of novel algorithms for the analysis of genomic sequences.

This course aims to present some of the most basic and useful algorithms for sequence analysis, together with the minimal biological background necessary for a computer science student to appreciate their application to current genomics research. Sequence alignments, hidden Markov models, multiple alignment algorithms and heuristics such as Gibbs sampling, and the probabilistic interpretation of alignments will be covered. Applications of these tools to sequence analysis will be presented: comparing genomes of different species, gene finding, gene regulation, and whole-genome sequencing and assembly. Whenever possible, examples will be drawn from the most current developments in genomics research.

The following courses are recommended:
  • CS161: Design and Analysis of Algorithms, or equivalent familiarity with algorithmic and data structure concepts.


Durbin, Eddy, Krogh, Mitchison, "Biological Sequence Analysis"

Gusfield, "Algorithms on Strings, Trees, and Sequences"

Requirements and Grading
  1. Homework. The course grade is based on the homework; there is NO FINAL. The course will have four challenging problem sets of equal size and grading weight. These must be handed in at the beginning of class on the due date, which will usually be two weeks after they are handed out. Collaboration is allowed on the homework. Recognizing that students may face unusual circumstances and require some flexibility in the course of the quarter, each student will have a total of three free late days (weekends are NOT counted) to use as they see fit. Once these late days are exhausted, any homework turned in late will be penalized at the rate of 20% per late day (or fraction thereof). Under no circumstances will a homework be accepted more than three days after its due date.

    Late homework should be turned in to a member of the course staff, or, if none are available, placed under the door of S266 Clark Center. You must write the time and date of submission on the assignment. It is an honor code violation to write down the wrong time. Students with biological and computational backgrounds are encouraged to work together.

  2. Scribing. Optionally, a student can scribe one lecture. Lecture notes are due one week after the lecture date, and the grade on the lecture notes will replace the two lowest-scoring problems in the homeworks. To ensure even coverage of the lectures, please sign up to scribe beforehand with one of the course staff.
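
As a rough illustration of the late-day policy in item 1, the following sketch computes the credit kept on a late homework. The function name `late_penalty` and its parameters are hypothetical. It assumes that lateness is given in days with weekends already excluded, that any remaining free late days are used before a penalty applies, and that the three-day cutoff is measured on the same count. The course staff's actual bookkeeping may differ.

```python
import math

FREE_LATE_DAYS = 3     # total free late days per student for the quarter
PENALTY_PER_DAY = 20   # percentage points lost per penalized late day
MAX_DAYS_LATE = 3      # homework is not accepted later than this


def late_penalty(days_late, free_days_left=FREE_LATE_DAYS):
    """Hypothetical helper for the late policy.

    days_late: lateness in days, weekends already excluded (assumption).
    Returns (credit_percent, free_days_left_afterwards), or None when the
    homework is more than three days late and would not be accepted.
    """
    if days_late > MAX_DAYS_LATE:
        return None
    # A fraction of a late day counts as a full late day.
    late_days = math.ceil(days_late) if days_late > 0 else 0
    # Assumption: free late days are spent first, then the penalty applies.
    used_free = min(late_days, free_days_left)
    penalized_days = late_days - used_free
    credit_percent = max(0, 100 - PENALTY_PER_DAY * penalized_days)
    return credit_percent, free_days_left - used_free


print(late_penalty(1.5))                   # (100, 1)
print(late_penalty(2, free_days_left=1))   # (80, 0)
print(late_penalty(3.5))                   # None
```

Credit is tracked in whole percentage points rather than as a fraction to avoid floating-point rounding in the result.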

Collaboration and Honor Code
Students may discuss and work on problems in groups but must write up their own solutions. When writing up the solutions, students should list the names of the people with whom they discussed the assignment, and should not use written notes from group work.

Class Schedule
Lecture: Tue Thu 02:45 PM - 04:00 PM at Skill 193
Review Sessions (Optional): Fri 03:15 PM - 04:05 PM at Skill 193

Instructor
Serafim Batzoglou
Office: Clark Center S266
Office hours: Tue 4:15-5:30 or by appt, Clark Center S266
Phone: (650) 723-3334
Email: ude.drofnats@mifares (written backwards to avoid spam)

Teaching Assistant
Eugene Fratkin
Office: Clark Center S256
Office hours: Wed 1:15-3:15, Mon 3:30-5:30, Clark Center S256
Phone: (650) 725-6094
Email: ude.drofnats@tsikehc (written backwards to avoid spam)

Questions should be sent directly to the instructor and the TA by email, or raised with course staff in person after lecture or during office hours.

Additional Material and Tutorials
Some additional materials can be found here.

As the quarter progresses, the following schedule will be updated accordingly. Please check back often for the latest material.

Lecture | Date | Topic | Reading | Homework | Scribe / Notes
1 | 1/9 | Introduction: Biology Background | - | - | Sample Notes
2 | 1/11 | Sequence Alignment: Dynamic Programming | Durbin Chapters 1, 2; Gusfield Chapters 11, 12.1, 12.2, 12.7 | - | Brian An Quang Tran
3 | 1/16 | Sequence Alignment Cont'd: Linear-Space Alignment | - | - | Boyko Kakaradov
4 | 1/18 | Heuristic Local Aligners; Four-Russian Algorithms | - | HW 1 Out | Richard Pang
5 | 1/23 | Hidden Markov Models: Decoding & Evaluation | Durbin Chapters 3, 4 | - | Chuan Sheng Foo
6 | 1/25 | Learning: EM / Baum-Welch | - | - | Atif Faheem
7 | 1/30 | Learning cont'd | - | - | Kari Lee
8 | 2/1 | Pair HMMs for Sequence Alignment | Durbin Chapter 4 | HW 2 Out | Bahman Bahmani
9 | 2/6 | DNA Sequencing | ARACHNE, Euler, Genome sizes, transposons, genomic mapping: mathematical analysis | HW 1 Due | Huy T. Vo
10 | 2/8 | DNA Sequencing and Fragment Assembly | - | - | Satish Viswanatham
11 | 2/13 | Cont'd Fragment Assembly | Gusfield Chapter 5; Genescan, Twinscan, EasyGene, SLAM | - | Jasmyn Pangilinan
12 | 2/15 | Molecular Evolution and Phylogenetic Trees | - | HW 2 Due; HW 3 Out | Joni Fazo
13 | 2/20 | Multiple Sequence Alignment | Gene Regulation and Motif Finding references below | - | Vinayak Ganeshan
14 | 2/22 | Chaining of Local Alignments, Protein Profile HMMs and Classification | - | - | Joel Galeson
15 | 2/27 | Gene Recognition | AVID, LAGAN | - | Yuliya Sarkisyan
16 | 3/1 | Gene Regulation, Microarrays | Chaining: Gusfield 13.3; Multiple Alignment: suggested reading Gusfield 14.1, 14.2, 14.5, 14.5, 14.10.1-14.10.2; Durbin Chapter 6 | HW 3 Due; HW 4 Out | Mohammadreza Alizadeh Attar
17 | 3/6 | Motif Finding | - | - | Francisco Luis Adarve
18 | 3/8 | Protein Interaction Networks | Durbin Chapters 7.1-7.4, 8.1-8.3 | - | Lukasz Szajkowski
19 | 3/13 | Protein Structure Prediction | - | - | Shradha Budhiraja
20 | 3/15 | TBA | - | HW 4 Due; HW 4 Solution | Michael Yu