Genomics is a new and very active application area of computer science. Over the past ten years there has been an explosion of genomics data: the entire DNA sequences of several organisms, including humans, are now available. These are long strings of base pairs (A, C, G, T) containing all the information necessary for an organism's development and life. Computer science plays a central role in genomics, from sequencing and assembling DNA sequences to analyzing genomes in order to locate genes, repeat families, and similarities between sequences of different organisms, among several other applications. The area of computational genomics includes both the application of older methods and the development of novel algorithms for the analysis of genomic sequences. This course aims to present some of the most basic and useful algorithms for sequence analysis, together with the minimal biological background necessary for a computer science student to appreciate their application to current genomics research. The course will cover sequence alignment, hidden Markov models, multiple alignment algorithms and heuristics such as Gibbs sampling, and the probabilistic interpretation of alignments. Applications of these tools to sequence analysis will be presented: comparing genomes of different species, gene finding, gene regulation, and whole-genome sequencing and assembly. Whenever possible, examples will be drawn from the most current developments in genomics research.
The following textbooks are recommended:
Durbin, Eddy, Krogh, Mitchison "Biological Sequence Analysis"
Gusfield "Algorithms on Strings, Trees, and Sequences"
Requirements and Grading
Homework. The course will be graded on the homework only; there is NO FINAL. The course will have four challenging problem sets of equal size and grading weight. These must be handed in at the beginning of class on the due date, which will usually be two weeks after they are handed out. Collaboration is allowed on the homework.
Recognizing that students may face unusual circumstances and require some flexibility in the course of the quarter, each student will have a total of three free late days (weekends are NOT counted) to use as s/he sees fit. Once these late days are exhausted, any homework turned in late will be penalized at the rate of 20% per late day (or fraction thereof). Under no circumstances will a homework be accepted more than three days after its due date.
Late homework should be turned in to a member of the course staff or, if none are available, placed under the door of Clark Center S266. You must write the time and date of submission on the assignment. It is an honor code violation to write down the wrong time.
Students with biological and computational backgrounds are encouraged to work together.
Scribing. Optionally, a student can scribe one lecture. Lecture notes will be due one week after the lecture date, and the grade on the lecture notes will replace the grades of the two lowest-scoring homework problems. To ensure even coverage of the lectures, please sign up to scribe beforehand with a member of the course staff.
Collaboration and Honor Code
Students may discuss and work on problems in groups but must write up their own solutions. When writing up the solutions, students should list the names of the people with whom they discussed the assignment. Also, students should not use written notes from group work when writing up their solutions.
Lecture: Tue Thu 02:45 PM - 04:00 PM at Skill 193
Review Sessions (Optional): Fri 03:15 PM - 04:05 PM at Skill 193
Office: Clark Center S266
Office hours: Tue 4:15-5:30 or by appointment, Clark Center S266
Phone: (650) 723-3334
Email: ude.drofnats@mifares (written backwards to avoid spam)
Office: Clark Center S256
Office hours: Wed 1:15-3:15, Mon 3:30-5:30, Clark Center S256
Phone: (650) 725-6094
Email: ude.drofnats@tsikehc (written backwards to avoid spam)
Questions should be sent directly to the instructor and the TA by email, or raised with the course staff in person after lecture or during office hours.
Additional Material and Tutorials
Some additional materials can be found Here
As the quarter progresses, the following schedule will be updated accordingly. Please check back often for the latest material.
| Lecture | Date | Topic | Readings | Homework | Lecture Notes |
|---|---|---|---|---|---|
| 1 | 1/9 | Introduction: Biology Background | | | Sample Notes |
| 2 | 1/11 | Sequence Alignment--Dynamic Programming | Durbin Chapters 1, 2; Gusfield Chapters 11, 12.1, 12.2, 12.7 | | Brian An Quang Tran |
| 3 | 1/16 | Sequence Alignment Cont'd--Linear-Space Alignment | | | Boyko Kakaradov |
| 4 | 1/18 | Heuristic Local Aligners; Four-Russians Algorithm | | HW 1 Out | Richard Pang |
| 5 | 1/23 | Hidden Markov Models--Decoding & Evaluation | Durbin Chapters 3, 4 | | Chuan Sheng Foo |
| 6 | 1/25 | Learning: EM / Baum-Welch | | | Atif Faheem |
| 7 | 1/30 | Learning Cont'd | | | Kari Lee |
| 8 | 2/1 | Pair HMMs for Sequence Alignment | Durbin Chapter 4 | HW 2 Out | Bahman Bahmani |
| 9 | 2/6 | DNA Sequencing | ARACHNE, Euler, genome sizes, transposons, genomic mapping--mathematical analysis | HW 1 Due | Huy T. Vo |
| 10 | 2/8 | DNA Sequencing and Fragment Assembly | | | Satish Viswanatham |
| 11 | 2/13 | Fragment Assembly Cont'd | Gusfield Chapter 5; Genescan, Twinscan, EasyGene, SLAM | | Jasmyn Pangilinan |
| 12 | 2/15 | Molecular Evolution and Phylogenetic Trees | | HW 2 Due; HW 3 Out | Joni Fazo |
| 13 | 2/20 | Multiple Sequence Alignment | Gene Regulation and Motif Finding references below | | Vinayak Ganeshan |
| 14 | 2/22 | Chaining of Local Alignments; Protein Profile HMMs and Classification | | | Joel Galeson |
| 15 | 2/27 | Gene Recognition | AVID, LAGAN | | Yuliya Sarkisyan |
| 16 | 3/1 | Gene Regulation, Microarrays | Chaining: Gusfield 13.3; Multiple Alignment (suggested): Gusfield 14.1, 14.2, 14.5, 14.10.1-14.10.2; Durbin Chapter 6 | HW 3 Due; HW 4 Out | Mohammadreza Alizadeh Attar |
| 17 | 3/6 | Motif Finding | | | Francisco Luis Adarve |
| 18 | 3/8 | Protein Interaction Networks | Durbin Sections 7.1-7.4, 8.1-8.3 | | Lukasz Szajkowski |
| 19 | 3/13 | Protein Structure Prediction | | | Shradha Budhiraja |
| 20 | 3/15 | TBA | | HW 4 Due; HW 4 Solution | Michael Yu |