Chuong (Tom) Do
23andMe, Inc.
1390 Shorebird Way
Mountain View, CA 94043
E-mail: chuongdo@cs.stanford.edu (cdo@23andme.com)
I am currently a research scientist at 23andMe, Inc., a personal genetics company based in Mountain View, CA. Previously, I completed a Ph.D. in Computer Science at Stanford University, where I did research in machine learning with Andrew Ng and computational biology with Serafim Batzoglou.
Links
CONTRA project: biosequence analysis with discriminative probabilistic models
CONTRAfold: RNA secondary structure prediction server
CONTRAlign: protein alignment server
PROBCONS: protein alignment server
LAGAN: genomic alignment
PRODA: protein alignments with rearrangements and repeats
Selected publications
Proximal regularization for online and batch learning [pdf]
Do, C.B., Le, Q., Foo, C.S. 2009. To appear in Proceedings of the 26th International Conference on Machine Learning. (extended version with proofs)
A majorization-minimization algorithm for (multiple) hyperparameter learning [pdf]
Foo, C.S., Do, C.B., Ng, A.Y. 2009. To appear in Proceedings of the 26th International Conference on Machine Learning.
A classifier-based approach to identify genetic similarities between diseases [pdf]
Schaub, M.A., Kaplow, I.M., Sirota, M., Do, C.B., Butte, A.J., Batzoglou, S. 2009. To appear in Bioinformatics.
What is the expectation maximization algorithm? [pdf]
Do, C.B., Batzoglou, S. 2008. Nature Biotechnology, 26:897-899.
Tighter bounds for structured estimation [pdf]
Do, C.B., Le, Q., Teo, C.H., Chapelle, O., Smola, A. 2008. In Advances in Neural Information Processing Systems 21.
A max-margin model for efficient simultaneous alignment and folding of RNA sequences [pdf]
Do, C.B., Foo, C.S., Batzoglou, S. 2008. Bioinformatics, 24(13):i68-i76. (source code)
Protein multiple sequence alignment [pdf]
Do, C.B., Katoh, K. 2008. Methods in Molecular Biology, 484:379-413.
Effect of genetic divergence in identifying ancestral origin using HAPAA [pdf]
Sundquist, A., Fratkin, E., Do, C.B., Batzoglou, S. 2008. Genome Research, 18:676-682. (website, source code)
Automatic parameter learning for multiple network alignment [pdf]
Flannick, J., Novak, A., Do, C.B., Batzoglou, S. 2008. In Proceedings of the 12th Annual International Conference on Computational Molecular Biology (RECOMB 2008), 214-231. (website)
CONTRAST: a discriminative, phylogeny-free approach to multiple informant de novo gene prediction [pdf]
Gross, S.S., Do, C.B., Sirota, M., Batzoglou, S. 2008. Genome Biology, 8:R269.
Efficient multiple hyperparameter learning for log-linear models [pdf]
Do, C.B., Foo, C.S., Ng, A.Y. 2007. In Advances in Neural Information Processing Systems 20.
Evolution of genes and genomes on the Drosophila phylogeny [pdf]
Drosophila 12 Genomes Consortium. 2007. Nature, 450:203-218.
Training conditional random fields for maximum labelwise accuracy [pdf]
Gross, S.S., Russakovsky, O., Do, C.B., Batzoglou, S. 2006. In Advances in Neural Information Processing Systems 19. (derivation of recurrences)
CONTRAfold: RNA secondary structure prediction without physics-based models [pdf]
Do, C.B., Woods, D.A., Batzoglou, S. 2006. Bioinformatics, 22(14):e90-e98. (Best Paper, ISMB 2006) (webserver, manual, recurrences, source code)
Multiple alignment of protein sequences with repeats and rearrangements [pdf]
Phuong, T.M., Do, C.B., Edgar, R.C., Batzoglou, S. 2006. Nucleic Acids Res, 34(20):5932-5942. (manual, source code)
Evidence for intelligent (algorithm) design [pdf]
Srinivasan, B.S., Do, C.B., Batzoglou, S. 2006. Genome Biology, 7:322.
CONTRAlign: Discriminative training for protein sequence alignment [pdf]
Do, C.B., Gross, S.S., Batzoglou, S. 2006. In Proceedings of the 10th Annual International Conference on Computational Molecular Biology (RECOMB 2006), 160-174. (Best Poster, BCATS 2005) (webserver, source code)
Transfer learning for text classification [pdf]
Do, C.B., Ng, A.Y. 2006. In Advances in Neural Information Processing Systems 18, 299-306.
PROBCONS: Probabilistic consistency-based multiple sequence alignment [pdf]
Do, C.B., Mahabhashyam, M.S.P., Brudno, M., and Batzoglou, S. 2005. Genome Research, 15(2):330-340. (Best Paper, ISMB/ECCB 2004; Best Master's Thesis, Stanford Computer Science Department 2004) (webserver, source code)
Glocal Alignment: Finding rearrangements during alignment [pdf]
Brudno, M., Malde, S., Poliakov, A., Do, C.B., Couronne, O., Dubchak, I., and Batzoglou, S. 2003. Bioinformatics, 19S1:i54-i62.
LAGAN and Multi-LAGAN: Efficient tools for large-scale multiple alignment of genomic DNA [pdf]
Brudno, M., Do, C.B., Cooper, G.M., Kim, M.F., Davydov, E., Green, E.D., Sidow, A., and Batzoglou, S.; NISC Comparative Sequencing Program. 2003. Genome Research, 13(4):721-731. (webserver, source code)
Odds and ends
Demosthenes: winning entry in the 2002 CS 221 Othello competition, written with Sanders Chong, Mark Tong, and Anthony Hui
Bunny World documentation: CS 108 final project, written with Xinan Wu
Satisfiability and peg solitaire: solving a classic puzzle with zChaff or other SAT solvers
Dense Boggle® boards: computing dense Boggle boards with simulated annealing
Miscellaneous tricks and tips: useful things I keep forgetting
Comments to Chuong Do (chuongdo@cs.stanford.edu)