Bio
I received my Ph.D. in Electrical Engineering and Computer Science from the Massachusetts Institute of Technology, where I was advised by David Gifford. My research focused on developing machine learning models for functional genomics and therapeutic design.
Before coming to MIT, I received my bachelor's degree from Tsinghua University in Beijing, China. I worked with Professor David Dill at Stanford in the summer of 2012, and I also studied at the University of Toronto as an exchange student.
Contact: AB@gmail.com (where A is my last name and B is my first name)
Research
Predict the functional impact of non-coding genetic variants
I developed machine learning models that accurately predict transcription factor binding and DNA methylation, two fundamental epigenetic phenotypes closely tied to gene regulation, from DNA sequence alone. The computationally predicted change in phenotype between the reference and alternate alleles of a genetic variant accurately reflects its functional impact and improves the identification of regulatory variants causal for complex diseases.
Model peptide display by the major histocompatibility complex
I devised machine learning models that improve the prediction of peptides displayed by the major histocompatibility complex (MHC) on the cell surface. Computational modeling of peptide display by MHC is central to the design of peptide-based therapeutics. Our machine learning models reduce false positives in high-affinity peptide design and improve predictive accuracy for natural MHC ligands.
Design novel antibody sequences with improved affinity and specificity
I developed machine learning frameworks to model the enrichment of an antibody sequence in phage-panning experiments against a target antigen. We show that a computational procedure using machine learning models trained on multiple targets can reduce the number of low-specificity antibodies. Moreover, machine learning can help design novel antibody sequences with improved affinity.
Publications (Google Scholar)
(* indicates co-first authors):
Antibody Complementarity Determining Region Design Using High-Capacity Machine Learning
Liu G*, Zeng H*, Mueller J, Carter B, Wang Z, Schilz J, Horny G, Birnbaum ME, Ewert S, and Gifford DK
bioRxiv, 2019
Quantification of uncertainty in peptide-MHC binding prediction improves high-affinity peptide selection for therapeutic design
Zeng H and Gifford DK
Cell Systems, 2019
Accurate prediction of MHC class I ligands using peptide embedding
Zeng H and Gifford DK
Intelligent Systems for Molecular Biology (ISMB/ECCB), 2019
Bioinformatics, 35(14), i278-i283, 2019
Visualizing Complex Feature Interactions and Feature Sharing in Genomic Deep Neural Networks
Liu G, Zeng H and Gifford DK
BMC Bioinformatics 20:401, 2019
Training GANs with Optimism
Daskalakis C*, Ilyas A*, Syrgkanis V*, and Zeng H*
International Conference on Learning Representations (ICLR), 2018
A novel k-mer set memory (KSM) motif representation improves regulatory variant prediction
Guo Y, Tian K, Zeng H, Guo X, and Gifford DK.
Genome Research, 28(6), 891-900, 2018
Predicting the impact of non-coding variants on DNA methylation.
Zeng H, and Gifford DK.
Nucleic Acids Research, 45(11): e99, 2017
Accurate eQTL prioritization with an ensemble-based framework
Zeng H, Edwards MD, Guo Y, and Gifford DK.
Human Mutation, 38(9), 1259-1265, 2017
K-mer Set Memory (KSM) Motif Representation Enables Accurate Prediction of the Impact of Regulatory Variants.
Guo Y, Tian K, Zeng H, and Gifford DK.
Research in Computational Molecular Biology (RECOMB), p. 372. Springer, 2017
Predicting gene expression in massively parallel reporter assays: a comparative study
Kreimer A, Zeng H, Edwards MD, Guo Y, Tian K, Shin S, Welch R, Wainberg M, Mohan R, Sinnott-Armstrong NA, Li Y, Eraslan G, Amin TB, Goke J, Mueller NS, Kellis M, Kundaje A, Beer MA, Keles S, Gifford DK and Yosef N
Human Mutation, 38(9), 1240-1250, 2017
Convolutional Neural Network Architectures for Predicting DNA-Protein Binding.
Zeng H, Edwards M, Liu G, and Gifford DK.
Intelligent Systems for Molecular Biology (ISMB), 2016
Bioinformatics, 32(12), i121-i127, 2016
A DNA code governs chromatin accessibility
Hashimoto T, Sherwood RI, Kang DD, Barkal AA, Zeng H, Emons BJM, Srinivasan S, Rajagopal N, Jaakkola T, and Gifford DK.
Genome Research, 26(10), 1430-1440, 2016
GERV: A statistical method for generative evaluation of regulatory variants for transcription factor binding
Zeng H, Hashimoto TB, Kang DD, and Gifford DK.
Bioinformatics, 32(4), 490-496, 2015
Abundant contribution of short tandem repeats to gene expression variation in humans
Gymrek M, Willems T, Guilmatre A, Zeng H, Markus B, Georgiev S, Daly MJ, Price AL, Pritchard J, Sharp A and Erlich Y.
Nature Genetics, 48(1), 22–29, 2015
A community computational challenge to predict the activity of pairs of compounds
Bansal M, Yang J, Karan C, Menden MP, Costello JC, Tang H et al.
Nature Biotechnology, 32(12), 1213-1222, 2014
Mining TCGA data using Boolean implications
Sinha S, Tsang EK, Zeng H, Meister M, and Dill DL.
PLOS ONE, 9(7): e102119, 2014
Integrative analysis of cancer data using Boolean implication networks.
Zeng H, Meister M, Sinha S, and Dill DL.
RECOMB SB/RG/DREAM, 2012
Patents
Machine learning based antibody design
Gifford DK, Zeng H, Liu G
US Patent App. 16/171,596