STAPLER

Efficient learning of TCR-peptide specificity prediction from full-length TCR-peptide data

Bjørn P. Y. Kwee 1,3,‡, Marius Messemaker 1,3,‡, Eric Marcus 2,4, Giacomo Oliveira 5,6, Wouter Scheper 1, Catherine J. Wu 5,6,7,8, Jonas Teuwen 2,4,9,10, and Ton N. Schumacher 1,3,*.

The prediction of peptide-MHC (pMHC) recognition by αβ T-cell receptors (TCRs) remains a major biomedical challenge. Here, we develop STAPLER (Shared TCR And Peptide Language bidirectional Encoder Representations from transformers), a transformer language model that uses a joint TCRαβ-peptide input to allow the learning of patterns within and between TCRαβ and peptide sequences that encode recognition. First, we demonstrate how data leakage during negative data generation can confound performance estimates of neural network-based models in predicting TCR-pMHC specificity. We then demonstrate that, because of its pre-training and fine-tuning masked language modeling tasks, STAPLER outperforms both neural network-based and distance-based ML models in predicting the recognition of known antigens in an independent dataset, in particular for antigens for which little related data is available. Based on this ability to efficiently learn from limited labeled TCR-peptide data, STAPLER is well-suited to utilize growing TCR-pMHC datasets to achieve accurate prediction of TCR-pMHC specificity.
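To make the idea of a joint TCRαβ-peptide input with a masked language modeling task more concrete, the following is a minimal sketch and not the authors' implementation. The special tokens (`[CLS]`, `[SEP]`, `[MASK]`), the function names `build_joint_input` and `mask_tokens`, the 15% masking rate, and the short example sequences are all assumptions chosen for illustration; STAPLER itself is described as using full-length TCR sequences.

```python
import random

# Assumed special tokens; the actual STAPLER vocabulary may differ.
SPECIAL = {"[CLS]", "[SEP]", "[MASK]"}


def build_joint_input(tcr_alpha, tcr_beta, peptide):
    """Concatenate alpha chain, beta chain and peptide into one token list."""
    return (
        ["[CLS]"] + list(tcr_alpha) + ["[SEP]"]
        + list(tcr_beta) + ["[SEP]"]
        + list(peptide) + ["[SEP]"]
    )


def mask_tokens(tokens, mask_prob=0.15, seed=0):
    """Replace a random subset of residue tokens with [MASK].

    Returns (masked_tokens, labels). labels holds the original residue at
    each masked position and None elsewhere, so a model can be trained to
    recover the hidden residues from the surrounding TCR and peptide context.
    """
    rng = random.Random(seed)
    masked, labels = [], []
    for tok in tokens:
        if tok not in SPECIAL and rng.random() < mask_prob:
            masked.append("[MASK]")
            labels.append(tok)
        else:
            masked.append(tok)
            labels.append(None)
    return masked, labels


# Hypothetical, shortened sequences used for illustration only.
tokens = build_joint_input("CAVRDSNYQLIW", "CASSIRSSYEQYF", "GILGFVFTL")
masked, labels = mask_tokens(tokens, mask_prob=0.15, seed=0)
print(" ".join(masked))
```

Because all three sequences share one token list, a masked residue in the peptide can in principle be predicted from TCR residues and vice versa, which is the kind of within- and between-sequence pattern the abstract refers to.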

Read here (pdf)

1 Division of Molecular Oncology & Immunology, Oncode Institute, The Netherlands Cancer Institute, Amsterdam, The Netherlands.
2 Division of Radiation Oncology, The Netherlands Cancer Institute, Amsterdam, The Netherlands.
3 Department of Hematology, Leiden University Medical Center, Leiden, The Netherlands.
4 Informatics Institute, University of Amsterdam, Amsterdam, The Netherlands.
5 Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA, USA.
6 Harvard Medical School, Boston, MA, USA.
7 Broad Institute of MIT and Harvard, Cambridge, MA, USA.
8 Department of Medicine, Brigham and Women’s Hospital, Boston, MA, USA.
9 Department of Medical Imaging, Radboud University Medical Center, Nijmegen, The Netherlands.
10 Department of Radiology, Memorial Sloan Kettering Cancer Center, New York City, NY, USA.
‡ These authors contributed equally: Bjørn Kwee, Marius Messemaker.

Competing Interest Statement
T.N.S. is an advisor for Allogene Therapeutics, Asher Bio, Celsius, Merus, Neogene Therapeutics, and Scenic Biotech; is a stockholder in Allogene Therapeutics, Asher Bio, Cell Control, Celsius, Merus, and Scenic Biotech; and is a venture partner at Third Rock Ventures, all outside of the current work. J.T. is an advisor for ScreenPoint Medical and is a stockholder in Ellogon.AI, all outside of the current work. C.J.W. is an equity holder of BioNTech. The remaining authors declare no competing interests.