Arriola, M., Johnston, K., Frances, A.

While standard protein representations depend on sequence alone, we hypothesize that protein structure provides richer information from which a model can learn properties such as fitness. We represent each protein of interest as a graph, that is, a network of connections between the amino acids in the protein, and implement a graph machine learning model to predict the protein's fitness. We show that this structure-based model outperforms sequence-only approaches for fitness prediction. We further extend the model to automatically identify the best protein for a given task by optimizing the protein's graph representation.
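The sketch below makes the graph representation concrete. It is a minimal illustration, not the authors' model, and it rests on several assumptions. Nodes are residues with one-hot amino-acid features. Edges connect residues whose C-alpha atoms lie within a hypothetical 8 Å cutoff. The predictor is a small, untrained graph convolutional network with mean pooling and a linear readout. All names and parameters (`build_residue_graph`, `predict_fitness`, `init_params`, `cutoff`, `hidden`) are hypothetical.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_residue_graph(coords, cutoff=8.0):
    """Adjacency lists linking residues whose C-alpha atoms are within `cutoff` angstroms.
    The distance-cutoff contact rule is an assumption, not the source's definition."""
    n = len(coords)
    neighbors = [[] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(coords[i], coords[j]) <= cutoff:
                neighbors[i].append(j)
                neighbors[j].append(i)
    return neighbors


def one_hot(residue):
    """Encode a single amino-acid letter as a 20-dimensional one-hot vector."""
    vec = [0.0] * len(AMINO_ACIDS)
    vec[AMINO_ACIDS.index(residue)] = 1.0
    return vec


def matvec(weights, vec):
    """Multiply a weight matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in weights]


def message_pass(features, neighbors, weights):
    """One graph-convolution step: average each node with its neighbors, apply a linear map, then ReLU."""
    updated = []
    for i, feat in enumerate(features):
        group = [feat] + [features[j] for j in neighbors[i]]
        mean = [sum(col) / len(group) for col in zip(*group)]
        updated.append([max(0.0, h) for h in matvec(weights, mean)])
    return updated


def init_params(hidden=16, num_layers=2, seed=0):
    """Random (untrained) weights; a real model would fit these to measured fitness data."""
    rng = random.Random(seed)
    dims = [len(AMINO_ACIDS)] + [hidden] * num_layers
    layers = [
        [[rng.gauss(0.0, 0.1) for _ in range(d_in)] for _ in range(d_out)]
        for d_in, d_out in zip(dims, dims[1:])
    ]
    readout = [rng.gauss(0.0, 0.1) for _ in range(hidden)]
    return {"layers": layers, "readout": readout}


def predict_fitness(sequence, coords, params):
    """Map a sequence plus per-residue C-alpha coordinates to a single scalar score."""
    neighbors = build_residue_graph(coords)
    h = [one_hot(aa) for aa in sequence]
    for weights in params["layers"]:
        h = message_pass(h, neighbors, weights)
    pooled = [sum(col) / len(h) for col in zip(*h)]  # mean-pool over all residues
    return sum(w * x for w, x in zip(params["readout"], pooled))


# Toy usage: four residues on a straight line, 3.8 angstroms apart.
coords = [(3.8 * i, 0.0, 0.0) for i in range(4)]
print(build_residue_graph(coords))
print(predict_fitness("ACDE", coords, init_params(hidden=8, seed=1)))
```

Because the weights here are random, the score only traces the data flow from structure to a single number. Training would fit `params` to proteins with measured fitness. A graph-based design lets residues that are far apart in sequence but close in the folded structure exchange information directly, which a sequence-only model cannot do without learning those contacts itself.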
