Predicting Protein Structure from First Principles

Understanding how an amino acid sequence attains its 3D structure has been a subject of intense research over the last 60 years. Recently, deep learning models such as AlphaFold2 and AlphaFold3 have represented a breakthrough in protein structure prediction. However, these algorithms reveal little about the fundamental physicochemical principles governing the folding process. Therefore, there is still a need for a deeper understanding of the “aufbau” principles of protein 3D structures and protein folding.

One of the open questions in protein folding and 3D-structure prediction is whether the process can be understood from the conformational preferences of short peptide fragments. To address this, we presented a series of initial computational and bioinformatics studies (see below), pioneering so-called ‘ab initio protein folding’, where ab initio means (1) without prior knowledge and (2) employing quantum chemistry rather than force fields.

We systematically investigated short peptides and searched for the emergence of secondary structure, aiming to lay the foundations of a rigorous theoretical framework for predicting protein 3D structures from first principles (ab initio). These efforts culminated in a joint computational and experimental study showing the gradual appearance of extended or helical propensities, as predicted by large-scale quantum chemical (DFT-D3) computations in implicit solvent (COSMO-RS). We concluded that in the complex conformational ensembles of capped (i.e., N-terminally acetylated and C-terminally amidated) tripeptides (triplets), secondary-structure propensities are barely visible, yet present. However, they can be fully developed in peptides as short as 11 amino acids. In particular, NMR data together with ECD and VCD spectra agreed perfectly with the computations and provided conclusive evidence that the 11-residue peptide CATWEAMEKCK forms a relatively stable α-helix. This gave us confidence that the computed low-energy conformational ensembles of the shorter peptides indeed correspond to their real counterparts in the test tube.

In parallel, we have constructed exhaustive databases of equilibrium structures and energies of capped amino acids, dipeptides and tripeptides, denoted P-CONF_1.6M and PeptideCs (see http://peptidecs.uochb.cas.cz). The energies were calculated from first principles at a calibrated quantum mechanical (DFT-D3) level, with solvation treated by the COSMO-RS method.

Finally, we have shown that the over- or under-population of certain short peptide fragments in particular protein secondary structures corresponds to the fragments' inherent propensities for those secondary structures.
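How a computed conformational ensemble translates into a secondary-structure propensity can be illustrated with standard Boltzmann weighting. The sketch below is not the group's workflow. It assumes relative free energies in kcal/mol per conformer (the kind of quantity DFT-D3/COSMO-RS calculations would supply), per-conformer labels such as "helix" or "extended" that are hypothetical (in practice they might come from backbone dihedral analysis), and a toy ensemble with made-up numbers. The `enrichment` helper is a generic observed-to-expected ratio, shown as one common way to quantify over- or under-population of a fragment in a secondary structure; it is an assumption, not necessarily the metric used in the studies described.

```python
import math

R_KCAL = 1.987204e-3  # molar gas constant in kcal/(mol*K)


def boltzmann_populations(energies_kcal, temperature=298.15):
    """Normalized Boltzmann populations for relative free energies (kcal/mol)."""
    e_min = min(energies_kcal)  # shift by the minimum for numerical stability
    rt = R_KCAL * temperature
    weights = [math.exp(-(e - e_min) / rt) for e in energies_kcal]
    total = sum(weights)
    return [w / total for w in weights]


def secondary_structure_propensity(conformers, label, temperature=298.15):
    """Boltzmann-weighted fraction of the ensemble assigned to `label`.

    `conformers` is a list of (relative_free_energy_kcal, ss_label) pairs.
    The labels are hypothetical assignments, not taken from any database.
    """
    pops = boltzmann_populations([e for e, _ in conformers], temperature)
    return sum(p for p, (_, lab) in zip(pops, conformers) if lab == label)


def enrichment(count_in_ss, total_in_ss, count_overall, total_overall):
    """Observed/expected frequency of a fragment in one secondary structure.

    Values > 1 indicate over-population, values < 1 under-population.
    (Generic metric, assumed here for illustration.)
    """
    observed = count_in_ss / total_in_ss
    expected = count_overall / total_overall
    return observed / expected


# Hypothetical toy ensemble of a capped tripeptide (made-up energies).
ensemble = [(0.0, "extended"), (0.4, "helix"), (1.2, "helix"), (2.5, "other")]
helix = secondary_structure_propensity(ensemble, "helix")
print(f"helix propensity: {helix:.2f}")
```

Shifting all energies by the minimum before exponentiating leaves the normalized populations unchanged but avoids underflow when an ensemble spans many kcal/mol.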