The recent explosion in the power of cryo-electron microscopy (cryo-EM) has revolutionised structural biology, especially the characterisation of large protein complexes. This is helping us tackle important biological problems in ways that were not previously possible.
There are, however, inherent limitations that not only complicate the structure-solving stage (i.e. properly positioning a protein chain within an electron density map and correctly defining the register of the sequence) but can also introduce errors that propagate to the refined structure. This is especially relevant for medium-resolution structures (4–8 Å). The problem is analogous to looking through blurry (or drunk) glasses and, without good points of reference, not being able to orientate yourself.
To improve this procedure, we can draw on the structural and evolutionary knowledge accumulated over decades and deposited in structural databases to guide the development of more effective methods for molecule placement.
We are using structural bioinformatics and machine learning to develop novel computational tools that aid cryo-EM and low-resolution crystal structure solving by analysing protein residue environments, protein interaction interfaces and protein functional sites. These methods will be brought together into an integrated platform for the evaluation and validation of medium-resolution protein structures.
A major challenge is identifying and understanding the short protein stretches that mediate functional interactions, and we have developed several methods for predicting these kinds of interactions. Understanding protein flexibility, a key component of protein interactions, is another challenge in fitting models of multi-protein complexes. By analysing the movements of proteins that undergo large conformational changes upon association (> 2 Å RMSD), we aim to identify the structural features that guide this motion and binding. Using these approaches, we aim to identify and reproduce the direction and extent of conformational change, which also has implications for the mechanics of protein recognition.
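As an illustration of the > 2 Å RMSD criterion, the sketch below computes the root-mean-square deviation between two sets of equivalent atoms and flags a large conformational change. It is a minimal example, not our analysis pipeline: the function names `rmsd` and `is_large_change` are hypothetical, and the code assumes the unbound and bound coordinates are already paired atom by atom and superposed in a common frame (for instance by a least-squares fit performed beforehand).

```python
import math

def rmsd(coords_a, coords_b):
    """Root-mean-square deviation (in Å) between two equally sized lists
    of (x, y, z) coordinates. Assumes atoms are paired by index and the
    two structures have already been superposed."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    squared = sum(math.dist(a, b) ** 2 for a, b in zip(coords_a, coords_b))
    return math.sqrt(squared / len(coords_a))

def is_large_change(unbound, bound, threshold=2.0):
    """True if the RMSD between the two states exceeds the threshold (Å)."""
    return rmsd(unbound, bound) > threshold

# Toy example (made-up coordinates): one of three atoms moves by 4 Å.
unbound = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0)]
bound = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 4.0, 0.0)]
print(round(rmsd(unbound, bound), 2), is_large_change(unbound, bound))
```

In practice the result depends strongly on which atoms are compared (for example Cα atoms only) and on how the superposition is done, so both choices should be reported alongside any RMSD value.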
We are also using graph-based signatures to evaluate protein structure, chemistry, interactions and geometry. These signatures will help identify problems in structures, and are also being used to build tools that identify active sites and protein, nucleic acid and small-molecule interaction sites, including cryptic pockets.
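To give a flavour of what a graph-based signature can look like, the sketch below treats atoms as labelled nodes and counts, for each pair of labels, how many atom pairs lie within a series of distance cutoffs. This is one simple, assumed form of such a signature, shown for illustration only: the function name `distance_signature`, the atom labels and the cutoff values are all hypothetical and do not describe our actual tools.

```python
import math
from itertools import combinations

def distance_signature(atoms, cutoffs=(2.0, 4.0, 6.0)):
    """Cumulative distance-pattern signature.

    atoms: list of (label, (x, y, z)) tuples.
    Returns a dict mapping ((label_1, label_2), cutoff) to the number of
    atom pairs with those labels whose distance is <= cutoff (Å).
    Label pairs are stored in sorted order so (a, b) and (b, a) coincide.
    """
    signature = {}
    for (label_a, xyz_a), (label_b, xyz_b) in combinations(atoms, 2):
        pair = tuple(sorted((label_a, label_b)))
        distance = math.dist(xyz_a, xyz_b)
        for cutoff in cutoffs:
            if distance <= cutoff:
                key = (pair, cutoff)
                signature[key] = signature.get(key, 0) + 1
    return signature

# Toy environment with made-up labels and coordinates.
atoms = [
    ("hydrophobic", (0.0, 0.0, 0.0)),
    ("hydrophobic", (3.0, 0.0, 0.0)),
    ("donor", (0.0, 5.0, 0.0)),
    ("acceptor", (0.0, 1.5, 0.0)),
]
sig = distance_signature(atoms)
for key in sorted(sig):
    print(key, sig[key])
```

Because the counts are cumulative over cutoffs, the resulting vector summarises both the chemistry (which labels occur together) and the geometry (at what distances) of a site, which is what makes this kind of representation convenient as input to machine learning methods.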