Annotating Protein Domains with Machine Learning – A Google Research Project
Google Researchers Use Machine Learning Approach To Annotate Protein Domains
In a recent study published in Nature Communications, Google researchers used a machine learning approach to annotate the structures of protein domains. The team employed a deep learning algorithm to detect the structural features of proteins and related domains from publicly available data.
Proteins are complex molecules that are essential for life. They perform a range of functions that are critical for the health of an organism. For example, proteins help maintain cell structure, regulate chemical reactions, and transport molecules within cells.
Knowing the structure of a protein domain is an essential step in understanding how proteins work. However, proteins are complex enough that manual annotation is laborious, time-consuming, and prone to error. Automating this process has the potential to improve both accuracy and speed.
To tackle this task, the researchers used a deep learning architecture called DeepDETECT to analyze protein domains. Specifically, they used a fully convolutional neural network to map raw amino acid sequences onto structural templates. They then used another layer to detect structural patterns, such as alpha-helices and beta-sheets, that help identify a protein's domains.
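To make the idea of a fully convolutional network reading a raw amino acid sequence more concrete, here is a minimal sketch in plain Python. It is not the DeepDETECT implementation, whose details the article does not give. The function names, the kernel width, and the hand-set weights are all illustrative assumptions. The sketch one-hot encodes a sequence, slides a single filter along it with zero padding, and returns one score per residue, so it works for sequences of any length.

```python
# Illustrative sketch only; NOT the DeepDETECT implementation.
# All names and weights below are hypothetical.

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def one_hot(sequence):
    """Encode each residue as a 20-dimensional indicator vector."""
    index = {aa: i for i, aa in enumerate(AMINO_ACIDS)}
    encoded = []
    for residue in sequence:
        vec = [0.0] * len(AMINO_ACIDS)
        vec[index[residue]] = 1.0
        encoded.append(vec)
    return encoded


def toy_kernel(width, residues):
    """Build a filter with weight 1.0 for the given residues at every
    window position and 0.0 elsewhere (hand-set, not learned)."""
    row = [1.0 if aa in residues else 0.0 for aa in AMINO_ACIDS]
    return [list(row) for _ in range(width)]


def conv1d_same(features, kernel, bias=0.0):
    """Slide the kernel along the sequence with zero padding so the
    output has exactly one score per input residue."""
    width = len(kernel)
    half = width // 2
    channels = len(features[0])
    scores = []
    for pos in range(len(features)):
        total = bias
        for k in range(width):
            src = pos + k - half
            if 0 <= src < len(features):  # positions outside act as zeros
                for c in range(channels):
                    total += kernel[k][c] * features[src][c]
        scores.append(total)
    return scores


def relu(values):
    """Standard rectified linear activation."""
    return [max(0.0, v) for v in values]


# Toy filter that counts alanine (A) and leucine (L) in a 3-residue window.
kernel = toy_kernel(3, "AL")
scores = relu(conv1d_same(one_hot("GAALG"), kernel))
print(scores)
```

In a real network the filter weights would be learned from annotated data, and many filters would be stacked in several layers. The point here is only that a convolution produces a per-residue output whose length follows the input sequence.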
The researchers found that DeepDETECT detected protein domains with an accuracy of 96.4%. The algorithm was also effective at recognizing other protein features, such as metal-binding sites and the locations of disulfide bonds. This suggests that DeepDETECT could be useful for a wide range of applications in protein structure analysis.
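The article does not say how the accuracy figure was computed. One common definition, shown here purely as an assumption, is the fraction of residues whose predicted label matches the reference annotation. The function name and the label alphabet (H for helix, E for strand, C for coil) are hypothetical.

```python
def per_residue_accuracy(predicted, reference):
    """Fraction of positions where the predicted label equals the reference.

    Assumed definition for illustration; the study may have used another.
    """
    if len(predicted) != len(reference):
        raise ValueError("label sequences must have the same length")
    if not reference:
        raise ValueError("label sequences must not be empty")
    matches = sum(p == r for p, r in zip(predicted, reference))
    return matches / len(reference)


# Five of six labels agree in this toy example.
accuracy = per_residue_accuracy("HHHEEC", "HHHECC")
print(f"{accuracy:.1%}")
```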
The use of machine learning to analyze protein domains could prove to be a powerful tool in the field of structural biology. It could be used to quickly annotate protein structures, which would speed up the process of drug development and could lead to more effective treatments.