Prediction of alphabets of local protein structures using data mining methods

The 3D structure of a protein backbone can be described using prototypes of local protein structures. A set of such prototypes forms a library of local protein structures, also called a structural alphabet. A structural alphabet is defined as a set of N prototypes, each L amino acids long.
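The idea of describing a backbone with a structural alphabet can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two prototypes below are toy values (N = 2, L = 3), not prototypes of any published alphabet, and the names `TOY_ALPHABET`, `rmsda` and `encode` are hypothetical. Each fragment is represented by the (phi, psi) dihedral angles of its residues and is assigned the letter of the closest prototype by root mean square deviation of angles, which is one common choice of distance.

```python
import math
from typing import Dict, List, Sequence, Tuple

Angles = Tuple[float, float]  # (phi, psi) of one residue, in degrees

# Toy alphabet: N = 2 prototypes of L = 3 residues (illustrative values only).
TOY_ALPHABET: Dict[str, List[Angles]] = {
    "a": [(-60.0, -45.0)] * 3,   # roughly helix-like
    "b": [(-120.0, 130.0)] * 3,  # roughly strand-like
}


def angle_diff(x: float, y: float) -> float:
    """Smallest signed difference between two angles, in degrees."""
    return (x - y + 180.0) % 360.0 - 180.0


def rmsda(u: Sequence[Angles], v: Sequence[Angles]) -> float:
    """Root mean square deviation over all dihedral angles of two fragments."""
    squares = [angle_diff(p, q) ** 2
               for res_u, res_v in zip(u, v)
               for p, q in zip(res_u, res_v)]
    return math.sqrt(sum(squares) / len(squares))


def encode(backbone: Sequence[Angles], alphabet: Dict[str, List[Angles]]) -> str:
    """Assign the nearest prototype letter to every overlapping fragment."""
    length = len(next(iter(alphabet.values())))  # L, taken from the alphabet
    letters = []
    for start in range(len(backbone) - length + 1):
        fragment = backbone[start:start + length]
        letters.append(min(alphabet, key=lambda k: rmsda(fragment, alphabet[k])))
    return "".join(letters)


# Four helix-like residues followed by three strand-like residues.
backbone = [(-62.0, -43.0)] * 4 + [(-118.0, 128.0)] * 3
print(encode(backbone, TOY_ALPHABET))
```

A chain of n residues yields n - L + 1 overlapping fragments, so the seven-residue toy backbone is encoded as the five-letter string "aaabb".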

Among the several approaches to predicting 3D structure from an amino acid sequence, one is based on predicting structural alphabet (SA) prototypes for the given sequence. Protein Blocks (PBs) are the best-known SA, composed of 16 prototypes, each five consecutive amino acids long.

In this research, models for predicting PBs from sequence information were developed using several data mining approaches. The amino acid sequences were combined with the output of the following tools: the Spider3 predictor of protein structure properties, several predictors of intrinsically disordered protein regions, and a tool for finding repeats in amino acid sequences. The resulting data were used as input to the prediction models of structural alphabet prototypes. The highest accuracy achieved by the constructed models is 80%, obtained with the model built using the C5.0 algorithm; the previous best available prediction has an accuracy of 61%.
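One way to combine a sequence with per-residue tool output into model input is a sliding window around each residue. The sketch below is an assumption about how such a table could be built, not the procedure used in this research: the function `window_rows`, the padding symbol and the feature name `disorder` are hypothetical, and the window size is arbitrary. Each residue becomes one row of attributes that a classifier such as a decision tree could then be trained on.

```python
from typing import Dict, List, Optional, Sequence

PAD = "-"  # placeholder for window positions beyond the chain ends


def window_rows(sequence: str,
                extra: Sequence[Dict[str, float]],
                half_width: int = 2) -> List[Dict[str, object]]:
    """Build one attribute row per residue from a window of neighbours.

    sequence   -- amino acid sequence in one-letter code
    extra      -- per-residue values from external tools (hypothetical keys)
    half_width -- number of neighbours taken on each side of the residue
    """
    if len(sequence) != len(extra):
        raise ValueError("sequence and extra must have the same length")
    names = sorted(extra[0]) if extra else []
    rows: List[Dict[str, object]] = []
    for i in range(len(sequence)):
        row: Dict[str, object] = {}
        for offset in range(-half_width, half_width + 1):
            j = i + offset
            inside = 0 <= j < len(sequence)
            # Amino acid at this window position, or padding off the chain.
            row[f"aa{offset:+d}"] = sequence[j] if inside else PAD
            # Tool output at the same position, or None off the chain.
            for name in names:
                value: Optional[float] = extra[j][name] if inside else None
                row[f"{name}{offset:+d}"] = value
        rows.append(row)
    return rows


# Hypothetical per-residue disorder scores for a three-residue sequence.
rows = window_rows("ACD",
                   [{"disorder": 0.1}, {"disorder": 0.5}, {"disorder": 0.9}],
                   half_width=1)
print(rows[0])
```

Keeping the amino acids as categorical values rather than one-hot vectors is a design choice that suits tree learners which accept nominal attributes; other model families would typically need a numeric encoding and an explicit treatment of the missing values at the chain ends.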

Mirjana Maljković, PhD, is a teaching assistant at the Department of Computer Science and Informatics, Faculty of Mathematics, University of Belgrade. She has taken part in teaching the following courses: Relational Databases, Database Programming, Software Development, Data Mining and Introduction to Programming. Her main areas of interest are data mining and bioinformatics. She is a member of the BioMath: Bioinformatics research group at the Faculty of Mathematics, University of Belgrade.