Nucleic acid structure prediction

Nucleic acid structure prediction is a computational method to determine secondary and tertiary nucleic acid structure from its sequence. Secondary structure can be predicted from one or several nucleic acid sequences. Tertiary structure can be predicted from the sequence, or by comparative modeling.
The problem of predicting nucleic acid secondary structure is dependent mainly on base pairing and base stacking interactions; many molecules have several possible three-dimensional structures, so predicting these structures remains out of reach unless obvious sequence and functional similarity to a known class of nucleic acid molecules, such as transfer RNA or microRNA, is observed. Many secondary structure prediction methods rely on variations of dynamic programming and therefore are unable to efficiently identify pseudoknots.
While the methods are similar, there are slight differences in the approaches to RNA and DNA structure prediction. In vivo, DNA structures are more likely to be duplexes with full complementarity between two strands, while RNA structures are more likely to fold into complex secondary and tertiary structures such as in the ribosome, spliceosome, or transfer RNA. This is partly because the extra oxygen in RNA increases the propensity for hydrogen bonding in the nucleic acid backbone. The energy parameters are also different for the two nucleic acids. The structure prediction methods can follow a completely theoretical approach, or a hybrid one incorporating experimental data.

Single sequence structure prediction

A common problem for researchers working with RNA is to determine the three-dimensional structure of the molecule given only a nucleic acid sequence. However, in the case of RNA much of the final structure is determined by the secondary structure or intra-molecular base pairing interactions of the molecule. This is shown by the high conservation of base pairings across diverse species.

The most stable structure

Secondary structure of small RNA molecules is largely determined by strong, local interactions such as hydrogen bonds and base stacking. Summing the free energy for such interactions should provide an approximation for the stability of a given structure. To predict the folding free energy of a given secondary structure, an empirical nearest-neighbor model is used. In the nearest-neighbor model, the free energy change for each motif depends on the sequence of the motif and of its closest base pairs. The model and its minimal-energy parameters for Watson–Crick pairs, GU pairs and loop regions were derived from empirical calorimetric experiments. The most up-to-date parameters were published in 2004, although most software packages use the prior set assembled in 1999.
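The summation described above can be sketched in a few lines. This is a minimal illustration only: the table `PLACEHOLDER_STACK` and its values are invented for the example and are not the published nearest-neighbor parameters, and strand orientation and loop terms are ignored.

```python
# Illustrative stacking energies in kcal/mol. These numbers are placeholders
# chosen for the example, NOT the published nearest-neighbor parameters.
PLACEHOLDER_STACK = {
    ("GC", "GC"): -3.0,
    ("GC", "AU"): -2.0,
    ("AU", "GC"): -2.0,
    ("AU", "AU"): -1.0,
}


def helix_energy(pairs):
    """Sum one stacking term for each pair of adjacent base pairs in a helix.

    `pairs` lists the base pairs from one end of the helix to the other,
    each written as a two-letter string such as "GC" or "AU".
    """
    total = 0.0
    for current, following in zip(pairs, pairs[1:]):
        total += PLACEHOLDER_STACK[(current, following)]
    return total


example_helix = ["GC", "GC", "AU", "AU"]
# Three stacks contribute: GC/GC, GC/AU and AU/AU.
print(helix_energy(example_helix))  # -6.0 with the placeholder table
```

A helix of k base pairs contributes k - 1 stacking terms, which is why each term depends on a pair and its nearest neighbor rather than on single pairs alone.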
The simplest way to find the lowest free energy structure would be to generate all possible structures and calculate the free energy of each, but the number of possible structures for a sequence increases exponentially with the length of the RNA: the number of secondary structures is approximately 1.8^N, where N is the number of nucleotides. For longer molecules, the number of possible secondary structures is huge: a sequence of 100 nucleotides has more than 10²⁵ possible secondary structures.
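The growth in the number of structures can be checked directly. The sketch below counts nested secondary structures for a concrete sequence by recursion; the pairing rules (`PAIRS`) and the minimum hairpin loop of three unpaired bases (`MIN_LOOP`) are assumptions of this example, so the counts illustrate the trend rather than reproduce the estimate quoted above.

```python
from functools import lru_cache

# Assumed pairing rules: Watson-Crick pairs plus the GU wobble pair.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}
MIN_LOOP = 3  # assumed minimum number of unpaired bases in a hairpin loop


def count_structures(seq):
    """Count nested secondary structures of seq, including the unfolded one."""

    @lru_cache(maxsize=None)
    def count(i, j):
        # Fragments too short to hold a pair have exactly one structure.
        if j - i < MIN_LOOP + 1:
            return 1
        total = count(i + 1, j)  # case 1: base i stays unpaired
        for k in range(i + MIN_LOOP + 1, j + 1):
            if (seq[i], seq[k]) in PAIRS:
                # case 2: i pairs with k, splitting the fragment in two
                total += count(i + 1, k - 1) * count(k + 1, j)
        return total

    return count(0, len(seq) - 1)


print(count_structures("GAAAC"))          # open chain or the single G-C pair
print(count_structures("GGGAAAUCCCGCAAGCUUCGGC"))
print(f"{1.8 ** 100:.2e}")                # the exponential estimate for N = 100
```

Explicit enumeration is clearly impractical at this rate of growth, which motivates the dynamic programming methods described next.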

Dynamic programming algorithms

Most popular methods for predicting RNA and DNA secondary structure involve dynamic programming. One of the early attempts at predicting RNA secondary structure was made by Ruth Nussinov and co-workers, who developed a dynamic programming-based algorithm that maximized the length and number of a series of "blocks". Each "block" required at least two nucleotides, which reduced the algorithm's storage requirements compared with single base-matching approaches. Nussinov et al. later published an adapted approach with improved performance that increased the RNA size limit to ~1,000 bases by folding increasingly sized subsections while storing the results of prior folds. In 1981, Michael Zuker and Patrick Stiegler proposed a refined approach with performance comparable to Nussinov et al.'s solution but with the additional ability to find "suboptimal" secondary structures.
Dynamic programming algorithms provide a means to implicitly check all variants of possible RNA secondary structures without explicitly generating the structures. First, the lowest conformational free energy is determined for each possible sequence fragment, starting with the shortest fragments and then moving to longer ones. For longer fragments, recursion on the optimal free energy changes determined for shorter sequences speeds the determination of the lowest folding free energy. Once the lowest free energy of the complete sequence is calculated, the exact structure of the RNA molecule is determined by tracing back through the stored results.
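The fill-then-trace-back procedure can be illustrated with a simplified sketch. It is an assumption of this example that the score is the number of base pairs (Nussinov-style pair maximization) rather than a nearest-neighbor free energy; the pairing rules and the `min_loop` default are likewise illustrative.

```python
# Assumed pairing rules: Watson-Crick pairs plus the GU wobble pair.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def nussinov(seq, min_loop=3):
    """Return (max number of pairs, dot-bracket structure) for seq."""
    n = len(seq)
    best = [[0] * n for _ in range(n)]

    def paired_score(i, k, j):
        # Score if base i pairs with base k inside the fragment i..j.
        right = best[k + 1][j] if k < j else 0
        return 1 + best[i + 1][k - 1] + right

    # Fill: shortest fragments first, then progressively longer ones.
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            score = best[i + 1][j]  # base i unpaired
            for k in range(i + min_loop + 1, j + 1):
                if (seq[i], seq[k]) in PAIRS:
                    score = max(score, paired_score(i, k, j))
            best[i][j] = score

    # Trace back: recover one structure that achieves the optimal score.
    structure = ["."] * n
    stack = [(0, n - 1)]
    while stack:
        i, j = stack.pop()
        if j - i <= min_loop:
            continue
        if best[i][j] == best[i + 1][j]:
            stack.append((i + 1, j))
            continue
        for k in range(i + min_loop + 1, j + 1):
            if (seq[i], seq[k]) in PAIRS and best[i][j] == paired_score(i, k, j):
                structure[i], structure[k] = "(", ")"
                stack.append((i + 1, k - 1))
                if k < j:
                    stack.append((k + 1, j))
                break
    return (best[0][n - 1] if n else 0), "".join(structure)


print(nussinov("GGGAAACCC"))  # (3, '(((...)))')
```

The table `best` has one entry per fragment and each entry scans the possible pairing partners, which gives the cubic running time usually quoted for this family of algorithms.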
Dynamic programming algorithms are commonly used to detect base pairing patterns that are "well-nested", that is, form hydrogen bonds only to bases that do not overlap one another in sequence position. Secondary structures that fall into this category include double helices, stem-loops, and variants of the "cloverleaf" pattern found in transfer RNA molecules. These methods rely on pre-calculated parameters which estimate the free energy associated with certain types of base-pairing interactions, including Watson–Crick and Hoogsteen base pairs. Depending on the complexity of the method, single base pairs may be considered, as well as short two- or three-base segments, to incorporate the effects of base stacking. Such methods cannot identify pseudoknots, which are not well nested, without substantial algorithmic modifications that are computationally very costly.
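Whether a set of base pairs is well-nested can be tested mechanically. In the sketch below, a structure is given as a list of index pairs `(i, j)` with `i < j`, and each base is assumed to take part in at most one pair; two pairs cross, and so form a pseudoknot, when `i < k < j < l`.

```python
def is_well_nested(pairs):
    """Return True if no two base pairs (i, j) and (k, l) cross."""
    ordered = sorted(pairs)
    for index, (i, j) in enumerate(ordered):
        for k, l in ordered[index + 1:]:
            # After sorting, k > i, so a crossing means k falls inside
            # (i, j) while l falls outside it.
            if i < k < j < l:
                return False
    return True


stem_loop = [(0, 8), (1, 7), (2, 6)]   # nested, as in "(((...)))"
pseudoknot = [(0, 5), (3, 8)]          # crossing, as in "(..[.)..]"
print(is_well_nested(stem_loop), is_well_nested(pseudoknot))  # True False
```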

Suboptimal structures

The accuracy of RNA secondary structure prediction from one sequence by free energy minimization is limited by several factors:

The list of free energy values in the nearest-neighbor model is incomplete.
Not all known RNAs fold in such a way as to conform with the thermodynamic minimum.
Some RNA sequences have more than one biologically active conformation.

For this reason, the ability to predict structures which have similar low free energy can provide significant information. Such structures are termed suboptimal structures. MFOLD is one program that generates suboptimal structures.
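The idea of a suboptimal set can be sketched as a simple energy window. This is not how MFOLD computes its structures; the candidate list, its energies and the `window` parameter are hypothetical values used only to show the selection step.

```python
def within_window(candidates, window):
    """Keep candidates whose free energy is within `window` of the minimum.

    `candidates` is a list of (structure, energy) tuples, energies in kcal/mol.
    """
    lowest = min(energy for _, energy in candidates)
    kept = [c for c in candidates if c[1] <= lowest + window]
    return sorted(kept, key=lambda c: c[1])


# Hypothetical structures and energies for one sequence.
candidates = [
    ("(((...)))", -5.0),
    ("((.....))", -4.6),
    (".........", 0.0),
]
print(within_window(candidates, window=0.5))
```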

Predicting pseudoknots

One of the issues when predicting RNA secondary structure is that the standard free energy minimization and statistical sampling methods cannot find pseudoknots. The major problem is that the usual dynamic programming algorithms, when predicting secondary structure, consider only the interactions between the closest nucleotides, while pseudoknotted structures are formed by interactions between distant nucleotides. Rivas and Eddy published a dynamic programming algorithm for predicting pseudoknots. However, this dynamic programming algorithm is very slow. The standard dynamic programming algorithm for free energy minimization scales as O(N³) in time, where N is the number of nucleotides in the sequence, while the Rivas and Eddy algorithm scales as O(N⁶) in time. This has prompted several researchers to implement versions of the algorithm that restrict the classes of pseudoknots considered, resulting in performance gains. For example, the pknotsRG tool includes only the class of simple recursive pseudoknots and scales as O(N⁴) in time.
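The practical effect of these exponents is easy to quantify. The sketch below assumes cubic, sixth-power and fourth-power time scaling for the standard, Rivas and Eddy, and pknotsRG algorithms respectively, and reports only relative cost; constant factors and memory use are ignored.

```python
# Assumed time-complexity exponents for each algorithm.
EXPONENTS = {"standard": 3, "Rivas-Eddy": 6, "pknotsRG": 4}


def relative_cost(length, exponent, reference_length=100):
    """Running time relative to a sequence of `reference_length` nucleotides."""
    return (length / reference_length) ** exponent


for name, exponent in EXPONENTS.items():
    # Doubling the sequence length from 100 to 200 nucleotides.
    print(name, relative_cost(200, exponent))
```

Doubling the sequence length multiplies the cost by 8 for cubic scaling but by 64 for sixth-power scaling, which is why restricted pseudoknot classes are attractive.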

Other approaches for RNA secondary structure prediction

Another approach for RNA secondary structure determination is to sample structures from the Boltzmann ensemble, as exemplified by the program SFOLD. The program generates a statistical sample of all possible RNA secondary structures, drawing each structure with a probability given by the Boltzmann distribution. The sampling method offers an appealing solution to the problem of uncertainties in folding.
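Boltzmann weighting can be illustrated on a small, explicitly listed set of structures. This is a sketch under stated assumptions: the structures and energies are hypothetical, the temperature is taken as 37 °C, and real samplers such as SFOLD work from a partition function rather than from an enumerated list.

```python
import math
import random

GAS_CONSTANT = 0.0019872          # kcal/(mol*K)
TEMPERATURE = 310.15              # assumed temperature in kelvin (37 degrees C)
RT = GAS_CONSTANT * TEMPERATURE


def boltzmann_probabilities(energies):
    """Convert free energies (kcal/mol) into Boltzmann probabilities."""
    weights = [math.exp(-energy / RT) for energy in energies]
    partition = sum(weights)
    return [weight / partition for weight in weights]


def sample_structures(structures, energies, count, seed=0):
    """Draw `count` structures with probability proportional to their weight."""
    rng = random.Random(seed)
    probabilities = boltzmann_probabilities(energies)
    return rng.choices(structures, weights=probabilities, k=count)


# Hypothetical ensemble for one sequence.
structures = ["(((...)))", "((.....))", "........."]
energies = [-5.0, -4.6, 0.0]
print(boltzmann_probabilities(energies))
print(sample_structures(structures, energies, count=5))
```

Because the weights are exponential in energy, structures only slightly above the minimum still receive substantial probability, which is what makes a sample informative about folding uncertainty.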

Comparative secondary structure prediction

Sequence covariation methods rely on the existence of a data set composed of multiple homologous RNA sequences with related but dissimilar sequences. These methods analyze the covariation of individual base sites in evolution; maintenance at two widely separated sites of a pair of base-pairing nucleotides indicates the presence of a structurally required hydrogen bond between those positions. The general problem of pseudoknot prediction has been shown to be NP-complete.
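Covariation between two alignment columns is commonly quantified with mutual information. The sketch below is a minimal version that assumes a gap-free toy alignment and uses observed frequencies directly; the alignment itself is invented for the example.

```python
import math
from collections import Counter


def mutual_information(alignment, i, j):
    """Mutual information, in bits, between alignment columns i and j."""
    n = len(alignment)
    freq_i = Counter(row[i] for row in alignment)
    freq_j = Counter(row[j] for row in alignment)
    freq_ij = Counter((row[i], row[j]) for row in alignment)
    total = 0.0
    for (a, b), observed in freq_ij.items():
        p_ab = observed / n
        p_a = freq_i[a] / n
        p_b = freq_j[b] / n
        total += p_ab * math.log2(p_ab / (p_a * p_b))
    return total


# Toy alignment: columns 0 and 4 change together (G-C becomes A-U).
alignment = ["GAAAC", "GAAAC", "AAAAU", "AAAAU"]
print(mutual_information(alignment, 0, 4))  # covarying columns
print(mutual_information(alignment, 1, 2))  # fully conserved columns
```

Fully conserved columns score zero, so the measure responds to compensatory changes rather than to conservation alone.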
In general, the problems of alignment and consensus structure prediction are closely related. Three different approaches to the prediction of consensus structures can be distinguished:

Folding of alignment
Simultaneous sequence alignment and folding
Alignment of predicted structures

Align then fold

A practical heuristic approach is to use multiple sequence alignment tools to produce an alignment of several RNA sequences, find the consensus sequence, and then fold it. The quality of the alignment determines the accuracy of the consensus structure model. Consensus sequences are folded using approaches similar to those used for individual structure prediction. The thermodynamic folding approach is exemplified by the RNAalifold program. Other approaches are exemplified by the Pfold and ILM programs. Pfold implements stochastic context-free grammars (SCFGs). ILM, unlike the other algorithms for folding of alignments, can return pseudoknotted structures. It uses a combination of thermodynamics and mutual information content scores.
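The consensus step of this heuristic can be sketched as a per-column majority vote. The gap character, the tie-breaking behaviour and the toy alignment are assumptions of this example, and the sketch does not reproduce how RNAalifold, Pfold or ILM process an alignment; the resulting sequence would then be passed to a single-sequence folding method.

```python
from collections import Counter


def consensus_sequence(alignment, gap="-"):
    """Return the most common non-gap residue of each alignment column.

    Ties are broken in favour of the residue seen first in the column;
    an all-gap column yields the gap character.
    """
    consensus = []
    for column in zip(*alignment):
        residues = [residue for residue in column if residue != gap]
        if residues:
            consensus.append(Counter(residues).most_common(1)[0][0])
        else:
            consensus.append(gap)
    return "".join(consensus)


# Toy alignment of three homologous sequences.
alignment = ["GGAAACC", "GGAAACC", "GGC-ACC"]
print(consensus_sequence(alignment))
```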

Align and fold

Evolution frequently preserves functional RNA structure better than RNA sequence. Hence, a common biological problem is to infer a common structure for two or more highly diverged but homologous RNA sequences. In practice, sequence alignments become unsuitable and do not help to improve the accuracy of structure prediction when the sequence similarity of two sequences is less than 50%.
Structure-based alignment programs improve the performance of these alignments, and most of them are variants of the Sankoff algorithm. In essence, the Sankoff algorithm merges sequence alignment with the Nussinov dynamic programming method for folding. The Sankoff algorithm itself is a theoretical exercise because it requires extreme computational resources: O(n^(3m)) in time and O(n^(2m)) in space, where n is the sequence length and m is the number of sequences. Some notable attempts at implementing restricted versions of Sankoff's algorithm are Foldalign, Dynalign, PMmulti/PMcomp, Stemloc, and Murlet. In these implementations, the maximal length of the alignment or the variants of possible consensus structures are restricted. For example, Foldalign focuses on local alignments and restricts the possible length of the sequence alignment.

Fold then align

A less widely used approach is to fold the sequences using single sequence structure prediction methods and align the resulting structures using tree-based metrics. The fundamental weakness of this approach is that single sequence predictions are often inaccurate, so all further analyses are affected.

Tertiary structure prediction

Once the secondary structure of an RNA is known, the next challenge is to predict its tertiary structure. The biggest problem is to determine the structure of the regions between double-stranded helical regions. In addition, RNA molecules often contain posttranscriptionally modified nucleosides, which complicate tertiary structure prediction because they allow new non-canonical interactions.
Three-dimensional structure prediction methods can use comparative modeling, which starts from a related known structure, the template. The alternative strategy is de novo modeling of the three-dimensional structure, which uses physics-based principles such as molecular dynamics, or random sampling of the conformational landscape followed by screening with a statistical potential for scoring. These methods use either an all-atom representation of the nucleic acid structure or a coarse-grained representation. The low-resolution structures generated by many of these modeling methods are then subjected to high-resolution refinement.
