Nucleic acid analogue


Nucleic acid analogues are compounds which are analogous to naturally occurring RNA and DNA, used in medicine and in molecular biology research.
Nucleic acids are chains of nucleotides, each composed of three parts: a phosphate group, a pentose sugar (either ribose or deoxyribose), and one of four nucleobases.
An analogue may have any of these parts altered. Typically, analogue nucleobases confer, among other things, different base-pairing and base-stacking properties. Examples include universal bases, which can pair with all four canonical bases, and phosphate-sugar backbone analogues such as peptide nucleic acid (PNA), which affect the properties of the chain.
Nucleic acid analogues are also called xeno nucleic acids (XNA) and represent one of the main pillars of xenobiology, the design of new-to-nature forms of life based on alternative biochemistries.
Artificial nucleic acids include peptide nucleic acid, Morpholino and locked nucleic acid, as well as glycol nucleic acid and threose nucleic acid. Each of these is distinguished from naturally occurring DNA or RNA by changes to the backbone of the molecule.
In May 2014, researchers announced that they had successfully introduced two new artificial nucleotides into bacterial DNA and, by including the individual artificial nucleotides in the culture medium, were able to passage the bacteria 24 times; they did not create mRNA or proteins able to use the artificial nucleotides. The artificial nucleotides featured two fused aromatic rings.

Medicine

Several nucleoside analogues are used as antiviral or anticancer agents. Viral polymerases incorporate these compounds, which carry non-canonical bases. The compounds are activated in cells by conversion into nucleotides; they are administered as nucleosides because charged nucleotides cannot easily cross cell membranes.

Molecular biology

Nucleic acid analogues are used in molecular biology for several purposes:
Investigation of possible scenarios of the origin of life: by testing different analogues, researchers try to determine whether life's use of DNA and RNA was selected over time for its advantages or arose by arbitrary chance;
As a tool to detect particular sequences: XNA can be used to tag and identify a wide range of DNA and RNA components with high specificity and accuracy;
As an enzyme acting on DNA, RNA and XNA substrates - XNA has been shown to have the ability to cleave and ligate DNA, RNA and other XNA molecules similar to the actions of RNA ribozymes;
As a tool with resistance to RNA hydrolysis;
Investigation of the mechanisms used by enzymes;
Investigation of the structural features of nucleic acids.

Backbone analogues

Hydrolysis-resistant RNA analogues

RNA is susceptible to hydrolysis because the 2' hydroxyl group of ribose can react with the adjacent phosphate linked to the 3' hydroxyl group. To overcome this, ribose analogues are used. The most common RNA analogues are 2'-O-methyl-substituted RNA, locked nucleic acid (or bridged nucleic acid), morpholino and peptide nucleic acid. Although these oligonucleotides have a different backbone sugar or, in the case of PNA, an amino acid residue in place of the ribose phosphate, they still bind to RNA or DNA according to Watson-Crick pairing, yet are immune to nuclease activity. They cannot be synthesized enzymatically and can only be obtained synthetically, using the phosphoramidite strategy or, for PNA, methods of peptide synthesis.

Other notable analogues used as tools

Dideoxynucleotides are used in sequencing. These nucleoside triphosphates possess a non-canonical sugar, dideoxyribose, which lacks the 3' hydroxyl group normally present in DNA and therefore cannot form a phosphodiester bond with the next nucleotide. DNA polymerases mistake them for regular deoxyribonucleotides and incorporate them, and the missing 3' hydroxyl group then terminates chain elongation. Another chain-terminating analogue that lacks a 3' hydroxyl and mimics adenosine is cordycepin, an anticancer drug that targets RNA replication. A further analogue used in sequencing is the nucleobase analogue 7-deaza-GTP, which is used to sequence GC-rich regions; the corresponding adenosine analogue, 7-deazaadenosine, is the antibiotic tubercidin.
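The chain-termination principle can be illustrated with a toy simulation. This is an idealized Python sketch, not a model of real sequencing chemistry: it assumes the template is written 3'→5', that each of four reactions contains one kind of ddNTP (labelled here only by its base letter), and that every possible termination site is sampled at least once. The function names are illustrative. Sorting the terminated fragments by length then spells out the newly synthesized strand.

```python
# Idealized model of dideoxy (chain-termination) sequencing; all names are illustrative.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def synthesized_strand(template_3to5: str) -> str:
    """Strand built by the polymerase on a template given 3'->5', written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in template_3to5)


def terminated_lengths(template_3to5: str, dd_base: str) -> list[int]:
    """Lengths of the chains that stop wherever the ddNTP for `dd_base` is incorporated.

    A dideoxynucleotide has no 3' hydroxyl, so nothing can be added after it;
    every occurrence of `dd_base` in the new strand is a possible stop site.
    """
    new_strand = synthesized_strand(template_3to5)
    return [i + 1 for i, base in enumerate(new_strand) if base == dd_base]


def read_sequence(template_3to5: str) -> str:
    """Combine the four reactions and read the terminal bases in order of fragment length."""
    base_at_length = {
        length: dd_base
        for dd_base in "ACGT"
        for length in terminated_lengths(template_3to5, dd_base)
    }
    return "".join(base_at_length[n] for n in sorted(base_at_length))


print(terminated_lengths("TACGGA", "C"))  # [4, 5]
print(read_sequence("TACGGA"))            # ATGCCT
```

Keying fragments by length stands in for separating them by size; each length is produced by exactly one of the four reactions, so the ordered lengths recover the sequence.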

Precursors to the RNA world

RNA may be too complex to have been the first nucleic acid, so several simpler nucleic acids with different backbones, such as TNA, GNA and PNA, have been proposed as candidates for nucleic acids that preceded the RNA world.

Base analogues

Nucleobase structure and nomenclature

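Naturally occurring bases can be divided into two classes according to their structure: purines (adenine and guanine), which have two fused rings, and pyrimidines (cytosine, thymine and uracil), which have a single ring.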
Artificial nucleotides have been inserted into bacterial DNA but these genes did not template mRNA or induce protein synthesis. The artificial nucleotides featured two fused aromatic rings which formed a complex mimicking the natural base pair.

Mutagens

One of the most common base analogues is 5-bromouracil, the abnormal base found in the mutagenic nucleoside analogue BrdU. When a nucleotide containing 5-bromouracil is incorporated into DNA, it most often pairs with adenine; however, it can spontaneously shift into another isomer (a tautomer) that pairs with a different nucleobase, guanine. If this happens during DNA replication, a guanine is inserted opposite the base analogue, and in the next round of replication that guanine pairs with a cytosine. The result is a change in one base pair of DNA, specifically a transition mutation.
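The sequence of events just described can be traced step by step in a small Python sketch. The labels `BrU` (the common form of 5-bromouracil, which pairs with adenine like thymine) and `BrU*` (its rarer isomer, which pairs with guanine) are illustrative names, and the model follows a single base pair through successive rounds of replication rather than whole strands.

```python
# Toy model of 5-bromouracil mutagenesis; "BrU" and "BrU*" are illustrative labels.
PAIRS_WITH = {"A": "T", "T": "A", "G": "C", "C": "G", "BrU": "A", "BrU*": "G"}
PURINES = {"A", "G"}


def insert_opposite(template_base: str) -> str:
    """Base a polymerase inserts opposite `template_base`."""
    return PAIRS_WITH[template_base]


def bromouracil_lineage() -> list[tuple[str, str]]:
    """Follow one A:T pair through the steps described in the text."""
    history = [("A", "T")]
    # Round 1: BrU, acting as a thymine analogue, is incorporated opposite A.
    history.append(("A", "BrU"))
    # BrU shifts to its rarer isomer; when that strand is copied, G goes opposite it.
    history.append((insert_opposite("BrU*"), "BrU*"))
    # Next round: the new G-containing strand is copied normally, so C goes opposite G.
    guanine = history[-1][0]
    history.append((guanine, insert_opposite(guanine)))
    return history


def is_transition(before: str, after: str) -> bool:
    """A transition swaps a purine for a purine or a pyrimidine for a pyrimidine."""
    return before != after and ((before in PURINES) == (after in PURINES))


lineage = bromouracil_lineage()
print(lineage)  # [('A', 'T'), ('A', 'BrU'), ('G', 'BrU*'), ('G', 'C')]
print(is_transition(lineage[0][0], lineage[-1][0]))  # True
```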
Nitrous acid (HNO2) is also a potent mutagen that acts on both replicating and non-replicating DNA. It can deaminate the amino groups of adenine, guanine and cytosine. Adenine is deaminated to hypoxanthine, which pairs with cytosine instead of thymine. Cytosine is deaminated to uracil, which pairs with adenine instead of guanine. Deamination of guanine is not mutagenic. Mutations induced by nitrous acid can also be reverted to wild type by further treatment with nitrous acid.
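The deamination products and their new partners can be summarized as lookup tables. In this Python sketch the hypoxanthine and uracil entries come from the text; the entry stating that guanine deaminates to xanthine, which still pairs with cytosine, is added here as an assumption consistent with the statement that guanine deamination is not mutagenic. The hypothetical function follows one damaged base through two rounds of replication.

```python
# Sketch of nitrous-acid deamination; the "xanthine" entries are added beyond the text.
DEAMINATED = {"A": "hypoxanthine", "C": "U", "G": "xanthine"}
PAIRS_WITH = {
    "A": "T", "T": "A", "G": "C", "C": "G", "U": "A",
    "hypoxanthine": "C",  # pairs with C instead of T
    "xanthine": "C",      # still pairs with C, so no mutation results
}


def pair_after_deamination(base: str) -> tuple[str, str]:
    """Return the (base, partner) pair found at this position two replications later."""
    damaged = DEAMINATED[base]
    new_partner = PAIRS_WITH[damaged]              # round 1: the damaged base is copied
    return (PAIRS_WITH[new_partner], new_partner)  # round 2: the new partner is copied


for original in ("A", "C", "G"):
    print(original, PAIRS_WITH[original], "->", pair_after_deamination(original))
```

Read as base pairs, A:T becomes G:C and C:G becomes T:A, while G:C is unchanged. Applying the cytosine case to the C of a G:C pair created by the first mutation restores A:T, which is one way to picture the reversion described above.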

Fluorophores

Fluorophores are commonly attached via a flexible arm to the ring bonded to the sugar, presumably protruding from the major groove of the helix. Because of the low processivity of Taq polymerase with nucleotides linked to bulky adducts such as fluorophores, the sequence is typically copied using a nucleotide carrying only the arm, which is later coupled with a reactive fluorophore.
Fluorophores find a variety of uses in medicine and biochemistry.

Fluorescent base analogues

The most commonly used and commercially available fluorescent base analogue, 2-aminopurine (2-AP), has a high fluorescence quantum yield free in solution that is considerably reduced when it is incorporated into nucleic acids. This sensitivity of 2-AP emission to its immediate surroundings is shared by other promising and useful fluorescent base analogues such as 3-MI, 6-MI, 6-MAP, pyrrolo-dC, modified and improved derivatives of pyrrolo-dC, furan-modified bases and many others. The sensitivity to the microenvironment has been used to study, for example, structure and dynamics within both DNA and RNA, the dynamics and kinetics of DNA-protein interactions, and electron transfer within DNA. A more recently developed group of fluorescent base analogues, whose fluorescence quantum yield is nearly insensitive to their immediate surroundings, is the tricyclic cytosine family. 1,3-Diaza-2-oxophenothiazine, tC, has a fluorescence quantum yield of approximately 0.2 in both single and double strands, irrespective of the surrounding bases. The oxo-homologue of tC, tCO (1,3-diaza-2-oxophenoxazine), also has a quantum yield of 0.2 in double-stranded systems, although it is somewhat sensitive to the surrounding bases in single strands. The high and stable quantum yields of these base analogues make them very bright, and, combined with their good base-analogue properties, they are especially useful in fluorescence anisotropy and FRET measurements, areas where other fluorescent base analogues are less accurate. In the same family of cytosine analogues, a FRET-acceptor base analogue, tCnitro, has also been developed. Together with tCO as a FRET donor, it constitutes the first nucleic acid base analogue FRET pair ever developed. The tC family has been used, for example, in studies of polymerase DNA binding and DNA polymerization mechanisms.
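One reason a stable quantum yield matters for FRET follows from the standard Förster relation: transfer efficiency falls off with the sixth power of the donor-acceptor distance r relative to the Förster radius R0, and R0 itself scales with the sixth root of the donor quantum yield. If a donor's yield changes with its neighboring bases, the R0 used to convert a measured efficiency into a distance becomes uncertain. This Python sketch is illustrative only; the R0 of 3 nm and the quantum yields are hypothetical values, not measurements for tC or tCO.

```python
# Förster relations; the R0 and quantum-yield values used here are hypothetical.


def fret_efficiency(r_nm: float, r0_nm: float) -> float:
    """Transfer efficiency E = 1 / (1 + (r / R0)**6)."""
    return 1.0 / (1.0 + (r_nm / r0_nm) ** 6)


def distance_from_efficiency(efficiency: float, r0_nm: float) -> float:
    """Invert E to obtain r = R0 * (1/E - 1)**(1/6)."""
    if not 0.0 < efficiency < 1.0:
        raise ValueError("efficiency must lie strictly between 0 and 1")
    return r0_nm * (1.0 / efficiency - 1.0) ** (1.0 / 6.0)


def rescale_r0(r0_ref_nm: float, qy_ref: float, qy_actual: float) -> float:
    """R0 is proportional to the sixth root of the donor quantum yield."""
    return r0_ref_nm * (qy_actual / qy_ref) ** (1.0 / 6.0)


R0_NM = 3.0                                # hypothetical Förster radius
measured_e = fret_efficiency(2.5, R0_NM)   # efficiency for a 2.5 nm separation
print(round(distance_from_efficiency(measured_e, R0_NM), 3))  # 2.5

# If the donor's yield silently dropped from 0.2 to 0.1 in this sequence context,
# converting with the unadjusted R0 would misreport the distance.
true_r0 = rescale_r0(R0_NM, qy_ref=0.2, qy_actual=0.1)
print(round(distance_from_efficiency(fret_efficiency(2.5, true_r0), R0_NM), 2))  # 2.81
```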

Natural non-canonical bases

Several non-canonical bases are present in cells: at CpG sites in DNA, which are often methylated; in all eukaryotic mRNA, which is capped with 7-methylguanosine; and at several methylated positions in rRNA. tRNAs are often heavily modified post-transcriptionally to improve their conformation or base pairing, particularly in or near the anticodon: inosine can pair with C, U and even A, whereas thiouridine is more specific than uracil. Other common tRNA base modifications include pseudouridine, dihydrouridine, queuosine and wyosine. All of these, however, are modifications of normal bases and are not incorporated by a polymerase.
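The effect of inosine at the first (wobble) position of an anticodon can be illustrated with a short Python sketch. It assumes standard antiparallel pairing, in which the 3' base of the anticodon pairs with the 5' base of the codon, and it deliberately ignores other wobble pairs such as G·U, so only the inosine partners listed in the text are expanded. The function name is hypothetical.

```python
# Simplified anticodon reading; only inosine's extra partners are modelled.
WATSON_CRICK = {"A": "U", "U": "A", "G": "C", "C": "G"}
INOSINE_PARTNERS = ("C", "U", "A")  # inosine pairs with C, U and even A


def codons_read(anticodon_5to3: str) -> list[str]:
    """Codons (5'->3') recognized by an anticodon written 5'->3'."""
    wobble, middle, last = anticodon_5to3
    first = WATSON_CRICK[last]      # anticodon 3' base pairs with codon 5' base
    second = WATSON_CRICK[middle]
    if wobble == "I":
        thirds = INOSINE_PARTNERS
    else:
        thirds = (WATSON_CRICK[wobble],)
    return [first + second + third for third in thirds]


print(codons_read("IGC"))  # ['GCC', 'GCU', 'GCA']
print(codons_read("GGC"))  # ['GCC']
```

All three codons read by the IGC anticodon encode alanine, so a single inosine-containing tRNA covers codons that would otherwise need several.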

Base-pairing

Canonical bases carry either a carbonyl or an amine group on the carbons flanking the nitrogen atom furthest from the glycosidic bond, which allows them to base pair via hydrogen bonds. Adenine and 2-aminoadenine have one and two amine groups respectively, whereas thymine has two carbonyl groups, and cytosine and guanine carry a mix of amine and carbonyl groups.
The precise reason why there are only four nucleotides is debated, but there are several unused possibilities.
Furthermore, adenine is not the most stable choice for base pairing: cyanophage S-2L uses diaminopurine instead of adenine. Diaminopurine pairs with thymine because it is identical to adenine except for an additional amine group at position 2, which lets it form three hydrogen bonds with thymine. This eliminates the major difference between the two types of base pair (two hydrogen bonds for A·T, three for G·C). The improved stability also affects protein-binding interactions that rely on those differences.
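The hydrogen-bonding argument can be made concrete by writing each base's pairing edge as a pattern of donors (D) and acceptors (A), read from the major-groove side to the minor-groove side; a hydrogen bond is counted wherever a donor faces an acceptor. The patterns in this Python sketch follow the common donor/acceptor description of the bases (adenine's C2 position, for instance, offers neither) and simplify the real geometry. `DAP` is an illustrative label for 2,6-diaminopurine.

```python
# Donor (D) / acceptor (A) patterns on the pairing edge, major groove -> minor groove.
EDGE = {
    "A": ("D", "A", None),   # no hydrogen-bonding group at C2
    "DAP": ("D", "A", "D"),  # 2,6-diaminopurine: extra amine at position 2
    "G": ("A", "D", "D"),
    "T": ("A", "D", "A"),
    "C": ("D", "A", "A"),
}


def hydrogen_bonds(purine: str, pyrimidine: str) -> int:
    """Count positions where a donor on one base faces an acceptor on the other."""
    return sum(
        1
        for p, q in zip(EDGE[purine], EDGE[pyrimidine])
        if {p, q} == {"D", "A"}
    )


for pair in (("A", "T"), ("G", "C"), ("DAP", "T")):
    print(pair, hydrogen_bonds(*pair))
# ('A', 'T') 2
# ('G', 'C') 3
# ('DAP', 'T') 3
```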
Other combinations include:
However, correct DNA structure can form even when the bases are not paired via hydrogen bonding; in such cases the bases pair through hydrophobic interactions, as studies using DNA isosteres such as the thymine analogue 2,4-difluorotoluene or the adenine analogue 4-methylbenzimidazole have shown. An alternative hydrophobic pair could be isoquinoline and pyrrolopyridine.
Other noteworthy base pairs:
In metal base pairing, the Watson-Crick hydrogen bonds are replaced by the interaction between a metal ion and nucleosides acting as ligands. The possible geometries of the metal that would allow duplex formation with two bidentate nucleosides around a central metal atom are tetrahedral, dodecahedral and square planar. Metal complexation with DNA can occur through the formation of non-canonical base pairs from natural nucleobases with participation by metal ions, and also by replacing the hydrogen atoms involved in Watson-Crick base pairing with metal ions. Introducing metal ions into a DNA duplex has been shown to confer potential magnetic and conducting properties, as well as increased stability.
Metal complexation has been shown to occur between natural nucleobases. A well-documented example is T-Hg-T, in which two deprotonated thymine nucleobases are brought together by Hg2+ to form a connected metal-base pair. This motif does not accommodate stacked Hg2+ ions in a duplex, because intrastrand hairpin formation is favored over duplex formation. Two thymines opposite each other in a duplex do not form a Watson-Crick base pair; this is an example of a Watson-Crick mismatch being stabilized by the formation of a metal-base pair. Another example of metal complexation with natural nucleobases is the formation of A-Zn-T and G-Zn-C at high pH; Co2+ and Ni2+ also form these complexes. These are Watson-Crick base pairs in which the divalent cation is coordinated to the nucleobases. The exact binding mode is debated.
A large variety of artificial nucleobases have been developed for use in metal base pairs. These modified nucleobases have tunable electronic properties, sizes and binding affinities that can be optimized for a specific metal. For example, a nucleoside modified with a pyridine-2,6-dicarboxylate has been shown to bind Cu2+ tightly, whereas other divalent ions are bound only loosely. Its tridentate character contributes to this selectivity. The fourth coordination site on the copper is occupied by the pyridine nucleobase arranged opposite it. This asymmetric metal base-pairing system is orthogonal to the Watson-Crick base pairs. Another example is provided by hydroxypyridone nucleobases, which can bind Cu2+ inside the DNA duplex. Five consecutive copper-hydroxypyridone base pairs were incorporated into a double strand, flanked by only one natural nucleobase at each end. EPR data showed that the distance between adjacent copper centers was estimated to be 3.7 ± 0.1 Å, only slightly more than the 3.4 Å base-pair rise of natural B-type DNA. The appeal of stacking metal ions inside a DNA duplex is the hope of obtaining nanoscopic self-assembling metal wires, although this has not yet been realized.

Unnatural base pair (UBP)

An unnatural base pair (UBP) is a designed subunit of DNA that is created in a laboratory and does not occur in nature. In 2012, a group of American scientists led by Floyd Romesberg, a chemical biologist at the Scripps Research Institute in San Diego, California, reported that they had designed an unnatural base pair. The two new artificial nucleotides forming this UBP were named d5SICS and dNaM. More technically, these artificial nucleotides bear hydrophobic nucleobases that feature two fused aromatic rings and form a complex, or base pair, in DNA. In 2014, the same team reported that they had synthesized a stretch of circular DNA known as a plasmid, containing natural T-A and C-G base pairs along with the best-performing UBP Romesberg's laboratory had designed, and had inserted it into cells of the common bacterium E. coli, which successfully replicated the unnatural base pairs through multiple generations. This is the first known example of a living organism passing along an expanded genetic code to subsequent generations. It was achieved in part by adding a supporting algal gene that expresses a nucleotide triphosphate transporter, which efficiently imports both d5SICSTP and dNaMTP into E. coli. The natural bacterial replication pathways then use them to accurately replicate the plasmid containing d5SICS-dNaM.
The successful incorporation of a third base pair is a significant step toward the goal of greatly expanding the number of amino acids that can be encoded by DNA, from the existing 20 to a theoretically possible 172, thereby expanding the potential for living organisms to produce novel proteins. The earlier artificial strings of DNA did not encode anything, but scientists speculated that they could be designed to manufacture new proteins with industrial or pharmaceutical uses. Transcription of DNA containing an unnatural base pair and translation of the corresponding mRNA have since been achieved. In November 2017, the same Scripps Research Institute team that first introduced two extra nucleobases into bacterial DNA reported having constructed a semi-synthetic E. coli bacterium able to make proteins using such DNA. Its DNA contained six different nucleobases: the four canonical ones and two artificially added ones, dNaM and dTPT3. The bacterium also had two corresponding additional RNA bases included in two new codons, additional tRNAs recognizing these new codons, and additional amino acids, making it able to synthesize "unnatural" proteins.
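The figure of 172 is consistent with simple codon arithmetic, although this reading is an assumption rather than a derivation given in the source: with six bases there are 6³ = 216 triplet codons instead of 4³ = 64, so 152 codons contain at least one unnatural base, and assigning each of those a new amino acid on top of the existing 20 gives 172. The Python sketch counts the codons directly; `X` and `Y` are placeholder letters, not the real nucleotide names.

```python
from itertools import product

NATURAL = "ACGT"
UNNATURAL = "XY"  # placeholder letters standing in for the two added bases


def codons(alphabet: str) -> list[str]:
    """All triplet codons that can be written with the given alphabet."""
    return ["".join(triplet) for triplet in product(alphabet, repeat=3)]


natural = codons(NATURAL)                # 4**3 codons
expanded = codons(NATURAL + UNNATURAL)   # 6**3 codons
new = [c for c in expanded if set(c) & set(UNNATURAL)]  # codons using an added base

print(len(natural), len(expanded), len(new))  # 64 216 152
print(20 + len(new))                          # 172
```

The count treats every new codon as encoding a distinct amino acid and ignores stop codons, so it describes a theoretical ceiling rather than a practical code.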
Another demonstration of UBPs was achieved by Ichiro Hirao's group at the RIKEN institute in Japan. In 2002, they developed an unnatural base pair between 2-amino-8-purine and pyridine-2-one that functions in vitro in transcription and translation, for the site-specific incorporation of non-standard amino acids into proteins. In 2006, they created 7-imidazopyridine and pyrrole-2-carbaldehyde as a third base pair for replication and transcription. Ds and 4--2-nitropyrrole were later found to form a high-fidelity pair in PCR amplification. In 2013, they applied the Ds-Px pair to DNA aptamer generation by in vitro selection and demonstrated that expanding the genetic alphabet significantly increases DNA aptamer affinities for target proteins.

Orthogonal system

The possibility of implementing an orthogonal system inside cells, independent of the cellular genetic material, has been proposed and studied both theoretically and experimentally, with the aim of creating a completely safe system and possibly increasing encoding potential.
Several groups have focused on different aspects: