All classes of RNA transcripts must be processed into mature species. The reactions fall into several types: nucleolytic cleavage, as in the separation of the mature rRNA species from the primary transcript made by RNA polymerase I; non‐template‐directed chain extension, as in the synthesis or regeneration of the common CCA sequence at the 3′ end of transfer RNAs or of the polyA tail at the 3′ end of mRNAs; and nucleotide modification, for example, the synthesis of methylated nucleotides in tRNA or rRNA. These reactions are a feature of both prokaryotic and eukaryotic gene expression, and their biological consequences are diverse. For example, modified nucleotides can affect the way in which a tRNA recognizes different codons.
Messenger RNA processing is complex, especially in eukaryotes. In prokaryotes, mRNA processing is relatively unimportant in regulating gene expression; its chief function seems to be to regulate mRNA stability. The terminator stem and loop stabilizes mRNA against nucleolytic degradation, and in some cases, removal of this structure destabilizes the mRNA, so that less protein is made from it.
Eukaryotic mRNA processing is much more complex and has many consequences for gene expression. Most obviously, many eukaryotic genes contain introns, which are copied into the primary transcript. For the mRNA to be translated into a useful protein, the cell must remove these introns while preserving the coding sequence of the mature messenger RNA. In addition, eukaryotic mRNAs carry a specialized cap structure at the 5′ end and a polyA “tail” at the 3′ end, neither of which is encoded by the DNA template. Many of these reactions involve small nuclear ribonucleoprotein particles (snRNPs or “snurps”) as essential components. Other submicroscopic structures in the nucleus are essential for processing the RNAs and for transporting the mature RNAs from the nucleus into the cytoplasm.
A soluble enzyme system adds the cap to the 5′ end of the mRNA while RNA polymerase II is still synthesizing the 3′ region of the transcript. See Figure 1.
Figure 1
First, one phosphate of the 5′‐terminal triphosphate of the pre‐mRNA transcript is removed to yield a diphosphate, which then attacks GTP, releasing inorganic pyrophosphate and forming a 5′‐5′ triphosphate linkage. The immature cap is then methylated: the G is methylated at N7 to make 7‐methylguanosine (m7G), while the first and (sometimes) the second nucleotides of the transcript are methylated on their 2′‐hydroxyl groups. The cap performs two functions: it is usually required for binding the mRNA to the ribosome, and it also seems to allow the recognition of introns.
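The phosphate accounting in these capping steps can be checked with a small bookkeeping sketch. It is an illustration only: the species labels (pppN‐RNA, GpppN‐RNA, and so on) and the helpers `phosphate_count` and `is_balanced` are hypothetical names chosen here, only phosphate groups are counted, and the methyl groups and their source are not modeled.

```python
# Phosphate groups carried by each species in the capping reactions
# (hypothetical labels; N is the first transcribed nucleotide).
PHOSPHATES = {
    "pppN-RNA": 3,     # 5'-triphosphate end of the new transcript
    "ppN-RNA": 2,      # after one phosphate is removed
    "Pi": 1,           # inorganic phosphate
    "GTP": 3,
    "PPi": 2,          # inorganic pyrophosphate
    "GpppN-RNA": 3,    # 5'-5' triphosphate linkage between G and N
    "m7GpppN-RNA": 3,  # methylation adds no phosphate
}

# Each step: (description, reactants, products)
CAPPING_STEPS = [
    ("remove terminal phosphate", ["pppN-RNA"], ["ppN-RNA", "Pi"]),
    ("diphosphate attacks GTP", ["ppN-RNA", "GTP"], ["GpppN-RNA", "PPi"]),
    ("methylate G at N7", ["GpppN-RNA"], ["m7GpppN-RNA"]),
]


def phosphate_count(species: list[str]) -> int:
    """Total phosphate groups across a list of species."""
    return sum(PHOSPHATES[s] for s in species)


def is_balanced(step) -> bool:
    """True if a step has the same number of phosphates on both sides."""
    _, reactants, products = step
    return phosphate_count(reactants) == phosphate_count(products)


# Of the three linkage phosphates, those not already on ppN-RNA come from GTP.
LINKAGE_FROM_GTP = PHOSPHATES["GpppN-RNA"] - PHOSPHATES["ppN-RNA"]

for step in CAPPING_STEPS:
    print(f"{step[0]}: balanced={is_balanced(step)}")
print("linkage phosphates contributed by GTP:", LINKAGE_FROM_GTP)
```

Every step balances, and only one of the three phosphates in the 5′‐5′ linkage comes from GTP; the other two are the ones left on the transcript after the first step.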
Polyadenylation occurs at the 3′ end of the pre‐mRNA. First, the pre‐mRNA is cleaved at a site specified by a polyA signal sequence, AAUAAA, in the transcript. Cleavage occurs about 20 nucleotides “downstream” (3′) of the polyA signal sequence. RNA polymerase II continues along the template, sometimes for as long as 1,000 nucleotides, before termination occurs; the RNA made beyond the cleavage site is simply degraded. Cleavage of the pre‐mRNA, on the other hand, generates a free 3′‐OH group on the upstream fragment, which is then extended by the enzyme poly(A) polymerase in a reaction that uses ATP and releases inorganic pyrophosphate (just as template‐directed synthesis does). The polyA sequence is about 200–300 nucleotides long when it is first made. See Figure 2.
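The cleave‐then‐extend logic can be sketched on an RNA string. This is a simplified model under stated assumptions: the function name `cleave_and_polyadenylate` is hypothetical, the cut is placed a fixed 20 nucleotides past the end of the first AAUAAA (the text says only “about 20”), and a fixed 250‐nucleotide tail stands in for the 200–300 range.

```python
POLYA_SIGNAL = "AAUAAA"


def cleave_and_polyadenylate(pre_mrna: str, offset: int = 20, tail_length: int = 250) -> str:
    """Cleave a fixed distance 3' of the polyA signal, then add a polyA tail (simplified)."""
    signal_start = pre_mrna.find(POLYA_SIGNAL)
    if signal_start == -1:
        raise ValueError("no AAUAAA polyA signal in transcript")
    # Assumption: cleavage a fixed `offset` nucleotides past the end of the signal.
    cut = signal_start + len(POLYA_SIGNAL) + offset
    if cut > len(pre_mrna):
        raise ValueError("transcript ends before the cleavage site")
    # Upstream fragment keeps the free 3'-OH; the downstream RNA is degraded (not modeled).
    upstream = pre_mrna[:cut]
    # Poly(A) polymerase extends the 3'-OH with A residues.
    return upstream + "A" * tail_length


example = "AUGGCU" + POLYA_SIGNAL + "C" * 20 + "GGGUUU" * 5
mature = cleave_and_polyadenylate(example)
print(len(mature))  # 282
```

The tail is not read from the template at all; the function simply appends As, mirroring the non‐template‐directed chain extension mentioned at the start of this section.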
Figure 2
Both capping and polyA formation precede intron removal. Thus, the first steps of processing yield a pre‐mRNA that has a 5′ cap and a 3′ polyA tail but still contains all its introns.
The term splicing refers to the process by which introns are removed and the exons joined back together to form a continuous coding sequence in the 5′‐3′ direction. It is important to remember how accurate this process must be. If even a single nucleotide of an intron were left in the processed mRNA, the protein made from that mRNA would be nonfunctional, because the ribosome would read every codon downstream of that point in the wrong reading frame. The cellular machinery that splices pre‐mRNAs uses information at the splice junctions to determine where to cut and where to rejoin the mRNA. In transcripts containing more than one intron, the introns are usually removed in a preferred but not obligatory order, so several “pathways” of intron removal can be used.
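To see why a single leftover nucleotide matters, the sketch below reads two short, hypothetical mRNAs in codons: one spliced correctly and one that keeps the first G of the intron. The sequences and the partial codon table are illustrative assumptions; the table covers only the codons that appear here.

```python
# Partial genetic code: only the codons used in this example.
CODON_TABLE = {
    "AUG": "Met", "GCU": "Ala", "GAA": "Glu", "UGG": "Trp",
    "UAA": "Stop", "GGA": "Gly", "GUA": "Val",
}


def codons(mrna: str) -> list[str]:
    """Split an mRNA into complete codons, reading from the first base."""
    return [mrna[i:i + 3] for i in range(0, len(mrna) - 2, 3)]


def translate(mrna: str) -> list[str]:
    """Translate codon by codon, stopping at a stop codon."""
    protein = []
    for codon in codons(mrna):
        residue = CODON_TABLE[codon]
        protein.append(residue)
        if residue == "Stop":
            break
    return protein


exon1, exon2 = "AUGGCU", "GAAUGGUAA"
correct = exon1 + exon2       # intron removed exactly
faulty = exon1 + "G" + exon2  # one intron nucleotide left behind

print(translate(correct))  # ['Met', 'Ala', 'Glu', 'Trp', 'Stop']
print(translate(faulty))   # ['Met', 'Ala', 'Gly', 'Met', 'Val']
```

The codons before the junction are unchanged, but every codon after it is different, and the original stop codon is no longer read in frame.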
The small nuclear ribonucleoprotein particles (snRNPs or “snurps”) that carry out the splicing reaction use RNA‐RNA base pairing to select the splice sites. Almost all exon‐intron junctions at the start of an intron contain the sequence AG‐GU, with the GU beginning the intron. Furthermore, the consensus sequence around the beginning of the intron is longer than GU and is complementary to a sequence in U1 RNA. Thus, assembly of the splicing complex, called the spliceosome, begins when the RNA component of the U1 snRNP base pairs with the junction between the 3′ end of the exon and the 5′ end of the intron. See Figure 3.
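The AG‐GU rule on its own can be turned into a crude scan for candidate 5′ splice sites. This is a hypothetical helper (`candidate_5prime_sites`), not a splice‐site predictor: it reports every AG‐GU in the sequence, whereas the spliceosome also relies on the longer U1‐complementary consensus and the other signals described in this section, so many of the sites it reports would not be used.

```python
def candidate_5prime_sites(pre_mrna: str) -> list[int]:
    """Return indices where an intron could begin: the G of GU in each AG-GU."""
    sites = []
    for i in range(len(pre_mrna) - 3):
        if pre_mrna[i:i + 4] == "AGGU":
            sites.append(i + 2)  # exon ends with AG; intron starts with GU
    return sites


seq = "AUGCCAGGUAAGUAUCU"
print(candidate_5prime_sites(seq))  # [7]
```

Each returned index points at the G of the intron's GU, so `seq[7:9]` is `"GU"` in the example.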
Figure 3
Introns end in the dinucleotide sequence AG. This sequence is preceded by a region rich in C and U residues, called the polypyrimidine tract. Upstream of the polypyrimidine tract is a sequence with the loose consensus 5′‐AUCUAACA‐3′, which is partially complementary to a sequence in U2 RNA, 5′‐UGUAGUA‐3′. When the intron pairs with U2 RNA, one of the As in the intron sequence is bulged out of the duplex. This bulged A becomes the branch point in the splicing reaction. Spliceosome assembly also involves three other snRNPs and a variety of accessory proteins, including proteins that recognize the polypyrimidine tract and others that unwind the pre‐mRNA.
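The idea of partial complementarity with a bulged A can be tested with a brute‐force sketch using the two sequences given above. It makes strong simplifying assumptions, all introduced here: only Watson‐Crick pairs are counted (G·U wobble pairs and stacking energies are ignored), pairs are summed over the whole overlap rather than required to form one continuous helix, and the helper names are hypothetical.

```python
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def pair_score(top: str, partner_3to5: str) -> int:
    """Best count of Watson-Crick pairs over all ungapped antiparallel alignments.

    `top` is written 5'->3'; `partner_3to5` is the other strand written 3'->5',
    so top[i] pairs with partner_3to5[i - shift].
    """
    best = 0
    for shift in range(-len(partner_3to5) + 1, len(top)):
        score = sum(
            (top[i], partner_3to5[i - shift]) in WATSON_CRICK
            for i in range(len(top))
            if 0 <= i - shift < len(partner_3to5)
        )
        best = max(best, score)
    return best


def bulged_a_candidates(intron: str, u2: str) -> tuple[int, list[int]]:
    """Bulge out each A of the intron in turn; return the best score and the As that reach it."""
    partner = u2[::-1]  # U2 is given 5'->3'; reverse it to align antiparallel
    scores = {}
    for i, base in enumerate(intron):
        if base == "A":
            scores[i] = pair_score(intron[:i] + intron[i + 1:], partner)
    best = max(scores.values())
    return best, [i for i, s in scores.items() if s == best]


BRANCH_SITE = "AUCUAACA"  # intron consensus, 5'->3'
U2_SEGMENT = "UGUAGUA"    # U2 RNA segment, 5'->3'

print(pair_score(BRANCH_SITE, U2_SEGMENT[::-1]))     # 4 pairs with no bulge
print(bulged_a_candidates(BRANCH_SITE, U2_SEGMENT))  # (5, [4, 5])
```

Bulging one A raises the count from four to five pairs, and the best choices are positions 4 and 5 (counting from 0), the central AA. Because those two As are identical, sequence comparison alone cannot say which of them bulges, which fits the text's statement that “one of the As” is bulged out.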
The actual splicing reaction involves unusual RNA chemistry. The pre‐mRNA is cleaved at the 5′ exon‐intron junction so that the intron begins with a 5′‐phosphate. This pGU end of the intron is then joined to the free 2′‐OH of the bulged A in the consensus AUCUAACA sequence. The result is an unusual A with all three of its hydroxyl groups bound in phosphodiester linkages: the 5′‐ and 3′‐hydroxyls are in the linear chain of the intron, while the 2′‐OH forms a 2′,5′‐phosphodiester linkage with the 5′‐phosphate of the intron's first nucleotide, G. This RNA structure is called a lariat. The formation of the lariat determines the site of cleavage at the 3′ end of the intron: cleavage of the intron‐exon junction and joining of the two exon sequences occur at the first AG downstream of the lariat branch point.
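The rule that the 3′ cut is made at the first AG downstream of the branch point can be expressed as a string operation. This is a sketch under assumptions: the caller supplies the positions of the intron's GU and of the branch A, the function `splice_intron`, the `SpliceResult` record, and the test sequence are hypothetical, the lariat is returned only as the linear intron sequence, and the chemistry is represented only by where the cuts fall.

```python
from dataclasses import dataclass


@dataclass
class SpliceResult:
    mrna: str           # exon 1 joined to exon 2
    lariat: str         # excised intron as a linear string (2',5' loop not modeled)
    branch_offset: int  # position of the branch A within the intron


def splice_intron(pre_mrna: str, five_prime_site: int, branch: int) -> SpliceResult:
    """Remove one intron that starts at `five_prime_site` and has its branch A at `branch`."""
    if pre_mrna[five_prime_site:five_prime_site + 2] != "GU":
        raise ValueError("intron must begin with GU")
    if branch <= five_prime_site or pre_mrna[branch] != "A":
        raise ValueError("branch point must be an A inside the intron")
    # Step 1: the pGU end of the intron is joined to the branch A, forming the lariat.
    # Step 2: the 3' cut is made after the first AG downstream of the branch point.
    ag = pre_mrna.find("AG", branch + 1)
    if ag == -1:
        raise ValueError("no AG downstream of the branch point")
    three_prime_site = ag + 2  # exon 2 begins right after the AG
    return SpliceResult(
        mrna=pre_mrna[:five_prime_site] + pre_mrna[three_prime_site:],
        lariat=pre_mrna[five_prime_site:three_prime_site],
        branch_offset=branch - five_prime_site,
    )


exon1 = "AUGCCAG"
# 5' splice region, branch-point sequence, then polypyrimidine tract ending in AG
intron = "GUAAGU" + "AUCUAACA" + "UUCCUUUCCUAG"
exon2 = "GGCUAA"
pre = exon1 + intron + exon2
result = splice_intron(pre, five_prime_site=len(exon1), branch=len(exon1) + 11)
print(result.mrna)  # AUGCCAGGGCUAA
```

The AG inside GUAAGU, upstream of the branch point, is ignored because the search starts just after the branch A; only the AG at the end of the polypyrimidine tract is used.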
In summary, mRNA processing requires an interconnected set of reactions: Cap and polyA synthesis, followed by intron removal. Soluble enzymes catalyze the former two reactions, while the latter set of reactions involves both RNA and protein components.
This scheme for eukaryotic mRNA processing is consistent with defects in various genetic diseases. For example, phenylketonuria (PKU) is an autosomal recessive condition caused by a deficiency of the enzyme phenylalanine hydroxylase, which converts phenylalanine to tyrosine; tyrosine is subsequently broken down by other enzymes to TCA cycle intermediates. Children with the disease must be given a synthetic low‐phenylalanine diet, or severe intellectual disability will result. One common mutation converts the GU at the 5′ end of an intron to AU, so that the intron is not removed; the resulting mRNA retains intron sequences and cannot be translated into active protein. Similarly, some forms of the disease β‐thalassemia result from defective mRNA processing caused by base changes in the β‐globin gene. The defective mRNA produced in these cases codes for mutated β‐globin that does not function as an oxygen carrier.
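The effect of an intron GU‐to‐AU mutation can be mimicked with a simplified check: remove the intron only when it still begins with GU and ends with AG, and otherwise leave it in place. The function name and sequences are hypothetical and are not taken from the actual phenylalanine hydroxylase gene; the sketch models only the intron‐retention outcome described above.

```python
def remove_intron_if_recognized(pre_mrna: str, start: int, end: int) -> str:
    """Remove pre_mrna[start:end] only if its GU...AG splice signals are intact."""
    intron = pre_mrna[start:end]
    if intron.startswith("GU") and intron.endswith("AG"):
        return pre_mrna[:start] + pre_mrna[end:]
    return pre_mrna  # signal destroyed: intron sequence stays in the mRNA


exon1, exon2 = "AUGGCU", "GAAUGGUAA"
intron = "GUAAGUAUCUAACAUUCCUUUCCUAG"
start, end = len(exon1), len(exon1) + len(intron)

normal = exon1 + intron + exon2
mutant = exon1 + "A" + intron[1:] + exon2  # G -> A at the first intron base

print(remove_intron_if_recognized(normal, start, end))       # AUGGCUGAAUGGUAA
print(len(remove_intron_if_recognized(mutant, start, end)))  # 41
```

A single base change leaves all 26 intron nucleotides in the mutant message, which is far more than enough to destroy the reading frame.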
RNA catalysis
Surprisingly, RNA can catalyze biochemical reactions. Most of these “ribozymes” act on RNA substrates (often themselves), but in one known case a naturally occurring RNA catalyzes a reaction that does not specifically involve RNA.
Thomas Cech and his coworkers discovered RNA catalysis in the early 1980s while studying the removal of an intron from the ribosomal RNA of a small ciliated protozoan, Tetrahymena. They established the reaction in vitro and then purified the components of the reaction mixture to isolate what they thought would be the enzyme responsible. To their surprise, they could remove all the detectable proteins from the mixture and still get intron removal to occur efficiently. Subsequently, several other kinds of ribozymes were discovered: ribonuclease P, which cleaves transfer RNA precursors; another variety of intron from fungi, which carries out its own removal; a number of viral RNAs, which cleave themselves from large end‐to‐end intermediates to give genomic‐sized RNAs; and, most remarkably, the ribosome. Originally, all of these RNAs were thought to have a similar mechanism; however, this may be an oversimplification, and more RNA‐catalyzed reactions may exist (perhaps in the spliceosome) waiting to be characterized.