Two Equal Genes on the Same Reading Frame
Abstract
Pairs of unidirectional (same strand) genes can overlap in one of two phases (relative reading frames). There is a striking bias in the relative abundance of prokaryotic gene overlaps in the two possible phases. A simple model is presented based on unidirectional gene overlaps evolving from nonoverlapping gene pairs, through the adoption of alternative start codons by the downstream genes. Potential alternative start codons within upstream gene sequences were found to occur at greater frequencies in one phase, corresponding to the most prevalent phase of gene overlaps. We therefore suggest that the phase bias of overlapping genes is primarily a consequence of the N-terminal extension of downstream genes through adoption of new start codons.
Overlapping genes are ubiquitous in viruses and common in prokaryotes, but are also found in eukaryotes (Normark et al. 1983; Rogozin et al. 2002; Makalowska et al. 2005). In overlapping coding sequences, the DNA simultaneously encodes portions of two separate polypeptides. Consequently, mutation of the overlapping region has implications for both proteins, and it is therefore nontrivial to characterize the selective pressures at work on such sequences. Neighboring genes on opposite strands of the DNA can be convergent (tail-to-tail, → ←) or divergent (head-to-head, ← →). Same strand genes (head-to-tail, → →) are unidirectional, with an upstream and a downstream gene. In general, phases +0, +1, and +2 can be defined by the gene separation modulo 3 (Kingsford et al. 2007) or equivalently by overlaps of 3i, 3i−1, and 3i−2 bp (for an integer i). For unidirectional genes, this phase is also the relative reading frame (Cock and Whitworth 2007).
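The phase definition above can be illustrated with a minimal Python sketch. The function name `overlap_phase` is hypothetical, and the sketch assumes that an overlap of n bp corresponds to a gene separation of −n, so that the phase is the separation modulo 3.

```python
def overlap_phase(overlap_bp: int) -> int:
    """Return the phase (0, 1, or 2) for an overlap of ``overlap_bp`` base pairs.

    A gap between two genes can be passed as a negative overlap.
    """
    # Assumption: separation is the negative of the overlap length, and the
    # phase is the separation modulo 3. Python's % operator returns a
    # non-negative result for a positive modulus, so no extra handling is needed.
    return (-overlap_bp) % 3


for length in (1, 2, 3, 4, 5):
    print(f"overlap of {length} bp -> phase +{overlap_phase(length)}")
```

Under this convention, overlaps of 3i, 3i−1, and 3i−2 bp fall in phases +0, +1, and +2, respectively, so overlaps of 1 and 4 bp are both phase +2.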
Unidirectional (same strand) overlaps are the most common overlap orientation in prokaryotes (Fukuda et al. 2003). Only overlaps in phases +1 and +2 are considered because in-phase overlaps can be regarded as a single gene with alternative initiation sites. Most unidirectional gene overlaps are of 1 or 4 bp (phase +2), but for longer overlaps, phase +1 is much more common than phase +2 (Eyre-Walker 1996; Borodovsky et al. 1999; Johnson and Chisholm 2004; Cock and Whitworth 2007; Lillo and Krakauer 2007).
We previously highlighted this interesting phase bias in prokaryotic unidirectional overlaps (Cock and Whitworth 2007) and attempted to explain it with a mutual constraint argument (akin to that of Rogozin et al. 2002 for convergent overlaps). Although this mechanism could explain the phase bias, it did not make testable predictions.
Kingsford et al. (2007) provided a simple model to explain the phase bias in convergent overlaps based on the observed frequencies at which alternative stop codons are found in the reverse complement sequence of nonoverlapping convergent genes. Herein, we extend this idea to unidirectional overlaps evolving from nonoverlapping unidirectional genes. First, C-terminal extension of the upstream gene is considered by looking at the frequencies of alternative stop codons within the downstream gene. The mechanism in mind is that the existing stop codon of a nonoverlapping upstream gene is lost through a point mutation or indel, and the gene is therefore extended to the next stop codon, which may cause an overlap with the downstream gene. Second, N-terminal extension of the downstream gene by the adoption of a new start codon is considered by looking at the frequencies of alternative start codons within the upstream gene. Here, the creation of overlaps from unidirectional neighboring genes is by the adoption of a new start codon for a downstream gene (e.g., due to an indel or an accumulation of point mutations, perhaps in association with the loss of the original start codon).
We demonstrate that C-terminal extension of an upstream gene does not give the observed phase bias, but that N-terminal extension of a downstream gene does. Coupled with an exponential fitness cost to the overlap length, this model reproduces the general character of the observed distribution of overlaps.
In total, 3,153,393 gene pairs were considered: 460,065 divergent, 460,751 convergent, and 2,232,577 unidirectional; a split of 14.6%, 14.6%, and 70.8%, respectively (3 significant figures [sf]). As noted by Lillo and Krakauer (2007), the numbers of divergent and convergent pairs are expected to be almost equal. Within each orientation, the proportion of overlapping gene pairs varies considerably. Only 3.1% of divergent genes are annotated as overlapping compared with 13% of convergent pairs and 21% (2 sf) of unidirectional pairs. These ratios are in good agreement with published results (Fukuda et al. 2003; Cock and Whitworth 2007; Kingsford et al. 2007). For the rest of this paper, we focus on unidirectional overlaps. Figure 1 shows the observed distribution, with the clear phase +1 bias in overlaps of 7 bp or more. This bias persists even for genomes grouped by GC% (data not shown).
FIG. 1.
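As a quick check of the arithmetic behind the reported split, the following sketch recomputes the percentages from the three pair counts given in the text. The helper name `percent_split` is hypothetical and is not part of the original analysis.

```python
def percent_split(counts: dict) -> dict:
    """Return each count as a percentage of the total, formatted to 3 significant figures."""
    total = sum(counts.values())
    return {name: f"{100 * n / total:.3g}" for name, n in counts.items()}


# Gene pair counts as reported in the text.
pair_counts = {
    "divergent": 460_065,
    "convergent": 460_751,
    "unidirectional": 2_232_577,
}

print(sum(pair_counts.values()))
print(percent_split(pair_counts))
```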
Figure 2 shows overlaps generated from the first out-of-frame stop codon within each nonoverlapped downstream gene, that is, consideration of C-terminal extension of an upstream gene by adoption of a new stop codon. This shows only a small phase bias, but it is opposite to that observed in the annotated genomes.
FIG. 2.
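A minimal sketch of the stop codon scan described in the Methods is given below. It is an illustration only, not the code used for the analysis: the function name is hypothetical, the default stop codon set is the standard one (NCBI table 4 reassigns TGA, so that table would need a different set), and the overlap is assumed to run from the first base of the downstream gene to the last base of the stop codon that was found.

```python
STOP_CODONS = frozenset({"TAA", "TAG", "TGA"})  # standard stops; table 4 reassigns TGA


def first_out_of_frame_stop(downstream_cds: str, stops=STOP_CODONS):
    """Return (overlap_bp, phase) for the earliest out-of-frame stop codon.

    ``downstream_cds`` is the downstream gene's coding sequence, 5' to 3',
    beginning at its start codon. Returns None if no such stop codon exists.
    """
    seq = downstream_cds.upper()
    for i in range(1, len(seq) - 2):
        if i % 3 == 0:
            continue  # in frame with the downstream gene, so not a candidate
        if seq[i:i + 3] in stops:
            # Assumed convention: the overlap covers the downstream gene's
            # bases from its first base up to the last base of this stop codon.
            overlap_bp = i + 3
            return overlap_bp, (-overlap_bp) % 3
    return None


print(first_out_of_frame_stop("ATGAAACCC"))
```

In this toy sequence the stop codon TGA begins at the second base of the downstream gene, giving a hypothetical 4 bp overlap in phase +2.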
Figure 3 shows overlaps generated from the last out-of-frame start codon within each nonoverlapped upstream gene, that is, consideration of N-terminal extension of a downstream gene by adoption of a new start codon. This does show the same phase bias observed in the annotated genomes (fig. 1), although the distribution of short overlaps is very different. In particular, a large number of potential overlaps of length 5 bp have been identified, all of which use the alternative start codon ATT. However, only a handful of such overlaps were found in the original survey, and this analysis is overly simplistic as the different start codons are not necessarily equally likely. Indeed, some of the start codons defined in the relevant NCBI translation tables (11 and 4) are only used in a small subset of the prokaryotes.
FIG. 3.
It would be possible to extend the above analysis with an acceptance weighting based on observed start codon frequencies. Instead, figure 4 shows the more straightforward approach of just considering the three most common start codons (ATG, GTG, or TTG). Again there is a strong phase +1 bias in the longer overlaps, and now the ratios of overlaps of 1 and 4 bp are much closer to those observed (fig. 1).
FIG. 4.
One marked difference between the observed distribution (fig. 1) and figures 3 and 4 is the very different decay rates. Kingsford et al. (2007) observed a similar phenomenon in their analysis of convergent overlaps and resolved this with a two-stage model, whereby there is a phase bias at overlap creation due to codon frequencies, but with overlap length subject to an exponential fitness cost, which can be determined empirically to match the observed data. Figure 5 shows the data normalized with an exponential decay of 0.0931 (least squares difference fitting for overlaps up to 200 bp). There is good agreement with figure 1, but the phase bias is slightly less pronounced.
FIG. 5.
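The exponential normalization can be sketched as follows. This is an assumption-laden illustration: the text gives only the decay value 0.0931, so the sketch assumes it is a rate per base pair applied as a weight exp(−0.0931 × L) to the predicted count at overlap length L. The function name is hypothetical and the least squares fitting step is not shown.

```python
import math

DECAY_PER_BP = 0.0931  # value from the text; the per-bp unit is an assumption


def apply_exponential_cost(counts_by_length: dict, decay: float = DECAY_PER_BP) -> dict:
    """Weight the predicted count at each overlap length L by exp(-decay * L)."""
    return {
        length: count * math.exp(-decay * length)
        for length, count in counts_by_length.items()
    }


# Toy input: the same raw count at three overlap lengths (not real data).
toy_counts = {4: 1000.0, 20: 1000.0, 60: 1000.0}
for length, weighted in apply_exponential_cost(toy_counts).items():
    print(f"{length} bp: {weighted:.2f}")
```

Under these assumptions the weighting suppresses long overlaps strongly while leaving the relative abundance of the two phases at any given length unchanged.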
Two mechanisms for neighboring unidirectional genes to become overlapped were considered. C-terminal extension of the upstream gene (fig. 2) does not explain the observed pattern in unidirectional gene overlaps (fig. 1). However, considering N-terminal extension of the downstream gene by looking for alternative start codons does predict the observed phase bias (figs. 3 and 4), largely explained as due to the relative frequencies of alternative start codons in the two reading frames. Together with an exponential fitness criterion on the overlap length, this predicts a distribution close to that observed (fig. 5). This proposed exponential fitness cost could be due to the metabolic burden of making a longer protein, the increased likelihood of problems with protein misfolding/aggregation, or a combination of these or other effects. This model is compatible with mutual sequence constraint arguments as in Cock and Whitworth (2007) but provides a much clearer explanation.
Sabath et al. (2008) describe a related analysis of unidirectional overlaps, which also found that the phase bias in longer unidirectional overlaps could be explained in terms of the relative abundance of alternative start codons within an upstream gene and rejected the complementary explanation of the adoption of alternative stop codons within a downstream gene. Rather than observing alternative start and stop codons directly from real gene sequences, their frequencies were inferred from di-codon frequencies taken as the product of observed codon frequencies (that is, assuming adjacent codons are independent). This cannot capture any differences in codon bias within genes, for example, between the 5′ and 3′ terminal regions of a gene. Also, a much smaller data set was used, drawing on annotated overlaps from only 167 genomes. Nevertheless, their work is supportive of the results given here.
Translational coupling provides a biological reason for short unidirectional gene overlaps and thus the very large number of overlaps of 1 or 4 bp. This may also apply to 5 bp overlaps, which could be tested in vitro, suggesting that the high number of unidirectional overlaps generated using the "rare" alternative start codon ATT (fig. 3) may be of biological relevance, with the handful of cases annotated being just the tip of the iceberg. Eyre-Walker (1996) noted skewed ratios of alternative stop codons in short overlaps of 1 or 4 bp, so atypical start codon usage in this context is not unreasonable.
Although the genetic code itself appears to induce these phase biases in longer overlaps, without searching for Shine–Dalgarno translation initiation sites or direct experimental evidence, it is not clear how many of these annotated long unidirectional overlaps are biologically relevant. Although translational coupling provides a biological reason for short unidirectional gene overlaps, it may not apply to the longer overlaps reported. A recent analysis by Pallejá et al. (2008) concluded all unidirectional overlaps over 60 bp in their data set of 338 prokaryotic genomes were misannotations, but did identify some "real overlaps." However, the phase bias is still found when only genes with annotated functionality are considered (Cock and Whitworth 2007; Lillo and Krakauer 2007; Sabath et al. 2008), and 98.1% (3 sf) of annotated unidirectional overlaps are 60 bp or less.
The phase patterns in overlapping genes have no immediately apparent evolutionary role, but rather are inherently linked to the genetic code itself, which has evolved under various pressures. Itzkovitz and Alon (2007) explored a range of hypothetical genetic codes and concluded that those observed in nature are nearly optimal for encoding additional information within a protein sequence. This work did not specifically mention nucleotide sequences simultaneously encoding two proteins, but rather arbitrary (short) sequences representing possible DNA-binding regions or other motifs. Furthermore, the presence of alternative out-of-frame stop codons within a gene (hidden stop codons) has been looked at from the point of view of robustness to translational frameshift errors (Seligmann and Pollock 2004). The standard genetic code was found to stop erroneous reads sooner than hypothetical genetic codes, which is advantageous as fewer resources are wasted constructing and degrading nonfunctional proteins. It seems reasonable that functions like double coding and hidden stop codons may have shaped the genetic code and thus indirectly contributed to the overlap phase patterns observed.
Methods
Separation/overlap frequencies were tabulated for divergent, convergent, and unidirectional adjacent annotated genes in 1,800 GenBank files for the 962 bacterial or archaeal species available from the NCBI as of 7 September 2009. For simplicity, any genes with nonexact locations, ambiguous sequences, internal stop codons, invalid start or stop codons (as verified using the declared genetic code), or special cases with noncontinuous coding sequences (e.g., from ribosomal slippages) were excluded, as were cases where one gene was entirely within another (fully overlapped). Separated gene pairs with any ambiguous sequence between them were also excluded.
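The tabulation step can be sketched in plain Python as follows. This is a simplified illustration, not the Biopython code used for the analysis: the `Gene` tuple, the function names, and the coordinate convention (0-based start, exclusive end) are assumptions, genes are treated as lying on a linear sequence (no wrap-around for circular genomes), and the quality filters listed above are omitted except for fully overlapped genes.

```python
from collections import Counter
from typing import NamedTuple


class Gene(NamedTuple):
    start: int   # 0-based, inclusive
    end: int     # 0-based, exclusive
    strand: int  # +1 (forward) or -1 (reverse)


def classify_pair(left: Gene, right: Gene):
    """Return (orientation, separation) for adjacent genes, left starting first.

    A negative separation is an overlap. Returns None when the right-hand
    gene lies entirely within the left-hand gene (fully overlapped).
    """
    if right.end <= left.end:
        return None
    separation = right.start - left.end
    if left.strand == right.strand:
        orientation = "unidirectional"   # head-to-tail
    elif left.strand == 1:
        orientation = "convergent"       # tail-to-tail, forward then reverse
    else:
        orientation = "divergent"        # head-to-head, reverse then forward
    return orientation, separation


def tabulate(genes):
    """Count (orientation, separation) over neighboring genes sorted by start."""
    ordered = sorted(genes, key=lambda g: g.start)
    counts = Counter()
    for left, right in zip(ordered, ordered[1:]):
        result = classify_pair(left, right)
        if result is not None:
            counts[result] += 1
    return counts


example = [Gene(0, 300, 1), Gene(296, 600, 1), Gene(650, 900, -1), Gene(950, 1200, 1)]
print(tabulate(example))
```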
For each nonoverlapping unidirectional gene pair, the downstream gene was searched for the first out-of-frame stop codon, whose location determined the length of a hypothetical unidirectional gene overlap. Additionally, the upstream gene was searched for the last out-of-frame start codon, the location of which also determined a hypothetical unidirectional gene overlap. If, due to the presence of a nearby in-frame stop codon, this hypothetical gene would be encoded within the upstream gene, it was rejected. Two variants of the start codon analysis were performed, first looking for any valid potential start codon in the genetic code declared for that organism and second looking only for the most typically used start codons (ATG, GTG, or TTG).
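The start codon search might look like the sketch below. It is a hedged illustration rather than the original implementation: the function name is hypothetical, the upstream coding sequence is assumed to have a length that is a multiple of 3 and to end with its stop codon, the default stop codon set is the standard one, and a candidate whose reading frame hits a stop codon before the end of the upstream gene causes the pair to be rejected outright, which is only one possible reading of the rejection rule.

```python
COMMON_STARTS = frozenset({"ATG", "GTG", "TTG"})
STOP_CODONS = frozenset({"TAA", "TAG", "TGA"})  # standard stops; an assumption


def last_out_of_frame_start(upstream_cds: str, starts=COMMON_STARTS, stops=STOP_CODONS):
    """Return (overlap_bp, phase) for the last out-of-frame start codon, or None.

    ``upstream_cds`` is the upstream gene's coding sequence, 5' to 3',
    from its start codon through its stop codon.
    """
    seq = upstream_cds.upper()
    n = len(seq)
    for i in range(n - 3, 0, -1):  # scan backwards from the 3' end
        if i % 3 == 0:
            continue  # in frame with the upstream gene, so not a candidate
        if seq[i:i + 3] not in starts:
            continue
        # Reject if the candidate's own frame reaches a stop codon while still
        # inside the upstream gene (the hypothetical gene would be nested).
        for j in range(i + 3, n - 2, 3):
            if seq[j:j + 3] in stops:
                return None
        overlap_bp = n - i  # from the new start codon to the upstream gene's end
        return overlap_bp, (-overlap_bp) % 3
    return None


# Second variant: only the three most common start codons.
print(last_out_of_frame_start("ATGAAATGCCCATAA"))
# Wider start codon set; adding ATT here is purely for illustration.
print(last_out_of_frame_start("ATGCCCCATTAA", starts=COMMON_STARTS | {"ATT"}))
```

In the second toy sequence, ATT immediately precedes the final two bases of the TAA stop codon, which gives a 5 bp overlap in phase +1 of the kind discussed for figure 3.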
The analysis was written in Python using Biopython (Cock et al. 2009). Figures were drawn with R (R Development Core Team 2007).
This work was supported by the Engineering and Physical Sciences Research Council via the MOAC Doctoral Training Centre (studentship to P.J.A.C.).
References
Borodovsky et al. Statistical predictions of coding regions in prokaryotic genomes by using inhomogeneous Markov models, Organization of the prokaryotic genome, 1999, Washington, DC: ASM Press (pg. 11-33)
Cock et al. (11 co-authors). Biopython: freely available Python tools for computational molecular biology and bioinformatics, Bioinformatics, 2009, vol. 25(11) (pg. 1422-1423)
Cock and Whitworth. Evolution of gene overlaps: relative reading frame bias in prokaryotic two-component system genes, J Mol Evol., 2007, vol. 64(4) (pg. 457-462)
Eyre-Walker. The close proximity of Escherichia coli genes: consequences for stop codon and synonymous codon use, J Mol Evol., 1996, vol. 42(2) (pg. 73-78)
Fukuda et al. On dynamics of overlapping genes in bacterial genomes, Gene, 2003, vol. 323 (pg. 181-187)
Itzkovitz and Alon. The genetic code is nearly optimal for allowing additional information within protein-coding sequences, Genome Res., 2007, vol. 17 (pg. 405-412)
Johnson and Chisholm. Properties of overlapping genes are conserved across microbial genomes, Genome Res., 2004, vol. 14 (pg. 2268-2272)
Kingsford et al. A unified model explaining the offsets of overlapping and near-overlapping prokaryotic genes, Mol Biol Evol., 2007, vol. 24(9) (pg. 2091-2098)
Lillo and Krakauer. A statistical analysis of the three-fold evolution of genomic compression through frame overlaps in prokaryotes, Biol Direct, 2007, vol. 2 pg. 22
Makalowska et al. Overlapping genes in vertebrate genomes, Comput Biol Chem., 2005, vol. 29(1) (pg. 1-12)
Normark et al. Overlapping genes, Annu Rev Genet., 1983, vol. 17 (pg. 499-525)
Pallejá et al. Large gene overlaps in prokaryotic genomes: result of functional constraints or mispredictions?, BMC Genomics, 2008, vol. 9 pg. 335
R Development Core Team. R: a language and environment for statistical computing [Internet], Vienna (Austria): R Foundation for Statistical Computing, 2007, ISBN 3-900051-07-0
Rogozin et al. Purifying and directional selection in overlapping prokaryotic genes, Trends Genet., 2002, vol. 18(5) (pg. 228-232)
Sabath et al. Same-strand overlapping genes in bacteria: compositional determinants of phase bias, Biol Direct, 2008, vol. 3(1) pg. 36
Seligmann and Pollock. The ambush hypothesis: hidden stop codons prevent off-frame gene reading, DNA Cell Biol., 2004, vol. 23(10) (pg. 701-705)
Author notes
Associate editor: James McInerney
© The Author 2009. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. All rights reserved. For permissions, please e-mail: journals.permissions@oxfordjournals.org
Source: https://academic.oup.com/mbe/article/27/4/753/1749067