Phylogenetic Signal of the Nuclear Gene GA20ox1 in Seed Plants: The Relationship Between Monocots and Eudicots
Accepted Date: 31-May-2017
Citation: Phylogenetic Signal of the Nuclear Gene GA20ox1 in Seed Plants: The Relationship Between Monocots and Eudicots. American Research Journal of Biosciences; V3, I1; pp:1-8.
Copyright: This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
This study investigated the phylogenetic signal of the nuclear gene GA20ox1 in seed plants, focusing on the relationship between Monocots and Eudicots. Sequences were obtained from GenBank and analyzed using the maximum likelihood and maximum parsimony approaches. A maximum likelihood tree was also built from sequences of the plastid gene rbcL to enable comparison of the results. The GA20ox1 gene showed neutral evolution and levels of homoplasy equivalent to those observed in chloroplast sequences, and it generated well-resolved phylogenetic relationships. The relationship between Monocots and Eudicots based on the GA20ox1 gene was clearly resolved, revealing the evolution of both groups. Taken together, these characteristics make the GA20ox1 gene a promising marker to corroborate, complement and resolve phylogenetic relationships among species within one to several genera.
Keywords: Nuclear gene, flowering plants, systematics, gibberellin, phylogeny
The large amount of DNA sequences generated in recent decades for an increasing number of species has made it possible to refine the phylogenetic relationships among flowering plants and to produce better-resolved classifications for this group (APG 2009, Babineau et al. 2013). Despite this progress, some undefined or weakly supported clades remain, and additional molecular data are needed to increase support for these relationships. Soltis et al. (2000) reported a well-resolved and highly supported topology of the angiosperm phylogeny by combining chloroplast and nuclear sequence data sets (rbcL, matK and 18S rDNA). They suggested that most of the remaining phylogenetic questions could best be addressed by sequencing additional genes, without the need to add more taxa to the analysis.
In general, plant molecular phylogenetics has relied heavily on nucleotide sequences of chloroplast (cpDNA) and nuclear ribosomal (rDNA) genes. However, the use of additional nuclear sequences has been proposed as an alternative to improve the resolution of phylogenetic relationships (Qiu et al. 1999, Soltis et al. 1999, Small et al. 2004). Phylogenetic markers must be at least 500 bp long and should be neither too conserved nor too variable (Belinky et al. 2012). Introns and intergenic regions are usually too variable to be informative above the intraspecific level, so exon regions may be a valuable alternative for phylogenetic reconstructions above the species level (e.g. Belinky et al. 2012). In addition, attributes such as a higher rate of sequence evolution, the existence of multiple independent loci and biparental inheritance make nuclear genes a very attractive alternative for estimating species trees (Small et al. 2004). Following this rationale, a phylogenetic investigation based on nuclear genes should start by selecting candidate genes for a preliminary study in which the utility of their DNA sequences can be evaluated (Small et al. 2004). Public nucleic acid databases (e.g. GenBank, EMBL, DDBJ) hold a very large number of gene sequences, so they may be an attractive source of DNA sequences for the initial screening of promising exonic regions with phylogenetic signal. However, the reliability of these deposited sequences for phylogenetic analysis must be evaluated before the databases can be considered useful for this purpose.
The aim of the present study was to investigate the phylogenetic signal of the nuclear gene Gibberellin 20-oxidase1 (GA20ox1) in seed plants, focusing on the relationship between Monocots and Eudicots. Using nucleotide sequences deposited in GenBank, we tested the hypothesis that GA20ox1 is a useful gene for phylogenetic inference in seed plants.
MATERIAL AND METHODS
Initially, an exhaustive search was performed in the NCBI GenBank database (http://www.ncbi.nlm.nih.gov/genbank/) to find all deposited sequences of the gene GA20ox1, as suggested by Small et al. (2004). GA20ox is a low-copy nuclear gene (four copies characterized in flowering plants) involved in the activation of the plant growth hormone gibberellin (Hedden and Phillips 2000), and it appears to be relatively conserved across plant species (Huerta et al. 2009). There are no grounds to expect any particular gene to be universally useful at a given phylogenetic depth (Small et al. 2004). Nevertheless, the GA20ox1 gene may match the main feature of a phylogenetically useful nuclear gene: an elevated rate of neutral sequence evolution.
Sequences of 60 species representing Monocots (12 species), Eudicots (57 species) and Pinaceae (1 species) were retrieved from GenBank (Table 1). The GA20ox1 sequence of the lycophyte Selaginella moellendorffii Hieron. (Lycopodiidae) was included in the analysis as the outgroup. Sequences of the GA20ox1 gene were aligned with MUSCLE (Edgar 2004), using the Neighbor Joining clustering method for the iterations and all other default parameters of the program.
To characterize the evolutionary patterns of the GA20ox1 sequences, two analyses were run in MEGA 5.05 (Tamura et al. 2011). The first estimated the transition/transversion matrix using the maximum likelihood method. The second was a codon-based Z-test of selection using the Pamilo-Bianchi-Li method, with complete deletion of gaps and 1000 bootstrap replicates to determine significance.
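Once synonymous (dS) and nonsynonymous (dN) distances are available, the codon-based Z-test reduces to a simple statistic. The Python sketch below assumes those distances and their bootstrap replicates have already been estimated, for example by the Pamilo-Bianchi-Li method in MEGA. It does not reproduce MEGA's implementation, and the function names and input values are hypothetical. The sketch computes Z = (dN - dS)/SE, taking the standard error from bootstrap replicates of the difference, and then derives a two-tailed p-value from the standard normal distribution.

```python
import math
from statistics import variance


def two_tailed_p(z):
    """Two-tailed p-value for a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2))


def codon_z_test(dn, ds, dn_boot, ds_boot):
    """Hypothetical helper: Z = (dN - dS) / SE, SE from bootstrap replicates of dN - dS."""
    diffs = [n - s for n, s in zip(dn_boot, ds_boot)]
    se = math.sqrt(variance(diffs))  # sample variance of the bootstrap differences
    z = (dn - ds) / se
    return z, two_tailed_p(z)


# Hypothetical distances and bootstrap replicates (not data from this study).
z, p = codon_z_test(0.10, 0.20, [0.10, 0.12, 0.08], [0.20, 0.20, 0.20])
print(round(z, 2), p < 0.001)
```

For the statistic reported in this study, two_tailed_p(-0.705) returns approximately 0.481, close to the reported p = 0.482.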
The phylogenetic relationships among species were analyzed using the maximum parsimony (MP) and maximum likelihood (ML) approaches. The best-fit model of nucleotide substitution was determined through the Akaike Information Criterion (AIC) in jModelTest 0.1.1 (Posada 2008). The MP analysis used the Tree-Bisection-Reconnection (TBR) search algorithm, with all characters unordered and gaps treated as missing data. The ML tree was built using the GTR+G+I substitution model, with rate variation among sites modeled by a discrete gamma distribution with five categories. A bootstrap analysis with 500 replicates was used to assess the internal support of the clades in the ML and MP trees. Phylogenetic trees were built in MEGA 5.05 (Tamura et al. 2011).
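AIC-based model selection compares candidate models through AIC = 2k - 2 ln L, where k is the number of free parameters and ln L is the maximized log-likelihood. The model with the lowest AIC is preferred. The Python sketch below illustrates that comparison using hypothetical log-likelihoods, not values from this study. It counts only substitution-model parameters, and it does not reproduce how jModelTest estimates the likelihoods themselves.

```python
def aic(log_likelihood, k):
    """Akaike Information Criterion: 2k - 2 ln L."""
    return 2 * k - 2 * log_likelihood


# Hypothetical maximized log-likelihoods (not from this study) and the number of
# free substitution-model parameters: GTR = 5 relative rates + 3 base
# frequencies; +G adds the gamma shape; +I adds the invariable-site proportion.
CANDIDATES = {
    "JC": (-10950.0, 0),
    "HKY+G": (-10420.0, 5),
    "GTR+G": (-10395.0, 9),
    "GTR+I+G": (-10380.0, 10),
}


def best_model(candidates):
    """Return the lowest-AIC model name and the AIC score of every candidate."""
    scores = {name: aic(lnl, k) for name, (lnl, k) in candidates.items()}
    return min(scores, key=scores.get), scores


best, scores = best_model(CANDIDATES)
print(best)  # GTR+I+G
```

If every candidate is evaluated on the same tree, adding the same number of branch-length parameters to each one shifts all AIC values equally, so the ranking does not change.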
To compare the phylogenetic signal of the GA20ox1 gene with the relationships recovered from plastid genes traditionally employed in such studies, sequences of the chloroplast region rbcL were retrieved from GenBank. They were taken from the same species as the GA20ox1 sequences or, when not available, from a species of the same genus. The rbcL sequences were aligned, and the phylogenetic relationships among species were analyzed using the ML approach described above. The number of parsimony informative sites, the consistency index (CI) and the retention index (RI) were computed as for the GA20ox1 sequences.
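A site is parsimony informative when at least two character states each occur in at least two taxa. The Python sketch below counts such sites in an alignment held as a dictionary of equal-length sequences. It treats '-', '?' and 'N' as missing data, which is an assumption of the sketch (mirroring the gap treatment described above), and the toy alignment is hypothetical.

```python
from collections import Counter

MISSING = set("-?N")  # assumption: gaps and ambiguous bases are treated as missing


def is_parsimony_informative(column):
    """True if at least two states each occur in at least two taxa."""
    counts = Counter(c for c in column.upper() if c not in MISSING)
    return sum(1 for n in counts.values() if n >= 2) >= 2


def percent_informative(alignment):
    """Return (number, percentage) of parsimony-informative sites."""
    seqs = list(alignment.values())
    n_sites = len(seqs[0])
    informative = sum(
        is_parsimony_informative("".join(s[i] for s in seqs))
        for i in range(n_sites)
    )
    return informative, 100.0 * informative / n_sites


# Hypothetical four-taxon alignment: only columns 2 and 3 are informative.
toy = {"A": "AAGT-", "B": "AAGTC", "C": "ACTTC", "D": "ACTAC"}
print(percent_informative(toy))  # (2, 40.0)
```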
Phylogenetic signal of the GA20ox1 gene
Concerning the patterns of nucleotide substitution, 40.3% of the substitutions retained in the GA20ox1 sequences were transversions and 59.7% were transitions, with a transition/transversion ratio R = 1.49. The codon-based Z-test of selection gave a non-significant value of Z = -0.705 (p = 0.482), suggesting neutral evolution of the studied sequences. In the parsimony analysis of the GA20ox1 nucleotide sequences, 56.94% of the characters were parsimony informative. This analysis generated a consensus tree of 2808 steps (Fig. 1B), with consistency index CI = 0.23 and retention index RI = 0.51. By comparison, the parsimony analysis of the rbcL sequences (tree not shown) yielded 28.39% parsimony informative characters and a consensus tree of 396 steps, with CI = 0.44 and RI = 0.66.
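The CI and RI summarize homoplasy on a given tree. Consider a set of characters with S observed steps on the tree, M minimum possible steps and G maximum possible steps (the steps required on a star tree). Then CI = M/S and RI = (G - S)/(G - M). The Python sketch below computes both indices using Fitch's algorithm on a small rooted binary tree written as nested tuples. The tree and characters are hypothetical, and missing data are not handled. Programs such as MEGA may compute these indices with options this sketch does not reproduce, for example excluding uninformative sites.

```python
from collections import Counter


def fitch(tree, states):
    """Return (candidate state set, steps) for a subtree using Fitch parsimony."""
    if isinstance(tree, str):  # leaf: taxon name
        return {states[tree]}, 0
    left, right = tree
    l_set, l_steps = fitch(left, states)
    r_set, r_steps = fitch(right, states)
    common = l_set & r_set
    if common:
        return common, l_steps + r_steps
    return l_set | r_set, l_steps + r_steps + 1  # one extra change needed


def ci_ri(tree, characters):
    """Ensemble consistency index (M/S) and retention index ((G-S)/(G-M))."""
    s = m = g = 0
    for states in characters:
        counts = Counter(states.values())
        s += fitch(tree, states)[1]              # observed steps on the tree
        m += len(counts) - 1                     # minimum possible steps
        g += len(states) - max(counts.values())  # maximum (star-tree) steps
    return m / s, (g - s) / (g - m)


tree = (("A", "B"), ("C", "D"))
characters = [
    {"A": "0", "B": "0", "C": "1", "D": "1"},  # fits the tree: 1 step
    {"A": "0", "B": "1", "C": "0", "D": "1"},  # homoplastic: 2 steps
]
ci, ri = ci_ri(tree, characters)
print(round(ci, 3), ri)  # 0.667 0.5
```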
Phylogenetic relationship between Monocots and Eudicots
In the MP analysis of the GA20ox1 sequences (Fig. 1B), monocots were placed at the base of the tree. One clade retained all asterids with weak bootstrap support (BP < 50%), except Daucus, which clustered with Cucurbita and Citrullus. Rosids failed to form a single cluster and were mixed with Caryophyllales, Proteales, Saxifragales and even Pinales. In the ML tree (Fig. 1A), the GA20ox1 sequences failed to resolve the relationships within Eudicots, with species from rosids, asterids, Caryophyllales, Proteales and Saxifragales intermixed. However, all monocot species grouped in a monophyletic cluster with 88% bootstrap support.
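A bootstrap support value, such as the 88% reported for the monocot clade, is the percentage of pseudo-replicate trees that contain a given clade. The Python sketch below shows the two steps that surround tree inference. The first resamples alignment columns with replacement to build one pseudo-replicate. The second counts how often a clade appears among the replicate trees. Tree building for each replicate, done in MEGA in this study, is not reproduced, so the replicate clade sets in the example are hypothetical, as is the toy alignment.

```python
import random


def bootstrap_columns(alignment, rng):
    """Build one pseudo-replicate by sampling alignment columns with replacement."""
    n_sites = len(next(iter(alignment.values())))
    picks = [rng.randrange(n_sites) for _ in range(n_sites)]
    return {taxon: "".join(seq[i] for i in picks) for taxon, seq in alignment.items()}


def clade_support(replicate_clades, clade):
    """Percentage of replicate trees whose clade set contains `clade`."""
    hits = sum(1 for clades in replicate_clades if clade in clades)
    return 100.0 * hits / len(replicate_clades)


# Hypothetical toy alignment; a real analysis would draw 500 replicates.
alignment = {"A": "ACGTAC", "B": "ACGTTC", "C": "GCGAAC", "D": "GCGATC"}
replicate = bootstrap_columns(alignment, random.Random(42))

# Hypothetical clade sets from four replicate trees (tree inference not shown).
mono = frozenset({"A", "B"})
replicate_clades = [
    {mono, frozenset({"C", "D"})},
    {mono},
    {frozenset({"A", "C"})},
    {mono, frozenset({"A", "B", "C"})},
]
print(clade_support(replicate_clades, mono))  # 75.0
```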
The ML analysis of the rbcL sequences (Fig. 1C) split the monocots into two separate clusters. All asterid species formed a single cluster with weak bootstrap support (BP < 50%). The rosids formed a main cluster (BP < 50%) that also included Kalanchoe (Saxifragales) and Rumex (Caryophyllales). Vitis and Rosa (rosids) were placed outside this main group, and Pinus (Pinales) was basal to all other species.
A gene tree is the phylogeny of a particular DNA sequence: it treats the alleles themselves as the operational taxonomic units (OTUs) rather than tracing the evolutionary pathway of a group of OTUs (Avise 1989). The present analysis therefore reflects the phylogenetic relationships of the GA20ox1 gene across the sampled species. Even though the Z-test of selection suggested neutrality among the evaluated sequences, the observed relationships may be a consequence of lineage sorting acting on the GA20ox1 gene.
The phylogenetic signal of the GA20ox1 gene was assessed from the pattern of nucleotide substitutions and the Z-test of neutrality. Transversions accumulate more slowly than transitions, so they are less susceptible to homoplasy and are considered the more reliable type of substitution for constructing phylogenies (Quicke 1993, Yoder et al. 1996). A higher proportion of transversions is therefore an important aspect of the informative capacity of a DNA region for phylogenetic analysis. The proportion of transversions observed in GA20ox1 (40.3%) and its neutral evolution, supported by the Z-test, are desirable characteristics for phylogenetically useful nucleotide sequences and indicate a suitable phylogenetic signal for this gene.
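Transitions are substitutions within purines (A↔G) or within pyrimidines (C↔T), and all other substitutions are transversions. The Python sketch below counts both types between two aligned sequences, skipping identical sites, gaps and ambiguity codes. It is a simple pairwise count for illustration, not the maximum likelihood estimate of R computed in MEGA, so ratios obtained this way need not match the reported R = 1.49. The example sequences are hypothetical.

```python
PURINES = set("AG")
PYRIMIDINES = set("CT")
NUCLEOTIDES = PURINES | PYRIMIDINES


def count_ts_tv(seq1, seq2):
    """Count transitions and transversions between two aligned sequences."""
    ts = tv = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a == b or a not in NUCLEOTIDES or b not in NUCLEOTIDES:
            continue  # identical site, gap or ambiguity code
        if (a in PURINES) == (b in PURINES):
            ts += 1  # purine<->purine or pyrimidine<->pyrimidine
        else:
            tv += 1  # purine<->pyrimidine
    return ts, tv


ts, tv = count_ts_tv("AAGCT", "GACTT")
print(ts, tv, ts / tv)  # 2 1 2.0
```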
The proportion of parsimony informative characters and the level of homoplasy of the GA20ox1 sequences are similar to those observed in useful cpDNA sequences such as matK and trnK. Hilu et al. (1996) evaluated the informative capacity of these two regions for phylogenetic studies of the early diverging eudicots. They found 55% to 73% parsimony informative characters, with tree CIs ranging from 0.36 to 0.40 and RIs from 0.39 to 0.47. The proportion of parsimony informative characters in the GA20ox1 sequences (56.94%) lies at the lower end of the range observed for matK and trnK. Even so, the length of GA20ox1 (>1200 bp) provides a larger number of informative characters than commonly employed cpDNA sequences. In addition, the CI of the GA20ox1 tree (0.23) is lower than the values observed for matK and trnK, but its RI (0.51) is higher, suggesting levels of homoplasy similar to those observed in widely used chloroplast sequences.
The GA20ox1 gene is involved in secondary growth in plants, an important trait distinguishing monocots from Eudicots. Secondary growth is usually absent or limited in monocots, whereas it is significant in Eudicots, mainly in tree species. The monocot-dicot divergence estimated from chloroplast gene sequences is about 200 ± 40 million years ago (Wolfe et al. 1989). The divergence of the GA20ox1 gene therefore likely dates from this time. This is clearly expressed in its phylogenetic analysis, which placed monocots as a monophyletic group with strong bootstrap support (88%) in the ML tree. In contrast, the rbcL tree depicted the monocot group as paraphyletic. A paraphyletic origin of monocots has been reported in all studies based on the nuclear 18S gene and was attributed to differential lineage sorting (Duval and Erwin 2004).
In conclusion, our analysis of GA20ox1 nucleotide sequences revealed neutral evolution and levels of homoplasy equivalent to those observed in chloroplast sequences traditionally employed in phylogenetic analysis. It also recovered phylogenetic relationships at genus and family levels that were well resolved with high support in both the ML and MP analyses. Taken together, these characteristics make the GA20ox1 gene a promising marker to corroborate, complement and resolve phylogenetic relationships among species within one to several genera, the level at which most systematists work. Because this study employed only DNA sequences deposited in GenBank, further efforts are needed to develop universal primers that amplify this gene across different taxa, aiming to clarify relationships still unresolved in the flowering plant phylogeny. Species from different clades, such as rosids, asterids, Caryophyllales, Proteales and Saxifragales, were intermixed in the phylogenetic analyses of both the GA20ox1 and rbcL sequences. Such primers may help resolve these inconsistencies by enabling the selection of the most appropriate plant species for phylogenetic studies using the GA20ox1 gene.
This work was partially supported by FAPERGS (Process 1013351) and CNPq (Process 471812/2011-0). The authors thank CNPq, UNIPAMPA and CAPES for grants and scholarships.