| HOME | HELP | FEEDBACK | SUBSCRIPTIONS | ARCHIVE | SEARCH | TABLE OF CONTENTS |
Previous Article | Next Article ![]()
Eukaryotic Cell, February 2005, p. 455-464, Vol. 4, No. 2
1535-9778/05/$08.00+0 doi:10.1128/EC.4.2.455-464.2005
Copyright © 2005, American Society for Microbiology. All Rights Reserved.
Silvia M. Salem-Izacc,1,
Raphaela C. Georg,1
Ricardo Z. N. Vêncio,2
Luci D. Navarro,1 and
Suely L. Gomes1*
Departamento de Bioquímica, Instituto de Química,1 BIOINFO-USP, Núcleo de Pesquisa em Bioinformática, Universidade de São Paulo, São Paulo, Brasil2
Received 24 August 2004/ Accepted 6 December 2004
| ABSTRACT |
|---|
|
|
|---|
| INTRODUCTION |
|---|
|
|
|---|
B. emersonii is a primitive fungus which has diverged early in the fungal lineage (17, 49). Based on rRNA data, it seems clear which groups form the fungal monophyletic clade; however, the phylogenetic relationships among the various fungal taxa remain doubtful (19, 36, 49). Similarly, the relationships among the various crown taxa remain not well resolved. In the same way that molecular phylogenies based on rRNA have alternatively placed either plants or fungi as more closely related to animals, different works, mainly based on protein sequences (elongation factors 1
and 2, actin, and tubulins), have supported Cavalier-Smith's proposal that animals and fungi are sister groups (2, 48).
Despite the particular taxonomic position and the significance as an important ecological group that involves saprobes as well as plant, animal, and fungal pathogens (35), the chytrids remain poorly characterized. Although B. emersonii has become one member of the group that has been extensively studied at different levels, present knowledge about its expressed genes is limited to the rRNA genes and eight protein coding sequences (4, 5, 9, 10, 31, 37, 38, 42, 45, 49).
An efficient way to obtain information about gene expression and coding sequences of uncharacterized genomes is to sequence a large number of expressed sequence tags (ESTs). If obtained from nonnormalized libraries, EST sequencing analysis (also known as digital Northern analysis) can represent the expression profile, including complexity and abundance levels of transcripts from different tissues, cell types, and developmental stages (8).
We report here a high-throughput cDNA sequencing program which is the first approach to the understanding of gene complexity in B. emersonii. We have produced 16,984 high-quality sequences corresponding to the 5' ends of cDNAs from 10 different libraries constructed using mRNA isolated from synchronized cells, collected at different stages along the B. emersonii life cycle. The set includes 4,873 putative unique sequences, among which 2,306 were annotated in at least one of the three Gene Ontology (GO) project terms: biological process, molecular function and cellular component. A total of 1,680 ESTs were classified in different biological processes. We also tested previously selected proteins to reconstruct the eukaryotic phylogeny based on the neighbor-joining method. At the same time, we conducted an in silico analysis to evaluate differential gene expression throughout B. emersonii life cycle, which was validated by Northern blot for eight selected genes. This first large-scale sequencing project of a chytridiomycete transcriptome represents an important set of expressed sequences for studies of phylogeny as well as growth and differentiation in lower fungi.
| MATERIALS AND METHODS |
|---|
|
|
|---|
Construction of cDNA libraries and DNA sequencing. Total RNA was isolated using Trizol LS (Life Technologies), and mRNA was purified by employing the Oligotex-dT mRNA mini kit (QIAGEN). RNA samples were isolated from zoospores, germinating cells 30, 60, 90, and 120 min after induction of germination at 27°C, vegetative cells after 16 h of growth at 18°C, and sporulating cells 30, 60, 90, and 120 min after induction of sporulation at 27°C, totalling 10 different samples along the fungal life cycle. Libraries were denominated as follows: ZSP (zoospores), G30-G120 (germlings from 30 to 120 min of germination), NSV (vegetative cells), and E30-E120 (zoosporangia from 30 to 120 min of sporulation). DNA was synthesized using 1 to 5 µg of poly(A)+ RNA and size-fractionated in Sephacryl S-500 HR (Life Technologies) columns, and cDNA fractions containing fragments larger than 500 bp were pooled. We constructed nonnormalized cDNA libraries in the vector pSPORT I using the SuperScript plasmid system (Life Technologies), resulting in directional cloning. We also prepared a nondirected cDNA library from vegetative cells in pGEMT-Easy (Promega) using a PCR-select cDNA subtraction kit (Clontech) without the subtraction step. We carried out the link between the adapter sequences provided for subtraction and the cDNA sample, cloned the resulting fragments into the plasmid, and amplified the inserts by PCR, using the adapter sequences as primers. Sequencing reactions were performed with 100 to 200 ng of plasmid DNA prepared in a 96-well format at all stages, from bacterial growth through sequencing reaction purification. Sequencing reactions were carried out using the ABI Prism BigDye Terminator sequencing kit (Applied Biosystems) and analyzed on ABI377 or ABI3100 automatic sequencer instruments.
EST processing pipeline, annotation, and database construction.
The EST sequence chromatograms were stored, processed, and trimmed through a web-based service (34; http://bioinfo.iq.usp.br/estweb) including base calling and quality control by PHRED (6, 7) and trimming (which included the removal of poly(A) tails, low-quality sequences, and vector and adapter regions) by Cross Match (14; http://www.phrap.org/phrap_documentation.html). The accepted sequences were chosen to contain a minimum of 150 bases, with at least 75 bases within a window of 100 bases with a PHRED quality value of
15. The sequences were filtered using a local BLASTN analysis (1) to eliminate sequences that matched ribosomal and mitochondrial sequences or that matched bacterial sequences. Assembly of ESTs into contigs was carried out using the CAP3 program (18; http://genome.cs.mtu.edu/cap/cap3.html) set with default parameters. The redundancy was estimated as the number of assembled ESTs minus the number of contigs divided by the total number of ESTs and transformed to a percentage. We annotated putative protein products for assembling results based on BLASTX best hits against a nonredundant database at the National Center for Biotechnology Information (NCBI) (http://www.ncbi.nlm.nih.gov/BLAST/), and we automatically assigned Gene Ontology terms (http://www.geneontology.org) to the resulting contigs and singlets using BLASTX against curated databases (Swiss-Prot and TrEMBL) available at the Swiss Institute of Bioinformatics (http://www.expasy.org). We used a bit score cutoff of 55 for the NCBI annotation (this score corresponds roughly to an E-value
106 considering the nonredundant database size) and an E value of
106 as the BLASTX cutoff for the Swiss-Prot and TrEMBL annotation. We included both annotation browsers in our online integrated database available at the project website (http://blasto.iq.usp.br), as well as a dynamic searching tool. Annotation was also manually checked. The unigenes showing matches with any of the first 10 amino acids (aa) of a protein present in the SwissProt database (BLASTX hit with an E value of <106), and with a minimum of 30 nucleotides (nt) 5' to the matching amino acids, if not the initiator methionine, were assumed to be full-length transcripts.
Alignments and construction of trees. We aligned protein sequences using Clustal W version 1.82, and alignments were manually checked for final evaluation (47; http://www.ebi.ac.uk/clustalw). We constructed neighbor-joining trees (39) based on a gamma model using the software package MEGA version 2.1 (25; http://www.megasoftware.net) and performed bootstrap analysis on 3,000 samples to estimate the tree support.
Analysis of gene expression profiles. To evaluate changes in EST abundance among the sequenced ESTs from different libraries, we used the so-called digital Northern analysis or in silico transcriptional profiling, which can be performed by counting the number of sequenced ESTs for a given gene within the whole sequenced EST population (normalized counts), obtained from nonnormalized libraries. From a mathematical viewpoint, this in silico approach is a small-scale equivalent of the SAGE (serial analysis of gene expression) technique. As a counting sampling process, digital Northern analysis is easily modeled using basic Bayesian statistics. This model was used to build credibility intervals (error bars) for abundance results, according to Vêncio et al. (50). We chose a noninformative a priori probability density function and reported 68% credibility intervals. To group gene expression level profiles, we used the well-known hierarchical clustering algorithm with the unweighted-pair group tree-building method with arithmetic means and a correlation metric (16) using Spotfire for Functional Genomics software.
Differentially expressed transcripts were evaluated by considering their error bars. To validate the results obtained in the digital Northern analysis, classical Northern blots were carried out for eight genes selected by considering their different expression profiles and using biological replicates. Total RNA was extracted from synchronized cells at different stages of the B. emersonii developmental cycle, using the Trizol method (Life Technologies), and Northern blot analysis was performed as previously described (42). We blotted the RNA to Hybond N+ membranes (Amersham) and used as probes EcoRI/NotI fragments from sequenced library clones corresponding to the selected transcripts. We normalized the hybridization signals obtained using the bands corresponding to the 28S and 18S rRNAs, visualized in the agarose gels by ethidium bromide staining.
Nucleotide sequence accession numbers. All sequences described in this study have been submitted to the GenBank EST section and the accession numbers assigned to them were CO961503 to CO978552 and dbEST-Id 25065829 to 25082878.
| RESULTS |
|---|
|
|
|---|
|
Roughly 52% of the assembled sequences (2,535) represent novel orthologs from B. emersonii (ESTs with matches in GenBank), whereas 48% (2,338) have no putative identification. The high percentage of sequences with no match reflects the early stage of fungal genome exploration, in particular in the chytridiomycete class. Previously known protein-coding genes from B. emersonii are represented in our database, and their contribution amounts to about 0.16%. In addition, we have found one new paralog corresponding to the catalytic subunit of the cyclic AMP-dependent protein kinase and four paralogs corresponding to the heat shock protein HSP70.
Biological survey of B. emersonii ESTs. As the result of GO annotation, a total of 2,306 assembled sequences (47%) were classified in 4,861 nodes distributed among the three ontologies of the GO structure: molecular function, biological process, and cellular component (46). Hypothetical proteins have no designation within this hierarchy because they do not represent a biological entity. However, the number of hypothetical proteins (a total of 989) identified in the NCBI database dropped to nearly half (468) after they were classified in one or more ontologies. Pure hypothetical proteins, i.e., open reading frames without any evidence of being transcribed and without significant similarity with any other well-characterized expressed sequence, represented the other half, which were not considered in the GO classification. Despite the difficulty in establishing a direct comparison between the two annotations, this number probably represents the majority of the observed differences between both assignments.
About 35% (1,680) of the annotated unigenes corresponded to biological process ontology. Among the main represented nodes, 18% corresponded to nucleic acid metabolism, 21% corresponded to protein metabolism, and 20% corresponded to cell growth and/or maintenance. Transcription and RNA processing (8%), protein biosynthesis (9%), and transport (11%) were the most prevalent processes related to main nodes, respectively (see Fig. S1 in the supplemental material). Among the 20 most abundant ESTs, 15 are involved in the protein biosynthesis process (GO:0006412), 14 of which encode ribosomal proteins, including 5 misidentified according to the first best match, suggesting the importance of this overrepresentation for B. emersonii protein synthesis (Table 2).
|
|
According to rRNA sequence analysis, B. emersonii has diverged early in the fungal lineage (49). Thus, we wondered whether our results could be offering a new perspective to construct phylogenetic trees based on different protein sequences. Towards this end, we decided to look for previously selected proteins used in phylogenetic studies among our ESTs, and tested them in the construction of neighbor-joining trees (39). We selected three proteins used or indicated in the literature for which complete cDNAs were present in our libraries: elongation factor 1
, enolase, and ß-tubulin (2, 36).
Animals and fungi all share a 12-aa insertion in EF-1
and a 2-aa deletion located approximately 53 residues N-terminal to the 12-aa insertion, relative to other eukaryotes (2, 36). Our amino acid sequence alignments indicated that B. emersonii EF-1
also presents the conserved 12-aa insertion (Fig. 2A) but with an additional amino acid (13-aa insertion). Furthermore, B. emersonii EF-1
does not share the 2-aa deletion but has three different amino acid insertions of 8, 4, and 5 amino acids not found in fungi. These differences are also shared by the choanoflagellate Monosiga brevicollis EF-1
long-form sequence, the best match obtained against the nonredundant GenBank database after the alignment done with the BLASTX tool (Fig. 2A). Additionally, B. emersonii and M. brevicollis EF-1
sequences have C-terminal regions 18 to 20 amino acids shorter than those of the other sequences.
|
long form is in fact an elongation factor-like protein (EFL), which is a class of proteins very similar to but distinct from canonical EF-1
that can replace it in diverse organisms (23). Thus, it is most probable that the full-length sequence presented here is an EFL sequence. In agreement with this hypothesis, a neighbor-joining tree constructed with B. emersonii EF-1
and the sequences used by Keeling and Inagaki (23) resulted in a group containing EFL sequences which included B. emersonii EF-1
(see Fig. S2A in the supplemental material). Previous analysis of enolase sequences has revealed three small gaps (two deletions and one insertion) that join together fungi and animals, leaving plants out (2). Clustal W alignments, manually checked, indicated the previously reported conservation of the 1-aa deletion and the 2-aa insertion, except in the case of B. emersonii that showed a 4-aa insertion (Fig. 2B). We also observed two conserved 1- and 5-aa deletions, approximately 15 and 23 residues C-terminal to this insertion, which had not been reported before (Fig. 2B). Furthermore, we noticed that the last 1-aa deletion reported by Baldauf and Palmer (2) was not found in four fungal sequences (Alternaria alternata, Aspergillus oryzae, N. crassa, and Tuber borchii), whereas another 2-aa deletion (between G314 and I315 from Saccharomyces cerevisiae) was observed, except for three fungi: B. emersonii, Neocallimastix frontalis and Candida albicans (data not shown). Protein neighbor-joining phylogeny based on 21 sequences clustered plants and fungi, including B. emersonii and another chytrid, N. frontalis, with 53 and 59% of bootstrap support, respectively. Animals were resolved in a group that excluded nematodes, which branched in a second group (with 71 and 100% of bootstrap support, respectively), but all clustered closer to fungi than plants (with 62% of bootstrap support) (see Fig. S2B in the supplemental material).
We also evaluated 31 ß-tubulin sequences, among which we included the partial sequence of Allomyces macrogynus, a member of the order Blastocladiales, which is closer to B. emersonii than the other chytrids analyzed above, and the complete sequence of M. brevicollis. The distance tree was informative, resolving plant, animal, and fungal groups (95, 81, and 50% of bootstrap support, respectively) with animals and fungi in the same clade (with bootstrap confidence of 91%) (see Fig. S2C in the supplemental material).
Putative higher eukaryote receptor proteins in B. emersonii. Several B. emersonii ESTs presented best matches with different receptor proteins. Interestingly, two best hits were found with plant hormone receptors: the ethylene receptor CS-ETR2 of Cucumis sativus (gi 6136818) and the putative phytosulfokine receptor precursor of Oryza sativa (japonica cultivar group; gi 48717048). The corresponding cDNAs were sequenced from both ends and were found to be incomplete. Nevertheless, we identified the GAF domain characteristic of the ethylene receptor using PfamB and also identified the three leucine-rich repeats as well as the cysteine pair that flanks the leucine-rich repeat domain (33) characteristic of the phytosulfokine receptor using PfamA. These results constitute the first description of such putative receptors in an organism other than a plant.
Another receptor best hit was found to the mannose-6-phosphate/insulin-like growth factor II receptor (MPG/IGF2R) from Didelphis virginiana (gi 16506700), a North American opossum. The MGP/IGF2R has homologues identified in reptiles, amphibians, fish, birds, and mammals. However, no DNA sequence information concerning homologues in other animals or microorganisms exists in public databases. In B. emersonii, two incomplete cDNAs with similarity to MPG/IGF2R were found, and Pfam analysis has confirmed the presence of a cation-independent mannose-6-phosphate receptor domain.
Looking for genes with different expression profiles. We have considered nine cDNA libraries to evaluate differential gene expression, excluding the NSV library which had a PCR amplification step and whose EST normalized counts thus could not represent the real transcript abundance. Therefore, for digital Northern analysis purposes, the EST cluster assignment was carried out again without the NSV library. This new cDNA library-assembling process produced 1,897 clusters and 2,857 singlets, with a total of 4,754 putative unigenes.
Initially, we analyzed the global patterns of EST diversity by comparing the sporulation and germination stages using the distribution of clusters among different levels of abundance. We included the zoospore at the end of the sporulation stage for this analysis, since zoospore-expressed messages represent the last part of the sporulation-transcribed EST population (29). The two stages presented quite different distributions, as shown in Fig. S3 in the supplemental material, and were validated by a P value of 0.00 in the two-sample Kolmogorov-Smirnov test. The normalized histograms indicate that sporulating cells express higher EST diversity than germinating cells, as reflected by the presence of a larger number of singlets and low-abundance contigs in the libraries from sporulation and a larger number of high-abundance contigs in the libraries from germination.
As a second approach, we analyzed the expression profile of each transcript throughout the life cycle of the fungus in a time series framework. We used a Bayesian statistical model to account for sampling error in transcript abundance and produced qualitative classes (representing different reliability classes) constructing credibility intervals (error bars) for transcript abundance. Afterwards, we performed a profile hierarchical clustering by using a correlation metric to identify coexpressed transcripts and selected 109 contigs, based on previously adopted error bars, which were hierarchically reclustered (Fig. 3). A similar reassembling was carried out with the complete set of contigs, which can be assessed using the project web site (http://blasto.iq.usp.br). After this final step, we were prepared to look for putative developmentally regulated genes, i.e., assembled sequences presenting different abundance levels during specific stages of the fungal life cycle.
|
In addition, we confirmed the in silico profiles for eight genes by Northern blot analysis, as depicted in Fig. 4. A good correlation was observed between the in silico profiles and normalized Northern blot hybridization signals.
|
| DISCUSSION |
|---|
|
|
|---|
Multiple levels of protein synthesis regulation, involving both transcriptional and translational controls, seem to occur during the sporulation and germination stages of the B. emersonii life cycle (40, 41). We were able to associate the assembled expressed sequences reported in the present work with different biological processes using the GO system and found that transcription and RNA processing (8%) and protein biosynthesis (9%) were among the most prevalent biological processes. Furthermore, the most frequent ESTs corresponded to sequences encoding putative proteins directly related to the protein biosynthesis machinery (15 of the first 20). Among them, except for elongation factor 1
, all were ribosomal protein transcripts. The level of expression of EF-l
correlates with the rate of cell growth and proliferation (24). Indeed, the noteworthy expression profile of EF-1
mRNA, similar to that of ribosomal proteins, indicated its developmental regulation with increasing levels during the germination stage. Furthermore, EF-1
is always present in vast molar excess compared to the other essential components of translation, raising the possibility that EF-1
may have additional functions in the cell (27). In B. emersonii, one such additional function could be the reallocation of zoospore-stored mRNAs for their regulated translation during germination.
The relationships among fungal lineages as well as among the crown taxa are still not well resolved. Multiple molecular data have suggested that fungi and animals are sister groups forming a unique clade together with choanoflagellates (2, 36, 48). We have tested three commonly used protein sequences for the reconstruction of phylogenetic trees, EF-1
, enolase, and ß-tubulin, which were found to have been completely sequenced in our study, with the goal of doing a first reevaluation of B. emersonii branching position and a first evaluation of our data source.
The B. emersonii EF-1
amino acid sequence presented several insertions shared with the choanoflagellate M. brevicollis but not found in EF-1
from other organisms. The recently described class of eukaryotic GTPase proteins named EFL, which are similar to canonical EF-1
but clearly distinct from it and which included the M. brevicollis protein, shed light upon our observation (23). EFL and EF-1
seem to have similar functions and expression levels and tend to be mutually exclusive in distribution. Organisms that possess EFL are typically closely related to other organisms that possess EF-1
and lack EFL. In this sense, B. emersonii EF-1
reported here could be an EFL sequence and thus not useful for tree reconstruction. Furthermore, a canonical EF-1
sequence is probably absent from the fungus genome.
The trees we constructed based on enolase and ß-tubulin sequences better resolved the phylogenies in agreement with several studies, indicating animals as a sister group of fungi (2, 36, 48) and including M. brevicollis within the clade, in the case of the ß-tubulin tree. Larger data sets, many gene sequences considered together, and other methods of phylogenetic analysis would contribute to clarify these initial results.
The novel information about B. emersonii led us to confirm how much more we need to learn about this chytrid. We generated several ESTs that show similarity to receptor proteins, some of them never described for fungi. Features like that were previously reported for B. emersonii, as is the case of the P-type ATPases related to animal Na+/K+-ATPase characterized in our laboratory (5, 9). Another interesting finding from our study is an MGP/IGF2R best hit that was described for animals but not previously identified in fungi. The primary functions of MGP/IGF2R are intracellular traficking of lysosomal enzymes, and the internalization of IGF2 and other extracellular ligands to the lysosomes for degradation (20).
Other findings have revealed features of B. emersonii that are closer to those of plants than of animals. For instance, previous data indicating the association of the
and ß subunits of B. emersonii mitochondrial processing peptidase with the mitochondrial inner membrane (38), as described for plants, instead of their localization in the mitochondrial matrix, as verified in S. cerevisiae and mammals. In this context, two ESTs found in our libraries showed best matches with receptors described only in plants: the putative phytosulfokine receptor precursor and the ethylene receptor CS-ETR2. Phytosulfokine is an intercellular signal that plays a key role in cellular dedifferentiation and proliferation in plants (32), whereas ethylene is a plant hormone that regulates many aspects of plant growth and development (15).
The mRNAs of a typical somatic cell are distributed in three abundance classes: superprevalent, intermediate, and complex (3). Considering that the proportions among these three classes are presumably represented in nonnormalized cDNA libraries, we conducted an evaluation of the expression profiles of the sequenced ESTs, with the goal of determining some biologically significant expressed genes in B. emersonii. The normalized histogram of gene expression levels can give an overall idea of the organism transcriptome (26). In the case of B. emersonii, the normalized histograms of EST counts comparing the sporulation and germination stages indicated a greater diversity of transcripts during sporulation, which is in agreement with the higher mRNA turnover rate observed during this stage (21, 40).
Furthermore, we were able to identify various highly abundant clusters, which are more represented in the germination libraries, by looking at the most abundant cloned ESTs and the expression profiles corresponding to these messages and concluding that mRNAs related to the protein biosynthesis process are in part responsible for the different distribution. This is more significant when taking into account two Northern blot experiments (for a ribosomal protein and for EF-1
) that confirmed the in silico expression profiles. These patterns indicate the importance of transcriptional regulation for the protein biosynthesis machinery when the growth rate is increasing.
Conversely, we observed contigs associated with the proteolysis or peptidolysis process with correlated expression profiles during the sporulation, which is in agreement with a previous report (28). The probable biological significance for this correlation could be the necessity to provide amino acids for new protein synthesis and to remove proteins which could impair the progress of cell differentiation (28).
In addition to the expression profiles related to protein synthesis and degradation, we reported other expressive correlated patterns. For instance, histone gene expression (with a peak in zoospores) could be a marker of a particular moment during the cell cycle, as shown for histone H3 in alfalfa cells (22). Furthermore, at least 1% of all contigs considered in the digital Northern analysis with no putative identity showed relatively narrow credibility intervals and expression profiles which were hierarchically clustered using a correlation metric. The statistical identification of significant similarities between expression patterns of known and unknown genes could help in ascribing functions to these anonymous sequences.
In conclusion, the unique collection of ESTs presented here represents a valuable resource not only for studies in lower fungal biology and phylogeny but also for future microarray construction and subsequent high-throughput transcriptional studies in diverse biological contexts.
| ACKNOWLEDGMENTS |
|---|
We thank Sergio Verjovski-Almeida, Abimael A. Machado, and Milton Y. Nishiyama, Jr., for their help with data processing, Apuã C. M. Paquola for his advice with the ESTWeb package, J. Adriano A. Freitas for his assistance with the database, Tie Koide for her advice with the Spotfire software, and Rosangela Campanhã and Mara Silvestri for their help in the beginning of this work. K.F.R. and S.M.S.-I. are doctoral fellows of Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES). R.C.G. and R.Z.N.V. are doctoral fellows of FAPESP.
| FOOTNOTES |
|---|
Supplemental material for this article may be found at http://ec.asm.org/. ![]()
K.F.R. and S.M.S.-I. contributed equally to this work. ![]()
| REFERENCES |
|---|
|
|
|---|
This article has been cited by other articles:
| |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| HOME | HELP | FEEDBACK | SUBSCRIPTIONS | ARCHIVE | SEARCH | TABLE OF CONTENTS |
| Appl. Environ. Microbiol. | Infect. Immun. | J. Bacteriol. |
|---|---|---|
| Mol. Cell Biol. | Microbiol. Mol. Biol. Rev. | ALL ASM JOURNALS |