| MINIREVIEW |
Institut de Biologie Physico-Chimique, UPR 1261 CNRS, Paris, France
| INTRODUCTION |
|---|
The unicellular green alga Chlamydomonas reinhardtii appears to be a model of choice for the study of plant FKBPs and cyclophilins. Its single-cell type can be readily exposed to controlled concentrations of immunosuppressive drugs, and a powerful genetic system has been developed by decades of work on photosynthesis, organelle biogenesis, flagellar function, and other basic cellular processes (18). A Chlamydomonas cytosolic cyclophilin has been identified and shown to be induced by low-CO2 conditions (32). This gene is repressed during sulfur starvation in a SAC1-independent manner, together with two chloroplast cyclophilins (37), suggesting a link with cell growth. In addition, a homologue of the FKBP12-interacting protein AtFIP37 (8) is encoded in the Chlamydomonas genome. A comprehensive description of Chlamydomonas immunophilins thus appears desirable.
With the recent release by the Joint Genome Institute (JGI) of a draft nuclear genome sequence, Chlamydomonas has fully entered the genomics era (16). Among the primary goals of genomics, and one of its toughest challenges, is the comprehensive description of gene content. To delineate transcripts for protein-coding genes along the genome, the Joint Genome Institute has used a variety of algorithms relying either mostly on homology (Genewise [2]) or on coding capacity (greenGenie [24]) or expression signals (FgeneSH [31]). For each locus, the preferred model is chosen and refined, and its untranslated regions (UTR) are determined, making use of expressed sequence tag (EST) data.
In the context of a draft sequence, such as version 2.0 of the Chlamydomonas genome, it is expected that the accuracy of gene prediction will be limited by a variety of factors, including but not limited to the following. (i) Incomplete coverage of the genome: 2 to 5% of ESTs, depending on libraries, do not map onto the genome (O. Vallon and C. Hauser, unpublished results), an indication of the proportion of genes that have not yet been hit by genome sequencing. (ii) Sequence gaps within or at the ends of genes, hiding some of the information necessary to predict the gene correctly. Wisely, the programs have been allowed to build models across sequence gaps, even to incorporate them within an exon. While allowing a better coverage, this will inevitably result in ill-predicted gene structures, fusion of neighboring genes, and other problems. (iii) Assembly artifacts, which are difficult to avoid in a whole-genome shotgun sequencing approach. Repeated sequences are an obvious source of such artifacts, as are chimeric DNA clones. This can result in various fragments of a gene being found in different scaffolds. (iv) Limitations in the algorithms themselves: the programs use regular properties of transcribed sequences, of transcription, splicing and termination signals, etc., which, although established statistically and tested rigorously, may not always apply. A case in point is alternative splicing, whereby the molecular machinery of splicing interprets in multiple ways the sequence information in the pre-mRNA, whereas gene prediction programs will only choose the most likely intron/exon structure.
Thus, the Chlamydomonas genomics project, just like any other, must at some point face the question of the reliability and completeness of its gene model data set. This is crucial, since this data set is to serve as a basis for most of the postgenomic analysis. In Drosophila, a large-scale experiment has been devised to confront gene prediction programs and experimental approaches (1), based on high-resolution gene mapping in a well-known region of the genome. In Chlamydomonas, the early sequencing of a large stretch of genomic DNA has allowed benchmarking of greenGenie and a preliminary assessment of gene content (24).
It is beyond the scope of this paper to provide a complete analysis of a particular fragment of the genome. Rather, I will try to describe the Chlamydomonas instantiation of two well-known medium-size gene families, cyclophilins and FK506-binding proteins. Both show a high degree of sequence conservation across phyla, so that simple BLAST searches are expected to provide an exhaustive identification of all family members in Chlamydomonas. By comparing Chlamydomonas immunophilins with those of vascular plants, we can hope to identify which isoforms could be involved in specific aspects of signal transduction and development, inasmuch as they will differ between a multicellular organism and a unicellular organism. We can also shed light on the evolution of gene families with isoforms directed to many intracellular compartments.
The aim of this paper is therefore threefold: to describe Chlamydomonas immunophilins and parvulins, an important class of proteins that can become the subject of experimentation with this microbe; to analyze phylogenetic relationships between family members, in particular identifying early and late gene duplication events that have given rise to the present-day diversity; and to examine the validity of Chlamydomonas gene models whenever possible by comparison either with Chlamydomonas ESTs or with sequences of orthologues in other organisms. This last perspective, although in no way a quantitative assessment of gene prediction in Chlamydomonas, can help identify common artifacts in the current genomic data set. It can thus serve as a guide for those who want to use this information in the study of their favorite genes. Our hope is that it can also help improve gene models in future versions of the Chlamydomonas genome.
| METHODOLOGY |
|---|
Phylogenetic trees were built using the optimized alignments after trimming to the conserved domain, i.e., excluding the N-terminal targeting peptide and the unique domains. This was judged preferable, since the part of the alignment covering the N-terminal transit peptides (TPs) and additional domains was not meaningful. The neighbor-joining method, run at http://bioweb.pasteur.fr/seqanal/interfaces/clustalw.html#trees, was used with Kimura's correction and bootstrapping (n = 1,000). Prediction of intracellular localization made use of TargetP (http://www.cbs.dtu.dk/services/TargetP/). Note that Chlamydomonas chloroplast TPs are different from those of higher plants (11). Thus, a predicted chloroplast or mitochondrial location was taken only as an indication of targeting to one or the other of these organelles.
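As an illustration of the distance correction mentioned above, the following minimal Python sketch computes an uncorrected p-distance between two aligned protein fragments and then applies Kimura's empirical correction, d = -ln(1 - p - p²/5). It assumes that this is the form of the correction applied to protein alignments by the ClustalW tree option; the function names and the two toy sequences are hypothetical and are not taken from the alignments used in this paper.

```python
import math


def p_distance(seq_a: str, seq_b: str) -> float:
    """Fraction of differing residues, counted over gap-free aligned columns."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no gap-free columns to compare")
    return sum(a != b for a, b in pairs) / len(pairs)


def kimura_protein_distance(p: float) -> float:
    """Kimura's empirical correction for multiple hits: -ln(1 - p - p^2/5)."""
    inner = 1.0 - p - 0.2 * p * p
    if inner <= 0:
        raise ValueError("sequences too divergent for this correction")
    return -math.log(inner)


# Toy aligned fragments (hypothetical, for illustration only).
toy_a = "GKVVFDTSWE"
toy_b = "GKIVFDSS-E"

p = p_distance(toy_a, toy_b)        # 2 differences over 9 gap-free columns
d = kimura_protein_distance(p)      # corrected distance, always >= p
print(f"p-distance = {p:.4f}, Kimura distance = {d:.4f}")
```

The corrected distance grows faster than the observed fraction of differences, which is why the correction matters most for the divergent family members compared here.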
| DIVERSITY OF IMMUNOPHILINS |
|---|
| FK506-BINDING PROTEINS |
|---|
Other single-domain FKBPs include the three isoforms putatively directed to the secretory pathway, FKB15-1, -15-2, and -15-4. They are closely related in sequence, and all linked on scaffold 48; thus, they probably arose from recent gene duplications, obviously distinct from the duplication that gave rise to AtFKBP15-1 and -15-2 in Arabidopsis (19). As a group, secretory pathway FKBPs are characterized by the presence of two conserved Cys residues, already noted in human and yeast FKBP13 (21): they form a disulfide bridge stabilizing the loop crossing region in this particular environment. Interestingly, while the Arabidopsis proteins have C-terminal signals (KNEL and NDEL) that may retain them in the endoplasmic reticulum (ER), as is the case for human FKBP13, the Chlamydomonas proteins lack such signals or the recently proposed CVLF signal (36). They may be secreted, and operate in the cell wall compartment. This is also true of the Chlamydomonas cyclophilins, which raises the question of which protein, if any, is responsible for PPIase activity in the ER lumen.
Eleven FKBPs in Arabidopsis have been shown or predicted to be targeted to the thylakoid lumen and suggested to have common ancestry (19). Based on high sequence conservation with the Arabidopsis orthologue and on the presence of a putative bipartite transit sequence, 11 Chlamydomonas FKBPs can be predicted to localize to the thylakoid lumen as well: they have been called FKB16-1, -16-2, -16-3, -16-4, -16-6, -16-8, -16-9, -17-1, -18, -19, and -20-2. Thus, diversity of lumen-targeted FKBPs is probably an ancient trait in the green lineage. This is in marked contrast with the red algae: only one C. merolae FKBP (CMT472C) branches together with the thylakoid lumen FKBPs of green organisms. I note, though, that the diversification of lumen-targeted FKBPs must have continued after the separation of algae and plants: FKB16-5 is found in tandem with, and is extremely similar to, FKB16-2, while FKB16-1 has FKB16-6 as its closest relative, not AtFKBP16-1. Symmetrically, AtFKBP17-2 and -17-3 are also more closely related to one another than to Chlamydomonas FKB17-2. Strictly speaking, unambiguous orthology can be claimed only between FKB16-3, -16-4, -18, -19, and -20-2 and the Arabidopsis genes with the same numbering.
Like their Arabidopsis counterparts (19), all these proteins show a twin-arginine motif typical of proteins translocated via the TAT pathway (two Arg residues followed by a hydrophobic stretch; see supplemental Fig. S1). In general, it is followed by a transit peptidase cleavage site in the form Ala-Xaa-Ala, indicating that the proteins are soluble in the lumen. Why are all the lumenal FKBPs transported by the TAT pathway, which is believed to transport proteins in the folded state? It could be because folding of small FKBPs is a rapid process, occurring before they can be presented to the translocation apparatus. Alternatively, it could be related to the binding of a specific effector, similar to FK506, in the chloroplast stroma, so that the binary complex would be the transported entity. In any event, the question remains of why so many different, sometimes extremely well-conserved FKBP-type PPIases localize to a compartment that harbors only a small fraction of the proteome. I note that among these lumenal FKBPs, only FKB16-2, like the cognate AtFKBP16-2 and AtFKBP13, shows a good conservation of the residues involved in PPIase activity (Table 2). The suggestion that AtFKBP20-2 (with only two critical residues conserved) is involved in isomerization of a critical Pro residue in LHCII (29) may need to be reexamined.
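The motif described above (two Arg residues, a hydrophobic stretch, then an Ala-Xaa-Ala cleavage site) can be sketched as a simple pattern scan. This is only a rough illustration and not the procedure used for the analysis in this paper: the hydrophobic alphabet, the spacer length, the minimum length of the hydrophobic run, the search window for the cleavage site, and both toy sequences are assumptions chosen for the example.

```python
import re

# Assumed, deliberately generous set of residues counted as hydrophobic.
HYDROPHOBIC = "AVLIMFWGC"

# Twin arginines, a short spacer (assumed 0-3 residues), then a hydrophobic
# run of at least 8 residues (assumed threshold).
TAT_PATTERN = re.compile("RR" + ".{0,3}" + "[" + HYDROPHOBIC + "]{8,}")
AXA_PATTERN = re.compile("A.A")


def find_tat_signal(protein: str, axa_window: int = 15):
    """Return positions of the RR motif and of a downstream A-x-A site, if any."""
    match = TAT_PATTERN.search(protein)
    if match is None:
        return None
    # Look for the cleavage site in a short window after the hydrophobic run.
    tail = protein[match.end(): match.end() + axa_window]
    axa = AXA_PATTERN.search(tail)
    return {
        "rr_position": match.start(),
        "hydrophobic_end": match.end(),
        "axa_position": None if axa is None else match.end() + axa.start(),
    }


# Hypothetical N-terminal sequences, for illustration only.
toy_lumenal = "MASTSRRDLLAGAVAALLLGAPAQAAELK"   # RR + hydrophobic run + AQA
toy_anchored = "MASTSRRDLLAGAVAALLLGAPKQSEELK"  # same, but no A-x-A site

print(find_tat_signal(toy_lumenal))
print(find_tat_signal(toy_anchored))
```

A hit with `axa_position` set corresponds to the soluble lumenal case; a hit without it corresponds to a twin-arginine signal lacking the AXA site.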
An interesting case is that of FKB17-2, which also appears to be targeted to an organelle and shows a twin-arginine signal, but where the AXA signal peptidase cleavage site is absent. Interestingly, the entire sequence following the two Arg residues is extremely well conserved between FKB17-2 and its two orthologues, AtFKBP17-2 and AtFKBP17-3 (see supplemental Fig. S1), which are predicted to localize to the thylakoid lumen but which also lack a cleavage site. Since sequence conservation in signal peptides is in general very low, this leads us to propose that this region is part of the mature protein. It may constitute a transmembrane helix spanning the thylakoid membrane, similar to that observed in another membrane-anchored TAT pathway substrate, the Rieske protein (10).
In terms of localization, FKB16-2 presents an interesting puzzle. Three distinct splicing variants are documented in the EST data. The main isoform, FKB16-2A, like the Arabidopsis orthologue AtFKBP16-2, has an organellar TP and an RR motif with a cleavage site and hence is probably directed to the thylakoid lumen. But alternative splicing generates another isoform, FKB16-2C, with a deletion of the RR motif. This protein would thus be predicted to reside in the chloroplast stroma. And yet another one, FKB16-2B, has a slightly different N-terminal sequence that could direct it to another location, possibly the mitochondrion.
I note that the N-terminal targeting sequence of FKB16-5 differs markedly from that of its closely related paralogue FKB16-2: it is predicted to be an organellar TP but does not contain a hydrophobic stretch after the two arginines, so that the protein would be predicted to be retained in the chloroplast stroma or in the mitochondrial matrix. In Arabidopsis, no FKBP is predicted to localize to the mitochondrion, where rotamase activity is carried out by two cyclophilins. Since Chlamydomonas has no orthologue for these two mitochondrial cyclophilins (see below), it is tempting to speculate that rotamase activity in the Chlamydomonas mitochondrion is carried out by FKBPs. It could be carried out by FKB16-5 and/or FKB16-2B, which both show a decent conservation of the residues important for rotamase activity (Table 2).
In addition to these simple FKBPs, a series of complex FKBPs can be found in the plant genomes, which combine an FKBP domain with a TPR domain formed of three tetratricopeptide repeats (TPRs). The latter domain is generally involved in protein-protein interactions, in particular as a binding domain for HSP90 chaperones (26). These proteins are predicted to reside either in the cytosol or in the nucleus. Overall, their function is poorly understood, but they may play an important role in signal transduction: mutation of AtFKBP72, also known as Pasticcino 1, leads to ectopic cell proliferation (35), while that of AtFKBP42 causes a twisted dwarf phenotype (20). In Chlamydomonas, three proteins are found to combine FKBP and TPR domains. C_680056 (1,785 residues) has been named FKB42 on the basis of the similarity of its N-terminal 350 residues to the sequence of AtFKBP42: a single FKBP domain, three TPR repeats, a calmodulin-binding site, and a C-terminal membrane-anchoring domain (20). This combination is also present in human (FKBP38) and C. merolae (CMH207C) and may thus be an ancient eukaryotic trait. In addition, C_680056 comprises an unknown domain, two PQQ domains (WD40-like repeats; COG1520), and a C-terminal Leu-rich repeat, making this protein arguably one of the most complex encoded by the Chlamydomonas genome (note that we cannot rule out that the model fuses two neighboring genes). Chlamydomonas FKB62 (split between two scaffolds) contains at least two, probably three, FKBP domains in tandem, followed by a TPR domain. This is similar to the closely related Arabidopsis proteins AtFKBP62 and AtFKBP65 (which probably arose recently from the same large duplication that generated AtFKBP15-1 and -15-2). Finally, C_280158 (2,437 residues) shows a hydrophobic domain with three probable transmembrane helices, followed by one (possibly two) FKBP domain and a TPR domain, plus a calmodulin-binding motif and a nuclear localization signal. This is in part similar to AtFKBP62, -65, and -72. Unfortunately, C_280158 suffers from sequence gaps and possible gene fusion, so that its relationship to Arabidopsis FKBPs is not clear. I give this gene the provisional name FKB99.
Another type of complex FKBP is represented by FKB53, with its negatively charged N-terminal domain (45% E/D; theoretical pI = 3.45 over the first 131 amino acids). It is very close to AtFKBP53 (17), in which the N-terminal domain contains both acidic and basic residues (23.3% E/D, 15% R/K; pI = 4.45). AtFKBP53 and the related AtFKBP43 have been proposed to interact with DNA via their Arg/Lys-rich domain, but it is unclear how this could fit with the negative charge on the Chlamydomonas orthologue.
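The composition figures quoted above (percentage of E/D and of R/K over the N-terminal domain) are simple residue counts. A minimal sketch of that calculation follows; the function names and the toy sequence are hypothetical, the 131-residue default merely mirrors the window quoted for FKB53, and the theoretical pI is not computed here because it requires a set of pKa values that the text does not specify.

```python
def residue_fraction(segment: str, residues: str) -> float:
    """Fraction of positions in `segment` occupied by any of `residues`."""
    if not segment:
        raise ValueError("segment must not be empty")
    return sum(aa in residues for aa in segment.upper()) / len(segment)


def charge_profile(protein: str, n_term_length: int = 131) -> dict:
    """Acidic (D/E) and basic (R/K) fractions over the N-terminal window."""
    domain = protein[:n_term_length]
    return {
        "acidic": residue_fraction(domain, "DE"),
        "basic": residue_fraction(domain, "RK"),
    }


# Hypothetical acidic stretch, for illustration only (20 residues).
toy_domain = "MEEDDEEKAEDDEESDEEDL"
profile = charge_profile(toy_domain, n_term_length=20)
print(f"E/D = {profile['acidic']:.0%}, R/K = {profile['basic']:.0%}")
```

Comparing the two fractions side by side makes the contrast between a purely acidic domain and a mixed acidic and basic one immediately visible.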
Finally, the most divergent FKBP is trigger factor, a PPIase and chaperone associated with the ribosome and involved in the early steps of protein folding (9). In Chlamydomonas, EST data are consistent with a single gene, which I call TIG1, represented by two overlapping gene models. Phylogenetic analysis (data not shown) indicates that the trigger factors of Chlamydomonas and Arabidopsis are related to that of Synechocystis rather than to that of Rickettsia and other Proteobacteria, believed to be close to the ancestor of mitochondria. The Chlamydomonas trigger factor is probably directed to the chloroplast, and its gene clearly descends directly from the trigger factor gene of the cyanobacterial endosymbiont. Similarly, none of the Chlamydomonas or Arabidopsis FKBPs appeared to be related to that of Rickettsia, suggesting a complete loss of any FKBP that could have been present in the early mitochondrial endosymbiont.
| CYCLOPHILINS |
|---|
Chlamydomonas proteins were aligned with those of Arabidopsis, Synechocystis, and C. merolae, plus human cyclophilin A for structural comparison (see supplemental Fig. S2), and the alignment was used to generate a phylogenetic tree (Fig. 2). Here again, unambiguous one-to-one orthology could often be observed between Arabidopsis and Chlamydomonas proteins. Chlamydomonas genes were named based on the closest Arabidopsis homologue except that the root CYN was used (CYP being reserved for cytochrome P450). The residues implicated in PPIase activity (38) are fully conserved in only a fraction of the cyclophilins analyzed (Table 3). This does not necessarily mean that these proteins are not enzymatically active, since only a few substitutions have been tested. Only nine of the Chlamydomonas proteins show conservation of the W121 residue in helix II that is crucial for cyclosporine binding, independently of PPIase activity, and orthologues are generally consistent at that position (except AtCYP18-2/CYN18-2).
Another group of cyclophilins shows complex orthology relationships. CYN20-1 is related to AtCYP20-1, -19-4, and -21-2, all clearly directed to the secretory pathway. CYN20-5 is similar to these proteins, but it has a long N-terminal extension that could direct it to an organelle. Note that this branch is separate from that which harbors CYN23 and AtCYP23, also unambiguously directed to the secretory pathway but characterized by an insertion after helix II. This confirms the hypothesis that plant ER cyclophilins are polyphyletic (4). None show an ER retention signal, suggesting that they are secreted to the periplasm. Note that no Cyanidioschyzon cyclophilin branches in either of these clades. The only PPIase in this genome with anything approaching a potential ER-targeting signal is the cyclophilin CMH263C.
In several clades, univocal orthology and concordant N-terminal sequences leave no doubt as to the final location of the protein. Thus, CYN26-2 and CYN28, like their respective orthologues and the red algal CMP271C, appear targeted to the thylakoid lumen. They have insertions between β-strands 5 and 6 and after helix II. Extended loops (this time between strand 2 and helix 1 and after strand 4) are also found in the group formed by CYN37, CYN38, and the related Arabidopsis proteins. Two Synechocystis cyclophilins are found at the root of each branch, indicating an ancient diversification inherited from the cyanobacterial endosymbiont. AtCYP38 (TLP40) is one of the most extensively studied cyclophilins of higher plants (13, 33) and has been shown to be a lumenal protein. This is also probably true of CYN38 and of the two related cyanobacterial cyclophilins. The localization of the related CYN37 remains uncertain, since it does not show a convincing organelle targeting sequence, in contrast to its orthologue AtCYP37, and its N-terminal domain is truncated. The putative leucine zipper in the N-terminal domain of AtCYP38 is not conserved in this group of related sequences, and the role of this entire domain is unknown.
Branching close to this clade are the two mitochondrion-targeted proteins AtCYP21-3 and -21-4. They do not have orthologues either in the red alga or in Chlamydomonas, which suggests a recent origin. As mentioned above, the question of which protein carries out PPIase activity in this organelle in algae remains open. Several Chlamydomonas cyclophilins, like CYN16 and CYN17, have no orthologue in Arabidopsis, but they lack an N-terminal extension that could direct them to an organelle.
Complex cyclophilins appear in several distinct branches of the phylogenetic tree. CYN59, like AtCYP59, has an RRM domain involved in RNA binding but lacks the Zn finger. Its C terminus is rich in Arg and Gly residues and may be homologous to the Arg/Lys-rich domain of the Arabidopsis protein. CYN65 is entirely orthologous to the cytosolic AtCYP65, with its N-terminal U box (modified RING Zn finger). CYN57 is orthologous to AtCYP57 and probably also nucleus located. The sequence of CYN71 is incomplete, but it shares with AtCYP71 an N-terminal domain of unknown function. As a group, these complex cyclophilins form a clade with CYN18-1 and -18-2 and their Arabidopsis orthologues, with which they share a compact structure of the cyclophilin domain with short loops. The common ancestor of green algae and land plants probably showed a variety of complex cyclophilins. No cyanobacterial or red algal cyclophilins are found in this group, suggesting that it appeared after the green and red lineage separated.
Of particular interest are three complex Chlamydomonas cyclophilins with no orthologues in Arabidopsis. CYN52 and -53 have two cyclophilin domains in tandem, a feature not hitherto found in any other organism. Phylogenetic analysis (data not shown) shows that internal duplication predated gene duplication, since the N- and C-terminal cyclophilin domains are more similar from one gene to the other than to each other. As is often found in Chlamydomonas, these closely related genes are found next to one another on the genome. The related CYN51 has only one cyclophilin domain and thus appears closer to the ancestor. It shares with CYN53 a new type of domain, also found in higher plants (for example, At4g33380 and At4g17070). This domain has apparently been lost in CYN52. Interestingly, while CYN53 appears directed to the chloroplast stroma or mitochondrial matrix, the N-terminal sequence of CYN51 has typical features of a dual targeting sequence, suggesting that the protein could end up in the thylakoid lumen. CYN52, in contrast, is unambiguously directed to the secretory pathway (TargetP score of 0.951). Clearly, this subfamily of cyclophilins deserves further study.
CYN40 is another type of complex cyclophilin, with a C-terminal TPR domain. It is probably cytosolic, like its Arabidopsis orthologue, AtCYP40, and so are the related simple cyclophilins CYN22 and AtCYP22. Also in this group are the organelle-targeted CYN20-2 and CYN20-3. They received their names from AtCYP20-2 and -20-3, but this is based more on their putative localization than on sequence similarities. Based on the presence of a potential thylakoid transfer sequence in CYN20-2, I propose that it is directed to the thylakoid lumen, whereas CYN20-3 would be a stromal protein.
Several Arabidopsis cyclophilins have no orthologue in Chlamydomonas: AtCYP21-1, AtCYP26-1, and AtCYP95. While the last is presumably nucleus located, AtCYP26-1 is predicted to be membrane anchored. It is expressed only in flowers (19), so it may function in a developmental pathway specific to spermatophytes. I note that no cDNA sequence is available for this gene and that no other plant has the hydrophobic stretch predicted at the C terminus of the Arabidopsis protein, so that its membrane anchoring may need to be checked.
| PARVULINS |
|---|
| ASSESSMENT OF GENE MODELS |
|---|
Alternative splicing was found in four genes. For FKB16-2 and CYN17, the isoform described by the model was the one most represented in the EST database, but this was not true for FKB16-7 and CYN23. For 10 genes, the sequence was corrected based on EST data. For example, I found several cases where the 3' or 5' UTR was incorrectly predicted due to faulty interpretation of EST data (in general because of overlapping genes). For several genes, internal exons were ill predicted, and I always verified that the EST data gave a protein with a better alignment to the other family members. Thus, gene modeling could be improved by placing more emphasis on concordance with EST contigs. In other cases, sequence correction was possible because the EST data bridged a gap in the nucleotide sequence. The missing sequence was sometimes found by BLAST in the unplaced reads, not used in the assembly, suggesting a possible use of EST contigs to guide gap closure.
Sometimes, even when the genomic sequence was complete and no EST data were available, I proposed to change the gene models in order to restore good alignment of the protein products. This implied extending the 5' end (CYN18-1 and CYN19-1) or changing the intron-exon boundaries. For example, I could add a fourth exon to CYN16 simply by using a noncanonical splice site. In the absence of experimental data, I cannot ascertain that my proposals are valid: the gene models could be right, and the genes could either be divergent at that position or be pseudogenes in the making. Still, I feel that there is room for improvement in the computation of gene models, and my bias would be to make heavier use of homology-based modeling.
Several cases were found where the genome sequence is probably erroneously assembled. This was usually evidenced as one arm of a small scaffold being repeated in another scaffold, next to a gap (possibly due to a chimeric DNA clone). Thus, three genes were split between two gene models on different scaffolds. For FKB16-4, I found that the gene sequence was partly repeated on another, small scaffold, but this did not affect the model. For CYN19-1, adding a C nucleotide at position 543109 of scaffold_25 changed the reading frame in such a way that use of the next canonical 5' intron splice site was possible and full conservation with the Arabidopsis orthologue was achieved. Sequencing errors are predicted to appear at fewer than 1/10,000 positions in the sequence; this could be one of them. Finally, two strange cases were found of a "bug" in the prediction. In C_290072 (CYN20-1), the sixth exon is presented as starting at position +2 with respect to the exon that can be deduced from EST data or predicted using the canonical 3' splice site. This introduces a frameshift that throws off the alignment. In C_530020 (PIN4), the 4-nucleotide-long fifth exon obeys no consensus and probably also results from a computation error.
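The effect of a single added nucleotide, or of an exon boundary placed a couple of bases off, on the reading frame can be shown with a toy example. The sketch below uses a hypothetical 18-nucleotide sequence, not the actual sequence at scaffold_25, and hypothetical helper names; it only splits the sequence into codons before and after the edit, so that the shift in every downstream codon is visible.

```python
def codons(seq: str, frame: int = 0) -> list:
    """Split a nucleotide sequence into complete codons, starting at `frame`."""
    seq = seq[frame:]
    usable = len(seq) - len(seq) % 3  # drop any incomplete trailing codon
    return [seq[i:i + 3] for i in range(0, usable, 3)]


def insert_base(seq: str, position: int, base: str) -> str:
    """Return `seq` with `base` inserted before the 0-based `position`."""
    return seq[:position] + base + seq[position:]


# Hypothetical coding fragment, for illustration only.
original = "ATGGCTGAAGTTCTGAAA"
edited = insert_base(original, 6, "C")  # one added C after the second codon

print(codons(original))  # in-frame codons of the original fragment
print(codons(edited))    # first two codons unchanged, all later ones shifted
```

Every codon downstream of the insertion changes, which is why a single-base correction can either destroy or, as in the CYN19-1 case, restore an alignment with the orthologue.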
| CONCLUSIONS |
|---|
| ACKNOWLEDGMENTS |
|---|
| FOOTNOTES |
|---|
Supplemental material for this article may be found at http://ec.asm.org/.
| REFERENCES |
|---|