1 Introduction

The intein protein family is part of the Hint superfamily, named after the characteristic structural fold first identified in Hedgehog and intein protein domains (Hall et al. 1997). Four characterized Hint domain families are currently known: Hog-Hint, intein, and two types of bacterial intein-like (BIL) domains (Fig. 1). In addition to sharing the same structural fold and common sequence features, Hint domains have similar biochemical activities. The domains post-translationally process the proteins in which they are present by protein-splicing, self-cleavage or ligation activities (Paulus 2000; Dassa et al. 2004a).

B. Dassa, S. Pietrokovski

Department of Molecular Genetics, Weizmann Institute of Science, Rehovot 76100, Israel

Nucleic Acids and Molecular Biology, Vol. 16 Marlene Belfort et al. (Eds.) Homing Endonucleases and Inteins © Springer-Verlag Berlin Heidelberg 2005

Fig. 1. Evolution of Hint domains. The progenitor Hint domain evolved into at least four currently known protein families: intein, Hog-Hint, A-type BIL and B-type BIL domains. The progenitor domain itself probably emerged by fusion of two duplicated and symmetrical subdomains. Inteins acquire and lose EN and DNA-binding domains during their evolution and some have irreversibly split. More information on Hint domain families can be found at http://bioinfo.weizmann.ac.il/~pietro/Hints


Hint domains are 130 to 160 amino acids long and share 4-6 conserved sequence motifs (Amitai et al. 2003). The Hint protein fold is a compact, relatively flat, symmetrical structure, mainly composed of β-strands, with its N- and C-termini close together (Duan et al. 1997; Hall et al. 1997). Inteins usually include additional homing-endonuclease (EN) and DNA-binding domains, which are not necessary for protein-splicing but mediate the homing of the intein gene.

Species from all three domains of life, Eukarya, Bacteria and Archaea, include Hint domains. While inteins are apparently limited to unicellular eukaryotes and prokaryotes, Hog-Hint domains are present in multicellular animals. BIL domains and inteins overlap in their phylogenetic distributions: both are present in bacteria, but in different types of proteins.

Data gathered on Hint domains since their discovery in 1990 have identified their biochemical activity, genetics and evolution. Apparently, each Hint family has its own distinct biological role. Nevertheless, many facets of Hint domains are still unknown or debated.

This chapter describes each of the known Hint families, and evidence for the existence of other Hint families. The data available for each family are used to illuminate its evolution. We then discuss hypotheses for the evolutionary history of the Hint families, the possible activities and biological roles of the progenitor Hint domains, and how these progenitor domains themselves originated.

2 Hint Domain Families

2.1 Inteins

Inteins are selfish genetic elements. They are not known to confer any advantage to their host proteins and species. Inteins survive (1) by having a negligible impact on the fitness of their host genes, because they efficiently remove themselves from the precursor proteins; (2) by being integrated in conserved positions of essential genes, where they are difficult to remove without compromising their host genes; and (3) by frequently being able to mediate the insertion of their gene into unoccupied integration points (homing; Belfort and Roberts 1997), counteracting removal of their genes. This survival strategy can account for the phylogenetic distribution of inteins: they are found across all known types of unicellular organisms, but in a highly sporadic manner.

2.1.1 Inteins Include Different Domains

All inteins include a protein-splicing Hint domain. Efficient self-catalyzed protein-splicing minimizes the intein's effect on its host protein, and hence on its host cell. Proper folding of the Hint domain seems to be all that is required for accurate splicing of the intein flanks (exteins); no additional proteins or energy sources are needed. Protein-splicing allows a gene with an intein to function with no apparent difference from its intein-less version. Nevertheless, because it does not contribute to the fitness of the host gene, the intein gene is liable eventually to be lost through genomic events that remove it. However, incomplete deletions and changes in the intein gene that abolish or reduce its protein-splicing activity will leave its host protein with an "inert" inserted domain. Removal of the intein gene together with some flanking sequences will also be deleterious. This is probably the major selective force behind the presence of inteins in conserved sequence motifs of essential proteins, since most changes in such motifs will harm the cells. Intein genes are thus most difficult to remove from conserved protein-coding regions, and these regions form genomic niches for their survival (Pietrokovski 2001).

Many inteins have homing-endonuclease and DNA-binding domains integrated in their Hint domain. Most of these EN domains are of the LAGLIDADG type, and a few are of the HNH type (Hirata et al. 1990; Pietrokovski 1994; Dalgaard et al. 1997). Several of the remaining inteins seem to have lost their EN domains, and inactivating deletions and mutations are also noticeable in several EN domains. Inteins thus apparently undergo cycles of EN-activity gain and loss. EN domains are not necessary for protein-splicing and are also found in other proteins. Since inteins can survive just by efficient protein-splicing, it is most likely that the earliest intein consisted only of a Hint domain. Current inteins with no EN domains either lost or never acquired them.

2.1.2 Intein Protein Hosts and Insertion Points

Inteins are present in ancient protein types, ones that are found in prokaryotes and eukaryotes or in bacteria and archaea (Pietrokovski 2001). Most of these are also essential proteins (e.g., Table 1). A specific explanation for the abundance of inteins in proteins which are involved in DNA and RNA metabolism was previously suggested (Liu 2000). Liu noted that these types of host proteins can enhance the repair of DNA lesions generated by intein EN domains during their homing process and possible non-specific activity.

Inteins are often present at the same insertion points in homologous genes (Table 1). Such inteins are particularly similar to each other and are termed alleles. Some proteins include more than one intein, but these are no more related to each other than to other inteins (Perler et al. 1997). The relatively high similarity between intein alleles is attributed to customary vertical gene transmission during speciation events and to homing of intein genes between strains and species. Non-allelic inteins have only weak sequence similarity to each other. Trees created from intein multiple sequence alignments are thus characterized by very deep branches between many inteins, indicating distant uncertain relations, and by clusters of intein alleles.

Non-allelic intein insertion points are each conserved to a different degree and share only one common sequence feature: the residue C-terminal to the intein (+1 residue) is typically Cys, Ser or Thr. Its side chain is crucial for ligating the exteins in canonical intein protein-splicing. However, some inteins use alternative protein-splicing mechanisms (Southworth et al. 2000; Amitai et al. 2004; Mills et al. 2004).

2.1.3 Inteins Are Sporadically Distributed

Inteins are sporadically present in diverse bacteria, archaea, fungi, phages and viruses (see http://bioinfo.weizmann.ac.il/~pietro/inteins for updated details). Examination of the few hundred fully and almost fully sequenced genomes of microbes and higher organisms suggests that inteins are absent from multicellular organisms (higher plants and animals), and that closely related species and orthologous proteins often have different inteins (Table 1).

Inteins are thus sporadically present in conserved motifs of essential proteins across diverse organisms. This could result from independent loss of inteins in different lineages (Pietrokovski 2001) and from horizontal interspecies transfer of inteins (Gogarten et al. 2002). While the two hypotheses are not mutually exclusive, we believe the first is the main explanation for the current intein distribution.

Horizontal transfer readily accounts for the discontinuous intein dispersion at corresponding insertion points (e.g. Fsihi et al. 1996; Saves et al. 2000; Isabelle et al. 2001; Koufopanou et al. 2002; Okuda et al. 2003), since inteins have an established homing mechanism. However, there are no known specific mechanisms for intein gene "invasion" into heterologous insertion points. Intein homing ability is irrelevant for this process, because homing relies on specific sequence similarity between the intein gene flanks and the insertion point. Instead, the constraints on loss of intein genes from conserved motifs of essential genes are relevant to their establishment at such new points: the ends of the copied region must correspond exactly to the intein gene, neither including additional flanks nor missing the intein ends. One property that greatly facilitates such exact copying is the occurrence of the mobile gene region by itself as a free molecule. This is found in introns, as spliced-out RNA molecules, but not in inteins, which are removed at the protein level.

Independent loss of inteins from different lineages assumes that while at present each species only has a fraction of intein insertion points occupied, past species had inteins in all insertion points. Inteins were thus more common in the past, and we need to account for their gradual loss. Either inteins had some advantage for the cells that was lost, or new mechanisms selecting against inteins arose. Possible lost advantages include control of protein activity and combinatorial trans-protein-splicing (see Sect. 3). These functions could be made redundant with the rise of current transcriptional and post-translational control mechanisms, and of larger genomes affording the direct coding of diverse proteins. New mechanisms selecting against inteins might have included improved barriers against invasion of foreign DNA into the genome, recognition and elimination of cells with invaded genomes, and additional repair mechanisms of DNA lesions. All of these would make positions that lost their inteins less likely to be reinvaded.

Table 1. Intein distribution in Archaea


2.1.4 A Non-selfish Intein-Derived Protein

No example of inteins with a beneficial cellular role is known. However, one intein-derived protein is crucial for an important cellular process. Mating-type switches in Saccharomyces yeasts are initiated by a specific double-strand cleavage of the MAT locus by the HO endonuclease (Haber 1998). The full switching system evolved gradually and is present only in Saccharomyces sensu stricto and related yeasts. The final step in the development of the system was the acquisition of the HO endonuclease (Butler et al. 2004). HO is derived from a VDE intein present in Saccharomyces yeasts, as shown by sequence similarity and phylogenetic analysis (Hirata et al. 1990; Pietrokovski 1994; Dalgaard et al. 1997). HO includes the VDE protein-splicing, homing-endonuclease and DNA-binding domains. It has no N-terminal flanking region upstream of the protein-splicing domain and includes a C-terminal Zn-finger DNA-binding domain. It is not known why the protein-splicing domain was retained and its motifs conserved, when HO apparently needed to retain only the endonuclease activity of the VDE intein.

2.1.5 Split Inteins

Inteins can be split into two or more parts, which associate and ligate their exteins in a trans protein-splicing reaction. Split inteins occur naturally in Cyanobacteria and Nanoarchaea (Wu et al. 1998; Caspi et al. 2003; Waters et al. 2003), and can be engineered by splitting contiguous inteins (Shingledecker et al. 1998; Southworth et al. 1998; Sun et al. 2004). Naturally split inteins include all the typical intein Hint sequence features. They probably resulted from genomic DNA rearrangements that split contiguous inteins and their host genes. Such rearrangements might not be rare, but probably very few produced active split-intein genes. These genes had to code for intein parts that could trans-splice, and to produce the right amounts of the split products at the right time, to generate active, mature proteins. Both known types of naturally split inteins are in catalytic subunits of DNA polymerases. These proteins are crucial in every cell generation, so it is remarkable that the cells could survive the splitting event, since the gene's function was immediately needed. If an alternative mechanism for the gene's activity was present, it was either transient or inferior to the function of the split genes, which thus managed to survive.

Once established in a conserved motif of an essential gene, a split intein is very difficult to lose. Both intein parts must be precisely excised, and the split host parts rejoined, to recreate a functional intein-less gene. One way to remove the split intein is to acquire a gene that can replace the split one. However, such replacement is difficult for genes such as DNA polymerases that interact with many other proteins. Allelic split DnaE inteins were found in diverse cyanobacteria, which led us to propose that the split dnaE gene is fixed in a number of large cyanobacterial lineages (Caspi et al. 2003). The Nanoarchaeum equitans split pol intein is an allele of several contiguous inteins in archaeal type-B DNA polymerases. No other homologous DNA polymerases are known from this phylum, and thus we do not know whether this split intein is fixed in it.

We do not know how split intein genes managed to establish themselves in cyanobacteria and Nanoarchaeum. The genomic event that created them might have generated other advantages to the cell, ensuring its survival and fixing the split intein gene.

2.2 Hog-Hint Domains

Hedgehog developmental proteins of animals are composed of three protein domains. The N-terminal domain (Hedge) is a secreted developmental signal. It is first cleaved off from the precursor protein, then modified by covalent attachment of lipids at both ends and secreted from the cell. The C-terminal part of the protein (Hog) is composed of two domains, a Hint domain and a sterol-recognition region (SRR). The Hint domain shifts the peptide bond that attaches it to the Hedge domain into a thioester bond. This bond is attacked by the hydroxyl group of a cholesterol molecule bound by the SRR, which cleaves off the Hedge domain while attaching the cholesterol molecule to the resulting C-terminus by an ester bond (Mann and Beachy 2004).

Hog-Hint domains have the same structural fold as intein Hint domains and also share their sequence motifs, lacking only the C1 motif of the intein C-terminus (Hall et al. 1997). Hog-Hint domains also have a few additional motifs, including one with an active-site Asp or His residue responsible for activation of the cholesterol molecule. These specific sequence features of the Hint motifs allow us to distinguish Hog-Hint domains from the other Hint domains (Pietrokovski 1998; Amitai et al. 2003).

2.2.1 Phylogenetic Distribution of Hog-Hint Domains

Arthropods apparently have single hedgehog genes, while vertebrates have three types of such genes: Sonic, Desert and Indian hedgehogs (Hammerschmidt et al. 1997). The other chordate subphyla, urochordates and cephalochordates, have fewer hedgehog genes. Earlier-diverged animal groups, echinoderms, mollusks and annelids (segmented worms), each have at least one hedgehog gene. Nematodes have a larger number of genes with Hog domains (Table 2): C. elegans has ten genes belonging to three families. While their biological role is as yet unclear, these nematode proteins seem to be distant homologues of Hedgehog proteins. Their N-terminal domains correspond to the Hedge domain, and the region downstream of their Hint domains corresponds to the Hedgehog SRR and is termed ARR, for adduct recognition region (Aspöck et al. 1999; Mann and Beachy 2004). A nematode Hog protein region was also shown to have the cholesterol-mediated autocleavage activity of Hedgehog Hog regions (Porter et al. 1996). However, in their natural context, nematode Hog proteins might use other molecules to cleave the ester bond to their N-terminal domains (Mann and Beachy 2004).

Table 2. Phylogenetic distribution of Hog-Hint domains

Phylogenetic group | Genes with Hog-Hint domains | Reference or examples
Vertebrates (mammals, amphibians and fish) | Sonic, Desert and Indian hedgehogs | Hammerschmidt et al. (1997)
Urochordates (Ciona) | hh1 and hh2 hedgehogs | Takatori et al. (2002)
Cephalochordates (Amphioxus) | Amphioxus hedgehog | Shimeld (1999)
Arthropods (insects, arachnids, millipedes) | Hedgehog | Hammerschmidt et al. (1997); Janssen et al. (2004)
Echinoderms (sea urchin) | Hedgehog | GenBank accessions AAP38182, BAD01490
Mollusks (snail) | Hedgehog | GenBank accession AAC15065
Annelids, segmented worms (leech) | Hedgehog | Nederbragt et al. (2002)
Nematodes, round worms (C. elegans) | Warthogs, groundhogs and M110 | Aspöck et al. (1999)
Rhodophyta, red algae (Cyanidioschyzon, Porphyra) | Rhodhogs | Matsuzaki et al. (2004)
Cryptosporidium parvum, apicomplexan protist | Gene with Hog-Hint domain | Abrahamsen et al. (2004)

No Hog-Hint domains were identified in the few very different protists that have been completely and almost completely sequenced, except for Cryptosporidium parvum. This apicomplexan intracellular parasite of mammals includes a single gene with a Hog-Hint domain. It is highly expressed during in vitro development but no other experimental data are currently available on it (Abrahamsen et al. 2004).

Different subclasses of red algae (Rhodophyta) include genes with Hog domains, which we term Rhodhog genes. These include the red algae Cyanidioschyzon merolae and Porphyra yezoensis, each of which has a multi-gene Rhodhog family (Matsuzaki et al. 2004).

Hog-Hint domains are present in animals and red algae, but are not found in fungi and plants. The most parsimonious explanation is the presence of a single hedgehog gene in the animal progenitor that diversified in different phyla. The C. parvum Hog-Hint domain is probably the result of horizontal transfer, since no other Hog-Hint domain is known in other protists. The red algae Rhodhog domains might also result from an ancient horizontal transfer, but this depends on the contested phylogenetic position of this kingdom relative to green plants and animals.

2.3 BIL Hint Domains

Two new types of Hint domains were recently identified and termed A- and B-type BIL (for bacterial intein-like) domains (Amitai et al. 2003). Members of each BIL type are more similar in sequence to each other than to other types of Hint domains, and the two types are as different from one another as they are from intein and Hog-Hint domains. While inteins are integrated in highly conserved sites of essential proteins, both A- and B-type BIL domains are integrated in variable regions of non-conserved and diverse bacterial proteins, some of which are extracellular. Like inteins, BIL domains can autocleave at either their N- or C-termini. A-type BIL domains can also protein splice, but not by the canonical intein mechanism, since the C-terminal flanking residue of A-type BIL domains does not always have a nucleophilic side chain. This and other features suggest that the biological role of BIL domains differs from that of inteins (Amitai et al. 2003; Dassa et al. 2004a). BIL domains may contribute to the variability of their flanking proteins by protein-splicing, cleavage and ligation. Studying BIL domains may reveal new modes of protein maturation and control, and enhance our understanding of other Hint domains.

2.3.1 Phylogenetic Distribution of BIL Domains

The first 100 BIL domains were identified exclusively in bacterial genomes, hence their name, bacterial intein-like domains (Amitai et al. 2003; Dassa et al. 2004a). They are present in more than 25 taxonomically diverse bacterial species, including Proteobacteria, Cyanobacteria, Planctomycetes, Clostridia, and others (see http://bioinfo.weizmann.ac.il/~pietro/BILs for updated details). Some BIL domains are found in pathogens of humans and plants, such as Neisseria meningitidis and Pseudomonas syringae. A-type BIL domains were recently found in ciliates, thus extending their phylogenetic distribution to eukaryotic protists.

The genomic distribution of BIL domains is variable and highly dynamic, with relatively fast gene duplications and losses. While some species include more than 20 BIL domains, other related species have none. Sequence similarity between BIL domains shows that multiple BIL domains occurring in the same or related species are the result of gene expansions within each lineage (Amitai et al. 2003). These include pseudogenes, recognizable by deletions and nonsense mutations within the BIL domains and their protein flanks.

BIL domains usually appear once in each protein. However, the ciliate Tetrahymena thermophila includes two genes with the same domain organization, each containing a pair of A-type BIL domains interspersed with ubiquitin-like (Ubl) domains. These genes are termed BUBL, for BIL-Ubl domains. Another BUBL gene, with poly-Ubl domains and a single A-type BIL domain, is found in the ciliate Paramecium tetraurelia (Dassa et al. 2004b; Fig. 2). The original BUBL gene probably included one BIL domain and poly-Ubl domains; after the divergence of Tetrahymena from Paramecium, the Tetrahymena BUBL gene underwent two duplication events. The first was intragenic, duplicating the BIL domain, and was followed by a tandem duplication of the entire BUBL gene.

Fig. 2. Ciliate BUBL genes. Schemes of the Tetrahymena thermophila (Tth) and Paramecium tetraurelia (Pte) BUBL genes with ubiquitin-like (Ubl) domains shown as round boxes, A-type BIL domains shown as hexagons, and the ADP-ribosyltransferase (ART) domains as ovals

2.3.2 Protein Distribution of BIL Domains

BIL domains are integrated into hyper-variable positions of non-conserved proteins and are not always flanked at their C-terminus by nucleophilic residues. This is in sharp contrast to inteins, which are present in conserved positions of essential proteins. BIL domains are frequently found in the C-terminal regions of proteins, which are sometimes unstable coding regions that change by microevolutionary processes. Many BIL domains are flanked by N-terminal regions with characteristic motifs of extracellular proteins (Amitai et al. 2003).

A-type BIL domains are found in the 3' region of Neisserial mafB adhesin genes or in short coding regions downstream of these genes. These regions can recombine with mafB gene 3' ends, creating hyper-variable C-terminal regions. Thus, BIL domains might offer a post-translational mechanism for generating protein variability, complementing genetic rearrangements in microevolution.

Ciliate A-type BIL domains are found between conserved residues of ubiquitin-like (Ubl) domains in BUBL genes (Dassa et al. 2004b). This probably relates to the role of the BUBL BIL domains, as discussed below.

2.3.3 Biochemical Activity and Biological Roles of BIL Domains

BIL domains can protein splice and autocleave at either their N- or C-termini. The activity of BIL domains was extrapolated from that of inteins and Hog-Hint domains by comparing corresponding active-site positions (Fig. 4A). As in intein and Hog-Hint domains, the N-terminal residue of BIL domains is conserved, typically being Cys or Ser. The domains also include a highly conserved His in a motif corresponding to the N3 motif of inteins and Hog-Hint domains. The N-terminal and N3 (His) residues are responsible for the N/S or N/O acyl shifts of inteins and Hog-Hint domains (Paulus 2000; Romanelli et al. 2004). It is therefore assumed that this reaction also occurs in BIL domains (Dassa et al. 2004a).

A-type BIL domains have highly conserved His-Asn residues at their C-termini (like inteins), followed by a variable position (unlike inteins). In inteins, the Asn is necessary for C-terminal cleavage and is followed by a conserved Cys, Ser or Thr +1 residue that allows ligation of the two intein flanks. According to intein protein-splicing mechanisms, A-type BIL domains should be able to cleave their C-termini but not to ligate their flanking sequences. Nevertheless, A-type BIL domains can protein splice, even with a non-nucleophilic +1 residue (Ala) (Dassa et al. 2004a).
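The residue rules of the canonical intein mechanism described above can be written down as a small decision procedure. The following Python sketch is illustrative only: it assumes a Hint domain is given as a one-letter amino acid string plus its +1 flanking residue, and the function name and outcome labels are hypothetical simplifications, not a published prediction method. It encodes just three stated rules: an N-terminal Cys or Ser for the acyl shift, a C-terminal Asn for cleavage, and a Cys, Ser or Thr at +1 for canonical ligation.

```python
# One-letter codes of residues whose side chains act as nucleophiles in the
# canonical intein mechanism (as stated in the text).
N_TERMINAL_NUCLEOPHILES = {"C", "S"}     # first residue of the Hint domain
PLUS_ONE_NUCLEOPHILES = {"C", "S", "T"}  # first residue of the C-terminal flank


def predict_canonical_outcome(hint_domain: str, plus_one: str) -> str:
    """Classify a Hint-domain junction using only the canonical intein rules.

    Hypothetical simplification: real activity also depends on folding,
    internal motifs such as the N3 His, and alternative mechanisms.
    """
    hint_domain = hint_domain.upper()
    plus_one = plus_one.upper()
    n_shift = hint_domain[:1] in N_TERMINAL_NUCLEOPHILES  # N/S or N/O acyl shift possible
    c_cleave = hint_domain.endswith("N")                  # Asn cyclization possible
    ligate = plus_one in PLUS_ONE_NUCLEOPHILES            # +1 side chain can ligate flanks
    if n_shift and c_cleave and ligate:
        return "canonical splicing"
    if n_shift and c_cleave:
        return "cleavage at both ends, no canonical ligation"
    if n_shift:
        return "N-terminal cleavage only"
    if c_cleave:
        return "C-terminal cleavage only"
    return "no canonical activity predicted"


if __name__ == "__main__":
    # Toy sequences: an intein-like junction and an A-type BIL-like junction
    print(predict_canonical_outcome("CAAAHN", "C"))
    print(predict_canonical_outcome("CAAAHN", "A"))
```

Under these rules an A-type BIL domain ending in His-Asn and followed by Ala is classified as cleavage-only, yet such domains have been shown to splice. That mismatch is what a non-canonical mechanism, such as the aminolysis route proposed for A-type BIL domains, has to explain.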

In the suggested mechanism for protein-splicing of A-type BIL domains, the C-terminus of the domain is cleaved by Asn cyclization and the resulting free amino group of the C-terminal flank attacks the previously formed thioester/ester bond at the N-terminus of the domain by an aminolysis reaction. This ligates the two flanks and completes the release of the BIL domain. Nevertheless, A-type BIL protein-splicing is not as efficient as that of inteins, also producing large amounts of cleavage products (Dassa et al. 2004a). This could be the reason why inteins do not utilize such a mechanism. Cleavage products of essential proteins, like those flanking inteins, may be deleterious to the cell in a dominant-negative manner - partial products inactivating the properly processed products.

Aminolysis was also suggested for proteasomal peptide ligation (Vigneron et al. 2004). There, an ester bond between a cleaved peptide and the proteasomal Thr active site is attacked by the amino group of another cleaved peptide. This ligates the two peptides, which are then displayed on the cell surface. This is an example of convergent evolution, since there are no apparent evolutionary or structural commonalities between Hint protein-splicing and proteasomal peptide ligation. Thus, features and chemical reactions of Hint domains may illuminate other, seemingly unrelated, biological systems.

B-type BIL domains share all intein protein-splicing motifs except for the C-terminal motif (Amitai et al. 2003), yet the domains can cleave their N- or C-termini (Dassa et al. 2004a). While N-terminal cleavage may resemble the intein reaction, C-terminal cleavage may proceed as in atypical intein reactions, in which the N-terminus of the domain was suggested to attack its C-terminus (Amitai et al. 2003).

The expansion of BIL genes in different lineages indicates that they benefit the cell. Additionally, unlike inteins, both types of BIL domains are typically present in hyper-variable sites of non-conserved proteins, and the domains have no evident means of horizontal transfer. At present, there are only scant circumstantial data on the biological roles of BIL domains. We thus raise some possible hypotheses addressing both BIL types together.

We suggest that BIL domains contribute to their host proteins through post-translational modifications. This can increase host protein variability by generating alternative products from a single protein precursor: the precursor itself, N- or C-terminal cleavage products, both free N- and C-terminal flanks, and the splicing product.
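The list of alternative products above can be enumerated mechanically. The following Python sketch assumes a hypothetical string model in which a precursor is the concatenation of an N-terminal flank, a BIL domain and a C-terminal flank; it ignores the chemistry entirely and only shows which polypeptides each processing route would yield.

```python
from typing import Dict, List


def processing_products(n_flank: str, bil: str, c_flank: str) -> Dict[str, List[str]]:
    """Enumerate the polypeptides produced by each processing route of a
    precursor n_flank + bil + c_flank (hypothetical string model)."""
    return {
        # Unprocessed precursor
        "precursor": [n_flank + bil + c_flank],
        # Cleavage at the N-terminus of the BIL domain only
        "N-terminal cleavage": [n_flank, bil + c_flank],
        # Cleavage at the C-terminus of the BIL domain only
        "C-terminal cleavage": [n_flank + bil, c_flank],
        # Cleavage at both termini releases both flanks and the domain
        "double cleavage": [n_flank, bil, c_flank],
        # Protein splicing: flanks ligated, BIL domain excised
        "splicing": [n_flank + c_flank, bil],
    }


if __name__ == "__main__":
    products = processing_products("NNN", "BBBB", "CC")
    distinct = {p for route in products.values() for p in route}
    print(sorted(distinct, key=len))
```

With toy segments "NNN", "BBBB" and "CC", the five routes yield seven distinct polypeptides from one precursor, which illustrates how processing alone could diversify a single gene product.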

Ligation of diverse molecules to the host protein is another means of increasing its variability. This provides single-celled organisms with an advantageous adaptation to dynamically changing environments (Ziebuhr et al. 1999). One well-studied example is the evasion of the immune system of multicellular organisms infected by pathogens. Another example is adhesion of the cell to new targets. Activity of BIL domains may directly assist bacteria in binding to their target hosts, living environment, or other bacteria (forming biofilms). Hog-Hint domains efficiently ligate their N-terminal flank to cholesterol. Similarly, BIL domains might ligate their host proteins to other molecules, which could covalently attach bacteria to their adherence surface. These hypotheses concur with the finding of BIL domains in proteins that undergo microevolution (varying by rapid genetic changes) and in extracellular proteins.

BIL domains may control the expression of their host proteins and modulate their activity by external signals. Splicing or autocleavage can be triggered by changes in redox environment or by conformational changes induced by an allosteric modification to the BIL domain or its host protein.

Ciliate BUBL A-type BIL domains are present together with Ubl domains. Ubiquitin (Ub) and Ubl domains are translated as inactive pre-protein precursors that need to be proteolytically cleaved to release the functional Ub or Ubl monomers. These are then conjugated by a series of activating (E1), conjugating (E2) and ligating (E3) enzymes (Jentsch and Pyrowolakis 2000). We suggest that BIL domains in BUBL proteins release the Ubl domains from the pre-protein in a self-processing reaction, and may also conjugate them to their targets (Dassa et al. 2004b).

2.4 Other Hint Domains

Advances in sequence analysis methods and the accumulation of sequence data enabled the definition of the Hint superfamily (Koonin 1995; Hall et al. 1997; Amitai et al. 2003). However, as more Hint families are found, it becomes difficult to decide whether a specific domain, or a whole group, belongs to a new or to an already known Hint family. The recently sequenced genome of the enterobacterium Photorhabdus luminescens strain TT01 (Duchaud et al. 2003) includes an example of a probable new family of Hint domains. A 595-residue open reading frame (ORF) of unknown function includes a region with significant similarity to the N-terminus of Hint domains (Fig. 3). The regions flanking the Hint domain are not similar to known proteins, and the domain itself does not have any of the motifs (or the EN and DNA-binding domains) characteristic of known Hint domains.

[Fig. 3 alignment: Pl residues 448-564 aligned with BIL4 Ct residues 316-435; the sequence text is not recoverable from the source.]

Fig. 3. Photorhabdus luminescens new type of Hint domain. BLASTP alignment of the Photorhabdus luminescens locus 1731 (Pl, NCBI gi code 36785096) Hint domain and the Clostridium thermocellum BIL4 domain, BIL4 Ct (Dassa et al. 2004a). Putative Hint domain active-site residues are highlighted
