Patent application title: PROTEIN HAVING NUCLEASE ACTIVITY, FUSION PROTEINS AND USES THEREOF
Inventors:
Ralf Kühn (Berlin, DE)
Assignees:
Helmholtz Zentrum München Deutsches Forschungszentrum für Gesundheit und Umwelt (GmbH)
IPC8 Class: AC12N1590FI
Publication date: 2017-06-22
Patent application number: 20170175141
Abstract:
The present invention relates to a nucleic acid molecule encoding (I) a
polypeptide having the activity of an endonuclease, which is (a) a
nucleic acid molecule encoding a polypeptide comprising or consisting of
the amino acid sequence of SEQ ID NO: 1; (b) a nucleic acid molecule
comprising or consisting of the nucleotide sequence of SEQ ID NO: 2; (c)
a nucleic acid molecule encoding an endonuclease, the amino acid sequence
of which is at least 70% identical to the amino acid sequence of SEQ ID
NO: 1; (d) a nucleic acid molecule comprising or consisting of a
nucleotide sequence which is at least 50% identical to the nucleotide
sequence of SEQ ID NO: 2; (e) a nucleic acid molecule which is degenerate
with respect to the nucleic acid molecule of (d); or (f) a nucleic acid
molecule corresponding to the nucleic acid molecule of any one of (a) to
(e) wherein T is replaced by U; (II) a fragment of the polypeptide of (I)
having the activity of an endonuclease. Also, the present invention
relates to a vector comprising the nucleic acid molecule and a protein
encoded by said nucleic acid molecule. Further, the invention relates to
a method of modifying the genome of a eukaryotic cell and a method of
producing a non-human vertebrate or mammal.
Claims:
1. A nucleic acid molecule encoding (I) a polypeptide having the activity
of an endonuclease, which is selected from the group consisting of: (a) a
nucleic acid molecule encoding a polypeptide comprising or consisting of
the amino acid sequence of SEQ ID NO: 1; (b) a nucleic acid molecule
comprising or consisting of the nucleotide sequence of SEQ ID NO: 2; (c)
a nucleic acid molecule encoding an endonuclease, the amino acid sequence
of which is at least 70% identical to the amino acid sequence of SEQ ID
NO: 1; (d) a nucleic acid molecule comprising or consisting of a
nucleotide sequence which is at least 50% identical to the nucleotide
sequence of SEQ ID NO: 2; (e) a nucleic acid molecule which is degenerate
with respect to the nucleic acid molecule of (d); and (f) a nucleic acid
molecule corresponding to the nucleic acid molecule of any one of (a) to
(e) wherein T is replaced by U; or (II) a fragment of the polypeptide of
(I) having the activity of an endonuclease.
2. The nucleic acid molecule of claim 1, wherein in (I)(c) in said amino acid sequence having at least 70% sequence identity to SEQ ID NO: 1 the amino acid residues P66, D67, D84 and/or K86 of SEQ ID NO: 1 are not modified.
3. The nucleic acid molecule of claim 1 further encoding a DNA-binding domain.
4. The nucleic acid molecule of claim 3, wherein the DNA-binding domain is a TAL effector motif of a TAL effector protein.
5. A vector comprising the nucleic acid molecule of claim 1.
6. A host cell comprising the nucleic acid molecule of claim 1.
7. A protein or fusion protein having the activity of an endonuclease encoded by the nucleic acid molecule of claim 1.
8. A method of modifying a target sequence in the genome of a eukaryotic cell, the method comprising the step of: (a) introducing into said cell the nucleic acid molecule of claim 1, a vector comprising the nucleic acid molecule of claim 1 or a protein or fusion protein having the activity of an endonuclease encoded by the nucleic acid molecule of claim 1.
9. The method of claim 8, wherein the modification of said target sequence is by homologous recombination with a donor nucleic acid sequence, further comprising the step: (b) introducing a nucleic acid molecule into said cell, wherein said nucleic acid molecule comprises said donor nucleic acid sequence, wherein said donor nucleic acid sequence is flanked upstream by a first flanking element and downstream by a second flanking element, wherein said first and second flanking elements are different and wherein each of said first and second flanking elements is homologous to a continuous DNA sequence on either side of the double-strand break introduced in (a) of claim 8 within said target sequence in the genome of said eukaryotic cell.
10. The method of claim 8, wherein said cell is analysed for successful modification of said target sequence in the genome.
11. The method of claim 8, wherein the cell is selected from the group consisting of a mammalian or vertebrate cell, a plant cell or a fungal cell.
12. The method of claim 8, wherein the cell is an oocyte.
13. A method of producing a non-human vertebrate or mammal carrying a modified target sequence in its genome, the method comprising transferring a cell produced by the method of claim 9 into a pseudopregnant female host.
14. The method of claim 8, wherein the cell is a cell of an organism selected from the group consisting of rodents, dogs, felids, primates, rabbits, pigs, cows, chickens, turkeys, pheasants, ducks, geese, quails, ostriches, emus, cassowaries and zebrafish.
15. A method of producing a protein or fusion protein having the activity of an endonuclease encoded by the nucleic acid molecule of claim 1 comprising the steps of: (a) culturing a host cell comprising the nucleic acid molecule of claim 1 and (b) isolating the produced protein or fusion protein.
16. A host cell comprising the vector of claim 5.
17. A protein or fusion protein having the activity of an endonuclease encoded by the vector of claim 5.
Description:
RELATED APPLICATIONS
[0001] This application is a divisional of U.S. Patent Application No. 14/124,117, filed on Dec. 5, 2013, which is a U.S. National Stage Application under 35 U.S.C. § 371 of International Application Number PCT/EP2012/060711, filed on Jun. 6, 2012, which claims priority to European Patent Application No. 11004635.6, filed on Jun. 7, 2011, the contents of which applications are incorporated by reference herein in their entirety.
SEQUENCE LISTING
[0002] The text file entitled "Vossius_03USnew_ST25.txt," created on Jun. 27, 2016, having 404,886 bytes of data, and filed concurrently herewith, is hereby incorporated by reference in its entirety in this application.
DESCRIPTION
[0003] The present invention relates to a nucleic acid molecule encoding (I) a polypeptide having the activity of an endonuclease, which is (a) a nucleic acid molecule encoding a polypeptide comprising or consisting of the amino acid sequence of SEQ ID NO: 1; (b) a nucleic acid molecule comprising or consisting of the nucleotide sequence of SEQ ID NO: 2; (c) a nucleic acid molecule encoding an endonuclease, the amino acid sequence of which is at least 70% identical to the amino acid sequence of SEQ ID NO: 1; (d) a nucleic acid molecule comprising or consisting of a nucleotide sequence which is at least 50% identical to the nucleotide sequence of SEQ ID NO: 2; (e) a nucleic acid molecule which is degenerate with respect to the nucleic acid molecule of (d); or (f) a nucleic acid molecule corresponding to the nucleic acid molecule of any one of (a) to (e) wherein T is replaced by U; (II) a fragment of the polypeptide of (I) having the activity of an endonuclease. Also, the present invention relates to a vector comprising the nucleic acid molecule and a protein encoded by said nucleic acid molecule. Further, the invention relates to a method of modifying the genome of a eukaryotic cell and a method of producing a non-human vertebrate or mammal.
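Alternative (e) rests on the degeneracy of the genetic code: two different nucleotide sequences are degenerate with respect to each other when they encode the same amino acid sequence. A minimal Python sketch of that check follows, assuming the standard genetic code (NCBI translation table 1), in-frame coding sequences without introns, and reading U as T so that the RNA form of alternative (f) can be compared directly. The function names are hypothetical and are not taken from the specification.

```python
from itertools import product

# Standard genetic code (NCBI table 1); codons enumerated in TCAG order,
# first base varying slowest. "*" marks a stop codon.
_BASES = "TCAG"
_AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(_BASES, repeat=3), _AMINO_ACIDS)
}


def translate(seq: str) -> str:
    """Translate an in-frame DNA or RNA coding sequence (U is read as T)."""
    dna = seq.upper().replace("U", "T")
    if len(dna) % 3:
        raise ValueError("coding sequence length must be a multiple of 3")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


def is_degenerate_variant(seq_a: str, seq_b: str) -> bool:
    """True if both sequences encode the same amino acid sequence."""
    return translate(seq_a) == translate(seq_b)


# GCT and GCC are synonymous alanine codons; GAT encodes aspartate.
print(is_degenerate_variant("ATGGCT", "AUGGCC"))  # True
print(is_degenerate_variant("ATGGCT", "ATGGAT"))  # False
```

The lookup table is built once from the 64-letter code string rather than typed out codon by codon, which keeps the sketch short and makes transcription errors in the table unlikely.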
[0004] In this specification, a number of documents including patent applications and manufacturer's manuals are cited. The disclosure of these documents, while not considered relevant for the patentability of this invention, is herewith incorporated by reference in its entirety. More specifically, all referenced documents are incorporated by reference to the same extent as if each individual document was specifically and individually indicated to be incorporated by reference.
[0005] Since their discovery in the late 1960s, nucleases have remained among the most important tools of molecular biologists. Nucleases are enzymes capable of cleaving the phosphodiester bonds between the nucleotide subunits of nucleic acids. Enzymes catalyzing DNA and RNA cleavage are integral parts of major DNA metabolic processes such as DNA replication, DNA recombination, DNA repair, site-specific recombination and RNA splicing. In addition, nuclease activities are essential in RNA processing and maturation and in RNA interference, and nucleases are components of microbial defense mechanisms.
[0006] RNA and DNA present only two types of phosphodiester bonds for cleavage, on the 5' or the 3' side of a scissile phosphate, and the fundamental chemistry is bimolecular nucleophilic substitution. Nonetheless, the structures and catalytic mechanisms of RNA and DNA nucleases are highly varied and complex. Nucleases may be endo- or exonucleases, DNA- or RNA-specific, topoisomerases, recombinases, ribozymes, or RNA splicing enzymes. Their reaction can be divided into three stages: nucleophilic attack, formation of a negatively charged pentacovalent intermediate, and breakage of the scissile bond. Nucleases utilize a variety of nucleophiles to cleave a scissile phosphate bond. The most common nucleophiles are water molecules deprotonated by a general base for direct hydrolysis. For DNA cleavage, the side chains of Ser, Tyr and His can also serve as nucleophiles to form a covalent DNA phosphoryl-protein intermediate, which is subsequently resolved either by a phosphoryl transfer reaction back to DNA during recombination and topoisomerization or by hydrolysis in two-step cleavage reactions. To enable the controlled degradation or processing of cellular DNA or RNA, nuclease activities are strictly regulated by stringent substrate specificity, confined localization, or potent inhibitors.
[0007] For convenience nucleases can be classified according to their catalytic mechanism into three major classes based on their metal-ion dependence (Yang, W. (2011). Q. Rev. Biophys. 44(1): 1-93). These classes of two-metal-ion-dependent, one-metal-ion-dependent and metal-independent nucleases are further divided into families or superfamilies according to sequence and structure conservation and functional diversity.
Restriction Endonucleases
[0008] Various families of restriction endonucleases are found among all three catalytic classes. The type I, III and IV restriction enzymes are complex multisubunit molecular machines that combine multiple activities, including restriction, methylation and DNA translocation; they require additional cofactors (AdoMet, ATP or GTP), bind more than one target site, and cleave outside the recognition sequence, often at a random distance. Type II restriction endonucleases are enzymes that recognize short DNA sequences (usually 4-8 bp long) and cleave the target in both strands at, or in close proximity to, the recognition site. Orthodox type II restriction enzymes are homodimeric, cleave within palindromic sequences, require Mg2+ ions and can act on single copies of their targets. Because of their remarkably high specificity in recognizing and cleaving their target sequences, they are of high interest as the most frequently used tools for recombinant DNA technology (Pingoud, A., M. Fuxreiter, et al. (2005). Cell Mol Life Sci 62(6): 685-707; Orlowski, J. and J. M. Bujnicki (2008). Nucleic Acids Res 36(11): 3552-69). In nature, type II REases (restriction endonucleases) are found in prokaryotic organisms, where they form restriction-modification systems with DNA methyltransferases of the same or very similar substrate specificity. DNA methyltransferases use S-adenosylmethionine (AdoMet) as a methyl group donor to modify specific bases in the target sequence, thereby rendering it resistant to cleavage by the restriction enzyme. While the host's own DNA is thus protected by the restriction-modification system against self-degradation by the nuclease, any foreign DNA (e.g. from phages) that invades the host cell and lacks methylation can be efficiently destroyed. In order to distinguish the components of restriction-modification systems, the names of methylases and nucleases are preceded by the prefixes "M." and "R.", respectively (e.g. M.FokI and R.FokI).
[0009] Many commonly used type II restriction endonucleases share the conserved motif PD-(D/E)XK. This motif is generally found in proteins that interact with nucleic acid molecules such as DNA and is not limited to nucleases. The three catalytic residues are located close to each other on an uneven β-hairpin. The first D is located at the beginning of the first and shorter strand, and the E and K, separated by a hydrophobic residue X, are located in the middle of the second and longer strand. The first D is the most conserved and coordinates both metal ions, whereas the second residue (E) can be replaced by Q, D, N, H or S, and the third residue (K) can be replaced by E, Q, D, S, N or T. By varying dimeric interfaces, and thus the relative positions of the two catalytic centers, dimeric endonucleases can cleave DNA to generate blunt ends or staggered ends with various 5' or 3' overhangs. The catalytic module invariably approaches DNA from the minor groove side, and sequence-specific binding is conducted by a separate module or subdomain in the major groove. The first two carboxylates of the D-E-K motif coordinate the metal ions. The third residue, which is usually hydrogen-bonded with both the nucleophilic water and the DNA-binding module in the major groove, couples DNA sequence recognition with the cleavage reaction. Members of this superfamily have very diverse primary sequences and thus different structures surrounding the catalytic core. Database searches with restriction enzyme sequences typically reveal either no significant similarity to any protein, or very high similarity (>90% identity) to a few isoschizomers and no similarity to other proteins.
This strongly biased distribution of similarities and dissimilarities has made comparative sequence analysis of all restriction enzymes difficult and raised the question of whether the diversity of amino acid sequences of restriction endonucleases reflects polyphyletic evolution (convergence) or extreme divergence from a common ancestor.
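Because the PD-(D/E)XK signature consists of a PD dipeptide followed, after a variable number of residues, by a (D/E)XK tripeptide, a first-pass search for its canonical form can be written as a regular expression. The Python sketch below is illustrative only: the gap bounds (`min_gap`, `max_gap`) are assumed values, and the pattern does not capture the substitutions at the second and third positions described above, so divergent superfamily members will be missed. Residue positions are reported 1-based, matching the residue numbering used in claim 2; the function name is hypothetical.

```python
import re


def find_pd_dexk(protein: str, min_gap: int = 5, max_gap: int = 30):
    """Return 1-based positions (P, D, D/E, K) of canonical PD-(D/E)XK matches.

    min_gap/max_gap bound the number of residues between the PD dipeptide
    and the (D/E)XK tripeptide; both defaults are illustrative assumptions.
    """
    # Lazy gap quantifier: take the nearest (D/E)XK after each PD.
    pattern = re.compile(rf"PD(.{{{min_gap},{max_gap}}}?)([DE]).K")
    hits = []
    for m in pattern.finditer(protein.upper()):
        p = m.start() + 1          # 1-based position of P
        acidic = m.start(2) + 1    # 1-based position of D/E
        hits.append((p, p + 1, acidic, acidic + 2))
    return hits


# Synthetic test peptide (not SEQ ID NO: 1): PD at 66-67, DxK at 84-86.
demo = "A" * 65 + "PD" + "G" * 16 + "DLK" + "A" * 10
print(find_pd_dexk(demo))  # [(66, 67, 84, 86)]
```

A regular expression is only a screening aid here; given the very low sequence similarity within the superfamily noted above, profile- or structure-based methods would be needed to detect diverged members.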
[0010] While ~70% of restriction endonucleases belong to the PD-(D/E)XK superfamily, other members of this superfamily can be monomeric or tetrameric and be involved in other processes such as DNA repair and homologous recombination. In addition to endonucleases, members of this superfamily can also be 5'- or 3'-exonucleases. The most comprehensive source of information on restriction enzymes is the REBASE database (rebase.neb.com), which lists several thousand functionally characterized enzymes and several thousand putative enzymes inferred from sequence comparisons or genomic analyses. Consequently, there is a large disproportion between the number of known or predicted sequences and the small number (~50) of experimentally characterized proteins with known three-dimensional structures. At present, a large fraction of putative enzymes remains without any predictions or experimental data. Type II REases are further subdivided into several types according to their recognition site symmetry, structural organization or cofactor requirement. Most of the restriction enzymes used for recombinant DNA work belong to type IIP (P = palindromic). Type IIA enzymes recognize asymmetric sequences; an example is Bpu10I, a dimer of non-identical subunits, each of which is responsible for cleavage of one DNA strand. Type IIB enzymes cleave DNA on both sides of the recognition sequence; an example is BplI, which cleaves the top strand 8 nucleotides before and 13 nucleotides after the recognition sequence, while the bottom strand is cleaved 13 nucleotides before and 8 nucleotides after the recognition sequence. Type IIC enzymes have both cleavage and modification domains within one polypeptide. Type IIE enzymes need to interact with two copies of their recognition sequence for efficient cleavage, one copy being the target for cleavage and the other serving as an allosteric effector.
Type IIE enzymes like NaeI recognize palindromic nucleotide sequences in a manner similar to type IIP enzymes and cleave DNA within the boundaries of their recognition sites; however, they possess a separate DNA binding domain to perform the allosteric function. Type IIF enzymes are typically homotetrameric restriction endonucleases that also interact with two copies of their recognition site, but cleave both of them in a concerted manner. Type IIG enzymes, essentially a subgroup of type IIC enzymes, have both cleavage and modification domains within one polypeptide. They are generally stimulated by AdoMet, but otherwise behave as typical type II enzymes. Type IIH enzymes behave like type II enzymes, but their genetic organization resembles that of type I restriction-modification systems. Type IIM enzymes recognize a specific methylated sequence and cleave the DNA at a fixed site. The best-known representative is DpnI, which cleaves Gm6ATC, Gm6ATm4C and Gm6ATm5C, but not GATC, GATm4C, GATm5C or hemimethylated sites. Many other restriction enzymes are more or less tolerant to methylation, but for type IIM enzymes the methyl group is an essential recognition element. Orthodox type IIP enzymes like EcoRI recognize symmetric nucleotide sequences and cleave within their recognition sites. They share a common structural core comprising a five-stranded mixed β-sheet flanked by α-helices. The DNA binding sites of type IIP enzymes, however, are highly diverse and usually form a patch on the protein surface composed of amino acid residues located on different structural elements (α-helices, β-strands, loops). Orthodox type IIP enzymes interact with DNA as homodimers, and each subunit contributes to the recognition of one half of the palindromic sequence. Type IIS enzymes cleave at least one strand of the target DNA outside of the recognition sequence.
The best-known type IIS enzyme is FokI, which, like many other type IIS enzymes, interacts with two recognition sites before cleaving DNA. Type IIS enzymes are active as homodimers and are composed of two domains, one responsible for target recognition and the other for catalysis (which also serves as the dimerization domain). This is apparent from the crystal structure and biochemical studies of FokI (Bitinaite, J., D. A. Wah, et al. (1998). Proc Natl Acad Sci U S A 95(18): 10570-5; Wah, D. A., J. Bitinaite, et al. (1998). Proc Natl Acad Sci U S A 95(18): 10564-9). Crystal structure analysis of FokI reveals that it is composed of a specific DNA binding module fused to a cleavage domain that possesses a conserved endonuclease catalytic core but cuts DNA in a nonspecific manner. A modular architecture is also characteristic of the type IIS enzyme BfiI, which is composed of two DNA binding domains fused to a dimeric catalytic core similar to the nonspecific nuclease of the phospholipase D family. The presence of a separate nuclease domain has also been reported from the crystal structure of the type IIP enzyme SdaI (Tamulaitiene, G., A. Jakubauskas, et al. (2006). Structure 14(9): 1389-400).
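The relationship between where the two strands are cut and the kind of DNA ends produced can be made concrete with a little coordinate arithmetic. The Python sketch below assumes a convention that is not part of the specification: every cut is expressed as a 0-based index on the top strand, meaning the phosphodiester bond immediately before that nucleotide is broken, with the bottom-strand cut projected onto the same coordinates. If the top-strand cut lies upstream of the bottom-strand cut, the ends carry 5' overhangs; if it lies downstream, 3' overhangs; if both coincide, the ends are blunt. The FokI offsets used in the example (9 nt on the top strand and 13 nt on the bottom strand downstream of its 5-bp GGATG site) are widely reported values rather than figures given in this text, and all function names are hypothetical.

```python
def classify_ends(top_cut: int, bottom_cut: int):
    """Return (end type, overhang length) for a double-strand cut.

    Cuts are 0-based top-strand coordinates: the bond immediately before
    the nucleotide at that index is broken.
    """
    if top_cut == bottom_cut:
        return ("blunt", 0)
    if top_cut < bottom_cut:
        return ("5'", bottom_cut - top_cut)
    return ("3'", top_cut - bottom_cut)


def downstream_cuts(site_start: int, site_len: int, top_offset: int, bottom_offset: int):
    """Cut indices lying top_offset / bottom_offset nt past the end of a site."""
    site_end = site_start + site_len
    return site_end + top_offset, site_end + bottom_offset


def upstream_cuts(site_start: int, top_offset: int, bottom_offset: int):
    """Cut indices lying top_offset / bottom_offset nt before the start of a site."""
    return site_start - top_offset, site_start - bottom_offset


# FokI-like type IIS cleavage: 9/13 nt downstream of a 5-bp site at index 0.
print(classify_ends(*downstream_cuts(0, 5, 9, 13)))   # ("5'", 4)
# BplI-like type IIB cleavage upstream of a site at index 20 (8/13 nt).
print(classify_ends(*upstream_cuts(20, 8, 13)))       # ("3'", 5)
```

Projecting both strands onto one coordinate axis reduces the blunt/5'/3' distinction to a single comparison, which is why the classification does not need to know anything about the recognition sequence itself.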
Modified Restriction Enzymes and Chimaeric Nucleases as Tools for Genome Editing
[0011] Nucleases that cleave nucleic acid molecules at specific sites rather than randomly are of increasing importance in emerging technologies such as genetic engineering and gene targeting. Gene targeting is a process in which a DNA molecule introduced into a cell replaces the corresponding chromosomal segment by homologous recombination, and thus presents a precise way to manipulate the genome (Capecchi, M. R. (2005). Nat Rev Genet 6(6): 507-12). In the past, the application of gene targeting to mammalian cells has been limited by its low efficiency. Experiments in model systems have demonstrated that the frequency of homologous recombination of a gene targeting vector is strongly increased if a double-strand break is induced within its chromosomal target sequence. Using the yeast homing endonuclease I-SceI, which cuts DNA at an 18-bp recognition site, it was initially shown that homologous recombination and gene targeting are stimulated over 1000-fold in mammalian cells when a recognition site is inserted into a target gene and I-SceI is expressed in these cells (Rouet, P., Smih, F., Jasin, M.; Mol Cell Biol 1994; 14: 8096-8106; Rouet, P., Smih, F. Jasin, M; Proc Natl Acad Sci USA 1994; 91: 6064-6068). In the absence of a gene targeting vector for homology-directed repair, the cells frequently close the double-strand break by non-homologous end-joining (NHEJ). Since this mechanism is error-prone, it frequently leads to the deletion or insertion of multiple nucleotides at the cleavage site. If the cleavage site is located within the coding region of a gene, it is thereby possible to identify and select, from the mutagenised population, mutants that carry reading-frameshift mutations and represent non-functional knockout alleles of the targeted gene.
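The reasoning behind screening NHEJ-derived alleles for knockouts is simple arithmetic: an insertion or deletion within a coding region shifts the downstream reading frame unless its net length is a multiple of three. The Python sketch below applies that rule to a list of net indel lengths. It is a simplification under stated assumptions: the indel is taken to lie entirely within one coding exon, effects on splicing are ignored, and in-frame indels that may still damage the protein are not flagged. The allele list and function names are hypothetical.

```python
def is_frameshift(net_indel: int) -> bool:
    """True if a net insertion (+) or deletion (-) shifts the reading frame.

    Assumes the indel lies entirely within a single coding exon.
    """
    return net_indel % 3 != 0


def frameshift_fraction(net_indels: list[int]) -> float:
    """Fraction of recovered alleles predicted to carry a frameshift."""
    if not net_indels:
        raise ValueError("no alleles given")
    return sum(is_frameshift(n) for n in net_indels) / len(net_indels)


# Hypothetical alleles recovered after NHEJ repair: -1, -3, +2 and +6 nt.
alleles = [-1, -3, 2, 6]
print([is_frameshift(n) for n in alleles])  # [True, False, True, False]
print(frameshift_fraction(alleles))         # 0.5
```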
[0012] Sequence-specific nucleases therefore represent an important biotechnological tool for modifying the genome of model organisms or cell lines. In order to construct nucleases that specifically recognise new target sequences within genes, two approaches have been pursued, relying either on the modification of natural homing endonucleases or on the fusion of a natural or engineered DNA binding domain to a nuclease domain.
[0013] Such modified restriction enzymes or chimaeric nucleases can target large DNA sites (up to 36 bp) and can be engineered to bind to desired DNA sequences.
[0014] Homing endonucleases, such as I-SceI of yeast, are natural genetic elements that catalyze their own duplication into recipient alleles by creating site-specific double-strand breaks (DSBs) that initiate their genetic transfer by homologous recombination. A key feature of these enzymes is that they create double-strand breaks at recognition sites that are 14 to 40 bp long. The major limitation to the use of homing endonucleases in gene targeting is that each enzyme recognises exclusively its natural target sequence. Protein engineering has been used in attempts to modify homing endonucleases so that they recognize new target sites. In this work, modifications could be made that alter the natural target site at a few nucleotides, but it is not yet possible to design enzymes specific for entirely new target regions.
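The value of long recognition sites can be estimated with a back-of-the-envelope calculation. Assuming, unrealistically, a random genome with equal and independent base frequencies, a specific n-bp site is expected to occur about 2G/4^n times by chance in a genome of G bp (counting both strands; palindromic sites would be counted only once). The Python sketch below evaluates this estimate; the 3 × 10^9 bp genome size in the example is an assumed round figure of the order of a mammalian genome, and the function name is hypothetical.

```python
def expected_random_sites(site_length: int, genome_size: int) -> float:
    """Expected chance occurrences of one specific site in a random genome.

    Assumes independent, equiprobable bases and counts both strands
    (approximately 2 * G / 4**n; edge effects are ignored).
    """
    return 2 * genome_size / 4 ** site_length


GENOME = 3_000_000_000  # assumed genome size in bp
for n in (6, 12, 18):
    print(n, round(expected_random_sites(n, GENOME), 3))
```

Under these assumptions a 6-bp site is expected roughly 1.5 million times, a 12-bp site about 358 times, and an 18-bp site, such as that of I-SceI, fewer than 0.1 times, which illustrates why long recognition sites are needed to address a single genomic locus.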
[0015] Due to the difficulty of manipulating the sequence recognition of homing endonucleases, zinc-finger nucleases (ZFNs) are presently the most commonly used artificial nucleases for genetic engineering (Urnov, F. D., E. J. Rebar, et al. Nat Rev Genet 11(9): 636-46). Zinc-finger nucleases were developed by fusing the non-sequence-specific cleavage domain of the FokI type IIS restriction endonuclease (Fn domain) to a new DNA binding domain. The advantage of zinc-finger nucleases is that the zinc-finger DNA binding domain can be modified to recognize novel target sequences, including those in endogenous genes. The protein modules known as zinc fingers are found in the DNA-binding domains of the most abundant family of transcription factors in most eukaryotic genomes. Each finger is composed of about 30 amino acids, coordinates one Zn2+ ion using two cysteine and two histidine residues, and primarily contacts three base pairs of DNA. Two critical features of the structure are that each finger binds its 3-bp target site independently and that each nucleotide appeared to be contacted by a single amino acid side chain projecting from one end of the α-helix into the major groove of the DNA. Individual fingers have been designed to recognize many of the 64 possible target triplets, but the greatest success has been in designing zinc fingers that recognize 5'-GNN-3' triplets. Although zinc-finger recognition codes have been proposed, no code currently exists that consistently yields zinc fingers with high-affinity binding. Improving the specificity of zinc-finger binding, for example by increasing the number of fingers or by constructing multifinger proteins from two-finger units, remains an active area of research.
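Because each finger contacts essentially one 3-bp subsite, the design of a multifinger array starts by dividing the target into consecutive triplets. The Python sketch below does only that bookkeeping; it does not predict finger sequences. It assumes the widely described antiparallel binding orientation, in which the N-terminal finger (finger 1) contacts the 3'-most triplet of the 5'-to-3' target strand, and it flags the 5'-GNN-3' triplets for which, as noted above, finger design has been most successful. The example uses GCGTGGGCG, commonly cited as the Zif268 binding site; the function name is hypothetical.

```python
def zinc_finger_subsites(target: str):
    """Assign 3-bp subsites of a 5'->3' target to fingers of a ZF array.

    Returns (finger number, triplet, is_GNN) tuples, with finger 1 being the
    N-terminal finger, which is assumed to bind the 3'-most triplet.
    """
    target = target.upper()
    if len(target) % 3 or not target:
        raise ValueError("target length must be a positive multiple of 3")
    if set(target) - set("ACGT"):
        raise ValueError("target must contain only A, C, G and T")
    triplets = [target[i:i + 3] for i in range(0, len(target), 3)]
    return [
        (finger, triplet, triplet.startswith("G"))
        for finger, triplet in enumerate(reversed(triplets), start=1)
    ]


# Three-finger array against a 9-bp target.
for finger, triplet, gnn in zinc_finger_subsites("GCGTGGGCG"):
    print(f"finger {finger}: 5'-{triplet}-3'  GNN={gnn}")
```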
[0016] Using zinc-finger nucleases in the absence of a gene targeting vector for homology-directed repair, knockout alleles were generated in mammalian cell lines, and knockout zebrafish and rats were obtained upon expression of ZFN mRNA in one-cell embryos (Santiago Y, Chan E, Liu P Q, Orlando S, Zhang L, Urnov F D, Holmes M C, Guschin D, Waite A, Miller J C, Rebar E J, Gregory P D, Klug A, Collingwood T N; Proc Natl Acad Sci U S A 2008; 105:5809-5814; Doyon Y, McCammon J M, Miller J C, Faraji F, Ngo C, Katibah G E, Amora R, Hocking T D, Zhang L, Rebar E J, Gregory P D, Urnov F D, Amacher S L; Nat Biotechnol 2008; 26:702-708; Geurts A M, Cost G J, Freyvert Y, Zeitler B, Miller J C, Choi V M, Jenkins S S, Wood A, Cui X, Meng X, Vincent A, Lam S, Michalkiewicz M, Schilling R, Foeckler J, Kalloway S, Weiler H, Menoret S, Anegon I, Davis G D, Zhang L, Rebar E J, Gregory P D, Urnov F D, Jacob H J, Buelow R.; Science 2009; 325:433). Furthermore, zinc-finger nucleases were used in the presence of exogenous gene targeting vectors that contain regions of homology to the target gene, for homology-driven repair of the double-strand break through gene conversion. This methodology has been applied to gene engineering in mammalian cell lines and gene correction in primary human cells (Urnov F D, Miller J C, Lee Y L, Beausejour C M, Rock J M, Augustus S, Jamieson A C, Porteus M H, Gregory P D, Holmes M C.; Nature 2005; 435:646-651; Porteus M H, Baltimore D. 2003. Science 300:763; Hockemeyer D, Soldner F, Beard C, Gao Q, Mitalipova M, DeKelver R C, Katibah G E, Amora R, Boydston E A, Zeitler B, Meng X, Miller J C, Zhang L, Rebar E J, Gregory P D, Urnov F D, Jaenisch R.; Nat Biotechnol 2009; 27:851-857).
[0017] Although the use of zinc-finger nucleases results in a higher frequency of homologous recombination, considerable efforts and time are required to design zinc-finger proteins that bind a new DNA target sequence at high efficiency and that act as sequence specific nuclease.
[0018] In addition, it has long been overlooked that the nature of the nuclease domain of zinc-finger and other chimaeric nucleases may represent an equally important success factor for the overall activity of the fusion protein. The reason for this neglect is that, to date, only a single nuclease domain has been found that retains nuclease activity within a separate protein folding domain and that can be combined with DNA binding domains to generate sequence-specific nuclease fusion proteins. This nuclease domain is derived from the type IIS FokI restriction enzyme, which has been characterised in detail and is known to act as an obligate dimer (Bitinaite, J., D. A. Wah, et al. (1998). Proc Natl Acad Sci U S A 95(18): 10570-5; Wah, D. A., J. Bitinaite, et al. (1998). Proc Natl Acad Sci U S A 95(18): 10564-9). In most other restriction enzymes, DNA recognition and cleavage are combined in a single protein domain and cannot be separated. An exception is the SdaI enzyme, which has been structurally characterised as possessing a separate nuclease domain (Tamulaitiene, G., A. Jakubauskas, et al. (2006). Structure 14(9): 1389-400). In addition, it has not been possible to isolate mutants that lose DNA recognition but retain DNA cleavage activity.
[0019] Therefore, due to the lack of other comparable functional nuclease domains, it remained essentially unknown for a long time whether the enzymatic properties of the FokI Fn domain may constitute a limiting factor for the nuclease activity of Fn domain fusion proteins. For example, the intrinsic structure of the Fn domain may restrict its enzymatic processivity, or the small dimerisation interface of two Fn domains may lead to suboptimal interaction and a low cleavage rate of the DNA substrate.
[0020] By site-directed mutagenesis, the FokI Fn domain has been engineered into the KK and EL variants, which preferentially act as heterodimers (Miller, J. C., M. C. Holmes, et al. (2007). Nat Biotechnol 25(7): 778-85). The use of these variants improves the target sequence specificity of zinc-finger nucleases and reduces toxicity in mammalian cells, since fewer genomic off-target sequences are recognised and processed. However, the overall nuclease activity of the KK and EL variants is at most comparable to that of the wild-type Fn domain.
[0021] Only very recently has it been found that the wild-type FokI Fn domain indeed exhibits only suboptimal enzymatic nuclease activity, which limits the use of zinc-finger nucleases for genome engineering. In a directed protein evolution study, the Fn domain was randomly mutagenised and subjected to an E. coli-based nuclease assay able to select mutants that exhibit increased enzymatic activity (Guo, J., T. Gaj, et al. (2010), J Mol Biol 400(1): 96-107). By this procedure it was possible to isolate mutants that exhibit >10-fold higher nuclease activity than the wild-type Fn domain. Upon coupling of these mutants to zinc-finger domains, the resulting fusion proteins showed three- to sixfold improved substrate processing in mammalian cells. However, it remains unknown at present whether the activity of the Fn domain can be further enhanced or whether its intrinsic protein architecture restricts any further improvements.
[0022] Besides zinc-finger DNA-binding domains fused to nuclease domains, TAL effector protein DNA-binding domains have also very recently been identified. Compared with zinc-finger motifs, the TAL repeat elements within TAL effector proteins provide a new type of DNA binding domain that may be combined with a nuclease domain into sequence-specific nucleases. A key feature of the TAL peptide elements is their modular nature. Thereby, new sequence-specific DNA-binding proteins can be generated by combining just four basic TAL elements, each specific for the A, C, G or T nucleotide. Currently, only the nuclease domain of FokI is successfully used in fusion with TAL effector protein DNA-binding domains (Miller et al. (2010). Nat. Biotechnol. 29, 143-148).
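The modular TAL code can be illustrated by mapping a target sequence onto an ordered array of repeats, one repeat per nucleotide. In the Python sketch below, each repeat is represented by its repeat-variable diresidue (RVD), using assignments commonly used in TALE design (NI for A, HD for C, NN for G, NG for T); these RVD names are not given in this text and are included as an assumption. The sketch ignores any further design constraints on TALE binding sites, and the function name is hypothetical.

```python
# Commonly used RVD assignments (assumption: not specified in this document).
RVD_FOR_BASE = {"A": "NI", "C": "HD", "G": "NN", "T": "NG"}


def tale_repeat_array(target: str) -> list[str]:
    """Return one RVD per base of a 5'->3' target, in repeat order."""
    target = target.upper()
    unknown = set(target) - set(RVD_FOR_BASE)
    if unknown:
        raise ValueError(f"unsupported bases: {sorted(unknown)}")
    return [RVD_FOR_BASE[base] for base in target]


print(tale_repeat_array("TGACG"))  # ['NG', 'NN', 'NI', 'HD', 'NN']
```

The one-to-one lookup is what distinguishes TAL repeats from zinc fingers in this context: each repeat addresses a single base rather than a triplet whose recognition depends on context.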
[0023] In summary, there is an ongoing need for nucleases that can be used in various experimental settings including their fusion to other proteins and modification of the nuclease domain.
[0024] The technical problem underlying the present invention was to identify alternative and/or improved means and methods for cleaving nucleic acid molecules.
[0025] The solution to this technical problem is achieved by providing the embodiments characterized in the claims.
[0026] Accordingly, the present invention relates in a first embodiment to a nucleic acid molecule encoding (I) a polypeptide having the activity of an endonuclease, which is (a) a nucleic acid molecule encoding a polypeptide comprising or consisting of the amino acid sequence of SEQ ID NO: 1; (b) a nucleic acid molecule comprising or consisting of the nucleotide sequence of SEQ ID NO: 2; (c) a nucleic acid molecule encoding an endonuclease, the amino acid sequence of which is at least 70% identical to the amino acid sequence of SEQ ID NO: 1; (d) a nucleic acid molecule comprising or consisting of a nucleotide sequence which is at least 50% identical to the nucleotide sequence of SEQ ID NO: 2; (e) a nucleic acid molecule which is degenerate with respect to the nucleic acid molecule of (d); or (f) a nucleic acid molecule corresponding to the nucleic acid molecule of any one of (a) to (e) wherein T is replaced by U; (II) a fragment of the polypeptide of (I) having the activity of an endonuclease.
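The identity thresholds in alternatives (c) and (d) presuppose a pairwise alignment and a definition of percent identity, neither of which is specified in this part of the description. The Python sketch below therefore assumes that an alignment has already been produced (gaps written as "-") and counts identical aligned residues relative to the ungapped length of the reference sequence, which is only one of several conventions in use. It also checks whether selected reference residues are retained, in the spirit of claim 2 (P66, D67, D84 and K86 of SEQ ID NO: 1). Function names are hypothetical, and the example sequences are toy data, not SEQ ID NO: 1.

```python
def percent_identity(ref_aln: str, query_aln: str) -> float:
    """Identical aligned positions as a percentage of the ungapped reference length."""
    if len(ref_aln) != len(query_aln):
        raise ValueError("aligned sequences must have equal length")
    ref_length = sum(1 for r in ref_aln if r != "-")
    if ref_length == 0:
        raise ValueError("reference contains no residues")
    matches = sum(1 for r, q in zip(ref_aln, query_aln) if r != "-" and r == q)
    return 100.0 * matches / ref_length


def residues_retained(ref_aln: str, query_aln: str, positions: list[int]) -> bool:
    """True if the query keeps the reference residue at each 1-based position."""
    wanted = set(positions)
    ref_pos = 0
    for r, q in zip(ref_aln, query_aln):
        if r == "-":
            continue  # insertion in the query relative to the reference
        ref_pos += 1
        if ref_pos in wanted:
            if q != r:
                return False
            wanted.discard(ref_pos)
    if wanted:
        raise ValueError(f"positions beyond reference length: {sorted(wanted)}")
    return True


ref = "ACDE-FGHIK"
qry = "AC-EWFGHIR"
print(round(percent_identity(ref, qry), 1))    # 77.8
print(percent_identity(ref, qry) >= 70.0)      # True
print(residues_retained(ref, qry, [1, 2, 4]))  # True
print(residues_retained(ref, qry, [3]))        # False
```

Normalising by the reference length means that gaps in the query count against identity while insertions in the query do not; other conventions (for example, dividing by the alignment length or the shorter sequence) give different values for the same alignment.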
[0027] In accordance with the present invention the term "nucleic acid molecule" defines a linear molecular chain consisting of at least (for each) 2, 5, 10, 25, 50, 75, 100, 250, 500, such as at least 750, 1000, or at least 2500 or more nucleotides. The group of molecules designated herein as "nucleic acid molecules" also comprises complete genes. The term "nucleic acid molecule" is interchangeably used herein with the term "polynucleotide".
[0028] The term "nucleic acid molecule" in accordance with the present invention includes DNA, such as cDNA or double- or single-stranded genomic DNA, and RNA. In this regard, "DNA" (deoxyribonucleic acid) means any chain or sequence of the chemical building blocks adenine (A), guanine (G), cytosine (C) and thymine (T), called nucleotide bases, that are linked together on a deoxyribose sugar backbone. DNA can have one strand of nucleotide bases, or two complementary strands which may form a double helix structure. "RNA" (ribonucleic acid) means any chain or sequence of the chemical building blocks adenine (A), guanine (G), cytosine (C) and uracil (U), called nucleotide bases, that are linked together on a ribose sugar backbone. RNA typically has one strand of nucleotide bases. Included are also single- and double-stranded hybrid molecules, i.e., DNA-RNA. The nucleic acid molecule may also be modified by many means known in the art. Non-limiting examples of such modifications include methylation, "caps", substitution of one or more of the naturally occurring nucleotides with an analog, and internucleotide modifications such as, for example, those with uncharged linkages (e.g., methyl phosphonates, phosphotriesters, phosphoramidates, carbamates, etc.) and with charged linkages (e.g., phosphorothioates, phosphorodithioates, etc.). Polynucleotides may contain one or more additional covalently linked moieties, such as, for example, proteins (e.g., nucleases, toxins, antibodies, signal peptides, poly-L-lysine, etc.), intercalators (e.g., acridine, psoralen, etc.), chelators (e.g., metals, radioactive metals, iron, oxidative metals, etc.), and alkylators. The polynucleotides may be derivatized by formation of a methyl or ethyl phosphotriester or an alkyl phosphoramidate linkage. Further included are nucleic acid mimicking molecules known in the art such as synthetic or semi-synthetic derivatives of DNA or RNA and mixed polymers.
Such nucleic acid mimicking molecules or nucleic acid derivatives according to the invention include phosphorothioate nucleic acid, phosphoramidate nucleic acid, 2'-O-methoxyethyl ribonucleic acid, morpholino nucleic acid, hexitol nucleic acid (HNA), peptide nucleic acid (PNA) and locked nucleic acid (LNA) (see Braasch and Corey, Chem Biol 2001, 8: 1). LNA is an RNA derivative in which the ribose ring is constrained by a methylene linkage between the 2'-oxygen and the 4'-carbon. Also included are nucleic acids containing modified bases, for example thio-uracil, thio-guanine and fluoro-uracil. A nucleic acid molecule typically carries genetic information, including the information used by cellular machinery to make proteins and/or polypeptides. The nucleic acid molecule of the invention may additionally comprise promoters, enhancers, response elements, signal sequences, polyadenylation sequences, introns, 5'- and 3'-non-coding regions, and the like.
[0029] The term "polypeptide", as used herein interchangeably with the term "protein", describes linear molecular chains of amino acids, including single chain proteins, containing more than 30 amino acids, whereas the term "peptide" describes linear molecular chains of amino acids, including single chain proteins, containing up to 30 amino acids. Polypeptides may further form oligomers consisting of at least two identical or different molecules. The corresponding higher order structures of such multimers are, correspondingly, termed homo- or heterodimers, homo- or heterotrimers etc. The polypeptides of the invention may form heteromultimers or homomultimers, such as heterodimers or homodimers. Furthermore, peptidomimetics of such proteins/polypeptides where amino acid(s) and/or peptide bond(s) have been replaced by functional analogues are also encompassed by the invention. Such functional analogues include all known amino acids other than the 20 gene-encoded amino acids, such as selenocysteine. The terms "polypeptide" and "protein" also refer to naturally modified polypeptides and proteins where the modification is effected e.g. by glycosylation, acetylation, phosphorylation, ubiquitinylation and similar modifications which are well known in the art.
[0030] The term "a polypeptide having the activity of an endonuclease" as used herein means a polypeptide which is capable of cleaving the phosphodiester bonds between the nucleotide subunits of nucleic acids within a polynucleotide chain.
[0031] According to the invention, the endonuclease enzymatic activity is considered as stable when, in the respective conditions, the enzyme is capable of lasting long enough to obtain the desired effect, namely the cleavage of its substrate. In this regard it is noted that endonuclease activity can be assayed as described in the examples of the specification or by methods well known in the art. For example, a nucleic acid molecule can be exposed to a protein whose endonuclease activity is to be assessed under conditions that are suitable for endonuclease enzymatic activity. After incubation, the composition comprising the nucleic acid molecule (with or without said protein to be assessed) may be subjected to an assay for assessing the length of a nucleic acid molecule such as, e.g., gel-electrophoresis, to determine whether the nucleic acid molecule has been cleaved.
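By way of a non-limiting illustration, the readout of such a gel-based assay can be anticipated with a short Python sketch that converts cleavage positions in a linear DNA substrate into the expected fragment sizes. The function name and the example positions are hypothetical; as described above, it is the comparison of the treated sample with an uncut control that reveals whether cleavage has occurred.

```python
def fragment_sizes(length, cut_sites):
    """Return the fragment lengths (bp) of a linear DNA cut at `cut_sites`.

    A cut at position p separates bases [0, p) from [p, length); positions
    outside the molecule and duplicate positions are ignored.
    """
    cuts = sorted({p for p in cut_sites if 0 < p < length})
    edges = [0] + cuts + [length]
    sizes = [end - start for start, end in zip(edges, edges[1:])]
    return sorted(sizes, reverse=True)  # largest (slowest-migrating) band first


print(fragment_sizes(3000, []))            # [3000]  uncut control
print(fragment_sizes(3000, [1000, 2200]))  # [1200, 1000, 800]
```

Larger fragments migrate more slowly, so the sizes are listed in the order in which the bands would appear from the top of the gel.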
[0032] In accordance with the present invention, the term "percent (%) sequence identity" describes the number of matches ("hits") of identical nucleotides/amino acids of two or more aligned nucleic acid or amino acid sequences as compared to the number of nucleotides or amino acid residues making up the overall length of the template nucleic acid or amino acid sequences. In other terms, using an alignment, for two or more sequences or subsequences the percentage of amino acid residues or nucleotides that are the same (e.g. 95% identity) may be determined, when the (sub)sequences are compared and aligned for maximum correspondence over a window of comparison, or over a designated region as measured using a sequence comparison algorithm as known in the art, or when manually aligned and visually inspected.
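The identity calculation defined above can be sketched in Python as follows. The sketch assumes that the two sequences have already been aligned (for example with one of the programs named in the following paragraph) and are supplied as equal-length strings in which "-" marks a gap; it counts identical positions and divides by the number of residues in the template. The function name and the example sequences are illustrative only.

```python
def percent_identity(template, query, gap="-"):
    """Percent identity of two pre-aligned sequences relative to the template.

    Identical, non-gap positions ("hits") are divided by the number of
    residues in the template (its length without gap characters).
    """
    if len(template) != len(query):
        raise ValueError("aligned sequences must have equal length")
    hits = sum(1 for t, q in zip(template, query) if t == q and t != gap)
    template_residues = sum(1 for t in template if t != gap)
    return 100.0 * hits / template_residues


print(percent_identity("MKTAYIAKQR", "MKTAHIAKQL"))  # 80.0
print(percent_identity("MKTAYIAKQR", "MKT-YIAKQR"))  # 90.0
```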
[0033] This definition also applies to the complement of any sequence to be aligned. Amino acid sequence analysis and alignment in connection with the present invention were carried out using the NCBI BLAST algorithm (Stephen F. Altschul, Thomas L. Madden, Alejandro A. Schaffer, Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997), "Gapped BLAST and PSI-BLAST: a new generation of protein database search programs", Nucleic Acids Res. 25:3389-3402) and the CLC main workbench software (version 5.7.1; CLC bio, Aarhus, Denmark), which are preferably employed in accordance with this invention.
[0034] Preferably, the published standard parameters are used (Altschul et al., loc. cit.). The skilled person is aware of additional suitable programs to align nucleic acid sequences. A preferred program for nucleic acid sequence alignment in accordance with the invention is the CLC main workbench software using the standard alignment parameters of the software program (version 5.7.1; CLC bio, Aarhus, Denmark).
[0035] As defined in the embodiments herein above, certain amino acid sequence identities are envisaged by the invention. Also envisaged, with increasing preference, are amino acid sequence identities of at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 97.5%, at least 98%, at least 98.5%, at least 99%, at least 99.5%, at least 99.8%, and 100% identity to the respective amino acid sequence in accordance with the invention.
[0036] As defined in the embodiments herein above, certain nucleotide sequence identities are envisaged by the invention. Also envisaged, with increasing preference, are nucleotide sequence identities of at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 97.5%, at least 98%, at least 98.5%, at least 99%, at least 99.5%, at least 99.8%, and 100% identity to the respective nucleic acid sequence in accordance with the invention.
[0037] It will be readily appreciated by the skilled person that more than one nucleic acid molecule may encode the same polypeptide due to the degeneracy of the genetic code. Degeneracy results because a triplet code designates 20 amino acids and a stop codon. Because four bases are utilized to encode genetic information, triplet codons are required to produce at least 21 different codes (doublets would give only 4² = 16 combinations). The 4³ possible combinations of bases in triplets give 64 possible codons, meaning that some degeneracy must exist. As a result, some amino acids are encoded by more than one triplet, i.e. by up to six. The degeneracy mostly arises from alterations in the third position in a triplet. This means that nucleic acid molecules having different sequences, but still encoding the same polypeptide, are envisaged and can be employed in accordance with the method of the present invention.
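The counting argument above can be checked with a short Python sketch that builds the standard genetic code table and tallies how many codons specify each amino acid. It also computes, for each codon position, the fraction of single-base changes that are synonymous, which illustrates why degeneracy is concentrated at the third position. Only the standard code is modelled; alternative genetic codes are not considered, and the names used are illustrative.

```python
from collections import Counter
from itertools import product

BASES = "TCAG"
# Standard genetic code: codons enumerated in TCAG order at positions 1, 2 and 3;
# '*' marks the three stop codons.
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODE = {"".join(codon): aa
        for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def synonymous_fraction(position):
    """Fraction of single-base changes at `position` (0, 1 or 2) in sense
    codons that leave the encoded amino acid unchanged."""
    same = total = 0
    for codon, aa in CODE.items():
        if aa == "*":
            continue
        for base in BASES:
            if base == codon[position]:
                continue
            mutant = codon[:position] + base + codon[position + 1:]
            total += 1
            same += CODE[mutant] == aa
    return same / total


synonyms = Counter(CODE.values())
print(len(CODE), len(synonyms))  # 64 21 (20 amino acids + stop)
print(sorted(aa for aa, n in synonyms.items() if n == 6))  # ['L', 'R', 'S']
for pos in range(3):
    print("codon position", pos + 1, round(synonymous_fraction(pos), 2))
```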
[0038] Fragments according to the present invention are polypeptides having the activity of an endonuclease as defined herein above and comprise at least 90 amino acids. In this regard, it is preferred, with increasing preference, that the fragments according to the present invention are polypeptides of at least 100, at least 125, at least 150, at least 200, at least 300 or at least 400 amino acids. Fragments of the polypeptide of the invention which substantially retain endonuclease activity include N-terminal truncations, C-terminal truncations, amino acid substitutions, internal deletions and addition of amino acids (either internally or at either terminus of the protein). For example, conservative amino acid substitutions are known in the art and may be introduced into the endonuclease of the invention without substantially affecting, i.e. reducing, endonuclease activity.
[0039] As is evident from the examples, the inventor was able to identify and isolate a novel nuclease, in particular the endonuclease domain, derived from a Clostridium strain as detailed below.
[0040] Specifically, the inventor could establish the utility of the gene product of a putative bacterial gene without known functional connotation as a sequence-unspecific nuclease. The novel nuclease can be employed in various experimental settings just as any other nuclease. For example, it may be used to randomly cleave nucleic acid molecules or, e.g., in fusion with DNA-binding domains, for site-specific cleavage of nucleic acid molecules. Importantly, and as outlined below and specifically in the examples, the novel endonuclease can be used in combination with TAL effector protein DNA-binding domains as part of a fusion protein for sequence-specific nucleic acid cleavage. In this respect, the novel nuclease is superior to state-of-the-art endonucleases other than FokI, none of which could so far be shown to be active in corresponding fusion proteins. Briefly, the inventors tested the gene product of said uncharacterised, hypothetical microbial gene, which they designated "Clo051" (SEQ ID NO: 17) and which is derived from the genome of Clostridium sp. 7_2_43FAA (NCBI Reference Sequence: ZP_05132802.1; publication/database release date: Jun. 9, 2010), more specifically its putative nuclease domain (see FIGS. 5 and 6), for endonuclease activity in combination with the DNA-binding domain of a TAL effector protein. Various known endonuclease proteins, as well as the products of two further hypothetical microbial genes, were also tested in combination with TAL effector protein DNA-binding domains. Surprisingly, only the nuclease domain of Clo051 could be shown to be active, whereas the other fusion proteins did not show activity (see Example 1 for details). These comparative experiments emphasize the significance of the present finding, namely that a novel nuclease has been identified that is also active when fused to the DNA-binding domains of TAL effector proteins.
TAL effector proteins are expressed by plant pathogens of the genus Xanthomonas and reprogram host cells by mimicking eukaryotic transcription factors. TAL effector proteins are characterized by a central domain of tandem repeats of 32 to 34 amino acids that constitute a DNA-binding domain. The number and order of repeats in a TAL effector protein determine its specific DNA-binding activity (Boch, J., et al. 2009 Science 326: 1509-12). The amino acid sequences of the repeats are conserved, except for two adjacent, highly variable residues (at positions 12 and 13) that determine specificity towards the DNA base A, G, C or T. Binding to DNA is mediated by contacting a nucleotide of the DNA double helix with the variable residues at positions 12 and 13 within the Tal effector motif, resulting in a one-to-one correspondence between sequential repeats in the Tal effector proteins and sequential nucleotides in the target DNA. Binding to longer DNA sequences is achieved by linking several of these Tal effector motifs in tandem to form a "DNA-binding domain of a Tal effector protein". The use of such DNA-binding domains of Tal effector proteins for the creation of Tal effector motif-nuclease fusion proteins that recognize and cleave a specific target sequence depends on the reliable creation of DNA-binding domains of Tal effector proteins that can specifically recognize said particular target. The advantage of the TAL repeat elements, as compared to e.g. zinc-finger elements, is their truly modular nature. Thereby, new sequence-specific DNA-binding proteins can be generated through the combination of the four basic TAL elements that are specific for the A, C, G or T nucleotide.
[0041] It is important to note that in the present invention the Clo051 nuclease domain fused to DNA-binding domains of TAL effector proteins has been tested and found to be active in mammalian, specifically human, cultured cells. Therefore, the utility of Clo051 nuclease domain fusion proteins for DNA and gene manipulation, specifically but without limitation in mammalian cells, has been directly proven in the biological system that provides important applications for this technology. This finding is of particular importance since studies on protein function that are performed in lower eukaryotic organisms, like e.g. yeast, do not allow a definite conclusion on the utility of the protein under study in mammalian cells. For example, a specific protein may function optimally at 30 °C, the growth temperature of yeast, but become unstable or inactive at 37 °C, the typical body temperature of mammals. In addition, the intracellular milieu of e.g. yeast cells, such as ion and protein concentrations, protein diversity and protein degradation mechanisms, differs from the intracellular milieu of mammalian cells.
[0042] While the examples only describe the use of the nuclease domain of Clo051 (SEQ ID NO: 1), e.g. in combination with DNA-binding domains, the skilled person will appreciate that one may also employ the entire sequence of Clo051 as set forth in SEQ ID NO: 17, or shorter fragments thereof having endonuclease activity and comprising the amino acid sequence of SEQ ID NO: 1. The amino acid sequence of SEQ ID NO: 1 starts at E389 and ends at Y587 of the amino acid sequence of SEQ ID NO: 17, as also exemplified in FIG. 5.
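Because SEQ ID NO: 1 is a contiguous stretch of SEQ ID NO: 17 beginning at E389, residue numbers in the two sequences differ by a constant offset of 388. A minimal Python sketch of this conversion, assuming no insertions or gaps between the two numbering schemes (the function names are hypothetical):

```python
# Per [0042], SEQ ID NO: 1 corresponds to residues E389..Y587 of SEQ ID NO: 17.
DOMAIN_START, DOMAIN_END = 389, 587
DOMAIN_LENGTH = DOMAIN_END - DOMAIN_START + 1


def to_full_length(domain_pos):
    """Map a 1-based position in SEQ ID NO: 1 to SEQ ID NO: 17."""
    if not 1 <= domain_pos <= DOMAIN_LENGTH:
        raise ValueError(f"{domain_pos} lies outside SEQ ID NO: 1")
    return domain_pos + DOMAIN_START - 1


def to_domain(full_pos):
    """Map a 1-based position in SEQ ID NO: 17 to SEQ ID NO: 1."""
    if not DOMAIN_START <= full_pos <= DOMAIN_END:
        raise ValueError(f"{full_pos} lies outside the nuclease domain")
    return full_pos - DOMAIN_START + 1


print(DOMAIN_LENGTH)                                  # 199
print(to_full_length(35), to_full_length(58))         # 423 446
print([to_full_length(p) for p in (66, 67, 84, 86)])  # [454, 455, 472, 474]
```

The second line reproduces the correspondence S35/S423 and R58/R446 stated in paragraph [0045].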
[0043] In a preferred embodiment of the nucleic acid molecule of the invention, in (I)(c) in said amino acid sequence having at least 70% sequence identity to SEQ ID NO: 1 the amino acid residues P66, D67, D84 and/or K86 of SEQ ID NO: 1 are not modified.
[0044] The nuclease domain of Clo051, like many type II restriction endonucleases and, e.g., the DNA repair protein MutH, shares the conserved sequence motif PD-(D/E)XK within the core of its catalytic domain. The core serves as a scaffold for a weakly conserved active site, typically comprising two or three acidic residues (Asp or Glu) and one Lys residue, which together form the hallmark bipartite catalytic motif [(P)D-Xn-(D/E)XK] (where X is any amino acid). This motif has led to naming this superfamily of proteins `PD-(D/E)XK`. Work on restriction enzymes and DNA repair proteins has shown that the three catalytic residues are located close to each other on an uneven β-hairpin. The first D is located at the beginning of the first and shorter strand, and the E and K, separated by a hydrophobic residue X, are located in the middle of the second and longer strand. The catalytic module invariably approaches DNA from the minor groove side, and the sequence-specific binding is conducted by a separate module/subdomain in the major groove. The first two carboxylates of the DEK motif coordinate the metal ions. The first D is most conserved and coordinates both metal ions, whereas the second E can be replaced by Q, D, N, H or S, and the third K can be replaced by E, Q, D, S, N or T. The lysine residue in the conserved DEK motif coordinates the nucleophilic water in conjunction with the phosphate 3' to the scissile bond; the same lysine is also hydrogen bonded with a carbonyl oxygen in the DNA-binding module. This lysine, which is conserved in many restriction endonucleases and is replaced by Glu or Gln in BamHI and BglII, has been proposed as a sensor for DNA binding and a hub that couples base recognition and DNA cleavage (Lee et al. (2005). Molecular Cell 20, 155-166; Orlowski, J. and J. M. Bujnicki (2008). Nucleic Acids Res 36(11): 3552-69).
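The bipartite motif can be searched for with a regular expression. The following Python sketch is a simplified illustration only: it requires a strict P-D pair followed, after a variable spacer, by (D/E)-X-K, and the spacer bounds (5 to 40 residues) are arbitrary assumptions. Because the motif is weakly conserved, such a pattern match alone can both miss genuine members and report spurious hits. The test sequence is synthetic and merely places residues at the positions P66, D67, D84 and K86 given for SEQ ID NO: 1.

```python
import re


def find_pd_dexk(seq, min_gap=5, max_gap=40):
    """Return candidate PD...(D/E)XK motifs as 1-based (P, D, D/E, K) positions.

    The spacer between D and (D/E) is matched non-greedily, so for each PD
    pair the nearest downstream (D/E)XK within the bounds is reported.
    A lookahead is used so that overlapping candidates are also found.
    """
    pattern = re.compile(r"(?=PD(.{%d,%d}?)[DE].K)" % (min_gap, max_gap))
    hits = []
    for match in pattern.finditer(seq.upper()):
        p = match.start() + 1                # 1-based position of P
        de = p + 2 + len(match.group(1))     # 1-based position of D/E
        hits.append((p, p + 1, de, de + 2))
    return hits


# Synthetic sequence (not SEQ ID NO: 1) with P66, D67, D84 and K86.
toy = "A" * 65 + "PD" + "A" * 16 + "DAK" + "A" * 10
print(find_pd_dexk(toy))  # [(66, 67, 84, 86)]
```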
[0045] The primary sequence of the Clo051 nuclease domain between positions E389 and Y587 of the sequence of SEQ ID NO: 17, i.e. the sequence of SEQ ID NO: 1, exhibits a unique distribution of the positively charged arginine (R) and lysine (K) residues and of the negatively charged glutamate (E) and aspartate (D) residues (FIG. 13). These residues constitute a three-dimensional landscape of charges within the Clo051 domain that determines the unique tertiary structure of this nuclease, as shown in the structural model in FIG. 6. Certain replacements of polar by non-polar residues, or of non-polar by polar residues, e.g. at positions S35 and/or R58 of SEQ ID NO: 1 (or S423 and R446 of SEQ ID NO: 17), alter the three-dimensional structure of the protein chain and may result in an increase of the nuclease activity. Such amino acid replacements may be made by trial and error or may follow specific hypotheses on their structural and functional impact on the Clo051 nuclease domain. Alternatively, a large number of randomly mutagenised variants of the Clo051 nuclease domain coding region can be assembled in a library by mutagenic, error-prone PCR. This library of mutant molecules can be tested for the presence of hyperactive nuclease variants by a phenotypic screening assay in E. coli, yeast or mammalian cells that is coupled to a functional nuclease readout, e.g. as described for the improvement of the FLP recombinase (Buchholz et al., Nat. Biotechnol. 16, 657-62, 1998). Such a functional screen for improved nuclease variants can result in the replacement of single or multiple residues that lead to increased nuclease activity as compared to the Clo051 wild-type form.
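The composition of an error-prone PCR library can be pictured with a small simulation. The Python sketch below models each library member as a copy of a coding sequence in which every base is independently substituted with a fixed probability. The rate, library size, seed and placeholder sequence are assumptions; real error-prone PCR is biased toward certain substitution types, which this uniform model does not reproduce, and the subsequent phenotypic screen is not simulated.

```python
import random


def mutagenize(cds, rate, rng):
    """Substitute each base with probability `rate`; return (variant, count)."""
    bases, n_mut = [], 0
    for base in cds:
        if rng.random() < rate:
            bases.append(rng.choice([b for b in "ACGT" if b != base]))
            n_mut += 1
        else:
            bases.append(base)
    return "".join(bases), n_mut


def build_library(cds, size, rate=0.005, seed=0):
    """Simulate `size` library members; rate and seed are illustrative only."""
    rng = random.Random(seed)
    return [mutagenize(cds, rate, rng) for _ in range(size)]


# Hypothetical 597-nt stand-in for the domain coding region (199 codons).
placeholder_cds = "GCT" * 199
library = build_library(placeholder_cds, size=1000, rate=0.005)
mean = sum(n for _, n in library) / len(library)
print(f"mean substitutions per clone: {mean:.2f} (model expectation ~{0.005 * 597:.1f})")
```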
[0046] Also envisaged are embodiments in which, in addition to the amino acid residues P66, D67, D84 and/or K86 of SEQ ID NO: 1, further residues are not modified, such as, e.g., the amino acid stretches from at least P66 to at least K86, from at least R64 to at least Y88, from at least G62 to at least E90, or from L60 to at least Y92 of SEQ ID NO: 1.
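Whether a candidate variant leaves the residues named in paragraphs [0043] and [0046] untouched can be checked position by position. The following Python sketch assumes substitution-only variants of the same length as the reference; the reference string used here is a placeholder, not SEQ ID NO: 1, and the helper names are hypothetical.

```python
CATALYTIC = (66, 67, 84, 86)  # P66, D67, D84 and K86 of SEQ ID NO: 1
STRETCHES = {  # stretches named in [0046], as 1-based inclusive ranges
    "P66-K86": range(66, 87),
    "R64-Y88": range(64, 89),
    "G62-E90": range(62, 91),
    "L60-Y92": range(60, 93),
}


def unmodified(reference, variant, positions):
    """True if `variant` carries the reference residue at every 1-based position.

    Assumes substitution-only variants of equal length, so that positions
    correspond directly; variants with insertions or deletions would need
    an alignment first.
    """
    if len(reference) != len(variant):
        raise ValueError("substitution variants of equal length expected")
    return all(reference[p - 1] == variant[p - 1] for p in positions)


def substitute(seq, pos, residue):
    """Hypothetical helper: replace the residue at 1-based position `pos`."""
    return seq[:pos - 1] + residue + seq[pos:]


ref = ("ACDEFGHIKLMNPQRSTVWY" * 10)[:199]  # placeholder, not SEQ ID NO: 1
print(unmodified(ref, substitute(ref, 10, "A"), CATALYTIC))  # True
print(unmodified(ref, substitute(ref, 84, "A"), CATALYTIC))  # False
```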
[0047] In a preferred embodiment of the invention, the nucleic acid molecule further encodes a DNA-binding domain.
[0048] In this embodiment the nucleic acid molecule of the invention encodes a fusion protein having the activity of an endonuclease, which comprises a DNA-binding domain and a cleavage domain comprising or consisting of the novel endonuclease domain. The term "fusion protein" is well known in the art and has the same meaning herein. Namely, it refers to a protein generated by joining two or more target nucleic acid sequences, e.g. genes, which originally code for separate proteins, to create a fusion construct. Translation of said fusion construct results in a single protein with the functional properties derived from said separate proteins. The two proteins giving rise to the fusion protein may be connected by a linker, such as, e.g., a peptide linker. In other words, the DNA-binding domain and the cleavage domain of the nucleases may be directly fused to one another or may be fused via a linker.
[0049] The term "linker" as used in accordance with the present invention relates to a sequence of amino acids (i.e. peptide linkers) as well as to non-peptide linkers.
[0050] Peptide linkers as envisaged by the present invention are peptide or polypeptide linkers of at least 1 amino acid in length. Preferably, the linkers are 1 to 100 amino acids in length. More preferably, the linkers are 5 to 50 amino acids in length and even more preferably, the linkers are 10 to 20 amino acids in length. It is well known to the skilled person that the nature, i.e. the length and/or amino acid sequence of the linker may modify or enhance the stability and/or solubility of the molecule. Thus, the length and sequence of a linker depends on the composition of the respective portions of the fusion protein.
[0051] The skilled person is aware of methods to test the suitability of different linkers. For example, the properties of the molecule can readily be assessed by testing the nuclease activity as well as the DNA-binding specificity of the respective portions of the fusion protein to be used in the method of the invention.
[0052] It will be appreciated by the skilled person that when the fusion protein is provided as a nucleic acid molecule encoding the fusion protein in expressible form, the linker is a peptide linker also encoded by said nucleic acid molecule.
[0053] The term "non-peptide linker", as used in accordance with the present invention, refers to linkage groups having two or more reactive groups but excluding peptide linkers as defined above. For example, the non-peptide linker may be a polymer having reactive groups at both ends, which individually bind to reactive groups of the individual portions of the fusion protein, for example, an amino terminus, a lysine residue, a histidine residue or a cysteine residue. The reactive groups of the polymer include an aldehyde group, a propionic aldehyde group, a butyl aldehyde group, a maleimide group, a ketone group, a vinyl sulfone group, a thiol group, a hydrazide group, a carbonyldiimidazole (CDI) group, a nitrophenyl carbonate (NPC) group, a tresylate group, an isocyanate group, and succinimide derivatives. Examples of succinimide derivatives include succinimidyl propionate (SPA), succinimidyl butanoic acid (SBA), succinimidyl carboxymethylate (SCM), succinimidyl succinamide (SSA), succinimidyl succinate (SS), succinimidyl carbonate, and N-hydroxy succinimide (NHS). The reactive groups at both ends of the non-peptide polymer may be the same or different. For example, the non-peptide polymer may have a maleimide group at one end and an aldehyde group at the other end. Preferably, the linker is a peptide linker. More preferably, the peptide linker consists of seven glycine residues.
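The assembly of a peptide-linked fusion and the linker length preferences of paragraph [0050] can be summarised in a short Python sketch operating on one-letter amino acid strings. The N-to-C-terminal order (DNA-binding domain, linker, nuclease domain) and the placeholder domain strings are assumptions made for illustration; the default linker is the seven-glycine linker stated above as more preferred.

```python
def linker_tier(linker):
    """Place a peptide-linker length in the preference tiers stated in [0050]."""
    n = len(linker)
    if n == 0:
        return "direct fusion (no linker)"
    if 10 <= n <= 20:
        return "even more preferred (10-20 aa)"
    if 5 <= n <= 50:
        return "more preferred (5-50 aa)"
    if n <= 100:
        return "preferred (1-100 aa)"
    return "envisaged (at least 1 aa)"


def assemble_fusion(dna_binding, nuclease, linker="G" * 7):
    """Join domains N- to C-terminally: DNA-binding domain, linker, nuclease.
    The domain order is an assumption made for this sketch."""
    return dna_binding + linker + nuclease


print(linker_tier("G" * 7))           # more preferred (5-50 aa)
print(assemble_fusion("DBD", "NUC"))  # DBDGGGGGGGNUC (placeholder domains)
```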
[0054] The fusion protein may also be flanked N- or C-terminally by additional sequences unrelated to said proteins in the fusion protein. In accordance with the present invention, a fusion protein of the invention comprises a DNA-binding domain. The term "DNA-binding domain" has the same meaning as known in the art and relates to a sequence motif/conformation within a protein that binds to DNA motifs. Protein domains that can specifically bind to a nucleic acid sequence include, e.g., zinc finger repeats, the helix-turn-helix (HTH) motif of homeodomains, and the ribbon-helix-helix (RHH) motif. Specific binding refers to sequence-specific binding: binding is specific when a DNA-binding domain statistically binds only to a particular sequence and does not, or essentially does not, bind to an unrelated sequence. The skilled person is well aware of sequences encoding DNA-binding domains (Rohs et al. (2010). Annu. Rev. Biochem. 79, 233-269; Maeder et al. (2009). Nat. Protoc. 4, 1471-1501).
[0055] In a more preferred embodiment of the nucleic acid molecule of the invention, the DNA-binding domain is a TAL effector motif of a TAL effector protein.
[0056] This embodiment relates to a nucleic acid molecule also encoding a TAL nuclease. The term "TAL nuclease" as used herein is well known in the art and refers to a fusion protein comprising a DNA-binding domain, wherein the DNA-binding domain comprises or consists of Tal effector motifs of a TAL effector protein, and the non-specific cleavage domain of a restriction nuclease. The fusion protein of the invention, which is also employed in the method of the invention below, retains or essentially retains the enzymatic activity of the endonuclease of the invention. In accordance with the present invention, said endonuclease activity (also referred to as function) is essentially retained if at least 60% of the biological activity of the endonuclease is retained. Preferably, at least 75% or at least 80% of the endonuclease activity is retained. More preferably, at least 90% such as at least 95%, even more preferably at least 98% such as at least 99% of the biological activity of the endonuclease is retained. Most preferably, the biological activity is fully, i.e. to 100%, retained. Also in accordance with the invention, fusion proteins having an increased biological activity compared to the endonuclease when not fused to a DNA-binding domain, i.e. more than 100% activity, are envisaged. Methods of assessing the biological activity of (restriction) endonucleases are well known to the person skilled in the art and include, without being limiting, the incubation of an endonuclease with recombinant DNA and the analysis of the reaction products by gel electrophoresis (Bloch K D.; Curr Protoc Mol Biol 2001; Chapter 3:Unit 3.2).
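The retention thresholds above amount to a ratio of two activity measurements. A minimal Python sketch, assuming that the fused and the unfused endonuclease are measured with the same assay (e.g. the gel-based cleavage assay mentioned above) and that the values are on a common scale; the function names are hypothetical.

```python
def retained_percent(fused_activity, unfused_activity):
    """Activity of the fusion protein relative to the unfused endonuclease, in %.
    Both values must come from the same assay under identical conditions."""
    return 100.0 * fused_activity / unfused_activity


def retention_class(percent):
    """Map a retained-activity percentage onto the wording of [0056]."""
    if percent > 100:
        return "increased"
    if percent == 100:
        return "fully retained"
    if percent >= 60:
        return "essentially retained"
    return "not essentially retained"


print(retention_class(retained_percent(9, 10)))  # essentially retained
print(retention_class(retained_percent(3, 2)))   # increased
```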
[0057] The term "Tal effector protein", as used herein, refers to proteins belonging to the TAL (transcription activator-like) family of proteins. These proteins are expressed by bacterial plant pathogens of the genus Xanthomonas. Members of the large TAL effector family are key virulence factors of Xanthomonas and reprogram host cells by mimicking eukaryotic transcription factors. The pathogenicity of many bacteria depends on the injection of effector proteins via type III secretion into eukaryotic cells in order to manipulate cellular processes. TAL effector proteins from plant pathogenic Xanthomonas are important virulence factors that act as transcriptional activators in the plant cell nucleus. PthXo1, a TAL effector protein of a Xanthomonas rice pathogen, activates expression of the rice gene Os8N3, allowing Xanthomonas to colonize rice plants. TAL effector proteins are characterized by a central domain of tandem repeats, i.e. a DNA-binding domain, as well as by nuclear localization signals (NLSs) and an acidic transcriptional activation domain. Members of this effector family are highly conserved and differ mainly in the amino acid sequence of their repeats and in the number of repeats. The number and order of repeats in a TAL effector protein determine its specific activity. These repeats are referred to herein as "TAL effector motifs". One exemplary member of this effector family, AvrBs3 from Xanthomonas campestris pv. vesicatoria, contains 17.5 repeats and induces expression of UPA (up-regulated by AvrBs3) genes, including the Bs3 resistance gene in pepper plants (Kay, et al. 2005 Mol Plant Microbe Interact 18(8): 838-48; Kay, S. and U. Bonas 2009 Curr Opin Microbiol 12(1): 37-43). The repeats of AvrBs3 are essential for DNA binding of AvrBs3 and represent a distinct type of DNA-binding domain.
The mechanism of sequence specific DNA recognition has been elucidated by recent studies on the AvrBs3, Hax2, Hax3 and Hax4 proteins that revealed the TAL effectors' DNA recognition code (Boch, J., et al. 2009 Science 326: 1509-12).
[0058] Tal effector motifs or repeats are protein sequence motifs of 32 to 34 amino acids. The amino acid sequences of the repeats are conserved, except for two adjacent, highly variable residues (at positions 12 and 13) that determine specificity towards the DNA base A, G, C or T. In other words, binding to DNA is mediated by contacting a nucleotide of the DNA double helix with the variable residues at positions 12 and 13 within the Tal effector motif of a particular Tal effector protein (Boch, J., et al. 2009 Science 326: 1509-12). Therefore, a one-to-one correspondence between sequential amino acid repeats in the Tal effector proteins and sequential nucleotides in the target DNA was found. Each Tal effector motif primarily recognizes a single nucleotide within the DNA substrate. For example, the combination of histidine at position 12 and aspartic acid at position 13 specifically binds cytosine; the combination of asparagine at both positions 12 and 13 specifically binds guanine; the combination of asparagine at position 12 and isoleucine at position 13 specifically binds adenine; and the combination of asparagine at position 12 and glycine at position 13 specifically binds thymine. Binding to longer DNA sequences is achieved by linking several of these Tal effector motifs in tandem to form a "DNA-binding domain of a Tal effector protein". Thus, a DNA-binding domain of a Tal effector protein relates to DNA-binding domains found in naturally occurring Tal effector proteins as well as to DNA-binding domains designed to bind to a specific target nucleotide sequence as described in the examples below. The use of such DNA-binding domains of Tal effector proteins for the generation of Tal effector motif-nuclease fusion proteins that recognize and cleave a specific target sequence depends on the reliable generation of DNA-binding domains of Tal effector proteins that can specifically recognize said particular target.
Methods for the generation of DNA-binding domains of Tal effector proteins are well known in the art (Zhang et al. (2011). Nat. Biotechnol. 29, 149-153; Cermak et al. (2011). Nucleic Acids Res. April 14, PubMed identifier 21493687).
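The recognition code of paragraph [0058] lends itself to a simple lookup. The Python sketch below translates a target sequence into the residue pairs at positions 12 and 13 of each repeat (often called repeat-variable diresidues, abbreviated here as the one-letter pairs HD, NN, NI and NG) and back. It illustrates only the one-to-one correspondence stated above and does not cover further design considerations used in practice; the function names are hypothetical.

```python
# Residue pair at positions 12/13 of a repeat -> bound base, as stated in [0058].
PAIR_FOR_BASE = {"C": "HD", "G": "NN", "A": "NI", "T": "NG"}
BASE_FOR_PAIR = {pair: base for base, pair in PAIR_FOR_BASE.items()}


def design_repeats(target):
    """Return one residue pair per target nucleotide, in 5'->3' order."""
    target = target.upper()
    unsupported = set(target) - set(PAIR_FOR_BASE)
    if unsupported:
        raise ValueError(f"unsupported bases: {sorted(unsupported)}")
    return [PAIR_FOR_BASE[base] for base in target]


def decode_repeats(pairs):
    """Return the nucleotide sequence recognised by an array of residue pairs."""
    return "".join(BASE_FOR_PAIR[pair] for pair in pairs)


print(design_repeats("TCCAGT"))  # ['NG', 'HD', 'HD', 'NI', 'NN', 'NG']
print(decode_repeats(["NI", "NG", "NN"]))  # ATG
```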
[0059] Preferably, the DNA-binding domain is derived from the Tal effector motifs found in naturally occurring Tal effector proteins, such as, for example, Tal effector proteins selected from the group consisting of AvrBs3, Hax2, Hax3 and Hax4 (Bonas et al. 1989. Mol Gen Genet 218(1): 127-36; Kay et al. 2005 Mol Plant Microbe Interact 18(8): 838-48).
[0060] Envisaged in accordance with the present invention are fusion proteins that are provided as a DNA-binding domain of a Tal effector protein coupled with a single nuclease domain. These monomeric proteins can be combined to act as a functional dimer in order to develop nuclease activity through the cooperation of two nuclease domains, each being part of one fusion protein.
[0061] Preferably, the TAL nuclease in accordance with the present invention comprises more than one, i.e. several Tal effector motifs, such as at least 12 Tal effector motifs, such as for example at least 14 or at least 16 Tal effector motifs. More preferably, the TAL nuclease comprises at least 18 Tal effector motifs. In other words, the DNA-binding domain of a Tal effector protein within said fusion protein is comprised of at least 18 Tal effector motifs. In the case of fusion proteins consisting of dimers as described above this means that each fusion protein monomer comprises at least nine Tal effector motifs. Methods for testing the DNA-binding specificity of a fusion protein in accordance with the present invention are known to the skilled person and include, without being limiting, transcriptional reporter gene assays and electrophoretic mobility shift assays (EMSA).
[0062] Preferably, the binding site of the fusion protein is up to 500 nucleotides, such as up to 250 nucleotides, up to 100 nucleotides, up to 50 nucleotides, up to 25 nucleotides, up to 10 nucleotides such as up to 5 nucleotides upstream (i.e. 5') or downstream (i.e. 3') of the nucleotide(s) that is/are modified in accordance with the method of the present invention as detailed below.
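The distance criterion above is simple interval arithmetic. The following Python sketch is illustrative only; it assumes 0-based, inclusive coordinates on a single strand, and the function names and example positions are hypothetical.

```python
def distance_to_binding_site(site_start: int, site_end: int, position: int) -> int:
    """Nucleotides between a modified position and a binding site
    [site_start, site_end] (0-based, inclusive). Returns 0 if the modified
    nucleotide lies inside the binding site itself."""
    if site_start > site_end:
        raise ValueError("site_start must not exceed site_end")
    if position < site_start:   # modification lies upstream (5') of the site
        return site_start - position
    if position > site_end:     # modification lies downstream (3') of the site
        return position - site_end
    return 0


def within_preferred_window(site_start: int, site_end: int, position: int,
                            max_distance: int = 500) -> bool:
    """True if the modification lies at most `max_distance` nt from the site."""
    return distance_to_binding_site(site_start, site_end, position) <= max_distance


if __name__ == "__main__":
    # Hypothetical 18-nt binding site at positions 1000-1017
    for pos, limit in [(990, 10), (1030, 10), (1400, 500)]:
        print(pos, limit, within_preferred_window(1000, 1017, pos, limit))
```

The same check can be repeated with each of the preferred limits (500, 250, 100, 50, 25, 10 or 5 nucleotides) to see which preference a given design satisfies.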
[0063] In another embodiment, the invention relates to a vector comprising the nucleic acid molecule of the invention.
[0064] The term "vector" in accordance with the invention preferably means a plasmid, cosmid, virus, bacteriophage or another vector used, e.g., conventionally in genetic engineering, which carries the nucleic acid molecule of the invention encoding either the polypeptide or the fusion protein of the invention. Accordingly, the nucleic acid molecule of the invention may be inserted into several commercially available vectors. Non-limiting examples include prokaryotic plasmid vectors, such as those of the pUC series, pBluescript (Stratagene), the pET series of expression vectors (Novagen) or pCRTOPO (Invitrogen), and vectors compatible with expression in mammalian cells, such as pREP (Invitrogen), pcDNA3 (Invitrogen), pCEP4 (Invitrogen), pMC1neo (Stratagene), pXT1 (Stratagene), pSG5 (Stratagene), EBO-pSV2neo, pBPV-1, pdBPVMMTneo, pRSVgpt, pRSVneo, pSV2-dhfr, pIZD35, pLXIN, pSIR (Clontech), pIRES-EGFP (Clontech), pEAK-10 (Edge Biosystems), pTriEx-Hygro (Novagen) and pCI-neo (Promega). Examples of plasmid vectors suitable for Pichia pastoris comprise, e.g., the plasmids pAO815, pPIC9K and pPIC3.5K (all Invitrogen).
[0065] The nucleic acid molecule of the present invention referred to above may also be inserted into vectors such that a (further) translational fusion with another nucleic acid molecule is generated. To this aim, overlap extension PCR can be applied (e.g. Wurch, T., Lestienne, F., and Pauwels, P.J., A modified overlap extension PCR method to create chimeric genes in the absence of restriction enzymes, Biotechn. Techn. 12, 9, Sept. 1998, 653-657). The products arising therefrom are termed fusion proteins and will be described further below. The other nucleic acid molecules may encode a protein which may, e.g., increase the solubility and/or facilitate the purification of the protein encoded by the nucleic acid molecule of the invention. Non-limiting examples of vectors providing such fusion partners include pET32, pET41 and pET43. The vectors may also contain an additional expressible nucleic acid coding for one or more chaperones to facilitate correct protein folding. Suitable bacterial expression hosts comprise, e.g., strains derived from BL21 (such as BL21(DE3), BL21(DE3)pLysS, BL21(DE3)RIL, BL21(DE3)PRARE) or Rosetta®.
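The junction primers used in overlap extension PCR can be derived directly from the two sequences to be fused. The Python sketch below is a simplified illustration, not the protocol of the cited reference: it assumes the upstream fragment is placed 5' of the downstream fragment (for a translational fusion, without its own stop codon and with the reading frame kept across the junction), and the `anneal`/`overlap` lengths are hypothetical defaults that do not account for melting temperature. In a first round, the upstream fragment is amplified with the returned reverse primer and the downstream fragment with the returned forward primer; the two products then share an overlapping region that primes the fusion in a second round.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def overlap_extension_primers(upstream: str, downstream: str,
                              anneal: int = 20, overlap: int = 12) -> tuple:
    """Junction primers (both written 5'->3') for fusing `upstream` 5' of
    `downstream`. Returns (forward primer for the downstream fragment,
    reverse primer for the upstream fragment). Lengths are illustrative."""
    if min(len(upstream), len(downstream)) < max(anneal, overlap):
        raise ValueError("fragments are shorter than the requested primer parts")
    # 5' tail copies the end of the upstream fragment; 3' part anneals to downstream
    fwd = upstream[-overlap:] + downstream[:anneal]
    # 3' part anneals to the end of upstream; 5' tail encodes the start of downstream
    rev = reverse_complement(upstream[-anneal:] + downstream[:overlap])
    return fwd, rev


if __name__ == "__main__":
    fwd, rev = overlap_extension_primers("AAAACCCCGGGG", "TTTTGGGGCCCC",
                                         anneal=4, overlap=4)
    print(fwd, rev)
```

The two first-round products overlap by `2 * overlap` nucleotides (the last `overlap` bases of the upstream fragment plus the first `overlap` bases of the downstream fragment).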
[0066] Particularly preferred plasmids which can be used to introduce the nucleic acid encoding the polypeptide of the invention having the activity of an endonuclease into the host cell are: pUC18/19 (Roche Biochemicals), pBluescript II (Alting-Mees, et al. (1992). Meth. Enzymol., 216, 483-495), pKK-177-3H (Roche Biochemicals), pBTac2 (Roche Biochemicals), pKK223-3 (Amersham Pharmacia Biotech), pKK-233-3 (Stratagene) and pET (Novagen).
[0067] For vector modification techniques, see Sambrook and Russell, 2001. Generally, vectors can contain one or more origins of replication (ori) and inheritance systems for cloning or expression, one or more markers for selection in the host, e.g., antibiotic resistance, and one or more expression cassettes. Suitable origins of replication include, for example, the ColE1, the SV40 viral and the M13 origins of replication.
[0068] The coding sequences inserted in the vector can, e.g., be synthesized by standard methods or isolated from natural sources. Ligation of the coding sequences to transcriptional regulatory elements and/or to other amino acid-encoding sequences can be carried out using established methods. Transcriptional regulatory elements (parts of an expression cassette) ensuring expression in prokaryotic or eukaryotic cells are well known to those skilled in the art. These elements comprise regulatory sequences ensuring the initiation of transcription (e.g., promoters, enhancers and/or insulators), translation initiation codons, transcriptional termination sequences, internal ribosomal entry sites (IRES) and optionally poly-A signals ensuring termination of transcription and stabilization of the transcript. Additional regulatory elements may include transcriptional as well as translational enhancers and/or naturally associated or heterologous promoter regions. The regulatory elements may be heterologous regulatory elements. Preferably, the nucleic acid molecule of the invention is operably linked to such expression control sequences allowing expression in prokaryotic or eukaryotic cells. The vector may further comprise nucleotide sequences encoding secretion signals as further regulatory elements. Such sequences are well known to the person skilled in the art. Furthermore, depending on the expression system used, leader sequences capable of directing the expressed polypeptide to a cellular compartment may be added to the coding sequence of the nucleic acid molecule of the invention. Such leader sequences are well known in the art. Specifically designed vectors allow the shuttling of DNA between different hosts, such as between bacterial and fungal cells or between bacterial and animal cells.
[0069] Co-transfection with a selectable marker, such as kanamycin or ampicillin resistance genes for culturing in E. coli and other bacteria, allows the identification and isolation of the transfected cells. Selectable markers for mammalian cell culture are the dhfr, gpt, neomycin and hygromycin resistance genes. The transfected nucleic acid can also be amplified to express large amounts of the encoded polypeptide. The DHFR (dihydrofolate reductase) marker is useful to develop cell lines that carry several hundred or even several thousand copies of the gene of interest. Another useful selection marker is the enzyme glutamine synthetase (GS) (Fisher et al., Infect Immun. 1991 October 59(10):3562-5; Bebbington et al., Biotechnology (N Y). 1992 Feb.;10(2):169-75).
[0070] Using such markers, the cells are grown in selective medium and the cells with the highest resistance are selected.
[0071] In another embodiment the invention relates to a host cell comprising, e.g., as a result of transformation, transduction, microinjection or transfection, the nucleic acid molecule or the vector of the invention.
[0072] A variety of host-expression systems may be conceived to express the endonuclease coding sequence in a host cell using a suitable vector.
[0073] The "host cell" in accordance with the invention may be produced by introducing the nucleic acid molecule or vector(s) of the invention into the host cell, where its/their presence preferably mediates the expression of the nucleic acid molecule of the invention encoding the endonuclease of the invention. The host from which the host cell is derived may be any prokaryotic or eukaryotic cell.
[0074] A suitable eukaryotic host cell may be a vertebrate cell, an amphibian cell, a fish cell, an insect cell, a fungal/yeast cell, a nematode cell or a plant cell. The insect cell may be a Spodoptera frugiperda cell, a Drosophila S2 cell or a Spodoptera Sf9 cell; the fungal/yeast cell may be a Saccharomyces cerevisiae cell, a Pichia pastoris cell or an Aspergillus cell. It is preferred that the vertebrate cell is a mammalian cell such as a human cell, CHO, COS, 293 or Bowes melanoma cell. The plant cell is preferably selected independently from a cell of Anacardium, Anona, Arachis, Artocarpus, Asparagus, Atropa, Avena, Brassica, Carica, Citrus, Citrullus, Capsicum, Carthamus, Cocos, Coffea, Cucumis, Cucurbita, Daucus, Elaeis, Fragaria, Glycine, Gossypium, Helianthus, Hemerocallis, Hordeum, Hyoscyamus, Lactuca, Linum, Lolium, Lupinus, Lycopersicon, Malus, Manihot, Majorana, Medicago, Nicotiana, Olea, Oryza, Panicum, Pennisetum, Passiflora, Persea, Phaseolus, Pistacia, Pisum, Pyrus, Prunus, Psidium, Raphanus, Ricinus, Secale, Senecio, Sinapis, Solanum, Sorghum, Theobroma, Trigonella, Triticum, Vicia, Vitis, Vigna and Zea. The cell may be part of a cell line. The plant cell may, e.g., be derived from root, leaf, bark, needle, bole or caulis.
[0075] Suitable prokaryotes (bacteria) useful as hosts for the invention are those generally used for cloning and/or expression, like E. coli (e.g., E. coli strains BL21, HB101, DH5α, XL1 Blue, Y1090 and JM101), Salmonella typhimurium, Serratia marcescens, Burkholderia glumae, Pseudomonas putida, Pseudomonas fluorescens, Pseudomonas stutzeri, Streptomyces lividans, Lactococcus lactis, Mycobacterium smegmatis, Streptomyces or Bacillus subtilis. Appropriate culture media and conditions for the above-described host cells are known in the art.
[0076] Preferred examples of host cells to be genetically engineered with the nucleic acid molecule or the vector(s) of the invention are cells of yeast, E. coli and/or a species of the genus Bacillus (e.g., B. subtilis). The most preferred host cell is a cell of a Bacillus species.
[0077] In a further embodiment the invention relates to a method of producing a protein or fusion protein having the activity of an endonuclease as defined herein above, comprising the steps: (a) culturing the host cell of the invention and (b) isolating the produced protein or fusion protein having the activity of said endonuclease.
[0078] Suitable conditions for culturing a prokaryotic or eukaryotic host are well known to the person skilled in the art. Suitable conditions for culturing E. coli DH18B.alpha.kat E (Invitrogen), Pichia pastoris or Aspergillus niger are, for example, provided in the examples of the invention. In general, suitable conditions for culturing bacteria are growing them under aeration in Luria Bertani (LB) medium. To increase the yield and the solubility of the expression product, the medium can be buffered or supplemented with suitable additives known to enhance or facilitate both. E. coli can be cultured at temperatures from 4 °C to about 37 °C; the exact temperature or sequence of temperatures depends on the molecule to be overexpressed. In general, Aspergillus sp. may be grown on Sabouraud dextrose agar or potato dextrose agar at about 10 °C to about 40 °C, and preferably at about 25 °C. Suitable conditions for yeast cultures are known, for example, from Guthrie and Fink, "Guide to Yeast Genetics and Molecular Cell Biology" (2002); Academic Press. The skilled person is also aware of all these conditions and may further adapt them to the needs of a particular host species and the requirements of the polypeptide expressed. In case an inducible promoter controls the nucleic acid of the invention in the vector present in the host cell, expression of the polypeptide can be induced by addition of an appropriate inducing agent. Suitable expression protocols and strategies are known to the skilled person.
[0079] Depending on the cell type and its specific requirements, mammalian cell culture can, e.g., be carried out in RPMI or DMEM medium containing 10% (v/v) FCS, 2 mM L-glutamine and 100 U/ml penicillin/streptomycin. The cells can be kept at 37 °C in a 5% CO2, water-saturated atmosphere.
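Preparing such a medium is a C1·V1 = C2·V2 dilution for each supplement. The Python sketch below works this through for a hypothetical 500 ml batch; the stock concentrations (undiluted FCS, 200 mM L-glutamine, 10,000 U/ml penicillin/streptomycin) are assumptions chosen for illustration and are not specified in this document.

```python
def stock_volume(final_conc: float, stock_conc: float, final_volume_ml: float) -> float:
    """C1*V1 = C2*V2 solved for the volume of stock (ml) to add."""
    if stock_conc <= final_conc:
        raise ValueError("stock must be more concentrated than the final medium")
    return final_conc * final_volume_ml / stock_conc


def supplement_plan(final_volume_ml: float = 500.0) -> dict:
    # (final concentration from [0079], ASSUMED stock concentration), same units per pair
    supplements = {
        "FCS": (10.0, 100.0),                         # % (v/v)
        "L-glutamine": (2.0, 200.0),                  # mM
        "penicillin/streptomycin": (100.0, 10_000.0),  # U/ml
    }
    plan = {name: stock_volume(final, stock, final_volume_ml)
            for name, (final, stock) in supplements.items()}
    # The remainder is made up with basal medium (RPMI or DMEM)
    plan["basal medium"] = final_volume_ml - sum(plan.values())
    return plan


if __name__ == "__main__":
    for name, ml in supplement_plan(500.0).items():
        print(f"{name}: {ml:g} ml")
```

Under these assumed stocks, 500 ml of complete medium consists of 50 ml FCS, 5 ml L-glutamine stock, 5 ml penicillin/streptomycin stock and 440 ml basal medium.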
[0080] Suitable expression protocols for eukaryotic cells are well known to the skilled person and can be retrieved, e.g., from Sambrook, 2001.
[0081] Methods of isolation of the polypeptide produced are well known in the art and comprise, without limitation, method steps such as ion exchange chromatography, gel filtration chromatography (size exclusion chromatography), affinity chromatography, high pressure liquid chromatography (HPLC), reversed phase HPLC, disc gel electrophoresis or immunoprecipitation; see, for example, Sambrook, 2001.
[0082] The step of protein isolation is preferably a step of protein purification. Protein purification in accordance with the invention refers to a process or a series of processes intended to further isolate the polypeptide of the invention from a complex mixture, preferably to homogeneity. Purification steps, for example, exploit differences in protein size, physico-chemical properties and binding affinity. For example, proteins may be purified according to their isoelectric points by running them through a pH gradient gel or an ion exchange column. Further, proteins may be separated according to their size or molecular weight via size exclusion chromatography or by SDS-PAGE (sodium dodecyl sulfate-polyacrylamide gel electrophoresis) analysis. In the art, proteins are often separated by 2D-PAGE and then further analysed by peptide mass fingerprinting to establish the protein identity. This is very useful for scientific purposes, since the detection limits are very low and nanogram amounts of protein are sufficient for analysis. Proteins may also be separated by polarity/hydrophobicity via high performance liquid chromatography or reversed-phase chromatography. Thus, methods for protein purification are well known to the skilled person.
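When purification by isoelectric point or ion exchange is planned, an approximate isoelectric point can be estimated from the amino acid sequence alone. The Python sketch below uses the Henderson-Hasselbalch relation and bisection; the pKa values are illustrative assumptions (published scales differ between tools), so the result is an estimate to guide buffer choice rather than a measured value, and the function names are hypothetical.

```python
# ASSUMED, illustrative pKa values; different published scales give different pI estimates
PKA_POSITIVE = {"N_term": 8.6, "K": 10.8, "R": 12.5, "H": 6.5}
PKA_NEGATIVE = {"C_term": 3.6, "D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}


def net_charge(sequence: str, ph: float) -> float:
    """Henderson-Hasselbalch estimate of the net charge of a linear polypeptide."""
    seq = sequence.upper()
    counts = {aa: seq.count(aa) for aa in "KRHDECY"}
    counts["N_term"] = counts["C_term"] = 1
    positive = sum(counts[g] / (1 + 10 ** (ph - pka)) for g, pka in PKA_POSITIVE.items())
    negative = sum(counts[g] / (1 + 10 ** (pka - ph)) for g, pka in PKA_NEGATIVE.items())
    return positive - negative


def isoelectric_point(sequence: str, tol: float = 1e-4) -> float:
    """pH at which the estimated net charge is zero, found by bisection
    (the estimated net charge decreases monotonically with pH)."""
    low, high = 0.0, 14.0
    while high - low > tol:
        mid = (low + high) / 2
        if net_charge(sequence, mid) > 0:
            low = mid   # still net positive: pI lies at a higher pH
        else:
            high = mid
    return (low + high) / 2


if __name__ == "__main__":
    print(round(isoelectric_point("G"), 2))
```

At a buffer pH above its pI a protein carries a net negative charge and tends to bind an anion exchanger; below its pI it tends to bind a cation exchanger.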
[0083] Furthermore, the invention relates in one embodiment to a protein or fusion protein having the activity of an endonuclease encoded by the nucleic acid molecule or vector of the invention.
[0084] The definitions for proteins or fusion proteins having the activity of an endonuclease encoded by the nucleic acid molecule or vector of the invention already given in the above embodiments pertaining to the nucleic acid molecule or vector of the invention apply explicitly also to this embodiment.
[0085] As a consequence of its endonuclease activity, another embodiment of the invention relates to the use of the protein or fusion protein of the invention to cleave a nucleic acid molecule, e.g. in one of the methods of the invention described below.
[0086] Furthermore, the present invention also relates to a kit comprising the nucleic acid molecule, the protein and/or the fusion protein of the invention. The various components of the kit may be packaged in one or more containers such as one or more vials. The vials may, in addition to the components, comprise preservatives or buffers for storage. In addition, the kit may contain instructions for use.
[0087] In another embodiment, the invention relates to a method of modifying a target sequence in the genome of a eukaryotic cell, the method comprising the step: (a) introducing into said cell the nucleic acid molecule, the vector or the protein or fusion protein of the invention.
[0088] The term "modifying" as used in accordance with the present invention refers to random and site-specific genomic manipulations resulting in changes in the nucleotide sequence of the genome of the eukaryotic host. When the fusion protein of the invention is introduced, site-specific modification of said "target sequence" in the genome is achieved via the DNA-binding domain. When only the protein of the invention is introduced, the "target sequence" is not a specific sequence, because the novel endonuclease is not site-specific. Thus, the protein of the invention may be used to introduce random mutations into a genome, i.e. the "target sequence" occurs multiple times within the genome and does not depend on a specific sequence motif. When modification is site-specific, as e.g. in the case of using the fusion protein of the invention, the genetic material comprising these changes in its nucleotide sequence is also referred to herein as the "modified target sequence". The term "modifying" includes, but is not limited to, substitution, insertion and deletion of one or more nucleotides within the target sequence. The end product of homologous recombination may also reflect a deletion of sequences: as is understood by the skilled person, homologous recombination always includes the incorporation of genetic material from the donor DNA sequence, but in this embodiment this incorporation leads to an overall deletion. It is understood by the skilled person that simply introducing double-strand breaks into the genome of a cell can give rise to modifications that result from homologous recombination (in the presence or absence of exogenous donor sequences) or from an endogenous DNA repair mechanism such as, e.g., non-homologous end joining (NHEJ), which is prone to introducing small deletions at the site of the double-strand break in the course of ligating the broken ends.
[0089] The term "substitution", as used herein, refers to the replacement of nucleotides with other nucleotides. The term includes for example the replacement of single nucleotides resulting in point mutations. Said point mutations can lead to an amino acid exchange in the resulting protein product but may also not be reflected on the amino acid level. Also encompassed by the term "substitution" are mutations resulting in the replacement of multiple nucleotides, such as for example parts of genes, such as parts of exons or introns as well as replacement of entire genes.
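Whether a point mutation is reflected at the amino acid level can be checked against the standard genetic code. The Python sketch below is illustrative: it assumes an in-frame coding sequence starting at the first base of a codon and uses the standard codon table; the function name is hypothetical.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... ('*' = stop)
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def classify_substitution(cds: str, position: int, new_base: str) -> str:
    """Classify a single-nucleotide substitution at 0-based `position` of an
    in-frame coding sequence as 'synonymous', 'missense' or 'nonsense'."""
    cds, new_base = cds.upper(), new_base.upper()
    offset = position % 3
    start = position - offset
    old_codon = cds[start:start + 3]
    if len(old_codon) != 3:
        raise ValueError("position is not inside a complete codon")
    if old_codon[offset] == new_base:
        raise ValueError("new_base equals the existing base")
    new_codon = old_codon[:offset] + new_base + old_codon[offset + 1:]
    old_aa, new_aa = CODON_TABLE[old_codon], CODON_TABLE[new_codon]
    if old_aa == new_aa:
        return "synonymous"  # not reflected at the amino acid level
    if new_aa == "*":
        return "nonsense"    # introduces a premature stop codon
    return "missense"        # amino acid exchange


if __name__ == "__main__":
    cds = "ATGCTGAAA"  # Met-Leu-Lys
    for pos, base in [(5, "A"), (7, "G"), (6, "T")]:
        print(pos, base, classify_substitution(cds, pos, base))
```

In this example, CTG to CTA still encodes leucine (synonymous), AAA to AGA changes lysine to arginine (missense), and AAA to TAA creates a stop codon (nonsense).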
[0090] The term "insertion" in accordance with the present invention refers to the incorporation of one or more nucleotides into a nucleic acid molecule. Insertion of parts of genes, such as parts of exons or introns, as well as insertion of entire genes is also encompassed by the term "insertion". When the number of inserted nucleotides is not divisible by three, the insertion can result in a frameshift mutation within a coding sequence of a gene. Such frameshift mutations alter the amino acids encoded by the gene downstream of the mutation. In some cases, the shifted reading frame encounters a premature stop codon, which terminates translation and results in the production of a truncated protein. When the number of inserted nucleotides is instead divisible by three, the resulting insertion is an "in-frame insertion". In this case, the reading frame remains intact after the insertion, and translation will most likely run to completion if the inserted nucleotides do not encode a stop codon. However, because of the inserted nucleotides, the resulting protein will contain, depending on the size of the insertion, one or multiple new amino acids that may affect the function of the protein.
[0091] The term "deletion" as used in accordance with the present invention refers to the loss of nucleotides or parts of genes, such as exons or introns, as well as entire genes. As defined with regard to the term "insertion", the deletion of a number of nucleotides that is not divisible by three will lead to a frameshift mutation, causing all of the codons occurring after the deletion to be read incorrectly during translation and potentially producing a severely altered and most likely non-functional protein. If a deletion does not result in a frameshift mutation, i.e. because the number of nucleotides deleted is divisible by three, the resulting protein is nonetheless altered, as it will lack, depending on the size of the deletion, one or several amino acids, which may affect the function of the protein.
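The frame rules for insertions and deletions described in the two preceding paragraphs can be checked by applying the change to a coding sequence and translating both versions. The Python sketch below is an illustration under stated assumptions: the coding sequence starts with its first codon, the standard genetic code applies, and the function names are hypothetical.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... ('*' = stop)
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(cds: str) -> str:
    """Translate the complete codons of a 5'->3' coding sequence ('*' = stop)."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, usable, 3))


def indel_consequence(cds: str, start: int, deleted: int = 0, inserted: str = "") -> dict:
    """Delete `deleted` nt and/or insert `inserted` at 0-based `start`, then report
    the frame effect and the protein up to the first stop codon (or up to the end
    of the supplied sequence if no stop codon is reached)."""
    net = len(inserted) - deleted
    if net == 0:
        raise ValueError("no net insertion or deletion")
    mutated = cds[:start] + inserted + cds[start + deleted:]
    return {
        "net_change": net,
        "frame": "in-frame" if net % 3 == 0 else "frameshift",
        "protein_before": translate(cds).split("*")[0],
        "protein_after": translate(mutated).split("*")[0],
    }


if __name__ == "__main__":
    cds = "ATGAAACCCGGGTTTTAA"  # Met-Lys-Pro-Gly-Phe-stop
    print(indel_consequence(cds, 3, inserted="T"))  # +1: frameshift
    print(indel_consequence(cds, 3, deleted=3))     # -3: in-frame deletion
```

In this example, inserting a single T immediately creates a TAA stop codon in the shifted frame, truncating the protein to its first residue, whereas deleting three nucleotides removes one amino acid (lysine) and leaves the rest of the reading frame intact.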
[0092] The above defined modifications are not restricted to coding regions in the genome, but can also occur in non-coding regions of the target genome, for example in regulatory regions such as promoter or enhancer elements or in introns.
[0093] Examples of modifications of the target genome include, without being limiting, the introduction of mutations into a wild type gene in order to analyse its effect on gene function; the replacement of an entire gene with a mutated gene or, alternatively, if the target sequence comprises mutation(s), the alteration of these mutations to identify which mutation is causative of a particular effect; the removal of entire genes or proteins or the removal of regulatory elements from genes or proteins; as well as the introduction of fusion partners, such as for example purification tags like the His-tag or the TAP-tag. In the latter case, the term "addition" may also be used instead of "insertion" so as to describe the preferable addition of a tag to a terminus of a polypeptide rather than within the sequence of a polypeptide.
[0094] The term "eukaryotic cell" as used herein refers to any cell of a unicellular or multicellular eukaryotic organism, including cells from animals such as vertebrates and from fungi and plants. Preferably, but without limitation, the cell is a mammalian cell. The term "mammalian cell" as used herein is well known in the art and refers to any cell belonging to an animal that is grouped into the class of mammalia. The term "cell" as used in connection with the present invention can refer to a single and/or isolated cell or to a cell that is part of a multicellular entity such as a tissue, an organism or a cell culture. In other words, the method can be performed in vivo, ex vivo or in vitro. Depending on the particular goal to be achieved through modifying the genome of a mammalian cell, cells of different mammalian subclasses such as prototheria or theria may be used. For example, within the subclass of theria, cells of animals of the infraclass eutheria, more preferably of the orders primates, artiodactyla, perissodactyla, rodentia and lagomorpha, are preferably used in the method of the invention as detailed below. Furthermore, within a species, a cell to be used in the method of the invention may be chosen based on its tissue type and/or capacity to differentiate, likewise depending on the goal to be achieved by modifying the genome. Three basic categories of cells make up the mammalian body: germ cells, somatic cells and stem cells. A germ cell is a cell that gives rise to gametes and thus is continuous through the generations. Stem cells can divide and differentiate into diverse specialized cell types as well as self-renew to produce more stem cells. In mammals there are two main types of stem cells: embryonic stem cells and adult stem cells. Somatic cells include all cells that are not gametes, gametocytes or undifferentiated stem cells. The cells of a mammal can also be grouped by their ability to differentiate.
A totipotent (also known as omnipotent) cell, such as a zygote (fertilized oocyte) and the subsequent blastomeres, is able to differentiate into all cell types of an adult organism including placental tissue, whereas pluripotent cells, such as embryonic stem cells, cannot contribute to extraembryonic tissue such as the placenta but have the potential to differentiate into any of the three germ layers endoderm, mesoderm and ectoderm. Multipotent progenitor cells have the potential to give rise to cells of multiple, but a limited number of, cell lineages.
[0095] Further, there are oligopotent cells that can develop into only a few cell types and unipotent cells (also sometimes termed precursor cells) that can develop into only one cell type. A cell to be used in the method of the invention can be derived from any of the four basic types of tissue, i.e. muscle tissue, nervous tissue, connective tissue and epithelial tissue; examples include hematopoietic stem cells or neuronal stem cells. To the extent human cells are envisaged for use in the method of the invention, it is preferred that such human cells are not obtained from a human embryo, in particular not via methods entailing destruction of a human embryo. On the other hand, human embryonic stem cells are at the skilled person's disposal, such as those taken from existing, commercially available embryonic stem cell lines. Accordingly, the present invention may be worked with human embryonic stem cells without any need to use or destroy a human embryo. Alternatively, or instead of human embryonic stem cells, pluripotent cells that resemble embryonic stem cells, such as induced pluripotent stem (iPS) cells, may be used, the generation of which is state of the art (Hargus G. et al., Proc Natl Acad Sci U S A 107:15921-15926; Jaenisch R. and Young R., 2008, Cell 132:567-582; Saha K. and Jaenisch R., 2009, Cell Stem Cell 5:584-595).
[0096] The term "nucleic acid molecules encoding said protein or fusion protein in expressible form" refers to a nucleic acid molecule which, upon expression in a cell or a cell-free system, results in a functional protein or fusion protein of the invention. Preferably, but without limitation, said nucleic acid is mRNA. Alternatively, DNA having appropriate transcription signals to enable expression, or cDNA, may be used.
[0097] Introduction of the protein, the fusion protein or the nucleic acid molecule encoding said protein or fusion protein in expressible form into a cell can be achieved by methods known in the art and depends on the nature of said proteins or nucleic acid molecules. For example, in the case of introducing nucleic acid molecules, said introduction can be achieved by chemical-based methods (calcium phosphate, liposomes, DEAE-dextran, polyethylenimine, nucleofection), non-chemical methods (electroporation, sonoporation, optical transfection, gene electrotransfer, hydrodynamic delivery), particle-based methods (gene gun, magnetofection, impalefection) and viral methods. Preferably, the nucleic acid molecules are introduced into the nucleus by methods such as, e.g., microinjection or nucleofection. Methods for carrying out microinjection are well known in the art and are described, for example, in Nagy et al. (Nagy A, Gertsenstein M, Vintersten K, Behringer R., 2003. Manipulating the Mouse Embryo. Cold Spring Harbor, New York: Cold Spring Harbor Laboratory Press) as well as in the examples herein below. It is understood by the skilled person that, depending on the method of introduction, it may be advantageous to adapt DNA molecules. For example, a linear DNA molecule may be more efficient in homologous recombination events when using electroporation to introduce said DNA molecule into, e.g., a mammalian cell, whereas a circular DNA molecule may be more advantageous when injecting cells.
[0098] All the definitions and preferred embodiments defined above with regard to the nucleic acid molecule, protein or fusion protein of the invention also apply mutatis mutandis in the context of the method of the invention.
[0099] In accordance with the present invention, the term "target sequence in the genome" refers to the genomic location that is to be modified by the method of the invention. The "target sequence in the genome" comprises but is not restricted to the nucleotide(s) subject to the particular modification. Furthermore, and preferably with regard to the fusion protein of the invention the term "target sequence in the genome" also comprises regions for binding of homologous sequences of a second nucleic acid molecule. In other words, the term "target sequence in the genome" also comprises the sequence flanking/surrounding the relevant nucleotide(s) to be modified. In some instances, the term "target sequence" may also refer to the entire gene to be modified.
[0100] Specific binding has been defined herein above and ensures that double-strand breaks are only introduced within said target sequence.
[0101] In a more preferred embodiment of the method of the invention, the modification of said target sequence is by homologous recombination with a donor nucleic acid sequence, further comprising the step: (b) introducing a nucleic acid molecule into said cell, wherein said nucleic acid molecule comprises said donor nucleic acid sequence, wherein said donor DNA sequence is flanked upstream by a first flanking element and downstream by a second flanking element, wherein said first and second flanking element are different and wherein each of said first and second flanking element are homologous to a continuous DNA sequence on either side of the double-strand break introduced in (a) of the method of the invention within said target sequence in the genome of said eukaryotic cell.
[0102] The term "homologous recombination", is used according to the definitions provided in the art. Thus, it refers to a mechanism of genetic recombination in which two DNA strands comprising similar nucleotide sequences exchange genetic material. Cells use homologous recombination during meiosis, where it serves to rearrange DNA to create an entirely unique set of haploid chromosomes, but also for the repair of damaged DNA, in particular for the repair of double strand breaks. The mechanism of homologous recombination is well known to the skilled person and has been described, for example by Paques and Haber (Paques F, Haber J E.; Microbiol Mol Biol Rev 1999; 63:349-404). In the method of the present invention, homologous recombination of the donor sequence is enabled by the presence of said first and said second flanking element being placed upstream (5') and downstream (3'), respectively, of said donor DNA sequence each of which being homologous to a continuous DNA sequence within said target sequence.
[0103] In accordance with the present invention, the term "donor DNA sequence" refers to a DNA sequence that serves as a template in the process of homologous recombination and that carries the modification that is to be introduced into the target sequence. By using this donor DNA sequence as a template, the genetic information, including the modifications, is copied into the target sequence within the genome of the cell by way of homologous recombination. In non-limiting examples, the donor nucleic acid sequence can be essentially identical to the part of the target sequence to be replaced, except for one differing nucleotide that introduces a point mutation upon homologous recombination, or it can consist of an additional gene previously not present in the target sequence. Conceivably, the nature of the donor DNA sequence, i.e. its length, base composition and similarity with the target sequence, depends on how the target sequence is to be modified as well as on the particular goal to be achieved by the modification. It is understood by those skilled in the art that said donor DNA sequence is flanked by sequences that are homologous to sequences within the target sequence, to enable homologous recombination leading to the incorporation of the donor DNA sequence into the genome of said cell. In addition to being homologous to a continuous DNA sequence within the genomic DNA, the first and the second flanking element are different to allow targeted homologous recombination to take place.
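The arrangement of step (b), i.e. first flanking element, donor DNA sequence, second flanking element, can be sketched as string assembly around the position of the double-strand break. The Python example below is a hypothetical illustration that assumes the genomic target region is known on the top strand and that the break position is given as the index of the first nucleotide 3' of the break; the tiny arm length only keeps the example readable and is far below the lengths preferred in this document.

```python
def build_donor_construct(genomic: str, break_index: int, donor_sequence: str,
                          arm_length: int, replaced_length: int = 0) -> str:
    """Assemble first flanking element + donor DNA sequence + second flanking element.

    genomic:         target region, 5'->3' (top strand)
    break_index:     0-based index of the first nucleotide 3' of the double-strand break
    replaced_length: genomic nucleotides 3' of the break replaced by the donor
                     sequence (0 = pure insertion)
    """
    left_start = break_index - arm_length
    right_start = break_index + replaced_length
    right_end = right_start + arm_length
    if left_start < 0 or right_end > len(genomic):
        raise ValueError("homology arms extend beyond the supplied genomic sequence")
    first_flank = genomic[left_start:break_index]   # homologous 5' of the break
    second_flank = genomic[right_start:right_end]   # homologous 3' of the break
    if first_flank == second_flank:
        raise ValueError("first and second flanking elements must be different")
    return first_flank + donor_sequence + second_flank


if __name__ == "__main__":
    region = "AAAAACCCCCGGGGGTTTTT"
    print(build_donor_construct(region, 10, "ATG", 5))                     # insertion
    print(build_donor_construct(region, 10, "ATG", 5, replaced_length=2))  # replacement
```

The explicit check that the two flanking elements differ mirrors the requirement stated above that the first and second flanking element are different.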
[0104] The term "homologous to a continuous DNA sequence on either side of the double-strand break introduced in (a) of the method of the invention within said target sequence", in accordance with the present invention, refers to regions having sufficient sequence identity to ensure specific binding to the target sequences that lie upstream and downstream of the location of the double-strand break. The term "homologous" as used herein can be interchanged with the term "identical" as outlined elsewhere herein with regard to varying levels of sequence identity. Methods to evaluate the identity level between two nucleic acid sequences are well known in the art and have been described herein above. Programs implementing these methods, in addition to providing a pairwise sequence alignment, also report the sequence identity level (usually in percent identity) and the probability of the alignment occurring by chance (P-value), and can further be used to predict the occurrence of specific binding.
[0105] Preferably, said first and second flanking element being "homologous to a continuous DNA sequence within said target sequence" (also referred to as "homology arms" in the art) have a sequence identity with the corresponding part of the target sequence of at least 95%, more preferred at least 97%, more preferred at least 98%, more preferred at least 99%, even more preferred at least 99.9% and most preferred 100%. The above defined sequence identities are defined only with respect to those parts of the target sequence which serve as binding sites for the homology arms, i.e. said first and said second flanking element. Thus, the overall sequence identity between the entire target sequence and the homologous regions of the nucleic acid molecule of step (b) of the method of modifying a target sequence of the present invention can differ from the above defined sequence identities, due to the presence of the part of the target sequence which is to be replaced by the donor DNA sequence.
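As a minimal illustration of the identity thresholds above, the following Python sketch computes an ungapped percent identity between a flanking element and the genomic segment it is meant to match, and reports the highest preferred threshold met. It assumes the two sequences are already aligned position by position and of equal length; the alignment programs referred to in paragraph [0104] additionally handle gaps, which this simplification does not. Function names are hypothetical.

```python
PREFERRED_THRESHOLDS = (95.0, 97.0, 98.0, 99.0, 99.9, 100.0)  # percent, from [0105]


def percent_identity(arm: str, genomic_segment: str) -> float:
    """Ungapped percent identity between two equal-length, pre-aligned sequences."""
    if not arm or len(arm) != len(genomic_segment):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(arm.upper(), genomic_segment.upper()))
    return 100.0 * matches / len(arm)


def highest_threshold_met(identity: float):
    """Highest preferred identity threshold reached, or None if below 95%."""
    met = [t for t in PREFERRED_THRESHOLDS if identity >= t]
    return max(met) if met else None


if __name__ == "__main__":
    genomic = "ACGT" * 25           # hypothetical 100-nt target segment
    arm = "T" + genomic[1:]         # one mismatch at the first position
    pid = percent_identity(arm, genomic)
    print(pid, highest_threshold_met(pid))
```

With a single mismatch over 100 nucleotides, the flanking element reaches 99% identity but not the 99.9% or 100% levels.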
[0106] The flanking elements homologous to the target sequence comprised in the DNA molecule have a length of at least 170 bp each. Preferably, the elements each have a length of at least 250 nucleotides, at least 300 nucleotides, at least 400 nucleotides, at least 500 nucleotides, such as at least 600 nucleotides, at least 750 nucleotides, more preferably at least 1000 nucleotides, such as at least 1500 nucleotides, even more preferably at least 2000 nucleotides and most preferably at least 2500 nucleotides. The maximum length of the elements homologous to the target sequence comprised in the nucleic acid molecule depends on the type of cloning vector used and can be up to 20,000 nucleotides each in E. coli high-copy plasmids using the ColE1 replication origin (e.g. pBluescript) or up to 300,000 nucleotides each in plasmids using the F-factor origin (e.g. in BAC vectors such as pTARBAC1).
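The length rules above can be expressed as a small check. This is a minimal sketch: the minimum (170 bp), the preferred minimum lengths and the per-vector maxima are taken from the paragraph, while the function name `check_arm` and the dictionary keys are hypothetical labels. The example uses the arm lengths of the Rab38-cht targeting vector (FIG. 12); treating that vector as ColE1-based is an assumption made only for illustration.

```python
from typing import Optional, Tuple

MIN_ARM_BP = 170
# Upper limit per homology arm, keyed by (illustrative) replication-origin label.
MAX_ARM_BP = {"ColE1": 20_000, "F-factor": 300_000}
# Preferred minimum lengths listed in the text, ascending.
PREFERRED_TIERS = (250, 300, 400, 500, 600, 750, 1000, 1500, 2000, 2500)


def check_arm(length_bp: int, origin: str) -> Tuple[bool, Optional[int]]:
    """Return (within allowed range, highest preferred minimum reached)."""
    if origin not in MAX_ARM_BP:
        raise ValueError(f"unknown replication origin: {origin}")
    allowed = MIN_ARM_BP <= length_bp <= MAX_ARM_BP[origin]
    tier = max((t for t in PREFERRED_TIERS if length_bp >= t), default=None)
    return allowed, tier


# 5'- and 3'-arm lengths from FIG. 12; the ColE1 origin is assumed.
for arm in (942, 2788):
    print(arm, check_arm(arm, "ColE1"))
```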
[0107] The DNA molecules comprising the donor DNA sequence and the flanking elements are modified so that the fusion protein does not introduce a double-strand break into the donor DNA sequence as part of a DNA molecule. This modification is necessary if the site-specific nuclease (fusion protein) binding site is contained undisrupted within one of the flanking elements, and is preferred if the site-specific nuclease (fusion protein) binding site is disrupted by the donor sequence, i.e. with one part located on each of the flanking elements. When the fusion protein is a TAL or zinc-finger nuclease, this can be achieved, e.g., by modifying either the binding or the cleavage motif (see Example 2, FIG. 12).
[0108] It will be appreciated by one of skill in the art that said DNA molecule to be introduced into the cell in item (b) of the method of the invention may comprise both a nucleic acid molecule (sequence) encoding said fusion protein in expressible form and the nucleic acid molecule comprising the donor nucleic acid sequence and the flanking elements homologous to the target sequence. Alternatively, the nucleic acid molecule of item (b) may be a distinct nucleic acid molecule, introduced in addition to the nucleic acid molecules of item (a) encoding said fusion protein in expressible form.
[0109] Also envisaged in a preferred embodiment of the method of the invention is that said cell is analysed for successful modification of said target sequence in the genome.
[0110] Methods for analysing for the presence or absence of a modification are well known in the art and include, without being limiting, assays based on physical separation of nucleic acid molecules, sequencing assays as well as cleavage and digestion assays and DNA analysis by the polymerase chain reaction (PCR).
[0111] Examples of assays based on physical separation of nucleic acid molecules include without limitation MALDI-TOF, denaturing gradient gel electrophoresis and other such methods known in the art, see for example Petersen et al., Hum. Mutat. 20 (2002) 253-259; Hsia et al., Theor. Appl. Genet. 111 (2005) 218-225; Tost and Gut, Clin. Biochem. 35 (2005) 335-350; Palais et al., Anal. Biochem. 346 (2005) 167-175.
[0112] Examples of sequencing assays include without limitation approaches of sequence analysis by direct sequencing, fluorescent SSCP in an automated DNA sequencer and pyrosequencing. These procedures are common in the art, see e.g. Adams et al. (Ed.), "Automated DNA Sequencing and Analysis", Academic Press, 1994; Alphey, "DNA Sequencing: From Experimental Methods to Bioinformatics", Springer Verlag Publishing, 1997; Ramon et al., J. Transl. Med. 1 (2003) 9; Meng et al., J. Clin. Endocrinol. Metab. 90 (2005) 3419-3422.
[0113] Examples of cleavage and digestion assays include without limitation restriction digestion assays such as restriction fragment length polymorphism assays (RFLP assays), RNase protection assays, assays based on chemical cleavage methods and enzyme mismatch cleavage assays, see e.g. Youil et al., Proc. Natl. Acad. Sci. U.S.A. 92 (1995) 87-91; Todd et al., J. Oral Maxil. Surg. 59 (2001) 660-667; Amar et al., J. Clin. Microbiol. 40 (2002) 446-452.
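An RFLP assay relies on a modification creating or destroying a restriction site, as with the BsaJI site removed by the chocolate mutation in FIG. 12. The following minimal sketch predicts fragment lengths of a linear PCR product cut at every occurrence of a degenerate recognition site. It assumes a palindromic site (so scanning the top strand suffices) and uses BsaJI's recognition sequence C^CNNGG (cut after the first base) as the example; the amplicons are hypothetical toy sequences, and the function name `digest_fragments` is illustrative.

```python
import re
from typing import List

# IUPAC codes needed for the recognition sites used here.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "[ACGT]",
         "R": "[AG]", "Y": "[CT]"}


def digest_fragments(seq: str, site: str, cut_offset: int) -> List[int]:
    """Fragment lengths after cutting a linear sequence at every occurrence
    of a palindromic recognition `site`; `cut_offset` is the top-strand cut
    position relative to the first base of the site."""
    body = "".join(IUPAC[base] for base in site.upper())
    # Zero-width lookahead so that overlapping site occurrences are all found.
    pattern = re.compile(f"(?=({body}))")
    cuts = [m.start() + cut_offset for m in pattern.finditer(seq.upper())]
    bounds = [0] + cuts + [len(seq)]
    return [end - start for start, end in zip(bounds, bounds[1:])]


# Toy amplicons: the wildtype carries one CCNNGG site, the mutant has lost it.
wildtype = "AAAAAAAAAA" + "CCATGG" + "TTTTTTTTTT"
mutant = "AAAAAAAAAA" + "CCATTG" + "TTTTTTTTTT"
print(digest_fragments(wildtype, "CCNNGG", 1))  # [11, 15]
print(digest_fragments(mutant, "CCNNGG", 1))    # [26]
```

A heterozygous sample would show the union of both patterns on a gel.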
[0114] Alternatively, instead of analysing the cells for the presence or absence of the desired modification, in particular in the case of sequence-specific modification, successfully modified cells may be selected by incorporation of appropriate selection markers. Selection markers include positive and negative selection markers, which are well known in the art and routinely employed by the skilled person. Non-limiting examples of selection markers include dhfr, gpt, neomycin, hygromycin, dihydrofolate reductase, G418 or glutamine synthase (GS) (Murphy et al., Biochem J. 1991, 227:277; Bebbington et al., Bio/Technology 1992, 10:169). Using these markers, the cells are grown in selective medium and the cells with the highest resistance are selected. Also envisaged are combined positive-negative selection markers, which may be incorporated into the target genome by homologous recombination or random integration. After positive selection, the first cassette, comprising the positive selection marker flanked by recombinase recognition sites, is exchanged for a second, marker-less cassette by recombinase-mediated cassette exchange. Clones containing the desired exchange cassette are then obtained by negative selection.
[0115] In a preferred embodiment of the method of the invention, the cell is selected from the group consisting of a mammalian or vertebrate cell, a plant cell or a fungal cell.
[0116] In another preferred embodiment of the method of the invention, the cell is an oocyte.
[0117] As used herein the term "oocyte" refers to the female germ cell involved in reproduction, i.e. the ovum or egg cell. In accordance with the present invention, the term "oocyte" comprises both oocytes before fertilisation and fertilised oocytes, which are also called zygotes. Thus, the oocyte before fertilisation comprises only maternal chromosomes, whereas an oocyte after fertilisation comprises both maternal and paternal chromosomes. After fertilisation, the oocyte remains in a double-haploid state for several hours, in mice for example for up to 18 hours after fertilisation. In accordance with the invention, the oocyte may be non-human.
[0118] In a more preferred embodiment of the method of the invention, the oocyte is a fertilised oocyte. The term "fertilised oocyte", as used herein, refers to an oocyte after fusion with the fertilizing sperm. For a period of many hours (such as up to 18 hours in mice) after fertilisation, the oocyte is in a double-haploid state, comprising one maternal haploid pronucleus and one paternal haploid pronucleus. After the two pronuclei have migrated together, their membranes break down and the two genomes condense into chromosomes, thereby reconstituting a diploid organism. Preferably, the mammalian or avian oocyte used in the method of the present invention is a fertilised mammalian or avian oocyte in the double-haploid state. Where oocytes are used as cells in the method of the invention, the protein, fusion protein or the nucleic acid molecule encoding said protein or fusion protein is introduced into the oocyte by microinjection. Microinjection into the oocyte can be carried out by injection into the nucleus (before fertilisation), the pronucleus (after fertilisation) and/or the cytoplasm (both before and after fertilisation). When a fertilised oocyte is employed, injection into the pronucleus is carried out for one pronucleus or for both pronuclei. Injection of the Tal-finger nuclease or of a DNA encoding the Tal-finger nuclease of step (a) of the method of modifying a target sequence of the present invention is preferably into the nucleus/pronucleus, while injection of an mRNA encoding the Tal-finger nuclease of step (a) is preferably into the cytoplasm. Injection of the nucleic acid molecule of step (b) is preferably into the nucleus/pronucleus. However, injection of the nucleic acid molecule of step (b) can also be carried out into the cytoplasm when said nucleic acid molecule is provided as a nucleic acid sequence having a nuclear localisation signal to ensure delivery into the nucleus/pronucleus.
Preferably, the microinjection is carried out by injection into both the nucleus/pronucleus and the cytoplasm. For example, the needle can be introduced into the nucleus/pronucleus and a first amount of the Tal-finger nuclease and/or nucleic acid molecule injected into the nucleus/pronucleus. While the needle is being withdrawn from the oocyte, a second amount of the Tal-finger nuclease and/or nucleic acid molecule is injected into the cytoplasm.
[0119] Methods for carrying out microinjection are well known in the art and are described, for example, in Nagy et al. (Nagy A, Gertsenstein M, Vintersten K, Behringer R., 2003. Manipulating the Mouse Embryo. Cold Spring Harbor, New York: Cold Spring Harbor Laboratory Press) as well as in the examples herein below.
[0120] Also preferred is that the nucleic acid molecule of step (b) of the method of the invention is (also) introduced into the cell by microinjection.
[0121] In another embodiment, the invention relates to a method of producing a non-human vertebrate or mammal carrying a modified target sequence in its genome, the method comprising transferring a cell produced by the method of the invention into a pseudopregnant female host.
[0122] In accordance with the present invention, the term "transferring a cell produced by the method of the invention into a pseudopregnant female host" includes the transfer of a fertilised oocyte but also the transfer of pre-implantation embryos of, for example, the 2-cell, 4-cell, 8-cell, 16-cell and blastocyst (70- to 100-cell) stage. Said pre-implantation embryos can be obtained by culturing the cell under appropriate conditions for it to develop into a pre-implantation embryo. Furthermore, injection or fusion of the cell with a blastocyst are appropriate methods of obtaining a pre-implantation embryo. Where the cell produced by the method of the invention is a somatic cell, derivation of induced pluripotent stem cells is required prior to transferring the cell into a female host, for example prior to culturing the cell or prior to injecting it into, or fusing it with, a pre-implantation embryo. Methods for transferring an oocyte or pre-implantation embryo to a pseudopregnant female host are well known in the art and are, for example, described in Nagy et al. (Nagy A, Gertsenstein M, Vintersten K, Behringer R., 2003. Manipulating the Mouse Embryo. Cold Spring Harbor, New York: Cold Spring Harbor Laboratory Press).
[0123] It is further envisaged, in accordance with the method of producing a non-human vertebrate or mammal carrying a modified target sequence in its genome, that a step of analysing for successful genomic modification is carried out before transplantation into the female host. As a non-limiting example, the oocyte can be cultured to the 2-cell, 4-cell or 8-cell stage and one cell can be removed without destroying or altering the resulting embryo. Analysis of the genomic constitution, e.g. the presence or absence of the genomic modification, can then be carried out using, for example, PCR or Southern blotting techniques or any of the methods described herein above. Such methods of genotyping prior to transplantation are known in the art and are described, for example, in Peippo et al. (Peippo J, Viitala S, Virta J, Raty M, Tammiranta N, Lamminen T, Aro J, Myllymaki H, Vilkki J.; Mol Reprod Dev 2007; 74:1373-1378).
[0124] Where the cell is an oocyte, the method of producing a non-human vertebrate or mammal carrying a modified target sequence in its genome comprises (a) modifying the target sequence in the genome of a vertebrate or mammalian oocyte in accordance with the method of the invention; (b) transferring the oocyte obtained in (a) to a pseudopregnant female host; and, optionally, (c) analysing the offspring delivered by the female host for the presence of the modification.
[0125] For this method of producing a non-human vertebrate or mammal, fertilisation of the oocyte is required. Said fertilisation can occur before the modification of the target sequence in step (a) in accordance with the method of producing a non-human vertebrate or mammal of the invention, i.e. a fertilised oocyte can be used for the method of modifying a target sequence in accordance with the invention. The fertilisation can also be carried out after the modification of the target sequence in step (a), i.e. a non-fertilised oocyte can be used for the method of modifying a target sequence in accordance with the invention, wherein the oocyte is subsequently fertilised before transfer into the pseudopregnant female host.
[0126] The step of analysing for the presence of the modification in the offspring delivered by the female host provides the necessary information whether or not the produced non-human vertebrate or mammal carries the modified target sequence in its genome. Thus, the presence of the modification is indicative of said offspring carrying a modified target sequence in its genome whereas the absence of the modification is indicative of said offspring not carrying the modified target sequence in its genome. Methods for analysing for the presence or absence of a modification have been detailed above.
[0127] The non-human vertebrate or mammal produced by the method of the invention is, inter alia, useful for studying the function of genes of interest and the phenotypic expression/outcome of modifications of the genome in such animals. It is furthermore envisaged that the non-human mammals of the invention can be employed as disease models and for testing therapeutic agents/compositions. Furthermore, the non-human vertebrate or mammal of the invention can also be used for livestock breeding.
[0128] In a preferred embodiment, the method of producing a non-human vertebrate or mammal further comprises culturing the cell to form a pre-implantation embryo or introducing the cell into a blastocyst prior to transferring it into the pseudopregnant female host. Methods for culturing the cell to form a pre-implantation embryo or introducing the cell into a blastocyst are well known in the art and are, for example, described in Nagy et al., loc. cit.
[0129] The term "introducing the cell into a blastocyst" as used herein encompasses injection of the cell into a blastocyst as well as fusion of a cell with a blastocyst. Methods of introducing a cell into a blastocyst are described in the art, for example in Nagy et al., loc. cit.
[0130] The present invention further relates to a non-human vertebrate or mammalian animal obtainable by the above described method of the invention.
[0131] In a preferred embodiment of the methods of the invention, the cell is from a mammal selected from the group consisting of rodents, dogs, felids, primates, rabbits, pigs, or cows, or the cell is from an avian selected from the group consisting of chickens, turkeys, pheasants, ducks, geese, quails and ratites including ostriches, emus and cassowaries, or the cell is from a fish such as for example a zebrafish, salmon, trout, common carp or koi carp.
[0132] All of the mammals, avians and fish described herein are well known to the skilled person and are taxonomically defined in accordance with the prior art and the common general knowledge of the skilled person.
[0133] Non-limiting examples of "rodents" are mice, rats, squirrels, chipmunks, gophers, porcupines, beavers, hamsters, gerbils, guinea pigs, degus, chinchillas, prairie dogs, and groundhogs.
[0134] Non-limiting examples of "dogs" include members of the subspecies Canis lupus familiaris as well as wolves, foxes, jackals, and coyotes.
[0135] Non-limiting examples of "felids" include members of the two subfamilies: the Pantherinae, including lions, tigers, jaguars and leopards, and the Felinae, including cougars, cheetahs, servals, lynxes, caracals, ocelots and domestic cats.
[0136] The term "primates", as used herein, refers to all monkeys, including for example cercopithecoids (Old World monkeys) and platyrrhines (New World monkeys), as well as lemurs, tarsiers, apes and marmosets (Callithrix jacchus).
[0137] As regards the embodiments characterized in this specification, in particular in the claims, it is intended that each embodiment mentioned in a dependent claim is combined with each embodiment of each claim (independent or dependent) said dependent claim depends from. For example, in case of an independent claim 1 reciting 3 alternatives A, B and C, a dependent claim 2 reciting 3 alternatives D, E and F and a claim 3 depending from claims 1 and 2 and reciting 3 alternatives G, H and I, it is to be understood that the specification unambiguously discloses embodiments corresponding to combinations A, D, G; A, D, H; A, D, I; A, E, G; A, E, H; A, E, I; A, F, G; A, F, H; A, F, I; B, D, G; B, D, H; B, D, I; B, E, G; B, E, H; B, E, I; B, F, G; B, F, H; B, F, I; C, D, G; C, D, H; C, D, I; C, E, G; C, E, H; C, E, I; C, F, G; C, F, H; C, F, I, unless specifically mentioned otherwise.
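The 27 combinations listed above are simply the Cartesian product of the alternatives recited in claims 1, 2 and 3. The following short Python sketch, using the standard `itertools.product`, reproduces the enumeration for the hypothetical claims of the example.

```python
from itertools import product

# Alternatives recited in the hypothetical claims 1, 2 and 3 of the example.
claim_1 = ["A", "B", "C"]
claim_2 = ["D", "E", "F"]
claim_3 = ["G", "H", "I"]

combinations = [", ".join(combo) for combo in product(claim_1, claim_2, claim_3)]
print(len(combinations))                   # 27
print(combinations[:3], combinations[-1])  # ['A, D, G', 'A, D, H', 'A, D, I'] C, F, I
```

`product` varies the last claim's alternatives fastest, which yields the same order as the list in the paragraph.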
[0138] Similarly, and also in those cases where independent and/or dependent claims do not recite alternatives, it is understood that if dependent claims refer back to a plurality of preceding claims, any combination of subject-matter covered thereby is considered to be explicitly disclosed. For example, in case of an independent claim 1, a dependent claim 2 referring back to claim 1, and a dependent claim 3 referring back to both claims 2 and 1, it follows that the combination of the subject-matter of claims 3 and 1 is clearly and unambiguously disclosed as is the combination of the subject-matter of claims 3, 2 and 1. In case a further dependent claim 4 is present which refers to any one of claims 1 to 3, it follows that the combination of the subject-matter of claims 4 and 1, of claims 4, 2 and 1, of claims 4, 3 and 1, as well as of claims 4, 3, 2 and 1 is clearly and unambiguously disclosed.
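The disclosed combinations correspond to every chain of back-references from a dependent claim to an independent claim. The sketch below enumerates those chains recursively for the claim structure of the example; it assumes that claims only refer back to earlier claims (an acyclic structure), and the function name `claim_chains` is illustrative.

```python
from typing import Dict, List, Tuple


def claim_chains(claim: int, refers_to: Dict[int, List[int]]) -> List[Tuple[int, ...]]:
    """Every chain of back-references from `claim` to an independent claim."""
    parents = refers_to.get(claim, [])
    if not parents:  # independent claim: the chain ends here
        return [(claim,)]
    chains = []
    for parent in parents:
        for chain in claim_chains(parent, refers_to):
            chains.append((claim,) + chain)
    return chains


# Claim structure of the example in the paragraph above.
refers_to = {1: [], 2: [1], 3: [2, 1], 4: [1, 2, 3]}
for chain in claim_chains(4, refers_to):
    print(" + ".join(map(str, chain)))
```

For claim 4 this yields the combinations 4 + 1, 4 + 2 + 1, 4 + 3 + 2 + 1 and 4 + 3 + 1, matching the four combinations named in the text.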
[0139] The Figures Show:
[0140] FIG. 1: TAL-Nuclease expression vectors.
[0141] The figure shows the structure and function of TAL-Nuclease fusion proteins, consisting of a sequence-specific DNA-binding domain and a nonspecific DNA cleavage (nuclease) domain. The DNA-binding domain can be assembled from the four types of 34 amino acid TAL peptide elements, each of which exhibits binding specificity for one of the DNA nucleotides through its amino acid positions 12 and 13 (NI-A; HD-C; NG-T; NN-G). Upon binding of the TAL element domain to the selected target DNA sequence, the nuclease domain of the fusion protein comes into close contact with the DNA double strand but does not cleave the DNA as a nuclease monomer. Only upon binding of a second TAL-Nuclease fusion protein to a second DNA target sequence located downstream of the binding site of the first fusion protein is the DNA double strand cleaved, through cooperation of the two nuclease domains that are in close contact.
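The one-to-one code between the residues at positions 12/13 and the bound nucleotide means that the repeat composition of a TAL array can be read directly from its target sequence. The following minimal sketch applies the four pairings given above; it deliberately ignores the invariant N- and C-terminal repeats (r0.5, rx.5) of the expression vector described for FIG. 4, and the function name `design_rvds` is illustrative.

```python
from typing import List

# Residues at positions 12/13 and the nucleotide each pair binds (FIG. 1).
RVD_FOR_BASE = {"A": "NI", "C": "HD", "T": "NG", "G": "NN"}


def design_rvds(target: str) -> List[str]:
    """Residue pairs of a TAL repeat array reading `target` 5'->3'."""
    try:
        return [RVD_FOR_BASE[base] for base in target.upper()]
    except KeyError as err:
        raise ValueError(f"non-ACGT base in target: {err}") from None


print(design_rvds("TCCAG"))  # ['NG', 'HD', 'HD', 'NI', 'NN']
```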
[0142] FIG. 2: TAL-Nuclease induced modification of genomic sequences.
[0143] The figure shows a pair of TAL-nuclease fusion proteins that bind up- and downstream of a selected target site within a genomic target gene. Upon the creation of a DNA double-strand break within the target site, two competing DNA repair mechanisms are strongly activated in cells: i) by homologous recombination, in the presence of an externally introduced gene targeting vector that comprises two homology regions to the target gene and a predesigned genetic modification/mutation, the preplanned modification is copied from the targeting vector into the genome; by this route any targeted gene modification (e.g. knock-out, knock-in) can be placed into the genome; ii) by the non-homologous end joining (NHEJ) repair pathway, the free DNA ends are closed by ligation without a repair template; by this route a variable number of nucleotides is frequently lost (knife symbol) before end ligation, which frequently results in a knockout allele of the target gene.
[0144] FIG. 3: Use of TAL-Nucleases for gene targeting in mammalian cell lines and zygotes.
[0145] A: For the generation of genetic modifications in mammalian cell lines, TAL-nuclease expression vectors can be transfected, with or without a specific gene targeting vector, into cultured cells. Upon nuclease expression and DNA repair, a fraction of the treated cells contains the desired genetic alteration. These cells can be isolated and further cultured as a pure genetically modified cell line. B: Upon the microinjection of TAL-nuclease mRNA, with or without a specific gene targeting vector, into fertilized mammalian oocytes (zygotes, isolated from wildtype females, e.g. mice), a knockout (KO) or knock-in (KI) allele can be directly introduced into the genome of the one-cell embryo. Pseudopregnant females deliver live offspring from microinjected oocytes. The offspring are genotyped for the presence of the induced genetic modification. Positive animals are selected for further breeding to establish a gene-targeted strain.
[0146] FIG. 4: TAL-Nuclease expression vectors.
[0147] The Tal nuclease expression vector pCAG-Tal-nuclease contains a CAG promoter region and a transcriptional unit comprising, upstream of a central pair of BsmBI restriction sites, an ATG start codon (arrow), a nuclear localisation sequence (NLS), a FLAG tag sequence (FLAG), a linker sequence, a segment coding for 110 amino acids of the Tal protein AvrBs3 (AvrN) and its invariable N-terminal Tal repeat (r0.5). Downstream of the BsmBI sites, the transcriptional unit contains an invariable C-terminal Tal repeat (rx.5), a segment coding for 44 amino acids derived from the Tal protein AvrBs3, PmeI and MluI restriction sites for the insertion of nuclease coding regions, and a polyadenylation signal sequence (pA). DNA segments coding for TAL repeat elements can be inserted into the BsmBI sites of pCAG-Tal-nuclease for the expression of variable TAL-nuclease fusion proteins. To create ArtTal1-nuclease expression vectors, the ArtTal1 array of TAL repeat elements, recognizing the specified 12 bp target sequence, was inserted into the BsmBI sites of pCAG-TAL-nuclease. Each 34 amino acid Tal repeat is drawn as a square indicating the repeat's amino acid code at positions 12/13 that confers binding to one of the DNA nucleotides of the target sequence (NI>A, NG>T, HD>C, NN>G) shown above.
Next, synthetic nuclease domain coding regions were inserted into the PmeI and MluI sites of pCAG-ArtTal1-nuclease to obtain the expression vectors: A: pCAG-ArtTal1-Alw including the nuclease domain of the AlwI restriction endonuclease, B: pCAG-ArtTal1-CleDORF including the nuclease domain of the CleDORF gene, C: pCAG-ArtTal1-Clo051 including the nuclease domain of the Clo051 gene, D: pCAG-ArtTal1-Mly including the nuclease domain of the MlyI restriction endonuclease, E: pCAG-ArtTal1-Pept071 including the nuclease domain of the Pept071 gene, F: pCAG-ArtTal1-Sbf including the nuclease domain of the SbfI restriction endonuclease, G: pCAG-ArtTal1-SdaI including the nuclease domain of the SdaI restriction endonuclease, H: pCAG-ArtTal1-Sst including the nuclease domain of the StsI restriction endonuclease, and I: pCAG-ArtTal1-Fok including the nuclease domain of the FokI restriction endonuclease.
[0148] FIG. 5: Amino acid sequence of the Clo051 protein
[0149] Sequence of the 587 amino acid Clo051 protein in the single-letter code. Indicated are the methionine at position 1 (M1), the tyrosine at position 587 (Y587) and the 199-residue nuclease domain between positions E389 and Y587. Further highlighted are the positions D455, D472 and K474 that are characteristic of the conserved active site of the 'PD-(D/E)XK' superfamily of enzymes interacting with DNA.
[0150] FIG. 6: Predicted structure of the Clo051 protein and its nuclease domain.
[0151] The tertiary structure of the Clo051 protein was predicted from its amino acid sequence (FIG. 5) using the I-TASSER software. The secondary structures are shown as alpha-helical and beta-stranded regions. Highlighted are the methionine at position 1 (M1), the glutamate residue 389 (E389) and tyrosine 587 (Y587). The protein chain between E389 and Y587 forms a separate folding domain that acts as a nuclease.
[0152] FIG. 7: TAL-Nuclease reporter plasmids and nuclease reporter assay.
[0153] A: TAL-nuclease reporter plasmids contain a CMV promoter region, a 400 bp sequence coding for the N-terminal segment of β-galactosidase and a stop codon. This unit is followed by a TAL binding target region consisting of two inversely oriented recognition sequences (underlined), separated by a 15 bp spacer region (NNN...), for the ArtTal1 array (a), the TalRab1 array (b), the TalRab2 array (c), or a hybrid binding region composed of one ArtTal1 and one TalRab2 recognition sequence (d). The TAL-nuclease target region is followed by the complete coding region for β-galactosidase and a polyadenylation signal (pA). To test for nuclease activity against the target sequence, a TAL-nuclease expression vector (FIG. 4) is transiently cotransfected with its corresponding reporter plasmid into HEK 293 cells. Upon expression of the TAL-nuclease protein, the reporter plasmid is opened by a nuclease-induced double-strand break within the TAL-nuclease target sequence (scissor symbol). B: The DNA regions adjacent to the double-strand break are identical over 400 bp and can be aligned and recombined (X) by homologous recombination DNA repair. C: Homologous recombination of an opened reporter plasmid results in a functional β-galactosidase expression vector that produces the β-galactosidase enzyme. After two days the transfected cells are lysed and the enzyme activity in the lysate is determined with a chemiluminescent reporter assay. The levels of reporter-catalysed light emission are measured and indicate TAL-nuclease activity in comparison to samples that were transfected with the reporter plasmid alone.
[0154] FIG. 8: Activity of Tal nuclease fusion proteins in HEK 293 cells.
[0155] To test for the nuclease activity of TAL-nuclease domain fusion proteins, expression vectors for the ArtTal1-AlwI, -CleDORF, -Clo051, -MlyI, -FokI, -Pept071, -SbfI, -SdaI, and -StsI proteins (FIG. 4) were transfected together with the ArtTal1 reporter plasmid (FIG. 7) into HEK 293 cells. Specific nuclease activity against the reporter plasmid's target sequence leads to homologous recombination and the expression of β-galactosidase. Two days after transfection the cell populations were lysed and the β-galactosidase activity was determined with a chemiluminescent reporter assay. The levels of light emission were normalised in relation to the activity of a cotransfected luciferase expression plasmid (pLuciferase) and are shown in comparison to the activity of a positive control β-galactosidase expression vector. The bar for each transfected sample represents the mean value and SD derived from three culture wells transfected side by side. A: The transfection of the ArtTal1 reporter plasmid without a nuclease expression vector results in a low background level of β-galactosidase. The cotransfection of pCAG-ArtTal1-AlwI, -CleDORF, and -MlyI with the ArtTal1 reporter plasmid did not lead to a significant increase of reporter expression, indicating that the ArtTal1-AlwI, -CleDORF, and -MlyI fusion proteins do not exhibit nuclease activity. In contrast, the cotransfection of the ArtTal1 reporter and the pCAG-ArtTal1-Clo051 plasmids resulted in a strong increase of reporter expression, indicating that the ArtTal1-Clo051 fusion protein exhibits target-specific nuclease activity in 293 cells.
B: In an independent transfection experiment, the cotransfection of pCAG-ArtTal1-Pept071, -SbfI, -SdaI and -Sst with the ArtTal1 reporter plasmid did not lead to a significant increase of reporter expression, as compared to the ArtTal1 reporter plasmid alone, indicating that the ArtTal1-Pept071, -SbfI, -SdaI, and -StsI fusion proteins do not exhibit nuclease activity. In contrast, the cotransfection of the ArtTal1 reporter and the pCAG-ArtTal1-FokI plasmids resulted in an increase of reporter expression, indicating the nuclease activity of the ArtTal1-FokI fusion protein in 293 cells.
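The normalisation described for FIG. 8 can be sketched as follows. This is a minimal sketch under stated assumptions: each well's β-galactosidase signal is divided by its luciferase signal, the ratios are expressed as a percentage of the mean ratio of the positive control, and mean and SD are taken over the three wells. The exact formula used for the figure is not given in the text, and the readings below are arbitrary placeholders, not measured data.

```python
from statistics import mean, stdev
from typing import Sequence, Tuple


def normalised_activity(bgal: Sequence[float], luc: Sequence[float],
                        pos_bgal: Sequence[float],
                        pos_luc: Sequence[float]) -> Tuple[float, float]:
    """Mean and SD of per-well beta-gal/luciferase ratios, expressed as a
    percentage of the positive control's mean ratio (assumed scheme)."""
    pos_ratio = mean(b / lu for b, lu in zip(pos_bgal, pos_luc))
    percents = [100.0 * (b / lu) / pos_ratio for b, lu in zip(bgal, luc)]
    return mean(percents), stdev(percents)


# Placeholder readings for three wells (arbitrary units, not measured data).
m, sd = normalised_activity(bgal=[100, 120, 80], luc=[100, 100, 100],
                            pos_bgal=[200, 200, 200], pos_luc=[100, 100, 100])
print(f"{m:.1f} ± {sd:.1f} % of positive control")  # 50.0 ± 10.0 % of positive control
```

Normalising each well to its own luciferase signal corrects for differences in transfection efficiency between wells before the wells are averaged.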
[0156] FIG. 9: Target sequence specificity of the ArtTal1-Clo051 nuclease.
[0157] To test for the specificity of the ArtTal1-Clo051 nuclease against the predesigned target sequence in comparison to unrelated DNA sequences, the pCAG-ArtTal1-Clo051 expression vector was cotransfected with the corresponding ArtTal1-reporter plasmid or with the TalRab1 or TalRab2 reporter plasmids (FIG. 7), which contain unrelated target sequences, into HEK 293 cells. Strong nuclease activity developed only in the specific combination of the ArtTal1-Clo051 expression vector together with the ArtTal1-reporter plasmid, indicating that the ArtTal1-Clo051 nuclease acts specifically against the predesigned target sequence.
[0158] FIG. 10: Characterisation of the cooperativity of TAL-Clo051 nuclease fusion proteins. A: To test for the cooperativity of the Clo051 nuclease domains of a pair of TAL-Clo051 fusion proteins, expression vectors for the ArtTal1-Clo051 or TalRab2-Clo051 fusion proteins were cotransfected with the corresponding ArtTal1- or TalRab2-reporter plasmid (FIG. 7), and the results were compared to cotransfection with the ArtTal1/TalRab2-reporter plasmid, which contains a hybrid target region (FIG. 7). Significant nuclease activity developed only when TAL-nuclease expression vectors were combined with reporter plasmids containing two identical, inverse copies of the corresponding TAL array target sequence, but not with the ArtTal1/TalRab2-reporter plasmid, which contains only a single binding sequence for each of the ArtTal1-Clo051 and TalRab2-Clo051 fusion proteins. This result indicates that two Clo051 nuclease domains must cooperate to induce a DNA double-strand break, whereas a single Clo051 nuclease domain does not act as a nuclease. B: The cotransfection of the ArtTal1/TalRab2-reporter plasmid with both expression vectors for ArtTal1-Clo051 and TalRab2-Clo051, but not with ArtTal1-Clo051 or -Fok alone, results in strong nuclease activity, as compared to the transfection of the ArtTal1/TalRab2 reporter plasmid alone. This result indicates that nuclease activity and the induction of double-strand breaks in the target region occur only upon the binding of two TAL-Clo051 fusion proteins and the interaction of a pair of Clo051 nuclease domains.
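The cooperativity result can be summarised as a simple rule: a double-strand break is expected only if a TAL-Clo051 protein is expressed for each of the two half-sites of the reporter. The sketch below encodes that simplified rule; it ignores binding efficiency, spacer length and protein levels, and the class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Set


@dataclass(frozen=True)
class Reporter:
    """Target region of a reporter plasmid: two half-sites around a spacer,
    each named after the TAL array that recognises it."""
    left_site: str
    right_site: str


def predicts_cleavage(reporter: Reporter, expressed: Set[str]) -> bool:
    """Simplified rule: a break needs a TAL-Clo051 protein bound at each
    half-site, so that two Clo051 domains can meet over the spacer."""
    return reporter.left_site in expressed and reporter.right_site in expressed


hybrid = Reporter("ArtTal1", "TalRab2")
print(predicts_cleavage(Reporter("ArtTal1", "ArtTal1"), {"ArtTal1"}))  # True
print(predicts_cleavage(hybrid, {"ArtTal1"}))                          # False
print(predicts_cleavage(hybrid, {"ArtTal1", "TalRab2"}))               # True
```

The three calls mirror the homodimeric reporter, the hybrid reporter with a single fusion protein (panel A), and the hybrid reporter with both fusion proteins (panel B).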
[0159] FIG. 11: Design of a TAL-Clo051 fusion protein pair in accordance with the present invention, recognizing the mouse Rab38 gene.
[0160] TAL nucleases recognizing a target sequence within exon 1 of the mouse Rab38 gene. The trinucleotide representing codon 19 is underlined. Indicated is each of the two 14-nucleotide sequences recognised by one of the indicated TAL-Clo051 fusion proteins, RabChtTal1- and RabChtTal2-Clo051.
[0161] The two 14 bp target sequences flank a central 15 bp spacer sequence that is cleaved by the Clo051 nuclease domains.
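The layout of such a pair (one half-site read on the top strand, the other on the bottom strand, separated by a 15 bp spacer) can be checked in a candidate sequence with a short scan. The following is a minimal sketch: the half-site and spacer sequences are hypothetical placeholders, not the Rab38 sequences of FIG. 11, and the function name `find_pair` is illustrative.

```python
from typing import Optional, Tuple

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an ACGT sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def find_pair(genomic: str, left: str, right: str,
              spacer_len: int = 15) -> Optional[Tuple[int, str]]:
    """Find `left` on the top strand followed, after exactly `spacer_len` bp,
    by `right` read on the bottom strand. Returns (start, spacer) or None."""
    g, left = genomic.upper(), left.upper()
    right_top = revcomp(right)  # bottom-strand half-site as seen on the top strand
    start = g.find(left)
    while start != -1:
        spacer_start = start + len(left)
        right_start = spacer_start + spacer_len
        if g[right_start:right_start + len(right_top)] == right_top:
            return start, g[spacer_start:right_start]
        start = g.find(left, start + 1)
    return None


# Hypothetical 14 bp half-sites and a 15 bp spacer (not the Rab38 sequences).
left_site = "TGACCTGAAGCTGA"
right_site = "TCAGGTTCCAAGGT"
spacer = "GATCGATCGATCGAT"
genomic = "AAAA" + left_site + spacer + revcomp(right_site) + "CCCC"
print(find_pair(genomic, left_site, right_site))  # (4, 'GATCGATCGATCGAT')
```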
[0162] FIG. 12: Strategy for the modification of the mouse Rab38 gene in ES cells and zygotes using TAL-Clo051 fusion proteins.
[0163] Within exon 1 of the wildtype Rab38 gene (Rab38 WT), the positions of the binding sites for the TAL nuclease pair RabChtTal1- and RabChtTal2-Clo051 are indicated. The Rab38-cht targeting vector contains a 942 bp 5'-homology region and a 2788 bp 3'-homology region flanking the Rab38 TAL recognition sites. Within exon 1, two nucleotide changes within codon 19 (Gta) of Rab38 create a chocolate (cht) missense mutation coding for valine (Val) instead of the wildtype (WT) glycine (Gly), and remove a BsaJI restriction site. In each of the adjacent Rab38 TAL recognition sites, several silent mutations were introduced to prevent the binding of Rab38 TAL proteins to the targeting vector. The induction of a double-strand break within the wildtype Rab38 gene by the RabChtTal protein pair stimulates homologous recombination with the Rab38-cht targeting vector and integrates the chocolate missense and the silent mutations into the genome.
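Whether nucleotide changes introduced into a targeting vector are silent or missense can be verified by translating the original and mutated coding sequences. The sketch below builds the standard genetic code from its conventional TCAG-ordered string and reports whether the protein is unchanged and how many nucleotides differ. The codons used are toy examples, not the Rab38 sequence, and the function name `mutation_effect` is illustrative.

```python
from itertools import product
from typing import Tuple

# Standard genetic code, codons enumerated in TCAG order ('*' = stop).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(cds: str) -> str:
    """Translate an in-frame coding sequence (a trailing partial codon is ignored)."""
    cds = cds.upper()
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds) - 2, 3))


def mutation_effect(original: str, mutated: str) -> Tuple[bool, int]:
    """(protein unchanged?, number of substituted nucleotides)."""
    if len(original) != len(mutated):
        raise ValueError("sequences must have equal length")
    changes = sum(a != b for a, b in zip(original.upper(), mutated.upper()))
    return translate(original) == translate(mutated), changes


# Toy codons, not the Rab38 sequence.
print(mutation_effect("GGCCTGAAA", "GGTCTCAAG"))  # (True, 3)
print(mutation_effect("GGC", "GTA"))              # (False, 2)
```

The first call corresponds to silent changes of the kind placed in the TAL recognition sites; the second shows a two-nucleotide glycine-to-valine missense change.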
[0164] FIG. 13: Isolation of hyperactive Clo051 nuclease mutants.
[0165] The figure shows the primary sequence of the Clo051 nuclease domain between positions E389 and Y587. Indicated is the distribution of the positively charged arginine (R) and lysine (K) residues (filled squares) and of the negatively charged glutamate (E) and aspartate (D) residues (open circles). Triangles indicate positions S423 and R446. These residues constitute a three-dimensional framework of charges within the Clo051 domain that determines the unique tertiary structure of this nuclease, as modelled in the structure of FIG. 6. Certain replacements of polar by non-polar residues, or of non-polar by polar residues, e.g. at positions 423 and 446, change the three-dimensional structure of the protein chain and result in more efficient nuclease activity.
[0166] FIG. 14: Activity of ArtTal1-Clo051 nuclease on a genomic reporter in HEK 293 cells
[0167] HEK293 cells harboring genomically integrated copies of the pCMV-Rab-Reporter(hygro) reporter construct were transfected with pBluescript or pCAG-ArtTal1-Clo051. Specific nuclease activity against the reporter's target sequence leads to homologous recombination and the expression of β-galactosidase. Two days after transfection the cell populations were fixed, and the fraction of β-galactosidase-expressing cells was determined by histochemical X-Gal staining. A: X-Gal-stained reporter cell culture after transfection with pBluescript. B: X-Gal-stained reporter cell culture after transfection with the pCAG-ArtTal1-Clo051 nuclease expression vector.
[0168] The examples illustrate the invention:
EXAMPLE 1
Construction of Expression and Reporter Vectors for Tal Nucleases and Detection of Specific Nuclease Activity
[0169] Construction of TAL-Nuclease Expression Vectors
[0170] For the expression of TAL-nucleases in mammalian cells we designed the generic expression vector pCAG-TAL-nuclease (SEQ ID NO: 3) (FIG. 4), which contains a CAG hybrid promoter region and a transcriptional unit comprising a sequence coding for an N-terminal peptide of 176 amino acids (SEQ ID NO: 4) of TAL nuclease fusion proteins, located upstream of a pair of BsmBI restriction sites.
[0171] This N-terminal region includes an ATG start codon, a nuclear localisation sequence, a FLAG tag sequence, a glycine-rich linker sequence, a segment coding for 110 amino acids of the TAL protein AvrBs3, and the invariable N-terminal TAL repeat of the Hax3 TAL effector. Downstream of the central BsmBI sites, the transcriptional unit contains 78 codons (SEQ ID NO: 5), comprising an invariable C-terminal TAL repeat (34 amino acids) and 44 residues derived from the TAL protein AvrBs3, followed by PmeI and MluI restriction sites for the insertion of a nuclease coding region and by a polyadenylation signal sequence (pA). DNA segments coding for arrays of TAL repeats designed to bind a TAL nuclease target sequence can be inserted into the BsmBI sites of pCAG-TAL-nuclease, in frame with the up- and downstream coding regions, for the expression of predesigned TAL-nuclease proteins. To generate TAL-nuclease vectors for expression in mammalian cells we inserted a synthetic DNA segment with the coding region of an array of 12 TAL repeats, designated ArtTal1 (SEQ ID NO: 6), into the BsmBI sites of pCAG-TAL-nuclease to derive the plasmid pCAG-ArtTal1-nuclease (SEQ ID NO: 7). The TAL element array ArtTal1 recognises the artificial DNA target sequence 5'-ATTCTGGGACGT-3' (SEQ ID NO: 62) (FIG. 4). In another example we inserted a synthetic DNA segment with the coding region of an array of 14 TAL repeats, designated TalRab2 (SEQ ID NO: 8), into the BsmBI sites of pCAG-TAL-nuclease to derive the plasmid pCAG-TalRab2-nuclease (SEQ ID NO: 9). The TAL element array TalRab2 recognises the DNA target sequence 5'-GGTGGCCCGGTAGT-3' (SEQ ID NO: 63) (FIG. 7), which occurs within the mouse Rab38 gene. The TAL target sequences were selected such that the binding regions of the TAL proteins are preceded by a T nucleotide.
Following the sequence downstream of the initial T in the 5'>3' direction, specific TAL DNA-binding domains were combined into arrays of 12 (ArtTal1) (FIG. 4) or 14 (TalRab2) TAL elements. Each TAL element motif consists of 34 amino acids, of which positions 12 and 13 determine the specificity for recognition of A, G, C or T within the target sequence. To derive TAL element DNA-binding domains we used the TAL effector motif (repeat) #11 of the Xanthomonas Hax3 protein (GenBank accession No. AY993938.1) (LTPEQVVAIASNIGGKQALETVQRLLPVLCQAHG) (SEQ ID NO: 64), with amino acids N12 and I13, to recognize A; the TAL effector motif (repeat) #5 (LTPQQVVAIASHDGGKQALETVQRLLPVLCQAHG) (SEQ ID NO: 65) derived from the Hax3 protein, with amino acids H12 and D13, to recognize C; and the TAL effector motif (repeat) #4 (LTPQQVVAIASNGGGKQALETVQRLLPVLCQAHG) (SEQ ID NO: 66) from the Xanthomonas Hax4 protein (GenBank accession No. AY993939.1), with amino acids N12 and G13, to recognize T. To recognize a target G nucleotide we used the TAL effector motif (repeat) #4 from the Hax4 protein with amino acids 12 and 13 set to N (N12 and N13) (LTPQQVVAIASNNGGKQALETVQRLLPVLCQAHG) (SEQ ID NO: 67).
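The repeat-to-base assignment above can be illustrated with a short Python sketch. It is not the authors' cloning procedure: it simply maps each target nucleotide downstream of the initial T to one of the four 34-residue repeat templates quoted above (SEQ ID NOs: 64-67), reads back the residues at positions 12 and 13, and estimates the fusion protein length from the module sizes given for pCAG-TAL-nuclease. The function names are hypothetical, and the length estimate assumes that the BsmBI, PmeI and MluI cloning junctions add no residues, which the text does not state.

```python
# Illustrative sketch: map a TAL target sequence (downstream of the initial T)
# to the 34-residue repeat templates quoted in the text (SEQ ID NOs: 64-67).

REPEATS = {
    "A": "LTPEQVVAIASNIGGKQALETVQRLLPVLCQAHG",  # Hax3 repeat #11, N12/I13
    "C": "LTPQQVVAIASHDGGKQALETVQRLLPVLCQAHG",  # Hax3 repeat #5, H12/D13
    "T": "LTPQQVVAIASNGGGKQALETVQRLLPVLCQAHG",  # Hax4 repeat #4, N12/G13
    "G": "LTPQQVVAIASNNGGKQALETVQRLLPVLCQAHG",  # Hax4 repeat #4 variant, N12/N13
}

N_TERMINAL_AA = 176  # SEQ ID NO: 4, encoded upstream of the BsmBI sites
C_TERMINAL_AA = 78   # SEQ ID NO: 5: C-terminal repeat (34) + 44 AvrBs3 residues


def build_array(target: str) -> list:
    """Return one repeat per target nucleotide, in 5'->3' order."""
    target = target.upper()
    unknown = set(target) - set(REPEATS)
    if unknown:
        raise ValueError(f"unsupported bases in target: {sorted(unknown)}")
    return [REPEATS[base] for base in target]


def rvds(array: list) -> str:
    """Residues at positions 12 and 13 (1-based) of each repeat."""
    return " ".join(repeat[11:13] for repeat in array)


def approx_fusion_length(target: str, nuclease_aa: int) -> int:
    """Approximate fusion length; ASSUMES the cloning junctions add no residues."""
    return N_TERMINAL_AA + 34 * len(target) + C_TERMINAL_AA + nuclease_aa


art_tal1 = build_array("ATTCTGGGACGT")    # SEQ ID NO: 62, 12 repeats
tal_rab2 = build_array("GGTGGCCCGGTAGT")  # SEQ ID NO: 63, 14 repeats

print(rvds(art_tal1))  # NI NG NG HD NG NN NN NN NI HD NN NG
print(approx_fusion_length("ATTCTGGGACGT", 199))  # 861 (Clo051 domain: 199 aa)
```

Keeping the templates in a dictionary keyed by base makes the one-repeat-per-nucleotide rule explicit and lets the reverse check (reading positions 12 and 13 back) confirm that each template carries the intended residue pair.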
[0172] Next, we constructed fusion proteins of the ArtTal1 DNA-binding domain with protein domains derived from known or putative nucleases and tested whether these TAL-nuclease fusion proteins are able to induce a double-strand break next to the DNA bound by the TAL recognition region. For this purpose we inserted synthetic DNA segments comprising the coding regions of eight putative nuclease domains and of the known nuclease domain of FokI (SEQ ID NO: 10) into the PmeI and MluI sites of the pCAG-ArtTal1-nuclease plasmid. Among the eight putative nuclease domains, we selected domains from the five known restriction enzymes AlwI (SEQ ID NO: 11), MlyI (SEQ ID NO: 12), SbfI (SEQ ID NO: 13), SdaI (SEQ ID NO: 14) and StsI (SEQ ID NO: 15). In addition, we selected putative nuclease domains of three as yet uncharacterised, hypothetical microbial genes, designated here as 'CleDORF' (SEQ ID NO: 16) (NCBI Reference Sequence: ZP_02080987.1, derived from the genome of Clostridium leptum DSM753), 'Clo051' (SEQ ID NO: 17) (NCBI Reference Sequence: ZP_05132802.1, derived from the genome of Clostridium spec. 7_2_43FAA) and 'Pept071' (SEQ ID NO: 18) (NCBI Reference Sequence: ZP_07399918.1, derived from the genome of Peptoniphilus duerdenii ATCC BAA-1640). These proteins were selected on the basis of characteristic sequence features that are compatible with the conserved active site of the DNA-interacting 'PD-(D/E)XK' superfamily of enzymes (Kosinski, J., et al. (2005). BMC Bioinformatics, 6, 172) (see FIG. 6 for the Clo051 protein).
[0173] In particular, the 587-residue Clo051 protein can be classified as a member of the PD-(D/E)XK protein family by the location of the amino acid pairs P454/D455 (PD motif) and D472/K474 (DXK motif) (FIG. 5). To determine whether the Clo051 protein contains a separate nuclease domain, we predicted its three-dimensional structure from its primary amino acid sequence using the I-TASSER software (Roy, A. et al. (2010). Nat Protoc., 5(4):725-38). As shown in FIG. 6, the Clo051 protein is composed of two protein domains. The C-terminal domain of Clo051, beginning approximately at residue E389, contains the PD-(D/E)XK family consensus motif and appears to be a non-specific nuclease domain.
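The motif positions above refer to the full-length 587-residue Clo051 protein, whereas the domain sequence of SEQ ID NO: 1 starts at E389. The following Python sketch (hypothetical helper names; the sequence is transcribed into one-letter code from SEQ ID NO: 1 in the sequence listing) shows the numbering offset and confirms that P454/D455 and D472/K474 fall on the expected residues of the domain.

```python
# Sketch: relate full-length Clo051 numbering to the 199-residue domain of
# SEQ ID NO: 1 (residues E389..Y587), transcribed here in one-letter code.

CLO051_DOMAIN = (
    "EGIKSNISLLKDELRG" "QISHISHEYLSLIDLA" "FDSKQNRLFEMKVLEL"
    "LVNEYGFKGRHLGGSR" "KPDGIVYSTTLEDNFG" "IIVDTKAYSEGYSLPI"
    "SQADEMERYVRENSNR" "DEEVNPNKWWENFSEE" "VKKYYFVFISGSFKGK"
    "FEEQLRRLSMTTGVNG" "SAVNVVNLLLGAEKIR" "SGEMTIEELERAMFNN"
    "SEFILKY"
)
DOMAIN_START = 389  # first residue of the domain in full-length numbering


def residue(position: int) -> str:
    """One-letter residue at a full-length Clo051 position (389..587)."""
    index = position - DOMAIN_START
    if not 0 <= index < len(CLO051_DOMAIN):
        raise IndexError(f"position {position} lies outside E389..Y587")
    return CLO051_DOMAIN[index]


# PD motif (P454/D455) and D-X-K motif (D472/K474) named in the text
motif = {pos: residue(pos) for pos in (454, 455, 472, 473, 474)}
print(motif)  # {454: 'P', 455: 'D', 472: 'D', 473: 'T', 474: 'K'}
```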
[0174] For the expression of these protein domains in mammalian cells we used synthetic coding regions optimised for mammalian codon usage and inserted segments comprising the putative nuclease domains of AlwI (SEQ ID NO: 19), CleDORF (SEQ ID NO: 20), Clo051 (SEQ ID NO: 1), MlyI (SEQ ID NO: 21), Pept071 (SEQ ID NO: 22), SbfI (SEQ ID NO: 23), SdaI (SEQ ID NO: 24), StsI (SEQ ID NO: 25) and the known nuclease domain of FokI (SEQ ID NO: 26) into the PmeI and MluI sites of the pCAG-ArtTal1-nuclease plasmid, to derive the expression vectors pCAG-ArtTal1-AlwI (SEQ ID NO: 27) (FIG. 4A), pCAG-ArtTal1-CleDORF (SEQ ID NO: 28) (FIG. 4B), pCAG-ArtTal1-Clo051 (SEQ ID NO: 29) (FIG. 4C), pCAG-ArtTal1-MlyI (SEQ ID NO: 30) (FIG. 4D), pCAG-ArtTal1-Pept071 (SEQ ID NO: 31) (FIG. 4E), pCAG-ArtTal1-SbfI (SEQ ID NO: 32) (FIG. 4F), pCAG-ArtTal1-SdaI (SEQ ID NO: 33) (FIG. 4G), pCAG-ArtTal1-StsI (SEQ ID NO: 34) (FIG. 4H) and pCAG-ArtTal1-FokI (SEQ ID NO: 35) (FIG. 4I). These expression vectors code for the TAL fusion proteins designated ArtTal1-AlwI (SEQ ID NO: 36), ArtTal1-CleDORF (SEQ ID NO: 37), ArtTal1-Clo051 (SEQ ID NO: 38), ArtTal1-MlyI (SEQ ID NO: 39), ArtTal1-Pept071 (SEQ ID NO: 40), ArtTal1-SbfI (SEQ ID NO: 41), ArtTal1-SdaI (SEQ ID NO: 42), ArtTal1-StsI (SEQ ID NO: 43) and ArtTal1-FokI (SEQ ID NO: 44).
[0175] Construction of TAL Nuclease Reporter Plasmids
[0176] To determine the activity and specificity of TAL nuclease domain fusion proteins in mammalian cells, we constructed TAL nuclease reporter plasmids that contain two copies of a TAL DNA target sequence in inverse orientation, separated by a 15-nucleotide spacer region (FIG. 7a-d). This configuration makes it possible to measure the activity of a single type of TAL nuclease acting as a homodimer of two protein molecules bound to the inverse pair of target sequences of the reporter plasmid. Upon DNA binding and interaction of the two nuclease domains, the reporter plasmid DNA is cleaved within the 15 bp spacer region, creating a double-strand break.
[0177] The TAL nuclease reporter plasmids contain a CMV promoter region, a 400 bp sequence coding for the N-terminal segment of β-galactosidase, and a stop codon. This unit is followed by the TAL nuclease target region (consisting of two inversely oriented recognition sequences separated by a 15 bp spacer region) for ArtTal1 fusion proteins in the plasmid ArtTal1-reporter (SEQ ID NO: 45) (FIG. 7a), by the unrelated target sequence TalRab1 in the TalRab1-reporter plasmid (SEQ ID NO: 46) (FIG. 7b), by the target region for TalRab2 fusion proteins in the TalRab2-reporter plasmid (SEQ ID NO: 47) (FIG. 7c), or by a hybrid target region containing one copy each of the ArtTal1 and the TalRab2 recognition sequence in the ArtTal1/TalRab2-reporter plasmid (SEQ ID NO: 48) (FIG. 7d).
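The layout of these target regions can be sketched as strings. In this illustration the right-hand site is written as the reverse complement of its recognition sequence, so that both TAL arrays read their sites 5' to 3' on opposite strands. Each site is assumed to be preceded by the T nucleotide required by the design rule in paragraph [0171], and the 15 bp spacer is a placeholder of N characters because the actual reporter spacer sequence is not given in this section. The function names are hypothetical.

```python
# Sketch of a bipartite TAL target region: T + left site, 15 bp spacer, then the
# right site (with its preceding T) in inverse orientation. The spacer is a
# placeholder of N characters, not the reporter's actual spacer sequence.

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")
SPACER_LENGTH = 15


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(COMPLEMENT)[::-1]


def target_region(left_site: str, right_site: str,
                  spacer: str = "N" * SPACER_LENGTH) -> str:
    if len(spacer) != SPACER_LENGTH:
        raise ValueError("the reporters described here use a 15 bp spacer")
    return "T" + left_site.upper() + spacer + reverse_complement("T" + right_site)


art_tal1_site = "ATTCTGGGACGT"    # SEQ ID NO: 62
tal_rab2_site = "GGTGGCCCGGTAGT"  # SEQ ID NO: 63

homodimer = target_region(art_tal1_site, art_tal1_site)  # ArtTal1-reporter layout
hybrid = target_region(art_tal1_site, tal_rab2_site)     # ArtTal1/TalRab2 layout

# Read on the bottom strand, the right-hand site appears 5'->3' as T + site.
print(reverse_complement(hybrid).startswith("T" + tal_rab2_site))  # True
```

For the homodimer layout the whole region equals its own reverse complement, which is why a single type of TAL nuclease can occupy both half-sites; the hybrid layout lacks this symmetry and needs two different arrays.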
[0178] Within these reporter plasmids the TAL nuclease target regions are followed by the complete coding region for β-galactosidase and a polyadenylation signal (pA). To test for nuclease activity against the specific target sequence, a TAL nuclease expression vector (FIG. 4) was transiently cotransfected with its corresponding reporter plasmid into mammalian cells. Upon expression of the TAL nuclease protein, the reporter plasmid is opened by a nuclease-induced double-strand break within the TAL nuclease target sequence (FIG. 7A). The DNA regions adjacent to the double-strand break are identical over 400 bp and can be aligned and recombined by homologous recombination DNA repair (FIG. 7B). Homologous recombination of an opened reporter plasmid results in a functional β-galactosidase coding region transcribed from the CMV promoter, which leads to the production of β-galactosidase protein (FIG. 7C). In lysates of transfected cells the enzymatic activity of β-galactosidase can be determined by chemiluminescence and reports the nuclease activity of the TAL fusion proteins.
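The recombination readout can be illustrated with a toy string model. The labels below (promoter, duplicated lacZ segment, stop, target, remainder of lacZ) are invented placeholders rather than real sequence, and the repair step is reduced to its outcome: the sequence upstream of the left copy of the 400 bp duplicated segment is joined to the right copy, which removes the stop codon and target region and restores a continuous β-galactosidase reading frame. The function name is hypothetical.

```python
# Toy model of the reporter readout. All "sequences" are invented labels; the
# duplicated segment stands in for the 400 bp N-terminal lacZ region.

def repair_by_repeat_recombination(reporter: str, repeat: str, cut: int):
    """Join the sequence upstream of the left repeat copy to the right copy.

    Returns None if the cut does not lie between two copies of the repeat,
    i.e. when no homologous partner is available on one side of the break.
    """
    left, right = reporter[:cut], reporter[cut:]
    i = left.find(repeat)
    j = right.find(repeat)
    if i < 0 or j < 0:
        return None
    return left[:i] + right[j:]  # one repeat copy retained, stop codon removed


PROMOTER = "[CMV]"
LACZ_5 = "[lacZ-5'-400bp]"
STOP = "[STOP]"
TARGET = "[TAL-site|spacer|TAL-site]"
LACZ_REST = "[rest-of-lacZ][pA]"

reporter = PROMOTER + LACZ_5 + STOP + TARGET + LACZ_5 + LACZ_REST
cut = reporter.index(TARGET) + len(TARGET) // 2  # double-strand break in the spacer

product = repair_by_repeat_recombination(reporter, LACZ_5, cut)
print(product)  # [CMV][lacZ-5'-400bp][rest-of-lacZ][pA]
```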
[0179] Measurement of TAL-Nuclease Activity and Specificity in Human 293 Cells
[0180] To determine the activity and specificity of TAL nucleases in mammalian cells, we electroporated one million HEK 293 cells (ATCC #CRL-1573) (Graham F L, Smiley J, Russell W C, Nairn R., J. Gen. Virol. 36, 59-74, 1977) with 5 µg plasmid DNA of one of the TAL nuclease expression vectors (FIG. 4) together with 5 µg of one of the TAL nuclease reporter plasmids (FIG. 7). In addition, each sample received 5 µg of the firefly luciferase expression plasmid pCMV-hLuc (SEQ ID NO: 49) and was adjusted to a total DNA amount of 20 µg with pBluescript (pBS) plasmid DNA (SEQ ID NO: 50). After transfection the cells were seeded in triplicate wells of a 6-well tissue culture plate and cultured for two days before analysis. For analysis, the transfected cells of each well were lysed, and the β-galactosidase and luciferase enzyme activities of the lysates were determined individually using chemiluminescent reporter assays according to the manufacturer's instructions (Roche Applied Science, Germany) in a luminometer (Berthold Centro LB 960). As a positive control we transfected 5 µg of the β-galactosidase expression plasmid pCMVβ (SEQ ID NO: 51) with 15 µg pBS; as negative controls, 5 µg pCMV-hLuc were transfected either with 15 µg pBS or together with 5 µg of a TAL nuclease reporter plasmid and 10 µg pBS. The triplicate β-galactosidase values of each sample were normalised to the corresponding luciferase activity, and the mean value and standard deviation of β-galactosidase activity were calculated and expressed relative to the pCMVβ positive control. In this type of recombination assay, the level of β-galactosidase-catalysed light emission reflects the cleavage and repair of the reporter plasmids and thereby indicates the activity of TAL nucleases.
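The normalisation described above can be sketched as follows. The text does not give the exact formula, so this sketch assumes that each well's β-galactosidase signal is divided by its luciferase signal, that each normalised value is expressed as a percentage of the mean normalised pCMVβ control, and that the reported spread is the sample standard deviation of the triplicate. The readings in the example are invented for illustration and are not data from FIG. 8.

```python
from statistics import mean, stdev


def normalise(bgal, luc):
    """beta-galactosidase signal divided by luciferase signal, well by well."""
    if len(bgal) != len(luc):
        raise ValueError("each well needs one beta-gal and one luciferase reading")
    return [b / l for b, l in zip(bgal, luc)]


def relative_activity(sample_bgal, sample_luc, control_bgal, control_luc):
    """Mean and sample SD (%) relative to the mean normalised pCMVbeta control.

    ASSUMPTION: the text does not specify this exact scaling.
    """
    reference = mean(normalise(control_bgal, control_luc))
    percent = [100 * x / reference for x in normalise(sample_bgal, sample_luc)]
    return mean(percent), stdev(percent)


# Invented triplicate readings (arbitrary light units), for illustration only
m, sd = relative_activity([2000, 2200, 1800], [1000, 1000, 1000],
                          [4000, 4000, 4000], [1000, 1000, 1000])
print(f"{m:.1f} % +/- {sd:.1f}")  # 50.0 % +/- 5.0
```

Dividing by luciferase first corrects each well for differences in transfection efficiency and cell number before the triplicate is summarised.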
[0181] As shown in FIG. 8, transfection of the ArtTal1-Reporter plasmid alone resulted in only background levels of β-galactosidase. Cotransfection of the ArtTal1-Reporter plasmid with the expression vectors pCAG-ArtTal1-AlwI, -CleDORF, -MlyI, -Pept071, -SbfI, -SdaI and -StsI did not reveal any significant nuclease activity of the encoded TAL fusion proteins (FIG. 8), indicating that these nuclease domains are unable to function in combination with TAL DNA-binding elements. In contrast, cotransfection of the ArtTal1-Reporter plasmid with the expression vectors pCAG-ArtTal1-Clo051 (FIG. 8A) and -FokI (FIG. 8b) resulted in significantly increased reporter activity, indicating that the selected FokI and Clo051 protein domains are able to function as nucleases in fusion with TAL DNA-binding elements.
[0182] Since TAL fusions with the Clo051 domain appeared more active than fusions with the FokI nuclease domain in repeated assays, we believe that the Clo051 domain is best suited for the construction of highly active TAL-nucleases.
[0183] To determine whether the ArtTal1-Clo051 nuclease specifically recognizes its target sequence within the ArtTal1-reporter plasmid (FIG. 7a), pCAG-ArtTal1-Clo051 was cotransfected into HEK 293 cells with the corresponding ArtTal1-reporter plasmid or with the unrelated TalRab1- or TalRab2-reporter plasmids (FIG. 7b,c). As shown in FIG. 9, significantly increased reporter activity was detected only for the specific combination of the ArtTal1-Clo051 nuclease with its corresponding reporter, whereas cotransfection with the unrelated reporter plasmids did not yield significant nuclease activity. These results indicate that the Clo051 nuclease domain, fused to TAL DNA-binding elements, acts in a target-sequence-specific manner and does not process unrelated target sequences.
[0184] Next, we examined whether the Clo051 nuclease domain induces recombinogenic double-strand breaks as a monomer, or whether the interaction of two nuclease domains as a dimer is required. For this purpose we constructed the hybrid reporter plasmid ArtTal1/TalRab2-reporter (SEQ ID NO: 48) (FIG. 7d), which contains one ArtTal1 recognition sequence upstream of the spacer region and one TalRab2 recognition sequence downstream of the spacer region. The TalRab2 array (SEQ ID NO: 8) is composed of 14 TAL elements recognising the target sequence 5'-GGTGGCCCGGTAGT-3' (SEQ ID NO: 63). The Clo051 nuclease domain was cloned as a synthetic coding region into the PmeI and MluI sites of plasmid pCAG-TalRab2-nuclease (SEQ ID NO: 9) to derive the expression vector pCAG-TalRab2-Clo051 (SEQ ID NO: 52) for the expression of the TalRab2-Clo051 protein (SEQ ID NO: 53). As shown in FIG. 10A, cotransfection of pCAG-ArtTal1-Clo051 together with the ArtTal1-reporter plasmid resulted in significant reporter gene expression, indicating specific nuclease activity of the ArtTal1-Clo051 fusion protein. Since the ArtTal1-reporter plasmid contains two inverse ArtTal1 binding sequences, the nuclease activity of ArtTal1-Clo051 may result from the action of a single fusion protein or from the combined action of two molecules. To distinguish between these possibilities, pCAG-ArtTal1-Clo051 was cotransfected with the ArtTal1/TalRab2-reporter plasmid, which contains only one ArtTal1 binding sequence. As shown in FIG. 10A, the ArtTal1-Clo051 nuclease did not exhibit significant nuclease activity on the ArtTal1/TalRab2-reporter, indicating that two Clo051 nuclease domains must interact as a dimer to induce a DNA double-strand break. These results were confirmed with the TalRab2-Clo051 nuclease, which acted on its corresponding TalRab2-reporter but not on the hybrid ArtTal1/TalRab2-reporter plasmid (FIG. 10A).
As expected, the ArtTal1-FokI fusion protein likewise did not exhibit nuclease activity on the ArtTal1/TalRab2-reporter (FIG. 10B).
[0185] Next, we studied whether two Clo051 nuclease domains fused to different arrays of TAL DNA-binding elements are also able to interact and to induce double-strand breaks. For this purpose the expression vectors pCAG-ArtTal1-Clo051 and pCAG-TalRab2-Clo051 were cotransfected together with the ArtTal1/TalRab2-reporter plasmid, and the results were compared to cotransfection of pCAG-ArtTal1-Clo051 alone with the ArtTal1/TalRab2-reporter. As shown in FIG. 10B, significant nuclease activity on the ArtTal1/TalRab2-reporter occurred only upon coexpression of the ArtTal1- and TalRab2-Clo051 nucleases, indicating that Clo051 nuclease domains fused to different TAL arrays are able to interact and to induce a DNA double-strand break within a hybrid target region containing the recognition sequences of two distinct TAL DNA-binding arrays.
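The outcomes described for FIG. 9 and FIG. 10 are consistent with a simple rule: a reporter is cleaved only when both half-sites of its target region are bound by Clo051 fusion proteins. The Python sketch below encodes that qualitative rule (hypothetical function and table names; measured activities are graded, not Boolean) and checks it against the combinations reported in paragraphs [0183] to [0185].

```python
# Qualitative sketch: a Clo051 fusion cleaves a reporter only if BOTH half-sites
# of the target region are occupied by Clo051 fusion proteins (dimer rule).

REPORTERS = {
    "ArtTal1-reporter": ("ArtTal1", "ArtTal1"),
    "TalRab1-reporter": ("TalRab1", "TalRab1"),
    "TalRab2-reporter": ("TalRab2", "TalRab2"),
    "ArtTal1/TalRab2-reporter": ("ArtTal1", "TalRab2"),
}


def predicts_cleavage(expressed_arrays: set, reporter: str) -> bool:
    """expressed_arrays: TAL arrays present as Clo051 fusions in the cell."""
    left, right = REPORTERS[reporter]
    return left in expressed_arrays and right in expressed_arrays


# Combinations described for FIG. 9 and FIG. 10 (True = significant activity)
observations = [
    ({"ArtTal1"}, "ArtTal1-reporter", True),
    ({"ArtTal1"}, "TalRab1-reporter", False),
    ({"ArtTal1"}, "TalRab2-reporter", False),
    ({"ArtTal1"}, "ArtTal1/TalRab2-reporter", False),
    ({"TalRab2"}, "TalRab2-reporter", True),
    ({"TalRab2"}, "ArtTal1/TalRab2-reporter", False),
    ({"ArtTal1", "TalRab2"}, "ArtTal1/TalRab2-reporter", True),
]

for arrays, reporter, observed in observations:
    assert predicts_cleavage(arrays, reporter) == observed, (arrays, reporter)
print("dimer rule consistent with all listed outcomes")
```

A monomer rule (cleavage whenever any one half-site is bound) would wrongly predict activity of ArtTal1-Clo051 alone on the hybrid reporter, which is the combination that discriminates between the two models.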
EXAMPLE 2
Targeting of the Mouse Rab38 Gene in ES Cells and Zygotes with TAL-Clo051 Nucleases
[0186] Construction of Rab38 Specific TAL-Clo051 Nucleases and a Targeting Vector
[0187] To demonstrate the functionality of fusion proteins of TAL effector DNA-binding domains and nucleases in mammalian cells, we designed a pair of fusion proteins that recognizes a DNA target sequence within the mouse Rab38 gene (FIG. 11). The two fusion proteins are intended to bind together to the bipartite target DNA region and to induce a double-strand break in its spacer region, thereby stimulating homologous recombination at the target locus in mammalian cells.
[0188] The mouse Rab38 gene encodes the RAB38 protein, a member of a protein family known to play a crucial role in vesicular trafficking. In chocolate (cht) mutant mice, a single nucleotide exchange at position 146 (G>T mutation) within the first exon of Rab38 leads to the replacement of glycine by valine at codon 19 (Loftus, S. K., et al., Proc Natl Acad Sci U S A, 2002. 99(7): p. 4471-6). This amino acid replacement is located within the conserved GTP-binding domain of RAB38 and impairs the sorting of tyrosinase-related protein 1 (TYRP1) into the melanosomes of Rab38^cht/Rab38^cht melanocytes. TYRP1 is a melanosomal membrane glycoprotein that functions both as a 5,6-dihydroxyindole-2-carboxylic acid oxidase in melanin production and as a provider of structural stability to tyrosinase in the melanogenic enzyme complex. TYRP1 is believed to transit from the trans-Golgi network to stage II melanosomes by means of clathrin-coated vesicles. The reduced amount of correctly located TYRP1 impairs pigment production and changes the fur color from black to a chocolate-like brown in Rab38^cht/Rab38^cht mice. Since mutations of genes needed for melanocyte function are known to cause oculocutaneous albinism (OCA), such as Hermansky-Pudlak syndrome in man, the Rab38 gene is a candidate locus in OCA patients.
[0189] We aimed to introduce a phenocopy of the chocolate mutation at codon 19 of Rab38 using a pair of TAL-nucleases (RabChtTal1- and RabChtTal2-Clo051) that each recognise a 14 bp target sequence located up- or downstream of a central 15 bp spacer sequence within exon 1 of the Rab38 gene (FIG. 11). To derive expression vectors for the RabChtTal1- and RabChtTal2-Clo051 nucleases, synthetic coding regions for the DNA-binding domains RabChtTal1 and RabChtTal2, each composed of 14 TAL elements, and for the Clo051 nuclease domain were inserted into the pCAG-TAL-nuclease vector. The resulting plasmid pCAG-RabChtTal1-Clo051 (SEQ ID NO: 54) encodes the RabChtTal1-Clo051 fusion protein (SEQ ID NO: 55), and the plasmid pCAG-RabChtTal2-Clo051 (SEQ ID NO: 56) encodes the RabChtTal2-Clo051 fusion protein (SEQ ID NO: 57).
[0190] For the modification of the Rab38 gene by homologous recombination in fertilised oocytes, we constructed the gene targeting vector pRab38-chtTAL (FIG. 12) (SEQ ID NO: 58), comprising two homology regions of 942 and 2788 bp of genomic sequence flanking exon 1 of the mouse Rab38 gene (SEQ ID NO: 59). For this purpose the vector's 5'- and 3'-homology arms were amplified from the genomic BAC clone RPCI-421G2 (derived from the C57BL/6J genome, Imagenes GmbH, Berlin) using specific PCR primers. Within codon 19 we introduced two nucleotide changes that convert the wildtype sequence GGT, coding for glycine, into GTA, coding for valine. This new chocolate mutation can be distinguished from the natural chocolate mutation, which carries only a single nucleotide exchange within codon 19 (GTT), also coding for valine (Loftus, S. K., et al., Proc Natl Acad Sci U S A, 2002. 99(7): p. 4471-6). Both chocolate mutant alleles can further be distinguished from the wildtype allele by restriction analysis, since the mutations in codon 19 remove a recognition site for the restriction endonuclease BsaJI (FIG. 12). The recognition region for the TAL-nucleases is located downstream of codon 19 (FIG. 11). Within the 3'-homology region of the targeting vector, each 14 bp TAL fusion protein recognition sequence was further modified by the introduction of silent nucleotide changes that do not alter the RAB38 protein sequence (FIG. 12), in order to avoid potential processing of the targeting vector by the Rab38-specific TAL-nucleases.
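The codon arithmetic behind the designed and natural chocolate alleles can be checked with a few lines of Python. The codon table below is only the subset of the standard genetic code needed here, and the helper name is hypothetical.

```python
# Codon 19 of Rab38: wildtype, designed (this vector) and natural cht alleles.
# Only the standard-code entries needed here are listed.

CODON_TABLE = {"GGT": "Gly", "GTA": "Val", "GTT": "Val"}


def mismatches(a: str, b: str) -> int:
    """Number of positions at which two equal-length codons differ."""
    if len(a) != len(b):
        raise ValueError("codons must have equal length")
    return sum(x != y for x, y in zip(a, b))


wildtype, designed, natural_cht = "GGT", "GTA", "GTT"

for name, codon in [("wildtype", wildtype), ("designed", designed),
                    ("natural cht", natural_cht)]:
    print(f"{name:12s} {codon} {CODON_TABLE[codon]} "
          f"({mismatches(wildtype, codon)} change(s) vs wildtype)")

# Designed and natural cht codons differ at one position, so the alleles are separable
print(mismatches(designed, natural_cht))  # 1
```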
[0191] For the modification of the Rab38 gene by homologous recombination in mouse ES cells, we modified the gene targeting vector pRab38-chtTAL (FIG. 12) by inserting a neomycin resistance gene as a selection marker into the spacer region of the TAL-nuclease recognition region, to derive the targeting vector pRab38-chtTAL-neo (SEQ ID NO: 60).
[0192] Targeting of the Rab38 Gene in ES Cells and Zygotes
[0193] To demonstrate the utility of the RabChtTal1- and RabChtTal2-Clo051 proteins for gene targeting in mammalian cells (FIG. 3), we introduced the expression vectors together with the pRab38-chtTAL-neo targeting vector into mouse ES cells, or protein-coding mRNA together with the pRab38-chtTAL vector into fertilised mouse oocytes.
[0194] For targeting in ES cells, we transfected IDG3.2 ES cells (Hitz, C. et al. Nucleic Acids Res. 35, e90, 2007) with the linearised pRab38-chtTAL-neo targeting vector, with or without the TAL-nuclease expression plasmids pCAG-RabChtTal1- and pCAG-RabChtTal2-Clo051. The transfection, selection, expansion and genotyping of neomycin-resistant ES cell clones were performed according to standard gene targeting procedures as described (Nagy A, Gertsenstein M, Vintersten K, Behringer R., 2003. Manipulating the Mouse Embryo. Cold Spring Harbor, N.Y.: Cold Spring Harbor Laboratory Press). The analysis of resistant ES cell clones revealed that expression of the TAL-nucleases led to a significantly increased rate of homologous recombination at the Rab38 gene in ES cells. For microinjection into fertilised mouse oocytes, the circular pRab38-chtTAL vector DNA was mixed with in vitro transcribed mRNA coding for the RabChtTal1- and RabChtTal2-Clo051 proteins in injection buffer as described (Meyer, M., et al., Proc Natl Acad Sci U S A. 107(34): p. 15022-6). TAL-nuclease mRNA is prepared from the linearised expression plasmids pCAG-RabChtTal1- and pCAG-RabChtTal2-Clo051 by in vitro transcription from the T7 promoter using the mMessage mMachine kit (Ambion) according to the manufacturer's instructions. The mRNA is further modified by the addition of a poly(A) tail using the Poly(A) tailing kit and purified with MegaClear columns from Ambion. Finally, the mRNA is precipitated and dissolved in injection buffer.
[0195] To isolate fertilised oocytes, males of the C57BL/6 strain are mated to superovulated females of the FVB strain. For superovulation, three-week-old FVB females are treated with 2.5 IU pregnant mare serum (PMS) 2 days before mating and with 2.5 IU human chorionic gonadotropin (hCG) on the day of mating. Fertilised oocytes are isolated from the oviducts of plug-positive females and microinjected in M2 medium (Sigma-Aldrich Inc Cat. No. M7167) with the TAL-nuclease mRNA and pRab38-chtTAL targeting vector preparation into one pronucleus and the cytoplasm following standard procedures (Nagy A, Gertsenstein M, Vintersten K, Behringer R., 2003. Manipulating the Mouse Embryo. Cold Spring Harbor, N.Y.: Cold Spring Harbor Laboratory Press).
[0196] After microinjection, the TAL-nuclease mRNAs are translated into proteins that induce a double-strand break at one or both Rab38 alleles in one or more cells of the developing embryo. This event stimulates recombination of the pRab38-chtTAL targeting vector with a Rab38 allele via the homology regions present in the vector and leads to the site-specific insertion of the mutant codon 19 into the genome, resulting in a Rab38^cht allele bearing the chocolate mutation (FIG. 12). The microinjected zygotes were transferred into pseudopregnant females to allow their further development into live mice (Nagy A, Gertsenstein M, Vintersten K, Behringer R., 2003. Manipulating the Mouse Embryo. Cold Spring Harbor, New York: Cold Spring Harbor Laboratory Press). Genomic DNA was extracted from tail tips of the resulting offspring and analysed by PCR for the presence of the desired homologous recombination event at the Rab38 locus. This analysis was performed by PCR amplification of the genomic region encompassing exon 1. The presence of a Rab38^cht allele can be recognised after digestion of the PCR products with BsaJI, since the Rab38^cht mutation at codon 19 removes a BsaJI restriction site that is present in the wildtype sequence.
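The BsaJI genotyping logic can be sketched as a simulated digest. BsaJI recognises C^CNNGG; the sketch reports top-strand fragment lengths only (ignoring the 4-nt overhang) and uses short invented amplicons, because the actual Rab38 exon 1 amplicon sequence is not reproduced in this section. The helper names are hypothetical.

```python
import re

# BsaJI recognises C^CNNGG; a lookahead also finds overlapping sites.
BSAJI_SITE = re.compile(r"(?=CC[ACGT]{2}GG)")


def bsaji_fragments(amplicon: str) -> list:
    """Top-strand fragment lengths after complete BsaJI digestion."""
    seq = amplicon.upper()
    cuts = [m.start() + 1 for m in BSAJI_SITE.finditer(seq)]  # cut after first C
    bounds = [0] + cuts + [len(seq)]
    return [end - start for start, end in zip(bounds, bounds[1:])]


def predicted_bands(alleles: list) -> list:
    """Distinct band sizes expected from a PCR product of the given alleles."""
    sizes = {size for allele in alleles for size in bsaji_fragments(allele)}
    return sorted(sizes, reverse=True)


# Invented 14 nt amplicons: the wildtype carries a site, the mutant does not.
wt = "AAAACCTAGGAAAA"
cht = "AAAACCTATGAAAA"

print(predicted_bands([wt, wt]))   # [9, 5]      homozygous wildtype
print(predicted_bands([wt, cht]))  # [14, 9, 5]  uncut band reveals a cht allele
```

In this scheme a heterozygous founder, as described in paragraph [0197], is recognised by the full-length uncut band appearing alongside the wildtype digestion products.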
[0197] In one such experiment, mice derived from microinjected zygotes were analysed by a Rab38 PCR assay. Most of these mice carried two wildtype Rab38 alleles, whereas some individuals harboured one allele with the preplanned Rab38 chocolate mutation, as indicated by the absence of the BsaJI restriction site in exon 1.
[0198] Taken together, it was possible to introduce a preplanned modification into the coding region of the Rab38 gene by TAL-Clo051 nuclease-assisted homologous recombination in mouse ES cells and fertilised oocytes.
EXAMPLE 3
Isolation of Hyperactive Clo051 Nuclease Mutants
[0199] As shown in FIG. 13, the primary sequence of the Clo051 nuclease domain between positions E389 and Y587 exhibits a unique distribution of positively charged arginine (R) and lysine (K) residues and of negatively charged glutamate (E) and aspartate (D) residues. These residues constitute a three-dimensional landscape of charges within the Clo051 domain that determines the unique tertiary structure of this nuclease, as shown in the structural model in FIG. 6. Certain replacements of polar by non-polar residues, or of non-polar by polar residues, e.g. at positions 423 and 446, alter the three-dimensional structure of the protein chain and can increase the nuclease activity.
[0200] Such amino acid replacements may be made by trial and error or may follow specific hypotheses about their structural and functional impact on the Clo051 nuclease domain. Alternatively, a large number of randomly mutagenised variants of the Clo051 nuclease domain coding region can be assembled into a library by mutagenic PCR. This library of mutant molecules can be screened for hyperactive nuclease variants by a phenotypic screening assay in yeast, mammalian or E. coli cells that is coupled to a functional nuclease readout, e.g. as described for the improvement of the FLP recombinase (Buchholz et al., Nat. Biotechnol. 16, 657-62, 1998).
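A random mutant library of the kind mentioned above can be illustrated with a toy simulation. This is a simplification, not a model of any particular mutagenic PCR protocol: each position is substituted independently with probability `rate` by one of the three other bases, whereas real error-prone PCR has a biased mutation spectrum. The input is the first 30 nt of SEQ ID NO: 2, and the function names are hypothetical.

```python
import random

BASES = "ACGT"


def mutagenise(seq: str, rate: float, rng: random.Random) -> str:
    """Substitute each base with probability `rate` by a different base."""
    out = []
    for base in seq.upper():
        if base not in BASES:
            raise ValueError(f"unexpected character {base!r}")
        if rng.random() < rate:
            out.append(rng.choice([b for b in BASES if b != base]))
        else:
            out.append(base)
    return "".join(out)


def make_library(seq: str, size: int, rate: float, seed: int = 0) -> list:
    """A reproducible list of `size` independently mutagenised copies."""
    rng = random.Random(seed)
    return [mutagenise(seq, rate, rng) for _ in range(size)]


# First 30 nt of SEQ ID NO: 2 (start of the Clo051 domain coding region)
template = "GAAGGCATCAAAAGCAACATCTCCCTCCTG"
library = make_library(template, size=5, rate=0.02, seed=1)
for variant in library:
    changes = sum(a != b for a, b in zip(template, variant))
    print(variant, changes)
```

Seeding a private `random.Random` instance keeps each simulated library reproducible without touching the global random state.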
[0201] Such a functional screen for improved nuclease variants could yield, for example, the replacement of residue 423 from serine to proline and of residue 446 from arginine to glutamate. Such variant molecules may exhibit superior nuclease activity compared with the wildtype form of Clo051.
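Variants such as S423P and R446E use full-length Clo051 numbering. The Python sketch below (hypothetical helper; one-letter sequence transcribed from SEQ ID NO: 1) shows how such substitutions map onto the 199-residue domain, with a check that the stated wild-type residue is actually present at each position before it is replaced.

```python
import re

# SEQ ID NO: 1 in one-letter code (Clo051 residues E389..Y587)
CLO051_DOMAIN = (
    "EGIKSNISLLKDELRG" "QISHISHEYLSLIDLA" "FDSKQNRLFEMKVLEL"
    "LVNEYGFKGRHLGGSR" "KPDGIVYSTTLEDNFG" "IIVDTKAYSEGYSLPI"
    "SQADEMERYVRENSNR" "DEEVNPNKWWENFSEE" "VKKYYFVFISGSFKGK"
    "FEEQLRRLSMTTGVNG" "SAVNVVNLLLGAEKIR" "SGEMTIEELERAMFNN"
    "SEFILKY"
)
DOMAIN_START = 389  # SEQ ID NO: 1 begins at E389 of full-length Clo051


def apply_mutations(seq: str, mutations: list) -> str:
    """Apply substitutions written as e.g. 'S423P' (full-length numbering)."""
    residues = list(seq)
    for mutation in mutations:
        match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", mutation)
        if match is None:
            raise ValueError(f"cannot parse {mutation!r}")
        wt, position, new = match.group(1), int(match.group(2)), match.group(3)
        index = position - DOMAIN_START
        if not 0 <= index < len(residues):
            raise IndexError(f"{mutation}: outside E389..Y587")
        if residues[index] != wt:
            raise ValueError(f"{mutation}: residue {position} is {residues[index]}")
        residues[index] = new
    return "".join(residues)


variant = apply_mutations(CLO051_DOMAIN, ["S423P", "R446E"])
print(sum(a != b for a, b in zip(CLO051_DOMAIN, variant)))  # 2
```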
EXAMPLE 4
Clo051 Nuclease Induced Recombination of Genomic Substrates in Human Cells
[0202] The action of the Clo051 nuclease was further tested in human HEK293 cells on a genomically integrated reporter construct. For this purpose the ArtTal1 reporter plasmid (FIG. 7) was modified by the insertion of a hygromycin resistance gene into the plasmid backbone. In addition, the β-galactosidase reading frame was fused with the coding region of the neomycin resistance gene, resulting in the reporter plasmid pCMV-Rab-Reporter(hygro) (SEQ ID NO: 61). To generate a cell line harboring the reporter construct in its genome, linearized reporter plasmid DNA was electroporated into human HEK 293 cells (ATCC #CRL-1573) (Graham F L, Smiley J, Russell W C, Nairn R., J. Gen. Virol. 36, 59-74, 1977), and hygromycin-resistant clones were selected and isolated. One of the resistant clones that showed no background activity of the reporter gene, 293ArtTal-Rep#2, was chosen for further work.
[0203] Next, one million reporter cells were transfected with 5 µg plasmid DNA of the TAL nuclease expression vector pCAG-ArtTal1-Clo051 (FIG. 4) or with 5 µg of the unrelated cloning vector pBluescript as a negative control. After transfection the cells were seeded in duplicate wells of a 6-well tissue culture plate and cultured for two days before analysis. For analysis, the transfected cells of each well were fixed for 10 minutes with 4% formaldehyde and incubated for 4 hours with X-Gal staining solution (5 mM K3[Fe(CN)6], 5 mM K4[Fe(CN)6], 2 mM MgCl2, 1 mg/ml X-Gal (5-bromo-4-chloro-3-indolyl-β-D-galactopyranoside)). Recombined cells that express the reporter gene are visualized by intracellular blue staining and were quantified on photographic images using the cell counter function of the ImageJ software (available at http://imagej.nih.gov/ij). As shown in FIG. 14A, transfection with the pBluescript control plasmid did not yield positive reporter cells (<0.1%; 0 positive cells of 1076 counted cells). In contrast, transfection of pCAG-ArtTal1-Clo051 resulted in a substantial fraction of cells that had recombined the reporter construct and expressed β-galactosidase (FIG. 14B). As quantified from photographic images, 42.7% of the reporter cells (227 positive cells of 531 counted cells) showed successful recombination, as indicated by expression of the reporter gene. In conclusion, this result indicates that the ArtTal1-Clo051 nuclease protein can efficiently process a target sequence located within mammalian genomic DNA.
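The percentages above follow directly from the cell counts. The sketch below reproduces them and also computes a Wilson score 95% interval for the positive fraction; this interval is an addition for illustration and is not part of the reported analysis. The function names are hypothetical.

```python
from math import sqrt


def percent(positive: int, total: int) -> float:
    return 100.0 * positive / total


def wilson_interval(positive: int, total: int, z: float = 1.96):
    """Wilson score interval for a binomial proportion (z = 1.96 ~ 95%)."""
    p = positive / total
    denom = 1 + z * z / total
    centre = (p + z * z / (2 * total)) / denom
    half = z * sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return centre - half, centre + half


print(f"{percent(227, 531):.1f}%")   # 42.7%  (pCAG-ArtTal1-Clo051)
print(f"{percent(0, 1076):.1f}%")    # 0.0%   (pBluescript control)
low, high = wilson_interval(227, 531)
print(f"95% CI: {100 * low:.1f}-{100 * high:.1f}%")
```

The Wilson form is used here rather than the simple normal approximation because it stays within 0 to 1 and behaves sensibly for small or zero counts.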
Sequence listing (67 sequences)

SEQ ID NO: 1 (199 aa; PRT; Clostridium spec. 7_2_43 FAA)
Glu Gly Ile Lys Ser Asn Ile Ser Leu Leu Lys Asp Glu Leu Arg Gly 16
Gln Ile Ser His Ile Ser His Glu Tyr Leu Ser Leu Ile Asp Leu Ala 32
Phe Asp Ser Lys Gln Asn Arg Leu Phe Glu Met Lys Val Leu Glu Leu 48
Leu Val Asn Glu Tyr Gly Phe Lys Gly Arg His Leu Gly Gly Ser Arg 64
Lys Pro Asp Gly Ile Val Tyr Ser Thr Thr Leu Glu Asp Asn Phe Gly 80
Ile Ile Val Asp Thr Lys Ala Tyr Ser Glu Gly Tyr Ser Leu Pro Ile 96
Ser Gln Ala Asp Glu Met Glu Arg Tyr Val Arg Glu Asn Ser Asn Arg 112
Asp Glu Glu Val Asn Pro Asn Lys Trp Trp Glu Asn Phe Ser Glu Glu 128
Val Lys Lys Tyr Tyr Phe Val Phe Ile Ser Gly Ser Phe Lys Gly Lys 144
Phe Glu Glu Gln Leu Arg Arg Leu Ser Met Thr Thr Gly Val Asn Gly 160
Ser Ala Val Asn Val Val Asn Leu Leu Leu Gly Ala Glu Lys Ile Arg 176
Ser Gly Glu Met Thr Ile Glu Glu Leu Glu Arg Ala Met Phe Asn Asn 192
Ser Glu Phe Ile Leu Lys Tyr 199

SEQ ID NO: 2 (597 bp; DNA; Clostridium spec. 7_2_43 FAA)
gaaggcatca aaagcaacat ctccctcctg aaagacgaac tccgggggca gattagccac 60
attagtcacg aatacctctc cctcatcgac ctggctttcg atagcaagca gaacaggctc 120
tttgagatga aagtgctgga actgctcgtc aatgagtacg ggttcaaggg tcgacacctc 180
ggcggatcta ggaaaccaga cggcatcgtg tatagtacca cactggaaga caactttggg 240
atcattgtgg ataccaaggc atactctgag ggttatagtc tgcccatttc acaggccgac 300
gagatggaac ggtacgtgcg cgagaactca aatagagatg aggaagtcaa ccctaacaag 360
tggtgggaga acttctctga ggaagtgaag aaatactact tcgtctttat cagcgggtcc 420
ttcaagggta aatttgagga acagctcagg agactgagca tgactaccgg cgtgaatggc 480
agcgccgtca acgtggtcaa tctgctcctg ggcgctgaaa agattcggag cggagagatg 540
accatcgaag agctggagag ggcaatgttt aataatagcg agtttatcct gaaatac 597

SEQ ID NO: 3 (5866 bp; DNA; Artificial sequence; pCAG-TAL-nuclease)
ggcgcgccgg attcgacatt
gattattgac tagttattaa tagtaatcaa ttacggggtc 60attagttcat agcccatata
tggagttccg cgttacataa cttacggtaa atggcccgcc 120tggctgaccg cccaacgacc
cccgcccatt gacgtcaata atgacgtatg ttcccatagt 180aacgccaata gggactttcc
attgacgtca atgggtggag tatttacggt aaactgccca 240cttggcagta catcaagtgt
atcatatgcc aagtacgccc cctattgacg tcaatgacgg 300taaatggccc gcctggcatt
atgcccagta catgacctta tgggactttc ctacttggca 360gtacatctac gtattagtca
tcgctattac catggtcgag gtgagcccca cgttctgctt 420cactctcccc atctcccccc
cctccccacc cccaattttg tatttattta ttttttaatt 480attttgtgca gcgatggggg
cggggggggg gggggggcgc gcgccaggcg gggcggggcg 540gggcgagggg cggggcgggg
cgaggcggag aggtgcggcg gcagccaatc agagcggcgc 600gctccgaaag tttcctttta
tggcgaggcg gcggcggcgg cggccctata aaaagcgaag 660cgcgcggcgg gcgggagtcg
ctgcgcgctg ccttcgcccc gtgccccgct ccgccgccgc 720ctcgcgccgc ccgccccggc
tctgactgac cgcgttactc ccacaggtga gcgggcggga 780cggcccttct cctccgggct
gtaattagcg cttggtttaa tgacggcttg tttcttttct 840gtggctgcgt gaaagccttg
aggggctccg ggagggccct ttgtgcgggg gggagcggct 900cggggggtgc gtgcgtgtgt
gtgtgcgtgg ggagcgccgc gtgcggctcc gcgctgcccg 960gcggctgtga gcgctgcggg
cgcggcgcgg ggctttgtgc gctccgcagt gtgcgcgagg 1020ggagcgcggc cgggggcggt
gccccgcggt gcgggggggg ctgcgagggg aacaaaggct 1080gcgtgcgggg tgtgtgcgtg
ggggggtgag cagggggtgt gggcgcgtcg gtcgggctgc 1140aaccccccct gcacccccct
ccccgagttg ctgagcacgg cccggcttcg ggtgcggggc 1200tccgtacggg gcgtggcgcg
gggctcgccg tgccgggcgg ggggtggcgg caggtggggg 1260tgccgggcgg ggcggggccg
cctcgggccg gggagggctc gggggagggg cgcggcggcc 1320cccggagcgc cggcggctgt
cgaggcgcgg cgagccgcag ccattgcctt ttatggtaat 1380cgtgcgagag ggcgcaggga
cttcctttgt cccaaatctg tgcggagccg aaatctggga 1440ggcgccgccg caccccctct
agcgggcgcg gggcgaagcg gtgcggcgcc ggcaggaagg 1500aaatgggcgg ggagggcctt
cgtgcgtcgc cgcgccgccg tccccttctc cctctccagc 1560ctcggggctg tccgcggggg
gacggctgcc ttcggggggg acggggcagg gcggggttcg 1620gcttctggcg tgtgaccggc
ggctctagag cctctgctaa ccatgttcat gccttcttct 1680ttttcctaca gatccttaat
taataatacg actcactata ggggccgcca ccatgggacc 1740taagaaaaag aggaaggtgg
cggccgctga ctacaaggat gacgacgata aaccaggtgg 1800cggaggtagt ggcggaggtg
gggtacccgc cagtccagca gcccaggtgg atctgagaac 1860cctcggctac agccagcagc
agcaggagaa gatcaaacca aaggtgcggt ccaccgtcgc 1920tcagcaccat gaagcactgg
tggggcacgg tttcacacac gcccatattg tggctctgtc 1980tcagcatccc gctgcactcg
ggactgtggc cgtcaaatat caggacatga tcgccgctct 2040gcctgaggca acccacgaag
ccattgtggg cgtcggaaag cagtggagcg gtgccagagc 2100actcgaagca ctcctcaccg
tcgccgggga actgcggggt ccaccactcc agtccggact 2160ggacactgga cagctgctga
agatcgctaa acgcggcgga gtgacagctg tggaagctgt 2220gcacgcttgg aggaatgctc
tgacaggagc cccactgaat cttatgagac gacgtctcac 2280ggcctgaccc cacagcaggt
cgtcgctatt gcttctaatg gcggagggcg gcctgctctg 2340gagagcattg tggctcagct
gtccaggccc gatcctgccc tggctagatc cgcactcact 2400aacgatcatc tggtcgctct
cgcttgcctc ggtggacggc ccgctctgga cgcagtcaaa 2460aagggtctcc cccatgctcc
cgcactgatc aagagaacca acaggagaat tcctgaggga 2520tccgatcgtt taaacgatca
cgcgtaaatg attgcagatc cactagttct agaattccag 2580ctgagcgccg gtcgctacca
ttaccagttg gtctggtgtc aaaaataata ataaccgggc 2640aggggggatc tgcatggatc
tttgtgaagg aaccttactt ctgtggtgtg acataattgg 2700acaaactacc tacagagatt
taaagctcta aggtaaatat aaaattttta agtgtataat 2760gtgttaaact actgattcta
attgtttgtg tattttagat tccaacctat ggaactgatg 2820aatgggagca gtggtggaat
gccagatcca gacatgataa gatacattga tgagtttgga 2880caaaccacaa ctagaatgca
gtgaaaaaaa tgctttattt gtgaaatttg tgatgctatt 2940gctttatttg taaccattat
aagctgcaat aaacaagtta acaacaacaa ttgcattcat 3000tttatgtttc aggttcaggg
ggaggtgtgg gaggtttttt aaagcaagta aaacctctac 3060aaatgtggta tggctgatta
tgatctgcgg ccgccactgg ccgtcgtttt acaacgtcgt 3120gactgggaaa accctggcgt
tacccaactt aatcgccttg cagcacatcc ccctttcgcc 3180agctggcgta atagcgaaga
ggcccgcacc gatcgccctt cccaacagtt gcgcagcctg 3240aatggcgaat ggaacgcgcc
ctgtagcggc gcattaagcg cggcgggtgt ggtggttacg 3300cgcagcgtga ccgctacact
tgccagcgcc ctagcgcccg ctcctttcgc tttcttccct 3360tcctttctcg ccacgttcgc
cggctttccc cgtcaagctc taaatcgggg gctcccttta 3420gggttccgat ttagtgcttt
acggcacctc gaccccaaaa aacttgatta gggtgatggt 3480tcacgtagtg ggccatcgcc
ctgatagacg gtttttcgcc ctttgacgtt ggagtccacg 3540ttctttaata gtggactctt
gttccaaact ggaacaacac tcaaccctat ctcggtctat 3600tcttttgatt tataagggat
tttgccgatt tcggcctatt ggttaaaaaa tgagctgatt 3660taacaaaaat ttaacgcgaa
ttttaacaaa atattaacgc ttacaattta ggtggcactt 3720ttcggggaaa tgtgcgcgga
acccctattt gtttattttt ctaaatacat tcaaatatgt 3780atccgctcat gagacaataa
ccctgataaa tgcttcaata atattgaaaa aggaagagta 3840tgagtattca acatttccgt
gtcgccctta ttcccttttt tgcggcattt tgccttcctg 3900tttttgctca cccagaaacg
ctggtgaaag taaaagatgc tgaagatcag ttgggtgcac 3960gagtgggtta catcgaactg
gatctcaaca gcggtaagat ccttgagagt tttcgccccg 4020aagaacgttt tccaatgatg
agcactttta aagttctgct atgtggcgcg gtattatccc 4080gtattgacgc cgggcaagag
caactcggtc gccgcataca ctattctcag aatgacttgg 4140ttgagtactc accagtcaca
gaaaagcatc ttacggatgg catgacagta agagaattat 4200gcagtgctgc cataaccatg
agtgataaca ctgcggccaa cttacttctg acaacgatcg 4260gaggaccgaa ggagctaacc
gcttttttgc acaacatggg ggatcatgta actcgccttg 4320atcgttggga accggagctg
aatgaagcca taccaaacga cgagcgtgac accacgatgc 4380ctgtagcaat ggcaacaacg
ttgcgcaaac tattaactgg cgaactactt actctagctt 4440cccggcaaca attaatagac
tggatggagg cggataaagt tgcaggacca cttctgcgct 4500cggcccttcc ggctggctgg
tttattgctg ataaatctgg agccggtgag cgtgggtctc 4560gcggtatcat tgcagcactg
gggccagatg gtaagccctc ccgtatcgta gttatctaca 4620cgacggggag tcaggcaact
atggatgaac gaaatagaca gatcgctgag ataggtgcct 4680cactgattaa gcattggtaa
ctgtcagacc aagtttactc atatatactt tagattgatt 4740taaaacttca tttttaattt
aaaaggatct aggtgaagat cctttttgat aatctcatga 4800ccaaaatccc ttaacgtgag
ttttcgttcc actgagcgtc agaccccgta gaaaagatca 4860aaggatcttc ttgagatcct
ttttttctgc gcgtaatctg ctgcttgcaa acaaaaaaac 4920caccgctacc agcggtggtt
tgtttgccgg atcaagagct accaactctt tttccgaagg 4980taactggctt cagcagagcg
cagataccaa atactgtcct tctagtgtag ccgtagttag 5040gccaccactt caagaactct
gtagcaccgc ctacatacct cgctctgcta atcctgttac 5100cagtggctgc tgccagtggc
gataagtcgt gtcttaccgg gttggactca agacgatagt 5160taccggataa ggcgcagcgg
tcgggctgaa cggggggttc gtgcacacag cccagcttgg 5220agcgaacgac ctacaccgaa
ctgagatacc tacagcgtga gctatgagaa agcgccacgc 5280ttcccgaagg gagaaaggcg
gacaggtatc cggtaagcgg cagggtcgga acaggagagc 5340gcacgaggga gcttccaggg
ggaaacgcct ggtatcttta tagtcctgtc gggtttcgcc 5400acctctgact tgagcgtcga
tttttgtgat gctcgtcagg ggggcggagc ctatggaaaa 5460acgccagcaa cgcggccttt
ttacggttcc tggccttttg ctggcctttt gctcacatgt 5520tctttcctgc gttatcccct
gattctgtgg ataaccgtat taccgccttt gagtgagctg 5580ataccgctcg ccgcagccga
acgaccgagc gcagcgagtc agtgagcgag gaagcggaag 5640agcgcccaat acgcaaaccg
cctctccccg cgcgttggcc gattcattaa tgcagctggc 5700acgacaggtt tcccgactgg
aaagcgggca gtgagcgcaa cgcaattaat gtgagttagc 5760tcactcatta ggcaccccag
gctttacact ttatgcttcc ggctcgtatg ttgtgtggaa 5820ttgtgagcgg ataacaattt
cacacaggaa acagctatga ccatga 5866

SEQ ID NO: 4 (176 aa; PRT; Artificial sequence; N-terminal peptide)
Met Gly Pro Lys Lys Lys Arg Lys Val Ala Ala Ala Asp Tyr Lys Asp 16
Asp Asp Asp Lys Pro Gly Gly Gly Gly Ser Gly Gly Gly Gly Val Pro 32
Ala Ser Pro Ala Ala Gln Val Asp Leu Arg Thr Leu Gly Tyr Ser Gln 48
Gln Gln Gln Glu Lys Ile Lys Pro Lys Val Arg Ser Thr Val Ala Gln 64
His His Glu Ala Leu Val Gly His Gly Phe Thr His Ala His Ile Val 80
Ala Leu Ser Gln His Pro Ala Ala Leu Gly Thr Val Ala Val Lys Tyr 96
Gln Asp Met Ile Ala Ala Leu Pro Glu Ala Thr His Glu Ala Ile Val 112
Gly Val Gly Lys Gln Trp Ser Gly Ala Arg Ala Leu Glu Ala Leu Leu 128
Thr Val Ala Gly Glu Leu Arg Gly Pro Pro Leu Gln Ser Gly Leu Asp 144
Thr Gly Gln Leu Leu Lys Ile Ala Lys Arg Gly Gly Val Thr Ala Val 160
Glu Ala Val His Ala Trp Arg Asn Ala Leu Thr Gly Ala Pro Leu Asn 176

SEQ ID NO: 5 (78 aa; PRT; Artificial sequence; C-terminal peptide)
Leu Thr Pro Gln Gln Val Val Ala Ile Ala Ser Asn Gly Gly Gly Arg 16
Pro Ala Leu Glu Ser Ile Val Ala Gln Leu Ser Arg Pro Asp Pro Ala 32
Leu Ala Arg Ser Ala Leu Thr Asn Asp His Leu Val Ala Leu Ala Cys 48
Leu Gly Gly Arg Pro Ala Leu Asp Ala Val Lys Lys Gly Leu Pro His 64
Ala Pro Ala Leu Ile Lys Arg Thr Asn Arg Arg Ile Pro Glu 78

SEQ ID NO: 6 (408 aa; PRT; Artificial sequence; ArtTal1)
Leu Thr Pro Glu Gln Val Val Ala Ile Ala Ser Asn Ile Gly Gly Lys 16
Gln Ala Leu Glu Thr Val Gln Arg Leu Leu Pro Val Leu Cys Gln Ala 32
His Gly Leu Thr Pro Gln Gln Val Val Ala Ile Ala Ser Asn Gly Gly 48
Gly Lys Gln Ala Leu Glu Thr Val Gln Arg Leu Leu Pro Val Leu Cys 64
Gln Ala His Gly Leu Thr Pro Gln Gln Val Val Ala Ile Ala Ser Asn 80
Gly Gly Gly Lys Gln Ala Leu Glu Thr Val Gln Arg Leu Leu Pro Val 96
Leu Cys Gln Ala His Gly Leu Thr Pro Gln Gln Val Val Ala Ile Ala 112
Ser His Asp Gly Gly Lys Gln Ala Leu Glu Thr Val Gln Arg Leu Leu 128
Pro Val Leu Cys Gln Ala His Gly Leu Thr Pro Gln Gln Val Val Ala 144
Ile Ala Ser Asn Gly Gly Gly Lys Gln Ala Leu Glu Thr Val Gln Arg 160
Leu Leu Pro Val Leu Cys Gln Ala His Gly Leu Thr Pro Gln Gln Val 176
Val Ala Ile Ala Ser Asn Asn Gly Gly Lys Gln Ala Leu Glu Thr Val 192
Gln Arg Leu Leu Pro Val Leu Cys Gln Ala His Gly Leu Thr Pro Gln 208
Gln Val Val Ala Ile Ala Ser Asn Asn Gly Gly Lys Gln Ala Leu Glu 224
Thr Val Gln Arg Leu Leu Pro Val Leu Cys Gln Ala His Gly Leu Thr 240
Pro Gln Gln Val Val Ala Ile Ala Ser Asn Asn Gly Gly Lys Gln Ala 256
Leu Glu Thr Val Gln Arg Leu Leu Pro Val Leu Cys Gln Ala His Gly 272
Leu Thr Pro Glu Gln Val Val Ala Ile Ala Ser Asn Ile Gly Gly Lys 288
Gln Ala Leu Glu Thr Val Gln Arg Leu Leu Pro Val Leu Cys Gln Ala 304
His Gly Leu Thr Pro Gln Gln Val Val Ala Ile Ala Ser His Asp Gly 320
Gly Lys Gln Ala Leu Glu Thr Val Gln Arg Leu Leu Pro Val Leu Cys 336
Gln Ala His Gly Leu Thr Pro Gln Gln Val Val Ala Ile Ala Ser Asn 352
Asn Gly Gly Lys Gln Ala Leu Glu Thr Val Gln Arg Leu Leu Pro Val 368
Leu Cys Gln Ala His Gly Leu Thr Pro Gln Gln Val Val Ala Ile Ala 384
Ser Asn Gly Gly Gly Lys Gln Ala Leu Glu Thr Val Gln Arg Leu Leu 400
Pro Val Leu Cys Gln Ala His Gly 408

SEQ ID NO: 7 (7067 bp; DNA; Artificial sequence; pCAG-ArtTal1-nuclease)
gacattgatt attgactagt tattaatagt
aatcaattac ggggtcatta gttcatagcc 60catatatgga gttccgcgtt acataactta
cggtaaatgg cccgcctggc tgaccgccca 120acgacccccg cccattgacg tcaataatga
cgtatgttcc catagtaacg ccaataggga 180ctttccattg acgtcaatgg gtggagtatt
tacggtaaac tgcccacttg gcagtacatc 240aagtgtatca tatgccaagt acgcccccta
ttgacgtcaa tgacggtaaa tggcccgcct 300ggcattatgc ccagtacatg accttatggg
actttcctac ttggcagtac atctacgtat 360tagtcatcgc tattaccatg gtcgaggtga
gccccacgtt ctgcttcact ctccccatct 420cccccccctc cccaccccca attttgtatt
tatttatttt ttaattattt tgtgcagcga 480tgggggcggg gggggggggg gggcgcgcgc
caggcggggc ggggcggggc gaggggcggg 540gcggggcgag gcggagaggt gcggcggcag
ccaatcagag cggcgcgctc cgaaagtttc 600cttttatggc gaggcggcgg cggcggcggc
cctataaaaa gcgaagcgcg cggcgggcgg 660gagtcgctgc gcgctgcctt cgccccgtgc
cccgctccgc cgccgcctcg cgccgcccgc 720cccggctctg actgaccgcg ttactcccac
aggtgagcgg gcgggacggc ccttctcctc 780cgggctgtaa ttagcgcttg gtttaatgac
ggcttgtttc ttttctgtgg ctgcgtgaaa 840gccttgaggg gctccgggag ggccctttgt
gcggggggga gcggctcggg gggtgcgtgc 900gtgtgtgtgt gcgtggggag cgccgcgtgc
ggctccgcgc tgcccggcgg ctgtgagcgc 960tgcgggcgcg gcgcggggct ttgtgcgctc
cgcagtgtgc gcgaggggag cgcggccggg 1020ggcggtgccc cgcggtgcgg ggggggctgc
gaggggaaca aaggctgcgt gcggggtgtg 1080tgcgtggggg ggtgagcagg gggtgtgggc
gcgtcggtcg ggctgcaacc ccccctgcac 1140ccccctcccc gagttgctga gcacggcccg
gcttcgggtg cggggctccg tacggggcgt 1200ggcgcggggc tcgccgtgcc gggcgggggg
tggcggcagg tgggggtgcc gggcggggcg 1260gggccgcctc gggccgggga gggctcgggg
gaggggcgcg gcggcccccg gagcgccggc 1320ggctgtcgag gcgcggcgag ccgcagccat
tgccttttat ggtaatcgtg cgagagggcg 1380cagggacttc ctttgtccca aatctgtgcg
gagccgaaat ctgggaggcg ccgccgcacc 1440ccctctagcg ggcgcggggc gaagcggtgc
ggcgccggca ggaaggaaat gggcggggag 1500ggccttcgtg cgtcgccgcg ccgccgtccc
cttctccctc tccagcctcg gggctgtccg 1560cggggggacg gctgccttcg ggggggacgg
ggcagggcgg ggttcggctt ctggcgtgtg 1620accggcggct ctagagcctc tgctaaccat
gttcatgcct tcttcttttt cctacagatc 1680cttaattaat aatacgactc actatagggg
ccgccaccat gggacctaag aaaaagagga 1740aggtggcggc cgctgactac aaggatgacg
acgataaacc aggtggcgga ggtagtggcg 1800gaggtggggt acccgccagt ccagcagccc
aggtggatct gagaaccctc ggctacagcc 1860agcagcagca ggagaagatc aaaccaaagg
tgcggtccac cgtcgctcag caccatgaag 1920cactggtggg gcacggtttc acacacgccc
atattgtggc tctgtctcag catcccgctg 1980cactcgggac tgtggccgtc aaatatcagg
acatgatcgc cgctctgcct gaggcaaccc 2040acgaagccat tgtgggcgtc ggaaagcagt
ggagcggtgc cagagcactc gaagcactcc 2100tcaccgtcgc cggggaactg cggggtccac
cactccagtc cggactggac actggacagc 2160tgctgaagat cgctaaacgc ggcggagtga
cagctgtgga agctgtgcac gcttggagga 2220atgctctgac aggagcccca ctgaatctta
ctccagaaca ggtcgtcgca atcgcaagta 2280acatcggcgg aaaacaggcc ctcgaaaccg
tccagagact cctccccgtg ctgtgccagg 2340cccacggact gaccccacag caggtggtcg
ccatcgctag caacggcgga gggaagcagg 2400ctctggagac cgtgcagagg ctgctccccg
tcctgtgcca ggcacatggg ctcacacctc 2460agcaggtggt cgcaattgcc tccaatggtg
gcggaaaaca ggccctggaa actgtgcaga 2520gactgctccc cgtgctgtgc caggctcacg
gtctcacacc ccagcaggtg gtcgctatcg 2580catctcatga cgggggcaag caggcactgg
agacagtgca gcggctgctc cctgtcctgt 2640gccaggccca cggactcact cctcagcagg
tcgtcgccat tgctagtaac ggcggaggga 2700aacaggctct ggaaaccgtg cagcgcctgc
tccccgtgct gtgccaagcc cacggcctga 2760ccccccagca ggtggtcgca atcgcctcaa
acaatggtgg caagcaggcc ctggagactg 2820tgcagcgact gctcccagtg ctgtgccagg
cccatggact cacaccacag caggtcgtcg 2880ctattgcaag caacaatgga gggaaacagg
cactggaaac agtccagagg ctgctccccg 2940tgctgtgcca agcgcatgga ctcactcccc
agcaggtcgt cgccatcgct tccaataacg 3000gcggcaagca ggccctggag accgtccaga
gactgctccc cgtgctgtgc caagctcacg 3060gactcacacc tgagcaggtc gtggcaatcg
cctctaacat tggagggaaa caggccctgg 3120aaactgtaca gcggctgctc cccgtgctgt
gccaagcaca cggactcact ccacagcagg 3180tcgtggccat tgcaagtcat gacggaggca
agcaggccct ggaaacagtg cagcgcctgc 3240tccctgtgct gtgccaggct catggtctga
ctcctcagca ggtggtggcc atcgcttcca 3300acaatggagg gaagcaggcc ctggagaccg
tacagagact gctccccgtg ctgtgccaag 3360cgcacggtct gacccctcag caggtcgtcg
caatcgccag caatggcggg ggcaagcagg 3420ctctcgaaac cgtccagcgg ctcctcccag
tcctctgtca ggctcacggc ctgaccccac 3480agcaggtcgt cgctattgct tctaatggcg
gagggcggcc tgctctggag agcattgtgg 3540ctcagctgtc caggcccgat cctgccctgg
ctagatccgc actcactaac gatcatctgg 3600tcgctctcgc ttgcctcggt ggacggcccg
ctctggacgc agtcaaaaag ggtctccccc 3660atgctcccgc actgatcaag agaaccaaca
ggagaattcc tgagggatcc gatcgtttaa 3720acgatcacgc gtaaatgatt gcagatccac
tagttctaga attccagctg agcgccggtc 3780gctaccatta ccagttggtc tggtgtcaaa
aataataata accgggcagg ggggatctgc 3840atggatcttt gtgaaggaac cttacttctg
tggtgtgaca taattggaca aactacctac 3900agagatttaa agctctaagg taaatataaa
atttttaagt gtataatgtg ttaaactact 3960gattctaatt gtttgtgtat tttagattcc
aacctatgga actgatgaat gggagcagtg 4020gtggaatgcc agatccagac atgataagat
acattgatga gtttggacaa accacaacta 4080gaatgcagtg aaaaaaatgc tttatttgtg
aaatttgtga tgctattgct ttatttgtaa 4140ccattataag ctgcaataaa caagttaaca
acaacaattg cattcatttt atgtttcagg 4200ttcaggggga ggtgtgggag gttttttaaa
gcaagtaaaa cctctacaaa tgtggtatgg 4260ctgattatga tctgcggccg ccactggccg
tcgttttaca acgtcgtgac tgggaaaacc 4320ctggcgttac ccaacttaat cgccttgcag
cacatccccc tttcgccagc tggcgtaata 4380gcgaagaggc ccgcaccgat cgcccttccc
aacagttgcg cagcctgaat ggcgaatgga 4440acgcgccctg tagcggcgca ttaagcgcgg
cgggtgtggt ggttacgcgc agcgtgaccg 4500ctacacttgc cagcgcccta gcgcccgctc
ctttcgcttt cttcccttcc tttctcgcca 4560cgttcgccgg ctttccccgt caagctctaa
atcgggggct ccctttaggg ttccgattta 4620gtgctttacg gcacctcgac cccaaaaaac
ttgattaggg tgatggttca cgtagtgggc 4680catcgccctg atagacggtt tttcgccctt
tgacgttgga gtccacgttc tttaatagtg 4740gactcttgtt ccaaactgga acaacactca
accctatctc ggtctattct tttgatttat 4800aagggatttt gccgatttcg gcctattggt
taaaaaatga gctgatttaa caaaaattta 4860acgcgaattt taacaaaata ttaacgctta
caatttaggt ggcacttttc ggggaaatgt 4920gcgcggaacc cctatttgtt tatttttcta
aatacattca aatatgtatc cgctcatgag 4980acaataaccc tgataaatgc ttcaataata
ttgaaaaagg aagagtatga gtattcaaca 5040tttccgtgtc gcccttattc ccttttttgc
ggcattttgc cttcctgttt ttgctcaccc 5100agaaacgctg gtgaaagtaa aagatgctga
agatcagttg ggtgcacgag tgggttacat 5160cgaactggat ctcaacagcg gtaagatcct
tgagagtttt cgccccgaag aacgttttcc 5220aatgatgagc acttttaaag ttctgctatg
tggcgcggta ttatcccgta ttgacgccgg 5280gcaagagcaa ctcggtcgcc gcatacacta
ttctcagaat gacttggttg agtactcacc 5340agtcacagaa aagcatctta cggatggcat
gacagtaaga gaattatgca gtgctgccat 5400aaccatgagt gataacactg cggccaactt
acttctgaca acgatcggag gaccgaagga 5460gctaaccgct tttttgcaca acatggggga
tcatgtaact cgccttgatc gttgggaacc 5520ggagctgaat gaagccatac caaacgacga
gcgtgacacc acgatgcctg tagcaatggc 5580aacaacgttg cgcaaactat taactggcga
actacttact ctagcttccc ggcaacaatt 5640aatagactgg atggaggcgg ataaagttgc
aggaccactt ctgcgctcgg cccttccggc 5700tggctggttt attgctgata aatctggagc
cggtgagcgt gggtctcgcg gtatcattgc 5760agcactgggg ccagatggta agccctcccg
tatcgtagtt atctacacga cggggagtca 5820ggcaactatg gatgaacgaa atagacagat
cgctgagata ggtgcctcac tgattaagca 5880ttggtaactg tcagaccaag tttactcata
tatactttag attgatttaa aacttcattt 5940ttaatttaaa aggatctagg tgaagatcct
ttttgataat ctcatgacca aaatccctta 6000acgtgagttt tcgttccact gagcgtcaga
ccccgtagaa aagatcaaag gatcttcttg 6060agatcctttt tttctgcgcg taatctgctg
cttgcaaaca aaaaaaccac cgctaccagc 6120ggtggtttgt ttgccggatc aagagctacc
aactcttttt ccgaaggtaa ctggcttcag 6180cagagcgcag ataccaaata ctgtccttct
agtgtagccg tagttaggcc accacttcaa 6240gaactctgta gcaccgccta catacctcgc
tctgctaatc ctgttaccag tggctgctgc 6300cagtggcgat aagtcgtgtc ttaccgggtt
ggactcaaga cgatagttac cggataaggc 6360gcagcggtcg ggctgaacgg ggggttcgtg
cacacagccc agcttggagc gaacgaccta 6420caccgaactg agatacctac agcgtgagct
atgagaaagc gccacgcttc ccgaagggag 6480aaaggcggac aggtatccgg taagcggcag
ggtcggaaca ggagagcgca cgagggagct 6540tccaggggga aacgcctggt atctttatag
tcctgtcggg tttcgccacc tctgacttga 6600gcgtcgattt ttgtgatgct cgtcaggggg
gcggagccta tggaaaaacg ccagcaacgc 6660ggccttttta cggttcctgg ccttttgctg
gccttttgct cacatgttct ttcctgcgtt 6720atcccctgat tctgtggata accgtattac
cgcctttgag tgagctgata ccgctcgccg 6780cagccgaacg accgagcgca gcgagtcagt
gagcgaggaa gcggaagagc gcccaatacg 6840caaaccgcct ctccccgcgc gttggccgat
tcattaatgc agctggcacg acaggtttcc 6900cgactggaaa gcgggcagtg agcgcaacgc
aattaatgtg agttagctca ctcattaggc 6960accccaggct ttacacttta tgcttccggc
tcgtatgttg tgtggaattg tgagcggata 7020acaatttcac acaggaaaca gctatgacca
tgaggcgcgc cggattc 7067

SEQ ID NO: 8 (476 aa; PRT; Artificial sequence; TalRab2)
Leu Thr Pro Gln Gln Val Val Ala Ile Ala Ser Asn Asn Gly Gly Lys 16
Gln Ala Leu Glu Thr Val Gln Arg Leu Leu Pro Val Leu Cys Gln Ala 32
His Gly Leu Thr Pro Gln Gln Val Val Ala Ile Ala Ser Asn Asn Gly 48
Gly Lys Gln Ala Leu Glu Thr Val Gln Arg Leu Leu Pro Val Leu Cys 64
Gln Ala His Gly Leu Thr Pro Gln Gln Val Val Ala Ile Ala Ser Asn 80
Gly Gly Gly Lys Gln Ala Leu Glu Thr Val Gln Arg Leu Leu Pro Val 96
Leu Cys Gln Ala His Gly Leu Thr Pro Gln Gln Val Val Ala Ile Ala 112
Ser Asn Asn Gly Gly Lys Gln Ala Leu Glu Thr Val Gln Arg Leu Leu 128
Pro Val Leu Cys Gln Ala His Gly Leu Thr Pro Gln Gln Val Val Ala 144
Ile Ala Ser Asn Asn Gly Gly Lys Gln Ala Leu Glu Thr Val Gln Arg 160
Leu Leu Pro Val Leu Cys Gln Ala His Gly Leu Thr Pro Gln Gln Val 176
Val Ala Ile Ala Ser His Asp Gly Gly Lys Gln Ala Leu Glu Thr Val 192
Gln Arg Leu Leu Pro Val Leu Cys Gln Ala His Gly Leu Thr Pro Gln 208
Gln Val Val Ala Ile Ala Ser His Asp Gly Gly Lys Gln Ala Leu Glu 224
Thr Val Gln Arg Leu Leu Pro Val Leu Cys Gln Ala His Gly Leu Thr 240
Pro Gln Gln Val Val Ala Ile Ala Ser His Asp Gly Gly Lys Gln Ala 256
Leu Glu Thr Val Gln Arg Leu Leu Pro Val Leu Cys Gln Ala His Gly 272
Leu Thr Pro Gln Gln Val Val Ala Ile Ala Ser Asn Asn Gly Gly Lys 288
Gln Ala Leu Glu Thr Val Gln Arg Leu Leu Pro Val Leu Cys Gln Ala 304
His Gly Leu Thr Pro Gln Gln Val Val Ala Ile Ala Ser Asn Asn Gly 320
Gly Lys Gln Ala Leu Glu Thr Val Gln Arg Leu Leu Pro Val Leu Cys 336
Gln Ala His Gly Leu Thr Pro Gln Gln Val Val Ala Ile Ala Ser Asn 352
Gly Gly Gly Lys Gln Ala Leu Glu Thr Val Gln Arg Leu Leu Pro Val 368
Leu Cys Gln Ala His Gly Leu Thr Pro Glu Gln Val Val Ala Ile Ala 384
Ser Asn Ile Gly Gly Lys Gln Ala Leu Glu Thr Val Gln Arg Leu Leu 400
Pro Val Leu Cys Gln Ala His Gly Leu Thr Pro Gln Gln Val Val Ala 416
Ile Ala Ser Asn Asn Gly Gly Lys Gln Ala Leu Glu Thr Val Gln Arg 432
Leu Leu Pro Val Leu Cys Gln Ala His Gly Leu Thr Pro Gln Gln Val 448
Val Ala Ile Ala Ser Asn Gly Gly Gly Lys Gln Ala Leu Glu Thr Val 464
Gln Arg Leu Leu Pro Val Leu Cys Gln Ala His Gly 476

SEQ ID NO: 9 (7271 bp; DNA; Artificial sequence; pCAG-TalRab2-nuclease)
ggcgcgccgg attcgacatt gattattgac
tagttattaa tagtaatcaa ttacggggtc 60attagttcat agcccatata tggagttccg
cgttacataa cttacggtaa atggcccgcc 120tggctgaccg cccaacgacc cccgcccatt
gacgtcaata atgacgtatg ttcccatagt 180aacgccaata gggactttcc attgacgtca
atgggtggag tatttacggt aaactgccca 240cttggcagta catcaagtgt atcatatgcc
aagtacgccc cctattgacg tcaatgacgg 300taaatggccc gcctggcatt atgcccagta
catgacctta tgggactttc ctacttggca 360gtacatctac gtattagtca tcgctattac
catggtcgag gtgagcccca cgttctgctt 420cactctcccc atctcccccc cctccccacc
cccaattttg tatttattta ttttttaatt 480attttgtgca gcgatggggg cggggggggg
gggggggcgc gcgccaggcg gggcggggcg 540gggcgagggg cggggcgggg cgaggcggag
aggtgcggcg gcagccaatc agagcggcgc 600gctccgaaag tttcctttta tggcgaggcg
gcggcggcgg cggccctata aaaagcgaag 660cgcgcggcgg gcgggagtcg ctgcgcgctg
ccttcgcccc gtgccccgct ccgccgccgc 720ctcgcgccgc ccgccccggc tctgactgac
cgcgttactc ccacaggtga gcgggcggga 780cggcccttct cctccgggct gtaattagcg
cttggtttaa tgacggcttg tttcttttct 840gtggctgcgt gaaagccttg aggggctccg
ggagggccct ttgtgcgggg gggagcggct 900cggggggtgc gtgcgtgtgt gtgtgcgtgg
ggagcgccgc gtgcggctcc gcgctgcccg 960gcggctgtga gcgctgcggg cgcggcgcgg
ggctttgtgc gctccgcagt gtgcgcgagg 1020ggagcgcggc cgggggcggt gccccgcggt
gcgggggggg ctgcgagggg aacaaaggct 1080gcgtgcgggg tgtgtgcgtg ggggggtgag
cagggggtgt gggcgcgtcg gtcgggctgc 1140aaccccccct gcacccccct ccccgagttg
ctgagcacgg cccggcttcg ggtgcggggc 1200tccgtacggg gcgtggcgcg gggctcgccg
tgccgggcgg ggggtggcgg caggtggggg 1260tgccgggcgg ggcggggccg cctcgggccg
gggagggctc gggggagggg cgcggcggcc 1320cccggagcgc cggcggctgt cgaggcgcgg
cgagccgcag ccattgcctt ttatggtaat 1380cgtgcgagag ggcgcaggga cttcctttgt
cccaaatctg tgcggagccg aaatctggga 1440ggcgccgccg caccccctct agcgggcgcg
gggcgaagcg gtgcggcgcc ggcaggaagg 1500aaatgggcgg ggagggcctt cgtgcgtcgc
cgcgccgccg tccccttctc cctctccagc 1560ctcggggctg tccgcggggg gacggctgcc
ttcggggggg acggggcagg gcggggttcg 1620gcttctggcg tgtgaccggc ggctctagag
cctctgctaa ccatgttcat gccttcttct 1680ttttcctaca gatccttaat taataatacg
actcactata ggggccgcca ccatgggacc 1740taagaaaaag aggaaggtgg cggccgctga
ctacaaggat gacgacgata aaccaggtgg 1800cggaggtagt ggcggaggtg gggtacccgc
cagtccagca gcccaggtgg atctgagaac 1860cctcggctac agccagcagc agcaggagaa
gatcaaacca aaggtgcggt ccaccgtcgc 1920tcagcaccat gaagcactgg tggggcacgg
tttcacacac gcccatattg tggctctgtc 1980tcagcatccc gctgcactcg ggactgtggc
cgtcaaatat caggacatga tcgccgctct 2040gcctgaggca acccacgaag ccattgtggg
cgtcggaaag cagtggagcg gtgccagagc 2100actcgaagca ctcctcaccg tcgccgggga
actgcggggt ccaccactcc agtccggact 2160ggacactgga cagctgctga agatcgctaa
acgcggcgga gtgacagctg tggaagctgt 2220gcacgcttgg aggaatgctc tgacaggagc
cccactgaat ctgacacccc agcaggtggt 2280ggccattgct agcaacaatg ggggcaagca
ggctctggag acagtgcagc gcctgctgcc 2340tgtgctgtgc caggctcacg gactgactcc
acagcaggtg gtggccatcg cttccaacaa 2400tggagggaaa caggctctgg aaacagtgca
gaggctgctg cccgtgctgt gccaggctca 2460tggactgaca cctcagcagg tcgtcgccat
tgcttctaac ggcggaggga agcaggctct 2520ggagactgtg cagagactgc tgccagtgct
gtgccaggcc catggactga cccctcagca 2580ggtcgtggct atcgctagta acaatggcgg
aaaacaggct ctggaaactg tgcagcggct 2640gctccccgtg ctgtgccagg cccacggcct
cactccacag caggtcgtcg ctatcgcctc 2700taataacggg ggcaagcagg ctctggagac
agtacagcgc ctgttacccg tgctgtgcca 2760ggcacacggc ctcacacctc agcaggtcgt
ggcaatcgct tcccatgacg gagggaaaca 2820ggctctggaa acggtccaga ggctgctccc
cgtgctgtgc caagctcacg gcctcacccc 2880tcagcaggtg gtcgctattg cttctcatga
tggcggaaag caggctctgg agaccgtgca 2940gagactgctc cctgtgctgt gccaagccca
cggcctgact ccacagcagg tcgtggccat 3000cgctagtcat gacgggggca aacaggctct
ggaaacagta cagcggctgt tacccgtgct 3060gtgccaagcc catggcctca cacctcagca
agtcgtcgct atcgctagca acaatggagg 3120gaagcaggct ctggagacgg tgcagcgcct
gctcccagtg ctgtgccaag ctcatggcct 3180cacccctcag caagtcgtcg caattgcttc
caataacggc ggaaaacagg ctctggaaac 3240cgtccagagg ctgctgcccg tgctgtgcca
agcacatggc ttaactccac agcaagtggt 3300ggccattgct tctaatgggg gcggaaagca
ggccctggag acagtccaga gactgttgcc 3360cgtgctgtgc caagcgcatg gactgacacc
tgaacaggtc gtcgctatcg ctagtaatat 3420tgggggcaaa caggccctgg aaacagtgca
gcggctgctt cccgtgctgt gccaggcgca 3480tggactcaca ccccagcagg tcgtcgcaat
cgcctctaat aacggaggga agcaggccct 3540ggaaaccgtg cagagactgt tacctgtgct
gtgccaggca catggtctga caccacagca 3600ggtggtcgca attgctagca atggcggagg
gaagcaggcc ctggagactg tccagagact 3660gctacccgtg ctgtgccaag cgcacggcct
gaccccacag caggtcgtcg ctattgcttc 3720taatggcgga gggcggcctg ctctggagag
cattgtggct cagctgtcca ggcccgatcc 3780tgccctggct agatccgcac tcactaacga
tcatctggtc gctctcgctt gcctcggtgg 3840acggcccgct ctggacgcag tcaaaaaggg
tctcccccat gctcccgcac tgatcaagag 3900aaccaacagg agaattcctg agggatccga
tcgtttaaac gatcacgcgt aaatgattgc 3960agatccacta gttctagaat tccagctgag
cgccggtcgc taccattacc agttggtctg 4020gtgtcaaaaa taataataac cgggcagggg
ggatctgcat ggatctttgt gaaggaacct 4080tacttctgtg gtgtgacata attggacaaa
ctacctacag agatttaaag ctctaaggta 4140aatataaaat ttttaagtgt ataatgtgtt
aaactactga ttctaattgt ttgtgtattt 4200tagattccaa cctatggaac tgatgaatgg
gagcagtggt ggaatgccag atccagacat 4260gataagatac attgatgagt ttggacaaac
cacaactaga atgcagtgaa aaaaatgctt 4320tatttgtgaa atttgtgatg ctattgcttt
atttgtaacc attataagct gcaataaaca 4380agttaacaac aacaattgca ttcattttat
gtttcaggtt cagggggagg tgtgggaggt 4440tttttaaagc aagtaaaacc tctacaaatg
tggtatggct gattatgatc tgcggccgcc 4500actggccgtc gttttacaac gtcgtgactg
ggaaaaccct ggcgttaccc aacttaatcg 4560ccttgcagca catccccctt tcgccagctg
gcgtaatagc gaagaggccc gcaccgatcg 4620cccttcccaa cagttgcgca gcctgaatgg
cgaatggaac gcgccctgta gcggcgcatt 4680aagcgcggcg ggtgtggtgg ttacgcgcag
cgtgaccgct acacttgcca gcgccctagc 4740gcccgctcct ttcgctttct tcccttcctt
tctcgccacg ttcgccggct ttccccgtca 4800agctctaaat cgggggctcc ctttagggtt
ccgatttagt gctttacggc acctcgaccc 4860caaaaaactt gattagggtg atggttcacg
tagtgggcca tcgccctgat agacggtttt 4920tcgccctttg acgttggagt ccacgttctt
taatagtgga ctcttgttcc aaactggaac 4980aacactcaac cctatctcgg tctattcttt
tgatttataa gggattttgc cgatttcggc 5040ctattggtta aaaaatgagc tgatttaaca
aaaatttaac gcgaatttta acaaaatatt 5100aacgcttaca atttaggtgg cacttttcgg
ggaaatgtgc gcggaacccc tatttgttta 5160tttttctaaa tacattcaaa tatgtatccg
ctcatgagac aataaccctg ataaatgctt 5220caataatatt gaaaaaggaa gagtatgagt
attcaacatt tccgtgtcgc ccttattccc 5280ttttttgcgg cattttgcct tcctgttttt
gctcacccag aaacgctggt gaaagtaaaa 5340gatgctgaag atcagttggg tgcacgagtg
ggttacatcg aactggatct caacagcggt 5400aagatccttg agagttttcg ccccgaagaa
cgttttccaa tgatgagcac ttttaaagtt 5460ctgctatgtg gcgcggtatt atcccgtatt
gacgccgggc aagagcaact cggtcgccgc 5520atacactatt ctcagaatga cttggttgag
tactcaccag tcacagaaaa gcatcttacg 5580gatggcatga cagtaagaga attatgcagt
gctgccataa ccatgagtga taacactgcg 5640gccaacttac ttctgacaac gatcggagga
ccgaaggagc taaccgcttt tttgcacaac 5700atgggggatc atgtaactcg ccttgatcgt
tgggaaccgg agctgaatga agccatacca 5760aacgacgagc gtgacaccac gatgcctgta
gcaatggcaa caacgttgcg caaactatta 5820actggcgaac tacttactct agcttcccgg
caacaattaa tagactggat ggaggcggat 5880aaagttgcag gaccacttct gcgctcggcc
cttccggctg gctggtttat tgctgataaa 5940tctggagccg gtgagcgtgg gtctcgcggt
atcattgcag cactggggcc agatggtaag 6000ccctcccgta tcgtagttat ctacacgacg
gggagtcagg caactatgga tgaacgaaat 6060agacagatcg ctgagatagg tgcctcactg
attaagcatt ggtaactgtc agaccaagtt 6120tactcatata tactttagat tgatttaaaa
cttcattttt aatttaaaag gatctaggtg 6180aagatccttt ttgataatct catgaccaaa
atcccttaac gtgagttttc gttccactga 6240gcgtcagacc ccgtagaaaa gatcaaagga
tcttcttgag atcctttttt tctgcgcgta 6300atctgctgct tgcaaacaaa aaaaccaccg
ctaccagcgg tggtttgttt gccggatcaa 6360gagctaccaa ctctttttcc gaaggtaact
ggcttcagca gagcgcagat accaaatact 6420gtccttctag tgtagccgta gttaggccac
cacttcaaga actctgtagc accgcctaca 6480tacctcgctc tgctaatcct gttaccagtg
gctgctgcca gtggcgataa gtcgtgtctt 6540accgggttgg actcaagacg atagttaccg
gataaggcgc agcggtcggg ctgaacgggg 6600ggttcgtgca cacagcccag cttggagcga
acgacctaca ccgaactgag atacctacag 6660cgtgagctat gagaaagcgc cacgcttccc
gaagggagaa aggcggacag gtatccggta 6720agcggcaggg tcggaacagg agagcgcacg
agggagcttc cagggggaaa cgcctggtat 6780ctttatagtc ctgtcgggtt tcgccacctc
tgacttgagc gtcgattttt gtgatgctcg 6840tcaggggggc ggagcctatg gaaaaacgcc
agcaacgcgg cctttttacg gttcctggcc 6900ttttgctggc cttttgctca catgttcttt
cctgcgttat cccctgattc tgtggataac 6960cgtattaccg cctttgagtg agctgatacc
gctcgccgca gccgaacgac cgagcgcagc 7020gagtcagtga gcgaggaagc ggaagagcgc
ccaatacgca aaccgcctct ccccgcgcgt 7080tggccgattc attaatgcag ctggcacgac
aggtttcccg actggaaagc gggcagtgag 7140cgcaacgcaa ttaatgtgag ttagctcact
cattaggcac cccaggcttt acactttatg 7200cttccggctc gtatgttgtg tggaattgtg
agcggataac aatttcacac aggaaacagc 7260tatgaccatg a
727110583PRTFlavobacterium okeanokoites
10Met Phe Leu Ser Met Val Ser Lys Ile Arg Thr Phe Gly Trp Val Gln 1
5 10 15 Asn Pro Gly Lys
Phe Glu Asn Leu Lys Arg Val Val Gln Val Phe Asp 20
25 30 Arg Asn Ser Lys Val His Asn Glu Val
Lys Asn Ile Lys Ile Pro Thr 35 40
45 Leu Val Lys Glu Ser Lys Ile Gln Lys Glu Leu Val Ala Ile
Met Asn 50 55 60
Gln His Asp Leu Ile Tyr Thr Tyr Lys Glu Leu Val Gly Thr Gly Thr 65
70 75 80 Ser Ile Arg Ser Glu
Ala Pro Cys Asp Ala Ile Ile Gln Ala Thr Ile 85
90 95 Ala Asp Gln Gly Asn Lys Lys Gly Tyr Ile
Asp Asn Trp Ser Ser Asp 100 105
110 Gly Phe Leu Arg Trp Ala His Ala Leu Gly Phe Ile Glu Tyr Ile
Asn 115 120 125 Lys
Ser Asp Ser Phe Val Ile Thr Asp Val Gly Leu Ala Tyr Ser Lys 130
135 140 Ser Ala Asp Gly Ser Ala
Ile Glu Lys Glu Ile Leu Ile Glu Ala Ile 145 150
155 160 Ser Ser Tyr Pro Pro Ala Ile Arg Ile Leu Thr
Leu Leu Glu Asp Gly 165 170
175 Gln His Leu Thr Lys Phe Asp Leu Gly Lys Asn Leu Gly Phe Ser Gly
180 185 190 Glu Ser
Gly Phe Thr Ser Leu Pro Glu Gly Ile Leu Leu Asp Thr Leu 195
200 205 Ala Asn Ala Met Pro Lys Asp
Lys Gly Glu Ile Arg Asn Asn Trp Glu 210 215
220 Gly Ser Ser Asp Lys Tyr Ala Arg Met Ile Gly Gly
Trp Leu Asp Lys 225 230 235
240 Leu Gly Leu Val Lys Gln Gly Lys Lys Glu Phe Ile Ile Pro Thr Leu
245 250 255 Gly Lys Pro
Asp Asn Lys Glu Phe Ile Ser His Ala Phe Lys Ile Thr 260
265 270 Gly Glu Gly Leu Lys Val Leu Arg
Arg Ala Lys Gly Ser Thr Lys Phe 275 280
285 Thr Arg Val Pro Lys Arg Val Tyr Trp Glu Met Leu Ala
Thr Asn Leu 290 295 300
Thr Asp Lys Glu Tyr Val Arg Thr Arg Arg Ala Leu Ile Leu Glu Ile 305
310 315 320 Leu Ile Lys Ala
Gly Ser Leu Lys Ile Glu Gln Ile Gln Asp Asn Leu 325
330 335 Lys Lys Leu Gly Phe Asp Glu Val Ile
Glu Thr Ile Glu Asn Asp Ile 340 345
350 Lys Gly Leu Ile Asn Thr Gly Ile Phe Ile Glu Ile Lys Gly
Arg Phe 355 360 365
Tyr Gln Leu Lys Asp His Ile Leu Gln Phe Val Ile Pro Asn Arg Gly 370
375 380 Val Thr Lys Gln Leu
Val Lys Ser Glu Leu Glu Glu Lys Lys Ser Glu 385 390
395 400 Leu Arg His Lys Leu Lys Tyr Val Pro His
Glu Tyr Ile Glu Leu Ile 405 410
415 Glu Ile Ala Arg Asn Ser Thr Gln Asp Arg Ile Leu Glu Met Lys
Val 420 425 430 Met
Glu Phe Phe Met Lys Val Tyr Gly Tyr Arg Gly Lys His Leu Gly 435
440 445 Gly Ser Arg Lys Pro Asp
Gly Ala Ile Tyr Thr Val Gly Ser Pro Ile 450 455
460 Asp Tyr Gly Val Ile Val Asp Thr Lys Ala Tyr
Ser Gly Gly Tyr Asn 465 470 475
480 Leu Pro Ile Gly Gln Ala Asp Glu Met Gln Arg Tyr Val Glu Glu Asn
485 490 495 Gln Thr
Arg Asn Lys His Ile Asn Pro Asn Glu Trp Trp Lys Val Tyr 500
505 510 Pro Ser Ser Val Thr Glu Phe
Lys Phe Leu Phe Val Ser Gly His Phe 515 520
525 Lys Gly Asn Tyr Lys Ala Gln Leu Thr Arg Leu Asn
His Ile Thr Asn 530 535 540
Cys Asn Gly Ala Val Leu Ser Val Glu Glu Leu Leu Ile Gly Gly Glu 545
550 555 560 Met Ile Lys
Ala Gly Thr Leu Thr Leu Glu Glu Val Arg Arg Lys Phe 565
570 575 Asn Asn Gly Glu Ile Asn Phe
580
<210> SEQ ID NO 11
<211> LENGTH: 558
<212> TYPE: PRT
<213> ORGANISM: Acinetobacter lwoffii
<400> SEQUENCE: 11
Met Ser Thr Trp
Leu Leu Gly Asn Thr Thr Val Arg Ser Pro Phe Arg 1 5
10 15 Leu Ile Asp Gly Leu Lys Val Phe Ala
Leu Thr Asn Gly Asp Ile Arg 20 25
30 Gly Thr Lys Glu Lys Glu Leu Val Phe Cys Lys Ala Leu Val
Glu Gly 35 40 45
Gly Ile Ile Ser Ala Ser Phe Glu Ala Glu Asp Thr Ser Gly Phe Ser 50
55 60 Asp Thr Thr Tyr Ser
Val Gly Arg Lys Trp Arg Ser Ala Leu Glu Lys 65 70
75 80 Leu Gly Phe Ile Glu Gln Phe Asn Gln Ile
Tyr Ile Leu Thr Glu Asn 85 90
95 Gly Arg Asn Leu Leu Asn Ser Gln Thr Leu Gln Ser Asp Gln Glu
Cys 100 105 110 Tyr
Leu Arg Ser Leu Ile Leu Tyr Ser Tyr Lys Ala Glu Asn Ser Asp 115
120 125 Asn Pro Gly Gly Phe Phe
Ser Pro Leu Met Leu Thr Leu His Ile Met 130 135
140 Lys Glu Leu Glu Ile Arg Thr Gly Ser Ser Arg
Ile Ser Phe Gln Glu 145 150 155
160 Met Ala Ala Val Ile Gln Leu Thr Phe Ser Tyr Leu Asp Ile Asn Gln
165 170 175 Ser Val
Asn Glu Ile Leu Thr Ile Arg Ser Asn Arg Gln Ala Ser Leu 180
185 190 Ser Lys Lys Lys Phe Asp Arg
Glu Leu Tyr Glu Ser Lys Ser Ser Lys 195 200
205 Ala Lys Ile Lys Ala Pro Ser Ile Lys Asp Tyr Ala
Asp Thr Asn Leu 210 215 220
Arg Tyr Leu Lys Ser Thr Gly Leu Phe Thr Ala Ser Gly Lys Gly Ile 225
230 235 240 Cys Phe Ile
Asp Asp Lys Lys Ile Val Ile Asp Lys Leu Ile Ala Met 245
250 255 Tyr Gly Thr Phe Asp Ile Ser Gln
Ser Asp Leu Lys Ile Gln Lys Gly 260 265
270 Ala Pro Leu Pro Thr Asp His Lys Glu Thr Asn Ile Leu
Leu Val Glu 275 280 285
Gln Leu Glu Glu Thr Leu Asn Arg Asn Arg Ile Leu Phe Glu Lys Asn 290
295 300 Ser Ser Ile Ala
Gln Ala Pro Ile Gly Glu Ile Lys Asn Tyr Arg Tyr 305 310
315 320 His Leu Glu Glu Leu Leu Phe Glu Asn
Asn Glu Lys Lys Phe Ala Glu 325 330
335 Asn Gln Lys Asn Glu Trp Asp Glu Ile Leu Ala Tyr Met Asp
Leu Leu 340 345 350
Ile Ser Pro Lys Pro Ile Ser Ile Glu Ile Ala Asp Lys Glu Ile Ser
355 360 365 Ile Pro Ser Gly
Glu Arg Pro Ala Tyr Phe Glu Trp Val Leu Trp Arg 370
375 380 Ala Phe Leu Ala Leu Asn His Leu
Ile Ile Glu Pro Gln Gln Cys Arg 385 390
395 400 Arg Phe Lys Val Asp Gln Asp Phe Lys Pro Ile His
Asn Ala Pro Gly 405 410
415 Gly Gly Ala Asp Val Ile Phe Glu Tyr Glu Asn Phe Lys Ile Leu Gly
420 425 430 Glu Val Thr
Leu Thr Ser Asn Ser Arg Gln Glu Ala Ala Glu Gly Glu 435
440 445 Pro Val Arg Arg His Ile Ala Val
Glu Thr Val Asn Thr Pro Asp Lys 450 455
460 Asp Val Tyr Gly Leu Phe Leu Ala Leu Thr Ile Asp Thr
Asn Thr Ala 465 470 475
480 Glu Thr Phe Arg His Gly Ala Trp Tyr His Gln Glu Glu Leu Met Asp
485 490 495 Val Lys Ile Leu
Pro Leu Thr Leu Glu Ser Phe Lys Lys Tyr Leu Glu 500
505 510 Ser Leu Arg Lys Lys Asn Gln Val Glu
Thr Gly Ile Phe Asp Leu Lys 515 520
525 Lys Met Met Asp Glu Ser Leu Lys Leu Arg Glu Thr Leu Thr
Ala Pro 530 535 540
Gln Trp Lys Asn Glu Ile Thr Asn Lys Phe Ala Arg Pro Ile 545
550 555
<210> SEQ ID NO 12
<211> LENGTH: 556
<212> TYPE: PRT
<213> ORGANISM: Micrococcus lylae
<400> SEQUENCE: 12
Met Ala
Ser Leu Ser Lys Thr Lys His Leu Phe Gly Phe Thr Ser Pro 1 5
10 15 Arg Thr Ile Glu Lys Ile Ile
Pro Glu Leu Asp Ile Leu Ser Gln Gln 20 25
30 Phe Ser Gly Lys Val Trp Gly Glu Asn Gln Ile Asn
Phe Phe Asp Ala 35 40 45
Ile Phe Asn Ser Asp Phe Tyr Glu Gly Thr Thr Tyr Pro Gln Asp Pro
50 55 60 Ala Leu Ala
Ala Arg Asp Arg Ile Thr Arg Ala Pro Lys Ala Leu Gly 65
70 75 80 Phe Ile Gln Leu Lys Pro Val
Ile Gln Leu Thr Lys Ala Gly Asn Gln 85
90 95 Leu Val Asn Gln Lys Arg Leu Pro Glu Leu Phe
Thr Lys Gln Leu Leu 100 105
110 Lys Phe Gln Leu Pro Ser Pro Tyr His Thr Gln Ser Pro Thr Val
Asn 115 120 125 Phe
Asn Val Arg Pro Tyr Leu Glu Leu Leu Arg Leu Ile Asn Glu Leu 130
135 140 Gly Ser Ile Ser Lys Thr
Glu Ile Ala Leu Phe Phe Leu Gln Leu Val 145 150
155 160 Asn Tyr Asn Lys Phe Asp Glu Ile Lys Asn Lys
Ile Leu Lys Phe Arg 165 170
175 Glu Thr Arg Lys Asn Asn Arg Ser Val Ser Trp Lys Thr Tyr Val Ser
180 185 190 Gln Glu
Phe Glu Lys Gln Ile Ser Ile Ile Phe Ala Asp Glu Val Thr 195
200 205 Ala Lys Asn Phe Arg Thr Arg
Glu Ser Ser Asp Glu Ser Phe Lys Lys 210 215
220 Phe Val Lys Thr Lys Glu Gly Asn Met Lys Asp Tyr
Ala Asp Ala Phe 225 230 235
240 Phe Arg Tyr Ile Arg Gly Thr Gln Leu Val Thr Ile Asp Lys Asn Leu
245 250 255 His Leu Lys
Ile Ser Ser Leu Lys Gln Asp Ser Val Asp Phe Leu Leu 260
265 270 Lys Asn Thr Asp Arg Asn Ala Leu
Asn Leu Ser Leu Met Glu Tyr Glu 275 280
285 Asn Tyr Leu Phe Asp Pro Asp Gln Leu Ile Val Leu Glu
Asp Asn Ser 290 295 300
Gly Leu Ile Asn Ser Lys Ile Lys Gln Leu Asp Asp Ser Ile Asn Val 305
310 315 320 Glu Ser Leu Lys
Ile Asp Asp Ala Lys Asp Leu Leu Asn Asp Leu Glu 325
330 335 Ile Gln Arg Lys Ala Lys Thr Ile Glu
Asp Thr Val Asn His Leu Lys 340 345
350 Leu Arg Ser Asp Ile Glu Asp Ile Leu Asp Val Phe Ala Lys
Ile Lys 355 360 365
Lys Arg Asp Val Pro Asp Val Pro Leu Phe Leu Glu Trp Asn Ile Trp 370
375 380 Arg Ala Phe Ala Ala
Leu Asn His Thr Gln Ala Ile Glu Gly Asn Phe 385 390
395 400 Ile Val Asp Leu Asp Gly Met Pro Leu Asn
Thr Ala Pro Gly Lys Lys 405 410
415 Pro Asp Ile Glu Ile Asn Tyr Gly Ser Phe Ser Cys Ile Val Glu
Val 420 425 430 Thr
Met Ser Ser Gly Glu Thr Gln Phe Asn Met Glu Gly Ser Ser Val 435
440 445 Pro Arg His Tyr Gly Asp
Leu Val Arg Lys Val Asp His Asp Ala Tyr 450 455
460 Cys Ile Phe Ile Ala Pro Lys Val Ala Pro Gly
Thr Lys Ala His Phe 465 470 475
480 Phe Asn Leu Asn Arg Leu Ser Thr Lys His Tyr Gly Gly Lys Thr Lys
485 490 495 Ile Ile
Pro Met Ser Leu Asp Asp Phe Ile Cys Phe Leu Gln Val Gly 500
505 510 Ile Thr His Asn Phe Gln Asp
Ile Asn Lys Leu Lys Asn Trp Leu Asp 515 520
525 Asn Leu Ile Asn Phe Asn Leu Glu Ser Glu Asp Glu
Glu Ile Trp Phe 530 535 540
Glu Glu Ile Ile Ser Lys Ile Ser Thr Trp Ala Ile 545
550 555
<210> SEQ ID NO 13
<211> LENGTH: 323
<212> TYPE: PRT
<213> ORGANISM: Streptomyces spec. Bf-61
<400> SEQUENCE: 13
Met Asn
Ser Ser Asp Gly Ile Asp Gly Thr Val Ala Ser Ile Asp Thr 1 5
10 15 Ala Arg Ala Leu Leu Lys Arg
Phe Gly Phe Asp Ala Gln Arg Tyr Asn 20 25
30 Val Arg Ser Ala Val Thr Leu Leu Ala Leu Ala Gly
Leu Lys Pro Gly 35 40 45
Asp Arg Trp Val Asp Ser Thr Thr Pro Arg Leu Gly Val Gln Lys Ile
50 55 60 Met Asp Trp
Ser Gly Glu His Trp Ala Lys Pro Tyr Ala Thr Gly Ser 65
70 75 80 Arg Glu Asp Phe Arg Lys Lys
Thr Leu Arg Gln Trp Val Asp Asn Gly 85
90 95 Phe Ala Val Leu Asn Ala Asp Asn Leu Asn Ile
Ala Thr Asn Ser Gln 100 105
110 Leu Asn Glu Tyr Cys Leu Ser Asp Glu Ala Leu Gln Ala Leu Arg
Ala 115 120 125 Tyr
Gly Thr Glu Gly Phe Glu Glu Ser Leu Val Val Phe Leu Asp Glu 130
135 140 Ala Ser Lys Ala Val Lys
Ala Arg Ala Glu Ala Leu Gln Ala Ala Met 145 150
155 160 Ile Ser Val Asp Leu Pro Gly Gly Glu Glu Phe
Leu Leu Ser Pro Ala 165 170
175 Gly Gln Asn Pro Leu Leu Lys Lys Met Val Glu Glu Phe Val Pro Arg
180 185 190 Phe Ala
Pro Arg Ser Thr Val Leu Tyr Leu Gly Asp Thr Arg Gly Lys 195
200 205 His Ser Leu Phe Glu Arg Glu
Ile Phe Glu Glu Val Leu Gly Leu Thr 210 215
220 Phe Asp Pro His Gly Arg Met Pro Asp Leu Ile Leu
His Asp Glu Val 225 230 235
240 Arg Gly Trp Leu Phe Leu Met Glu Ala Val Lys Ser Lys Gly Pro Phe
245 250 255 Asp Glu Glu
Arg His Arg Ser Leu Gln Glu Leu Phe Val Thr Pro Ser 260
265 270 Ala Gly Leu Ile Phe Val Asn Cys
Phe Glu Asn Arg Glu Ser Met Arg 275 280
285 Gln Trp Leu Pro Glu Leu Ala Trp Glu Thr Glu Ala Trp
Val Ala Glu 290 295 300
Asp Pro Asp His Leu Ile His Leu Asn Gly Ser Arg Phe Leu Gly Pro 305
310 315 320 Tyr Glu Arg
<210> SEQ ID NO 14
<211> LENGTH: 323
<212> TYPE: PRT
<213> ORGANISM: Streptomyces diastaticus
<400> SEQUENCE: 14
Met Thr Asn Ser Asn Asp Ile Asp Glu
Thr Ala Ala Thr Ile Asp Thr 1 5 10
15 Ala Arg Ala Leu Leu Lys Ser Phe Gly Phe Glu Ala Gln Arg
His Asn 20 25 30
Val Arg Ser Ala Val Thr Leu Leu Ala Leu Ala Gly Leu Lys Pro Gly
35 40 45 Asp His Trp Ala
Asp Ser Thr Thr Pro Arg Leu Gly Val Gln Lys Ile 50
55 60 Met Asp Trp Ser Gly Ala Tyr Trp
Ala Lys Pro Tyr Ala Thr Gly Ser 65 70
75 80 Arg Glu Asp Phe Arg Lys Lys Thr Leu Arg Gln Trp
Val Asp Asn Gly 85 90
95 Phe Ala Val Leu Asn Pro Asp Asn Leu Asn Ile Ala Thr Asn Ser Gln
100 105 110 Leu Asn Glu
Tyr Cys Leu Ser Asp Glu Ala Ala Gln Ala Ile Arg Ser 115
120 125 Tyr Gly Thr Asp Ala Phe Glu Ser
Ala Leu Val Asp Phe Leu Ser Lys 130 135
140 Ala Ser Asp Thr Val Arg Ala Arg Ala Glu Ala Leu Arg
Ala Ala Met 145 150 155
160 Ile Ser Val Asp Leu Ala Asp Gly Asp Glu Phe Leu Leu Ser Pro Ala
165 170 175 Gly Gln Asn Pro
Leu Leu Lys Lys Met Val Glu Glu Phe Met Pro Arg 180
185 190 Phe Ala Pro Gly Ala Lys Val Leu Tyr
Ile Gly Asp Trp Arg Gly Lys 195 200
205 His Thr Arg Phe Glu Lys Arg Ile Phe Glu Glu Thr Leu Gly
Leu Thr 210 215 220
Phe Asp Pro His Gly Arg Met Pro Asp Leu Val Leu His Asp Lys Val 225
230 235 240 Arg Lys Trp Leu Phe
Leu Met Glu Ala Val Lys Ser Lys Gly Pro Phe 245
250 255 Asp Glu Glu Arg His Arg Thr Leu Arg Glu
Leu Phe Ala Thr Pro Val 260 265
270 Ala Gly Leu Val Phe Val Asn Cys Phe Glu Asn Arg Glu Ala Met
Arg 275 280 285 Gln
Trp Leu Pro Glu Leu Ala Trp Glu Thr Glu Ala Trp Val Ala Asp 290
295 300 Asp Pro Asp His Leu Ile
His Leu Asn Gly Ser Arg Phe Leu Gly Pro 305 310
315 320 Tyr Glu Arg
<210> SEQ ID NO 15
<211> LENGTH: 602
<212> TYPE: PRT
<213> ORGANISM: Streptococcus sanguis
<400> SEQUENCE: 15
Met Thr Ile Ser Ile Asn Glu Tyr Ser Asp Leu Asn Asn Leu Ala Phe 1
5 10 15 Gly Leu Gly Gln
Asp Val Ser Gln Asp Leu Lys Glu Leu Val Lys Val 20
25 30 Ala Ser Ile Phe Met Pro Asp Ser Lys
Ile His Lys Trp Leu Ile Asp 35 40
45 Thr Arg Leu Glu Glu Val Val Thr Asp Leu Asn Leu Arg Tyr
Glu Leu 50 55 60
Lys Ser Val Ile Thr Asn Thr Pro Ile Ser Val Thr Trp Lys Gln Leu 65
70 75 80 Thr Gly Thr Arg Thr
Lys Arg Glu Ala Asn Ser Leu Val Gln Ala Val 85
90 95 Phe Pro Gly Gln Cys Ser Arg Leu Ala Ile
Val Asp Trp Ala Ala Lys 100 105
110 Asn Tyr Val Ser Val Ala Val Ala Phe Gly Leu Leu Lys Phe His
Arg 115 120 125 Ala
Asp Lys Thr Phe Thr Ile Ser Glu Ile Gly Ile Gln Ala Val Lys 130
135 140 Leu Tyr Asp Ser Glu Glu
Leu Ala Glu Leu Asp Lys Phe Leu Tyr Glu 145 150
155 160 Arg Leu Leu Glu Tyr Pro Tyr Ala Ala Trp Leu
Ile Arg Leu Leu Gly 165 170
175 Asn Gln Pro Ser Lys Gln Phe Ser Lys Phe Asp Leu Gly Glu His Phe
180 185 190 Gly Phe
Ile Asp Glu Leu Gly Phe Glu Thr Ala Pro Ile Glu Ile Phe 195
200 205 Leu Asn Gly Leu Ala Gln Ala
Glu Ile Asp Gly Asp Lys Thr Ala Ala 210 215
220 Gln Lys Ile Lys Ser Asn Phe Glu Ser Thr Ser Asp
Lys Tyr Met Arg 225 230 235
240 Trp Leu Ala Gly Val Leu Val Thr Ala Gly Leu Ala Thr Ser Thr Thr
245 250 255 Lys Lys Val
Thr His Thr Tyr Lys Asn Arg Lys Phe Glu Leu Thr Leu 260
265 270 Gly Thr Val Tyr Gln Ile Thr Ala
Lys Gly Leu Thr Ala Leu Lys Glu 275 280
285 Val Asn Gly Lys Ser Arg Tyr Pro Arg Ser Arg Lys Arg
Val Met Trp 290 295 300
Glu Phe Leu Ala Thr Lys Asp Lys Glu Ala Ile Ala Lys Lys Thr Ser 305
310 315 320 Arg Ser Leu Met
Leu Lys His Leu Thr Glu Lys Lys Asn Pro Ile Gln 325
330 335 Ala Glu Val Ile Ala Thr Leu Ile Asn
Thr Asp Tyr Pro Thr Leu Glu 340 345
350 Ile Thr Pro Glu Glu Val Ile Asp Asp Cys Ile Gly Leu Asn
Arg Ile 355 360 365
Gly Ile Glu Ile Leu Ile Asp Gly Asp Lys Leu Thr Leu Asn Asp Lys 370
375 380 Leu Phe Asp Phe Glu
Ile Pro Val Gln Lys Asp Val Val Leu Glu Lys 385 390
395 400 Ser Asp Ile Glu Lys Phe Lys Asn Gln Leu
Arg Thr Glu Leu Thr Asn 405 410
415 Ile Asp His Ser Tyr Leu Lys Gly Ile Asp Ile Ala Ser Lys Lys
Lys 420 425 430 Thr
Ser Asn Val Glu Asn Thr Glu Phe Glu Ala Ile Ser Thr Lys Ile 435
440 445 Phe Thr Asp Glu Leu Gly
Phe Ser Gly Lys His Leu Gly Gly Ser Asn 450 455
460 Lys Pro Asp Gly Leu Leu Trp Asp Asp Asp Cys
Ala Ile Ile Leu Asp 465 470 475
480 Ser Lys Ala Tyr Ser Glu Gly Phe Pro Leu Thr Ala Ser His Thr Asp
485 490 495 Ala Met
Gly Arg Tyr Leu Arg Gln Phe Thr Glu Arg Lys Glu Glu Ile 500
505 510 Lys Pro Thr Trp Trp Asp Ile
Ala Pro Glu His Leu Asp Asn Thr Tyr 515 520
525 Phe Ala Tyr Val Ser Gly Ser Phe Ser Gly Asn Tyr
Lys Glu Gln Leu 530 535 540
Gln Lys Phe Arg Gln Asp Thr Asn His Leu Gly Gly Ala Leu Glu Phe 545
550 555 560 Val Lys Leu
Leu Leu Leu Ala Asn Asn Tyr Lys Thr Gln Lys Met Ser 565
570 575 Lys Lys Glu Val Lys Lys Ser Ile
Leu Asp Tyr Asn Ile Ser Tyr Glu 580 585
590 Glu Tyr Ala Pro Leu Leu Ala Glu Ile Glu 595
600
<210> SEQ ID NO 16
<211> LENGTH: 593
<212> TYPE: PRT
<213> ORGANISM: Clostridium leptum
<400> SEQUENCE: 16
Met Ile His Leu
Ile Pro Thr Glu Ala Lys Arg Phe Arg Thr Phe Gly 1 5
10 15 Trp Val Gln Asp Pro Ser Asp Phe Arg
Ser Leu Cys Asp Val Val Ala 20 25
30 Ile Phe Asp Glu Thr Ser Leu Lys His Gln Glu Leu Ala Gly
Gln Val 35 40 45
Ile Pro Ala Leu Val Glu Glu Arg Asp Gly Arg Gln Arg Leu Leu Asp 50
55 60 Ala Leu Asn Gln Arg
Pro Leu Arg Ile Ser Tyr Thr Asp Leu Val Gly 65 70
75 80 Thr Ser Phe Thr Pro Arg Ser Ala Ala Arg
Cys Asn Gly Ile Val Gln 85 90
95 Ala Ala Val Arg Gly Gln Val Arg Pro Phe Ile Gly Asp Trp Pro
Ala 100 105 110 Asp
Asn Phe Val Arg Trp Ala His Ala Leu Gly Phe Leu Arg Tyr Gly 115
120 125 Tyr Gln Gly Asp Ala Phe
Glu Leu Thr Glu Thr Gly Lys Ala Leu Ala 130 135
140 Gln Ala Arg Thr Gln Gly Glu Glu Leu Asn Ser
Gln Glu Lys Glu Leu 145 150 155
160 Leu Thr Ser Ala Val Leu Ala Tyr Pro Pro Ala Val Arg Ile Leu Ser
165 170 175 Leu Leu
Gly Glu Gly Glu Gly Ala His Leu Thr Lys Phe Glu Leu Gly 180
185 190 Lys Gln Leu Gly Phe Val Gly
Glu Asp Gly Phe Thr Ser Leu Pro Gln 195 200
205 Thr Val Leu Val Arg Ser Leu Ala Ser Ser Lys Asp
Ala Lys Glu Lys 210 215 220
Asn Lys Met Lys Thr Asp Trp Asp Gly Ser Ser Asp Lys Tyr Ala Arg 225
230 235 240 Met Ile Ala
Lys Trp Leu Glu Lys Leu Gly Leu Val Lys Gln Glu Ala 245
250 255 Lys Pro Val Thr Val Thr Leu Ala
Gly Arg Lys Tyr Thr Glu Ser Ile 260 265
270 Gly Gln Ser Tyr Val Ile Thr Gly Leu Gly Ile Thr Ala
Leu Asn Arg 275 280 285
Thr Leu Gly Lys Ser Arg His Lys Arg Ile Pro Lys Asn Val Ser Phe 290
295 300 Glu Met Met Ala
Thr Lys Gly Asp Asp Arg Glu Tyr Leu Arg Thr Arg 305 310
315 320 Arg Thr Cys Val Leu Lys Ala Val Ser
Glu Gly Lys Gly Arg Val Ser 325 330
335 Tyr Thr Glu Ile Gln Lys Tyr Leu Glu Ala Leu Gly Leu Gln
Glu Asp 340 345 350
Glu Ala Thr Ile Arg Asp Asp Val Gln Gly Leu Ile His Ile Gly Leu
355 360 365 Asn Ile Ala Ala
Gly Glu Arg Glu Cys Val Trp Lys Asp Glu Ile Asn 370
375 380 Asp Leu Ile Leu Pro Val Pro Lys
Lys Leu Ala Lys Ser Ser Gln Ser 385 390
395 400 Glu Thr Lys Glu Lys Leu Arg Glu Lys Leu Arg Asn
Leu Pro His Glu 405 410
415 Tyr Leu Ser Leu Val Asp Leu Ala Tyr Asp Ser Lys Gln Asn Arg Leu
420 425 430 Phe Glu Met
Lys Val Ile Glu Leu Leu Thr Glu Glu Cys Gly Phe Gln 435
440 445 Gly Leu His Leu Gly Gly Ser Arg
Arg Pro Asp Gly Val Leu Tyr Thr 450 455
460 Ala Gly Leu Thr Asp Asn Tyr Gly Ile Ile Leu Asp Thr
Lys Ala Tyr 465 470 475
480 Ser Ser Gly Tyr Ser Leu Pro Ile Ala Gln Ala Asp Glu Met Glu Arg
485 490 495 Tyr Val Arg Glu
Asn Gln Thr Arg Asp Glu Leu Val Asn Pro Asn Gln 500
505 510 Trp Trp Glu Asn Phe Glu Asn Gly Leu
Gly Thr Phe Tyr Phe Leu Phe 515 520
525 Val Ala Gly His Phe Asn Gly Asn Val Gln Ala Gln Leu Glu
Arg Ile 530 535 540
Ser Arg Asn Thr Gly Val Leu Gly Ala Ala Ala Ser Ile Ser Gln Leu 545
550 555 560 Leu Leu Leu Ala Asp
Ala Ile Arg Gly Gly Arg Met Asp Arg Glu Arg 565
570 575 Leu Arg His Leu Met Phe Gln Asn Glu Glu
Phe Leu Leu Glu Gln Glu 580 585
590 Leu
<210> SEQ ID NO 17
<211> LENGTH: 587
<212> TYPE: PRT
<213> ORGANISM: Clostridium spec. 7_2_43 FAA
<400> SEQUENCE: 17
Met Ile Asn Ile
Ile Asp Val Asn Asn Lys Thr Ile Arg Thr Phe Gly 1 5
10 15 Trp Val Gln Asn Pro Ser Asn Phe Glu
Ser Leu Lys Lys Val Val Ala 20 25
30 Ile Phe Asp Asn Thr Ser Lys Thr Tyr Asn Glu Leu Lys Asp
Lys Lys 35 40 45
Ile Lys Lys Leu Val Asp Glu Arg Asp Gly Gln Lys Glu Leu Leu Asn 50
55 60 Ala Leu Asn Ala Asn
Pro Leu Lys Ile Lys Tyr Cys Asn Leu Val Gly 65 70
75 80 Thr Ser Phe Thr Pro Arg Ser Ser Ala Arg
Cys Asn Gly Ile Val Gln 85 90
95 Ala Thr Val Lys Gly Gln Arg Lys Glu Phe Ile Asp Asp Trp Ser
Ser 100 105 110 Asp
Asn Phe Val Arg Trp Ala His Ala Leu Gly Phe Ile Lys Tyr Asn 115
120 125 Tyr Asp Thr Asp Thr Phe
Glu Ile Thr Asp Val Gly Arg Lys Tyr Val 130 135
140 Gln Ser Glu Asp Asp Ser Asn Glu Glu Ser Thr
Ile Leu Glu Glu Ala 145 150 155
160 Met Leu Ser Tyr Pro Pro Val Ala Arg Val Leu Thr Leu Leu Ser Asn
165 170 175 Gly Glu
His Leu Thr Lys Tyr Glu Ile Gly Lys Lys Leu Gly Phe Val 180
185 190 Gly Glu Ala Gly Phe Thr Ser
Leu Pro Leu Asn Val Leu Ile Met Thr 195 200
205 Leu Ala Thr Thr Asp Glu Pro Lys Glu Lys Asn Lys
Ile Lys Thr Asp 210 215 220
Trp Asp Gly Ser Ser Asp Lys Tyr Ala Arg Met Ile Ser Gly Trp Leu 225
230 235 240 Val Lys Leu
Gly Leu Leu Val Gln Arg Pro Lys Leu Val Thr Val Asp 245
250 255 Phe Gly Gly Glu Leu Tyr Ser Glu
Thr Ile Gly His Ala Tyr Met Ile 260 265
270 Thr Asp Arg Gly Leu Lys Ala Val Arg Arg Leu Leu Gly
Ile Asn Lys 275 280 285
Val Ala Arg Val Ser Lys Asn Val Phe Trp Glu Met Leu Ala Thr Lys 290
295 300 Gly Ile Asp Lys
Asn Tyr Ile Arg Thr Arg Arg Ala Tyr Ile Leu Lys 305 310
315 320 Ile Leu Ile Glu Ser Asn Lys Val Leu
Thr Leu Glu Asp Ile Lys Gly 325 330
335 Lys Leu Lys Leu Ala Ser Ile Asn Glu Ser Ile Asn Thr Ile
Lys Asp 340 345 350
Asp Ile Asn Gly Leu Ile Asn Thr Gly Ile Asn Ile Lys Ser Glu Thr
355 360 365 Thr Gly Tyr Lys
Ile Tyr Asp Ser Ile Asn Asp Phe Ile Ile Pro Lys 370
375 380 Thr Gly Asp Thr Glu Gly Ile Lys
Ser Asn Ile Ser Leu Leu Lys Asp 385 390
395 400 Glu Leu Arg Gly Gln Ile Ser His Ile Ser His Glu
Tyr Leu Ser Leu 405 410
415 Ile Asp Leu Ala Phe Asp Ser Lys Gln Asn Arg Leu Phe Glu Met Lys
420 425 430 Val Leu Glu
Leu Leu Val Asn Glu Tyr Gly Phe Lys Gly Arg His Leu 435
440 445 Gly Gly Ser Arg Lys Pro Asp Gly
Ile Val Tyr Ser Thr Thr Leu Glu 450 455
460 Asp Asn Phe Gly Ile Ile Val Asp Thr Lys Ala Tyr Ser
Glu Gly Tyr 465 470 475
480 Ser Leu Pro Ile Ser Gln Ala Asp Glu Met Glu Arg Tyr Val Arg Glu
485 490 495 Asn Ser Asn Arg
Asp Glu Glu Val Asn Pro Asn Lys Trp Trp Glu Asn 500
505 510 Phe Ser Glu Glu Val Lys Lys Tyr Tyr
Phe Val Phe Ile Ser Gly Ser 515 520
525 Phe Lys Gly Lys Phe Glu Glu Gln Leu Arg Arg Leu Ser Met
Thr Thr 530 535 540
Gly Val Asn Gly Ser Ala Val Asn Val Val Asn Leu Leu Leu Gly Ala 545
550 555 560 Glu Lys Ile Arg Ser
Gly Glu Met Thr Ile Glu Glu Leu Glu Arg Ala 565
570 575 Met Phe Asn Asn Ser Glu Phe Ile Leu Lys
Tyr 580 585
<210> SEQ ID NO 18
<211> LENGTH: 589
<212> TYPE: PRT
<213> ORGANISM: Peptoniphilus duerdenii
<400> SEQUENCE: 18
Met Ala Glu Arg Thr Leu Gly Trp Ile Gln Asn Pro Ser Ser Phe
Glu 1 5 10 15 Asn
Leu Lys Asn Val Val Ser Val Phe Asp Lys Asn Ser Asp Ile Tyr
20 25 30 Lys Glu Ile Leu Asn
Thr Lys Leu Pro Lys Leu Val Lys Asp Leu Asp 35
40 45 Leu Gln Asn Lys Leu Ile Ser Glu Leu
Glu Lys Asp Pro Leu Glu Met 50 55
60 Asp Tyr Val Leu Leu Lys Gly His Gly Ile Lys Ser Gly
Gln Lys Arg 65 70 75
80 Ala Asp Ala Glu Cys Ser Gly Ile Val Gln Ala Ala Ile Thr Thr Gln
85 90 95 Gly Gly Arg Ala
Tyr Thr Asp Asp Trp Thr Ala Asp Gly Phe Leu Arg 100
105 110 Trp Gly Ile Ser Ile Gly Leu Leu Asp
Tyr Asp Thr Glu Lys Asp Thr 115 120
125 Val Ser Ile Thr Lys Leu Gly Glu Lys Phe Val Lys Ser Asn
Ser Glu 130 135 140
Asp Ser Asp Lys Glu Ile Leu Ile Ser Ala Phe Leu Ser Tyr Pro Pro 145
150 155 160 Ala Val Arg Ile Leu
Thr Leu Leu Glu Asn Gly Asp His Leu Thr Lys 165
170 175 Phe Glu Leu Gly Lys Gln Leu Gly Gly Leu
Gly Glu Ala Gly Phe Thr 180 185
190 Ser Ile Pro Gln Asp Leu Tyr Ile Gln Ala Ile Glu Leu Ala Ala
Asp 195 200 205 Lys
Asp Lys Ala Ser Ile Arg Ser Asn Thr Glu Gly Ser Ala Asp Lys 210
215 220 Tyr Ala Arg Met Ile Ser
Gly Trp Leu Ser Lys Val Gly Leu Ile Gln 225 230
235 240 Arg Ile Gly Lys Glu Val Ser Thr Lys Ile Gly
Asp Val Glu Tyr Lys 245 250
255 Val Asn Ile Gly His Ser Phe Arg Ile Thr Leu Asn Gly Ile Lys Glu
260 265 270 Leu Lys
Arg Ala Met Gly Leu Ser Ser Tyr Pro Lys Thr Asp Lys Ile 275
280 285 Val Tyr Trp Gln Met Leu Ala
Thr Lys Gly Lys Asp Arg Asp Tyr Ile 290 295
300 Arg Asn Arg Arg Gly Tyr Ile Ile Lys Ala Ile Asn
Asn Arg Glu Arg 305 310 315
320 Asn Leu Glu Asp Ile Lys Ala Tyr Leu Leu Glu Asn Asn Ile Asp Glu
325 330 335 Ser Ile Thr
Thr Ile Glu Asp Glu Leu Lys Val Ile Glu Ala Met Gly 340
345 350 Leu Ser Phe Lys His Ser Arg Asn
Gly Tyr Val Ile Asp Asp Asn Ile 355 360
365 Ile Lys Leu Glu Ile Pro Arg Thr Lys Ile Ser Lys Thr
Asn Val Leu 370 375 380
Glu Leu Lys Asp Lys Val Arg Asp Lys Leu Lys Tyr Val Asp His Arg 385
390 395 400 Tyr Leu Ala Leu
Ile Asp Leu Ala Tyr Asp Gly Thr Ala Asn Arg Asp 405
410 415 Phe Glu Ile Gln Thr Ile Asp Leu Leu
Ile Asn Glu Leu Lys Phe Lys 420 425
430 Gly Val Arg Leu Gly Glu Ser Arg Lys Pro Asp Gly Ile Ile
Ser Tyr 435 440 445
Asn Ile Asn Gly Val Ile Ile Asp Asn Lys Ala Tyr Ser Thr Gly Tyr 450
455 460 Asn Leu Pro Ile Asn
Gln Ala Asp Glu Met Ile Arg Tyr Ile Glu Glu 465 470
475 480 Asn Gln Thr Arg Asp Glu Lys Ile Asn Ser
Asn Lys Trp Trp Glu Ser 485 490
495 Phe Asp Asp Lys Val Lys Asp Phe Asn Tyr Leu Phe Val Ser Ser
Phe 500 505 510 Phe
Lys Gly Asn Phe Lys Asn Asn Leu Lys His Ile Ala Asn Arg Thr 515
520 525 Gly Val Ser Gly Gly Ala
Ile Asn Val Glu Asn Leu Leu Tyr Phe Ala 530 535
540 Glu Glu Leu Lys Ala Gly Arg Leu Ser Tyr Val
Asp Ser Phe Lys Met 545 550 555
560 Tyr Asp Asn Asp Glu Ile Tyr Val Gly Asp Phe Ser Asp Tyr Ser Tyr
565 570 575 Val Lys
Phe Ala Ala Glu Glu Glu Gly Glu Tyr Leu Thr 580
585
<210> SEQ ID NO 19
<211> LENGTH: 279
<212> TYPE: PRT
<213> ORGANISM: Acinetobacter lwoffii
<400> SEQUENCE: 19
Lys Glu Thr Asn Ile
Leu Leu Val Glu Gln Leu Glu Glu Thr Leu Asn 1 5
10 15 Arg Asn Arg Ile Leu Phe Glu Lys Asn Ser
Ser Ile Ala Gln Ala Pro 20 25
30 Ile Gly Glu Ile Lys Asn Tyr Arg Tyr His Leu Glu Glu Leu Leu
Phe 35 40 45 Glu
Asn Asn Glu Lys Lys Phe Ala Glu Asn Gln Lys Asn Glu Trp Asp 50
55 60 Glu Ile Leu Ala Tyr Met
Asp Leu Leu Ile Ser Pro Lys Pro Ile Ser 65 70
75 80 Ile Glu Ile Ala Asp Lys Glu Ile Ser Ile Pro
Ser Gly Glu Arg Pro 85 90
95 Ala Tyr Phe Glu Trp Val Leu Trp Arg Ala Phe Leu Ala Leu Asn His
100 105 110 Leu Ile
Ile Glu Pro Gln Gln Cys Arg Arg Phe Lys Val Asp Gln Asp 115
120 125 Phe Lys Pro Ile His Asn Ala
Pro Gly Gly Gly Ala Asp Val Ile Phe 130 135
140 Glu Tyr Glu Asn Phe Lys Ile Leu Gly Glu Val Thr
Leu Thr Ser Asn 145 150 155
160 Ser Arg Gln Glu Ala Ala Glu Gly Glu Pro Val Arg Arg His Ile Ala
165 170 175 Val Glu Thr
Val Asn Thr Pro Asp Lys Asp Val Tyr Gly Leu Phe Leu 180
185 190 Ala Leu Thr Ile Asp Thr Asn Thr
Ala Glu Thr Phe Arg His Gly Ala 195 200
205 Trp Tyr His Gln Glu Glu Leu Met Asp Val Lys Ile Leu
Pro Leu Thr 210 215 220
Leu Glu Ser Phe Lys Lys Tyr Leu Glu Ser Leu Arg Lys Lys Asn Gln 225
230 235 240 Val Glu Thr Gly
Ile Phe Asp Leu Lys Lys Met Met Asp Glu Ser Leu 245
250 255 Lys Leu Arg Glu Thr Leu Thr Ala Pro
Gln Trp Lys Asn Glu Ile Thr 260 265
270 Asn Lys Phe Ala Arg Pro Ile 275
<210> SEQ ID NO 20
<211> LENGTH: 201
<212> TYPE: PRT
<213> ORGANISM: Clostridium leptum
<400> SEQUENCE: 20
Lys Leu Ala Lys Ser Ser Gln Ser Glu Thr Lys
Glu Lys Leu Arg Glu 1 5 10
15 Lys Leu Arg Asn Leu Pro His Glu Tyr Leu Ser Leu Val Asp Leu Ala
20 25 30 Tyr Asp
Ser Lys Gln Asn Arg Leu Phe Glu Met Lys Val Ile Glu Leu 35
40 45 Leu Thr Glu Glu Cys Gly Phe
Gln Gly Leu His Leu Gly Gly Ser Arg 50 55
60 Arg Pro Asp Gly Val Leu Tyr Thr Ala Gly Leu Thr
Asp Asn Tyr Gly 65 70 75
80 Ile Ile Leu Asp Thr Lys Ala Tyr Ser Ser Gly Tyr Ser Leu Pro Ile
85 90 95 Ala Gln Ala
Asp Glu Met Glu Arg Tyr Val Arg Glu Asn Gln Thr Arg 100
105 110 Asp Glu Leu Val Asn Pro Asn Gln
Trp Trp Glu Asn Phe Glu Asn Gly 115 120
125 Leu Gly Thr Phe Tyr Phe Leu Phe Val Ala Gly His Phe
Asn Gly Asn 130 135 140
Val Gln Ala Gln Leu Glu Arg Ile Ser Arg Asn Thr Gly Val Leu Gly 145
150 155 160 Ala Ala Ala Ser
Ile Ser Gln Leu Leu Leu Leu Ala Asp Ala Ile Arg 165
170 175 Gly Gly Arg Met Asp Arg Glu Arg Leu
Arg His Leu Met Phe Gln Asn 180 185
190 Glu Glu Phe Leu Leu Glu Gln Glu Leu 195
200
<210> SEQ ID NO 21
<211> LENGTH: 250
<212> TYPE: PRT
<213> ORGANISM: Micrococcus lylae
<400> SEQUENCE: 21
Ile Asn Ser Lys Ile Lys Gln
Leu Asp Asp Ser Ile Asn Val Glu Ser 1 5
10 15 Leu Lys Ile Asp Asp Ala Lys Asp Leu Leu Asn
Asp Leu Glu Ile Gln 20 25
30 Arg Lys Ala Lys Thr Ile Glu Asp Thr Val Asn His Leu Lys Leu
Arg 35 40 45 Ser
Asp Ile Glu Asp Ile Leu Asp Val Phe Ala Lys Ile Lys Lys Arg 50
55 60 Asp Val Pro Asp Val Pro
Leu Phe Leu Glu Trp Asn Ile Trp Arg Ala 65 70
75 80 Phe Ala Ala Leu Asn His Thr Gln Ala Ile Glu
Gly Asn Phe Ile Val 85 90
95 Asp Leu Asp Gly Met Pro Leu Asn Thr Ala Pro Gly Lys Lys Pro Asp
100 105 110 Ile Glu
Ile Asn Tyr Gly Ser Phe Ser Cys Ile Val Glu Val Thr Met 115
120 125 Ser Ser Gly Glu Thr Gln Phe
Asn Met Glu Gly Ser Ser Val Pro Arg 130 135
140 His Tyr Gly Asp Leu Val Arg Lys Val Asp His Asp
Ala Tyr Cys Ile 145 150 155
160 Phe Ile Ala Pro Lys Val Ala Pro Gly Thr Lys Ala His Phe Phe Asn
165 170 175 Leu Asn Arg
Leu Ser Thr Lys His Tyr Gly Gly Lys Thr Lys Ile Ile 180
185 190 Pro Met Ser Leu Asp Asp Phe Ile
Cys Phe Leu Gln Val Gly Ile Thr 195 200
205 His Asn Phe Gln Asp Ile Asn Lys Leu Lys Asn Trp Leu
Asp Asn Leu 210 215 220
Ile Asn Phe Asn Leu Glu Ser Glu Asp Glu Glu Ile Trp Phe Glu Glu 225
230 235 240 Ile Ile Ser Lys
Ile Ser Thr Trp Ala Ile 245 250
<210> SEQ ID NO 22
<211> LENGTH: 213
<212> TYPE: PRT
<213> ORGANISM: Peptoniphilus duerdenii
<400> SEQUENCE: 22
Lys Ile Ser Lys Thr Asn Val Leu Glu Leu
Lys Asp Lys Val Arg Asp 1 5 10
15 Lys Leu Lys Tyr Val Asp His Arg Tyr Leu Ala Leu Ile Asp Leu
Ala 20 25 30 Tyr
Asp Gly Thr Ala Asn Arg Asp Phe Glu Ile Gln Thr Ile Asp Leu 35
40 45 Leu Ile Asn Glu Leu Lys
Phe Lys Gly Val Arg Leu Gly Glu Ser Arg 50 55
60 Lys Pro Asp Gly Ile Ile Ser Tyr Asn Ile Asn
Gly Val Ile Ile Asp 65 70 75
80 Asn Lys Ala Tyr Ser Thr Gly Tyr Asn Leu Pro Ile Asn Gln Ala Asp
85 90 95 Glu Met
Ile Arg Tyr Ile Glu Glu Asn Gln Thr Arg Asp Glu Lys Ile 100
105 110 Asn Ser Asn Lys Trp Trp Glu
Ser Phe Asp Asp Lys Val Lys Asp Phe 115 120
125 Asn Tyr Leu Phe Val Ser Ser Phe Phe Lys Gly Asn
Phe Lys Asn Asn 130 135 140
Leu Lys His Ile Ala Asn Arg Thr Gly Val Ser Gly Gly Ala Ile Asn 145
150 155 160 Val Glu Asn
Leu Leu Tyr Phe Ala Glu Glu Leu Lys Ala Gly Arg Leu 165
170 175 Ser Tyr Val Asp Ser Phe Lys Met
Tyr Asp Asn Asp Glu Ile Tyr Val 180 185
190 Gly Asp Phe Ser Asp Tyr Ser Tyr Val Lys Phe Ala Ala
Glu Glu Glu 195 200 205
Gly Glu Tyr Leu Thr 210
<210> SEQ ID NO 23
<211> LENGTH: 163
<212> TYPE: PRT
<213> ORGANISM: Streptomyces spec. Bf-61
<400> SEQUENCE: 23
Ile Ser Val Asp Leu Pro Gly Gly Glu Glu Phe Leu Leu Ser Pro Ala 1
5 10 15 Gly Gln Asn
Pro Leu Leu Lys Lys Met Val Glu Glu Phe Val Pro Arg 20
25 30 Phe Ala Pro Arg Ser Thr Val Leu
Tyr Leu Gly Asp Thr Arg Gly Lys 35 40
45 His Ser Leu Phe Glu Arg Glu Ile Phe Glu Glu Val Leu
Gly Leu Thr 50 55 60
Phe Asp Pro His Gly Arg Met Pro Asp Leu Ile Leu His Asp Glu Val 65
70 75 80 Arg Gly Trp Leu
Phe Leu Met Glu Ala Val Lys Ser Lys Gly Pro Phe 85
90 95 Asp Glu Glu Arg His Arg Ser Leu Gln
Glu Leu Phe Val Thr Pro Ser 100 105
110 Ala Gly Leu Ile Phe Val Asn Cys Phe Glu Asn Arg Glu Ser
Met Arg 115 120 125
Gln Trp Leu Pro Glu Leu Ala Trp Glu Thr Glu Ala Trp Val Ala Glu 130
135 140 Asp Pro Asp His Leu
Ile His Leu Asn Gly Ser Arg Phe Leu Gly Pro 145 150
155 160 Tyr Glu Arg
<210> SEQ ID NO 24
<211> LENGTH: 163
<212> TYPE: PRT
<213> ORGANISM: Streptomyces diastaticus
<400> SEQUENCE: 24
Ile Ser Val Asp Leu Ala Asp Gly Asp Glu Phe Leu Leu Ser Pro
Ala 1 5 10 15 Gly
Gln Asn Pro Leu Leu Lys Lys Met Val Glu Glu Phe Met Pro Arg
20 25 30 Phe Ala Pro Gly Ala
Lys Val Leu Tyr Ile Gly Asp Trp Arg Gly Lys 35
40 45 His Thr Arg Phe Glu Lys Arg Ile Phe
Glu Glu Thr Leu Gly Leu Thr 50 55
60 Phe Asp Pro His Gly Arg Met Pro Asp Leu Val Leu His
Asp Lys Val 65 70 75
80 Arg Lys Trp Leu Phe Leu Met Glu Ala Val Lys Ser Lys Gly Pro Phe
85 90 95 Asp Glu Glu Arg
His Arg Thr Leu Arg Glu Leu Phe Ala Thr Pro Val 100
105 110 Ala Gly Leu Val Phe Val Asn Cys Phe
Glu Asn Arg Glu Ala Met Arg 115 120
125 Gln Trp Leu Pro Glu Leu Ala Trp Glu Thr Glu Ala Trp Val
Ala Asp 130 135 140
Asp Pro Asp His Leu Ile His Leu Asn Gly Ser Arg Phe Leu Gly Pro 145
150 155 160 Tyr Glu Arg
<210> SEQ ID NO 25
<211> LENGTH: 208
<212> TYPE: PRT
<213> ORGANISM: Streptococcus sanguis
<400> SEQUENCE: 25
Asp Val Val Leu Glu Lys Ser Asp Ile Glu
Lys Phe Lys Asn Gln Leu 1 5 10
15 Arg Thr Glu Leu Thr Asn Ile Asp His Ser Tyr Leu Lys Gly Ile
Asp 20 25 30 Ile
Ala Ser Lys Lys Lys Thr Ser Asn Val Glu Asn Thr Glu Phe Glu 35
40 45 Ala Ile Ser Thr Lys Ile
Phe Thr Asp Glu Leu Gly Phe Ser Gly Lys 50 55
60 His Leu Gly Gly Ser Asn Lys Pro Asp Gly Leu
Leu Trp Asp Asp Asp 65 70 75
80 Cys Ala Ile Ile Leu Asp Ser Lys Ala Tyr Ser Glu Gly Phe Pro Leu
85 90 95 Thr Ala
Ser His Thr Asp Ala Met Gly Arg Tyr Leu Arg Gln Phe Thr 100
105 110 Glu Arg Lys Glu Glu Ile Lys
Pro Thr Trp Trp Asp Ile Ala Pro Glu 115 120
125 His Leu Asp Asn Thr Tyr Phe Ala Tyr Val Ser Gly
Ser Phe Ser Gly 130 135 140
Asn Tyr Lys Glu Gln Leu Gln Lys Phe Arg Gln Asp Thr Asn His Leu 145
150 155 160 Gly Gly Ala
Leu Glu Phe Val Lys Leu Leu Leu Leu Ala Asn Asn Tyr 165
170 175 Lys Thr Gln Lys Met Ser Lys Lys
Glu Val Lys Lys Ser Ile Leu Asp 180 185
190 Tyr Asn Ile Ser Tyr Glu Glu Tyr Ala Pro Leu Leu Ala
Glu Ile Glu 195 200 205
<210> SEQ ID NO 26
<211> LENGTH: 196
<212> TYPE: PRT
<213> ORGANISM: Flavobacterium okeanokoites
<400> SEQUENCE: 26
Gln Leu Val Lys Ser Glu Leu Glu
Glu Lys Lys Ser Glu Leu Arg His 1 5 10
15 Lys Leu Lys Tyr Val Pro His Glu Tyr Ile Glu Leu Ile
Glu Ile Ala 20 25 30
Arg Asn Ser Thr Gln Asp Arg Ile Leu Glu Met Lys Val Met Glu Phe
35 40 45 Phe Met Lys Val
Tyr Gly Tyr Arg Gly Lys His Leu Gly Gly Ser Arg 50
55 60 Lys Pro Asp Gly Ala Ile Tyr Thr
Val Gly Ser Pro Ile Asp Tyr Gly 65 70
75 80 Val Ile Val Asp Thr Lys Ala Tyr Ser Gly Gly Tyr
Asn Leu Pro Ile 85 90
95 Gly Gln Ala Asp Glu Met Gln Arg Tyr Val Glu Glu Asn Gln Thr Arg
100 105 110 Asn Lys His
Ile Asn Pro Asn Glu Trp Trp Lys Val Tyr Pro Ser Ser 115
120 125 Val Thr Glu Phe Lys Phe Leu Phe
Val Ser Gly His Phe Lys Gly Asn 130 135
140 Tyr Lys Ala Gln Leu Thr Arg Leu Asn His Ile Thr Asn
Cys Asn Gly 145 150 155
160 Ala Val Leu Ser Val Glu Glu Leu Leu Ile Gly Gly Glu Met Ile Lys
165 170 175 Ala Gly Thr Leu
Thr Leu Glu Glu Val Arg Arg Lys Phe Asn Asn Gly 180
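185 190 Glu Ile Asn Phe 195
<210> SEQ ID NO 27
<211> LENGTH: 7903
<212> TYPE: DNA
<213> ORGANISM: Artificial sequence
<220> FEATURE:
<223> OTHER INFORMATION: pCAG-ArtTal1-AlwI
<400> SEQUENCE: 27
gacattgatt attgactagt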
tattaatagt aatcaattac ggggtcatta gttcatagcc 60 catatatgga gttccgcgtt
acataactta cggtaaatgg cccgcctggc tgaccgccca 120 acgacccccg cccattgacg
tcaataatga cgtatgttcc catagtaacg ccaataggga 180 ctttccattg acgtcaatgg
gtggagtatt tacggtaaac tgcccacttg gcagtacatc 240 aagtgtatca tatgccaagt
acgcccccta ttgacgtcaa tgacggtaaa tggcccgcct 300 ggcattatgc ccagtacatg
accttatggg actttcctac ttggcagtac atctacgtat 360 tagtcatcgc tattaccatg
gtcgaggtga gccccacgtt ctgcttcact ctccccatct 420 cccccccctc cccaccccca
attttgtatt tatttatttt ttaattattt tgtgcagcga 480 tgggggcggg gggggggggg
gggcgcgcgc caggcggggc ggggcggggc gaggggcggg 540 gcggggcgag gcggagaggt
gcggcggcag ccaatcagag cggcgcgctc cgaaagtttc 600 cttttatggc gaggcggcgg
cggcggcggc cctataaaaa gcgaagcgcg cggcgggcgg 660 gagtcgctgc gcgctgcctt
cgccccgtgc cccgctccgc cgccgcctcg cgccgcccgc 720 cccggctctg actgaccgcg
ttactcccac aggtgagcgg gcgggacggc ccttctcctc 780 cgggctgtaa ttagcgcttg
gtttaatgac ggcttgtttc ttttctgtgg ctgcgtgaaa 840 gccttgaggg gctccgggag
ggccctttgt gcggggggga gcggctcggg gggtgcgtgc 900 gtgtgtgtgt gcgtggggag
cgccgcgtgc ggctccgcgc tgcccggcgg ctgtgagcgc 960 tgcgggcgcg gcgcggggct
ttgtgcgctc cgcagtgtgc gcgaggggag cgcggccggg 1020 ggcggtgccc cgcggtgcgg
ggggggctgc gaggggaaca aaggctgcgt gcggggtgtg 1080 tgcgtggggg ggtgagcagg
gggtgtgggc gcgtcggtcg ggctgcaacc ccccctgcac 1140 ccccctcccc gagttgctga
gcacggcccg gcttcgggtg cggggctccg tacggggcgt 1200 ggcgcggggc tcgccgtgcc
gggcgggggg tggcggcagg tgggggtgcc gggcggggcg 1260 gggccgcctc gggccgggga
gggctcgggg gaggggcgcg gcggcccccg gagcgccggc 1320 ggctgtcgag gcgcggcgag
ccgcagccat tgccttttat ggtaatcgtg cgagagggcg 1380 cagggacttc ctttgtccca
aatctgtgcg gagccgaaat ctgggaggcg ccgccgcacc 1440 ccctctagcg ggcgcggggc
gaagcggtgc ggcgccggca ggaaggaaat gggcggggag 1500 ggccttcgtg cgtcgccgcg
ccgccgtccc cttctccctc tccagcctcg gggctgtccg 1560 cggggggacg gctgccttcg
ggggggacgg ggcagggcgg ggttcggctt ctggcgtgtg 1620 accggcggct ctagagcctc
tgctaaccat gttcatgcct tcttcttttt cctacagatc 1680 cttaattaat aatacgactc
actatagggg ccgccaccat gggacctaag aaaaagagga 1740 aggtggcggc cgctgactac
aaggatgacg acgataaacc aggtggcgga ggtagtggcg 1800 gaggtggggt acccgccagt
ccagcagccc aggtggatct gagaaccctc ggctacagcc 1860 agcagcagca ggagaagatc
aaaccaaagg tgcggtccac cgtcgctcag caccatgaag 1920 cactggtggg gcacggtttc
acacacgccc atattgtggc tctgtctcag catcccgctg 1980 cactcgggac tgtggccgtc
aaatatcagg acatgatcgc cgctctgcct gaggcaaccc 2040 acgaagccat tgtgggcgtc
ggaaagcagt ggagcggtgc cagagcactc gaagcactcc 2100 tcaccgtcgc cggggaactg
cggggtccac cactccagtc cggactggac actggacagc 2160 tgctgaagat cgctaaacgc
ggcggagtga cagctgtgga agctgtgcac gcttggagga 2220 atgctctgac aggagcccca
ctgaatctta ctccagaaca ggtcgtcgca atcgcaagta 2280 acatcggcgg aaaacaggcc
ctcgaaaccg tccagagact cctccccgtg ctgtgccagg 2340 cccacggact gaccccacag
caggtggtcg ccatcgctag caacggcgga gggaagcagg 2400 ctctggagac cgtgcagagg
ctgctccccg tcctgtgcca ggcacatggg ctcacacctc 2460 agcaggtggt cgcaattgcc
tccaatggtg gcggaaaaca ggccctggaa actgtgcaga 2520 gactgctccc cgtgctgtgc
caggctcacg gtctcacacc ccagcaggtg gtcgctatcg 2580 catctcatga cgggggcaag
caggcactgg agacagtgca gcggctgctc cctgtcctgt 2640 gccaggccca cggactcact
cctcagcagg tcgtcgccat tgctagtaac ggcggaggga 2700 aacaggctct ggaaaccgtg
cagcgcctgc tccccgtgct gtgccaagcc cacggcctga 2760 ccccccagca ggtggtcgca
atcgcctcaa acaatggtgg caagcaggcc ctggagactg 2820 tgcagcgact gctcccagtg
ctgtgccagg cccatggact cacaccacag caggtcgtcg 2880 ctattgcaag caacaatgga
gggaaacagg cactggaaac agtccagagg ctgctccccg 2940 tgctgtgcca agcgcatgga
ctcactcccc agcaggtcgt cgccatcgct tccaataacg 3000 gcggcaagca ggccctggag
accgtccaga gactgctccc cgtgctgtgc caagctcacg 3060 gactcacacc tgagcaggtc
gtggcaatcg cctctaacat tggagggaaa caggccctgg 3120 aaactgtaca gcggctgctc
cccgtgctgt gccaagcaca cggactcact ccacagcagg 3180 tcgtggccat tgcaagtcat
gacggaggca agcaggccct ggaaacagtg cagcgcctgc 3240 tccctgtgct gtgccaggct
catggtctga ctcctcagca ggtggtggcc atcgcttcca 3300 acaatggagg gaagcaggcc
ctggagaccg tacagagact gctccccgtg ctgtgccaag 3360 cgcacggtct gacccctcag
caggtcgtcg caatcgccag caatggcggg ggcaagcagg 3420 ctctcgaaac cgtccagcgg
ctcctcccag tcctctgtca ggctcacggc ctgaccccac 3480 agcaggtcgt cgctattgct
tctaatggcg gagggcggcc tgctctggag agcattgtgg 3540 ctcagctgtc caggcccgat
cctgccctgg ctagatccgc actcactaac gatcatctgg 3600 tcgctctcgc ttgcctcggt
ggacggcccg ctctggacgc agtcaaaaag ggtctccccc 3660 atgctcccgc actgatcaag
agaaccaaca ggagaattcc tgagggatcc gatcgtttaa 3720 acaaagagac taatatcctc
ctcgtcgagc agctggaaga gaccctcaat cgcaatcgca 3780 ttctgtttga aaagaactcc
tcaatcgcac aggccccaat tggcgagatc aagaactacc 3840ggtatcacct ggaggaactg
ctcttcgaga acaatgaaaa gaaatttgca gagaaccaga 3900aaaatgagtg ggacgaaatt
ctggcctaca tggatctgct catctcaccc aagcctatca 3960gcattgagat cgctgacaaa
gaaatttcta tcccaagtgg ggagcgaccc gcatatttcg 4020aatgggtgct gtggagggca
tttctggccc tcaaccacct gatcattgag ccccagcagt 4080gcaggagatt caaggtcgac
caggacttca agcctatcca taatgctcca ggcggagggg 4140cagatgtgat tttcgagtac
gaaaacttta agatcctggg cgaggtcacc ctcacaagca 4200attcccgaca ggaagcagct
gagggagaac ccgtgcggcg ccatattgcc gtggagacag 4260tcaacactcc tgacaaggat
gtctatggac tgttcctcgc tctgaccatc gacactaata 4320ccgccgagac atttcgacac
ggggcttggt atcaccagga ggaactgatg gatgtgaaga 4380ttctccccct gactctcgag
tccttcaaga agtatctgga atctctcaga aagaaaaatc 4440aggtggagac aggaatcttt
gacctgaaga aaatgatgga tgaaagcctg aagctccggg 4500aaaccctgac cgcaccccag
tggaaaaatg aaatcacaaa caaattcgcc agaccaatct 4560gaacgcgtaa atgattgcag
atccactagt tctagaattc cagctgagcg ccggtcgcta 4620ccattaccag ttggtctggt
gtcaaaaata ataataaccg ggcagggggg atctgcatgg 4680atctttgtga aggaacctta
cttctgtggt gtgacataat tggacaaact acctacagag 4740atttaaagct ctaaggtaaa
tataaaattt ttaagtgtat aatgtgttaa actactgatt 4800ctaattgttt gtgtatttta
gattccaacc tatggaactg atgaatggga gcagtggtgg 4860aatgccagat ccagacatga
taagatacat tgatgagttt ggacaaacca caactagaat 4920gcagtgaaaa aaatgcttta
tttgtgaaat ttgtgatgct attgctttat ttgtaaccat 4980tataagctgc aataaacaag
ttaacaacaa caattgcatt cattttatgt ttcaggttca 5040gggggaggtg tgggaggttt
tttaaagcaa gtaaaacctc tacaaatgtg gtatggctga 5100ttatgatctg cggccgccac
tggccgtcgt tttacaacgt cgtgactggg aaaaccctgg 5160cgttacccaa cttaatcgcc
ttgcagcaca tccccctttc gccagctggc gtaatagcga 5220agaggcccgc accgatcgcc
cttcccaaca gttgcgcagc ctgaatggcg aatggaacgc 5280gccctgtagc ggcgcattaa
gcgcggcggg tgtggtggtt acgcgcagcg tgaccgctac 5340acttgccagc gccctagcgc
ccgctccttt cgctttcttc ccttcctttc tcgccacgtt 5400cgccggcttt ccccgtcaag
ctctaaatcg ggggctccct ttagggttcc gatttagtgc 5460tttacggcac ctcgacccca
aaaaacttga ttagggtgat ggttcacgta gtgggccatc 5520gccctgatag acggtttttc
gccctttgac gttggagtcc acgttcttta atagtggact 5580cttgttccaa actggaacaa
cactcaaccc tatctcggtc tattcttttg atttataagg 5640gattttgccg atttcggcct
attggttaaa aaatgagctg atttaacaaa aatttaacgc 5700gaattttaac aaaatattaa
cgcttacaat ttaggtggca cttttcgggg aaatgtgcgc 5760ggaaccccta tttgtttatt
tttctaaata cattcaaata tgtatccgct catgagacaa 5820taaccctgat aaatgcttca
ataatattga aaaaggaaga gtatgagtat tcaacatttc 5880cgtgtcgccc ttattccctt
ttttgcggca ttttgccttc ctgtttttgc tcacccagaa 5940acgctggtga aagtaaaaga
tgctgaagat cagttgggtg cacgagtggg ttacatcgaa 6000ctggatctca acagcggtaa
gatccttgag agttttcgcc ccgaagaacg ttttccaatg 6060atgagcactt ttaaagttct
gctatgtggc gcggtattat cccgtattga cgccgggcaa 6120gagcaactcg gtcgccgcat
acactattct cagaatgact tggttgagta ctcaccagtc 6180acagaaaagc atcttacgga
tggcatgaca gtaagagaat tatgcagtgc tgccataacc 6240atgagtgata acactgcggc
caacttactt ctgacaacga tcggaggacc gaaggagcta 6300accgcttttt tgcacaacat
gggggatcat gtaactcgcc ttgatcgttg ggaaccggag 6360ctgaatgaag ccataccaaa
cgacgagcgt gacaccacga tgcctgtagc aatggcaaca 6420acgttgcgca aactattaac
tggcgaacta cttactctag cttcccggca acaattaata 6480gactggatgg aggcggataa
agttgcagga ccacttctgc gctcggccct tccggctggc 6540tggtttattg ctgataaatc
tggagccggt gagcgtgggt ctcgcggtat cattgcagca 6600ctggggccag atggtaagcc
ctcccgtatc gtagttatct acacgacggg gagtcaggca 6660actatggatg aacgaaatag
acagatcgct gagataggtg cctcactgat taagcattgg 6720taactgtcag accaagttta
ctcatatata ctttagattg atttaaaact tcatttttaa 6780tttaaaagga tctaggtgaa
gatccttttt gataatctca tgaccaaaat cccttaacgt 6840gagttttcgt tccactgagc
gtcagacccc gtagaaaaga tcaaaggatc ttcttgagat 6900cctttttttc tgcgcgtaat
ctgctgcttg caaacaaaaa aaccaccgct accagcggtg 6960gtttgtttgc cggatcaaga
gctaccaact ctttttccga aggtaactgg cttcagcaga 7020gcgcagatac caaatactgt
ccttctagtg tagccgtagt taggccacca cttcaagaac 7080tctgtagcac cgcctacata
cctcgctctg ctaatcctgt taccagtggc tgctgccagt 7140ggcgataagt cgtgtcttac
cgggttggac tcaagacgat agttaccgga taaggcgcag 7200cggtcgggct gaacgggggg
ttcgtgcaca cagcccagct tggagcgaac gacctacacc 7260gaactgagat acctacagcg
tgagctatga gaaagcgcca cgcttcccga agggagaaag 7320gcggacaggt atccggtaag
cggcagggtc ggaacaggag agcgcacgag ggagcttcca 7380gggggaaacg cctggtatct
ttatagtcct gtcgggtttc gccacctctg acttgagcgt 7440cgatttttgt gatgctcgtc
aggggggcgg agcctatgga aaaacgccag caacgcggcc 7500tttttacggt tcctggcctt
ttgctggcct tttgctcaca tgttctttcc tgcgttatcc 7560cctgattctg tggataaccg
tattaccgcc tttgagtgag ctgataccgc tcgccgcagc 7620cgaacgaccg agcgcagcga
gtcagtgagc gaggaagcgg aagagcgccc aatacgcaaa 7680ccgcctctcc ccgcgcgttg
gccgattcat taatgcagct ggcacgacag gtttcccgac 7740tggaaagcgg gcagtgagcg
caacgcaatt aatgtgagtt agctcactca ttaggcaccc 7800caggctttac actttatgct
tccggctcgt atgttgtgtg gaattgtgag cggataacaa 7860tttcacacag gaaacagcta
tgaccatgag gcgcgccgga ttc 7903
SEQ ID NO: 28 (7669 bp, DNA, Artificial sequence: pCAG-ArtTal1-CLEDORF)
gacattgatt attgactagt tattaatagt
aatcaattac ggggtcatta gttcatagcc 60catatatgga gttccgcgtt acataactta
cggtaaatgg cccgcctggc tgaccgccca 120acgacccccg cccattgacg tcaataatga
cgtatgttcc catagtaacg ccaataggga 180ctttccattg acgtcaatgg gtggagtatt
tacggtaaac tgcccacttg gcagtacatc 240aagtgtatca tatgccaagt acgcccccta
ttgacgtcaa tgacggtaaa tggcccgcct 300ggcattatgc ccagtacatg accttatggg
actttcctac ttggcagtac atctacgtat 360tagtcatcgc tattaccatg gtcgaggtga
gccccacgtt ctgcttcact ctccccatct 420cccccccctc cccaccccca attttgtatt
tatttatttt ttaattattt tgtgcagcga 480tgggggcggg gggggggggg gggcgcgcgc
caggcggggc ggggcggggc gaggggcggg 540gcggggcgag gcggagaggt gcggcggcag
ccaatcagag cggcgcgctc cgaaagtttc 600cttttatggc gaggcggcgg cggcggcggc
cctataaaaa gcgaagcgcg cggcgggcgg 660gagtcgctgc gcgctgcctt cgccccgtgc
cccgctccgc cgccgcctcg cgccgcccgc 720cccggctctg actgaccgcg ttactcccac
aggtgagcgg gcgggacggc ccttctcctc 780cgggctgtaa ttagcgcttg gtttaatgac
ggcttgtttc ttttctgtgg ctgcgtgaaa 840gccttgaggg gctccgggag ggccctttgt
gcggggggga gcggctcggg gggtgcgtgc 900gtgtgtgtgt gcgtggggag cgccgcgtgc
ggctccgcgc tgcccggcgg ctgtgagcgc 960tgcgggcgcg gcgcggggct ttgtgcgctc
cgcagtgtgc gcgaggggag cgcggccggg 1020ggcggtgccc cgcggtgcgg ggggggctgc
gaggggaaca aaggctgcgt gcggggtgtg 1080tgcgtggggg ggtgagcagg gggtgtgggc
gcgtcggtcg ggctgcaacc ccccctgcac 1140ccccctcccc gagttgctga gcacggcccg
gcttcgggtg cggggctccg tacggggcgt 1200ggcgcggggc tcgccgtgcc gggcgggggg
tggcggcagg tgggggtgcc gggcggggcg 1260gggccgcctc gggccgggga gggctcgggg
gaggggcgcg gcggcccccg gagcgccggc 1320ggctgtcgag gcgcggcgag ccgcagccat
tgccttttat ggtaatcgtg cgagagggcg 1380cagggacttc ctttgtccca aatctgtgcg
gagccgaaat ctgggaggcg ccgccgcacc 1440ccctctagcg ggcgcggggc gaagcggtgc
ggcgccggca ggaaggaaat gggcggggag 1500ggccttcgtg cgtcgccgcg ccgccgtccc
cttctccctc tccagcctcg gggctgtccg 1560cggggggacg gctgccttcg ggggggacgg
ggcagggcgg ggttcggctt ctggcgtgtg 1620accggcggct ctagagcctc tgctaaccat
gttcatgcct tcttcttttt cctacagatc 1680cttaattaat aatacgactc actatagggg
ccgccaccat gggacctaag aaaaagagga 1740aggtggcggc cgctgactac aaggatgacg
acgataaacc aggtggcgga ggtagtggcg 1800gaggtggggt acccgccagt ccagcagccc
aggtggatct gagaaccctc ggctacagcc 1860agcagcagca ggagaagatc aaaccaaagg
tgcggtccac cgtcgctcag caccatgaag 1920cactggtggg gcacggtttc acacacgccc
atattgtggc tctgtctcag catcccgctg 1980cactcgggac tgtggccgtc aaatatcagg
acatgatcgc cgctctgcct gaggcaaccc 2040acgaagccat tgtgggcgtc ggaaagcagt
ggagcggtgc cagagcactc gaagcactcc 2100tcaccgtcgc cggggaactg cggggtccac
cactccagtc cggactggac actggacagc 2160tgctgaagat cgctaaacgc ggcggagtga
cagctgtgga agctgtgcac gcttggagga 2220atgctctgac aggagcccca ctgaatctta
ctccagaaca ggtcgtcgca atcgcaagta 2280acatcggcgg aaaacaggcc ctcgaaaccg
tccagagact cctccccgtg ctgtgccagg 2340cccacggact gaccccacag caggtggtcg
ccatcgctag caacggcgga gggaagcagg 2400ctctggagac cgtgcagagg ctgctccccg
tcctgtgcca ggcacatggg ctcacacctc 2460agcaggtggt cgcaattgcc tccaatggtg
gcggaaaaca ggccctggaa actgtgcaga 2520gactgctccc cgtgctgtgc caggctcacg
gtctcacacc ccagcaggtg gtcgctatcg 2580catctcatga cgggggcaag caggcactgg
agacagtgca gcggctgctc cctgtcctgt 2640gccaggccca cggactcact cctcagcagg
tcgtcgccat tgctagtaac ggcggaggga 2700aacaggctct ggaaaccgtg cagcgcctgc
tccccgtgct gtgccaagcc cacggcctga 2760ccccccagca ggtggtcgca atcgcctcaa
acaatggtgg caagcaggcc ctggagactg 2820tgcagcgact gctcccagtg ctgtgccagg
cccatggact cacaccacag caggtcgtcg 2880ctattgcaag caacaatgga gggaaacagg
cactggaaac agtccagagg ctgctccccg 2940tgctgtgcca agcgcatgga ctcactcccc
agcaggtcgt cgccatcgct tccaataacg 3000gcggcaagca ggccctggag accgtccaga
gactgctccc cgtgctgtgc caagctcacg 3060gactcacacc tgagcaggtc gtggcaatcg
cctctaacat tggagggaaa caggccctgg 3120aaactgtaca gcggctgctc cccgtgctgt
gccaagcaca cggactcact ccacagcagg 3180tcgtggccat tgcaagtcat gacggaggca
agcaggccct ggaaacagtg cagcgcctgc 3240tccctgtgct gtgccaggct catggtctga
ctcctcagca ggtggtggcc atcgcttcca 3300acaatggagg gaagcaggcc ctggagaccg
tacagagact gctccccgtg ctgtgccaag 3360cgcacggtct gacccctcag caggtcgtcg
caatcgccag caatggcggg ggcaagcagg 3420ctctcgaaac cgtccagcgg ctcctcccag
tcctctgtca ggctcacggc ctgaccccac 3480agcaggtcgt cgctattgct tctaatggcg
gagggcggcc tgctctggag agcattgtgg 3540ctcagctgtc caggcccgat cctgccctgg
ctagatccgc actcactaac gatcatctgg 3600tcgctctcgc ttgcctcggt ggacggcccg
ctctggacgc agtcaaaaag ggtctccccc 3660atgctcccgc actgatcaag agaaccaaca
ggagaattcc tgagggatcc gatcgtttaa 3720acaagctcgc aaagtcaagc cagtccgaaa
caaaggaaaa actcagagaa aaactcagaa 3780acctgcccca tgaatacctg tccctcgtcg
acctggccta cgattcaaag cagaaccgcc 3840tctttgagat gaaagtgatc gaactgctca
cagaggaatg cgggttccag ggtctgcacc 3900tcggcggaag caggagacca gacggcgtcc
tgtacaccgc cggactcaca gacaactatg 3960ggatcattct ggatactaag gcttacagct
ccggatattc cctgcccatt gcccaggctg 4020acgagatgga acggtacgtg cgcgagaatc
agactagaga tgaactggtc aaccctaatc 4080agtggtggga gaactttgaa aatggcctgg
gaaccttcta ttttctcttc gtggctgggc 4140atttcaacgg taatgtccag gcacagctgg
agcgaatcag taggaatacc ggcgtgctgg 4200gagccgctgc atctatcagt cagctgctcc
tgctcgcaga cgccattaga gggggtcgga 4260tggatagaga gagactgcgg cacctcatgt
ttcagaacga agagtttctg ctggaacagg 4320agctgtgaac gcgtaaatga ttgcagatcc
actagttcta gaattccagc tgagcgccgg 4380tcgctaccat taccagttgg tctggtgtca
aaaataataa taaccgggca ggggggatct 4440gcatggatct ttgtgaagga accttacttc
tgtggtgtga cataattgga caaactacct 4500acagagattt aaagctctaa ggtaaatata
aaatttttaa gtgtataatg tgttaaacta 4560ctgattctaa ttgtttgtgt attttagatt
ccaacctatg gaactgatga atgggagcag 4620tggtggaatg ccagatccag acatgataag
atacattgat gagtttggac aaaccacaac 4680tagaatgcag tgaaaaaaat gctttatttg
tgaaatttgt gatgctattg ctttatttgt 4740aaccattata agctgcaata aacaagttaa
caacaacaat tgcattcatt ttatgtttca 4800ggttcagggg gaggtgtggg aggtttttta
aagcaagtaa aacctctaca aatgtggtat 4860ggctgattat gatctgcggc cgccactggc
cgtcgtttta caacgtcgtg actgggaaaa 4920ccctggcgtt acccaactta atcgccttgc
agcacatccc cctttcgcca gctggcgtaa 4980tagcgaagag gcccgcaccg atcgcccttc
ccaacagttg cgcagcctga atggcgaatg 5040gaacgcgccc tgtagcggcg cattaagcgc
ggcgggtgtg gtggttacgc gcagcgtgac 5100cgctacactt gccagcgccc tagcgcccgc
tcctttcgct ttcttccctt cctttctcgc 5160cacgttcgcc ggctttcccc gtcaagctct
aaatcggggg ctccctttag ggttccgatt 5220tagtgcttta cggcacctcg accccaaaaa
acttgattag ggtgatggtt cacgtagtgg 5280gccatcgccc tgatagacgg tttttcgccc
tttgacgttg gagtccacgt tctttaatag 5340tggactcttg ttccaaactg gaacaacact
caaccctatc tcggtctatt cttttgattt 5400ataagggatt ttgccgattt cggcctattg
gttaaaaaat gagctgattt aacaaaaatt 5460taacgcgaat tttaacaaaa tattaacgct
tacaatttag gtggcacttt tcggggaaat 5520gtgcgcggaa cccctatttg tttatttttc
taaatacatt caaatatgta tccgctcatg 5580agacaataac cctgataaat gcttcaataa
tattgaaaaa ggaagagtat gagtattcaa 5640catttccgtg tcgcccttat tccctttttt
gcggcatttt gccttcctgt ttttgctcac 5700ccagaaacgc tggtgaaagt aaaagatgct
gaagatcagt tgggtgcacg agtgggttac 5760atcgaactgg atctcaacag cggtaagatc
cttgagagtt ttcgccccga agaacgtttt 5820ccaatgatga gcacttttaa agttctgcta
tgtggcgcgg tattatcccg tattgacgcc 5880gggcaagagc aactcggtcg ccgcatacac
tattctcaga atgacttggt tgagtactca 5940ccagtcacag aaaagcatct tacggatggc
atgacagtaa gagaattatg cagtgctgcc 6000ataaccatga gtgataacac tgcggccaac
ttacttctga caacgatcgg aggaccgaag 6060gagctaaccg cttttttgca caacatgggg
gatcatgtaa ctcgccttga tcgttgggaa 6120ccggagctga atgaagccat accaaacgac
gagcgtgaca ccacgatgcc tgtagcaatg 6180gcaacaacgt tgcgcaaact attaactggc
gaactactta ctctagcttc ccggcaacaa 6240ttaatagact ggatggaggc ggataaagtt
gcaggaccac ttctgcgctc ggcccttccg 6300gctggctggt ttattgctga taaatctgga
gccggtgagc gtgggtctcg cggtatcatt 6360gcagcactgg ggccagatgg taagccctcc
cgtatcgtag ttatctacac gacggggagt 6420caggcaacta tggatgaacg aaatagacag
atcgctgaga taggtgcctc actgattaag 6480cattggtaac tgtcagacca agtttactca
tatatacttt agattgattt aaaacttcat 6540ttttaattta aaaggatcta ggtgaagatc
ctttttgata atctcatgac caaaatccct 6600taacgtgagt tttcgttcca ctgagcgtca
gaccccgtag aaaagatcaa aggatcttct 6660tgagatcctt tttttctgcg cgtaatctgc
tgcttgcaaa caaaaaaacc accgctacca 6720gcggtggttt gtttgccgga tcaagagcta
ccaactcttt ttccgaaggt aactggcttc 6780agcagagcgc agataccaaa tactgtcctt
ctagtgtagc cgtagttagg ccaccacttc 6840aagaactctg tagcaccgcc tacatacctc
gctctgctaa tcctgttacc agtggctgct 6900gccagtggcg ataagtcgtg tcttaccggg
ttggactcaa gacgatagtt accggataag 6960gcgcagcggt cgggctgaac ggggggttcg
tgcacacagc ccagcttgga gcgaacgacc 7020tacaccgaac tgagatacct acagcgtgag
ctatgagaaa gcgccacgct tcccgaaggg 7080agaaaggcgg acaggtatcc ggtaagcggc
agggtcggaa caggagagcg cacgagggag 7140cttccagggg gaaacgcctg gtatctttat
agtcctgtcg ggtttcgcca cctctgactt 7200gagcgtcgat ttttgtgatg ctcgtcaggg
gggcggagcc tatggaaaaa cgccagcaac 7260gcggcctttt tacggttcct ggccttttgc
tggccttttg ctcacatgtt ctttcctgcg 7320ttatcccctg attctgtgga taaccgtatt
accgcctttg agtgagctga taccgctcgc 7380cgcagccgaa cgaccgagcg cagcgagtca
gtgagcgagg aagcggaaga gcgcccaata 7440cgcaaaccgc ctctccccgc gcgttggccg
attcattaat gcagctggca cgacaggttt 7500cccgactgga aagcgggcag tgagcgcaac
gcaattaatg tgagttagct cactcattag 7560gcaccccagg ctttacactt tatgcttccg
gctcgtatgt tgtgtggaat tgtgagcgga 7620taacaatttc acacaggaaa cagctatgac
catgaggcgc gccggattc 7669
SEQ ID NO: 29 (7663 bp, DNA, Artificial sequence: pCAG-ArtTal1-Clo051)
gacattgatt attgactagt tattaatagt aatcaattac
ggggtcatta gttcatagcc 60catatatgga gttccgcgtt acataactta cggtaaatgg
cccgcctggc tgaccgccca 120acgacccccg cccattgacg tcaataatga cgtatgttcc
catagtaacg ccaataggga 180ctttccattg acgtcaatgg gtggagtatt tacggtaaac
tgcccacttg gcagtacatc 240aagtgtatca tatgccaagt acgcccccta ttgacgtcaa
tgacggtaaa tggcccgcct 300ggcattatgc ccagtacatg accttatggg actttcctac
ttggcagtac atctacgtat 360tagtcatcgc tattaccatg gtcgaggtga gccccacgtt
ctgcttcact ctccccatct 420cccccccctc cccaccccca attttgtatt tatttatttt
ttaattattt tgtgcagcga 480tgggggcggg gggggggggg gggcgcgcgc caggcggggc
ggggcggggc gaggggcggg 540gcggggcgag gcggagaggt gcggcggcag ccaatcagag
cggcgcgctc cgaaagtttc 600cttttatggc gaggcggcgg cggcggcggc cctataaaaa
gcgaagcgcg cggcgggcgg 660gagtcgctgc gcgctgcctt cgccccgtgc cccgctccgc
cgccgcctcg cgccgcccgc 720cccggctctg actgaccgcg ttactcccac aggtgagcgg
gcgggacggc ccttctcctc 780cgggctgtaa ttagcgcttg gtttaatgac ggcttgtttc
ttttctgtgg ctgcgtgaaa 840gccttgaggg gctccgggag ggccctttgt gcggggggga
gcggctcggg gggtgcgtgc 900gtgtgtgtgt gcgtggggag cgccgcgtgc ggctccgcgc
tgcccggcgg ctgtgagcgc 960tgcgggcgcg gcgcggggct ttgtgcgctc cgcagtgtgc
gcgaggggag cgcggccggg 1020ggcggtgccc cgcggtgcgg ggggggctgc gaggggaaca
aaggctgcgt gcggggtgtg 1080tgcgtggggg ggtgagcagg gggtgtgggc gcgtcggtcg
ggctgcaacc ccccctgcac 1140ccccctcccc gagttgctga gcacggcccg gcttcgggtg
cggggctccg tacggggcgt 1200ggcgcggggc tcgccgtgcc gggcgggggg tggcggcagg
tgggggtgcc gggcggggcg 1260gggccgcctc gggccgggga gggctcgggg gaggggcgcg
gcggcccccg gagcgccggc 1320ggctgtcgag gcgcggcgag ccgcagccat tgccttttat
ggtaatcgtg cgagagggcg 1380cagggacttc ctttgtccca aatctgtgcg gagccgaaat
ctgggaggcg ccgccgcacc 1440ccctctagcg ggcgcggggc gaagcggtgc ggcgccggca
ggaaggaaat gggcggggag 1500ggccttcgtg cgtcgccgcg ccgccgtccc cttctccctc
tccagcctcg gggctgtccg 1560cggggggacg gctgccttcg ggggggacgg ggcagggcgg
ggttcggctt ctggcgtgtg 1620accggcggct ctagagcctc tgctaaccat gttcatgcct
tcttcttttt cctacagatc 1680cttaattaat aatacgactc actatagggg ccgccaccat
gggacctaag aaaaagagga 1740aggtggcggc cgctgactac aaggatgacg acgataaacc
aggtggcgga ggtagtggcg 1800gaggtggggt acccgccagt ccagcagccc aggtggatct
gagaaccctc ggctacagcc 1860agcagcagca ggagaagatc aaaccaaagg tgcggtccac
cgtcgctcag caccatgaag 1920cactggtggg gcacggtttc acacacgccc atattgtggc
tctgtctcag catcccgctg 1980cactcgggac tgtggccgtc aaatatcagg acatgatcgc
cgctctgcct gaggcaaccc 2040acgaagccat tgtgggcgtc ggaaagcagt ggagcggtgc
cagagcactc gaagcactcc 2100tcaccgtcgc cggggaactg cggggtccac cactccagtc
cggactggac actggacagc 2160tgctgaagat cgctaaacgc ggcggagtga cagctgtgga
agctgtgcac gcttggagga 2220atgctctgac aggagcccca ctgaatctta ctccagaaca
ggtcgtcgca atcgcaagta 2280acatcggcgg aaaacaggcc ctcgaaaccg tccagagact
cctccccgtg ctgtgccagg 2340cccacggact gaccccacag caggtggtcg ccatcgctag
caacggcgga gggaagcagg 2400ctctggagac cgtgcagagg ctgctccccg tcctgtgcca
ggcacatggg ctcacacctc 2460agcaggtggt cgcaattgcc tccaatggtg gcggaaaaca
ggccctggaa actgtgcaga 2520gactgctccc cgtgctgtgc caggctcacg gtctcacacc
ccagcaggtg gtcgctatcg 2580catctcatga cgggggcaag caggcactgg agacagtgca
gcggctgctc cctgtcctgt 2640gccaggccca cggactcact cctcagcagg tcgtcgccat
tgctagtaac ggcggaggga 2700aacaggctct ggaaaccgtg cagcgcctgc tccccgtgct
gtgccaagcc cacggcctga 2760ccccccagca ggtggtcgca atcgcctcaa acaatggtgg
caagcaggcc ctggagactg 2820tgcagcgact gctcccagtg ctgtgccagg cccatggact
cacaccacag caggtcgtcg 2880ctattgcaag caacaatgga gggaaacagg cactggaaac
agtccagagg ctgctccccg 2940tgctgtgcca agcgcatgga ctcactcccc agcaggtcgt
cgccatcgct tccaataacg 3000gcggcaagca ggccctggag accgtccaga gactgctccc
cgtgctgtgc caagctcacg 3060gactcacacc tgagcaggtc gtggcaatcg cctctaacat
tggagggaaa caggccctgg 3120aaactgtaca gcggctgctc cccgtgctgt gccaagcaca
cggactcact ccacagcagg 3180tcgtggccat tgcaagtcat gacggaggca agcaggccct
ggaaacagtg cagcgcctgc 3240tccctgtgct gtgccaggct catggtctga ctcctcagca
ggtggtggcc atcgcttcca 3300acaatggagg gaagcaggcc ctggagaccg tacagagact
gctccccgtg ctgtgccaag 3360cgcacggtct gacccctcag caggtcgtcg caatcgccag
caatggcggg ggcaagcagg 3420ctctcgaaac cgtccagcgg ctcctcccag tcctctgtca
ggctcacggc ctgaccccac 3480agcaggtcgt cgctattgct tctaatggcg gagggcggcc
tgctctggag agcattgtgg 3540ctcagctgtc caggcccgat cctgccctgg ctagatccgc
actcactaac gatcatctgg 3600tcgctctcgc ttgcctcggt ggacggcccg ctctggacgc
agtcaaaaag ggtctccccc 3660atgctcccgc actgatcaag agaaccaaca ggagaattcc
tgagggatcc gatcgtttaa 3720acgaaggcat caaaagcaac atctccctcc tgaaagacga
actccggggg cagattagcc 3780acattagtca cgaatacctc tccctcatcg acctggcttt
cgatagcaag cagaacaggc 3840tctttgagat gaaagtgctg gaactgctcg tcaatgagta
cgggttcaag ggtcgacacc 3900tcggcggatc taggaaacca gacggcatcg tgtatagtac
cacactggaa gacaactttg 3960ggatcattgt ggataccaag gcatactctg agggttatag
tctgcccatt tcacaggccg 4020acgagatgga acggtacgtg cgcgagaact caaatagaga
tgaggaagtc aaccctaaca 4080agtggtggga gaacttctct gaggaagtga agaaatacta
cttcgtcttt atcagcgggt 4140ccttcaaggg taaatttgag gaacagctca ggagactgag
catgactacc ggcgtgaatg 4200gcagcgccgt caacgtggtc aatctgctcc tgggcgctga
aaagattcgg agcggagaga 4260tgaccatcga agagctggag agggcaatgt ttaataatag
cgagtttatc ctgaaatact 4320gaacgcgtaa atgattgcag atccactagt tctagaattc
cagctgagcg ccggtcgcta 4380ccattaccag ttggtctggt gtcaaaaata ataataaccg
ggcagggggg atctgcatgg 4440atctttgtga aggaacctta cttctgtggt gtgacataat
tggacaaact acctacagag 4500atttaaagct ctaaggtaaa tataaaattt ttaagtgtat
aatgtgttaa actactgatt 4560ctaattgttt gtgtatttta gattccaacc tatggaactg
atgaatggga gcagtggtgg 4620aatgccagat ccagacatga taagatacat tgatgagttt
ggacaaacca caactagaat 4680gcagtgaaaa aaatgcttta tttgtgaaat ttgtgatgct
attgctttat ttgtaaccat 4740tataagctgc aataaacaag ttaacaacaa caattgcatt
cattttatgt ttcaggttca 4800gggggaggtg tgggaggttt tttaaagcaa gtaaaacctc
tacaaatgtg gtatggctga 4860ttatgatctg cggccgccac tggccgtcgt tttacaacgt
cgtgactggg aaaaccctgg 4920cgttacccaa cttaatcgcc ttgcagcaca tccccctttc
gccagctggc gtaatagcga 4980agaggcccgc accgatcgcc cttcccaaca gttgcgcagc
ctgaatggcg aatggaacgc 5040gccctgtagc ggcgcattaa gcgcggcggg tgtggtggtt
acgcgcagcg tgaccgctac 5100acttgccagc gccctagcgc ccgctccttt cgctttcttc
ccttcctttc tcgccacgtt 5160cgccggcttt ccccgtcaag ctctaaatcg ggggctccct
ttagggttcc gatttagtgc 5220tttacggcac ctcgacccca aaaaacttga ttagggtgat
ggttcacgta gtgggccatc 5280gccctgatag acggtttttc gccctttgac gttggagtcc
acgttcttta atagtggact 5340cttgttccaa actggaacaa cactcaaccc tatctcggtc
tattcttttg atttataagg 5400gattttgccg atttcggcct attggttaaa aaatgagctg
atttaacaaa aatttaacgc 5460gaattttaac aaaatattaa cgcttacaat ttaggtggca
cttttcgggg aaatgtgcgc 5520ggaaccccta tttgtttatt tttctaaata cattcaaata
tgtatccgct catgagacaa 5580taaccctgat aaatgcttca ataatattga aaaaggaaga
gtatgagtat tcaacatttc 5640cgtgtcgccc ttattccctt ttttgcggca ttttgccttc
ctgtttttgc tcacccagaa 5700acgctggtga aagtaaaaga tgctgaagat cagttgggtg
cacgagtggg ttacatcgaa 5760ctggatctca acagcggtaa gatccttgag agttttcgcc
ccgaagaacg ttttccaatg 5820atgagcactt ttaaagttct gctatgtggc gcggtattat
cccgtattga cgccgggcaa 5880gagcaactcg gtcgccgcat acactattct cagaatgact
tggttgagta ctcaccagtc 5940acagaaaagc atcttacgga tggcatgaca gtaagagaat
tatgcagtgc tgccataacc 6000atgagtgata acactgcggc caacttactt ctgacaacga
tcggaggacc gaaggagcta 6060accgcttttt tgcacaacat gggggatcat gtaactcgcc
ttgatcgttg ggaaccggag 6120ctgaatgaag ccataccaaa cgacgagcgt gacaccacga
tgcctgtagc aatggcaaca 6180acgttgcgca aactattaac tggcgaacta cttactctag
cttcccggca acaattaata 6240gactggatgg aggcggataa agttgcagga ccacttctgc
gctcggccct tccggctggc 6300tggtttattg ctgataaatc tggagccggt gagcgtgggt
ctcgcggtat cattgcagca 6360ctggggccag atggtaagcc ctcccgtatc gtagttatct
acacgacggg gagtcaggca 6420actatggatg aacgaaatag acagatcgct gagataggtg
cctcactgat taagcattgg 6480taactgtcag accaagttta ctcatatata ctttagattg
atttaaaact tcatttttaa 6540tttaaaagga tctaggtgaa gatccttttt gataatctca
tgaccaaaat cccttaacgt 6600gagttttcgt tccactgagc gtcagacccc gtagaaaaga
tcaaaggatc ttcttgagat 6660cctttttttc tgcgcgtaat ctgctgcttg caaacaaaaa
aaccaccgct accagcggtg 6720gtttgtttgc cggatcaaga gctaccaact ctttttccga
aggtaactgg cttcagcaga 6780gcgcagatac caaatactgt ccttctagtg tagccgtagt
taggccacca cttcaagaac 6840tctgtagcac cgcctacata cctcgctctg ctaatcctgt
taccagtggc tgctgccagt 6900ggcgataagt cgtgtcttac cgggttggac tcaagacgat
agttaccgga taaggcgcag 6960cggtcgggct gaacgggggg ttcgtgcaca cagcccagct
tggagcgaac gacctacacc 7020gaactgagat acctacagcg tgagctatga gaaagcgcca
cgcttcccga agggagaaag 7080gcggacaggt atccggtaag cggcagggtc ggaacaggag
agcgcacgag ggagcttcca 7140gggggaaacg cctggtatct ttatagtcct gtcgggtttc
gccacctctg acttgagcgt 7200cgatttttgt gatgctcgtc aggggggcgg agcctatgga
aaaacgccag caacgcggcc 7260tttttacggt tcctggcctt ttgctggcct tttgctcaca
tgttctttcc tgcgttatcc 7320cctgattctg tggataaccg tattaccgcc tttgagtgag
ctgataccgc tcgccgcagc 7380cgaacgaccg agcgcagcga gtcagtgagc gaggaagcgg
aagagcgccc aatacgcaaa 7440ccgcctctcc ccgcgcgttg gccgattcat taatgcagct
ggcacgacag gtttcccgac 7500tggaaagcgg gcagtgagcg caacgcaatt aatgtgagtt
agctcactca ttaggcaccc 7560caggctttac actttatgct tccggctcgt atgttgtgtg
gaattgtgag cggataacaa 7620tttcacacag gaaacagcta tgaccatgag gcgcgccgga
ttc 7663
SEQ ID NO: 30 (7816 bp, DNA, Artificial sequence: pCAG-ArtTal1-MlyI)
gacattgatt attgactagt tattaatagt aatcaattac ggggtcatta gttcatagcc
60catatatgga gttccgcgtt acataactta cggtaaatgg cccgcctggc tgaccgccca
120acgacccccg cccattgacg tcaataatga cgtatgttcc catagtaacg ccaataggga
180ctttccattg acgtcaatgg gtggagtatt tacggtaaac tgcccacttg gcagtacatc
240aagtgtatca tatgccaagt acgcccccta ttgacgtcaa tgacggtaaa tggcccgcct
300ggcattatgc ccagtacatg accttatggg actttcctac ttggcagtac atctacgtat
360tagtcatcgc tattaccatg gtcgaggtga gccccacgtt ctgcttcact ctccccatct
420cccccccctc cccaccccca attttgtatt tatttatttt ttaattattt tgtgcagcga
480tgggggcggg gggggggggg gggcgcgcgc caggcggggc ggggcggggc gaggggcggg
540gcggggcgag gcggagaggt gcggcggcag ccaatcagag cggcgcgctc cgaaagtttc
600cttttatggc gaggcggcgg cggcggcggc cctataaaaa gcgaagcgcg cggcgggcgg
660gagtcgctgc gcgctgcctt cgccccgtgc cccgctccgc cgccgcctcg cgccgcccgc
720cccggctctg actgaccgcg ttactcccac aggtgagcgg gcgggacggc ccttctcctc
780cgggctgtaa ttagcgcttg gtttaatgac ggcttgtttc ttttctgtgg ctgcgtgaaa
840gccttgaggg gctccgggag ggccctttgt gcggggggga gcggctcggg gggtgcgtgc
900gtgtgtgtgt gcgtggggag cgccgcgtgc ggctccgcgc tgcccggcgg ctgtgagcgc
960tgcgggcgcg gcgcggggct ttgtgcgctc cgcagtgtgc gcgaggggag cgcggccggg
1020ggcggtgccc cgcggtgcgg ggggggctgc gaggggaaca aaggctgcgt gcggggtgtg
1080tgcgtggggg ggtgagcagg gggtgtgggc gcgtcggtcg ggctgcaacc ccccctgcac
1140ccccctcccc gagttgctga gcacggcccg gcttcgggtg cggggctccg tacggggcgt
1200ggcgcggggc tcgccgtgcc gggcgggggg tggcggcagg tgggggtgcc gggcggggcg
1260gggccgcctc gggccgggga gggctcgggg gaggggcgcg gcggcccccg gagcgccggc
1320ggctgtcgag gcgcggcgag ccgcagccat tgccttttat ggtaatcgtg cgagagggcg
1380cagggacttc ctttgtccca aatctgtgcg gagccgaaat ctgggaggcg ccgccgcacc
1440ccctctagcg ggcgcggggc gaagcggtgc ggcgccggca ggaaggaaat gggcggggag
1500ggccttcgtg cgtcgccgcg ccgccgtccc cttctccctc tccagcctcg gggctgtccg
1560cggggggacg gctgccttcg ggggggacgg ggcagggcgg ggttcggctt ctggcgtgtg
1620accggcggct ctagagcctc tgctaaccat gttcatgcct tcttcttttt cctacagatc
1680cttaattaat aatacgactc actatagggg ccgccaccat gggacctaag aaaaagagga
1740aggtggcggc cgctgactac aaggatgacg acgataaacc aggtggcgga ggtagtggcg
1800gaggtggggt acccgccagt ccagcagccc aggtggatct gagaaccctc ggctacagcc
1860agcagcagca ggagaagatc aaaccaaagg tgcggtccac cgtcgctcag caccatgaag
1920cactggtggg gcacggtttc acacacgccc atattgtggc tctgtctcag catcccgctg
1980cactcgggac tgtggccgtc aaatatcagg acatgatcgc cgctctgcct gaggcaaccc
2040acgaagccat tgtgggcgtc ggaaagcagt ggagcggtgc cagagcactc gaagcactcc
2100tcaccgtcgc cggggaactg cggggtccac cactccagtc cggactggac actggacagc
2160tgctgaagat cgctaaacgc ggcggagtga cagctgtgga agctgtgcac gcttggagga
2220atgctctgac aggagcccca ctgaatctta ctccagaaca ggtcgtcgca atcgcaagta
2280acatcggcgg aaaacaggcc ctcgaaaccg tccagagact cctccccgtg ctgtgccagg
2340cccacggact gaccccacag caggtggtcg ccatcgctag caacggcgga gggaagcagg
2400ctctggagac cgtgcagagg ctgctccccg tcctgtgcca ggcacatggg ctcacacctc
2460agcaggtggt cgcaattgcc tccaatggtg gcggaaaaca ggccctggaa actgtgcaga
2520gactgctccc cgtgctgtgc caggctcacg gtctcacacc ccagcaggtg gtcgctatcg
2580catctcatga cgggggcaag caggcactgg agacagtgca gcggctgctc cctgtcctgt
2640gccaggccca cggactcact cctcagcagg tcgtcgccat tgctagtaac ggcggaggga
2700aacaggctct ggaaaccgtg cagcgcctgc tccccgtgct gtgccaagcc cacggcctga
2760ccccccagca ggtggtcgca atcgcctcaa acaatggtgg caagcaggcc ctggagactg
2820tgcagcgact gctcccagtg ctgtgccagg cccatggact cacaccacag caggtcgtcg
2880ctattgcaag caacaatgga gggaaacagg cactggaaac agtccagagg ctgctccccg
2940tgctgtgcca agcgcatgga ctcactcccc agcaggtcgt cgccatcgct tccaataacg
3000gcggcaagca ggccctggag accgtccaga gactgctccc cgtgctgtgc caagctcacg
3060gactcacacc tgagcaggtc gtggcaatcg cctctaacat tggagggaaa caggccctgg
3120aaactgtaca gcggctgctc cccgtgctgt gccaagcaca cggactcact ccacagcagg
3180tcgtggccat tgcaagtcat gacggaggca agcaggccct ggaaacagtg cagcgcctgc
3240tccctgtgct gtgccaggct catggtctga ctcctcagca ggtggtggcc atcgcttcca
3300acaatggagg gaagcaggcc ctggagaccg tacagagact gctccccgtg ctgtgccaag
3360cgcacggtct gacccctcag caggtcgtcg caatcgccag caatggcggg ggcaagcagg
3420ctctcgaaac cgtccagcgg ctcctcccag tcctctgtca ggctcacggc ctgaccccac
3480agcaggtcgt cgctattgct tctaatggcg gagggcggcc tgctctggag agcattgtgg
3540ctcagctgtc caggcccgat cctgccctgg ctagatccgc actcactaac gatcatctgg
3600tcgctctcgc ttgcctcggt ggacggcccg ctctggacgc agtcaaaaag ggtctccccc
3660atgctcccgc actgatcaag agaaccaaca ggagaattcc tgagggatcc gatcgtttaa
3720acatcaatag caagatcaag cagctggacg atagcatcaa cgtggagtcc ctgaagattg
3780acgatgccaa agatctgctg aatgacctgg agatccagcg gaaggctaaa accattgaag
3840atacagtgaa ccacctgaag ctgcgctccg acatcgagga tattctggac gtgttcgcca
3900aaatcaagaa aagggatgtg cccgacgtgc ctctgttcct ggagtggaat atctggcggg
3960cctttgccgc tctgaatcat acccaggcta tcgaagggaa ctttattgtg gacctggatg
4020gcatgcccct gaatacagct ccaggaaaga aacccgatat cgagattaac tacggaagct
4080tctcctgcat cgtggaagtg actatgagct ccggggagac ccagtttaac atggaaggct
4140ctagtgtgcc taggcactac ggagacctgg tgagaaaggt ggaccatgat gcctattgta
4200tcttcattgc ccctaaggtg gctccaggga ctaaagctca cttctttaac ctgaataggc
4260tgtctacaaa gcattatggc ggaaagacta agatcattcc aatgagtctg gacgatttca
4320tctgctttct gcaagtgggc attacccaca actttcagga tatcaacaag ctgaaaaatt
4380ggctggacaa cctgattaac ttcaatctgg agtctgaaga cgaggaaatc tggtttgagg
4440aaatcatttc taagatcagt acatgggcca tttgaacgcg taaatgattg cagatccact
4500agttctagaa ttccagctga gcgccggtcg ctaccattac cagttggtct ggtgtcaaaa
4560ataataataa ccgggcaggg gggatctgca tggatctttg tgaaggaacc ttacttctgt
4620ggtgtgacat aattggacaa actacctaca gagatttaaa gctctaaggt aaatataaaa
4680tttttaagtg tataatgtgt taaactactg attctaattg tttgtgtatt ttagattcca
4740acctatggaa ctgatgaatg ggagcagtgg tggaatgcca gatccagaca tgataagata
4800cattgatgag tttggacaaa ccacaactag aatgcagtga aaaaaatgct ttatttgtga
4860aatttgtgat gctattgctt tatttgtaac cattataagc tgcaataaac aagttaacaa
4920caacaattgc attcatttta tgtttcaggt tcagggggag gtgtgggagg ttttttaaag
4980caagtaaaac ctctacaaat gtggtatggc tgattatgat ctgcggccgc cactggccgt
5040cgttttacaa cgtcgtgact gggaaaaccc tggcgttacc caacttaatc gccttgcagc
5100acatccccct ttcgccagct ggcgtaatag cgaagaggcc cgcaccgatc gcccttccca
5160acagttgcgc agcctgaatg gcgaatggaa cgcgccctgt agcggcgcat taagcgcggc
5220gggtgtggtg gttacgcgca gcgtgaccgc tacacttgcc agcgccctag cgcccgctcc
5280tttcgctttc ttcccttcct ttctcgccac gttcgccggc tttccccgtc aagctctaaa
5340tcgggggctc cctttagggt tccgatttag tgctttacgg cacctcgacc ccaaaaaact
5400tgattagggt gatggttcac gtagtgggcc atcgccctga tagacggttt ttcgcccttt
5460gacgttggag tccacgttct ttaatagtgg actcttgttc caaactggaa caacactcaa
5520ccctatctcg gtctattctt ttgatttata agggattttg ccgatttcgg cctattggtt
5580aaaaaatgag ctgatttaac aaaaatttaa cgcgaatttt aacaaaatat taacgcttac
5640aatttaggtg gcacttttcg gggaaatgtg cgcggaaccc ctatttgttt atttttctaa
5700atacattcaa atatgtatcc gctcatgaga caataaccct gataaatgct tcaataatat
5760tgaaaaagga agagtatgag tattcaacat ttccgtgtcg cccttattcc cttttttgcg
5820gcattttgcc ttcctgtttt tgctcaccca gaaacgctgg tgaaagtaaa agatgctgaa
5880gatcagttgg gtgcacgagt gggttacatc gaactggatc tcaacagcgg taagatcctt
5940gagagttttc gccccgaaga acgttttcca atgatgagca cttttaaagt tctgctatgt
6000ggcgcggtat tatcccgtat tgacgccggg caagagcaac tcggtcgccg catacactat
6060tctcagaatg acttggttga gtactcacca gtcacagaaa agcatcttac ggatggcatg
6120acagtaagag aattatgcag tgctgccata accatgagtg ataacactgc ggccaactta
6180cttctgacaa cgatcggagg accgaaggag ctaaccgctt ttttgcacaa catgggggat
6240catgtaactc gccttgatcg ttgggaaccg gagctgaatg aagccatacc aaacgacgag
6300cgtgacacca cgatgcctgt agcaatggca acaacgttgc gcaaactatt aactggcgaa
6360ctacttactc tagcttcccg gcaacaatta atagactgga tggaggcgga taaagttgca
6420ggaccacttc tgcgctcggc ccttccggct ggctggttta ttgctgataa atctggagcc
6480ggtgagcgtg ggtctcgcgg tatcattgca gcactggggc cagatggtaa gccctcccgt
6540atcgtagtta tctacacgac ggggagtcag gcaactatgg atgaacgaaa tagacagatc
6600gctgagatag gtgcctcact gattaagcat tggtaactgt cagaccaagt ttactcatat
6660atactttaga ttgatttaaa acttcatttt taatttaaaa ggatctaggt gaagatcctt
6720tttgataatc tcatgaccaa aatcccttaa cgtgagtttt cgttccactg agcgtcagac
6780cccgtagaaa agatcaaagg atcttcttga gatccttttt ttctgcgcgt aatctgctgc
6840ttgcaaacaa aaaaaccacc gctaccagcg gtggtttgtt tgccggatca agagctacca
6900actctttttc cgaaggtaac tggcttcagc agagcgcaga taccaaatac tgtccttcta
6960gtgtagccgt agttaggcca ccacttcaag aactctgtag caccgcctac atacctcgct
7020ctgctaatcc tgttaccagt ggctgctgcc agtggcgata agtcgtgtct taccgggttg
7080gactcaagac gatagttacc ggataaggcg cagcggtcgg gctgaacggg gggttcgtgc
7140acacagccca gcttggagcg aacgacctac accgaactga gatacctaca gcgtgagcta
7200tgagaaagcg ccacgcttcc cgaagggaga aaggcggaca ggtatccggt aagcggcagg
7260gtcggaacag gagagcgcac gagggagctt ccagggggaa acgcctggta tctttatagt
7320cctgtcgggt ttcgccacct ctgacttgag cgtcgatttt tgtgatgctc gtcagggggg
7380cggagcctat ggaaaaacgc cagcaacgcg gcctttttac ggttcctggc cttttgctgg
7440ccttttgctc acatgttctt tcctgcgtta tcccctgatt ctgtggataa ccgtattacc
7500gcctttgagt gagctgatac cgctcgccgc agccgaacga ccgagcgcag cgagtcagtg
7560agcgaggaag cggaagagcg cccaatacgc aaaccgcctc tccccgcgcg ttggccgatt
7620cattaatgca gctggcacga caggtttccc gactggaaag cgggcagtga gcgcaacgca
7680attaatgtga gttagctcac tcattaggca ccccaggctt tacactttat gcttccggct
7740cgtatgttgt gtggaattgt gagcggataa caatttcaca caggaaacag ctatgaccat
7800gaggcgcgcc ggattc
7816
SEQ ID NO: 31, 7705 bp, DNA, Artificial sequence (pCAG-ArtTal1-Pept071)
gacattgatt
attgactagt tattaatagt aatcaattac ggggtcatta gttcatagcc 60catatatgga
gttccgcgtt acataactta cggtaaatgg cccgcctggc tgaccgccca 120acgacccccg
cccattgacg tcaataatga cgtatgttcc catagtaacg ccaataggga 180ctttccattg
acgtcaatgg gtggagtatt tacggtaaac tgcccacttg gcagtacatc 240aagtgtatca
tatgccaagt acgcccccta ttgacgtcaa tgacggtaaa tggcccgcct 300ggcattatgc
ccagtacatg accttatggg actttcctac ttggcagtac atctacgtat 360tagtcatcgc
tattaccatg gtcgaggtga gccccacgtt ctgcttcact ctccccatct 420cccccccctc
cccaccccca attttgtatt tatttatttt ttaattattt tgtgcagcga 480tgggggcggg
gggggggggg gggcgcgcgc caggcggggc ggggcggggc gaggggcggg 540gcggggcgag
gcggagaggt gcggcggcag ccaatcagag cggcgcgctc cgaaagtttc 600cttttatggc
gaggcggcgg cggcggcggc cctataaaaa gcgaagcgcg cggcgggcgg 660gagtcgctgc
gcgctgcctt cgccccgtgc cccgctccgc cgccgcctcg cgccgcccgc 720cccggctctg
actgaccgcg ttactcccac aggtgagcgg gcgggacggc ccttctcctc 780cgggctgtaa
ttagcgcttg gtttaatgac ggcttgtttc ttttctgtgg ctgcgtgaaa 840gccttgaggg
gctccgggag ggccctttgt gcggggggga gcggctcggg gggtgcgtgc 900gtgtgtgtgt
gcgtggggag cgccgcgtgc ggctccgcgc tgcccggcgg ctgtgagcgc 960tgcgggcgcg
gcgcggggct ttgtgcgctc cgcagtgtgc gcgaggggag cgcggccggg 1020ggcggtgccc
cgcggtgcgg ggggggctgc gaggggaaca aaggctgcgt gcggggtgtg 1080tgcgtggggg
ggtgagcagg gggtgtgggc gcgtcggtcg ggctgcaacc ccccctgcac 1140ccccctcccc
gagttgctga gcacggcccg gcttcgggtg cggggctccg tacggggcgt 1200ggcgcggggc
tcgccgtgcc gggcgggggg tggcggcagg tgggggtgcc gggcggggcg 1260gggccgcctc
gggccgggga gggctcgggg gaggggcgcg gcggcccccg gagcgccggc 1320ggctgtcgag
gcgcggcgag ccgcagccat tgccttttat ggtaatcgtg cgagagggcg 1380cagggacttc
ctttgtccca aatctgtgcg gagccgaaat ctgggaggcg ccgccgcacc 1440ccctctagcg
ggcgcggggc gaagcggtgc ggcgccggca ggaaggaaat gggcggggag 1500ggccttcgtg
cgtcgccgcg ccgccgtccc cttctccctc tccagcctcg gggctgtccg 1560cggggggacg
gctgccttcg ggggggacgg ggcagggcgg ggttcggctt ctggcgtgtg 1620accggcggct
ctagagcctc tgctaaccat gttcatgcct tcttcttttt cctacagatc 1680cttaattaat
aatacgactc actatagggg ccgccaccat gggacctaag aaaaagagga 1740aggtggcggc
cgctgactac aaggatgacg acgataaacc aggtggcgga ggtagtggcg 1800gaggtggggt
acccgccagt ccagcagccc aggtggatct gagaaccctc ggctacagcc 1860agcagcagca
ggagaagatc aaaccaaagg tgcggtccac cgtcgctcag caccatgaag 1920cactggtggg
gcacggtttc acacacgccc atattgtggc tctgtctcag catcccgctg 1980cactcgggac
tgtggccgtc aaatatcagg acatgatcgc cgctctgcct gaggcaaccc 2040acgaagccat
tgtgggcgtc ggaaagcagt ggagcggtgc cagagcactc gaagcactcc 2100tcaccgtcgc
cggggaactg cggggtccac cactccagtc cggactggac actggacagc 2160tgctgaagat
cgctaaacgc ggcggagtga cagctgtgga agctgtgcac gcttggagga 2220atgctctgac
aggagcccca ctgaatctta ctccagaaca ggtcgtcgca atcgcaagta 2280acatcggcgg
aaaacaggcc ctcgaaaccg tccagagact cctccccgtg ctgtgccagg 2340cccacggact
gaccccacag caggtggtcg ccatcgctag caacggcgga gggaagcagg 2400ctctggagac
cgtgcagagg ctgctccccg tcctgtgcca ggcacatggg ctcacacctc 2460agcaggtggt
cgcaattgcc tccaatggtg gcggaaaaca ggccctggaa actgtgcaga 2520gactgctccc
cgtgctgtgc caggctcacg gtctcacacc ccagcaggtg gtcgctatcg 2580catctcatga
cgggggcaag caggcactgg agacagtgca gcggctgctc cctgtcctgt 2640gccaggccca
cggactcact cctcagcagg tcgtcgccat tgctagtaac ggcggaggga 2700aacaggctct
ggaaaccgtg cagcgcctgc tccccgtgct gtgccaagcc cacggcctga 2760ccccccagca
ggtggtcgca atcgcctcaa acaatggtgg caagcaggcc ctggagactg 2820tgcagcgact
gctcccagtg ctgtgccagg cccatggact cacaccacag caggtcgtcg 2880ctattgcaag
caacaatgga gggaaacagg cactggaaac agtccagagg ctgctccccg 2940tgctgtgcca
agcgcatgga ctcactcccc agcaggtcgt cgccatcgct tccaataacg 3000gcggcaagca
ggccctggag accgtccaga gactgctccc cgtgctgtgc caagctcacg 3060gactcacacc
tgagcaggtc gtggcaatcg cctctaacat tggagggaaa caggccctgg 3120aaactgtaca
gcggctgctc cccgtgctgt gccaagcaca cggactcact ccacagcagg 3180tcgtggccat
tgcaagtcat gacggaggca agcaggccct ggaaacagtg cagcgcctgc 3240tccctgtgct
gtgccaggct catggtctga ctcctcagca ggtggtggcc atcgcttcca 3300acaatggagg
gaagcaggcc ctggagaccg tacagagact gctccccgtg ctgtgccaag 3360cgcacggtct
gacccctcag caggtcgtcg caatcgccag caatggcggg ggcaagcagg 3420ctctcgaaac
cgtccagcgg ctcctcccag tcctctgtca ggctcacggc ctgaccccac 3480agcaggtcgt
cgctattgct tctaatggcg gagggcggcc tgctctggag agcattgtgg 3540ctcagctgtc
caggcccgat cctgccctgg ctagatccgc actcactaac gatcatctgg 3600tcgctctcgc
ttgcctcggt ggacggcccg ctctggacgc agtcaaaaag ggtctccccc 3660atgctcccgc
actgatcaag agaaccaaca ggagaattcc tgagggatcc gatcgtttaa 3720acaagatcag
caaaaccaat gtgctggagc tcaaggacaa agtccgagat aagctgaaat 3780acgtggacca
caggtatctg gcactcatcg acctcgccta tgatgggacc gctaacaggg 3840acttcgaaat
ccagacaatt gatctgctca ttaatgagct gaagtttaaa ggggtcaggc 3900tcggtgaaag
tagaaagccc gacggcatca tttcatacaa catcaatgga gtgatcattg 3960ataacaaggc
ttactctact ggttataacc tgcctattaa tcaggccgac gagatgatcc 4020ggtatattga
ggaaaatcag acccgcgatg aaaaaatcaa ctccaataag tggtgggagt 4080ctttcgacga
taaggtcaaa gacttcaact acctgtttgt gagctccttc tttaagggga 4140actttaaaaa
caatctgaag catatcgcta acagaacagg tgtcagcggc ggagcaatta 4200acgtggagaa
tctgctctac ttcgcagagg aactgaaagc cggccggctc tcatatgtgg 4260atagctttaa
gatgtacgac aacgatgaga tctatgtcgg cgacttctct gattacagtt 4320atgtgaagtt
tgccgctgag gaagagggag aatacctgac ttgaacgcgt aaatgattgc 4380agatccacta
gttctagaat tccagctgag cgccggtcgc taccattacc agttggtctg 4440gtgtcaaaaa
taataataac cgggcagggg ggatctgcat ggatctttgt gaaggaacct 4500tacttctgtg
gtgtgacata attggacaaa ctacctacag agatttaaag ctctaaggta 4560aatataaaat
ttttaagtgt ataatgtgtt aaactactga ttctaattgt ttgtgtattt 4620tagattccaa
cctatggaac tgatgaatgg gagcagtggt ggaatgccag atccagacat 4680gataagatac
attgatgagt ttggacaaac cacaactaga atgcagtgaa aaaaatgctt 4740tatttgtgaa
atttgtgatg ctattgcttt atttgtaacc attataagct gcaataaaca 4800agttaacaac
aacaattgca ttcattttat gtttcaggtt cagggggagg tgtgggaggt 4860tttttaaagc
aagtaaaacc tctacaaatg tggtatggct gattatgatc tgcggccgcc 4920actggccgtc
gttttacaac gtcgtgactg ggaaaaccct ggcgttaccc aacttaatcg 4980ccttgcagca
catccccctt tcgccagctg gcgtaatagc gaagaggccc gcaccgatcg 5040cccttcccaa
cagttgcgca gcctgaatgg cgaatggaac gcgccctgta gcggcgcatt 5100aagcgcggcg
ggtgtggtgg ttacgcgcag cgtgaccgct acacttgcca gcgccctagc 5160gcccgctcct
ttcgctttct tcccttcctt tctcgccacg ttcgccggct ttccccgtca 5220agctctaaat
cgggggctcc ctttagggtt ccgatttagt gctttacggc acctcgaccc 5280caaaaaactt
gattagggtg atggttcacg tagtgggcca tcgccctgat agacggtttt 5340tcgccctttg
acgttggagt ccacgttctt taatagtgga ctcttgttcc aaactggaac 5400aacactcaac
cctatctcgg tctattcttt tgatttataa gggattttgc cgatttcggc 5460ctattggtta
aaaaatgagc tgatttaaca aaaatttaac gcgaatttta acaaaatatt 5520aacgcttaca
atttaggtgg cacttttcgg ggaaatgtgc gcggaacccc tatttgttta 5580tttttctaaa
tacattcaaa tatgtatccg ctcatgagac aataaccctg ataaatgctt 5640caataatatt
gaaaaaggaa gagtatgagt attcaacatt tccgtgtcgc ccttattccc 5700ttttttgcgg
cattttgcct tcctgttttt gctcacccag aaacgctggt gaaagtaaaa 5760gatgctgaag
atcagttggg tgcacgagtg ggttacatcg aactggatct caacagcggt 5820aagatccttg
agagttttcg ccccgaagaa cgttttccaa tgatgagcac ttttaaagtt 5880ctgctatgtg
gcgcggtatt atcccgtatt gacgccgggc aagagcaact cggtcgccgc 5940atacactatt
ctcagaatga cttggttgag tactcaccag tcacagaaaa gcatcttacg 6000gatggcatga
cagtaagaga attatgcagt gctgccataa ccatgagtga taacactgcg 6060gccaacttac
ttctgacaac gatcggagga ccgaaggagc taaccgcttt tttgcacaac 6120atgggggatc
atgtaactcg ccttgatcgt tgggaaccgg agctgaatga agccatacca 6180aacgacgagc
gtgacaccac gatgcctgta gcaatggcaa caacgttgcg caaactatta 6240actggcgaac
tacttactct agcttcccgg caacaattaa tagactggat ggaggcggat 6300aaagttgcag
gaccacttct gcgctcggcc cttccggctg gctggtttat tgctgataaa 6360tctggagccg
gtgagcgtgg gtctcgcggt atcattgcag cactggggcc agatggtaag 6420ccctcccgta
tcgtagttat ctacacgacg gggagtcagg caactatgga tgaacgaaat 6480agacagatcg
ctgagatagg tgcctcactg attaagcatt ggtaactgtc agaccaagtt 6540tactcatata
tactttagat tgatttaaaa cttcattttt aatttaaaag gatctaggtg 6600aagatccttt
ttgataatct catgaccaaa atcccttaac gtgagttttc gttccactga 6660gcgtcagacc
ccgtagaaaa gatcaaagga tcttcttgag atcctttttt tctgcgcgta 6720atctgctgct
tgcaaacaaa aaaaccaccg ctaccagcgg tggtttgttt gccggatcaa 6780gagctaccaa
ctctttttcc gaaggtaact ggcttcagca gagcgcagat accaaatact 6840gtccttctag
tgtagccgta gttaggccac cacttcaaga actctgtagc accgcctaca 6900tacctcgctc
tgctaatcct gttaccagtg gctgctgcca gtggcgataa gtcgtgtctt 6960accgggttgg
actcaagacg atagttaccg gataaggcgc agcggtcggg ctgaacgggg 7020ggttcgtgca
cacagcccag cttggagcga acgacctaca ccgaactgag atacctacag 7080cgtgagctat
gagaaagcgc cacgcttccc gaagggagaa aggcggacag gtatccggta 7140agcggcaggg
tcggaacagg agagcgcacg agggagcttc cagggggaaa cgcctggtat 7200ctttatagtc
ctgtcgggtt tcgccacctc tgacttgagc gtcgattttt gtgatgctcg 7260tcaggggggc
ggagcctatg gaaaaacgcc agcaacgcgg cctttttacg gttcctggcc 7320ttttgctggc
cttttgctca catgttcttt cctgcgttat cccctgattc tgtggataac 7380cgtattaccg
cctttgagtg agctgatacc gctcgccgca gccgaacgac cgagcgcagc 7440gagtcagtga
gcgaggaagc ggaagagcgc ccaatacgca aaccgcctct ccccgcgcgt 7500tggccgattc
attaatgcag ctggcacgac aggtttcccg actggaaagc gggcagtgag 7560cgcaacgcaa
ttaatgtgag ttagctcact cattaggcac cccaggcttt acactttatg 7620cttccggctc
gtatgttgtg tggaattgtg agcggataac aatttcacac aggaaacagc 7680tatgaccatg
aggcgcgccg gattc
7705
SEQ ID NO: 32, 7555 bp, DNA, Artificial sequence (pCAG-ArtTal1-SbfI)
gacattgatt attgactagt
tattaatagt aatcaattac ggggtcatta gttcatagcc 60catatatgga gttccgcgtt
acataactta cggtaaatgg cccgcctggc tgaccgccca 120acgacccccg cccattgacg
tcaataatga cgtatgttcc catagtaacg ccaataggga 180ctttccattg acgtcaatgg
gtggagtatt tacggtaaac tgcccacttg gcagtacatc 240aagtgtatca tatgccaagt
acgcccccta ttgacgtcaa tgacggtaaa tggcccgcct 300ggcattatgc ccagtacatg
accttatggg actttcctac ttggcagtac atctacgtat 360tagtcatcgc tattaccatg
gtcgaggtga gccccacgtt ctgcttcact ctccccatct 420cccccccctc cccaccccca
attttgtatt tatttatttt ttaattattt tgtgcagcga 480tgggggcggg gggggggggg
gggcgcgcgc caggcggggc ggggcggggc gaggggcggg 540gcggggcgag gcggagaggt
gcggcggcag ccaatcagag cggcgcgctc cgaaagtttc 600cttttatggc gaggcggcgg
cggcggcggc cctataaaaa gcgaagcgcg cggcgggcgg 660gagtcgctgc gcgctgcctt
cgccccgtgc cccgctccgc cgccgcctcg cgccgcccgc 720cccggctctg actgaccgcg
ttactcccac aggtgagcgg gcgggacggc ccttctcctc 780cgggctgtaa ttagcgcttg
gtttaatgac ggcttgtttc ttttctgtgg ctgcgtgaaa 840gccttgaggg gctccgggag
ggccctttgt gcggggggga gcggctcggg gggtgcgtgc 900gtgtgtgtgt gcgtggggag
cgccgcgtgc ggctccgcgc tgcccggcgg ctgtgagcgc 960tgcgggcgcg gcgcggggct
ttgtgcgctc cgcagtgtgc gcgaggggag cgcggccggg 1020ggcggtgccc cgcggtgcgg
ggggggctgc gaggggaaca aaggctgcgt gcggggtgtg 1080tgcgtggggg ggtgagcagg
gggtgtgggc gcgtcggtcg ggctgcaacc ccccctgcac 1140ccccctcccc gagttgctga
gcacggcccg gcttcgggtg cggggctccg tacggggcgt 1200ggcgcggggc tcgccgtgcc
gggcgggggg tggcggcagg tgggggtgcc gggcggggcg 1260gggccgcctc gggccgggga
gggctcgggg gaggggcgcg gcggcccccg gagcgccggc 1320ggctgtcgag gcgcggcgag
ccgcagccat tgccttttat ggtaatcgtg cgagagggcg 1380cagggacttc ctttgtccca
aatctgtgcg gagccgaaat ctgggaggcg ccgccgcacc 1440ccctctagcg ggcgcggggc
gaagcggtgc ggcgccggca ggaaggaaat gggcggggag 1500ggccttcgtg cgtcgccgcg
ccgccgtccc cttctccctc tccagcctcg gggctgtccg 1560cggggggacg gctgccttcg
ggggggacgg ggcagggcgg ggttcggctt ctggcgtgtg 1620accggcggct ctagagcctc
tgctaaccat gttcatgcct tcttcttttt cctacagatc 1680cttaattaat aatacgactc
actatagggg ccgccaccat gggacctaag aaaaagagga 1740aggtggcggc cgctgactac
aaggatgacg acgataaacc aggtggcgga ggtagtggcg 1800gaggtggggt acccgccagt
ccagcagccc aggtggatct gagaaccctc ggctacagcc 1860agcagcagca ggagaagatc
aaaccaaagg tgcggtccac cgtcgctcag caccatgaag 1920cactggtggg gcacggtttc
acacacgccc atattgtggc tctgtctcag catcccgctg 1980cactcgggac tgtggccgtc
aaatatcagg acatgatcgc cgctctgcct gaggcaaccc 2040acgaagccat tgtgggcgtc
ggaaagcagt ggagcggtgc cagagcactc gaagcactcc 2100tcaccgtcgc cggggaactg
cggggtccac cactccagtc cggactggac actggacagc 2160tgctgaagat cgctaaacgc
ggcggagtga cagctgtgga agctgtgcac gcttggagga 2220atgctctgac aggagcccca
ctgaatctta ctccagaaca ggtcgtcgca atcgcaagta 2280acatcggcgg aaaacaggcc
ctcgaaaccg tccagagact cctccccgtg ctgtgccagg 2340cccacggact gaccccacag
caggtggtcg ccatcgctag caacggcgga gggaagcagg 2400ctctggagac cgtgcagagg
ctgctccccg tcctgtgcca ggcacatggg ctcacacctc 2460agcaggtggt cgcaattgcc
tccaatggtg gcggaaaaca ggccctggaa actgtgcaga 2520gactgctccc cgtgctgtgc
caggctcacg gtctcacacc ccagcaggtg gtcgctatcg 2580catctcatga cgggggcaag
caggcactgg agacagtgca gcggctgctc cctgtcctgt 2640gccaggccca cggactcact
cctcagcagg tcgtcgccat tgctagtaac ggcggaggga 2700aacaggctct ggaaaccgtg
cagcgcctgc tccccgtgct gtgccaagcc cacggcctga 2760ccccccagca ggtggtcgca
atcgcctcaa acaatggtgg caagcaggcc ctggagactg 2820tgcagcgact gctcccagtg
ctgtgccagg cccatggact cacaccacag caggtcgtcg 2880ctattgcaag caacaatgga
gggaaacagg cactggaaac agtccagagg ctgctccccg 2940tgctgtgcca agcgcatgga
ctcactcccc agcaggtcgt cgccatcgct tccaataacg 3000gcggcaagca ggccctggag
accgtccaga gactgctccc cgtgctgtgc caagctcacg 3060gactcacacc tgagcaggtc
gtggcaatcg cctctaacat tggagggaaa caggccctgg 3120aaactgtaca gcggctgctc
cccgtgctgt gccaagcaca cggactcact ccacagcagg 3180tcgtggccat tgcaagtcat
gacggaggca agcaggccct ggaaacagtg cagcgcctgc 3240tccctgtgct gtgccaggct
catggtctga ctcctcagca ggtggtggcc atcgcttcca 3300acaatggagg gaagcaggcc
ctggagaccg tacagagact gctccccgtg ctgtgccaag 3360cgcacggtct gacccctcag
caggtcgtcg caatcgccag caatggcggg ggcaagcagg 3420ctctcgaaac cgtccagcgg
ctcctcccag tcctctgtca ggctcacggc ctgaccccac 3480agcaggtcgt cgctattgct
tctaatggcg gagggcggcc tgctctggag agcattgtgg 3540ctcagctgtc caggcccgat
cctgccctgg ctagatccgc actcactaac gatcatctgg 3600tcgctctcgc ttgcctcggt
ggacggcccg ctctggacgc agtcaaaaag ggtctccccc 3660atgctcccgc actgatcaag
agaaccaaca ggagaattcc tgagggatcc gatcgtttaa 3720acatctctgt ggacctgcca
ggcggagagg aattcctgct gagtccagcc ggacagaacc 3780ccctgctgaa gaaaatggtg
gaggaattcg tgccccggtt tgctcctcgc agcaccgtgc 3840tgtacctggg ggacacaagg
ggcaagcact ccctgttcga gagagaaatc tttgaggaag 3900tgctgggcct gaccttcgat
cctcacggac ggatgccaga cctgattctg catgatgagg 3960tgagggggtg gctgttcctg
atggaagccg tgaagtctaa aggccccttt gatgaggaaa 4020ggcatagaag cctgcaggag
ctgtttgtga ctccttccgc cggcctgatc ttcgtgaact 4080gctttgagaa tagggaatct
atgagacagt ggctgcccga gctggcttgg gagaccgaag 4140cctgggtggc tgaagaccct
gatcacctga ttcatctgaa tggaagtcgg tttctggggc 4200catatgagcg ctgaacgcgt
aaatgattgc agatccacta gttctagaat tccagctgag 4260cgccggtcgc taccattacc
agttggtctg gtgtcaaaaa taataataac cgggcagggg 4320ggatctgcat ggatctttgt
gaaggaacct tacttctgtg gtgtgacata attggacaaa 4380ctacctacag agatttaaag
ctctaaggta aatataaaat ttttaagtgt ataatgtgtt 4440aaactactga ttctaattgt
ttgtgtattt tagattccaa cctatggaac tgatgaatgg 4500gagcagtggt ggaatgccag
atccagacat gataagatac attgatgagt ttggacaaac 4560cacaactaga atgcagtgaa
aaaaatgctt tatttgtgaa atttgtgatg ctattgcttt 4620atttgtaacc attataagct
gcaataaaca agttaacaac aacaattgca ttcattttat 4680gtttcaggtt cagggggagg
tgtgggaggt tttttaaagc aagtaaaacc tctacaaatg 4740tggtatggct gattatgatc
tgcggccgcc actggccgtc gttttacaac gtcgtgactg 4800ggaaaaccct ggcgttaccc
aacttaatcg ccttgcagca catccccctt tcgccagctg 4860gcgtaatagc gaagaggccc
gcaccgatcg cccttcccaa cagttgcgca gcctgaatgg 4920cgaatggaac gcgccctgta
gcggcgcatt aagcgcggcg ggtgtggtgg ttacgcgcag 4980cgtgaccgct acacttgcca
gcgccctagc gcccgctcct ttcgctttct tcccttcctt 5040tctcgccacg ttcgccggct
ttccccgtca agctctaaat cgggggctcc ctttagggtt 5100ccgatttagt gctttacggc
acctcgaccc caaaaaactt gattagggtg atggttcacg 5160tagtgggcca tcgccctgat
agacggtttt tcgccctttg acgttggagt ccacgttctt 5220taatagtgga ctcttgttcc
aaactggaac aacactcaac cctatctcgg tctattcttt 5280tgatttataa gggattttgc
cgatttcggc ctattggtta aaaaatgagc tgatttaaca 5340aaaatttaac gcgaatttta
acaaaatatt aacgcttaca atttaggtgg cacttttcgg 5400ggaaatgtgc gcggaacccc
tatttgttta tttttctaaa tacattcaaa tatgtatccg 5460ctcatgagac aataaccctg
ataaatgctt caataatatt gaaaaaggaa gagtatgagt 5520attcaacatt tccgtgtcgc
ccttattccc ttttttgcgg cattttgcct tcctgttttt 5580gctcacccag aaacgctggt
gaaagtaaaa gatgctgaag atcagttggg tgcacgagtg 5640ggttacatcg aactggatct
caacagcggt aagatccttg agagttttcg ccccgaagaa 5700cgttttccaa tgatgagcac
ttttaaagtt ctgctatgtg gcgcggtatt atcccgtatt 5760gacgccgggc aagagcaact
cggtcgccgc atacactatt ctcagaatga cttggttgag 5820tactcaccag tcacagaaaa
gcatcttacg gatggcatga cagtaagaga attatgcagt 5880gctgccataa ccatgagtga
taacactgcg gccaacttac ttctgacaac gatcggagga 5940ccgaaggagc taaccgcttt
tttgcacaac atgggggatc atgtaactcg ccttgatcgt 6000tgggaaccgg agctgaatga
agccatacca aacgacgagc gtgacaccac gatgcctgta 6060gcaatggcaa caacgttgcg
caaactatta actggcgaac tacttactct agcttcccgg 6120caacaattaa tagactggat
ggaggcggat aaagttgcag gaccacttct gcgctcggcc 6180cttccggctg gctggtttat
tgctgataaa tctggagccg gtgagcgtgg gtctcgcggt 6240atcattgcag cactggggcc
agatggtaag ccctcccgta tcgtagttat ctacacgacg 6300gggagtcagg caactatgga
tgaacgaaat agacagatcg ctgagatagg tgcctcactg 6360attaagcatt ggtaactgtc
agaccaagtt tactcatata tactttagat tgatttaaaa 6420cttcattttt aatttaaaag
gatctaggtg aagatccttt ttgataatct catgaccaaa 6480atcccttaac gtgagttttc
gttccactga gcgtcagacc ccgtagaaaa gatcaaagga 6540tcttcttgag atcctttttt
tctgcgcgta atctgctgct tgcaaacaaa aaaaccaccg 6600ctaccagcgg tggtttgttt
gccggatcaa gagctaccaa ctctttttcc gaaggtaact 6660ggcttcagca gagcgcagat
accaaatact gtccttctag tgtagccgta gttaggccac 6720cacttcaaga actctgtagc
accgcctaca tacctcgctc tgctaatcct gttaccagtg 6780gctgctgcca gtggcgataa
gtcgtgtctt accgggttgg actcaagacg atagttaccg 6840gataaggcgc agcggtcggg
ctgaacgggg ggttcgtgca cacagcccag cttggagcga 6900acgacctaca ccgaactgag
atacctacag cgtgagctat gagaaagcgc cacgcttccc 6960gaagggagaa aggcggacag
gtatccggta agcggcaggg tcggaacagg agagcgcacg 7020agggagcttc cagggggaaa
cgcctggtat ctttatagtc ctgtcgggtt tcgccacctc 7080tgacttgagc gtcgattttt
gtgatgctcg tcaggggggc ggagcctatg gaaaaacgcc 7140agcaacgcgg cctttttacg
gttcctggcc ttttgctggc cttttgctca catgttcttt 7200cctgcgttat cccctgattc
tgtggataac cgtattaccg cctttgagtg agctgatacc 7260gctcgccgca gccgaacgac
cgagcgcagc gagtcagtga gcgaggaagc ggaagagcgc 7320ccaatacgca aaccgcctct
ccccgcgcgt tggccgattc attaatgcag ctggcacgac 7380aggtttcccg actggaaagc
gggcagtgag cgcaacgcaa ttaatgtgag ttagctcact 7440cattaggcac cccaggcttt
acactttatg cttccggctc gtatgttgtg tggaattgtg 7500agcggataac aatttcacac
aggaaacagc tatgaccatg aggcgcgccg gattc 7555
SEQ ID NO: 33, 7555 bp, DNA, Artificial sequence (pCAG-ArtTal1-Sda)
gacattgatt attgactagt tattaatagt aatcaattac
ggggtcatta gttcatagcc 60catatatgga gttccgcgtt acataactta cggtaaatgg
cccgcctggc tgaccgccca 120acgacccccg cccattgacg tcaataatga cgtatgttcc
catagtaacg ccaataggga 180ctttccattg acgtcaatgg gtggagtatt tacggtaaac
tgcccacttg gcagtacatc 240aagtgtatca tatgccaagt acgcccccta ttgacgtcaa
tgacggtaaa tggcccgcct 300ggcattatgc ccagtacatg accttatggg actttcctac
ttggcagtac atctacgtat 360tagtcatcgc tattaccatg gtcgaggtga gccccacgtt
ctgcttcact ctccccatct 420cccccccctc cccaccccca attttgtatt tatttatttt
ttaattattt tgtgcagcga 480tgggggcggg gggggggggg gggcgcgcgc caggcggggc
ggggcggggc gaggggcggg 540gcggggcgag gcggagaggt gcggcggcag ccaatcagag
cggcgcgctc cgaaagtttc 600cttttatggc gaggcggcgg cggcggcggc cctataaaaa
gcgaagcgcg cggcgggcgg 660gagtcgctgc gcgctgcctt cgccccgtgc cccgctccgc
cgccgcctcg cgccgcccgc 720cccggctctg actgaccgcg ttactcccac aggtgagcgg
gcgggacggc ccttctcctc 780cgggctgtaa ttagcgcttg gtttaatgac ggcttgtttc
ttttctgtgg ctgcgtgaaa 840gccttgaggg gctccgggag ggccctttgt gcggggggga
gcggctcggg gggtgcgtgc 900gtgtgtgtgt gcgtggggag cgccgcgtgc ggctccgcgc
tgcccggcgg ctgtgagcgc 960tgcgggcgcg gcgcggggct ttgtgcgctc cgcagtgtgc
gcgaggggag cgcggccggg 1020ggcggtgccc cgcggtgcgg ggggggctgc gaggggaaca
aaggctgcgt gcggggtgtg 1080tgcgtggggg ggtgagcagg gggtgtgggc gcgtcggtcg
ggctgcaacc ccccctgcac 1140ccccctcccc gagttgctga gcacggcccg gcttcgggtg
cggggctccg tacggggcgt 1200ggcgcggggc tcgccgtgcc gggcgggggg tggcggcagg
tgggggtgcc gggcggggcg 1260gggccgcctc gggccgggga gggctcgggg gaggggcgcg
gcggcccccg gagcgccggc 1320ggctgtcgag gcgcggcgag ccgcagccat tgccttttat
ggtaatcgtg cgagagggcg 1380cagggacttc ctttgtccca aatctgtgcg gagccgaaat
ctgggaggcg ccgccgcacc 1440ccctctagcg ggcgcggggc gaagcggtgc ggcgccggca
ggaaggaaat gggcggggag 1500ggccttcgtg cgtcgccgcg ccgccgtccc cttctccctc
tccagcctcg gggctgtccg 1560cggggggacg gctgccttcg ggggggacgg ggcagggcgg
ggttcggctt ctggcgtgtg 1620accggcggct ctagagcctc tgctaaccat gttcatgcct
tcttcttttt cctacagatc 1680cttaattaat aatacgactc actatagggg ccgccaccat
gggacctaag aaaaagagga 1740aggtggcggc cgctgactac aaggatgacg acgataaacc
aggtggcgga ggtagtggcg 1800gaggtggggt acccgccagt ccagcagccc aggtggatct
gagaaccctc ggctacagcc 1860agcagcagca ggagaagatc aaaccaaagg tgcggtccac
cgtcgctcag caccatgaag 1920cactggtggg gcacggtttc acacacgccc atattgtggc
tctgtctcag catcccgctg 1980cactcgggac tgtggccgtc aaatatcagg acatgatcgc
cgctctgcct gaggcaaccc 2040acgaagccat tgtgggcgtc ggaaagcagt ggagcggtgc
cagagcactc gaagcactcc 2100tcaccgtcgc cggggaactg cggggtccac cactccagtc
cggactggac actggacagc 2160tgctgaagat cgctaaacgc ggcggagtga cagctgtgga
agctgtgcac gcttggagga 2220atgctctgac aggagcccca ctgaatctta ctccagaaca
ggtcgtcgca atcgcaagta 2280acatcggcgg aaaacaggcc ctcgaaaccg tccagagact
cctccccgtg ctgtgccagg 2340cccacggact gaccccacag caggtggtcg ccatcgctag
caacggcgga gggaagcagg 2400ctctggagac cgtgcagagg ctgctccccg tcctgtgcca
ggcacatggg ctcacacctc 2460agcaggtggt cgcaattgcc tccaatggtg gcggaaaaca
ggccctggaa actgtgcaga 2520gactgctccc cgtgctgtgc caggctcacg gtctcacacc
ccagcaggtg gtcgctatcg 2580catctcatga cgggggcaag caggcactgg agacagtgca
gcggctgctc cctgtcctgt 2640gccaggccca cggactcact cctcagcagg tcgtcgccat
tgctagtaac ggcggaggga 2700aacaggctct ggaaaccgtg cagcgcctgc tccccgtgct
gtgccaagcc cacggcctga 2760ccccccagca ggtggtcgca atcgcctcaa acaatggtgg
caagcaggcc ctggagactg 2820tgcagcgact gctcccagtg ctgtgccagg cccatggact
cacaccacag caggtcgtcg 2880ctattgcaag caacaatgga gggaaacagg cactggaaac
agtccagagg ctgctccccg 2940tgctgtgcca agcgcatgga ctcactcccc agcaggtcgt
cgccatcgct tccaataacg 3000gcggcaagca ggccctggag accgtccaga gactgctccc
cgtgctgtgc caagctcacg 3060gactcacacc tgagcaggtc gtggcaatcg cctctaacat
tggagggaaa caggccctgg 3120aaactgtaca gcggctgctc cccgtgctgt gccaagcaca
cggactcact ccacagcagg 3180tcgtggccat tgcaagtcat gacggaggca agcaggccct
ggaaacagtg cagcgcctgc 3240tccctgtgct gtgccaggct catggtctga ctcctcagca
ggtggtggcc atcgcttcca 3300acaatggagg gaagcaggcc ctggagaccg tacagagact
gctccccgtg ctgtgccaag 3360cgcacggtct gacccctcag caggtcgtcg caatcgccag
caatggcggg ggcaagcagg 3420ctctcgaaac cgtccagcgg ctcctcccag tcctctgtca
ggctcacggc ctgaccccac 3480agcaggtcgt cgctattgct tctaatggcg gagggcggcc
tgctctggag agcattgtgg 3540ctcagctgtc caggcccgat cctgccctgg ctagatccgc
actcactaac gatcatctgg 3600tcgctctcgc ttgcctcggt ggacggcccg ctctggacgc
agtcaaaaag ggtctccccc 3660atgctcccgc actgatcaag agaaccaaca ggagaattcc
tgagggatcc gatcgtttaa 3720acattagcgt ggacctcgcc gatggagatg agttcctgct
gagccccgct ggacagaatc 3780ctctgctgaa aaagatggtg gaagaattta tgccacgatt
cgcacctgga gctaaggtgc 3840tgtacatcgg cgactggcga ggaaagcaca cacggttcga
gaaacgcatt tttgaggaaa 3900ccctggggct cacatttgat ccacacggta gaatgcccga
cctggtgctc catgataagg 3960tccggaaatg gctgttcctc atggaggccg tgaagagcaa
aggccctttt gacgaggaaa 4020ggcatagaac tctgcgggaa ctcttcgcta ccccagtcgc
aggactggtg ttcgtcaact 4080gctttgagaa tcgagaagcc atgaggcagt ggctgcccga
gctcgcttgg gagaccgaag 4140catgggtggc cgacgaccct gaccacctga tccacctcaa
cgggagcaga ttcctgggac 4200cctatgaaag atgaacgcgt aaatgattgc agatccacta
gttctagaat tccagctgag 4260cgccggtcgc taccattacc agttggtctg gtgtcaaaaa
taataataac cgggcagggg 4320ggatctgcat ggatctttgt gaaggaacct tacttctgtg
gtgtgacata attggacaaa 4380ctacctacag agatttaaag ctctaaggta aatataaaat
ttttaagtgt ataatgtgtt 4440aaactactga ttctaattgt ttgtgtattt tagattccaa
cctatggaac tgatgaatgg 4500gagcagtggt ggaatgccag atccagacat gataagatac
attgatgagt ttggacaaac 4560cacaactaga atgcagtgaa aaaaatgctt tatttgtgaa
atttgtgatg ctattgcttt 4620atttgtaacc attataagct gcaataaaca agttaacaac
aacaattgca ttcattttat 4680gtttcaggtt cagggggagg tgtgggaggt tttttaaagc
aagtaaaacc tctacaaatg 4740tggtatggct gattatgatc tgcggccgcc actggccgtc
gttttacaac gtcgtgactg 4800ggaaaaccct ggcgttaccc aacttaatcg ccttgcagca
catccccctt tcgccagctg 4860gcgtaatagc gaagaggccc gcaccgatcg cccttcccaa
cagttgcgca gcctgaatgg 4920cgaatggaac gcgccctgta gcggcgcatt aagcgcggcg
ggtgtggtgg ttacgcgcag 4980cgtgaccgct acacttgcca gcgccctagc gcccgctcct
ttcgctttct tcccttcctt 5040tctcgccacg ttcgccggct ttccccgtca agctctaaat
cgggggctcc ctttagggtt 5100ccgatttagt gctttacggc acctcgaccc caaaaaactt
gattagggtg atggttcacg 5160tagtgggcca tcgccctgat agacggtttt tcgccctttg
acgttggagt ccacgttctt 5220taatagtgga ctcttgttcc aaactggaac aacactcaac
cctatctcgg tctattcttt 5280tgatttataa gggattttgc cgatttcggc ctattggtta
aaaaatgagc tgatttaaca 5340aaaatttaac gcgaatttta acaaaatatt aacgcttaca
atttaggtgg cacttttcgg 5400ggaaatgtgc gcggaacccc tatttgttta tttttctaaa
tacattcaaa tatgtatccg 5460ctcatgagac aataaccctg ataaatgctt caataatatt
gaaaaaggaa gagtatgagt 5520attcaacatt tccgtgtcgc ccttattccc ttttttgcgg
cattttgcct tcctgttttt 5580gctcacccag aaacgctggt gaaagtaaaa gatgctgaag
atcagttggg tgcacgagtg 5640ggttacatcg aactggatct caacagcggt aagatccttg
agagttttcg ccccgaagaa 5700cgttttccaa tgatgagcac ttttaaagtt ctgctatgtg
gcgcggtatt atcccgtatt 5760gacgccgggc aagagcaact cggtcgccgc atacactatt
ctcagaatga cttggttgag 5820tactcaccag tcacagaaaa gcatcttacg gatggcatga
cagtaagaga attatgcagt 5880gctgccataa ccatgagtga taacactgcg gccaacttac
ttctgacaac gatcggagga 5940ccgaaggagc taaccgcttt tttgcacaac atgggggatc
atgtaactcg ccttgatcgt 6000tgggaaccgg agctgaatga agccatacca aacgacgagc
gtgacaccac gatgcctgta 6060gcaatggcaa caacgttgcg caaactatta actggcgaac
tacttactct agcttcccgg 6120caacaattaa tagactggat ggaggcggat aaagttgcag
gaccacttct gcgctcggcc 6180cttccggctg gctggtttat tgctgataaa tctggagccg
gtgagcgtgg gtctcgcggt 6240atcattgcag cactggggcc agatggtaag ccctcccgta
tcgtagttat ctacacgacg 6300gggagtcagg caactatgga tgaacgaaat agacagatcg
ctgagatagg tgcctcactg 6360attaagcatt ggtaactgtc agaccaagtt tactcatata
tactttagat tgatttaaaa 6420cttcattttt aatttaaaag gatctaggtg aagatccttt
ttgataatct catgaccaaa 6480atcccttaac gtgagttttc gttccactga gcgtcagacc
ccgtagaaaa gatcaaagga 6540tcttcttgag atcctttttt tctgcgcgta atctgctgct
tgcaaacaaa aaaaccaccg 6600ctaccagcgg tggtttgttt gccggatcaa gagctaccaa
ctctttttcc gaaggtaact 6660ggcttcagca gagcgcagat accaaatact gtccttctag
tgtagccgta gttaggccac 6720cacttcaaga actctgtagc accgcctaca tacctcgctc
tgctaatcct gttaccagtg 6780gctgctgcca gtggcgataa gtcgtgtctt accgggttgg
actcaagacg atagttaccg 6840gataaggcgc agcggtcggg ctgaacgggg ggttcgtgca
cacagcccag cttggagcga 6900acgacctaca ccgaactgag atacctacag cgtgagctat
gagaaagcgc cacgcttccc 6960gaagggagaa aggcggacag gtatccggta agcggcaggg
tcggaacagg agagcgcacg 7020agggagcttc cagggggaaa cgcctggtat ctttatagtc
ctgtcgggtt tcgccacctc 7080tgacttgagc gtcgattttt gtgatgctcg tcaggggggc
ggagcctatg gaaaaacgcc 7140agcaacgcgg cctttttacg gttcctggcc ttttgctggc
cttttgctca catgttcttt 7200cctgcgttat cccctgattc tgtggataac cgtattaccg
cctttgagtg agctgatacc 7260gctcgccgca gccgaacgac cgagcgcagc gagtcagtga
gcgaggaagc ggaagagcgc 7320ccaatacgca aaccgcctct ccccgcgcgt tggccgattc
attaatgcag ctggcacgac 7380aggtttcccg actggaaagc gggcagtgag cgcaacgcaa
ttaatgtgag ttagctcact 7440cattaggcac cccaggcttt acactttatg cttccggctc
gtatgttgtg tggaattgtg 7500agcggataac aatttcacac aggaaacagc tatgaccatg
aggcgcgccg gattc 7555
SEQ ID NO: 34, 7690 bp, DNA, Artificial sequence (pCAG-ArtTal1-StsI)
gacattgatt attgactagt tattaatagt aatcaattac ggggtcatta gttcatagcc
60catatatgga gttccgcgtt acataactta cggtaaatgg cccgcctggc tgaccgccca
120acgacccccg cccattgacg tcaataatga cgtatgttcc catagtaacg ccaataggga
180ctttccattg acgtcaatgg gtggagtatt tacggtaaac tgcccacttg gcagtacatc
240aagtgtatca tatgccaagt acgcccccta ttgacgtcaa tgacggtaaa tggcccgcct
300ggcattatgc ccagtacatg accttatggg actttcctac ttggcagtac atctacgtat
360tagtcatcgc tattaccatg gtcgaggtga gccccacgtt ctgcttcact ctccccatct
420cccccccctc cccaccccca attttgtatt tatttatttt ttaattattt tgtgcagcga
480tgggggcggg gggggggggg gggcgcgcgc caggcggggc ggggcggggc gaggggcggg
540gcggggcgag gcggagaggt gcggcggcag ccaatcagag cggcgcgctc cgaaagtttc
600cttttatggc gaggcggcgg cggcggcggc cctataaaaa gcgaagcgcg cggcgggcgg
660gagtcgctgc gcgctgcctt cgccccgtgc cccgctccgc cgccgcctcg cgccgcccgc
720cccggctctg actgaccgcg ttactcccac aggtgagcgg gcgggacggc ccttctcctc
780cgggctgtaa ttagcgcttg gtttaatgac ggcttgtttc ttttctgtgg ctgcgtgaaa
840gccttgaggg gctccgggag ggccctttgt gcggggggga gcggctcggg gggtgcgtgc
900gtgtgtgtgt gcgtggggag cgccgcgtgc ggctccgcgc tgcccggcgg ctgtgagcgc
960tgcgggcgcg gcgcggggct ttgtgcgctc cgcagtgtgc gcgaggggag cgcggccggg
1020ggcggtgccc cgcggtgcgg ggggggctgc gaggggaaca aaggctgcgt gcggggtgtg
1080tgcgtggggg ggtgagcagg gggtgtgggc gcgtcggtcg ggctgcaacc ccccctgcac
1140ccccctcccc gagttgctga gcacggcccg gcttcgggtg cggggctccg tacggggcgt
1200ggcgcggggc tcgccgtgcc gggcgggggg tggcggcagg tgggggtgcc gggcggggcg
1260gggccgcctc gggccgggga gggctcgggg gaggggcgcg gcggcccccg gagcgccggc
1320ggctgtcgag gcgcggcgag ccgcagccat tgccttttat ggtaatcgtg cgagagggcg
1380cagggacttc ctttgtccca aatctgtgcg gagccgaaat ctgggaggcg ccgccgcacc
1440ccctctagcg ggcgcggggc gaagcggtgc ggcgccggca ggaaggaaat gggcggggag
1500ggccttcgtg cgtcgccgcg ccgccgtccc cttctccctc tccagcctcg gggctgtccg
1560cggggggacg gctgccttcg ggggggacgg ggcagggcgg ggttcggctt ctggcgtgtg
1620accggcggct ctagagcctc tgctaaccat gttcatgcct tcttcttttt cctacagatc
1680cttaattaat aatacgactc actatagggg ccgccaccat gggacctaag aaaaagagga
1740aggtggcggc cgctgactac aaggatgacg acgataaacc aggtggcgga ggtagtggcg
1800gaggtggggt acccgccagt ccagcagccc aggtggatct gagaaccctc ggctacagcc
1860agcagcagca ggagaagatc aaaccaaagg tgcggtccac cgtcgctcag caccatgaag
1920cactggtggg gcacggtttc acacacgccc atattgtggc tctgtctcag catcccgctg
1980cactcgggac tgtggccgtc aaatatcagg acatgatcgc cgctctgcct gaggcaaccc
2040acgaagccat tgtgggcgtc ggaaagcagt ggagcggtgc cagagcactc gaagcactcc
2100tcaccgtcgc cggggaactg cggggtccac cactccagtc cggactggac actggacagc
2160tgctgaagat cgctaaacgc ggcggagtga cagctgtgga agctgtgcac gcttggagga
2220atgctctgac aggagcccca ctgaatctta ctccagaaca ggtcgtcgca atcgcaagta
2280acatcggcgg aaaacaggcc ctcgaaaccg tccagagact cctccccgtg ctgtgccagg
2340cccacggact gaccccacag caggtggtcg ccatcgctag caacggcgga gggaagcagg
2400ctctggagac cgtgcagagg ctgctccccg tcctgtgcca ggcacatggg ctcacacctc
2460agcaggtggt cgcaattgcc tccaatggtg gcggaaaaca ggccctggaa actgtgcaga
2520gactgctccc cgtgctgtgc caggctcacg gtctcacacc ccagcaggtg gtcgctatcg
2580catctcatga cgggggcaag caggcactgg agacagtgca gcggctgctc cctgtcctgt
2640gccaggccca cggactcact cctcagcagg tcgtcgccat tgctagtaac ggcggaggga
2700aacaggctct ggaaaccgtg cagcgcctgc tccccgtgct gtgccaagcc cacggcctga
2760ccccccagca ggtggtcgca atcgcctcaa acaatggtgg caagcaggcc ctggagactg
2820tgcagcgact gctcccagtg ctgtgccagg cccatggact cacaccacag caggtcgtcg
2880ctattgcaag caacaatgga gggaaacagg cactggaaac agtccagagg ctgctccccg
2940tgctgtgcca agcgcatgga ctcactcccc agcaggtcgt cgccatcgct tccaataacg
3000gcggcaagca ggccctggag accgtccaga gactgctccc cgtgctgtgc caagctcacg
3060gactcacacc tgagcaggtc gtggcaatcg cctctaacat tggagggaaa caggccctgg
3120aaactgtaca gcggctgctc cccgtgctgt gccaagcaca cggactcact ccacagcagg
3180tcgtggccat tgcaagtcat gacggaggca agcaggccct ggaaacagtg cagcgcctgc
3240tccctgtgct gtgccaggct catggtctga ctcctcagca ggtggtggcc atcgcttcca
3300acaatggagg gaagcaggcc ctggagaccg tacagagact gctccccgtg ctgtgccaag
3360cgcacggtct gacccctcag caggtcgtcg caatcgccag caatggcggg ggcaagcagg
3420ctctcgaaac cgtccagcgg ctcctcccag tcctctgtca ggctcacggc ctgaccccac
3480agcaggtcgt cgctattgct tctaatggcg gagggcggcc tgctctggag agcattgtgg
3540ctcagctgtc caggcccgat cctgccctgg ctagatccgc actcactaac gatcatctgg
3600tcgctctcgc ttgcctcggt ggacggcccg ctctggacgc agtcaaaaag ggtctccccc
3660atgctcccgc actgatcaag agaaccaaca ggagaattcc tgagggatcc gatcgtttaa
3720acgatgtggt gctggagaaa agcgacatcg aaaaattcaa gaaccagctg aggaccgagc
3780tgacaaatat tgatcactcc tacctgaagg gaatcgacat tgcctccaag aaaaagacct
3840ctaacgtgga gaatacagag tttgaagcta tctctactaa gattttcacc gatgaactgg
3900gcttcagcgg gaaacatctg ggcggaagca ataagccaga tggcctgctg tgggacgatg
3960actgcgccat cattctggac agtaaggctt acagcgaggg gttccccctg acagcctccc
4020acactgacgc tatgggcagg tatctgagac agtttactga gcggaaagag gaaatcaagc
4080ccacctggtg ggatattgcc cctgaacatc tggacaacac ctacttcgct tatgtgagcg
4140gctccttttc tggaaattat aaagagcagc tgcagaagtt ccgccaggat acaaaccacc
4200tggggggcgc cctggaattt gtgaagctgc tgctgctggc taacaattac aaaactcaga
4260agatgtccaa aaaggaggtg aaaaagtcta tcctggacta taacattagt tacgaggaat
4320atgcccccct gctggctgag atcgaatgaa cgcgtaaatg attgcagatc cactagttct
4380agaattccag ctgagcgccg gtcgctacca ttaccagttg gtctggtgtc aaaaataata
4440ataaccgggc aggggggatc tgcatggatc tttgtgaagg aaccttactt ctgtggtgtg
4500acataattgg acaaactacc tacagagatt taaagctcta aggtaaatat aaaattttta
4560agtgtataat gtgttaaact actgattcta attgtttgtg tattttagat tccaacctat
4620ggaactgatg aatgggagca gtggtggaat gccagatcca gacatgataa gatacattga
4680tgagtttgga caaaccacaa ctagaatgca gtgaaaaaaa tgctttattt gtgaaatttg
4740tgatgctatt gctttatttg taaccattat aagctgcaat aaacaagtta acaacaacaa
4800ttgcattcat tttatgtttc aggttcaggg ggaggtgtgg gaggtttttt aaagcaagta
4860aaacctctac aaatgtggta tggctgatta tgatctgcgg ccgccactgg ccgtcgtttt
4920acaacgtcgt gactgggaaa accctggcgt tacccaactt aatcgccttg cagcacatcc
4980ccctttcgcc agctggcgta atagcgaaga ggcccgcacc gatcgccctt cccaacagtt
5040gcgcagcctg aatggcgaat ggaacgcgcc ctgtagcggc gcattaagcg cggcgggtgt
5100ggtggttacg cgcagcgtga ccgctacact tgccagcgcc ctagcgcccg ctcctttcgc
5160tttcttccct tcctttctcg ccacgttcgc cggctttccc cgtcaagctc taaatcgggg
5220gctcccttta gggttccgat ttagtgcttt acggcacctc gaccccaaaa aacttgatta
5280gggtgatggt tcacgtagtg ggccatcgcc ctgatagacg gtttttcgcc ctttgacgtt
5340ggagtccacg ttctttaata gtggactctt gttccaaact ggaacaacac tcaaccctat
5400ctcggtctat tcttttgatt tataagggat tttgccgatt tcggcctatt ggttaaaaaa
5460tgagctgatt taacaaaaat ttaacgcgaa ttttaacaaa atattaacgc ttacaattta
5520ggtggcactt ttcggggaaa tgtgcgcgga acccctattt gtttattttt ctaaatacat
5580tcaaatatgt atccgctcat gagacaataa ccctgataaa tgcttcaata atattgaaaa
5640aggaagagta tgagtattca acatttccgt gtcgccctta ttcccttttt tgcggcattt
5700tgccttcctg tttttgctca cccagaaacg ctggtgaaag taaaagatgc tgaagatcag
5760ttgggtgcac gagtgggtta catcgaactg gatctcaaca gcggtaagat ccttgagagt
5820tttcgccccg aagaacgttt tccaatgatg agcactttta aagttctgct atgtggcgcg
5880gtattatccc gtattgacgc cgggcaagag caactcggtc gccgcataca ctattctcag
5940aatgacttgg ttgagtactc accagtcaca gaaaagcatc ttacggatgg catgacagta
6000agagaattat gcagtgctgc cataaccatg agtgataaca ctgcggccaa cttacttctg
6060acaacgatcg gaggaccgaa ggagctaacc gcttttttgc acaacatggg ggatcatgta
6120actcgccttg atcgttggga accggagctg aatgaagcca taccaaacga cgagcgtgac
6180accacgatgc ctgtagcaat ggcaacaacg ttgcgcaaac tattaactgg cgaactactt
6240actctagctt cccggcaaca attaatagac tggatggagg cggataaagt tgcaggacca
6300cttctgcgct cggcccttcc ggctggctgg tttattgctg ataaatctgg agccggtgag
6360cgtgggtctc gcggtatcat tgcagcactg gggccagatg gtaagccctc ccgtatcgta
6420gttatctaca cgacggggag tcaggcaact atggatgaac gaaatagaca gatcgctgag
6480ataggtgcct cactgattaa gcattggtaa ctgtcagacc aagtttactc atatatactt
6540tagattgatt taaaacttca tttttaattt aaaaggatct aggtgaagat cctttttgat
6600aatctcatga ccaaaatccc ttaacgtgag ttttcgttcc actgagcgtc agaccccgta
6660gaaaagatca aaggatcttc ttgagatcct ttttttctgc gcgtaatctg ctgcttgcaa
6720acaaaaaaac caccgctacc agcggtggtt tgtttgccgg atcaagagct accaactctt
6780tttccgaagg taactggctt cagcagagcg cagataccaa atactgtcct tctagtgtag
6840ccgtagttag gccaccactt caagaactct gtagcaccgc ctacatacct cgctctgcta
6900atcctgttac cagtggctgc tgccagtggc gataagtcgt gtcttaccgg gttggactca
6960agacgatagt taccggataa ggcgcagcgg tcgggctgaa cggggggttc gtgcacacag
7020cccagcttgg agcgaacgac ctacaccgaa ctgagatacc tacagcgtga gctatgagaa
7080agcgccacgc ttcccgaagg gagaaaggcg gacaggtatc cggtaagcgg cagggtcgga
7140acaggagagc gcacgaggga gcttccaggg ggaaacgcct ggtatcttta tagtcctgtc
7200gggtttcgcc acctctgact tgagcgtcga tttttgtgat gctcgtcagg ggggcggagc
7260ctatggaaaa acgccagcaa cgcggccttt ttacggttcc tggccttttg ctggcctttt
7320gctcacatgt tctttcctgc gttatcccct gattctgtgg ataaccgtat taccgccttt
7380gagtgagctg ataccgctcg ccgcagccga acgaccgagc gcagcgagtc agtgagcgag
7440gaagcggaag agcgcccaat acgcaaaccg cctctccccg cgcgttggcc gattcattaa
7500tgcagctggc acgacaggtt tcccgactgg aaagcgggca gtgagcgcaa cgcaattaat
7560gtgagttagc tcactcatta ggcaccccag gctttacact ttatgcttcc ggctcgtatg
7620ttgtgtggaa ttgtgagcgg ataacaattt cacacaggaa acagctatga ccatgaggcg
7680cgccggattc
7690

<210> 35
<211> 7654
<212> DNA
<213> Artificial sequence
<220>
<223> pCAG-ArtTal1-FokI
<400> 35
gacattgatt attgactagt
tattaatagt aatcaattac ggggtcatta gttcatagcc 60catatatgga gttccgcgtt
acataactta cggtaaatgg cccgcctggc tgaccgccca 120acgacccccg cccattgacg
tcaataatga cgtatgttcc catagtaacg ccaataggga 180ctttccattg acgtcaatgg
gtggagtatt tacggtaaac tgcccacttg gcagtacatc 240aagtgtatca tatgccaagt
acgcccccta ttgacgtcaa tgacggtaaa tggcccgcct 300ggcattatgc ccagtacatg
accttatggg actttcctac ttggcagtac atctacgtat 360tagtcatcgc tattaccatg
gtcgaggtga gccccacgtt ctgcttcact ctccccatct 420cccccccctc cccaccccca
attttgtatt tatttatttt ttaattattt tgtgcagcga 480tgggggcggg gggggggggg
gggcgcgcgc caggcggggc ggggcggggc gaggggcggg 540gcggggcgag gcggagaggt
gcggcggcag ccaatcagag cggcgcgctc cgaaagtttc 600cttttatggc gaggcggcgg
cggcggcggc cctataaaaa gcgaagcgcg cggcgggcgg 660gagtcgctgc gcgctgcctt
cgccccgtgc cccgctccgc cgccgcctcg cgccgcccgc 720cccggctctg actgaccgcg
ttactcccac aggtgagcgg gcgggacggc ccttctcctc 780cgggctgtaa ttagcgcttg
gtttaatgac ggcttgtttc ttttctgtgg ctgcgtgaaa 840gccttgaggg gctccgggag
ggccctttgt gcggggggga gcggctcggg gggtgcgtgc 900gtgtgtgtgt gcgtggggag
cgccgcgtgc ggctccgcgc tgcccggcgg ctgtgagcgc 960tgcgggcgcg gcgcggggct
ttgtgcgctc cgcagtgtgc gcgaggggag cgcggccggg 1020ggcggtgccc cgcggtgcgg
ggggggctgc gaggggaaca aaggctgcgt gcggggtgtg 1080tgcgtggggg ggtgagcagg
gggtgtgggc gcgtcggtcg ggctgcaacc ccccctgcac 1140ccccctcccc gagttgctga
gcacggcccg gcttcgggtg cggggctccg tacggggcgt 1200ggcgcggggc tcgccgtgcc
gggcgggggg tggcggcagg tgggggtgcc gggcggggcg 1260gggccgcctc gggccgggga
gggctcgggg gaggggcgcg gcggcccccg gagcgccggc 1320ggctgtcgag gcgcggcgag
ccgcagccat tgccttttat ggtaatcgtg cgagagggcg 1380cagggacttc ctttgtccca
aatctgtgcg gagccgaaat ctgggaggcg ccgccgcacc 1440ccctctagcg ggcgcggggc
gaagcggtgc ggcgccggca ggaaggaaat gggcggggag 1500ggccttcgtg cgtcgccgcg
ccgccgtccc cttctccctc tccagcctcg gggctgtccg 1560cggggggacg gctgccttcg
ggggggacgg ggcagggcgg ggttcggctt ctggcgtgtg 1620accggcggct ctagagcctc
tgctaaccat gttcatgcct tcttcttttt cctacagatc 1680cttaattaat aatacgactc
actatagggg ccgccaccat gggacctaag aaaaagagga 1740aggtggcggc cgctgactac
aaggatgacg acgataaacc aggtggcgga ggtagtggcg 1800gaggtggggt acccgccagt
ccagcagccc aggtggatct gagaaccctc ggctacagcc 1860agcagcagca ggagaagatc
aaaccaaagg tgcggtccac cgtcgctcag caccatgaag 1920cactggtggg gcacggtttc
acacacgccc atattgtggc tctgtctcag catcccgctg 1980cactcgggac tgtggccgtc
aaatatcagg acatgatcgc cgctctgcct gaggcaaccc 2040acgaagccat tgtgggcgtc
ggaaagcagt ggagcggtgc cagagcactc gaagcactcc 2100tcaccgtcgc cggggaactg
cggggtccac cactccagtc cggactggac actggacagc 2160tgctgaagat cgctaaacgc
ggcggagtga cagctgtgga agctgtgcac gcttggagga 2220atgctctgac aggagcccca
ctgaatctta ctccagaaca ggtcgtcgca atcgcaagta 2280acatcggcgg aaaacaggcc
ctcgaaaccg tccagagact cctccccgtg ctgtgccagg 2340cccacggact gaccccacag
caggtggtcg ccatcgctag caacggcgga gggaagcagg 2400ctctggagac cgtgcagagg
ctgctccccg tcctgtgcca ggcacatggg ctcacacctc 2460agcaggtggt cgcaattgcc
tccaatggtg gcggaaaaca ggccctggaa actgtgcaga 2520gactgctccc cgtgctgtgc
caggctcacg gtctcacacc ccagcaggtg gtcgctatcg 2580catctcatga cgggggcaag
caggcactgg agacagtgca gcggctgctc cctgtcctgt 2640gccaggccca cggactcact
cctcagcagg tcgtcgccat tgctagtaac ggcggaggga 2700aacaggctct ggaaaccgtg
cagcgcctgc tccccgtgct gtgccaagcc cacggcctga 2760ccccccagca ggtggtcgca
atcgcctcaa acaatggtgg caagcaggcc ctggagactg 2820tgcagcgact gctcccagtg
ctgtgccagg cccatggact cacaccacag caggtcgtcg 2880ctattgcaag caacaatgga
gggaaacagg cactggaaac agtccagagg ctgctccccg 2940tgctgtgcca agcgcatgga
ctcactcccc agcaggtcgt cgccatcgct tccaataacg 3000gcggcaagca ggccctggag
accgtccaga gactgctccc cgtgctgtgc caagctcacg 3060gactcacacc tgagcaggtc
gtggcaatcg cctctaacat tggagggaaa caggccctgg 3120aaactgtaca gcggctgctc
cccgtgctgt gccaagcaca cggactcact ccacagcagg 3180tcgtggccat tgcaagtcat
gacggaggca agcaggccct ggaaacagtg cagcgcctgc 3240tccctgtgct gtgccaggct
catggtctga ctcctcagca ggtggtggcc atcgcttcca 3300acaatggagg gaagcaggcc
ctggagaccg tacagagact gctccccgtg ctgtgccaag 3360cgcacggtct gacccctcag
caggtcgtcg caatcgccag caatggcggg ggcaagcagg 3420ctctcgaaac cgtccagcgg
ctcctcccag tcctctgtca ggctcacggc ctgaccccac 3480agcaggtcgt cgctattgct
tctaatggcg gagggcggcc tgctctggag agcattgtgg 3540ctcagctgtc caggcccgat
cctgccctgg ctagatccgc actcactaac gatcatctgg 3600tcgctctcgc ttgcctcggt
ggacggcccg ctctggacgc agtcaaaaag ggtctccccc 3660atgctcccgc actgatcaag
agaaccaaca ggagaattcc tgagggatcc gatcgtttaa 3720accagctcgt gaaaagcgaa
ctcgaagaaa agaaaagtga actgcggcac aaactgaaat 3780acgtcccaca tgaatacatt
gagctgatcg agattgctag gaactccacc caggacagaa 3840tcctcgagat gaaagtgatg
gaattcttta tgaaagtcta cgggtatcgg ggcaagcacc 3900tgggcggatc tcgcaaacca
gatggggcaa tctacactgt gggtagtccc atcgactatg 3960gcgtgattgt cgataccaag
gcctacagtg ggggttataa tctgcccatt ggacaggctg 4020acgagatgca gcgatacgtg
gaggaaaacc agacaagaaa taagcatatc aaccccaatg 4080agtggtggaa agtgtatcct
agctccgtca ctgaattcaa gtttctcttc gtgtcaggcc 4140actttaaggg aaactacaaa
gcacagctga ccaggctcaa tcatattaca aactgcaatg 4200gcgccgtgct gagcgtcgag
gaactgctca tcggcggaga gatgatcaag gccggcacac 4260tcaccctgga ggaggtccgc
cgaaaattca ataacgggga aatcaacttc tgaacgcgta 4320aatgattgca gatccactag
ttctagaatt ccagctgagc gccggtcgct accattacca 4380gttggtctgg tgtcaaaaat
aataataacc gggcaggggg gatctgcatg gatctttgtg 4440aaggaacctt acttctgtgg
tgtgacataa ttggacaaac tacctacaga gatttaaagc 4500tctaaggtaa atataaaatt
tttaagtgta taatgtgtta aactactgat tctaattgtt 4560tgtgtatttt agattccaac
ctatggaact gatgaatggg agcagtggtg gaatgccaga 4620tccagacatg ataagataca
ttgatgagtt tggacaaacc acaactagaa tgcagtgaaa 4680aaaatgcttt atttgtgaaa
tttgtgatgc tattgcttta tttgtaacca ttataagctg 4740caataaacaa gttaacaaca
acaattgcat tcattttatg tttcaggttc agggggaggt 4800gtgggaggtt ttttaaagca
agtaaaacct ctacaaatgt ggtatggctg attatgatct 4860gcggccgcca ctggccgtcg
ttttacaacg tcgtgactgg gaaaaccctg gcgttaccca 4920acttaatcgc cttgcagcac
atcccccttt cgccagctgg cgtaatagcg aagaggcccg 4980caccgatcgc ccttcccaac
agttgcgcag cctgaatggc gaatggaacg cgccctgtag 5040cggcgcatta agcgcggcgg
gtgtggtggt tacgcgcagc gtgaccgcta cacttgccag 5100cgccctagcg cccgctcctt
tcgctttctt cccttccttt ctcgccacgt tcgccggctt 5160tccccgtcaa gctctaaatc
gggggctccc tttagggttc cgatttagtg ctttacggca 5220cctcgacccc aaaaaacttg
attagggtga tggttcacgt agtgggccat cgccctgata 5280gacggttttt cgccctttga
cgttggagtc cacgttcttt aatagtggac tcttgttcca 5340aactggaaca acactcaacc
ctatctcggt ctattctttt gatttataag ggattttgcc 5400gatttcggcc tattggttaa
aaaatgagct gatttaacaa aaatttaacg cgaattttaa 5460caaaatatta acgcttacaa
tttaggtggc acttttcggg gaaatgtgcg cggaacccct 5520atttgtttat ttttctaaat
acattcaaat atgtatccgc tcatgagaca ataaccctga 5580taaatgcttc aataatattg
aaaaaggaag agtatgagta ttcaacattt ccgtgtcgcc 5640cttattccct tttttgcggc
attttgcctt cctgtttttg ctcacccaga aacgctggtg 5700aaagtaaaag atgctgaaga
tcagttgggt gcacgagtgg gttacatcga actggatctc 5760aacagcggta agatccttga
gagttttcgc cccgaagaac gttttccaat gatgagcact 5820tttaaagttc tgctatgtgg
cgcggtatta tcccgtattg acgccgggca agagcaactc 5880ggtcgccgca tacactattc
tcagaatgac ttggttgagt actcaccagt cacagaaaag 5940catcttacgg atggcatgac
agtaagagaa ttatgcagtg ctgccataac catgagtgat 6000aacactgcgg ccaacttact
tctgacaacg atcggaggac cgaaggagct aaccgctttt 6060ttgcacaaca tgggggatca
tgtaactcgc cttgatcgtt gggaaccgga gctgaatgaa 6120gccataccaa acgacgagcg
tgacaccacg atgcctgtag caatggcaac aacgttgcgc 6180aaactattaa ctggcgaact
acttactcta gcttcccggc aacaattaat agactggatg 6240gaggcggata aagttgcagg
accacttctg cgctcggccc ttccggctgg ctggtttatt 6300gctgataaat ctggagccgg
tgagcgtggg tctcgcggta tcattgcagc actggggcca 6360gatggtaagc cctcccgtat
cgtagttatc tacacgacgg ggagtcaggc aactatggat 6420gaacgaaata gacagatcgc
tgagataggt gcctcactga ttaagcattg gtaactgtca 6480gaccaagttt actcatatat
actttagatt gatttaaaac ttcattttta atttaaaagg 6540atctaggtga agatcctttt
tgataatctc atgaccaaaa tcccttaacg tgagttttcg 6600ttccactgag cgtcagaccc
cgtagaaaag atcaaaggat cttcttgaga tccttttttt 6660ctgcgcgtaa tctgctgctt
gcaaacaaaa aaaccaccgc taccagcggt ggtttgtttg 6720ccggatcaag agctaccaac
tctttttccg aaggtaactg gcttcagcag agcgcagata 6780ccaaatactg tccttctagt
gtagccgtag ttaggccacc acttcaagaa ctctgtagca 6840ccgcctacat acctcgctct
gctaatcctg ttaccagtgg ctgctgccag tggcgataag 6900tcgtgtctta ccgggttgga
ctcaagacga tagttaccgg ataaggcgca gcggtcgggc 6960tgaacggggg gttcgtgcac
acagcccagc ttggagcgaa cgacctacac cgaactgaga 7020tacctacagc gtgagctatg
agaaagcgcc acgcttcccg aagggagaaa ggcggacagg 7080tatccggtaa gcggcagggt
cggaacagga gagcgcacga gggagcttcc agggggaaac 7140gcctggtatc tttatagtcc
tgtcgggttt cgccacctct gacttgagcg tcgatttttg 7200tgatgctcgt caggggggcg
gagcctatgg aaaaacgcca gcaacgcggc ctttttacgg 7260ttcctggcct tttgctggcc
ttttgctcac atgttctttc ctgcgttatc ccctgattct 7320gtggataacc gtattaccgc
ctttgagtga gctgataccg ctcgccgcag ccgaacgacc 7380gagcgcagcg agtcagtgag
cgaggaagcg gaagagcgcc caatacgcaa accgcctctc 7440cccgcgcgtt ggccgattca
ttaatgcagc tggcacgaca ggtttcccga ctggaaagcg 7500ggcagtgagc gcaacgcaat
taatgtgagt tagctcactc attaggcacc ccaggcttta 7560cactttatgc ttccggctcg
tatgttgtgt ggaattgtga gcggataaca atttcacaca 7620ggaaacagct atgaccatga
ggcgcgccgg attc 7654

<210> 36
<211> 947
<212> PRT
<213> Artificial sequence
<220>
<223> ArtTal1-Alw
<400> 36
Met Gly Pro Lys Lys Lys Arg Lys Val Ala Ala Ala Asp
Tyr Lys Asp 1 5 10 15
Asp Asp Asp Lys Pro Gly Gly Gly Gly Ser Gly Gly Gly Gly Val Pro
20 25 30 Ala Ser Pro Ala
Ala Gln Val Asp Leu Arg Thr Leu Gly Tyr Ser Gln 35
40 45 Gln Gln Gln Glu Lys Ile Lys Pro Lys
Val Arg Ser Thr Val Ala Gln 50 55
60 His His Glu Ala Leu Val Gly His Gly Phe Thr His Ala
His Ile Val 65 70 75
80 Ala Leu Ser Gln His Pro Ala Ala Leu Gly Thr Val Ala Val Lys Tyr
85 90 95 Gln Asp Met Ile
Ala Ala Leu Pro Glu Ala Thr His Glu Ala Ile Val 100
105 110 Gly Val Gly Lys Gln Trp Ser Gly Ala
Arg Ala Leu Glu Ala Leu Leu 115 120
125 Thr Val Ala Gly Glu Leu Arg Gly Pro Pro Leu Gln Ser Gly
Leu Asp 130 135 140
Thr Gly Gln Leu Leu Lys Ile Ala Lys Arg Gly Gly Val Thr Ala Val 145
150 155 160 Glu Ala Val His Ala
Trp Arg Asn Ala Leu Thr Gly Ala Pro Leu Asn 165
170 175 Leu Thr Pro Glu Gln Val Val Ala Ile Ala
Ser Asn Ile Gly Gly Lys 180 185
190 Gln Ala Leu Glu Thr Val Gln Arg Leu Leu Pro Val Leu Cys Gln
Ala 195 200 205 His
Gly Leu Thr Pro Gln Gln Val Val Ala Ile Ala Ser Asn Gly Gly 210
215 220 Gly Lys Gln Ala Leu Glu
Thr Val Gln Arg Leu Leu Pro Val Leu Cys 225 230
235 240 Gln Ala His Gly Leu Thr Pro Gln Gln Val Val
Ala Ile Ala Ser Asn 245 250
255 Gly Gly Gly Lys Gln Ala Leu Glu Thr Val Gln Arg Leu Leu Pro Val
260 265 270 Leu Cys
Gln Ala His Gly Leu Thr Pro Gln Gln Val Val Ala Ile Ala 275
280 285 Ser His Asp Gly Gly Lys Gln
Ala Leu Glu Thr Val Gln Arg Leu Leu 290 295
300 Pro Val Leu Cys Gln Ala His Gly Leu Thr Pro Gln
Gln Val Val Ala 305 310 315
320 Ile Ala Ser Asn Gly Gly Gly Lys Gln Ala Leu Glu Thr Val Gln Arg
325 330 335 Leu Leu Pro
Val Leu Cys Gln Ala His Gly Leu Thr Pro Gln Gln Val 340
345 350 Val Ala Ile Ala Ser Asn Asn Gly
Gly Lys Gln Ala Leu Glu Thr Val 355 360
365 Gln Arg Leu Leu Pro Val Leu Cys Gln Ala His Gly Leu
Thr Pro Gln 370 375 380
Gln Val Val Ala Ile Ala Ser Asn Asn Gly Gly Lys Gln Ala Leu Glu 385
390 395 400 Thr Val Gln Arg
Leu Leu Pro Val Leu Cys Gln Ala His Gly Leu Thr 405
410 415 Pro Gln Gln Val Val Ala Ile Ala Ser
Asn Asn Gly Gly Lys Gln Ala 420 425
430 Leu Glu Thr Val Gln Arg Leu Leu Pro Val Leu Cys Gln Ala
His Gly 435 440 445
Leu Thr Pro Glu Gln Val Val Ala Ile Ala Ser Asn Ile Gly Gly Lys 450
455 460 Gln Ala Leu Glu Thr
Val Gln Arg Leu Leu Pro Val Leu Cys Gln Ala 465 470
475 480 His Gly Leu Thr Pro Gln Gln Val Val Ala
Ile Ala Ser His Asp Gly 485 490
495 Gly Lys Gln Ala Leu Glu Thr Val Gln Arg Leu Leu Pro Val Leu
Cys 500 505 510 Gln
Ala His Gly Leu Thr Pro Gln Gln Val Val Ala Ile Ala Ser Asn 515
520 525 Asn Gly Gly Lys Gln Ala
Leu Glu Thr Val Gln Arg Leu Leu Pro Val 530 535
540 Leu Cys Gln Ala His Gly Leu Thr Pro Gln Gln
Val Val Ala Ile Ala 545 550 555
560 Ser Asn Gly Gly Gly Lys Gln Ala Leu Glu Thr Val Gln Arg Leu Leu
565 570 575 Pro Val
Leu Cys Gln Ala His Gly Leu Thr Pro Gln Gln Val Val Ala 580
585 590 Ile Ala Ser Asn Gly Gly Gly
Arg Pro Ala Leu Glu Ser Ile Val Ala 595 600
605 Gln Leu Ser Arg Pro Asp Pro Ala Leu Ala Arg Ser
Ala Leu Thr Asn 610 615 620
Asp His Leu Val Ala Leu Ala Cys Leu Gly Gly Arg Pro Ala Leu Asp 625
630 635 640 Ala Val Lys
Lys Gly Leu Pro His Ala Pro Ala Leu Ile Lys Arg Thr 645
650 655 Asn Arg Arg Ile Pro Glu Gly Ser
Asp Arg Leu Asn Lys Glu Thr Asn 660 665
670 Ile Leu Leu Val Glu Gln Leu Glu Glu Thr Leu Asn Arg
Asn Arg Ile 675 680 685
Leu Phe Glu Lys Asn Ser Ser Ile Ala Gln Ala Pro Ile Gly Glu Ile 690
695 700 Lys Asn Tyr Arg
Tyr His Leu Glu Glu Leu Leu Phe Glu Asn Asn Glu 705 710
715 720 Lys Lys Phe Ala Glu Asn Gln Lys Asn
Glu Trp Asp Glu Ile Leu Ala 725 730
735 Tyr Met Asp Leu Leu Ile Ser Pro Lys Pro Ile Ser Ile Glu
Ile Ala 740 745 750
Asp Lys Glu Ile Ser Ile Pro Ser Gly Glu Arg Pro Ala Tyr Phe Glu
755 760 765 Trp Val Leu Trp
Arg Ala Phe Leu Ala Leu Asn His Leu Ile Ile Glu 770
775 780 Pro Gln Gln Cys Arg Arg Phe Lys
Val Asp Gln Asp Phe Lys Pro Ile 785 790
795 800 His Asn Ala Pro Gly Gly Gly Ala Asp Val Ile Phe
Glu Tyr Glu Asn 805 810
815 Phe Lys Ile Leu Gly Glu Val Thr Leu Thr Ser Asn Ser Arg Gln Glu
820 825 830 Ala Ala Glu
Gly Glu Pro Val Arg Arg His Ile Ala Val Glu Thr Val 835
840 845 Asn Thr Pro Asp Lys Asp Val Tyr
Gly Leu Phe Leu Ala Leu Thr Ile 850 855
860 Asp Thr Asn Thr Ala Glu Thr Phe Arg His Gly Ala Trp
Tyr His Gln 865 870 875
880 Glu Glu Leu Met Asp Val Lys Ile Leu Pro Leu Thr Leu Glu Ser Phe
885 890 895 Lys Lys Tyr Leu
Glu Ser Leu Arg Lys Lys Asn Gln Val Glu Thr Gly 900
905 910 Ile Phe Asp Leu Lys Lys Met Met Asp
Glu Ser Leu Lys Leu Arg Glu 915 920
925 Thr Leu Thr Ala Pro Gln Trp Lys Asn Glu Ile Thr Asn Lys
Phe Ala 930 935 940
Arg Pro Ile 945

<210> 37
<211> 869
<212> PRT
<213> Artificial sequence
<220>
<223> ArtTal1-CLEDORF
<400> 37
Met Gly Pro Lys Lys Lys Arg Lys Val Ala Ala Ala Asp Tyr Lys Asp 1
5 10 15 Asp Asp Asp Lys Pro Gly
Gly Gly Gly Ser Gly Gly Gly Gly Val Pro 20
25 30 Ala Ser Pro Ala Ala Gln Val Asp Leu Arg
Thr Leu Gly Tyr Ser Gln 35 40
45 Gln Gln Gln Glu Lys Ile Lys Pro Lys Val Arg Ser Thr Val
Ala Gln 50 55 60
His His Glu Ala Leu Val Gly His Gly Phe Thr His Ala His Ile Val 65
70 75 80 Ala Leu Ser Gln His
Pro Ala Ala Leu Gly Thr Val Ala Val Lys Tyr 85
90 95 Gln Asp Met Ile Ala Ala Leu Pro Glu Ala
Thr His Glu Ala Ile Val 100 105
110 Gly Val Gly Lys Gln Trp Ser Gly Ala Arg Ala Leu Glu Ala Leu
Leu 115 120 125 Thr
Val Ala Gly Glu Leu Arg Gly Pro Pro Leu Gln Ser Gly Leu Asp 130
135 140 Thr Gly Gln Leu Leu Lys
Ile Ala Lys Arg Gly Gly Val Thr Ala Val 145 150
155 160 Glu Ala Val His Ala Trp Arg Asn Ala Leu Thr
Gly Ala Pro Leu Asn 165 170
175 Leu Thr Pro Glu Gln Val Val Ala Ile Ala Ser Asn Ile Gly Gly Lys
180 185 190 Gln Ala
Leu Glu Thr Val Gln Arg Leu Leu Pro Val Leu Cys Gln Ala 195
200 205 His Gly Leu Thr Pro Gln Gln
Val Val Ala Ile Ala Ser Asn Gly Gly 210 215
220 Gly Lys Gln Ala Leu Glu Thr Val Gln Arg Leu Leu
Pro Val Leu Cys 225 230 235
240 Gln Ala His Gly Leu Thr Pro Gln Gln Val Val Ala Ile Ala Ser Asn
245 250 255 Gly Gly Gly
Lys Gln Ala Leu Glu Thr Val Gln Arg Leu Leu Pro Val 260
265 270 Leu Cys Gln Ala His Gly Leu Thr
Pro Gln Gln Val Val Ala Ile Ala 275 280
285 Ser His Asp Gly Gly Lys Gln Ala Leu Glu Thr Val Gln
Arg Leu Leu 290 295 300
Pro Val Leu Cys Gln Ala His Gly Leu Thr Pro Gln Gln Val Val Ala 305
310 315 320 Ile Ala Ser Asn
Gly Gly Gly Lys Gln Ala Leu Glu Thr Val Gln Arg 325
330 335 Leu Leu Pro Val Leu Cys Gln Ala His
Gly Leu Thr Pro Gln Gln Val 340 345
350 Val Ala Ile Ala Ser Asn Asn Gly Gly Lys Gln Ala Leu Glu
Thr Val 355 360 365
Gln Arg Leu Leu Pro Val Leu Cys Gln Ala His Gly Leu Thr Pro Gln 370
375 380 Gln Val Val Ala Ile
Ala Ser Asn Asn Gly Gly Lys Gln Ala Leu Glu 385 390
395 400 Thr Val Gln Arg Leu Leu Pro Val Leu Cys
Gln Ala His Gly Leu Thr 405 410
415 Pro Gln Gln Val Val Ala Ile Ala Ser Asn Asn Gly Gly Lys Gln
Ala 420 425 430 Leu
Glu Thr Val Gln Arg Leu Leu Pro Val Leu Cys Gln Ala His Gly 435
440 445 Leu Thr Pro Glu Gln Val
Val Ala Ile Ala Ser Asn Ile Gly Gly Lys 450 455
460 Gln Ala Leu Glu Thr Val Gln Arg Leu Leu Pro
Val Leu Cys Gln Ala 465 470 475
480 His Gly Leu Thr Pro Gln Gln Val Val Ala Ile Ala Ser His Asp Gly
485 490 495 Gly Lys
Gln Ala Leu Glu Thr Val Gln Arg Leu Leu Pro Val Leu Cys 500
505 510 Gln Ala His Gly Leu Thr Pro
Gln Gln Val Val Ala Ile Ala Ser Asn 515 520
525 Asn Gly Gly Lys Gln Ala Leu Glu Thr Val Gln Arg
Leu Leu Pro Val 530 535 540
Leu Cys Gln Ala His Gly Leu Thr Pro Gln Gln Val Val Ala Ile Ala 545
550 555 560 Ser Asn Gly
Gly Gly Lys Gln Ala Leu Glu Thr Val Gln Arg Leu Leu 565
570 575 Pro Val Leu Cys Gln Ala His Gly
Leu Thr Pro Gln Gln Val Val Ala 580 585
590 Ile Ala Ser Asn Gly Gly Gly Arg Pro Ala Leu Glu Ser
Ile Val Ala 595 600 605
Gln Leu Ser Arg Pro Asp Pro Ala Leu Ala Arg Ser Ala Leu Thr Asn 610
615 620 Asp His Leu Val
Ala Leu Ala Cys Leu Gly Gly Arg Pro Ala Leu Asp 625 630
635 640 Ala Val Lys Lys Gly Leu Pro His Ala
Pro Ala Leu Ile Lys Arg Thr 645 650
655 Asn Arg Arg Ile Pro Glu Gly Ser Asp Arg Leu Asn Lys Leu
Ala Lys 660 665 670
Ser Ser Gln Ser Glu Thr Lys Glu Lys Leu Arg Glu Lys Leu Arg Asn
675 680 685 Leu Pro His Glu
Tyr Leu Ser Leu Val Asp Leu Ala Tyr Asp Ser Lys 690
695 700 Gln Asn Arg Leu Phe Glu Met Lys
Val Ile Glu Leu Leu Thr Glu Glu 705 710
715 720 Cys Gly Phe Gln Gly Leu His Leu Gly Gly Ser Arg
Arg Pro Asp Gly 725 730
735 Val Leu Tyr Thr Ala Gly Leu Thr Asp Asn Tyr Gly Ile Ile Leu Asp
740 745 750 Thr Lys Ala
Tyr Ser Ser Gly Tyr Ser Leu Pro Ile Ala Gln Ala Asp 755
760 765 Glu Met Glu Arg Tyr Val Arg Glu
Asn Gln Thr Arg Asp Glu Leu Val 770 775
780 Asn Pro Asn Gln Trp Trp Glu Asn Phe Glu Asn Gly Leu
Gly Thr Phe 785 790 795
800 Tyr Phe Leu Phe Val Ala Gly His Phe Asn Gly Asn Val Gln Ala Gln
805 810 815 Leu Glu Arg Ile
Ser Arg Asn Thr Gly Val Leu Gly Ala Ala Ala Ser 820
825 830 Ile Ser Gln Leu Leu Leu Leu Ala Asp
Ala Ile Arg Gly Gly Arg Met 835 840
845 Asp Arg Glu Arg Leu Arg His Leu Met Phe Gln Asn Glu Glu
Phe Leu 850 855 860
Leu Glu Gln Glu Leu 865

<210> 38
<211> 867
<212> PRT
<213> Artificial sequence
<220>
<223> ArtTal1-Clo051
<400> 38
Met Gly Pro Lys Lys Lys Arg Lys Val Ala Ala Ala
Asp Tyr Lys Asp 1 5 10
15 Asp Asp Asp Lys Pro Gly Gly Gly Gly Ser Gly Gly Gly Gly Val Pro
20 25 30 Ala Ser Pro
Ala Ala Gln Val Asp Leu Arg Thr Leu Gly Tyr Ser Gln 35
40 45 Gln Gln Gln Glu Lys Ile Lys Pro
Lys Val Arg Ser Thr Val Ala Gln 50 55
60 His His Glu Ala Leu Val Gly His Gly Phe Thr His Ala
His Ile Val 65 70 75
80 Ala Leu Ser Gln His Pro Ala Ala Leu Gly Thr Val Ala Val Lys Tyr
85 90 95 Gln Asp Met Ile
Ala Ala Leu Pro Glu Ala Thr His Glu Ala Ile Val 100
105 110 Gly Val Gly Lys Gln Trp Ser Gly Ala
Arg Ala Leu Glu Ala Leu Leu 115 120
125 Thr Val Ala Gly Glu Leu Arg Gly Pro Pro Leu Gln Ser Gly
Leu Asp 130 135 140
Thr Gly Gln Leu Leu Lys Ile Ala Lys Arg Gly Gly Val Thr Ala Val 145
150 155 160 Glu Ala Val His Ala
Trp Arg Asn Ala Leu Thr Gly Ala Pro Leu Asn 165
170 175 Leu Thr Pro Glu Gln Val Val Ala Ile Ala
Ser Asn Ile Gly Gly Lys 180 185
190 Gln Ala Leu Glu Thr Val Gln Arg Leu Leu Pro Val Leu Cys Gln
Ala 195 200 205 His
Gly Leu Thr Pro Gln Gln Val Val Ala Ile Ala Ser Asn Gly Gly 210
215 220 Gly Lys Gln Ala Leu Glu
Thr Val Gln Arg Leu Leu Pro Val Leu Cys 225 230
235 240 Gln Ala His Gly Leu Thr Pro Gln Gln Val Val
Ala Ile Ala Ser Asn 245 250
255 Gly Gly Gly Lys Gln Ala Leu Glu Thr Val Gln Arg Leu Leu Pro Val
260 265 270 Leu Cys
Gln Ala His Gly Leu Thr Pro Gln Gln Val Val Ala Ile Ala 275
280 285 Ser His Asp Gly Gly Lys Gln
Ala Leu Glu Thr Val Gln Arg Leu Leu 290 295
300 Pro Val Leu Cys Gln Ala His Gly Leu Thr Pro Gln
Gln Val Val Ala 305 310 315
320 Ile Ala Ser Asn Gly Gly Gly Lys Gln Ala Leu Glu Thr Val Gln Arg
325 330 335 Leu Leu Pro
Val Leu Cys Gln Ala His Gly Leu Thr Pro Gln Gln Val 340
345 350 Val Ala Ile Ala Ser Asn Asn Gly
Gly Lys Gln Ala Leu Glu Thr Val 355 360
365 Gln Arg Leu Leu Pro Val Leu Cys Gln Ala His Gly Leu
Thr Pro Gln 370 375 380
Gln Val Val Ala Ile Ala Ser Asn Asn Gly Gly Lys Gln Ala Leu Glu 385
390 395 400 Thr Val Gln Arg
Leu Leu Pro Val Leu Cys Gln Ala His Gly Leu Thr 405
410 415 Pro Gln Gln Val Val Ala Ile Ala Ser
Asn Asn Gly Gly Lys Gln Ala 420 425
430 Leu Glu Thr Val Gln Arg Leu Leu Pro Val Leu Cys Gln Ala
His Gly 435 440 445
Leu Thr Pro Glu Gln Val Val Ala Ile Ala Ser Asn Ile Gly Gly Lys 450
455 460 Gln Ala Leu Glu Thr
Val Gln Arg Leu Leu Pro Val Leu Cys Gln Ala 465 470
475 480 His Gly Leu Thr Pro Gln Gln Val Val Ala
Ile Ala Ser His Asp Gly 485 490
495 Gly Lys Gln Ala Leu Glu Thr Val Gln Arg Leu Leu Pro Val Leu
Cys 500 505 510 Gln
Ala His Gly Leu Thr Pro Gln Gln Val Val Ala Ile Ala Ser Asn 515
520 525 Asn Gly Gly Lys Gln Ala
Leu Glu Thr Val Gln Arg Leu Leu Pro Val 530 535
540 Leu Cys Gln Ala His Gly Leu Thr Pro Gln Gln
Val Val Ala Ile Ala 545 550 555
560 Ser Asn Gly Gly Gly Lys Gln Ala Leu Glu Thr Val Gln Arg Leu Leu
565 570 575 Pro Val
Leu Cys Gln Ala His Gly Leu Thr Pro Gln Gln Val Val Ala 580
585 590 Ile Ala Ser Asn Gly Gly Gly
Arg Pro Ala Leu Glu Ser Ile Val Ala 595 600
605 Gln Leu Ser Arg Pro Asp Pro Ala Leu Ala Arg Ser
Ala Leu Thr Asn 610 615 620
Asp His Leu Val Ala Leu Ala Cys Leu Gly Gly Arg Pro Ala Leu Asp 625
630 635 640 Ala Val Lys
Lys Gly Leu Pro His Ala Pro Ala Leu Ile Lys Arg Thr 645
650 655 Asn Arg Arg Ile Pro Glu Gly Ser
Asp Arg Leu Asn Glu Gly Ile Lys 660 665
670 Ser Asn Ile Ser Leu Leu Lys Asp Glu Leu Arg Gly Gln
Ile Ser His 675 680 685
Ile Ser His Glu Tyr Leu Ser Leu Ile Asp Leu Ala Phe Asp Ser Lys 690
695 700 Gln Asn Arg Leu
Phe Glu Met Lys Val Leu Glu Leu Leu Val Asn Glu 705 710
715 720 Tyr Gly Phe Lys Gly Arg His Leu Gly
Gly Ser Arg Lys Pro Asp Gly 725 730
735 Ile Val Tyr Ser Thr Thr Leu Glu Asp Asn Phe Gly Ile Ile
Val Asp 740 745 750
Thr Lys Ala Tyr Ser Glu Gly Tyr Ser Leu Pro Ile Ser Gln Ala Asp
755 760 765 Glu Met Glu Arg
Tyr Val Arg Glu Asn Ser Asn Arg Asp Glu Glu Val 770
775 780 Asn Pro Asn Lys Trp Trp Glu Asn
Phe Ser Glu Glu Val Lys Lys Tyr 785 790
795 800 Tyr Phe Val Phe Ile Ser Gly Ser Phe Lys Gly Lys
Phe Glu Glu Gln 805 810
815 Leu Arg Arg Leu Ser Met Thr Thr Gly Val Asn Gly Ser Ala Val Asn
820 825 830 Val Val Asn
Leu Leu Leu Gly Ala Glu Lys Ile Arg Ser Gly Glu Met 835
840 845 Thr Ile Glu Glu Leu Glu Arg Ala
Met Phe Asn Asn Ser Glu Phe Ile 850 855
860 Leu Lys Tyr 865

<210> 39
<211> 918
<212> PRT
<213> Artificial sequence
<220>
<223> ArtTal1-Mly
<400> 39
Met Gly Pro Lys Lys Lys Arg Lys Val Ala Ala Ala Asp
Tyr Lys Asp 1 5 10 15
Asp Asp Asp Lys Pro Gly Gly Gly Gly Ser Gly Gly Gly Gly Val Pro
20 25 30 Ala Ser Pro Ala
Ala Gln Val Asp Leu Arg Thr Leu Gly Tyr Ser Gln 35
40 45 Gln Gln Gln Glu Lys Ile Lys Pro Lys
Val Arg Ser Thr Val Ala Gln 50 55
60 His His Glu Ala Leu Val Gly His Gly Phe Thr His Ala
His Ile Val 65 70 75
80 Ala Leu Ser Gln His Pro Ala Ala Leu Gly Thr Val Ala Val Lys Tyr
85 90 95 Gln Asp Met Ile
Ala Ala Leu Pro Glu Ala Thr His Glu Ala Ile Val 100
105 110 Gly Val Gly Lys Gln Trp Ser Gly Ala
Arg Ala Leu Glu Ala Leu Leu 115 120
125 Thr Val Ala Gly Glu Leu Arg Gly Pro Pro Leu Gln Ser Gly
Leu Asp 130 135 140
Thr Gly Gln Leu Leu Lys Ile Ala Lys Arg Gly Gly Val Thr Ala Val 145
150 155 160 Glu Ala Val His Ala
Trp Arg Asn Ala Leu Thr Gly Ala Pro Leu Asn 165
170 175 Leu Thr Pro Glu Gln Val Val Ala Ile Ala
Ser Asn Ile Gly Gly Lys 180 185
190 Gln Ala Leu Glu Thr Val Gln Arg Leu Leu Pro Val Leu Cys Gln
Ala 195 200 205 His
Gly Leu Thr Pro Gln Gln Val Val Ala Ile Ala Ser Asn Gly Gly 210
215 220 Gly Lys Gln Ala Leu Glu
Thr Val Gln Arg Leu Leu Pro Val Leu Cys 225 230
235 240 Gln Ala His Gly Leu Thr Pro Gln Gln Val Val
Ala Ile Ala Ser Asn 245 250
255 Gly Gly Gly Lys Gln Ala Leu Glu Thr Val Gln Arg Leu Leu Pro Val
260 265 270 Leu Cys
Gln Ala His Gly Leu Thr Pro Gln Gln Val Val Ala Ile Ala 275
280 285 Ser His Asp Gly Gly Lys Gln
Ala Leu Glu Thr Val Gln Arg Leu Leu 290 295
300 Pro Val Leu Cys Gln Ala His Gly Leu Thr Pro Gln
Gln Val Val Ala 305 310 315
320 Ile Ala Ser Asn Gly Gly Gly Lys Gln Ala Leu Glu Thr Val Gln Arg
325 330 335 Leu Leu Pro
Val Leu Cys Gln Ala His Gly Leu Thr Pro Gln Gln Val 340
345 350 Val Ala Ile Ala Ser Asn Asn Gly
Gly Lys Gln Ala Leu Glu Thr Val 355 360
365 Gln Arg Leu Leu Pro Val Leu Cys Gln Ala His Gly Leu
Thr Pro Gln 370 375 380
Gln Val Val Ala Ile Ala Ser Asn Asn Gly Gly Lys Gln Ala Leu Glu 385
390 395 400 Thr Val Gln Arg
Leu Leu Pro Val Leu Cys Gln Ala His Gly Leu Thr 405
410 415 Pro Gln Gln Val Val Ala Ile Ala Ser
Asn Asn Gly Gly Lys Gln Ala 420 425
430 Leu Glu Thr Val Gln Arg Leu Leu Pro Val Leu Cys Gln Ala
His Gly 435 440 445
Leu Thr Pro Glu Gln Val Val Ala Ile Ala Ser Asn Ile Gly Gly Lys 450
455 460 Gln Ala Leu Glu Thr
Val Gln Arg Leu Leu Pro Val Leu Cys Gln Ala 465 470
475 480 His Gly Leu Thr Pro Gln Gln Val Val Ala
Ile Ala Ser His Asp Gly 485 490
495 Gly Lys Gln Ala Leu Glu Thr Val Gln Arg Leu Leu Pro Val Leu
Cys 500 505 510 Gln
Ala His Gly Leu Thr Pro Gln Gln Val Val Ala Ile Ala Ser Asn 515
520 525 Asn Gly Gly Lys Gln Ala
Leu Glu Thr Val Gln Arg Leu Leu Pro Val 530 535
540 Leu Cys Gln Ala His Gly Leu Thr Pro Gln Gln
Val Val Ala Ile Ala 545 550 555
560 Ser Asn Gly Gly Gly Lys Gln Ala Leu Glu Thr Val Gln Arg Leu Leu
565 570 575 Pro Val
Leu Cys Gln Ala His Gly Leu Thr Pro Gln Gln Val Val Ala 580
585 590 Ile Ala Ser Asn Gly Gly Gly
Arg Pro Ala Leu Glu Ser Ile Val Ala 595 600
605 Gln Leu Ser Arg Pro Asp Pro Ala Leu Ala Arg Ser
Ala Leu Thr Asn 610 615 620
Asp His Leu Val Ala Leu Ala Cys Leu Gly Gly Arg Pro Ala Leu Asp 625
630 635 640 Ala Val Lys
Lys Gly Leu Pro His Ala Pro Ala Leu Ile Lys Arg Thr 645
650 655 Asn Arg Arg Ile Pro Glu Gly Ser
Asp Arg Leu Asn Ile Asn Ser Lys 660 665
670 Ile Lys Gln Leu Asp Asp Ser Ile Asn Val Glu Ser Leu
Lys Ile Asp 675 680 685
Asp Ala Lys Asp Leu Leu Asn Asp Leu Glu Ile Gln Arg Lys Ala Lys 690
695 700 Thr Ile Glu Asp
Thr Val Asn His Leu Lys Leu Arg Ser Asp Ile Glu 705 710
715 720 Asp Ile Leu Asp Val Phe Ala Lys Ile
Lys Lys Arg Asp Val Pro Asp 725 730
735 Val Pro Leu Phe Leu Glu Trp Asn Ile Trp Arg Ala Phe Ala
Ala Leu 740 745 750
Asn His Thr Gln Ala Ile Glu Gly Asn Phe Ile Val Asp Leu Asp Gly
755 760 765 Met Pro Leu Asn
Thr Ala Pro Gly Lys Lys Pro Asp Ile Glu Ile Asn 770
775 780 Tyr Gly Ser Phe Ser Cys Ile Val
Glu Val Thr Met Ser Ser Gly Glu 785 790
795 800 Thr Gln Phe Asn Met Glu Gly Ser Ser Val Pro Arg
His Tyr Gly Asp 805 810
815 Leu Val Arg Lys Val Asp His Asp Ala Tyr Cys Ile Phe Ile Ala Pro
820 825 830 Lys Val Ala
Pro Gly Thr Lys Ala His Phe Phe Asn Leu Asn Arg Leu 835
840 845 Ser Thr Lys His Tyr Gly Gly Lys
Thr Lys Ile Ile Pro Met Ser Leu 850 855
860 Asp Asp Phe Ile Cys Phe Leu Gln Val Gly Ile Thr His
Asn Phe Gln 865 870 875
880 Asp Ile Asn Lys Leu Lys Asn Trp Leu Asp Asn Leu Ile Asn Phe Asn
885 890 895 Leu Glu Ser Glu
Asp Glu Glu Ile Trp Phe Glu Glu Ile Ile Ser Lys 900
905 910 Ile Ser Thr Trp Ala Ile 915

<210> 40
<211> 881
<212> PRT
<213> Artificial sequence
<220>
<223> ArtTal1-Pept071
<400> 40
Met Gly Pro Lys
Lys Lys Arg Lys Val Ala Ala Ala Asp Tyr Lys Asp 1 5
10 15 Asp Asp Asp Lys Pro Gly Gly Gly Gly
Ser Gly Gly Gly Gly Val Pro 20 25
30 Ala Ser Pro Ala Ala Gln Val Asp Leu Arg Thr Leu Gly Tyr
Ser Gln 35 40 45
Gln Gln Gln Glu Lys Ile Lys Pro Lys Val Arg Ser Thr Val Ala Gln 50
55 60 His His Glu Ala Leu
Val Gly His Gly Phe Thr His Ala His Ile Val 65 70
75 80 Ala Leu Ser Gln His Pro Ala Ala Leu Gly
Thr Val Ala Val Lys Tyr 85 90
95 Gln Asp Met Ile Ala Ala Leu Pro Glu Ala Thr His Glu Ala Ile
Val 100 105 110 Gly
Val Gly Lys Gln Trp Ser Gly Ala Arg Ala Leu Glu Ala Leu Leu 115
120 125 Thr Val Ala Gly Glu Leu
Arg Gly Pro Pro Leu Gln Ser Gly Leu Asp 130 135
140 Thr Gly Gln Leu Leu Lys Ile Ala Lys Arg Gly
Gly Val Thr Ala Val 145 150 155
160 Glu Ala Val His Ala Trp Arg Asn Ala Leu Thr Gly Ala Pro Leu Asn
165 170 175 Leu Thr
Pro Glu Gln Val Val Ala Ile Ala Ser Asn Ile Gly Gly Lys 180
185 190 Gln Ala Leu Glu Thr Val Gln
Arg Leu Leu Pro Val Leu Cys Gln Ala 195 200
205 His Gly Leu Thr Pro Gln Gln Val Val Ala Ile Ala
Ser Asn Gly Gly 210 215 220
Gly Lys Gln Ala Leu Glu Thr Val Gln Arg Leu Leu Pro Val Leu Cys 225
230 235 240 Gln Ala His
Gly Leu Thr Pro Gln Gln Val Val Ala Ile Ala Ser Asn 245
250 255 Gly Gly Gly Lys Gln Ala Leu Glu
Thr Val Gln Arg Leu Leu Pro Val 260 265
270 Leu Cys Gln Ala His Gly Leu Thr Pro Gln Gln Val Val
Ala Ile Ala 275 280 285
Ser His Asp Gly Gly Lys Gln Ala Leu Glu Thr Val Gln Arg Leu Leu 290
295 300 Pro Val Leu Cys
Gln Ala His Gly Leu Thr Pro Gln Gln Val Val Ala 305 310
315 320 Ile Ala Ser Asn Gly Gly Gly Lys Gln
Ala Leu Glu Thr Val Gln Arg 325 330
335 Leu Leu Pro Val Leu Cys Gln Ala His Gly Leu Thr Pro Gln
Gln Val 340 345 350
Val Ala Ile Ala Ser Asn Asn Gly Gly Lys Gln Ala Leu Glu Thr Val
355 360 365 Gln Arg Leu Leu
Pro Val Leu Cys Gln Ala His Gly Leu Thr Pro Gln 370
375 380 Gln Val Val Ala Ile Ala Ser Asn
Asn Gly Gly Lys Gln Ala Leu Glu 385 390
395 400 Thr Val Gln Arg Leu Leu Pro Val Leu Cys Gln Ala
His Gly Leu Thr 405 410
415 Pro Gln Gln Val Val Ala Ile Ala Ser Asn Asn Gly Gly Lys Gln Ala
420 425 430 Leu Glu Thr
Val Gln Arg Leu Leu Pro Val Leu Cys Gln Ala His Gly 435
440 445 Leu Thr Pro Glu Gln Val Val Ala
Ile Ala Ser Asn Ile Gly Gly Lys 450 455
460 Gln Ala Leu Glu Thr Val Gln Arg Leu Leu Pro Val Leu
Cys Gln Ala 465 470 475
480 His Gly Leu Thr Pro Gln Gln Val Val Ala Ile Ala Ser His Asp Gly
485 490 495 Gly Lys Gln Ala
Leu Glu Thr Val Gln Arg Leu Leu Pro Val Leu Cys 500
505 510 Gln Ala His Gly Leu Thr Pro Gln Gln
Val Val Ala Ile Ala Ser Asn 515 520
525 Asn Gly Gly Lys Gln Ala Leu Glu Thr Val Gln Arg Leu Leu
Pro Val 530 535 540
Leu Cys Gln Ala His Gly Leu Thr Pro Gln Gln Val Val Ala Ile Ala 545
550 555 560 Ser Asn Gly Gly Gly
Lys Gln Ala Leu Glu Thr Val Gln Arg Leu Leu 565
570 575 Pro Val Leu Cys Gln Ala His Gly Leu Thr
Pro Gln Gln Val Val Ala 580 585
590 Ile Ala Ser Asn Gly Gly Gly Arg Pro Ala Leu Glu Ser Ile Val
Ala 595 600 605 Gln
Leu Ser Arg Pro Asp Pro Ala Leu Ala Arg Ser Ala Leu Thr Asn 610
615 620 Asp His Leu Val Ala Leu
Ala Cys Leu Gly Gly Arg Pro Ala Leu Asp 625 630
635 640 Ala Val Lys Lys Gly Leu Pro His Ala Pro Ala
Leu Ile Lys Arg Thr 645 650
655 Asn Arg Arg Ile Pro Glu Gly Ser Asp Arg Leu Asn Lys Ile Ser Lys
660 665 670 Thr Asn
Val Leu Glu Leu Lys Asp Lys Val Arg Asp Lys Leu Lys Tyr 675
680 685 Val Asp His Arg Tyr Leu Ala
Leu Ile Asp Leu Ala Tyr Asp Gly Thr 690 695
700 Ala Asn Arg Asp Phe Glu Ile Gln Thr Ile Asp Leu
Leu Ile Asn Glu 705 710 715
720 Leu Lys Phe Lys Gly Val Arg Leu Gly Glu Ser Arg Lys Pro Asp Gly
725 730 735 Ile Ile Ser
Tyr Asn Ile Asn Gly Val Ile Ile Asp Asn Lys Ala Tyr 740
745 750 Ser Thr Gly Tyr Asn Leu Pro Ile
Asn Gln Ala Asp Glu Met Ile Arg 755 760
765 Tyr Ile Glu Glu Asn Gln Thr Arg Asp Glu Lys Ile Asn
Ser Asn Lys 770 775 780
Trp Trp Glu Ser Phe Asp Asp Lys Val Lys Asp Phe Asn Tyr Leu Phe 785
790 795 800 Val Ser Ser Phe
Phe Lys Gly Asn Phe Lys Asn Asn Leu Lys His Ile 805
810 815 Ala Asn Arg Thr Gly Val Ser Gly Gly
Ala Ile Asn Val Glu Asn Leu 820 825
830 Leu Tyr Phe Ala Glu Glu Leu Lys Ala Gly Arg Leu Ser Tyr
Val Asp 835 840 845
Ser Phe Lys Met Tyr Asp Asn Asp Glu Ile Tyr Val Gly Asp Phe Ser 850
855 860 Asp Tyr Ser Tyr Val
Lys Phe Ala Ala Glu Glu Glu Gly Glu Tyr Leu 865 870
875 880 Thr
SEQ ID NO: 41, length 831, PRT, Artificial sequence, ArtTal1-Sbf
Met Gly Pro Lys Lys Lys Arg Lys Val Ala Ala Ala Asp
Tyr Lys Asp 1 5 10 15
Asp Asp Asp Lys Pro Gly Gly Gly Gly Ser Gly Gly Gly Gly Val Pro
20 25 30 Ala Ser Pro Ala
Ala Gln Val Asp Leu Arg Thr Leu Gly Tyr Ser Gln 35
40 45 Gln Gln Gln Glu Lys Ile Lys Pro
Lys Val Arg Ser Thr Val Ala Gln 50 55
60 His His Glu Ala Leu Val Gly His Gly Phe Thr His Ala
His Ile Val 65 70 75
80 Ala Leu Ser Gln His Pro Ala Ala Leu Gly Thr Val Ala Val Lys Tyr
85 90 95 Gln Asp Met Ile
Ala Ala Leu Pro Glu Ala Thr His Glu Ala Ile Val 100
105 110 Gly Val Gly Lys Gln Trp Ser Gly Ala
Arg Ala Leu Glu Ala Leu Leu 115 120
125 Thr Val Ala Gly Glu Leu Arg Gly Pro Pro Leu Gln Ser Gly
Leu Asp 130 135 140
Thr Gly Gln Leu Leu Lys Ile Ala Lys Arg Gly Gly Val Thr Ala Val 145
150 155 160 Glu Ala Val His Ala
Trp Arg Asn Ala Leu Thr Gly Ala Pro Leu Asn 165
170 175 Leu Thr Pro Glu Gln Val Val Ala Ile Ala
Ser Asn Ile Gly Gly Lys 180 185
190 Gln Ala Leu Glu Thr Val Gln Arg Leu Leu Pro Val Leu Cys Gln
Ala 195 200 205 His
Gly Leu Thr Pro Gln Gln Val Val Ala Ile Ala Ser Asn Gly Gly 210
215 220 Gly Lys Gln Ala Leu Glu
Thr Val Gln Arg Leu Leu Pro Val Leu Cys 225 230
235 240 Gln Ala His Gly Leu Thr Pro Gln Gln Val Val
Ala Ile Ala Ser Asn 245 250
255 Gly Gly Gly Lys Gln Ala Leu Glu Thr Val Gln Arg Leu Leu Pro Val
260 265 270 Leu Cys
Gln Ala His Gly Leu Thr Pro Gln Gln Val Val Ala Ile Ala 275
280 285 Ser His Asp Gly Gly Lys Gln
Ala Leu Glu Thr Val Gln Arg Leu Leu 290 295
300 Pro Val Leu Cys Gln Ala His Gly Leu Thr Pro Gln
Gln Val Val Ala 305 310 315
320 Ile Ala Ser Asn Gly Gly Gly Lys Gln Ala Leu Glu Thr Val Gln Arg
325 330 335 Leu Leu Pro
Val Leu Cys Gln Ala His Gly Leu Thr Pro Gln Gln Val 340
345 350 Val Ala Ile Ala Ser Asn Asn Gly
Gly Lys Gln Ala Leu Glu Thr Val 355 360
365 Gln Arg Leu Leu Pro Val Leu Cys Gln Ala His Gly Leu
Thr Pro Gln 370 375 380
Gln Val Val Ala Ile Ala Ser Asn Asn Gly Gly Lys Gln Ala Leu Glu 385
390 395 400 Thr Val Gln Arg
Leu Leu Pro Val Leu Cys Gln Ala His Gly Leu Thr 405
410 415 Pro Gln Gln Val Val Ala Ile Ala Ser
Asn Asn Gly Gly Lys Gln Ala 420 425
430 Leu Glu Thr Val Gln Arg Leu Leu Pro Val Leu Cys Gln Ala
His Gly 435 440 445
Leu Thr Pro Glu Gln Val Val Ala Ile Ala Ser Asn Ile Gly Gly Lys 450
455 460 Gln Ala Leu Glu Thr
Val Gln Arg Leu Leu Pro Val Leu Cys Gln Ala 465 470
475 480 His Gly Leu Thr Pro Gln Gln Val Val Ala
Ile Ala Ser His Asp Gly 485 490
495 Gly Lys Gln Ala Leu Glu Thr Val Gln Arg Leu Leu Pro Val Leu
Cys 500 505 510 Gln
Ala His Gly Leu Thr Pro Gln Gln Val Val Ala Ile Ala Ser Asn 515
520 525 Asn Gly Gly Lys Gln Ala
Leu Glu Thr Val Gln Arg Leu Leu Pro Val 530 535
540 Leu Cys Gln Ala His Gly Leu Thr Pro Gln Gln
Val Val Ala Ile Ala 545 550 555
560 Ser Asn Gly Gly Gly Lys Gln Ala Leu Glu Thr Val Gln Arg Leu Leu
565 570 575 Pro Val
Leu Cys Gln Ala His Gly Leu Thr Pro Gln Gln Val Val Ala 580
585 590 Ile Ala Ser Asn Gly Gly Gly
Arg Pro Ala Leu Glu Ser Ile Val Ala 595 600
605 Gln Leu Ser Arg Pro Asp Pro Ala Leu Ala Arg Ser
Ala Leu Thr Asn 610 615 620
Asp His Leu Val Ala Leu Ala Cys Leu Gly Gly Arg Pro Ala Leu Asp 625
630 635 640 Ala Val Lys
Lys Gly Leu Pro His Ala Pro Ala Leu Ile Lys Arg Thr 645
650 655 Asn Arg Arg Ile Pro Glu Gly Ser
Asp Arg Leu Asn Ile Ser Val Asp 660 665
670 Leu Pro Gly Gly Glu Glu Phe Leu Leu Ser Pro Ala Gly
Gln Asn Pro 675 680 685
Leu Leu Lys Lys Met Val Glu Glu Phe Val Pro Arg Phe Ala Pro Arg 690
695 700 Ser Thr Val Leu
Tyr Leu Gly Asp Thr Arg Gly Lys His Ser Leu Phe 705 710
715 720 Glu Arg Glu Ile Phe Glu Glu Val Leu
Gly Leu Thr Phe Asp Pro His 725 730
735 Gly Arg Met Pro Asp Leu Ile Leu His Asp Glu Val Arg Gly
Trp Leu 740 745 750
Phe Leu Met Glu Ala Val Lys Ser Lys Gly Pro Phe Asp Glu Glu Arg
755 760 765 His Arg Ser Leu
Gln Glu Leu Phe Val Thr Pro Ser Ala Gly Leu Ile 770
775 780 Phe Val Asn Cys Phe Glu Asn Arg
Glu Ser Met Arg Gln Trp Leu Pro 785 790
795 800 Glu Leu Ala Trp Glu Thr Glu Ala Trp Val Ala Glu
Asp Pro Asp His 805 810
815 Leu Ile His Leu Asn Gly Ser Arg Phe Leu Gly Pro Tyr Glu Arg
820 825 830
SEQ ID NO: 42, length 831, PRT, Artificial sequence, ArtTal1-SdaI
Met Gly Pro Lys Lys Lys Arg Lys Val Ala Ala Ala
Asp Tyr Lys Asp 1 5 10
15 Asp Asp Asp Lys Pro Gly Gly Gly Gly Ser Gly Gly Gly Gly Val Pro
20 25 30 Ala Ser Pro
Ala Ala Gln Val Asp Leu Arg Thr Leu Gly Tyr Ser Gln 35
40 45 Gln Gln Gln Glu Lys Ile Lys Pro
Lys Val Arg Ser Thr Val Ala Gln 50 55
60 His His Glu Ala Leu Val Gly His Gly Phe Thr His Ala
His Ile Val 65 70 75
80 Ala Leu Ser Gln His Pro Ala Ala Leu Gly Thr Val Ala Val Lys Tyr
85 90 95 Gln Asp Met Ile
Ala Ala Leu Pro Glu Ala Thr His Glu Ala Ile Val 100
105 110 Gly Val Gly Lys Gln Trp Ser Gly Ala
Arg Ala Leu Glu Ala Leu Leu 115 120
125 Thr Val Ala Gly Glu Leu Arg Gly Pro Pro Leu Gln Ser Gly
Leu Asp 130 135 140
Thr Gly Gln Leu Leu Lys Ile Ala Lys Arg Gly Gly Val Thr Ala Val 145
150 155 160 Glu Ala Val His Ala
Trp Arg Asn Ala Leu Thr Gly Ala Pro Leu Asn 165
170 175 Leu Thr Pro Glu Gln Val Val Ala Ile Ala
Ser Asn Ile Gly Gly Lys 180 185
190 Gln Ala Leu Glu Thr Val Gln Arg Leu Leu Pro Val Leu Cys Gln
Ala 195 200 205 His
Gly Leu Thr Pro Gln Gln Val Val Ala Ile Ala Ser Asn Gly Gly 210
215 220 Gly Lys Gln Ala Leu Glu
Thr Val Gln Arg Leu Leu Pro Val Leu Cys 225 230
235 240 Gln Ala His Gly Leu Thr Pro Gln Gln Val Val
Ala Ile Ala Ser Asn 245 250
255 Gly Gly Gly Lys Gln Ala Leu Glu Thr Val Gln Arg Leu Leu Pro Val
260 265 270 Leu Cys
Gln Ala His Gly Leu Thr Pro Gln Gln Val Val Ala Ile Ala 275
280 285 Ser His Asp Gly Gly Lys Gln
Ala Leu Glu Thr Val Gln Arg Leu Leu 290 295
300 Pro Val Leu Cys Gln Ala His Gly Leu Thr Pro Gln
Gln Val Val Ala 305 310 315
320 Ile Ala Ser Asn Gly Gly Gly Lys Gln Ala Leu Glu Thr Val Gln Arg
325 330 335 Leu Leu Pro
Val Leu Cys Gln Ala His Gly Leu Thr Pro Gln Gln Val 340
345 350 Val Ala Ile Ala Ser Asn Asn Gly
Gly Lys Gln Ala Leu Glu Thr Val 355 360
365 Gln Arg Leu Leu Pro Val Leu Cys Gln Ala His Gly Leu
Thr Pro Gln 370 375 380
Gln Val Val Ala Ile Ala Ser Asn Asn Gly Gly Lys Gln Ala Leu Glu 385
390 395 400 Thr Val Gln Arg
Leu Leu Pro Val Leu Cys Gln Ala His Gly Leu Thr 405
410 415 Pro Gln Gln Val Val Ala Ile Ala Ser
Asn Asn Gly Gly Lys Gln Ala 420 425
430 Leu Glu Thr Val Gln Arg Leu Leu Pro Val Leu Cys Gln Ala
His Gly 435 440 445
Leu Thr Pro Glu Gln Val Val Ala Ile Ala Ser Asn Ile Gly Gly Lys 450
455 460 Gln Ala Leu Glu Thr
Val Gln Arg Leu Leu Pro Val Leu Cys Gln Ala 465 470
475 480 His Gly Leu Thr Pro Gln Gln Val Val Ala
Ile Ala Ser His Asp Gly 485 490
495 Gly Lys Gln Ala Leu Glu Thr Val Gln Arg Leu Leu Pro Val Leu
Cys 500 505 510 Gln
Ala His Gly Leu Thr Pro Gln Gln Val Val Ala Ile Ala Ser Asn 515
520 525 Asn Gly Gly Lys Gln Ala
Leu Glu Thr Val Gln Arg Leu Leu Pro Val 530 535
540 Leu Cys Gln Ala His Gly Leu Thr Pro Gln Gln
Val Val Ala Ile Ala 545 550 555
560 Ser Asn Gly Gly Gly Lys Gln Ala Leu Glu Thr Val Gln Arg Leu Leu
565 570 575 Pro Val
Leu Cys Gln Ala His Gly Leu Thr Pro Gln Gln Val Val Ala 580
585 590 Ile Ala Ser Asn Gly Gly Gly
Arg Pro Ala Leu Glu Ser Ile Val Ala 595 600
605 Gln Leu Ser Arg Pro Asp Pro Ala Leu Ala Arg Ser
Ala Leu Thr Asn 610 615 620
Asp His Leu Val Ala Leu Ala Cys Leu Gly Gly Arg Pro Ala Leu Asp 625
630 635 640 Ala Val Lys
Lys Gly Leu Pro His Ala Pro Ala Leu Ile Lys Arg Thr 645
650 655 Asn Arg Arg Ile Pro Glu Gly Ser
Asp Arg Leu Asn Ile Ser Val Asp 660 665
670 Leu Ala Asp Gly Asp Glu Phe Leu Leu Ser Pro Ala Gly
Gln Asn Pro 675 680 685
Leu Leu Lys Lys Met Val Glu Glu Phe Met Pro Arg Phe Ala Pro Gly 690
695 700 Ala Lys Val Leu
Tyr Ile Gly Asp Trp Arg Gly Lys His Thr Arg Phe 705 710
715 720 Glu Lys Arg Ile Phe Glu Glu Thr Leu
Gly Leu Thr Phe Asp Pro His 725 730
735 Gly Arg Met Pro Asp Leu Val Leu His Asp Lys Val Arg Lys
Trp Leu 740 745 750
Phe Leu Met Glu Ala Val Lys Ser Lys Gly Pro Phe Asp Glu Glu Arg
755 760 765 His Arg Thr Leu
Arg Glu Leu Phe Ala Thr Pro Val Ala Gly Leu Val 770
775 780 Phe Val Asn Cys Phe Glu Asn Arg
Glu Ala Met Arg Gln Trp Leu Pro 785 790
795 800 Glu Leu Ala Trp Glu Thr Glu Ala Trp Val Ala Asp
Asp Pro Asp His 805 810
815 Leu Ile His Leu Asn Gly Ser Arg Phe Leu Gly Pro Tyr Glu Arg
820 825 830
SEQ ID NO: 43, length 876, PRT, Artificial sequence, ArtTal1-StsI
Met Gly Pro Lys Lys Lys Arg Lys Val Ala Ala Ala
Asp Tyr Lys Asp 1 5 10
15 Asp Asp Asp Lys Pro Gly Gly Gly Gly Ser Gly Gly Gly Gly Val Pro
20 25 30 Ala Ser Pro
Ala Ala Gln Val Asp Leu Arg Thr Leu Gly Tyr Ser Gln 35
40 45 Gln Gln Gln Glu Lys Ile Lys Pro
Lys Val Arg Ser Thr Val Ala Gln 50 55
60 His His Glu Ala Leu Val Gly His Gly Phe Thr His Ala
His Ile Val 65 70 75
80 Ala Leu Ser Gln His Pro Ala Ala Leu Gly Thr Val Ala Val Lys Tyr
85 90 95 Gln Asp Met Ile
Ala Ala Leu Pro Glu Ala Thr His Glu Ala Ile Val 100
105 110 Gly Val Gly Lys Gln Trp Ser Gly Ala
Arg Ala Leu Glu Ala Leu Leu 115 120
125 Thr Val Ala Gly Glu Leu Arg Gly Pro Pro Leu Gln Ser Gly
Leu Asp 130 135 140
Thr Gly Gln Leu Leu Lys Ile Ala Lys Arg Gly Gly Val Thr Ala Val 145
150 155 160 Glu Ala Val His Ala
Trp Arg Asn Ala Leu Thr Gly Ala Pro Leu Asn 165
170 175 Leu Thr Pro Glu Gln Val Val Ala Ile Ala
Ser Asn Ile Gly Gly Lys 180 185
190 Gln Ala Leu Glu Thr Val Gln Arg Leu Leu Pro Val Leu Cys Gln
Ala 195 200 205 His
Gly Leu Thr Pro Gln Gln Val Val Ala Ile Ala Ser Asn Gly Gly 210
215 220 Gly Lys Gln Ala Leu Glu
Thr Val Gln Arg Leu Leu Pro Val Leu Cys 225 230
235 240 Gln Ala His Gly Leu Thr Pro Gln Gln Val Val
Ala Ile Ala Ser Asn 245 250
255 Gly Gly Gly Lys Gln Ala Leu Glu Thr Val Gln Arg Leu Leu Pro Val
260 265 270 Leu Cys
Gln Ala His Gly Leu Thr Pro Gln Gln Val Val Ala Ile Ala 275
280 285 Ser His Asp Gly Gly Lys Gln
Ala Leu Glu Thr Val Gln Arg Leu Leu 290 295
300 Pro Val Leu Cys Gln Ala His Gly Leu Thr Pro Gln
Gln Val Val Ala 305 310 315
320 Ile Ala Ser Asn Gly Gly Gly Lys Gln Ala Leu Glu Thr Val Gln Arg
325 330 335 Leu Leu Pro
Val Leu Cys Gln Ala His Gly Leu Thr Pro Gln Gln Val 340
345 350 Val Ala Ile Ala Ser Asn Asn Gly
Gly Lys Gln Ala Leu Glu Thr Val 355 360
365 Gln Arg Leu Leu Pro Val Leu Cys Gln Ala His Gly Leu
Thr Pro Gln 370 375 380
Gln Val Val Ala Ile Ala Ser Asn Asn Gly Gly Lys Gln Ala Leu Glu 385
390 395 400 Thr Val Gln Arg
Leu Leu Pro Val Leu Cys Gln Ala His Gly Leu Thr 405
410 415 Pro Gln Gln Val Val Ala Ile Ala Ser
Asn Asn Gly Gly Lys Gln Ala 420 425
430 Leu Glu Thr Val Gln Arg Leu Leu Pro Val Leu Cys Gln Ala
His Gly 435 440 445
Leu Thr Pro Glu Gln Val Val Ala Ile Ala Ser Asn Ile Gly Gly Lys 450
455 460 Gln Ala Leu Glu Thr
Val Gln Arg Leu Leu Pro Val Leu Cys Gln Ala 465 470
475 480 His Gly Leu Thr Pro Gln Gln Val Val Ala
Ile Ala Ser His Asp Gly 485 490
495 Gly Lys Gln Ala Leu Glu Thr Val Gln Arg Leu Leu Pro Val Leu
Cys 500 505 510 Gln
Ala His Gly Leu Thr Pro Gln Gln Val Val Ala Ile Ala Ser Asn 515
520 525 Asn Gly Gly Lys Gln Ala
Leu Glu Thr Val Gln Arg Leu Leu Pro Val 530 535
540 Leu Cys Gln Ala His Gly Leu Thr Pro Gln Gln
Val Val Ala Ile Ala 545 550 555
560 Ser Asn Gly Gly Gly Lys Gln Ala Leu Glu Thr Val Gln Arg Leu Leu
565 570 575 Pro Val
Leu Cys Gln Ala His Gly Leu Thr Pro Gln Gln Val Val Ala 580
585 590 Ile Ala Ser Asn Gly Gly Gly
Arg Pro Ala Leu Glu Ser Ile Val Ala 595 600
605 Gln Leu Ser Arg Pro Asp Pro Ala Leu Ala Arg Ser
Ala Leu Thr Asn 610 615 620
Asp His Leu Val Ala Leu Ala Cys Leu Gly Gly Arg Pro Ala Leu Asp 625
630 635 640 Ala Val Lys
Lys Gly Leu Pro His Ala Pro Ala Leu Ile Lys Arg Thr 645
650 655 Asn Arg Arg Ile Pro Glu Gly Ser
Asp Arg Leu Asn Asp Val Val Leu 660 665
670 Glu Lys Ser Asp Ile Glu Lys Phe Lys Asn Gln Leu Arg
Thr Glu Leu 675 680 685
Thr Asn Ile Asp His Ser Tyr Leu Lys Gly Ile Asp Ile Ala Ser Lys 690
695 700 Lys Lys Thr Ser
Asn Val Glu Asn Thr Glu Phe Glu Ala Ile Ser Thr 705 710
715 720 Lys Ile Phe Thr Asp Glu Leu Gly Phe
Ser Gly Lys His Leu Gly Gly 725 730
735 Ser Asn Lys Pro Asp Gly Leu Leu Trp Asp Asp Asp Cys Ala
Ile Ile 740 745 750
Leu Asp Ser Lys Ala Tyr Ser Glu Gly Phe Pro Leu Thr Ala Ser His
755 760 765 Thr Asp Ala Met
Gly Arg Tyr Leu Arg Gln Phe Thr Glu Arg Lys Glu 770
775 780 Glu Ile Lys Pro Thr Trp Trp Asp
Ile Ala Pro Glu His Leu Asp Asn 785 790
795 800 Thr Tyr Phe Ala Tyr Val Ser Gly Ser Phe Ser Gly
Asn Tyr Lys Glu 805 810
815 Gln Leu Gln Lys Phe Arg Gln Asp Thr Asn His Leu Gly Gly Ala Leu
820 825 830 Glu Phe Val
Lys Leu Leu Leu Leu Ala Asn Asn Tyr Lys Thr Gln Lys 835
840 845 Met Ser Lys Lys Glu Val Lys Lys
Ser Ile Leu Asp Tyr Asn Ile Ser 850 855
860 Tyr Glu Glu Tyr Ala Pro Leu Leu Ala Glu Ile Glu 865
870 875
SEQ ID NO: 44, length 864, PRT, Artificial sequence, ArtTal1-FokI
Met Gly Pro Lys Lys Lys Arg Lys Val Ala Ala Ala
Asp Tyr Lys Asp 1 5 10
15 Asp Asp Asp Lys Pro Gly Gly Gly Gly Ser Gly Gly Gly Gly Val Pro
20 25 30 Ala Ser Pro
Ala Ala Gln Val Asp Leu Arg Thr Leu Gly Tyr Ser Gln 35
40 45 Gln Gln Gln Glu Lys Ile Lys Pro
Lys Val Arg Ser Thr Val Ala Gln 50 55
60 His His Glu Ala Leu Val Gly His Gly Phe Thr His Ala
His Ile Val 65 70 75
80 Ala Leu Ser Gln His Pro Ala Ala Leu Gly Thr Val Ala Val Lys Tyr
85 90 95 Gln Asp Met Ile
Ala Ala Leu Pro Glu Ala Thr His Glu Ala Ile Val 100
105 110 Gly Val Gly Lys Gln Trp Ser Gly Ala
Arg Ala Leu Glu Ala Leu Leu 115 120
125 Thr Val Ala Gly Glu Leu Arg Gly Pro Pro Leu Gln Ser Gly
Leu Asp 130 135 140
Thr Gly Gln Leu Leu Lys Ile Ala Lys Arg Gly Gly Val Thr Ala Val 145
150 155 160 Glu Ala Val His Ala
Trp Arg Asn Ala Leu Thr Gly Ala Pro Leu Asn 165
170 175 Leu Thr Pro Glu Gln Val Val Ala Ile Ala
Ser Asn Ile Gly Gly Lys 180 185
190 Gln Ala Leu Glu Thr Val Gln Arg Leu Leu Pro Val Leu Cys Gln
Ala 195 200 205 His
Gly Leu Thr Pro Gln Gln Val Val Ala Ile Ala Ser Asn Gly Gly 210
215 220 Gly Lys Gln Ala Leu Glu
Thr Val Gln Arg Leu Leu Pro Val Leu Cys 225 230
235 240 Gln Ala His Gly Leu Thr Pro Gln Gln Val Val
Ala Ile Ala Ser Asn 245 250
255 Gly Gly Gly Lys Gln Ala Leu Glu Thr Val Gln Arg Leu Leu Pro Val
260 265 270 Leu Cys
Gln Ala His Gly Leu Thr Pro Gln Gln Val Val Ala Ile Ala 275
280 285 Ser His Asp Gly Gly Lys Gln
Ala Leu Glu Thr Val Gln Arg Leu Leu 290 295
300 Pro Val Leu Cys Gln Ala His Gly Leu Thr Pro Gln
Gln Val Val Ala 305 310 315
320 Ile Ala Ser Asn Gly Gly Gly Lys Gln Ala Leu Glu Thr Val Gln Arg
325 330 335 Leu Leu Pro
Val Leu Cys Gln Ala His Gly Leu Thr Pro Gln Gln Val 340
345 350 Val Ala Ile Ala Ser Asn Asn Gly
Gly Lys Gln Ala Leu Glu Thr Val 355 360
365 Gln Arg Leu Leu Pro Val Leu Cys Gln Ala His Gly Leu
Thr Pro Gln 370 375 380
Gln Val Val Ala Ile Ala Ser Asn Asn Gly Gly Lys Gln Ala Leu Glu 385
390 395 400 Thr Val Gln Arg
Leu Leu Pro Val Leu Cys Gln Ala His Gly Leu Thr 405
410 415 Pro Gln Gln Val Val Ala Ile Ala Ser
Asn Asn Gly Gly Lys Gln Ala 420 425
430 Leu Glu Thr Val Gln Arg Leu Leu Pro Val Leu Cys Gln Ala
His Gly 435 440 445
Leu Thr Pro Glu Gln Val Val Ala Ile Ala Ser Asn Ile Gly Gly Lys 450
455 460 Gln Ala Leu Glu Thr
Val Gln Arg Leu Leu Pro Val Leu Cys Gln Ala 465 470
475 480 His Gly Leu Thr Pro Gln Gln Val Val Ala
Ile Ala Ser His Asp Gly 485 490
495 Gly Lys Gln Ala Leu Glu Thr Val Gln Arg Leu Leu Pro Val Leu
Cys 500 505 510 Gln
Ala His Gly Leu Thr Pro Gln Gln Val Val Ala Ile Ala Ser Asn 515
520 525 Asn Gly Gly Lys Gln Ala
Leu Glu Thr Val Gln Arg Leu Leu Pro Val 530 535
540 Leu Cys Gln Ala His Gly Leu Thr Pro Gln Gln
Val Val Ala Ile Ala 545 550 555
560 Ser Asn Gly Gly Gly Lys Gln Ala Leu Glu Thr Val Gln Arg Leu Leu
565 570 575 Pro Val
Leu Cys Gln Ala His Gly Leu Thr Pro Gln Gln Val Val Ala 580
585 590 Ile Ala Ser Asn Gly Gly Gly
Arg Pro Ala Leu Glu Ser Ile Val Ala 595 600
605 Gln Leu Ser Arg Pro Asp Pro Ala Leu Ala Arg Ser
Ala Leu Thr Asn 610 615 620
Asp His Leu Val Ala Leu Ala Cys Leu Gly Gly Arg Pro Ala Leu Asp 625
630 635 640 Ala Val Lys
Lys Gly Leu Pro His Ala Pro Ala Leu Ile Lys Arg Thr 645
650 655 Asn Arg Arg Ile Pro Glu Gly Ser
Asp Arg Leu Asn Gln Leu Val Lys 660 665
670 Ser Glu Leu Glu Glu Lys Lys Ser Glu Leu Arg His Lys
Leu Lys Tyr 675 680 685
Val Pro His Glu Tyr Ile Glu Leu Ile Glu Ile Ala Arg Asn Ser Thr 690
695 700 Gln Asp Arg Ile
Leu Glu Met Lys Val Met Glu Phe Phe Met Lys Val 705 710
715 720 Tyr Gly Tyr Arg Gly Lys His Leu Gly
Gly Ser Arg Lys Pro Asp Gly 725 730
735 Ala Ile Tyr Thr Val Gly Ser Pro Ile Asp Tyr Gly Val Ile
Val Asp 740 745 750
Thr Lys Ala Tyr Ser Gly Gly Tyr Asn Leu Pro Ile Gly Gln Ala Asp
755 760 765 Glu Met Gln Arg
Tyr Val Glu Glu Asn Gln Thr Arg Asn Lys His Ile 770
775 780 Asn Pro Asn Glu Trp Trp Lys Val
Tyr Pro Ser Ser Val Thr Glu Phe 785 790
795 800 Lys Phe Leu Phe Val Ser Gly His Phe Lys Gly Asn
Tyr Lys Ala Gln 805 810
815 Leu Thr Arg Leu Asn His Ile Thr Asn Cys Asn Gly Ala Val Leu Ser
820 825 830 Val Glu Glu
Leu Leu Ile Gly Gly Glu Met Ile Lys Ala Gly Thr Leu 835
840 845 Thr Leu Glu Glu Val Arg Arg Lys
Phe Asn Asn Gly Glu Ile Asn Phe 850 855
860
SEQ ID NO: 45, length 7374, DNA, Artificial sequence, ArtTal1-Reporter
cgttacataa cttacggtaa atggcccgcc tggctgaccg cccaacgacc cccgcccatt
60gacgtcaata atgacgtatg ttcccatagt aacgccaata gggactttcc attgacgtca
120atgggtggag tatttacggt aaactgccca cttggcagta catcaagtgt atcatatgcc
180aagtacgccc cctattgacg tcaatgacgg taaatggccc gcctggcatt atgcccagta
240catgacctta tgggactttc ctacttggca gtacatctac gtattagtca tcgctattac
300catggtgatg cggttttggc agtacatcaa tgggcgtgga tagcggtttg actcacgggg
360atttccaagt ctccacccca ttgacgtcaa tgggagtttg ttttggcacc aaaatcaacg
420ggactttcca aaatgtcgta acaactccgc cccattgacg caaatgggcg gtaggcgtgt
480acggtgggag gtctatataa gcagagctcg tttagtgaac cgtcagatcg cctggagacg
540ccatccacgc tgttttgacc tccatagaag acaccgggac cgatccagcc tccggactct
600agaggatccg gtactcgacg acactgcaga gacctacttc actaacaacc ggtatggtcg
660cgagtagctt ggcactggcc gtcgttttac aacgtcgtga ctgggaaaac cctggcgtta
720cccaacttaa tcgccttgca gcacatcccc ctttcgccag ctggcgtaat agcgaagagg
780cccgcaccga tcgcccttcc caacagttgc gcagcctgaa tggcgaatgg cgctttgcct
840ggtttccggc accagaagcg gtgccggaaa gctggctgga gtgcgatctt cctgaggccg
900atactgtcgt cgtcccctca aactggcaga tgcacggtta cgatgcgccc atctacacca
960acgtgaccta tcccattacg gtcaatccgc cgtttgttcc cacggagaat ccgacgggtt
1020gttactcgct cacatttaat gttgatgaaa gctggctata aaaccggtac agttcggcca
1080ccatggtcgt attctgggac gttttcacac tcttctaacg tcccagaata ctcgagtagc
1140ttggcactgg ccgtcgtttt acaacgtcgt gactgggaaa accctggcgt tacccaactt
1200aatcgccttg cagcacatcc ccctttcgcc agctggcgta atagcgaaga ggcccgcacc
1260gatcgccctt cccaacagtt gcgcagcctg aatggcgaat ggcgctttgc ctggtttccg
1320gcaccagaag cggtgccgga aagctggctg gagtgcgatc ttcctgaggc cgatactgtc
1380gtcgtcccct caaactggca gatgcacggt tacgatgcgc ccatctacac caacgtgacc
1440tatcccatta cggtcaatcc gccgtttgtt cccacggaga atccgacggg ttgttactcg
1500ctcacattta atgttgatga aagctggcta caggaaggcc agacgcgaat tatttttgat
1560ggcgttaact cggcgtttca tctgtggtgc aacgggcgct gggtcggtta cggccaggac
1620agtcgtttgc cgtctgaatt tgacctgagc gcatttttac gcgccggaga aaaccgcctc
1680gcggtgatgg tgctgcgctg gagtgacggc agttatctgg aagatcagga tatgtggcgg
1740atgagcggca ttttccgtga cgtctcgttg ctgcataaac cgactacaca aatcagcgat
1800ttccatgttg ccactcgctt taatgatgat ttcagccgcg ctgtactgga ggctgaagtt
1860cagatgtgcg gcgagttgcg tgactaccta cgggtaacag tttctttatg gcagggtgaa
1920acgcaggtcg ccagcggcac cgcgcctttc ggcggtgaaa ttatcgatga gcgtggtggt
1980tatgccgatc gcgtcacact acgtctgaac gtcgaaaacc cgaaactgtg gagcgccgaa
2040atcccgaatc tctatcgtgc ggtggttgaa ctgcacaccg ccgacggcac gctgattgaa
2100gcagaagcct gcgatgtcgg tttccgcgag gtgcggattg aaaatggtct gctgctgctg
2160aacggcaagc cgttgctgat tcgaggcgtt aaccgtcacg agcatcatcc tctgcatggt
2220caggtcatgg atgagcagac gatggtgcag gatatcctgc tgatgaagca gaacaacttt
2280aacgccgtgc gctgttcgca ttatccgaac catccgctgt ggtacacgct gtgcgaccgc
2340tacggcctgt atgtggtgga tgaagccaat attgaaaccc acggcatggt gccaatgaat
2400cgtctgaccg atgatccgcg ctggctaccg gcgatgagcg aacgcgtaac gcgaatggtg
2460cagcgcgatc gtaatcaccc gagtgtgatc atctggtcgc tggggaatga atcaggccac
2520ggcgctaatc acgacgcgct gtatcgctgg atcaaatctg tcgatccttc ccgcccggtg
2580cagtatgaag gcggcggagc cgacaccacg gccaccgata ttatttgccc gatgtacgcg
2640cgcgtggatg aagaccagcc cttcccggct gtgccgaaat ggtccatcaa aaaatggctt
2700tcgctacctg gagagacgcg cccgctgatc ctttgcgaat acgcccacgc gatgggtaac
2760agtcttggcg gtttcgctaa atactggcag gcgtttcgtc agtatccccg tttacagggc
2820ggcttcgtct gggactgggt ggatcagtcg ctgattaaat atgatgaaaa cggcaacccg
2880tggtcggctt acggcggtga ttttggcgat acgccgaacg atcgccagtt ctgtatgaac
2940ggtctggtct ttgccgaccg cacgccgcat ccagcgctga cggaagcaaa acaccagcag
3000cagtttttcc agttccgttt atccgggcaa accatcgaag tgaccagcga atacctgttc
3060cgtcatagcg ataacgagct cctgcactgg atggtggcgc tggatggtaa gccgctggca
3120agcggtgaag tgcctctgga tgtcgctcca caaggtaaac agttgattga actgcctgaa
3180ctaccgcagc cggagagcgc cgggcaactc tggctcacag tacgcgtagt gcaaccgaac
3240gcgaccgcat ggtcagaagc cgggcacatc agcgcctggc agcagtggcg tctggcggaa
3300aacctcagtg tgacgctccc cgccgcgtcc cacgccatcc cgcatctgac caccagcgaa
3360atggattttt gcatcgagct gggtaataag cgttggcaat ttaaccgcca gtcaggcttt
3420ctttcacaga tgtggattgg cgataaaaaa caactgctga cgccgctgcg cgatcagttc
3480acccgtgcac cgctggataa cgacattggc gtaagtgaag cgacccgcat tgaccctaac
3540gcctgggtcg aacgctggaa ggcggcgggc cattaccagg ccgaagcagc gttgttgcag
3600tgcacggcag atacacttgc tgatgcggtg ctgattacga ccgctcacgc gtggcagcat
3660caggggaaaa ccttatttat cagccggaaa acctaccgga ttgatggtag tggtcaaatg
3720gcgattaccg ttgatgttga agtggcgagc gatacaccgc atccggcgcg gattggcctg
3780aactgccagc tggcgcaggt agcagagcgg gtaaactggc tcggattagg gccgcaagaa
3840aactatcccg accgccttac tgccgcctgt tttgaccgct gggatctgcc attgtcagac
3900atgtataccc cgtacgtctt cccgagcgaa aacggtctgc gctgcgggac gcgcgaattg
3960aattatggcc cacaccagtg gcgcggcgac ttccagttca acatcagccg ctacagtcaa
4020cagcaactga tggaaaccag ccatcgccat ctgctgcacg cggaagaagg cacatggctg
4080aatatcgacg gtttccatat ggggattggt ggcgacgact cctggagccc gtcagtatcg
4140gcggaattac agctgagcgc cggtcgctac cattaccagt tggtctggtg tcaaaaataa
4200taataaccgg gcaggccatg tctgcccgta tttcgcgtaa ggaaatccat tatgtactat
4260ttaaaaaaca caaacttttg gatgttcggt ttattctttt tcttttactt ttttatcatg
4320ggagcctact tcccgttttt cccgatttgg ctacatgaca tcaaccatat cagcaaaagt
4380gatacgggta ttatttttgc cgctatttct ctgttctcgc tattattcca accgctgttt
4440ggtctgcttt ctgacaaact cggcctcgac tctaggcggc cgcggggatc cagacatgat
4500aagatacatt gatgagtttg gacaaaccac aactagaatg cagtgaaaaa aatgctttat
4560ttgtgaaatt tgtgatgcta ttgctttatt tgtaaccatt ataagctgca ataaacaagt
4620taacaacaac aattgcattc attttatgtt tcaggttcag ggggaggtgt gggaggtttt
4680ttcggatcct ctagagtcga cctgcaggca tgcaagcttg gcgtaatcat ggtcatagct
4740gtttcctgtg tgaaattgtt atccgctcac aattccacac aacatacgag ccggaagcat
4800aaagtgtaaa gcctggggtg cctaatgagt gagctaactc acattaattg cgttgcgctc
4860actgcccgct ttccagtcgg gaaacctgtc gtgccagctg cattaatgaa tcggccaacg
4920cgcggggaga ggcggtttgc gtattgggcg ctcttccgct tcctcgctca ctgactcgct
4980gcgctcggtc gttcggctgc ggcgagcggt atcagctcac tcaaaggcgg taatacggtt
5040atccacagaa tcaggggata acgcaggaaa gaacatgtga gcaaaaggcc agcaaaaggc
5100caggaaccgt aaaaaggccg cgttgctggc gtttttccat aggctccgcc cccctgacga
5160gcatcacaaa aatcgacgct caagtcagag gtggcgaaac ccgacaggac tataaagata
5220ccaggcgttt ccccctggaa gctccctcgt gcgctctcct gttccgaccc tgccgcttac
5280cggatacctg tccgcctttc tcccttcggg aagcgtggcg ctttctcata gctcacgctg
5340taggtatctc agttcggtgt aggtcgttcg ctccaagctg ggctgtgtgc acgaaccccc
5400cgttcagccc gaccgctgcg ccttatccgg taactatcgt cttgagtcca acccggtaag
5460acacgactta tcgccactgg cagcagccac tggtaacagg attagcagag cgaggtatgt
5520aggcggtgct acagagttct tgaagtggtg gcctaactac ggctacacta gaaggacagt
5580atttggtatc tgcgctctgc tgaagccagt taccttcgga aaaagagttg gtagctcttg
5640atccggcaaa caaaccaccg ctggtagcgg tggttttttt gtttgcaagc agcagattac
5700gcgcagaaaa aaaggatctc aagaagatcc tttgatcttt tctacggggt ctgacgctca
5760gtggaacgaa aactcacgtt aagggatttt ggtcatgaga ttatcaaaaa ggatcttcac
5820ctagatcctt ttaaattaaa aatgaagttt taaatcaatc taaagtatat atgagtaaac
5880ttggtctgac agttaccaat gcttaatcag tgaggcacct atctcagcga tctgtctatt
5940tcgttcatcc atagttgcct gactccccgt cgtgtagata actacgatac gggagggctt
6000accatctggc cccagtgctg caatgatacc gcgagaccca cgctcaccgg ctccagattt
6060atcagcaata aaccagccag ccggaagggc cgagcgcaga agtggtcctg caactttatc
6120cgcctccatc cagtctatta attgttgccg ggaagctaga gtaagtagtt cgccagttaa
6180tagtttgcgc aacgttgttg ccattgctac aggcatcgtg gtgtcacgct cgtcgtttgg
6240tatggcttca ttcagctccg gttcccaacg atcaaggcga gttacatgat cccccatgtt
6300gtgcaaaaaa gcggttagct ccttcggtcc tccgatcgtt gtcagaagta agttggccgc
6360agtgttatca ctcatggtta tggcagcact gcataattct cttactgtca tgccatccgt
6420aagatgcttt tctgtgactg gtgagtactc aaccaagtca ttctgagaat agtgtatgcg
6480gcgaccgagt tgctcttgcc cggcgtcaat acgggataat accgcgccac atagcagaac
6540tttaaaagtg ctcatcattg gaaaacgttc ttcggggcga aaactctcaa ggatcttacc
6600gctgttgaga tccagttcga tgtaacccac tcgtgcaccc aactgatctt cagcatcttt
6660tactttcacc agcgtttctg ggtgagcaaa aacaggaagg caaaatgccg caaaaaaggg
6720aataagggcg acacggaaat gttgaatact catactcttc ctttttcaat attattgaag
6780catttatcag ggttattgtc tcatgagcgg atacatattt gaatgtattt agaaaaataa
6840acaaataggg gttccgcgca catttccccg aaaagtgcca cctgacgtct aagaaaccat
6900tattatcatg acattaacct ataaaaatag gcgtatcacg aggccctttc gtctcgcgcg
6960tttcggtgat gacggtgaaa acctctgaca catgcagctc ccggagacgg tcacagcttg
7020tctgtaagcg gatgccggga gcagacaagc ccgtcagggc gcgtcagcgg gtgttggcgg
7080gtgtcggggc tggcttaact atgcggcatc agagcagatt gtactgagag tgcaccatat
7140gcggtgtgaa ataccgcaca gatgcgtaag gagaaaatac cgcatcaggc gccattcgcc
7200attcaggctg cgcaactgtt gggaagggcg atcggtgcgg gcctcttcgc tattacgcca
7260gctggcgaaa gggggatgtg ctgcaaggcg attaagttgg gtaacgccag ggttttccca
7320gtcacgacgt tgtaaaacga cggccagtga attcgagctt gcatgcctgc aggt
7374
SEQ ID NO: 46, length 7374, DNA, Artificial sequence, TalRab1-Reporter
cgttacataa cttacggtaa
atggcccgcc tggctgaccg cccaacgacc cccgcccatt 60gacgtcaata atgacgtatg
ttcccatagt aacgccaata gggactttcc attgacgtca 120atgggtggag tatttacggt
aaactgccca cttggcagta catcaagtgt atcatatgcc 180aagtacgccc cctattgacg
tcaatgacgg taaatggccc gcctggcatt atgcccagta 240catgacctta tgggactttc
ctacttggca gtacatctac gtattagtca tcgctattac 300catggtgatg cggttttggc
agtacatcaa tgggcgtgga tagcggtttg actcacgggg 360atttccaagt ctccacccca
ttgacgtcaa tgggagtttg ttttggcacc aaaatcaacg 420ggactttcca aaatgtcgta
acaactccgc cccattgacg caaatgggcg gtaggcgtgt 480acggtgggag gtctatataa
gcagagctcg tttagtgaac cgtcagatcg cctggagacg 540ccatccacgc tgttttgacc
tccatagaag acaccgggac cgatccagcc tccggactct 600agaggatccg gtactcgacg
acactgcaga gacctacttc actaacaacc ggtatggtcg 660cgagtagctt ggcactggcc
gtcgttttac aacgtcgtga ctgggaaaac cctggcgtta 720cccaacttaa tcgccttgca
gcacatcccc ctttcgccag ctggcgtaat agcgaagagg 780cccgcaccga tcgcccttcc
caacagttgc gcagcctgaa tggcgaatgg cgctttgcct 840ggtttccggc accagaagcg
gtgccggaaa gctggctgga gtgcgatctt cctgaggccg 900atactgtcgt cgtcccctca
aactggcaga tgcacggtta cgatgcgccc atctacacca 960acgtgaccta tcccattacg
gtcaatccgc cgtttgttcc cacggagaat ccgacgggtt 1020gttactcgct cacatttaat
gttgatgaaa gctggctata aaaccggtac agttcggcca 1080ccatggtcgt gtgcaccaaa
acttttcaca ctcttctaag ttttggtgca cacgagtagc 1140ttggcactgg ccgtcgtttt
acaacgtcgt gactgggaaa accctggcgt tacccaactt 1200aatcgccttg cagcacatcc
ccctttcgcc agctggcgta atagcgaaga ggcccgcacc 1260gatcgccctt cccaacagtt
gcgcagcctg aatggcgaat ggcgctttgc ctggtttccg 1320gcaccagaag cggtgccgga
aagctggctg gagtgcgatc ttcctgaggc cgatactgtc 1380gtcgtcccct caaactggca
gatgcacggt tacgatgcgc ccatctacac caacgtgacc 1440tatcccatta cggtcaatcc
gccgtttgtt cccacggaga atccgacggg ttgttactcg 1500ctcacattta atgttgatga
aagctggcta caggaaggcc agacgcgaat tatttttgat 1560ggcgttaact cggcgtttca
tctgtggtgc aacgggcgct gggtcggtta cggccaggac 1620agtcgtttgc cgtctgaatt
tgacctgagc gcatttttac gcgccggaga aaaccgcctc 1680gcggtgatgg tgctgcgctg
gagtgacggc agttatctgg aagatcagga tatgtggcgg 1740atgagcggca ttttccgtga
cgtctcgttg ctgcataaac cgactacaca aatcagcgat 1800ttccatgttg ccactcgctt
taatgatgat ttcagccgcg ctgtactgga ggctgaagtt 1860cagatgtgcg gcgagttgcg
tgactaccta cgggtaacag tttctttatg gcagggtgaa 1920acgcaggtcg ccagcggcac
cgcgcctttc ggcggtgaaa ttatcgatga gcgtggtggt 1980tatgccgatc gcgtcacact
acgtctgaac gtcgaaaacc cgaaactgtg gagcgccgaa 2040atcccgaatc tctatcgtgc
ggtggttgaa ctgcacaccg ccgacggcac gctgattgaa 2100gcagaagcct gcgatgtcgg
tttccgcgag gtgcggattg aaaatggtct gctgctgctg 2160aacggcaagc cgttgctgat
tcgaggcgtt aaccgtcacg agcatcatcc tctgcatggt 2220caggtcatgg atgagcagac
gatggtgcag gatatcctgc tgatgaagca gaacaacttt 2280aacgccgtgc gctgttcgca
ttatccgaac catccgctgt ggtacacgct gtgcgaccgc 2340tacggcctgt atgtggtgga
tgaagccaat attgaaaccc acggcatggt gccaatgaat 2400cgtctgaccg atgatccgcg
ctggctaccg gcgatgagcg aacgcgtaac gcgaatggtg 2460cagcgcgatc gtaatcaccc
gagtgtgatc atctggtcgc tggggaatga atcaggccac 2520ggcgctaatc acgacgcgct
gtatcgctgg atcaaatctg tcgatccttc ccgcccggtg 2580cagtatgaag gcggcggagc
cgacaccacg gccaccgata ttatttgccc gatgtacgcg 2640cgcgtggatg aagaccagcc
cttcccggct gtgccgaaat ggtccatcaa aaaatggctt 2700tcgctacctg gagagacgcg
cccgctgatc ctttgcgaat acgcccacgc gatgggtaac 2760agtcttggcg gtttcgctaa
atactggcag gcgtttcgtc agtatccccg tttacagggc 2820ggcttcgtct gggactgggt
ggatcagtcg ctgattaaat atgatgaaaa cggcaacccg 2880tggtcggctt acggcggtga
ttttggcgat acgccgaacg atcgccagtt ctgtatgaac 2940ggtctggtct ttgccgaccg
cacgccgcat ccagcgctga cggaagcaaa acaccagcag 3000cagtttttcc agttccgttt
atccgggcaa accatcgaag tgaccagcga atacctgttc 3060cgtcatagcg ataacgagct
cctgcactgg atggtggcgc tggatggtaa gccgctggca 3120agcggtgaag tgcctctgga
tgtcgctcca caaggtaaac agttgattga actgcctgaa 3180ctaccgcagc cggagagcgc
cgggcaactc tggctcacag tacgcgtagt gcaaccgaac 3240gcgaccgcat ggtcagaagc
cgggcacatc agcgcctggc agcagtggcg tctggcggaa 3300aacctcagtg tgacgctccc
cgccgcgtcc cacgccatcc cgcatctgac caccagcgaa 3360atggattttt gcatcgagct
gggtaataag cgttggcaat ttaaccgcca gtcaggcttt 3420ctttcacaga tgtggattgg
cgataaaaaa caactgctga cgccgctgcg cgatcagttc 3480acccgtgcac cgctggataa
cgacattggc gtaagtgaag cgacccgcat tgaccctaac 3540gcctgggtcg aacgctggaa
ggcggcgggc cattaccagg ccgaagcagc gttgttgcag 3600tgcacggcag atacacttgc
tgatgcggtg ctgattacga ccgctcacgc gtggcagcat 3660caggggaaaa ccttatttat
cagccggaaa acctaccgga ttgatggtag tggtcaaatg 3720gcgattaccg ttgatgttga
agtggcgagc gatacaccgc atccggcgcg gattggcctg 3780aactgccagc tggcgcaggt
agcagagcgg gtaaactggc tcggattagg gccgcaagaa 3840aactatcccg accgccttac
tgccgcctgt tttgaccgct gggatctgcc attgtcagac 3900atgtataccc cgtacgtctt
cccgagcgaa aacggtctgc gctgcgggac gcgcgaattg 3960aattatggcc cacaccagtg
gcgcggcgac ttccagttca acatcagccg ctacagtcaa 4020cagcaactga tggaaaccag
ccatcgccat ctgctgcacg cggaagaagg cacatggctg 4080aatatcgacg gtttccatat
ggggattggt ggcgacgact cctggagccc gtcagtatcg 4140gcggaattac agctgagcgc
cggtcgctac cattaccagt tggtctggtg tcaaaaataa 4200taataaccgg gcaggccatg
tctgcccgta tttcgcgtaa ggaaatccat tatgtactat 4260ttaaaaaaca caaacttttg
gatgttcggt ttattctttt tcttttactt ttttatcatg 4320ggagcctact tcccgttttt
cccgatttgg ctacatgaca tcaaccatat cagcaaaagt 4380gatacgggta ttatttttgc
cgctatttct ctgttctcgc tattattcca accgctgttt 4440ggtctgcttt ctgacaaact
cggcctcgac tctaggcggc cgcggggatc cagacatgat 4500aagatacatt gatgagtttg
gacaaaccac aactagaatg cagtgaaaaa aatgctttat 4560ttgtgaaatt tgtgatgcta
ttgctttatt tgtaaccatt ataagctgca ataaacaagt 4620taacaacaac aattgcattc
attttatgtt tcaggttcag ggggaggtgt gggaggtttt 4680ttcggatcct ctagagtcga
cctgcaggca tgcaagcttg gcgtaatcat ggtcatagct 4740gtttcctgtg tgaaattgtt
atccgctcac aattccacac aacatacgag ccggaagcat 4800aaagtgtaaa gcctggggtg
cctaatgagt gagctaactc acattaattg cgttgcgctc 4860actgcccgct ttccagtcgg
gaaacctgtc gtgccagctg cattaatgaa tcggccaacg 4920cgcggggaga ggcggtttgc
gtattgggcg ctcttccgct tcctcgctca ctgactcgct 4980gcgctcggtc gttcggctgc
ggcgagcggt atcagctcac tcaaaggcgg taatacggtt 5040atccacagaa tcaggggata
acgcaggaaa gaacatgtga gcaaaaggcc agcaaaaggc 5100caggaaccgt aaaaaggccg
cgttgctggc gtttttccat aggctccgcc cccctgacga 5160gcatcacaaa aatcgacgct
caagtcagag gtggcgaaac ccgacaggac tataaagata 5220ccaggcgttt ccccctggaa
gctccctcgt gcgctctcct gttccgaccc tgccgcttac 5280cggatacctg tccgcctttc
tcccttcggg aagcgtggcg ctttctcata gctcacgctg 5340taggtatctc agttcggtgt
aggtcgttcg ctccaagctg ggctgtgtgc acgaaccccc 5400cgttcagccc gaccgctgcg
ccttatccgg taactatcgt cttgagtcca acccggtaag 5460acacgactta tcgccactgg
cagcagccac tggtaacagg attagcagag cgaggtatgt 5520aggcggtgct acagagttct
tgaagtggtg gcctaactac ggctacacta gaaggacagt 5580atttggtatc tgcgctctgc
tgaagccagt taccttcgga aaaagagttg gtagctcttg 5640atccggcaaa caaaccaccg
ctggtagcgg tggttttttt gtttgcaagc agcagattac 5700gcgcagaaaa aaaggatctc
aagaagatcc tttgatcttt tctacggggt ctgacgctca 5760gtggaacgaa aactcacgtt
aagggatttt ggtcatgaga ttatcaaaaa ggatcttcac 5820ctagatcctt ttaaattaaa
aatgaagttt taaatcaatc taaagtatat atgagtaaac 5880ttggtctgac agttaccaat
gcttaatcag tgaggcacct atctcagcga tctgtctatt 5940tcgttcatcc atagttgcct
gactccccgt cgtgtagata actacgatac gggagggctt 6000accatctggc cccagtgctg
caatgatacc gcgagaccca cgctcaccgg ctccagattt 6060atcagcaata aaccagccag
ccggaagggc cgagcgcaga agtggtcctg caactttatc 6120cgcctccatc cagtctatta
attgttgccg ggaagctaga gtaagtagtt cgccagttaa 6180tagtttgcgc aacgttgttg
ccattgctac aggcatcgtg gtgtcacgct cgtcgtttgg 6240tatggcttca ttcagctccg
gttcccaacg atcaaggcga gttacatgat cccccatgtt 6300gtgcaaaaaa gcggttagct
ccttcggtcc tccgatcgtt gtcagaagta agttggccgc 6360agtgttatca ctcatggtta
tggcagcact gcataattct cttactgtca tgccatccgt 6420aagatgcttt tctgtgactg
gtgagtactc aaccaagtca ttctgagaat agtgtatgcg 6480gcgaccgagt tgctcttgcc
cggcgtcaat acgggataat accgcgccac atagcagaac 6540tttaaaagtg ctcatcattg
gaaaacgttc ttcggggcga aaactctcaa ggatcttacc 6600gctgttgaga tccagttcga
tgtaacccac tcgtgcaccc aactgatctt cagcatcttt 6660tactttcacc agcgtttctg
ggtgagcaaa aacaggaagg caaaatgccg caaaaaaggg 6720aataagggcg acacggaaat
gttgaatact catactcttc ctttttcaat attattgaag 6780catttatcag ggttattgtc
tcatgagcgg atacatattt gaatgtattt agaaaaataa 6840acaaataggg gttccgcgca
catttccccg aaaagtgcca cctgacgtct aagaaaccat 6900tattatcatg acattaacct
ataaaaatag gcgtatcacg aggccctttc gtctcgcgcg 6960tttcggtgat gacggtgaaa
acctctgaca catgcagctc ccggagacgg tcacagcttg 7020tctgtaagcg gatgccggga
gcagacaagc ccgtcagggc gcgtcagcgg gtgttggcgg 7080gtgtcggggc tggcttaact
atgcggcatc agagcagatt gtactgagag tgcaccatat 7140gcggtgtgaa ataccgcaca
gatgcgtaag gagaaaatac cgcatcaggc gccattcgcc 7200attcaggctg cgcaactgtt
gggaagggcg atcggtgcgg gcctcttcgc tattacgcca 7260gctggcgaaa gggggatgtg
ctgcaaggcg attaagttgg gtaacgccag ggttttccca 7320gtcacgacgt tgtaaaacga
cggccagtga attcgagctt gcatgcctgc aggt 7374
SEQ ID NO: 47; length: 7377; type: DNA; organism: Artificial sequence; description: TalRab2-Reporter
cgttacataa cttacggtaa atggcccgcc tggctgaccg
cccaacgacc cccgcccatt 60gacgtcaata atgacgtatg ttcccatagt aacgccaata
gggactttcc attgacgtca 120atgggtggag tatttacggt aaactgccca cttggcagta
catcaagtgt atcatatgcc 180aagtacgccc cctattgacg tcaatgacgg taaatggccc
gcctggcatt atgcccagta 240catgacctta tgggactttc ctacttggca gtacatctac
gtattagtca tcgctattac 300catggtgatg cggttttggc agtacatcaa tgggcgtgga
tagcggtttg actcacgggg 360atttccaagt ctccacccca ttgacgtcaa tgggagtttg
ttttggcacc aaaatcaacg 420ggactttcca aaatgtcgta acaactccgc cccattgacg
caaatgggcg gtaggcgtgt 480acggtgggag gtctatataa gcagagctcg tttagtgaac
cgtcagatcg cctggagacg 540ccatccacgc tgttttgacc tccatagaag acaccgggac
cgatccagcc tccggactct 600agaggatccg gtactcgacg acactgcaga gacctacttc
actaacaacc ggtatggtcg 660cgagtagctt ggcactggcc gtcgttttac aacgtcgtga
ctgggaaaac cctggcgtta 720cccaacttaa tcgccttgca gcacatcccc ctttcgccag
ctggcgtaat agcgaagagg 780cccgcaccga tcgcccttcc caacagttgc gcagcctgaa
tggcgaatgg cgctttgcct 840ggtttccggc accagaagcg gtgccggaaa gctggctgga
gtgcgatctt cctgaggccg 900atactgtcgt cgtcccctca aactggcaga tgcacggtta
cgatgcgccc atctacacca 960acgtgaccta tcccattacg gtcaatccgc cgtttgttcc
cacggagaat ccgacgggtt 1020gttactcgct cacatttaat gttgatgaaa gctggctata
aaaccggtac agttcggcca 1080ccatggtcga tggtggcccg gtagttttca cactcttctc
actaccgggc caccacgagt 1140agcttggcac tggccgtcgt tttacaacgt cgtgactggg
aaaaccctgg cgttacccaa 1200cttaatcgcc ttgcagcaca tccccctttc gccagctggc
gtaatagcga agaggcccgc 1260accgatcgcc cttcccaaca gttgcgcagc ctgaatggcg
aatggcgctt tgcctggttt 1320ccggcaccag aagcggtgcc ggaaagctgg ctggagtgcg
atcttcctga ggccgatact 1380gtcgtcgtcc cctcaaactg gcagatgcac ggttacgatg
cgcccatcta caccaacgtg 1440acctatccca ttacggtcaa tccgccgttt gttcccacgg
agaatccgac gggttgttac 1500tcgctcacat ttaatgttga tgaaagctgg ctacaggaag
gccagacgcg aattattttt 1560gatggcgtta actcggcgtt tcatctgtgg tgcaacgggc
gctgggtcgg ttacggccag 1620gacagtcgtt tgccgtctga atttgacctg agcgcatttt
tacgcgccgg agaaaaccgc 1680ctcgcggtga tggtgctgcg ctggagtgac ggcagttatc
tggaagatca ggatatgtgg 1740cggatgagcg gcattttccg tgacgtctcg ttgctgcata
aaccgactac acaaatcagc 1800gatttccatg ttgccactcg ctttaatgat gatttcagcc
gcgctgtact ggaggctgaa 1860gttcagatgt gcggcgagtt gcgtgactac ctacgggtaa
cagtttcttt atggcagggt 1920gaaacgcagg tcgccagcgg caccgcgcct ttcggcggtg
aaattatcga tgagcgtggt 1980ggttatgccg atcgcgtcac actacgtctg aacgtcgaaa
acccgaaact gtggagcgcc 2040gaaatcccga atctctatcg tgcggtggtt gaactgcaca
ccgccgacgg cacgctgatt 2100gaagcagaag cctgcgatgt cggtttccgc gaggtgcgga
ttgaaaatgg tctgctgctg 2160ctgaacggca agccgttgct gattcgaggc gttaaccgtc
acgagcatca tcctctgcat 2220ggtcaggtca tggatgagca gacgatggtg caggatatcc
tgctgatgaa gcagaacaac 2280tttaacgccg tgcgctgttc gcattatccg aaccatccgc
tgtggtacac gctgtgcgac 2340cgctacggcc tgtatgtggt ggatgaagcc aatattgaaa
cccacggcat ggtgccaatg 2400aatcgtctga ccgatgatcc gcgctggcta ccggcgatga
gcgaacgcgt aacgcgaatg 2460gtgcagcgcg atcgtaatca cccgagtgtg atcatctggt
cgctggggaa tgaatcaggc 2520cacggcgcta atcacgacgc gctgtatcgc tggatcaaat
ctgtcgatcc ttcccgcccg 2580gtgcagtatg aaggcggcgg agccgacacc acggccaccg
atattatttg cccgatgtac 2640gcgcgcgtgg atgaagacca gcccttcccg gctgtgccga
aatggtccat caaaaaatgg 2700ctttcgctac ctggagagac gcgcccgctg atcctttgcg
aatacgccca cgcgatgggt 2760aacagtcttg gcggtttcgc taaatactgg caggcgtttc
gtcagtatcc ccgtttacag 2820ggcggcttcg tctgggactg ggtggatcag tcgctgatta
aatatgatga aaacggcaac 2880ccgtggtcgg cttacggcgg tgattttggc gatacgccga
acgatcgcca gttctgtatg 2940aacggtctgg tctttgccga ccgcacgccg catccagcgc
tgacggaagc aaaacaccag 3000cagcagtttt tccagttccg tttatccggg caaaccatcg
aagtgaccag cgaatacctg 3060ttccgtcata gcgataacga gctcctgcac tggatggtgg
cgctggatgg taagccgctg 3120gcaagcggtg aagtgcctct ggatgtcgct ccacaaggta
aacagttgat tgaactgcct 3180gaactaccgc agccggagag cgccgggcaa ctctggctca
cagtacgcgt agtgcaaccg 3240aacgcgaccg catggtcaga agccgggcac atcagcgcct
ggcagcagtg gcgtctggcg 3300gaaaacctca gtgtgacgct ccccgccgcg tcccacgcca
tcccgcatct gaccaccagc 3360gaaatggatt tttgcatcga gctgggtaat aagcgttggc
aatttaaccg ccagtcaggc 3420tttctttcac agatgtggat tggcgataaa aaacaactgc
tgacgccgct gcgcgatcag 3480ttcacccgtg caccgctgga taacgacatt ggcgtaagtg
aagcgacccg cattgaccct 3540aacgcctggg tcgaacgctg gaaggcggcg ggccattacc
aggccgaagc agcgttgttg 3600cagtgcacgg cagatacact tgctgatgcg gtgctgatta
cgaccgctca cgcgtggcag 3660catcagggga aaaccttatt tatcagccgg aaaacctacc
ggattgatgg tagtggtcaa 3720atggcgatta ccgttgatgt tgaagtggcg agcgatacac
cgcatccggc gcggattggc 3780ctgaactgcc agctggcgca ggtagcagag cgggtaaact
ggctcggatt agggccgcaa 3840gaaaactatc ccgaccgcct tactgccgcc tgttttgacc
gctgggatct gccattgtca 3900gacatgtata ccccgtacgt cttcccgagc gaaaacggtc
tgcgctgcgg gacgcgcgaa 3960ttgaattatg gcccacacca gtggcgcggc gacttccagt
tcaacatcag ccgctacagt 4020caacagcaac tgatggaaac cagccatcgc catctgctgc
acgcggaaga aggcacatgg 4080ctgaatatcg acggtttcca tatggggatt ggtggcgacg
actcctggag cccgtcagta 4140tcggcggaat tacagctgag cgccggtcgc taccattacc
agttggtctg gtgtcaaaaa 4200taataataac cgggcaggcc atgtctgccc gtatttcgcg
taaggaaatc cattatgtac 4260tatttaaaaa acacaaactt ttggatgttc ggtttattct
ttttctttta cttttttatc 4320atgggagcct acttcccgtt tttcccgatt tggctacatg
acatcaacca tatcagcaaa 4380agtgatacgg gtattatttt tgccgctatt tctctgttct
cgctattatt ccaaccgctg 4440tttggtctgc tttctgacaa actcggcctc gactctaggc
ggccgcgggg atccagacat 4500gataagatac attgatgagt ttggacaaac cacaactaga
atgcagtgaa aaaaatgctt 4560tatttgtgaa atttgtgatg ctattgcttt atttgtaacc
attataagct gcaataaaca 4620agttaacaac aacaattgca ttcattttat gtttcaggtt
cagggggagg tgtgggaggt 4680tttttcggat cctctagagt cgacctgcag gcatgcaagc
ttggcgtaat catggtcata 4740gctgtttcct gtgtgaaatt gttatccgct cacaattcca
cacaacatac gagccggaag 4800cataaagtgt aaagcctggg gtgcctaatg agtgagctaa
ctcacattaa ttgcgttgcg 4860ctcactgccc gctttccagt cgggaaacct gtcgtgccag
ctgcattaat gaatcggcca 4920acgcgcgggg agaggcggtt tgcgtattgg gcgctcttcc
gcttcctcgc tcactgactc 4980gctgcgctcg gtcgttcggc tgcggcgagc ggtatcagct
cactcaaagg cggtaatacg 5040gttatccaca gaatcagggg ataacgcagg aaagaacatg
tgagcaaaag gccagcaaaa 5100ggccaggaac cgtaaaaagg ccgcgttgct ggcgtttttc
cataggctcc gcccccctga 5160cgagcatcac aaaaatcgac gctcaagtca gaggtggcga
aacccgacag gactataaag 5220ataccaggcg tttccccctg gaagctccct cgtgcgctct
cctgttccga ccctgccgct 5280taccggatac ctgtccgcct ttctcccttc gggaagcgtg
gcgctttctc atagctcacg 5340ctgtaggtat ctcagttcgg tgtaggtcgt tcgctccaag
ctgggctgtg tgcacgaacc 5400ccccgttcag cccgaccgct gcgccttatc cggtaactat
cgtcttgagt ccaacccggt 5460aagacacgac ttatcgccac tggcagcagc cactggtaac
aggattagca gagcgaggta 5520tgtaggcggt gctacagagt tcttgaagtg gtggcctaac
tacggctaca ctagaaggac 5580agtatttggt atctgcgctc tgctgaagcc agttaccttc
ggaaaaagag ttggtagctc 5640ttgatccggc aaacaaacca ccgctggtag cggtggtttt
tttgtttgca agcagcagat 5700tacgcgcaga aaaaaaggat ctcaagaaga tcctttgatc
ttttctacgg ggtctgacgc 5760tcagtggaac gaaaactcac gttaagggat tttggtcatg
agattatcaa aaaggatctt 5820cacctagatc cttttaaatt aaaaatgaag ttttaaatca
atctaaagta tatatgagta 5880aacttggtct gacagttacc aatgcttaat cagtgaggca
cctatctcag cgatctgtct 5940atttcgttca tccatagttg cctgactccc cgtcgtgtag
ataactacga tacgggaggg 6000cttaccatct ggccccagtg ctgcaatgat accgcgagac
ccacgctcac cggctccaga 6060tttatcagca ataaaccagc cagccggaag ggccgagcgc
agaagtggtc ctgcaacttt 6120atccgcctcc atccagtcta ttaattgttg ccgggaagct
agagtaagta gttcgccagt 6180taatagtttg cgcaacgttg ttgccattgc tacaggcatc
gtggtgtcac gctcgtcgtt 6240tggtatggct tcattcagct ccggttccca acgatcaagg
cgagttacat gatcccccat 6300gttgtgcaaa aaagcggtta gctccttcgg tcctccgatc
gttgtcagaa gtaagttggc 6360cgcagtgtta tcactcatgg ttatggcagc actgcataat
tctcttactg tcatgccatc 6420cgtaagatgc ttttctgtga ctggtgagta ctcaaccaag
tcattctgag aatagtgtat 6480gcggcgaccg agttgctctt gcccggcgtc aatacgggat
aataccgcgc cacatagcag 6540aactttaaaa gtgctcatca ttggaaaacg ttcttcgggg
cgaaaactct caaggatctt 6600accgctgttg agatccagtt cgatgtaacc cactcgtgca
cccaactgat cttcagcatc 6660ttttactttc accagcgttt ctgggtgagc aaaaacagga
aggcaaaatg ccgcaaaaaa 6720gggaataagg gcgacacgga aatgttgaat actcatactc
ttcctttttc aatattattg 6780aagcatttat cagggttatt gtctcatgag cggatacata
tttgaatgta tttagaaaaa 6840taaacaaata ggggttccgc gcacatttcc ccgaaaagtg
ccacctgacg tctaagaaac 6900cattattatc atgacattaa cctataaaaa taggcgtatc
acgaggccct ttcgtctcgc 6960gcgtttcggt gatgacggtg aaaacctctg acacatgcag
ctcccggaga cggtcacagc 7020ttgtctgtaa gcggatgccg ggagcagaca agcccgtcag
ggcgcgtcag cgggtgttgg 7080cgggtgtcgg ggctggctta actatgcggc atcagagcag
attgtactga gagtgcacca 7140tatgcggtgt gaaataccgc acagatgcgt aaggagaaaa
taccgcatca ggcgccattc 7200gccattcagg ctgcgcaact gttgggaagg gcgatcggtg
cgggcctctt cgctattacg 7260ccagctggcg aaagggggat gtgctgcaag gcgattaagt
tgggtaacgc cagggttttc 7320ccagtcacga cgttgtaaaa cgacggccag tgaattcgag
cttgcatgcc tgcaggt 7377
SEQ ID NO: 48; length: 7383; type: DNA; organism: Artificial sequence; description: ArtTal1/TalRab2-Reporter
cgttacataa cttacggtaa atggcccgcc
tggctgaccg cccaacgacc cccgcccatt 60gacgtcaata atgacgtatg ttcccatagt
aacgccaata gggactttcc attgacgtca 120atgggtggag tatttacggt aaactgccca
cttggcagta catcaagtgt atcatatgcc 180aagtacgccc cctattgacg tcaatgacgg
taaatggccc gcctggcatt atgcccagta 240catgacctta tgggactttc ctacttggca
gtacatctac gtattagtca tcgctattac 300catggtgatg cggttttggc agtacatcaa
tgggcgtgga tagcggtttg actcacgggg 360atttccaagt ctccacccca ttgacgtcaa
tgggagtttg ttttggcacc aaaatcaacg 420ggactttcca aaatgtcgta acaactccgc
cccattgacg caaatgggcg gtaggcgtgt 480acggtgggag gtctatataa gcagagctcg
tttagtgaac cgtcagatcg cctggagacg 540ccatccacgc tgttttgacc tccatagaag
acaccgggac cgatccagcc tccggactct 600agaggatccg gtactcgagg acactgcaga
gacctacttc actaacaacc ggtatggtcg 660cgagtagctt ggcactggcc gtcgttttac
aacgtcgtga ctgggaaaac cctggcgtta 720cccaacttaa tcgccttgca gcacatcccc
ctttcgccag ctggcgtaat agcgaagagg 780cccgcaccga tcgcccttcc caacagttgc
gcagcctgaa tggcgaatgg cgctttgcct 840ggtttccggc accagaagcg gtgccggaaa
gctggctgga gtgcgatctt cctgaggccg 900atactgtcgt cgtcccctca aactggcaga
tgcacggtta cgatgcgccc atctacacca 960acgtgaccta tcccattacg gtcaatccgc
cgtttgttcc cacggagaat ccgacgggtt 1020gttactcgct cacatttaat gttgatgaaa
gctggctata aaaccggtac agttcggcca 1080ccatggtcgt attctgggac gtttttcaca
ctcttctaaa ctaccgggcc accacgggtc 1140gcgagtagct tggcactggc cgtcgtttta
caacgtcgtg actgggaaaa ccctggcgtt 1200acccaactta atcgccttgc agcacatccc
cctttcgcca gctggcgtaa tagcgaagag 1260gcccgcaccg atcgcccttc ccaacagttg
cgcagcctga atggcgaatg gcgctttgcc 1320tggtttccgg caccagaagc ggtgccggaa
agctggctgg agtgcgatct tcctgaggcc 1380gatactgtcg tcgtcccctc aaactggcag
atgcacggtt acgatgcgcc catctacacc 1440aacgtgacct atcccattac ggtcaatccg
ccgtttgttc ccacggagaa tccgacgggt 1500tgttactcgc tcacatttaa tgttgatgaa
agctggctac aggaaggcca gacgcgaatt 1560atttttgatg gcgttaactc ggcgtttcat
ctgtggtgca acgggcgctg ggtcggttac 1620ggccaggaca gtcgtttgcc gtctgaattt
gacctgagcg catttttacg cgccggagaa 1680aaccgcctcg cggtgatggt gctgcgctgg
agtgacggca gttatctgga agatcaggat 1740atgtggcgga tgagcggcat tttccgtgac
gtctcgttgc tgcataaacc gactacacaa 1800atcagcgatt tccatgttgc cactcgcttt
aatgatgatt tcagccgcgc tgtactggag 1860gctgaagttc agatgtgcgg cgagttgcgt
gactacctac gggtaacagt ttctttatgg 1920cagggtgaaa cgcaggtcgc cagcggcacc
gcgcctttcg gcggtgaaat tatcgatgag 1980cgtggtggtt atgccgatcg cgtcacacta
cgtctgaacg tcgaaaaccc gaaactgtgg 2040agcgccgaaa tcccgaatct ctatcgtgcg
gtggttgaac tgcacaccgc cgacggcacg 2100ctgattgaag cagaagcctg cgatgtcggt
ttccgcgagg tgcggattga aaatggtctg 2160ctgctgctga acggcaagcc gttgctgatt
cgaggcgtta accgtcacga gcatcatcct 2220ctgcatggtc aggtcatgga tgagcagacg
atggtgcagg atatcctgct gatgaagcag 2280aacaacttta acgccgtgcg ctgttcgcat
tatccgaacc atccgctgtg gtacacgctg 2340tgcgaccgct acggcctgta tgtggtggat
gaagccaata ttgaaaccca cggcatggtg 2400ccaatgaatc gtctgaccga tgatccgcgc
tggctaccgg cgatgagcga acgcgtaacg 2460cgaatggtgc agcgcgatcg taatcacccg
agtgtgatca tctggtcgct ggggaatgaa 2520tcaggccacg gcgctaatca cgacgcgctg
tatcgctgga tcaaatctgt cgatccttcc 2580cgcccggtgc agtatgaagg cggcggagcc
gacaccacgg ccaccgatat tatttgcccg 2640atgtacgcgc gcgtggatga agaccagccc
ttcccggctg tgccgaaatg gtccatcaaa 2700aaatggcttt cgctacctgg agagacgcgc
ccgctgatcc tttgcgaata cgcccacgcg 2760atgggtaaca gtcttggcgg tttcgctaaa
tactggcagg cgtttcgtca gtatccccgt 2820ttacagggcg gcttcgtctg ggactgggtg
gatcagtcgc tgattaaata tgatgaaaac 2880ggcaacccgt ggtcggctta cggcggtgat
tttggcgata cgccgaacga tcgccagttc 2940tgtatgaacg gtctggtctt tgccgaccgc
acgccgcatc cagcgctgac ggaagcaaaa 3000caccagcagc agtttttcca gttccgttta
tccgggcaaa ccatcgaagt gaccagcgaa 3060tacctgttcc gtcatagcga taacgagctc
ctgcactgga tggtggcgct ggatggtaag 3120ccgctggcaa gcggtgaagt gcctctggat
gtcgctccac aaggtaaaca gttgattgaa 3180ctgcctgaac taccgcagcc ggagagcgcc
gggcaactct ggctcacagt acgcgtagtg 3240caaccgaacg cgaccgcatg gtcagaagcc
gggcacatca gcgcctggca gcagtggcgt 3300ctggcggaaa acctcagtgt gacgctcccc
gccgcgtccc acgccatccc gcatctgacc 3360accagcgaaa tggatttttg catcgagctg
ggtaataagc gttggcaatt taaccgccag 3420tcaggctttc tttcacagat gtggattggc
gataaaaaac aactgctgac gccgctgcgc 3480gatcagttca cccgtgcacc gctggataac
gacattggcg taagtgaagc gacccgcatt 3540gaccctaacg cctgggtcga acgctggaag
gcggcgggcc attaccaggc cgaagcagcg 3600ttgttgcagt gcacggcaga tacacttgct
gatgcggtgc tgattacgac cgctcacgcg 3660tggcagcatc aggggaaaac cttatttatc
agccggaaaa cctaccggat tgatggtagt 3720ggtcaaatgg cgattaccgt tgatgttgaa
gtggcgagcg atacaccgca tccggcgcgg 3780attggcctga actgccagct ggcgcaggta
gcagagcggg taaactggct cggattaggg 3840ccgcaagaaa actatcccga ccgccttact
gccgcctgtt ttgaccgctg ggatctgcca 3900ttgtcagaca tgtatacccc gtacgtcttc
ccgagcgaaa acggtctgcg ctgcgggacg 3960cgcgaattga attatggccc acaccagtgg
cgcggcgact tccagttcaa catcagccgc 4020tacagtcaac agcaactgat ggaaaccagc
catcgccatc tgctgcacgc ggaagaaggc 4080acatggctga atatcgacgg tttccatatg
gggattggtg gcgacgactc ctggagcccg 4140tcagtatcgg cggaattaca gctgagcgcc
ggtcgctacc attaccagtt ggtctggtgt 4200caaaaataat aataaccggg caggccatgt
ctgcccgtat ttcgcgtaag gaaatccatt 4260atgtactatt taaaaaacac aaacttttgg
atgttcggtt tattcttttt cttttacttt 4320tttatcatgg gagcctactt cccgtttttc
ccgatttggc tacatgacat caaccatatc 4380agcaaaagtg atacgggtat tatttttgcc
gctatttctc tgttctcgct attattccaa 4440ccgctgtttg gtctgctttc tgacaaactc
ggcctcgact ctaggcggcc gcggggatcc 4500agacatgata agatacattg atgagtttgg
acaaaccaca actagaatgc agtgaaaaaa 4560atgctttatt tgtgaaattt gtgatgctat
tgctttattt gtaaccatta taagctgcaa 4620taaacaagtt aacaacaaca attgcattca
ttttatgttt caggttcagg gggaggtgtg 4680ggaggttttt tcggatcctc tagagtcgac
ctgcaggcat gcaagcttgg cgtaatcatg 4740gtcatagctg tttcctgtgt gaaattgtta
tccgctcaca attccacaca acatacgagc 4800cggaagcata aagtgtaaag cctggggtgc
ctaatgagtg agctaactca cattaattgc 4860gttgcgctca ctgcccgctt tccagtcggg
aaacctgtcg tgccagctgc attaatgaat 4920cggccaacgc gcggggagag gcggtttgcg
tattgggcgc tcttccgctt cctcgctcac 4980tgactcgctg cgctcggtcg ttcggctgcg
gcgagcggta tcagctcact caaaggcggt 5040aatacggtta tccacagaat caggggataa
cgcaggaaag aacatgtgag caaaaggcca 5100gcaaaaggcc aggaaccgta aaaaggccgc
gttgctggcg tttttccata ggctccgccc 5160ccctgacgag catcacaaaa atcgacgctc
aagtcagagg tggcgaaacc cgacaggact 5220ataaagatac caggcgtttc cccctggaag
ctccctcgtg cgctctcctg ttccgaccct 5280gccgcttacc ggatacctgt ccgcctttct
cccttcggga agcgtggcgc tttctcatag 5340ctcacgctgt aggtatctca gttcggtgta
ggtcgttcgc tccaagctgg gctgtgtgca 5400cgaacccccc gttcagcccg accgctgcgc
cttatccggt aactatcgtc ttgagtccaa 5460cccggtaaga cacgacttat cgccactggc
agcagccact ggtaacagga ttagcagagc 5520gaggtatgta ggcggtgcta cagagttctt
gaagtggtgg cctaactacg gctacactag 5580aaggacagta tttggtatct gcgctctgct
gaagccagtt accttcggaa aaagagttgg 5640tagctcttga tccggcaaac aaaccaccgc
tggtagcggt ggtttttttg tttgcaagca 5700gcagattacg cgcagaaaaa aaggatctca
agaagatcct ttgatctttt ctacggggtc 5760tgacgctcag tggaacgaaa actcacgtta
agggattttg gtcatgagat tatcaaaaag 5820gatcttcacc tagatccttt taaattaaaa
atgaagtttt aaatcaatct aaagtatata 5880tgagtaaact tggtctgaca gttaccaatg
cttaatcagt gaggcaccta tctcagcgat 5940ctgtctattt cgttcatcca tagttgcctg
actccccgtc gtgtagataa ctacgatacg 6000ggagggctta ccatctggcc ccagtgctgc
aatgataccg cgagacccac gctcaccggc 6060tccagattta tcagcaataa accagccagc
cggaagggcc gagcgcagaa gtggtcctgc 6120aactttatcc gcctccatcc agtctattaa
ttgttgccgg gaagctagag taagtagttc 6180gccagttaat agtttgcgca acgttgttgc
cattgctaca ggcatcgtgg tgtcacgctc 6240gtcgtttggt atggcttcat tcagctccgg
ttcccaacga tcaaggcgag ttacatgatc 6300ccccatgttg tgcaaaaaag cggttagctc
cttcggtcct ccgatcgttg tcagaagtaa 6360gttggccgca gtgttatcac tcatggttat
ggcagcactg cataattctc ttactgtcat 6420gccatccgta agatgctttt ctgtgactgg
tgagtactca accaagtcat tctgagaata 6480gtgtatgcgg cgaccgagtt gctcttgccc
ggcgtcaata cgggataata ccgcgccaca 6540tagcagaact ttaaaagtgc tcatcattgg
aaaacgttct tcggggcgaa aactctcaag 6600gatcttaccg ctgttgagat ccagttcgat
gtaacccact cgtgcaccca actgatcttc 6660agcatctttt actttcacca gcgtttctgg
gtgagcaaaa acaggaaggc aaaatgccgc 6720aaaaaaggga ataagggcga cacggaaatg
ttgaatactc atactcttcc tttttcaata 6780ttattgaagc atttatcagg gttattgtct
catgagcgga tacatatttg aatgtattta 6840gaaaaataaa caaatagggg ttccgcgcac
atttccccga aaagtgccac ctgacgtcta 6900agaaaccatt attatcatga cattaaccta
taaaaatagg cgtatcacga ggccctttcg 6960tctcgcgcgt ttcggtgatg acggtgaaaa
cctctgacac atgcagctcc cggagacggt 7020cacagcttgt ctgtaagcgg atgccgggag
cagacaagcc cgtcagggcg cgtcagcggg 7080tgttggcggg tgtcggggct ggcttaacta
tgcggcatca gagcagattg tactgagagt 7140gcaccatatg cggtgtgaaa taccgcacag
atgcgtaagg agaaaatacc gcatcaggcg 7200ccattcgcca ttcaggctgc gcaactgttg
ggaagggcga tcggtgcggg cctcttcgct 7260attacgccag ctggcgaaag ggggatgtgc
tgcaaggcga ttaagttggg taacgccagg 7320gttttcccag tcacgacgtt gtaaaacgac
ggccagtgaa ttcgagcttg catgcctgca 7380
ggt 7383
SEQ ID NO: 49; length: 5566; type: DNA; organism: Artificial sequence; description: pCMV-hLuc
ggtaccgagc tcttacgcgt gctagcccgg gctcgaggag
cttggcccat tgcatacgtt 60gtatccatat cataatatgt acatttatat tggctcatgt
ccaacattac cgccatgttg 120acattgatta ttgactagtt attaatagta atcaattacg
gggtcattag ttcatagccc 180atatatggag ttccgcgtta cataacttac ggtaaatggc
ccgcctggct gaccgcccaa 240cgacccccgc ccattgacgt caataatgac gtatgttccc
atagtaacgc caatagggac 300tttccattga cgtcaatggg tggagtattt acgctaaact
gcccacttgg cagtacatca 360agtgtatcat atgccaagta cgccccctat tgacgtcaat
gacggtaaat ggcccgcctg 420gcattatgcc cagtacatga ccttatggga ctttcctact
tggcagtaca tctacgtatt 480agtcatcgct attaccatgg tgatgcggtt ttggcagtac
atcaatgggc gtggatagcg 540gtttgactca cggggatttc caagtctcca ccccattgac
gtcaatggga gtttgttttg 600gcaccaaaat caacgggact ttccaaaatg tcgtaacaac
tccgccccat tgacgcaaat 660gggcggtagg cgtgtacggt gggaggtcta tataagcaga
gctcgtttag tgaaccgtca 720gatcgcctgg agacgccatc cacgctgttt tgacctccat
agaagacacc gggaccgatc 780cagcctccgc ggccccgaat tagcttggca ttccggtact
gttggtaaag ccaccatgga 840agacgccaaa aacataaaga aaggcccggc gccattctat
ccgctggaag atggaaccgc 900tggagagcaa ctgcataagg ctatgaagag atacgccctg
gttcctggaa caattgcttt 960tacagatgca catatcgagg tggacatcac ttacgctgag
tacttcgaaa tgtccgttcg 1020gttggcagaa gctatgaaac gatatgggct gaatacaaat
cacagaatcg tcgtatgcag 1080tgaaaactct cttcaattct ttatgccggt gttgggcgcg
ttatttatcg gagttgcagt 1140tgcgcccgcg aacgacattt ataatgaacg tgaattgctc
aacagtatgg gcatttcgca 1200gcctaccgtg gtgttcgttt ccaaaaaggg gttgcaaaaa
attttgaacg tgcaaaaaaa 1260gctcccaatc atccaaaaaa ttattatcat ggattctaaa
acggattacc agggatttca 1320gtcgatgtac acgttcgtca catctcatct acctcccggt
tttaatgaat acgattttgt 1380gccagagtcc ttcgataggg acaagacaat tgcactgatc
atgaactcct ctggatctac 1440tggtctgcct aaaggtgtcg ctctgcctca tagaactgcc
tgcgtgagat tctcgcatgc 1500cagagatcct atttttggca atcaaatcat tccggatact
gcgattttaa gtgttgttcc 1560attccatcac ggttttggaa tgtttactac actcggatat
ttgatatgtg gatttcgagt 1620cgtcttaatg tatagatttg aagaagagct gtttctgagg
agccttcagg attacaagat 1680tcaaagtgcg ctgctggtgc caaccctatt ctccttcttc
gccaaaagca ctctgattga 1740caaatacgat ttatctaatt tacacgaaat tgcttctggt
ggcgctcccc tctctaagga 1800agtcggggaa gcggttgcca agaggttcca tctgccaggt
atcaggcaag gatatgggct 1860cactgagact acatcagcta ttctgattac acccgagggg
gatgataaac cgggcgcggt 1920cggtaaagtt gttccatttt ttgaagcgaa ggttgtggat
ctggataccg ggaaaacgct 1980gggcgttaat caaagaggcg aactgtgtgt gagaggtcct
atgattatgt ccggttatgt 2040aaacaatccg gaagcgacca acgccttgat tgacaaggat
ggatggctac attctggaga 2100catagcttac tgggacgaag acgaacactt cttcatcgtt
gaccgcctga agtctctgat 2160taagtacaaa ggctatcagg tggctcccgc tgaattggaa
tccatcttgc tccaacaccc 2220caacatcttc gacgcaggtg tcgcaggtct tcccgacgat
gacgccggtg aacttcccgc 2280cgccgttgtt gttttggagc acggaaagac gatgacggaa
aaagagatcg tggattacgt 2340cgccagtcaa gtaacaaccg cgaaaaagtt gcgcggagga
gttgtgtttg tggacgaagt 2400accgaaaggt cttaccggaa aactcgacgc aagaaaaatc
agagagatcc tcataaaggc 2460caagaagggc ggaaagatcg ccgtgtaatt ctagagtcgg
ggcggccggc cgcttcgagc 2520agacatgata agatacattg atgagtttgg acaaaccaca
actagaatgc agtgaaaaaa 2580atgctttatt tgtgaaattt gtgatgctat tgctttattt
gtaaccatta taagctgcaa 2640taaacaagtt aacaacaaca attgcattca ttttatgttt
caggttcagg gggaggtgtg 2700ggaggttttt taaagcaagt aaaacctcta caaatgtggt
aaaatcgata aggatccgtc 2760gaccgatgcc cttgagagcc ttcaacccag tcagctcctt
ccggtgggcg cggggcatga 2820ctatcgtcgc cgcacttatg actgtcttct ttatcatgca
actcgtagga caggtgccgg 2880cagcgctctt ccgcttcctc gctcactgac tcgctgcgct
cggtcgttcg gctgcggcga 2940gcggtatcag ctcactcaaa ggcggtaata cggttatcca
cagaatcagg ggataacgca 3000ggaaagaaca tgtgagcaaa aggccagcaa aaggccagga
accgtaaaaa ggccgcgttg 3060ctggcgtttt tccataggct ccgcccccct gacgagcatc
acaaaaatcg acgctcaagt 3120cagaggtggc gaaacccgac aggactataa agataccagg
cgtttccccc tggaagctcc 3180ctcgtgcgct ctcctgttcc gaccctgccg cttaccggat
acctgtccgc ctttctccct 3240tcgggaagcg tggcgctttc tcatagctca cgctgtaggt
atctcagttc ggtgtaggtc 3300gttcgctcca agctgggctg tgtgcacgaa ccccccgttc
agcccgaccg ctgcgcctta 3360tccggtaact atcgtcttga gtccaacccg gtaagacacg
acttatcgcc actggcagca 3420gccactggta acaggattag cagagcgagg tatgtaggcg
gtgctacaga gttcttgaag 3480tggtggccta actacggcta cactagaaga acagtatttg
gtatctgcgc tctgctgaag 3540ccagttacct tcggaaaaag agttggtagc tcttgatccg
gcaaacaaac caccgctggt 3600agcggtggtt tttttgtttg caagcagcag attacgcgca
gaaaaaaagg atctcaagaa 3660gatcctttga tcttttctac ggggtctgac gctcagtgga
acgaaaactc acgttaaggg 3720attttggtca tgagattatc aaaaaggatc ttcacctaga
tccttttaaa ttaaaaatga 3780agttttaaat caatctaaag tatatatgag taaacttggt
ctgacagtta ccaatgctta 3840atcagtgagg cacctatctc agcgatctgt ctatttcgtt
catccatagt tgcctgactc 3900cccgtcgtgt agataactac gatacgggag ggcttaccat
ctggccccag tgctgcaatg 3960ataccgcgag acccacgctc accggctcca gatttatcag
caataaacca gccagccgga 4020agggccgagc gcagaagtgg tcctgcaact ttatccgcct
ccatccagtc tattaattgt 4080tgccgggaag ctagagtaag tagttcgcca gttaatagtt
tgcgcaacgt tgttgccatt 4140gctacaggca tcgtggtgtc acgctcgtcg tttggtatgg
cttcattcag ctccggttcc 4200caacgatcaa ggcgagttac atgatccccc atgttgtgca
aaaaagcggt tagctccttc 4260ggtcctccga tcgttgtcag aagtaagttg gccgcagtgt
tatcactcat ggttatggca 4320gcactgcata attctcttac tgtcatgcca tccgtaagat
gcttttctgt gactggtgag 4380tactcaacca agtcattctg agaatagtgt atgcggcgac
cgagttgctc ttgcccggcg 4440tcaatacggg ataataccgc gccacatagc agaactttaa
aagtgctcat cattggaaaa 4500cgttcttcgg ggcgaaaact ctcaaggatc ttaccgctgt
tgagatccag ttcgatgtaa 4560cccactcgtg cacccaactg atcttcagca tcttttactt
tcaccagcgt ttctgggtga 4620gcaaaaacag gaaggcaaaa tgccgcaaaa aagggaataa
gggcgacacg gaaatgttga 4680atactcatac tcttcctttt tcaatattat tgaagcattt
atcagggtta ttgtctcatg 4740agcggataca tatttgaatg tatttagaaa aataaacaaa
taggggttcc gcgcacattt 4800ccccgaaaag tgccacctga cgcgccctgt agcggcgcat
taagcgcggc gggtgtggtg 4860gttacgcgca gcgtgaccgc tacacttgcc agcgccctag
cgcccgctcc tttcgctttc 4920ttcccttcct ttctcgccac gttcgccggc tttccccgtc
aagctctaaa tcgggggctc 4980cctttagggt tccgatttag tgctttacgg cacctcgacc
ccaaaaaact tgattagggt 5040gatggttcac gtagtgggcc atcgccctga tagacggttt
ttcgcccttt gacgttggag 5100tccacgttct ttaatagtgg actcttgttc caaactggaa
caacactcaa ccctatctcg 5160gtctattctt ttgatttata agggattttg ccgatttcgg
cctattggtt aaaaaatgag 5220ctgatttaac aaaaatttaa cgcgaatttt aacaaaatat
taacgcttac aatttgccat 5280tcgccattca ggctgcgcaa ctgttgggaa gggcgatcgg
tgcgggcctc ttcgctatta 5340cgccagccca agctaccatg ataagtaagt aatattaagg
tacgggaggt acttggagcg 5400gccgcaataa aatatcttta ttttcattac atctgtgtgt
tggttttttg tgtgaatcga 5460tagtactaac atacgctctc catcaaaaca aaacgaaaca
aaacaaacta gcaaaatagg 5520ctgtccccag tgcaagtgca ggtgccagaa catttctcta
tcgata 5566
SEQ ID NO: 50; length: 2961; type: DNA; organism: Artificial sequence; description: pBS
gtaaaacgac
ggccagtgag cgcgcgtaat acgactcact atagggcgaa ttggagctcc 60accgcggtgg
cggccgctct agaactagtg gatcccccgg gctgcaggaa ttcgatatca 120agcttatcga
taccgtcgac ctcgaggggg ggcccggtac ccagcttttg ttccctttag 180tgagggttaa
ttgcgcgctt ggcgtaatca tggtcatagc tgtttcctgt gtgaaattgt 240tatccgctca
caattccaca caacatacga gccggaagca taaagtgtaa agcctggggt 300gcctaatgag
tgagctaact cacattaatt gcgttgcgct cactgcccgc tttccagtcg 360ggaaacctgt
cgtgccagct gcattaatga atcggccaac gcgcggggag aggcggtttg 420cgtattgggc
gctcttccgc ttcctcgctc actgactcgc tgcgctcggt cgttcggctg 480cggcgagcgg
tatcagctca ctcaaaggcg gtaatacggt tatccacaga atcaggggat 540aacgcaggaa
agaacatgtg agcaaaaggc cagcaaaagg ccaggaaccg taaaaaggcc 600gcgttgctgg
cgtttttcca taggctccgc ccccctgacg agcatcacaa aaatcgacgc 660tcaagtcaga
ggtggcgaaa cccgacagga ctataaagat accaggcgtt tccccctgga 720agctccctcg
tgcgctctcc tgttccgacc ctgccgctta ccggatacct gtccgccttt 780ctcccttcgg
gaagcgtggc gctttctcat agctcacgct gtaggtatct cagttcggtg 840taggtcgttc
gctccaagct gggctgtgtg cacgaacccc ccgttcagcc cgaccgctgc 900gccttatccg
gtaactatcg tcttgagtcc aacccggtaa gacacgactt atcgccactg 960gcagcagcca
ctggtaacag gattagcaga gcgaggtatg taggcggtgc tacagagttc 1020ttgaagtggt
ggcctaacta cggctacact agaaggacag tatttggtat ctgcgctctg 1080ctgaagccag
ttaccttcgg aaaaagagtt ggtagctctt gatccggcaa acaaaccacc 1140gctggtagcg
gtggtttttt tgtttgcaag cagcagatta cgcgcagaaa aaaaggatct 1200caagaagatc
ctttgatctt ttctacgggg tctgacgctc agtggaacga aaactcacgt 1260taagggattt
tggtcatgag attatcaaaa aggatcttca cctagatcct tttaaattaa 1320aaatgaagtt
ttaaatcaat ctaaagtata tatgagtaaa cttggtctga cagttaccaa 1380tgcttaatca
gtgaggcacc tatctcagcg atctgtctat ttcgttcatc catagttgcc 1440tgactccccg
tcgtgtagat aactacgata cgggagggct taccatctgg ccccagtgct 1500gcaatgatac
cgcgagaccc acgctcaccg gctccagatt tatcagcaat aaaccagcca 1560gccggaaggg
ccgagcgcag aagtggtcct gcaactttat ccgcctccat ccagtctatt 1620aattgttgcc
gggaagctag agtaagtagt tcgccagtta atagtttgcg caacgttgtt 1680gccattgcta
caggcatcgt ggtgtcacgc tcgtcgtttg gtatggcttc attcagctcc 1740ggttcccaac
gatcaaggcg agttacatga tcccccatgt tgtgcaaaaa agcggttagc 1800tccttcggtc
ctccgatcgt tgtcagaagt aagttggccg cagtgttatc actcatggtt 1860atggcagcac
tgcataattc tcttactgtc atgccatccg taagatgctt ttctgtgact 1920ggtgagtact
caaccaagtc attctgagaa tagtgtatgc ggcgaccgag ttgctcttgc 1980ccggcgtcaa
tacgggataa taccgcgcca catagcagaa ctttaaaagt gctcatcatt 2040ggaaaacgtt
cttcggggcg aaaactctca aggatcttac cgctgttgag atccagttcg 2100atgtaaccca
ctcgtgcacc caactgatct tcagcatctt ttactttcac cagcgtttct 2160gggtgagcaa
aaacaggaag gcaaaatgcc gcaaaaaagg gaataagggc gacacggaaa 2220tgttgaatac
tcatactctt cctttttcaa tattattgaa gcatttatca gggttattgt 2280ctcatgagcg
gatacatatt tgaatgtatt tagaaaaata aacaaatagg ggttccgcgc 2340acatttcccc
gaaaagtgcc acctaaattg taagcgttaa tattttgtta aaattcgcgt 2400taaatttttg
ttaaatcagc tcatttttta accaataggc cgaaatcggc aaaatccctt 2460ataaatcaaa
agaatagacc gagatagggt tgagtgttgt tccagtttgg aacaagagtc 2520cactattaaa
gaacgtggac tccaacgtca aagggcgaaa aaccgtctat cagggcgatg 2580gcccactacg
tgaaccatca ccctaatcaa gttttttggg gtcgaggtgc cgtaaagcac 2640taaatcggaa
ccctaaaggg agcccccgat ttagagcttg acggggaaag ccggcgaacg 2700tggcgagaaa
ggaagggaag aaagcgaaag gagcgggcgc tagggcgctg gcaagtgtag 2760cggtcacgct
gcgcgtaacc accacacccg ccgcgcttaa tgcgccgcta cagggcgcgt 2820cccattcgcc
attcaggctg cgcaactgtt gggaagggcg atcggtgcgg gcctcttcgc 2880tattacgcca
gctggcgaaa gggggatgtg ctgcaaggcg attaagttgg gtaacgccag 2940ggttttccca
gtcacgacgt t
2961
SEQ ID NO: 51 (7164 bp, DNA; Artificial sequence: pCMVbeta)
gaattcgagc ttgcatgcct
gcaggtcgtt acataactta cggtaaatgg cccgcctggc 60tgaccgccca acgacccccg
cccattgacg tcaataatga cgtatgttcc catagtaacg 120ccaataggga ctttccattg
acgtcaatgg gtggagtatt tacggtaaac tgcccacttg 180gcagtacatc aagtgtatca
tatgccaagt acgcccccta ttgacgtcaa tgacggtaaa 240tggcccgcct ggcattatgc
ccagtacatg accttatggg actttcctac ttggcagtac 300atctacgtat tagtcatcgc
tattaccatg gtgatgcggt tttggcagta catcaatggg 360cgtggatagc ggtttgactc
acggggattt ccaagtctcc accccattga cgtcaatggg 420agtttgtttt ggcaccaaaa
tcaacgggac tttccaaaat gtcgtaacaa ctccgcccca 480ttgacgcaaa tgggcggtag
gcgtgtacgg tgggaggtct atataagcag agctcgttta 540gtgaaccgtc agatcgcctg
gagacgccat ccacgctgtt ttgacctcca tagaagacac 600cgggaccgat ccagcctccg
gactctagag gatccggtac tcgaggaact gaaaaaccag 660aaagttaact ggtaagttta
gtctttttgt cttttatttc aggtcccgga tccggtggtg 720gtgcaaatca aagaactgct
cctcagtgga tgttgccttt acttctaggc ctgtacggaa 780gtgttacttc tgctctaaaa
gctgcggaat tgtacccgcg gccgcaattc ccggggatcg 840aaagagcctg ctaaagcaaa
aaagaagtca ccatgtcgtt tactttgacc aacaagaacg 900tgattttcgt tgccggtctg
ggaggcattg gtctggacac cagcaaggag ctgctcaagc 960gcgatcccgt cgttttacaa
cgtcgtgact gggaaaaccc tggcgttacc caacttaatc 1020gccttgcagc acatccccct
ttcgccagct ggcgtaatag cgaagaggcc cgcaccgatc 1080gcccttccca acagttgcgc
agcctgaatg gcgaatggcg ctttgcctgg tttccggcac 1140cagaagcggt gccggaaagc
tggctggagt gcgatcttcc tgaggccgat actgtcgtcg 1200tcccctcaaa ctggcagatg
cacggttacg atgcgcccat ctacaccaac gtaacctatc 1260ccattacggt caatccgccg
tttgttccca cggagaatcc gacgggttgt tactcgctca 1320catttaatgt tgatgaaagc
tggctacagg aaggccagac gcgaattatt tttgatggcg 1380ttaactcggc gtttcatctg
tggtgcaacg ggcgctgggt cggttacggc caggacagtc 1440gtttgccgtc tgaatttgac
ctgagcgcat ttttacgcgc cggagaaaac cgcctcgcgg 1500tgatggtgct gcgttggagt
gacggcagtt atctggaaga tcaggatatg tggcggatga 1560gcggcatttt ccgtgacgtc
tcgttgctgc ataaaccgac tacacaaatc agcgatttcc 1620atgttgccac tcgctttaat
gatgatttca gccgcgctgt actggaggct gaagttcaga 1680tgtgcggcga gttgcgtgac
tacctacggg taacagtttc tttatggcag ggtgaaacgc 1740aggtcgccag cggcaccgcg
cctttcggcg gtgaaattat cgatgagcgt ggtggttatg 1800ccgatcgcgt cacactacgt
ctgaacgtcg aaaacccgaa actgtggagc gccgaaatcc 1860cgaatctcta tcgtgcggtg
gttgaactgc acaccgccga cggcacgctg attgaagcag 1920aagcctgcga tgtcggtttc
cgcgaggtgc ggattgaaaa tggtctgctg ctgctgaacg 1980gcaagccgtt gctgattcga
ggcgttaacc gtcacgagca tcatcctctg catggtcagg 2040tcatggatga gcagacgatg
gtgcaggata tcctgctgat gaagcagaac aactttaacg 2100ccgtgcgctg ttcgcattat
ccgaaccatc cgctgtggta cacgctgtgc gaccgctacg 2160gcctgtatgt ggtggatgaa
gccaatattg aaacccacgg catggtgcca atgaatcgtc 2220tgaccgatga tccgcgctgg
ctaccggcga tgagcgaacg cgtaacgcga atggtgcagc 2280gcgatcgtaa tcacccgagt
gtgatcatct ggtcgctggg gaatgaatca ggccacggcg 2340ctaatcacga cgcgctgtat
cgctggatca aatctgtcga tccttcccgc ccggtgcagt 2400atgaaggcgg cggagccgac
accacggcca ccgatattat ttgcccgatg tacgcgcgcg 2460tggatgaaga ccagcccttc
ccggctgtgc cgaaatggtc catcaaaaaa tggctttcgc 2520tacctggaga gacgcgcccg
ctgatccttt gcgaatacgc ccacgcgatg ggtaacagtc 2580ttggcggttt cgctaaatac
tggcaggcgt ttcgtcagta tccccgttta cagggcggct 2640tcgtctggga ctgggtggat
cagtcgctga ttaaatatga tgaaaacggc aacccgtggt 2700cggcttacgg cggtgatttt
ggcgatacgc cgaacgatcg ccagttctgt atgaacggtc 2760tggtctttgc cgaccgcacg
ccgcatccag cgctgacgga agcaaaacac cagcagcagt 2820ttttccagtt ccgtttatcc
gggcaaacca tcgaagtgac cagcgaatac ctgttccgtc 2880atagcgataa cgagctcctg
cactggatgg tggcgctgga tggtaagccg ctggcaagcg 2940gtgaagtgcc tctggatgtc
gctccacaag gtaaacagtt gattgaactg cctgaactac 3000cgcagccgga gagcgccggg
caactctggc tcacagtacg cgtagtgcaa ccgaacgcga 3060ccgcatggtc agaagccggg
cacatcagcg cctggcagca gtggcgtctg gcggaaaacc 3120tcagtgtgac gctccccgcc
gcgtcccacg ccatcccgca tctgaccacc agcgaaatgg 3180atttttgcat cgagctgggt
aataagcgtt ggcaatttaa ccgccagtca ggctttcttt 3240cacagatgtg gattggcgat
aaaaaacaac tgctgacgcc gctgcgcgat cagttcaccc 3300gtgcaccgct ggataacgac
attggcgtaa gtgaagcgac ccgcattgac cctaacgcct 3360gggtcgaacg ctggaaggcg
gcgggccatt accaggccga agcagcgttg ttgcagtgca 3420cggcagatac acttgctgat
gcggtgctga ttacgaccgc tcacgcgtgg cagcatcagg 3480ggaaaacctt atttatcagc
cggaaaacct accggattga tggtagtggt caaatggcga 3540ttaccgttga tgttgaagtg
gcgagcgata caccgcatcc ggcgcggatt ggcctgaact 3600gccagctggc gcaggtagca
gagcgggtaa actggctcgg attagggccg caagaaaact 3660atcccgaccg ccttactgcc
gcctgttttg accgctggga tctgccattg tcagacatgt 3720ataccccgta cgtcttcccg
agcgaaaacg gtctgcgctg cgggacgcgc gaattgaatt 3780atggcccaca ccagtggcgc
ggcgacttcc agttcaacat cagccgctac agtcaacagc 3840aactgatgga aaccagccat
cgccatctgc tgcacgcgga agaaggcaca tggctgaata 3900tcgacggttt ccatatgggg
attggtggcg acgactcctg gagcccgtca gtatcggcgg 3960aattacagct gagcgccggt
cgctaccatt accagttggt ctggtgtcaa aaataataat 4020aaccgggcag gccatgtctg
cccgtatttc gcgtaaggaa atccattatg tactatttaa 4080aaaacacaaa cttttggatg
ttcggtttat tctttttctt ttactttttt atcatgggag 4140cctacttccc gtttttcccg
atttggctac atgacatcaa ccatatcagc aaaagtgata 4200cgggtattat ttttgccgct
atttctctgt tctcgctatt attccaaccg ctgtttggtc 4260tgctttctga caaactcggc
ctcgactcta ggcggccgcg gggatccaga catgataaga 4320tacattgatg agtttggaca
aaccacaact agaatgcagt gaaaaaaatg ctttatttgt 4380gaaatttgtg atgctattgc
tttatttgta accattataa gctgcaataa acaagttaac 4440aacaacaatt gcattcattt
tatgtttcag gttcaggggg aggtgtggga ggttttttcg 4500gatcctctag agtcgacctg
caggcatgca agcttggcgt aatcatggtc atagctgttt 4560cctgtgtgaa attgttatcc
gctcacaatt ccacacaaca tacgagccgg aagcataaag 4620tgtaaagcct ggggtgccta
atgagtgagc taactcacat taattgcgtt gcgctcactg 4680cccgctttcc agtcgggaaa
cctgtcgtgc cagctgcatt aatgaatcgg ccaacgcgcg 4740gggagaggcg gtttgcgtat
tgggcgctct tccgcttcct cgctcactga ctcgctgcgc 4800tcggtcgttc ggctgcggcg
agcggtatca gctcactcaa aggcggtaat acggttatcc 4860acagaatcag gggataacgc
aggaaagaac atgtgagcaa aaggccagca aaaggccagg 4920aaccgtaaaa aggccgcgtt
gctggcgttt ttccataggc tccgcccccc tgacgagcat 4980cacaaaaatc gacgctcaag
tcagaggtgg cgaaacccga caggactata aagataccag 5040gcgtttcccc ctggaagctc
cctcgtgcgc tctcctgttc cgaccctgcc gcttaccgga 5100tacctgtccg cctttctccc
ttcgggaagc gtggcgcttt ctcatagctc acgctgtagg 5160tatctcagtt cggtgtaggt
cgttcgctcc aagctgggct gtgtgcacga accccccgtt 5220cagcccgacc gctgcgcctt
atccggtaac tatcgtcttg agtccaaccc ggtaagacac 5280gacttatcgc cactggcagc
agccactggt aacaggatta gcagagcgag gtatgtaggc 5340ggtgctacag agttcttgaa
gtggtggcct aactacggct acactagaag gacagtattt 5400ggtatctgcg ctctgctgaa
gccagttacc ttcggaaaaa gagttggtag ctcttgatcc 5460ggcaaacaaa ccaccgctgg
tagcggtggt ttttttgttt gcaagcagca gattacgcgc 5520agaaaaaaag gatctcaaga
agatcctttg atcttttcta cggggtctga cgctcagtgg 5580aacgaaaact cacgttaagg
gattttggtc atgagattat caaaaaggat cttcacctag 5640atccttttaa attaaaaatg
aagttttaaa tcaatctaaa gtatatatga gtaaacttgg 5700tctgacagtt accaatgctt
aatcagtgag gcacctatct cagcgatctg tctatttcgt 5760tcatccatag ttgcctgact
ccccgtcgtg tagataacta cgatacggga gggcttacca 5820tctggcccca gtgctgcaat
gataccgcga gacccacgct caccggctcc agatttatca 5880gcaataaacc agccagccgg
aagggccgag cgcagaagtg gtcctgcaac tttatccgcc 5940tccatccagt ctattaattg
ttgccgggaa gctagagtaa gtagttcgcc agttaatagt 6000ttgcgcaacg ttgttgccat
tgctacaggc atcgtggtgt cacgctcgtc gtttggtatg 6060gcttcattca gctccggttc
ccaacgatca aggcgagtta catgatcccc catgttgtgc 6120 aaaaaagcgg
ttagctcctt cggtcctccg atcgttgtca gaagtaagtt ggccgcagtg 6180ttatcactca
tggttatggc agcactgcat aattctctta ctgtcatgcc atccgtaaga 6240tgcttttctg
tgactggtga gtactcaacc aagtcattct gagaatagtg tatgcggcga 6300ccgagttgct
cttgcccggc gtcaatacgg gataataccg cgccacatag cagaacttta 6360aaagtgctca
tcattggaaa acgttcttcg gggcgaaaac tctcaaggat cttaccgctg 6420ttgagatcca
gttcgatgta acccactcgt gcacccaact gatcttcagc atcttttact 6480ttcaccagcg
tttctgggtg agcaaaaaca ggaaggcaaa atgccgcaaa aaagggaata 6540agggcgacac
ggaaatgttg aatactcata ctcttccttt ttcaatatta ttgaagcatt 6600tatcagggtt
attgtctcat gagcggatac atatttgaat gtatttagaa aaataaacaa 6660ataggggttc
cgcgcacatt tccccgaaaa gtgccacctg acgtctaaga aaccattatt 6720atcatgacat
taacctataa aaataggcgt atcacgaggc cctttcgtct cgcgcgtttc 6780ggtgatgacg
gtgaaaacct ctgacacatg cagctcccgg agacggtcac agcttgtctg 6840taagcggatg
ccgggagcag acaagcccgt cagggcgcgt cagcgggtgt tggcgggtgt 6900cggggctggc
ttaactatgc ggcatcagag cagattgtac tgagagtgca ccatatgcgg 6960tgtgaaatac
cgcacagatg cgtaaggaga aaataccgca tcaggcgcca ttcgccattc 7020aggctgcgca
actgttggga agggcgatcg gtgcgggcct cttcgctatt acgccagctg 7080gcgaaagggg
gatgtgctgc aaggcgatta agttgggtaa cgccagggtt ttcccagtca 7140cgacgttgta
aaacgacggc cagt
7164
SEQ ID NO: 52 (7867 bp, DNA; Artificial sequence: pCAG-TalRab2-Clo051)
ggcgcgccgg
attcgacatt gattattgac tagttattaa tagtaatcaa ttacggggtc 60attagttcat
agcccatata tggagttccg cgttacataa cttacggtaa atggcccgcc 120tggctgaccg
cccaacgacc cccgcccatt gacgtcaata atgacgtatg ttcccatagt 180aacgccaata
gggactttcc attgacgtca atgggtggag tatttacggt aaactgccca 240cttggcagta
catcaagtgt atcatatgcc aagtacgccc cctattgacg tcaatgacgg 300taaatggccc
gcctggcatt atgcccagta catgacctta tgggactttc ctacttggca 360gtacatctac
gtattagtca tcgctattac catggtcgag gtgagcccca cgttctgctt 420cactctcccc
atctcccccc cctccccacc cccaattttg tatttattta ttttttaatt 480attttgtgca
gcgatggggg cggggggggg gggggggcgc gcgccaggcg gggcggggcg 540gggcgagggg
cggggcgggg cgaggcggag aggtgcggcg gcagccaatc agagcggcgc 600gctccgaaag
tttcctttta tggcgaggcg gcggcggcgg cggccctata aaaagcgaag 660cgcgcggcgg
gcgggagtcg ctgcgcgctg ccttcgcccc gtgccccgct ccgccgccgc 720ctcgcgccgc
ccgccccggc tctgactgac cgcgttactc ccacaggtga gcgggcggga 780cggcccttct
cctccgggct gtaattagcg cttggtttaa tgacggcttg tttcttttct 840gtggctgcgt
gaaagccttg aggggctccg ggagggccct ttgtgcgggg gggagcggct 900cggggggtgc
gtgcgtgtgt gtgtgcgtgg ggagcgccgc gtgcggctcc gcgctgcccg 960gcggctgtga
gcgctgcggg cgcggcgcgg ggctttgtgc gctccgcagt gtgcgcgagg 1020ggagcgcggc
cgggggcggt gccccgcggt gcgggggggg ctgcgagggg aacaaaggct 1080gcgtgcgggg
tgtgtgcgtg ggggggtgag cagggggtgt gggcgcgtcg gtcgggctgc 1140aaccccccct
gcacccccct ccccgagttg ctgagcacgg cccggcttcg ggtgcggggc 1200tccgtacggg
gcgtggcgcg gggctcgccg tgccgggcgg ggggtggcgg caggtggggg 1260tgccgggcgg
ggcggggccg cctcgggccg gggagggctc gggggagggg cgcggcggcc 1320cccggagcgc
cggcggctgt cgaggcgcgg cgagccgcag ccattgcctt ttatggtaat 1380cgtgcgagag
ggcgcaggga cttcctttgt cccaaatctg tgcggagccg aaatctggga 1440ggcgccgccg
caccccctct agcgggcgcg gggcgaagcg gtgcggcgcc ggcaggaagg 1500aaatgggcgg
ggagggcctt cgtgcgtcgc cgcgccgccg tccccttctc cctctccagc 1560ctcggggctg
tccgcggggg gacggctgcc ttcggggggg acggggcagg gcggggttcg 1620gcttctggcg
tgtgaccggc ggctctagag cctctgctaa ccatgttcat gccttcttct 1680ttttcctaca
gatccttaat taataatacg actcactata ggggccgcca ccatgggacc 1740taagaaaaag
aggaaggtgg cggccgctga ctacaaggat gacgacgata aaccaggtgg 1800cggaggtagt
ggcggaggtg gggtacccgc cagtccagca gcccaggtgg atctgagaac 1860cctcggctac
agccagcagc agcaggagaa gatcaaacca aaggtgcggt ccaccgtcgc 1920tcagcaccat
gaagcactgg tggggcacgg tttcacacac gcccatattg tggctctgtc 1980tcagcatccc
gctgcactcg ggactgtggc cgtcaaatat caggacatga tcgccgctct 2040gcctgaggca
acccacgaag ccattgtggg cgtcggaaag cagtggagcg gtgccagagc 2100actcgaagca
ctcctcaccg tcgccgggga actgcggggt ccaccactcc agtccggact 2160ggacactgga
cagctgctga agatcgctaa acgcggcgga gtgacagctg tggaagctgt 2220gcacgcttgg
aggaatgctc tgacaggagc cccactgaat ctgacacccc agcaggtggt 2280ggccattgct
agcaacaatg ggggcaagca ggctctggag acagtgcagc gcctgctgcc 2340tgtgctgtgc
caggctcacg gactgactcc acagcaggtg gtggccatcg cttccaacaa 2400tggagggaaa
caggctctgg aaacagtgca gaggctgctg cccgtgctgt gccaggctca 2460tggactgaca
cctcagcagg tcgtcgccat tgcttctaac ggcggaggga agcaggctct 2520ggagactgtg
cagagactgc tgccagtgct gtgccaggcc catggactga cccctcagca 2580ggtcgtggct
atcgctagta acaatggcgg aaaacaggct ctggaaactg tgcagcggct 2640gctccccgtg
ctgtgccagg cccacggcct cactccacag caggtcgtcg ctatcgcctc 2700taataacggg
ggcaagcagg ctctggagac agtacagcgc ctgttacccg tgctgtgcca 2760ggcacacggc
ctcacacctc agcaggtcgt ggcaatcgct tcccatgacg gagggaaaca 2820ggctctggaa
acggtccaga ggctgctccc cgtgctgtgc caagctcacg gcctcacccc 2880tcagcaggtg
gtcgctattg cttctcatga tggcggaaag caggctctgg agaccgtgca 2940gagactgctc
cctgtgctgt gccaagccca cggcctgact ccacagcagg tcgtggccat 3000cgctagtcat
gacgggggca aacaggctct ggaaacagta cagcggctgt tacccgtgct 3060gtgccaagcc
catggcctca cacctcagca agtcgtcgct atcgctagca acaatggagg 3120gaagcaggct
ctggagacgg tgcagcgcct gctcccagtg ctgtgccaag ctcatggcct 3180cacccctcag
caagtcgtcg caattgcttc caataacggc ggaaaacagg ctctggaaac 3240cgtccagagg
ctgctgcccg tgctgtgcca agcacatggc ttaactccac agcaagtggt 3300ggccattgct
tctaatgggg gcggaaagca ggccctggag acagtccaga gactgttgcc 3360cgtgctgtgc
caagcgcatg gactgacacc tgaacaggtc gtcgctatcg ctagtaatat 3420tgggggcaaa
caggccctgg aaacagtgca gcggctgctt cccgtgctgt gccaggcgca 3480tggactcaca
ccccagcagg tcgtcgcaat cgcctctaat aacggaggga agcaggccct 3540ggaaaccgtg
cagagactgt tacctgtgct gtgccaggca catggtctga caccacagca 3600ggtggtcgca
attgctagca atggcggagg gaagcaggcc ctggagactg tccagagact 3660gctacccgtg
ctgtgccaag cgcacggcct gaccccacag caggtcgtcg ctattgcttc 3720taatggcgga
gggcggcctg ctctggagag cattgtggct cagctgtcca ggcccgatcc 3780tgccctggct
agatccgcac tcactaacga tcatctggtc gctctcgctt gcctcggtgg 3840acggcccgct
ctggacgcag tcaaaaaggg tctcccccat gctcccgcac tgatcaagag 3900aaccaacagg
agaattcctg agggatccga tcgtttaaac gaaggcatca aaagcaacat 3960ctccctcctg
aaagacgaac tccgggggca gattagccac attagtcacg aatacctctc 4020cctcatcgac
ctggctttcg atagcaagca gaacaggctc tttgagatga aagtgctgga 4080actgctcgtc
aatgagtacg ggttcaaggg tcgacacctc ggcggatcta ggaaaccaga 4140cggcatcgtg
tatagtacca cactggaaga caactttggg atcattgtgg ataccaaggc 4200atactctgag
ggttatagtc tgcccatttc acaggccgac gagatggaac ggtacgtgcg 4260cgagaactca
aatagagatg aggaagtcaa ccctaacaag tggtgggaga acttctctga 4320ggaagtgaag
aaatactact tcgtctttat cagcgggtcc ttcaagggta aatttgagga 4380acagctcagg
agactgagca tgactaccgg cgtgaatggc agcgccgtca acgtggtcaa 4440tctgctcctg
ggcgctgaaa agattcggag cggagagatg accatcgaag agctggagag 4500ggcaatgttt
aataatagcg agtttatcct gaaatactga acgcgtaaat gattgcagat 4560ccactagttc
tagaattcca gctgagcgcc ggtcgctacc attaccagtt ggtctggtgt 4620caaaaataat
aataaccggg caggggggat ctgcatggat ctttgtgaag gaaccttact 4680tctgtggtgt
gacataattg gacaaactac ctacagagat ttaaagctct aaggtaaata 4740taaaattttt
aagtgtataa tgtgttaaac tactgattct aattgtttgt gtattttaga 4800ttccaaccta
tggaactgat gaatgggagc agtggtggaa tgccagatcc agacatgata 4860agatacattg
atgagtttgg acaaaccaca actagaatgc agtgaaaaaa atgctttatt 4920tgtgaaattt
gtgatgctat tgctttattt gtaaccatta taagctgcaa taaacaagtt 4980aacaacaaca
attgcattca ttttatgttt caggttcagg gggaggtgtg ggaggttttt 5040taaagcaagt
aaaacctcta caaatgtggt atggctgatt atgatctgcg gccgccactg 5100gccgtcgttt
tacaacgtcg tgactgggaa aaccctggcg ttacccaact taatcgcctt 5160gcagcacatc
cccctttcgc cagctggcgt aatagcgaag aggcccgcac cgatcgccct 5220tcccaacagt
tgcgcagcct gaatggcgaa tggaacgcgc cctgtagcgg cgcattaagc 5280gcggcgggtg
tggtggttac gcgcagcgtg accgctacac ttgccagcgc cctagcgccc 5340gctcctttcg
ctttcttccc ttcctttctc gccacgttcg ccggctttcc ccgtcaagct 5400ctaaatcggg
ggctcccttt agggttccga tttagtgctt tacggcacct cgaccccaaa 5460aaacttgatt
agggtgatgg ttcacgtagt gggccatcgc cctgatagac ggtttttcgc 5520cctttgacgt
tggagtccac gttctttaat agtggactct tgttccaaac tggaacaaca 5580ctcaacccta
tctcggtcta ttcttttgat ttataaggga ttttgccgat ttcggcctat 5640tggttaaaaa
atgagctgat ttaacaaaaa tttaacgcga attttaacaa aatattaacg 5700cttacaattt
aggtggcact tttcggggaa atgtgcgcgg aacccctatt tgtttatttt 5760tctaaataca
ttcaaatatg tatccgctca tgagacaata accctgataa atgcttcaat 5820aatattgaaa
aaggaagagt atgagtattc aacatttccg tgtcgccctt attccctttt 5880ttgcggcatt
ttgccttcct gtttttgctc acccagaaac gctggtgaaa gtaaaagatg 5940ctgaagatca
gttgggtgca cgagtgggtt acatcgaact ggatctcaac agcggtaaga 6000tccttgagag
ttttcgcccc gaagaacgtt ttccaatgat gagcactttt aaagttctgc 6060tatgtggcgc
ggtattatcc cgtattgacg ccgggcaaga gcaactcggt cgccgcatac 6120actattctca
gaatgacttg gttgagtact caccagtcac agaaaagcat cttacggatg 6180gcatgacagt
aagagaatta tgcagtgctg ccataaccat gagtgataac actgcggcca 6240acttacttct
gacaacgatc ggaggaccga aggagctaac cgcttttttg cacaacatgg 6300gggatcatgt
aactcgcctt gatcgttggg aaccggagct gaatgaagcc ataccaaacg 6360acgagcgtga
caccacgatg cctgtagcaa tggcaacaac gttgcgcaaa ctattaactg 6420gcgaactact
tactctagct tcccggcaac aattaataga ctggatggag gcggataaag 6480ttgcaggacc
acttctgcgc tcggcccttc cggctggctg gtttattgct gataaatctg 6540gagccggtga
gcgtgggtct cgcggtatca ttgcagcact ggggccagat ggtaagccct 6600cccgtatcgt
agttatctac acgacgggga gtcaggcaac tatggatgaa cgaaatagac 6660agatcgctga
gataggtgcc tcactgatta agcattggta actgtcagac caagtttact 6720catatatact
ttagattgat ttaaaacttc atttttaatt taaaaggatc taggtgaaga 6780tcctttttga
taatctcatg accaaaatcc cttaacgtga gttttcgttc cactgagcgt 6840cagaccccgt
agaaaagatc aaaggatctt cttgagatcc tttttttctg cgcgtaatct 6900gctgcttgca
aacaaaaaaa ccaccgctac cagcggtggt ttgtttgccg gatcaagagc 6960taccaactct
ttttccgaag gtaactggct tcagcagagc gcagatacca aatactgtcc 7020ttctagtgta
gccgtagtta ggccaccact tcaagaactc tgtagcaccg cctacatacc 7080tcgctctgct
aatcctgtta ccagtggctg ctgccagtgg cgataagtcg tgtcttaccg 7140ggttggactc
aagacgatag ttaccggata aggcgcagcg gtcgggctga acggggggtt 7200cgtgcacaca
gcccagcttg gagcgaacga cctacaccga actgagatac ctacagcgtg 7260agctatgaga
aagcgccacg cttcccgaag ggagaaaggc ggacaggtat ccggtaagcg 7320gcagggtcgg
aacaggagag cgcacgaggg agcttccagg gggaaacgcc tggtatcttt 7380atagtcctgt
cgggtttcgc cacctctgac ttgagcgtcg atttttgtga tgctcgtcag 7440gggggcggag
cctatggaaa aacgccagca acgcggcctt tttacggttc ctggcctttt 7500gctggccttt
tgctcacatg ttctttcctg cgttatcccc tgattctgtg gataaccgta 7560ttaccgcctt
tgagtgagct gataccgctc gccgcagccg aacgaccgag cgcagcgagt 7620cagtgagcga
ggaagcggaa gagcgcccaa tacgcaaacc gcctctcccc gcgcgttggc 7680cgattcatta
atgcagctgg cacgacaggt ttcccgactg gaaagcgggc agtgagcgca 7740acgcaattaa
tgtgagttag ctcactcatt aggcacccca ggctttacac tttatgcttc 7800cggctcgtat
gttgtgtgga attgtgagcg gataacaatt tcacacagga aacagctatg 7860accatga
7867
SEQ ID NO: 53 (935 aa, PRT; Artificial sequence: TalRab2-Clo051)
Met Gly Pro Lys Lys Lys
Arg Lys Val Ala Ala Ala Asp Tyr Lys Asp 1 5
10 15 Asp Asp Asp Lys Pro Gly Gly Gly Gly Ser Gly
Gly Gly Gly Val Pro 20 25
30 Ala Ser Pro Ala Ala Gln Val Asp Leu Arg Thr Leu Gly Tyr Ser
Gln 35 40 45 Gln
Gln Gln Glu Lys Ile Lys Pro Lys Val Arg Ser Thr Val Ala Gln 50
55 60 His His Glu Ala Leu Val
Gly His Gly Phe Thr His Ala His Ile Val 65 70
75 80 Ala Leu Ser Gln His Pro Ala Ala Leu Gly Thr
Val Ala Val Lys Tyr 85 90
95 Gln Asp Met Ile Ala Ala Leu Pro Glu Ala Thr His Glu Ala Ile Val
100 105 110 Gly Val
Gly Lys Gln Trp Ser Gly Ala Arg Ala Leu Glu Ala Leu Leu 115
120 125 Thr Val Ala Gly Glu Leu Arg
Gly Pro Pro Leu Gln Ser Gly Leu Asp 130 135
140 Thr Gly Gln Leu Leu Lys Ile Ala Lys Arg Gly Gly
Val Thr Ala Val 145 150 155
160 Glu Ala Val His Ala Trp Arg Asn Ala Leu Thr Gly Ala Pro Leu Asn
165 170 175 Leu Thr Pro
Gln Gln Val Val Ala Ile Ala Ser Asn Asn Gly Gly Lys 180
185 190 Gln Ala Leu Glu Thr Val Gln Arg
Leu Leu Pro Val Leu Cys Gln Ala 195 200
205 His Gly Leu Thr Pro Gln Gln Val Val Ala Ile Ala Ser
Asn Asn Gly 210 215 220
Gly Lys Gln Ala Leu Glu Thr Val Gln Arg Leu Leu Pro Val Leu Cys 225
230 235 240 Gln Ala His Gly
Leu Thr Pro Gln Gln Val Val Ala Ile Ala Ser Asn 245
250 255 Gly Gly Gly Lys Gln Ala Leu Glu Thr
Val Gln Arg Leu Leu Pro Val 260 265
270 Leu Cys Gln Ala His Gly Leu Thr Pro Gln Gln Val Val Ala
Ile Ala 275 280 285
Ser Asn Asn Gly Gly Lys Gln Ala Leu Glu Thr Val Gln Arg Leu Leu 290
295 300 Pro Val Leu Cys Gln
Ala His Gly Leu Thr Pro Gln Gln Val Val Ala 305 310
315 320 Ile Ala Ser Asn Asn Gly Gly Lys Gln Ala
Leu Glu Thr Val Gln Arg 325 330
335 Leu Leu Pro Val Leu Cys Gln Ala His Gly Leu Thr Pro Gln Gln
Val 340 345 350 Val
Ala Ile Ala Ser His Asp Gly Gly Lys Gln Ala Leu Glu Thr Val 355
360 365 Gln Arg Leu Leu Pro Val
Leu Cys Gln Ala His Gly Leu Thr Pro Gln 370 375
380 Gln Val Val Ala Ile Ala Ser His Asp Gly Gly
Lys Gln Ala Leu Glu 385 390 395
400 Thr Val Gln Arg Leu Leu Pro Val Leu Cys Gln Ala His Gly Leu Thr
405 410 415 Pro Gln
Gln Val Val Ala Ile Ala Ser His Asp Gly Gly Lys Gln Ala 420
425 430 Leu Glu Thr Val Gln Arg Leu
Leu Pro Val Leu Cys Gln Ala His Gly 435 440
445 Leu Thr Pro Gln Gln Val Val Ala Ile Ala Ser Asn
Asn Gly Gly Lys 450 455 460
Gln Ala Leu Glu Thr Val Gln Arg Leu Leu Pro Val Leu Cys Gln Ala 465
470 475 480 His Gly Leu
Thr Pro Gln Gln Val Val Ala Ile Ala Ser Asn Asn Gly 485
490 495 Gly Lys Gln Ala Leu Glu Thr Val
Gln Arg Leu Leu Pro Val Leu Cys 500 505
510 Gln Ala His Gly Leu Thr Pro Gln Gln Val Val Ala Ile
Ala Ser Asn 515 520 525
Gly Gly Gly Lys Gln Ala Leu Glu Thr Val Gln Arg Leu Leu Pro Val 530
535 540 Leu Cys Gln Ala
His Gly Leu Thr Pro Glu Gln Val Val Ala Ile Ala 545 550
555 560 Ser Asn Ile Gly Gly Lys Gln Ala Leu
Glu Thr Val Gln Arg Leu Leu 565 570
575 Pro Val Leu Cys Gln Ala His Gly Leu Thr Pro Gln Gln Val
Val Ala 580 585 590
Ile Ala Ser Asn Asn Gly Gly Lys Gln Ala Leu Glu Thr Val Gln Arg
595 600 605 Leu Leu Pro Val
Leu Cys Gln Ala His Gly Leu Thr Pro Gln Gln Val 610
615 620 Val Ala Ile Ala Ser Asn Gly Gly
Gly Lys Gln Ala Leu Glu Thr Val 625 630
635 640 Gln Arg Leu Leu Pro Val Leu Cys Gln Ala His Gly
Leu Thr Pro Gln 645 650
655 Gln Val Val Ala Ile Ala Ser Asn Gly Gly Gly Arg Pro Ala Leu Glu
660 665 670 Ser Ile Val
Ala Gln Leu Ser Arg Pro Asp Pro Ala Leu Ala Arg Ser 675
680 685 Ala Leu Thr Asn Asp His Leu Val
Ala Leu Ala Cys Leu Gly Gly Arg 690 695
700 Pro Ala Leu Asp Ala Val Lys Lys Gly Leu Pro His Ala
Pro Ala Leu 705 710 715
720 Ile Lys Arg Thr Asn Arg Arg Ile Pro Glu Gly Ser Asp Arg Leu Asn
725 730 735 Glu Gly Ile Lys
Ser Asn Ile Ser Leu Leu Lys Asp Glu Leu Arg Gly 740
745 750 Gln Ile Ser His Ile Ser His Glu Tyr
Leu Ser Leu Ile Asp Leu Ala 755 760
765 Phe Asp Ser Lys Gln Asn Arg Leu Phe Glu Met Lys Val Leu
Glu Leu 770 775 780
Leu Val Asn Glu Tyr Gly Phe Lys Gly Arg His Leu Gly Gly Ser Arg 785
790 795 800 Lys Pro Asp Gly Ile
Val Tyr Ser Thr Thr Leu Glu Asp Asn Phe Gly 805
810 815 Ile Ile Val Asp Thr Lys Ala Tyr Ser Glu
Gly Tyr Ser Leu Pro Ile 820 825
830 Ser Gln Ala Asp Glu Met Glu Arg Tyr Val Arg Glu Asn Ser Asn
Arg 835 840 845 Asp
Glu Glu Val Asn Pro Asn Lys Trp Trp Glu Asn Phe Ser Glu Glu 850
855 860 Val Lys Lys Tyr Tyr Phe
Val Phe Ile Ser Gly Ser Phe Lys Gly Lys 865 870
875 880 Phe Glu Glu Gln Leu Arg Arg Leu Ser Met Thr
Thr Gly Val Asn Gly 885 890
895 Ser Ala Val Asn Val Val Asn Leu Leu Leu Gly Ala Glu Lys Ile Arg
900 905 910 Ser Gly
Glu Met Thr Ile Glu Glu Leu Glu Arg Ala Met Phe Asn Asn 915
920 925 Ser Glu Phe Ile Leu Lys Tyr
930 935
SEQ ID NO: 54 (7867 bp, DNA; Artificial sequence: pCAG-RabChtTal1-Clo051)
ggcgcgccgg attcgacatt gattattgac
tagttattaa tagtaatcaa ttacggggtc 60attagttcat agcccatata tggagttccg
cgttacataa cttacggtaa atggcccgcc 120tggctgaccg cccaacgacc cccgcccatt
gacgtcaata atgacgtatg ttcccatagt 180aacgccaata gggactttcc attgacgtca
atgggtggag tatttacggt aaactgccca 240cttggcagta catcaagtgt atcatatgcc
aagtacgccc cctattgacg tcaatgacgg 300taaatggccc gcctggcatt atgcccagta
catgacctta tgggactttc ctacttggca 360gtacatctac gtattagtca tcgctattac
catggtcgag gtgagcccca cgttctgctt 420cactctcccc atctcccccc cctccccacc
cccaattttg tatttattta ttttttaatt 480attttgtgca gcgatggggg cggggggggg
gggggggcgc gcgccaggcg gggcggggcg 540gggcgagggg cggggcgggg cgaggcggag
aggtgcggcg gcagccaatc agagcggcgc 600gctccgaaag tttcctttta tggcgaggcg
gcggcggcgg cggccctata aaaagcgaag 660cgcgcggcgg gcgggagtcg ctgcgcgctg
ccttcgcccc gtgccccgct ccgccgccgc 720ctcgcgccgc ccgccccggc tctgactgac
cgcgttactc ccacaggtga gcgggcggga 780cggcccttct cctccgggct gtaattagcg
cttggtttaa tgacggcttg tttcttttct 840gtggctgcgt gaaagccttg aggggctccg
ggagggccct ttgtgcgggg gggagcggct 900cggggggtgc gtgcgtgtgt gtgtgcgtgg
ggagcgccgc gtgcggctcc gcgctgcccg 960gcggctgtga gcgctgcggg cgcggcgcgg
ggctttgtgc gctccgcagt gtgcgcgagg 1020ggagcgcggc cgggggcggt gccccgcggt
gcgggggggg ctgcgagggg aacaaaggct 1080gcgtgcgggg tgtgtgcgtg ggggggtgag
cagggggtgt gggcgcgtcg gtcgggctgc 1140aaccccccct gcacccccct ccccgagttg
ctgagcacgg cccggcttcg ggtgcggggc 1200tccgtacggg gcgtggcgcg gggctcgccg
tgccgggcgg ggggtggcgg caggtggggg 1260tgccgggcgg ggcggggccg cctcgggccg
gggagggctc gggggagggg cgcggcggcc 1320cccggagcgc cggcggctgt cgaggcgcgg
cgagccgcag ccattgcctt ttatggtaat 1380cgtgcgagag ggcgcaggga cttcctttgt
cccaaatctg tgcggagccg aaatctggga 1440ggcgccgccg caccccctct agcgggcgcg
gggcgaagcg gtgcggcgcc ggcaggaagg 1500aaatgggcgg ggagggcctt cgtgcgtcgc
cgcgccgccg tccccttctc cctctccagc 1560ctcggggctg tccgcggggg gacggctgcc
ttcggggggg acggggcagg gcggggttcg 1620gcttctggcg tgtgaccggc ggctctagag
cctctgctaa ccatgttcat gccttcttct 1680ttttcctaca gatccttaat taataatacg
actcactata ggggccgcca ccatgggacc 1740taagaaaaag aggaaggtgg cggccgctga
ctacaaggat gacgacgata aaccaggtgg 1800cggaggtagt ggcggaggtg gggtacccgc
cagtccagca gcccaggtgg atctgagaac 1860cctcggctac agccagcagc agcaggagaa
gatcaaacca aaggtgcggt ccaccgtcgc 1920tcagcaccat gaagcactgg tggggcacgg
tttcacacac gcccatattg tggctctgtc 1980tcagcatccc gctgcactcg ggactgtggc
cgtcaaatat caggacatga tcgccgctct 2040gcctgaggca acccacgaag ccattgtggg
cgtcggaaag cagtggagcg gtgccagagc 2100actcgaagca ctcctcaccg tcgccgggga
actgcggggt ccaccactcc agtccggact 2160ggacactgga cagctgctga agatcgctaa
acgcggcgga gtgacagctg tggaagctgt 2220gcacgcttgg aggaatgctc tgacaggagc
cccactgaat cttacacccg aacaggtggt 2280ggccatcgct agtaacattg ggggcaaaca
ggctctggaa acagtacagc ggctgttacc 2340tgtgctgtgc caggctcatg gcctcacacc
tcagcaggtc gtcgcaatcg cctccaatgg 2400cggagggaag caggccctgg aaacggtgca
gagactgtta ccagtgctgt gccaggccca 2460tggcctaaca ccccagcagg tggtggccat
cgccagccac gacggcggca agcaggccct 2520ggaaaccgtg cagaggctgc tgcctgtgct
gtgccaggct catggcctga cacctgagca 2580ggtcgtcgcc atcgccagca acatcggcgg
caagcaggcc ctggaaaccg tgcagaggct 2640gctgccagtg ctgtgccagg cccatggctt
aacacccgaa caggtggtgg ccatcgcttc 2700taatattggg ggcaagcagg ccctggaaac
agtccagaga ctgttgcctg tgctgtgcca 2760ggctcatggc ttgacacctc agcaggtcgt
cgctatcgcc tctaataagg ggggcaagca 2820ggctctggag acagtacagc gcctgttacc
agtgctgtgc caggcccacg ggctcacacc 2880ccagcaggtg gtggcaatcg cttcccatga
cggagggaaa caggctctgg aaacggtcca 2940gaggctgctc cctgtgctgt gccaggctca
cggtctaaca ccccagcagg tggtggccat 3000tgctagcaac aatgggggca agcaggctct
ggagacagtg cagcgcctgc tgcctgtgct 3060gtgccaggct catggcctca cacctcagca
ggtcgtcgcc atcgccagcc acgacggcgg 3120caagcaggcc ctggaaaccg tgcagaggct
gctgccagtg ctgtgccagg cccatggcct 3180aacaccccag caggtggtgg caatcgcctc
caatggcgga gggaagcagg ccctggaaac 3240ggtgcagaga ctgttacctg tgctgtgcca
ggctcatggc ctgacacctg agcaggtcgt 3300cgctatcgct agcaatatcg gagggaagca
ggctctggaa actgtccagc gcctgctccc 3360agtgctgtgc caggcccatg gcttaacacc
ccagcaggtg gtggcaattg ctagcaatgg 3420cggagggaag caggccctgg agactgtcca
gagactgcta cctgtgctgt gccaggctca 3480tggcttgaca cctcagcagg tcgtcgctat
cgcctctaat aaggggggca agcaggctct 3540ggagacagta cagcgcctgt taccagtgct
gtgccaggcc cacgggctca caccccagca 3600ggtggtggcc atcgccagca acggcggcgg
caagcaggcc ctggaaaccg tgcagaggct 3660gctgcctgtg ctgtgccagg ctcacggcct
gaccccacag caggtcgtcg ctattgcttc 3720taatggcgga gggcggcctg ctctggagag
cattgtggct cagctgtcca ggcccgatcc 3780tgccctggct agatccgcac tcactaacga
tcatctggtc gctctcgctt gcctcggtgg 3840acggcccgct ctggacgcag tcaaaaaggg
tctcccccat gctcccgcac tgatcaagag 3900aaccaacagg agaattcctg agggatccga
tcgtttaaac gaaggcatca aaagcaacat 3960ctccctcctg aaagacgaac tccgggggca
gattagccac attagtcacg aatacctctc 4020cctcatcgac ctggctttcg atagcaagca
gaacaggctc tttgagatga aagtgctgga 4080actgctcgtc aatgagtacg ggttcaaggg
tcgacacctc ggcggatcta ggaaaccaga 4140cggcatcgtg tatagtacca cactggaaga
caactttggg atcattgtgg ataccaaggc 4200atactctgag ggttatagtc tgcccatttc
acaggccgac gagatggaac ggtacgtgcg 4260cgagaactca aatagagatg aggaagtcaa
ccctaacaag tggtgggaga acttctctga 4320ggaagtgaag aaatactact tcgtctttat
cagcgggtcc ttcaagggta aatttgagga 4380acagctcagg agactgagca tgactaccgg
cgtgaatggc agcgccgtca acgtggtcaa 4440tctgctcctg ggcgctgaaa agattcggag
cggagagatg accatcgaag agctggagag 4500ggcaatgttt aataatagcg agtttatcct
gaaatactga acgcgtaaat gattgcagat 4560ccactagttc tagaattcca gctgagcgcc
ggtcgctacc attaccagtt ggtctggtgt 4620caaaaataat aataaccggg caggggggat
ctgcatggat ctttgtgaag gaaccttact 4680tctgtggtgt gacataattg gacaaactac
ctacagagat ttaaagctct aaggtaaata 4740taaaattttt aagtgtataa tgtgttaaac
tactgattct aattgtttgt gtattttaga 4800ttccaaccta tggaactgat gaatgggagc
agtggtggaa tgccagatcc agacatgata 4860agatacattg atgagtttgg acaaaccaca
actagaatgc agtgaaaaaa atgctttatt 4920tgtgaaattt gtgatgctat tgctttattt
gtaaccatta taagctgcaa taaacaagtt 4980aacaacaaca attgcattca ttttatgttt
caggttcagg gggaggtgtg ggaggttttt 5040taaagcaagt aaaacctcta caaatgtggt
atggctgatt atgatctgcg gccgccactg 5100gccgtcgttt tacaacgtcg tgactgggaa
aaccctggcg ttacccaact taatcgcctt 5160gcagcacatc cccctttcgc cagctggcgt
aatagcgaag aggcccgcac cgatcgccct 5220tcccaacagt tgcgcagcct gaatggcgaa
tggaacgcgc cctgtagcgg cgcattaagc 5280gcggcgggtg tggtggttac gcgcagcgtg
accgctacac ttgccagcgc cctagcgccc 5340gctcctttcg ctttcttccc ttcctttctc
gccacgttcg ccggctttcc ccgtcaagct 5400ctaaatcggg ggctcccttt agggttccga
tttagtgctt tacggcacct cgaccccaaa 5460aaacttgatt agggtgatgg ttcacgtagt
gggccatcgc cctgatagac ggtttttcgc 5520cctttgacgt tggagtccac gttctttaat
agtggactct tgttccaaac tggaacaaca 5580ctcaacccta tctcggtcta ttcttttgat
ttataaggga ttttgccgat ttcggcctat 5640tggttaaaaa atgagctgat ttaacaaaaa
tttaacgcga attttaacaa aatattaacg 5700cttacaattt aggtggcact tttcggggaa
atgtgcgcgg aacccctatt tgtttatttt 5760tctaaataca ttcaaatatg tatccgctca
tgagacaata accctgataa atgcttcaat 5820aatattgaaa aaggaagagt atgagtattc
aacatttccg tgtcgccctt attccctttt 5880ttgcggcatt ttgccttcct gtttttgctc
acccagaaac gctggtgaaa gtaaaagatg 5940ctgaagatca gttgggtgca cgagtgggtt
acatcgaact ggatctcaac agcggtaaga 6000tccttgagag ttttcgcccc gaagaacgtt
ttccaatgat gagcactttt aaagttctgc 6060tatgtggcgc ggtattatcc cgtattgacg
ccgggcaaga gcaactcggt cgccgcatac 6120actattctca gaatgacttg gttgagtact
caccagtcac agaaaagcat cttacggatg 6180gcatgacagt aagagaatta tgcagtgctg
ccataaccat gagtgataac actgcggcca 6240acttacttct gacaacgatc ggaggaccga
aggagctaac cgcttttttg cacaacatgg 6300gggatcatgt aactcgcctt gatcgttggg
aaccggagct gaatgaagcc ataccaaacg 6360acgagcgtga caccacgatg cctgtagcaa
tggcaacaac gttgcgcaaa ctattaactg 6420gcgaactact tactctagct tcccggcaac
aattaataga ctggatggag gcggataaag 6480ttgcaggacc acttctgcgc tcggcccttc
cggctggctg gtttattgct gataaatctg 6540gagccggtga gcgtgggtct cgcggtatca
ttgcagcact ggggccagat ggtaagccct 6600cccgtatcgt agttatctac acgacgggga
gtcaggcaac tatggatgaa cgaaatagac 6660agatcgctga gataggtgcc tcactgatta
agcattggta actgtcagac caagtttact 6720catatatact ttagattgat ttaaaacttc
atttttaatt taaaaggatc taggtgaaga 6780tcctttttga taatctcatg accaaaatcc
cttaacgtga gttttcgttc cactgagcgt 6840cagaccccgt agaaaagatc aaaggatctt
cttgagatcc tttttttctg cgcgtaatct 6900gctgcttgca aacaaaaaaa ccaccgctac
cagcggtggt ttgtttgccg gatcaagagc 6960taccaactct ttttccgaag gtaactggct
tcagcagagc gcagatacca aatactgtcc 7020ttctagtgta gccgtagtta ggccaccact
tcaagaactc tgtagcaccg cctacatacc 7080tcgctctgct aatcctgtta ccagtggctg
ctgccagtgg cgataagtcg tgtcttaccg 7140ggttggactc aagacgatag ttaccggata
aggcgcagcg gtcgggctga acggggggtt 7200cgtgcacaca gcccagcttg gagcgaacga
cctacaccga actgagatac ctacagcgtg 7260agctatgaga aagcgccacg cttcccgaag
ggagaaaggc ggacaggtat ccggtaagcg 7320gcagggtcgg aacaggagag cgcacgaggg
agcttccagg gggaaacgcc tggtatcttt 7380atagtcctgt cgggtttcgc cacctctgac
ttgagcgtcg atttttgtga tgctcgtcag 7440gggggcggag cctatggaaa aacgccagca
acgcggcctt tttacggttc ctggcctttt 7500gctggccttt tgctcacatg ttctttcctg
cgttatcccc tgattctgtg gataaccgta 7560ttaccgcctt tgagtgagct gataccgctc
gccgcagccg aacgaccgag cgcagcgagt 7620cagtgagcga ggaagcggaa gagcgcccaa
tacgcaaacc gcctctcccc gcgcgttggc 7680cgattcatta atgcagctgg cacgacaggt
ttcccgactg gaaagcgggc agtgagcgca 7740acgcaattaa tgtgagttag ctcactcatt
aggcacccca ggctttacac tttatgcttc 7800cggctcgtat gttgtgtgga attgtgagcg
gataacaatt tcacacagga aacagctatg 7860accatga
7867
SEQ ID NO: 55; LENGTH: 935; TYPE: PRT; ORGANISM: Artificial sequence; OTHER INFORMATION: RabChtTal1-Clo051
SEQUENCE: 55
Met Gly Pro Lys Lys Lys Arg Lys Val Ala Ala
Ala Asp Tyr Lys Asp 1 5 10
15 Asp Asp Asp Lys Pro Gly Gly Gly Gly Ser Gly Gly Gly Gly Val Pro
20 25 30 Ala Ser
Pro Ala Ala Gln Val Asp Leu Arg Thr Leu Gly Tyr Ser Gln 35
40 45 Gln Gln Gln Glu Lys Ile Lys
Pro Lys Val Arg Ser Thr Val Ala Gln 50 55
60 His His Glu Ala Leu Val Gly His Gly Phe Thr His
Ala His Ile Val 65 70 75
80 Ala Leu Ser Gln His Pro Ala Ala Leu Gly Thr Val Ala Val Lys Tyr
85 90 95 Gln Asp Met
Ile Ala Ala Leu Pro Glu Ala Thr His Glu Ala Ile Val 100
105 110 Gly Val Gly Lys Gln Trp Ser Gly
Ala Arg Ala Leu Glu Ala Leu Leu 115 120
125 Thr Val Ala Gly Glu Leu Arg Gly Pro Pro Leu Gln Ser
Gly Leu Asp 130 135 140
Thr Gly Gln Leu Leu Lys Ile Ala Lys Arg Gly Gly Val Thr Ala Val 145
150 155 160 Glu Ala Val His
Ala Trp Arg Asn Ala Leu Thr Gly Ala Pro Leu Asn 165
170 175 Leu Thr Pro Glu Gln Val Val Ala Ile
Ala Ser Asn Ile Gly Gly Lys 180 185
190 Gln Ala Leu Glu Thr Val Gln Arg Leu Leu Pro Val Leu Cys
Gln Ala 195 200 205
His Gly Leu Thr Pro Gln Gln Val Val Ala Ile Ala Ser Asn Gly Gly 210
215 220 Gly Lys Gln Ala Leu
Glu Thr Val Gln Arg Leu Leu Pro Val Leu Cys 225 230
235 240 Gln Ala His Gly Leu Thr Pro Gln Gln Val
Val Ala Ile Ala Ser His 245 250
255 Asp Gly Gly Lys Gln Ala Leu Glu Thr Val Gln Arg Leu Leu Pro
Val 260 265 270 Leu
Cys Gln Ala His Gly Leu Thr Pro Glu Gln Val Val Ala Ile Ala 275
280 285 Ser Asn Ile Gly Gly Lys
Gln Ala Leu Glu Thr Val Gln Arg Leu Leu 290 295
300 Pro Val Leu Cys Gln Ala His Gly Leu Thr Pro
Glu Gln Val Val Ala 305 310 315
320 Ile Ala Ser Asn Ile Gly Gly Lys Gln Ala Leu Glu Thr Val Gln Arg
325 330 335 Leu Leu
Pro Val Leu Cys Gln Ala His Gly Leu Thr Pro Gln Gln Val 340
345 350 Val Ala Ile Ala Ser Asn Lys
Gly Gly Lys Gln Ala Leu Glu Thr Val 355 360
365 Gln Arg Leu Leu Pro Val Leu Cys Gln Ala His Gly
Leu Thr Pro Gln 370 375 380
Gln Val Val Ala Ile Ala Ser His Asp Gly Gly Lys Gln Ala Leu Glu 385
390 395 400 Thr Val Gln
Arg Leu Leu Pro Val Leu Cys Gln Ala His Gly Leu Thr 405
410 415 Pro Gln Gln Val Val Ala Ile Ala
Ser Asn Asn Gly Gly Lys Gln Ala 420 425
430 Leu Glu Thr Val Gln Arg Leu Leu Pro Val Leu Cys Gln
Ala His Gly 435 440 445
Leu Thr Pro Gln Gln Val Val Ala Ile Ala Ser His Asp Gly Gly Lys 450
455 460 Gln Ala Leu Glu
Thr Val Gln Arg Leu Leu Pro Val Leu Cys Gln Ala 465 470
475 480 His Gly Leu Thr Pro Gln Gln Val Val
Ala Ile Ala Ser Asn Gly Gly 485 490
495 Gly Lys Gln Ala Leu Glu Thr Val Gln Arg Leu Leu Pro Val
Leu Cys 500 505 510
Gln Ala His Gly Leu Thr Pro Glu Gln Val Val Ala Ile Ala Ser Asn
515 520 525 Ile Gly Gly Lys
Gln Ala Leu Glu Thr Val Gln Arg Leu Leu Pro Val 530
535 540 Leu Cys Gln Ala His Gly Leu Thr
Pro Gln Gln Val Val Ala Ile Ala 545 550
555 560 Ser Asn Gly Gly Gly Lys Gln Ala Leu Glu Thr Val
Gln Arg Leu Leu 565 570
575 Pro Val Leu Cys Gln Ala His Gly Leu Thr Pro Gln Gln Val Val Ala
580 585 590 Ile Ala Ser
Asn Lys Gly Gly Lys Gln Ala Leu Glu Thr Val Gln Arg 595
600 605 Leu Leu Pro Val Leu Cys Gln Ala
His Gly Leu Thr Pro Gln Gln Val 610 615
620 Val Ala Ile Ala Ser Asn Gly Gly Gly Lys Gln Ala Leu
Glu Thr Val 625 630 635
640 Gln Arg Leu Leu Pro Val Leu Cys Gln Ala His Gly Leu Thr Pro Gln
645 650 655 Gln Val Val Ala
Ile Ala Ser Asn Gly Gly Gly Arg Pro Ala Leu Glu 660
665 670 Ser Ile Val Ala Gln Leu Ser Arg Pro
Asp Pro Ala Leu Ala Arg Ser 675 680
685 Ala Leu Thr Asn Asp His Leu Val Ala Leu Ala Cys Leu Gly
Gly Arg 690 695 700
Pro Ala Leu Asp Ala Val Lys Lys Gly Leu Pro His Ala Pro Ala Leu 705
710 715 720 Ile Lys Arg Thr Asn
Arg Arg Ile Pro Glu Gly Ser Asp Arg Leu Asn 725
730 735 Glu Gly Ile Lys Ser Asn Ile Ser Leu Leu
Lys Asp Glu Leu Arg Gly 740 745
750 Gln Ile Ser His Ile Ser His Glu Tyr Leu Ser Leu Ile Asp Leu
Ala 755 760 765 Phe
Asp Ser Lys Gln Asn Arg Leu Phe Glu Met Lys Val Leu Glu Leu 770
775 780 Leu Val Asn Glu Tyr Gly
Phe Lys Gly Arg His Leu Gly Gly Ser Arg 785 790
795 800 Lys Pro Asp Gly Ile Val Tyr Ser Thr Thr Leu
Glu Asp Asn Phe Gly 805 810
815 Ile Ile Val Asp Thr Lys Ala Tyr Ser Glu Gly Tyr Ser Leu Pro Ile
820 825 830 Ser Gln
Ala Asp Glu Met Glu Arg Tyr Val Arg Glu Asn Ser Asn Arg 835
840 845 Asp Glu Glu Val Asn Pro Asn
Lys Trp Trp Glu Asn Phe Ser Glu Glu 850 855
860 Val Lys Lys Tyr Tyr Phe Val Phe Ile Ser Gly Ser
Phe Lys Gly Lys 865 870 875
880 Phe Glu Glu Gln Leu Arg Arg Leu Ser Met Thr Thr Gly Val Asn Gly
885 890 895 Ser Ala Val
Asn Val Val Asn Leu Leu Leu Gly Ala Glu Lys Ile Arg 900
905 910 Ser Gly Glu Met Thr Ile Glu Glu
Leu Glu Arg Ala Met Phe Asn Asn 915 920
925 Ser Glu Phe Ile Leu Lys Tyr 930
935
SEQ ID NO: 56; LENGTH: 7867; TYPE: DNA; ORGANISM: Artificial sequence; OTHER INFORMATION: pCAG-RabChtTal2-Clo051
SEQUENCE: 56
ggcgcgccgg
attcgacatt gattattgac tagttattaa tagtaatcaa ttacggggtc 60attagttcat
agcccatata tggagttccg cgttacataa cttacggtaa atggcccgcc 120tggctgaccg
cccaacgacc cccgcccatt gacgtcaata atgacgtatg ttcccatagt 180aacgccaata
gggactttcc attgacgtca atgggtggag tatttacggt aaactgccca 240cttggcagta
catcaagtgt atcatatgcc aagtacgccc cctattgacg tcaatgacgg 300taaatggccc
gcctggcatt atgcccagta catgacctta tgggactttc ctacttggca 360gtacatctac
gtattagtca tcgctattac catggtcgag gtgagcccca cgttctgctt 420cactctcccc
atctcccccc cctccccacc cccaattttg tatttattta ttttttaatt 480attttgtgca
gcgatggggg cggggggggg gggggggcgc gcgccaggcg gggcggggcg 540gggcgagggg
cggggcgggg cgaggcggag aggtgcggcg gcagccaatc agagcggcgc 600gctccgaaag
tttcctttta tggcgaggcg gcggcggcgg cggccctata aaaagcgaag 660cgcgcggcgg
gcgggagtcg ctgcgcgctg ccttcgcccc gtgccccgct ccgccgccgc 720ctcgcgccgc
ccgccccggc tctgactgac cgcgttactc ccacaggtga gcgggcggga 780cggcccttct
cctccgggct gtaattagcg cttggtttaa tgacggcttg tttcttttct 840gtggctgcgt
gaaagccttg aggggctccg ggagggccct ttgtgcgggg gggagcggct 900cggggggtgc
gtgcgtgtgt gtgtgcgtgg ggagcgccgc gtgcggctcc gcgctgcccg 960gcggctgtga
gcgctgcggg cgcggcgcgg ggctttgtgc gctccgcagt gtgcgcgagg 1020ggagcgcggc
cgggggcggt gccccgcggt gcgggggggg ctgcgagggg aacaaaggct 1080gcgtgcgggg
tgtgtgcgtg ggggggtgag cagggggtgt gggcgcgtcg gtcgggctgc 1140aaccccccct
gcacccccct ccccgagttg ctgagcacgg cccggcttcg ggtgcggggc 1200tccgtacggg
gcgtggcgcg gggctcgccg tgccgggcgg ggggtggcgg caggtggggg 1260tgccgggcgg
ggcggggccg cctcgggccg gggagggctc gggggagggg cgcggcggcc 1320cccggagcgc
cggcggctgt cgaggcgcgg cgagccgcag ccattgcctt ttatggtaat 1380cgtgcgagag
ggcgcaggga cttcctttgt cccaaatctg tgcggagccg aaatctggga 1440ggcgccgccg
caccccctct agcgggcgcg gggcgaagcg gtgcggcgcc ggcaggaagg 1500aaatgggcgg
ggagggcctt cgtgcgtcgc cgcgccgccg tccccttctc cctctccagc 1560ctcggggctg
tccgcggggg gacggctgcc ttcggggggg acggggcagg gcggggttcg 1620gcttctggcg
tgtgaccggc ggctctagag cctctgctaa ccatgttcat gccttcttct 1680ttttcctaca
gatccttaat taataatacg actcactata ggggccgcca ccatgggacc 1740taagaaaaag
aggaaggtgg cggccgctga ctacaaggat gacgacgata aaccaggtgg 1800cggaggtagt
ggcggaggtg gggtacccgc cagtccagca gcccaggtgg atctgagaac 1860cctcggctac
agccagcagc agcaggagaa gatcaaacca aaggtgcggt ccaccgtcgc 1920tcagcaccat
gaagcactgg tggggcacgg tttcacacac gcccatattg tggctctgtc 1980tcagcatccc
gctgcactcg ggactgtggc cgtcaaatat caggacatga tcgccgctct 2040gcctgaggca
acccacgaag ccattgtggg cgtcggaaag cagtggagcg gtgccagagc 2100actcgaagca
ctcctcaccg tcgccgggga actgcggggt ccaccactcc agtccggact 2160ggacactgga
cagctgctga agatcgctaa acgcggcgga gtgacagctg tggaagctgt 2220gcacgcttgg
aggaatgctc tgacaggagc cccactgaat cttacacccc agcaggtggt 2280ggccattgct
agcaacaatg ggggcaagca ggctctggag acagtgcagc gcctgctgcc 2340tgtgctgtgc
caggctcatg gcctcacacc tcagcaggtc gtcgccattg cttctaacaa 2400tggagggaag
caggctctgg agactgtgca gagactgctg ccagtgctgt gccaggccca 2460tggcctaaca
ccccagcagg tggtggccat cgccagccac gacggcggca agcaggccct 2520ggaaaccgtg
cagaggctgc tgcctgtgct gtgccaggct catggcctga cacctcagca 2580ggtcgtcgcc
atcgccagcc acgacggcgg caagcaggcc ctggaaaccg tgcagaggct 2640gctgccagtg
ctgtgccagg cccatggctt aacaccccag caggtggtgg ccatcgctag 2700tcatgacggg
ggcaaacagg ctctggaaac agtacagcgg ctgttacctg tgctgtgcca 2760ggctcatggc
ttgacacctc agcaggtcgt cgctatcgcc tctaataagg ggggcaagca 2820ggctctggag
acagtacagc gcctgttacc agtgctgtgc caggcccacg ggctcacacc 2880ccagcaggtg
gtggcaattg cttccaataa gggcggaaaa caggctctgg aaaccgtcca 2940gaggctgctg
cctgtgctgt gccaggctca cggtctaaca ccccagcagg tggtggccat 3000cgcttccaac
ggagggggca aacaggctct ggaaacagtg cagaggctgc tgcctgtgct 3060gtgccaggct
catggcctca cacctgagca ggtcgtcgcc atcgccagca acatcggcgg 3120caagcaggcc
ctggaaaccg tgcagaggct gctgccagtg ctgtgccagg cccatggcct 3180aacaccccag
caggtggtgg caattgcttc caataagggc ggaaaacagg ctctggaaac 3240cgtccagagg
ctgctgcctg tgctgtgcca ggctcatggc ctgacacctc agcaggtcgt 3300cgcaatcgcc
tccaatggcg gagggaagca ggccctggaa acggtgcaga gactgttacc 3360agtgctgtgc
caggcccatg gcttaacacc ccagcaggtg gtggcaatcg cctctaataa 3420gggagggaag
caggccctgg aaaccgtgca gagactgtta cctgtgctgt gccaggctca 3480tggcttgaca
cctcagcagg tcgtcgctat cgctagtcat gatggcggaa aacaggctct 3540ggaaactgtg
cagcggctgc tcccagtgct gtgccaggcc cacgggctca caccccagca 3600ggtggtggcc
atcgccagca acaagggcgg caagcaggcc ctggaaaccg tgcagaggct 3660gctgcctgtg
ctgtgccagg ctcacggcct gaccccacag caggtcgtcg ctattgcttc 3720taatggcgga
gggcggcctg ctctggagag cattgtggct cagctgtcca ggcccgatcc 3780tgccctggct
agatccgcac tcactaacga tcatctggtc gctctcgctt gcctcggtgg 3840acggcccgct
ctggacgcag tcaaaaaggg tctcccccat gctcccgcac tgatcaagag 3900aaccaacagg
agaattcctg agggatccga tcgtttaaac gaaggcatca aaagcaacat 3960ctccctcctg
aaagacgaac tccgggggca gattagccac attagtcacg aatacctctc 4020cctcatcgac
ctggctttcg atagcaagca gaacaggctc tttgagatga aagtgctgga 4080actgctcgtc
aatgagtacg ggttcaaggg tcgacacctc ggcggatcta ggaaaccaga 4140cggcatcgtg
tatagtacca cactggaaga caactttggg atcattgtgg ataccaaggc 4200atactctgag
ggttatagtc tgcccatttc acaggccgac gagatggaac ggtacgtgcg 4260cgagaactca
aatagagatg aggaagtcaa ccctaacaag tggtgggaga acttctctga 4320ggaagtgaag
aaatactact tcgtctttat cagcgggtcc ttcaagggta aatttgagga 4380acagctcagg
agactgagca tgactaccgg cgtgaatggc agcgccgtca acgtggtcaa 4440tctgctcctg
ggcgctgaaa agattcggag cggagagatg accatcgaag agctggagag 4500ggcaatgttt
aataatagcg agtttatcct gaaatactga acgcgtaaat gattgcagat 4560ccactagttc
tagaattcca gctgagcgcc ggtcgctacc attaccagtt ggtctggtgt 4620caaaaataat
aataaccggg caggggggat ctgcatggat ctttgtgaag gaaccttact 4680tctgtggtgt
gacataattg gacaaactac ctacagagat ttaaagctct aaggtaaata 4740taaaattttt
aagtgtataa tgtgttaaac tactgattct aattgtttgt gtattttaga 4800ttccaaccta
tggaactgat gaatgggagc agtggtggaa tgccagatcc agacatgata 4860agatacattg
atgagtttgg acaaaccaca actagaatgc agtgaaaaaa atgctttatt 4920tgtgaaattt
gtgatgctat tgctttattt gtaaccatta taagctgcaa taaacaagtt 4980aacaacaaca
attgcattca ttttatgttt caggttcagg gggaggtgtg ggaggttttt 5040taaagcaagt
aaaacctcta caaatgtggt atggctgatt atgatctgcg gccgccactg 5100gccgtcgttt
tacaacgtcg tgactgggaa aaccctggcg ttacccaact taatcgcctt 5160gcagcacatc
cccctttcgc cagctggcgt aatagcgaag aggcccgcac cgatcgccct 5220tcccaacagt
tgcgcagcct gaatggcgaa tggaacgcgc cctgtagcgg cgcattaagc 5280gcggcgggtg
tggtggttac gcgcagcgtg accgctacac ttgccagcgc cctagcgccc 5340gctcctttcg
ctttcttccc ttcctttctc gccacgttcg ccggctttcc ccgtcaagct 5400ctaaatcggg
ggctcccttt agggttccga tttagtgctt tacggcacct cgaccccaaa 5460aaacttgatt
agggtgatgg ttcacgtagt gggccatcgc cctgatagac ggtttttcgc 5520cctttgacgt
tggagtccac gttctttaat agtggactct tgttccaaac tggaacaaca 5580ctcaacccta
tctcggtcta ttcttttgat ttataaggga ttttgccgat ttcggcctat 5640tggttaaaaa
atgagctgat ttaacaaaaa tttaacgcga attttaacaa aatattaacg 5700cttacaattt
aggtggcact tttcggggaa atgtgcgcgg aacccctatt tgtttatttt 5760tctaaataca
ttcaaatatg tatccgctca tgagacaata accctgataa atgcttcaat 5820aatattgaaa
aaggaagagt atgagtattc aacatttccg tgtcgccctt attccctttt 5880ttgcggcatt
ttgccttcct gtttttgctc acccagaaac gctggtgaaa gtaaaagatg 5940ctgaagatca
gttgggtgca cgagtgggtt acatcgaact ggatctcaac agcggtaaga 6000tccttgagag
ttttcgcccc gaagaacgtt ttccaatgat gagcactttt aaagttctgc 6060tatgtggcgc
ggtattatcc cgtattgacg ccgggcaaga gcaactcggt cgccgcatac 6120actattctca
gaatgacttg gttgagtact caccagtcac agaaaagcat cttacggatg 6180gcatgacagt
aagagaatta tgcagtgctg ccataaccat gagtgataac actgcggcca 6240acttacttct
gacaacgatc ggaggaccga aggagctaac cgcttttttg cacaacatgg 6300gggatcatgt
aactcgcctt gatcgttggg aaccggagct gaatgaagcc ataccaaacg 6360acgagcgtga
caccacgatg cctgtagcaa tggcaacaac gttgcgcaaa ctattaactg 6420gcgaactact
tactctagct tcccggcaac aattaataga ctggatggag gcggataaag 6480ttgcaggacc
acttctgcgc tcggcccttc cggctggctg gtttattgct gataaatctg 6540gagccggtga
gcgtgggtct cgcggtatca ttgcagcact ggggccagat ggtaagccct 6600cccgtatcgt
agttatctac acgacgggga gtcaggcaac tatggatgaa cgaaatagac 6660agatcgctga
gataggtgcc tcactgatta agcattggta actgtcagac caagtttact 6720catatatact
ttagattgat ttaaaacttc atttttaatt taaaaggatc taggtgaaga 6780tcctttttga
taatctcatg accaaaatcc cttaacgtga gttttcgttc cactgagcgt 6840cagaccccgt
agaaaagatc aaaggatctt cttgagatcc tttttttctg cgcgtaatct 6900gctgcttgca
aacaaaaaaa ccaccgctac cagcggtggt ttgtttgccg gatcaagagc 6960taccaactct
ttttccgaag gtaactggct tcagcagagc gcagatacca aatactgtcc 7020ttctagtgta
gccgtagtta ggccaccact tcaagaactc tgtagcaccg cctacatacc 7080tcgctctgct
aatcctgtta ccagtggctg ctgccagtgg cgataagtcg tgtcttaccg 7140ggttggactc
aagacgatag ttaccggata aggcgcagcg gtcgggctga acggggggtt 7200cgtgcacaca
gcccagcttg gagcgaacga cctacaccga actgagatac ctacagcgtg 7260agctatgaga
aagcgccacg cttcccgaag ggagaaaggc ggacaggtat ccggtaagcg 7320gcagggtcgg
aacaggagag cgcacgaggg agcttccagg gggaaacgcc tggtatcttt 7380atagtcctgt
cgggtttcgc cacctctgac ttgagcgtcg atttttgtga tgctcgtcag 7440gggggcggag
cctatggaaa aacgccagca acgcggcctt tttacggttc ctggcctttt 7500gctggccttt
tgctcacatg ttctttcctg cgttatcccc tgattctgtg gataaccgta 7560ttaccgcctt
tgagtgagct gataccgctc gccgcagccg aacgaccgag cgcagcgagt 7620cagtgagcga
ggaagcggaa gagcgcccaa tacgcaaacc gcctctcccc gcgcgttggc 7680cgattcatta
atgcagctgg cacgacaggt ttcccgactg gaaagcgggc agtgagcgca 7740acgcaattaa
tgtgagttag ctcactcatt aggcacccca ggctttacac tttatgcttc 7800cggctcgtat
gttgtgtgga attgtgagcg gataacaatt tcacacagga aacagctatg 7860accatga
7867
SEQ ID NO: 57; LENGTH: 935; TYPE: PRT; ORGANISM: Artificial sequence; OTHER INFORMATION: RabChtTal2-Clo051
SEQUENCE: 57
Met Gly Pro Lys Lys
Lys Arg Lys Val Ala Ala Ala Asp Tyr Lys Asp 1 5
10 15 Asp Asp Asp Lys Pro Gly Gly Gly Gly Ser
Gly Gly Gly Gly Val Pro 20 25
30 Ala Ser Pro Ala Ala Gln Val Asp Leu Arg Thr Leu Gly Tyr Ser
Gln 35 40 45 Gln
Gln Gln Glu Lys Ile Lys Pro Lys Val Arg Ser Thr Val Ala Gln 50
55 60 His His Glu Ala Leu Val
Gly His Gly Phe Thr His Ala His Ile Val 65 70
75 80 Ala Leu Ser Gln His Pro Ala Ala Leu Gly Thr
Val Ala Val Lys Tyr 85 90
95 Gln Asp Met Ile Ala Ala Leu Pro Glu Ala Thr His Glu Ala Ile Val
100 105 110 Gly Val
Gly Lys Gln Trp Ser Gly Ala Arg Ala Leu Glu Ala Leu Leu 115
120 125 Thr Val Ala Gly Glu Leu Arg
Gly Pro Pro Leu Gln Ser Gly Leu Asp 130 135
140 Thr Gly Gln Leu Leu Lys Ile Ala Lys Arg Gly Gly
Val Thr Ala Val 145 150 155
160 Glu Ala Val His Ala Trp Arg Asn Ala Leu Thr Gly Ala Pro Leu Asn
165 170 175 Leu Thr Pro
Gln Gln Val Val Ala Ile Ala Ser Asn Asn Gly Gly Lys 180
185 190 Gln Ala Leu Glu Thr Val Gln Arg
Leu Leu Pro Val Leu Cys Gln Ala 195 200
205 His Gly Leu Thr Pro Gln Gln Val Val Ala Ile Ala Ser
Asn Asn Gly 210 215 220
Gly Lys Gln Ala Leu Glu Thr Val Gln Arg Leu Leu Pro Val Leu Cys 225
230 235 240 Gln Ala His Gly
Leu Thr Pro Gln Gln Val Val Ala Ile Ala Ser His 245
250 255 Asp Gly Gly Lys Gln Ala Leu Glu Thr
Val Gln Arg Leu Leu Pro Val 260 265
270 Leu Cys Gln Ala His Gly Leu Thr Pro Gln Gln Val Val Ala
Ile Ala 275 280 285
Ser His Asp Gly Gly Lys Gln Ala Leu Glu Thr Val Gln Arg Leu Leu 290
295 300 Pro Val Leu Cys Gln
Ala His Gly Leu Thr Pro Gln Gln Val Val Ala 305 310
315 320 Ile Ala Ser His Asp Gly Gly Lys Gln Ala
Leu Glu Thr Val Gln Arg 325 330
335 Leu Leu Pro Val Leu Cys Gln Ala His Gly Leu Thr Pro Gln Gln
Val 340 345 350 Val
Ala Ile Ala Ser Asn Lys Gly Gly Lys Gln Ala Leu Glu Thr Val 355
360 365 Gln Arg Leu Leu Pro Val
Leu Cys Gln Ala His Gly Leu Thr Pro Gln 370 375
380 Gln Val Val Ala Ile Ala Ser Asn Lys Gly Gly
Lys Gln Ala Leu Glu 385 390 395
400 Thr Val Gln Arg Leu Leu Pro Val Leu Cys Gln Ala His Gly Leu Thr
405 410 415 Pro Gln
Gln Val Val Ala Ile Ala Ser Asn Gly Gly Gly Lys Gln Ala 420
425 430 Leu Glu Thr Val Gln Arg Leu
Leu Pro Val Leu Cys Gln Ala His Gly 435 440
445 Leu Thr Pro Glu Gln Val Val Ala Ile Ala Ser Asn
Ile Gly Gly Lys 450 455 460
Gln Ala Leu Glu Thr Val Gln Arg Leu Leu Pro Val Leu Cys Gln Ala 465
470 475 480 His Gly Leu
Thr Pro Gln Gln Val Val Ala Ile Ala Ser Asn Lys Gly 485
490 495 Gly Lys Gln Ala Leu Glu Thr Val
Gln Arg Leu Leu Pro Val Leu Cys 500 505
510 Gln Ala His Gly Leu Thr Pro Gln Gln Val Val Ala Ile
Ala Ser Asn 515 520 525
Gly Gly Gly Lys Gln Ala Leu Glu Thr Val Gln Arg Leu Leu Pro Val 530
535 540 Leu Cys Gln Ala
His Gly Leu Thr Pro Gln Gln Val Val Ala Ile Ala 545 550
555 560 Ser Asn Lys Gly Gly Lys Gln Ala Leu
Glu Thr Val Gln Arg Leu Leu 565 570
575 Pro Val Leu Cys Gln Ala His Gly Leu Thr Pro Gln Gln Val
Val Ala 580 585 590
Ile Ala Ser His Asp Gly Gly Lys Gln Ala Leu Glu Thr Val Gln Arg
595 600 605 Leu Leu Pro Val
Leu Cys Gln Ala His Gly Leu Thr Pro Gln Gln Val 610
615 620 Val Ala Ile Ala Ser Asn Lys Gly
Gly Lys Gln Ala Leu Glu Thr Val 625 630
635 640 Gln Arg Leu Leu Pro Val Leu Cys Gln Ala His Gly
Leu Thr Pro Gln 645 650
655 Gln Val Val Ala Ile Ala Ser Asn Gly Gly Gly Arg Pro Ala Leu Glu
660 665 670 Ser Ile Val
Ala Gln Leu Ser Arg Pro Asp Pro Ala Leu Ala Arg Ser 675
680 685 Ala Leu Thr Asn Asp His Leu Val
Ala Leu Ala Cys Leu Gly Gly Arg 690 695
700 Pro Ala Leu Asp Ala Val Lys Lys Gly Leu Pro His Ala
Pro Ala Leu 705 710 715
720 Ile Lys Arg Thr Asn Arg Arg Ile Pro Glu Gly Ser Asp Arg Leu Asn
725 730 735 Glu Gly Ile Lys
Ser Asn Ile Ser Leu Leu Lys Asp Glu Leu Arg Gly 740
745 750 Gln Ile Ser His Ile Ser His Glu Tyr
Leu Ser Leu Ile Asp Leu Ala 755 760
765 Phe Asp Ser Lys Gln Asn Arg Leu Phe Glu Met Lys Val Leu
Glu Leu 770 775 780
Leu Val Asn Glu Tyr Gly Phe Lys Gly Arg His Leu Gly Gly Ser Arg 785
790 795 800 Lys Pro Asp Gly Ile
Val Tyr Ser Thr Thr Leu Glu Asp Asn Phe Gly 805
810 815 Ile Ile Val Asp Thr Lys Ala Tyr Ser Glu
Gly Tyr Ser Leu Pro Ile 820 825
830 Ser Gln Ala Asp Glu Met Glu Arg Tyr Val Arg Glu Asn Ser Asn
Arg 835 840 845 Asp
Glu Glu Val Asn Pro Asn Lys Trp Trp Glu Asn Phe Ser Glu Glu 850
855 860 Val Lys Lys Tyr Tyr Phe
Val Phe Ile Ser Gly Ser Phe Lys Gly Lys 865 870
875 880 Phe Glu Glu Gln Leu Arg Arg Leu Ser Met Thr
Thr Gly Val Asn Gly 885 890
895 Ser Ala Val Asn Val Val Asn Leu Leu Leu Gly Ala Glu Lys Ile Arg
900 905 910 Ser Gly
Glu Met Thr Ile Glu Glu Leu Glu Arg Ala Met Phe Asn Asn 915
920 925 Ser Glu Phe Ile Leu Lys Tyr
930 935
SEQ ID NO: 58; LENGTH: 6607; TYPE: DNA; ORGANISM: Artificial sequence; OTHER INFORMATION: pRab38-chtTAL
SEQUENCE: 58
caccgcatta ccctgggcgt tgaaaccgaa gaagacctgg atttgaaata ggcgttttct
60ttacatttct aaagtgggac tcctcacttg taaaaggaaa aataatgata cttttaagac
120ttccaggatg actaaatggt gtgtatgaga agatttataa acatctgccg ctacttacaa
180tgataagacc acttgtgtgt tgttcagctt ggagaattta ggataggagt ggaggctgaa
240agaaaagtaa gcccttagca tttcctctca ggtggcctct actttaggtc attaacagtt
300gaataggcgc taagagatag cattaccact ttatagaagc ccaggcaaaa ggagattaaa
360gggtttgcct aaattctttc aactctaagg gccagagaag acctaagtct actgctttgc
420tgtttctcaa ggtctcccca actttacaac actgtgtggg tggcaacagg gcttaatagc
480ctcagaagac ctgggtattt ttcgacactc agttctctcc ccggcagaac gtggaaaaca
540aaatccacat aagtttgtgt catggacggg aggcgagaga aaaatctctg tgaaaggagt
600aaagcactgt gcaaatacca gcttgacagg cagtagcact ggggtcccgg gtcctttagc
660ttccagtccc aggagttgct cttgtctcct cccactctgg agtccgcaga gtaggaagga
720ggattaaacc cgggggagga gttccgcacc agctccctat cctgcgccag cacgcctagc
780ctaagcgccc acatagagct ccggtctccg tcggtgccca gccccggctg tgcttcccag
840agcaagctcc aggctccgca agacccgcgg gcctccagga tgcagacacc tcacaaggag
900cacctgtaca agctgctggt gatcggcgac ctggtagtgg gcaagaccag cattattaaa
960cggtacgtgc atcaaaattt ctcctctcat tatcgagcca ccattggtgt ggacttcgcg
1020ctgaaggtgc tccactggga cccagagacg gtggtgcgct tgcagctctg ggacattgct
1080ggtgagcgat cagagcagcg cgcaacgggt gagggtggag tgagccagtg aggagttcgg
1140gggtgaaggt tcggggagtg gaaaatgact tttcagtcgg ttccagtccc gggacccttg
1200agtgcaatca agcaggagat ccggatcgcc tgggcgctcc actcttggaa agtttggctt
1260aatggcttgg aaacctgatt tcaaagaaat ggaagtgttt tcttttcttt ctttcctttt
1320tttttttttt tttttttctt ttgctgttgt ttctgttgga gtcgtcccca ctctacctgt
1380aacttctaga taacttcgct ggctctcact ggctgtgaga aagcgaacca ctttctcctg
1440ggattcttgg gtgcagagaa ggctgtcgcc tggactcaca aggagattgt agtcgcattc
1500ttgtttcatt ctagtccttt tctggacaca ggtagccgcg acttggccca gagtatctca
1560cgtggctttc atccttcgtg tttagagggg aagcccctag gaaatttaag aaggagcagg
1620attatcttag gaatttagtt tctttcaaat ctcactacta tcatctcctt gcttattggc
1680ctcttcagtc agaaaaattt gagatgctaa atttgtatac atctagaacg aactatctct
1740tctcactcca ctcccctctt ccccatctct cttccgtctc cctccatcct tggctatctc
1800ttcttcactt tccatttcaa acaggagact gtgtatgttt tttaggaaaa cattaaaaaa
1860aaaaccacaa aaacaaaaac aaaacggaga cagggtcccg tcatgtaact ctgctaacct
1920atatcaagct gaccttgacc tcatagagac ccacttgcct ctgcctccct agaggcaagg
1980gtcggggtta tggtgatgtt aatgtcgttt gctttaagat tccttgattt gatcttggtg
2040tattttttga gaaatctaaa gtatgaaatc agagtttgac taacagcttc taccagctcc
2100tagccacaat aaagactgag gcaggctata gttagtgctc aatactgggt cctacctggc
2160tgcttgtaac ctgggcatgc ctagcattct agatgctaac tcaccaaagc agtagcattt
2220taagctgcaa atggctaggc agcgacagct caagaatctt cttgctttgg agttttaaac
2280tccaatgaga ttttccatga tccctttcaa ataaccctac ttaatctctc ttcatagccc
2340acagtaccaa gaagcctttg ataagctctg gattgaaaag aagcagttct ttttcaaaag
2400atgtgctcat ttgaactagt gcatttccct ggaaacactt tgccaggact tgagatgggc
2460actaagaagg aaaattcctc aaaggacatg tacagtcttg agatgcattc gcttctgtag
2520ccatgagctt gctggtcttg agataaggtt agttggtgta gctaggttca tggtttggag
2580tctttggcag ttctagagaa gcatgagcta ttagagactt ggagattgca tcaagtagag
2640ccttttgagc ttttcactgt gtacctgggc cctctgtcgc tgcacgtttt agtgtctgaa
2700atgtctttca gctgtagcag ttttctcggg accccagttt aaaatagctt actgtttaaa
2760agatgtagct gtagctagca ttattgaact agcataatta tagtctaaat agcattatgt
2820cttcagcctt gttatatgtt ggtgagtttt agtttcctct tctaaacggg aagaacagaa
2880agatgtaatg attctgagct tccagagtga gacacctcta gagagaaata ccttcttctg
2940aagactaccg tgtgattaca gataaattct gatatctttg tttagctttt gatatctata
3000aacagggagt gtattttatc tctccaaatg agagaagaat aaacaataat gcaaggtaaa
3060ggcaatagtg ctacactcta ggagttacca ctctttgtac atttatttat aaatactaag
3120caagaggaac atgccataca tacactgact aagtcctaac aagtggcagt tcttatatca
3180cacatttatc ttgccctcaa atgccagtcc agcatcagtt tagtctcatg catttggcag
3240cataaggcag tttgagttcc acacttgctc tcagaagcaa tttaactccc acacttggga
3300atcctttcct aagccacagt ttcagaccaa agttttggtg aaggctataa tcacagaagt
3360ctgcacaagt agggagtctg aaggatctga gctccattca gcagtcagag cggcatccaa
3420ccccaaggta atgctcagct cactttgata acttcaagct caaaggccct gaactgctga
3480gttggaggtt gaaagatgtt tgggtaaaag caaggtaatt ggcggatagg atggttgtaa
3540cgtaattgtt tcaagttgta ttagagacct ctgggttcta aggggatatg aaatccaacc
3600tccactctcc actgagattc aagttaggtt aagtatgcct ttgagtaccc tcaagtcaca
3660gcatgccact ctccttttct taactctaat atgtatctat aaagaacggg tagtagtcaa
3720ctgagtcgac ggtatcgata agcttgatcc agcttttgtt ccctttagtg agggttaatt
3780gcgcgcttgg cgtaatcatg gtcatagctg tttcctgtgt gaaattgtta tccgctcaca
3840attccacaca acatacgagc cggaagcata aagtgtaaag cctggggtgc ctaatgagtg
3900agctaactca cattaattgc gttgcgctca ctgcccgctt tccagtcggg aaacctgtcg
3960tgccagctgc attaatgaat cggccaacgc gcggggagag gcggtttgcg tattgggcgc
4020tcttccgctt cctcgctcac tgactcgctg cgctcggtcg ttcggctgcg gcgagcggta
4080tcagctcact caaaggcggt aatacggtta tccacagaat caggggataa cgcaggaaag
4140aacatgtgag caaaaggcca gcaaaaggcc aggaaccgta aaaaggccgc gttgctggcg
4200tttttccata ggctccgccc ccctgacgag catcacaaaa atcgacgctc aagtcagagg
4260tggcgaaacc cgacaggact ataaagatac caggcgtttc cccctggaag ctccctcgtg
4320cgctctcctg ttccgaccct gccgcttacc ggatacctgt ccgcctttct cccttcggga
4380agcgtggcgc tttctcatag ctcacgctgt aggtatctca gttcggtgta ggtcgttcgc
4440tccaagctgg gctgtgtgca cgaacccccc gttcagcccg accgctgcgc cttatccggt
4500aactatcgtc ttgagtccaa cccggtaaga cacgacttat cgccactggc agcagccact
4560ggtaacagga ttagcagagc gaggtatgta ggcggtgcta cagagttctt gaagtggtgg
4620cctaactacg gctacactag aaggacagta tttggtatct gcgctctgct gaagccagtt
4680accttcggaa aaagagttgg tagctcttga tccggcaaac aaaccaccgc tggtagcggt
4740ggtttttttg tttgcaagca gcagattacg cgcagaaaaa aaggatctca agaagatcct
4800ttgatctttt ctacggggtc tgacgctcag tggaacgaaa actcacgtta agggattttg
4860gtcatgagat tatcaaaaag gatcttcacc tagatccttt taaattaaaa atgaagtttt
4920aaatcaatct aaagtatata tgagtaaact tggtctgaca gttaccaatg cttaatcagt
4980gaggcaccta tctcagcgat ctgtctattt cgttcatcca tagttgcctg actccccgtc
5040gtgtagataa ctacgatacg ggagggctta ccatctggcc ccagtgctgc aatgataccg
5100cgagacccac gctcaccggc tccagattta tcagcaataa accagccagc cggaagggcc
5160gagcgcagaa gtggtcctgc aactttatcc gcctccatcc agtctattaa ttgttgccgg
5220gaagctagag taagtagttc gccagttaat agtttgcgca acgttgttgc cattgctaca
5280ggcatcgtgg tgtcacgctc gtcgtttggt atggcttcat tcagctccgg ttcccaacga
5340tcaaggcgag ttacatgatc ccccatgttg tgcaaaaaag cggttagctc cttcggtcct
5400ccgatcgttg tcagaagtaa gttggccgca gtgttatcac tcatggttat ggcagcactg
5460cataattctc ttactgtcat gccatccgta agatgctttt ctgtgactgg tgagtactca
5520accaagtcat tctgagaata gtgtatgcgg cgaccgagtt gctcttgccc ggcgtcaata
5580cgggataata ccgcgccaca tagcagaact ttaaaagtgc tcatcattgg aaaacgttct
5640tcggggcgaa aactctcaag gatcttaccg ctgttgagat ccagttcgat gtaacccact
5700cgtgcaccca actgatcttc agcatctttt actttcacca gcgtttctgg gtgagcaaaa
5760acaggaaggc aaaatgccgc aaaaaaggga ataagggcga cacggaaatg ttgaatactc
5820atactcttcc tttttcaata ttattgaagc atttatcagg gttattgtct catgagcgga
5880tacatatttg aatgtattta gaaaaataaa caaatagggg ttccgcgcac atttccccga
5940aaagtgccac ctaaattgta agcgttaata ttttgttaaa attcgcgtta aatttttgtt
6000aaatcagctc attttttaac caataggccg aaatcggcaa aatcccttat aaatcaaaag
6060aatagaccga gatagggttg agtgttgttc cagtttggaa caagagtcca ctattaaaga
6120acgtggactc caacgtcaaa gggcgaaaaa ccgtctatca gggcgatggc ccactacgtg
6180aaccatcacc ctaatcaagt tttttggggt cgaggtgccg taaagcacta aatcggaacc
6240ctaaagggag cccccgattt agagcttgac ggggaaagcc ggcgaacgtg gcgagaaagg
6300aagggaagaa agcgaaagga gcgggcgcta gggcgctggc aagtgtagcg gtcacgctgc
6360gcgtaaccac cacacccgcc gcgcttaatg cgccgctaca gggcgcgtcc cattcgccat
6420tcaggctgcg caactgttgg gaagggcgat cggtgcgggc ctcttcgcta ttacgccagc
6480tggcgaaagg gggatgtgct gcaaggcgat taagttgggt aacgccaggg ttttcccagt
6540cacgacgttg taaaacgacg gccagtgagc gcgcgtaata cgactcacta tagggcgaat
6600tggagct
6607
SEQ ID NO: 59; LENGTH: 202; TYPE: DNA; ORGANISM: Mus musculus
SEQUENCE: 59
atgcagacac ctcacaagga gcacctgtac aagctgctgg tgatcggcga cctgggtgtg 60
ggcaagacca gcattatcaa gcgctatgtg caccaaaact tctcctcgca ctaccgggcc 120
accattggtg tggacttcgc gctgaaggtg ctccactggg acccagagac ggtggtgcgc 180
ttgcagctct gggacattgc tg 202
SEQ ID NO: 60; LENGTH: 8218; TYPE: DNA; ORGANISM: Artificial sequence; OTHER INFORMATION: pRab38-chtTAL-neo
SEQUENCE: 60
caccgcatta ccctgggcgt tgaaaccgaa gaagacctgg atttgaaata ggcgttttct
60ttacatttct aaagtgggac tcctcacttg taaaaggaaa aataatgata cttttaagac
120ttccaggatg actaaatggt gtgtatgaga agatttataa acatctgccg ctacttacaa
180tgataagacc acttgtgtgt tgttcagctt ggagaattta ggataggagt ggaggctgaa
240agaaaagtaa gcccttagca tttcctctca ggtggcctct actttaggtc attaacagtt
300gaataggcgc taagagatag cattaccact ttatagaagc ccaggcaaaa ggagattaaa
360gggtttgcct aaattctttc aactctaagg gccagagaag acctaagtct actgctttgc
420tgtttctcaa ggtctcccca actttacaac actgtgtggg tggcaacagg gcttaatagc
480ctcagaagac ctgggtattt ttcgacactc agttctctcc ccggcagaac gtggaaaaca
540aaatccacat aagtttgtgt catggacggg aggcgagaga aaaatctctg tgaaaggagt
600aaagcactgt gcaaatacca gcttgacagg cagtagcact ggggtcccgg gtcctttagc
660ttccagtccc aggagttgct cttgtctcct cccactctgg agtccgcaga gtaggaagga
720ggattaaacc cgggggagga gttccgcacc agctccctat cctgcgccag cacgcctagc
780ctaagcgccc acatagagct ccggtctccg tcggtgccca gccccggctg tgcttcccag
840agcaagctcc aggctccgca agacccgcgg gcctccagga tgcagacacc tcacaaggag
900cacctgtaca agctgctggt gatcggcgac ctggtagtgg gcaagaccag cattattaaa
960cggtacgtgc atcaaaatac cgggtagggg aggcgctttt cccaaggcag tctggagcat
1020gcgctttagc agccccgctg ggcacttggc gctacacaag tggcctctgg cctcgcacac
1080attccacatc caccggtagg cgccaaccgg ctccgttctt tggtggcccc ttcgcgccac
1140cttctactcc tcccctagtc aggaagttcc cccccgcccc gcagctcgcg tcgtgcagga
1200cgtgacaaat ggaagtagca cgtctcacta gtctcgtgca gatggacagc accgctgagc
1260aatggaagcg ggtaggcctt tggggcagcg gccaatagca gctttgctcc ttcgctttct
1320gggctcagag gctgggaagg ggtgggtccg ggggcgggct caggggcggg ctcaggggcg
1380gggcgggcgc ccgaaggtcc tccggaggcc cggcattctg cacgcttcaa aagcgcacgt
1440ctgccgcgct gttctcctct tcctcatctc cgggcctttc gacctgcagc caatatggga
1500tcggccattg aacaagatgg attgcacgca ggttctccgg ccgcttgggt ggagaggcta
1560ttcggctatg actgggcaca acagacaatc ggctgctctg atgccgccgt gttccggctg
1620tcagcgcagg ggcgcccggt tctttttgtc aagaccgacc tgtccggtgc cctgaatgaa
1680ctgcaggacg aggcagcgcg gctatcgtgg ctggccacga cgggcgttcc ttgcgcagct
1740gtgctcgacg ttgtcactga agcgggaagg gactggctgc tattgggcga agtgccgggg
1800caggatctcc tgtcatctca ccttgctcct gccgagaaag tatccatcat ggctgatgca
1860atgcggcggc tgcatacgct tgatccggct acctgcccat tcgaccacca agcgaaacat
1920cgcatcgagc gagcacgtac tcggatggaa gccggtcttg tcgatcagga tgatctggac
1980gaagagcatc aggggctcgc gccagccgaa ctgttcgcca ggctcaaggc gcgcatgccc
2040gacggcgatg atctcgtcgt gacccatggc gatgcctgct tgccgaatat catggtggaa
2100aatggccgct tttctggatt catcgactgt ggccggctgg gtgtggcgga ccgctatcag
2160gacatagcgt tggctacccg tgatattgct gaagagcttg gcggcgaatg ggctgaccgc
2220ttcctcgtgc tttacggtat cgccgctccc gattcgcagc gcatcgcctt ctatcgcctt
2280cttgacgagt tcttctgagg ggatcaattc tctagagctc gctgatcagc ctcgactgtg
2340ccttctagtt gccagccatc tgttgtttgc ccctcccccg tgccttcctt gaccctggaa
2400ggtgccactc ccactgtcct ttcctaataa aatgaggaaa ttgcatcgca ttgtctgagt
2460aggtgtcatt ctattctggg gggtggggtg gggcaggaca gcaaggggga ggattgggaa
2520gacaatagca ggcatgctgg ggatgcggtg ggctctatgg cttctgaggc ggaaagaacc
2580agctggggtt tctcctctca ttatcgagcc accattggtg tggacttcgc gctgaaggtg
2640ctccactggg acccagagac ggtggtgcgc ttgcagctct gggacattgc tggtgagcga
2700tcagagcagc gcgcaacggg tgagggtgga gtgagccagt gaggagttcg ggggtgaagg
2760ttcggggagt ggaaaatgac ttttcagtcg gttccagtcc cgggaccctt gagtgcaatc
2820aagcaggaga tccggatcgc ctgggcgctc cactcttgga aagtttggct taatggcttg
2880gaaacctgat ttcaaagaaa tggaagtgtt ttcttttctt tctttccttt tttttttttt
2940ttttttttct tttgctgttg tttctgttgg agtcgtcccc actctacctg taacttctag
3000ataacttcgc tggctctcac tggctgtgag aaagcgaacc actttctcct gggattcttg
3060ggtgcagaga aggctgtcgc ctggactcac aaggagattg tagtcgcatt cttgtttcat
3120tctagtcctt ttctggacac aggtagccgc gacttggccc agagtatctc acgtggcttt
3180catccttcgt gtttagaggg gaagccccta ggaaatttaa gaaggagcag gattatctta
3240ggaatttagt ttctttcaaa tctcactact atcatctcct tgcttattgg cctcttcagt
3300cagaaaaatt tgagatgcta aatttgtata catctagaac gaactatctc ttctcactcc
3360actcccctct tccccatctc tcttccgtct ccctccatcc ttggctatct cttcttcact
3420ttccatttca aacaggagac tgtgtatgtt ttttaggaaa acattaaaaa aaaaaccaca
3480aaaacaaaaa caaaacggag acagggtccc gtcatgtaac tctgctaacc tatatcaagc
3540tgaccttgac ctcatagaga cccacttgcc tctgcctccc tagaggcaag ggtcggggtt
3600atggtgatgt taatgtcgtt tgctttaaga ttccttgatt tgatcttggt gtattttttg
3660agaaatctaa agtatgaaat cagagtttga ctaacagctt ctaccagctc ctagccacaa
3720taaagactga ggcaggctat agttagtgct caatactggg tcctacctgg ctgcttgtaa
3780cctgggcatg cctagcattc tagatgctaa ctcaccaaag cagtagcatt ttaagctgca
3840aatggctagg cagcgacagc tcaagaatct tcttgctttg gagttttaaa ctccaatgag
3900attttccatg atccctttca aataacccta cttaatctct cttcatagcc cacagtacca
3960agaagccttt gataagctct ggattgaaaa gaagcagttc tttttcaaaa gatgtgctca
4020tttgaactag tgcatttccc tggaaacact ttgccaggac ttgagatggg cactaagaag
4080gaaaattcct caaaggacat gtacagtctt gagatgcatt cgcttctgta gccatgagct
4140tgctggtctt gagataaggt tagttggtgt agctaggttc atggtttgga gtctttggca
4200gttctagaga agcatgagct attagagact tggagattgc atcaagtaga gccttttgag
4260cttttcactg tgtacctggg ccctctgtcg ctgcacgttt tagtgtctga aatgtctttc
4320agctgtagca gttttctcgg gaccccagtt taaaatagct tactgtttaa aagatgtagc
4380tgtagctagc attattgaac tagcataatt atagtctaaa tagcattatg tcttcagcct
4440tgttatatgt tggtgagttt tagtttcctc ttctaaacgg gaagaacaga aagatgtaat
4500gattctgagc ttccagagtg agacacctct agagagaaat accttcttct gaagactacc
4560gtgtgattac agataaattc tgatatcttt gtttagcttt tgatatctat aaacagggag
4620tgtattttat ctctccaaat gagagaagaa taaacaataa tgcaaggtaa aggcaatagt
4680gctacactct aggagttacc actctttgta catttattta taaatactaa gcaagaggaa
4740catgccatac atacactgac taagtcctaa caagtggcag ttcttatatc acacatttat
4800cttgccctca aatgccagtc cagcatcagt ttagtctcat gcatttggca gcataaggca
4860gtttgagttc cacacttgct ctcagaagca atttaactcc cacacttggg aatcctttcc
4920taagccacag tttcagacca aagttttggt gaaggctata atcacagaag tctgcacaag
4980tagggagtct gaaggatctg agctccattc agcagtcaga gcggcatcca accccaaggt
5040aatgctcagc tcactttgat aacttcaagc tcaaaggccc tgaactgctg agttggaggt
5100tgaaagatgt ttgggtaaaa gcaaggtaat tggcggatag gatggttgta acgtaattgt
5160ttcaagttgt attagagacc tctgggttct aaggggatat gaaatccaac ctccactctc
5220cactgagatt caagttaggt taagtatgcc tttgagtacc ctcaagtcac agcatgccac
5280tctccttttc ttaactctaa tatgtatcta taaagaacgg gtagtagtca actgagtcga
5340cggtatcgat aagcttgatc cagcttttgt tccctttagt gagggttaat tgcgcgcttg
5400gcgtaatcat ggtcatagct gtttcctgtg tgaaattgtt atccgctcac aattccacac
5460aacatacgag ccggaagcat aaagtgtaaa gcctggggtg cctaatgagt gagctaactc
5520acattaattg cgttgcgctc actgcccgct ttccagtcgg gaaacctgtc gtgccagctg
5580cattaatgaa tcggccaacg cgcggggaga ggcggtttgc gtattgggcg ctcttccgct
5640tcctcgctca ctgactcgct gcgctcggtc gttcggctgc ggcgagcggt atcagctcac
5700tcaaaggcgg taatacggtt atccacagaa tcaggggata acgcaggaaa gaacatgtga
5760gcaaaaggcc agcaaaaggc caggaaccgt aaaaaggccg cgttgctggc gtttttccat
5820aggctccgcc cccctgacga gcatcacaaa aatcgacgct caagtcagag gtggcgaaac
5880ccgacaggac tataaagata ccaggcgttt ccccctggaa gctccctcgt gcgctctcct
5940gttccgaccc tgccgcttac cggatacctg tccgcctttc tcccttcggg aagcgtggcg
6000ctttctcata gctcacgctg taggtatctc agttcggtgt aggtcgttcg ctccaagctg
6060ggctgtgtgc acgaaccccc cgttcagccc gaccgctgcg ccttatccgg taactatcgt
6120cttgagtcca acccggtaag acacgactta tcgccactgg cagcagccac tggtaacagg
6180attagcagag cgaggtatgt aggcggtgct acagagttct tgaagtggtg gcctaactac
6240ggctacacta gaaggacagt atttggtatc tgcgctctgc tgaagccagt taccttcgga
6300aaaagagttg gtagctcttg atccggcaaa caaaccaccg ctggtagcgg tggttttttt
6360gtttgcaagc agcagattac gcgcagaaaa aaaggatctc aagaagatcc tttgatcttt
6420tctacggggt ctgacgctca gtggaacgaa aactcacgtt aagggatttt ggtcatgaga
6480ttatcaaaaa ggatcttcac ctagatcctt ttaaattaaa aatgaagttt taaatcaatc
6540taaagtatat atgagtaaac ttggtctgac agttaccaat gcttaatcag tgaggcacct
6600atctcagcga tctgtctatt tcgttcatcc atagttgcct gactccccgt cgtgtagata
6660actacgatac gggagggctt accatctggc cccagtgctg caatgatacc gcgagaccca
6720cgctcaccgg ctccagattt atcagcaata aaccagccag ccggaagggc cgagcgcaga
6780agtggtcctg caactttatc cgcctccatc cagtctatta attgttgccg ggaagctaga
6840gtaagtagtt cgccagttaa tagtttgcgc aacgttgttg ccattgctac aggcatcgtg
6900gtgtcacgct cgtcgtttgg tatggcttca ttcagctccg gttcccaacg atcaaggcga
6960gttacatgat cccccatgtt gtgcaaaaaa gcggttagct ccttcggtcc tccgatcgtt
7020gtcagaagta agttggccgc agtgttatca ctcatggtta tggcagcact gcataattct
7080cttactgtca tgccatccgt aagatgcttt tctgtgactg gtgagtactc aaccaagtca
7140ttctgagaat agtgtatgcg gcgaccgagt tgctcttgcc cggcgtcaat acgggataat
7200accgcgccac atagcagaac tttaaaagtg ctcatcattg gaaaacgttc ttcggggcga
7260aaactctcaa ggatcttacc gctgttgaga tccagttcga tgtaacccac tcgtgcaccc
7320aactgatctt cagcatcttt tactttcacc agcgtttctg ggtgagcaaa aacaggaagg
7380caaaatgccg caaaaaaggg aataagggcg acacggaaat gttgaatact catactcttc
7440ctttttcaat attattgaag catttatcag ggttattgtc tcatgagcgg atacatattt
7500gaatgtattt agaaaaataa acaaataggg gttccgcgca catttccccg aaaagtgcca
7560cctaaattgt aagcgttaat attttgttaa aattcgcgtt aaatttttgt taaatcagct
7620cattttttaa ccaataggcc gaaatcggca aaatccctta taaatcaaaa gaatagaccg
7680agatagggtt gagtgttgtt ccagtttgga acaagagtcc actattaaag aacgtggact
7740ccaacgtcaa agggcgaaaa accgtctatc agggcgatgg cccactacgt gaaccatcac
7800cctaatcaag ttttttgggg tcgaggtgcc gtaaagcact aaatcggaac cctaaaggga
7860gcccccgatt tagagcttga cggggaaagc cggcgaacgt ggcgagaaag gaagggaaga
7920aagcgaaagg agcgggcgct agggcgctgg caagtgtagc ggtcacgctg cgcgtaacca
7980ccacacccgc cgcgcttaat gcgccgctac agggcgcgtc ccattcgcca ttcaggctgc
8040gcaactgttg ggaagggcga tcggtgcggg cctcttcgct attacgccag ctggcgaaag
8100ggggatgtgc tgcaaggcga ttaagttggg taacgccagg gttttcccag tcacgacgtt
8160gtaaaacgac ggccagtgag cgcgcgtaat acgactcact atagggcgaa ttggagct 8218

<210> 61
<211> 9989
<212> DNA
<213> Artificial sequence
<220>
<223> pCMV-Rab-Reporter (hygro)
<400> 61
gaattcgagc
ttgcatgcct gcaggtcgtt acataactta cggtaaatgg cccgcctggc 60tgaccgccca
acgacccccg cccattgacg tcaataatga cgtatgttcc catagtaacg 120ccaataggga
ctttccattg acgtcaatgg gtggagtatt tacggtaaac tgcccacttg 180gcagtacatc
aagtgtatca tatgccaagt acgcccccta ttgacgtcaa tgacggtaaa 240tggcccgcct
ggcattatgc ccagtacatg accttatggg actttcctac ttggcagtac 300atctacgtat
tagtcatcgc tattaccatg gtgatgcggt tttggcagta catcaatggg 360cgtggatagc
ggtttgactc acggggattt ccaagtctcc accccattga cgtcaatggg 420agtttgtttt
ggcaccaaaa tcaacgggac tttccaaaat gtcgtaacaa ctccgcccca 480ttgacgcaaa
tgggcggtag gcgtgtacgg tgggaggtct atataagcag agctcgttta 540gtgaaccgtc
agatcgcctg gagacgccat ccacgctgtt ttgacctcca tagaagacac 600cgggaccgat
ccagcctccg gactctagag gatccggtac tcgaggacac tgcagagacc 660tacttcacta
acaaccggta tggtcgccag tagcttggca ctggccgtcg ttttacaacg 720tcgtgactgg
gaaaaccctg gcgttaccca acttaatcgc cttgcagcac atcccccttt 780cgccagctgg
cgtaatagcg aagaggcccg caccgatcgc ccttcccaac agttgcgcag 840cctgaatggc
gaatggcgct ttgcctggtt tccggcacca gaagcggtgc cggaaagctg 900gctggagtgc
gatcttcctg aggccgatac tgtcgtcgtc ccctcaaact ggcagatgca 960cggttacgat
gcgcccatct acaccaacgt gacctatccc attacggtca atccgccgtt 1020tgttcccacg
gagaatccga cgggttgtta ctcgctcaca tttaatgttg atgaaagctg 1080gctataaaac
cggtacagtt cggccaccat ggtcgtatca agcgctatgt gcaccaaaac 1140ttctcctcgc
actaccgggc caccattggt cgagtagctt ggcactggcc gtcgttttac 1200aacgtcgtga
ctgggaaaac cctggcgtta cccaacttaa tcgccttgca gcacatcccc 1260ctttcgccag
ctggcgtaat agcgaagagg cccgcaccga tcgcccttcc caacagttgc 1320gcagcctgaa
tggcgaatgg cgctttgcct ggtttccggc accagaagcg gtgccggaaa 1380gctggctgga
gtgcgatctt cctgaggccg atactgtcgt cgtcccctca aactggcaga 1440tgcacggtta
cgatgcgccc atctacacca acgtgaccta tcccattacg gtcaatccgc 1500cgtttgttcc
cacggagaat ccgacgggtt gttactcgct cacatttaat gttgatgaaa 1560gctggctaca
ggaaggccag acgcgaatta tttttgatgg cgttaactcg gcgtttcatc 1620tgtggtgcaa
cgggcgctgg gtcggttacg gccaggacag tcgtttgccg tctgaatttg 1680acctgagcgc
atttttacgc gccggagaaa accgcctcgc ggtgatggtg ctgcgctgga 1740gtgacggcag
ttatctggaa gatcaggata tgtggcggat gagcggcatt ttccgtgacg 1800tctcgttgct
gcataaaccg actacacaaa tcagcgattt ccatgttgcc actcgcttta 1860atgatgattt
cagccgcgct gtactggagg ctgaagttca gatgtgcggc gagttgcgtg 1920actacctacg
ggtaacagtt tctttatggc agggtgaaac gcaggtcgcc agcggcaccg 1980cgcctttcgg
cggtgaaatt atcgatgagc gtggtggtta tgccgatcgc gtcacactac 2040gtctgaacgt
cgaaaacccg aaactgtgga gcgccgaaat cccgaatctc tatcgtgcgg 2100tggttgaact
gcacaccgcc gacggcacgc tgattgaagc agaagcctgc gatgtcggtt 2160tccgcgaggt
gcggattgaa aatggtctgc tgctgctgaa cggcaagccg ttgctgattc 2220gaggcgttaa
ccgtcacgag catcatcctc tgcatggtca ggtcatggat gagcagacga 2280tggtgcagga
tatcctgctg atgaagcaga acaactttaa cgccgtgcgc tgttcgcatt 2340atccgaacca
tccgctgtgg tacacgctgt gcgaccgcta cggcctgtat gtggtggatg 2400aagccaatat
tgaaacccac ggcatggtgc caatgaatcg tctgaccgat gatccgcgct 2460ggctaccggc
gatgagcgaa cgcgtaacgc gaatggtgca gcgcgatcgt aatcacccga 2520gtgtgatcat
ctggtcgctg gggaatgaat caggccacgg cgctaatcac gacgcgctgt 2580atcgctggat
caaatctgtc gatccttccc gcccggtgca gtatgaaggc ggcggagccg 2640acaccacggc
caccgatatt atttgcccga tgtacgcgcg cgtggatgaa gaccagccct 2700tcccggctgt
gccgaaatgg tccatcaaaa aatggctttc gctacctgga gagacgcgcc 2760cgctgatcct
ttgcgaatac gcccacgcga tgggtaacag tcttggcggt ttcgctaaat 2820actggcaggc
gtttcgtcag tatccccgtt tacagggcgg cttcgtctgg gactgggtgg 2880atcagtcgct
gattaaatat gatgaaaacg gcaacccgtg gtcggcttac ggcggtgatt 2940ttggcgatac
gccgaacgat cgccagttct gtatgaacgg tctggtcttt gccgaccgca 3000cgccgcatcc
agcgctgacg gaagcaaaac accagcagca gtttttccag ttccgtttat 3060ccgggcaaac
catcgaagtg accagcgaat acctgttccg tcatagcgat aacgagctcc 3120tgcactggat
ggtggcgctg gatggtaagc cgctggcaag cggtgaagtg cctctggatg 3180tcgctccaca
aggtaaacag ttgattgaac tgcctgaact accgcagccg gagagcgccg 3240ggcaactctg
gctcacagta cgcgtagtgc aaccgaacgc gaccgcatgg tcagaagccg 3300ggcacatcag
cgcctggcag cagtggcgtc tggcggaaaa cctcagtgtg acgctccccg 3360ccgcgtccca
cgccatcccg catctgacca ccagcgaaat ggatttttgc atcgagctgg 3420gtaataagcg
ttggcaattt aaccgccagt caggctttct ttcacagatg tggattggcg 3480ataaaaaaca
actgctgacg ccgctgcgcg atcagttcac ccgtgcaccg ctggataacg 3540acattggcgt
aagtgaagcg acccgcattg accctaacgc ctgggtcgaa cgctggaagg 3600cggcgggcca
ttaccaggcc gaagcagcgt tgttgcagtg cacggcagat acacttgctg 3660atgcggtgct
gattacgacc gctcacgcgt ggcagcatca ggggaaaacc ttatttatca 3720gccggaaaac
ctaccggatt gatggtagtg gtcaaatggc gattaccgtt gatgttgaag 3780tggcgagcga
tacaccgcat ccggcgcgga ttggcctgaa ctgccagctg gcgcaggtag 3840cagagcgggt
aaactggctc ggattagggc cgcaagaaaa ctatcccgac cgccttactg 3900ccgcctgttt
tgaccgctgg gatctgccat tgtcagacat gtataccccg tacgtcttcc 3960cgagcgaaaa
cggtctgcgc tgcgggacgc gcgaattgaa ttatggccca caccagtggc 4020gcggcgactt
ccagttcaac atcagccgct acagtcaaca gcaactgatg gaaaccagcc 4080atcgccatct
gctgcacgcg gaagaaggca catggctgaa tatcgacggt ttccatatgg 4140ggattggtgg
cgacgactcc tggagcccgt cagtatcggc ggaattccag ctgagcgccg 4200gtcgctacca
ttaccagttg gtctggtgtc aggggatccc ccgggctgca gccaatatgg 4260gatcggccat
tgaacaagat ggattgcacg caggttctcc ggccgcttgg gtggagaggc 4320tattcggcta
tgactgggca caacagacaa tcggctgctc tgatgccgcc gtgttccggc 4380tgtcagcgca
ggggcgcccg gttctttttg tcaagaccga cctgtccggt gccctgaatg 4440aactgcagga
cgaggcagcg cggctatcgt ggctggccac gacgggcgtt ccttgcgcag 4500ctgtgctcga
cgttgtcact gaagcgggaa gggactggct gctattgggc gaagtgccgg 4560ggcaggatct
cctgtcatct caccttgctc ctgccgagaa agtatccatc atggctgatg 4620caatgcggcg
gctgcatacg cttgatccgg ctacctgccc attcgaccac caagcgaaac 4680atcgcatcga
gcgagcacgt actcggatgg aagccggtct tgtcgatcag gatgatctgg 4740acgaagagca
tcaggggctc gcgccagccg aactgttcgc caggctcaag gcgcgcatgc 4800ccgacggcga
ggatctcgtc gtgacccatg gcgatgcctg cttgccgaat atcatggtgg 4860aaaatggccg
cttttctgga ttcatcgact gtggccggct gggtgtggcg gaccgctatc 4920aggacatagc
gttggctacc cgtgatattg ctgaagagct tggcggcgaa tgggctgacc 4980gcttcctcgt
gctttacggt atcgccgctc ccgattcgca gcgcatcgcc ttctatcgcc 5040ttcttgacga
gttcttctga ggggatcaat tctctagagc tcgctgatca gcctcgactg 5100tgccttctag
ttgccagcca tctgttgttt gcccctcccc cgtgccttcc ttgaccctgg 5160aaggtgccac
tcccactgtc ctttcctaat aaaatgagga aattgcatcg cattgtctga 5220gtaggtgtca
ttctattctg gggggtgggg tggggcagga cagcaagggg gaggattggg 5280aagacaatag
caggcatgct ggggatgcgg tgggctctat ggcttctgag acggaaagaa 5340ccagctgggg
ctcgatcctc tagagtcgac gtttgatctg atatcatcga tgaattctac 5400cgggtagggg
aggcgctttt cccaaggcag tctggagcat gcgctttagc agccccgctg 5460ggcacttggc
gctacacaag tggcctctgg cctcgcacac attccacatc caccggtagg 5520cgccaaccgg
ctccgttctt tggtggcccc ttcgcgccac cttctactcc tcccctagtc 5580aggaagttcc
cccccgcccc gcagctcgcg tcgtgcagga cgtgacaaat ggaagtagca 5640cgtctcacta
gtctcgtgca gatggacagc accgctgagc aatggaagcg ggtaggcctt 5700tggggcagcg
gccaatagca gctttgctcc ttcgctttct gggctcagag gctgggaagg 5760ggtgggtccg
ggggcgggct caggggcggg ctcaggggcg gggcgggcgc ccgaaggtcc 5820tccggaggcc
cggcattctg cacgcttcaa aagcgcacgt ctgccgcgct gttctcctct 5880tcctcatctc
cgggcctttc gaccgatcca gccgccacca tgaaaaagcc tgaactcacc 5940gcgacgtctg
tcgagaagtt tctgatcgaa aagttcgaca gcgtctccga cctgatgcag 6000ctctcggagg
gcgaagaatc tcgtgctttc agcttcgatg taggagggcg tggatatgtc 6060ctgcgggtaa
atagctgcgc cgatggtttc tacaaagatc gttatgttta tcggcacttt 6120gcatcggccg
cgctcccgat tccggaagtg cttgacattg gggaattcag cgagagcctg 6180acctattgca
tctcccgccg tgcacagggt gtcacgttgc aagacctgcc tgaaaccgaa 6240ctgcccgctg
ttctgcagcc ggtcgcggag gccatggatg cgatcgctgc ggccgatctt 6300agccagacga
gcgggttcgg cccattcgga ccgcaaggaa tcggtcaata cactacatgg 6360cgtgatttca
tatgcgcgat tgctgatccc catgtgtatc actggcaaac tgtgatggac 6420gacaccgtca
gtgcgtccgt cgcgcaggct ctcgatgagc tgatgctttg ggccgaggac 6480tgccccgaag
tccggcacct cgtgcacgcg gatttcggct ccaacaatgt cctgacggac 6540aatggccgca
taacagcggt cattgactgg agcgaggcga tgttcgggga ttcccaatac 6600gaggtcgcca
acatcttctt ctggaggccg tggttggctt gtatggagca gcagacgcgc 6660tacttcgagc
ggaggcatcc ggagcttgca ggatcgccgc ggctccgggc gtatatgctc 6720cgcattggtc
ttgaccaact ctatcagagc ttggttgacg gcaatttcga tgatgcagct 6780tgggcgcagg
gtcgatgcga cgcaatcgtc cgatccggag ccgggactgt cgggcgtaca 6840caaatcgccc
gcagaagcgc ggccgtctgg accgatggct gtgtagaagt actcgccgat 6900agtggaaacc
gacgccccag cactcgtccg agggcaaagg aatagtcgag aaattgatga 6960tctattaaac
aataaagatg tccactaaaa tggaagtttt tcctgtcata ctttgttaag 7020aagggtgaga
acagagtacc tacattttga atggaaggat tggagctacg ggggtggggg 7080tggggtggga
ttagataaat gcctgctctt tactgaaggc tctttactat tgctttatga 7140taatgtttca
tagttggata tcataattta aacaagcaaa accaaattaa gggccagctc 7200attcctccca
ctcatgatct atagatcaaa catgcatgaa gttcctattc cgaagttcct 7260attctctaga
aagtatagga acttcataaa acctgcaggc atgcaagcga tcgcggccgg 7320ccaaggcccg
cggggccact agttctagag cggccagctt ggcgtaatca tggtcatagc 7380tgtttcctgt
gtgaaattgt tatccgctca caattccaca caacatacga gccggaagca 7440taaagtgtaa
agcctggggt gcctaatgag tgagctaact cacattaatt gcgttgcgct 7500cactgcccgc
tttccagtcg ggaaacctgt cgtgccagct gcattaatga atcggccaac 7560gcgcggggag
aggcggtttg cgtattgggc gctcttccgc ttcctcgctc actgactcgc 7620tgcgctcggt
cgttcggctg cggcgagcgg tatcagctca ctcaaaggcg gtaatacggt 7680tatccacaga
atcaggggat aacgcaggaa agaacatgtg agcaaaaggc cagcaaaagg 7740ccaggaaccg
taaaaaggcc gcgttgctgg cgtttttcca taggctccgc ccccctgacg 7800agcatcacaa
aaatcgacgc tcaagtcaga ggtggcgaaa cccgacagga ctataaagat 7860accaggcgtt
tccccctgga agctccctcg tgcgctctcc tgttccgacc ctgccgctta 7920ccggatacct
gtccgccttt ctcccttcgg gaagcgtggc gctttctcat agctcacgct 7980gtaggtatct
cagttcggtg taggtcgttc gctccaagct gggctgtgtg cacgaacccc 8040ccgttcagcc
cgaccgctgc gccttatccg gtaactatcg tcttgagtcc aacccggtaa 8100gacacgactt
atcgccactg gcagcagcca ctggtaacag gattagcaga gcgaggtatg 8160taggcggtgc
tacagagttc ttgaagtggt ggcctaacta cggctacact agaaggacag 8220tatttggtat
ctgcgctctg ctgaagccag ttaccttcgg aaaaagagtt ggtagctctt 8280gatccggcaa
acaaaccacc gctggtagcg gtggtttttt tgtttgcaag cagcagatta 8340cgcgcagaaa
aaaaggatct caagaagatc ctttgatctt ttctacgggg tctgacgctc 8400agtggaacga
aaactcacgt taagggattt tggtcatgag attatcaaaa aggatcttca 8460cctagatcct
tttaaattaa aaatgaagtt ttaaatcaat ctaaagtata tatgagtaaa 8520cttggtctga
cagttaccaa tgcttaatca gtgaggcacc tatctcagcg atctgtctat 8580ttcgttcatc
catagttgcc tgactccccg tcgtgtagat aactacgata cgggagggct 8640taccatctgg
ccccagtgct gcaatgatac cgcgagaccc acgctcaccg gctccagatt 8700tatcagcaat
aaaccagcca gccggaaggg ccgagcgcag aagtggtcct gcaactttat 8760ccgcctccat
ccagtctatt aattgttgcc gggaagctag agtaagtagt tcgccagtta 8820atagtttgcg
caacgttgtt gccattgcta caggcatcgt ggtgtcacgc tcgtcgtttg 8880gtatggcttc
attcagctcc ggttcccaac gatcaaggcg agttacatga tcccccatgt 8940tgtgcaaaaa
agcggttagc tccttcggtc ctccgatcgt tgtcagaagt aagttggccg 9000cagtgttatc
actcatggtt atggcagcac tgcataattc tcttactgtc atgccatccg 9060taagatgctt
ttctgtgact ggtgagtact caaccaagtc attctgagaa tagtgtatgc 9120ggcgaccgag
ttgctcttgc ccggcgtcaa tacgggataa taccgcgcca catagcagaa 9180ctttaaaagt
gctcatcatt ggaaaacgtt cttcggggcg aaaactctca aggatcttac 9240cgctgttgag
atccagttcg atgtaaccca ctcgtgcacc caactgatct tcagcatctt 9300ttactttcac
cagcgtttct gggtgagcaa aaacaggaag gcaaaatgcc gcaaaaaagg 9360gaataagggc
gacacggaaa tgttgaatac tcatactctt cctttttcaa tattattgaa 9420gcatttatca
gggttattgt ctcatgagcg gatacatatt tgaatgtatt tagaaaaata 9480aacaaatagg
ggttccgcgc acatttcccc gaaaagtgcc acctgacgtc taagaaacca 9540ttattatcat
gacattaacc tataaaaata ggcgtatcac gaggcccttt cgtctcgcgc 9600gtttcggtga
tgacggtgaa aacctctgac acatgcagct cccggagacg gtcacagctt 9660gtctgtaagc
ggatgccggg agcagacaag cccgtcaggg cgcgtcagcg ggtgttggcg 9720ggtgtcgggg
ctggcttaac tatgcggcat cagagcagat tgtactgaga gtgcaccata 9780tgcggtgtga
aataccgcac agatgcgtaa ggagaaaata ccgcatcagg cgccattcgc 9840cattcaggct
gcgcaactgt tgggaagggc gatcggtgcg ggcctcttcg ctattacgcc 9900agctggcgaa
agggggatgt gctgcaaggc gattaagttg ggtaacgcca gggttttccc 9960agtcacgacg
ttgtaaaacg acggccagt 9989

<210> 62
<211> 12
<212> DNA
<213> Artificial Sequence
<220>
<223> DNA target sequence
<400> 62
attctgggac gt 12

<210> 63
<211> 14
<212> DNA
<213> Mus musculus
<400> 63
ggtggcccgg tagt 14

<210> 64
<211> 34
<212> PRT
<213> Xanthomonas campestris
<400> 64
Leu Thr Pro Glu Gln Val Val Ala Ile Ala Ser Asn Ile Gly Gly Lys
1               5                   10                  15
Gln Ala Leu Glu Thr Val Gln Arg Leu Leu Pro Val Leu Cys Gln Ala
            20                  25                  30
His Gly

<210> 65
<211> 34
<212> PRT
<213> Xanthomonas campestris
<400> 65
Leu Thr Pro Gln Gln Val Val Ala Ile Ala Ser His Asp Gly Gly Lys
1               5                   10                  15
Gln Ala Leu Glu Thr Val Gln Arg Leu Leu Pro Val Leu Cys Gln Ala
            20                  25                  30
His Gly

<210> 66
<211> 34
<212> PRT
<213> Xanthomonas campestris
<400> 66
Leu Thr Pro Gln Gln Val Val Ala Ile Ala Ser Asn Gly Gly Gly Lys
1               5                   10                  15
Gln Ala Leu Glu Thr Val Gln Arg Leu Leu Pro Val Leu Cys Gln Ala
            20                  25                  30
His Gly

<210> 67
<211> 34
<212> PRT
<213> Xanthomonas campestris
<400> 67
Leu Thr Pro Gln Gln Val Val Ala Ile Ala Ser Asn Asn Gly Gly Lys
1               5                   10                  15
Gln Ala Leu Glu Thr Val Gln Ala Leu Leu Pro Val Leu Cys Gln Ala
            20                  25                  30
His Gly