Patent application title: MODIFIED CASCADE RIBONUCLEOPROTEINS AND USES THEREOF
Inventors:
Stan Johan Jozef Brouns (Wageningen, NL)
John Van Der Oost (Renkum, NL)
IPC8 Class: AC12N922FI
USPC Class:
435199
Class name: Hydrolase (3. ) acting on ester bond (3.1) ribonuclease (3.1.4)
Publication date: 2016-06-30
Patent application number: 20160186152
Abstract:
A clustered regularly interspaced short palindromic repeat
(CRISPR)-associated complex for adaptive antiviral defence (Cascade); the
Cascade protein complex comprising at least CRISPR-associated protein
subunits Cas7, Cas5 and Cas6 which includes at least one subunit with an
additional amino acid sequence possessing nucleic acid or chromatin
modifying, visualising, transcription activating or transcription
repressing activity. The Cascade complex with additional activity is
combined with an RNA molecule to produce a ribonucleoprotein complex. The
RNA molecule is selected to have substantial complementarity to a target
sequence. Targeted ribonucleoproteins can be used as genetic engineering
tools for precise cutting of nucleic acids in homologous recombination,
non-homologous end joining, gene modification, gene integration, mutation
repair or for their visualisation, transcriptional activation or
repression. A pair of ribonucleoproteins fused to FokI dimers may be used to
generate double-strand breakages in the DNA to facilitate these
applications in a sequence-specific manner.
Claims:
1-46. (canceled)
47. A composition comprising a designed fusion of (i) at least one clustered regularly interspaced short palindromic repeat (CRISPR)-associated protein subunit, and (ii) a nuclease, or a mutant or an active portion thereof.
48. The composition of claim 47, wherein the nuclease is a ribonuclease, or a mutant or an active portion thereof.
49. The composition of claim 48, wherein the ribonuclease is an endonuclease, a 3' exonuclease or a 5' exonuclease.
50. The composition of claim 47, wherein the CRISPR-associated protein subunit is selected from the group consisting of Cas6, Cas5, Cse1, Cse2 and Cas7.
51. The composition of claim 50, wherein the CRISPR-associated protein subunit is Cse1.
52. The composition of claim 47, wherein the nuclease is an endonuclease, or mutant or an active portion thereof.
53. The composition of claim 52, wherein the endonuclease comprises FokI or a modified FokI.
54. The composition of claim 53, wherein the modified FokI is KKR Sharkey or ELD Sharkey or a combination thereof.
55. The composition of claim 47, wherein the nuclease is a FokI KKR Sharkey, further comprising a second composition comprising a second designed fusion of (i) at least one clustered regularly interspaced short palindromic repeat (CRISPR)-associated protein subunit, and (ii) a nuclease, or a mutant or an active portion thereof, wherein the nuclease is a FokI ELD Sharkey.
56. The composition of claim 47, further comprising a second composition comprising a second designed fusion of (i) at least one clustered regularly interspaced short palindromic repeat (CRISPR)-associated protein subunit, and (ii) a nuclease, or a mutant or an active portion thereof.
57. The composition of claim 50, wherein the nuclease comprises FokI or a modified FokI.
58. The composition of claim 57, wherein the modified FokI is KKR Sharkey or ELD Sharkey or a combination thereof.
59. The composition of claim 51, wherein the nuclease comprises FokI or a modified FokI.
60. The composition of claim 59, wherein the modified FokI is KKR Sharkey or ELD Sharkey or a combination thereof.
61. The composition of claim 47, wherein the designed fusion further comprises a linker polypeptide between (i) the clustered regularly interspaced short palindromic repeat (CRISPR)-associated protein subunit, and (ii) the nuclease, or the mutant or the active portion thereof.
62. The composition of claim 47, wherein (i) the clustered regularly interspaced short palindromic repeat (CRISPR)-associated protein subunit and (ii) the nuclease, or the mutant or the active portion thereof, are covalently linked by chemical synthesis.
63. One or more nucleic acid molecules encoding the composition of claim 47.
64. One or more expression vectors comprising the nucleic acid molecules of claim 63.
Description:
[0001] This application is a continuation of U.S. patent application Ser.
No. 14/326,099, filed 8 Jul. 2014, now abandoned, which is a continuation
of U.S. patent application Ser. No. 14/240,735, filed 24 Feb. 2014, now
pending, which is a National Stage Entry of PCT/EP2012/076674, filed 21
Dec. 2012, now expired, which claims the benefit of priority under 35
U.S.C. 119(a)/(b) of United Kingdom Patent Application No. GB 1122458.1,
filed 30 Dec. 2011, now expired, the contents of all of which are herein
incorporated by reference in their entireties.
[0002] The present application contains a Sequence Listing that has been submitted electronically in ASCII format and is hereby incorporated by reference in its entirety. The ASCII copy, created on 15 Jan. 2016, is named CBI010-13 ST25.txt and is 119 kb in size.
[0003] The invention relates to the field of genetic engineering and more particularly to the area of gene and/or genome modification of organisms, including prokaryotes and eukaryotes. The invention also concerns methods of making site specific tools for use in methods of genome analysis and genetic modification, whether in vivo or in vitro. The invention more particularly relates to the field of ribonucleoproteins which recognise and associate with nucleic acid sequences in a sequence specific way.
[0004] Bacteria and archaea have a wide variety of defense mechanisms against invasive DNA. So called CRISPR/Cas defense systems provide adaptive immunity by integrating plasmid and viral DNA fragments in loci of clustered regularly interspaced short palindromic repeats (CRISPR) on the host chromosome. The viral or plasmid-derived sequences, known as spacers, are separated from each other by repeating host-derived sequences. These repetitive elements are the genetic memory of this immune system and each CRISPR locus contains a diverse repertoire of unique `spacer` sequences acquired during previous encounters with foreign genetic elements.
[0005] Acquisition of foreign DNA is the first step of immunization, but protection requires that the CRISPR is transcribed and that these long transcripts are processed into short CRISPR-derived RNAs (crRNAs) that each contains a unique spacer sequence complementary to a foreign nucleic acid challenger.
[0006] In addition to the crRNA, genetic experiments in several organisms have revealed that a unique set of CRISPR-associated (Cas) proteins is required for the steps of acquiring immunity, for crRNA biogenesis and for targeted interference. Also, a subset of Cas proteins from phylogenetically distinct CRISPR systems have been shown to assemble into large complexes that include a crRNA.
[0007] A recent re-evaluation of the diversity of CRISPR/Cas systems has resulted in a classification into three distinct types (Makarova K. et al (2011) Nature Reviews Microbiology--AOP 9 May 2011; doi:10.1038/nrmicro2577) that vary in cas gene content, and display major differences throughout the CRISPR defense pathway. (The Makarova classification and nomenclature for CRISPR-associated genes is adopted in the present specification.) RNA transcripts of CRISPR loci (pre-crRNA) are cleaved specifically in the repeat sequences by CRISPR associated (Cas) endoribonucleases in type I and type III systems or by RNase III in type II systems; the generated crRNAs are utilized by a Cas protein complex as a guide RNA to detect complementary sequences of either invading DNA or RNA. Cleavage of target nucleic acids has been demonstrated in vitro for the Pyrococcus furiosus type III-B system, which cleaves RNA in a ruler-anchored mechanism, and, more recently, in vivo for the Streptococcus thermophilus type II system, which cleaves DNA in the complementary target sequence (protospacer). In contrast, for type I systems the mechanism of CRISPR-interference is still largely unknown.
[0008] The model organism Escherichia coli strain K12 possesses a CRISPR/Cas type I-E system (previously known as CRISPR subtype E (Cse)). It contains eight cas genes (cas1, cas2, cas3 and cse1, cse2, cas7, cas5, cas6e) and a downstream CRISPR (type-2 repeats). In Escherichia coli K12 the eight cas genes are encoded upstream of the CRISPR locus. Cas1 and Cas2 do not appear to be needed for target interference, but are likely to participate in new target sequence acquisition. In contrast, six Cas proteins: Cse1, Cse2, Cas3, Cas7, Cas5 and Cas6e (previously also known as CasA, CasB, Cas3, CasC/Cse4, CasD and CasE/Cse3 respectively) are essential for protection against lambda phage challenge. Five of these proteins: Cse1, Cse2, Cas7, Cas5 and Cas6e (previously known as CasA, CasB, CasC/Cse4, CasD and CasE/Cse3 respectively) assemble with a crRNA to form a multi-subunit ribonucleoprotein (RNP) referred to as Cascade.
[0009] In E. coli, Cascade is a 405 kDa ribonucleoprotein complex composed of an unequal stoichiometry of five functionally essential Cas proteins: Cse1.sub.1Cse2.sub.2Cas7.sub.6Cas5.sub.1Cas6e.sub.1 (i.e. under previous nomenclature CasA.sub.1B.sub.2C.sub.6D.sub.1E.sub.1) and a 61-nt CRISPR-derived RNA. Cascade is an obligate RNP that relies on the crRNA for complex assembly and stability, and for the identification of invading nucleic acid sequences. Cascade is a surveillance complex that finds and binds foreign nucleic acids that are complementary to the spacer sequence of the crRNA.
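As a rough plausibility check on the stated stoichiometry and the approximate 405 kDa mass, the following sketch sums the subunit lengths given for SEQ ID NO:1 to 5 in this specification (502, 160, 363, 224 and 199 amino acids), weighted by copy number, and adds the 61-nt crRNA. The average masses of 110 Da per amino acid residue and 330 Da per ribonucleotide are generic approximations assumed here for illustration; they are not values from the specification, and all names in the code are hypothetical.

```python
# Subunit name -> (length in amino acids, copies per complex).
# Lengths and copy numbers are those stated in this specification.
SUBUNITS = {
    "Cse1": (502, 1),
    "Cse2": (160, 2),
    "Cas7": (363, 6),
    "Cas5": (224, 1),
    "Cas6e": (199, 1),
}

AVG_RESIDUE_DA = 110.0      # assumption: rough average mass of one residue
AVG_NUCLEOTIDE_DA = 330.0   # assumption: rough average mass of one ribonucleotide
CRRNA_LENGTH_NT = 61        # crRNA length stated in the specification


def total_protein_subunits() -> int:
    """Number of protein subunits in one complex."""
    return sum(copies for _, copies in SUBUNITS.values())


def estimate_cascade_mass_kda() -> float:
    """Crude mass estimate (kDa) for the protein subunits plus one crRNA."""
    residues = sum(length * copies for length, copies in SUBUNITS.values())
    protein_da = residues * AVG_RESIDUE_DA
    rna_da = CRRNA_LENGTH_NT * AVG_NUCLEOTIDE_DA
    return (protein_da + rna_da) / 1000.0


if __name__ == "__main__":
    print(total_protein_subunits(), "protein subunits")
    print(round(estimate_cascade_mass_kda(), 1), "kDa (approximate)")
```

Under these assumptions the estimate falls within a few percent of the stated 405 kDa, which is about as close as an average-residue-mass approximation can be expected to come.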
[0010] Jore et al. (2011) entitled "Structural basis for CRISPR RNA-guided DNA recognition by Cascade" Nature Structural & Molecular Biology 18: 529-537 describes how there is a cleavage of the pre-crRNA transcript by the Cas6e subunit of Cascade, resulting in the mature 61 nt crRNA being retained by the CRISPR complex. The crRNA serves as a guide RNA for sequence specific binding of Cascade to double stranded (ds) DNA molecules through base pairing between the crRNA spacer and the complementary protospacer, forming a so-called R-loop. This is known to be an ATP-independent process.
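The processing of a pre-crRNA transcript into mature crRNAs by cleavage within each repeat can be sketched as follows. This is a minimal illustration only: the 29-nt repeat shown, the 32-nt toy spacers and the cut position 21 nt into each repeat are assumptions chosen so that each product is 61 nt long, consistent with the crRNA length stated above; none of these values is taken from the specification, and the function name is hypothetical.

```python
def process_pre_crrna(pre_crrna: str, repeat: str, cut_offset: int) -> list[str]:
    """Cut once inside every repeat and return the fragments lying between
    two consecutive cuts (each fragment carries one complete spacer)."""
    cut_sites = []
    start = pre_crrna.find(repeat)
    while start != -1:
        cut_sites.append(start + cut_offset)
        # Continue searching after the repeat that was just found.
        start = pre_crrna.find(repeat, start + len(repeat))
    return [pre_crrna[a:b] for a, b in zip(cut_sites, cut_sites[1:])]


# Illustrative inputs (assumptions, see the paragraph above).
REPEAT = "GAGUUCCCCGCGCCAGCGGGGAUAAACCG"   # 29 nt
SPACERS = ["AC" * 16, "UG" * 16]           # two toy 32-nt spacers
PRE_CRRNA = REPEAT + SPACERS[0] + REPEAT + SPACERS[1] + REPEAT

CRRNAS = process_pre_crrna(PRE_CRRNA, REPEAT, cut_offset=21)

if __name__ == "__main__":
    for crrna in CRRNAS:
        # Layout of each product: 8 nt of repeat, 32-nt spacer, 21 nt of repeat.
        print(len(crrna), crrna[:8], crrna[8:40], crrna[40:])
```

With these toy inputs each product consists of the last 8 nt of one repeat, a complete spacer, and the first 21 nt of the next repeat.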
[0011] Brouns S. J. J., et al (2008) entitled "Small CRISPR RNAs guide antiviral defense in prokaryotes" Science 321: 960-964 teaches that Cascade loaded with a crRNA requires Cas3 for in vivo phage resistance.
[0012] Marraffini L. & Sontheimer E. (2010) entitled "CRISPR interference: RNA-directed adaptive immunity in bacteria and archaea" Nature Reviews Genetics 11: 181-190 is a review article which summarises the state of knowledge in the art in the field. Some suggestions are made about CRISPR-based applications and technologies, but this is mainly in the area of generating phage resistant strains of domesticated bacteria for the dairy industry. The specific cleavage of RNA molecules in vitro by a crRNP complex in Pyrococcus furiosus is suggested as something which awaits further development. Manipulation of CRISPR systems is also suggested as a possible way of reducing transmission of antibiotic-resistant bacterial strains in hospitals. The authors stress that further research effort will be needed to explore the potential utility of the technology in these areas.
[0013] US2011236530 A1 (Manoury et al.) entitled "Genetic cluster of strains of Streptococcus thermophilus having unique rheological properties for dairy fermentation" discloses certain S. thermophilus strains which ferment milk so that it is highly viscous and weakly ropy. A specific CRISPR locus of defined sequence is disclosed.
[0014] US2011217739 A1 (Terns et al.) entitled "Cas6 polypeptides and methods of use" discloses polypeptides which have Cas6 endoribonuclease activity. The polypeptides cleave a target RNA polynucleotide having a Cas6 recognition domain and cleavage site. Cleavage may be carried out in vitro or in vivo. Microbes such as E. coli or Haloferax volcanii are genetically modified so as to express Cas6 endoribonuclease activity.
[0015] WO2010054154 (Danisco) entitled "Bifidobacteria CRISPR sequences" discloses various CRISPR sequences found in Bifidobacteria and their use in making genetically altered strains of the bacteria which are altered in their phage resistance characteristics.
[0016] US2011189776 A1 (Terns et al.) entitled "Prokaryotic RNAi-like system and methods of use" describes methods of inactivating target polynucleotides in vitro or in prokaryotic microbes in vivo. The methods use a psiRNA having a 5' region of 5-10 nucleotides chosen from a repeat from a CRISPR locus immediately upstream of a spacer. The 3' region is substantially complementary to a portion of the target polynucleotide. Also described are polypeptides having endonuclease activity in the presence of psiRNA and target polynucleotide.
[0017] EP2341149 A1 (Danisco) entitled "Use of CRISPR associated genes (CAS)" describes how one or more Cas genes can be used for modulating resistance of bacterial cells against bacteriophage; particularly bacteria which provide a starter culture or probiotic culture in dairy products.
[0018] WO2010075424 (The Regents of the University of California) entitled "Compositions and methods for downregulating prokaryotic genes" discloses an isolated polynucleotide comprising a CRISPR array. At least one spacer of the CRISPR is complementary to a gene of a prokaryote so that it can down-regulate expression of the gene; particularly where the gene is associated with biofuel production.
[0019] WO2008108989 (Danisco) entitled "Cultures with improved phage resistance" discloses selecting bacteriophage resistant strains of bacteria and also selecting the strains which have an additional spacer having 100% identity with a region of phage RNA. Improved strain combinations and starter culture rotations are described for use in the dairy industry. Certain phages are described for use as biocontrol agents.
[0020] WO2009115861 (Institut Pasteur) entitled "Molecular typing and subtyping of Salmonella by identification of the variable nucleotide sequences of the CRISPR loci" discloses methods for detecting and identifying bacteria of the Salmonella genus by using their variable nucleotide sequences contained in CRISPR loci.
[0021] WO2006073445 (Danisco) entitled "Detection and typing of bacterial strains" describes detecting and typing of bacterial strains in food products, dietary supplements and environmental samples. Strains of Lactobacillus are identified through specific CRISPR nucleotide sequences.
[0022] Urnov F et al. (2010) entitled "Genome editing with engineered zinc finger nucleases" Nature Reviews Genetics 11: 636-646 is a review article about zinc finger nucleases and how they have been instrumental in the field of reverse genetics in a range of model organisms. Zinc finger nucleases have been developed so that precisely targeted genome cleavage is possible, followed by gene modification in the subsequent repair process. However, zinc finger nucleases are generated by fusing a number of zinc finger DNA-binding domains to a DNA cleavage domain. DNA sequence specificity is achieved by coupling several zinc fingers in series, each recognising a three nucleotide motif. A significant drawback with the technology is that new zinc fingers need to be developed for each new DNA locus that is to be cleaved. This requires protein engineering and extensive screening to ensure specificity of DNA binding.
[0023] In the fields of genetic engineering and genomic research there is an ongoing need for improved agents for sequence/site specific nucleic acid detection and/or cleavage.
[0024] The inventors have made a surprising discovery in that certain bacteria expressing Cas3, which has helicase-nuclease activity, express Cas3 as a fusion with Cse1. The inventors have also unexpectedly been able to produce artificial fusions of Cse1 with other nuclease enzymes.
[0025] The inventors have also discovered that Cas3-independent target DNA recognition by Cascade marks DNA for cleavage by Cas3, and that Cascade DNA binding is governed by topological requirements of the target DNA.
[0026] The inventors have further found that Cascade is unable to bind relaxed target plasmids, but surprisingly Cascade displays high affinity for targets which have a negatively supercoiled (nSC) topology.
[0027] Accordingly in a first aspect the present invention provides a clustered regularly interspaced short palindromic repeat (CRISPR)-associated complex for antiviral defence (Cascade), the Cascade protein complex, or portion thereof, comprising at least CRISPR-associated protein subunits:
[0028] Cas7 (or COG 1857) having an amino acid sequence of SEQ ID NO:3 or a sequence of at least 18% identity therewith,
[0029] Cas5 (or COG1688) having an amino acid sequence of SEQ ID NO:4 or a sequence of at least 17% identity therewith, and
[0030] Cas6 (or COG 1583) having an amino acid sequence of SEQ ID NO:5 or a sequence of at least 16% identity therewith; and wherein at least one of the subunits includes an additional amino acid sequence providing nucleic acid or chromatin modifying, visualising, transcription activating or transcription repressing activity.
[0031] A subunit which includes an additional amino acid sequence having nucleic acid or chromatin modifying, visualising, transcription activating or transcription repressing activity is an example of what may be termed "a subunit linked to at least one functional moiety"; a functional moiety being the polypeptide or protein made up of the additional amino acid sequence. The transcription activating activity may be that leading to activation or upregulation of a desired gene; the transcription repressing activity may be that leading to repression or downregulation of a desired gene. The gene is selected by targeting the Cascade complex of the invention with an RNA molecule, as described further below.
[0032] The additional amino acid sequence having nucleic acid or chromatin modifying, visualising, transcription activating or transcription repressing activity is preferably formed of contiguous amino acid residues. These additional amino acids may be viewed as a polypeptide or protein which is contiguous and forms part of the Cas or Cse subunit(s) concerned. Such a polypeptide or protein sequence is preferably not normally part of any Cas or Cse subunit amino acid sequence. In other words, the additional amino acid sequence having nucleic acid or chromatin modifying, visualising, transcription activating or transcription repressing activity may be other than a Cas or Cse subunit amino acid sequence, or portion thereof, i.e. may be other than a Cas3 subunit amino acid sequence or portion thereof.
[0033] The additional amino acid sequence with nucleic acid or chromatin modifying, visualising, transcription activating or transcription repressing activity may, as desired, be obtained or derived from the same organism, e.g. E. coli, as the Cas or Cse subunit(s).
[0034] Additionally and/or alternatively to the above, the additional amino acid sequence having nucleic acid or chromatin modifying, visualising, transcription activating or transcription repressing activity may be "heterologous" to the amino acid sequence of the Cas or Cse subunit(s). Therefore, the additional amino acid sequence may be obtained or derived from an organism different from the organism from which the Cas and/or Cse subunit(s) are derived or originate.
[0035] Throughout, sequence identity may be determined by way of BLAST and subsequent Cobalt multiple sequence alignment at the National Center for Biotechnology Information webserver, where the sequence in question is compared to a reference sequence (e.g. SEQ ID NO: 3, 4 or 5). The amino acid sequences may be defined in terms of percentage sequence similarity based on a BLOSUM62 matrix or percentage identity with a given reference sequence (e.g. SEQ ID NO:3, 4 or 5). The similarity or identity of a sequence involves an initial step of making the best alignment before calculating the percentage conservation with the reference and reflects a measure of evolutionary relationship of sequences.
[0036] Cas7 may have a sequence similarity of at least 31% with SEQ ID NO:3; Cas5 may have a sequence similarity of at least 26% with SEQ ID NO:4. Cas6 may have a sequence similarity of at least 27% with SEQ ID NO:5.
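The percentage identity calculation that follows the alignment step can be illustrated by the minimal sketch below. It assumes that the two sequences have already been aligned (for example by BLAST or Cobalt) and are supplied as equal-length strings with "-" marking gaps. Counting every alignment column that is not a gap in both sequences in the denominator is one common convention and is an assumption here; the specification does not state which denominator is used. The function names and the short example alignment are hypothetical.

```python
def percent_identity(aligned_query: str, aligned_reference: str) -> float:
    """Percentage of alignment columns in which both sequences carry the
    same residue. Columns that are gaps in both sequences are ignored."""
    if len(aligned_query) != len(aligned_reference):
        raise ValueError("aligned sequences must have equal length")
    matches = 0
    columns = 0
    for q, r in zip(aligned_query, aligned_reference):
        if q == "-" and r == "-":
            continue
        columns += 1
        if q == r:  # a gap paired with a residue never counts as a match
            matches += 1
    return 100.0 * matches / columns if columns else 0.0


def meets_identity_threshold(aligned_query: str, aligned_reference: str,
                             threshold_percent: float) -> bool:
    """True if the aligned pair reaches a lower limit such as 18%."""
    return percent_identity(aligned_query, aligned_reference) >= threshold_percent


if __name__ == "__main__":
    # Toy alignment: 8 columns, 6 identical positions.
    print(percent_identity("MYLSK-VI", "MYLTKAVI"))
```

A percentage similarity would be computed in the same way, except that a column also counts when the two residues score positively in the chosen substitution matrix (BLOSUM62 in the text above).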
TABLE-US-00001
For Cse1/CasA (502 AA):
>gi|16130667|ref|NP_417240.1| CRISP RNA (crRNA) containing Cascade antiviral complex protein [Escherichia coli str. K-12 substr. MG1655] [SEQ ID NO: 1]
MNLLIDNWIPVRPRNGGKVQIINLQSLYCSRDQWRLSLPRDDMELAALA
LLVCIGQIIAPAKDDVEFRHRIMNPLTEDEFQQLIAPWIDMFYLNHAEH
PFMQTKGVKANDVTPMEKLLAGVSGATNCAFVNQPGQGEALCGGCTAIA
LFNQANQAPGFGGGFKSGLRGGTPVTTFVRGIDLRSTVLLNVLTLPRLQ
KQFPNESHTENQPTWIKPIKSNESIPASSIGFVRGLFWQPAHIELCDPI
GIGKCSCCGQESNLRYTGFLKEKFTFTVNGLWPHPHSPCLVTVKKGEVE
EKFLAFTTSAPSWTQISRVVVDKIIQNENGNRVAAVVNQFRNIAPQSPL
ELIMGGYRNNQASILERRHDVLMFNQGWQQYGNVINEIVTVGLGYKTAL
RKALYTFAEGFKNKDFKGAGVSVHETAERHFYRQSELLIPDVLANVNFS
QADEVIADLRDKLHQLCEMLFNQSVAPYAHHPKLISTLALARATLYKHL
RELKPQGGPSNG

For Cse2/CasB (160 AA):
>gi|16130666|ref|NP_417239.1| CRISP RNA (crRNA) containing Cascade antiviral complex protein [Escherichia coli str. K-12 substr. MG1655] [SEQ ID NO: 2]
MADEIDAMALYRAWQQLDNGSCAQIRRVSEPDELRDIPAFYRLVQPFGW
ENPRHQQALLRMVFCLSAGKNVIRHQDKKSEQTTGISLGRALANSGRIN
ERRIFQLIRADRTADMVQLRRLLTHAEPVLDWPLMARMLTWWGKRERQQ
LLEDFVLTTNKNA

For Cas7/CasC/Cse4 (363 AA):
>gi|16130665|ref|NP_417238.1| CRISP RNA (crRNA) containing Cascade antiviral complex protein [Escherichia coli str. K-12 substr. MG1655] [SEQ ID NO: 3]
MSNFINIHVLISHSPSCLNRDDMNMQKDAIFGGKRRVRISSQSLKRAMR
KSGYYAQNIGESSLRTIHLAQLRDVLRQKLGERFDQKIIDKTLALLSGK
SVDEAEKISADAVTPWVVGEIAWFCEQVAKAEADNLDDKKLLKVLKEDI
AAIRVNLQQGVDIALSGRMATSGMMTELGKVDGAMSIAHAITTHQVDSD
IDWFTAVDDLQEQGSAHLGTQEFSSGVFYRYANINLAQLQENLGGASRE
QALEIATHVVHMLATEVPGAKQRTYAAFNPADMVMVNFSDMPLSMANAF
EKAVKAKDGFLQPSIQAFNQYWDRVANGYGLNGAAAQFSLSDVDPITAQ
VKQMPTLEQLKSWVRNNGEA

For Cas5/CasD (224 AA):
>gi|90111483|ref|NP_417237.2| CRISP RNA (crRNA) containing Cascade antiviral complex protein [Escherichia coli str. K-12 substr. MG1655] [SEQ ID NO: 4]
MRSYLILRLAGPMQAWGQPTFEGTRPTGRFPTRSGLLGLLGACLGIQRD
DTSSLQALSESVQFAVRCDELILDDRRVSVTGLRDYHTVLGAREDYRGL
KSHETIQTWREYLCDASFTVALWLTPHATMVISELEKAVLKPRYTPYLG
RRSCPLTHPLFLGTCQASDPQKALLNYEPVGGDIYSEESVTGHHLKFTA
RDEPMITLPRQFASREWYVIKGGMDVSQ

For Cas6e/CasE (199 AA):
>gi|16130663|ref|NP_417236.1| CRISPR RNA precursor cleavage enzyme; CRISP RNA (crRNA) containing Cascade antiviral complex protein [Escherichia coli str. K-12 substr. MG1655] [SEQ ID NO: 5]
MYLSKVIIARAWSRDLYQLHQGLWHLFPNRPDAARDFLFHVEKRNTPEG
CHVLLQSAQMPVSTAVATVIKTKQVEFQLQVGVPLYFRLRANPIKTILD
NQKRLDSKGNIKRCRVPLIKEAEQIAWLQRKLGNAARVEDVHPISERPQ
YFSGDGKSGKIQTVCFEGVLTINDAPALIDLVQQGIGPAKSMGCGLLSL
APL
[0037] In defining the range of sequence variants which fall within the scope of the invention, for the avoidance of doubt, the following are each optional limits on the extent of variation, to be applied for each of SEQ ID NO:1, 2, 3, 4 or 5 starting from the respective broadest range of variants as specified in terms of the respective percentage identity above. The range of variants may therefore include: at least 16%, or at least 17%, or at least 18%, or at least 19%, or at least 20%, or at least 21%, or at least 22%, or at least 23%, or at least 24%, or at least 25%, or at least 26%, or at least 27%, or at least 28%, or at least 29%, or at least 30%, or at least 31%, or at least 32%, or at least 33%, or at least 34%, or at least 35%, or at least 36%, or at least 37%, or at least 38%, or at least 39%, or at least 40%, or at least 41%, or at least 42%, or at least 43%, or at least 44%, or at least 45%, or at least 46%, or at least 47%, or at least 48%, or at least 49%, or at least 50%, or at least 51%, or at least 52%, or at least 53%, or at least 54%, or at least 55%, or at least 56%, or at least 57%, or at least 58%, or at least 59%, or at least 60%, or at least 61%, or at least 62%, or at least 63%, or at least 64%, or at least 65%, or at least 66%, or at least 67%, or at least 68%, or at least 69%, or at least 70%, or at least 71%, or at least 72%, or at least 73%, or at least 74%, or at least 75%, or at least 76%, or at least 77%, or at least 78%, or at least 79%, or at least 80%, or at least 81%, or at least 82%, or at least 83%, or at least 84%, or at least 85%, or at least 86%, or at least 87%, or at least 88%, or at least 89%, or at least 90%, or at least 91%, or at least 92%, or at least 93%, or at least 94%, or at least 95%, or at least 96%, or at least 97%, or at least 98%, or at least 99%, or 100% amino acid sequence identity.
[0038] Throughout, the Makarova et al. (2011) nomenclature is being used in the definition of the Cas protein subunits. Table 2 on page 5 of the Makarova et al. article lists the Cas genes and the names of the families and superfamilies to which they belong. Throughout, reference to a Cas protein or Cse protein subunit includes cross reference to the family or superfamily of which these subunits form part.
[0039] Throughout, the reference sequences of the Cas and Cse subunits of the invention may be defined as a nucleotide sequence encoding the amino acid sequence. For example, the amino acid sequence of SEQ ID NO:3 for Cas7 also includes all nucleic acid sequences which encode that amino acid sequence. The variants of Cas7 included within the scope of the invention therefore include nucleotide sequences of at least the defined amino acid percentage identities or similarities with the reference nucleic acid sequence; as well as all possible percentage identities or similarities between that lower limit and 100%.
[0040] The Cascade complexes of the invention may be made up of subunits derived or modified from more than one different bacterial or archaeal prokaryote. Also, the subunits from different Cas subtypes may be mixed.
[0041] In a preferred aspect, the Cas6 subunit is a Cas6e subunit of SEQ ID NO: 17 below, or a sequence of at least 16% identity therewith.
TABLE-US-00002
The sequence of a preferred Cas6e subunit is
>gi|16130663|ref|NP_417236.1| CRISPR RNA precursor cleavage enzyme; CRISP RNA (crRNA) containing Cascade antiviral complex protein [Escherichia coli str. K-12 substr. MG1655]: [SEQ ID NO: 17]
MYLSKVIIARAWSRDLYQLHQGLWHLFPNRPDAARDFLFHVEKRNTPEG
CHVLLQSAQMPVSTAVATVIKTKQVEFQLQVGVPLYFRLRANPIKTILD
NQKRLDSKGNIKRCRVPLIKEAEQIAWLQRKLGNAARVEDVHPISERPQ
YFSGDGKSGKIQTVCFEGVLTINDAPALIDLVQQGIGPAKSMGCGLLSL
APL
[0042] The Cascade complexes, or portions thereof, of the invention, which comprise at least one subunit which includes an additional amino acid sequence having nucleic acid or chromatin modifying, visualising, transcription activating or transcription repressing activity, may further comprise a Cse2 (or YgcK-like) subunit having an amino acid sequence of SEQ ID NO:2 or a sequence of at least 20% identity therewith, or a portion thereof. Alternatively, the Cse2 subunit is defined as having at least 38% similarity with SEQ ID NO:2. Optionally, within the protein complex of the invention it is the Cse2 subunit which includes the additional amino acid sequence having nucleic acid or chromatin modifying activity.
[0043] Additionally or alternatively, the Cascade complexes of the invention may further comprise a Cse1 (or YgcL-like) subunit having an amino acid sequence of SEQ ID NO: 1 or a sequence of at least 9% identity therewith, or a portion thereof. Optionally within the protein complex of the invention it is the Cse1 subunit which includes the additional amino acid sequence having nucleic acid or chromatin modifying, visualising, transcription activating or transcription repressing activity.
[0044] In preferred embodiments, a Cascade complex of the invention is a Type I CRISPR-Cas system protein complex; more preferably a subtype I-E CRISPR-Cas protein complex or it can be based on a Type I-A or Type I-B complex. A Type I-C, D or F complex is possible. In particularly preferred embodiments based on the E. coli system, the subunits may have the following stoichiometries: Cse1.sub.1Cse2.sub.2Cas7.sub.6Cas5.sub.1 Cas6.sub.1 or Cse1.sub.1Cse2.sub.2Cas7.sub.6Cas5.sub.1Cas6e.sub.1.
[0045] The additional amino acid sequence having nucleic acid or chromatin modifying, visualising, transcription activating or transcription repressing activity may be translationally fused through expression in natural or artificial protein expression systems, or covalently linked by a chemical synthesis step to the at least one subunit; preferably the at least one functional moiety is fused or linked to at least the region of the N terminus and/or the region of the C terminus of at least one of a Cse1, Cse2, Cas7, Cas5, Cas6 or Cas6e subunit. In particularly preferred embodiments, the additional amino acid sequence having nucleic acid or chromatin modifying activity is fused or linked to the N terminus or the C terminus of a Cse1, a Cse2 or a Cas5 subunit; more preferably the linkage is in the region of the N terminus of a Cse1 subunit, the N terminus of a Cse2 subunit, or the N terminus of a Cas7 subunit.
[0046] The additional amino acid sequence having nucleic acid or chromatin modifying, activating, repressing or visualising activity may be a protein; optionally selected from a helicase, a nuclease, a nuclease-helicase, a DNA methyltransferase (e.g. Dam), or DNA demethylase, a histone methyltransferase, a histone demethylase, an acetylase, a deacetylase, a phosphatase, a kinase, a transcription (co-)activator, an RNA polymerase subunit, a transcription repressor, a DNA binding protein, a DNA structuring protein, a marker protein, a reporter protein, a fluorescent protein, a ligand binding protein (e.g. mCherry or a heavy metal binding protein), a signal peptide (e.g. Tat-signal sequence), a subcellular localisation sequence (e.g. nuclear localisation sequence) or an antibody epitope.
[0047] The protein concerned may be a heterologous protein from a species other than the bacterial species from which the Cascade protein subunits have their sequence origin.
[0048] When the protein is a nuclease, it may be one selected from a type II restriction endonuclease such as FokI, or a mutant or an active portion thereof. Other type II restriction endonucleases which may be used include EcoRI, EcoRV, BglI, BamHI, BsgI and BspMI. Preferably, one protein complex of the invention may be fused to the N terminal domain of FokI and another protein complex of the invention may be fused to the C terminal domain of FokI. These two protein complexes may then be used together to achieve an advantageous locus specific double stranded cut in a nucleic acid, whereby the location of the cut in the genetic material is at the design and choice of the user, as guided by the RNA component (defined and described below) and due to presence of a so-called "protospacer adjacent motif" (PAM) sequence in the target nucleic acid strand (also described in more detail below).
[0049] In a preferred embodiment, a protein complex of the invention has an additional amino acid sequence which is a modified restriction endonuclease, e.g. FokI. The modification is preferably in the catalytic domain. In preferred embodiments, the modified FokI is KKR Sharkey or ELD Sharkey which is fused to the Cse1 protein of the protein complex. In a preferred application of these complexes of the invention, two of these complexes (KKR Sharkey and ELD Sharkey) may be used together in combination. A heterodimer pair of protein complexes employing differently modified FokI has particular advantage in targeted double stranded cutting of nucleic acid. If homodimers are used then it is possible that there is more cleavage at non-target sites due to non-specific activity. A heterodimer approach advantageously increases the fidelity of the cleavage in a sample of material.
[0050] The Cascade complex with additional amino acid sequence having nucleic acid or chromatin modifying, visualising, transcription activating or transcription repressing activity defined and described above is a component part of an overall system of the invention which advantageously permits the user to select in a predetermined manner a precise genetic locus which is desired to be cleaved, tagged or otherwise altered in some way, e.g. methylation, using any of the nucleic acid or chromatin modifying, visualising, transcription activating or transcription repressing entities defined herein. The other component part of the system is an RNA molecule which acts as a guide for directing the Cascade complex of the invention to the correct locus on DNA or RNA intended to be modified, cut or tagged.
[0051] The Cascade complex of the invention preferably also comprises an RNA molecule which comprises a ribonucleotide sequence of at least 50% identity to a desired target nucleic acid sequence, and wherein the protein complex and the RNA molecule form a ribonucleoprotein complex. Preferably the ribonucleoprotein complex forms when the RNA molecule is hybridized to its intended target nucleic acid sequence. The ribonucleoprotein complex forms when the necessary components of Cascade-functional moiety combination and RNA molecule and nucleic acid (DNA or RNA) are present together in suitable physiological conditions, whether in vivo or in vitro. Without wishing to be bound by any particular theory, the inventors believe that in the context of dsDNA, particularly negatively supercoiled DNA, the Cascade complex associating with the dsDNA causes a partial unwinding of the duplex strands which then allows the RNA to associate with one strand; the whole ribonucleoprotein complex then migrates along the DNA strand until a target sequence substantially complementary to at least a portion of the RNA sequence is reached, at which point a stable interaction between RNA and DNA strand occurs, and the function of the functional moiety takes effect, whether by modifying, nuclease cutting or tagging of the DNA at that locus.
[0052] In preferred embodiments, a portion of the RNA molecule has at least 50% identity to the target nucleic acid sequence; more preferably at least 95% identity to the target sequence. In more preferred embodiments, the portion of the RNA molecule is substantially complementary along its length to the target DNA sequence; i.e. there are only one, two, three, four or five mismatches which may be contiguous or non-contiguous. The RNA molecule (or portion thereof) may have at least 51%, or at least 52%, or at least 53%, or at least 54%, or at least 55%, or at least 56%, or at least 57%, or at least 58%, or at least 59%, or at least 60%, or at least 61%, or at least 62%, or at least 63%, or at least 64%, or at least 65%, or at least 66%, or at least 67%, or at least 68%, or at least 69%, or at least 70%, or at least 71%, or at least 72%, or at least 73%, or at least 74%, or at least 75%, or at least 76%, or at least 77%, or at least 78%, or at least 79%, or at least 80%, or at least 81%, or at least 82%, or at least 83%, or at least 84%, or at least 85%, or at least 86%, or at least 87%, or at least 88%, or at least 89%, or at least 90%, or at least 91%, or at least 92%, or at least 93%, or at least 94%, or at least 95%, or at least 96%, or at least 97%, or at least 98%, or at least 99%, or 100% identity to the target sequence.
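By way of illustration only, the identity thresholds and mismatch counts discussed above can be evaluated with a short script. The following is a minimal sketch and not part of the described method: it assumes the RNA portion and the target are already aligned over equal length without gaps, and it treats RNA "U" as equivalent to DNA "T". The function names are hypothetical.

```python
def _normalise(seq: str) -> str:
    """Upper-case the sequence and write RNA 'U' as 'T' (assumption for comparison)."""
    return seq.upper().replace("U", "T")


def mismatch_positions(rna: str, target: str) -> list:
    """Return the 1-based positions at which the two aligned sequences differ."""
    a, b = _normalise(rna), _normalise(target)
    if not a or len(a) != len(b):
        raise ValueError("sequences must be non-empty and of equal length")
    return [i for i, (x, y) in enumerate(zip(a, b), start=1) if x != y]


def percent_identity(rna: str, target: str) -> float:
    """Percentage of aligned positions that are identical."""
    mismatches = mismatch_positions(rna, target)
    length = len(rna)
    return 100.0 * (length - len(mismatches)) / length


def meets_identity_threshold(rna: str, target: str, threshold: float = 50.0) -> bool:
    """True if identity is at least the chosen threshold (50% by default)."""
    return percent_identity(rna, target) >= threshold
```

For example, `percent_identity("AAAA", "AAAT")` evaluates to 75.0 and `mismatch_positions("AAAA", "AAAT")` to `[4]`. A gapped alignment would require a proper alignment tool instead of this position-by-position comparison.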
[0053] The target nucleic acid may be DNA (ss or ds) or RNA.
[0054] In other preferred embodiments, the RNA molecule or portion thereof has at least 70% identity with the target nucleic acid. At such levels of identity, the target nucleic acid is preferably dsDNA.
[0055] The RNA molecule will preferably require a high specificity and affinity for the target nucleic acid sequence. A dissociation constant (Kd) in the range 1 pM to 1 μM, preferably 1-100 nM, is desirable, as determined preferably by native gel electrophoresis, or alternatively by isothermal titration calorimetry, surface plasmon resonance, or fluorescence based titration methods. Affinity may be determined using an electrophoretic mobility shift assay (EMSA), also called gel retardation assay (see Semenova E et al. (2011) Proc. Natl. Acad. Sci. USA 108: 10098-10103).
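For orientation, the relationship between a dissociation constant and the fraction of target bound can be sketched as follows. This is a minimal sketch assuming a simple one-site binding model, in which fraction bound = [free complex] / (Kd + [free complex]); the text above does not specify a fitting model, so the model and the function names are assumptions made for illustration.

```python
def fraction_bound(free_complex_nM: float, kd_nM: float) -> float:
    """Fraction of target bound under a simple one-site binding model (assumed)."""
    if free_complex_nM < 0 or kd_nM <= 0:
        raise ValueError("concentration must be >= 0 and Kd must be > 0")
    return free_complex_nM / (kd_nM + free_complex_nM)


def kd_in_desirable_range(kd_nM: float) -> bool:
    """True if Kd lies in the range 1 pM to 1 uM (0.001 nM to 1000 nM)."""
    return 0.001 <= kd_nM <= 1000.0


def kd_in_preferred_range(kd_nM: float) -> bool:
    """True if Kd lies in the preferred range 1-100 nM."""
    return 1.0 <= kd_nM <= 100.0
```

Under this model, half of the target is bound when the free complex concentration equals Kd, so `fraction_bound(10.0, 10.0)` evaluates to 0.5.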
[0056] The RNA molecule is preferably modelled on what are known from nature in prokaryotes as CRISPR RNA (crRNA) molecules. The structure of crRNA molecules is already established and explained in more detail in Jore et al. (2011) Nature Structural & Molecular Biology 18: 529-537. In brief, a mature crRNA of type I-E is often 61 nucleotides long and consists of a 5' "handle" region of 8 nucleotides, the "spacer" sequence of 32 nucleotides, and a 3' sequence of 21 nucleotides which form a hairpin with a tetranucleotide loop. However, the RNA used in the invention does not have to be designed strictly to the design of naturally occurring crRNA, whether in length, regions or specific RNA sequences. What is clear though, is that RNA molecules for use in the invention may be designed based on gene sequence information in the public databases or newly discovered, and then made artificially, e.g. by chemical synthesis in whole or in part. The RNA molecules of the invention may also be designed and produced by way of expression in genetically modified cells or cell free expression systems and this option may include synthesis of some or all of the RNA sequence.
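The segment layout of a typical mature type I-E crRNA described above (8 + 32 + 21 = 61 nucleotides) can be sketched in code. This is an illustrative sketch only: the function name is hypothetical, and, as stated above, RNA used in the invention need not follow these natural lengths.

```python
# Segment lengths of a typical mature type I-E crRNA, as stated in the text.
HANDLE_5P_LEN = 8    # 5' handle
SPACER_LEN = 32      # spacer
HAIRPIN_3P_LEN = 21  # 3' hairpin-forming sequence
TOTAL_LEN = HANDLE_5P_LEN + SPACER_LEN + HAIRPIN_3P_LEN  # 61


def split_crrna(crrna: str) -> dict:
    """Split a 61-nt crRNA (written 5' to 3') into handle, spacer and 3' hairpin."""
    seq = crrna.upper()
    if len(seq) != TOTAL_LEN:
        raise ValueError(f"expected {TOTAL_LEN} nucleotides, got {len(seq)}")
    spacer_end = HANDLE_5P_LEN + SPACER_LEN
    return {
        "handle_5p": seq[:HANDLE_5P_LEN],
        "spacer": seq[HANDLE_5P_LEN:spacer_end],
        "hairpin_3p": seq[spacer_end:],
    }
```

A placeholder input such as `"A" * 8 + "C" * 32 + "G" * 21` is split into segments of 8, 32 and 21 residues; the placeholder is not a real crRNA sequence.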
[0057] The structure and requirements of crRNA has also been described in Semenova E et al. (2011) Proc. Natl. Acad. Sci. USA 108: 10098-10103. There is a so-called "SEED" portion forming the 5' end of the spacer sequence and which is flanked 5' thereto by the 5' handle of 8 nucleotides. Semenova et al. (2011) have found that all residues of the SEED sequence should be complementary to the target sequence, although for the residue at position 6, a mismatch may be tolerated. Similarly, when designing and making an RNA component of a ribonucleoprotein complex of the invention directed at a target locus (i.e. sequence), the necessary match and mismatch rules for the SEED sequence can be applied.
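The SEED match and mismatch rule described above can be expressed as a simple check. The following is a minimal sketch under stated assumptions: the SEED is taken to be the first 8 residues of the spacer (an assumed length), the target is DNA, and the target-strand segment is supplied already aligned to spacer position 1 (i.e. written 3' to 5'), so that residues are compared position by position. Only Watson-Crick pairs are counted as matches. All names are hypothetical.

```python
# RNA base -> DNA base it pairs with (Watson-Crick pairs only; an assumption).
PAIRS = {"A": "T", "U": "A", "G": "C", "C": "G"}


def seed_mismatches(spacer_rna: str, target_aligned: str, seed_len: int = 8) -> list:
    """Return 1-based SEED positions where spacer and target do not base pair.

    target_aligned is the DNA target strand written 3'->5', aligned so that its
    first residue faces position 1 of the spacer.
    """
    seed = spacer_rna.upper()[:seed_len]
    tgt = target_aligned.upper()[:seed_len]
    if len(seed) < seed_len or len(tgt) < seed_len:
        raise ValueError("both sequences must cover the full SEED length")
    return [i for i, (r, d) in enumerate(zip(seed, tgt), start=1) if PAIRS.get(r) != d]


def seed_is_acceptable(spacer_rna: str, target_aligned: str,
                       seed_len: int = 8, tolerated=(6,)) -> bool:
    """True if every SEED mismatch falls at a tolerated position (position 6)."""
    return all(pos in tolerated
               for pos in seed_mismatches(spacer_rna, target_aligned, seed_len))
```

With the placeholder spacer start `ACGUACGU`, the aligned target `TGCATGCA` gives no mismatches, `TGCATACA` gives a mismatch only at position 6 and is accepted, and `TGAATGCA` gives a mismatch at position 3 and is rejected.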
[0058] The invention therefore includes a method of detecting and/or locating a single base change in a target nucleic acid molecule comprising contacting a nucleic acid sample with a ribonucleoprotein complex of the invention as hereinbefore described, or with a Cascade complex and separate RNA component of the invention as hereinbefore described, and wherein the sequence of the RNA component (including when in the ribonucleoprotein complex) is such that it discriminates between a normal allele and a mutant allele by virtue of a single base change at position 6 of a contiguous sequence of 8 nucleotide residues.
[0059] In embodiments of the invention, the RNA molecule may have a length in the range of 35-75 residues. In preferred embodiments, the portion of the RNA which is complementary to and used for targeting a desired nucleic acid sequence is 32 or 33 residues long. (In the context of a naturally occurring crRNA, this would correspond to the spacer portion; as shown in FIG. 1 of Semenova et al. (2011)).
[0060] A ribonucleoprotein complex of the invention may additionally have an RNA component comprising 8 residues 5' to the RNA sequence which has at least substantial complementarity to the nucleic acid target sequence. (The RNA sequence having at least substantial complementarity to the nucleic acid target sequence would be understood to correspond in the context of a crRNA as being the spacer sequence. The 5' flanking sequence of the RNA would be considered to correspond to the 5' handle of a crRNA. This is shown in FIG. 1 of Semenova et al. (2011)).
[0061] A ribonucleoprotein complex of the invention may have a hairpin and tetranucleotide loop forming sequence 3' to the RNA sequence which has at least substantial complementarity to the DNA target sequence. (In the context of crRNA, this would correspond to a 3' handle flanking the spacer sequence as shown in FIG. 1 of Semenova et al. (2011)).
[0062] In some embodiments, the RNA may be a CRISPR RNA (crRNA).
[0063] The Cascade proteins and complexes of the invention may be characterised in vitro in terms of their activity of association with the RNA guiding component to form a ribonucleoprotein complex in the presence of the target nucleic acid (which may be DNA or RNA). An electrophoretic mobility shift assay (EMSA) may be used as a functional assay for interaction of complexes of the invention with their nucleic acid targets. Basically, a Cascade-functional moiety complex of the invention is mixed with nucleic acid targets and the stable interaction of the Cascade-functional moiety complex is monitored by EMSA or by a specific readout of the functional moiety, for example endonucleolytic cleavage of target DNA at the desired site. This can be determined by further restriction fragment length analysis using commercially available enzymes with known specificities and cleavage sites in a target DNA molecule.
[0064] Visualisation of binding of Cascade proteins or complexes of the invention to DNA or RNA in the presence of guiding RNA may be achieved using scanning/atomic force microscopy (SFM/AFM) imaging and this may provide an assay for the presence of functional complexes of the invention.
[0065] The invention also provides a nucleic acid molecule encoding at least one clustered regularly interspaced short palindromic repeat (CRISPR)-associated protein subunit selected from:
[0066] a. a Cse1 subunit having an amino acid sequence of SEQ ID NO: 1 or a sequence of at least 9% identity therewith;
[0067] b. a Cse2 subunit having an amino acid sequence of SEQ ID NO:2 or a sequence of at least 20% identity therewith;
[0068] c. a Cas7 subunit having an amino acid sequence of SEQ ID NO:3 or a sequence of at least 18% identity therewith;
[0069] d. a Cas5 subunit having an amino acid sequence of SEQ ID NO:4 or a sequence of at least 17% identity therewith;
[0070] e. a Cas6 subunit having an amino acid sequence of SEQ ID NO:5 or a sequence of at least 16% identity therewith; and wherein at least a, b, c, d or e includes an additional amino acid sequence having nucleic acid or chromatin modifying, visualising, transcription activating or transcription repressing activity.
[0071] The additional amino acid sequence having nucleic acid or chromatin modifying, visualising, transcription activating or transcription repressing activity is preferably fused to the CRISPR-associated protein subunit.
[0072] In the nucleic acids of the invention defined above, the nucleotide sequence may be that which encodes the respective SEQ ID NO: 1, SEQ ID NO:2, SEQ ID NO:3, SEQ ID NO:4 or SEQ ID NO:5, or in defining the range of variant sequences thereto, it may be a sequence hybridisable to that nucleotide sequence, preferably under stringent conditions, more preferably very high stringency conditions. A variety of stringent hybridisation conditions will be familiar to the skilled reader in the field. Hybridization of a nucleic acid molecule occurs when two complementary nucleic acid molecules undergo an amount of hydrogen bonding to each other known as Watson-Crick base pairing. The stringency of hybridization can vary according to the environmental (i.e. chemical/physical/biological) conditions surrounding the nucleic acids, temperature, the nature of the hybridization method, and the composition and length of the nucleic acid molecules used. Calculations regarding hybridization conditions required for attaining particular degrees of stringency are discussed in Sambrook et al., Molecular Cloning: A Laboratory Manual (Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N. Y., 2001); and Tijssen, Laboratory Techniques in Biochemistry and Molecular Biology--Hybridization with Nucleic Acid Probes Part I, Chapter 2 (Elsevier, New York, 1993). The Tm is the temperature at which 50% of a given strand of a nucleic acid molecule is hybridized to its complementary strand. The following is an exemplary set of hybridization conditions and is not limiting:
Very High Stringency (Allows Sequences that Share at Least 90% Identity to Hybridize)
Hybridization: 5×SSC at 65° C. for 16 hours
Wash twice: 2×SSC at room temperature (RT) for 15 minutes each
Wash twice: 0.5×SSC at 65° C. for 20 minutes each

High Stringency (Allows Sequences that Share at Least 80% Identity to Hybridize)
Hybridization: 5×-6×SSC at 65° C.-70° C. for 16-20 hours
Wash twice: 2×SSC at RT for 5-20 minutes each
Wash twice: 1×SSC at 55° C.-70° C. for 30 minutes each

Low Stringency (Allows Sequences that Share at Least 50% Identity to Hybridize)
Hybridization: 6×SSC at RT to 55° C. for 16-20 hours
Wash at least twice: 2×-3×SSC at RT to 55° C. for 20-30 minutes each.
[0073] The nucleic acid molecule may be an isolated nucleic acid molecule and may be an RNA or a DNA molecule.
[0074] The additional amino acid sequence may be selected from a helicase, a nuclease, a nuclease-helicase (e.g. Cas3), a DNA methyltransferase (e.g. Dam), a DNA demethylase, a histone methyltransferase, a histone demethylase, an acetylase, a deacetylase, a phosphatase, a kinase, a transcription (co-)activator, an RNA polymerase subunit, a transcription repressor, a DNA binding protein, a DNA structuring protein, a marker protein, a reporter protein, a fluorescent protein, a ligand binding protein (e.g. mCherry or a heavy metal binding protein), a signal peptide (e.g. Tat-signal sequence), a subcellular localisation sequence (e.g. nuclear localisation sequence), or an antibody epitope. The additional amino acid sequence may be, or may be from, a different protein from the organism from which the relevant Cascade protein subunit(s) are derived.
[0075] The invention includes an expression vector comprising a nucleic acid molecule as hereinbefore defined. One expression vector may contain the nucleotide sequence encoding a single Cascade protein subunit and also the nucleotide sequence encoding the additional amino acid sequence, whereby on expression the subunit and additional sequence are fused. Other expression vectors may comprise nucleotide sequences encoding just one or more Cascade protein subunits which are not fused to any additional amino acid sequence.
[0076] The additional amino acid sequence with nucleic acid or chromatin modifying activity may be fused to any of the Cascade subunits via a linker polypeptide. The linker may be of any length up to about 60 or up to about 100 amino acid residues. Preferably the linker has a number of amino acids in the range 10 to 60, more preferably 10-20. The amino acids are preferably polar and/or small and/or charged amino acids (e.g. Gln, Ser, Thr, Pro, Ala, Glu, Asp, Lys, Arg, His, Asn, Cys, Tyr). The linker peptide is preferably designed to obtain the correct spacing and positioning of the fused functional moiety and the subunit of Cascade to which the moiety is fused to allow proper interaction with the target nucleotide.
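The linker length ranges and the exemplified residue types given above can be checked for a candidate linker as sketched below. This is an illustrative sketch only: the residue list in the text is given by way of example ("e.g."), so a residue outside it is flagged here but not excluded, and "about 100" is treated as a hard limit of 100 purely for simplicity. The function names and category labels are hypothetical.

```python
# One-letter codes for the exemplified residues:
# Gln, Ser, Thr, Pro, Ala, Glu, Asp, Lys, Arg, His, Asn, Cys, Tyr.
EXEMPLIFIED_RESIDUES = set("QSTPAEDKRHNCY")


def linker_length_class(n_residues: int) -> str:
    """Classify a linker length against the ranges given in the text."""
    if 10 <= n_residues <= 20:
        return "more preferred (10-20)"
    if 10 <= n_residues <= 60:
        return "preferred (10-60)"
    if 1 <= n_residues <= 100:  # "up to about 100" treated as <= 100 (assumption)
        return "permitted (up to about 100)"
    return "outside described range"


def unlisted_residues(linker: str) -> list:
    """Return, sorted, the residues in the linker that are not in the exemplified list."""
    return sorted(set(linker.upper()) - EXEMPLIFIED_RESIDUES)
```

For example, a 15-residue linker falls in the more preferred range, and `unlisted_residues("SGSG")` returns `["G"]` because glycine is not among the exemplified residues.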
[0077] An expression vector of the invention (with or without nucleotide sequence encoding amino acid residues which on expression will be fused to a Cascade protein subunit) may further comprise a sequence encoding an RNA molecule as hereinbefore defined. Consequently, such expression vectors can be used in an appropriate host to generate a ribonucleoprotein of the invention which can target a desired nucleotide sequence.
[0078] Accordingly, the invention also provides a method of modifying, visualising, or activating or repressing transcription of a target nucleic acid comprising contacting the nucleic acid with a ribonucleoprotein complex as hereinbefore defined. The modifying may be by cleaving the nucleic acid or binding to it.
[0079] The invention also includes a method of modifying, visualising, or activating or repressing transcription of a target nucleic acid comprising contacting the nucleic acid with a Cascade protein complex as hereinbefore defined, plus an RNA molecule as hereinbefore defined.
[0080] In accordance with the above methods, the modification, visualising, or activating or repressing transcription of a target nucleic acid may therefore be carried out in vitro and in a cell free environment; i.e. the method is carried out as a biochemical reaction whether free in solution or whether involving a solid phase. Target nucleic acid may be bound to a solid phase, for example.
[0081] In a cell free environment, the order of adding each of the target nucleic acid, the Cascade protein complex and the RNA molecule is at the option of the average skilled person. The three components may be added simultaneously, sequentially in any desired order, or separately at different times and in a desired order. Thus it is possible for the target nucleic acid and RNA to be added simultaneously to a reaction mix and then the Cascade protein complex of the invention to be added separately and later in a sequence of specific method steps.
[0082] The modification, visualising, or activating or repressing transcription of a target nucleic acid may be made in situ in a cell, whether an isolated cell or as part of a multicellular tissue, organ or organism. Therefore in the context of whole tissue and organs, and in the context of an organism, the method can be carried out in vivo or it can be carried out by isolating a cell from the whole tissue, organ or organism and then returning the cell treated with ribonucleoprotein complex to its former location, or a different location, whether within the same or a different organism. Thus the method would include allografts, autografts, isografts and xenografts.
[0083] In these embodiments, the ribonucleoprotein complex or the Cascade protein complex of the invention requires an appropriate form of delivery into the cell, which will be well known to persons of skill in the art, including microinjection, whether into the cell cytoplasm or into the nucleus.
[0084] Also when present separately, the RNA molecule requires an appropriate form of delivery into a cell, whether simultaneously, separately or sequentially with the Cascade protein complex. Such forms of introducing RNA into cells are well known to a person of skill in the art and may include in vitro or ex vivo delivery via conventional transfection methods. Physical methods, such as microinjection and electroporation, as well as calcium co-precipitation, and commercially available cationic polymers and lipids, and cell-penetrating peptides, cell-penetrating particles (gene-gun) may each be used. For example, viruses may be used as delivery vehicles, whether to the cytoplasm and/or nucleus--e.g. via the (reversible) fusion of Cascade protein complex of the invention or a ribonucleoprotein complex of the invention to the viral particle. Viral delivery (e.g. adenovirus delivery) or Agrobacterium-mediated delivery may be used.
[0085] The invention also includes a method of modifying, visualising, or activating or repressing transcription of a target nucleic acid in a cell, comprising transfecting, transforming or transducing the cell with any of the expression vectors as hereinbefore described. The methods of transfection, transformation or transduction are of the types well known to a person of skill in the art. Where there is one expression vector used to generate expression of a Cascade complex of the invention and when the RNA is added directly to the cell then the same or a different method of transfection, transformation or transduction may be used. Similarly, when there is one expression vector being used to generate expression of a Cascade-functional fusion complex of the invention and when another expression vector is being used to generate the RNA in situ via expression, then the same or a different method of transfection, transformation or transduction may be used.
[0086] In other embodiments, mRNA encoding the Cascade complex of the invention is introduced into a cell so that the Cascade complex is expressed in the cell. The RNA which guides the Cascade complex to the desired target sequence is also introduced into the cell, whether simultaneously, separately or sequentially from the mRNA, such that the necessary ribonucleoprotein complex is formed in the cell.
[0087] In the aforementioned methods of modifying or visualising a target nucleic acid, the additional amino acid sequence may be a marker and the marker associates with the target nucleic acid; preferably wherein the marker is a protein; optionally a fluorescent protein, e.g. green fluorescent protein (GFP) or yellow fluorescent protein (YFP) or mCherry. Whether in vitro, ex vivo or in vivo, then methods of the invention can be used to directly visualise a target locus in a nucleic acid molecule, preferably in the form of a higher order structure such as a supercoiled plasmid or chromosome, or a single stranded target nucleic acid such as mRNA. Direct visualisation of a target locus may use electron micrography, or fluorescence microscopy.
[0088] Other kinds of label may be used to mark the target nucleic acid including organic dye molecules, radiolabels and spin labels which may be small molecules.
[0089] In methods of the invention described above, the target nucleic acid is DNA; preferably dsDNA although the target can be RNA; preferably mRNA.
[0090] In methods of the invention for modifying, visualising, activating transcription or repressing transcription of a target nucleic acid wherein the target nucleic acid is dsDNA, the additional amino acid sequence with nucleic acid or chromatin modifying activity may be a nuclease or a helicase-nuclease, and the modification is preferably a single stranded or a double stranded break at a desired locus. In this way unique sequence specific cutting of DNA can be engineered by using the Cascade-functional moiety complexes. The chosen sequence of the RNA component of the final ribonucleoprotein complex provides the desired sequence specificity for the action of the additional amino acid sequence.
[0091] Therefore, the invention also provides a method of non-homologous end joining of a dsDNA molecule in a cell at a desired locus to remove at least a part of a nucleotide sequence from the dsDNA molecule; optionally to knockout the function of a gene or genes, wherein the method comprises making double stranded breaks using any of the methods of modifying a target nucleic acid as hereinbefore described.
[0092] The invention further provides a method of homologous recombination of a nucleic acid into a dsDNA molecule in a cell at a desired locus in order to modify an existing nucleotide sequence or insert a desired nucleotide sequence, wherein the method comprises making a double or single stranded break at the desired locus using any of the methods of modifying a target nucleic acid as hereinbefore described.
[0093] The invention therefore also provides a method of modifying, activating or repressing gene expression in an organism comprising modifying, activating transcription or repressing transcription of a target nucleic acid sequence according to any of the methods hereinbefore described, wherein the nucleic acid is dsDNA and the functional moiety is selected from a DNA modifying enzyme (e.g. a demethylase or deacetylase), a transcription activator or a transcription repressor.
[0094] The invention additionally provides a method of modifying, activating or repressing gene expression in an organism comprising modifying, activating transcription or repressing transcription of a target nucleic acid sequence according to any of the methods hereinbefore described, wherein the nucleic acid is an mRNA and the functional moiety is a ribonuclease; optionally selected from an endonuclease, a 3' exonuclease or a 5' exonuclease.
[0095] In any of the methods of the invention as described above, the cell which is subjected to the method may be a prokaryote. Similarly, the cell may be a eukaryotic cell, e.g. a plant cell, an insect cell, a yeast cell, a fungal cell, a mammalian cell or a human cell. When the cell is of a mammal or human then it can be a stem cell (but may not be any human embryonic stem cell). Such stem cells for use in the invention are preferably isolated stem cells. Optionally in accordance with any method of the invention a cell is transfected in vitro.
[0096] Preferably though, in any of the methods of the invention, the target nucleic acid has a specific tertiary structure, optionally supercoiled, more preferably wherein the target nucleic acid is negatively supercoiled. Advantageously, the ribonucleoprotein complexes of the invention, whether produced in vitro, or whether formed within cells, or whether formed within cells via expression machinery of the cell, can be used to target a locus which would otherwise be difficult to get access to in order to apply the functional activity of a desired component, whether labelling or tagging of a specific sequence, modification of nucleic acid structure, switching on or off of gene expression, or of modification of the target sequence itself involving single or double stranded cutting followed by insertion of one or more nucleotide residues or a cassette.
[0097] The invention also includes a pharmaceutical composition comprising a Cascade protein complex or a ribonucleoprotein complex of the invention as hereinbefore described.
[0098] The invention further includes a pharmaceutical composition comprising an isolated nucleic acid or an expression vector of the invention as hereinbefore described.
[0099] Also provided is a kit comprising a Cascade protein complex of the invention as hereinbefore described plus an RNA molecule of the invention as hereinbefore described.
[0100] The invention includes a Cascade protein complex or a ribonucleoprotein complex or a nucleic acid or a vector, as hereinbefore described for use as a medicament.
[0101] The invention allows a variety of possibilities to physically alter DNA of prokaryotic or eukaryotic hosts at a specified genomic locus, or change expression patterns of a gene at a given locus. Host genomic DNA can be cleaved or modified by methylation, visualized by fluorescence, transcriptionally activated or repressed by functional domains such as nucleases, methylases, fluorescent proteins, transcription activators or repressors respectively, fused to suitable Cascade-subunits. Moreover, the RNA-guided RNA-binding ability of Cascade permits the monitoring of RNA trafficking in live cells using fluorescent Cascade fusion proteins, and provides ways to sequester or destroy host mRNAs causing interference with gene expression levels of a host cell.
[0102] In any of the methods of the invention, the target nucleic acid may be defined, preferably so if dsDNA, by the presence of at least one of the following nucleotide triplets: 5'-CTT-3', 5'-CAT-3', 5'-CCT-3', or 5'-CTC-3' (or 5'-CUU-3', 5'-CAU-3', 5'-CCU-3', or 5'-CUC-3' if the target is an RNA). The location of the triplet is in the target strand adjacent to the sequence to which the RNA molecule component of a ribonucleoprotein of the invention hybridizes. The triplet marks the point in the target strand sequence at which base pairing with the RNA molecule component of the ribonucleoprotein does not take place in a 5' to 3' (downstream) direction of the target (whilst it takes place upstream of the target sequence from that point subject to the preferred length of the RNA sequence of the RNA molecule component of the ribonucleoprotein of the invention). In the context of a native type I CRISPR system, the triplets correspond to what is known as a "PAM" (protospacer adjacent motif). For ssDNA or ssRNA targets, presence of one of the triplets is not so necessary.
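Candidate triplets of the kind listed above can be located in a target strand with a simple scan. The following is a minimal sketch, not a target selection tool: it assumes the strand is supplied 5' to 3', normalises RNA "U" to "T" so that the same four triplets serve for DNA and RNA, and reports 0-based start positions. The function name is hypothetical, and the sketch does not decide which flanking sequence would hybridize to the RNA component.

```python
# Triplets listed in the text, in DNA form (5'->3').
PAM_TRIPLETS = {"CTT", "CAT", "CCT", "CTC"}


def find_pam_triplets(target_strand: str) -> list:
    """Return (0-based start, triplet) for every listed triplet in the strand.

    The strand is read 5'->3'. RNA input is normalised by writing 'U' as 'T',
    so hits are reported in DNA form. Overlapping hits are all reported.
    """
    seq = target_strand.upper().replace("U", "T")
    hits = []
    for i in range(len(seq) - 2):
        triplet = seq[i:i + 3]
        if triplet in PAM_TRIPLETS:
            hits.append((i, triplet))
    return hits
```

For the placeholder strand `AACTTGGCATC`, the scan reports `CTT` at position 2 and `CAT` at position 7.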
[0103] The invention will now be described in detail and with reference to specific examples and drawings in which:
[0104] FIG. 1 shows the results of gel-shift assays where Cascade binds negatively supercoiled (nSC) plasmid DNA but not relaxed DNA. A) Gel-shift of nSC plasmid DNA with J3-Cascade, containing a targeting (J3) crRNA. pUC-λ was mixed with 2-fold increasing amounts of J3-Cascade, from a pUC-λ:Cascade molar ratio of 1:0.5 up to a 1:256 molar ratio. The first and last lanes contain only pUC-λ. B) Gel-shift as in (A) with R44-Cascade containing a non-targeting (R44) crRNA. C) Gel-shift as in (A) with Nt.BspQI nicked pUC-λ. D) Gel-shift as in (A) with PdmI linearized pUC-λ. E) Fit of the fraction pUC-λ bound to J3-Cascade plotted against the concentration of free J3-Cascade gives the dissociation constant (Kd) for specific binding. F) Fit of the fraction pUC-λ bound to R44-Cascade plotted against the concentration of free R44-Cascade gives the dissociation constant (Kd) for non-specific binding. G) Specific binding of Cascade to the protospacer monitored by restriction analysis, using the unique BsmI restriction site in the protospacer sequence. Lane 1 and 5 contain only pUC-λ. Lane 2 and 6 contain pUC-λ mixed with Cascade. Lane 3 and 7 contain pUC-λ mixed with Cascade and subsequent BsmI addition. Lane 4 and 8 contain pUC-λ mixed with BsmI. H) Gel-shift of pUC-λ bound to Cascade with subsequent Nt.BspQI cleavage of one strand of the plasmid. Lane 1 and 6 contain only pUC-λ. Lane 2 and 7 contain pUC-λ mixed with Cascade. Lane 3 and 8 contain pUC-λ mixed with Cascade and subsequent Nt.BspQI nicking. Lane 4 and 9 contain pUC-λ mixed with Cascade, followed by addition of a ssDNA probe complementary to the displaced strand in the R-loop and subsequent nicking with Nt.BspQI. Lane 5 and 10 contain pUC-λ nicked with Nt.BspQI. I) Gel-shift of pUC-λ bound to Cascade with subsequent EcoRI cleavage of both strands of the plasmid. Lane 1 and 6 contain only pUC-λ. Lane 2 and 7 contain pUC-λ mixed with Cascade. Lane 3 and 8 contain pUC-λ mixed with Cascade and subsequent EcoRI cleavage. Lane 4 and 9 contain pUC-λ mixed with Cascade, followed by addition of a ssDNA probe complementary to the displaced strand in the R-loop and subsequent cleavage with EcoRI. Lane 5 and 10 contain pUC-λ cleaved with EcoRI.
[0105] FIG. 2 shows scanning force micrographs demonstrating how Cascade induces bending of target DNA upon protospacer binding. A-P) Scanning force microscopy images of nSC plasmid DNA with J3-Cascade containing a targeting (J3) crRNA. pUC-λ was mixed with J3-Cascade at a pUC-λ:Cascade ratio of 1:7. Each image shows a 500 × 500 nm surface area. White dots correspond to Cascade.
[0106] FIG. 3 shows how BiFC analysis reveals that Cascade and Cas3 interact upon target recognition. A) Venus fluorescence of cells expressing CascadeΔCse1 and CRISPR 7Tm, which targets 7 protospacers on the phage genome, and Cse1-N155Venus and Cas3-C85Venus fusion proteins. B) Brightfield image of the cells in (A). C) Overlay of (A) and (B). D) Venus fluorescence of phage λ infected cells expressing CascadeΔCse1 and CRISPR 7Tm, and Cse1-N155Venus and Cas3-C85Venus fusion proteins. E) Brightfield image of the cells in (D). F) Overlay of (D) and (E). G) Venus fluorescence of phage λ infected cells expressing CascadeΔCse1 and non-targeting CRISPR R44, and N155Venus and C85Venus proteins. H) Brightfield image of the cells in (G). I) Overlay of (G) and (H). J) Average of the fluorescence intensity of 4-7 individual cells of each strain, as determined using the profile tool of LSM viewer (Carl Zeiss).
[0107] FIG. 4 shows Cas3 nuclease and helicase activities during CRISPR-interference. A) Competent BL21-AI cells expressing Cascade, a Cas3 mutant and CRISPR J3 were transformed with pUC-λ. Colony forming units per microgram pUC-λ (cfu/μg DNA) are depicted for each of the strains expressing a Cas3 mutant. Cells expressing wt Cas3 and CRISPR J3 or CRISPR R44 serve as positive and negative controls, respectively. B) BL21-AI cells carrying Cascade, Cas3 mutant, and CRISPR encoding plasmids as well as pUC-λ are grown under conditions that suppress expression of the cas genes and CRISPR. At t=0 expression is induced. The percentage of cells that lost pUC-λ over time is shown, as determined by the ratio of ampicillin sensitive and ampicillin resistant cells.
[0108] FIG. 5 shows how a Cascade-Cas3 fusion complex provides in vivo resistance and has in vitro nuclease activity. A) Coomassie Blue stained SDS-PAGE of purified Cascade and Cascade-Cas3 fusion complex. B) Efficiency of plaquing of phage .lamda. on cells expressing Cascade-Cas3 fusion complex and a targeting (J3) or non-targeting (R44) CRISPR and on cells expressing Cascade and Cas3 separately together with a targeting (J3) CRISPR. C) Gel-shift (in the absence of divalent metal ions) of nSC target plasmid with J3-Cascade-Cas3 fusion complex. pUC-.lamda. was mixed with 2-fold increasing amounts of J3-Cascade-Cas3, from a pUC-.lamda.:J3-Cascade-Cas3 molar ratio of 1:0.5 up to a 1:128 molar ratio. The first and last lane contain only pUC-.lamda.. D) Gel-shift (in the absence of divalent metal ions) of nSC non-target plasmid with J3-Cascade-Cas3 fusion complex. pUC-p7 was mixed with 2-fold increasing amounts of J3-Cascade-Cas3, from a pUC-p7:J3-Cascade-Cas3 molar ratio of 1:0.5 up to a 1:128 molar ratio. The first and last lane contain only pUC-p7. E) Incubation of nSC target plasmid (pUC-.lamda., left) or nSC non-target plasmid (pUC-p7, right) with J3-Cascade-Cas3 in the presence of 10 mM MgCl.sub.2. Lane 1 and 7 contain only plasmid. F) Assay as in (E) in the presence of 2 mM ATP. G) Assay as in (E) with the mutant J3-Cascade-Cas3K320N complex. H) Assay as in (G) in the presence of 2 mM ATP.
[0109] FIG. 6 is a schematic diagram showing a model of the CRISPR-interference type I pathway in E. coli.
[0110] FIG. 7 is a schematic diagram showing how a Cascade-FokI fusion embodiment of the invention is used to create FokI dimers which cut dsDNA to produce blunt ends as part of a process of non-homologous end joining or homologous recombination.
[0111] FIG. 8 shows how BiFC analysis reveals that Cascade and Cas3 interact upon target recognition. Overlay of Brightfield image and Venus fluorescence of cells expressing Cascade without Cse1, Cse1-N155Venus and Cas3-C85Venus and either CRISPR 7Tm, which targets 7 protospacers on the phage Lambda genome, or the non-targeting CRISPR R44. Cells expressing CRISPR 7Tm are fluorescent only when infected with phage Lambda, while cells expressing CRISPR R44 are non-fluorescent. The highly intense fluorescent dots (outside cells) are due to light-reflecting salt crystals. White bars correspond to 10 micron.
[0112] FIG. 9 shows pUC-.lamda. sequences of 4 clones [SEQ ID NOs: 39-42] from cells encoding CRISPR J3, Cascade and Cas3 (wt or S483AT485A). The sequences indicate that these are escape mutants carrying (partial) deletions of the protospacer or a single point mutation in the seed region, which explains the inability to cure these plasmids.
[0113] FIG. 10 shows sequence alignments of cas3 genes from organisms containing the Type I-E CRISPR/Cas system. Alignment of cas3-cse1 genes from Streptomyces sp. SPB78 (1.sup.st sequence, Accession Number: ZP_07272643.1) [SEQ ID NO: 43], in Streptomyces griseus (2.sup.nd sequence, Accession Number YP_001825054) [SEQ ID NO: 44], and in Catenulispora acidiphila DSM 44928 (3.sup.rd sequence, Accession Number YP_003114638) [SEQ ID NO: 45] and an artificial E. coli Cas3-Cse1 fusion protein [SEQ ID NO: 46] which includes the polypeptide linker sequence from S. griseus.
[0114] FIG. 11 shows the design of a Cascade.sup.KKR/ELD nuclease pair in which FokI nuclease domains are mutated such that only heterodimers consisting of KKR and ELD nuclease domains are active, and in which the distance between the opposing binding sites may be varied to determine the optimal distance between a Cascade nuclease pair.
[0115] FIG. 12 is a schematic diagram showing genome targeting by a Cascade-FokI nuclease pair.
[0116] FIG. 13 shows an SDS PAGE gel of Cascade-nuclease complexes.
[0117] FIG. 14 shows electrophoresis gels of in vitro cleavage assays of Cascade.sup.KKR/ELD on plasmid DNA.
[0118] FIG. 15 shows Cascade.sup.KKR/ELD cleavage patterns and frequency [SEQ ID NO: 47].
EXAMPLES
Materials and Methods Used
Strains, Gene Cloning, Plasmids and Vectors
[0119] E. coli BL21-AI and E. coli BL21 (DE3) strains were used throughout. Table 1 lists all plasmids used in this study. The previously described pWUR408, pWUR480, pWUR404 and pWUR547 were used for production of Strep-tag II R44-Cascade, and pWUR408, pWUR514 and pWUR630 were used for production of Strep-tag II J3-Cascade (Jore et al., (2011) Nature Structural & Molecular Biology 18, 529-536; Semenova et al., (2011) Proceedings of the National Academy of Sciences of the United States of America 108, 10098-10103). pUC-.lamda. (pWUR610) and pUC-p7 (pWUR613) have been described elsewhere (Jore et al., 2011; Semenova et al., 2011). The C85Venus protein is encoded by pWUR647, which corresponds to pET52b (Novagen) containing the synthetic GA1070943 construct (Table 2) (Geneart) cloned between the BamHI and NotI sites. The N155Venus protein is encoded by pWUR648, which corresponds to pRSF1b (Novagen) containing the synthetic GA1070941 construct (Table 2) (Geneart) cloned between the NotI and XhoI sites. The Cas3-C85Venus fusion protein is encoded by pWUR649, which corresponds to pWUR647 containing the Cas3 amplification product using primers BG3186 and BG3213 (Table 3) between the NcoI and BamHI sites. The CasA-N155Venus fusion protein is encoded by pWUR650, which corresponds to pWUR648 containing the CasA amplification product using primers BG3303 and BG3212 (Table 3) between the NcoI and NotI sites. CRISPR 7Tm is encoded by pWUR651, which corresponds to pACYCDuet-1 (Novagen) containing the synthetic GA1068859 construct (Table 2) (Geneart) cloned between the NcoI and KpnI sites. The Cascade encoding pWUR400, the Cascade.DELTA.Cse1 encoding pWUR401 and the Cas3 encoding pWUR397 were described previously (Jore et al., 2011). The Cas3H74A encoding pWUR652 was constructed using site directed mutagenesis of pWUR397 with primers BG3093, BG3094 (Table 3).
TABLE-US-00003 TABLE 1 Plasmids used
Plasmid | Description and order of genes (5'-3') | Restriction sites | Primers | Source
pWUR397 | cas3 in pRSF-1b, no tags | - | - | 1
pWUR400 | casA-casB-casC-casD-casE in pCDF-1b, no tags | - | - | 1
pWUR401 | casB-casC-casD-casE in pCDF-1b, no tags | - | - | 1
pWUR404 | casE in pCDF-1b, no tags | - | - | 1
pWUR408 | casA in pRSF-1b, no tags | - | - | 1
pWUR480 | casB with Strep-tag II (N-term)-casC-casD in pET52b | - | - | 1
pWUR514 | casB with Strep-tag II (N-term)-casC-casD-CasE in pET52b | - | - | 2
pWUR547 | E. coli R44 CRISPR, 7x spacer nr. 2, in pACYCDuet-1 | - | - | 2
pWUR613 | pUC-p7; pUC19 containing R44-protospacer on a 350 bp phage P7 amplicon | - | - | 2
pWUR630 | CRISPR poly J3, 5x spacer J3 in pACYCDuet-1 | NcoI/KpnI | - | This study
pWUR610 | pUC-.lamda.; pUC19 containing J3-protospacer on a 350 bp phage .lamda. amplicon | - | - | 3
pWUR647 | C85Venus; GA1070943 (Table 2) in pET52b | BamHI/NotI | - | This study
pWUR648 | N155Venus; GA1070941 (Table 2) in pRSF1b | NotI/XhoI | - | This study
pWUR649 | cas3-C85Venus; pWUR647 containing cas3 amplicon | NcoI/BamHI | BG3186 + BG3213 | This study
pWUR650 | casA-N155Venus; pWUR648 containing casA amplicon | NcoI/NotI | BG3303 + BG3212 | This study
pWUR651 | CRISPR 7Tm; GA1068859 (Table 2) in pACYCDuet-1 | NcoI/KpnI | - | This study
- | casB with Strep-tag II (N-term)-casC-casD-CasE in pCDF-1b | - | - | This study
- | cas3-casA fusion | - | - | This study
- | cas3H74A-CasA fusion | - | - | This study
- | cas3D75A-CasA fusion | - | - | This study
- | cas3K320N-CasA fusion | - | - | This study
- | cas3D452N-CasA fusion | - | - | This study
[0120] Source 1 in the table above is Brouns et al (2008) Science 321, 960-964.
[0121] Source 2 in the table above is Jore et al (2011) Nature Structural & Molecular Biology 18: 529-536.
TABLE-US-00004 TABLE 2 Synthetic Constructs
GA1070943 [SEQ ID NO: 6]
ACTGGAAAGCGGGCAGTGAAAGGAAGGCCCATGAGGCCAGTTAATTAA
GCGGATCCTGGCGGCGGCAGCGGCGGCGGCAGCGACAAGCAGAAGAAC
GGCATCAAGGCGAACTTCAAGATCCGCCACAACATCGAGGACGGCGGC
GTGCAGCTCGCCGACCACTACCAGCAGAACACCCCCATCGGCGACGGC
CCCGTGCTGCTGCCCGACAACCACTACCTGAGCTACCAGTCCGCCCTG
AGCAAAGACCCCAACGAGAAGCGCGATCACATGGTCCTGCTGGAGTTC
GTGACCGCCGCCGGGATCACTCTCGGCATGGACGAGCTGTACAAGTAA
GCGGCCGCGGCGCGCCTAGGCCTTGACGGCCTTCCTTCAATTCGCCCT
ATAGTGAG
GA1070941 [SEQ ID NO: 7]
CACTATAGGGCGAATTGGCGGAAGGCCGTCAAGGCCGCATTTAATTAA
GCGGCCGCAGGCGGCGGCAGCGGCGGCGGCAGCATGGTGAGCAAGGGC
GAGGAGCTGTTCACCGGGGTGGTGCCCATCCTGGTCGAGCTGGACGGC
GACGTAAACGGCCACAAGTTCAGCGTGTCCGGCGAGGGCGAGGGCGAT
GCCACCTACGGCAAGCTGACCCTGAAGCTCATCTGCACCACCGGCAAG
CTGCCCGTGCCCTGGCCCACCCTCGTGACCACCCTCGGCTACGGCCTG
CAGTGCTTCGCCCGCTACCCCGACCACATGAAGCAGCACGACTTCTTC
AAGTCCGCCATGCCCGAAGGCTACGTCCAGGAGCGCACCATCTTCTTC
AAGGACGACGGCAACTACAAGACCCGCGCCGAGGTGAAGTTCGAGGGC
GACACCCTGGTGAACCGCATCGAGCTGAAGGGCATCGACTTCAAGGAG
GACGGCAACATCCTGGGGCACAAGCTGGAGTACAACTACAACAGCCAC
AACGTCTATATCACGGCCTAACTCGAGGGCGCGCCCTGGGCCTCATGG
GCCTTCCGCTCACTGCCCGCTTTCCAG
GA1068859 [SEQ ID NO: 8]
CACTATAGGGCGAATTGGCGGAAGGCCGTCAAGGCCGCATGAGCTCCA
TGGAAACAAAGAATTAGCTGATCTTTAATAATAAGGAAATGTTACATT
AAGGTTGGTGGGTTGTTTTTATGGGAAAAAATGCTTTAAGAACAAATG
TATACTTTTAGAGAGTTCCCCGCGCCAGCGGGGATAAACCGGGCCGAT
TGAAGGTCCGGTGGATGGCTTAAAAGAGTTCCCCGCGCCAGCGGGGAT
AAACCGCCGCAGGTACAGCAGGTAGCGCAGATCATCAAGAGTTCCCCG
CGCCAGCGGGGATAAACCGACTTCTCTCCGAAAAGTCAGGACGCTGTG
GCAGAGTTCCCCGCGCCAGCGGGGATAAACCGCCTACGCGCTGAACGC
CAGCGGTGTGGTGAATGAGTTCCCCGCGCCAGCGGGGATAAACCGGTG
TGGCCATGCACGCCTTTAACGGTGAACTGGAGTTCCCCGCGCCAGCGG
GGATAAACCGCACGAACTCAGCCAGAACGACAAACAAAAGGCGAGTTC
CCCGCGCCAGCGGGGATAAACCGGCACCAGTACGCGCCCCACGCTGAC
GGTTTCTGAGTTCCCCGCGCCAGCGGGGATAAACCGCAGCTCCCATTT
TCAAACCCAGGTACCCTGGGCCTCATGGGCCTTCCGCTCACTGCCCGC
TTTCCAG
GA1047360 [SEQ ID NO: 9]
GAGCTCCCGGGCTGACGGTAATAGAGGCACCTACAGGCTCCGGTAAAA
CGGAAACAGCGCTGGCCTATGCTTGGAAACTTATTGATCAACAAATTG
CGGATAGTGTTATTTTTGCCCTCCCAACACAAGCTACCGCGAATGCTA
TGCTTACGAGAATGGAAGCGAGCGCGAGCCACTTATTTTCATCCCCAA
ATCTTATTCTTGCTCATGGCAATTCACGGTTTAACCACCTCTTTCAAT
CAATAAAATCACGCGCGATTACTGAACAGGGGCAAGAAGAAGCGTGGG
TTCAGTGTTGTCAGTGGTTGTCACAAAGCAATAAGAAAGTGTTTCTTG
GGCAAATCGGCGTTTGCACGATTGATCAGGTGTTGATTTCGGTATTGC
CAGTTAAACACCGCTTTATCCGTGGTTTGGGAATTGGTAGATCTGTTT
TAATTGTTAATGAAGTTCATGCTTACGACACCTATATGAACGGCTTGC
TCGAGGCAGTGCTCAAGGCTCAGGCTGATGTGGGAGGGAGTGTTATTC
TTCTTTCCGCAACCCTACCAATGAAACAAAAACAGAAGCTTCTGGATA
CTTATGGTCTGCATACAGATCCAGTGGAAAATAACTCCGCATATCCAC
TCATTAACTGGCGAGGTGTGAATGGTGCGCAACGTTTTGATCTGCTAG
CGGATCCGGTACC
TABLE-US-00005 TABLE 3 Primers
BG3186 ATAGCGCCATGGAACCTTTTAAATATATATGCCATTA [SEQ ID NO: 10]
BG3213 ACAGTGGGATCCGCTTTGGGATTTGCAGGGATGACTCTGGT [SEQ ID NO: 11]
BG3303 ATAGCGTCATGAATTTGCTTATTGATAACTGGATTCCTGTACG [SEQ ID NO: 12]
BG3212 ACAGTGGCGGCCGCGCCATTTGATGGCCCTCCTTGCGGTTTTAA [SEQ ID NO: 13]
BG3076 CGTATATCAAACTTTCCAATAGCATGAAGAGCAATGAAAAATAAC [SEQ ID NO: 14]
BG3449 ATGATACCGCGAGACCCACGCTC [SEQ ID NO: 15]
BG3451 CGGATAAAGTTGCAGGACCACTTC [SEQ ID NO: 16]
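The cloning sites named in the plasmid descriptions can be cross-checked against the primer sequences of Table 3. The following is a minimal illustrative sketch, not part of the described method: the helper name `find_sites` is hypothetical, the recognition sequences are the standard ones for each enzyme, and BspHI is included only on the assumption that its NcoI-compatible overhang is why BG3303 carries TCATGA (the source does not state this).

```python
# Standard recognition sequences (5'->3') for the enzymes of interest.
SITES = {
    "NcoI": "CCATGG",
    "BamHI": "GGATCC",
    "NotI": "GCGGCCGC",
    "BspHI": "TCATGA",  # assumption: NcoI-compatible site used in BG3303
}

# Primer sequences copied from Table 3.
PRIMERS = {
    "BG3186": "ATAGCGCCATGGAACCTTTTAAATATATATGCCATTA",
    "BG3213": "ACAGTGGGATCCGCTTTGGGATTTGCAGGGATGACTCTGGT",
    "BG3303": "ATAGCGTCATGAATTTGCTTATTGATAACTGGATTCCTGTACG",
    "BG3212": "ACAGTGGCGGCCGCGCCATTTGATGGCCCTCCTTGCGGTTTTAA",
}


def find_sites(seq):
    """Return {enzyme: 0-based start} for every listed site present in seq."""
    return {name: seq.find(site) for name, site in SITES.items() if site in seq}


if __name__ == "__main__":
    for primer, seq in PRIMERS.items():
        print(primer, find_sites(seq))
```

Under these assumptions each primer carries exactly one of the listed sites directly after its six-nucleotide 5' clamp.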
Protein Production and Purification
[0122] Cascade was expressed and purified as described (Jore et al., 2011). Throughout purification a buffer containing 20 mM HEPES pH 7.5, 75 mM NaCl, 1 mM DTT, 2 mM EDTA was used for resuspension and washing. Protein elution was performed in the same buffer containing 4 mM desthiobiotin. The Cascade-Cas3 fusion complex was expressed and purified in the same manner, with washing steps being performed with 20 mM HEPES pH 7.5, 200 mM NaCl and 1 mM DTT, and elution in 20 mM HEPES pH 7.5, 75 mM NaCl, 1 mM DTT containing 4 mM desthiobiotin.
Electrophoretic Mobility Shift Assay
[0123] Purified Cascade or Cascade subcomplexes were mixed with pUC-.lamda. in a buffer containing 20 mM HEPES pH 7.5, 75 mM NaCl, 1 mM DTT, 2 mM EDTA, and incubated at 37.degree. C. for 15 minutes. Samples were run overnight on a 0.8% TAE agarose gel and post-stained with SYBR Safe (Invitrogen) at a 1:10000 dilution in TAE for 30 minutes. Cleavage with BsmI (Fermentas) or Nt.BspQI (New England Biolabs) was performed in the HEPES reaction buffer supplemented with 5 mM MgCl.sub.2.
Scanning Force Microscopy
[0124] Purified Cascade was mixed with pUC-.lamda. (at a ratio of 7:1, 250 nM Cascade, 35 nM DNA) in a buffer containing 20 mM HEPES pH 7.5, 75 mM NaCl, 0.2 mM DTT, 0.3 mM EDTA and incubated at 37.degree. C. for 15 minutes. Subsequently, for AFM sample preparation, the incubation mixture was diluted 10.times. in double distilled water and MgCl.sub.2 was added at a final concentration of 1.2 mM. Deposition of the protein-DNA complexes and imaging was carried out as described before (Dame et al., (2000) Nucleic Acids Res. 28: 3504-3510).
Fluorescence Microscopy
[0125] BL21-AI cells carrying CRISPR and cas gene encoding plasmids were grown overnight at 37.degree. C. in Luria-Bertani broth (LB) containing ampicillin (100 .mu.g/ml), kanamycin (50 .mu.g/ml), streptomycin (50 .mu.g/ml) and chloramphenicol (34 .mu.g/ml). Overnight culture was diluted 1:100 in fresh antibiotic-containing LB, and grown for 1 hour at 37.degree. C. Expression of cas genes and CRISPR was induced for 1 hour by adding L-arabinose to a final concentration of 0.2% and IPTG to a final concentration of 1 mM. For infection, cells were mixed with phage Lambda at a Multiplicity of Infection (MOI) of 4. Cells were applied to poly-L-lysine covered microscope slides, and analyzed using a Zeiss LSM510 confocal laser scanning microscope based on an Axiovert inverted microscope, with a 40.times. oil immersion objective (N.A. of 1.3) and an argon laser as the excitation source (514 nm) and detection at 530-600 nm. The pinhole was set at 203 .mu.m for all measurements.
pUC-.lamda. Transformation Studies
[0126] LB containing kanamycin (50 .mu.g/ml), streptomycin (50 .mu.g/ml) and chloramphenicol (34 .mu.g/ml) was inoculated from an overnight pre-inoculum and grown to an OD.sub.600 of 0.3. Expression of cas genes and CRISPR was induced for 45 minutes with 0.2% L-arabinose and 1 mM IPTG. Cells were collected by centrifugation at 4.degree. C. and made competent by resuspension in ice cold buffer containing 100 mM RbCl, 50 mM MnCl.sub.2, 30 mM potassium acetate, 10 mM CaCl.sub.2 and 15% glycerol, pH 5.8. After a 3 hour incubation, cells were collected and resuspended in a buffer containing 10 mM MOPS, 10 mM RbCl, 75 mM CaCl.sub.2, 15% glycerol, pH 6.8. Transformation was performed by adding 80 ng pUC-.lamda., followed by a 1 minute heat-shock at 42.degree. C., and a 5 minute cold-shock on ice. Next, cells were grown in LB for 45 minutes at 37.degree. C. before plating on LB-agar plates containing 0.2% L-arabinose, 1 mM IPTG, ampicillin (100 .mu.g/ml), kanamycin (50 .mu.g/ml), streptomycin (50 .mu.g/ml) and chloramphenicol (34 .mu.g/ml).
[0127] Plasmid curing was analyzed by transforming BL21-AI cells containing cas gene and CRISPR encoding plasmids with pUC-.lamda., while growing the cells in the presence of 0.2% glucose to suppress expression of the T7-polymerase gene. Expression of cas genes and CRISPR was induced by collecting the cells and re-suspension in LB containing 0.2% arabinose and 1 mM IPTG. Cells were plated on LB-agar containing either streptomycin, kanamycin and chloramphenicol (non-selective for pUC-.lamda.) or ampicillin, streptomycin, kanamycin and chloramphenicol (selective for pUC-.lamda.). After overnight growth the percentage of plasmid loss can be calculated from the ratio of colony forming units on the selective and non-selective plates.
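The plasmid-loss calculation described above can be sketched as follows. This is an illustrative example only: the function name and the colony counts are hypothetical, and it assumes that both plates were inoculated with equal volumes of the same dilution.

```python
def plasmid_loss_percent(cfu_selective, cfu_nonselective):
    """Percentage of cells that lost pUC-lambda.

    cfu_selective: colonies on plates with ampicillin (plasmid retained).
    cfu_nonselective: colonies on plates without ampicillin (all cells).
    """
    if cfu_nonselective <= 0:
        raise ValueError("non-selective plate count must be positive")
    retained = cfu_selective / cfu_nonselective
    return 100.0 * (1.0 - retained)


if __name__ == "__main__":
    # Hypothetical counts: 25 colonies on selective, 100 on non-selective plates.
    print(plasmid_loss_percent(25, 100))  # 75.0
```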
Phage Lambda Infection Studies
[0128] Host sensitivity to phage infection was tested using a virulent phage Lambda (.lamda..sub.vir), as in (Brouns et al (2008) Science 321, 960-964.). The sensitivity of the host to infection was calculated as the efficiency of plaquing (the plaque count ratio of a strain containing an anti-.lamda. CRISPR to that of the strain containing a non-targeting R44 CRISPR) as described in Brouns et al (2008).
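As a brief illustration of the efficiency of plaquing defined above, the ratio can be computed as in the sketch below. The function name and the plaque counts are hypothetical, and the counts are assumed to have been normalized to the same phage dilution.

```python
def efficiency_of_plaquing(plaques_targeting, plaques_nontargeting):
    """Plaque count on the anti-lambda CRISPR strain divided by the
    plaque count on the non-targeting (R44) CRISPR strain."""
    if plaques_nontargeting <= 0:
        raise ValueError("non-targeting strain must give at least one plaque")
    return plaques_targeting / plaques_nontargeting


if __name__ == "__main__":
    # Hypothetical counts: 3 plaques on the targeting strain, 3000 on R44.
    print(efficiency_of_plaquing(3, 3000))
```

A value close to 1 indicates no protection, while values far below 1 indicate resistance conferred by the targeting CRISPR.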
Example 1
Cascade Exclusively Binds Negatively Supercoiled Target DNA
[0129] The 3 kb pUC19-derived plasmid denoted pUC-.lamda. contains a 350 bp DNA fragment corresponding to part of the J gene of phage .lamda., which is targeted by J3-Cascade (Cascade associated with crRNA containing spacer J3) (Westra et al (2010) Molecular Microbiology 77, 1380-1393). The electrophoretic mobility shift assays show that Cascade has high affinity only for negatively supercoiled (nSC) target plasmid. At a molar ratio of J3-Cascade to pUC-.lamda. of 6:1 all nSC plasmid was bound by Cascade (see FIG. 1A), while Cascade carrying the non-targeting crRNA R44 (R44-Cascade) displayed non-specific binding at a molar ratio of 128:1 (see FIG. 1B). The dissociation constant (Kd) of nSC pUC-.lamda. was determined to be 13.+-.1.4 nM for J3-Cascade (see FIG. 1E) and 429.+-.152 nM for R44-Cascade (see FIG. 1F).
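To give a sense of what the two dissociation constants imply, the sketch below evaluates a simple one-site binding isotherm. This model is an assumption made for illustration (the text does not state how the Kd values were fitted), it takes Cascade to be in excess over plasmid, and the function name is hypothetical.

```python
KD_J3_NM = 13.0    # reported Kd of J3-Cascade for nSC pUC-lambda
KD_R44_NM = 429.0  # reported Kd of R44-Cascade for nSC pUC-lambda


def fraction_bound(cascade_nM, kd_nM):
    """Fraction of plasmid bound for a one-site model: [P] / (Kd + [P])."""
    if cascade_nM < 0 or kd_nM <= 0:
        raise ValueError("concentration must be >= 0 and Kd > 0")
    return cascade_nM / (kd_nM + cascade_nM)


if __name__ == "__main__":
    for conc in (13.0, 100.0, 429.0):
        j3 = fraction_bound(conc, KD_J3_NM)
        r44 = fraction_bound(conc, KD_R44_NM)
        print(f"{conc:6.1f} nM Cascade: J3 {j3:.2f}, R44 {r44:.2f}")
```

In this simplified picture half of the plasmid is bound when the Cascade concentration equals the Kd, so the targeting complex reaches half-saturation at a roughly 30-fold lower concentration than the non-targeting complex.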
[0130] J3-Cascade was unable to bind relaxed target DNA with measurable affinity, such as nicked (see FIG. 1C) or linear pUC-.lamda. (see FIG. 1D), showing that Cascade has high affinity for larger DNA substrates with a nSC topology.
[0131] To distinguish non-specific binding from specific binding, the BsmI restriction site located within the protospacer was used. While adding BsmI enzyme to pUC-.lamda. gives a linear product in the presence of R44-Cascade (see FIG. 1G, lane 4), pUC-.lamda. is protected from BsmI cleavage in the presence of J3-Cascade (see FIG. 1G, lane 7), indicating specific binding to the protospacer. This shows that Cas3 is not required for in vitro sequence specific binding of Cascade to a protospacer sequence in a nSC plasmid.
[0132] Cascade binding to nSC pUC-.lamda. was followed by nicking with Nt.BspQI, giving rise to an OC topology. Cascade is released from the plasmid after strand nicking, as can be seen from the absence of a mobility shift (see FIG. 1H, compare lane 8 to lane 10). In contrast, Cascade remains bound to its DNA target when a ssDNA probe complementary to the displaced strand is added to the reaction before DNA cleavage by Nt.BspQI (see FIG. 1H, lane 9). The probe artificially stabilizes the Cascade R-loop on relaxed target DNA. Similar observations are made when both DNA strands of pUC-.lamda. are cleaved after Cascade binding (see FIG. 1I, lane 8 and lane 9).
Example 2
Cascade Induces Bending of Bound Target DNA
[0133] Complexes formed between purified Cascade and pUC-.lamda. were visualized by scanning force microscopy. Specific complexes containing a single bound J3-Cascade complex were formed, while non-specific R44-Cascade yielded no DNA-bound complexes in this assay under identical conditions. Out of 81 DNA molecules observed, 76% were found to have J3-Cascade bound (see FIGS. 2A-P). In most of these complexes Cascade was found at the apex of a loop (86%), whereas only a small fraction was found at non-apical positions (14%). These data show that Cascade binding causes bending and possibly wrapping of the DNA, probably to facilitate local melting of the DNA duplex.
Example 3
Naturally Occurring Fusions of Cas3 and Cse1: Cas3 Interacts with Cascade Upon Protospacer Recognition
[0134] Sequence analysis of cas3 genes from organisms containing the Type I-E CRISPR/Cas system (see FIG. 10) reveals that Cas3 and Cse1 occur as fusion proteins in Streptomyces sp. SPB78 (Accession Number: ZP_07272643.1), in Streptomyces griseus (Accession Number YP_001825054), and in Catenulispora acidiphila DSM 44928 (Accession Number YP_003114638).
Example 4
Bimolecular Fluorescence Complementation (BiFC) Shows how a Cse1 Fusion Protein Forming Part of Cascade Continues to Interact with Cas3
[0135] BiFC experiments were used to monitor interactions between Cas3 and Cascade in vivo before and after phage .lamda. infection. BiFC experiments rely on the capacity of the non-fluorescent halves of a fluorescent protein, e.g., Yellow Fluorescent Protein (YFP) to refold and to form a fluorescent molecule when the two halves occur in close proximity. As such, it provides a tool to reveal protein-protein interactions, since the efficiency of refolding is greatly enhanced if the local concentrations are high, e.g., when the two halves of the fluorescent protein are fused to interaction partners. Cse1 was fused at the C-terminus with the N-terminal 155 amino acids of Venus (Cse1-N155Venus), an improved version of YFP (Nagai et al (2002) Nature Biotechnology 20, 87-90). Cas3 was C-terminally fused to the C-terminal 85 amino acids of Venus (Cas3-C85Venus).
[0136] BiFC analysis reveals that Cascade does not interact with Cas3 in the absence of invading DNA (FIG. 3ABC, FIG. 3P and FIG. 8). Upon infection with phage .lamda., however, cells expressing Cascade.DELTA.Cse1, Cse1-N155Venus and Cas3-C85Venus are fluorescent if they co-express the anti-.lamda. CRISPR 7Tm (FIG. 3DEF, FIG. 3P and FIG. 8). When they co-express a non-targeting CRISPR R44 (FIG. 3GHI, FIG. 3P and FIG. 8), the cells remain non-fluorescent. This shows that Cascade and Cas3 specifically interact during infection upon protospacer recognition and that Cse1 and Cas3 are in close proximity of each other in the Cascade-Cas3 binary effector complex.
[0137] These results also show quite clearly that a fusion of Cse1 with a heterologous protein does not disrupt the ribonucleoprotein formation of Cascade and crRNA, nor does it disrupt the interaction of Cascade and Cas3 with the target phage DNA, even when the Cas3 itself is also a fusion protein.
Example 5
Preparing a Designed Cas3-Cse1 Fusion Gives a Protein with In Vivo Functional Activity
[0138] Providing in vitro evidence for Cas3 DNA cleavage activity required purified and active Cas3. Despite various solubilization strategies, Cas3 overproduced (Howard et al (2011) Biochem. J. 439, 85-95) in E. coli BL21 is mainly present in inactive aggregates and inclusion bodies. Cas3 was therefore produced as a Cas3-Cse1 fusion protein, containing a linker identical to that of the Cas3-Cse1 fusion protein in S. griseus (see FIG. 10). When co-expressed with Cascade.DELTA.Cse1 and CRISPR J3, the fusion-complex was soluble and was obtained in high purity with the same apparent stoichiometry as Cascade (FIG. 5A). When functionality of this complex was tested for providing resistance against phage .lamda. infection, the efficiency of plaquing (eop) on cells expressing the fusion-complex J3-Cascade-Cas3 was identical to that on cells expressing the separate proteins (FIG. 5B).
[0139] Since the J3-Cascade-Cas3 fusion-complex was functional in vivo, in vitro DNA cleavage assays were carried out using this complex. When J3-Cascade-Cas3 was incubated with pUC-.lamda. in the absence of divalent metals, plasmid binding was observed at molar ratios similar to those observed for Cascade (FIG. 5C), while non-specific binding to a non-target plasmid (pUC-p7, a pUC19 derived plasmid of the same size as pUC-.lamda., but lacking a protospacer) occurred only at high molar ratios (FIG. 5D), indicating that non-specific DNA binding of the complex is also similar to that of Cascade alone.
[0140] Interestingly, the J3-Cascade-Cas3 fusion complex displays magnesium dependent endonuclease activity on nSC target plasmids. In the presence of 10 mM Mg.sup.2+ J3-Cascade-Cas3 nicks nSC pUC-.lamda. (FIG. 5E, lane 3-7), but no cleavage is observed for substrates that do not contain the target sequence (FIG. 5E, lane 9-13), or that have a relaxed topology. No shift of the resulting OC band is observed, in line with previous observations that Cascade dissociates spontaneously after cleavage, without requiring ATP-dependent Cas3 helicase activity. Instead, the helicase activity of Cas3 appears to be involved in exonucleolytic plasmid degradation. When both magnesium and ATP are added to the reaction, full plasmid degradation occurs (FIG. 5F).
[0141] The inventors have found that Cascade alone is unable to bind protospacers on relaxed DNA. In contrast, the inventors have found that Cascade efficiently locates targets in negatively supercoiled DNA, and subsequently recruits Cas3 via the Cse1 subunit. Endonucleolytic cleavage by the Cas3 HD-nuclease domain causes spontaneous release of Cascade from the DNA through the loss of supercoiling, remobilizing Cascade to locate new targets. The target is then progressively unwound and cleaved by the joint ATP-dependent helicase activity and HD-nuclease activity of Cas3, leading to complete target DNA degradation and neutralization of the invader.
[0142] Referring to FIG. 6 and without wishing to be bound to any particular theory, a mechanism of operation for the CRISPR-interference type I pathway in E. coli may involve the following steps. (1) Cascade carrying a crRNA scans the nSC plasmid DNA for a protospacer with an adjacent PAM. Whether strand separation occurs during this stage is unknown. (2) Sequence specific protospacer binding is achieved through basepairing between the crRNA and the complementary strand of the DNA, forming an R-loop. Upon binding, Cascade induces bending of the DNA. (3) The Cse1 subunit of Cascade recruits Cas3 upon DNA binding. This may be achieved by Cascade conformational changes that take place upon nucleic acid binding. (4) The HD-domain (darker part) of Cas3 catalyzes Mg.sup.2+-dependent nicking of the displaced strand of the R-loop, thereby altering the topology of the target plasmid from nSC to relaxed OC. (5a and 5b) The plasmid relaxation causes spontaneous dissociation of Cascade. Meanwhile Cas3 displays ATP-dependent exonuclease activity on the target plasmid, requiring the helicase domain for target dsDNA unwinding and the HD-nuclease domain for successive cleavage activity. (6) Cas3 degrades the entire plasmid in an ATP-dependent manner as it processively moves along, unwinds and cleaves the target dsDNA.
Example 6
Preparation of Artificial Cas-Strep Tag Fusion Proteins and Assembly of Cascade Complexes
[0143] Cascade complexes are produced and purified as described in Brouns et al (2008) Science 321: 960-4, using the expression plasmids listed in Supplementary Table 3 of Jore et al (2011) Nature Structural & Molecular Biology 18: 529-536. Cascade is routinely purified with an N-terminal Strep-tag II fused to CasB (or CasC in CasCDE). Size exclusion chromatography (Superdex 200 HR 10/30 (GE)) is performed using 20 mM Tris-HCl (pH 8.0), 0.1 M NaCl, 1 mM dithiothreitol. Cascade preparations (.about.0.3 mg) are incubated with DNase I (Invitrogen) in the presence of 2.5 mM MgCl.sub.2 for 15 min at 37.degree. C. prior to size exclusion analysis. Co-purified nucleic acids are isolated by extraction using an equal volume of phenol:chloroform:isoamylalcohol (25:24:1) pH 8.0 (Fluka), and incubated with either DNase I (Invitrogen) supplemented with 2.5 mM MgCl.sub.2 or RNase A (Fermentas) for 10 min at 37.degree. C. Cas subunit proteins fused to the amino acid sequence of Strep-Tag are produced.
[0144] Plaque assays showing the biological activity of the Strep-Tag Cascade subunits are performed using bacteriophage Lambda and the efficiency of plaquing (EOP) is calculated as described in Brouns et al (2008).
[0145] For purification of crRNA, samples are analyzed by ion-pair reversed-phase HPLC on an Agilent 1100 HPLC with UV.sub.260nm detector (Agilent) using a DNAsep column 50 mm.times.4.6 mm I. D. (Transgenomic, San Jose, Calif.). The chromatographic analysis is performed using the following buffer conditions: A) 0.1 M triethylammonium acetate (TEAA) (pH 7.0) (Fluka); B) buffer A with 25% LC-MS grade acetonitrile (v/v) (Fisher). crRNA is obtained by injecting purified intact Cascade at 75.degree. C. using a linear gradient starting at 15% buffer B and extending to 60% B in 12.5 min, followed by a linear extension to 100% B over 2 min at a flow rate of 1.0 ml/min. Hydrolysis of the cyclic phosphate terminus is performed by incubating the HPLC-purified crRNA in a final concentration of 0.1 M HCl at 4.degree. C. for 1 hour. The samples are concentrated to 5-10 .mu.l on a vacuum concentrator (Eppendorf) prior to ESI-MS analysis.
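The two-step linear gradient described above can be expressed as a piecewise-linear function of time. The sketch below is illustrative only: the function name and the segment representation are hypothetical, and holding the final composition after the last segment is an assumption.

```python
def percent_at(t_min, start_percent, segments):
    """Percent of the strong buffer at time t_min for a piecewise-linear gradient.

    segments: list of (duration_min, end_percent) tuples applied in order.
    After the last segment the final composition is held (assumption).
    """
    if t_min < 0:
        raise ValueError("time must be non-negative")
    level = float(start_percent)
    elapsed = 0.0
    for duration, end in segments:
        if t_min <= elapsed + duration:
            # Linear interpolation within the current segment.
            return level + (end - level) * (t_min - elapsed) / duration
        elapsed += duration
        level = float(end)
    return level


# Gradient from the text: 15% B to 60% B in 12.5 min, then to 100% B over 2 min.
CRRNA_GRADIENT = [(12.5, 60.0), (2.0, 100.0)]

if __name__ == "__main__":
    for t in (0.0, 6.25, 12.5, 13.5, 14.5):
        print(t, percent_at(t, 15.0, CRRNA_GRADIENT))
```

The same helper could describe other multi-step gradients by changing the start value and the segment list.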
[0146] Electrospray Ionization Mass spectrometry analysis of crRNA is performed in negative mode using an UHR-TOF mass spectrometer (maXis) or an HCT Ultra PTM Discovery instrument (both Bruker Daltonics), coupled to an online capillary liquid chromatography system (Ultimate 3000, Dionex, UK). RNA separations are performed using a monolithic (PS-DVB) capillary column (200 .mu.m.times.50 mm I.D., Dionex, UK). The chromatography is performed using the following buffer conditions: C) 0.4 M 1,1,1,3,3,3-Hexafluoro-2-propanol (HFIP, Sigma-Aldrich) adjusted with triethylamine (TEA) to pH 7.0 and 0.1 mM TEAA, and D) buffer C with 50% methanol (v/v) (Fisher). RNA analysis is performed at 50.degree. C. with 20% buffer D, extending to 40% D in 5 min followed by a linear extension to 60% D over 8 min at a flow rate of 2 .mu.l/min.
[0147] Cascade protein is analyzed by native mass spectrometry in 0.15 M ammonium acetate (pH 8.0) at a protein concentration of 5 .mu.M. The protein preparation is obtained by five sequential concentration and dilution steps at 4.degree. C. using a centrifugal filter with a cut-off of 10 kDa (Millipore). Proteins are sprayed from borosilicate glass capillaries and analyzed on a LCT electrospray time-of-flight or modified quadrupole time-of-flight instruments (both Waters, UK) adjusted for optimal performance in high mass detection (see Tahallah N et al (2001) Rapid Commun Mass Spectrom 15: 596-601 and van den Heuvel, R. H. et al. (2006) Anal Chem 78: 7473-83). Exact mass measurements of the individual Cas proteins were acquired under denaturing conditions (50% acetonitrile, 50% MQ, 0.1% formic acid). Sub-complexes in solution were generated by the addition of 2-propanol to the spray solution to a final concentration of 5% (v/v). Instrument settings were as follows: needle voltage .about.1.2 kV, cone voltage .about.175 V, source pressure 9 mbar. Xenon was used as the collision gas for tandem mass spectrometric analysis at a pressure of 1.5.times.10.sup.-2 mbar. The collision voltage varied between 10-200 V.
[0148] Electrophoretic mobility shift assays (EMSA) are used to demonstrate the functional activity of Cascade complexes for target nucleic acids. EMSA is performed by incubating Cascade, CasBCDE or CasCDE with 1 nM labelled nucleic acid in 50 mM Tris-Cl pH 7.5, 100 mM NaCl. Salmon sperm DNA (Invitrogen) is used as competitor. EMSA reactions are incubated at 37.degree. C. for 20-30 min prior to electrophoresis on 5% polyacrylamide gels. The gels are dried and analyzed using phosphor storage screens and a PMI phosphor imager (Bio-Rad). Target DNA binding and cleavage activity of Cascade is tested in the presence of 1-10 mM Ca, Mg or Mn-ions.
[0149] DNA targets are gel-purified long oligonucleotides (Isogen Life Sciences or Biolegio), listed in Supplementary Table 3 of Jore et al (2011). The oligonucleotides are end-labeled using .gamma..sup.32P-ATP (PerkinElmer) and T4 kinase (Fermentas). Double-stranded DNA targets are prepared by annealing complementary oligonucleotides and digesting remaining ssDNA with Exonuclease I (Fermentas). Labelled RNA targets are in vitro transcribed using T7 Maxiscript or T7 Mega Shortscript kits (Ambion) with .alpha..sup.32P-CTP (PerkinElmer) and removing template by DNase I (Fermentas) digestion. Double stranded RNA targets are prepared by annealing complementary RNAs and digesting surplus ssRNA with RNase T1 (Fermentas), followed by phenol extraction.
[0150] Plasmid mobility shift assays are performed using plasmid pWUR613 containing the R44 protospacer. The fragment containing the protospacer is PCR-amplified from bacteriophage P7 genomic DNA using primers BG3297 and BG3298 (see Supplementary Table 3 of Jore et al (2011)). Plasmid (0.4 .mu.g) and Cascade are mixed in a 1:10 molar ratio in a buffer containing 5 mM Tris-HCl (pH 7.5) and 20 mM NaCl and incubated at 37.degree. C. for 30 minutes.
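To translate the 1:10 molar ratio into amounts, the plasmid mass can be converted to moles. The sketch below is a rough illustration under stated assumptions: an average mass of about 650 g/mol per base pair of dsDNA (a common approximation) and a plasmid length of about 3000 bp (pUC-p7 is described as the same size as the 3 kb pUC-.lamda.). The function names are hypothetical.

```python
AVG_BP_MASS_G_PER_MOL = 650.0  # approximate average mass of one dsDNA base pair


def dsdna_pmol(mass_ug, length_bp):
    """Convert a dsDNA mass in micrograms to picomoles of molecules."""
    if mass_ug < 0 or length_bp <= 0:
        raise ValueError("mass must be >= 0 and length > 0")
    grams = mass_ug * 1e-6
    moles = grams / (length_bp * AVG_BP_MASS_G_PER_MOL)
    return moles * 1e12


def cascade_pmol_for_ratio(plasmid_pmol, cascade_per_plasmid=10):
    """Picomoles of Cascade for a given plasmid:Cascade molar ratio."""
    return plasmid_pmol * cascade_per_plasmid


if __name__ == "__main__":
    plasmid = dsdna_pmol(0.4, 3000)  # assumed length of about 3 kb
    print(f"plasmid: {plasmid:.3f} pmol")
    print(f"Cascade at 1:10: {cascade_pmol_for_ratio(plasmid):.2f} pmol")
```

Under these assumptions 0.4 .mu.g of plasmid corresponds to roughly 0.2 pmol, so about 2 pmol of Cascade would be needed.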
[0151] Cascade proteins were then removed by proteinase K treatment (Fluka) (0.15 U, 15 min, 37.degree. C.) followed by phenol/chloroform extraction. RNA-DNA complexes were then treated with RNaseH (Promega) (2 U, 1 h, 37.degree. C.).
[0152] Strep-Tag-Cas protein subunit fusions, which form Cascade protein complexes or active sub-complexes with the RNA component (equivalent to a crRNA), have the expected biological and functional activity of scanning and specific attachment and cleavage of nucleic acid targets. Fusions of the Cas subunits with the amino acid chains of fluorescent dyes also form Cascade complexes and sub-complexes with the RNA component (equivalent to crRNA) which retain biological and functional activity and allow visualisation of the location of a target nucleic acid sequence in dsDNA, for example.
Example 7
A Cascade-Nuclease Pair and Test of Nuclease Activity In Vitro
[0153] Six mutations designated "Sharkey" have been introduced by random mutagenesis and screening to improve nuclease activity and stability of the non-specific nuclease domain from Flavobacterium okeanokoites restriction enzyme FokI (see Guo, J., et al. (2010) J. Mol. Biol. 400: 96-107). Other mutations have been introduced that reduce off-target cleavage activity. This is achieved by engineering electrostatic interactions at the FokI dimer interface of a ZFN pair, creating one FokI variant with a positively charged interface (KKR, E490K, I538K, H537R) and another with a negatively charged interface (ELD, Q486E, I499L, N496D) (see Doyon, Y., et al. (2011) Nature Methods 8: 74-9). Each of these variants is catalytically inactive as a homodimer, thereby reducing the frequency of off-target cleavage.
Cascade-Nuclease Design
[0154] We translationally fused improved FokI nucleases to the N-terminus of Cse1 to generate variants of Cse1 being FokI.sup.KKR-Cse1 and FokI.sup.ELD-Cse1, respectively. These two variants are co-expressed with Cascade subunits (Cse2, Cas7, Cas5 and Cas6e), and one of two distinct CRISPR plasmids with uniform spacers. This loads the Cascade.sup.KKR complex with uniform P7-crRNA, and the Cascade.sup.ELD complex with uniform M13 g8-crRNA. These complexes are purified using the N-terminally StrepII-tagged Cse2 as described in Jore, M. M., et al., (2011) Nat. Struct. Mol. Biol. 18(5): 529-536. Furthermore an additional purification step can be carried out using an N-terminally HIS-tagged FokI, to ensure purifying full length and intact Cascade-nuclease fusion complexes.
[0155] The nucleotide and amino acid sequences of the fusion proteins used in this example were as follows:
TABLE-US-00006 >nucleotide sequence of FokI-(Sharkey-ELD)-Cse1 [SEQ ID NO: 18] ATGGCTCAACTGGTTAAAAGCGAACTGGAAGAGAAAAAAAGTGAACTGCGCCACAAACTGAAATATGTGCCGCA- TGAA TATATCGAGCTGATTGAAATTGCACGTAATCCGACCCAGGATCGTATTCTGGAAATGAAAGTGATGGAATTTTT- TATG AAAGTGTACGGCTATCGCGGTGAACATCTGGGTGGTAGCCGTAAACCGGATGGTGCAATTTATACCGTTGGTAG- CCCG ATTGATTATGGTGTTATTGTTGATACCAAAGCCTATAGCGGTGGTTATAATCTGCCGATTGGTCAGGCAGATGA- AATG GAACGTTATGTGGAAGAAAATCAGACCCGTGATAAACATCTGAATCCGAATGAATGGTGGAAAGTTTATCCGAG- CAGC GTTACCGAGTTTAAATTCCTGTTTGTTAGCGGTCACTTCAAAGGCAACTATAAAGCACAGCTGACCCGTCTGAA- TCAT ATTACCAATTGTAATGGTGCAGTTCTGAGCGTTGAAGAACTGCTGATTGGTGGTGAAATGATTAAAGCAGGCAC- CCTG ACCCTGGAAGAAGTTCGTCGCAAATTTAACAATGGCGAAATCAACTTTGCGGATCCCACCAACCGCGCGAAAGG- CCTG GAAGCGGTGAGCGTGGCGAGCatgaatttgcttattgataactggattcctgtacgcccgcgaaacggggggaa- agtc caaatcataaatctgcaatcgctatactgcagtagagatcagtggcgattaagtttgccccgtgacgatatgga- actg gccgctttagcactgctggtttgcattgggcaaattatcgccccggcaaaagatgacgttgaatttcgacatcg- cata atgaatccgctcactgaagatgagtttcaacaactcatcgcgccgtggatagatatgttctaccttaatcacgc- agaa catccattatgcagaccaaaggtgtcaaagcaaatgatgtgactccaatggaaaaactgttggctggggtaagc- ggcg cgacgaattgtgcatttgtcaatcaaccggggcagggtgaagcattatgtggtggatgcactgcgattgcgtta- ttca accaggcgaatcaggcaccaggttttggtggtggttttaaaagcggtttacgtggaggaacacctgtaacaacg- ttcg tacgtgggatcgatcttcgttcaacggtgttactcaatgtcctcacattacctcgtatcaaaaacaatttccta- atga atcacatacggaaaaccaacctacctggattaaacctatcaagtccaatgagtctatacctgcttcgtcaattg- ggtt tgtccgtggtctattctggcaaccagcgcatattgaattatgcgatcccattgggattggtaaatgttcttgct- gtgg acaggaaagcaatttgcgttataccggttttcttaaggaaaaatttacctttacagttaatgggctatggcccc- atcc gcattccccttgtctggtaacagtcaagaaaggggaggttgaggaaaaatttcttgctttcaccacctccgcac- catc atggacacaaatcagccgagttgtggtagataagattattcaaaatgaaaatggaaatcgcgtggcggcggttg- tgaa tcaattcagaaatattgcgccgcaaagtcctatgaattgattatggggggatatcgtaataatcaagcatctat- tctt gaacggcgtcatgatgtgttgatgtttaatcaggggtggcaacaatacggcaatgtgataaacgaaatagtgac- tgtt 
ggtttgggatataaaacagccttacgcaaggcgttatatacctttgcagaagggtttaaaaataaagacttcaa- aggg gccggagtctctgttcatgagactgcagaaaggcatttctatcgacagagtgaattattaattcccgatgtact- ggcg aatgttaatttttcccaggctgatgaggtaatagctgatttacgagacaaacttcatcaattgtgtgaaatgct- attt aatcaatctgtagctccctatgcacatcatcctaaattaataagcacattagcgcttgcccgcgccacgctata- caaa catttacgggagttaaaaccgcaaggagggccatcaaatggctga >protein sequence of FokI-(Sharkey-ELD)-Cse1 [SEQ ID NO: 19] MAQLVKSELEEKKSELRHKLKYVPHEYIELIEIARNPTQDRILEMKVMEFFMKVYGYRGEHLGGSRKPDGAIYT- VGSP IDYGVIVDTKAYSGGYNLPIGQADEMERYVEENQTRDKHLNPNEWWKVYPSSVTEEKELEVSGHFKGNYKAQLT- RLNH ITNCNGAVLSVEELLIGGEMIKAGTLTLEEVRRKENNGEINEADPTNRAKGLEAVSVASMNLLIDNWIPVRPRN- GGKV QIINLQSLYCSRDQWRLSLPRDDMELAALALLVCIGQIIAPAKDDVEFRHRIMNPLTEDEFQQLIAPWIDMFYL- NHAE HPFMQTKGVKANDVTPMEKLLAGVSGATNCAFVNQPGQGEALCGGCTAIALFNQANQAPGEGGGFKSGLRGGTP- VTTF VRGIDLRSTVLLNVLTLPRLQKQFPNESHTENQPTWIKPIKSNESIPASSIGFVRGLFWQPAHIELCDPIGIGK- CSCC GQESNLRYTGELKEKETFTVNGLWPHPHSPCLVTVKKGEVEEKELAFTTSAPSWTQISRVVVDKIIQNENGNRV- AAVV NQFRNIAPQSPLELEVIGGYRNNQASILERRHDVLMENQGWQQYGNVINEIVTVGLGYKTALRKALYTEAEGEK- NKDE KGAGVSVHETAERHFYRQSELLIPDVLANVNFSQADEVIADLRDKLHQLCEMLFNQSVAPYAHHPKLISTLALA- RATL YKHLRELKPQGGPSNG* >nucleotide sequence of FokI-(Sharkey -KKR)-Cse1 [SEQ ID NO: 20] ATGGCTCAACTGGTTAAAAGCGAACTGGAAGAGAAAAAAAGTGAACTGCGCCACAAACTGAAATATGTGCCGCA- TGAA TATATCGAGCTGATTGAAATTGCACGTAATCCGACCCAGGATCGTATTCTGGAAATGAAAGTGATGGAATTTTT- TATG AAAGTGTACGGCTATCGCGGTGAACATCTGGGTGGTAGCCGTAAACCGGATGGTGCAATTTATACCGTTGGTAG- CCCG ATTGATTATGGTGTTATTGTTGATACCAAAGCCTATAGCGGTGGTTATAATCTGCCGATTGGTCAGGCAGATGA- AATG CAGCGTTATGTGAAAGAAAATCAGACCCGCAACAAACATATTAACCCGAATGAATGGTGGAAAGTTTATCCGAG- CAGC GTTACCGAGTTTAAATTCCTGTTTGTTAGCGGTCACTTCAAAGGCAACTATAAAGCACAGCTGACCCGTCTGAA- TCGT AAAACCAATTGTAATGGTGCAGTTCTGAGCGTTGAAGAACTGCTGATTGGTGGTGAAATGATTAAAGCAGGCAC- CCTG ACCCTGGAAGAAGTTCGTCGCAAATTTAACAATGGCGAAATCAACTTTGCGGATCCCACCAACCGCGCGAAAGG- CCTG GAAGCGGTGAGCGTGGCGAGCatgaatttgcttattgataactggattcctgtacgcccgcgaaacggggggaa- agtc 
caaatcataaatctgcaatcgctatactgcagtagagatcagtggcgattaagtttgccccgtgacgatatgga- actg gccgctttagcactgctggtttgcattgggcaaattatcgccccggcaaaagatgacgttgaatttcgacatcg- cata atgaatccgctcactgaagatgagtttcaacaactcatcgcgccgtggatagatatgttctaccttaatcacgc- agaa catccctttatgcagaccaaaggtgtcaaagcaaatgatgtgactccaatggaaaaactgttggctggggtaag- cggc gcgacgaattgtgcatttgtcaatcaaccggggcagggtgaagcattatgtggtggatgcactgcgattgcgtt- attc aaccaggcgaatcaggcaccaggttttggtggtggttttaaaagcggtttacgtggaggaacacctgtaacaac- gttc gtacgtgggatcgatcttcgttcaacggtgttactcaatgtcctcacattacctcgtcttcaaaaacaatttcc- taat gaatcacatacggaaaaccaacctacctggattaaacctatcaagtccaatgagtctatacctgcttcgtcaat- tggg tttgtccgtggtctattctggcaaccagcgcatattgaattatgcgatcccattgggattggtaaatgttcttg- ctgt ggacaggaaagcaatttgcgttataccggttttcttaaggaaaaatttacctttacagttaatgggctatggcc- ccat ccgcattccccttgtctggtaacagtcaagaaaggggaggttgaggaaaaatttcttgctttcaccacctccgc- acca tcatggacacaaatcagccgagttgtggtagataagattattcaaaatgaaaatggaaatcgcgtggcggcggt- tgtg aatcaattcagaaatattgcgccgcaaagtcctcttgaattgattatggggggatatcgtaataatcaagcatc- tatt cttgaacggcgtcatgatgtgttgatgttaatcaggggtggcaacaatacggcaatgtgataaacgaaatagtg- actg ttggtttgggatataaaacagccttacgcaaggcgttatatacctttgcagaagggtttaaaaataaagacttc- aaag gggccggagtctctgttcatgagactgcagaaaggcatttctatcgacagagtgaattattaattcccgatgta- ctgg cgaatgttaatttttcccaggctgatgaggtaatagctgatttacgagacaaacttcatcaattgtgtgaaatg- ctat ttaatcaatctgtagctccctatgcacatcatcctaaattaataagcacattagcgcttgcccgcgccacgcta- taca aacatttacgggagttaaaaccgcaaggagggccatcaaatggctga >protein sequence of FokI-(Sharkey-KKR)-Cse1 [SEQ ID NO: 21] MAQLVKSELEEKKSELRHKLKYVPHEYIELIEIARNPTQDRILEMKVMEFFMKVYGYRGEHLGGSRKPDGAIYT- VGSP IDYGVIVDTKAYSGGYNLPIGQADEMQRYVKENQTRNKHINPNEWWKVYPSSVTEFKFLFVSGHFKGNYKAQLT- RLNR KTNCNGAVLSVEELLIGGEMIKAGTLTLEEVRRKENNGEINFADPTNRAKGLEAVSVASMNLLIDNWIPVRPRN- GGKV QIINLQSLYCSRDQWRLSLPRDDMELAALALLVCIGQIIAPAKDDVEFRHRIMNPLTEDEFQQLIAPWIDMFYL- NHAE HPFMQTKGVKANDVTPMEKLLAGVSGATNCAFVNQPGQGEALCGGCTAIALFNQANQAPGEGGGEKSGLRGGTP- VTTE 
VRGIDLRSTVLLNVLTLPRLQKQFPNESHTENQPTWIKPIKSNESIPASSIGFVRGLFWQPAHIELCDPIGIGK- CSCC GQESNLRYTGELKEKETFTVNGLWPHPHSPCLVTVKKGEVEEKFLAFTTSAPSWTQISRVVVDKIIQNENGNRV- AAVV NQFRNIAPQSPLELIMGGYRNNQASILERRHDVLMENQGWQQYGNVINEIVTVGLGYKTALRKALYTFAEGEKN- KDFK GAGVSVHETAERHFYRQSELLIPDVLANVNFSQADEVIADLRDKLHQLCEMLFNQSVAPYAHHPKLISTLALAR- ATLY KHLRELKPQGGPSNG* >nucleotide sequence of His.sub.6-Dual-monopartite NLS SV40-FokI-(Sharkey-KKR)-Cse1 ("His.sub.6" disclosed as SEQ ID NO: 48) [SEQ ID NO: 22] ATGcatcaccatcatcaccacCCGAAAAAAAAGCGCAAAGTGGATCCGAAGAAAAAACGTAAAGTTGAAGATCC- GAAA GACATGGCTCAACTGGTTAAAAGCGAACTGGAAGAGAAAAAAAGTGAACTGCGCCACAAACTGAAATATGTGCC- GCAT GAATATATCGAGCTGATTGAAATTGCACGTAATCCGACCCAGGATCGTATTCTGGAAATGAAAGTGATGGAATT- TTTT ATGAAAGTGTACGGCTATCGCGGTGAACATCTGGGTGGTAGCCGTAAACCGGATGGTGCAATTTATACCGTTGG- TAGC CCGATTGATTATGGTGTTATTGTTGATACCAAAGCCTATAGCGGTGGTTATAATCTGCCGATTGGTCAGGCAGA- TGAA ATGCAGCGTTATGTGAAAGAAAATCAGACCCGCAACAAACATATTAACCCGAATGAATGGTGGAAAGTTTATCC- GAGC AGCGTTACCGAGTTTAAATTCCTGTTTGTTAGCGGTCACTTCAAAGGCAACTATAAAGCACAGCTGACCCGTCT- GAAT CGTAAAACCAATTGTAATGGTGCAGTTCTGAGCGTTGAAGAACTGCTGATTGGTGGTGAAATGATTAAAGCAGG- CACC CTGACCCTGGAAGAAGTTCGTCGCAAATTTAACAATGGCGAAATCAACTTTGCGGATCCCACCAACCGCGCGAA- AGGC CTGGAAGCGGTGAGCGTGGCGAGCatgaatttgcttattgataactggattcctgtacgcccgcgaaacggggg- gaaa gtccaaatcataaatattgggcaaattatcgccccggcaaaagatgacgttgaatttcgacatcgcataatgaa- tccg ctcactgaagatgagtttcaacaactcatcgcgccgtggatagatatgttctaccttaatcacgcagaacatcc- cttt atgcagaccaaaggtgtcaaagcaaatgatgtgactccaatggaaaaactgttggctggggtaagcggcgcgac- gaat tgtgcatttgtcaatcaaccggggcagggtgaagcattatgtggtggatgcactgcgattgcgttattcaacca- ggcg aatcaggcaccaggttttggtggtggttttaaaagcggtttacgtggaggaacacctgtaacaacgttcgtacg- tggg atcgatcttcgttcaacggtgttactcaatgtcctcacattacctcgtcttcaaaaacaatttcctaatgaatc- acat acggaaaaccaacctacctggattaaacctatcaagtccaatgagtctatacctgcttcgtcaattgggtttgt- ccgt ggtctattctggcaaccagcgcatattgaattatgcgatcccattgggattggtaaatgttcttgctgtggaca- ggaa 
agcaatttgcgttataccggttttcttaaggaaaaatttacctttacagttaatgggctatggcccatccgcat- tccc cttgtctggtaacagtcaagaaaggggaggttgaggaaaaatttcttgctttcaccacctccgcaccatcatgg- acac aaatcagccgagttgtggtagataagattattcaaaatgaaaatggaaatcgcgtggcggcggttgtgaatcaa- ttca gaaatattgcgccgcaaagtcctcttgaattgattatggggggatatcgtaataatcaagcatctattcttgaa- cggc gtcatgatgtgttgatgtttaatcaggggtggcaacaatacggcaatgtgataaacgaaatagtgactgttggt- ttgg gatataaaacagccttacgcaaggcgttatatacctttgcagaagggtttaaaaataaagacttcaaaggggcc- ggag tctctgttcatgagactgcagaaaggcatttctatcgacagagtgaattattaattcccgatgtactggcgaat- gtta atttttcccaggctgatgaggtaatagctgatttacgagacaaacttcatcaattgtgtgaaatgctatttaat- caat ctgtagctccctatgcacatcatcctaaattaataagcacattagcgcttgcccgcgccacgctatacaaacat- ttac gggagttaaaaccgcaaggagggccatcaaatggctga >protein sequence of His.sub.6-Dual-monopartite NLS SV40-FokI-(Sharkey-KKR)-Cse1 ("His.sub.6" disclosed as SEQ ID NO: 48) [SEQ ID NO: 23] MHHHHHHPKKKRKVDPKKKRKVEDPKDMAQLVKSELEEKKSELRHKLKYVPHEYIELIEIARNPTQDRILEMKV- MEFF MKVYGYRGEHLGGSRKPDGAIYTVGSPIDYGVIVDTKAYSGGYNLPIGQADEMQRYVKENQTRNKHINPNEWWK- VYPS SVTEFKFLFVSGHFKGNYKAQLTRLNRKTNVNGAVLSVEELLIGGEMIKAGTLTLEEVRRKFNNGEINFADPTN- RAKG LEAVSVASMNLLIDNWIPVRPRNGGKVQIINLQSLYCSRDQWRLSLPRDDMELAALALLVCIGQIIAPAKDDVE- FRHR IMNPLTEDEFQQLIAPWIDMFYLNHAEHPFMQTKGVKANDVTPMEKLLAGVSGATNCAFVNQPGQGEALCGGCT- AIAL FNQANQAPGFGGGFKSGLRGGTPVTTFVRGIDLRSTVLLNVLTLPRLQKQFPNESHTENQPTWIKPIKSNESIP- ASSI GFVRGLFWQPAHIELCDPIGIGKCSCCGQESNLRYTGFLKEKFTFTVNGLWPHPHSPCLVTVKKGEVEEKFLAF- TTSA PSWTQISRVVVDKIIQNENGNRVAAVVNQFRNIAPQSPLELIMGGYRNNQASILERHHDVLMFNQGWQQYGNVI- NEIV TVGLGYKTALRKALYTFAEGFKNKDFKGAGVSVHETAERHFYRQSELLIPDVLANVNFSQADEVIADLRDKLHQ- LCEM LFNQSVAPYAHHPKLISTLALARATLYKHLRELKPQGGPSNG* >nucleotide sequence of His.sub.6-Dual-monopartite NLS SV40-FokI (Sharkey-ELD)-Cse1 ("His.sub.6" disclosed as SEQ ID NO: 48) [SEQ ID NO: 24] ATGcatcaccatcatcaccacCCGAAAAAAAAGCGCAAAGTGGATCCGAAGAAAAAACGTAAAGTTGAAGATCC- GAAA
GACATGGCTCAACTGGTTAAAAGCGAACTGGAAGAGAAAAAAAGTGAACTGCGCCACAAACTGAAATATGTGCC- GCAT GAATATATCGAGCTGATTGAAATTGCACGTAATCCGACCCAGGATCGTATTCTGGAAATGAAAGTGATGGAATT- TTTT ATGAAAGTGTACGGCTATCGCGGTGAACATCTGGGTGGTAGCCGTAAACCGGATGGTGCAATTTATACCGTTGG- TAGC CCGATTGATTATGGTGTTATTGTTGATACCAAAGCCTATAGCGGTGGTTATAATCTGCCGATTGGTCAGGCAGA- TGAA ATGGAACGTTATGTGGAAGAAAATCAGACCCGTGATAAACATCTGAATCCGAATGAATGGTGGAAAGTTTATCC- GAGC AGCGTTACCGAGTTTAAATTCCTGTTTGTTAGCGGTCACTTCAAAGGCAACTATAAAGCACAGCTGACCCGTCT- GAAT CATATTACCAATTGTAATGGTGCAGTTCTGAGCGTTGAAGAACTGCTGATTGGTGGTGAAATGATTAAAGCAGG- CACC CTGACCCTGGAAGAAGTTCGTCGCAAATTTAACAATGGCGAAATCAACTTTGCGGATCCCACCAACCGCGCGAA- AGGC CTGGAAGCGGTGAGCGTGGCGAGCatgaatttgcttattgataactggattcctgtacgcccgcgaaacggggg- gaaa gtccaaatcataaatctgcaatcgctatactgcagtagagatcagtggcgattaagtttgccccgtgacgatat- ggaa ctggccgctttagcactgctggtttgcattgggcaaattatcgccccggcaaaagatgacgttgaatttcgaca- tcgc ataatgaatccgctcactgaagatgagtttcaacaactcatcgcgccgtggatagatatgttctaccttaatca- cgca gaacatccctttatgcagaccaaaggtgtcaaagcaaatgatgtgactccaatggaaaaactgttggctggggt- aagc ggcgcgacgaattgtgcatttgtcaatcaaccggggcagggtgaagcattatgtggtggatgcactgcgattgc- gtta ttcaaccaggcgaatcaggcaccaggttttggtggtggttttaaaagcggtttacgtggaggaacacctgtaac- aacg ttcgtacgtgggatcgatcttcgttcaacggtgttactcaatgtcctcacattacctcgtcttcaaaaacaatt- tcct aatgaatcacatacggaaaaccaacctacctggattaaacctatcaagtccaatgagtctatacctgcttcgtc- aatt gggtttgtccgtggtctattctggcaaccagcgcatattgaattatgcgatcccattgggattggtaaatgttc- ttgc tgtggacaggaaagcaatttgcgttataccggttttcttaaggaaaaatttacctttacagttaatgggctatg- gccc catccgcattccccttgtctggtaacagtcaagaaaggggaggttgaggaaaaatttcttgctttcaccacctc- cgca ccatcatggacacaaatcagccgagttgtggtagataagattattcaaaatgaaaatggaaatcgcgtggcggc- ggtt gtgaatcaattcagaaatattgcgccgcaaagtcctcttgaattgattatggggggatatcgtaataatcaagc- atct attcttgaacggcgtcatgatgtgttgatgtttaatcaggggtggcaacaatacggcaatgtgataaacgaaat- agtg actgttggtttgggatataaaacagccttacgcaaggcgttatatacctttgcagaagggtttaaaaataaaga- cttc 
aaaggggccggagtctctgttcatgagactgcagaaaggcatttctatcgacagagtgaattattaattcccga- tgta ctggcgaatgttaatttttcccaggctgatgaggtaatagctgatttacgagacaaacttcatcaattgtgtga- aatg ctatttaatcaatctgtagctccctatgcacatcatcctaaattaataagcacattagcgcttgcccgcgccac- gcta tacaaacatttacgggagttaaaaccgcaaggagggccatcaaatggctga >protein sequence of His.sub.6-Dual-monopartite NLS SV40-FokI-(Sharkey-ELD)-Cse1 ("His.sub.6" disclosed as SEQ ID NO: 48) [SEQ ID NO: 25] MHHHHHHPKKKRKVDPKKKRKVEDPKDMAQLVKSELEEKKSELRHKLKYVPHEYIELIEIARNPTQDRILEMKV- MEFF MKVYGYRGEHLGGSRKPDGAIYTVGSPIDYGVIVDTKAYSGGYNLPIGQADEMERYVEENQTRDKHLNPNEWWK- VYPS SVTEFKFLFVSGHFKGNYKAQLTRLNHITNCNGAVLSVEELLIGGEMIKAGTLTLEEVRRKFNNGEINFADPTN- RAKG LEAVSVASMNLLIDNWIPVRPRNGGKVQIINLQSLYCSRDQWRLSLPRDDMELAALALLVCIGQIIAPAKDDVE- FRHR IMNPLTEDEFQQLIAPWIDMFYLNHAEHPFMQTKGVKANDVTPMEKLLAGVSGATNCAFVNQPGQGEALCGGCT- AIAL FNQANQAPGFGGGFKSGLRGGTPVTTFVRGIDLRSTVLLNVLTLPRLQKQFPNESHTENQPTWIKPIKSNESIP- ASSI GFVRGLFWQPAHIELCDPIGIGKCSCCGQESNLRYTGFLKEKFTFTVNGLWPHPHSPCLVTVKKGEVEEKFLAF- TTSA PSWTQISRVVVDKIIQNENGNRVAAVVNQFRNIAPQSPLELIMGGYRNNQASILERRHDVLMFNQGWQQYGNVI- NEIV TVGLGYKTALRKALYTFAEGFKNKDFKGAGVSVHETAERHFYRQSELLIPDVLANVNFSQADEVIADLRDKLHQ- LCEM LFNQSVAPYAHHPKLISTLALARATLYKHLRELKPQGGPSNG*
DNA Cleavage Assay
[0156] The specificity and activity of the complexes were tested using an artificially constructed target plasmid as a substrate. This plasmid contains M13 and P7 binding sites on opposing strands such that both FokI domains face each other (see FIG. 11). The distance between the Cascade binding sites varies between 25 and 50 base pairs in 5 bp increments. As the binding sites of Cascade need to be flanked by any of four known PAM sequences (5'-protospacer-CTT/CAT/CTC/CCT-3'), this distance range gives sufficient flexibility to design such a pair for almost any given sequence.
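By way of illustration only, candidate Cascade binding sites can be located by scanning a sequence for protospacer-length windows that are followed by one of the four PAM trinucleotides named in paragraph [0156]. The Python sketch below assumes the orientation 5'-protospacer-PAM-3' as written in that paragraph and a protospacer length of 32 nucleotides; the length, the function names and the toy sequence are assumptions made for the example and do not describe the software, if any, used in the experiments.

```python
# Illustrative sketch only: names, the 32-nt protospacer length and the
# toy sequence are assumptions, not part of the described experiments.
PAMS = ("CTT", "CAT", "CTC", "CCT")
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_pam_flanked_sites(seq: str, protospacer_len: int = 32):
    """List (start, pam) for every window of protospacer_len nucleotides
    that is immediately followed (3' side) by one of the PAMs.

    Only the strand passed in is scanned; call the function again on
    reverse_complement(seq) to scan the opposing strand.
    """
    seq = seq.upper()
    hits = []
    # Last valid start leaves room for the protospacer plus a 3-nt PAM.
    for start in range(len(seq) - protospacer_len - 2):
        pam = seq[start + protospacer_len:start + protospacer_len + 3]
        if pam in PAMS:
            hits.append((start, pam))
    return hits


# Toy example: one 32-nt window followed by CTT on the top strand.
top_strand = "A" * 32 + "CTT" + "G" * 5
top_hits = find_pam_flanked_sites(top_strand)
bottom_hits = find_pam_flanked_sites(reverse_complement(top_strand))
```

Sites found on the two strands could then be paired and filtered by the distance between them, for instance keeping only pairs within the 25 to 50 bp range tested here.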
[0157] The sequences of the target plasmids used are as follows. The number indicates the distance between the M13 and P7 target sites. Protospacers are shown in bold, PAMs underlined:
TABLE-US-00007 >50 bp [SEQ ID NO: 26] gaattcACAACGGTGAGCAAGTCACTGTTGGCAAGCCAGGATCTGAACA ATACCGTCTTGCTTTCGAGCGCTAGCTCTAGAACTAGTCCTCAGCCTAG GCCTCGTTCCGAAGCTGTCTTTCGCTGCTGAGGGTGACGATCCCGCATA GGCGGCCTTTAACTCggatcc >45 bp [SEQ ID NO: 27] gaattcACAACGGTGAGCAAGTCACTGTTGGCAAGCCAGGATCTGAACA ATACCGTCTTTTCGAGCGCTAGCTCTAGAACTAGTCCTCAGCCTAGGCC TCGTTCAAGCTGTCTTTCGCTGCTGAGGGTGACGATCCCGCATAGGCGG CCTTTAACTCggatcc >40 bp [SEQ ID NO: 28] gaattcACAACGGTGAGCAAGTCACTGTTGGCAAGCCAGGATCTGAACA ATACCGTCTTCGAGCGCTAGCTCTAGAACTAGTCCTCAGCCTAGGCCTC GAAGCTGTCTTTCGCTGCTGAGGGTGACGATCCCGCATAGGCGGCCTTT AACTCggatcc >35 bp [SEQ ID NO: 29] gaattcACAACGGTGAGCAAGTCACTGTTGGCAAGCCAGGATCTGAACA ATACCGTCTTGCGCTAGCTCTAGAACTAGTCCTCAGCCTAGGCCTAAGC TGTCTTTCGCTGCTGAGGGTGACGATCCCGCATAGGCGGCCTTTAACTC ggatcc >30 bp [SEQ ID NO: 30] gaattcACAACGGTGAGCAAGTCACTGTTGGCAAGCCAGGATCTGAACA ATACCGTCTTGCTAGCTCTAGAACTAGTCCTCAGCCTAGGAAGCTGTCT TTCGCTGCTGAGGGTGACGATCCCGCATAGGCGGCCTTTAACTCggatc c >25 bp [SEQ ID NO: 31] gaattcACAACGGTGAGCAAGTCACTGTTGGCAAGCCAGGATCTGAACA ATACCGTCTTCTCTAGAACTAGTCCTCAGCCTAGGAAGCTGTCTTTCGC TGCTGAGGGTGACGATCCCGCATAGGCGGCCTTTAACTCggatcc
[0159] Cleavage of the target plasmids was analysed on agarose gels, where negatively supercoiled (nSC) plasmid can be distinguished from linearized or nicked plasmid. The cleavage site of the Cascade.sup.KKR/ELD pair in a target vector was determined by isolating linear cleavage products from an agarose gel and filling in the recessed 3' ends left by FokI cleavage with the Klenow fragment of E. coli DNA polymerase to create blunt ends. The linear vector was self-ligated, transformed, amplified, isolated and sequenced. Filling in of recessed 3' ends and re-ligation leads to extra nucleotides in the sequence that represent the overhang left by FokI cleavage. By aligning the sequence reads to the original sequence, the cleavage sites can be found on a clonal level and mapped. Below, the additional bases incorporated into the sequence after filling in recessed 3' ends left by FokI cleavage are underlined:
##STR00001##
Reading from top to bottom, the 5'-3' sequences above are SEQ ID NOs: 32-35, respectively.
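The mapping logic of paragraph [0159] can be sketched in Python as follows. The sketch assumes a staggered cut that leaves a 5' overhang (top-strand cut to the left of the bottom-strand cut, both given in top-strand coordinates), so that fill-in followed by blunt-end re-ligation duplicates the overhang bases in the sequenced clone. The function names and the short reference sequence are hypothetical and chosen only for the example.

```python
# Illustrative sketch only: function names and the toy reference are
# hypothetical; coordinates are 0-based positions on the top strand.

def fill_in_and_religate(ref: str, top_cut: int, bottom_cut: int) -> str:
    """Simulate fill-in of recessed 3' ends and blunt-end re-ligation.

    The overhang ref[top_cut:bottom_cut] is present on both filled-in
    ends, so it appears twice in the re-ligated sequence.
    """
    if not 0 <= top_cut <= bottom_cut <= len(ref):
        raise ValueError("expected 0 <= top_cut <= bottom_cut <= len(ref)")
    return ref[:bottom_cut] + ref[top_cut:]


def map_cleavage(ref: str, read: str):
    """Recover (top_cut, bottom_cut, overhang) from a sequenced clone.

    Returns the first cut pair that reproduces the read, or None.
    In repetitive sequence more than one pair may fit equally well.
    """
    extra = len(read) - len(ref)
    if extra < 0:
        return None
    for bottom_cut in range(extra, len(ref) + 1):
        top_cut = bottom_cut - extra
        if ref[:bottom_cut] + ref[top_cut:] == read:
            return top_cut, bottom_cut, ref[top_cut:bottom_cut]
    return None


reference = "ACGTTGCAAGGCTTACGATC"
clone = fill_in_and_religate(reference, top_cut=8, bottom_cut=12)
mapped = map_cleavage(reference, clone)  # (8, 12, "AGGC")
```

In this toy case the four duplicated bases correspond to the bases that would be underlined in the alignment above.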
Cleavage of a Target Locus in Human Cells
[0160] The human CCR5 gene encodes the C--C chemokine receptor type 5 protein, which serves as the receptor for the human immunodeficiency virus (HIV) on the surface of white blood cells. The CCR5 gene is targeted using a pair of Cascade.sup.KKR/ELD nucleases in addition to an artificial GFP locus. A suitable binding site pair is selected on the coding region of CCR5. Two separate CRISPR arrays containing uniform spacers targeting each of the binding sites are constructed using DNA synthesis (Geneart).
[0161] The human CCR5 target gene selection and CRISPR designs used are as follows:
TABLE-US-00008 >Part of genomic human CCR5 sequence, containing whole ORF (position 347-1446). [SEQ ID NO: 36] GGTGGAACAAGATGGATTATCAAGTGTCAAGTCCAATCTATGACATCA ATTATTATACATCGGAGCCCTGCCAAAAAATCAATGTGAAGCAAATCG CAGCCCGCCTCCTGCCTCCGCTCTACTCACTGGTGTTCATCTTTGGTT TTGTGGGCAACATGCTGGTCATCCTCATCCTGATAAACTGCAAAAGGC TGAAGAGCATGACTGACATCTACCTGCTCAACCTGGCCATCTCTGACC TGTTTTTCCTTCTTACTGTCCCCTTCTGGGCTCACTATGCTGCCGCCC AGTGGGACTTTGGAAATACAATGTGTCAACTCTTGACAGGGCTCTATT TTATAGGCTTCTTCTCTGGAATCTTCTTCATCATCCTCCTGACAATCG ATAGGTACCTGGCTGTCGTCCATGCTGTGTTTGCTTTAAAAGCCAGGA CGGTCACCTTTGGGGTGGTGACAAGTGTGATCACTTGGGTGGTGGCTG TGTTTGCGTCTCTCCCAGGAATCATCTTTACCAGATCTCAAAAAGAAG GTCTTCATTACACCTGCAGCTCTCATTTTCCATACAGTCAGTATCAAT TCTGGAAGAATTTCCAGACATTAAAGATAGTCATCTTGGGGCTGGTCC TGCCGCTGCTTGTCATGGTCATCTGCTACTCGGGAATCCTAAAAACTC TGCTTCGGTGTCGAAATGAGAAGAAGAGGCACAGGGCTGTGAGGCTTA TCTTCACCATCATGATTGTTTATTTTCTCTTCTGGGCTCCCTACAACA TTGTCCTTCTCCTGAACACCTTCCAGGAATTCTTTGGCCTGAATAATT GCAGTAGCTCTAACAGGTTGGACCAAGCTATGCAGGTGACAGAGACTC TTGGGATGACGCACTGCTGCATCAACCCCATCATCTATGCCTTTGTCG GGGAGAAGTTCAGAAACTACCTCTTAGTCTTCTTCCAAAAGCACATTG CCAAACGCTTCTGCAAATGCTGTTCTATTTTCCAGCAAGAGGCTCCCG AGCGAGCAAGCTCAGTTTACACCCGATCCACTGGGGAGCAGGAAATAT CTGTGGGCTTGTGACACGGACTCAAGTGGGCTGGTGACCCAGTC Red1/2: chosen target sites (distance: 34 bp, PAM 5'-CTT-3'). "Red 1 is first appearing underlined sequence in the above. Red2 is the second underlined sequence. 
>CRISPR array red1 (italics = spacers, bold = repeats) [SEQ ID NO: 37] ccatggTAATACGACTCACTATAGGGAGAATTAGCTGATCTTTAATAA TAAGGAAATGTTACATTAAGGTTGGTGGGTTGTTTTTATGGGAAAAAA TGCTTTAAGAACAAATGTATACTTTTAGAGAGTTCCCCGCGCCAGCGG GGATAAACCGCAAACACAGCATGGACGACAGCCAGGTACCTAGAGTTC CCCGCGCCAGCGGGGATAAACCGCAAACACAGCATGGACGACAGCCAG GTACCTAGAGTTCCCCGCGCCAGCGGGGATAAACCGCAAACACAGCAT GGACGACAGCCAGGTACCTAGAGTTCCCCGCGCCAGCGGGGATAAACC GAAAACAAAAGGCTCAGTCGGAAGACTGGGCCTTTTGTTTTAACCCCT TGGGGCCTCTAAACGGGTCTTGAGGGGTTTTTTGggtacc >CRISPR array red2 (italics: spacers, bold: repeats) [SEQ ID NO: 38] ccatggTAATACGACTCACTATAGGGAGAATTAGCTGATCTTTAATAA TAAGGAAATGTTACATTAAGGTTGGTGGGTTGTTTTTATGGGAAAAAA TGCTTTAAGAACAAATGTATACTTTTAGAGAGTTCCCCGCGCCAGCGG GGATAAACCGTGTGATCACTTGGGTGGTGGCTGTGTTTGCGTGAGTTC CCCGCGCCAGCGGGGATAAACCGTGTGATCACTTGGGTGGTGGCTGTG TTTGCGTGAGTTCCCCGCGCCAGCGGGGATAAACCGTGTGATCACTTG GGTGGTGGCTGTGTTTGCGTGAGTTCCCCGCGCCAGCGGGGATAAACC GAAAACAAAAGGCTCAGTCGGAAGACTGGGCCTTTTGTTTTAACCCCT TGGGGCCTCTAAACGGGTCTTGAGGGGTTTTTTGggtacc
Delivery of Cascade.sup.KKR/ELD into the Nucleus of Human Cells
[0162] Cascade is very stable as a multi-subunit protein-RNA complex and is easily produced in mg quantities in E. coli. Transfection or micro-injection of the complex in its intact form as purified from E. coli is used as the method of delivery (see FIG. 12). As shown in FIG. 12, Cascade-FokI nucleases are purified from E. coli and encapsulated in protein transfection vesicles. These are then fused with the cell membrane of human HepG2 cells, releasing the nucleases in the cytoplasm (step 2). NLS sequences are then recognized by importin proteins, which facilitate passage through the nuclear pore (step 3). Cascade.sup.KKR (open rectangle) and Cascade.sup.ELD (filled rectangle) will then find and cleave their target site (step 4), inducing DNA repair pathways that will alter the target site, leading to desired changes. Cascade.sup.KKR/ELD nucleases need to act only once and require no permanent presence in the cell encoded on DNA.
[0163] To deliver Cascade into human cells, protein transfection reagents are used from various sources including Pierce, NEB, Fermentas and Clontech. These reagents have recently been developed for the delivery of antibodies, and are useful to transfect a broad range of human cell lines with efficiencies up to 90%. Human HepG2 cells are transfected. Also, other cell lines including CHO-K1, COS-7, HeLa, and non-embryonic stem cells, are transfected.
[0164] To import the Cascade.sup.KKR/ELD nuclease pair into the nucleus, a tandem monopartite nuclear localisation signal (NLS) from the large T-antigen of simian virus 40 (SV40) is fused to the N-terminus of FokI. This ensures import of only intact Cascade.sup.ELD/KKR into the nucleus. (The nuclear pore complex translocates RNA polymerases (550 kDa) and other large protein complexes.) Prior to transfection, the nuclease activity of the Cascade.sup.KKR/ELD nuclease pair is checked in vitro using purified complexes and CCR5 PCR amplicons, to exclude transfecting non-productive Cascade.sup.KKR/ELD nuclease pairs.
Surveyor Assay
[0165] Transfected cells are cultivated and passaged for several days. The efficiency of in vivo target DNA cleavage is then assessed by using the Surveyor assay of Guschin, D. Y., et al. (2010) Methods Mol. Biol., 649: 247-256. Briefly, PCR amplicons of the target DNA locus will be mixed 1:1 with PCR amplicons from untreated cells. These are heated and allowed to anneal, giving rise to mismatches at target sites that have been erroneously repaired by NHEJ. A mismatch nuclease is then used to cleave only mismatched DNA molecules, giving a maximum of 50% of cleavage when target DNA cleavage by Cascade.sup.KKR/ELD is complete. This procedure was then followed up by sequencing of the target DNA amplicons of treated cells. The assay allows for rapid assessment and optimization of the delivery procedure.
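The 50% ceiling stated in paragraph [0165] can be reproduced with a simple re-annealing model, sketched below in Python. The model is an assumption made for illustration: strands re-anneal at random, and all modified alleles are treated as one sequence class, so that only wild-type/modified duplexes carry a mismatch. Repair by NHEJ typically yields a mixture of different alleles, in which case the observed mismatch fraction may differ from this estimate. The function name is hypothetical.

```python
# Illustrative model only: random re-annealing with a single class of
# modified allele is an assumption, not a statement from the assay protocol.

def heteroduplex_fraction(modified_fraction: float,
                          mixed_with_untreated: bool = True) -> float:
    """Expected fraction of mismatched duplexes after melting and annealing.

    modified_fraction: fraction of amplicons from treated cells that carry
        a modified target site (0.0 to 1.0).
    mixed_with_untreated: if True, treated amplicons are first mixed 1:1
        with amplicons from untreated cells, halving the modified fraction.
    """
    if not 0.0 <= modified_fraction <= 1.0:
        raise ValueError("modified_fraction must lie between 0 and 1")
    p = modified_fraction / 2 if mixed_with_untreated else modified_fraction
    # A duplex is mismatched when one strand is modified and the other is not.
    return 2 * p * (1 - p)


complete_cleavage = heteroduplex_fraction(1.0)   # 0.5, the stated maximum
half_modified = heteroduplex_fraction(0.5)       # 0.375
```

Under these assumptions the 1:1 mixing step is what makes the signal rise monotonically with the modified fraction: without it, a fully modified population of identical alleles would re-anneal without mismatches and give no signal.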
Production of Cascade-Nuclease Pairs
[0166] The Cascade-nuclease complexes were constructed as explained above. Affinity purification from E. coli using the StrepII-tagged Cse2 subunit yields a complex with the expected stoichiometry when compared to native Cascade. FIG. 13 shows the stoichiometry of native Cascade (1), Cascade.sup.KKR with P7 crRNA and Cascade.sup.ELD with M13 crRNA 24 h after purification using only Streptactin. Bands in native Cascade (1) are from top to bottom: Cse1, Cas7, Cas5, Cas6e, Cse2. Cascade.sup.KKR/ELD show the FokI-Cse1 fusion band and an additional band representing Cse1 with a small part of FokI as a result of proteolytic degradation.
[0167] Apart from an intact FokI-Cse1 fusion protein, we observed that a fraction of the FokI-Cse1-fusion protein is proteolytically cleaved, resulting in a Cse1 protein with only the linker and a small part of FokI attached to it (as confirmed by Mass Spectrometry, data not shown). In most protein isolations the fraction of degraded fusion protein is approximately 40%. The isolated protein is stably stored in the elution buffer (20 mM HEPES pH 7.5, 75 mM NaCl, 1 mM DTT, 4 mM desthiobiotin) with additional 0.1% Tween 20 and 50% glycerol at -20.degree. C. Under these storage conditions, integrity and activity of the complex have been found stable for at least three weeks (data not shown).
Introduction of a His.sub.6-Tag (SEQ ID NO: 48) and NLS to the Cascade-Nuclease
[0168] The Cascade nuclease fusion design was modified to incorporate a Nuclear Localization Signal (NLS) to enable transport into the nucleus of eukaryotic cells. For this, a tandem monopartite NLS from the large T-antigen of Simian Virus SV40 (sequence: PKKKRKVDPKKKRKV) (SEQ ID NO: 49) was translationally fused to the N-terminus of the FokI-Cse1 fusion protein, directly preceded by a His.sub.6-tag at the N-terminus. The His.sub.6-tag (sequence: MHHHHHH) ("His.sub.6" disclosed as SEQ ID NO: 48 and "MHHHHHH" disclosed as SEQ ID NO: 50) allows for an additional Ni.sup.2+-resin affinity purification step after StrepII purification. This additional step ensures the isolation of only full-length Cascade-nuclease fusion complex, and increases the efficiency of cleavage by eliminating the binding of non-intact Cascade complexes to the target site, which would form an unproductive nuclease pair.
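As a consistency check on the construct design, the 5' end of the His.sub.6-NLS nucleotide sequences can be translated and compared with the tag and NLS peptides given in paragraph [0168]. The Python sketch below uses the standard genetic code; the codons shown are the first 22 codons as read from the listing of SEQ ID NO: 22, and the helper name `translate` is an assumption for the example.

```python
# Illustrative sketch only: the helper name is hypothetical; the codons are
# the first 22 codons as read from the SEQ ID NO: 22 listing in this document.
BASES = "TCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
# Standard genetic code, built in TCAG order for each codon position.
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate an open reading frame, stopping at the first stop codon.

    Whitespace is ignored and lower-case letters are accepted.
    """
    dna = "".join(dna.split()).upper()
    protein = []
    for i in range(0, len(dna) - len(dna) % 3, 3):
        residue = CODON_TABLE[dna[i:i + 3]]
        if residue == "*":
            break
        protein.append(residue)
    return "".join(protein)


N_TERMINUS = ("ATG cat cac cat cat cac cac "
              "CCG AAA AAA AAG CGC AAA GTG "
              "GAT CCG AAG AAA AAA CGT AAA GTT")

HIS_TAG = "MHHHHHH"          # SEQ ID NO: 50
NLS = "PKKKRKVDPKKKRKV"      # SEQ ID NO: 49
n_terminal_peptide = translate(N_TERMINUS)
```

The translated peptide is the His.sub.6-tag followed directly by the tandem NLS, in agreement with the order of elements described above.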
In Vitro Cleavage Assay
[0169] Cascade.sup.KKR/ELD activity and specificity was assayed in vitro as described above. FIG. 14A shows plasmids with distances between protospacers of 25-50 bp (5 bp increments, lanes 1-6) incubated with Cascade.sup.KKR/ELD for 30 minutes at 37.degree. C. Lane 10 contains the target plasmid in its three possible topologies: the lowest band represents the initial, negatively supercoiled (nSC) form of the plasmid, the middle band represents the linearized form (cleaved by XbaI), whilst the upper band represents the open circular (OC) form (after nicking with Nt.BbvCI). Lane 7 shows incubation of a plasmid with both binding sites removed (negative control). Therefore FIG. 14A shows a typical cleavage assay using various target plasmids in which the binding sites are separated by 25 to 50 base pairs in 5 bp increments (lanes 1 to 6). These plasmids with distances of 25-50 bp were incubated with Cascade.sup.KKR/ELD carrying anti-P7 and anti-M13 crRNA respectively. A plasmid containing no binding sites served as a control (lane 7). The original plasmid exists in negatively supercoiled form (nSC, control lane 8), and nicked or linearized products are clearly distinguishable. Upon incubation a linear cleavage product is formed when binding sites are separated by 30, 35 and 40 base pairs (lanes 2, 3, 4). At 25, 45 and 50 base pairs distance (lanes 1, 5, 6), the target plasmid appeared to be incompletely cleaved, leading to the nicked form (OC). These results show the best cleavage in plasmids with distances between 30 and 40 bp, giving sufficient flexibility when designing a crRNA pair for any given locus. Both shorter and longer distances result in increased nicking activity while creating fewer DSBs. There is very little activity on a plasmid where the two protospacers have been removed, showing target specificity (lane 7).
Cleavage Conditions
[0170] To assess the optimal buffer conditions for cleavage assays, and to estimate whether activity of the complex is expected at physiological conditions, the following two buffers were selected: (1) NEB4 (New England Biolabs, 50 mM potassium acetate, 20 mM Tris-acetate, 10 mM magnesium acetate, 1 mM dithiothreitol, pH 7.9) and (2) Buffer O (Fermentas, 50 mM Tris-HCl, 10 mM MgCl.sub.2, 100 mM NaCl, 0.1 mg/mL BSA, pH 7.5). Of the two, NEB4 is recommended for optimal activity of the commercial intact FokI enzyme. Buffer O was chosen from a quick screen to give good activity and specificity (data not shown). FIG. 14B shows incubation with different buffers and different incubation times. Lanes 1-4 have been incubated with Fermentas Buffer O (lanes 1, 2 for 15 minutes, lanes 3, 4 for 30 minutes), lanes 5, 6 have been incubated with NEB4 (30 minutes). Lanes 1, 3, 5 used the target plasmid with 35 bp spacing, lanes 2, 4, 6 used the non-target plasmid (no binding sites). Lanes 7, 8 have been incubated with only Cascade.sup.KKR or Cascade.sup.ELD respectively (Buffer O). Lane 9 is the topology marker as in (A). Lanes 10 and 11 show the target and non-target plasmid incubated without addition of Cascade. Therefore in FIG. 14B, activity was tested on the target plasmid with 35 base pairs distance (lanes 1, 3, 5) and a non-target control plasmid (lanes 2, 4, 6). There was a high amount of unspecific nicking and less cleavage in NEB4 (lanes 5, 6), whilst Buffer O shows activity only in the target plasmid, with a high amount of specific cleavage and little nicking (lanes 1-4). The difference is likely caused by the NaCl concentration in Buffer O: higher ionic strength weakens protein-protein interactions, leading to less nonspecific activity. Incubation of 15 or 30 minutes shows little difference in both target and non-target plasmid (lanes 1, 2 or 3, 4 respectively).
Addition of only one type of Cascade (P7.sup.KKR or M13.sup.ELD) does not result in cleavage activity (lane 7, 8) as expected. This experiment shows that specific Cascade nuclease activity by a designed pair occurs when the NaCl concentration is at least 100 mM, which is near the physiological saline concentration inside cells (137 mM NaCl). The Cascade nuclease pair is expected to be fully active in vivo, in eukaryotic cells, while displaying negligible off-target cleavage activity.
Cleavage Site
[0171] The site of cleavage in the target plasmid with a spacing of 35 bp (pTarget35) was determined. FIG. 15 shows how sequencing reveals up- and downstream cleavage sites by Cascade.sup.KKR/ELD in the target plasmid with 35 base pair spacing. FIG. 15A shows the target region within pTarget35 with annotated potential cleavage sites. Parts of the protospacers are indicated in red and blue. The bar chart in FIG. 15B shows four different cleavage patterns and their relative abundance within sequenced clones. The blue bars represent the generated overhang, while the left and right borders of each bar represent the left and right cleavage site (see B for annotation).
[0172] FIG. 15A shows the original sequence of pTarget35, with numbered cleavage sites from -7 to +7 where 0 lies in the middle between the two protospacers (indicated in red and blue). Seventeen clones were sequenced and these all show cleavage around position 0, creating varying overhangs between 3 and 5 bp (see FIG. 15B). Overhangs of 4 bp are most abundant (cumulatively 88%), while overhangs of 3 bp and 5 bp occur only once (6% each). The cleavage occurred exactly as expected, with no clones showing off-target cleavage.
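The percentages quoted in paragraph [0172] follow from the clone counts, as the short Python sketch below shows. The count of 15 clones with a 4 bp overhang is not stated explicitly; it is inferred here from the total of seventeen clones and the single clones with 3 bp and 5 bp overhangs. The variable and function names are assumptions for the example.

```python
# Illustrative arithmetic only: the count of 15 four-bp overhangs is inferred
# from 17 sequenced clones minus one 3-bp and one 5-bp clone.
overhang_counts = {3: 1, 4: 15, 5: 1}   # overhang length (bp) -> clones
total_clones = sum(overhang_counts.values())


def percentages(counts: dict) -> dict:
    """Return the share of each overhang length, rounded to whole percent."""
    total = sum(counts.values())
    return {length: round(100 * n / total) for length, n in counts.items()}


overhang_percentages = percentages(overhang_counts)
```

Because each value is rounded independently, the rounded percentages need not sum to exactly 100 in general, although they do here.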
Cleaving a Target Locus in Human Cells.
[0173] Cascade.sup.KKR/ELD nucleases were successfully modified to contain an N-terminal His.sub.6-tag (SEQ ID NO: 48) followed by a dual monopartite Nuclear Localisation Signal. These modified Cascade nuclease fusion proteins were co-expressed with either one of two synthetically constructed CRISPR arrays, each targeting a binding site in the human CCR5 gene. First, the activity of this new nuclease pair is validated in vitro by testing the activity on a plasmid containing this region of the CCR5 gene. The nuclease pair is then transfected into a human cell line, e.g. a HeLa cell line. Efficiency of target cleavage is assessed using the Surveyor assay as described above.
Sequence CWU
1
1
50
1 502 PRT Escherichia coli
1 Met Asn Leu Leu Ile Asp Asn Trp Ile Pro Val Arg
Pro Arg Asn Gly 1 5 10
15 Gly Lys Val Gln Ile Ile Asn Leu Gln Ser Leu Tyr Cys Ser Arg Asp
20 25 30 Gln Trp Arg
Leu Ser Leu Pro Arg Asp Asp Met Glu Leu Ala Ala Leu 35
40 45 Ala Leu Leu Val Cys Ile Gly Gln
Ile Ile Ala Pro Ala Lys Asp Asp 50 55
60 Val Glu Phe Arg His Arg Ile Met Asn Pro Leu Thr Glu
Asp Glu Phe 65 70 75
80 Gln Gln Leu Ile Ala Pro Trp Ile Asp Met Phe Tyr Leu Asn His Ala
85 90 95 Glu His Pro Phe
Met Gln Thr Lys Gly Val Lys Ala Asn Asp Val Thr 100
105 110 Pro Met Glu Lys Leu Leu Ala Gly Val
Ser Gly Ala Thr Asn Cys Ala 115 120
125 Phe Val Asn Gln Pro Gly Gln Gly Glu Ala Leu Cys Gly Gly
Cys Thr 130 135 140
Ala Ile Ala Leu Phe Asn Gln Ala Asn Gln Ala Pro Gly Phe Gly Gly 145
150 155 160 Gly Phe Lys Ser Gly
Leu Arg Gly Gly Thr Pro Val Thr Thr Phe Val 165
170 175 Arg Gly Ile Asp Leu Arg Ser Thr Val Leu
Leu Asn Val Leu Thr Leu 180 185
190 Pro Arg Leu Gln Lys Gln Phe Pro Asn Glu Ser His Thr Glu Asn
Gln 195 200 205 Pro
Thr Trp Ile Lys Pro Ile Lys Ser Asn Glu Ser Ile Pro Ala Ser 210
215 220 Ser Ile Gly Phe Val Arg
Gly Leu Phe Trp Gln Pro Ala His Ile Glu 225 230
235 240 Leu Cys Asp Pro Ile Gly Ile Gly Lys Cys Ser
Cys Cys Gly Gln Glu 245 250
255 Ser Asn Leu Arg Tyr Thr Gly Phe Leu Lys Glu Lys Phe Thr Phe Thr
260 265 270 Val Asn
Gly Leu Trp Pro His Pro His Ser Pro Cys Leu Val Thr Val 275
280 285 Lys Lys Gly Glu Val Glu Glu
Lys Phe Leu Ala Phe Thr Thr Ser Ala 290 295
300 Pro Ser Trp Thr Gln Ile Ser Arg Val Val Val Asp
Lys Ile Ile Gln 305 310 315
320 Asn Glu Asn Gly Asn Arg Val Ala Ala Val Val Asn Gln Phe Arg Asn
325 330 335 Ile Ala Pro
Gln Ser Pro Leu Glu Leu Ile Met Gly Gly Tyr Arg Asn 340
345 350 Asn Gln Ala Ser Ile Leu Glu Arg
Arg His Asp Val Leu Met Phe Asn 355 360
365 Gln Gly Trp Gln Gln Tyr Gly Asn Val Ile Asn Glu Ile
Val Thr Val 370 375 380
Gly Leu Gly Tyr Lys Thr Ala Leu Arg Lys Ala Leu Tyr Thr Phe Ala 385
390 395 400 Glu Gly Phe Lys
Asn Lys Asp Phe Lys Gly Ala Gly Val Ser Val His 405
410 415 Glu Thr Ala Glu Arg His Phe Tyr Arg
Gln Ser Glu Leu Leu Ile Pro 420 425
430 Asp Val Leu Ala Asn Val Asn Phe Ser Gln Ala Asp Glu Val
Ile Ala 435 440 445
Asp Leu Arg Asp Lys Leu His Gln Leu Cys Glu Met Leu Phe Asn Gln 450
455 460 Ser Val Ala Pro Tyr
Ala His His Pro Lys Leu Ile Ser Thr Leu Ala 465 470
475 480 Leu Ala Arg Ala Thr Leu Tyr Lys His Leu
Arg Glu Leu Lys Pro Gln 485 490
495 Gly Gly Pro Ser Asn Gly 500
2 160 PRT Escherichia coli
2 Met Ala Asp Glu Ile Asp Ala Met Ala Leu Tyr Arg
Ala Trp Gln Gln 1 5 10
15 Leu Asp Asn Gly Ser Cys Ala Gln Ile Arg Arg Val Ser Glu Pro Asp
20 25 30 Glu Leu Arg
Asp Ile Pro Ala Phe Tyr Arg Leu Val Gln Pro Phe Gly 35
40 45 Trp Glu Asn Pro Arg His Gln Gln
Ala Leu Leu Arg Met Val Phe Cys 50 55
60 Leu Ser Ala Gly Lys Asn Val Ile Arg His Gln Asp Lys
Lys Ser Glu 65 70 75
80 Gln Thr Thr Gly Ile Ser Leu Gly Arg Ala Leu Ala Asn Ser Gly Arg
85 90 95 Ile Asn Glu Arg
Arg Ile Phe Gln Leu Ile Arg Ala Asp Arg Thr Ala 100
105 110 Asp Met Val Gln Leu Arg Arg Leu Leu
Thr His Ala Glu Pro Val Leu 115 120
125 Asp Trp Pro Leu Met Ala Arg Met Leu Thr Trp Trp Gly Lys
Arg Glu 130 135 140
Arg Gln Gln Leu Leu Glu Asp Phe Val Leu Thr Thr Asn Lys Asn Ala 145
150 155 160
3 363 PRT Escherichia coli
3 Met Ser Asn Phe Ile Asn Ile His Val Leu Ile Ser His Ser Pro Ser 1
5 10 15 Cys Leu Asn
Arg Asp Asp Met Asn Met Gln Lys Asp Ala Ile Phe Gly 20
25 30 Gly Lys Arg Arg Val Arg Ile Ser
Ser Gln Ser Leu Lys Arg Ala Met 35 40
45 Arg Lys Ser Gly Tyr Tyr Ala Gln Asn Ile Gly Glu Ser
Ser Leu Arg 50 55 60
Thr Ile His Leu Ala Gln Leu Arg Asp Val Leu Arg Gln Lys Leu Gly 65
70 75 80 Glu Arg Phe Asp
Gln Lys Ile Ile Asp Lys Thr Leu Ala Leu Leu Ser 85
90 95 Gly Lys Ser Val Asp Glu Ala Glu Lys
Ile Ser Ala Asp Ala Val Thr 100 105
110 Pro Trp Val Val Gly Glu Ile Ala Trp Phe Cys Glu Gln Val
Ala Lys 115 120 125
Ala Glu Ala Asp Asn Leu Asp Asp Lys Lys Leu Leu Lys Val Leu Lys 130
135 140 Glu Asp Ile Ala Ala
Ile Arg Val Asn Leu Gln Gln Gly Val Asp Ile 145 150
155 160 Ala Leu Ser Gly Arg Met Ala Thr Ser Gly
Met Met Thr Glu Leu Gly 165 170
175 Lys Val Asp Gly Ala Met Ser Ile Ala His Ala Ile Thr Thr His
Gln 180 185 190 Val
Asp Ser Asp Ile Asp Trp Phe Thr Ala Val Asp Asp Leu Gln Glu 195
200 205 Gln Gly Ser Ala His Leu
Gly Thr Gln Glu Phe Ser Ser Gly Val Phe 210 215
220 Tyr Arg Tyr Ala Asn Ile Asn Leu Ala Gln Leu
Gln Glu Asn Leu Gly 225 230 235
240 Gly Ala Ser Arg Glu Gln Ala Leu Glu Ile Ala Thr His Val Val His
245 250 255 Met Leu
Ala Thr Glu Val Pro Gly Ala Lys Gln Arg Thr Tyr Ala Ala 260
265 270 Phe Asn Pro Ala Asp Met Val
Met Val Asn Phe Ser Asp Met Pro Leu 275 280
285 Ser Met Ala Asn Ala Phe Glu Lys Ala Val Lys Ala
Lys Asp Gly Phe 290 295 300
Leu Gln Pro Ser Ile Gln Ala Phe Asn Gln Tyr Trp Asp Arg Val Ala 305
310 315 320 Asn Gly Tyr
Gly Leu Asn Gly Ala Ala Ala Gln Phe Ser Leu Ser Asp 325
330 335 Val Asp Pro Ile Thr Ala Gln Val
Lys Gln Met Pro Thr Leu Glu Gln 340 345
350 Leu Lys Ser Trp Val Arg Asn Asn Gly Glu Ala
355 360
4 224 PRT Escherichia coli
4 Met Arg Ser
Tyr Leu Ile Leu Arg Leu Ala Gly Pro Met Gln Ala Trp 1 5
10 15 Gly Gln Pro Thr Phe Glu Gly Thr
Arg Pro Thr Gly Arg Phe Pro Thr 20 25
30 Arg Ser Gly Leu Leu Gly Leu Leu Gly Ala Cys Leu Gly
Ile Gln Arg 35 40 45
Asp Asp Thr Ser Ser Leu Gln Ala Leu Ser Glu Ser Val Gln Phe Ala 50
55 60 Val Arg Cys Asp
Glu Leu Ile Leu Asp Asp Arg Arg Val Ser Val Thr 65 70
75 80 Gly Leu Arg Asp Tyr His Thr Val Leu
Gly Ala Arg Glu Asp Tyr Arg 85 90
95 Gly Leu Lys Ser His Glu Thr Ile Gln Thr Trp Arg Glu Tyr
Leu Cys 100 105 110
Asp Ala Ser Phe Thr Val Ala Leu Trp Leu Thr Pro His Ala Thr Met
115 120 125 Val Ile Ser Glu
Leu Glu Lys Ala Val Leu Lys Pro Arg Tyr Thr Pro 130
135 140 Tyr Leu Gly Arg Arg Ser Cys Pro
Leu Thr His Pro Leu Phe Leu Gly 145 150
155 160 Thr Cys Gln Ala Ser Asp Pro Gln Lys Ala Leu Leu
Asn Tyr Glu Pro 165 170
175 Val Gly Gly Asp Ile Tyr Ser Glu Glu Ser Val Thr Gly His His Leu
180 185 190 Lys Phe Thr
Ala Arg Asp Glu Pro Met Ile Thr Leu Pro Arg Gln Phe 195
200 205 Ala Ser Arg Glu Trp Tyr Val Ile
Lys Gly Gly Met Asp Val Ser Gln 210 215
220
5 199 PRT Escherichia coli
5 Met Tyr Leu Ser Lys Val
Ile Ile Ala Arg Ala Trp Ser Arg Asp Leu 1 5
10 15 Tyr Gln Leu His Gln Gly Leu Trp His Leu Phe
Pro Asn Arg Pro Asp 20 25
30 Ala Ala Arg Asp Phe Leu Phe His Val Glu Lys Arg Asn Thr Pro
Glu 35 40 45 Gly
Cys His Val Leu Leu Gln Ser Ala Gln Met Pro Val Ser Thr Ala 50
55 60 Val Ala Thr Val Ile Lys
Thr Lys Gln Val Glu Phe Gln Leu Gln Val 65 70
75 80 Gly Val Pro Leu Tyr Phe Arg Leu Arg Ala Asn
Pro Ile Lys Thr Ile 85 90
95 Leu Asp Asn Gln Lys Arg Leu Asp Ser Lys Gly Asn Ile Lys Arg Cys
100 105 110 Arg Val
Pro Leu Ile Lys Glu Ala Glu Gln Ile Ala Trp Leu Gln Arg 115
120 125 Lys Leu Gly Asn Ala Ala Arg
Val Glu Asp Val His Pro Ile Ser Glu 130 135
140 Arg Pro Gln Tyr Phe Ser Gly Asp Gly Lys Ser Gly
Lys Ile Gln Thr 145 150 155
160 Val Cys Phe Glu Gly Val Leu Thr Ile Asn Asp Ala Pro Ala Leu Ile
165 170 175 Asp Leu Val
Gln Gln Gly Ile Gly Pro Ala Lys Ser Met Gly Cys Gly 180
185 190 Leu Leu Ser Leu Ala Pro Leu
195
6 392 DNA Artificial Sequence Description of Artificial Sequence Synthetic polynucleotide
6 actggaaagc gggcagtgaa
aggaaggccc atgaggccag ttaattaagc ggatcctggc 60ggcggcagcg gcggcggcag
cgacaagcag aagaacggca tcaaggcgaa cttcaagatc 120cgccacaaca tcgaggacgg
cggcgtgcag ctcgccgacc actaccagca gaacaccccc 180atcggcgacg gccccgtgct
gctgcccgac aaccactacc tgagctacca gtccgccctg 240agcaaagacc ccaacgagaa
gcgcgatcac atggtcctgc tggagttcgt gaccgccgcc 300gggatcactc tcggcatgga
cgagctgtac aagtaagcgg ccgcggcgcg cctaggcctt 360gacggccttc cttcaattcg
ccctatagtg ag 392
7 603 DNA Artificial Sequence Description of Artificial Sequence Synthetic polynucleotide
7 cactataggg cgaattggcg gaaggccgtc aaggccgcat ttaattaagc ggccgcaggc
60ggcggcagcg gcggcggcag catggtgagc aagggcgagg agctgttcac cggggtggtg
120cccatcctgg tcgagctgga cggcgacgta aacggccaca agttcagcgt gtccggcgag
180ggcgagggcg atgccaccta cggcaagctg accctgaagc tcatctgcac caccggcaag
240ctgcccgtgc cctggcccac cctcgtgacc accctcggct acggcctgca gtgcttcgcc
300cgctaccccg accacatgaa gcagcacgac ttcttcaagt ccgccatgcc cgaaggctac
360gtccaggagc gcaccatctt cttcaaggac gacggcaact acaagacccg cgccgaggtg
420aagttcgagg gcgacaccct ggtgaaccgc atcgagctga agggcatcga cttcaaggag
480gacggcaaca tcctggggca caagctggag tacaactaca acagccacaa cgtctatatc
540acggcctaac tcgagggcgc gccctgggcc tcatgggcct tccgctcact gcccgctttc
600cag
603
8 679 DNA Artificial Sequence Description of Artificial Sequence Synthetic polynucleotide
8 cactataggg cgaattggcg gaaggccgtc aaggccgcat
gagctccatg gaaacaaaga 60attagctgat ctttaataat aaggaaatgt tacattaagg
ttggtgggtt gtttttatgg 120gaaaaaatgc tttaagaaca aatgtatact tttagagagt
tccccgcgcc agcggggata 180aaccgggccg attgaaggtc cggtggatgg cttaaaagag
ttccccgcgc cagcggggat 240aaaccgccgc aggtacagca ggtagcgcag atcatcaaga
gttccccgcg ccagcgggga 300taaaccgact tctctccgaa aagtcaggac gctgtggcag
agttccccgc gccagcgggg 360ataaaccgcc tacgcgctga acgccagcgg tgtggtgaat
gagttccccg cgccagcggg 420gataaaccgg tgtggccatg cacgccttta acggtgaact
ggagttcccc gcgccagcgg 480ggataaaccg cacgaactca gccagaacga caaacaaaag
gcgagttccc cgcgccagcg 540gggataaacc ggcaccagta cgcgccccac gctgacggtt
tctgagttcc ccgcgccagc 600ggggataaac cgcagctccc attttcaaac ccaggtaccc
tgggcctcat gggccttccg 660ctcactgccc gctttccag
679
9 685 DNA Artificial Sequence Description of Artificial Sequence Synthetic polynucleotide
9 gagctcccgg gctgacggta
atagaggcac ctacaggctc cggtaaaacg gaaacagcgc 60tggcctatgc ttggaaactt
attgatcaac aaattgcgga tagtgttatt tttgccctcc 120caacacaagc taccgcgaat
gctatgctta cgagaatgga agcgagcgcg agccacttat 180tttcatcccc aaatcttatt
cttgctcatg gcaattcacg gtttaaccac ctctttcaat 240caataaaatc acgcgcgatt
actgaacagg ggcaagaaga agcgtgggtt cagtgttgtc 300agtggttgtc acaaagcaat
aagaaagtgt ttcttgggca aatcggcgtt tgcacgattg 360atcaggtgtt gatttcggta
ttgccagtta aacaccgctt tatccgtggt ttgggaattg 420gtagatctgt tttaattgtt
aatgaagttc atgcttacga cacctatatg aacggcttgc 480tcgaggcagt gctcaaggct
caggctgatg tgggagggag tgttattctt ctttccgcaa 540ccctaccaat gaaacaaaaa
cagaagcttc tggatactta tggtctgcat acagatccag 600tggaaaataa ctccgcatat
ccactcatta actggcgagg tgtgaatggt gcgcaacgtt 660ttgatctgct agcggatccg
gtacc 685
10 37 DNA Artificial Sequence Description of Artificial Sequence Synthetic primer
10 atagcgccat ggaacctttt aaatatatat gccatta 37
11 41 DNA Artificial Sequence Description of Artificial Sequence Synthetic primer
11 acagtgggat ccgctttggg atttgcaggg atgactctgg t 41
12 43 DNA Artificial Sequence Description of Artificial Sequence Synthetic primer
12 atagcgtcat gaatttgctt attgataact ggattcctgt acg 43
13 44 DNA Artificial Sequence Description of Artificial Sequence Synthetic primer
13 acagtggcgg ccgcgccatt tgatggccct ccttgcggtt ttaa 44
14 45 DNA Artificial Sequence Description of Artificial Sequence Synthetic primer
14 cgtatatcaa actttccaat agcatgaaga gcaatgaaaa ataac 45
15 23 DNA Artificial Sequence Description of Artificial Sequence Synthetic primer
15 atgataccgc gagacccacg ctc 23
16 24 DNA Artificial Sequence Description of Artificial Sequence Synthetic primer
16 cggataaagt tgcaggacca cttc 24
17 199 PRT Escherichia coli
17 Met Tyr Leu Ser Lys Val
Ile Ile Ala Arg Ala Trp Ser Arg Asp Leu 1 5
10 15 Tyr Gln Leu His Gln Gly Leu Trp His Leu Phe
Pro Asn Arg Pro Asp 20 25
30 Ala Ala Arg Asp Phe Leu Phe His Val Glu Lys Arg Asn Thr Pro
Glu 35 40 45 Gly
Cys His Val Leu Leu Gln Ser Ala Gln Met Pro Val Ser Thr Ala 50
55 60 Val Ala Thr Val Ile Lys
Thr Lys Gln Val Glu Phe Gln Leu Gln Val 65 70
75 80 Gly Val Pro Leu Tyr Phe Arg Leu Arg Ala Asn
Pro Ile Lys Thr Ile 85 90
95 Leu Asp Asn Gln Lys Arg Leu Asp Ser Lys Gly Asn Ile Lys Arg Cys
100 105 110 Arg Val
Pro Leu Ile Lys Glu Ala Glu Gln Ile Ala Trp Leu Gln Arg 115
120 125 Lys Leu Gly Asn Ala Ala Arg
Val Glu Asp Val His Pro Ile Ser Glu 130 135
140 Arg Pro Gln Tyr Phe Ser Gly Asp Gly Lys Ser Gly
Lys Ile Gln Thr 145 150 155
160 Val Cys Phe Glu Gly Val Leu Thr Ile Asn Asp Ala Pro Ala Leu Ile
165 170 175 Asp Leu Val
Gln Gln Gly Ile Gly Pro Ala Lys Ser Met Gly Cys Gly 180
185 190 Leu Leu Ser Leu Ala Pro Leu
195
18 2154 DNA Artificial Sequence Description of Artificial Sequence Synthetic polynucleotide
18 atggctcaac
tggttaaaag cgaactggaa gagaaaaaaa gtgaactgcg ccacaaactg 60aaatatgtgc
cgcatgaata tatcgagctg attgaaattg cacgtaatcc gacccaggat 120cgtattctgg
aaatgaaagt gatggaattt tttatgaaag tgtacggcta tcgcggtgaa 180catctgggtg
gtagccgtaa accggatggt gcaatttata ccgttggtag cccgattgat 240tatggtgtta
ttgttgatac caaagcctat agcggtggtt ataatctgcc gattggtcag 300gcagatgaaa
tggaacgtta tgtggaagaa aatcagaccc gtgataaaca tctgaatccg 360aatgaatggt
ggaaagttta tccgagcagc gttaccgagt ttaaattcct gtttgttagc 420ggtcacttca
aaggcaacta taaagcacag ctgacccgtc tgaatcatat taccaattgt 480aatggtgcag
ttctgagcgt tgaagaactg ctgattggtg gtgaaatgat taaagcaggc 540accctgaccc
tggaagaagt tcgtcgcaaa tttaacaatg gcgaaatcaa ctttgcggat 600cccaccaacc
gcgcgaaagg cctggaagcg gtgagcgtgg cgagcatgaa tttgcttatt 660gataactgga
ttcctgtacg cccgcgaaac ggggggaaag tccaaatcat aaatctgcaa 720tcgctatact
gcagtagaga tcagtggcga ttaagtttgc cccgtgacga tatggaactg 780gccgctttag
cactgctggt ttgcattggg caaattatcg ccccggcaaa agatgacgtt 840gaatttcgac
atcgcataat gaatccgctc actgaagatg agtttcaaca actcatcgcg 900ccgtggatag
atatgttcta ccttaatcac gcagaacatc cctttatgca gaccaaaggt 960gtcaaagcaa
atgatgtgac tccaatggaa aaactgttgg ctggggtaag cggcgcgacg 1020aattgtgcat
ttgtcaatca accggggcag ggtgaagcat tatgtggtgg atgcactgcg 1080attgcgttat
tcaaccaggc gaatcaggca ccaggttttg gtggtggttt taaaagcggt 1140ttacgtggag
gaacacctgt aacaacgttc gtacgtggga tcgatcttcg ttcaacggtg 1200ttactcaatg
tcctcacatt acctcgtctt caaaaacaat ttcctaatga atcacatacg 1260gaaaaccaac
ctacctggat taaacctatc aagtccaatg agtctatacc tgcttcgtca 1320attgggtttg
tccgtggtct attctggcaa ccagcgcata ttgaattatg cgatcccatt 1380gggattggta
aatgttcttg ctgtggacag gaaagcaatt tgcgttatac cggttttctt 1440aaggaaaaat
ttacctttac agttaatggg ctatggcccc atccgcattc cccttgtctg 1500gtaacagtca
agaaagggga ggttgaggaa aaatttcttg ctttcaccac ctccgcacca 1560tcatggacac
aaatcagccg agttgtggta gataagatta ttcaaaatga aaatggaaat 1620cgcgtggcgg
cggttgtgaa tcaattcaga aatattgcgc cgcaaagtcc tcttgaattg 1680attatggggg
gatatcgtaa taatcaagca tctattcttg aacggcgtca tgatgtgttg 1740atgtttaatc
aggggtggca acaatacggc aatgtgataa acgaaatagt gactgttggt 1800ttgggatata
aaacagcctt acgcaaggcg ttatatacct ttgcagaagg gtttaaaaat 1860aaagacttca
aaggggccgg agtctctgtt catgagactg cagaaaggca tttctatcga 1920cagagtgaat
tattaattcc cgatgtactg gcgaatgtta atttttccca ggctgatgag 1980gtaatagctg
atttacgaga caaacttcat caattgtgtg aaatgctatt taatcaatct 2040gtagctccct
atgcacatca tcctaaatta ataagcacat tagcgcttgc ccgcgccacg 2100ctatacaaac
atttacggga gttaaaaccg caaggagggc catcaaatgg ctga
2154
19 717 PRT Artificial Sequence Description of Artificial Sequence Synthetic polypeptide
19 Met Ala Gln Leu Val Lys Ser Glu Leu Glu Glu
Lys Lys Ser Glu Leu 1 5 10
15 Arg His Lys Leu Lys Tyr Val Pro His Glu Tyr Ile Glu Leu Ile Glu
20 25 30 Ile Ala
Arg Asn Pro Thr Gln Asp Arg Ile Leu Glu Met Lys Val Met 35
40 45 Glu Phe Phe Met Lys Val Tyr
Gly Tyr Arg Gly Glu His Leu Gly Gly 50 55
60 Ser Arg Lys Pro Asp Gly Ala Ile Tyr Thr Val Gly
Ser Pro Ile Asp 65 70 75
80 Tyr Gly Val Ile Val Asp Thr Lys Ala Tyr Ser Gly Gly Tyr Asn Leu
85 90 95 Pro Ile Gly
Gln Ala Asp Glu Met Glu Arg Tyr Val Glu Glu Asn Gln 100
105 110 Thr Arg Asp Lys His Leu Asn Pro
Asn Glu Trp Trp Lys Val Tyr Pro 115 120
125 Ser Ser Val Thr Glu Phe Lys Phe Leu Phe Val Ser Gly
His Phe Lys 130 135 140
Gly Asn Tyr Lys Ala Gln Leu Thr Arg Leu Asn His Ile Thr Asn Cys 145
150 155 160 Asn Gly Ala Val
Leu Ser Val Glu Glu Leu Leu Ile Gly Gly Glu Met 165
170 175 Ile Lys Ala Gly Thr Leu Thr Leu Glu
Glu Val Arg Arg Lys Phe Asn 180 185
190 Asn Gly Glu Ile Asn Phe Ala Asp Pro Thr Asn Arg Ala Lys
Gly Leu 195 200 205
Glu Ala Val Ser Val Ala Ser Met Asn Leu Leu Ile Asp Asn Trp Ile 210
215 220 Pro Val Arg Pro Arg
Asn Gly Gly Lys Val Gln Ile Ile Asn Leu Gln 225 230
235 240 Ser Leu Tyr Cys Ser Arg Asp Gln Trp Arg
Leu Ser Leu Pro Arg Asp 245 250
255 Asp Met Glu Leu Ala Ala Leu Ala Leu Leu Val Cys Ile Gly Gln
Ile 260 265 270 Ile
Ala Pro Ala Lys Asp Asp Val Glu Phe Arg His Arg Ile Met Asn 275
280 285 Pro Leu Thr Glu Asp Glu
Phe Gln Gln Leu Ile Ala Pro Trp Ile Asp 290 295
300 Met Phe Tyr Leu Asn His Ala Glu His Pro Phe
Met Gln Thr Lys Gly 305 310 315
320 Val Lys Ala Asn Asp Val Thr Pro Met Glu Lys Leu Leu Ala Gly Val
325 330 335 Ser Gly
Ala Thr Asn Cys Ala Phe Val Asn Gln Pro Gly Gln Gly Glu 340
345 350 Ala Leu Cys Gly Gly Cys Thr
Ala Ile Ala Leu Phe Asn Gln Ala Asn 355 360
365 Gln Ala Pro Gly Phe Gly Gly Gly Phe Lys Ser Gly
Leu Arg Gly Gly 370 375 380
Thr Pro Val Thr Thr Phe Val Arg Gly Ile Asp Leu Arg Ser Thr Val 385
390 395 400 Leu Leu Asn
Val Leu Thr Leu Pro Arg Leu Gln Lys Gln Phe Pro Asn 405
410 415 Glu Ser His Thr Glu Asn Gln Pro
Thr Trp Ile Lys Pro Ile Lys Ser 420 425
430 Asn Glu Ser Ile Pro Ala Ser Ser Ile Gly Phe Val Arg
Gly Leu Phe 435 440 445
Trp Gln Pro Ala His Ile Glu Leu Cys Asp Pro Ile Gly Ile Gly Lys 450
455 460 Cys Ser Cys Cys
Gly Gln Glu Ser Asn Leu Arg Tyr Thr Gly Phe Leu 465 470
475 480 Lys Glu Lys Phe Thr Phe Thr Val Asn
Gly Leu Trp Pro His Pro His 485 490
495 Ser Pro Cys Leu Val Thr Val Lys Lys Gly Glu Val Glu Glu
Lys Phe 500 505 510
Leu Ala Phe Thr Thr Ser Ala Pro Ser Trp Thr Gln Ile Ser Arg Val
515 520 525 Val Val Asp Lys
Ile Ile Gln Asn Glu Asn Gly Asn Arg Val Ala Ala 530
535 540 Val Val Asn Gln Phe Arg Asn Ile
Ala Pro Gln Ser Pro Leu Glu Leu 545 550
555 560 Ile Met Gly Gly Tyr Arg Asn Asn Gln Ala Ser Ile
Leu Glu Arg Arg 565 570
575 His Asp Val Leu Met Phe Asn Gln Gly Trp Gln Gln Tyr Gly Asn Val
580 585 590 Ile Asn Glu
Ile Val Thr Val Gly Leu Gly Tyr Lys Thr Ala Leu Arg 595
600 605 Lys Ala Leu Tyr Thr Phe Ala Glu
Gly Phe Lys Asn Lys Asp Phe Lys 610 615
620 Gly Ala Gly Val Ser Val His Glu Thr Ala Glu Arg His
Phe Tyr Arg 625 630 635
640 Gln Ser Glu Leu Leu Ile Pro Asp Val Leu Ala Asn Val Asn Phe Ser
645 650 655 Gln Ala Asp Glu
Val Ile Ala Asp Leu Arg Asp Lys Leu His Gln Leu 660
665 670 Cys Glu Met Leu Phe Asn Gln Ser Val
Ala Pro Tyr Ala His His Pro 675 680
685 Lys Leu Ile Ser Thr Leu Ala Leu Ala Arg Ala Thr Leu Tyr
Lys His 690 695 700
Leu Arg Glu Leu Lys Pro Gln Gly Gly Pro Ser Asn Gly 705
710 715
20 2154 DNA Artificial Sequence Description of Artificial Sequence Synthetic polynucleotide
20 atggctcaac
tggttaaaag cgaactggaa gagaaaaaaa gtgaactgcg ccacaaactg 60aaatatgtgc
cgcatgaata tatcgagctg attgaaattg cacgtaatcc gacccaggat 120cgtattctgg
aaatgaaagt gatggaattt tttatgaaag tgtacggcta tcgcggtgaa 180catctgggtg
gtagccgtaa accggatggt gcaatttata ccgttggtag cccgattgat 240tatggtgtta
ttgttgatac caaagcctat agcggtggtt ataatctgcc gattggtcag 300gcagatgaaa
tgcagcgtta tgtgaaagaa aatcagaccc gcaacaaaca tattaacccg 360aatgaatggt
ggaaagttta tccgagcagc gttaccgagt ttaaattcct gtttgttagc 420ggtcacttca
aaggcaacta taaagcacag ctgacccgtc tgaatcgtaa aaccaattgt 480aatggtgcag
ttctgagcgt tgaagaactg ctgattggtg gtgaaatgat taaagcaggc 540accctgaccc
tggaagaagt tcgtcgcaaa tttaacaatg gcgaaatcaa ctttgcggat 600cccaccaacc
gcgcgaaagg cctggaagcg gtgagcgtgg cgagcatgaa tttgcttatt 660gataactgga
ttcctgtacg cccgcgaaac ggggggaaag tccaaatcat aaatctgcaa 720tcgctatact
gcagtagaga tcagtggcga ttaagtttgc cccgtgacga tatggaactg 780gccgctttag
cactgctggt ttgcattggg caaattatcg ccccggcaaa agatgacgtt 840gaatttcgac
atcgcataat gaatccgctc actgaagatg agtttcaaca actcatcgcg 900ccgtggatag
atatgttcta ccttaatcac gcagaacatc cctttatgca gaccaaaggt 960gtcaaagcaa
atgatgtgac tccaatggaa aaactgttgg ctggggtaag cggcgcgacg 1020aattgtgcat
ttgtcaatca accggggcag ggtgaagcat tatgtggtgg atgcactgcg 1080attgcgttat
tcaaccaggc gaatcaggca ccaggttttg gtggtggttt taaaagcggt 1140ttacgtggag
gaacacctgt aacaacgttc gtacgtggga tcgatcttcg ttcaacggtg 1200ttactcaatg
tcctcacatt acctcgtctt caaaaacaat ttcctaatga atcacatacg 1260gaaaaccaac
ctacctggat taaacctatc aagtccaatg agtctatacc tgcttcgtca 1320attgggtttg
tccgtggtct attctggcaa ccagcgcata ttgaattatg cgatcccatt 1380gggattggta
aatgttcttg ctgtggacag gaaagcaatt tgcgttatac cggttttctt 1440aaggaaaaat
ttacctttac agttaatggg ctatggcccc atccgcattc cccttgtctg 1500gtaacagtca
agaaagggga ggttgaggaa aaatttcttg ctttcaccac ctccgcacca 1560tcatggacac
aaatcagccg agttgtggta gataagatta ttcaaaatga aaatggaaat 1620cgcgtggcgg
cggttgtgaa tcaattcaga aatattgcgc cgcaaagtcc tcttgaattg 1680attatggggg
gatatcgtaa taatcaagca tctattcttg aacggcgtca tgatgtgttg 1740atgtttaatc
aggggtggca acaatacggc aatgtgataa acgaaatagt gactgttggt 1800ttgggatata
aaacagcctt acgcaaggcg ttatatacct ttgcagaagg gtttaaaaat 1860aaagacttca
aaggggccgg agtctctgtt catgagactg cagaaaggca tttctatcga 1920cagagtgaat
tattaattcc cgatgtactg gcgaatgtta atttttccca ggctgatgag 1980gtaatagctg
atttacgaga caaacttcat caattgtgtg aaatgctatt taatcaatct 2040gtagctccct
atgcacatca tcctaaatta ataagcacat tagcgcttgc ccgcgccacg 2100ctatacaaac
atttacggga gttaaaaccg caaggagggc catcaaatgg ctga
2154
21 717 PRT Artificial Sequence Description of Artificial Sequence Synthetic polypeptide
21 Met Ala Gln Leu Val Lys Ser Glu Leu Glu Glu
Lys Lys Ser Glu Leu 1 5 10
15 Arg His Lys Leu Lys Tyr Val Pro His Glu Tyr Ile Glu Leu Ile Glu
20 25 30 Ile Ala
Arg Asn Pro Thr Gln Asp Arg Ile Leu Glu Met Lys Val Met 35
40 45 Glu Phe Phe Met Lys Val Tyr
Gly Tyr Arg Gly Glu His Leu Gly Gly 50 55
60 Ser Arg Lys Pro Asp Gly Ala Ile Tyr Thr Val Gly
Ser Pro Ile Asp 65 70 75
80 Tyr Gly Val Ile Val Asp Thr Lys Ala Tyr Ser Gly Gly Tyr Asn Leu
85 90 95 Pro Ile Gly
Gln Ala Asp Glu Met Gln Arg Tyr Val Lys Glu Asn Gln 100
105 110 Thr Arg Asn Lys His Ile Asn Pro
Asn Glu Trp Trp Lys Val Tyr Pro 115 120
125 Ser Ser Val Thr Glu Phe Lys Phe Leu Phe Val Ser Gly
His Phe Lys 130 135 140
Gly Asn Tyr Lys Ala Gln Leu Thr Arg Leu Asn Arg Lys Thr Asn Cys 145
150 155 160 Asn Gly Ala Val
Leu Ser Val Glu Glu Leu Leu Ile Gly Gly Glu Met 165
170 175 Ile Lys Ala Gly Thr Leu Thr Leu Glu
Glu Val Arg Arg Lys Phe Asn 180 185
190 Asn Gly Glu Ile Asn Phe Ala Asp Pro Thr Asn Arg Ala Lys
Gly Leu 195 200 205
Glu Ala Val Ser Val Ala Ser Met Asn Leu Leu Ile Asp Asn Trp Ile 210
215 220 Pro Val Arg Pro Arg
Asn Gly Gly Lys Val Gln Ile Ile Asn Leu Gln 225 230
235 240 Ser Leu Tyr Cys Ser Arg Asp Gln Trp Arg
Leu Ser Leu Pro Arg Asp 245 250
255 Asp Met Glu Leu Ala Ala Leu Ala Leu Leu Val Cys Ile Gly Gln
Ile 260 265 270 Ile
Ala Pro Ala Lys Asp Asp Val Glu Phe Arg His Arg Ile Met Asn 275
280 285 Pro Leu Thr Glu Asp Glu
Phe Gln Gln Leu Ile Ala Pro Trp Ile Asp 290 295
300 Met Phe Tyr Leu Asn His Ala Glu His Pro Phe
Met Gln Thr Lys Gly 305 310 315
320 Val Lys Ala Asn Asp Val Thr Pro Met Glu Lys Leu Leu Ala Gly Val
325 330 335 Ser Gly
Ala Thr Asn Cys Ala Phe Val Asn Gln Pro Gly Gln Gly Glu 340
345 350 Ala Leu Cys Gly Gly Cys Thr
Ala Ile Ala Leu Phe Asn Gln Ala Asn 355 360
365 Gln Ala Pro Gly Phe Gly Gly Gly Phe Lys Ser Gly
Leu Arg Gly Gly 370 375 380
Thr Pro Val Thr Thr Phe Val Arg Gly Ile Asp Leu Arg Ser Thr Val 385
390 395 400 Leu Leu Asn
Val Leu Thr Leu Pro Arg Leu Gln Lys Gln Phe Pro Asn 405
410 415 Glu Ser His Thr Glu Asn Gln Pro
Thr Trp Ile Lys Pro Ile Lys Ser 420 425
430 Asn Glu Ser Ile Pro Ala Ser Ser Ile Gly Phe Val Arg
Gly Leu Phe 435 440 445
Trp Gln Pro Ala His Ile Glu Leu Cys Asp Pro Ile Gly Ile Gly Lys 450
455 460 Cys Ser Cys Cys
Gly Gln Glu Ser Asn Leu Arg Tyr Thr Gly Phe Leu 465 470
475 480 Lys Glu Lys Phe Thr Phe Thr Val Asn
Gly Leu Trp Pro His Pro His 485 490
495 Ser Pro Cys Leu Val Thr Val Lys Lys Gly Glu Val Glu Glu
Lys Phe 500 505 510
Leu Ala Phe Thr Thr Ser Ala Pro Ser Trp Thr Gln Ile Ser Arg Val
515 520 525 Val Val Asp Lys
Ile Ile Gln Asn Glu Asn Gly Asn Arg Val Ala Ala 530
535 540 Val Val Asn Gln Phe Arg Asn Ile
Ala Pro Gln Ser Pro Leu Glu Leu 545 550
555 560 Ile Met Gly Gly Tyr Arg Asn Asn Gln Ala Ser Ile
Leu Glu Arg Arg 565 570
575 His Asp Val Leu Met Phe Asn Gln Gly Trp Gln Gln Tyr Gly Asn Val
580 585 590 Ile Asn Glu
Ile Val Thr Val Gly Leu Gly Tyr Lys Thr Ala Leu Arg 595
600 605 Lys Ala Leu Tyr Thr Phe Ala Glu
Gly Phe Lys Asn Lys Asp Phe Lys 610 615
620 Gly Ala Gly Val Ser Val His Glu Thr Ala Glu Arg His
Phe Tyr Arg 625 630 635
640 Gln Ser Glu Leu Leu Ile Pro Asp Val Leu Ala Asn Val Asn Phe Ser
645 650 655 Gln Ala Asp Glu
Val Ile Ala Asp Leu Arg Asp Lys Leu His Gln Leu 660
665 670 Cys Glu Met Leu Phe Asn Gln Ser Val
Ala Pro Tyr Ala His His Pro 675 680
685 Lys Leu Ile Ser Thr Leu Ala Leu Ala Arg Ala Thr Leu Tyr
Lys His 690 695 700
Leu Arg Glu Leu Lys Pro Gln Gly Gly Pro Ser Asn Gly 705
710 715
22 2235 DNA Artificial Sequence Description of Artificial Sequence Synthetic polynucleotide
22 atgcatcacc
atcatcacca cccgaaaaaa aagcgcaaag tggatccgaa gaaaaaacgt 60aaagttgaag
atccgaaaga catggctcaa ctggttaaaa gcgaactgga agagaaaaaa 120agtgaactgc
gccacaaact gaaatatgtg ccgcatgaat atatcgagct gattgaaatt 180gcacgtaatc
cgacccagga tcgtattctg gaaatgaaag tgatggaatt ttttatgaaa 240gtgtacggct
atcgcggtga acatctgggt ggtagccgta aaccggatgg tgcaatttat 300accgttggta
gcccgattga ttatggtgtt attgttgata ccaaagccta tagcggtggt 360tataatctgc
cgattggtca ggcagatgaa atgcagcgtt atgtgaaaga aaatcagacc 420cgcaacaaac
atattaaccc gaatgaatgg tggaaagttt atccgagcag cgttaccgag 480tttaaattcc
tgtttgttag cggtcacttc aaaggcaact ataaagcaca gctgacccgt 540ctgaatcgta
aaaccaattg taatggtgca gttctgagcg ttgaagaact gctgattggt 600ggtgaaatga
ttaaagcagg caccctgacc ctggaagaag ttcgtcgcaa atttaacaat 660ggcgaaatca
actttgcgga tcccaccaac cgcgcgaaag gcctggaagc ggtgagcgtg 720gcgagcatga
atttgcttat tgataactgg attcctgtac gcccgcgaaa cggggggaaa 780gtccaaatca
taaatctgca atcgctatac tgcagtagag atcagtggcg attaagtttg 840ccccgtgacg
atatggaact ggccgcttta gcactgctgg tttgcattgg gcaaattatc 900gccccggcaa
aagatgacgt tgaatttcga catcgcataa tgaatccgct cactgaagat 960gagtttcaac
aactcatcgc gccgtggata gatatgttct accttaatca cgcagaacat 1020ccctttatgc
agaccaaagg tgtcaaagca aatgatgtga ctccaatgga aaaactgttg 1080gctggggtaa
gcggcgcgac gaattgtgca tttgtcaatc aaccggggca gggtgaagca 1140ttatgtggtg
gatgcactgc gattgcgtta ttcaaccagg cgaatcaggc accaggtttt 1200ggtggtggtt
ttaaaagcgg tttacgtgga ggaacacctg taacaacgtt cgtacgtggg 1260atcgatcttc
gttcaacggt gttactcaat gtcctcacat tacctcgtct tcaaaaacaa 1320tttcctaatg
aatcacatac ggaaaaccaa cctacctgga ttaaacctat caagtccaat 1380gagtctatac
ctgcttcgtc aattgggttt gtccgtggtc tattctggca accagcgcat 1440attgaattat
gcgatcccat tgggattggt aaatgttctt gctgtggaca ggaaagcaat 1500ttgcgttata
ccggttttct taaggaaaaa tttaccttta cagttaatgg gctatggccc 1560catccgcatt
ccccttgtct ggtaacagtc aagaaagggg aggttgagga aaaatttctt 1620gctttcacca
cctccgcacc atcatggaca caaatcagcc gagttgtggt agataagatt 1680attcaaaatg
aaaatggaaa tcgcgtggcg gcggttgtga atcaattcag aaatattgcg 1740ccgcaaagtc
ctcttgaatt gattatgggg ggatatcgta ataatcaagc atctattctt 1800gaacggcgtc
atgatgtgtt gatgtttaat caggggtggc aacaatacgg caatgtgata 1860aacgaaatag
tgactgttgg tttgggatat aaaacagcct tacgcaaggc gttatatacc 1920tttgcagaag
ggtttaaaaa taaagacttc aaaggggccg gagtctctgt tcatgagact 1980gcagaaaggc
atttctatcg acagagtgaa ttattaattc ccgatgtact ggcgaatgtt 2040aatttttccc
aggctgatga ggtaatagct gatttacgag acaaacttca tcaattgtgt 2100gaaatgctat
ttaatcaatc tgtagctccc tatgcacatc atcctaaatt aataagcaca 2160ttagcgcttg
cccgcgccac gctatacaaa catttacggg agttaaaacc gcaaggaggg 2220ccatcaaatg
gctga
2235
23 744 PRT Artificial Sequence Description of Artificial Sequence Synthetic polypeptide
23 Met His His His His His His Pro Lys Lys Lys
Arg Lys Val Asp Pro 1 5 10
15 Lys Lys Lys Arg Lys Val Glu Asp Pro Lys Asp Met Ala Gln Leu Val
20 25 30 Lys Ser
Glu Leu Glu Glu Lys Lys Ser Glu Leu Arg His Lys Leu Lys 35
40 45 Tyr Val Pro His Glu Tyr Ile
Glu Leu Ile Glu Ile Ala Arg Asn Pro 50 55
60 Thr Gln Asp Arg Ile Leu Glu Met Lys Val Met Glu
Phe Phe Met Lys 65 70 75
80 Val Tyr Gly Tyr Arg Gly Glu His Leu Gly Gly Ser Arg Lys Pro Asp
85 90 95 Gly Ala Ile
Tyr Thr Val Gly Ser Pro Ile Asp Tyr Gly Val Ile Val 100
105 110 Asp Thr Lys Ala Tyr Ser Gly Gly
Tyr Asn Leu Pro Ile Gly Gln Ala 115 120
125 Asp Glu Met Gln Arg Tyr Val Lys Glu Asn Gln Thr Arg
Asn Lys His 130 135 140
Ile Asn Pro Asn Glu Trp Trp Lys Val Tyr Pro Ser Ser Val Thr Glu 145
150 155 160 Phe Lys Phe Leu
Phe Val Ser Gly His Phe Lys Gly Asn Tyr Lys Ala 165
170 175 Gln Leu Thr Arg Leu Asn Arg Lys Thr
Asn Cys Asn Gly Ala Val Leu 180 185
190 Ser Val Glu Glu Leu Leu Ile Gly Gly Glu Met Ile Lys Ala
Gly Thr 195 200 205
Leu Thr Leu Glu Glu Val Arg Arg Lys Phe Asn Asn Gly Glu Ile Asn 210
215 220 Phe Ala Asp Pro Thr
Asn Arg Ala Lys Gly Leu Glu Ala Val Ser Val 225 230
235 240 Ala Ser Met Asn Leu Leu Ile Asp Asn Trp
Ile Pro Val Arg Pro Arg 245 250
255 Asn Gly Gly Lys Val Gln Ile Ile Asn Leu Gln Ser Leu Tyr Cys
Ser 260 265 270 Arg
Asp Gln Trp Arg Leu Ser Leu Pro Arg Asp Asp Met Glu Leu Ala 275
280 285 Ala Leu Ala Leu Leu Val
Cys Ile Gly Gln Ile Ile Ala Pro Ala Lys 290 295
300 Asp Asp Val Glu Phe Arg His Arg Ile Met Asn
Pro Leu Thr Glu Asp 305 310 315
320 Glu Phe Gln Gln Leu Ile Ala Pro Trp Ile Asp Met Phe Tyr Leu Asn
325 330 335 His Ala
Glu His Pro Phe Met Gln Thr Lys Gly Val Lys Ala Asn Asp 340
345 350 Val Thr Pro Met Glu Lys Leu
Leu Ala Gly Val Ser Gly Ala Thr Asn 355 360
365 Cys Ala Phe Val Asn Gln Pro Gly Gln Gly Glu Ala
Leu Cys Gly Gly 370 375 380
Cys Thr Ala Ile Ala Leu Phe Asn Gln Ala Asn Gln Ala Pro Gly Phe 385
390 395 400 Gly Gly Gly
Phe Lys Ser Gly Leu Arg Gly Gly Thr Pro Val Thr Thr 405
410 415 Phe Val Arg Gly Ile Asp Leu Arg
Ser Thr Val Leu Leu Asn Val Leu 420 425
430 Thr Leu Pro Arg Leu Gln Lys Gln Phe Pro Asn Glu Ser
His Thr Glu 435 440 445
Asn Gln Pro Thr Trp Ile Lys Pro Ile Lys Ser Asn Glu Ser Ile Pro 450
455 460 Ala Ser Ser Ile
Gly Phe Val Arg Gly Leu Phe Trp Gln Pro Ala His 465 470
475 480 Ile Glu Leu Cys Asp Pro Ile Gly Ile
Gly Lys Cys Ser Cys Cys Gly 485 490
495 Gln Glu Ser Asn Leu Arg Tyr Thr Gly Phe Leu Lys Glu Lys
Phe Thr 500 505 510
Phe Thr Val Asn Gly Leu Trp Pro His Pro His Ser Pro Cys Leu Val
515 520 525 Thr Val Lys Lys
Gly Glu Val Glu Glu Lys Phe Leu Ala Phe Thr Thr 530
535 540 Ser Ala Pro Ser Trp Thr Gln Ile
Ser Arg Val Val Val Asp Lys Ile 545 550
555 560 Ile Gln Asn Glu Asn Gly Asn Arg Val Ala Ala Val
Val Asn Gln Phe 565 570
575 Arg Asn Ile Ala Pro Gln Ser Pro Leu Glu Leu Ile Met Gly Gly Tyr
580 585 590 Arg Asn Asn
Gln Ala Ser Ile Leu Glu Arg Arg His Asp Val Leu Met 595
600 605 Phe Asn Gln Gly Trp Gln Gln Tyr
Gly Asn Val Ile Asn Glu Ile Val 610 615
620 Thr Val Gly Leu Gly Tyr Lys Thr Ala Leu Arg Lys Ala
Leu Tyr Thr 625 630 635
640 Phe Ala Glu Gly Phe Lys Asn Lys Asp Phe Lys Gly Ala Gly Val Ser
645 650 655 Val His Glu Thr
Ala Glu Arg His Phe Tyr Arg Gln Ser Glu Leu Leu 660
665 670 Ile Pro Asp Val Leu Ala Asn Val Asn
Phe Ser Gln Ala Asp Glu Val 675 680
685 Ile Ala Asp Leu Arg Asp Lys Leu His Gln Leu Cys Glu Met
Leu Phe 690 695 700
Asn Gln Ser Val Ala Pro Tyr Ala His His Pro Lys Leu Ile Ser Thr 705
710 715 720 Leu Ala Leu Ala Arg
Ala Thr Leu Tyr Lys His Leu Arg Glu Leu Lys 725
730 735 Pro Gln Gly Gly Pro Ser Asn Gly
740
24 2235 DNA Artificial Sequence Description of Artificial Sequence Synthetic polynucleotide
24 atgcatcacc
atcatcacca cccgaaaaaa aagcgcaaag tggatccgaa gaaaaaacgt 60aaagttgaag
atccgaaaga catggctcaa ctggttaaaa gcgaactgga agagaaaaaa 120agtgaactgc
gccacaaact gaaatatgtg ccgcatgaat atatcgagct gattgaaatt 180gcacgtaatc
cgacccagga tcgtattctg gaaatgaaag tgatggaatt ttttatgaaa 240gtgtacggct
atcgcggtga acatctgggt ggtagccgta aaccggatgg tgcaatttat 300accgttggta
gcccgattga ttatggtgtt attgttgata ccaaagccta tagcggtggt 360tataatctgc
cgattggtca ggcagatgaa atggaacgtt atgtggaaga aaatcagacc 420cgtgataaac
atctgaatcc gaatgaatgg tggaaagttt atccgagcag cgttaccgag 480tttaaattcc
tgtttgttag cggtcacttc aaaggcaact ataaagcaca gctgacccgt 540ctgaatcata
ttaccaattg taatggtgca gttctgagcg ttgaagaact gctgattggt 600ggtgaaatga
ttaaagcagg caccctgacc ctggaagaag ttcgtcgcaa atttaacaat 660ggcgaaatca
actttgcgga tcccaccaac cgcgcgaaag gcctggaagc ggtgagcgtg 720gcgagcatga
atttgcttat tgataactgg attcctgtac gcccgcgaaa cggggggaaa 780gtccaaatca
taaatctgca atcgctatac tgcagtagag atcagtggcg attaagtttg 840ccccgtgacg
atatggaact ggccgcttta gcactgctgg tttgcattgg gcaaattatc 900gccccggcaa
aagatgacgt tgaatttcga catcgcataa tgaatccgct cactgaagat 960gagtttcaac
aactcatcgc gccgtggata gatatgttct accttaatca cgcagaacat 1020ccctttatgc
agaccaaagg tgtcaaagca aatgatgtga ctccaatgga aaaactgttg 1080gctggggtaa
gcggcgcgac gaattgtgca tttgtcaatc aaccggggca gggtgaagca 1140ttatgtggtg
gatgcactgc gattgcgtta ttcaaccagg cgaatcaggc accaggtttt 1200ggtggtggtt
ttaaaagcgg tttacgtgga ggaacacctg taacaacgtt cgtacgtggg 1260atcgatcttc
gttcaacggt gttactcaat gtcctcacat tacctcgtct tcaaaaacaa 1320tttcctaatg
aatcacatac ggaaaaccaa cctacctgga ttaaacctat caagtccaat 1380gagtctatac
ctgcttcgtc aattgggttt gtccgtggtc tattctggca accagcgcat 1440attgaattat
gcgatcccat tgggattggt aaatgttctt gctgtggaca ggaaagcaat 1500ttgcgttata
ccggttttct taaggaaaaa tttaccttta cagttaatgg gctatggccc 1560catccgcatt
ccccttgtct ggtaacagtc aagaaagggg aggttgagga aaaatttctt 1620gctttcacca
cctccgcacc atcatggaca caaatcagcc gagttgtggt agataagatt 1680attcaaaatg
aaaatggaaa tcgcgtggcg gcggttgtga atcaattcag aaatattgcg 1740ccgcaaagtc
ctcttgaatt gattatgggg ggatatcgta ataatcaagc atctattctt 1800gaacggcgtc
atgatgtgtt gatgtttaat caggggtggc aacaatacgg caatgtgata 1860aacgaaatag
tgactgttgg tttgggatat aaaacagcct tacgcaaggc gttatatacc 1920tttgcagaag
ggtttaaaaa taaagacttc aaaggggccg gagtctctgt tcatgagact 1980gcagaaaggc
atttctatcg acagagtgaa ttattaattc ccgatgtact ggcgaatgtt 2040aatttttccc
aggctgatga ggtaatagct gatttacgag acaaacttca tcaattgtgt 2100gaaatgctat
ttaatcaatc tgtagctccc tatgcacatc atcctaaatt aataagcaca 2160ttagcgcttg
cccgcgccac gctatacaaa catttacggg agttaaaacc gcaaggaggg 2220ccatcaaatg
gctga
223525744PRTArtificial SequenceDescription of Artificial Sequence
Synthetic polypeptide 25Met His His His His His His Pro Lys Lys Lys
Arg Lys Val Asp Pro 1 5 10
15 Lys Lys Lys Arg Lys Val Glu Asp Pro Lys Asp Met Ala Gln Leu Val
20 25 30 Lys Ser
Glu Leu Glu Glu Lys Lys Ser Glu Leu Arg His Lys Leu Lys 35
40 45 Tyr Val Pro His Glu Tyr Ile
Glu Leu Ile Glu Ile Ala Arg Asn Pro 50 55
60 Thr Gln Asp Arg Ile Leu Glu Met Lys Val Met Glu
Phe Phe Met Lys 65 70 75
80 Val Tyr Gly Tyr Arg Gly Glu His Leu Gly Gly Ser Arg Lys Pro Asp
85 90 95 Gly Ala Ile
Tyr Thr Val Gly Ser Pro Ile Asp Tyr Gly Val Ile Val 100
105 110 Asp Thr Lys Ala Tyr Ser Gly Gly
Tyr Asn Leu Pro Ile Gly Gln Ala 115 120
125 Asp Glu Met Glu Arg Tyr Val Glu Glu Asn Gln Thr Arg
Asp Lys His 130 135 140
Leu Asn Pro Asn Glu Trp Trp Lys Val Tyr Pro Ser Ser Val Thr Glu 145
150 155 160 Phe Lys Phe Leu
Phe Val Ser Gly His Phe Lys Gly Asn Tyr Lys Ala 165
170 175 Gln Leu Thr Arg Leu Asn His Ile Thr
Asn Cys Asn Gly Ala Val Leu 180 185
190 Ser Val Glu Glu Leu Leu Ile Gly Gly Glu Met Ile Lys Ala
Gly Thr 195 200 205
Leu Thr Leu Glu Glu Val Arg Arg Lys Phe Asn Asn Gly Glu Ile Asn 210
215 220 Phe Ala Asp Pro Thr
Asn Arg Ala Lys Gly Leu Glu Ala Val Ser Val 225 230
235 240 Ala Ser Met Asn Leu Leu Ile Asp Asn Trp
Ile Pro Val Arg Pro Arg 245 250
255 Asn Gly Gly Lys Val Gln Ile Ile Asn Leu Gln Ser Leu Tyr Cys
Ser 260 265 270 Arg
Asp Gln Trp Arg Leu Ser Leu Pro Arg Asp Asp Met Glu Leu Ala 275
280 285 Ala Leu Ala Leu Leu Val
Cys Ile Gly Gln Ile Ile Ala Pro Ala Lys 290 295
300 Asp Asp Val Glu Phe Arg His Arg Ile Met Asn
Pro Leu Thr Glu Asp 305 310 315
320 Glu Phe Gln Gln Leu Ile Ala Pro Trp Ile Asp Met Phe Tyr Leu Asn
325 330 335 His Ala
Glu His Pro Phe Met Gln Thr Lys Gly Val Lys Ala Asn Asp 340
345 350 Val Thr Pro Met Glu Lys Leu
Leu Ala Gly Val Ser Gly Ala Thr Asn 355 360
365 Cys Ala Phe Val Asn Gln Pro Gly Gln Gly Glu Ala
Leu Cys Gly Gly 370 375 380
Cys Thr Ala Ile Ala Leu Phe Asn Gln Ala Asn Gln Ala Pro Gly Phe 385
390 395 400 Gly Gly Gly
Phe Lys Ser Gly Leu Arg Gly Gly Thr Pro Val Thr Thr 405
410 415 Phe Val Arg Gly Ile Asp Leu Arg
Ser Thr Val Leu Leu Asn Val Leu 420 425
430 Thr Leu Pro Arg Leu Gln Lys Gln Phe Pro Asn Glu Ser
His Thr Glu 435 440 445
Asn Gln Pro Thr Trp Ile Lys Pro Ile Lys Ser Asn Glu Ser Ile Pro 450
455 460 Ala Ser Ser Ile
Gly Phe Val Arg Gly Leu Phe Trp Gln Pro Ala His 465 470
475 480 Ile Glu Leu Cys Asp Pro Ile Gly Ile
Gly Lys Cys Ser Cys Cys Gly 485 490
495 Gln Glu Ser Asn Leu Arg Tyr Thr Gly Phe Leu Lys Glu Lys
Phe Thr 500 505 510
Phe Thr Val Asn Gly Leu Trp Pro His Pro His Ser Pro Cys Leu Val
515 520 525 Thr Val Lys Lys
Gly Glu Val Glu Glu Lys Phe Leu Ala Phe Thr Thr 530
535 540 Ser Ala Pro Ser Trp Thr Gln Ile
Ser Arg Val Val Val Asp Lys Ile 545 550
555 560 Ile Gln Asn Glu Asn Gly Asn Arg Val Ala Ala Val
Val Asn Gln Phe 565 570
575 Arg Asn Ile Ala Pro Gln Ser Pro Leu Glu Leu Ile Met Gly Gly Tyr
580 585 590 Arg Asn Asn
Gln Ala Ser Ile Leu Glu Arg Arg His Asp Val Leu Met 595
600 605 Phe Asn Gln Gly Trp Gln Gln Tyr
Gly Asn Val Ile Asn Glu Ile Val 610 615
620 Thr Val Gly Leu Gly Tyr Lys Thr Ala Leu Arg Lys Ala
Leu Tyr Thr 625 630 635
640 Phe Ala Glu Gly Phe Lys Asn Lys Asp Phe Lys Gly Ala Gly Val Ser
645 650 655 Val His Glu Thr
Ala Glu Arg His Phe Tyr Arg Gln Ser Glu Leu Leu 660
665 670 Ile Pro Asp Val Leu Ala Asn Val Asn
Phe Ser Gln Ala Asp Glu Val 675 680
685 Ile Ala Asp Leu Arg Asp Lys Leu His Gln Leu Cys Glu Met
Leu Phe 690 695 700
Asn Gln Ser Val Ala Pro Tyr Ala His His Pro Lys Leu Ile Ser Thr 705
710 715 720 Leu Ala Leu Ala Arg
Ala Thr Leu Tyr Lys His Leu Arg Glu Leu Lys 725
730 735 Pro Gln Gly Gly Pro Ser Asn Gly
740

SEQ ID NO 26, length 168, DNA, Artificial Sequence
Description of Artificial Sequence: Synthetic polynucleotide
gaattcacaa cggtgagcaa gtcactgttg gcaagccagg atctgaacaa taccgtcttg 60
ctttcgagcg ctagctctag aactagtcct cagcctaggc ctcgttccga agctgtcttt 120
cgctgctgag ggtgacgatc ccgcataggc ggcctttaac tcggatcc 168

SEQ ID NO 27, length 163, DNA, Artificial Sequence
Description of Artificial Sequence: Synthetic polynucleotide
gaattcacaa cggtgagcaa gtcactgttg gcaagccagg atctgaacaa taccgtcttt 60
tcgagcgcta gctctagaac tagtcctcag cctaggcctc gttcaagctg tctttcgctg 120
ctgagggtga cgatcccgca taggcggcct ttaactcgga tcc 163

SEQ ID NO 28, length 158, DNA, Artificial Sequence
Description of Artificial Sequence: Synthetic polynucleotide
gaattcacaa cggtgagcaa gtcactgttg gcaagccagg atctgaacaa taccgtcttc 60
gagcgctagc tctagaacta gtcctcagcc taggcctcga agctgtcttt cgctgctgag 120
ggtgacgatc ccgcataggc ggcctttaac tcggatcc 158

SEQ ID NO 29, length 153, DNA, Artificial Sequence
Description of Artificial Sequence: Synthetic polynucleotide
gaattcacaa cggtgagcaa gtcactgttg gcaagccagg atctgaacaa taccgtcttg 60
cgctagctct agaactagtc ctcagcctag gcctaagctg tctttcgctg ctgagggtga 120
cgatcccgca taggcggcct ttaactcgga tcc 153

SEQ ID NO 30, length 148, DNA, Artificial Sequence
Description of Artificial Sequence: Synthetic polynucleotide
gaattcacaa cggtgagcaa gtcactgttg gcaagccagg atctgaacaa taccgtcttg 60
ctagctctag aactagtcct cagcctagga agctgtcttt cgctgctgag ggtgacgatc 120
ccgcataggc ggcctttaac tcggatcc 148

SEQ ID NO 31, length 143, DNA, Artificial Sequence
Description of Artificial Sequence: Synthetic polynucleotide
gaattcacaa cggtgagcaa gtcactgttg gcaagccagg atctgaacaa taccgtcttc 60
tctagaacta gtcctcagcc taggaagctg tctttcgctg ctgagggtga cgatcccgca 120
taggcggcct ttaactcgga tcc 143

SEQ ID NO 32, length 41, DNA, Artificial Sequence
Description of Artificial Sequence: Synthetic oligonucleotide
cttgcgctag ctctagaact agtcctcagc ctaggcctaa g 41

SEQ ID NO 33, length 41, DNA, Artificial Sequence
Description of Artificial Sequence: Synthetic oligonucleotide
cttaggccta ggctgaggac tagttctaga gctagcgcaa g 41

SEQ ID NO 34, length 45, DNA, Artificial Sequence
Description of Artificial Sequence: Synthetic oligonucleotide
cttgcgctag ctctagaact agctagtcct cagcctaggc ctaag 45

SEQ ID NO 35, length 45, DNA, Artificial Sequence
Description of Artificial Sequence: Synthetic oligonucleotide
cttaggccta ggctgaggac tagctagttc tagagctagc gcaag 45

SEQ ID NO 36, length 1100, DNA, Homo sapiens
ggtggaacaa gatggattat caagtgtcaa gtccaatcta tgacatcaat tattatacat 60
cggagccctg ccaaaaaatc aatgtgaagc aaatcgcagc ccgcctcctg cctccgctct 120
actcactggt gttcatcttt ggttttgtgg gcaacatgct ggtcatcctc atcctgataa 180
actgcaaaag gctgaagagc atgactgaca tctacctgct caacctggcc atctctgacc 240
tgtttttcct tcttactgtc cccttctggg ctcactatgc tgccgcccag tgggactttg 300
gaaatacaat gtgtcaactc ttgacagggc tctattttat aggcttcttc tctggaatct 360
tcttcatcat cctcctgaca atcgataggt acctggctgt cgtccatgct gtgtttgctt 420
taaaagccag gacggtcacc tttggggtgg tgacaagtgt gatcacttgg gtggtggctg 480
tgtttgcgtc tctcccagga atcatcttta ccagatctca aaaagaaggt cttcattaca 540
cctgcagctc tcattttcca tacagtcagt atcaattctg gaagaatttc cagacattaa 600
agatagtcat cttggggctg gtcctgccgc tgcttgtcat ggtcatctgc tactcgggaa 660
tcctaaaaac tctgcttcgg tgtcgaaatg agaagaagag gcacagggct gtgaggctta 720
tcttcaccat catgattgtt tattttctct tctgggctcc ctacaacatt gtccttctcc 780
tgaacacctt ccaggaattc tttggcctga ataattgcag tagctctaac aggttggacc 840
aagctatgca ggtgacagag actcttggga tgacgcactg ctgcatcaac cccatcatct 900
atgcctttgt cggggagaag ttcagaaact acctcttagt cttcttccaa aagcacattg 960
ccaaacgctt ctgcaaatgc tgttctattt tccagcaaga ggctcccgag cgagcaagct 1020
cagtttacac ccgatccact ggggagcagg aaatatctgt gggcttgtga cacggactca 1080
agtgggctgg tgacccagtc
1100

SEQ ID NO 37, length 424, DNA, Artificial Sequence
Description of Artificial Sequence: Synthetic polynucleotide
ccatggtaat acgactcact atagggagaa ttagctgatc tttaataata aggaaatgtt 60
acattaaggt tggtgggttg tttttatggg aaaaaatgct ttaagaacaa atgtatactt 120
ttagagagtt ccccgcgcca gcggggataa accgcaaaca cagcatggac gacagccagg 180
tacctagagt tccccgcgcc agcggggata aaccgcaaac acagcatgga cgacagccag 240
gtacctagag ttccccgcgc cagcggggat aaaccgcaaa cacagcatgg acgacagcca 300
ggtacctaga gttccccgcg ccagcgggga taaaccgaaa acaaaaggct cagtcggaag 360
actgggcctt ttgttttaac cccttggggc ctctaaacgg gtcttgaggg gttttttggg 420
tacc 424

SEQ ID NO 38, length 424, DNA, Artificial Sequence
Description of Artificial Sequence: Synthetic polynucleotide
ccatggtaat acgactcact atagggagaa ttagctgatc tttaataata aggaaatgtt 60
acattaaggt tggtgggttg tttttatggg aaaaaatgct ttaagaacaa atgtatactt 120
ttagagagtt ccccgcgcca gcggggataa accgtgtgat cacttgggtg gtggctgtgt 180
ttgcgtgagt tccccgcgcc agcggggata aaccgtgtga tcacttgggt ggtggctgtg 240
tttgcgtgag ttccccgcgc cagcggggat aaaccgtgtg atcacttggg tggtggctgt 300
gtttgcgtga gttccccgcg ccagcgggga taaaccgaaa acaaaaggct cagtcggaag 360
actgggcctt ttgttttaac cccttggggc ctctaaacgg gtcttgaggg gttttttggg 420
tacc 424

SEQ ID NO 39, length 43, DNA, Artificial Sequence
Description of Artificial Sequence: Synthetic oligonucleotide
aaggatgcca gtgataagtg gaatgccatg tgggctgtca aaa 43

SEQ ID NO 40, length 43, DNA, Artificial Sequence
Description of Artificial Sequence: Synthetic oligonucleotide
aaggatgcga gtgataagtg gaatgccatg tgggctgtca aaa 43

SEQ ID NO 41, length 19, DNA, Artificial Sequence
Description of Artificial Sequence: Synthetic oligonucleotide
gccatgtggg ctgtcaaaa 19

SEQ ID NO 42, length 23, DNA, Artificial Sequence
Description of Artificial Sequence: Synthetic oligonucleotide
gaatgccatg tgggctgtca aaa 23

SEQ ID NO 43, length 1604, PRT, Streptomyces sp.
Met
Pro Asp Gln Leu Asn Ala Pro Thr Pro Leu Gly Asp Arg Leu Thr 1
5 10 15 Gly Ala Val Arg Thr Val
Trp Ala Lys His Asp Arg Asp Thr Gly Lys 20
25 30 Trp Leu Pro Leu Trp Arg His Met Thr Asp
Ser Ala Ala Val Ala Gly 35 40
45 Leu Leu Trp Asp His Trp Leu Pro Arg Asn Ile Lys Asp Leu
Ile Ala 50 55 60
Glu Pro Leu Pro Gly Gly Val Ala Asp Ala Arg Ser Leu Cys Val Trp 65
70 75 80 Leu Ala Gly Thr His
Asp Ile Gly Lys Ala Thr Pro Ala Phe Ala Cys 85
90 95 Gln Val Asp Glu Leu Ala Gly Val Met Thr
Ala Ala Gly Leu Asp Met 100 105
110 Arg Thr Ser Lys Gln Leu Gly Glu Asp Arg Arg Met Ala Pro His
Gly 115 120 125 Leu
Ala Gly Gln Val Leu Leu Gln Glu Trp Leu Glu Glu Arg Arg Gly 130
135 140 Trp Thr His Arg Ala Ser
Ala Gln Phe Ala Val Val Ala Gly Gly His 145 150
155 160 His Gly Val Pro Pro Asp His Met Gln Leu His
Asn Leu Asp Ala His 165 170
175 Pro Glu Leu Leu Arg Thr Gln Gly Leu Ala Glu Ala Gln Trp Arg Ala
180 185 190 Val Gln
Asp Glu Leu Leu Asp Ala Cys Ala Leu Val Phe Gly Val Glu 195
200 205 Glu Arg Leu Asp Ala Trp Arg
Thr Val Lys Leu Pro Gln Thr Val Gln 210 215
220 Val Leu Leu Thr Ala Thr Val Ile Val Ser Asp Trp
Ile Ala Ser Asn 225 230 235
240 Pro Asp Leu Phe Pro Tyr Phe Pro Glu Glu His Pro Arg Glu Glu Ala
245 250 255 Glu Arg Val
Ala Ala Ala Trp Gln Gly Leu Leu Leu Pro Ala Pro Trp 260
265 270 Glu Pro Glu Glu Pro Ser Ala Pro
Ala Ala Glu Phe Tyr Ala Ser Arg 275 280
285 Phe Ala Leu Pro Pro Gly Ala Val Val Arg Pro Val Gln
Glu Gln Ala 290 295 300
Leu Ala Met Ala Arg Asp Met Glu Arg Pro Gly Met Leu Ile Ile Glu 305
310 315 320 Ala Pro Met Gly
Glu Gly Lys Thr Glu Ala Ala Leu Ala Val Ala Glu 325
330 335 Val Phe Ala Ala Arg Ser Gly Ala Gly
Gly Cys Tyr Val Ala Leu Pro 340 345
350 Thr Met Ala Thr Ser Asn Ala Met Phe Pro Arg Leu Leu Arg
Trp Leu 355 360 365
Asp Arg Leu Pro Arg Ala Asp Val Ser Gly Gly Arg Asp His Glu Gln 370
375 380 Arg Ser Val Leu Leu
Ala His Ala Lys Ser Ala Leu Gln Glu Asp Tyr 385 390
395 400 Ala Thr Leu Met Arg Glu Ser His Arg Thr
Ile Ala Ala Val Asp Ala 405 410
415 Tyr Gly Asp Asp Ser Arg Pro Arg Lys Gly Arg Pro Ala Ala Asp
Gly 420 425 430 Val
Arg Arg Lys Ala Pro Ala Glu Leu Val Ala His Gln Trp Leu Arg 435
440 445 Gly Arg Lys Lys Gly Leu
Leu Ala Ser Phe Ala Val Gly Thr Ile Asp 450 455
460 Gln Leu Leu Met Ala Gly Leu Lys Ser Arg His
Leu Ala Leu Arg His 465 470 475
480 Leu Ala Met Ala Gly Lys Val Val Val Ile Asp Glu Val His Ala Tyr
485 490 495 Asp Thr
Tyr Met Asn Ala Tyr Leu Asp Arg Val Leu Ala Trp Leu Gly 500
505 510 Glu Tyr Arg Val Pro Val Val
Val Leu Ser Ala Thr Leu Pro Ala Arg 515 520
525 Arg Arg Gly Glu Leu Ala Ala Ala Tyr Thr Gly Glu
Asp Ala Gln Ala 530 535 540
Leu Thr Glu Ala Thr Gly Tyr Pro Leu Leu Thr Ala Val Val Pro Gly 545
550 555 560 Arg Glu Ala
Val Gln Phe Val Ala Ala Ala Ser Gly Arg Gly Ser Asp 565
570 575 Val Leu Leu Glu Lys Leu Asp Asp
Asp Asp Glu Ala Leu Ala Asp Arg 580 585
590 Leu Asp Thr Asp Leu Ala Asp Gly Gly Cys Ala Leu Val
Val Arg Asn 595 600 605
Thr Val Asp Arg Val Met Asp Thr Ala Ser Val Leu Arg Glu Arg Phe 610
615 620 Gly Ala Asp His
Val Thr Val Ala His Ala Arg Phe Val Asp Leu Asp 625 630
635 640 Arg Ala Arg Lys Asp Ser Glu Leu Leu
Ala Arg Phe Gly Pro Pro Asp 645 650
655 Pro Asp Gly Gly Ser Pro Gln Arg Pro Arg Asn Ala His Ile
Val Val 660 665 670
Ala Ser Gln Val Ala Glu Gln Ser Leu Asp Val Asp Phe Asp Leu Leu
675 680 685 Val Ser Asp Leu
Cys Pro Val Asp Leu Leu Leu Gln Arg Met Gly Arg 690
695 700 Leu His Arg His Pro Arg Gly Arg
Asp Gln Glu Arg Arg Pro Ala Arg 705 710
715 720 Leu Arg Gln Ala Arg Cys Leu Val Thr Gly Val Gly
Trp Asp Thr Ser 725 730
735 Pro Ala Pro Glu Ala Asp Glu Gly Ser Arg Ala Ile Tyr Gly Ala Tyr
740 745 750 Ser Leu Leu
Arg Ser Leu Ala Val Leu Ala Pro His Leu Gly Thr Ala 755
760 765 Gly Ala Ala Gly His Pro Leu Arg
Leu Pro Glu Asp Ile Ser Pro Leu 770 775
780 Val Arg Arg Ala Tyr Gly Glu Glu Asp Pro Cys Pro Pro
Glu Trp Glu 785 790 795
800 Pro Val Leu Ala Pro Ala Arg Asp Lys Tyr Arg Thr Ala Arg Glu Arg
805 810 815 Gln Ser Gln Lys
Ala Glu Val Phe Arg Leu Asp Glu Val Arg Lys Ala 820
825 830 Gly Arg Pro Leu Ile Gly Trp Ile Asp
Ala Gly Val Gly Asp Ala Asp 835 840
845 Asp Thr Pro Val Gly Arg Ala Gln Val Arg Asp Thr Lys Glu
Gly Leu 850 855 860
Glu Val Leu Val Val Arg Arg Arg Ala Asp Gly Ser Leu Cys Thr Leu 865
870 875 880 Pro Trp Leu Asp Lys
Gly Arg Gly Gly Leu Glu Leu Pro Val Asp Ala 885
890 895 Val Pro Ser Ala Leu Ala Ala Arg Ala Val
Ala Ala Ser Gly Leu Arg 900 905
910 Leu Pro Tyr His Phe Thr Ser Ser Pro Gln Thr Leu Asp Arg Thr
Leu 915 920 925 Ala
Glu Leu Glu Glu Leu Tyr Val Pro Ala Trp Gln Glu Lys Glu Ser 930
935 940 His Trp Ile Ala Gly Glu
Leu Ile Leu Ala Leu Asp Glu Glu Gly Arg 945 950
955 960 Ala Ala Leu Ala Gly Gln Gln Leu Val Tyr Asn
Pro Glu Glu Gly Leu 965 970
975 Leu Val Ala Ser Ala Asp Ala Asn Thr Glu Ala Thr Ser Gly Arg Val
980 985 990 Met Asp
Gly Lys Pro Ser Ser Ala Gly Asp Gly Lys Pro Gly His Ala 995
1000 1005 Ala Asp Gly Asn Arg
Ala Arg Thr Thr Val Gly Gln Ser Pro Ala 1010 1015
1020 Asp Arg Gln Thr His Gln Pro Pro Glu Gly
Glu Arg His Pro Val 1025 1030 1035
Pro Pro Ser Ala Ala Pro Pro Pro Ala Arg Pro Ser Phe Asp Leu
1040 1045 1050 Thr Ser
Arg Pro Trp Leu Pro Val Leu Leu Lys Asp Gly Ser Glu 1055
1060 1065 Arg Glu Leu Ser Leu Pro Glu
Val Phe Asp Gln Ala Arg Asp Ile 1070 1075
1080 Arg Arg Leu Val Gly Asp Leu Pro Thr Gln Asp Phe
Ala Leu Thr 1085 1090 1095
Arg Met Leu Leu Ala Leu Leu Tyr Asp Ala Leu Ser Glu Pro Gly 1100
1105 1110 Gly Asp Met Ala Pro
Ala Asp Thr Asp Ala Trp Glu Glu Leu Trp 1115 1120
1125 Leu Ser Gln Ser Ala Tyr Ala Ala Pro Val
Ala Ala Tyr Leu His 1130 1135 1140
Arg Tyr Arg Glu Arg Phe Asp Leu Leu His Pro Glu Ser Pro Phe
1145 1150 1155 Phe Gln
Thr Pro Gly Leu Arg Thr Ala Lys Asn Glu Val Phe Ser 1160
1165 1170 Leu Asn Arg Leu Val Ala Asp
Val Pro Asn Gly Asp Pro Phe Phe 1175 1180
1185 Ser Met Arg Arg Pro Gly Val Asp Arg Leu Gly Phe
Ala Glu Ala 1190 1195 1200
Ala Arg Trp Leu Val His Ala Gln Ala Tyr Asp Thr Ser Gly Ile 1205
1210 1215 Lys Thr Gly Ala Val
Gly Asp Pro Arg Val Lys Ala Gly Lys Gly 1220 1225
1230 Tyr Pro Gln Gly Pro Ala Trp Ala Gly Asn
Leu Gly Gly Val Leu 1235 1240 1245
Leu Glu Gly Asp Asn Leu His Glu Thr Leu Leu Leu Asn Leu Ile
1250 1255 1260 Ala Gly
Asp Thr Pro Gly Val His Ala Ala Glu Val Asp Arg Pro 1265
1270 1275 Ala Trp Arg Ala Glu Pro Ser
Gly Pro Ala Pro Ala Pro Asp Leu 1280 1285
1290 Gly Leu Arg Pro Tyr Gly Leu Arg Asp Leu Tyr Thr
Trp Gln Ser 1295 1300 1305
Arg Arg Ile Arg Leu His His Asp Ala Asp Gly Val His Gly Val 1310
1315 1320 Val Leu Ala Tyr Gly
Asp Ser Leu Glu Pro His Asn Arg His Gly 1325 1330
1335 His Glu Pro Met Thr Ser Trp Arg Arg Ser
Pro Thr Gln Glu Lys 1340 1345 1350
Lys Arg Gln Glu Asn Leu Val Tyr Leu Pro Arg Glu His Asp Pro
1355 1360 1365 Ser Arg
Leu Ala Trp Arg Gly Met Asp Gly Leu Leu Ala Gly Arg 1370
1375 1380 Glu Thr Gly Ser Ala Gln Gly
Pro Asp Gly Ala Asp Arg Leu Ala 1385 1390
1395 Pro Lys Val Val Gln Trp Ala Ala Gln Leu Thr Thr
Glu Gly Leu 1400 1405 1410
Leu Pro Arg Gly Tyr Leu Ile Arg Thr Arg Val Ile Gly Ala Arg 1415
1420 1425 Tyr Gly Thr Gln Gln
Ser Val Ile Asp Glu Val Val Asp Asp Gly 1430 1435
1440 Val Leu Met Pro Ala Val Leu Leu His Glu
Ala Asp Arg Arg Tyr 1445 1450 1455
Gly Asp Lys Ala Val Asp Ala Leu His Asp Ala Glu Lys Ala Val
1460 1465 1470 Gly Ala
Leu Ala Gln Leu Ala Ala Asp Leu Ala Leu Ala Val Gly 1475
1480 1485 Thr Asp Pro Glu Pro Gly Arg
Asn Thr Ala Arg Asp Leu Gly Phe 1490 1495
1500 Gly Thr Leu Asp Thr His Tyr Arg Arg Trp Leu Arg
Glu Leu Gly 1505 1510 1515
Gly Thr Ser Asp Pro Glu Glu His Arg Asp Arg Trp Lys Gln Glu 1520
1525 1530 Val Arg Arg Leu Val
Ala Glu Leu Gly Glu Arg Leu Leu Asp Gly 1535 1540
1545 Ala Gly Pro Ala Ala Trp Glu Gly Arg Leu
Val Glu Thr Gly Lys 1550 1555 1560
Gly Thr Arg Trp Leu Asn Asp Ala Ala Ala Glu Leu Arg Phe Arg
1565 1570 1575 Thr Arg
Leu Arg Glu Phe Leu Thr Thr Ala Pro Asp Thr Pro Thr 1580
1585 1590 Ser Pro Arg Pro Ala Pro Val
Glu Ser Pro Ala 1595 1600
441559PRTStreptomyces griseus 44Met Ser Asn Thr Pro Met Ser Arg Asp His
Pro Glu Ser Leu Ser Ala 1 5 10
15 Tyr Ala Arg Leu Ser Pro Val Ser Arg Thr Ala Trp Gly Lys His
Asp 20 25 30 Arg
Gln Thr Glu Gln Trp Leu Pro Leu Trp Arg His Met Ala Asp Ser 35
40 45 Ala Ala Val Ala Glu Arg
Leu Trp Asp Gln Trp Val Pro Asp Asn Val 50 55
60 Lys Ala Leu Ile Ala Asp Ala Phe Pro Gln Gly
Ala Gln Asp Ala Arg 65 70 75
80 Arg Val Ala Val Phe Leu Ala Cys Val His Asp Ile Gly Lys Ala Thr
85 90 95 Pro Ala
Phe Ala Cys Gln Val Asp Gly Leu Ala Asp Arg Met Arg Ala 100
105 110 Ala Gly Leu Ser Met Pro Tyr
Leu Lys Gln Phe Gly Leu Asp Arg Arg 115 120
125 Met Ala Pro His Gly Leu Ala Gly Gln Leu Leu Leu
Gln Glu Trp Leu 130 135 140
Ala Glu Arg Phe Gly Trp Ser Glu Arg Ala Ser Gly Gln Phe Ala Val 145
150 155 160 Val Ala Gly
Gly His His Gly Thr Pro Pro Asp His Gln His Ile His 165
170 175 Asp Leu Gly Leu Arg Pro His Leu
Leu Arg Thr Ala Gly Glu Ser Gln 180 185
190 Asp Thr Trp Arg Ser Val Gln Asp Glu Leu Met Asp Ala
Cys Ala Val 195 200 205
Arg Ala Gly Val Gly Gly Arg Phe Gly Ala Trp Arg Ser Val Arg Leu 210
215 220 Pro Gln Pro Val
Gln Val Val Leu Thr Ala Ile Val Ile Val Ser Asp 225 230
235 240 Trp Ile Ala Ser Ser Ser Glu Leu Phe
Pro Tyr Asp Pro Ala Ser Trp 245 250
255 Ser Pro Val Gly Pro Glu Gly Glu Gly Arg Arg Leu Thr Ala
Ala Trp 260 265 270
Gly Gly Leu Asp Leu Pro Gly Pro Trp Arg Ala Asp Gln Pro Asp Cys
275 280 285 Thr Ala Ala Glu
Leu Phe Gly Lys Arg Phe Asp Leu Pro Glu Gly Ala 290
295 300 Gly Val Arg Pro Val Gln Glu Glu
Ala Val Arg Val Ala Gln Glu Leu 305 310
315 320 Pro Gly Pro Gly Leu Leu Ile Ile Glu Ala Pro Met
Gly Glu Gly Lys 325 330
335 Thr Glu Ala Ala Phe Ala Ala Ala Glu Ile Leu Ala Ala Arg Thr Gly
340 345 350 Ala Gly Gly
Cys Leu Val Ala Leu Pro Thr Arg Ala Thr Gly Asp Ala 355
360 365 Met Phe Pro Arg Leu Leu Arg Trp
Leu Glu Arg Leu Pro Ser Asp Gly 370 375
380 Pro Arg Ser Val Val Leu Ala His Ala Lys Ala Ala Leu
Asn Glu Val 385 390 395
400 Trp Ala Gly Met Thr Lys Ala Asp Arg Arg Lys Ile Thr Ala Val Asp
405 410 415 Leu Asp Ser Gln
Val Glu Asp Val Ser Ser Ala Gly Gly Ala Arg Arg 420
425 430 Ala Asn Pro Ala Ser Leu His Ala His
Gln Trp Leu Arg Gly Arg Lys 435 440
445 Lys Ala Leu Leu Ser Ser Phe Ala Val Gly Thr Val Asp Gln
Val Leu 450 455 460
Phe Ala Gly Leu Lys Ser Arg His Leu Ala Leu Arg His Leu Ala Val 465
470 475 480 Ala Gly Lys Val Val
Ile Val Asp Glu Val His Ala Tyr Asp Ala Tyr 485
490 495 Met Ser Ala Tyr Leu Asp Arg Val Leu Glu
Trp Leu Ala Ala Tyr Arg 500 505
510 Val Pro Val Val Met Leu Ser Ala Thr Leu Pro Ala His Arg Arg
Arg 515 520 525 Glu
Leu Ala Ala Ala Tyr Ala Gly Glu Glu Thr Pro Glu Leu Ala Asp 530
535 540 Ala Leu Ala Leu Pro Asp
Asp Ala Tyr Pro Leu Ile Thr Ala Val Ala 545 550
555 560 Pro Gly Gly Leu Val Leu Thr Ala Arg Pro Glu
Pro Ala Ser Gly Arg 565 570
575 Arg Thr Glu Val Val Leu Glu Arg Leu Gly Asp Gly Pro Ala Leu Leu
580 585 590 Ala Ala
Arg Leu Asp Glu Glu Leu Arg Asp Gly Gly Cys Ala Leu Val 595
600 605 Val Arg Asn Thr Val Asp Arg
Val Leu Glu Ala Ala Glu His Leu Arg 610 615
620 Ala His Phe Gly Ala Glu Ala Val Thr Val Ala His
Ser Arg Phe Val 625 630 635
640 Ala Ala Asp Arg Ala Arg Asn Asp Thr Val Leu Arg Glu Arg Phe Gly
645 650 655 Pro Gly Gly
Asp Arg Pro Ala Gly Pro His Ile Val Val Ala Ser Gln 660
665 670 Val Val Glu Gln Ser Leu Asp Ile
Asp Phe Asp Leu Leu Val Thr Asp 675 680
685 Leu Ala Pro Val Asp Leu Val Leu Gln Arg Met Gly Arg
Leu His Arg 690 695 700
His Pro Arg Thr Arg Pro Pro Arg Leu Ser Arg Ala Arg Cys Leu Ile 705
710 715 720 Thr Gly Val Glu
Asp Trp His Ala Glu Arg Pro Val Pro Val Arg Gly 725
730 735 Ser Leu Ala Val Tyr Gln Gly Pro His
Thr Leu Leu Arg Ala Leu Ala 740 745
750 Val Leu Gly Pro His Leu Asp Gly Val Pro Leu Val Leu Pro
Asp His 755 760 765
Ile Ser Pro Leu Val Gln Ala Ala Tyr Asp Glu Arg Pro Val Gly Pro 770
775 780 Ala His Trp Ala Pro
Val Leu Asp Glu Ala Arg Arg Gln Tyr Leu Thr 785 790
795 800 Arg Leu Ala Glu Lys Arg Glu Arg Ala Asp
Val Phe Arg Leu Gly Pro 805 810
815 Val Arg Arg Pro Gly Arg Pro Leu Phe Gly Trp Leu Asp Gly Asn
Ala 820 825 830 Gly
Asp Ala Asp Asp Ser Arg Thr Gly Arg Ala Gln Val Arg Asp Ser 835
840 845 Glu Glu Ser Leu Glu Val
Leu Val Val Gln Arg Arg Ala Asp Gly Arg 850 855
860 Leu Thr Thr Val Ser Trp Leu Asp Gly Gly Arg
Gly Gly Leu Asp Leu 865 870 875
880 Pro Glu His Ala Pro Pro Pro Pro Arg Ala Ala Glu Val Val Ala Ala
885 890 895 Cys Ala
Leu Thr Leu Pro Arg Ser Leu Thr His Pro Gly Val Ile Asp 900
905 910 Arg Thr Ile Ala Glu Leu Glu
Arg Phe Val Val Pro Ala Trp Gln Val 915 920
925 Lys Glu Cys Pro Trp Leu Ala Gly Glu Leu Leu Leu
Val Leu Asp Glu 930 935 940
Asp Cys Gln Thr Arg Leu Ser Gly Leu Glu Val His Tyr Ser Thr Asp 945
950 955 960 Gln Gly Leu
Arg Val Gly Ser Val Gly Thr Arg Ser Thr Asn Arg Ala 965
970 975 Lys Gly Leu Glu Ala Val Ser Val
Ala Ser Phe Asp Leu Val Ser Arg 980 985
990 Pro Trp Leu Pro Val Gln Tyr Glu Asp Gly Ala Thr
Gly Glu Leu Ser 995 1000 1005
Leu Arg Glu Val Phe Ala Arg Ala Gly Glu Val Arg Arg Leu Val
1010 1015 1020 Gly Asp Leu
Pro Thr Gln Glu Leu Ala Leu Leu Arg Leu Leu Leu 1025
1030 1035 Ala Ile Leu Tyr Asp Ala Tyr Asp
Glu Ala Pro Gly Arg Ser Gly 1040 1045
1050 Gly Ala Pro Ala Gln Leu Glu Asp Trp Glu Ala Leu Trp
Asp Glu 1055 1060 1065
Pro Asp Ser Phe Ala Val Val Ala Gly Tyr Leu Asp Arg His Arg 1070
1075 1080 Asp Arg Phe Asp Leu
Leu His Pro Glu Arg Pro Phe Phe Gln Val 1085 1090
1095 Ala Gly Leu His Thr Gln Lys His Glu Val
Ala Ser Leu Asn Arg 1100 1105 1110
Ile Val Ala Asp Val Pro Asn Gly Glu Ala Phe Phe Ser Met Arg
1115 1120 1125 Arg Pro
Gly Val His Arg Leu Gly Leu Ala Glu Ala Ala Arg Trp 1130
1135 1140 Leu Val His Thr His Ala Tyr
Asp Ala Ser Gly Ile Lys Ser Gly 1145 1150
1155 Met Glu Gly Asp Ala Arg Val Lys Gly Gly Lys Val
Tyr Pro Gln 1160 1165 1170
Gly Val Gly Trp Val Gly Gly Leu Gly Gly Val Phe Ala Glu Gly 1175
1180 1185 Ala Ser Leu Arg Glu
Thr Leu Leu Leu Asn Leu Ile Pro Thr Asp 1190 1195
1200 Glu Asp Ile Leu Thr Ser Glu Pro Lys Ala
Asp Leu Pro Val Trp 1205 1210 1215
Arg Arg Glu Thr Pro Pro Gly Pro Gly Val Val Glu Gly Asp Pro
1220 1225 1230 Ser Ala
Pro Arg Pro Ala Gly Pro Arg Asp Leu Tyr Thr Trp Gln 1235
1240 1245 Ser Arg Arg Leu Leu Leu His
Thr Glu Gly Ser Asp Ala Ile Gly 1250 1255
1260 Val Val Leu Gly Tyr Gly Asp Pro Leu Ser Pro Ala
Asn Arg Gln 1265 1270 1275
Lys Thr Glu Pro Met Thr Gly Trp Arg Arg Ser Pro Ala Gln Glu 1280
1285 1290 Lys Lys Leu Gly Arg
Pro Leu Val Tyr Leu Pro Arg Gln His Asp 1295 1300
1305 Pro Gly Arg Ala Ala Trp Arg Gly Leu Ala
Ser Leu Leu Tyr Pro 1310 1315 1320
Gln Gly Glu Asp Gly Asp Thr Thr Gly Arg Gly Thr Asp Arg Ser
1325 1330 1335 Arg Pro
Ala Gly Ile Val Arg Trp Leu Ala Leu Leu Ser Thr Glu 1340
1345 1350 Gly Val Leu Pro Lys Gly Ser
Leu Ile Arg Thr Arg Leu Val Gly 1355 1360
1365 Ala Val Tyr Gly Thr Gln Gln Ser Val Val Asp Asp
Val Val Asp 1370 1375 1380
Asp Ser Ile Ala Leu Pro Val Val Leu Leu His Gln Asp Arg Arg 1385
1390 1395 Leu His Gly Ala Val
Ala Val Asp Ala Val Ala Asp Ala Glu Arg 1400 1405
1410 Ala Val Ser Ala Leu Gly His Leu Ala Gly
Asn Leu Ala Arg Ala 1415 1420 1425
Ser Gly Ser Glu Ala Gly Pro Ala Thr Ala Thr Ala Arg Asp Gln
1430 1435 1440 Gly Phe
Gly Ala Leu Asp Gly Pro Tyr Arg Arg Trp Leu Val Asp 1445
1450 1455 Leu Ala Glu Asp Thr Asp Leu
Glu Arg Ala Arg Ala Ala Trp Arg 1460 1465
1470 Asp Thr Val Arg Leu Val Val Leu Gly Ile Gly Arg
Glu Leu Leu 1475 1480 1485
Asp Ala Ala Gly Arg Ala Ala Ala Glu Gly Arg Val Ile Glu Leu 1490
1495 1500 Pro Gly Val Gly Lys
Arg Trp Ile Asp Ser Ser Arg Ala Asp Leu 1505 1510
1515 Trp Phe Arg Thr Arg Ile Asn Arg Val Leu
Pro Arg Pro Leu Pro 1520 1525 1530
Glu Ala His Ala Pro Thr Ala Asp Ile His Ala Gly His Ala Val
1535 1540 1545 Arg Ala
Asp Glu Ala Leu Ser Glu Glu Thr Val 1550 1555
451540PRTCatenulispora acidiphila 45Met Phe Asn Val Gly Ser Thr
Arg Cys Trp Gly Asp Gly Gly Leu Arg 1 5
10 15 Asn Ala Ala Glu Asp Leu Ser Ala Ala Thr Arg
Ser Ala Trp Ala Lys 20 25
30 Ser Asp Pro Asp Ser Gly Gln Ser Leu Ser Leu Ile Arg His Leu
Ala 35 40 45 Asp
Ser Ala Ala Ile Ala Glu His Leu Trp Asp Gln Trp Leu Pro Asp 50
55 60 His Val Lys Ser Leu Ile
Ala Glu Gly Leu Pro Glu Gly Leu Val Asp 65 70
75 80 Gly Arg Thr Leu Ala Val Trp Leu Ala Gly Thr
His Asp Ile Gly Lys 85 90
95 Leu Thr Pro Ala Phe Ala Cys Gln Cys Glu Pro Leu Ala Gln Ala Met
100 105 110 Arg Glu
Cys Gly Leu Asp Met Pro Thr Arg Thr Gln Phe Gly Asp Asp 115
120 125 Arg Arg Val Ala Pro His Gly
Leu Ala Gly Gln Val Leu Leu Arg Glu 130 135
140 Trp Leu Met Glu Arg His Gly Trp Ser Gly Arg Ser
Ala Asp Ala Phe 145 150 155
160 Thr Val Ile Ala Gly Gly His His Gly Val Pro Pro Ser Tyr Ser Gln
165 170 175 Leu His Asp
Leu Asp Ala Tyr Pro Glu Leu Leu Arg Thr Pro Gly Ala 180
185 190 Ser Glu Gly Ile Trp Lys Ser Ser
Gln His Glu Leu Leu Asp Ala Cys 195 200
205 Ala Val Met Thr Gly Ala Ser Ser Arg Leu Ala His Trp
Arg Gly Leu 210 215 220
Arg Leu Ser Gln Gln Ala Gln Val Leu Leu Thr Gly Leu Val Ile Val 225
230 235 240 Ala Asp Trp Ile
Ala Ser Asn Thr Asp Leu Phe Pro Tyr Pro Ala Leu 245
250 255 Gly Thr Gly Glu Ala Ala Ile Asp Pro
Gly Lys Arg Val Glu Leu Ala 260 265
270 Trp Arg Gly Leu Glu Leu Pro Ala Pro Trp Ala Pro Lys Tyr
Leu Met 275 280 285
Pro Gly Met Gln Gly Leu Leu Ala Ser Arg Phe Gly Leu Pro Ala Asp 290
295 300 Ala Gln Leu Arg Pro
Val Gln Gln Met Ala Val Gln Leu Ala Ser Ala 305 310
315 320 Asn Ala Ala Pro Gly Leu Leu Val Ile Glu
Ala Pro Met Gly Glu Gly 325 330
335 Lys Thr Glu Ala Ala Leu Leu Ala Ala Glu Ile Leu Ala Ala Arg
Ser 340 345 350 Gly
Ala Gly Gly Val Phe Leu Ala Leu Pro Thr Gln Ala Thr Ser Asn 355
360 365 Ala Met Phe Ala Arg Val
Val Asn Trp Leu Arg Gln Val Pro Arg Glu 370 375
380 Gly Val Ala Ser Val His Leu Ala His Gly Lys
Ala Ala Leu Asp Asp 385 390 395
400 Ala Phe Ala Ser Phe Leu Arg Ala Ala Pro Arg Leu Thr Ser Ile Asp
405 410 415 Ala Asp
Gly Tyr Ala Gly Glu Ala Asn Val Arg Arg Asp Arg Arg Ala 420
425 430 Gly Ser Ala Asp Met Val Ala
His Gln Trp Leu Arg Gly Arg Lys Lys 435 440
445 Gly Ile Leu Ser Pro Phe Val Val Gly Thr Ile Asp
Gln Leu Leu Phe 450 455 460
Thr Gly Leu Lys Ser Arg His Leu Ala Leu Arg His Leu Ala Val Ala 465
470 475 480 Gly Lys Val
Val Val Ile Asp Glu Val His Ala Tyr Asp Ala Tyr Met 485
490 495 Ser Val Tyr Leu Glu Arg Val Leu
Ser Trp Leu Gly Ala Tyr Arg Val 500 505
510 Pro Val Val Leu Leu Ser Ala Thr Leu Pro Ala Asp Arg
Arg Gln Ala 515 520 525
Leu Val Glu Ala Tyr Gly Gly Ile Thr Ser Glu Ala Leu Arg Asp Ala 530
535 540 Arg Glu Ala Tyr
Pro Val Leu Thr Ala Val Thr Ile Gly Ala Pro Ala 545 550
555 560 Gln Ala Val Gly Thr Glu Pro Ala Glu
Gly Arg Arg Val Asp Val Asn 565 570
575 Val Glu Ala Phe Asp Asp Asp Leu Gly Arg Leu Ala Asp Arg
Leu Glu 580 585 590
Ala Glu Leu Val Asp Gly Gly Cys Ala Leu Ile Ile Arg Asn Thr Val
595 600 605 Gly Arg Val Leu
Gln Thr Ala Gln Gln Leu Arg Glu Arg Phe Gly Ala 610
615 620 Gly Gln Val Thr Val Ala His Ser
Arg Phe Ile Asp Leu Asp Arg Ala 625 630
635 640 Arg Lys Asp Ala Asp Leu Leu Ala Arg Phe Gly His
Asp Gly Ala Arg 645 650
655 Pro Arg Arg His Ile Val Val Ala Ser Gln Val Ala Glu Gln Ser Leu
660 665 670 Asp Ile Asp
Phe Asp Leu Leu Val Thr Asp Leu Ala Pro Ile Asp Leu 675
680 685 Val Leu Gln Arg Met Gly Arg Val
His Arg His His Arg Gly Gly Pro 690 695
700 Glu Gln Ser Glu Arg Pro Pro Ser Leu Arg Thr Ala Arg
Cys Leu Val 705 710 715
720 Thr Gly Val Asp Trp Ala Gly Ile Pro Ser Ala Pro Ile Ala Gly Ser
725 730 735 Val Ala Val Tyr
Gly Leu His Pro Leu Leu Arg Ser Leu Ala Val Leu 740
745 750 Gln Pro Tyr Leu Thr Gly Ser Ala Leu
Thr Leu Pro Gly Asp Ile Asn 755 760
765 Pro Leu Val Gln Cys Ala Tyr Ala Gln Ser Phe Val Ala Pro
Thr Gly 770 775 780
Trp Gly Glu Ala Met Asp Ala Ala Gln Ala Glu His Met Ala His Ile 785
790 795 800 Val Gln Gln Arg Glu
Gly Ala Met Ala Phe Cys Leu Asp Glu Val Arg 805
810 815 Gly Pro Gly Arg Ser Leu Ile Gly Trp Ile
Asp Gly Gly Val Gly Asp 820 825
830 Ala Asp Asp Thr Arg Ala Gly Arg Ala Gln Val Arg Asp Ser Pro
Glu 835 840 845 Thr
Ile Glu Val Leu Val Val Gln Arg Gly Ser Asp Gly Val Leu Arg 850
855 860 Thr Leu Pro Trp Leu Asp
Arg Gly Arg Gly Gly Leu Glu Leu Pro Thr 865 870
875 880 Glu Ala Val Pro Pro Pro Arg Ala Ala Arg Ala
Ala Ala Ala Ser Ala 885 890
895 Leu Arg Leu Pro Gly Leu Phe Ala Lys Pro Trp Met Phe Asp Arg Val
900 905 910 Leu Arg
Glu Leu Glu Arg Glu Tyr His Glu Ala Trp Gln Ala Lys Glu 915
920 925 Ser Ser Trp Leu Gln Gly Glu
Leu Leu Leu Val Leu Asp Glu Glu Cys 930 935
940 Arg Thr Val Leu Ala Gly Tyr Glu Leu Ser Tyr Asn
Pro Asp Asp Gly 945 950 955
960 Leu Glu Met Val Met Pro Gly Glu Pro His Ala Ala Val Val Arg Asp
965 970 975 Lys Glu Ala
Ser Asp Asp Lys Thr Ala Ser Phe Asp Leu Thr Ser Ala 980
985 990 Pro Trp Leu Pro Val Leu Tyr Ala
Asp Gly Met Gln Gly Val Leu Ser 995 1000
1005 Leu Arg Asp Val Phe Ala Gln Ser Asn Leu Ile
Arg Arg Leu Val 1010 1015 1020
Gly Asp Leu Pro Thr Gln Asp Phe Ala Leu Leu Arg Leu Leu Leu
1025 1030 1035 Ala Val Leu
Tyr Asp Ala Val Asp Gly Pro Arg Asp Gly Gln Asp 1040
1045 1050 Trp Glu Asp Leu Trp Thr Ser Asp
Asp Pro Phe Ala Ala Val Pro 1055 1060
1065 Ala Tyr Leu Asp Ser His Arg Glu Arg Phe Asp Leu Leu
His Pro 1070 1075 1080
Ala Thr Pro Phe Tyr Gln Val Pro Gly Leu Gln Thr Ala Lys Gly 1085
1090 1095 Glu Val Gly Pro Leu
Asn Lys Ile Val Ala Asp Val Pro Asp Gly 1100 1105
1110 Asp Pro Phe Leu Thr Met Arg Met Pro Gly
Val Glu Gln Leu Ser 1115 1120 1125
Phe Ala Glu Ala Ala Arg Trp Leu Val His Thr Gln Ala Phe Asp
1130 1135 1140 Thr Ser
Gly Ile Lys Ser Gly Val Val Gly Asp Pro Lys Ala Val 1145
1150 1155 Asn Gly Lys Arg Tyr Pro Gln
Gly Val Ala Trp Leu Gly Asn Leu 1160 1165
1170 Gly Gly Val Phe Ala Glu Gly Asp Thr Leu Arg Gln
Thr Leu Leu 1175 1180 1185
Leu Asn Leu Ile Pro Ala Asp Thr Thr Asn Leu Gln Val Thr Ser 1190
1195 1200 Ala Gln Asp Val Pro
Ala Trp Arg Gly Thr Asn Gly Arg Ala Gly 1205 1210
1215 Ser Asp His Ala Asp Ala Glu Pro Arg Val
Pro Ala Gly Leu Arg 1220 1225 1230
Asp Leu Tyr Thr Trp Gln Ser Arg Arg Ile Arg Leu Glu Tyr Asp
1235 1240 1245 Thr Arg
Gly Val Thr Gly Ala Val Leu Thr Tyr Gly Asp Glu Leu 1250
1255 1260 Thr Ala His Asn Lys His Gly
Val Glu Pro Met Thr Gly Trp Arg 1265 1270
1275 Arg Ser Lys Pro Gln Glu Lys Lys Leu Gly Leu Ser
Thr Val Tyr 1280 1285 1290
Met Pro Gln Gln His Asp Pro Thr Arg Ala Ala Trp Arg Gly Ile 1295
1300 1305 Glu Ser Leu Leu Ala
Gly Ser Ala Gly Ser Gly Ser Ser Gln Thr 1310 1315
1320 Gly Glu Pro Ala Ser His Tyr Arg Pro Lys
Ile Val Asp Trp Leu 1325 1330 1335
Gly Glu Leu Ala His His Gly Asn Leu Pro Ser Arg Gly Leu Ile
1340 1345 1350 Arg Val
Arg Thr Ser Gly Ala Val Tyr Gly Thr Gln Gln Ser Ile 1355
1360 1365 Ile Asp Glu Val Val Ser Asp
Glu Leu Thr Met Ala Val Val Leu 1370 1375
1380 Leu His Glu Asp Asp Pro Arg Phe Gly Lys Ala Ala
Val Thr Ala 1385 1390 1395
Val Lys Asp Ala Asp Ser Ala Val Ala Ala Leu Gly Asp Leu Ala 1400
1405 1410 Ser Asp Leu Ala Arg
Ala Ala Gly Leu Asp Pro Glu Pro Glu Arg 1415 1420
1425 Val Thr Ala Arg Asp Arg Ala Phe Gly Ala
Leu Asp Gly Pro Tyr 1430 1435 1440
Arg Arg Trp Leu Leu Asp Leu Gly Asn Ser Thr Asp Pro Ala Ala
1445 1450 1455 Met Arg
Ala Val Trp Gln Gly Arg Val Tyr Asp Ile Ile Ala Val 1460
1465 1470 Gln Gly Gln Met Leu Leu Asp
Ser Ala Gly Ser Ala Ala Ala Gln 1475 1480
1485 Gly Arg Met Val Lys Thr Thr Arg Gly Glu Arg Trp
Met Asp Asp 1490 1495 1500
Ser Leu Ala Asp Leu Tyr Phe Lys Gly Arg Ile Ala Lys Ala Leu 1505
1510 1515 Ser Ser Arg Leu Gly
Lys Lys Pro Thr Asp Pro Gly Glu Pro Val 1520 1525
1530 Gly Ile Gln Glu Asp Pro Ala 1535
1540
46 1407 PRT Artificial Sequence Description of Artificial Sequence Synthetic polypeptide
46 Met Glu Pro Phe Lys Tyr Ile Cys
His Tyr Trp Gly Lys Ser Ser Lys 1 5 10
15 Ser Leu Thr Lys Gly Asn Asp Ile His Leu Leu Ile Tyr
His Cys Leu 20 25 30
Asp Val Ala Ala Val Ala Asp Cys Trp Trp Asp Gln Ser Val Val Leu
35 40 45 Gln Asn Thr Phe
Cys Arg Asn Glu Met Leu Ser Lys Gln Arg Val Lys 50
55 60 Ala Trp Leu Leu Phe Phe Ile Ala
Leu His Asp Ile Gly Lys Phe Asp 65 70
75 80 Ile Arg Phe Gln Tyr Lys Ser Ala Glu Ser Trp Leu
Lys Leu Asn Pro 85 90
95 Ala Thr Pro Ser Leu Asn Gly Pro Ser Thr Gln Met Cys Arg Lys Phe
100 105 110 Asn His Gly
Ala Ala Gly Leu Tyr Trp Phe Asn Gln Asp Ser Leu Ser 115
120 125 Glu Gln Ser Leu Gly Asp Phe Phe
Ser Phe Phe Asp Ala Ala Pro His 130 135
140 Pro Tyr Glu Ser Trp Phe Pro Trp Val Glu Ala Val Thr
Gly His His 145 150 155
160 Gly Phe Ile Leu His Ser Gln Asp Gln Asp Lys Ser Arg Trp Glu Met
165 170 175 Pro Ala Ser Leu
Ala Ser Tyr Ala Ala Gln Asp Lys Gln Ala Arg Glu 180
185 190 Glu Trp Ile Ser Val Leu Glu Ala Leu
Phe Leu Thr Pro Ala Gly Leu 195 200
205 Ser Ile Asn Asp Ile Pro Pro Asp Cys Ser Ser Leu Leu Ala
Gly Phe 210 215 220
Cys Ser Leu Ala Asp Trp Leu Gly Ser Trp Thr Thr Thr Asn Thr Phe 225
230 235 240 Leu Phe Asn Glu Asp
Ala Pro Ser Asp Ile Asn Ala Leu Arg Thr Tyr 245
250 255 Phe Gln Asp Arg Gln Gln Asp Ala Ser Arg
Val Leu Glu Leu Ser Gly 260 265
270 Leu Val Ser Asn Lys Arg Cys Tyr Glu Gly Val His Ala Leu Leu
Asp 275 280 285 Asn
Gly Tyr Gln Pro Arg Gln Leu Gln Val Leu Val Asp Ala Leu Pro 290
295 300 Val Ala Pro Gly Leu Thr
Val Ile Glu Ala Pro Thr Gly Ser Gly Lys 305 310
315 320 Thr Glu Thr Ala Leu Ala Tyr Ala Trp Lys Leu
Ile Asp Gln Gln Ile 325 330
335 Ala Asp Ser Val Ile Phe Ala Leu Pro Thr Gln Ala Thr Ala Asn Ala
340 345 350 Met Leu
Thr Arg Met Glu Ala Ser Ala Ser His Leu Phe Ser Ser Pro 355
360 365 Asn Leu Ile Leu Ala His Gly
Asn Ser Arg Phe Asn His Leu Phe Gln 370 375
380 Ser Ile Lys Ser Arg Ala Ile Thr Glu Gln Gly Gln
Glu Glu Ala Trp 385 390 395
400 Val Gln Cys Cys Gln Trp Leu Ser Gln Ser Asn Lys Lys Val Phe Leu
405 410 415 Gly Gln Ile
Gly Val Cys Thr Ile Asp Gln Val Leu Ile Ser Val Leu 420
425 430 Pro Val Lys His Arg Phe Ile Arg
Gly Leu Gly Ile Gly Arg Ser Val 435 440
445 Leu Ile Val Asp Glu Val His Ala Tyr Asp Thr Tyr Met
Asn Gly Leu 450 455 460
Leu Glu Ala Val Leu Lys Ala Gln Ala Asp Val Gly Gly Ser Val Ile 465
470 475 480 Leu Leu Ser Ala
Thr Leu Pro Met Lys Gln Lys Gln Lys Leu Leu Asp 485
490 495 Thr Tyr Gly Leu His Thr Asp Pro Val
Glu Asn Asn Ser Ala Tyr Pro 500 505
510 Leu Ile Asn Trp Arg Gly Val Asn Gly Ala Gln Arg Phe Asp
Leu Leu 515 520 525
Ala His Pro Glu Gln Leu Pro Pro Arg Phe Ser Ile Gln Pro Glu Pro 530
535 540 Ile Cys Leu Ala Asp
Met Leu Pro Asp Leu Thr Met Leu Glu Arg Met 545 550
555 560 Ile Ala Ala Ala Asn Ala Gly Ala Gln Val
Cys Leu Ile Cys Asn Leu 565 570
575 Val Asp Val Ala Gln Val Cys Tyr Gln Arg Leu Lys Glu Leu Asn
Asn 580 585 590 Thr
Gln Val Asp Ile Asp Leu Phe His Ala Arg Phe Thr Leu Asn Asp 595
600 605 Arg Arg Glu Lys Glu Asn
Arg Val Ile Ser Asn Phe Gly Lys Asn Gly 610 615
620 Lys Arg Asn Val Gly Arg Ile Leu Val Ala Thr
Gln Val Val Glu Gln 625 630 635
640 Ser Leu Asp Val Asp Phe Asp Trp Leu Ile Thr Gln His Cys Pro Ala
645 650 655 Asp Leu
Leu Phe Gln Arg Leu Gly Arg Leu His Arg His His Arg Lys 660
665 670 Tyr Arg Pro Ala Gly Phe Glu
Ile Pro Val Ala Thr Ile Leu Leu Pro 675 680
685 Asp Gly Glu Gly Tyr Gly Arg His Glu His Ile Tyr
Ser Asn Val Arg 690 695 700
Val Met Trp Arg Thr Gln Gln His Ile Glu Glu Leu Asn Gly Ala Ser 705
710 715 720 Leu Phe Phe
Pro Asp Ala Tyr Arg Gln Trp Leu Asp Ser Ile Tyr Asp 725
730 735 Asp Ala Glu Met Asp Glu Pro Glu
Trp Val Gly Asn Gly Met Asp Lys 740 745
750 Phe Glu Ser Ala Glu Cys Glu Lys Arg Phe Lys Ala Arg
Lys Val Leu 755 760 765
Gln Trp Ala Glu Glu Tyr Ser Leu Gln Asp Asn Asp Glu Thr Ile Leu 770
775 780 Ala Val Thr Arg
Asp Gly Glu Met Ser Leu Pro Leu Leu Pro Tyr Val 785 790
795 800 Gln Thr Ser Ser Gly Lys Gln Leu Leu
Asp Gly Gln Val Tyr Glu Asp 805 810
815 Leu Ser His Glu Gln Gln Tyr Glu Ala Leu Ala Leu Asn Arg
Val Asn 820 825 830
Val Pro Phe Thr Trp Lys Arg Ser Phe Ser Glu Val Val Asp Glu Asp
835 840 845 Gly Leu Leu Trp
Leu Glu Gly Lys Gln Asn Leu Asp Gly Trp Val Trp 850
855 860 Gln Gly Asn Ser Ile Val Ile Thr
Tyr Thr Gly Asp Glu Gly Met Thr 865 870
875 880 Arg Val Ile Pro Ala Asn Pro Lys Gly Asp Pro Thr
Asn Arg Ala Lys 885 890
895 Gly Leu Glu Ala Val Ser Val Ala Ser Met Asn Leu Leu Ile Asp Asn
900 905 910 Trp Ile Pro
Val Arg Pro Arg Asn Gly Gly Lys Val Gln Ile Ile Asn 915
920 925 Leu Gln Ser Leu Tyr Cys Ser Arg
Asp Gln Trp Arg Leu Ser Leu Pro 930 935
940 Arg Asp Asp Met Glu Leu Ala Ala Leu Ala Leu Leu Val
Cys Ile Gly 945 950 955
960 Gln Ile Ile Ala Pro Ala Lys Asp Asp Val Glu Phe Arg His Arg Ile
965 970 975 Met Asn Pro Leu
Thr Glu Asp Glu Phe Gln Gln Leu Ile Ala Pro Trp 980
985 990 Ile Asp Met Phe Tyr Leu Asn His
Ala Glu His Pro Phe Met Gln Thr 995 1000
1005 Lys Gly Val Lys Ala Asn Asp Val Thr Pro Met
Glu Lys Leu Leu 1010 1015 1020
Ala Gly Val Ser Gly Ala Thr Asn Cys Ala Phe Val Asn Gln Pro
1025 1030 1035 Gly Gln Gly
Glu Ala Leu Cys Gly Gly Cys Thr Ala Ile Ala Leu 1040
1045 1050 Phe Asn Gln Ala Asn Gln Ala Pro
Gly Phe Gly Gly Gly Phe Lys 1055 1060
1065 Ser Gly Leu Arg Gly Gly Thr Pro Val Thr Thr Phe Val
Arg Gly 1070 1075 1080
Ile Asp Leu Arg Ser Thr Val Leu Leu Asn Val Leu Thr Leu Pro 1085
1090 1095 Arg Leu Gln Lys Gln
Phe Pro Asn Glu Ser His Thr Glu Asn Gln 1100 1105
1110 Pro Thr Trp Ile Lys Pro Ile Lys Ser Asn
Glu Ser Ile Pro Ala 1115 1120 1125
Ser Ser Ile Gly Phe Val Arg Gly Leu Phe Trp Gln Pro Ala His
1130 1135 1140 Ile Glu
Leu Cys Asp Pro Ile Gly Ile Gly Lys Cys Ser Cys Cys 1145
1150 1155 Gly Gln Glu Ser Asn Leu Arg
Tyr Thr Gly Phe Leu Lys Glu Lys 1160 1165
1170 Phe Thr Phe Thr Val Asn Gly Leu Trp Pro His Pro
His Ser Pro 1175 1180 1185
Cys Leu Val Thr Val Lys Lys Gly Glu Val Glu Glu Lys Phe Leu 1190
1195 1200 Ala Phe Thr Thr Ser
Ala Pro Ser Trp Thr Gln Ile Ser Arg Val 1205 1210
1215 Val Val Asp Lys Ile Ile Gln Asn Glu Asn
Gly Asn Arg Val Ala 1220 1225 1230
Ala Val Val Asn Gln Phe Arg Asn Ile Ala Pro Gln Ser Pro Leu
1235 1240 1245 Glu Leu
Ile Met Gly Gly Tyr Arg Asn Asn Gln Ala Ser Ile Leu 1250
1255 1260 Glu Arg Arg His Asp Val Leu
Met Phe Asn Gln Gly Trp Gln Gln 1265 1270
1275 Tyr Gly Asn Val Ile Asn Glu Ile Val Thr Val Gly
Leu Gly Tyr 1280 1285 1290
Lys Thr Ala Leu Arg Lys Ala Leu Tyr Thr Phe Ala Glu Gly Phe 1295
1300 1305 Lys Asn Lys Asp Phe
Lys Gly Ala Gly Val Ser Val His Glu Thr 1310 1315
1320 Ala Glu Arg His Phe Tyr Arg Gln Ser Glu
Leu Leu Ile Pro Asp 1325 1330 1335
Val Leu Ala Asn Val Asn Phe Ser Gln Ala Asp Glu Val Ile Ala
1340 1345 1350 Asp Leu
Arg Asp Lys Leu His Gln Leu Cys Glu Met Leu Phe Asn 1355
1360 1365 Gln Ser Val Ala Pro Tyr Ala
His His Pro Lys Leu Ile Ser Thr 1370 1375
1380 Leu Ala Leu Ala Arg Ala Thr Leu Tyr Lys His Leu
Arg Glu Leu 1385 1390 1395
Lys Pro Gln Gly Gly Pro Ser Asn Gly 1400 1405
47 49 DNA Artificial Sequence Description of Artificial Sequence Synthetic oligonucleotide
47 ccgtcttgcg ctagctctag aactagtcct cagcctaggc ctaagctgt 49
48 6 PRT Artificial Sequence Description of Artificial Sequence Synthetic 6xHis tag
48 His His His His His His
1 5
49 15 PRT Simian virus 40
49 Pro Lys Lys Lys Arg Lys Val Asp Pro Lys Lys Lys Arg Lys Val
1 5 10 15
50 7 PRT Artificial Sequence Description of Artificial Sequence Synthetic peptide
50 Met His His His His His His
1 5