Patent application title: NOVEL CRISPR-ASSOCIATED TRANSPOSON SYSTEMS AND COMPONENTS
IPC8 Class: AC12N15113FI
Publication date: 2020-08-13
Patent application number: 20200255830
Abstract:
The disclosure describes novel systems, methods, and compositions for the
manipulation of nucleic acids in a targeted fashion. The disclosure
describes non-naturally occurring, engineered CRISPR systems, components,
and methods for targeted modification of DNA, RNA, and protein
substrates. Each system includes one or more protein components and one
or more nucleic acid components that together target DNA, RNA, or protein
substrates.
Claims:
1. An engineered, non-naturally occurring Clustered Regularly Interspaced Short
Palindromic Repeat (CRISPR)-Cas system of CLUST.009925 comprising: a
Guide consisting of a direct repeat sequence and a spacer sequence
capable of hybridizing to a target nucleic acid, wherein the Guide
comprises CRISPR RNA (crRNA) or DNA, and any one of the following: a. a
CRISPR-associated protein containing both an HTH domain and an rve
integrase domain capable of binding to the Guide, either as a monomer or
multimer, and of targeting the target nucleic acid sequence complementary
to the spacer sequence; b. an IstB domain-containing protein; and c. a
payload nucleic acid flanked by transposon end sequences.
2. The system of claim 1, wherein the target nucleic acid is a DNA or an RNA.
3. The system of claim 2, wherein the target nucleic acid is double-stranded DNA.
4. The system of claim 1, wherein targeting of the target nucleic acid by the protein and crRNA or the Guide results in a modification in the target nucleic acid, which optionally is a double-stranded cleavage event or a single-stranded cleavage event.
5-6. (canceled)
7. The system of claim 1, further comprising a donor template nucleic acid, which optionally is a DNA.
8. (canceled)
9. The system of claim 1, wherein the target nucleic acid is a double-stranded DNA and the targeting of the double-stranded DNA results in scarless DNA insertion.
10. The system of claim 1, wherein the modification results in cell toxicity.
11. The system of claim 1, within a cell, which is a eukaryotic cell or a prokaryotic cell.
12-13. (canceled)
14. A method of targeting and editing a target nucleic acid, the method comprising contacting the target nucleic acid with a system of claim 1, wherein optionally the method results in an insertion or substitution of DNA to correct a native locus.
15. A method of targeting the insertion of a payload nucleic acid at a site of a target nucleic acid, the method comprising contacting the target nucleic acid with a system of claim 1, wherein optionally the method results in a targeted insertion of a DNA payload into a specific genomic target site.
16. A method of targeting the excision of a payload nucleic acid from a site at a target nucleic acid, the method comprising contacting the target nucleic acid with a system of claim 1, wherein optionally the method results in a targeted deletion of DNA to correct a native locus.
17. The system of claim 1, wherein the CRISPR-associated protein comprises an amino acid sequence having at least 85%, 90%, 95%, 96%, 97%, 98%, 99%, or 100% similarity to an amino acid sequence provided in any one of Tables 2-3.
18. The system of claim 1, wherein the crRNA or Guide comprises a nucleic acid sequence having at least 85%, 90%, 95%, 96%, 97%, 98%, 99%, or 100% similarity to a nucleic acid sequence provided in Table 4.
19. The system of claim 1, wherein the payload nucleic acid is flanked by transposon end sequences comprising a nucleic acid sequence having at least 85%, 90%, 95%, 96%, 97%, 98%, 99%, or 100% similarity to a nucleic acid sequence contained in Table 5.
20. The system of claim 1, wherein the CRISPR-associated protein comprises at least one nuclear localization signal or at least one nuclear export signal.
21. (canceled)
22. The system of claim 1, wherein at least one component of the system is encoded by a codon-optimized nucleic acid for expression in a cell, which optionally is present within at least one vector, which optionally comprises one or more regulatory elements operably-linked to a nucleic acid encoding the component of the system, wherein the one or more regulatory elements optionally comprises at least one promoter, which optionally comprises an inducible promoter or a constitutive promoter.
23-26. (canceled)
27. The system of claim 22, wherein the at least one vector comprises a plurality of vectors, or is a viral vector that is optionally selected from the group consisting of a retroviral vector, a lentiviral vector, an adenoviral vector, an adeno-associated vector, and a herpes simplex vector.
28-29. (canceled)
30. The system of claim 1, wherein the system is present in a delivery system, which optionally comprises a delivery vehicle selected from the group consisting of a liposome, an exosome, a microvesicle, and a gene-gun.
31. (canceled)
32. A cell comprising the system of claim 1.
33. The cell of claim 32, wherein the cell is a eukaryotic cell, which optionally is a mammalian cell, such as a human cell, or a plant cell; or wherein the cell is a prokaryotic cell.
34-39. (canceled)
Description:
CROSS REFERENCE TO RELATED APPLICATIONS
[0001] This application claims the benefit of priority to U.S. Application Ser. No. 62/580,880, filed on Nov. 2, 2017, and U.S. Application Ser. No. 62/587,381, filed on Nov. 16, 2017. The content of each of the foregoing applications is hereby incorporated by reference in its entirety.
FIELD OF THE INVENTION
[0002] The present disclosure relates to novel CRISPR systems and components, systems for detecting CRISPR systems, and methods and compositions for use of the CRISPR systems in, for example, nucleic acid targeting and manipulation.
BACKGROUND
[0003] Recent advances in genome sequencing technologies and analysis have yielded significant insights into the genetic underpinning of biological activities in many diverse areas of nature, ranging from prokaryotic biosynthetic pathways to human pathologies. To fully understand and evaluate the vast quantities of information produced by genetic sequencing technologies, equivalent increases in the scale, efficacy, and ease of technologies for genome and epigenome manipulation are needed. These novel genome and epigenome engineering technologies will accelerate the development of novel applications in numerous areas, including biotechnology, agriculture, and human therapeutics.
[0004] Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR) and the CRISPR-associated (Cas) genes, collectively known as the CRISPR-Cas or CRISPR/Cas systems, are currently understood to provide immunity to bacteria and archaea against phage infection. The CRISPR-Cas systems of prokaryotic adaptive immunity comprise an extremely diverse group of protein effectors, non-coding elements, and locus architectures, some examples of which have been engineered and adapted to produce important biotechnologies.
[0005] The components of the system involved in host defense include one or more effector proteins capable of modifying DNA or RNA and an RNA guide element that is responsible for targeting these protein activities to a specific sequence on the phage DNA or RNA. The RNA guide is composed of a CRISPR RNA (crRNA) and may require an additional trans-activating RNA (tracrRNA) to enable targeted nucleic acid manipulation by the effector protein(s). The crRNA consists of a direct repeat responsible for protein binding to the crRNA and a spacer sequence that is complementary to the desired nucleic acid target sequence. CRISPR systems can be reprogrammed to target alternative DNA or RNA targets by modifying the spacer sequence of the crRNA.
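As a minimal sketch of the spacer reprogramming described in [0005], the Python below assembles a crRNA from a direct repeat and a new spacer, and reports where the spacer's sequence occurs on a double-stranded DNA target. The direct repeat is a hypothetical placeholder, not a sequence from this disclosure, and the repeat-then-spacer order is an assumption; repeat and spacer orientation is system-specific.

```python
# Hypothetical placeholder direct repeat; NOT a sequence from this disclosure.
DIRECT_REPEAT = "GTTTCAGACCTATTGAAAC"

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def build_crrna(spacer_dna: str, direct_repeat: str = DIRECT_REPEAT) -> str:
    """Join direct repeat and spacer (assumed order) and convert to RNA letters."""
    return (direct_repeat + spacer_dna).upper().replace("T", "U")


def find_protospacer(spacer_dna: str, target_top: str):
    """Return (start, strand) of the spacer sequence on a dsDNA target, else None.

    '+' means the spacer sequence reads along the given (top) strand, so the
    crRNA would hybridize to the bottom strand; '-' means the reverse.
    """
    spacer = spacer_dna.upper()
    top = target_top.upper()
    pos = top.find(spacer)
    if pos != -1:
        return pos, "+"
    pos = top.find(reverse_complement(spacer))
    if pos != -1:
        return pos, "-"
    return None
```

Only `spacer_dna` changes when the system is retargeted; the direct repeat, which mediates protein binding, stays fixed.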
[0006] CRISPR-Cas systems can be broadly classified into two classes: Class 1 systems are composed of multiple effector proteins that together form a complex around a crRNA, and Class 2 systems consist of a single effector protein that complexes with the crRNA to target DNA or RNA substrates. The single-subunit effector composition of Class 2 systems provides a simpler component set for engineering and application translation, and Class 2 systems have thus far been an important source of programmable effectors. Thus, the discovery, engineering, and optimization of novel Class 2 systems may lead to widespread and powerful programmable technologies for genome engineering and beyond.
[0007] Citation or identification of any document in this application is not an admission that such document is available as prior art to the present invention.
SUMMARY
[0008] The present disclosure provides methods for the computational identification of new CRISPR-Cas systems from genomic databases, for the development of the natural loci into engineered systems, and for their experimental validation and application translation.
[0009] In one aspect, provided herein is an engineered, non-naturally occurring Clustered Regularly Interspaced Short Palindromic Repeat (CRISPR)-Cas system of CLUST.009925 including a Guide consisting of a direct repeat sequence and a spacer sequence capable of hybridizing to a target nucleic acid, wherein the Guide comprises CRISPR RNA (crRNA) or DNA, and any one of the following: a CRISPR-associated protein containing both an HTH domain and an rve integrase domain capable of binding to the Guide, either as a monomer or multimer, and of targeting the target nucleic acid sequence complementary to the spacer sequence; an IstB domain-containing protein; and a payload nucleic acid flanked by transposon end sequences.
[0010] In some embodiments, the target nucleic acid is a DNA or an RNA. In some embodiments, the target nucleic acid is double-stranded DNA.
[0011] In some embodiments, targeting of the target nucleic acid by the protein and crRNA or the Guide results in a modification in the target nucleic acid. In some embodiments, the modification in the target nucleic acid is a double-stranded cleavage event. In some embodiments, the modification in the target nucleic acid is a single-stranded cleavage event.
[0012] In some embodiments, the system described herein further comprises a donor template nucleic acid. In some embodiments, the donor template nucleic acid is a DNA. In some embodiments, the target nucleic acid is a double-stranded DNA and the targeting of the double-stranded DNA results in scarless DNA insertion. In some embodiments, the modification results in cell toxicity.
[0013] In some embodiments, the system described herein is within a cell. In some embodiments, the cell comprises a eukaryotic cell. In some embodiments, the cell comprises a prokaryotic cell.
[0014] In one aspect, provided herein are methods of targeting and editing a target nucleic acid, the method comprising contacting the target nucleic acid with a system described herein.
[0015] In another aspect, provided herein are methods of targeting the insertion of a payload nucleic acid at a site of a target nucleic acid, the method comprising contacting the target nucleic acid with a system described herein.
[0016] In another aspect, provided herein are methods of targeting the excision of a payload nucleic acid from a site at a target nucleic acid, the method comprising contacting the target nucleic acid with a system described herein.
[0017] In some embodiments, the CRISPR-associated protein includes an amino acid sequence having at least 85%, 90%, 95%, 96%, 97%, 98%, 99%, or 100% similarity to an amino acid sequence provided in any one of Tables 2-3.
[0018] In some embodiments, the crRNA or Guide includes a nucleic acid sequence having at least 85%, 90%, 95%, 96%, 97%, 98%, 99%, or 100% similarity to a nucleic acid sequence provided in Table 4.
[0019] In some embodiments, the payload nucleic acid is flanked by transposon end sequences comprising a nucleic acid sequence having at least 85%, 90%, 95%, 96%, 97%, 98%, 99%, or 100% similarity to a nucleic acid sequence contained in Table 5.
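The percentages in [0017]-[0019] are thresholds on sequence similarity, but the disclosure does not state how similarity is computed. The minimal sketch below assumes one common convention, percent identity over a pairwise alignment that has already been produced (with '-' marking gaps), with columns that are gapped in both sequences ignored. Other conventions, such as dividing by the shorter sequence length or counting conservative amino acid substitutions as "similar", give different numbers.

```python
THRESHOLDS = (85, 90, 95, 96, 97, 98, 99, 100)


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity between two pre-aligned sequences ('-' = gap).

    Assumed convention: identical non-gap columns divided by all columns in
    which at least one of the two sequences has a residue.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    identical = 0
    columns = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" and b == "-":
            continue  # empty column in both sequences; ignore it
        columns += 1
        if a == b:
            identical += 1
    return 100.0 * identical / columns if columns else 0.0


def thresholds_met(aligned_a: str, aligned_b: str) -> list[int]:
    """Which of the recited percentage thresholds a pair satisfies."""
    pid = percent_identity(aligned_a, aligned_b)
    return [t for t in THRESHOLDS if pid >= t]
```

For example, two 20-residue sequences differing at one position are 95% identical and satisfy the 85%, 90%, and 95% thresholds only.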
[0020] In some embodiments, the CRISPR-associated protein includes at least one nuclear localization signal. In some embodiments, the CRISPR-associated protein includes at least one nuclear export signal.
[0021] In some embodiments, at least one component of the system is encoded by a codon-optimized nucleic acid for expression in a cell. In some embodiments, the codon-optimized nucleic acid is present within at least one vector.
[0022] In some embodiments, the at least one vector includes one or more regulatory elements operably linked to a nucleic acid encoding the component of the system. In some embodiments, the one or more regulatory elements comprise at least one promoter. In some embodiments, the at least one promoter comprises an inducible promoter or a constitutive promoter. In some embodiments, the at least one vector comprises a plurality of vectors. In some embodiments, the at least one vector is or includes a viral vector. In some embodiments, the viral vector is selected from the group consisting of a retroviral vector, a lentiviral vector, an adenoviral vector, an adeno-associated vector, and a herpes simplex vector.
[0023] In some embodiments, the system is present in a delivery system. In some embodiments, the delivery system comprises a delivery vehicle selected from the group consisting of a liposome, an exosome, a microvesicle, and a gene-gun.
[0024] In another aspect, provided herein is a cell including the system described herein. In some embodiments, the cell is a eukaryotic cell. In some embodiments, the eukaryotic cell is a mammalian cell or a plant cell. In some embodiments, the mammalian cell is a human cell. In some embodiments, the cell is a prokaryotic cell.
[0025] In some embodiments, the methods described herein result in an insertion or substitution of DNA to correct a native locus. In some embodiments, the methods described herein result in a targeted insertion of a DNA payload into a specific genomic target site. In some embodiments, the methods described herein result in a targeted deletion of DNA to correct a native locus.
[0026] The term "CRISPR-associated transposon" as used herein refers to a mobile genetic element having terminal transposon ends on both sides, which is acted upon by a CRISPR system described herein. In some embodiments, the CRISPR-associated transposon includes a gene encoding a CRISPR-associated transposase that is capable of facilitating the mobility (e.g., excision or insertion) of the CRISPR-associated transposon from a first site in a nucleic acid to a second site in a nucleic acid.
[0027] The term "CRISPR-associated transposase" as used herein refers to a protein that includes one or more transposase domains and that is encoded by a gene that, in nature, is present in a CRISPR-associated transposon. In some embodiments, the CRISPR-associated transposase is capable of facilitating the mobility of a CRISPR-associated transposon from a first site in a nucleic acid to a second site in a nucleic acid. In some embodiments, the CRISPR-associated transposase has integration activity. In some embodiments, the CRISPR-associated transposase has excision activity. In some embodiments, the CRISPR-associated transposase specifically targets a CRISPR-associated transposon for mobilization via an RNA guide.
[0028] The term "Guide" for a CRISPR-associated transposase system refers to an RNA or DNA sequence that includes one or more direct repeat and spacer sequences and that is capable of hybridizing to a target nucleic acid and of binding to the proteins and/or nucleic acids of the CRISPR-associated transposon complex.
[0029] The term "cleavage event," as used herein, refers to a DNA break in a target nucleic acid created by a nuclease of a CRISPR system described herein. In some embodiments, the cleavage event is a double-stranded DNA break. In some embodiments, the cleavage event is a single-stranded DNA break.
[0030] The term "CRISPR system" as used herein refers to nucleic acids and/or proteins involved in the expression of, or directing the activity of, CRISPR-effectors, including sequences encoding CRISPR effectors, RNA guides, and other sequences and transcripts from a CRISPR locus.
[0031] The term "CRISPR array" as used herein refers to the nucleic acid (e.g., DNA) segment that includes CRISPR repeats and spacers, starting with the first nucleotide of the first CRISPR repeat and ending with the last nucleotide of the last (terminal) CRISPR repeat. Typically, each spacer in a CRISPR array is located between two repeats. The term "CRISPR repeat," or "CRISPR direct repeat," or "direct repeat," as used herein, refers to multiple short direct repeating sequences, which show very little or no sequence variation within a CRISPR array.
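Following the definition in [0031], in which each spacer sits between two repeats and the array ends with the last repeat, the minimal Python sketch below splits an array into its spacers. It assumes every repeat copy matches the given direct repeat exactly; natural arrays often tolerate a few mismatches, particularly in the terminal repeat, which this sketch ignores. The sequences in the example are toy values.

```python
def split_crispr_array(array_seq: str, direct_repeat: str) -> list[str]:
    """Return the spacers lying between consecutive copies of a direct repeat.

    Assumes exact repeat matches. Sequence after the last repeat is not
    returned, consistent with an array ending at its terminal repeat.
    """
    seq = array_seq.upper()
    dr = direct_repeat.upper()
    starts = []
    pos = seq.find(dr)
    while pos != -1:
        starts.append(pos)
        pos = seq.find(dr, pos + len(dr))  # next non-overlapping repeat copy
    # Each spacer runs from the end of one repeat to the start of the next.
    return [seq[s + len(dr):e] for s, e in zip(starts, starts[1:])]


# Toy array: repeat-spacer-repeat-spacer-repeat.
print(split_crispr_array("GTTTCAAAAGTTTCCCCCGTTTC", "GTTTC"))
```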
[0032] The term "CRISPR RNA" or "crRNA" as used herein refers to an RNA molecule comprising a guide sequence used by a CRISPR effector to specifically target a nucleic acid sequence. Typically, crRNAs contain a sequence that mediates target recognition and a sequence that forms a duplex with a tracrRNA. The crRNA:tracrRNA duplex binds to a CRISPR effector. The term "donor template nucleic acid," as used herein, refers to a nucleic acid molecule that can be used by one or more cellular proteins to alter the structure of a target nucleic acid after a CRISPR enzyme described herein has altered the target nucleic acid. In some embodiments, the donor template nucleic acid is a double-stranded nucleic acid. In some embodiments, the donor template nucleic acid is a single-stranded nucleic acid. In some embodiments, the donor template nucleic acid is linear. In some embodiments, the donor template nucleic acid is circular (e.g., a plasmid). In some embodiments, the donor template nucleic acid is an exogenous nucleic acid molecule. In some embodiments, the donor template nucleic acid is an endogenous nucleic acid molecule (e.g., a chromosome).
[0033] The term "CRISPR effector," "effector," "CRISPR-associated protein," or "CRISPR enzyme" as used herein refers to a protein that carries out an enzymatic activity or that binds to a target site on a nucleic acid specified by an RNA guide. In some embodiments, a CRISPR effector has endonuclease activity, nickase activity, exonuclease activity, transposase activity, and/or excision activity.
[0034] The term "RNA guide" as used herein refers to any RNA molecule that facilitates the targeting of a protein described herein to a target nucleic acid. Exemplary "RNA guides" include, but are not limited to, crRNAs, crRNAs fused to tracrRNAs, or crRNAs hybridized to tracrRNAs.
[0035] The term "mobile genetic element" as used herein refers to a nucleic acid capable of being specifically recognized and mobilized from a nucleic acid. In some embodiments, the mobile genetic element comprises nucleic acid sequences at flanking terminal ends that are specifically recognized by a CRISPR-associated transposase.
[0036] The term "origin of replication," as used herein, refers to a nucleic acid sequence in a replicating nucleic acid molecule (e.g., a plasmid or a chromosome) at which replication is initiated.
[0037] As used herein, the term "target nucleic acid" refers to a specific nucleic acid sequence that is to be modified by a CRISPR system described herein. In some embodiments, the target nucleic acid comprises a gene. In some embodiments, the target nucleic acid comprises a non-coding region (e.g., a promoter). In some embodiments, the target nucleic acid is single-stranded. In some embodiments, the target nucleic acid is double-stranded.
[0038] The terms "trans-activating crRNA" or "tracrRNA" as used herein refer to an RNA including a sequence that forms a structure required for a CRISPR effector to bind to a specified target nucleic acid.
[0039] A "transcriptionally-active site" as used herein refers to a site in a nucleic acid sequence that comprises promoter regions and at which transcription is initiated and is actively occurring.
[0040] The term "collateral RNAse activity," as used herein in reference to a CRISPR enzyme, refers to non-specific RNAse activity of a CRISPR enzyme after the enzyme has modified a specifically-targeted nucleic acid.
[0041] Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. Although methods and materials similar or equivalent to those described herein can be used in the practice or testing of the present invention, suitable methods and materials are described below. All publications, patent applications, patents, and other references mentioned herein are incorporated by reference in their entirety. In case of conflict, the present specification, including definitions, will control. In addition, the materials, methods, and examples are illustrative only and not intended to be limiting.
[0042] Other features and advantages of the invention will be apparent from the following detailed description, and from the claims.
BRIEF FIGURE DESCRIPTION
[0043] FIGS. 1A-D show the conserved effectors, CRISPR array, and Tnp end elements by bacterial genome accession and species for representative CLUST.009925 loci.
[0044] FIGS. 2A and 2B show a phylogenetic tree of CLUST.009925 effector A proteins.
[0045] FIGS. 3A and 3B show an alignment of CLUST.009925 effector A proteins by genome accession and species, highlighting the sequence locations of conserved residues, with color denoting amino acid polarity (Yellow: Non-polar side chain, Green: Polar side chain, Blue: Basic, Red: Acidic).
[0046] FIG. 4 shows PFAM domains identified within CLUST.009925 effector A proteins.
[0047] FIGS. 5A and 5B show a phylogenetic tree of CLUST.009925 effector B proteins.
[0048] FIGS. 6A and 6B show an alignment of CLUST.009925 effector B proteins by genome accession and species, highlighting the sequence locations of conserved residues, with color denoting amino acid polarity (Yellow: Non-polar side chain, Green: Polar side chain, Blue: Basic, Red: Acidic).
[0049] FIG. 7 shows PFAM domains identified within CLUST.009925 effector B proteins.
[0050] FIG. 8 shows a schematic of natural and engineered components for the CRISPR transposition system of CLUST.009925.
[0051] FIG. 9 shows a schematic of transposon excision in the engineered system of CLUST.009925.
[0052] FIG. 10 shows a schematic of RNA guided transposon insertion in the engineered system of CLUST.009925.
[0053] FIG. 11 shows a schematic of the CRISPR negative selection screen.
[0054] FIGS. 12 and 13 show depletion distributions for direct repeats and spacers targeting pACYC and E. coli essential genes. To quantify depletion, a fold-depletion ratio was calculated as $R_{\text{treated}}/R_{\text{input}}$ for each direct repeat and spacer. The normalized input read count, obtained without expressing the NZ_BCQZ01000006 CLUST.009925 system and RNA guide, is computed as:
$$R_{\text{input}} = \frac{\#\,\text{reads containing DR+spacer}}{\text{total}\ \#\,\text{reads}}$$
The treated read count, obtained with expression of the NZ_BCQZ01000006 CLUST.009925 system and RNA guide, is computed as:
$$R_{\text{treated}} = \frac{1 + \#\,\text{reads containing DR+spacer}}{\text{total}\ \#\,\text{reads}}$$
A strongly depleted target has a fold depletion greater than 3, which is marked by the red lines.
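The depletion calculation in [0054] can be sketched in Python from per-guide read counts. One point needs an assumption: the ratio $R_{\text{treated}}/R_{\text{input}}$ falls below 1 for a depleted guide, so "fold depletion greater than 3" is read here as its reciprocal, $R_{\text{input}}/R_{\text{treated}}$, exceeding 3. The function name and the dictionary inputs are hypothetical; both the ratio as written and the assumed reciprocal are returned.

```python
def score_depletion(input_counts: dict[str, int],
                    treated_counts: dict[str, int],
                    threshold: float = 3.0) -> dict[str, tuple[float, float, bool]]:
    """Score each DR+spacer found in the input (no-expression) library.

    Returns {guide: (ratio, fold_depletion, strongly_depleted)} where
      ratio          = R_treated / R_input, as defined in the text
      fold_depletion = R_input / R_treated  (ASSUMED reading of "fold depletion")
    """
    total_input = sum(input_counts.values())
    total_treated = sum(treated_counts.values())
    if total_input == 0 or total_treated == 0:
        raise ValueError("both libraries need at least one read")
    scores = {}
    for guide, n_input in input_counts.items():
        if n_input == 0:
            continue  # R_input = 0, so the ratio is undefined
        r_input = n_input / total_input
        # The +1 pseudocount keeps R_treated > 0 for guides absent after selection.
        r_treated = (1 + treated_counts.get(guide, 0)) / total_treated
        ratio = r_treated / r_input
        fold_depletion = r_input / r_treated
        scores[guide] = (ratio, fold_depletion, fold_depletion > threshold)
    return scores


# Toy counts: g1 drops sharply after selection, g2 does not.
print(score_depletion({"g1": 100, "g2": 100}, {"g1": 10, "g2": 190}))
```

With these toy counts, g1 has a ratio of about 0.11 (roughly 9-fold depletion) and is flagged, while g2 is not.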
[0055] FIGS. 14A and 14B show the target site mapping of depleted RNA guides targeting pACYC (A) and E. coli essential genes (B) for the NZ_BCQZ01000006 CLUST.009925 CRISPR-Cas system.
[0056] FIGS. 15A-C show a weblogo of the sequences flanking the left (A) and right (B) sides of depleted targets for the NZ_BCQZ01000006 CLUST.009925 CRISPR-Cas system.
DETAILED DESCRIPTION
[0057] The disclosure relates to the use of computational methods and algorithms to predict new CRISPR-Cas systems and identify their components.
[0058] In one embodiment, the disclosure includes new computational methods for identifying novel CRISPR loci by:
[0059] detecting all potential CRISPR arrays in prokaryotic data sources (contig, scaffold, or complete genome);
[0060] identifying all predicted protein-coding genes in close proximity to a CRISPR array (e.g., within 10 kb);
[0061] forming protein clusters (putative protein families) around identified genes, using, for example, mmseqs2;
[0062] selecting clusters of proteins of unknown function, and identifying homologs in the wider prokaryotic set of proteins using, e.g., BLAST or UBLAST;
[0063] identifying clusters of proteins with a large percentage of homologs co-occurring with CRISPR arrays; and
[0064] predicting the functional domains of the proteins in the identified cluster, e.g., by using hmmsearch on each member of the cluster individually, or by, for example, using a profile hidden Markov model (HMM) constructed from the multiple alignment.
[0065] External databases of functional domains include, for example, Pfam and Uniprot. Multiple alignment can be done using, e.g., mafft.
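The pipeline above begins from CRISPR arrays and genes already called by upstream tools, and relies on external programs such as mmseqs2 and BLAST for clustering and homology search. The minimal Python sketch below covers only two steps: the proximity test in [0060], using its 10 kb example window, and the fraction of a cluster's members that co-occur with a CRISPR array in [0063]. The `Feature` class and both function names are hypothetical. The disclosure does not say what fraction counts as "a large percentage", so no cutoff is applied.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Feature:
    """A hypothetical genomic interval: half-open [start, end) on one contig."""
    contig: str
    start: int
    end: int


def near_array(gene: Feature, arrays: list[Feature], window: int = 10_000) -> bool:
    """True if the gene lies within `window` bp of any CRISPR array on its contig."""
    for array in arrays:
        if array.contig != gene.contig:
            continue
        # Gap between the two intervals; 0 when they touch or overlap.
        gap = max(array.start - gene.end, gene.start - array.end, 0)
        if gap <= window:
            return True
    return False


def crispr_cooccurrence(members: list[Feature], arrays: list[Feature],
                        window: int = 10_000) -> float:
    """Fraction of a protein cluster's members found near a CRISPR array."""
    if not members:
        return 0.0
    return sum(near_array(g, arrays, window) for g in members) / len(members)
```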
[0066] In another aspect, the disclosure relates to defining the minimal elements of novel CRISPR systems by:
[0067] identifying conserved elements (both coding genes and non-coding) in the loci surrounding each cluster (in one aspect, this can be done manually by inspection on a case by case analysis);
[0068] identifying specific conserved non-coding elements that may be terminal repeats required for transposon activity;
[0069] identifying the RNA guide associated with each protein (identify conserved direct repeat structures and then attach a non-natural spacer sequence to the direct repeat. The effect of the non-natural spacer is to induce the effector to target a novel DNA or RNA substrate);
[0070] identifying the minimal RNA or DNA target (identify RNA and DNA targeting by testing the targeting activity of the effector(s) in combination with crRNAs containing multiple different engineered spacer sequences targeting a high diversity of DNA and RNA substrates);
[0071] identifying the minimal system necessary to achieve activity; and
[0072] testing all conserved elements together and then systematically removing different proteins while preserving activity.
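Steps [0071]-[0072], which test all conserved elements together and then remove components while preserving activity, can be sketched as a greedy leave-one-out reduction. In this sketch, `is_active` is a hypothetical callback that stands in for an experimental assay such as the pooled screen. If activity is assumed to be monotone (adding components never abolishes activity), no single component left in the result can be removed without losing activity. The component names in the example are toy labels.

```python
from typing import Callable


def minimize_system(components: set[str],
                    is_active: Callable[[frozenset[str]], bool]) -> frozenset[str]:
    """Greedy leave-one-out reduction to a smaller active component set."""
    current = frozenset(components)
    if not is_active(current):
        raise ValueError("full set of conserved elements is not active")
    for component in sorted(components):  # fixed order for reproducibility
        trial = current - {component}
        if is_active(trial):
            current = trial  # activity preserved, so drop this component
    return current


# Toy assay: activity requires exactly these three elements.
required = {"effector_A", "effector_B", "crRNA"}
print(minimize_system(required | {"orf_3", "tnp_ends"}, lambda s: required <= s))
```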
[0073] The broad natural diversity of CRISPR-Cas defense systems contains a wide range of activity mechanisms and functional elements that can be harnessed for programmable biotechnologies. In a natural system, these mechanisms and parameters enable efficient defense against foreign DNA and viruses while providing self vs. non-self discrimination to avoid self-targeting. In an engineered system, the same mechanisms and parameters also provide a diverse toolbox of molecular technologies and define the boundaries of the targeting space. For instance, Cas9 and Cas13a have canonical DNA and RNA endonuclease activity, respectively, and their targeting spaces are defined by the protospacer adjacent motif (PAM) on targeted DNA and the protospacer flanking site (PFS) on targeted RNA, respectively.
[0074] The methods described herein can be used to discover additional mechanisms and parameters within single-subunit Class 2 effector systems that can be more effectively harnessed for programmable biotechnologies.
POOLED-SCREENING
[0075] To efficiently validate the activity of the engineered novel CRISPR-Cas systems and simultaneously evaluate, in an unbiased manner, different activity mechanisms and functional parameters, we used a new pooled-screening approach in E. coli. First, from the computational identification of the conserved protein and noncoding elements of the novel CRISPR-Cas system, DNA synthesis and molecular cloning were used to assemble the separate components into a single artificial expression vector, which in one embodiment is based on a pET-28a+ backbone. In a second embodiment, the effectors and noncoding elements are transcribed on a single mRNA transcript, and different ribosomal binding sites are used to translate individual effectors.
[0076] Second, the natural crRNA and targeting spacers were replaced with a library of unprocessed crRNAs containing non-natural spacers targeting the essential genes of the host E. coli or a second plasmid bearing antibiotic resistance, pACYC184. This crRNA library was cloned into the vector backbone containing the protein effectors and noncoding elements (e.g., pET-28a+), and the library was then transformed into E. coli along with the pACYC184 plasmid target. Consequently, each resulting E. coli cell contains no more than one targeting spacer.
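A spacer library like the one in [0076] can be sketched by tiling a target sequence on both strands. The spacer length of 30 nt and the step of 1 nt are assumptions, since the text does not state them, and the function name is hypothetical. For a circular target such as pACYC184, windows are allowed to span the origin. Each spacer would then be joined to the direct repeat before cloning.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def tile_spacers(target: str, spacer_len: int = 30, step: int = 1,
                 circular: bool = False) -> list[tuple[int, str, str]]:
    """Return (position, strand, spacer) for windows tiled across both strands.

    ASSUMED defaults: 30-nt spacers at 1-nt steps (not stated in the text).
    """
    seq = target.upper()
    n = len(seq)
    if spacer_len > n:
        raise ValueError("spacer longer than target")
    # For a circular plasmid, append the start so windows can wrap the origin.
    search = seq + seq[:spacer_len - 1] if circular else seq
    last_start = n if circular else n - spacer_len + 1
    library = []
    for pos in range(0, last_start, step):
        window = search[pos:pos + spacer_len]
        library.append((pos, "+", window))
        library.append((pos, "-", reverse_complement(window)))
    return library
```

For example, `tile_spacers("ACGTAC", spacer_len=4)` yields three windows per strand on a linear target, and six per strand when `circular=True`.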
[0077] Third, the E. coli were grown under antibiotic selection. In one embodiment, triple antibiotic selection is used: kanamycin for ensuring successful transformation of the pET-28a+ vector containing the engineered CRISPR-Cas effector system, and chloramphenicol and tetracycline for ensuring successful co-transformation of the pACYC184 target vector. Since pACYC184 normally confers resistance to chloramphenicol and tetracycline, under antibiotic selection, positive activity of the novel CRISPR-Cas system targeting the plasmid will eliminate cells that actively express the effectors, noncoding elements, and specific active elements of the crRNA library. Comparing the population of surviving cells at a later time point with that at an earlier time point therefore reveals depletion of the active crRNAs relative to the inactive crRNAs.
[0078] Since the pACYC184 plasmid contains a diverse set of features and sequences that may affect the activity of a CRISPR-Cas system, mapping the active crRNAs from the pooled screen onto pACYC184 provides patterns of activity that can be suggestive of different activity mechanisms and functional parameters in a broad, hypothesis-agnostic manner. In this way, the features required for reconstituting the novel CRISPR-Cas system in a heterologous prokaryotic species can be more comprehensively tested and studied.
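Mapping active crRNAs onto pACYC184, as in [0078], amounts to locating each spacer on the circular plasmid (either strand) and recording which annotated features it overlaps. The minimal sketch below takes the plasmid sequence and a feature table as inputs. The feature names and coordinates in the example are hypothetical toy values, not the real pACYC184 map.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def map_spacer(spacer: str, plasmid: str,
               features: dict[str, tuple[int, int]]):
    """Return (position, strand, overlapping feature names) or None.

    `features` maps a name to a half-open [start, end) interval on the plasmid.
    The plasmid is treated as circular, so matches may span the origin.
    """
    seq = plasmid.upper()
    n = len(seq)
    length = len(spacer)
    wrapped = seq + seq[:length - 1]  # lets matches cross the origin
    for strand, query in (("+", spacer.upper()), ("-", reverse_complement(spacer))):
        pos = wrapped.find(query)
        if pos != -1:
            covered = {(pos + i) % n for i in range(length)}
            hits = sorted(name for name, (start, end) in features.items()
                          if any(start <= p < end for p in covered))
            return pos, strand, hits
    return None


# Toy plasmid and hypothetical feature table.
toy_features = {"geneX": (0, 8), "geneY": (8, 16)}
print(map_spacer("GGGG", "AAAAAAAACCCCCCCC", toy_features))
```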
[0079] The key advantages of the in vivo pooled-screen described herein include:
[0080] (1) Versatility--Plasmid design allows multiple effectors and/or noncoding elements to be expressed; library cloning strategy enables both transcriptional directions of the computationally predicted crRNA to be expressed;
[0081] (2) Comprehensive tests of activity mechanisms & functional parameters--Evaluates diverse interference mechanisms, including DNA or RNA cleavage and DNA excision and/or insertion via transposition; examines co-occurrence with features such as transcription and plasmid DNA replication; and uses the flanking sequences of the crRNA library to reliably determine PAMs with a complexity equivalent to 4N's;
[0082] (3) Sensitivity--By using as targets the low-copy-number plasmid pACYC184 and the single-copy E. coli genome, this screen design enables high sensitivity for CRISPR-Cas activity, since even modest interference rates can result in loss of cell viability through loss of antibiotic resistance or essential gene targeting; and
[0083] (4) Efficiency--Optimized molecular biology steps to enable greater speed and throughput, as RNA-sequencing and protein expression samples can be directly harvested from the surviving cells in the screen.
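Determining a PAM from the flanks of depleted targets, as noted in item (2) and shown as logos in FIG. 15, starts from a per-position base frequency matrix. For reference, a 4N complexity corresponds to 4^4 = 256 possible flanking 4-mers. The minimal sketch below assumes that equal-length flanks have already been extracted for the depleted guides. The 0.5 cutoff used to call a consensus base is an arbitrary assumption, and the flanks in the example are toy values.

```python
from collections import Counter

BASES = "ACGT"


def flank_frequencies(flanks: list[str]) -> list[dict[str, float]]:
    """Per-position base frequencies for equal-length flanking sequences."""
    if not flanks:
        return []
    width = len(flanks[0])
    if any(len(f) != width for f in flanks):
        raise ValueError("all flanks must have the same length")
    matrix = []
    for i in range(width):
        counts = Counter(f[i].upper() for f in flanks)
        # Counter returns 0 for absent keys; non-ACGT symbols are ignored.
        matrix.append({b: counts[b] / len(flanks) for b in BASES})
    return matrix


def consensus(matrix: list[dict[str, float]], min_freq: float = 0.5) -> str:
    """Most frequent base per position; 'N' where it falls below min_freq."""
    motif = []
    for column in matrix:
        base, freq = max(column.items(), key=lambda item: item[1])
        motif.append(base if freq >= min_freq else "N")
    return "".join(motif)


# Toy flanks from four hypothetical depleted targets.
print(consensus(flank_frequencies(["TTTA", "TTTC", "TTTG", "TTTA"])))
```

In practice, frequencies for depleted guides would be compared against the frequencies for all guides in the library, so that biases in library composition are not mistaken for a motif.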
[0084] The novel CRISPR-Cas families described herein were tested with this in vivo pooled screen to evaluate their operational elements, mechanisms, and parameters, as well as their ability to be active and reprogrammed in an engineered system outside of their natural cellular environment.
CRISPR Enzymes Associated with Mobile Genetic Elements
[0085] This disclosure provides mobile genetic elements (e.g., CRISPR-associated transposons) associated with CRISPR systems described herein. These mobile genetic elements can be genetically altered to delete and/or add one or more components (e.g., a gene encoding a therapeutic product), thereby resulting in mobile genetic elements that can be readily inserted into or removed from a target nucleic acid. In some embodiments, the CRISPR systems encoded by these mobile genetic elements include one or more effector proteins, referred to herein as "CRISPR-associated transposases," which facilitate the movement of the mobile genetic elements from a first site in a nucleic acid to a second site in a nucleic acid. As described in further detail below, in some embodiments, the activity (e.g., excision activity or integration activity) of the CRISPR-associated transposases can be directed to a particular site on a target nucleic acid using an RNA guide that is complementary to a nucleic acid sequence on the target nucleic acid. In some embodiments, the RNA guides are engineered to specifically target the insertion, excision, and/or mobilization of the mobile genetic element to any site in a target nucleic acid of interest.
[0086] In some embodiments, the disclosure provides CRISPR-associated transposons that include one or more genes encoding a CRISPR-associated transposase and an RNA guide. In some embodiments, the CRISPR-associated transposons include a payload nucleic acid. In some embodiments, the payload nucleic acid includes a gene of interest. In some embodiments, the gene of interest is operably linked to a promoter (e.g., a constitutive promoter or an inducible promoter). In some embodiments, the gene of interest encodes a therapeutic protein.
[0087] This disclosure further provides a class of CRISPR effectors, referred to herein as CRISPR-associated transposases, capable of facilitating the movement of a mobile genetic element (e.g., a CRISPR-associated transposon), wherein the targeting of the mobile genetic element is facilitated by RNA guides. Mobile genetic elements in CLUST.004377, CLUST.009467, and CLUST.009925 comprise CRISPR-associated transposons, including genes encoding CRISPR-associated transposases having an rve integrase domain (also referred to herein as rve integrase domain-containing effectors).
[0088] RNA-Guided DNA Insertion
[0089] In some embodiments, a CRISPR system described herein mediates the insertion of a nucleotide payload into a target nucleic acid sequence. This activity is facilitated by one or more CRISPR-associated transposases present in the CRISPR system. In some embodiments, the CRISPR-associated transposase comprises one or more of an rve integrase domain, a TniQ domain, a TniB domain, or a TnpB domain. In some embodiments, the site of the target nucleic acid where the nucleotide payload is to be inserted is specified by a guide RNA.
[0090] RNA-Guided DNA Excision
[0091] In some embodiments, a CRISPR system described herein mediates the excision of a nucleotide payload from a target nucleic acid sequence. This activity is facilitated by one or more CRISPR-associated transposases present in the CRISPR system. In some embodiments, the CRISPR-associated transposase comprises one or more of an rve domain, a TniQ domain, a TniB domain, or a TnpB domain. In some embodiments, the site of the target nucleic acid where the nucleotide payload is to be excised is specified by a guide RNA.
[0092] RNA-Guided DNA Mobilization
[0093] In some embodiments, a CRISPR system described herein mediates the excision of a nucleotide payload from a first site in a target nucleic acid, mobilization of the nucleotide payload, and insertion of the nucleotide payload at a second site in a target nucleic acid. This activity is facilitated by one or more CRISPR-associated transposases present in the CRISPR system. In some embodiments, the CRISPR-associated transposase comprises one or more of an rve domain, a TniQ domain, a TniB domain, or a TnpB domain. In some embodiments, the first site of the target nucleic acid where the nucleotide payload is to be excised is specified by a guide RNA. In some embodiments, the second site in the target nucleic acid where the nucleotide payload is to be inserted is specified by a guide RNA.
[0094] Transposase Activity
[0095] In some embodiments, a CRISPR effector described herein (e.g., a CRISPR-associated transposase) comprises transposase activity. DNA transposition is one of the mechanisms by which genome rearrangement and horizontal gene transfer occur in prokaryotic and eukaryotic cells. Transposons are DNA sequences that are capable of being moved within a genome. During DNA transposition, a transposase recognizes transposon-specific end sequences that flank an intervening DNA sequence, excises the transposon from one location in a nucleic acid sequence, and inserts it (including the inverted repeat sequences together with the intervening sequence) into another location. In some instances, the transposon ends comprise short inverted terminal repeats, and insertion generates a duplication of a short segment of the target sequence flanking the insertion site, the length of which is characteristic of each transposon.
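The cut-and-paste mechanism described above can be illustrated with a toy string model. This is a minimal sketch under simplifying assumptions, not a model of any particular transposase: DNA is an uppercase A/C/G/T string, the right transposon end is assumed to be the exact reverse complement of a supplied left end, and the target-site duplication length `tsd_len` is a hypothetical parameter. All function names are illustrative only.

```python
# Toy string model of cut-and-paste transposition (illustrative assumptions only).

_DNA_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Return the reverse complement of an uppercase A/C/G/T DNA string."""
    return seq.translate(_DNA_COMPLEMENT)[::-1]


def find_transposon(donor: str, left_end: str) -> tuple[int, int]:
    """Locate an element bounded by left_end and its inverted repeat.

    Assumes the right end is exactly revcomp(left_end). Returns (start, stop)
    such that donor[start:stop] spans the element including both ends.
    """
    right_end = revcomp(left_end)
    start = donor.find(left_end)
    if start < 0:
        raise ValueError("left transposon end not found")
    right = donor.find(right_end, start + len(left_end))
    if right < 0:
        raise ValueError("right transposon end not found")
    return start, right + len(right_end)


def cut_and_paste(donor: str, target: str, left_end: str,
                  site: int, tsd_len: int = 0) -> tuple[str, str]:
    """Excise the element from donor and insert it into target at `site`.

    If tsd_len > 0, the tsd_len target bases starting at `site` appear on
    both sides of the inserted element (a modeled target-site duplication).
    Returns (donor_after_excision, target_after_insertion).
    """
    if not 0 <= site <= len(target) - tsd_len:
        raise ValueError("insertion site out of range")
    start, stop = find_transposon(donor, left_end)
    element = donor[start:stop]
    donor_after = donor[:start] + donor[stop:]
    target_after = target[:site + tsd_len] + element + target[site:]
    return donor_after, target_after


if __name__ == "__main__":
    element = "CTGTA" + "AAAA" + revcomp("CTGTA")
    print(cut_and_paste("GGG" + element + "CCC", "TTTTGGGG", "CTGTA", site=4, tsd_len=2))
```

The donor loses the whole element, including both terminal repeats, and the target gains it; with `tsd_len=0` the model reduces to a plain insertion.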
[0096] The mechanisms involved in transposition mediated by several transposon systems, including Tc1, Tol2, Minos, Himar1, Hsmar1, Mos1, Frog Prince, piggyBac, and Sleeping Beauty, have been characterized (see, e.g., Chitilian et al. (2014) Stem Cells 32: 204-15; Ivics et al. (1997) Cell 91: 501-10; Miskey et al. (2003) Nucleic Acids Res. 31: 6873-81; Urschitz et al. (2013) Mob. Genet. Elements 3: e25167; Jursch et al. (2013) Mob. DNA 4: 15; Pflieger et al. (2014) J. Biol. Chem. 289: 100-11; and Hou et al. (2015) Cancer Biol. Ther. 16(1): 8-16). For example, the Sleeping Beauty transposase (SBase) functions through a "cut-and-paste" process (see, e.g., Yant et al. (2004) Mol. Cell. Biol. 24: 9239-47). In some embodiments, a CRISPR-associated transposase described herein comprises a transposase domain or a transposase-like domain. For example, in some embodiments, the CRISPR-associated transposase comprises a Mu-transposase domain or module. In some embodiments, the CRISPR-associated transposase comprises a TniQ transposase domain. In some embodiments, the CRISPR-associated transposase comprises a TniB transposase domain. In some embodiments, the transposase domain is an OrfB_IS605 domain.
[0097] In some embodiments, the transposase domain of a CRISPR-associated transposase described herein is capable of one or more of the following activities: (a) excising a nucleic acid fragment from its site in a genome (i.e., excision activity); (b) mediating the integration of a nucleic acid fragment into a site in a genome (i.e., integration activity); and/or (c) specifically recognizing a transposon element (e.g., a specific inverted repeat sequence). In some embodiments, a transposase domain of a CRISPR-associated transposase described herein is modified to eliminate one or more activities. In some embodiments, a transposase domain of a CRISPR-associated transposase described herein may be mutated such that the transposase domain comprises excision activity but does not comprise integration activity. In some embodiments, a transposase domain of a CRISPR-associated transposase described herein is mutated such that the transposase domain comprises integration activity but does not comprise excision activity. In some embodiments, a transposase domain of a CRISPR-associated transposase described herein is mutated such that the transposase domain comprises integration activity and excision activity but lacks the ability to recognize a specific nucleic acid sequence. In some embodiments, a transposase domain of a CRISPR-associated transposase described herein is mutated such that the transposase domain comprises integration activity but lacks excision activity and the ability to recognize a specific nucleic acid sequence. In some embodiments, a transposase domain of a CRISPR-associated transposase described herein is mutated such that the transposase domain comprises excision activity but lacks integration activity and the ability to recognize a specific nucleic acid sequence.
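The combinations of activities enumerated above can be tracked with a small bit-flag data structure. This is a hypothetical bookkeeping sketch: the flag names and the variant labels (e.g., "Exc+/Int-/Rec-") are illustrative and are not defined in the disclosure. `END_RECOGNITION` stands for activity (c), specific recognition of a transposon element, and only the variants whose three activities are fully specified in the paragraph are listed.

```python
from enum import Flag, auto


class TransposaseActivity(Flag):
    """Activities (a)-(c) of a transposase domain, as combinable flags."""
    EXCISION = auto()         # (a) excision activity
    INTEGRATION = auto()      # (b) integration activity
    END_RECOGNITION = auto()  # (c) specific recognition of a transposon element


A = TransposaseActivity
ALL_ACTIVITIES = (A.EXCISION, A.INTEGRATION, A.END_RECOGNITION)
WILD_TYPE = A.EXCISION | A.INTEGRATION | A.END_RECOGNITION

# Fully specified variant profiles from the paragraph above (labels are hypothetical).
VARIANTS = {
    "Exc+/Int+/Rec-": A.EXCISION | A.INTEGRATION,
    "Exc-/Int+/Rec-": A.INTEGRATION,
    "Exc+/Int-/Rec-": A.EXCISION,
}


def lost_activities(profile: TransposaseActivity) -> list[str]:
    """Names of wild-type activities that a variant profile lacks."""
    return [a.name for a in ALL_ACTIVITIES if a not in profile]


if __name__ == "__main__":
    for label, profile in VARIANTS.items():
        print(label, "lacks", lost_activities(profile))
```

Representing each variant as a set of retained activities makes it straightforward to check which wild-type functions a given mutant profile has given up.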
[0098] In some embodiments, the CRISPR-associated transposases described herein can be used to facilitate the random insertion of a nucleic acid sequence into the genome of a cell. In some embodiments, the CRISPR-associated transposases described herein may be used to facilitate the targeted insertion or excision of a nucleic acid sequence into or out of the genome of a cell. In some embodiments, the nucleic acid sequence that is inserted into the genome of the cell is flanked by inverted repeat sequences that are specifically recognized by a transposase domain of the CRISPR-associated transposase. In some embodiments, the intervening nucleic acid sequence between these inverted repeat sequences is at least 10 bp, at least 20 bp, at least 30 bp, at least 40 bp, at least 50 bp, at least 60 bp, at least 70 bp, at least 80 bp, at least 90 bp, at least 100 bp, at least 150 bp, at least 200 bp, at least 250 bp, at least 300 bp, at least 400 bp, at least 500 bp, at least 600 bp, at least 700 bp, at least 800 bp, at least 900 bp, at least 1,000 bp, at least 1,500 bp, at least 2,000 bp, at least 3,000 bp, at least 4,000 bp, at least 5,000 bp, at least 6,000 bp, at least 7,000 bp, at least 8,000 bp, at least 9,000 bp, or at least 10,000 bp in length. In some embodiments, the intervening nucleic acid sequence between these inverted repeat sequences is less than 10 bp, less than 20 bp, less than 30 bp, less than 40 bp, less than 50 bp, less than 60 bp, less than 70 bp, less than 80 bp, less than 90 bp, less than 100 bp, less than 150 bp, less than 200 bp, less than 250 bp, less than 300 bp, less than 400 bp, less than 500 bp, less than 600 bp, less than 700 bp, less than 800 bp, less than 900 bp, less than 1,000 bp, less than 1,500 bp, less than 2,000 bp, less than 3,000 bp, less than 4,000 bp, less than 5,000 bp, or less than 10,000 bp in length.
[0099] Effectors Comprising Helix-Turn-Helix Domains
[0100] In some embodiments, effectors within the systems described herein may contain a helix-turn-helix (HTH) domain. HTH domains typically include or consist of two α-helices that form an internal angle of approximately 120 degrees and are connected by a short strand or turn of amino acid residues. HTH domains are present in a multitude of prokaryotic and eukaryotic DNA-binding proteins, such as transcription factors (see, e.g., Aravind et al. (2005) FEMS Microbiol. Rev. 29(2): 231-62). Beyond transcriptional regulation, HTH domains are involved in a wide range of functions, including DNA repair and replication, RNA metabolism, and protein-protein interactions. In some embodiments, an effector described herein comprises DNA-binding activity. In some embodiments, an effector described herein comprises RNA-binding activity. In some embodiments, an effector described herein comprises an HTH domain that mediates a protein-protein interaction.
CRISPR ENZYME MODIFICATIONS
[0101] Modulating Insertion/Excision/Transposition Activity of CRISPR Enzymes
[0102] The activity of CRISPR-associated transposases may be altered to change the relative efficiencies of insertion and excision. Altering these activities enables greater control over the different modes of transposase-directed genome editing. For instance, CRISPR-associated transposases with predominantly insertion activity can be used for locus-specific insertion of transgenes to enable applications including, but not limited to, therapeutics (e.g., fetal hemoglobin expression for treatment of thalassemia), genetic engineering (e.g., trait stacking in plant genome engineering), or engineered cells (e.g., introducing control circuits or custom chimeric antigen receptors for CAR-T cell engineering). Alternatively, CRISPR-associated transposases with predominantly excision activity can be used in applications such as restoring the genome of engineered cells by excising inserted transcription factors. These CRISPR-associated transposases can be modified to have diminished excision or insertion activity, e.g., excision or insertion inactivation of at least 40%, at least 50%, at least 60%, at least 70%, at least 80%, at least 90%, at least 95%, at least 97%, or 100% as compared with the wild-type CRISPR-associated transposases. Modulation of the insertion or excision activity can be achieved by several methods known in the art, such as introducing mutations into the catalytic core of the transposase. In some embodiments, catalytic residues for the transposase activities are identified, and these amino acid residues are substituted with different amino acid residues (e.g., glycine or alanine) to diminish the transposase activity. One example in which modifications to the catalytic core of a transposase yielded differential effects on excision versus insertion is the generation of an excision-competent/integration-defective variant of the piggyBac transposase (Exc+/Int- PB) (Li, Xianghong et al. "piggyBac transposase tools for genome engineering," Proc. Nat'l. Acad. Sci., 110.25 (2013): E2279-E2287). In other embodiments, the variants that modulate excision/insertion activity can be combined with other CRISPR-associated transposase modifications, such as mutations that relax or tighten the targeting space or PAM constraints. Furthermore, these mutations are not restricted to the protein effectors of the transposon but may also be made in noncoding elements such as the transposon ends. Together, these may yield a set of mutations that enables tuning of the enzymatic activities of the CRISPR-associated transposase.
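The percent-inactivation thresholds listed above amount to a simple comparison against wild type. A minimal sketch, assuming that variant and wild-type activities are measured in the same units under matched conditions (for example, insertion events per cell; the assay itself is not specified here). The example values 0.20 and 0.03 are made up for illustration, and the function names are hypothetical.

```python
# Percent inactivation of a transposase activity relative to wild type (sketch).

THRESHOLDS_PCT = (40, 50, 60, 70, 80, 90, 95, 97, 100)  # thresholds listed above
_EPS = 1e-9  # tolerance so that, e.g., 96.9999999999 still counts as 97


def percent_inactivation(variant_activity: float, wild_type_activity: float) -> float:
    """Percent reduction of variant activity relative to wild-type activity."""
    if wild_type_activity <= 0:
        raise ValueError("wild-type activity must be positive")
    return 100.0 * (1.0 - variant_activity / wild_type_activity)


def highest_threshold_met(inactivation_pct: float):
    """Largest listed threshold reached, or None if below 40%."""
    met = [t for t in THRESHOLDS_PCT if inactivation_pct + _EPS >= t]
    return max(met) if met else None


if __name__ == "__main__":
    pct = percent_inactivation(0.03, 0.20)  # hypothetical insertion frequencies
    print(round(pct, 1), highest_threshold_met(pct))
```

A variant with 0.03 insertion events per cell against a wild-type 0.20 is 85% inactivated and therefore meets the "at least 80%" threshold but not "at least 90%".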
[0103] Generation of Fusion Proteins
[0104] Additionally, CRISPR-associated transposases, whether in their native functional form or with mutations that modulate their activity, can provide a foundation from which fusion proteins with additional functional proteins can be created. The inactivated CRISPR enzymes can comprise or be associated with one or more functional domains (e.g., via fusion protein, linker peptides, "GS" linkers, etc.). These functional domains can have various activities, e.g., methylase activity, demethylase activity, transcription activation activity, transcription repression activity, transcription release factor activity, histone modification activity, RNA cleavage activity, DNA cleavage activity, or nucleic acid binding activity. In some embodiments, the functional domains are selected from Kruppel-associated box (KRAB), VP64, VP16, Fok1, P65, HSF1, MyoD1, and biotin-APEX.
[0105] The positioning of the one or more functional domains on the CRISPR transposase is one that allows for correct spatial orientation for the functional domain to affect the target with the attributed functional effect. For example, if the functional domain is a transcription activator (e.g., VP16, VP64, or p65), the transcription activator is placed in a spatial orientation that allows it to affect the transcription of the target. Likewise, a transcription repressor is positioned to affect the transcription of the target, and a nuclease (e.g., Fok1) is positioned to cleave or partially cleave the target. In some embodiments, the functional domain is positioned at the N-terminus of the CRISPR enzyme. In some embodiments, the functional domain is positioned at the C-terminus of the CRISPR enzyme. In some embodiments, the CRISPR enzyme is modified to comprise a first functional domain at the N-terminus and a second functional domain at the C-terminus.
[0106] The addition of functional domains to the CRISPR-associated transposase or to other effector proteins in the complex may enable the transposase system to modify the DNA itself (e.g., by methylation) or its regulation (e.g., transcriptional activation or repression) in situ.
[0107] Split Enzymes
[0108] The present disclosure also provides split versions of the CRISPR enzymes described herein, which may be advantageous for delivery. In some embodiments, the CRISPR enzymes are split into two parts, which together substantially comprise a functioning CRISPR enzyme.
[0109] The split can be done in a way that the catalytic domain(s) are unaffected. The CRISPR enzymes may function as a nuclease or may be inactivated enzymes, which are essentially RNA-binding proteins with very little or no catalytic activity (e.g., due to mutation(s) in its catalytic domains).
[0110] In some embodiments, the nuclease lobe and α-helical lobe are expressed as separate polypeptides. Although the lobes do not interact on their own, the guide RNA recruits them into a ternary complex that recapitulates the activity of full-length CRISPR enzymes and catalyzes site-specific DNA cleavage. The use of a modified guide RNA abrogates split-enzyme activity by preventing dimerization, allowing for the development of an inducible dimerization system. The split enzyme is described, e.g., in Wright, Addison V., et al. "Rational design of a split-Cas9 enzyme complex," Proc. Nat'l. Acad. Sci., 112.10 (2015): 2984-2989, which is incorporated herein by reference in its entirety.
[0111] In some embodiments, the split enzyme can be fused to a dimerization partner, e.g., by employing rapamycin sensitive dimerization domains. This allows the generation of a chemically inducible CRISPR enzyme for temporal control of CRISPR enzyme activity. The CRISPR enzymes can thus be rendered chemically inducible by being split into two fragments and rapamycin-sensitive dimerization domains can be used for controlled reassembly of the CRISPR enzymes.
[0112] The split point is typically designed in silico and cloned into the constructs. During this process, mutations can be introduced into the split enzyme and non-functional domains can be removed. In some embodiments, the two parts or fragments of the split CRISPR enzyme (i.e., the N-terminal and C-terminal fragments) can form a full CRISPR enzyme comprising, e.g., at least 70%, at least 80%, at least 90%, at least 95%, or at least 99% of the sequence of the wild-type CRISPR enzyme.
[0113] Self-Activating or Inactivating Enzymes
[0114] The CRISPR enzymes described herein can be designed to be self-activating or self-inactivating. In some embodiments, the CRISPR enzymes are self-inactivating. For example, the target sequence can be introduced into the CRISPR enzyme coding constructs. Thus, the CRISPR enzymes can modify the target sequence as well as the construct encoding the enzyme, thereby self-inactivating their expression. Methods of constructing a self-inactivating CRISPR system are described, e.g., in Epstein, Benjamin E., and David V. Schaffer. "Engineering a Self-Inactivating CRISPR System for AAV Vectors," Mol. Ther., 24 (2016): S50, which is incorporated herein by reference in its entirety.
[0115] In some other embodiments, an additional guide RNA, expressed under the control of a weak promoter (e.g., 7SK promoter), can target the nucleic acid sequence encoding the CRISPR enzyme to prevent and/or block its expression (e.g., by preventing the transcription and/or translation of the nucleic acid). The transfection of cells with vectors expressing the CRISPR enzyme, guide RNAs, and guide RNAs that target the nucleic acid encoding the CRISPR enzyme can lead to efficient disruption of the nucleic acid encoding the CRISPR enzyme and decrease the levels of CRISPR enzyme, thereby limiting the genome editing activity.
[0116] In some embodiments, the genome editing activity of the CRISPR enzymes can be modulated through endogenous RNA signatures (e.g., miRNA) in mammalian cells. The CRISPR enzyme switch can be made by using a miRNA-complementary sequence in the 5'-UTR of mRNA encoding the CRISPR enzyme. The switches selectively and efficiently respond to miRNA in the target cells. Thus, the switches can differentially control the genome editing by sensing endogenous miRNA activities within a heterogeneous cell population. Therefore, the switch systems can provide a framework for cell-type selective genome editing and cell engineering based on intracellular miRNA information (Hirosawa, Moe et al. "Cell-type-specific genome editing with a microRNA-responsive CRISPR-Cas9 switch," Nucl. Acids Res., 2017 Jul. 27; 45(13): e118).
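The miRNA-complementary element of such a switch can be derived directly from the mature miRNA sequence. The following is a minimal sketch under stated assumptions: it builds a single, perfectly complementary target site (the reverse complement of the miRNA, in the RNA alphabet) and places it between two UTR segments. The miRNA and UTR sequences are placeholders, the function names are hypothetical, and design details of the cited switch (number and spacing of sites, UTR context) are not reproduced here.

```python
# Building a perfectly complementary miRNA target site for a 5'-UTR (sketch).

_RNA_COMPLEMENT = str.maketrans("ACGU", "UGCA")


def mirna_target_site(mirna: str) -> str:
    """Reverse complement of a mature miRNA, written 5'->3' in RNA letters."""
    rna = mirna.upper().replace("T", "U")
    if set(rna) - set("ACGU"):
        raise ValueError("unexpected character in miRNA sequence")
    return rna.translate(_RNA_COMPLEMENT)[::-1]


def switch_5utr(upstream: str, mirna: str, downstream: str = "") -> str:
    """5'-UTR with one miRNA target site placed between two UTR segments."""
    return upstream + mirna_target_site(mirna) + downstream


if __name__ == "__main__":
    placeholder_mirna = "ACGUUAGC"  # hypothetical sequence, not a real miRNA
    print(switch_5utr("GGGAAA", placeholder_mirna, "GCCACC"))
```

Because the site is the exact reverse complement, applying `mirna_target_site` twice returns the original miRNA, which is a quick sanity check that the site can base-pair with the miRNA along its full length.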
[0117] Inducible CRISPR Enzymes
[0118] The CRISPR enzymes can be inducible, e.g., light inducible or chemically inducible. This mechanism allows for activation of the functional domain in the CRISPR enzymes. Light inducibility can be achieved by various methods known in the art, e.g., by designing a fusion complex wherein CRY2PHR/CIBN pairing is used in split CRISPR enzymes (see, e.g., Konermann et al. "Optical control of mammalian endogenous transcription and epigenetic states," Nature, 500.7463 (2013): 472). Chemical inducibility can be achieved, e.g., by designing a fusion complex wherein FKBP/FRB (FK506 binding protein/FKBP rapamycin binding domain) pairing is used in split CRISPR enzymes. Rapamycin is required for forming the fusion complex, thereby activating the CRISPR enzymes (see, e.g., Zetsche, Volz, and Zhang, "A split-Cas9 architecture for inducible genome editing and transcription modulation," Nature Biotech., 33.2 (2015): 139-142).
[0119] Furthermore, expression of the CRISPR enzymes can be modulated by inducible promoters, e.g., tetracycline or doxycycline controlled transcriptional activation (Tet-On and Tet-Off expression system), hormone inducible gene expression system (e.g., an ecdysone inducible gene expression system), and an arabinose-inducible gene expression system. When delivered as RNA, expression of the RNA targeting effector protein can be modulated via a riboswitch, which can sense a small molecule like tetracycline (see, e.g., Goldfless, Stephen J. et al. "Direct and specific chemical control of eukaryotic translation with a synthetic RNA-protein interaction," Nucl. Acids Res., 40.9 (2012): e64-e64).
[0120] Various embodiments of inducible CRISPR enzymes and inducible CRISPR systems are described, e.g., in U.S. Pat. No. 8,871,445, US20160208243, and WO2016205764, each of which is incorporated herein by reference in its entirety.
[0121] Functional Mutations
[0122] Various mutations or modifications can be introduced into CRISPR enzymes as described herein to improve specificity and/or robustness. In some embodiments, the amino acid residues that recognize the Protospacer Adjacent Motif (PAM) are identified. The CRISPR enzymes described herein can be modified further to recognize different PAMs, e.g., by substituting the amino acid residues that recognize PAM with other amino acid residues.
[0123] In some embodiments, at least one Nuclear Localization Signal (NLS) is attached to the nucleic acid sequences encoding the CRISPR enzyme. In some embodiments, at least one Nuclear Export Signal (NES) is attached to the nucleic acid sequences encoding the CRISPR enzyme. In a preferred embodiment, a C-terminal and/or N-terminal NLS or NES is attached for optimal expression and nuclear targeting in eukaryotic cells, e.g., human cells.
[0124] In some embodiments, the CRISPR enzymes described herein are mutated at one or more amino acid residues to alter one or more functional activities. For example, in some embodiments, the CRISPR enzyme is mutated at one or more amino acid residues to alter its helicase activity. In some embodiments, the CRISPR enzyme is mutated at one or more amino acid residues to alter its nuclease activity (e.g., endonuclease activity or exonuclease activity). In some embodiments, the CRISPR enzyme is mutated at one or more amino acid residues to alter its ability to functionally associate with a guide RNA. In some embodiments, the CRISPR enzyme is mutated at one or more amino acid residues to alter its ability to functionally associate with a target nucleic acid.
[0125] In some embodiments, the CRISPR enzymes described herein are capable of modifying a target nucleic acid molecule. In some embodiments, the CRISPR enzyme modifies both strands of the target nucleic acid molecule. However, in some embodiments, the CRISPR enzyme is mutated at one or more amino acid residues to alter its nucleic acid manipulation activity. For example, in some embodiments, the CRISPR enzyme may comprise one or more mutations which render the enzyme incapable of cleaving a target nucleic acid or inserting/excising a target sequence.
[0126] In some embodiments, a CRISPR enzyme described herein may be engineered to comprise a deletion in one or more amino acid residues to reduce the size of the enzyme while retaining one or more desired functional activities (e.g., nuclease activity and the ability to functionally interact with a guide RNA). The truncated CRISPR enzyme may be advantageously used in combination with delivery systems having load limitations.
[0127] In one aspect, the present disclosure provides nucleotide sequences that are at least 10%, 15%, 20%, 25%, 30%, 35%, 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identical to the sequences described herein. In another aspect, the present disclosure also provides amino acid sequences that are at least 10%, 15%, 20%, 25%, 30%, 35%, 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identical to the sequences described herein.
[0128] In some embodiments, the nucleic acid sequences have at least a portion (e.g., at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 20, 30, 40, 50, 60, 70, 80, 90, or 100 nucleotides, e.g., contiguous or non-contiguous nucleotides) that is the same as the sequences described herein. In some embodiments, the nucleic acid sequences have at least a portion (e.g., at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 20, 30, 40, 50, 60, 70, 80, 90, or 100 nucleotides, e.g., contiguous or non-contiguous nucleotides) that is different from the sequences described herein.
[0129] In some embodiments, the amino acid sequences have at least a portion (e.g., at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 20, 30, 40, 50, 60, 70, 80, 90, or 100 amino acid residues, e.g., contiguous or non-contiguous amino acid residues) that is the same as the sequences described herein. In some embodiments, the amino acid sequences have at least a portion (e.g., at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 20, 30, 40, 50, 60, 70, 80, 90, or 100 amino acid residues, e.g., contiguous or non-contiguous amino acid residues) that is different from the sequences described herein.
[0130] To determine the percent identity of two amino acid sequences, or of two nucleic acid sequences, the sequences are aligned for optimal comparison purposes (e.g., gaps can be introduced in one or both of a first and a second amino acid or nucleic acid sequence for optimal alignment, and non-homologous sequences can be disregarded for comparison purposes). In general, the length of a reference sequence aligned for comparison purposes should be at least 80% of the length of the reference sequence, and in some embodiments is at least 90%, 95%, or 100% of the length of the reference sequence. The amino acid residues or nucleotides at corresponding amino acid positions or nucleotide positions are then compared. When a position in the first sequence is occupied by the same amino acid residue or nucleotide as the corresponding position in the second sequence, then the molecules are identical at that position. The percent identity between the two sequences is a function of the number of identical positions shared by the sequences, taking into account the number of gaps, and the length of each gap, which need to be introduced for optimal alignment of the two sequences. For purposes of the present disclosure, the comparison of sequences and determination of percent identity between two sequences can be accomplished using a BLOSUM62 scoring matrix with a gap penalty of 12, a gap extend penalty of 4, and a frameshift gap penalty of 5.
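The final identity calculation can be sketched as follows, assuming the optimal alignment has already been produced (for example, by an alignment program configured with the BLOSUM62 matrix and gap penalties given above); the alignment step itself is not implemented here. Dividing identical positions by the number of alignment columns, with gap columns counted, is one common convention and is an assumption of this sketch; the function name is hypothetical.

```python
# Percent identity from a pre-computed pairwise alignment (sketch).


def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Identical positions / alignment columns * 100.

    Assumes both strings come from the same alignment (equal length, gaps
    shown as `gap`). Columns where both sequences have a gap are ignored;
    columns with a gap in only one sequence count against identity.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(x, y) for x, y in zip(aligned_a.upper(), aligned_b.upper())
               if not (x == gap and y == gap)]
    if not columns:
        raise ValueError("alignment has no informative columns")
    identical = sum(1 for x, y in columns if x == y and x != gap)
    return 100.0 * identical / len(columns)


if __name__ == "__main__":
    print(round(percent_identity("ACGT-A", "ACGTTA"), 2))  # one gap column
    print(round(percent_identity("MKV", "MRV"), 2))        # one mismatch
```

Counting gap columns in the denominator means an introduced gap lowers identity, in line with the statement above that the number and length of gaps are taken into account; other tools may instead divide by the length of the shorter sequence.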
GUIDE RNA MODIFICATIONS
[0131] Spacer Lengths
[0132] The spacer length of guide RNAs can range from about 15 to 50 nucleotides. In some embodiments, the spacer length of a guide RNA is at least 16 nucleotides, at least 17 nucleotides, at least 18 nucleotides, at least 19 nucleotides, at least 20 nucleotides, at least 21 nucleotides, or at least 22 nucleotides. In some embodiments, the spacer length is from 15 to 17 nucleotides, from 17 to 20 nucleotides, from 20 to 24 nucleotides (e.g., 20, 21, 22, 23, or 24 nucleotides), from 23 to 25 nucleotides (e.g., 23, 24, or 25 nucleotides), from 24 to 27 nucleotides, from 27 to 30 nucleotides, from 30 to 45 nucleotides (e.g., 30, 31, 32, 33, 34, 35, 40, or 45 nucleotides), from 30 or 35 to 40 nucleotides, from 41 to 45 nucleotides, from 45 to 50 nucleotides, or longer. In some embodiments, the direct repeat length of the guide RNA is at least 16 nucleotides, or is from 16 to 20 nucleotides (e.g., 16, 17, 18, 19, or 20 nucleotides). In some embodiments, the direct repeat length of the guide RNA is 19 nucleotides.
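To make the link between spacer length and target recognition concrete, the following sketch checks that a spacer falls within the 15 to 50 nucleotide range given above and then finds perfectly matching protospacers on either strand of a double-stranded DNA target. It assumes exact complementarity and ignores any adjacent-motif (e.g., PAM) or other sequence requirements of a particular effector; the function name and its `(strand, start)` return format are hypothetical.

```python
# Spacer length check and exact protospacer search on both DNA strands (sketch).

_DNA_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase A/C/G/T DNA string."""
    return seq.translate(_DNA_COMPLEMENT)[::-1]


def find_protospacers(spacer: str, target_dna: str,
                      min_len: int = 15, max_len: int = 50) -> list[tuple[str, int]]:
    """Return (strand, start) for each exact protospacer match.

    The protospacer has the spacer's sequence (T in place of U); the spacer
    hybridizes to the complementary strand. `start` is a 0-based coordinate
    on the top strand of target_dna for both "+" and "-" hits.
    """
    spacer_dna = spacer.upper().replace("U", "T")
    if not min_len <= len(spacer_dna) <= max_len:
        raise ValueError(f"spacer length {len(spacer_dna)} outside {min_len}-{max_len} nt")
    top = target_dna.upper()
    bottom = revcomp(top)  # bottom strand read 5'->3'
    n, k = len(top), len(spacer_dna)
    hits = []
    for strand, seq in (("+", top), ("-", bottom)):
        pos = seq.find(spacer_dna)
        while pos != -1:
            start = pos if strand == "+" else n - (pos + k)
            hits.append((strand, start))
            pos = seq.find(spacer_dna, pos + 1)
    return hits
```

Reporting every hit in top-strand coordinates keeps "+" and "-" matches directly comparable, which is convenient when screening a candidate spacer for additional sites in a larger target sequence.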
[0133] The guide RNA sequences can be modified in a manner that allows for formation of the CRISPR complex and successful binding to the target while not allowing for successful effector activity (i.e., without excision, insertion, or nuclease activity). These modified guide sequences are referred to as "dead guides" or "dead guide sequences." These dead guides or dead guide sequences may be catalytically inactive or conformationally inactive with regard to nuclease activity. Dead guide sequences are typically shorter than respective guide sequences that result in active modification. In some embodiments, dead guides are 5%, 10%, 20%, 30%, 40%, or 50% shorter than respective guide RNAs that have nuclease activity. Dead guide sequences of guide RNAs can be from 13 to 15 nucleotides in length (e.g., 13, 14, or 15 nucleotides in length), from 15 to 19 nucleotides in length, or from 17 to 18 nucleotides in length (e.g., 17 nucleotides in length).
[0134] Thus, in one aspect, the disclosure provides non-naturally occurring or engineered CRISPR systems including a functional CRISPR enzyme as described herein, and a guide RNA (gRNA) wherein the gRNA comprises a dead guide sequence whereby the gRNA is capable of hybridizing to a target sequence such that the CRISPR system is directed to a genomic locus of interest in a cell without detectable nucleic acid modification activity.
[0135] A detailed description of dead guides is described, e.g., in WO 2016094872, which is incorporated herein by reference in its entirety.
[0136] Inducible Guides
[0137] Guide RNAs can be generated as components of inducible systems. The inducible nature of the systems allows for spatiotemporal control of gene editing or gene expression. In some embodiments, the stimuli for the inducible systems include, e.g., electromagnetic radiation, sound energy, chemical energy, and/or thermal energy.
[0138] In some embodiments, the transcription of guide RNA can be modulated by inducible promoters, e.g., tetracycline or doxycycline controlled transcriptional activation (Tet-On and Tet-Off expression systems), hormone inducible gene expression systems (e.g., ecdysone inducible gene expression systems), and arabinose-inducible gene expression systems. Other examples of inducible systems include, e.g., small molecule two-hybrid transcription activation systems (FKBP, ABA, etc.), light inducible systems (phytochrome, LOV domains, or cryptochrome), or Light Inducible Transcriptional Effector (LITE). These inducible systems are described, e.g., in WO 2016205764 and U.S. Pat. No. 8,795,965, both of which are incorporated herein by reference in their entirety.
[0139] Chemical Modifications
[0140] Chemical modifications can be applied to the guide RNA's phosphate backbone, sugar, and/or base. Backbone modifications such as phosphorothioates modify the charge on the phosphate backbone and aid in the delivery and nuclease resistance of the oligonucleotide (see, e.g., Eckstein, "Phosphorothioates, essential components of therapeutic oligonucleotides," Nucl. Acid Ther., 24 (2014), pp. 374-387); modifications of sugars, such as 2'-O-methyl (2'-OMe), 2'-F, and locked nucleic acid (LNA), enhance both base pairing and nuclease resistance (see, e.g., Allerson et al. "Fully 2'-modified oligonucleotide duplexes with improved in vitro potency and stability compared to unmodified small interfering RNA," J. Med. Chem., 48.4 (2005): 901-904). Chemically modified bases such as 2-thiouridine or N6-methyladenosine, among others, can allow for either stronger or weaker base pairing (see, e.g., Bramsen et al., "Development of therapeutic-grade small interfering RNAs by chemical engineering," Front. Genet., 2012 Aug. 20; 3:154). Additionally, RNA is amenable to both 5' and 3' end conjugations with a variety of functional moieties including fluorescent dyes, polyethylene glycol, or proteins.
[0141] A wide variety of modifications can be applied to chemically synthesized guide RNA molecules, but a modification chosen for one property can alter others. For example, modifying an oligonucleotide with 2'-OMe to improve nuclease resistance can change the binding energy of Watson-Crick base pairing. Furthermore, a 2'-OMe modification can affect how the oligonucleotide interacts with transfection reagents, proteins, or any other molecules in the cell. The effects of these modifications can be determined by empirical testing.
[0142] In some embodiments, the guide RNA includes one or more phosphorothioate modifications. In some embodiments, the guide RNA includes one or more locked nucleic acids for the purpose of enhancing base pairing and/or increasing nuclease resistance.
[0143] A summary of these chemical modifications can be found, e.g., in Kelley et al., "Versatility of chemically synthesized guide RNAs for CRISPR-Cas9 genome editing," J. Biotechnol. 2016 Sep 10; 233:74-83; WO 2016205764; and U.S. Pat. No. 8,795,965 B2; each of which is incorporated by reference in its entirety.
[0144] Sequence Modifications
[0145] The sequences and the lengths of the guide RNAs, tracrRNAs, and crRNAs described herein can be optimized. In some embodiments, the optimized length of guide RNA can be determined by identifying the processed form of tracrRNA and/or crRNA, or by empirical length studies for guide RNAs, tracrRNAs, crRNAs, and the tracrRNA tetraloops.
[0146] The guide RNAs can also include one or more aptamer sequences. Aptamers are oligonucleotide or peptide molecules that can bind to a specific target molecule. The aptamers can be specific to gene effectors, gene activators, or gene repressors. In some embodiments, the aptamers can be specific to a protein, which in turn is specific to and recruits or binds specific gene effectors, gene activators, or gene repressors. The effectors, activators, or repressors can be present in the form of fusion proteins. In some embodiments, the guide RNA has two or more aptamer sequences that are specific to the same adaptor proteins. In some embodiments, the two or more aptamer sequences are specific to different adaptor proteins. The adaptor proteins can include, e.g., MS2, PP7, Qβ, F2, GA, fr, JP501, M12, R17, BZ13, JP34, JP500, KU1, M11, MX1, TW18, VK, SP, FI, ID2, NL95, TW19, AP205, ΦCb5, ΦCb8r, ΦCb12r, ΦCb23r, 7s, and PRR1. Accordingly, in some embodiments, the aptamer is selected from sequences that specifically bind any one of the adaptor proteins described herein. In some embodiments, the aptamer sequence is an MS2 loop. A detailed description of aptamers can be found, e.g., in Nowak et al., "Guide RNA engineering for versatile Cas9 functionality," Nucl. Acid. Res., 2016 Nov. 16;44(20):9555-9564; and WO 2016205764, which are incorporated herein by reference in their entirety.
[0147] Guide: Target Sequence Matching Requirements
[0148] In classic CRISPR systems, the degree of complementarity between a guide sequence and its corresponding target sequence can be about 50%, 60%, 75%, 80%, 85%, 90%, 95%, 97.5%, 99%, or 100%. In some embodiments, the degree of complementarity is 100%. The guide RNAs can be about 5, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 35, 40, 45, 50, 75, or more nucleotides in length.
[0149] To reduce off-target interactions, e.g., to reduce interaction of the guide with a target sequence having low complementarity, mutations can be introduced to the CRISPR systems so that the CRISPR systems can distinguish between target and off-target sequences that have greater than 80%, 85%, 90%, or 95% complementarity. In some embodiments, the degree of complementarity is from 80% to 95%, e.g., about 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, or 95% (for example, distinguishing an 18-nucleotide target from an 18-nucleotide off-target sequence having 1, 2, or 3 mismatches). Accordingly, in some embodiments, the degree of complementarity between a guide sequence and its corresponding target sequence is greater than 94.5%, 95%, 95.5%, 96%, 96.5%, 97%, 97.5%, 98%, 98.5%, 99%, 99.5%, or 99.9%. In some embodiments, the degree of complementarity is 100%.
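The percentages in the preceding paragraph follow from simple counting: for an 18-nt guide, 1, 2, or 3 mismatches leave 17/18, 16/18, or 15/18 positions paired, i.e., about 94.4%, 88.9%, and 83.3% complementarity. The Python sketch below, offered only as an illustration, computes percent complementarity for an ungapped guide:target pair. It assumes the target strand is written 5'->3' and ignores bulges, wobble pairs, and any PAM requirement; the function names and the example spacer are hypothetical.

```python
# Map each base to its Watson-Crick partner; U (RNA) pairs with A, like T does.
COMPLEMENT = str.maketrans("ACGTU", "TGCAA")


def reverse_complement(seq: str) -> str:
    """Return the DNA reverse complement of a DNA or RNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def percent_complementarity(guide: str, target_strand: str) -> float:
    """Percent of guide positions that base-pair with the target strand.

    guide:         spacer sequence, 5'->3' (RNA or DNA).
    target_strand: the strand the guide hybridizes to, also written 5'->3'.
    The sequences are compared without gaps, so they must be equal length.
    """
    if len(guide) != len(target_strand):
        raise ValueError("guide and target strand must have the same length")
    # The guide pairs antiparallel with the target strand, so a perfectly
    # complementary guide reads the same as the target's reverse complement.
    expected = reverse_complement(target_strand)
    guide_dna = guide.upper().replace("U", "T")
    matches = sum(g == e for g, e in zip(guide_dna, expected))
    return 100.0 * matches / len(guide_dna)


if __name__ == "__main__":
    guide = "GACGCATAAAGATGAGAC"             # hypothetical 18-nt spacer
    perfect_target = reverse_complement(guide)
    swap = {"A": "C", "C": "A", "G": "T", "T": "G"}  # any base change is a mismatch
    for n_mismatches in range(4):
        target = list(perfect_target)
        for i in range(n_mismatches):
            target[i] = swap[target[i]]
        print(n_mismatches, round(percent_complementarity(guide, "".join(target)), 1))
    # prints, one pair per line: 0 100.0, 1 94.4, 2 88.9, 3 83.3
```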
[0150] It is known in the field that complete complementarity is not required, provided there is sufficient complementarity to be functional. For CRISPR nucleases, cleavage efficiency can be modulated by introducing mismatches between the spacer sequence and the target sequence (e.g., one or more mismatches, such as 1 or 2 mismatches) and by choosing where along the spacer/target each mismatch falls. The more central (i.e., not at the 3' or 5' end) a mismatch, e.g., a double mismatch, is located, the more cleavage efficiency is affected. Accordingly, by choosing mismatch positions along the spacer sequence, cleavage efficiency can be modulated. For example, if less than 100% cleavage of targets is desired (e.g., in a cell population), 1 or 2 mismatches between spacer and target sequence can be introduced in the spacer sequences.
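Because the effect of a mismatch depends on where it falls, a design script might first report mismatch positions before any efficiency testing. The Python sketch below does only that: it lists each mismatch by its 1-based position from the guide's 5' end and labels it "central" or "terminal". The `end_window` cutoff, the function names, and the example spacer are illustrative assumptions, not values from this disclosure, and the sketch makes no prediction of cleavage efficiency, which would still need to be measured.

```python
from typing import List, Tuple

COMPLEMENT = str.maketrans("ACGTU", "TGCAA")
SWAP = {"A": "C", "C": "A", "G": "T", "T": "G"}  # each base -> a different base


def reverse_complement(seq: str) -> str:
    """Return the DNA reverse complement of a DNA or RNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def substitute(seq: str, positions: List[int]) -> str:
    """Return a DNA copy of seq with substitutions at 1-based positions (5'->3')."""
    bases = list(seq.upper().replace("U", "T"))
    for pos in positions:
        bases[pos - 1] = SWAP[bases[pos - 1]]
    return "".join(bases)


def mismatch_profile(guide: str, target_strand: str,
                     end_window: int = 3) -> List[Tuple[int, str]]:
    """List (position, region) for each guide:target mismatch.

    position is counted from the guide's 5' end (1-based). A mismatch within
    end_window nt of either end is "terminal"; otherwise it is "central".
    end_window is a hypothetical cutoff chosen for illustration.
    """
    if len(guide) != len(target_strand):
        raise ValueError("guide and target strand must have the same length")
    expected = reverse_complement(target_strand)  # the perfect-match guide
    guide_dna = guide.upper().replace("U", "T")
    n = len(guide_dna)
    profile = []
    for i, (g, e) in enumerate(zip(guide_dna, expected)):
        if g != e:
            distance_to_end = min(i, n - 1 - i)
            region = "terminal" if distance_to_end < end_window else "central"
            profile.append((i + 1, region))
    return profile


if __name__ == "__main__":
    spacer = "GACGCATAAAGATGAGAC"          # hypothetical 18-nt spacer
    target = reverse_complement(spacer)     # fully complementary target strand
    # A central double mismatch versus a single mismatch at the 5' end.
    print(mismatch_profile(substitute(spacer, [9, 10]), target))
    # prints: [(9, 'central'), (10, 'central')]
    print(mismatch_profile(substitute(spacer, [1]), target))
    # prints: [(1, 'terminal')]
```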
METHODS OF USING CRISPR SYSTEMS
[0151] The CRISPR-associated transposon systems described herein have a wide variety of utilities including modifying (e.g., deleting, inserting, translocating, inactivating, or activating) a target polynucleotide in a multiplicity of cell types. The CRISPR transposon systems have a broad spectrum of applications in, e.g., tracking and labeling of nucleic acids, drug screening, and treating various genetic disorders.
[0152] High-Throughput Screening
[0153] The CRISPR systems described herein can be used for preparing next-generation sequencing (NGS) libraries. For example, to create a cost-effective NGS library, the CRISPR systems can be used to disrupt the coding sequence of a target gene, and the CRISPR enzyme-transfected clones can be screened simultaneously by next-generation sequencing (e.g., on the Illumina system). CRISPR-associated transposases may enable more efficient library preparation because barcodes and adaptor sequences carried on the transposase payload can be inserted directly.
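The barcode-and-adaptor idea in the preceding paragraph can be pictured as simple string assembly. The Python sketch below builds one payload per barcode as left transposon end + adaptor + barcode + right transposon end, and rejects barcode sets whose members differ at fewer than `min_distance` positions. All sequences are placeholders, not the CLUST.009925 transposon ends or any commercial adaptor, and the payload layout and function names are assumptions made for illustration.

```python
from itertools import combinations
from typing import Dict, List


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("barcodes must have equal length")
    return sum(x != y for x, y in zip(a, b))


def build_payloads(left_end: str, adaptor: str, barcodes: List[str],
                   right_end: str, min_distance: int = 3) -> Dict[str, str]:
    """Return {barcode: payload} after checking pairwise barcode distances.

    With a minimum Hamming distance of 3, a barcode read with a single
    substitution error is still closer to its true barcode than to any other.
    """
    for a, b in combinations(barcodes, 2):
        if hamming(a, b) < min_distance:
            raise ValueError(f"barcodes {a} and {b} are too similar")
    # The cargo (adaptor + barcode) sits between the two transposon ends.
    return {bc: left_end + adaptor + bc + right_end for bc in barcodes}


if __name__ == "__main__":
    LEFT_END = "AAAACCCC"    # placeholder, not a real transposon end
    RIGHT_END = "GGGGTTTT"   # placeholder, not a real transposon end
    ADAPTOR = "GATCGATCGA"   # placeholder adaptor sequence
    payloads = build_payloads(LEFT_END, ADAPTOR,
                              ["ACGTAC", "TGCATG", "CATGCA"], RIGHT_END)
    for barcode, payload in payloads.items():
        print(barcode, payload)
```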
[0154] Engineered Microorganisms
[0155] Microorganisms (e.g., E. coli, yeast, and microalgae) are widely used for synthetic biology, which has wide utility, including various clinical applications. For example, the programmable CRISPR systems can be used to split proteins of toxic domains for targeted cell death, e.g., using cancer-linked RNA as the target transcript. Further, pathways involving protein-protein interactions can be influenced in synthetic biological systems with, e.g., fusion complexes with the appropriate effectors such as kinases or enzymes.
[0156] In some embodiments, guide RNA sequences that target phage sequences can be introduced into the microorganism. Thus, the disclosure also provides methods of vaccinating a microorganism (e.g., a production strain) against phage infection.
[0157] In some embodiments, the CRISPR systems provided herein can be used to engineer microorganisms, e.g., to improve yield or improve fermentation efficiency. For example, the CRISPR systems described herein can be used to engineer microorganisms, such as yeast, to generate biofuel or biopolymers from fermentable sugars, or to degrade plant-derived lignocellulose from agricultural waste as a source of fermentable sugars. More particularly, the methods described herein can be used to modify the expression of endogenous genes required for biofuel production and/or to modify endogenous genes that may interfere with biofuel synthesis. These methods of engineering microorganisms are described, e.g., in Verwaal et al., "CRISPR/Cpf1 enables fast and simple genome editing of Saccharomyces cerevisiae," Yeast, 2017 Sep. 8. doi: 10.1002/yea.3278; and Hlavova et al., "Improving microalgae for biotechnology--from genetics to synthetic biology," Biotechnol. Adv., 2015 Nov. 1; 33:1194-203, both of which are incorporated herein by reference in their entirety.
[0158] Application in Plants
[0159] The CRISPR systems described herein have a wide variety of utility in plants. In some embodiments, the CRISPR systems can be used to engineer genomes of plants (e.g., improving production, making products with desired post-translational modifications, or introducing genes for producing industrial products). In some embodiments, the CRISPR systems can be used to introduce a desired trait to a plant (e.g., with or without heritable modifications to the genome), or regulate expression of endogenous genes in plant cells or whole plants.
[0160] In some embodiments, the CRISPR systems can be used to identify, edit, and/or silence genes encoding specific proteins, e.g., allergenic proteins (e.g., allergenic proteins in peanuts, soybeans, lentils, peas, green beans, and mung beans). A detailed description of how to identify, edit, and/or silence genes encoding proteins is provided, e.g., in Nicolaou et al., "Molecular diagnosis of peanut and legume allergy," Curr. Opin. Allergy Clin. Immunol., 2011 June; 11(3):222-8, and WO 2016205764 A1, both of which are incorporated herein by reference in their entirety.
[0161] Gene Drives
[0162] Gene drive is the phenomenon in which the inheritance of a particular gene or set of genes is favorably biased. The CRISPR systems described herein can be used to build gene drives. For example, the CRISPR systems can be designed to target and disrupt a particular allele of a gene, causing the cell to copy the second allele to repair the sequence. Because of the copying, the first allele is converted to the second allele, increasing the chance that the second allele is transmitted to the offspring. A detailed method for using the CRISPR systems described herein to build gene drives is described, e.g., in Hammond et al., "A CRISPR-Cas9 gene drive system targeting female reproduction in the malaria mosquito vector Anopheles gambiae," Nat. Biotechnol., 2016 January; 34(1):78-83, which is incorporated herein by reference in its entirety.
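The biased inheritance described above can be quantified with a textbook deterministic model of a homing drive. Assuming random mating, no fitness cost, and no drive-resistant alleles, a heterozygote transmits the drive allele with probability (1 + e)/2, where e is the fraction of wild-type alleles converted (the homing efficiency). The drive-allele frequency p then changes each generation as p' = p² + 2p(1 - p)(1 + e)/2 = p + e·p(1 - p); with e = 0 this reduces to Mendelian inheritance (p' = p). The Python sketch below iterates that recurrence. The parameter values are hypothetical, and real drives are also shaped by resistance alleles and fitness effects that this sketch ignores.

```python
from typing import List


def next_frequency(p: float, homing_efficiency: float) -> float:
    """Drive-allele frequency after one generation: p + e * p * (1 - p)."""
    if not (0.0 <= p <= 1.0 and 0.0 <= homing_efficiency <= 1.0):
        raise ValueError("p and homing_efficiency must lie in [0, 1]")
    # min() guards against floating-point values drifting just above 1.
    return min(1.0, p + homing_efficiency * p * (1.0 - p))


def simulate_drive(p0: float, homing_efficiency: float,
                   generations: int) -> List[float]:
    """Return drive-allele frequencies for generations 0..generations."""
    freqs = [p0]
    for _ in range(generations):
        freqs.append(next_frequency(freqs[-1], homing_efficiency))
    return freqs


if __name__ == "__main__":
    # Hypothetical release at 1% frequency with 90% homing efficiency.
    for gen, p in enumerate(simulate_drive(0.01, 0.9, 12)):
        print(f"generation {gen:2d}: drive allele frequency {p:.3f}")
```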
[0163] Pooled-Screening
[0164] As described herein, pooled CRISPR screening is a powerful tool for identifying genes involved in biological mechanisms such as cell proliferation, drug resistance, and viral infection. Cells are transduced in bulk with a library of guide RNA (gRNA)-encoding vectors described herein, and the distribution of gRNAs is measured before and after applying a selective challenge. Pooled CRISPR screens work well for mechanisms that affect cell survival and proliferation, and they can be extended to measure the activity of individual genes (e.g., by using engineered reporter cell lines). CRISPR-associated transposases may enable more efficient disruption due to the potential for inserting larger sequences. Arrayed CRISPR screens, in which only one gene is targeted at a time, make it possible to use RNA-seq as the readout. In some embodiments, the CRISPR systems as described herein can be used in single-cell CRISPR screens. A detailed description regarding pooled CRISPR screenings can be found, e.g., in Datlinger et al., "Pooled CRISPR screening with single-cell transcriptome read-out," Nat. Methods., 2017 March; 14(3):297-301, which is incorporated herein by reference in its entirety.
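The before/after comparison of gRNA distributions described above is often summarized as a per-guide log2 fold change of normalized read counts. The Python sketch below shows one simple version: counts are scaled to reads per million, a small pseudocount avoids division by zero, and guides depleted by the selective challenge receive negative scores. The pseudocount value and the function name are assumptions; published screening pipelines add replicate-aware statistics and gene-level aggregation that this sketch omits.

```python
import math
from typing import Dict


def log2_fold_changes(before: Dict[str, int], after: Dict[str, int],
                      pseudocount: float = 0.5) -> Dict[str, float]:
    """Per-gRNA log2(after / before) on reads-per-million-normalized counts."""
    total_before = sum(before.values())
    total_after = sum(after.values())
    if total_before == 0 or total_after == 0:
        raise ValueError("each sample needs at least one read")
    scores = {}
    for guide, count_before in before.items():
        rpm_before = count_before / total_before * 1e6 + pseudocount
        rpm_after = after.get(guide, 0) / total_after * 1e6 + pseudocount
        scores[guide] = math.log2(rpm_after / rpm_before)
    return scores


if __name__ == "__main__":
    # Hypothetical read counts for three guides before and after selection.
    before = {"gRNA_1": 500, "gRNA_2": 480, "gRNA_3": 520}
    after = {"gRNA_1": 900, "gRNA_2": 60, "gRNA_3": 540}
    for guide, lfc in sorted(log2_fold_changes(before, after).items()):
        print(f"{guide}: {lfc:+.2f}")
```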
[0165] Saturation Mutagenesis (Bashing)
[0166] The CRISPR systems described herein can be used for in situ saturating mutagenesis. In some embodiments, a pooled guide RNA library can be used to perform in situ saturating mutagenesis for particular genes or regulatory elements. In other embodiments, a pooled library of DNA inserts containing a saturating mutagenesis library can be inserted by the CRISPR-associated transposase. Such methods can reveal critical minimal features and discrete vulnerabilities of these genes or regulatory elements (e.g., enhancers). These methods are described, e.g., in Canver et al., "BCL11A enhancer dissection by Cas9-mediated in situ saturating mutagenesis," Nature, 2015 Nov. 12; 527(7577):192-7, which is incorporated herein by reference in its entirety.
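For a region of length L, a single-nucleotide saturating library contains 3L variants, since each position can be changed to any of the three other bases. The Python sketch below enumerates such a library for a template or insert region. It reflects an illustrative assumption about library design (substitutions only, no insertions or deletions), and the function name and example motif are hypothetical.

```python
from typing import Iterator, Tuple

BASES = "ACGT"


def single_nucleotide_variants(region: str) -> Iterator[Tuple[int, str, str, str]]:
    """Yield (position, ref, alt, variant) for every single-base substitution.

    position is 1-based. Only A/C/G/T are accepted, so each position yields
    exactly three variants and a region of length L yields 3 * L variants.
    """
    region = region.upper()
    if set(region) - set(BASES):
        raise ValueError("region must contain only A, C, G, T")
    for i, ref in enumerate(region):
        for alt in BASES:
            if alt != ref:
                yield i + 1, ref, alt, region[:i] + alt + region[i + 1:]


if __name__ == "__main__":
    library = list(single_nucleotide_variants("GATAAG"))  # hypothetical 6-bp motif
    print(len(library))   # prints: 18
    print(library[0])     # prints: (1, 'G', 'A', 'AATAAG')
```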
[0167] RNA-Related Applications
[0168] The CRISPR systems described herein can have various RNA-related applications, e.g., modulating gene expression, inhibiting RNA expression, screening RNA or RNA products, determining functions of lincRNA or non-coding RNA, inducing cell dormancy, inducing cell cycle arrest, reducing cell growth and/or cell proliferation, inducing cell anergy, inducing cell apoptosis, inducing cell necrosis, inducing cell death, and/or inducing programmed cell death. A detailed description of these applications can be found, e.g., in WO 2016205764 A1, which is incorporated herein by reference in its entirety.
[0169] Modulating Gene Expression
[0170] The CRISPR systems described herein can be used to modulate gene expression. The CRISPR systems can be used, together with suitable guide RNAs, to target gene expression via control of RNA processing. The control of RNA processing can include, e.g., RNA processing reactions such as RNA splicing (e.g., alternative splicing), viral replication, and tRNA biosynthesis. The RNA-targeting proteins in combination with suitable guide RNAs can also be used to control RNA activation (RNAa). RNA activation is a small RNA-guided and Argonaute (Ago)-dependent gene regulation phenomenon in which promoter-targeted short double-stranded RNAs (dsRNAs) induce target gene expression at the transcriptional/epigenetic level. Because RNAa promotes gene expression, disrupting or reducing RNAa provides a way to control gene expression. In some embodiments, the methods include the use of RNA-targeting CRISPR systems as substitutes for, e.g., interfering ribonucleic acids (such as siRNAs, shRNAs, or dsRNAs). The methods of modulating gene expression are described, e.g., in WO 2016205764, which is incorporated herein by reference in its entirety.
[0171] Controlling RNA Interference
[0172] Control over interfering RNAs or microRNAs (miRNAs) can help reduce off-target effects by reducing the longevity of the interfering RNAs or miRNAs in vivo or in vitro. In some embodiments, the target RNAs can include interfering RNAs, i.e., RNAs involved in the RNA interference pathway, such as small hairpin RNAs (shRNAs), small interfering RNAs (siRNAs), etc. In some embodiments, the target RNAs include, e.g., miRNAs or double-stranded RNAs (dsRNAs).
[0173] In some embodiments, if the RNA-targeting protein and suitable guide RNAs are selectively expressed (for example, spatially or temporally under the control of a regulated promoter, such as a tissue- or cell cycle-specific promoter and/or enhancer), this can be used to protect the cells or systems (in vivo or in vitro) from RNA interference (RNAi) in those cells. This may be useful in neighboring tissues or cells where RNAi is not required, or for comparing cells or tissues where the effector proteins and suitable guide RNAs are and are not expressed (i.e., where the RNAi is not controlled and where it is, respectively). The RNA-targeting proteins can be used to control or bind to molecules comprising or consisting of RNA, such as ribozymes, ribosomes, or riboswitches. In some embodiments, the guide RNAs can recruit the RNA-targeting proteins to these molecules so that the RNA-targeting proteins are able to bind to them. These methods are described, e.g., in WO 2016205764 and WO 2017070605, both of which are incorporated herein by reference in their entirety.
[0174] Therapeutic Applications
[0175] The CRISPR-associated transposon systems described herein can have diverse therapeutic applications. Without wishing to be limiting, one framework for organizing the range of therapeutic applications enabled by CRISPR-associated transposon systems is to ask whether the therapeutic genetic modification is a correction of a native locus or a locus-agnostic gene augmentation.
[0176] For therapeutic correction of a native locus, the new CRISPR systems can be used to correct mutations responsible for monogenic diseases (e.g., Duchenne Muscular Dystrophy, Cystic Fibrosis, etc.), or to introduce beneficial mutations (e.g., Pcsk9 for lowered cardiovascular disease risk, BCL11a for increasing fetal hemoglobin expression in treating hemoglobinopathies, CCR5 for HIV resistance, etc.). Using CRISPR-associated transposons may have key advantages. The first is the potential to use a single therapeutic construct to correct a diverse set of genetic mutations, whether in a single patient or across the patient population. This is because transposition enables the replacement of a large gene fragment, rather than the short-range corrections enabled by homology-directed repair following DNA cleavage or by base editing. Second, using CRISPR-associated transposons may enable therapeutic modifications in a broad range of post-mitotic cells and tissues, given that the enzymatic mechanism of action of the transposon is anticipated to be independent of DNA repair mechanisms such as homologous recombination or homology-directed repair.
[0177] For locus-agnostic gene augmentation, the new CRISPR systems can be used to introduce gene fragments that provide a therapeutic benefit. This includes gene therapies to replace missing or defective native enzymes, including, but not limited to, RPE65 in Leber's Congenital Amaurosis, adenosine deaminase in Severe Combined Immunodeficiency (SCID), and any number of defective enzymes causing inborn errors of metabolism. In addition to supplementing defective enzymes, the CRISPR-associated transposons may augment existing cellular properties, e.g., through the introduction of custom chimeric antigen receptors into T cells for the production of cell therapies. The CRISPR-associated transposons may offer greater therapeutic durability when the transgene is incorporated into the genome (vs. episomal expression from some recombinant viral vectors), and greater control over transgene insertion, ensuring that the transposon is directed to chromosomal locations that are both "safe harbors" for genome editing and transcriptionally active. This distinguishes them from the possible deleterious effects of pseudo-random insertion by integrating viruses, as well as the promiscuous insertion by transposases such as Tn5, piggyBac, and Sleeping Beauty.
[0178] Altogether, a programmable transposon would enable a range of genome modifications that may prove highly valuable to therapeutic development, whether directly via therapeutic gene corrections, or indirectly by enabling the engineering of cells and disease models.
[0179] Delivery
[0180] Through this disclosure and the knowledge in the art, the CRISPR systems described herein, or components thereof, nucleic acid molecules thereof, or nucleic acid molecules encoding or providing components thereof, can be delivered by various delivery systems, such as vectors, e.g., plasmids or viral delivery vectors. The new CRISPR enzymes and/or any of the RNAs (e.g., guide RNAs) can be delivered using suitable vectors, e.g., plasmids or viral vectors, such as adeno-associated viruses (AAV), lentiviruses, adenoviruses, and other viral vectors, or combinations thereof. The proteins and one or more guide RNAs can be packaged into one or more vectors, e.g., plasmids or viral vectors.
[0181] In some embodiments, the vectors, e.g., plasmids or viral vectors, are delivered to the tissue of interest by, e.g., intramuscular injection, intravenous administration, transdermal administration, intranasal administration, oral administration, or mucosal administration. Such delivery may be either via a single dose, or multiple doses. One skilled in the art understands that the actual dosage to be delivered herein may vary greatly depending upon a variety of factors, such as the vector choices, the target cells, organisms, tissues, the general conditions of the subject to be treated, the degrees of transformation/modification sought, the administration routes, the administration modes, the types of transformation/modification sought, etc.
[0182] In certain embodiments, the delivery is via adenoviruses, which can be at a single dose containing at least 1 × 10^5 particles (also referred to as particle units, pu) of adenoviruses. In some embodiments, the dose preferably is at least about 1 × 10^6 particles, at least about 1 × 10^7 particles, at least about 1 × 10^8 particles, or at least about 1 × 10^9 particles of the adenoviruses. The delivery methods and the doses are described, e.g., in WO 2016205764 A1 and U.S. Pat. No. 8,454,972 B2, both of which are incorporated herein by reference in their entirety.
[0183] In some embodiments, the delivery is via plasmids. The dosage can be a sufficient number of plasmids to elicit a response. In some cases, suitable quantities of plasmid DNA in plasmid compositions can be from about 0.1 to about 2 mg. Plasmids will generally include (i) a promoter; (ii) a sequence encoding a nucleic acid-targeting CRISPR enzyme, operably linked to the promoter; (iii) a selectable marker; (iv) an origin of replication; and (v) a transcription terminator downstream of and operably linked to (ii). The plasmids can also encode the RNA components of a CRISPR complex, but one or more of these may instead be encoded on different vectors. The frequency of administration is within the ambit of the medical or veterinary practitioner (e.g., physician, veterinarian), or a person skilled in the art.
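The five plasmid elements listed above, and the requirement that the terminator lie downstream of the coding sequence it terminates, can be captured as a simple design check. The Python sketch below treats a plasmid as an ordered list of annotated features (ignoring circularity and strand) and reports missing elements or an out-of-order expression cassette; only the first promoter and first coding sequence are checked. The `Feature` class, the feature-type labels, and the function name are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List

REQUIRED_KINDS = {"promoter", "cds", "marker", "ori", "terminator"}


@dataclass
class Feature:
    kind: str  # "promoter", "cds", "marker", "ori", or "terminator"
    name: str


def check_plasmid(features: List[Feature]) -> List[str]:
    """Return a list of design problems; an empty list means none were found."""
    kinds = [f.kind for f in features]
    problems = [f"missing {k}" for k in sorted(REQUIRED_KINDS - set(kinds))]
    if {"promoter", "cds", "terminator"} <= set(kinds):
        promoter_i = kinds.index("promoter")  # first promoter only
        cds_i = kinds.index("cds")            # first coding sequence only
        # The promoter must precede the CDS, and a terminator must follow it.
        if not (promoter_i < cds_i and "terminator" in kinds[cds_i + 1:]):
            problems.append("expected promoter -> CDS -> terminator order")
    return problems


if __name__ == "__main__":
    design = [
        Feature("promoter", "hypothetical_promoter"),
        Feature("cds", "crispr_enzyme_cds"),
        Feature("terminator", "hypothetical_terminator"),
        Feature("marker", "hypothetical_selectable_marker"),
        Feature("ori", "hypothetical_origin"),
    ]
    print(check_plasmid(design))      # prints: []
    print(check_plasmid(design[:4]))  # prints: ['missing ori']
```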
[0184] In another embodiment, the delivery is via liposomes or lipofectin formulations and the like, and can be prepared by methods known to those skilled in the art. Such methods are described, for example, in WO 2016205764 and U.S. Pat. Nos. 5,593,972; 5,589,466; and 5,580,859; each of which is incorporated herein by reference in its entirety.
[0185] In some embodiments, the delivery is via nanoparticles or exosomes. For example, exosomes have been shown to be particularly useful in delivering RNA.
[0186] A further means of introducing one or more components of the new CRISPR systems into the cell is the use of cell-penetrating peptides (CPPs). In some embodiments, a cell-penetrating peptide is linked to the CRISPR enzymes. In some embodiments, the CRISPR enzymes and/or guide RNAs are coupled to one or more CPPs to effectively transport them inside cells (e.g., plant protoplasts). In some embodiments, the CRISPR enzymes and/or guide RNA(s) are encoded by one or more circular or non-circular DNA molecules that are coupled to one or more CPPs for cell delivery.
[0187] CPPs are short peptides of fewer than 35 amino acids, derived either from proteins or from chimeric sequences, that are capable of transporting biomolecules across the cell membrane in a receptor-independent manner. CPPs can be cationic peptides, peptides having hydrophobic sequences, amphipathic peptides, peptides having proline-rich and anti-microbial sequences, and chimeric or bipartite peptides. Examples of CPPs include, e.g., Tat (a nuclear transcriptional activator protein required for viral replication by HIV type 1), penetratin, Kaposi fibroblast growth factor (FGF) signal peptide sequence, integrin β3 signal peptide sequence, polyarginine peptide Args sequence, guanine-rich molecular transporters, and sweet arrow peptide. CPPs and methods of using them are described, e.g., in Hallbrink et al., "Prediction of cell-penetrating peptides," Methods Mol. Biol., 2015;1324:39-58; Ramakrishna et al., "Gene disruption by cell-penetrating peptide-mediated delivery of Cas9 protein and guide RNA," Genome Res., 2014 June;24(6):1020-7; and WO 2016205764 A1; each of which is incorporated herein by reference in its entirety.
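Two of the properties named above, a length under 35 residues and (for the cationic class) a net positive charge, can be screened with a few lines of code. The Python sketch below uses a deliberately crude charge count (Lys and Arg as +1, Asp and Glu as -1, ignoring histidine, termini, and pH), so it is not a CPP predictor of the kind reviewed by Hallbrink et al. The Tat (residues 47-57) and penetratin sequences are the commonly cited forms and should be verified against primary sources before use; the function names are hypothetical.

```python
def crude_net_charge(peptide: str) -> int:
    """Count Lys/Arg as +1 and Asp/Glu as -1; all other residues count as 0."""
    seq = peptide.upper()
    positive = seq.count("K") + seq.count("R")
    negative = seq.count("D") + seq.count("E")
    return positive - negative


def is_candidate_cationic_cpp(peptide: str, max_length: int = 35) -> bool:
    """True if the peptide is shorter than max_length residues and net positive."""
    return len(peptide) < max_length and crude_net_charge(peptide) > 0


if __name__ == "__main__":
    examples = {
        "Tat (47-57)": "YGRKKRRQRRR",
        "penetratin": "RQIKIWFQNRRMKWKK",
        "octa-arginine": "R" * 8,
    }
    for name, seq in examples.items():
        print(name, len(seq), crude_net_charge(seq), is_candidate_cationic_cpp(seq))
```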
[0188] Various delivery methods for the CRISPR systems described herein are also described, e.g., in U.S. Pat. No. 8,795,965, EP 3009511, WO 2016205764, and WO 2017070605; each of which is incorporated herein by reference in its entirety.
EXAMPLES
[0189] The invention is further described in the following examples, which do not limit the scope of the invention described in the claims.
Example 1--Identification of Minimal Components for a CLUST.009925 CRISPR System (FIGS. 1-7)
[0190] This protein family represents a mobile genetic element associated with CRISPR systems found in organisms including, but not limited to, Streptomyces and Salinispora (FIG. 1). The naturally occurring loci containing this effector, depicted in FIG. 1, indicate that Effector A (~410 amino acids) has a high co-occurrence with the effector protein Effector B (~250 aa). CLUST.009925 effectors include the exemplary effectors detailed in TABLES 1-5.
[0191] An HMM profile constructed from the multiple sequence alignment of Effector A, together with Pfam and UniProt database searches, revealed an rve domain indicative of integrase activity, as well as an HTH domain indicative of nucleic acid binding (FIGS. 2-4).
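Domain searches of the kind described above are commonly run with HMMER against Pfam profiles. As an illustrative sketch only, the Python function below reads the per-domain tabular output written by HMMER3 `hmmscan --domtblout` and keeps hits below an independent E-value cutoff. The column positions follow the HMMER3 domtblout layout (profile name in column 1, query name in column 4, independent E-value in column 13, alignment start and end in columns 18-19) and should be checked against the HMMER User's Guide for the version in use; the 1e-5 cutoff and the function and class names are hypothetical.

```python
from typing import List, NamedTuple


class DomainHit(NamedTuple):
    protein: str      # query sequence name (for hmmscan, the searched protein)
    domain: str       # target profile name, e.g. a Pfam family
    i_evalue: float   # independent E-value of this domain hit
    ali_from: int     # alignment start on the protein
    ali_to: int       # alignment end on the protein


def parse_domtblout(text: str, max_i_evalue: float = 1e-5) -> List[DomainHit]:
    """Parse HMMER3 hmmscan --domtblout text, keeping significant domain hits."""
    hits = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header/comment lines
        # 22 whitespace-delimited columns, then a free-text description.
        fields = line.split(maxsplit=22)
        hit = DomainHit(
            protein=fields[3],
            domain=fields[0],
            i_evalue=float(fields[12]),
            ali_from=int(fields[17]),
            ali_to=int(fields[18]),
        )
        if hit.i_evalue <= max_i_evalue:
            hits.append(hit)
    return hits
```

In use, the function would be applied to the contents of a domain table file, e.g. `parse_domtblout(open("effector_A.domtbl").read())`, where the file name is hypothetical.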
[0193] A domain search as described above for Effector B revealed an IstB domain originating from the IS21 family of transposons, as well as a Bac_DnaA domain indicative of DNA unwinding activity. Together these domains are indicative of accessory functions involved in the transposition process (FIGS. 5-7).
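Given per-protein domain annotations, the domain architectures reported for Effector A (HTH plus rve integrase) and Effector B (IstB) can be turned into a simple labeling rule. The Python sketch below assumes Pfam-style names: `rve` for the integrase core, any family name beginning with `HTH` for helix-turn-helix domains, and `IstB_IS21` for the IstB domain. These naming conventions, the prefix match, the example inputs, and the function name are assumptions for illustration, not the classification procedure used in this disclosure.

```python
from typing import Iterable


def classify_effector(domains: Iterable[str]) -> str:
    """Label a protein by domain architecture (illustrative rule only)."""
    names = set(domains)
    has_hth = any(name.startswith("HTH") for name in names)
    if "rve" in names and has_hth:
        return "Effector A-like (HTH + rve integrase)"
    if "IstB_IS21" in names:
        return "Effector B-like (IstB)"
    return "unassigned"


if __name__ == "__main__":
    print(classify_effector(["HTH_21", "rve"]))
    # prints: Effector A-like (HTH + rve integrase)
    print(classify_effector(["IstB_IS21", "Bac_DnaA"]))
    # prints: Effector B-like (IstB)
    print(classify_effector(["rve"]))
    # prints: unassigned
```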
TABLE-US-00001 TABLE 1 Representative CLUST.009925 Effector Protein Accessions
Species (contig) | Effector accession | Accessory accession | # spacers | cas1 | cas2 | [illegible]
Actinomadura macra NBRC 14102 (NZ_BCQT01000020) | WP_067466072.1 | WP_067466075.1 | 64 | Y | Y | 4
Cellulomonas fimi ATCC 484 (CP002666) | AEE46996.1 | WP_013772021.1 | 2 | N | N | 4
Granulicoccus phenolivorans DSM 17626 (NZ_AUHH01000033) | WP_081684113.1 | WP_026927625.1 | 57 | N | N | 4
Granulicoccus phenolivorans NBRC 107789 (NZ_BCQZ01000006) | WP_081684113.1 | WP_026927625.1 | 79 | N | N | 4
Mycobacterium canettii CIPT 140060008 (NC_019950) | WP_015288588.1 | WP_015288589.1 | 2 | N | N | 4
Mycobacterium canettii CIPT 140070010 (FO203509) | CCK59973.1 | N/A | 18 | N | N | 3
Mycobacterium kansasii 732 (JANZ01000001) | EUA10187.1 | EUA10185.1 | 2 | N | N | 4
Mycobacterium kansasii 732 (JANZ01000006) | EUA10187.1 | EUA10185.1 | 6 | N | N | 4
Mycobacterium rhodesiae (MVIH01000005) | WP_083119439.1 | N/A | 2 | N | N | 4
Mycobacterium tuberculosis MD17240 (KK312697) | WP_003414864.1 | N/A | 3 | N | N | 4
Nakamurella multipartita DSM 44233 (NC_013235) | WP_015748250.1 | N/A | 159 | N | N | 4
Nocardia higoensis NBRC 100133 (NZ_BAGA01000172) | WP_040800760.1 | WP_040800758.1 | 4 | N | N | 4
Salinispora arenicola CNH646 (NZ_AZWH01000022) | WP_029025341.1 | WP_018802534.1 | 40 | N | Y | 3
Salinispora arenicola CNS-205 (NC_009953) | WP_012181244.1 | WP_012181245.1 | 33 | N | N | 4
Salinispora arenicola CNS-205 (NC_009953) | WP_012182723.1 | WP_012182724.1 | 44 | Y | Y | 4
Salinispora arenicola CNY231 (NZ_KB896461) | WP_018802787.1 | WP_018802788.1 | 3 | N | N | 4
Salinispora pacifica CNS996 (NZ_KB895134) | WP_018738805.1 | WP_018738804.1 | 6 | N | N | 3
Salinispora pacifica CNT584 (NZ_KB896797) | WP_018815463.1 | WP_018815462.1 | 44 | N | N | 3
Salinispora pacifica DSM 45549 (NZ_KB900614) | WP_018254221.1 | WP_018252763.1 | 17 | Y | Y | 3
Streptacidiphilus albus JL83 (NZ_JQML01000001) | WP_034088311.1 | WP_081983172.1 | 11 | N | N | 4
Streptomyces avermitilis MA-4680 = NBRC 14893 (BA000030) | BAC75258.1 | BAC75257.1 | 64 | N | N | 4
Streptomyces fradiae (JNAD01000015) | WP_043464545.1 | WP_043464542.1 | 20 | N | N | 4
Streptomyces fradiae (LGSP01000169) | WP_043464545.1 | N/A | 14 | N | N | 4
Streptomyces fradiae (NZ_MCNU01000065) | WP_043464545.1 | WP_043464542.1 | 9 | N | N | 4
Streptomyces niveiscabiei (NZ_LIRL01000346) | WP_055721037.1 | WP_055721038.1 | 6 | N | N | 4
Streptomyces noursei ATCC 11455 (NZ_CP011533) | WP_067342985.1 | WP_067342984.1 | 8 | N | N | 4
Streptomyces pactum (CP019724) | WP_055422289.1 | WP_055422288.1 | 36 | N | N | 4
Streptomyces pactum (NZ_LIQD01000008) | WP_055422289.1 | WP_055422288.1 | 21 | N | N | 4
Streptomyces thermoautotrophicus (JYIJ01000009) | KWX05882.1 | WP_067067772.1 | 14 | Y | Y | 4
Streptomyces thermoautotrophicus (NZ_JYIJ01000009) | WP_079046208.1 | WP_067067772.1 | 14 | Y | Y | 3
Streptomyces viridifaciens (NZ_MPLE01000001) | WP_072653819.1 | N/A | 42 | N | N | 4
Streptomyces viridifaciens (NZ_MPLE01000001) | WP_030291260.1 | N/A | 42 | N | N | 4
Streptosporangium sp. M26 (NGFP01000014) | WP_086568729.1 | WP_086568727.1 | 5 | N | N | 4
Tessaracoccus flavescens (CP019607) | WP_077349293.1 | N/A | 151 | Y | Y | 4
Tessaracoccus sp. T2.5-30 (NZ_CP019229) | WP_068751155.1 | WP_068751156.1 | 2 | N | N | 4
Trueperella pyogenes (CP012649) | ALD73443.1 | WP_052251142.1 | 3 | N | Y | 4
[illegible] indicates data missing or illegible when filed.
TABLE-US-00002 TABLE 2 Amino acid sequences of Representative CLUST.009925 Effector A Proteins
>WP_072653819.1 [Streptomyces viridifaciens] (SEQ ID NO: 1)
MIRVEDWAEIRRLHRAEQMPIRAIARHLGISKNTVKRALATDRPPVYQRPLKGSAVDAVEPAIRELLRQTPAMPATVIAERIGWDRGLTILKERVRELRPAYLPVDPVSRTVYSPGELAQCDLWFPPAEIPLGYGQTGRPPVLVVVSGYSRVITARMLPSRSTADLIDGHWRLLTGWGAVPKTLVWDNESGVGQGKLTSEFAAFAGLLATRIHLCRPRDPEAKGLVERVNGYLETSFLPGRTFAGPGDFNTQLTAWLQVANRRHHRTIGCRPAERWEADRSEMLTLPPVAPPNWWPQTTRLGRDHYIRVDTCDYSVHPRAIGQQITVRADTEEIVATHRTGEVVARHSRCWARHQTITDPEHTAAAQTLRGQAIRQRASRAWASSLTALAPDDLGVEVEQRELGSYDRIFTLIEGGAGKEDT
>WP_003414864.1 [Mycobacterium tuberculosis MD17240] (SEQ ID NO: 2)
MLTVEDWAEIRRLHRAEGLPIKMIARVLGISKNTVKSALESNQQPKYERAPQGSIVDAVEPRIRELLQAYPTMPATVIAERIGWERSIRVLSARVAELRPVYLPPDPASRTTYVAGEIAQCDFWFPPIELPVGFGQTRTAKQLPVLTMVCAYSRWLLAMLLPSRCAEDLFAGWWRLIEALGAVPRVLVWDGEGAIGRWRGGRSELTTECQAFRGTLAAKVLICRPADPEAKGLIERAHDYLERSFLPGRVFASPADFNAQLGAWLALVNTRTRRALGCAPTDRIGADRAAMLSLPPVAPATGWCTSLRLPRDHYVRCDSNDYSVHPGVIGHRVLVRADLERVHVFCDGELVADHERIWAVHQTVSDPAHVEAAKVLRRRHFSAASPVVEPQVQVRSLSDYDDALGVDIDGGVA
>WP_043464545.1 [Streptomyces fradiae] (SEQ ID NO: 3)
MISVEDWAEIRRLHRSEGMPIRAIARKLGISRTTVRRAVASDRPPKYERAPKGSIVDEVEPRIRELLEVWPDMPATVVAERIGWQRGMTVLRDRLRDLRRDYVPADPASRTAYEPGELVQCDLWFPPADIPLGFGQVGSPPVLVMVSGYSRWITARMLPTRSAADLIAGHWRLLTGLGAVPKALVWDNEGAVGSWRGGRPRLTEDFAAFAGLLGIRIVQCRPGDPEAKGMVERANGYLETSFLPGRVFASPTDFNVQLEDWLRRANRRVHRTLQARPADRIDADRAGMLPLPPVDPPGWWRTSLRLPRDHYVRVDTCDYSVHPLAIGRRIEVKAGLEGVVVFCEGTEVARHVRCWARHQTITDPAHSAAAGAARLTACKAPADDGEAEVEQRSLDTYDRIFGVIEGGLGQEEGIA
>WP_067342985.1 [Streptomyces noursei ATCC 11455] (SEQ ID NO: 4)
MIHVEDWAEIRRLHRAEQMPIRAIARHLGISKNTVKRALAHDRPPKYERPAKGSAVDAVEVQIRELLRETPTMPATVIAERIGWQRGMTILRERVRELRPAYLPVDPVSRTTYRPGELAQCDLWFPEADIPLGYGQTGRPPVLVMVSGYSRIIAARMLPSRRSGDLIDGHWRLLTAWGAVPRMLVWDNEAGVGKGRVTSEFAAFAGLLATKIYLCRPRDPEAKGLVERANGYLETSFLPGRHFTGPDDFNTQLDAWLKVANRRVHRTLQARPSDRWEADRAGMLALPPVDPPSWWRFQIRLGRDHYVRVDTCDYSVDPAAIGRMVTVLCDNDEVIVLAQGGEIVARHPRCWARHQTLTDPHHAAAGDVMRREVHRRHGAACAAAAPDVVEVEQRELGTYDRLFTVIDGGNNQEAG
>WP_068751155.1 [Tessaracoccus sp. T2.5-30] (SEQ ID NO: 5)
MISVEDWAEIRRLHRSEGLAIKAIARQLGVARNTVRSALASDVPPRYERESPGSLVDAVEPQTRALLAATPRMPATVIAERISWEHSSSVLRARVAELRPLYLPADPADRTQYRAGEIVQCDLWFPPRVVPVADGVLAAPPVLTMVAAWSGFIAALLLPTRQTGDLLAGMWQLLTGSFGAVPRMLVWDNEAGIGQHRRLTVPARGFAGTLGTRIYQTNPRDPEAKGVVERANGFLQTSFMPGREFVSPADFNTQLADWLPRANQRLLRRTGARPADMLASEAAAMSSLPPVAPLVGSSSRVRLGRDYYVRVAGNDYSVDPAVIGRFVDVGCDMDTVTVTCAGQPVASHARCWDQRRTLTDPAHVATARTLRAAHQAQKTAGRVDARGAGEDVALRPLSVYDDLFDLTGIDPASASGAVA
>WP_086568729.1 [Streptosporangium sp. M26] (SEQ ID NO: 6)
MIKVEDWAEIRRLYRAEAMPIKAIARHLGISKNTVKRALAADAPPKYQRPSKGSIVDAAEPQTRALLAKFPTMPATVIAERIGWERSITVLKDRIRVLRPQFKPVDPASRTTYQAGELAQCDLWFPPVKVPVGAGHLASPPVLVMVSGYSRWMLPSRTAGDLFAGHWSLLSDLGAVPKTLVWDNESAIGQRRGGKAQLTADANAFRGALGVGITQCRPGDPEAKGLVERANGYLETSFLPGRTFSCPQDFNTQLADWLPMANERHHRRIQCRPLDRLRADLAAMVALPPIAPLLGWRTSTRLARDHYVRIASCDYSVHPSVIGRLVEVIADLEEVTVTCGGQVVARHRRCWAPHQTITDALHAQVAAAMRRTCLRTATAPAKAEVEVEQRRLSDYDALLGIEGVA
>WP_083119439.1 [Mycobacterium rhodesiae] (SEQ ID NO: 7)
MISLEDWALIRHLHRSEGLSQRAIARQLGIARDTVASALASDGPPKYERAAVPSAINEVEPRIRALLTAYPQLPATVIAERVGWTGSISWFRERVRAIRPEYLPADPVDRLEHPPGRVIQCDLWFPAPKIAVGFGQESILPVLVMVAAFSRFIAAVMLPSRQTMDLVAGMWQLLSGSFAAVPRELWWDNEAGIGRRGRLTDPVTALVGTLGARLVQLRPYDPESKGMVERANRYLETSFLPGRSFTSPQDFNDQLQQWLPVANSRRVRVLDGRPIDFLDADRAQMLRLPPVPPVTETVASVRLGRDYYVRVAGNDYSVDPAAIGQLVDVTTTLTQVTVSRAGRLLAAHERCWAARQTLTDPAHVQAAATLRQQFHAGPAPQAGAHLLRDLADYDRAFGVDFSTGTVTSDGEVA
>KWX05882.1 [Streptomyces thermoautotrophicus] (SEQ ID NO: 8)
MEDWAEIRRLHRAEGVPIKEIARRLGVARNTVRAALNSDRPPKYERASRGQVADAFEPQMRALLKEWPRMPAPVIAERIGWPYSMAPLRKRLALIRPEYLGIDPVDRVTYGPGQVAQCDLWFPQTRIPVTAGQERMLPVLVMTLGFSRFMTATMIPTRQAGDILSGMWQLIRGIGRVTKTLVWDREAAIGGTGKVSAPAAAFAGTLATTIRLAPPRDPEFKGMVERNNQYLETSFLPGRRFVSPADFNDQLGDWLVRANSRTVRSIQGRPVDLLEADCQAMIPLPPVTPPIGLNHRVRLGRDYYVRVDTVDYSVDPQAIGRFVDVTASLETVTVLCDGQLVARHARSWARQGVITDPVHAATAARMRQALAEDRQRRQAAVRRHADGHAVALRALPDYDALFGVDFNPPSTKAK
>WP_012181244.1 [Salinispora arenicola CNS-205] (SEQ ID NO: 9)
MLSVEDWAEIRRLHRAERMAIKAICRRLGVSRNTVRKALASHEPPRYQRAAKGSIVDAVEPQIRVLLAEFPDMPTTVIMERVGWTRGKTVFADRVQQLRPLFRRPDPSQRTEYLPGELAQCDLWFPPADVPLGFGQVGRPPVLVMVSGYSRWLSAVMIPSRQSPDLLVGHWRLISGWRRVPKALVWDNESAVGQWRAGRPQLTEAMNAFRGTLGIKVIQCRPADPEAKGLVERANGYLETSFLPGRRFASPGDFNAQLSEWLVRANNRQHRVLGCRPAERWDADRQAMLPLPPVAPVVGWRQATRLPRDHYVRMDGNDYSVHPSVVGRRVEVTADGDQVTVLCDGRSVARHDRCWAKHQSITDTAHRQAAADLRVAAQRTPTAAVDAQVERRPLSDYDRLFGLDEVAA
>ALD73443.1 [Trueperella pyogenes] (SEQ ID NO: 10)
MDDWAQIRILRNEGMSIRKIASTVGCAKKTVERALASNTPPSYKQRAPQKTAFDEFELDVRALIDDVPDLPATVLAQRVGWTGSMSWFRENVRRIRPEYMPKDPVDVLDHKAGQQIQCDLMFPDDGITNDMGVPAKFPVLVMVSSYSREMAACVLPTKTTGDLVSGMWMLLHDRFQVVPQHLLWDHESGIGNKRLVDQVVGFSGTLGLKVRQAPPRDPETKGIVERHNEYLQTSFFPGRRFTDPIDAQAQLDSWIDNIANKRIHATLKQRPVDRWAADKEAMSALPPYAPQTGLRRQVRLPRNYYVSVDSNRYSVSPAAIGKIVTIFVELHRVVITDSAGVIVADHRRVWGKNVTVTDPDHQRLAKQMRQSLAAPKPRACDIAVEAADLSIYDKIAGWEQSA
>WP_012182723.1 [Salinispora arenicola CNS-205] (SEQ ID NO: 11)
MLSVEDWAEIRRLRRSEGMAIQATARRLRMSRNTVKKALASDEPPRYRRVAKGSIVDAVEPQTRALLAEFPEMPTTVIMVRVGWTRGKTVFCDRVQQLRPLFRRPDPAQRTEYLPGELAQCDLWFPPADVSLGFGQVGRPPVLVMVSGYSRWLSAVMIPSRQSPDLLDGHWTLISGWDRTPKGLVWDNESAVGQWRAGRPQLTEAMNAFRGTLGIKVIQCRPADPEAKGLVERANGYLETSFLPGRRFASPQDFNAQLTEWLVRANNRQHRMLGCRPVDRWDADRAAMLSLPPVAPVVGWRRTTRLPRDHYVRLDSNDYSVHPAAVGRRVDIVADADRVQVFCENRLFARHDRCWAKHQSITDPAHRQAAADLRTAARQTPAAAGTTEVEHRQLADYDRMFGLDEVAA
>WP_077349293.1 [Tessaracoccus flavescens] (SEQ ID NO: 12)
MISVEDWAEIRRLHKSEKMAIKAIARQLGVARNTVRAALSSDGPPKYERAPVGSAVDAFEPQTRALLSRTPTMPATVIAERIGWTRSGSVLRARVAELRPLFAPPDPADRTEYQPGEIVQCDLWFPPKIVRVAADVWTAPPVLTMVAAWSGFIAAVLLPSRTTGDLLAGMWQLLSGSFGGVPKTLVWDNESGIGQHRRLTVAARAFAGTLGTRIFQTRPRDPEAKGVVERANGYLQTSFLPGREFGSPSEFNTQLAAWLPRANQRLLRRTGAAPAARLAAEVAAMTALPPVPPTVGFTDRIRLGRDYYVRVMGNDYSVDPAAIGRFVDITCDLEHVTVSCAGTVVAEHGRCWDLRCTITDPAHVAAAKHLRAAFQSRNTPTTGRVEPSHQVGMRSLSDYDELFNLNTATALSVKEVA
>WP_055422289.1 [Streptomyces pactum] (SEQ ID NO: 13)
MISVEDWAEIRRLHRAEQMPVRAIARKLGIARNTVRRAIADDAPPKYQRAPKGSIVDAVEPQIRELLEQWPEMPATVIAERIGWDRGLTVLKDRVRDLRPAYRPADPASRTVYEPGEIGQCDLWFPPADIPLGFGQVGRPPVLVMVAGYSRWITARMLPSRSAADLIAGHWRLLTELGAVPRVLVWDNEGAVGSWRPGGPQLTDEFAAFAGLLGVKFLLCKPRDPEAKGLVERANGYLETSFLPGRVFTSPADFNIQLADWLTRANRRIHRTLQARPADRLEADRSRMLALPPIAPPGWWKASLRLPRDHYVRLDTCDYSVHPLAVGRRVQVAADLDQVLVTCDGVEVARHARSWARHQTITDPDHAAAAAAARKAAAGTKPAPVDVTEVEERSLEAYDRIFGVIDGGLSTGEGAA
>WP_015288588.1 [Mycobacterium canettii CIPT 140060008] (SEQ ID NO: 14)
MLSVEDWAEIRRLRRSERLPISEIARVLKISRNTVKSALASDGPPKYQRAAKGSVADEAEPRIRELLAAYPRMPATVIAERIGWWYSIRTLSGRVRELRPLYLPPDPASRTTYVAGEIGQCDFWFPDVVVPVGYGQVRTATALPVLTMVCGYSRWASALLIPTRTAEDLYAGWWQHLSTLGAVPRVLVWDGEGAVGRWRARQPELTAACHAFRGTLAAKVWICKPGDPEAKGLVERFHDYLERAFLPGRVFASPADFNTQLQAWLVRANHRQHRVLGCRPADRIEADTAAMLTLPPVGPSIGWRTSTRLPRDHYVRLDGNDYSVHPVAIGRRIEITADLSRVRVWCGGTLVADHDRIWAKHQTISDPEHVVAAKLLRRKRFDIVGPPHHVEVEQRLLTTYDTVLGLDGPVA
>CCK59973.1 [Mycobacterium canettii CIPT 140070010] (SEQ ID NO: 15)
MSVEEWAKIRRLHRSEGRSISEIARMLGVSRNTVRAALSSADPPKYRRRPMGSVADAAEPKIRELLTADPRMPATVIAERIGWSHSIRTLRGRVRELRPLYLPPDPASTTAYSPGAIAQCAFWFPNVVVPVGFGQIRTAAALPVLTMISGYSWWVSAALLPTRSAEDLYVGWWQQLSMLGATPRAFVWDGEDAVGRWRTQQPELTAACHTFCSALDATVQICRPGDPEGRRLVNQFRGCLERAFLPGRRFSSRADFNTQLREWLTQVNHHRHRALGFRPADRIEADKAAMLALPRVRPALGWQASSTRLQRDQLVRIDGNNYSVHPDAVGRPIEVTADLDRVQVWCQGILVADHARVWAKHQTIRDPAHNHR
>AEE46996.1 [Cellulomonas fimi ATCC 484] MTVEDWAEIRRLHRVDGMAIKAIARRLGVSRNAVRRALARDAPPRYERAAKGSLVDQVDPVVRGLLASCPTMPA- TVIAQRIGWT HSLTILKDRVRVLRPYYLAPDPATRTSYEAGQRVQCDLWFPPVDVPLGAGQVGSPPVLVMVAGYSRMMFATMLP- SRQAPDLIAG HWRLLNAMGVVPRELVWDNEAAVGSWRAGKPKLTEEFEAFRGVLGIGVHQCRPRDPEAKGLVERANGYLETSFL- PGRMFTSPGD FNTQLADWLVLANRRPHRALGCKPVERWTADRDAMLALPPVAPQLGWRARVRLPRDHYVRLGSNDYSVDPVAVG- RFVEVIADLQ QVTVRLGTRVVAVHERCWARWQTITDPAHRAAALEMASVAAHRPASSAAPDEVEQRDLGAYDVAFGLTDVA (SEQ ID NO: 16) >WP_015748250.1 [Nakamurella multipartita DSM 44233] MLIVDDWAEIRRLHRAEGMPIRAIARRLGCSKNTVKRALAAQGPPRYERATVGSAMDAFEPAIRALLAEFPSMP- TSVIMERVGW SRGRTVFFERVAVLRPLFVPPDPASRTEYGPGQLAQCDLWFPPVDVPVGFDQVARPPVLVMVSGFSRVITARML- PSRQSADLLA GHWELLLGWGRLPRALVWDNEAAVGRWRGGRPELTEPMNAFRGTLGIKVVLCAPRDPESKGLVERANGYLETSF- LPGRTFTSPA DFNAQLAAWLVRANQRQHRRLGCRPIDRWAADLAAMMAMPPVAPVVGWTASPLLPRDHYVRVDSNDYSVHPGVV- GRRVQVLADL DQVVVTCAGTVVAAHERCWARRQTITDADHAQAAAALRAAHRERVRRPVETDVAVRELADYDRIFGLQDDLDDH- PSVDVADGEV A (SEQ ID NO: 17) >BAC75258.1 [Streptomyces avermitilis MA-4680 = NBRC 14893] MEDWAEIRRLHRAEQMPVRAIARKLGIARNTVRRAIADDAPPKYQRAPKGSIVDAVEPQIRELLEQWPEMPATV- IAERIGWDRG LTVLKDRVRDLRPAYRPADPASRTVYEPGEIGQCDLWFPPADIPLGFGQVGRPPVLVMVAGYSRWITARMLPSR- SAADLIAGHW RLLTELGAVPRVLVWDNEGAVGSWRSGGPQLTDEFAAFAGLLGIKFLLCKPRDPESKGLVERANGYLETSFLPG- RVFTSPADFN IQLADWLTRANRRIHRTLQARPADRLEADRSRMLALPPIAPPGWWKASLRLPRDHYVRLDTCDYSVHPLAVGRR- IEVAAGLDQV LVTCDGVEVARHARSWARHQTITDPDHAAAAAVARKAAAGTKPAPVDVTEVEERSLDTYDRIFGVIDGGLSTGE- GAA (SEQ ID NO: 18) >WP_030291260.1 [Streptomyces viridifaciens] MIQVEDWAEIRRLHRAEGLSARAVARELGISRGTVLRALASDRPPVYRRAPKGSAVDAVEPAIRELLKQTPTMP- ATVIAERIGW ERGLSILRERVRELRPAYLPADPVSRTVYQPGELAQCDLWFPPVDIPLGYGQSGRPPVLVIVSGYSRMITARML- PSRTTGDLID GHWRLLNSWGAVPKTLVWDNETGVGKGRLTSEFAAFAGLLATRVHLCRPRDPEAKGLVERANGYLETSFVPGRT- FTGPDDFNRQ LTAWLRIANRRHHRSIDARPADRWEADRAQMLTIPPVAPPHWWPLRVRIGRDHYIRVDTNDYSVHPRVIGRTVT- VHADNEEITV TCGDDVVARHARCWARHQSLTDPDHAAAANLMRGEVIHQQAARAATARAAVLAPDSLGIEVEQRELGTYDRMFT- LIEGGAGKED T (SEQ ID NO: 19) >EUA10187.1 
[Mycobacterium kansasii 732] MEDWAEIRRLHRAEGLPIKTIARTLNISRNTVRSALAAEGPPKYERKPAGSAVDAFEDAIRAQLKAVPTMPATV- IAERVGWTRG MTVFKERVRELRPAFLPPDPAGRTTYEAGEIAQCDFWFPPTTVPVGYGQVRTPMQLPVLTMVCGYSRWLAAILV- PSRCAEDLFA GWWQLIHALGAVPRTLVWDGEGAIGRWRSGRVELTGQCQAFRGVLGAKVVVLKPREPEHKGIIERAHDYLERSF- LPGRTFSGPG DFNHQLQQWLQTVNRRTRRVLGCAPTERIGADRQAMLALPPVAPATGWRATTRLARDHYVRLDSNDYSVHPGVI- GRRIEVLADL DRVRVFCEGKLVADHERVWAWHQTITDPEHRAAANMLRCNRIGALRPVREPADQISVEQRALADYDTALGIDLG- EGGLVS (SEQ ID NO: 20) >EUA10187.1 [Mycobacterium kansasii 732] MEDWAEIRRLHRAEGLPIKTIARTLNISRNTVRSALAAEGPPKYERKPAGSAVDAFEDAIRAQLKAVPTMPATV- IAERVGWTRG MTVFKERVRELRPAFLPPDPAGRTTYEAGEIAQCDFWFPPTTVPVGYGQVRTPMQLPVLTMVCGYSRWLAAILV- PSRCAEDLFA GWWQLIHALGAVPRTLVWDGEGAIGRWRSGRVELTGQCQAFRGVLGAKVVVLKPREPEHKGIIERAHDYLERSF- LPGRTFSGPG DFNHQLQQWLQTVNRRTRRVLGCAPTERIGADRQAMLALPPVAPATGWRATTRLARDHYVRLDSNDYSVHPGVI- GRRIEVLADL DRVRVFCEGKLVADHERVWAWHQTITDPEHRAAANMLRCNRIGALRPVREPADQISVEQRALADYDTALGIDLG- EGGLVS (SEQ ID NO: 20) >WP_055721037.1 [Streptomyces niveiscabiei] MISVEDWAEIRRLHRAEQMPIRAIARQLGISKNTVKRALATDRPPVYSRPAKGSAVDAVEPQIRELLKQTPTMP- ATVIAERIGW DRGMTVLKERVRELRPAYLPVDPVSRTSYQPGELAQCDLWFPPADIPLGYGQSGRPPVLVMVSGYSRLIAARML- PTRTTGDLID GHWKLLTGWNAVPRMLVWDNEAGIGRGKVTGDFAAFAGLLATRIYLCRPRDPEAKGLVERANGYLETSFLPGRT- FTGPDDFNTQ LATWLAIANRRQHRTLGARPIDRWEADRAQMLTLPPVDPPRWWRFATRIGRDHYIRVDTCDYSVHPLAIGKKVQ- VRTDTDEVVV TLTPGGAEVARHPRCWAKQQTITDPVHARAAAVLRGDYRHHQASRAQAVRRHNTATASDLVEVEQRRLDSYDRL- FTLIEGGGQA DDPEVS (SEQ ID NO: 21) >WP_067466072.1 [Actinomadura macra NBRC 14102] MIKVEDWAEIRRLHRSEQMPIKAIARRLGVSKNTVKRALAADDPPKYRRAGKGSIVDAVEPQIRELLAEFPDMP- ATVIAERIGW ARSLTVLKDRVRVLRPQYRSPDPASRTTYQPGELAQCDLWFPPTKVPVGAGQQTSPPVLVMVSGYSRWLMARML- PSRAAGDLFA GMWALLRMLGSAPKTLVWDNEGAIGQWNGGRPRLTAEANAFRGTLGIKIVQCRPGDPEAKGVVERANGYLETSF- LPGRSFSGPD DFNAQLAGWLGHHANVRHHRRIECRPADRLVADRAAMVALPPVEPLVGWRTSTRLARDHYVRIASNDYSVHPSA- IGRLVEVVAD LEQVTVTCGGQNVAAHPRCWAVHQSITDPVHAQAAAAMRRSGLQVTRAPVDTQVEQRRLADYDALIGIEGTEGV- A (SEQ ID NO: 22) >WP_018254221.1 [Salinispora pacifica DSM 
45549] MDGVEPQVRALLAEFPRMPATVIAERIGWTKSLTILKDRVRELRPLFVPPDPTDRVEYDPGEVAQCDLWFPPQP- IPVGGGAERI LPVLAMTCGYSRVTDAVMIPSRKAGDILAGMWEIVAGWGACPRTLVWDREAAIGGTGKLTTEAAAFAGTIGVRI- RLAPPRDPEF KGLVERRNGFFETSFLPGRVFTSPFDFNVQISDWLVQRANTRVLRAIGLTTPTARWAADRAAMVALPPVAPAIG- LTHRVRLGRD YYVRIDGNDYSVDPRCIGRFIDVLATPARMVASCAGQVVADHDRDWGHARTITDPEHQATARLLRQDLAARRRQ- ASTRSHADGH VVAIRALPDYDALFGVDFDPRPDVEAVQKSAAEGK (SEQ ID NO: 23) >WP_029025341.1 [Salinispora arenicola CNH646] MDGVEPQVRALLAEFPRMPATVIAERIGWTKSLTILKDRVRELRPLFVPPDPTDRVEYDPGEVAQCDLWFPPQP- IPVGGGAERV LPVLAMTCGYSRVTDAVMIPSRKAGDILAGMWEIVAGWGACPRTLVWDREAAIGGTGKLTTEAASFAGTVGVRI- RLAPPRDPEF KGLVERRNGFFETSFLPGRVFTSPFDFNVQISDWIVQRANTRVLRAIGLTTPTARWAADRAAMVALPPVAPAIG- LTHRVRLGRD YYVRIDGNDYSVDPRCIGRFVDVFATPARMVASCAGQVVADHDRDWGHARTITDPEHQATARLLRQDLAARRRQ- ASTRSHADGH VVAIRALPDYDALFGVDFDPRPNREAVQKSAAEGT (SEQ ID NO: 24) >WP_018815463.1 [Salinispora pacifica CNT584] MDGVEPQVRALLAEFPRMPATVIAERIGWTKSLTILKDRVRELRPLFVPPDPTDRVEYDPGEVAQCDLWFPPQP- IPVGGGAERI LPVLAMTCGYSRVTDAVMIPSRKAGDILAGMWEIVAGWGACPRTLVWDREAAIGGTGKLTTEAAAFAGTIGVRI- RLAPPRDPEF KGLVERRNGFFETSFLPGRVFTSPFDFNVQISDWLVQRANTRVLRAIGLTTPTARWAADRAAMVALPPVAPAIG- LTHRVRLGRD YYVRIDGNDYSVDPRCIGRFIDVLATPARMVASCAGQVVADHDRDWGHARTITDPEHQATARLLRQDLAARRRQ- ASTRSHADGH VVAIRALPDYDALFGVDFDPRPDVEAVQKSAAGGK (SEQ ID NO: 25) >WP_018738805.1 [Salinispora pacifica CNS996] MDGVEPQVRALLAEFPRMPATVIAERIGWTKSLTILKDRVRELRPLFVPPDPTDRVEYDPGEVAQCDLWFPPQP- IPVGGGAERI LPVLAMTCGYSRVTDAVMIPSRKAGDILAGMWEIVAGWGACPRTLVWDREAAIGGTGKLTTEAAAFAGTIGVRI- RLAPPRDPEF KGLVERRNGFFETSFLSGRVFTSPFDFNVQISDWLVQRANTRVLRAIGLTTPTARWAADRAAMVALPPVAPAIG- LTHRVRLGRD YYVRIDGNDYSVDPRCIGRFIDVLATHARMVASCAGQVVADHDRDWGHARTITDPEHQATARLLRQDLAARRRQ- ASTRSHADGH VVAIRALPDYDALFGVDFDPRPDVEAVQKSAAGGK (SEQ ID NO: 26) >WP_040800760.1 [Nocardia higoensis NBRC 100133] MQDWAQIKYLYTSEGLSQRAIAARLGISRDTVARAIRSESPPHYQRAVGPSVFDGFEPHVRQLLAEFPTMPTSV- IAERVGWVGS ASWFRKKVAGLRPEYAPKDPADRLEYRPGDQVQCDLWFPPVTIALGADQFGTPPVLVMVASFSRFITAMMIPTR- TTADLVAGMW 
ALLSNQLGAVPRRLLWDNESGIGRRGQLATGVAAFTGMAATRIVQCKPFDPESKGIVERANGYLETSFLPGRRF- SSPADFNDQI GRWLPIANTRRVRRIAAAPVELIGTDRAAMTALPPVAPSVGFTCRSRLPRDYYLRVLGNDYSIDPTMIGRMVDV- HADLDTVAAR CEGHVVASHRRAWSTRQTITDPAHVETAARLRETFTNNRFRDAGADGMVRDLADYDAIFGVDFDTEGVA (SEQ ID NO: 27) >WP_079046208.1 [Streptomyces thermoautotrophicus] MTYGPGQVAQCDLWFPQTRIPVTAGQERMLPVLVMTLGFSRFMTATMIPTRQAGDILSGMWQLIRGIGRVTKTL- VWDREAAIGG TGKVSAPAAAFAGTLATTIRLAPPRDPEFKGMVERNNQYLETSFLPGRRFVSPADFNDQLGDWLVRANSRTVRS- IQGRPVDLLE ADCQAMIPLPPVTPPIGLNHRVRLGRDYYVRVDTVDYSVDPQAIGRFVDVTASLETVTVLCDGQLVARHARSWA- RQGVITDPVH AATAARMRQALAEDRQRRQAAVRRHADGRAVALRALPDYDALFGVDFNPPSTKAK (SEQ ID NO: 28) >WP_018802787.1 [Salinispora arenicola CNY231] MLSVEDWAEIRRLRRSEGMAIQATARRLRMSRNTVKKALASDEPPRYRRVAKGSIVDAVEPQTRALLAEFPEMP- TTVIMVRVGW TRGKTVFCDRVQQLRPLFRRPDPAQRTEYLPGELAQCDLWFPPADVSLGFGQVGRPPVLVMVSGYSRWLSAVMI- PSRQSPDLLG GHWTLISGWDRTPKGLVWDNESAVGQWRAGRPQLTEAMNAFRGTLGIKVIQCRPADPEAKGLVERANGYLETSF- LPGRRFASPQ DFNAQLTEWLVRANNRQHRMLGCRPVDRWDADRAAMLSLPPVAPVVGWRQSTRLPRDHYVRLDSNDYSVHPAAV- GRRVDIVADA DRVQVFCENRLFARHDRCWAKHQSITDPAHRQAAADLRTAARQTPAAAGTTEVEHRQLADYDRMFGLDEVAA (SEQ ID NO: 29) >WP_034088311.1 [Streptacidiphilus albus JL83] MISVEDWAEIRRLHRAEEMPIRAIARHLGISKNTVKRAIATDRAPVYERAAKGSAVDAFEPAIRELLKATPSMP- ATVIAERIGW ERGITVLKERVRELRPAYLPADPTGRTQYLPGELAQCDLWFPPVHIPVGYGQVACPPVLVMVSGYSRMITARMI- PTRQTGDLIA GHWRLLSDWGTVPKMLVWDNESGIGQGKLTTEFAAFAGLLAVKVHLCRPRDPEAKGLVERANGYLETSFVPGRT- FTGPDDFNTQ LGDWLQGANRRLHRSIQARPVDRWEADRAAMLALPPVGPPQWYLFHTRIGRDHYLRIDLNDYSVHPRAIGRRVQ- VTCDADLIRV VTDGGDLVAEHPRCWARHQTLTDPDHKSAADQMRGDFIHAKAAAAARSRTATALAPDNLGIEVEERQLDTYDRI- FTLIQGGAGQ EENR (SEQ ID NO: 30) >WP_081684113.1 [Granulicoccus phenolivorans DSM 17626] MEDWALIRRLVADGVPQRQVARDLGIGRSTVARAVASDRPPKYERPAVPTSFTSFEPAVRQLLTDTPSMPATVL- AERVGWEGSI TWFRAHVRRLRPEHRPIDPSDRLTWLPGDAAQCDLWFPPKKIPLEDGTTSLLPVMVITAAHSRFMVARMIPTRH- TQDLLLGMWE LLQQLGRVPRRLIWDNEAGIGRGKRNAEGIGAFTGTLATTLVRLKPYDPESKGVVERRNGFLETSFMPGRSFAS- AADFNAQLTE 
WLETANARVVRTIKARPVDLLDADRAAMLPLPPVAPVIGWVNRVRLGRDYYVRVDSNDYSVDPNVIGRFVEVRA- DLSRVVVRHD GRRVAAHDRVWARGMTVTDPAHVTAAKALREHFQRPRPAADPGEAFDRDLADYDRAFGLLNGGLGNAAAADPGD- GTVAGLGAGA VAGLRDGEVA (SEQ ID NO: 31) >WP_081684113.1 [Granulicoccus phenolivorans NBRC 107789] MEDWALIRRLVADGVPQRQVARDLGIGRSTVARAVASDRPPKYERPAVPTSFTSFEPAVRQLLTDTPSMPATVL- AERVGWEGSI TWFRAHVRRLRPEHRPIDPSDRLTWLPGDAAQCDLWFPPKKIPLEDGTTSLLPVMVITAAHSRFMVARMIPTRH- TQDLLLGMWE LLQQLGRVPRRLIWDNEAGIGRGKRNAEGIGAFTGTLATTLVRLKPYDPESKGVVERRNGFLETSFMPGRSFAS- AADFNAQLTE WLETANARVVRTIKARPVDLLDADRAAMLPLPPVAPVIGWVNRVRLGRDYYVRVDSNDYSVDPNVIGRFVEVRA- DLSRVVVRHD GRRVAAHDRVWARGMTVTDPAHVTAAKALREHFQRPRPAADPGEAFDRDLADYDRAFGLLNGGLGNAAAADPGD- GTVAGLGAGA VAGLRDGEVA (SEQ ID NO: 31)
TABLE-US-00003 TABLE 3 Amino acid sequences of Representative CLUST.009925 Effector B Proteins >WP_043464542.1 [Streptomyces fradiae] MTTTTATSSGRDVTSELAYLTRVLKAPALREAASRLAERAEAEQWSFEEYLAACLQREVAARDAHGAESRIRAA- RFPSRKSLED FDFDHQRSVKRETITHLGTLDFVAGKENVVFLGPPGTGKTHLATGLGIRACQAGHRVAFATAAEWVTRLAKAHE- TGRLDEELVR LGRIPLIIVDEVGYIPFEPEAANLFFQFISGRYERASVIVTSNKPFGRWGEVFGDDTVAAAMIDRLVHHAEVIS- LKGDSYRMRG RDLGRVPAANTRE (SEQ ID NO: 32) >WP_067342984.1 [Streptomyces noursei ATCC 11455] MPARAATDDAAPATSRRTGQQTAADLAFLARAMKAPALLDAAERLAERARKESWTHAEYLVACLQREVSARESH- GGEARVRAAR FPAIKTVEELDVTHLRGMTRQQLAHLGTLDFITGKENAVFLGPPGTGKTHLAIGLGVRACQAGHRVAFATASEW- VDRLAAAHQA GRLQVELTKLGRYPLIVIDEVGYIPFEAEAANLFFQLISNRYERASVIVTSNKPFGRWGEVFGDETVAAAMIDR- LVHHAEVHSF KGDSYRMKGRELGRIPHDTTDND (SEQ ID NO: 33) >WP_068751156.1 [Tessaracoccus sp. T2.5-30] MSTKDLTGEIGFLARELKTPVIAETFTDLGDRARAEGWSHEEYLAAVLGRQVASRTANGTRLRISAAHLPAVKT- VEDFVFDHIP AASRDLIAHLATCTFIPKRENVVLLGPPGTGKTHLAIALGIKAAEAAHPVLFDSATGWINRLAAAHASGALERE- LRRLKRYRAL IIDEVGYLPFDTTAAALFFQLIASRYETGTVIVTSNLPFSRWGETLGDDVVAAATIDRLVHHAHVIGLDGDSYR- TRGHRRQPTK (SEQ ID NO: 34) >WP_086568727.1 [Streptosporangium sp. 
M26] MAATTTNTTANTTATSGRNLAAELAYLTRVLKAPSLAASIDRLAERARTESWTHEEFLAACLQREVAARDSHGG- EARIRFARFP ARKALEDFDYDHQRSLKREVIAHLGTLDFVMAKDNVVFLGPPGTGKTHLSIGLGIRACQAGHRVAFATAAQWVA- RLAEAHAAGT LQAELIKLSRIPVLIVDEVGYIPFEPEAANLFFQLVSSRYERASLIVTSNKPFGRWGEVFGDDVVAAAMIDRLV- HHAEVISLKG DSYRLKNRDLGRVPAANPSNDQ (SEQ ID NO: 35) >WP_067067772.1 [Streptomyces thermoautotrophicus] MTTTARPKTSTPKDGLPSLIAYLTRVLKTPTIGAFWEELATQARDENWSHEEYLAALLQRQVADRESKGTTMRI- RTAHFPQVKT LEDFNLDHLPSLRRDVLAHLATSTFVAKAENVILLGPPGIGKTHLAIGLGVKAARAGYSVLFDTASNWIARLAA- AHHAGRLETE LKKIRRYKLIIIDEVGYIPFDQDAANLFFQLIASRYEQGSVMVTSNLPFGRWGETFSDDVVAAAMIDRLVHHAE- VLTLTGDSYR TRQRRELLAKENRASRN (SEQ ID NO: 36) >WP_012181245.1 [Salinispora arenicola CNS-205] MATKTSRNVSSEIAFLTRALKAPSLAASVERLAERARAESWTHEEFLAACLQREVAARESHGGEGRIRAAKFPA- RKSLEEFDFD HQRSLKREATAHLGTLDFITGRENVVFLGPPGTGKTHLSIGLGIRACQAGHRVAFATAAQWVSRLADAHHAGRL- QDELVKLGRI PLLIVDEVGYIPFEAEAANLFFQLVSNRYERASLIVTSNKPFGRWGEVFGDDVVAAAMIDRLVHHAEVVSMKGD- SYRLKDRDLG RVPAATKTND (SEQ ID NO: 37) >WP_052251142.1 [Trueperella pyogenes] MTTKDHRQIDSELAHISKVLKAPRIHATYFETAEQAREDGWSFEEYLAAVLSVEASARQESGANARIKRAGFPQ- VKTIEEFDFT IQPSIDRAKIARLETSAWISQASNVIFLGPPGTGKTHLAIGLGVIAARQGYRVLFDTAAGWVQKLTQAHDRGEL- PKLLTKLGRY DLLVVDEVGYIPIEAEAANLFFQLVSTRYEKASLIMTSNLPFSRWGECFGDQTIAAAMIDRVVHHAEILTHKGT- SYRINGHEDI LPSVNAERGKALK (SEQ ID NO: 38) >WP_012182724.1 [Salinispora arenicola CNS-205] MATSSSRNVASEIAFLTRALKAPSLAASVERLAERARAESWTHEEFLAACLQREVAAREAHGGEGRIRAARFPA- RKSLEEFDFE HQRSLKRETIAHLGTLDFVASKENVVFLGPPGTGKTHLSIGLGIRACQAGHRVAFATAAGWVSRLADSHHAGRL- QDELVKLGRI PLLIVDEVGYIPFEAEAANLFFQLVSNRYERASLIVTSNKPFGRWGEVFGDDVVAAAMIDRLVHHAEVISMKGD- SYRLKDRDLG RVPAATKTND (SEQ ID NO: 39) >WP_055422288.1 [Streptomyces pactum] MSMKNGTNQARTSRDVGSELIYLTKALKAPALRDAAARLAERARDEGWSHEEYLAACLQREVAARDSHGAEGRI- RAARFPSRKS LEDFDFDHQRSVKREVIAHLGTLDFVAGKENVIFLGPPGTGKTHLATGLGIRACQAGHRVAFGTAAQWVTRLAE- AHQAGRLSDE LTRLGRIPLIVVDEVGYIPFEPEAANLFFQFISGRYERASVIVTSNKPFGRWGEVFGDDTVAAAMIDRLVHHAE- VISLKGDSYR MRGRDLGRVPAANTGE (SEQ ID NO: 40) 
>WP_015288589.1 [Mycobacterium canettii CIPT 140060008] MAAKTATNSRDVAAELAYLTRALKAPTLRGAIERLADRARTKTWSYEEFLAACLQREVSARESHGGEGRIRAAR- FPSRKSLEEF DFDHARGLKRDTIAHLGTLDFVTLAIGIAIRACQAGHRVLFATASQWVDRLAAAHHSGTLQSELIRLARYPLLV- VDEVGYIPFE PEAANLFFQLVSSRYERASLIVTSNKPFGRWGEVFGDDVVAAAMIDRLVHHAEVIALKGDSYRIKDRDLGRVPT- VTADDQ (SEQ ID NO: 41) >WP_013772021.1 [Cellulomonas fimi ATCC 484] MTATKTTARDVSSELVFLTRALKAPTMRDAVDRLAERARAESWTHEEFLAACLQREVAAREAHGGEGRIRAARF- PARKSLEDFD FEHARGLARDQTAHLGTLDFVAARDNVVFLGPPGTGKTHLATGIAVRACQAGHRVLFATASEWVDRLASAHHDG- RLQDELRRLG RYPLLVIDEVGYIPFEPEAANLFFQLVSARYERASLIVTSNKPFGRWGDVFGDDTVAAAMIDRLVHHADVIALK- GDSYRLKNRD LGRPPAATTD (SEQ ID NO: 42) >BAC75257.1 [Streptomyces avermitilis MA-4680 = NBRC 14893] MMSTKNGTNQARTSRDVGSELIYLTKALKAPALRDAAARLAERARDEGWSHEEYLAACLQREVAARDSHGAEGR- IRAARFPSRK SLEDFDFDHQRSVKREVIAHLGTLDFVAGKENVIFLGPPGTGKTHLATGLGIRACQAGHRVAFGTAAQWVARLA- EAHQAGRLSD ELTRLGRIPLIVVDEVGYIPFEPEAANLFFQFISGRYERASVIVTSNKPFGRWGEVFGDDTVAAAMIDRLVHHA- EVISLKGDSY RMRGRDLGRVPAANTGE (SEQ ID NO: 43) >EUA10185.1 [Mycobacterium kansasii 732] MASSNHRDLSAEISFLTRALKAPTMREAVARLAERARTESWTHEEFLVACLQREVSARESHGGEGRIRAARFPS- RKSLEEFDFD HARGLKRDLIAHLGTLDFVTARDNVVFLGPPGTGKTHLAIGIAIRACQAGHRVLFATASEWVARLAEAHHGGRL- QPELLRLGRY PLLVIDEVGYIPFEPEAANLFFQLVSSRYERASLIVTSNKPFGRWGEVFGDDVVAAAMIDRLVHHAEVIALKGD- SYRIKDRDLG RVPGSTTEE (SEQ ID NO: 44) >WP_055721038.1 [Streptomyces niveiscabiei] MPATTRTTAAAGPRTGRQTAADLSFLARAMKAPALLDAAERLAERALKETWTHTEFLVACLQREVSARESHGGE- ARIRTARFPA IKTIEELDVTHLRGITRQQLAHLGTLDFITAKENAVFLGPPGTGKTHLAIGLAVRACQAGHRVAFATAAEWVDR- LAAAHHAGHL QAELTKLSRYPLIVVDEVGYIPFESEAANLFFQLVSNRYERASVIVTSNKPFGRWGEVFGDETVAAAMIDRLVH- HAEVHSLKGD SYRMRGRQLGRVPTATHDTD (SEQ ID NO: 45) >WP_067466075.1 [Actinomadura macra NBRC 14102] MAGVRAKTTATAPNSGRNVDAELAYLTRVLKAPSLAASVQRLAERARAESWTHEEFLAACLQREVAAREAHGSE- ARIRSARFPA RKALEDFDYDHQRSLKREVIAHLGTLDFVAARENAVFLGPPGTGKTHLSIGLGIRACQAGHRVCFATAAGWVAR- LAQAHTAGRL QDELTKLARIPVLIVDEVGYIPFEPEAANLFFQLVSSRYERASLIVTSNKPFGRWGEVFGDDVVAAAMIDRLVH- HAEVISLKGD 
SYRLKNRDLGRVPAASTTNDQQ (SEQ ID NO: 46) >WP_018252763.1 [Salinispora pacifica DSM 45549] MTTITKTRPAPDGLGSKLAYLTRVLKTPTIGRTWETLADQARQANWSHEEYLAAVLERQVADRESAGTTMRIRT- AHFPAIKTLE DFNLDHLPSLRKDVLAHLATATFIPKAENVILLGPPGLGKTHLAIGLGIKATQSGYSVLFDTATNWIDRLARAH- HRGALEAELK KIRRYKLIIVDEVGYIPFDTDAANLFFQLVASRYEAGSILVTSNLPFGRWGEVFGDEVVAAAMIDRLVHHAEVL- TLAGESYRTR ARRELLAKDRNK (SEQ ID NO: 47) >WP_018802534.1 [Salinispora arenicola CNH646] MTTTTKSTPTPDGLGSKLAYLTRVLKTPTIGRTWEILADQAREANWSHEEYLAAVLERQVADRESAGTTMRIRT- AHFPAIKTLE DFNLDHLPSLRKDVLAHLATATFIPKAENVILLGPPGLGKTHLAIGLGIKATQSGYSVLFDTATNWIDRLARAH- HAGALEAELK KIRRYKLIIVDEVGYIPFDTDAANLFFQLVASRYETGSILVTSNLPFGRWGEVFGDEVVAAAMIDRLVHHAEVL- TLAGESYRTK ARRELLAKDRGK (SEQ ID NO: 48) >WP_018815462.1 [Salinispora pacifica CNT584] MTTITKTRPAPDGLRSKLAYLTRVLKTPTIGRTWETLADQARQANWSHEEYLAAVLERQVADRESAGTTMRIRT- AHFPAIKTLE DFNLDHLPSLRKDVLAHLATATFIPKAENVILLGPPGLGKTHLAIGLGIKATQSGYSVLFDTATNWIDRLARAH- HRGALEAELK KIRRYKLIIVDEVGYIPFDTDAANLFFQLVASRYEAGSILVTSNLPFGRWGEVFGDEVVAAAMIDRLVHHAEVL- TLAGESYRTR ARRELLAKDRNK (SEQ ID NO: 49) >WP_018738804.1 [Salinispora pacifica CNS996] MTTTTKTRPAPDGLGSKLAYLTRVLKTPTIGRTWETLADQARQANWSHEEYLAAVLERQVADRESAGTTMRIRT- AHFPAIKTLE DFNLDHLPSLRKDVLAHLATATFIPKAENVILLGPPGLGKTHLAIGLGIKATQSGYSVLFDTATNWIDRLARAH- HRGALEAELK KIRRYKLIIVDEVGYIPFDTDAANLFFQLVASRYEAGSILVTSNLPFGRWGEVFGDEVVAAAMIDRLVHHAEVL- TLAGESYRTR ARRELLAKDRNK (SEQ ID NO: 50) >WP_040800758.1 [Nocardia higoensis NBRC 100133] MNTDTNKQIEYYANALKAPRIRDSAARLAEQARDAGWTHEEYLAAVLSREVSSRESSGAETRIRAAGFPARKAI- EEFNFDHQPA LKRDTLAHLGTAQFISKAQNVVLLGPPGTGKTHLSIGLGIAAAHHGHRVLFATAVEWVTRLQTAHQQGRLAAEL- AKLRRYGLLI VDEVGYIPFEQDAANLFFQLVSSRYEHASLVLTSNLPFSRWGDVFSDHVVAAAMIDRIVHHADVLTLKGNSYRL- RNTEIDTLPS MRGDNQAD (SEQ ID NO: 51) >WP_018802788.1 [Salinispora arenicola CNY231] MATSSSRNVASEIAFLTRALKAPSLAACVERLAERARAESWTHEEFLAACLQREVAAREAYGGEGRIRAARFPA- RKSLEEFDFE HQRSLKRETIAHLGTLDFVASKENVVFLGPPGTGKTHLSIGLGIRACQAGHRVAFATAAGWVSRLADSHHAGRL- QDELVKLGRI 
PLLIVDEVGYIPFEAEAANLFFQLVSNRYERASLIVTSNKPFGRWGEVFGDDVVAAAMIDRLVHHAEVISMKGD- SYRLKDRDLG RVPAATKTND (SEQ ID NO: 52) >WP_081983172.1 [Streptacidiphilus albus JL83] METDQDATEPASTKAASGRRTAKQTASDLAFYARAMKAPVLLDAAERLAERARAETWTHAEYLVAVLQREVAAR- ESHGGEGRIR AARFPAVKTLEELDVTHLRGLTRQQLAHLGTLDFITGKENAIFLGPPGTGKTHLATGLAVRACQAGHRTAFATA- AQWVDRLKEA HAAGRLQDELVKLGRYPLIVIDEVGYIPFEADAANLFFQLISNRYERASVIVTSNKPFGRWGEVFGDETVAAAM- IDRLVHHAEV HSLKGDSYRMRGHDLGRVPTAVNETS (SEQ ID NO: 53) >WP_026927625.1 [Granulicoccus phenolivorans DSM 17626] MATKKNEASEALKQLTYLASALKAPRITEAAARLADHARDAGWTYEEYLAAVLDREVAARNASGAQLRIRAAGF- PARKTIEEFD WDAQPAVRQQIAALASGGFLTEARNVVLLGPPGTGKTHLATGLGIAAANHGHRILFATATEWVTRLTDAHRAGR- LPLELTRLRR YGLIIVDEVGYLPFDQDAANLFFQLVSSRYEHASLILTSNLPFSGWGGVFGDQAVAAAMIDRVVHHADVLTLKG- ASYRLRNRGI ETLPSIKTQDTAD (SEQ ID NO: 54)
TABLE 4 Nucleotide sequences of Representative CLUST.009925 Direct Repeats
CLUST.009925 Effector Protein Accession | Direct Repeat Nucleotide Sequence
WP_072653819.1 (SEQ ID NO: 1) | GTCCTCCCCGCGCCCGCGGGGGTCAGCCG (SEQ ID NO: 55)
WP_003414864.1 (SEQ ID NO: 2) | TGCCCGCCTCCTCACTCGCGCCATTCCGGCGCTCGCCG (SEQ ID NO: 56)
WP_043464545.1 (SEQ ID NO: 3) | GTCCTCTCCGCGCGAGCGGAGGTGAGCCG (SEQ ID NO: 57)
WP_043464545.1 (SEQ ID NO: 3) | GTCCTCTCCGCGCGAGCGGAGGTGAGCCG (SEQ ID NO: 57)
WP_043464545.1 (SEQ ID NO: 3) | GTCCTCTCCGCGCGAGCGGAGGTGAGCCG (SEQ ID NO: 57)
WP_067342985.1 (SEQ ID NO: 4) | GCTGCTGTGGTGCCGCGGCT (SEQ ID NO: 58)
WP_068751155.1 (SEQ ID NO: 5) | GCGACGACGGCGCGGGTCAG (SEQ ID NO: 59)
WP_086568729.1 (SEQ ID NO: 6) | GGAGTCGATGGCGTGCTGGTAGGCGGC (SEQ ID NO: 60)
WP_083119439.1 (SEQ ID NO: 7) | CCGGAGTCGGGTGGGGTGTT (SEQ ID NO: 61)
KWX05882.1 (SEQ ID NO: 8) | GTTGCGACCCCTCGTAGGGGCGATGAGGAC (SEQ ID NO: 62)
WP_012181244.1 (SEQ ID NO: 9) | CGGAGCACCCCCACGTGCGTGGGGAGGAC (SEQ ID NO: 63)
ALD73443.1 (SEQ ID NO: 10) | GTAGTGTGCATGGTGTACAT (SEQ ID NO: 64)
WP_012182723.1 (SEQ ID NO: 11) | CTGCTCCCCGCGCATGCGGGGGTGATCCC (SEQ ID NO: 65)
WP_077349293.1 (SEQ ID NO: 12) | GTCCGTCCCCGCCACGCGGGGGTGAGCCC (SEQ ID NO: 66)
WP_055422289.1 (SEQ ID NO: 13) | GTCCTCTCCGCGCGAGCGGAGGTGAGCCG (SEQ ID NO: 57)
WP_055422289.1 (SEQ ID NO: 13) | GTCCTCTCCGCGCGAGCGGAAGTGAGCCG (SEQ ID NO: 67)
WP_015288588.1 (SEQ ID NO: 14) | CACTTGAGGGCGCACAGGCGCCCGAA (SEQ ID NO: 68)
CCK59973.1 (SEQ ID NO: 15) | TTGCCGCCATTACCGCCGGCGCCG (SEQ ID NO: 69)
AEE46996.1 (SEQ ID NO: 16) | GCGTCAGAAGGCGCGCGTGATCATCGCGCGCTTG (SEQ ID NO: 70)
WP_015748250.1 (SEQ ID NO: 17) | GGGCTCATCCCCGCGGGCGCGGGGAGCAC (SEQ ID NO: 71)
BAC75258.1 (SEQ ID NO: 18) | CGGTTCACCTCCGCTCGCGCGGAGAGCAC (SEQ ID NO: 72)
WP_030291260.1 (SEQ ID NO: 19) | CGGCTGACCCCCGCGGGCGCGGGGAGGAC (SEQ ID NO: 73)
EUA10187.1 (SEQ ID NO: 20) | CCCCTGTGAGTCGAGTGAGCGGAGCGAGCG (SEQ ID NO: 74)
EUA10187.1 (SEQ ID NO: 20) | GGTGGCCACCGTGGCGATATT (SEQ ID NO: 75)
WP_055721037.1 (SEQ ID NO: 21) | CGGTTCACCCCCGCGTCCGCGGGAAGCAC (SEQ ID NO: 76)
WP_067466072.1 (SEQ ID NO: 22) | GGGACCATCCCCGCGTGCGCGGGGAGCAG (SEQ ID NO: 77)
WP_018254221.1 (SEQ ID NO: 23) | GTTGCGATCCCTGGTAGGGGCGATGAGGAC (SEQ ID NO: 78)
WP_029025341.1 (SEQ ID NO: 24) | GGAAGACCCCCGCGCGTGCGGGGACGAG (SEQ ID NO: 79)
WP_018815463.1 (SEQ ID NO: 25) | GTCCTCCCCACGCACGTGGGGGTGCTCCG (SEQ ID NO: 80)
WP_018738805.1 (SEQ ID NO: 26) | CGGAGCACCCCCACGTGCGTGGGGAGGAC (SEQ ID NO: 63)
WP_040800760.1 (SEQ ID NO: 27) | GTCCTCATCGCCCCTACGAGGGATCGCAAC (SEQ ID NO: 81)
WP_079046208.1 (SEQ ID NO: 28) | GTTGCGACCCCTCGTAGGGGCGATGAGGAC (SEQ ID NO: 62)
WP_018802787.1 (SEQ ID NO: 29) | GCGTCCGCCGCTCCTGGTTC (SEQ ID NO: 82)
WP_034088311.1 (SEQ ID NO: 30) | TGTTCCCTGCGCCCGCAGGGATGGTCCC (SEQ ID NO: 83)
WP_081684113.1 (SEQ ID NO: 31) | GTGCTCCCCGCGCGAGCGGGGATGATCC (SEQ ID NO: 84)
WP_081684113.1 (SEQ ID NO: 31) | GTGCTCCCCGCGCGAGCGGGGATGATCC (SEQ ID NO: 84)
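Claim 1 describes a Guide consisting of a direct repeat sequence and a spacer sequence. As a rough illustration of how a direct repeat listed in Table 4 might be combined with a spacer into an RNA guide, the Python sketch below joins the repeat listed for WP_072653819.1 (SEQ ID NO: 55) with a hypothetical 32-nt spacer and transcribes the result. The 5'-repeat-spacer-3' order, the spacer sequence and length, the absence of any repeat processing, and the helper names `dna_to_rna` and `assemble_guide` are all assumptions made for illustration; Table 4 does not specify them.

```python
# Hypothetical sketch: build an RNA guide from a Table 4 direct repeat plus a spacer.
# ASSUMPTIONS: 5'-repeat-spacer-3' order, an arbitrary 32-nt spacer, and no
# processing or trimming of the repeat. None of these are specified by Table 4.

VALID_DNA = set("ACGT")


def dna_to_rna(seq: str) -> str:
    """Return the RNA equivalent of a DNA sequence by replacing T with U."""
    seq = seq.upper()
    invalid = set(seq) - VALID_DNA
    if invalid:
        raise ValueError(f"non-ACGT characters: {sorted(invalid)}")
    return seq.replace("T", "U")


def assemble_guide(direct_repeat: str, spacer: str) -> str:
    """Join repeat and spacer in the assumed order and return the RNA sequence."""
    return dna_to_rna(direct_repeat + spacer)


# Direct repeat listed for WP_072653819.1 in Table 4 (SEQ ID NO: 55).
DR_SEQ_ID_55 = "GTCCTCCCCGCGCCCGCGGGGGTCAGCCG"

# Hypothetical spacer; in practice it would be chosen to match the target site.
EXAMPLE_SPACER = "ATGC" * 8

if __name__ == "__main__":
    guide = assemble_guide(DR_SEQ_ID_55, EXAMPLE_SPACER)
    print(len(DR_SEQ_ID_55), len(EXAMPLE_SPACER), len(guide))  # 29 32 61
    print(guide)
```

The validation step rejects ambiguity codes such as N rather than silently passing them through, which keeps a mistyped repeat from producing a plausible-looking guide.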
TABLE-US-00005 TABLE 5 Non-coding Transposon End Sequences of Representative CLUST.009925 Systems >Left Transposon End for WP_068751155.1 GCGTGTCGATGCCGAGTGGATACTGACCCCGTGGCGCCGAGTGAATTTTGACCCCTCTCCGTCAGAGTGAGGGA (SEQ ID NO: 85) >Right Transposon End for WP_068751155.1 AACCAACAACAACCATGAGAGGGGTCAATTTTCGATCGGCAACAGGGGGTCAGTTTTCGTCCGGCGTTGACAAC- GCGCTCGCCG TCGGCCAGGTCGAACTGCGGGTTGTGGGCCGAGGCGTGTGCGATGAGCCGTTCGCCCTCCCACACCGTGGGCAG- CAACGTCGCC GACGGCGCCCCGTCGCCCCCCGTGATCCAGAGTGCGGCGCCCACCTCGCTGAGGAGCGCGCGACAGCGGTCGAC- GTTGTCCACC CGCTGGCTGGCGGGGACGTACACGGTGCTGGCTGGGATGTCGGGCTCCATGGTCGGCCATCGTAATCGGGCCGC- ACGCCCTGCC GCGGAGGCCCTCACCGAGGGC (SEQ ID NO: 86) >Left Transposon End for WP_012181244.1 TTCCACCCTCTGACGGTACGTGCGTGACGAGCGCGGGACGCGACGGACGAGGTTCTGACGCCCCCAGATGGCTG- GCCTCCACGC GGGCTGGTGTCGGCTCCGCCTGAAAACTGACCCTGTGGTTCCGGCTGAATTTTGACCCCTTCCGAAAGCATCGG- GAG (SEQ ID NO: 87) >Right Transposon End for WP_012181244.1 ACATCAACAACCAACGAGGGGGTCAAAATTCGGCCGGAACAGAGGGGTCAAATTTCAGCCGGCGTTGACAGCTG- GTCCCCGCAA CTGGGCGTCGCTGCGGGGACCAGCTTGTCGGTGGCCTTCGGCCACGGCTGCTCCACCTGCGCGTGGCCGTGGCC- AGCAGCCGAG CGGTGACCTCGTCGACCTGTTCAGGTTCGCCGCAGGCCCGACCGCTCGCCAGCGCCGGGCCCCGCGCTGTGGAT- AACCCCGATG CCTGTGGATGATCCCCTGATCATGGCCAGATCTGCTAGCCCTCGTATCACCAGCAGCGCGGACGCAGGGGCGGG- CCGTAGAAGG GACCACACTCGTATGGCACCAAGAGCAAGGCGTGGACGTTTGAAGCCTCAATGAAGGGCAGCTCCGCTGGGAGA- TGCGACCCCG GAAGCCGCCCAGATCTTCACTGCGCACGTTAAACAGGCCTCAATGATTTCAGCCGGCGTTGACAGCGTCGAGTG- GTTCAGAGGA AGCGTTGGCCCATCACGAGACCACCGGAGCGGTGATGTAGGACCGCTGCAACCCTTCGATCAGTCCAGCTCGCG- CAGAGGGTCC ACCGTGGGTGCTATCAGGAGCGT (SEQ ID NO: 88) >Left Transposon End for WP_015748250.1 TAACTGCCGTCCGTCTTCGTGGAGGCTGTCGGTTCCGGGTGAATCCTGACCCAGCTGGTCCGGTTGAAAACTGA- CCCACCTCGC AACGATCGTGGG (SEQ ID NO: 89) >Right Transposon End for WP_015748250.1 CGTTGCCGTTCCATCTCGTCGAAGAAGACCCGCTGATTGCGAGGAAACGACGATGGCAGGACGTCAGGACAGGG- CTCGGTGATG ACAGTGATGAGGGCGCGTCAGGAACCGGGGATAACGCCCGGTACCGGCAAGACCCACCTCGCGACGGGCATCTC- GATCCGCGCC 
TGCCAAGCCGGCCATCGCGTCCTGTTCGCGACCGCCGCCGAGTGGGTCGCTCGCCTGGCCGAGGCCCACCACGC- CGGCCGGCTG CAACCAGAACTCGTTCGTCTGGGCCGCTACCCGTTGCTCGTGATCGACGAGGTCGGCTACATCCCGTTCGAAGC- CGAGGCCGCG AACCTGTTCTTCCAACTCGTGTCCAACCGCTACGAACGGGCCTCGCTCATCGTCACCAGCAACAAGCCCTTCGG- CCGGTGGGGC GAAGTCTTCGGCGACGACGTCGTCGCTGCCGCGATGATCGACCGCCTGGTCCACCACGCTGACGTCATCGCCCT- CAAAGGCGAC AGCTACCGACTCAAAGACCGCGACCTGGGCCGCCCACCAGCGGCCAACACCGACCAATGACCACCAGGTGGGTC- AGTTTTCAAC CGGACAAGGAGGGTCCAAGTTCAACCGGAGTTGACAGAGGCGATCACGGGCA (SEQ ID NO: 90) >Left Transposon End for WP_018254221.1 CCCAGATCAGCGTCAACACACGGCCTCCGATCTTGCCGGCGGCGCTCGGTGACAGGCCCACCTCGTGCAGCAAC- GTGCGCAGCA GCCGGAACCACACGCCGGCGGGCACGTCCCGGCCGAGCAGGCTGACCCGGCCGGTCATCAGCGCCTGGTGGGTG- TAGCGGTCCA GGTCGGCAAGCGGCGCGGCCACCGGCACTGGATCCGACGGCCGGCCGTGCCGCCATCTCGATGTCGACGGTGGT- GGCATCCTCC AGCCGGCAGCCGTGCTCGCCGCAACTGACCATCAGCGGTAGCCGACCCAACAGCGCGCGGCCCCTGCCCGGCAG- GCCGGCGCTG ACCGGGCACGTCCGGTTATGCCACTGCCGGGGCAGCCACGGTCCGCGCCACCGCCGGTATCGGGCGAGCCCGTT- GCCGATGCCG GCGCGGGGTGCGTCTTTGATCCACGACCAGTTCGATCCCTGGTAGGGGTGTCGACGACGCGCTGAGACTGACCC- TCTAGCGGCG GATTGGGACTGACCCCTCCGAGGCTGGAGGGGTGATCGCATTGGAAGACTGGGCTGAGGTCCGTCGGTTGCATC- GGGCTGAGGG TGTTCCGATCAAGGAGATTGCGCGTCGGTTGGGTTTGGCGCGTAACACGGTCCGTTCGGCGCTTCGGGCCGAGG- CGCCGCCGTC GCGTGAGCGCGGCCCTCGCGGTTCGTGT (SEQ ID NO: 91) >Right Transposon End for WP_018254221.1 CACGAAACGACCTCCGGGGGGTCAATCCCAAAACGTCGATAGGGGGTCAGTTCCAATCCGCCGTTGA (SEQ ID NO: 92) >Left Transposon End for KWX05882.1 TGGTGGTGCCCCCCTGTCGTTGCTTCCTGAAAACTGACCCCCAGGCAGTCTCCGAATCTTGACCCCCTCCTAAC- TCTTGGAGGG TGCTGAAG (SEQ ID NO: 93) >Right Transposon End for KWX05882.1 CCCGACCCCAGGGGATCAATTTTCCCGGAGCGGAAGGGGGTCAGTTTTCGCGAAGCGTTGACACCCCCCAGATG- CCGATCTGGT GTGTGG (SEQ ID NO: 94) >Left Transposon End for WP_029025341.1 GCATCTGCGCCGCCACGACCTACGGCACACCGGACTCACCTGGATGGCTGACGCAGGTGTGCCGGTGCACGTCC- TGCGGAAGAT TGCCGGACACGGGTCGCACCACCACCCAGCGCTACCTACACCCCGACCGGCAGTCGTTCGCCGACGCCGGAACG- ACGCTGAGCG CCCGCTTGAAGGCCCGCCGGTCCCCAGATGGTCCCCAGCTACGCGCCGTAGATCAGGAAAGTCCTACTACCCCA- 
GTACGAATTA GAGGCCGTTGACCAGGGTTTCACCCCGGTCGACGGCCTCTTTTCGTGCTGTCGGGACGGCCGGATTCGAACCGA- CGACCCCTTG TTACCAACCGTAGTCACACCGGGCATGTCATCGCTTGTCGCGAGTAAGCCCTGAGCTGCACCAACACGGTCAGA- TACTTGTTCC TCCTGGTCACTGTTCGTCAGCTCCTGGCAGGAGTTTTCGGGGGTAAACTGTCGACGACGCGCTGAGACTGACCC- TCTAGCGGCG GATTGGGACTGACCCCTCCGAGGCTGGAGGGGTGATCGCATTGGAAGACTGGGCTGAGATCCGTCGGTTGCATC- GGGCTGAGGG TGTTCCGATCAAGGAGATTGCGCGTCGGTTGGGTTTGGCGCGTAACACGGTCCGTTCGGCGCTTCGCGCTGATG- CGCCGCCGTC GCGTGAGCGCGGCCCTCGCGGGTCGTCG (SEQ ID NO: 95) >Right Transposon End for WP_029025341.1 AACGAAACGACCCCCGAGGGGGTCAATCCCAATCCGTCGATAGGGGGTCAGTTCCAGTCCGCCGTTGACATAAA- CGGGGGCCGG CAGACCAACCATCAGGGTTTCCACCACGCCCACCCCCGCCACCGAGATGTCCGTCGCCACGAACAGAGCCCGCA- AGCCGTGGCG TGGTCATCGAGCATGGACGACCGGGGAGAGGCCAGAGCAAAGTCTCCCGCTGGCTCATGCAGAGTCCTCCCGGA- CGATGACTGA TGCGCATTGGTCATGAGACGACCTACCAGCGCCGTGCTGCGACGACCGGTTAAACCCGCCCTTCCCCAAGTGCA- CGGATCTATC GCAGCACACCGCAGCCGATCTCCACGCCGTCGAGCAACGCCTCAACAACCGGCCCCGCAAGACCCTCAACTGGC- GCACCCCGGC CGACGTCTTCCATACCGCACTGGCACCCTGACGATCGTCACCGTTGCGACGACTGCTTGAATCCGCCCGGAACC- ACGGGGTCAT TTTTCAGGCGGAGCCGACCACGAGACCGCACTCGGGATGAGTATCGCCCTGGACATGGGAAAATAC (SEQ ID NO: 96) >Left Transposon End for WP_012182723.1 GTGAGAACTGGAGCAAAGCCATGTCGAGGCAACTTGTCGGCTCCGCCTGAAAACTGACCACGTGGTTCCGGCTG- AATCGGGACC ACCTCCTGAAGCATCGGGAG (SEQ ID NO: 97) >Right Transposon End for WP_012182723.1 AGATCAACCAAACCGAGGTGGTCCAGTTTCAGCCGGACCACGGTGGTCCCGTTTGAGGCGGCGTTGACACAA (SEQ ID NO: 98) >Left Transposon End for WP_040800760.1 ACTCGGCATAGTACGGCGTGCTCGAAGCCTGCCTCGAACGAACTCAATCTCGCTGTCGTTCCATGACTTATCGC- TCACGGCCTC GACCGACCCGAACCCGTCTGTCAGCGCTCGTTGAAAACCAGGCCACTGGCGCTCTTTGAAAATCGGCCACCCGT- CCACGATTGG AAGGGTGATCTCA (SEQ ID NO: 99) >Right Transposon End for WP_040800760.1 ACAACCATCAAGTGGCCCGATTTTCGAAGAGCGCCACCGGCCGGTTTTCGAAGAGCGTCGACACC (SEQ ID NO: 100) >Left Transposon End for WP_055422289.1 GCGTAAGCGGAGGTGAGCCGTGCCACAGGACCTCCAGGTCCGCGCGAGCGGAGGTGAGCCGGATTACACGGTCG- AAGCCGTGGG 
CATCGGCGGGTCCTCTCCGCCCGAGCGGAGGCGAGCCGGATCCCGATTCGGGTTTCGTGGTGACGTCCGGGTCC- TCTCCGCGCG AGCGGAGGTAAGCCGGACAGCACAGCGCTGATAGCTTGCTTGATCGCGTCCTCTCCGGGCGAGCGGAGGTGAAC- CGGTCGGCGA CGGTTCGACGTACCGGTGGTGGCTGTCCTCTCCGCGCGAGCGGAGGTGAGCCGGAGGTCTGCCCATGGCAGGCA- ACTCTCGGCC GGTCCTCTCCGTGCGAGCGGAGGTGAGCCGTCCGCCAGGAGCAGCAGCCGCTCGATCGCCGAGTCCTCTCCGCG- CGAGCGGAGG TGAACCGGCGGTCTGGGACTCCGTCCTGGGCGACATCATGTCCTATCCGTGCGAGCGGAGGTGAGCCGTTCTGC- CGCCCGTCAG GTGAGCCGGTCAGCCGGTCCTCTCCGCGCGAGCGGAAGTGAGCCGCCGGTGGGGTGGGCGGTGTGGCCGCATCA- CACGTCCTCT CCGCGTAAGCGGAGGTGAGCTGTCGGCGACGGCTGAAAAGTGGACCAGTAGCGGCGCTTGAAAGTTGACCCTCT- CCAGGGGTCT GGCTCGTTGAGTCAGGCCGGGAGGAGTG (SEQ ID NO: 101) >Right Transposon End for WP_055422289.1 CCAACATCAACTGACGCAGCGGGGGGTCATTTTTCACCCGCCGGAATCGGCTCACATTTCAAGCGTCGCCGACA- GTGAGCCGGG GCCTGACGAGGCTCTGGAGTGCAAGGACGTGC (SEQ ID NO: 102) >Left Transposon End for BAC75258.1 CCCGTCTCAGTCCCGAGTGCCAGCGTCTGTCGGCGACGGCTGAAAAGTGGCTCAGTCAAGATTTCCGTGAAGCC- GATCGTGCGC CTGTGCGCCGGTGACGCTCCGTCACAGGCCGC (SEQ ID NO: 103) >Right Transposon End for BAC75258.1 CCAACATCAACTGACGCAGCGGGGGGTCATTTTTCACCCGCCGGAATCGGCTCACATTTCAGGCTCAGTCAAGA- TTTCCGTGAA GCCGA (SEQ ID NO: 104) >Left Transposon End for WP_055721037.1 TCAGGGCGGTTCTCGCGCGCCTGTCCGGATGCCGGTTCACTGTCGGCGGCGGGTGAATCGTGACCCTCTGACGG- CGGATCAAAA CTGACCCACTTCGTGGTCTTCTGACCAACCTGATCTTGATCGAAGGTCAGGAGAAGAGG (SEQ ID NO: 105) >Right Transposon End for WP_055721037.1 ACACCGGCAACCAGACCCGGTGGGTCAGAACTCGACCGCCCACACCGGGTCAGGATTCAGCCGCCGCCGACACT- TTGAGCCGAC GCCCAGCATCTCCACAAGTAGGGCTGTACGACAGGCTTCCCTCTCGCCCAGCTACGAAAGGTGACCAGTACCTG- GTTCCTGAAC GCGCTGGAAGGGGGCCGACCCCAGAAGGCCAGCCCCCGGTGTCAGGCCGGGA (SEQ ID NO: 106) >Left Transposon End for WP_083119439.1 TCACAGCGACGGGTCGATGAGATTGTCGATATCGCGCCGGAGGGCATCCGGGTCCACCGGCGGAAGATTCTTGC- GACGACGAAT CAGTTCCGCAGGTCCTGCGCTCGAACGAGGCAACGGCCGTAGCTCAGCAACAGGTGTCCCGTCCCGAGTAATGA- CGATGCGCTC GCCATGCTCGACACGGCGCAGAACGTCTCCTCCACTGTTGCGCAATTCGCGCACCGTGACTGCATCCACGTACG- AAGTGTATCA 
CCGGTGAGACGTCAAAGCAGAGTACCAAGAGTTCAACGTTGCGCACGCGAGAACCGGGGGGATCGAATTGCAGG- GTTATCAGTA ACCCATGGCATATTGCATTGCCGCCCAATGATTTTGACACTTCGAACCGCTAGATGGTCCCCCGTTTCAGAACC- CATTGATGCC GGACGGCGGTACCTCTGCGGCGATCTCAAAGTCGGCGTCATAGTGAATCAGCGTCACAACATGGGCCTCTGCGA- TGGTCGCGAT CAGGAGGTCGGCCATGCCGACGGCGCGGTTGTTGACGACGGTTGAAGTTCCGGCCAGTAGCGACGGTTGAAAAG- TAGGCCACCC AATCAGATTGGATGGGTGATCTCCTTGGAAGACTGGGCCTTGATCCGGCATCTTCAC (SEQ ID NO: 107) >Right Transposon End for WP_083119439.1 CCACGACCACGTGGCCTACTTTTCACCGTCGCATCTGGCCTGGTTTTCGACCGTCGTCAACAGCGGTGGCGACC- CGTGCTGGCG AGCTGCCGTTGCGCGCCGAGCGCCGTCTGCCAGTGCTCGTCATCGGTTGGCAGGTACTCATAGGCCAGACGGCG- GTCTGACCGG AGTTGTTCGTAGTCTTTAGGGCTGCGAGCGCAATAGAGCGCCTCGGCGTCGAGCTAAGCAGTTCATCCGCGCTG- GTATCGATGA GATGCCGAGCGA (SEQ ID NO: 108) >Left Transposon End for WP_018815463.1 CGACGACGCGCTGAGACTGACCCTCTAGCGGCGGATTGGGACTGACCCCTCCGAGGCTGGAGGGGTGATCGCAT- TGGAAGACTG GGCTGAGGTCCGTCGGTTGCATCGGGCTGAGGGTGTTCCGATCAAGGAGATTGCGCGTCGGTTGGGTTTGGCGC- GTAACACGGT CCGTTCGGCGCTTCGGGCCGAGGCGCCGCCGTCGCGTGAGCGCGGCCCTCGCGGTTCGTGT (SEQ ID NO: 109) >Right Transposon End for WP_018815463.1 CACGAAACGACCTCCGGGGGGTCAATCCCAAAACGTCGATAGGGGGTCAGTTCCAATCCGCCGTTGACAGGGGG- TGCTCCGACG GCCGAGAGTGCGCACGTGATGGCGGCTAT (SEQ ID NO: 110) >Left Transposon End for WP_077349293.1 GTCGATGCCGCATGAATACTGACCCCCAGGTGCCGAGCGAAAGTTGACCCCCTCGGTCAGAGTGAGGGA (SEQ ID NO: 111) >Right Transposon End for WP_077349293.1 GCACCGCCAACGGGACCAGGATGCGGATCTCGGCAGCCCACTTCCCGCAGGTCAAGACCATGCAGGACTTCGTC- TTCGACCACA
TCCCCGCCGCCACGCGCGACGTGATCGCGCACCTGGCAACTGGCACCTTCATCGCCAAGCGGGAGAACGTGGTC- CTGCTGGGGC CGCCGGGACCGGGAAGACCCACCTCGCAATCGCCGTCGCGATGAAGGCCGCGGAAGCGTCCTACCTGGTGTTGT- TCGACTCCGC GACCGGCTGGATCCACCGGCTGGCCCAAGCCCATGCCAAGGGCGGTCTCGAGCGAGAACTACGACGGCTGAACC- GCTATCGACT GCTCATCATCGACGAAGTTGGATACCTGCCGTTGGACGCGGCCGCGGCGGCGTTGTTCTTCCAACTCGTCGCCT- CCCGCTACGA GACCGGATCGATCATCGTGACCTCGAACCTGCCCTTCAGCCGCTGGGGCGAGACCCTCGGCGACGATGTCGTCG- CAGCAGCCAC CATCGACCGGCTCGTCCACCACGCCCACGTCATCGGCCTGGACGGCGACTCCTACCGAACCCGCGCACACCGCG- ACACCATCAA CCAGCAAACCAAGTAGCCAACCAACAAGAAACCCACCGAGAGGGGTCAATTTTCAATCAGCAGAGGGGGGGTCA- GTTTTGGCCC GGCGTTGACAGCCCGTCTGGTCCATCAC (SEQ ID NO: 112) >Left Transposon End for WP_043464545.1 GTCGAGCGGGGCACCTCGCACGGCTCGGATGTCGACAACGCCTCAAATTGTGCCGCCTCCAACGGATGAAAAGT- GGCCCGTCTG AACGGTCTGGCTCGTTGAGTCAGGCAGGGAGGAAGG (SEQ ID NO: 113) >Right Transposon End for WP_043464545.1 CCAACATCAACTGACTACGGGTCCACTTTTCATCCGCTGCCTCCGGTCCACGATTGCGCCGTTACCGACACTCG- GACAG (SEQ ID NO: 114) >Left Transposon End for ALD73443.1 TGTGTTTTCACTCATTTTGTCAGTGACGGTTGAAAAGTAGCCCAAAAACGACGGTCGAAAAGTAGCCCATCCGA- GAACAATTGG ATGGGTGATTTCC (SEQ ID NO: 115) >Right Transposon End for ALD73443.1 ACAACAAGCACAATTGGGCTACTTTTCCACCGTCAGAAGTGGGCTACTTTTCGACCGTCGCTGACACATTTCGA- TTCCTTTCAC (SEQ ID NO: 116) >Left Transposon End for EUA10187.1 CGGGGCTAGCTGTCGGCGACGCCTGAAAACTGACCCCGTGTCGACGCCCGAATTTTGACCCCCTTCGTCTGAGT- CTCGGCTTAC GAGCCGAGAGGAGAAGGGAGTTGTTAGCT (SEQ ID NO: 117) >Right Transposon End for EUA10187.1 ACACCAAGGGGGTCAATTTTCAACCGTCGAAAAGGGGTCAATTTTCGGCCGCCGTTGACACTAGCTCCGCAGCG- TGTGTTGGAG ATAGGG (SEQ ID NO: 118) >Left Transposon End for WP_079046208.1 TGGTGGTGCCCCCCTGTCGTTGCTTCCTGAAAACTGACCCCCAGGCAGTCTCCGAATCTTGACCCCCTCCTAAC- TCTTGGAGGG TGCTGAAGGTGGAGGACTGGGCAGAGATACGCCGGTTGCATCGGGCGGAGGGCGTGCCCATCAAGGAGATCGCG- CGTCGGCTGG GGGTGGCCCGGAACACGGTGCGGGCCGCGTTGAACTCGGACCGGCCGCCGAAGTACGAGCGGGCCTCGCGCGGA- CAGGTCGCGG ACGCGTTCGAGCCGCAGATGCGGGCCCTGCTCAAGGAGTGGCCGAGGATGCCGGCCCCGGTGATCGCCGAGCGG- ATCGGCTGGC 
CGTACTCGATGGCGCCGCTGCGCAAGCGGCTCGCGCTGATCCGGCCGGAGTACCTGGGTATCGACCCGGTCGAC- CGG (SEQ ID NO: 119) >Right Transposon End for WP_079046208.1 CCCGACCCCAGGGGATCAATTTTCCCGGAGCGGAAGGGGGTCAGTTTTCGCGAAGCGTTGACACCCCCCAGATG- CCGATCTGGT GTGTGG (SEQ ID NO: 94) >Left Transposon End for WP_003414864.1 CCGCCTGTTCGAGACCCCATGCCACGCTCGGCTGGCCGACGACGATCACCCATCGCAGACACCACACTTGGTAG- GGGTTGCCAG TTGTTGGCCGGGTGAGTGGTCGGCGCGCCGTTGCCCGGGGTAGGGTTCGAGGTCTTTGGATGATGGGCGTTTCC- ACGCTGCCCA AAGGATGACCTCGACGTGTCCGAGTTCACGTTGACCGCGTGAAGTTAAACCGGTGCCGAGCGTGCACTGAGGGC- GAAATCCGGC GCCGATTTTCCGCCCTGAGTTCACGTTGGGCGACGGCGCCCATGAACGACGCCACATCGCACATGGCGCTCAGG- CCAAGCACCA GCCCATCTCCGTCGCCGGCCACCGTCACCGATCGAACGACCTCGACCCCCGCCCTGGCAACAACACGCCGCTGC- CCTCTACACC TCCGCGCTGTCGAAAATTGTCACGGAGCCTTGCGGGGGCTGGTGCGACTGATATGACGCACCTTCCGCCAGAGG- CTAGCCCGAC GTTTACTGACGTTACTGCTGCTTACCGTTTGTCGACGGCACGTGAAAACTGACCCCGGCGCGGCACCCGAATTT- TGACCCCCTG GTCGGGTGGACTGGCTCTACCCGAGCCAGGAGGACCGAAGGGA (SEQ ID NO: 120) >Right Transposon End for WP_003414864.1 CGCTTCCCGGCTCGGAAGTCGTTGGAAGAGTTCGACTTTGAGCATGCTCGTGGCCTCAAACGCGACACCATCGC- ACATCTGGGC ACCCTGGATTTCATCACCGCCCGCGATAACGTCGTGTTTTTGGGCCCCGCCTGGCACCGGGAAGACTCATCTTG- CGGTCGGCCT GGCGATACGCGCGTGTCAGGCCGGTCATCGGGTGCTGTTCGCCACCGCCGCCGAATGGGTAGCACGGCTCGCCG- AGGCTCACCA CGCCGGGCGCATCTACGCCGAACTCACCCGGCTTTGCCGCTATCCGCTCCTGGTGGTTGACGAAGTCGGCTACA- TTCCGTTTGA GCCCGAGGCCGCCAACCTCTTCTTCCAGCTGGTGTCCTCCCGGTATGAGCGGGCCAGCTTGATCGTCACGTCCA- ATAAGGCCTT CGGCCGGTGGGGCGAGGTTTTCGGCGGCGACGACGTCGTTGCTGCCGCCATGATCGACCGCCTCGTCCACCATG- CTGAAGTCGT CGCCCTCAAAGGCGACAGCTACCGGCTCAAAGACCGCGACCTCGGCCGCGTCCCACCAGCCGGAACCACCGAAG- AATAACCACC AACCGCCCGGTCTAGGGGGTCAATTTTCAGATGCCGTCAGGGGGTCAGTTTTCGGGTGCCGTTGACACCGTTCA- CAAGGGCGTT TCGAGCAACGCGTCGACGCAACTTCGGC (SEQ ID NO: 121) >Left Transposon End for WP_072653819.1 CATCGCGGCGGCGACGGTCTCGTCGCCGAACGTCTCTCCCCAGCGTCCGAAGGGCTTGTTCGAGGTGACGATTA- CGCTCGCCCG TTCGTATCTGTTCGAGATCAGCTGGAAGAACAGGTTCGCCGCCTCGGCCTCAAACGGGATATACCCCACCTCGT- CGATCACGAT 
CAGCGGGTAGCGGCCGAGCTTGACCAGCTCATCCTGGAGCCGGCCGGTGTGATGGGCGGCGGCCAGGCGGTCGA- CCCACTGGGC GGCGGTGGCGAACGCGACCCGGTGGCCGGCCTGGCAGGCCCGGATCGCCAGCCCGGTCGCGATGTGCGTCTTGC- CCGTGCCCGG CGGCCCCAGGAAAATTGCGTTCTCCTTGGCCGCGATGAAGTCCAAGGTTCCCAGGTGGGCGAGTTGCTGGCGGG- TCATTCCGCG TAGATGGGCGAGGTCGAGTTCCTCGATCGTCTTGATCGCGGGGAAGCGGGCGGCGCGGATGCGGCCCTCGCCGC- CGTGGCTGTC GCGGGCCGAGACCTCCCGCTGCAGGCAGGCGACGAGGTATTCGAGGTGGCTCCAGGACTCGGCCTGGGCGCGTT- CGGCGAGCCG CTCGGCGGCGTCCAGCAGGGCGGGGGCTTTCATCGCGCGGGCGAGGAAGGCCAGGTCGGCGGCGGTTTGTCGGG- TGGTGCGGGC GGCGGGAACGCCGGCTTTCGCCGCGTCG (SEQ ID NO: 122) >Right Transposon End for WP_072653819.1 TGGCTCGCACCGCCACCGCCACGACCACCGCAGAGCAGACGCCCAAGGAAGGGCGGCAGACCTCGGCAGACCTC- GCGTTCCTCG CCCGCGCCATGAAGGCACCCGCTCTGCTGGACGCCGCCGAGCGCTTGGCCGAGCGGGCCCGCACCGAGTCCTGG- ACCCACCTCG AATACCTGGTCGCCTGCCTGCAGCGCGAGGTCTCCGCCCGTGACAGCCACGGCGGCGAACAGCGCATCCGGGCC- TGCCAGGCCG GCCACCGGGTCGCGTTCGCCACCGCCTCCCAATGGGTCGACCGCCTCGCCGCCGCCCACCACACCGGCCGCCTC- CAGGACGAAC TCGTCAAACTCGGCCGCTACCCGTTGATCGTTGTCGACGAGGTCGGCTACATCCCCTTCGAGCCCGAGGCCGCG- AACCTGTTCT TCCAGCTCGTCTCGAACAGATACGAAAGAGCGGCGCGGTACAGCTGGGCGTAAGCTGGCTGCCCCCCCGCCTGC- GCGGGGAGGA CGGACCGTGGCGGGATTCCCCGGTGGGCGGCGGCGACGGTCTCGTCGCCGAAGGTTTCTCCCCAGCGTCCGAAG- GGCTTGTTCG AGGTGACGATCACACTGGCTCTTTCGTCCTCCCCGCACCCGCGGGGGTCAGCCGCTGCCGCCCACCGGGGAATC- CCGCCACGGT CCGTCCTCCCCGCGCTCGCAGGGGTCAG (SEQ ID NO: 123) >Left Transposon End for WP_034088311.1 TCCTTATGCAGCCTGACCGTCTTGGTGACGGTTGGCGCGTGCTTCGCCCTCGCCCGGATCACGGACGGCTTCGG- CGCCACAGCG AGGACACCCCACCGCTCATCCGCGCCAGCGCCTACCTGTGGCGACGCATCACCCCCGGCTACTCGGACGGTAGC- CCCGTAGGAG CCTGCTGACGCCCCTCAGGGGGCGTGACCACCAGTGGTTACGCCAGCCGGATCGAGGCGGATCGAGCAGGGTTC- AGGAGGATGG AAACAGGGGTGTTTGTCCCGGTTATTCGGCAAAGCCGCAGGTCAGGGCTGTGCTCGGCATGGGTTCGACTCCCC- TAGGCTCCAC CCTATAAATGCCCTCTGAACTGCGGAAACGTGGAGTCGGAGGGCATTTCTCGTAACCACTGGTGGGCACGGCCG- TGCCCACTGG GAGCTCGGGGAGGCAAGCTTGGATGCGGTCTGCTCTTGTTGACCCTACGGGTTGCCTGGGATCACTATGCGCTT- GACCTGCGGA AATGCGAGTCCGTGTGTCGCTATGGCGCACCCAGGTACGTACGAGACTTCCAGGGGACAGCGGGGAGCATCTCC- CCTAGGTCTG 
GAGAGCGGGCCGGGGTGCGGAGCGGGGGTGGTGCGGTCTGTCGGTGTGACGTCCGTGGGGACGTGTCGGCCCGT- ACGATGGCGG TGTCCGTTGGCCGTTTGGGAGGCGCGCG (SEQ ID NO: 124) >Right Transposon End for WP_034088311.1 TGCTCCTCGCGCCCGCGGGGATGGTCCTCTCGAAACCAACAGCGACCTGACCGCCGATTCCTGCTCCCCGCGCT- TGCAGGGATG GTCCCTACATCTACCTGTCCACGACTCCCCAGCCCGACTGCTCCCCGTGCCTGCGGGGATGGTCCCATGGAGGC- CTTGGTGGGC AGCATGGCGGCGATCTGCTCCCTGCACCCGCGGGGATGGTCCCCCTCTCAGGGGTGCCGTCCGCCCCGTCCCGA- GCCTCCTGCG CCCCCGCCCGCGGGGATGGTTCCGTTTTGTCCTGCGCGCCGCATCCTGCCGCGCCCTATTCTGAGAGGCCGATG- CAACTCGGGC GATAGCTTCAGAGGAGCCTCCTGATGATCCTGAGGAGGCGCGCCCACCAGTGGTTACGCCAGCCGGATCGAGGC- GGATCGAGCA GGGTTCAGGAGGATGGAACAGGGGTGTTTGTCCCGGTTCTTCGGCAAAGCCGCAGGTCAGCGCTGTGCTCGGCA- TGGGTTCGAC TCCCCTAGGCTCCACACCCGAAGACCCTTCTGACCTGCGGAAACGCTGTCAGGAGGGTCTTTTTGTCCCGGTTT- TCTCTGCTGA CGTGTCAGCATAGGGGTGCTGACACGATCTTGAAGGAGTCAGTATTTCCGGGGCCCGGGGAGATGCTTGCTCCC- CTGACTGTTC CCCTGGGCTCGCGGTCCCACTGGGGAGA (SEQ ID NO: 125) >Left Transposon End for WP_081684113.1 GTCGACGGCACTCCAAAATGAGGCCATAGCGGCAGTCGAAAACGAGGCCACCCAGACAGATTGGTTGGGTGATC- TCT (SEQ ID NO: 126) >Right Transposon End for WP_081684113.1 GAACAACACGAACCGGTGGCCTCGTTTTCAACCGTCGGTTCGGCCTCAGTTTCAAGCGCCGTCGACAATGATCC- GGTCGCGGGA GAGGTCCGGCGCAGCTCCGGGCA (SEQ ID NO: 127) >Left Transposon End for WP_067466072.1 TTCTACGTGTCGACGACGGGTGAAAACTGACCCCCAGGCGACGGCCGAATTCTGACCCCCTTCACTCTCTGGAG- G (SEQ ID NO: 128) >Right Transposon End for WP_067466072.1 CAACTCGTAACCAACGACAACATCAAGGGGTCAACTTTCAGACGTCCCCAGGGGGTCAGAATTCAGCCGCCGTT- GACACTACGA GCCGGAGGGCGATCCCCTGCCCTA (SEQ ID NO: 129) >Left Transposon End for WP_086568729.1 CCGCCACGTCACTCGATGGCAAGGCCCGACGTAGGAGCGCCGCTGCAGTTCCCTGAGTCGACCCCTTGCCGGGA- ATTGCGGCGT TGTTGCTGGTCACGGCTGCAGTATGAAACGTTCTCGGTGCGTGAGGACGTGGGGTGTCCGGTGGCAAGGTGACC- CGGACCGCAG CGAAGCGAGGACCGGAAGCGCCGCGGCAGGACTAATCAGCGGATGCTGTGCGGTGTTGGCGTGGTGGACGGCGG- GGTCGTAGGG GGTGCCGGTGCGCCAGCAGGCGTGGATGACGCGTAGCCATCCTCGGCCGAGGATACGCACCGCGTTCGGGTGGC- GTTTGCCGCG GTCTCGGGCTCGCTGGTAGGCCTGTTCGGCCCAGGGATTGGCGTGCCGGCTGTTGTCGGCGAAGTGGGTCATGG- 
CCTTGCAGGC GGCCCGGTTGGCGCTGTGCCGAAGCCGGCAGCGTGGACTTTGCCGCTGGCGCGGGTGACCGGGGCCATGCCGAC- CTCAGCGGCG ATCCGATCGCAGGCCATGCACCCGCCGGGCCGAGAGGTGACAATCCTTGAAAGGGCTGCCTACCCAGGATCCCT- TATTAGCCGA TTGCCCCGGCCACGTACGGCGAGTACCGAAGTCGGCGATCGTGTCAACGACGCGCGAACGGTGACCCCCTGACG- ACGGCCGAAT AGTGACCCCCTTCAGGCACTCTGGAGGG (SEQ ID NO: 130) >Right Transposon End for WP_086568729.1 CAATCAAGTACACGACAGGGGTCAGTTTTCATCTGCCGATAGGGGGTCACAGTTCGAACGGCGTCGACAGATCG- TCCCGCCTCA GCCATCCACGATCGCTCAAGACCTTCAGGCGCAGTGCGTACCAAGGCTTCATGCGAAGCCAGTTCCTCTTCCTG- CGCTCAACGC CCGGCCGGGACTTGCGGGGCGAAACGTCTTCCTAGCTCGCGCCAGTGGCTGTAAAGCTCCCGCCGCTATCGATG- CCCGTGATGA CCTTCGACTGGTCCAGTCTAGGTCCCCACTGTGAGGTCAGGCGCGTGGGCGGCGCTGGTCGCGACCGGCCGAAG- AGCCGCAGCG CGGCCAGCGACCGCCCGACGCGCGGCAGGCCGCCGCAGGCGGGCTGCCCTTGAAGACGAGTATATAAGTTTCTT- TCGATCTTGC TGGTCGTACTCG (SEQ ID NO: 131) >Left Transposon End for AEE46996.1 CGTACGGGACGGACGCGTCAGCGCGCCGTTGTCGGCGACGCCTGAAAACTGACCCCGTGGCGACGGATGAAAAC- GGACCCCCTC GGCAACGCTGAGGAGGGTG (SEQ ID NO: 132) >Right Transposon End for AEE46996.1 CCACCACACGAGAGGGGTTCACTTTCGGACGTCGCCAGGGGGTTCGTTTTCAGCCGTCGTTGACAGCCGTCCTG- TCCGGCACCT CGACCCCGCGCCGCTCCAGCTCCTCGCGGACCGTGCGGGGGATGTCGCGGCACCCGCACTCCTGCGCGTGCGCG- AGGTGCCACG TGACGCGTGCGTCCATCGGCGCGCCCTGGCCCAGCACGTGGGCGTCGTGCCACTCGGCGTTCATGCGTCCACCC- CAGCACTGCG TCCCGGACCTGTCGAGGGGCGCAGGTCCCGGGGACGACGCGAGGCCGCCCCGACGGACGGGACGGCCTCGCGAC- GAGGG (SEQ ID NO: 133) >Left Transposon End for WP_015288588.1 CCTCAACGCCGGTCGCCACAGCCGCTCAAACGTGGCGGCCGCGCGTATTCGACCGTCCGTAGTGGTTCGTTAAA- GCGTTGCAGC ACAACGCATACAACAATCAATCGGCCATTGAGTTCGCACGCTCATGCAGTTGCGAATGGTCGGTGGATGCTCGA- AGCCAATGCA GAAAGCGACCGGCTCGATGAGCTGCACCAGCAGTATCACCGAGATGATCTTGGCGGTAATCAGGCTTGTATCTC- TTGTAGTGTG
TCGGCGGCAACTGAATACTGACCAGAGCGCGGCAACTGAAAATTGACCAGCTTCCTGGAGAGCCTTGGCTATGG- GCCAAGGAGG AAGCGA (SEQ ID NO: 134) >Right Transposon End for WP_015288588.1 AACCAAGCTGGTCAATTTTCGATTGCCGACACCTGATCAGTTTTCGGTTGCCGTTGACATAGTGCCCAAAACAC- GCACCCACAT CAGATGCAGAACCCCTTGACAACCAATAGGGAATCTCTTCGCATGATGGAGGTTGCTGGCACCAATCCA (SEQ ID NO: 135) >Left Transposon End for WP_067342985.1 CGCCTGTTCCCCTGGGGAACACCAAAGGCCCGTACCGGCCGCGGAAGCTGAGATCCTCCGCGCCCGGTACGGGC- CTTGTCGTTG TGCACCACCAGGTTACGCTGTCCTTTGAAGCCAAGTTAGTCCGGTGCGATGTCCTGACCTGGAGCGACTAGGTT- GGATGTCAAA GGACATTCCTCTCAACTGCCGCGGGTGTTCCCCAGGGGAACACCTCGCGCCACTGTCCCTGGACGAATCAGGCA- GGCAGCGGGT CAAACGGGAACACCCTCAAGACCACCACGATCAGCGACCATTAAAGCCGTGCTGTGGTGCCGCGGCCGTGGACT- CTCGGCGTCG CTCGAATCCTGACCCTCTGACGGCGTATCAAATCTGACCCACTTGGTGATCTTCATCTGACCCTGGCCTTGGTG- ATCAAGGTCA GGAGGGAAGAG (SEQ ID NO: 136) >Right Transposon End for WP_067342985.1 GCAACCGCAGCAGACACCACCTGGGTCAGGATTCAACCGACCAAAGTGGGTCAGAGTTCAGCCGCCGCCGACAT- GGACCGCGGA GATGCCCCTGCGGACGCGCTGCTCGCCGCCGGGTTCACCTTGCTGCTGTGGTGCCGTGGTCGTGGATCGCGGAG- ATGCCCTGCG CGTGGCAGGAACAGTCGAGGACAAGATCGA (SEQ ID NO: 137)
Example 2--Design of Engineered System for a CLUST.009925 CRISPR System (FIG. 8)
[0194] Having identified the minimal CLUST.009925 CRISPR-Cas system components, we designed an engineered system for transposon excision, mobilization, and programmable insertion (FIG. 8). Minimally, the natural locus consists of two parts: a transposon, in which left and right Tnp ends closely flank effectors A and B, and a CRISPR array located immediately adjacent to the transposon. In some embodiments, the Cas Effector Cassette of an engineered CLUST.009925 system is composed of effectors A and B under the control of a single artificial promoter. In some embodiments, the Transposon Payload Cassette of an engineered CLUST.009925 system is composed of Tnp ends flanking a nucleic acid cargo ranging in size from less than approximately 100 nt to more than approximately 25 kb (FIG. 8). In some embodiments, the RNA Guide Cassette of an engineered CLUST.009925 system is expressed under the control of a single artificial promoter. In some embodiments, the RNA guide may consist of a minimal CRISPR array containing two or more direct repeats and one or more spacers. In some embodiments, the RNA guide may consist of one or more crRNAs containing processed direct repeat and spacer components, and optionally a fused tracrRNA component. In some embodiments, the RNA Guide Cassette contains a tracrRNA expressed under a second artificial promoter.
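The cassette layout described above can be summarized as a small data model. The Python sketch below is illustrative only. The class and field names are hypothetical, and sequences are treated as plain strings. The model encodes just two rules stated in the text:

- The Transposon Payload Cassette is cargo flanked by Tnp ends.
- A minimal CRISPR array with N spacers carries N + 1 direct repeats.

```python
from __future__ import annotations
from dataclasses import dataclass


@dataclass
class EngineeredCLUST009925System:
    """Hypothetical model of the three cassettes of an engineered system."""
    effectors: tuple[str, ...]   # Cas Effector Cassette, e.g. ("effector A", "effector B")
    left_end: str                # left Tnp end
    cargo: str                   # payload nucleic acid
    right_end: str               # right Tnp end
    direct_repeat: str           # consensus direct repeat used in the RNA Guide Cassette
    spacers: list[str]           # one or more spacers

    def payload_cassette(self) -> str:
        """Transposon Payload Cassette: cargo flanked by the Tnp ends."""
        return self.left_end + self.cargo + self.right_end

    def minimal_array(self) -> str:
        """Minimal CRISPR array DR-(spacer-DR) x N, so N spacers need N + 1 repeats."""
        if not self.spacers:
            raise ValueError("a minimal CRISPR array needs at least one spacer")
        return self.direct_repeat + "".join(s + self.direct_repeat for s in self.spacers)
```

For example, with direct repeat "DR" and spacers ["S1", "S2"], `minimal_array()` returns "DRS1DRS2DR", which contains three repeats for two spacers.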
[0195] We selected the NZ_BCQZ01000006 locus for functional validation of the engineered CLUST.009925 system.
[0196] DNA Synthesis & Effector Library Cloning
[0197] To test the activity of the NZ_BCQZ01000006 CRISPR-Cas system, we designed and synthesized a minimal engineered system as described above into the pET28a(+) vector. The synthesized system, consisting of an Effector Cassette and an acceptor site for an RNA Guide Cassette, was included as the cargo region of a Transposon Payload Cassette. Synthesized IPTG-inducible combined T7 and lac promoters were used to drive expression of the Effector Cassette. Each CDS sequence was codon optimized for E. coli expression and preceded by an E. coli ribosome binding sequence. A J23119 promoter (Registry of Standard Biological Parts: http://parts.igem.org/Part:BBa_J23119) was used for expression of the RNA Guide Cassette.
[0198] In parallel with the effector gene synthesis, we computationally designed an oligonucleotide library synthesis (OLS) pool containing "repeat-spacer-repeat" sequences. Here, "repeat" represents the consensus direct repeat sequence found in the CRISPR array associated with the effector, and "spacer" represents sequences tiling the pACYC184 plasmid and E. coli essential genes. The spacer length was set to the mode of the spacer lengths found in the endogenous CRISPR array. Each repeat-spacer-repeat sequence was appended with two kinds of flanking sequence:

- restriction sites enabling bidirectional cloning of the fragment into the acceptor site for the RNA Guide Cassette;
- unique PCR priming sites enabling selective amplification of a given repeat-spacer-repeat library from a larger pool.

The library synthesis was performed by Agilent Genomics.
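The library design rules in this paragraph can be sketched in Python:

- the spacer length is set to the mode of the native spacer lengths;
- spacers tile a target sequence;
- each unit is repeat-spacer-repeat with flanking cloning and priming sequences.

This is a hypothetical illustration, not the authors' design pipeline. The tiling step, the placeholder flank sequences, and the helper names are assumptions. A real design would also filter spacers, for example to remove internal restriction sites.

```python
from __future__ import annotations
from collections import Counter

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def modal_spacer_length(endogenous_spacers: list[str]) -> int:
    """Most common spacer length in the native CRISPR array (ties go to the first seen)."""
    return Counter(len(s) for s in endogenous_spacers).most_common(1)[0][0]


def tile_spacers(target: str, length: int, step: int) -> list[str]:
    """Spacers tiling both strands of a target at a fixed (assumed) step size."""
    spacers = []
    for strand in (target, revcomp(target)):
        for start in range(0, len(strand) - length + 1, step):
            spacers.append(strand[start:start + length])
    return spacers


def design_oligo(repeat: str, spacer: str, left_flank: str, right_flank: str) -> str:
    """Hypothetical oligo layout: flank + repeat-spacer-repeat + flank.

    The flanks stand in for the restriction and PCR priming sites described above.
    """
    return left_flank + repeat + spacer + repeat + right_flank
```

Using the mode, rather than the mean, keeps the designed spacer length equal to a length that actually occurs in the native array.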
[0199] We next cloned the CRISPR array library into the plasmid containing the minimal engineered NZ_BCQZ01000006 CLUST.009925 system using the Golden Gate assembly method. In brief, we first amplified each repeat-spacer-repeat from the CRISPR array library using unique PCR primers and pre-linearized the plasmid backbone using BsaI to reduce potential background. Both DNA fragments were purified with AMPure XP (Beckman Coulter) before being added to Golden Gate Assembly Master Mix (New England Biolabs), and the reaction was incubated according to the manufacturer's instructions. We further purified and concentrated the Golden Gate reaction to maximize transformation efficiency in the subsequent steps of the bacterial screen.
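BsaI is a Type IIS enzyme that cuts outside its recognition site. Given the site GGTCTC, it cuts 1 nt downstream on the top strand and 5 nt downstream on the bottom strand, so it leaves 4-nt 5' overhangs whose sequence is set by the flanking bases. The Python sketch below lists BsaI sites on both strands together with the overhang each would generate. It assumes that BsaI is also the enzyme used in the assembly step, and the function names are hypothetical. Such a check can confirm that a library fragment carries only the intended sites and that the overhangs are distinct.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


BSAI_SITE = "GGTCTC"               # BsaI recognition sequence, cuts GGTCTC(1/5)
BSAI_SITE_RC = revcomp(BSAI_SITE)  # "GAGACC": the same site read on the bottom strand


def _find_all(seq: str, motif: str):
    """Yield every start index of motif in seq, including overlapping hits."""
    start = seq.find(motif)
    while start != -1:
        yield start
        start = seq.find(motif, start + 1)


def bsai_overhangs(seq: str) -> list[tuple[int, str, str]]:
    """Return (site start, orientation, 4-nt overhang written on the top strand)."""
    seq = seq.upper()
    hits = []
    for p in _find_all(seq, BSAI_SITE):
        if p + 11 <= len(seq):           # need N1 plus a 4-nt overhang downstream
            hits.append((p, "+", seq[p + 7:p + 11]))
    for p in _find_all(seq, BSAI_SITE_RC):
        if p - 5 >= 0:                   # need a 4-nt overhang plus N1 upstream
            hits.append((p, "-", seq[p - 5:p - 1]))
    return sorted(hits)
```

For example, `bsai_overhangs("AAGGTCTCAGTCAAAA")` reports a single forward site at position 2 with overhang "GTCA".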
Example 3--Functional Screening of a CLUST.009925 CRISPR System (FIGS. 9-15)
[0200] Functional Screening for CLUST.009925
[0201] To accelerate functional screening of the NZ_BCQZ01000006 CLUST.009925 system, we developed a strategy to derive the following functional information in a single screen:

1) crRNA expression direction and processing;
2) nucleic acid substrate type;
3) targeting requirements such as protospacer adjacent motif (PAM), protospacer flanking sequence (PFS), or target secondary structure.

We designed minimal CRISPR array libraries in which two consensus direct repeats flank a single unique spacer of natural length. Each spacer targets the pACYC184 vector, an E. coli essential gene, or, as a negative control, a GFP sequence absent from the cells. We also designed a bidirectional array library cloning strategy to test both possible CRISPR array expression directions in parallel.
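Under the bidirectional cloning strategy, every repeat-spacer-repeat element is represented twice in the plasmid library, once in each orientation. The Python sketch below shows that expansion. It assumes that elements are supplied as a name-to-sequence mapping, which is a hypothetical input format, and that the reverse orientation is simply the reverse complement of the forward insert.

```python
from __future__ import annotations

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def bidirectional_inserts(elements: dict[str, str]) -> list[tuple[str, str, str]]:
    """Expand each array element into forward- and reverse-orientation inserts."""
    inserts = []
    for name, seq in elements.items():
        inserts.append((name, "forward", seq))
        inserts.append((name, "reverse", revcomp(seq)))
    return inserts
```

Because both orientations of every element sit in the same pool, a single screen can reveal which expression direction yields depletion.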
[0202] The CRISPR array library was cloned into the RNA Guide Cassette acceptor site of the NZ_BCQZ01000006 CLUST.009925 CRISPR-Cas system expression plasmid such that each element in the resulting plasmid library contained a single RNA guide element expressed in either the forward or reverse orientation. The resulting plasmid libraries were transformed with pACYC184 into E. coli using electroporation, yielding a maximum of one plasmid library element per cell. Transformed E. coli cells were plated on bioassay plates containing Kanamycin (selecting for the library plasmid), Chloramphenicol (CAM; selecting for intact pACYC184 CAM expression), and Tetracycline (TET; selecting for intact pACYC184 TET expression).
[0203] Programmable mobilization and insertion of the NZ_BCQZ01000006 Transposon Payload Cassette was assessed as follows. Following transformation, the CLUST.009925 system mobilizes the Transposon Payload Cassette by excising it from the donor plasmid. It then programmably inserts the mobilized cassette at a target site specified by the spacer sequence of the expressed RNA guide (FIGS. 9-10). Interruption of pACYC184 antibiotic resistance genes or E. coli essential genes by targeted insertion of the Transposon Payload Cassette by the CLUST.009925 CRISPR-Cas system results in bacterial cell death and depletion of the targeting RNA guide (FIG. 11). To investigate negative selection of RNA guides resulting from programmable transposon insertion, bacteria were harvested 12 h after plating, and plasmid DNA was extracted. We PCR amplified the CRISPR array region of two libraries: the input plasmid library prior to transformation and the output plasmid library after bacterial selection on antibiotic plates. We then compared the frequency of RNA guide elements in these samples to identify depleted elements (FIG. 11).
[0204] Bacterial Screen Transformation
[0205] The plasmid library containing the distinct repeat-spacer-repeat elements and Cas proteins was electroporated into Endura or E. cloni electrocompetent E. coli (Lucigen) using a Gene Pulser Xcell (Bio-Rad) following the protocol recommended by Lucigen. The library was either co-transformed with purified pACYC184 plasmid or directly transformed into pACYC184-containing Endura or E. cloni electrocompetent E. coli (Lucigen). Transformed cells were plated onto agar containing Chloramphenicol (Fisher), Tetracycline (Alfa Aesar), and Kanamycin (Alfa Aesar) in BioAssay dishes (Thermo Fisher) and incubated for 10-12 h. After estimating the approximate colony count to ensure sufficient library representation on the bacterial plate, we harvested the bacteria and extracted plasmid DNA using a QIAprep Spin Miniprep Kit (Qiagen) to create the `output library`. We then performed a PCR using custom primers containing barcodes and sites compatible with Illumina sequencing chemistry. This generated a barcoded next generation sequencing library from both the pre-transformation `input library` and the post-harvest `output library`, which were then pooled and loaded onto a NextSeq 550 (Illumina) to evaluate the effectors. At least two independent biological replicates were performed for each screen to ensure consistency.
[0206] Bacterial Screen Sequencing Analysis
[0207] Next generation sequencing data for screen input and output libraries were demultiplexed using Illumina bcl2fastq. Reads in the resulting fastq files for each sample contained the CRISPR array elements of the screening plasmid library. The direct repeat sequence of the CRISPR array was used to determine the array orientation, and the spacer sequence was mapped to the source plasmid pACYC184 or negative control sequence (GFP) to determine the corresponding target. For each sample, the total number of reads $r_a$ for each unique array element $a$ in a given plasmid library was counted and normalized as $(r_a + 1)/R$, where $R$ is the total number of reads for all library array elements. The depletion score was calculated by dividing the normalized output reads for a given array element by its normalized input reads.
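The counting and normalization steps above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' analysis code. It assumes:

- each read spans one full repeat-spacer-repeat element;
- a read is forward if it contains the direct repeat, and reverse if it contains the repeat's reverse complement;
- the library is given as a set of (orientation, spacer) pairs, which is a hypothetical representation.

The pseudocount of 1 and the denominator R follow the formula in the text.

```python
from collections import Counter

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def classify_read(read, repeat):
    """Assign a read to (orientation, spacer) using the direct repeat; None if unassignable."""
    for orientation, dr in (("forward", repeat), ("reverse", revcomp(repeat))):
        first = read.find(dr)
        if first == -1:
            continue
        second = read.find(dr, first + len(dr))
        if second == -1:
            continue
        spacer = read[first + len(dr):second]
        # Report reverse-orientation spacers in their designed (forward) sequence.
        return orientation, spacer if orientation == "forward" else revcomp(spacer)
    return None


def normalized_counts(reads, repeat, library):
    """(r_a + 1) / R for every library element a, where R = total reads for library elements."""
    counts = Counter()
    for read in reads:
        element = classify_read(read, repeat)
        if element in library:
            counts[element] += 1
    total = sum(counts.values())
    return {a: (counts[a] + 1) / total for a in library}


def depletion_scores(input_norm, output_norm):
    """Normalized output reads divided by normalized input reads, per array element."""
    return {a: output_norm[a] / input_norm[a] for a in input_norm}
```

The pseudocount gives elements that are absent from the output library a small, finite score instead of zero.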
[0208] To identify specific parameters resulting in enzymatic activity and bacterial cell death, we used next generation sequencing (NGS) to quantify and compare the representation of individual CRISPR arrays (i.e., repeat-spacer-repeat) in PCR amplicons of the input and output plasmid libraries. We define the array depletion ratio as the normalized output read count divided by the normalized input read count. An array was considered to be strongly depleted if its fold depletion was at least 3 (i.e., a depletion ratio of at most 1/3). When calculating the array depletion ratio across biological replicates, we took the maximum depletion ratio value for a given CRISPR array across all experiments (i.e., a strongly depleted array must be strongly depleted in all biological replicates). We generated a matrix including array depletion ratios and the following features for each spacer target: target strand, transcript targeting, ORI targeting, target sequence motifs, flanking sequence motifs, and target secondary structure. We investigated the degree to which different features in this matrix explained target depletion for CLUST.009925 systems, thereby yielding a broad survey of functional parameters within a single screen.
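The replicate rule above keeps the maximum depletion ratio per array. This is equivalent to requiring that every replicate pass the threshold, because the least-depleted replicate decides the outcome. The Python sketch below applies that rule. It assumes that "fold depletion" is the reciprocal of the depletion ratio, so a 3-fold threshold means a ratio of at most 1/3. The function name and input layout (one dictionary per replicate) are hypothetical.

```python
from __future__ import annotations


def strongly_depleted(replicates: list[dict[str, float]], min_fold: float = 3.0) -> dict[str, float]:
    """Return arrays strongly depleted in every replicate, with their worst-case ratio.

    Each replicate maps an array element to its depletion ratio
    (normalized output / normalized input). Taking the maximum ratio across
    replicates means an array passes only if even its least-depleted replicate
    shows at least min_fold-fold depletion.
    """
    shared = set.intersection(*(set(rep) for rep in replicates))
    worst = {a: max(rep[a] for rep in replicates) for a in shared}
    return {a: ratio for a, ratio in worst.items() if ratio <= 1.0 / min_fold}
```

For example, an array with ratios 0.2 and 0.25 in two replicates passes with a worst-case ratio of 0.25. An array with ratios 0.1 and 0.5 fails, because its second replicate is only 2-fold depleted.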
[0209] Bacterial Screening Indicates Programmable DNA Interference by the NZ_BCQZ01000006 CLUST.009925 System
[0210] Comparison of two biological replicate screens for the NZ_BCQZ01000006 CLUST.009925 CRISPR-Cas system showed reproducible depletion of RNA guide elements for only one direct repeat expression direction (5' GTGC . . . ATCC--[spacer] . . . 3'). This indicates a specific interaction of the NZ_BCQZ01000006 CLUST.009925 effector proteins with one direct repeat orientation, accompanied by modification of the targeted pACYC184 and E. coli essential gene sequences (FIGS. 12-13). Depletion of guides targeting both strands of pACYC184 and E. coli essential genes indicates that DNA targeting is not confined to a specific strand orientation (FIGS. 14A-B).
[0211] Additionally, depleted targets localize within or close to gene coding sequences. This pattern indicates that interference arises from interruption of coding DNA, consistent with mobilization and programmable insertion of the Transposon Payload Cassette at or near target sites (FIGS. 14A-B). To characterize the targeting requirements of the CLUST.009925 system, we compiled all target and target-flanking sequences associated with depleted RNA guide elements (FIGS. 15A-C). Flanking sequences showed no evidence of a PAM or PFS element required for targeting, and target sequences showed a slight preference for G or C at the first position of the spacer (FIGS. 15A-C).
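One simple way to quantify the first-position preference is to compare base frequencies at spacer position 1 among depleted guides with base frequencies at that position across the full library. The Python sketch below computes this ratio. It is a hypothetical summary statistic chosen for illustration, not the analysis used to generate FIGS. 15A-C, and it assumes that spacers are uppercase strings written 5' to 3'.

```python
from __future__ import annotations
from collections import Counter


def first_base_enrichment(depleted: list[str], library: list[str]) -> dict[str, float]:
    """Frequency of each base at spacer position 1 among depleted guides,
    divided by its frequency at that position across the whole library."""
    dep = Counter(s[0] for s in depleted)
    lib = Counter(s[0] for s in library)
    return {
        base: (dep[base] / len(depleted)) / (lib[base] / len(library))
        for base in "ACGT"
        if lib[base]  # skip bases never seen at position 1 in the library
    }
```

Values above 1 mark bases that are over-represented at position 1 among depleted guides. A slight G/C preference would appear as modest enrichment for G and C.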
OTHER EMBODIMENTS
[0212] It is to be understood that while the invention has been described in conjunction with the detailed description thereof, the foregoing description is intended to illustrate and not limit the scope of the invention, which is defined by the scope of the appended claims. Other aspects, advantages, and modifications are within the scope of the following claims.
Sequence CWU
1
1
1371422PRTStreptomyces viridifaciens 1Met Ile Arg Val Glu Asp Trp Ala Glu
Ile Arg Arg Leu His Arg Ala1 5 10
15Glu Gln Met Pro Ile Arg Ala Ile Ala Arg His Leu Gly Ile Ser
Lys 20 25 30Asn Thr Val Lys
Arg Ala Leu Ala Thr Asp Arg Pro Pro Val Tyr Gln 35
40 45Arg Pro Leu Lys Gly Ser Ala Val Asp Ala Val Glu
Pro Ala Ile Arg 50 55 60Glu Leu Leu
Arg Gln Thr Pro Ala Met Pro Ala Thr Val Ile Ala Glu65 70
75 80Arg Ile Gly Trp Asp Arg Gly Leu
Thr Ile Leu Lys Glu Arg Val Arg 85 90
95Glu Leu Arg Pro Ala Tyr Leu Pro Val Asp Pro Val Ser Arg
Thr Val 100 105 110Tyr Ser Pro
Gly Glu Leu Ala Gln Cys Asp Leu Trp Phe Pro Pro Ala 115
120 125Glu Ile Pro Leu Gly Tyr Gly Gln Thr Gly Arg
Pro Pro Val Leu Val 130 135 140Val Val
Ser Gly Tyr Ser Arg Val Ile Thr Ala Arg Met Leu Pro Ser145
150 155 160Arg Ser Thr Ala Asp Leu Ile
Asp Gly His Trp Arg Leu Leu Thr Gly 165
170 175Trp Gly Ala Val Pro Lys Thr Leu Val Trp Asp Asn
Glu Ser Gly Val 180 185 190Gly
Gln Gly Lys Leu Thr Ser Glu Phe Ala Ala Phe Ala Gly Leu Leu 195
200 205Ala Thr Arg Ile His Leu Cys Arg Pro
Arg Asp Pro Glu Ala Lys Gly 210 215
220Leu Val Glu Arg Val Asn Gly Tyr Leu Glu Thr Ser Phe Leu Pro Gly225
230 235 240Arg Thr Phe Ala
Gly Pro Gly Asp Phe Asn Thr Gln Leu Thr Ala Trp 245
250 255Leu Gln Val Ala Asn Arg Arg His His Arg
Thr Ile Gly Cys Arg Pro 260 265
270Ala Glu Arg Trp Glu Ala Asp Arg Ser Glu Met Leu Thr Leu Pro Pro
275 280 285Val Ala Pro Pro Asn Trp Trp
Pro Gln Thr Thr Arg Leu Gly Arg Asp 290 295
300His Tyr Ile Arg Val Asp Thr Cys Asp Tyr Ser Val His Pro Arg
Ala305 310 315 320Ile Gly
Gln Gln Ile Thr Val Arg Ala Asp Thr Glu Glu Ile Val Ala
325 330 335Thr Asn Arg Thr Gly Glu Val
Val Ala Arg His Ser Arg Cys Trp Ala 340 345
350Arg His Gln Thr Ile Thr Asp Pro Glu His Thr Ala Ala Ala
Gln Thr 355 360 365Leu Arg Gly Gln
Ala Ile Arg Gln Arg Ala Ser Arg Ala Trp Ala Ser 370
375 380Ser Leu Thr Ala Leu Ala Pro Asp Asp Leu Gly Val
Glu Val Glu Gln385 390 395
400Arg Glu Leu Gly Ser Tyr Asp Arg Ile Phe Thr Leu Ile Glu Gly Gly
405 410 415Ala Gly Lys Glu Asp
Thr 4202413PRTMycobacterium tuberculosis 2Met Leu Thr Val Glu
Asp Trp Ala Glu Ile Arg Arg Leu His Arg Ala1 5
10 15Glu Gly Leu Pro Ile Lys Met Ile Ala Arg Val
Leu Gly Ile Ser Lys 20 25
30Asn Thr Val Lys Ser Ala Leu Glu Ser Asn Gln Gln Pro Lys Tyr Glu
35 40 45Arg Ala Pro Gln Gly Ser Ile Val
Asp Ala Val Glu Pro Arg Ile Arg 50 55
60Glu Leu Leu Gln Ala Tyr Pro Thr Met Pro Ala Thr Val Ile Ala Glu65
70 75 80Arg Ile Gly Trp Glu
Arg Ser Ile Arg Val Leu Ser Ala Arg Val Ala 85
90 95Glu Leu Arg Pro Val Tyr Leu Pro Pro Asp Pro
Ala Ser Arg Thr Thr 100 105
110Tyr Val Ala Gly Glu Ile Ala Gln Cys Asp Phe Trp Phe Pro Pro Ile
115 120 125Glu Leu Pro Val Gly Phe Gly
Gln Thr Arg Thr Ala Lys Gln Leu Pro 130 135
140Val Leu Thr Met Val Cys Ala Tyr Ser Arg Trp Leu Leu Ala Met
Leu145 150 155 160Leu Pro
Ser Arg Cys Ala Glu Asp Leu Phe Ala Gly Trp Trp Arg Leu
165 170 175Ile Glu Ala Leu Gly Ala Val
Pro Arg Val Leu Val Trp Asp Gly Glu 180 185
190Gly Ala Ile Gly Arg Trp Arg Gly Gly Arg Ser Glu Leu Thr
Thr Glu 195 200 205Cys Gln Ala Phe
Arg Gly Thr Leu Ala Ala Lys Val Leu Ile Cys Arg 210
215 220Pro Ala Asp Pro Glu Ala Lys Gly Leu Ile Glu Arg
Ala His Asp Tyr225 230 235
240Leu Glu Arg Ser Phe Leu Pro Gly Arg Val Phe Ala Ser Pro Ala Asp
245 250 255Phe Asn Ala Gln Leu
Gly Ala Trp Leu Ala Leu Val Asn Thr Arg Thr 260
265 270Arg Arg Ala Leu Gly Cys Ala Pro Thr Asp Arg Ile
Gly Ala Asp Arg 275 280 285Ala Ala
Met Leu Ser Leu Pro Pro Val Ala Pro Ala Thr Gly Trp Cys 290
295 300Thr Ser Leu Arg Leu Pro Arg Asp His Tyr Val
Arg Cys Asp Ser Asn305 310 315
320Asp Tyr Ser Val His Pro Gly Val Ile Gly His Arg Val Leu Val Arg
325 330 335Ala Asp Leu Glu
Arg Val His Val Phe Cys Asp Gly Glu Leu Val Ala 340
345 350Asp His Glu Arg Ile Trp Ala Val His Gln Thr
Val Ser Asp Pro Ala 355 360 365His
Val Glu Ala Ala Lys Val Leu Arg Arg Arg His Phe Ser Ala Ala 370
375 380Ser Pro Val Val Glu Pro Gln Val Gln Val
Arg Ser Leu Ser Asp Tyr385 390 395
400Asp Asp Ala Leu Gly Val Asp Ile Asp Gly Gly Val Ala
405 4103415PRTStreptomyces fradiae 3Met Ile Ser Val
Glu Asp Trp Ala Glu Ile Arg Arg Leu His Arg Ser1 5
10 15Glu Gly Met Pro Ile Arg Ala Ile Ala Arg
Lys Leu Gly Ile Ser Arg 20 25
30Thr Thr Val Arg Arg Ala Val Ala Ser Asp Arg Pro Pro Lys Tyr Glu
35 40 45Arg Ala Pro Lys Gly Ser Ile Val
Asp Glu Val Glu Pro Arg Ile Arg 50 55
60Glu Leu Leu Glu Val Trp Pro Asp Met Pro Ala Thr Val Val Ala Glu65
70 75 80Arg Ile Gly Trp Gln
Arg Gly Met Thr Val Leu Arg Asp Arg Leu Arg 85
90 95Asp Leu Arg Arg Asp Tyr Val Pro Ala Asp Pro
Ala Ser Arg Thr Ala 100 105
110Tyr Glu Pro Gly Glu Leu Val Gln Cys Asp Leu Trp Phe Pro Pro Ala
115 120 125Asp Ile Pro Leu Gly Phe Gly
Gln Val Gly Ser Pro Pro Val Leu Val 130 135
140Met Val Ser Gly Tyr Ser Arg Trp Ile Thr Ala Arg Met Leu Pro
Thr145 150 155 160Arg Ser
Ala Ala Asp Leu Ile Ala Gly His Trp Arg Leu Leu Thr Gly
165 170 175Leu Gly Ala Val Pro Lys Ala
Leu Val Trp Asp Asn Glu Gly Ala Val 180 185
190Gly Ser Trp Arg Gly Gly Arg Pro Arg Leu Thr Glu Asp Phe
Ala Ala 195 200 205Phe Ala Gly Leu
Leu Gly Ile Arg Ile Val Gln Cys Arg Pro Gly Asp 210
215 220Pro Glu Ala Lys Gly Met Val Glu Arg Ala Asn Gly
Tyr Leu Glu Thr225 230 235
240Ser Phe Leu Pro Gly Arg Val Phe Ala Ser Pro Thr Asp Phe Asn Val
245 250 255Gln Leu Glu Asp Trp
Leu Arg Arg Ala Asn Arg Arg Val His Arg Thr 260
265 270Leu Gln Ala Arg Pro Ala Asp Arg Ile Asp Ala Asp
Arg Ala Gly Met 275 280 285Leu Pro
Leu Pro Pro Val Asp Pro Pro Gly Trp Trp Arg Thr Ser Leu 290
295 300Arg Leu Pro Arg Asp His Tyr Val Arg Val Asp
Thr Cys Asp Tyr Ser305 310 315
320Val His Pro Leu Ala Ile Gly Arg Arg Ile Glu Val Lys Ala Gly Leu
325 330 335Glu Gly Val Val
Val Phe Cys Glu Gly Thr Glu Val Ala Arg His Val 340
345 350Arg Cys Trp Ala Arg His Gln Thr Ile Thr Asp
Pro Ala His Ser Ala 355 360 365Ala
Ala Gly Ala Ala Arg Leu Thr Ala Cys Lys Ala Pro Ala Asp Asp 370
375 380Gly Glu Ala Glu Val Glu Gln Arg Ser Leu
Asp Thr Tyr Asp Arg Ile385 390 395
400Phe Gly Val Ile Glu Gly Gly Leu Gly Gln Glu Glu Gly Ile Ala
405 410
4154415PRTStreptomyces noursei 4Met Ile His Val Glu Asp Trp Ala Glu Ile
Arg Arg Leu His Arg Ala1 5 10
15Glu Gln Met Pro Ile Arg Ala Ile Ala Arg His Leu Gly Ile Ser Lys
20 25 30Asn Thr Val Lys Arg Ala
Leu Ala His Asp Arg Pro Pro Lys Tyr Glu 35 40
45Arg Pro Ala Lys Gly Ser Ala Val Asp Ala Val Glu Val Gln
Ile Arg 50 55 60Glu Leu Leu Arg Glu
Thr Pro Thr Met Pro Ala Thr Val Ile Ala Glu65 70
75 80Arg Ile Gly Trp Gln Arg Gly Met Thr Ile
Leu Arg Glu Arg Val Arg 85 90
95Glu Leu Arg Pro Ala Tyr Leu Pro Val Asp Pro Val Ser Arg Thr Thr
100 105 110Tyr Arg Pro Gly Glu
Leu Ala Gln Cys Asp Leu Trp Phe Pro Glu Ala 115
120 125Asp Ile Pro Leu Gly Tyr Gly Gln Thr Gly Arg Pro
Pro Val Leu Val 130 135 140Met Val Ser
Gly Tyr Ser Arg Ile Ile Ala Ala Arg Met Leu Pro Ser145
150 155 160Arg Arg Ser Gly Asp Leu Ile
Asp Gly His Trp Arg Leu Leu Thr Ala 165
170 175Trp Gly Ala Val Pro Arg Met Leu Val Trp Asp Asn
Glu Ala Gly Val 180 185 190Gly
Lys Gly Arg Val Thr Ser Glu Phe Ala Ala Phe Ala Gly Leu Leu 195
200 205Ala Thr Lys Ile Tyr Leu Cys Arg Pro
Arg Asp Pro Glu Ala Lys Gly 210 215
220Leu Val Glu Arg Ala Asn Gly Tyr Leu Glu Thr Ser Phe Leu Pro Gly225
230 235 240Arg His Phe Thr
Gly Pro Asp Asp Phe Asn Thr Gln Leu Asp Ala Trp 245
250 255Leu Lys Val Ala Asn Arg Arg Val His Arg
Thr Leu Gln Ala Arg Pro 260 265
270Ser Asp Arg Trp Glu Ala Asp Arg Ala Gly Met Leu Ala Leu Pro Pro
275 280 285Val Asp Pro Pro Ser Trp Trp
Arg Phe Gln Ile Arg Leu Gly Arg Asp 290 295
300His Tyr Val Arg Val Asp Thr Cys Asp Tyr Ser Val Asp Pro Ala
Ala305 310 315 320Ile Gly
Arg Met Val Thr Val Leu Cys Asp Asn Asp Glu Val Ile Val
325 330 335Leu Ala Gln Gly Gly Glu Ile
Val Ala Arg His Pro Arg Cys Trp Ala 340 345
350Arg His Gln Thr Leu Thr Asp Pro His His Ala Ala Ala Gly
Asp Val 355 360 365Met Arg Arg Glu
Val His Arg Arg His Gly Ala Ala Cys Ala Ala Ala 370
375 380Ala Pro Asp Val Val Glu Val Glu Gln Arg Glu Leu
Gly Thr Tyr Asp385 390 395
400Arg Leu Phe Thr Val Ile Asp Gly Gly Asn Asn Gln Glu Ala Gly
405 410 4155419PRTTessaracoccus sp.
T2.5-30 5Met Ile Ser Val Glu Asp Trp Ala Glu Ile Arg Arg Leu His Arg Ser1
5 10 15Glu Gly Leu Ala
Ile Lys Ala Ile Ala Arg Gln Leu Gly Val Ala Arg 20
25 30Asn Thr Val Arg Ser Ala Leu Ala Ser Asp Val
Pro Pro Arg Tyr Glu 35 40 45Arg
Glu Ser Pro Gly Ser Leu Val Asp Ala Val Glu Pro Gln Ile Arg 50
55 60Ala Leu Leu Ala Ala Thr Pro Arg Met Pro
Ala Thr Val Ile Ala Glu65 70 75
80Arg Ile Ser Trp Glu His Ser Ser Ser Val Leu Arg Ala Arg Val
Ala 85 90 95Glu Leu Arg
Pro Leu Tyr Leu Pro Ala Asp Pro Ala Asp Arg Thr Gln 100
105 110Tyr Arg Ala Gly Glu Ile Val Gln Cys Asp
Leu Trp Phe Pro Pro Arg 115 120
125Val Val Pro Val Ala Asp Gly Val Leu Ala Ala Pro Pro Val Leu Thr 130
135 140Met Val Ala Ala Trp Ser Gly Phe
Ile Ala Ala Leu Leu Leu Pro Thr145 150
155 160Arg Gln Thr Gly Asp Leu Leu Ala Gly Met Trp Gln
Leu Leu Thr Gly 165 170
175Ser Phe Gly Ala Val Pro Arg Met Leu Val Trp Asp Asn Glu Ala Gly
180 185 190Ile Gly Gln His Arg Arg
Leu Thr Val Pro Ala Arg Gly Phe Ala Gly 195 200
205Thr Leu Gly Thr Arg Ile Tyr Gln Thr Asn Pro Arg Asp Pro
Glu Ala 210 215 220Lys Gly Val Val Glu
Arg Ala Asn Gly Phe Leu Gln Thr Ser Phe Met225 230
235 240Pro Gly Arg Glu Phe Val Ser Pro Ala Asp
Phe Asn Thr Gln Leu Ala 245 250
255Asp Trp Leu Pro Arg Ala Asn Gln Arg Leu Leu Arg Arg Thr Gly Ala
260 265 270Arg Pro Ala Asp Met
Leu Ala Ser Glu Ala Ala Ala Met Ser Ser Leu 275
280 285Pro Pro Val Ala Pro Leu Val Gly Ser Ser Ser Arg
Val Arg Leu Gly 290 295 300Arg Asp Tyr
Tyr Val Arg Val Ala Gly Asn Asp Tyr Ser Val Asp Pro305
310 315 320Ala Val Ile Gly Arg Phe Val
Asp Val Gly Cys Asp Met Asp Thr Val 325
330 335Thr Val Thr Cys Ala Gly Gln Pro Val Ala Ser His
Ala Arg Cys Trp 340 345 350Asp
Gln Arg Arg Thr Leu Thr Asp Pro Ala His Val Ala Thr Ala Arg 355
360 365Thr Leu Arg Ala Ala His Gln Ala Gln
Lys Thr Ala Gly Arg Val Asp 370 375
380Ala Arg Gly Ala Gly Glu Asp Val Ala Leu Arg Pro Leu Ser Val Tyr385
390 395 400Asp Asp Leu Phe
Asp Leu Thr Gly Ile Asp Pro Ala Ser Ala Ser Gly 405
410 415Ala Val Ala6405PRTStreptosporangium sp.
M26 6Met Ile Lys Val Glu Asp Trp Ala Glu Ile Arg Arg Leu Tyr Arg Ala1
5 10 15Glu Ala Met Pro Ile
Lys Ala Ile Ala Arg His Leu Gly Ile Ser Lys 20
25 30Asn Thr Val Lys Arg Ala Leu Ala Ala Asp Ala Pro
Pro Lys Tyr Gln 35 40 45Arg Pro
Ser Lys Gly Ser Ile Val Asp Ala Ala Glu Pro Gln Ile Arg 50
55 60Ala Leu Leu Ala Lys Phe Pro Thr Met Pro Ala
Thr Val Ile Ala Glu65 70 75
80Arg Ile Gly Trp Glu Arg Ser Ile Thr Val Leu Lys Asp Arg Ile Arg
85 90 95Val Leu Arg Pro Gln
Phe Lys Pro Val Asp Pro Ala Ser Arg Thr Thr 100
105 110Tyr Gln Ala Gly Glu Leu Ala Gln Cys Asp Leu Trp
Phe Pro Pro Val 115 120 125Lys Val
Pro Val Gly Ala Gly His Leu Ala Ser Pro Pro Val Leu Val 130
135 140Met Val Ser Gly Tyr Ser Arg Trp Met Leu Pro
Ser Arg Thr Ala Gly145 150 155
160Asp Leu Phe Ala Gly His Trp Ser Leu Leu Ser Asp Leu Gly Ala Val
165 170 175Pro Lys Thr Leu
Val Trp Asp Asn Glu Ser Ala Ile Gly Gln Arg Arg 180
185 190Gly Gly Lys Ala Gln Leu Thr Ala Asp Ala Asn
Ala Phe Arg Gly Ala 195 200 205Leu
Gly Val Gly Ile Thr Gln Cys Arg Pro Gly Asp Pro Glu Ala Lys 210
215 220Gly Leu Val Glu Arg Ala Asn Gly Tyr Leu
Glu Thr Ser Phe Leu Pro225 230 235
240Gly Arg Thr Phe Ser Cys Pro Gln Asp Phe Asn Thr Gln Leu Ala
Asp 245 250 255Trp Leu Pro
Met Ala Asn Glu Arg His His Arg Arg Ile Gln Cys Arg 260
265 270Pro Leu Asp Arg Leu Arg Ala Asp Leu Ala
Ala Met Val Ala Leu Pro 275 280
285Pro Ile Ala Pro Leu Leu Gly Trp Arg Thr Ser Thr Arg Leu Ala Arg 290
295 300Asp His Tyr Val Arg Ile Ala Ser
Cys Asp Tyr Ser Val His Pro Ser305 310
315 320Val Ile Gly Arg Leu Val Glu Val Ile Ala Asp Leu
Glu Glu Val Thr 325 330
335Val Thr Cys Gly Gly Gln Val Val Ala Arg His Arg Arg Cys Trp Ala
340 345 350Pro His Gln Thr Ile Thr
Asp Ala Leu His Ala Gln Val Ala Ala Ala 355 360
365Met Arg Arg Thr Cys Leu Arg Thr Ala Thr Ala Pro Ala Lys
Ala Glu 370 375 380Val Glu Val Glu Gln
Arg Arg Leu Ser Asp Tyr Asp Ala Leu Leu Gly385 390
395 400Ile Glu Gly Val Ala
405
<210> 7
<211> 413
<212> PRT
<213> Mycobacterium rhodesiae
<400> 7
Met Ile Ser Leu Glu Asp Trp Ala Leu
Ile Arg His Leu His Arg Ser1 5 10
15Glu Gly Leu Ser Gln Arg Ala Ile Ala Arg Gln Leu Gly Ile Ala
Arg 20 25 30Asp Thr Val Ala
Ser Ala Leu Ala Ser Asp Gly Pro Pro Lys Tyr Glu 35
40 45Arg Ala Ala Val Pro Ser Ala Ile Asn Glu Val Glu
Pro Arg Ile Arg 50 55 60Ala Leu Leu
Thr Ala Tyr Pro Gln Leu Pro Ala Thr Val Ile Ala Glu65 70
75 80Arg Val Gly Trp Thr Gly Ser Ile
Ser Trp Phe Arg Glu Arg Val Arg 85 90
95Ala Ile Arg Pro Glu Tyr Leu Pro Ala Asp Pro Val Asp Arg
Leu Glu 100 105 110His Pro Pro
Gly Arg Val Ile Gln Cys Asp Leu Trp Phe Pro Ala Pro 115
120 125Lys Ile Ala Val Gly Phe Gly Gln Glu Ser Ile
Leu Pro Val Leu Val 130 135 140Met Val
Ala Ala Phe Ser Arg Phe Ile Ala Ala Val Met Leu Pro Ser145
150 155 160Arg Gln Thr Met Asp Leu Val
Ala Gly Met Trp Gln Leu Leu Ser Gly 165
170 175Ser Phe Ala Ala Val Pro Arg Glu Leu Trp Trp Asp
Asn Glu Ala Gly 180 185 190Ile
Gly Arg Arg Gly Arg Leu Thr Asp Pro Val Thr Ala Leu Val Gly 195
200 205Thr Leu Gly Ala Arg Leu Val Gln Leu
Arg Pro Tyr Asp Pro Glu Ser 210 215
220Lys Gly Met Val Glu Arg Ala Asn Arg Tyr Leu Glu Thr Ser Phe Leu225
230 235 240Pro Gly Arg Ser
Phe Thr Ser Pro Gln Asp Phe Asn Asp Gln Leu Gln 245
250 255Gln Trp Leu Pro Val Ala Asn Ser Arg Arg
Val Arg Val Leu Asp Gly 260 265
270Arg Pro Ile Asp Phe Leu Asp Ala Asp Arg Ala Gln Met Leu Arg Leu
275 280 285Pro Pro Val Pro Pro Val Thr
Glu Thr Val Ala Ser Val Arg Leu Gly 290 295
300Arg Asp Tyr Tyr Val Arg Val Ala Gly Asn Asp Tyr Ser Val Asp
Pro305 310 315 320Ala Ala
Ile Gly Gln Leu Val Asp Val Thr Thr Thr Leu Thr Gln Val
325 330 335Thr Val Ser Arg Ala Gly Arg
Leu Leu Ala Ala His Glu Arg Cys Trp 340 345
350Ala Ala Arg Gln Thr Leu Thr Asp Pro Ala His Val Gln Ala
Ala Ala 355 360 365Thr Leu Arg Gln
Gln Phe His Ala Gly Pro Ala Pro Gln Ala Gly Ala 370
375 380His Leu Leu Arg Asp Leu Ala Asp Tyr Asp Arg Ala
Phe Gly Val Asp385 390 395
400Phe Ser Thr Gly Thr Val Thr Ser Asp Gly Glu Val Ala
405 410
<210> 8
<211> 414
<212> PRT
<213> Streptomyces thermoautotrophicus
<400> 8
Met Glu
Asp Trp Ala Glu Ile Arg Arg Leu His Arg Ala Glu Gly Val1 5
10 15Pro Ile Lys Glu Ile Ala Arg Arg
Leu Gly Val Ala Arg Asn Thr Val 20 25
30Arg Ala Ala Leu Asn Ser Asp Arg Pro Pro Lys Tyr Glu Arg Ala
Ser 35 40 45Arg Gly Gln Val Ala
Asp Ala Phe Glu Pro Gln Met Arg Ala Leu Leu 50 55
60Lys Glu Trp Pro Arg Met Pro Ala Pro Val Ile Ala Glu Arg
Ile Gly65 70 75 80Trp
Pro Tyr Ser Met Ala Pro Leu Arg Lys Arg Leu Ala Leu Ile Arg
85 90 95Pro Glu Tyr Leu Gly Ile Asp
Pro Val Asp Arg Val Thr Tyr Gly Pro 100 105
110Gly Gln Val Ala Gln Cys Asp Leu Trp Phe Pro Gln Thr Arg
Ile Pro 115 120 125Val Thr Ala Gly
Gln Glu Arg Met Leu Pro Val Leu Val Met Thr Leu 130
135 140Gly Phe Ser Arg Phe Met Thr Ala Thr Met Ile Pro
Thr Arg Gln Ala145 150 155
160Gly Asp Ile Leu Ser Gly Met Trp Gln Leu Ile Arg Gly Ile Gly Arg
165 170 175Val Thr Lys Thr Leu
Val Trp Asp Arg Glu Ala Ala Ile Gly Gly Thr 180
185 190Gly Lys Val Ser Ala Pro Ala Ala Ala Phe Ala Gly
Thr Leu Ala Thr 195 200 205Thr Ile
Arg Leu Ala Pro Pro Arg Asp Pro Glu Phe Lys Gly Met Val 210
215 220Glu Arg Asn Asn Gln Tyr Leu Glu Thr Ser Phe
Leu Pro Gly Arg Arg225 230 235
240Phe Val Ser Pro Ala Asp Phe Asn Asp Gln Leu Gly Asp Trp Leu Val
245 250 255Arg Ala Asn Ser
Arg Thr Val Arg Ser Ile Gln Gly Arg Pro Val Asp 260
265 270Leu Leu Glu Ala Asp Cys Gln Ala Met Ile Pro
Leu Pro Pro Val Thr 275 280 285Pro
Pro Ile Gly Leu Asn His Arg Val Arg Leu Gly Arg Asp Tyr Tyr 290
295 300Val Arg Val Asp Thr Val Asp Tyr Ser Val
Asp Pro Gln Ala Ile Gly305 310 315
320Arg Phe Val Asp Val Thr Ala Ser Leu Glu Thr Val Thr Val Leu
Cys 325 330 335Asp Gly Gln
Leu Val Ala Arg His Ala Arg Ser Trp Ala Arg Gln Gly 340
345 350Val Ile Thr Asp Pro Val His Ala Ala Thr
Ala Ala Arg Met Arg Gln 355 360
365Ala Leu Ala Glu Asp Arg Gln Arg Arg Gln Ala Ala Val Arg Arg His 370
375 380Ala Asp Gly His Ala Val Ala Leu
Arg Ala Leu Pro Asp Tyr Asp Ala385 390
395 400Leu Phe Gly Val Asp Phe Asn Pro Pro Ser Thr Lys
Ala Lys 405 410
<210> 9
<211> 408
<212> PRT
<213> Salinispora arenicola
<400> 9
Met Leu Ser Val Glu Asp Trp Ala Glu Ile Arg Arg Leu His Arg
Ala1 5 10 15Glu Arg Met
Ala Ile Lys Ala Ile Cys Arg Arg Leu Gly Val Ser Arg 20
25 30Asn Thr Val Arg Lys Ala Leu Ala Ser His
Glu Pro Pro Arg Tyr Gln 35 40
45Arg Ala Ala Lys Gly Ser Ile Val Asp Ala Val Glu Pro Gln Ile Arg 50
55 60Val Leu Leu Ala Glu Phe Pro Asp Met
Pro Thr Thr Val Ile Met Glu65 70 75
80Arg Val Gly Trp Thr Arg Gly Lys Thr Val Phe Ala Asp Arg
Val Gln 85 90 95Gln Leu
Arg Pro Leu Phe Arg Arg Pro Asp Pro Ser Gln Arg Thr Glu 100
105 110Tyr Leu Pro Gly Glu Leu Ala Gln Cys
Asp Leu Trp Phe Pro Pro Ala 115 120
125Asp Val Pro Leu Gly Phe Gly Gln Val Gly Arg Pro Pro Val Leu Val
130 135 140Met Val Ser Gly Tyr Ser Arg
Trp Leu Ser Ala Val Met Ile Pro Ser145 150
155 160Arg Gln Ser Pro Asp Leu Leu Val Gly His Trp Arg
Leu Ile Ser Gly 165 170
175Trp Arg Arg Val Pro Lys Ala Leu Val Trp Asp Asn Glu Ser Ala Val
180 185 190Gly Gln Trp Arg Ala Gly
Arg Pro Gln Leu Thr Glu Ala Met Asn Ala 195 200
205Phe Arg Gly Thr Leu Gly Ile Lys Val Ile Gln Cys Arg Pro
Ala Asp 210 215 220Pro Glu Ala Lys Gly
Leu Val Glu Arg Ala Asn Gly Tyr Leu Glu Thr225 230
235 240Ser Phe Leu Pro Gly Arg Arg Phe Ala Ser
Pro Gly Asp Phe Asn Ala 245 250
255Gln Leu Ser Glu Trp Leu Val Arg Ala Asn Asn Arg Gln His Arg Val
260 265 270Leu Gly Cys Arg Pro
Ala Glu Arg Trp Asp Ala Asp Arg Gln Ala Met 275
280 285Leu Pro Leu Pro Pro Val Ala Pro Val Val Gly Trp
Arg Gln Ala Thr 290 295 300Arg Leu Pro
Arg Asp His Tyr Val Arg Met Asp Gly Asn Asp Tyr Ser305
310 315 320Val His Pro Ser Val Val Gly
Arg Arg Val Glu Val Thr Ala Asp Gly 325
330 335Asp Gln Val Thr Val Leu Cys Asp Gly Arg Ser Val
Ala Arg His Asp 340 345 350Arg
Cys Trp Ala Lys His Gln Ser Ile Thr Asp Thr Ala His Arg Gln 355
360 365Ala Ala Ala Asp Leu Arg Val Ala Ala
Gln Arg Thr Pro Thr Ala Ala 370 375
380Val Asp Ala Gln Val Glu Arg Arg Pro Leu Ser Asp Tyr Asp Arg Leu385
390 395 400Phe Gly Leu Asp
Glu Val Ala Ala 405
<210> 10
<211> 402
<212> PRT
<213> Trueperella pyogenes
<400> 10
Met Asp
Asp Trp Ala Gln Ile Arg Ile Leu Arg Asn Glu Gly Met Ser1 5
10 15Ile Arg Lys Ile Ala Ser Thr Val
Gly Cys Ala Lys Lys Thr Val Glu 20 25
30Arg Ala Leu Ala Ser Asn Thr Pro Pro Ser Tyr Lys Gln Arg Ala
Pro 35 40 45Gln Lys Thr Ala Phe
Asp Glu Phe Glu Leu Asp Val Arg Ala Leu Ile 50 55
60Asp Asp Val Pro Asp Leu Pro Ala Thr Val Leu Ala Gln Arg
Val Gly65 70 75 80Trp
Thr Gly Ser Met Ser Trp Phe Arg Glu Asn Val Arg Arg Ile Arg
85 90 95Pro Glu Tyr Met Pro Lys Asp
Pro Val Asp Val Leu Asp His Lys Ala 100 105
110Gly Gln Gln Ile Gln Cys Asp Leu Met Phe Pro Asp Asp Gly
Ile Thr 115 120 125Asn Asp Met Gly
Val Pro Ala Lys Phe Pro Val Leu Val Met Val Ser 130
135 140Ser Tyr Ser Arg Phe Met Ala Ala Cys Val Leu Pro
Thr Lys Thr Thr145 150 155
160Gly Asp Leu Val Ser Gly Met Trp Met Leu Leu His Asp Arg Phe Gln
165 170 175Val Val Pro Gln His
Leu Leu Trp Asp His Glu Ser Gly Ile Gly Asn 180
185 190Lys Arg Leu Val Asp Gln Val Val Gly Phe Ser Gly
Thr Leu Gly Leu 195 200 205Lys Val
Arg Gln Ala Pro Pro Arg Asp Pro Glu Thr Lys Gly Ile Val 210
215 220Glu Arg His Asn Glu Tyr Leu Gln Thr Ser Phe
Phe Pro Gly Arg Arg225 230 235
240Phe Thr Asp Pro Ile Asp Ala Gln Ala Gln Leu Asp Ser Trp Ile Asp
245 250 255Asn Ile Ala Asn
Lys Arg Ile His Ala Thr Leu Lys Gln Arg Pro Val 260
265 270Asp Arg Trp Ala Ala Asp Lys Glu Ala Met Ser
Ala Leu Pro Pro Tyr 275 280 285Ala
Pro Gln Thr Gly Leu Arg Arg Gln Val Arg Leu Pro Arg Asn Tyr 290
295 300Tyr Val Ser Val Asp Ser Asn Arg Tyr Ser
Val Ser Pro Ala Ala Ile305 310 315
320Gly Lys Ile Val Thr Ile Phe Val Glu Leu His Arg Val Val Ile
Thr 325 330 335Asp Ser Ala
Gly Val Ile Val Ala Asp His Arg Arg Val Trp Gly Lys 340
345 350Asn Val Thr Val Thr Asp Pro Asp His Gln
Arg Leu Ala Lys Gln Met 355 360
365Arg Gln Ser Leu Ala Ala Pro Lys Pro Arg Ala Cys Asp Ile Ala Val 370
375 380Glu Ala Ala Asp Leu Ser Ile Tyr
Asp Lys Ile Ala Gly Trp Glu Gln385 390
395 400Ser Ala
<210> 11
<211> 408
<212> PRT
<213> Salinispora arenicola
<400> 11
Met Leu Ser
Val Glu Asp Trp Ala Glu Ile Arg Arg Leu Arg Arg Ser1 5
10 15Glu Gly Met Ala Ile Gln Ala Ile Ala
Arg Arg Leu Arg Met Ser Arg 20 25
30Asn Thr Val Lys Lys Ala Leu Ala Ser Asp Glu Pro Pro Arg Tyr Arg
35 40 45Arg Val Ala Lys Gly Ser Ile
Val Asp Ala Val Glu Pro Gln Ile Arg 50 55
60Ala Leu Leu Ala Glu Phe Pro Glu Met Pro Thr Thr Val Ile Met Val65
70 75 80Arg Val Gly Trp
Thr Arg Gly Lys Thr Val Phe Cys Asp Arg Val Gln 85
90 95Gln Leu Arg Pro Leu Phe Arg Arg Pro Asp
Pro Ala Gln Arg Thr Glu 100 105
110Tyr Leu Pro Gly Glu Leu Ala Gln Cys Asp Leu Trp Phe Pro Pro Ala
115 120 125Asp Val Ser Leu Gly Phe Gly
Gln Val Gly Arg Pro Pro Val Leu Val 130 135
140Met Val Ser Gly Tyr Ser Arg Trp Leu Ser Ala Val Met Ile Pro
Ser145 150 155 160Arg Gln
Ser Pro Asp Leu Leu Asp Gly His Trp Thr Leu Ile Ser Gly
165 170 175Trp Asp Arg Thr Pro Lys Gly
Leu Val Trp Asp Asn Glu Ser Ala Val 180 185
190Gly Gln Trp Arg Ala Gly Arg Pro Gln Leu Thr Glu Ala Met
Asn Ala 195 200 205Phe Arg Gly Thr
Leu Gly Ile Lys Val Ile Gln Cys Arg Pro Ala Asp 210
215 220Pro Glu Ala Lys Gly Leu Val Glu Arg Ala Asn Gly
Tyr Leu Glu Thr225 230 235
240Ser Phe Leu Pro Gly Arg Arg Phe Ala Ser Pro Gln Asp Phe Asn Ala
245 250 255Gln Leu Thr Glu Trp
Leu Val Arg Ala Asn Asn Arg Gln His Arg Met 260
265 270Leu Gly Cys Arg Pro Val Asp Arg Trp Asp Ala Asp
Arg Ala Ala Met 275 280 285Leu Ser
Leu Pro Pro Val Ala Pro Val Val Gly Trp Arg Arg Thr Thr 290
295 300Arg Leu Pro Arg Asp His Tyr Val Arg Leu Asp
Ser Asn Asp Tyr Ser305 310 315
320Val His Pro Ala Ala Val Gly Arg Arg Val Asp Ile Val Ala Asp Ala
325 330 335Asp Arg Val Gln
Val Phe Cys Glu Asn Arg Leu Phe Ala Arg His Asp 340
345 350Arg Cys Trp Ala Lys His Gln Ser Ile Thr Asp
Pro Ala His Arg Gln 355 360 365Ala
Ala Ala Asp Leu Arg Thr Ala Ala Arg Gln Thr Pro Ala Ala Ala 370
375 380Gly Thr Thr Glu Val Glu His Arg Gln Leu
Ala Asp Tyr Asp Arg Met385 390 395
400Phe Gly Leu Asp Glu Val Ala Ala
405
<210> 12
<211> 417
<212> PRT
<213> Tessaracoccus flavescens
<400> 12
Met Ile Ser Val Glu Asp Trp Ala Glu
Ile Arg Arg Leu His Lys Ser1 5 10
15Glu Lys Met Ala Ile Lys Ala Ile Ala Arg Gln Leu Gly Val Ala
Arg 20 25 30Asn Thr Val Arg
Ala Ala Leu Ser Ser Asp Gly Pro Pro Lys Tyr Glu 35
40 45Arg Ala Pro Val Gly Ser Ala Val Asp Ala Phe Glu
Pro Gln Ile Arg 50 55 60Ala Leu Leu
Ser Arg Thr Pro Thr Met Pro Ala Thr Val Ile Ala Glu65 70
75 80Arg Ile Gly Trp Thr Arg Ser Gly
Ser Val Leu Arg Ala Arg Val Ala 85 90
95Glu Leu Arg Pro Leu Phe Ala Pro Pro Asp Pro Ala Asp Arg
Thr Glu 100 105 110Tyr Gln Pro
Gly Glu Ile Val Gln Cys Asp Leu Trp Phe Pro Pro Lys 115
120 125Ile Val Arg Val Ala Ala Asp Val Trp Thr Ala
Pro Pro Val Leu Thr 130 135 140Met Val
Ala Ala Trp Ser Gly Phe Ile Ala Ala Val Leu Leu Pro Ser145
150 155 160Arg Thr Thr Gly Asp Leu Leu
Ala Gly Met Trp Gln Leu Leu Ser Gly 165
170 175Ser Phe Gly Gly Val Pro Lys Thr Leu Val Trp Asp
Asn Glu Ser Gly 180 185 190Ile
Gly Gln His Arg Arg Leu Thr Val Ala Ala Arg Ala Phe Ala Gly 195
200 205Thr Leu Gly Thr Arg Ile Phe Gln Thr
Arg Pro Arg Asp Pro Glu Ala 210 215
220Lys Gly Val Val Glu Arg Ala Asn Gly Tyr Leu Gln Thr Ser Phe Leu225
230 235 240Pro Gly Arg Glu
Phe Gly Ser Pro Ser Glu Phe Asn Thr Gln Leu Ala 245
250 255Ala Trp Leu Pro Arg Ala Asn Gln Arg Leu
Leu Arg Arg Thr Gly Ala 260 265
270Ala Pro Ala Ala Arg Leu Ala Ala Glu Val Ala Ala Met Thr Ala Leu
275 280 285Pro Pro Val Pro Pro Thr Val
Gly Phe Thr Asp Arg Ile Arg Leu Gly 290 295
300Arg Asp Tyr Tyr Val Arg Val Met Gly Asn Asp Tyr Ser Val Asp
Pro305 310 315 320Ala Ala
Ile Gly Arg Phe Val Asp Ile Thr Cys Asp Leu Glu His Val
325 330 335Thr Val Ser Cys Ala Gly Thr
Val Val Ala Glu His Gly Arg Cys Trp 340 345
350Asp Leu Arg Cys Thr Ile Thr Asp Pro Ala His Val Ala Ala
Ala Lys 355 360 365His Leu Arg Ala
Ala Phe Gln Ser Arg Asn Thr Pro Thr Thr Gly Arg 370
375 380Val Glu Pro Ser His Gln Val Gly Met Arg Ser Leu
Ser Asp Tyr Asp385 390 395
400Glu Leu Phe Asn Leu Asn Thr Ala Thr Ala Leu Ser Val Lys Glu Val
405 410
415Ala
<210> 13
<211> 416
<212> PRT
<213> Streptomyces pactum
<400> 13
Met Ile Ser Val Glu Asp Trp Ala Glu
Ile Arg Arg Leu His Arg Ala1 5 10
15Glu Gln Met Pro Val Arg Ala Ile Ala Arg Lys Leu Gly Ile Ala
Arg 20 25 30Asn Thr Val Arg
Arg Ala Ile Ala Asp Asp Ala Pro Pro Lys Tyr Gln 35
40 45Arg Ala Pro Lys Gly Ser Ile Val Asp Ala Val Glu
Pro Gln Ile Arg 50 55 60Glu Leu Leu
Glu Gln Trp Pro Glu Met Pro Ala Thr Val Ile Ala Glu65 70
75 80Arg Ile Gly Trp Asp Arg Gly Leu
Thr Val Leu Lys Asp Arg Val Arg 85 90
95Asp Leu Arg Pro Ala Tyr Arg Pro Ala Asp Pro Ala Ser Arg
Thr Val 100 105 110Tyr Glu Pro
Gly Glu Ile Gly Gln Cys Asp Leu Trp Phe Pro Pro Ala 115
120 125Asp Ile Pro Leu Gly Phe Gly Gln Val Gly Arg
Pro Pro Val Leu Val 130 135 140Met Val
Ala Gly Tyr Ser Arg Trp Ile Thr Ala Arg Met Leu Pro Ser145
150 155 160Arg Ser Ala Ala Asp Leu Ile
Ala Gly His Trp Arg Leu Leu Thr Glu 165
170 175Leu Gly Ala Val Pro Arg Val Leu Val Trp Asp Asn
Glu Gly Ala Val 180 185 190Gly
Ser Trp Arg Pro Gly Gly Pro Gln Leu Thr Asp Glu Phe Ala Ala 195
200 205Phe Ala Gly Leu Leu Gly Val Lys Phe
Leu Leu Cys Lys Pro Arg Asp 210 215
220Pro Glu Ala Lys Gly Leu Val Glu Arg Ala Asn Gly Tyr Leu Glu Thr225
230 235 240Ser Phe Leu Pro
Gly Arg Val Phe Thr Ser Pro Ala Asp Phe Asn Ile 245
250 255Gln Leu Ala Asp Trp Leu Thr Arg Ala Asn
Arg Arg Ile His Arg Thr 260 265
270Leu Gln Ala Arg Pro Ala Asp Arg Leu Glu Ala Asp Arg Ser Arg Met
275 280 285Leu Ala Leu Pro Pro Ile Ala
Pro Pro Gly Trp Trp Lys Ala Ser Leu 290 295
300Arg Leu Pro Arg Asp His Tyr Val Arg Leu Asp Thr Cys Asp Tyr
Ser305 310 315 320Val His
Pro Leu Ala Val Gly Arg Arg Val Gln Val Ala Ala Asp Leu
325 330 335Asp Gln Val Leu Val Thr Cys
Asp Gly Val Glu Val Ala Arg His Ala 340 345
350Arg Ser Trp Ala Arg His Gln Thr Ile Thr Asp Pro Asp His
Ala Ala 355 360 365Ala Ala Ala Ala
Ala Arg Lys Ala Ala Ala Gly Thr Lys Pro Ala Pro 370
375 380Val Asp Val Thr Glu Val Glu Glu Arg Ser Leu Glu
Ala Tyr Asp Arg385 390 395
400Ile Phe Gly Val Ile Asp Gly Gly Leu Ser Thr Gly Glu Gly Ala Ala
405 410 415
<210> 14
<211> 411
<212> PRT
<213> Mycobacterium canettii CIPT 140060008
<400> 14
Met Leu Ser Val Glu Asp Trp Ala Glu Ile Arg Arg
Leu Arg Arg Ser1 5 10
15Glu Arg Leu Pro Ile Ser Glu Ile Ala Arg Val Leu Lys Ile Ser Arg
20 25 30Asn Thr Val Lys Ser Ala Leu
Ala Ser Asp Gly Pro Pro Lys Tyr Gln 35 40
45Arg Ala Ala Lys Gly Ser Val Ala Asp Glu Ala Glu Pro Arg Ile
Arg 50 55 60Glu Leu Leu Ala Ala Tyr
Pro Arg Met Pro Ala Thr Val Ile Ala Glu65 70
75 80Arg Ile Gly Trp Trp Tyr Ser Ile Arg Thr Leu
Ser Gly Arg Val Arg 85 90
95Glu Leu Arg Pro Leu Tyr Leu Pro Pro Asp Pro Ala Ser Arg Thr Thr
100 105 110Tyr Val Ala Gly Glu Ile
Gly Gln Cys Asp Phe Trp Phe Pro Asp Val 115 120
125Val Val Pro Val Gly Tyr Gly Gln Val Arg Thr Ala Thr Ala
Leu Pro 130 135 140Val Leu Thr Met Val
Cys Gly Tyr Ser Arg Trp Ala Ser Ala Leu Leu145 150
155 160Ile Pro Thr Arg Thr Ala Glu Asp Leu Tyr
Ala Gly Trp Trp Gln His 165 170
175Leu Ser Thr Leu Gly Ala Val Pro Arg Val Leu Val Trp Asp Gly Glu
180 185 190Gly Ala Val Gly Arg
Trp Arg Ala Arg Gln Pro Glu Leu Thr Ala Ala 195
200 205Cys His Ala Phe Arg Gly Thr Leu Ala Ala Lys Val
Trp Ile Cys Lys 210 215 220Pro Gly Asp
Pro Glu Ala Lys Gly Leu Val Glu Arg Phe His Asp Tyr225
230 235 240Leu Glu Arg Ala Phe Leu Pro
Gly Arg Val Phe Ala Ser Pro Ala Asp 245
250 255Phe Asn Thr Gln Leu Gln Ala Trp Leu Val Arg Ala
Asn His Arg Gln 260 265 270His
Arg Val Leu Gly Cys Arg Pro Ala Asp Arg Ile Glu Ala Asp Thr 275
280 285Ala Ala Met Leu Thr Leu Pro Pro Val
Gly Pro Ser Ile Gly Trp Arg 290 295
300Thr Ser Thr Arg Leu Pro Arg Asp His Tyr Val Arg Leu Asp Gly Asn305
310 315 320Asp Tyr Ser Val
His Pro Val Ala Ile Gly Arg Arg Ile Glu Ile Thr 325
330 335Ala Asp Leu Ser Arg Val Arg Val Trp Cys
Gly Gly Thr Leu Val Ala 340 345
350Asp His Asp Arg Ile Trp Ala Lys His Gln Thr Ile Ser Asp Pro Glu
355 360 365His Val Val Ala Ala Lys Leu
Leu Arg Arg Lys Arg Phe Asp Ile Val 370 375
380Gly Pro Pro His His Val Glu Val Glu Gln Arg Leu Leu Thr Thr
Tyr385 390 395 400Asp Thr
Val Leu Gly Leu Asp Gly Pro Val Ala 405
410
<210> 15
<211> 372
<212> PRT
<213> Mycobacterium canettii CIPT 140070010
<400> 15
Met Ser Val Glu Glu
Trp Ala Lys Ile Arg Arg Leu His Arg Ser Glu1 5
10 15Gly Arg Ser Ile Ser Glu Ile Ala Arg Met Leu
Gly Val Ser Arg Asn 20 25
30Thr Val Arg Ala Ala Leu Ser Ser Ala Asp Pro Pro Lys Tyr Arg Arg
35 40 45Arg Pro Met Gly Ser Val Ala Asp
Ala Ala Glu Pro Lys Ile Arg Glu 50 55
60Leu Leu Thr Ala Asp Pro Arg Met Pro Ala Thr Val Ile Ala Glu Arg65
70 75 80Ile Gly Trp Ser His
Ser Ile Arg Thr Leu Arg Gly Arg Val Arg Glu 85
90 95Leu Arg Pro Leu Tyr Leu Pro Pro Asp Pro Ala
Ser Thr Thr Ala Tyr 100 105
110Ser Pro Gly Ala Ile Ala Gln Cys Ala Phe Trp Phe Pro Asn Val Val
115 120 125Val Pro Val Gly Phe Gly Gln
Ile Arg Thr Ala Ala Ala Leu Pro Val 130 135
140Leu Thr Met Ile Ser Gly Tyr Ser Trp Trp Val Ser Ala Ala Leu
Leu145 150 155 160Pro Thr
Arg Ser Ala Glu Asp Leu Tyr Val Gly Trp Trp Gln Gln Leu
165 170 175Ser Met Leu Gly Ala Thr Pro
Arg Ala Phe Val Trp Asp Gly Glu Asp 180 185
190Ala Val Gly Arg Trp Arg Thr Gln Gln Pro Glu Leu Thr Ala
Ala Cys 195 200 205His Thr Phe Cys
Ser Ala Leu Asp Ala Thr Val Gln Ile Cys Arg Pro 210
215 220Gly Asp Pro Glu Gly Arg Arg Leu Val Asn Gln Phe
Arg Gly Cys Leu225 230 235
240Glu Arg Ala Phe Leu Pro Gly Arg Arg Phe Ser Ser Pro Ala Asp Phe
245 250 255Asn Thr Gln Leu Arg
Glu Trp Leu Thr Gln Val Asn His His Arg His 260
265 270Arg Ala Leu Gly Phe Arg Pro Ala Asp Arg Ile Glu
Ala Asp Lys Ala 275 280 285Ala Met
Leu Ala Leu Pro Arg Val Arg Pro Ala Leu Gly Trp Gln Ala 290
295 300Ser Ser Thr Arg Leu Gln Arg Asp Gln Leu Val
Arg Ile Asp Gly Asn305 310 315
320Asn Tyr Ser Val His Pro Asp Ala Val Gly Arg Pro Ile Glu Val Thr
325 330 335Ala Asp Leu Asp
Arg Val Gln Val Trp Cys Gln Gly Ile Leu Val Ala 340
345 350Asp His Ala Arg Val Trp Ala Lys His Gln Thr
Ile Arg Asp Pro Ala 355 360 365His
Asn His Arg 370
<210> 16
<211> 407
<212> PRT
<213> Cellulomonas fimi ATCC 484
<400> 16
Met Thr Val Glu
Asp Trp Ala Glu Ile Arg Arg Leu His Arg Val Asp1 5
10 15Gly Met Ala Ile Lys Ala Ile Ala Arg Arg
Leu Gly Val Ser Arg Asn 20 25
30Ala Val Arg Arg Ala Leu Ala Arg Asp Ala Pro Pro Arg Tyr Glu Arg
35 40 45Ala Ala Lys Gly Ser Leu Val Asp
Gln Val Asp Pro Val Val Arg Gly 50 55
60Leu Leu Ala Ser Cys Pro Thr Met Pro Ala Thr Val Ile Ala Gln Arg65
70 75 80Ile Gly Trp Thr His
Ser Leu Thr Ile Leu Lys Asp Arg Val Arg Val 85
90 95Leu Arg Pro Tyr Tyr Leu Ala Pro Asp Pro Ala
Thr Arg Thr Ser Tyr 100 105
110Glu Ala Gly Gln Arg Val Gln Cys Asp Leu Trp Phe Pro Pro Val Asp
115 120 125Val Pro Leu Gly Ala Gly Gln
Val Gly Ser Pro Pro Val Leu Val Met 130 135
140Val Ala Gly Tyr Ser Arg Met Met Phe Ala Thr Met Leu Pro Ser
Arg145 150 155 160Gln Ala
Pro Asp Leu Ile Ala Gly His Trp Arg Leu Leu Asn Ala Met
165 170 175Gly Val Val Pro Arg Glu Leu
Val Trp Asp Asn Glu Ala Ala Val Gly 180 185
190Ser Trp Arg Ala Gly Lys Pro Lys Leu Thr Glu Glu Phe Glu
Ala Phe 195 200 205Arg Gly Val Leu
Gly Ile Gly Val His Gln Cys Arg Pro Arg Asp Pro 210
215 220Glu Ala Lys Gly Leu Val Glu Arg Ala Asn Gly Tyr
Leu Glu Thr Ser225 230 235
240Phe Leu Pro Gly Arg Met Phe Thr Ser Pro Gly Asp Phe Asn Thr Gln
245 250 255Leu Ala Asp Trp Leu
Val Leu Ala Asn Arg Arg Pro His Arg Ala Leu 260
265 270Gly Cys Lys Pro Val Glu Arg Trp Thr Ala Asp Arg
Asp Ala Met Leu 275 280 285Ala Leu
Pro Pro Val Ala Pro Gln Leu Gly Trp Arg Ala Arg Val Arg 290
295 300Leu Pro Arg Asp His Tyr Val Arg Leu Gly Ser
Asn Asp Tyr Ser Val305 310 315
320Asp Pro Val Ala Val Gly Arg Phe Val Glu Val Ile Ala Asp Leu Gln
325 330 335Gln Val Thr Val
Arg Leu Gly Thr Arg Val Val Ala Val His Glu Arg 340
345 350Cys Trp Ala Arg Trp Gln Thr Ile Thr Asp Pro
Ala His Arg Ala Ala 355 360 365Ala
Leu Glu Met Ala Ser Val Ala Ala His Arg Pro Ala Ser Ser Ala 370
375 380Ala Pro Asp Glu Val Glu Gln Arg Asp Leu
Gly Ala Tyr Asp Val Ala385 390 395
400Phe Gly Leu Thr Asp Val Ala
405
<210> 17
<211> 421
<212> PRT
<213> Nakamurella multipartita DSM 44233
<400> 17
Met Leu Ile Val Asp Asp
Trp Ala Glu Ile Arg Arg Leu His Arg Ala1 5
10 15Glu Gly Met Pro Ile Arg Ala Ile Ala Arg Arg Leu
Gly Cys Ser Lys 20 25 30Asn
Thr Val Lys Arg Ala Leu Ala Ala Gln Gly Pro Pro Arg Tyr Glu 35
40 45Arg Ala Thr Val Gly Ser Ala Val Asp
Ala Phe Glu Pro Ala Ile Arg 50 55
60Ala Leu Leu Ala Glu Phe Pro Ser Met Pro Thr Ser Val Ile Met Glu65
70 75 80Arg Val Gly Trp Ser
Arg Gly Arg Thr Val Phe Phe Glu Arg Val Ala 85
90 95Val Leu Arg Pro Leu Phe Val Pro Pro Asp Pro
Ala Ser Arg Thr Glu 100 105
110Tyr Gly Pro Gly Gln Leu Ala Gln Cys Asp Leu Trp Phe Pro Pro Val
115 120 125Asp Val Pro Val Gly Phe Asp
Gln Val Ala Arg Pro Pro Val Leu Val 130 135
140Met Val Ser Gly Phe Ser Arg Val Ile Thr Ala Arg Met Leu Pro
Ser145 150 155 160Arg Gln
Ser Ala Asp Leu Leu Ala Gly His Trp Glu Leu Leu Leu Gly
165 170 175Trp Gly Arg Leu Pro Arg Ala
Leu Val Trp Asp Asn Glu Ala Ala Val 180 185
190Gly Arg Trp Arg Gly Gly Arg Pro Glu Leu Thr Glu Pro Met
Asn Ala 195 200 205Phe Arg Gly Thr
Leu Gly Ile Lys Val Val Leu Cys Ala Pro Arg Asp 210
215 220Pro Glu Ser Lys Gly Leu Val Glu Arg Ala Asn Gly
Tyr Leu Glu Thr225 230 235
240Ser Phe Leu Pro Gly Arg Thr Phe Thr Ser Pro Ala Asp Phe Asn Ala
245 250 255Gln Leu Ala Ala Trp
Leu Val Arg Ala Asn Gln Arg Gln His Arg Arg 260
265 270Leu Gly Cys Arg Pro Ile Asp Arg Trp Ala Ala Asp
Leu Ala Ala Met 275 280 285Met Ala
Met Pro Pro Val Ala Pro Val Val Gly Trp Thr Ala Ser Pro 290
295 300Leu Leu Pro Arg Asp His Tyr Val Arg Val Asp
Ser Asn Asp Tyr Ser305 310 315
320Val His Pro Gly Val Val Gly Arg Arg Val Gln Val Leu Ala Asp Leu
325 330 335Asp Gln Val Val
Val Thr Cys Ala Gly Thr Val Val Ala Ala His Glu 340
345 350Arg Cys Trp Ala Arg Arg Gln Thr Ile Thr Asp
Ala Asp His Ala Gln 355 360 365Ala
Ala Ala Ala Leu Arg Ala Ala His Arg Glu Arg Val Arg Arg Pro 370
375 380Val Glu Thr Asp Val Ala Val Arg Glu Leu
Ala Asp Tyr Asp Arg Ile385 390 395
400Phe Gly Leu Gln Asp Asp Leu Asp Asp His Pro Ser Val Asp Val
Ala 405 410 415Asp Gly Glu
Val Ala 420
<210> 18
<211> 413
<212> PRT
<213> Streptomyces avermitilis MA-4680 = NBRC 14893
<400> 18
Met Glu Asp Trp Ala Glu Ile Arg Arg Leu His Arg Ala Glu Gln Met1
5 10 15Pro Val Arg Ala
Ile Ala Arg Lys Leu Gly Ile Ala Arg Asn Thr Val 20
25 30Arg Arg Ala Ile Ala Asp Asp Ala Pro Pro Lys
Tyr Gln Arg Ala Pro 35 40 45Lys
Gly Ser Ile Val Asp Ala Val Glu Pro Gln Ile Arg Glu Leu Leu 50
55 60Glu Gln Trp Pro Glu Met Pro Ala Thr Val
Ile Ala Glu Arg Ile Gly65 70 75
80Trp Asp Arg Gly Leu Thr Val Leu Lys Asp Arg Val Arg Asp Leu
Arg 85 90 95Pro Ala Tyr
Arg Pro Ala Asp Pro Ala Ser Arg Thr Val Tyr Glu Pro 100
105 110Gly Glu Ile Gly Gln Cys Asp Leu Trp Phe
Pro Pro Ala Asp Ile Pro 115 120
125Leu Gly Phe Gly Gln Val Gly Arg Pro Pro Val Leu Val Met Val Ala 130
135 140Gly Tyr Ser Arg Trp Ile Thr Ala
Arg Met Leu Pro Ser Arg Ser Ala145 150
155 160Ala Asp Leu Ile Ala Gly His Trp Arg Leu Leu Thr
Glu Leu Gly Ala 165 170
175Val Pro Arg Val Leu Val Trp Asp Asn Glu Gly Ala Val Gly Ser Trp
180 185 190Arg Ser Gly Gly Pro Gln
Leu Thr Asp Glu Phe Ala Ala Phe Ala Gly 195 200
205Leu Leu Gly Ile Lys Phe Leu Leu Cys Lys Pro Arg Asp Pro
Glu Ser 210 215 220Lys Gly Leu Val Glu
Arg Ala Asn Gly Tyr Leu Glu Thr Ser Phe Leu225 230
235 240Pro Gly Arg Val Phe Thr Ser Pro Ala Asp
Phe Asn Ile Gln Leu Ala 245 250
255Asp Trp Leu Thr Arg Ala Asn Arg Arg Ile His Arg Thr Leu Gln Ala
260 265 270Arg Pro Ala Asp Arg
Leu Glu Ala Asp Arg Ser Arg Met Leu Ala Leu 275
280 285Pro Pro Ile Ala Pro Pro Gly Trp Trp Lys Ala Ser
Leu Arg Leu Pro 290 295 300Arg Asp His
Tyr Val Arg Leu Asp Thr Cys Asp Tyr Ser Val His Pro305
310 315 320Leu Ala Val Gly Arg Arg Ile
Glu Val Ala Ala Gly Leu Asp Gln Val 325
330 335Leu Val Thr Cys Asp Gly Val Glu Val Ala Arg His
Ala Arg Ser Trp 340 345 350Ala
Arg His Gln Thr Ile Thr Asp Pro Asp His Ala Ala Ala Ala Ala 355
360 365Val Ala Arg Lys Ala Ala Ala Gly Thr
Lys Pro Ala Pro Val Asp Val 370 375
380Thr Glu Val Glu Glu Arg Ser Leu Asp Thr Tyr Asp Arg Ile Phe Gly385
390 395 400Val Ile Asp Gly
Gly Leu Ser Thr Gly Glu Gly Ala Ala 405
410
<210> 19
<211> 421
<212> PRT
<213> Streptomyces viridifaciens
<400> 19
Met Ile Gln Val Glu Asp Trp Ala
Glu Ile Arg Arg Leu His Arg Ala1 5 10
15Glu Gly Leu Ser Ala Arg Ala Val Ala Arg Glu Leu Gly Ile
Ser Arg 20 25 30Gly Thr Val
Leu Arg Ala Leu Ala Ser Asp Arg Pro Pro Val Tyr Arg 35
40 45Arg Ala Pro Lys Gly Ser Ala Val Asp Ala Val
Glu Pro Ala Ile Arg 50 55 60Glu Leu
Leu Lys Gln Thr Pro Thr Met Pro Ala Thr Val Ile Ala Glu65
70 75 80Arg Ile Gly Trp Glu Arg Gly
Leu Ser Ile Leu Arg Glu Arg Val Arg 85 90
95Glu Leu Arg Pro Ala Tyr Leu Pro Ala Asp Pro Val Ser
Arg Thr Val 100 105 110Tyr Gln
Pro Gly Glu Leu Ala Gln Cys Asp Leu Trp Phe Pro Pro Val 115
120 125Asp Ile Pro Leu Gly Tyr Gly Gln Ser Gly
Arg Pro Pro Val Leu Val 130 135 140Ile
Val Ser Gly Tyr Ser Arg Met Ile Thr Ala Arg Met Leu Pro Ser145
150 155 160Arg Thr Thr Gly Asp Leu
Ile Asp Gly His Trp Arg Leu Leu Asn Ser 165
170 175Trp Gly Ala Val Pro Lys Thr Leu Val Trp Asp Asn
Glu Thr Gly Val 180 185 190Gly
Lys Gly Arg Leu Thr Ser Glu Phe Ala Ala Phe Ala Gly Leu Leu 195
200 205Ala Thr Arg Val His Leu Cys Arg Pro
Arg Asp Pro Glu Ala Lys Gly 210 215
220Leu Val Glu Arg Ala Asn Gly Tyr Leu Glu Thr Ser Phe Val Pro Gly225
230 235 240Arg Thr Phe Thr
Gly Pro Asp Asp Phe Asn Arg Gln Leu Thr Ala Trp 245
250 255Leu Arg Ile Ala Asn Arg Arg His His Arg
Ser Ile Asp Ala Arg Pro 260 265
270Ala Asp Arg Trp Glu Ala Asp Arg Ala Gln Met Leu Thr Ile Pro Pro
275 280 285Val Ala Pro Pro His Trp Trp
Pro Leu Arg Val Arg Ile Gly Arg Asp 290 295
300His Tyr Ile Arg Val Asp Thr Asn Asp Tyr Ser Val His Pro Arg
Val305 310 315 320Ile Gly
Arg Thr Val Thr Val His Ala Asp Asn Glu Glu Ile Thr Val
325 330 335Thr Cys Gly Asp Asp Val Val
Ala Arg His Ala Arg Cys Trp Ala Arg 340 345
350His Gln Ser Leu Thr Asp Pro Asp His Ala Ala Ala Ala Asn
Leu Met 355 360 365Arg Gly Glu Val
Ile His Gln Gln Ala Ala Arg Ala Ala Thr Ala Arg 370
375 380Ala Ala Val Leu Ala Pro Asp Ser Leu Gly Ile Glu
Val Glu Gln Arg385 390 395
400Glu Leu Gly Thr Tyr Asp Arg Met Phe Thr Leu Ile Glu Gly Gly Ala
405 410 415Gly Lys Glu Asp Thr
420
<210> 20
<211> 416
<212> PRT
<213> Mycobacterium kansasii 732
<400> 20
Met Glu Asp Trp Ala Glu
Ile Arg Arg Leu His Arg Ala Glu Gly Leu1 5
10 15Pro Ile Lys Thr Ile Ala Arg Thr Leu Asn Ile Ser
Arg Asn Thr Val 20 25 30Arg
Ser Ala Leu Ala Ala Glu Gly Pro Pro Lys Tyr Glu Arg Lys Pro 35
40 45Ala Gly Ser Ala Val Asp Ala Phe Glu
Asp Ala Ile Arg Ala Gln Leu 50 55
60Lys Ala Val Pro Thr Met Pro Ala Thr Val Ile Ala Glu Arg Val Gly65
70 75 80Trp Thr Arg Gly Met
Thr Val Phe Lys Glu Arg Val Arg Glu Leu Arg 85
90 95Pro Ala Phe Leu Pro Pro Asp Pro Ala Gly Arg
Thr Thr Tyr Glu Ala 100 105
110Gly Glu Ile Ala Gln Cys Asp Phe Trp Phe Pro Pro Thr Thr Val Pro
115 120 125Val Gly Tyr Gly Gln Val Arg
Thr Pro Met Gln Leu Pro Val Leu Thr 130 135
140Met Val Cys Gly Tyr Ser Arg Trp Leu Ala Ala Ile Leu Val Pro
Ser145 150 155 160Arg Cys
Ala Glu Asp Leu Phe Ala Gly Trp Trp Gln Leu Ile His Ala
165 170 175Leu Gly Ala Val Pro Arg Thr
Leu Val Trp Asp Gly Glu Gly Ala Ile 180 185
190Gly Arg Trp Arg Ser Gly Arg Val Glu Leu Thr Gly Gln Cys
Gln Ala 195 200 205Phe Arg Gly Val
Leu Gly Ala Lys Val Val Val Leu Lys Pro Arg Glu 210
215 220Pro Glu His Lys Gly Ile Ile Glu Arg Ala His Asp
Tyr Leu Glu Arg225 230 235
240Ser Phe Leu Pro Gly Arg Thr Phe Ser Gly Pro Gly Asp Phe Asn His
245 250 255Gln Leu Gln Gln Trp
Leu Gln Thr Val Asn Arg Arg Thr Arg Arg Val 260
265 270Leu Gly Cys Ala Pro Thr Glu Arg Ile Gly Ala Asp
Arg Gln Ala Met 275 280 285Leu Ala
Leu Pro Pro Val Ala Pro Ala Thr Gly Trp Arg Ala Thr Thr 290
295 300Arg Leu Ala Arg Asp His Tyr Val Arg Leu Asp
Ser Asn Asp Tyr Ser305 310 315
320Val His Pro Gly Val Ile Gly Arg Arg Ile Glu Val Leu Ala Asp Leu
325 330 335Asp Arg Val Arg
Val Phe Cys Glu Gly Lys Leu Val Ala Asp His Glu 340
345 350Arg Val Trp Ala Trp His Gln Thr Ile Thr Asp
Pro Glu His Arg Ala 355 360 365Ala
Ala Asn Met Leu Arg Cys Asn Arg Ile Gly Ala Leu Arg Pro Val 370
375 380Arg Glu Pro Ala Asp Gln Ile Ser Val Glu
Gln Arg Ala Leu Ala Asp385 390 395
400Tyr Asp Thr Ala Leu Gly Ile Asp Leu Gly Glu Gly Gly Leu Val
Ser 405 410
415
<210> 21
<211> 426
<212> PRT
<213> Streptomyces niveiscabiei
<400> 21
Met Ile Ser Val Glu Asp Trp Ala
Glu Ile Arg Arg Leu His Arg Ala1 5 10
15Glu Gln Met Pro Ile Arg Ala Ile Ala Arg Gln Leu Gly Ile
Ser Lys 20 25 30Asn Thr Val
Lys Arg Ala Leu Ala Thr Asp Arg Pro Pro Val Tyr Ser 35
40 45Arg Pro Ala Lys Gly Ser Ala Val Asp Ala Val
Glu Pro Gln Ile Arg 50 55 60Glu Leu
Leu Lys Gln Thr Pro Thr Met Pro Ala Thr Val Ile Ala Glu65
70 75 80Arg Ile Gly Trp Asp Arg Gly
Met Thr Val Leu Lys Glu Arg Val Arg 85 90
95Glu Leu Arg Pro Ala Tyr Leu Pro Val Asp Pro Val Ser
Arg Thr Ser 100 105 110Tyr Gln
Pro Gly Glu Leu Ala Gln Cys Asp Leu Trp Phe Pro Pro Ala 115
120 125Asp Ile Pro Leu Gly Tyr Gly Gln Ser Gly
Arg Pro Pro Val Leu Val 130 135 140Met
Val Ser Gly Tyr Ser Arg Leu Ile Ala Ala Arg Met Leu Pro Thr145
150 155 160Arg Thr Thr Gly Asp Leu
Ile Asp Gly His Trp Lys Leu Leu Thr Gly 165
170 175Trp Asn Ala Val Pro Arg Met Leu Val Trp Asp Asn
Glu Ala Gly Ile 180 185 190Gly
Arg Gly Lys Val Thr Gly Asp Phe Ala Ala Phe Ala Gly Leu Leu 195
200 205Ala Thr Arg Ile Tyr Leu Cys Arg Pro
Arg Asp Pro Glu Ala Lys Gly 210 215
220Leu Val Glu Arg Ala Asn Gly Tyr Leu Glu Thr Ser Phe Leu Pro Gly225
230 235 240Arg Thr Phe Thr
Gly Pro Asp Asp Phe Asn Thr Gln Leu Ala Thr Trp 245
250 255Leu Ala Ile Ala Asn Arg Arg Gln His Arg
Thr Leu Gly Ala Arg Pro 260 265
270Ile Asp Arg Trp Glu Ala Asp Arg Ala Gln Met Leu Thr Leu Pro Pro
275 280 285Val Asp Pro Pro Arg Trp Trp
Arg Phe Ala Thr Arg Ile Gly Arg Asp 290 295
300His Tyr Ile Arg Val Asp Thr Cys Asp Tyr Ser Val His Pro Leu
Ala305 310 315 320Ile Gly
Lys Lys Val Gln Val Arg Thr Asp Thr Asp Glu Val Val Val
325 330 335Thr Leu Thr Pro Gly Gly Ala
Glu Val Ala Arg His Pro Arg Cys Trp 340 345
350Ala Lys Gln Gln Thr Ile Thr Asp Pro Val His Ala Arg Ala
Ala Ala 355 360 365Val Leu Arg Gly
Asp Tyr Arg His His Gln Ala Ser Arg Ala Gln Ala 370
375 380Val Arg Arg His Asn Thr Ala Thr Ala Ser Asp Leu
Val Glu Val Glu385 390 395
400Gln Arg Arg Leu Asp Ser Tyr Asp Arg Leu Phe Thr Leu Ile Glu Gly
405 410 415Gly Gly Gln Ala Asp
Asp Pro Glu Val Ser 420
425
<210> 22
<211> 411
<212> PRT
<213> Actinomadura macra NBRC 14102
<400> 22
Met Ile Lys Val Glu Asp Trp
Ala Glu Ile Arg Arg Leu His Arg Ser1 5 10
15Glu Gln Met Pro Ile Lys Ala Ile Ala Arg Arg Leu Gly
Val Ser Lys 20 25 30Asn Thr
Val Lys Arg Ala Leu Ala Ala Asp Asp Pro Pro Lys Tyr Arg 35
40 45Arg Ala Gly Lys Gly Ser Ile Val Asp Ala
Val Glu Pro Gln Ile Arg 50 55 60Glu
Leu Leu Ala Glu Phe Pro Asp Met Pro Ala Thr Val Ile Ala Glu65
70 75 80Arg Ile Gly Trp Ala Arg
Ser Leu Thr Val Leu Lys Asp Arg Val Arg 85
90 95Val Leu Arg Pro Gln Tyr Arg Ser Pro Asp Pro Ala
Ser Arg Thr Thr 100 105 110Tyr
Gln Pro Gly Glu Leu Ala Gln Cys Asp Leu Trp Phe Pro Pro Thr 115
120 125Lys Val Pro Val Gly Ala Gly Gln Gln
Thr Ser Pro Pro Val Leu Val 130 135
140Met Val Ser Gly Tyr Ser Arg Trp Leu Met Ala Arg Met Leu Pro Ser145
150 155 160Arg Ala Ala Gly
Asp Leu Phe Ala Gly Met Trp Ala Leu Leu Arg Met 165
170 175Leu Gly Ser Ala Pro Lys Thr Leu Val Trp
Asp Asn Glu Gly Ala Ile 180 185
190Gly Gln Trp Asn Gly Gly Arg Pro Arg Leu Thr Ala Glu Ala Asn Ala
195 200 205Phe Arg Gly Thr Leu Gly Ile
Lys Ile Val Gln Cys Arg Pro Gly Asp 210 215
220Pro Glu Ala Lys Gly Val Val Glu Arg Ala Asn Gly Tyr Leu Glu
Thr225 230 235 240Ser Phe
Leu Pro Gly Arg Ser Phe Ser Gly Pro Asp Asp Phe Asn Ala
245 250 255Gln Leu Ala Gly Trp Leu Gly
His His Ala Asn Val Arg His His Arg 260 265
270Arg Ile Glu Cys Arg Pro Ala Asp Arg Leu Val Ala Asp Arg
Ala Ala 275 280 285Met Val Ala Leu
Pro Pro Val Glu Pro Leu Val Gly Trp Arg Thr Ser 290
295 300Thr Arg Leu Ala Arg Asp His Tyr Val Arg Ile Ala
Ser Asn Asp Tyr305 310 315
320Ser Val His Pro Ser Ala Ile Gly Arg Leu Val Glu Val Val Ala Asp
325 330 335Leu Glu Gln Val Thr
Val Thr Cys Gly Gly Gln Asn Val Ala Ala His 340
345 350Pro Arg Cys Trp Ala Val His Gln Ser Ile Thr Asp
Pro Val His Ala 355 360 365Gln Ala
Ala Ala Ala Met Arg Arg Ser Gly Leu Gln Val Thr Arg Ala 370
375 380Pro Val Asp Thr Gln Val Glu Gln Arg Arg Leu
Ala Asp Tyr Asp Ala385 390 395
400Leu Ile Gly Ile Glu Gly Thr Glu Gly Val Ala 405
410
<210> 23
<211> 371
<212> PRT
<213> Salinispora pacifica DSM 45549
<400> 23
Met Asp Gly Val
Glu Pro Gln Val Arg Ala Leu Leu Ala Glu Phe Pro1 5
10 15Arg Met Pro Ala Thr Val Ile Ala Glu Arg
Ile Gly Trp Thr Lys Ser 20 25
30Leu Thr Ile Leu Lys Asp Arg Val Arg Glu Leu Arg Pro Leu Phe Val
35 40 45Pro Pro Asp Pro Thr Asp Arg Val
Glu Tyr Asp Pro Gly Glu Val Ala 50 55
60Gln Cys Asp Leu Trp Phe Pro Pro Gln Pro Ile Pro Val Gly Gly Gly65
70 75 80Ala Glu Arg Ile Leu
Pro Val Leu Ala Met Thr Cys Gly Tyr Ser Arg 85
90 95Val Thr Asp Ala Val Met Ile Pro Ser Arg Lys
Ala Gly Asp Ile Leu 100 105
110Ala Gly Met Trp Glu Ile Val Ala Gly Trp Gly Ala Cys Pro Arg Thr
115 120 125Leu Val Trp Asp Arg Glu Ala
Ala Ile Gly Gly Thr Gly Lys Leu Thr 130 135
140Thr Glu Ala Ala Ala Phe Ala Gly Thr Ile Gly Val Arg Ile Arg
Leu145 150 155 160Ala Pro
Pro Arg Asp Pro Glu Phe Lys Gly Leu Val Glu Arg Arg Asn
165 170 175Gly Phe Phe Glu Thr Ser Phe
Leu Pro Gly Arg Val Phe Thr Ser Pro 180 185
190Phe Asp Phe Asn Val Gln Ile Ser Asp Trp Leu Val Gln Arg
Ala Asn 195 200 205Thr Arg Val Leu
Arg Ala Ile Gly Leu Thr Thr Pro Thr Ala Arg Trp 210
215 220Ala Ala Asp Arg Ala Ala Met Val Ala Leu Pro Pro
Val Ala Pro Ala225 230 235
240Ile Gly Leu Thr His Arg Val Arg Leu Gly Arg Asp Tyr Tyr Val Arg
245 250 255Ile Asp Gly Asn Asp
Tyr Ser Val Asp Pro Arg Cys Ile Gly Arg Phe 260
265 270Ile Asp Val Leu Ala Thr Pro Ala Arg Met Val Ala
Ser Cys Ala Gly 275 280 285Gln Val
Val Ala Asp His Asp Arg Asp Trp Gly His Ala Arg Thr Ile 290
295 300Thr Asp Pro Glu His Gln Ala Thr Ala Arg Leu
Leu Arg Gln Asp Leu305 310 315
320Ala Ala Arg Arg Arg Gln Ala Ser Thr Arg Ser His Ala Asp Gly His
325 330 335Val Val Ala Ile
Arg Ala Leu Pro Asp Tyr Asp Ala Leu Phe Gly Val 340
345 350Asp Phe Asp Pro Arg Pro Asp Val Glu Ala Val
Gln Lys Ser Ala Ala 355 360 365Glu
Gly Lys 37024371PRTSalinispora arenicola CNH646 24Met Asp Gly Val Glu
Pro Gln Val Arg Ala Leu Leu Ala Glu Phe Pro1 5
10 15Arg Met Pro Ala Thr Val Ile Ala Glu Arg Ile
Gly Trp Thr Lys Ser 20 25
30Leu Thr Ile Leu Lys Asp Arg Val Arg Glu Leu Arg Pro Leu Phe Val
35 40 45Pro Pro Asp Pro Thr Asp Arg Val
Glu Tyr Asp Pro Gly Glu Val Ala 50 55
60Gln Cys Asp Leu Trp Phe Pro Pro Gln Pro Ile Pro Val Gly Gly Gly65
70 75 80Ala Glu Arg Val Leu
Pro Val Leu Ala Met Thr Cys Gly Tyr Ser Arg 85
90 95Val Thr Asp Ala Val Met Ile Pro Ser Arg Lys
Ala Gly Asp Ile Leu 100 105
110Ala Gly Met Trp Glu Ile Val Ala Gly Trp Gly Ala Cys Pro Arg Thr
115 120 125Leu Val Trp Asp Arg Glu Ala
Ala Ile Gly Gly Thr Gly Lys Leu Thr 130 135
140Thr Glu Ala Ala Ser Phe Ala Gly Thr Val Gly Val Arg Ile Arg
Leu145 150 155 160Ala Pro
Pro Arg Asp Pro Glu Phe Lys Gly Leu Val Glu Arg Arg Asn
165 170 175Gly Phe Phe Glu Thr Ser Phe
Leu Pro Gly Arg Val Phe Thr Ser Pro 180 185
190Phe Asp Phe Asn Val Gln Ile Ser Asp Trp Ile Val Gln Arg
Ala Asn 195 200 205Thr Arg Val Leu
Arg Ala Ile Gly Leu Thr Thr Pro Thr Ala Arg Trp 210
215 220Ala Ala Asp Arg Ala Ala Met Val Ala Leu Pro Pro
Val Ala Pro Ala225 230 235
240Ile Gly Leu Thr His Arg Val Arg Leu Gly Arg Asp Tyr Tyr Val Arg
245 250 255Ile Asp Gly Asn Asp
Tyr Ser Val Asp Pro Arg Cys Ile Gly Arg Phe 260
265 270Val Asp Val Phe Ala Thr Pro Ala Arg Met Val Ala
Ser Cys Ala Gly 275 280 285Gln Val
Val Ala Asp His Asp Arg Asp Trp Gly His Ala Arg Thr Ile 290
295 300Thr Asp Pro Glu His Gln Ala Thr Ala Arg Leu
Leu Arg Gln Asp Leu305 310 315
320Ala Ala Arg Arg Arg Gln Ala Ser Thr Arg Ser His Ala Asp Gly His
325 330 335Val Val Ala Ile
Arg Ala Leu Pro Asp Tyr Asp Ala Leu Phe Gly Val 340
345 350Asp Phe Asp Pro Arg Pro Asn Arg Glu Ala Val
Gln Lys Ser Ala Ala 355 360 365Glu
Gly Thr 37025371PRTSalinispora pacifica CNT584 25Met Asp Gly Val Glu
Pro Gln Val Arg Ala Leu Leu Ala Glu Phe Pro1 5
10 15Arg Met Pro Ala Thr Val Ile Ala Glu Arg Ile
Gly Trp Thr Lys Ser 20 25
30Leu Thr Ile Leu Lys Asp Arg Val Arg Glu Leu Arg Pro Leu Phe Val
35 40 45Pro Pro Asp Pro Thr Asp Arg Val
Glu Tyr Asp Pro Gly Glu Val Ala 50 55
60Gln Cys Asp Leu Trp Phe Pro Pro Gln Pro Ile Pro Val Gly Gly Gly65
70 75 80Ala Glu Arg Ile Leu
Pro Val Leu Ala Met Thr Cys Gly Tyr Ser Arg 85
90 95Val Thr Asp Ala Val Met Ile Pro Ser Arg Lys
Ala Gly Asp Ile Leu 100 105
110Ala Gly Met Trp Glu Ile Val Ala Gly Trp Gly Ala Cys Pro Arg Thr
115 120 125Leu Val Trp Asp Arg Glu Ala
Ala Ile Gly Gly Thr Gly Lys Leu Thr 130 135
140Thr Glu Ala Ala Ala Phe Ala Gly Thr Ile Gly Val Arg Ile Arg
Leu145 150 155 160Ala Pro
Pro Arg Asp Pro Glu Phe Lys Gly Leu Val Glu Arg Arg Asn
165 170 175Gly Phe Phe Glu Thr Ser Phe
Leu Pro Gly Arg Val Phe Thr Ser Pro 180 185
190Phe Asp Phe Asn Val Gln Ile Ser Asp Trp Leu Val Gln Arg
Ala Asn 195 200 205Thr Arg Val Leu
Arg Ala Ile Gly Leu Thr Thr Pro Thr Ala Arg Trp 210
215 220Ala Ala Asp Arg Ala Ala Met Val Ala Leu Pro Pro
Val Ala Pro Ala225 230 235
240Ile Gly Leu Thr His Arg Val Arg Leu Gly Arg Asp Tyr Tyr Val Arg
245 250 255Ile Asp Gly Asn Asp
Tyr Ser Val Asp Pro Arg Cys Ile Gly Arg Phe 260
265 270Ile Asp Val Leu Ala Thr Pro Ala Arg Met Val Ala
Ser Cys Ala Gly 275 280 285Gln Val
Val Ala Asp His Asp Arg Asp Trp Gly His Ala Arg Thr Ile 290
295 300Thr Asp Pro Glu His Gln Ala Thr Ala Arg Leu
Leu Arg Gln Asp Leu305 310 315
320Ala Ala Arg Arg Arg Gln Ala Ser Thr Arg Ser His Ala Asp Gly His
325 330 335Val Val Ala Ile
Arg Ala Leu Pro Asp Tyr Asp Ala Leu Phe Gly Val 340
345 350Asp Phe Asp Pro Arg Pro Asp Val Glu Ala Val
Gln Lys Ser Ala Ala 355 360 365Gly
Gly Lys 37026371PRTSalinispora pacifica CNS996 26Met Asp Gly Val Glu
Pro Gln Val Arg Ala Leu Leu Ala Glu Phe Pro1 5
10 15Arg Met Pro Ala Thr Val Ile Ala Glu Arg Ile
Gly Trp Thr Lys Ser 20 25
30Leu Thr Ile Leu Lys Asp Arg Val Arg Glu Leu Arg Pro Leu Phe Val
35 40 45Pro Pro Asp Pro Thr Asp Arg Val
Glu Tyr Asp Pro Gly Glu Val Ala 50 55
60Gln Cys Asp Leu Trp Phe Pro Pro Gln Pro Ile Pro Val Gly Gly Gly65
70 75 80Ala Glu Arg Ile Leu
Pro Val Leu Ala Met Thr Cys Gly Tyr Ser Arg 85
90 95Val Thr Asp Ala Val Met Ile Pro Ser Arg Lys
Ala Gly Asp Ile Leu 100 105
110Ala Gly Met Trp Glu Ile Val Ala Gly Trp Gly Ala Cys Pro Arg Thr
115 120 125Leu Val Trp Asp Arg Glu Ala
Ala Ile Gly Gly Thr Gly Lys Leu Thr 130 135
140Thr Glu Ala Ala Ala Phe Ala Gly Thr Ile Gly Val Arg Ile Arg
Leu145 150 155 160Ala Pro
Pro Arg Asp Pro Glu Phe Lys Gly Leu Val Glu Arg Arg Asn
165 170 175Gly Phe Phe Glu Thr Ser Phe
Leu Ser Gly Arg Val Phe Thr Ser Pro 180 185
190Phe Asp Phe Asn Val Gln Ile Ser Asp Trp Leu Val Gln Arg
Ala Asn 195 200 205Thr Arg Val Leu
Arg Ala Ile Gly Leu Thr Thr Pro Thr Ala Arg Trp 210
215 220Ala Ala Asp Arg Ala Ala Met Val Ala Leu Pro Pro
Val Ala Pro Ala225 230 235
240Ile Gly Leu Thr His Arg Val Arg Leu Gly Arg Asp Tyr Tyr Val Arg
245 250 255Ile Asp Gly Asn Asp
Tyr Ser Val Asp Pro Arg Cys Ile Gly Arg Phe 260
265 270Ile Asp Val Leu Ala Thr His Ala Arg Met Val Ala
Ser Cys Ala Gly 275 280 285Gln Val
Val Ala Asp His Asp Arg Asp Trp Gly His Ala Arg Thr Ile 290
295 300Thr Asp Pro Glu His Gln Ala Thr Ala Arg Leu
Leu Arg Gln Asp Leu305 310 315
320Ala Ala Arg Arg Arg Gln Ala Ser Thr Arg Ser His Ala Asp Gly His
325 330 335Val Val Ala Ile
Arg Ala Leu Pro Asp Tyr Asp Ala Leu Phe Gly Val 340
345 350Asp Phe Asp Pro Arg Pro Asp Val Glu Ala Val
Gln Lys Ser Ala Ala 355 360 365Gly
Gly Lys 37027405PRTNocardia higoensis NBRC 100133 27Met Gln Asp Trp
Ala Gln Ile Lys Tyr Leu Tyr Thr Ser Glu Gly Leu1 5
10 15Ser Gln Arg Ala Ile Ala Ala Arg Leu Gly
Ile Ser Arg Asp Thr Val 20 25
30Ala Arg Ala Ile Arg Ser Glu Ser Pro Pro His Tyr Gln Arg Ala Val
35 40 45Gly Pro Ser Val Phe Asp Gly Phe
Glu Pro His Val Arg Gln Leu Leu 50 55
60Ala Glu Phe Pro Thr Met Pro Thr Ser Val Ile Ala Glu Arg Val Gly65
70 75 80Trp Val Gly Ser Ala
Ser Trp Phe Arg Lys Lys Val Ala Gly Leu Arg 85
90 95Pro Glu Tyr Ala Pro Lys Asp Pro Ala Asp Arg
Leu Glu Tyr Arg Pro 100 105
110Gly Asp Gln Val Gln Cys Asp Leu Trp Phe Pro Pro Val Thr Ile Ala
115 120 125Leu Gly Ala Asp Gln Phe Gly
Thr Pro Pro Val Leu Val Met Val Ala 130 135
140Ser Phe Ser Arg Phe Ile Thr Ala Met Met Ile Pro Thr Arg Thr
Thr145 150 155 160Ala Asp
Leu Val Ala Gly Met Trp Ala Leu Leu Ser Asn Gln Leu Gly
165 170 175Ala Val Pro Arg Arg Leu Leu
Trp Asp Asn Glu Ser Gly Ile Gly Arg 180 185
190Arg Gly Gln Leu Ala Thr Gly Val Ala Ala Phe Thr Gly Met
Ala Ala 195 200 205Thr Arg Ile Val
Gln Cys Lys Pro Phe Asp Pro Glu Ser Lys Gly Ile 210
215 220Val Glu Arg Ala Asn Gly Tyr Leu Glu Thr Ser Phe
Leu Pro Gly Arg225 230 235
240Arg Phe Ser Ser Pro Ala Asp Phe Asn Asp Gln Ile Gly Arg Trp Leu
245 250 255Pro Ile Ala Asn Thr
Arg Arg Val Arg Arg Ile Ala Ala Ala Pro Val 260
265 270Glu Leu Ile Gly Thr Asp Arg Ala Ala Met Thr Ala
Leu Pro Pro Val 275 280 285Ala Pro
Ser Val Gly Phe Thr Cys Arg Ser Arg Leu Pro Arg Asp Tyr 290
295 300Tyr Leu Arg Val Leu Gly Asn Asp Tyr Ser Ile
Asp Pro Thr Met Ile305 310 315
320Gly Arg Met Val Asp Val His Ala Asp Leu Asp Thr Val Ala Ala Arg
325 330 335Cys Glu Gly His
Val Val Ala Ser His Arg Arg Ala Trp Ser Thr Arg 340
345 350Gln Thr Ile Thr Asp Pro Ala His Val Glu Thr
Ala Ala Arg Leu Arg 355 360 365Glu
Thr Phe Thr Asn Asn Arg Phe Arg Asp Ala Gly Ala Asp Gly Met 370
375 380Val Arg Asp Leu Ala Asp Tyr Asp Ala Ile
Phe Gly Val Asp Phe Asp385 390 395
400Thr Glu Gly Val Ala 40528307PRTStreptomyces
thermoautotrophicus 28Met Thr Tyr Gly Pro Gly Gln Val Ala Gln Cys Asp Leu
Trp Phe Pro1 5 10 15Gln
Thr Arg Ile Pro Val Thr Ala Gly Gln Glu Arg Met Leu Pro Val 20
25 30Leu Val Met Thr Leu Gly Phe Ser
Arg Phe Met Thr Ala Thr Met Ile 35 40
45Pro Thr Arg Gln Ala Gly Asp Ile Leu Ser Gly Met Trp Gln Leu Ile
50 55 60Arg Gly Ile Gly Arg Val Thr Lys
Thr Leu Val Trp Asp Arg Glu Ala65 70 75
80Ala Ile Gly Gly Thr Gly Lys Val Ser Ala Pro Ala Ala
Ala Phe Ala 85 90 95Gly
Thr Leu Ala Thr Thr Ile Arg Leu Ala Pro Pro Arg Asp Pro Glu
100 105 110Phe Lys Gly Met Val Glu Arg
Asn Asn Gln Tyr Leu Glu Thr Ser Phe 115 120
125Leu Pro Gly Arg Arg Phe Val Ser Pro Ala Asp Phe Asn Asp Gln
Leu 130 135 140Gly Asp Trp Leu Val Arg
Ala Asn Ser Arg Thr Val Arg Ser Ile Gln145 150
155 160Gly Arg Pro Val Asp Leu Leu Glu Ala Asp Cys
Gln Ala Met Ile Pro 165 170
175Leu Pro Pro Val Thr Pro Pro Ile Gly Leu Asn His Arg Val Arg Leu
180 185 190Gly Arg Asp Tyr Tyr Val
Arg Val Asp Thr Val Asp Tyr Ser Val Asp 195 200
205Pro Gln Ala Ile Gly Arg Phe Val Asp Val Thr Ala Ser Leu
Glu Thr 210 215 220Val Thr Val Leu Cys
Asp Gly Gln Leu Val Ala Arg His Ala Arg Ser225 230
235 240Trp Ala Arg Gln Gly Val Ile Thr Asp Pro
Val His Ala Ala Thr Ala 245 250
255Ala Arg Met Arg Gln Ala Leu Ala Glu Asp Arg Gln Arg Arg Gln Ala
260 265 270Ala Val Arg Arg His
Ala Asp Gly His Ala Val Ala Leu Arg Ala Leu 275
280 285Pro Asp Tyr Asp Ala Leu Phe Gly Val Asp Phe Asn
Pro Pro Ser Thr 290 295 300Lys Ala
Lys30529408PRTSalinispora arenicola CNY231 29Met Leu Ser Val Glu Asp Trp
Ala Glu Ile Arg Arg Leu Arg Arg Ser1 5 10
15Glu Gly Met Ala Ile Gln Ala Ile Ala Arg Arg Leu Arg
Met Ser Arg 20 25 30Asn Thr
Val Lys Lys Ala Leu Ala Ser Asp Glu Pro Pro Arg Tyr Arg 35
40 45Arg Val Ala Lys Gly Ser Ile Val Asp Ala
Val Glu Pro Gln Ile Arg 50 55 60Ala
Leu Leu Ala Glu Phe Pro Glu Met Pro Thr Thr Val Ile Met Val65
70 75 80Arg Val Gly Trp Thr Arg
Gly Lys Thr Val Phe Cys Asp Arg Val Gln 85
90 95Gln Leu Arg Pro Leu Phe Arg Arg Pro Asp Pro Ala
Gln Arg Thr Glu 100 105 110Tyr
Leu Pro Gly Glu Leu Ala Gln Cys Asp Leu Trp Phe Pro Pro Ala 115
120 125Asp Val Ser Leu Gly Phe Gly Gln Val
Gly Arg Pro Pro Val Leu Val 130 135
140Met Val Ser Gly Tyr Ser Arg Trp Leu Ser Ala Val Met Ile Pro Ser145
150 155 160Arg Gln Ser Pro
Asp Leu Leu Gly Gly His Trp Thr Leu Ile Ser Gly 165
170 175Trp Asp Arg Thr Pro Lys Gly Leu Val Trp
Asp Asn Glu Ser Ala Val 180 185
190Gly Gln Trp Arg Ala Gly Arg Pro Gln Leu Thr Glu Ala Met Asn Ala
195 200 205Phe Arg Gly Thr Leu Gly Ile
Lys Val Ile Gln Cys Arg Pro Ala Asp 210 215
220Pro Glu Ala Lys Gly Leu Val Glu Arg Ala Asn Gly Tyr Leu Glu
Thr225 230 235 240Ser Phe
Leu Pro Gly Arg Arg Phe Ala Ser Pro Gln Asp Phe Asn Ala
245 250 255Gln Leu Thr Glu Trp Leu Val
Arg Ala Asn Asn Arg Gln His Arg Met 260 265
270Leu Gly Cys Arg Pro Val Asp Arg Trp Asp Ala Asp Arg Ala
Ala Met 275 280 285Leu Ser Leu Pro
Pro Val Ala Pro Val Val Gly Trp Arg Gln Ser Thr 290
295 300Arg Leu Pro Arg Asp His Tyr Val Arg Leu Asp Ser
Asn Asp Tyr Ser305 310 315
320Val His Pro Ala Ala Val Gly Arg Arg Val Asp Ile Val Ala Asp Ala
325 330 335Asp Arg Val Gln Val
Phe Cys Glu Asn Arg Leu Phe Ala Arg His Asp 340
345 350Arg Cys Trp Ala Lys His Gln Ser Ile Thr Asp Pro
Ala His Arg Gln 355 360 365Ala Ala
Ala Asp Leu Arg Thr Ala Ala Arg Gln Thr Pro Ala Ala Ala 370
375 380Gly Thr Thr Glu Val Glu His Arg Gln Leu Ala
Asp Tyr Asp Arg Met385 390 395
400Phe Gly Leu Asp Glu Val Ala Ala
40530424PRTStreptacidiphilus albus JL83 30Met Ile Ser Val Glu Asp Trp Ala
Glu Ile Arg Arg Leu His Arg Ala1 5 10
15Glu Glu Met Pro Ile Arg Ala Ile Ala Arg His Leu Gly Ile
Ser Lys 20 25 30Asn Thr Val
Lys Arg Ala Ile Ala Thr Asp Arg Ala Pro Val Tyr Glu 35
40 45Arg Ala Ala Lys Gly Ser Ala Val Asp Ala Phe
Glu Pro Ala Ile Arg 50 55 60Glu Leu
Leu Lys Ala Thr Pro Ser Met Pro Ala Thr Val Ile Ala Glu65
70 75 80Arg Ile Gly Trp Glu Arg Gly
Ile Thr Val Leu Lys Glu Arg Val Arg 85 90
95Glu Leu Arg Pro Ala Tyr Leu Pro Ala Asp Pro Thr Gly
Arg Thr Gln 100 105 110Tyr Leu
Pro Gly Glu Leu Ala Gln Cys Asp Leu Trp Phe Pro Pro Val 115
120 125His Ile Pro Val Gly Tyr Gly Gln Val Ala
Cys Pro Pro Val Leu Val 130 135 140Met
Val Ser Gly Tyr Ser Arg Met Ile Thr Ala Arg Met Ile Pro Thr145
150 155 160Arg Gln Thr Gly Asp Leu
Ile Ala Gly His Trp Arg Leu Leu Ser Asp 165
170 175Trp Gly Thr Val Pro Lys Met Leu Val Trp Asp Asn
Glu Ser Gly Ile 180 185 190Gly
Gln Gly Lys Leu Thr Thr Glu Phe Ala Ala Phe Ala Gly Leu Leu 195
200 205Ala Val Lys Val His Leu Cys Arg Pro
Arg Asp Pro Glu Ala Lys Gly 210 215
220Leu Val Glu Arg Ala Asn Gly Tyr Leu Glu Thr Ser Phe Val Pro Gly225
230 235 240Arg Thr Phe Thr
Gly Pro Asp Asp Phe Asn Thr Gln Leu Gly Asp Trp 245
250 255Leu Gln Gly Ala Asn Arg Arg Leu His Arg
Ser Ile Gln Ala Arg Pro 260 265
270Val Asp Arg Trp Glu Ala Asp Arg Ala Ala Met Leu Ala Leu Pro Pro
275 280 285Val Gly Pro Pro Gln Trp Tyr
Leu Phe His Thr Arg Ile Gly Arg Asp 290 295
300His Tyr Leu Arg Ile Asp Leu Asn Asp Tyr Ser Val His Pro Arg
Ala305 310 315 320Ile Gly
Arg Arg Val Gln Val Thr Cys Asp Ala Asp Leu Ile Arg Val
325 330 335Val Thr Asp Gly Gly Asp Leu
Val Ala Glu His Pro Arg Cys Trp Ala 340 345
350Arg His Gln Thr Leu Thr Asp Pro Asp His Lys Ser Ala Ala
Asp Gln 355 360 365Met Arg Gly Asp
Phe Ile His Ala Lys Ala Ala Ala Ala Ala Arg Ser 370
375 380Arg Thr Ala Thr Ala Leu Ala Pro Asp Asn Leu Gly
Ile Glu Val Glu385 390 395
400Glu Arg Gln Leu Asp Thr Tyr Asp Arg Ile Phe Thr Leu Ile Gln Gly
405 410 415Gly Ala Gly Gln Glu
Glu Asn Arg 42031430PRTGranulicoccus phenolivorans DSM 17626
31Met Glu Asp Trp Ala Leu Ile Arg Arg Leu Val Ala Asp Gly Val Pro1
5 10 15Gln Arg Gln Val Ala Arg
Asp Leu Gly Ile Gly Arg Ser Thr Val Ala 20 25
30Arg Ala Val Ala Ser Asp Arg Pro Pro Lys Tyr Glu Arg
Pro Ala Val 35 40 45Pro Thr Ser
Phe Thr Ser Phe Glu Pro Ala Val Arg Gln Leu Leu Thr 50
55 60Asp Thr Pro Ser Met Pro Ala Thr Val Leu Ala Glu
Arg Val Gly Trp65 70 75
80Glu Gly Ser Ile Thr Trp Phe Arg Ala His Val Arg Arg Leu Arg Pro
85 90 95Glu His Arg Pro Ile Asp
Pro Ser Asp Arg Leu Thr Trp Leu Pro Gly 100
105 110Asp Ala Ala Gln Cys Asp Leu Trp Phe Pro Pro Lys
Lys Ile Pro Leu 115 120 125Glu Asp
Gly Thr Thr Ser Leu Leu Pro Val Met Val Ile Thr Ala Ala 130
135 140His Ser Arg Phe Met Val Ala Arg Met Ile Pro
Thr Arg His Thr Gln145 150 155
160Asp Leu Leu Leu Gly Met Trp Glu Leu Leu Gln Gln Leu Gly Arg Val
165 170 175Pro Arg Arg Leu
Ile Trp Asp Asn Glu Ala Gly Ile Gly Arg Gly Lys 180
185 190Arg Asn Ala Glu Gly Ile Gly Ala Phe Thr Gly
Thr Leu Ala Thr Thr 195 200 205Leu
Val Arg Leu Lys Pro Tyr Asp Pro Glu Ser Lys Gly Val Val Glu 210
215 220Arg Arg Asn Gly Phe Leu Glu Thr Ser Phe
Met Pro Gly Arg Ser Phe225 230 235
240Ala Ser Ala Ala Asp Phe Asn Ala Gln Leu Thr Glu Trp Leu Glu
Thr 245 250 255Ala Asn Ala
Arg Val Val Arg Thr Ile Lys Ala Arg Pro Val Asp Leu 260
265 270Leu Asp Ala Asp Arg Ala Ala Met Leu Pro
Leu Pro Pro Val Ala Pro 275 280
285Val Ile Gly Trp Val Asn Arg Val Arg Leu Gly Arg Asp Tyr Tyr Val 290
295 300Arg Val Asp Ser Asn Asp Tyr Ser
Val Asp Pro Asn Val Ile Gly Arg305 310
315 320Phe Val Glu Val Arg Ala Asp Leu Ser Arg Val Val
Val Arg His Asp 325 330
335Gly Arg Arg Val Ala Ala His Asp Arg Val Trp Ala Arg Gly Met Thr
340 345 350Val Thr Asp Pro Ala His
Val Thr Ala Ala Lys Ala Leu Arg Glu His 355 360
365Phe Gln Arg Pro Arg Pro Ala Ala Asp Pro Gly Glu Ala Phe
Asp Arg 370 375 380Asp Leu Ala Asp Tyr
Asp Arg Ala Phe Gly Leu Leu Asn Gly Gly Leu385 390
395 400Gly Asn Ala Ala Ala Ala Asp Pro Gly Asp
Gly Thr Val Ala Gly Leu 405 410
415Gly Ala Gly Ala Val Ala Gly Leu Arg Asp Gly Glu Val Ala
420 425 43032265PRTStreptomyces fradiae
32Met Thr Thr Thr Thr Ala Thr Ser Ser Gly Arg Asp Val Thr Ser Glu1
5 10 15Leu Ala Tyr Leu Thr Arg
Val Leu Lys Ala Pro Ala Leu Arg Glu Ala 20 25
30Ala Ser Arg Leu Ala Glu Arg Ala Glu Ala Glu Gln Trp
Ser Phe Glu 35 40 45Glu Tyr Leu
Ala Ala Cys Leu Gln Arg Glu Val Ala Ala Arg Asp Ala 50
55 60His Gly Ala Glu Ser Arg Ile Arg Ala Ala Arg Phe
Pro Ser Arg Lys65 70 75
80Ser Leu Glu Asp Phe Asp Phe Asp His Gln Arg Ser Val Lys Arg Glu
85 90 95Thr Ile Thr His Leu Gly
Thr Leu Asp Phe Val Ala Gly Lys Glu Asn 100
105 110Val Val Phe Leu Gly Pro Pro Gly Thr Gly Lys Thr
His Leu Ala Thr 115 120 125Gly Leu
Gly Ile Arg Ala Cys Gln Ala Gly His Arg Val Ala Phe Ala 130
135 140Thr Ala Ala Glu Trp Val Thr Arg Leu Ala Lys
Ala His Glu Thr Gly145 150 155
160Arg Leu Asp Glu Glu Leu Val Arg Leu Gly Arg Ile Pro Leu Ile Ile
165 170 175Val Asp Glu Val
Gly Tyr Ile Pro Phe Glu Pro Glu Ala Ala Asn Leu 180
185 190Phe Phe Gln Phe Ile Ser Gly Arg Tyr Glu Arg
Ala Ser Val Ile Val 195 200 205Thr
Ser Asn Lys Pro Phe Gly Arg Trp Gly Glu Val Phe Gly Asp Asp 210
215 220Thr Val Ala Ala Ala Met Ile Asp Arg Leu
Val His His Ala Glu Val225 230 235
240Ile Ser Leu Lys Gly Asp Ser Tyr Arg Met Arg Gly Arg Asp Leu
Gly 245 250 255Arg Val Pro
Ala Ala Asn Thr Arg Glu 260
26533275PRTStreptomyces noursei ATCC 11455 33Met Pro Ala Arg Ala Ala Thr
Asp Asp Ala Ala Pro Ala Thr Ser Arg1 5 10
15Arg Thr Gly Gln Gln Thr Ala Ala Asp Leu Ala Phe Leu
Ala Arg Ala 20 25 30Met Lys
Ala Pro Ala Leu Leu Asp Ala Ala Glu Arg Leu Ala Glu Arg 35
40 45Ala Arg Lys Glu Ser Trp Thr His Ala Glu
Tyr Leu Val Ala Cys Leu 50 55 60Gln
Arg Glu Val Ser Ala Arg Glu Ser His Gly Gly Glu Ala Arg Val65
70 75 80Arg Ala Ala Arg Phe Pro
Ala Ile Lys Thr Val Glu Glu Leu Asp Val 85
90 95Thr His Leu Arg Gly Met Thr Arg Gln Gln Leu Ala
His Leu Gly Thr 100 105 110Leu
Asp Phe Ile Thr Gly Lys Glu Asn Ala Val Phe Leu Gly Pro Pro 115
120 125Gly Thr Gly Lys Thr His Leu Ala Ile
Gly Leu Gly Val Arg Ala Cys 130 135
140Gln Ala Gly His Arg Val Ala Phe Ala Thr Ala Ser Glu Trp Val Asp145
150 155 160Arg Leu Ala Ala
Ala His Gln Ala Gly Arg Leu Gln Val Glu Leu Thr 165
170 175Lys Leu Gly Arg Tyr Pro Leu Ile Val Ile
Asp Glu Val Gly Tyr Ile 180 185
190Pro Phe Glu Ala Glu Ala Ala Asn Leu Phe Phe Gln Leu Ile Ser Asn
195 200 205Arg Tyr Glu Arg Ala Ser Val
Ile Val Thr Ser Asn Lys Pro Phe Gly 210 215
220Arg Trp Gly Glu Val Phe Gly Asp Glu Thr Val Ala Ala Ala Met
Ile225 230 235 240Asp Arg
Leu Val His His Ala Glu Val His Ser Phe Lys Gly Asp Ser
245 250 255Tyr Arg Met Lys Gly Arg Glu
Leu Gly Arg Ile Pro His Asp Thr Thr 260 265
270Asp Asn Asp 27534252PRTTessaracoccus sp. T2.5-30
34Met Ser Thr Lys Asp Leu Thr Gly Glu Ile Gly Phe Leu Ala Arg Glu1
5 10 15Leu Lys Thr Pro Val Ile
Ala Glu Thr Phe Thr Asp Leu Gly Asp Arg 20 25
30Ala Arg Ala Glu Gly Trp Ser His Glu Glu Tyr Leu Ala
Ala Val Leu 35 40 45Gly Arg Gln
Val Ala Ser Arg Thr Ala Asn Gly Thr Arg Leu Arg Ile 50
55 60Ser Ala Ala His Leu Pro Ala Val Lys Thr Val Glu
Asp Phe Val Phe65 70 75
80Asp His Ile Pro Ala Ala Ser Arg Asp Leu Ile Ala His Leu Ala Thr
85 90 95Cys Thr Phe Ile Pro Lys
Arg Glu Asn Val Val Leu Leu Gly Pro Pro 100
105 110Gly Thr Gly Lys Thr His Leu Ala Ile Ala Leu Gly
Ile Lys Ala Ala 115 120 125Glu Ala
Ala His Pro Val Leu Phe Asp Ser Ala Thr Gly Trp Ile Asn 130
135 140Arg Leu Ala Ala Ala His Ala Ser Gly Ala Leu
Glu Arg Glu Leu Arg145 150 155
160Arg Leu Lys Arg Tyr Arg Ala Leu Ile Ile Asp Glu Val Gly Tyr Leu
165 170 175Pro Phe Asp Thr
Thr Ala Ala Ala Leu Phe Phe Gln Leu Ile Ala Ser 180
185 190Arg Tyr Glu Thr Gly Thr Val Ile Val Thr Ser
Asn Leu Pro Phe Ser 195 200 205Arg
Trp Gly Glu Thr Leu Gly Asp Asp Val Val Ala Ala Ala Thr Ile 210
215 220Asp Arg Leu Val His His Ala His Val Ile
Gly Leu Asp Gly Asp Ser225 230 235
240Tyr Arg Thr Arg Gly His Arg Arg Gln Pro Thr Lys
245 25035274PRTStreptosporangium sp. M26 35Met Ala Ala
Thr Thr Thr Asn Thr Thr Ala Asn Thr Thr Ala Thr Ser1 5
10 15Gly Arg Asn Leu Ala Ala Glu Leu Ala
Tyr Leu Thr Arg Val Leu Lys 20 25
30Ala Pro Ser Leu Ala Ala Ser Ile Asp Arg Leu Ala Glu Arg Ala Arg
35 40 45Thr Glu Ser Trp Thr His Glu
Glu Phe Leu Ala Ala Cys Leu Gln Arg 50 55
60Glu Val Ala Ala Arg Asp Ser His Gly Gly Glu Ala Arg Ile Arg Phe65
70 75 80Ala Arg Phe Pro
Ala Arg Lys Ala Leu Glu Asp Phe Asp Tyr Asp His 85
90 95Gln Arg Ser Leu Lys Arg Glu Val Ile Ala
His Leu Gly Thr Leu Asp 100 105
110Phe Val Met Ala Lys Asp Asn Val Val Phe Leu Gly Pro Pro Gly Thr
115 120 125Gly Lys Thr His Leu Ser Ile
Gly Leu Gly Ile Arg Ala Cys Gln Ala 130 135
140Gly His Arg Val Ala Phe Ala Thr Ala Ala Gln Trp Val Ala Arg
Leu145 150 155 160Ala Glu
Ala His Ala Ala Gly Thr Leu Gln Ala Glu Leu Ile Lys Leu
165 170 175Ser Arg Ile Pro Val Leu Ile
Val Asp Glu Val Gly Tyr Ile Pro Phe 180 185
190Glu Pro Glu Ala Ala Asn Leu Phe Phe Gln Leu Val Ser Ser
Arg Tyr 195 200 205Glu Arg Ala Ser
Leu Ile Val Thr Ser Asn Lys Pro Phe Gly Arg Trp 210
215 220Gly Glu Val Phe Gly Asp Asp Val Val Ala Ala Ala
Met Ile Asp Arg225 230 235
240Leu Val His His Ala Glu Val Ile Ser Leu Lys Gly Asp Ser Tyr Arg
245 250 255Leu Lys Asn Arg Asp
Leu Gly Arg Val Pro Ala Ala Asn Pro Ser Asn 260
265 270Asp Gln36269PRTStreptomyces thermoautotrophicus
36Met Thr Thr Thr Ala Arg Pro Lys Thr Ser Thr Pro Lys Asp Gly Leu1
5 10 15Pro Ser Leu Ile Ala Tyr
Leu Thr Arg Val Leu Lys Thr Pro Thr Ile 20 25
30Gly Ala Phe Trp Glu Glu Leu Ala Thr Gln Ala Arg Asp
Glu Asn Trp 35 40 45Ser His Glu
Glu Tyr Leu Ala Ala Leu Leu Gln Arg Gln Val Ala Asp 50
55 60Arg Glu Ser Lys Gly Thr Thr Met Arg Ile Arg Thr
Ala His Phe Pro65 70 75
80Gln Val Lys Thr Leu Glu Asp Phe Asn Leu Asp His Leu Pro Ser Leu
85 90 95Arg Arg Asp Val Leu Ala
His Leu Ala Thr Ser Thr Phe Val Ala Lys 100
105 110Ala Glu Asn Val Ile Leu Leu Gly Pro Pro Gly Ile
Gly Lys Thr His 115 120 125Leu Ala
Ile Gly Leu Gly Val Lys Ala Ala Arg Ala Gly Tyr Ser Val 130
135 140Leu Phe Asp Thr Ala Ser Asn Trp Ile Ala Arg
Leu Ala Ala Ala His145 150 155
160His Ala Gly Arg Leu Glu Thr Glu Leu Lys Lys Ile Arg Arg Tyr Lys
165 170 175Leu Ile Ile Ile
Asp Glu Val Gly Tyr Ile Pro Phe Asp Gln Asp Ala 180
185 190Ala Asn Leu Phe Phe Gln Leu Ile Ala Ser Arg
Tyr Glu Gln Gly Ser 195 200 205Val
Met Val Thr Ser Asn Leu Pro Phe Gly Arg Trp Gly Glu Thr Phe 210
215 220Ser Asp Asp Val Val Ala Ala Ala Met Ile
Asp Arg Leu Val His His225 230 235
240Ala Glu Val Leu Thr Leu Thr Gly Asp Ser Tyr Arg Thr Arg Gln
Arg 245 250 255Arg Glu Leu
Leu Ala Lys Glu Asn Arg Ala Ser Arg Asn 260
26537262PRTSalinispora arenicola CNS-205 37Met Ala Thr Lys Thr Ser Arg
Asn Val Ser Ser Glu Ile Ala Phe Leu1 5 10
15Thr Arg Ala Leu Lys Ala Pro Ser Leu Ala Ala Ser Val
Glu Arg Leu 20 25 30Ala Glu
Arg Ala Arg Ala Glu Ser Trp Thr His Glu Glu Phe Leu Ala 35
40 45Ala Cys Leu Gln Arg Glu Val Ala Ala Arg
Glu Ser His Gly Gly Glu 50 55 60Gly
Arg Ile Arg Ala Ala Lys Phe Pro Ala Arg Lys Ser Leu Glu Glu65
70 75 80Phe Asp Phe Asp His Gln
Arg Ser Leu Lys Arg Glu Ala Ile Ala His 85
90 95Leu Gly Thr Leu Asp Phe Ile Thr Gly Arg Glu Asn
Val Val Phe Leu 100 105 110Gly
Pro Pro Gly Thr Gly Lys Thr His Leu Ser Ile Gly Leu Gly Ile 115
120 125Arg Ala Cys Gln Ala Gly His Arg Val
Ala Phe Ala Thr Ala Ala Gln 130 135
140Trp Val Ser Arg Leu Ala Asp Ala His His Ala Gly Arg Leu Gln Asp145
150 155 160Glu Leu Val Lys
Leu Gly Arg Ile Pro Leu Leu Ile Val Asp Glu Val 165
170 175Gly Tyr Ile Pro Phe Glu Ala Glu Ala Ala
Asn Leu Phe Phe Gln Leu 180 185
190Val Ser Asn Arg Tyr Glu Arg Ala Ser Leu Ile Val Thr Ser Asn Lys
195 200 205Pro Phe Gly Arg Trp Gly Glu
Val Phe Gly Asp Asp Val Val Ala Ala 210 215
220Ala Met Ile Asp Arg Leu Val His His Ala Glu Val Val Ser Met
Lys225 230 235 240Gly Asp
Ser Tyr Arg Leu Lys Asp Arg Asp Leu Gly Arg Val Pro Ala
245 250 255Ala Thr Lys Thr Asn Asp
26038265PRTTrueperella pyogenes 38Met Thr Thr Lys Asp His Arg Gln Ile
Asp Ser Glu Leu Ala His Ile1 5 10
15Ser Lys Val Leu Lys Ala Pro Arg Ile His Ala Thr Tyr Phe Glu
Thr 20 25 30Ala Glu Gln Ala
Arg Glu Asp Gly Trp Ser Phe Glu Glu Tyr Leu Ala 35
40 45Ala Val Leu Ser Val Glu Ala Ser Ala Arg Gln Glu
Ser Gly Ala Asn 50 55 60Ala Arg Ile
Lys Arg Ala Gly Phe Pro Gln Val Lys Thr Ile Glu Glu65 70
75 80Phe Asp Phe Thr Ile Gln Pro Ser
Ile Asp Arg Ala Lys Ile Ala Arg 85 90
95Leu Glu Thr Ser Ala Trp Ile Ser Gln Ala Ser Asn Val Ile
Phe Leu 100 105 110Gly Pro Pro
Gly Thr Gly Lys Thr His Leu Ala Ile Gly Leu Gly Val 115
120 125Ile Ala Ala Arg Gln Gly Tyr Arg Val Leu Phe
Asp Thr Ala Ala Gly 130 135 140Trp Val
Gln Lys Leu Thr Gln Ala His Asp Arg Gly Glu Leu Pro Lys145
150 155 160Leu Leu Thr Lys Leu Gly Arg
Tyr Asp Leu Leu Val Val Asp Glu Val 165
170 175Gly Tyr Ile Pro Ile Glu Ala Glu Ala Ala Asn Leu
Phe Phe Gln Leu 180 185 190Val
Ser Thr Arg Tyr Glu Lys Ala Ser Leu Ile Met Thr Ser Asn Leu 195
200 205Pro Phe Ser Arg Trp Gly Glu Cys Phe
Gly Asp Gln Thr Ile Ala Ala 210 215
220Ala Met Ile Asp Arg Val Val His His Ala Glu Ile Leu Thr His Lys225
230 235 240Gly Thr Ser Tyr
Arg Ile Asn Gly His Glu Asp Ile Leu Pro Ser Val 245
250 255Asn Ala Glu Arg Gly Lys Ala Leu Lys
260 26539262PRTSalinispora arenicola CNS-205 39Met
Ala Thr Ser Ser Ser Arg Asn Val Ala Ser Glu Ile Ala Phe Leu1
5 10 15Thr Arg Ala Leu Lys Ala Pro
Ser Leu Ala Ala Ser Val Glu Arg Leu 20 25
30Ala Glu Arg Ala Arg Ala Glu Ser Trp Thr His Glu Glu Phe
Leu Ala 35 40 45Ala Cys Leu Gln
Arg Glu Val Ala Ala Arg Glu Ala His Gly Gly Glu 50 55
60Gly Arg Ile Arg Ala Ala Arg Phe Pro Ala Arg Lys Ser
Leu Glu Glu65 70 75
80Phe Asp Phe Glu His Gln Arg Ser Leu Lys Arg Glu Thr Ile Ala His
85 90 95Leu Gly Thr Leu Asp Phe
Val Ala Ser Lys Glu Asn Val Val Phe Leu 100
105 110Gly Pro Pro Gly Thr Gly Lys Thr His Leu Ser Ile
Gly Leu Gly Ile 115 120 125Arg Ala
Cys Gln Ala Gly His Arg Val Ala Phe Ala Thr Ala Ala Gly 130
135 140Trp Val Ser Arg Leu Ala Asp Ser His His Ala
Gly Arg Leu Gln Asp145 150 155
160Glu Leu Val Lys Leu Gly Arg Ile Pro Leu Leu Ile Val Asp Glu Val
165 170 175Gly Tyr Ile Pro
Phe Glu Ala Glu Ala Ala Asn Leu Phe Phe Gln Leu 180
185 190Val Ser Asn Arg Tyr Glu Arg Ala Ser Leu Ile
Val Thr Ser Asn Lys 195 200 205Pro
Phe Gly Arg Trp Gly Glu Val Phe Gly Asp Asp Val Val Ala Ala 210
215 220Ala Met Ile Asp Arg Leu Val His His Ala
Glu Val Ile Ser Met Lys225 230 235
240Gly Asp Ser Tyr Arg Leu Lys Asp Arg Asp Leu Gly Arg Val Pro
Ala 245 250 255Ala Thr Lys
Thr Asn Asp 26040268PRTStreptomyces pactum 40Met Ser Met Lys
Asn Gly Thr Asn Gln Ala Arg Thr Ser Arg Asp Val1 5
10 15Gly Ser Glu Leu Ile Tyr Leu Thr Lys Ala
Leu Lys Ala Pro Ala Leu 20 25
30Arg Asp Ala Ala Ala Arg Leu Ala Glu Arg Ala Arg Asp Glu Gly Trp
35 40 45Ser His Glu Glu Tyr Leu Ala Ala
Cys Leu Gln Arg Glu Val Ala Ala 50 55
60Arg Asp Ser His Gly Ala Glu Gly Arg Ile Arg Ala Ala Arg Phe Pro65
70 75 80Ser Arg Lys Ser Leu
Glu Asp Phe Asp Phe Asp His Gln Arg Ser Val 85
90 95Lys Arg Glu Val Ile Ala His Leu Gly Thr Leu
Asp Phe Val Ala Gly 100 105
110Lys Glu Asn Val Ile Phe Leu Gly Pro Pro Gly Thr Gly Lys Thr His
115 120 125Leu Ala Thr Gly Leu Gly Ile
Arg Ala Cys Gln Ala Gly His Arg Val 130 135
140Ala Phe Gly Thr Ala Ala Gln Trp Val Thr Arg Leu Ala Glu Ala
His145 150 155 160Gln Ala
Gly Arg Leu Ser Asp Glu Leu Thr Arg Leu Gly Arg Ile Pro
165 170 175Leu Ile Val Val Asp Glu Val
Gly Tyr Ile Pro Phe Glu Pro Glu Ala 180 185
190Ala Asn Leu Phe Phe Gln Phe Ile Ser Gly Arg Tyr Glu Arg
Ala Ser 195 200 205Val Ile Val Thr
Ser Asn Lys Pro Phe Gly Arg Trp Gly Glu Val Phe 210
215 220Gly Asp Asp Thr Val Ala Ala Ala Met Ile Asp Arg
Leu Val His His225 230 235
240Ala Glu Val Ile Ser Leu Lys Gly Asp Ser Tyr Arg Met Arg Gly Arg
245 250 255Asp Leu Gly Arg Val
Pro Ala Ala Asn Thr Gly Glu 260
26541248PRTMycobacterium canettii CIPT 140060008 41Met Ala Ala Lys Thr
Ala Thr Asn Ser Arg Asp Val Ala Ala Glu Leu1 5
10 15Ala Tyr Leu Thr Arg Ala Leu Lys Ala Pro Thr
Leu Arg Gly Ala Ile 20 25
30Glu Arg Leu Ala Asp Arg Ala Arg Thr Lys Thr Trp Ser Tyr Glu Glu
35 40 45Phe Leu Ala Ala Cys Leu Gln Arg
Glu Val Ser Ala Arg Glu Ser His 50 55
60Gly Gly Glu Gly Arg Ile Arg Ala Ala Arg Phe Pro Ser Arg Lys Ser65
70 75 80Leu Glu Glu Phe Asp
Phe Asp His Ala Arg Gly Leu Lys Arg Asp Thr 85
90 95Ile Ala His Leu Gly Thr Leu Asp Phe Val Thr
Leu Ala Ile Gly Ile 100 105
110Ala Ile Arg Ala Cys Gln Ala Gly His Arg Val Leu Phe Ala Thr Ala
115 120 125Ser Gln Trp Val Asp Arg Leu
Ala Ala Ala His His Ser Gly Thr Leu 130 135
140Gln Ser Glu Leu Ile Arg Leu Ala Arg Tyr Pro Leu Leu Val Val
Asp145 150 155 160Glu Val
Gly Tyr Ile Pro Phe Glu Pro Glu Ala Ala Asn Leu Phe Phe
165 170 175Gln Leu Val Ser Ser Arg Tyr
Glu Arg Ala Ser Leu Ile Val Thr Ser 180 185
190Asn Lys Pro Phe Gly Arg Trp Gly Glu Val Phe Gly Asp Asp
Val Val 195 200 205Ala Ala Ala Met
Ile Asp Arg Leu Val His His Ala Glu Val Ile Ala 210
215 220Leu Lys Gly Asp Ser Tyr Arg Ile Lys Asp Arg Asp
Leu Gly Arg Val225 230 235
240Pro Thr Val Thr Ala Asp Asp Gln
24542262PRTCellulomonas fimi ATCC 484 42Met Thr Ala Thr Lys Thr Thr Ala
Arg Asp Val Ser Ser Glu Leu Val1 5 10
15Phe Leu Thr Arg Ala Leu Lys Ala Pro Thr Met Arg Asp Ala
Val Asp 20 25 30Arg Leu Ala
Glu Arg Ala Arg Ala Glu Ser Trp Thr His Glu Glu Phe 35
40 45Leu Ala Ala Cys Leu Gln Arg Glu Val Ala Ala
Arg Glu Ala His Gly 50 55 60Gly Glu
Gly Arg Ile Arg Ala Ala Arg Phe Pro Ala Arg Lys Ser Leu65
70 75 80Glu Asp Phe Asp Phe Glu His
Ala Arg Gly Leu Ala Arg Asp Gln Ile 85 90
95Ala His Leu Gly Thr Leu Asp Phe Val Ala Ala Arg Asp
Asn Val Val 100 105 110Phe Leu
Gly Pro Pro Gly Thr Gly Lys Thr His Leu Ala Thr Gly Ile 115
120 125Ala Val Arg Ala Cys Gln Ala Gly His Arg
Val Leu Phe Ala Thr Ala 130 135 140Ser
Glu Trp Val Asp Arg Leu Ala Ser Ala His His Asp Gly Arg Leu145
150 155 160Gln Asp Glu Leu Arg Arg
Leu Gly Arg Tyr Pro Leu Leu Val Ile Asp 165
170 175Glu Val Gly Tyr Ile Pro Phe Glu Pro Glu Ala Ala
Asn Leu Phe Phe 180 185 190Gln
Leu Val Ser Ala Arg Tyr Glu Arg Ala Ser Leu Ile Val Thr Ser 195
200 205Asn Lys Pro Phe Gly Arg Trp Gly Asp
Val Phe Gly Asp Asp Thr Val 210 215
220Ala Ala Ala Met Ile Asp Arg Leu Val His His Ala Asp Val Ile Ala225
230 235 240Leu Lys Gly Asp
Ser Tyr Arg Leu Lys Asn Arg Asp Leu Gly Arg Pro 245
250 255Pro Ala Ala Thr Thr Asp
26043269PRTStreptomyces avermitilis MA-4680 = NBRC 14893 43Met Met Ser
Thr Lys Asn Gly Thr Asn Gln Ala Arg Thr Ser Arg Asp1 5
10 15Val Gly Ser Glu Leu Ile Tyr Leu Thr
Lys Ala Leu Lys Ala Pro Ala 20 25
30Leu Arg Asp Ala Ala Ala Arg Leu Ala Glu Arg Ala Arg Asp Glu Gly
35 40 45Trp Ser His Glu Glu Tyr Leu
Ala Ala Cys Leu Gln Arg Glu Val Ala 50 55
60Ala Arg Asp Ser His Gly Ala Glu Gly Arg Ile Arg Ala Ala Arg Phe65
70 75 80Pro Ser Arg Lys
Ser Leu Glu Asp Phe Asp Phe Asp His Gln Arg Ser 85
90 95Val Lys Arg Glu Val Ile Ala His Leu Gly
Thr Leu Asp Phe Val Ala 100 105
110Gly Lys Glu Asn Val Ile Phe Leu Gly Pro Pro Gly Thr Gly Lys Thr
115 120 125His Leu Ala Thr Gly Leu Gly
Ile Arg Ala Cys Gln Ala Gly His Arg 130 135
140Val Ala Phe Gly Thr Ala Ala Gln Trp Val Ala Arg Leu Ala Glu
Ala145 150 155 160His Gln
Ala Gly Arg Leu Ser Asp Glu Leu Thr Arg Leu Gly Arg Ile
165 170 175Pro Leu Ile Val Val Asp Glu
Val Gly Tyr Ile Pro Phe Glu Pro Glu 180 185
190Ala Ala Asn Leu Phe Phe Gln Phe Ile Ser Gly Arg Tyr Glu
Arg Ala 195 200 205Ser Val Ile Val
Thr Ser Asn Lys Pro Phe Gly Arg Trp Gly Glu Val 210
215 220Phe Gly Asp Asp Thr Val Ala Ala Ala Met Ile Asp
Arg Leu Val His225 230 235
240His Ala Glu Val Ile Ser Leu Lys Gly Asp Ser Tyr Arg Met Arg Gly
245 250 255Arg Asp Leu Gly Arg
Val Pro Ala Ala Asn Thr Gly Glu 260
26544261PRTMycobacterium kansasii 732 44Met Ala Ser Ser Asn His Arg Asp
Leu Ser Ala Glu Ile Ser Phe Leu1 5 10
15Thr Arg Ala Leu Lys Ala Pro Thr Met Arg Glu Ala Val Ala
Arg Leu 20 25 30Ala Glu Arg
Ala Arg Thr Glu Ser Trp Thr His Glu Glu Phe Leu Val 35
40 45Ala Cys Leu Gln Arg Glu Val Ser Ala Arg Glu
Ser His Gly Gly Glu 50 55 60Gly Arg
Ile Arg Ala Ala Arg Phe Pro Ser Arg Lys Ser Leu Glu Glu65
70 75 80Phe Asp Phe Asp His Ala Arg
Gly Leu Lys Arg Asp Leu Ile Ala His 85 90
95Leu Gly Thr Leu Asp Phe Val Thr Ala Arg Asp Asn Val
Val Phe Leu 100 105 110Gly Pro
Pro Gly Thr Gly Lys Thr His Leu Ala Ile Gly Ile Ala Ile 115
120 125Arg Ala Cys Gln Ala Gly His Arg Val Leu
Phe Ala Thr Ala Ser Glu 130 135 140Trp
Val Ala Arg Leu Ala Glu Ala His His Gly Gly Arg Leu Gln Pro145
150 155 160Glu Leu Leu Arg Leu Gly
Arg Tyr Pro Leu Leu Val Ile Asp Glu Val 165
170 175Gly Tyr Ile Pro Phe Glu Pro Glu Ala Ala Asn Leu
Phe Phe Gln Leu 180 185 190Val
Ser Ser Arg Tyr Glu Arg Ala Ser Leu Ile Val Thr Ser Asn Lys 195
200 205Pro Phe Gly Arg Trp Gly Glu Val Phe
Gly Asp Asp Val Val Ala Ala 210 215
220Ala Met Ile Asp Arg Leu Val His His Ala Glu Val Ile Ala Leu Lys225
230 235 240Gly Asp Ser Tyr
Arg Ile Lys Asp Arg Asp Leu Gly Arg Val Pro Gly 245
250 255Ser Thr Thr Glu Glu
260
SEQ ID NO: 45; length 272; PRT; Streptomyces niveiscabiei
Met Pro Ala Thr Thr Arg Thr Thr
Ala Ala Ala Gly Pro Arg Thr Gly1 5 10
15Arg Gln Thr Ala Ala Asp Leu Ser Phe Leu Ala Arg Ala Met
Lys Ala 20 25 30Pro Ala Leu
Leu Asp Ala Ala Glu Arg Leu Ala Glu Arg Ala Leu Lys 35
40 45Glu Thr Trp Thr His Thr Glu Phe Leu Val Ala
Cys Leu Gln Arg Glu 50 55 60Val Ser
Ala Arg Glu Ser His Gly Gly Glu Ala Arg Ile Arg Thr Ala65
70 75 80Arg Phe Pro Ala Ile Lys Thr
Ile Glu Glu Leu Asp Val Thr His Leu 85 90
95Arg Gly Ile Thr Arg Gln Gln Leu Ala His Leu Gly Thr
Leu Asp Phe 100 105 110Ile Thr
Ala Lys Glu Asn Ala Val Phe Leu Gly Pro Pro Gly Thr Gly 115
120 125Lys Thr His Leu Ala Ile Gly Leu Ala Val
Arg Ala Cys Gln Ala Gly 130 135 140His
Arg Val Ala Phe Ala Thr Ala Ala Glu Trp Val Asp Arg Leu Ala145
150 155 160Ala Ala His His Ala Gly
His Leu Gln Ala Glu Leu Thr Lys Leu Ser 165
170 175Arg Tyr Pro Leu Ile Val Val Asp Glu Val Gly Tyr
Ile Pro Phe Glu 180 185 190Ser
Glu Ala Ala Asn Leu Phe Phe Gln Leu Val Ser Asn Arg Tyr Glu 195
200 205Arg Ala Ser Val Ile Val Thr Ser Asn
Lys Pro Phe Gly Arg Trp Gly 210 215
220Glu Val Phe Gly Asp Glu Thr Val Ala Ala Ala Met Ile Asp Arg Leu225
230 235 240Val His His Ala
Glu Val His Ser Leu Lys Gly Asp Ser Tyr Arg Met 245
250 255Arg Gly Arg Gln Leu Gly Arg Val Pro Thr
Ala Thr His Asp Thr Asp 260 265
270
SEQ ID NO: 46; length 274; PRT; Actinomadura macra NBRC 14102
Met Ala Gly Val Arg Ala Lys
Thr Thr Ala Thr Ala Pro Asn Ser Gly1 5 10
15Arg Asn Val Asp Ala Glu Leu Ala Tyr Leu Thr Arg Val
Leu Lys Ala 20 25 30Pro Ser
Leu Ala Ala Ser Val Gln Arg Leu Ala Glu Arg Ala Arg Ala 35
40 45Glu Ser Trp Thr His Glu Glu Phe Leu Ala
Ala Cys Leu Gln Arg Glu 50 55 60Val
Ala Ala Arg Glu Ala His Gly Ser Glu Ala Arg Ile Arg Ser Ala65
70 75 80Arg Phe Pro Ala Arg Lys
Ala Leu Glu Asp Phe Asp Tyr Asp His Gln 85
90 95Arg Ser Leu Lys Arg Glu Val Ile Ala His Leu Gly
Thr Leu Asp Phe 100 105 110Val
Ala Ala Arg Glu Asn Ala Val Phe Leu Gly Pro Pro Gly Thr Gly 115
120 125Lys Thr His Leu Ser Ile Gly Leu Gly
Ile Arg Ala Cys Gln Ala Gly 130 135
140His Arg Val Cys Phe Ala Thr Ala Ala Gly Trp Val Ala Arg Leu Ala145
150 155 160Gln Ala His Thr
Ala Gly Arg Leu Gln Asp Glu Leu Thr Lys Leu Ala 165
170 175Arg Ile Pro Val Leu Ile Val Asp Glu Val
Gly Tyr Ile Pro Phe Glu 180 185
190Pro Glu Ala Ala Asn Leu Phe Phe Gln Leu Val Ser Ser Arg Tyr Glu
195 200 205Arg Ala Ser Leu Ile Val Thr
Ser Asn Lys Pro Phe Gly Arg Trp Gly 210 215
220Glu Val Phe Gly Asp Asp Val Val Ala Ala Ala Met Ile Asp Arg
Leu225 230 235 240Val His
His Ala Glu Val Ile Ser Leu Lys Gly Asp Ser Tyr Arg Leu
245 250 255Lys Asn Arg Asp Leu Gly Arg
Val Pro Ala Ala Ser Thr Thr Asn Asp 260 265
270Gln Gln
SEQ ID NO: 47; length 264; PRT; Salinispora pacifica DSM 45549
Met Thr
Thr Ile Thr Lys Thr Arg Pro Ala Pro Asp Gly Leu Gly Ser1 5
10 15Lys Leu Ala Tyr Leu Thr Arg Val
Leu Lys Thr Pro Thr Ile Gly Arg 20 25
30Thr Trp Glu Thr Leu Ala Asp Gln Ala Arg Gln Ala Asn Trp Ser
His 35 40 45Glu Glu Tyr Leu Ala
Ala Val Leu Glu Arg Gln Val Ala Asp Arg Glu 50 55
60Ser Ala Gly Thr Thr Met Arg Ile Arg Thr Ala His Phe Pro
Ala Ile65 70 75 80Lys
Thr Leu Glu Asp Phe Asn Leu Asp His Leu Pro Ser Leu Arg Lys
85 90 95Asp Val Leu Ala His Leu Ala
Thr Ala Thr Phe Ile Pro Lys Ala Glu 100 105
110Asn Val Ile Leu Leu Gly Pro Pro Gly Leu Gly Lys Thr His
Leu Ala 115 120 125Ile Gly Leu Gly
Ile Lys Ala Thr Gln Ser Gly Tyr Ser Val Leu Phe 130
135 140Asp Thr Ala Thr Asn Trp Ile Asp Arg Leu Ala Arg
Ala His His Arg145 150 155
160Gly Ala Leu Glu Ala Glu Leu Lys Lys Ile Arg Arg Tyr Lys Leu Ile
165 170 175Ile Val Asp Glu Val
Gly Tyr Ile Pro Phe Asp Thr Asp Ala Ala Asn 180
185 190Leu Phe Phe Gln Leu Val Ala Ser Arg Tyr Glu Ala
Gly Ser Ile Leu 195 200 205Val Thr
Ser Asn Leu Pro Phe Gly Arg Trp Gly Glu Val Phe Gly Asp 210
215 220Glu Val Val Ala Ala Ala Met Ile Asp Arg Leu
Val His His Ala Glu225 230 235
240Val Leu Thr Leu Ala Gly Glu Ser Tyr Arg Thr Arg Ala Arg Arg Glu
245 250 255Leu Leu Ala Lys
Asp Arg Asn Lys 260
SEQ ID NO: 48; length 264; PRT; Salinispora arenicola CNH646
Met
Thr Thr Thr Thr Lys Ser Thr Pro Thr Pro Asp Gly Leu Gly Ser1
5 10 15Lys Leu Ala Tyr Leu Thr Arg
Val Leu Lys Thr Pro Thr Ile Gly Arg 20 25
30Thr Trp Glu Ile Leu Ala Asp Gln Ala Arg Glu Ala Asn Trp
Ser His 35 40 45Glu Glu Tyr Leu
Ala Ala Val Leu Glu Arg Gln Val Ala Asp Arg Glu 50 55
60Ser Ala Gly Thr Thr Met Arg Ile Arg Thr Ala His Phe
Pro Ala Ile65 70 75
80Lys Thr Leu Glu Asp Phe Asn Leu Asp His Leu Pro Ser Leu Arg Lys
85 90 95Asp Val Leu Ala His Leu
Ala Thr Ala Thr Phe Ile Pro Lys Ala Glu 100
105 110Asn Val Ile Leu Leu Gly Pro Pro Gly Leu Gly Lys
Thr His Leu Ala 115 120 125Ile Gly
Leu Gly Ile Lys Ala Thr Gln Ser Gly Tyr Ser Val Leu Phe 130
135 140Asp Thr Ala Thr Asn Trp Ile Asp Arg Leu Ala
Arg Ala His His Ala145 150 155
160Gly Ala Leu Glu Ala Glu Leu Lys Lys Ile Arg Arg Tyr Lys Leu Ile
165 170 175Ile Val Asp Glu
Val Gly Tyr Ile Pro Phe Asp Thr Asp Ala Ala Asn 180
185 190Leu Phe Phe Gln Leu Val Ala Ser Arg Tyr Glu
Thr Gly Ser Ile Leu 195 200 205Val
Thr Ser Asn Leu Pro Phe Gly Arg Trp Gly Glu Val Phe Gly Asp 210
215 220Glu Val Val Ala Ala Ala Met Ile Asp Arg
Leu Val His His Ala Glu225 230 235
240Val Leu Thr Leu Ala Gly Glu Ser Tyr Arg Thr Lys Ala Arg Arg
Glu 245 250 255Leu Leu Ala
Lys Asp Arg Gly Lys 260
SEQ ID NO: 49; length 264; PRT; Salinispora pacifica CNT584
Met Thr Thr Ile Thr Lys Thr Arg Pro Ala Pro Asp Gly Leu Arg Ser1
5 10 15Lys Leu Ala Tyr Leu Thr
Arg Val Leu Lys Thr Pro Thr Ile Gly Arg 20 25
30Thr Trp Glu Thr Leu Ala Asp Gln Ala Arg Gln Ala Asn
Trp Ser His 35 40 45Glu Glu Tyr
Leu Ala Ala Val Leu Glu Arg Gln Val Ala Asp Arg Glu 50
55 60Ser Ala Gly Thr Thr Met Arg Ile Arg Thr Ala His
Phe Pro Ala Ile65 70 75
80Lys Thr Leu Glu Asp Phe Asn Leu Asp His Leu Pro Ser Leu Arg Lys
85 90 95Asp Val Leu Ala His Leu
Ala Thr Ala Thr Phe Ile Pro Lys Ala Glu 100
105 110Asn Val Ile Leu Leu Gly Pro Pro Gly Leu Gly Lys
Thr His Leu Ala 115 120 125Ile Gly
Leu Gly Ile Lys Ala Thr Gln Ser Gly Tyr Ser Val Leu Phe 130
135 140Asp Thr Ala Thr Asn Trp Ile Asp Arg Leu Ala
Arg Ala His His Arg145 150 155
160Gly Ala Leu Glu Ala Glu Leu Lys Lys Ile Arg Arg Tyr Lys Leu Ile
165 170 175Ile Val Asp Glu
Val Gly Tyr Ile Pro Phe Asp Thr Asp Ala Ala Asn 180
185 190Leu Phe Phe Gln Leu Val Ala Ser Arg Tyr Glu
Ala Gly Ser Ile Leu 195 200 205Val
Thr Ser Asn Leu Pro Phe Gly Arg Trp Gly Glu Val Phe Gly Asp 210
215 220Glu Val Val Ala Ala Ala Met Ile Asp Arg
Leu Val His His Ala Glu225 230 235
240Val Leu Thr Leu Ala Gly Glu Ser Tyr Arg Thr Arg Ala Arg Arg
Glu 245 250 255Leu Leu Ala
Lys Asp Arg Asn Lys 260
SEQ ID NO: 50; length 264; PRT; Salinispora pacifica CNS996
Met Thr Thr Thr Thr Lys Thr Arg Pro Ala Pro Asp Gly Leu Gly Ser1
5 10 15Lys Leu Ala Tyr Leu Thr
Arg Val Leu Lys Thr Pro Thr Ile Gly Arg 20 25
30Thr Trp Glu Thr Leu Ala Asp Gln Ala Arg Gln Ala Asn
Trp Ser His 35 40 45Glu Glu Tyr
Leu Ala Ala Val Leu Glu Arg Gln Val Ala Asp Arg Glu 50
55 60Ser Ala Gly Thr Thr Met Arg Ile Arg Thr Ala His
Phe Pro Ala Ile65 70 75
80Lys Thr Leu Glu Asp Phe Asn Leu Asp His Leu Pro Ser Leu Arg Lys
85 90 95Asp Val Leu Ala His Leu
Ala Thr Ala Thr Phe Ile Pro Lys Ala Glu 100
105 110Asn Val Ile Leu Leu Gly Pro Pro Gly Leu Gly Lys
Thr His Leu Ala 115 120 125Ile Gly
Leu Gly Ile Lys Ala Thr Gln Ser Gly Tyr Ser Val Leu Phe 130
135 140Asp Thr Ala Thr Asn Trp Ile Asp Arg Leu Ala
Arg Ala His His Arg145 150 155
160Gly Ala Leu Glu Ala Glu Leu Lys Lys Ile Arg Arg Tyr Lys Leu Ile
165 170 175Ile Val Asp Glu
Val Gly Tyr Ile Pro Phe Asp Thr Asp Ala Ala Asn 180
185 190Leu Phe Phe Gln Leu Val Ala Ser Arg Tyr Glu
Ala Gly Ser Ile Leu 195 200 205Val
Thr Ser Asn Leu Pro Phe Gly Arg Trp Gly Glu Val Phe Gly Asp 210
215 220Glu Val Val Ala Ala Ala Met Ile Asp Arg
Leu Val His His Ala Glu225 230 235
240Val Leu Thr Leu Ala Gly Glu Ser Tyr Arg Thr Arg Ala Arg Arg
Glu 245 250 255Leu Leu Ala
Lys Asp Arg Asn Lys 260
SEQ ID NO: 51; length 260; PRT; Nocardia higoensis NBRC 100133
Met Asn Thr Asp Thr Asn Lys Gln Ile Glu Tyr Tyr Ala Asn Ala Leu1
5 10 15Lys Ala Pro Arg Ile Arg
Asp Ser Ala Ala Arg Leu Ala Glu Gln Ala 20 25
30Arg Asp Ala Gly Trp Thr His Glu Glu Tyr Leu Ala Ala
Val Leu Ser 35 40 45Arg Glu Val
Ser Ser Arg Glu Ser Ser Gly Ala Glu Thr Arg Ile Arg 50
55 60Ala Ala Gly Phe Pro Ala Arg Lys Ala Ile Glu Glu
Phe Asn Phe Asp65 70 75
80His Gln Pro Ala Leu Lys Arg Asp Thr Leu Ala His Leu Gly Thr Ala
85 90 95Gln Phe Ile Ser Lys Ala
Gln Asn Val Val Leu Leu Gly Pro Pro Gly 100
105 110Thr Gly Lys Thr His Leu Ser Ile Gly Leu Gly Ile
Ala Ala Ala His 115 120 125His Gly
His Arg Val Leu Phe Ala Thr Ala Val Glu Trp Val Thr Arg 130
135 140Leu Gln Thr Ala His Gln Gln Gly Arg Leu Ala
Ala Glu Leu Ala Lys145 150 155
160Leu Arg Arg Tyr Gly Leu Leu Ile Val Asp Glu Val Gly Tyr Ile Pro
165 170 175Phe Glu Gln Asp
Ala Ala Asn Leu Phe Phe Gln Leu Val Ser Ser Arg 180
185 190Tyr Glu His Ala Ser Leu Val Leu Thr Ser Asn
Leu Pro Phe Ser Arg 195 200 205Trp
Gly Asp Val Phe Ser Asp His Val Val Ala Ala Ala Met Ile Asp 210
215 220Arg Ile Val His His Ala Asp Val Leu Thr
Leu Lys Gly Asn Ser Tyr225 230 235
240Arg Leu Arg Asn Thr Glu Ile Asp Thr Leu Pro Ser Met Arg Gly
Asp 245 250 255Asn Gln Ala
Asp 260
SEQ ID NO: 52; length 262; PRT; Salinispora arenicola CNY231
Met Ala Thr Ser
Ser Ser Arg Asn Val Ala Ser Glu Ile Ala Phe Leu1 5
10 15Thr Arg Ala Leu Lys Ala Pro Ser Leu Ala
Ala Cys Val Glu Arg Leu 20 25
30Ala Glu Arg Ala Arg Ala Glu Ser Trp Thr His Glu Glu Phe Leu Ala
35 40 45Ala Cys Leu Gln Arg Glu Val Ala
Ala Arg Glu Ala Tyr Gly Gly Glu 50 55
60Gly Arg Ile Arg Ala Ala Arg Phe Pro Ala Arg Lys Ser Leu Glu Glu65
70 75 80Phe Asp Phe Glu His
Gln Arg Ser Leu Lys Arg Glu Thr Ile Ala His 85
90 95Leu Gly Thr Leu Asp Phe Val Ala Ser Lys Glu
Asn Val Val Phe Leu 100 105
110Gly Pro Pro Gly Thr Gly Lys Thr His Leu Ser Ile Gly Leu Gly Ile
115 120 125Arg Ala Cys Gln Ala Gly His
Arg Val Ala Phe Ala Thr Ala Ala Gly 130 135
140Trp Val Ser Arg Leu Ala Asp Ser His His Ala Gly Arg Leu Gln
Asp145 150 155 160Glu Leu
Val Lys Leu Gly Arg Ile Pro Leu Leu Ile Val Asp Glu Val
165 170 175Gly Tyr Ile Pro Phe Glu Ala
Glu Ala Ala Asn Leu Phe Phe Gln Leu 180 185
190Val Ser Asn Arg Tyr Glu Arg Ala Ser Leu Ile Val Thr Ser
Asn Lys 195 200 205Pro Phe Gly Arg
Trp Gly Glu Val Phe Gly Asp Asp Val Val Ala Ala 210
215 220Ala Met Ile Asp Arg Leu Val His His Ala Glu Val
Ile Ser Met Lys225 230 235
240Gly Asp Ser Tyr Arg Leu Lys Asp Arg Asp Leu Gly Arg Val Pro Ala
245 250 255Ala Thr Lys Thr Asn
Asp 260
SEQ ID NO: 53; length 278; PRT; Streptacidiphilus albus JL83
Met Glu Thr Asp
Gln Asp Ala Thr Glu Pro Ala Ser Thr Lys Ala Ala1 5
10 15Ser Gly Arg Arg Thr Ala Lys Gln Thr Ala
Ser Asp Leu Ala Phe Tyr 20 25
30Ala Arg Ala Met Lys Ala Pro Val Leu Leu Asp Ala Ala Glu Arg Leu
35 40 45Ala Glu Arg Ala Arg Ala Glu Thr
Trp Thr His Ala Glu Tyr Leu Val 50 55
60Ala Val Leu Gln Arg Glu Val Ala Ala Arg Glu Ser His Gly Gly Glu65
70 75 80Gly Arg Ile Arg Ala
Ala Arg Phe Pro Ala Val Lys Thr Leu Glu Glu 85
90 95Leu Asp Val Thr His Leu Arg Gly Leu Thr Arg
Gln Gln Leu Ala His 100 105
110Leu Gly Thr Leu Asp Phe Ile Thr Gly Lys Glu Asn Ala Ile Phe Leu
115 120 125Gly Pro Pro Gly Thr Gly Lys
Thr His Leu Ala Thr Gly Leu Ala Val 130 135
140Arg Ala Cys Gln Ala Gly His Arg Thr Ala Phe Ala Thr Ala Ala
Gln145 150 155 160Trp Val
Asp Arg Leu Lys Glu Ala His Ala Ala Gly Arg Leu Gln Asp
165 170 175Glu Leu Val Lys Leu Gly Arg
Tyr Pro Leu Ile Val Ile Asp Glu Val 180 185
190Gly Tyr Ile Pro Phe Glu Ala Asp Ala Ala Asn Leu Phe Phe
Gln Leu 195 200 205Ile Ser Asn Arg
Tyr Glu Arg Ala Ser Val Ile Val Thr Ser Asn Lys 210
215 220Pro Phe Gly Arg Trp Gly Glu Val Phe Gly Asp Glu
Thr Val Ala Ala225 230 235
240Ala Met Ile Asp Arg Leu Val His His Ala Glu Val His Ser Leu Lys
245 250 255Gly Asp Ser Tyr Arg
Met Arg Gly His Asp Leu Gly Arg Val Pro Thr 260
265 270Ala Val Asn Glu Thr Ser
275
SEQ ID NO: 54; length 265; PRT; Granulicoccus phenolivorans DSM 17626
Met Ala Thr Lys Lys
Asn Glu Ala Ser Glu Ala Leu Lys Gln Leu Thr1 5
10 15Tyr Leu Ala Ser Ala Leu Lys Ala Pro Arg Ile
Thr Glu Ala Ala Ala 20 25
30Arg Leu Ala Asp His Ala Arg Asp Ala Gly Trp Thr Tyr Glu Glu Tyr
35 40 45Leu Ala Ala Val Leu Asp Arg Glu
Val Ala Ala Arg Asn Ala Ser Gly 50 55
60Ala Gln Leu Arg Ile Arg Ala Ala Gly Phe Pro Ala Arg Lys Thr Ile65
70 75 80Glu Glu Phe Asp Trp
Asp Ala Gln Pro Ala Val Arg Gln Gln Ile Ala 85
90 95Ala Leu Ala Ser Gly Gly Phe Leu Thr Glu Ala
Arg Asn Val Val Leu 100 105
110Leu Gly Pro Pro Gly Thr Gly Lys Thr His Leu Ala Thr Gly Leu Gly
115 120 125Ile Ala Ala Ala Asn His Gly
His Arg Ile Leu Phe Ala Thr Ala Thr 130 135
140Glu Trp Val Thr Arg Leu Thr Asp Ala His Arg Ala Gly Arg Leu
Pro145 150 155 160Leu Glu
Leu Thr Arg Leu Arg Arg Tyr Gly Leu Ile Ile Val Asp Glu
165 170 175Val Gly Tyr Leu Pro Phe Asp
Gln Asp Ala Ala Asn Leu Phe Phe Gln 180 185
190Leu Val Ser Ser Arg Tyr Glu His Ala Ser Leu Ile Leu Thr
Ser Asn 195 200 205Leu Pro Phe Ser
Gly Trp Gly Gly Val Phe Gly Asp Gln Ala Val Ala 210
215 220Ala Ala Met Ile Asp Arg Val Val His His Ala Asp
Val Leu Thr Leu225 230 235
240Lys Gly Ala Ser Tyr Arg Leu Arg Asn Arg Gly Ile Glu Thr Leu Pro
245 250 255Ser Ile Lys Thr Gln
Asp Thr Ala Asp 260 265
SEQ ID NO: 55; length 29; DNA; Streptomyces viridifaciens
gtcctccccg cgcccgcggg ggtcagccg 29
SEQ ID NO: 56; length 38; DNA; Mycobacterium tuberculosis MD17240
tgcccgcctc ctcactcgcg ccattccggc gctcgccg 38
SEQ ID NO: 57; length 29; DNA; Streptomyces fradiae
gtcctctccg cgcgagcgga ggtgagccg 29
SEQ ID NO: 58; length 20; DNA; Streptomyces noursei ATCC 11455
gctgctgtgg tgccgcggct 20
SEQ ID NO: 59; length 20; DNA; Tessaracoccus sp. T2.5-30
gcgacgacgg cgcgggtcag 20
SEQ ID NO: 60; length 27; DNA; Streptosporangium sp. M26
ggagtcgatg gcgtgctggt aggcggc 27
SEQ ID NO: 61; length 20; DNA; Mycobacterium rhodesiae
ccggagtcgg gtggggtgtt 20
SEQ ID NO: 62; length 30; DNA; Streptomyces thermoautotrophicus
gttgcgaccc ctcgtagggg cgatgaggac 30
SEQ ID NO: 63; length 29; DNA; Salinispora arenicola CNS-205
cggagcaccc ccacgtgcgt ggggaggac 29
SEQ ID NO: 64; length 20; DNA; Trueperella pyogenes
gtagtgtgca tggtgtacat 20
SEQ ID NO: 65; length 29; DNA; Salinispora arenicola CNS-205
ctgctccccg cgcatgcggg ggtgatccc 29
SEQ ID NO: 66; length 29; DNA; Tessaracoccus flavescens
gtccgtcccc gccacgcggg ggtgagccc 29
SEQ ID NO: 67; length 29; DNA; Streptomyces pactum
gtcctctccg cgcgagcgga agtgagccg 29
SEQ ID NO: 68; length 26; DNA; Mycobacterium canettii CIPT 140060008
cacttgaggg cgcacaggcg cccgaa 26
SEQ ID NO: 69; length 24; DNA; Mycobacterium canettii CIPT 140070010
ttgccgccat taccgccggc gccg 24
SEQ ID NO: 70; length 34; DNA; Cellulomonas fimi ATCC 484
gcgtcagaag gcgcgcgtga tcatcgcgcg cttg 34
SEQ ID NO: 71; length 29; DNA; Nakamurella multipartita DSM 44233
gggctcatcc ccgcgggcgc ggggagcac 29
SEQ ID NO: 72; length 29; DNA; Streptomyces avermitilis MA-4680 = NBRC 14893
cggttcacct ccgctcgcgc ggagagcac 29
SEQ ID NO: 73; length 29; DNA; Streptomyces viridifaciens
cggctgaccc ccgcgggcgc ggggaggac 29
SEQ ID NO: 74; length 30; DNA; Mycobacterium kansasii 732
cccctgtgag tcgagtgagc ggagcgagcg 30
SEQ ID NO: 75; length 21; DNA; Mycobacterium kansasii 732
ggtggccacc gtggcgatat t 21
SEQ ID NO: 76; length 29; DNA; Streptomyces niveiscabiei
cggttcaccc ccgcgtccgc gggaagcac 29
SEQ ID NO: 77; length 29; DNA; Actinomadura macra NBRC 14102
gggaccatcc ccgcgtgcgc ggggagcag 29
SEQ ID NO: 78; length 30; DNA; Salinispora pacifica DSM 45549
gttgcgatcc ctggtagggg cgatgaggac 30
SEQ ID NO: 79; length 28; DNA; Salinispora arenicola CNH646
ggaagacccc cgcgcgtgcg gggacgag 28
SEQ ID NO: 80; length 29; DNA; Salinispora pacifica CNT584
gtcctcccca cgcacgtggg ggtgctccg 29
SEQ ID NO: 81; length 30; DNA; Nocardia higoensis NBRC 100133
gtcctcatcg cccctacgag ggatcgcaac 30
SEQ ID NO: 82; length 20; DNA; Salinispora arenicola CNY231
gcgtccgccg ctcctggttc 20
SEQ ID NO: 83; length 28; DNA; Streptacidiphilus albus JL83
tgttccctgc gcccgcaggg atggtccc 28
SEQ ID NO: 84; length 28; DNA; Granulicoccus phenolivorans DSM 17626
gtgctccccg cgcgagcggg gatgatcc 28
SEQ ID NO: 85; length 74; DNA; Left Transposon End for WP_068751155.1
gcgtgtcgat gccgagtgga tactgacccc gtggcgccga gtgaattttg acccctctcc 60
gtcagagtga ggga 74
SEQ ID NO: 86; length 357; DNA; Right Transposon End for WP_068751155.1
aaccaacaac aaccatgaga ggggtcaatt ttcgatcggc aacagggggt cagttttcgt
60ccggcgttga caacgcgctc gccgtcggcc aggtcgaact gcgggttgtg ggccgaggcg
120tgtgcgatga gccgttcgcc ctcccacacc gtgggcagca acgtcgccga cggcgccccg
180tcgccccccg tgatccagag tgcggcgccc acctcgctga ggagcgcgcg acagcggtcg
240acgttgtcca cccgctggct ggcggggacg tacacggtgc tggctgggat gtcgggctcc
300atggtcggcc atcgtaatcg ggccgcacgc cctgccgcgg aggccctcac cgagggc
357
SEQ ID NO: 87; length 161; DNA; Left Transposon End for WP_012181244.1
ttccaccctc tgacggtacg
tgcgtgacga gcgcgggacg cgacggacga ggttctgacg 60cccccagatg gctggcctcc
acgcgggctg gtgtcggctc cgcctgaaaa ctgaccctgt 120ggttccggct gaattttgac
cccttccgaa agcatcggga g 161
SEQ ID NO: 88; length 611; DNA; Right Transposon End for WP_012181244.1
acatcaacaa ccaacgaggg ggtcaaaatt
cggccggaac agaggggtca aatttcagcc 60ggcgttgaca gctggtcccc gcaactgggc
gtcgctgcgg ggaccagctt gtcggtggcc 120ttcggccacg gctgctccac ctgcgcgtgg
ccgtggccag cagccgagcg gtgacctcgt 180cgacctgttc aggttcgccg caggcccgac
cgctcgccag cgccgggccc cgcgctgtgg 240ataaccccga tgcctgtgga tgatcccctg
atcatggcca gatctgctag ccctcgtatc 300accagcagcg cggacgcagg ggcgggccgt
agaagggacc acactcgtat ggcaccaaga 360gcaaggcgtg gacgtttgaa gcctcaatga
agggcagctc cgctgggaga tgcgaccccg 420gaagccgccc agatcttcac tgcgcacgtt
aaacaggcct caatgatttc agccggcgtt 480gacagcgtcg agtggttcag aggaagcgtt
ggcccatcac gagaccaccg gagcggtgat 540gtaggaccgc tgcaaccctt cgatcagtcc
agctcgcgca gagggtccac cgtgggtgct 600atcaggagcg t
611
SEQ ID NO: 89; length 96; DNA; Left Transposon End for WP_015748250.1
taactgccgt ccgtcttcgt ggaggctgtc ggttccgggt gaatcctgac
ccagctggtc 60cggttgaaaa ctgacccacc tcgcaacgat cgtggg
96
SEQ ID NO: 90; length 640; DNA; Right Transposon End for WP_015748250.1
cgttgccgtt ccatctcgtc gaagaagacc cgctgattgc gaggaaacga cgatggcagg
60acgtcaggac agggctcggt gatgacagtg atgagggcgc gtcaggaacc ggggataacg
120cccggtaccg gcaagaccca cctcgcgacg ggcatctcga tccgcgcctg ccaagccggc
180catcgcgtcc tgttcgcgac cgccgccgag tgggtcgctc gcctggccga ggcccaccac
240gccggccggc tgcaaccaga actcgttcgt ctgggccgct acccgttgct cgtgatcgac
300gaggtcggct acatcccgtt cgaagccgag gccgcgaacc tgttcttcca actcgtgtcc
360aaccgctacg aacgggcctc gctcatcgtc accagcaaca agcccttcgg ccggtggggc
420gaagtcttcg gcgacgacgt cgtcgctgcc gcgatgatcg accgcctggt ccaccacgct
480gacgtcatcg ccctcaaagg cgacagctac cgactcaaag accgcgacct gggccgccca
540ccagcggcca acaccgacca atgaccacca ggtgggtcag ttttcaaccg gacaaggagg
600gtccaagttc aaccggagtt gacagaggcg atcacgggca
640
SEQ ID NO: 91; length 700; DNA; Left Transposon End for WP_018254221.1
cccagatcag cgtcaacaca
cggcctccga tcttgccggc ggcgctcggt gacaggccca 60cctcgtgcag caacgtgcgc
agcagccgga accacacgcc ggcgggcacg tcccggccga 120gcaggctgac ccggccggtc
atcagcgcct ggtgggtgta gcggtccagg tcggcaagcg 180gcgcggccac cggcactgga
tccgacggcc ggccgtgccg ccatctcgat gtcgacggtg 240gtggcatcct ccagccggca
gccgtgctcg ccgcaactga ccatcagcgg tagccgaccc 300aacagcgcgc ggcccctgcc
cggcaggccg gcgctgaccg ggcacgtccg gttatgccac 360tgccggggca gccacggtcc
gcgccaccgc cggtatcggg cgagcccgtt gccgatgccg 420gcgcggggtg cgtctttgat
ccacgaccag ttcgatccct ggtaggggtg tcgacgacgc 480gctgagactg accctctagc
ggcggattgg gactgacccc tccgaggctg gaggggtgat 540cgcattggaa gactgggctg
aggtccgtcg gttgcatcgg gctgagggtg ttccgatcaa 600ggagattgcg cgtcggttgg
gtttggcgcg taacacggtc cgttcggcgc ttcgggccga 660ggcgccgccg tcgcgtgagc
gcggccctcg cggttcgtgt 700
SEQ ID NO: 92; length 67; DNA; Right Transposon End for WP_018254221.1
cacgaaacga cctccggggg gtcaatccca aaacgtcgat
agggggtcag ttccaatccg 60ccgttga
67
SEQ ID NO: 93; length 92; DNA; Left Transposon End for KWX05882.1
tggtggtgcc cccctgtcgt tgcttcctga aaactgaccc ccaggcagtc tccgaatctt
60gaccccctcc taactcttgg agggtgctga ag
92
SEQ ID NO: 94; length 90; DNA; Right Transposon End for KWX05882.1
cccgacccca ggggatcaat
tttcccggag cggaaggggg tcagttttcg cgaagcgttg 60acacccccca gatgccgatc
tggtgtgtgg 90
SEQ ID NO: 95; length 700; DNA; Left Transposon End for WP_029025341.1
gcatctgcgc cgccacgacc tacggcacac cggactcacc
tggatggctg acgcaggtgt 60gccggtgcac gtcctgcgga agattgccgg acacgggtcg
caccaccacc cagcgctacc 120tacaccccga ccggcagtcg ttcgccgacg ccggaacgac
gctgagcgcc cgcttgaagg 180cccgccggtc cccagatggt ccccagctac gcgccgtaga
tcaggaaagt cctactaccc 240cagtacgaat tagaggccgt tgaccagggt ttcaccccgg
tcgacggcct cttttcgtgc 300tgtcgggacg gccggattcg aaccgacgac cccttgttac
caaccgtagt cacaccgggc 360atgtcatcgc ttgtcgcgag taagccctga gctgcaccaa
cacggtcaga tacttgttcc 420tcctggtcac tgttcgtcag ctcctggcag gagttttcgg
gggtaaactg tcgacgacgc 480gctgagactg accctctagc ggcggattgg gactgacccc
tccgaggctg gaggggtgat 540cgcattggaa gactgggctg agatccgtcg gttgcatcgg
gctgagggtg ttccgatcaa 600ggagattgcg cgtcggttgg gtttggcgcg taacacggtc
cgttcggcgc ttcgcgctga 660tgcgccgccg tcgcgtgagc gcggccctcg cgggtcgtcg
700
SEQ ID NO: 96; length 570; DNA; Right Transposon End for WP_029025341.1
aacgaaacga cccccgaggg ggtcaatccc aatccgtcga tagggggtca gttccagtcc
60gccgttgaca taaacggggg ccggcagacc aaccatcagg gtttccacca cgcccacccc
120cgccaccgag atgtccgtcg ccacgaacag agcccgcaag ccgtggcgtg gtcatcgagc
180atggacgacc ggggagaggc cagagcaaag tctcccgctg gctcatgcag agtcctcccg
240gacgatgact gatgcgcatt ggtcatgaga cgacctacca gcgccgtgct gcgacgaccg
300gttaaacccg cccttcccca agtgcacgga tctatcgcag cacaccgcag ccgatctcca
360cgccgtcgag caacgcctca acaaccggcc ccgcaagacc ctcaactggc gcaccccggc
420cgacgtcttc cataccgcac tggcaccctg acgatcgtca ccgttgcgac gactgcttga
480atccgcccgg aaccacgggg tcatttttca ggcggagccg accacgagac cgcactcggg
540atgagtatcg ccctggacat gggaaaatac
570
SEQ ID NO: 97; length 104; DNA; Left Transposon End for WP_012182723.1
gtgagaactg gagcaaagcc
atgtcgaggc aacttgtcgg ctccgcctga aaactgacca 60cgtggttccg gctgaatcgg
gaccacctcc tgaagcatcg ggag 104
SEQ ID NO: 98; length 72; DNA; Right Transposon End for WP_012182723.1
agatcaacca aaccgaggtg gtccagtttc agccggacca
cggtggtccc gtttgaggcg 60gcgttgacac aa
72
SEQ ID NO: 99; length 181; DNA; Left Transposon End for WP_040800760.1
actcggcata gtacggcgtg ctcgaagcct gcctcgaacg aactcaatct cgctgtcgtt
60ccatgactta tcgctcacgg cctcgaccga cccgaacccg tctgtcagcg ctcgttgaaa
120accaggccac tggcgctctt tgaaaatcgg ccacccgtcc acgattggaa gggtgatctc
180a
181
SEQ ID NO: 100; length 65; DNA; Right Transposon End for WP_040800760.1
acaaccatca
agtggcccga ttttcgaaga gcgccaccgg ccggttttcg aagagcgtcg 60acacc
65
SEQ ID NO: 101; length 700; DNA; Left Transposon End for WP_055422289.1
gcgtaagcgg aggtgagccg tgccacagga
cctccaggtc cgcgcgagcg gaggtgagcc 60ggattacacg gtcgaagccg tgggcatcgg
cgggtcctct ccgcccgagc ggaggcgagc 120cggatcccga ttcgggtttc gtggtgacgt
ccgggtcctc tccgcgcgag cggaggtaag 180ccggacagca cagcgctgat agcttgcttg
atcgcgtcct ctccgggcga gcggaggtga 240accggtcggc gacggttcga cgtaccggtg
gtggctgtcc tctccgcgcg agcggaggtg 300agccggaggt ctgcccatgg caggcaactc
tcggccggtc ctctccgtgc gagcggaggt 360gagccgtccg ccaggagcag cagccgctcg
atcgccgagt cctctccgcg cgagcggagg 420tgaaccggcg gtctgggact ccgtcctggg
cgacatcatg tcctatccgt gcgagcggag 480gtgagccgtt ctgccgcccg tcaggtgagc
cggtcagccg gtcctctccg cgcgagcgga 540agtgagccgc cggtggggtg ggcggtgtgg
ccgcatcaca cgtcctctcc gcgtaagcgg 600aggtgagctg tcggcgacgg ctgaaaagtg
gaccagtagc ggcgcttgaa agttgaccct 660ctccaggggt ctggctcgtt gagtcaggcc
gggaggagtg 700
SEQ ID NO: 102; length 116; DNA; Right Transposon End for WP_055422289.1
ccaacatcaa ctgacgcagc ggggggtcat ttttcacccg ccggaatcgg
ctcacatttc 60aagcgtcgcc gacagtgagc cggggcctga cgaggctctg gagtgcaagg
acgtgc 116
SEQ ID NO: 103; length 116; DNA; Left Transposon End for BAC75258.1
cccgtctcag tcccgagtgc cagcgtctgt cggcgacggc tgaaaagtgg ctcagtcaag
60atttccgtga agccgatcgt gcgcctgtgc gccggtgacg ctccgtcaca ggccgc
116
SEQ ID NO: 104; length 89; DNA; Right Transposon End for BAC75258.1
ccaacatcaa ctgacgcagc
ggggggtcat ttttcacccg ccggaatcgg ctcacatttc 60aggctcagtc aagatttccg
tgaagccga 89
SEQ ID NO: 105; length 143; DNA; Left Transposon End for WP_055721037.1
tcagggcggt tctcgcgcgc ctgtccggat
gccggttcac tgtcggcggc gggtgaatcg 60tgaccctctg acggcggatc aaaactgacc
cacttcgtgg tcttctgacc aacctgatct 120tgatcgaagg tcaggagaag agg
143
SEQ ID NO: 106; length 220; DNA; Right Transposon End for WP_055721037.1
acaccggcaa ccagacccgg tgggtcagaa ctcgaccgcc cacaccgggt
caggattcag 60ccgccgccga cactttgagc cgacgcccag catctccaca agtagggctg
tacgacaggc 120ttccctctcg cccagctacg aaaggtgacc agtacctggt tcctgaacgc
gctggaaggg 180ggccgacccc agaaggccag cccccggtgt caggccggga
220
SEQ ID NO: 107; length 645; DNA; Left Transposon End for WP_083119439.1
tcacagcgac gggtcgatga gattgtcgat atcgcgccgg agggcatccg ggtccaccgg
60cggaagattc ttgcgacgac gaatcagttc cgcaggtcct gcgctcgaac gaggcaacgg
120ccgtagctca gcaacaggtg tcccgtcccg agtaatgacg atgcgctcgc catgctcgac
180acggcgcaga acgtctcctc cactgttgcg caattcgcgc accgtgactg catccacgta
240cgaagtgtat caccggtgag acgtcaaagc agagtaccaa gagttcaacg ttgcgcacgc
300gagaaccggg gggatcgaat tgcagggtta tcagtaaccc atggcatatt gcattgccgc
360ccaatgattt tgacacttcg aaccgctaga tggtcccccg tttcagaacc cattgatgcc
420ggacggcggt acctctgcgg cgatctcaaa gtcggcgtca tagtgaatca gcgtcacaac
480atgggcctct gcgatggtcg cgatcaggag gtcggccatg ccgacggcgc ggttgttgac
540gacggttgaa gttccggcca gtagcgacgg ttgaaaagta ggccacccaa tcagattgga
600tgggtgatct ccttggaaga ctgggccttg atccggcatc ttcac
645
SEQ ID NO: 108; length 264; DNA; Right Transposon End for WP_083119439.1
ccacgaccac
gtggcctact tttcaccgtc gcatctggcc tggttttcga ccgtcgtcaa 60cagcggtggc
gacccgtgct ggcgagctgc cgttgcgcgc cgagcgccgt ctgccagtgc 120tcgtcatcgg
ttggcaggta ctcataggcc agacggcggt ctgaccggag ttgttcgtag 180tctttagggc
tgcgagcgca atagagcgcc tcggcgtcga gctaagcagt tcatccgcgc 240tggtatcgat
gagatgccga gcga
264
SEQ ID NO: 109; length 229; DNA; Left Transposon End for WP_018815463.1
cgacgacgcg
ctgagactga ccctctagcg gcggattggg actgacccct ccgaggctgg 60aggggtgatc
gcattggaag actgggctga ggtccgtcgg ttgcatcggg ctgagggtgt 120tccgatcaag
gagattgcgc gtcggttggg tttggcgcgt aacacggtcc gttcggcgct 180tcgggccgag
gcgccgccgt cgcgtgagcg cggccctcgc ggttcgtgt
229
SEQ ID NO: 110; length 113; DNA; Right Transposon End for WP_018815463.1
cacgaaacga
cctccggggg gtcaatccca aaacgtcgat agggggtcag ttccaatccg 60ccgttgacag
ggggtgctcc gacggccgag agtgcgcacg tgatggcggc tat 113
SEQ ID NO: 111; length 69; DNA; Left Transposon End for WP_077349293.1
gtcgatgccg catgaatact gacccccagg
tgccgagcga aagttgaccc cctcggtcag 60agtgaggga
69
SEQ ID NO: 112; length 700; DNA; Right Transposon End for WP_077349293.1
gcaccgccaa cgggaccagg atgcggatct cggcagccca cttcccgcag
gtcaagacca 60tgcaggactt cgtcttcgac cacatccccg ccgccacgcg cgacgtgatc
gcgcacctgg 120caactggcac cttcatcgcc aagcgggaga acgtggtcct gctggggccg
ccgggaccgg 180gaagacccac ctcgcaatcg ccgtcgcgat gaaggccgcg gaagcgtcct
acctggtgtt 240gttcgactcc gcgaccggct ggatccaccg gctggcccaa gcccatgcca
agggcggtct 300cgagcgagaa ctacgacggc tgaaccgcta tcgactgctc atcatcgacg
aagttggata 360cctgccgttg gacgcggccg cggcggcgtt gttcttccaa ctcgtcgcct
cccgctacga 420gaccggatcg atcatcgtga cctcgaacct gcccttcagc cgctggggcg
agaccctcgg 480cgacgatgtc gtcgcagcag ccaccatcga ccggctcgtc caccacgccc
acgtcatcgg 540cctggacggc gactcctacc gaacccgcgc acaccgcgac accatcaacc
agcaaaccaa 600gtagccaacc aacaagaaac ccaccgagag gggtcaattt tcaatcagca
gagggggggt 660cagttttggc ccggcgttga cagcccgtct ggtccatcac
700
SEQ ID NO: 113; length 120; DNA; Left Transposon End for WP_043464545.1
gtcgagcggg gcacctcgca cggctcggat gtcgacaacg cctcaaattg tgccgcctcc
60aacggatgaa aagtggcccg tctgaacggt ctggctcgtt gagtcaggca gggaggaagg
120
SEQ ID NO: 114; length 79; DNA; Right Transposon End for WP_043464545.1
ccaacatcaa
ctgactacgg gtccactttt catccgctgc ctccggtcca cgattgcgcc 60gttaccgaca
ctcggacag 79
SEQ ID NO: 115; length 97; DNA; Left Transposon End for ALD73443.1
tgtgttttca ctcattttgt cagtgacggt
tgaaaagtag cccaaaaacg acggtcgaaa 60agtagcccat ccgagaacaa ttggatgggt
gatttcc 97
SEQ ID NO: 116; length 84; DNA; Right Transposon End for ALD73443.1
acaacaagca caattgggct acttttccac cgtcagaagt gggctacttt
tcgaccgtcg 60ctgacacatt tcgattcctt tcac
84
SEQ ID NO: 117; length 113; DNA; Left Transposon End for EUA10187.1
cggggctagc
tgtcggcgac gcctgaaaac tgaccccgtg tcgacgcccg aattttgacc 60cccttcgtct
gagtctcggc ttacgagccg agaggagaag ggagttgtta gct
113
SEQ ID NO: 118; length 90; DNA; Right Transposon End for EUA10187.1
acaccaaggg ggtcaatttt
caaccgtcga aaaggggtca attttcggcc gccgttgaca 60ctagctccgc agcgtgtgtt
ggagataggg 90
SEQ ID NO: 119; length 413; DNA; Left Transposon End for WP_079046208.1
tggtggtgcc cccctgtcgt tgcttcctga
aaactgaccc ccaggcagtc tccgaatctt 60gaccccctcc taactcttgg agggtgctga
aggtggagga ctgggcagag atacgccggt 120tgcatcgggc ggagggcgtg cccatcaagg
agatcgcgcg tcggctgggg gtggcccgga 180acacggtgcg ggccgcgttg aactcggacc
ggccgccgaa gtacgagcgg gcctcgcgcg 240gacaggtcgc ggacgcgttc gagccgcaga
tgcgggccct gctcaaggag tggccgagga 300tgccggcccc ggtgatcgcc gagcggatcg
gctggccgta ctcgatggcg ccgctgcgca 360agcggctcgc gctgatccgg ccggagtacc
tgggtatcga cccggtcgac cgg 413
SEQ ID NO: 120; length 631; DNA; Left Transposon End for WP_003414864.1
ccgcctgttc gagaccccat gccacgctcg gctggccgac gacgatcacc
catcgcagac 60accacacttg gtaggggttg ccagttgttg gccgggtgag tggtcggcgc
gccgttgccc 120ggggtagggt tcgaggtctt tggatgatgg gcgtttccac gctgcccaaa
ggatgacctc 180gacgtgtccg agttcacgtt gaccgcgtga agttaaaccg gtgccgagcg
tgcactgagg 240gcgaaatccg gcgccgattt tccgccctga gttcacgttg ggcgacggcg
cccatgaacg 300acgccacatc gcacatggcg ctcaggccaa gcaccagccc atctccgtcg
ccggccaccg 360tcaccgatcg aacgacctcg acccccgccc tggcaacaac acgccgctgc
cctctacacc 420tccgcgctgt cgaaaattgt cacggagcct tgcgggggct ggtgcgactg
atatgacgca 480ccttccgcca gaggctagcc cgacgtttac tgacgttact gctgcttacc
gtttgtcgac 540ggcacgtgaa aactgacccc ggcgcggcac ccgaattttg accccctggt
cgggtggact 600ggctctaccc gagccaggag gaccgaaggg a
631
SEQ ID NO: 121; length 700; DNA; Right Transposon End for WP_003414864.1
cgcttcccgg ctcggaagtc gttggaagag ttcgactttg agcatgctcg tggcctcaaa
60cgcgacacca tcgcacatct gggcaccctg gatttcatca ccgcccgcga taacgtcgtg
120tttttgggcc ccgcctggca ccgggaagac tcatcttgcg gtcggcctgg cgatacgcgc
180gtgtcaggcc ggtcatcggg tgctgttcgc caccgccgcc gaatgggtag cacggctcgc
240cgaggctcac cacgccgggc gcatctacgc cgaactcacc cggctttgcc gctatccgct
300cctggtggtt gacgaagtcg gctacattcc gtttgagccc gaggccgcca acctcttctt
360ccagctggtg tcctcccggt atgagcgggc cagcttgatc gtcacgtcca ataaggcctt
420cggccggtgg ggcgaggttt tcggcggcga cgacgtcgtt gctgccgcca tgatcgaccg
480cctcgtccac catgctgaag tcgtcgccct caaaggcgac agctaccggc tcaaagaccg
540cgacctcggc cgcgtcccac cagccggaac caccgaagaa taaccaccaa ccgcccggtc
600tagggggtca attttcagat gccgtcaggg ggtcagtttt cgggtgccgt tgacaccgtt
660cacaagggcg tttcgagcaa cgcgtcgacg caacttcggc
700
SEQ ID NO: 122; length 700; DNA; Left Transposon End for WP_072653819.1
catcgcggcg
gcgacggtct cgtcgccgaa cgtctctccc cagcgtccga agggcttgtt 60cgaggtgacg
attacgctcg cccgttcgta tctgttcgag atcagctgga agaacaggtt 120cgccgcctcg
gcctcaaacg ggatataccc cacctcgtcg atcacgatca gcgggtagcg 180gccgagcttg
accagctcat cctggagccg gccggtgtga tgggcggcgg ccaggcggtc 240gacccactgg
gcggcggtgg cgaacgcgac ccggtggccg gcctggcagg cccggatcgc 300cagcccggtc
gcgatgtgcg tcttgcccgt gcccggcggc cccaggaaaa ttgcgttctc 360cttggccgcg
atgaagtcca aggttcccag gtgggcgagt tgctggcggg tcattccgcg 420tagatgggcg
aggtcgagtt cctcgatcgt cttgatcgcg gggaagcggg cggcgcggat 480gcggccctcg
ccgccgtggc tgtcgcgggc cgagacctcc cgctgcaggc aggcgacgag 540gtattcgagg
tggctccagg actcggcctg ggcgcgttcg gcgagccgct cggcggcgtc 600cagcagggcg
ggggctttca tcgcgcgggc gaggaaggcc aggtcggcgg cggtttgtcg 660ggtggtgcgg
gcggcgggaa cgccggcttt cgccgcgtcg
700
SEQ ID NO: 123; length 700; DNA; Right Transposon End for WP_072653819.1
tggctcgcac
cgccaccgcc acgaccaccg cagagcagac gcccaaggaa gggcggcaga 60cctcggcaga
cctcgcgttc ctcgcccgcg ccatgaaggc acccgctctg ctggacgccg 120ccgagcgctt
ggccgagcgg gcccgcaccg agtcctggac ccacctcgaa tacctggtcg 180cctgcctgca
gcgcgaggtc tccgcccgtg acagccacgg cggcgaacag cgcatccggg 240cctgccaggc
cggccaccgg gtcgcgttcg ccaccgcctc ccaatgggtc gaccgcctcg 300ccgccgccca
ccacaccggc cgcctccagg acgaactcgt caaactcggc cgctacccgt 360tgatcgttgt
cgacgaggtc ggctacatcc ccttcgagcc cgaggccgcg aacctgttct 420tccagctcgt
ctcgaacaga tacgaaagag cggcgcggta cagctgggcg taagctggct 480gcccccccgc
ctgcgcgggg aggacggacc gtggcgggat tccccggtgg gcggcggcga 540cggtctcgtc
gccgaaggtt tctccccagc gtccgaaggg cttgttcgag gtgacgatca 600cactggctct
ttcgtcctcc ccgcacccgc gggggtcagc cgctgccgcc caccggggaa 660tcccgccacg
gtccgtcctc cccgcgctcg caggggtcag
700
SEQ ID NO: 124; length 700; DNA; Left Transposon End for WP_034088311.1
tccttatgca
gcctgaccgt cttggtgacg gttggcgcgt gcttcgccct cgcccggatc 60acggacggct
tcggcgccac agcgaggaca ccccaccgct catccgcgcc agcgcctacc 120tgtggcgacg
catcaccccc ggctactcgg acggtagccc cgtaggagcc tgctgacgcc 180cctcaggggg
cgtgaccacc agtggttacg ccagccggat cgaggcggat cgagcagggt 240tcaggaggat
ggaaacaggg gtgtttgtcc cggttattcg gcaaagccgc aggtcagggc 300tgtgctcggc
atgggttcga ctcccctagg ctccacccta taaatgccct ctgaactgcg 360gaaacgtgga
gtcggagggc atttctcgta accactggtg ggcacggccg tgcccactgg 420gagctcgggg
aggcaagctt ggatgcggtc tgctcttgtt gaccctacgg gttgcctggg 480atcactatgc
gcttgacctg cggaaatgcg agtccgtgtg tcgctatggc gcacccaggt 540acgtacgaga
cttccagggg acagcgggga gcatctcccc taggtctgga gagcgggccg 600gggtgcggag
cgggggtggt gcggtctgtc ggtgtgacgt ccgtggggac gtgtcggccc 660gtacgatggc
ggtgtccgtt ggccgtttgg gaggcgcgcg
700
SEQ ID NO: 125; length 700; DNA; Right Transposon End for WP_034088311.1
tgctcctcgc
gcccgcgggg atggtcctct cgaaaccaac agcgacctga ccgccgattc 60ctgctccccg
cgcttgcagg gatggtccct acatctacct gtccacgact ccccagcccg 120actgctcccc
gtgcctgcgg ggatggtccc atggaggcct tggtgggcag catggcggcg 180atctgctccc
tgcacccgcg gggatggtcc ccctctcagg ggtgccgtcc gccccgtccc 240gagcctcctg
cgcccccgcc cgcggggatg gttccgtttt gtcctgcgcg ccgcatcctg 300ccgcgcccta
ttctgagagg ccgatgcaac tcgggcgata gcttcagagg agcctcctga 360tgatcctgag
gaggcgcgcc caccagtggt tacgccagcc ggatcgaggc ggatcgagca 420gggttcagga
ggatggaaca ggggtgtttg tcccggttct tcggcaaagc cgcaggtcag 480cgctgtgctc
ggcatgggtt cgactcccct aggctccaca cccgaagacc cttctgacct 540gcggaaacgc
tgtcaggagg gtctttttgt cccggttttc tctgctgacg tgtcagcata 600ggggtgctga
cacgatcttg aaggagtcag tatttccggg gcccggggag atgcttgctc 660ccctgactgt
tcccctgggc tcgcggtccc actggggaga 70012677DNALeft
Transposon End for WP_081684113.1 126gtcgacggca ctccaaaatg aggccatagc
ggcagtcgaa aacgaggcca cccagacaga 60ttggttgggt gatctct
77127107DNARight Transposon End for
WP_081684113.1 127gaacaacacg aaccggtggc ctcgttttca accgtcggtt cggcctcagt
ttcaagcgcc 60gtcgacaatg atccggtcgc gggagaggtc cggcgcagct ccgggca
10712875DNALeft Transposon End for WP_067466072.1
128ttctacgtgt cgacgacggg tgaaaactga cccccaggcg acggccgaat tctgaccccc
60ttcactctct ggagg
75129108DNARight Transposon End for WP_067466072.1 129caactcgtaa
ccaacgacaa catcaagggg tcaactttca gacgtcccca gggggtcaga 60attcagccgc
cgttgacact acgagccgga gggcgatccc ctgcccta
108130700DNALeft Transposon End for WP_086568729.1 130ccgccacgtc
actcgatggc aaggcccgac gtaggagcgc cgctgcagtt ccctgagtcg 60accccttgcc
gggaattgcg gcgttgttgc tggtcacggc tgcagtatga aacgttctcg 120gtgcgtgagg
acgtggggtg tccggtggca aggtgacccg gaccgcagcg aagcgaggac 180cggaagcgcc
gcggcaggac taatcagcgg atgctgtgcg gtgttggcgt ggtggacggc 240ggggtcgtag
ggggtgccgg tgcgccagca ggcgtggatg acgcgtagcc atcctcggcc 300gaggatacgc
accgcgttcg ggtggcgttt gccgcggtct cgggctcgct ggtaggcctg 360ttcggcccag
ggattggcgt gccggctgtt gtcggcgaag tgggtcatgg ccttgcaggc 420ggcccggttg
gcgctgtgcc gaagccggca gcgtggactt tgccgctggc gcgggtgacc 480ggggccatgc
cgacctcagc ggcgatccga tcgcaggcca tgcacccgcc gggccgagag 540gtgacaatcc
ttgaaagggc tgcctaccca ggatccctta ttagccgatt gccccggcca 600cgtacggcga
gtaccgaagt cggcgatcgt gtcaacgacg cgcgaacggt gaccccctga 660cgacggccga
atagtgaccc ccttcaggca ctctggaggg
700131432DNARight Transposon End for WP_086568729.1 131caatcaagta
cacgacaggg gtcagttttc atctgccgat agggggtcac agttcgaacg 60gcgtcgacag
atcgtcccgc ctcagccatc cacgatcgct caagaccttc aggcgcagtg 120cgtaccaagg
cttcatgcga agccagttcc tcttcctgcg ctcaacgccc ggccgggact 180tgcggggcga
aacgtcttcc tagctcgcgc cagtggctgt aaagctcccg ccgctatcga 240tgcccgtgat
gaccttcgac tggtccagtc taggtcccca ctgtgaggtc aggcgcgtgg 300gcggcgctgg
tcgcgaccgg ccgaagagcc gcagcgcggc cagcgaccgc ccgacgcgcg 360gcaggccgcc
gcaggcgggc tgcccttgaa gacgagtata taagtttctt tcgatcttgc 420tggtcgtact
cg
432132103DNALeft Transposon End for AEE46996.1 132cgtacgggac ggacgcgtca
gcgcgccgtt gtcggcgacg cctgaaaact gaccccgtgg 60cgacggatga aaacggaccc
cctcggcaac gctgaggagg gtg 103133331DNARight
Transposon End for AEE46996.1 133ccaccacacg agaggggttc actttcggac
gtcgccaggg ggttcgtttt cagccgtcgt 60tgacagccgt cctgtccggc acctcgaccc
cgcgccgctc cagctcctcg cggaccgtgc 120gggggatgtc gcggcacccg cactcctgcg
cgtgcgcgag gtgccacgtg acgcgtgcgt 180ccatcggcgc gccctggccc agcacgtggg
cgtcgtgcca ctcggcgttc atgcgtccac 240cccagcactg cgtcccggac ctgtcgaggg
gcgcaggtcc cggggacgac gcgaggccgc 300cccgacggac gggacggcct cgcgacgagg g
331134342DNALeft Transposon End for
WP_015288588.1 134cctcaacgcc ggtcgccaca gccgctcaaa cgtggcggcc gcgcgtattc
gaccgtccgt 60agtggttcgt taaagcgttg cagcacaacg catacaacaa tcaatcggcc
attgagttcg 120cacgctcatg cagttgcgaa tggtcggtgg atgctcgaag ccaatgcaga
aagcgaccgg 180ctcgatgagc tgcaccagca gtatcaccga gatgatcttg gcggtaatca
ggcttgtatc 240tcttgtagtg tgtcggcggc aactgaatac tgaccagagc gcggcaactg
aaaattgacc 300agcttcctgg agagccttgg ctatgggcca aggaggaagc ga
342135153DNARight Transposon End for WP_015288588.1
135aaccaagctg gtcaattttc gattgccgac acctgatcag ttttcggttg ccgttgacat
60agtgcccaaa acacgcaccc acatcagatg cagaacccct tgacaaccaa tagggaatct
120cttcgcatga tggaggttgc tggcaccaat cca
153136431DNALeft Transposon End for WP_067342985.1 136cgcctgttcc
cctggggaac accaaaggcc cgtaccggcc gcggaagctg agatcctccg 60cgcccggtac
gggccttgtc gttgtgcacc accaggttac gctgtccttt gaagccaagt 120tagtccggtg
cgatgtcctg acctggagcg actaggttgg atgtcaaagg acattcctct 180caactgccgc
gggtgttccc caggggaaca cctcgcgcca ctgtccctgg acgaatcagg 240caggcagcgg
gtcaaacggg aacaccctca agaccaccac gatcagcgac cattaaagcc 300gtgctgtggt
gccgcggccg tggactctcg gcgtcgctcg aatcctgacc ctctgacggc 360gtatcaaatc
tgacccactt ggtgatcttc atctgaccct ggccttggtg atcaaggtca 420ggagggaaga g
431137198DNARight
Transposon End for WP_067342985.1 137gcaaccgcag cagacaccac ctgggtcagg
attcaaccga ccaaagtggg tcagagttca 60gccgccgccg acatggaccg cggagatgcc
cctgcggacg cgctgctcgc cgccgggttc 120accttgctgc tgtggtgccg tggtcgtgga
tcgcggagat gccctgcgcg tggcaggaac 180agtcgaggac aagatcga
198