Patent application title: USING SPLIT DEAMINASES TO LIMIT UNWANTED OFF-TARGET BASE EDITOR DEAMINATION
Inventors:
IPC8 Class: AC12N1510FI
USPC Class:
1 1
Class name:
Publication date: 2020-06-04
Patent application number: 20200172895
Abstract:
Described herein are methods and compositions for improving the
genome-wide specificities of targeted base editing technologies. Herein,
we describe dimeric base editing (BE) technologies that use split
deaminases (sDA) that are functional when brought into close proximity to
each other, one fused to a ZF and one to an nCas9-UGI protein comprising
one or more UGIs, so as to limit the ability of the deaminase domain to
deaminate off-target ssDNA sites independent of nCas9 R-loop
formation. Thus, provided herein are fusion proteins comprising: (i) a
first portion of a split deaminase ("sDA1") enzyme fused to a
programmable DNA-binding domain; or (ii) a second portion of a split
deaminase ("sDA2") fused to an nCas9 protein. The present invention also
includes the vectors and cells comprising the vectors, as well as kits
comprising the proteins and nucleic acids described herein.
Claims:
1. A fusion protein comprising: (i) a first portion of a split deaminase
(sDA1) enzyme fused to a programmable DNA-binding domain, selected from
the group consisting of zinc fingers (ZFs), transcription
activator-like effectors (TALEs), Clustered Regularly Interspaced
Short Palindromic Repeats (CRISPR) Cas RNA-guided nucleases (RGNs),
catalytically inactive Cas9 (dCas9), and nicking Cas9 (nCas9), wherein the
sDA1 is an N-terminal truncated, catalytically inactive or deficient
derivative of a parental deaminase selected from the group consisting of
hAID, rAPOBEC1, mAPOBEC3, hAPOBEC3A, hAPOBEC3B, hAPOBEC3C, hAPOBEC3F,
hAPOBEC3G, hAPOBEC3H, and variants thereof, and optionally one or more
uracil glycosylase inhibitor (UGI) sequences; or (ii) a second portion of
a split deaminase (sDA2) fused to nCas9, and one or more uracil
glycosylase inhibitor (UGI) proteins, or any orthogonal DNA targeting
domain as the one used for its complementary sDA1 portion, wherein the
sDA2 is a C-terminal truncated, catalytically inactive or deficient
derivative of hAID, rAPOBEC1*, mAPOBEC3, hAPOBEC3A, hAPOBEC3B, hAPOBEC3C,
hAPOBEC3F, hAPOBEC3G, hAPOBEC3H, and variants thereof; wherein the
co-expression of the fusion protein of (i) with the fusion protein of
(ii) in eukaryotic cells and their subsequent co-localization at adjacent
genomic target sites provides a catalytically active base editor.
2. A pair of the fusion proteins of claim 1, comprising: (i) a first fusion protein comprising a first portion of a split deaminase (sDA1) enzyme fused to one or more ZFs, wherein the sDA1 is an N-terminal truncated, catalytically inactive or deficient derivative of a parental deaminase selected from the group consisting of hAID, rAPOBEC1, mAPOBEC3, hAPOBEC3A, hAPOBEC3B, hAPOBEC3C, hAPOBEC3F, hAPOBEC3G, or hAPOBEC3H, and variants thereof that have altered substrate specificities or activities and optionally one or more UGI sequences; and (ii) a second fusion protein comprising a second portion of a split deaminase (sDA2) fused to an nCas9 protein and one or more UGI proteins, wherein the sDA2 is a C-terminal truncated, catalytically inactive or deficient derivative of the same parental deaminase as sDA1, wherein the co-expression of the fusion protein of (i) with the fusion protein of (ii) in eukaryotic cells and their subsequent co-localization at adjacent genomic target sites provides a catalytically active base-editor.
3. A nucleic acid encoding a fusion protein of claim 1.
4. A composition comprising one or more nucleic acids, wherein the nucleic acids encode the pair of fusion proteins of claim 2.
5. A method of targeted deamination of one or more selected cytosines in a nucleic acid, the method comprising contacting the nucleic acid with the pair of fusion proteins of claim 2, and one or more gRNAs that interact with Cas9 domains in the fusion proteins, preferably wherein one of the fusion proteins comprises nCas9, the other fusion protein comprises ZF or TALE, the ZF or TALE is targeted to a sequence of 9-24 bp adjacent to the target site of the gRNA for the nCas9, wherein the gRNA binds to the nucleic acid comprising the selected cytosine.
6. The method of claim 5, wherein the nucleic acid is in a cell, and the method comprises contacting the cell with the fusion proteins or expressing the fusion proteins in the cell.
7. The method of claim 6, wherein the cell is a eukaryotic cell.
8. A method of improving specificity of targeted deamination in a cell, the method comprising expressing in the cell, or contacting the cell with, the pair of fusion proteins of claim 2, and one or more gRNAs that interact with Cas9 domains in the fusion proteins, preferably wherein one of the fusion proteins comprises nCas9, the other fusion protein comprises ZF or TALE, the ZF or TALE is targeted to a sequence of 9-24 bp adjacent to the target site of the gRNA for the nCas9, wherein the gRNA binds to the nucleic acid comprising the selected cytosine.
9. The method of claim 5, wherein the fusion protein is delivered as a ribonucleoprotein (RNP) complex with one or more gRNAs that interact with Cas9 domains in the fusion proteins, mRNA, or plasmid.
10. A method of deaminating one or more selected cytosines in a nucleic acid, the method comprising contacting the nucleic acid with the pair of fusion proteins of claim 2.
11. A composition comprising a fusion protein of part (i) of claim 1; a fusion protein of part (ii) of claim 1; or a fusion protein of part (i) of claim 1 and a fusion protein of part (ii) of claim 1.
12. The composition of claim 11, comprising one or more ribonucleoprotein (RNP) complexes.
13. A vector comprising the nucleic acid of claim 3.
14. An isolated host cell comprising the nucleic acid of claim 13.
15. The host cell of claim 14, which is a stem cell.
16. The host cell of claim 15, wherein the stem cell is a hematopoietic stem cell.
Description:
CLAIM OF PRIORITY
[0001] This application claims the benefit of U.S. Provisional Patent Application Ser. No. 62/511,296, filed on May 25, 2017; Ser. No. 62/541,544, filed on Aug. 4, 2017; and Ser. No. 62/622,676, filed on Jan. 26, 2018. The entire contents of the foregoing are hereby incorporated by reference.
TECHNICAL FIELD
[0003] Described herein are methods and compositions for improving the genome-wide specificities of targeted base editing technologies.
BACKGROUND
[0004] Base editing (BE) technologies use an engineered DNA binding domain (such as RNA-guided, catalytically inactive Cas9 (dead Cas9 or dCas9), a nickase version of Cas9 (nCas9), or zinc finger (ZF) arrays) to recruit a cytosine deaminase domain to a specific genomic location to effect site-specific cytosine→thymine transition substitutions¹,². BEs are a particularly attractive tool for treating genetic diseases that manifest in cellular contexts where precise mutations would be therapeutically beneficial but are difficult to create by homology directed repair (HDR) with traditional nuclease-based genome editing technology. For example, it is challenging or impossible to achieve HDR outcomes in tissues composed primarily of slowly dividing or post-mitotic cell populations, since HDR pathways are restricted to the G2 and S phases of the cell cycle³. In addition, the efficiency of HDR can be substantially limited by the competing and more efficient induction of variable-length indel mutations caused by non-homologous end-joining-mediated repair of nuclease-induced breaks. By contrast, BE technology has the potential to allow practitioners to make highly controllable, highly precise mutations without the need for cell-type-variable DNA repair mechanisms.
SUMMARY
[0005] Base editor (BE) platforms possess the unique capability to generate precise, user-defined genome-editing events without the need for a donor DNA molecule. BEs that include a single-strand nicking CRISPR-Cas9 (nCas9) protein fused to a cytosine deaminase domain and uracil glycosylase inhibitor (UGI) domains (e.g., BE3) efficiently induce cytosine-to-thymine (C-to-T) base transitions in a site-specific manner as determined by the CRISPR guide RNA (gRNA) spacer sequence¹. As with all genome editing reagents, it is critical to first determine and then mitigate a BE's capacity for generating off-target mutations before it is used for therapeutics, so as to limit its potential for creating deleterious and irreversible genetically-encoded side-effects. Herein, we describe dimeric BEs that use split deaminases (sDA) that are functional when brought into close proximity to each other, one fused to a ZF and one to an nCas9-UGI protein comprising one or more UGIs, so as to limit the ability of the deaminase domain to deaminate off-target ssDNA sites independent of nCas9 R-loop formation.
[0006] Thus, provided herein are fusion proteins comprising: (i) a first portion of a split deaminase ("sDA1") enzyme fused to a programmable DNA-binding domain, preferably selected from the group consisting of a ZF, TALE, Cas9, catalytically inactive Cas9 (dCas9) or Cas9 ortholog (i.e., a homologous protein from another species, such as dCpf1), nicking Cas9 (nCas9), and nicking Cas9 ortholog, wherein the sDA1 is an N-terminal truncated, catalytically inactive or deficient derivative of a parental deaminase selected from the group consisting of hAID, rAPOBEC1, mAPOBEC3, hAPOBEC3A, hAPOBEC3B, hAPOBEC3C, hAPOBEC3F, hAPOBEC3G, or hAPOBEC3H, and variants thereof, e.g., variants that have altered substrate specificities or activities such as eA3A; or (ii) a second portion of a split deaminase ("sDA2") fused to an nCas9 protein, preferably an nCas9-UGI protein, e.g., in a manner similar to previously described base editor architectures, or to any DNA targeting domain orthogonal to the one used for its complementary sDA1 portion (e.g., dCpf1, TALE, ZF), wherein the sDA2 is a C-terminal truncated, catalytically inactive or deficient derivative of a parental deaminase selected from the group consisting of hAID, rAPOBEC1*, mAPOBEC3, hAPOBEC3A, hAPOBEC3B, hAPOBEC3C, hAPOBEC3F, hAPOBEC3G, or hAPOBEC3H, and/or variants thereof, e.g., with altered substrate specificities or activities such as eA3A. In the present methods, the split deaminases are not full length proteins, but are fragments thereof, wherein the co-expression of a fusion protein of (i) with a fusion protein of (ii) comprising a sDA1 and sDA2 portion from the same parental deaminase in eukaryotic cells, and their subsequent co-localization at adjacent genomic target sites, provides a catalytically active base-editor. The terms "sDA1" and "sDA2" are used herein to refer to the first and second split deaminases generally, and do not refer specifically to the exemplary split deaminases described herein.
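The co-localization requirement stated in this paragraph can be summarized in a minimal sketch, in which deaminase activity is modeled as present only when an sDA1 half and an sDA2 half from the same parental deaminase are both bound at the target site. The names `Half` and `reconstitutes`, and the treatment of binding as a simple true/false value, are simplifying assumptions made for illustration; they do not describe any software or assay used herein.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Half:
    """One half of a split base editor (hypothetical illustrative model)."""
    portion: str           # "sDA1" or "sDA2"
    parent: str            # parental deaminase, e.g. "rAPOBEC1"
    bound_on_target: bool  # True if its DNA-binding domain sits at the target site


def reconstitutes(a: Half, b: Half) -> bool:
    """Return True only if complementary halves of one parent co-localize."""
    return (
        {a.portion, b.portion} == {"sDA1", "sDA2"}  # one of each half
        and a.parent == b.parent                    # same parental deaminase
        and a.bound_on_target
        and b.bound_on_target
    )


zf_half = Half("sDA1", "rAPOBEC1", True)
ncas9_half = Half("sDA2", "rAPOBEC1", True)
print(reconstitutes(zf_half, ncas9_half))                       # True
print(reconstitutes(zf_half, Half("sDA2", "rAPOBEC1", False)))  # False
```

In this simplified model, a single half bound at an off-target site never satisfies the condition, which mirrors the rationale for the split design.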
[0007] Also provided herein are nucleic acids encoding the fusion proteins described herein, and compositions comprising one or more of those nucleic acids, e.g., wherein the nucleic acids encode a pair of the fusion proteins, e.g., comprising a sDA1 and sDA2 portion from the same parental deaminase. Further, provided herein are vectors comprising the nucleic acids, and isolated host cells comprising and optionally expressing the nucleic acids. In some embodiments, the host cell is a stem cell, e.g., a hematopoietic stem cell.
[0008] In addition, provided herein are methods for targeted deamination of one or more selected cytosines in a nucleic acid. The methods include contacting the nucleic acid with a pair of fusion proteins described herein comprising a sDA1 and sDA2 portion from the same parental deaminase, as well as one or more gRNAs that interact with Cas9 domains in the fusion proteins. In some embodiments, one of the fusion proteins comprises nCas9, the other fusion protein comprises ZF or TALE, and the ZF or TALE is targeted to a sequence of 9-24 bp adjacent to the target site of the gRNA for the nCas9, wherein the gRNA binds to the nucleic acid comprising the selected cytosine.
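The site geometry described in this paragraph can be checked with a short sketch that tests whether a ZF or TALE site falls within the 9-24 bp length range and reports the spacing between that site and the gRNA target site. The function name `check_zf_site`, the 0-based single-strand coordinates, the default 20-bp gRNA target length, and the example coordinates are all assumptions chosen for illustration.

```python
def check_zf_site(zf_start: int, zf_len: int, grna_start: int, grna_len: int = 20):
    """Check a ZF/TALE site against the 9-24 bp range and measure its spacing.

    Coordinates are 0-based positions on one strand (an assumption).
    Returns (length_ok, gap), where gap is the number of bp between the two
    sites, or -1 if the sites overlap.
    """
    length_ok = 9 <= zf_len <= 24
    zf_end = zf_start + zf_len        # exclusive end of the ZF/TALE site
    grna_end = grna_start + grna_len  # exclusive end of the gRNA target site
    if zf_end <= grna_start:          # ZF/TALE site lies upstream
        gap = grna_start - zf_end
    elif grna_end <= zf_start:        # ZF/TALE site lies downstream
        gap = zf_start - grna_end
    else:                             # sites overlap
        gap = -1
    return length_ok, gap


# Hypothetical coordinates: an 18-bp ZF site upstream of a gRNA target site.
print(check_zf_site(zf_start=0, zf_len=18, grna_start=49))  # (True, 31)
```

The sketch does not model strand or binding orientation, which the figures indicate also affects activity.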
[0009] In some embodiments, the nucleic acid is in a cell, e.g., a eukaryotic cell, and the method comprises contacting the cell with the fusion proteins or expressing the fusion proteins in the cell.
[0010] Also provided are methods for improving specificity of targeted deamination in a cell, e.g., a eukaryotic cell, by expressing in the cell, or contacting the cell with, a pair of fusion proteins described herein comprising a sDA1 and sDA2 portion from the same parental deaminase, as well as one or more gRNAs that interact with Cas9 domains in the fusion proteins. In some embodiments, one of the fusion proteins comprises nCas9, the other fusion protein comprises ZF or TALE, the ZF or TALE is targeted to a sequence of 9-24 bp adjacent to the target site of the gRNA for the nCas9, wherein the gRNA binds to the nucleic acid comprising the selected cytosine.
[0011] In some embodiments, the fusion protein is delivered as an RNP, mRNA, or plasmid.
[0012] Also provided herein are methods for deaminating one or more selected cytosines in a nucleic acid, by contacting the nucleic acid with a pair of fusion proteins described herein comprising a sDA1 and sDA2 portion from the same parental deaminase, as well as one or more gRNAs that interact with Cas9 domains in the fusion proteins.
[0013] In addition, provided herein are compositions comprising a purified fusion protein or pair of fusion proteins described herein, preferably a pair of fusion proteins described herein comprising a sDA1 and sDA2 portion from the same parental deaminase, and optionally one or more gRNAs that interact with Cas9 domains in the fusion proteins. In some embodiments, the composition comprises one or more ribonucleoprotein (RNP) complexes.
[0014] Also provided herein are ribonucleoprotein (RNP) complexes that include a variant spCas9 protein as described herein and a guide RNA that targets a sequence having a PAM sequence targeted by the split deaminase fusion protein comprising Cas9 or Cas9 derivative.
[0015] Also provided herein are methods for targeted deamination, or improving specificity of targeted deamination, of a selected cytosine in a nucleic acid, comprising contacting the nucleic acid with one or more of the fusion proteins or base editing systems described herein.
[0016] In some embodiments, the fusion protein is delivered as an RNP, mRNA, or plasmid DNA.
[0017] Also provided herein are methods for deaminating a selected cytosine in a nucleic acid, the method comprising contacting the nucleic acid with a fusion protein or base editing system described herein.
[0018] Additionally, provided herein are compositions comprising a purified fusion protein or base editing system as described herein.
[0019] Further, provided herein are nucleic acids encoding a fusion protein or base editing system described herein, as well as vectors comprising the nucleic acids, and host cells comprising the nucleic acids, e.g., stem cells, e.g., hematopoietic stem cells.
[0020] Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. Methods and materials are described herein for use in the present invention; other, suitable methods and materials known in the art can also be used. The materials, methods, and examples are illustrative only and not intended to be limiting. All publications, patent applications, patents, sequences, database entries, and other references mentioned herein are incorporated by reference in their entirety. In case of conflict, the present specification, including definitions, will control.
[0021] Other features and advantages of the invention will be apparent from the following detailed description and figures, and from the claims.
DESCRIPTION OF DRAWINGS
[0022] FIG. 1. Diagram of an exemplary high-efficiency base editing setup. A nicking Cas9 bearing a catalytically inactivating mutation in one of its two nuclease domains binds to the target site dictated by the variable spacer sequence of the gRNA. The formation of a stable R-loop creates an ssDNA editing window on the non-target strand. The Cas9 creates a single-strand break in the non-deaminated strand of the genomic DNA, prompting the host cell to repair the lesion using the deaminated strand as a template, thus biasing repair towards the cytosine→thymine transition substitution. See Komor et al., 2016.
[0023] FIGS. 2A-2G. Schematic representation of: 2A.) First-generation base editor targeting and deaminating at an on-target site, with a deaminase targeting an R-loop generated by an on-target nCas9. 2B.) First-generation base-editor binding to and deaminating an off-target genomic R-loop independent of its nCas9 targeting capabilities. 2C.) First-generation base-editor binding to and deaminating an off-target genomic transcription bubble independent of its nCas9 targeting capabilities. (Note: 2B and 2C are exemplary cases of genomic ssDNA targets potentially available to BE deamination, but do not constitute an exhaustive list.) 2D.) A split-deaminase (sDA) BE targeting a genomic site with an nCas9-mediated R-loop and adjacent TALE or ZF binding. Note that the two split deaminase portions (sDA1 and sDA2) are brought into close proximity by the adjacent binding, reconstituting their catalytic activity and allowing on-target deamination. 2E and 2F.) Even if one half of a split BE could bind a non-target piece of ssDNA such as a genomic R-loop or transcription bubble through its sDA domain, it would not have enough machinery to reconstitute deaminase enzymatic activity (sDA2-nCas9-UGI half shown). 2G.) Off-target binding of the ZF or nCas9 components of a sDA system (ZF-sDA1 shown) would not result in co-localization of enough machinery to reconstitute deaminase enzymatic activity.
[0024] FIGS. 3A-3B. hAPOBEC3G with representative candidate split sites. Multiple rotational views of the hAPOBEC3G structure are shown. Magenta colored loop regions are candidate split sites selected on the basis of their lack of secondary structure and their distance from the catalytic center. PDB: 3E1U.
[0025] FIG. 4. C-to-T transition mutations in the integrated EGFP gene from a split rAPO1 base editor architecture consisting of adjacently-targeting N-sDA1.1-3AC3L-ZF-C and N-sDA2.1-nCas9-UGI-C proteins in several indicated orientations. Conversion rates at each position are indicated by shaded boxes with overlaid percentage numbers for residues in which significant mutation was observed. Orientation information is depicted, with arrows representing gRNA binding sites (with the arrow pointing in the direction of the PAM) and ZF binding sites (with the arrow indicating the direction of ZF binding in reference to N→C orientation). Approximate editing windows (residues 4-8 in the gRNA target site) are indicated. Experiments were performed in duplicate and sequencing from each sample is shown independently. The sDA1.1 and sDA2.1 pair resulted in significant C-to-T conversion when the ZF binding site was upstream of the gRNA binding site in an in-series orientation with 31 bp in between. EGFP target sequence,
(SEQ ID NO: 1) CTTCTTCAAGTCCGCCATGCCCGAAGGCTACGTCCAGGAGCGCACCATCT.
[0026] FIG. 5. C-to-T transition mutations in the integrated EGFP gene from a split rAPO1 base editor architecture consisting of adjacently-targeting N-sDA1.1-3AC3L-ZF-C and N-sDA2.2-nCas9-UGI-C proteins in several indicated orientations. Conversion rates at each position are indicated by shaded boxes with overlaid percentage numbers for residues in which significant mutation was observed. Orientation information is depicted, with arrows representing gRNA binding sites (with the arrow pointing in the direction of the PAM) and ZF binding sites (with the arrow indicating the direction of ZF binding in reference to N→C orientation). Approximate editing windows (residues 4-8 in the gRNA target site) are indicated. Experiments were performed in duplicate and sequencing from each sample is shown independently. The sDA1.1 and sDA2.2 pair did not stimulate discernable C-to-T conversion in any orientation attempted. EGFP target sequence,
(SEQ ID NO: 1) CTTCTTCAAGTCCGCCATGCCCGAAGGCTACGTCCAGGAGCGCACCATCT.
[0027] FIG. 6. C-to-T transition mutations in the integrated EGFP gene from a split rAPO1 base editor architecture consisting of adjacently-targeting N-sDA1.2-3AC3L-ZF-C and N-sDA2.1-nCas9-UGI-C proteins in several indicated orientations. Conversion rates at each position are indicated by shaded boxes with overlaid percentage numbers for residues in which significant mutation was observed. Orientation information is depicted, with arrows representing gRNA binding sites (with the arrow pointing in the direction of the PAM) and ZF binding sites (with the arrow indicating the direction of ZF binding in reference to N→C orientation). Approximate editing windows (residues 4-8 in the gRNA target site) are indicated. Experiments were performed in duplicate and sequencing from each sample is shown independently. Low-level C-to-T mutations are observed primarily when using gRNA2 with either ZF, with gRNA1 experiments yielding detectable but diminished levels of activity. EGFP target sequence,
(SEQ ID NO: 1) CTTCTTCAAGTCCGCCATGCCCGAAGGCTACGTCCAGGAGCGCACCATCT.
[0028] FIG. 7. C-to-T transition mutations in the integrated EGFP gene from a split rAPO1 base editor architecture consisting of adjacently-targeting N-sDA1.2-3AC3L-ZF-C and N-sDA2.2-nCas9-UGI-C proteins in several indicated orientations. Conversion rates at each position are indicated by shaded boxes with overlaid percentage numbers for residues in which significant mutation was observed. Orientation information is depicted, with arrows representing gRNA binding sites (with the arrow pointing in the direction of the PAM) and ZF binding sites (with the arrow indicating the direction of ZF binding in reference to N→C orientation). Approximate editing windows (residues 4-8 in the gRNA target site) are indicated. Experiments were performed in duplicate and sequencing from each sample is shown independently. Low-level C-to-T mutations are observed primarily when using gRNA2 with either ZF, with gRNA1 experiments yielding detectable but diminished levels of activity. EGFP target sequence,
(SEQ ID NO: 1) CTTCTTCAAGTCCGCCATGCCCGAAGGCTACGTCCAGGAGCGCACCATCT.
[0029] FIG. 8. C-to-T transition mutations in the integrated EGFP gene from a split rAPO1 base editor architecture consisting of adjacently-targeting N-sDA1.2-3AC3L-ZF-C and N-sDA2.3-nCas9-UGI-C proteins in several indicated orientations. Conversion rates at each position are indicated by shaded boxes with overlaid percentage numbers for residues in which significant mutation was observed. Orientation information is depicted, with arrows representing gRNA binding sites (with the arrow pointing in the direction of the PAM) and ZF binding sites (with the arrow indicating the direction of ZF binding in reference to N→C orientation). Approximate editing windows (residues 4-8 in the gRNA target site) are indicated. Experiments were performed in duplicate and sequencing from each sample is shown independently. No significant mutations detected. EGFP target sequence,
(SEQ ID NO: 1) CTTCTTCAAGTCCGCCATGCCCGAAGGCTACGTCCAGGAGCGCACCATCT.
[0030] FIG. 9. C-to-T transition mutations in the integrated EGFP gene from a split rAPO1 base editor architecture consisting of adjacently-targeting N-sDA1.3-3AC3L-ZF-C and N-sDA2.2-nCas9-UGI-C proteins in several indicated orientations. Conversion rates at each position are indicated by shaded boxes with overlaid percentage numbers for residues in which significant mutation was observed. Orientation information is depicted, with arrows representing gRNA binding sites (with the arrow pointing in the direction of the PAM) and ZF binding sites (with the arrow indicating the direction of ZF binding in reference to N→C orientation). Approximate editing windows (residues 4-8 in the gRNA target site) are indicated. Experiments were performed in duplicate and sequencing from each sample is shown independently. No significant mutations detected. EGFP target sequence,
(SEQ ID NO: 1) CTTCTTCAAGTCCGCCATGCCCGAAGGCTACGTCCAGGAGCGCACCATCT.
[0031] FIG. 10. C-to-T transition mutations in the integrated EGFP gene from a split rAPO1 base editor architecture consisting of adjacently-targeting N-sDA1.2-3AC3L-ZF-C and N-sDA2.3-nCas9-UGI-C proteins in several indicated orientations. Conversion rates at each position are indicated by shaded boxes with overlaid percentage numbers for residues in which significant mutation was observed. Orientation information is depicted, with arrows representing gRNA binding sites (with the arrow pointing in the direction of the PAM) and ZF binding sites (with the arrow indicating the direction of ZF binding in reference to N→C orientation). Approximate editing windows (residues 4-8 in the gRNA target site) are indicated. Experiments were performed in duplicate and sequencing from each sample is shown independently. No significant mutations detected. EGFP target sequence,
(SEQ ID NO: 1) CTTCTTCAAGTCCGCCATGCCCGAAGGCTACGTCCAGGAGCGCACCATCT.
[0032] FIG. 11. C-to-T transition mutations in the integrated EGFP gene from a split rAPO1 base editor architecture consisting of adjacently-targeting N-sDA1.3-3AC3L-ZF-C and N-sDA2.4-nCas9-UGI-C proteins in several indicated orientations. Conversion rates at each position are indicated by shaded boxes with overlaid percentage numbers for residues in which significant mutation was observed. Orientation information is depicted, with arrows representing gRNA binding sites (with the arrow pointing in the direction of the PAM) and ZF binding sites (with the arrow indicating the direction of ZF binding in reference to N→C orientation). Approximate editing windows (residues 4-8 in the gRNA target site) are indicated. Experiments were performed in duplicate and sequencing from each sample is shown independently. No significant mutations detected. EGFP target sequence,
(SEQ ID NO: 1) CTTCTTCAAGTCCGCCATGCCCGAAGGCTACGTCCAGGAGCGCACCATCT.
[0033] FIG. 12. C-to-T transition mutations in the integrated EGFP gene from a split rAPO1 base editor architecture consisting of adjacently-targeting N-sDA1.4-3AC3L-ZF-C and N-sDA2.3-nCas9-UGI-C proteins in several indicated orientations. Conversion rates at each position are indicated by shaded boxes with overlaid percentage numbers for residues in which significant mutation was observed. Orientation information is depicted, with arrows representing gRNA binding sites (with the arrow pointing in the direction of the PAM) and ZF binding sites (with the arrow indicating the direction of ZF binding in reference to N→C orientation). Approximate editing windows (residues 4-8 in the gRNA target site) are indicated. Experiments were performed in duplicate and sequencing from each sample is shown independently. No significant mutations detected. EGFP target sequence,
(SEQ ID NO: 1) CTTCTTCAAGTCCGCCATGCCCGAAGGCTACGTCCAGGAGCGCACCATCT.
[0034] FIG. 13. C-to-T transition mutations in the integrated EGFP gene from a split rAPO1 base editor architecture consisting of adjacently-targeting N-sDA1.4-3AC3L-ZF-C and N-sDA2.4-nCas9-UGI-C proteins in several indicated orientations. Conversion rates at each position are indicated by shaded boxes with overlaid percentage numbers for residues in which significant mutation was observed. Orientation information is depicted, with arrows representing gRNA binding sites (with the arrow pointing in the direction of the PAM) and ZF binding sites (with the arrow indicating the direction of ZF binding in reference to N→C orientation). Approximate editing windows (residues 4-8 in the gRNA target site) are indicated. Experiments were performed in duplicate and sequencing from each sample is shown independently. No significant mutations detected. EGFP target sequence,
(SEQ ID NO: 1) CTTCTTCAAGTCCGCCATGCCCGAAGGCTACGTCCAGGAGCGCACCATCT.
[0035] FIG. 14. C-to-T transition mutations in the integrated EGFP gene from a split rAPO1 base editor architecture consisting of adjacently-targeting N-sDA1.5-3AC3L-ZF-C and N-sDA2.4-nCas9-UGI-C proteins in several indicated orientations. Conversion rates at each position are indicated by shaded boxes with overlaid percentage numbers for residues in which significant mutation was observed. Orientation information is depicted, with arrows representing gRNA binding sites (with the arrow pointing in the direction of the PAM) and ZF binding sites (with the arrow indicating the direction of ZF binding in reference to N→C orientation). Approximate editing windows (residues 4-8 in the gRNA target site) are indicated. Experiments were performed in duplicate and sequencing from each sample is shown independently. No significant mutations detected. EGFP target sequence,
(SEQ ID NO: 1) CTTCTTCAAGTCCGCCATGCCCGAAGGCTACGTCCAGGAGCGCACCATCT.
[0036] FIG. 15. C-to-T transition mutations in the integrated EGFP gene from a split rAPO1 base editor architecture consisting of adjacently-targeting N-sDA1.5-3AC3L-ZF-C and N-sDA2.6-nCas9-UGI-C proteins in several indicated orientations. Conversion rates at each position are indicated by shaded boxes with overlaid percentage numbers for residues in which significant mutation was observed. Orientation information is depicted, with arrows representing gRNA binding sites (with the arrow pointing in the direction of the PAM) and ZF binding sites (with the arrow indicating the direction of ZF binding in reference to N→C orientation). Approximate editing windows (residues 4-8 in the gRNA target site) are indicated. Experiments were performed in duplicate and sequencing from each sample is shown independently. No significant mutations detected. EGFP target sequence,
(SEQ ID NO: 1) CTTCTTCAAGTCCGCCATGCCCGAAGGCTACGTCCAGGAGCGCACCATCT.
[0037] FIG. 16. C-to-T transition mutations in the integrated EGFP gene from a split rAPO1 base editor architecture consisting of adjacently-targeting N-sDA1.6-3AC3L-ZF-C and N-sDA2.6-nCas9-UGI-C proteins in several indicated orientations. Conversion rates at each position are indicated by shaded boxes with overlaid percentage numbers for residues in which significant mutation was observed. Orientation information is depicted, with arrows representing gRNA binding sites (with the arrow pointing in the direction of the PAM) and ZF binding sites (with the arrow indicating the direction of ZF binding in reference to N→C orientation). Approximate editing windows (residues 4-8 in the gRNA target site) are indicated. Experiments were performed in duplicate and sequencing from each sample is shown independently. No significant mutations detected. EGFP target sequence,
(SEQ ID NO: 1) CTTCTTCAAGTCCGCCATGCCCGAAGGCTACGTCCAGGAGCGCACCATCT.
[0038] FIG. 17. C-to-T conversion data with first-generation BE3 (described in reference 1) with both gRNAs used in this study. (Note that the coloration gradient of these samples is shaded lighter than in the graphs above and that direct comparison requires evaluation of relative numerical rates.) Orientation information is depicted, with arrows representing gRNA binding sites (with the arrow pointing in the direction of the PAM). EGFP target sequence,
(SEQ ID NO: 1) CTTCTTCAAGTCCGCCATGCCCGAAGGCTACGTCCAGGAGCGCACCATCT.
[0039] FIG. 18. C-to-T conversion rates of individual N-sDA1-ZF-C proteins without an adjacent sDA2-nCas9-UGI. No discernable editing observed. EGFP target sequence,
TABLE-US-00015 (SEQ ID NO: 1) CTTCTTCAAGTCCGCCATGCCCGAAGGCTACGTCCAGGAGCGCACCATC T.
[0040] FIG. 19. C-to-T conversion rates of individual N-sDA2-nCas9-UGI-C proteins without an adjacent N-sDA1-ZF-C. No discernable editing was observed. EGFP target sequence,
TABLE-US-00016 (SEQ ID NO: 1) CTTCTTCAAGTCCGCCATGCCCGAAGGCTACGTCCAGGAGCGCACCATC T.
[0041] FIG. 20. Evidence of C-to-T conversion when using adjacently-targeting N-sDA1.X-NLS-ZF-C and N-sDA2.X-nCas9-UGI-C human APOBEC3A (hA3A) split Base Editors in the indicated orientation. Pointed boxes representing the nCas9 gRNA binding site (gRNA2) and ZF binding site (ZF1) are shown, with the pointed ends indicating the PAM-proximal end of the gRNA and the N→C orientation of the ZF, respectively. Conversion rates at each position are indicated by shaded boxes. Rates of deamination by split BE pairs are around 2.5% per cytosine using the sDA1.6+sDA2.6 configuration and around 1.7% per cytosine for the sDA1.1+sDA2.1 configuration, while a hAPOBEC3A-nCas9-UGI positive control showed 3-4× the on-target activity of the active hA3A halfase pairs. gRNA target region:
TABLE-US-00017 (SEQ ID NO: 2) CATGCCCGAAGGCTACGTCCAG.
[0042] FIGS. 21A-21D. Summary of C-to-T conversion rate of all rAPO1 halfase combination base editors as compared to a benchmark BE3 base editor at an integrated EGFP locus. The sum of total C-to-T editing percentages among three cytosines within or near the target gRNA's approximate editing window is shown, as averaged between two replicates. 21A shows the ZF1+gRNA1 data, 21B shows the ZF1+gRNA2 data, 21C shows the ZF2+gRNA1 data, 21D shows the ZF2+gRNA2 data.
[0043] FIG. 22. Representation of a portion of the EGFP reporter gene and the target sites used for the rAPO1 halfase combination experiments. EGFP target region:
TABLE-US-00018 (SEQ ID NO: 3) GCTTCAGCCGCTACCCCGACCACATGAAGCAGCACGACTTCTTCAAGTCC GCCATGCCCGAAGGCTACGTCCAGGAGCGCACCATCTTCTTCAAGGACG.
DETAILED DESCRIPTION
[0044] In the most efficient BE configuration described to date, a cytosine deaminase (DA) domain and uracil glycosylase inhibitor (UGI; a small bacteriophage protein that inhibits host cell uracil DNA glycosylase (UDG), the enzyme responsible for excising uracil from the genome.sup.1, 4) are both fused to nCas9 (derived from either Streptococcus pyogenes Cas9 (SpCas9) or Staphylococcus aureus Cas9 (SaCas9)). The nCas9 forms an R-loop at a target site specified by its single guide RNA (gRNA) and recognition of an adjacent protospacer adjacent motif (PAM), leaving approximately 4-8 nucleotides of the non-target strand exposed as single stranded DNA (ssDNA) near the PAM-distal end of the R-loop (FIG. 1). This region of the ssDNA is the template that can be deaminated by the ssDNA-specific DA domain to produce a guanosine:uracil (G:U) mismatch, and it defines the editing window. The nCas9 nicks the non-deaminated strand of DNA, biasing conversion of the G:U mismatch to an adenine:thymine (A:T) base pair by directing the cell to repair the nick lesion using the deaminated strand as a template. To date, the deaminase domains described in these fusion proteins have been rat APOBEC1 (rAPO1), an activation-induced cytosine deaminase (AID) derived from lamprey termed CDA (PmCDA), human AID (hAID), a hyperactive form of hAID lacking a nuclear export signal, or an engineered variant of human APOBEC3A (hA3A) termed eA3A.sup.1-2, 5-7, 16. Any of the deaminase domains from these BEs can be used as parental deaminases in the present fusion proteins. BE technology was primarily established using the SpCas9 protein for its nCas9 domain (nSpCas9), but although herein we refer to nCas9, in general any Cas9-like nickase could be used based on any ortholog of the Cas9 protein (including the related Cpf1 enzyme class) to perform this function, unless specifically indicated. 
In addition, a completely enzymatically dead dCas9 (or Cas9-like enzyme) can also be used as the targeting mechanism of a functional BE enzyme.
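As a rough illustration of the editing window described above, the following Python sketch lists the cytosines that fall within positions 4-8 of a protospacer and shows the sequence that would result if every one of them were converted to thymine. The function names, the example protospacer, and the convention that position 1 is the PAM-distal end are assumptions made for illustration only; real editing outcomes depend on the deaminase and sequence context.

```python
def editing_window_cytosines(protospacer, window=(4, 8)):
    """Return 1-based positions of cytosines inside the editing window.

    Assumption: position 1 is the PAM-distal end of the protospacer.
    """
    start, end = window
    return [
        i
        for i, base in enumerate(protospacer.upper(), start=1)
        if start <= i <= end and base == "C"
    ]


def simulate_c_to_t(protospacer, window=(4, 8)):
    """Return the protospacer with every in-window C replaced by T."""
    targets = set(editing_window_cytosines(protospacer, window))
    return "".join(
        "T" if i in targets else base
        for i, base in enumerate(protospacer.upper(), start=1)
    )


# Hypothetical 20-nt protospacer, not taken from the application.
example = "GACCTCAGGATTCAGCTGAA"
print(editing_window_cytosines(example))
print(simulate_c_to_t(example))
```

The sketch treats the window as a hard boundary, whereas the text describes it as approximate.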
[0045] An important consideration for the use of BE in therapeutic settings will be to assess its genome-wide capacity for off-target mutagenesis and to modify the technology to minimize or, ideally, to eliminate the risks of stimulating deleterious off-target mutations. Herein, we describe technological improvements to BEs that can be used to reduce or eliminate potential unwanted BE mutagenesis.
[0046] Using Split Deaminases to Limit Unwanted Off-Target Base Editor Deamination
[0047] Because of AID/APOBEC enzymes' natural ability to bind and deaminate cytosines in genomic DNA and cytosines in RNA, non-specific spurious deamination events are a possibly important source of off-target mutagenesis in the genome and transcriptome from CRISPR Base Editor technology. In theory, even if the BE's nCas9 domain (and any potential dCas9, TALE, and/or ZF domains) are eminently specific, this might do nothing to prevent the natural RNA- and ssDNA-targeting ability of the APOBEC enzyme from non-specifically deaminating globally across the transcriptome or whichever regions of the genome are exposed as ssDNA, such as actively transcribed regions or DNA undergoing replication. In fact, an E. coli-based assay examining deaminases showed that an actively transcribed region could be highly enriched (~7-530-fold) for C→T transition mutations when exposed to various overexpressed mammalian deaminases.sup.4. Further, one group has found that co-expression of PmCda1 and nCas9 as two separate, untethered proteins in yeast cells results in similar levels of deamination at the gRNA-specified target site as when the two components are expressed as direct fusion partners, demonstrating that these proteins are capable of deaminating ssDNA from solution without an affinity tether to the genomic location.sup.5. This concern is especially relevant now that scientists are becoming increasingly aware that R-loops are a more common occurrence in the genomes of eukaryotic cells than previously thought, thus creating many potential steady-state off-target ssDNA substrates where an APOBEC could bind and deaminate.sup.6. 
While it is as yet unproven whether BE overexpression itself can sufficiently stimulate spurious deamination and mutagenesis on a global genomic scale, aberrant and over-active APOBEC deaminase activity is a known driver of tumorigenic mutagenesis.sup.7 and overexpression of at least hAPO3.sup.8-11 has been shown to stimulate genomic cytosine hypermutation. Thus, it stands to reason that limiting the naturally global deaminating activity of over-expressed deaminases such as those in BEs will be important for translating BE technologies into therapeutic applications. Of note, since most BEs include at least one UGI to bias deamination events toward productive C→T mutations, it is possible that global off-target BE activity is even more mutagenic than the effects of aberrant deaminase activity alone during tumorigenesis.
[0048] To impose a stricter requirement for BEs to act on their intended target sequences rather than globally, we created a split BE architecture comprised of two separate proteins consisting of reciprocal deaminase truncation variants fused to adjacently-targeted DNA binding domains. These dimeric BE technologies make use of "split deaminases" (sDAs) that require co-localization (an "AND Gate") of both sDA domains at adjacent DNA sites to function properly. In this scenario, spurious binding events of either "halfase" of the dimeric base-editor will be unlikely to result in productive deamination events, since each component on its own will not contain the full complement of enzymatic machinery necessary to catalyze cytosine deamination (FIGS. 2A-2G).
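The "AND Gate" requirement described above can be sketched as a simple predicate: deamination is modeled as possible only when an sDA1 binding event and an sDA2 binding event occur close to one another. The class, the function, and the `max_gap` spacing threshold are hypothetical names and values chosen for illustration; the application does not specify a numeric spacing rule in this passage.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BindingEvent:
    halfase: str   # "sDA1" or "sDA2"
    position: int  # coordinate of the bound site (hypothetical units: bp)


def deamination_possible(events, max_gap=30):
    """Return True only if an sDA1 and an sDA2 event lie within max_gap bp.

    max_gap is an illustrative assumption, not a value from the application.
    """
    sda1 = [e.position for e in events if e.halfase == "sDA1"]
    sda2 = [e.position for e in events if e.halfase == "sDA2"]
    return any(abs(a - b) <= max_gap for a in sda1 for b in sda2)


on_target = [BindingEvent("sDA1", 100), BindingEvent("sDA2", 120)]
spurious = [BindingEvent("sDA1", 100)]
print(deamination_possible(on_target), deamination_possible(spurious))
```

In this toy model a lone halfase never satisfies the gate, which mirrors the reasoning that each component alone lacks the full enzymatic machinery.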
[0049] To create dimeric BEs, we fused an N-terminal truncation of a split deaminase (sDA1) enzyme to a ZF (though any DNA targeting domain orthogonal to Cas9, such as Cpf1, TALE, ZF, or a dCas9 orthogonal to the nCas9 used to target sDA2, may be suitable) targeted to a ~9-24 bp sequence, and a reciprocal or somewhat overlapping C-terminal truncation of a deaminase fused to an nCas9-UGI fusion protein, such that the N-terminal truncation and the C-terminal truncation together form a functional enzyme. The exemplary BEs were made in a similar orientation to the first-generation BE3 enzyme (sDA2-nCas9-UGI) targeting an adjacent sequence with a ~17-24 bp target site.sup.1. To the best of the inventors' knowledge, there is no record of functional split APOBEC enzymes (or other mammalian deaminases). However, a yeast cytosine deaminase (yCD) has been shown to constitute at least a partially functional enzyme (on cytosine as a metabolite and the pro-drug 5-fluorocytosine, though it was not shown to be able to deaminate DNA) when split and reconstituted by protein dimerization.sup.12, and it serves as a useful template to inform how various APOBEC proteins may be effectively bifurcated. Nevertheless, since the yCD shares little primary sequence homology with mammalian deaminases, and the split yCD was not reported to function on DNA and used protein scaffolds to bring its constituent pieces together, it is not obvious that yCD split deaminases will be directly comparable to those so far described for use in BEs. Therefore, we used APOBEC structural information to determine the unstructured linker regions as potential sites at which to split APOBEC enzymes (FIGS. 3A-3B), since those sites may be less likely to affect overall functionality or folding of the constituent subdomains. 
This split deaminase strategy can be used with wild-type versions of deaminase enzymes, and also any engineered variants that may be described, with the split BE potentially retaining any special features of the engineered deaminases.sup.16.
[0050] This architecture should virtually eliminate the capacity for spurious deamination, since any other DNA binding event by either of the two constituent halfases will lack any enzymatic deaminase activity and will therefore be unable to perturb genomic DNA. In addition, a split BE should generally increase the specificity of editing compared to typical BEs by virtue of the fact that the split BE system requires the binding of a higher number of sequential/adjacent DNA bases, thereby decreasing the off-target effects conferred by off-target binding of either halfase on its own. CRISPR BE architectures are known to induce C-to-T mutations in human cells at some genomic sites that are imperfect matches to their gRNAs.sup.13, and since ZFs are known to bind with some capacity to off-target sites it stands to reason that a ZF-BE architecture would also induce off-target mutagenesis to some capacity.sup.14.
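The argument that requiring more adjacent bases reduces chance binding can be made concrete with a back-of-the-envelope estimate. The sketch below computes the expected number of exact matches to a site of a given length on one strand of a uniformly random genome. The genome size constant, the uniform-base model, and the treatment of two adjacent sites as one combined site are simplifying assumptions for illustration; they ignore mismatch tolerance, spacing flexibility, and real genome composition.

```python
HUMAN_GENOME_BP = 3.1e9  # approximate haploid size; assumed for illustration


def expected_random_sites(site_length_bp, genome_size_bp=HUMAN_GENOME_BP):
    """Expected exact matches on one strand of a uniformly random genome."""
    if site_length_bp < 0:
        raise ValueError("site length must be non-negative")
    return genome_size_bp / 4 ** site_length_bp


# Site lengths are examples drawn from the ranges mentioned in the text.
for label, n in [
    ("ZF site alone", 9),
    ("gRNA site alone", 20),
    ("both sites together", 29),
]:
    print(f"{label} ({n} bp): {expected_random_sites(n):.3g} expected matches")
```

Under these assumptions each added base pair lowers the expected number of chance matches fourfold, which is the intuition behind the specificity argument.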
[0051] It is conceivable and likely that any CRISPR/Cas-based targeting system, including Cas9s from Streptococcus pyogenes or Staphylococcus aureus or Cpf1 proteins from various organisms, could be used in place of the nCas9 portion of the sDA2-nCas9-UGI fusion protein, so long as the targeting mechanism results in specific DNA binding and the creation of an R-loop that exposes ssDNA to action by the reconstituted split deaminase. Table 1 contains a list of representative CRISPR/Cas targeting systems and the residues/mutations therein known to be important for creating nickase and catalytically inactive (dead) mutants. Note that while Cpf1 nickases have yet to be described, catalytically null Cpf1 orthologs may replicate the targeting characteristics of nCas9 such that they could form the basis of a functional sDA2 halfase. In some embodiments, ZF domains are chosen as the DNA binding domain for sDA1 due to their small size, presumed lack of immunogenicity, and because, unlike CRISPR-based targeting systems, they do not create an R-loop upon binding and do not expose additional substrate ssDNA to the deaminase domain. In principle, however, use of any engineered DNA binding domain, such as a CRISPR-based targeting complex or a TALE DNA binding domain, could still result in a functional sDA1 halfase. In the examples shown herein, ZF domains targeting an integrated EGFP gene were used for the sDA1 halfases.sup.15.
Programmable DNA Binding Domain
[0052] The present fusion proteins can include programmable DNA binding domains such as engineered C2H2 zinc-fingers, transcription activator effector-like effectors (TALEs), and Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR) Cas RNA-guided nucleases (RGNs) and their variants, including ssDNA nickases (nCas9) or their analogs and catalytically inactive dead Cas9 (dCas9) and its analogs, and any engineered protospacer-adjacent motif (PAM) variants. A programmable DNA binding domain is one that can be engineered to bind to a selected target sequence.
[0053] CRISPR-Cas Nucleases
[0054] Although herein we refer to nCas9, in general any Cas9-like nickase could be used based on any ortholog of the Cas9 protein (including the related Cpf1 enzyme class), unless specifically indicated.
TABLE-US-00019 TABLE 1 List of Exemplary Cas9 Orthologs
Ortholog                            UniProt Accession Number   Nickase Mutations/Catalytic residues
S. pyogenes Cas9 (SpCas9)           Q99ZW2                     D10A, E762A, H840A, N854A, N863A, D986A.sup.17
S. aureus Cas9 (SaCas9)             J7RUA5                     D10A and N580.sup.18
S. thermophilus Cas9 (St1Cas9)      G3ECR1                     D31A and N891A.sup.19
S. pasteurianus Cas9 (SpaCas9)      F5X275                     D10, H599*
C. jejuni Cas9 (CjCas9)             Q0P897                     D8A, H559A.sup.20
F. novicida Cas9 (FnCas9)           A0Q5Y3                     D11, N995.sup.21
P. lavamentivorans Cas9 (PlCas9)    A7HP89                     D8, H601*
C. lari Cas9 (ClCas9)               G1UFN3                     D7, H567*
F. novicida Cpf1 (FnCpf1)           A0Q7Q2                     D917, E1006, D1255.sup.21
M. bovoculi Cpf1 (MbCpf1)           Sequence given at end      N/A**
A. sp. BV3L6 (AsCpf1)               U2UMQ6                     D908, 993E, Q1226, D1263.sup.22
L. bacterium N2006 (LbCpf1)         A0A182DWE3                 D832A.sup.24
*Predicted based on UniRule annotation on the UniProt database.
**May be determinable based on sequence alignment with other Cpf1 orthologs.
These orthologs, and mutants and variants thereof as known in the art, can be used in any of the fusion proteins described herein. See, e.g., WO 2017/040348 (which describes variants of SaCas9 and SpCas9 with increased specificity) and WO 2016/141224 (which describes variants of SaCas9 and SpCas9 with altered PAM specificity).
[0055] The Cas9 nuclease from S. pyogenes (hereafter simply Cas9) can be guided via simple base pair complementarity between 17-20 nucleotides of an engineered guide RNA (gRNA), e.g., a single guide RNA or crRNA/tracrRNA pair, and the complementary strand of a target genomic DNA sequence of interest that lies next to a protospacer adjacent motif (PAM), e.g., a PAM matching the sequence NGG or NAG (Shen et al., Cell Res (2013); Dicarlo et al., Nucleic Acids Res (2013); Jiang et al., Nat Biotechnol 31, 233-239 (2013); Jinek et al., Elife 2, e00471 (2013); Hwang et al., Nat Biotechnol 31, 227-229 (2013); Cong et al., Science 339, 819-823 (2013); Mali et al., Science 339, 823-826 (2013c); Cho et al., Nat Biotechnol 31, 230-232 (2013); Jinek et al., Science 337, 816-821 (2012)). The engineered CRISPR from Prevotella and Francisella 1 (Cpf1) nuclease can also be used, e.g., as described in Zetsche et al., Cell 163, 759-771 (2015); Schunder et al., Int J Med Microbiol 303, 51-60 (2013); Makarova et al., Nat Rev Microbiol 13, 722-736 (2015); Fagerlund et al., Genome Biol 16, 251 (2015). Unlike SpCas9, Cpf1 requires only a single 42-nt crRNA, which has 23 nt at its 3' end that are complementary to the protospacer of the target DNA sequence (Zetsche et al., 2015). Furthermore, whereas SpCas9 recognizes an NGG PAM sequence that is 3' of the protospacer, AsCpf1 and LbCpf1 recognize TTTN PAMs that are found 5' of the protospacer (Id.).
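The PAM geometry described above (NGG located 3' of the protospacer for SpCas9, TTTN located 5' of the protospacer for AsCpf1 and LbCpf1) can be sketched as a simple sequence scan. The function names and default protospacer lengths below are illustrative assumptions, and the scan considers only the strand supplied, so a practical search would also examine the reverse complement.

```python
def find_spcas9_sites(seq, protospacer_len=20):
    """Return (protospacer, PAM) pairs with an NGG PAM 3' of the protospacer."""
    seq = seq.upper()
    sites = []
    # i is the index of the first PAM base; the protospacer ends just before it.
    for i in range(protospacer_len, len(seq) - 2):
        pam = seq[i:i + 3]
        if pam[1:] == "GG":
            sites.append((seq[i - protospacer_len:i], pam))
    return sites


def find_cpf1_sites(seq, protospacer_len=23):
    """Return (PAM, protospacer) pairs with a TTTN PAM 5' of the protospacer."""
    seq = seq.upper()
    sites = []
    # i is the index of the first PAM base; the protospacer starts 4 nt later.
    for i in range(0, len(seq) - 4 - protospacer_len + 1):
        pam = seq[i:i + 4]
        if pam.startswith("TTT"):
            sites.append((pam, seq[i + 4:i + 4 + protospacer_len]))
    return sites


# Short hypothetical sequences with shortened protospacers, for readability.
print(find_spcas9_sites("ACGTACGTTGG", protospacer_len=8))
print(find_cpf1_sites("TTTAACGTACGT", protospacer_len=8))
```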
[0056] In some embodiments, the present system utilizes a wild type or variant Cas9 protein from S. pyogenes or Staphylococcus aureus, or a wild type Cpf1 protein from Acidaminococcus sp. BV3L6 or Lachnospiraceae bacterium ND2006 either as encoded in bacteria or codon-optimized for expression in mammalian cells and/or modified in its PAM recognition specificity and/or its genome-wide specificity. A number of variants have been described; see, e.g., WO 2016/141224, PCT/US2016/049147, Kleinstiver et al., Nat Biotechnol. 2016 August; 34(8):869-74; Tsai and Joung, Nat Rev Genet. 2016 May; 17(5):300-12; Kleinstiver et al., Nature. 2016 Jan. 28; 529(7587):490-5; Shmakov et al., Mol Cell. 2015 Nov. 5; 60(3):385-97; Kleinstiver et al., Nat Biotechnol. 2015 December; 33(12):1293-1298; Dahlman et al., Nat Biotechnol. 2015 November; 33(11):1159-61; Kleinstiver et al., Nature. 2015 Jul. 23; 523(7561):481-5; Wyvekens et al., Hum Gene Ther. 2015 July; 26(7):425-31; Hwang et al., Methods Mol Biol. 2015; 1311:317-34; Osborn et al., Hum Gene Ther. 2015 February; 26(2):114-26; Konermann et al., Nature. 2015 Jan. 29; 517(7536):583-8; Fu et al., Methods Enzymol. 2014; 546:21-45; and Tsai et al., Nat Biotechnol. 2014 June; 32(6):569-76, inter alia.
[0057] The guide RNA is expressed or present in the cell together with the Cas9 or Cpf1. Either the guide RNA or the nuclease, or both, can be expressed transiently or stably in the cell or introduced as a purified protein or nucleic acid.
[0058] In some embodiments, the Cas9 also includes one of the following mutations, which reduce nuclease activity of the Cas9; e.g., for SpCas9, D10A or H840A (either of which creates a single-strand nickase).
[0059] In some embodiments, the SpCas9 variants also include mutations at one of the following amino acid positions, which destroy the nuclease activity of the Cas9: D10, E762, D839, H983, or D986 and H840 or N863, e.g., D10A/D10N and H840A/H840N/H840Y, to render the nuclease portion of the protein catalytically inactive; substitutions at these positions could be alanine (as they are in Nishimasu et al., Cell 156, 935-949 (2014)), or other residues, e.g., glutamine, asparagine, tyrosine, serine, or aspartate, e.g., E762Q, H983N, H983Y, D986N, N863D, N863S, or N863H (see WO 2014/152432).
[0060] In some embodiments, the Cas9 is fused to one or more Uracil glycosylase inhibitor (UGI) protein sequences; an exemplary UGI sequence is as follows:
TABLE-US-00020 TNLSDIIEKETGKQLVIQESILMLPEEVEEVIGNKPESDILVHTAYDEST DENVMLLTSDAPEYKPWALVIQDSNGENKIKML (SEQ ID NO: 4; Uniprot: P14739).
Typically, the UGIs are at the C-terminus of a BE fusion protein, but could conceivably be at the N-terminus, or between the DNA binding domain and the sDA domain. Linkers as known in the art can be used to separate domains.
[0061] TAL Effector Repeat Arrays
[0062] Transcription activator like effectors (TALEs) of plant pathogenic bacteria in the genus Xanthomonas play important roles in disease, or trigger defense, by binding host DNA and activating effector-specific host genes. Specificity depends on an effector-variable number of imperfect, typically ~33-35 amino acid repeats. Polymorphisms are present primarily at repeat positions 12 and 13, which are referred to herein as the repeat variable-diresidue (RVD). The RVDs of TAL effectors correspond to the nucleotides in their target sites in a direct, linear fashion, one RVD to one nucleotide, with some degeneracy and no apparent context dependence. In some embodiments, the polymorphic region that grants nucleotide specificity may be expressed as a triresidue or triplet.
[0063] Each DNA binding repeat can include a RVD that determines recognition of a base pair in the target DNA sequence, wherein each DNA binding repeat is responsible for recognizing one base pair in the target DNA sequence. In some embodiments, the RVD can comprise one or more of: HA for recognizing C; ND for recognizing C; HI for recognizing C; HN for recognizing G; NA for recognizing G; SN for recognizing G or A; YG for recognizing T; and NK for recognizing G, and one or more of: HD for recognizing C; NG for recognizing T; NI for recognizing A; NN for recognizing G or A; NS for recognizing A or C or G or T; N* for recognizing C or T, wherein * represents a gap in the second position of the RVD; HG for recognizing T; H* for recognizing T, wherein * represents a gap in the second position of the RVD; and IG for recognizing T.
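The one-RVD-to-one-nucleotide correspondence listed above lends itself to a lookup table. The sketch below encodes the RVD specificities exactly as enumerated in the preceding paragraph and checks whether a given array of RVDs is compatible with a target sequence. The dictionary and function names are hypothetical, and the check is a simplification that ignores binding strength and any positional effects.

```python
# RVD specificities as enumerated in the text; "*" denotes a gap at position 2.
RVD_TO_BASES = {
    "HA": "C", "ND": "C", "HI": "C", "HD": "C",
    "HN": "G", "NA": "G", "NK": "G",
    "SN": "GA", "NN": "GA",
    "YG": "T", "NG": "T", "HG": "T", "H*": "T", "IG": "T",
    "NI": "A",
    "NS": "ACGT",
    "N*": "CT",
}


def tale_recognizes(rvds, target):
    """Return True if each RVD permits the base at its position in target."""
    if len(rvds) != len(target):
        return False
    return all(
        base in RVD_TO_BASES[rvd]
        for rvd, base in zip(rvds, target.upper())
    )


# Hypothetical four-repeat array checked against short example targets.
array = ["HD", "NG", "NI", "NN"]
print(tale_recognizes(array, "CTAG"), tale_recognizes(array, "CTAC"))
```

Because some RVDs are degenerate (for example NN accepting G or A), one array can be compatible with more than one target sequence.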
[0064] TALE proteins may be useful in research and biotechnology as targeted chimeric nucleases that can facilitate homologous recombination in genome engineering (e.g., to add or enhance traits useful for biofuels or biorenewables in plants). These proteins also may be useful as, for example, transcription factors, and especially for therapeutic applications requiring a very high level of specificity such as therapeutics against pathogens (e.g., viruses) as non-limiting examples.
[0065] Methods for generating engineered TALE arrays are known in the art, see, e.g., the fast ligation-based automatable solid-phase high-throughput (FLASH) system described in U.S. Ser. No. 61/610,212, and Reyon et al., Nature Biotechnology 30, 460-465 (2012); as well as the methods described in Bogdanove & Voytas, Science 333, 1843-1846 (2011); Bogdanove et al., Curr Opin Plant Biol 13, 394-401 (2010); Scholze & Boch, Curr Opin Microbiol (2011); Boch et al., Science 326, 1509-1512 (2009); Moscou & Bogdanove, Science 326, 1501 (2009); Miller et al., Nat Biotechnol 29, 143-148 (2011); Morbitzer et al., Proc Natl Acad Sci USA 107, 21617-21622 (2010); Morbitzer et al., Nucleic Acids Res 39, 5790-5799 (2011); Zhang et al., Nat Biotechnol 29, 149-153 (2011); Geissler et al., PLoS ONE 6, e19509 (2011); Weber et al., PLoS ONE 6, e19722 (2011); Christian et al., Genetics 186, 757-761 (2010); Li et al., Nucleic Acids Res 39, 359-372 (2011); Mahfouz et al., Proc Natl Acad Sci USA 108, 2623-2628 (2011); Mussolino et al., Nucleic Acids Res (2011); Li et al., Nucleic Acids Res 39, 6315-6325 (2011); Cermak et al., Nucleic Acids Res 39, e82 (2011); Wood et al., Science 333, 307 (2011); Hockemeyer et al., Nat Biotechnol 29, 731-734 (2011); Tesson et al., Nat Biotechnol 29, 695-696 (2011); Sander et al., Nat Biotechnol 29, 697-698 (2011); and Huang et al., Nat Biotechnol 29, 699-700 (2011); all of which are incorporated herein by reference in their entirety.
[0066] Also suitable for use in the present methods are MegaTALs, which are a fusion of a meganuclease with a TAL effector; see, e.g., Boissel et al., Nucl. Acids Res. 42(4):2591-2601 (2014); Boissel and Scharenberg, Methods Mol Biol. 2015; 1239:171-96.
[0067] Zinc Fingers
[0068] Zinc finger (ZF) proteins are DNA-binding proteins that contain one or more zinc fingers, independently folded zinc-containing mini-domains, the structure of which is well known in the art and defined in, for example, Miller et al., 1985, EMBO J., 4:1609; Berg, 1988, Proc. Natl. Acad. Sci. USA, 85:99; Lee et al., 1989, Science. 245:635; and Klug, 1993, Gene, 135:83. Crystal structures of the zinc finger protein Zif268 and its variants bound to DNA show a semi-conserved pattern of interactions, in which typically three amino acids from the alpha-helix of the zinc finger contact three adjacent base pairs or a "subsite" in the DNA (Pavletich et al., 1991, Science, 252:809; Elrod-Erickson et al., 1998, Structure, 6:451). Thus, the crystal structure of Zif268 suggested that zinc finger DNA-binding domains might function in a modular manner with a one-to-one interaction between a zinc finger and a three-base-pair "subsite" in the DNA sequence. In naturally occurring zinc finger transcription factors, multiple zinc fingers are typically linked together in a tandem array to achieve sequence-specific recognition of a contiguous DNA sequence (Klug, 1993, Gene 135:83).
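The modular picture suggested by the Zif268 structures, in which one finger contacts one three-base-pair subsite, can be sketched by dividing a target into consecutive triplets. The function name and example target are hypothetical, and the sketch deliberately says nothing about which finger in an array reads which subsite or about context effects between neighboring fingers.

```python
def zf_subsites(target):
    """Split a target into consecutive 3-bp subsites, one per zinc finger."""
    target = target.upper()
    if len(target) == 0 or len(target) % 3 != 0:
        raise ValueError("target length must be a positive multiple of 3")
    return [target[i:i + 3] for i in range(0, len(target), 3)]


# Hypothetical 9-bp target, corresponding to a three-finger array.
subsites = zf_subsites("GCCATGCCC")
print(subsites, "fingers needed:", len(subsites))
```

Under this simplified model, a 9-24 bp target corresponds to an array of three to eight fingers.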
[0069] Multiple studies have shown that it is possible to artificially engineer the DNA binding characteristics of individual zinc fingers by randomizing the amino acids at the alpha-helical positions involved in DNA binding and using selection methodologies such as phage display to identify desired variants capable of binding to DNA target sites of interest (Rebar et al., 1994, Science, 263:671; Choo et al., 1994 Proc. Natl. Acad. Sci. USA, 91:11163; Jamieson et al., 1994, Biochemistry 33:5689; Wu et al., 1995 Proc. Natl. Acad. Sci. USA, 92: 344). Such recombinant zinc finger proteins can be fused to functional domains, such as transcriptional activators, transcriptional repressors, methylation domains, and nucleases to regulate gene expression, alter DNA methylation, and introduce targeted alterations into genomes of model organisms, plants, and human cells (Carroll, 2008, Gene Ther., 15:1463-68; Cathomen, 2008, Mol. Ther., 16:1200-07; Wu et al., 2007, Cell. Mol. Life Sci., 64:2933-44).
[0070] One existing method for engineering zinc finger arrays, known as "modular assembly," advocates the simple joining together of pre-selected zinc finger modules into arrays (Segal et al., 2003, Biochemistry, 42:2137-48; Beerli et al., 2002, Nat. Biotechnol., 20:135-141; Mandell et al., 2006, Nucleic Acids Res., 34:W516-523; Carroll et al., 2006, Nat. Protoc. 1:1329-41; Liu et al., 2002, J. Biol. Chem., 277:3850-56; Bae et al., 2003, Nat. Biotechnol., 21:275-280; Wright et al., 2006, Nat. Protoc., 1:1637-52). Although straightforward enough to be practiced by any researcher, recent reports have demonstrated a high failure rate for this method, particularly in the context of zinc finger nucleases (Ramirez et al., 2008, Nat. Methods, 5:374-375; Kim et al., 2009, Genome Res. 19:1279-88), a limitation that typically necessitates the construction and cell-based testing of very large numbers of zinc finger proteins for any given target gene (Kim et al., 2009, Genome Res. 19:1279-88).
[0071] Combinatorial selection-based methods that identify zinc finger arrays from randomized libraries have been shown to have higher success rates than modular assembly (Maeder et al., 2008, Mol. Cell, 31:294-301; Joung et al., 2010, Nat. Methods, 7:91-92; Isalan et al., 2001, Nat. Biotechnol., 19:656-660). In preferred embodiments, the zinc finger arrays are described in, or are generated as described in, WO 2011/017293 and WO 2004/099366. Additional suitable zinc finger DBDs are described in U.S. Pat. Nos. 6,511,808, 6,013,453, 6,007,988, and 6,503,717 and U.S. patent application 2002/0160940.
Base Editors
[0072] In some embodiments, the base editor is a deaminase that modifies cytosine DNA bases, e.g., a cytosine deaminase from the apolipoprotein B mRNA-editing enzyme, catalytic polypeptide-like (APOBEC) family of deaminases, including APOBEC1, APOBEC2, APOBEC3A, APOBEC3B, APOBEC3C, APOBEC3D/E, APOBEC3F, APOBEC3G, APOBEC3H, APOBEC4 (see, e.g., Yang et al., J Genet Genomics. 2017 Sep. 20; 44(9):423-437); activation-induced cytosine deaminase (AID), e.g., activation induced cytosine deaminase (AICDA), cytosine deaminase 1 (CDA1), and CDA2, and cytosine deaminase acting on tRNA (CDAT). The following Table 2 provides exemplary sequences; other sequences can also be used.
TABLE-US-00021 TABLE 2 GenBank Accession Nos.
Deaminase       Nucleic Acid                  Amino Acid
hAID/AICDA      NM_020661.3 isoform 1         NP_065712.1 variant 1
                NM_020661.3 isoform 2         NP_065712.1 variant 2
APOBEC1         NM_001644.4 isoform a         NP_001635.2 variant 1
                NM_005889.3 isoform b         NP_005880.2 variant 3
APOBEC2         NM_006789.3                   NP_006780.1
APOBEC3A        NM_145699.3 isoform a         NP_663745.1 variant 1
                NM_001270406.1 isoform b      NP_001257335.1 variant 2
APOBEC3B        NM_004900.4 isoform a         NP_004891.4 variant 1
                NM_001270411.1 isoform b      NP_001257340.1 variant 2
APOBEC3C        NM_014508.2                   NP_055323.2
APOBEC3D/E      NM_152426.3                   NP_689639.2
APOBEC3F        NM_145298.5 isoform a         NP_660341.2 variant 1
                NM_001006666.1 isoform b      NP_001006667.1 variant 2
APOBEC3G        NM_021822.3 (isoform a)       NP_068594.1 (variant 1)
APOBEC3H        NM_001166003.2                NP_001159475.2 (variant SV-200)
APOBEC4         NM_203454.2                   NP_982279.1
CDA1*           NM_127515.4                   NP_179547.1
yCD (FCY1)*     NM_001184159.1                NP_015387.1
*from Saccharomyces cerevisiae S288C
[0073] Exemplary split deaminase regions are shown in Table 3. Each split region listed in Table 3 represents a region of the enzyme either known to be a linker region devoid of secondary structure and positioned away from enzymatically important functions or, where structural information is lacking, predicted to be a linker based on alignment with hAPOBEC3G (* indicates which proteins lack sufficient structural information). Unstructured recognition loops were not included due to their importance in determining substrate binding and specificity. All protein sequences were acquired from uniprot.org. All positional information refers to positions within the full-length protein sequences as described below. The candidate split regions described represent only our best attempt at a priori prediction of which splits will be functional.
TABLE-US-00022 TABLE 3 Split Deaminase Regions
Deaminase     Split Region 1   Split Region 2   Split Region 3   Split Region 4   Split Region 5   Split Region 6
hAID          N51-H56          D69-C75          S85-P86          P102-N103        M129-T140        E153-E163
rAPOBEC1*     H48-H61          Y75-R81          S91-P92          P108-H109        M144-T145        N158-W167
mAPOBEC3*     N66-I70          V87-E93          S103-P104        H120-N121        M156-D157        D170-K180
hAPOBEC3A     N57-H70          Q83-I89          S99-P100         T118-H119        M153-T154        D167-G178
hAPOBEC3C     N57-H66          I79-K85          S95-P96          S112-N113        M148-D149        Y162-K172
hAPOBEC3G     N244-H257        K270-D276        S286-P287        K303-H304        M338-T339        D352-D362
hAPOBEC3H*    N49-H54          K67-C73          S83-P84          D100-H101        M136-G137        D150-Y160
hAPOBEC3F     N240-H249        I262-N268        S278-P279        S295-N296        M331-G332        Y345-K355
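As an illustration of how a Table 3 entry might be applied, the sketch below parses a region label such as S85-P86 into residue positions and cuts a protein sequence at a chosen point inside that region, yielding an N-terminal and a C-terminal fragment. The function names, the `cut_after` parameter, and the placeholder sequence are hypothetical; the sketch does not verify residue identities against a real sequence and does not model the overlapping truncations also contemplated in the text.

```python
import re


def parse_region(region):
    """Parse a Table 3 entry such as 'S85-P86' into 1-based (start, end)."""
    match = re.fullmatch(r"[A-Z](\d+)-[A-Z](\d+)", region)
    if match is None:
        raise ValueError(f"unrecognized region label: {region!r}")
    start, end = int(match.group(1)), int(match.group(2))
    if start > end:
        raise ValueError("region start must not exceed region end")
    return start, end


def split_at(protein, region, cut_after):
    """Cut protein after 1-based residue cut_after, which must lie in region.

    Returns (n_terminal_fragment, c_terminal_fragment).
    """
    start, end = parse_region(region)
    if not (start <= cut_after < end):
        raise ValueError("cut point must fall inside the split region")
    if end > len(protein):
        raise ValueError("region lies outside the protein sequence")
    return protein[:cut_after], protein[cut_after:]


# Placeholder 100-residue sequence; NOT a real deaminase sequence.
toy_protein = "M" + "A" * 99
n_frag, c_frag = split_at(toy_protein, "S85-P86", cut_after=85)
print(len(n_frag), len(c_frag))
```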
[0074] The split deaminase regions can include mutations that may enhance base editing, e.g., when made to the nCas9-UGI portion, e.g., mutations corresponding to W90, R126, or R132 of SEQ ID NO:46, e.g., corresponding to W90Y, R126E, R132E, of SEQ ID NO:46 (see, e.g., Kim et al. "Increasing the Genome-Targeting Scope and Precision of Base Editing with Engineered Cas9-Cytosine Deaminase Fusions." Nature Biotechnology 35(4):371-376 (2017)). Alternatively or in addition, the split deaminase regions can include mutations at positions corresponding to one or more of N57, Y130, or K60 of SEQ ID NO:49, e.g., mutations corresponding to N57G, N57A, N57Q, Y130F, K60D of SEQ ID NO:49 (see, e.g., reference 17).
Variants
[0075] In some embodiments, the components of the fusion proteins are at least 80%, e.g., at least 85%, 90%, 95%, 97%, or 99% identical to the amino acid sequence of an exemplary sequence (e.g., as provided herein), e.g., have up to 1%, 2%, 5%, 10%, 15%, or 20% of the residues of the exemplary sequence replaced, e.g., with conservative mutations, e.g., including or in addition to the mutations described herein. In preferred embodiments, the variant retains desired activity of the parent, e.g., nickase activity, and/or the ability to interact with a guide RNA and/or target DNA, optionally with improved specificity or altered substrate specificity.
[0076] To determine the percent identity of two amino acid or nucleic acid sequences, the sequences are aligned for optimal comparison purposes (e.g., gaps can be introduced in one or both of a first and a second amino acid or nucleic acid sequence for optimal alignment and non-homologous sequences can be disregarded for comparison purposes). The length of a reference sequence aligned for comparison purposes is at least 80% of the length of the reference sequence, and in some embodiments is at least 90% or 100%. The amino acid residues or nucleotides at corresponding positions are then compared. When a position in the first sequence is occupied by the same residue or nucleotide as the corresponding position in the second sequence, then the molecules are identical at that position (as used herein nucleic acid "identity" is equivalent to nucleic acid "homology"). The percent identity between the two sequences is a function of the number of identical positions shared by the sequences, taking into account the number of gaps, and the length of each gap, which need to be introduced for optimal alignment of the two sequences. Percent identity between two polypeptides or nucleic acid sequences is determined in various ways that are within the skill in the art, for instance, using publicly available computer software such as Smith Waterman Alignment (Smith, T. F. and M. S. Waterman (1981) J Mol Biol 147:195-7); "BestFit" (Smith and Waterman, Advances in Applied Mathematics, 482-489 (1981)) as incorporated into GeneMatcher Plus™, Schwartz and Dayhoff (1979) Atlas of Protein Sequence and Structure, Dayhoff, M. O., Ed, pp 353-358; BLAST program (Basic Local Alignment Search Tool; Altschul, S. F., W. Gish, et al. (1990) J Mol Biol 215: 403-10), BLAST-2, BLAST-P, BLAST-N, BLAST-X, WU-BLAST-2, ALIGN, ALIGN-2, CLUSTAL, or Megalign (DNASTAR) software. 
In addition, those skilled in the art can determine appropriate parameters for measuring alignment, including any algorithms needed to achieve maximal alignment over the length of the sequences being compared. In general, for proteins or nucleic acids, the length of comparison can be any length, up to and including full length (e.g., 5%, 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, 95%, or 100%). For purposes of the present compositions and methods, at least 80% of the full length of the sequence is aligned.
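By way of illustration only, the identity calculation described above can be sketched in Python for two sequences that have already been aligned (gaps written as "-"). The function name, the choice to count every alignment column (including gapped columns) in the denominator, and the way the 80% alignment requirement is checked are assumptions made for this sketch; it does not reproduce the output of any of the programs named above.

```python
def percent_identity(aligned_a: str, aligned_b: str, min_coverage: float = 0.8) -> float:
    """Percent identity of two pre-aligned sequences (gaps shown as '-').

    aligned_a is treated as the reference sequence. Gapped columns count
    against identity because they remain in the denominator (an assumption).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        raise ValueError("empty alignment")

    # Number of reference residues, and how many of them face a residue (not a gap).
    ref_len = sum(1 for a in aligned_a if a != "-")
    ref_aligned = sum(1 for a, b in zip(aligned_a, aligned_b) if a != "-" and b != "-")
    if ref_len == 0 or ref_aligned / ref_len < min_coverage:
        raise ValueError("too little of the reference sequence is aligned")

    # Identical, non-gap columns.
    matches = sum(1 for a, b in zip(aligned_a, aligned_b) if a == b and a != "-")
    return 100.0 * matches / len(aligned_a)
```

In this sketch, an alignment in which less than 80% of the reference residues face a residue in the second sequence is rejected rather than scored, mirroring the requirement that at least 80% of the full length of the sequence is aligned.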
[0077] For purposes of the present disclosure, the comparison of sequences and determination of percent identity between two sequences can be accomplished using a BLOSUM62 scoring matrix with a gap penalty of 12, a gap extend penalty of 4, and a frameshift gap penalty of 5.
[0078] Conservative substitutions typically include substitutions within the following groups: glycine, alanine; valine, isoleucine, leucine; aspartic acid, glutamic acid, asparagine, glutamine; serine, threonine; lysine, arginine; and phenylalanine, tyrosine.
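For illustration, the substitution groups listed above can be encoded as sets of one-letter amino acid codes and used to test whether a given replacement is conservative. The names `CONSERVATIVE_GROUPS` and `is_conservative` are hypothetical and used only for this sketch; treating an unchanged residue as "not a substitution" is likewise an assumption.

```python
# One-letter codes for the groups listed in the paragraph above.
CONSERVATIVE_GROUPS = [
    set("GA"),    # glycine, alanine
    set("VIL"),   # valine, isoleucine, leucine
    set("DENQ"),  # aspartic acid, glutamic acid, asparagine, glutamine
    set("ST"),    # serine, threonine
    set("KR"),    # lysine, arginine
    set("FY"),    # phenylalanine, tyrosine
]


def is_conservative(original: str, replacement: str) -> bool:
    """Return True if two different residues belong to the same group."""
    if original == replacement:
        return False  # no substitution took place
    return any(original in group and replacement in group
               for group in CONSERVATIVE_GROUPS)
```

For example, `is_conservative("K", "R")` is true under these groups, whereas `is_conservative("G", "W")` is not.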
[0079] Also provided herein are isolated nucleic acids encoding the split deaminase fusion proteins, vectors comprising the isolated nucleic acids, optionally operably linked to one or more regulatory domains for expressing the variant proteins, and host cells, e.g., mammalian host cells, comprising the nucleic acids, and optionally expressing the variant proteins. In some embodiments, the host cells are stem cells, e.g., hematopoietic stem cells.
[0080] In some embodiments, the fusion proteins include a linker between the DNA binding domain (e.g., ZF, TALE, or nCas9) and the BE domains. Linkers that can be used in these fusion proteins (or between fusion proteins in a concatenated structure) can include any sequence that does not interfere with the function of the fusion proteins. In preferred embodiments, the linkers are short, e.g., 2-20 amino acids, and are typically flexible (i.e., comprising amino acids with a high degree of freedom such as glycine, alanine, and serine). In some embodiments, the linker comprises one or more units consisting of GGGS (SEQ ID NO:5) or GGGGS (SEQ ID NO:6), e.g., two, three, four, or more repeats of the GGGS (SEQ ID NO:5) or GGGGS (SEQ ID NO:6) unit. Other linker sequences can also be used.
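As a minimal sketch of the repeat-unit linkers described above, the following Python builds a linker from a chosen number of GGGS or GGGGS units and checks it against the preferred 2-20 amino acid range. The function names and default values are assumptions made for illustration and are not part of the disclosure.

```python
ALLOWED_UNITS = ("GGGS", "GGGGS")  # SEQ ID NO:5 and SEQ ID NO:6


def build_linker(unit: str = "GGGGS", repeats: int = 3) -> str:
    """Concatenate `repeats` copies of a GGGS or GGGGS unit."""
    if unit not in ALLOWED_UNITS:
        raise ValueError("unit must be GGGS or GGGGS")
    if repeats < 1:
        raise ValueError("at least one repeat is required")
    return unit * repeats


def is_short_linker(linker: str, min_len: int = 2, max_len: int = 20) -> bool:
    """Check a linker against the preferred 2-20 amino acid length range."""
    return min_len <= len(linker) <= max_len
```

Under these assumptions, four GGGGS repeats give a 20-residue linker that sits at the upper end of the preferred range, while five repeats exceed it.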
[0081] In some embodiments, the split deaminase fusion protein includes a cell-penetrating peptide sequence that facilitates delivery to the intracellular space, e.g., HIV-derived TAT peptide, penetratins, transportans, or hCT derived cell-penetrating peptides, see, e.g., Caron et al., (2001) Mol Ther. 3(3):310-8; Langel, Cell-Penetrating Peptides: Processes and Applications (CRC Press, Boca Raton Fla. 2002); El-Andaloussi et al., (2005) Curr Pharm Des. 11(28):3597-611; and Deshayes et al., (2005) Cell Mol Life Sci. 62(16):1839-49.
[0082] Cell penetrating peptides (CPPs) are short peptides that facilitate the movement of a wide range of biomolecules across the cell membrane into the cytoplasm or other organelles, e.g. the mitochondria and the nucleus. Examples of molecules that can be delivered by CPPs include therapeutic drugs, plasmid DNA, oligonucleotides, siRNA, peptide-nucleic acid (PNA), proteins, peptides, nanoparticles, and liposomes. CPPs are generally 30 amino acids or less, are derived from naturally or non-naturally occurring protein or chimeric sequences, and contain either a high relative abundance of positively charged amino acids, e.g. lysine or arginine, or an alternating pattern of polar and non-polar amino acids. CPPs that are commonly used in the art include Tat (Frankel et al., (1988) Cell. 55:1189-1193, Vives et al., (1997) J. Biol. Chem. 272:16010-16017), penetratin (Derossi et al., (1994) J. Biol. Chem. 269:10444-10450), polyarginine peptide sequences (Wender et al., (2000) Proc. Natl. Acad. Sci. USA 97:13003-13008, Futaki et al., (2001) J. Biol. Chem. 276:5836-5840), and transportan (Pooga et al., (1998) Nat. Biotechnol. 16:857-861).
[0083] CPPs can be linked with their cargo through covalent or non-covalent strategies. Methods for covalently joining a CPP and its cargo are known in the art, e.g. chemical cross-linking (Stetsenko et al., (2000) J. Org. Chem. 65:4900-4909, Gait et al. (2003) Cell. Mol. Life. Sci. 60:844-853) or cloning a fusion protein (Nagahara et al., (1998) Nat. Med. 4:1449-1453). Non-covalent coupling between the cargo and short amphipathic CPPs comprising polar and non-polar domains is established through electrostatic and hydrophobic interactions.
[0084] CPPs have been utilized in the art to deliver potentially therapeutic biomolecules into cells. Examples include cyclosporine linked to polyarginine for immunosuppression (Rothbard et al., (2000) Nature Medicine 6(11):1253-1257), siRNA against cyclin B1 linked to a CPP called MPG for inhibiting tumorigenesis (Crombez et al., (2007) Biochem Soc. Trans. 35:44-46), tumor suppressor p53 peptides linked to CPPs to reduce cancer cell growth (Takenobu et al., (2002) Mol. Cancer Ther. 1(12):1043-1049, Snyder et al., (2004) PLoS Biol. 2:E36), and dominant negative forms of Ras or phosphoinositol 3 kinase (PI3K) fused to Tat to treat asthma (Myou et al., (2003) J. Immunol. 171:4399-4405).
[0085] CPPs have been utilized in the art to transport contrast agents into cells for imaging and biosensing applications. For example, green fluorescent protein (GFP) attached to Tat has been used to label cancer cells (Shokolenko et al., (2005) DNA Repair 4(4):511-518). Tat conjugated to quantum dots has been used to successfully cross the blood-brain barrier for visualization of the rat brain (Santra et al., (2005) Chem. Commun. 3144-3146). CPPs have also been combined with magnetic resonance imaging techniques for cell imaging (Liu et al., (2006) Biochem. and Biophys. Res. Comm. 347(1):133-140). See also Ramsey and Flynn, Pharmacol Ther. 2015 Jul. 22. pii: S0163-7258(15)00141-2.
[0086] Alternatively or in addition, the split deaminase fusion proteins can include a nuclear localization sequence, e.g., SV40 large T antigen NLS (PKKKRRV (SEQ ID NO:7)) and nucleoplasmin NLS (KRPAATKKAGQAKKKK (SEQ ID NO: 8)). Other NLSs are known in the art; see, e.g., Cokol et al., EMBO Rep. 2000 Nov. 15; 1(5): 411-415; Freitas and Cunha, Curr Genomics. 2009 December; 10(8): 550-557.
[0087] In some embodiments, the split deaminase fusion proteins include a moiety that has a high affinity for a ligand, for example GST, FLAG or hexahistidine sequences. Such affinity tags can facilitate the purification of recombinant split deaminase fusion proteins.
[0088] The split deaminase fusion proteins described herein can be used for altering the genome of a cell. The methods generally include expressing or contacting the split deaminase fusion proteins in the cells; in versions using one or two Cas9s, the methods include using a guide RNA having a region complementary to a selected portion of the genome of the cell. Methods for selectively altering the genome of a cell are known in the art, see, e.g., U.S. Pat. No. 8,993,233; US 20140186958; U.S. Pat. No. 9,023,649; WO/2014/099744; WO 2014/089290; WO2014/144592; WO144288; WO2014/204578; WO2014/152432; WO2115/099850; U.S. Pat. No. 8,697,359; US20160024529; US20160024524; US20160024523; US20160024510; US20160017366; US20160017301; US20150376652; US20150356239; US20150315576; US20150291965; US20150252358; US20150247150; US20150232883; US20150232882; US20150203872; US20150191744; US20150184139; US20150176064; US20150167000; US20150166969; US20150159175; US20150159174; US20150093473; US20150079681; US20150067922; US20150056629; US20150044772; US20150024500; US20150024499; US20150020223; US20140356867; US20140295557; US20140273235; US20140273226; US20140273037; US20140189896; US20140113376; US20140093941; US20130330778; US20130288251; US20120088676; US20110300538; US20110236530; US20110217739; US20110002889; US20100076057; US20110189776; US20110223638; US20130130248; US20150050699; US20150071899; US20150050699; US20150045546; US20150031134; US20150024500; US20140377868; US20140357530; US20140349400; US20140335620; US20140335063; US20140315985; US20140310830; US20140310828; US20140309487; US20140304853; US20140298547; US20140295556; US20140294773; US20140287938; US20140273234; US20140273232; US20140273231; US20140273230; US20140271987; US20140256046; US20140248702; US20140242702; US20140242700; US20140242699; US20140242664; US20140234972; US20140227787; US20140212869; US20140201857; US20140199767; US20140189896; US20140186958; US20140186919; US20140186843; US20140179770; 
US20140179006; US20140170753; WO/2008/108989; WO/2010/054108; WO/2012/164565; WO/2013/098244; WO/2013/176772; US 20150071899; Makarova et al., "Evolution and classification of the CRISPR-Cas systems" 9(6) Nature Reviews Microbiology 467-477 (1-23) (June 2011); Wiedenheft et al., "RNA-guided genetic silencing systems in bacteria and archaea" 482 Nature 331-338 (Feb. 16, 2012); Gasiunas et al., "Cas9-crRNA ribonucleoprotein complex mediates specific DNA cleavage for adaptive immunity in bacteria" 109(39) Proceedings of the National Academy of Sciences USA E2579-E2586 (Sep. 4, 2012); Jinek et al., "A Programmable Dual-RNA-Guided DNA Endonuclease in Adaptive Bacterial Immunity" 337 Science 816-821 (Aug. 17, 2012); Carroll, "A CRISPR Approach to Gene Targeting" 20(9) Molecular Therapy 1658-1660 (September 2012); U.S. Appl. No. 61/652,086, filed May 25, 2012; Al-Attar et al., Clustered Regularly Interspaced Short Palindromic Repeats (CRISPRs): The Hallmark of an Ingenious Antiviral Defense Mechanism in Prokaryotes, Biol Chem. (2011) vol. 392, Issue 4, pp. 277-289; Hale et al., Essential Features and Rational Design of CRISPR RNAs That Function With the Cas RAMP Module Complex to Cleave RNAs, Molecular Cell, (2012) vol. 45, Issue 3, 292-302.
[0089] For methods in which the split deaminase fusion proteins are delivered to cells, the proteins can be produced using any method known in the art, e.g., by in vitro translation, or expression in a suitable host cell from nucleic acid encoding the split deaminase fusion protein; a number of methods are known in the art for producing proteins. For example, the proteins can be produced in and purified from yeast, E. coli, insect cell lines, plants, transgenic animals, or cultured mammalian cells; see, e.g., Palomares et al., "Production of Recombinant Proteins: Challenges and Solutions," Methods Mol Biol. 2004; 267:15-52. In addition, the split deaminase fusion proteins can be linked to a moiety that facilitates transfer into a cell, e.g., a lipid nanoparticle, optionally with a linker that is cleaved once the protein is inside the cell. See, e.g., LaFountaine et al., Int J Pharm. 2015 Aug. 13; 494(1):180-194.
[0090] Expression Systems
[0091] To use the split deaminase fusion proteins described herein, it may be desirable to express them from a nucleic acid that encodes them. This can be performed in a variety of ways. For example, the nucleic acid encoding the split deaminase fusion can be cloned into an intermediate vector for transformation into prokaryotic or eukaryotic cells for replication and/or expression. Intermediate vectors are typically prokaryote vectors, e.g., plasmids, or shuttle vectors, or insect vectors, for storage or manipulation of the nucleic acid encoding the split deaminase fusion for production of the split deaminase fusion protein. The nucleic acid encoding the split deaminase fusion protein can also be cloned into an expression vector, for administration to a plant cell, animal cell, preferably a mammalian cell or a human cell, fungal cell, bacterial cell, or protozoan cell.
[0092] To obtain expression, a sequence encoding a split deaminase fusion protein is typically subcloned into an expression vector that contains a promoter to direct transcription. Suitable bacterial and eukaryotic promoters are well known in the art and described, e.g., in Sambrook et al., Molecular Cloning, A Laboratory Manual (3d ed. 2001); Kriegler, Gene Transfer and Expression: A Laboratory Manual (1990); and Current Protocols in Molecular Biology (Ausubel et al., eds., 2010). Bacterial expression systems for expressing the engineered protein are available in, e.g., E. coli, Bacillus sp., and Salmonella (Palva et al., 1983, Gene 22:229-235). Kits for such expression systems are commercially available. Eukaryotic expression systems for mammalian cells, yeast, and insect cells are well known in the art and are also commercially available.
[0093] The promoter used to direct expression of a nucleic acid depends on the particular application. For example, a strong constitutive promoter is typically used for expression and purification of fusion proteins. In contrast, when the split deaminase fusion protein is to be administered in vivo for gene regulation, either a constitutive or an inducible promoter can be used, depending on the particular use of the split deaminase fusion protein. In addition, a preferred promoter for administration of the split deaminase fusion protein can be a weak promoter, such as HSV TK or a promoter having similar activity. The promoter can also include elements that are responsive to transactivation, e.g., hypoxia response elements, Gal4 response elements, lac repressor response element, and small molecule control systems such as tetracycline-regulated systems and the RU-486 system (see, e.g., Gossen & Bujard, 1992, Proc. Natl. Acad. Sci. USA, 89:5547; Oligino et al., 1998, Gene Ther., 5:491-496; Wang et al., 1997, Gene Ther., 4:432-441; Neering et al., 1996, Blood, 88:1147-55; and Rendahl et al., 1998, Nat. Biotechnol., 16:757-761).
[0094] In addition to the promoter, the expression vector typically contains a transcription unit or expression cassette that contains all the additional elements required for the expression of the nucleic acid in host cells, either prokaryotic or eukaryotic. A typical expression cassette thus contains a promoter operably linked, e.g., to the nucleic acid sequence encoding the split deaminase fusion protein, and any signals required, e.g., for efficient polyadenylation of the transcript, transcriptional termination, ribosome binding sites, or translation termination. Additional elements of the cassette may include, e.g., enhancers, and heterologous spliced intronic signals.
[0095] The particular expression vector used to transport the genetic information into the cell is selected with regard to the intended use of the split deaminase fusion protein, e.g., expression in plants, animals, bacteria, fungus, protozoa, etc. Standard bacterial expression vectors include plasmids such as pBR322 based plasmids, pSKF, pET23D, and commercially available tag-fusion expression systems such as GST and LacZ.
[0096] Expression vectors containing regulatory elements from eukaryotic viruses are often used in eukaryotic expression vectors, e.g., SV40 vectors, papilloma virus vectors, and vectors derived from Epstein-Barr virus. Other exemplary eukaryotic vectors include pMSG, pAV009/A+, pMTO10/A+, pMAMneo-5, baculovirus pDSVE, and any other vector allowing expression of proteins under the direction of the SV40 early promoter, SV40 late promoter, metallothionein promoter, murine mammary tumor virus promoter, Rous sarcoma virus promoter, polyhedrin promoter, or other promoters shown effective for expression in eukaryotic cells.
[0097] The vectors for expressing the split deaminase fusion protein can include RNA Pol III promoters to drive expression of the guide RNAs, e.g., the H1, U6 or 7SK promoters. These human promoters allow for expression of split deaminase fusion protein in mammalian cells following plasmid transfection.
[0098] Some expression systems have markers for selection of stably transfected cell lines such as thymidine kinase, hygromycin B phosphotransferase, and dihydrofolate reductase. High yield expression systems are also suitable, such as using a baculovirus vector in insect cells, with the gRNA encoding sequence under the direction of the polyhedrin promoter or other strong baculovirus promoters.
[0099] The elements that are typically included in expression vectors also include a replicon that functions in E. coli, a gene encoding antibiotic resistance to permit selection of bacteria that harbor recombinant plasmids, and unique restriction sites in nonessential regions of the plasmid to allow insertion of recombinant sequences.
[0100] Standard transfection methods are used to produce bacterial, mammalian, yeast or insect cell lines that express large quantities of protein, which are then purified using standard techniques (see, e.g., Colley et al., 1989, J. Biol. Chem., 264:17619-22; Guide to Protein Purification, in Methods in Enzymology, vol. 182 (Deutscher, ed., 1990)). Transformation of eukaryotic and prokaryotic cells is performed according to standard techniques (see, e.g., Morrison, 1977, J. Bacteriol. 132:349-351; Clark-Curtiss & Curtiss, Methods in Enzymology 101:347-362 (Wu et al., eds, 1983)).
[0101] Any of the known procedures for introducing foreign nucleotide sequences into host cells may be used. These include the use of calcium phosphate transfection, polybrene, protoplast fusion, electroporation, nucleofection, liposomes, microinjection, naked DNA, plasmid vectors, viral vectors, both episomal and integrative, and any of the other well-known methods for introducing cloned genomic DNA, cDNA, synthetic DNA or other foreign genetic material into a host cell (see, e.g., Sambrook et al., supra). It is only necessary that the particular genetic engineering procedure used be capable of successfully introducing at least one gene into the host cell capable of expressing the split deaminase fusion protein.
[0102] In methods wherein the fusion proteins include a Cas9 domain, the methods also include delivering a gRNA that interacts with the Cas9.
[0103] Alternatively, the methods can include delivering the split deaminase fusion protein and guide RNA together, e.g., as a complex. For example, the split deaminase fusion protein can be overexpressed in a host cell and purified, then complexed with the guide RNA (e.g., in a test tube) to form a ribonucleoprotein (RNP), and delivered to cells. In some embodiments, the split deaminase fusion protein can be expressed in and purified from bacteria through the use of bacterial expression plasmids. For example, His-tagged split deaminase fusion protein can be expressed in bacterial cells and then purified using nickel affinity chromatography. The use of RNPs circumvents the necessity of delivering plasmid DNAs encoding the nuclease or the guide, or encoding the nuclease as an mRNA. RNP delivery may also improve specificity, presumably because the half-life of the RNP is shorter and there is no persistent expression of the nuclease and guide (as would occur with a plasmid). The RNPs can be delivered to the cells in vivo or in vitro, e.g., using lipid-mediated transfection or electroporation. See, e.g., Liang et al. "Rapid and highly efficient mammalian cell engineering via Cas9 protein transfection." Journal of biotechnology 208 (2015): 44-53; Zuris, John A., et al. "Cationic lipid-mediated delivery of proteins enables efficient protein-based genome editing in vitro and in vivo." Nature biotechnology 33.1 (2015): 73-80; Kim et al. "Highly efficient RNA-guided genome editing in human cells via delivery of purified Cas9 ribonucleoproteins." Genome research 24.6 (2014): 1012-1019.
[0104] The present invention also includes the vectors and cells comprising the vectors, as well as kits comprising the proteins and nucleic acids described herein, e.g., for use in a method described herein.
EXAMPLES
[0105] The invention is further described in the following examples, which do not limit the scope of the invention described in the claims.
[0106] Materials and Methods
[0107] The following materials and methods were used in the Examples below.
[0108] Molecular Cloning
[0109] sDA1-containing expression plasmids were constructed by selectively amplifying desired regions of the rAPO1, hA3A, or BE3 genes, as well as DNA sequences encoding a 3AC3L-NLS or NLS only linker and desired EGFP-targeting ZFs, by the PCR method such that they had significant overlapping ends and using isothermal assembly (or "Gibson Assembly," NEB) to assemble them in the desired order in a pCAG expression vector. sDA2-containing expression plasmids were constructed by truncating a BE3 gene by PCR and using Gibson assembly to put the resulting pieces into a pCAG expression plasmid. PCR was conducted using Q5 or Phusion polymerases (NEB).
[0110] Cell Culture and Transfections
[0111] A HEK293 cell line carrying an integrated EGFP reporter gene (unpublished) was grown in culture using media consisting of Advanced Dulbecco's Modified Medium (Gibco) supplemented with 10% heat-inactivated fetal bovine serum (Gibco), 1% 10,000 U/ml penicillin-streptomycin solution (Gibco), and 1% Glutamax (Gibco). Cells were passaged every 3-4 days to maintain an actively growing population and avoid anoxic conditions. Transfections containing 1.0 microgram of transfection-quality DNA (Qiagen Maxi- or Miniprep) were conducted by seeding 1.5×10^5 cells in 24-well TC-treated plates (Corning) and using TransIT-293 reagent according to the manufacturer's protocol (Mirus Bio). For split deaminase experiments: of the 1.0 microgram of DNA transfected, 400 nanograms contained the sDA1-encoding plasmid, 400 nanograms contained the sDA2-encoding plasmid, and 200 nanograms contained an expression plasmid encoding the SpCas9 gRNA targeting the EGFP reporter gene. For BE control experiments: 400 nanograms contained BE-expressing plasmid, 400 nanograms contained a pMax-GFP-encoding plasmid (Lonza), and 200 nanograms contained an expression plasmid encoding the SpCas9 gRNA targeting the EGFP reporter gene. For individual halfase controls: 400 nanograms contained the sDA-encoding plasmid, 400 nanograms contained a pMax-GFP-encoding plasmid (Lonza), and 200 nanograms contained an expression plasmid encoding the SpCas9 gRNA targeting the EGFP reporter gene. Genomic DNA was harvested 3 days post-transfection using the DNAdvance kit (Agencourt).
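The three plasmid mixes described above can be written down as a small table and checked for consistency with the stated 1.0 microgram total. This Python sketch is illustrative only; the dictionary keys and the function name are labels chosen here, not terms from the protocol.

```python
TOTAL_NG = 1000  # 1.0 microgram of DNA per transfection

# Nanograms of each plasmid per transfection, as stated in the text.
MIXES = {
    "split_deaminase": {"sDA1 plasmid": 400, "sDA2 plasmid": 400, "gRNA plasmid": 200},
    "BE_control": {"BE plasmid": 400, "pMax-GFP plasmid": 400, "gRNA plasmid": 200},
    "single_half_control": {"sDA plasmid": 400, "pMax-GFP plasmid": 400, "gRNA plasmid": 200},
}


def check_mixes(mixes: dict, total_ng: int = TOTAL_NG) -> dict:
    """Map each mix name to True if its plasmid amounts sum to total_ng."""
    return {name: sum(parts.values()) == total_ng for name, parts in mixes.items()}
```

In each control mix, 400 nanograms of the pMax-GFP plasmid occupies the place of the omitted plasmid, so every condition receives the same total amount of DNA.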
[0112] High-Throughput Amplicon Sequencing
[0113] Rates of base editing at target loci were determined by deep-sequencing of PCR amplicons amplified off of genomic DNA isolated from transfected cells. Target site genomic DNA was amplified using EGFP-specific DNA primers flanking the sDA2 nCas9 binding sites. Illumina TruSeq adapters were added to the ends of the amplicons either by PCR or NEBNext Ultra II kit (NEB) and molecularly indexed with NEBNext Dual Index Primers (NEB). Samples were combined into libraries and sequenced on the Illumina MiSeq machine using the MiSeq Reagent Micro Kit v2 (Illumina). Sequencing results were analyzed using a batch version of the software CRISPResso (crispresso.rocks).
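As a simplified illustration of how a base editing rate can be derived from amplicon reads, the Python sketch below computes the fraction of reads carrying a T at a reference C position. It assumes reads that are already aligned to the amplicon without gaps, so that read index equals reference index; this is an assumption for the sketch and is not the method implemented by CRISPResso. The function name is hypothetical.

```python
from typing import Sequence


def c_to_t_fraction(reference: str, reads: Sequence[str], position: int) -> float:
    """Fraction of informative reads with T at a reference C (0-based position).

    Reads are assumed to be gapless and in register with the reference.
    Reads that are too short or carry a non-ACGT call (e.g. N) at the
    position are ignored.
    """
    if reference[position] != "C":
        raise ValueError("reference base at this position is not C")
    calls = [read[position] for read in reads
             if len(read) > position and read[position] in "ACGT"]
    if not calls:
        return 0.0
    return calls.count("T") / len(calls)
```

Reads with an ambiguous call at the position are dropped from the denominator here; a different treatment would change the reported rate.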
[0114] gRNA and ZF Target Sequences
TABLE-US-00023
ZF1 Binding Site: (SEQ ID NO: 9) aGAAGATGGTg
ZF2 Binding Site: (SEQ ID NO: 10) gGTCGGGGTAg
gRNA1 Binding Site (with PAM): (SEQ ID NO: 11) TTCAAGTCCGCCATGCCCGAAGG
gRNA2 Binding Site (with PAM): (SEQ ID NO: 12) CATGCCCGAAGGCTACGTCCAGG
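For illustration, the two gRNA binding sites listed above can be split into a protospacer and a 3' PAM and checked against the NGG PAM recognized by SpCas9. The helper names below are hypothetical, and the 20-nucleotide protospacer length is an assumption based on the 23-nucleotide sites shown.

```python
import re

# gRNA binding sites (with PAM) as listed above.
SITES = {
    "gRNA1": "TTCAAGTCCGCCATGCCCGAAGG",
    "gRNA2": "CATGCCCGAAGGCTACGTCCAGG",
}


def split_site(site: str, pam_len: int = 3) -> tuple:
    """Return (protospacer, PAM) for a site written 5'->3' with a 3' PAM."""
    return site[:-pam_len], site[-pam_len:]


def has_ngg_pam(site: str) -> bool:
    """True if the site is a 20-nt protospacer followed by an NGG PAM."""
    protospacer, pam = split_site(site)
    return len(protospacer) == 20 and re.fullmatch(r"[ACGT]GG", pam) is not None
```

Both listed sites pass this check, each ending in an AGG PAM.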
[0115] Relevant Protein Sequences
[0116] In the following sequences, "X" indicates an undetermined amino acid residue, indicating the variable regions of a ZF that are responsible for specific DNA binding.
TABLE-US-00024 3AC3L-NLS Linker (SEQ ID NO: 13) SSGNSNANSRGPSFSSGLVPLSLRGSHGSPKKKRKVGS NLS Linker (SEQ ID NO: 14) GSPKKKRKVGS N-rAPOBEC1 sDA1.1-3AC3L-NLS-ZF-C (SEQ ID NO: 15) MSSETGPVAVDPTLRRRIEPHEFEVFFDPRELRKETCLLYEINWGGRHS ISSGNSNANSRGPSFSSGLVPLSLRGSHGSPKKKRKVGSSRPGERPFQCRICMR NFSXXXXLXXHTRTHTGEKPFQCRICMRNFSXXXXLXXHLRTHTGEKPFQCR ICMRNFSXXXXLXXHLKTHLRGSSAQ N-rAPOBEC1 sDA1.2-3AC3L-NLS-ZF-C (SEQ ID NO: 16) MSSETGPVAVDPTLRRRIEPHEFEVFFDPRELRKETCLLYEINWGGRHS IWRHTSQNTNKHVEVNFIEKFTTERYFCPSSGNSNANSRGPSFSSGLVPLSLRG SHGSPKKKRKVGSSRPGERPFQCRICMRNFSXXXXLXXHTRTHTGEKPFQCRI CMRNFSXXXXLXXHLRTHTGEKPFQCRTCMRNFSXXXXLXXHLKTHLRGSS AQ N-rAPOBEC1 sDA1.3-3AC3L-NLS-ZF-C (SEQ ID NO: 17) MSSETGPVAVDPTLRRRIEPHEFEVFFDPRELRKETCLLYEINWGGRHS IWRHTSQNTNKHVEVNFIEKFTTERYFCPNTRCSITWFLSWSSSGNSNANSRG PSFSSGLVPLSLRGSHGSPKKKRKVGSSRPGERPFQCRICMRNFSXXXXLXXH TRTHTGEKPFQCRICMRNFSXXXXLXXHLRTHTGEKPFQCRICMRNFSXXXX LXXEILKTHLRGSSAQ N-rAPOBEC1 sDA1.4-3AC3L-NLS-ZF-C (SEQ ID NO: 18) MSSETGPVAVDPTLRRRIEPHEFEVFFDPRELRKETCLLYEINWGGRHS IWRHTSQNTNKHVEVNFIEKFTTERYFCPNTRCSITWFLSWSPCGECSRAITEF LSRYPSSGNSNANSRGPSFSSGLVPLSLRGSHGSPKKKRKVGSSRPGERPFQCR ICMRNFSXXXXLXXHTRTHTGEKPFQCRTCMRNFSXXXXLXXHLRTHTGEKP FQCRICMRNFSXXXXLXXHLKTHLRGSSAQ N-rAPOBEC1 sDA1.5-3AC3L-NLS-ZF-C (SEQ ID NO: 19) MSSETGPVAVDPTLRRRIEPHEFEVFFDPRELRKETCLLYEINWGGRHS IWRHTSQNTNKHVEVNFIEKFTTERYFCPNTRCSITWFLSWSPCGECSRAITEF LSRYPHVTLFIYIARLYHHADPRNRQGLRDLISSGVTIQIMSSGNSNANSRGPS FSSGLVPLSLRGSHGSPKKKRKVGSSRPGERPFQCRICMRNFSXXXXLXXHTR THTGEKPFQCRTCMRNFSXXXXLXXHLRTHTGEKPFQCRICMRNFSXXXXLX XHLKTHLRGSSAQ N-rAPOBEC1 sDA1.6-3AC3L-NLS-ZF-C (SEQ ID NO: 20) MSSETGPVAVDPTLRRRIEPHEFEVFFDPRELRKETCLLYEINWGGRHS IWRHTSQNTNKHVEVNFIEKFTTERYFCPNTRCSITWFLSWSPCGECSRAITEF LSRYPHVTLFIYIARLYHHADPRNRQGLRDLISSGVTIQIMTEQESGYCWRNF VNYSSSGNSNANSRGPSFSSGLVPLSLRGSHGSPKKKRKVGSSRPGERPFQCRI CMRNFSXXXXLXXHTRTHTGEKPFQCRTCMRNFSXXXXLXXHLRTHTGEKP FQCRICMRNFSXXXXLXXHLKTHLRGSSAQ N-rAPOBEC1 sDA2.1-nCas9-UGI-C (SEQ ID NO: 21) MWRHTSQNTNKHVEVNFIEKFTTERYFCPNTRCSITWFLSWSPCGECS 
RAITEFLSRYPHVTLFIYIARLYHHADPRNRQGLRDLISSGVTIQIMTEQESGYC WRNFVNYSPSNEAHWPRYPHLWVRLYVLELYCIILGLPPCLNILRRKQPQLTF FTIALQSCHYQRLPPHILWATGLKSGSETPGTSESATPESDKKYSIGLAIGTNSV GWAVITDEYKVPSKKFKVLGNTDRHSIKKNLIGALLFDSGETAEATRLKRTA RRRYTRRKNRICYLQEIFSNEMAKVDDSFFHRLEESFLVEEDKKHERHPIFGNI VDEVAYHEKYPTIYHLRKKLVDSTDKADLRLIYLALAHMIKFRGHFLIEGDL NPDNSDVDKLFIQLVQTYNQLFEENPINASGVDAKAILSARLSKSRRLENLIAQ LPGEKKNGLFGNLIALSLGLTPNFKSNFDLAEDAKLQLSKDTYDDDLDNLLA QIGDQYADLFLAAKNLSDAILLSDILRVNTEITKAPLSASMIKRYDEHHQDLTL LKALVRQQLPEKYKEIFFDQSKNGYAGYIDGGASQEEFYKFIKPILEKMDGTE ELLVKLNREDLLRKQRTFDNGSIPHQIHLGELHAILRRQEDFYPFLKDNREKIE KILTFRIPYYVGPLARGNSRFAWMTRKSEETITPWNFEEVVDKGASAQSFIER MTNFDKNLPNEKVLPKHSLLYEYFTVYNELTKVKYVTEGMRKPAFLSGEQK KAIVDLLFKTNRKVTVKQLKEDYFKKIECFDSVEISGVEDRFNASLGTYHDLL KIIKDKDFLDNEENEDILEDIVLTLTLFEDREMIEERLKTYAHLFDDKVMKQL KRRRYTGWGRLSRKLINGIRDKQSGKTILDFLKSDGFANRNFMQLIHDDSLTF KEDIQKAQVSGQGDSLHEHIANLAGSPAIKKGILQTVKVVDELVKVMGRHKP ENIVIEMARENQTTQKGQKNSRERMKRIEEGIKELGSQILKEHPVENTQLQNE KLYLYYLQNGRDMYVDQELDINRLSDYDVDHIVPQSFLKDDSIDNKVLTRSD KNRGKSDNVPSEEVVKKMKNYWRQLLNAKLITQRKFDNLTKAERGGLSELD KAGFIKRQLVETRQITKHVAQILDSRMNTKYDENDKLIREVKVITLKSKLVSD FRKDFQFYKVREINNYHHAHDAYLNAVVGTALIKKYPKLESEFVYGDYKVY DVRKMIAKSEQEIGKATAKYFFYSNIMNFFKTEITLANGEIRKRPLIETNGETG EIVWDKGRDFATVRKVLSMPQVNIVKKTEVQTGGFSKESILPKRNSDKLIARK KDWDPKKYGGFDSPTVAYSVLVVAKVEKGKSKKLKSVKELLGITIMERSSFE KNPIDFLEAKGYKEVKKDLIIKLPKYSLFELENGRKRMLASAGELQKGNELAL PSKYVNFLYLASHYEKLKGSPEDNEQKQLFVEQHKHYLDEIIEQISEFSKRVIL ADANLDKVLSAYNKHRDKPIREQAENIIHLFTLTNLGAPAAFKYFDTTIDRKR YTSTKEVLDATLIHQSITGLYETRIDLSQLGGDSGGSTNLSDIIEKETGKQLVIQ ESILMLPEEVEEVIGNKPESDILVHTAYDESTDENVMLLTSDAPEYKPWALVI QDSNGENKIKMLSGGSPKKKRKV N-rAPOBEC1 sDA2.2-nCas9-UGI-C (SEQ ID NO: 22) MNTRCSITWFLSWSPCGECSRAITEFLSRYPHVTLFIYIARLYHHADPR NRQGLRDLISSGVTIQIMTEQESGYCWRNFVNYSPSNEAHWPRYPHLWVRLY VLELYCIILGLPPCLNILRRKQPQLTFFTIALQSCHYQRLPPHILWATGLKSGSE TPGTSESATPESDKKYSIGLAIGTNSVGWAVITDEYKVPSKKFKVLGNTDRHSI KKNLIGALLFDSGETAEATRLKRTARRRYTRRKNRICYLQEIFSNEMAKVDDS 
FFHRLEESFLVEEDKKHERHPIFGNIVDEVAYHEKYPTIYHLRKKLVDSTDKA DLRLIYLALAHMIKFRGHFLIEGDLNPDNSDVDKLFIQLVQTYNQLFEENPINA SGVDAKAILSARLSKSRRLENLIAQLPGEKKNGLFGNLIALSLGLTPNFKSNFD LAEDAKLQLSKDTYDDDLDNLLAQIGDQYADLFLAAKNLSDAILLSDILRVN TEITKAPLSASMIKRYDEHHQDLTLLKALVRQQLPEKYKEIFFDQSKNGYAGY IDGGASQEEFYKFIKPILEKMDGTEELLVKLNREDLLRKQRTFDNGSIPHQIHL GELHAILRRQEDFYPFLKDNREKIEKILTFRIPYYVGPLARGNSRFAWMTRKS EETITPWNFEEVVDKGASAQSFIERMTNFDKNLPNEKVLPKHSLLYEYFTVYN ELTKVKYVTEGMRKPAFLSGEQKKAIVDLLFKTNRKVTVKQLKEDYFKKIEC FDSVEISGVEDRFNASLGTYHDLLKIIKDKDFLDNEENEDILEDIVLTLTLFEDR EMIEERLKTYAHLFDDKVMKQLKRRRYTGWGRLSRKLINGIRDKQSGKTILD FLKSDGFANRNFMQLIHDDSLTFKEDIQKAQVSGQGDSLHEHIANLAGSPAIK KGILQTVKVVDELVKVMGRHKPENIVIEMARENQTTQKGQKNSRERMKRIEE GIKELGSQILKEHPVENTQLQNEKLYLYYLQNGRDMYVDQELDINRLSDYDV DHIVPQSFLKDDSIDNKVLTRSDKNRGKSDNVPSEEVVKKMKNYWRQLLNA KLITQRKFDNLTKAERGGLSELDKAGFIKRQLVETRQITKHVAQILDSRMNTK YDENDKLIREVKVITLKSKLVSDFRKDFQFYKVREINNYHHAHDAYLNAVVG TALIKKYPKLESEFVYGDYKVYDVRKMIAKSEQEIGKATAKYFFYSNIMNFF KTEITLANGEIRKRPLIETNGETGEIVWDKGRDFATVRKVLSMPQVNIVKKTE VQTGGFSKESILPKRNSDKLIARKKDWDPKKYGGFDSPTVAYSVLVVAKVEK GKSKKLKSVKELLGITIMERSSFEKNPIDFLEAKGYKEVKKDLIIKLPKYSLFEL ENGRKRMLASAGELQKGNELALPSKYVNFLYLASHYEKLKGSPEDNEQKQL FVEQHKHYLDEIIEQISEFSKRVILADANLDKVLSAYNKHRDKPIREQAENIIH LFTLTNLGAPAAFKYFDTTIDRKRYTSTKEVLDATLIHQSITGLYETRIDLSQL GGDSGGSTNLSDIIEKETGKQLVIQESILMLPEEVEEVIGNKPESDILVHTAYDE STDENVMLLTSDAPEYKPWALVIQDSNGENKIKMLSGGSPKKKRKV N-rAPOBEC1 sDA2.3-nCas9-UGI-C (SEQ ID NO: 23) MPCGECSRAITEFLSRYPHVTLFIYIARLYHHADPRNRQGLRDLISSGV TIQIMTEQESGYCWRNFVNYSPSNEAHWPRYPHLWVRLYVLELYCIILGLPPC LNILRRKQPQLTFFTIALQSCHYQRLPPHILWATGLKSGSETPGTSESATPESD KKYSIGLAIGTNSVGWAVITDEYKVPSKKFKVLGNTDRHSIKKNLIGALLFDS GETAEATRLKRTARRRYTRRKNRICYLQEIFSNEMAKVDDSFFHRLEESFLVE EDKKHERHPIFGNIVDEVAYHEKYPTIYHLRKKLVDSTDKADLRLIYLALAH MIKFRGHFLIEGDLNPDNSDVDKLFIQLVQTYNQLFEENPINASGVDAKAILSA RLSKSRRLENLIAQLPGEKKNGLFGNLIALSLGLTPNFKSNFDLAEDAKLQLS KDTYDDDLDNLLAQIGDQYADLFLAAKNLSDAILLSDILRVNTEITKAPLSAS MIKRYDEHHQDLTLLKALVRQQLPEKYKEIFFDQSKNGYAGYIDGGASQEEF 
YKFIKPILEKMDGTEELLVKLNREDLLRKQRTFDNGSIPHQIHLGELHAILRRQ EDFYPFLKDNREKIEKILTFRIPYYVGPLARGNSRFAWMTRKSEETITPWNFEE VVDKGASAQSFIERMTNFDKNLPNEKVLPKHSLLYEYFTVYNELTKVKYVTE GMRKPAFLSGEQKKAIVDLLFKTNRKVTVKQLKEDYFKKIECFDSVEISGVED RFNASLGTYHDLLKIIKDKDFLDNEENEDILEDIVLTLTLFEDREMIEERLKTY AHLFDDKVMKQLKRRRYTGWGRLSRKLINGIRDKQSGKTILDFLKSDGFANR NFMQLIHDDSLTFKEDIQKAQVSGQGDSLHEHIANLAGSPAIKKGILQTVKVV DELVKVMGRHKPENIVIEMARENQTTQKGQKNSRERMKRIEEGIKELGSQIL
KEHPVENTQLQNEKLYLYYLQNGRDMYVDQELDINRLSDYDVDHIVPQSFL KDDSIDNKVLTRSDKNRGKSDNVPSEEVVKKMKNYWRQLLNAKLITQRKFD NLTKAERGGLSELDKAGFIKRQLVETRQITKHVAQILDSRMNTKYDENDKLIR EVKVITLKSKLVSDFRKDFQFYKVREINNYHHAHDAYLNAVVGTALIKKYPK LESEFVYGDYKVYDVRKMIAKSEQEIGKATAKYFFYSNIMNFFKTETTLANGE IRKRPLIETNGETGEIVWDKGRDFATVRKVLSMPQVNIVKKTEVQTGGFSKES ILPKRNSDKLIARKKDWDPKKYGGFDSPTVAYSVLVVAKVEKGKSKKLKSV KELLGITIVIERSSFEKNPIDFLEAKGYKEVKKDLIIKLPKYSLFELENGRKRML ASAGELQKGNELALPSKYVNFLYLASHYEKLKGSPEDNEQKQLFVEQHKHY LDEIIEQISEFSKRVILADANLDKVLSAYNKHRDKPIREQAENIIHLFTLTNLGA PAAFKYFDTTIDRKRYTSTKEVLDATLIHQSITGLYETRIDLSQLGGDSGGSTN LSDIIEKETGKQLVIQESILMLPEEVEEVIGNKPESDILVHTAYDESTDENVMLL TSDAPEYKPWALVIQDSNGENKIKMLSGGSPKKKRKV N-rAPOBEC1 sDA2.4-nCas9-UGI-C (SEQ ID NO: 24) MHVTLFIYIARLYHHADPRNRQGLRDLISSGVTIQIMTEQESGYCWRN FVNYSPSNEAHWPRYPHLWVRLYVLELYCIILGLPPCLNILRRKQPQLTFFTIA LQSCHYQRLPPHILWATGLKSGSETPGTSESATPESDKKYSIGLAIGTNSVGW AVITDEYKVPSKKFKVLGNTDRHSIKKNLIGALLFDSGETAEATRLKRTARRR YTRRKNRICYLQEIFSNEMAKVDDSFFHRLEESFLVEEDKKHERHPIFGNIVDE VAYHEKYPTIYHLRKKLVDSTDKADLRLIYLALAHMIKFRGHFLIEGDLNPDN SDVDKLFIQLVQTYNQLFEENPINASGVDAKAILSARLSKSRRLENLIAQLPGE KKNGLFGNLIALSLGLTPNFKSNFDLAEDAKLQLSKDTYDDDLDNLLAQIGD QYADLFLAAKNLSDAILLSDILRVNTEITKAPLSASMIKRYDEHHQDLTLLKA LVRQQLPEKYKEIFFDQSKNGYAGYIDGGASQEEFYKFIKPILEKMDGTEELL VKLNREDLLRKQRTFDNGSIPHQIHLGELHAILRRQEDFYPFLKDNREKIEKIL TFRIPYYVGPLARGNSRFAWMTRKSEETITPWNFEEVVDKGASAQSFIERMTN FDKNLPNEKVLPKHSLLYEYFTVYNELTKVKYVTEGMRKPAFLSGEQKKAIV DLLFKTNRKVTVKQLKEDYFKKIECFDSVEISGVEDRFNASLGTYHDLLKIIK DKDFLDNEENEDILEDIVLTLTLFEDREMIEERLKTYAHLFDDKVMKQLKRRR YTGWGRLSRKLINGIRDKQSGKTILDFLKSDGFANRNFMQLIHDDSLTFKEDI QKAQVSGQGDSLHEHIANLAGSPAIKKGILQTVKVVDELVKVMGRHKPENIV IEMARENQTTQKGQKNSRERMKRIEEGIKELGSQILKEHPVENTQLQNEKLYL YYLQNGRDMYVDQELDINRLSDYDVDHIVPQSFLKDDSIDNKVLTRSDKNRG KSDNVPSEEVVKKMKNYWRQLLNAKLITQRKFDNLTKAERGGLSELDKAGF IKRQLVETRQITKHVAQILDSRMNTKYDENDKLIREVKVITLKSKLVSDFRKD FQFYKVREINNYHHAHDAYLNAVVGTALIKKYPKLESEFVYGDYKVYDVRK MIAKSEQEIGKATAKYFFYSNIMNFFKTEITLANGEIRKRPLIETNGETGEIVW 
DKGRDFATVRKVLSMPQVNIVKKTEVQTGGFSKESILPKRNSDKLIARKKDW DPKKYGGFDSPTVAYSVLVVAKVEKGKSKKLKSVKELLGITIMERSSFEKNPI DFLEAKGYKEVKKDLIIKLPKYSLFELENGRKRMLASAGELQKGNELALPSK YVNFLYLASHYEKLKGSPEDNEQKQLFVEQHKHYLDEIIEQISEFSKRVILAD ANLDKVLSAYNKHRDKPIREQAENIIHLFTLTNLGAPAAFKYFDTTIDRKRYT STKEVLDATLIHQSITGLYETRIDLSQLGGDSGGSTNLSDIIEKETGKQLVIQESI LMLPEEVEEVIGNKPESDILVHTAYDESTDENVMLLTSDAPEYKPWALVIQDS NGENKIKMLSGGSPKKKRKV N-rAPOBEC1 sDA2.5-nCas9-UGI-C (SEQ ID NO: 25) MTEQESGYCWRNFVNYSPSNEAHWPRYPHLWVRLYVLELYCIILGLP PCLNILRRKQPQLTFFTIALQSCHYQRLPPHILWATGLKSGSETPGTSESATPES DKKYSIGLAIGTNSVGWAVITDEYKVPSKKFKVLGNTDRHSIKKNLIGALLFD SGETAEATRLKRTARRRYTRRKNRICYLQEIFSNEMAKVDDSFFHRLEESFLV EEDKKHERHPIFGNIVDEVAYHEKYPTIYHLRKKLVDSTDKADLRLIYLALAH MIKFRGHFLIEGDLNPDNSDVDKLFIQLVQTYNQLFEENPINASGVDAKAILSA RLSKSRRLENLIAQLPGEKKNGLFGNLIALSLGLTPNFKSNFDLAEDAKLQLS KDTYDDDLDNLLAQIGDQYADLFLAAKNLSDAILLSDILRVNTEITKAPLSAS MIKRYDEHHQDLTLLKALVRQQLPEKYKEIFFDQSKNGYAGYIDGGASQEEF YKFIKPILEKMDGTEELLVKLNREDLLRKQRTFDNGSIPHQIHLGELHAILRRQ EDFYPFLKDNREKIEKILTFRIPYYVGPLARGNSRFAWMTRKSEETITPWNFEE VVDKGASAQSFIERMTNFDKNLPNEKVLPKHSLLYEYFTVYNELTKVKYVTE GMRKPAFLSGEQKKAIVDLLFKTNRKVTVKQLKEDYFKKIECFDSVEISGVED RFNASLGTYHDLLKIIKDKDFLDNEENEDILEDIVLTLTLFEDREMIEERLKTY AHLFDDKVMKQLKRRRYTGWGRLSRKLINGIRDKQSGKTILDFLKSDGFANR NFMQLIHDDSLTFKEDIQKAQVSGQGDSLHEHIANLAGSPAIKKGILQTVKVV DELVKVMGRHKPENIVIEMARENQTTQKGQKNSRERMKRIEEGIKELGSQIL KEHPVENTQLQNEKLYLYYLQNGRDMYVDQELDINRLSDYDVDHIVPQSFL KDDSIDNKVLTRSDKNRGKSDNVPSEEVVKKMKNYWRQLLNAKLITQRKFD NLTKAERGGLSELDKAGFIKRQLVETRQITKHVAQILDSRMNTKYDENDKLIR EVKVITLKSKLVSDFRKDFQFYKVREINNYHHAHDAYLNAVVGTALIKKYPK LESEFVYGDYKVYDVRKMIAKSEQEIGKATAKYFFYSNIMNFFKTETTLANGE IRKRPLIETNGETGEIVWDKGRDFATVRKVLSMPQVNIVKKTEVQTGGFSKES ILPKRNSDKLIARKKDWDPKKYGGFDSPTVAYSVLVVAKVEKGKSKKLKSV KELLGITIMERSSFEKNPIDFLEAKGYKEVKKDLIIKLPKYSLFELENGRKRML ASAGELQKGNELALPSKYVNFLYLASHYEKLKGSPEDNEQKQLFVEQHKHY LDEIIEQISEFSKRVILADANLDKVLSAYNKHRDKPIREQAENIIHLFTLTNLGA PAAFKYFDTTIDRKRYTSTKEVLDATLIHQSITGLYETRIDLSQLGGDSGGSTN 
LSDIIEKETGKQLVIQESILMLPEEVEEVIGNKPESDILVHTAYDESTDENVMLL TSDAPEYKPWALVIQDSNGENKIKMLSGGSPKKKRKV N-rAPOBEC1 sDA2.6-nCas9-UGI-C (SEQ ID NO: 26) MPSNEAHWPRYPHLWVRLYVLELYCIILGLPPCLNILRRKQPQLTFFTI ALQSCHYQRLPPHILWATGLKSGSETPGTSESATPESDKKYSIGLAIGTNSVG WAVITDEYKVPSKKFKVLGNTDRHSIKKNLIGALLFDSGETAEATRLKRTAR RRYTRRKNRICYLQEIFSNEMAKVDDSFFHRLEESFLVEEDKKHERHPIFGNIV DEVAYHEKYPTIYHLRKKLVDSTDKADLRLIYLALAHMIKFRGHFLIEGDLNP DNSDVDKLFIQLVQTYNQLFEENPINASGVDAKAILSARLSKSRRLENLIAQLP GEKKNGLFGNLIALSLGLTPNFKSNFDLAEDAKLQLSKDTYDDDLDNLLAQI GDQYADLFLAAKNLSDAILLSDILRVNTEITKAPLSASMIKRYDEHHQDLTLL KALVRQQLPEKYKEIFFDQSKNGYAGYIDGGASQEEFYKFIKPILEKMDGTEE LLVKLNREDLLRKQRTFDNGSIPHQIHLGELHAILRRQEDFYPFLKDNREKIEK ILTFRIPYYVGPLARGNSRFAWMTRKSEETITPWNFEEVVDKGASAQSFIERM TNFDKNLPNEKVLPKHSLLYEYFTVYNELTKVKYVTEGMRKPAFLSGEQKK AIVDLLFKTNRKVTVKQLKEDYFKKIECFDSVEISGVEDRFNASLGTYHDLLK IIKDKDFLDNEENEDILEDIVLTLTLFEDREMIEERLKTYAHLFDDKVMKQLKR RRYTGWGRLSRKLINGIRDKQSGKTILDFLKSDGFANRNFMQLIHDDSLTFKE DIQKAQVSGQGDSLHEHIANLAGSPAIKKGILQTVKVVDELVKVMGRHKPEN IVIEMARENQTTQKGQKNSRERMKRIEEGIKELGSQILKEHPVENTQLQNEKL YLYYLQNGRDMYVDQELDINRLSDYDVDHIVPQSFLKDDSIDNKVLTRSDKN RGKSDNVPSEEVVKKMKNYWRQLLNAKLITQRKFDNLTKAERGGLSELDKA GFIKRQLVETRQITKHVAQILDSRMNTKYDENDKLIREVKVITLKSKLVSDFR KDFQFYKVREINNYHHAHDAYLNAVVGTALIKKYPKLESEFVYGDYKVYDV RKMIAKSEQEIGKATAKYFFYSNIMNFFKTEITLANGEIRKRPLIETNGETGEIV WDKGRDFATVRKVLSMPQVNIVKKTEVQTGGFSKESILPKRNSDKLIARKKD WDPKKYGGFDSPTVAYSVLVVAKVEKGKSKKLKSVKELLGITIMERSSFEKN PIDFLEAKGYKEVKKDLIIKLPKYSLFELENGRKRMLASAGELQKGNELALPS KYVNFLYLASHYEKLKGSPEDNEQKQLFVEQHKHYLDEIIEQISEFSKRVILA DANLDKVLSAYNKHRDKPIREQAENIIHLFTLTNLGAPAAFKYFDTTIDRKRY TSTKEVLDATLIHQSITGLYETRIDLSQLGGDSGGSTNLSDIIEKETGKQLVIQE SILMLPEEVEEVIGNKPESDILVHTAYDESTDENVMLLTSDAPEYKPWALVIQ DSNGENKIKMLSGGSPKKKRKV N-hAPOBEC3A sDA1.1-NLS-ZF-C (SEQ ID NO: 27) MEASPASGPRHLMDPHIFTSNFNNGIGRHKTYLCYEVERLDNGTSVK MDQHRGFLHNQAKNLLGSPKKKRKVGSSRPGERPFQCRICMRNFSXXXXLX XHTRTHTGEKPFQCRICMRNFSXXXXLXXHLRTHTGEKPFQCRICMRNFSXX XXLXXEILKTHLRGSSAQ N-hAPOBEC3A sDA1.2-NLS-ZF-C (SEQ ID NO: 28) 
MEASPASGPRHLMDPHIFTSNFNNGIGRHKTYLCYEVERLDNGTSVK MDQHRGFLHNQAKNLLCGFYGRHAELRFLDLVPSLQLDPGSPKKKRKVGSS RPGERPFQCRICMRNFSXXXXLXXHTRTHTGEKPFQCRICMRNFSXXXXLXX HLRTHTGEKPFQCRICMRNFSXXXXLXXHLKTHLRGSSAQ N-hAPOBEC3A sDA1.3-NLS-ZF-C (SEQ ID NO: 29) MEASPASGPRHLMDPHIFTSNFNNGIGRHKTYLCYEVERLDNGTSVK MDQHRGFLHNQAKNLLCGFYGRHAELRFLDLVPSLQLDPAQIYRVTWFISWS GSPKKKRKVGSSRPGERPFQCRICMRNFSXXXXLXXHTRTHTGEKPFQCRTC MRNFSXXXXLXXHLRTHTGEKPFQCRICMRNFSXXXXLXXHLKTHLRGSSA Q N-hAPOBEC3A sDA1.4-NLS-ZF-C (SEQ ID NO: 30) MEASPASGPRHLMDPHIFTSNFNNGIGRHKTYLCYEVERLDNGTSVK MDQHRGFLHNQAKNLLCGFYGRHAELRFLDLVPSLQLDPAQIYRVTWFISWS
PCFSWGCAGEVRAFLQENTGSPKKKRKVGSSRPGERPFQCRICMRNFSXXXX LXXHTRTHTGEKPFQCRICMRNFSXXXXLXXHLRTHTGEKPFQCRICMRNFS XXXXLXXHLKTHLRGSSAQ N-hAPOBEC3A sDA1.5-NLS-ZF-C (SEQ ID NO: 31) MEASPASGPRHLMDPHIFTSNFNNGIGRHKTYLCYEVERLDNGTSVK MDQHRGFLHNQAKNLLCGFYGRHAELRFLDLVPSLQLDPAQIYRVTWFISWS PCFSWGCAGEVRAFLQENTHVRLRIFAARIYDYDPLYKEALQMLRDAGAQV SIMGSPKKKRKVGSSRPGERPFQCRICMRNFSXXXXLXXHTRTHTGEKPFQC RICMRNFSXXXXLXXHLRTHTGEKPFQCRICMRNFSXXXXLXXHLKTHLRGS SAQ N-hAPOBEC3A sDA1.6-NLS-ZF-C (SEQ ID NO: 32) MEASPASGPRHLMDPHIFTSNFNNGIGRHKTYLCYEVERLDNGTSVK MDQHRGFLHNQAKNLLCGFYGRHAELRFLDLVPSLQLDPAQIYRVTWFISWS PCFSWGCAGEVRAFLQENTHVRLRIFAARIYDYDPLYKEALQMLRDAGAQV SIMTYDEFKHCWDTFVDHQGCPGSPKKKRKVGSSRPGERPFQCRICMRNFSX XXXLXXHTRTHTGEKPFQCRTCMRNFSXXXXLXXHLRTHTGEKPFQCRTCMR NFSXXXXLXXHLKTHLRGSSAQ N-hAPOBEC3A sDA2.1-nCas9-UGI-C (SEQ ID NO: 33) MCGFYGRHAELRFLDLVPSLQLDPAQIYRVTWFISWSPCFSWGCAGE VRAFLQENTHVRLRIFAARIYDYDPLYKEALQMLRDAGAQVSIMTYDEFKHC WDTFVDHQGCPFQPWDGLDEHSQALSGRLRAILQNQGNSGSETPGTSESATP ESDKKYSIGLAIGTNSVGWAVITDEYKVPSKKFKVLGNTDRHSIKKNLIGALL FDSGETAEATRLKRTARRRYTRRKNRICYLQEIFSNEMAKVDDSFFHRLEESF LVEEDKKHERHPIFGNIVDEVAYHEKYPTIYHLRKKLVDSTDKADLRLIYLAL AHMIKFRGHFLIEGDLNPDNSDVDKLFIQLVQTYNQLFEENPINASGVDAKAI LSARLSKSRRLENLIAQLPGEKKNGLFGNLIALSLGLTPNFKSNFDLAEDAKL QLSKDTYDDDLDNLLAQIGDQYADLFLAAKNLSDAILLSDILRVNTEITKAPL SASMIKRYDEHHQDLTLLKALVRQQLPEKYKEIFFDQSKNGYAGYIDGGASQ EEFYKFIKPILEKMDGTEELLVKLNREDLLRKQRTFDNGSIPHQIHLGELHAIL RRQEDFYPFLKDNREKIEKILTFRIPYYVGPLARGNSRFAWMTRKSEETITPW NFEEVVDKGASAQSFIERMTNFDKNLPNEKVLPKHSLLYEYFTVYNELTKVK YVTEGMRKPAFLSGEQKKAIVDLLFKTNRKVTVKQLKEDYFKKIECFDSVEIS GVEDRFNASLGTYHDLLKIIKDKDFLDNEENEDILEDIVLTLTLFEDREMIEER LKTYAHLFDDKVMKQLKRRRYTGWGRLSRKLINGIRDKQSGKTILDFLKSDG FANRNFMQLIHDDSLTFKEDIQKAQVSGQGDSLHEHIANLAGSPAIKKGILQT VKVVDELVKVMGRHKPENIVIEMARENQTTQKGQKNSRERMKRIEEGIKELG SQILKEHPVENTQLQNEKLYLYYLQNGRDMYVDQELDINRLSDYDVDHIVPQ SFLKDDSIDNKVLTRSDKNRGKSDNVPSEEVVKKMKNYWRQLLNAKLITQR KFDNLTKAERGGLSELDKAGFIKRQLVETRQITKHVAQILDSRMNTKYDEND KLIREVKVITLKSKLVSDFRKDFQFYKVREINNYHHAHDAYLNAVVGTALIK 
KYPKLESEFVYGDYKVYDVRKMIAKSEQEIGKATAKYFFYSNIMNFFKTEITL ANGEIRKRPLIETNGETGEIVWDKGRDFATVRKVLSMPQVNIVKKTEVQTGG FSKESILPKRNSDKLIARKKDWDPKKYGGFDSPTVAYSVLVVAKVEKGKSKK LKSVKELLGITIMERSSFEKNPIDFLEAKGYKEVKKDLIIKLPKYSLFELENGRK RMLASAGELQKGNELALPSKYVNFLYLASHYEKLKGSPEDNEQKQLFVEQH KHYLDEIIEQISEFSKRVILADANLDKVLSAYNKHRDKPIREQAENIIHLFTLTN LGAPAAFKYFDTTIDRKRYTSTKEVLDATLIHQSITGLYETRIDLSQLGGDSGG STNLSDIIEKETGKQLVIQESILMLPEEVEEVIGNKPESDILVHTAYDESTDENV MLLTSDAPEYKPWALVIQDSNGENKIKMLSGGSPKKKRKV N-hAPOBEC3A sDA2.2-nCas9-UGI-C (SEQ ID NO: 34) MAQIYRVTWFISWSPCFSWGCAGEVRAFLQENTHVRLRIFAARIYDYD PLYKEALQMLRDAGAQVSIMTYDEFKHCWDTFVDHQGCPFQPWDGLDEHS QALSGRLRAILQNQGNSGSETPGTSESATPESDKKYSIGLAIGTNSVGWAVITD EYKVPSKKFKVLGNTDRHSIKKNLIGALLFDSGETAEATRLKRTARRRYTRR KNRICYLQEIFSNEMAKVDDSFFHRLEESFLVEEDKKHERHPIFGNIVDEVAY HEKYPTIYHLRKKLVDSTDKADLRLIYLALAHMIKFRGHFLIEGDLNPDNSDV DKLFIQLVQTYNQLFEENPINASGVDAKAILSARLSKSRRLENLIAQLPGEKKN GLFGNLIALSLGLTPNFKSNFDLAEDAKLQLSKDTYDDDLDNLLAQIGDQYA DLFLAAKNLSDAILLSDILRVNTEITKAPLSASMIKRYDEHHQDLTLLKALVR QQLPEKYKEIFFDQSKNGYAGYIDGGASQEEFYKFIKPILEKMDGTEELLVKL NREDLLRKQRTFDNGSIPHQIHLGELHAILRRQEDFYPFLKDNREKIEKILTFRI PYYVGPLARGNSRFAWMTRKSEETITPWNFEEVVDKGASAQSFIERMTNFDK NLPNEKVLPKHSLLYEYFTVYNELTKVKYVTEGMRKPAFLSGEQKKAIVDLL FKTNRKVTVKQLKEDYFKKIECFDSVEISGVEDRFNASLGTYHDLLKIIKDKD FLDNEENEDILEDIVLTLTLFEDREMIEERLKTYAHLFDDKVMKQLKRRRYTG WGRLSRKLINGIRDKQSGKTILDFLKSDGFANRNFMQLIHDDSLTFKEDIQKA QVSGQGDSLHEHIANLAGSPAIKKGILQTVKVVDELVKVMGRHKPENIVIEM ARENQTTQKGQKNSRERMKRIEEGIKELGSQILKEHPVENTQLQNEKLYLYY LQNGRDMYVDQELDINRLSDYDVDHIVPQSFLKDDSIDNKVLTRSDKNRGKS DNVPSEEVVKKMKNYWRQLLNAKLITQRKFDNLTKAERGGLSELDKAGFIK RQLVETRQITKHVAQILDSRMNTKYDENDKLIREVKVITLKSKLVSDFRKDFQ FYKVREINNYHHAHDAYLNAVVGTALIKKYPKLESEFVYGDYKVYDVRKMI AKSEQEIGKATAKYFFYSNIMNFFKTEITLANGEIRKRPLIETNGETGEIVWDK GRDFATVRKVLSMPQVNIVKKTEVQTGGFSKESILPKRNSDKLIARKKDWDP KKYGGFDSPTVAYSVLVVAKVEKGKSKKLKSVKELLGITIMERSSFEKNPIDF LEAKGYKEVKKDLIIKLPKYSLFELENGRKRMLASAGELQKGNELALPSKYV NFLYLASHYEKLKGSPEDNEQKQLFVEQHKHYLDEIIEQISEFSKRVILADANL 
DKVLSAYNKHRDKPIREQAENIIHLFTLTNLGAPAAFKYFDTTIDRKRYTSTK EVLDATLIHQSITGLYETRIDLSQLGGDSGGSTNLSDIIEKETGKQLVIQESILM LPEEVEEVIGNKPESDILVHTAYDESTDENVMLLTSDAPEYKPWALVIQDSNG ENKIKMLSGGSPKKKRKV N-hAPOBEC3A sDA2.3-nCas9-UGI-C (SEQ ID NO: 35) MPCFSWGCAGEVRAFLQENTHVRLRIFAARIYDYDPLYKEALQMLRD AGAQVSIMTYDEFKHCWDTFVDHQGCPFQPWDGLDEHSQALSGRLRAILQN QGNSGSETPGTSESATPESDKKYSIGLAIGTNSVGWAVITDEYKVPSKKFKVL GNTDRHSIKKNLIGALLFDSGETAEATRLKRTARRRYTRRKNRICYLQEIFSNE MAKVDDSFFHRLEESFLVEEDKKHERHPIFGNIVDEVAYHEKYPTIYHLRKKL VDSTDKADLRLIYLALAHMIKFRGHFLIEGDLNPDNSDVDKLFIQLVQTYNQL FEENPINASGVDAKAILSARLSKSRRLENLIAQLPGEKKNGLFGNLIALSLGLT PNFKSNFDLAEDAKLQLSKDTYDDDLDNLLAQIGDQYADLFLAAKNLSDAIL LSDILRVNTEITKAPLSASMIKRYDEHHQDLTLLKALVRQQLPEKYKEIFFDQS KNGYAGYIDGGASQEEFYKFIKPILEKMDGTEELLVKLNREDLLRKQRTFDN GSIPHQIHLGELHAILRRQEDFYPFLKDNREKIEKILTFRIPYYVGPLARGNSRF AWMTRKSEETITPWNFEEVVDKGASAQSFIERMTNFDKNLPNEKVLPKHSLL YEYFTVYNELTKVKYVTEGMRKPAFLSGEQKKAIVDLLFKTNRKVTVKQLK EDYFKKIECFDSVEISGVEDRFNASLGTYHDLLKIIKDKDFLDNEENEDILEDIV LTLTLFEDREMIEERLKTYAHLFDDKVMKQLKRRRYTGWGRLSRKLINGIRD KQSGKTILDFLKSDGFANRNFMQLIHDDSLTFKEDIQKAQVSGQGDSLHEHIA NLAGSPAIKKGILQTVKVVDELVKVMGRHKPENIVIEMARENQTTQKGQKNS RERMKRIEEGIKELGSQILKEHPVENTQLQNEKLYLYYLQNGRDMYVDQELD INRLSDYDVDHIVPQSFLKDDSIDNKVLTRSDKNRGKSDNVPSEEVVKKMKN YWRQLLNAKLITQRKFDNLTKAERGGLSELDKAGFIKRQLVETRQITKHVAQI LDSRMNTKYDENDKLIREVKVITLKSKLVSDFRKDFQFYKVREINNYHHAHD AYLNAVVGTALIKKYPKLESEFVYGDYKVYDVRKMIAKSEQEIGKATAKYFF YSNIMNFFKTEITLANGEIRKRPLIETNGETGEIVWDKGRDFATVRKVLSMPQ VNIVKKTEVQTGGFSKESILPKRNSDKLIARKKDWDPKKYGGFDSPTVAYSV LVVAKVEKGKSKKLKSVKELLGITIMERSSFEKNPIDFLEAKGYKEVKKDLIIK LPKYSLFELENGRKRMLASAGELQKGNELALPSKYVNFLYLASHYEKLKGSP EDNEQKQLFVEQHKHYLDEIIEQISEFSKRVILADANLDKVLSAYNKHRDKPI REQAENIIHLFTLTNLGAPAAFKYFDTTIDRKRYTSTKEVLDATLIHQSITGLYE TRIDLSQLGGDSGGSTNLSDIIEKETGKQLVIQESILMLPEEVEEVIGNKPESDIL VHTAYDESTDENVMLLTSDAPEYKPWALVIQDSNGENKIKMLSGGSPKKKR KV N-hAPOBEC3A sDA2.4-nCas9-UGI-C (SEQ ID NO: 36) MHVRLRIFAARIYDYDPLYKEALQMLRDAGAQVSIMTYDEFKHCWD TFVDHQGCPFQPWDGLDEHSQALSGRLRAILQNQGNSGSETPGTSESATPESD 
KKYSIGLAIGTNSVGWAVITDEYKVPSKKFKVLGNTDRHSIKKNLIGALLFDS GETAEATRLKRTARRRYTRRKNRICYLQEIFSNEMAKVDDSFFHRLEESFLVE EDKKHERHPIFGNIVDEVAYHEKYPTIYHLRKKLVDSTDKADLRLIYLALAH MIKFRGHFLIEGDLNPDNSDVDKLFIQLVQTYNQLFEENPINASGVDAKAILSA RLSKSRRLENLIAQLPGEKKNGLFGNLIALSLGLTPNFKSNFDLAEDAKLQLS KDTYDDDLDNLLAQIGDQYADLFLAAKNLSDAILLSDILRVNTEITKAPLSAS MIKRYDEHHQDLTLLKALVRQQLPEKYKEIFFDQSKNGYAGYIDGGASQEEF YKFIKPILEKMDGTEELLVKLNREDLLRKQRTFDNGSIPHQIHLGELHAILRRQ EDFYPFLKDNREKIEKILTFRIPYYVGPLARGNSRFAWMTRKSEETITPWNFEE
VVDKGASAQSFIERMTNFDKNLPNEKVLPKHSLLYEYFTVYNELTKVKYVTE GMRKPAFLSGEQKKAIVDLLFKTNRKVTVKQLKEDYFKKIECFDSVEISGVED RFNASLGTYHDLLKIIKDKDFLDNEENEDILEDIVLTLTLFEDREMIEERLKTY AHLFDDKVMKQLKRRRYTGWGRLSRKLINGIRDKQSGKTILDFLKSDGFANR NFMQLIHDDSLTFKEDIQKAQVSGQGDSLHEHIANLAGSPAIKKGILQTVKVV DELVKVMGRHKPENIVIEMARENQTTQKGQKNSRERMKRIEEGIKELGSQIL KEHPVENTQLQNEKLYLYYLQNGRDMYVDQELDINRLSDYDVDHIVPQSFL KDDSIDNKVLTRSDKNRGKSDNVPSEEVVKKMKNYWRQLLNAKLITQRKFD NLTKAERGGLSELDKAGFIKRQLVETRQITKHVAQILDSRMNTKYDENDKLIR EVKVITLKSKLVSDFRKDFQFYKVREINNYHHAHDAYLNAVVGTALIKKYPK LESEFVYGDYKVYDVRKMIAKSEQEIGKATAKYFFYSNIMNFFKTETTLANGE IRKRPLIETNGETGEIVWDKGRDFATVRKVLSMPQVNIVKKTEVQTGGFSKES ILPKRNSDKLIARKKDWDPKKYGGFDSPTVAYSVLVVAKVEKGKSKKLKSV KELLGITMERSSFEKNPIDFLEAKGYKEVKKDLIIKLPKYSLFELENGRKRML ASAGELQKGNELALPSKYVNFLYLASHYEKLKGSPEDNEQKQLFVEQHKHY LDEIIEQISEFSKRVILADANLDKVLSAYNKHRDKPIREQAENIIHLFTLTNLGA PAAFKYFDTTIDRKRYTSTKEVLDATLIHQSITGLYETRIDLSQLGGDSGGSTN LSDIIEKETGKQLVIQESILMLPEEVEEVIGNKPESDILVHTAYDESTDENVMLL TSDAPEYKPWALVIQDSNGENKIKMLSGGSPKKKRKV N-hAPOBEC3A sDA2.5-nCas9-UGI-C (SEQ ID NO: 37) MTYDEFKHCWDTFVDHQGCPFQPWDGLDEHSQALSGRLRAILQNQG NSGSETPGTSESATPESDKKYSIGLAIGTNSVGWAVITDEYKVPSKKFKVLGN TDRHSIKKNLIGALLFDSGETAEATRLKRTARRRYTRRKNRICYLQEIFSNEM AKVDDSFFHRLEESFLVEEDKKHERHPIFGNIVDEVAYHEKYPTIYHLRKKLV DSTDKADLRLIYLALAHMIKFRGHFLIEGDLNPDNSDVDKLFIQLVQTYNQLF EENPINASGVDAKAILSARLSKSRRLENLIAQLPGEKKNGLFGNLIALSLGLTP NFKSNFDLAEDAKLQLSKDTYDDDLDNLLAQIGDQYADLFLAAKNLSDAILL SDILRVNTEITKAPLSASMIKRYDEHHQDLTLLKALVRQQLPEKYKEIFFDQSK NGYAGYIDGGASQEEFYKFIKPILEKMDGTEELLVKLNREDLLRKQRTFDNGS IPHQIHLGELHAILRRQEDFYPFLKDNREKIEKILTFRIPYYVGPLARGNSRFAW MTRKSEETITPWNFEEVVDKGASAQSFIERMTNFDKNLPNEKVLPKHSLLYE YFTVYNELTKVKYVTEGMRKPAFLSGEQKKAIVDLLFKTNRKVTVKQLKED YFKKIECFDSVEISGVEDRFNASLGTYHDLLKIIKDKDFLDNEENEDILEDIVLT LTLFEDREMIEERLKTYAHLFDDKVMKQLKRRRYTGWGRLSRKLINGIRDKQ SGKTILDFLKSDGFANRNFMQLIHDDSLTFKEDIQKAQVSGQGDSLHEHIANL AGSPAIKKGILQTVKVVDELVKVMGRHKPENIVIEMARENQTTQKGQKNSRE RMKRIEEGIKELGSQILKEHPVENTQLQNEKLYLYYLQNGRDMYVDQELDIN 
RLSDYDVDHIVPQSFLKDDSIDNKVLTRSDKNRGKSDNVPSEEVVKKMKNY WRQLLNAKLITQRKFDNLTKAERGGLSELDKAGFIKRQLVETRQITKHVAQIL DSRMNTKYDENDKLIREVKVITLKSKLVSDFRKDFQFYKVREINNYHHAHDA YLNAVVGTALIKKYPKLESEFVYGDYKVYDVRKMIAKSEQEIGKATAKYFFY SNIMNFFKTEITLANGEIRKRPLIETNGETGEIVWDKGRDFATVRKVLSMPQV NIVKKTEVQTGGFSKESILPKRNSDKLIARKKDWDPKKYGGFDSPTVAYSVL VVAKVEKGKSKKLKSVKELLGITIMERSSFEKNPIDFLEAKGYKEVKKDLIIKL PKYSLFELENGRKRMLASAGELQKGNELALPSKYVNFLYLASHYEKLKGSPE DNEQKQLFVEQHKHYLDEIIEQISEFSKRVILADANLDKVLSAYNKHRDKPIR EQAENIIHLFTLTNLGAPAAFKYFDTTIDRKRYTSTKEVLDATLIHQSITGLYET RIDLSQLGGDSGGSTNLSDIIEKETGKQLVIQESILMLPEEVEEVIGNKPESDIL VHTAYDESTDENVMLLTSDAPEYKPWALVIQDSNGENKIKMLSGGSPKKKR KV N-hAPOBEC3A sDA2.5-nCas9-UGI-C (SEQ ID NO: 38) MFQPWDGLDEHSQALSGRLRAILQNQGNSGSETPGTSESATPESDKKY SIGLAIGTNSVGWAVITDEYKVPSKKFKVLGNTDRHSIKKNLIGALLFDSGET AEATRLKRTARRRYTRRKNRICYLQEIFSNEMAKVDDSFFHRLEESFLVEEDK KHERHPIFGNIVDEVAYHEKYPTIYHLRKKLVDSTDKADLRLIYLALAHMIKF RGHFLIEGDLNPDNSDVDKLFIQLVQTYNQLFEENPINASGVDAKAILSARLS KSRRLENLIAQLPGEKKNGLFGNLIALSLGLTPNFKSNFDLAEDAKLQLSKDT YDDDLDNLLAQIGDQYADLFLAAKNLSDAILLSDILRVNTEITKAPLSASMIK RYDEHHQDLTLLKALVRQQLPEKYKEIFFDQSKNGYAGYIDGGASQEEFYKF IKPILEKMDGTEELLVKLNREDLLRKQRTFDNGSIPHQIHLGELHAILRRQEDF YPFLKDNREKIEKILTFRIPYYVGPLARGNSRFAWMTRKSEETITPWNFEEVV DKGASAQSFIERMTNFDKNLPNEKVLPKHSLLYEYFTVYNELTKVKYVTEGM RKPAFLSGEQKKAIVDLLFKTNRKVTVKQLKEDYFKKIECFDSVEISGVEDRF NASLGTYHDLLKIIKDKDFLDNEENEDILEDIVLTLTLFEDREMIEERLKTYAH LFDDKVMKQLKRRRYTGWGRLSRKLINGIRDKQSGKTILDFLKSDGFANRNF MQLIHDDSLTFKEDIQKAQVSGQGDSLHEHIANLAGSPAIKKGILQTVKVVDE LVKVMGRHKPENIVIEMARENQTTQKGQKNSRERMKRIEEGIKELGSQILKE HPVENTQLQNEKLYLYYLQNGRDMYVDQELDINRLSDYDVDHIVPQSFLKD DSIDNKVLTRSDKNRGKSDNVPSEEVVKKMKNYWRQLLNAKLITQRKFDNL TKAERGGLSELDKAGFIKRQLVETRQITKHVAQILDSRMNTKYDENDKLIREV KVITLKSKLVSDFRKDFQFYKVREINNYHHAHDAYLNAVVGTALIKKYPKLE SEFVYGDYKVYDVRKMIAKSEQEIGKATAKYFFYSNIMNFFKTEITLANGEIR KRPLIETNGETGEIVWDKGRDFATVRKVLSMPQVNIVKKTEVQTGGFSKESIL PKRNSDKLIARKKDWDPKKYGGFDSPTVAYSVLVVAKVEKGKSKKLKSVKE LLGITIMERSSFEKNPIDFLEAKGYKEVKKDLIIKLPKYSLFELENGRKRMLAS 
AGELQKGNELALPSKYVNFLYLASHYEKLKGSPEDNEQKQLFVEQHKHYLD EIIEQISEFSKRVILADANLDKVLSAYNKHRDKPIREQAENIIHLFTLTNLGAPA AFKYFDTTIDRKRYTSTKEVLDATLIHQSITGLYETRIDLSQLGGDSGGSTNLS DIIEKETGKQLVIQESILMLPEEVEEVIGNKPESDILVHTAYDESTDENVMLLTS DAPEYKPWALVIQDSNGENKIKMLSGGSPKKKRKV S. aureus Cas9 (SEQ ID NO: 39) MKRNYILGLDIGITSVGYGIIDYETRDVIDAGVRLFKEANVENNEGRRS KRGARRLKRRRRHRIQRVKKLLFDYNLLTDHSELSGINPYEARVKGLSQKLS EEEFSAALLHLAKRRGVHNVNEVEEDTGNELSTKEQISRNSKALEEKYVAEL QLERLKKDGEVRGSINRFKTSDYVKEAKQLLKVQKAYHQLDQSFIDTYIDLL ETRRTYYEGPGEGSPFGWKDIKEWYEMLMGHCTYFPEELRSVKYAYNADLY NALNDLNNLVITRDENEKLEYYEKFQIIENVFKQKKKPTLKQIAKEILVNEEDI KGYRVTSTGKPEFTNLKVYHDIKDITARKEIIENAELLDQIAKILTIYQSSEDIQ EELTNLNSELTQEEIEQISNLKGYTGTHNLSLKAINLILDELWHTNDNQIAIFNR LKLVPKKVDLSQQKEIPTTLVDDFILSPVVKRSFIQSIKVINAIIKKYGLPNDIIIE LAREKNSKDAQKMINEMQKRNRQTNERIEEIIRTTGKENAKYLIEKIKLHDMQ EGKCLYSLEAIPLEDLLNNPFNYEVDHIIPRSVSFDNSFNNKVLVKQEENSKKG NRTPFQYLSSSDSKISYETFKKHILNLAKGKGRISKTKKEYLLEERDINRFSVQ KDFINRNLVDTRYATRGLMNLLRSYFRVNNLDVKVKSINGGFTSFLRRKWKF KKERNKGYKHHAEDALIIANADFIFKEWKKLDKAKKVMENQMFEEKQAES MPEIETEQEYKEIFITPHQIKHIKDFKDYKYSHRVDKKPNRELINDTLYSTRKD DKGNTLIVNNLNGLYDKDNDKLKKLINKSPEKLLMYHHDPQTYQKLKLIME QYGDEKNPLYKYYEETGNYLTKYSKKDNGPVIKKIKYYGNKLNAHLDITDD YPNSRNKVVKLSLKPYRFDVYLDNGVYKFVTVKNLDVIKKENYYEVNSKCY EEAKKLKKISNQAEFIASFYNNDLIKINGELYRVIGVNNDLLNRIEVNMIDITY REYLENMNDKRPPRIIKTIASKTQSIKKYSTDILGNLYEVKSKKHPQIIKKG C. 
jejuni Cas9 (SEQ ID NO: 40) MARILAFDIGISSIGWAFSENDELKDCGVRIFTKVENPKTGESLALPRRL ARSARKRLARRKARLNHLKHLIANEFKLNYEDYQSFDESLAKAYKGSLISPY ELRFRALNELLSKQDFARVILHIAKRRGYDDIKNSDDKEKGAILKAIKQNEEK LANYQSVGEYLYKEYFQKFKENSKEFTNVRNKKESYERCIAQSFLKDELKLIF KKQREFGFSFSKKFEEEVLSVAFYKRALKDFSHLVGNCSFFTDEKRAPKNSPL AFMFVALTRIINLLNNLKNTEGILYTKDDLNALLNEVLKNGTLTYKQTKKLL GLSDDYEFKGEKGTYFIEFKKYKEFIKALGEHNLSQDDLNEIAKDITLIKDEIK LKKALAKYDLNQNQIDSLSKLEFKDHLNISFKALKLVTPLMLEGKKYDEACN ELNLKVAINEDKKDFLPAFNETYYKDEVTNPVVLRAIKEYRKVLNALLKKYG KVHKINIELAREVGKNHSQRAKIEKEQNENYKAKKDAELECEKLGLKINSKNI LKLRLFKEQKEFCAYSGEKIKISDLQDEKMLEIDHIYPYSRSFDDSYMNKVLV FTKQNQEKLNQTPFEAFGNDSAKWQKIEVLAKNLPTKKQKRILDKNYKDKE QKNFKDRNLNDTRYIARLVLNYTKDYLDFLPLSDDENTKLNDTQKGSKVHV EAKSGMLTSALRHTWGFSAKDRNNHLHHAIDAVIIAYANNSIVKAFSDFKKE QESNSAELYAKKISELDYKNKRKFFEPFSGFRQKVLDKIDEIFVSKPERKKPSG ALHEETFRKEEEFYQSYGGKEGVLKALELGKIRKVNGKIVKNGDMFRVDIFK HKKTNKFYAVPIYTMDFALKVLPNKAVARSKKGEIKDWILMDENYEFCFSLY KDSLILIQTKDMQEPEFVYYNAFTSSTVSLIVSKHDNKFETLSKNQKILFKNAN EKEVIAKSIGIQNLKVFEKYIVSALGEVTKAEFRQREDFKK P. lavamentivorans Cas9 (SEQ ID NO: 41) MERIFGFDIGTTSIGFSVIDYSSTQSAGNIQRLGVRIFPEARDPDGTPLNQ QRRQKRMMRRQLRRRRIRRKALNETLHEAGFLPAYGSADWPVVMADEPYE LRRRGLEEGLSAYEFGRAIYHLAQHRHFKGRELEESDTPDPDVDDEKEAANE RAATLKALKNEQTTLGAWLARRPPSDRKRGIHAHRNVVAEEFERLWEVQSK
FHPALKSEEMRARISDTIFAQRPVFWRKNTLGECRFMPGEPLCPKGSWLSQQR RMLEKLNNLAIAGGNARPLDAEERDAILSKLQQQASMSWPGVRSALKALYK QRGEPGAEKSLKFNLELGGESKLLGNALEAKLADMFGPDWPAHPRKQEIRH AVHERLWAADYGETPDKKRVIILSEKDRKAHREAAANSFVADFGITGEQAAQ LQALKLPTGWEPYSIPALNLFLAELEKGERFGALVNGPDWEGWRRTNFPHRN QPTGEILDKLPSPASKEERERISQLRNPTVVRTQNELRKVVNNLIGLYGKPDRI RIEVGRDVGKSKREREEIQSGIRRNEKQRKKATEDLIKNGIANPSRDDVEKWI LWKEGQERCPYTGDQIGFNALFREGRYEVEHIWPRSRSFDNSPRNKTLCRKD VNIEKGNRMPFEAFGHDEDRWSAIQIRLQGMVSAKGGTGMSPGKVKRFLAK TMPEDFAARQLNDTRYAAKQILAQLKRLWPDMGPEAPVKVEAVTGQVTAQ LRKLWTLNNILADDGEKTRADHRHHAIDALTVACTHPGMTNKLSRYWQLRD DPRAEKPALTPPWDTIRADAEKAVSEIVVSHRVRKKVSGPLHKETTYGDTGT DIKTKSGTYRQFVTRKKIESLSKGELDEIRDPRIKEIVAAHVAGRGGDPKKAFP PYPCVSPGGPEIRKVRLTSKQQLNLMAQTGNGYADLGSNHHIAIYRLPDGKA DFEIVSLFDASRRLAQRNPIVQRTRADGASFVMSLAAGEAIMIPEGSKKGIWIV QGVWASGQVVLERDTDADHSTTTRPMPNPILKDDAKKVSIDPIGRVRPSND N cinerea Cas9 (SEQ ID NO: 42) MAAFKPNPMNYILGLDIGIASVGWAIVEIDEEENPIRLIDLGVRVFERA EVPKTGDSLAAARRLARSVRRLTRRRAHRLLRARRLLKREGVLQAADFDEN GLIKSLPNTPWQLRAAALDRKLTPLEWSAVLLHLIKHRGYLSQRKNEGETAD KELGALLKGVADNTHALQTGDFRTPAELALNKFEKESGHIRNQRGDYSHTFN RKDLQAELNLLFEKQKEFGNPHVSDGLKEGIETLLMTQRPALSGDAVQKML GHCTFEPTEPKAAKNTYTAERFVWLTKLNNLRILEQGSERPLTDTERATLMD EPYRKSKLTYAQARKLLDLDDTAFFKGLRYGKDNAEASTLMEMKAYHAISR ALEKEGLKDKKSPLNLSPELQDEIGTAFSLFKTDEDITGRLKDRVQPEILEALL KHISFDKFVQISLKALRRIVPLMEQGNRYDEACTEIYGDHYGKKNTEEKIYLP PIPADEIRNPVVLRALSQARKVINGVVRRYGSPARIHIETAREVGKSFKDRKEI EKRQEENRKDREKSAAKFREYFPNFVGEPKSKDILKLRLYEQQHGKCLYSGK EINLGRLNEKGYVEIDHALPFSRTWDDSFNNKVLALGSENQNKGNQTPYEYF NGKDNSREWQEFKARVETSRFPRSKKQRILLQKFDEDGFKERNLNDTRYINR FLCQFVADHMLLTGKGKRRVFASNGQITNLLRGFWGLRKVRAENDRHHALD AVVVACSTIAMQQKITRFVRYKEMNAFDGKTIDKETGEVLHQKAHFPQPWE FFAQEVMIRVFGKPDGKPEFEEADTPEKLRTLLAEKLSSRPEAVHKYVTPLFIS RAPNRKMSGQGHMETVKSAKRLDEGISVLRVPLTQLKLKDLEKMVNREREP KLYEALKARLEAHKDDPAKAFAEPFYKYDKAGNRTQQVKAVRVEQVQKTG VWVHNHNGIADNATIVRVDVFEKGGKYYLVPIYSWQVAKGILPDRAVVQGK DEEDWTVMDDSFEFKFVLYANDLIKLTAKKNEFLGYFVSLNRATGAIDIRTH DTDSTKGKNGIFQSVGVKTALSFQKYQIDELGKEIRPCRLKKRPPVR hAID (SEQ ID NO: 
43) MDSLLMNRRKFLYQFKNVRWAKGRRETYLCYVVKRRDSATSFSLDF GYLRNKNGCHVELLFLRYISDWDLDPGRCYRVTWFTSWSPCYDCARHVADF LRGNPNLSLRIFTARLYFCEDRKAEPEGLRRLHRAGVQIAIMTFKDYFYCWNT FVENHERTFKAWEGLHENSVRLSRQLRRILLPLYEVDDLRDAFRTLGL hAIDv solubility variant lacking N-terminal RNA-binding region (SEQ ID NO: 44) MDPHIFTSNFNNGIGRHKTYLCYEVERLDSATSFSLDFGYLRNKNGCH VELLFLRYISDWDLDPGRCYRVTWFTSWSPCYDCARHVADFLRGNPNLSLRI FTARLYFCEDRKAEPEGLRRLHRAGVQIAIMTFKDYFYCWNTFVENHERTFK AWEGLHENSVRLSRQLRRILLPLYEVDDLRDAFRTLGL hAIDv solubility variant lacking N-terminal RNA-binding region and the C- terminal poorly structured region (SEQ ID NO: 45) MDPHIFTSNFNNGIGRHKTYLCYEVERLDSATSFSLDFGYLRNKNGCH VELLFLRYISDWDLDPGRCYRVTWFTSWSPCYDCARHVADFLRGNPNLSLRI FTARLYFCEDRKAEPEGLRRLHRAGVQIAIMTFKDYFYCWNTFVENHERTFK AWEGLHENSVRLSRQLRRILLPL rAPOBEC1 (rAPO1) (SEQ ID NO: 46) MSSETGPVAVDPTLRRRIEPHEFEVFFDPRELRKETCLLYEINWGGRHS IWRHTSQNTNKHVEVNFIEKFTTERYFCPNTRCSITWFLSWSPCGECSRAITEF LSRYPHVTLFIYIARLYHHADPRNRQGLRDLISSGVTIQIMTEQESGYCWRNF VNYSPSNEAHWPRYPHLWVRLYVLELYCIILGLPPCLNILRRKQPQLTFFTIAL QSCHYQRLPPHILWATGLK mAPOBEC3 (SEQ ID NO: 47) MGPFCLGCSHRKCYSPIRNLISQETFKFHFKNLGYAKGRKDTFLCYEV TRKDCDSPVSLHHGVFKNKDNIHAEICFLYWFHDKVLKVLSPREEFKITWYM SWSPCFECAEQIVRFLATHENLSLDIFSSRLYNVQDPETQQNLCRLVQEGAQV AAMDLYEFKKCWKKFVDNGGRRFRPWKRLLTNFRYQDSKLQEILRRMDPLS EEEFYSQFYNQRVKHLCYYHRMKPYLCYQLEQFNGQAPLKGCLLSEKGKQH AEILFLDKIRSMELSQVTITCYLTWSPCPNCAWQLAAFKRDRPDLILHIYTSRL YFHWKRPFQKGLCSLWQSGILVDVMDLPQFTDCWTNFVNPKRPFRPWKGLE IISRRTQRRLRRIKESWGLQDLVNDFGNLQLGPPMSN mAPOBEC3 catalytic domain (SEQ ID NO: 48) MGPFCLGCSHRKCYSPIRNLISQETFKFHFKNLGYAKGRKDTFLCYEV TRKDCDSPVSLHHGVFKNKDNIHAEICFLYWFHDKVLKVLSPREEFKITWYM SWSPCFECAEQIVRFLATHHNLSLDIFSSRLYNVQDPETQQNLCRLVQEGAQV AAMDLYEFKKCWKKFVDNGGRRFRPWKRLLTNFRYQDSKLQEILRR hAPOBEC3A (hA3A) (SEQ ID NO: 49) MEASPASGPRHLMDPHIFTSNFNNGIGREIKTYLCYEVERLDNGTSVK MDQHRGFLHNQAKNLLCGFYGRHAELRFLDLVPSLQLDPAQIYRVTWFISWS PCFSWGCAGEVRAFLQENTHVRLRIFAARIYDYDPLYKEALQMLRDAGAQV SIMTYDEFKHCWDTFVDHQGCPFQPWDGLDEHSQALSGRLRAILQNQGN hAPOBEC3G (SEQ ID NO: 50) 
MKPHFRNTVERMYRDTFSYNFYNRPILSRRNTVWLCYEVKTKGPSRP PLDAKIFRGQVYSELKYHPEMRFFHWFSKWRKLHRDQEYEVTWYISWSPCT KCTRDMATFLAEDPKVTLTIFVARLYYFWDPDYQEALRSLCQKRDGPRATM KIMNYDEFQHCWSKFVYSQRELFEPWNNLPKYYILLHIMLGEILRHSMDPPTF TFNFNNEPWVRGRHETYLCYEVERMHNDTWVLLNQRRGFLCNQAPHKHGF LEGRHAELCFLDVIPFWKLDLDQDYRVTCFTSWSPCFSCAQEMAKFISKNKH VSLCIFTARIYDDQGRCQEGLRTLAEAGAKISIMTYSEFKHCWDTFVDHQGCP FQPWDGLDEHSQDLSGRLRAILQNQEN hAPOBEC3G catalytic domain (SEQ ID NO: 51) PPTFTFNFNNEPWVRGRHETYLCYEVERMHNDTWVLLNQRRGFLCN QAPHKHGFLEGRHAELCFLDVIPFWKLDLDQDYRVTCFTSWSPCFSCAQEMA KFISKNKHVSLCIFTARIYDDQGRCQEGLRTLAEAGAKISIMTYSEFKHCWDT FVDHQGCPFQPWDGLDEHSQDLSGRLRAILQNQEN hAPOBEC3H (SEQ ID NO: 52) MALLTAETFRLQFNNKRRLRRPYYPRKALLCYQLTPQNGSTPTRGYFE NKKKCHAEICFINEIKSMGLDETQCYQVTCYLTWSPCSSCAWELVDFIKAHD HLNLGIFASRLYYHWCKPQQKGLRLLCGSQVPVEVMGFPKFADCWENFVDH EKPLSFNPYKMLEELDKNSRAIKRRLERIKIPGVRAQGRYMDILCDAEV hAPOBEC3F (SEQ ID NO: 53) MKPHFRNTVERMYRDTFSYNFYNRPILSRRNTVWLCYEVKTKGPSRP RLDAKIFRGQVYSQPEHHAEMCFLSWFCGNQLPAYKCFQITWFVSWTPCPDC VAKLAEFLAEHPNVTLTISAARLYYYWERDYRRALCRLSQAGARVKIMDDE EFAYCWENFVYSEGQPFMPWYKFDDNYAFLHRTLKEILRNPMEAMYPHIFYF HFKNLRKAYGRNESWLCFTMEVVKHESPVSWKRGVFRNQVDPETHCHAER CFLSWFCDDILSPNTNYEVTWYTSWSPCPECAGEVAEFLARHSNVNLTIFTAR LYYFWDTDYQEGLRSLSQEGASVEIMGYKDFKYCWENFVYNDDEPFKPWK GLKYNFLFLDSKLQEILE hAPOBEC3F catalytic domain (SEQ ID NO: 54) KEILRNPMEAMYPHIFYFHFKNLRKAYGRNESWLCFTMEVVKHEISPV SWKRGVFRNQVDPETHCHAERCFLSWFCDDILSPNTNYEVTWYTSWSPCPEC AGEVAEFLARHSNVNLTIFTARLYYFWDTDYQEGLRSLSQEGASVEIMGYKD FKYCWENFVYNDDEPFKPWKGLKYNFLFLDSKLQEILE C. 
lari Cas9 (SEQ ID NO: 55) MRILGFDIGINSIGWAFVENDELKDCGVRIFTKAENPKNKESLALPRRN ARSSRRRLKRRKARLIAIKRILAKELKLNYKDYVAADGELPKAYEGSLASVY ELRYKALTQNLETKDLARVILHIAKHRGYMNKNEKKSNDAKKGKILSALKN NALKLENYQSVGEYFYKEFFQKYKKNTKNFIKIRNTKDNYNNCVLSSDLEKE LKLILEKQKEFGYNYSEDFINEILKVAFFQRPLKDFSHLVGACTFFEEEKRACK NSYSAWEFVALTKIINEIKSLEKISGEIVPTQTINEVLNLILDKGSITYKKFRSCI NLHESISFKSLKYDKENAENAKLIDFRKLVEFKKALGVHSLSRQELDQISTHIT LIKDNVKLKTVLEKYNLSNEQINNLLEIEFNDYINLSFKALGMILPLMREGKR YDEACEIANLKPKTVDEKKDFLPAFCDSIFAHELSNPVVNRAISEYRKVLNAL LKKYGKVHKIHLELARDVGLSKKAREKIEKEQKENQAVNAWALKECENIGL KASAKNILKLKLWKEQKEICIYSGNKISIEHLKDEKALEVDHIYPYSRSFDDSFI NKVLVFTKENQEKLNKTPFEAFGKNIEKWSKIQTLAQNLPYKKKNKILDENF KDKQQEDFISRNLNDTRYIATLIAKYTKEYLNFLLLSENENANLKSGEKGSKI
HVQTISGMLTSVLRHTWGFDKKDRNNHLHHALDAIIVAYSTNSIIKAFSDFRK NQELLKARFYAKELTSDNYKHQVKFFEPFKSFREKILSKIDEIFVSKPPRKRAR RALHKDTFHSENKIIDKCSYNSKEGLQIALSCGRVRKIGTKYVENDTIVRVDIF KKQNKFYAIPIYAMDFALGILPNKIVITGKDKNNNPKQWQTIDESYEFCFSLY KNDLILLQKKNMQEPEFAYYNDFSISTSSICVEKHDNKFENLTSNQKLLFSNA KEGSVKVESLGIQNLKVFEKYIITPLGDKIKADFQPRENISLKTSKKYGLR MbCpf1 (SEQ ID NO: 56) MLFQDFTHLYPLSKTVRFELKPIDRTLEHIHAKNFLSQDETMADMHQKVKVI LDDYHRDFIADMMGEVKLTKLAEFYDVYLKFRKNPKDDELQKQLKDLQAV LRKEIVKPIGNGGKYKAGYDRLFGAKLFKDGKELGDLAKFVIAQEGESSPKL AHLAHFEKFSTYFTGFHDNRKNMYSDEDKHTAIAYRLIHENLPRFIDNLQILT TIKQKHSALYDQIINELTASGLDVSLASHLDGYHKLLTQEGITAYNTLLGGISG EAGSPKIQGINELINSHHNQHCHKSERIAKLRPLHKQILSDGMSVSFLPSKFAD DSEMCQAVNEFYRHYADVFAKVQSLFDGFDDHQKDGIYVEHKNLNELSKQA FGDFALLGRVLDGYYVDVVNPEFNERFAKAKTDNAKAKLTKEKDKFIKGVH SLASLEQAIEHYTARHDDESVQAGKLGQYFKHGLAGVDNPIQKIHNNHSTIK GFLERERPAGERALPKIKSGKNPEMTQLRQLKELLDNALNVAHFAKLLTTKT TLDNQDGNFYGEFGVLYDELAKIPTLYNKVRDYLSQKPFSTEKYKLNFGNPT LLNGWDLNKEKDNFGVILQKDGCYYLALLDKAHKKVFDNAPNTGKSIYQK MIYKYLEVRKQFPKVFFSKEAIAINYHPSKELVEIKDKGRQRSDDERLKLYRFI LECLKIHPKYDKKFEGAIGDIQLFKKDKKGREVPISEKDLFDKINGIFSSKPKLE MEDFFIGEFKRYNPSQDLVDQYNIYKKIDSNDNRKKENFYNNHPKFKKDLVR YYYESMCKHEEWEESFEFSKKLQDIGCYVDVNELFTEIETRRLNYKISFCNIN ADYIDELVEQGQLYLFQIYNKDFSPKAHGKPNLHTLYFKALFSEDNLADPIYK LNGEAQIFYRKASLDMNETTIHRAGEVLENKNPDNPKKRQFVYDIIKDKRYT QDKFMLHVPITMNFGVQGMTIKEFNKKVNQSIQQYDEVNVIGIDRGERHLLY LTVINSKGEILEQCSLNDITTASANGTQMTTPYHKILDKREIERLNARVGWGEI ETIKELKSGYLSHVVHQISQLMLKYNAIVVLEDLNFGFKRGRFKVEKQIYQNF ENALIKKLNHLVLKDKADDEIGSYKNALQLTNNFTDLKSIGKQTGFLFYVPA WNTSKIDPETGFVDLLKPRYENIAQSQAFFGKFDKICYNADKDYFEFHIDYAK FTDKAKNSRQIWTICSHGDKRYVYDKTANQNKGAAKGINVNDELKSLFARH HINEKQPNLVMDICQNNDKEFHKSLMYLLKTLLALRYSNASSDEDFILSPVAN DEGVFFNSALADDTQPQNADANGAYHIALKGLWLLNELKNSDDLNKVKLAI DNQTWLNFAQNRKRPAATKKAGQAKKKKGSYPYDVPDYAYPYDVPDYAYP YDVPDYA
Example 1
[0117] Since multiple mammalian deaminases may constitute functional BE proteins, and multiple truncation points may result in functional split BEs, we sought to examine an extensive set of split BE candidate pairs (Table 2, above, shows a representative list of deaminases that may be suitable for BE applications). Deaminase truncation points were chosen by evaluating structural information and determining which amino acid residues within the deaminase domains were unlikely to contribute to meaningful secondary structural components and were thus unlikely to affect the functionality of an intact enzyme. We chose six potential truncation regions for six mammalian deaminases (a full table of predicted split regions is included in Table 3, above). Each split region corresponds to homologous regions in each of the listed deaminases based on protein alignment. sDA1 halfases that contain a truncation variant from split region X are referred to as sDA1.X, and sDA2 halfases are similarly named.
[0118] Tables 4-6 show the exact truncation variants that we have created and evaluated. In addition to exactly reciprocal halfase pairs (in which the sDA1 and sDA2 portions of the BE contain truncation variants of a deaminase perfectly bisected by a defined split site), we also tested split BEs in which the halfases shared overlapping peptide sequences. We reasoned that this "extra" overlap may enable proper folding of the constituent halfases so as to enable functional reconstitution of the deaminase, and also noted that the most functional split yCD pair included a significant overlap in peptide sequence (reference 12).
TABLE-US-00025 TABLE 4 Exact sDA split sites chosen.
            Split 1             Split 2             Split 3
            sDA1.1    sDA2.1    sDA1.2    sDA2.2    sDA1.3    sDA2.3
rAPOBEC1    1-50      51-229    1-78      79-229    1-91      92-229
hAPOBEC3A   1-63      67-199    1-86      87-199    1-99      100-199

            Split 4             Split 5             Split 6
            sDA1.4    sDA2.4    sDA1.5    sDA2.5    sDA1.6    sDA2.6
rAPOBEC1    1-108     109-229   1-144     145-229   1-160     161-229
hAPOBEC3A   1-118     119-199   1-153     154-199   1-172     173-199
Split sites chosen and examined in this study so far, by deaminase domain. Amino acid positions of each species are given, according to the sequences given at the end of this document.
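The coordinates in Table 4 can be read as 1-based, inclusive residue ranges on the parental deaminase. The following is a minimal sketch, under that assumption, of how a parental sequence might be cut into an sDA1.X/sDA2.X pair. The names `SPLIT_SITES`, `slice_1based`, and `make_halfases` are hypothetical and chosen only for illustration; the coordinates are transcribed as printed in Table 4, and the sketch does not model linkers, NLS, ZF, nCas9, or UGI portions of the fusion proteins.

```python
from typing import Dict, Tuple

Range = Tuple[int, int]  # (start, end), 1-based and inclusive (assumed convention)

# Coordinates transcribed as printed in Table 4; keys are split numbers 1-6.
SPLIT_SITES: Dict[str, Dict[int, Tuple[Range, Range]]] = {
    "rAPOBEC1": {
        1: ((1, 50), (51, 229)),
        2: ((1, 78), (79, 229)),
        3: ((1, 91), (92, 229)),
        4: ((1, 108), (109, 229)),
        5: ((1, 144), (145, 229)),
        6: ((1, 160), (161, 229)),
    },
    "hAPOBEC3A": {
        1: ((1, 63), (67, 199)),
        2: ((1, 86), (87, 199)),
        3: ((1, 99), (100, 199)),
        4: ((1, 118), (119, 199)),
        5: ((1, 153), (154, 199)),
        6: ((1, 172), (173, 199)),
    },
}


def slice_1based(seq: str, start: int, end: int) -> str:
    """Return residues start..end (1-based, inclusive) of a protein sequence."""
    if not (1 <= start <= end <= len(seq)):
        raise ValueError(f"range {start}-{end} is outside a sequence of length {len(seq)}")
    return seq[start - 1:end]


def make_halfases(seq: str, deaminase: str, split: int) -> Dict[str, str]:
    """Cut a parental deaminase sequence into its sDA1.X and sDA2.X portions."""
    (s1, e1), (s2, e2) = SPLIT_SITES[deaminase][split]
    return {
        f"sDA1.{split}": slice_1based(seq, s1, e1),
        f"sDA2.{split}": slice_1based(seq, s2, e2),
    }
```

Calling `make_halfases(parental_sequence, "rAPOBEC1", 1)` with a 229-residue sequence would return a 50-residue sDA1.1 portion and a 179-residue sDA2.1 portion that together reconstitute the full-length input.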
TABLE-US-00026 TABLE 5 rAPOBEC1 sDA BE combination pairs examined
          sDA1.1   sDA1.2   sDA1.3   sDA1.4   sDA1.5   sDA1.6
sDA2.1    Yes      Yes      No       No       No       No
sDA2.2    Yes      Yes      Yes      No       No       No
sDA2.3    No       Yes      Yes      Yes      No       No
sDA2.4    No       No       Yes      Yes      Yes      No
sDA2.5    No       No       No       No       No       No
sDA2.6    No       No       No       No       Yes      Yes
TABLE-US-00027 TABLE 6 hAPOBEC3A sDA BE combination pairs examined
          sDA1.1   sDA1.2   sDA1.3   sDA1.4   sDA1.5   sDA1.6
sDA2.1    Yes      No       No       No       No       No
sDA2.2    No       Yes      No       No       No       No
sDA2.3    No       No       Yes      No       No       No
sDA2.4    No       No       No       Yes      No       No
sDA2.5    No       No       No       No       Yes      No
sDA2.6    No       No       No       No       No       Yes
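To make the distinction between reciprocal and overlapping halfase pairs concrete, the sketch below combines the rAPOBEC1 coordinates of Table 4 with the pairs marked "Yes" in Table 5 and reports, for each examined pair, whether the two portions abut exactly, share residues, or leave residues uncovered. The function and variable names are hypothetical. The "gap" label is not a term used in this document; it is simply what the printed coordinates imply for pairs such as sDA1.1 with sDA2.2.

```python
from typing import List, Tuple

# rAPOBEC1 coordinates from Table 4: last residue of each sDA1.X portion
# and first residue of each sDA2.X portion (1-based).
RAPO1_SDA1_END = {1: 50, 2: 78, 3: 91, 4: 108, 5: 144, 6: 160}
RAPO1_SDA2_START = {1: 51, 2: 79, 3: 92, 4: 109, 5: 145, 6: 161}

# Pairs marked "Yes" in Table 5, written as (sDA1 index, sDA2 index).
RAPO1_TESTED: List[Tuple[int, int]] = [
    (1, 1), (2, 1),
    (1, 2), (2, 2), (3, 2),
    (2, 3), (3, 3), (4, 3),
    (3, 4), (4, 4), (5, 4),
    (5, 6), (6, 6),
]


def classify_pair(sda1: int, sda2: int) -> Tuple[str, int]:
    """Classify an rAPOBEC1 sDA1.i + sDA2.j pair by shared or missing residues.

    Returns ("reciprocal", 0) when the portions abut exactly,
    ("overlap", n) when n residues are present in both portions, and
    ("gap", n) when n residues of the parental enzyme are in neither portion.
    """
    shared = RAPO1_SDA1_END[sda1] - RAPO1_SDA2_START[sda2] + 1
    if shared == 0:
        return ("reciprocal", 0)
    if shared > 0:
        return ("overlap", shared)
    return ("gap", -shared)


if __name__ == "__main__":
    for i, j in RAPO1_TESTED:
        kind, n = classify_pair(i, j)
        print(f"rAPO1 sDA1.{i} + sDA2.{j}: {kind} ({n} residues)")
```

Under these coordinates, sDA1.2 paired with sDA2.1 shares residues 51-78 (28 residues), whereas every pair on the diagonal of Table 5 is exactly reciprocal.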
[0119] Several split BE halfase combinations showed activity when targeted to adjacent DNA sequences in a human HEK293 cell line in which the EGFP reporter gene has been integrated. Each rAPOBEC1 pair was tested in two different orientations with regard to the ZF and gRNA binding sites, with two different ZF domains and two different gRNAs, for four total orientation pairs. Only directly reciprocal hAPOBEC3A pairs were tested (e.g., sDA1.1 with sDA2.1). The activity of each BE halfase pair, when co-delivered by plasmid transfection at an approximate 1:1 ratio of the two halfases, is shown in FIGS. 4-16 for each orientation of the rAPO1 sDA pairs (FIG. 17 is a positive BE3 control for comparison) and in FIG. 20 for the hA3A pairs. A summary of the cumulative editing efficiencies (the sum of the editing rates at the cytosines within the gRNA editing window) of all rAPO1 halfase pairs in each orientation is given in FIG. 21. The target site configurations and all DNA-targeting proteins used for the rAPO1 experiments are shown in FIG. 22. All rAPO1 split BEs shown include an sDA1 halfase with an sDA1-3AC3L-NLS-ZF configuration, while all hA3A split BEs include an sDA1 with an sDA1-NLS-ZF configuration. In particular, and in various orientations, rAPO1 sDA1.1+rAPO1 sDA2.1, rAPO1 sDA1.2+rAPO1 sDA2.1, rAPO1 sDA1.2+rAPO1 sDA2.2, hA3A sDA1.1+hA3A sDA2.1, and hA3A sDA1.6+hA3A sDA2.6 show significant activity compared to a positive BE3 control (FIGS. 4, 6, 7, 20, and 21).
[0120] It is conceivable and likely that optimizations of several parameters--including the nature of the protein linker between the sDA components and the targeting domains, the spacing between the sDA1 and sDA2 binding sites, the exact sites at which the sDAs are split, the relative concentrations of the sDA components, the source deaminase, the source of the nCas9 targeting mechanism, and the type of targeting domain used for the sDA1 component--could influence, change, or enhance the nature of a split BE platform. Furthermore, it is likely that split base editor pairs that do not include an sDA2 fused to an nCas9-UGI domain as in previously described base editors will still retain some limited mutagenic capacity so long as their DNA targeting proteins are brought to adjacent sequences of DNA, since the reconstituted deaminase domain will still be active around such target sites.
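The cumulative editing efficiency summarized in FIG. 21 is described as the sum of the editing rates at the cytosines within the gRNA editing window. A minimal sketch of that arithmetic follows. The function name `cumulative_editing`, the position numbering (1-based from the PAM-distal end of the protospacer), the default window bounds, and the example rates are all assumptions made for illustration and are not taken from this document or its figures.

```python
from typing import Dict, Tuple


def cumulative_editing(
    protospacer: str,
    rates: Dict[int, float],
    window: Tuple[int, int] = (4, 8),  # placeholder bounds; assumed, not from the text
) -> float:
    """Sum per-position editing rates at cytosines inside the editing window.

    protospacer: target-strand sequence; positions are 1-based (assumed to be
                 counted from the PAM-distal end).
    rates:       mapping of position -> observed editing rate (e.g. percent of reads).
    window:      inclusive (start, end) positions that define the editing window.
    """
    start, end = window
    if not (1 <= start <= end <= len(protospacer)):
        raise ValueError("editing window falls outside the protospacer")
    total = 0.0
    for pos in range(start, end + 1):
        # Only cytosines contribute; positions without a measurement count as 0.
        if protospacer[pos - 1].upper() == "C":
            total += rates.get(pos, 0.0)
    return total
```

With a hypothetical protospacer `"GACCTCAGGCATGCATGCAT"` and rates `{3: 10.0, 4: 20.0, 6: 5.5, 10: 7.0}`, only the cytosines at positions 4 and 6 lie inside the placeholder window, so the cumulative value is 25.5; the cytosines at positions 3 and 10 are excluded.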
[0121] Importantly, none of the individual rAPO1 halfases are active on their own, suggesting a requirement for both halfases for genuine reconstitution of the functional deaminase domain (FIGS. 18 and 19). Furthermore, the fact that functional combinations seem to prefer certain orientations over others (for instance, rAPO1 sDA1.1+rAPO1 sDA2.1 is mostly functional in one of the orientations tested) suggests that the deaminase domains genuinely require adjacent binding of their DNA-targeting domains to function, making it unlikely that the deaminase domain can become reconstituted at other sites in the genome and thus unlikely to cause spurious genomic deamination.
REFERENCES
[0122] 1. Komor, Alexis C., Yongjoo B. Kim, Michael S. Packer, John A. Zuris, and David R. Liu. "Programmable Editing of a Target Base in Genomic DNA without Double-stranded DNA Cleavage." Nature 533.7603 (2016): 420-24.
[0123] 2. Yang, Luhan, Adrian W. Briggs, Wei Leong Chew, Prashant Mali, Marc Guell, John Aach, Daniel Bryan Goodman, David Cox, Yinan Kan, Emal Lesha, Venkataramanan Soundararajan, Feng Zhang, and George Church. "Engineering and Optimising Deaminase Fusions for Genome Editing." Nature Communications 7 (2016): 13330.
[0124] 3. Jasin, Maria, and Rodney Rothstein. "Repair of strand breaks by homologous recombination." Cold Spring Harbor perspectives in biology 5.11 (2013): a012740.
[0125] 4. Harris, Reuben S., Svend K. Petersen-Mahrt, and Michael S. Neuberger. "RNA Editing Enzyme APOBEC1 and Some of Its Homologs Can Act as DNA Mutators." Molecular Cell 10.5 (2002): 1247-253.
[0126] 5. Nishida, K., T. Arazoe, N. Yachie, S. Banno, M. Kakimoto, M. Tabata, M. Mochizuki, A. Miyabe, M. Araki, K. Y. Hara, Z. Shimatani, and A. Kondo. "Targeted Nucleotide Editing Using Hybrid Prokaryotic and Vertebrate Adaptive Immune Systems." Science 353.6305 (2016).
[0127] 6. Santos-Pereira, Jose M., and Andres Aguilera. "R Loops: New Modulators of Genome Dynamics and Function." Nature Reviews Genetics 16.10 (2015): 583-97.
[0128] 7. Rebhandl, Stefan, Michael Huemer, Richard Greil, and Roland Geisberger. "AID/APOBEC Deaminases and Cancer." Oncoscience 2 (2015): 320.
[0129] 8. Suspene, Rodolphe, et al. "Recovery of APOBEC3-edited human immunodeficiency virus G→A hypermutants by differential DNA denaturation PCR." Journal of general virology 86.1 (2005): 125-129.
[0130] 9. Aynaud, Marie-Ming, et al. "Human Tribbles 3 protects nuclear DNA from cytidine deamination by APOBEC3A." Journal of Biological Chemistry 287.46 (2012): 39182-39192.
[0131] 10. Shinohara, Masanobu, et al. "APOBEC3B can impair genomic stability by inducing base substitutions in genomic DNA in human cells." Scientific reports 2 (2012): 806.
[0132] 11. Holtz, Colleen M., Holly A. Sadler, and Louis M. Mansky. "APOBEC3G cytosine deamination hotspots are defined by both sequence context and single-stranded DNA secondary structure." Nucleic acids research (2013): gkt246.
[0133] 12. Ear, Po Hien, and Stephen W. Michnick. "A General Life-death Selection Strategy for Dissecting Protein Functions." Nature Methods 6.11 (2009): 813-16.
[0134] 13. Rees, Holly A., et al. "Improving the DNA specificity and applicability of base editing through protein engineering and protein delivery." Nature Communications 8 (2017): ncomms15790.
[0135] 14. Pattanayak, Vikram, et al. "Revealing off-Target Cleavage Specificities of Zinc-Finger Nucleases by in Vitro Selection." Nature Methods, vol. 8, no. 9, July 2011, pp. 765-770., doi:10.1038/nmeth.1670.
[0136] 15. Maeder, Morgan L., et al. "Rapid 'Open-Source' Engineering of Customized Zinc-Finger Nucleases for Highly Efficient Gene Modification." Molecular Cell, vol. 31, no. 2, 2008, pp. 294-301., doi:10.1016/j.molcel.2008.06.016.
[0137] 16. Jason M. Gehrke, Oliver R. Cervantes, M. Kendell Clement, Luca Pinello, J. Keith Joung, "High-precision CRISPR-Cas9 base editors with minimized bystander and off-target mutations," bioRxiv 273938; doi:10.1101/273938.
[0138] 17. CA2915837A1
[0139] 18. Friedland, Ari E., et al. "Characterization of Staphylococcus Aureus Cas9: a Smaller Cas9 for All-in-One Adeno-Associated Virus Delivery and Paired Nickase Applications." Genome Biology, vol. 16, no. 1, 2015, doi:10.1186/s13059-015-0817-8.
[0140] 19. Gasiunas, G., et al. "Cas9-CrRNA Ribonucleoprotein Complex Mediates Specific DNA Cleavage for Adaptive Immunity in Bacteria." Proceedings of the National Academy of Sciences, vol. 109, no. 39, April 2012, doi:10.1073/pnas.1208507109.
[0141] 20. Yamada, Mari, et al. "Crystal Structure of the Minimal Cas9 from Campylobacter Jejuni Reveals the Molecular Diversity in the CRISPR-Cas9 Systems." Molecular Cell, vol. 65, no. 6, 2017, doi:10.1016/j.molcel.2017.02.007.
[0142] 21. Zetsche, Bernd, et al. "Cpf1 Is a Single RNA-Guided Endonuclease of a Class 2 CRISPR-Cas System." Cell, vol. 163, no. 3, 2015, pp. 759-771., doi:10.1016/j.cell.2015.09.038.
[0143] 22. Hirano, H., et al. "Crystal Structure of Francisella Novicida Cas9 RHA in Complex with sgRNA and Target DNA (TGG PAM)." February, 2016, doi:10.2210/pdb5b2q/pdb.
[0144] 23. Yamano, T., et al. "Crystal Structure of Acidaminococcus Sp. Cpf1 in Complex with CrRNA and Target DNA," April 2016, doi:10.2210/pd
[0145] 24. Tang, Xu, et al. "A CRISPR-Cpf1 System for Efficient Genome Editing and Transcriptional Repression in Plants." Nature Plants, vol. 3, no. 7, 2017, p. 17103., doi:10.1038/nplants.2017.103.
OTHER EMBODIMENTS
[0146] It is to be understood that while the invention has been described in conjunction with the detailed description thereof, the foregoing description is intended to illustrate and not limit the scope of the invention, which is defined by the scope of the appended claims. Other aspects, advantages, and modifications are within the scope of the following claims.
SEQUENCE LISTING
Number of SEQ ID NOs: 56

SEQ ID NO: 1 (length 50, DNA, Artificial Sequence): EGFP gRNA target sequence
cttcttcaag tccgccatgc ccgaaggcta cgtccaggag cgcaccatct 50

SEQ ID NO: 2 (length 22, DNA, Artificial Sequence): gRNA target region
catgcccgaa ggctacgtcc ag 22

SEQ ID NO: 3 (length 99, DNA, Artificial Sequence): EGFP target region
gcttcagccg ctaccccgac cacatgaagc agcacgactt cttcaagtcc gccatgcccg 60
aaggctacgt ccaggagcgc accatcttct tcaaggacg 99

SEQ ID NO: 4 (length 83, PRT, Artificial Sequence): Uracil glycosylase inhibitor sequence
Thr Asn Leu Ser Asp Ile Ile Glu Lys Glu Thr Gly Lys Gln Leu Val
Ile Gln Glu Ser Ile Leu Met Leu Pro Glu Glu Val Glu Glu Val Ile
Gly Asn Lys Pro Glu Ser Asp Ile Leu Val His Thr Ala Tyr Asp Glu
Ser Thr Asp Glu Asn Val Met Leu Leu Thr Ser Asp Ala Pro Glu Tyr
Lys Pro Trp Ala Leu Val Ile Gln Asp Ser Asn Gly Glu Asn Lys Ile
Lys Met Leu

SEQ ID NO: 5 (length 4, PRT, Artificial Sequence): linker sequence
Gly Gly Gly Ser

SEQ ID NO: 6 (length 5, PRT, Artificial Sequence): linker sequence
Gly Gly Gly Gly Ser

SEQ ID NO: 7 (length 7, PRT, Artificial Sequence): SV40 large T antigen NLS
Pro Lys Lys Lys Arg Arg Val

SEQ ID NO: 8 (length 16, PRT, Artificial Sequence): nucleoplasmin NLS
Lys Arg Pro Ala Ala Thr Lys Lys Ala Gly Gln Ala Lys Lys Lys Lys

SEQ ID NO: 9 (length 11, DNA, Artificial Sequence): ZF1 Binding site
agaagatggt g 11

SEQ ID NO: 10 (length 11, DNA, Artificial Sequence): ZF2 Binding Site
ggtcggggta g 11

SEQ ID NO: 11 (length 23, DNA, Artificial Sequence): gRNA1 Binding Site (with PAM)
ttcaagtccg ccatgcccga agg 23

SEQ ID NO: 12 (length 23, DNA, Artificial Sequence): Binding Site (with PAM)
catgcccgaa ggctacgtcc agg 23

SEQ ID NO: 13 (length 38, PRT, Artificial Sequence): 3AC3L-NLS Linker
Ser Ser Gly Asn Ser Asn Ala Asn Ser Arg Gly Pro Ser Phe Ser Ser
Gly Leu Val Pro Leu Ser Leu Arg Gly Ser His Gly Ser Pro Lys Lys
Lys Arg Lys Val Gly Ser

SEQ ID NO: 14 (length 11, PRT, Artificial Sequence): NLS Linker
Gly Ser Pro Lys Lys Lys Arg Lys Val Gly Ser

SEQ ID NO: 15 (length 181, PRT, Artificial Sequence): N-rAPOBEC1 sDA1.1-3AC3L-NLS-ZF-C
misc_feature: Xaa at positions 107-110, 112-113, 135-138, 140-141, 163-166, and 168-169 can be any naturally occurring amino acid
Met Ser Ser Glu Thr Gly Pro Val Ala Val Asp Pro Thr Leu Arg Arg
Arg Ile Glu Pro His Glu Phe Glu Val Phe Phe Asp Pro Arg Glu Leu
Arg Lys Glu Thr Cys Leu Leu Tyr Glu Ile Asn Trp Gly Gly Arg His
Ser Ile Ser Ser Gly Asn Ser Asn Ala Asn Ser Arg Gly Pro Ser Phe
Ser Ser Gly Leu Val Pro Leu Ser Leu Arg Gly Ser His Gly Ser Pro
Lys Lys Lys Arg Lys Val Gly Ser Ser Arg Pro Gly Glu Arg Pro Phe
Gln Cys Arg Ile Cys Met Arg Asn Phe Ser Xaa Xaa Xaa Xaa Leu Xaa
Xaa His Thr Arg Thr His Thr Gly Glu Lys Pro Phe Gln Cys Arg Ile
Cys Met Arg Asn Phe Ser Xaa Xaa Xaa Xaa Leu Xaa Xaa His Leu Arg
Thr His Thr Gly Glu Lys Pro Phe Gln Cys Arg Ile Cys Met Arg Asn
Phe Ser Xaa Xaa Xaa Xaa Leu Xaa Xaa His Leu Lys Thr His Leu Arg
Gly Ser Ser Ala Gln

SEQ ID NO: 16 (length 209, PRT, Artificial Sequence): N-rAPOBEC1 sDA1.2-3AC3L-NLS-ZF-C
misc_feature: Xaa at positions 135-138, 140-141, 163-166, 168-169, 191-194, and 196-197 can be any naturally occurring amino acid
Met Ser Ser Glu Thr Gly Pro Val Ala Val Asp Pro Thr Leu Arg Arg
Arg Ile Glu Pro His Glu Phe Glu Val Phe Phe Asp Pro Arg Glu Leu
Arg Lys Glu Thr Cys Leu Leu Tyr Glu Ile Asn Trp Gly Gly Arg His
Ser Ile Trp Arg His Thr Ser Gln Asn Thr Asn Lys His Val Glu Val
Asn Phe Ile Glu Lys Phe Thr Thr Glu Arg Tyr Phe Cys Pro Ser Ser
Gly Asn Ser Asn Ala Asn Ser Arg Gly Pro Ser Phe Ser Ser Gly Leu
Val Pro Leu Ser Leu Arg Gly Ser His Gly Ser Pro Lys Lys Lys Arg
Lys Val Gly Ser Ser Arg Pro Gly Glu Arg Pro Phe Gln Cys Arg Ile
Cys Met Arg Asn Phe Ser Xaa Xaa Xaa Xaa Leu Xaa Xaa His Thr Arg
Thr His Thr Gly Glu Lys Pro Phe Gln Cys Arg Ile Cys Met Arg Asn
Phe Ser Xaa Xaa Xaa Xaa Leu Xaa Xaa His Leu Arg Thr His Thr Gly
Glu Lys Pro Phe Gln Cys Arg Ile Cys Met Arg Asn Phe Ser Xaa Xaa
Xaa Xaa Leu Xaa Xaa His Leu Lys Thr His Leu Arg Gly Ser Ser Ala
Gln

SEQ ID NO: 17 (length 222, PRT, Artificial Sequence): N-rAPOBEC1 sDA1.3-3AC3L-NLS-ZF-C
misc_feature: Xaa at positions 148-151, 153-154, 176-179, 181-182, 204-207, and 209-210 can be any naturally occurring amino acid
Met Ser Ser Glu Thr Gly Pro Val Ala Val Asp Pro Thr Leu Arg Arg
Arg Ile Glu Pro His Glu Phe Glu Val Phe Phe Asp Pro Arg Glu Leu
Arg Lys Glu Thr Cys Leu Leu Tyr Glu Ile Asn Trp Gly Gly Arg His
Ser Ile Trp Arg His Thr Ser Gln Asn Thr Asn Lys His Val Glu Val
Asn Phe Ile Glu Lys Phe Thr Thr Glu Arg Tyr Phe Cys Pro Asn Thr
Arg Cys Ser Ile Thr Trp Phe Leu Ser Trp Ser Ser Ser Gly Asn Ser
Asn Ala Asn Ser Arg Gly Pro Ser Phe Ser Ser Gly Leu Val Pro Leu
Ser Leu Arg Gly Ser His Gly Ser Pro Lys Lys Lys Arg Lys Val Gly
Ser Ser Arg Pro Gly Glu Arg Pro Phe Gln Cys Arg Ile Cys Met Arg
Asn Phe Ser Xaa Xaa Xaa Xaa Leu Xaa Xaa His Thr Arg Thr His Thr
Gly Glu Lys Pro Phe Gln Cys Arg Ile Cys Met Arg Asn Phe Ser Xaa
Xaa Xaa Xaa Leu Xaa Xaa His Leu Arg Thr His Thr Gly Glu Lys Pro
Phe Gln Cys Arg Ile Cys Met Arg Asn Phe Ser Xaa Xaa Xaa Xaa Leu
Xaa Xaa His Leu Lys Thr His Leu Arg Gly Ser Ser Ala Gln

SEQ ID NO: 18 (length 239, PRT, Artificial Sequence): N-rAPOBEC1 sDA1.4-3AC3L-NLS-ZF-C
misc_feature: Xaa at positions 165-168, 170-171, 193-196, 198-199, 221-224, and 226-227 can be any naturally occurring amino acid
Met Ser Ser Glu Thr Gly Pro Val Ala Val Asp Pro Thr Leu Arg Arg
Arg Ile Glu Pro His Glu Phe Glu Val Phe Phe Asp Pro Arg Glu Leu
Arg Lys Glu Thr Cys Leu Leu Tyr Glu Ile Asn Trp Gly Gly Arg His
Ser Ile Trp Arg His Thr Ser Gln Asn Thr Asn Lys His Val Glu Val
Asn Phe Ile Glu Lys Phe Thr Thr Glu Arg Tyr Phe Cys Pro Asn Thr
Arg Cys Ser Ile Thr Trp Phe Leu Ser Trp Ser Pro Cys Gly Glu Cys
Ser Arg Ala Ile Thr Glu Phe Leu Ser Arg Tyr Pro Ser Ser Gly Asn
Ser Asn Ala Asn Ser Arg Gly Pro Ser Phe Ser Ser Gly Leu Val Pro
Leu Ser Leu Arg Gly Ser His Gly Ser Pro Lys Lys Lys Arg Lys Val
Gly Ser Ser Arg Pro Gly Glu Arg Pro Phe Gln Cys Arg Ile Cys Met
Arg Asn Phe Ser Xaa Xaa Xaa Xaa Leu Xaa Xaa His Thr Arg Thr His
Thr Gly Glu Lys Pro Phe Gln Cys Arg Ile Cys Met Arg Asn Phe Ser
Xaa Xaa Xaa Xaa Leu Xaa Xaa His Leu Arg Thr His Thr Gly Glu Lys
Pro Phe Gln Cys Arg Ile Cys Met Arg Asn Phe Ser Xaa Xaa Xaa Xaa
Leu Xaa Xaa His Leu Lys Thr His Leu Arg Gly Ser Ser Ala Gln

SEQ ID NO: 19 (length 275, PRT, Artificial Sequence): N-rAPOBEC1 sDA1.5-3AC3L-NLS-ZF-C
misc_feature: Xaa at positions 201-204, 206-207, 229-232, 234-235, 257-260, and 262-263 can be any naturally occurring amino acid
Met Ser Ser Glu Thr Gly Pro Val Ala Val Asp Pro Thr Leu Arg Arg
Arg Ile Glu Pro His Glu Phe Glu Val Phe Phe Asp Pro Arg Glu Leu
Arg Lys Glu Thr Cys Leu Leu Tyr Glu Ile Asn Trp Gly Gly Arg His
Ser Ile Trp Arg His Thr Ser Gln Asn Thr Asn Lys His Val Glu Val
Asn Phe Ile Glu Lys Phe Thr Thr Glu Arg Tyr Phe Cys Pro Asn Thr
Arg Cys Ser Ile Thr Trp Phe Leu Ser Trp Ser Pro Cys Gly Glu Cys
Ser Arg Ala Ile Thr Glu Phe Leu Ser Arg Tyr Pro His Val Thr Leu
Phe Ile Tyr Ile Ala Arg Leu Tyr His His Ala Asp Pro Arg Asn Arg
Gln Gly Leu Arg Asp Leu Ile Ser Ser Gly Val Thr Ile Gln Ile Met
Ser Ser Gly Asn Ser Asn Ala Asn Ser Arg Gly Pro Ser Phe Ser Ser
Gly Leu Val Pro Leu Ser Leu Arg Gly Ser His Gly Ser Pro Lys Lys
Lys Arg Lys Val Gly Ser Ser Arg Pro Gly Glu Arg Pro Phe Gln Cys
Arg Ile Cys Met Arg Asn Phe Ser Xaa Xaa Xaa Xaa Leu Xaa Xaa His
Thr Arg Thr His Thr Gly Glu Lys Pro Phe Gln Cys Arg Ile Cys Met
Arg Asn Phe Ser Xaa Xaa Xaa Xaa Leu Xaa Xaa His Leu Arg Thr His
Thr Gly Glu Lys Pro Phe Gln Cys Arg Ile Cys Met Arg Asn Phe Ser
Xaa Xaa Xaa Xaa Leu Xaa Xaa His Leu Lys Thr His Leu Arg Gly Ser
Ser Ala Gln

SEQ ID NO: 20 (length 291, PRT, Artificial Sequence): N-rAPOBEC1 sDA1.6-3AC3L-NLS-ZF-C
misc_feature: Xaa at positions 217-220, 222-223, 245-248, 250-251, 273-276, and 278-279 can be any naturally occurring amino acid
Met Ser Ser Glu Thr Gly Pro Val Ala Val Asp Pro Thr Leu Arg Arg
Arg Ile Glu Pro His Glu Phe Glu Val Phe Phe Asp Pro Arg Glu Leu
Arg Lys Glu Thr Cys Leu Leu Tyr Glu Ile Asn Trp Gly Gly Arg His
Ser Ile Trp Arg His Thr Ser Gln Asn Thr Asn Lys His Val Glu Val
Asn Phe Ile Glu Lys Phe Thr Thr Glu Arg Tyr Phe Cys Pro Asn Thr
Arg Cys Ser Ile Thr Trp Phe Leu Ser Trp Ser Pro Cys Gly Glu Cys
Ser Arg Ala Ile Thr Glu Phe Leu Ser Arg Tyr Pro His Val Thr Leu
Phe Ile Tyr Ile Ala Arg Leu Tyr His His Ala Asp Pro Arg Asn Arg
Gln Gly Leu Arg Asp Leu Ile Ser Ser Gly Val Thr Ile Gln Ile Met
Thr Glu Gln Glu Ser Gly Tyr Cys Trp Arg Asn Phe Val Asn Tyr Ser
Ser Ser Gly Asn Ser Asn Ala Asn Ser Arg Gly Pro Ser Phe Ser Ser
Gly Leu Val Pro Leu Ser Leu Arg Gly Ser His Gly Ser Pro Lys Lys
Lys Arg Lys Val Gly Ser Ser Arg Pro Gly Glu Arg Pro Phe Gln Cys
Arg Ile Cys Met Arg Asn Phe Ser Xaa Xaa Xaa Xaa Leu Xaa Xaa His
Thr Arg Thr His Thr Gly Glu Lys Pro Phe Gln Cys Arg Ile Cys Met
Arg Asn Phe Ser Xaa Xaa Xaa Xaa Leu Xaa Xaa His Leu Arg Thr His
Thr Gly Glu Lys Pro Phe Gln Cys Arg Ile Cys Met Arg Asn Phe Ser
Xaa Xaa Xaa Xaa Leu Xaa Xaa His Leu Lys Thr His Leu Arg Gly Ser
Ser Ala Gln

211661PRTArtificial
SequenceN-rAPOBEC1 sDA2.1-nCas9-UGI-C 21Met Trp Arg His Thr Ser Gln Asn
Thr Asn Lys His Val Glu Val Asn1 5 10
15Phe Ile Glu Lys Phe Thr Thr Glu Arg Tyr Phe Cys Pro Asn
Thr Arg 20 25 30Cys Ser Ile
Thr Trp Phe Leu Ser Trp Ser Pro Cys Gly Glu Cys Ser 35
40 45Arg Ala Ile Thr Glu Phe Leu Ser Arg Tyr Pro
His Val Thr Leu Phe 50 55 60Ile Tyr
Ile Ala Arg Leu Tyr His His Ala Asp Pro Arg Asn Arg Gln65
70 75 80Gly Leu Arg Asp Leu Ile Ser
Ser Gly Val Thr Ile Gln Ile Met Thr 85 90
95Glu Gln Glu Ser Gly Tyr Cys Trp Arg Asn Phe Val Asn
Tyr Ser Pro 100 105 110Ser Asn
Glu Ala His Trp Pro Arg Tyr Pro His Leu Trp Val Arg Leu 115
120 125Tyr Val Leu Glu Leu Tyr Cys Ile Ile Leu
Gly Leu Pro Pro Cys Leu 130 135 140Asn
Ile Leu Arg Arg Lys Gln Pro Gln Leu Thr Phe Phe Thr Ile Ala145
150 155 160Leu Gln Ser Cys His Tyr
Gln Arg Leu Pro Pro His Ile Leu Trp Ala 165
170 175Thr Gly Leu Lys Ser Gly Ser Glu Thr Pro Gly Thr
Ser Glu Ser Ala 180 185 190Thr
Pro Glu Ser Asp Lys Lys Tyr Ser Ile Gly Leu Ala Ile Gly Thr 195
200 205Asn Ser Val Gly Trp Ala Val Ile Thr
Asp Glu Tyr Lys Val Pro Ser 210 215
220Lys Lys Phe Lys Val Leu Gly Asn Thr Asp Arg His Ser Ile Lys Lys225
230 235 240Asn Leu Ile Gly
Ala Leu Leu Phe Asp Ser Gly Glu Thr Ala Glu Ala 245
250 255Thr Arg Leu Lys Arg Thr Ala Arg Arg Arg
Tyr Thr Arg Arg Lys Asn 260 265
270Arg Ile Cys Tyr Leu Gln Glu Ile Phe Ser Asn Glu Met Ala Lys Val
275 280 285Asp Asp Ser Phe Phe His Arg
Leu Glu Glu Ser Phe Leu Val Glu Glu 290 295
300Asp Lys Lys His Glu Arg His Pro Ile Phe Gly Asn Ile Val Asp
Glu305 310 315 320Val Ala
Tyr His Glu Lys Tyr Pro Thr Ile Tyr His Leu Arg Lys Lys
325 330 335Leu Val Asp Ser Thr Asp Lys
Ala Asp Leu Arg Leu Ile Tyr Leu Ala 340 345
350Leu Ala His Met Ile Lys Phe Arg Gly His Phe Leu Ile Glu
Gly Asp 355 360 365Leu Asn Pro Asp
Asn Ser Asp Val Asp Lys Leu Phe Ile Gln Leu Val 370
375 380Gln Thr Tyr Asn Gln Leu Phe Glu Glu Asn Pro Ile
Asn Ala Ser Gly385 390 395
400Val Asp Ala Lys Ala Ile Leu Ser Ala Arg Leu Ser Lys Ser Arg Arg
405 410 415Leu Glu Asn Leu Ile
Ala Gln Leu Pro Gly Glu Lys Lys Asn Gly Leu 420
425 430Phe Gly Asn Leu Ile Ala Leu Ser Leu Gly Leu Thr
Pro Asn Phe Lys 435 440 445Ser Asn
Phe Asp Leu Ala Glu Asp Ala Lys Leu Gln Leu Ser Lys Asp 450
455 460Thr Tyr Asp Asp Asp Leu Asp Asn Leu Leu Ala
Gln Ile Gly Asp Gln465 470 475
480Tyr Ala Asp Leu Phe Leu Ala Ala Lys Asn Leu Ser Asp Ala Ile Leu
485 490 495Leu Ser Asp Ile
Leu Arg Val Asn Thr Glu Ile Thr Lys Ala Pro Leu 500
505 510Ser Ala Ser Met Ile Lys Arg Tyr Asp Glu His
His Gln Asp Leu Thr 515 520 525Leu
Leu Lys Ala Leu Val Arg Gln Gln Leu Pro Glu Lys Tyr Lys Glu 530
535 540Ile Phe Phe Asp Gln Ser Lys Asn Gly Tyr
Ala Gly Tyr Ile Asp Gly545 550 555
560Gly Ala Ser Gln Glu Glu Phe Tyr Lys Phe Ile Lys Pro Ile Leu
Glu 565 570 575Lys Met Asp
Gly Thr Glu Glu Leu Leu Val Lys Leu Asn Arg Glu Asp 580
585 590Leu Leu Arg Lys Gln Arg Thr Phe Asp Asn
Gly Ser Ile Pro His Gln 595 600
605Ile His Leu Gly Glu Leu His Ala Ile Leu Arg Arg Gln Glu Asp Phe 610
615 620Tyr Pro Phe Leu Lys Asp Asn Arg
Glu Lys Ile Glu Lys Ile Leu Thr625 630
635 640Phe Arg Ile Pro Tyr Tyr Val Gly Pro Leu Ala Arg
Gly Asn Ser Arg 645 650
655Phe Ala Trp Met Thr Arg Lys Ser Glu Glu Thr Ile Thr Pro Trp Asn
660 665 670Phe Glu Glu Val Val Asp
Lys Gly Ala Ser Ala Gln Ser Phe Ile Glu 675 680
685Arg Met Thr Asn Phe Asp Lys Asn Leu Pro Asn Glu Lys Val
Leu Pro 690 695 700Lys His Ser Leu Leu
Tyr Glu Tyr Phe Thr Val Tyr Asn Glu Leu Thr705 710
715 720Lys Val Lys Tyr Val Thr Glu Gly Met Arg
Lys Pro Ala Phe Leu Ser 725 730
735Gly Glu Gln Lys Lys Ala Ile Val Asp Leu Leu Phe Lys Thr Asn Arg
740 745 750Lys Val Thr Val Lys
Gln Leu Lys Glu Asp Tyr Phe Lys Lys Ile Glu 755
760 765Cys Phe Asp Ser Val Glu Ile Ser Gly Val Glu Asp
Arg Phe Asn Ala 770 775 780Ser Leu Gly
Thr Tyr His Asp Leu Leu Lys Ile Ile Lys Asp Lys Asp785
790 795 800Phe Leu Asp Asn Glu Glu Asn
Glu Asp Ile Leu Glu Asp Ile Val Leu 805
810 815Thr Leu Thr Leu Phe Glu Asp Arg Glu Met Ile Glu
Glu Arg Leu Lys 820 825 830Thr
Tyr Ala His Leu Phe Asp Asp Lys Val Met Lys Gln Leu Lys Arg 835
840 845Arg Arg Tyr Thr Gly Trp Gly Arg Leu
Ser Arg Lys Leu Ile Asn Gly 850 855
860Ile Arg Asp Lys Gln Ser Gly Lys Thr Ile Leu Asp Phe Leu Lys Ser865
870 875 880Asp Gly Phe Ala
Asn Arg Asn Phe Met Gln Leu Ile His Asp Asp Ser 885
890 895Leu Thr Phe Lys Glu Asp Ile Gln Lys Ala
Gln Val Ser Gly Gln Gly 900 905
910Asp Ser Leu His Glu His Ile Ala Asn Leu Ala Gly Ser Pro Ala Ile
915 920 925Lys Lys Gly Ile Leu Gln Thr
Val Lys Val Val Asp Glu Leu Val Lys 930 935
940Val Met Gly Arg His Lys Pro Glu Asn Ile Val Ile Glu Met Ala
Arg945 950 955 960Glu Asn
Gln Thr Thr Gln Lys Gly Gln Lys Asn Ser Arg Glu Arg Met
965 970 975Lys Arg Ile Glu Glu Gly Ile
Lys Glu Leu Gly Ser Gln Ile Leu Lys 980 985
990Glu His Pro Val Glu Asn Thr Gln Leu Gln Asn Glu Lys Leu
Tyr Leu 995 1000 1005Tyr Tyr Leu
Gln Asn Gly Arg Asp Met Tyr Val Asp Gln Glu Leu 1010
1015 1020Asp Ile Asn Arg Leu Ser Asp Tyr Asp Val Asp
His Ile Val Pro 1025 1030 1035Gln Ser
Phe Leu Lys Asp Asp Ser Ile Asp Asn Lys Val Leu Thr 1040
1045 1050Arg Ser Asp Lys Asn Arg Gly Lys Ser Asp
Asn Val Pro Ser Glu 1055 1060 1065Glu
Val Val Lys Lys Met Lys Asn Tyr Trp Arg Gln Leu Leu Asn 1070
1075 1080Ala Lys Leu Ile Thr Gln Arg Lys Phe
Asp Asn Leu Thr Lys Ala 1085 1090
1095Glu Arg Gly Gly Leu Ser Glu Leu Asp Lys Ala Gly Phe Ile Lys
1100 1105 1110Arg Gln Leu Val Glu Thr
Arg Gln Ile Thr Lys His Val Ala Gln 1115 1120
1125Ile Leu Asp Ser Arg Met Asn Thr Lys Tyr Asp Glu Asn Asp
Lys 1130 1135 1140Leu Ile Arg Glu Val
Lys Val Ile Thr Leu Lys Ser Lys Leu Val 1145 1150
1155Ser Asp Phe Arg Lys Asp Phe Gln Phe Tyr Lys Val Arg
Glu Ile 1160 1165 1170Asn Asn Tyr His
His Ala His Asp Ala Tyr Leu Asn Ala Val Val 1175
1180 1185Gly Thr Ala Leu Ile Lys Lys Tyr Pro Lys Leu
Glu Ser Glu Phe 1190 1195 1200Val Tyr
Gly Asp Tyr Lys Val Tyr Asp Val Arg Lys Met Ile Ala 1205
1210 1215Lys Ser Glu Gln Glu Ile Gly Lys Ala Thr
Ala Lys Tyr Phe Phe 1220 1225 1230Tyr
Ser Asn Ile Met Asn Phe Phe Lys Thr Glu Ile Thr Leu Ala 1235
1240 1245Asn Gly Glu Ile Arg Lys Arg Pro Leu
Ile Glu Thr Asn Gly Glu 1250 1255
1260Thr Gly Glu Ile Val Trp Asp Lys Gly Arg Asp Phe Ala Thr Val
1265 1270 1275Arg Lys Val Leu Ser Met
Pro Gln Val Asn Ile Val Lys Lys Thr 1280 1285
1290Glu Val Gln Thr Gly Gly Phe Ser Lys Glu Ser Ile Leu Pro
Lys 1295 1300 1305Arg Asn Ser Asp Lys
Leu Ile Ala Arg Lys Lys Asp Trp Asp Pro 1310 1315
1320Lys Lys Tyr Gly Gly Phe Asp Ser Pro Thr Val Ala Tyr
Ser Val 1325 1330 1335Leu Val Val Ala
Lys Val Glu Lys Gly Lys Ser Lys Lys Leu Lys 1340
1345 1350Ser Val Lys Glu Leu Leu Gly Ile Thr Ile Met
Glu Arg Ser Ser 1355 1360 1365Phe Glu
Lys Asn Pro Ile Asp Phe Leu Glu Ala Lys Gly Tyr Lys 1370
1375 1380Glu Val Lys Lys Asp Leu Ile Ile Lys Leu
Pro Lys Tyr Ser Leu 1385 1390 1395Phe
Glu Leu Glu Asn Gly Arg Lys Arg Met Leu Ala Ser Ala Gly 1400
1405 1410Glu Leu Gln Lys Gly Asn Glu Leu Ala
Leu Pro Ser Lys Tyr Val 1415 1420
1425Asn Phe Leu Tyr Leu Ala Ser His Tyr Glu Lys Leu Lys Gly Ser
1430 1435 1440Pro Glu Asp Asn Glu Gln
Lys Gln Leu Phe Val Glu Gln His Lys 1445 1450
1455His Tyr Leu Asp Glu Ile Ile Glu Gln Ile Ser Glu Phe Ser
Lys 1460 1465 1470Arg Val Ile Leu Ala
Asp Ala Asn Leu Asp Lys Val Leu Ser Ala 1475 1480
1485Tyr Asn Lys His Arg Asp Lys Pro Ile Arg Glu Gln Ala
Glu Asn 1490 1495 1500Ile Ile His Leu
Phe Thr Leu Thr Asn Leu Gly Ala Pro Ala Ala 1505
1510 1515Phe Lys Tyr Phe Asp Thr Thr Ile Asp Arg Lys
Arg Tyr Thr Ser 1520 1525 1530Thr Lys
Glu Val Leu Asp Ala Thr Leu Ile His Gln Ser Ile Thr 1535
1540 1545Gly Leu Tyr Glu Thr Arg Ile Asp Leu Ser
Gln Leu Gly Gly Asp 1550 1555 1560Ser
Gly Gly Ser Thr Asn Leu Ser Asp Ile Ile Glu Lys Glu Thr 1565
1570 1575Gly Lys Gln Leu Val Ile Gln Glu Ser
Ile Leu Met Leu Pro Glu 1580 1585
1590Glu Val Glu Glu Val Ile Gly Asn Lys Pro Glu Ser Asp Ile Leu
1595 1600 1605Val His Thr Ala Tyr Asp
Glu Ser Thr Asp Glu Asn Val Met Leu 1610 1615
1620Leu Thr Ser Asp Ala Pro Glu Tyr Lys Pro Trp Ala Leu Val
Ile 1625 1630 1635Gln Asp Ser Asn Gly
Glu Asn Lys Ile Lys Met Leu Ser Gly Gly 1640 1645
1650Ser Pro Lys Lys Lys Arg Lys Val 1655
1660221633PRTArtificial SequenceN-rAPOBEC1 sDA2.2-nCas9-UGI-C 22Met Asn
Thr Arg Cys Ser Ile Thr Trp Phe Leu Ser Trp Ser Pro Cys1 5
10 15Gly Glu Cys Ser Arg Ala Ile Thr
Glu Phe Leu Ser Arg Tyr Pro His 20 25
30Val Thr Leu Phe Ile Tyr Ile Ala Arg Leu Tyr His His Ala Asp
Pro 35 40 45Arg Asn Arg Gln Gly
Leu Arg Asp Leu Ile Ser Ser Gly Val Thr Ile 50 55
60Gln Ile Met Thr Glu Gln Glu Ser Gly Tyr Cys Trp Arg Asn
Phe Val65 70 75 80Asn
Tyr Ser Pro Ser Asn Glu Ala His Trp Pro Arg Tyr Pro His Leu
85 90 95Trp Val Arg Leu Tyr Val Leu
Glu Leu Tyr Cys Ile Ile Leu Gly Leu 100 105
110Pro Pro Cys Leu Asn Ile Leu Arg Arg Lys Gln Pro Gln Leu
Thr Phe 115 120 125Phe Thr Ile Ala
Leu Gln Ser Cys His Tyr Gln Arg Leu Pro Pro His 130
135 140Ile Leu Trp Ala Thr Gly Leu Lys Ser Gly Ser Glu
Thr Pro Gly Thr145 150 155
160Ser Glu Ser Ala Thr Pro Glu Ser Asp Lys Lys Tyr Ser Ile Gly Leu
165 170 175Ala Ile Gly Thr Asn
Ser Val Gly Trp Ala Val Ile Thr Asp Glu Tyr 180
185 190Lys Val Pro Ser Lys Lys Phe Lys Val Leu Gly Asn
Thr Asp Arg His 195 200 205Ser Ile
Lys Lys Asn Leu Ile Gly Ala Leu Leu Phe Asp Ser Gly Glu 210
215 220Thr Ala Glu Ala Thr Arg Leu Lys Arg Thr Ala
Arg Arg Arg Tyr Thr225 230 235
240Arg Arg Lys Asn Arg Ile Cys Tyr Leu Gln Glu Ile Phe Ser Asn Glu
245 250 255Met Ala Lys Val
Asp Asp Ser Phe Phe His Arg Leu Glu Glu Ser Phe 260
265 270Leu Val Glu Glu Asp Lys Lys His Glu Arg His
Pro Ile Phe Gly Asn 275 280 285Ile
Val Asp Glu Val Ala Tyr His Glu Lys Tyr Pro Thr Ile Tyr His 290
295 300Leu Arg Lys Lys Leu Val Asp Ser Thr Asp
Lys Ala Asp Leu Arg Leu305 310 315
320Ile Tyr Leu Ala Leu Ala His Met Ile Lys Phe Arg Gly His Phe
Leu 325 330 335Ile Glu Gly
Asp Leu Asn Pro Asp Asn Ser Asp Val Asp Lys Leu Phe 340
345 350Ile Gln Leu Val Gln Thr Tyr Asn Gln Leu
Phe Glu Glu Asn Pro Ile 355 360
365Asn Ala Ser Gly Val Asp Ala Lys Ala Ile Leu Ser Ala Arg Leu Ser 370
375 380Lys Ser Arg Arg Leu Glu Asn Leu
Ile Ala Gln Leu Pro Gly Glu Lys385 390
395 400Lys Asn Gly Leu Phe Gly Asn Leu Ile Ala Leu Ser
Leu Gly Leu Thr 405 410
415Pro Asn Phe Lys Ser Asn Phe Asp Leu Ala Glu Asp Ala Lys Leu Gln
420 425 430Leu Ser Lys Asp Thr Tyr
Asp Asp Asp Leu Asp Asn Leu Leu Ala Gln 435 440
445Ile Gly Asp Gln Tyr Ala Asp Leu Phe Leu Ala Ala Lys Asn
Leu Ser 450 455 460Asp Ala Ile Leu Leu
Ser Asp Ile Leu Arg Val Asn Thr Glu Ile Thr465 470
475 480Lys Ala Pro Leu Ser Ala Ser Met Ile Lys
Arg Tyr Asp Glu His His 485 490
495Gln Asp Leu Thr Leu Leu Lys Ala Leu Val Arg Gln Gln Leu Pro Glu
500 505 510Lys Tyr Lys Glu Ile
Phe Phe Asp Gln Ser Lys Asn Gly Tyr Ala Gly 515
520 525Tyr Ile Asp Gly Gly Ala Ser Gln Glu Glu Phe Tyr
Lys Phe Ile Lys 530 535 540Pro Ile Leu
Glu Lys Met Asp Gly Thr Glu Glu Leu Leu Val Lys Leu545
550 555 560Asn Arg Glu Asp Leu Leu Arg
Lys Gln Arg Thr Phe Asp Asn Gly Ser 565
570 575Ile Pro His Gln Ile His Leu Gly Glu Leu His Ala
Ile Leu Arg Arg 580 585 590Gln
Glu Asp Phe Tyr Pro Phe Leu Lys Asp Asn Arg Glu Lys Ile Glu 595
600 605Lys Ile Leu Thr Phe Arg Ile Pro Tyr
Tyr Val Gly Pro Leu Ala Arg 610 615
620Gly Asn Ser Arg Phe Ala Trp Met Thr Arg Lys Ser Glu Glu Thr Ile625
630 635 640Thr Pro Trp Asn
Phe Glu Glu Val Val Asp Lys Gly Ala Ser Ala Gln 645
650 655Ser Phe Ile Glu Arg Met Thr Asn Phe Asp
Lys Asn Leu Pro Asn Glu 660 665
670Lys Val Leu Pro Lys His Ser Leu Leu Tyr Glu Tyr Phe Thr Val Tyr
675 680 685Asn Glu Leu Thr Lys Val Lys
Tyr Val Thr Glu Gly Met Arg Lys Pro 690 695
700Ala Phe Leu Ser Gly Glu Gln Lys Lys Ala Ile Val Asp Leu Leu
Phe705 710 715 720Lys Thr
Asn Arg Lys Val Thr Val Lys Gln Leu Lys Glu Asp Tyr Phe
725 730 735Lys Lys Ile Glu Cys Phe Asp
Ser Val Glu Ile Ser Gly Val Glu Asp 740 745
750Arg Phe Asn Ala Ser Leu Gly Thr Tyr His Asp Leu Leu Lys
Ile Ile 755 760 765Lys Asp Lys Asp
Phe Leu Asp Asn Glu Glu Asn Glu Asp Ile Leu Glu 770
775 780Asp Ile Val Leu Thr Leu Thr Leu Phe Glu Asp Arg
Glu Met Ile Glu785 790 795
800Glu Arg Leu Lys Thr Tyr Ala His Leu Phe Asp Asp Lys Val Met Lys
805 810 815Gln Leu Lys Arg Arg
Arg Tyr Thr Gly Trp Gly Arg Leu Ser Arg Lys 820
825 830Leu Ile Asn Gly Ile Arg Asp Lys Gln Ser Gly Lys
Thr Ile Leu Asp 835 840 845Phe Leu
Lys Ser Asp Gly Phe Ala Asn Arg Asn Phe Met Gln Leu Ile 850
855 860His Asp Asp Ser Leu Thr Phe Lys Glu Asp Ile
Gln Lys Ala Gln Val865 870 875
880Ser Gly Gln Gly Asp Ser Leu His Glu His Ile Ala Asn Leu Ala Gly
885 890 895Ser Pro Ala Ile
Lys Lys Gly Ile Leu Gln Thr Val Lys Val Val Asp 900
905 910Glu Leu Val Lys Val Met Gly Arg His Lys Pro
Glu Asn Ile Val Ile 915 920 925Glu
Met Ala Arg Glu Asn Gln Thr Thr Gln Lys Gly Gln Lys Asn Ser 930
935 940Arg Glu Arg Met Lys Arg Ile Glu Glu Gly
Ile Lys Glu Leu Gly Ser945 950 955
960Gln Ile Leu Lys Glu His Pro Val Glu Asn Thr Gln Leu Gln Asn
Glu 965 970 975Lys Leu Tyr
Leu Tyr Tyr Leu Gln Asn Gly Arg Asp Met Tyr Val Asp 980
985 990Gln Glu Leu Asp Ile Asn Arg Leu Ser Asp
Tyr Asp Val Asp His Ile 995 1000
1005Val Pro Gln Ser Phe Leu Lys Asp Asp Ser Ile Asp Asn Lys Val
1010 1015 1020Leu Thr Arg Ser Asp Lys
Asn Arg Gly Lys Ser Asp Asn Val Pro 1025 1030
1035Ser Glu Glu Val Val Lys Lys Met Lys Asn Tyr Trp Arg Gln
Leu 1040 1045 1050Leu Asn Ala Lys Leu
Ile Thr Gln Arg Lys Phe Asp Asn Leu Thr 1055 1060
1065Lys Ala Glu Arg Gly Gly Leu Ser Glu Leu Asp Lys Ala
Gly Phe 1070 1075 1080Ile Lys Arg Gln
Leu Val Glu Thr Arg Gln Ile Thr Lys His Val 1085
1090 1095Ala Gln Ile Leu Asp Ser Arg Met Asn Thr Lys
Tyr Asp Glu Asn 1100 1105 1110Asp Lys
Leu Ile Arg Glu Val Lys Val Ile Thr Leu Lys Ser Lys 1115
1120 1125Leu Val Ser Asp Phe Arg Lys Asp Phe Gln
Phe Tyr Lys Val Arg 1130 1135 1140Glu
Ile Asn Asn Tyr His His Ala His Asp Ala Tyr Leu Asn Ala 1145
1150 1155Val Val Gly Thr Ala Leu Ile Lys Lys
Tyr Pro Lys Leu Glu Ser 1160 1165
1170Glu Phe Val Tyr Gly Asp Tyr Lys Val Tyr Asp Val Arg Lys Met
1175 1180 1185Ile Ala Lys Ser Glu Gln
Glu Ile Gly Lys Ala Thr Ala Lys Tyr 1190 1195
1200Phe Phe Tyr Ser Asn Ile Met Asn Phe Phe Lys Thr Glu Ile
Thr 1205 1210 1215Leu Ala Asn Gly Glu
Ile Arg Lys Arg Pro Leu Ile Glu Thr Asn 1220 1225
1230Gly Glu Thr Gly Glu Ile Val Trp Asp Lys Gly Arg Asp
Phe Ala 1235 1240 1245Thr Val Arg Lys
Val Leu Ser Met Pro Gln Val Asn Ile Val Lys 1250
1255 1260Lys Thr Glu Val Gln Thr Gly Gly Phe Ser Lys
Glu Ser Ile Leu 1265 1270 1275Pro Lys
Arg Asn Ser Asp Lys Leu Ile Ala Arg Lys Lys Asp Trp 1280
1285 1290Asp Pro Lys Lys Tyr Gly Gly Phe Asp Ser
Pro Thr Val Ala Tyr 1295 1300 1305Ser
Val Leu Val Val Ala Lys Val Glu Lys Gly Lys Ser Lys Lys 1310
1315 1320Leu Lys Ser Val Lys Glu Leu Leu Gly
Ile Thr Ile Met Glu Arg 1325 1330
1335Ser Ser Phe Glu Lys Asn Pro Ile Asp Phe Leu Glu Ala Lys Gly
1340 1345 1350Tyr Lys Glu Val Lys Lys
Asp Leu Ile Ile Lys Leu Pro Lys Tyr 1355 1360
1365Ser Leu Phe Glu Leu Glu Asn Gly Arg Lys Arg Met Leu Ala
Ser 1370 1375 1380Ala Gly Glu Leu Gln
Lys Gly Asn Glu Leu Ala Leu Pro Ser Lys 1385 1390
1395Tyr Val Asn Phe Leu Tyr Leu Ala Ser His Tyr Glu Lys
Leu Lys 1400 1405 1410Gly Ser Pro Glu
Asp Asn Glu Gln Lys Gln Leu Phe Val Glu Gln 1415
1420 1425His Lys His Tyr Leu Asp Glu Ile Ile Glu Gln
Ile Ser Glu Phe 1430 1435 1440Ser Lys
Arg Val Ile Leu Ala Asp Ala Asn Leu Asp Lys Val Leu 1445
1450 1455Ser Ala Tyr Asn Lys His Arg Asp Lys Pro
Ile Arg Glu Gln Ala 1460 1465 1470Glu
Asn Ile Ile His Leu Phe Thr Leu Thr Asn Leu Gly Ala Pro 1475
1480 1485Ala Ala Phe Lys Tyr Phe Asp Thr Thr
Ile Asp Arg Lys Arg Tyr 1490 1495
1500Thr Ser Thr Lys Glu Val Leu Asp Ala Thr Leu Ile His Gln Ser
1505 1510 1515Ile Thr Gly Leu Tyr Glu
Thr Arg Ile Asp Leu Ser Gln Leu Gly 1520 1525
1530Gly Asp Ser Gly Gly Ser Thr Asn Leu Ser Asp Ile Ile Glu
Lys 1535 1540 1545Glu Thr Gly Lys Gln
Leu Val Ile Gln Glu Ser Ile Leu Met Leu 1550 1555
1560Pro Glu Glu Val Glu Glu Val Ile Gly Asn Lys Pro Glu
Ser Asp 1565 1570 1575Ile Leu Val His
Thr Ala Tyr Asp Glu Ser Thr Asp Glu Asn Val 1580
1585 1590Met Leu Leu Thr Ser Asp Ala Pro Glu Tyr Lys
Pro Trp Ala Leu 1595 1600 1605Val Ile
Gln Asp Ser Asn Gly Glu Asn Lys Ile Lys Met Leu Ser 1610
1615 1620Gly Gly Ser Pro Lys Lys Lys Arg Lys Val
1625 1630231620PRTArtificial SequenceN-rAPOBEC1
sDA2.3-nCas9-UGI-C 23Met Pro Cys Gly Glu Cys Ser Arg Ala Ile Thr Glu Phe
Leu Ser Arg1 5 10 15Tyr
Pro His Val Thr Leu Phe Ile Tyr Ile Ala Arg Leu Tyr His His 20
25 30Ala Asp Pro Arg Asn Arg Gln Gly
Leu Arg Asp Leu Ile Ser Ser Gly 35 40
45Val Thr Ile Gln Ile Met Thr Glu Gln Glu Ser Gly Tyr Cys Trp Arg
50 55 60Asn Phe Val Asn Tyr Ser Pro Ser
Asn Glu Ala His Trp Pro Arg Tyr65 70 75
80Pro His Leu Trp Val Arg Leu Tyr Val Leu Glu Leu Tyr
Cys Ile Ile 85 90 95Leu
Gly Leu Pro Pro Cys Leu Asn Ile Leu Arg Arg Lys Gln Pro Gln
100 105 110Leu Thr Phe Phe Thr Ile Ala
Leu Gln Ser Cys His Tyr Gln Arg Leu 115 120
125Pro Pro His Ile Leu Trp Ala Thr Gly Leu Lys Ser Gly Ser Glu
Thr 130 135 140Pro Gly Thr Ser Glu Ser
Ala Thr Pro Glu Ser Asp Lys Lys Tyr Ser145 150
155 160Ile Gly Leu Ala Ile Gly Thr Asn Ser Val Gly
Trp Ala Val Ile Thr 165 170
175Asp Glu Tyr Lys Val Pro Ser Lys Lys Phe Lys Val Leu Gly Asn Thr
180 185 190Asp Arg His Ser Ile Lys
Lys Asn Leu Ile Gly Ala Leu Leu Phe Asp 195 200
205Ser Gly Glu Thr Ala Glu Ala Thr Arg Leu Lys Arg Thr Ala
Arg Arg 210 215 220Arg Tyr Thr Arg Arg
Lys Asn Arg Ile Cys Tyr Leu Gln Glu Ile Phe225 230
235 240Ser Asn Glu Met Ala Lys Val Asp Asp Ser
Phe Phe His Arg Leu Glu 245 250
255Glu Ser Phe Leu Val Glu Glu Asp Lys Lys His Glu Arg His Pro Ile
260 265 270Phe Gly Asn Ile Val
Asp Glu Val Ala Tyr His Glu Lys Tyr Pro Thr 275
280 285Ile Tyr His Leu Arg Lys Lys Leu Val Asp Ser Thr
Asp Lys Ala Asp 290 295 300Leu Arg Leu
Ile Tyr Leu Ala Leu Ala His Met Ile Lys Phe Arg Gly305
310 315 320His Phe Leu Ile Glu Gly Asp
Leu Asn Pro Asp Asn Ser Asp Val Asp 325
330 335Lys Leu Phe Ile Gln Leu Val Gln Thr Tyr Asn Gln
Leu Phe Glu Glu 340 345 350Asn
Pro Ile Asn Ala Ser Gly Val Asp Ala Lys Ala Ile Leu Ser Ala 355
360 365Arg Leu Ser Lys Ser Arg Arg Leu Glu
Asn Leu Ile Ala Gln Leu Pro 370 375
380Gly Glu Lys Lys Asn Gly Leu Phe Gly Asn Leu Ile Ala Leu Ser Leu385
390 395 400Gly Leu Thr Pro
Asn Phe Lys Ser Asn Phe Asp Leu Ala Glu Asp Ala 405
410 415Lys Leu Gln Leu Ser Lys Asp Thr Tyr Asp
Asp Asp Leu Asp Asn Leu 420 425
430Leu Ala Gln Ile Gly Asp Gln Tyr Ala Asp Leu Phe Leu Ala Ala Lys
435 440 445Asn Leu Ser Asp Ala Ile Leu
Leu Ser Asp Ile Leu Arg Val Asn Thr 450 455
460Glu Ile Thr Lys Ala Pro Leu Ser Ala Ser Met Ile Lys Arg Tyr
Asp465 470 475 480Glu His
His Gln Asp Leu Thr Leu Leu Lys Ala Leu Val Arg Gln Gln
485 490 495Leu Pro Glu Lys Tyr Lys Glu
Ile Phe Phe Asp Gln Ser Lys Asn Gly 500 505
510Tyr Ala Gly Tyr Ile Asp Gly Gly Ala Ser Gln Glu Glu Phe
Tyr Lys 515 520 525Phe Ile Lys Pro
Ile Leu Glu Lys Met Asp Gly Thr Glu Glu Leu Leu 530
535 540Val Lys Leu Asn Arg Glu Asp Leu Leu Arg Lys Gln
Arg Thr Phe Asp545 550 555
560Asn Gly Ser Ile Pro His Gln Ile His Leu Gly Glu Leu His Ala Ile
565 570 575Leu Arg Arg Gln Glu
Asp Phe Tyr Pro Phe Leu Lys Asp Asn Arg Glu 580
585 590Lys Ile Glu Lys Ile Leu Thr Phe Arg Ile Pro Tyr
Tyr Val Gly Pro 595 600 605Leu Ala
Arg Gly Asn Ser Arg Phe Ala Trp Met Thr Arg Lys Ser Glu 610
615 620Glu Thr Ile Thr Pro Trp Asn Phe Glu Glu Val
Val Asp Lys Gly Ala625 630 635
640Ser Ala Gln Ser Phe Ile Glu Arg Met Thr Asn Phe Asp Lys Asn Leu
645 650 655Pro Asn Glu Lys
Val Leu Pro Lys His Ser Leu Leu Tyr Glu Tyr Phe 660
665 670Thr Val Tyr Asn Glu Leu Thr Lys Val Lys Tyr
Val Thr Glu Gly Met 675 680 685Arg
Lys Pro Ala Phe Leu Ser Gly Glu Gln Lys Lys Ala Ile Val Asp 690
695 700Leu Leu Phe Lys Thr Asn Arg Lys Val Thr
Val Lys Gln Leu Lys Glu705 710 715
720Asp Tyr Phe Lys Lys Ile Glu Cys Phe Asp Ser Val Glu Ile Ser
Gly 725 730 735Val Glu Asp
Arg Phe Asn Ala Ser Leu Gly Thr Tyr His Asp Leu Leu 740
745 750Lys Ile Ile Lys Asp Lys Asp Phe Leu Asp
Asn Glu Glu Asn Glu Asp 755 760
765Ile Leu Glu Asp Ile Val Leu Thr Leu Thr Leu Phe Glu Asp Arg Glu 770
775 780Met Ile Glu Glu Arg Leu Lys Thr
Tyr Ala His Leu Phe Asp Asp Lys785 790
795 800Val Met Lys Gln Leu Lys Arg Arg Arg Tyr Thr Gly
Trp Gly Arg Leu 805 810
815Ser Arg Lys Leu Ile Asn Gly Ile Arg Asp Lys Gln Ser Gly Lys Thr
820 825 830Ile Leu Asp Phe Leu Lys
Ser Asp Gly Phe Ala Asn Arg Asn Phe Met 835 840
845Gln Leu Ile His Asp Asp Ser Leu Thr Phe Lys Glu Asp Ile
Gln Lys 850 855 860Ala Gln Val Ser Gly
Gln Gly Asp Ser Leu His Glu His Ile Ala Asn865 870
875 880Leu Ala Gly Ser Pro Ala Ile Lys Lys Gly
Ile Leu Gln Thr Val Lys 885 890
895Val Val Asp Glu Leu Val Lys Val Met Gly Arg His Lys Pro Glu Asn
900 905 910Ile Val Ile Glu Met
Ala Arg Glu Asn Gln Thr Thr Gln Lys Gly Gln 915
920 925Lys Asn Ser Arg Glu Arg Met Lys Arg Ile Glu Glu
Gly Ile Lys Glu 930 935 940Leu Gly Ser
Gln Ile Leu Lys Glu His Pro Val Glu Asn Thr Gln Leu945
950 955 960Gln Asn Glu Lys Leu Tyr Leu
Tyr Tyr Leu Gln Asn Gly Arg Asp Met 965
970 975Tyr Val Asp Gln Glu Leu Asp Ile Asn Arg Leu Ser
Asp Tyr Asp Val 980 985 990Asp
His Ile Val Pro Gln Ser Phe Leu Lys Asp Asp Ser Ile Asp Asn 995
1000 1005Lys Val Leu Thr Arg Ser Asp Lys
Asn Arg Gly Lys Ser Asp Asn 1010 1015
1020Val Pro Ser Glu Glu Val Val Lys Lys Met Lys Asn Tyr Trp Arg
1025 1030 1035Gln Leu Leu Asn Ala Lys
Leu Ile Thr Gln Arg Lys Phe Asp Asn 1040 1045
1050Leu Thr Lys Ala Glu Arg Gly Gly Leu Ser Glu Leu Asp Lys
Ala 1055 1060 1065Gly Phe Ile Lys Arg
Gln Leu Val Glu Thr Arg Gln Ile Thr Lys 1070 1075
1080His Val Ala Gln Ile Leu Asp Ser Arg Met Asn Thr Lys
Tyr Asp 1085 1090 1095Glu Asn Asp Lys
Leu Ile Arg Glu Val Lys Val Ile Thr Leu Lys 1100
1105 1110Ser Lys Leu Val Ser Asp Phe Arg Lys Asp Phe
Gln Phe Tyr Lys 1115 1120 1125Val Arg
Glu Ile Asn Asn Tyr His His Ala His Asp Ala Tyr Leu 1130
1135 1140Asn Ala Val Val Gly Thr Ala Leu Ile Lys
Lys Tyr Pro Lys Leu 1145 1150 1155Glu
Ser Glu Phe Val Tyr Gly Asp Tyr Lys Val Tyr Asp Val Arg 1160
1165 1170Lys Met Ile Ala Lys Ser Glu Gln Glu
Ile Gly Lys Ala Thr Ala 1175 1180
1185Lys Tyr Phe Phe Tyr Ser Asn Ile Met Asn Phe Phe Lys Thr Glu
1190 1195 1200Ile Thr Leu Ala Asn Gly
Glu Ile Arg Lys Arg Pro Leu Ile Glu 1205 1210
1215Thr Asn Gly Glu Thr Gly Glu Ile Val Trp Asp Lys Gly Arg
Asp 1220 1225 1230Phe Ala Thr Val Arg
Lys Val Leu Ser Met Pro Gln Val Asn Ile 1235 1240
1245Val Lys Lys Thr Glu Val Gln Thr Gly Gly Phe Ser Lys
Glu Ser 1250 1255 1260Ile Leu Pro Lys
Arg Asn Ser Asp Lys Leu Ile Ala Arg Lys Lys 1265
1270 1275Asp Trp Asp Pro Lys Lys Tyr Gly Gly Phe Asp
Ser Pro Thr Val 1280 1285 1290Ala Tyr
Ser Val Leu Val Val Ala Lys Val Glu Lys Gly Lys Ser 1295
1300 1305Lys Lys Leu Lys Ser Val Lys Glu Leu Leu
Gly Ile Thr Ile Met 1310 1315 1320Glu
Arg Ser Ser Phe Glu Lys Asn Pro Ile Asp Phe Leu Glu Ala 1325
1330 1335Lys Gly Tyr Lys Glu Val Lys Lys Asp
Leu Ile Ile Lys Leu Pro 1340 1345
1350Lys Tyr Ser Leu Phe Glu Leu Glu Asn Gly Arg Lys Arg Met Leu
1355 1360 1365Ala Ser Ala Gly Glu Leu
Gln Lys Gly Asn Glu Leu Ala Leu Pro 1370 1375
1380Ser Lys Tyr Val Asn Phe Leu Tyr Leu Ala Ser His Tyr Glu
Lys 1385 1390 1395Leu Lys Gly Ser Pro
Glu Asp Asn Glu Gln Lys Gln Leu Phe Val 1400 1405
1410Glu Gln His Lys His Tyr Leu Asp Glu Ile Ile Glu Gln
Ile Ser 1415 1420 1425Glu Phe Ser Lys
Arg Val Ile Leu Ala Asp Ala Asn Leu Asp Lys 1430
1435 1440Val Leu Ser Ala Tyr Asn Lys His Arg Asp Lys
Pro Ile Arg Glu 1445 1450 1455Gln Ala
Glu Asn Ile Ile His Leu Phe Thr Leu Thr Asn Leu Gly 1460
1465 1470Ala Pro Ala Ala Phe Lys Tyr Phe Asp Thr
Thr Ile Asp Arg Lys 1475 1480 1485Arg
Tyr Thr Ser Thr Lys Glu Val Leu Asp Ala Thr Leu Ile His 1490
1495 1500Gln Ser Ile Thr Gly Leu Tyr Glu Thr
Arg Ile Asp Leu Ser Gln 1505 1510
1515Leu Gly Gly Asp Ser Gly Gly Ser Thr Asn Leu Ser Asp Ile Ile
1520 1525 1530Glu Lys Glu Thr Gly Lys
Gln Leu Val Ile Gln Glu Ser Ile Leu 1535 1540
1545Met Leu Pro Glu Glu Val Glu Glu Val Ile Gly Asn Lys Pro
Glu 1550 1555 1560Ser Asp Ile Leu Val
His Thr Ala Tyr Asp Glu Ser Thr Asp Glu 1565 1570
1575Asn Val Met Leu Leu Thr Ser Asp Ala Pro Glu Tyr Lys
Pro Trp 1580 1585 1590Ala Leu Val Ile
Gln Asp Ser Asn Gly Glu Asn Lys Ile Lys Met 1595
1600 1605Leu Ser Gly Gly Ser Pro Lys Lys Lys Arg Lys
Val 1610 1615 1620
SEQ ID NO: 24; Length: 1603; Type: PRT; Organism: Artificial Sequence; Other information: N-rAPOBEC1 sDA2.4-nCas9-UGI-C
Sequence 24:
Met His Val Thr Leu Phe Ile Tyr
Ile Ala Arg Leu Tyr His His Ala1 5 10
15Asp Pro Arg Asn Arg Gln Gly Leu Arg Asp Leu Ile Ser Ser
Gly Val 20 25 30Thr Ile Gln
Ile Met Thr Glu Gln Glu Ser Gly Tyr Cys Trp Arg Asn 35
40 45Phe Val Asn Tyr Ser Pro Ser Asn Glu Ala His
Trp Pro Arg Tyr Pro 50 55 60His Leu
Trp Val Arg Leu Tyr Val Leu Glu Leu Tyr Cys Ile Ile Leu65
70 75 80Gly Leu Pro Pro Cys Leu Asn
Ile Leu Arg Arg Lys Gln Pro Gln Leu 85 90
95Thr Phe Phe Thr Ile Ala Leu Gln Ser Cys His Tyr Gln
Arg Leu Pro 100 105 110Pro His
Ile Leu Trp Ala Thr Gly Leu Lys Ser Gly Ser Glu Thr Pro 115
120 125Gly Thr Ser Glu Ser Ala Thr Pro Glu Ser
Asp Lys Lys Tyr Ser Ile 130 135 140Gly
Leu Ala Ile Gly Thr Asn Ser Val Gly Trp Ala Val Ile Thr Asp145
150 155 160Glu Tyr Lys Val Pro Ser
Lys Lys Phe Lys Val Leu Gly Asn Thr Asp 165
170 175Arg His Ser Ile Lys Lys Asn Leu Ile Gly Ala Leu
Leu Phe Asp Ser 180 185 190Gly
Glu Thr Ala Glu Ala Thr Arg Leu Lys Arg Thr Ala Arg Arg Arg 195
200 205Tyr Thr Arg Arg Lys Asn Arg Ile Cys
Tyr Leu Gln Glu Ile Phe Ser 210 215
220Asn Glu Met Ala Lys Val Asp Asp Ser Phe Phe His Arg Leu Glu Glu225
230 235 240Ser Phe Leu Val
Glu Glu Asp Lys Lys His Glu Arg His Pro Ile Phe 245
250 255Gly Asn Ile Val Asp Glu Val Ala Tyr His
Glu Lys Tyr Pro Thr Ile 260 265
270Tyr His Leu Arg Lys Lys Leu Val Asp Ser Thr Asp Lys Ala Asp Leu
275 280 285Arg Leu Ile Tyr Leu Ala Leu
Ala His Met Ile Lys Phe Arg Gly His 290 295
300Phe Leu Ile Glu Gly Asp Leu Asn Pro Asp Asn Ser Asp Val Asp
Lys305 310 315 320Leu Phe
Ile Gln Leu Val Gln Thr Tyr Asn Gln Leu Phe Glu Glu Asn
325 330 335Pro Ile Asn Ala Ser Gly Val
Asp Ala Lys Ala Ile Leu Ser Ala Arg 340 345
350Leu Ser Lys Ser Arg Arg Leu Glu Asn Leu Ile Ala Gln Leu
Pro Gly 355 360 365Glu Lys Lys Asn
Gly Leu Phe Gly Asn Leu Ile Ala Leu Ser Leu Gly 370
375 380Leu Thr Pro Asn Phe Lys Ser Asn Phe Asp Leu Ala
Glu Asp Ala Lys385 390 395
400Leu Gln Leu Ser Lys Asp Thr Tyr Asp Asp Asp Leu Asp Asn Leu Leu
405 410 415Ala Gln Ile Gly Asp
Gln Tyr Ala Asp Leu Phe Leu Ala Ala Lys Asn 420
425 430Leu Ser Asp Ala Ile Leu Leu Ser Asp Ile Leu Arg
Val Asn Thr Glu 435 440 445Ile Thr
Lys Ala Pro Leu Ser Ala Ser Met Ile Lys Arg Tyr Asp Glu 450
455 460His His Gln Asp Leu Thr Leu Leu Lys Ala Leu
Val Arg Gln Gln Leu465 470 475
480Pro Glu Lys Tyr Lys Glu Ile Phe Phe Asp Gln Ser Lys Asn Gly Tyr
485 490 495Ala Gly Tyr Ile
Asp Gly Gly Ala Ser Gln Glu Glu Phe Tyr Lys Phe 500
505 510Ile Lys Pro Ile Leu Glu Lys Met Asp Gly Thr
Glu Glu Leu Leu Val 515 520 525Lys
Leu Asn Arg Glu Asp Leu Leu Arg Lys Gln Arg Thr Phe Asp Asn 530
535 540Gly Ser Ile Pro His Gln Ile His Leu Gly
Glu Leu His Ala Ile Leu545 550 555
560Arg Arg Gln Glu Asp Phe Tyr Pro Phe Leu Lys Asp Asn Arg Glu
Lys 565 570 575Ile Glu Lys
Ile Leu Thr Phe Arg Ile Pro Tyr Tyr Val Gly Pro Leu 580
585 590Ala Arg Gly Asn Ser Arg Phe Ala Trp Met
Thr Arg Lys Ser Glu Glu 595 600
605Thr Ile Thr Pro Trp Asn Phe Glu Glu Val Val Asp Lys Gly Ala Ser 610
615 620Ala Gln Ser Phe Ile Glu Arg Met
Thr Asn Phe Asp Lys Asn Leu Pro625 630
635 640Asn Glu Lys Val Leu Pro Lys His Ser Leu Leu Tyr
Glu Tyr Phe Thr 645 650
655Val Tyr Asn Glu Leu Thr Lys Val Lys Tyr Val Thr Glu Gly Met Arg
660 665 670Lys Pro Ala Phe Leu Ser
Gly Glu Gln Lys Lys Ala Ile Val Asp Leu 675 680
685Leu Phe Lys Thr Asn Arg Lys Val Thr Val Lys Gln Leu Lys
Glu Asp 690 695 700Tyr Phe Lys Lys Ile
Glu Cys Phe Asp Ser Val Glu Ile Ser Gly Val705 710
715 720Glu Asp Arg Phe Asn Ala Ser Leu Gly Thr
Tyr His Asp Leu Leu Lys 725 730
735Ile Ile Lys Asp Lys Asp Phe Leu Asp Asn Glu Glu Asn Glu Asp Ile
740 745 750Leu Glu Asp Ile Val
Leu Thr Leu Thr Leu Phe Glu Asp Arg Glu Met 755
760 765Ile Glu Glu Arg Leu Lys Thr Tyr Ala His Leu Phe
Asp Asp Lys Val 770 775 780Met Lys Gln
Leu Lys Arg Arg Arg Tyr Thr Gly Trp Gly Arg Leu Ser785
790 795 800Arg Lys Leu Ile Asn Gly Ile
Arg Asp Lys Gln Ser Gly Lys Thr Ile 805
810 815Leu Asp Phe Leu Lys Ser Asp Gly Phe Ala Asn Arg
Asn Phe Met Gln 820 825 830Leu
Ile His Asp Asp Ser Leu Thr Phe Lys Glu Asp Ile Gln Lys Ala 835
840 845Gln Val Ser Gly Gln Gly Asp Ser Leu
His Glu His Ile Ala Asn Leu 850 855
860Ala Gly Ser Pro Ala Ile Lys Lys Gly Ile Leu Gln Thr Val Lys Val865
870 875 880Val Asp Glu Leu
Val Lys Val Met Gly Arg His Lys Pro Glu Asn Ile 885
890 895Val Ile Glu Met Ala Arg Glu Asn Gln Thr
Thr Gln Lys Gly Gln Lys 900 905
910Asn Ser Arg Glu Arg Met Lys Arg Ile Glu Glu Gly Ile Lys Glu Leu
915 920 925Gly Ser Gln Ile Leu Lys Glu
His Pro Val Glu Asn Thr Gln Leu Gln 930 935
940Asn Glu Lys Leu Tyr Leu Tyr Tyr Leu Gln Asn Gly Arg Asp Met
Tyr945 950 955 960Val Asp
Gln Glu Leu Asp Ile Asn Arg Leu Ser Asp Tyr Asp Val Asp
965 970 975His Ile Val Pro Gln Ser Phe
Leu Lys Asp Asp Ser Ile Asp Asn Lys 980 985
990Val Leu Thr Arg Ser Asp Lys Asn Arg Gly Lys Ser Asp Asn
Val Pro 995 1000 1005Ser Glu Glu
Val Val Lys Lys Met Lys Asn Tyr Trp Arg Gln Leu 1010
1015 1020Leu Asn Ala Lys Leu Ile Thr Gln Arg Lys Phe
Asp Asn Leu Thr 1025 1030 1035Lys Ala
Glu Arg Gly Gly Leu Ser Glu Leu Asp Lys Ala Gly Phe 1040
1045 1050Ile Lys Arg Gln Leu Val Glu Thr Arg Gln
Ile Thr Lys His Val 1055 1060 1065Ala
Gln Ile Leu Asp Ser Arg Met Asn Thr Lys Tyr Asp Glu Asn 1070
1075 1080Asp Lys Leu Ile Arg Glu Val Lys Val
Ile Thr Leu Lys Ser Lys 1085 1090
1095Leu Val Ser Asp Phe Arg Lys Asp Phe Gln Phe Tyr Lys Val Arg
1100 1105 1110Glu Ile Asn Asn Tyr His
His Ala His Asp Ala Tyr Leu Asn Ala 1115 1120
1125Val Val Gly Thr Ala Leu Ile Lys Lys Tyr Pro Lys Leu Glu
Ser 1130 1135 1140Glu Phe Val Tyr Gly
Asp Tyr Lys Val Tyr Asp Val Arg Lys Met 1145 1150
1155Ile Ala Lys Ser Glu Gln Glu Ile Gly Lys Ala Thr Ala
Lys Tyr 1160 1165 1170Phe Phe Tyr Ser
Asn Ile Met Asn Phe Phe Lys Thr Glu Ile Thr 1175
1180 1185Leu Ala Asn Gly Glu Ile Arg Lys Arg Pro Leu
Ile Glu Thr Asn 1190 1195 1200Gly Glu
Thr Gly Glu Ile Val Trp Asp Lys Gly Arg Asp Phe Ala 1205
1210 1215Thr Val Arg Lys Val Leu Ser Met Pro Gln
Val Asn Ile Val Lys 1220 1225 1230Lys
Thr Glu Val Gln Thr Gly Gly Phe Ser Lys Glu Ser Ile Leu 1235
1240 1245Pro Lys Arg Asn Ser Asp Lys Leu Ile
Ala Arg Lys Lys Asp Trp 1250 1255
1260Asp Pro Lys Lys Tyr Gly Gly Phe Asp Ser Pro Thr Val Ala Tyr
1265 1270 1275Ser Val Leu Val Val Ala
Lys Val Glu Lys Gly Lys Ser Lys Lys 1280 1285
1290Leu Lys Ser Val Lys Glu Leu Leu Gly Ile Thr Ile Met Glu
Arg 1295 1300 1305Ser Ser Phe Glu Lys
Asn Pro Ile Asp Phe Leu Glu Ala Lys Gly 1310 1315
1320Tyr Lys Glu Val Lys Lys Asp Leu Ile Ile Lys Leu Pro
Lys Tyr 1325 1330 1335Ser Leu Phe Glu
Leu Glu Asn Gly Arg Lys Arg Met Leu Ala Ser 1340
1345 1350Ala Gly Glu Leu Gln Lys Gly Asn Glu Leu Ala
Leu Pro Ser Lys 1355 1360 1365Tyr Val
Asn Phe Leu Tyr Leu Ala Ser His Tyr Glu Lys Leu Lys 1370
1375 1380Gly Ser Pro Glu Asp Asn Glu Gln Lys Gln
Leu Phe Val Glu Gln 1385 1390 1395His
Lys His Tyr Leu Asp Glu Ile Ile Glu Gln Ile Ser Glu Phe 1400
1405 1410Ser Lys Arg Val Ile Leu Ala Asp Ala
Asn Leu Asp Lys Val Leu 1415 1420
1425Ser Ala Tyr Asn Lys His Arg Asp Lys Pro Ile Arg Glu Gln Ala
1430 1435 1440Glu Asn Ile Ile His Leu
Phe Thr Leu Thr Asn Leu Gly Ala Pro 1445 1450
1455Ala Ala Phe Lys Tyr Phe Asp Thr Thr Ile Asp Arg Lys Arg
Tyr 1460 1465 1470Thr Ser Thr Lys Glu
Val Leu Asp Ala Thr Leu Ile His Gln Ser 1475 1480
1485Ile Thr Gly Leu Tyr Glu Thr Arg Ile Asp Leu Ser Gln
Leu Gly 1490 1495 1500Gly Asp Ser Gly
Gly Ser Thr Asn Leu Ser Asp Ile Ile Glu Lys 1505
1510 1515Glu Thr Gly Lys Gln Leu Val Ile Gln Glu Ser
Ile Leu Met Leu 1520 1525 1530Pro Glu
Glu Val Glu Glu Val Ile Gly Asn Lys Pro Glu Ser Asp 1535
1540 1545Ile Leu Val His Thr Ala Tyr Asp Glu Ser
Thr Asp Glu Asn Val 1550 1555 1560Met
Leu Leu Thr Ser Asp Ala Pro Glu Tyr Lys Pro Trp Ala Leu 1565
1570 1575Val Ile Gln Asp Ser Asn Gly Glu Asn
Lys Ile Lys Met Leu Ser 1580 1585
1590Gly Gly Ser Pro Lys Lys Lys Arg Lys Val 1595
1600
SEQ ID NO: 25; Length: 1567; Type: PRT; Organism: Artificial Sequence; Other information: N-rAPOBEC1 sDA2.5-nCas9-UGI-C
Sequence 25:
Met Thr
Glu Gln Glu Ser Gly Tyr Cys Trp Arg Asn Phe Val Asn Tyr1 5
10 15Ser Pro Ser Asn Glu Ala His Trp
Pro Arg Tyr Pro His Leu Trp Val 20 25
30Arg Leu Tyr Val Leu Glu Leu Tyr Cys Ile Ile Leu Gly Leu Pro
Pro 35 40 45Cys Leu Asn Ile Leu
Arg Arg Lys Gln Pro Gln Leu Thr Phe Phe Thr 50 55
60Ile Ala Leu Gln Ser Cys His Tyr Gln Arg Leu Pro Pro His
Ile Leu65 70 75 80Trp
Ala Thr Gly Leu Lys Ser Gly Ser Glu Thr Pro Gly Thr Ser Glu
85 90 95Ser Ala Thr Pro Glu Ser Asp
Lys Lys Tyr Ser Ile Gly Leu Ala Ile 100 105
110Gly Thr Asn Ser Val Gly Trp Ala Val Ile Thr Asp Glu Tyr
Lys Val 115 120 125Pro Ser Lys Lys
Phe Lys Val Leu Gly Asn Thr Asp Arg His Ser Ile 130
135 140Lys Lys Asn Leu Ile Gly Ala Leu Leu Phe Asp Ser
Gly Glu Thr Ala145 150 155
160Glu Ala Thr Arg Leu Lys Arg Thr Ala Arg Arg Arg Tyr Thr Arg Arg
165 170 175Lys Asn Arg Ile Cys
Tyr Leu Gln Glu Ile Phe Ser Asn Glu Met Ala 180
185 190Lys Val Asp Asp Ser Phe Phe His Arg Leu Glu Glu
Ser Phe Leu Val 195 200 205Glu Glu
Asp Lys Lys His Glu Arg His Pro Ile Phe Gly Asn Ile Val 210
215 220Asp Glu Val Ala Tyr His Glu Lys Tyr Pro Thr
Ile Tyr His Leu Arg225 230 235
240Lys Lys Leu Val Asp Ser Thr Asp Lys Ala Asp Leu Arg Leu Ile Tyr
245 250 255Leu Ala Leu Ala
His Met Ile Lys Phe Arg Gly His Phe Leu Ile Glu 260
265 270Gly Asp Leu Asn Pro Asp Asn Ser Asp Val Asp
Lys Leu Phe Ile Gln 275 280 285Leu
Val Gln Thr Tyr Asn Gln Leu Phe Glu Glu Asn Pro Ile Asn Ala 290
295 300Ser Gly Val Asp Ala Lys Ala Ile Leu Ser
Ala Arg Leu Ser Lys Ser305 310 315
320Arg Arg Leu Glu Asn Leu Ile Ala Gln Leu Pro Gly Glu Lys Lys
Asn 325 330 335Gly Leu Phe
Gly Asn Leu Ile Ala Leu Ser Leu Gly Leu Thr Pro Asn 340
345 350Phe Lys Ser Asn Phe Asp Leu Ala Glu Asp
Ala Lys Leu Gln Leu Ser 355 360
365Lys Asp Thr Tyr Asp Asp Asp Leu Asp Asn Leu Leu Ala Gln Ile Gly 370
375 380Asp Gln Tyr Ala Asp Leu Phe Leu
Ala Ala Lys Asn Leu Ser Asp Ala385 390
395 400Ile Leu Leu Ser Asp Ile Leu Arg Val Asn Thr Glu
Ile Thr Lys Ala 405 410
415Pro Leu Ser Ala Ser Met Ile Lys Arg Tyr Asp Glu His His Gln Asp
420 425 430Leu Thr Leu Leu Lys Ala
Leu Val Arg Gln Gln Leu Pro Glu Lys Tyr 435 440
445Lys Glu Ile Phe Phe Asp Gln Ser Lys Asn Gly Tyr Ala Gly
Tyr Ile 450 455 460Asp Gly Gly Ala Ser
Gln Glu Glu Phe Tyr Lys Phe Ile Lys Pro Ile465 470
475 480Leu Glu Lys Met Asp Gly Thr Glu Glu Leu
Leu Val Lys Leu Asn Arg 485 490
495Glu Asp Leu Leu Arg Lys Gln Arg Thr Phe Asp Asn Gly Ser Ile Pro
500 505 510His Gln Ile His Leu
Gly Glu Leu His Ala Ile Leu Arg Arg Gln Glu 515
520 525Asp Phe Tyr Pro Phe Leu Lys Asp Asn Arg Glu Lys
Ile Glu Lys Ile 530 535 540Leu Thr Phe
Arg Ile Pro Tyr Tyr Val Gly Pro Leu Ala Arg Gly Asn545
550 555 560Ser Arg Phe Ala Trp Met Thr
Arg Lys Ser Glu Glu Thr Ile Thr Pro 565
570 575Trp Asn Phe Glu Glu Val Val Asp Lys Gly Ala Ser
Ala Gln Ser Phe 580 585 590Ile
Glu Arg Met Thr Asn Phe Asp Lys Asn Leu Pro Asn Glu Lys Val 595
600 605Leu Pro Lys His Ser Leu Leu Tyr Glu
Tyr Phe Thr Val Tyr Asn Glu 610 615
620Leu Thr Lys Val Lys Tyr Val Thr Glu Gly Met Arg Lys Pro Ala Phe625
630 635 640Leu Ser Gly Glu
Gln Lys Lys Ala Ile Val Asp Leu Leu Phe Lys Thr 645
650 655Asn Arg Lys Val Thr Val Lys Gln Leu Lys
Glu Asp Tyr Phe Lys Lys 660 665
670Ile Glu Cys Phe Asp Ser Val Glu Ile Ser Gly Val Glu Asp Arg Phe
675 680 685Asn Ala Ser Leu Gly Thr Tyr
His Asp Leu Leu Lys Ile Ile Lys Asp 690 695
700Lys Asp Phe Leu Asp Asn Glu Glu Asn Glu Asp Ile Leu Glu Asp
Ile705 710 715 720Val Leu
Thr Leu Thr Leu Phe Glu Asp Arg Glu Met Ile Glu Glu Arg
725 730 735Leu Lys Thr Tyr Ala His Leu
Phe Asp Asp Lys Val Met Lys Gln Leu 740 745
750Lys Arg Arg Arg Tyr Thr Gly Trp Gly Arg Leu Ser Arg Lys
Leu Ile 755 760 765Asn Gly Ile Arg
Asp Lys Gln Ser Gly Lys Thr Ile Leu Asp Phe Leu 770
775 780Lys Ser Asp Gly Phe Ala Asn Arg Asn Phe Met Gln
Leu Ile His Asp785 790 795
800Asp Ser Leu Thr Phe Lys Glu Asp Ile Gln Lys Ala Gln Val Ser Gly
805 810 815Gln Gly Asp Ser Leu
His Glu His Ile Ala Asn Leu Ala Gly Ser Pro 820
825 830Ala Ile Lys Lys Gly Ile Leu Gln Thr Val Lys Val
Val Asp Glu Leu 835 840 845Val Lys
Val Met Gly Arg His Lys Pro Glu Asn Ile Val Ile Glu Met 850
855 860Ala Arg Glu Asn Gln Thr Thr Gln Lys Gly Gln
Lys Asn Ser Arg Glu865 870 875
880Arg Met Lys Arg Ile Glu Glu Gly Ile Lys Glu Leu Gly Ser Gln Ile
885 890 895Leu Lys Glu His
Pro Val Glu Asn Thr Gln Leu Gln Asn Glu Lys Leu 900
905 910Tyr Leu Tyr Tyr Leu Gln Asn Gly Arg Asp Met
Tyr Val Asp Gln Glu 915 920 925Leu
Asp Ile Asn Arg Leu Ser Asp Tyr Asp Val Asp His Ile Val Pro 930
935 940Gln Ser Phe Leu Lys Asp Asp Ser Ile Asp
Asn Lys Val Leu Thr Arg945 950 955
960Ser Asp Lys Asn Arg Gly Lys Ser Asp Asn Val Pro Ser Glu Glu
Val 965 970 975Val Lys Lys
Met Lys Asn Tyr Trp Arg Gln Leu Leu Asn Ala Lys Leu 980
985 990Ile Thr Gln Arg Lys Phe Asp Asn Leu Thr
Lys Ala Glu Arg Gly Gly 995 1000
1005Leu Ser Glu Leu Asp Lys Ala Gly Phe Ile Lys Arg Gln Leu Val
1010 1015 1020Glu Thr Arg Gln Ile Thr
Lys His Val Ala Gln Ile Leu Asp Ser 1025 1030
1035Arg Met Asn Thr Lys Tyr Asp Glu Asn Asp Lys Leu Ile Arg
Glu 1040 1045 1050Val Lys Val Ile Thr
Leu Lys Ser Lys Leu Val Ser Asp Phe Arg 1055 1060
1065Lys Asp Phe Gln Phe Tyr Lys Val Arg Glu Ile Asn Asn
Tyr His 1070 1075 1080His Ala His Asp
Ala Tyr Leu Asn Ala Val Val Gly Thr Ala Leu 1085
1090 1095Ile Lys Lys Tyr Pro Lys Leu Glu Ser Glu Phe
Val Tyr Gly Asp 1100 1105 1110Tyr Lys
Val Tyr Asp Val Arg Lys Met Ile Ala Lys Ser Glu Gln 1115
1120 1125Glu Ile Gly Lys Ala Thr Ala Lys Tyr Phe
Phe Tyr Ser Asn Ile 1130 1135 1140Met
Asn Phe Phe Lys Thr Glu Ile Thr Leu Ala Asn Gly Glu Ile 1145
1150 1155Arg Lys Arg Pro Leu Ile Glu Thr Asn
Gly Glu Thr Gly Glu Ile 1160 1165
1170Val Trp Asp Lys Gly Arg Asp Phe Ala Thr Val Arg Lys Val Leu
1175 1180 1185Ser Met Pro Gln Val Asn
Ile Val Lys Lys Thr Glu Val Gln Thr 1190 1195
1200Gly Gly Phe Ser Lys Glu Ser Ile Leu Pro Lys Arg Asn Ser
Asp 1205 1210 1215Lys Leu Ile Ala Arg
Lys Lys Asp Trp Asp Pro Lys Lys Tyr Gly 1220 1225
1230Gly Phe Asp Ser Pro Thr Val Ala Tyr Ser Val Leu Val
Val Ala 1235 1240 1245Lys Val Glu Lys
Gly Lys Ser Lys Lys Leu Lys Ser Val Lys Glu 1250
1255 1260Leu Leu Gly Ile Thr Ile Met Glu Arg Ser Ser
Phe Glu Lys Asn 1265 1270 1275Pro Ile
Asp Phe Leu Glu Ala Lys Gly Tyr Lys Glu Val Lys Lys 1280
1285 1290Asp Leu Ile Ile Lys Leu Pro Lys Tyr Ser
Leu Phe Glu Leu Glu 1295 1300 1305Asn
Gly Arg Lys Arg Met Leu Ala Ser Ala Gly Glu Leu Gln Lys 1310
1315 1320Gly Asn Glu Leu Ala Leu Pro Ser Lys
Tyr Val Asn Phe Leu Tyr 1325 1330
1335Leu Ala Ser His Tyr Glu Lys Leu Lys Gly Ser Pro Glu Asp Asn
1340 1345 1350Glu Gln Lys Gln Leu Phe
Val Glu Gln His Lys His Tyr Leu Asp 1355 1360
1365Glu Ile Ile Glu Gln Ile Ser Glu Phe Ser Lys Arg Val Ile
Leu 1370 1375 1380Ala Asp Ala Asn Leu
Asp Lys Val Leu Ser Ala Tyr Asn Lys His 1385 1390
1395Arg Asp Lys Pro Ile Arg Glu Gln Ala Glu Asn Ile Ile
His Leu 1400 1405 1410Phe Thr Leu Thr
Asn Leu Gly Ala Pro Ala Ala Phe Lys Tyr Phe 1415
1420 1425Asp Thr Thr Ile Asp Arg Lys Arg Tyr Thr Ser
Thr Lys Glu Val 1430 1435 1440Leu Asp
Ala Thr Leu Ile His Gln Ser Ile Thr Gly Leu Tyr Glu 1445
1450 1455Thr Arg Ile Asp Leu Ser Gln Leu Gly Gly
Asp Ser Gly Gly Ser 1460 1465 1470Thr
Asn Leu Ser Asp Ile Ile Glu Lys Glu Thr Gly Lys Gln Leu 1475
1480 1485Val Ile Gln Glu Ser Ile Leu Met Leu
Pro Glu Glu Val Glu Glu 1490 1495
1500Val Ile Gly Asn Lys Pro Glu Ser Asp Ile Leu Val His Thr Ala
1505 1510 1515Tyr Asp Glu Ser Thr Asp
Glu Asn Val Met Leu Leu Thr Ser Asp 1520 1525
1530Ala Pro Glu Tyr Lys Pro Trp Ala Leu Val Ile Gln Asp Ser
Asn 1535 1540 1545Gly Glu Asn Lys Ile
Lys Met Leu Ser Gly Gly Ser Pro Lys Lys 1550 1555
1560Lys Arg Lys Val 1565
SEQ ID NO: 26; Length: 1551; Type: PRT; Organism: Artificial Sequence; Other information: N-rAPOBEC1 sDA2.6-nCas9-UGI-C
Sequence 26:
Met Pro Ser Asn Glu Ala His Trp
Pro Arg Tyr Pro His Leu Trp Val1 5 10
15Arg Leu Tyr Val Leu Glu Leu Tyr Cys Ile Ile Leu Gly Leu
Pro Pro 20 25 30Cys Leu Asn
Ile Leu Arg Arg Lys Gln Pro Gln Leu Thr Phe Phe Thr 35
40 45Ile Ala Leu Gln Ser Cys His Tyr Gln Arg Leu
Pro Pro His Ile Leu 50 55 60Trp Ala
Thr Gly Leu Lys Ser Gly Ser Glu Thr Pro Gly Thr Ser Glu65
70 75 80Ser Ala Thr Pro Glu Ser Asp
Lys Lys Tyr Ser Ile Gly Leu Ala Ile 85 90
95Gly Thr Asn Ser Val Gly Trp Ala Val Ile Thr Asp Glu
Tyr Lys Val 100 105 110Pro Ser
Lys Lys Phe Lys Val Leu Gly Asn Thr Asp Arg His Ser Ile 115
120 125Lys Lys Asn Leu Ile Gly Ala Leu Leu Phe
Asp Ser Gly Glu Thr Ala 130 135 140Glu
Ala Thr Arg Leu Lys Arg Thr Ala Arg Arg Arg Tyr Thr Arg Arg145
150 155 160Lys Asn Arg Ile Cys Tyr
Leu Gln Glu Ile Phe Ser Asn Glu Met Ala 165
170 175Lys Val Asp Asp Ser Phe Phe His Arg Leu Glu Glu
Ser Phe Leu Val 180 185 190Glu
Glu Asp Lys Lys His Glu Arg His Pro Ile Phe Gly Asn Ile Val 195
200 205Asp Glu Val Ala Tyr His Glu Lys Tyr
Pro Thr Ile Tyr His Leu Arg 210 215
220Lys Lys Leu Val Asp Ser Thr Asp Lys Ala Asp Leu Arg Leu Ile Tyr225
230 235 240Leu Ala Leu Ala
His Met Ile Lys Phe Arg Gly His Phe Leu Ile Glu 245
250 255Gly Asp Leu Asn Pro Asp Asn Ser Asp Val
Asp Lys Leu Phe Ile Gln 260 265
270Leu Val Gln Thr Tyr Asn Gln Leu Phe Glu Glu Asn Pro Ile Asn Ala
275 280 285Ser Gly Val Asp Ala Lys Ala
Ile Leu Ser Ala Arg Leu Ser Lys Ser 290 295
300Arg Arg Leu Glu Asn Leu Ile Ala Gln Leu Pro Gly Glu Lys Lys
Asn305 310 315 320Gly Leu
Phe Gly Asn Leu Ile Ala Leu Ser Leu Gly Leu Thr Pro Asn
325 330 335Phe Lys Ser Asn Phe Asp Leu
Ala Glu Asp Ala Lys Leu Gln Leu Ser 340 345
350Lys Asp Thr Tyr Asp Asp Asp Leu Asp Asn Leu Leu Ala Gln
Ile Gly 355 360 365Asp Gln Tyr Ala
Asp Leu Phe Leu Ala Ala Lys Asn Leu Ser Asp Ala 370
375 380Ile Leu Leu Ser Asp Ile Leu Arg Val Asn Thr Glu
Ile Thr Lys Ala385 390 395
400Pro Leu Ser Ala Ser Met Ile Lys Arg Tyr Asp Glu His His Gln Asp
405 410 415Leu Thr Leu Leu Lys
Ala Leu Val Arg Gln Gln Leu Pro Glu Lys Tyr 420
425 430Lys Glu Ile Phe Phe Asp Gln Ser Lys Asn Gly Tyr
Ala Gly Tyr Ile 435 440 445Asp Gly
Gly Ala Ser Gln Glu Glu Phe Tyr Lys Phe Ile Lys Pro Ile 450
455 460Leu Glu Lys Met Asp Gly Thr Glu Glu Leu Leu
Val Lys Leu Asn Arg465 470 475
480Glu Asp Leu Leu Arg Lys Gln Arg Thr Phe Asp Asn Gly Ser Ile Pro
485 490 495His Gln Ile His
Leu Gly Glu Leu His Ala Ile Leu Arg Arg Gln Glu 500
505 510Asp Phe Tyr Pro Phe Leu Lys Asp Asn Arg Glu
Lys Ile Glu Lys Ile 515 520 525Leu
Thr Phe Arg Ile Pro Tyr Tyr Val Gly Pro Leu Ala Arg Gly Asn 530
535 540Ser Arg Phe Ala Trp Met Thr Arg Lys Ser
Glu Glu Thr Ile Thr Pro545 550 555
560Trp Asn Phe Glu Glu Val Val Asp Lys Gly Ala Ser Ala Gln Ser
Phe 565 570 575Ile Glu Arg
Met Thr Asn Phe Asp Lys Asn Leu Pro Asn Glu Lys Val 580
585 590Leu Pro Lys His Ser Leu Leu Tyr Glu Tyr
Phe Thr Val Tyr Asn Glu 595 600
605Leu Thr Lys Val Lys Tyr Val Thr Glu Gly Met Arg Lys Pro Ala Phe 610
615 620Leu Ser Gly Glu Gln Lys Lys Ala
Ile Val Asp Leu Leu Phe Lys Thr625 630
635 640Asn Arg Lys Val Thr Val Lys Gln Leu Lys Glu Asp
Tyr Phe Lys Lys 645 650
655Ile Glu Cys Phe Asp Ser Val Glu Ile Ser Gly Val Glu Asp Arg Phe
660 665 670Asn Ala Ser Leu Gly Thr
Tyr His Asp Leu Leu Lys Ile Ile Lys Asp 675 680
685Lys Asp Phe Leu Asp Asn Glu Glu Asn Glu Asp Ile Leu Glu
Asp Ile 690 695 700Val Leu Thr Leu Thr
Leu Phe Glu Asp Arg Glu Met Ile Glu Glu Arg705 710
715 720Leu Lys Thr Tyr Ala His Leu Phe Asp Asp
Lys Val Met Lys Gln Leu 725 730
735Lys Arg Arg Arg Tyr Thr Gly Trp Gly Arg Leu Ser Arg Lys Leu Ile
740 745 750Asn Gly Ile Arg Asp
Lys Gln Ser Gly Lys Thr Ile Leu Asp Phe Leu 755
760 765Lys Ser Asp Gly Phe Ala Asn Arg Asn Phe Met Gln
Leu Ile His Asp 770 775 780Asp Ser Leu
Thr Phe Lys Glu Asp Ile Gln Lys Ala Gln Val Ser Gly785
790 795 800Gln Gly Asp Ser Leu His Glu
His Ile Ala Asn Leu Ala Gly Ser Pro 805
810 815Ala Ile Lys Lys Gly Ile Leu Gln Thr Val Lys Val
Val Asp Glu Leu 820 825 830Val
Lys Val Met Gly Arg His Lys Pro Glu Asn Ile Val Ile Glu Met 835
840 845Ala Arg Glu Asn Gln Thr Thr Gln Lys
Gly Gln Lys Asn Ser Arg Glu 850 855
860Arg Met Lys Arg Ile Glu Glu Gly Ile Lys Glu Leu Gly Ser Gln Ile865
870 875 880Leu Lys Glu His
Pro Val Glu Asn Thr Gln Leu Gln Asn Glu Lys Leu 885
890 895Tyr Leu Tyr Tyr Leu Gln Asn Gly Arg Asp
Met Tyr Val Asp Gln Glu 900 905
910Leu Asp Ile Asn Arg Leu Ser Asp Tyr Asp Val Asp His Ile Val Pro
915 920 925Gln Ser Phe Leu Lys Asp Asp
Ser Ile Asp Asn Lys Val Leu Thr Arg 930 935
940Ser Asp Lys Asn Arg Gly Lys Ser Asp Asn Val Pro Ser Glu Glu
Val945 950 955 960Val Lys
Lys Met Lys Asn Tyr Trp Arg Gln Leu Leu Asn Ala Lys Leu
965 970 975Ile Thr Gln Arg Lys Phe Asp
Asn Leu Thr Lys Ala Glu Arg Gly Gly 980 985
990Leu Ser Glu Leu Asp Lys Ala Gly Phe Ile Lys Arg Gln Leu
Val Glu 995 1000 1005Thr Arg Gln
Ile Thr Lys His Val Ala Gln Ile Leu Asp Ser Arg 1010
1015 1020Met Asn Thr Lys Tyr Asp Glu Asn Asp Lys Leu
Ile Arg Glu Val 1025 1030 1035Lys Val
Ile Thr Leu Lys Ser Lys Leu Val Ser Asp Phe Arg Lys 1040
1045 1050Asp Phe Gln Phe Tyr Lys Val Arg Glu Ile
Asn Asn Tyr His His 1055 1060 1065Ala
His Asp Ala Tyr Leu Asn Ala Val Val Gly Thr Ala Leu Ile 1070
1075 1080Lys Lys Tyr Pro Lys Leu Glu Ser Glu
Phe Val Tyr Gly Asp Tyr 1085 1090
1095Lys Val Tyr Asp Val Arg Lys Met Ile Ala Lys Ser Glu Gln Glu
1100 1105 1110Ile Gly Lys Ala Thr Ala
Lys Tyr Phe Phe Tyr Ser Asn Ile Met 1115 1120
1125Asn Phe Phe Lys Thr Glu Ile Thr Leu Ala Asn Gly Glu Ile
Arg 1130 1135 1140Lys Arg Pro Leu Ile
Glu Thr Asn Gly Glu Thr Gly Glu Ile Val 1145 1150
1155Trp Asp Lys Gly Arg Asp Phe Ala Thr Val Arg Lys Val
Leu Ser 1160 1165 1170Met Pro Gln Val
Asn Ile Val Lys Lys Thr Glu Val Gln Thr Gly 1175
1180 1185Gly Phe Ser Lys Glu Ser Ile Leu Pro Lys Arg
Asn Ser Asp Lys 1190 1195 1200Leu Ile
Ala Arg Lys Lys Asp Trp Asp Pro Lys Lys Tyr Gly Gly 1205
1210 1215Phe Asp Ser Pro Thr Val Ala Tyr Ser Val
Leu Val Val Ala Lys 1220 1225 1230Val
Glu Lys Gly Lys Ser Lys Lys Leu Lys Ser Val Lys Glu Leu 1235
1240 1245Leu Gly Ile Thr Ile Met Glu Arg Ser
Ser Phe Glu Lys Asn Pro 1250 1255
1260Ile Asp Phe Leu Glu Ala Lys Gly Tyr Lys Glu Val Lys Lys Asp
1265 1270 1275Leu Ile Ile Lys Leu Pro
Lys Tyr Ser Leu Phe Glu Leu Glu Asn 1280 1285
1290Gly Arg Lys Arg Met Leu Ala Ser Ala Gly Glu Leu Gln Lys
Gly 1295 1300 1305Asn Glu Leu Ala Leu
Pro Ser Lys Tyr Val Asn Phe Leu Tyr Leu 1310 1315
1320Ala Ser His Tyr Glu Lys Leu Lys Gly Ser Pro Glu Asp
Asn Glu 1325 1330 1335Gln Lys Gln Leu
Phe Val Glu Gln His Lys His Tyr Leu Asp Glu 1340
1345 1350Ile Ile Glu Gln Ile Ser Glu Phe Ser Lys Arg
Val Ile Leu Ala 1355 1360 1365Asp Ala
Asn Leu Asp Lys Val Leu Ser Ala Tyr Asn Lys His Arg 1370
1375 1380Asp Lys Pro Ile Arg Glu Gln Ala Glu Asn
Ile Ile His Leu Phe 1385 1390 1395Thr
Leu Thr Asn Leu Gly Ala Pro Ala Ala Phe Lys Tyr Phe Asp 1400
1405 1410Thr Thr Ile Asp Arg Lys Arg Tyr Thr
Ser Thr Lys Glu Val Leu 1415 1420
1425Asp Ala Thr Leu Ile His Gln Ser Ile Thr Gly Leu Tyr Glu Thr
1430 1435 1440Arg Ile Asp Leu Ser Gln
Leu Gly Gly Asp Ser Gly Gly Ser Thr 1445 1450
1455Asn Leu Ser Asp Ile Ile Glu Lys Glu Thr Gly Lys Gln Leu
Val 1460 1465 1470Ile Gln Glu Ser Ile
Leu Met Leu Pro Glu Glu Val Glu Glu Val 1475 1480
1485Ile Gly Asn Lys Pro Glu Ser Asp Ile Leu Val His Thr
Ala Tyr 1490 1495 1500Asp Glu Ser Thr
Asp Glu Asn Val Met Leu Leu Thr Ser Asp Ala 1505
1510 1515Pro Glu Tyr Lys Pro Trp Ala Leu Val Ile Gln
Asp Ser Asn Gly 1520 1525 1530Glu Asn
Lys Ile Lys Met Leu Ser Gly Gly Ser Pro Lys Lys Lys 1535
1540 1545Arg Lys Val 1550
SEQ ID NO: 27; Length: 167; Type: PRT; Organism: Artificial Sequence; Other information: N-hAPOBEC3A sDA1.1-NLS-ZF-C
misc_feature (93)..(96): Xaa can be any naturally occurring amino acid
misc_feature (98)..(99): Xaa can be any naturally occurring amino acid
misc_feature (121)..(124): Xaa can be any naturally occurring amino acid
misc_feature (126)..(127): Xaa can be any naturally occurring amino acid
misc_feature (149)..(152): Xaa can be any naturally occurring amino acid
misc_feature (154)..(155): Xaa can be any naturally occurring amino acid
Sequence 27:
Met Glu Ala Ser Pro Ala Ser Gly Pro Arg
His Leu Met Asp Pro His1 5 10
15Ile Phe Thr Ser Asn Phe Asn Asn Gly Ile Gly Arg His Lys Thr Tyr
20 25 30Leu Cys Tyr Glu Val Glu
Arg Leu Asp Asn Gly Thr Ser Val Lys Met 35 40
45Asp Gln His Arg Gly Phe Leu His Asn Gln Ala Lys Asn Leu
Leu Gly 50 55 60Ser Pro Lys Lys Lys
Arg Lys Val Gly Ser Ser Arg Pro Gly Glu Arg65 70
75 80Pro Phe Gln Cys Arg Ile Cys Met Arg Asn
Phe Ser Xaa Xaa Xaa Xaa 85 90
95Leu Xaa Xaa His Thr Arg Thr His Thr Gly Glu Lys Pro Phe Gln Cys
100 105 110Arg Ile Cys Met Arg
Asn Phe Ser Xaa Xaa Xaa Xaa Leu Xaa Xaa His 115
120 125Leu Arg Thr His Thr Gly Glu Lys Pro Phe Gln Cys
Arg Ile Cys Met 130 135 140Arg Asn Phe
Ser Xaa Xaa Xaa Xaa Leu Xaa Xaa His Leu Lys Thr His145
150 155 160Leu Arg Gly Ser Ser Ala Gln
165
SEQ ID NO: 28; Length: 190; Type: PRT; Organism: Artificial Sequence; Other information: N-hAPOBEC3A sDA1.2-NLS-ZF-C
misc_feature (116)..(119): Xaa can be any naturally occurring amino acid
misc_feature (121)..(122): Xaa can be any naturally occurring amino acid
misc_feature (144)..(147): Xaa can be any naturally occurring amino acid
misc_feature (149)..(150): Xaa can be any naturally occurring amino acid
misc_feature (172)..(175): Xaa can be any naturally occurring amino acid
misc_feature (177)..(178): Xaa can be any naturally occurring amino acid
Sequence 28:
Met Glu Ala Ser Pro Ala Ser Gly Pro Arg His Leu Met Asp Pro
His1 5 10 15Ile Phe Thr
Ser Asn Phe Asn Asn Gly Ile Gly Arg His Lys Thr Tyr 20
25 30Leu Cys Tyr Glu Val Glu Arg Leu Asp Asn
Gly Thr Ser Val Lys Met 35 40
45Asp Gln His Arg Gly Phe Leu His Asn Gln Ala Lys Asn Leu Leu Cys 50
55 60Gly Phe Tyr Gly Arg His Ala Glu Leu
Arg Phe Leu Asp Leu Val Pro65 70 75
80Ser Leu Gln Leu Asp Pro Gly Ser Pro Lys Lys Lys Arg Lys
Val Gly 85 90 95Ser Ser
Arg Pro Gly Glu Arg Pro Phe Gln Cys Arg Ile Cys Met Arg 100
105 110Asn Phe Ser Xaa Xaa Xaa Xaa Leu Xaa
Xaa His Thr Arg Thr His Thr 115 120
125Gly Glu Lys Pro Phe Gln Cys Arg Ile Cys Met Arg Asn Phe Ser Xaa
130 135 140Xaa Xaa Xaa Leu Xaa Xaa His
Leu Arg Thr His Thr Gly Glu Lys Pro145 150
155 160Phe Gln Cys Arg Ile Cys Met Arg Asn Phe Ser Xaa
Xaa Xaa Xaa Leu 165 170
175Xaa Xaa His Leu Lys Thr His Leu Arg Gly Ser Ser Ala Gln 180
185 190
SEQ ID NO: 29; Length: 203; Type: PRT; Organism: Artificial Sequence; Other information: N-hAPOBEC3A sDA1.3-NLS-ZF-C
misc_feature (129)..(132): Xaa can be any naturally occurring amino acid
misc_feature (134)..(135): Xaa can be any naturally occurring amino acid
misc_feature (157)..(160): Xaa can be any naturally occurring amino acid
misc_feature (162)..(163): Xaa can be any naturally occurring amino acid
misc_feature (185)..(188): Xaa can be any naturally occurring amino acid
misc_feature (190)..(191): Xaa can be any naturally occurring amino acid
Sequence 29:
Met Glu Ala Ser Pro Ala Ser Gly Pro Arg
His Leu Met Asp Pro His1 5 10
15Ile Phe Thr Ser Asn Phe Asn Asn Gly Ile Gly Arg His Lys Thr Tyr
20 25 30Leu Cys Tyr Glu Val Glu
Arg Leu Asp Asn Gly Thr Ser Val Lys Met 35 40
45Asp Gln His Arg Gly Phe Leu His Asn Gln Ala Lys Asn Leu
Leu Cys 50 55 60Gly Phe Tyr Gly Arg
His Ala Glu Leu Arg Phe Leu Asp Leu Val Pro65 70
75 80Ser Leu Gln Leu Asp Pro Ala Gln Ile Tyr
Arg Val Thr Trp Phe Ile 85 90
95Ser Trp Ser Gly Ser Pro Lys Lys Lys Arg Lys Val Gly Ser Ser Arg
100 105 110Pro Gly Glu Arg Pro
Phe Gln Cys Arg Ile Cys Met Arg Asn Phe Ser 115
120 125Xaa Xaa Xaa Xaa Leu Xaa Xaa His Thr Arg Thr His
Thr Gly Glu Lys 130 135 140Pro Phe Gln
Cys Arg Ile Cys Met Arg Asn Phe Ser Xaa Xaa Xaa Xaa145
150 155 160Leu Xaa Xaa His Leu Arg Thr
His Thr Gly Glu Lys Pro Phe Gln Cys 165
170 175Arg Ile Cys Met Arg Asn Phe Ser Xaa Xaa Xaa Xaa
Leu Xaa Xaa His 180 185 190Leu
Lys Thr His Leu Arg Gly Ser Ser Ala Gln 195
200
SEQ ID NO: 30; Length: 222; Type: PRT; Organism: Artificial Sequence; Other information: N-hAPOBEC3A sDA1.4-NLS-ZF-C
misc_feature (148)..(151): Xaa can be any naturally occurring amino acid
misc_feature (153)..(154): Xaa can be any naturally occurring amino acid
misc_feature (176)..(179): Xaa can be any naturally occurring amino acid
misc_feature (181)..(182): Xaa can be any naturally occurring amino acid
misc_feature (204)..(207): Xaa can be any naturally occurring amino acid
misc_feature (209)..(210): Xaa can be any naturally occurring amino acid
Sequence 30:
Met Glu Ala Ser Pro Ala Ser Gly Pro Arg His Leu Met Asp Pro
His1 5 10 15Ile Phe Thr
Ser Asn Phe Asn Asn Gly Ile Gly Arg His Lys Thr Tyr 20
25 30Leu Cys Tyr Glu Val Glu Arg Leu Asp Asn
Gly Thr Ser Val Lys Met 35 40
45Asp Gln His Arg Gly Phe Leu His Asn Gln Ala Lys Asn Leu Leu Cys 50
55 60Gly Phe Tyr Gly Arg His Ala Glu Leu
Arg Phe Leu Asp Leu Val Pro65 70 75
80Ser Leu Gln Leu Asp Pro Ala Gln Ile Tyr Arg Val Thr Trp
Phe Ile 85 90 95Ser Trp
Ser Pro Cys Phe Ser Trp Gly Cys Ala Gly Glu Val Arg Ala 100
105 110Phe Leu Gln Glu Asn Thr Gly Ser Pro
Lys Lys Lys Arg Lys Val Gly 115 120
125Ser Ser Arg Pro Gly Glu Arg Pro Phe Gln Cys Arg Ile Cys Met Arg
130 135 140Asn Phe Ser Xaa Xaa Xaa Xaa
Leu Xaa Xaa His Thr Arg Thr His Thr145 150
155 160Gly Glu Lys Pro Phe Gln Cys Arg Ile Cys Met Arg
Asn Phe Ser Xaa 165 170
175Xaa Xaa Xaa Leu Xaa Xaa His Leu Arg Thr His Thr Gly Glu Lys Pro
180 185 190Phe Gln Cys Arg Ile Cys
Met Arg Asn Phe Ser Xaa Xaa Xaa Xaa Leu 195 200
205Xaa Xaa His Leu Lys Thr His Leu Arg Gly Ser Ser Ala Gln
210 215 220

SEQ ID NO: 31
Length: 257
Type: PRT
Organism: Artificial Sequence
Other information: N-hAPOBEC3A sDA1.5-NLS-ZF-C
misc_feature (183)..(186): Xaa can be any naturally occurring amino acid
misc_feature (188)..(189): Xaa can be any naturally occurring amino acid
misc_feature (211)..(214): Xaa can be any naturally occurring amino acid
misc_feature (216)..(217): Xaa can be any naturally occurring amino acid
misc_feature (239)..(242): Xaa can be any naturally occurring amino acid
misc_feature (244)..(245): Xaa can be any naturally occurring amino acid
Sequence 31:
Met Glu Ala Ser Pro Ala Ser Gly Pro Arg
His Leu Met Asp Pro His1 5 10
15Ile Phe Thr Ser Asn Phe Asn Asn Gly Ile Gly Arg His Lys Thr Tyr
20 25 30Leu Cys Tyr Glu Val Glu
Arg Leu Asp Asn Gly Thr Ser Val Lys Met 35 40
45Asp Gln His Arg Gly Phe Leu His Asn Gln Ala Lys Asn Leu
Leu Cys 50 55 60Gly Phe Tyr Gly Arg
His Ala Glu Leu Arg Phe Leu Asp Leu Val Pro65 70
75 80Ser Leu Gln Leu Asp Pro Ala Gln Ile Tyr
Arg Val Thr Trp Phe Ile 85 90
95Ser Trp Ser Pro Cys Phe Ser Trp Gly Cys Ala Gly Glu Val Arg Ala
100 105 110Phe Leu Gln Glu Asn
Thr His Val Arg Leu Arg Ile Phe Ala Ala Arg 115
120 125Ile Tyr Asp Tyr Asp Pro Leu Tyr Lys Glu Ala Leu
Gln Met Leu Arg 130 135 140Asp Ala Gly
Ala Gln Val Ser Ile Met Gly Ser Pro Lys Lys Lys Arg145
150 155 160Lys Val Gly Ser Ser Arg Pro
Gly Glu Arg Pro Phe Gln Cys Arg Ile 165
170 175Cys Met Arg Asn Phe Ser Xaa Xaa Xaa Xaa Leu Xaa
Xaa His Thr Arg 180 185 190Thr
His Thr Gly Glu Lys Pro Phe Gln Cys Arg Ile Cys Met Arg Asn 195
200 205Phe Ser Xaa Xaa Xaa Xaa Leu Xaa Xaa
His Leu Arg Thr His Thr Gly 210 215
220Glu Lys Pro Phe Gln Cys Arg Ile Cys Met Arg Asn Phe Ser Xaa Xaa225
230 235 240Xaa Xaa Leu Xaa
Xaa His Leu Lys Thr His Leu Arg Gly Ser Ser Ala 245
250 255Gln

SEQ ID NO: 32
Length: 276
Type: PRT
Organism: Artificial Sequence
Other information: N-hAPOBEC3A sDA1.6-NLS-ZF-C
misc_feature (202)..(205): Xaa can be any naturally occurring amino acid
misc_feature (207)..(208): Xaa can be any naturally occurring amino acid
misc_feature (230)..(233): Xaa can be any naturally occurring amino acid
misc_feature (235)..(236): Xaa can be any naturally occurring amino acid
misc_feature (258)..(261): Xaa can be any naturally occurring amino acid
misc_feature (263)..(264): Xaa can be any naturally occurring amino acid
Sequence 32:
Met Glu Ala Ser Pro Ala Ser Gly Pro Arg His Leu Met Asp Pro
His1 5 10 15Ile Phe Thr
Ser Asn Phe Asn Asn Gly Ile Gly Arg His Lys Thr Tyr 20
25 30Leu Cys Tyr Glu Val Glu Arg Leu Asp Asn
Gly Thr Ser Val Lys Met 35 40
45Asp Gln His Arg Gly Phe Leu His Asn Gln Ala Lys Asn Leu Leu Cys 50
55 60Gly Phe Tyr Gly Arg His Ala Glu Leu
Arg Phe Leu Asp Leu Val Pro65 70 75
80Ser Leu Gln Leu Asp Pro Ala Gln Ile Tyr Arg Val Thr Trp
Phe Ile 85 90 95Ser Trp
Ser Pro Cys Phe Ser Trp Gly Cys Ala Gly Glu Val Arg Ala 100
105 110Phe Leu Gln Glu Asn Thr His Val Arg
Leu Arg Ile Phe Ala Ala Arg 115 120
125Ile Tyr Asp Tyr Asp Pro Leu Tyr Lys Glu Ala Leu Gln Met Leu Arg
130 135 140Asp Ala Gly Ala Gln Val Ser
Ile Met Thr Tyr Asp Glu Phe Lys His145 150
155 160Cys Trp Asp Thr Phe Val Asp His Gln Gly Cys Pro
Gly Ser Pro Lys 165 170
175Lys Lys Arg Lys Val Gly Ser Ser Arg Pro Gly Glu Arg Pro Phe Gln
180 185 190Cys Arg Ile Cys Met Arg
Asn Phe Ser Xaa Xaa Xaa Xaa Leu Xaa Xaa 195 200
205His Thr Arg Thr His Thr Gly Glu Lys Pro Phe Gln Cys Arg
Ile Cys 210 215 220Met Arg Asn Phe Ser
Xaa Xaa Xaa Xaa Leu Xaa Xaa His Leu Arg Thr225 230
235 240His Thr Gly Glu Lys Pro Phe Gln Cys Arg
Ile Cys Met Arg Asn Phe 245 250
255Ser Xaa Xaa Xaa Xaa Leu Xaa Xaa His Leu Lys Thr His Leu Arg Gly
260 265 270Ser Ser Ala Gln
275

SEQ ID NO: 33
Length: 1618
Type: PRT
Organism: Artificial Sequence
Other information: N-hAPOBEC3A sDA2.1-nCas9-UGI-C
Sequence 33:
Met Cys
Gly Phe Tyr Gly Arg His Ala Glu Leu Arg Phe Leu Asp Leu1 5
10 15Val Pro Ser Leu Gln Leu Asp Pro
Ala Gln Ile Tyr Arg Val Thr Trp 20 25
30Phe Ile Ser Trp Ser Pro Cys Phe Ser Trp Gly Cys Ala Gly Glu
Val 35 40 45Arg Ala Phe Leu Gln
Glu Asn Thr His Val Arg Leu Arg Ile Phe Ala 50 55
60Ala Arg Ile Tyr Asp Tyr Asp Pro Leu Tyr Lys Glu Ala Leu
Gln Met65 70 75 80Leu
Arg Asp Ala Gly Ala Gln Val Ser Ile Met Thr Tyr Asp Glu Phe
85 90 95Lys His Cys Trp Asp Thr Phe
Val Asp His Gln Gly Cys Pro Phe Gln 100 105
110Pro Trp Asp Gly Leu Asp Glu His Ser Gln Ala Leu Ser Gly
Arg Leu 115 120 125Arg Ala Ile Leu
Gln Asn Gln Gly Asn Ser Gly Ser Glu Thr Pro Gly 130
135 140Thr Ser Glu Ser Ala Thr Pro Glu Ser Asp Lys Lys
Tyr Ser Ile Gly145 150 155
160Leu Ala Ile Gly Thr Asn Ser Val Gly Trp Ala Val Ile Thr Asp Glu
165 170 175Tyr Lys Val Pro Ser
Lys Lys Phe Lys Val Leu Gly Asn Thr Asp Arg 180
185 190His Ser Ile Lys Lys Asn Leu Ile Gly Ala Leu Leu
Phe Asp Ser Gly 195 200 205Glu Thr
Ala Glu Ala Thr Arg Leu Lys Arg Thr Ala Arg Arg Arg Tyr 210
215 220Thr Arg Arg Lys Asn Arg Ile Cys Tyr Leu Gln
Glu Ile Phe Ser Asn225 230 235
240Glu Met Ala Lys Val Asp Asp Ser Phe Phe His Arg Leu Glu Glu Ser
245 250 255Phe Leu Val Glu
Glu Asp Lys Lys His Glu Arg His Pro Ile Phe Gly 260
265 270Asn Ile Val Asp Glu Val Ala Tyr His Glu Lys
Tyr Pro Thr Ile Tyr 275 280 285His
Leu Arg Lys Lys Leu Val Asp Ser Thr Asp Lys Ala Asp Leu Arg 290
295 300Leu Ile Tyr Leu Ala Leu Ala His Met Ile
Lys Phe Arg Gly His Phe305 310 315
320Leu Ile Glu Gly Asp Leu Asn Pro Asp Asn Ser Asp Val Asp Lys
Leu 325 330 335Phe Ile Gln
Leu Val Gln Thr Tyr Asn Gln Leu Phe Glu Glu Asn Pro 340
345 350Ile Asn Ala Ser Gly Val Asp Ala Lys Ala
Ile Leu Ser Ala Arg Leu 355 360
365Ser Lys Ser Arg Arg Leu Glu Asn Leu Ile Ala Gln Leu Pro Gly Glu 370
375 380Lys Lys Asn Gly Leu Phe Gly Asn
Leu Ile Ala Leu Ser Leu Gly Leu385 390
395 400Thr Pro Asn Phe Lys Ser Asn Phe Asp Leu Ala Glu
Asp Ala Lys Leu 405 410
415Gln Leu Ser Lys Asp Thr Tyr Asp Asp Asp Leu Asp Asn Leu Leu Ala
420 425 430Gln Ile Gly Asp Gln Tyr
Ala Asp Leu Phe Leu Ala Ala Lys Asn Leu 435 440
445Ser Asp Ala Ile Leu Leu Ser Asp Ile Leu Arg Val Asn Thr
Glu Ile 450 455 460Thr Lys Ala Pro Leu
Ser Ala Ser Met Ile Lys Arg Tyr Asp Glu His465 470
475 480His Gln Asp Leu Thr Leu Leu Lys Ala Leu
Val Arg Gln Gln Leu Pro 485 490
495Glu Lys Tyr Lys Glu Ile Phe Phe Asp Gln Ser Lys Asn Gly Tyr Ala
500 505 510Gly Tyr Ile Asp Gly
Gly Ala Ser Gln Glu Glu Phe Tyr Lys Phe Ile 515
520 525Lys Pro Ile Leu Glu Lys Met Asp Gly Thr Glu Glu
Leu Leu Val Lys 530 535 540Leu Asn Arg
Glu Asp Leu Leu Arg Lys Gln Arg Thr Phe Asp Asn Gly545
550 555 560Ser Ile Pro His Gln Ile His
Leu Gly Glu Leu His Ala Ile Leu Arg 565
570 575Arg Gln Glu Asp Phe Tyr Pro Phe Leu Lys Asp Asn
Arg Glu Lys Ile 580 585 590Glu
Lys Ile Leu Thr Phe Arg Ile Pro Tyr Tyr Val Gly Pro Leu Ala 595
600 605Arg Gly Asn Ser Arg Phe Ala Trp Met
Thr Arg Lys Ser Glu Glu Thr 610 615
620Ile Thr Pro Trp Asn Phe Glu Glu Val Val Asp Lys Gly Ala Ser Ala625
630 635 640Gln Ser Phe Ile
Glu Arg Met Thr Asn Phe Asp Lys Asn Leu Pro Asn 645
650 655Glu Lys Val Leu Pro Lys His Ser Leu Leu
Tyr Glu Tyr Phe Thr Val 660 665
670Tyr Asn Glu Leu Thr Lys Val Lys Tyr Val Thr Glu Gly Met Arg Lys
675 680 685Pro Ala Phe Leu Ser Gly Glu
Gln Lys Lys Ala Ile Val Asp Leu Leu 690 695
700Phe Lys Thr Asn Arg Lys Val Thr Val Lys Gln Leu Lys Glu Asp
Tyr705 710 715 720Phe Lys
Lys Ile Glu Cys Phe Asp Ser Val Glu Ile Ser Gly Val Glu
725 730 735Asp Arg Phe Asn Ala Ser Leu
Gly Thr Tyr His Asp Leu Leu Lys Ile 740 745
750Ile Lys Asp Lys Asp Phe Leu Asp Asn Glu Glu Asn Glu Asp
Ile Leu 755 760 765Glu Asp Ile Val
Leu Thr Leu Thr Leu Phe Glu Asp Arg Glu Met Ile 770
775 780Glu Glu Arg Leu Lys Thr Tyr Ala His Leu Phe Asp
Asp Lys Val Met785 790 795
800Lys Gln Leu Lys Arg Arg Arg Tyr Thr Gly Trp Gly Arg Leu Ser Arg
805 810 815Lys Leu Ile Asn Gly
Ile Arg Asp Lys Gln Ser Gly Lys Thr Ile Leu 820
825 830Asp Phe Leu Lys Ser Asp Gly Phe Ala Asn Arg Asn
Phe Met Gln Leu 835 840 845Ile His
Asp Asp Ser Leu Thr Phe Lys Glu Asp Ile Gln Lys Ala Gln 850
855 860Val Ser Gly Gln Gly Asp Ser Leu His Glu His
Ile Ala Asn Leu Ala865 870 875
880Gly Ser Pro Ala Ile Lys Lys Gly Ile Leu Gln Thr Val Lys Val Val
885 890 895Asp Glu Leu Val
Lys Val Met Gly Arg His Lys Pro Glu Asn Ile Val 900
905 910Ile Glu Met Ala Arg Glu Asn Gln Thr Thr Gln
Lys Gly Gln Lys Asn 915 920 925Ser
Arg Glu Arg Met Lys Arg Ile Glu Glu Gly Ile Lys Glu Leu Gly 930
935 940Ser Gln Ile Leu Lys Glu His Pro Val Glu
Asn Thr Gln Leu Gln Asn945 950 955
960Glu Lys Leu Tyr Leu Tyr Tyr Leu Gln Asn Gly Arg Asp Met Tyr
Val 965 970 975Asp Gln Glu
Leu Asp Ile Asn Arg Leu Ser Asp Tyr Asp Val Asp His 980
985 990Ile Val Pro Gln Ser Phe Leu Lys Asp Asp
Ser Ile Asp Asn Lys Val 995 1000
1005Leu Thr Arg Ser Asp Lys Asn Arg Gly Lys Ser Asp Asn Val Pro
1010 1015 1020Ser Glu Glu Val Val Lys
Lys Met Lys Asn Tyr Trp Arg Gln Leu 1025 1030
1035Leu Asn Ala Lys Leu Ile Thr Gln Arg Lys Phe Asp Asn Leu
Thr 1040 1045 1050Lys Ala Glu Arg Gly
Gly Leu Ser Glu Leu Asp Lys Ala Gly Phe 1055 1060
1065Ile Lys Arg Gln Leu Val Glu Thr Arg Gln Ile Thr Lys
His Val 1070 1075 1080Ala Gln Ile Leu
Asp Ser Arg Met Asn Thr Lys Tyr Asp Glu Asn 1085
1090 1095Asp Lys Leu Ile Arg Glu Val Lys Val Ile Thr
Leu Lys Ser Lys 1100 1105 1110Leu Val
Ser Asp Phe Arg Lys Asp Phe Gln Phe Tyr Lys Val Arg 1115
1120 1125Glu Ile Asn Asn Tyr His His Ala His Asp
Ala Tyr Leu Asn Ala 1130 1135 1140Val
Val Gly Thr Ala Leu Ile Lys Lys Tyr Pro Lys Leu Glu Ser 1145
1150 1155Glu Phe Val Tyr Gly Asp Tyr Lys Val
Tyr Asp Val Arg Lys Met 1160 1165
1170Ile Ala Lys Ser Glu Gln Glu Ile Gly Lys Ala Thr Ala Lys Tyr
1175 1180 1185Phe Phe Tyr Ser Asn Ile
Met Asn Phe Phe Lys Thr Glu Ile Thr 1190 1195
1200Leu Ala Asn Gly Glu Ile Arg Lys Arg Pro Leu Ile Glu Thr
Asn 1205 1210 1215Gly Glu Thr Gly Glu
Ile Val Trp Asp Lys Gly Arg Asp Phe Ala 1220 1225
1230Thr Val Arg Lys Val Leu Ser Met Pro Gln Val Asn Ile
Val Lys 1235 1240 1245Lys Thr Glu Val
Gln Thr Gly Gly Phe Ser Lys Glu Ser Ile Leu 1250
1255 1260Pro Lys Arg Asn Ser Asp Lys Leu Ile Ala Arg
Lys Lys Asp Trp 1265 1270 1275Asp Pro
Lys Lys Tyr Gly Gly Phe Asp Ser Pro Thr Val Ala Tyr 1280
1285 1290Ser Val Leu Val Val Ala Lys Val Glu Lys
Gly Lys Ser Lys Lys 1295 1300 1305Leu
Lys Ser Val Lys Glu Leu Leu Gly Ile Thr Ile Met Glu Arg 1310
1315 1320Ser Ser Phe Glu Lys Asn Pro Ile Asp
Phe Leu Glu Ala Lys Gly 1325 1330
1335Tyr Lys Glu Val Lys Lys Asp Leu Ile Ile Lys Leu Pro Lys Tyr
1340 1345 1350Ser Leu Phe Glu Leu Glu
Asn Gly Arg Lys Arg Met Leu Ala Ser 1355 1360
1365Ala Gly Glu Leu Gln Lys Gly Asn Glu Leu Ala Leu Pro Ser
Lys 1370 1375 1380Tyr Val Asn Phe Leu
Tyr Leu Ala Ser His Tyr Glu Lys Leu Lys 1385 1390
1395Gly Ser Pro Glu Asp Asn Glu Gln Lys Gln Leu Phe Val
Glu Gln 1400 1405 1410His Lys His Tyr
Leu Asp Glu Ile Ile Glu Gln Ile Ser Glu Phe 1415
1420 1425Ser Lys Arg Val Ile Leu Ala Asp Ala Asn Leu
Asp Lys Val Leu 1430 1435 1440Ser Ala
Tyr Asn Lys His Arg Asp Lys Pro Ile Arg Glu Gln Ala 1445
1450 1455Glu Asn Ile Ile His Leu Phe Thr Leu Thr
Asn Leu Gly Ala Pro 1460 1465 1470Ala
Ala Phe Lys Tyr Phe Asp Thr Thr Ile Asp Arg Lys Arg Tyr 1475
1480 1485Thr Ser Thr Lys Glu Val Leu Asp Ala
Thr Leu Ile His Gln Ser 1490 1495
1500Ile Thr Gly Leu Tyr Glu Thr Arg Ile Asp Leu Ser Gln Leu Gly
1505 1510 1515Gly Asp Ser Gly Gly Ser
Thr Asn Leu Ser Asp Ile Ile Glu Lys 1520 1525
1530Glu Thr Gly Lys Gln Leu Val Ile Gln Glu Ser Ile Leu Met
Leu 1535 1540 1545Pro Glu Glu Val Glu
Glu Val Ile Gly Asn Lys Pro Glu Ser Asp 1550 1555
1560Ile Leu Val His Thr Ala Tyr Asp Glu Ser Thr Asp Glu
Asn Val 1565 1570 1575Met Leu Leu Thr
Ser Asp Ala Pro Glu Tyr Lys Pro Trp Ala Leu 1580
1585 1590Val Ile Gln Asp Ser Asn Gly Glu Asn Lys Ile
Lys Met Leu Ser 1595 1600 1605Gly Gly
Ser Pro Lys Lys Lys Arg Lys Val 1610
1615

SEQ ID NO: 34
Length: 1595
Type: PRT
Organism: Artificial Sequence
Other information: N-hAPOBEC3A sDA2.2-nCas9-UGI-C
Sequence 34:
Met Ala
Gln Ile Tyr Arg Val Thr Trp Phe Ile Ser Trp Ser Pro Cys1 5
10 15Phe Ser Trp Gly Cys Ala Gly Glu
Val Arg Ala Phe Leu Gln Glu Asn 20 25
30Thr His Val Arg Leu Arg Ile Phe Ala Ala Arg Ile Tyr Asp Tyr
Asp 35 40 45Pro Leu Tyr Lys Glu
Ala Leu Gln Met Leu Arg Asp Ala Gly Ala Gln 50 55
60Val Ser Ile Met Thr Tyr Asp Glu Phe Lys His Cys Trp Asp
Thr Phe65 70 75 80Val
Asp His Gln Gly Cys Pro Phe Gln Pro Trp Asp Gly Leu Asp Glu
85 90 95His Ser Gln Ala Leu Ser Gly
Arg Leu Arg Ala Ile Leu Gln Asn Gln 100 105
110Gly Asn Ser Gly Ser Glu Thr Pro Gly Thr Ser Glu Ser Ala
Thr Pro 115 120 125Glu Ser Asp Lys
Lys Tyr Ser Ile Gly Leu Ala Ile Gly Thr Asn Ser 130
135 140Val Gly Trp Ala Val Ile Thr Asp Glu Tyr Lys Val
Pro Ser Lys Lys145 150 155
160Phe Lys Val Leu Gly Asn Thr Asp Arg His Ser Ile Lys Lys Asn Leu
165 170 175Ile Gly Ala Leu Leu
Phe Asp Ser Gly Glu Thr Ala Glu Ala Thr Arg 180
185 190Leu Lys Arg Thr Ala Arg Arg Arg Tyr Thr Arg Arg
Lys Asn Arg Ile 195 200 205Cys Tyr
Leu Gln Glu Ile Phe Ser Asn Glu Met Ala Lys Val Asp Asp 210
215 220Ser Phe Phe His Arg Leu Glu Glu Ser Phe Leu
Val Glu Glu Asp Lys225 230 235
240Lys His Glu Arg His Pro Ile Phe Gly Asn Ile Val Asp Glu Val Ala
245 250 255Tyr His Glu Lys
Tyr Pro Thr Ile Tyr His Leu Arg Lys Lys Leu Val 260
265 270Asp Ser Thr Asp Lys Ala Asp Leu Arg Leu Ile
Tyr Leu Ala Leu Ala 275 280 285His
Met Ile Lys Phe Arg Gly His Phe Leu Ile Glu Gly Asp Leu Asn 290
295 300Pro Asp Asn Ser Asp Val Asp Lys Leu Phe
Ile Gln Leu Val Gln Thr305 310 315
320Tyr Asn Gln Leu Phe Glu Glu Asn Pro Ile Asn Ala Ser Gly Val
Asp 325 330 335Ala Lys Ala
Ile Leu Ser Ala Arg Leu Ser Lys Ser Arg Arg Leu Glu 340
345 350Asn Leu Ile Ala Gln Leu Pro Gly Glu Lys
Lys Asn Gly Leu Phe Gly 355 360
365Asn Leu Ile Ala Leu Ser Leu Gly Leu Thr Pro Asn Phe Lys Ser Asn 370
375 380Phe Asp Leu Ala Glu Asp Ala Lys
Leu Gln Leu Ser Lys Asp Thr Tyr385 390
395 400Asp Asp Asp Leu Asp Asn Leu Leu Ala Gln Ile Gly
Asp Gln Tyr Ala 405 410
415Asp Leu Phe Leu Ala Ala Lys Asn Leu Ser Asp Ala Ile Leu Leu Ser
420 425 430Asp Ile Leu Arg Val Asn
Thr Glu Ile Thr Lys Ala Pro Leu Ser Ala 435 440
445Ser Met Ile Lys Arg Tyr Asp Glu His His Gln Asp Leu Thr
Leu Leu 450 455 460Lys Ala Leu Val Arg
Gln Gln Leu Pro Glu Lys Tyr Lys Glu Ile Phe465 470
475 480Phe Asp Gln Ser Lys Asn Gly Tyr Ala Gly
Tyr Ile Asp Gly Gly Ala 485 490
495Ser Gln Glu Glu Phe Tyr Lys Phe Ile Lys Pro Ile Leu Glu Lys Met
500 505 510Asp Gly Thr Glu Glu
Leu Leu Val Lys Leu Asn Arg Glu Asp Leu Leu 515
520 525Arg Lys Gln Arg Thr Phe Asp Asn Gly Ser Ile Pro
His Gln Ile His 530 535 540Leu Gly Glu
Leu His Ala Ile Leu Arg Arg Gln Glu Asp Phe Tyr Pro545
550 555 560Phe Leu Lys Asp Asn Arg Glu
Lys Ile Glu Lys Ile Leu Thr Phe Arg 565
570 575Ile Pro Tyr Tyr Val Gly Pro Leu Ala Arg Gly Asn
Ser Arg Phe Ala 580 585 590Trp
Met Thr Arg Lys Ser Glu Glu Thr Ile Thr Pro Trp Asn Phe Glu 595
600 605Glu Val Val Asp Lys Gly Ala Ser Ala
Gln Ser Phe Ile Glu Arg Met 610 615
620Thr Asn Phe Asp Lys Asn Leu Pro Asn Glu Lys Val Leu Pro Lys His625
630 635 640Ser Leu Leu Tyr
Glu Tyr Phe Thr Val Tyr Asn Glu Leu Thr Lys Val 645
650 655Lys Tyr Val Thr Glu Gly Met Arg Lys Pro
Ala Phe Leu Ser Gly Glu 660 665
670Gln Lys Lys Ala Ile Val Asp Leu Leu Phe Lys Thr Asn Arg Lys Val
675 680 685Thr Val Lys Gln Leu Lys Glu
Asp Tyr Phe Lys Lys Ile Glu Cys Phe 690 695
700Asp Ser Val Glu Ile Ser Gly Val Glu Asp Arg Phe Asn Ala Ser
Leu705 710 715 720Gly Thr
Tyr His Asp Leu Leu Lys Ile Ile Lys Asp Lys Asp Phe Leu
725 730 735Asp Asn Glu Glu Asn Glu Asp
Ile Leu Glu Asp Ile Val Leu Thr Leu 740 745
750Thr Leu Phe Glu Asp Arg Glu Met Ile Glu Glu Arg Leu Lys
Thr Tyr 755 760 765Ala His Leu Phe
Asp Asp Lys Val Met Lys Gln Leu Lys Arg Arg Arg 770
775 780Tyr Thr Gly Trp Gly Arg Leu Ser Arg Lys Leu Ile
Asn Gly Ile Arg785 790 795
800Asp Lys Gln Ser Gly Lys Thr Ile Leu Asp Phe Leu Lys Ser Asp Gly
805 810 815Phe Ala Asn Arg Asn
Phe Met Gln Leu Ile His Asp Asp Ser Leu Thr 820
825 830Phe Lys Glu Asp Ile Gln Lys Ala Gln Val Ser Gly
Gln Gly Asp Ser 835 840 845Leu His
Glu His Ile Ala Asn Leu Ala Gly Ser Pro Ala Ile Lys Lys 850
855 860Gly Ile Leu Gln Thr Val Lys Val Val Asp Glu
Leu Val Lys Val Met865 870 875
880Gly Arg His Lys Pro Glu Asn Ile Val Ile Glu Met Ala Arg Glu Asn
885 890 895Gln Thr Thr Gln
Lys Gly Gln Lys Asn Ser Arg Glu Arg Met Lys Arg 900
905 910Ile Glu Glu Gly Ile Lys Glu Leu Gly Ser Gln
Ile Leu Lys Glu His 915 920 925Pro
Val Glu Asn Thr Gln Leu Gln Asn Glu Lys Leu Tyr Leu Tyr Tyr 930
935 940Leu Gln Asn Gly Arg Asp Met Tyr Val Asp
Gln Glu Leu Asp Ile Asn945 950 955
960Arg Leu Ser Asp Tyr Asp Val Asp His Ile Val Pro Gln Ser Phe
Leu 965 970 975Lys Asp Asp
Ser Ile Asp Asn Lys Val Leu Thr Arg Ser Asp Lys Asn 980
985 990Arg Gly Lys Ser Asp Asn Val Pro Ser Glu
Glu Val Val Lys Lys Met 995 1000
1005Lys Asn Tyr Trp Arg Gln Leu Leu Asn Ala Lys Leu Ile Thr Gln
1010 1015 1020Arg Lys Phe Asp Asn Leu
Thr Lys Ala Glu Arg Gly Gly Leu Ser 1025 1030
1035Glu Leu Asp Lys Ala Gly Phe Ile Lys Arg Gln Leu Val Glu
Thr 1040 1045 1050Arg Gln Ile Thr Lys
His Val Ala Gln Ile Leu Asp Ser Arg Met 1055 1060
1065Asn Thr Lys Tyr Asp Glu Asn Asp Lys Leu Ile Arg Glu
Val Lys 1070 1075 1080Val Ile Thr Leu
Lys Ser Lys Leu Val Ser Asp Phe Arg Lys Asp 1085
1090 1095Phe Gln Phe Tyr Lys Val Arg Glu Ile Asn Asn
Tyr His His Ala 1100 1105 1110His Asp
Ala Tyr Leu Asn Ala Val Val Gly Thr Ala Leu Ile Lys 1115
1120 1125Lys Tyr Pro Lys Leu Glu Ser Glu Phe Val
Tyr Gly Asp Tyr Lys 1130 1135 1140Val
Tyr Asp Val Arg Lys Met Ile Ala Lys Ser Glu Gln Glu Ile 1145
1150 1155Gly Lys Ala Thr Ala Lys Tyr Phe Phe
Tyr Ser Asn Ile Met Asn 1160 1165
1170Phe Phe Lys Thr Glu Ile Thr Leu Ala Asn Gly Glu Ile Arg Lys
1175 1180 1185Arg Pro Leu Ile Glu Thr
Asn Gly Glu Thr Gly Glu Ile Val Trp 1190 1195
1200Asp Lys Gly Arg Asp Phe Ala Thr Val Arg Lys Val Leu Ser
Met 1205 1210 1215Pro Gln Val Asn Ile
Val Lys Lys Thr Glu Val Gln Thr Gly Gly 1220 1225
1230Phe Ser Lys Glu Ser Ile Leu Pro Lys Arg Asn Ser Asp
Lys Leu 1235 1240 1245Ile Ala Arg Lys
Lys Asp Trp Asp Pro Lys Lys Tyr Gly Gly Phe 1250
1255 1260Asp Ser Pro Thr Val Ala Tyr Ser Val Leu Val
Val Ala Lys Val 1265 1270 1275Glu Lys
Gly Lys Ser Lys Lys Leu Lys Ser Val Lys Glu Leu Leu 1280
1285 1290Gly Ile Thr Ile Met Glu Arg Ser Ser Phe
Glu Lys Asn Pro Ile 1295 1300 1305Asp
Phe Leu Glu Ala Lys Gly Tyr Lys Glu Val Lys Lys Asp Leu 1310
1315 1320Ile Ile Lys Leu Pro Lys Tyr Ser Leu
Phe Glu Leu Glu Asn Gly 1325 1330
1335Arg Lys Arg Met Leu Ala Ser Ala Gly Glu Leu Gln Lys Gly Asn
1340 1345 1350Glu Leu Ala Leu Pro Ser
Lys Tyr Val Asn Phe Leu Tyr Leu Ala 1355 1360
1365Ser His Tyr Glu Lys Leu Lys Gly Ser Pro Glu Asp Asn Glu
Gln 1370 1375 1380Lys Gln Leu Phe Val
Glu Gln His Lys His Tyr Leu Asp Glu Ile 1385 1390
1395Ile Glu Gln Ile Ser Glu Phe Ser Lys Arg Val Ile Leu
Ala Asp 1400 1405 1410Ala Asn Leu Asp
Lys Val Leu Ser Ala Tyr Asn Lys His Arg Asp 1415
1420 1425Lys Pro Ile Arg Glu Gln Ala Glu Asn Ile Ile
His Leu Phe Thr 1430 1435 1440Leu Thr
Asn Leu Gly Ala Pro Ala Ala Phe Lys Tyr Phe Asp Thr 1445
1450 1455Thr Ile Asp Arg Lys Arg Tyr Thr Ser Thr
Lys Glu Val Leu Asp 1460 1465 1470Ala
Thr Leu Ile His Gln Ser Ile Thr Gly Leu Tyr Glu Thr Arg 1475
1480 1485Ile Asp Leu Ser Gln Leu Gly Gly Asp
Ser Gly Gly Ser Thr Asn 1490 1495
1500Leu Ser Asp Ile Ile Glu Lys Glu Thr Gly Lys Gln Leu Val Ile
1505 1510 1515Gln Glu Ser Ile Leu Met
Leu Pro Glu Glu Val Glu Glu Val Ile 1520 1525
1530Gly Asn Lys Pro Glu Ser Asp Ile Leu Val His Thr Ala Tyr
Asp 1535 1540 1545Glu Ser Thr Asp Glu
Asn Val Met Leu Leu Thr Ser Asp Ala Pro 1550 1555
1560Glu Tyr Lys Pro Trp Ala Leu Val Ile Gln Asp Ser Asn
Gly Glu 1565 1570 1575Asn Lys Ile Lys
Met Leu Ser Gly Gly Ser Pro Lys Lys Lys Arg 1580
1585 1590Lys Val 1595

SEQ ID NO: 35
Length: 1582
Type: PRT
Organism: Artificial Sequence
Other information: N-hAPOBEC3A sDA2.3-nCas9-UGI-C
Sequence 35:
Met Pro Cys Phe Ser Trp Gly Cys
Ala Gly Glu Val Arg Ala Phe Leu1 5 10
15Gln Glu Asn Thr His Val Arg Leu Arg Ile Phe Ala Ala Arg
Ile Tyr 20 25 30Asp Tyr Asp
Pro Leu Tyr Lys Glu Ala Leu Gln Met Leu Arg Asp Ala 35
40 45Gly Ala Gln Val Ser Ile Met Thr Tyr Asp Glu
Phe Lys His Cys Trp 50 55 60Asp Thr
Phe Val Asp His Gln Gly Cys Pro Phe Gln Pro Trp Asp Gly65
70 75 80Leu Asp Glu His Ser Gln Ala
Leu Ser Gly Arg Leu Arg Ala Ile Leu 85 90
95Gln Asn Gln Gly Asn Ser Gly Ser Glu Thr Pro Gly Thr
Ser Glu Ser 100 105 110Ala Thr
Pro Glu Ser Asp Lys Lys Tyr Ser Ile Gly Leu Ala Ile Gly 115
120 125Thr Asn Ser Val Gly Trp Ala Val Ile Thr
Asp Glu Tyr Lys Val Pro 130 135 140Ser
Lys Lys Phe Lys Val Leu Gly Asn Thr Asp Arg His Ser Ile Lys145
150 155 160Lys Asn Leu Ile Gly Ala
Leu Leu Phe Asp Ser Gly Glu Thr Ala Glu 165
170 175Ala Thr Arg Leu Lys Arg Thr Ala Arg Arg Arg Tyr
Thr Arg Arg Lys 180 185 190Asn
Arg Ile Cys Tyr Leu Gln Glu Ile Phe Ser Asn Glu Met Ala Lys 195
200 205Val Asp Asp Ser Phe Phe His Arg Leu
Glu Glu Ser Phe Leu Val Glu 210 215
220Glu Asp Lys Lys His Glu Arg His Pro Ile Phe Gly Asn Ile Val Asp225
230 235 240Glu Val Ala Tyr
His Glu Lys Tyr Pro Thr Ile Tyr His Leu Arg Lys 245
250 255Lys Leu Val Asp Ser Thr Asp Lys Ala Asp
Leu Arg Leu Ile Tyr Leu 260 265
270Ala Leu Ala His Met Ile Lys Phe Arg Gly His Phe Leu Ile Glu Gly
275 280 285Asp Leu Asn Pro Asp Asn Ser
Asp Val Asp Lys Leu Phe Ile Gln Leu 290 295
300Val Gln Thr Tyr Asn Gln Leu Phe Glu Glu Asn Pro Ile Asn Ala
Ser305 310 315 320Gly Val
Asp Ala Lys Ala Ile Leu Ser Ala Arg Leu Ser Lys Ser Arg
325 330 335Arg Leu Glu Asn Leu Ile Ala
Gln Leu Pro Gly Glu Lys Lys Asn Gly 340 345
350Leu Phe Gly Asn Leu Ile Ala Leu Ser Leu Gly Leu Thr Pro
Asn Phe 355 360 365Lys Ser Asn Phe
Asp Leu Ala Glu Asp Ala Lys Leu Gln Leu Ser Lys 370
375 380Asp Thr Tyr Asp Asp Asp Leu Asp Asn Leu Leu Ala
Gln Ile Gly Asp385 390 395
400Gln Tyr Ala Asp Leu Phe Leu Ala Ala Lys Asn Leu Ser Asp Ala Ile
405 410 415Leu Leu Ser Asp Ile
Leu Arg Val Asn Thr Glu Ile Thr Lys Ala Pro 420
425 430Leu Ser Ala Ser Met Ile Lys Arg Tyr Asp Glu His
His Gln Asp Leu 435 440 445Thr Leu
Leu Lys Ala Leu Val Arg Gln Gln Leu Pro Glu Lys Tyr Lys 450
455 460Glu Ile Phe Phe Asp Gln Ser Lys Asn Gly Tyr
Ala Gly Tyr Ile Asp465 470 475
480Gly Gly Ala Ser Gln Glu Glu Phe Tyr Lys Phe Ile Lys Pro Ile Leu
485 490 495Glu Lys Met Asp
Gly Thr Glu Glu Leu Leu Val Lys Leu Asn Arg Glu 500
505 510Asp Leu Leu Arg Lys Gln Arg Thr Phe Asp Asn
Gly Ser Ile Pro His 515 520 525Gln
Ile His Leu Gly Glu Leu His Ala Ile Leu Arg Arg Gln Glu Asp 530
535 540Phe Tyr Pro Phe Leu Lys Asp Asn Arg Glu
Lys Ile Glu Lys Ile Leu545 550 555
560Thr Phe Arg Ile Pro Tyr Tyr Val Gly Pro Leu Ala Arg Gly Asn
Ser 565 570 575Arg Phe Ala
Trp Met Thr Arg Lys Ser Glu Glu Thr Ile Thr Pro Trp 580
585 590Asn Phe Glu Glu Val Val Asp Lys Gly Ala
Ser Ala Gln Ser Phe Ile 595 600
605Glu Arg Met Thr Asn Phe Asp Lys Asn Leu Pro Asn Glu Lys Val Leu 610
615 620Pro Lys His Ser Leu Leu Tyr Glu
Tyr Phe Thr Val Tyr Asn Glu Leu625 630
635 640Thr Lys Val Lys Tyr Val Thr Glu Gly Met Arg Lys
Pro Ala Phe Leu 645 650
655Ser Gly Glu Gln Lys Lys Ala Ile Val Asp Leu Leu Phe Lys Thr Asn
660 665 670Arg Lys Val Thr Val Lys
Gln Leu Lys Glu Asp Tyr Phe Lys Lys Ile 675 680
685Glu Cys Phe Asp Ser Val Glu Ile Ser Gly Val Glu Asp Arg
Phe Asn 690 695 700Ala Ser Leu Gly Thr
Tyr His Asp Leu Leu Lys Ile Ile Lys Asp Lys705 710
715 720Asp Phe Leu Asp Asn Glu Glu Asn Glu Asp
Ile Leu Glu Asp Ile Val 725 730
735Leu Thr Leu Thr Leu Phe Glu Asp Arg Glu Met Ile Glu Glu Arg Leu
740 745 750Lys Thr Tyr Ala His
Leu Phe Asp Asp Lys Val Met Lys Gln Leu Lys 755
760 765Arg Arg Arg Tyr Thr Gly Trp Gly Arg Leu Ser Arg
Lys Leu Ile Asn 770 775 780Gly Ile Arg
Asp Lys Gln Ser Gly Lys Thr Ile Leu Asp Phe Leu Lys785
790 795 800Ser Asp Gly Phe Ala Asn Arg
Asn Phe Met Gln Leu Ile His Asp Asp 805
810 815Ser Leu Thr Phe Lys Glu Asp Ile Gln Lys Ala Gln
Val Ser Gly Gln 820 825 830Gly
Asp Ser Leu His Glu His Ile Ala Asn Leu Ala Gly Ser Pro Ala 835
840 845Ile Lys Lys Gly Ile Leu Gln Thr Val
Lys Val Val Asp Glu Leu Val 850 855
860Lys Val Met Gly Arg His Lys Pro Glu Asn Ile Val Ile Glu Met Ala865
870 875 880Arg Glu Asn Gln
Thr Thr Gln Lys Gly Gln Lys Asn Ser Arg Glu Arg 885
890 895Met Lys Arg Ile Glu Glu Gly Ile Lys Glu
Leu Gly Ser Gln Ile Leu 900 905
910Lys Glu His Pro Val Glu Asn Thr Gln Leu Gln Asn Glu Lys Leu Tyr
915 920 925Leu Tyr Tyr Leu Gln Asn Gly
Arg Asp Met Tyr Val Asp Gln Glu Leu 930 935
940Asp Ile Asn Arg Leu Ser Asp Tyr Asp Val Asp His Ile Val Pro
Gln945 950 955 960Ser Phe
Leu Lys Asp Asp Ser Ile Asp Asn Lys Val Leu Thr Arg Ser
965 970 975Asp Lys Asn Arg Gly Lys Ser
Asp Asn Val Pro Ser Glu Glu Val Val 980 985
990Lys Lys Met Lys Asn Tyr Trp Arg Gln Leu Leu Asn Ala Lys
Leu Ile 995 1000 1005Thr Gln Arg
Lys Phe Asp Asn Leu Thr Lys Ala Glu Arg Gly Gly 1010
1015 1020Leu Ser Glu Leu Asp Lys Ala Gly Phe Ile Lys
Arg Gln Leu Val 1025 1030 1035Glu Thr
Arg Gln Ile Thr Lys His Val Ala Gln Ile Leu Asp Ser 1040
1045 1050Arg Met Asn Thr Lys Tyr Asp Glu Asn Asp
Lys Leu Ile Arg Glu 1055 1060 1065Val
Lys Val Ile Thr Leu Lys Ser Lys Leu Val Ser Asp Phe Arg 1070
1075 1080Lys Asp Phe Gln Phe Tyr Lys Val Arg
Glu Ile Asn Asn Tyr His 1085 1090
1095His Ala His Asp Ala Tyr Leu Asn Ala Val Val Gly Thr Ala Leu
1100 1105 1110Ile Lys Lys Tyr Pro Lys
Leu Glu Ser Glu Phe Val Tyr Gly Asp 1115 1120
1125Tyr Lys Val Tyr Asp Val Arg Lys Met Ile Ala Lys Ser Glu
Gln 1130 1135 1140Glu Ile Gly Lys Ala
Thr Ala Lys Tyr Phe Phe Tyr Ser Asn Ile 1145 1150
1155Met Asn Phe Phe Lys Thr Glu Ile Thr Leu Ala Asn Gly
Glu Ile 1160 1165 1170Arg Lys Arg Pro
Leu Ile Glu Thr Asn Gly Glu Thr Gly Glu Ile 1175
1180 1185Val Trp Asp Lys Gly Arg Asp Phe Ala Thr Val
Arg Lys Val Leu 1190 1195 1200Ser Met
Pro Gln Val Asn Ile Val Lys Lys Thr Glu Val Gln Thr 1205
1210 1215Gly Gly Phe Ser Lys Glu Ser Ile Leu Pro
Lys Arg Asn Ser Asp 1220 1225 1230Lys
Leu Ile Ala Arg Lys Lys Asp Trp Asp Pro Lys Lys Tyr Gly 1235
1240 1245Gly Phe Asp Ser Pro Thr Val Ala Tyr
Ser Val Leu Val Val Ala 1250 1255
1260Lys Val Glu Lys Gly Lys Ser Lys Lys Leu Lys Ser Val Lys Glu
1265 1270 1275Leu Leu Gly Ile Thr Ile
Met Glu Arg Ser Ser Phe Glu Lys Asn 1280 1285
1290Pro Ile Asp Phe Leu Glu Ala Lys Gly Tyr Lys Glu Val Lys
Lys 1295 1300 1305Asp Leu Ile Ile Lys
Leu Pro Lys Tyr Ser Leu Phe Glu Leu Glu 1310 1315
1320Asn Gly Arg Lys Arg Met Leu Ala Ser Ala Gly Glu Leu
Gln Lys 1325 1330 1335Gly Asn Glu Leu
Ala Leu Pro Ser Lys Tyr Val Asn Phe Leu Tyr 1340
1345 1350Leu Ala Ser His Tyr Glu Lys Leu Lys Gly Ser
Pro Glu Asp Asn 1355 1360 1365Glu Gln
Lys Gln Leu Phe Val Glu Gln His Lys His Tyr Leu Asp 1370
1375 1380Glu Ile Ile Glu Gln Ile Ser Glu Phe Ser
Lys Arg Val Ile Leu 1385 1390 1395Ala
Asp Ala Asn Leu Asp Lys Val Leu Ser Ala Tyr Asn Lys His 1400
1405 1410Arg Asp Lys Pro Ile Arg Glu Gln Ala
Glu Asn Ile Ile His Leu 1415 1420
1425Phe Thr Leu Thr Asn Leu Gly Ala Pro Ala Ala Phe Lys Tyr Phe
1430 1435 1440Asp Thr Thr Ile Asp Arg
Lys Arg Tyr Thr Ser Thr Lys Glu Val 1445 1450
1455Leu Asp Ala Thr Leu Ile His Gln Ser Ile Thr Gly Leu Tyr
Glu 1460 1465 1470Thr Arg Ile Asp Leu
Ser Gln Leu Gly Gly Asp Ser Gly Gly Ser 1475 1480
1485Thr Asn Leu Ser Asp Ile Ile Glu Lys Glu Thr Gly Lys
Gln Leu 1490 1495 1500Val Ile Gln Glu
Ser Ile Leu Met Leu Pro Glu Glu Val Glu Glu 1505
1510 1515Val Ile Gly Asn Lys Pro Glu Ser Asp Ile Leu
Val His Thr Ala 1520 1525 1530Tyr Asp
Glu Ser Thr Asp Glu Asn Val Met Leu Leu Thr Ser Asp 1535
1540 1545Ala Pro Glu Tyr Lys Pro Trp Ala Leu Val
Ile Gln Asp Ser Asn 1550 1555 1560Gly
Glu Asn Lys Ile Lys Met Leu Ser Gly Gly Ser Pro Lys Lys 1565
1570 1575Lys Arg Lys Val
1580

SEQ ID NO: 36
Length: 1563
Type: PRT
Organism: Artificial Sequence
Other information: N-hAPOBEC3A sDA2.4-nCas9-UGI-C
Sequence 36:
Met His
Val Arg Leu Arg Ile Phe Ala Ala Arg Ile Tyr Asp Tyr Asp1 5
10 15Pro Leu Tyr Lys Glu Ala Leu Gln
Met Leu Arg Asp Ala Gly Ala Gln 20 25
30Val Ser Ile Met Thr Tyr Asp Glu Phe Lys His Cys Trp Asp Thr
Phe 35 40 45Val Asp His Gln Gly
Cys Pro Phe Gln Pro Trp Asp Gly Leu Asp Glu 50 55
60His Ser Gln Ala Leu Ser Gly Arg Leu Arg Ala Ile Leu Gln
Asn Gln65 70 75 80Gly
Asn Ser Gly Ser Glu Thr Pro Gly Thr Ser Glu Ser Ala Thr Pro
85 90 95Glu Ser Asp Lys Lys Tyr Ser
Ile Gly Leu Ala Ile Gly Thr Asn Ser 100 105
110Val Gly Trp Ala Val Ile Thr Asp Glu Tyr Lys Val Pro Ser
Lys Lys 115 120 125Phe Lys Val Leu
Gly Asn Thr Asp Arg His Ser Ile Lys Lys Asn Leu 130
135 140Ile Gly Ala Leu Leu Phe Asp Ser Gly Glu Thr Ala
Glu Ala Thr Arg145 150 155
160Leu Lys Arg Thr Ala Arg Arg Arg Tyr Thr Arg Arg Lys Asn Arg Ile
165 170 175Cys Tyr Leu Gln Glu
Ile Phe Ser Asn Glu Met Ala Lys Val Asp Asp 180
185 190Ser Phe Phe His Arg Leu Glu Glu Ser Phe Leu Val
Glu Glu Asp Lys 195 200 205Lys His
Glu Arg His Pro Ile Phe Gly Asn Ile Val Asp Glu Val Ala 210
215 220Tyr His Glu Lys Tyr Pro Thr Ile Tyr His Leu
Arg Lys Lys Leu Val225 230 235
240Asp Ser Thr Asp Lys Ala Asp Leu Arg Leu Ile Tyr Leu Ala Leu Ala
245 250 255His Met Ile Lys
Phe Arg Gly His Phe Leu Ile Glu Gly Asp Leu Asn 260
265 270Pro Asp Asn Ser Asp Val Asp Lys Leu Phe Ile
Gln Leu Val Gln Thr 275 280 285Tyr
Asn Gln Leu Phe Glu Glu Asn Pro Ile Asn Ala Ser Gly Val Asp 290
295 300Ala Lys Ala Ile Leu Ser Ala Arg Leu Ser
Lys Ser Arg Arg Leu Glu305 310 315
320Asn Leu Ile Ala Gln Leu Pro Gly Glu Lys Lys Asn Gly Leu Phe
Gly 325 330 335Asn Leu Ile
Ala Leu Ser Leu Gly Leu Thr Pro Asn Phe Lys Ser Asn 340
345 350Phe Asp Leu Ala Glu Asp Ala Lys Leu Gln
Leu Ser Lys Asp Thr Tyr 355 360
365Asp Asp Asp Leu Asp Asn Leu Leu Ala Gln Ile Gly Asp Gln Tyr Ala 370
375 380Asp Leu Phe Leu Ala Ala Lys Asn
Leu Ser Asp Ala Ile Leu Leu Ser385 390
395 400Asp Ile Leu Arg Val Asn Thr Glu Ile Thr Lys Ala
Pro Leu Ser Ala 405 410
415Ser Met Ile Lys Arg Tyr Asp Glu His His Gln Asp Leu Thr Leu Leu
420 425 430Lys Ala Leu Val Arg Gln
Gln Leu Pro Glu Lys Tyr Lys Glu Ile Phe 435 440
445Phe Asp Gln Ser Lys Asn Gly Tyr Ala Gly Tyr Ile Asp Gly
Gly Ala 450 455 460Ser Gln Glu Glu Phe
Tyr Lys Phe Ile Lys Pro Ile Leu Glu Lys Met465 470
475 480Asp Gly Thr Glu Glu Leu Leu Val Lys Leu
Asn Arg Glu Asp Leu Leu 485 490
495Arg Lys Gln Arg Thr Phe Asp Asn Gly Ser Ile Pro His Gln Ile His
500 505 510Leu Gly Glu Leu His
Ala Ile Leu Arg Arg Gln Glu Asp Phe Tyr Pro 515
520 525Phe Leu Lys Asp Asn Arg Glu Lys Ile Glu Lys Ile
Leu Thr Phe Arg 530 535 540Ile Pro Tyr
Tyr Val Gly Pro Leu Ala Arg Gly Asn Ser Arg Phe Ala545
550 555 560Trp Met Thr Arg Lys Ser Glu
Glu Thr Ile Thr Pro Trp Asn Phe Glu 565
570 575Glu Val Val Asp Lys Gly Ala Ser Ala Gln Ser Phe
Ile Glu Arg Met 580 585 590Thr
Asn Phe Asp Lys Asn Leu Pro Asn Glu Lys Val Leu Pro Lys His 595
600 605Ser Leu Leu Tyr Glu Tyr Phe Thr Val
Tyr Asn Glu Leu Thr Lys Val 610 615
620Lys Tyr Val Thr Glu Gly Met Arg Lys Pro Ala Phe Leu Ser Gly Glu625
630 635 640Gln Lys Lys Ala
Ile Val Asp Leu Leu Phe Lys Thr Asn Arg Lys Val 645
650 655Thr Val Lys Gln Leu Lys Glu Asp Tyr Phe
Lys Lys Ile Glu Cys Phe 660 665
670Asp Ser Val Glu Ile Ser Gly Val Glu Asp Arg Phe Asn Ala Ser Leu
675 680 685Gly Thr Tyr His Asp Leu Leu
Lys Ile Ile Lys Asp Lys Asp Phe Leu 690 695
700Asp Asn Glu Glu Asn Glu Asp Ile Leu Glu Asp Ile Val Leu Thr
Leu705 710 715 720Thr Leu
Phe Glu Asp Arg Glu Met Ile Glu Glu Arg Leu Lys Thr Tyr
725 730 735Ala His Leu Phe Asp Asp Lys
Val Met Lys Gln Leu Lys Arg Arg Arg 740 745
750Tyr Thr Gly Trp Gly Arg Leu Ser Arg Lys Leu Ile Asn Gly
Ile Arg 755 760 765Asp Lys Gln Ser
Gly Lys Thr Ile Leu Asp Phe Leu Lys Ser Asp Gly 770
775 780Phe Ala Asn Arg Asn Phe Met Gln Leu Ile His Asp
Asp Ser Leu Thr785 790 795
800Phe Lys Glu Asp Ile Gln Lys Ala Gln Val Ser Gly Gln Gly Asp Ser
805 810 815Leu His Glu His Ile
Ala Asn Leu Ala Gly Ser Pro Ala Ile Lys Lys 820
825 830Gly Ile Leu Gln Thr Val Lys Val Val Asp Glu Leu
Val Lys Val Met 835 840 845Gly Arg
His Lys Pro Glu Asn Ile Val Ile Glu Met Ala Arg Glu Asn 850
855 860Gln Thr Thr Gln Lys Gly Gln Lys Asn Ser Arg
Glu Arg Met Lys Arg865 870 875
880Ile Glu Glu Gly Ile Lys Glu Leu Gly Ser Gln Ile Leu Lys Glu His
885 890 895Pro Val Glu Asn
Thr Gln Leu Gln Asn Glu Lys Leu Tyr Leu Tyr Tyr 900
905 910Leu Gln Asn Gly Arg Asp Met Tyr Val Asp Gln
Glu Leu Asp Ile Asn 915 920 925Arg
Leu Ser Asp Tyr Asp Val Asp His Ile Val Pro Gln Ser Phe Leu 930
935 940Lys Asp Asp Ser Ile Asp Asn Lys Val Leu
Thr Arg Ser Asp Lys Asn945 950 955
960Arg Gly Lys Ser Asp Asn Val Pro Ser Glu Glu Val Val Lys Lys
Met 965 970 975Lys Asn Tyr
Trp Arg Gln Leu Leu Asn Ala Lys Leu Ile Thr Gln Arg 980
985 990Lys Phe Asp Asn Leu Thr Lys Ala Glu Arg
Gly Gly Leu Ser Glu Leu 995 1000
1005Asp Lys Ala Gly Phe Ile Lys Arg Gln Leu Val Glu Thr Arg Gln
1010 1015 1020Ile Thr Lys His Val Ala
Gln Ile Leu Asp Ser Arg Met Asn Thr 1025 1030
1035Lys Tyr Asp Glu Asn Asp Lys Leu Ile Arg Glu Val Lys Val
Ile 1040 1045 1050Thr Leu Lys Ser Lys
Leu Val Ser Asp Phe Arg Lys Asp Phe Gln 1055 1060
1065Phe Tyr Lys Val Arg Glu Ile Asn Asn Tyr His His Ala
His Asp 1070 1075 1080Ala Tyr Leu Asn
Ala Val Val Gly Thr Ala Leu Ile Lys Lys Tyr 1085
1090 1095Pro Lys Leu Glu Ser Glu Phe Val Tyr Gly Asp
Tyr Lys Val Tyr 1100 1105 1110Asp Val
Arg Lys Met Ile Ala Lys Ser Glu Gln Glu Ile Gly Lys 1115
1120 1125Ala Thr Ala Lys Tyr Phe Phe Tyr Ser Asn
Ile Met Asn Phe Phe 1130 1135 1140Lys
Thr Glu Ile Thr Leu Ala Asn Gly Glu Ile Arg Lys Arg Pro 1145
1150 1155Leu Ile Glu Thr Asn Gly Glu Thr Gly
Glu Ile Val Trp Asp Lys 1160 1165
1170Gly Arg Asp Phe Ala Thr Val Arg Lys Val Leu Ser Met Pro Gln
1175 1180 1185Val Asn Ile Val Lys Lys
Thr Glu Val Gln Thr Gly Gly Phe Ser 1190 1195
1200Lys Glu Ser Ile Leu Pro Lys Arg Asn Ser Asp Lys Leu Ile
Ala 1205 1210 1215Arg Lys Lys Asp Trp
Asp Pro Lys Lys Tyr Gly Gly Phe Asp Ser 1220 1225
1230Pro Thr Val Ala Tyr Ser Val Leu Val Val Ala Lys Val
Glu Lys 1235 1240 1245Gly Lys Ser Lys
Lys Leu Lys Ser Val Lys Glu Leu Leu Gly Ile 1250
1255 1260Thr Ile Met Glu Arg Ser Ser Phe Glu Lys Asn
Pro Ile Asp Phe 1265 1270 1275Leu Glu
Ala Lys Gly Tyr Lys Glu Val Lys Lys Asp Leu Ile Ile 1280
1285 1290Lys Leu Pro Lys Tyr Ser Leu Phe Glu Leu
Glu Asn Gly Arg Lys 1295 1300 1305Arg
Met Leu Ala Ser Ala Gly Glu Leu Gln Lys Gly Asn Glu Leu 1310
1315 1320Ala Leu Pro Ser Lys Tyr Val Asn Phe
Leu Tyr Leu Ala Ser His 1325 1330
1335Tyr Glu Lys Leu Lys Gly Ser Pro Glu Asp Asn Glu Gln Lys Gln
1340 1345 1350Leu Phe Val Glu Gln His
Lys His Tyr Leu Asp Glu Ile Ile Glu 1355 1360
1365Gln Ile Ser Glu Phe Ser Lys Arg Val Ile Leu Ala Asp Ala
Asn 1370 1375 1380Leu Asp Lys Val Leu
Ser Ala Tyr Asn Lys His Arg Asp Lys Pro 1385 1390
1395Ile Arg Glu Gln Ala Glu Asn Ile Ile His Leu Phe Thr
Leu Thr 1400 1405 1410Asn Leu Gly Ala
Pro Ala Ala Phe Lys Tyr Phe Asp Thr Thr Ile 1415
1420 1425Asp Arg Lys Arg Tyr Thr Ser Thr Lys Glu Val
Leu Asp Ala Thr 1430 1435 1440Leu Ile
His Gln Ser Ile Thr Gly Leu Tyr Glu Thr Arg Ile Asp 1445
1450 1455Leu Ser Gln Leu Gly Gly Asp Ser Gly Gly
Ser Thr Asn Leu Ser 1460 1465 1470Asp
Ile Ile Glu Lys Glu Thr Gly Lys Gln Leu Val Ile Gln Glu 1475
1480 1485Ser Ile Leu Met Leu Pro Glu Glu Val
Glu Glu Val Ile Gly Asn 1490 1495
1500Lys Pro Glu Ser Asp Ile Leu Val His Thr Ala Tyr Asp Glu Ser
1505 1510 1515Thr Asp Glu Asn Val Met
Leu Leu Thr Ser Asp Ala Pro Glu Tyr 1520 1525
1530Lys Pro Trp Ala Leu Val Ile Gln Asp Ser Asn Gly Glu Asn
Lys 1535 1540 1545Ile Lys Met Leu Ser
Gly Gly Ser Pro Lys Lys Lys Arg Lys Val 1550 1555
1560
37 1528 PRT Artificial Sequence N-hAPOBEC3A sDA2.5-nCas9-UGI-C
37 Met Thr Tyr Asp Glu Phe Lys His Cys Trp Asp Thr Phe
Val Asp His1 5 10 15Gln
Gly Cys Pro Phe Gln Pro Trp Asp Gly Leu Asp Glu His Ser Gln 20
25 30Ala Leu Ser Gly Arg Leu Arg Ala
Ile Leu Gln Asn Gln Gly Asn Ser 35 40
45Gly Ser Glu Thr Pro Gly Thr Ser Glu Ser Ala Thr Pro Glu Ser Asp
50 55 60Lys Lys Tyr Ser Ile Gly Leu Ala
Ile Gly Thr Asn Ser Val Gly Trp65 70 75
80Ala Val Ile Thr Asp Glu Tyr Lys Val Pro Ser Lys Lys
Phe Lys Val 85 90 95Leu
Gly Asn Thr Asp Arg His Ser Ile Lys Lys Asn Leu Ile Gly Ala
100 105 110Leu Leu Phe Asp Ser Gly Glu
Thr Ala Glu Ala Thr Arg Leu Lys Arg 115 120
125Thr Ala Arg Arg Arg Tyr Thr Arg Arg Lys Asn Arg Ile Cys Tyr
Leu 130 135 140Gln Glu Ile Phe Ser Asn
Glu Met Ala Lys Val Asp Asp Ser Phe Phe145 150
155 160His Arg Leu Glu Glu Ser Phe Leu Val Glu Glu
Asp Lys Lys His Glu 165 170
175Arg His Pro Ile Phe Gly Asn Ile Val Asp Glu Val Ala Tyr His Glu
180 185 190Lys Tyr Pro Thr Ile Tyr
His Leu Arg Lys Lys Leu Val Asp Ser Thr 195 200
205Asp Lys Ala Asp Leu Arg Leu Ile Tyr Leu Ala Leu Ala His
Met Ile 210 215 220Lys Phe Arg Gly His
Phe Leu Ile Glu Gly Asp Leu Asn Pro Asp Asn225 230
235 240Ser Asp Val Asp Lys Leu Phe Ile Gln Leu
Val Gln Thr Tyr Asn Gln 245 250
255Leu Phe Glu Glu Asn Pro Ile Asn Ala Ser Gly Val Asp Ala Lys Ala
260 265 270Ile Leu Ser Ala Arg
Leu Ser Lys Ser Arg Arg Leu Glu Asn Leu Ile 275
280 285Ala Gln Leu Pro Gly Glu Lys Lys Asn Gly Leu Phe
Gly Asn Leu Ile 290 295 300Ala Leu Ser
Leu Gly Leu Thr Pro Asn Phe Lys Ser Asn Phe Asp Leu305
310 315 320Ala Glu Asp Ala Lys Leu Gln
Leu Ser Lys Asp Thr Tyr Asp Asp Asp 325
330 335Leu Asp Asn Leu Leu Ala Gln Ile Gly Asp Gln Tyr
Ala Asp Leu Phe 340 345 350Leu
Ala Ala Lys Asn Leu Ser Asp Ala Ile Leu Leu Ser Asp Ile Leu 355
360 365Arg Val Asn Thr Glu Ile Thr Lys Ala
Pro Leu Ser Ala Ser Met Ile 370 375
380Lys Arg Tyr Asp Glu His His Gln Asp Leu Thr Leu Leu Lys Ala Leu385
390 395 400Val Arg Gln Gln
Leu Pro Glu Lys Tyr Lys Glu Ile Phe Phe Asp Gln 405
410 415Ser Lys Asn Gly Tyr Ala Gly Tyr Ile Asp
Gly Gly Ala Ser Gln Glu 420 425
430Glu Phe Tyr Lys Phe Ile Lys Pro Ile Leu Glu Lys Met Asp Gly Thr
435 440 445Glu Glu Leu Leu Val Lys Leu
Asn Arg Glu Asp Leu Leu Arg Lys Gln 450 455
460Arg Thr Phe Asp Asn Gly Ser Ile Pro His Gln Ile His Leu Gly
Glu465 470 475 480Leu His
Ala Ile Leu Arg Arg Gln Glu Asp Phe Tyr Pro Phe Leu Lys
485 490 495Asp Asn Arg Glu Lys Ile Glu
Lys Ile Leu Thr Phe Arg Ile Pro Tyr 500 505
510Tyr Val Gly Pro Leu Ala Arg Gly Asn Ser Arg Phe Ala Trp
Met Thr 515 520 525Arg Lys Ser Glu
Glu Thr Ile Thr Pro Trp Asn Phe Glu Glu Val Val 530
535 540Asp Lys Gly Ala Ser Ala Gln Ser Phe Ile Glu Arg
Met Thr Asn Phe545 550 555
560Asp Lys Asn Leu Pro Asn Glu Lys Val Leu Pro Lys His Ser Leu Leu
565 570 575Tyr Glu Tyr Phe Thr
Val Tyr Asn Glu Leu Thr Lys Val Lys Tyr Val 580
585 590Thr Glu Gly Met Arg Lys Pro Ala Phe Leu Ser Gly
Glu Gln Lys Lys 595 600 605Ala Ile
Val Asp Leu Leu Phe Lys Thr Asn Arg Lys Val Thr Val Lys 610
615 620Gln Leu Lys Glu Asp Tyr Phe Lys Lys Ile Glu
Cys Phe Asp Ser Val625 630 635
640Glu Ile Ser Gly Val Glu Asp Arg Phe Asn Ala Ser Leu Gly Thr Tyr
645 650 655His Asp Leu Leu
Lys Ile Ile Lys Asp Lys Asp Phe Leu Asp Asn Glu 660
665 670Glu Asn Glu Asp Ile Leu Glu Asp Ile Val Leu
Thr Leu Thr Leu Phe 675 680 685Glu
Asp Arg Glu Met Ile Glu Glu Arg Leu Lys Thr Tyr Ala His Leu 690
695 700Phe Asp Asp Lys Val Met Lys Gln Leu Lys
Arg Arg Arg Tyr Thr Gly705 710 715
720Trp Gly Arg Leu Ser Arg Lys Leu Ile Asn Gly Ile Arg Asp Lys
Gln 725 730 735Ser Gly Lys
Thr Ile Leu Asp Phe Leu Lys Ser Asp Gly Phe Ala Asn 740
745 750Arg Asn Phe Met Gln Leu Ile His Asp Asp
Ser Leu Thr Phe Lys Glu 755 760
765Asp Ile Gln Lys Ala Gln Val Ser Gly Gln Gly Asp Ser Leu His Glu 770
775 780His Ile Ala Asn Leu Ala Gly Ser
Pro Ala Ile Lys Lys Gly Ile Leu785 790
795 800Gln Thr Val Lys Val Val Asp Glu Leu Val Lys Val
Met Gly Arg His 805 810
815Lys Pro Glu Asn Ile Val Ile Glu Met Ala Arg Glu Asn Gln Thr Thr
820 825 830Gln Lys Gly Gln Lys Asn
Ser Arg Glu Arg Met Lys Arg Ile Glu Glu 835 840
845Gly Ile Lys Glu Leu Gly Ser Gln Ile Leu Lys Glu His Pro
Val Glu 850 855 860Asn Thr Gln Leu Gln
Asn Glu Lys Leu Tyr Leu Tyr Tyr Leu Gln Asn865 870
875 880Gly Arg Asp Met Tyr Val Asp Gln Glu Leu
Asp Ile Asn Arg Leu Ser 885 890
895Asp Tyr Asp Val Asp His Ile Val Pro Gln Ser Phe Leu Lys Asp Asp
900 905 910Ser Ile Asp Asn Lys
Val Leu Thr Arg Ser Asp Lys Asn Arg Gly Lys 915
920 925Ser Asp Asn Val Pro Ser Glu Glu Val Val Lys Lys
Met Lys Asn Tyr 930 935 940Trp Arg Gln
Leu Leu Asn Ala Lys Leu Ile Thr Gln Arg Lys Phe Asp945
950 955 960Asn Leu Thr Lys Ala Glu Arg
Gly Gly Leu Ser Glu Leu Asp Lys Ala 965
970 975Gly Phe Ile Lys Arg Gln Leu Val Glu Thr Arg Gln
Ile Thr Lys His 980 985 990Val
Ala Gln Ile Leu Asp Ser Arg Met Asn Thr Lys Tyr Asp Glu Asn 995
1000 1005Asp Lys Leu Ile Arg Glu Val Lys
Val Ile Thr Leu Lys Ser Lys 1010 1015
1020Leu Val Ser Asp Phe Arg Lys Asp Phe Gln Phe Tyr Lys Val Arg
1025 1030 1035Glu Ile Asn Asn Tyr His
His Ala His Asp Ala Tyr Leu Asn Ala 1040 1045
1050Val Val Gly Thr Ala Leu Ile Lys Lys Tyr Pro Lys Leu Glu
Ser 1055 1060 1065Glu Phe Val Tyr Gly
Asp Tyr Lys Val Tyr Asp Val Arg Lys Met 1070 1075
1080Ile Ala Lys Ser Glu Gln Glu Ile Gly Lys Ala Thr Ala
Lys Tyr 1085 1090 1095Phe Phe Tyr Ser
Asn Ile Met Asn Phe Phe Lys Thr Glu Ile Thr 1100
1105 1110Leu Ala Asn Gly Glu Ile Arg Lys Arg Pro Leu
Ile Glu Thr Asn 1115 1120 1125Gly Glu
Thr Gly Glu Ile Val Trp Asp Lys Gly Arg Asp Phe Ala 1130
1135 1140Thr Val Arg Lys Val Leu Ser Met Pro Gln
Val Asn Ile Val Lys 1145 1150 1155Lys
Thr Glu Val Gln Thr Gly Gly Phe Ser Lys Glu Ser Ile Leu 1160
1165 1170Pro Lys Arg Asn Ser Asp Lys Leu Ile
Ala Arg Lys Lys Asp Trp 1175 1180
1185Asp Pro Lys Lys Tyr Gly Gly Phe Asp Ser Pro Thr Val Ala Tyr
1190 1195 1200Ser Val Leu Val Val Ala
Lys Val Glu Lys Gly Lys Ser Lys Lys 1205 1210
1215Leu Lys Ser Val Lys Glu Leu Leu Gly Ile Thr Ile Met Glu
Arg 1220 1225 1230Ser Ser Phe Glu Lys
Asn Pro Ile Asp Phe Leu Glu Ala Lys Gly 1235 1240
1245Tyr Lys Glu Val Lys Lys Asp Leu Ile Ile Lys Leu Pro
Lys Tyr 1250 1255 1260Ser Leu Phe Glu
Leu Glu Asn Gly Arg Lys Arg Met Leu Ala Ser 1265
1270 1275Ala Gly Glu Leu Gln Lys Gly Asn Glu Leu Ala
Leu Pro Ser Lys 1280 1285 1290Tyr Val
Asn Phe Leu Tyr Leu Ala Ser His Tyr Glu Lys Leu Lys 1295
1300 1305Gly Ser Pro Glu Asp Asn Glu Gln Lys Gln
Leu Phe Val Glu Gln 1310 1315 1320His
Lys His Tyr Leu Asp Glu Ile Ile Glu Gln Ile Ser Glu Phe 1325
1330 1335Ser Lys Arg Val Ile Leu Ala Asp Ala
Asn Leu Asp Lys Val Leu 1340 1345
1350Ser Ala Tyr Asn Lys His Arg Asp Lys Pro Ile Arg Glu Gln Ala
1355 1360 1365Glu Asn Ile Ile His Leu
Phe Thr Leu Thr Asn Leu Gly Ala Pro 1370 1375
1380Ala Ala Phe Lys Tyr Phe Asp Thr Thr Ile Asp Arg Lys Arg
Tyr 1385 1390 1395Thr Ser Thr Lys Glu
Val Leu Asp Ala Thr Leu Ile His Gln Ser 1400 1405
1410Ile Thr Gly Leu Tyr Glu Thr Arg Ile Asp Leu Ser Gln
Leu Gly 1415 1420 1425Gly Asp Ser Gly
Gly Ser Thr Asn Leu Ser Asp Ile Ile Glu Lys 1430
1435 1440Glu Thr Gly Lys Gln Leu Val Ile Gln Glu Ser
Ile Leu Met Leu 1445 1450 1455Pro Glu
Glu Val Glu Glu Val Ile Gly Asn Lys Pro Glu Ser Asp 1460
1465 1470Ile Leu Val His Thr Ala Tyr Asp Glu Ser
Thr Asp Glu Asn Val 1475 1480 1485Met
Leu Leu Thr Ser Asp Ala Pro Glu Tyr Lys Pro Trp Ala Leu 1490
1495 1500Val Ile Gln Asp Ser Asn Gly Glu Asn
Lys Ile Lys Met Leu Ser 1505 1510
1515Gly Gly Ser Pro Lys Lys Lys Arg Lys Val 1520
1525
38 1509 PRT Artificial Sequence N-hAPOBEC3A sDA2.5-nCas9-UGI-C
38 Met Phe
Gln Pro Trp Asp Gly Leu Asp Glu His Ser Gln Ala Leu Ser1 5
10 15Gly Arg Leu Arg Ala Ile Leu Gln
Asn Gln Gly Asn Ser Gly Ser Glu 20 25
30Thr Pro Gly Thr Ser Glu Ser Ala Thr Pro Glu Ser Asp Lys Lys
Tyr 35 40 45Ser Ile Gly Leu Ala
Ile Gly Thr Asn Ser Val Gly Trp Ala Val Ile 50 55
60Thr Asp Glu Tyr Lys Val Pro Ser Lys Lys Phe Lys Val Leu
Gly Asn65 70 75 80Thr
Asp Arg His Ser Ile Lys Lys Asn Leu Ile Gly Ala Leu Leu Phe
85 90 95Asp Ser Gly Glu Thr Ala Glu
Ala Thr Arg Leu Lys Arg Thr Ala Arg 100 105
110Arg Arg Tyr Thr Arg Arg Lys Asn Arg Ile Cys Tyr Leu Gln
Glu Ile 115 120 125Phe Ser Asn Glu
Met Ala Lys Val Asp Asp Ser Phe Phe His Arg Leu 130
135 140Glu Glu Ser Phe Leu Val Glu Glu Asp Lys Lys His
Glu Arg His Pro145 150 155
160Ile Phe Gly Asn Ile Val Asp Glu Val Ala Tyr His Glu Lys Tyr Pro
165 170 175Thr Ile Tyr His Leu
Arg Lys Lys Leu Val Asp Ser Thr Asp Lys Ala 180
185 190Asp Leu Arg Leu Ile Tyr Leu Ala Leu Ala His Met
Ile Lys Phe Arg 195 200 205Gly His
Phe Leu Ile Glu Gly Asp Leu Asn Pro Asp Asn Ser Asp Val 210
215 220Asp Lys Leu Phe Ile Gln Leu Val Gln Thr Tyr
Asn Gln Leu Phe Glu225 230 235
240Glu Asn Pro Ile Asn Ala Ser Gly Val Asp Ala Lys Ala Ile Leu Ser
245 250 255Ala Arg Leu Ser
Lys Ser Arg Arg Leu Glu Asn Leu Ile Ala Gln Leu 260
265 270Pro Gly Glu Lys Lys Asn Gly Leu Phe Gly Asn
Leu Ile Ala Leu Ser 275 280 285Leu
Gly Leu Thr Pro Asn Phe Lys Ser Asn Phe Asp Leu Ala Glu Asp 290
295 300Ala Lys Leu Gln Leu Ser Lys Asp Thr Tyr
Asp Asp Asp Leu Asp Asn305 310 315
320Leu Leu Ala Gln Ile Gly Asp Gln Tyr Ala Asp Leu Phe Leu Ala
Ala 325 330 335Lys Asn Leu
Ser Asp Ala Ile Leu Leu Ser Asp Ile Leu Arg Val Asn 340
345 350Thr Glu Ile Thr Lys Ala Pro Leu Ser Ala
Ser Met Ile Lys Arg Tyr 355 360
365Asp Glu His His Gln Asp Leu Thr Leu Leu Lys Ala Leu Val Arg Gln 370
375 380Gln Leu Pro Glu Lys Tyr Lys Glu
Ile Phe Phe Asp Gln Ser Lys Asn385 390
395 400Gly Tyr Ala Gly Tyr Ile Asp Gly Gly Ala Ser Gln
Glu Glu Phe Tyr 405 410
415Lys Phe Ile Lys Pro Ile Leu Glu Lys Met Asp Gly Thr Glu Glu Leu
420 425 430Leu Val Lys Leu Asn Arg
Glu Asp Leu Leu Arg Lys Gln Arg Thr Phe 435 440
445Asp Asn Gly Ser Ile Pro His Gln Ile His Leu Gly Glu Leu
His Ala 450 455 460Ile Leu Arg Arg Gln
Glu Asp Phe Tyr Pro Phe Leu Lys Asp Asn Arg465 470
475 480Glu Lys Ile Glu Lys Ile Leu Thr Phe Arg
Ile Pro Tyr Tyr Val Gly 485 490
495Pro Leu Ala Arg Gly Asn Ser Arg Phe Ala Trp Met Thr Arg Lys Ser
500 505 510Glu Glu Thr Ile Thr
Pro Trp Asn Phe Glu Glu Val Val Asp Lys Gly 515
520 525Ala Ser Ala Gln Ser Phe Ile Glu Arg Met Thr Asn
Phe Asp Lys Asn 530 535 540Leu Pro Asn
Glu Lys Val Leu Pro Lys His Ser Leu Leu Tyr Glu Tyr545
550 555 560Phe Thr Val Tyr Asn Glu Leu
Thr Lys Val Lys Tyr Val Thr Glu Gly 565
570 575Met Arg Lys Pro Ala Phe Leu Ser Gly Glu Gln Lys
Lys Ala Ile Val 580 585 590Asp
Leu Leu Phe Lys Thr Asn Arg Lys Val Thr Val Lys Gln Leu Lys 595
600 605Glu Asp Tyr Phe Lys Lys Ile Glu Cys
Phe Asp Ser Val Glu Ile Ser 610 615
620Gly Val Glu Asp Arg Phe Asn Ala Ser Leu Gly Thr Tyr His Asp Leu625
630 635 640Leu Lys Ile Ile
Lys Asp Lys Asp Phe Leu Asp Asn Glu Glu Asn Glu 645
650 655Asp Ile Leu Glu Asp Ile Val Leu Thr Leu
Thr Leu Phe Glu Asp Arg 660 665
670Glu Met Ile Glu Glu Arg Leu Lys Thr Tyr Ala His Leu Phe Asp Asp
675 680 685Lys Val Met Lys Gln Leu Lys
Arg Arg Arg Tyr Thr Gly Trp Gly Arg 690 695
700Leu Ser Arg Lys Leu Ile Asn Gly Ile Arg Asp Lys Gln Ser Gly
Lys705 710 715 720Thr Ile
Leu Asp Phe Leu Lys Ser Asp Gly Phe Ala Asn Arg Asn Phe
725 730 735Met Gln Leu Ile His Asp Asp
Ser Leu Thr Phe Lys Glu Asp Ile Gln 740 745
750Lys Ala Gln Val Ser Gly Gln Gly Asp Ser Leu His Glu His
Ile Ala 755 760 765Asn Leu Ala Gly
Ser Pro Ala Ile Lys Lys Gly Ile Leu Gln Thr Val 770
775 780Lys Val Val Asp Glu Leu Val Lys Val Met Gly Arg
His Lys Pro Glu785 790 795
800Asn Ile Val Ile Glu Met Ala Arg Glu Asn Gln Thr Thr Gln Lys Gly
805 810 815Gln Lys Asn Ser Arg
Glu Arg Met Lys Arg Ile Glu Glu Gly Ile Lys 820
825 830Glu Leu Gly Ser Gln Ile Leu Lys Glu His Pro Val
Glu Asn Thr Gln 835 840 845Leu Gln
Asn Glu Lys Leu Tyr Leu Tyr Tyr Leu Gln Asn Gly Arg Asp 850
855 860Met Tyr Val Asp Gln Glu Leu Asp Ile Asn Arg
Leu Ser Asp Tyr Asp865 870 875
880Val Asp His Ile Val Pro Gln Ser Phe Leu Lys Asp Asp Ser Ile Asp
885 890 895Asn Lys Val Leu
Thr Arg Ser Asp Lys Asn Arg Gly Lys Ser Asp Asn 900
905 910Val Pro Ser Glu Glu Val Val Lys Lys Met Lys
Asn Tyr Trp Arg Gln 915 920 925Leu
Leu Asn Ala Lys Leu Ile Thr Gln Arg Lys Phe Asp Asn Leu Thr 930
935 940Lys Ala Glu Arg Gly Gly Leu Ser Glu Leu
Asp Lys Ala Gly Phe Ile945 950 955
960Lys Arg Gln Leu Val Glu Thr Arg Gln Ile Thr Lys His Val Ala
Gln 965 970 975Ile Leu Asp
Ser Arg Met Asn Thr Lys Tyr Asp Glu Asn Asp Lys Leu 980
985 990Ile Arg Glu Val Lys Val Ile Thr Leu Lys
Ser Lys Leu Val Ser Asp 995 1000
1005Phe Arg Lys Asp Phe Gln Phe Tyr Lys Val Arg Glu Ile Asn Asn
1010 1015 1020Tyr His His Ala His Asp
Ala Tyr Leu Asn Ala Val Val Gly Thr 1025 1030
1035Ala Leu Ile Lys Lys Tyr Pro Lys Leu Glu Ser Glu Phe Val
Tyr 1040 1045 1050Gly Asp Tyr Lys Val
Tyr Asp Val Arg Lys Met Ile Ala Lys Ser 1055 1060
1065Glu Gln Glu Ile Gly Lys Ala Thr Ala Lys Tyr Phe Phe
Tyr Ser 1070 1075 1080Asn Ile Met Asn
Phe Phe Lys Thr Glu Ile Thr Leu Ala Asn Gly 1085
1090 1095Glu Ile Arg Lys Arg Pro Leu Ile Glu Thr Asn
Gly Glu Thr Gly 1100 1105 1110Glu Ile
Val Trp Asp Lys Gly Arg Asp Phe Ala Thr Val Arg Lys 1115
1120 1125Val Leu Ser Met Pro Gln Val Asn Ile Val
Lys Lys Thr Glu Val 1130 1135 1140Gln
Thr Gly Gly Phe Ser Lys Glu Ser Ile Leu Pro Lys Arg Asn 1145
1150 1155Ser Asp Lys Leu Ile Ala Arg Lys Lys
Asp Trp Asp Pro Lys Lys 1160 1165
1170Tyr Gly Gly Phe Asp Ser Pro Thr Val Ala Tyr Ser Val Leu Val
1175 1180 1185Val Ala Lys Val Glu Lys
Gly Lys Ser Lys Lys Leu Lys Ser Val 1190 1195
1200Lys Glu Leu Leu Gly Ile Thr Ile Met Glu Arg Ser Ser Phe
Glu 1205 1210 1215Lys Asn Pro Ile Asp
Phe Leu Glu Ala Lys Gly Tyr Lys Glu Val 1220 1225
1230Lys Lys Asp Leu Ile Ile Lys Leu Pro Lys Tyr Ser Leu
Phe Glu 1235 1240 1245Leu Glu Asn Gly
Arg Lys Arg Met Leu Ala Ser Ala Gly Glu Leu 1250
1255 1260Gln Lys Gly Asn Glu Leu Ala Leu Pro Ser Lys
Tyr Val Asn Phe 1265 1270 1275Leu Tyr
Leu Ala Ser His Tyr Glu Lys Leu Lys Gly Ser Pro Glu 1280
1285 1290Asp Asn Glu Gln Lys Gln Leu Phe Val Glu
Gln His Lys His Tyr 1295 1300 1305Leu
Asp Glu Ile Ile Glu Gln Ile Ser Glu Phe Ser Lys Arg Val 1310
1315 1320Ile Leu Ala Asp Ala Asn Leu Asp Lys
Val Leu Ser Ala Tyr Asn 1325 1330
1335Lys His Arg Asp Lys Pro Ile Arg Glu Gln Ala Glu Asn Ile Ile
1340 1345 1350His Leu Phe Thr Leu Thr
Asn Leu Gly Ala Pro Ala Ala Phe Lys 1355 1360
1365Tyr Phe Asp Thr Thr Ile Asp Arg Lys Arg Tyr Thr Ser Thr
Lys 1370 1375 1380Glu Val Leu Asp Ala
Thr Leu Ile His Gln Ser Ile Thr Gly Leu 1385 1390
1395Tyr Glu Thr Arg Ile Asp Leu Ser Gln Leu Gly Gly Asp
Ser Gly 1400 1405 1410Gly Ser Thr Asn
Leu Ser Asp Ile Ile Glu Lys Glu Thr Gly Lys 1415
1420 1425Gln Leu Val Ile Gln Glu Ser Ile Leu Met Leu
Pro Glu Glu Val 1430 1435 1440Glu Glu
Val Ile Gly Asn Lys Pro Glu Ser Asp Ile Leu Val His 1445
1450 1455Thr Ala Tyr Asp Glu Ser Thr Asp Glu Asn
Val Met Leu Leu Thr 1460 1465 1470Ser
Asp Ala Pro Glu Tyr Lys Pro Trp Ala Leu Val Ile Gln Asp 1475
1480 1485Ser Asn Gly Glu Asn Lys Ile Lys Met
Leu Ser Gly Gly Ser Pro 1490 1495
1500Lys Lys Lys Arg Lys Val 1505
39 1053 PRT S. aureus
39 Met Lys Arg Asn
Tyr Ile Leu Gly Leu Asp Ile Gly Ile Thr Ser Val1 5
10 15Gly Tyr Gly Ile Ile Asp Tyr Glu Thr Arg
Asp Val Ile Asp Ala Gly 20 25
30Val Arg Leu Phe Lys Glu Ala Asn Val Glu Asn Asn Glu Gly Arg Arg
35 40 45Ser Lys Arg Gly Ala Arg Arg Leu
Lys Arg Arg Arg Arg His Arg Ile 50 55
60Gln Arg Val Lys Lys Leu Leu Phe Asp Tyr Asn Leu Leu Thr Asp His65
70 75 80Ser Glu Leu Ser Gly
Ile Asn Pro Tyr Glu Ala Arg Val Lys Gly Leu 85
90 95Ser Gln Lys Leu Ser Glu Glu Glu Phe Ser Ala
Ala Leu Leu His Leu 100 105
110Ala Lys Arg Arg Gly Val His Asn Val Asn Glu Val Glu Glu Asp Thr
115 120 125Gly Asn Glu Leu Ser Thr Lys
Glu Gln Ile Ser Arg Asn Ser Lys Ala 130 135
140Leu Glu Glu Lys Tyr Val Ala Glu Leu Gln Leu Glu Arg Leu Lys
Lys145 150 155 160Asp Gly
Glu Val Arg Gly Ser Ile Asn Arg Phe Lys Thr Ser Asp Tyr
165 170 175Val Lys Glu Ala Lys Gln Leu
Leu Lys Val Gln Lys Ala Tyr His Gln 180 185
190Leu Asp Gln Ser Phe Ile Asp Thr Tyr Ile Asp Leu Leu Glu
Thr Arg 195 200 205Arg Thr Tyr Tyr
Glu Gly Pro Gly Glu Gly Ser Pro Phe Gly Trp Lys 210
215 220Asp Ile Lys Glu Trp Tyr Glu Met Leu Met Gly His
Cys Thr Tyr Phe225 230 235
240Pro Glu Glu Leu Arg Ser Val Lys Tyr Ala Tyr Asn Ala Asp Leu Tyr
245 250 255Asn Ala Leu Asn Asp
Leu Asn Asn Leu Val Ile Thr Arg Asp Glu Asn 260
265 270Glu Lys Leu Glu Tyr Tyr Glu Lys Phe Gln Ile Ile
Glu Asn Val Phe 275 280 285Lys Gln
Lys Lys Lys Pro Thr Leu Lys Gln Ile Ala Lys Glu Ile Leu 290
295 300Val Asn Glu Glu Asp Ile Lys Gly Tyr Arg Val
Thr Ser Thr Gly Lys305 310 315
320Pro Glu Phe Thr Asn Leu Lys Val Tyr His Asp Ile Lys Asp Ile Thr
325 330 335Ala Arg Lys Glu
Ile Ile Glu Asn Ala Glu Leu Leu Asp Gln Ile Ala 340
345 350Lys Ile Leu Thr Ile Tyr Gln Ser Ser Glu Asp
Ile Gln Glu Glu Leu 355 360 365Thr
Asn Leu Asn Ser Glu Leu Thr Gln Glu Glu Ile Glu Gln Ile Ser 370
375 380Asn Leu Lys Gly Tyr Thr Gly Thr His Asn
Leu Ser Leu Lys Ala Ile385 390 395
400Asn Leu Ile Leu Asp Glu Leu Trp His Thr Asn Asp Asn Gln Ile
Ala 405 410 415Ile Phe Asn
Arg Leu Lys Leu Val Pro Lys Lys Val Asp Leu Ser Gln 420
425 430Gln Lys Glu Ile Pro Thr Thr Leu Val Asp
Asp Phe Ile Leu Ser Pro 435 440
445Val Val Lys Arg Ser Phe Ile Gln Ser Ile Lys Val Ile Asn Ala Ile 450
455 460Ile Lys Lys Tyr Gly Leu Pro Asn
Asp Ile Ile Ile Glu Leu Ala Arg465 470
475 480Glu Lys Asn Ser Lys Asp Ala Gln Lys Met Ile Asn
Glu Met Gln Lys 485 490
495Arg Asn Arg Gln Thr Asn Glu Arg Ile Glu Glu Ile Ile Arg Thr Thr
500 505 510Gly Lys Glu Asn Ala Lys
Tyr Leu Ile Glu Lys Ile Lys Leu His Asp 515 520
525Met Gln Glu Gly Lys Cys Leu Tyr Ser Leu Glu Ala Ile Pro
Leu Glu 530 535 540Asp Leu Leu Asn Asn
Pro Phe Asn Tyr Glu Val Asp His Ile Ile Pro545 550
555 560Arg Ser Val Ser Phe Asp Asn Ser Phe Asn
Asn Lys Val Leu Val Lys 565 570
575Gln Glu Glu Asn Ser Lys Lys Gly Asn Arg Thr Pro Phe Gln Tyr Leu
580 585 590Ser Ser Ser Asp Ser
Lys Ile Ser Tyr Glu Thr Phe Lys Lys His Ile 595
600 605Leu Asn Leu Ala Lys Gly Lys Gly Arg Ile Ser Lys
Thr Lys Lys Glu 610 615 620Tyr Leu Leu
Glu Glu Arg Asp Ile Asn Arg Phe Ser Val Gln Lys Asp625
630 635 640Phe Ile Asn Arg Asn Leu Val
Asp Thr Arg Tyr Ala Thr Arg Gly Leu 645
650 655Met Asn Leu Leu Arg Ser Tyr Phe Arg Val Asn Asn
Leu Asp Val Lys 660 665 670Val
Lys Ser Ile Asn Gly Gly Phe Thr Ser Phe Leu Arg Arg Lys Trp 675
680 685Lys Phe Lys Lys Glu Arg Asn Lys Gly
Tyr Lys His His Ala Glu Asp 690 695
700Ala Leu Ile Ile Ala Asn Ala Asp Phe Ile Phe Lys Glu Trp Lys Lys705
710 715 720Leu Asp Lys Ala
Lys Lys Val Met Glu Asn Gln Met Phe Glu Glu Lys 725
730 735Gln Ala Glu Ser Met Pro Glu Ile Glu Thr
Glu Gln Glu Tyr Lys Glu 740 745
750Ile Phe Ile Thr Pro His Gln Ile Lys His Ile Lys Asp Phe Lys Asp
755 760 765Tyr Lys Tyr Ser His Arg Val
Asp Lys Lys Pro Asn Arg Glu Leu Ile 770 775
780Asn Asp Thr Leu Tyr Ser Thr Arg Lys Asp Asp Lys Gly Asn Thr
Leu785 790 795 800Ile Val
Asn Asn Leu Asn Gly Leu Tyr Asp Lys Asp Asn Asp Lys Leu
805 810 815Lys Lys Leu Ile Asn Lys Ser
Pro Glu Lys Leu Leu Met Tyr His His 820 825
830Asp Pro Gln Thr Tyr Gln Lys Leu Lys Leu Ile Met Glu Gln
Tyr Gly 835 840 845Asp Glu Lys Asn
Pro Leu Tyr Lys Tyr Tyr Glu Glu Thr Gly Asn Tyr 850
855 860Leu Thr Lys Tyr Ser Lys Lys Asp Asn Gly Pro Val
Ile Lys Lys Ile865 870 875
880Lys Tyr Tyr Gly Asn Lys Leu Asn Ala His Leu Asp Ile Thr Asp Asp
885 890 895Tyr Pro Asn Ser Arg
Asn Lys Val Val Lys Leu Ser Leu Lys Pro Tyr 900
905 910Arg Phe Asp Val Tyr Leu Asp Asn Gly Val Tyr Lys
Phe Val Thr Val 915 920 925Lys Asn
Leu Asp Val Ile Lys Lys Glu Asn Tyr Tyr Glu Val Asn Ser 930
935 940Lys Cys Tyr Glu Glu Ala Lys Lys Leu Lys Lys
Ile Ser Asn Gln Ala945 950 955
960Glu Phe Ile Ala Ser Phe Tyr Asn Asn Asp Leu Ile Lys Ile Asn Gly
965 970 975Glu Leu Tyr Arg
Val Ile Gly Val Asn Asn Asp Leu Leu Asn Arg Ile 980
985 990Glu Val Asn Met Ile Asp Ile Thr Tyr Arg Glu
Tyr Leu Glu Asn Met 995 1000
1005Asn Asp Lys Arg Pro Pro Arg Ile Ile Lys Thr Ile Ala Ser Lys
1010 1015 1020Thr Gln Ser Ile Lys Lys
Tyr Ser Thr Asp Ile Leu Gly Asn Leu 1025 1030
1035Tyr Glu Val Lys Ser Lys Lys His Pro Gln Ile Ile Lys Lys
Gly 1040 1045 1050
40 984 PRT C. jejuni
40 Met Ala Arg Ile Leu Ala Phe Asp Ile Gly Ile Ser Ser Ile Gly Trp1
5 10 15Ala Phe Ser Glu Asn Asp
Glu Leu Lys Asp Cys Gly Val Arg Ile Phe 20 25
30Thr Lys Val Glu Asn Pro Lys Thr Gly Glu Ser Leu Ala
Leu Pro Arg 35 40 45Arg Leu Ala
Arg Ser Ala Arg Lys Arg Leu Ala Arg Arg Lys Ala Arg 50
55 60Leu Asn His Leu Lys His Leu Ile Ala Asn Glu Phe
Lys Leu Asn Tyr65 70 75
80Glu Asp Tyr Gln Ser Phe Asp Glu Ser Leu Ala Lys Ala Tyr Lys Gly
85 90 95Ser Leu Ile Ser Pro Tyr
Glu Leu Arg Phe Arg Ala Leu Asn Glu Leu 100
105 110Leu Ser Lys Gln Asp Phe Ala Arg Val Ile Leu His
Ile Ala Lys Arg 115 120 125Arg Gly
Tyr Asp Asp Ile Lys Asn Ser Asp Asp Lys Glu Lys Gly Ala 130
135 140Ile Leu Lys Ala Ile Lys Gln Asn Glu Glu Lys
Leu Ala Asn Tyr Gln145 150 155
160Ser Val Gly Glu Tyr Leu Tyr Lys Glu Tyr Phe Gln Lys Phe Lys Glu
165 170 175Asn Ser Lys Glu
Phe Thr Asn Val Arg Asn Lys Lys Glu Ser Tyr Glu 180
185 190Arg Cys Ile Ala Gln Ser Phe Leu Lys Asp Glu
Leu Lys Leu Ile Phe 195 200 205Lys
Lys Gln Arg Glu Phe Gly Phe Ser Phe Ser Lys Lys Phe Glu Glu 210
215 220Glu Val Leu Ser Val Ala Phe Tyr Lys Arg
Ala Leu Lys Asp Phe Ser225 230 235
240His Leu Val Gly Asn Cys Ser Phe Phe Thr Asp Glu Lys Arg Ala
Pro 245 250 255Lys Asn Ser
Pro Leu Ala Phe Met Phe Val Ala Leu Thr Arg Ile Ile 260
265 270Asn Leu Leu Asn Asn Leu Lys Asn Thr Glu
Gly Ile Leu Tyr Thr Lys 275 280
285Asp Asp Leu Asn Ala Leu Leu Asn Glu Val Leu Lys Asn Gly Thr Leu 290
295 300Thr Tyr Lys Gln Thr Lys Lys Leu
Leu Gly Leu Ser Asp Asp Tyr Glu305 310
315 320Phe Lys Gly Glu Lys Gly Thr Tyr Phe Ile Glu Phe
Lys Lys Tyr Lys 325 330
335Glu Phe Ile Lys Ala Leu Gly Glu His Asn Leu Ser Gln Asp Asp Leu
340 345 350Asn Glu Ile Ala Lys Asp
Ile Thr Leu Ile Lys Asp Glu Ile Lys Leu 355 360
365Lys Lys Ala Leu Ala Lys Tyr Asp Leu Asn Gln Asn Gln Ile
Asp Ser 370 375 380Leu Ser Lys Leu Glu
Phe Lys Asp His Leu Asn Ile Ser Phe Lys Ala385 390
395 400Leu Lys Leu Val Thr Pro Leu Met Leu Glu
Gly Lys Lys Tyr Asp Glu 405 410
415Ala Cys Asn Glu Leu Asn Leu Lys Val Ala Ile Asn Glu Asp Lys Lys
420 425 430Asp Phe Leu Pro Ala
Phe Asn Glu Thr Tyr Tyr Lys Asp Glu Val Thr 435
440 445Asn Pro Val Val Leu Arg Ala Ile Lys Glu Tyr Arg
Lys Val Leu Asn 450 455 460Ala Leu Leu
Lys Lys Tyr Gly Lys Val His Lys Ile Asn Ile Glu Leu465
470 475 480Ala Arg Glu Val Gly Lys Asn
His Ser Gln Arg Ala Lys Ile Glu Lys 485
490 495Glu Gln Asn Glu Asn Tyr Lys Ala Lys Lys Asp Ala
Glu Leu Glu Cys 500 505 510Glu
Lys Leu Gly Leu Lys Ile Asn Ser Lys Asn Ile Leu Lys Leu Arg 515
520 525Leu Phe Lys Glu Gln Lys Glu Phe Cys
Ala Tyr Ser Gly Glu Lys Ile 530 535
540Lys Ile Ser Asp Leu Gln Asp Glu Lys Met Leu Glu Ile Asp His Ile545
550 555 560Tyr Pro Tyr Ser
Arg Ser Phe Asp Asp Ser Tyr Met Asn Lys Val Leu 565
570 575Val Phe Thr Lys Gln Asn Gln Glu Lys Leu
Asn Gln Thr Pro Phe Glu 580 585
590Ala Phe Gly Asn Asp Ser Ala Lys Trp Gln Lys Ile Glu Val Leu Ala
595 600 605Lys Asn Leu Pro Thr Lys Lys
Gln Lys Arg Ile Leu Asp Lys Asn Tyr 610 615
620Lys Asp Lys Glu Gln Lys Asn Phe Lys Asp Arg Asn Leu Asn Asp
Thr625 630 635 640Arg Tyr
Ile Ala Arg Leu Val Leu Asn Tyr Thr Lys Asp Tyr Leu Asp
645 650 655Phe Leu Pro Leu Ser Asp Asp
Glu Asn Thr Lys Leu Asn Asp Thr Gln 660 665
670Lys Gly Ser Lys Val His Val Glu Ala Lys Ser Gly Met Leu
Thr Ser 675 680 685Ala Leu Arg His
Thr Trp Gly Phe Ser Ala Lys Asp Arg Asn Asn His 690
695 700Leu His His Ala Ile Asp Ala Val Ile Ile Ala Tyr
Ala Asn Asn Ser705 710 715
720Ile Val Lys Ala Phe Ser Asp Phe Lys Lys Glu Gln Glu Ser Asn Ser
725 730 735Ala Glu Leu Tyr Ala
Lys Lys Ile Ser Glu Leu Asp Tyr Lys Asn Lys 740
745 750Arg Lys Phe Phe Glu Pro Phe Ser Gly Phe Arg Gln
Lys Val Leu Asp 755 760 765Lys Ile
Asp Glu Ile Phe Val Ser Lys Pro Glu Arg Lys Lys Pro Ser 770
775 780Gly Ala Leu His Glu Glu Thr Phe Arg Lys Glu
Glu Glu Phe Tyr Gln785 790 795
800Ser Tyr Gly Gly Lys Glu Gly Val Leu Lys Ala Leu Glu Leu Gly Lys
805 810 815Ile Arg Lys Val
Asn Gly Lys Ile Val Lys Asn Gly Asp Met Phe Arg 820
825 830Val Asp Ile Phe Lys His Lys Lys Thr Asn Lys
Phe Tyr Ala Val Pro 835 840 845Ile
Tyr Thr Met Asp Phe Ala Leu Lys Val Leu Pro Asn Lys Ala Val 850
855 860Ala Arg Ser Lys Lys Gly Glu Ile Lys Asp
Trp Ile Leu Met Asp Glu865 870 875
880Asn Tyr Glu Phe Cys Phe Ser Leu Tyr Lys Asp Ser Leu Ile Leu
Ile 885 890 895Gln Thr Lys
Asp Met Gln Glu Pro Glu Phe Val Tyr Tyr Asn Ala Phe 900
905 910Thr Ser Ser Thr Val Ser Leu Ile Val Ser
Lys His Asp Asn Lys Phe 915 920
925Glu Thr Leu Ser Lys Asn Gln Lys Ile Leu Phe Lys Asn Ala Asn Glu 930
935 940Lys Glu Val Ile Ala Lys Ser Ile
Gly Ile Gln Asn Leu Lys Val Phe945 950
955 960Glu Lys Tyr Ile Val Ser Ala Leu Gly Glu Val Thr
Lys Ala Glu Phe 965 970
975Arg Gln Arg Glu Asp Phe Lys Lys 980
41 1037 PRT P. lavamentivorans
41 Met Glu Arg Ile Phe Gly Phe Asp Ile Gly Thr Thr Ser Ile
Gly Phe1 5 10 15Ser Val
Ile Asp Tyr Ser Ser Thr Gln Ser Ala Gly Asn Ile Gln Arg 20
25 30Leu Gly Val Arg Ile Phe Pro Glu Ala
Arg Asp Pro Asp Gly Thr Pro 35 40
45Leu Asn Gln Gln Arg Arg Gln Lys Arg Met Met Arg Arg Gln Leu Arg 50
55 60Arg Arg Arg Ile Arg Arg Lys Ala Leu
Asn Glu Thr Leu His Glu Ala65 70 75
80Gly Phe Leu Pro Ala Tyr Gly Ser Ala Asp Trp Pro Val Val
Met Ala 85 90 95Asp Glu
Pro Tyr Glu Leu Arg Arg Arg Gly Leu Glu Glu Gly Leu Ser 100
105 110Ala Tyr Glu Phe Gly Arg Ala Ile Tyr
His Leu Ala Gln His Arg His 115 120
125Phe Lys Gly Arg Glu Leu Glu Glu Ser Asp Thr Pro Asp Pro Asp Val
130 135 140Asp Asp Glu Lys Glu Ala Ala
Asn Glu Arg Ala Ala Thr Leu Lys Ala145 150
155 160Leu Lys Asn Glu Gln Thr Thr Leu Gly Ala Trp Leu
Ala Arg Arg Pro 165 170
175Pro Ser Asp Arg Lys Arg Gly Ile His Ala His Arg Asn Val Val Ala
180 185 190Glu Glu Phe Glu Arg Leu
Trp Glu Val Gln Ser Lys Phe His Pro Ala 195 200
205Leu Lys Ser Glu Glu Met Arg Ala Arg Ile Ser Asp Thr Ile
Phe Ala 210 215 220Gln Arg Pro Val Phe
Trp Arg Lys Asn Thr Leu Gly Glu Cys Arg Phe225 230
235 240Met Pro Gly Glu Pro Leu Cys Pro Lys Gly
Ser Trp Leu Ser Gln Gln 245 250
255Arg Arg Met Leu Glu Lys Leu Asn Asn Leu Ala Ile Ala Gly Gly Asn
260 265 270Ala Arg Pro Leu Asp
Ala Glu Glu Arg Asp Ala Ile Leu Ser Lys Leu 275
280 285Gln Gln Gln Ala Ser Met Ser Trp Pro Gly Val Arg
Ser Ala Leu Lys 290 295 300Ala Leu Tyr
Lys Gln Arg Gly Glu Pro Gly Ala Glu Lys Ser Leu Lys305
310 315 320Phe Asn Leu Glu Leu Gly Gly
Glu Ser Lys Leu Leu Gly Asn Ala Leu 325
330 335Glu Ala Lys Leu Ala Asp Met Phe Gly Pro Asp Trp
Pro Ala His Pro 340 345 350Arg
Lys Gln Glu Ile Arg His Ala Val His Glu Arg Leu Trp Ala Ala 355
360 365Asp Tyr Gly Glu Thr Pro Asp Lys Lys
Arg Val Ile Ile Leu Ser Glu 370 375
380Lys Asp Arg Lys Ala His Arg Glu Ala Ala Ala Asn Ser Phe Val Ala385
390 395 400Asp Phe Gly Ile
Thr Gly Glu Gln Ala Ala Gln Leu Gln Ala Leu Lys 405
410 415Leu Pro Thr Gly Trp Glu Pro Tyr Ser Ile
Pro Ala Leu Asn Leu Phe 420 425
430Leu Ala Glu Leu Glu Lys Gly Glu Arg Phe Gly Ala Leu Val Asn Gly
435 440 445Pro Asp Trp Glu Gly Trp Arg
Arg Thr Asn Phe Pro His Arg Asn Gln 450 455
460Pro Thr Gly Glu Ile Leu Asp Lys Leu Pro Ser Pro Ala Ser Lys
Glu465 470 475 480Glu Arg
Glu Arg Ile Ser Gln Leu Arg Asn Pro Thr Val Val Arg Thr
485 490 495Gln Asn Glu Leu Arg Lys Val
Val Asn Asn Leu Ile Gly Leu Tyr Gly 500 505
510Lys Pro Asp Arg Ile Arg Ile Glu Val Gly Arg Asp Val Gly
Lys Ser 515 520 525Lys Arg Glu Arg
Glu Glu Ile Gln Ser Gly Ile Arg Arg Asn Glu Lys 530
535 540Gln Arg Lys Lys Ala Thr Glu Asp Leu Ile Lys Asn
Gly Ile Ala Asn545 550 555
560Pro Ser Arg Asp Asp Val Glu Lys Trp Ile Leu Trp Lys Glu Gly Gln
565 570 575Glu Arg Cys Pro Tyr
Thr Gly Asp Gln Ile Gly Phe Asn Ala Leu Phe 580
585 590Arg Glu Gly Arg Tyr Glu Val Glu His Ile Trp Pro
Arg Ser Arg Ser 595 600 605Phe Asp
Asn Ser Pro Arg Asn Lys Thr Leu Cys Arg Lys Asp Val Asn 610
615 620Ile Glu Lys Gly Asn Arg Met Pro Phe Glu Ala
Phe Gly His Asp Glu625 630 635
640Asp Arg Trp Ser Ala Ile Gln Ile Arg Leu Gln Gly Met Val Ser Ala
645 650 655Lys Gly Gly Thr
Gly Met Ser Pro Gly Lys Val Lys Arg Phe Leu Ala 660
665 670Lys Thr Met Pro Glu Asp Phe Ala Ala Arg Gln
Leu Asn Asp Thr Arg 675 680 685Tyr
Ala Ala Lys Gln Ile Leu Ala Gln Leu Lys Arg Leu Trp Pro Asp 690
695 700Met Gly Pro Glu Ala Pro Val Lys Val Glu
Ala Val Thr Gly Gln Val705 710 715
720Thr Ala Gln Leu Arg Lys Leu Trp Thr Leu Asn Asn Ile Leu Ala
Asp 725 730 735Asp Gly Glu
Lys Thr Arg Ala Asp His Arg His His Ala Ile Asp Ala 740
745 750Leu Thr Val Ala Cys Thr His Pro Gly Met
Thr Asn Lys Leu Ser Arg 755 760
765Tyr Trp Gln Leu Arg Asp Asp Pro Arg Ala Glu Lys Pro Ala Leu Thr 770
775 780Pro Pro Trp Asp Thr Ile Arg Ala
Asp Ala Glu Lys Ala Val Ser Glu785 790
795 800Ile Val Val Ser His Arg Val Arg Lys Lys Val Ser
Gly Pro Leu His 805 810
815Lys Glu Thr Thr Tyr Gly Asp Thr Gly Thr Asp Ile Lys Thr Lys Ser
820 825 830Gly Thr Tyr Arg Gln Phe
Val Thr Arg Lys Lys Ile Glu Ser Leu Ser 835 840
845Lys Gly Glu Leu Asp Glu Ile Arg Asp Pro Arg Ile Lys Glu
Ile Val 850 855 860Ala Ala His Val Ala
Gly Arg Gly Gly Asp Pro Lys Lys Ala Phe Pro865 870
875 880Pro Tyr Pro Cys Val Ser Pro Gly Gly Pro
Glu Ile Arg Lys Val Arg 885 890
895Leu Thr Ser Lys Gln Gln Leu Asn Leu Met Ala Gln Thr Gly Asn Gly
900 905 910Tyr Ala Asp Leu Gly
Ser Asn His His Ile Ala Ile Tyr Arg Leu Pro 915
920 925Asp Gly Lys Ala Asp Phe Glu Ile Val Ser Leu Phe
Asp Ala Ser Arg 930 935 940Arg Leu Ala
Gln Arg Asn Pro Ile Val Gln Arg Thr Arg Ala Asp Gly945
950 955 960Ala Ser Phe Val Met Ser Leu
Ala Ala Gly Glu Ala Ile Met Ile Pro 965
970 975Glu Gly Ser Lys Lys Gly Ile Trp Ile Val Gln Gly
Val Trp Ala Ser 980 985 990Gly
Gln Val Val Leu Glu Arg Asp Thr Asp Ala Asp His Ser Thr Thr 995
1000 1005Thr Arg Pro Met Pro Asn Pro Ile
Leu Lys Asp Asp Ala Lys Lys 1010 1015
1020Val Ser Ile Asp Pro Ile Gly Arg Val Arg Pro Ser Asn Asp1025
1030 1035421082PRTN. cinerea 42Met Ala Ala Phe
Lys Pro Asn Pro Met Asn Tyr Ile Leu Gly Leu Asp1 5
10 15Ile Gly Ile Ala Ser Val Gly Trp Ala Ile
Val Glu Ile Asp Glu Glu 20 25
30Glu Asn Pro Ile Arg Leu Ile Asp Leu Gly Val Arg Val Phe Glu Arg
35 40 45Ala Glu Val Pro Lys Thr Gly Asp
Ser Leu Ala Ala Ala Arg Arg Leu 50 55
60Ala Arg Ser Val Arg Arg Leu Thr Arg Arg Arg Ala His Arg Leu Leu65
70 75 80Arg Ala Arg Arg Leu
Leu Lys Arg Glu Gly Val Leu Gln Ala Ala Asp 85
90 95Phe Asp Glu Asn Gly Leu Ile Lys Ser Leu Pro
Asn Thr Pro Trp Gln 100 105
110Leu Arg Ala Ala Ala Leu Asp Arg Lys Leu Thr Pro Leu Glu Trp Ser
115 120 125Ala Val Leu Leu His Leu Ile
Lys His Arg Gly Tyr Leu Ser Gln Arg 130 135
140Lys Asn Glu Gly Glu Thr Ala Asp Lys Glu Leu Gly Ala Leu Leu
Lys145 150 155 160Gly Val
Ala Asp Asn Thr His Ala Leu Gln Thr Gly Asp Phe Arg Thr
165 170 175Pro Ala Glu Leu Ala Leu Asn
Lys Phe Glu Lys Glu Ser Gly His Ile 180 185
190Arg Asn Gln Arg Gly Asp Tyr Ser His Thr Phe Asn Arg Lys
Asp Leu 195 200 205Gln Ala Glu Leu
Asn Leu Leu Phe Glu Lys Gln Lys Glu Phe Gly Asn 210
215 220Pro His Val Ser Asp Gly Leu Lys Glu Gly Ile Glu
Thr Leu Leu Met225 230 235
240Thr Gln Arg Pro Ala Leu Ser Gly Asp Ala Val Gln Lys Met Leu Gly
245 250 255His Cys Thr Phe Glu
Pro Thr Glu Pro Lys Ala Ala Lys Asn Thr Tyr 260
265 270Thr Ala Glu Arg Phe Val Trp Leu Thr Lys Leu Asn
Asn Leu Arg Ile 275 280 285Leu Glu
Gln Gly Ser Glu Arg Pro Leu Thr Asp Thr Glu Arg Ala Thr 290
295 300Leu Met Asp Glu Pro Tyr Arg Lys Ser Lys Leu
Thr Tyr Ala Gln Ala305 310 315
320Arg Lys Leu Leu Asp Leu Asp Asp Thr Ala Phe Phe Lys Gly Leu Arg
325 330 335Tyr Gly Lys Asp
Asn Ala Glu Ala Ser Thr Leu Met Glu Met Lys Ala 340
345 350Tyr His Ala Ile Ser Arg Ala Leu Glu Lys Glu
Gly Leu Lys Asp Lys 355 360 365Lys
Ser Pro Leu Asn Leu Ser Pro Glu Leu Gln Asp Glu Ile Gly Thr 370
375 380Ala Phe Ser Leu Phe Lys Thr Asp Glu Asp
Ile Thr Gly Arg Leu Lys385 390 395
400Asp Arg Val Gln Pro Glu Ile Leu Glu Ala Leu Leu Lys His Ile
Ser 405 410 415Phe Asp Lys
Phe Val Gln Ile Ser Leu Lys Ala Leu Arg Arg Ile Val 420
425 430Pro Leu Met Glu Gln Gly Asn Arg Tyr Asp
Glu Ala Cys Thr Glu Ile 435 440
445Tyr Gly Asp His Tyr Gly Lys Lys Asn Thr Glu Glu Lys Ile Tyr Leu 450
455 460Pro Pro Ile Pro Ala Asp Glu Ile
Arg Asn Pro Val Val Leu Arg Ala465 470
475 480Leu Ser Gln Ala Arg Lys Val Ile Asn Gly Val Val
Arg Arg Tyr Gly 485 490
495Ser Pro Ala Arg Ile His Ile Glu Thr Ala Arg Glu Val Gly Lys Ser
500 505 510Phe Lys Asp Arg Lys Glu
Ile Glu Lys Arg Gln Glu Glu Asn Arg Lys 515 520
525Asp Arg Glu Lys Ser Ala Ala Lys Phe Arg Glu Tyr Phe Pro
Asn Phe 530 535 540Val Gly Glu Pro Lys
Ser Lys Asp Ile Leu Lys Leu Arg Leu Tyr Glu545 550
555 560Gln Gln His Gly Lys Cys Leu Tyr Ser Gly
Lys Glu Ile Asn Leu Gly 565 570
575Arg Leu Asn Glu Lys Gly Tyr Val Glu Ile Asp His Ala Leu Pro Phe
580 585 590Ser Arg Thr Trp Asp
Asp Ser Phe Asn Asn Lys Val Leu Ala Leu Gly 595
600 605Ser Glu Asn Gln Asn Lys Gly Asn Gln Thr Pro Tyr
Glu Tyr Phe Asn 610 615 620Gly Lys Asp
Asn Ser Arg Glu Trp Gln Glu Phe Lys Ala Arg Val Glu625
630 635 640Thr Ser Arg Phe Pro Arg Ser
Lys Lys Gln Arg Ile Leu Leu Gln Lys 645
650 655Phe Asp Glu Asp Gly Phe Lys Glu Arg Asn Leu Asn
Asp Thr Arg Tyr 660 665 670Ile
Asn Arg Phe Leu Cys Gln Phe Val Ala Asp His Met Leu Leu Thr 675
680 685Gly Lys Gly Lys Arg Arg Val Phe Ala
Ser Asn Gly Gln Ile Thr Asn 690 695
700Leu Leu Arg Gly Phe Trp Gly Leu Arg Lys Val Arg Ala Glu Asn Asp705
710 715 720Arg His His Ala
Leu Asp Ala Val Val Val Ala Cys Ser Thr Ile Ala 725
730 735Met Gln Gln Lys Ile Thr Arg Phe Val Arg
Tyr Lys Glu Met Asn Ala 740 745
750Phe Asp Gly Lys Thr Ile Asp Lys Glu Thr Gly Glu Val Leu His Gln
755 760 765Lys Ala His Phe Pro Gln Pro
Trp Glu Phe Phe Ala Gln Glu Val Met 770 775
780Ile Arg Val Phe Gly Lys Pro Asp Gly Lys Pro Glu Phe Glu Glu
Ala785 790 795 800Asp Thr
Pro Glu Lys Leu Arg Thr Leu Leu Ala Glu Lys Leu Ser Ser
805 810 815Arg Pro Glu Ala Val His Lys
Tyr Val Thr Pro Leu Phe Ile Ser Arg 820 825
830Ala Pro Asn Arg Lys Met Ser Gly Gln Gly His Met Glu Thr
Val Lys 835 840 845Ser Ala Lys Arg
Leu Asp Glu Gly Ile Ser Val Leu Arg Val Pro Leu 850
855 860Thr Gln Leu Lys Leu Lys Asp Leu Glu Lys Met Val
Asn Arg Glu Arg865 870 875
880Glu Pro Lys Leu Tyr Glu Ala Leu Lys Ala Arg Leu Glu Ala His Lys
885 890 895Asp Asp Pro Ala Lys
Ala Phe Ala Glu Pro Phe Tyr Lys Tyr Asp Lys 900
905 910Ala Gly Asn Arg Thr Gln Gln Val Lys Ala Val Arg
Val Glu Gln Val 915 920 925Gln Lys
Thr Gly Val Trp Val His Asn His Asn Gly Ile Ala Asp Asn 930
935 940Ala Thr Ile Val Arg Val Asp Val Phe Glu Lys
Gly Gly Lys Tyr Tyr945 950 955
960Leu Val Pro Ile Tyr Ser Trp Gln Val Ala Lys Gly Ile Leu Pro Asp
965 970 975Arg Ala Val Val
Gln Gly Lys Asp Glu Glu Asp Trp Thr Val Met Asp 980
985 990Asp Ser Phe Glu Phe Lys Phe Val Leu Tyr Ala
Asn Asp Leu Ile Lys 995 1000
1005Leu Thr Ala Lys Lys Asn Glu Phe Leu Gly Tyr Phe Val Ser Leu
1010 1015 1020Asn Arg Ala Thr Gly Ala
Ile Asp Ile Arg Thr His Asp Thr Asp 1025 1030
1035Ser Thr Lys Gly Lys Asn Gly Ile Phe Gln Ser Val Gly Val
Lys 1040 1045 1050Thr Ala Leu Ser Phe
Gln Lys Tyr Gln Ile Asp Glu Leu Gly Lys 1055 1060
1065Glu Ile Arg Pro Cys Arg Leu Lys Lys Arg Pro Pro Val
Arg 1070 1075 108043198PRTHomo sapiens
43Met Asp Ser Leu Leu Met Asn Arg Arg Lys Phe Leu Tyr Gln Phe Lys1
5 10 15Asn Val Arg Trp Ala Lys
Gly Arg Arg Glu Thr Tyr Leu Cys Tyr Val 20 25
30Val Lys Arg Arg Asp Ser Ala Thr Ser Phe Ser Leu Asp
Phe Gly Tyr 35 40 45Leu Arg Asn
Lys Asn Gly Cys His Val Glu Leu Leu Phe Leu Arg Tyr 50
55 60Ile Ser Asp Trp Asp Leu Asp Pro Gly Arg Cys Tyr
Arg Val Thr Trp65 70 75
80Phe Thr Ser Trp Ser Pro Cys Tyr Asp Cys Ala Arg His Val Ala Asp
85 90 95Phe Leu Arg Gly Asn Pro
Asn Leu Ser Leu Arg Ile Phe Thr Ala Arg 100
105 110Leu Tyr Phe Cys Glu Asp Arg Lys Ala Glu Pro Glu
Gly Leu Arg Arg 115 120 125Leu His
Arg Ala Gly Val Gln Ile Ala Ile Met Thr Phe Lys Asp Tyr 130
135 140Phe Tyr Cys Trp Asn Thr Phe Val Glu Asn His
Glu Arg Thr Phe Lys145 150 155
160Ala Trp Glu Gly Leu His Glu Asn Ser Val Arg Leu Ser Arg Gln Leu
165 170 175Arg Arg Ile Leu
Leu Pro Leu Tyr Glu Val Asp Asp Leu Arg Asp Ala 180
185 190Phe Arg Thr Leu Gly Leu
19544190PRTArtificial SequencehAIDv solubility variant lacking N-terminal
RNA-binding region 44Met Asp Pro His Ile Phe Thr Ser Asn Phe Asn Asn
Gly Ile Gly Arg1 5 10
15His Lys Thr Tyr Leu Cys Tyr Glu Val Glu Arg Leu Asp Ser Ala Thr
20 25 30Ser Phe Ser Leu Asp Phe Gly
Tyr Leu Arg Asn Lys Asn Gly Cys His 35 40
45Val Glu Leu Leu Phe Leu Arg Tyr Ile Ser Asp Trp Asp Leu Asp
Pro 50 55 60Gly Arg Cys Tyr Arg Val
Thr Trp Phe Thr Ser Trp Ser Pro Cys Tyr65 70
75 80Asp Cys Ala Arg His Val Ala Asp Phe Leu Arg
Gly Asn Pro Asn Leu 85 90
95Ser Leu Arg Ile Phe Thr Ala Arg Leu Tyr Phe Cys Glu Asp Arg Lys
100 105 110Ala Glu Pro Glu Gly Leu
Arg Arg Leu His Arg Ala Gly Val Gln Ile 115 120
125Ala Ile Met Thr Phe Lys Asp Tyr Phe Tyr Cys Trp Asn Thr
Phe Val 130 135 140Glu Asn His Glu Arg
Thr Phe Lys Ala Trp Glu Gly Leu His Glu Asn145 150
155 160Ser Val Arg Leu Ser Arg Gln Leu Arg Arg
Ile Leu Leu Pro Leu Tyr 165 170
175Glu Val Asp Asp Leu Arg Asp Ala Phe Arg Thr Leu Gly Leu
180 185 19045175PRTArtificial
SequencehAIDv solubility variant lacking N-terminal RNA-binding
region and the C-terminal poorly structured 45Met Asp Pro His Ile Phe Thr
Ser Asn Phe Asn Asn Gly Ile Gly Arg1 5 10
15His Lys Thr Tyr Leu Cys Tyr Glu Val Glu Arg Leu Asp
Ser Ala Thr 20 25 30Ser Phe
Ser Leu Asp Phe Gly Tyr Leu Arg Asn Lys Asn Gly Cys His 35
40 45Val Glu Leu Leu Phe Leu Arg Tyr Ile Ser
Asp Trp Asp Leu Asp Pro 50 55 60Gly
Arg Cys Tyr Arg Val Thr Trp Phe Thr Ser Trp Ser Pro Cys Tyr65
70 75 80Asp Cys Ala Arg His Val
Ala Asp Phe Leu Arg Gly Asn Pro Asn Leu 85
90 95Ser Leu Arg Ile Phe Thr Ala Arg Leu Tyr Phe Cys
Glu Asp Arg Lys 100 105 110Ala
Glu Pro Glu Gly Leu Arg Arg Leu His Arg Ala Gly Val Gln Ile 115
120 125Ala Ile Met Thr Phe Lys Asp Tyr Phe
Tyr Cys Trp Asn Thr Phe Val 130 135
140Glu Asn His Glu Arg Thr Phe Lys Ala Trp Glu Gly Leu His Glu Asn145
150 155 160Ser Val Arg Leu
Ser Arg Gln Leu Arg Arg Ile Leu Leu Pro Leu 165
170 17546229PRTRattus norvegicus 46Met Ser Ser Glu
Thr Gly Pro Val Ala Val Asp Pro Thr Leu Arg Arg1 5
10 15Arg Ile Glu Pro His Glu Phe Glu Val Phe
Phe Asp Pro Arg Glu Leu 20 25
30Arg Lys Glu Thr Cys Leu Leu Tyr Glu Ile Asn Trp Gly Gly Arg His
35 40 45Ser Ile Trp Arg His Thr Ser Gln
Asn Thr Asn Lys His Val Glu Val 50 55
60Asn Phe Ile Glu Lys Phe Thr Thr Glu Arg Tyr Phe Cys Pro Asn Thr65
70 75 80Arg Cys Ser Ile Thr
Trp Phe Leu Ser Trp Ser Pro Cys Gly Glu Cys 85
90 95Ser Arg Ala Ile Thr Glu Phe Leu Ser Arg Tyr
Pro His Val Thr Leu 100 105
110Phe Ile Tyr Ile Ala Arg Leu Tyr His His Ala Asp Pro Arg Asn Arg
115 120 125Gln Gly Leu Arg Asp Leu Ile
Ser Ser Gly Val Thr Ile Gln Ile Met 130 135
140Thr Glu Gln Glu Ser Gly Tyr Cys Trp Arg Asn Phe Val Asn Tyr
Ser145 150 155 160Pro Ser
Asn Glu Ala His Trp Pro Arg Tyr Pro His Leu Trp Val Arg
165 170 175Leu Tyr Val Leu Glu Leu Tyr
Cys Ile Ile Leu Gly Leu Pro Pro Cys 180 185
190Leu Asn Ile Leu Arg Arg Lys Gln Pro Gln Leu Thr Phe Phe
Thr Ile 195 200 205Ala Leu Gln Ser
Cys His Tyr Gln Arg Leu Pro Pro His Ile Leu Trp 210
215 220Ala Thr Gly Leu Lys22547397PRTMus musculus 47Met
Gly Pro Phe Cys Leu Gly Cys Ser His Arg Lys Cys Tyr Ser Pro1
5 10 15Ile Arg Asn Leu Ile Ser Gln
Glu Thr Phe Lys Phe His Phe Lys Asn 20 25
30Leu Gly Tyr Ala Lys Gly Arg Lys Asp Thr Phe Leu Cys Tyr
Glu Val 35 40 45Thr Arg Lys Asp
Cys Asp Ser Pro Val Ser Leu His His Gly Val Phe 50 55
60Lys Asn Lys Asp Asn Ile His Ala Glu Ile Cys Phe Leu
Tyr Trp Phe65 70 75
80His Asp Lys Val Leu Lys Val Leu Ser Pro Arg Glu Glu Phe Lys Ile
85 90 95Thr Trp Tyr Met Ser Trp
Ser Pro Cys Phe Glu Cys Ala Glu Gln Ile 100
105 110Val Arg Phe Leu Ala Thr His His Asn Leu Ser Leu
Asp Ile Phe Ser 115 120 125Ser Arg
Leu Tyr Asn Val Gln Asp Pro Glu Thr Gln Gln Asn Leu Cys 130
135 140Arg Leu Val Gln Glu Gly Ala Gln Val Ala Ala
Met Asp Leu Tyr Glu145 150 155
160Phe Lys Lys Cys Trp Lys Lys Phe Val Asp Asn Gly Gly Arg Arg Phe
165 170 175Arg Pro Trp Lys
Arg Leu Leu Thr Asn Phe Arg Tyr Gln Asp Ser Lys 180
185 190Leu Gln Glu Ile Leu Arg Arg Met Asp Pro Leu
Ser Glu Glu Glu Phe 195 200 205Tyr
Ser Gln Phe Tyr Asn Gln Arg Val Lys His Leu Cys Tyr Tyr His 210
215 220Arg Met Lys Pro Tyr Leu Cys Tyr Gln Leu
Glu Gln Phe Asn Gly Gln225 230 235
240Ala Pro Leu Lys Gly Cys Leu Leu Ser Glu Lys Gly Lys Gln His
Ala 245 250 255Glu Ile Leu
Phe Leu Asp Lys Ile Arg Ser Met Glu Leu Ser Gln Val 260
265 270Thr Ile Thr Cys Tyr Leu Thr Trp Ser Pro
Cys Pro Asn Cys Ala Trp 275 280
285Gln Leu Ala Ala Phe Lys Arg Asp Arg Pro Asp Leu Ile Leu His Ile 290
295 300Tyr Thr Ser Arg Leu Tyr Phe His
Trp Lys Arg Pro Phe Gln Lys Gly305 310
315 320Leu Cys Ser Leu Trp Gln Ser Gly Ile Leu Val Asp
Val Met Asp Leu 325 330
335Pro Gln Phe Thr Asp Cys Trp Thr Asn Phe Val Asn Pro Lys Arg Pro
340 345 350Phe Arg Pro Trp Lys Gly
Leu Glu Ile Ile Ser Arg Arg Thr Gln Arg 355 360
365Arg Leu Arg Arg Ile Lys Glu Ser Trp Gly Leu Gln Asp Leu
Val Asn 370 375 380Asp Phe Gly Asn Leu
Gln Leu Gly Pro Pro Met Ser Asn385 390
39548199PRTArtificial SequencemAPOBEC3 catalytic domain 48Met Gly Pro Phe
Cys Leu Gly Cys Ser His Arg Lys Cys Tyr Ser Pro1 5
10 15Ile Arg Asn Leu Ile Ser Gln Glu Thr Phe
Lys Phe His Phe Lys Asn 20 25
30Leu Gly Tyr Ala Lys Gly Arg Lys Asp Thr Phe Leu Cys Tyr Glu Val
35 40 45Thr Arg Lys Asp Cys Asp Ser Pro
Val Ser Leu His His Gly Val Phe 50 55
60Lys Asn Lys Asp Asn Ile His Ala Glu Ile Cys Phe Leu Tyr Trp Phe65
70 75 80His Asp Lys Val Leu
Lys Val Leu Ser Pro Arg Glu Glu Phe Lys Ile 85
90 95Thr Trp Tyr Met Ser Trp Ser Pro Cys Phe Glu
Cys Ala Glu Gln Ile 100 105
110Val Arg Phe Leu Ala Thr His His Asn Leu Ser Leu Asp Ile Phe Ser
115 120 125Ser Arg Leu Tyr Asn Val Gln
Asp Pro Glu Thr Gln Gln Asn Leu Cys 130 135
140Arg Leu Val Gln Glu Gly Ala Gln Val Ala Ala Met Asp Leu Tyr
Glu145 150 155 160Phe Lys
Lys Cys Trp Lys Lys Phe Val Asp Asn Gly Gly Arg Arg Phe
165 170 175Arg Pro Trp Lys Arg Leu Leu
Thr Asn Phe Arg Tyr Gln Asp Ser Lys 180 185
190Leu Gln Glu Ile Leu Arg Arg 19549199PRTHomo
sapiens 49Met Glu Ala Ser Pro Ala Ser Gly Pro Arg His Leu Met Asp Pro
His1 5 10 15Ile Phe Thr
Ser Asn Phe Asn Asn Gly Ile Gly Arg His Lys Thr Tyr 20
25 30Leu Cys Tyr Glu Val Glu Arg Leu Asp Asn
Gly Thr Ser Val Lys Met 35 40
45Asp Gln His Arg Gly Phe Leu His Asn Gln Ala Lys Asn Leu Leu Cys 50
55 60Gly Phe Tyr Gly Arg His Ala Glu Leu
Arg Phe Leu Asp Leu Val Pro65 70 75
80Ser Leu Gln Leu Asp Pro Ala Gln Ile Tyr Arg Val Thr Trp
Phe Ile 85 90 95Ser Trp
Ser Pro Cys Phe Ser Trp Gly Cys Ala Gly Glu Val Arg Ala 100
105 110Phe Leu Gln Glu Asn Thr His Val Arg
Leu Arg Ile Phe Ala Ala Arg 115 120
125Ile Tyr Asp Tyr Asp Pro Leu Tyr Lys Glu Ala Leu Gln Met Leu Arg
130 135 140Asp Ala Gly Ala Gln Val Ser
Ile Met Thr Tyr Asp Glu Phe Lys His145 150
155 160Cys Trp Asp Thr Phe Val Asp His Gln Gly Cys Pro
Phe Gln Pro Trp 165 170
175Asp Gly Leu Asp Glu His Ser Gln Ala Leu Ser Gly Arg Leu Arg Ala
180 185 190Ile Leu Gln Asn Gln Gly
Asn 19550384PRTHomo sapiens 50Met Lys Pro His Phe Arg Asn Thr Val
Glu Arg Met Tyr Arg Asp Thr1 5 10
15Phe Ser Tyr Asn Phe Tyr Asn Arg Pro Ile Leu Ser Arg Arg Asn
Thr 20 25 30Val Trp Leu Cys
Tyr Glu Val Lys Thr Lys Gly Pro Ser Arg Pro Pro 35
40 45Leu Asp Ala Lys Ile Phe Arg Gly Gln Val Tyr Ser
Glu Leu Lys Tyr 50 55 60His Pro Glu
Met Arg Phe Phe His Trp Phe Ser Lys Trp Arg Lys Leu65 70
75 80His Arg Asp Gln Glu Tyr Glu Val
Thr Trp Tyr Ile Ser Trp Ser Pro 85 90
95Cys Thr Lys Cys Thr Arg Asp Met Ala Thr Phe Leu Ala Glu
Asp Pro 100 105 110Lys Val Thr
Leu Thr Ile Phe Val Ala Arg Leu Tyr Tyr Phe Trp Asp 115
120 125Pro Asp Tyr Gln Glu Ala Leu Arg Ser Leu Cys
Gln Lys Arg Asp Gly 130 135 140Pro Arg
Ala Thr Met Lys Ile Met Asn Tyr Asp Glu Phe Gln His Cys145
150 155 160Trp Ser Lys Phe Val Tyr Ser
Gln Arg Glu Leu Phe Glu Pro Trp Asn 165
170 175Asn Leu Pro Lys Tyr Tyr Ile Leu Leu His Ile Met
Leu Gly Glu Ile 180 185 190Leu
Arg His Ser Met Asp Pro Pro Thr Phe Thr Phe Asn Phe Asn Asn 195
200 205Glu Pro Trp Val Arg Gly Arg His Glu
Thr Tyr Leu Cys Tyr Glu Val 210 215
220Glu Arg Met His Asn Asp Thr Trp Val Leu Leu Asn Gln Arg Arg Gly225
230 235 240Phe Leu Cys Asn
Gln Ala Pro His Lys His Gly Phe Leu Glu Gly Arg 245
250 255His Ala Glu Leu Cys Phe Leu Asp Val Ile
Pro Phe Trp Lys Leu Asp 260 265
270Leu Asp Gln Asp Tyr Arg Val Thr Cys Phe Thr Ser Trp Ser Pro Cys
275 280 285Phe Ser Cys Ala Gln Glu Met
Ala Lys Phe Ile Ser Lys Asn Lys His 290 295
300Val Ser Leu Cys Ile Phe Thr Ala Arg Ile Tyr Asp Asp Gln Gly
Arg305 310 315 320Cys Gln
Glu Gly Leu Arg Thr Leu Ala Glu Ala Gly Ala Lys Ile Ser
325 330 335Ile Met Thr Tyr Ser Glu Phe
Lys His Cys Trp Asp Thr Phe Val Asp 340 345
350His Gln Gly Cys Pro Phe Gln Pro Trp Asp Gly Leu Asp Glu
His Ser 355 360 365Gln Asp Leu Ser
Gly Arg Leu Arg Ala Ile Leu Gln Asn Gln Glu Asn 370
375 38051186PRTArtificial SequencehAPOBEC3G catalytic
domain 51Pro Pro Thr Phe Thr Phe Asn Phe Asn Asn Glu Pro Trp Val Arg Gly1
5 10 15Arg His Glu Thr
Tyr Leu Cys Tyr Glu Val Glu Arg Met His Asn Asp 20
25 30Thr Trp Val Leu Leu Asn Gln Arg Arg Gly Phe
Leu Cys Asn Gln Ala 35 40 45Pro
His Lys His Gly Phe Leu Glu Gly Arg His Ala Glu Leu Cys Phe 50
55 60Leu Asp Val Ile Pro Phe Trp Lys Leu Asp
Leu Asp Gln Asp Tyr Arg65 70 75
80Val Thr Cys Phe Thr Ser Trp Ser Pro Cys Phe Ser Cys Ala Gln
Glu 85 90 95Met Ala Lys
Phe Ile Ser Lys Asn Lys His Val Ser Leu Cys Ile Phe 100
105 110Thr Ala Arg Ile Tyr Asp Asp Gln Gly Arg
Cys Gln Glu Gly Leu Arg 115 120
125Thr Leu Ala Glu Ala Gly Ala Lys Ile Ser Ile Met Thr Tyr Ser Glu 130
135 140Phe Lys His Cys Trp Asp Thr Phe
Val Asp His Gln Gly Cys Pro Phe145 150
155 160Gln Pro Trp Asp Gly Leu Asp Glu His Ser Gln Asp
Leu Ser Gly Arg 165 170
175Leu Arg Ala Ile Leu Gln Asn Gln Glu Asn 180
18552200PRTHomo sapiens 52Met Ala Leu Leu Thr Ala Glu Thr Phe Arg Leu Gln
Phe Asn Asn Lys1 5 10
15Arg Arg Leu Arg Arg Pro Tyr Tyr Pro Arg Lys Ala Leu Leu Cys Tyr
20 25 30Gln Leu Thr Pro Gln Asn Gly
Ser Thr Pro Thr Arg Gly Tyr Phe Glu 35 40
45Asn Lys Lys Lys Cys His Ala Glu Ile Cys Phe Ile Asn Glu Ile
Lys 50 55 60Ser Met Gly Leu Asp Glu
Thr Gln Cys Tyr Gln Val Thr Cys Tyr Leu65 70
75 80Thr Trp Ser Pro Cys Ser Ser Cys Ala Trp Glu
Leu Val Asp Phe Ile 85 90
95Lys Ala His Asp His Leu Asn Leu Gly Ile Phe Ala Ser Arg Leu Tyr
100 105 110Tyr His Trp Cys Lys Pro
Gln Gln Lys Gly Leu Arg Leu Leu Cys Gly 115 120
125Ser Gln Val Pro Val Glu Val Met Gly Phe Pro Lys Phe Ala
Asp Cys 130 135 140Trp Glu Asn Phe Val
Asp His Glu Lys Pro Leu Ser Phe Asn Pro Tyr145 150
155 160Lys Met Leu Glu Glu Leu Asp Lys Asn Ser
Arg Ala Ile Lys Arg Arg 165 170
175Leu Glu Arg Ile Lys Ile Pro Gly Val Arg Ala Gln Gly Arg Tyr Met
180 185 190Asp Ile Leu Cys Asp
Ala Glu Val 195 20053373PRTHomo sapiens 53Met Lys
Pro His Phe Arg Asn Thr Val Glu Arg Met Tyr Arg Asp Thr1 5
10 15Phe Ser Tyr Asn Phe Tyr Asn Arg
Pro Ile Leu Ser Arg Arg Asn Thr 20 25
30Val Trp Leu Cys Tyr Glu Val Lys Thr Lys Gly Pro Ser Arg Pro
Arg 35 40 45Leu Asp Ala Lys Ile
Phe Arg Gly Gln Val Tyr Ser Gln Pro Glu His 50 55
60His Ala Glu Met Cys Phe Leu Ser Trp Phe Cys Gly Asn Gln
Leu Pro65 70 75 80Ala
Tyr Lys Cys Phe Gln Ile Thr Trp Phe Val Ser Trp Thr Pro Cys
85 90 95Pro Asp Cys Val Ala Lys Leu
Ala Glu Phe Leu Ala Glu His Pro Asn 100 105
110Val Thr Leu Thr Ile Ser Ala Ala Arg Leu Tyr Tyr Tyr Trp
Glu Arg 115 120 125Asp Tyr Arg Arg
Ala Leu Cys Arg Leu Ser Gln Ala Gly Ala Arg Val 130
135 140Lys Ile Met Asp Asp Glu Glu Phe Ala Tyr Cys Trp
Glu Asn Phe Val145 150 155
160Tyr Ser Glu Gly Gln Pro Phe Met Pro Trp Tyr Lys Phe Asp Asp Asn
165 170 175Tyr Ala Phe Leu His
Arg Thr Leu Lys Glu Ile Leu Arg Asn Pro Met 180
185 190Glu Ala Met Tyr Pro His Ile Phe Tyr Phe His Phe
Lys Asn Leu Arg 195 200 205Lys Ala
Tyr Gly Arg Asn Glu Ser Trp Leu Cys Phe Thr Met Glu Val 210
215 220Val Lys His His Ser Pro Val Ser Trp Lys Arg
Gly Val Phe Arg Asn225 230 235
240Gln Val Asp Pro Glu Thr His Cys His Ala Glu Arg Cys Phe Leu Ser
245 250 255Trp Phe Cys Asp
Asp Ile Leu Ser Pro Asn Thr Asn Tyr Glu Val Thr 260
265 270Trp Tyr Thr Ser Trp Ser Pro Cys Pro Glu Cys
Ala Gly Glu Val Ala 275 280 285Glu
Phe Leu Ala Arg His Ser Asn Val Asn Leu Thr Ile Phe Thr Ala 290
295 300Arg Leu Tyr Tyr Phe Trp Asp Thr Asp Tyr
Gln Glu Gly Leu Arg Ser305 310 315
320Leu Ser Gln Glu Gly Ala Ser Val Glu Ile Met Gly Tyr Lys Asp
Phe 325 330 335Lys Tyr Cys
Trp Glu Asn Phe Val Tyr Asn Asp Asp Glu Pro Phe Lys 340
345 350Pro Trp Lys Gly Leu Lys Tyr Asn Phe Leu
Phe Leu Asp Ser Lys Leu 355 360
365Gln Glu Ile Leu Glu 37054189PRTArtificial SequencehAPOBEC3F
catalytic domain 54Lys Glu Ile Leu Arg Asn Pro Met Glu Ala Met Tyr Pro
His Ile Phe1 5 10 15Tyr
Phe His Phe Lys Asn Leu Arg Lys Ala Tyr Gly Arg Asn Glu Ser 20
25 30Trp Leu Cys Phe Thr Met Glu Val
Val Lys His His Ser Pro Val Ser 35 40
45Trp Lys Arg Gly Val Phe Arg Asn Gln Val Asp Pro Glu Thr His Cys
50 55 60His Ala Glu Arg Cys Phe Leu Ser
Trp Phe Cys Asp Asp Ile Leu Ser65 70 75
80Pro Asn Thr Asn Tyr Glu Val Thr Trp Tyr Thr Ser Trp
Ser Pro Cys 85 90 95Pro
Glu Cys Ala Gly Glu Val Ala Glu Phe Leu Ala Arg His Ser Asn
100 105 110Val Asn Leu Thr Ile Phe Thr
Ala Arg Leu Tyr Tyr Phe Trp Asp Thr 115 120
125Asp Tyr Gln Glu Gly Leu Arg Ser Leu Ser Gln Glu Gly Ala Ser
Val 130 135 140Glu Ile Met Gly Tyr Lys
Asp Phe Lys Tyr Cys Trp Glu Asn Phe Val145 150
155 160Tyr Asn Asp Asp Glu Pro Phe Lys Pro Trp Lys
Gly Leu Lys Tyr Asn 165 170
175Phe Leu Phe Leu Asp Ser Lys Leu Gln Glu Ile Leu Glu 180
185551003PRTC. lari 55Met Arg Ile Leu Gly Phe Asp Ile Gly
Ile Asn Ser Ile Gly Trp Ala1 5 10
15Phe Val Glu Asn Asp Glu Leu Lys Asp Cys Gly Val Arg Ile Phe
Thr 20 25 30Lys Ala Glu Asn
Pro Lys Asn Lys Glu Ser Leu Ala Leu Pro Arg Arg 35
40 45Asn Ala Arg Ser Ser Arg Arg Arg Leu Lys Arg Arg
Lys Ala Arg Leu 50 55 60Ile Ala Ile
Lys Arg Ile Leu Ala Lys Glu Leu Lys Leu Asn Tyr Lys65 70
75 80Asp Tyr Val Ala Ala Asp Gly Glu
Leu Pro Lys Ala Tyr Glu Gly Ser 85 90
95Leu Ala Ser Val Tyr Glu Leu Arg Tyr Lys Ala Leu Thr Gln
Asn Leu 100 105 110Glu Thr Lys
Asp Leu Ala Arg Val Ile Leu His Ile Ala Lys His Arg 115
120 125Gly Tyr Met Asn Lys Asn Glu Lys Lys Ser Asn
Asp Ala Lys Lys Gly 130 135 140Lys Ile
Leu Ser Ala Leu Lys Asn Asn Ala Leu Lys Leu Glu Asn Tyr145
150 155 160Gln Ser Val Gly Glu Tyr Phe
Tyr Lys Glu Phe Phe Gln Lys Tyr Lys 165
170 175Lys Asn Thr Lys Asn Phe Ile Lys Ile Arg Asn Thr
Lys Asp Asn Tyr 180 185 190Asn
Asn Cys Val Leu Ser Ser Asp Leu Glu Lys Glu Leu Lys Leu Ile 195
200 205Leu Glu Lys Gln Lys Glu Phe Gly Tyr
Asn Tyr Ser Glu Asp Phe Ile 210 215
220Asn Glu Ile Leu Lys Val Ala Phe Phe Gln Arg Pro Leu Lys Asp Phe225
230 235 240Ser His Leu Val
Gly Ala Cys Thr Phe Phe Glu Glu Glu Lys Arg Ala 245
250 255Cys Lys Asn Ser Tyr Ser Ala Trp Glu Phe
Val Ala Leu Thr Lys Ile 260 265
270Ile Asn Glu Ile Lys Ser Leu Glu Lys Ile Ser Gly Glu Ile Val Pro
275 280 285Thr Gln Thr Ile Asn Glu Val
Leu Asn Leu Ile Leu Asp Lys Gly Ser 290 295
300Ile Thr Tyr Lys Lys Phe Arg Ser Cys Ile Asn Leu His Glu Ser
Ile305 310 315 320Ser Phe
Lys Ser Leu Lys Tyr Asp Lys Glu Asn Ala Glu Asn Ala Lys
325 330 335Leu Ile Asp Phe Arg Lys Leu
Val Glu Phe Lys Lys Ala Leu Gly Val 340 345
350His Ser Leu Ser Arg Gln Glu Leu Asp Gln Ile Ser Thr His
Ile Thr 355 360 365Leu Ile Lys Asp
Asn Val Lys Leu Lys Thr Val Leu Glu Lys Tyr Asn 370
375 380Leu Ser Asn Glu Gln Ile Asn Asn Leu Leu Glu Ile
Glu Phe Asn Asp385 390 395
400Tyr Ile Asn Leu Ser Phe Lys Ala Leu Gly Met Ile Leu Pro Leu Met
405 410 415Arg Glu Gly Lys Arg
Tyr Asp Glu Ala Cys Glu Ile Ala Asn Leu Lys 420
425 430Pro Lys Thr Val Asp Glu Lys Lys Asp Phe Leu Pro
Ala Phe Cys Asp 435 440 445Ser Ile
Phe Ala His Glu Leu Ser Asn Pro Val Val Asn Arg Ala Ile 450
455 460Ser Glu Tyr Arg Lys Val Leu Asn Ala Leu Leu
Lys Lys Tyr Gly Lys465 470 475
480Val His Lys Ile His Leu Glu Leu Ala Arg Asp Val Gly Leu Ser Lys
485 490 495Lys Ala Arg Glu
Lys Ile Glu Lys Glu Gln Lys Glu Asn Gln Ala Val 500
505 510Asn Ala Trp Ala Leu Lys Glu Cys Glu Asn Ile
Gly Leu Lys Ala Ser 515 520 525Ala
Lys Asn Ile Leu Lys Leu Lys Leu Trp Lys Glu Gln Lys Glu Ile 530
535 540Cys Ile Tyr Ser Gly Asn Lys Ile Ser Ile
Glu His Leu Lys Asp Glu545 550 555
560Lys Ala Leu Glu Val Asp His Ile Tyr Pro Tyr Ser Arg Ser Phe
Asp 565 570 575Asp Ser Phe
Ile Asn Lys Val Leu Val Phe Thr Lys Glu Asn Gln Glu 580
585 590Lys Leu Asn Lys Thr Pro Phe Glu Ala Phe
Gly Lys Asn Ile Glu Lys 595 600
605Trp Ser Lys Ile Gln Thr Leu Ala Gln Asn Leu Pro Tyr Lys Lys Lys 610
615 620Asn Lys Ile Leu Asp Glu Asn Phe
Lys Asp Lys Gln Gln Glu Asp Phe625 630
635 640Ile Ser Arg Asn Leu Asn Asp Thr Arg Tyr Ile Ala
Thr Leu Ile Ala 645 650
655Lys Tyr Thr Lys Glu Tyr Leu Asn Phe Leu Leu Leu Ser Glu Asn Glu
660 665 670Asn Ala Asn Leu Lys Ser
Gly Glu Lys Gly Ser Lys Ile His Val Gln 675 680
685Thr Ile Ser Gly Met Leu Thr Ser Val Leu Arg His Thr Trp
Gly Phe 690 695 700Asp Lys Lys Asp Arg
Asn Asn His Leu His His Ala Leu Asp Ala Ile705 710
715 720Ile Val Ala Tyr Ser Thr Asn Ser Ile Ile
Lys Ala Phe Ser Asp Phe 725 730
735Arg Lys Asn Gln Glu Leu Leu Lys Ala Arg Phe Tyr Ala Lys Glu Leu
740 745 750Thr Ser Asp Asn Tyr
Lys His Gln Val Lys Phe Phe Glu Pro Phe Lys 755
760 765Ser Phe Arg Glu Lys Ile Leu Ser Lys Ile Asp Glu
Ile Phe Val Ser 770 775 780Lys Pro Pro
Arg Lys Arg Ala Arg Arg Ala Leu His Lys Asp Thr Phe785
790 795 800His Ser Glu Asn Lys Ile Ile
Asp Lys Cys Ser Tyr Asn Ser Lys Glu 805
810 815Gly Leu Gln Ile Ala Leu Ser Cys Gly Arg Val Arg
Lys Ile Gly Thr 820 825 830Lys
Tyr Val Glu Asn Asp Thr Ile Val Arg Val Asp Ile Phe Lys Lys 835
840 845Gln Asn Lys Phe Tyr Ala Ile Pro Ile
Tyr Ala Met Asp Phe Ala Leu 850 855
860Gly Ile Leu Pro Asn Lys Ile Val Ile Thr Gly Lys Asp Lys Asn Asn865
870 875 880Asn Pro Lys Gln
Trp Gln Thr Ile Asp Glu Ser Tyr Glu Phe Cys Phe 885
890 895Ser Leu Tyr Lys Asn Asp Leu Ile Leu Leu
Gln Lys Lys Asn Met Gln 900 905
910Glu Pro Glu Phe Ala Tyr Tyr Asn Asp Phe Ser Ile Ser Thr Ser Ser
915 920 925Ile Cys Val Glu Lys His Asp
Asn Lys Phe Glu Asn Leu Thr Ser Asn 930 935
940Gln Lys Leu Leu Phe Ser Asn Ala Lys Glu Gly Ser Val Lys Val
Glu945 950 955 960Ser Leu
Gly Ile Gln Asn Leu Lys Val Phe Glu Lys Tyr Ile Ile Thr
965 970 975Pro Leu Gly Asp Lys Ile Lys
Ala Asp Phe Gln Pro Arg Glu Asn Ile 980 985
990Ser Leu Lys Thr Ser Lys Lys Tyr Gly Leu Arg 995
1000561418PRTM. bovoculi 56Met Leu Phe Gln Asp Phe Thr His
Leu Tyr Pro Leu Ser Lys Thr Val1 5 10
15Arg Phe Glu Leu Lys Pro Ile Asp Arg Thr Leu Glu His Ile
His Ala 20 25 30Lys Asn Phe
Leu Ser Gln Asp Glu Thr Met Ala Asp Met His Gln Lys 35
40 45Val Lys Val Ile Leu Asp Asp Tyr His Arg Asp
Phe Ile Ala Asp Met 50 55 60Met Gly
Glu Val Lys Leu Thr Lys Leu Ala Glu Phe Tyr Asp Val Tyr65
70 75 80Leu Lys Phe Arg Lys Asn Pro
Lys Asp Asp Glu Leu Gln Lys Gln Leu 85 90
95Lys Asp Leu Gln Ala Val Leu Arg Lys Glu Ile Val Lys
Pro Ile Gly 100 105 110Asn Gly
Gly Lys Tyr Lys Ala Gly Tyr Asp Arg Leu Phe Gly Ala Lys 115
120 125Leu Phe Lys Asp Gly Lys Glu Leu Gly Asp
Leu Ala Lys Phe Val Ile 130 135 140Ala
Gln Glu Gly Glu Ser Ser Pro Lys Leu Ala His Leu Ala His Phe145
150 155 160Glu Lys Phe Ser Thr Tyr
Phe Thr Gly Phe His Asp Asn Arg Lys Asn 165
170 175Met Tyr Ser Asp Glu Asp Lys His Thr Ala Ile Ala
Tyr Arg Leu Ile 180 185 190His
Glu Asn Leu Pro Arg Phe Ile Asp Asn Leu Gln Ile Leu Thr Thr 195
200 205Ile Lys Gln Lys His Ser Ala Leu Tyr
Asp Gln Ile Ile Asn Glu Leu 210 215
220Thr Ala Ser Gly Leu Asp Val Ser Leu Ala Ser His Leu Asp Gly Tyr225
230 235 240His Lys Leu Leu
Thr Gln Glu Gly Ile Thr Ala Tyr Asn Thr Leu Leu 245
250 255Gly Gly Ile Ser Gly Glu Ala Gly Ser Pro
Lys Ile Gln Gly Ile Asn 260 265
270Glu Leu Ile Asn Ser His His Asn Gln His Cys His Lys Ser Glu Arg
275 280 285Ile Ala Lys Leu Arg Pro Leu
His Lys Gln Ile Leu Ser Asp Gly Met 290 295
300Ser Val Ser Phe Leu Pro Ser Lys Phe Ala Asp Asp Ser Glu Met
Cys305 310 315 320Gln Ala
Val Asn Glu Phe Tyr Arg His Tyr Ala Asp Val Phe Ala Lys
325 330 335Val Gln Ser Leu Phe Asp Gly
Phe Asp Asp His Gln Lys Asp Gly Ile 340 345
350Tyr Val Glu His Lys Asn Leu Asn Glu Leu Ser Lys Gln Ala
Phe Gly 355 360 365Asp Phe Ala Leu
Leu Gly Arg Val Leu Asp Gly Tyr Tyr Val Asp Val 370
375 380Val Asn Pro Glu Phe Asn Glu Arg Phe Ala Lys Ala
Lys Thr Asp Asn385 390 395
400Ala Lys Ala Lys Leu Thr Lys Glu Lys Asp Lys Phe Ile Lys Gly Val
405 410 415His Ser Leu Ala Ser
Leu Glu Gln Ala Ile Glu His Tyr Thr Ala Arg 420
425 430His Asp Asp Glu Ser Val Gln Ala Gly Lys Leu Gly
Gln Tyr Phe Lys 435 440 445His Gly
Leu Ala Gly Val Asp Asn Pro Ile Gln Lys Ile His Asn Asn 450
455 460His Ser Thr Ile Lys Gly Phe Leu Glu Arg Glu
Arg Pro Ala Gly Glu465 470 475
480Arg Ala Leu Pro Lys Ile Lys Ser Gly Lys Asn Pro Glu Met Thr Gln
485 490 495Leu Arg Gln Leu
Lys Glu Leu Leu Asp Asn Ala Leu Asn Val Ala His 500
505 510Phe Ala Lys Leu Leu Thr Thr Lys Thr Thr Leu
Asp Asn Gln Asp Gly 515 520 525Asn
Phe Tyr Gly Glu Phe Gly Val Leu Tyr Asp Glu Leu Ala Lys Ile 530
535 540Pro Thr Leu Tyr Asn Lys Val Arg Asp Tyr
Leu Ser Gln Lys Pro Phe545 550 555
560Ser Thr Glu Lys Tyr Lys Leu Asn Phe Gly Asn Pro Thr Leu Leu
Asn 565 570 575Gly Trp Asp
Leu Asn Lys Glu Lys Asp Asn Phe Gly Val Ile Leu Gln 580
585 590Lys Asp Gly Cys Tyr Tyr Leu Ala Leu Leu
Asp Lys Ala His Lys Lys 595 600
605Val Phe Asp Asn Ala Pro Asn Thr Gly Lys Ser Ile Tyr Gln Lys Met 610
615 620Ile Tyr Lys Tyr Leu Glu Val Arg
Lys Gln Phe Pro Lys Val Phe Phe625 630
635 640Ser Lys Glu Ala Ile Ala Ile Asn Tyr His Pro Ser
Lys Glu Leu Val 645 650
655Glu Ile Lys Asp Lys Gly Arg Gln Arg Ser Asp Asp Glu Arg Leu Lys
660 665 670Leu Tyr Arg Phe Ile Leu
Glu Cys Leu Lys Ile His Pro Lys Tyr Asp 675 680
685Lys Lys Phe Glu Gly Ala Ile Gly Asp Ile Gln Leu Phe Lys
Lys Asp 690 695 700Lys Lys Gly Arg Glu
Val Pro Ile Ser Glu Lys Asp Leu Phe Asp Lys705 710
715 720Ile Asn Gly Ile Phe Ser Ser Lys Pro Lys
Leu Glu Met Glu Asp Phe 725 730
735Phe Ile Gly Glu Phe Lys Arg Tyr Asn Pro Ser Gln Asp Leu Val Asp
740 745 750Gln Tyr Asn Ile Tyr
Lys Lys Ile Asp Ser Asn Asp Asn Arg Lys Lys 755
760 765Glu Asn Phe Tyr Asn Asn His Pro Lys Phe Lys Lys
Asp Leu Val Arg 770 775 780Tyr Tyr Tyr
Glu Ser Met Cys Lys His Glu Glu Trp Glu Glu Ser Phe785
790 795 800Glu Phe Ser Lys Lys Leu Gln
Asp Ile Gly Cys Tyr Val Asp Val Asn 805
810 815Glu Leu Phe Thr Glu Ile Glu Thr Arg Arg Leu Asn
Tyr Lys Ile Ser 820 825 830Phe
Cys Asn Ile Asn Ala Asp Tyr Ile Asp Glu Leu Val Glu Gln Gly 835
840 845Gln Leu Tyr Leu Phe Gln Ile Tyr Asn
Lys Asp Phe Ser Pro Lys Ala 850 855
860His Gly Lys Pro Asn Leu His Thr Leu Tyr Phe Lys Ala Leu Phe Ser865
870 875 880Glu Asp Asn Leu
Ala Asp Pro Ile Tyr Lys Leu Asn Gly Glu Ala Gln 885
890 895Ile Phe Tyr Arg Lys Ala Ser Leu Asp Met
Asn Glu Thr Thr Ile His 900 905
910Arg Ala Gly Glu Val Leu Glu Asn Lys Asn Pro Asp Asn Pro Lys Lys
915 920 925Arg Gln Phe Val Tyr Asp Ile
Ile Lys Asp Lys Arg Tyr Thr Gln Asp 930 935
940Lys Phe Met Leu His Val Pro Ile Thr Met Asn Phe Gly Val Gln
Gly945 950 955 960Met Thr
Ile Lys Glu Phe Asn Lys Lys Val Asn Gln Ser Ile Gln Gln
965 970 975Tyr Asp Glu Val Asn Val Ile
Gly Ile Asp Arg Gly Glu Arg His Leu 980 985
990Leu Tyr Leu Thr Val Ile Asn Ser Lys Gly Glu Ile Leu Glu
Gln Cys 995 1000 1005Ser Leu Asn
Asp Ile Thr Thr Ala Ser Ala Asn Gly Thr Gln Met 1010
1015 1020Thr Thr Pro Tyr His Lys Ile Leu Asp Lys Arg
Glu Ile Glu Arg 1025 1030 1035Leu Asn
Ala Arg Val Gly Trp Gly Glu Ile Glu Thr Ile Lys Glu 1040
1045 1050Leu Lys Ser Gly Tyr Leu Ser His Val Val
His Gln Ile Ser Gln 1055 1060 1065Leu
Met Leu Lys Tyr Asn Ala Ile Val Val Leu Glu Asp Leu Asn 1070
1075 1080Phe Gly Phe Lys Arg Gly Arg Phe Lys
Val Glu Lys Gln Ile Tyr 1085 1090
1095Gln Asn Phe Glu Asn Ala Leu Ile Lys Lys Leu Asn His Leu Val
1100 1105 1110Leu Lys Asp Lys Ala Asp
Asp Glu Ile Gly Ser Tyr Lys Asn Ala 1115 1120
1125Leu Gln Leu Thr Asn Asn Phe Thr Asp Leu Lys Ser Ile Gly
Lys 1130 1135 1140Gln Thr Gly Phe Leu
Phe Tyr Val Pro Ala Trp Asn Thr Ser Lys 1145 1150
1155Ile Asp Pro Glu Thr Gly Phe Val Asp Leu Leu Lys Pro
Arg Tyr 1160 1165 1170Glu Asn Ile Ala
Gln Ser Gln Ala Phe Phe Gly Lys Phe Asp Lys 1175
1180 1185Ile Cys Tyr Asn Ala Asp Lys Asp Tyr Phe Glu
Phe His Ile Asp 1190 1195 1200Tyr Ala
Lys Phe Thr Asp Lys Ala Lys Asn Ser Arg Gln Ile Trp 1205
1210 1215Thr Ile Cys Ser His Gly Asp Lys Arg Tyr
Val Tyr Asp Lys Thr 1220 1225 1230Ala
Asn Gln Asn Lys Gly Ala Ala Lys Gly Ile Asn Val Asn Asp 1235
1240 1245Glu Leu Lys Ser Leu Phe Ala Arg His
His Ile Asn Glu Lys Gln 1250 1255
1260Pro Asn Leu Val Met Asp Ile Cys Gln Asn Asn Asp Lys Glu Phe
1265 1270 1275His Lys Ser Leu Met Tyr
Leu Leu Lys Thr Leu Leu Ala Leu Arg 1280 1285
1290Tyr Ser Asn Ala Ser Ser Asp Glu Asp Phe Ile Leu Ser Pro
Val 1295 1300 1305Ala Asn Asp Glu Gly
Val Phe Phe Asn Ser Ala Leu Ala Asp Asp 1310 1315
1320Thr Gln Pro Gln Asn Ala Asp Ala Asn Gly Ala Tyr His
Ile Ala 1325 1330 1335Leu Lys Gly Leu
Trp Leu Leu Asn Glu Leu Lys Asn Ser Asp Asp 1340
1345 1350Leu Asn Lys Val Lys Leu Ala Ile Asp Asn Gln
Thr Trp Leu Asn 1355 1360 1365Phe Ala
Gln Asn Arg Lys Arg Pro Ala Ala Thr Lys Lys Ala Gly 1370
1375 1380Gln Ala Lys Lys Lys Lys Gly Ser Tyr Pro
Tyr Asp Val Pro Asp 1385 1390 1395Tyr
Ala Tyr Pro Tyr Asp Val Pro Asp Tyr Ala Tyr Pro Tyr Asp 1400
1405 1410Val Pro Asp Tyr Ala 1415