Patent application title: System and Method of Induced Mutant Protein Based on Activation-induced Cytidine Deaminase
Inventors:
IPC8 Class: AC12N978FI
USPC Class:
Class name:
Publication date: 2022-03-24
Patent application number: 20220090043
Abstract:
The present invention provides a mutant protein of activation-induced
cytidine deaminase, wherein hsAID is mutated with the following
mutations: T82I, K10E, K34E, E156G, 181*, S38, H130, V152, R174, T100.
The present invention also provides a High-efficiency Base Editor,
including the mutant protein of activation-induced cytidine deaminase of
the present invention and a DNA-specific binding protein, which are
linked sequentially via a linking sequence. The present invention also
provides a single-base locus-directed editing system, including
single-base locus-directed editing proteins and target Hyper Mutation
Fragment. Compared with the existing activation-induced cytidine
deaminase (AID)-based single-base editing system, the mutant protein
inducing system of the present invention has a smaller molecular weight
and higher mutation efficiency.
Claims:
1. A mutant protein of activation-induced cytidine deaminase, wherein
hsAID is mutated with the following mutations: T82I, K10E, K34E, E156G,
181*, S38, H130, V152, R174, and T100; and compared with the sequence
shown in SEQ ID NO:1, the corresponding nucleotide sequence of the mutant
protein has at least 95% sequence identity.
2. A High-efficiency Base Editor, comprising the mutant protein of activation-induced cytidine deaminase as claimed in claim 1, a DNA-specific binding protein and a nuclear localization signal, wherein the mutant protein of activation-induced cytidine deaminase and the DNA-specific binding protein are sequentially linked via a linking sequence, and the nuclear localization signal is located at the C-terminus of the High-efficiency Base Editor.
3. The High-efficiency Base Editor of claim 2, wherein the DNA-specific binding protein is a homing endonuclease; the corresponding nucleotide sequence of the DNA-specific binding protein is shown in SEQ ID NO: 2.
4. The High-efficiency Base Editor of claim 2, further comprising a UGI protein domain, the UGI protein domain is located after the DNA-specific binding protein and before the nuclear localization signal.
5. The High-efficiency Base Editor of claim 3, further comprising a UGI protein domain, the UGI protein domain is located after the DNA-specific binding protein and before the nuclear localization signal.
6. The High-efficiency Base Editor of claim 2, wherein, it is suitable for yeast system, and its corresponding nucleotide sequence is shown in SEQ ID NO: 3; it is suitable for Drosophila system, and its corresponding nucleotide sequence is shown in SEQ ID NO: 4; or it is suitable for zebrafish and mouse systems, and its corresponding nucleotide sequence is shown in SEQ ID NO:5.
7. A single-base locus-directed editing system, comprising the single-base locus-directed editing protein of claim 3 and a target Hyper Mutation Fragment.
8. A single-base locus-directed editing system, comprising the single-base locus-directed editing protein of claim 4 and a target Hyper Mutation Fragment.
9. A single-base locus-directed editing system, comprising the single-base locus-directed editing protein of claim 5 and a target Hyper Mutation Fragment.
10. A single-base locus-directed editing system, comprising the single-base locus-directed editing protein of claim 6 and a target Hyper Mutation Fragment.
11. The single-base locus-directed editing system according to claim 7, wherein the target Hyper Mutation Fragment includes the nucleotide sequence as shown in SEQ ID NO: 6 and SEQ ID NO: 7.
12. The single-base locus-directed editing system according to claim 8, wherein the target Hyper Mutation Fragment includes the nucleotide sequence as shown in SEQ ID NO: 6 and SEQ ID NO: 7.
13. The single-base locus-directed editing system according to claim 9, wherein the target Hyper Mutation Fragment includes the nucleotide sequence as shown in SEQ ID NO: 6 and SEQ ID NO: 7.
14. The single-base locus-directed editing system according to claim 10, wherein the target Hyper Mutation Fragment includes the nucleotide sequence as shown in SEQ ID NO: 6 and SEQ ID NO: 7.
15. A gene editing method, comprising using the single-base locus-directed editing system of claim 7.
16. A gene editing method, comprising using the single-base locus-directed editing system of claim 8.
17. A gene editing method, comprising using the single-base locus-directed editing system of claim 10.
18. A gene editing method, comprising using the single-base locus-directed editing system of claim 11.
19. A gene editing method, comprising using the single-base locus-directed editing system of claim 12.
20. A gene editing method, comprising using the single-base locus-directed editing system of claim 14.
Description:
CROSS-REFERENCE TO RELATED APPLICATIONS
[0001] The present application is a Continuation-In-Part Application of PCT application No. PCT/CN2020/139973 filed on Dec. 28, 2020, which claims the benefit of Chinese Patent Application No. 202010285948.1 filed on Apr. 13, 2020. The contents of the above-identified applications are hereby incorporated by reference.
SUBMISSION OF SEQUENCE LISTING ON ASCII TEXT FILE
[0002] The Sequence Listing is submitted as an ASCII formatted text file via EFS-Web, with a file name of "Sequence_Listing.txt", a creation date of Sep. 28, 2021, and a size of 14,751 bytes. The Sequence Listing filed via EFS-Web is part of the specification and is incorporated in its entirety by reference herein.
FIELD OF THE INVENTION
[0003] The present invention belongs to the technical field of gene editing, and specifically relates to a system and a method for inducing mutant proteins based on activation-induced cytidine deaminase.
BACKGROUND OF THE INVENTION
[0004] Activation-induced cytidine deaminase (AICDA, or AID) is a DNA-editing enzyme that plays an important role in somatic hypermutation (SHM), gene conversion and class switch recombination (CSR) in B lymphocytes. By deaminating the DNA of the Ig variable region (V) and the Ig switch region (S), it greatly increases the diversity of the immune repertoire.
[0005] Its working principle is generally as follows: first, cytosine (C) is deaminated to uracil (U); then, in the next round of DNA replication, the uracil (U) is read as thymine (T). If the intracellular repair mechanism detects the uracil (U) in the DNA, there is a certain probability that it will trigger removal of the base, resulting in a C→G or C→A mutation. DNA deaminase thus realizes the conversion of cytosine to thymine through deamination and triggers DNA mutation (C→T).
[0006] Since the birth of the CRISPR/Cas9 system, high-efficiency gene editing has gradually become possible. Cas9 is localized to a specific DNA region under the guidance of a short RNA molecule (guide RNA, gRNA). At the target locus, the Cas9 endonuclease induces a double-strand break, which is then repaired through the homology directed repair (HDR) mechanism or in the form of insertions and deletions (indels). However, the efficiency of precise gene editing mediated by homology directed repair is limited, which has limited the wide application of this technology. Therefore, precise gene editing, such as single-base changes, is still a huge challenge for CRISPR technology.
[0007] Later, researchers discovered that a single-base editing system developed by integrating a base deaminase (such as the cytidine deaminase APOBEC1 or adenosine deaminase TadA variants) with the CRISPR/Cas system can accurately introduce C/G-to-T/A and A/T-to-G/C point mutations without cutting the DNA double strand, thereby achieving efficient and accurate gene editing. This gene editing system is still guided by RNA, but it does not cause a double-strand break at the target locus. Instead, the cytidine deaminase converts the cytosine base into uracil, which is then repaired by an error-prone mechanism, leading to various point mutations. Moreover, when the uracil-DNA glycosylase pathway is inhibited, the system can also achieve more specific and desired point mutations, such as C-to-T or G-to-A transitions. This progress in gene editing is very important, because two-thirds of human genetic diseases are caused by single-base changes. Theoretically, the single-base editing system can be used for the treatment of hundreds of genetic diseases and has great potential for clinical application.
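The base change described above can be illustrated with a minimal Python sketch. The function names `deaminate` and `read_after_replication` are hypothetical, the example strand is made up, and the model deliberately ignores the repair pathways that can lead to other outcomes; it only shows how a C deaminated to U is read as T after replication.

```python
def deaminate(strand: str, position: int) -> str:
    """Return the strand with the cytosine at `position` (0-based) converted to uracil."""
    if strand[position] != "C":
        raise ValueError(f"expected C at position {position}, found {strand[position]}")
    return strand[:position] + "U" + strand[position + 1:]


def read_after_replication(strand: str) -> str:
    """Uracil pairs with adenine, so after replication the position is read as thymine."""
    return strand.replace("U", "T")


original = "GACGT"                       # toy strand, not taken from the application
edited = deaminate(original, 2)          # C at index 2 becomes U
fixed = read_after_replication(edited)   # U is read as T: net C-to-T change
```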
[0008] The single-base editing system CRISPR-Cas9-AID based on B cell-specific activation-induced cytidine deaminase (AID) is one of the most important single-base editing technologies.
[0009] However, the existing systems have several drawbacks: the mutation efficiency of the existing AID is not high enough; the CRISPR/Cas9 and AID-based single-base editing system has a large molecular weight, so it is not easily delivered to the target DNA fragments, which restricts its transgenic application in some species; and the CRISPR/Cas9 system has a relatively high off-target effect.
SUMMARY OF THE INVENTION
[0010] The key technical problem solved by the present invention is to provide an activation-induced cytidine deaminase (AID)-based mutant protein inducing system, which is named "High-efficiency Base Editor" (abbreviated as HBE). The system can specifically target DNA and induce DNA mutations with high efficiency. At the same time, the present invention also provides a method for the system to perform single-base locus-directed mutation in mice.
[0011] In order to achieve the above objectives, the technical solutions adopted by the present invention are as follows:
[0012] In the first aspect, the present invention provides a mutant protein of activation-induced cytidine deaminase. The mutant protein has the following mutations to hsAID: T82I, K10E, K34E, E156G, 181*, S38, H130, V152, R174, and T100.
[0013] Further, compared with the sequence shown in SEQ ID NO:1, the corresponding nucleotide sequence of the mutant protein has at least 95% sequence identity, preferably at least 98% or 99% sequence identity, and more preferably 100% sequence identity. It should be noted that although these variant nucleotide sequences differ, the differing sites are located in non-critical positions, which have little or no effect on the DNA targeting of the mutant protein or on its efficiency of inducing DNA mutations.
[0014] In the second aspect, the present invention provides a High-efficiency Base Editor (denoted as: HBE), which includes the above-mentioned mutant protein of activation-induced cytidine deaminase, a DNA-specific binding protein, and a nuclear localization signal. The mutant protein of the activation-induced cytidine deaminase and the DNA-specific binding protein are sequentially linked via a linking sequence, and the nuclear localization signal is located at the C-terminus of the High-efficiency Base Editor.
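As an illustration of what a 95% identity threshold means in practice, the following Python sketch computes percent identity between two already-aligned sequences of equal length. This is only one possible convention (alignment tools differ in how they treat gaps and which length they divide by), the name `percent_identity` is hypothetical, and the 20-nt sequences are toy data rather than SEQ ID NO:1.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent of aligned columns with identical bases; gap characters count as mismatches."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    matches = sum(1 for a, b in zip(seq_a, seq_b) if a == b and a != "-")
    return 100.0 * matches / len(seq_a)


reference = "ATGGACAGCCTCTTGATGAA"   # toy reference, 20 nt
variant = "ATGGACAGCCTCTTGATGAC"     # differs only at the last base
identity = percent_identity(reference, variant)
meets_threshold = identity >= 95.0   # 19 of 20 positions match
```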
[0015] Further, the DNA specific binding protein is a homing endonuclease.
[0016] Further, the homing endonuclease includes iSceI, iTevI, iSmaMI, piSceI, iPpoI, piPfuI, iHmuI, iCreI, iCeuI, and iAniI.
[0017] Further, the corresponding nucleotide sequence of the DNA-specific binding protein is shown in SEQ ID NO:2.
[0018] Further, it also includes a UGI protein domain, which is located after the DNA-specific binding protein and before the nuclear localization signal.
[0019] The key elements AID10, d-iSceI, UGI, and SV40 in the High-efficiency Base Editor (HBE) of the present invention are linked in series to form a protein with single-base-directed editing function. The linking sequence can be adjusted based on the needs of the expression system or the host cells. In the examples, the present invention provides HBE proteins suitable for yeast systems, Drosophila, zebrafish and mice. The key elements AID10, d-iSceI, UGI, and SV40 in these HBE proteins are the same; the difference is the linking sequence. In the yeast system, the linking sequence of AID10 and d-iSceI is 6×(GGGGS); in Drosophila, the linking sequences are XTEN and 6×(GGGGS); in the zebrafish and mouse systems, the linking sequences are 6×(GGGGS), a GS-rich linker and HA. This shows that the linking sequence can be adjusted according to the expression system. The corresponding nucleotide sequences of the HBE proteins suitable for yeast systems, Drosophila, and zebrafish and mice are shown in SEQ ID NOs: 3-5, respectively.
[0020] In the third aspect, the present invention provides a single-base locus-directed editing system, which includes the above-mentioned High-efficiency Base Editor and a target Hyper Mutation Fragment.
[0021] Further, the nucleotide sequences of the target Hyper Mutation Fragment (abbreviated as HMF) are shown as SEQ ID NO: 6 and SEQ ID NO: 7.
[0022] In the fourth aspect, the present invention provides a gene editing method, in particular a single-base locus-directed mutation method, which includes employing the above-mentioned single-base locus-directed editing system.
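The modular layout described above can be sketched in Python as follows. The bracketed domain names are placeholders rather than real amino acid sequences, `flexible_linker` and `assemble` are hypothetical helper names, and only the yeast arrangement is shown. The SV40 signal peptide PKKKRKV is the one given in the application as SEQ ID NO: 8.

```python
LINKER_UNIT = "GGGGS"
SV40_NLS = "PKKKRKV"  # SV40 nuclear localization signal, SEQ ID NO: 8 in the text


def flexible_linker(repeats: int = 6) -> str:
    """Build an n x (GGGGS) flexible linker; 6 repeats gives 30 residues."""
    return LINKER_UNIT * repeats


def assemble(elements: list[str]) -> str:
    """Join elements N-terminus to C-terminus; '-' only marks element boundaries."""
    return "-".join(elements)


# Yeast layout: AID10, 6x(GGGGS) linker, d-iSceI, UGI, C-terminal SV40 NLS.
# "[AID10]", "[d-iSceI]" and "[UGI]" stand in for sequences not reproduced here.
yeast_layout = assemble(["[AID10]", flexible_linker(), "[d-iSceI]", "[UGI]", SV40_NLS])
```

Swapping the linker element for another linker changes the layout without touching the functional domains, which is the adjustment the paragraph describes.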
[0023] The beneficial effects of the present invention: Compared with the existing activation-induced cytidine deaminase (AID)-based single-base editing system, the mutant protein inducing system of the present invention has a smaller molecular weight and higher mutation efficiency.
BRIEF DESCRIPTION OF THE DRAWINGS
[0024] FIG. 1 is a schematic diagram of the evolutionary tree of the deaminase gene family.
[0025] FIG. 2 is a schematic diagram of the results of screening different deaminases through the yeast TLC experiment.
[0026] FIG. 3 is a schematic diagram of the comparison results of the mutagenic efficiency of different deaminases (the vertical coordinate represents different deaminases, and the horizontal coordinate represents the mutation rate).
[0027] FIG. 4 is a schematic diagram of the results of screening hsAID variants through the yeast TLC experiment.
[0028] FIG. 5 is a schematic diagram of the protein mutation loci and 3D structure of hsAID mutant AID5.
[0029] FIGS. 6a-6b are schematic diagrams of comparing the mutagenic efficiency of AID5 and AID10 by yeast TLC experiment; FIG. 6a shows the mutagenic effect of AID5, and FIG. 6b shows the mutagenic effect of AID10. It can be seen from this figure that the mutagenic rate of AID10 is higher than that of AID5.
[0030] FIG. 7 is a schematic diagram of protein mutation positions and 3D structure of hsAID mutant AID10.
[0031] FIG. 8 is a schematic diagram of the screening results of DNA binding proteins (the vertical coordinate indicates different DNA binding proteins, and the horizontal coordinate indicates the mutation rate).
[0032] FIGS. 9a-9b are the High-efficiency Base Editor (HBE) sequence map in yeast and the protein 3D structure diagram; FIG. 9a is the sequence map of the single-base locus-directed editing protein (HBE) in yeast, and FIG. 9b is a schematic diagram of the 3D structure of the single-base locus-directed editing protein (HBE).
[0033] FIG. 10 is a sequence map of the HBE protein in Drosophila.
[0034] FIGS. 11a-11b are the HBE protein sequence maps in zebrafish and mice; wherein, FIG. 11a shows the HBE gene induction expression frame; FIG. 11b shows the Gal4-VP16 gene stable expression frame.
[0035] FIG. 12 is a flow chart of the design and optimization of the target Hyper Mutation Fragment (HMF).
[0036] FIG. 13 is a map of the tet-SLOTH system in mice H11 transgenic locus.
[0037] FIG. 14 is a schematic diagram of strain construction of tet-SLOTH mice.
[0038] FIG. 15 is a map of the lox-SLOTH system in the mouse Rosa26 locus.
[0039] FIG. 16 is a schematic diagram of strain construction of lox-SLOTH mice.
[0040] FIGS. 17a-17d are schematic diagrams of HBE gene expression in the tet-SLOTH system under Dox induction conditions; FIG. 17a shows the relative expression of the HBE gene in the whole transcriptome of E14 mice, showing that HBE gene expression is normal; FIG. 17b shows the coverage of E14 RNA-seq on the SLOTH system, from which it can be seen that the SLOTH system works normally during the E14 period; FIG. 17c shows the relative expression of the HBE gene in the whole transcriptome of P1 mice, which is not much different from that in the E14 period, indicating that HBE can still be expressed normally during this period; FIG. 17d shows the coverage of P1 RNA-seq on the SLOTH system.
[0041] FIGS. 18a-18c are statistical graphs of the number of mutation events on the HMF3k fragment in the mouse system; FIG. 18a shows the number of mutations per unit labeled sequence in uninduced (HBE-) and induced (HBE+) mice; FIG. 18b shows the number of mutations per unit labeled sequence in induced-expression (HBE+) mice at birth (P0) and one day after birth (P1); FIG. 18c shows the number of mutations in different organs of induced-expression mice.
DETAILED DESCRIPTION OF THE INVENTION
[0042] In order to show the technical solutions, objectives and advantages of the present invention more concisely and clearly, the present invention will be further described in detail below in combination with specific embodiments and the accompanying drawings.
Example 1 Design and Optimization of Activation-Induced Cytidine Deaminase (AID)
[0043] 1. Screen AID with High Mutation Rate
[0044] The sequences of the deaminase gene family were obtained through sequence alignment and classified into several major categories according to their structure on the evolutionary tree (FIG. 1). Representative deaminases were then selected from each main branch for downstream screening experiments.
[0045] Each deaminase gene was codon-optimized and integrated into the induction expression vector (pGA), and its mutation efficiency was determined in the yeast platform. As can be seen in FIG. 2, GIL104 yeast expressing the different deaminases all produced resistant clones on the SC-Arg-/CAN+ plate, indicating successful expression of the deaminase genes. In the experiment, each sample had 36 independent replicates. The number of yeast cells in each TLC plate replicate was 8×10^6. In addition, because of its large number of clones, the hsAID sample was diluted 10-fold before sampling, that is, to 8×10^5 cells.
[0046] The mutation efficiency of each deaminase was estimated by counting the resistant clones. The analysis results (FIG. 3) show that the hsAID protein has the highest mutation efficiency. hsAID is the human-derived activation-induced cytidine deaminase (AICDA, or AID), which is involved in somatic hypermutation (SHM), gene conversion and class switch recombination (CSR) in B lymphocytes; through the deamination of Ig variable region (V) and Ig switch region (S) DNA, it greatly increases the diversity of the immune repertoire.
[0047] Therefore, the subsequent experiments started from hsAID to modify and optimize the AID protein.
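A rough way to turn resistant-clone counts into a comparable figure is sketched below in Python. It assumes the efficiency is summarized as the median number of resistant clones per plated cell across replicates, which is a simplification and not necessarily the analysis used in the application; the colony counts are invented for illustration and `mutant_frequency` is a hypothetical name. The cell numbers (8×10^6 per replicate, 8×10^5 for the diluted hsAID sample) come from the text.

```python
from statistics import median

CELLS_PER_REPLICATE = 8e6   # cells per TLC plate replicate, from the text
CELLS_DILUTED = 8e5         # hsAID sample after 10-fold dilution, from the text


def mutant_frequency(colony_counts, cells_plated=CELLS_PER_REPLICATE):
    """Median resistant colonies per plated cell across independent replicates."""
    if not colony_counts:
        raise ValueError("need at least one replicate")
    return median(colony_counts) / cells_plated


# Hypothetical colony counts, for illustration only.
other_deaminase = mutant_frequency([3, 5, 4, 6])
hs_aid = mutant_frequency([40, 38, 45, 41], cells_plated=CELLS_DILUTED)
fold_difference = hs_aid / other_deaminase
```

Dividing by the number of cells actually plated is what makes the diluted hsAID sample comparable with the undiluted ones.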
[0048] 2. Transforming and Optimizing AID to Obtain the Mutator
[0049] A semi-rational design strategy was adopted. With the help of bioinformatics methods, and based on homologous protein sequence alignment, three-dimensional structure or existing knowledge, multiple amino acid residues were selected as targets for modification. Combined with the rational selection of effective codons and the construction of a high-quality mutant library, substitutions at six amino acid positions were tested: K10E, K34E, T82I, F115E, E156G, and R174E. In addition, according to the annotation of the AID protein domains, the C-terminus of the protein mediates the class switch recombination of immune gene loci. Therefore, it was inferred that removing the C-terminus of the AID protein could improve the stability of the target sequence and improve the mutation efficiency.
[0050] Combinations of these loci were tested in the same yeast screening platform to observe whether the mutation efficiency of the deaminase could be improved. For the experimental materials and methods, refer to 7.2.7; for the analysis process, refer to materials and methods 7.3.1. The test results are shown in FIG. 4: K10E, K34E, T82I, and E156G improve the mutation efficiency more obviously, and T82I has the most significant effect. At the same time, the early-terminated 181* variant also greatly improved the mutation efficiency.
[0051] As shown in FIG. 5, from the perspective of the protein 3D structure, the amino acid residues of K10E, K34E, E156G, and 181* (not shown) among the five mutation positions are all located on the surface of the protein, while T82I is located inside the protein structure. However, in terms of effect, the improvement brought by the T82I mutation is the most obvious. This may be because the mutation from a polar amino acid (Thr) to a hydrophobic amino acid (Ile) affects the overall structure of the protein, resulting in an overall improvement in catalytic efficiency. The optimized deaminase protein with improved mutation efficiency was thus obtained and named AID5. In the following description, the "activation-induced cytidine deaminase" is referred to as the "mutator" for short.
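The mutation notation used above (for example K10E for lysine 10 replaced by glutamate, and 181* for a stop at position 181) can be applied to a sequence programmatically. The Python sketch below is illustrative only: `apply_substitutions` and `truncate_at` are hypothetical names, and the seven-residue peptide is a toy example, not the hsAID sequence.

```python
import re


def apply_substitutions(protein: str, mutations: list[str]) -> str:
    """Apply substitutions such as 'K10E' (1-based positions), checking the wild-type residue."""
    residues = list(protein)
    for mutation in mutations:
        match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", mutation)
        if match is None:
            raise ValueError(f"unrecognized mutation: {mutation}")
        wild_type, position, new = match.group(1), int(match.group(2)), match.group(3)
        if residues[position - 1] != wild_type:
            raise ValueError(f"{mutation}: found {residues[position - 1]} at {position}")
        residues[position - 1] = new
    return "".join(residues)


def truncate_at(protein: str, stop_position: int) -> str:
    """A stop at `stop_position` (as in '181*') keeps residues 1 .. stop_position - 1."""
    return protein[: stop_position - 1]


toy = "MKTAYKE"                                   # toy peptide, not hsAID
mutated = apply_substitutions(toy, ["K2E", "T3I"])
shortened = truncate_at(mutated, 6)
```

Checking the wild-type residue before substituting catches off-by-one numbering errors, which are easy to make when a construct carries an added tag or start codon.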
Example 2 Screening and Optimization of DNA-Specific Binding Proteins
[0052] The AID protein can induce high mutation rates at specific loci in vivo because of a large number of cofactors and complex temporal control mechanisms. If the AID protein is simply overexpressed, on the one hand the mutation rate cannot reach a high level, and on the other hand mutations will be generated randomly at all positions of the genome, creating a mutation burden. Therefore, the "targeted recording" system requires the assistance of DNA-specific binding proteins. The "mutator" is pulled by a DNA-specific binding protein to target specific regions in the genome. In the following description, the "DNA-specific binding protein" is referred to as the "targeter" for short.
[0053] Currently, widely used DNA-specific binding proteins can be classified into three categories: zinc finger proteins, TALEN proteins, and CRISPR/Cas proteins. Zinc finger proteins have short recognition sequences and poor scalability: each protein domain generally recognizes only 3 specific bases (a nucleotide triplet), so multiple protein domains need to be linked in series to recognize a specific sequence. Each structural unit of a TALEN protein specifically recognizes a single base, so any sequence can be recognized by combining structural units. However, the DNA encoding TALENs contains a large number of repeated sequences, which are prone to recombination and are therefore not conducive to the construction of stable transgenic lines. CRISPR/Cas is currently the most widely accepted technology. It relies on a specific guide RNA (gRNA) to bind specific DNA sequences and is convenient to modify. However, as a targeting protein it also has several shortcomings: the protein is large (~160 kDa) and, as a binding protein, produces steric hindrance that affects the function of the linked group; the long gene sequence (~4 kb) restricts transgenic application in some species; and the strong binding ability of the protein reduces the possibility of single-strand opening, resulting in a shortened window for effective mutation by the deaminase.
[0054] "Homing endonucleases" are meganucleases that recognize DNA sequences of 14-40 bp in length, which are very rare in the genome. By exogenously expressing a homing endonuclease, homology directed repair can be specifically activated. I-SceI is a homing endonuclease found in yeast. Its recognition sequence (5'-TAGGGATAACAGGGTAAT-3', SEQ ID NO: 6) is 18 bp long and has extremely high specificity. Its peptide chain is 234 aa long, so its spatial structure is small. The cleavage active site of I-SceI can be mutated (D44N and D145A) to remove the cleavage activity of the protein while retaining its DNA binding ability. Therefore, inactivated I-SceI (denoted as: d-iSceI) can be used as a specific DNA binding protein, and its nucleotide sequence is shown in SEQ ID NO:2. I-SceI was accordingly selected for subsequent experiments.
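The rarity of an 18 bp recognition site can be checked with simple arithmetic. The Python sketch below assumes a uniformly random genome with equal base frequencies, which real genomes are not, and uses a hypothetical round genome size of 3×10^9 bp; `expected_random_hits` is an illustrative name.

```python
def expected_random_hits(site_length: int, genome_size_bp: float,
                         both_strands: bool = True) -> float:
    """Expected chance occurrences of one fixed site in a random genome (equal base frequencies)."""
    positions = genome_size_bp - site_length + 1
    per_strand = positions / 4 ** site_length
    return per_strand * (2 if both_strands else 1)


possible_18mers = 4 ** 18                       # number of distinct 18-mers
hits_18bp = expected_random_hits(18, 3e9)       # hypothetical 3 Gb genome
hits_6bp = expected_random_hits(6, 3e9)         # a 6 bp site, for contrast
```

Under these assumptions an 18 bp site is expected far less than once per genome by chance, whereas a 6 bp site would occur more than a million times.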
Example 3 Preparation of High-Efficiency Base Editor (HBE) (Yeast System)
[0055] Through the screening and optimization of the "mutator" and the "targeter", the two core parts of the "recording protein" in the target recording system were initially obtained. A flexible peptide chain was used to connect the two parts to obtain the prototype of the recording protein. Then, by optimizing the fusion protein as a whole and adding other enhancing elements, the "High-efficiency Base Editor (HBE)" was obtained.
[0056] Further optimization and screening of the AID protein: Using the AID5 protein of Example 1 as a template, a mutation library of the AID protein was obtained by error-prone PCR. In the final library, each molecule contained about 4 base substitution mutations. After gel recovery, the AID library was constructed into the pGA induction vector with a library size of about 10^5. The plasmid library was transformed into the GIL104 yeast strain by the lithium acetate transformation method, and positive clones were screened on SC-Leu-/GLU+ agar plates. About 3000 yeast single clones were randomly picked and inoculated into SC-Leu-/GAL+ liquid medium to induce the expression of the AID protein. Each single clone was independently induced in a 96-well plate, and the pGA-AID5 strain was set as the control group. The TLC experiment was performed as described above.
[0057] According to the results of the TLC experiment, the mutants with a higher number of resistant clones than the control group (AID5) were selected, and the sequences of the AID mutants were amplified by yeast colony PCR for Sanger sequencing. Since the library screening result comes from a single experimental replicate, the estimation of mutation efficiency is affected by fluctuation. Therefore, the amplified genes were re-cloned into the pGA vector for a second round of TLC screening. In the second round of screening, 20 parallel induction experiment groups were set up for each clone. Finally, the fluctuation analysis of the TLC results in the second round of screening identified a mutant with a slight increase in mutation efficiency (FIG. 6a and FIG. 6b). According to the Sanger sequencing results, this variant carries five newly obtained missense mutations which, together with the original five mutations, give a total of 10 amino acid changes (FIG. 7). This variant is named AID10. Further analysis of the spatial location of the mutation positions of AID10 shows that S38, H130, V152, and R174 are all located on the surface of the protein, and the residue corresponding to T100 is located inside the 3D structure of the protein. Its nucleotide sequence is shown in SEQ ID NO:1.
[0058] Further optimization and screening of DNA-specific binding proteins: In order to screen for better binding proteins, other proteins in the homing endonuclease family were also inactivated at the cleavage site. After being linked to the AID10 protein, they were tested for DNA binding capability. The test strain was GIL104 yeast with the corresponding binding sequence integrated downstream of the CAN gene. The TLC experiment and fluctuation analysis were carried out as described above.
[0059] The results of the fluctuation analysis (FIG. 8) show that good mutation efficiency is obtained with each of the different homing endonucleases as the binding domain. This means that several binding domains and binding sequences can be used orthogonally in subsequent applications, thereby reducing the repetitiveness of the marker sequence and increasing the density of mutations. Among them, the iSceI protein has a slight advantage, so all subsequent experiments use iSceI as the binding element in HBE.
[0060] Nuclear localization of the protein is the prerequisite for HBE to act on DNA. The N-terminus of the AID protein carries a human-derived nuclear localization signal, but in cross-species applications the human-derived localization signal may be relatively inferior. Therefore, in downstream applications, the SV40 nuclear localization signal is added to the C-terminus of the HBE protein. The SV40 localization signal is derived from the large T antigen of the virus SV40 (sequence: PKKKRKV, SEQ ID NO: 8) and has been verified to have a good localization effect in multiple species.
[0061] After cytosine is deaminated to uracil, uracil-DNA glycosylase (Ung) in vivo can recognize the abnormality in the DNA and repair it. Uracil glycosylase inhibitor (Ugi) is found in the Bacillus subtilis bacteriophage PBS2. This protein can form a complex with the Ung protein, thereby weakening the cell's ability to repair deamination-induced mutations. Therefore, adding the UGI protein domain to the HBE protein improves the efficiency of deaminase mutagenesis. The final structure of the HBE protein applied to the lineage is shown in FIGS. 9a and 9b. In this example, the HBE protein suitable for the yeast system was finally obtained, and its nucleotide sequence is shown in SEQ ID NO:3.
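The application does not state which estimator was used for the fluctuation analysis. As one hedged illustration, the classical p0 method of Luria and Delbrück estimates the mean number of mutation events per culture from the fraction of parallel cultures that yield no resistant clones. In the Python sketch below, `p0_mutation_rate` is a hypothetical name, the colony counts are invented, and the culture size of 8×10^6 cells is borrowed from the earlier TLC description as an assumption.

```python
import math


def p0_mutation_rate(colony_counts, cells_per_culture: float) -> float:
    """p0 method: m = ln(cultures / cultures_without_mutants); rate = m / cells per culture."""
    n_cultures = len(colony_counts)
    n_zero = sum(1 for count in colony_counts if count == 0)
    if n_zero == 0:
        # With no mutant-free cultures, p0 is 0 and the estimator is undefined.
        raise ValueError("p0 method needs at least one culture with zero resistant clones")
    mutations_per_culture = math.log(n_cultures / n_zero)
    return mutations_per_culture / cells_per_culture


# 20 hypothetical parallel cultures: 10 without and 10 with resistant clones.
counts = [0] * 10 + [1, 2, 1, 3, 1, 1, 5, 2, 1, 1]
rate = p0_mutation_rate(counts, cells_per_culture=8e6)
```

The p0 method only works when some cultures contain no mutants. For a highly active mutator every culture may contain resistant clones, in which case a maximum-likelihood or median-based estimator would be needed instead.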
Example 4 Preparation of HBE Protein in Drosophila
[0062] The HBE protein in Drosophila is expressed in a temporally and spatially controllable manner under the control of the GAL4-UAS system (FIG. 10). From the N-terminus to the C-terminus, the elements are the hsp70 promoter, AID10, XTEN protein linker, d-iSceI, 6×(GGGGS) (SEQ ID NO: 12) protein linker, UGI and SV40 nuclear localization signal. Gene expression is driven by the upstream UAS sequence (5×GAL4 binding sites) and terminated by the downstream SV40 polyA signal. Its nucleotide sequence is shown in SEQ ID NO:4.
Example 5 Preparation of HBE Protein in Zebrafish and Mice
[0063] The structure of the zebrafish HBE protein is the same as that in Drosophila (FIG. 11a). From the N-terminus to the C-terminus, the elements are AID10, XTEN protein linker, d-iSceI, 6×(GGGGS) (SEQ ID NO: 12) protein linker, UGI and SV40 nuclear localization signal. The HBE protein is linked to the mCherry protein through the self-cleaving FMDV 2A peptide. HBE and mCherry are co-transcribed and co-translated, and the two are separated by cleavage as the protein matures. Therefore, the brightness of mCherry can be used to characterize the concentration of the HBE protein. Gene expression is driven by the upstream UAS sequence (5×GAL4 binding sites) and terminated by the downstream SV40 polyA signal. The Gal4-VP16 protein (FIG. 11b) can specifically bind to the UAS sequence and induce gene expression. Similarly, the Gal4-VP16 protein is linked to EGFP through FMDV 2A, and the brightness of EGFP represents the expression level of Gal4-VP16. The nucleotide sequence of the HBE protein is shown in SEQ ID NO:5.
Example 6 Design and Optimization of the Target Hyper Mutation Fragment (HMF)
[0064] After the targeted recording protein (HBE) was obtained through manual design and optimization, the problem of the targeting fragment also needed to be solved. On the one hand, since the i-SceI recognition sequence is fixed and does not exist in the genomes of higher model organisms, it is necessary to introduce an exogenous targeting fragment.
[0065] However, the length of the foreign targeting fragment is bound to be limited. On the other hand, because deaminase-induced mutations are sequence-biased, it is necessary to design target mutation hotspots.
[0066] Therefore, in this embodiment, a target fragment suitable for higher model organisms was designed. The design and optimization process is shown in FIG. 12 and includes three steps:
[0067] 1) According to the known mutation loci, train the scoring algorithm, design and screen out the "target Hyper Mutation Fragments" (abbreviated as HMF). (in silico)
[0068] 2) Import the sequence into the system expressing HBE protein, and analyze the mutation rate of the target fragments through high-throughput sequencing data. (yeast)
[0069] 3) Design the target fragment according to the analysis result. (in silico)
[0070] The final target hyper mutation fragments are as follows:
[0071] Forward iSceI binding motif: 5' TAGGGATAACAGGGTAAT3' (SEQ ID NO:6);
[0072] Reverse iSceI binding motif: 5' ATTACCCTGTTATCCCTA3' (SEQ ID NO:7).
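As a quick consistency check, the two motifs above are reverse complements of each other, so one iSceI site reads as SEQ ID NO:6 on one strand and as SEQ ID NO:7 on the other. A minimal Python sketch of that check follows; the function name `reverse_complement` is illustrative and does not come from the application.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA sequence."""
    pairs = {"A": "T", "T": "A", "G": "C", "C": "G"}
    return "".join(pairs[base] for base in reversed(seq))


FORWARD_MOTIF = "TAGGGATAACAGGGTAAT"  # SEQ ID NO:6
REVERSE_MOTIF = "ATTACCCTGTTATCCCTA"  # SEQ ID NO:7

# The forward motif read on the opposite strand gives the reverse motif.
print(reverse_complement(FORWARD_MOTIF) == REVERSE_MOTIF)  # True
```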
[0073] The above target Hyper Mutation Fragments were used in yeast to test the targeting ability, and the results are as follows:
TABLE-US-00001
>HMF1 (368 bp) (SEQ ID NO: 9)
CAGGTGGGTAAGCAAACTGGTTCCAATGCTGGCACCTAGGCTTGCCAGCA
TGCTTAGGTAGGTTGGTGCCCAGGTGAGCTTAGGAACTAGCTTGCCAACT
AGCCTGCTGGTACACCTGTGCCTGCTAGCATGCCGGTTAGTACCCAGGTA
AGCCTACCAGTTAGCTATTACCCTGTTATCCCTATACGTAGGGATAACAG
GGTAATAGCTAGTAGGCTTACTAACTTACTAACCGGTTTACTCCAATGCC
AGCCAGCCTAGGAGTTTGCCTACTAGCTTGCTAGTAGGTTCAGGTGAGCT
AGCTAACCAGCAAGTTGGTATACCAACCAGTTAGTAAGCATGCTGGTAAG
CCAGTAAACCTGCTGGCT
>HMF2 (752 bp) (SEQ ID NO: 10)
TACTCCAATAGGCCAAGGCATTGGCCTACAGGTGGGCTAGCAAGCAAGCC
TACTCAGGTGAGCTAGCTTACCTACTAGCTGGCTAACCAGCTAGCAAACC
AGCAGGTAAGTTCACCTGGGCATAGGTACTGGTACAGGTGTAGGAACCAA
CTGGCAGGTAGGTAGGTAATTACCCTGTTATCCCTATCAGTAGGGATAAC
AGGGTAATAGCAAACCGGTTAGTTTACCTAGGTGCCCACCTGAGCACCTA
AGCTCAGGTGAGCCGGCTAGCTAGCTGGTTTACCTTGGAGCTTGCCTACT
CAGGTGAGCCTGCCAACCTACTAGCCAGTTGGTTTGCCGGTAGGTTAACC
AGTTGGCAAGCCTGCCCACCTGCAGGTGCAATTCCAGGTGGGTAAGCAAA
CTGGTTCCAATGCTGGCACCTAGGCTTGCCAGCATGCTTAGGTAGGTTGG
TGCCCAGGTGAGCTTAGGAACTAGCTTGCCAACTAGCCTGCTGGTACACC
TGTGCCTGCTAGCATGCCGGTTAGTACCCAGGTAAGCCTACCAGTTAGCT
ATTACCCTGTTATCCCTATACGTAGGGATAACAGGGTAATAGCTAGTAGG
CTTACTAACTTACTAACCGGTTTACTCCAATGCCAGCCAGCCTAGGAGTT
TGCCTACTAGCTTGCTAGTAGGTTCAGGTGAGCTAGCTAACCAGCAAGTT
GGTATACCAACCAGTTAGTAAGCATGCTGGTAAGCCAGTAAACCTGCTGG
CT
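The iSceI binding motifs are embedded inside the HMF sequences, and they can be located with a simple string scan. The sketch below is illustrative only: `find_motifs` and the "+"/"-" strand labels are hypothetical names, and `EXCERPT` is a 48 bp stretch copied from the middle of HMF1 rather than the full fragment.

```python
FORWARD_MOTIF = "TAGGGATAACAGGGTAAT"  # SEQ ID NO:6
REVERSE_MOTIF = "ATTACCCTGTTATCCCTA"  # SEQ ID NO:7


def find_motifs(seq: str) -> list:
    """Return sorted (0-based start, strand label) pairs for every motif hit."""
    hits = []
    for motif, strand in ((FORWARD_MOTIF, "+"), (REVERSE_MOTIF, "-")):
        start = seq.find(motif)
        while start != -1:
            hits.append((start, strand))
            start = seq.find(motif, start + 1)
    return sorted(hits)


# 48 bp excerpt from HMF1: a reverse motif, a 4 bp spacer, then a forward motif.
EXCERPT = "AGCTATTACCCTGTTATCCCTATACGTAGGGATAACAGGGTAATAGCT"

print(find_motifs(EXCERPT))  # [(4, '-'), (26, '+')]
```

Running the same function on a full HMF sequence (with spaces and line breaks removed) would list every binding site and its orientation.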
[0074] The above target Hyper Mutation Fragments were used in Drosophila, zebrafish or mice to test the targeting ability, and the results are as follows:
TABLE-US-00002 >HMF3k (2940 bp) (SEQ ID NO: 11) AGCTTACTAACCAGCCAACTAGCTGGCTAGCAGGTAAACCTGCCAGCCTGC CGGCTCAGGTGAGCCAGTTAGTAGGCAAGTAAGCTCACCTGTAGGGGCTTTGGAGC AGGTATTGGAGTACAGGTGTAGGTTGGAGTTAGCCAGTAGGTTCACCTGATTACCC TGTTATCCCTACAGGTGAGCAGGCTAGCAAGTAGGTTCCAATGCCGGCTGGTAAGC ATACCAACTCCAAAGTTCACCTGCAGGTGTAGGTACCTAGGCACCTGCACCTGGGCA TAGGTGCTCCTAAGCTAGCAAACCGGTACCTATACTCAGGTGAGCTAGCAAGCTCAG GTGTAGGGATAACAGGGTAATAGCTAACCTACTAGTTGGCTAACCCCAACCAATA CTTAGGAGCTGGCAGGCTAGTTTACTAGCTCAGGTGCAGGTGAGTAAGTACACCTGT GCCAGTAAGCACCTAAGCCAACCAGCCCAGGTGAGCCAACTTGCTGGCAAACCTAC TGGTATACCATTACCCTGTTATCCCTAAGCTGGTAAGCTTACCCCTATACTCACCTG TGCCAGCCCAGGTGAGCAAGTTGGTATACCCACCTGCAGGTGAGTAGGCTAGTAAG CTAGCTAGTATGCTAGCTGGTTAGTTTGCCGGCTGGCTCCAAAACTAGTTGGTTGGC TCAGGTGTGCCGGTTTAGGGATAACAGGGTAATTGCTCCTACAGGTGAGTAGGCTT ACCAGCTCAGGTGAGCAAGCTTGCTCCAATAGGTAGGTTGGAGCATGCCAGTTAGCT TTGGAGCTCAGGTGAGTTTGCCAGTAGGTAAACTAGTATACTTGCTAGCTGGCAAGC CGGTTAGTAGGCTCCTAATTACCCTGTTATCCCTACCAAAACCTGCCCCTAAGCTA GTATAGGAGCCGGTTAGCCAACCAGTACCAACCTAAGCACACCTGAGCTAGCAAAC TAGTACCTATACTTGCCAGCAGGCTAGCTTACCAGTAAGTAGGCACAGGTGTGCCCC TAAGCCAGCTGGCAAGCTTAGGGATAACAGGGTAATGGCTGGCTTGCCAGCAGGT TTACCAACTAACCTAGGAACCAACTAACTTGCTCCAAAGCAAGCAAACTCACCTGG GCATGCCCCTAAGCTAGTAAACCCAGGTGAGCAGGTAGGTAAGTTTACCAGCCAAC TTACCCAGGTGAACCAGTTCACCTGATTACCCTGTTATCCCTATGCTAGCATACTT GCTTGCCGGCATGCTTGCTAGTACCAAAACTAGCTGGTTGGCACAGGTGGGCTTGCT TAGGCACCTGAGCAGGCAGGCTAGTACCTAAGCCAACCGGCAAGTAAGTTAGTAGG CTCCAAAGTTCAGGTGTTGGAGTTAACTTAGGGATAACAGGGTAATAGTAGGTAG GTTAGCTGGTTAGTAAGCTTGCCTTGGAGCTTGCTAGTTTGCTAGTTTACCAACTAAC CGGCAAGTTAACTTTGGCACCTGTTGGTAGGCCTAAGCTTGCCAGCCCACCTGAACC TGCCCAGGTGGGCACACCTGAGTATGCCTTGGATTACCCTGTTATCCCTAAGCACA CCTGAGCAAGCTAGTACAGGTGCACCTGCAGGTGCCTACACCTGGGTAGGCTAACT CACCTGTGCCTGCCTGCTGGCACACCTGAACTGGTTGGCACCTATGCCAGCTTGCCA ACCGGCTTAGGTAGGTACCAGCCGGTATACTAGCTAACTAACCTAGGGATAACAGG GTAATCACCTGAGTAAACCCCTAGGTAAGTACAGGTGTACCAGCTGGTTGGTTCCA ACCTAAGCTTTGGTTGGTGCCGGCTGGTTTACCGGTATACTCCAACACCTGAGCTGG TACCTAGGCTTACTCACCTGCAGGTGGGCTGGTACCTATGCCAACCAACCATTACCC 
TGTTATCCCTACACCTGTTGGAGCTTTGGCACCTGAGCACACCTGGGCTGGCATGCT TAGGCACCTGGGTAGGCTTAGGCAGGTGAGCAGGCTAGCTGGTAGGTTAGCCGGTA CACCTGAGTTTACTCAGGTGCCTAAGCTGGTTTAGGAGCTGGTATAGGGGCATTGGA GCATAGGGATAACAGGGTAATGGCTGGCAGGTTAACCAACTAACCAACTCCTAAG CCGGTAGGCTAGCTAGCATACCTGCTAGCCCCAACACCTGTACCAGCAGGCAAGCT GGCTCCTAAACTAGTACAGGTGAACCTGCCGGCTAGCTAGCTTAGGGGCTAGCCAGT AGGTTATTACCCTGTTATCCCTAAGCTAGCCTGCCAGCTCCTATGCTAGTTAGCAA GCTGGTAGGCTGGCTAGCCTGCCTACTTACCGGTTGGTAGGTAAACCCACCTGAGCA TGCCGGTATGCCTAGGGGCTTGCCTGCCAGCCAACCTAGGTGCTGGCACCTATGCCT ACTTAGGGATAACAGGGTAATAACTGGCTCCAACACCTGTACTAGCAAGCTTGCCA GCAAGTATAGGCACCTGAGCTAACTAGCTTAGGAACCCACCTGGGCATAGGAACCA GCTAGTTAGCTCCAAAGCTAACCCCTAGGTTGGTTTGCCAGCACACCTGTACTTACC CACCTGTACTATTACCCTGTTATCCCTAAGTTAACTCCTAAGCCCACCTGTACCAA CCAGTAGGCATTGGAGTTGGCTGGTACCTAGGCTGGCTAGCCAGCTGGTAAGCAAG CAAGTTTACCCAGGTGGGCTCCTACAGGTGAGCTCCTAAGCTCACCTGGGTACCAAG GCTGGCAAGCAAGCCTAGGGATAACAGGGTAATAGCTGGCTAGTTGGTAGGCTAG CTTAGGGGCTGGCTAACCAGCAGGTAAGTAAGCACCAAAGCAGGTTGGTAAACCTT GGCAGGTGAGTTGGCTAGCTTTGGAACTAGCCAGTTTACCTAGGAACTAGTTCCTAA GCTAGTAGGTTAGTA
Example 7 Verification of the Mutation Effect Induced by the Single-Base Locus-Directed Editing System Constructed by the Present Invention
[0075] (I) Strain Construction of tet-SLOTH Mice
[0076] The mouse HBE protein (mHBE) was codon-optimized and an HA protein tag was added. From the N-terminus to the C-terminus, it comprises AID10, a 3×(GGGGS) (SEQ ID NO: 12) linker, d-iSceI, a 10-aa GS-rich linker, the HA protein tag and the SV40 nuclear localization signal. mHBE is placed between the Tet promoter and the β-globin polyA termination signal. The HMF3k tag sequence is flanked at both ends by specific amplification sequences that allow its specific amplification from the mouse genome, and is placed downstream of the mHBE expression cassette. Together they form the tet-SLOTH system. The system was integrated by Cas9-mediated transgenesis into the mouse H11 (Hipp11) locus between the Eif4enif1 and Drg1 genes. The H11 locus is located on mouse chromosome 11 and has been confirmed to support stable and high-efficiency expression of foreign genes. The transgenic locus was identified by Southern blot and confirmed to be a single copy. The map of the tet-SLOTH system in the mouse H11 transgenic locus is shown in FIG. 13.
[0077] After tet-SLOTH mice are mated with rtTA strain mice (JAX ID: 006965), in tet-SLOTH+/rtTA+ progeny the stably expressed rtTA protein binds to the Tet promoter upstream of HBE in the presence of doxycycline (Dox) and induces the expression of HBE protein. By controlling the Dox intake of the mice, the SLOTH system can be switched on and off, as shown in FIG. 14.
[0078] (II) Strain Construction of Lox-SLOTH Mice
[0079] Similar to the tet-SLOTH strain, the lox-SLOTH system consists of two parts: mHBE and HMF3k. The mHBE gene was placed between the chicken β-actin promoter and the polyA termination signal, and a termination signal of 3 tandem repeats, flanked by LoxP recombination sites, was inserted between the promoter and the mHBE gene. The system was integrated as a single copy into the mouse Rosa26 locus. The Rosa26 locus is located on mouse chromosome 6 and is the most commonly used safe harbor for mouse transgenic systems. Foreign genes integrated into the Rosa26 locus can be stably expressed. The map of the lox-SLOTH system in the mouse Rosa26 transgenic locus is shown in FIG. 15. After lox-SLOTH mice were mated with Cre mice, the termination signal between the LoxP sites was removed by the Cre protein in the progeny, and the lox-SLOTH system was turned on, as shown in FIG. 16.
[0080] To verify whether the mutation-inducing system functions normally in mice, a tet-SLOTH transgenic mouse strain was selected and the following experiment was designed. Mice homozygous for the tet-SLOTH locus were mated with mice heterozygous for rtTA, giving progeny of two genotypes: tet-SLOTH+/rtTA+ and tet-SLOTH+/rtTA-. Starting 3 days before mating, the mother mice were given Dox (2 mg/day) in their feed, and the fetuses could absorb Dox through the placenta to activate the expression of HBE. Comparing the survival rates and phenotypes of the progeny of the two genotypes shows whether the system affects mouse development. In terms of the number of progeny (F1), there was no significant difference between tet-SLOTH+/rtTA+ and tet-SLOTH+/rtTA-, and no developmental abnormality was observed.
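The expected genotype split of this cross can be sketched by enumerating parental gametes. The Python example below is a hedged illustration, not part of the application: it assumes the tet-SLOTH parent carries no rtTA allele, the rtTA parent carries no tet-SLOTH allele, and the two loci assort independently. The names `LOCI` and `cross` are hypothetical.

```python
from collections import Counter
from itertools import product

LOCI = ("tet-SLOTH", "rtTA")  # order used when labeling progeny


def cross(parent_a: dict, parent_b: dict) -> dict:
    """Return expected progeny fractions, labeling each locus + or -.

    Each parent maps a locus name to a two-character string of alleles,
    where "+" is the transgene and "-" is its absence (assumed notation).
    """
    gametes_a = list(product(*(parent_a[locus] for locus in LOCI)))
    gametes_b = list(product(*(parent_b[locus] for locus in LOCI)))
    counts = Counter()
    for ga, gb in product(gametes_a, gametes_b):
        # A locus is scored "+" if at least one inherited allele is the transgene.
        label = "/".join(
            f"{locus}{'+' if '+' in (a, b) else '-'}"
            for locus, a, b in zip(LOCI, ga, gb)
        )
        counts[label] += 1
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


sloth_parent = {"tet-SLOTH": "++", "rtTA": "--"}  # homozygous tet-SLOTH (assumed rtTA-free)
rtta_parent = {"tet-SLOTH": "--", "rtTA": "+-"}   # heterozygous rtTA (assumed SLOTH-free)

print(cross(sloth_parent, rtta_parent))
# {'tet-SLOTH+/rtTA+': 0.5, 'tet-SLOTH+/rtTA-': 0.5}
```

Under these assumptions every F1 animal carries tet-SLOTH and half carry rtTA, which is why equal progeny numbers in the two groups are the baseline expectation when HBE expression has no effect on survival.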
[0081] Two time points, E14 and P1, were selected, and after tet-SLOTH+/rtTA+ individuals were identified by PCR, whole-transcriptome sequencing was performed on each. First, the hind limbs of the mouse were isolated for RNA extraction; then 1 μg of total RNA from each sample was used to construct a transcriptome sequencing library; finally, the Illumina NovaSeq platform was used for high-throughput sequencing. The data volume of a single sample was greater than 8 Gb, with paired-end 150 bp reads (PE150). The analysis steps of the sequencing data are as follows: 1) Use fastp for quality control of the sequencing data, and filter out the sequences whose average quality is lower than Q35. 2) Using the map of the locus (FIG. 13) and its sequence, on the basis of the mouse reference genome (Mus musculus, GRCm38), create an index and annotation file for sequence alignment. 3) Align the RNA-seq data with STAR. 4) Calculate the effective sequencing read count and gene length of each gene (Subread/featureCounts). As shown in FIGS. 17a-17d, at different developmental time points, Dox can induce the expression of the HBE gene in tet-SLOTH transgenic mice. The expression level of HBE is relatively high and remains stable during mouse development. This means that, unlike previous work, the labeling efficiency of the SLOTH system is constant and can better reconstruct the timing of events in the cell development history.
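Step 4 yields a read count and a gene length for each gene. The application does not state how these were combined into an expression level; one common normalization is transcripts per million (TPM), sketched below in Python as an assumption rather than as the method actually used. The function name `tpm` and the example gene names and numbers are made up for illustration.

```python
def tpm(counts: dict, lengths_bp: dict) -> dict:
    """Convert per-gene read counts to TPM using gene lengths in base pairs."""
    # Reads per kilobase of gene length.
    rates = {gene: counts[gene] / (lengths_bp[gene] / 1000) for gene in counts}
    total = sum(rates.values())
    if total == 0:
        return {gene: 0.0 for gene in counts}
    # Scale so that all values sum to one million.
    return {gene: rate / total * 1e6 for gene, rate in rates.items()}


# Toy input (not data from the application).
counts = {"HBE": 100, "GeneA": 200}
lengths_bp = {"HBE": 1000, "GeneA": 2000}

print(tpm(counts, lengths_bp))  # {'HBE': 500000.0, 'GeneA': 500000.0}
```

Length normalization matters here because a longer gene collects more reads at the same expression level; in the toy input, GeneA has twice the reads but also twice the length, so the two genes come out equal.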
[0082] To compare the mutations on single molecules of HMF3k, the HMF3k sequence was amplified from different organ types of each individual mouse, cloned with a TOPO cloning kit, and single clones were randomly selected for Sanger sequencing. As can be seen from FIGS. 18a-18c, almost no mutations were observed in the HMF3k marker sequence in the HBE- control group, whereas in the HBE+ experimental group there were on average 4.3 mutations per marker sequence. This shows that the system functions normally in mice and successfully induces mutations.
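The per-clone mutation average can be illustrated with a short Python sketch. It assumes that each Sanger read has already been aligned to the reference at equal length and that only substitutions are counted; the application does not describe the counting procedure, so this is one plausible reading. The function names and the toy sequences are hypothetical.

```python
def count_substitutions(reference: str, clone: str) -> int:
    """Count positions where a pre-aligned clone differs from the reference."""
    if len(reference) != len(clone):
        raise ValueError("sequences must be pre-aligned to equal length")
    return sum(1 for ref_base, clone_base in zip(reference, clone)
               if ref_base != clone_base)


def mean_mutations(reference: str, clones: list) -> float:
    """Average number of substitutions per sequenced clone."""
    return sum(count_substitutions(reference, c) for c in clones) / len(clones)


# Toy 12 bp reference and three clones (not data from the application).
reference = "AGCTAGCTGGCA"
clones = ["AGTTAGCTGGCA",   # one C-to-T change
          "AGCTAGTTGGTA",   # two C-to-T changes
          "AGCTAGCTGGCA"]   # unmutated

print(mean_mutations(reference, clones))  # 1.0
```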
[0083] The above-mentioned embodiments represent only several embodiments of the present invention. Although their descriptions are specific and detailed, they should not be understood as limiting the scope of the present invention. It should be pointed out that those of ordinary skill in the art can make several modifications and improvements without departing from the concept of the present invention, and these all fall within the protection scope of the present invention. Therefore, the protection scope of the present invention should be subject to the appended claims.
Sequence CWU
1
1
121630DNAArtificial SequenceSynthetic 1aagcttatgc ccaagaaaaa gcgcaaggtg
gacagcctct tgatgaaccg gaggaagttt 60ctttaccaat tcaaaaatgt ccgctgggct
aagggtcggc gtgagaccta cctgtgctac 120gtagtgaaga ggcgtgacag tgctacatcc
ttttcactgg actttggtta tcttcgcaat 180aagaacggct gccacgtgga attgctcttc
ctccgctaca tctcggactg ggacctagac 240cctggccgct gctaccgcgt cacctggttc
acctcctgga gcccctgcta cgactgtgcc 300cgacatgtgg ccgactttct gcgagggaac
cccaacctca gtctgaggat cttcaccgcg 360cgcctctact tctgtgagga ccgcaaggct
gagcccgagg ggctgcggcg gctgcaccgc 420gccggggtgc aaatagccat catgaccttc
aaagattatt tttactgctg gaatactttt 480gtagaaaacc acgaaagaac tttcaaagcc
tgggaagggc tgcatgaaaa ttcagttcgt 540ctctccagac agcttcggcg catccttttg
cccctgtatg aggttgatga cttacgagac 600gcatttcgta ctttgggact ttaatctaga
6302717DNAArtificial SequenceSynthetic
2ggatccaaaa acatcaaaaa aaaccaggta atgaacctgg gtccgaactc taaactgctg
60aaagaataca aatcccagct gatcgaactg aacatcgaac agttcgaagc aggtatcggt
120ctgatcctgg gtaatgctta catccgttct cgtgatgaag gtaaaaccta ctgtatgcag
180ttcgagtgga aaaacaaagc atacatggac cacgtatgtc tgctgtacga tcagtgggta
240ctgtccccgc cgcacaaaaa agaacgtgtt aaccacctgg gtaacctggt aatcacctgg
300ggcgcccaga ctttcaaaca ccaagctttc aacaaactgg ctaacctgtt catcgttaac
360aacaaaaaaa ccatcccgaa caacctggtt gaaaactacc tgaccccgat gtctctggca
420tactggttca tggatgctgg tggtaaatgg gattacaaca aaaactctac caacaaatcg
480atcgtactga acacccagtc tttcactttc gaagaagtag aatacctggt taagggtctg
540cgtaacaaat tccaactgaa ctgttacgta aaaatcaaca aaaacaaacc gatcatctac
600atcgattcta tgtcttacct gatcttctac aacctgatca aaccgtacct gatcccgcag
660atgatgtaca aactgccgaa cactatctcc tccgaaactt tcctgaaatg agaattc
71731293DNAArtificial SequenceSynthetic 3atggacagcc tcttgatgaa ccggagggag
tttctttacc aattcaaaaa tgtccgctgg 60gctaagggtc ggcgtgagac ctacctgtgc
tacgtagtgg agaggcgtga cagtgctaca 120tccttttcac tggactttgg ttatcttcgc
aataagaacg gctgccacgt ggaattgctc 180ttcctccgct acatctcgga ctgggaccta
gaccctggcc gctgctaccg cgtcacctgg 240ttcatctcct ggagcccctg ctacgactgt
gcccgacatg tggccgactt tctgcgaggg 300aaccccaacc tcagtctgag gatcttcacc
gcgcgcctct acttctgtga ggaccgcaag 360gctgagcccg aggggctgcg gcggctgcac
cgcgccgggg tgcaaatagc catcatgacc 420ttcaaagatt atttttactg ctggaatact
tttgtagaaa accatggaag aactttcaaa 480gcctgggaag ggctgcatga aaattcagtt
cgtctctcca gacagcttcg gcgcatcctt 540ggtggaggtg gttctggtgg tggaggttct
ggtggtggtg gatccatgaa aaacatcaaa 600aaaaaccagg taatgaacct gggtccgaac
tctaaactgc tgaaagaata caaatcccag 660ctgatcgaac tgaacatcga acagttcgaa
gcaggtatcg gtctgatcct gggtaatgct 720tacatccgtt ctcgtgatga aggtaaaacc
tactgtatgc agttcgagtg gaaaaacaaa 780gcatacatgg accacgtatg tctgctgtac
gatcagtggg tactgtcccc gccgcacaaa 840aaagaacgtg ttaaccacct gggtaacctg
gtaatcacct ggggcgccca gactttcaaa 900caccaagctt tcaacaaact ggctaacctg
ttcatcgtta acaacaaaaa aaccatcccg 960aacaacctgg ttgaaaacta cctgaccccg
atgtctctgg catactggtt catggatgct 1020ggtggtaaat gggattacaa caaaaactct
accaacaaat cgatcgtact gaacacccag 1080tctttcactt tcgaagaagt agaatacctg
gttaagggtc tgcgtaacaa attccaactg 1140aactgttacg taaaaatcaa caaaaacaaa
ccgatcatct acatcgattc tatgtcttac 1200ctgatcttct acaacctgat caaaccgtac
ctgatcccgc agatgatgta caaactgccg 1260aacactatct cctccgaaac tttcctgaaa
taa 129341661DNAArtificial
SequenceSynthetic 4atggacagcc tcttgatgaa ccggagggag tttctttacc aattcaaaaa
tgtccgctgg 60gctaagggtc ggcgtgagac ctacctgtgc tacgtagtgg agaggcgtga
ctgtgctaca 120tccttttcac tggactttgg ttatcttcgc aataagaacg gctgccacgt
ggaattgctc 180ttcctccgct acatctcgga ctgggaccta gaccctggcc gctgctaccg
cgtcacctgg 240ttcatctcct ggagcccctg ctacgactgt gcccgacatg tggccgactt
tctgcgaggg 300aaccccaacc tcagtctgag gatcttcgcc gcgcgcctct acttctgtga
ggaccgcaag 360gctgagcccg aggggctgcg gcggctgcgc cgcgccgggg tgcaaatagc
catcatgacc 420ttcaaagatt atttttactg ctggaatact tttgcagaaa accatggaag
aactttcaaa 480gcctgggaag ggctgcatga aaattcagtt cgtctctccg gacagcttcg
gcgcatcctt 540agcggcagcg agactcccgg gacctcagag tccgccacac ccgaaagtaa
aaacatcaaa 600aaaaaccagg taatgaacct gggtccgaac tctaaactgc tgaaagaata
caaatcccag 660ctgatcgaac tgaacatcga acagttcgaa gcaggtatcg gtctgatcct
gggtaatgct 720tacatccgtt ctcgtgatga aggtaaaacc tactgtatgc agttcgagtg
gaaaaacaaa 780gcatacatgg accacgtatg tctgctgtac gatcagtggg tactgtcccc
gccgcacaaa 840aaagaacgtg ttaaccacct gggtaacctg gtaatcacct ggggcgccca
gactttcaaa 900caccaagctt tcaacaaact ggctaacctg ttcatcgtta acaacaaaaa
aaccatcccg 960aacaacctgg ttgaaaacta cctgaccccg atgtctctgg catactggtt
catggatgtg 1020gtggtaaatg ggattacaac aaaaactcta ccaacaaatc gatcgtactg
aacacccagt 1080ctttcacttt cgaagaagta gaatacctgg ttaagggtct gcgtaacaaa
ttccaactga 1140actgttacgt aaaaatcaac aaaaacaaac cgatcatcta catcgattct
atgtcttacc 1200tgatcttcta caacctgatc aaaccgtacc tgatcccgca gatgatgtac
aaactgccga 1260acactatctc ctccgaaact ttcctgaaag gtggaggtgg ttctggtgga
ggtggttctg 1320gtggtggatc tggaggcggt gggtccggag gtggcggttc gggcggaggt
ggatccacta 1380acctgtccga catcatcgag aaggagactg gcaagcagct ggtgatccag
gagtctattc 1440tgatgctgcc agaggaggtg gaagaggtga tcggcaacaa gccagagtct
gatatcctgg 1500tgcacactgc ctacgacgag tccactgacg aaaacgtgat gctgctgact
tccgatgccc 1560cagaatacaa gccatgggcc ctggtgattc aggactccaa cggcgagaac
aagatcaaga 1620tgctgtctgg tggttctccc aagaagaaga ggaaagtcta a
166151638DNAArtificial SequenceSynthetic 5atggacagcc
tccttatgaa ccggcgagag ttcttgtatc aatttaaaaa cgttcgatgg 60gcaaagggac
ggcgggagac ttacctttgc tatgttgtgg agcggcgaga ttgcgccacc 120tctttctctc
ttgacttcgg ctatctccga aacaagaatg gatgtcacgt agaacttttg 180tttcttcggt
atataagtga ctgggacctt gatccaggac gatgctaccg cgttacctgg 240ttcatctcat
ggagcccctg ttatgactgc gccaggcatg ttgctgactt tctgagaggg 300aatccaaacc
tctccctccg cattttcgct gctaggctgt atttttgtga ggatcggaag 360gcagaaccag
agggtctcag gcgattgcgc cgggctggag tacaaatcgc tattatgaca 420tttaaggact
acttttattg ctggaacact ttcgctgaaa atcatggtag aacctttaaa 480gcctgggagg
ggcttcacga gaactcagtc cgattgtcag gtcaactcag gcgcatactg 540ggaggaggtg
gttccggcgg tgggggcagt ggcggaggtg gttctatgaa gaatatcaag 600aaaaatcagg
taatgaattt gggtcctaac agtaagttgc tcaaggaata caagtcccaa 660ctgattgagc
tgaacattga acaattcgaa gccggaattg gcttgatact cggcaatgct 720tatatcagga
gtagagatga agggaaaact tattgcatgc aattcgagtg gaaaaataag 780gcctatatgg
atcacgtgtg tctcctttat gaccaatggg tactgtcacc tccacataag 840aaagagaggg
ttaatcatct tggtaatctc gttatcacat ggggagcaca aactttcaaa 900catcaggcat
ttaacaaatt ggcaaacttg tttattgtga acaataaaaa gactataccc 960aacaatttgg
tcgagaacta tcttacccct atgtctttgg cctactggtt catggacgca 1020ggcggcaaat
gggattacaa caaaaatagt acaaacaaaa gtattgtact taacacacag 1080tcctttacat
tcgaagaggt agaatatttg gtcaaaggac ttaggaacaa gtttcaactg 1140aattgttacg
ttaaaataaa taaaaataag cctatcatat acatagactc tatgtcttac 1200ctgattttct
acaacttgat aaagccctac ctcattcccc aaatgatgta taaactccca 1260aatactattt
cttccgagac cttcctgaaa tctggtggtt ctggaggatc tggtggttct 1320actaatctgt
cagatattat tgaaaaggag accggtaagc aactggttat ccaggaatcc 1380atcctcatgc
tcccagagga ggtggaagaa gtcattggga acaagccgga aagcgatata 1440ctcgtgcaca
ccgcctacga cgagagcacc gacgagaatg tcatgcttct gactagcgac 1500gcccctgaat
acaagccttg ggctctggtc atacaggata gcaacggtga gaacaagatt 1560aagatgctct
ctggtggttc ttacccatac gatgttccag attacgctgc agctcccaag 1620aagaagagga
aagtctaa
1638618DNAArtificial SequenceSynthetic 6tagggataac agggtaat
18718DNAArtificial SequenceSynthetic
7attaccctgt tatcccta
1887PRTSimian virus 40 8Pro Lys Lys Lys Arg Lys Val1
59368DNAArtificial SequenceSynthetic 9caggtgggta agcaaactgg ttccaatgct
ggcacctagg cttgccagca tgcttaggta 60ggttggtgcc caggtgagct taggaactag
cttgccaact agcctgctgg tacacctgtg 120cctgctagca tgccggttag tacccaggta
agcctaccag ttagctatta ccctgttatc 180cctatacgta gggataacag ggtaatagct
agtaggctta ctaacttact aaccggttta 240ctccaatgcc agccagccta ggagtttgcc
tactagcttg ctagtaggtt caggtgagct 300agctaaccag caagttggta taccaaccag
ttagtaagca tgctggtaag ccagtaaacc 360tgctggct
36810752DNAArtificial SequenceSynthetic
10tactccaata ggccaaggca ttggcctaca ggtgggctag caagcaagcc tactcaggtg
60agctagctta cctactagct ggctaaccag ctagcaaacc agcaggtaag ttcacctggg
120cataggtact ggtacaggtg taggaaccaa ctggcaggta ggtaggtaat taccctgtta
180tccctatcag tagggataac agggtaatag caaaccggtt agtttaccta ggtgcccacc
240tgagcaccta agctcaggtg agccggctag ctagctggtt taccttggag cttgcctact
300caggtgagcc tgccaaccta ctagccagtt ggtttgccgg taggttaacc agttggcaag
360cctgcccacc tgcaggtgca attccaggtg ggtaagcaaa ctggttccaa tgctggcacc
420taggcttgcc agcatgctta ggtaggttgg tgcccaggtg agcttaggaa ctagcttgcc
480aactagcctg ctggtacacc tgtgcctgct agcatgccgg ttagtaccca ggtaagccta
540ccagttagct attaccctgt tatccctata cgtagggata acagggtaat agctagtagg
600cttactaact tactaaccgg tttactccaa tgccagccag cctaggagtt tgcctactag
660cttgctagta ggttcaggtg agctagctaa ccagcaagtt ggtataccaa ccagttagta
720agcatgctgg taagccagta aacctgctgg ct
752112940DNAArtificial SequenceSynthetic 11agcttactaa ccagccaact
agctggctag caggtaaacc tgccagcctg ccggctcagg 60tgagccagtt agtaggcaag
taagctcacc tgtaggggct ttggagcagg tattggagta 120caggtgtagg ttggagttag
ccagtaggtt cacctgatta ccctgttatc cctacaggtg 180agcaggctag caagtaggtt
ccaatgccgg ctggtaagca taccaactcc aaagttcacc 240tgcaggtgta ggtacctagg
cacctgcacc tgggcatagg tgctcctaag ctagcaaacc 300ggtacctata ctcaggtgag
ctagcaagct caggtgtagg gataacaggg taatagctaa 360cctactagtt ggctaacccc
aaccaatact taggagctgg caggctagtt tactagctca 420ggtgcaggtg agtaagtaca
cctgtgccag taagcaccta agccaaccag cccaggtgag 480ccaacttgct ggcaaaccta
ctggtatacc attaccctgt tatccctaag ctggtaagct 540tacccctata ctcacctgtg
ccagcccagg tgagcaagtt ggtataccca cctgcaggtg 600agtaggctag taagctagct
agtatgctag ctggttagtt tgccggctgg ctccaaaact 660agttggttgg ctcaggtgtg
ccggtttagg gataacaggg taattgctcc tacaggtgag 720taggcttacc agctcaggtg
agcaagcttg ctccaatagg taggttggag catgccagtt 780agctttggag ctcaggtgag
tttgccagta ggtaaactag tatacttgct agctggcaag 840ccggttagta ggctcctaat
taccctgtta tccctaccaa aacctgcccc taagctagta 900taggagccgg ttagccaacc
agtaccaacc taagcacacc tgagctagca aactagtacc 960tatacttgcc agcaggctag
cttaccagta agtaggcaca ggtgtgcccc taagccagct 1020ggcaagctta gggataacag
ggtaatggct ggcttgccag caggtttacc aactaaccta 1080ggaaccaact aacttgctcc
aaagcaagca aactcacctg ggcatgcccc taagctagta 1140aacccaggtg agcaggtagg
taagtttacc agccaactta cccaggtgaa ccagttcacc 1200tgattaccct gttatcccta
tgctagcata cttgcttgcc ggcatgcttg ctagtaccaa 1260aactagctgg ttggcacagg
tgggcttgct taggcacctg agcaggcagg ctagtaccta 1320agccaaccgg caagtaagtt
agtaggctcc aaagttcagg tgttggagtt aacttaggga 1380taacagggta atagtaggta
ggttagctgg ttagtaagct tgccttggag cttgctagtt 1440tgctagttta ccaactaacc
ggcaagttaa ctttggcacc tgttggtagg cctaagcttg 1500ccagcccacc tgaacctgcc
caggtgggca cacctgagta tgccttggat taccctgtta 1560tccctaagca cacctgagca
agctagtaca ggtgcacctg caggtgccta cacctgggta 1620ggctaactca cctgtgcctg
cctgctggca cacctgaact ggttggcacc tatgccagct 1680tgccaaccgg cttaggtagg
taccagccgg tatactagct aactaaccta gggataacag 1740ggtaatcacc tgagtaaacc
cctaggtaag tacaggtgta ccagctggtt ggttccaacc 1800taagctttgg ttggtgccgg
ctggtttacc ggtatactcc aacacctgag ctggtaccta 1860ggcttactca cctgcaggtg
ggctggtacc tatgccaacc aaccattacc ctgttatccc 1920tacacctgtt ggagctttgg
cacctgagca cacctgggct ggcatgctta ggcacctggg 1980taggcttagg caggtgagca
ggctagctgg taggttagcc ggtacacctg agtttactca 2040ggtgcctaag ctggtttagg
agctggtata ggggcattgg agcataggga taacagggta 2100atggctggca ggttaaccaa
ctaaccaact cctaagccgg taggctagct agcatacctg 2160ctagccccaa cacctgtacc
agcaggcaag ctggctccta aactagtaca ggtgaacctg 2220ccggctagct agcttagggg
ctagccagta ggttattacc ctgttatccc taagctagcc 2280tgccagctcc tatgctagtt
agcaagctgg taggctggct agcctgccta cttaccggtt 2340ggtaggtaaa cccacctgag
catgccggta tgcctagggg cttgcctgcc agccaaccta 2400ggtgctggca cctatgccta
cttagggata acagggtaat aactggctcc aacacctgta 2460ctagcaagct tgccagcaag
tataggcacc tgagctaact agcttaggaa cccacctggg 2520cataggaacc agctagttag
ctccaaagct aacccctagg ttggtttgcc agcacacctg 2580tacttaccca cctgtactat
taccctgtta tccctaagtt aactcctaag cccacctgta 2640ccaaccagta ggcattggag
ttggctggta cctaggctgg ctagccagct ggtaagcaag 2700caagtttacc caggtgggct
cctacaggtg agctcctaag ctcacctggg taccaaggct 2760ggcaagcaag cctagggata
acagggtaat agctggctag ttggtaggct agcttagggg 2820ctggctaacc agcaggtaag
taagcaccaa agcaggttgg taaaccttgg caggtgagtt 2880ggctagcttt ggaactagcc
agtttaccta ggaactagtt cctaagctag taggttagta 2940125PRTArtificial
SequenceSynthetic 12Gly Gly Gly Gly Ser1 5