Patent application title: DNA-BINDING PROTEIN USING PPR MOTIF, AND USE THEREOF
Inventors:
Takashi Yamamoto (Higashihiroshima-Shi, JP)
Tetsushi Sakuma (Higashihiroshima-Shi, JP)
Takahiro Nakamura (Fukuoka-Shi, JP)
Yusuke Yagi (Fukuoka-Shi, JP)
Yasuyuki Okawa (Fukuoka-Shi, JP)
Assignees:
KYUSHU UNIVERSITY, NATIONAL UNIVERSITY CORPORATION
HIROSHIMA UNIVERSITY
IPC8 Class: AC07K14415FI
Publication date: 2021-10-21
Patent application number: 20210324019
Abstract:
The object of the present invention is to find a PPR protein that binds
to DNA by analyzing PPR proteins that act to bind to DNA, on the
prediction that the RNA recognition rules of PPR motifs can also be used
for recognition of DNA. According to the present invention, it was
revealed that, with a protein that can bind in a DNA base-selective
manner or a DNA base sequence-specific manner, which contains one or
more, preferably 2 to 30, more preferably 5 to 25, most preferably 9 to
15, of PPR motifs having a structure of the following formula 1 (wherein,
in the formula 1, Helix A is a part that can form an α-helix
structure; X does not exist, or is a part consisting of 1 to 9 amino
acids; Helix B is a part that can form an α-helix structure; and L
is a part consisting of 2 to 7 amino acids), and having a specific
combination of amino acids corresponding to a DNA base or DNA base
sequence as amino acids of three positions of No. 1 A.A., No. 4 A.A., in
Helix A of the formula 1 and No. "ii" (-2) A.A. contained in L of the
formula 1, the aforementioned object could be achieved.
(Helix A)-X-(Helix B)-L (Formula 1)
Claims:
1. A method for modifying a genetic substance of a cell, the method
comprising: designing a DNA binding protein; determining a DNA base
sequence coding for an amino acid sequence of a fused protein comprising
a functional region and a DNA binding region consisting of a DNA-binding
protein; cloning said DNA base sequence; preparing a vector carrying said
DNA base sequence and a cell containing a DNA having at least 9 target
DNA bases as a target DNA base sequence; and introducing the vector into
the cell so that the DNA binding region of the fused protein binds to the
DNA having the at least 9 target DNA bases, and therefore the functional
region modifies the DNA having the at least 9 target DNA bases, wherein
the DNA-binding protein contains at least 9 PPR motifs for binding to the
at least 9 target DNA bases, respectively, and having a structure of the
following formula 1: (Helix A)-X-(Helix B)-L (Formula 1) (wherein, in
the formula 1: Helix A is a part that can form an α-helix
structure; X does not exist, or is a part consisting of 1 to 9 amino
acids; Helix B is a part that can form an α-helix structure; and L
is a part consisting of 2 to 7 amino acids), wherein, position numbers of
amino acids in the PPR motifs are the same as PF01535 in Pfam under the
following definitions: the first amino acid of Helix A is referred to as
Number 1 amino acid (Number 1 AA), the fourth amino acid as Number 4
amino acid (Number 4 AA), and when a next PPR motif (Mₙ₊₁)
contiguously exists on the C-terminus side of the PPR motif (Mₙ)
(when there is no amino acid insertion between the PPR motifs), the -2nd
amino acid counted from the end (C-terminus side) of the amino acids
constituting the PPR motif (Mₙ); when a non-PPR motif consisting of
1 to 20 amino acids exists between the PPR motif (Mₙ) and the next
PPR motif (Mₙ₊₁) on the C-terminus side, the amino acid located
upstream of the first amino acid of the next PPR motif (Mₙ₊₁) by 2
positions, i.e., the -2nd amino acid; or when any next PPR motif
(Mₙ₊₁) does not exist on the C-terminus side of the PPR motif
(Mₙ), or 21 or more amino acids constituting a non-PPR motif exist
between the PPR motif (Mₙ) and the next PPR motif (Mₙ₊₁) on the
C-terminus side, the 2nd amino acid counted from the end (C-terminus
side) of the amino acids constituting the PPR motif (Mₙ) is referred
to as Number "ii" (-2) amino acid (Number "ii" (-2) AA), each PPR motif
(Mₙ) contained in the DNA-binding protein is a PPR motif having a
specific combination of amino acids as the three amino acids of Number 1
AA, Number 4 AA, and Number "ii" (-2) AA, such that the DNA-binding
protein has at least 9 combinations of the three amino acids
corresponding to the respective target DNA bases of the target DNA base
sequence, and each combination of the three amino acids is determined
according to any one of the following definitions: (2-1) when the target
DNA base to which the PPR motif binds is G, the three amino acids, Number 1
AA, Number 4 AA, and Number "ii" (-2) AA, are an arbitrary amino acid,
glycine, and aspartic acid, respectively; (2-2) when the target DNA base
to which the PPR motif binds is G, the three amino acids, Number 1 AA,
Number 4 AA, and Number "ii" (-2) AA, are glutamic acid, glycine, and
aspartic acid, respectively; (2-3) when the target DNA base to which the
PPR motif binds is A, the three amino acids, Number 1 AA, Number 4 AA,
and Number "ii" (-2) AA, are an arbitrary amino acid, glycine, and
asparagine, respectively; (2-4) when the target DNA base to which the PPR
motif binds is A, the three amino acids, Number 1 AA, Number 4 AA, and
Number "ii" (-2) AA, are glutamic acid, glycine, and asparagine,
respectively; (2-5) when the target DNA base to which the PPR motif binds
is A, the three amino acids, Number 1 AA, Number 4 AA, and Number "ii"
(-2) AA, are an arbitrary amino acid, glycine, and serine, respectively;
(2-7) when the target DNA base to which the PPR motif binds is T, the
three amino acids, Number 1 AA, Number 4 AA, and Number "ii" (-2) AA, are
an arbitrary amino acid, isoleucine, and asparagine, respectively; (2-12)
when the target DNA base to which the PPR motif binds is T, the three
amino acids, Number 1 AA, Number 4 AA, and Number "ii" (-2) AA, are an
arbitrary amino acid, methionine, and aspartic acid, respectively; (2-13)
when the target DNA base to which the PPR motif binds is C, the three
amino acids, Number 1 AA, Number 4 AA, and Number "ii" (-2) AA, are
isoleucine, methionine, and aspartic acid, respectively; (2-15) when the
target DNA base to which the PPR motif binds is T, the three amino acids,
Number 1 AA, Number 4 AA, and Number "ii" (-2) AA, are an arbitrary amino
acid, asparagine, and aspartic acid, respectively; (2-16) when the target
DNA base to which the PPR motif binds is T, the three amino acids, Number
1 AA, Number 4 AA, and Number "ii" (-2) AA, are phenylalanine,
asparagine, and aspartic acid, respectively; (2-17) when the target DNA
base to which the PPR motif binds is T, the three amino acids, Number 1
AA, Number 4 AA, and Number "ii" (-2) AA, are glycine, asparagine, and
aspartic acid, respectively; (2-18) when the target DNA base to which the
PPR motif binds is T, the three amino acids, Number 1 AA, Number 4 AA,
and Number "ii" (-2) AA, are isoleucine, asparagine, and aspartic acid,
respectively; (2-19) when the target DNA base to which the PPR motif
binds is T, the three amino acids, Number 1 AA, Number 4 AA, and Number
"ii" (-2) AA, are threonine, asparagine, and aspartic acid, respectively;
(2-20) when the target DNA base to which the PPR motif binds is T, the
three amino acids, Number 1 AA, Number 4 AA, and Number "ii" (-2) AA are
valine, asparagine, and aspartic acid, respectively; (2-21) when the
target DNA base to which the PPR motif binds is T, the three amino acids,
Number 1 AA, Number 4 AA, and Number "ii" (-2) AA are tyrosine,
asparagine, and aspartic acid, respectively; (2-22) when the target DNA
base to which the PPR motif binds is C, the three amino acids, Number 1
AA, Number 4 AA, and Number "ii" (-2) AA, are an arbitrary amino acid,
asparagine, and asparagine, respectively; (2-23) when the target DNA base
to which the PPR motif binds is C, the three amino acids, Number 1 AA,
Number 4 AA, and Number "ii" (-2) AA, are isoleucine, asparagine, and
asparagine, respectively; (2-24) when the target DNA base to which the
PPR motif binds is C, the three amino acids, Number 1 AA, Number 4 AA,
and Number "ii" (-2) AA, are serine, asparagine, and asparagine,
respectively; (2-25) when the target DNA base to which the PPR motif
binds is C, the three amino acids, Number 1 AA, Number 4 AA, and Number
"ii" (-2) AA, are valine, asparagine, and asparagine, respectively;
(2-26) when the target DNA base to which the PPR motif binds is C, the
three amino acids, Number 1 AA, Number 4 AA, and Number "ii" (-2) AA, are
an arbitrary amino acid, asparagine, and serine, respectively; (2-27)
when the target DNA base to which the PPR motif binds is C, the three
amino acids, Number 1 AA, Number 4 AA, and Number "ii" (-2) AA, are
valine, asparagine, and serine, respectively; (2-28) when the target DNA
base to which the PPR motif binds is C, the three amino acids, Number 1
AA, Number 4 AA, and Number "ii" (-2) AA, are an arbitrary amino acid,
asparagine, and threonine, respectively; (2-29) when the target DNA base
to which the PPR motif binds is C, the three amino acids, Number 1 AA,
Number 4 AA, and Number "ii" (-2) AA, are valine, asparagine, and
threonine, respectively; (2-30) when the target DNA base to which the PPR
motif binds is C, the three amino acids, Number 1 AA, Number 4 AA, and
Number "ii" (-2) AA, are an arbitrary amino acid, asparagine, and
tryptophan, respectively; (2-31) when the target DNA base to which the
PPR motif binds is T, the three amino acids, Number 1 AA, Number 4 AA,
and Number "ii" (-2) AA, are isoleucine, asparagine, and tryptophan,
respectively; (2-33) when the target DNA base to which the PPR motif
binds is T, the three amino acids, Number 1 AA, Number 4 AA, and Number
"ii" (-2) AA, are an arbitrary amino acid, proline, and aspartic acid,
respectively; (2-34) when the target DNA base to which the PPR motif
binds is T, the three amino acids, Number 1 AA, Number 4 AA, and Number
"ii" (-2) AA, are phenylalanine, proline, and aspartic acid,
respectively; (2-35) when the target DNA base to which the PPR motif
binds is T, the three amino acids, Number 1 AA, Number 4 AA, and Number
"ii" (-2) AA, are tyrosine, proline, and aspartic acid, respectively;
(2-37) when the target DNA base to which the PPR motif binds is A, the
three amino acids, Number 1 AA, Number 4 AA, and Number "ii" (-2) AA, are
an arbitrary amino acid, serine, and asparagine, respectively; (2-38)
when the target DNA base to which the PPR motif binds is A, the three
amino acids, Number 1 AA, Number 4 AA, and Number "ii" (-2) AA, are
phenylalanine, serine, and asparagine, respectively; (2-39) when the
target DNA base to which the PPR motif binds is A, the three amino acids,
Number 1 AA, Number 4 AA, and Number "ii" (-2) AA, are valine, serine,
and asparagine, respectively; (2-41) when the target DNA base to which
the PPR motif binds is G, the three amino acids, Number 1 AA, Number 4 AA,
and Number "ii" (-2) AA, are an arbitrary amino acid, threonine, and
aspartic acid, respectively; (2-42) when the target DNA base to which the
PPR motif binds is G, the three amino acids, Number 1 AA, Number 4 AA, and
Number "ii" (-2) AA, are valine, threonine, and aspartic acid,
respectively; (2-43) when the target DNA base to which the PPR motif
binds is A, the three amino acids, Number 1 AA, Number 4 AA, and Number
"ii" (-2) AA, are an arbitrary amino acid, threonine, and asparagine,
respectively; (2-44) when the target DNA base to which the PPR motif
binds is A, the three amino acids, Number 1 AA, Number 4 AA, and Number
"ii" (-2) AA, are phenylalanine, threonine, and asparagine, respectively;
(2-45) when the target DNA base to which the PPR motif binds is A, the
three amino acids, Number 1 AA, Number 4 AA, and Number "ii" (-2) AA, are
isoleucine, threonine, and asparagine, respectively; (2-46) when the
target DNA base to which the PPR motif binds is A, the three amino acids,
Number 1 AA, Number 4 AA, and Number "ii" (-2) AA, are valine, threonine,
and asparagine, respectively; (2-48) when the target DNA base to which
the PPR motif binds is C, the three amino acids, Number 1 AA, Number 4
AA, and Number "ii" (-2) AA, are isoleucine, valine, and aspartic acid,
respectively; (2-49) when the target DNA base to which the PPR motif
binds is C, the three amino acids, Number 1 AA, Number 4 AA, and Number
"ii" (-2) AA, are an arbitrary amino acid, valine, and glycine,
respectively; and (2-50) when the target DNA base to which the PPR motif
binds is T, the three amino acids, Number 1 AA, Number 4 AA, and Number
"ii" (-2) AA, are an arbitrary amino acid, valine, and threonine,
respectively.
2. The method according to claim 1, wherein the functional region is fused to the DNA-binding protein on the C-terminus side of the protein.
3. The method according to claim 1, wherein the functional region is a DNA-cleaving enzyme, or a nuclease domain thereof, or a transcription control domain, and the complex functions as a target sequence-specific DNA-cleaving enzyme or transcription control factor.
4. The method according to claim 3, wherein the DNA-cleaving enzyme is the nuclease domain of FokI (SEQ ID NO: 6).
5. The method according to claim 1, wherein the one or more PPR motifs are any group of motifs selected from 9 PPR motifs belonging to the p63 protein consisting of the amino acid sequence of SEQ ID NO: 1, 11 PPR motifs belonging to the GUN1 protein consisting of the amino acid sequence of SEQ ID NO: 2, 15 PPR motifs belonging to the pTac2 protein consisting of the amino acid sequence of SEQ ID NO: 3, 10 PPR motifs belonging to the DG1 protein consisting of the amino acid sequence of SEQ ID NO: 4, and 11 PPR motifs belonging to the GRP23 protein consisting of the amino acid sequence of SEQ ID NO: 5.
Description:
CROSS-REFERENCE OF RELATED APPLICATIONS
[0001] This application is a Divisional of application Ser. No. 16/216,617, filed Dec. 11, 2018, which is a Divisional of application Ser. No. 14/785,952, filed Oct. 21, 2015, which is a 371 of International Application No. PCT/JP2014/061329, filed Apr. 22, 2014, which claims priority of Japanese Patent Application No. 2013-089840, filed Apr. 22, 2013, the entire contents of which are incorporated herein by reference.
TECHNICAL FIELD
[0002] The present invention relates to a protein that can selectively or specifically bind to an intended DNA base or DNA sequence. According to the present invention, a pentatricopeptide repeat (PPR) motif is utilized. The present invention can be used for identification and design of a DNA-binding protein, identification of a target DNA of a protein having a PPR motif, and functional control of DNA. The present invention is useful in the fields of medicine, agricultural science, and so forth. The present invention also relates to a novel DNA-cleaving enzyme that utilizes a complex of a protein containing a PPR motif and a protein that defines a functional region.
BACKGROUND ART
[0003] In recent years, techniques for binding nucleic acid-binding protein factors, elucidated through various analyses, to an intended sequence have been established and are coming into use. Such sequence-specific binding enables analysis of the intracellular localization of a target nucleic acid (DNA or RNA), elimination of a target DNA sequence, and expression control (activation or inactivation) of a protein-encoding gene located downstream of a target DNA sequence. Research and development is under way using the zinc finger protein (Non-patent documents 1 and 2), TAL effector (TALE, Non-patent document 3, Patent document 1), and CRISPR (Non-patent documents 4 and 5) as protein factors that act on DNA and serve as materials for protein engineering. However, the types of such protein factors are still extremely limited.
[0004] For example, the zinc finger nuclease (ZFN), known as an artificial DNA-cleaving enzyme, is a chimeric protein in which a DNA-binding part is joined to one DNA cleavage domain of a bacterial DNA-cleaving enzyme (for example, FokI) (Non-patent document 2). The DNA-binding part is constituted by linking 3 to 6 zinc fingers, each of which specifically recognizes and binds to a DNA stretch of 3 or 4 nucleotides, so that it recognizes a nucleotide sequence in units of 3 or 4 nucleotides. In such a chimeric protein, the zinc finger domain is a protein domain known to bind to DNA; its use is based on the knowledge that many transcription factors have this domain and bind to a specific DNA sequence to control expression of a gene. By using two ZFNs each having three zinc fingers, cleavage of one site per 70 billion nucleotides can be induced in theory.
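The figure of one site per 70 billion nucleotides can be checked with a short calculation. The following is only a sketch: it assumes that each zinc finger reads 3 nucleotides, that the two ZFNs bind adjacent, non-overlapping sites, and that the four bases are equally likely at each position. The names used are illustrative.

```python
# Assumptions (not stated in this form in the text): 3 nucleotides per
# finger, two ZFNs reading non-overlapping sites, 4 equally likely bases.
NUCLEOTIDES_PER_FINGER = 3
FINGERS_PER_ZFN = 3
ZFNS_PER_SITE = 2


def distinct_sequences(length: int) -> int:
    """Number of distinct nucleotide sequences of the given length."""
    return 4 ** length


recognized = NUCLEOTIDES_PER_FINGER * FINGERS_PER_ZFN * ZFNS_PER_SITE
spacing = distinct_sequences(recognized)

# A pair of three-finger ZFNs reads 18 nucleotides in total, and an
# 18-nucleotide sequence is expected about once per 4**18 positions.
print(recognized, spacing)
```

Under these assumptions the pair reads 18 nucleotides, and 4 to the 18th power is about 6.9 x 10^10, which agrees with the stated order of 70 billion.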
[0005] However, because of the high cost of producing ZFNs and the like, methods using ZFNs have not yet come into wide use. Moreover, the efficiency of selecting functional ZFNs is poor, which is suggested to be a further problem. Furthermore, since a zinc finger domain consisting of n zinc fingers tends to recognize a sequence of (GNN)n, the methods also have the problem that the degree of freedom in choosing the target gene sequence is low.
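The (GNN)n restriction can be illustrated with a small check on candidate target sequences. This is a minimal sketch; the function name is hypothetical, and N is assumed to mean any of A, C, G, or T.

```python
import re

# Assumption: "N" stands for any one of the four DNA bases.
_GNN_REPEAT = re.compile(r"(?:G[ACGT]{2})+")


def fits_gnn_pattern(target: str) -> bool:
    """Return True if the target consists solely of G-N-N triplets."""
    return _GNN_REPEAT.fullmatch(target.upper()) is not None


# A site made only of GNN triplets passes; one starting with A does not.
print(fits_gnn_pattern("GACGTTGCA"), fits_gnn_pattern("ATGGCC"))
```

Only sequences in which every third base is G pass such a check, which is the sense in which the degree of freedom for the target sequence is low.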
[0006] An artificial enzyme, TALEN, has also been developed by joining a protein consisting of a combinatorial sequence of modules that each recognize one nucleotide, the TAL effector (TALE), to a DNA cleavage domain of a bacterial DNA-cleaving enzyme (for example, FokI), and it is being investigated as an artificial enzyme that can replace ZFNs (Non-patent document 3). TALEN is an enzyme generated by fusing a DNA binding domain of a transcription factor of a plant-pathogenic Xanthomonas bacterium with the DNA cleavage domain of the restriction enzyme FokI, and it is known to bind to neighboring DNA sequences, form a dimer, and cleave double-stranded DNA. Because the DNA binding domain of TALE, found in a bacterium that infects plants, recognizes one base with a combination of amino acids at two sites in the TALE motif consisting of 34 amino acid residues, the binding property for a target DNA can be chosen by choosing the repetitive structure of the TALE modules. Like ZFNs, TALEN using such a DNA binding domain enables introduction of a mutation into a target gene, but its significant advantage over ZFNs is that the degree of freedom for the target gene (nucleotide sequence) is markedly improved and the nucleotide to which it binds can be defined with a code.
[0007] However, since the overall conformation of TALEN has not been elucidated, the DNA cleavage site of TALEN has not been identified at present. TALEN therefore has the problems that its cleavage site is inaccurate and not fixed compared with that of ZFNs, and that it also cleaves similar sequences. Consequently, a nucleotide sequence cannot be accurately cleaved at an intended target site. For these reasons, it is desired to develop and provide a novel artificial DNA-cleaving enzyme free from the aforementioned problems.
[0008] On the basis of genome sequence information, PPR proteins (proteins having a pentatricopeptide repeat (PPR) motif), which constitute a large family of no fewer than 500 members in plants alone, have been identified (Non-patent document 6). The PPR proteins are nucleus-encoded proteins, but are known to act on or be involved in control, cleavage, translation, splicing, RNA editing, and RNA stability, chiefly at the RNA level in organelles (chloroplasts and mitochondria), in a gene-specific manner. The PPR proteins typically have a structure consisting of about 10 contiguous, weakly conserved 35-amino acid motifs, i.e., PPR motifs, and it is considered that the combination of the PPR motifs is responsible for the sequence-selective binding with RNA. Almost all the PPR proteins consist only of a repetition of about 10 PPR motifs, and in many cases no domain required for catalytic action is found. Therefore, it is considered that the PPR proteins are essentially RNA adapters (Non-patent document 7).
[0009] In general, binding of a protein and DNA, and binding of a protein and RNA are attained by different molecular mechanisms. Therefore, a DNA-binding protein generally does not bind to RNA, and an RNA-binding protein generally does not bind to DNA. For example, for the pumilio protein, which is known as an RNA-binding factor for which the RNA to be recognized can be encoded, binding to DNA has not been reported (Non-patent documents 8 and 9).
[0010] However, in the process of investigating the properties of various kinds of PPR proteins, it was suggested that some types of PPR proteins work as DNA-binding factors.
[0011] The wheat p63 is a PPR protein having 9 PPR motifs, and it is suggested by gel shift assay that it binds to DNA in a sequence-specific manner (Non-patent document 10).
[0012] The GUN1 protein of Arabidopsis thaliana has 11 PPR motifs, and it is suggested by pull down assay that it binds with DNA (Non-patent document 11).
[0013] It has been demonstrated by run-on assay that the Arabidopsis thaliana pTac2 (protein having 15 PPR motifs, Non-patent document 12) and Arabidopsis thaliana DG1 (protein having 10 PPR motifs, Non-patent document 13) directly participate in transcription, which generates RNA by using DNA as a template, and they are considered to bind to DNA.
[0014] An Arabidopsis thaliana strain deficient in the gene of GRP23 (protein having 11 PPR motifs, Non-patent document 14) shows an embryonic-lethal phenotype. It has been demonstrated that this protein physically interacts with the major subunit of eukaryotic RNA polymerase II, which is a DNA-dependent RNA polymerase, and therefore it is considered that GRP23 also acts by binding to DNA.
[0015] However, binding of these PPR proteins to DNA has been only indirectly suggested, and actual sequence-specific binding has not been fully verified. Moreover, even if such proteins bind to DNA, it is generally considered that binding of a protein and DNA, and binding of a protein and RNA are attained by different molecular mechanisms, and therefore what kind of sequence rule, if any, governs such binding has not been anticipated at all.
PRIOR ART REFERENCES
Patent Documents
[0016] Patent document 1: WO2011/072246
[0017] Patent document 2: WO2011/111829
Non-Patent Documents
[0018] Non-patent document 1: Maeder, M. L., et al. (2008) Rapid "open-source" engineering of customized zinc-finger nucleases for highly efficient gene modification, Mol. Cell 31, 294-301
[0019] Non-patent document 2: Urnov, F. D., et al. (2010) Genome editing with engineered zinc finger nucleases, Nature Reviews Genetics, 11, 636-646
[0020] Non-patent document 3: Miller, J. C., et al. (2011) A TALE nuclease architecture for efficient genome editing, Nature Biotech., 29, 143-148
[0021] Non-patent document 4: Mali P., et al. (2013) RNA-guided human genome engineering via Cas9, Science, 339, 823-826
[0022] Non-patent document 5: Cong L., et al. (2013) Multiplex genome engineering using CRISPR/Cas systems, Science, 339, 819-823
[0023] Non-patent document 6: Small, I. D. and Peeters, N. (2000) The PPR motif - a TPR-related motif prevalent in plant organellar proteins, Trends Biochem. Sci., 25, 46-47
[0024] Non-patent document 7: Woodson, J. D., and Chory, J. (2008) Coordination of gene expression between organellar and nuclear genomes, Nature Rev. Genet., 9, 383-395
[0025] Non-patent document 8: Wang, X., et al. (2002) Modular recognition of RNA by a human pumilio-homology domain, Cell, 110, 501-512
[0026] Non-patent document 9: Cheong, C. G. and Hall, T. M. (2006) Engineering RNA sequence specificity of Pumilio repeats, Proc. Natl. Acad. Sci. USA 103, 13635-13639
[0027] Non-patent document 10: Ikeda T. M. and Gray M. W. (1999) Characterization of a DNA-binding protein implicated in transcription in wheat mitochondria, Mol. Cell. Biol., 19 (12): 8113-8122
[0028] Non-patent document 11: Koussevitzky S., et al. (2007) Signals from chloroplasts converge to regulate nuclear gene expression, Science, 316:715-719
[0029] Non-patent document 12: Pfalz J, et al. (2006) PTAC2, -6, and -12 are components of the transcriptionally active plastid chromosome that are required for plastid gene expression, Plant Cell 18:176-197
[0030] Non-patent document 13: Chi W, et al. (2008) The pentatricopeptide repeat protein DELAYED GREENING1 is involved in the regulation of early chloroplast development and chloroplast gene expression in Arabidopsis, Plant Physiol., 147:573-584
[0031] Non-patent document 14: Ding Y H, et al. (2006) Arabidopsis GLUTAMINE-RICH PROTEIN 23 is essential for early embryogenesis and encodes a novel nuclear PPR motif protein that interacts with RNA polymerase II subunit III, Plant Cell, 18:815-830
SUMMARY OF THE INVENTION
Object to be Achieved by the Invention
[0032] The inventors of the present invention expected that the properties of the PPR proteins (proteins having a PPR motif) as RNA adapters would be determined by the properties of each PPR motif constituting the PPR proteins and by the combination of a plurality of PPR motifs, and proposed methods for modifying RNA-binding proteins using such PPR motifs (Patent document 2). They then elucidated that a PPR motif and an RNA base bind in one-to-one correspondence, that contiguous PPR motifs recognize contiguous RNA bases in an RNA sequence, and that such RNA recognition is determined by the combination of amino acids at three specific positions among the 35 amino acids constituting the PPR motif, and filed a patent application for a method for designing a customized RNA-binding protein utilizing RNA recognition codes of PPR motifs and use thereof (PCT/JP2012/077274; Yagi, Y., et al. (2013) PLoS One, 8, e57286; and Barkan, A., et al. (2012) PLoS Genet., 8, e1002910).
[0033] It has been generally considered that binding of a protein and DNA, and binding of a protein and RNA are attained by different molecular mechanisms. However, the inventors of the present invention predicted that the RNA recognition rule of the PPR motif would also be usable for recognition of DNA, and analyzed PPR proteins that act by binding to DNA, with the aim of finding PPR proteins having such a characteristic. They also aimed at providing a novel artificial enzyme by preparing, from a PPR protein that specifically binds to DNA obtained as described above, a customized DNA-binding protein that binds to a desired sequence and using it together with a protein that defines a functional region, and at providing a novel artificial DNA-cleaving enzyme by using a region having a DNA-cleaving activity as the functional region.
Means for Achieving the Object
[0034] As for the PPR proteins, analysis with various domain search programs (Pfam, Prosite, Interpro, etc.) showed that the PPR motifs contained in the common RNA-binding type PPR proteins and the PPR motifs contained in the several DNA-binding PPR proteins mentioned above are not particularly distinguished from each other. Therefore, it was considered that PPR proteins might contain amino acids (an amino acid group) that determine a binding property for DNA or a binding property for RNA, apart from the amino acids required for nucleic acid recognition.
[0035] The inventors of the present invention elucidated that an RNA-binding PPR motif and an RNA base bind in one-to-one correspondence, that contiguous PPR motifs recognize contiguous RNA bases in an RNA sequence, and that in such recognition, base-selective binding with RNA is determined by the combination of RNA recognition amino acids at three specific positions among the 35 amino acids constituting the PPR motif, that is, the first and fourth amino acids of the first helix (Helix A) of the two α-helix structures constituting the motif (No. 1 A.A. and No. 4 A.A.), and the second amino acid counted from the C-terminus (No. "ii" (-2) A.A.). They filed a patent application for a method for designing a customized RNA-binding protein utilizing RNA recognition codes of PPR motifs and use thereof (PCT/JP2012/077274).
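The three recognition positions can be read from a motif sequence as in the following minimal sketch. It assumes that the motif string begins at the first amino acid of Helix A and that the next motif follows contiguously, so that No. "ii" (-2) A.A. is the second residue from the C-terminal end. The example motif is a hypothetical 35-residue string written for illustration, not a sequence taken from SEQ ID NOS: 1 to 5, and the function and type names are likewise illustrative.

```python
from typing import NamedTuple


class RecognitionResidues(NamedTuple):
    no1: str  # first amino acid of Helix A
    no4: str  # fourth amino acid of Helix A
    ii: str   # second amino acid counted from the C-terminus


def recognition_residues(motif: str) -> RecognitionResidues:
    """Return the residues at the three recognition positions.

    Assumes the motif starts at No. 1 A.A. and is followed
    contiguously by the next motif.
    """
    if len(motif) < 5:
        raise ValueError("motif is too short to hold all three positions")
    return RecognitionResidues(motif[0], motif[3], motif[-2])


# Hypothetical 35-residue motif, for illustration only.
EXAMPLE_MOTIF = "VTYNTLIDGLCKAGRLEEALELFEEMKEKGIKPDV"
print(recognition_residues(EXAMPLE_MOTIF))
```

For this hypothetical motif the three residues are valine, asparagine, and aspartic acid.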
[0036] Then, among the PPR proteins, for the aforementioned wheat p63 (Non-patent document 10, the amino acid sequence of the homologous protein of Arabidopsis thaliana is shown as SEQ ID NO: 1), GUN1 protein of Arabidopsis thaliana (Non-patent document 11, amino acid sequence thereof is shown as SEQ ID NO: 2), pTac2 of Arabidopsis thaliana (Non-patent document 12, amino acid sequence thereof is shown as SEQ ID NO: 3), DG1 (Non-patent document 13, amino acid sequence thereof is shown as SEQ ID NO: 4), and GRP23 of Arabidopsis thaliana (Non-patent document 14, amino acid sequence thereof is shown as SEQ ID NO: 5), for which binding with DNA was suggested, the frequencies of the amino acids at the three positions bearing the nucleic acid recognition codes in the PPR motif, which are considered to be important when RNA is a target (No. 1 A.A., No. 4 A.A. and No. "ii" (-2) A.A.), were compared with those found in the RNA binding type motifs. As a result, it became clear that the tendencies of the amino acid frequencies found in the PPR motifs for which a DNA-binding property was suggested substantially agreed with those of the RNA binding type motifs.
[0037] The above results suggest that the nucleic acid recognition codes of the RNA binding type PPR motifs can also be applied to the DNA binding type PPR motifs. Thymine (T) is a derivative of uracil (U) in which the carbon at the 5-position is methylated, and is therefore also called 5-methyluracil. This characteristic of the base suggests that the combination of amino acids that recognizes uracil (U) in an RNA binding type PPR motif is used for recognition of thymine (T) in DNA.
[0038] On the basis of the aforementioned findings, it was elucidated that a customized DNA-binding protein that binds to an arbitrary DNA base sequence could be produced by using, as a template, the aforementioned DNA-binding type PPR proteins, namely p63 (amino acid sequence of SEQ ID NO: 1), GUN1 protein of Arabidopsis thaliana (amino acid sequence of SEQ ID NO: 2), pTac2 of Arabidopsis thaliana (amino acid sequence of SEQ ID NO: 3), DG1 (amino acid sequence of SEQ ID NO: 4), and GRP23 of Arabidopsis thaliana (amino acid sequence of SEQ ID NO: 5), and arranging the amino acids at the three positions (No. 1 A.A., No. 4 A.A. and No. "ii" (-2) A.A.) by applying to such PPR proteins the findings obtained from examination of the RNA-binding type PPR motifs.
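The arrangement of recognition amino acids for a target sequence can be sketched as a table lookup. The table below is an illustration only: it holds the combinations of claim 1 in which No. 1 A.A. is an arbitrary amino acid, keyed by the pair (No. 4 A.A., No. "ii" (-2) A.A.) in one-letter amino acid code. Combinations that depend on a specific No. 1 A.A. (for example isoleucine, methionine, and aspartic acid for C) are omitted, the assignment of G to the glycine/aspartic acid and threonine/aspartic acid pairs is treated here as an assumption, and all names in the code are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple

Pair = Tuple[str, str]  # (No. 4 A.A., No. "ii" (-2) A.A.), one-letter codes

# Keys are amino acids, values are DNA bases.
# Assumption: the (G, D) and (T, D) pairs are taken to read base G.
PAIR_TO_BASE: Dict[Pair, str] = {
    ("G", "D"): "G", ("T", "D"): "G",
    ("G", "N"): "A", ("G", "S"): "A", ("S", "N"): "A", ("T", "N"): "A",
    ("I", "N"): "T", ("M", "D"): "T", ("N", "D"): "T", ("P", "D"): "T",
    ("V", "T"): "T",
    ("N", "N"): "C", ("N", "S"): "C", ("N", "T"): "C", ("N", "W"): "C",
    ("V", "G"): "C",
}


def predict_bases(pairs: Iterable[Pair]) -> str:
    """Read one base per motif; '?' marks a pair absent from the table."""
    return "".join(PAIR_TO_BASE.get(pair, "?") for pair in pairs)


def design_pairs(target: str) -> List[Pair]:
    """Pick, for each target base, the first table entry that reads it."""
    first_choice: Dict[str, Pair] = {}
    for pair, base in PAIR_TO_BASE.items():
        first_choice.setdefault(base, pair)
    return [first_choice[base] for base in target.upper()]


pairs = design_pairs("GATTACAGC")  # nine bases, one motif per base
print(pairs)
print(predict_bases(pairs))
```

Choosing the first matching entry is an arbitrary simplification; the sketch does not rank the alternative combinations for the same base.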
[0039] That is, the inventors of the present invention provided a protein that comprises 2 or more, preferably 2 to 30, more preferably 5 to 25, most preferably 9 to 15, of PPR motifs having the specific amino acids described later as the amino acids at the three positions (No. 1 A.A., No. 4 A.A., and No. "ii" (-2) A.A.) in the PPR motifs, and can bind to DNA in a DNA base-selective manner or DNA base sequence-selective manner, of which typical examples are the amino acid sequences of SEQ ID NOS: 1 to 5, and thus accomplished the present invention.
[0040] The present invention provides the following.
[0041] [1] A method for modifying a genetic substance of a cell, the method including: designing a DNA binding protein; determining a DNA base sequence coding for an amino acid sequence of a fused protein comprising a functional region and a DNA binding region consisting of a DNA-binding protein; cloning said DNA base sequence; preparing a vector carrying said DNA base sequence and a cell containing a DNA having at least 9 target DNA bases as a target DNA base sequence; and introducing the vector into the cell so that the DNA binding region of the fused protein binds to the DNA having the at least 9 target DNA bases, and therefore the functional region modifies the DNA having the at least 9 target DNA bases, wherein the DNA-binding protein contains at least 9 PPR motifs for binding to the at least 9 target DNA bases, respectively, and having a structure of the following formula 1: [Formula 1]
(Helix A)-X-(Helix B)-L (Formula 1)
(wherein, in the formula 1: Helix A is a part that can form an .alpha.-helix structure; X does not exist, or is a part consisting of 1 to 9 amino acids; Helix B is a part that can form an .alpha.-helix structure; and L is a part consisting of 2 to 7 amino acids), wherein, position numbers of amino acids in the PPR motifs are the same as PF01535 in Pfam under the following definitions: the first amino acid of Helix A is referred to as No. 1 amino acid (No. 1 A.A.), the fourth amino acid as No. 4 amino acid (No. 4 A.A.), and
[0042] when a next PPR motif (M.sub.n+1) contiguously exists on the C-terminus side of the PPR motif (M.sub.n) (when there is no amino acid insertion between the PPR motifs), the -2nd amino acid counted from the end (C-terminus side) of the amino acids constituting the PPR motif (M.sub.n);
[0043] when a non-PPR motif consisting of 1 to 20 amino acids exists between the PPR motif (M.sub.n) and the next PPR motif (M.sub.n+1) on the C-terminus side, the amino acid located upstream of the first amino acid of the next PPR motif (M.sub.n+1) by 2 positions, i.e., the -2nd amino acid; or
[0044] when any next PPR motif (M.sub.n+1) does not exist on the C-terminus side of the PPR motif (M.sub.n), or 21 or more amino acids constituting a non-PPR motif exist between the PPR motif (M.sub.n) and the next PPR motif (M.sub.n+1) on the C-terminus side, the 2nd amino acid counted from the end (C-terminus side) of the amino acids constituting the PPR motif (M.sub.n) is referred to as No. "ii" (-2) amino acid (No. "ii" (-2) A.A.), one PPR motif (M.sub.n) contained in the protein is a PPR motif having a specific combination of amino acids corresponding to a target DNA base or target DNA base sequence as the three amino acids of No. 1 A.A., No. 4 A.A., and No. "ii" (-2) A.A.
[2] The protein according to [1], wherein the combination of the three amino acids of No. 1 A.A., No. 4 A.A., and No. "ii" (-2) A.A. is a combination corresponding to a target DNA base or target DNA base sequence. In one embodiment, each PPR motif (M.sub.n) contained in the DNA-binding protein is a PPR motif having a specific combination of amino acids as the three amino acids of Number 1 AA, Number 4 AA, and Number "ii" (-2) AA, such that the DNA-binding protein has at least 9 combinations of the three amino acids corresponding to the respective target DNA bases of the target DNA base sequence. Each combination of the three amino acids is determined according to any one of the following definitions: (1-1) when No. 4 A.A. is glycine (G), No. 1 A.A. may be an arbitrary amino acid, and No. "ii" (-2) A.A. is aspartic acid (D), asparagine (N), or serine (S); (1-2) when No. 4 A.A. is isoleucine (I), each of No. 1 A.A. and No. "ii" (-2) A.A. may be an arbitrary amino acid; (1-3) when No. 4 A.A. is leucine (L), each of No. 1 A.A. and No. "ii" (-2) A.A. may be an arbitrary amino acid; (1-4) when No. 4 A.A. is methionine (M), each of No. 1 A.A. and No. "ii" (-2) A.A. may be an arbitrary amino acid; (1-5) when No. 4 A.A. is asparagine (N), each of No. 1 A.A. and No. "ii" (-2) A.A. may be an arbitrary amino acid; (1-6) when No. 4 A.A. is proline (P), each of No. 1 A.A. and No. "ii" (-2) A.A. may be an arbitrary amino acid; (1-7) when No. 4 A.A. is serine (S), each of No. 1 A.A. and No. "ii" (-2) A.A. may be an arbitrary amino acid; (1-8) when No. 4 A.A. is threonine (T), each of No. 1 A.A. and No. "ii" (-2) A.A. may be an arbitrary amino acid; and (1-9) when No. 4 A.A. is valine (V), each of No. 1 A.A. and No. "ii" (-2) A.A. may be an arbitrary amino acid. [3] The protein according to [1], wherein the combination of the three amino acids of No. 1 A.A., No. 4 A.A., and No. "ii" (-2) A.A. 
is a combination corresponding to a target DNA base or target DNA base sequence, and the combination of amino acids is determined according to any one of the following definitions: (2-1) when the three amino acids, No. 1 A.A., No. 4 A.A., and No. "ii" (-2) A.A. are an arbitrary amino acid, glycine, and aspartic acid, respectively, the PPR motif selectively binds to G; (2-2) when the three amino acids, No. 1 A.A., No. 4 A.A., and No. "ii" (-2) A.A., are glutamic acid, glycine, and aspartic acid, respectively, the PPR motif selectively binds to G; (2-3) when the three amino acids, No. 1 A.A., No. 4 A.A., and No. "ii" (-2) A.A., are an arbitrary amino acid, glycine, and asparagine, respectively, the PPR motif selectively binds to A; (2-4) when the three amino acids, No. 1 A.A., No. 4 A.A., and No. "ii" (-2) A.A., are glutamic acid, glycine, and asparagine, respectively, the PPR motif selectively binds to A; (2-5) when the three amino acids, No. 1 A.A., No. 4 A.A., and No. "ii" (-2) A.A., are an arbitrary amino acid, glycine, and serine, respectively, the PPR motif selectively binds to A, and next binds to C; (2-6) when the three amino acids, No. 1 A.A., No. 4 A.A., and No. "ii" (-2) A.A., are an arbitrary amino acid, isoleucine, and an arbitrary amino acid, respectively, the PPR motif selectively binds to T and C; (2-7) when the three amino acids, No. 1 A.A., No. 4 A.A., and No. "ii" (-2) A.A., are an arbitrary amino acid, isoleucine, and asparagine, respectively, the PPR motif selectively binds to T, and next binds to C; (2-8) when the three amino acids, No. 1 A.A., No. 4 A.A., and No. "ii" (-2) A.A., are an arbitrary amino acid, leucine, and an arbitrary amino acid, respectively, the PPR motif selectively binds to T and C; (2-9) when the three amino acids, No. 1 A.A., No. 4 A.A., and No. "ii" (-2) A.A., are an arbitrary amino acid, leucine, and aspartic acid, respectively, the PPR motif selectively binds to C; (2-10) when the three amino acids, No. 1 A.A., No. 
4 A.A., and No. "ii" (-2) A.A., are an arbitrary amino acid, leucine, and lysine, respectively, the PPR motif selectively binds to T; (2-11) when the three amino acids, No. 1 A.A., No. 4 A.A., and No. "ii" (-2) A.A., are an arbitrary amino acid, methionine, and an arbitrary amino acid, respectively, the PPR motif selectively binds to T; (2-12) when the three amino acids, No. 1 A.A., No. 4 A.A., and No. "ii" (-2) A.A., are an arbitrary amino acid, methionine, and aspartic acid, respectively, the PPR motif selectively binds to T; (2-13) when the three amino acids, No. 1 A.A., No. 4 A.A., and No. "ii" (-2) A.A., are isoleucine, methionine, and aspartic acid, respectively, the PPR motif selectively binds to T, and next binds to C; (2-14) when the three amino acids, No. 1 A.A., No. 4 A.A., and No. "ii" (-2) A.A., are an arbitrary amino acid, asparagine, and an arbitrary amino acid, respectively, the PPR motif selectively binds to C and T; (2-15) when the three amino acids, No. 1 A.A., No. 4 A.A., and No. "ii" (-2) A.A., are an arbitrary amino acid, asparagine, and aspartic acid, respectively, the PPR motif selectively binds to T; (2-16) when the three amino acids, No. 1 A.A., No. 4 A.A., and No. "ii" (-2) A.A., are phenylalanine, asparagine, and aspartic acid, respectively, the PPR motif selectively binds to T; (2-17) when the three amino acids, No. 1 A.A., No. 4 A.A., and No. "ii" (-2) A.A., are glycine, asparagine, and aspartic acid, respectively, the PPR motif selectively binds to T; (2-18) when the three amino acids, No. 1 A.A., No. 4 A.A., and No. "ii" (-2) A.A., are isoleucine, asparagine, and aspartic acid, respectively, the PPR motif selectively binds to T; (2-19) when the three amino acids, No. 1 A.A., No. 4 A.A., and No. "ii" (-2) A.A., are threonine, asparagine, and aspartic acid, respectively, the PPR motif selectively binds to T; (2-20) when the three amino acids, No. 1 A.A., No. 4 A.A., and No. "ii" (-2) A.A. 
are valine, asparagine, and aspartic acid, respectively, the PPR motif selectively binds to T, and next binds to C; (2-21) when the three amino acids, No. 1 A.A., No. 4 A.A., and No. "ii" (-2) A.A. are tyrosine, asparagine, and aspartic acid, respectively, the PPR motif selectively binds to T, and next binds to C; (2-22) when the three amino acids, No. 1 A.A., No. 4 A.A., and No. "ii" (-2) A.A., are an arbitrary amino acid, asparagine, and asparagine, respectively, the PPR motif selectively binds to C; (2-23) when the three amino acids, No. 1 A.A., No. 4 A.A., and No. "ii" (-2) A.A., are isoleucine, asparagine, and asparagine, respectively, the PPR motif selectively binds to C; (2-24) when the three amino acids, No. 1 A.A., No. 4 A.A., and No. "ii" (-2) A.A., are serine, asparagine, and asparagine, respectively, the PPR motif selectively binds to C; (2-25) when the three amino acids, No. 1 A.A., No. 4 A.A., and No. "ii" (-2) A.A., are valine, asparagine, and asparagine, respectively, the PPR motif selectively binds to C; (2-26) when the three amino acids, No. 1 A.A., No. 4 A.A., and No. "ii" (-2) A.A., are an arbitrary amino acid, asparagine, and serine, respectively, the PPR motif selectively binds to C; (2-27) when the three amino acids, No. 1 A.A., No. 4 A.A., and No. "ii" (-2) A.A., are valine, asparagine, and serine, respectively, the PPR motif selectively binds to C; (2-28) when the three amino acids, No. 1 A.A., No. 4 A.A., and No. "ii" (-2) A.A., are an arbitrary amino acid, asparagine, and threonine, respectively, the PPR motif selectively binds to C; (2-29) when the three amino acids, No. 1 A.A., No. 4 A.A., and No. "ii" (-2) A.A., are valine, asparagine, and threonine, respectively, the PPR motif selectively binds to C; (2-30) when the three amino acids, No. 1 A.A., No. 4 A.A., and No. 
"ii" (-2) A.A., are an arbitrary amino acid, asparagine, and tryptophan, respectively, the PPR motif selectively binds to C, and next binds to T; (2-31) when the three amino acids, No. 1 A.A., No. 4 A.A., and No. "ii" (-2) A.A., are isoleucine, asparagine, and tryptophan, respectively, the PPR motif selectively binds to T, and next binds to C; (2-32) when the three amino acids, No. 1 A.A., No. 4 A.A., and No. "ii" (-2) A.A., are an arbitrary amino acid, proline, and an arbitrary amino acid, respectively, the PPR motif selectively binds to T; (2-33) when the three amino acids, No. 1 A.A., No. 4 A.A., and No. "ii" (-2) A.A., are an arbitrary amino acid, proline, and aspartic acid, respectively, the PPR motif selectively binds to T; (2-34) when the three amino acids, No. 1 A.A., No. 4 A.A., and No. "ii" (-2) A.A., are phenylalanine, proline, and aspartic acid, respectively, the PPR motif selectively binds to T; (2-35) when the three amino acids, No. 1 A.A., No. 4 A.A., and No. "ii" (-2) A.A., are tyrosine, proline, and aspartic acid, respectively, the PPR motif selectively binds to T; (2-36) when the three amino acids, No. 1 A.A., No. 4 A.A., and No. "ii" (-2) A.A., are an arbitrary amino acid, serine, and an arbitrary amino acid, respectively, the PPR motif selectively binds to A and G; (2-37) when the three amino acids, No. 1 A.A., No. 4 A.A., and No. "ii" (-2) A.A., are an arbitrary amino acid, serine, and asparagine, respectively, the PPR motif selectively binds to A; (2-38) when the three amino acids, No. 1 A.A., No. 4 A.A., and No. "ii" (-2) A.A., are phenylalanine, serine, and asparagine, respectively, the PPR motif selectively binds to A; (2-39) when the three amino acids, No. 1 A.A., No. 4 A.A., and No. "ii" (-2) A.A., are valine, serine, and asparagine, respectively, the PPR motif selectively binds to A; (2-40) when the three amino acids, No. 1 A.A., No. 4 A.A., and No. 
"ii" (-2) A.A., are an arbitrary amino acid, threonine, and an arbitrary amino acid, respectively, the PPR motif selectively binds to A and G; (2-41) when the three amino acids, No. 1 A.A., No. 4 A.A., and No. "ii" (-2) A.A., are an arbitrary amino acid, threonine, and aspartic acid, respectively, the PPR motif selectively binds to G; (2-42) when the three amino acids, No. 1 A.A., No. 4 A.A., and No. "ii" (-2) A.A., are valine, threonine, and aspartic acid, respectively, the PPR motif selectively binds to G; (2-43) when the three amino acids, No. 1 A.A., No. 4 A.A., and No. "ii" (-2) A.A., are an arbitrary amino acid, threonine, and asparagine, respectively, the PPR motif selectively binds to A; (2-44) when the three amino acids, No. 1 A.A., No. 4 A.A., and No. "ii" (-2) A.A., are phenylalanine, threonine, and asparagine, respectively, the PPR motif selectively binds to A; (2-45) when the three amino acids, No. 1 A.A., No. 4 A.A., and No. "ii" (-2) A.A., are isoleucine, threonine, and asparagine, respectively, the PPR motif selectively binds to A; (2-46) when the three amino acids, No. 1 A.A., No. 4 A.A., and No. "ii" (-2) A.A., are valine, threonine, and asparagine, respectively, the PPR motif selectively binds to A; (2-47) when the three amino acids, No. 1 A.A., No. 4 A.A., and No. "ii" (-2) A.A., are an arbitrary amino acid, valine, and an arbitrary amino acid, respectively, the PPR motif binds with A, C, and T, but does not bind to G; (2-48) when the three amino acids, No. 1 A.A., No. 4 A.A., and No. "ii" (-2) A.A., are isoleucine, valine, and aspartic acid, respectively, the PPR motif selectively binds to C, and next binds to A; (2-49) when the three amino acids, No. 1 A.A., No. 4 A.A., and No. "ii" (-2) A.A., are an arbitrary amino acid, valine, and glycine, respectively, the PPR motif selectively binds to C; and (2-50) when the three amino acids, No. 1 A.A., No. 4 A.A., and No. 
"ii" (-2) A.A., are an arbitrary amino acid, valine, and threonine, respectively, the PPR motif selectively binds to T. [4] The protein according to any one of [1] to [3], which contains 2 to 30 of the PPR motifs (M.sub.n) defined in [1]. [5] The protein according to any one of [1] to [3], which contains 5 to 25 of the PPR motifs (M.sub.n) defined in [1]. [6] The protein according to any one of [1] to [3], which contains 9 to 15 of the PPR motifs (M.sub.n) defined in [1]. [7] The PPR protein according to [6], which consists of a sequence selected from the amino acid sequence of SEQ ID NO: 1 containing 9 PPR motifs, the amino acid sequence of SEQ ID NO: 2 containing 11 PPR motifs, the amino acid sequence of SEQ ID NO: 3 containing 15 PPR motifs, the amino acid sequence of SEQ ID NO: 4 containing 10 PPR motifs, and the amino acid sequence of SEQ ID NO: 5 containing 11 PPR motifs. [8] A method for identifying a DNA base or DNA base sequence that serves as a target of a DNA-binding protein containing one or more (preferably 2 to 30) PPR motifs (M.sub.n) defined in [1], wherein:
[0045] the DNA base or DNA base sequence is identified by determining presence or absence of a DNA base corresponding to a combination of the three amino acids of No. 1 A.A., No. 4 A.A., and No. "ii" (-2) A.A. of the PPR motif on the basis of any one of the definitions (1-1) to (1-9) mentioned in [2], and (2-1) to (2-50) mentioned in [3]. [9] A method for identifying a PPR protein containing one or more (preferably 2 to 30) PPR motifs (M.sub.n) defined in [1] that can bind to a target DNA base or target DNA having a specific base sequence, wherein:
[0046] the PPR protein is identified by determining presence or absence of a combination of the three amino acids of No. 1 A.A., No. 4 A.A., and No. "ii" (-2) A.A. corresponding to the target DNA base or a specific base constituting the target DNA on the basis of any one of the definitions (1-1) to (1-9) mentioned in [2], and (2-1) to (2-50) mentioned in [3].
[10] A method for controlling a function of DNA, which uses the protein according to [1]. [11] A complex consisting of a region comprising the protein according to [1], and a functional region bound together. [12] The complex according to [11], wherein the functional region is fused to the protein according to [1] on the C-terminus side of the protein. [13] The complex according to [11] or [12], wherein the functional region is a DNA-cleaving enzyme, or a nuclease domain thereof, or a transcription control domain, and the complex functions as a target sequence-specific DNA-cleaving enzyme or transcription control factor. [14] The complex according to [13], wherein the DNA-cleaving enzyme is the nuclease domain of FokI (SEQ ID NO: 6). [15] A method for modifying a genetic substance of a cell comprising the following steps:
[0047] preparing a cell containing a DNA having a target sequence; and
[0048] introducing the complex according to [11] into the cell so that the region of the complex consisting of the protein binds to the DNA having a target sequence, and therefore the functional region modifies the DNA having a target sequence.
[16] A method for identifying, recognizing, or targeting a DNA base or DNA having a specific base sequence by using a PPR protein containing one or more PPR motifs. [17] The method according to [16], wherein the protein contains one or more PPR motifs in which three amino acids among the amino acids constituting the motif constitute a specific combination of amino acids. [18] The method according to [16] or [17], wherein the protein contains one or more PPR motifs (M.sub.n) defined in [1].
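As an illustration of the identification described in [8], the correspondence between the three amino acids and a DNA base can be sketched as a lookup table. The following Python sketch encodes only a small subset of the definitions (2-1) to (2-50) of [3]; the names RULES, predict_bases, and predict_target are hypothetical, and the convention of trying a definition with a specified No. 1 A.A. before the corresponding definition with an arbitrary No. 1 A.A. is an assumption made for this example.

```python
# Subset of the definitions in [3]. Keys are
# (No. 1 A.A., No. 4 A.A., No. "ii" (-2) A.A.) in one-letter code,
# with "*" standing for an arbitrary amino acid. Values list the
# bound bases in order of preference ("binds to X, and next binds to Y").
RULES = {
    ("*", "G", "D"): ("G",),      # (2-1)
    ("*", "G", "N"): ("A",),      # (2-3)
    ("*", "G", "S"): ("A", "C"),  # (2-5)
    ("*", "L", "D"): ("C",),      # (2-9)
    ("*", "L", "K"): ("T",),      # (2-10)
    ("*", "M", "D"): ("T",),      # (2-12)
    ("I", "M", "D"): ("T", "C"),  # (2-13)
    ("*", "N", "D"): ("T",),      # (2-15)
    ("*", "N", "N"): ("C",),      # (2-22)
    ("*", "T", "D"): ("G",),      # (2-41)
    ("*", "T", "N"): ("A",),      # (2-43)
}


def predict_bases(aa1, aa4, aa_ii):
    """Return the bases for one motif, or None if no encoded rule applies."""
    # Assumption: a rule naming No. 1 A.A. takes precedence over "*".
    for key in ((aa1, aa4, aa_ii), ("*", aa4, aa_ii)):
        if key in RULES:
            return RULES[key]
    return None


def predict_target(motifs):
    """Concatenate the most preferred base of each motif; '?' if unknown."""
    out = []
    for aa1, aa4, aa_ii in motifs:
        bases = predict_bases(aa1, aa4, aa_ii)
        out.append(bases[0] if bases else "?")
    return "".join(out)
```

For example, under these assumptions a motif whose three amino acids are E, G, and D falls under (2-1) and is assigned G, whereas a motif with I, M, and D is assigned T first and C next according to (2-13).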
Effect of the Invention
[0049] According to the present invention, a PPR motif that can bind to a target DNA base, and a protein containing it, can be provided. By arranging two or more PPR motifs, a protein that can bind to a target DNA having an arbitrary sequence or length can be provided.
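The arrangement of PPR motifs for a given target can be sketched in the reverse direction, by assigning to each target base one combination of No. 4 A.A. and No. "ii" (-2) A.A. that selectively binds to it. In the following Python sketch, the particular combination chosen for each base is only one of several that the definitions (2-15), (2-22), (2-41), and (2-43) would allow, No. 1 A.A. is left arbitrary, and the correspondence of the first motif to the first base of the target as written is an assumption of this example; the names CODE_FOR_BASE and design_codes are hypothetical.

```python
# One possible choice per base, taken from the definitions
# (2-43) *TN -> A, (2-22) *NN -> C, (2-41) *TD -> G, (2-15) *ND -> T.
# Values are (No. 4 A.A., No. "ii" (-2) A.A.); No. 1 A.A. is arbitrary.
CODE_FOR_BASE = {
    "A": ("T", "N"),
    "C": ("N", "N"),
    "G": ("T", "D"),
    "T": ("N", "D"),
}


def design_codes(target):
    """Return (motif number, base, No. 4 A.A., No. ii A.A.) per target base."""
    codes = []
    for motif_no, base in enumerate(target.upper(), start=1):
        if base not in CODE_FOR_BASE:
            raise ValueError(f"not a DNA base: {base!r}")
        aa4, aa_ii = CODE_FOR_BASE[base]
        codes.append((motif_no, base, aa4, aa_ii))
    return codes
```

A target of nine bases, for example, would yield nine such combinations, one for each PPR motif to be arranged.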
[0050] According to the present invention, a target DNA of an arbitrary PPR protein can be predicted and identified, and conversely, a PPR protein that binds to an arbitrary DNA can be predicted and identified. Prediction of such a target DNA sequence clarifies the genetic identity thereof, and increases possibility of use thereof. Furthermore, according to the present invention, functionalities of homologous genes of a gene of an industrially useful PPR protein showing amino acid polymorphism at a high level can be determined on the basis of difference of the target DNA base sequences thereof.
[0051] Furthermore, according to the present invention, a novel DNA-cleaving enzyme using a PPR motif can also be provided. That is, by linking a protein as a functional region with the PPR motif or PPR protein provided by the present invention, a complex containing a protein having a binding activity for a specific nucleic acid sequence, and a protein having a specific functionality can be prepared.
[0052] The functional region usable in the present invention is one that can impart, among various functions, a function for any one of cleavage, transcription, replication, restoration, synthesis, modification, etc. of DNA. By choosing the sequence of the PPR motifs, which is the characteristic of the present invention, to determine a base sequence of DNA as a target, almost all DNA sequences can be used as a target, and genome editing using a function of the functional region such as those for cleavage, transcription, replication, restoration, synthesis, modification, etc. of DNA can be realized with such a target.
[0053] For example, when the functional region has a function for cleaving DNA, a complex comprising a PPR protein part prepared according to the present invention and a DNA-cleaving region linked together is provided. Such a complex can function as an artificial DNA-cleaving enzyme, which recognizes a base sequence of DNA as a target with the PPR protein part, and then cleaves DNA with the region for cleaving DNA. When the functional region has a transcription control function, a complex comprising a PPR protein part prepared according to the present invention and a transcription control region for DNA linked together is provided. Such a complex can function as an artificial transcription control factor, which recognizes a base sequence of DNA as a target with the PPR protein part, and then promotes transcription of the target DNA.
[0054] The present invention can further be utilized for a method for delivering the aforementioned complex in a living body so that the complex functions in the living body, and preparation of transformants utilizing a nucleic acid sequence (DNA and RNA) encoding a protein obtained according to the present invention, as well as specific modification, control, and impartation of a function in various situations in organisms (cells, tissues, and individuals).
BRIEF DESCRIPTION OF THE DRAWINGS
[0055] FIGS. 1A-1C show conserved sequences and amino acid numbers of the PPR motif. FIG. 1A shows the amino acids constituting the PPR motif defined in the present invention, and the amino acid numbers thereof (the amino acid sequences P, S, L1, and L2 correspond to SEQ ID NO: 20, SEQ ID NO: 21, SEQ ID NO: 22, and SEQ ID NO: 23, respectively). FIG. 1B shows positions of three amino acids (No. 1 A.A., No. 4 A.A., and No. "ii" (-2) A.A.) that control binding base selectivity in the predicted structure. FIG. 1C shows two examples of the structure of the PPR motif, and the positions of the amino acids on the predicted structure for each case. No. 1 A.A., No. 4 A.A., and No. "ii" (-2) A.A. are indicated with sticks of magenta color (dark gray in the case of monochromatic display) in the conformational diagrams of the protein.
[0056] FIG. 2 summarizes the outlines of the structures of Arabidopsis thaliana p63 (amino acid sequence of SEQ ID NO: 1), the GUN1 protein of Arabidopsis thaliana (amino acid sequence of SEQ ID NO: 2), pTac2 of Arabidopsis thaliana (amino acid sequence of SEQ ID NO: 3), DG1 (amino acid sequence of SEQ ID NO: 4), and GRP23 of Arabidopsis thaliana (amino acid sequence of SEQ ID NO: 5), which are DNA-binding type PPR proteins that function in DNA metabolism, and the outline of the assay system for demonstrating that they bind to DNA.
[0057] FIG. 3 summarizes the amino acid frequencies of the amino acids at the three positions bearing the nucleic acid recognition codes in the PPR motif (No. 1 A.A., No. 4 A.A., and No. "ii" (-2) A.A.) for the PPR motifs of the PPR proteins (SEQ ID NOS: 1 to 5), for which DNA binding property was suggested, and known RNA-binding type motifs.
[0058] FIG. 4-1 shows the positions of the PPR motifs included in the inside of the proteins, and the positions of the three amino acids bearing the nucleic acid recognition codes (No. 1 A.A., No. 4 A.A., and No. "ii" (-2) A.A.) in the PPR motifs for each of (A) Arabidopsis thaliana p63 (amino acid sequence of SEQ ID NO: 1) and (B) the GUN1 protein of Arabidopsis thaliana (amino acid sequence of SEQ ID NO: 2).
[0059] FIG. 4-2 shows the positions of the PPR motifs included in the inside of the proteins, and the positions of the three amino acids bearing the nucleic acid recognition codes (No. 1 A.A., No. 4 A.A., and No. "ii" (-2) A.A.) in the PPR motifs for each of (C) pTac2 of Arabidopsis thaliana (amino acid sequence of SEQ ID NO: 3), and (D) DG1 (amino acid sequence of SEQ ID NO: 4).
[0060] FIG. 4-3 shows the positions of the PPR motifs included in the inside of the proteins, and the positions of the three amino acids bearing the nucleic acid recognition codes (No. 1 A.A., No. 4 A.A., and No. "ii" (-2) A.A.) in the PPR motifs for (E) GRP23 of Arabidopsis thaliana (amino acid sequence of SEQ ID NO: 5).
[0061] FIG. 5 shows the evaluation of the sequence-specific DNA-binding abilities of the PPR molecules. Artificial transcription factors were prepared by fusing each of three kinds of DNA-binding type (regarded so) PPR molecules with VP64, which is a transcription activation domain, and whether they could activate a luciferase reporter having each target sequence was examined in a human cultured cell.
[0062] FIG. 6 shows a comparison of the luciferase activities observed by cointroduction of pTac2-VP64 or GUN1-VP64 with pminCMV-luc2 as a negative control, or a reporter vector comprising 4 or 8 target sequences. As a result, a tendency was observed for the activity to increase with an increase in the number of target sequences for both molecules, and thus it was verified that these PPR-VP64 molecules specifically bound to each target sequence to function as a site-specific transcription activator.
MODES FOR CARRYING OUT THE INVENTION
[0063] [PPR Motif and PPR Protein]
[0064] The "PPR motif" referred to in the present invention means a polypeptide constituted with 30 to 38 amino acids and having an amino acid sequence that shows, when the amino acid sequence is analyzed with a protein domain search program on the web (for example, Pfam, Prosite, Uniprot, etc.), an E value not larger than a predetermined value (desirably E-03) obtained at PF01535 in the case of Pfam (http://pfam.sanger.ac.uk/), or PS51375 in the case of Prosite (http://www.expasy.org/prosite/), unless otherwise indicated. The PPR motifs in various proteins are also defined in the Uniprot database (http://www.uniprot.org).
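The two criteria of this definition, the length of 30 to 38 amino acids and the upper limit of the E value, can be expressed as a simple filter applied to the hits returned by a domain search. The following Python sketch assumes that "E-03" denotes an E value of 1e-3 and that the length and E value of each hit have already been obtained from a search against PF01535 or PS51375; the function name is_ppr_motif and its parameters are hypothetical and do not correspond to any Pfam or Prosite interface.

```python
def is_ppr_motif(length, e_value, max_e_value=1e-3):
    """Check a domain-search hit against the definition of the PPR motif.

    length      -- number of amino acids in the hit
    e_value     -- E value reported for the hit
    max_e_value -- predetermined upper limit (assumed here to be 1e-3)
    """
    return 30 <= length <= 38 and e_value <= max_e_value
```

Under these assumptions, a hit of 35 amino acids with an E value of 2e-5 would be regarded as a PPR motif, whereas a hit of 35 amino acids with an E value of 0.5 would not.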
[0065] Although the amino acid sequence of the PPR motif is not highly conserved in the PPR motif of the present invention, such a secondary structure of helix, loop, helix, and loop as shown by the following formula is conserved well.
[Formula 2]
(Helix A)-X-(Helix B)-L (Formula 1)
[0066] The position numbers of the amino acids constituting the PPR motif defined in the present invention are according to those defined in a paper of the inventors of the present invention (Kobayashi K, et al., Nucleic Acids Res., 40, 2712-2723 (2012)). That is, the position numbers of the amino acids constituting the PPR motif defined in the present invention are substantially the same as the amino acid numbers defined for PF01535 in Pfam, but correspond to numbers obtained by subtracting 2 from the amino acid numbers defined for PS51375 in Prosite (for example, position 1 according to the present invention is position 3 of PS51375), and also correspond to numbers obtained by subtracting 2 from the amino acid numbers of the PPR motif defined in Uniprot.
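The relation among the numbering systems described above amounts to a constant offset of 2. A minimal Python sketch of the conversion follows; the function names are hypothetical, and the sketch assumes that the offset applies uniformly to the positions being converted.

```python
# Offset between the position numbers of the present invention
# (substantially the same as PF01535 in Pfam) and those of
# PS51375 in Prosite or of the PPR motif defined in Uniprot.
OFFSET = 2


def to_prosite(position):
    """Position number of the present invention -> PS51375 position."""
    return position + OFFSET


def from_prosite(prosite_position):
    """PS51375 position -> position number of the present invention."""
    return prosite_position - OFFSET


# The Uniprot numbering differs by the same offset.
to_uniprot = to_prosite
from_uniprot = from_prosite
```

For example, No. 1 A.A. of the present invention corresponds to position 3 of PS51375, and No. 4 A.A. corresponds to position 6.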
[0067] More precisely, in the present invention, the No. 1 amino acid is the first amino acid from which Helix A shown in the formula 1 starts. The No. 4 amino acid is the fourth amino acid counted from the No. 1 amino acid. As for "ii" (-2)nd amino acid,
[0068] when a next PPR motif (M.sub.n+1) contiguously exists on the C-terminus side of the PPR motif (M.sub.n) (when there is no amino acid insertion between the PPR motifs, as in the cases of, for example, Motif Nos. 1, 2, 3, 4, 6 and 7 in FIG. 4-1 (A)), the -2nd amino acid counted from the end (C-terminus side) of the amino acids constituting the PPR motif (M.sub.n) is referred to as No. "ii" (-2) amino acid;
[0069] when a non-PPR motif (part that is not the PPR motif) consisting of 1 to 20 amino acids exists between the PPR motif (M.sub.n) and the next PPR motif (M.sub.n+1) on the C-terminus side (as in the cases of, for example, Motif Nos. 5 and 8 in FIG. 4-1 (A), and Motif Nos. 1, 2, 7 and 8 in FIG. 4-3 (D)), the amino acid located upstream of the first amino acid of the next PPR motif (M.sub.n+1) by 2 positions, i.e., the -2nd amino acid, is referred to as No. "ii" (-2) amino acid (refer to FIG. 1); or
[0070] when any next PPR motif (M.sub.n+1) does not exist on the C-terminus side of the PPR motif (M.sub.n) (as in the cases of, for example, Motif No. 9 in FIG. 4-1 (A), and Motif No. 11 in FIG. 4-1 (B)), or 21 or more amino acids constituting a non-PPR motif exist between the PPR motif (M.sub.n) and the next PPR motif (M.sub.n+1) on the C-terminus side, the 2nd amino acid counted from the end (C-terminus side) of the amino acids constituting the PPR motif (M.sub.n) is referred to as No. "ii" (-2) amino acid.
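The three cases above can be sketched as a function that returns the index of No. "ii" (-2) amino acid within the protein sequence. The following Python sketch uses 0-based indices, takes the index of the last amino acid of the PPR motif (M.sub.n) and the index of the first amino acid of the next PPR motif (M.sub.n+1), and interprets "upstream by 2 positions" as the start index of the next motif minus 2; the function name ii_index and this indexing convention are assumptions of the example.

```python
def ii_index(motif_end, next_start=None):
    """Return the 0-based index of No. "ii" (-2) A.A. for motif M_n.

    motif_end  -- index of the last amino acid of M_n
    next_start -- index of the first amino acid of M_n+1, or None
                  when no next PPR motif exists on the C-terminus side
    """
    if next_start is None:
        # No next motif: 2nd amino acid counted from the end of M_n.
        return motif_end - 1
    gap = next_start - motif_end - 1  # length of the non-PPR part
    if gap < 0:
        raise ValueError("motifs overlap")
    if gap == 0 or gap >= 21:
        # Contiguous motifs, or 21 or more intervening amino acids:
        # counted from the end of M_n.
        return motif_end - 1
    # 1 to 20 intervening amino acids: counted from the start of M_n+1.
    return next_start - 2
```

With contiguous motifs, the two ways of counting give the same amino acid. With an insertion of 1 to 20 amino acids, the returned index may fall within the non-PPR part.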
[0071] The "PPR protein" referred to in the present invention means a PPR protein having two or more of the aforementioned PPR motifs, unless otherwise indicated. The term "protein" used in this specification means any substance consisting of a polypeptide (chain consisting of two or more amino acids bound through peptide bonds), and also includes those consisting of a comparatively low molecular weight polypeptide, unless otherwise indicated. The "amino acid" referred to in the present invention means a usual amino acid molecule, as well as an amino acid residue constituting a peptide chain. Which the term means will be apparent to those skilled in the art from the context.
[0072] Many PPR proteins exist in plants, and 500 proteins and about 5000 motifs can be found in Arabidopsis thaliana. PPR motifs and PPR proteins of various amino acid sequences also exist in many land plants such as rice, poplar, and selaginella. It is known that some PPR proteins are important factors for obtaining F1 seeds for hybrid vigor as fertility restoration factors that are involved in formation of pollen (male gamete). It has been clarified that some PPR proteins are involved in speciation, similarly in fertility restoration. It has also been clarified that almost all the PPR proteins act on RNA in mitochondria or chloroplasts.
[0073] It is known that, in animals, an anomaly of the PPR protein identified as LRPPRC causes Leigh syndrome, French Canadian type (LSFC, Leigh's syndrome, subacute necrotizing encephalomyelopathy).
[0074] The term "selective" used for a property of a PPR motif for binding with a DNA base in the present invention means that a binding activity for any one base among the DNA bases is higher than binding activities for the other bases, unless otherwise indicated. Those skilled in the art can confirm this selectivity by planning an experiment, or it can also be obtained by calculation as described in the examples mentioned in this specification.
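This definition of selectivity can be sketched as a comparison of the binding activities measured or calculated for the four DNA bases. In the following Python sketch, the function name selective_base is hypothetical, the activities are supplied by the caller in any consistent unit, and a tie for the highest activity is treated as absence of selectivity, which is an assumption of the example.

```python
def selective_base(activity):
    """Return the base whose binding activity exceeds all others, else None.

    activity -- mapping of each DNA base ("A", "C", "G", "T") to a
                binding activity in any consistent unit
    """
    ranked = sorted(activity.items(), key=lambda item: item[1], reverse=True)
    (best_base, best), (_, runner_up) = ranked[0], ranked[1]
    # Selective only if strictly higher than every other base.
    return best_base if best > runner_up else None
```

The values passed to the function in any use of this sketch would be obtained by experiment or by the calculation described in the examples of this specification.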
[0075] The DNA base referred to in the present invention means a base of deoxyribonucleotide constituting DNA, and specifically, it means any of adenine (A), guanine (G), cytosine (C), and thymine (T), unless otherwise indicated. Although the PPR protein may have selectivity to a base in DNA, it does not bind to a nucleic acid monomer.
[0076] Although search methods for conserved amino acid sequence as the PPR motif had been established before the present invention was accomplished, any rule concerning selective binding with DNA base had not been discovered at all.
Findings Provided by the Present Invention
[0077] The following findings are provided by the present invention.
(I) Information about Positions of Amino Acids Important for Selective Binding
[0078] Specifically, under the following definitions:
the first amino acid of Helix A of the PPR motif is referred to as No. 1 amino acid (No. 1 A.A.), the fourth amino acid as No. 4 amino acid (No. 4 A.A.), and
[0079] when a next PPR motif (M.sub.n+1) contiguously exists on the C-terminus side of the PPR motif (M.sub.n) (when there is no amino acid insertion between the PPR motifs), the -2nd amino acid counted from the end (C-terminus side) of the amino acids constituting the PPR motif (M.sub.n);
[0080] when a non-PPR motif consisting of 1 to 20 amino acids exists between the PPR motif (M.sub.n) and the next PPR motif (M.sub.n+1) on the C-terminus side, the amino acid located upstream of the first amino acid of the next PPR motif (M.sub.n+1) by 2 positions, i.e., the -2nd amino acid; or
[0081] when any next PPR motif (M.sub.n+1) does not exist on the C-terminus side of the PPR motif (M.sub.n), or 21 or more amino acids constituting a non-PPR motif exist between the PPR motif (M.sub.n) and the next PPR motif (M.sub.n+1) on the C-terminus side, the 2nd amino acid counted from the end (C-terminus side) of the amino acids constituting the PPR motif (M.sub.n)
is referred to as No. "ii" (-2) amino acid (No. "ii" (-2) A.A.), combination of the three amino acids, the first and fourth amino acids of the helix (Helix A), No. 1 and No. 4 amino acids, and No. "ii" (-2) A.A. defined above (No. 1 A.A., No. 4 A.A. and No. "ii" (-2) A.A.) is important for selective binding to a DNA base, and to what kind of DNA base the motif binds can be determined on the basis of the combination.
[0082] The present invention is based on the findings concerning the combination of the three amino acids, No. 1 A.A., No. 4 A.A., and No. "ii" (-2) A.A., found by the inventors of the present invention. Specifically:
(1-1) when No. 4 A.A. is glycine (G), No. 1 A.A. may be an arbitrary amino acid, No. "ii" (-2) A.A. is aspartic acid (D), asparagine (N), or serine (S), and the combination of No. 1 A.A., and No. "ii" (-2) A.A. may be, for example:
[0083] a combination of an arbitrary amino acid and aspartic acid (D) (*GD),
[0084] preferably a combination of glutamic acid (E) and aspartic acid (D) (EGD),
[0085] a combination of an arbitrary amino acid and asparagine (N) (*GN),
[0086] preferably a combination of glutamic acid (E) and asparagine (N) (EGN), or
[0087] a combination of an arbitrary amino acid and serine (S) (*GS);
(1-2) when No. 4 A.A. is isoleucine (I), each of No. 1 A.A. and No. "ii" (-2) A.A. may be an arbitrary amino acid, and the combination of No. 1 A.A., and No. "ii" (-2) A.A. may be, for example:
[0088] a combination of an arbitrary amino acid and asparagine (N) (*IN);
(1-3) when No. 4 A.A. is leucine (L), each of No. 1 A.A. and No. "ii" (-2) A.A. may be an arbitrary amino acid, and the combination of No. 1 A.A., and No. "ii" (-2) A.A. may be, for example:
[0089] a combination of an arbitrary amino acid and aspartic acid (D) (*LD), or
[0090] a combination of an arbitrary amino acid and lysine (K) (*LK);
(1-4) when No. 4 A.A. is methionine (M), each of No. 1 A.A. and No. "ii" (-2) A.A. may be an arbitrary amino acid, and the combination of No. 1 A.A., and No. "ii" (-2) A.A. may be, for example:
[0091] a combination of an arbitrary amino acid and aspartic acid (D) (*MD), or
[0092] a combination of isoleucine (I) and aspartic acid (D) (IMD);
(1-5) when No. 4 A.A. is asparagine (N), each of No. 1 A.A. and No. "ii" (-2) A.A. may be an arbitrary amino acid, and the combination of No. 1 A.A., and No. "ii" (-2) A.A. may be, for example:
[0093] a combination of an arbitrary amino acid and aspartic acid (D) (*ND),
[0094] a combination of any one of phenylalanine (F), glycine (G), isoleucine (I), threonine (T), valine (V) and tyrosine (Y), and aspartic acid (D) (FND, GND, IND, TND, VND, or YND),
[0095] a combination of an arbitrary amino acid and asparagine (N) (*NN),
[0096] a combination of any one of isoleucine (I), serine (S) and valine (V), and asparagine (N) (INN, SNN or VNN),
[0097] a combination of an arbitrary amino acid and serine (S) (*NS),
[0098] a combination of valine (V) and serine (S) (VNS),
[0099] a combination of an arbitrary amino acid and threonine (T) (*NT),
[0100] a combination of valine (V) and threonine (T) (VNT),
[0101] a combination of an arbitrary amino acid and tryptophan (W) (*NW), or
[0102] a combination of isoleucine (I) and tryptophan (W) (INW);
(1-6) when No. 4 A.A. is proline (P), each of No. 1 A.A. and No. "ii" (-2) A.A. may be an arbitrary amino acid, and the combination of No. 1 A.A., and No. "ii" (-2) A.A. may be, for example:
[0103] a combination of an arbitrary amino acid and aspartic acid (D) (*PD),
[0104] a combination of phenylalanine (F) and aspartic acid (D) (FPD), or
[0105] a combination of tyrosine (Y) and aspartic acid (D) (YPD);
(1-7) when No. 4 A.A. is serine (S), each of No. 1 A.A. and No. "ii" (-2) A.A. may be an arbitrary amino acid, and the combination of No. 1 A.A., and No. "ii" (-2) A.A. may be, for example:
[0106] a combination of an arbitrary amino acid and asparagine (N) (*SN),
[0107] a combination of phenylalanine (F) and asparagine (N) (FSN), or
[0108] a combination of valine (V) and asparagine (N) (VSN);
(1-8) when No. 4 A.A. is threonine (T), each of No. 1 A.A. and No. "ii" (-2) A.A. may be an arbitrary amino acid, and the combination of No. 1 A.A., and No. "ii" (-2) A.A. may be, for example:
[0109] a combination of an arbitrary amino acid and aspartic acid (D) (*TD),
[0110] a combination of valine (V) and aspartic acid (D) (VTD),
[0111] a combination of an arbitrary amino acid and asparagine (N) (*TN),
[0112] a combination of phenylalanine (F) and asparagine (N) (FTN),
[0113] a combination of isoleucine (I) and asparagine (N) (ITN), or
[0114] a combination of valine (V) and asparagine (N) (VTN); and
(1-9) when No. 4 A.A. is valine (V), each of No. 1 A.A. and No. "ii" (-2) A.A. may be an arbitrary amino acid, and the combination of No. 1 A.A., and No. "ii" (-2) A.A. may be, for example:
[0115] a combination of isoleucine (I) and aspartic acid (D) (IVD),
[0116] a combination of an arbitrary amino acid and glycine (G) (*VG), or
[0117] a combination of an arbitrary amino acid and threonine (T) (*VT).
(II) Information about Correspondence of Combination of Three Amino Acids of No. 1 A.A., No. 4 A.A., and No. "ii" (-2) A.A., and DNA Base
[0118] The protein is a protein determined on the basis of, specifically, the following definitions, and having a selective DNA base-binding property:
(2-1) when the three amino acids, No. 1 A.A., No. 4 A.A., and No. "ii" (-2) A.A., are an arbitrary amino acid, glycine, and aspartic acid, respectively, the PPR motif selectively binds to G;
(2-2) when the three amino acids, No. 1 A.A., No. 4 A.A., and No. "ii" (-2) A.A., are glutamic acid, glycine, and aspartic acid, respectively, the PPR motif selectively binds to G;
(2-3) when the three amino acids, No. 1 A.A., No. 4 A.A., and No. "ii" (-2) A.A., are an arbitrary amino acid, glycine, and asparagine, respectively, the PPR motif selectively binds to A;
(2-4) when the three amino acids, No. 1 A.A., No. 4 A.A., and No. "ii" (-2) A.A., are glutamic acid, glycine, and asparagine, respectively, the PPR motif selectively binds to A;
(2-5) when the three amino acids, No. 1 A.A., No. 4 A.A., and No. "ii" (-2) A.A., are an arbitrary amino acid, glycine, and serine, respectively, the PPR motif selectively binds to A, and next binds to C;
(2-6) when the three amino acids, No. 1 A.A., No. 4 A.A., and No. "ii" (-2) A.A., are an arbitrary amino acid, isoleucine, and an arbitrary amino acid, respectively, the PPR motif selectively binds to T and C;
(2-7) when the three amino acids, No. 1 A.A., No. 4 A.A., and No. "ii" (-2) A.A., are an arbitrary amino acid, isoleucine, and asparagine, respectively, the PPR motif selectively binds to T, and next binds to C;
(2-8) when the three amino acids, No. 1 A.A., No. 4 A.A., and No. "ii" (-2) A.A., are an arbitrary amino acid, leucine, and an arbitrary amino acid, respectively, the PPR motif selectively binds to T and C;
(2-9) when the three amino acids, No. 1 A.A., No. 4 A.A., and No. "ii" (-2) A.A., are an arbitrary amino acid, leucine, and aspartic acid, respectively, the PPR motif selectively binds to C;
(2-10) when the three amino acids, No. 1 A.A., No. 4 A.A., and No. "ii" (-2) A.A., are an arbitrary amino acid, leucine, and lysine, respectively, the PPR motif selectively binds to T;
(2-11) when the three amino acids, No. 1 A.A., No. 4 A.A., and No. "ii" (-2) A.A., are an arbitrary amino acid, methionine, and an arbitrary amino acid, respectively, the PPR motif selectively binds to T;
(2-12) when the three amino acids, No. 1 A.A., No. 4 A.A., and No. "ii" (-2) A.A., are an arbitrary amino acid, methionine, and aspartic acid, respectively, the PPR motif selectively binds to T;
(2-13) when the three amino acids, No. 1 A.A., No. 4 A.A., and No. "ii" (-2) A.A., are isoleucine, methionine, and aspartic acid, respectively, the PPR motif selectively binds to T, and next binds to C;
(2-14) when the three amino acids, No. 1 A.A., No. 4 A.A., and No. "ii" (-2) A.A., are an arbitrary amino acid, asparagine, and an arbitrary amino acid, respectively, the PPR motif selectively binds to C and T;
(2-15) when the three amino acids, No. 1 A.A., No. 4 A.A., and No. "ii" (-2) A.A., are an arbitrary amino acid, asparagine, and aspartic acid, respectively, the PPR motif selectively binds to T;
(2-16) when the three amino acids, No. 1 A.A., No. 4 A.A., and No. "ii" (-2) A.A., are phenylalanine, asparagine, and aspartic acid, respectively, the PPR motif selectively binds to T;
(2-17) when the three amino acids, No. 1 A.A., No. 4 A.A., and No. "ii" (-2) A.A., are glycine, asparagine, and aspartic acid, respectively, the PPR motif selectively binds to T;
(2-18) when the three amino acids, No. 1 A.A., No. 4 A.A., and No. "ii" (-2) A.A., are isoleucine, asparagine, and aspartic acid, respectively, the PPR motif selectively binds to T;
(2-19) when the three amino acids, No. 1 A.A., No. 4 A.A., and No. "ii" (-2) A.A., are threonine, asparagine, and aspartic acid, respectively, the PPR motif selectively binds to T;
(2-20) when the three amino acids, No. 1 A.A., No. 4 A.A., and No. "ii" (-2) A.A., are valine, asparagine, and aspartic acid, respectively, the PPR motif selectively binds to T, and next binds to C;
(2-21) when the three amino acids, No. 1 A.A., No. 4 A.A., and No. "ii" (-2) A.A., are tyrosine, asparagine, and aspartic acid, respectively, the PPR motif selectively binds to T, and next binds to C;
(2-22) when the three amino acids, No. 1 A.A., No. 4 A.A., and No. "ii" (-2) A.A., are an arbitrary amino acid, asparagine, and asparagine, respectively, the PPR motif selectively binds to C;
(2-23) when the three amino acids, No. 1 A.A., No. 4 A.A., and No. "ii" (-2) A.A., are isoleucine, asparagine, and asparagine, respectively, the PPR motif selectively binds to C;
(2-24) when the three amino acids, No. 1 A.A., No. 4 A.A., and No. "ii" (-2) A.A., are serine, asparagine, and asparagine, respectively, the PPR motif selectively binds to C;
(2-25) when the three amino acids, No. 1 A.A., No. 4 A.A., and No. "ii" (-2) A.A., are valine, asparagine, and asparagine, respectively, the PPR motif selectively binds to C;
(2-26) when the three amino acids, No. 1 A.A., No. 4 A.A., and No. "ii" (-2) A.A., are an arbitrary amino acid, asparagine, and serine, respectively, the PPR motif selectively binds to C;
(2-27) when the three amino acids, No. 1 A.A., No. 4 A.A., and No. "ii" (-2) A.A., are valine, asparagine, and serine, respectively, the PPR motif selectively binds to C;
(2-28) when the three amino acids, No. 1 A.A., No. 4 A.A., and No. "ii" (-2) A.A., are an arbitrary amino acid, asparagine, and threonine, respectively, the PPR motif selectively binds to C;
(2-29) when the three amino acids, No. 1 A.A., No. 4 A.A., and No. "ii" (-2) A.A., are valine, asparagine, and threonine, respectively, the PPR motif selectively binds to C;
(2-30) when the three amino acids, No. 1 A.A., No. 4 A.A., and No. "ii" (-2) A.A., are an arbitrary amino acid, asparagine, and tryptophan, respectively, the PPR motif selectively binds to C, and next binds to T;
(2-31) when the three amino acids, No. 1 A.A., No. 4 A.A., and No. "ii" (-2) A.A., are isoleucine, asparagine, and tryptophan, respectively, the PPR motif selectively binds to T, and next binds to C;
(2-32) when the three amino acids, No. 1 A.A., No. 4 A.A., and No. "ii" (-2) A.A., are an arbitrary amino acid, proline, and an arbitrary amino acid, respectively, the PPR motif selectively binds to T;
(2-33) when the three amino acids, No. 1 A.A., No. 4 A.A., and No. "ii" (-2) A.A., are an arbitrary amino acid, proline, and aspartic acid, respectively, the PPR motif selectively binds to T;
(2-34) when the three amino acids, No. 1 A.A., No. 4 A.A., and No. "ii" (-2) A.A., are phenylalanine, proline, and aspartic acid, respectively, the PPR motif selectively binds to T;
(2-35) when the three amino acids, No. 1 A.A., No. 4 A.A., and No. "ii" (-2) A.A., are tyrosine, proline, and aspartic acid, respectively, the PPR motif selectively binds to T;
(2-36) when the three amino acids, No. 1 A.A., No. 4 A.A., and No. "ii" (-2) A.A., are an arbitrary amino acid, serine, and an arbitrary amino acid, respectively, the PPR motif selectively binds to A and G;
(2-37) when the three amino acids, No. 1 A.A., No. 4 A.A., and No. "ii" (-2) A.A., are an arbitrary amino acid, serine, and asparagine, respectively, the PPR motif selectively binds to A;
(2-38) when the three amino acids, No. 1 A.A., No. 4 A.A., and No. "ii" (-2) A.A., are phenylalanine, serine, and asparagine, respectively, the PPR motif selectively binds to A;
(2-39) when the three amino acids, No. 1 A.A., No. 4 A.A., and No. "ii" (-2) A.A., are valine, serine, and asparagine, respectively, the PPR motif selectively binds to A;
(2-40) when the three amino acids, No. 1 A.A., No. 4 A.A., and No. "ii" (-2) A.A., are an arbitrary amino acid, threonine, and an arbitrary amino acid, respectively, the PPR motif selectively binds to A and G;
(2-41) when the three amino acids, No. 1 A.A., No. 4 A.A., and No. "ii" (-2) A.A., are an arbitrary amino acid, threonine, and aspartic acid, respectively, the PPR motif selectively binds to G;
(2-42) when the three amino acids, No. 1 A.A., No. 4 A.A., and No. "ii" (-2) A.A., are valine, threonine, and aspartic acid, respectively, the PPR motif selectively binds to G;
(2-43) when the three amino acids, No. 1 A.A., No. 4 A.A., and No. "ii" (-2) A.A., are an arbitrary amino acid, threonine, and asparagine, respectively, the PPR motif selectively binds to A;
(2-44) when the three amino acids, No. 1 A.A., No. 4 A.A., and No. "ii" (-2) A.A., are phenylalanine, threonine, and asparagine, respectively, the PPR motif selectively binds to A;
(2-45) when the three amino acids, No. 1 A.A., No. 4 A.A., and No. "ii" (-2) A.A., are isoleucine, threonine, and asparagine, respectively, the PPR motif selectively binds to A;
(2-46) when the three amino acids, No. 1 A.A., No. 4 A.A., and No. "ii" (-2) A.A., are valine, threonine, and asparagine, respectively, the PPR motif selectively binds to A;
(2-47) when the three amino acids, No. 1 A.A., No. 4 A.A., and No. "ii" (-2) A.A., are an arbitrary amino acid, valine, and an arbitrary amino acid, respectively, the PPR motif binds with A, C, and T, but does not bind to G;
(2-48) when the three amino acids, No. 1 A.A., No. 4 A.A., and No. "ii" (-2) A.A., are isoleucine, valine, and aspartic acid, respectively, the PPR motif selectively binds to C, and next binds to A;
(2-49) when the three amino acids, No. 1 A.A., No. 4 A.A., and No. "ii" (-2) A.A., are an arbitrary amino acid, valine, and glycine, respectively, the PPR motif selectively binds to C; and
(2-50) when the three amino acids, No. 1 A.A., No. 4 A.A., and No. "ii" (-2) A.A., are an arbitrary amino acid, valine, and threonine, respectively, the PPR motif selectively binds to T.
[0119] The correspondence between a combination of amino acids at the specific positions and the binding property with a DNA base can be confirmed by experiments. Experiments for such purposes include preparation of a PPR motif or a protein containing two or more PPR motifs, preparation of a substrate DNA, and a binding property test (for example, gel shift assay). These experiments are well known to those skilled in the art, and as for more specific procedures and conditions, for example, Patent document 2 can be referred to.
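For illustration, the correspondence of items (2-1) to (2-50) above can be held in a lookup table keyed by the three-letter code (No. 1 A.A., No. 4 A.A., No. "ii" (-2) A.A.), with "*" standing for an arbitrary amino acid. The sketch below is not part of the specification. The names are hypothetical, and the rule of trying the most specific code first (exact code, then "*" at No. 1, then "*" at No. 1 and No. "ii" (-2)) is an assumption made for the example. The order of the bases in each value follows the wording of the item, and the distinction between "binds to X and Y" and "binds to X, and next binds to Y" is not preserved.

```python
from typing import Optional

# Code (1, 4, ii) -> bases, transcribed from items (2-1) to (2-50).
CODE_TO_BASES = {
    "*GD": "G", "EGD": "G", "*GN": "A", "EGN": "A", "*GS": "AC",
    "*I*": "TC", "*IN": "TC", "*L*": "TC", "*LD": "C", "*LK": "T",
    "*M*": "T", "*MD": "T", "IMD": "TC",
    "*N*": "CT", "*ND": "T", "FND": "T", "GND": "T", "IND": "T",
    "TND": "T", "VND": "TC", "YND": "TC",
    "*NN": "C", "INN": "C", "SNN": "C", "VNN": "C",
    "*NS": "C", "VNS": "C", "*NT": "C", "VNT": "C",
    "*NW": "CT", "INW": "TC",
    "*P*": "T", "*PD": "T", "FPD": "T", "YPD": "T",
    "*S*": "AG", "*SN": "A", "FSN": "A", "VSN": "A",
    "*T*": "AG", "*TD": "G", "VTD": "G",
    "*TN": "A", "FTN": "A", "ITN": "A", "VTN": "A",
    "*V*": "ACT", "IVD": "CA", "*VG": "C", "*VT": "T",
}


def bases_for_motif(no1: str, no4: str, ii: str) -> Optional[str]:
    """Look up the bases for one motif, most specific code first (assumed order)."""
    for code in (no1 + no4 + ii, "*" + no4 + ii, "*" + no4 + "*"):
        if code in CODE_TO_BASES:
            return CODE_TO_BASES[code]
    return None  # combination not covered by items (2-1) to (2-50)


print(bases_for_motif("V", "N", "D"))  # matches the exact code VND
print(bases_for_motif("A", "N", "D"))  # falls back to *ND
```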
[0120] [Use of PPR Motif and PPR Protein]
[0121] Identification and Design
[0122] One PPR motif recognizes one specific kind of DNA base, and two or more contiguous PPR motifs can recognize continuous bases in a DNA sequence. Further, according to the present invention, by appropriately choosing amino acids at specific positions, PPR motifs selective for each of A, T, G, and C can be chosen or designed, and a protein containing an appropriate continuation of such PPR motifs can recognize a corresponding specific sequence. Therefore, according to the present invention, a naturally occurring PPR protein that selectively binds to DNA having a specific base sequence can be predicted or identified, or conversely, DNA as a target of binding of a PPR protein can be predicted and identified. Prediction or identification of such a target is useful for clarifying genetic identity of the target, and is also useful from a viewpoint that such prediction or identification may expand applicability of the target.
[0123] Furthermore, according to the present invention, a PPR motif that can selectively bind to a desired DNA base, and a protein having two or more PPR motifs that can bind to a desired DNA in a sequence-specific manner can be designed. In such design, as for the part other than the amino acids at the important positions in the PPR motif, sequence information on PPR motifs of naturally occurring type in DNA-binding type PPR proteins such as those of SEQ ID NOS: 1 to 5 can be referred to. Further, the motif or protein may also be designed by using a motif or protein of naturally occurring type as a whole, and replacing only the amino acids of the corresponding positions. Although the number of repetitions of PPR motifs can be appropriately chosen according to a target sequence, it may be, for example, 2 or more, preferably 2 to 30, more preferably 5 to 25, most preferably 9 to 15.
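As a minimal sketch of such a design step, one code can be assigned to each base of a target sequence. The particular choice of one code per base below (*GD for G, *GN for A, *ND for T, *NN for C) is only an illustrative assumption taken from the single-base items (2-1), (2-3), (2-15) and (2-22); other codes listed above could be chosen instead, and the function name is hypothetical. The sketch says nothing about the remaining residues of each motif.

```python
from typing import List

# One illustrative code (No. 1, No. 4, No. "ii" (-2)) per base; "*" = arbitrary.
# These picks are assumptions for the example, not a recommendation of the text.
PREFERRED_CODE = {"G": "*GD", "A": "*GN", "T": "*ND", "C": "*NN"}


def design_codes(target: str) -> List[str]:
    """Return one code per base of the target DNA sequence, in the order given."""
    target = target.upper()
    if len(target) < 2:
        raise ValueError("a sequence-specific design needs 2 or more motifs")
    unknown = sorted(set(target) - set(PREFERRED_CODE))
    if unknown:
        raise ValueError(f"unsupported base(s): {unknown}")
    return [PREFERRED_CODE[base] for base in target]


print(design_codes("GATC"))
```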
[0124] In the designing, amino acids other than those of the combination of the amino acids of No. 1 A.A., No. 4 A.A., and No. "ii" (-2) A.A. may be taken into consideration. For example, selection of the amino acids of No. 8 and No. 12 described in Patent document 2 mentioned above may be important for exhibiting a DNA-binding activity. According to the research of the inventors of the present invention, the No. 8 amino acid of a certain PPR motif and the No. 12 amino acid of the same PPR motif may cooperate in binding with DNA. The No. 8 amino acid may be a basic amino acid, preferably lysine, or an acidic amino acid, preferably aspartic acid, and the No. 12 amino acid may be a basic amino acid, neutral amino acid, or hydrophobic amino acid.
[0125] A designed motif or protein can be prepared by methods well known to those skilled in the art. That is, the present invention provides a PPR motif that selectively binds to a specific DNA base, and a PPR protein that specifically binds to DNA having a specific sequence, in which attention is paid to the combination of the amino acids of No. 1 A.A., No. 4 A.A., and No. "ii" (-2) A.A. Such a motif and protein can be prepared even in a comparatively large amount by methods well known to those skilled in the art, and such methods may comprise determining a nucleic acid sequence encoding a target motif or protein from the amino acid sequence of the target motif or protein, cloning it, and preparing a transformant that produces the target motif or protein.
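The step of determining a nucleic acid sequence from the amino acid sequence can be sketched as a simple back-translation. The example below is an assumption-laden illustration rather than the method of the invention: it uses one fixed codon of the standard genetic code per amino acid, chosen arbitrarily here, and ignores codon usage of the host, restriction sites, and the like. The names are hypothetical.

```python
# One codon of the standard genetic code per amino acid (arbitrary picks for
# illustration; a real design would usually follow the host's codon usage).
CODON = {
    "A": "GCT", "R": "CGT", "N": "AAC", "D": "GAC", "C": "TGC",
    "Q": "CAG", "E": "GAA", "G": "GGT", "H": "CAC", "I": "ATC",
    "L": "CTG", "K": "AAA", "M": "ATG", "F": "TTC", "P": "CCG",
    "S": "TCT", "T": "ACC", "W": "TGG", "Y": "TAC", "V": "GTT",
}


def back_translate(protein: str) -> str:
    """Return a DNA sequence encoding the given one-letter amino acid sequence."""
    try:
        return "".join(CODON[residue] for residue in protein.upper())
    except KeyError as err:
        raise ValueError(f"unknown amino acid: {err.args[0]}") from None


print(back_translate("MKV"))
```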
[0126] Preparation of Complex and Use Thereof
[0127] The PPR motif or PPR protein provided by the present invention can be made into a complex by binding a functional region. The functional region generally refers to a part having a function, such as a specific biological function exerted in a living body or cell (for example, enzymatic function, catalytic function, inhibitory function, promotion function, etc.), or a function as a marker. Such a region consists of, for example, a protein, peptide, nucleic acid, physiologically active substance, or drug.
[0128] According to the present invention, by binding a functional region to the PPR protein, the target DNA sequence-binding function exerted by the PPR protein, and the function exerted by the functional region can be exhibited in combination. For example, if a protein having a DNA-cleaving function (for example, restriction enzyme such as FokI) or a nuclease domain thereof is used as the functional region, the complex can function as an artificial DNA-cleaving enzyme.
[0129] In order to produce such a complex, methods generally available in this technical field can be used. Known methods include a method of synthesizing such a complex as one protein molecule, a method of separately synthesizing two or more proteins and then combining them to form a complex, and so forth.
[0130] In the case of the method of synthesizing a complex as one protein molecule, for example, a protein complex can be designed so as to comprise a PPR protein and a cleaving enzyme bound to the C-terminus of the PPR protein via an amino acid linker, an expression vector structure for expressing the protein complex can be constructed, and the target complex can be expressed from the structure. As such a preparation method, the method described in Japanese Patent Application No. 2011-242250, and so forth can be used.
[0131] For binding the PPR protein and the functional region protein, any binding means known in this technical field may be used, including binding via an amino acid linker, binding utilizing specific affinity such as binding between avidin and biotin, binding utilizing another chemical linker, and so forth.
[0132] The functional region usable in the present invention refers to a region that can impart any one of various functions such as those for cleavage, transcription, replication, restoration, synthesis, or modification of DNA, and so forth. By choosing the sequence of the PPR motif to define a DNA base sequence as a target, which is the characteristic of the present invention, substantially any DNA sequence may be used as the target, and with such a target, genome edition utilizing the function of the functional region such as those for cleavage, transcription, replication, restoration, synthesis, or modification of DNA can be realized.
[0133] For example, when the function of the functional region is a DNA cleavage function, there is provided a complex comprising a PPR protein part prepared according to the present invention and a DNA cleavage region bound together. Such a complex can function as an artificial DNA-cleaving enzyme that recognizes a base sequence of DNA as a target by the PPR protein part, and then cleaves DNA by the DNA cleavage region.
[0134] An example of the functional region having a cleavage function usable for the present invention is a deoxyribonuclease (DNase), which functions as an endodeoxyribonuclease. As such a DNase, for example, endodeoxyribonucleases such as DNase A (e.g., bovine pancreatic ribonuclease A, PDB 2AAS), DNase H and DNase I, restriction enzymes derived from various bacteria (for example, FokI (SEQ ID NO: 6) etc.) and nuclease domains thereof can be used. Such a complex comprising a PPR protein and a functional region does not exist in nature, and is novel.
[0135] When the function of the functional region is a transcription control function, there is provided a complex comprising a PPR protein part prepared according to the present invention and a DNA transcription control region bound together. Such a complex can function as an artificial transcription control factor, which recognizes a base sequence of DNA as a target by the PPR protein part, and then controls transcription of the target DNA.
[0136] The functional region having a transcription control function usable for the present invention may be a domain that activates transcription, or may be a domain that suppresses transcription. Examples of the transcription control domain include VP16, VP64, TA2, STAT-6, and p65. Such a complex comprising a PPR protein and a transcription control domain does not exist in nature, and is novel.
[0137] Further, the complex obtainable according to the present invention may deliver a functional region in a living body or cell in a DNA sequence-specific manner, and allow it to function. It thereby makes it possible to perform modification or disruption in a DNA sequence-specific manner in a living body or cell, like protein complexes utilizing a zinc finger protein (Non-patent documents 1 and 2 mentioned above) or TAL effector (Non-patent document 3 and Patent document 1 mentioned above), and thus it becomes possible to impart a novel function, i.e., function for cleavage of DNA and genome edition utilizing that function. Specifically, with a PPR protein comprising two or more PPR motifs that can bind with a specific base linked together, a specific DNA sequence can be recognized. Then, genome edition of the recognized DNA region can be realized by the functional region bound to the PPR protein using the function of the functional region.
[0138] Furthermore, by binding a drug to the PPR protein that binds to a DNA sequence in a DNA sequence-specific manner, the drug may be delivered to the neighborhood of the DNA sequence as the target. Therefore, the present invention provides a method for DNA sequence-specific delivery of a functional substance.
[0139] It has been clarified that the PPR protein used as a material in the present invention works to specify an edition position for DNA edition, and such a PPR motif having specific amino acids arranged at the positions of the residues of No. 1 A.A., No. 4 A.A., and No. "ii" (-2) A.A. recognizes a specific base on DNA, and then exhibits the DNA-binding activity thereof. On the basis of such a characteristic, a PPR protein of this type that has specific amino acids arranged at the positions of the residues of No. 1 A.A., No. 4 A.A., and No. "ii" (-2) A.A. can be expected to recognize a base on DNA specific to each PPR protein, and as a result, to introduce base polymorphism, or to be used in a treatment of a disease or condition resulting from a base polymorphism. In addition, it is considered that the combination of such a PPR protein with such another functional region as mentioned above contributes to modification or improvement of functions for realizing cleavage of DNA for genome edition.
[0140] Moreover, an exogenous DNA-cleaving enzyme can be fused to the C-terminus of the PPR protein. Alternatively, by improving binding DNA base selectivity of the PPR motif on the N-terminus side, a DNA sequence-specific DNA-cleaving enzyme can also be constituted. Moreover, such a complex to which a marker part such as GFP is bound can also be used for visualization of a desired DNA in vivo.
EXAMPLES
Example 1: Collection of PPR Proteins and Target Sequences Thereof Used for DNA Edition
[0141] By referring to the information provided in the prior art references (Non-patent documents 11 to 15), structures and functions of the p63 protein (SEQ ID NO: 1), GUN1 protein (SEQ ID NO: 2), pTac2 protein (SEQ ID NO: 3), DG1 protein (SEQ ID NO: 4), and GRP23 protein (SEQ ID NO: 5) were analyzed.
[0142] To the PPR motif structures in such proteins, amino acid numbers defined in the present invention were imparted together with the information of the Uniprot database (http://www.uniprot.org/). The PPR motifs contained in the five kinds of PPR proteins of Arabidopsis thaliana (SEQ ID NOS: 1 to 5) used for the experiment, and the amino acid numbers thereof are shown in FIG. 3.
[0143] Specifically, amino acid frequencies for the amino acids at the three positions (No. 1 A.A., No. 4 A.A., and No. "ii" (-2) A.A.) responsible for the nucleic acid recognition codes in the PPR motifs considered to be important at the time of targeting RNA in the aforementioned p63 protein (SEQ ID NO: 1), GUN1 protein (SEQ ID NO: 2), pTac2 protein (SEQ ID NO: 3), DG1 protein (SEQ ID NO: 4), and GRP23 protein (SEQ ID NO: 5) were compared with those of RNA-binding type motifs.
[0144] The p63 protein of Arabidopsis thaliana (SEQ ID NO: 1) has 9 PPR motifs, and the positions of the residues of No. 1 A.A., No. 4 A.A., and No. "ii" (-2) A.A. in the amino acid sequence are as summarized in the following table and FIG. 3.
TABLE-US-00001 TABLE 1
                                              Code         Base to be bound (ratio)
                A.sub.1   A.sub.4   L.sub.ii   (1, 4, ii)   A      C      G      T
PPR motif 1     230, V    233, R    263, S     *R*          0.25   0.07   0.06   0.62
PPR motif 2     265, F    268, D    297, S     *D*          0.25   0.24   0.23   0.29
PPR motif 3     299, L    302, K    322, D     *KD          0.20   0.18   0.28   0.34
PPR motif 4     334, Q    377, A    367, N     *AN          0.45   0.18   0.05   0.32
PPR motif 5     369, R    372, K    399, Y     *K*          0.17   0.32   0.23   0.29
PPR motif 6     401, E    404, L    434, S     *LS          0.22   0.37   0.06   0.34
PPR motif 7     436, S    439, S    469, E     *SE          0.58   0.07   0.10   0.25
PPR motif 8     471, T    474, D    505, M     *D*          0.25   0.24   0.23   0.29
PPR motif 9     507, N    510, M    540, R     *M*          0.13   0.14   0.22   0.51
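As an illustration of how the ratios of Table 1 may be read, the sketch below takes, for each of the nine PPR motifs of the p63 protein, the base having the highest ratio. The row values are transcribed from Table 1, and the function name is hypothetical. The resulting string is merely the most probable base per motif under this simple reading and is not asserted to be the actual target sequence of the p63 protein.

```python
# (code, (A, C, G, T) ratios) for PPR motifs 1 to 9 of p63, from Table 1.
P63_TABLE1 = [
    ("*R*", (0.25, 0.07, 0.06, 0.62)),
    ("*D*", (0.25, 0.24, 0.23, 0.29)),
    ("*KD", (0.20, 0.18, 0.28, 0.34)),
    ("*AN", (0.45, 0.18, 0.05, 0.32)),
    ("*K*", (0.17, 0.32, 0.23, 0.29)),
    ("*LS", (0.22, 0.37, 0.06, 0.34)),
    ("*SE", (0.58, 0.07, 0.10, 0.25)),
    ("*D*", (0.25, 0.24, 0.23, 0.29)),
    ("*M*", (0.13, 0.14, 0.22, 0.51)),
]

BASES = "ACGT"


def most_probable_bases(rows) -> str:
    """Return, motif by motif, the base with the highest ratio."""
    return "".join(
        BASES[max(range(4), key=lambda i: ratios[i])] for _, ratios in rows
    )


print(most_probable_bases(P63_TABLE1))  # prints TTTACCATT
```

Several motifs (for example, motifs 2 and 8) have nearly flat ratios, so the base reported for them carries little selectivity.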
[0145] The GUN1 protein of Arabidopsis thaliana (SEQ ID NO: 2) has 11 PPR motifs, and the positions of the residues of No. 1 A.A., No. 4 A.A., and No. "ii" (-2) A.A. in the amino acid sequence are as summarized in the following table and FIG. 3.
TABLE-US-00002 TABLE 2
                                              Code         Base to be bound (ratio)
                A.sub.1   A.sub.4   L.sub.ii   (1, 4, ii)   A      C      G      T
PPR motif 1     234, K    237, S    267, T     *S*          0.41   0.12   0.22   0.25
PPR motif 2     269, Y    272, S    302, N     *SN          0.62   0.07   0.04   0.26
PPR motif 3     304, V    307, N    338, D     VND          0.06   0.21   0.06   0.66
PPR motif 4     340, I    343, N    373, D     IND          0.14   0.24   0.12   0.50
PPR motif 5     375, F    378, N    408, N     FNN          0.24   0.21   0.24   0.31
PPR motif 6     410, V    413, S    443, D     VSD          0.33   0.24   0.23   0.20
PPR motif 7     445, V    448, N    478, D     VND          0.06   0.21   0.06   0.66
PPR motif 8     480, V    483, N    513, N     VNN          0.17   0.48   0.09   0.26
PPR motif 9     515, L    518, S    548, D     *SD          0.20   0.17   0.39   0.24
PPR motif 10    550, V    553, S    583, N     VSN          0.57   0.09   0.05   0.30
PPR motif 11    585, V    588, N    620, A     *N*          0.10   0.33   0.10   0.48
[0146] The pTac2 protein of Arabidopsis thaliana (SEQ ID NO: 3) has 15 PPR motifs, and the positions of the residues of No. 1 A.A., No. 4 A.A., and No. "ii" (-2) A.A. in the amino acid sequence are as summarized in the following table and FIG. 3.
TABLE-US-00003 TABLE 3
                                              Code         Base to be bound
                A.sub.1   A.sub.4   L.sub.ii   (1, 4, ii)   A      C      G      T
PPR motif 1     106, N    109, A    140, N     *AN          0.45   0.18   0.05   0.32
PPR motif 2     142, H    145, T    175, S     *TS          0.37   0.29   0.15   0.19
PPR motif 3     177, F    180, T    210, S     *TS          0.37   0.29   0.15   0.19
PPR motif 4     212, L    215, N    246, D     LND          0.08   0.15   0.23   0.54
PPR motif 5     248, V    251, N    281, D     VND          0.06   0.21   0.06   0.66
PPR motif 6     283, T    286, S    316, D     TSD          0.14   0.18   0.14   0.54
PPR motif 7     318, T    321, N    351, N     TNN          0.08   0.49   0.17   0.26
PPR motif 8     353, N    356, S    386, D     *SD          0.20   0.17   0.39   0.24
PPR motif 9     388, A    491, N    421, D     AND          0.07   0.05   0.14   0.74
PPR motif 10    423, E    426, E    456, S     B.G.         0.25   0.21   0.18   0.36
PPR motif 11    458, K    461, T    491, S     *TS          0.37   0.29   0.15   0.19
PPR motif 12    493, E    496, H    526, N     *H*          0.17   0.34   0.06   0.43
PPR motif 13    528, D    531, N    561, D     *ND          0.11   0.17   0.10   0.62
PPR motif 14    563, R    566, E    596, S     B.G.         0.25   0.21   0.18   0.36
PPR motif 15    598, M    601, C    631, I     *C*          0.55   0.10   0.21   0.14
(B.G. means background)
[0147] The DG1 protein of Arabidopsis thaliana (SEQ ID NO: 4) has 10 PPR motifs, and the positions of the residues of No. 1 A.A., No. 4 A.A., and No. "ii" (-2) A.A. in the amino acid sequence are as summarized in the following table and FIG. 3.
TABLE-US-00004 TABLE 4
                                              Code         Base to be bound
                A.sub.1   A.sub.4   L.sub.ii   (1, 4, ii)   A      C      G      T
PPR motif 1     256, F    259, T    290, D     *TD          0.10   0.10   0.67   0.13
PPR motif 2     292, A    295, H    340, D     *H*          0.17   0.34   0.06   0.43
PPR motif 3     342, V    345, N    375, N     VNN          0.17   0.48   0.09   0.26
PPR motif 4     377, A    380, G    410, K     *G*          0.29   0.13   0.31   0.27
PPR motif 5     412, I    415, K    445, T     *K*          0.17   0.32   0.23   0.29
PPR motif 6     477, S    450, Y    481, L     B.G.         0.25   0.21   0.18   0.36
PPR motif 7     483, I    486, T    515, N     ITN          0.79   0.06   0.05   0.10
PPR motif 8     517, G    520, N    553, N     *NN          0.12   0.44   0.13   0.30
PPR motif 9     555, Y    558, S    588, D     YSD          0.25   0.15   0.39   0.20
PPR motif 10    590, T    593, A    623, H     *AH          0.41   0.08   0.07   0.45
(B.G. means background)
[0148] The GRP23 protein of Arabidopsis thaliana (SEQ ID NO: 5) has 11 PPR motifs, and the positions of the residues of No. 1 A.A., No. 4 A.A., and No. "ii" (-2) A.A. in the amino acid sequence are as summarized in the following table and FIG. 3.
TABLE-US-00005 TABLE 5
                                              Code         Base to be bound
                A.sub.1   A.sub.4   L.sub.ii   (1, 4, ii)   A      C      G      T
PPR motif 1     181, F    184, N    215, N     FNN          0.24   0.21   0.24   0.31
PPR motif 2     217, V    220, N    251, S     VNS          0.07   0.61   0.05   0.27
PPR motif 3     253, V    256, R    286, D     *RD          0.25   0.07   0.06   0.62
PPR motif 4     288, T    291, N    321, D     TND          0.14   0.08   0.07   0.71
PPR motif 5     323, I    326, A    356, H     *AH          0.41   0.08   0.07   0.45
PPR motif 6     358, P    361, N    396, N     *NN          0.12   0.44   0.13   0.30
PPR motif 7     398, D    401, G    435, D     *GD          0.09   0.09   0.59   0.25
PPR motif 8     437, L    404, C    470, D     *CD          0.30   0.15   0.35   0.20
PPR motif 9     472, P    475, R    505, V     *R*          0.25   0.07   0.06   0.62
PPR motif 10    507, D    510, A    540, D     *AD          0.10   0.22   0.39   0.29
PPR motif 11    542, S    545, D    575, T     *D*          0.25   0.24   0.23   0.29
(B.G. means background)
[0149] The amino acid frequencies for these positions were confirmed for each protein, and compared with the amino acid frequencies for the same positions of the RNA-binding type motifs. The results are shown in FIG. 2. It became clear that the tendencies of the amino acid frequencies in the PPR motifs of the PPR proteins for which DNA-binding property is suggested, and those in the RNA-binding type motifs substantially agreed with each other. That is, it became clear that the PPR proteins that act to bind to DNA bind with nucleic acids according to the same sequence rules as those of the PPR proteins that act to bind to RNA, and the RNA recognition codes described in the pending patent application of the inventors of the present invention (PCT/JP2012/077274) can be applied as the DNA recognition codes of the PPR proteins that act to bind to DNA.
[0150] With reference to the RNA recognition codes described in the non-patent document (Yagi, Y. et al., PLoS One, 2013, 8, e57286), the DNA-binding type PPR motifs that selectively bind to each corresponding base were evaluated. More precisely, a chi-square test was performed to compare the observed nucleotide frequencies shown in Table 6 with the expected nucleotide frequencies calculated from the background frequencies. The test was performed for each base (NT), for purine or pyrimidine (AG or CT, PY), for hydrogen bond group (AT or GC, HB), and for amino or keto form (AC or GT). A significant value was defined as P<0.05 (5E-02, 5% significance level), and when a significant value was obtained in any of the tests, the combination of No. 1 amino acid, No. 4 amino acid, and No. "ii" (-2) amino acid was chosen.
TABLE 6
Base selectivity of DNA-binding code
NSRs         Occurrence     Probability matrix        Subtraction for background
(1, 4, ii)   of the NSR(s)  A     C     G     T       A      C      G      T
*GD          14             0.10  0.06  0.57  0.28    -0.16  -0.15   0.40  -0.08
EGD          8              0.07  0.05  0.69  0.19    -0.19  -0.16   0.52  -0.17
*GN          11             0.55  0.10  0.04  0.31     0.29  -0.11  -0.13  -0.05
EGN          5              0.63  0.06  0.05  0.25     0.37  -0.15  -0.12  -0.11
*GS          3              0.57  0.23  0.06  0.14     0.31   0.02  -0.11  -0.22
*I*          15             0.15  0.29  0.10  0.45    -0.11   0.08  -0.07   0.09
*IN          4              0.17  0.28  0.06  0.50    -0.09   0.07  -0.11   0.14
*L*          23             0.20  0.30  0.03  0.47    -0.06   0.09  -0.14   0.11
*LD          5              0.19  0.47  0.05  0.28    -0.07   0.26  -0.12  -0.08
*LK          3              0.09  0.08  0.06  0.77    -0.17  -0.13  -0.11   0.41
*M*          10             0.14  0.15  0.15  0.56    -0.12  -0.06  -0.02   0.20
*MD          9              0.15  0.13  0.17  0.55    -0.11  -0.08   0.00   0.19
IMD          4              0.09  0.24  0.06  0.62    -0.17   0.03  -0.11   0.26
*N*          147            0.11  0.33  0.10  0.45    -0.15   0.12  -0.07   0.09
*ND          72             0.11  0.18  0.10  0.61    -0.15  -0.03  -0.07   0.25
FND          13             0.23  0.19  0.10  0.49    -0.03  -0.02  -0.07   0.13
GND          3              0.09  0.08  0.06  0.77    -0.17  -0.13  -0.11   0.41
IND          5              0.22  0.13  0.05  0.60    -0.04  -0.08  -0.12   0.24
TND          3              0.15  0.08  0.06  0.72    -0.11  -0.13  -0.11   0.36
VND          23             0.06  0.25  0.06  0.63    -0.20   0.04  -0.11   0.27
YND          6              0.08  0.30  0.11  0.52    -0.18   0.09  -0.06   0.16
*NN          34             0.15  0.45  0.14  0.27    -0.11   0.24  -0.03  -0.09
INN          7              0.12  0.49  0.05  0.34    -0.14   0.28  -0.12  -0.02
SNN          3              0.09  0.60  0.06  0.24    -0.17   0.39  -0.11  -0.12
VNN          10             0.20  0.53  0.04  0.23    -0.06   0.32  -0.13  -0.13
*NS          13             0.11  0.47  0.07  0.36    -0.15   0.26  -0.10   0.00
VNS          5              0.08  0.66  0.05  0.21    -0.18   0.45  -0.12  -0.15
*NT          13             0.12  0.52  0.13  0.24    -0.14   0.31  -0.04  -0.12
VNT          5              0.08  0.57  0.05  0.30    -0.18   0.36  -0.12  -0.06
*NW          11             0.14  0.32  0.13  0.41    -0.12   0.11  -0.04   0.05
INW          3              0.09  0.29  0.06  0.56    -0.17   0.08  -0.11   0.20
*P*          17             0.10  0.06  0.11  0.73    -0.16  -0.15  -0.06   0.37
*PD          9              0.06  0.09  0.10  0.75    -0.20  -0.12  -0.07   0.39
FPD          3              0.09  0.08  0.06  0.77    -0.17  -0.13  -0.11   0.41
YPD          3              0.09  0.08  0.06  0.77    -0.17  -0.13  -0.11   0.41
*S*          49             0.38  0.13  0.20  0.29     0.12  -0.08   0.03  -0.07
*SN          18             0.63  0.08  0.05  0.24     0.37  -0.13  -0.12  -0.12
FSN          7              0.63  0.13  0.08  0.16     0.37  -0.08  -0.09  -0.20
VSN          6              0.60  0.10  0.05  0.25     0.34  -0.11  -0.12  -0.11
*T*          86             0.45  0.09  0.31  0.15     0.19  -0.12   0.14  -0.21
*TD          32             0.13  0.12  0.61  0.14    -0.13  -0.09   0.44  -0.22
VTD          7              0.07  0.06  0.67  0.20    -0.19  -0.15   0.50  -0.16
*TN          31             0.66  0.08  0.13  0.13     0.40  -0.13  -0.04  -0.23
FTN          4              0.75  0.07  0.06  0.12     0.49  -0.14  -0.11  -0.24
ITN          5              0.77  0.06  0.05  0.11     0.51  -0.15  -0.12  -0.25
VTN          10             0.63  0.13  0.15  0.09     0.37  -0.08  -0.02  -0.27
*V*          48             0.29  0.21  0.08  0.43     0.03   0.00  -0.09   0.07
IVD          3              0.31  0.50  0.06  0.14     0.05   0.29  -0.11  -0.22
*VG          5              0.22  0.48  0.05  0.25    -0.04   0.27  -0.12  -0.11
*VT          4              0.25  0.07  0.06  0.62    -0.01  -0.14  -0.11   0.26
Background frequency        0.26  0.21  0.17  0.36
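The chi-square evaluation described in paragraph [0150] can be sketched as follows. This is an illustration under stated assumptions, not the inventors' actual procedure: the observed counts are reconstructed as occurrence × frequency from Table 6 (the true counts are not given, and the tabulated frequencies may include smoothing), the function names are hypothetical, and the "AC/GT" label for the amino/keto test is a placeholder. The upper-tail probability uses the closed forms for 1 and 3 degrees of freedom so that no external library is needed.

```python
import math

# Background frequencies from the last row of Table 6.
BACKGROUND = {"A": 0.26, "C": 0.21, "G": 0.17, "T": 0.36}

# Groupings named in the text: per base, purine/pyrimidine,
# hydrogen bond group, and amino/keto form.
TESTS = {
    "NT": ("A", "C", "G", "T"),
    "PY": ("AG", "CT"),
    "HB": ("AT", "GC"),
    "AC/GT": ("AC", "GT"),  # label is a placeholder, not from the source
}


def chi_square_sf(stat, df):
    """Upper-tail probability of the chi-square distribution (df = 1 or 3)."""
    tail = math.erfc(math.sqrt(stat / 2))
    if df == 1:
        return tail
    if df == 3:
        return tail + math.sqrt(2 * stat / math.pi) * math.exp(-stat / 2)
    raise ValueError("only df = 1 and df = 3 are implemented")


def goodness_of_fit(freqs, n, groups):
    """Chi-square statistic and P value of observed vs. background counts."""
    stat = 0.0
    for group in groups:
        observed = n * sum(freqs[b] for b in group)   # assumed count
        expected = n * sum(BACKGROUND[b] for b in group)
        stat += (observed - expected) ** 2 / expected
    return stat, chi_square_sf(stat, len(groups) - 1)


def is_selective(freqs, n, alpha=0.05):
    """True when any of the four tests gives P < alpha."""
    return any(goodness_of_fit(freqs, n, g)[1] < alpha for g in TESTS.values())


# Row "*GD" of Table 6: 14 occurrences.
gd = {"A": 0.10, "C": 0.06, "G": 0.57, "T": 0.28}
stat, p = goodness_of_fit(gd, 14, TESTS["NT"])
print(f"chi2 = {stat:.2f}, P = {p:.4f}, selective = {is_selective(gd, 14)}")
```

Because the counts are reconstructed, the P values from this sketch should be read as indicative only.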
[0151] The combinations of amino acids that showed significant base selectivity are listed in Table 1. That is, these results mean that the PPR motifs having the amino acid species of the No. 1 amino acid, No. 4 amino acid, and No. "ii" (-2) amino acid ("NSRs (1, 4, ii)" in the table) that gave a significant P value are PPR motifs that impart base-selective binding ability, and that a larger positive value after subtraction of the background means higher selectivity for that base. Of the three amino acids, the No. 4 amino acid affects the base selectivity most strongly, the No. "ii" (-2) amino acid next most strongly, and the No. 1 amino acid most weakly.
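As a small worked illustration of the background subtraction, the sketch below recomputes the "Subtraction for background" columns of Table 6 for one code and reports the base with the largest positive value. The function names are hypothetical, and picking the single largest value is a simplification for illustration.

```python
# Background frequencies from the last row of Table 6.
BACKGROUND = {"A": 0.26, "C": 0.21, "G": 0.17, "T": 0.36}


def subtract_background(freqs, background=BACKGROUND):
    """Observed frequency minus background frequency, rounded as in Table 6."""
    return {base: round(freqs[base] - background[base], 2) for base in "ACGT"}


def preferred_base(freqs):
    """Base with the largest background-subtracted value, and that value."""
    diffs = subtract_background(freqs)
    base = max(diffs, key=diffs.get)
    return base, diffs[base]


# Row "*TD" of Table 6.
td = {"A": 0.13, "C": 0.12, "G": 0.61, "T": 0.14}
print(subtract_background(td))  # {'A': -0.13, 'C': -0.09, 'G': 0.44, 'T': -0.22}
print(preferred_base(td))       # ('G', 0.44)
```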
Example 2: Evaluation of Sequence-Specific DNA-Binding Ability of PPR Molecules
[0152] In this example, artificial transcription factors were prepared by fusing VP64, which is a transcription activation domain, to each of three kinds of PPR molecules expected to be of the DNA-binding type: p63, pTac2, and GUN1. Whether the PPR molecules had a sequence-specific DNA-binding ability was determined by examining whether each fusion could activate a luciferase reporter having the corresponding target sequence in a human cultured cell (FIG. 5).
Experimental Method
[0153] 1. Preparation of PPR-VP64 Expression Vector
[0154] Only the parts of the coding sequences of p63, pTac2, and GUN1 that correspond to the PPR motifs were prepared by artificial synthesis. For the DNA synthesis, the artificial gene synthesis service of Biomatik was used. The pCS2P vector having the CMV promoter was used as the backbone vector, and each synthesized PPR sequence was inserted into it. Further, a Flag tag and a nuclear localization signal were inserted at the N-terminus of the PPR sequence, and the VP64 sequence was inserted at its C-terminus. The resulting sequences of p63-VP64, pTac2-VP64, and GUN1-VP64 are shown in Sequence Listing as SEQ ID NOS: 7 to 9.
[0155] 2. Preparation of Reporter Vector Having PPR Target Sequence
[0156] A reporter vector (pminCMV-luc2, SEQ ID NO: 10) was prepared, in which the firefly luciferase gene was ligated downstream of the minimal CMV promoter, and a multi-cloning site was placed upstream of the promoter. The predicted target sequence of each PPR was inserted into the vector at the multi-cloning site. The target sequence of each PPR (TCTATCACT for p63, AACTTTCGTCACTCA for pTac2, and AATTTGTCGAT for GUN1, SEQ ID NOS: 11 to 13 in Sequence Listing) was determined by predicting the motif-DNA recognition codes of the DNA-binding type PPR from the motif-RNA recognition codes observed in the RNA-binding type PPR. For each PPR, sequences containing 4 or 8 copies of the target sequence were prepared and used in the following assay. The nucleotide sequences of the vectors are shown as SEQ ID NOS: 14 to 19 in Sequence Listing.
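A minimal sketch of assembling a multi-copy target insert is given below. The text does not state how the 4 or 8 copies are arranged or whether spacers separate them (the actual vector sequences are SEQ ID NOS: 14 to 19), so the head-to-tail arrangement, the optional `spacer` parameter, and the function name `tandem_insert` are assumptions made purely for illustration.

```python
# Predicted target sequences given in the text (SEQ ID NOS: 11 to 13).
TARGETS = {
    "p63": "TCTATCACT",
    "pTac2": "AACTTTCGTCACTCA",
    "GUN1": "AATTTGTCGAT",
}


def tandem_insert(target, copies, spacer=""):
    """Join `copies` head-to-tail repeats of `target`, optionally spaced.

    The arrangement and the spacer are assumptions; the source only says
    that sequences containing 4 or 8 target sequences were prepared.
    """
    if copies < 1:
        raise ValueError("copies must be at least 1")
    if set(target + spacer) - set("ACGT"):
        raise ValueError("sequences must contain only A, C, G, T")
    return spacer.join([target] * copies)


for name, target in TARGETS.items():
    for copies in (4, 8):
        insert = tandem_insert(target, copies)
        print(f"{name} {copies}x: {len(insert)} nt")
```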
[0157] 3. Transfection into HEK293T Cells
[0158] The PPR-VP64 expression vector prepared in section 1, the firefly luciferase expression vector prepared in section 2, and the pRL-CMV vector (expression vector for Renilla luciferase, Promega) as a reference were introduced by using Lipofectamine LTX (Life Technologies). DMEM medium (25 µl) was added to each well of a 96-well plate, and a mixture containing the PPR-VP64 expression vector (400 ng), the firefly luciferase expression vector (100 ng), and the pRL-CMV vector (20 ng) was further added. Then, a mixture of DMEM medium (25 µl) and Lipofectamine LTX (0.7 µl) was added to each well, the plate was left standing at room temperature for 30 minutes, 6×10^4 HEK293T cells suspended in DMEM medium containing 15% fetal bovine serum (100 µl) were added, and the cells were cultured at 37°C in a CO2 incubator for 24 hours.
[0159] 4. Luciferase Assay
[0160] The luciferase assay was performed by using the Dual-Glo Luciferase Assay System (Promega) in accordance with the instructions attached to the kit. The luciferase activity was measured with a TriStar LB 941 plate reader (Berthold).
[0161] (Results and Discussion)
[0162] The luciferase activity was compared between cells into which pTac2-VP64 or GUN1-VP64 was introduced together with either pminCMV-luc2 as a negative control or the reporter vector having 4 or 8 target sequences (table below, FIG. 6). The comparison was based on standardized scores obtained by dividing the value measured for Fluc (firefly luciferase) by the value measured for Rluc (Renilla luciferase) as the reference (Fluc/Rluc). In both cases, the activity tended to increase with the number of target sequences, and thus it was verified that each of the PPR-VP64 molecules specifically bound to its target sequence and functioned as a site-specific transcription activator.
                               Fluc reporter    PPR-VP64    Reference  Fluc    Rluc  Fluc/Rluc    Fold activation
pTac2-VP64 (negative control)  pminCMV-luc2     pTac2-VP64  pRL-CMV    47744   4948  9.649151172  1
pTac2-VP64 (4x target)         pTac2-4x target  pTac2-VP64  pRL-CMV    133465  4757  28.05654824  2.907670089
pTac2-VP64 (8x target)         pTac2-8x target  pTac2-VP64  pRL-CMV    189146  4011  47.15681875  4.887146849
GUN1-VP64 (negative control)   pminCMV-luc2     GUN1-VP64   pRL-CMV    29590   3799  7.788891814  1
GUN1-VP64 (4x target)          GUN1-4x target   GUN1-VP64   pRL-CMV    61070   2727  22.39457279  2.875193715
GUN1-VP64 (8x target)          GUN1-8x target   GUN1-VP64   pRL-CMV    66982   2731  24.52654705  3.14891356
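The normalization behind the table can be reproduced with a few lines of arithmetic. The sketch below divides each Fluc reading by its Rluc reading and then divides by the negative-control ratio for the same PPR-VP64 construct; the data layout and the function name `fold_activation` are illustrative choices, while the readings themselves are copied from the table.

```python
# (PPR-VP64 construct, reporter condition, Fluc, Rluc) from the table above.
MEASUREMENTS = [
    ("pTac2-VP64", "negative control", 47744, 4948),
    ("pTac2-VP64", "4x target", 133465, 4757),
    ("pTac2-VP64", "8x target", 189146, 4011),
    ("GUN1-VP64", "negative control", 29590, 3799),
    ("GUN1-VP64", "4x target", 61070, 2727),
    ("GUN1-VP64", "8x target", 66982, 2731),
]


def fold_activation(measurements):
    """Fluc/Rluc ratio of each sample relative to its negative control."""
    ratios = {(ppr, cond): fluc / rluc for ppr, cond, fluc, rluc in measurements}
    return {
        (ppr, cond): ratio / ratios[(ppr, "negative control")]
        for (ppr, cond), ratio in ratios.items()
    }


for (ppr, cond), fold in fold_activation(MEASUREMENTS).items():
    print(f"{ppr} ({cond}): {fold:.2f}-fold")
```

Normalizing to Rluc corrects for well-to-well differences in transfection efficiency and cell number, which is why the fold activation is computed from the ratios rather than from the raw Fluc readings.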
Sequence Listing
231596PRTArabidopsis thaliana 1Met Phe Ala Leu Ser Lys Val Leu Arg Arg Thr
Gln Arg Leu Arg Leu1 5 10
15Gly Ala Cys Ser Ala Val Phe Ser Lys Asp Ile Gln Leu Gly Gly Glu
20 25 30Arg Ser Phe Asp Ser Asn Ser
Ile Ala Ser Thr Lys Arg Glu Ala Val 35 40
45Pro Arg Phe Tyr Glu Ile Ser Ser Leu Ser Asn Arg Ala Leu Ser
Ser 50 55 60Ser Ala Gly Thr Lys Ser
Asp Gln Glu Glu Asp Asp Leu Glu Asp Gly65 70
75 80Phe Ser Glu Leu Glu Gly Ser Lys Ser Gly Gln
Gly Ser Thr Ser Ser 85 90
95Asp Glu Asp Glu Gly Lys Leu Ser Ala Asp Glu Glu Glu Glu Glu Glu
100 105 110Leu Asp Leu Ile Glu Thr
Asp Val Ser Arg Lys Thr Val Glu Lys Lys 115 120
125Gln Ser Glu Leu Phe Lys Thr Ile Val Ser Ala Pro Gly Leu
Ser Ile 130 135 140Gly Ser Ala Leu Asp
Lys Trp Val Glu Glu Gly Asn Glu Ile Thr Arg145 150
155 160Val Glu Ile Ala Lys Ala Met Leu Gln Leu
Arg Arg Arg Arg Met Tyr 165 170
175Gly Arg Ala Leu Gln Met Ser Glu Trp Leu Glu Ala Asn Lys Lys Ile
180 185 190Glu Met Thr Glu Arg
Asp Tyr Ala Ser Arg Leu Asp Leu Thr Val Lys 195
200 205Ile Arg Gly Leu Glu Lys Gly Glu Ala Cys Met Gln
Lys Ile Pro Lys 210 215 220Ser Phe Lys
Gly Glu Val Leu Tyr Arg Thr Leu Leu Ala Asn Cys Val225
230 235 240Ala Ala Gly Asn Val Lys Lys
Ser Glu Leu Val Phe Asn Lys Met Lys 245
250 255Asp Leu Gly Phe Pro Leu Ser Gly Phe Thr Cys Asp
Gln Met Leu Leu 260 265 270Leu
His Lys Arg Ile Asp Arg Lys Lys Ile Ala Asp Val Leu Leu Leu 275
280 285Met Glu Lys Glu Asn Ile Lys Pro Ser
Leu Leu Thr Tyr Lys Ile Leu 290 295
300Ile Asp Val Lys Gly Ala Thr Asn Asp Ile Ser Gly Met Glu Gln Ile305
310 315 320Leu Glu Thr Met
Lys Asp Glu Gly Val Glu Leu Asp Phe Gln Thr Gln 325
330 335Ala Leu Thr Ala Arg His Tyr Ser Gly Ala
Gly Leu Lys Asp Lys Ala 340 345
350Glu Lys Val Leu Lys Glu Met Glu Gly Glu Ser Leu Glu Ala Asn Arg
355 360 365Arg Ala Phe Lys Asp Leu Leu
Ser Ile Tyr Ala Ser Leu Gly Arg Glu 370 375
380Asp Glu Val Lys Arg Ile Trp Lys Ile Cys Glu Ser Lys Pro Tyr
Phe385 390 395 400Glu Glu
Ser Leu Ala Ala Ile Gln Ala Phe Gly Lys Leu Asn Lys Val
405 410 415Gln Glu Ala Glu Ala Ile Phe
Glu Lys Ile Val Lys Met Asp Arg Arg 420 425
430Ala Ser Ser Ser Thr Tyr Ser Val Leu Leu Arg Val Tyr Val
Asp His 435 440 445Lys Met Leu Ser
Lys Gly Lys Asp Leu Val Lys Arg Met Ala Glu Ser 450
455 460Gly Cys Arg Ile Glu Ala Thr Thr Trp Asp Ala Leu
Ile Lys Leu Tyr465 470 475
480Val Glu Ala Gly Glu Val Glu Lys Ala Asp Ser Leu Leu Asp Lys Ala
485 490 495Ser Lys Gln Ser His
Thr Lys Leu Met Met Asn Ser Phe Met Tyr Ile 500
505 510Met Asp Glu Tyr Ser Lys Arg Gly Asp Val His Asn
Thr Glu Lys Ile 515 520 525Phe Leu
Lys Met Arg Glu Ala Gly Tyr Thr Ser Arg Leu Arg Gln Phe 530
535 540Gln Ala Leu Met Gln Ala Tyr Ile Asn Ala Lys
Ser Pro Ala Tyr Gly545 550 555
560Met Arg Asp Arg Leu Lys Ala Asp Asn Ile Phe Pro Asn Lys Ser Met
565 570 575Ala Ala Gln Leu
Ala Gln Gly Asp Pro Phe Lys Lys Thr Ala Ile Ser 580
585 590Asp Ile Leu Asp 5952918PRTArabidopsis
thaliana 2Met Ala Ser Thr Pro Pro His Trp Val Thr Thr Thr Asn Asn His
Arg1 5 10 15Pro Trp Leu
Pro Gln Arg Pro Arg Pro Gly Arg Ser Val Thr Ser Ala 20
25 30Pro Pro Ser Ser Ser Ala Ser Val Ser Ser
Ala His Leu Ser Gln Thr 35 40
45Thr Pro Asn Phe Ser Pro Leu Gln Thr Pro Lys Ser Asp Phe Ser Gly 50
55 60Arg Gln Ser Thr Arg Phe Val Ser Pro
Ala Thr Asn Asn His Arg Gln65 70 75
80Thr Arg Gln Asn Pro Asn Tyr Asn His Arg Pro Tyr Gly Ala
Ser Ser 85 90 95Ser Pro
Arg Gly Ser Ala Pro Pro Pro Ser Ser Val Ala Thr Val Ala 100
105 110Pro Ala Gln Leu Ser Gln Pro Pro Asn
Phe Ser Pro Leu Gln Thr Pro 115 120
125Lys Ser Asp Leu Ser Ser Asp Phe Ser Gly Arg Arg Ser Thr Arg Phe
130 135 140Val Ser Lys Met His Phe Gly
Arg Gln Lys Thr Thr Met Ala Thr Arg145 150
155 160His Ser Ser Ala Ala Glu Asp Ala Leu Gln Asn Ala
Ile Asp Phe Ser 165 170
175Gly Asp Asp Glu Met Phe His Ser Leu Met Leu Ser Phe Glu Ser Lys
180 185 190Leu Cys Gly Ser Asp Asp
Cys Thr Tyr Ile Ile Arg Glu Leu Gly Asn 195 200
205Arg Asn Glu Cys Asp Lys Ala Val Gly Phe Tyr Glu Phe Ala
Val Lys 210 215 220Arg Glu Arg Arg Lys
Asn Glu Gln Gly Lys Leu Ala Ser Ala Met Ile225 230
235 240Ser Thr Leu Gly Arg Tyr Gly Lys Val Thr
Ile Ala Lys Arg Ile Phe 245 250
255Glu Thr Ala Phe Ala Gly Gly Tyr Gly Asn Thr Val Tyr Ala Phe Ser
260 265 270Ala Leu Ile Ser Ala
Tyr Gly Arg Ser Gly Leu His Glu Glu Ala Ile 275
280 285Ser Val Phe Asn Ser Met Lys Glu Tyr Gly Leu Arg
Pro Asn Leu Val 290 295 300Thr Tyr Asn
Ala Val Ile Asp Ala Cys Gly Lys Gly Gly Met Glu Phe305
310 315 320Lys Gln Val Ala Lys Phe Phe
Asp Glu Met Gln Arg Asn Gly Val Gln 325
330 335Pro Asp Arg Ile Thr Phe Asn Ser Leu Leu Ala Val
Cys Ser Arg Gly 340 345 350Gly
Leu Trp Glu Ala Ala Arg Asn Leu Phe Asp Glu Met Thr Asn Arg 355
360 365Arg Ile Glu Gln Asp Val Phe Ser Tyr
Asn Thr Leu Leu Asp Ala Ile 370 375
380Cys Lys Gly Gly Gln Met Asp Leu Ala Phe Glu Ile Leu Ala Gln Met385
390 395 400Pro Val Lys Arg
Ile Met Pro Asn Val Val Ser Tyr Ser Thr Val Ile 405
410 415Asp Gly Phe Ala Lys Ala Gly Arg Phe Asp
Glu Ala Leu Asn Leu Phe 420 425
430Gly Glu Met Arg Tyr Leu Gly Ile Ala Leu Asp Arg Val Ser Tyr Asn
435 440 445Thr Leu Leu Ser Ile Tyr Thr
Lys Val Gly Arg Ser Glu Glu Ala Leu 450 455
460Asp Ile Leu Arg Glu Met Ala Ser Val Gly Ile Lys Lys Asp Val
Val465 470 475 480Thr Tyr
Asn Ala Leu Leu Gly Gly Tyr Gly Lys Gln Gly Lys Tyr Asp
485 490 495Glu Val Lys Lys Val Phe Thr
Glu Met Lys Arg Glu His Val Leu Pro 500 505
510Asn Leu Leu Thr Tyr Ser Thr Leu Ile Asp Gly Tyr Ser Lys
Gly Gly 515 520 525Leu Tyr Lys Glu
Ala Met Glu Ile Phe Arg Glu Phe Lys Ser Ala Gly 530
535 540Leu Arg Ala Asp Val Val Leu Tyr Ser Ala Leu Ile
Asp Ala Leu Cys545 550 555
560Lys Asn Gly Leu Val Gly Ser Ala Val Ser Leu Ile Asp Glu Met Thr
565 570 575Lys Glu Gly Ile Ser
Pro Asn Val Val Thr Tyr Asn Ser Ile Ile Asp 580
585 590Ala Phe Gly Arg Ser Ala Thr Met Asp Arg Ser Ala
Asp Tyr Ser Asn 595 600 605Gly Gly
Ser Leu Pro Phe Ser Ser Ser Ala Leu Ser Ala Leu Thr Glu 610
615 620Thr Glu Gly Asn Arg Val Ile Gln Leu Phe Gly
Gln Leu Thr Thr Glu625 630 635
640Ser Asn Asn Arg Thr Thr Lys Asp Cys Glu Glu Gly Met Gln Glu Leu
645 650 655Ser Cys Ile Leu
Glu Val Phe Arg Lys Met His Gln Leu Glu Ile Lys 660
665 670Pro Asn Val Val Thr Phe Ser Ala Ile Leu Asn
Ala Cys Ser Arg Cys 675 680 685Asn
Ser Phe Glu Asp Ala Ser Met Leu Leu Glu Glu Leu Arg Leu Phe 690
695 700Asp Asn Lys Val Tyr Gly Val Val His Gly
Leu Leu Met Gly Gln Arg705 710 715
720Glu Asn Val Trp Leu Gln Ala Gln Ser Leu Phe Asp Lys Val Asn
Glu 725 730 735Met Asp Gly
Ser Thr Ala Ser Ala Phe Tyr Asn Ala Leu Thr Asp Met 740
745 750Leu Trp His Phe Gly Gln Lys Arg Gly Ala
Glu Leu Val Ala Leu Glu 755 760
765Gly Arg Ser Arg Gln Val Trp Glu Asn Val Trp Ser Asp Ser Cys Leu 770
775 780Asp Leu His Leu Met Ser Ser Gly
Ala Ala Arg Ala Met Val His Ala785 790
795 800Trp Leu Leu Asn Ile Arg Ser Ile Val Tyr Glu Gly
His Glu Leu Pro 805 810
815Lys Val Leu Ser Ile Leu Thr Gly Trp Gly Lys His Ser Lys Val Val
820 825 830Gly Asp Gly Ala Leu Arg
Arg Ala Val Glu Val Leu Leu Arg Gly Met 835 840
845Asp Ala Pro Phe His Leu Ser Lys Cys Asn Met Gly Arg Phe
Thr Ser 850 855 860Ser Gly Ser Val Val
Ala Thr Trp Leu Arg Glu Ser Ala Thr Leu Lys865 870
875 880Leu Leu Ile Leu His Asp His Ile Thr Thr
Ala Thr Ala Thr Thr Thr 885 890
895Thr Met Lys Ser Thr Asp Gln Gln Gln Arg Lys Gln Thr Ser Phe Ala
900 905 910Leu Gln Pro Leu Leu
Leu 9153862PRTArabidopsis thaliana 3Met Asn Leu Ala Ile Pro Asn
Pro Asn Ser His His Leu Ser Phe Leu1 5 10
15Ile Gln Asn Ser Ser Phe Ile Gly Asn Arg Arg Phe Ala
Asp Gly Asn 20 25 30Arg Leu
Arg Phe Leu Ser Gly Gly Asn Arg Lys Pro Cys Ser Phe Ser 35
40 45Gly Lys Ile Lys Ala Lys Thr Lys Asp Leu
Val Leu Gly Asn Pro Ser 50 55 60Val
Ser Val Glu Lys Gly Lys Tyr Ser Tyr Asp Val Glu Ser Leu Ile65
70 75 80Asn Lys Leu Ser Ser Leu
Pro Pro Arg Gly Ser Ile Ala Arg Cys Leu 85
90 95Asp Ile Phe Lys Asn Lys Leu Ser Leu Asn Asp Phe
Ala Leu Val Phe 100 105 110Lys
Glu Phe Ala Gly Arg Gly Asp Trp Gln Arg Ser Leu Arg Leu Phe 115
120 125Lys Tyr Met Gln Arg Gln Ile Trp Cys
Lys Pro Asn Glu His Ile Tyr 130 135
140Thr Ile Met Ile Ser Leu Leu Gly Arg Glu Gly Leu Leu Asp Lys Cys145
150 155 160Leu Glu Val Phe
Asp Glu Met Pro Ser Gln Gly Val Ser Arg Ser Val 165
170 175Phe Ser Tyr Thr Ala Leu Ile Asn Ala Tyr
Gly Arg Asn Gly Arg Tyr 180 185
190Glu Thr Ser Leu Glu Leu Leu Asp Arg Met Lys Asn Glu Lys Ile Ser
195 200 205Pro Ser Ile Leu Thr Tyr Asn
Thr Val Ile Asn Ala Cys Ala Arg Gly 210 215
220Gly Leu Asp Trp Glu Gly Leu Leu Gly Leu Phe Ala Glu Met Arg
His225 230 235 240Glu Gly
Ile Gln Pro Asp Ile Val Thr Tyr Asn Thr Leu Leu Ser Ala
245 250 255Cys Ala Ile Arg Gly Leu Gly
Asp Glu Ala Glu Met Val Phe Arg Thr 260 265
270Met Asn Asp Gly Gly Ile Val Pro Asp Leu Thr Thr Tyr Ser
His Leu 275 280 285Val Glu Thr Phe
Gly Lys Leu Arg Arg Leu Glu Lys Val Cys Asp Leu 290
295 300Leu Gly Glu Met Ala Ser Gly Gly Ser Leu Pro Asp
Ile Thr Ser Tyr305 310 315
320Asn Val Leu Leu Glu Ala Tyr Ala Lys Ser Gly Ser Ile Lys Glu Ala
325 330 335Met Gly Val Phe His
Gln Met Gln Ala Ala Gly Cys Thr Pro Asn Ala 340
345 350Asn Thr Tyr Ser Val Leu Leu Asn Leu Phe Gly Gln
Ser Gly Arg Tyr 355 360 365Asp Asp
Val Arg Gln Leu Phe Leu Glu Met Lys Ser Ser Asn Thr Asp 370
375 380Pro Asp Ala Ala Thr Tyr Asn Ile Leu Ile Glu
Val Phe Gly Glu Gly385 390 395
400Gly Tyr Phe Lys Glu Val Val Thr Leu Phe His Asp Met Val Glu Glu
405 410 415Asn Ile Glu Pro
Asp Met Glu Thr Tyr Glu Gly Ile Ile Phe Ala Cys 420
425 430Gly Lys Gly Gly Leu His Glu Asp Ala Arg Lys
Ile Leu Gln Tyr Met 435 440 445Thr
Ala Asn Asp Ile Val Pro Ser Ser Lys Ala Tyr Thr Gly Val Ile 450
455 460Glu Ala Phe Gly Gln Ala Ala Leu Tyr Glu
Glu Ala Leu Val Ala Phe465 470 475
480Asn Thr Met His Glu Val Gly Ser Asn Pro Ser Ile Glu Thr Phe
His 485 490 495Ser Leu Leu
Tyr Ser Phe Ala Arg Gly Gly Leu Val Lys Glu Ser Glu 500
505 510Ala Ile Leu Ser Arg Leu Val Asp Ser Gly
Ile Pro Arg Asn Arg Asp 515 520
525Thr Phe Asn Ala Gln Ile Glu Ala Tyr Lys Gln Gly Gly Lys Phe Glu 530
535 540Glu Ala Val Lys Thr Tyr Val Asp
Met Glu Lys Ser Arg Cys Asp Pro545 550
555 560Asp Glu Arg Thr Leu Glu Ala Val Leu Ser Val Tyr
Ser Phe Ala Arg 565 570
575Leu Val Asp Glu Cys Arg Glu Gln Phe Glu Glu Met Lys Ala Ser Asp
580 585 590Ile Leu Pro Ser Ile Met
Cys Tyr Cys Met Met Leu Ala Val Tyr Gly 595 600
605Lys Thr Glu Arg Trp Asp Asp Val Asn Glu Leu Leu Glu Glu
Met Leu 610 615 620Ser Asn Arg Val Ser
Asn Ile His Gln Val Ile Gly Gln Met Ile Lys625 630
635 640Gly Asp Tyr Asp Asp Asp Ser Asn Trp Gln
Ile Val Glu Tyr Val Leu 645 650
655Asp Lys Leu Asn Ser Glu Gly Cys Gly Leu Gly Ile Arg Phe Tyr Asn
660 665 670Ala Leu Leu Asp Ala
Leu Trp Trp Leu Gly Gln Lys Glu Arg Ala Ala 675
680 685Arg Val Leu Asn Glu Ala Thr Lys Arg Gly Leu Phe
Pro Glu Leu Phe 690 695 700Arg Lys Asn
Lys Leu Val Trp Ser Val Asp Val His Arg Met Ser Glu705
710 715 720Gly Gly Met Tyr Thr Ala Leu
Ser Val Trp Leu Asn Asp Ile Asn Asp 725
730 735Met Leu Leu Lys Gly Asp Leu Pro Gln Leu Ala Val
Val Val Ser Val 740 745 750Arg
Gly Gln Leu Glu Lys Ser Ser Ala Ala Arg Glu Ser Pro Ile Ala 755
760 765Lys Ala Ala Phe Ser Phe Leu Gln Asp
His Val Ser Ser Ser Phe Ser 770 775
780Phe Thr Gly Trp Asn Gly Gly Arg Ile Met Cys Gln Arg Ser Gln Leu785
790 795 800Lys Gln Leu Leu
Ser Thr Lys Glu Pro Thr Ser Glu Glu Ser Glu Asn 805
810 815Lys Asn Leu Val Ala Leu Ala Asn Ser Pro
Ile Phe Ala Ala Gly Thr 820 825
830Arg Ala Ser Thr Ser Ser Asp Thr Asn His Ser Gly Asn Pro Thr Gln
835 840 845Arg Arg Thr Arg Thr Lys Lys
Glu Leu Ala Gly Ser Thr Ala 850 855
8604798PRTArabidopsis thaliana 4Met Asp Ala Ser Val Val Arg Phe Ser Gln
Ser Pro Ala Arg Val Pro1 5 10
15Pro Glu Phe Glu Pro Asp Met Glu Lys Ile Lys Arg Arg Leu Leu Lys
20 25 30Tyr Gly Val Asp Pro Thr
Pro Lys Ile Leu Asn Asn Leu Arg Lys Lys 35 40
45Glu Ile Gln Lys His Asn Arg Arg Thr Lys Arg Glu Thr Glu
Ser Glu 50 55 60Ala Glu Val Tyr Thr
Glu Ala Gln Lys Gln Ser Met Glu Glu Glu Ala65 70
75 80Arg Phe Gln Thr Leu Arg Arg Glu Tyr Lys
Gln Phe Thr Arg Ser Ile 85 90
95Ser Gly Lys Arg Gly Gly Asp Val Gly Leu Met Val Gly Asn Pro Trp
100 105 110Glu Gly Ile Glu Arg
Val Lys Leu Lys Glu Leu Val Ser Gly Val Arg 115
120 125Arg Glu Glu Val Ser Ala Gly Glu Leu Lys Lys Glu
Asn Leu Lys Glu 130 135 140Leu Lys Lys
Ile Leu Glu Lys Asp Leu Arg Trp Val Leu Asp Asp Asp145
150 155 160Val Asp Val Glu Glu Phe Asp
Leu Asp Lys Glu Phe Asp Pro Ala Lys 165
170 175Arg Trp Arg Asn Glu Gly Glu Ala Val Arg Val Leu
Val Asp Arg Leu 180 185 190Ser
Gly Arg Glu Ile Asn Glu Lys His Trp Lys Phe Val Arg Met Met 195
200 205Asn Gln Ser Gly Leu Gln Phe Thr Glu
Asp Gln Met Leu Lys Ile Val 210 215
220Asp Arg Leu Gly Arg Lys Gln Ser Trp Lys Gln Ala Ser Ala Val Val225
230 235 240His Trp Val Tyr
Ser Asp Lys Lys Arg Lys His Leu Arg Ser Arg Phe 245
250 255Val Tyr Thr Lys Leu Leu Ser Val Leu Gly
Phe Ala Arg Arg Pro Gln 260 265
270Glu Ala Leu Gln Ile Phe Asn Gln Met Leu Gly Asp Arg Gln Leu Tyr
275 280 285Pro Asp Met Ala Ala Tyr His
Cys Ile Ala Val Thr Leu Gly Gln Ala 290 295
300Gly Leu Leu Lys Glu Leu Leu Lys Val Ile Glu Arg Met Arg Gln
Lys305 310 315 320Pro Thr
Lys Leu Thr Lys Asn Leu Arg Gln Lys Asn Trp Asp Pro Val
325 330 335Leu Glu Pro Asp Leu Val Val
Tyr Asn Ala Ile Leu Asn Ala Cys Val 340 345
350Pro Thr Leu Gln Trp Lys Ala Val Ser Trp Val Phe Val Glu
Leu Arg 355 360 365Lys Asn Gly Leu
Arg Pro Asn Gly Ala Thr Tyr Gly Leu Ala Met Glu 370
375 380Val Met Leu Glu Ser Gly Lys Phe Asp Arg Val His
Asp Phe Phe Arg385 390 395
400Lys Met Lys Ser Ser Gly Glu Ala Pro Lys Ala Ile Thr Tyr Lys Val
405 410 415Leu Val Arg Ala Leu
Trp Arg Glu Gly Lys Ile Glu Glu Ala Val Glu 420
425 430Ala Val Arg Asp Met Glu Gln Lys Gly Val Ile Gly
Thr Gly Ser Val 435 440 445Tyr Tyr
Glu Leu Ala Cys Cys Leu Cys Asn Asn Gly Arg Trp Cys Asp 450
455 460Ala Met Leu Glu Val Gly Arg Met Lys Arg Leu
Glu Asn Cys Arg Pro465 470 475
480Leu Glu Ile Thr Phe Thr Gly Leu Ile Ala Ala Ser Leu Asn Gly Gly
485 490 495His Val Asp Asp
Cys Met Ala Ile Phe Gln Tyr Met Lys Asp Lys Cys 500
505 510Asp Pro Asn Ile Gly Thr Ala Asn Met Met Leu
Lys Val Tyr Gly Arg 515 520 525Asn
Asp Met Phe Ser Glu Ala Lys Glu Leu Phe Glu Glu Ile Val Ser 530
535 540Arg Lys Glu Thr His Leu Val Pro Asn Glu
Tyr Thr Tyr Ser Phe Met545 550 555
560Leu Glu Ala Ser Ala Arg Ser Leu Gln Trp Glu Tyr Phe Glu His
Val 565 570 575Tyr Gln Thr
Met Val Leu Ser Gly Tyr Gln Met Asp Gln Thr Lys His 580
585 590Ala Ser Met Leu Ile Glu Ala Ser Arg Ala
Gly Lys Trp Ser Leu Leu 595 600
605Glu His Ala Phe Asp Ala Val Leu Glu Asp Gly Glu Ile Pro His Pro 610
615 620Leu Phe Phe Thr Glu Leu Leu Cys
His Ala Thr Ala Lys Gly Asp Phe625 630
635 640Gln Arg Ala Ile Thr Leu Ile Asn Thr Val Ala Leu
Ala Ser Phe Gln 645 650
655Ile Ser Glu Glu Glu Trp Thr Asp Leu Phe Glu Glu His Gln Asp Trp
660 665 670Leu Thr Gln Asp Asn Leu
His Lys Leu Ser Asp His Leu Ile Glu Cys 675 680
685Asp Tyr Val Ser Glu Pro Thr Val Ser Asn Leu Ser Lys Ser
Leu Lys 690 695 700Ser Arg Cys Gly Ser
Ser Ser Ser Ser Ala Gln Pro Leu Leu Ala Val705 710
715 720Asp Val Thr Thr Gln Ser Gln Gly Glu Lys
Pro Glu Glu Asp Leu Leu 725 730
735Leu Gln Asp Thr Thr Met Glu Asp Asp Asn Ser Ala Asn Gly Glu Ala
740 745 750Trp Glu Phe Thr Glu
Thr Glu Leu Glu Thr Leu Gly Leu Glu Glu Leu 755
760 765Glu Ile Asp Asp Asp Glu Glu Ser Ser Asp Ser Asp
Ser Leu Ser Val 770 775 780Tyr Asp Ile
Leu Lys Glu Trp Glu Glu Ser Ser Lys Lys Glu785 790
7955913PRTArabidopsis thaliana 5Met Ser Leu Ser His Leu Leu Arg
Arg Leu Cys Thr Thr Thr Thr Thr1 5 10
15Thr Arg Ser Pro Leu Ser Ile Ser Phe Leu His Gln Arg Ile
His Asn 20 25 30Ile Ser Leu
Ser Pro Ala Asn Glu Asp Pro Glu Thr Thr Thr Gly Asn 35
40 45Asn Gln Asp Ser Glu Lys Tyr Pro Asn Leu Asn
Pro Ile Pro Asn Asp 50 55 60Pro Ser
Gln Phe Gln Ile Pro Gln Asn His Thr Pro Pro Ile Pro Tyr65
70 75 80Pro Pro Ile Pro His Arg Thr
Met Ala Phe Ser Ser Ala Glu Glu Ala 85 90
95Ala Ala Glu Arg Arg Arg Arg Lys Arg Arg Leu Arg Ile
Glu Pro Pro 100 105 110Leu His
Ala Leu Arg Arg Asp Pro Ser Ala Pro Pro Pro Lys Arg Asp 115
120 125Pro Asn Ala Pro Arg Leu Pro Asp Ser Thr
Ser Ala Leu Val Gly Gln 130 135 140Arg
Leu Asn Leu His Asn Arg Val Gln Ser Leu Ile Arg Ala Ser Asp145
150 155 160Leu Asp Ala Ala Ser Lys
Leu Ala Arg Gln Ser Val Phe Ser Asn Thr 165
170 175Arg Pro Thr Val Phe Thr Cys Asn Ala Ile Ile Ala
Ala Met Tyr Arg 180 185 190Ala
Lys Arg Tyr Ser Glu Ser Ile Ser Leu Phe Gln Tyr Phe Phe Lys 195
200 205Gln Ser Asn Ile Val Pro Asn Val Val
Ser Tyr Asn Gln Ile Ile Asn 210 215
220Ala His Cys Asp Glu Gly Asn Val Asp Glu Ala Leu Glu Val Tyr Arg225
230 235 240His Ile Leu Ala
Asn Ala Pro Phe Ala Pro Ser Ser Val Thr Tyr Arg 245
250 255His Leu Thr Lys Gly Leu Val Gln Ala Gly
Arg Ile Gly Asp Ala Ala 260 265
270Ser Leu Leu Arg Glu Met Leu Ser Lys Gly Gln Ala Ala Asp Ser Thr
275 280 285Val Tyr Asn Asn Leu Ile Arg
Gly Tyr Leu Asp Leu Gly Asp Phe Asp 290 295
300Lys Ala Val Glu Phe Phe Asp Glu Leu Lys Ser Lys Cys Thr Val
Tyr305 310 315 320Asp Gly
Ile Val Asn Ala Thr Phe Met Glu Tyr Trp Phe Glu Lys Gly
325 330 335Asn Asp Lys Glu Ala Met Glu
Ser Tyr Arg Ser Leu Leu Asp Lys Lys 340 345
350Phe Arg Met His Pro Pro Thr Gly Asn Val Leu Leu Glu Val
Phe Leu 355 360 365Lys Phe Gly Lys
Lys Asp Glu Ala Trp Ala Leu Phe Asn Glu Met Leu 370
375 380Asp Asn His Ala Pro Pro Asn Ile Leu Ser Val Asn
Ser Asp Thr Val385 390 395
400Gly Ile Met Val Asn Glu Cys Phe Lys Met Gly Glu Phe Ser Glu Ala
405 410 415Ile Asn Thr Phe Lys
Lys Val Gly Ser Lys Val Thr Ser Lys Pro Phe 420
425 430Val Met Asp Tyr Leu Gly Tyr Cys Asn Ile Val Thr
Arg Phe Cys Glu 435 440 445Gln Gly
Met Leu Thr Glu Ala Glu Arg Phe Phe Ala Glu Gly Val Ser 450
455 460Arg Ser Leu Pro Ala Asp Ala Pro Ser His Arg
Ala Met Ile Asp Ala465 470 475
480Tyr Leu Lys Ala Glu Arg Ile Asp Asp Ala Val Lys Met Leu Asp Arg
485 490 495Met Val Asp Val
Asn Leu Arg Val Val Ala Asp Phe Gly Ala Arg Val 500
505 510Phe Gly Glu Leu Ile Lys Asn Gly Lys Leu Thr
Glu Ser Ala Glu Val 515 520 525Leu
Thr Lys Met Gly Glu Arg Glu Pro Lys Pro Asp Pro Ser Ile Tyr 530
535 540Asp Val Val Val Arg Gly Leu Cys Asp Gly
Asp Ala Leu Asp Gln Ala545 550 555
560Lys Asp Ile Val Gly Glu Met Ile Arg His Asn Val Gly Val Thr
Thr 565 570 575Val Leu Arg
Glu Phe Ile Ile Glu Val Phe Glu Lys Ala Gly Arg Arg 580
585 590Glu Glu Ile Glu Lys Ile Leu Asn Ser Val
Ala Arg Pro Val Arg Asn 595 600
605Ala Gly Gln Ser Gly Asn Thr Pro Pro Arg Val Pro Ala Val Phe Gly 610
615 620Thr Thr Pro Ala Ala Pro Gln Gln
Pro Arg Asp Arg Ala Pro Trp Thr625 630
635 640Ser Gln Gly Val Val His Ser Asn Ser Gly Trp Ala
Asn Gly Thr Ala 645 650
655Gly Gln Thr Ala Gly Gly Ala Tyr Lys Ala Asn Asn Gly Gln Asn Pro
660 665 670Ser Trp Ser Asn Thr Ser
Asp Asn Gln Gln Gln Gln Ser Trp Ser Asn 675 680
685Gln Thr Ala Gly Gln Gln Pro Pro Ser Trp Ser Arg Gln Ala
Pro Gly 690 695 700Tyr Gln Gln Gln Gln
Ser Trp Ser Gln Gln Ser Gly Trp Ser Ser Pro705 710
715 720Ser Gly His Gln Gln Ser Trp Thr Asn Gln
Thr Ala Gly Gln Gln Gln 725 730
735Pro Trp Ala Asn Gln Thr Pro Gly Gln Gln Gln Gln Trp Ala Asn Gln
740 745 750Thr Pro Gly Gln Gln
Gln Gln Leu Ala Asn Gln Thr Pro Gly Gln Gln 755
760 765Gln Gln Trp Ala Asn Gln Thr Pro Gly Gln Gln Gln
Gln Trp Ala Asn 770 775 780Gln Asn Asn
Gly His Gln Gln Pro Trp Ala Asn Gln Asn Thr Gly His785
790 795 800Gln Gln Ser Trp Ala Asn Gln
Thr Pro Ser Gln Gln Gln Pro Trp Ala 805
810 815Asn Gln Thr Thr Gly Gln Gln Gln Gly Trp Gly Asn
Gln Thr Thr Gly 820 825 830Gln
Gln Gln Gln Trp Ala Asn Gln Thr Ala Gly Gln Gln Ser Gly Trp 835
840 845Thr Ala Gln Gln Gln Trp Ser Asn Gln
Thr Ala Ser His Gln Gln Ser 850 855
860Gln Trp Leu Asn Pro Val Pro Gly Glu Val Ala Asn Gln Thr Pro Trp865
870 875 880Ser Asn Ser Val
Asp Ser His Leu Pro Gln Gln Gln Glu Pro Gly Pro 885
890 895Ser His Glu Cys Gln Glu Thr Gln Glu Lys
Lys Val Val Glu Leu Arg 900 905
910Asn6196PRTFlavobacterium okeanokoites 6Ala Leu Val Lys Ser Glu Leu
Glu Glu Lys Lys Ser Glu Leu Arg His1 5 10
15Lys Leu Lys Tyr Val Pro His Glu Tyr Ile Glu Leu Ile
Glu Ile Ala 20 25 30Arg Asn
Ser Thr Gln Asp Arg Ile Leu Glu Met Lys Val Met Glu Phe 35
40 45Phe Met Lys Val Tyr Gly Tyr Arg Gly Lys
His Leu Gly Gly Ser Arg 50 55 60Lys
Pro Asp Gly Ala Ile Tyr Thr Val Gly Ser Pro Ile Asp Tyr Gly65
70 75 80Val Ile Val Asp Thr Lys
Ala Tyr Ser Gly Gly Tyr Asn Leu Pro Ile 85
90 95Gly Gln Ala Asp Glu Met Gln Arg Tyr Val Glu Glu
Asn Gln Thr Arg 100 105 110Asn
Lys His Ile Asn Pro Asn Glu Trp Trp Lys Val Tyr Pro Ser Ser 115
120 125Val Thr Glu Phe Lys Phe Leu Phe Val
Ser Gly His Phe Lys Gly Asn 130 135
140Tyr Lys Ala Gln Leu Thr Arg Leu Asn His Ile Thr Asn Cys Asn Gly145
150 155 160Ala Val Leu Ser
Val Glu Glu Leu Leu Ile Gly Gly Glu Met Ile Lys 165
170 175Ala Gly Thr Leu Thr Leu Glu Glu Val Arg
Arg Lys Phe Asn Asn Gly 180 185
190Glu Ile Asn Phe 19575303DNAArtificial Sequencep63-VP64
7cgccattctg cctggggacg tcggagcaag cttgatttag gtgacactat agaatacaag
60ctacttgttc tttttgcaag atctccacca tggactataa ggaccacgac ggagactaca
120aggatcatga tattgattac aaagacgatg acgataagat ggccccaaag aagaagcgga
180aggtcggtat ccccgggggc gaagtgctgt ataggacact gctggccaac tgcgtggctg
240ctgggaacgt gaagaagtcc gaactggtct tcaacaagat gaaggatctg gggttccccc
300tgagcggctt tacctgtgac caaatgctgc tgctgcacaa aaggattgat agaaagaaaa
360tcgctgatgt cctgctgctg atggaaaagg aaaatatcaa gccaagcctg ctgacctaca
420agatcctgat cgatgtgaag ggcgccacca acgacattag cgggatggaa cagattctgg
480aaacaatgaa agacgagggc gtggagctgg atttccaaac acaggccctg acagccaggc
540attactccgg cgctggactg aaagataagg cagaaaaggt gctgaaggaa atggagggag
600agtccctgga agcaaatagg agggccttta aggacctgct gtccatttac gcctccctgg
660gcagagaaga cgaagtgaaa agaatttgga agatttgcga gtccaaacca tactttgagg
720aatccctggc cgctatccaa gcattcggca agctgaataa ggtgcaagaa gccgaggcaa
780tcttcgaaaa gattgtgaag atggatagaa gagcaagctc cagcacatac tccgtcctgc
840tgagagtgta cgtggatcat aagatgctga gcaaaggcaa agacctggtg aagagaatgg
900ccgagagcgg gtgcagaatt gaagccacca cctgggacgc tctgatcaaa ctgtatgtcg
960aggctgggga ggtggaaaaa gccgattccc tgctggataa agccagcaaa caatcccaca
1020ctaaactgat gatgaatagc ttcatgtata tcatggacga gtatagcaag aggggcgacg
1080tgcacaatac cgaaaaaatc tttctgaaaa tgagggaagc cgggtatact agcggatccg
1140gacgggctga cgcattggac gattttgatc tggatatgct gggaagtgac gccctcgatg
1200attttgacct tgacatgctt ggttcggatg cccttgatga ctttgacctc gacatgctcg
1260gcagtgacgc ccttgatgat ttcgacctgg acatgctgat taactctagt tgatctagat
1320tctgcagccc tatagtgagt cgtattacgt agatccagac atgataagat acattgatga
1380gtttggacaa accacaacta gaatgcagtg aaaaaaatgc tttatttgtg aaatttgtga
1440tgctattgct ttatttgtaa ccattataag ctgcaataaa caagttaaca acaacaattg
1500cattcatttt atgtttcagg ttcaggggga ggtgtgggag gttttttaat tcgcggccgc
1560ggcgccaatg cattgggccc ggtacccagc ttttgttccc tttagtgagg gttaattgcg
1620cgcttggcgt aatcatggtc atagctgttt cctgtgtgaa attgttatcc gctcacaatt
1680ccacacaaca tacgagccgg aagcataaag tgtaaagcct ggggtgccta atgagtgagc
1740taactcacat taattgcgtt gcgctcactg cccgctttcc agtcgggaaa cctgtcgtgc
1800cagctgcatt aatgaatcgg ccaacgcgcg gggagaggcg gtttgcgtat tgggcgctct
1860tccgcttcct cgctcactga ctcgctgcgc tcggtcgttc ggctgcggcg agcggtatca
1920gctcactcaa aggcggtaat acggttatcc acagaatcag gggataacgc aggaaagaac
1980atgtgagcaa aaggccagca aaaggccagg aaccgtaaaa aggccgcgtt gctggcgttt
2040ttccataggc tccgcccccc tgacgagcat cacaaaaatc gacgctcaag tcagaggtgg
2100cgaaacccga caggactata aagataccag gcgtttcccc ctggaagctc cctcgtgcgc
2160tctcctgttc cgaccctgcc gcttaccgga tacctgtccg cctttctccc ttcgggaagc
2220gtggcgcttt ctcatagctc acgctgtagg tatctcagtt cggtgtaggt cgttcgctcc
2280aagctgggct gtgtgcacga accccccgtt cagcccgacc gctgcgcctt atccggtaac
2340tatcgtcttg agtccaaccc ggtaagacac gacttatcgc cactggcagc agccactggt
2400aacaggatta gcagagcgag gtatgtaggc ggtgctacag agttcttgaa gtggtggcct
2460aactacggct acactagaag gacagtattt ggtatctgcg ctctgctgaa gccagttacc
2520ttcggaaaaa gagttggtag ctcttgatcc ggcaaacaaa ccaccgctgg tagcggtggt
2580ttttttgttt gcaagcagca gattacgcgc agaaaaaaag gatctcaaga agatcctttg
2640atcttttcta cggggtctga cgctcagtgg aacgaaaact cacgttaagg gattttggtc
2700atgagattat caaaaaggat cttcacctag atccttttaa attaaaaatg aagttttaaa
2760tcaatctaaa gtatatatga gtaaacttgg tctgacagtt accaatgctt aatcagtgag
2820gcacctatct cagcgatctg tctatttcgt tcatccatag ttgcctgact ccccgtcgtg
2880tagataacta cgatacggga gggcttacca tctggcccca gtgctgcaat gataccgcga
2940gacccacgct caccggctcc agatttatca gcaataaacc agccagccgg aagggccgag
3000cgcagaagtg gtcctgcaac tttatccgcc tccatccagt ctattaattg ttgccgggaa
3060gctagagtaa gtagttcgcc agttaatagt ttgcgcaacg ttgttgccat tgctacaggc
3120atcgtggtgt cacgctcgtc gtttggtatg gcttcattca gctccggttc ccaacgatca
3180aggcgagtta catgatcccc catgttgtgc aaaaaagcgg ttagctcctt cggtcctccg
3240atcgttgtca gaagtaagtt ggccgcagtg ttatcactca tggttatggc agcactgcat
3300aattctctta ctgtcatgcc atccgtaaga tgcttttctg tgactggtga gtactcaacc
3360aagtcattct gagaatagtg tatgcggcga ccgagttgct cttgcccggc gtcaatacgg
3420gataataccg cgccacatag cagaacttta aaagtgctca tcattggaaa acgttcttcg
3480gggcgaaaac tctcaaggat cttaccgctg ttgagatcca gttcgatgta acccactcgt
3540gcacccaact gatcttcagc atcttttact ttcaccagcg tttctgggtg agcaaaaaca
3600ggaaggcaaa atgccgcaaa aaagggaata agggcgacac ggaaatgttg aatactcata
3660ctcttccttt ttcaatatta ttgaagcatt tatcagggtt attgtctcat gagcggatac
3720atatttgaat gtatttagaa aaataaacaa ataggggttc cgcgcacatt tccccgaaaa
3780gtgccaccta aattgtaagc gttaatattt tgttaaaatt cgcgttaaat ttttgttaaa
3840tcagctcatt ttttaaccaa taggccgaaa tcggcaaaat cccttataaa tcaaaagaat
3900agaccgagat agggttgagt gttgttccag tttggaacaa gagtccacta ttaaagaacg
3960tggactccaa cgtcaaaggg cgaaaaaccg tctatcaggg cgatggccca ctacgtgaac
4020catcacccta atcaagtttt ttggggtcga ggtgccgtaa agcactaaat cggaacccta
4080aagggagccc ccgatttaga gcttgacggg gaaagccggc gaacgtggcg agaaaggaag
4140ggaagaaagc gaaaggagcg ggcgctaggg cgctggcaag tgtagcggtc acgctgcgcg
4200taaccaccac acccgccgcg cttaatgcgc cgctacaggg cgcgtcccat tcgccattca
4260ggctgcgcaa ctgttgggaa gggcgatcgg tgcgggcctc ttcgctatta cgccagtcga
4320ccatagccaa ttcaatatgg cgtatatgga ctcatgccaa ttcaatatgg tggatctgga
4380cctgtgccaa ttcaatatgg cgtatatgga ctcgtgccaa ttcaatatgg tggatctgga
4440ccccagccaa ttcaatatgg cggacttggc accatgccaa ttcaatatgg cggacttggc
4500actgtgccaa ctggggaggg gtctacttgg cacggtgcca agtttgagga ggggtcttgg
4560ccctgtgcca agtccgccat attgaattgg catggtgcca ataatggcgg ccatattggc
4620tatatgccag gatcaatata taggcaatat ccaatatggc cctatgccaa tatggctatt
4680ggccaggttc aatactatgt attggcccta tgccatatag tattccatat atgggttttc
4740ctattgacgt agatagcccc tcccaatggg cggtcccata taccatatat ggggcttcct
4800aataccgccc atagccactc ccccattgac gtcaatggtc tctatatatg gtctttccta
4860ttgacgtcat atgggcggtc ctattgacgt atatggcgcc tcccccattg acgtcaatta
4920cggtaaatgg cccgcctggc tcaatgccca ttgacgtcaa taggaccacc caccattgac
4980gtcaatggga tggctcattg cccattcata tccgttctca cgccccctat tgacgtcaat
5040gacggtaaat ggcccacttg gcagtacatc aatatctatt aatagtaact tggcaagtac
5100attactattg gaaggacgcc agggtacatt ggcagtactc ccattgacgt caatggcggt
5160aaatggcccg cgatggctgc caagtacatc cccattgacg tcaatgggga ggggcaatga
5220cgcaaatggg cgttccattg acgtaaatgg gcggtaggcg tgcctaatgg gaggtctata
5280taagcaatgc tcgtttaggg aac
5303
<210> 8
<211> 5948
<212> DNA
<213> Artificial Sequence
<223> pTac2-VP64
<400> 8
cgccattctg cctggggacg
tcggagcaag cttgatttag gtgacactat agaatacaag 60ctacttgttc tttttgcaag
atctccacca tggactataa ggaccacgac ggagactaca 120aggatcatga tattgattac
aaagacgatg acgataagat ggccccaaag aagaagcgga 180aggtcggtat ccccgggtcc
ctgaacgact ttgcactggt ctttaaggaa ttcgcaggaa 240ggggggattg gcaaagatcc
ctgagactgt ttaagtatat gcagaggcaa atctggtgca 300aacccaatga gcatatctat
accattatga tttccctgct ggggagagaa ggactgctgg 360ataaatgtct ggaagtgttt
gacgaaatgc cttcccaagg agtgagcagg agcgtgttca 420gctacactgc actgattaac
gcctacggca gaaacggcag gtacgaaact agcctggagc 480tgctggacag aatgaaaaac
gagaagatca gcccaagcat cctgacttat aacacagtga 540tcaatgcttg tgccagaggc
ggactggact gggagggcct gctgggcctg ttcgcagaga 600tgaggcacga agggattcaa
cccgacatcg tgacttacaa tactctgctg tccgcatgtg 660caattagggg cctgggggac
gaagctgaaa tggtcttcag gactatgaat gacggcggaa 720tcgtgcccga tctgaccaca
tattcccatc tggtcgagac ctttgggaaa ctgaggagac 780tggagaaggt gtgcgatctg
ctgggagaaa tggctagcgg aggctccctg ccagatatta 840cctcctacaa cgtgctgctg
gaagcctacg ccaagtccgg ctccatcaag gaggctatgg 900gcgtgtttca tcagatgcaa
gccgctggct gtacccccaa tgccaacacc tattccgtcc 960tgctgaatct gttcggccag
agcgggagat acgatgacgt gaggcagctg tttctggaaa 1020tgaagagcag caacaccgac
cccgacgctg caacatacaa cattctgatc gaggtgtttg 1080gcgagggggg ctacttcaaa
gaagtcgtca ccctgttcca cgacatggtg gaggaaaaca 1140tcgagcccga tatggagacc
tatgagggga tcatcttcgc ttgcggcaaa ggcggcctgc 1200atgaggacgc taggaagatc
ctgcagtaca tgaccgctaa tgacattgtc ccatcctcca 1260aagcttatac cggcgtgatc
gaggccttcg gccaggctgc cctgtacgag gaagcactgg 1320tcgcctttaa caccatgcac
gaggtcggca gcaacccttc catcgagacc ttccactccc 1380tgctgtatag cttcgccaga
ggcgggctgg tgaaggagtc cgaggcaatc ctgagcaggc 1440tggtcgattc cggcatcccc
aggaacagag acacctttaa tgctcaaatt gaggcctaca 1500aacagggggg gaagttcgaa
gaggctgtga agacctacgt cgacatggaa aagagcaggt 1560gcgaccccga cgagaggacc
ctggaggccg tcctgtccgt gtattccttc gcaagactgg 1620tggatgagtg cagggaacag
tttgaagaaa tgaaggccag cgacattctg cccagcatta 1680tgtgctactg catgatgctg
gcagtgtacg ggaagaccga gaggtgggac gacgtgaacg 1740aactgctgga ggagatgctg
agcaacaggg tcagcaacgg atccggacgg gctgacgcat 1800tggacgattt tgatctggat
atgctgggaa gtgacgccct cgatgatttt gaccttgaca 1860tgcttggttc ggatgccctt
gatgactttg acctcgacat gctcggcagt gacgcccttg 1920atgatttcga cctggacatg
ctgattaact ctagttgatc tagattctgc agccctatag 1980tgagtcgtat tacgtagatc
cagacatgat aagatacatt gatgagtttg gacaaaccac 2040aactagaatg cagtgaaaaa
aatgctttat ttgtgaaatt tgtgatgcta ttgctttatt 2100tgtaaccatt ataagctgca
ataaacaagt taacaacaac aattgcattc attttatgtt 2160tcaggttcag ggggaggtgt
gggaggtttt ttaattcgcg gccgcggcgc caatgcattg 2220ggcccggtac ccagcttttg
ttccctttag tgagggttaa ttgcgcgctt ggcgtaatca 2280tggtcatagc tgtttcctgt
gtgaaattgt tatccgctca caattccaca caacatacga 2340gccggaagca taaagtgtaa
agcctggggt gcctaatgag tgagctaact cacattaatt 2400gcgttgcgct cactgcccgc
tttccagtcg ggaaacctgt cgtgccagct gcattaatga 2460atcggccaac gcgcggggag
aggcggtttg cgtattgggc gctcttccgc ttcctcgctc 2520actgactcgc tgcgctcggt
cgttcggctg cggcgagcgg tatcagctca ctcaaaggcg 2580gtaatacggt tatccacaga
atcaggggat aacgcaggaa agaacatgtg agcaaaaggc 2640cagcaaaagg ccaggaaccg
taaaaaggcc gcgttgctgg cgtttttcca taggctccgc 2700ccccctgacg agcatcacaa
aaatcgacgc tcaagtcaga ggtggcgaaa cccgacagga 2760ctataaagat accaggcgtt
tccccctgga agctccctcg tgcgctctcc tgttccgacc 2820ctgccgctta ccggatacct
gtccgccttt ctcccttcgg gaagcgtggc gctttctcat 2880agctcacgct gtaggtatct
cagttcggtg taggtcgttc gctccaagct gggctgtgtg 2940cacgaacccc ccgttcagcc
cgaccgctgc gccttatccg gtaactatcg tcttgagtcc 3000aacccggtaa gacacgactt
atcgccactg gcagcagcca ctggtaacag gattagcaga 3060gcgaggtatg taggcggtgc
tacagagttc ttgaagtggt ggcctaacta cggctacact 3120agaaggacag tatttggtat
ctgcgctctg ctgaagccag ttaccttcgg aaaaagagtt 3180ggtagctctt gatccggcaa
acaaaccacc gctggtagcg gtggtttttt tgtttgcaag 3240cagcagatta cgcgcagaaa
aaaaggatct caagaagatc ctttgatctt ttctacgggg 3300tctgacgctc agtggaacga
aaactcacgt taagggattt tggtcatgag attatcaaaa 3360aggatcttca cctagatcct
tttaaattaa aaatgaagtt ttaaatcaat ctaaagtata 3420tatgagtaaa cttggtctga
cagttaccaa tgcttaatca gtgaggcacc tatctcagcg 3480atctgtctat ttcgttcatc
catagttgcc tgactccccg tcgtgtagat aactacgata 3540cgggagggct taccatctgg
ccccagtgct gcaatgatac cgcgagaccc acgctcaccg 3600gctccagatt tatcagcaat
aaaccagcca gccggaaggg ccgagcgcag aagtggtcct 3660gcaactttat ccgcctccat
ccagtctatt aattgttgcc gggaagctag agtaagtagt 3720tcgccagtta atagtttgcg
caacgttgtt gccattgcta caggcatcgt ggtgtcacgc 3780tcgtcgtttg gtatggcttc
attcagctcc ggttcccaac gatcaaggcg agttacatga 3840tcccccatgt tgtgcaaaaa
agcggttagc tccttcggtc ctccgatcgt tgtcagaagt 3900aagttggccg cagtgttatc
actcatggtt atggcagcac tgcataattc tcttactgtc 3960atgccatccg taagatgctt
ttctgtgact ggtgagtact caaccaagtc attctgagaa 4020tagtgtatgc ggcgaccgag
ttgctcttgc ccggcgtcaa tacgggataa taccgcgcca 4080catagcagaa ctttaaaagt
gctcatcatt ggaaaacgtt cttcggggcg aaaactctca 4140aggatcttac cgctgttgag
atccagttcg atgtaaccca ctcgtgcacc caactgatct 4200tcagcatctt ttactttcac
cagcgtttct gggtgagcaa aaacaggaag gcaaaatgcc 4260gcaaaaaagg gaataagggc
gacacggaaa tgttgaatac tcatactctt cctttttcaa 4320tattattgaa gcatttatca
gggttattgt ctcatgagcg gatacatatt tgaatgtatt 4380tagaaaaata aacaaatagg
ggttccgcgc acatttcccc gaaaagtgcc acctaaattg 4440taagcgttaa tattttgtta
aaattcgcgt taaatttttg ttaaatcagc tcatttttta 4500accaataggc cgaaatcggc
aaaatccctt ataaatcaaa agaatagacc gagatagggt 4560tgagtgttgt tccagtttgg
aacaagagtc cactattaaa gaacgtggac tccaacgtca 4620aagggcgaaa aaccgtctat
cagggcgatg gcccactacg tgaaccatca ccctaatcaa 4680gttttttggg gtcgaggtgc
cgtaaagcac taaatcggaa ccctaaaggg agcccccgat 4740ttagagcttg acggggaaag
ccggcgaacg tggcgagaaa ggaagggaag aaagcgaaag 4800gagcgggcgc tagggcgctg
gcaagtgtag cggtcacgct gcgcgtaacc accacacccg 4860ccgcgcttaa tgcgccgcta
cagggcgcgt cccattcgcc attcaggctg cgcaactgtt 4920gggaagggcg atcggtgcgg
gcctcttcgc tattacgcca gtcgaccata gccaattcaa 4980tatggcgtat atggactcat
gccaattcaa tatggtggat ctggacctgt gccaattcaa 5040tatggcgtat atggactcgt
gccaattcaa tatggtggat ctggacccca gccaattcaa 5100tatggcggac ttggcaccat
gccaattcaa tatggcggac ttggcactgt gccaactggg 5160gaggggtcta cttggcacgg
tgccaagttt gaggaggggt cttggccctg tgccaagtcc 5220gccatattga attggcatgg
tgccaataat ggcggccata ttggctatat gccaggatca 5280atatataggc aatatccaat
atggccctat gccaatatgg ctattggcca ggttcaatac 5340tatgtattgg ccctatgcca
tatagtattc catatatggg ttttcctatt gacgtagata 5400gcccctccca atgggcggtc
ccatatacca tatatggggc ttcctaatac cgcccatagc 5460cactccccca ttgacgtcaa
tggtctctat atatggtctt tcctattgac gtcatatggg 5520cggtcctatt gacgtatatg
gcgcctcccc cattgacgtc aattacggta aatggcccgc 5580ctggctcaat gcccattgac
gtcaatagga ccacccacca ttgacgtcaa tgggatggct 5640cattgcccat tcatatccgt
tctcacgccc cctattgacg tcaatgacgg taaatggccc 5700acttggcagt acatcaatat
ctattaatag taacttggca agtacattac tattggaagg 5760acgccagggt acattggcag
tactcccatt gacgtcaatg gcggtaaatg gcccgcgatg 5820gctgccaagt acatccccat
tgacgtcaat ggggaggggc aatgacgcaa atgggcgttc 5880cattgacgta aatgggcggt
aggcgtgcct aatgggaggt ctatataagc aatgctcgtt 5940tagggaac
5948
<210> 9
<211> 5531
<212> DNA
<213> Artificial Sequence
<223> GUN1-VP64
<400> 9
cgccattctg cctggggacg tcggagcaag cttgatttag gtgacactat
agaatacaag 60ctacttgttc tttttgcaag atctccacca tggactataa ggaccacgac
ggagactaca 120aggatcatga tattgattac aaagacgatg acgataagat ggccccaaag
aagaagcgga 180aggtcggtat ccccgggcaa ggcaagctgg caagcgccat gatctccacc
ctgggcaggt 240acggaaaggt gaccattgcc aagaggatct tcgagaccgc cttcgcaggc
gggtacggca 300acaccgtgta tgctttttcc gccctgatta gcgcatatgg cagaagcggc
ctgcacgaag 360aggccattag cgtgtttaac agcatgaagg agtatggact gaggcccaac
ctggtgacct 420acaacgccgt cattgatgct tgcggcaagg gcggcatgga attcaagcag
gtggccaagt 480tcttcgatga aatgcagagg aacggcgtgc agcctgacag aattacattc
aatagcctgc 540tggctgtgtg cagcagaggg ggcctgtggg aggcagctag gaatctgttt
gacgagatga 600ccaatagaag gatcgagcag gacgtgttct cctataatac actgctggac
gccatttgta 660aaggcgggca aatggacctg gccttcgaaa tcctggccca gatgcccgtc
aaaaggatca 720tgcccaacgt ggtcagctac tccacagtca tcgacgggtt cgccaaggct
ggcaggtttg 780atgaagcact gaacctgttc ggggaaatga gatacctggg aatcgccctg
gacagggtga 840gctacaacac cctgctgagc atctacacta aggtcggcag atccgaggaa
gccctggaca 900tcctgaggga aatggcctcc gtgggcatta agaaggacgt cgtgacatac
aatgccctgc 960tgggcggcta cggcaaacag ggcaagtacg acgaggtcaa gaaggtcttc
acagagatga 1020agagggaaca cgtgctgcca aatctgctga cttattccac tctgattgat
ggctactcca 1080aaggcggact gtacaaggaa gccatggaga ttttcagaga gttcaagagc
gctggcctga 1140gagccgatgt cgtgctgtat tccgcactga tcgatgcact gtgcaaaaac
ggcctggtcg 1200gcagcgccgt gagcctgatc gacgagatga ccaaggaggg aattagcccc
aatgtggtga 1260cttacaatag catcattgat gctttcggca gaagcgccac catggacaga
tccgccgact 1320atagcaacgg cggcagcctg ccattttcct ccagcgccct gggatccgga
cgggctgacg 1380cattggacga ttttgatctg gatatgctgg gaagtgacgc cctcgatgat
tttgaccttg 1440acatgcttgg ttcggatgcc cttgatgact ttgacctcga catgctcggc
agtgacgccc 1500ttgatgattt cgacctggac atgctgatta actctagttg atctagattc
tgcagcccta 1560tagtgagtcg tattacgtag atccagacat gataagatac attgatgagt
ttggacaaac 1620cacaactaga atgcagtgaa aaaaatgctt tatttgtgaa atttgtgatg
ctattgcttt 1680atttgtaacc attataagct gcaataaaca agttaacaac aacaattgca
ttcattttat 1740gtttcaggtt cagggggagg tgtgggaggt tttttaattc gcggccgcgg
cgccaatgca 1800ttgggcccgg tacccagctt ttgttccctt tagtgagggt taattgcgcg
cttggcgtaa 1860tcatggtcat agctgtttcc tgtgtgaaat tgttatccgc tcacaattcc
acacaacata 1920cgagccggaa gcataaagtg taaagcctgg ggtgcctaat gagtgagcta
actcacatta 1980attgcgttgc gctcactgcc cgctttccag tcgggaaacc tgtcgtgcca
gctgcattaa 2040tgaatcggcc aacgcgcggg gagaggcggt ttgcgtattg ggcgctcttc
cgcttcctcg 2100ctcactgact cgctgcgctc ggtcgttcgg ctgcggcgag cggtatcagc
tcactcaaag 2160gcggtaatac ggttatccac agaatcaggg gataacgcag gaaagaacat
gtgagcaaaa 2220ggccagcaaa aggccaggaa ccgtaaaaag gccgcgttgc tggcgttttt
ccataggctc 2280cgcccccctg acgagcatca caaaaatcga cgctcaagtc agaggtggcg
aaacccgaca 2340ggactataaa gataccaggc gtttccccct ggaagctccc tcgtgcgctc
tcctgttccg 2400accctgccgc ttaccggata cctgtccgcc tttctccctt cgggaagcgt
ggcgctttct 2460catagctcac gctgtaggta tctcagttcg gtgtaggtcg ttcgctccaa
gctgggctgt 2520gtgcacgaac cccccgttca gcccgaccgc tgcgccttat ccggtaacta
tcgtcttgag 2580tccaacccgg taagacacga cttatcgcca ctggcagcag ccactggtaa
caggattagc 2640agagcgaggt atgtaggcgg tgctacagag ttcttgaagt ggtggcctaa
ctacggctac 2700actagaagga cagtatttgg tatctgcgct ctgctgaagc cagttacctt
cggaaaaaga 2760gttggtagct cttgatccgg caaacaaacc accgctggta gcggtggttt
ttttgtttgc 2820aagcagcaga ttacgcgcag aaaaaaagga tctcaagaag atcctttgat
cttttctacg 2880gggtctgacg ctcagtggaa cgaaaactca cgttaaggga ttttggtcat
gagattatca 2940aaaaggatct tcacctagat ccttttaaat taaaaatgaa gttttaaatc
aatctaaagt 3000atatatgagt aaacttggtc tgacagttac caatgcttaa tcagtgaggc
acctatctca 3060gcgatctgtc tatttcgttc atccatagtt gcctgactcc ccgtcgtgta
gataactacg 3120atacgggagg gcttaccatc tggccccagt gctgcaatga taccgcgaga
cccacgctca 3180ccggctccag atttatcagc aataaaccag ccagccggaa gggccgagcg
cagaagtggt 3240cctgcaactt tatccgcctc catccagtct attaattgtt gccgggaagc
tagagtaagt 3300agttcgccag ttaatagttt gcgcaacgtt gttgccattg ctacaggcat
cgtggtgtca 3360cgctcgtcgt ttggtatggc ttcattcagc tccggttccc aacgatcaag
gcgagttaca 3420tgatccccca tgttgtgcaa aaaagcggtt agctccttcg gtcctccgat
cgttgtcaga 3480agtaagttgg ccgcagtgtt atcactcatg gttatggcag cactgcataa
ttctcttact 3540gtcatgccat ccgtaagatg cttttctgtg actggtgagt actcaaccaa
gtcattctga 3600gaatagtgta tgcggcgacc gagttgctct tgcccggcgt caatacggga
taataccgcg 3660ccacatagca gaactttaaa agtgctcatc attggaaaac gttcttcggg
gcgaaaactc 3720tcaaggatct taccgctgtt gagatccagt tcgatgtaac ccactcgtgc
acccaactga 3780tcttcagcat cttttacttt caccagcgtt tctgggtgag caaaaacagg
aaggcaaaat 3840gccgcaaaaa agggaataag ggcgacacgg aaatgttgaa tactcatact
cttccttttt 3900caatattatt gaagcattta tcagggttat tgtctcatga gcggatacat
atttgaatgt 3960atttagaaaa ataaacaaat aggggttccg cgcacatttc cccgaaaagt
gccacctaaa 4020ttgtaagcgt taatattttg ttaaaattcg cgttaaattt ttgttaaatc
agctcatttt 4080ttaaccaata ggccgaaatc ggcaaaatcc cttataaatc aaaagaatag
accgagatag 4140ggttgagtgt tgttccagtt tggaacaaga gtccactatt aaagaacgtg
gactccaacg 4200tcaaagggcg aaaaaccgtc tatcagggcg atggcccact acgtgaacca
tcaccctaat 4260caagtttttt ggggtcgagg tgccgtaaag cactaaatcg gaaccctaaa
gggagccccc 4320gatttagagc ttgacgggga aagccggcga acgtggcgag aaaggaaggg
aagaaagcga 4380aaggagcggg cgctagggcg ctggcaagtg tagcggtcac gctgcgcgta
accaccacac 4440ccgccgcgct taatgcgccg ctacagggcg cgtcccattc gccattcagg
ctgcgcaact 4500gttgggaagg gcgatcggtg cgggcctctt cgctattacg ccagtcgacc
atagccaatt 4560caatatggcg tatatggact catgccaatt caatatggtg gatctggacc
tgtgccaatt 4620caatatggcg tatatggact cgtgccaatt caatatggtg gatctggacc
ccagccaatt 4680caatatggcg gacttggcac catgccaatt caatatggcg gacttggcac
tgtgccaact 4740ggggaggggt ctacttggca cggtgccaag tttgaggagg ggtcttggcc
ctgtgccaag 4800tccgccatat tgaattggca tggtgccaat aatggcggcc atattggcta
tatgccagga 4860tcaatatata ggcaatatcc aatatggccc tatgccaata tggctattgg
ccaggttcaa 4920tactatgtat tggccctatg ccatatagta ttccatatat gggttttcct
attgacgtag 4980atagcccctc ccaatgggcg gtcccatata ccatatatgg ggcttcctaa
taccgcccat 5040agccactccc ccattgacgt caatggtctc tatatatggt ctttcctatt
gacgtcatat 5100gggcggtcct attgacgtat atggcgcctc ccccattgac gtcaattacg
gtaaatggcc 5160cgcctggctc aatgcccatt gacgtcaata ggaccaccca ccattgacgt
caatgggatg 5220gctcattgcc cattcatatc cgttctcacg ccccctattg acgtcaatga
cggtaaatgg 5280cccacttggc agtacatcaa tatctattaa tagtaacttg gcaagtacat
tactattgga 5340aggacgccag ggtacattgg cagtactccc attgacgtca atggcggtaa
atggcccgcg 5400atggctgcca agtacatccc cattgacgtc aatggggagg ggcaatgacg
caaatgggcg 5460ttccattgac gtaaatgggc ggtaggcgtg cctaatggga ggtctatata
agcaatgctc 5520gtttagggaa c
5531
<210> 10
<211> 5135
<212> DNA
<213> Artificial Sequence
<223> pminCMV-luc2
<400> 10
ggtaccgagc
tcttacgcgt gctagcccgg gctcgagatc tgatatcaag cttactagtg 60tcgaggtagg
cgtgtacggt gggaggccta tataagcaga gctcgtttag tgaaccgtca 120gatcgcctgg
aggtaccgcc accatggaag atgccaaaaa cattaagaag ggcccagcgc 180cattctaccc
actcgaagac gggaccgccg gcgagcagct gcacaaagcc atgaagcgct 240acgccctggt
gcccggcacc atcgccttta ccgacgcaca tatcgaggtg gacattacct 300acgccgagta
cttcgagatg agcgttcggc tggcagaagc tatgaagcgc tatgggctga 360atacaaacca
tcggatcgtg gtgtgcagcg agaatagctt gcagttcttc atgcccgtgt 420tgggtgccct
gttcatcggt gtggctgtgg ccccagctaa cgacatctac aacgagcgcg 480agctgctgaa
cagcatgggc atcagccagc ccaccgtcgt attcgtgagc aagaaagggc 540tgcaaaagat
cctcaacgtg caaaagaagc taccgatcat acaaaagatc atcatcatgg 600atagcaagac
cgactaccag ggcttccaaa gcatgtacac cttcgtgact tcccatttgc 660cacccggctt
caacgagtac gacttcgtgc ccgagagctt cgaccgggac aaaaccatcg 720ccctgatcat
gaacagtagt ggcagtaccg gattgcccaa gggcgtagcc ctaccgcacc 780gcaccgcttg
tgtccgattc agtcatgccc gcgaccccat cttcggcaac cagatcatcc 840ccgacaccgc
tatcctcagc gtggtgccat ttcaccacgg cttcggcatg ttcaccacgc 900tgggctactt
gatctgcggc tttcgggtcg tgctcatgta ccgcttcgag gaggagctat 960tcttgcgcag
cttgcaagac tataagattc aatctgccct gctggtgccc acactattta 1020gcttcttcgc
taagagcact ctcatcgaca agtacgacct aagcaacttg cacgagatcg 1080ccagcggcgg
ggcgccgctc agcaaggagg taggtgaggc cgtggccaaa cgcttccacc 1140taccaggcat
ccgccagggc tacggcctga cagaaacaac cagcgccatt ctgatcaccc 1200ccgaagggga
cgacaagcct ggcgcagtag gcaaggtggt gcccttcttc gaggctaagg 1260tggtggactt
ggacaccggt aagacactgg gtgtgaacca gcgcggcgag ctgtgcgtcc 1320gtggccccat
gatcatgagc ggctacgtta acaaccccga ggctacaaac gctctcatcg 1380acaaggacgg
ctggctgcac agcggcgaca tcgcctactg ggacgaggac gagcacttct 1440tcatcgtgga
ccggctgaag agcctgatca aatacaaggg ctaccaggta gccccagccg 1500aactggagag
catcctgctg caacacccca acatcttcga cgccggggtc gccggcctgc 1560ccgacgacga
tgccggcgag ctgcccgccg cagtcgtcgt gctggaacac ggtaaaacca 1620tgaccgagaa
ggagatcgtg gactatgtgg ccagccaggt tacaaccgcc aagaagctgc 1680gcggtggtgt
tgtgttcgtg gacgaggtgc ctaaaggact gaccggcaag ttggacgccc 1740gcaagatccg
cgagattctc attaaggcca agaagggcgg caagatcgcc gtgaattctt 1800aactgcagtt
aatctagagt cggggcggcc ggccgcttcg agcagacatg ataagataca 1860ttgatgagtt
tggacaaacc acaactagaa tgcagtgaaa aaaatgcttt atttgtgaaa 1920tttgtgatgc
tattgcttta tttgtaacca ttataagctg caataaacaa gttaacaaca 1980acaattgcat
tcattttatg tttcaggttc agggggaggt gtgggaggtt ttttaaagca 2040agtaaaacct
ctacaaatgt ggtaaaatcg ataaggatct gaacgatgga gcggagaatg 2100ggcggaactg
ggcggagtta ggggcgggat gggcggagtt aggggcggga ctatggttgc 2160tgactaattg
agatgcatgc tttgcatact tctgcctgct ggggagcctg gggactttcc 2220acacctggtt
gctgactaat tgagatgcat gctttgcata cttctgcctg ctggggagcc 2280tggggacttt
ccacacccta actgacacac attccacagc ggatccgtcg accgatgccc 2340ttgagagcct
tcaacccagt cagctccttc cggtgggcgc ggggcatgac tatcgtcgcc 2400gcacttatga
ctgtcttctt tatcatgcaa ctcgtaggac aggtgccggc agcgctcttc 2460cgcttcctcg
ctcactgact cgctgcgctc ggtcgttcgg ctgcggcgag cggtatcagc 2520tcactcaaag
gcggtaatac ggttatccac agaatcaggg gataacgcag gaaagaacat 2580gtgagcaaaa
ggccagcaaa aggccaggaa ccgtaaaaag gccgcgttgc tggcgttttt 2640ccataggctc
cgcccccctg acgagcatca caaaaatcga cgctcaagtc agaggtggcg 2700aaacccgaca
ggactataaa gataccaggc gtttccccct ggaagctccc tcgtgcgctc 2760tcctgttccg
accctgccgc ttaccggata cctgtccgcc tttctccctt cgggaagcgt 2820ggcgctttct
catagctcac gctgtaggta tctcagttcg gtgtaggtcg ttcgctccaa 2880gctgggctgt
gtgcacgaac cccccgttca gcccgaccgc tgcgccttat ccggtaacta 2940tcgtcttgag
tccaacccgg taagacacga cttatcgcca ctggcagcag ccactggtaa 3000caggattagc
agagcgaggt atgtaggcgg tgctacagag ttcttgaagt ggtggcctaa 3060ctacggctac
actagaagaa cagtatttgg tatctgcgct ctgctgaagc cagttacctt 3120cggaaaaaga
gttggtagct cttgatccgg caaacaaacc accgctggta gcggtggttt 3180ttttgtttgc
aagcagcaga ttacgcgcag aaaaaaagga tctcaagaag atcctttgat 3240cttttctacg
gggtctgacg ctcagtggaa cgaaaactca cgttaaggga ttttggtcat 3300gagattatca
aaaaggatct tcacctagat ccttttaaat taaaaatgaa gttttaaatc 3360aatctaaagt
atatatgagt aaacttggtc tgacagttac caatgcttaa tcagtgaggc 3420acctatctca
gcgatctgtc tatttcgttc atccatagtt gcctgactcc ccgtcgtgta 3480gataactacg
atacgggagg gcttaccatc tggccccagt gctgcaatga taccgcgaga 3540cccacgctca
ccggctccag atttatcagc aataaaccag ccagccggaa gggccgagcg 3600cagaagtggt
cctgcaactt tatccgcctc catccagtct attaattgtt gccgggaagc 3660tagagtaagt
agttcgccag ttaatagttt gcgcaacgtt gttgccattg ctacaggcat 3720cgtggtgtca
cgctcgtcgt ttggtatggc ttcattcagc tccggttccc aacgatcaag 3780gcgagttaca
tgatccccca tgttgtgcaa aaaagcggtt agctccttcg gtcctccgat 3840cgttgtcaga
agtaagttgg ccgcagtgtt atcactcatg gttatggcag cactgcataa 3900ttctcttact
gtcatgccat ccgtaagatg cttttctgtg actggtgagt actcaaccaa 3960gtcattctga
gaatagtgta tgcggcgacc gagttgctct tgcccggcgt caatacggga 4020taataccgcg
ccacatagca gaactttaaa agtgctcatc attggaaaac gttcttcggg 4080gcgaaaactc
tcaaggatct taccgctgtt gagatccagt tcgatgtaac ccactcgtgc 4140acccaactga
tcttcagcat cttttacttt caccagcgtt tctgggtgag caaaaacagg 4200aaggcaaaat
gccgcaaaaa agggaataag ggcgacacgg aaatgttgaa tactcatact 4260cttccttttt
caatattatt gaagcattta tcagggttat tgtctcatga gcggatacat 4320atttgaatgt
atttagaaaa ataaacaaat aggggttccg cgcacatttc cccgaaaagt 4380gccacctgac
gcgccctgta gcggcgcatt aagcgcggcg ggtgtggtgg ttacgcgcag 4440cgtgaccgct
acacttgcca gcgccctagc gcccgctcct ttcgctttct tcccttcctt 4500tctcgccacg
ttcgccggct ttccccgtca agctctaaat cgggggctcc ctttagggtt 4560ccgatttagt
gctttacggc acctcgaccc caaaaaactt gattagggtg atggttcacg 4620tagtgggcca
tcgccctgat agacggtttt tcgccctttg acgttggagt ccacgttctt 4680taatagtgga
ctcttgttcc aaactggaac aacactcaac cctatctcgg tctattcttt 4740tgatttataa
gggattttgc cgatttcggc ctattggtta aaaaatgagc tgatttaaca 4800aaaatttaac
gcgaatttta acaaaatatt aacgcttaca atttgccatt cgccattcag 4860gctgcgcaac
tgttgggaag ggcgatcggt gcgggcctct tcgctattac gccagcccaa 4920gctaccatga
taagtaagta atattaaggt acgggaggta cttggagcgg ccgcaataaa 4980atatctttat
tttcattaca tctgtgtgtt ggttttttgt gtgaatcgat agtactaaca 5040tacgctctcc
atcaaaacaa aacgaaacaa aacaaactag caaaataggc tgtccccagt 5100gcaagtgcag
gtgccagaac atttctctat cgata
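5135
<210> 11
<211> 9
<212> DNA
<213> Artificial Sequence
<223> Target sequence in p63
<400> 11
tctatcact
9
<210> 12
<211> 15
<212> DNA
<213> Artificial Sequence
<223> Target sequence in pTac2
<400> 12
aactttcgtc actca
15
<210> 13
<211> 11
<212> DNA
<213> Artificial Sequence
<223> Target sequence in GUN1
<400> 13
aatttgtcga t
11
<210> 14
<211> 5204
<212> DNA
<213> Artificial Sequence
<223> p63-4x target
<400> 14
ggtaccgagc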
tcttacgcgt gctagcccgg gctcgagatc tgatatcaag ttctatcact 60ttgtttcttc
tatcacttga attcttctat cacttatctt cttctatcac ttcagttcgc 120ttactagtgt
cgaggtaggc gtgtacggtg ggaggcctat ataagcagag ctcgtttagt 180gaaccgtcag
atcgcctgga ggtaccgcca ccatggaaga tgccaaaaac attaagaagg 240gcccagcgcc
attctaccca ctcgaagacg ggaccgccgg cgagcagctg cacaaagcca 300tgaagcgcta
cgccctggtg cccggcacca tcgcctttac cgacgcacat atcgaggtgg 360acattaccta
cgccgagtac ttcgagatga gcgttcggct ggcagaagct atgaagcgct 420atgggctgaa
tacaaaccat cggatcgtgg tgtgcagcga gaatagcttg cagttcttca 480tgcccgtgtt
gggtgccctg ttcatcggtg tggctgtggc cccagctaac gacatctaca 540acgagcgcga
gctgctgaac agcatgggca tcagccagcc caccgtcgta ttcgtgagca 600agaaagggct
gcaaaagatc ctcaacgtgc aaaagaagct accgatcata caaaagatca 660tcatcatgga
tagcaagacc gactaccagg gcttccaaag catgtacacc ttcgtgactt 720cccatttgcc
acccggcttc aacgagtacg acttcgtgcc cgagagcttc gaccgggaca 780aaaccatcgc
cctgatcatg aacagtagtg gcagtaccgg attgcccaag ggcgtagccc 840taccgcaccg
caccgcttgt gtccgattca gtcatgcccg cgaccccatc ttcggcaacc 900agatcatccc
cgacaccgct atcctcagcg tggtgccatt tcaccacggc ttcggcatgt 960tcaccacgct
gggctacttg atctgcggct ttcgggtcgt gctcatgtac cgcttcgagg 1020aggagctatt
cttgcgcagc ttgcaagact ataagattca atctgccctg ctggtgccca 1080cactatttag
cttcttcgct aagagcactc tcatcgacaa gtacgaccta agcaacttgc 1140acgagatcgc
cagcggcggg gcgccgctca gcaaggaggt aggtgaggcc gtggccaaac 1200gcttccacct
accaggcatc cgccagggct acggcctgac agaaacaacc agcgccattc 1260tgatcacccc
cgaaggggac gacaagcctg gcgcagtagg caaggtggtg cccttcttcg 1320aggctaaggt
ggtggacttg gacaccggta agacactggg tgtgaaccag cgcggcgagc 1380tgtgcgtccg
tggccccatg atcatgagcg gctacgttaa caaccccgag gctacaaacg 1440ctctcatcga
caaggacggc tggctgcaca gcggcgacat cgcctactgg gacgaggacg 1500agcacttctt
catcgtggac cggctgaaga gcctgatcaa atacaagggc taccaggtag 1560ccccagccga
actggagagc atcctgctgc aacaccccaa catcttcgac gccggggtcg 1620ccggcctgcc
cgacgacgat gccggcgagc tgcccgccgc agtcgtcgtg ctggaacacg 1680gtaaaaccat
gaccgagaag gagatcgtgg actatgtggc cagccaggtt acaaccgcca 1740agaagctgcg
cggtggtgtt gtgttcgtgg acgaggtgcc taaaggactg accggcaagt 1800tggacgcccg
caagatccgc gagattctca ttaaggccaa gaagggcggc aagatcgccg 1860tgaattctta
actgcagtta atctagagtc ggggcggccg gccgcttcga gcagacatga 1920taagatacat
tgatgagttt ggacaaacca caactagaat gcagtgaaaa aaatgcttta 1980tttgtgaaat
ttgtgatgct attgctttat ttgtaaccat tataagctgc aataaacaag 2040ttaacaacaa
caattgcatt cattttatgt ttcaggttca gggggaggtg tgggaggttt 2100tttaaagcaa
gtaaaacctc tacaaatgtg gtaaaatcga taaggatctg aacgatggag 2160cggagaatgg
gcggaactgg gcggagttag gggcgggatg ggcggagtta ggggcgggac 2220tatggttgct
gactaattga gatgcatgct ttgcatactt ctgcctgctg gggagcctgg 2280ggactttcca
cacctggttg ctgactaatt gagatgcatg ctttgcatac ttctgcctgc 2340tggggagcct
ggggactttc cacaccctaa ctgacacaca ttccacagcg gatccgtcga 2400ccgatgccct
tgagagcctt caacccagtc agctccttcc ggtgggcgcg gggcatgact 2460atcgtcgccg
cacttatgac tgtcttcttt atcatgcaac tcgtaggaca ggtgccggca 2520gcgctcttcc
gcttcctcgc tcactgactc gctgcgctcg gtcgttcggc tgcggcgagc 2580ggtatcagct
cactcaaagg cggtaatacg gttatccaca gaatcagggg ataacgcagg 2640aaagaacatg
tgagcaaaag gccagcaaaa ggccaggaac cgtaaaaagg ccgcgttgct 2700ggcgtttttc
cataggctcc gcccccctga cgagcatcac aaaaatcgac gctcaagtca 2760gaggtggcga
aacccgacag gactataaag ataccaggcg tttccccctg gaagctccct 2820cgtgcgctct
cctgttccga ccctgccgct taccggatac ctgtccgcct ttctcccttc 2880gggaagcgtg
gcgctttctc atagctcacg ctgtaggtat ctcagttcgg tgtaggtcgt 2940tcgctccaag
ctgggctgtg tgcacgaacc ccccgttcag cccgaccgct gcgccttatc 3000cggtaactat
cgtcttgagt ccaacccggt aagacacgac ttatcgccac tggcagcagc 3060cactggtaac
aggattagca gagcgaggta tgtaggcggt gctacagagt tcttgaagtg 3120gtggcctaac
tacggctaca ctagaagaac agtatttggt atctgcgctc tgctgaagcc 3180agttaccttc
ggaaaaagag ttggtagctc ttgatccggc aaacaaacca ccgctggtag 3240cggtggtttt
tttgtttgca agcagcagat tacgcgcaga aaaaaaggat ctcaagaaga 3300tcctttgatc
ttttctacgg ggtctgacgc tcagtggaac gaaaactcac gttaagggat 3360tttggtcatg
agattatcaa aaaggatctt cacctagatc cttttaaatt aaaaatgaag 3420ttttaaatca
atctaaagta tatatgagta aacttggtct gacagttacc aatgcttaat 3480cagtgaggca
cctatctcag cgatctgtct atttcgttca tccatagttg cctgactccc 3540cgtcgtgtag
ataactacga tacgggaggg cttaccatct ggccccagtg ctgcaatgat 3600accgcgagac
ccacgctcac cggctccaga tttatcagca ataaaccagc cagccggaag 3660ggccgagcgc
agaagtggtc ctgcaacttt atccgcctcc atccagtcta ttaattgttg 3720ccgggaagct
agagtaagta gttcgccagt taatagtttg cgcaacgttg ttgccattgc 3780tacaggcatc
gtggtgtcac gctcgtcgtt tggtatggct tcattcagct ccggttccca 3840acgatcaagg
cgagttacat gatcccccat gttgtgcaaa aaagcggtta gctccttcgg 3900tcctccgatc
gttgtcagaa gtaagttggc cgcagtgtta tcactcatgg ttatggcagc 3960actgcataat
tctcttactg tcatgccatc cgtaagatgc ttttctgtga ctggtgagta 4020ctcaaccaag
tcattctgag aatagtgtat gcggcgaccg agttgctctt gcccggcgtc 4080aatacgggat
aataccgcgc cacatagcag aactttaaaa gtgctcatca ttggaaaacg 4140ttcttcgggg
cgaaaactct caaggatctt accgctgttg agatccagtt cgatgtaacc 4200cactcgtgca
cccaactgat cttcagcatc ttttactttc accagcgttt ctgggtgagc 4260aaaaacagga
aggcaaaatg ccgcaaaaaa gggaataagg gcgacacgga aatgttgaat 4320actcatactc
ttcctttttc aatattattg aagcatttat cagggttatt gtctcatgag 4380cggatacata
tttgaatgta tttagaaaaa taaacaaata ggggttccgc gcacatttcc 4440ccgaaaagtg
ccacctgacg cgccctgtag cggcgcatta agcgcggcgg gtgtggtggt 4500tacgcgcagc
gtgaccgcta cacttgccag cgccctagcg cccgctcctt tcgctttctt 4560cccttccttt
ctcgccacgt tcgccggctt tccccgtcaa gctctaaatc gggggctccc 4620tttagggttc
cgatttagtg ctttacggca cctcgacccc aaaaaacttg attagggtga 4680tggttcacgt
agtgggccat cgccctgata gacggttttt cgccctttga cgttggagtc 4740cacgttcttt
aatagtggac tcttgttcca aactggaaca acactcaacc ctatctcggt 4800ctattctttt
gatttataag ggattttgcc gatttcggcc tattggttaa aaaatgagct 4860gatttaacaa
aaatttaacg cgaattttaa caaaatatta acgcttacaa tttgccattc 4920gccattcagg
ctgcgcaact gttgggaagg gcgatcggtg cgggcctctt cgctattacg 4980ccagcccaag
ctaccatgat aagtaagtaa tattaaggta cgggaggtac ttggagcggc 5040cgcaataaaa
tatctttatt ttcattacat ctgtgtgttg gttttttgtg tgaatcgata 5100gtactaacat
acgctctcca tcaaaacaaa acgaaacaaa acaaactagc aaaataggct 5160gtccccagtg
caagtgcagg tgccagaaca tttctctatc gata
5204
<210> 15
<211> 5272
<212> DNA
<213> Artificial Sequence
<223> p63-8x target
<400> 15
ggtaccgagc tcttacgcgt
gctagcccgg gctcgagatc tgatatcaag ttctatcact 60tccatggttc tatcacttca
cgacttctat cactttgttt cttctatcac ttgaattctt 120ctatcactta agttcttcta
tcacttttcg aattctatca cttatcttct tctatcactt 180cagttcgctt actagtgtcg
aggtaggcgt gtacggtggg aggcctatat aagcagagct 240cgtttagtga accgtcagat
cgcctggagg taccgccacc atggaagatg ccaaaaacat 300taagaagggc ccagcgccat
tctacccact cgaagacggg accgccggcg agcagctgca 360caaagccatg aagcgctacg
ccctggtgcc cggcaccatc gcctttaccg acgcacatat 420cgaggtggac attacctacg
ccgagtactt cgagatgagc gttcggctgg cagaagctat 480gaagcgctat gggctgaata
caaaccatcg gatcgtggtg tgcagcgaga atagcttgca 540gttcttcatg cccgtgttgg
gtgccctgtt catcggtgtg gctgtggccc cagctaacga 600catctacaac gagcgcgagc
tgctgaacag catgggcatc agccagccca ccgtcgtatt 660cgtgagcaag aaagggctgc
aaaagatcct caacgtgcaa aagaagctac cgatcataca 720aaagatcatc atcatggata
gcaagaccga ctaccagggc ttccaaagca tgtacacctt 780cgtgacttcc catttgccac
ccggcttcaa cgagtacgac ttcgtgcccg agagcttcga 840ccgggacaaa accatcgccc
tgatcatgaa cagtagtggc agtaccggat tgcccaaggg 900cgtagcccta ccgcaccgca
ccgcttgtgt ccgattcagt catgcccgcg accccatctt 960cggcaaccag atcatccccg
acaccgctat cctcagcgtg gtgccatttc accacggctt 1020cggcatgttc accacgctgg
gctacttgat ctgcggcttt cgggtcgtgc tcatgtaccg 1080cttcgaggag gagctattct
tgcgcagctt gcaagactat aagattcaat ctgccctgct 1140ggtgcccaca ctatttagct
tcttcgctaa gagcactctc atcgacaagt acgacctaag 1200caacttgcac gagatcgcca
gcggcggggc gccgctcagc aaggaggtag gtgaggccgt 1260ggccaaacgc ttccacctac
caggcatccg ccagggctac ggcctgacag aaacaaccag 1320cgccattctg atcacccccg
aaggggacga caagcctggc gcagtaggca aggtggtgcc 1380cttcttcgag gctaaggtgg
tggacttgga caccggtaag acactgggtg tgaaccagcg 1440cggcgagctg tgcgtccgtg
gccccatgat catgagcggc tacgttaaca accccgaggc 1500tacaaacgct ctcatcgaca
aggacggctg gctgcacagc ggcgacatcg cctactggga 1560cgaggacgag cacttcttca
tcgtggaccg gctgaagagc ctgatcaaat acaagggcta 1620ccaggtagcc ccagccgaac
tggagagcat cctgctgcaa caccccaaca tcttcgacgc 1680cggggtcgcc ggcctgcccg
acgacgatgc cggcgagctg cccgccgcag tcgtcgtgct 1740ggaacacggt aaaaccatga
ccgagaagga gatcgtggac tatgtggcca gccaggttac 1800aaccgccaag aagctgcgcg
gtggtgttgt gttcgtggac gaggtgccta aaggactgac 1860cggcaagttg gacgcccgca
agatccgcga gattctcatt aaggccaaga agggcggcaa 1920gatcgccgtg aattcttaac
tgcagttaat ctagagtcgg ggcggccggc cgcttcgagc 1980agacatgata agatacattg
atgagtttgg acaaaccaca actagaatgc agtgaaaaaa 2040atgctttatt tgtgaaattt
gtgatgctat tgctttattt gtaaccatta taagctgcaa 2100taaacaagtt aacaacaaca
attgcattca ttttatgttt caggttcagg gggaggtgtg 2160ggaggttttt taaagcaagt
aaaacctcta caaatgtggt aaaatcgata aggatctgaa 2220cgatggagcg gagaatgggc
ggaactgggc ggagttaggg gcgggatggg cggagttagg 2280ggcgggacta tggttgctga
ctaattgaga tgcatgcttt gcatacttct gcctgctggg 2340gagcctgggg actttccaca
cctggttgct gactaattga gatgcatgct ttgcatactt 2400ctgcctgctg gggagcctgg
ggactttcca caccctaact gacacacatt ccacagcgga 2460tccgtcgacc gatgcccttg
agagccttca acccagtcag ctccttccgg tgggcgcggg 2520gcatgactat cgtcgccgca
cttatgactg tcttctttat catgcaactc gtaggacagg 2580tgccggcagc gctcttccgc
ttcctcgctc actgactcgc tgcgctcggt cgttcggctg 2640cggcgagcgg tatcagctca
ctcaaaggcg gtaatacggt tatccacaga atcaggggat 2700aacgcaggaa agaacatgtg
agcaaaaggc cagcaaaagg ccaggaaccg taaaaaggcc 2760gcgttgctgg cgtttttcca
taggctccgc ccccctgacg agcatcacaa aaatcgacgc 2820tcaagtcaga ggtggcgaaa
cccgacagga ctataaagat accaggcgtt tccccctgga 2880agctccctcg tgcgctctcc
tgttccgacc ctgccgctta ccggatacct gtccgccttt 2940ctcccttcgg gaagcgtggc
gctttctcat agctcacgct gtaggtatct cagttcggtg 3000taggtcgttc gctccaagct
gggctgtgtg cacgaacccc ccgttcagcc cgaccgctgc 3060gccttatccg gtaactatcg
tcttgagtcc aacccggtaa gacacgactt atcgccactg 3120gcagcagcca ctggtaacag
gattagcaga gcgaggtatg taggcggtgc tacagagttc 3180ttgaagtggt ggcctaacta
cggctacact agaagaacag tatttggtat ctgcgctctg 3240ctgaagccag ttaccttcgg
aaaaagagtt ggtagctctt gatccggcaa acaaaccacc 3300gctggtagcg gtggtttttt
tgtttgcaag cagcagatta cgcgcagaaa aaaaggatct 3360caagaagatc ctttgatctt
ttctacgggg tctgacgctc agtggaacga aaactcacgt 3420taagggattt tggtcatgag
attatcaaaa aggatcttca cctagatcct tttaaattaa 3480aaatgaagtt ttaaatcaat
ctaaagtata tatgagtaaa cttggtctga cagttaccaa 3540tgcttaatca gtgaggcacc
tatctcagcg atctgtctat ttcgttcatc catagttgcc 3600tgactccccg tcgtgtagat
aactacgata cgggagggct taccatctgg ccccagtgct 3660gcaatgatac cgcgagaccc
acgctcaccg gctccagatt tatcagcaat aaaccagcca 3720gccggaaggg ccgagcgcag
aagtggtcct gcaactttat ccgcctccat ccagtctatt 3780aattgttgcc gggaagctag
agtaagtagt tcgccagtta atagtttgcg caacgttgtt 3840gccattgcta caggcatcgt
ggtgtcacgc tcgtcgtttg gtatggcttc attcagctcc 3900ggttcccaac gatcaaggcg
agttacatga tcccccatgt tgtgcaaaaa agcggttagc 3960tccttcggtc ctccgatcgt
tgtcagaagt aagttggccg cagtgttatc actcatggtt 4020atggcagcac tgcataattc
tcttactgtc atgccatccg taagatgctt ttctgtgact 4080ggtgagtact caaccaagtc
attctgagaa tagtgtatgc ggcgaccgag ttgctcttgc 4140ccggcgtcaa tacgggataa
taccgcgcca catagcagaa ctttaaaagt gctcatcatt 4200ggaaaacgtt cttcggggcg
aaaactctca aggatcttac cgctgttgag atccagttcg 4260atgtaaccca ctcgtgcacc
caactgatct tcagcatctt ttactttcac cagcgtttct 4320gggtgagcaa aaacaggaag
gcaaaatgcc gcaaaaaagg gaataagggc gacacggaaa 4380tgttgaatac tcatactctt
cctttttcaa tattattgaa gcatttatca gggttattgt 4440ctcatgagcg gatacatatt
tgaatgtatt tagaaaaata aacaaatagg ggttccgcgc 4500acatttcccc gaaaagtgcc
acctgacgcg ccctgtagcg gcgcattaag cgcggcgggt 4560gtggtggtta cgcgcagcgt
gaccgctaca cttgccagcg ccctagcgcc cgctcctttc 4620gctttcttcc cttcctttct
cgccacgttc gccggctttc cccgtcaagc tctaaatcgg 4680gggctccctt tagggttccg
atttagtgct ttacggcacc tcgaccccaa aaaacttgat 4740tagggtgatg gttcacgtag
tgggccatcg ccctgataga cggtttttcg ccctttgacg 4800ttggagtcca cgttctttaa
tagtggactc ttgttccaaa ctggaacaac actcaaccct 4860atctcggtct attcttttga
tttataaggg attttgccga tttcggccta ttggttaaaa 4920aatgagctga tttaacaaaa
atttaacgcg aattttaaca aaatattaac gcttacaatt 4980tgccattcgc cattcaggct
gcgcaactgt tgggaagggc gatcggtgcg ggcctcttcg 5040ctattacgcc agcccaagct
accatgataa gtaagtaata ttaaggtacg ggaggtactt 5100ggagcggccg caataaaata
tctttatttt cattacatct gtgtgttggt tttttgtgtg 5160aatcgatagt actaacatac
gctctccatc aaaacaaaac gaaacaaaac aaactagcaa 5220aataggctgt ccccagtgca
agtgcaggtg ccagaacatt tctctatcga ta 5272
SEQ ID NO: 16, length 5228, DNA, Artificial Sequence, pTac2-4x target
ggtaccgagc tcttacgcgt gctagcccgg gctcgagatc
tgatatcaag taactttcgt 60cactcattgt ttctaacttt cgtcactcat ggattctaac
tttcgtcact catatcttct 120aactttcgtc actcatcagt tcgcttacta gtgtcgaggt
aggcgtgtac ggtgggaggc 180ctatataagc agagctcgtt tagtgaaccg tcagatcgcc
tggaggtacc gccaccatgg 240aagatgccaa aaacattaag aagggcccag cgccattcta
cccactcgaa gacgggaccg 300ccggcgagca gctgcacaaa gccatgaagc gctacgccct
ggtgcccggc accatcgcct 360ttaccgacgc acatatcgag gtggacatta cctacgccga
gtacttcgag atgagcgttc 420ggctggcaga agctatgaag cgctatgggc tgaatacaaa
ccatcggatc gtggtgtgca 480gcgagaatag cttgcagttc ttcatgcccg tgttgggtgc
cctgttcatc ggtgtggctg 540tggccccagc taacgacatc tacaacgagc gcgagctgct
gaacagcatg ggcatcagcc 600agcccaccgt cgtattcgtg agcaagaaag ggctgcaaaa
gatcctcaac gtgcaaaaga 660agctaccgat catacaaaag atcatcatca tggatagcaa
gaccgactac cagggcttcc 720aaagcatgta caccttcgtg acttcccatt tgccacccgg
cttcaacgag tacgacttcg 780tgcccgagag cttcgaccgg gacaaaacca tcgccctgat
catgaacagt agtggcagta 840ccggattgcc caagggcgta gccctaccgc accgcaccgc
ttgtgtccga ttcagtcatg 900cccgcgaccc catcttcggc aaccagatca tccccgacac
cgctatcctc agcgtggtgc 960catttcacca cggcttcggc atgttcacca cgctgggcta
cttgatctgc ggctttcggg 1020tcgtgctcat gtaccgcttc gaggaggagc tattcttgcg
cagcttgcaa gactataaga 1080ttcaatctgc cctgctggtg cccacactat ttagcttctt
cgctaagagc actctcatcg 1140acaagtacga cctaagcaac ttgcacgaga tcgccagcgg
cggggcgccg ctcagcaagg 1200aggtaggtga ggccgtggcc aaacgcttcc acctaccagg
catccgccag ggctacggcc 1260tgacagaaac aaccagcgcc attctgatca cccccgaagg
ggacgacaag cctggcgcag 1320taggcaaggt ggtgcccttc ttcgaggcta aggtggtgga
cttggacacc ggtaagacac 1380tgggtgtgaa ccagcgcggc gagctgtgcg tccgtggccc
catgatcatg agcggctacg 1440ttaacaaccc cgaggctaca aacgctctca tcgacaagga
cggctggctg cacagcggcg 1500acatcgccta ctgggacgag gacgagcact tcttcatcgt
ggaccggctg aagagcctga 1560tcaaatacaa gggctaccag gtagccccag ccgaactgga
gagcatcctg ctgcaacacc 1620ccaacatctt cgacgccggg gtcgccggcc tgcccgacga
cgatgccggc gagctgcccg 1680ccgcagtcgt cgtgctggaa cacggtaaaa ccatgaccga
gaaggagatc gtggactatg 1740tggccagcca ggttacaacc gccaagaagc tgcgcggtgg
tgttgtgttc gtggacgagg 1800tgcctaaagg actgaccggc aagttggacg cccgcaagat
ccgcgagatt ctcattaagg 1860ccaagaaggg cggcaagatc gccgtgaatt cttaactgca
gttaatctag agtcggggcg 1920gccggccgct tcgagcagac atgataagat acattgatga
gtttggacaa accacaacta 1980gaatgcagtg aaaaaaatgc tttatttgtg aaatttgtga
tgctattgct ttatttgtaa 2040ccattataag ctgcaataaa caagttaaca acaacaattg
cattcatttt atgtttcagg 2100ttcaggggga ggtgtgggag gttttttaaa gcaagtaaaa
cctctacaaa tgtggtaaaa 2160tcgataagga tctgaacgat ggagcggaga atgggcggaa
ctgggcggag ttaggggcgg 2220gatgggcgga gttaggggcg ggactatggt tgctgactaa
ttgagatgca tgctttgcat 2280acttctgcct gctggggagc ctggggactt tccacacctg
gttgctgact aattgagatg 2340catgctttgc atacttctgc ctgctgggga gcctggggac
tttccacacc ctaactgaca 2400cacattccac agcggatccg tcgaccgatg cccttgagag
ccttcaaccc agtcagctcc 2460ttccggtggg cgcggggcat gactatcgtc gccgcactta
tgactgtctt ctttatcatg 2520caactcgtag gacaggtgcc ggcagcgctc ttccgcttcc
tcgctcactg actcgctgcg 2580ctcggtcgtt cggctgcggc gagcggtatc agctcactca
aaggcggtaa tacggttatc 2640cacagaatca ggggataacg caggaaagaa catgtgagca
aaaggccagc aaaaggccag 2700gaaccgtaaa aaggccgcgt tgctggcgtt tttccatagg
ctccgccccc ctgacgagca 2760tcacaaaaat cgacgctcaa gtcagaggtg gcgaaacccg
acaggactat aaagatacca 2820ggcgtttccc cctggaagct ccctcgtgcg ctctcctgtt
ccgaccctgc cgcttaccgg 2880atacctgtcc gcctttctcc cttcgggaag cgtggcgctt
tctcatagct cacgctgtag 2940gtatctcagt tcggtgtagg tcgttcgctc caagctgggc
tgtgtgcacg aaccccccgt 3000tcagcccgac cgctgcgcct tatccggtaa ctatcgtctt
gagtccaacc cggtaagaca 3060cgacttatcg ccactggcag cagccactgg taacaggatt
agcagagcga ggtatgtagg 3120cggtgctaca gagttcttga agtggtggcc taactacggc
tacactagaa gaacagtatt 3180tggtatctgc gctctgctga agccagttac cttcggaaaa
agagttggta gctcttgatc 3240cggcaaacaa accaccgctg gtagcggtgg tttttttgtt
tgcaagcagc agattacgcg 3300cagaaaaaaa ggatctcaag aagatccttt gatcttttct
acggggtctg acgctcagtg 3360gaacgaaaac tcacgttaag ggattttggt catgagatta
tcaaaaagga tcttcaccta 3420gatcctttta aattaaaaat gaagttttaa atcaatctaa
agtatatatg agtaaacttg 3480gtctgacagt taccaatgct taatcagtga ggcacctatc
tcagcgatct gtctatttcg 3540ttcatccata gttgcctgac tccccgtcgt gtagataact
acgatacggg agggcttacc 3600atctggcccc agtgctgcaa tgataccgcg agacccacgc
tcaccggctc cagatttatc 3660agcaataaac cagccagccg gaagggccga gcgcagaagt
ggtcctgcaa ctttatccgc 3720ctccatccag tctattaatt gttgccggga agctagagta
agtagttcgc cagttaatag 3780tttgcgcaac gttgttgcca ttgctacagg catcgtggtg
tcacgctcgt cgtttggtat 3840ggcttcattc agctccggtt cccaacgatc aaggcgagtt
acatgatccc ccatgttgtg 3900caaaaaagcg gttagctcct tcggtcctcc gatcgttgtc
agaagtaagt tggccgcagt 3960gttatcactc atggttatgg cagcactgca taattctctt
actgtcatgc catccgtaag 4020atgcttttct gtgactggtg agtactcaac caagtcattc
tgagaatagt gtatgcggcg 4080accgagttgc tcttgcccgg cgtcaatacg ggataatacc
gcgccacata gcagaacttt 4140aaaagtgctc atcattggaa aacgttcttc ggggcgaaaa
ctctcaagga tcttaccgct 4200gttgagatcc agttcgatgt aacccactcg tgcacccaac
tgatcttcag catcttttac 4260tttcaccagc gtttctgggt gagcaaaaac aggaaggcaa
aatgccgcaa aaaagggaat 4320aagggcgaca cggaaatgtt gaatactcat actcttcctt
tttcaatatt attgaagcat 4380ttatcagggt tattgtctca tgagcggata catatttgaa
tgtatttaga aaaataaaca 4440aataggggtt ccgcgcacat ttccccgaaa agtgccacct
gacgcgccct gtagcggcgc 4500attaagcgcg gcgggtgtgg tggttacgcg cagcgtgacc
gctacacttg ccagcgccct 4560agcgcccgct cctttcgctt tcttcccttc ctttctcgcc
acgttcgccg gctttccccg 4620tcaagctcta aatcgggggc tccctttagg gttccgattt
agtgctttac ggcacctcga 4680ccccaaaaaa cttgattagg gtgatggttc acgtagtggg
ccatcgccct gatagacggt 4740ttttcgccct ttgacgttgg agtccacgtt ctttaatagt
ggactcttgt tccaaactgg 4800aacaacactc aaccctatct cggtctattc ttttgattta
taagggattt tgccgatttc 4860ggcctattgg ttaaaaaatg agctgattta acaaaaattt
aacgcgaatt ttaacaaaat 4920attaacgctt acaatttgcc attcgccatt caggctgcgc
aactgttggg aagggcgatc 4980ggtgcgggcc tcttcgctat tacgccagcc caagctacca
tgataagtaa gtaatattaa 5040ggtacgggag gtacttggag cggccgcaat aaaatatctt
tattttcatt acatctgtgt 5100gttggttttt tgtgtgaatc gatagtacta acatacgctc
tccatcaaaa caaaacgaaa 5160caaaacaaac tagcaaaata ggctgtcccc agtgcaagtg
caggtgccag aacatttctc 5220tatcgata
5228
SEQ ID NO: 17, length 5320, DNA, Artificial Sequence, pTac2-8x target
ggtaccgagc tcttacgcgt gctagcccgg gctcgagatc tgatatcaag taactttcgt
60cactcatcca tggtaacttt cgtcactcat cacgactaac tttcgtcact cattgtttct
120aactttcgtc actcatgaat tctaactttc gtcactcata agttctaact ttcgtcactc
180atttcgaata actttcgtca ctcatatctt ctaactttcg tcactcatca gttcgcttac
240tagtgtcgag gtaggcgtgt acggtgggag gcctatataa gcagagctcg tttagtgaac
300cgtcagatcg cctggaggta ccgccaccat ggaagatgcc aaaaacatta agaagggccc
360agcgccattc tacccactcg aagacgggac cgccggcgag cagctgcaca aagccatgaa
420gcgctacgcc ctggtgcccg gcaccatcgc ctttaccgac gcacatatcg aggtggacat
480tacctacgcc gagtacttcg agatgagcgt tcggctggca gaagctatga agcgctatgg
540gctgaataca aaccatcgga tcgtggtgtg cagcgagaat agcttgcagt tcttcatgcc
600cgtgttgggt gccctgttca tcggtgtggc tgtggcccca gctaacgaca tctacaacga
660gcgcgagctg ctgaacagca tgggcatcag ccagcccacc gtcgtattcg tgagcaagaa
720agggctgcaa aagatcctca acgtgcaaaa gaagctaccg atcatacaaa agatcatcat
780catggatagc aagaccgact accagggctt ccaaagcatg tacaccttcg tgacttccca
840tttgccaccc ggcttcaacg agtacgactt cgtgcccgag agcttcgacc gggacaaaac
900catcgccctg atcatgaaca gtagtggcag taccggattg cccaagggcg tagccctacc
960gcaccgcacc gcttgtgtcc gattcagtca tgcccgcgac cccatcttcg gcaaccagat
1020catccccgac accgctatcc tcagcgtggt gccatttcac cacggcttcg gcatgttcac
1080cacgctgggc tacttgatct gcggctttcg ggtcgtgctc atgtaccgct tcgaggagga
1140gctattcttg cgcagcttgc aagactataa gattcaatct gccctgctgg tgcccacact
1200atttagcttc ttcgctaaga gcactctcat cgacaagtac gacctaagca acttgcacga
1260gatcgccagc ggcggggcgc cgctcagcaa ggaggtaggt gaggccgtgg ccaaacgctt
1320ccacctacca ggcatccgcc agggctacgg cctgacagaa acaaccagcg ccattctgat
1380cacccccgaa ggggacgaca agcctggcgc agtaggcaag gtggtgccct tcttcgaggc
1440taaggtggtg gacttggaca ccggtaagac actgggtgtg aaccagcgcg gcgagctgtg
1500cgtccgtggc cccatgatca tgagcggcta cgttaacaac cccgaggcta caaacgctct
1560catcgacaag gacggctggc tgcacagcgg cgacatcgcc tactgggacg aggacgagca
1620cttcttcatc gtggaccggc tgaagagcct gatcaaatac aagggctacc aggtagcccc
1680agccgaactg gagagcatcc tgctgcaaca ccccaacatc ttcgacgccg gggtcgccgg
1740cctgcccgac gacgatgccg gcgagctgcc cgccgcagtc gtcgtgctgg aacacggtaa
1800aaccatgacc gagaaggaga tcgtggacta tgtggccagc caggttacaa ccgccaagaa
1860gctgcgcggt ggtgttgtgt tcgtggacga ggtgcctaaa ggactgaccg gcaagttgga
1920cgcccgcaag atccgcgaga ttctcattaa ggccaagaag ggcggcaaga tcgccgtgaa
1980ttcttaactg cagttaatct agagtcgggg cggccggccg cttcgagcag acatgataag
2040atacattgat gagtttggac aaaccacaac tagaatgcag tgaaaaaaat gctttatttg
2100tgaaatttgt gatgctattg ctttatttgt aaccattata agctgcaata aacaagttaa
2160caacaacaat tgcattcatt ttatgtttca ggttcagggg gaggtgtggg aggtttttta
2220aagcaagtaa aacctctaca aatgtggtaa aatcgataag gatctgaacg atggagcgga
2280gaatgggcgg aactgggcgg agttaggggc gggatgggcg gagttagggg cgggactatg
2340gttgctgact aattgagatg catgctttgc atacttctgc ctgctgggga gcctggggac
2400tttccacacc tggttgctga ctaattgaga tgcatgcttt gcatacttct gcctgctggg
2460gagcctgggg actttccaca ccctaactga cacacattcc acagcggatc cgtcgaccga
2520tgcccttgag agccttcaac ccagtcagct ccttccggtg ggcgcggggc atgactatcg
2580tcgccgcact tatgactgtc ttctttatca tgcaactcgt aggacaggtg ccggcagcgc
2640tcttccgctt cctcgctcac tgactcgctg cgctcggtcg ttcggctgcg gcgagcggta
2700tcagctcact caaaggcggt aatacggtta tccacagaat caggggataa cgcaggaaag
2760aacatgtgag caaaaggcca gcaaaaggcc aggaaccgta aaaaggccgc gttgctggcg
2820tttttccata ggctccgccc ccctgacgag catcacaaaa atcgacgctc aagtcagagg
2880tggcgaaacc cgacaggact ataaagatac caggcgtttc cccctggaag ctccctcgtg
2940cgctctcctg ttccgaccct gccgcttacc ggatacctgt ccgcctttct cccttcggga
3000agcgtggcgc tttctcatag ctcacgctgt aggtatctca gttcggtgta ggtcgttcgc
3060tccaagctgg gctgtgtgca cgaacccccc gttcagcccg accgctgcgc cttatccggt
3120aactatcgtc ttgagtccaa cccggtaaga cacgacttat cgccactggc agcagccact
3180ggtaacagga ttagcagagc gaggtatgta ggcggtgcta cagagttctt gaagtggtgg
3240cctaactacg gctacactag aagaacagta tttggtatct gcgctctgct gaagccagtt
3300accttcggaa aaagagttgg tagctcttga tccggcaaac aaaccaccgc tggtagcggt
3360ggtttttttg tttgcaagca gcagattacg cgcagaaaaa aaggatctca agaagatcct
3420ttgatctttt ctacggggtc tgacgctcag tggaacgaaa actcacgtta agggattttg
3480gtcatgagat tatcaaaaag gatcttcacc tagatccttt taaattaaaa atgaagtttt
3540aaatcaatct aaagtatata tgagtaaact tggtctgaca gttaccaatg cttaatcagt
3600gaggcaccta tctcagcgat ctgtctattt cgttcatcca tagttgcctg actccccgtc
3660gtgtagataa ctacgatacg ggagggctta ccatctggcc ccagtgctgc aatgataccg
3720cgagacccac gctcaccggc tccagattta tcagcaataa accagccagc cggaagggcc
3780gagcgcagaa gtggtcctgc aactttatcc gcctccatcc agtctattaa ttgttgccgg
3840gaagctagag taagtagttc gccagttaat agtttgcgca acgttgttgc cattgctaca
3900ggcatcgtgg tgtcacgctc gtcgtttggt atggcttcat tcagctccgg ttcccaacga
3960tcaaggcgag ttacatgatc ccccatgttg tgcaaaaaag cggttagctc cttcggtcct
4020ccgatcgttg tcagaagtaa gttggccgca gtgttatcac tcatggttat ggcagcactg
4080cataattctc ttactgtcat gccatccgta agatgctttt ctgtgactgg tgagtactca
4140accaagtcat tctgagaata gtgtatgcgg cgaccgagtt gctcttgccc ggcgtcaata
4200cgggataata ccgcgccaca tagcagaact ttaaaagtgc tcatcattgg aaaacgttct
4260tcggggcgaa aactctcaag gatcttaccg ctgttgagat ccagttcgat gtaacccact
4320cgtgcaccca actgatcttc agcatctttt actttcacca gcgtttctgg gtgagcaaaa
4380acaggaaggc aaaatgccgc aaaaaaggga ataagggcga cacggaaatg ttgaatactc
4440atactcttcc tttttcaata ttattgaagc atttatcagg gttattgtct catgagcgga
4500tacatatttg aatgtattta gaaaaataaa caaatagggg ttccgcgcac atttccccga
4560aaagtgccac ctgacgcgcc ctgtagcggc gcattaagcg cggcgggtgt ggtggttacg
4620cgcagcgtga ccgctacact tgccagcgcc ctagcgcccg ctcctttcgc tttcttccct
4680tcctttctcg ccacgttcgc cggctttccc cgtcaagctc taaatcgggg gctcccttta
4740gggttccgat ttagtgcttt acggcacctc gaccccaaaa aacttgatta gggtgatggt
4800tcacgtagtg ggccatcgcc ctgatagacg gtttttcgcc ctttgacgtt ggagtccacg
4860ttctttaata gtggactctt gttccaaact ggaacaacac tcaaccctat ctcggtctat
4920tcttttgatt tataagggat tttgccgatt tcggcctatt ggttaaaaaa tgagctgatt
4980taacaaaaat ttaacgcgaa ttttaacaaa atattaacgc ttacaatttg ccattcgcca
5040ttcaggctgc gcaactgttg ggaagggcga tcggtgcggg cctcttcgct attacgccag
5100cccaagctac catgataagt aagtaatatt aaggtacggg aggtacttgg agcggccgca
5160ataaaatatc tttattttca ttacatctgt gtgttggttt tttgtgtgaa tcgatagtac
5220taacatacgc tctccatcaa aacaaaacga aacaaaacaa actagcaaaa taggctgtcc
5280ccagtgcaag tgcaggtgcc agaacatttc tctatcgata
5320
SEQ ID NO: 18, length 5212, DNA, Artificial Sequence, GUN1-4x target
ggtaccgagc tcttacgcgt
gctagcccgg gctcgagatc tgatatcaag taatttgtcg 60atttgtttct aatttgtcga
ttgaattcta atttgtcgat tatcttctaa tttgtcgatt 120cagttcgctt actagtgtcg
aggtaggcgt gtacggtggg aggcctatat aagcagagct 180cgtttagtga accgtcagat
cgcctggagg taccgccacc atggaagatg ccaaaaacat 240taagaagggc ccagcgccat
tctacccact cgaagacggg accgccggcg agcagctgca 300caaagccatg aagcgctacg
ccctggtgcc cggcaccatc gcctttaccg acgcacatat 360cgaggtggac attacctacg
ccgagtactt cgagatgagc gttcggctgg cagaagctat 420gaagcgctat gggctgaata
caaaccatcg gatcgtggtg tgcagcgaga atagcttgca 480gttcttcatg cccgtgttgg
gtgccctgtt catcggtgtg gctgtggccc cagctaacga 540catctacaac gagcgcgagc
tgctgaacag catgggcatc agccagccca ccgtcgtatt 600cgtgagcaag aaagggctgc
aaaagatcct caacgtgcaa aagaagctac cgatcataca 660aaagatcatc atcatggata
gcaagaccga ctaccagggc ttccaaagca tgtacacctt 720cgtgacttcc catttgccac
ccggcttcaa cgagtacgac ttcgtgcccg agagcttcga 780ccgggacaaa accatcgccc
tgatcatgaa cagtagtggc agtaccggat tgcccaaggg 840cgtagcccta ccgcaccgca
ccgcttgtgt ccgattcagt catgcccgcg accccatctt 900cggcaaccag atcatccccg
acaccgctat cctcagcgtg gtgccatttc accacggctt 960cggcatgttc accacgctgg
gctacttgat ctgcggcttt cgggtcgtgc tcatgtaccg 1020cttcgaggag gagctattct
tgcgcagctt gcaagactat aagattcaat ctgccctgct 1080ggtgcccaca ctatttagct
tcttcgctaa gagcactctc atcgacaagt acgacctaag 1140caacttgcac gagatcgcca
gcggcggggc gccgctcagc aaggaggtag gtgaggccgt 1200ggccaaacgc ttccacctac
caggcatccg ccagggctac ggcctgacag aaacaaccag 1260cgccattctg atcacccccg
aaggggacga caagcctggc gcagtaggca aggtggtgcc 1320cttcttcgag gctaaggtgg
tggacttgga caccggtaag acactgggtg tgaaccagcg 1380cggcgagctg tgcgtccgtg
gccccatgat catgagcggc tacgttaaca accccgaggc 1440tacaaacgct ctcatcgaca
aggacggctg gctgcacagc ggcgacatcg cctactggga 1500cgaggacgag cacttcttca
tcgtggaccg gctgaagagc ctgatcaaat acaagggcta 1560ccaggtagcc ccagccgaac
tggagagcat cctgctgcaa caccccaaca tcttcgacgc 1620cggggtcgcc ggcctgcccg
acgacgatgc cggcgagctg cccgccgcag tcgtcgtgct 1680ggaacacggt aaaaccatga
ccgagaagga gatcgtggac tatgtggcca gccaggttac 1740aaccgccaag aagctgcgcg
gtggtgttgt gttcgtggac gaggtgccta aaggactgac 1800cggcaagttg gacgcccgca
agatccgcga gattctcatt aaggccaaga agggcggcaa 1860gatcgccgtg aattcttaac
tgcagttaat ctagagtcgg ggcggccggc cgcttcgagc 1920agacatgata agatacattg
atgagtttgg acaaaccaca actagaatgc agtgaaaaaa 1980atgctttatt tgtgaaattt
gtgatgctat tgctttattt gtaaccatta taagctgcaa 2040taaacaagtt aacaacaaca
attgcattca ttttatgttt caggttcagg gggaggtgtg 2100ggaggttttt taaagcaagt
aaaacctcta caaatgtggt aaaatcgata aggatctgaa 2160cgatggagcg gagaatgggc
ggaactgggc ggagttaggg gcgggatggg cggagttagg 2220ggcgggacta tggttgctga
ctaattgaga tgcatgcttt gcatacttct gcctgctggg 2280gagcctgggg actttccaca
cctggttgct gactaattga gatgcatgct ttgcatactt 2340ctgcctgctg gggagcctgg
ggactttcca caccctaact gacacacatt ccacagcgga 2400tccgtcgacc gatgcccttg
agagccttca acccagtcag ctccttccgg tgggcgcggg 2460gcatgactat cgtcgccgca
cttatgactg tcttctttat catgcaactc gtaggacagg 2520tgccggcagc gctcttccgc
ttcctcgctc actgactcgc tgcgctcggt cgttcggctg 2580cggcgagcgg tatcagctca
ctcaaaggcg gtaatacggt tatccacaga atcaggggat 2640aacgcaggaa agaacatgtg
agcaaaaggc cagcaaaagg ccaggaaccg taaaaaggcc 2700gcgttgctgg cgtttttcca
taggctccgc ccccctgacg agcatcacaa aaatcgacgc 2760tcaagtcaga ggtggcgaaa
cccgacagga ctataaagat accaggcgtt tccccctgga 2820agctccctcg tgcgctctcc
tgttccgacc ctgccgctta ccggatacct gtccgccttt 2880ctcccttcgg gaagcgtggc
gctttctcat agctcacgct gtaggtatct cagttcggtg 2940taggtcgttc gctccaagct
gggctgtgtg cacgaacccc ccgttcagcc cgaccgctgc 3000gccttatccg gtaactatcg
tcttgagtcc aacccggtaa gacacgactt atcgccactg 3060gcagcagcca ctggtaacag
gattagcaga gcgaggtatg taggcggtgc tacagagttc 3120ttgaagtggt ggcctaacta
cggctacact agaagaacag tatttggtat ctgcgctctg 3180ctgaagccag ttaccttcgg
aaaaagagtt ggtagctctt gatccggcaa acaaaccacc 3240gctggtagcg gtggtttttt
tgtttgcaag cagcagatta cgcgcagaaa aaaaggatct 3300caagaagatc ctttgatctt
ttctacgggg tctgacgctc agtggaacga aaactcacgt 3360taagggattt tggtcatgag
attatcaaaa aggatcttca cctagatcct tttaaattaa 3420aaatgaagtt ttaaatcaat
ctaaagtata tatgagtaaa cttggtctga cagttaccaa 3480tgcttaatca gtgaggcacc
tatctcagcg atctgtctat ttcgttcatc catagttgcc 3540tgactccccg tcgtgtagat
aactacgata cgggagggct taccatctgg ccccagtgct 3600gcaatgatac cgcgagaccc
acgctcaccg gctccagatt tatcagcaat aaaccagcca 3660gccggaaggg ccgagcgcag
aagtggtcct gcaactttat ccgcctccat ccagtctatt 3720aattgttgcc gggaagctag
agtaagtagt tcgccagtta atagtttgcg caacgttgtt 3780gccattgcta caggcatcgt
ggtgtcacgc tcgtcgtttg gtatggcttc attcagctcc 3840ggttcccaac gatcaaggcg
agttacatga tcccccatgt tgtgcaaaaa agcggttagc 3900tccttcggtc ctccgatcgt
tgtcagaagt aagttggccg cagtgttatc actcatggtt 3960atggcagcac tgcataattc
tcttactgtc atgccatccg taagatgctt ttctgtgact 4020ggtgagtact caaccaagtc
attctgagaa tagtgtatgc ggcgaccgag ttgctcttgc 4080ccggcgtcaa tacgggataa
taccgcgcca catagcagaa ctttaaaagt gctcatcatt 4140ggaaaacgtt cttcggggcg
aaaactctca aggatcttac cgctgttgag atccagttcg 4200atgtaaccca ctcgtgcacc
caactgatct tcagcatctt ttactttcac cagcgtttct 4260gggtgagcaa aaacaggaag
gcaaaatgcc gcaaaaaagg gaataagggc gacacggaaa 4320tgttgaatac tcatactctt
cctttttcaa tattattgaa gcatttatca gggttattgt 4380ctcatgagcg gatacatatt
tgaatgtatt tagaaaaata aacaaatagg ggttccgcgc 4440acatttcccc gaaaagtgcc
acctgacgcg ccctgtagcg gcgcattaag cgcggcgggt 4500gtggtggtta cgcgcagcgt
gaccgctaca cttgccagcg ccctagcgcc cgctcctttc 4560gctttcttcc cttcctttct
cgccacgttc gccggctttc cccgtcaagc tctaaatcgg 4620gggctccctt tagggttccg
atttagtgct ttacggcacc tcgaccccaa aaaacttgat 4680tagggtgatg gttcacgtag
tgggccatcg ccctgataga cggtttttcg ccctttgacg 4740ttggagtcca cgttctttaa
tagtggactc ttgttccaaa ctggaacaac actcaaccct 4800atctcggtct attcttttga
tttataaggg attttgccga tttcggccta ttggttaaaa 4860aatgagctga tttaacaaaa
atttaacgcg aattttaaca aaatattaac gcttacaatt 4920tgccattcgc cattcaggct
gcgcaactgt tgggaagggc gatcggtgcg ggcctcttcg 4980ctattacgcc agcccaagct
accatgataa gtaagtaata ttaaggtacg ggaggtactt 5040ggagcggccg caataaaata
tctttatttt cattacatct gtgtgttggt tttttgtgtg 5100aatcgatagt actaacatac
gctctccatc aaaacaaaac gaaacaaaac aaactagcaa 5160aataggctgt ccccagtgca
agtgcaggtg ccagaacatt tctctatcga ta 5212
SEQ ID NO: 19, length 5288, DNA, Artificial Sequence, GUN1-8x target
ggtaccgagc tcttacgcgt gctagcccgg gctcgagatc
tgatatcaag taatttgtcg 60attccatggt aatttgtcga ttcacgacta atttgtcgat
ttgtttctaa tttgtcgatt 120gaattctaat ttgtcgatta agttctaatt tgtcgatttt
cgaataattt gtcgattatc 180ttctaatttg tcgattcagt tcgcttacta gtgtcgaggt
aggcgtgtac ggtgggaggc 240ctatataagc agagctcgtt tagtgaaccg tcagatcgcc
tggaggtacc gccaccatgg 300aagatgccaa aaacattaag aagggcccag cgccattcta
cccactcgaa gacgggaccg 360ccggcgagca gctgcacaaa gccatgaagc gctacgccct
ggtgcccggc accatcgcct 420ttaccgacgc acatatcgag gtggacatta cctacgccga
gtacttcgag atgagcgttc 480ggctggcaga agctatgaag cgctatgggc tgaatacaaa
ccatcggatc gtggtgtgca 540gcgagaatag cttgcagttc ttcatgcccg tgttgggtgc
cctgttcatc ggtgtggctg 600tggccccagc taacgacatc tacaacgagc gcgagctgct
gaacagcatg ggcatcagcc 660agcccaccgt cgtattcgtg agcaagaaag ggctgcaaaa
gatcctcaac gtgcaaaaga 720agctaccgat catacaaaag atcatcatca tggatagcaa
gaccgactac cagggcttcc 780aaagcatgta caccttcgtg acttcccatt tgccacccgg
cttcaacgag tacgacttcg 840tgcccgagag cttcgaccgg gacaaaacca tcgccctgat
catgaacagt agtggcagta 900ccggattgcc caagggcgta gccctaccgc accgcaccgc
ttgtgtccga ttcagtcatg 960cccgcgaccc catcttcggc aaccagatca tccccgacac
cgctatcctc agcgtggtgc 1020catttcacca cggcttcggc atgttcacca cgctgggcta
cttgatctgc ggctttcggg 1080tcgtgctcat gtaccgcttc gaggaggagc tattcttgcg
cagcttgcaa gactataaga 1140ttcaatctgc cctgctggtg cccacactat ttagcttctt
cgctaagagc actctcatcg 1200acaagtacga cctaagcaac ttgcacgaga tcgccagcgg
cggggcgccg ctcagcaagg 1260aggtaggtga ggccgtggcc aaacgcttcc acctaccagg
catccgccag ggctacggcc 1320tgacagaaac aaccagcgcc attctgatca cccccgaagg
ggacgacaag cctggcgcag 1380taggcaaggt ggtgcccttc ttcgaggcta aggtggtgga
cttggacacc ggtaagacac 1440tgggtgtgaa ccagcgcggc gagctgtgcg tccgtggccc
catgatcatg agcggctacg 1500ttaacaaccc cgaggctaca aacgctctca tcgacaagga
cggctggctg cacagcggcg 1560acatcgccta ctgggacgag gacgagcact tcttcatcgt
ggaccggctg aagagcctga 1620tcaaatacaa gggctaccag gtagccccag ccgaactgga
gagcatcctg ctgcaacacc 1680ccaacatctt cgacgccggg gtcgccggcc tgcccgacga
cgatgccggc gagctgcccg 1740ccgcagtcgt cgtgctggaa cacggtaaaa ccatgaccga
gaaggagatc gtggactatg 1800tggccagcca ggttacaacc gccaagaagc tgcgcggtgg
tgttgtgttc gtggacgagg 1860tgcctaaagg actgaccggc aagttggacg cccgcaagat
ccgcgagatt ctcattaagg 1920ccaagaaggg cggcaagatc gccgtgaatt cttaactgca
gttaatctag agtcggggcg 1980gccggccgct tcgagcagac atgataagat acattgatga
gtttggacaa accacaacta 2040gaatgcagtg aaaaaaatgc tttatttgtg aaatttgtga
tgctattgct ttatttgtaa 2100ccattataag ctgcaataaa caagttaaca acaacaattg
cattcatttt atgtttcagg 2160ttcaggggga ggtgtgggag gttttttaaa gcaagtaaaa
cctctacaaa tgtggtaaaa 2220tcgataagga tctgaacgat ggagcggaga atgggcggaa
ctgggcggag ttaggggcgg 2280gatgggcgga gttaggggcg ggactatggt tgctgactaa
ttgagatgca tgctttgcat 2340acttctgcct gctggggagc ctggggactt tccacacctg
gttgctgact aattgagatg 2400catgctttgc atacttctgc ctgctgggga gcctggggac
tttccacacc ctaactgaca 2460cacattccac agcggatccg tcgaccgatg cccttgagag
ccttcaaccc agtcagctcc 2520ttccggtggg cgcggggcat gactatcgtc gccgcactta
tgactgtctt ctttatcatg 2580caactcgtag gacaggtgcc ggcagcgctc ttccgcttcc
tcgctcactg actcgctgcg 2640ctcggtcgtt cggctgcggc gagcggtatc agctcactca
aaggcggtaa tacggttatc 2700cacagaatca ggggataacg caggaaagaa catgtgagca
aaaggccagc aaaaggccag 2760gaaccgtaaa aaggccgcgt tgctggcgtt tttccatagg
ctccgccccc ctgacgagca 2820tcacaaaaat cgacgctcaa gtcagaggtg gcgaaacccg
acaggactat aaagatacca 2880ggcgtttccc cctggaagct ccctcgtgcg ctctcctgtt
ccgaccctgc cgcttaccgg 2940atacctgtcc gcctttctcc cttcgggaag cgtggcgctt
tctcatagct cacgctgtag 3000gtatctcagt tcggtgtagg tcgttcgctc caagctgggc
tgtgtgcacg aaccccccgt 3060tcagcccgac cgctgcgcct tatccggtaa ctatcgtctt
gagtccaacc cggtaagaca 3120cgacttatcg ccactggcag cagccactgg taacaggatt
agcagagcga ggtatgtagg 3180cggtgctaca gagttcttga agtggtggcc taactacggc
tacactagaa gaacagtatt 3240tggtatctgc gctctgctga agccagttac cttcggaaaa
agagttggta gctcttgatc 3300cggcaaacaa accaccgctg gtagcggtgg tttttttgtt
tgcaagcagc agattacgcg 3360cagaaaaaaa ggatctcaag aagatccttt gatcttttct
acggggtctg acgctcagtg 3420gaacgaaaac tcacgttaag ggattttggt catgagatta
tcaaaaagga tcttcaccta 3480gatcctttta aattaaaaat gaagttttaa atcaatctaa
agtatatatg agtaaacttg 3540gtctgacagt taccaatgct taatcagtga ggcacctatc
tcagcgatct gtctatttcg 3600ttcatccata gttgcctgac tccccgtcgt gtagataact
acgatacggg agggcttacc 3660atctggcccc agtgctgcaa tgataccgcg agacccacgc
tcaccggctc cagatttatc 3720agcaataaac cagccagccg gaagggccga gcgcagaagt
ggtcctgcaa ctttatccgc 3780ctccatccag tctattaatt gttgccggga agctagagta
agtagttcgc cagttaatag 3840tttgcgcaac gttgttgcca ttgctacagg catcgtggtg
tcacgctcgt cgtttggtat 3900ggcttcattc agctccggtt cccaacgatc aaggcgagtt
acatgatccc ccatgttgtg 3960caaaaaagcg gttagctcct tcggtcctcc gatcgttgtc
agaagtaagt tggccgcagt 4020gttatcactc atggttatgg cagcactgca taattctctt
actgtcatgc catccgtaag 4080atgcttttct gtgactggtg agtactcaac caagtcattc
tgagaatagt gtatgcggcg 4140accgagttgc tcttgcccgg cgtcaatacg ggataatacc
gcgccacata gcagaacttt 4200aaaagtgctc atcattggaa aacgttcttc ggggcgaaaa
ctctcaagga tcttaccgct 4260gttgagatcc agttcgatgt aacccactcg tgcacccaac
tgatcttcag catcttttac 4320tttcaccagc gtttctgggt gagcaaaaac aggaaggcaa
aatgccgcaa aaaagggaat 4380aagggcgaca cggaaatgtt gaatactcat actcttcctt
tttcaatatt attgaagcat 4440ttatcagggt tattgtctca tgagcggata catatttgaa
tgtatttaga aaaataaaca 4500aataggggtt ccgcgcacat ttccccgaaa agtgccacct
gacgcgccct gtagcggcgc 4560attaagcgcg gcgggtgtgg tggttacgcg cagcgtgacc
gctacacttg ccagcgccct 4620agcgcccgct cctttcgctt tcttcccttc ctttctcgcc
acgttcgccg gctttccccg 4680tcaagctcta aatcgggggc tccctttagg gttccgattt
agtgctttac ggcacctcga 4740ccccaaaaaa cttgattagg gtgatggttc acgtagtggg
ccatcgccct gatagacggt 4800ttttcgccct ttgacgttgg agtccacgtt ctttaatagt
ggactcttgt tccaaactgg 4860aacaacactc aaccctatct cggtctattc ttttgattta
taagggattt tgccgatttc 4920ggcctattgg ttaaaaaatg agctgattta acaaaaattt
aacgcgaatt ttaacaaaat 4980attaacgctt acaatttgcc attcgccatt caggctgcgc
aactgttggg aagggcgatc 5040ggtgcgggcc tcttcgctat tacgccagcc caagctacca
tgataagtaa gtaatattaa 5100ggtacgggag gtacttggag cggccgcaat aaaatatctt
tattttcatt acatctgtgt 5160gttggttttt tgtgtgaatc gatagtacta acatacgctc
tccatcaaaa caaaacgaaa 5220caaaacaaac tagcaaaata ggctgtcccc agtgcaagtg
caggtgccag aacatttctc 5280tatcgata
5288
SEQ ID NO: 20, length 35, PRT, Arabidopsis thaliana
Val Thr Tyr Asn Thr Leu Ile Ser Gly Leu Cys Lys Ala Gly Arg Leu
Glu Glu Ala Leu Glu Leu Phe Glu Glu Met Lys Glu Lys Gly Ile Ala
Pro Asp Val
SEQ ID NO: 21, length 31, PRT, Arabidopsis thaliana
Val Val Tyr Asn Ala Leu Ile Asp Met Tyr Ala Lys Cys Gly Asp Leu
Glu Glu Ala Arg Lys Val Phe Asp Glu Met Pro Glu Arg Asp Val
SEQ ID NO: 22, length 35, PRT, Arabidopsis thaliana
Phe Thr Leu Ala Ser Val Leu Lys Ala Cys Ala Ser Leu Gly Ala Leu
Ser Leu Gly Lys Gln Ile His Gly Tyr Val Ile Lys Ser Gly Phe Asp
Ser Asp Glu
SEQ ID NO: 23, length 36, PRT, Arabidopsis thaliana
Val Thr Phe Leu Gly Val Leu Ser Ala Cys Ser His Ser Gly Leu Val
Glu Glu Gly Leu Glu Tyr Phe Glu Ser Met Lys Glu Lys Tyr Gly Ile
Glu Pro Asp Glu