Patent application title: MECP2 BASED THERAPY

Inventors:
IPC8 Class: AC07K1400FI
USPC Class: 1 1
Class name:
Publication date: 2021-04-08
Patent application number: 20210101938



Abstract:

MeCP2 based therapy. The present invention relates to synthetic polypeptides that are useful in the treatment of disorders associated with reduced MeCP2 activity, including Rett syndrome. The present invention provides synthetic polypeptides comprising: i) an MBD amino acid sequence showing at least 70% similarity with the amino acid sequence as depicted in SEQ ID NO: 1; and ii) an NID amino acid sequence showing at least 70% similarity with the amino acid sequence as depicted in SEQ ID NO: 2, wherein the polypeptide has a deletion of at least 50 amino acids, when compared to the full length MeCP2 e1 and e2 sequences. The invention further provides nucleic acid constructs, expression vectors, virions, pharmaceutical compositions, and cells providing polynucleotides of the invention. The invention further provides methods of treating or preventing disease in an animal comprising administering to said animal a synthetic polypeptide according to the invention.

Claims:

1. A synthetic polypeptide comprising: i) an MBD amino acid sequence showing at least 70% similarity with the amino acid sequence as depicted in SEQ ID NO: 1; and ii) an NID amino acid sequence showing at least 70% similarity with the amino acid sequence as depicted in SEQ ID NO: 2, wherein the polypeptide has a deletion of at least 50 amino acids, when compared to the full length MeCP2 e1 and e2 sequences (SEQ ID Nos 3 and 4).

2. A synthetic polypeptide according to claim 1 wherein the polypeptide has less than 90% identity over the entire length of the amino acid sequences of MeCP2 as depicted in SEQ ID NO: 3 and SEQ ID NO: 4.

3. A synthetic polypeptide according to claim 1 or claim 2, having the structure: A-B-C-D-E wherein portion B of the synthetic polypeptide is said MBD amino acid sequence, and portion D of the synthetic polypeptide is said NID amino acid sequence, and further wherein: portion A of the synthetic polypeptide is less than 40 amino acids long and/or has less than 80% identity to the amino acid sequences as depicted in SEQ ID NOs:5 and 6, calculated over the entire length of the amino acid sequences as depicted in SEQ ID NOs: 5 and 6; portion C of the synthetic polypeptide is less than 20 amino acids long and/or has less than 80% identity to the amino acid sequence as depicted in SEQ ID NO: 7, calculated over the entire length of the amino acid sequence as depicted in SEQ ID NO: 7; and/or portion E of the synthetic polypeptide is absent, a protein tag, and/or has less than 80% identity to the amino acid sequence as depicted in SEQ ID NO: 8, calculated over the entire length of the amino acid sequence as depicted in SEQ ID NO: 8.

4. A synthetic polypeptide according to any of claims 1 to 3 wherein said synthetic polypeptide is capable of recruiting a NCoR/SMRT co-repressor complex component, such as NCoR/SMRT, HDAC3, GPS2, TBL1X or TBLR1, preferably TBL1X or TBLR1, to methylated DNA.

5. A synthetic polypeptide according to any preceding claim wherein said synthetic polypeptide consists of less than 430 amino acids, preferably less than 400, 350, 320, 270, or 200 amino acids, and further preferably less than 180 amino acids.

6. A synthetic polypeptide according to any preceding claim wherein said polypeptide comprises a nuclear localization signal (NLS), preferably wherein said NLS is comprised within the amino acid sequence between the MBD and NID.

7. A synthetic polypeptide according to any preceding claim wherein the amino acid sequence between the MBD and NID amino acid sequences has less than 75% identity to the amino acid sequence as depicted in SEQ ID NO: 7, calculated over the entire length of the amino acid sequence as depicted in SEQ ID NO: 7, preferably less than 50%, and further preferably less than 30% identity.

8. A synthetic polypeptide according to any preceding claim wherein the amino acid sequence between the MBD and NID amino acid sequences is less than 50 amino acids long, preferably less than 30 amino acids long, and further preferably less than 20 amino acids long.

9. A synthetic polypeptide according to any preceding claim wherein the amino acid sequence between the MBD and NID amino acid sequences has a substitution or deletion of at least 10 consecutive amino acids compared to the amino acid sequence from position 207 to position 271 of the full length human wild type MeCP2 polypeptide sequence (e2 isoform) as shown in SEQ ID NO: 4.

10. A synthetic polypeptide according to any preceding claim wherein the amino acid sequence adjacent to the carboxy end of the NID amino acid sequence has less than 75% identity to the amino acid sequence as depicted in SEQ ID NO: 8, calculated over the entire length of the amino acid sequence as depicted in SEQ ID NO: 8, preferably less than 50%, and further preferably less than 30% identity.

11. A synthetic polypeptide according to any preceding claim wherein the amino acid sequence adjacent to the carboxy end of the NID amino acid sequence is less than 50 amino acids long, preferably less than 30 amino acids long or less than 20 amino acids long, and further preferably wherein there is no amino acid sequence adjacent to the carboxy end of the NID amino acid sequence.

12. A synthetic polypeptide according to any preceding claim wherein the amino acid sequence adjacent to the amino end of the MBD amino acid sequence has less than 75% identity to the amino acid sequences as depicted in SEQ ID NOs: 5 and 6, calculated over the entire length of the amino acid sequences as depicted in SEQ ID NOs: 5 and 6, preferably less than 50%, and further preferably less than 30% identity.

13. A synthetic polypeptide according to any preceding claim wherein the amino acid sequence adjacent to the amino end of the MBD amino acid sequence is less than 50 amino acids long, preferably less than 30 amino acids long or less than 20 amino acids long, and further preferably less than 10 amino acids long.

14. A synthetic polypeptide according to any preceding claim wherein the polypeptide has less than 90% identity over the entire length of the amino acid sequences of MeCP2 as depicted in SEQ ID NO: 3 and SEQ ID NO: 4, preferably less than 80% identity, less than 70% identity, or less than 60% identity, and further preferably less than 40% identity.

15. A nucleic acid construct that encodes a polypeptide according to any preceding claim.

16. An expression vector comprising a nucleotide sequence encoding a synthetic polypeptide according to any of claims 1 to 14.

17. An expression vector according to claim 16 further comprising one or more control elements selected from: a promoter for expression of the nucleotide sequence in neuronal cells, for example an Mecp2 or MECP2 promoter, one or more downstream miR binding sites from the MECP2 or Mecp2 3'UTR, and an AU-rich element.

18. An expression vector according to claim 16 or claim 17 which is a viral vector, such as a retroviral vector, an adenoviral vector, an adeno-associated viral vector, or an alphaviral vector.

19. A virion comprising a vector according to claim 18.

20. A pharmaceutical composition comprising a synthetic polypeptide according to any of claims 1 to 14, a nucleic acid construct according to claim 15, an expression vector according to any of claims 16 to 18 and/or a virion according to claim 19.

21. A cell comprising a synthetic genetic construct adapted to express a polypeptide according to any of claims 1 to 14.

22. A cell according to claim 21 comprising a vector according to any of claims 16 to 18.

23. A cell according to claim 21 or 22 for producing a virion according to claim 19.

24. A method of treating or preventing disease in an animal comprising administering to said animal a synthetic polypeptide according to any of claims 1 to 14.

25. A method according to claim 24 wherein said disease is a neurological disorder associated with inactivating mutation of MeCP2, for example Rett syndrome.

26. A method according to claim 24 or 25, wherein said administering comprises administering a composition comprising a synthetic polypeptide according to any of claims 1 to 14, a nucleic acid construct according to claim 15, an expression vector according to any of claims 16 to 18, a virion according to claim 19 and/or a pharmaceutical composition according to claim 20.

27. A synthetic polypeptide according to any of claims 1 to 14, a nucleic acid construct according to claim 15, an expression vector according to any of claims 16 to 18, a virion according to claim 19 and/or a pharmaceutical composition according to claim 20 for the treatment or prevention of a neurological disorder associated with inactivating mutation of MeCP2, for example Rett syndrome.

28. The use of a synthetic polypeptide according to any of claims 1 to 14, a nucleic acid construct according to claim 15, an expression vector according to any of claims 16 to 18, a virion according to claim 19 and/or a pharmaceutical composition according to claim 20 in the manufacture of a medicament for the treatment or prevention of a neurological disorder associated with inactivating mutation of MeCP2, for example Rett syndrome.

Description:

FIELD OF THE INVENTION

[0001] The present invention relates to synthetic polypeptides that are useful in the treatment of disorders associated with reduced MeCP2 activity, including Rett syndrome. The invention also relates to nucleic acid constructs, expression vectors, virions and cells for expressing the synthetic polypeptides. Further, the invention concerns methods of treating disorders, such as Rett syndrome, using the synthetic polypeptides of the invention, the use of the synthetic polypeptides, nucleic acid constructs, expression vectors, virions and cells in the manufacture of medicaments for the treatment of disorders associated with reduced MeCP2 activity, including Rett syndrome, and pharmaceutical compositions comprising the synthetic polypeptides, nucleic acid constructs, expression vectors, and virions of the invention.

BACKGROUND TO THE INVENTION

[0002] Methyl CpG-binding Protein 2 (MeCP2) is a nuclear protein that was named for its ability to preferentially bind methylated DNA. Interest in MeCP2 increased when mutations in the MECP2 gene were identified in the majority of Rett syndrome patients.

[0003] Rett syndrome occurs in about 1 in 15,000 girls. Although it is in theory a rare disease because it affects fewer than 1 in 2000 individuals, it is actually one of the most common genetic causes of intellectual disability in women. It was originally considered to be a neurodevelopmental disorder, due to the decreased, arrested and retarded development of those with the disorder from the age of about 6 months. However, the fact that some of the main symptoms have been found to be reversible in a mouse model of the disease means that it is now generally considered to be a neurological disorder.

[0004] The MECP2 gene is located on the X chromosome. It spans 76 kb and is composed of four exons. The MeCP2 protein has two isoforms, MeCP2 e1 and MeCP2 e2, which differ at the N-terminus of the protein. The isoform e1 is made up of 498 amino acids and isoform e2 is 486 amino acids long. The MECP2 (human) and Mecp2 (mouse) genes consist of four exons and undergo alternative splicing to produce the two mRNA species: e1 consists of exons 1, 3 and 4; and e2 consists of exons 1, 2, 3 and 4. Translation starts from exon 1 or 2 in isoforms e1 and e2, respectively. Since the vast majority of the coding sequence is in exons 3 and 4, these two isoforms are very similar and only differ at the extreme N-termini. The mRNA of the MECP2 e1 variant has greater expression in the brain than that of the MECP2 e2 variant, and the e1 protein is more abundant in the mouse and human brain. MeCP2 is an abundant mammalian protein that selectively binds 5-methyl cytosine residues in symmetrically methylated mCpG dinucleotides and asymmetrically methylated mCpA dinucleotides. CpG dinucleotides are preferentially located in the promoter regions of genes, but these are mostly unmethylated. In comparison, it is the CpG dinucleotides in the bulk genome that are highly methylated and it is to these that MeCP2 binds. The presence of mCpA in neurons further increases the number of binding sites. In this way, MeCP2 regulates gene transcription by binding in the main body of gene sequences¹.

[0005] MeCP2 is highly conserved across vertebrates, and at least six biochemically distinct domains have been identified in the protein², including High Mobility Group Protein-like Domains, the Methyl Binding Domain (MBD), the Transcriptional Repression Domain comprising the NCoR/SMRT Interaction Domain (NID), and the C-terminal domains α and β. Functionally, MeCP2 has been implicated in several cellular processes based on its reported interaction with >40 binding partners³, including transcriptional co-repressors⁴ (e.g. NCoR/SMRT⁵), transcriptional activators⁶, RNA⁷, chromatin remodellers⁸,⁹, microRNA-processing proteins¹⁰, and splicing factors¹¹. Accordingly, MeCP2 has been cast as a multi-functional hub that integrates diverse functions that are essential in mature neurons¹².

[0006] There are currently no treatments available that are specific for Rett syndrome. Instead, treatment generally involves treating the symptoms of the disease using traditional drugs, whilst preventative strategies involve aggressive nutritional management, prevention of gastrointestinal and orthopedic complications, and rehabilitation therapies. Thus there remains a pressing need for a means of specifically treating and preventing the development of Rett syndrome.

SUMMARY OF THE INVENTION

[0007] Accordingly, the present invention provides synthetic polypeptides comprising an MBD amino acid sequence showing at least 70% similarity with the amino acid sequence as depicted in SEQ ID NO: 1 and an NID amino acid sequence showing at least 70% similarity with the amino acid sequence as depicted in SEQ ID NO: 2.

[0008] The synthetic polypeptides of the invention may have a deletion or substitution of at least 50 amino acids compared to the full length MeCP2 e1 and e2 sequences (SEQ ID NOs 3 and 4). Additionally, or alternatively, the synthetic polypeptides of the invention have less than 90% identity over the entire length of the amino acid sequence of MeCP2 as depicted in SEQ ID NO: 3 and/or SEQ ID NO: 4.

[0009] Generally, the synthetic polypeptides of the invention will comprise MBD and NID domains in accordance with MeCP2, but will be lacking other parts of the natural sequence of MeCP2.

[0010] Thus any deletion or substitution in the synthetic polypeptide may be of a part of the natural sequence of MeCP2, but not of a part of the MBD or NID of MeCP2. Thus the synthetic polypeptides of the invention may generally have the structure

[0011] A-B-C-D-E

[0012] wherein portion B of the synthetic polypeptide is the MBD amino acid sequence showing at least 70% similarity with the amino acid sequence as depicted in SEQ ID NO: 1, and portion D of the synthetic polypeptide is the NID amino acid sequence showing at least 70% similarity with the amino acid sequence as depicted in SEQ ID NO: 2, and further wherein at least one of the three following are true: portion A of the synthetic polypeptide is less than 30 amino acids long and/or has less than 90% identity to the amino acid sequences as depicted in SEQ ID NOs: 5 and 6, calculated over the entire length of the amino acid sequences as depicted in SEQ ID NO: 5 and 6; portion C of the synthetic polypeptide is less than 20 amino acids long and/or has less than 90% identity to the amino acid sequence as depicted in SEQ ID NO: 7, calculated over the entire length of the amino acid sequence as depicted in SEQ ID NO: 7; and portion E of the synthetic polypeptide is absent, a polypeptide tag, and/or has less than 90% identity to the amino acid sequence as depicted in SEQ ID NO: 8, calculated over the entire length of the amino acid sequence as depicted in SEQ ID NO: 8. The skilled person will appreciate that this general structure is disclosed from left to right in the accepted N-terminal to C-terminal direction of the synthetic polypeptide.
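By way of illustration only, the "at least one of the three" test on portions A, C and E described above can be sketched in Python. This is a minimal sketch under stated assumptions, not a definitive reading of the structure: the `Portion` class, the function names and the example values are hypothetical, "and/or" is read as a logical OR, and the percentage identity of each portion to its reference sequence is assumed to have been computed elsewhere (for portion A, against both SEQ ID NOs: 5 and 6, with the higher value supplied).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Portion:
    """One flanking portion (A, C or E) of a candidate A-B-C-D-E polypeptide."""
    length: int                           # amino acids; 0 means the portion is absent
    identity_pct: Optional[float] = None  # % identity to its reference, if computed
    is_tag: bool = False                  # True if the portion is a polypeptide tag


def below_identity(portion: Portion, limit: float = 90.0) -> bool:
    """True when a computed identity value is under the stated limit."""
    return portion.identity_pct is not None and portion.identity_pct < limit


def check_general_structure(a: Portion, c: Portion, e: Portion) -> dict:
    """Evaluate the three conditions; 'and/or' is treated as OR (an assumption)."""
    results = {
        "A": a.length < 30 or below_identity(a),
        "C": c.length < 20 or below_identity(c),
        "E": e.length == 0 or e.is_tag or below_identity(e),
    }
    results["at_least_one"] = any(results.values())
    return results


# Hypothetical candidate: short portion A, unchanged portion C, no portion E.
truncated = check_general_structure(Portion(10, 100.0), Portion(98, 100.0), Portion(0))

# Hypothetical candidate whose A, C and E portions are all long and unchanged.
unchanged = check_general_structure(Portion(71, 100.0), Portion(98, 100.0), Portion(174, 100.0))

print(truncated)
print(unchanged)
```

In this sketch the first candidate meets the conditions on portions A and E, whereas the second meets none of the three.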

[0013] The invention also provides nucleic acid constructs encoding a synthetic polypeptide of the invention, and expression vectors comprising a nucleotide sequence encoding a synthetic polypeptide of the invention. The expression vector may be a viral vector, and thus the invention also provides a virion comprising an expression vector according to the invention. The invention also provides cells that comprise a synthetic genetic construct adapted to express a polypeptide of the invention, cells comprising a vector of the invention, and cells for producing a virion of the invention.

[0014] The invention also provides pharmaceutical compositions comprising the synthetic polypeptides, nucleic acid constructs, expression vectors and/or virions of the invention.

[0015] The synthetic polypeptides of the invention have utility in medicine, and particularly in the treatment of neurological disorders associated with inactivation, such as an inactivating mutation, of MECP2. Such disorders include Rett syndrome. Therefore the invention provides a method of treating or preventing disease in an animal comprising administering to said animal a synthetic polypeptide of the invention. Said administering may comprise administering a synthetic polypeptide of the invention, an expression vector of the invention, a virion of the invention and/or a pharmaceutical composition of the invention.

[0016] Furthermore, the invention provides synthetic polypeptides of the invention, expression vectors of the invention, and virions of the invention for the treatment or prevention of a neurological disorder associated with inactivating mutation of MECP2, for example Rett syndrome. The invention also provides the use of synthetic polypeptides of the invention, expression vectors of the invention, and virions of the invention in the manufacture of a medicament for the treatment or prevention of a neurological disorder associated with inactivating mutation of MECP2, for example Rett syndrome.

DETAILED DESCRIPTION

[0017] The synthetic polypeptides of the invention provide therapeutic proteins that can be used in the treatment of disorders that are caused by inactivation or reduced activity of MeCP2. Such disorders include, in particular, Rett syndrome. The nucleic acid constructs, expression vectors, virions and cells of the invention can be used to produce and, optionally, deliver the synthetic polypeptides. The invention also provides methods of treatment using the products of the invention, and the use of those products in those treatments.

[0018] The invention is based on the inventors' surprising and unexpected finding that it is a deficiency in the biological activity associated with the MBD and NID of MeCP2 that is key to the development of Rett syndrome, such that other parts of the protein are not necessary, despite the fact that MeCP2 is a highly conserved protein. They generated and tested the hypothesis that the MeCP2 functions that are vital in Rett syndrome are those due to MeCP2 forming a bridge between chromatin and the NCoR/SMRT complex, so that all other domains of MeCP2 are dispensable. Furthermore, they hypothesised that the Rett syndrome mutations occurring within the MBD or NID domains interfere directly with this function, and that Rett syndrome mutations occurring outside the MBD and NID either destabilise the protein generally or specifically impair the bridge between the chromatin and the NCoR/SMRT complex. As a result of their studies and understanding, they have concluded that the MBD and the NID are therefore sufficient for the MeCP2 function required to treat or prevent Rett syndrome and similar disorders. This means that they are able to provide a "mini-MeCP2" protein derivative by jettisoning a significant portion, for example in some embodiments up to 50-65%, of the native MeCP2 protein.

[0019] As discussed elsewhere in the specification, the inventors' conclusion means that therapeutic synthetic polypeptides can be prepared that are considerably smaller than the full length MeCP2. This means that they can be easier to produce and to deliver effectively to patients. For example, some delivery vehicles, such as adeno-associated viral (AAV) vectors, are restricted as to the amount of payload they can carry, so the ability to lighten that load by encoding a smaller polypeptide is advantageous. Also, the removal of unnecessary parts of the MeCP2 polypeptide means that alternative polypeptide sequences, such as peptide tags, regulatory tags and/or signaling peptides, can be inserted in the polypeptide in some embodiments without making the polypeptide overly large. Similarly, a smaller protein can be encoded by a smaller nucleic acid sequence. Thus, where there are size constraints on the amount of nucleic acid sequence that can be included in a particular construct, constructs of that type that encode the polypeptides of the invention may, in some embodiments, include additional sequences, such as regulatory elements, that would be difficult to include if the full-length MeCP2 protein were being encoded instead. Furthermore, the removal of other biologically active, but unnecessary, parts of the MeCP2 protein means that there may be less chance of unwanted side-effects due to interactions of those parts of the protein during the therapeutic or preventative treatment.

[0020] Thus the invention provides improved methods of treating or preventing disorders associated with reduced MeCP2 activity, such as Rett syndrome, as well as therapeutic products for use in those methods.

[0021] Furthermore, the inventors have surprisingly and unexpectedly found that although deletion of the part of MeCP2 that links the MBD and NID domains appears to reduce the stability of the synthetic polypeptide having the MBD and NID domains, this reduced stability can have a beneficial effect as it reduces the toxicity that can be associated with over-dosing of subjects with a MeCP2 polypeptide. Thus the invention also provides in some embodiments of the invention improved methods that are safer and less likely to be associated with toxic side effects, as well as the therapeutic products for use in those methods, which provide synthetic polypeptides lacking at least part of the amino acid sequence that links the MBD and NID in MeCP2.

[0022] In order to assist the understanding of the present invention, certain terms used herein will now be further defined, and more generally further details of the invention will be given, in the following paragraphs.

[0023] Synthetic Polypeptides

[0024] The invention provides synthetic polypeptides comprising an MBD amino acid sequence and an NID amino acid sequence.

[0025] As used herein, the term "polypeptide" can be used interchangeably with "peptide" or "protein", and means at least two covalently attached alpha amino acid residues linked by a peptidyl bond. The term polypeptide encompasses purified natural products, or chemical products, which may be produced partially or wholly using recombinant or synthetic techniques. The term polypeptide may refer to a complex of more than one polypeptide, such as a dimer or other multimer, a fusion protein, a protein variant, or derivative thereof. The term also includes modified proteins, for example, a protein modified by glycosylation, acetylation, phosphorylation, pegylation, ubiquitination, and so forth. A polypeptide may comprise amino acids not encoded by a nucleic acid codon.

[0026] As used herein, the term "synthetic polypeptide" refers to polypeptide sequences formed by processes through human agency. The synthetic polypeptides of the invention are based on MeCP2 in that they have biologically active MBD and NID sequences, such as those that occur in wild type MeCP2 proteins, but are distinguished from the naturally occurring MeCP2 proteins. The polypeptides of the invention are synthetic because they include mutations, such as amino acid deletions, substitutions and/or insertions, in the wild type MeCP2 sequences such that the resultant synthetic polypeptides are not known from the art as natural polypeptides.

[0027] "Naturally occurring," "native," or "wild-type" is used to describe an object that can be found in nature as distinct from being artificially produced. For example, a protein or nucleotide sequence present in an organism (including a virus), which can be isolated from a source in nature and that has not been intentionally modified by a person in the laboratory, is naturally occurring.

[0028] The Methyl-CpG Binding Domain (MBD) of MeCP2 has the ability to bind methylated DNA. The human MeCP2 MBD sequence is provided herein as SEQ ID NO: 1. It consists of amino acids 72 to 173, inclusive, of the human MeCP2 protein (numbering refers to the e2 isoform, i.e. as in SEQ ID NO: 4). The mouse MeCP2 MBD sequence is identical to the human sequence. The polypeptides of the invention comprise an MBD having at least 70% similarity to this MeCP2 MBD sequence (SEQ ID NO: 1). Preferably the polypeptides of the invention comprise an MBD having at least 70%, 75%, 80%, 85%, 88%, 90%, 92%, 94%, 95%, 96%, 97%, 98% or 99% similarity to the human MeCP2 MBD sequence (SEQ ID NO:1). Further preferably the polypeptides of the invention comprise an MBD having at least 90% similarity. Most preferably the polypeptides of the invention comprise the human MeCP2 MBD sequence (SEQ ID NO:1). The MBD sequences of the synthetic polypeptides of the invention have the ability to bind methylated DNA.

[0029] The MBD sequence of particular interest for the synthetic polypeptides of the invention is that of the amino acids at positions 78 to 162 of the MeCP2 e2 isoform (SEQ ID NO: 4). Thus in preferred embodiments of the invention the polypeptides of the invention comprise an MBD having at least 85%, 88%, 90%, 92%, 94%, 95%, 96%, 97%, 98% or 99% similarity to the sequence of amino acids from positions 78 to 162 of the MeCP2 e2 isoform (SEQ ID NO: 4).

[0030] Most preferably MBD amino acid sequences of the polypeptides of the invention comprise the sequence of amino acids from positions 78 to 162 of the MeCP2 e2 isoform (SEQ ID NO: 4).

[0031] The MBD sequence of MeCP2 includes several phosphorylation sites (Ser80, Ser86, Thr148/9 and Ser164; numbering with respect to the e2 isoform). Phosphorylation at Ser80 and Ser164, at least, has been associated with affecting the activity of MeCP2. Therefore it is preferred that one, more, or all of these amino acids are retained in the MBD sequences of the synthetic polypeptides of the invention.

[0032] The MBD sequence of the invention may correspond to that of a naturally occurring MeCP2 MBD sequence, for example the sequence of MBD in the zebrafish homolog of MeCP2.

[0033] The NCoR/SMRT Interaction Domain (NID) of MeCP2 is the domain through which MeCP2 interacts with the NCoR/SMRT co-repressor complexes. The human MeCP2 NID sequence is provided herein as SEQ ID NO: 2. It consists of amino acids 272 to 312, inclusive, of the human MeCP2 protein (numbering refers to the e2 isoform, i.e. as in SEQ ID NO: 4). The mouse MeCP2 NID sequence is identical to the human sequence, except for amino acid position 297 in SEQ ID NO: 4 (i.e. the amino acid at position 26 in SEQ ID NO: 2), which is histidine in mouse but glutamine in human. The polypeptides of the invention comprise an NID amino acid sequence having at least 70% similarity to this MeCP2 NID sequence (SEQ ID NO: 2). Preferably the polypeptides of the invention comprise an NID having at least 75%, 80%, 85%, 88%, 90%, 92%, 94%, 95%, 96%, 97%, 98% or 99% similarity to the human MeCP2 NID sequence (SEQ ID NO: 2). Further preferably the polypeptides of the invention comprise an NID having at least 90% similarity. Most preferably the polypeptides of the invention comprise the human MeCP2 NID sequence (SEQ ID NO: 2). The NID sequences of the synthetic polypeptides of the invention have the ability to interact, or bind, with the NCoR/SMRT co-repressor complex.

[0034] The NID sequence of particular interest for the synthetic polypeptides of the invention is that of the amino acids at positions 298 to 309 of the MeCP2 e2 isoform (SEQ ID NO: 4). Thus in preferred embodiments of the invention the polypeptides of the invention comprise an NID having at least 80%, 85%, 88%, 90%, 92%, 94%, 95%, 96%, 97%, 98% or 99% similarity to the sequence of amino acids from positions 298 to 309 of the MeCP2 e2 isoform (SEQ ID NO: 4). Most preferably NID amino acid sequences of the polypeptides of the invention comprise the sequence of amino acids from positions 298 to 309 of the MeCP2 e2 isoform (SEQ ID NO: 4).

[0035] The NID sequence of MeCP2 includes phosphorylation sites (Thr308 and Ser274; numbering with respect to the e2 isoform), the former of which has been associated with affecting the activity of the NID. Therefore it is preferred that Thr308 at least is retained in the NID sequences of the synthetic polypeptides of the invention.
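By way of illustration only, the coordinate arithmetic used in the preceding paragraphs (domain lengths, conversion of a full-length e2 position into a position within a domain sequence, and which of the named phosphorylation sites fall inside a given region) can be sketched in Python. The domain boundaries and site positions are those stated above; the function and variable names are hypothetical, and reading "Thr148/9" as the two positions 148 and 149 is an assumption about the notation.

```python
# Coordinates stated in the description (1-based, inclusive, e2 isoform numbering).
DOMAINS = {"MBD": (72, 173), "NID": (272, 312)}
CORE_REGIONS = {"MBD": (78, 162), "NID": (298, 309)}

# Phosphorylation sites named in the description. "Thr148/9" is treated here as
# two separate positions, which is an assumption about the notation.
PHOSPHO_SITES = {
    "Ser80": 80, "Ser86": 86, "Thr148": 148, "Thr149": 149,
    "Ser164": 164, "Ser274": 274, "Thr308": 308,
}


def region_length(region):
    """Number of residues in an inclusive (start, end) region."""
    start, end = region
    return end - start + 1


def to_domain_position(full_length_pos, domain):
    """Convert a position in the full-length e2 sequence to a 1-based domain position."""
    start, end = DOMAINS[domain]
    if not start <= full_length_pos <= end:
        raise ValueError(f"position {full_length_pos} is outside the {domain}")
    return full_length_pos - start + 1


def sites_within(region):
    """Names of the listed phosphorylation sites inside a region, in sequence order."""
    start, end = region
    ordered = sorted(PHOSPHO_SITES.items(), key=lambda item: item[1])
    return [name for name, pos in ordered if start <= pos <= end]


print(region_length(DOMAINS["MBD"]), region_length(DOMAINS["NID"]))  # 102 41
print(to_domain_position(297, "NID"))  # 26
print(sites_within(CORE_REGIONS["MBD"]), sites_within(CORE_REGIONS["NID"]))
```

The conversion of position 297 to position 26 reproduces the correspondence given above between SEQ ID NO: 4 and SEQ ID NO: 2. On these coordinates, Ser164 and Ser274 lie within the full MBD and NID respectively but outside the shorter regions at positions 78 to 162 and 298 to 309.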

[0036] The NID sequence of the invention may correspond to that of a naturally occurring MeCP2 NID sequence, for example the sequence of NID in the zebrafish homolog of MeCP2. The MBD and NID sequences may have the same amount of percentage similarity to their respective wild type human MeCP2 sequences, or they may have different amounts of percentage similarity to their respective wild type human MeCP2 sequences. The percentage similarities for the MBD and NID may therefore consist of any combination of the above disclosed percentage similarities. Thus the present invention provides synthetic polypeptides comprising an MBD amino acid sequence showing at least 70% similarity with the amino acid sequence as depicted in SEQ ID NO: 1 and an NID amino acid sequence showing at least 70% similarity with the amino acid sequence as depicted in SEQ ID NO: 2, but preferably the MBD and NID amino acid sequences may have at least 75%, 80%, 85%, 88%, 90%, 92%, 94%, 95%, 96%, 97%, 98% or 99% similarity to the human MeCP2 domain sequences. Further preferably, they have at least 90% similarity. Similarly, the MBD sequence may have at least 70%, 75%, 80%, 85%, 88%, 90%, 92%, 94%, 95%, 96%, 97%, 98% or 99% similarity to the human MeCP2 MBD sequence whilst the NID sequence of the same synthetic polypeptide may have at least 70%, 75%, 80%, 85%, 88%, 90%, 92%, 94%, 95%, 96%, 97%, 98% or 99% similarity to the human MeCP2 NID sequence. Preferably one or both of the MBD and NID sequences will consist of or comprise their corresponding human or mouse domain sequence.

[0037] The term "similarity" refers to a degree of similarity between proteins or polypeptide sequences taking into account differences in amino acids at aligned positions of the sequences, but in which the functional similarity of the different amino acid residues, in view of almost equal size, lipophilicity, acidity, etc., is also taken into account. A percentage similarity can be calculated by optimal alignment of the sequences using a similarity-scoring matrix such as the Blosum62 matrix described in Henikoff S. and Henikoff J. G., P.N.A.S. USA 1992, 89: 10915-10919. Calculation of the percentage similarity and optimal alignment of two sequences using the Blosum62 similarity matrix and the algorithm of Needleman and Wunsch (J. Mol. Biol. 1970, 48: 443-453) can be performed using the GAP program of the Genetics Computer Group (GCG, Madison, Wis., USA) using the default parameters of the program.

[0038] Exemplary parameters for amino acid comparisons for similarity in the present invention use the Blosum62 matrix (Henikoff and Henikoff, supra) in association with the following settings for the GAP program:

[0039] Gap penalty: 8

[0040] Gap length penalty: 2

[0041] No penalty for end gaps.

[0042] Functional polymorphic forms of MBD and NID from mice and humans, and homologues of these domains from MeCP2 of other species, may be included in the polypeptides of the present invention. Variants of these domains in the polypeptides that also form part of the present invention are natural or synthetic variants that may contain variations in the amino acid sequence due to deletions, substitutions, insertions, inversions or additions of one or more amino acids in said sequence or due to an alteration to a moiety chemically linked to a protein. For example, a protein variant may be an altered carbohydrate or PEG structure attached to a protein. The polypeptides of the invention may include at least one such protein modification.

[0043] "Variants" of a polypeptide domain or protein, as used herein, refers to a polypeptide domain or protein resulting when a polypeptide is modified by one or more amino acids (e.g. insertion, deletion or substitution), or which comprises a protein modification, or which contains modified or non-natural amino acids. Substitutional variants of polypeptides are those in which at least one residue in the amino acid sequence has been removed and a different residue inserted in its place. The domains in the polypeptides of the present invention can contain conservative changes, wherein a substituted amino acid has similar structural or chemical properties, or more rarely non-conservative substitutions, for example, replacement of a glycine with a tryptophan, as long as the domains retain function. Variants may also include sequences with amino acid deletions or insertions, or both. Guidance in determining which amino acid residues may be substituted, inserted, or deleted without abolishing biological or immunological activity may be found using computer programs well known in the art.

[0044] The term "conservative substitution" relates to the replacement of one or more amino acid residues with residues having similar biochemical properties. Typically, conservative substitutions have little or no impact on the activity of a resulting polypeptide sequence. For example, a conservative substitution in a binding domain may be an amino acid substitution that does not substantially affect the ability of the domain to bind to its binding partner(s) or otherwise perform its usual biological function. Screening of variants of the polypeptide domains of the present invention can be used to identify which amino acid residues can tolerate an amino acid substitution. In one example, the relevant biological activity of a polypeptide having a modified domain is not altered by more than 25%, preferably not more than 20%, especially not more than 10%, when one or more conservative amino acid substitutions are effected.

[0045] One or more conservative substitutions can be included in a MBD or NID of a polypeptide of the present invention. In one example, 10 or fewer conservative substitutions are included in the domains. A polypeptide of the invention may therefore include 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 or more conservative substitutions of the MBD and/or NID domains. A polypeptide can be produced to contain one or more conservative substitutions by manipulating the nucleotide sequence that encodes that polypeptide using, for example, standard procedures such as site-directed mutagenesis or PCR. Alternatively, a polypeptide can be produced to contain one or more conservative substitutions by using peptide synthesis methods, for example as known in the art.

[0046] Examples of amino acids which may be substituted for an original amino acid in a protein and which are regarded as conservative substitutions include: Ser for Ala; Lys for Arg; Gln or His for Asn; Glu for Asp; Asn for Gln; Asp for Glu; Pro for Gly; Asn or Gln for His; Leu or Val for Ile; Ile or Val for Leu; Arg or Gln for Lys; Leu or Ile for Met; Met, Leu or Tyr for Phe; Thr for Ser; Ser for Thr; Tyr for Trp; Trp or Phe for Tyr; and Ile or Leu for Val. In one embodiment, the substitutions are among Ala, Val, Leu and Ile; among Ser and Thr; among Asp and Glu; among Asn and Gln; among Lys and Arg; and/or among Phe and Tyr. However, a substitution may not be considered conservative where it results in the removal of a site of phosphorylation within the polypeptide sequence. Further information about conservative substitutions can be found in, among other locations, Ben-Bassat et al., (J. Bacteriol. 169:751-7, 1987), O'Regan et al., (Gene 77:237-51, 1989), Sahin-Toth et al., (Protein Sci. 3:240-7, 1994), Hochuli et al., (Bio/Technology 6:1321-5, 1988), WO 00/67796 (Curd et al.) and in standard textbooks of genetics and molecular biology.
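The substitution pairs listed in paragraph [0046] can be expressed as a simple lookup. The following Python sketch is illustrative only: the names CONSERVATIVE and is_conservative, the three-letter keys, and the removes_phospho_site flag are assumptions introduced here, and the table covers only the example pairs named in the paragraph.

```python
# Example conservative substitutions taken from paragraph [0046].
# Key: original residue. Value: residues regarded as conservative replacements.
CONSERVATIVE = {
    "Ala": {"Ser"},
    "Arg": {"Lys"},
    "Asn": {"Gln", "His"},
    "Asp": {"Glu"},
    "Gln": {"Asn"},
    "Glu": {"Asp"},
    "Gly": {"Pro"},
    "His": {"Asn", "Gln"},
    "Ile": {"Leu", "Val"},
    "Leu": {"Ile", "Val"},
    "Lys": {"Arg", "Gln"},
    "Met": {"Leu", "Ile"},
    "Phe": {"Met", "Leu", "Tyr"},
    "Ser": {"Thr"},
    "Thr": {"Ser"},
    "Trp": {"Tyr"},
    "Tyr": {"Trp", "Phe"},
    "Val": {"Ile", "Leu"},
}


def is_conservative(original, replacement, removes_phospho_site=False):
    """Return True if the substitution appears in the example list.

    removes_phospho_site is a hypothetical flag supplied by the caller: the
    text notes that a substitution removing a phosphorylation site may not
    be considered conservative, so such a substitution is rejected here.
    """
    if removes_phospho_site:
        return False
    return replacement in CONSERVATIVE.get(original, set())
```

Note that the example list is directional: Ser is listed as a replacement for Ala, but Ala is not listed as a replacement for Ser.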

[0047] Substitutions causing loss or decrease of function of MBD and NID are known in the art, not least due to the association of some with Rett syndrome and other MeCP2 associated disorders. Examples of such harmful changes or mutations include those shown in Table 1 and FIG. 2C in the MBD, and those shown in Table 2 and FIG. 2D in NID. Thus the skilled person will understand that these harmful changes should not be included in the MBD and NID domains of the polypeptides of the invention.

TABLE 1. Harmful and benign amino acid changes in the MBD. According to convention, all amino acid numbers given in the following refer to the human and mouse e2 isoforms. *Those found in hemizygous males are in bold, and those in heterozygous females are in italics; **those found in non-mammalian vertebrates are provided in italics. The "benign changes" listed below are not known to be associated with an MeCP2-associated disorder, therefore they are probably benign.

Harmful changes associated with classical RTT (ref. 13): L100V, L100R, P101R, P101S, P101H, P101L, R106W, R106Q, R106L, L108H, R111G, L124F, P127L, Q128P, A131D, R133C, R133P, R133L, S134C, S134F, S134P, K135E, L138S, P152R, F155S, D156E, D156A, F157L, F157I, T158M, T158A, G161V

Harmful changes associated with atypical RTT or other intellectual disability (ref. 13): S86C, P93S, D97Y/E, P101R, G103D, W104R, G114A, Y120D, D121G, V122M, N126S, G129V, R133H/G, E137G (s only), A140V (s + s), Y141C, D151G, P152A (s + s), F155C, D156G, T160S (s + s), G161E/W, P172S (s only)

Benign changes present in the general population (ref. 14)*: P72L/S, P75L, K82R, R85H, R89C, R91W, S113F, R115H, Q128E, A140G, K144R, D147E/N, T160S, K171N, P172L, P173A

Benign changes present in other [vertebrate] species**: P72L, A73S, V74A, A77S, P81A, I87V, D96E, T99S, E102Q, Y120F, Q128N/E, I139M, E143Q, S149I, L150T, Q170K, K171R, P172Q

[0048] Tables 1 and 2 also list changes that have no known association with an MECP2-related disorder, and that are therefore believed to be benign. Thus the skilled person will understand that such apparently benign changes may optionally be included in synthetic polypeptides of the invention.

TABLE 2. Harmful and benign amino acid changes in the NID. According to convention, all amino acid numbers given in the following refer to the human and mouse e2 isoforms. *Those found in hemizygous males are in bold, and those in heterozygous females are in italics; **those found in mouse are in bold, and those found in non-mammalian vertebrates are provided in italics.

Harmful changes associated with classical RTT (ref. 13): P302T, P302S, P302H, P302L, P302R, K305R, K305T, R306C, R306H

Harmful changes associated with atypical RTT or other intellectual disability (ref. 13): K286R, V300I, I303M, K304E/R, R309W (s + s), T311M

Benign changes present in the general population (ref. 14)*: P272L, A278T/V, A279S/P, A291T, A287V/P, V288M, R294Q/P//G, S295T, T311A

Benign changes present in other [vertebrate] species: G273S/A, S274A, V275A/L, V276A, A278I, A279L, A280T, A281E, E282A, A288I/L, I293A, R294K, S295P, V296L, Q297H(mouse)/L, T299R, V300A, V312I/L

[0049] The biological activity of the MBD and NID domains that is of particular interest for the invention is the ability to recruit members of the NCoR/SMRT co-repressor complex to methylated DNA. Therefore it is preferred that the synthetic polypeptides of the invention are capable of recruiting NCoR/SMRT co-repressor complex components to methylated DNA. The NCoR/SMRT co-repressor complex components include NCoR, HDAC3, SIN3A, GPS2, SMRT, TBL1X and TBLR1. Preferably the synthetic polypeptides are capable of recruiting TBL1X or TBLR1 to methylated DNA.

[0050] The inventors have identified the MBD and NID domains as being key to the activity that is required of therapeutic MeCP2 in order for it to compensate for the reduced activity of MeCP2 in Rett syndrome and related disorders. Therefore whilst it is required that the MBD and NID domains are biologically active in the synthetic polypeptides of the invention, amino acid sequences in other parts of the wild type MeCP2 protein may be altered, for example by deletion of amino acids. Thus the synthetic polypeptides of the invention may have a deletion of at least 50 amino acids when compared to the full length human MeCP2 e1 and e2 sequences (SEQ ID NOs 3 and 4). Said deletion of at least 50 amino acids may be a deletion of at least 60, 70, 80, 90, 100, 125, 150, 175, 200, 225, 250, 275, or 300 amino acids when compared to the full length human MeCP2 e1 and e2 sequences (SEQ ID NOs 3 and 4). Preferably said deletion is of at least 200 amino acids when compared to the full length human MeCP2 e1 and e2 sequences (SEQ ID NOs 3 and 4). Such deletion of at least 50 amino acids may be assessed by preparing an alignment (see above) of the amino acid sequence of interest with the human MeCP2 e1 and e2 sequences (SEQ ID NOs 3 and 4). This will identify any regions in which amino acid residues in the MeCP2 sequences have been deleted because there will be gaps in the sequence of interest aligned to the MeCP2 e1 and e2 sequences (SEQ ID NOs 3 and 4). Preferably at least some of the amino acids that have been deleted will be consecutive within the MeCP2 sequence, such that said deletion of at least 50 amino acids will include the deletion of at least 5, 10, 15, 20, 30, 40, 50 or more consecutive amino acids within the MeCP2 e1 and e2 sequences (SEQ ID NOs 3 and 4).
The deletion of the at least 50 amino acids will be apparent from the alignment with both the e1 and e2 sequences, therefore any deletion that is present in the N-terminal region of MeCP2 and that is only associated with one of the e1 and e2 sequences, should not be considered a deletion to be counted as part of the at least 50 consecutive amino acids deleted in accordance with the invention. However, a deletion that is present in the N-terminal region of MeCP2 and that is present in both the e1 and e2 sequence alignments should be considered a deletion and counted as part of the at least 50 deleted amino acids in accordance with the invention.
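The alignment-based assessment described in paragraph [0050] can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the alignments are supplied as pairs of equal-length gapped strings with "-" as the gap character, the function names are hypothetical, and taking the smaller of the e1 and e2 counts is only an approximation of the requirement that a deletion be apparent from both alignments.

```python
def deletion_stats(aligned_ref, aligned_query):
    """Count reference residues aligned to gaps in the sequence of interest.

    Returns (total_deleted, longest_consecutive_run), where consecutive means
    consecutive within the reference (MeCP2) sequence.
    """
    if len(aligned_ref) != len(aligned_query):
        raise ValueError("aligned strings must have equal length")
    total = longest = run = 0
    for ref_char, query_char in zip(aligned_ref, aligned_query):
        if ref_char == "-":
            # Extra residue in the sequence of interest (for example a tag or
            # linker); no reference residue is involved at this column.
            continue
        if query_char == "-":
            total += 1
            run += 1
            longest = max(longest, run)
        else:
            run = 0
    return total, longest


def meets_deletion_threshold(aln_e1, aln_e2, minimum=50):
    """Assumed rule: count only what is deleted relative to both isoforms.

    aln_e1 and aln_e2 are (aligned_reference, aligned_query) pairs for the
    e1 and e2 alignments respectively.
    """
    total_e1, _ = deletion_stats(*aln_e1)
    total_e2, _ = deletion_stats(*aln_e2)
    return min(total_e1, total_e2) >= minimum
```

With this rule, a deletion that shows up only in the alignment to one isoform (for example in the isoform-specific N-terminal region) does not raise the count above what the other alignment supports.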

[0051] The amino acids deleted from the wild-type MeCP2 e1 and e2 sequences may be replaced with some other useful amino acid sequence. For example, a deletion of at least 50 amino acids may have occurred when compared to the full length human MeCP2 e1 and e2 sequences (SEQ ID NOs 3 and 4), but those deleted amino acids may have been replaced, at least in part, with amino acid sequence providing a linker, a tag and/or a signaling peptide.

[0052] This may be identified in an alignment of the synthetic polypeptide with the MeCP2 e1 and e2 sequences by a stretch of amino acid sequence in the synthetic polypeptide that does not match the MeCP2 sequence, and wherein that stretch of unmatched amino acid sequence corresponds to a useful, or purposive, heterologous sequence. Thus the invention provides, in at least some embodiments, synthetic polypeptides that have MeCP2 activity associated with the MBD and NID sequences, but that can also include useful heterologous sequences without requiring the synthetic polypeptides to be larger than the wild type MeCP2 protein; since large parts of the MeCP2 sequence can be left out of the synthetic polypeptides of the invention, the heterologous sequence(s) can be included whilst maintaining a relatively small overall size for the synthetic polypeptide.

[0053] As an alternative to the above-mentioned deletion of at least 50 amino acids, or in addition to it, the synthetic polypeptide of the invention having the MBD and NID amino acid sequences may have alterations to the polypeptide amino acid sequences such that it has less than 90% identity to the amino acid sequences of MeCP2, as depicted in SEQ ID NOs: 3 and 4, over the entire length of the amino acid sequences of MeCP2, as depicted in SEQ ID NOs: 3 and 4. Said less than 90% identity will be apparent from the comparison with both the e1 and e2 sequences, therefore any such identity solely due to alterations in the N-terminal region of MeCP2, which are only associated with one of the e1 and e2 sequences, will not be considered as the synthetic polypeptide having less than 90% identity in accordance with the invention. Preferably said identity will be less than 85%, 80%, 75%, 70%, 65%, 60% or 55%. It is particularly preferred that said identity will be less than 60%.

[0054] Synthetic polypeptides of the invention will generally have the structure:

[0055] A-B-C-D-E

[0056] wherein portion B of the synthetic polypeptide is the MBD amino acid sequence, as described above, and portion D of the synthetic polypeptide is said NID amino acid sequence, as described above. As explained above, however, parts of the synthetic polypeptide other than the MBD and NID domains, i.e. portions A, C and E, may have alterations compared to the wild type MeCP2 sequence. Thus the synthetic polypeptide may include alterations in accordance with one or more of the following: portion A of the synthetic polypeptide is less than 40 amino acids long and/or has less than 95% identity to the amino acid sequences as depicted in SEQ ID NOs:5 and 6, calculated over the entire length of the amino acid sequences as depicted in SEQ ID NOs: 5 and 6; portion C of the synthetic polypeptide is less than 20 amino acids long and/or has less than 95% identity to the amino acid sequence as depicted in SEQ ID NO: 7, calculated over the entire length of the amino acid sequence as depicted in SEQ ID NO: 7; and portion E of the synthetic polypeptide is absent, a protein tag, and/or has less than 95% identity to the amino acid sequence as depicted in SEQ ID NO: 8, calculated over the entire length of the amino acid sequence as depicted in SEQ ID NO: 8.
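The A-B-C-D-E structure can be illustrated by splitting a sequence at the boundaries of the MBD and NID. In the Python sketch below, the function names and the convention of passing domain boundaries as 0-based, end-exclusive (start, end) pairs are assumptions made for illustration; only the length criteria from paragraph [0056] are checked, not the identity criteria or the protein tag alternative.

```python
def split_portions(sequence, mbd, nid):
    """Split a polypeptide into portions A to E.

    mbd and nid are (start, end) pairs, 0-based and end-exclusive, giving the
    positions of the MBD (portion B) and NID (portion D) in the sequence.
    """
    (b_start, b_end), (d_start, d_end) = mbd, nid
    if not (0 <= b_start < b_end <= d_start < d_end <= len(sequence)):
        raise ValueError("MBD must lie N-terminal to NID within the sequence")
    return {
        "A": sequence[:b_start],
        "B": sequence[b_start:b_end],
        "C": sequence[b_end:d_start],
        "D": sequence[d_start:d_end],
        "E": sequence[d_end:],
    }


def length_criteria(portions):
    """Report the length-based alternatives named in paragraph [0056]."""
    return {
        "A_under_40": len(portions["A"]) < 40,
        "C_under_20": len(portions["C"]) < 20,
        "E_absent": len(portions["E"]) == 0,
    }
```

For example, split_portions("aaBBBBccDDDe", (2, 6), (8, 11)) yields A = "aa", B = "BBBB", C = "cc", D = "DDD" and E = "e".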

[0057] Portion A of the synthetic polypeptide, corresponding to the N-terminal portion of the polypeptide and the area adjacent to the amino end of the MBD amino acid sequence when the MBD amino acid sequence is N-terminal to the NID amino acid sequence, may be less than 40 amino acids, preferably less than 35, 30, 25, or 20 amino acids. It is particularly preferred that portion A is less than 25 amino acids. Additionally or alternatively, portion A may have less than 95% identity to the amino acid sequences as depicted in SEQ ID NOs:5 and 6, calculated over the entire length of the amino acid sequences as depicted in SEQ ID NOs: 5 and 6; preferably the identity is less than 90%, less than 85%, less than 80%, less than 75%, less than 70%, less than 65%, less than 60%, or less than 50%. Preferably portion A is truncated compared to the natural sequences of MeCP2 e1 and e2, such that portion A is less than 72 amino acids long, preferably less than 70, 65, 50, 55, 45, 30, or 25 amino acids. Further preferably, portion A will be truncated to such an extent that SEQ ID NOs: 5 and 6 are essentially not present. For example, SEQ ID NOs: 5 and 6 may have been deleted from the amino acid sequence of the synthetic polypeptide, and optionally replaced with an N-terminal tag.

[0058] Thus the amino acid sequences specific to e1 and e2, which are 24 amino acids long in human e1 (29 amino acids long in mouse e1) and 9 amino acids long in e2 (human/mouse) and which are encoded by exons 1 and 2 and the first 10 base pairs of exon 3 of MECP2 and Mecp2, may be included in the synthetic polypeptides of the invention, and so, for example, may provide the most N-terminal amino acid sequence of the synthetic polypeptide. The mouse e1 specific N-terminal amino acid sequence is MAAAAATAAAAAAPSGGGGGGEEERLEEK (SEQ ID NO: 9). The mouse e2 specific N-terminal amino acid sequence is MVAGMLGLREEK (SEQ ID NO: 10). The human e1 specific N-terminal amino acid sequence is MAAAAAAAPSGGGGGGEEERLEEK (SEQ ID NO: 11). The human e2 specific N-terminal amino acid sequence is MVAGMLGLREEK (SEQ ID NO: 12). Therefore a synthetic polypeptide of the invention may comprise an amino acid sequence corresponding to any of SEQ ID NOs 9-12, and optionally said amino acid sequence may be the most N-terminal sequence in the synthetic polypeptide of the invention.

[0059] Preferably however, these extreme N-terminal sequences specific to the wild type e1 and e2 MeCP2 will not be included in the synthetic polypeptides of the invention. Therefore preferably a synthetic polypeptide of the invention will not comprise an amino acid sequence corresponding to any of SEQ ID NOs 9-12.

[0060] Preferably the amino acid sequence adjacent to the amino end of the MBD amino acid sequence has less than 75% identity to the amino acid sequences as depicted in SEQ ID NOs: 5 and 6, calculated over the entire length of the amino acid sequences as depicted in SEQ ID NOs: 5 and 6, preferably less than 50%, and further preferably less than 30% identity.

[0061] Preferably the amino acid sequence adjacent to the amino end of the MBD amino acid sequence is less than 50 amino acids long, preferably less than 30 amino acids long or less than 20 amino acids long, and further preferably less than 10 amino acids long. Portion C of the synthetic polypeptide, corresponding to the amino acid sequence between the MBD and NID amino acid sequences, may be less than 20 amino acids, preferably less than 15, 10, or 5 amino acids. Additionally or alternatively, portion C may have less than 95% identity to the amino acid sequence as depicted in SEQ ID NO: 7; preferably the identity is less than 90%, less than 85%, less than 80%, less than 75%, less than 70%, less than 65%, less than 60%, less than 50%, or less than 30%. Preferably portion C is truncated compared to the natural sequence of MeCP2, such that portion C is less than 98 amino acids long, preferably less than 90, 85, 80, 75, 70, 65, 60, 55, 50, 45, 40, 35, 30, 25, 20, or 15 amino acids long. Preferably the amino acid sequence between the MBD and NID amino acid sequences is less than 50 amino acids long, preferably less than 30 amino acids long, and further preferably less than 20 amino acids long.

[0062] An additional benefit to deletions within portion C of the synthetic polypeptide, i.e. the amino acid sequence between the MBD and NID sequences, is that the inventors have found that deletions in this portion can make the synthetic polypeptide less stable. Surprisingly, the inventors have found that such reduced stability may be beneficial to the utility of the synthetic polypeptide in the clinical setting, because it may reduce the chance of over-expression of the synthetic polypeptide. Therefore in some embodiments it is particularly preferred that there be a significant deletion in the amino acid sequence between the MBD and NID amino acid sequences, e.g. portion C of the above-noted generic structure, for example a deletion of at least 10, 15, 20, 30, 40, or 50 amino acids. Preferably the substitution or significant deletion of the amino acids will include substitution or significant deletion from the region from position 207 to position 271 of the full length human wild type MeCP2 polypeptide sequence (e2 isoform) as shown in SEQ ID NO: 4.

[0063] Portion E of the synthetic polypeptide, corresponding to the C-terminal portion of the polypeptide and the area adjacent to the carboxy end of the NID amino acid sequence when the MBD amino acid sequence is N-terminal to the NID amino acid sequence, may be absent, such that the carboxy end of the NID amino acid sequence corresponds with the C-terminus of the synthetic polypeptide. Alternatively, portion E may comprise a protein tag, for example so that portion E may be used to isolate or monitor/detect the synthetic polypeptide.

[0064] Additionally or alternatively, portion E may have less than 95% identity to the amino acid sequence provided in SEQ ID NO: 8, calculated over the entire length of the amino acid sequence as depicted in SEQ ID NO: 8; preferably the identity is less than 90%, less than 85%, less than 80%, less than 75%, less than 70%, less than 65%, less than 60%, or less than 50%. Thus portion E may comprise a deletion of all or a significant part of the amino acids at positions 313 to 486 of MeCP2 (SEQ ID NO:4), optionally with replacement of the deleted sequence by a tag, and further optionally with a linker attaching the tag to the synthetic polypeptide.

[0065] Preferably the amino acid sequence adjacent to the carboxy end of the NID amino acid sequence has less than 75% identity to the amino acid sequence as depicted in SEQ ID NO: 8, calculated over the entire length of the amino acid sequence as depicted in SEQ ID NO: 8, preferably less than 50%, and further preferably less than 30% identity.

[0066] Preferably the amino acid sequence adjacent to the carboxy end of the NID amino acid sequence is less than 50 amino acids long, preferably less than 30 amino acids long or less than 20 amino acids long, and further preferably there is no amino acid sequence adjacent to the carboxy end of the NID amino acid sequence such that the carboxy end of the NID amino acid sequence corresponds with the C-terminus of the synthetic polypeptide.

[0067] The tag forming part of portion E of the synthetic polypeptides of the invention may be for monitoring or detection of the synthetic polypeptide or to allow post-translational regulation of the synthetic polypeptide, as explained further below. Examples of suitable tags for detection or monitoring of the polypeptide are known in the art, and include a polyhistidine tag, a FLAG tag, a Myc tag and a fluorescent protein tag such as enhanced green fluorescent protein (EGFP).

[0068] Preferably the synthetic polypeptides of the invention have less than 90% identity over the entire length of the amino acid sequences of MeCP2 as depicted in SEQ ID NO: 3 and SEQ ID NO: 4, preferably less than 80% identity, less than 70% identity, or less than 60% identity, and further preferably less than 40% identity.

[0070] The term "identity" refers to the extent to which two amino acid sequences have the same residues at the same positions in an alignment. The percentage identity as used herein is calculated across the length of a comparative sequence disclosed herein, for example one of SEQ ID NOs: 3 to 8, as described herein. Thus all residues in that comparative sequence should be aligned with the sequence of interest, and any gaps created during alignment of the sequence of interest with the full length of the comparative sequence should be taken into account when calculating the percentage identity, including when such "gaps" occur at either end of the sequence of interest in the alignment. However, any additional end sequence in the sequence of interest that aligns past the end of the comparative sequence, i.e. which does not align as such with the comparative sequence but which overhangs the end of the comparative sequence instead, should not be included when calculating the percentage identity. Thus when calculating the percentage identity, the identity score will be divided by the length of the comparative sequence, including any gaps that have been inserted into the comparative sequence as a result of the optimal alignment with the sequence of interest, and then multiplied by 100.
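The calculation described in paragraph [0070] can be sketched in Python as follows. This is an illustrative reading of the definition, not a reference implementation: the function name, the use of "-" as the gap character, and the input format (two equal-length gapped strings from an existing alignment) are assumptions.

```python
def percent_identity(aligned_comparative, aligned_query):
    """Percentage identity over the full length of the comparative sequence.

    Columns where the sequence of interest overhangs either end of the
    comparative sequence are ignored. Gaps inserted inside the comparative
    sequence count towards the denominator, as do gaps in the sequence of
    interest, including those at its ends.
    """
    if len(aligned_comparative) != len(aligned_query):
        raise ValueError("aligned strings must have equal length")
    residue_columns = [i for i, c in enumerate(aligned_comparative) if c != "-"]
    if not residue_columns:
        raise ValueError("comparative sequence contains no residues")
    # Trim overhang: keep only columns spanning the comparative sequence.
    start, end = residue_columns[0], residue_columns[-1] + 1
    comparative = aligned_comparative[start:end]
    query = aligned_query[start:end]
    matches = sum(1 for c, q in zip(comparative, query) if c == q and c != "-")
    return 100.0 * matches / len(comparative)
```

Because the denominator is the comparative sequence rather than the shorter sequence of interest, a heavily truncated polypeptide scores a low percentage identity against full-length MeCP2 even when every retained residue matches.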

[0071] Methods of alignment of sequences for comparison are well known in the art. Various programs and alignment algorithms are described in: Smith & Waterman, Adv. Appl. Math. 2:482, 1981; Needleman & Wunsch, J. Mol. Biol. 48:443, 1970; Pearson & Lipman, Proc. Natl. Acad. Sci. USA 85:2444, 1988; Higgins & Sharp, Gene 73:237-44, 1988; Higgins & Sharp, CABIOS 5:151-3, 1989; Corpet et al., Nuc. Acids Res. 16:10881-90, 1988; Huang et al. Computer Appls. in the Biosciences 8, 155-65, 1992; and Pearson et al., Meth Mol. Bio. 24:307-31, 1994. Altschul et al., J. Mol. Biol. 215:403-10, 1990, presents a detailed consideration of sequence alignment methods and homology calculations. There are readily available programs that permit the preparation of sequence alignments and the calculation of percentage identity, such as the GAP program, running under GCG (Genetics Computer Group Inc., Madison, Wis., USA).

[0072] Other variants of the synthetic polypeptides of the invention can be, for example, functional variants such as salts, amides, esters, and specifically C-terminal esters, and N-acyl derivatives. Also included are peptides which are modified in vivo or in vitro, for example by glycosylation, amidation, carboxylation or phosphorylation.

[0073] A benefit of the invention is that by identifying the MBD and NID sequences as being key to the activity of MeCP2 that is deficient in Rett syndrome, the inventors have identified that significant portions of other sections of the MeCP2 protein may be removed when preparing a synthetic protein for use in treating or preventing a disorder such as Rett syndrome. Therefore the invention provides synthetic polypeptides that are truncated forms of MeCP2, comprising MBD and NID sequences but missing one or more sections of the amino acid sequences adjacent to the amino end of the MBD amino acid sequence (e.g. in portion A), between the MBD and NID amino acid sequences (e.g. in portion C), and adjacent to the carboxy end of the NID amino acid sequence (e.g. in portion E), when compared to the full length human MeCP2 e1 and e2 sequences (SEQ ID NOs 3 and 4). Preferably a synthetic polypeptide of the invention will have fewer amino acids than the wild type MeCP2 protein, for example a synthetic polypeptide of the invention may consist of fewer than 430 amino acids, preferably fewer than 400, 350, 320, 270, or 200 amino acids, and further preferably fewer than 180 amino acids.

[0074] Synthetic polypeptides of the invention may suitably comprise a signal peptide, for example a nuclear localisation signal. A nuclear localisation signal may target a polypeptide for import into the cell nucleus by nuclear transport. Suitable nuclear localisation signals are known in the art, and include the SV40 nuclear localisation signal and the NLS of the native MeCP2 protein (amino acids 253 to 271 of SEQ ID NO: 4). The NLS may be situated in any part of the polypeptide, but preferably will be situated in the amino acid sequence linking the MBD to the NID.

[0075] Synthetic polypeptides of the invention may suitably comprise a cell-penetrating peptide (CPP). CPPs, also called protein-transduction domains, consist of short sequences of 8 to 30 amino acids in length that can facilitate entry of molecules into cells (ref. 15). The CPP may be synthetic, designed specifically for the targeting of therapeutic molecules, or naturally occurring. Preferably the CPP will facilitate entry into neuronal cells. A preferred CPP for use with the synthetic polypeptides of the invention is the CPP of the trans-activator of transcription protein, Tat.

[0076] Synthetic polypeptides of the invention may suitably comprise a tag. Such tags are well known in the art and may be useful for polypeptide purification, detection/monitoring and/or post-translational regulation. Examples of suitable tags useful in purification or detection of a polypeptide are known in the art, and include a polyhistidine tag, a FLAG tag, a Myc tag and a fluorescent protein tag such as enhanced green fluorescent protein (EGFP).

[0077] A tag may be used for post-translational regulation of a polypeptide by, for example, providing the ability to control the post-translational degradation of the polypeptide. Examples of suitable tags that may be used with synthetic polypeptides of the invention to allow such control of post-translational degradation include a SMASh tag and a Destabilisation Domain (DD) of FKBP12. The SMASh tag is approximately 300 amino acids long and comprises a protease cleavage site followed by the protease, followed by a degron tag. The SMASh tag is fused in the protein of interest (POI) with the protease cleavage site and protease between the POI and the degron tag. This can be on either terminus of the POI. Normally, the protease self-cleaves the protease site, removing the degron tag so that the protein is not degraded. However, treatment with a drug such as Asunaprevir can inhibit the protease, which prevents removal of the degron tag, and so results in degradation of the attached POI. Since the administration of Asunaprevir has a dose-dependent effect on POI degradation, the use of a SMASh tag with Asunaprevir can allow post-translational regulation of the amount of the POI.

[0078] Similarly, the DD-FKBP12 is approximately 110 amino acids long and it destabilises the POI to which it is attached. It can be fused to either terminus of the POI, but it is preferably attached to the N-terminus, where it is generally more effective. The fusion protein produced will therefore generally be unstable but it can be protected by the administration of a molecule called Shield-1. Administration of Shield-1 has a dose-dependent effect on the prevention of the degradation of the POI.

[0079] The skilled person will appreciate that it may not be desirable to include certain types of tags, particularly those used for purification or detection during polypeptide synthesis such as Myc or EGFP, in the final therapeutic polypeptide that is delivered to a subject, as the tags may be immunogenic or active in an undesirable manner. Therefore the tags included in the synthetic polypeptides of the invention may be removable, for example by chemical agents or enzymatic means, such as proteolysis or intein splicing; this may allow the use of the tag during preparation of a synthetic polypeptide followed by removal of the tag before the polypeptide is introduced to an animal for treatments in accordance with therapeutic uses (e.g. protein replacement therapy) disclosed herein.

[0080] The synthetic polypeptides of the invention may comprise a linker sequence, for example to link the MBD sequence to the NID sequence, in place of the natural sequence that links these two sequences in MeCP2, or as a means to attach or insert a tag, CPP or NLS to a synthetic polypeptide of the invention. The design and use of such linkers are well known in the art. A suitable linker may comprise between 4 and 15 amino acids, preferably between 6 and 10 amino acids. Preferably the linker will consist of glycines, serines, or a combination of glycines and serines.
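The linker preferences in paragraph [0080] can be checked with a few lines of Python. The sketch below is illustrative only: the function name is hypothetical, linkers are assumed to be given in one-letter amino acid code, and the ranges "between 4 and 15" and "between 6 and 10" are assumed to be inclusive.

```python
def describe_linker(linker):
    """Report which linker preferences from paragraph [0080] are met."""
    n = len(linker)
    return {
        # Assumption: the stated ranges include their end points.
        "suitable_length": 4 <= n <= 15,
        "preferred_length": 6 <= n <= 10,
        # True only for a non-empty linker made solely of Gly (G) and Ser (S).
        "gly_ser_only": n > 0 and set(linker) <= {"G", "S"},
    }
```

For instance, the hypothetical linker "GGSGGSGG" satisfies all three checks, whereas "GGS" is shorter than the suitable range.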

[0081] Preferably the synthetic polypeptide sequences of the invention will show at least 80% similarity to one or more of the sequences ΔNC (SEQ ID NO: 13), ΔNIC (SEQ ID NO: 14), ΔN mouse (SEQ ID NO: 15), ΔNC mouse (SEQ ID NO: 16), and ΔNIC mouse (SEQ ID NO: 17), preferably at least 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% similarity. Optionally, as disclosed above, the synthetic polypeptide sequences of the invention may comprise an e1 or e2 specific sequence, preferably at the N-terminus of the synthetic polypeptide. Thus preferably the synthetic polypeptide sequences of the invention will show at least 80% similarity to a sequence consisting of one or more of the sequences: ΔNC (SEQ ID NO: 13); ΔNIC (SEQ ID NO: 14); ΔN mouse (SEQ ID NO: 15); ΔNC mouse (SEQ ID NO: 16); and ΔNIC mouse (SEQ ID NO: 17), and one of the e1 or e2 specific sequences (SEQ ID NOs: 9-12) at the N-terminus of the sequence, and preferably at least 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% similarity. Further preferably the synthetic polypeptide sequences of the invention may comprise or consist of the sequence ΔNC (SEQ ID NO: 13), ΔNIC (SEQ ID NO: 14), ΔN mouse (SEQ ID NO: 15), ΔNC mouse (SEQ ID NO: 16), ΔNIC mouse (SEQ ID NO: 17), or one of the sequences ΔNC (SEQ ID NO: 13), ΔNIC (SEQ ID NO: 14), ΔN mouse (SEQ ID NO: 15), ΔNC mouse (SEQ ID NO: 16), ΔNIC mouse (SEQ ID NO: 17), with one of the e1 or e2 specific sequences (SEQ ID NOs: 9-12) at the N-terminus of the sequence. ΔNC (SEQ ID NO: 13), ΔN mouse (SEQ ID NO: 15), and ΔNC mouse (SEQ ID NO: 16) include the wild type MeCP2 NLS sequence. ΔNIC (SEQ ID NO: 14) and ΔNIC mouse (SEQ ID NO: 17) include the SV40 NLS sequence.

[0082] It is particularly preferred that the synthetic polypeptide sequences of the invention will show at least 80% similarity to a sequence consisting of ΔNIC (SEQ ID NO: 14) and a human e1 or e2 specific sequence (SEQ ID NOs: 11 and 12) immediately adjacent the N-terminus of ΔNIC (SEQ ID NO: 14), preferably at least 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% similarity. Further preferably the synthetic polypeptide sequences of the invention may comprise or consist of ΔNIC (SEQ ID NO: 14), optionally with a human e1 or e2 specific sequence (SEQ ID NOs: 11 and 12) immediately adjacent the N-terminus of ΔNIC (SEQ ID NO: 14). Synthetic polypeptides of the invention may be used in the treatment or prevention of a neurological disorder associated with inactivating mutation of MECP2, for example Rett syndrome, as explained further below.

[0083] Nucleic Acid Constructs, Expression Vectors, Virions and Cells

[0084] The invention provides nucleic acid constructs encoding a synthetic polypeptide of the invention, and expression vectors comprising a nucleotide sequence encoding a synthetic polypeptide of the invention. The expression vector may be a viral vector, and thus the invention also provides a virion comprising an expression vector according to the invention. The invention also provides cells that comprise a synthetic genetic construct adapted to express a polypeptide of the invention, cells comprising a vector of the invention, and cells for producing a virion of the invention.

[0085] Nucleic acid constructs and/or expression vectors suitably comprise at least one expression control sequence operably linked to a nucleotide sequence encoding a synthetic polypeptide of the invention, to drive expression of the synthetic polypeptide. "Expression control sequences" are nucleotide sequences located upstream (5' non-coding sequences), within, or downstream (3' non-coding sequences) of a coding sequence, and which influence the transcription, RNA processing or stability, or translation of the associated coding sequence. Expression control sequences include enhancers, promoters, translation leader sequences, introns, and polyadenylation signal sequences. They include natural and synthetic sequences as well as sequences that may be a combination of synthetic and natural sequences. As is noted herein, the term "expression control sequences" is not limited to promoters. However, some suitable expression control sequences useful in the present invention will include, but are not limited to constitutive promoters, tissue-specific promoters, development-specific promoters, regulatable promoters and viral promoters.

[0086] Such expression control sequences generally comprise a promoter sequence and additional sequences which regulate transcription and translation and/or enhance expression levels. Suitable expression control sequences are well known in the art and include eukaryotic, prokaryotic, or viral promoters or poly-A signals. Expression control and other sequences will, of course, vary depending on the host cell selected or can be made inducible. Examples of useful promoters are the SV-40 promoter (Science 1983, 222: 524-527), the metallothionein promoter (Nature 1982, 296: 39-42), the heat shock promoter (Voellmy et al., P.N.A.S. USA 1985, 82: 4949-4953), the PRV gX promoter (Mettenleiter and Rauh, J. Virol. Methods 1990, 30: 55-66), the human CMV IE promoter (U.S. Pat. No. 5,168,062), the Rous Sarcoma virus LTR promoter (Gorman et al., P.N.A.S. USA 1982, 79: 6777-6781), or human elongation factor 1 alpha or ubiquitin promoter. Suitable control sequences to drive expression in animals, e.g. humans, are well known in the art. The expression control sequences can drive ubiquitous expression or tissue- or cell-specific expression. The expression control sequence can comprise, for example, a viral or human promoter. A suitable promoter can be ubiquitous (e.g. the CAG promoter), tissue restricted or tissue specific. For example, the NESTIN promoter may drive expression in the CNS and the TAU and SYNAPSIN promoters may drive expression in neurons. Preferably the promoter will be for expression of the nucleotide sequence in neuronal cells, for example the MECP2 or Mecp2 promoter. Many suitable control sequences are known in the art, and it would be routine for the skilled person to select suitable sequences for the expression system being used.

[0087] The expression of MECP2 is tightly controlled in animals. Therefore where nucleic acid constructs and expression vectors of the invention are to be used for expression of the synthetic polypeptides of the invention in an animal, for example in gene therapy, it is preferred that said expression be specific to neural cells, particularly neural cells of the brain and CNS. This may be accomplished, for example, through specific targeting of the nucleic acid constructs and expression vectors to the brain and/or the neural cells, through the use of delivery vehicles, such as AAV virions, and/or specific administration routes, such as by administration directly to the CNS. Additionally or alternatively, said neuron-specific expression may be accomplished by the use of expression control sequences in the nucleic acid constructs and/or expression vectors that substantially limit the expression of the synthetic polypeptide to neural cells. Preferably those expression control sequences will be selected from the expression control sequences used to control the natural expression of the MECP2 gene.

[0088] The MECP2 gene contains a remarkably large, highly conserved 3'UTR, in which enhancers, silencers and many miRNA binding sites have been identified. Similarly, the MECP2 gene promoter region is also very large, and includes silencer, regulatory element and promoter sequences. Therefore preferably the expression vectors and/or nucleic acid constructs of the invention will include one or more of the expression control sequence elements from the MECP2 3'UTR. For example, gene therapy is disclosed herein that used an expression cassette that included an upstream core promoter element from the Mecp2 gene, and downstream microRNA (miR) binding sites and an AU-rich element. Therefore an expression vector and/or nucleic acid construct of the invention will preferably comprise one or more elements selected from: an upstream MECP2 or Mecp2 core promoter sequence (see, for example, nucleotides 200 to 329, inclusive, of SEQ ID NO:65); one or more downstream miR binding sites from the MECP2 or Mecp2 3'UTR; and an AU-rich element from the MECP2 or Mecp2 3'UTR. Further preferably the one or more downstream miR binding sites from the MECP2 or Mecp2 3'UTR will comprise one or more, or all of the following miR binding sites: a binding site for mir-22 (nucleotides 1166 to 1195, inclusive, of SEQ ID NO:65), a binding site for mir-19 (nucleotides 1196 to 1224, inclusive, of SEQ ID NO:65), a binding site for mir-132 (nucleotides 1225 to 1252, inclusive, of SEQ ID NO:65), and a binding site for mir-124 (nucleotides 1318 to 1324, inclusive, of SEQ ID NO:65).

[0089] Preferably, the expression vector and/or nucleic acid construct of the invention may comprise an upstream CNS regulatory element from Mecp2 or MECP2 (see, for example, nucleotides 422 to 443, inclusive, of SEQ ID NO:65) and/or an upstream silencer from Mecp2 or MECP2 (see, for example, nucleotides 142 to 203, inclusive, of SEQ ID NO:65).

[0090] It is particularly preferred that the upstream region of the expression vector and/or nucleic acid construct of the invention will comprise the mMeP426 sequence (nucleotides 117 to 542 of SEQ ID NO: 65; see FIGS. 17B, C) and/or that the downstream region of the expression vector and/or nucleic acid construct of the invention will comprise the RDH1pA sequence (nucleotides 1166 to 1370 of SEQ ID NO: 65; see FIGS. 17B, C).
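The nucleotide ranges recited above are 1-based and inclusive, whereas many programming languages index from zero with an exclusive end. The following is a minimal sketch of how such ranges map onto a sequence string and how the named elements nest within the mMeP426 and RDH1pA regions. The `Feature` type, the helper names, and the placeholder sequence are assumptions introduced for illustration; the placeholder is not SEQ ID NO: 65, and its length is chosen only so that every recited coordinate is in range.

```python
from typing import NamedTuple


class Feature(NamedTuple):
    name: str
    start: int  # 1-based, inclusive, as recited in the text
    end: int    # 1-based, inclusive, as recited in the text


# Coordinates copied from the paragraphs above (positions in SEQ ID NO: 65).
FEATURES = [
    Feature("mMeP426", 117, 542),
    Feature("upstream silencer", 142, 203),
    Feature("core promoter", 200, 329),
    Feature("CNS regulatory element", 422, 443),
    Feature("RDH1pA", 1166, 1370),
    Feature("mir-22 site", 1166, 1195),
    Feature("mir-19 site", 1196, 1224),
    Feature("mir-132 site", 1225, 1252),
    Feature("mir-124 site", 1318, 1324),
]
BY_NAME = {f.name: f for f in FEATURES}


def length(f: Feature) -> int:
    """Number of nucleotides in an inclusive 1-based range."""
    return f.end - f.start + 1


def extract(seq: str, f: Feature) -> str:
    """Slice a 0-based Python string using 1-based inclusive coordinates."""
    return seq[f.start - 1 : f.end]


def contains(outer: Feature, inner: Feature) -> bool:
    """True if `inner` lies entirely within `outer`."""
    return outer.start <= inner.start and inner.end <= outer.end


if __name__ == "__main__":
    placeholder = "N" * 1370  # stand-in only; NOT the real SEQ ID NO: 65
    for f in FEATURES:
        assert len(extract(placeholder, f)) == length(f)
        print(f"{f.name}: {f.start}-{f.end} ({length(f)} nt)")
    print(contains(BY_NAME["mMeP426"], BY_NAME["core promoter"]))
    print(contains(BY_NAME["RDH1pA"], BY_NAME["mir-124 site"]))
```

Under this reading, the range 117 to 542 spans 426 nucleotides, which agrees with the name mMeP426, and each recited miR binding site falls within the RDH1pA range.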

[0091] Of course, the skilled person will appreciate that one or more other sequence elements may also be desirable or required in an expression vector and/or nucleic acid construct of the invention, such as a translational initiation signal, e.g. a Kozak sequence, a polyadenylation signal, and binding sites for components of the polyadenylation machinery such as CstF (cleavage stimulation factor). The skilled person will be capable of designing an expression vector and/or nucleic acid construct in accordance with the invention having any and all such necessary or desirable well known sequences.

[0092] The human wild type MECP2 e1 isoform cDNA sequence is provided herein as SEQ ID NO: 18. The mouse wild type Mecp2 e1 isoform cDNA sequence is provided herein as SEQ ID NO: 21.

[0093] Suitably the nucleic acid construct of the invention may comprise a sequence for encoding the cDNA sequence of ΔNC (SEQ ID NO: 19), ΔNIC (SEQ ID NO: 20), ΔN mouse (SEQ ID NO: 22), ΔNC mouse (SEQ ID NO: 23), or ΔNIC mouse (SEQ ID NO: 24). As explained above, optionally the synthetic polypeptides of the invention may comprise the e1 or e2 specific N-terminal sequences. The mouse e1 specific N-terminal amino acid sequence is encoded by the cDNA sequence ATGGCCGCCGCTGCCGCCACCGCCGCCGCCGCCGCCGCGCCGAGCGGAGGAGGAGGAGGAGGCGAGGAGGAGAGACTGGAGGAAAAG (SEQ ID NO: 25). The mouse e2 specific N-terminal amino acid sequence is encoded by the cDNA sequence ATGGTAGCTGGGATGTTAGGGCTCAGGGAGGAAAAGGGAGGAAAAG (SEQ ID NO: 26). The human e1 specific N-terminal amino acid sequence is encoded by the cDNA sequence ATGGCCGCCGCCGCCGCCGCCGCGCCGAGCGGAGGAGGAGGAGGAGGCGAGGAGGAGAGACTGGAAGAAAAG (SEQ ID NO: 27). The human e2 specific N-terminal amino acid sequence is encoded by the cDNA sequence ATGGTAGCTGGGATGTTAGGGCTCAGGGAAGAAAAG (SEQ ID NO: 28). Therefore the nucleic acid construct of the invention may comprise a sequence for encoding the cDNA sequence according to any of SEQ ID NOs: 25-28.

[0094] Further preferably the nucleic acid construct of the invention may comprise a sequence for encoding the cDNA sequence of SEQ ID NO: 28 or 29 immediately adjacent to the cDNA sequence of ΔNIC (SEQ ID NO: 20).

[0095] Due to the degeneracy of the genetic code, polynucleotides encoding an identical or substantially identical amino acid sequence may utilise different specific codons (e.g. synonymous base substitutions). All polynucleotides encoding the synthetic polypeptides as defined above are considered to be part of the invention.
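The degeneracy described above can be illustrated with a short sketch that translates two different nucleotide sequences into the same peptide. The first sequence is the human e2 specific cDNA recited herein (SEQ ID NO: 28). The second, `SYNONYMOUS_VARIANT`, is a hypothetical sequence constructed for this illustration by swapping in synonymous codons; it is not a sequence disclosed herein. The table used is the standard genetic code, and the helper names are illustrative only.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order
# (TTT, TTC, TTA, TTG, TCT, ...); '*' marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(cdna: str) -> str:
    """Translate a coding DNA sequence, read in frame from its first base."""
    cdna = cdna.upper()
    if len(cdna) % 3 != 0:
        raise ValueError("sequence length is not a multiple of three")
    return "".join(CODON_TABLE[cdna[i : i + 3]] for i in range(0, len(cdna), 3))


# Human e2 specific N-terminal cDNA as recited above (SEQ ID NO: 28).
HUMAN_E2_CDNA = "ATGGTAGCTGGGATGTTAGGGCTCAGGGAAGAAAAG"

# Hypothetical variant: same codon count, synonymous codons substituted.
SYNONYMOUS_VARIANT = "ATGGTGGCCGGCATGCTGGGCCTGCGCGAGGAGAAA"

if __name__ == "__main__":
    print(translate(HUMAN_E2_CDNA))       # MVAGMLGLREEK
    print(translate(SYNONYMOUS_VARIANT))  # MVAGMLGLREEK
```

The two nucleotide sequences differ at several positions yet encode an identical amino acid sequence, which is the situation contemplated by the paragraph above.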

[0096] The invention provides an expression vector comprising a nucleotide sequence encoding a synthetic polypeptide of the invention. Such vectors suitably comprise an isolated or synthetic nucleic acid construct as described above.

[0097] The vectors according to the invention are suitable for transforming a host cell. Examples of suitable cloning vectors are plasmid vectors such as pBR322, the various pUC, pEMBL and Bluescript plasmids, or viral vectors such as HVT (Herpes Virus of Turkeys), MDV (Marek Disease Virus), ILT (Infectious Laryngotracheitis Virus), FAV (Fowl Adenovirus), FPV (Fowlpox Virus), or NDV (Newcastle Disease Virus). pcDNA3.1 is a particularly preferred vector for expression in animal cells.

[0098] After the polynucleotide has been cloned into an appropriate vector, the construct may be transferred into the cell, bacteria, or yeast by means of an appropriate method, such as electroporation, CaCl2 transfection or lipofectins. When a baculovirus expression system is used, the transfer vector containing the polynucleotide may be transfected together with a complete baculovirus genome.

[0099] These techniques are well known in the art and the manufacturers of molecular biological materials (such as Clontech, Stratagene, Promega, and/or Invitrogen) provide suitable reagents and instructions on how to use them. Furthermore, there are a number of standard reference text books providing further information on this, e.g. Rodriguez, R. L. and D. T. Denhardt, ed., "Vectors: A survey of molecular cloning vectors and their uses", Butterworths, 1988; Current protocols in Molecular Biology, eds.: F. M. Ausubel et al., Wiley, N.Y., 1995; Molecular Cloning: a laboratory manual, supra; and DNA Cloning, Vol. 1-4, 2nd edition 1995, eds.: Glover and Hames, Oxford University Press.

[0100] Details of preferred proteins according to the present invention for expression via the vector are described above.

[0101] The vector may be adapted to provide transient expression in a host cell or stable expression. Stable expression can be achieved, for example, through integration of the nucleotide sequence encoding the synthetic polypeptide into the genome of the host cell.

[0102] Suitable viral vectors include retroviral vectors (including lentiviral vectors), adenoviral vectors, adeno-associated viral (AAV) vectors, and alphaviral vectors. Preferably the viral vector will be an AAV vector, such as AAV1, AAV2, AAV4, AAV5, AAV6, AAV8 or AAV9. Preferably the AAV vector will be a self-complementary (sc) AAV vector.

[0103] The vector of the present invention may be present in a virion. Thus the present invention also provides a virion comprising a vector in accordance with the present invention. Preferably the virion and/or viral vector will be for expression in cells of the central nervous system (CNS), such as neuronal cells. Thus preferably a virion of the invention will comprise a capsid and/or inverted terminal repeats (ITRs) from one or more of AAV1, AAV2, AAV4, AAV5, AAV6, AAV8, and AAV9. Preferably the AAV will be a self-complementary (sc) AAV vector. Further preferably, the ITR and capsid proteins may be from different serotypes, for example ITRs from AAV2 may be used with capsid proteins from AAV9 to form scAAV virions.

[0104] Vectors according to the present invention can be used in transforming cells for expression of a protein according to the present invention. This can be done in cell culture to produce recombinant protein for harvesting, or it can be done in vivo to deliver a protein according to the present invention to an animal.

[0105] Thus the present invention also provides a cell population in which cells comprise a synthetic genetic construct adapted to express a protein according to the present invention. Said cell population may be present in a cell-culture system in a suitable medium to support cell growth.

[0106] The cells can be eukaryotic or prokaryotic.

[0107] Polynucleotides of the present invention may be cloned into any appropriate expression system. Suitable expression systems include a bacterial expression system (e.g. Escherichia coli DH5α), a viral expression system (e.g. Baculovirus), a yeast system (e.g. Saccharomyces cerevisiae) or eukaryotic cells (e.g. COS-7, CHO, BHK, HeLa, HD11, DT40, CEF, or HEK-293T cells). A wide range of suitable expression systems are available commercially. Typically the polynucleotide is cloned into an appropriate vector under control of a suitable constitutive or inducible promoter and then introduced into the host cell for expression.

[0108] Suitably the cells are animal cells, more preferably they are mammalian cells, and most preferably human cells. Suitably the cells comprise a vector as set out above.

[0109] Preferably the cells are adapted such that expression of the protein according to the present invention is inducible.

[0110] It is particularly preferred that the cells comprise an expression vector for expressing a synthetic polypeptide of the invention, and the cell is suitable or adapted for producing a virion comprising an expression vector of the invention. Thus the cells may be used to produce virions for use in gene therapy treatment of Rett syndrome and related disorders. Preferably the virions will comprise AAV vectors for expressing a polypeptide of the invention, and further preferably the AAV vectors will comprise AAV9 and/or AAV2.

[0111] Suitable host cells for producing AAV virions include microorganisms, yeast cells, insect cells, and mammalian cells, that can be, or have been, used as recipients of a heterologous DNA molecule. The term includes the progeny of the original cell which has been transfected. Thus, a "host cell" as used herein generally refers to a cell which has been transfected with an exogenous DNA sequence. Cells from the stable human cell line, 293 (readily available through, e.g., the American Type Culture Collection under Accession Number ATCC CRL1573) can be used in the practice of the present invention. Particularly, the human cell line 293 is a human embryonic kidney cell line that has been transformed with adenovirus type-5 DNA fragments, and expresses the adenoviral E1a and E1b genes. The 293 cell line is readily transfected, and provides a particularly convenient platform in which to produce AAV virions.

[0112] Suitably, for in vivo delivery, virions of the invention, such as AAV virions, may be formulated into pharmaceutical compositions. Suitably, pharmaceutical compositions will comprise sufficient genetic material to produce a therapeutically effective amount of the synthetic polypeptide of the invention, i.e., an amount sufficient to reduce, ameliorate or prevent symptoms of the disorders associated with reduced MeCP2 activity, such as Rett syndrome. The pharmaceutical compositions may also contain a pharmaceutically acceptable excipient. Such excipients include any pharmaceutical agent that does not itself induce the production of antibodies harmful to the individual receiving the composition, and which may be administered without undue toxicity. Pharmaceutically acceptable excipients include, but are not limited to, sorbitol, Tween80, and liquids such as water, saline, glycerol and ethanol. Pharmaceutically acceptable salts can be included therein, for example, mineral acid salts such as hydrochlorides, hydrobromides, phosphates, sulfates, and the like; and the salts of organic acids such as acetates, propionates, malonates, benzoates, and the like. Additionally, auxiliary substances, such as wetting or emulsifying agents, pH buffering substances, and the like, may be present in such vehicles.

[0113] As is apparent to those skilled in the art, an effective amount of viral vector which must be added can be empirically determined. Administration can be effected in one dose, continuously or intermittently throughout the course of treatment. Methods of determining the most effective means and dosages of administration are well known to those of skill in the art and will vary with the viral vector, the composition of the therapy, and the subject being treated. Single and multiple administrations can be carried out with the dose level and pattern being selected by the treating physician.

[0114] Pharmaceutical Compositions, Methods of Prevention and Treatment, and Use in Same

[0115] The synthetic polypeptides of the invention are useful for replacing defective MeCP2 in the cells of those affected by Rett syndrome or related disorders. Thus the invention provides a method of treating or preventing disease in an animal comprising administering a synthetic polypeptide of the invention. Preferably the disease is a neurological disorder associated with inactivating mutation of MECP2, for example Rett syndrome. Preferably the animal is a human patient.

[0116] The administering in the methods of the invention may comprise administering a composition comprising a synthetic polypeptide of the invention, administering an expression vector of the invention, and/or administering a virion of the invention.

[0117] The invention therefore also provides synthetic polypeptides, expression vectors and virions for the treatment or prevention of a neurological disorder associated with inactivating mutation of MECP2, for example Rett syndrome, and the use of a synthetic polypeptide of the invention, an expression vector of the invention, or a virion of the invention, in the manufacture of a medicament for the treatment or prevention of a neurological disorder associated with inactivating mutation of MECP2, for example Rett syndrome.

[0118] The disorders that may be treated or prevented as provided herein include those involving a reduction or inactivation in the activity of MeCP2. Thus, as used herein, the phrase "inactivating mutation" encompasses mutations that result in reduced MeCP2 activity as well as mutations that abolish MeCP2 activity, and in particular the activity due to the ability of MeCP2 to recruit members of the NCoR/SMRT co-repressor complex to methylated DNA. Such disorders may be recognised, for example, by the identification of mutations in MeCP2 in the subject having, or at risk of having, the disorder. For example, recurrent (e.g. A140V) or sporadic mutations in males have been found to be causative in some cases of X-linked intellectual disability. Similarly, in females, "hypomorphic" mutations of this kind are associated with learning disability, and exome sequencing of children diagnosed with developmental delay is also revealing mutations in MECP2. Such mutations may affect the ability of the MeCP2 to recruit components of the NCoR/SMRT co-repressor complex to methylated DNA, and/or may generally affect the stability of the protein.

[0119] In the context of the methods and medical uses of the present invention, the animal to be treated may be anyone requiring the treatment, or anyone deemed to be at risk of developing a relevant disorder. Suitably the animal may be a mammal, preferably a primate and further preferably a human subject.

[0120] The animal to be treated may present with symptoms suggestive of a MeCP2-associated disorder. Alternatively, the subject may appear to be asymptomatic but deemed to be at risk of developing an MECP2-related disorder caused by loss of MeCP2 function, such that preventative treatment with synthetic polypeptides of the invention is desirable. Suitably an asymptomatic subject may be a subject who is believed to be at elevated risk of having a MeCP2-associated disorder. Such an asymptomatic subject may be one who has a family history of a MeCP2-associated disorder, or one who has undergone genetic testing that indicates a mutation in the MECP2 gene.

[0121] The present invention envisions treating or preventing disorders associated with reduced MeCP2 activity by the administration of a therapeutic agent, i.e., a synthetic polypeptide composition, a nucleic acid construct, an expression vector, and/or a virion of the invention. Administration of the therapeutic agents in accordance with the present invention may be continuous or intermittent, depending, for example, upon the recipient's physiological condition, whether the purpose of the administration is therapeutic or prophylactic, and other factors known to skilled practitioners. The administration of the agents of the invention may be essentially continuous over a preselected period of time or may be in a series of spaced doses. Both local and systemic administration are contemplated.

[0122] One or more suitable unit dosage forms having the therapeutic agent(s) of the invention can be administered by a variety of routes including parenteral, including by intravenous and intramuscular routes, as well as by direct injection into the tissue directly associated with the reduced MeCP2 activity. For example, the therapeutic agent may be directly injected into the brain. Alternatively the therapeutic agent may be introduced intrathecally for brain and spinal cord conditions. In another example, the therapeutic agent may be introduced intramuscularly for viruses that traffic back to affected neurons from muscle, such as AAV, lentivirus and adenovirus. The formulations may, where appropriate, be conveniently presented in discrete unit dosage forms and may be prepared by any of the methods well known to pharmacy. Such methods may include the step of bringing into association the therapeutic agent with liquid carriers, solid matrices, semi-solid carriers, finely divided solid carriers or combinations thereof, and then, if necessary, introducing or shaping the product into the desired delivery system.

[0123] When the therapeutic agents of the invention are prepared for administration, they are preferably combined with a pharmaceutically acceptable carrier, diluent or excipient to form a pharmaceutical formulation, or unit dosage form. The total active ingredients in such formulations include from 0.1 to 99.9% by weight of the formulation. A "pharmaceutically acceptable carrier" is a diluent, excipient, and/or salt that is compatible with the other ingredients of the formulation, and not deleterious to the recipient thereof. The active ingredient for administration may be present as a powder or as granules, as a solution, a suspension or an emulsion.

[0124] Pharmaceutical formulations containing the therapeutic agents of the invention can be prepared by procedures known in the art using well known and readily available ingredients.

[0125] The therapeutic agents of the invention can also be formulated as solutions appropriate for parenteral administration, for instance by intramuscular, subcutaneous or intravenous routes.

[0126] The pharmaceutical formulations of the therapeutic agents of the invention can also take the form of an aqueous or anhydrous solution or dispersion, or alternatively the form of an emulsion or suspension.

[0127] Thus, the therapeutic agent may be formulated for parenteral administration (e.g., by injection, for example, bolus injection or continuous infusion) and may be presented in unit dose form in ampules, pre-filled syringes, small volume infusion containers or in multi-dose containers with an added preservative. The active ingredients may take such forms as suspensions, solutions, or emulsions in oily or aqueous vehicles, and may contain formulatory agents such as suspending, stabilizing and/or dispersing agents. Alternatively, the active ingredients may be in powder form, obtained by aseptic isolation of sterile solid or by lyophilization from solution, for constitution with a suitable vehicle, e.g., sterile, pyrogen-free water, before use.

[0128] The pharmaceutical formulations of the present invention may include, as optional ingredients, pharmaceutically acceptable carriers, diluents, solubilizing or emulsifying agents, and salts of the type that are well-known in the art. Specific non-limiting examples of the carriers and/or diluents that are useful in the pharmaceutical formulations of the present invention include water and physiologically acceptable buffered saline solutions, such as phosphate buffered saline solutions of pH 7.0-8.0.

BRIEF DESCRIPTION OF THE FIGURES

[0129] The invention will now be described in detail with reference to a specific embodiment and with reference to the accompanying drawings, in which:

[0130] FIG. 1 shows stepwise deletion of MeCP2 protein, retaining only the two key functional domains predicted by mutational analysis.

[0131] A) Schematic of human MeCP2 protein sequence (e1 isoform) with the methyl-CpG binding domain (MBD) [residues 78-162¹²] and the NCoR/SMRT interaction domain (NID) [residues 285-309⁴]; annotated with (above): polymorphisms in healthy hemizygous males, RTT-causing missense mutations, and (below) sequence identity to chimpanzee (e1), mouse (e1), Xenopus and zebrafish homologues [sites of insertions in Xenopus and zebrafish sequences shown as longer bars].

[0132] B) Schematic of the deletion series of MeCP2 proteins (mouse e2 isoforms) expressed by the three novel mouse lines presented in this study (and WT-EGFP mice¹⁶).

[0133] C) EGFP-tagged shortened proteins were overexpressed in HeLa cells, and immunoprecipitated using GFP-TRAP beads. Western blots show expression and purification of these proteins (GFP), and co-immunoprecipitation of NCoR/SMRT co-repressor complex components (NCoR, HDAC3 and TBL1XR1). WT-EGFP and R306C-EGFP were used as controls to show the presence and absence of binding to these proteins, respectively. 'In'=input, 'IP'=immunoprecipitate.

[0134] D) EGFP-tagged shortened MeCP2 proteins were overexpressed in NIH-3T3 cells, which were PFA fixed and stained with DAPI. WT-EGFP and R111G-EGFP were used as controls to show focal and diffuse localisation, respectively.

[0135] E) EGFP-tagged shortened proteins were co-overexpressed with TBL1X-mCherry in NIH-3T3 cells, which were PFA fixed and stained with DAPI. WT-EGFP and R306C-EGFP were used as controls to show the presence and absence of TBL1X-mCherry recruitment to heterochromatic foci, respectively.

[0136] FIG. 2 shows the design of the MeCP2 deletion series.

[0137] A) Schematic of the genomic DNA sequences of wild-type and ΔNIC MeCP2, showing the retention of the extreme N-terminal amino acids encoded in exons 1 and 2 and the first 10 bp of exon 3, the deletion of the N- and C-termini, the replacement of the intervening region with a linker and SV40 NLS, and the addition of the C-terminal EGFP tag. Colour key: 5'UTR=white, MBD=mid-grey, NID=dark grey, uncharacterised regions=grey, SV40 NLS=mid-grey beside linker, linkers=dark grey and EGFP=C-terminal mid-grey.

[0138] B) The N-terminal ends of the sequences of all three shortened proteins (e1 and e2 isoforms) showing the fusion of the extreme N-terminal amino acids to the MBD (starting with Pro72).

[0139] C), D) Protein sequence alignment of the (C) MBD and (D) NID region using ClustalWS, shaded according to BLOSUM62 score. Both alignments are annotated with: (above) RTT-missense mutations¹³ and activity-dependent phosphorylation sites¹⁷,¹⁸,¹⁹; and (below) sequence conservation, interaction domains and known²⁰/predicted²¹ structure. Interaction sites: meDNA binding (residues 78-162¹²), AT hook 1 (residues 183-195²²), AT hook 2 (residues 257-272²³), NCoR/SMRT binding (residues 285-309⁴). The bipartite nuclear localisation signal (NLS) is also shown (residues 253-256 and 266-271). Residue numbers correspond to those of mammalian e2 isoforms. The regions retained in ΔNIC are: MBD residues 72-173 (highlighted by the grey rectangle in C) and NID residues 272-312 (highlighted by the grey rectangle in D).

[0140] FIG. 3 shows the constructs for the generation of ΔN and ΔNC mice.

[0141] (Upper) Diagram of (A) ΔN and (B) ΔNC mouse production. The endogenous Mecp2 allele was targeted in male ES cells. The selection cassette was removed in vivo by crossing chimaeras with deleter (CMV-Cre) mice.

[0142] (Lower) Southern blot analysis shows correct targeting of ES cells and successful cassette deletion in the knock-in mice.

[0143] FIG. 4 shows constructs for the generation of ΔNIC and STOP mice.

[0144] (Upper) Diagram of ΔNIC mouse production. The endogenous Mecp2 allele was targeted in male ES cells. The selection cassette was removed in vivo by crossing chimaeras with deleter (CMV-Cre) mice to produce constitutively expressing ΔNIC mice or retained to produce STOP mice.

[0145] (Lower) Southern blot analysis shows correct targeting of ES cells and successful cassette deletion in the ΔNIC knock-in mice.

[0146] FIG. 5 shows that ΔN and ΔNC proteins are expressed at around wild-type levels in knock-in mice.

[0147] A, Upper) Western blot analysis of crude whole brain extract showing protein sizes and levels in ΔN mice (n=3) compared to their wild-type littermates (n=3), detected using a C-terminal MeCP2 antibody.

[0148] A, Lower) Western blot analysis of ΔN (n=3) and ΔNC (n=3) mice compared to WT-EGFP controls (n=3), detected using a GFP antibody. Histone H3 was used as a loading control. * denotes a non-specific band detected by the GFP antibody.

[0149] B) Flow cytometry analysis of protein levels in nuclei prepared from whole brain ('All') and the high-NeuN subpopulation ('Neurons') in WT-EGFP (n=3), ΔN (n=3) and ΔNC (n=3) mice, detected using EGFP fluorescence. Graph shows mean ± S.E.M., and genotypes were compared to WT-EGFP controls by t-test: All ΔN p=0.338, ΔNC **p=0.003; and Neurons ΔN p=0.672, ΔNC *p=0.014.

[0150] FIG. 6 shows deletion of the N- and C-termini has minimal phenotypic consequence.

[0151] A), B) Phenotypic scoring of hemizygous male (A) .DELTA.N mice (n=10) and (B) .DELTA.NC mice (n=10) each compared to their wild-type littermates (n=10) over one year. Graphs show mean scores.+-.S.E.M. Mecp2-null data (n=12).sup.16 is used as a comparator.

[0152] C),D) Kaplan-Meier plots showing survival of the same cohorts in parts A and B. Mecp2-null data (n=24).sup.16 is used as a comparator.

[0153] E), F), G) Behavioural analysis of separate cohorts performed at 20 weeks of age: .DELTA.N (n=10) and .DELTA.NC mice (n=10-11) each compared to their wildtype littermates (n=10). All graphs show individual values and medians, and the results of statistical analysis comparing genotypes (see below): not significant (`n.s.`) p>0.05, *p<0.05.

[0154] E) Time spent in the closed and open arms of the Elevated Plus Maze was measured during a 15 minute trial, and genotypes were compared using KS tests: .DELTA.N cohort (left) closed arms p=0.988 and open arms p=0.759; .DELTA.NC cohort (right) closed arms p=0.956 and open arms p=0.932.

[0155] F) Time spent in the centre region of the Open Field test was measured during a 20 minute trial, and genotypes were compared using t-tests: .DELTA.N p=0.822; .DELTA.NC *p=0.020.

[0156] G) Average latency to fall from the Accelerating Rotarod in four trials was calculated for each of the three days of the experiment, and genotypes were compared using KS tests: .DELTA.N cohort day 1 p=0.759, day 2 p=0.401 and day 3 p=0.055; .DELTA.NC cohort day 1 p=0.988, day 2 p=0.401 and day 3 p=0.759.

[0157] FIG. 7 shows that .DELTA.NC mice have a slightly increased weight phenotype that is background-dependent.

[0158] A), B) Growth curves of the backcrossed scoring cohorts (see FIG. 6A-D).

[0159] C) Growth curve of an outbred (75% C57BL/6J) cohort of .DELTA.NC mice (n=7) and wild-type littermates (n=9).

[0160] A), B), C) Graphs show mean values.+-.S.E.M. Genotypes were compared using repeated measures ANOVA: .DELTA.N p=0.385, .DELTA.NC ****p<0.0001, .DELTA.NC (outbred) p=0.739. Mecp2-null data (n=20).sup.16 is used as a comparator.

[0161] FIG. 8 shows that no activity phenotype was detected for either .DELTA.N or .DELTA.NC mice.

[0162] A), B) Behavioural analysis of .DELTA.N (n=10) and .DELTA.NC mice (n=10) each compared to their wildtype littermates (n=10) at 20 weeks of age (see FIG. 6E-G). Total distance travelled in the Open Field test was measured during a 20 minute trial. Graphs show individual values and medians, and genotypes were compared using t-tests: .DELTA.N p=0.691; .DELTA.NC p=0.791.

[0163] FIG. 9 shows that additional deletion of the intervening region leads to protein instability and mild RTT-like symptoms.

[0164] A) Western blot analysis of crude whole brain extract showing protein sizes and levels in .DELTA.NIC mice (n=3) and WT-EGFP controls (n=3), detected using a GFP antibody. Histone H3 was used as a loading control. *denotes a non-specific band detected by the GFP antibody.

[0165] B) Flow cytometry analysis of protein levels in nuclei prepared from whole brain (`All`) and the high-NeuN subpopulation (`Neurons`) in .DELTA.NIC mice (n=3) and WT-EGFP controls (n=3), detected using EGFP fluorescence. Graph shows mean.+-.S.E.M and genotypes were compared by t-test: All ***p=0.0002 and Neurons ***p=0.0001.

[0166] C) Quantitative PCR analysis of mRNA levels prepared from whole brain of .DELTA.NIC mice (n=3) and wild-type littermates (n=3). Mecp2 transcript levels were normalised to Cyclophilin A. Graph shows mean.+-.S.E.M (relative to wild-type) and genotypes were compared by t-test: **p=0.005.

[0167] D) Phenotypic scoring of .DELTA.NIC mice (n=10) compared to their wild-type littermates (n=10) over one year. Graph shows mean scores.+-.S.E.M. Mecp2-null data (n=12).sup.16 is used as a comparator.

[0168] E) Kaplan-Meier plot showing survival of the same cohort in part D. One .DELTA.NIC animal died at 43 weeks without exceeding a severity score of 2.5. Mecp2-null data (n=24).sup.16 is used as a comparator.

[0169] F), G), H) Behavioural analysis of separate cohorts performed at 20 weeks of age: .DELTA.NIC (n=10) compared to their wildtype littermates (n=10). All graphs show individual values and medians, and the results of statistical analysis comparing genotypes (see below): not significant (`n.s.`) p>0.05, *p<0.05, **p<0.01.

[0170] F) Time spent in the closed and open arms and central region of the Elevated Plus Maze was measured during a 15 minute trial, and genotypes were compared using KS tests: closed arms **p=0.003, open arms p=0.055 and centre *p=0.015.

[0171] G) Time spent in the centre region of the Open Field test was measured during a 20 minute trial, and genotypes were compared using a t-test: p=0.402.

[0172] H) Average latency to fall from the Accelerating Rotarod in four trials was calculated for each of the three days of the experiment, and genotypes were compared using KS tests: day 1 p=0.164, day 2 p=0.055 and day 3 **p=0.003. Changed performance (learning/worsening) over the three day period was determined using Friedman tests: wild-type animals p=0.601, .DELTA.NIC animals **p=0.003.
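The significance annotations used in the figure legends can be summarised as a small lookup. The following is a minimal sketch, not part of the reported analysis: the function name `significance_label` is hypothetical, and the cut-offs for `***` (p<0.001) and for a p value of exactly 0.05 are assumptions chosen to agree with the annotated values in the legends (for example **p=0.003, ***p=0.0002 and ****p<0.0001).

```python
def significance_label(p_value: float) -> str:
    """Map a p value to the star notation used in the figure legends.

    Assumed thresholds: **** p<0.0001, *** p<0.001, ** p<0.01, * p<0.05,
    otherwise 'n.s.'. Only the *, ** and **** cut-offs are stated in the
    legends; the *** cut-off is a conventional assumption.
    """
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p value must lie between 0 and 1")
    if p_value < 0.0001:
        return "****"
    if p_value < 0.001:
        return "***"
    if p_value < 0.01:
        return "**"
    if p_value < 0.05:
        return "*"
    return "n.s."


if __name__ == "__main__":
    # p values quoted in the legends for FIG. 5, FIG. 6 and FIG. 9
    for p in (0.338, 0.014, 0.003, 0.0002):
        print(p, significance_label(p))
```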

[0173] FIG. 10 shows that outbred .DELTA.NIC mice had 100% survival over one year.

[0174] Kaplan-Meier plot showing survival of an outbred (75% C57BL/6J) cohort of .DELTA.NIC mice (n=10) and their wild-type littermate (n=1). Mecp2-null data (n=24).sup.16 is used as a comparator.

[0175] FIG. 11 shows that .DELTA.NIC mice have decreased body weight.

[0176] Growth curve of the backcrossed scoring cohort (see FIG. 9D-E). Graph shows mean.+-.S.E.M. Genotypes were compared using repeated measures ANOVA: ****p<0.0001. Mecp2-null data (n=20).sup.16 is used as a comparator.

[0177] FIG. 12 shows that no activity phenotype was detected for .DELTA.NIC mice.

[0178] Behavioural analysis of .DELTA.NIC (n=10) compared to their wildtype littermates (n=10) at 20 weeks of age (see FIG. 9F-H). Total distance travelled in the Open Field test was measured during a 20 minute trial. Graphs show individual values and medians, and genotypes were compared using a t-test: p=0.333.

[0179] FIG. 13 shows that .DELTA.NIC mice have a less severe phenotype than the mildest mouse model of RTT, R133C.

[0180] A), B), C) Copy of phenotypic analysis of .DELTA.NIC mice and wild-type littermates presented in FIG. 9D-E and FIG. S11 using EGFP-tagged R133C mice (n=10).sup.16 as a comparator.

[0181] FIG. 14 shows that `STOP` mice with transcriptionally silenced .DELTA.NIC resemble Mecp2 nulls.

[0182] A) Western blot analysis of crude whole brain extract showing protein sizes and levels in STOP mice (n=3) compared to WT-EGFP (n=3) and .DELTA.NIC controls (n=3), detected using a GFP antibody. Histone H3 was used as a loading control. *denotes a non-specific band detected by the GFP antibody.

[0183] B) Flow cytometry analysis of protein levels in nuclei prepared from whole brain (`All`) and the high-NeuN subpopulation (`Neurons`) in WT-EGFP (n=3), .DELTA.NIC (n=3) and STOP (n=3) mice, detected using EGFP fluorescence. Graph shows mean.+-.S.E.M and genotypes were compared using t-tests: **** denotes a p value<0.0001.

[0184] C) Phenotypic scoring of STOP mice (n=22) compared to Mecp2-null data (n=12).sup.16. Graph shows mean scores.+-.S.E.M.

[0185] D) Kaplan-Meier plot showing survival of STOP mice (n=14) compared to Mecp2-null data (n=24).sup.16.

[0186] FIG. 15 shows that reactivation of .DELTA.NIC successfully reverses neurological symptoms in MeCP2-deficient mice.

[0187] A) Timeline of the reversal experiment (results shown in B-C and FIG. 16).

[0188] B) Phenotypic scoring of Tamoxifen-injected mice from 4-28 weeks: WT.sup.T (n=4), WT CreER.sup.T (n=4), STOP.sup.T (n=9) and STOP CreER.sup.T (n=9). Graph shows mean scores.+-.S.E.M.

[0189] C) Kaplan-Meier plot showing survival of the same cohort. Arrows indicate the timing of Tamoxifen injections. .sup.`T` denotes Tamoxifen-injected animals.

[0190] D) Timeline of the AAV-mediated rescue experiment (results shown in E-F and FIG. 17).

[0191] E) Phenotypic scoring of AAV9-injected mice from 5-20 weeks: WT+vehicle (n=19), Null+vehicle (n=21) and Null+.DELTA.NIC (n=11). Graph shows mean scores.+-.S.E.M.

[0192] F) Kaplan-Meier plot showing survival of the same animals. An arrow indicates the timing of the viral injection.

[0193] FIG. 16 shows successful reactivation of .DELTA.NIC in Tamoxifen-injected STOP CreER mice.

[0194] A) Southern blot analysis of genomic DNA to determine the level of recombination by CreER in Tamoxifen (`+Tmx`)-injected STOP CreER animals (n=8). One Tamoxifen-injected STOP animal was included as a negative control showing recombination was dependent on CreER. Other samples were included for reference (see restriction map in FIG. 4).

[0195] B) Protein levels in Tamoxifen-injected STOP CreER animals were determined using western blotting (upper, n=5) and flow cytometry (lower, n=3). Constitutively expressing .DELTA.NIC mice (n=3) were used as a comparator. Graphs show mean values.+-.S.E.M (quantification by western blotting is shown normalised to .DELTA.NIC). Genotypes were compared using t-tests: western blotting p=0.434; flow cytometry All nuclei p=0.128 and Neuronal nuclei *p=0.016.

[0196] FIG. 17 shows that introduction of .DELTA.NIC into wild-type mice does not have adverse consequences.

[0197] A) Phenotypic scoring of AAV9-injected mice from 5-20 weeks: WT+vehicle (n=19) Null+vehicle (n=21) and WT+.DELTA.NIC (n=9). Graph shows mean scores.+-.S.E.M. An arrow indicates the timing of the viral injection.

[0198] B) Design of construct used in the vector delivery of .DELTA.NIC. Putative regulatory elements (RE) in the extended mMeP426 promoter and endogenous distal 3'-UTR are indicated. The extent of the short 229 bp region of the murine Mecp2 endogenous core promoter that is disclosed in the art.sup.29,38 (mMeP229) is shown relative to the mMeP426 promoter used in this construct. The RDH1pA 3'-UTR consists of several exogenous microRNA (miR) binding sites incorporated as a `binding panel` adjacent to a portion of the distal endogenous MECP2 polyadenylation signal and its accompanying regulatory elements. References with an asterisk indicate human in vitro studies, not rodent.

[0199] C) Full, annotated, sequence of the expression cassette illustrated in FIG. 17B, with flanking AAV2 ITRs. This sequence is also provided as SEQ ID NO: 65.

[0200] FIG. 18 shows an alignment of the cDNA sequence of wild type human MECP2 e1 isoform with cDNA sequences encoding polypeptide sequences in accordance with the invention and the experimental results provided herein. "Human WT" (SEQ ID NO: 18) is a cDNA sequence for the wild type MeCP2 isoform 1. "dNIC-Myc" (SEQ ID NO: 62) is a cDNA sequence for a synthetic polypeptide in accordance with the invention having deletions in the N and C-terminal sequences of MeCP2 and in the sequence linking the MBD and NID, and having a Myc tag at the C-terminus. "dNC-Myc" (SEQ ID NO: 63) is a cDNA sequence for a synthetic polypeptide in accordance with the invention having deletions in the N and C-terminal sequences of MeCP2 and having a Myc tag at the C-terminus. The sections of the cDNA sequences corresponding to the extreme N terminus of the polypeptide, providing the e1-specific sequences, the MBD, the NID, the Myc tag, a SV40 NLS, and linkers for attaching the tag and NLS, are all indicated.

[0201] FIG. 19 shows an alignment of the amino acid sequence of wild type human MeCP2 e1 isoform with polypeptide sequences in accordance with the invention and the experimental results provided herein. "Human WT" (SEQ ID NO: 3) is the amino acid sequence for the wild type MeCP2 isoform 1. "dNIC-Myc" (SEQ ID NO: 61) is the amino acid sequence for a synthetic polypeptide in accordance with the invention having deletions in the N and C-terminal sequences of MeCP2 and in the sequence linking the MBD and NID, and having a Myc tag at the C-terminus. "dNC-Myc" (SEQ ID NO: 60) is the amino acid sequence for a synthetic polypeptide in accordance with the invention having deletions in the N and C-terminal sequences of MeCP2 and having a Myc tag at the C-terminus. The sections of the amino acid sequences corresponding to the extreme N terminus of the polypeptide, having the e1-specific sequences, the MBD, the NID, the Myc tag, a SV40 NLS, and linkers for attaching the tag and NLS, are all indicated.

[0202] FIG. 20 shows an alignment of the cDNA sequence of the wild type mouse MECP2 e1 isoform, with an EGFP tag, with cDNA sequences encoding polypeptide sequences in accordance with the invention and the experimental results provided herein. "dNIC-EGFP" (SEQ ID NO: 51) is a cDNA sequence for a synthetic polypeptide in accordance with the invention having deletions in the N and C-terminal sequences of MeCP2 and in the sequence linking the MBD and NID, and having an EGFP tag at the C-terminus. "dNC-EGFP" (SEQ ID NO: 50) is a cDNA sequence for a synthetic polypeptide in accordance with the invention having deletions in the N and C-terminal sequences of MeCP2 and having an EGFP tag at the C-terminus. "WT-EGFP" (SEQ ID NO: 48) is a cDNA sequence for the wild type MeCP2 isoform 1 with an EGFP tag. "dN-EGFP" (SEQ ID NO: 49) is a cDNA sequence for a synthetic polypeptide in accordance with the invention having deletions in the N terminal sequences of MeCP2 and having an EGFP tag at the C-terminus. The sections of the cDNA sequences corresponding to the extreme N terminus of the polypeptide, providing the e1-specific sequences, the MBD, the NID, the EGFP tag, a SV40 NLS, and linkers for attaching the tag and NLS, are all indicated.

[0203] FIG. 21 shows an alignment of the amino acid sequence of wild type human MECP2 e1 isoform, with an EGFP tag, with polypeptide sequences in accordance with the invention and the experimental results provided herein. "WT-EGFP/1-748" (SEQ ID NO: 40) is an amino acid sequence for the wild type MeCP2 isoform 1 with an EGFP tag at the C-terminus. ".DELTA.N-EGFP/1-689" (SEQ ID NO: 41) is the amino acid sequence for a synthetic polypeptide in accordance with the invention having deletions in the N terminal sequences of MeCP2 and having an EGFP tag at the C-terminus. ".DELTA.NIC-EGFP/1-432" (SEQ ID NO: 43) is the amino acid sequence for a synthetic polypeptide in accordance with the invention having deletions in the N and C-terminal sequences of MeCP2 and in the sequence linking the MBD and NID, and having an EGFP tag at the C-terminus. ".DELTA.NC-EGFP/1-516" (SEQ ID NO: 42) is the amino acid sequence for a synthetic polypeptide in accordance with the invention having deletions in the N and C-terminal sequences of MeCP2 and having an EGFP tag at the C-terminus. The sections of the amino acid sequences corresponding to the extreme N terminus of the polypeptide, having the e1-specific sequences, the MBD, the NID, the EGFP tag, a SV40 NLS, and linkers for attaching the tag and NLS, are all indicated.

TABLE-US-00003 SEQUENCES Below are polynucleotide and amino acid sequences used in accordance with the invention. [SEQ ID NO: 1] MeCP2 Methyl-CpG Binding Domain (MBD) polypeptide sequence PAVPEASASPKQRRSIIRDRGPMYDDPTLPEGVVTRKLKQRKSGRSAGKYDVYLINPQGKAF RSKVELIAYFEKVGDTSLDPNDFDFTVTGRGSPSRREQKPP [SEQ ID NO: 2] MeCP2 NCoR/SMRT Interaction Domain (NID) polypeptide sequence PGSVVAAAAAEAKKKAVKESSIRSVQETVLPIKKRKTRETV [SEQ ID NO: 3] Full length human wild type MeCP2 polypeptide sequence (e1 isoform) MAAAAAAAPSGGGGGGEEERLEEKSEDQDLQGLKDKPLKFKKVKKDKKEEKEGKHEPVQ PSAHHSAEPAEAGKAETSEGSGSAPAVPEASASPKQRRSIIRDRGPMYDDPTLPEGVVTRK LKQRKSGRSAGKYDVYLINPQGKAFRSKVELIAYFEKVGDTSLDPNDFDFTVTGRGSPSRR EQKPPKKPKSPKAPGTGRGRGRPKGSGTTRPKAATSEGVQVKRVLEKSPGKLLVKMPFQT SPGGKAEGGGATTSTQVMVIKRPGRKRKAEADPQAIPKKRGRKPGSVVAAAAAEAKKKAV KESSIRSVQETVLPIKKRKTRETVSIEVKEVVKPLLVSTLGEKSGKGLKTCKSPGRKSKESSP KGRSSSASSPPKKEHHHHHHHSESPKAPVPLLPPLPPPPPEPESSEDPTSPPEPQDLSSSV CKEEKMPRGGSLESDGCPKEPAKTQPAVATAATAAEKYKHRGEGERKDIVSSSMPRPNRE EPVDSRTPVTERVS [SEQ ID NO: 4] Full length human wild type MeCP2 polypeptide sequence (e2 isoform) MVAGMLGLREEKSEDQDLQGLKDKPLKFKKVKKDKKEEKEGKHEPVQPSAHHSAEPAEA GKAETSEGSGSAPAVPEASASPKQRRSIIRDRGPMYDDPTLPEGVVTRKLKQRKSGRSAGK YDVYLINPQGKAFRSKVELIAYFEKVGDTSLDPNDFDFTVTGRGSPSRREQKPPKKPKSPK APGTGRGRGRPKGSGTTRPKAATSEGVQVKRVLEKSPGKLLVKMPFQTSPGGKAEGGGA TTSTQVMVIKRPGRKRKAEADPQAIPKKRGRKPGSVVAAAAAEAKKKAVKESSIRSVQETV LPIKKRKTRETVSIEVKEVVKPLLVSTLGEKSGKGLKTCKSPGRKSKESSPKGRSSSASSPP KKEHHHHHHHSESPKAPVPLLPPLPPPPPEPESSEDPTSPPEPQDLSSSVCKEEKMPRGG SLESDGCPKEPAKTQPAVATAATAAEKYKHRGEGERKDIVSSSMPRPNREEPVDSRTPVT ERVS [SEQ ID NO: 5] MeCP2 polypeptide sequence (e1 isoform) N-terminal to the MBD MAAAAAAAPSGGGGGGEEERLEEKSEDQDLQGLKDKPLKFKKVKKDKKEEKEGKHEPVQ PSAHHSAEPAEAGKAETSEGSGSA [SEQ ID NO: 6] MeCP2 polypeptide sequence (e2 isoform) N-terminal to the MBD MVAGMLGLREEKSEDQDLQGLKDKPLKFKKVKKDKKEEKEGKHEPVQPSAHHSAEPAEA GKAETSEGSGSA [SEQ ID NO: 7] MeCP2 polypeptide sequence intervening between the MBD and NID 
KKPKSPKAPGTGRGRGRPKGSGTTRPKAATSEGVQVKRVLEKSPGKLLVKMPFQTSPGG KAEGGGATTSTQVMVIKRPGRKRKAEADPQAIPKKRGRK [SEQ ID NO: 8] MeCP2 polypeptide sequence C-terminal to the NID SIEVKEVVKPLLVSTLGEKSGKGLKTCKSPGRKSKESSPKGRSSSASSPPKKEHHHHHHHS ESPKAPVPLLPPLPPPPPEPESSEDPTSPPEPQDLSSSVCKEEKMPRGGSLESDGCPKEPA KTQPAVATAATAAEKYKHRGEGERKDIVSSSMPRPNREEPVDSRTPVTERVS [SEQ ID NO: 9] Mouse e1 specific extreme N-terminus polypeptide sequence MAAAAATAAAAAAPSGGGGGGEEERLEEK [SEQ ID NO: 10] Mouse e2 specific extreme N-terminus polypeptide sequence MVAGMLGLREEK [SEQ ID NO: 11] Human e1 specific extreme N-terminus polypeptide sequence MAAAAAAAPSGGGGGGEEERLEEK [SEQ ID NO: 12] Human e2 specific extreme N-terminus polypeptide sequence MVAGMLGLREEK [SEQ ID NO: 13] .DELTA.NC: A truncated synthetic polypeptide sequence (from human MeCP2) PAVPEASASPKQRRSIIRDRGPMYDDPTLPEGVVTRKLKQRKSGRSAGKYDVYLINPQGKAF RSKVELIAYFEKVGDTSLDPNDFDFTVTGRGSPSRREQKPPKKPKSPKAPGTGRGRGRPK GSGTTRPKAATSEGVQVKRVLEKSPGKLLVKMPFQTSPGGKAEGGGATTSTQVMVIKRPG RKRKAEADPQAIPKKRGRKPGSVVAAAAAEAKKKAVKESSIRSVQETVLPIKKRKTRETV [SEQ ID NO: 14] .DELTA.NIC: A truncated synthetic polypeptide sequence (from human MeCP2) PAVPEASASPKQRRSIIRDRGPMYDDPTLPEGVVTRKLKQRKSGRSAGKYDVYLINPQGKAF RSKVELIAYFEKVGDTSLDPNDFDFTVTGRGSPSRREQKPPGSSGSSGPKKKRKVPGSVV AAAAAEAKKKAVKESSIRSVQETVLPIKKRKTRETV [SEQ ID NO: 15] .DELTA.N mouse: A truncated synthetic polypeptide sequence (from mouse MeCP2) PAVPEASASPKQRRSIIRDRGPMYDDPTLPEGVVTRKLKQRKSGRSAGKYDVYLINPQGKAF RSKVELIAYFEKVGDTSLDPNDFDFTVTGRGSPSRREQKPPKKPKSPKAPGTGRGRGRPK GSGTGRPKAAASEGVQVKRVLEKSPGKLVVKMPFQASPGGKGEGGGATTSAQVMVIKRP GRKRKAEADPQAIPKKRGRKPGSVVAAAAAEAKKKAVKESSIRSVHETVLPIKKRKTRETVS IEVKEVVKPLLVSTLGEKSGKGLKTCKSPGRKSKESSPKGRSSSASSPPKKEHHHHHHHSE STKAPMPLLPSPPPPEPESSEDPISPPEPQDLSSSICKEEKMPRGGSLESDGCPKEPAKTQ PMVATTTTVAEKYKHRGEGERKDIVSSSMPRPNREEPVDSRTPVTERVS [SEQ ID NO: 16] .DELTA.NC mouse: A truncated synthetic polypeptide sequence (from mouse MeCP2) PAVPEASASPKQRRSIIRDRGPMYDDPTLPEGVVTRKLKQRKSGRSAGKYDVYLINPQGKAF 
RSKVELIAYFEKVGDTSLDPNDFDFTVTGRGSPSRREQKPPKKPKSPKAPGTGRGRGRPK GSGTGRPKAAASEGVQVKRVLEKSPGKLVVKMPFQASPGGKGEGGGATTSAQVMVIKRP GRKRKAEADPQAIPKKRGRKPGSVVAAAAAEAKKKAVKESSIRSVHETVLPIKKRKTRETV [SEQ ID NO: 17] .DELTA.NIC mouse: A truncated synthetic polypeptide sequence (from mouse MeCP2) PAVPEASASPKQRRSIIRDRGPMYDDPTLPEGVVTRKLKQRKSGRSAGKYDVYLINPQGKAF RSKVELIAYFEKVGDTSLDPNDFDFTVTGRGSPSRREQKPPGSSGSSGPKKKRKVPGSVV AAAAAEAKKKAVKESSIRSVHETVLPIKKRKTRETV [SEQ ID NO: 18] Full length human wild type MeCP2 cDNA sequence (e1 isoform) ATGGCCGCCGCCGCCGCCGCCGCGCCGAGCGGAGGAGGAGGAGGAGGCGAGGAGG AGAGACTGGAAGAAAAGTCAGAAGACCAGGACCTCCAGGGCCTCAAGGACAAACCCCT CAAGTTTAAAAAGGTGAAGAAAGATAAGAAAGAAGAGAAAGAGGGCAAGCATGAGCCC GTGCAGCCATCAGCCCACCACTCTGCTGAGCCCGCAGAGGCAGGCAAAGCAGAGACA TCAGAAGGGTCAGGCTCCGCCCCGGCTGTGCCGGAAGCTTCTGCCTCCCCCAAACAG CGGCGCTCCATCATCCGTGACCGGGGACCCATGTATGATGACCCCACCCTGCCTGAAG GCTGGACACGGAAGCTTAAGCAAAGGAAATCTGGCCGCTCTGCTGGGAAGTATGATGT GTATTTGATCAATCCCCAGGGAAAAGCCTTTCGCTCTAAAGTGGAGTTGATTGCGTACT TCGAAAAGGTAGGCGACACATCCCTGGACCCTAATGATTTTGACTTCACGGTAACTGGG AGAGGGAGCCCCTCCCGGCGAGAGCAGAAACCACCTAAGAAGCCCAAATCTCCCAAA GCTCCAGGAACTGGCAGAGGCCGGGGACGCCCCAAAGGGAGCGGCACCACGAGACC CAAGGCGGCCACGTCAGAGGGTGTGCAGGTGAAAAGGGTCCTGGAGAAAAGTCCTGG GAAGCTCCTTGTCAAGATGCCTTTTCAAACTTCGCCAGGGGGCAAGGCTGAGGGGGGT GGGGCCACCACATCCACCCAGGTCATGGTGATCAAACGCCCCGGCAGGAAGCGAAAA GCTGAGGCCGACCCTCAGGCCATTCCCAAGAAACGGGGCCGAAAGCCGGGGAGTGTG GTGGCAGCCGCTGCCGCCGAGGCCAAAAAGAAAGCCGTGAAGGAGTCTTCTATCCGA TCTGTGCAGGAGACCGTACTCCCCATCAAGAAGCGCAAGACCCGGGAGACGGTCAGC ATCGAGGTCAAGGAAGTGGTGAAGCCCCTGCTGGTGTCCACCCTCGGTGAGAAGAGC GGGAAAGGACTGAAGACCTGTAAGAGCCCTGGGCGGAAAAGCAAGGAGAGCAGCCCC AAGGGGCGCAGCAGCAGCGCCTCCTCACCCCCCAAGAAGGAGCACCACCACCATCAC CACCACTCAGAGTCCCCAAAGGCCCCCGTGCCACTGCTCCCACCCCTGCCCCCACCTC CACCTGAGCCCGAGAGCTCCGAGGACCCCACCAGCCCCCCTGAGCCCCAGGACTTGA GCAGCAGCGTCTGCAAAGAGGAGAAGATGCCCAGAGGAGGCTCACTGGAGAGCGACG GCTGCCCCAAGGAGCCAGCTAAGACTCAGCCCGCGGTTGCCACCGCCGCCACGGCCG CAGAAAAGTACAAACACCGAGGGGAGGGAGAGCGCAAAGACATTGTTTCATCCTCCAT 
GCCAAGGCCAAACAGAGAGGAGCCTGTGGACAGCCGGACGCCCGTGACCGAGAGAGT TAGC [SEQ ID NO: 19] .DELTA.NC: cDNA sequence of a truncated synthetic polypeptide sequence (from human MeCP2) CCGGCTGTGCCGGAAGCTTCTGCCTCCCCCAAACAGCGGCGCTCCATCATCCGTGAC CGGGGACCCATGTATGATGACCCCACCCTGCCTGAAGGCTGGACACGGAAGCTTAAG CAAAGGAAATCTGGCCGCTCTGCTGGGAAGTATGATGTGTATTTGATCAATCCCCAGG GAAAAGCCTTTCGCTCTAAAGTGGAGTTGATTGCGTACTTCGAAAAGGTAGGCGACACA TCCCTGGACCCTAATGATTTTGACTTCACGGTAACTGGGAGAGGGAGCCCCTCCCGGC GAGAGCAGAAACCACCTAAGAAGCCCAAATCTCCCAAAGCTCCAGGAACTGGCAGAGG CCGGGGACGCCCCAAAGGGAGCGGCACCACGAGACCCAAGGCGGCCACGTCAGAGG GTGTGCAGGTGAAAAGGGTCCTGGAGAAAAGTCCTGGGAAGCTCCTTGTCAAGATGCC TTTTCAAACTTCGCCAGGGGGCAAGGCTGAGGGGGGTGGGGCCACCACATCCACCCA GGTCATGGTGATCAAACGCCCCGGCAGGAAGCGAAAAGCTGAGGCCGACCCTCAGGC CATTCCCAAGAAACGGGGCCGAAAGCCGGGGAGTGTGGTGGCAGCCGCTGCCGCCG AGGCCAAAAAGAAAGCCGTGAAGGAGTCTTCTATCCGATCTGTGCAGGAGACCGTACT CCCCATCAAGAAGCGCAAGACCCGGGAGACGGTC [SEQ ID NO: 20] .DELTA.NIC: cDNA sequence of a truncated synthetic polypeptide sequence (from human MeCP2) CCGGCTGTGCCGGAAGCTTCTGCCTCCCCCAAACAGCGGCGCTCCATCATCCGTGAC CGGGGACCCATGTATGATGACCCCACCCTGCCTGAAGGCTGGACACGGAAGCTTAAG CAAAGGAAATCTGGCCGCTCTGCTGGGAAGTATGATGTGTATTTGATCAATCCCCAGG GAAAAGCCTTTCGCTCTAAAGTGGAGTTGATTGCGTACTTCGAAAAGGTAGGCGACACA TCCCTGGACCCTAATGATTTTGACTTCACGGTAACTGGGAGAGGGAGCCCCTCCCGGC GAGAGCAGAAACCACCTGGATCCAGTGGCAGCTCTGGGCCCAAGAAAAAGCGGAAGG TGCCGGGGAGTGTGGTGGCAGCCGCTGCCGCCGAGGCCAAAAAGAAAGCCGTGAAG GAGTCTTCTATCCGATCTGTGCAGGAGACCGTACTCCCCATCAAGAAGCGCAAGACCC GGGAGACGGTC [SEQ ID NO: 21] Full length mouse wild type MeCP2 cDNA sequence (el isoform) ATGGCCGCCGCTGCCGCCACCGCCGCCGCCGCCGCCGCGCCGAGCGGAGGAGGAG GAGGAGGCGAGGAGGAGAGACTGGAGGAAAAGTCAGAAGACCAGGATCTCCAGGGCC TCAGAGACAAGCCACTGAAGTTTAAGAAGGCGAAGAAAGACAAGAAGGAGGACAAAGA AGGCAAGCATGAGCCACTACAACCTTCAGCCCACCATTCTGCAGAGCCAGCAGAGGCA GGCAAAGCAGAAACATCAGAAAGCTCAGGCTCTGCCCCAGCAGTGCCAGAAGCCTCG GCTTCCCCCAAACAGCGGCGCTCCATTATCCGTGACCGGGGACCTATGTATGATGACC CCACCTTGCCTGAAGGTTGGACACGAAAGCTTAAACAAAGGAAGTCTGGCCGATCTGC 
TGGAAAGTATGATGTATATTTGATCAATCCCCAGGGAAAAGCTTTTCGCTCTAAAGTAGA ATTGATTGCATACTTTGAAAAGGTGGGAGACACCTCCTTGGACCCTAATGATTTTGACTT CACGGTAACTGGGAGAGGGAGCCCCTCCAGGAGAGAGCAGAAACCACCTAAGAAGCC CAAATCTCCCAAAGCTCCAGGAACTGGCAGGGGTCGGGGACGCCCCAAAGGGAGCGG CACTGGGAGACCAAAGGCAGCAGCATCAGAAGGTGTTCAGGTGAAAAGGGTCCTGGA GAAGAGCCCTGGGAAACTTGTTGTCAAGATGCCTTTCCAAGCATCGCCTGGGGGTAAG GGTGAGGGAGGTGGGGCTACCACATCTGCCCAGGTCATGGTGATCAAACGCCCTGGC AGAAAGCGAAAAGCTGAAGCTGACCCCCAGGCCATTCCTAAGAAACGGGGTAGAAAGC CTGGGAGTGTGGTGGCAGCTGCTGCAGCTGAGGCCAAAAAGAAAGCCGTGAAGGAGT CTTCCATACGGTCTGTGCATGAGACTGTGCTCCCCATCAAGAAGCGCAAGACCCGGGA GACGGTCAGCATCGAGGTCAAGGAAGTGGTGAAGCCCCTGCTGGTGTCCACCCTTGG TGAGAAAAGCGGGAAGGGACTGAAGACCTGCAAGAGCCCTGGGCGTAAAAGCAAGGA GAGCAGCCCCAAGGGGCGCAGCAGCAGTGCCTCCTCCCCACCTAAGAAGGAGCACCA TCATCACCACCATCACTCAGAGTCCACAAAGGCCCCCATGCCACTGCTCCCATCCCCA CCCCCACCTGAGCCTGAGAGCTCTGAGGACCCCATCAGCCCCCCTGAGCCTCAGGAC TTGAGCAGCAGCATCTGCAAAGAAGAGAAGATGCCCCGAGGAGGCTCACTGGAAAGC GATGGCTGCCCCAAGGAGCCAGCTAAGACTCAGCCTATGGTCGCCACCACTACCACAG TTGCAGAAAAGTACAAACACCGAGGGGAGGGAGAGCGCAAAGACATTGTTTCATCTTC CATGCCAAGGCCAAACAGAGAGGAGCCTGTGGACAGCCGGACGCCCGTGACCGAGAG AGTTAGCTCT [SEQ ID NO: 22] .DELTA.N mouse: cDNA for a truncated synthetic polypeptide sequence (from mouse MeCP2) CCAGCAGTGCCAGAAGCCTCGGCTTCCCCCAAACAGCGGCGCTCCATTATCCGTGACC GGGGACCTATGTATGATGACCCCACCTTGCCTGAAGGTTGGACACGAAAGCTTAAACA AAGGAAGTCTGGCCGATCTGCTGGAAAGTATGATGTATATTTGATCAATCCCCAGGGAA AAGCTTTTCGCTCTAAAGTAGAATTGATTGCATACTTTGAAAAGGTGGGAGACACCTCC TTGGACCCTAATGATTTTGACTTCACGGTAACTGGGAGAGGGAGCCCCTCCAGGAGAG AGCAGAAACCACCTAAGAAGCCCAAATCTCCCAAAGCTCCAGGAACTGGCAGGGGTCG GGGACGCCCCAAAGGGAGCGGCACTGGGAGACCAAAGGCAGCAGCATCAGAAGGTGT TCAGGTGAAAAGGGTCCTGGAGAAGAGCCCTGGGAAACTTGTTGTCAAGATGCCTTTC CAAGCATCGCCTGGGGGTAAGGGTGAGGGAGGTGGGGCTACCACATCTGCCCAGGTC ATGGTGATCAAACGCCCTGGCAGAAAGCGAAAAGCTGAAGCTGACCCCCAGGCCATTC CTAAGAAACGGGGTAGAAAGCCTGGGAGTGTGGTGGCAGCTGCTGCAGCTGAGGCCA AAAAGAAAGCCGTGAAGGAGTCTTCCATACGGTCTGTGCATGAGACTGTGCTCCCCAT CAAGAAGCGCAAGACCCGGGAGACGGTCAGCATCGAGGTCAAGGAAGTGGTGAAGCC 
CCTGCTGGTGTCCACCCTTGGTGAGAAAAGCGGGAAGGGACTGAAGACCTGCAAGAG CCCTGGGCGTAAAAGCAAGGAGAGCAGCCCCAAGGGGCGCAGCAGCAGTGCCTCCTC CCCACCTAAGAAGGAGCACCATCATCACCACCATCACTCAGAGTCCACAAAGGCCCCC ATGCCACTGCTCCCATCCCCACCCCCACCTGAGCCTGAGAGCTCTGAGGACCCCATCA GCCCCCCTGAGCCTCAGGACTTGAGCAGCAGCATCTGCAAAGAAGAGAAGATGCCCC GAGGAGGCTCACTGGAAAGCGATGGCTGCCCCAAGGAGCCAGCTAAGACTCAGCCTA TGGTCGCCACCACTACCACAGTTGCAGAAAAGTACAAACACCGAGGGGAGGGAGAGC GCAAAGACATTGTTTCATCTTCCATGCCAAGGCCAAACAGAGAGGAGCCTGTGGACAG CCGGACGCCCGTGACCGAGAGAGTTAGCTGT [SEQ ID NO: 23] .DELTA.NC mouse: cDNA for a truncated synthetic polypeptide sequence (from mouse MeCP2) CCAGCAGTGCCAGAAGCCTCGGCTTCCCCCAAACAGCGGCGCTCCATTATCCGTGACC GGGGACCTATGTATGATGACCCCACCTTGCCTGAAGGTTGGACACGAAAGCTTAAACA AAGGAAGTCTGGCCGATCTGCTGGAAAGTATGATGTATATTTGATCAATCCCCAGGGAA AAGCTTTTCGCTCTAAAGTAGAATTGATTGCATACTTTGAAAAGGTGGGAGACACCTCC TTGGACCCTAATGATTTTGACTTCACGGTAACTGGGAGAGGGAGCCCCTCCAGGAGAG AGCAGAAACCACCTAAGAAGCCCAAATCTCCCAAAGCTCCAGGAACTGGCAGGGGTCG GGGACGCCCCAAAGGGAGCGGCACTGGGAGACCAAAGGCAGCAGCATCAGAAGGTGT TCAGGTGAAAAGGGTCCTGGAGAAGAGCCCTGGGAAACTTGTTGTCAAGATGCCTTTC CAAGCATCGCCTGGGGGTAAGGGTGAGGGAGGTGGGGCTACCACATCTGCCCAGGTC ATGGTGATCAAACGCCCTGGCAGAAAGCGAAAAGCTGAAGCTGACCCCCAGGCCATTC CTAAGAAACGGGGTAGAAAGCCTGGGAGTGTGGTGGCAGCTGCTGCAGCTGAGGCCA AAAAGAAAGCCGTGAAGGAGTCTTCCATACGGTCTGTGCATGAGACTGTGCTCCCCAT CAAGAAGCGCAAGACCCGGGAGACGGTC [SEQ ID NO: 24] .DELTA.NIC mouse: cDNA for a truncated synthetic polypeptide sequence (from mouse MeCP2) CCAGCAGTGCCAGAAGCCTCGGCTTCCCCCAAACAGCGGCGCTCCATTATCCGTGACC GGGGACCTATGTATGATGACCCCACCTTGCCTGAAGGTTGGACACGAAAGCTTAAACA AAGGAAGTCTGGCCGATCTGCTGGAAAGTATGATGTATATTTGATCAATCCCCAGGGAA

AAGCTTTTCGCTCTAAAGTAGAATTGATTGCATACTTTGAAAAGGTGGGAGACACCTCC TTGGACCCTAATGATTTTGACTTCACGGTAACTGGGAGAGGGAGCCCCTCCAGGAGAG AGCAGAAACCACCTGGATCCAGTGGCAGCTCTGGGCCCAAGAAAAAGCGGAAGGTGC CTGGGAGTGTGGTGGCAGCTGCTGCAGCTGAGGCCAAAAAGAAAGCCGTGAAGGAGT CTTCCATACGGTCTGTGCATGAGACTGTGCTCCCCATCAAGAAGCGCAAGACCCGGGA GACGGTC [SEQ ID NO: 25] Mouse e1 specific extreme N-terminus cDNA sequence ATGGCCGCCGCTGCCGCCACCGCCGCCGCCGCCGCCGCGCCGAGCGGAGGAGGAG GAGGAGGCGAGGAGGAGAGACTGGAGGAAAAG [SEQ ID NO: 26] Mouse e2 specific extreme N-terminus cDNA sequence ATGGTAGCTGGGATGTTAGGGCTCAGGGAGGAAAAGGGAGGAAAAG [SEQ ID NO: 27] Human e1 specific extreme N-terminus cDNA sequence ATGGCCGCCGCCGCCGCCGCCGCGCCGAGCGGAGGAGGAGGAGGAGGCGAGGAGG AGAGACTGGAAGAAAAG [SEQ ID NO: 28] Human e2 specific extreme N-terminus cDNA sequence ATGGTAGCTGGGATGTTAGGGCTCAGGGAAGAAAAG [SEQ ID NO: 29] Full length mouse wild type MeCP2 polypeptide sequence (e1 isoform) MAAAAATAAAAAAPSGGGGGGEEERLEEKSEDQDLQGLRDKPLKFKKAKKDKKEDKEGK HEPLQPSAHHSAEPAEAGKAETSESSGSAPAVPEASASPKQRRSIIRDRGPMYDDPTLPEG VVTRKLKQRKSGRSAGKYDVYLINPQGKAFRSKVELIAYFEKVGDTSLDPNDFDFTVTGRGS PSRREQKPPKKPKSPKAPGTGRGRGRPKGSGTGRPKAAASEGVQVKRVLEKSPGKLVVK MPFQASPGGKGEGGGATTSAQVMVIKRPGRKRKAEADPQAIPKKRGRKPGSVVAAAAAE AKKKAVKESSIRSVHETVLPIKKRKTRETVSIEVKEVVKPLLVSTLGEKSGKGLKTCKSPGRK SKESSPKGRSSSASSPPKKEHHHHHHHSESTKAPMPLLPSPPPPEPESSEDPISPPEPQDL SSSICKEEKMPRGGSLESDGCPKEPAKTQPMVATTTTVAEKYKHRGEGERKDIVSSSMPR PNREEPVDSRTPVTERVS [SEQ ID NO: 30] Full length mouse wild type MeCP2 polypeptide sequence (e2 isoform) MVAGMLGLREEKSEDQDLQGLRDKPLKFKKAKKDKKEDKEGKHEPLQPSAHHSAEPAEA GKAETSESSGSAPAVPEASASPKQRRSIIRDRGPMYDDPTLPEGVVTRKLKQRKSGRSAGK YDVYLINPQGKAFRSKVELIAYFEKVGDTSLDPNDFDFTVTGRGSPSRREQKPPKKPKSPK APGTGRGRGRPKGSGTGRPKAAASEGVQVKRVLEKSPGKLVVKMPFQASPGGKGEGGG ATTSAQVMVIKRPGRKRKAEADPQAIPKKRGRKPGSVVAAAAAEAKKKAVKESSIRSVHET VLPIKKRKTRETVSIEVKEVVKPLLVSTLGEKSGKGLKTCKSPGRKSKESSPKGRSSSASSP PKKEHHHHHHHSESTKAPMPLLPSPPPPEPESSEDPISPPEPQDLSSSICKEEKMPRGGSL ESDGCPKEPAKTQPMVATTTTVAEKYKHRGEGERKDIVSSSMPRPNREEPVDSRTPVTER VS [SEQ ID 
NO: 31] Full length mouse wild type MeCP2 cDNA sequence (e2 isoform) ATGGTAGCTGGGATGTTAGGGCTCAGGGAGGAAAAGTCAGAAGACCAGGATCTCCAG GGCCTCAGAGACAAGCCACTGAAGTTTAAGAAGGCGAAGAAAGACAAGAAGGAGGACA AAGAAGGCAAGCATGAGCCACTACAACCTTCAGCCCACCATTCTGCAGAGCCAGCAGA GGCAGGCAAAGCAGAAACATCAGAAAGCTCAGGCTCTGCCCCAGCAGTGCCAGAAGC CTCGGCTTCCCCCAAACAGCGGCGCTCCATTATCCGTGACCGGGGACCTATGTATGAT GACCCCACCTTGCCTGAAGGTTGGACACGAAAGCTTAAACAAAGGAAGTCTGGCCGAT CTGCTGGAAAGTATGATGTATATTTGATCAATCCCCAGGGAAAAGCTTTTCGCTCTAAA GTAGAATTGATTGCATACTTTGAAAAGGTGGGAGACACCTCCTTGGACCCTAATGATTT TGACTTCACGGTAACTGGGAGAGGGAGCCCCTCCAGGAGAGAGCAGAAACCACCTAA GAAGCCCAAATCTCCCAAAGCTCCAGGAACTGGCAGGGGTCGGGGACGCCCCAAAGG GAGCGGCACTGGGAGACCAAAGGCAGCAGCATCAGAAGGTGTTCAGGTGAAAAGGGT CCTGGAGAAGAGCCCTGGGAAACTTGTTGTCAAGATGCCTTTCCAAGCATCGCCTGGG GGTAAGGGTGAGGGAGGTGGGGCTACCACATCTGCCCAGGTCATGGTGATCAAACGC CCTGGCAGAAAGCGAAAAGCTGAAGCTGACCCCCAGGCCATTCCTAAGAAACGGGGTA GAAAGCCTGGGAGTGTGGTGGCAGCTGCTGCAGCTGAGGCCAAAAAGAAAGCCGTGA AGGAGTCTTCCATACGGTCTGTGCATGAGACTGTGCTCCCCATCAAGAAGCGCAAGAC CCGGGAGACGGTCAGCATCGAGGTCAAGGAAGTGGTGAAGCCCCTGCTGGTGTCCAC CCTTGGTGAGAAAAGCGGGAAGGGACTGAAGACCTGCAAGAGCCCTGGGCGTAAAAG CAAGGAGAGCAGCCCCAAGGGGCGCAGCAGCAGTGCCTCCTCCCCACCTAAGAAGGA GCACCATCATCACCACCATCACTCAGAGTCCACAAAGGCCCCCATGCCACTGCTCCCA TCCCCACCCCCACCTGAGCCTGAGAGCTCTGAGGACCCCATCAGCCCCCCTGAGCCT CAGGACTTGAGCAGCAGCATCTGCAAAGAAGAGAAGATGCCCCGAGGAGGCTCACTG GAAAGCGATGGCTGCCCCAAGGAGCCAGCTAAGACTCAGCCTATGGTCGCCACCACTA CCACAGTTGCAGAAAAGTACAAACACCGAGGGGAGGGAGAGCGCAAAGACATTGTTTC ATCTTCCATGCCAAGGCCAAACAGAGAGGAGCCTGTGGACAGCCGGACGCCCGTGAC CGAGAGAGTTAGC

EXPERIMENTAL RESULTS

[0204] 1. Materials and Methods

[0205] Nomenclature

[0206] According to convention, all amino acid numbers given in the following refer to the e2 isoform. Numbers refer to homologous amino acids in human (NCBI accession P51608) and mouse (NCBI accession Q9Z2D6) until residue 385 where there is a two amino acid insertion in the human protein.
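The numbering convention above can be illustrated with a short conversion helper. This is a sketch only: the function name `human_to_mouse` is hypothetical, and the placement of the two additional human residues at positions 385 and 386 is an assumption made for illustration, since the text states only that the insertion occurs at residue 385 and the surrounding proline-rich stretch can be aligned in more than one way.

```python
from typing import Optional

# Assumed e2 positions of the two residues present in human but not in mouse.
HUMAN_INSERTION = (385, 386)


def human_to_mouse(human_pos: int) -> Optional[int]:
    """Convert a human e2 residue number to the homologous mouse e2 number.

    Returns None for positions inside the assumed human-specific insertion.
    """
    if human_pos < 1:
        raise ValueError("residue numbers start at 1")
    first, last = HUMAN_INSERTION
    if human_pos < first:
        return human_pos              # numbering is shared before the insertion
    if human_pos <= last:
        return None                   # no mouse counterpart
    return human_pos - (last - first + 1)


if __name__ == "__main__":
    for pos in (111, 306, 385, 392):
        print(pos, human_to_mouse(pos))
```

Under this assumption, residues within the MBD (72-173) and NID (272-312) carry the same number in both species, which is why the domain boundaries can be quoted without specifying the organism.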

[0207] Mutation Analysis

[0208] Mutational data was collected as described previously.sup.4: RTT-causing missense mutations were extracted from the RettBASE dataset.sup.13; and polymorphisms identified in healthy hemizygous males were extracted from the Exome Aggregation Consortium (ExAC) database.sup.14.

[0209] Design of Shortened MeCP2 Proteins

[0210] The MBD and NID were defined as residues 72-173 and 272-312, respectively. All three constructs retain the extreme N-terminal sequences encoded by exons 1 and 2, which are present in isoforms e1 and e2, respectively. They also include the first three amino acids of exon 3 (EEK) to preserve the splice acceptor site. The intervening region (I) was replaced in .DELTA.NIC by the NLS of SV40 preceded by a flexible linker. The sequence of the NLS is PKKKRKV (SEQ ID NO: 32) (DNA sequence: CCCAAGAAAAAGCGGAAGGTG (SEQ ID NO: 33)) and of the linker is GSSGSSG (SEQ ID NO: 34) (DNA sequence: GGATCCAGTGGCAGCTCTGGG (SEQ ID NO: 35)). All three proteins were C-terminally tagged with EGFP connected by a linker. To be consistent with a previous study tagging full-length MeCP2.sup.16, the linker sequence CKDPPVAT (SEQ ID NO: 36) (DNA sequence: TGTAAGGATCCACCGGTCGCCACC (SEQ ID NO: 37)) was used to connect the C-terminus of .DELTA.N to EGFP. To connect the NID to the EGFP tag in .DELTA.NC and .DELTA.NIC, the flexible GSSGSSG (SEQ ID NO: 38) linker was used instead (DNA sequence: GGGAGCTCCGGCAGTTCTGGA (SEQ ID NO: 39)). The amino acid sequences of the e1 and e2 isoforms for WT-EGFP, .DELTA.N-EGFP, .DELTA.NC-EGFP and .DELTA.NIC-EGFP polypeptides are provided herein as SEQ ID NOs 40-47, respectively. The cDNA sequences of the e1 and e2 isoforms for the WT-EGFP, .DELTA.N-EGFP, .DELTA.NC-EGFP and .DELTA.NIC-EGFP polypeptides are provided herein as SEQ ID NOs 48-55, respectively.
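The domain arithmetic described above can be checked with a short script. This is an illustrative sketch rather than the procedure used to design the constructs: the names `residue` and `build_delta_nic_core` are hypothetical, the domain strings are transcribed from SEQ ID NO: 1 and SEQ ID NO: 2, and the tryptophan at residue 104 is written as `W` on the assumption that it follows the TGG codon in SEQ ID NO: 18. The assembled string corresponds to the untagged .DELTA.NIC core of SEQ ID NO: 14 and omits the isoform-specific N-terminus, the EEK tripeptide and the C-terminal tag.

```python
# Human MeCP2 domains, e2 numbering (MBD 72-173, NID 272-312).
MBD_START = 72
NID_START = 272
MBD = (
    "PAVPEASASPKQRRSIIRDRGPMYDDPTLPEGWTRKLKQRKSGRSAGKYDVYLINPQGKAF"
    "RSKVELIAYFEKVGDTSLDPNDFDFTVTGRGSPSRREQKPP"
)
NID = "PGSVVAAAAAEAKKKAVKESSIRSVQETVLPIKKRKTRETV"
LINKER = "GSSGSSG"    # flexible linker, SEQ ID NO: 34
SV40_NLS = "PKKKRKV"  # SV40 nuclear localisation signal, SEQ ID NO: 32


def residue(domain: str, domain_start: int, e2_position: int) -> str:
    """Return the amino acid at an e2 residue number within a domain."""
    index = e2_position - domain_start
    if not 0 <= index < len(domain):
        raise IndexError("position lies outside the domain")
    return domain[index]


def build_delta_nic_core() -> str:
    """Join MBD, linker, NLS and NID in the order described for deltaNIC."""
    return MBD + LINKER + SV40_NLS + NID


if __name__ == "__main__":
    core = build_delta_nic_core()
    print(len(MBD), len(NID), len(core))
    print(residue(MBD, MBD_START, 111), residue(NID, NID_START, 306))
```

The length checks confirm that the transcribed domains span 102 and 41 residues, matching the stated boundaries, and the lookups confirm that positions 111 and 306 are arginines in the retained domains.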

[0211] For expression in cultured cells, cDNA sequences encoding e2 isoforms of the MeCP2 deletion series were synthesised (GeneArt, Thermo Fisher Scientific) and cloned into the pEGFPN1 vector (Clontech) using XhoI and NotI restriction sites (NEB). Point mutations (R111G and R306C) were inserted into the WT-EGFP plasmid using the QuikChange II XL Site-Directed Mutagenesis Kit (Agilent Technologies). Primer sequences for R111G: Forward TGGACACGAAAGCTTAAACAAGGGAAGTCTGGCC (SEQ ID NO: 56) and Reverse GGCCAGACTTCCCTTGTTTAAGCTTTCGTGTCCA (SEQ ID NO: 57); and R306C: Forward CTCCCGGGTCTTGCACTTCTTGATGGGGA (SEQ ID NO: 58) and Reverse TCCCCATCAAGAAGTGCAAGACCCGGGAG (SEQ ID NO: 59). For ES cell targeting, genomic sequences encoding exons 3 and 4 of the EGFP-tagged shortened proteins were synthesised (GeneArt, Thermo Fisher Scientific) and cloned into a previously used.sup.24 targeting vector using MfeI restriction sites (NEB). This vector contains a Neomycin resistance gene followed by a transcriptional `STOP` cassette flanked by LoxP sites (`floxed`) in intron 2.
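Site-directed mutagenesis kits of this type generally use a pair of fully complementary primers. The following sketch, with hypothetical function and variable names, checks that each reverse primer listed above is the exact reverse complement of its forward primer; it is offered only as a sanity check on the quoted sequences.

```python
# Watson-Crick complement map for unambiguous DNA bases.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (5'->3')."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


# Primer pairs copied from the paragraph above (SEQ ID NOs 56-59).
# The second pair targets the NID mutation at residue 306.
PRIMERS = {
    "R111G": (
        "TGGACACGAAAGCTTAAACAAGGGAAGTCTGGCC",
        "GGCCAGACTTCCCTTGTTTAAGCTTTCGTGTCCA",
    ),
    "residue 306 mutant": (
        "CTCCCGGGTCTTGCACTTCTTGATGGGGA",
        "TCCCCATCAAGAAGTGCAAGACCCGGGAG",
    ),
}

for name, (forward, reverse) in PRIMERS.items():
    # Prints the mutation name and whether the pair is fully complementary.
    print(name, reverse_complement(forward) == reverse)
```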

[0212] For viral delivery of shortened MeCP2 proteins, Myc epitope tagged proteins were prepared. The amino acid sequences of human .DELTA.NC-Myc and .DELTA.NIC-Myc polypeptides are provided herein as SEQ ID NOs 60-61, respectively. The cDNA sequences of human .DELTA.NC-Myc and .DELTA.NIC-Myc polypeptides are provided herein as SEQ ID NOs 62-63, respectively.

[0213] Cell Culture

[0214] HeLa and NIH-3T3 cells were grown in DMEM (Gibco) supplemented with 10% foetal bovine serum (FBS; Gibco) and 1% Penicillin-Streptomycin (Gibco). ES cells were grown in Glasgow MEM (Gibco) supplemented with foetal bovine serum (FBS; Gibco-batch tested), 1% Non-essential amino acids (Gibco), 1% Sodium Pyruvate (Gibco), 0.1% .beta.-mercaptoethanol (Gibco) and LIF (ESGRO).

[0215] Immunoprecipitation

[0216] HeLa cells were transfected with pEGFPN1-MeCP2 plasmids using JetPEI (PolyPlus Transfection) and harvested after 24-48 hours. Nuclear extracts were prepared using Benzonase (Sigma E1014-25KU) and 150 mM NaCl, and MeCP2-EGFP complexes were captured using GFP-Trap_A beads (Chromotek) as described previously.sup.4. Proteins were analysed by western blotting using antibodies to GFP (NEB #2956), NCoR (Bethyl A301-146A), HDAC3 (Sigma 3E11) and TBL1XR1 (Bethyl A300-408A), all at a dilution of 1:1000; followed by LI-COR secondary antibodies: IRDye.RTM. 800CW Donkey anti-Mouse (926-32212) and IRDye.RTM. 800CW Donkey anti-Rabbit (926-32213) or IRDye.RTM. 680LT Donkey anti-Rabbit (926-68023) at a dilution of 1:10,000.

[0217] Recruitment Assay

[0218] NIH-3T3 cells were seeded on coverslips in 6 well plates (25,000 cells per well) and transfected with 2 .mu.g plasmid DNA (pEGFPN1-MeCP2 and pmCherry-TBL1X) using JetPEI (PolyPlus Transfection). After 48 hours, cells were fixed with 4% (w/v) paraformaldehyde, stained with DAPI (Sigma) and then mounted using ProLong Diamond (Life Technologies). Fixed cells were photographed using confocal microscopy (Leica SP5).

[0219] Generation of Knock-In Mice

[0220] Targeting vectors were introduced into 129/Ola E14 TG2a ES cells by electroporation, and G418-resistant clones with correct targeting at the Mecp2 locus were identified by PCR and Southern blot screening. CRISPR-Cas9 technology was used to increase the targeting efficiency of .DELTA.N and .DELTA.NIC lines: the guide RNA sequence (GGTTGTGACCCGCCATGGAT) (SEQ ID NO: 64) was cloned into pX330-U6-Chimeric_BB-CBh-hSpCas9 (a gift from Feng Zhang; Addgene plasmid #42230.sup.25), which was introduced into the ES cells with the targeting vectors. This introduced a double-strand cut in intron 2 of the wild-type gene (at the site of the NeoSTOP cassette in the targeting vector). Mice were generated from ES cells as previously described.sup.26. The `floxed` NeoSTOP cassette was removed in vivo by crossing chimaeras with homozygous females from the transgenic CMV-Cre deleter strain (JAX Stock #006054) on a C57BL/6J background. The CMV-Cre transgene was subsequently bred out. All mice used in this study were bred and maintained at the University of Edinburgh animal facilities under standard conditions, and procedures were carried out by staff licensed by the UK Home Office in accordance with the Animals (Scientific Procedures) Act 1986.

[0221] Biochemical Characterisation of Knock-In Mice

[0222] For biochemical analysis, brains were harvested by snap-freezing in liquid nitrogen at 6-13 weeks of age, unless otherwise stated. Brains of hemizygous male mice were used for all analysis, unless otherwise stated. For Southern blot analysis, half brains were homogenised in 50 mM Tris Cl pH 7.5, 100 mM NaCl, 5 mM EDTA and treated with 0.4 mg/ml Proteinase K in 1% SDS at 55.degree. C. overnight. Samples were treated with 0.1 mg/ml RNAseA for 1-2 hours at 37.degree. C., before phenol:chloroform extraction of genomic DNA. Genomic DNA was purified from ES cells using Puregene Core Kit A (Qiagen) according to manufacturer's instructions for cultured cells. Genomic DNA was digested with restriction enzymes (NEB), separated by agarose gel electrophoresis and transferred onto ZetaProbe membranes (BioRad). DNA probes homologous to either exon 4 or the end of the 3' homology arm were radioactively labelled with [.alpha.-.sup.32P]dCTP (Perkin Elmer) using the Prime-a-Gene Labeling System (Promega). Blots were probed overnight, washed, and exposed in Phosphorimager cassettes before scanning on a Typhoon FLA 7000. Bands were quantified using ImageQuant software.

[0223] Protein levels in whole brain crude extracts were quantified using western blotting. Extracts were prepared as described previously.sup.16, and blots were probed with antibodies to GFP (NEB #2956) or MeCP2 (Sigma M6818), both at a dilution of 1:1,000, followed by LI-COR secondary antibodies (listed above). Histone H3 (Abcam ab1791) was used as a loading control (dilution 1:10,000). Levels were quantified using Image Studio Lite Ver 4.0 software and compared using t-tests. WT-EGFP mice.sup.16 were used as controls.

[0224] For flow cytometry analysis, fresh brains were harvested from 12 week-old animals and Dounce-homogenised in 5 ml homogenisation buffer (320 mM sucrose, 5 mM CaCl2, 3 mM Mg(Ac)2, 10 mM Tris HCl pH 7.8, 0.1 mM EDTA, 0.1% NP40, 0.1 mM PMSF, 14.3 mM .beta.-mercaptoethanol, protease inhibitors (Roche)), and 5 ml of 50% OptiPrep gradient centrifugation medium (50% Optiprep (Sigma D1556-250ML), 5 mM CaCl2, 3 mM Mg(Ac)2, 10 mM Tris HCl pH 7.8, 0.1M PMSF, 14.3 mM .beta.-mercaptoethanol) was added. This was layered on top of 10 ml of 29% OptiPrep solution (v/v in H2O) in Ultra clear Beckman Coulter centrifuge tubes, and samples were centrifuged at 7,500 rpm for 30 mins, 4.degree. C. Pelleted nuclei were resuspended in Resuspension buffer (20% glycerol in DPBS with protease inhibitors (Roche)). For flow cytometry analysis, nuclei were pelleted at 600.times.g (5 mins, 4.degree. C.), washed in 1 ml PBTB (5% (w/v) BSA, 0.1% Triton X-100 in DPBS with protease inhibitors (Roche)), and then resuspended in 250 .mu.l PBTB. To stain for NeuN, 10 .mu.l of NeuN-A60 antibody (Millipore MAB377) was conjugated to Alexa Fluor 647 (APEX Antibody Labelling Kit, Invitrogen A10475), added at a dilution of 1:125 and incubated under rotation for 45 mins at 4.degree. C. Flow cytometry (BD LSRFortessa SORP) was used to obtain the mean EGFP expression for the total nuclei (n=50,000 per sample) and the high NeuN (neuronal) subpopulation (n>8,000 per sample), and genotypes were compared using t-tests. WT-EGFP mice.sup.16 were used as controls.

[0225] To determine mRNA levels, RNA was purified and reverse transcribed from half brains; and Mecp2 and Cyclophilin A transcripts were analysed by qPCR as previously described.sup.16. mRNA levels in .DELTA.NIC mice were compared to wild-type littermates using a t-test.

[0226] Phenotypic Characterisation of Knock-In Mice

[0227] Consistent with a previous study.sup.16, mice were backcrossed four generations to reach .about.94% C57BL/6J before undergoing phenotypic characterisation. Two separate cohorts, each consisting of 10 mutant animals and 10 wild-type littermates, were produced for each novel knock-in line. One cohort was scored and weighed regularly from 4-52 weeks of age as previously described.sup.24,27. Survival was graphed using Kaplan Meier plots. (A preliminary outbred [75% C57BL/6J] cohort of 7 .DELTA.NC mice and 9 wild-type littermates was also scored.) The second backcrossed cohort underwent behavioural analysis at 20-21 weeks of age (see .sup.27 and .sup.16 for detailed protocols). Tests were performed over a two-week period: Elevated Plus Maze on day 1, Open Field test on day 2, and Accelerating Rotarod test on days 6-9 (one day of training followed by three days of trials). All analysis was performed blind to genotype.
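The strain percentages quoted above follow from the usual expectation that each cross to the recurrent strain halves the remaining donor genome on average. A minimal sketch, assuming the founder carries none of the recurrent (C57BL/6J) background and that the first cross to C57BL/6J is counted as generation 1; the function name is hypothetical.

```python
def recurrent_fraction(generations):
    """Expected fraction of recurrent-strain genome after n successive crosses.

    Assumes the founder has 0% recurrent background; each cross to the
    recurrent strain halves the remaining donor contribution on average.
    """
    if generations < 0:
        raise ValueError("generations must be non-negative")
    return 1.0 - 0.5 ** generations


for n in range(1, 5):
    # Prints the generation number and the expected percentage.
    print(n, 100 * recurrent_fraction(n))
```

Under these assumptions, two crosses give the 75% figure quoted for the preliminary outbred cohort and four crosses give 93.75%, which rounds to the stated value of about 94%.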

[0228] Statistical Analysis

[0229] Growth curves were compared using repeated measures ANOVA. For behavioural analysis, when all data fitted a normal distribution (Open Field centre time and distance travelled), genotypes were compared using t-tests. If not (Elevated Plus Maze time in arms and Accelerating Rotarod latency to fall), genotypes were compared using Kolmogorov-Smirnov tests. Change in performance over time in the Accelerating Rotarod test was determined using Friedman tests.
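For orientation, the two-sample Kolmogorov-Smirnov statistic used for the non-normal measures is the largest vertical distance between the empirical cumulative distribution functions of the two groups. The sketch below computes only that statistic, not the p-value, and is not the software used in the study. The function name is hypothetical and the example values are invented purely for illustration.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample KS statistic: max |ECDF_a(x) - ECDF_b(x)| over all x."""
    a, b = sorted(sample_a), sorted(sample_b)
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    largest_gap = 0.0
    # The ECDFs only change at observed values, so checking those suffices.
    for x in sorted(set(a) | set(b)):
        ecdf_a = bisect_right(a, x) / len(a)
        ecdf_b = bisect_right(b, x) / len(b)
        largest_gap = max(largest_gap, abs(ecdf_a - ecdf_b))
    return largest_gap


# Invented latency-to-fall values (seconds), for illustration only.
wild_type = [180, 200, 210, 230, 250]
mutant = [90, 120, 150, 190, 205]
print(ks_statistic(wild_type, mutant))
```

A statistic of 0 means the two empirical distributions coincide, and a statistic of 1 means the samples do not overlap at all.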

[0230] Genetic Reactivation of Minimal MeCP2 (.DELTA.NIC)

[0231] Transcriptionally silent minimal MeCP2 (.DELTA.NIC) was reactivated in symptomatic null-like `STOP` mice following the procedure used in .sup.27. In short, the .DELTA.NIC Mecp2 allele was inactivated by the retention of the NeoSTOP cassette in intron 2 by mating chimaeras with wild-type females instead of deleter mice. Resulting STOP/+ females were crossed with heterozygous Cre-ER transgenic males (JAX Stock #004682) to produce males of four genotypes (87.5% C57BL/6J). A cohort consisting of all four genotypes, WT (n=4), WT CreER (n=4), STOP (n=9) and STOP CreER (n=9), was scored and weighed weekly from 4 weeks of age. From 6 weeks (when STOP and STOP CreER mice displayed RTT-like symptoms), all individuals were given a series of Tamoxifen injections: two weekly followed by five daily, each at a dose of 100 .mu.g/g body weight. Brain tissue from Tamoxifen-treated STOP CreER (n=8), WT (n=1) and WT CreER (n=1) animals was harvested at 28 weeks of age (after successful symptom reversal in STOP CreER mice) for biochemical analysis. Brain tissue from one Tamoxifen-treated STOP mouse was also included in the biochemical analysis (methods described above).

[0232] Vector Delivery of Minimal MeCP2 (.DELTA.NIC)

[0233] Minimal MeCP2 (.DELTA.NIC) AAV vector was tested in Mecp2-null and WT mice maintained on a C57BL/6 background. Recombinant AAV vector particles were generated at the UNC Gene Therapy Center Vector Core facility. Self-complementary AAV (scAAV) particles (AAV2 ITR-flanked genomes packaged into AAV9 capsids) were produced from suspension HEK293 cells transfected using polyethyleneimine (Polysciences, Warrington, Pa.) with helper plasmids (pXX6-80, pGSK2/9) and a plasmid containing the ITR-flanked .DELTA.NIC transgene construct. The construct used is illustrated in FIG. 17B, and the annotated sequence (SEQ ID NO: 65) of the ITR-flanked .DELTA.NIC transgene construct is shown in FIG. 17C. For translational relevance, the .DELTA.NIC-expressing construct utilized the equivalent human MECP2 e1 coding sequence, with a small C-terminal Myc epitope tag replacing the EGFP tag used in other experiments. The transgene was under the control of an extended endogenous Mecp2 promoter fragment (MeP426) incorporating additional promoter regulatory elements and a putative silencer element (FIGS. 17B,C). The construct also incorporated a novel 3'-UTR consisting of a fragment of the endogenous MECP2 3'UTR together with a selected panel of binding sites for miRNAs known to be involved in regulation of Mecp2.sup.39-41 (FIGS. 17B,C). Virus production was performed as previously described.sup.28, and vector was prepared in a final formulation of high-salt PBS (containing 350 mM total NaCl) supplemented with 5% sorbitol. For brain injection into mice, direct bilateral injections of virus (3 .mu.l per site; dose=1.times.10.sup.11 viral genomes per mouse) were delivered into the neuropil of unanaesthetised P1/2 males, as described previously.sup.29. Control injections were made using the same diluent lacking vector (`vehicle control`). The injected pups were returned to the home cage and assessed weekly as described above.
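The injection volume and total dose stated above imply a vector concentration, which can be back-calculated as a rough check. This sketch assumes that "bilateral" means exactly two injection sites per animal and that the stated dose is the total across both sites; neither assumption is confirmed by the text, and the function and variable names are hypothetical.

```python
# Values taken from the paragraph above.
VOLUME_PER_SITE_UL = 3.0   # microlitres injected per site
DOSE_PER_MOUSE_VG = 1e11   # viral genomes per mouse

# Assumption (not stated in the text): bilateral = two sites per mouse.
ASSUMED_SITES = 2


def implied_titre_vg_per_ul(dose_vg, volume_per_site_ul, sites):
    """Vector concentration implied by a total dose split evenly over all sites."""
    return dose_vg / (volume_per_site_ul * sites)


titre = implied_titre_vg_per_ul(DOSE_PER_MOUSE_VG, VOLUME_PER_SITE_UL, ASSUMED_SITES)
per_site = DOSE_PER_MOUSE_VG / ASSUMED_SITES

print("viral genomes per site:", per_site)
print("implied titre (vg per microlitre):", titre)
print("implied titre (vg per ml):", titre * 1000)
```

Under these assumptions the implied stock concentration is on the order of 10^13 viral genomes per ml.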

[0234] 2. Results

[0235] The amino acid sequence of MeCP2 is highly conserved throughout vertebrate species (FIG. 1A), suggesting that most of the protein is subject to purifying selection. This supports the widely held view that its interactions with multiple binding partners are functionally important; through these interactions, MeCP2 has been implicated in several cellular pathways required for proper neuronal function.sup.11,3. An alternative picture emerges from the distribution of RTT-causing missense mutations, which highlights only the MBD and NID--a small minority of the protein--as critical (FIG. 1A). Furthermore, exome sequencing data collected from healthy individuals show a large number of polymorphisms in the other regions of the protein (FIG. 1A), suggesting these sequences are dispensable. To test whether the MBD and NID might be sufficient for MeCP2 function, we designed a stepwise series of deletions of the endogenous gene to remove regions N-terminal to the MBD (.DELTA.N), C-terminal to the NID (.DELTA.C) and the intervening amino acids between these domains (.DELTA.I) (FIG. 1B). The intervening region was replaced by a nuclear localisation signal (NLS) sequence derived from SV40 virus, connected by short linkers. The Mecp2 gene has four exons, with transcripts alternatively spliced to produce two isoforms that differ only at the extreme N-termini.sup.30. To maintain the Mecp2 gene structure in the knock-in mice, the constructs retained exons 1 and 2 as well as the first 10 bp of exon 3 (splice acceptor site), resulting in the inclusion of 29 and 12 N-terminal amino acids for isoforms e1 and e2, respectively (FIGS. 2A-B, 3, 4). A C-terminal EGFP tag was added to facilitate detection and recovery, as tagging does not affect MeCP2 function in mice.sup.16 (FIG. 1B). Taking into account mapped binding sites, structural information and evolutionary conservation, we defined the MBD as residues 72-173 and the NID as residues 272-312 (FIG. S1C-D). The proportion of native MeCP2 protein sequence retained in .DELTA.N, .DELTA.NC and .DELTA.NIC is 88%, 52% and 32% of wild-type, respectively.
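The retained proportions can be reproduced from the residue ranges given in this paragraph. The sketch below assumes a mouse e2 protein length of 484 residues (a value not stated in the text), counts the 12 retained N-terminal residues as native sequence, and ignores linkers, the NLS and the EGFP tag; all names are hypothetical.

```python
# Assumption (not stated in the text): length of the mouse e2 isoform.
MOUSE_E2_LENGTH = 484

N_TERMINAL_RETAINED = 12   # e2 N-terminal residues kept in all constructs
MBD = (72, 173)            # inclusive residue range, e2 numbering
NID = (272, 312)           # inclusive residue range, e2 numbering


def span(start, end):
    """Number of residues in an inclusive range."""
    return end - start + 1


RETAINED = {
    # N-terminal stub plus everything from the MBD to the C-terminus.
    "deltaN": N_TERMINAL_RETAINED + span(MBD[0], MOUSE_E2_LENGTH),
    # N-terminal stub plus MBD, intervening region and NID.
    "deltaNC": N_TERMINAL_RETAINED + span(MBD[0], NID[1]),
    # N-terminal stub plus MBD and NID only.
    "deltaNIC": N_TERMINAL_RETAINED + span(*MBD) + span(*NID),
}


def percent_retained(name):
    """Percentage of native sequence retained, rounded to a whole number."""
    return round(100 * RETAINED[name] / MOUSE_E2_LENGTH)


for name in RETAINED:
    # Prints the construct name, retained residue count and rounded percentage.
    print(name, RETAINED[name], percent_retained(name))
```

With these assumptions the rounded values agree with the proportions stated in the text.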

[0236] We first tested whether the shortened MeCP2 proteins retained the ability to interact with methylated DNA and the NCoR/SMRT co-repressor complex using cell culture-based assays. All three protein derivatives immunoprecipitated endogenous NCoR/SMRT complex components when overexpressed in HeLa cells, whereas this interaction was abolished in the negative control NID mutant, R306C (FIG. 1C). To assay mCpG binding, we asked whether expressed proteins localised to mCpG-rich pericentric heterochromatic foci in mouse fibroblasts. Previous work established that localisation of wild-type MeCP2 to these foci is dependent on both DNA methylation.sup.31,32 and MBD functionality.sup.33. All three shortened versions of MeCP2 localised to heterochromatic foci, whereas a negative control MBD mutant (R111G) showed a diffuse nuclear distribution (FIG. 1D). To determine whether the shortened proteins could bind chromatin and the NCoR/SMRT complex simultaneously, we asked if they were able to recruit TBL1X, an NCoR/SMRT subunit that binds directly to MeCP2.sup.4, to heterochromatin. Over-expressed TBL1X-mCherry lacks an NLS and is therefore cytoplasmic, but in the presence of over-expressed MeCP2 it is efficiently recruited to heterochromatic foci.sup.4. All shortened MeCP2 proteins likewise recruited TBL1X to the heterochromatic foci, demonstrating their ability to bridge DNA with the co-repressor (FIG. 1E). The MeCP2 NID mutant control (R306C) itself localised correctly, but as described previously.sup.4 was unable to relocate TBL1X from the cytoplasm (FIG. 1E). These three assays confirm that all shortened proteins retain the ability to bind methylated DNA and the NCoR/SMRT complex and form a bridge between them.

[0237] We initially generated .DELTA.N and .DELTA.NC knock-in mice by replacing the endogenous Mecp2 allele in ES cells followed by blastocyst injection and germ line transmission (FIG. 3). These truncated proteins were expressed at approximately wild-type levels in whole brain and in neurons as determined by western blot and flow cytometry analyses (FIG. 5A-B). To assess the phenotype of these truncations, knock-in mice were crossed onto a C57BL/6J background and cohorts underwent weekly phenotypic scoring.sup.24,27 or behavioural analysis. Both .DELTA.N and .DELTA.NC hemizygous male mice were viable, fertile and showed phenotypic scores indistinguishable from their wild-type littermates over the course of a year (FIG. 6A-D). .DELTA.N mice had no body weight phenotype (FIG. 7A), whereas .DELTA.NC mice displayed a slight increase in weight compared to wild-type littermates (FIG. 7B, repeated measures ANOVA p<0.0001). The weight difference was absent in a more outbred (75% C57BL/6J) cohort of .DELTA.NC mice (FIG. 7C), consistent with previous observations that body weight phenotypes in RTT models are affected by genetic background.sup.26.

[0238] At 20 weeks of age, separate cohorts were tested for behaviours commonly reported in RTT models: hypoactivity, decreased anxiety and reduced motor abilities. No activity phenotype (analysed by total distance travelled in the Open Field test) was detected for either the .DELTA.N or .DELTA.NC mice (FIG. 8). No anxiety phenotype (analysed by increased time spent in the open arms of the Elevated Plus Maze) was detected for either novel mouse line (FIG. 6E). The .DELTA.NC mice did, however, spend significantly more time than their wild-type littermates in the central square of the Open Field arena (FIG. 6F), indicative of mildly decreased anxiety. Motor coordination was assessed using the Accelerating Rotarod test over three days. Whereas mouse models of RTT show impaired performance in this test that is most striking on the third day.sup.34,16, .DELTA.N and .DELTA.NC mice were not significantly different from their wild-type littermates on any of the three days (FIG. 6G). Overall, the results suggest that contributions of the N- and C-terminal domains to MeCP2 function are at best subtle. This result is particularly remarkable given RTT-like symptoms in male mice expressing a slightly more severe C-terminal truncation, which lacks residues beyond T308.sup.35. The difference in phenotype may be explained by retention of full NID function in .DELTA.NC mice, as previous evidence indicates that loss of the further four C-terminal amino acids (309-312) reduces the affinity of this truncated MeCP2 molecule for the NCoR/SMRT co-repressor complex.sup.4.

[0239] We next replaced the endogenous Mecp2 gene with .DELTA.NIC, the minimal allele, containing only the MBD and NID domains and comprising 32% of the full-length protein sequence (FIGS. 1B, 4). Protein levels in whole brain were quantified by western blotting and flow cytometry, both of which showed reduced abundance (.about.50% of WT-EGFP controls; FIG. 9A-B). A similar reduction in protein abundance was also seen in the neuronal subpopulation (.about.40% of WT-EGFP controls; FIG. 9B). Low protein levels were not due to transcriptional silencing, as mRNA was in fact more abundant in .DELTA.NIC mice than in wild-type littermates (FIG. 9C), suggesting that deletion of the intermediate region compromises protein stability. Despite low protein levels, male .DELTA.NIC mice had a normal lifespan (FIGS. 9E, 10). Phenotypic scoring over one year detected mild neurological phenotypes (FIG. 9D), predominantly gait abnormalities and partial hind-limb clasping. These symptoms persisted throughout the scoring period, but did not become more severe. .DELTA.NIC mice also weighed .about.40% less than their wild-type siblings (FIG. 11A; repeated measures ANOVA p<0.0001). As seen in this study, both increases and decreases in body weight have been previously reported in MeCP2-mutant mouse models.sup.26,36,23,16. Behavioural analysis of a separate cohort at 20 weeks showed decreased anxiety in male .DELTA.NIC mice, as evidenced by the significantly reduced time spent in the closed arms of the Elevated Plus Maze (FIG. 9F, KS test p=0.003). This result was not supported by the Open Field test (FIG. 9G), which also detected no activity phenotype (FIG. 12). Consistent with the gait defects detected in the scoring cohort, .DELTA.NIC mice had reduced motor coordination, shown by declining performance over three daily trials on the Accelerating Rotarod (FIG. 9H, Friedman test p=0.003). This resulted in significantly impaired performance on the third day of testing compared to wild-type littermates (KS test p=0.003). Overall, it is noteworthy that .DELTA.NIC animals are much less severely affected than male mice with the mildest common mutation found in RTT patients, R133C, which had a median lifespan of 42 weeks, higher symptomatic scores and a stronger reduced weight phenotype.sup.16 (FIG. 13). This result strongly supports our hypothesis that recruitment of the NCoR/SMRT co-repressor complex to chromatin is the primary function of MeCP2, with the mild phenotype observed being a likely consequence of reduced protein levels, as previously described for hypomorphic mice that express full-length MeCP2 at 50% of wild-type levels.sup.37.

[0240] To further test the functionality of minimal MeCP2, we asked whether late provision of .DELTA.NIC via genetic reactivation could reverse phenotypic defects in symptomatic MeCP2-deficient mice, as has previously been shown for the full-length protein.sup.24. We generated null-like MeCP2-deficient mice by preventing .DELTA.NIC expression with a floxed transcriptional STOP cassette in intron 2 (FIGS. 4, 14). These mice were crossed with mice carrying a CreER transgene (Cre recombinase fused to a modified estrogen receptor) to enable reactivation upon Tamoxifen treatment. This was induced after the onset of symptoms in STOP CreER mice (FIG. 15A), resulting in high levels of Cre recombination (FIG. 16A) and protein levels similar to .DELTA.NIC mice (FIG. 16B). .DELTA.NIC gene reactivation had a dramatic effect on phenotypic progression, ameliorating neurological symptoms and restoring normal survival (FIG. 15B-C). In contrast, STOP mice lacking the CreER transgene failed to survive beyond 26 weeks. Thus, despite its radically reduced length and relatively low abundance, .DELTA.NIC was able to effectively reverse the RTT-like phenotype in MeCP2-deficient mice.

[0241] This finding prompted us to explore whether .DELTA.NIC could be used for gene therapy, which we tested in Mecp2-null mice. The .DELTA.NIC gene, driven by a minimal Mecp2 promoter, was tagged with a Myc epitope (in place of much larger EGFP) and packaged into an adeno-associated viral vector (AAV9). Neonatal mice (P1-2) were injected intra-cranially with this virus or the AAV vehicle alone (FIG. 15D). Mecp2-null animals receiving the .DELTA.NIC gene showed greatly reduced symptom severity and enhanced survival (FIG. 15E-F). Despite the lack of fine control over infection rate per brain cell, we did not observe deleterious effects due to over-expression, even in wild-type animals (FIG. 17). It is conceivable that toxicity is mitigated by the moderate instability and/or reduced activity of .DELTA.NIC protein. This experiment also shows that .DELTA.NIC protein is functional without the large EGFP tag. The use of minimal MeCP2 could provide a therapeutic advantage due to the restricted capacity of AAV vectors. Shortening the coding sequence creates room for additional regulatory sequences, enabling better control of expression levels.

[0242] 3. Discussion

[0243] Overall, our results argue against the view that MeCP2 functions as a multifunctional hub, and instead support a simpler model whereby its predominant function is to recruit the NCoR/SMRT co-repressor complex to methylated sites on chromatin. It is noteworthy that the minimal MeCP2 protein (.DELTA.NIC) is missing all or part of several domains that have been highlighted as potentially important, including the AT-hooks.sup.23, several activity-dependent phosphorylation sites.sup.17,18, an RNA binding motif.sup.6 and interaction sites for proteins implicated in micro-RNA processing.sup.9, splicing.sup.10 and chromatin remodelling.sup.8. Importantly, our discovery that these two domains are sufficient to restore neuronal function to MeCP2-deficient mice has allowed us to show the therapeutic potential of the minimal protein.

[0244] 4. Additional Experiment

[0245] The appearance of toxicity in the form of motor dysfunction, ataxia and apparent loss of proprioception when full length Mecp2/MECP2 is delivered to mice has been reported previously.sup.49,50. An independent report has recently shown an identical stereotyped ataxia and loss of proprioception in response to delivery of the AAV9 variant (AAVhu68) in larger mammalian species and has shown that the peripheral nerve dorsal root ganglia may be especially susceptible to AAV9 variant dosing.sup.51.

[0246] We have performed a direct comparison of full length MECP2 and the .DELTA.NIC MECP2 (Table 3). We observed a significant reduction of ataxia and proprioception dysfunction in the AAV9 .DELTA.NIC MECP2-treated animals compared to mice treated with full length MECP2. These data indicate that, under identical conditions, the .DELTA.NIC MECP2 minigene confers reduced susceptibility to known peripheral neurotoxicity compared to full length MECP2.

TABLE-US-00004 TABLE 3 Comparison of full length MECP2 and the .DELTA.NIC MECP2

AAV9-MeP229/MECP2 refers to an AAV2/9 vector having the strong 229 bp fragment of endogenous Mecp2 promoter and full length MECP2; AAV9-MeP426/MECP2 refers to an AAV2/9 vector having the 426 bp fragment of endogenous Mecp2 promoter and full length MECP2; AAV9-MeP426/.DELTA.NIC MECP2 refers to an AAV2/9 vector having the 426 bp fragment of endogenous Mecp2 promoter and the .DELTA.NIC MECP2 minigene insert.

Vector                          Toxicity
AAV9-MeP229/MECP2               Severe ataxia/loss of proprioception in 100% of treated mice
AAV9-MeP426/MECP2               Mild ataxia/loss of proprioception/clasping in 100% of treated mice
AAV9-MeP426/.DELTA.NIC MECP2    Mild ataxia/loss of proprioception/clasping in <25% (4 of 17) of treated mice

REFERENCES

[0247] 1. Kinde, B. et al. DNA methylation in the gene body influences MeCP2-mediated gene repression. Proc. Natl. Acad. Sci. U.S.A. 113, 15114-15119 (2016).

[0248] 2. Adams, V. H. et al. Intrinsic Disorder and Autonomous Domain Function in the Multifunctional Nuclear Protein, MeCP2. J. Biol. Chem. 282, 15057-64 (2007).

[0249] 3. Lyst, M. J. & Bird, A. Rett syndrome: a complex disorder with simple roots. Nat. Rev. Genet. 16, 261-274 (2015).

[0250] 4. Lyst, M. J. et al. Rett syndrome mutations abolish the interaction of MeCP2 with the NCoR/SMRT co-repressor. Nat. Neurosci. 16, 898-902 (2013).

[0251] 5. Chahrour, M. et al. MeCP2, a key contributor to neurological disease, activates and represses transcription. Science 320, 1224-9 (2008).

[0252] 6. Jeffery, L. & Nakielny, S. Components of the DNA methylation system of chromatin control are RNA-binding proteins. J. Biol. Chem. 279, 49479-49487 (2004).

[0253] 7. Nan, X. et al. Interaction between chromatin proteins MECP2 and ATRX is disrupted by mutations that cause inherited mental retardation. Proc. Natl. Acad. Sci. U.S.A. 104, 2709-14 (2007).

[0254] 8. Agarwal, N. et al. MeCP2 interacts with HP1 and modulates its heterochromatin association during myogenic differentiation. Nucleic Acids Res. 35, 5402-8 (2007).

[0255] 9. Cheng, T.-L. et al. MeCP2 suppresses nuclear microRNA processing and dendritic growth by regulating the DGCR8/Drosha complex. Dev. Cell 28, 547-60 (2014).

[0256] 10. Young, J. I. et al. Regulation of RNA splicing by the methylation-dependent transcriptional repressor methyl-CpG binding protein 2. Proc. Natl. Acad. Sci. U.S.A. 102, 17551-8 (2005).

[0257] 11. Ragione, F. Della, Vacca, M., Fioriniello, S., Pepe, G. & Esposito, M. D. MECP2, a multi-talented modulator of chromatin architecture. Brief. Funct. Genomics 15, 1-12 (2016).

[0258] 12. Nan, X., Meehan, R. R. & Bird, A. Dissection of the methyl-CpG binding domain from the chromosomal protein MeCP2. Nucleic Acids Res. 21, 4886-4892 (1993).

[0259] 13. RettBase: Rett Syndrome Variation Database. at <http://mecp2.chw.edu.au/>

[0260] 14. Exome Aggregation Consortium (ExAC), Cambridge, Mass. at http://exac.broadinstitute.org

[0261] 15. Dinca, A. et al. Intracellular delivery of proteins with cell-penetrating peptides for therapeutic uses in human disease. Int. J. Mol. Sci. 17(2), 263 (2016).

[0262] 16. Brown, K. et al. The molecular basis of variable phenotypic severity among common missense mutations causing Rett syndrome. Hum. Mol. Genet. 25, 558-570 (2016).

[0263] 17. Zhou, Z. et al. Brain-Specific Phosphorylation of MeCP2 Regulates Activity-Dependent Bdnf Transcription, Dendritic Growth, and Spine Maturation. Neuron 52, 255-269 (2006).

[0264] 18. Tao, J. et al. Phosphorylation of MeCP2 at Serine 80 regulates its chromatin association and neurological function. Proc. Natl. Acad. Sci. U.S.A. 106, 4882-7 (2009).

[0265] 19. Ebert, D. H. et al. Activity-dependent phosphorylation of MeCP2 threonine 308 regulates interaction with NCoR. Nature 499, 341-5 (2013).

[0266] 20. Ho, K. L. et al. MeCP2 binding to DNA depends upon hydration at methyl-CpG. Mol. Cell 29, 525-31 (2008).

[0267] 21. PHD Secondary structure prediction method. at <https://npsa-prabi.ibcp.fr/cgi-bin/npsa_automat.pl?page=/NPSA/npsa_phd.html>

[0268] 22. Lyst, M. J., Connelly, J., Merusi, C. & Bird, A. Sequence-specific DNA binding by AT-hook motifs in MeCP2. FEBS Lett. 590, 2927-2933 (2016).

[0269] 23. Baker, S. A. et al. An AT-hook domain in MeCP2 determines the clinical course of Rett syndrome and related disorders. Cell 152, 984-96 (2013).

[0270] 24. Guy, J., Gan, J., Selfridge, J., Cobb, S. & Bird, A. Reversal of neurological defects in a mouse model of Rett syndrome. Science 315, 1143-7 (2007).

[0271] 25. Cong, L. et al. Multiplex Genome Engineering Using CRISPR/Cas Systems. Science 339, 819-823 (2013).

[0272] 26. Guy, J., Hendrich, B., Holmes, M., Martin, J. E. & Bird, A. A mouse Mecp2-null mutation causes neurological symptoms that mimic Rett syndrome. Nat. Genet. 27, 322-6 (2001).

[0273] 27. Cheval, H. et al. Postnatal inactivation reveals enhanced requirement for MeCP2 at distinct age windows. Hum. Mol. Genet. 21, 3806-3814 (2012).

[0274] 28. Clement, N. & Grieger, J. C. Manufacturing of recombinant adeno-associated viral vectors for clinical trials. Mol. Ther. Methods Clin. Dev. 3, 16002 (2016).

[0275] 29. Gadalla, K. K. E. et al. Improved survival and reduced phenotypic severity following AAV9/MECP2 gene transfer to neonatal and juvenile male Mecp2 knockout mice. Mol. Ther. 21, 18-30 (2013).

[0276] 30. Kriaucionis, S. & Bird, A. The major form of MeCP2 has a novel N-terminus generated by alternative splicing. Nucleic Acids Res. 32, 1818-23 (2004).

[0277] 31. Lewis, J. D. et al. Purification, sequence, and cellular localization of a novel chromosomal protein that binds to methylated DNA. Cell 69, 905-14 (1992).

[0278] 32. Nan, X., Tate, P., Li, E. & Bird, A. DNA methylation specifies chromosomal localization of MeCP2. Mol. Cell. Biol. 16, 414-21 (1996).

[0279] 33. Kudo, S. et al. Heterogeneity in residual function of MeCP2 carrying missense mutations in the methyl CpG binding domain. J. Med. Genet. 40, 487-93 (2003).

[0280] 34. Goffin, D. et al. Rett syndrome mutation MeCP2 T158A disrupts DNA binding, protein stability and ERP responses. Nat. Neurosci. 15, 274-83 (2012).

[0281] 35. Shahbazian, M. et al. Mice with truncated MeCP2 recapitulate many Rett syndrome features and display hyperacetylation of histone H3. Neuron 35, 243-54 (2002).

[0282] 36. Chen, R. Z., Akbarian, S., Tudor, M. & Jaenisch, R. Deficiency of methyl-CpG binding protein-2 in CNS neurons results in a Rett-like phenotype in mice. Nat. Genet. 27, 327-331 (2001).

[0283] 37. Samaco, R. C. et al. A partial loss of function allele of Methyl-CpG-binding protein 2 predicts a human neurodevelopmental syndrome. Hum. Mol. Genet. 17, 1718-1727 (2008).

[0284] 38. Gray, S J, Foti, S B, Schwartz, J W, Bachaboina, L, Taylor-Blake, B, Coleman, J, et al. (2011). Optimizing promoters for recombinant adeno-associated virus-mediated gene expression in the peripheral and central nervous system using self-complementary vectors. Hum Gene Ther 22: 1143-1153.

[0285] 39. Feng, Y, Huang, W, Wani, M, Yu, X, and Ashraf, M (2014). Ischemic preconditioning potentiates the protective effect of stem cells through secretion of exosomes by targeting Mecp2 via miR-22. PLoS One 9: e88685.

[0286] 40. Jovicic, A, Roshan, R, Moisoi, N, Pradervand, S, Moser, R, Pillai, B, et al. (2013). Comprehensive expression analyses of neural cell-type-specific miRNAs identify new determinants of the specification and maintenance of neuronal phenotypes. J Neurosci 33: 5127-5137.

[0287] 41. Klein, M E, Lioy, D T, Ma, L, Impey, S, Mandel, G, and Goodman, R H (2007). Homeostatic regulation of MeCP2 expression by a CREB-induced microRNA. Nat Neurosci 10: 1513-1514.

[0288] 42. Liu, J, and Francke, U (2006). Identification of cis-regulatory elements for MECP2 expression. Human molecular genetics 15: 1769-1782.

[0289] 43. Adachi, M, Keefer, E W, and Jones, F S (2005). A segment of the Mecp2 promoter is sufficient to drive expression in neurons. Human molecular genetics 14: 3709-3722.

[0290] 44. Liyanage, V R, Zachariah, R M, and Rastegar, M (2013). Decitabine alters the expression of Mecp2 isoforms via dynamic DNA methylation at the Mecp2 regulatory elements in neural stem cells. Molecular autism 4: 46.

[0291] 45. Visvanathan, J, Lee, S, Lee, B, Lee, J W, and Lee, S K (2007). The microRNA miR-124 antagonizes the anti-neural REST/SCP1 pathway during embryonic CNS development. Genes Dev 21: 744-749.

[0292] 46. Coy, J F, Sedlacek, Z, Bachner, D, Delius, H, and Poustka, A (1999). A complex pattern of evolutionary conservation and alternative polyadenylation within the long 3'-untranslated region of the methyl-CpG-binding protein 2 gene (MeCP2) suggests a regulatory role in gene expression. Human molecular genetics 8: 1253-1262.

[0293] 47. Bagga, J S, and D'Antonio, L A (2013). Role of conserved cis-regulatory elements in the post-transcriptional regulation of the human MECP2 gene involved in autism. Human genomics 7: 19.

[0294] 48. Newnham, C M, Hall-Pogar, T, Liang, S, Wu, J, Tian, B, Hu, J, et al. (2010). Alternative polyadenylation of MeCP2: Influence of cis-acting elements and trans-acting factors. RNA biology 7: 361-372.

[0295] 49. Gadalla, K. (2012) Virus-mediated delivery of MECP2 as a potential tool for the treatment of Rett syndrome. PhD thesis, http://theses.gla.ac.uk/id/eprint/3501

[0296] 50. Gadalla, K. K. E., Vudhironarit, T., Hector, R. D., Sinnett, S., Bahey, N. G., Bailey, M. E. S., Gray, S. J., Cobb, S. R. (2017) Development of a Novel AAV Gene Therapy Cassette with Improved Safety Features and Efficacy in a Mouse Model of Rett Syndrome. Mol Ther Methods Clin Dev. 5: 180-190. doi: 10.1016/j.omtm.2017.04.007.

[0297] 51. Hinderer, C., Katz, N., Buza, E. L., Dyer, C., Goode, T., Bell, P., Richman, L. K., Wilson, J. M. (2018) Severe Toxicity in Nonhuman Primates and Piglets Following High-Dose Intravenous Administration of an Adeno-Associated Virus Vector Expressing Human SMN. Hum Gene Ther. doi: 10.1089/hum.2018.015.

Sequence listing:

651102PRTHomo sapiens 1Pro Ala Val Pro Glu Ala Ser Ala Ser Pro Lys Gln Arg Arg Ser Ile1 5 10 15Ile Arg Asp Arg Gly Pro Met Tyr Asp Asp Pro Thr Leu Pro Glu Gly 20 25 30Trp Thr Arg Lys Leu Lys Gln Arg Lys Ser Gly Arg Ser Ala Gly Lys 35 40 45Tyr Asp Val Tyr Leu Ile Asn Pro Gln Gly Lys Ala Phe Arg Ser Lys 50 55 60Val Glu Leu Ile Ala Tyr Phe Glu Lys Val Gly Asp Thr Ser Leu Asp65 70 75 80Pro Asn Asp Phe Asp Phe Thr Val Thr Gly Arg Gly Ser Pro Ser Arg 85 90 95Arg Glu Gln Lys Pro Pro 100241PRTHomo sapiens 2Pro Gly Ser Val Val Ala Ala Ala Ala Ala Glu Ala Lys Lys Lys Ala1 5 10 15Val Lys Glu Ser Ser Ile Arg Ser Val Gln Glu Thr Val Leu Pro Ile 20 25 30Lys Lys Arg Lys Thr Arg Glu Thr Val 35 403498PRTHomo sapiens 3Met Ala Ala Ala Ala Ala Ala Ala Pro Ser Gly Gly Gly Gly Gly Gly1 5 10 15Glu Glu Glu Arg Leu Glu Glu Lys Ser Glu Asp Gln Asp Leu Gln Gly 20 25 30Leu Lys Asp Lys Pro Leu Lys Phe Lys Lys Val Lys Lys Asp Lys Lys 35 40 45Glu Glu Lys Glu Gly Lys His Glu Pro Val Gln Pro Ser Ala His His 50 55 60Ser Ala Glu Pro Ala Glu Ala Gly Lys Ala Glu Thr Ser Glu Gly Ser65 70 75 80Gly Ser Ala Pro Ala Val Pro Glu Ala Ser Ala Ser Pro Lys Gln Arg 85 90 95Arg Ser Ile Ile Arg Asp Arg Gly Pro Met Tyr Asp Asp Pro Thr Leu 100 105 110Pro Glu Gly Trp Thr Arg Lys Leu Lys Gln Arg Lys Ser Gly Arg Ser 115 120 125Ala Gly Lys Tyr Asp Val Tyr Leu Ile Asn Pro Gln Gly Lys Ala Phe 130 135 140Arg Ser Lys Val Glu Leu Ile Ala Tyr Phe Glu Lys Val Gly Asp Thr145 150 155 160Ser Leu Asp Pro Asn Asp Phe Asp Phe Thr Val Thr Gly Arg Gly Ser 165 170 175Pro Ser Arg Arg Glu Gln Lys Pro Pro Lys Lys Pro Lys Ser Pro Lys 180 185 190Ala Pro Gly Thr Gly Arg Gly Arg Gly Arg Pro Lys Gly Ser Gly Thr 195 200 205Thr Arg Pro Lys Ala Ala Thr Ser Glu Gly Val Gln Val Lys Arg Val 210 215 220Leu Glu Lys Ser Pro Gly Lys Leu Leu Val Lys Met Pro Phe Gln Thr225 230 235 240Ser Pro Gly Gly Lys Ala Glu Gly Gly Gly Ala Thr Thr Ser Thr Gln 245 250 255Val Met Val Ile Lys Arg Pro Gly Arg Lys Arg Lys Ala Glu Ala Asp 260 265 270Pro Gln Ala Ile Pro Lys 
Lys Arg Gly Arg Lys Pro Gly Ser Val Val 275 280 285Ala Ala Ala Ala Ala Glu Ala Lys Lys Lys Ala Val Lys Glu Ser Ser 290 295 300Ile Arg Ser Val Gln Glu Thr Val Leu Pro Ile Lys Lys Arg Lys Thr305 310 315 320Arg Glu Thr Val Ser Ile Glu Val Lys Glu Val Val Lys Pro Leu Leu 325 330 335Val Ser Thr Leu Gly Glu Lys Ser Gly Lys Gly Leu Lys Thr Cys Lys 340 345 350Ser Pro Gly Arg Lys Ser Lys Glu Ser Ser Pro Lys Gly Arg Ser Ser 355 360 365Ser Ala Ser Ser Pro Pro Lys Lys Glu His His His His His His His 370 375 380Ser Glu Ser Pro Lys Ala Pro Val Pro Leu Leu Pro Pro Leu Pro Pro385 390 395 400Pro Pro Pro Glu Pro Glu Ser Ser Glu Asp Pro Thr Ser Pro Pro Glu 405 410 415Pro Gln Asp Leu Ser Ser Ser Val Cys Lys Glu Glu Lys Met Pro Arg 420 425 430Gly Gly Ser Leu Glu Ser Asp Gly Cys Pro Lys Glu Pro Ala Lys Thr 435 440 445Gln Pro Ala Val Ala Thr Ala Ala Thr Ala Ala Glu Lys Tyr Lys His 450 455 460Arg Gly Glu Gly Glu Arg Lys Asp Ile Val Ser Ser Ser Met Pro Arg465 470 475 480Pro Asn Arg Glu Glu Pro Val Asp Ser Arg Thr Pro Val Thr Glu Arg 485 490 495Val Ser4486PRTHomo sapiens 4Met Val Ala Gly Met Leu Gly Leu Arg Glu Glu Lys Ser Glu Asp Gln1 5 10 15Asp Leu Gln Gly Leu Lys Asp Lys Pro Leu Lys Phe Lys Lys Val Lys 20 25 30Lys Asp Lys Lys Glu Glu Lys Glu Gly Lys His Glu Pro Val Gln Pro 35 40 45Ser Ala His His Ser Ala Glu Pro Ala Glu Ala Gly Lys Ala Glu Thr 50 55 60Ser Glu Gly Ser Gly Ser Ala Pro Ala Val Pro Glu Ala Ser Ala Ser65 70 75 80Pro Lys Gln Arg Arg Ser Ile Ile Arg Asp Arg Gly Pro Met Tyr Asp 85 90 95Asp Pro Thr Leu Pro Glu Gly Trp Thr Arg Lys Leu Lys Gln Arg Lys 100 105 110Ser Gly Arg Ser Ala Gly Lys Tyr Asp Val Tyr Leu Ile Asn Pro Gln 115 120 125Gly Lys Ala Phe Arg Ser Lys Val Glu Leu Ile Ala Tyr Phe Glu Lys 130 135 140Val Gly Asp Thr Ser Leu Asp Pro Asn Asp Phe Asp Phe Thr Val Thr145 150 155 160Gly Arg Gly Ser Pro Ser Arg Arg Glu Gln Lys Pro Pro Lys Lys Pro 165 170 175Lys Ser Pro Lys Ala Pro Gly Thr Gly Arg Gly Arg Gly Arg Pro Lys 180 185 190Gly Ser Gly Thr Thr Arg Pro Lys Ala Ala Thr Ser 
Glu Gly Val Gln 195 200 205Val Lys Arg Val Leu Glu Lys Ser Pro Gly Lys Leu Leu Val Lys Met 210 215 220Pro Phe Gln Thr Ser Pro Gly Gly Lys Ala Glu Gly Gly Gly Ala Thr225 230 235 240Thr Ser Thr Gln Val Met Val Ile Lys Arg Pro Gly Arg Lys Arg Lys 245 250 255Ala Glu Ala Asp Pro Gln Ala Ile Pro Lys Lys Arg Gly Arg Lys Pro 260 265 270Gly Ser Val Val Ala Ala Ala Ala Ala Glu Ala Lys Lys Lys Ala Val 275 280 285Lys Glu Ser Ser Ile Arg Ser Val Gln Glu Thr Val Leu Pro Ile Lys 290 295 300Lys Arg Lys Thr Arg Glu Thr Val Ser Ile Glu Val Lys Glu Val Val305 310 315 320Lys Pro Leu Leu Val Ser Thr Leu Gly Glu Lys Ser Gly Lys Gly Leu 325 330 335Lys Thr Cys Lys Ser Pro Gly Arg Lys Ser Lys Glu Ser Ser Pro Lys 340 345 350Gly Arg Ser Ser Ser Ala Ser Ser Pro Pro Lys Lys Glu His His His 355 360 365His His His His Ser Glu Ser Pro Lys Ala Pro Val Pro Leu Leu Pro 370 375 380Pro Leu Pro Pro Pro Pro Pro Glu Pro Glu Ser Ser Glu Asp Pro Thr385 390 395 400Ser Pro Pro Glu Pro Gln Asp Leu Ser Ser Ser Val Cys Lys Glu Glu 405 410 415Lys Met Pro Arg Gly Gly Ser Leu Glu Ser Asp Gly Cys Pro Lys Glu 420 425 430Pro Ala Lys Thr Gln Pro Ala Val Ala Thr Ala Ala Thr Ala Ala Glu 435 440 445Lys Tyr Lys His Arg Gly Glu Gly Glu Arg Lys Asp Ile Val Ser Ser 450 455 460Ser Met Pro Arg Pro Asn Arg Glu Glu Pro Val Asp Ser Arg Thr Pro465 470 475 480Val Thr Glu Arg Val Ser 485583PRTHomo sapiens 5Met Ala Ala Ala Ala Ala Ala Ala Pro Ser Gly Gly Gly Gly Gly Gly1 5 10 15Glu Glu Glu Arg Leu Glu Glu Lys Ser Glu Asp Gln Asp Leu Gln Gly 20 25 30Leu Lys Asp Lys Pro Leu Lys Phe Lys Lys Val Lys Lys Asp Lys Lys 35 40 45Glu Glu Lys Glu Gly Lys His Glu Pro Val Gln Pro Ser Ala His His 50 55 60Ser Ala Glu Pro Ala Glu Ala Gly Lys Ala Glu Thr Ser Glu Gly Ser65 70 75 80Gly Ser Ala671PRTHomo sapiens 6Met Val Ala Gly Met Leu Gly Leu Arg Glu Glu Lys Ser Glu Asp Gln1 5 10 15Asp Leu Gln Gly Leu Lys Asp Lys Pro Leu Lys Phe Lys Lys Val Lys 20 25 30Lys Asp Lys Lys Glu Glu Lys Glu Gly Lys His Glu Pro Val Gln Pro 35 40 45Ser Ala His His Ser Ala Glu 
Pro Ala Glu Ala Gly Lys Ala Glu Thr 50 55 60Ser Glu Gly Ser Gly Ser Ala65 70798PRTHomo sapiens 7Lys Lys Pro Lys Ser Pro Lys Ala Pro Gly Thr Gly Arg Gly Arg Gly1 5 10 15Arg Pro Lys Gly Ser Gly Thr Thr Arg Pro Lys Ala Ala Thr Ser Glu 20 25 30Gly Val Gln Val Lys Arg Val Leu Glu Lys Ser Pro Gly Lys Leu Leu 35 40 45Val Lys Met Pro Phe Gln Thr Ser Pro Gly Gly Lys Ala Glu Gly Gly 50 55 60Gly Ala Thr Thr Ser Thr Gln Val Met Val Ile Lys Arg Pro Gly Arg65 70 75 80Lys Arg Lys Ala Glu Ala Asp Pro Gln Ala Ile Pro Lys Lys Arg Gly 85 90 95Arg Lys8174PRTHomo sapiens 8Ser Ile Glu Val Lys Glu Val Val Lys Pro Leu Leu Val Ser Thr Leu1 5 10 15Gly Glu Lys Ser Gly Lys Gly Leu Lys Thr Cys Lys Ser Pro Gly Arg 20 25 30Lys Ser Lys Glu Ser Ser Pro Lys Gly Arg Ser Ser Ser Ala Ser Ser 35 40 45Pro Pro Lys Lys Glu His His His His His His His Ser Glu Ser Pro 50 55 60Lys Ala Pro Val Pro Leu Leu Pro Pro Leu Pro Pro Pro Pro Pro Glu65 70 75 80Pro Glu Ser Ser Glu Asp Pro Thr Ser Pro Pro Glu Pro Gln Asp Leu 85 90 95Ser Ser Ser Val Cys Lys Glu Glu Lys Met Pro Arg Gly Gly Ser Leu 100 105 110Glu Ser Asp Gly Cys Pro Lys Glu Pro Ala Lys Thr Gln Pro Ala Val 115 120 125Ala Thr Ala Ala Thr Ala Ala Glu Lys Tyr Lys His Arg Gly Glu Gly 130 135 140Glu Arg Lys Asp Ile Val Ser Ser Ser Met Pro Arg Pro Asn Arg Glu145 150 155 160Glu Pro Val Asp Ser Arg Thr Pro Val Thr Glu Arg Val Ser 165 170929PRTMus musculus 9Met Ala Ala Ala Ala Ala Thr Ala Ala Ala Ala Ala Ala Pro Ser Gly1 5 10 15Gly Gly Gly Gly Gly Glu Glu Glu Arg Leu Glu Glu Lys 20 251012PRTMus musculus 10Met Val Ala Gly Met Leu Gly Leu Arg Glu Glu Lys1 5 101124PRTHomo sapiens 11Met Ala Ala Ala Ala Ala Ala Ala Pro Ser Gly Gly Gly Gly Gly Gly1 5 10 15Glu Glu Glu Arg Leu Glu Glu Lys 201212PRTHomo sapiens 12Met Val Ala Gly Met Leu Gly Leu Arg Glu Glu Lys1 5 1013241PRTArtificial SequencedNC A truncated synthetic polypeptide sequence (from human MeCP2) 13Pro Ala Val Pro Glu Ala Ser Ala Ser Pro Lys Gln Arg Arg Ser Ile1 5 10 15Ile Arg Asp Arg Gly Pro Met Tyr Asp Asp Pro Thr 
Leu Pro Glu Gly 20 25 30Trp Thr Arg Lys Leu Lys Gln Arg Lys Ser Gly Arg Ser Ala Gly Lys 35 40 45Tyr Asp Val Tyr Leu Ile Asn Pro Gln Gly Lys Ala Phe Arg Ser Lys 50 55 60Val Glu Leu Ile Ala Tyr Phe Glu Lys Val Gly Asp Thr Ser Leu Asp65 70 75 80Pro Asn Asp Phe Asp Phe Thr Val Thr Gly Arg Gly Ser Pro Ser Arg 85 90 95Arg Glu Gln Lys Pro Pro Lys Lys Pro Lys Ser Pro Lys Ala Pro Gly 100 105 110Thr Gly Arg Gly Arg Gly Arg Pro Lys Gly Ser Gly Thr Thr Arg Pro 115 120 125Lys Ala Ala Thr Ser Glu Gly Val Gln Val Lys Arg Val Leu Glu Lys 130 135 140Ser Pro Gly Lys Leu Leu Val Lys Met Pro Phe Gln Thr Ser Pro Gly145 150 155 160Gly Lys Ala Glu Gly Gly Gly Ala Thr Thr Ser Thr Gln Val Met Val 165 170 175Ile Lys Arg Pro Gly Arg Lys Arg Lys Ala Glu Ala Asp Pro Gln Ala 180 185 190Ile Pro Lys Lys Arg Gly Arg Lys Pro Gly Ser Val Val Ala Ala Ala 195 200 205Ala Ala Glu Ala Lys Lys Lys Ala Val Lys Glu Ser Ser Ile Arg Ser 210 215 220Val Gln Glu Thr Val Leu Pro Ile Lys Lys Arg Lys Thr Arg Glu Thr225 230 235 240Val14157PRTArtificial SequencedNIC A truncated synthetic polypeptide sequence (from human MeCP2) 14Pro Ala Val Pro Glu Ala Ser Ala Ser Pro Lys Gln Arg Arg Ser Ile1 5 10 15Ile Arg Asp Arg Gly Pro Met Tyr Asp Asp Pro Thr Leu Pro Glu Gly 20 25 30Trp Thr Arg Lys Leu Lys Gln Arg Lys Ser Gly Arg Ser Ala Gly Lys 35 40 45Tyr Asp Val Tyr Leu Ile Asn Pro Gln Gly Lys Ala Phe Arg Ser Lys 50 55 60Val Glu Leu Ile Ala Tyr Phe Glu Lys Val Gly Asp Thr Ser Leu Asp65 70 75 80Pro Asn Asp Phe Asp Phe Thr Val Thr Gly Arg Gly Ser Pro Ser Arg 85 90 95Arg Glu Gln Lys Pro Pro Gly Ser Ser Gly Ser Ser Gly Pro Lys Lys 100 105 110Lys Arg Lys Val Pro Gly Ser Val Val Ala Ala Ala Ala Ala Glu Ala 115 120 125Lys Lys Lys Ala Val Lys Glu Ser Ser Ile Arg Ser Val Gln Glu Thr 130 135 140Val Leu Pro Ile Lys Lys Arg Lys Thr Arg Glu Thr Val145 150 15515413PRTArtificial SequencedN mouse A truncated synthetic polypeptide sequence (from mouse MeCP2) 15Pro Ala Val Pro Glu Ala Ser Ala Ser Pro Lys Gln Arg Arg Ser Ile1 5 10 15Ile Arg Asp 
Arg Gly Pro Met Tyr Asp Asp Pro Thr Leu Pro Glu Gly 20 25 30Trp Thr Arg Lys Leu Lys Gln Arg Lys Ser Gly Arg Ser Ala Gly Lys 35 40 45Tyr Asp Val Tyr Leu Ile Asn Pro Gln Gly Lys Ala Phe Arg Ser Lys 50 55 60Val Glu Leu Ile Ala Tyr Phe Glu Lys Val Gly Asp Thr Ser Leu Asp65 70 75 80Pro Asn Asp Phe Asp Phe Thr Val Thr Gly Arg Gly Ser Pro Ser Arg 85 90 95Arg Glu Gln Lys Pro Pro Lys Lys Pro Lys Ser Pro Lys Ala Pro Gly 100 105 110Thr Gly Arg Gly Arg Gly Arg Pro Lys Gly Ser Gly Thr Gly Arg Pro 115 120 125Lys Ala Ala Ala Ser Glu Gly Val Gln Val Lys Arg Val Leu Glu Lys 130 135 140Ser Pro Gly Lys Leu Val Val Lys Met Pro Phe Gln Ala Ser Pro Gly145 150 155 160Gly Lys Gly Glu Gly Gly Gly Ala Thr Thr Ser Ala Gln Val Met Val 165 170 175Ile Lys Arg Pro Gly Arg Lys Arg Lys Ala Glu Ala Asp Pro Gln Ala 180 185 190Ile Pro Lys Lys Arg Gly Arg Lys Pro Gly Ser Val Val Ala Ala Ala 195 200 205Ala Ala Glu Ala Lys Lys Lys Ala Val Lys Glu Ser Ser Ile Arg Ser 210 215 220Val His Glu Thr Val Leu Pro Ile Lys Lys Arg Lys Thr Arg Glu Thr225 230 235 240Val Ser Ile Glu Val Lys Glu Val Val Lys Pro Leu Leu Val Ser Thr 245 250 255Leu Gly Glu Lys Ser Gly Lys Gly Leu Lys Thr Cys Lys Ser Pro Gly 260 265 270Arg Lys Ser Lys Glu Ser Ser Pro Lys Gly Arg Ser Ser Ser Ala Ser 275 280 285Ser Pro Pro Lys Lys Glu His His His His His His His Ser Glu Ser 290 295 300Thr Lys Ala Pro Met Pro Leu Leu Pro Ser Pro Pro Pro Pro Glu Pro305 310 315 320Glu Ser Ser Glu Asp Pro Ile Ser Pro Pro Glu Pro Gln Asp Leu Ser 325 330 335Ser Ser Ile Cys Lys Glu Glu Lys Met Pro Arg Gly Gly Ser Leu Glu 340 345 350Ser Asp Gly Cys Pro Lys Glu Pro Ala Lys Thr Gln Pro Met Val Ala 355 360 365Thr Thr Thr Thr Val Ala Glu Lys Tyr Lys His Arg Gly Glu Gly Glu 370

375 380Arg Lys Asp Ile Val Ser Ser Ser Met Pro Arg Pro Asn Arg Glu Glu385 390 395 400Pro Val Asp Ser Arg Thr Pro Val Thr Glu Arg Val Ser 405 41016241PRTArtificial SequencedNC mouse A truncated synthetic polypeptide sequence (from mouse MeCP2) 16Pro Ala Val Pro Glu Ala Ser Ala Ser Pro Lys Gln Arg Arg Ser Ile1 5 10 15Ile Arg Asp Arg Gly Pro Met Tyr Asp Asp Pro Thr Leu Pro Glu Gly 20 25 30Trp Thr Arg Lys Leu Lys Gln Arg Lys Ser Gly Arg Ser Ala Gly Lys 35 40 45Tyr Asp Val Tyr Leu Ile Asn Pro Gln Gly Lys Ala Phe Arg Ser Lys 50 55 60Val Glu Leu Ile Ala Tyr Phe Glu Lys Val Gly Asp Thr Ser Leu Asp65 70 75 80Pro Asn Asp Phe Asp Phe Thr Val Thr Gly Arg Gly Ser Pro Ser Arg 85 90 95Arg Glu Gln Lys Pro Pro Lys Lys Pro Lys Ser Pro Lys Ala Pro Gly 100 105 110Thr Gly Arg Gly Arg Gly Arg Pro Lys Gly Ser Gly Thr Gly Arg Pro 115 120 125Lys Ala Ala Ala Ser Glu Gly Val Gln Val Lys Arg Val Leu Glu Lys 130 135 140Ser Pro Gly Lys Leu Val Val Lys Met Pro Phe Gln Ala Ser Pro Gly145 150 155 160Gly Lys Gly Glu Gly Gly Gly Ala Thr Thr Ser Ala Gln Val Met Val 165 170 175Ile Lys Arg Pro Gly Arg Lys Arg Lys Ala Glu Ala Asp Pro Gln Ala 180 185 190Ile Pro Lys Lys Arg Gly Arg Lys Pro Gly Ser Val Val Ala Ala Ala 195 200 205Ala Ala Glu Ala Lys Lys Lys Ala Val Lys Glu Ser Ser Ile Arg Ser 210 215 220Val His Glu Thr Val Leu Pro Ile Lys Lys Arg Lys Thr Arg Glu Thr225 230 235 240Val17157PRTArtificial SequencedNIC mouse A truncated synthetic polypeptide sequence (from mouse MeCP2) 17Pro Ala Val Pro Glu Ala Ser Ala Ser Pro Lys Gln Arg Arg Ser Ile1 5 10 15Ile Arg Asp Arg Gly Pro Met Tyr Asp Asp Pro Thr Leu Pro Glu Gly 20 25 30Trp Thr Arg Lys Leu Lys Gln Arg Lys Ser Gly Arg Ser Ala Gly Lys 35 40 45Tyr Asp Val Tyr Leu Ile Asn Pro Gln Gly Lys Ala Phe Arg Ser Lys 50 55 60Val Glu Leu Ile Ala Tyr Phe Glu Lys Val Gly Asp Thr Ser Leu Asp65 70 75 80Pro Asn Asp Phe Asp Phe Thr Val Thr Gly Arg Gly Ser Pro Ser Arg 85 90 95Arg Glu Gln Lys Pro Pro Gly Ser Ser Gly Ser Ser Gly Pro Lys Lys 100 105 110Lys Arg Lys Val Pro Gly 
Ser Val Val Ala Ala Ala Ala Ala Glu Ala 115 120 125Lys Lys Lys Ala Val Lys Glu Ser Ser Ile Arg Ser Val His Glu Thr 130 135 140Val Leu Pro Ile Lys Lys Arg Lys Thr Arg Glu Thr Val145 150 155181494DNAHomo sapiens 18atggccgccg ccgccgccgc cgcgccgagc ggaggaggag gaggaggcga ggaggagaga 60ctggaagaaa agtcagaaga ccaggacctc cagggcctca aggacaaacc cctcaagttt 120aaaaaggtga agaaagataa gaaagaagag aaagagggca agcatgagcc cgtgcagcca 180tcagcccacc actctgctga gcccgcagag gcaggcaaag cagagacatc agaagggtca 240ggctccgccc cggctgtgcc ggaagcttct gcctccccca aacagcggcg ctccatcatc 300cgtgaccggg gacccatgta tgatgacccc accctgcctg aaggctggac acggaagctt 360aagcaaagga aatctggccg ctctgctggg aagtatgatg tgtatttgat caatccccag 420ggaaaagcct ttcgctctaa agtggagttg attgcgtact tcgaaaaggt aggcgacaca 480tccctggacc ctaatgattt tgacttcacg gtaactggga gagggagccc ctcccggcga 540gagcagaaac cacctaagaa gcccaaatct cccaaagctc caggaactgg cagaggccgg 600ggacgcccca aagggagcgg caccacgaga cccaaggcgg ccacgtcaga gggtgtgcag 660gtgaaaaggg tcctggagaa aagtcctggg aagctccttg tcaagatgcc ttttcaaact 720tcgccagggg gcaaggctga ggggggtggg gccaccacat ccacccaggt catggtgatc 780aaacgccccg gcaggaagcg aaaagctgag gccgaccctc aggccattcc caagaaacgg 840ggccgaaagc cggggagtgt ggtggcagcc gctgccgccg aggccaaaaa gaaagccgtg 900aaggagtctt ctatccgatc tgtgcaggag accgtactcc ccatcaagaa gcgcaagacc 960cgggagacgg tcagcatcga ggtcaaggaa gtggtgaagc ccctgctggt gtccaccctc 1020ggtgagaaga gcgggaaagg actgaagacc tgtaagagcc ctgggcggaa aagcaaggag 1080agcagcccca aggggcgcag cagcagcgcc tcctcacccc ccaagaagga gcaccaccac 1140catcaccacc actcagagtc cccaaaggcc cccgtgccac tgctcccacc cctgccccca 1200cctccacctg agcccgagag ctccgaggac cccaccagcc cccctgagcc ccaggacttg 1260agcagcagcg tctgcaaaga ggagaagatg cccagaggag gctcactgga gagcgacggc 1320tgccccaagg agccagctaa gactcagccc gcggttgcca ccgccgccac ggccgcagaa 1380aagtacaaac accgagggga gggagagcgc aaagacattg tttcatcctc catgccaagg 1440ccaaacagag aggagcctgt ggacagccgg acgcccgtga ccgagagagt tagc 149419723DNAArtificial SequencedNC cDNA sequence of a truncated 
synthetic polypeptide sequence (from human MeCP2) 19ccggctgtgc cggaagcttc tgcctccccc aaacagcggc gctccatcat ccgtgaccgg 60ggacccatgt atgatgaccc caccctgcct gaaggctgga cacggaagct taagcaaagg 120aaatctggcc gctctgctgg gaagtatgat gtgtatttga tcaatcccca gggaaaagcc 180tttcgctcta aagtggagtt gattgcgtac ttcgaaaagg taggcgacac atccctggac 240cctaatgatt ttgacttcac ggtaactggg agagggagcc cctcccggcg agagcagaaa 300ccacctaaga agcccaaatc tcccaaagct ccaggaactg gcagaggccg gggacgcccc 360aaagggagcg gcaccacgag acccaaggcg gccacgtcag agggtgtgca ggtgaaaagg 420gtcctggaga aaagtcctgg gaagctcctt gtcaagatgc cttttcaaac ttcgccaggg 480ggcaaggctg aggggggtgg ggccaccaca tccacccagg tcatggtgat caaacgcccc 540ggcaggaagc gaaaagctga ggccgaccct caggccattc ccaagaaacg gggccgaaag 600ccggggagtg tggtggcagc cgctgccgcc gaggccaaaa agaaagccgt gaaggagtct 660tctatccgat ctgtgcagga gaccgtactc cccatcaaga agcgcaagac ccgggagacg 720gtc 72320471DNAArtificial SequencedNIC cDNA sequence of a truncated synthetic polypeptide sequence (from human MeCP2) 20ccggctgtgc cggaagcttc tgcctccccc aaacagcggc gctccatcat ccgtgaccgg 60ggacccatgt atgatgaccc caccctgcct gaaggctgga cacggaagct taagcaaagg 120aaatctggcc gctctgctgg gaagtatgat gtgtatttga tcaatcccca gggaaaagcc 180tttcgctcta aagtggagtt gattgcgtac ttcgaaaagg taggcgacac atccctggac 240cctaatgatt ttgacttcac ggtaactggg agagggagcc cctcccggcg agagcagaaa 300ccacctggat ccagtggcag ctctgggccc aagaaaaagc ggaaggtgcc ggggagtgtg 360gtggcagccg ctgccgccga ggccaaaaag aaagccgtga aggagtcttc tatccgatct 420gtgcaggaga ccgtactccc catcaagaag cgcaagaccc gggagacggt c 471211506DNAMus musculus 21atggccgccg ctgccgccac cgccgccgcc gccgccgcgc cgagcggagg aggaggagga 60ggcgaggagg agagactgga ggaaaagtca gaagaccagg atctccaggg cctcagagac 120aagccactga agtttaagaa ggcgaagaaa gacaagaagg aggacaaaga aggcaagcat 180gagccactac aaccttcagc ccaccattct gcagagccag cagaggcagg caaagcagaa 240acatcagaaa gctcaggctc tgccccagca gtgccagaag cctcggcttc ccccaaacag 300cggcgctcca ttatccgtga ccggggacct atgtatgatg accccacctt gcctgaaggt 360tggacacgaa agcttaaaca 
aaggaagtct ggccgatctg ctggaaagta tgatgtatat 420ttgatcaatc cccagggaaa agcttttcgc tctaaagtag aattgattgc atactttgaa 480aaggtgggag acacctcctt ggaccctaat gattttgact tcacggtaac tgggagaggg 540agcccctcca ggagagagca gaaaccacct aagaagccca aatctcccaa agctccagga 600actggcaggg gtcggggacg ccccaaaggg agcggcactg ggagaccaaa ggcagcagca 660tcagaaggtg ttcaggtgaa aagggtcctg gagaagagcc ctgggaaact tgttgtcaag 720atgcctttcc aagcatcgcc tgggggtaag ggtgagggag gtggggctac cacatctgcc 780caggtcatgg tgatcaaacg ccctggcaga aagcgaaaag ctgaagctga cccccaggcc 840attcctaaga aacggggtag aaagcctggg agtgtggtgg cagctgctgc agctgaggcc 900aaaaagaaag ccgtgaagga gtcttccata cggtctgtgc atgagactgt gctccccatc 960aagaagcgca agacccggga gacggtcagc atcgaggtca aggaagtggt gaagcccctg 1020ctggtgtcca cccttggtga gaaaagcggg aagggactga agacctgcaa gagccctggg 1080cgtaaaagca aggagagcag ccccaagggg cgcagcagca gtgcctcctc cccacctaag 1140aaggagcacc atcatcacca ccatcactca gagtccacaa aggcccccat gccactgctc 1200ccatccccac ccccacctga gcctgagagc tctgaggacc ccatcagccc ccctgagcct 1260caggacttga gcagcagcat ctgcaaagaa gagaagatgc cccgaggagg ctcactggaa 1320agcgatggct gccccaagga gccagctaag actcagccta tggtcgccac cactaccaca 1380gttgcagaaa agtacaaaca ccgaggggag ggagagcgca aagacattgt ttcatcttcc 1440atgccaaggc caaacagaga ggagcctgtg gacagccgga cgcccgtgac cgagagagtt 1500agctct 1506221242DNAArtificial SequencedN mouse cDNA for a truncated synthetic polypeptide sequence (from mouse MeCP2) 22ccagcagtgc cagaagcctc ggcttccccc aaacagcggc gctccattat ccgtgaccgg 60ggacctatgt atgatgaccc caccttgcct gaaggttgga cacgaaagct taaacaaagg 120aagtctggcc gatctgctgg aaagtatgat gtatatttga tcaatcccca gggaaaagct 180tttcgctcta aagtagaatt gattgcatac tttgaaaagg tgggagacac ctccttggac 240cctaatgatt ttgacttcac ggtaactggg agagggagcc cctccaggag agagcagaaa 300ccacctaaga agcccaaatc tcccaaagct ccaggaactg gcaggggtcg gggacgcccc 360aaagggagcg gcactgggag accaaaggca gcagcatcag aaggtgttca ggtgaaaagg 420gtcctggaga agagccctgg gaaacttgtt gtcaagatgc ctttccaagc atcgcctggg 480ggtaagggtg agggaggtgg 
ggctaccaca tctgcccagg tcatggtgat caaacgccct 540ggcagaaagc gaaaagctga agctgacccc caggccattc ctaagaaacg gggtagaaag 600cctgggagtg tggtggcagc tgctgcagct gaggccaaaa agaaagccgt gaaggagtct 660tccatacggt ctgtgcatga gactgtgctc cccatcaaga agcgcaagac ccgggagacg 720gtcagcatcg aggtcaagga agtggtgaag cccctgctgg tgtccaccct tggtgagaaa 780agcgggaagg gactgaagac ctgcaagagc cctgggcgta aaagcaagga gagcagcccc 840aaggggcgca gcagcagtgc ctcctcccca cctaagaagg agcaccatca tcaccaccat 900cactcagagt ccacaaaggc ccccatgcca ctgctcccat ccccaccccc acctgagcct 960gagagctctg aggaccccat cagcccccct gagcctcagg acttgagcag cagcatctgc 1020aaagaagaga agatgccccg aggaggctca ctggaaagcg atggctgccc caaggagcca 1080gctaagactc agcctatggt cgccaccact accacagttg cagaaaagta caaacaccga 1140ggggagggag agcgcaaaga cattgtttca tcttccatgc caaggccaaa cagagaggag 1200cctgtggaca gccggacgcc cgtgaccgag agagttagct gt 124223723DNAArtificial SequencedNC mouse cDNA for a truncated synthetic polypeptide sequence (from mouse MeCP2) 23ccagcagtgc cagaagcctc ggcttccccc aaacagcggc gctccattat ccgtgaccgg 60ggacctatgt atgatgaccc caccttgcct gaaggttgga cacgaaagct taaacaaagg 120aagtctggcc gatctgctgg aaagtatgat gtatatttga tcaatcccca gggaaaagct 180tttcgctcta aagtagaatt gattgcatac tttgaaaagg tgggagacac ctccttggac 240cctaatgatt ttgacttcac ggtaactggg agagggagcc cctccaggag agagcagaaa 300ccacctaaga agcccaaatc tcccaaagct ccaggaactg gcaggggtcg gggacgcccc 360aaagggagcg gcactgggag accaaaggca gcagcatcag aaggtgttca ggtgaaaagg 420gtcctggaga agagccctgg gaaacttgtt gtcaagatgc ctttccaagc atcgcctggg 480ggtaagggtg agggaggtgg ggctaccaca tctgcccagg tcatggtgat caaacgccct 540ggcagaaagc gaaaagctga agctgacccc caggccattc ctaagaaacg gggtagaaag 600cctgggagtg tggtggcagc tgctgcagct gaggccaaaa agaaagccgt gaaggagtct 660tccatacggt ctgtgcatga gactgtgctc cccatcaaga agcgcaagac ccgggagacg 720gtc 72324471DNAArtificial SequencedNIC mouse cDNA for a truncated synthetic polypeptide sequence (from mouse MeCP2) 24ccagcagtgc cagaagcctc ggcttccccc aaacagcggc gctccattat ccgtgaccgg 60ggacctatgt 
atgatgaccc caccttgcct gaaggttgga cacgaaagct taaacaaagg 120aagtctggcc gatctgctgg aaagtatgat gtatatttga tcaatcccca gggaaaagct 180tttcgctcta aagtagaatt gattgcatac tttgaaaagg tgggagacac ctccttggac 240cctaatgatt ttgacttcac ggtaactggg agagggagcc cctccaggag agagcagaaa 300ccacctggat ccagtggcag ctctgggccc aagaaaaagc ggaaggtgcc tgggagtgtg 360gtggcagctg ctgcagctga ggccaaaaag aaagccgtga aggagtcttc catacggtct 420gtgcatgaga ctgtgctccc catcaagaag cgcaagaccc gggagacggt c 4712587DNAMus musculus 25atggccgccg ctgccgccac cgccgccgcc gccgccgcgc cgagcggagg aggaggagga 60ggcgaggagg agagactgga ggaaaag 872646DNAMus musculus 26atggtagctg ggatgttagg gctcagggag gaaaagggag gaaaag 462772DNAHomo sapiens 27atggccgccg ccgccgccgc cgcgccgagc ggaggaggag gaggaggcga ggaggagaga 60ctggaagaaa ag 722836DNAHomo sapiens 28atggtagctg ggatgttagg gctcagggaa gaaaag 3629501PRTMus musculus 29Met Ala Ala Ala Ala Ala Thr Ala Ala Ala Ala Ala Ala Pro Ser Gly1 5 10 15Gly Gly Gly Gly Gly Glu Glu Glu Arg Leu Glu Glu Lys Ser Glu Asp 20 25 30Gln Asp Leu Gln Gly Leu Arg Asp Lys Pro Leu Lys Phe Lys Lys Ala 35 40 45Lys Lys Asp Lys Lys Glu Asp Lys Glu Gly Lys His Glu Pro Leu Gln 50 55 60Pro Ser Ala His His Ser Ala Glu Pro Ala Glu Ala Gly Lys Ala Glu65 70 75 80Thr Ser Glu Ser Ser Gly Ser Ala Pro Ala Val Pro Glu Ala Ser Ala 85 90 95Ser Pro Lys Gln Arg Arg Ser Ile Ile Arg Asp Arg Gly Pro Met Tyr 100 105 110Asp Asp Pro Thr Leu Pro Glu Gly Trp Thr Arg Lys Leu Lys Gln Arg 115 120 125Lys Ser Gly Arg Ser Ala Gly Lys Tyr Asp Val Tyr Leu Ile Asn Pro 130 135 140Gln Gly Lys Ala Phe Arg Ser Lys Val Glu Leu Ile Ala Tyr Phe Glu145 150 155 160Lys Val Gly Asp Thr Ser Leu Asp Pro Asn Asp Phe Asp Phe Thr Val 165 170 175Thr Gly Arg Gly Ser Pro Ser Arg Arg Glu Gln Lys Pro Pro Lys Lys 180 185 190Pro Lys Ser Pro Lys Ala Pro Gly Thr Gly Arg Gly Arg Gly Arg Pro 195 200 205Lys Gly Ser Gly Thr Gly Arg Pro Lys Ala Ala Ala Ser Glu Gly Val 210 215 220Gln Val Lys Arg Val Leu Glu Lys Ser Pro Gly Lys Leu Val Val Lys225 230 235 240Met Pro Phe Gln Ala Ser Pro Gly 
Gly Lys Gly Glu Gly Gly Gly Ala 245 250 255Thr Thr Ser Ala Gln Val Met Val Ile Lys Arg Pro Gly Arg Lys Arg 260 265 270Lys Ala Glu Ala Asp Pro Gln Ala Ile Pro Lys Lys Arg Gly Arg Lys 275 280 285Pro Gly Ser Val Val Ala Ala Ala Ala Ala Glu Ala Lys Lys Lys Ala 290 295 300Val Lys Glu Ser Ser Ile Arg Ser Val His Glu Thr Val Leu Pro Ile305 310 315 320Lys Lys Arg Lys Thr Arg Glu Thr Val Ser Ile Glu Val Lys Glu Val 325 330 335Val Lys Pro Leu Leu Val Ser Thr Leu Gly Glu Lys Ser Gly Lys Gly 340 345 350Leu Lys Thr Cys Lys Ser Pro Gly Arg Lys Ser Lys Glu Ser Ser Pro 355 360 365Lys Gly Arg Ser Ser Ser Ala Ser Ser Pro Pro Lys Lys Glu His His 370 375 380His His His His His Ser Glu Ser Thr Lys Ala Pro Met Pro Leu Leu385 390 395 400Pro Ser Pro Pro Pro Pro Glu Pro Glu Ser Ser Glu Asp Pro Ile Ser 405 410 415Pro Pro Glu Pro Gln Asp Leu Ser Ser Ser Ile Cys Lys Glu Glu Lys 420 425 430Met Pro Arg Gly Gly Ser Leu Glu Ser Asp Gly Cys Pro Lys Glu Pro 435 440 445Ala Lys Thr Gln Pro Met Val Ala Thr Thr Thr Thr Val Ala Glu Lys 450 455 460Tyr Lys His Arg Gly Glu Gly Glu Arg Lys Asp Ile Val Ser Ser Ser465 470 475 480Met Pro Arg Pro Asn Arg Glu Glu Pro Val Asp Ser Arg Thr Pro Val 485 490 495Thr Glu Arg Val Ser 50030484PRTMus musculus 30Met Val Ala Gly Met Leu Gly Leu Arg Glu Glu Lys Ser Glu Asp Gln1 5 10 15Asp Leu Gln Gly Leu Arg Asp Lys Pro Leu Lys Phe Lys Lys Ala Lys 20 25 30Lys Asp Lys Lys Glu Asp Lys Glu Gly Lys His Glu Pro Leu Gln Pro 35 40 45Ser Ala His His Ser Ala Glu Pro Ala Glu Ala Gly Lys Ala Glu Thr 50 55 60Ser Glu Ser Ser Gly Ser Ala Pro Ala Val Pro Glu Ala Ser Ala Ser65 70 75 80Pro Lys Gln Arg Arg Ser Ile Ile Arg Asp Arg Gly Pro Met Tyr Asp 85 90 95Asp Pro Thr Leu Pro Glu Gly Trp Thr Arg Lys Leu Lys Gln Arg Lys 100 105 110Ser Gly Arg Ser Ala Gly Lys Tyr Asp Val Tyr Leu Ile Asn Pro Gln 115 120 125Gly Lys Ala Phe Arg Ser Lys Val Glu Leu Ile Ala Tyr Phe Glu Lys 130 135 140Val Gly Asp Thr Ser Leu Asp Pro Asn Asp Phe Asp Phe Thr Val Thr145 150 155 160Gly Arg Gly Ser Pro Ser Arg Arg Glu Gln 
Lys Pro Pro Lys Lys Pro 165 170 175Lys Ser Pro Lys Ala Pro Gly Thr Gly Arg Gly Arg Gly Arg Pro Lys 180 185 190Gly Ser Gly Thr Gly Arg Pro Lys Ala Ala Ala Ser Glu Gly Val Gln 195 200 205Val Lys Arg Val Leu Glu Lys Ser Pro Gly Lys Leu Val Val Lys Met 210 215 220Pro Phe Gln Ala Ser

Pro Gly Gly Lys Gly Glu Gly Gly Gly Ala Thr225 230 235 240Thr Ser Ala Gln Val Met Val Ile Lys Arg Pro Gly Arg Lys Arg Lys 245 250 255Ala Glu Ala Asp Pro Gln Ala Ile Pro Lys Lys Arg Gly Arg Lys Pro 260 265 270Gly Ser Val Val Ala Ala Ala Ala Ala Glu Ala Lys Lys Lys Ala Val 275 280 285Lys Glu Ser Ser Ile Arg Ser Val His Glu Thr Val Leu Pro Ile Lys 290 295 300Lys Arg Lys Thr Arg Glu Thr Val Ser Ile Glu Val Lys Glu Val Val305 310 315 320Lys Pro Leu Leu Val Ser Thr Leu Gly Glu Lys Ser Gly Lys Gly Leu 325 330 335Lys Thr Cys Lys Ser Pro Gly Arg Lys Ser Lys Glu Ser Ser Pro Lys 340 345 350Gly Arg Ser Ser Ser Ala Ser Ser Pro Pro Lys Lys Glu His His His 355 360 365His His His His Ser Glu Ser Thr Lys Ala Pro Met Pro Leu Leu Pro 370 375 380Ser Pro Pro Pro Pro Glu Pro Glu Ser Ser Glu Asp Pro Ile Ser Pro385 390 395 400Pro Glu Pro Gln Asp Leu Ser Ser Ser Ile Cys Lys Glu Glu Lys Met 405 410 415Pro Arg Gly Gly Ser Leu Glu Ser Asp Gly Cys Pro Lys Glu Pro Ala 420 425 430Lys Thr Gln Pro Met Val Ala Thr Thr Thr Thr Val Ala Glu Lys Tyr 435 440 445Lys His Arg Gly Glu Gly Glu Arg Lys Asp Ile Val Ser Ser Ser Met 450 455 460Pro Arg Pro Asn Arg Glu Glu Pro Val Asp Ser Arg Thr Pro Val Thr465 470 475 480Glu Arg Val Ser311452DNAMus musculus 31atggtagctg ggatgttagg gctcagggag gaaaagtcag aagaccagga tctccagggc 60ctcagagaca agccactgaa gtttaagaag gcgaagaaag acaagaagga ggacaaagaa 120ggcaagcatg agccactaca accttcagcc caccattctg cagagccagc agaggcaggc 180aaagcagaaa catcagaaag ctcaggctct gccccagcag tgccagaagc ctcggcttcc 240cccaaacagc ggcgctccat tatccgtgac cggggaccta tgtatgatga ccccaccttg 300cctgaaggtt ggacacgaaa gcttaaacaa aggaagtctg gccgatctgc tggaaagtat 360gatgtatatt tgatcaatcc ccagggaaaa gcttttcgct ctaaagtaga attgattgca 420tactttgaaa aggtgggaga cacctccttg gaccctaatg attttgactt cacggtaact 480gggagaggga gcccctccag gagagagcag aaaccaccta agaagcccaa atctcccaaa 540gctccaggaa ctggcagggg tcggggacgc cccaaaggga gcggcactgg gagaccaaag 600gcagcagcat cagaaggtgt tcaggtgaaa agggtcctgg agaagagccc tgggaaactt 660gttgtcaaga 
tgcctttcca agcatcgcct gggggtaagg gtgagggagg tggggctacc 720acatctgccc aggtcatggt gatcaaacgc cctggcagaa agcgaaaagc tgaagctgac 780ccccaggcca ttcctaagaa acggggtaga aagcctggga gtgtggtggc agctgctgca 840gctgaggcca aaaagaaagc cgtgaaggag tcttccatac ggtctgtgca tgagactgtg 900ctccccatca agaagcgcaa gacccgggag acggtcagca tcgaggtcaa ggaagtggtg 960aagcccctgc tggtgtccac ccttggtgag aaaagcggga agggactgaa gacctgcaag 1020agccctgggc gtaaaagcaa ggagagcagc cccaaggggc gcagcagcag tgcctcctcc 1080ccacctaaga aggagcacca tcatcaccac catcactcag agtccacaaa ggcccccatg 1140ccactgctcc catccccacc cccacctgag cctgagagct ctgaggaccc catcagcccc 1200cctgagcctc aggacttgag cagcagcatc tgcaaagaag agaagatgcc ccgaggaggc 1260tcactggaaa gcgatggctg ccccaaggag ccagctaaga ctcagcctat ggtcgccacc 1320actaccacag ttgcagaaaa gtacaaacac cgaggggagg gagagcgcaa agacattgtt 1380tcatcttcca tgccaaggcc aaacagagag gagcctgtgg acagccggac gcccgtgacc 1440gagagagtta gc 1452327PRTArtificial SequenceNuclear localisation sequence (NLS) polypeptide 32Pro Lys Lys Lys Arg Lys Val1 53321DNAArtificial Sequencepolynucleotide sequence encoding NLS 33cccaagaaaa agcggaaggt g 21347PRTArtificial SequenceLinker polypeptide sequence 34Gly Ser Ser Gly Ser Ser Gly1 53521DNAArtificial Sequencepolynucleotide sequence encoding linker 35ggatccagtg gcagctctgg g 21368PRTArtificial SequenceLinker polypeptide sequence 36Cys Lys Asp Pro Pro Val Ala Thr1 53724DNAArtificial SequencePolynucleotide sequence encoding linker 37tgtaaggatc caccggtcgc cacc 24387PRTArtificial Sequencelinker polypeptide sequence 38Gly Ser Ser Gly Ser Ser Gly1 53921DNAArtificial Sequencepolynucleotide sequence encoding linker 39gggagctccg gcagttctgg a 2140748PRTArtificial SequenceMouse WT e1 with EGFP tag polypeptide 40Met Ala Ala Ala Ala Ala Thr Ala Ala Ala Ala Ala Ala Pro Ser Gly1 5 10 15Gly Gly Gly Gly Gly Glu Glu Glu Arg Leu Glu Glu Lys Ser Glu Asp 20 25 30Gln Asp Leu Gln Gly Leu Arg Asp Lys Pro Leu Lys Phe Lys Lys Ala 35 40 45Lys Lys Asp Lys Lys Glu Asp Lys Glu Gly Lys His Glu Pro Leu Gln 
50 55 60Pro Ser Ala His His Ser Ala Glu Pro Ala Glu Ala Gly Lys Ala Glu65 70 75 80Thr Ser Glu Ser Ser Gly Ser Ala Pro Ala Val Pro Glu Ala Ser Ala 85 90 95Ser Pro Lys Gln Arg Arg Ser Ile Ile Arg Asp Arg Gly Pro Met Tyr 100 105 110Asp Asp Pro Thr Leu Pro Glu Gly Trp Thr Arg Lys Leu Lys Gln Arg 115 120 125Lys Ser Gly Arg Ser Ala Gly Lys Tyr Asp Val Tyr Leu Ile Asn Pro 130 135 140Gln Gly Lys Ala Phe Arg Ser Lys Val Glu Leu Ile Ala Tyr Phe Glu145 150 155 160Lys Val Gly Asp Thr Ser Leu Asp Pro Asn Asp Phe Asp Phe Thr Val 165 170 175Thr Gly Arg Gly Ser Pro Ser Arg Arg Glu Gln Lys Pro Pro Lys Lys 180 185 190Pro Lys Ser Pro Lys Ala Pro Gly Thr Gly Arg Gly Arg Gly Arg Pro 195 200 205Lys Gly Ser Gly Thr Gly Arg Pro Lys Ala Ala Ala Ser Glu Gly Val 210 215 220Gln Val Lys Arg Val Leu Glu Lys Ser Pro Gly Lys Leu Val Val Lys225 230 235 240Met Pro Phe Gln Ala Ser Pro Gly Gly Lys Gly Glu Gly Gly Gly Ala 245 250 255Thr Thr Ser Ala Gln Val Met Val Ile Lys Arg Pro Gly Arg Lys Arg 260 265 270Lys Ala Glu Ala Asp Pro Gln Ala Ile Pro Lys Lys Arg Gly Arg Lys 275 280 285Pro Gly Ser Val Val Ala Ala Ala Ala Ala Glu Ala Lys Lys Lys Ala 290 295 300Val Lys Glu Ser Ser Ile Arg Ser Val His Glu Thr Val Leu Pro Ile305 310 315 320Lys Lys Arg Lys Thr Arg Glu Thr Val Ser Ile Glu Val Lys Glu Val 325 330 335Val Lys Pro Leu Leu Val Ser Thr Leu Gly Glu Lys Ser Gly Lys Gly 340 345 350Leu Lys Thr Cys Lys Ser Pro Gly Arg Lys Ser Lys Glu Ser Ser Pro 355 360 365Lys Gly Arg Ser Ser Ser Ala Ser Ser Pro Pro Lys Lys Glu His His 370 375 380His His His His His Ser Glu Ser Thr Lys Ala Pro Met Pro Leu Leu385 390 395 400Pro Ser Pro Pro Pro Pro Glu Pro Glu Ser Ser Glu Asp Pro Ile Ser 405 410 415Pro Pro Glu Pro Gln Asp Leu Ser Ser Ser Ile Cys Lys Glu Glu Lys 420 425 430Met Pro Arg Gly Gly Ser Leu Glu Ser Asp Gly Cys Pro Lys Glu Pro 435 440 445Ala Lys Thr Gln Pro Met Val Ala Thr Thr Thr Thr Val Ala Glu Lys 450 455 460Tyr Lys His Arg Gly Glu Gly Glu Arg Lys Asp Ile Val Ser Ser Ser465 470 475 480Met Pro Arg Pro Asn Arg Glu 
Glu Pro Val Asp Ser Arg Thr Pro Val 485 490 495Thr Glu Arg Val Ser Cys Lys Asp Pro Pro Val Ala Thr Met Val Ser 500 505 510Lys Gly Glu Glu Leu Phe Thr Gly Val Val Pro Ile Leu Val Glu Leu 515 520 525Asp Gly Asp Val Asn Gly His Lys Phe Ser Val Ser Gly Glu Gly Glu 530 535 540Gly Asp Ala Thr Tyr Gly Lys Leu Thr Leu Lys Phe Ile Cys Thr Thr545 550 555 560Gly Lys Leu Pro Val Pro Trp Pro Thr Leu Val Thr Thr Leu Thr Tyr 565 570 575Gly Val Gln Cys Phe Ser Arg Tyr Pro Asp His Met Lys Gln His Asp 580 585 590Phe Phe Lys Ser Ala Met Pro Glu Gly Tyr Val Gln Glu Arg Thr Ile 595 600 605Phe Phe Lys Asp Asp Gly Asn Tyr Lys Thr Arg Ala Glu Val Lys Phe 610 615 620Glu Gly Asp Thr Leu Val Asn Arg Ile Glu Leu Lys Gly Ile Asp Phe625 630 635 640Lys Glu Asp Gly Asn Ile Leu Gly His Lys Leu Glu Tyr Asn Tyr Asn 645 650 655Ser His Asn Val Tyr Ile Met Ala Asp Lys Gln Lys Asn Gly Ile Lys 660 665 670Val Asn Phe Lys Ile Arg His Asn Ile Glu Asp Gly Ser Val Gln Leu 675 680 685Ala Asp His Tyr Gln Gln Asn Thr Pro Ile Gly Asp Gly Pro Val Leu 690 695 700Leu Pro Asp Asn His Tyr Leu Ser Thr Gln Ser Ala Leu Ser Lys Asp705 710 715 720Pro Asn Glu Lys Arg Asp His Met Val Leu Leu Glu Phe Val Thr Ala 725 730 735Ala Gly Ile Thr Leu Gly Met Asp Glu Leu Tyr Lys 740 74541689PRTArtificial SequenceMouse dN e1 with EGFP tag polypeptide 41Met Ala Ala Ala Ala Ala Thr Ala Ala Ala Ala Ala Ala Pro Ser Gly1 5 10 15Gly Gly Gly Gly Gly Glu Glu Glu Arg Leu Glu Glu Lys Pro Ala Val 20 25 30Pro Glu Ala Ser Ala Ser Pro Lys Gln Arg Arg Ser Ile Ile Arg Asp 35 40 45Arg Gly Pro Met Tyr Asp Asp Pro Thr Leu Pro Glu Gly Trp Thr Arg 50 55 60Lys Leu Lys Gln Arg Lys Ser Gly Arg Ser Ala Gly Lys Tyr Asp Val65 70 75 80Tyr Leu Ile Asn Pro Gln Gly Lys Ala Phe Arg Ser Lys Val Glu Leu 85 90 95Ile Ala Tyr Phe Glu Lys Val Gly Asp Thr Ser Leu Asp Pro Asn Asp 100 105 110Phe Asp Phe Thr Val Thr Gly Arg Gly Ser Pro Ser Arg Arg Glu Gln 115 120 125Lys Pro Pro Lys Lys Pro Lys Ser Pro Lys Ala Pro Gly Thr Gly Arg 130 135 140Gly Arg Gly Arg Pro Lys Gly Ser Gly 
Thr Gly Arg Pro Lys Ala Ala145 150 155 160Ala Ser Glu Gly Val Gln Val Lys Arg Val Leu Glu Lys Ser Pro Gly 165 170 175Lys Leu Val Val Lys Met Pro Phe Gln Ala Ser Pro Gly Gly Lys Gly 180 185 190Glu Gly Gly Gly Ala Thr Thr Ser Ala Gln Val Met Val Ile Lys Arg 195 200 205Pro Gly Arg Lys Arg Lys Ala Glu Ala Asp Pro Gln Ala Ile Pro Lys 210 215 220Lys Arg Gly Arg Lys Pro Gly Ser Val Val Ala Ala Ala Ala Ala Glu225 230 235 240Ala Lys Lys Lys Ala Val Lys Glu Ser Ser Ile Arg Ser Val His Glu 245 250 255Thr Val Leu Pro Ile Lys Lys Arg Lys Thr Arg Glu Thr Val Ser Ile 260 265 270Glu Val Lys Glu Val Val Lys Pro Leu Leu Val Ser Thr Leu Gly Glu 275 280 285Lys Ser Gly Lys Gly Leu Lys Thr Cys Lys Ser Pro Gly Arg Lys Ser 290 295 300Lys Glu Ser Ser Pro Lys Gly Arg Ser Ser Ser Ala Ser Ser Pro Pro305 310 315 320Lys Lys Glu His His His His His His His Ser Glu Ser Thr Lys Ala 325 330 335Pro Met Pro Leu Leu Pro Ser Pro Pro Pro Pro Glu Pro Glu Ser Ser 340 345 350Glu Asp Pro Ile Ser Pro Pro Glu Pro Gln Asp Leu Ser Ser Ser Ile 355 360 365Cys Lys Glu Glu Lys Met Pro Arg Gly Gly Ser Leu Glu Ser Asp Gly 370 375 380Cys Pro Lys Glu Pro Ala Lys Thr Gln Pro Met Val Ala Thr Thr Thr385 390 395 400Thr Val Ala Glu Lys Tyr Lys His Arg Gly Glu Gly Glu Arg Lys Asp 405 410 415Ile Val Ser Ser Ser Met Pro Arg Pro Asn Arg Glu Glu Pro Val Asp 420 425 430Ser Arg Thr Pro Val Thr Glu Arg Val Ser Cys Lys Asp Pro Pro Val 435 440 445Ala Thr Met Val Ser Lys Gly Glu Glu Leu Phe Thr Gly Val Val Pro 450 455 460Ile Leu Val Glu Leu Asp Gly Asp Val Asn Gly His Lys Phe Ser Val465 470 475 480Ser Gly Glu Gly Glu Gly Asp Ala Thr Tyr Gly Lys Leu Thr Leu Lys 485 490 495Phe Ile Cys Thr Thr Gly Lys Leu Pro Val Pro Trp Pro Thr Leu Val 500 505 510Thr Thr Leu Thr Tyr Gly Val Gln Cys Phe Ser Arg Tyr Pro Asp His 515 520 525Met Lys Gln His Asp Phe Phe Lys Ser Ala Met Pro Glu Gly Tyr Val 530 535 540Gln Glu Arg Thr Ile Phe Phe Lys Asp Asp Gly Asn Tyr Lys Thr Arg545 550 555 560Ala Glu Val Lys Phe Glu Gly Asp Thr Leu Val Asn Arg Ile Glu Leu 565 
570 575Lys Gly Ile Asp Phe Lys Glu Asp Gly Asn Ile Leu Gly His Lys Leu 580 585 590Glu Tyr Asn Tyr Asn Ser His Asn Val Tyr Ile Met Ala Asp Lys Gln 595 600 605Lys Asn Gly Ile Lys Val Asn Phe Lys Ile Arg His Asn Ile Glu Asp 610 615 620Gly Ser Val Gln Leu Ala Asp His Tyr Gln Gln Asn Thr Pro Ile Gly625 630 635 640Asp Gly Pro Val Leu Leu Pro Asp Asn His Tyr Leu Ser Thr Gln Ser 645 650 655Ala Leu Ser Lys Asp Pro Asn Glu Lys Arg Asp His Met Val Leu Leu 660 665 670Glu Phe Val Thr Ala Ala Gly Ile Thr Leu Gly Met Asp Glu Leu Tyr 675 680 685Lys42516PRTArtificial SequenceMouse dNC e1 with EGFP tag polypeptide 42Met Ala Ala Ala Ala Ala Thr Ala Ala Ala Ala Ala Ala Pro Ser Gly1 5 10 15Gly Gly Gly Gly Gly Glu Glu Glu Arg Leu Glu Glu Lys Pro Ala Val 20 25 30Pro Glu Ala Ser Ala Ser Pro Lys Gln Arg Arg Ser Ile Ile Arg Asp 35 40 45Arg Gly Pro Met Tyr Asp Asp Pro Thr Leu Pro Glu Gly Trp Thr Arg 50 55 60Lys Leu Lys Gln Arg Lys Ser Gly Arg Ser Ala Gly Lys Tyr Asp Val65 70 75 80Tyr Leu Ile Asn Pro Gln Gly Lys Ala Phe Arg Ser Lys Val Glu Leu 85 90 95Ile Ala Tyr Phe Glu Lys Val Gly Asp Thr Ser Leu Asp Pro Asn Asp 100 105 110Phe Asp Phe Thr Val Thr Gly Arg Gly Ser Pro Ser Arg Arg Glu Gln 115 120 125Lys Pro Pro Lys Lys Pro Lys Ser Pro Lys Ala Pro Gly Thr Gly Arg 130 135 140Gly Arg Gly Arg Pro Lys Gly Ser Gly Thr Gly Arg Pro Lys Ala Ala145 150 155 160Ala Ser Glu Gly Val Gln Val Lys Arg Val Leu Glu Lys Ser Pro Gly 165 170 175Lys Leu Val Val Lys Met Pro Phe Gln Ala Ser Pro Gly Gly Lys Gly 180 185 190Glu Gly Gly Gly Ala Thr Thr Ser Ala Gln Val Met Val Ile Lys Arg 195 200 205Pro Gly Arg Lys Arg Lys Ala Glu Ala Asp Pro Gln Ala Ile Pro Lys 210 215 220Lys Arg Gly Arg Lys Pro Gly Ser Val Val Ala Ala Ala Ala Ala Glu225 230 235 240Ala Lys Lys Lys Ala Val Lys Glu Ser Ser Ile Arg Ser Val His Glu 245 250 255Thr Val Leu Pro Ile Lys Lys Arg Lys Thr Arg Glu Thr Val Gly Ser 260 265 270Ser Gly Ser Ser Gly Met Val Ser Lys Gly Glu Glu Leu Phe Thr Gly 275 280 285Val Val Pro Ile Leu Val Glu Leu Asp Gly Asp Val Asn 
Gly His Lys 290 295 300Phe Ser Val Ser Gly Glu Gly Glu Gly Asp Ala Thr Tyr Gly Lys Leu305 310 315 320Thr Leu Lys Phe Ile Cys Thr Thr Gly Lys Leu Pro Val Pro Trp Pro 325 330 335Thr Leu Val Thr Thr Leu Thr Tyr Gly Val Gln Cys Phe Ser Arg Tyr 340 345 350Pro Asp His Met Lys Gln His Asp Phe Phe Lys Ser Ala Met Pro Glu 355 360
365Gly Tyr Val Gln Glu Arg Thr Ile Phe Phe Lys Asp Asp Gly Asn Tyr 370 375 380Lys Thr Arg Ala Glu Val Lys Phe Glu Gly Asp Thr Leu Val Asn Arg385 390 395 400Ile Glu Leu Lys Gly Ile Asp Phe Lys Glu Asp Gly Asn Ile Leu Gly 405 410 415His Lys Leu Glu Tyr Asn Tyr Asn Ser His Asn Val Tyr Ile Met Ala 420 425 430Asp Lys Gln Lys Asn Gly Ile Lys Val Asn Phe Lys Ile Arg His Asn 435 440 445Ile Glu Asp Gly Ser Val Gln Leu Ala Asp His Tyr Gln Gln Asn Thr 450 455 460Pro Ile Gly Asp Gly Pro Val Leu Leu Pro Asp Asn His Tyr Leu Ser465 470 475 480Thr Gln Ser Ala Leu Ser Lys Asp Pro Asn Glu Lys Arg Asp His Met 485 490 495Val Leu Leu Glu Phe Val Thr Ala Ala Gly Ile Thr Leu Gly Met Asp 500 505 510Glu Leu Tyr Lys 51543432PRTArtificial SequenceMouse dNIC e1 with EGFP tag polypeptide 43Met Ala Ala Ala Ala Ala Thr Ala Ala Ala Ala Ala Ala Pro Ser Gly1 5 10 15Gly Gly Gly Gly Gly Glu Glu Glu Arg Leu Glu Glu Lys Pro Ala Val 20 25 30Pro Glu Ala Ser Ala Ser Pro Lys Gln Arg Arg Ser Ile Ile Arg Asp 35 40 45Arg Gly Pro Met Tyr Asp Asp Pro Thr Leu Pro Glu Gly Trp Thr Arg 50 55 60Lys Leu Lys Gln Arg Lys Ser Gly Arg Ser Ala Gly Lys Tyr Asp Val65 70 75 80Tyr Leu Ile Asn Pro Gln Gly Lys Ala Phe Arg Ser Lys Val Glu Leu 85 90 95Ile Ala Tyr Phe Glu Lys Val Gly Asp Thr Ser Leu Asp Pro Asn Asp 100 105 110Phe Asp Phe Thr Val Thr Gly Arg Gly Ser Pro Ser Arg Arg Glu Gln 115 120 125Lys Pro Pro Gly Ser Ser Gly Ser Ser Gly Pro Lys Lys Lys Arg Lys 130 135 140Val Pro Gly Ser Val Val Ala Ala Ala Ala Ala Glu Ala Lys Lys Lys145 150 155 160Ala Val Lys Glu Ser Ser Ile Arg Ser Val His Glu Thr Val Leu Pro 165 170 175Ile Lys Lys Arg Lys Thr Arg Glu Thr Val Gly Ser Ser Gly Ser Ser 180 185 190Gly Met Val Ser Lys Gly Glu Glu Leu Phe Thr Gly Val Val Pro Ile 195 200 205Leu Val Glu Leu Asp Gly Asp Val Asn Gly His Lys Phe Ser Val Ser 210 215 220Gly Glu Gly Glu Gly Asp Ala Thr Tyr Gly Lys Leu Thr Leu Lys Phe225 230 235 240Ile Cys Thr Thr Gly Lys Leu Pro Val Pro Trp Pro Thr Leu Val Thr 245 250 255Thr Leu Thr Tyr Gly Val Gln Cys Phe 
Ser Arg Tyr Pro Asp His Met 260 265 270Lys Gln His Asp Phe Phe Lys Ser Ala Met Pro Glu Gly Tyr Val Gln 275 280 285Glu Arg Thr Ile Phe Phe Lys Asp Asp Gly Asn Tyr Lys Thr Arg Ala 290 295 300Glu Val Lys Phe Glu Gly Asp Thr Leu Val Asn Arg Ile Glu Leu Lys305 310 315 320Gly Ile Asp Phe Lys Glu Asp Gly Asn Ile Leu Gly His Lys Leu Glu 325 330 335Tyr Asn Tyr Asn Ser His Asn Val Tyr Ile Met Ala Asp Lys Gln Lys 340 345 350Asn Gly Ile Lys Val Asn Phe Lys Ile Arg His Asn Ile Glu Asp Gly 355 360 365Ser Val Gln Leu Ala Asp His Tyr Gln Gln Asn Thr Pro Ile Gly Asp 370 375 380Gly Pro Val Leu Leu Pro Asp Asn His Tyr Leu Ser Thr Gln Ser Ala385 390 395 400Leu Ser Lys Asp Pro Asn Glu Lys Arg Asp His Met Val Leu Leu Glu 405 410 415Phe Val Thr Ala Ala Gly Ile Thr Leu Gly Met Asp Glu Leu Tyr Lys 420 425 43044731PRTArtificial SequenceMouse WT e2 with EGFP tag polypeptide 44Met Val Ala Gly Met Leu Gly Leu Arg Glu Glu Lys Ser Glu Asp Gln1 5 10 15Asp Leu Gln Gly Leu Arg Asp Lys Pro Leu Lys Phe Lys Lys Ala Lys 20 25 30Lys Asp Lys Lys Glu Asp Lys Glu Gly Lys His Glu Pro Leu Gln Pro 35 40 45Ser Ala His His Ser Ala Glu Pro Ala Glu Ala Gly Lys Ala Glu Thr 50 55 60Ser Glu Ser Ser Gly Ser Ala Pro Ala Val Pro Glu Ala Ser Ala Ser65 70 75 80Pro Lys Gln Arg Arg Ser Ile Ile Arg Asp Arg Gly Pro Met Tyr Asp 85 90 95Asp Pro Thr Leu Pro Glu Gly Trp Thr Arg Lys Leu Lys Gln Arg Lys 100 105 110Ser Gly Arg Ser Ala Gly Lys Tyr Asp Val Tyr Leu Ile Asn Pro Gln 115 120 125Gly Lys Ala Phe Arg Ser Lys Val Glu Leu Ile Ala Tyr Phe Glu Lys 130 135 140Val Gly Asp Thr Ser Leu Asp Pro Asn Asp Phe Asp Phe Thr Val Thr145 150 155 160Gly Arg Gly Ser Pro Ser Arg Arg Glu Gln Lys Pro Pro Lys Lys Pro 165 170 175Lys Ser Pro Lys Ala Pro Gly Thr Gly Arg Gly Arg Gly Arg Pro Lys 180 185 190Gly Ser Gly Thr Gly Arg Pro Lys Ala Ala Ala Ser Glu Gly Val Gln 195 200 205Val Lys Arg Val Leu Glu Lys Ser Pro Gly Lys Leu Val Val Lys Met 210 215 220Pro Phe Gln Ala Ser Pro Gly Gly Lys Gly Glu Gly Gly Gly Ala Thr225 230 235 240Thr Ser Ala Gln Val 
Met Val Ile Lys Arg Pro Gly Arg Lys Arg Lys 245 250 255Ala Glu Ala Asp Pro Gln Ala Ile Pro Lys Lys Arg Gly Arg Lys Pro 260 265 270Gly Ser Val Val Ala Ala Ala Ala Ala Glu Ala Lys Lys Lys Ala Val 275 280 285Lys Glu Ser Ser Ile Arg Ser Val His Glu Thr Val Leu Pro Ile Lys 290 295 300Lys Arg Lys Thr Arg Glu Thr Val Ser Ile Glu Val Lys Glu Val Val305 310 315 320Lys Pro Leu Leu Val Ser Thr Leu Gly Glu Lys Ser Gly Lys Gly Leu 325 330 335Lys Thr Cys Lys Ser Pro Gly Arg Lys Ser Lys Glu Ser Ser Pro Lys 340 345 350Gly Arg Ser Ser Ser Ala Ser Ser Pro Pro Lys Lys Glu His His His 355 360 365His His His His Ser Glu Ser Thr Lys Ala Pro Met Pro Leu Leu Pro 370 375 380Ser Pro Pro Pro Pro Glu Pro Glu Ser Ser Glu Asp Pro Ile Ser Pro385 390 395 400Pro Glu Pro Gln Asp Leu Ser Ser Ser Ile Cys Lys Glu Glu Lys Met 405 410 415Pro Arg Gly Gly Ser Leu Glu Ser Asp Gly Cys Pro Lys Glu Pro Ala 420 425 430Lys Thr Gln Pro Met Val Ala Thr Thr Thr Thr Val Ala Glu Lys Tyr 435 440 445Lys His Arg Gly Glu Gly Glu Arg Lys Asp Ile Val Ser Ser Ser Met 450 455 460Pro Arg Pro Asn Arg Glu Glu Pro Val Asp Ser Arg Thr Pro Val Thr465 470 475 480Glu Arg Val Ser Cys Lys Asp Pro Pro Val Ala Thr Met Val Ser Lys 485 490 495Gly Glu Glu Leu Phe Thr Gly Val Val Pro Ile Leu Val Glu Leu Asp 500 505 510Gly Asp Val Asn Gly His Lys Phe Ser Val Ser Gly Glu Gly Glu Gly 515 520 525Asp Ala Thr Tyr Gly Lys Leu Thr Leu Lys Phe Ile Cys Thr Thr Gly 530 535 540Lys Leu Pro Val Pro Trp Pro Thr Leu Val Thr Thr Leu Thr Tyr Gly545 550 555 560Val Gln Cys Phe Ser Arg Tyr Pro Asp His Met Lys Gln His Asp Phe 565 570 575Phe Lys Ser Ala Met Pro Glu Gly Tyr Val Gln Glu Arg Thr Ile Phe 580 585 590Phe Lys Asp Asp Gly Asn Tyr Lys Thr Arg Ala Glu Val Lys Phe Glu 595 600 605Gly Asp Thr Leu Val Asn Arg Ile Glu Leu Lys Gly Ile Asp Phe Lys 610 615 620Glu Asp Gly Asn Ile Leu Gly His Lys Leu Glu Tyr Asn Tyr Asn Ser625 630 635 640His Asn Val Tyr Ile Met Ala Asp Lys Gln Lys Asn Gly Ile Lys Val 645 650 655Asn Phe Lys Ile Arg His Asn Ile Glu Asp Gly Ser Val 
Gln Leu Ala 660 665 670Asp His Tyr Gln Gln Asn Thr Pro Ile Gly Asp Gly Pro Val Leu Leu 675 680 685Pro Asp Asn His Tyr Leu Ser Thr Gln Ser Ala Leu Ser Lys Asp Pro 690 695 700Asn Glu Lys Arg Asp His Met Val Leu Leu Glu Phe Val Thr Ala Ala705 710 715 720Gly Ile Thr Leu Gly Met Asp Glu Leu Tyr Lys 725 73045672PRTArtificial SequenceMouse dN e2 with EGFP tag polypeptide 45Met Val Ala Gly Met Leu Gly Leu Arg Glu Glu Lys Pro Ala Val Pro1 5 10 15Glu Ala Ser Ala Ser Pro Lys Gln Arg Arg Ser Ile Ile Arg Asp Arg 20 25 30Gly Pro Met Tyr Asp Asp Pro Thr Leu Pro Glu Gly Trp Thr Arg Lys 35 40 45Leu Lys Gln Arg Lys Ser Gly Arg Ser Ala Gly Lys Tyr Asp Val Tyr 50 55 60Leu Ile Asn Pro Gln Gly Lys Ala Phe Arg Ser Lys Val Glu Leu Ile65 70 75 80Ala Tyr Phe Glu Lys Val Gly Asp Thr Ser Leu Asp Pro Asn Asp Phe 85 90 95Asp Phe Thr Val Thr Gly Arg Gly Ser Pro Ser Arg Arg Glu Gln Lys 100 105 110Pro Pro Lys Lys Pro Lys Ser Pro Lys Ala Pro Gly Thr Gly Arg Gly 115 120 125Arg Gly Arg Pro Lys Gly Ser Gly Thr Gly Arg Pro Lys Ala Ala Ala 130 135 140Ser Glu Gly Val Gln Val Lys Arg Val Leu Glu Lys Ser Pro Gly Lys145 150 155 160Leu Val Val Lys Met Pro Phe Gln Ala Ser Pro Gly Gly Lys Gly Glu 165 170 175Gly Gly Gly Ala Thr Thr Ser Ala Gln Val Met Val Ile Lys Arg Pro 180 185 190Gly Arg Lys Arg Lys Ala Glu Ala Asp Pro Gln Ala Ile Pro Lys Lys 195 200 205Arg Gly Arg Lys Pro Gly Ser Val Val Ala Ala Ala Ala Ala Glu Ala 210 215 220Lys Lys Lys Ala Val Lys Glu Ser Ser Ile Arg Ser Val His Glu Thr225 230 235 240Val Leu Pro Ile Lys Lys Arg Lys Thr Arg Glu Thr Val Ser Ile Glu 245 250 255Val Lys Glu Val Val Lys Pro Leu Leu Val Ser Thr Leu Gly Glu Lys 260 265 270Ser Gly Lys Gly Leu Lys Thr Cys Lys Ser Pro Gly Arg Lys Ser Lys 275 280 285Glu Ser Ser Pro Lys Gly Arg Ser Ser Ser Ala Ser Ser Pro Pro Lys 290 295 300Lys Glu His His His His His His His Ser Glu Ser Thr Lys Ala Pro305 310 315 320Met Pro Leu Leu Pro Ser Pro Pro Pro Pro Glu Pro Glu Ser Ser Glu 325 330 335Asp Pro Ile Ser Pro Pro Glu Pro Gln Asp Leu Ser Ser Ser Ile 
Cys 340 345 350Lys Glu Glu Lys Met Pro Arg Gly Gly Ser Leu Glu Ser Asp Gly Cys 355 360 365Pro Lys Glu Pro Ala Lys Thr Gln Pro Met Val Ala Thr Thr Thr Thr 370 375 380Val Ala Glu Lys Tyr Lys His Arg Gly Glu Gly Glu Arg Lys Asp Ile385 390 395 400Val Ser Ser Ser Met Pro Arg Pro Asn Arg Glu Glu Pro Val Asp Ser 405 410 415Arg Thr Pro Val Thr Glu Arg Val Ser Cys Lys Asp Pro Pro Val Ala 420 425 430Thr Met Val Ser Lys Gly Glu Glu Leu Phe Thr Gly Val Val Pro Ile 435 440 445Leu Val Glu Leu Asp Gly Asp Val Asn Gly His Lys Phe Ser Val Ser 450 455 460Gly Glu Gly Glu Gly Asp Ala Thr Tyr Gly Lys Leu Thr Leu Lys Phe465 470 475 480Ile Cys Thr Thr Gly Lys Leu Pro Val Pro Trp Pro Thr Leu Val Thr 485 490 495Thr Leu Thr Tyr Gly Val Gln Cys Phe Ser Arg Tyr Pro Asp His Met 500 505 510Lys Gln His Asp Phe Phe Lys Ser Ala Met Pro Glu Gly Tyr Val Gln 515 520 525Glu Arg Thr Ile Phe Phe Lys Asp Asp Gly Asn Tyr Lys Thr Arg Ala 530 535 540Glu Val Lys Phe Glu Gly Asp Thr Leu Val Asn Arg Ile Glu Leu Lys545 550 555 560Gly Ile Asp Phe Lys Glu Asp Gly Asn Ile Leu Gly His Lys Leu Glu 565 570 575Tyr Asn Tyr Asn Ser His Asn Val Tyr Ile Met Ala Asp Lys Gln Lys 580 585 590Asn Gly Ile Lys Val Asn Phe Lys Ile Arg His Asn Ile Glu Asp Gly 595 600 605Ser Val Gln Leu Ala Asp His Tyr Gln Gln Asn Thr Pro Ile Gly Asp 610 615 620Gly Pro Val Leu Leu Pro Asp Asn His Tyr Leu Ser Thr Gln Ser Ala625 630 635 640Leu Ser Lys Asp Pro Asn Glu Lys Arg Asp His Met Val Leu Leu Glu 645 650 655Phe Val Thr Ala Ala Gly Ile Thr Leu Gly Met Asp Glu Leu Tyr Lys 660 665 67046499PRTArtificial SequenceMouse dNC e2 with EGFP tag polypeptide 46Met Val Ala Gly Met Leu Gly Leu Arg Glu Glu Lys Pro Ala Val Pro1 5 10 15Glu Ala Ser Ala Ser Pro Lys Gln Arg Arg Ser Ile Ile Arg Asp Arg 20 25 30Gly Pro Met Tyr Asp Asp Pro Thr Leu Pro Glu Gly Trp Thr Arg Lys 35 40 45Leu Lys Gln Arg Lys Ser Gly Arg Ser Ala Gly Lys Tyr Asp Val Tyr 50 55 60Leu Ile Asn Pro Gln Gly Lys Ala Phe Arg Ser Lys Val Glu Leu Ile65 70 75 80Ala Tyr Phe Glu Lys Val Gly Asp Thr Ser 
Leu Asp Pro Asn Asp Phe 85 90 95Asp Phe Thr Val Thr Gly Arg Gly Ser Pro Ser Arg Arg Glu Gln Lys 100 105 110Pro Pro Lys Lys Pro Lys Ser Pro Lys Ala Pro Gly Thr Gly Arg Gly 115 120 125Arg Gly Arg Pro Lys Gly Ser Gly Thr Gly Arg Pro Lys Ala Ala Ala 130 135 140Ser Glu Gly Val Gln Val Lys Arg Val Leu Glu Lys Ser Pro Gly Lys145 150 155 160Leu Val Val Lys Met Pro Phe Gln Ala Ser Pro Gly Gly Lys Gly Glu 165 170 175Gly Gly Gly Ala Thr Thr Ser Ala Gln Val Met Val Ile Lys Arg Pro 180 185 190Gly Arg Lys Arg Lys Ala Glu Ala Asp Pro Gln Ala Ile Pro Lys Lys 195 200 205Arg Gly Arg Lys Pro Gly Ser Val Val Ala Ala Ala Ala Ala Glu Ala 210 215 220Lys Lys Lys Ala Val Lys Glu Ser Ser Ile Arg Ser Val His Glu Thr225 230 235 240Val Leu Pro Ile Lys Lys Arg Lys Thr Arg Glu Thr Val Gly Ser Ser 245 250 255Gly Ser Ser Gly Met Val Ser Lys Gly Glu Glu Leu Phe Thr Gly Val 260 265 270Val Pro Ile Leu Val Glu Leu Asp Gly Asp Val Asn Gly His Lys Phe 275 280 285Ser Val Ser Gly Glu Gly Glu Gly Asp Ala Thr Tyr Gly Lys Leu Thr 290 295 300Leu Lys Phe Ile Cys Thr Thr Gly Lys Leu Pro Val Pro Trp Pro Thr305 310 315 320Leu Val Thr Thr Leu Thr Tyr Gly Val Gln Cys Phe Ser Arg Tyr Pro 325 330 335Asp His Met Lys Gln His Asp Phe Phe Lys Ser Ala Met Pro Glu Gly 340 345 350Tyr Val Gln Glu Arg Thr Ile Phe Phe Lys Asp Asp Gly Asn Tyr Lys 355 360 365Thr Arg Ala Glu Val Lys Phe Glu Gly Asp Thr Leu Val Asn Arg Ile 370 375 380Glu Leu Lys Gly Ile Asp Phe Lys Glu Asp Gly Asn Ile Leu Gly His385 390 395 400Lys Leu Glu Tyr Asn Tyr Asn Ser His Asn Val Tyr Ile Met Ala Asp 405 410 415Lys Gln Lys Asn Gly Ile Lys Val Asn Phe Lys Ile Arg His Asn Ile 420 425 430Glu Asp Gly Ser Val Gln Leu Ala Asp His Tyr Gln Gln Asn Thr Pro 435 440 445Ile Gly Asp Gly
Pro Val Leu Leu Pro Asp Asn His Tyr Leu Ser Thr 450 455 460Gln Ser Ala Leu Ser Lys Asp Pro Asn Glu Lys Arg Asp His Met Val465 470 475 480Leu Leu Glu Phe Val Thr Ala Ala Gly Ile Thr Leu Gly Met Asp Glu 485 490 495Leu Tyr Lys47415PRTArtificial SequenceMouse dNIC e2 with EGFP tag polypeptide 47Met Val Ala Gly Met Leu Gly Leu Arg Glu Glu Lys Pro Ala Val Pro1 5 10 15Glu Ala Ser Ala Ser Pro Lys Gln Arg Arg Ser Ile Ile Arg Asp Arg 20 25 30Gly Pro Met Tyr Asp Asp Pro Thr Leu Pro Glu Gly Trp Thr Arg Lys 35 40 45Leu Lys Gln Arg Lys Ser Gly Arg Ser Ala Gly Lys Tyr Asp Val Tyr 50 55 60Leu Ile Asn Pro Gln Gly Lys Ala Phe Arg Ser Lys Val Glu Leu Ile65 70 75 80Ala Tyr Phe Glu Lys Val Gly Asp Thr Ser Leu Asp Pro Asn Asp Phe 85 90 95Asp Phe Thr Val Thr Gly Arg Gly Ser Pro Ser Arg Arg Glu Gln Lys 100 105 110Pro Pro Gly Ser Ser Gly Ser Ser Gly Pro Lys Lys Lys Arg Lys Val 115 120 125Pro Gly Ser Val Val Ala Ala Ala Ala Ala Glu Ala Lys Lys Lys Ala 130 135 140Val Lys Glu Ser Ser Ile Arg Ser Val His Glu Thr Val Leu Pro Ile145 150 155 160Lys Lys Arg Lys Thr Arg Glu Thr Val Gly Ser Ser Gly Ser Ser Gly 165 170 175Met Val Ser Lys Gly Glu Glu Leu Phe Thr Gly Val Val Pro Ile Leu 180 185 190Val Glu Leu Asp Gly Asp Val Asn Gly His Lys Phe Ser Val Ser Gly 195 200 205Glu Gly Glu Gly Asp Ala Thr Tyr Gly Lys Leu Thr Leu Lys Phe Ile 210 215 220Cys Thr Thr Gly Lys Leu Pro Val Pro Trp Pro Thr Leu Val Thr Thr225 230 235 240Leu Thr Tyr Gly Val Gln Cys Phe Ser Arg Tyr Pro Asp His Met Lys 245 250 255Gln His Asp Phe Phe Lys Ser Ala Met Pro Glu Gly Tyr Val Gln Glu 260 265 270Arg Thr Ile Phe Phe Lys Asp Asp Gly Asn Tyr Lys Thr Arg Ala Glu 275 280 285Val Lys Phe Glu Gly Asp Thr Leu Val Asn Arg Ile Glu Leu Lys Gly 290 295 300Ile Asp Phe Lys Glu Asp Gly Asn Ile Leu Gly His Lys Leu Glu Tyr305 310 315 320Asn Tyr Asn Ser His Asn Val Tyr Ile Met Ala Asp Lys Gln Lys Asn 325 330 335Gly Ile Lys Val Asn Phe Lys Ile Arg His Asn Ile Glu Asp Gly Ser 340 345 350Val Gln Leu Ala Asp His Tyr Gln Gln Asn Thr Pro Ile Gly Asp Gly 
355 360 365Pro Val Leu Leu Pro Asp Asn His Tyr Leu Ser Thr Gln Ser Ala Leu 370 375 380Ser Lys Asp Pro Asn Glu Lys Arg Asp His Met Val Leu Leu Glu Phe385 390 395 400Val Thr Ala Ala Gly Ile Thr Leu Gly Met Asp Glu Leu Tyr Lys 405 410 415482274DNAArtificial SequenceMouse WT e1 with EGFP tag cDNA 48atggccgccg ctgccgccac cgccgccgcc gccgccgcgc cgagcggagg aggaggagga 60ggcgaggagg agagactgga ggaaaagtca gaagaccagg atctccaggg cctcagagac 120aagccactga agtttaagaa ggcgaagaaa gacaagaagg aggacaaaga aggcaagcat 180gagccactac aaccttcagc ccaccattct gcagagccag cagaggcagg caaagcagaa 240acatcagaaa gctcaggctc tgccccagca gtgccagaag cctcggcttc ccccaaacag 300cggcgctcca ttatccgtga ccggggacct atgtatgatg accccacctt gcctgaaggt 360tggacacgaa agcttaaaca aaggaagtct ggccgatctg ctggaaagta tgatgtatat 420ttgatcaatc cccagggaaa agcttttcgc tctaaagtag aattgattgc atactttgaa 480aaggtgggag acacctcctt ggaccctaat gattttgact tcacggtaac tgggagaggg 540agcccctcca ggagagagca gaaaccacct aagaagccca aatctcccaa agctccagga 600actggcaggg gtcggggacg ccccaaaggg agcggcactg ggagaccaaa ggcagcagca 660tcagaaggtg ttcaggtgaa aagggtcctg gagaagagcc ctgggaaact tgttgtcaag 720atgcctttcc aagcatcgcc tgggggtaag ggtgagggag gtggggctac cacatctgcc 780caggtcatgg tgatcaaacg ccctggcaga aagcgaaaag ctgaagctga cccccaggcc 840attcctaaga aacggggtag aaagcctggg agtgtggtgg cagctgctgc agctgaggcc 900aaaaagaaag ccgtgaagga gtcttccata cggtctgtgc atgagactgt gctccccatc 960aagaagcgca agacccggga gacggtcagc atcgaggtca aggaagtggt gaagcccctg 1020ctggtgtcca cccttggtga gaaaagcggg aagggactga agacctgcaa gagccctggg 1080cgtaaaagca aggagagcag ccccaagggg cgcagcagca gtgcctcctc cccacctaag 1140aaggagcacc atcatcacca ccatcactca gagtccacaa aggcccccat gccactgctc 1200ccatccccac ccccacctga gcctgagagc tctgaggacc ccatcagccc ccctgagcct 1260caggacttga gcagcagcat ctgcaaagaa gagaagatgc cccgaggagg ctcactggaa 1320agcgatggct gccccaagga gccagctaag actcagccta tggtcgccac cactaccaca 1380gttgcagaaa agtacaaaca ccgaggggag ggagagcgca aagacattgt ttcatcttcc 1440atgccaaggc caaacagaga ggagcctgtg 
gacagccgga cgcccgtgac cgagagagtt 1500agctgtaagg atccaccggt cgccaccatg gtgagcaagg gcgaggagct gttcaccggg 1560gtggtgccca tcctggtcga gctggacggc gacgtaaacg gccacaagtt cagcgtgtcc 1620ggcgagggcg agggcgatgc cacctacggc aagctgaccc tgaagttcat ctgcaccacc 1680ggcaagctgc ccgtgccctg gcccaccctc gtgaccaccc tgacctacgg cgtgcagtgc 1740ttcagccgct accccgacca catgaagcag cacgacttct tcaagtccgc catgcccgaa 1800ggctacgtcc aggagcgcac catcttcttc aaggacgacg gcaactacaa gacccgcgcc 1860gaggtgaagt tcgagggcga caccctggtg aaccgcatcg agctgaaggg catcgacttc 1920aaggaggacg gcaacatcct ggggcacaag ctggagtaca actacaacag ccacaacgtc 1980tatatcatgg ccgacaagca gaagaacggc atcaaggtga acttcaagat ccgccacaac 2040atcgaggacg gcagcgtgca gctcgccgac cactaccagc agaacacccc catcggcgac 2100ggccccgtgc tgctgcccga caaccactac ctgagcaccc agtccgccct gagcaaagac 2160cccaacgaga agcgcgatca catggtcctg ctggagttcg tgaccgccgc cgggatcact 2220ctcggcatgg acgagctgta caagtaactc ggcatggacg agctgtacaa gtaa 2274492070DNAArtificial SequenceMouse dN e1 with EGFP tag cDNA 49atggccgccg ctgccgccac cgccgccgcc gccgccgcgc cgagcggagg aggaggagga 60ggcgaggagg agagactgga ggaaaagcca gcagtgccag aagcctcggc ttcccccaaa 120cagcggcgct ccattatccg tgaccgggga cctatgtatg atgaccccac cttgcctgaa 180ggttggacac gaaagcttaa acaaaggaag tctggccgat ctgctggaaa gtatgatgta 240tatttgatca atccccaggg aaaagctttt cgctctaaag tagaattgat tgcatacttt 300gaaaaggtgg gagacacctc cttggaccct aatgattttg acttcacggt aactgggaga 360gggagcccct ccaggagaga gcagaaacca cctaagaagc ccaaatctcc caaagctcca 420ggaactggca ggggtcgggg acgccccaaa gggagcggca ctgggagacc aaaggcagca 480gcatcagaag gtgttcaggt gaaaagggtc ctggagaaga gccctgggaa acttgttgtc 540aagatgcctt tccaagcatc gcctgggggt aagggtgagg gaggtggggc taccacatct 600gcccaggtca tggtgatcaa acgccctggc agaaagcgaa aagctgaagc tgacccccag 660gccattccta agaaacgggg tagaaagcct gggagtgtgg tggcagctgc tgcagctgag 720gccaaaaaga aagccgtgaa ggagtcttcc atacggtctg tgcatgagac tgtgctcccc 780atcaagaagc gcaagacccg ggagacggtc agcatcgagg tcaaggaagt ggtgaagccc 840ctgctggtgt ccacccttgg tgagaaaagc 
gggaagggac tgaagacctg caagagccct 900gggcgtaaaa gcaaggagag cagccccaag gggcgcagca gcagtgcctc ctccccacct 960aagaaggagc accatcatca ccaccatcac tcagagtcca caaaggcccc catgccactg 1020ctcccatccc cacccccacc tgagcctgag agctctgagg accccatcag cccccctgag 1080cctcaggact tgagcagcag catctgcaaa gaagagaaga tgccccgagg aggctcactg 1140gaaagcgatg gctgccccaa ggagccagct aagactcagc ctatggtcgc caccactacc 1200acagttgcag aaaagtacaa acaccgaggg gagggagagc gcaaagacat tgtttcatct 1260tccatgccaa ggccaaacag agaggagcct gtggacagcc ggacgcccgt gaccgagaga 1320gttagctgta aggatccacc ggtcgccacc atggtgagca agggcgagga gctgttcacc 1380ggggtggtgc ccatcctggt cgagctggac ggcgacgtaa acggccacaa gttcagcgtg 1440tccggcgagg gcgagggcga tgccacctac ggcaagctga ccctgaagtt catctgcacc 1500accggcaagc tgcccgtgcc ctggcccacc ctcgtgacca ccctgaccta cggcgtgcag 1560tgcttcagcc gctaccccga ccacatgaag cagcacgact tcttcaagtc cgccatgccc 1620gaaggctacg tccaggagcg caccatcttc ttcaaggacg acggcaacta caagacccgc 1680gccgaggtga agttcgaggg cgacaccctg gtgaaccgca tcgagctgaa gggcatcgac 1740ttcaaggagg acggcaacat cctggggcac aagctggagt acaactacaa cagccacaac 1800gtctatatca tggccgacaa gcagaagaac ggcatcaagg tgaacttcaa gatccgccac 1860aacatcgagg acggcagcgt gcagctcgcc gaccactacc agcagaacac ccccatcggc 1920gacggccccg tgctgctgcc cgacaaccac tacctgagca cccagtccgc cctgagcaaa 1980gaccccaacg agaagcgcga tcacatggtc ctgctggagt tcgtgaccgc cgccgggatc 2040actctcggca tggacgagct gtacaagtaa 2070501551DNAArtificial SequenceMouse dNC e1 with EGFP tag cDNA 50atggccgccg ctgccgccac cgccgccgcc gccgccgcgc cgagcggagg aggaggagga 60ggcgaggagg agagactgga ggaaaagcca gcagtgccag aagcctcggc ttcccccaaa 120cagcggcgct ccattatccg tgaccgggga cctatgtatg atgaccccac cttgcctgaa 180ggttggacac gaaagcttaa acaaaggaag tctggccgat ctgctggaaa gtatgatgta 240tatttgatca atccccaggg aaaagctttt cgctctaaag tagaattgat tgcatacttt 300gaaaaggtgg gagacacctc cttggaccct aatgattttg acttcacggt aactgggaga 360gggagcccct ccaggagaga gcagaaacca cctaagaagc ccaaatctcc caaagctcca 420ggaactggca ggggtcgggg acgccccaaa gggagcggca 
ctgggagacc aaaggcagca 480gcatcagaag gtgttcaggt gaaaagggtc ctggagaaga gccctgggaa acttgttgtc 540aagatgcctt tccaagcatc gcctgggggt aagggtgagg gaggtggggc taccacatct 600gcccaggtca tggtgatcaa acgccctggc agaaagcgaa aagctgaagc tgacccccag 660gccattccta agaaacgggg tagaaagcct gggagtgtgg tggcagctgc tgcagctgag 720gccaaaaaga aagccgtgaa ggagtcttcc atacggtctg tgcatgagac tgtgctcccc 780atcaagaagc gcaagacccg ggagacggtc gggagctccg gcagttctgg aatggtgagc 840aagggcgagg agctgttcac cggggtggtg cccatcctgg tcgagctgga cggcgacgta 900aacggccaca agttcagcgt gtccggcgag ggcgagggcg atgccaccta cggcaagctg 960accctgaagt tcatctgcac caccggcaag ctgcccgtgc cctggcccac cctcgtgacc 1020accctgacct acggcgtgca gtgcttcagc cgctaccccg accacatgaa gcagcacgac 1080ttcttcaagt ccgccatgcc cgaaggctac gtccaggagc gcaccatctt cttcaaggac 1140gacggcaact acaagacccg cgccgaggtg aagttcgagg gcgacaccct ggtgaaccgc 1200atcgagctga agggcatcga cttcaaggag gacggcaaca tcctggggca caagctggag 1260tacaactaca acagccacaa cgtctatatc atggccgaca agcagaagaa cggcatcaag 1320gtgaacttca agatccgcca caacatcgag gacggcagcg tgcagctcgc cgaccactac 1380cagcagaaca cccccatcgg cgacggcccc gtgctgctgc ccgacaacca ctacctgagc 1440acccagtccg ccctgagcaa agaccccaac gagaagcgcg atcacatggt cctgctggag 1500ttcgtgaccg ccgccgggat cactctcggc atggacgagc tgtacaagta a 1551511299DNAArtificial SequenceMouse dNIC e1 with EGFP tag cDNA 51atggccgccg ctgccgccac cgccgccgcc gccgccgcgc cgagcggagg aggaggagga 60ggcgaggagg agagactgga ggaaaagcca gcagtgccag aagcctcggc ttcccccaaa 120cagcggcgct ccattatccg tgaccgggga cctatgtatg atgaccccac cttgcctgaa 180ggttggacac gaaagcttaa acaaaggaag tctggccgat ctgctggaaa gtatgatgta 240tatttgatca atccccaggg aaaagctttt cgctctaaag tagaattgat tgcatacttt 300gaaaaggtgg gagacacctc cttggaccct aatgattttg acttcacggt aactgggaga 360gggagcccct ccaggagaga gcagaaacca cctggatcca gtggcagctc tgggcccaag 420aaaaagcgga aggtgcctgg gagtgtggtg gcagctgctg cagctgaggc caaaaagaaa 480gccgtgaagg agtcttccat acggtctgtg catgagactg tgctccccat caagaagcgc 540aagacccggg agacggtcgg gagctccggc agttctggaa 
tggtgagcaa gggcgaggag 600ctgttcaccg gggtggtgcc catcctggtc gagctggacg gcgacgtaaa cggccacaag 660ttcagcgtgt ccggcgaggg cgagggcgat gccacctacg gcaagctgac cctgaagttc 720atctgcacca ccggcaagct gcccgtgccc tggcccaccc tcgtgaccac cctgacctac 780ggcgtgcagt gcttcagccg ctaccccgac cacatgaagc agcacgactt cttcaagtcc 840gccatgcccg aaggctacgt ccaggagcgc accatcttct tcaaggacga cggcaactac 900aagacccgcg ccgaggtgaa gttcgagggc gacaccctgg tgaaccgcat cgagctgaag 960ggcatcgact tcaaggagga cggcaacatc ctggggcaca agctggagta caactacaac 1020agccacaacg tctatatcat ggccgacaag cagaagaacg gcatcaaggt gaacttcaag 1080atccgccaca acatcgagga cggcagcgtg cagctcgccg accactacca gcagaacacc 1140cccatcggcg acggccccgt gctgctgccc gacaaccact acctgagcac ccagtccgcc 1200ctgagcaaag accccaacga gaagcgcgat cacatggtcc tgctggagtt cgtgaccgcc 1260gccgggatca ctctcggcat ggacgagctg tacaagtaa 1299522196DNAArtificial SequenceMouse WT e2 with EGFP tag cDNA 52atggtagctg ggatgttagg gctcagggag gaaaagtcag aagaccagga tctccagggc 60ctcagagaca agccactgaa gtttaagaag gcgaagaaag acaagaagga ggacaaagaa 120ggcaagcatg agccactaca accttcagcc caccattctg cagagccagc agaggcaggc 180aaagcagaaa catcagaaag ctcaggctct gccccagcag tgccagaagc ctcggcttcc 240cccaaacagc ggcgctccat tatccgtgac cggggaccta tgtatgatga ccccaccttg 300cctgaaggtt ggacacgaaa gcttaaacaa aggaagtctg gccgatctgc tggaaagtat 360gatgtatatt tgatcaatcc ccagggaaaa gcttttcgct ctaaagtaga attgattgca 420tactttgaaa aggtgggaga cacctccttg gaccctaatg attttgactt cacggtaact 480gggagaggga gcccctccag gagagagcag aaaccaccta agaagcccaa atctcccaaa 540gctccaggaa ctggcagggg tcggggacgc cccaaaggga gcggcactgg gagaccaaag 600gcagcagcat cagaaggtgt tcaggtgaaa agggtcctgg agaagagccc tgggaaactt 660gttgtcaaga tgcctttcca agcatcgcct gggggtaagg gtgagggagg tggggctacc 720acatctgccc aggtcatggt gatcaaacgc cctggcagaa agcgaaaagc tgaagctgac 780ccccaggcca ttcctaagaa acggggtaga aagcctggga gtgtggtggc agctgctgca 840gctgaggcca aaaagaaagc cgtgaaggag tcttccatac ggtctgtgca tgagactgtg 900ctccccatca agaagcgcaa gacccgggag acggtcagca tcgaggtcaa ggaagtggtg 
960aagcccctgc tggtgtccac ccttggtgag aaaagcggga agggactgaa gacctgcaag 1020agccctgggc gtaaaagcaa ggagagcagc cccaaggggc gcagcagcag tgcctcctcc 1080ccacctaaga aggagcacca tcatcaccac catcactcag agtccacaaa ggcccccatg 1140ccactgctcc catccccacc cccacctgag cctgagagct ctgaggaccc catcagcccc 1200cctgagcctc aggacttgag cagcagcatc tgcaaagaag agaagatgcc ccgaggaggc 1260tcactggaaa gcgatggctg ccccaaggag ccagctaaga ctcagcctat ggtcgccacc 1320actaccacag ttgcagaaaa gtacaaacac cgaggggagg gagagcgcaa agacattgtt 1380tcatcttcca tgccaaggcc aaacagagag gagcctgtgg acagccggac gcccgtgacc 1440gagagagtta gctgtaagga tccaccggtc gccaccatgg tgagcaaggg cgaggagctg 1500ttcaccgggg tggtgcccat cctggtcgag ctggacggcg acgtaaacgg ccacaagttc 1560agcgtgtccg gcgagggcga gggcgatgcc acctacggca agctgaccct gaagttcatc 1620tgcaccaccg gcaagctgcc cgtgccctgg cccaccctcg tgaccaccct gacctacggc 1680gtgcagtgct tcagccgcta ccccgaccac atgaagcagc acgacttctt caagtccgcc 1740atgcccgaag gctacgtcca ggagcgcacc atcttcttca aggacgacgg caactacaag 1800acccgcgccg aggtgaagtt cgagggcgac accctggtga accgcatcga gctgaagggc 1860atcgacttca aggaggacgg caacatcctg gggcacaagc tggagtacaa ctacaacagc 1920cacaacgtct atatcatggc cgacaagcag aagaacggca tcaaggtgaa cttcaagatc 1980cgccacaaca tcgaggacgg cagcgtgcag ctcgccgacc actaccagca gaacaccccc 2040atcggcgacg gccccgtgct gctgcccgac aaccactacc tgagcaccca gtccgccctg 2100agcaaagacc ccaacgagaa gcgcgatcac atggtcctgc tggagttcgt gaccgccgcc 2160gggatcactc tcggcatgga cgagctgtac aagtaa 2196532019DNAArtificial SequenceMouse dN e2 with EGFP tag cDNA 53atggtagctg ggatgttagg gctcagggag gaaaagccag cagtgccaga agcctcggct 60tcccccaaac agcggcgctc cattatccgt gaccggggac ctatgtatga tgaccccacc 120ttgcctgaag gttggacacg aaagcttaaa caaaggaagt ctggccgatc tgctggaaag 180tatgatgtat atttgatcaa tccccaggga aaagcttttc gctctaaagt agaattgatt 240gcatactttg aaaaggtggg agacacctcc ttggacccta atgattttga cttcacggta 300actgggagag ggagcccctc caggagagag cagaaaccac ctaagaagcc caaatctccc 360aaagctccag gaactggcag gggtcgggga cgccccaaag ggagcggcac tgggagacca 
420aaggcagcag catcagaagg tgttcaggtg aaaagggtcc tggagaagag ccctgggaaa 480cttgttgtca agatgccttt ccaagcatcg cctgggggta agggtgaggg aggtggggct 540accacatctg cccaggtcat ggtgatcaaa cgccctggca gaaagcgaaa agctgaagct 600gacccccagg ccattcctaa gaaacggggt agaaagcctg ggagtgtggt ggcagctgct 660gcagctgagg ccaaaaagaa agccgtgaag gagtcttcca tacggtctgt gcatgagact 720gtgctcccca tcaagaagcg caagacccgg gagacggtca gcatcgaggt caaggaagtg 780gtgaagcccc tgctggtgtc cacccttggt gagaaaagcg ggaagggact gaagacctgc 840aagagccctg ggcgtaaaag caaggagagc agccccaagg ggcgcagcag cagtgcctcc 900tccccaccta agaaggagca ccatcatcac caccatcact cagagtccac aaaggccccc 960atgccactgc tcccatcccc acccccacct gagcctgaga gctctgagga ccccatcagc 1020ccccctgagc ctcaggactt gagcagcagc atctgcaaag aagagaagat gccccgagga 1080ggctcactgg aaagcgatgg ctgccccaag gagccagcta agactcagcc tatggtcgcc 1140accactacca cagttgcaga aaagtacaaa caccgagggg agggagagcg caaagacatt 1200gtttcatctt ccatgccaag gccaaacaga gaggagcctg tggacagccg gacgcccgtg 1260accgagagag ttagctgtaa ggatccaccg gtcgccacca tggtgagcaa gggcgaggag 1320ctgttcaccg gggtggtgcc catcctggtc gagctggacg gcgacgtaaa cggccacaag 1380ttcagcgtgt ccggcgaggg cgagggcgat gccacctacg gcaagctgac cctgaagttc 1440atctgcacca ccggcaagct gcccgtgccc tggcccaccc tcgtgaccac cctgacctac 1500ggcgtgcagt gcttcagccg ctaccccgac cacatgaagc agcacgactt cttcaagtcc 1560gccatgcccg aaggctacgt ccaggagcgc accatcttct tcaaggacga cggcaactac 1620aagacccgcg ccgaggtgaa gttcgagggc gacaccctgg tgaaccgcat cgagctgaag 1680ggcatcgact tcaaggagga cggcaacatc ctggggcaca agctggagta caactacaac 1740agccacaacg tctatatcat ggccgacaag cagaagaacg gcatcaaggt gaacttcaag 1800atccgccaca acatcgagga cggcagcgtg cagctcgccg accactacca gcagaacacc 1860cccatcggcg acggccccgt gctgctgccc gacaaccact acctgagcac ccagtccgcc 1920ctgagcaaag accccaacga gaagcgcgat cacatggtcc tgctggagtt cgtgaccgcc 1980gccgggatca ctctcggcat ggacgagctg tacaagtaa 2019541500DNAArtificial SequenceMouse dNC e2 with EGFP tag cDNA 54atggtagctg ggatgttagg gctcagggag gaaaagccag cagtgccaga agcctcggct 
60tcccccaaac agcggcgctc cattatccgt gaccggggac ctatgtatga tgaccccacc 120ttgcctgaag gttggacacg aaagcttaaa caaaggaagt ctggccgatc tgctggaaag 180tatgatgtat atttgatcaa tccccaggga aaagcttttc gctctaaagt agaattgatt

240gcatactttg aaaaggtggg agacacctcc ttggacccta atgattttga cttcacggta 300actgggagag ggagcccctc caggagagag cagaaaccac ctaagaagcc caaatctccc 360aaagctccag gaactggcag gggtcgggga cgccccaaag ggagcggcac tgggagacca 420aaggcagcag catcagaagg tgttcaggtg aaaagggtcc tggagaagag ccctgggaaa 480cttgttgtca agatgccttt ccaagcatcg cctgggggta agggtgaggg aggtggggct 540accacatctg cccaggtcat ggtgatcaaa cgccctggca gaaagcgaaa agctgaagct 600gacccccagg ccattcctaa gaaacggggt agaaagcctg ggagtgtggt ggcagctgct 660gcagctgagg ccaaaaagaa agccgtgaag gagtcttcca tacggtctgt gcatgagact 720gtgctcccca tcaagaagcg caagacccgg gagacggtcg ggagctccgg cagttctgga 780atggtgagca agggcgagga gctgttcacc ggggtggtgc ccatcctggt cgagctggac 840ggcgacgtaa acggccacaa gttcagcgtg tccggcgagg gcgagggcga tgccacctac 900ggcaagctga ccctgaagtt catctgcacc accggcaagc tgcccgtgcc ctggcccacc 960ctcgtgacca ccctgaccta cggcgtgcag tgcttcagcc gctaccccga ccacatgaag 1020cagcacgact tcttcaagtc cgccatgccc gaaggctacg tccaggagcg caccatcttc 1080ttcaaggacg acggcaacta caagacccgc gccgaggtga agttcgaggg cgacaccctg 1140gtgaaccgca tcgagctgaa gggcatcgac ttcaaggagg acggcaacat cctggggcac 1200aagctggagt acaactacaa cagccacaac gtctatatca tggccgacaa gcagaagaac 1260ggcatcaagg tgaacttcaa gatccgccac aacatcgagg acggcagcgt gcagctcgcc 1320gaccactacc agcagaacac ccccatcggc gacggccccg tgctgctgcc cgacaaccac 1380tacctgagca cccagtccgc cctgagcaaa gaccccaacg agaagcgcga tcacatggtc 1440ctgctggagt tcgtgaccgc cgccgggatc actctcggca tggacgagct gtacaagtaa 1500551248DNAArtificial SequenceMouse dNIC e2 with EGFP tag cDNA 55atggtagctg ggatgttagg gctcagggag gaaaagccag cagtgccaga agcctcggct 60tcccccaaac agcggcgctc cattatccgt gaccggggac ctatgtatga tgaccccacc 120ttgcctgaag gttggacacg aaagcttaaa caaaggaagt ctggccgatc tgctggaaag 180tatgatgtat atttgatcaa tccccaggga aaagcttttc gctctaaagt agaattgatt 240gcatactttg aaaaggtggg agacacctcc ttggacccta atgattttga cttcacggta 300actgggagag ggagcccctc caggagagag cagaaaccac ctggatccag tggcagctct 360gggcccaaga aaaagcggaa ggtgcctggg agtgtggtgg cagctgctgc 
agctgaggcc 420aaaaagaaag ccgtgaagga gtcttccata cggtctgtgc atgagactgt gctccccatc 480aagaagcgca agacccggga gacggtcggg agctccggca gttctggaat ggtgagcaag 540ggcgaggagc tgttcaccgg ggtggtgccc atcctggtcg agctggacgg cgacgtaaac 600ggccacaagt tcagcgtgtc cggcgagggc gagggcgatg ccacctacgg caagctgacc 660ctgaagttca tctgcaccac cggcaagctg cccgtgccct ggcccaccct cgtgaccacc 720ctgacctacg gcgtgcagtg cttcagccgc taccccgacc acatgaagca gcacgacttc 780ttcaagtccg ccatgcccga aggctacgtc caggagcgca ccatcttctt caaggacgac 840ggcaactaca agacccgcgc cgaggtgaag ttcgagggcg acaccctggt gaaccgcatc 900gagctgaagg gcatcgactt caaggaggac ggcaacatcc tggggcacaa gctggagtac 960aactacaaca gccacaacgt ctatatcatg gccgacaagc agaagaacgg catcaaggtg 1020aacttcaaga tccgccacaa catcgaggac ggcagcgtgc agctcgccga ccactaccag 1080cagaacaccc ccatcggcga cggccccgtg ctgctgcccg acaaccacta cctgagcacc 1140cagtccgccc tgagcaaaga ccccaacgag aagcgcgatc acatggtcct gctggagttc 1200gtgaccgccg ccgggatcac tctcggcatg gacgagctgt acaagtaa 12485634DNAArtificial SequencePrimer R111G Forward 56tggacacgaa agcttaaaca agggaagtct ggcc 345734DNAArtificial SequencePrimer R111G Reverse 57ggccagactt cccttgttta agctttcgtg tcca 345829DNAArtificial SequencePrimer R306C Forward 58ctcccgggtc ttgcacttct tgatgggga 295929DNAArtificial SequencePrimer R306C Reverse 59tccccatcaa gaagtgcaag acccgggag 2960284PRTArtificial SequenceHuman dNC e1 with Myc tag polypeptide 60Met Ala Ala Ala Ala Ala Ala Ala Pro Ser Gly Gly Gly Gly Gly Gly1 5 10 15Glu Glu Glu Arg Leu Glu Glu Lys Pro Ala Val Pro Glu Ala Ser Ala 20 25 30Ser Pro Lys Gln Arg Arg Ser Ile Ile Arg Asp Arg Gly Pro Met Tyr 35 40 45Asp Asp Pro Thr Leu Pro Glu Gly Trp Thr Arg Lys Leu Lys Gln Arg 50 55 60Lys Ser Gly Arg Ser Ala Gly Lys Tyr Asp Val Tyr Leu Ile Asn Pro65 70 75 80Gln Gly Lys Ala Phe Arg Ser Lys Val Glu Leu Ile Ala Tyr Phe Glu 85 90 95Lys Val Gly Asp Thr Ser Leu Asp Pro Asn Asp Phe Asp Phe Thr Val 100 105 110Thr Gly Arg Gly Ser Pro Ser Arg Arg Glu Gln Lys Pro Pro Lys Lys 115 120 125Pro Lys Ser Pro Lys Ala Pro Gly 
Thr Gly Arg Gly Arg Gly Arg Pro 130 135 140Lys Gly Ser Gly Thr Thr Arg Pro Lys Ala Ala Thr Ser Glu Gly Val145 150 155 160Gln Val Lys Arg Val Leu Glu Lys Ser Pro Gly Lys Leu Leu Val Lys 165 170 175Met Pro Phe Gln Thr Ser Pro Gly Gly Lys Ala Glu Gly Gly Gly Ala 180 185 190Thr Thr Ser Thr Gln Val Met Val Ile Lys Arg Pro Gly Arg Lys Arg 195 200 205Lys Ala Glu Ala Asp Pro Gln Ala Ile Pro Lys Lys Arg Gly Arg Lys 210 215 220Pro Gly Ser Val Val Ala Ala Ala Ala Ala Glu Ala Lys Lys Lys Ala225 230 235 240Val Lys Glu Ser Ser Ile Arg Ser Val Gln Glu Thr Val Leu Pro Ile 245 250 255Lys Lys Arg Lys Thr Arg Glu Thr Val Gly Ser Ser Gly Ser Ser Gly 260 265 270Glu Gln Lys Leu Ile Ser Glu Glu Asp Leu Val Asp 275 28061200PRTArtificial SequenceHuman dNIC e1 with Myc tag polypeptide 61Met Ala Ala Ala Ala Ala Ala Ala Pro Ser Gly Gly Gly Gly Gly Gly1 5 10 15Glu Glu Glu Arg Leu Glu Glu Lys Pro Ala Val Pro Glu Ala Ser Ala 20 25 30Ser Pro Lys Gln Arg Arg Ser Ile Ile Arg Asp Arg Gly Pro Met Tyr 35 40 45Asp Asp Pro Thr Leu Pro Glu Gly Trp Thr Arg Lys Leu Lys Gln Arg 50 55 60Lys Ser Gly Arg Ser Ala Gly Lys Tyr Asp Val Tyr Leu Ile Asn Pro65 70 75 80Gln Gly Lys Ala Phe Arg Ser Lys Val Glu Leu Ile Ala Tyr Phe Glu 85 90 95Lys Val Gly Asp Thr Ser Leu Asp Pro Asn Asp Phe Asp Phe Thr Val 100 105 110Thr Gly Arg Gly Ser Pro Ser Arg Arg Glu Gln Lys Pro Pro Gly Ser 115 120 125Ser Gly Ser Ser Gly Pro Lys Lys Lys Arg Lys Val Pro Gly Ser Val 130 135 140Val Ala Ala Ala Ala Ala Glu Ala Lys Lys Lys Ala Val Lys Glu Ser145 150 155 160Ser Ile Arg Ser Val Gln Glu Thr Val Leu Pro Ile Lys Lys Arg Lys 165 170 175Thr Arg Glu Thr Val Gly Ser Ser Gly Ser Ser Gly Glu Gln Lys Leu 180 185 190Ile Ser Glu Glu Asp Leu Val Asp 195 20062855DNAArtificial SequenceHuman dNC e1 with Myc tag cDNA 62atggccgccg ccgccgccgc cgcgccgagc ggaggaggag gaggaggcga ggaggagaga 60ctggaagaaa agccggctgt gccggaagct tctgcctccc ccaaacagcg gcgctccatc 120atccgtgacc ggggacccat gtatgatgac cccaccctgc ctgaaggctg gacacggaag 180cttaagcaaa ggaaatctgg ccgctctgct 
gggaagtatg atgtgtattt gatcaatccc 240cagggaaaag cctttcgctc taaagtggag ttgattgcgt acttcgaaaa ggtaggcgac 300acatccctgg accctaatga ttttgacttc acggtaactg ggagagggag cccctcccgg 360cgagagcaga aaccacctaa gaagcccaaa tctcccaaag ctccaggaac tggcagaggc 420cggggacgcc ccaaagggag cggcaccacg agacccaagg cggccacgtc agagggtgtg 480caggtgaaaa gggtcctgga gaaaagtcct gggaagctcc ttgtcaagat gccttttcaa 540acttcgccag ggggcaaggc tgaggggggt ggggccacca catccaccca ggtcatggtg 600atcaaacgcc ccggcaggaa gcgaaaagct gaggccgacc ctcaggccat tcccaagaaa 660cggggccgaa agccggggag tgtggtggca gccgctgccg ccgaggccaa aaagaaagcc 720gtgaaggagt cttctatccg atctgtgcag gagaccgtac tccccatcaa gaagcgcaag 780acccgggaga cggtcgggag ctccggcagt tctggagaac aaaaactcat ctcagaagag 840gatctggtcg actag 85563603DNAArtificial SequenceHuman dNIC e1 with Myc tag cDNA 63atggccgccg ccgccgccgc cgcgccgagc ggaggaggag gaggaggcga ggaggagaga 60ctggaagaaa agccggctgt gccggaagct tctgcctccc ccaaacagcg gcgctccatc 120atccgtgacc ggggacccat gtatgatgac cccaccctgc ctgaaggctg gacacggaag 180cttaagcaaa ggaaatctgg ccgctctgct gggaagtatg atgtgtattt gatcaatccc 240cagggaaaag cctttcgctc taaagtggag ttgattgcgt acttcgaaaa ggtaggcgac 300acatccctgg accctaatga ttttgacttc acggtaactg ggagagggag cccctcccgg 360cgagagcaga aaccacctgg atccagtggc agctctgggc ccaagaaaaa gcggaaggtg 420ccggggagtg tggtggcagc cgctgccgcc gaggccaaaa agaaagccgt gaaggagtct 480tctatccgat ctgtgcagga gaccgtactc cccatcaaga agcgcaagac ccgggagacg 540gtcgggagct ccggcagttc tggagaacaa aaactcatct cagaagagga tctggtcgac 600tag 6036420DNAArtificial SequenceGuide sequence RNA 64ggttgtgacc cgccatggat 20651504DNAArtificial SequencedNIC expression cassette flanked by AAV2 ITRs 65gcgcgctcgc tcgctcactg aggccgcccg ggcaaagccc gggcgtcggg cgacctttgg 60tcgcccggcc tcagtgagcg agcgagcgcg cagagaggga gtggggttcg gtacccatag 120gcgccaagag cctagacttc cttaagcgcc agagtccaca agggcccagt taatcctcaa 180cattcaaatg ctgcccacaa aaccagcccc tctgtgccct agccgcctct tttttccaag 240tgacagtaga actccaccaa tccgcagctg aatggggtcc gcctcttttc cctgcctaaa 
300cagacaggaa ctcctgccaa ttgagggcgt caccgctaag gctccgcccc agcctgggct 360ccacaaccaa tgaagggtaa tctcgacaaa gagcaagggg tggggcgcgg gcgcgcaggt 420gcagcagcac acaggctggt cgggagggcg gggcgcgacg tctgccgtgc ggggtcccgg 480catcggttgc gcgcgcgctc cctcctctcg gagagagggc tgtggtaaaa cccgtccgga 540aaccatggcc gccgccgccg ccgccgcgcc gagcggagga ggaggaggag gcgaggagga 600gagactggaa gaaaagccgg ctgtgccgga agcttctgcc tcccccaaac agcggcgctc 660catcatccgt gaccggggac ccatgtatga tgaccccacc ctgcctgaag gctggacacg 720gaagcttaag caaaggaaat ctggccgctc tgctgggaag tatgatgtgt atttgatcaa 780tccccaggga aaagcctttc gctctaaagt ggagttgatt gcgtacttcg aaaaggtagg 840cgacacatcc ctggacccta atgattttga cttcacggta actgggagag ggagcccctc 900ccggcgagag cagaaaccac ctggatccag tggcagctct gggcccaaga aaaagcggaa 960ggtgccgggg agtgtggtgg cagccgctgc cgccgaggcc aaaaagaaag ccgtgaagga 1020gtcttctatc cgatctgtgc aggagaccgt actccccatc aagaagcgca agacccggga 1080gacggtcggg agctccggca gttctggaga acaaaaactc atctcagaag aggatctggt 1140cgactagagc tcgctgatca gcctcacaag aataaaggca gctgttgtct cttcagaagt 1200agctttgcac ttttctaaac taggaatatc accaggactg ttactcaatg tgtgggtacc 1260gaaagcactg atatatttaa aaacaaaagg tgtaacctat ttattatata aagagtttgc 1320cttataaatt tacataaaaa tgtccgtttg tgtcttttgt tgtaaaaatc acgcgtagga 1380acccctagtg atggagttgg ccactccctc tctgcgcgct cgctcgctca ctgaggccgg 1440gcgaccaaag gtcgcccgac gcccgggctt tgcccgggcg gcctcagtga gcgagcgagc 1500gcgc 1504


