Patent application title: MECP2E1 GENE
Inventors:
IPC8 Class: AC12Q16883FI
USPC Class:
1 1
Class name:
Publication date: 2020-08-27
Patent application number: 20200270695
Abstract:
The invention is a novel MECP2E1 splice variant and its corresponding
polypeptide. The invention also includes methods of using these nucleic
acid sequences and proteins in medical diagnosis and treatment of
neuropsychiatric disorders or developmental disorders.
Claims:
1-11. (canceled)
12. A method of preparing an MECP2 protein fraction from a subject useful for analyzing an MECP2E1 protein involved in neuropsychiatric or developmental disorder, comprising: (a) extracting proteins comprising an MECP2E1 protein from a sample from the subject; (b) producing an MECP2E1 protein fraction of the proteins extracted in (a) by contacting the proteins extracted in (a) with antibodies or antibody fragments that bind to the MECP2E1 protein; and (c) analyzing the MECP2E1 protein fraction produced in (b).
13. The method of claim 12, wherein step (c) comprises detecting a mutation in a wild-type MECP2E1 protein, wherein the wild-type MECP2E1 protein has the amino acid sequence of SEQ ID NO: 4.
14. The method of claim 13, wherein the mutation is a premature truncation of the wild-type MECP2E1 protein.
15. The method of claim 14, wherein the premature truncation occurs after the amino acid at position 36 or 97 of SEQ ID NO: 4.
16. The method of claim 12, wherein the sample from the subject is selected from blood, urine, serum, tears, saliva, and feces.
17. The method of claim 12, wherein step (c) comprises performing a radioimmunoassay, enzyme immunoassay, immunofluorescence, immunoprecipitation, latex agglutination, hemagglutination, or histochemical test.
18. The method of claim 17, wherein the antibodies or antibody fragments of (b) are labeled with a detectable marker.
19. The method of claim 18, wherein the detectable marker is an enzyme, fluorescent material, luminescent material, or radioactive material.
20. The method of claim 19, wherein the enzyme is horseradish peroxidase, biotin, alkaline phosphatase, β-galactosidase, or acetylcholinesterase.
21. The method of claim 19, wherein the fluorescent material is umbelliferone, fluorescein, fluorescein isothiocyanate, rhodamine, dichlorotriazinylamine fluorescein, dansyl chloride, or phycoerythrin.
22. The method of claim 19, wherein the luminescent material is luminol.
23. The method of claim 19, wherein the radioactive material is S-35, Cu-64, Ga-67, Zr-89, Ru-97, Tc-99m, Rh-105, Pd-109, In-111, I-123, I-125, I-131, Re-186, Au-198, Au-199, Pb-203, At-211, Pb-212, or Bi-212.
24. The method of claim 12, wherein step (c) comprises quantifying the amount of MECP2E1 protein in the sample.
25. The method of claim 12, wherein the neuropsychiatric or developmental disorder is selected from autism, autism spectrum disorder, epilepsy, Angelman syndrome, Prader-Willi syndrome, encephalopathy, schizophrenia, bipolar affective disorder, depression, obsessive compulsive disorder, panic disorder, attention deficit hyperactivity disorder, ataxia, and mental retardation.
26. A method of preparing an MECP2 amplified fraction from a subject useful for analyzing an MECP2E1 gene involved in neuropsychiatric or developmental disorder, comprising: (a) extracting nucleic acids comprising an MECP2E1 gene from a sample from the subject; (b) producing an MECP2E1 amplified fraction of the nucleic acids extracted in (a) by contacting the nucleic acids extracted in (a) with primers to amplify the MECP2E1 gene, wherein the MECP2E1 amplified fraction comprises MECP2E1 amplified products; and (c) analyzing the MECP2E1 amplified fraction produced in (b).
27. The method of claim 26, wherein step (c) comprises detecting a mutation within exon 1, or in the intron-exon boundary immediately adjacent to exon 1, of a nucleic acid sequence encoding an MeCP2E1 protein having the amino acid sequence of SEQ ID NO: 4.
28. The method of claim 27, wherein the mutation is selected from the group consisting of: (1) a deletion of 11 consecutive base pairs in nucleotides 38 to 54 of SEQ ID NO: 1, the deletion causing a truncation of the MeCP2E1 protein of SEQ ID NO: 4 after amino acid 36; (2) a deletion consisting of nucleotides 1-69 of exon 1 of SEQ ID NO: 1; (3) an adenine to thymine change at nucleotide position 8 of SEQ ID NO: 1; (4) a deletion of a T, G or TG between nucleotide positions 69-71 of SEQ ID NO: 1; (5) a deletion of a T, G or TG between nucleotide positions 70-71 of SEQ ID NO: 1; (6) a deletion of the nucleotide sequence GC at nucleotides -38 and -39 upstream of the position corresponding to nucleotide 1 of SEQ ID NO: 1; (7) a deletion of the nucleotide sequence AG at nucleotides -19 and -20 upstream of the position corresponding to nucleotide 1 of SEQ ID NO: 1; (8) a deletion of 11 consecutive base pairs in nucleotides 38 to 54 of SEQ ID NO: 1, the deletion causing a truncation of the MeCP2E1 protein of SEQ ID NO: 4 after amino acid 36; and (9) an adenine to thymine mutation at the nucleotide position corresponding to position 8 of SEQ ID NO: 1.
29. A method of detecting Rett syndrome that is associated with a point mutation in the human MECP2 gene, comprising detecting the presence or absence of a point mutation which disrupts the initiation codon in exon 1 of a nucleic acid sequence encoding the MeCP2E1 protein having the amino acid sequence of SEQ ID NO: 4 in a sample obtained from a human by (i) amplifying the sample nucleic acid sequence with primers that amplify an adenine to guanine change at nucleotide position 8 of SEQ ID NO: 1 and comparing the amplified sample nucleic acid sequence to a control nucleic acid sequence or (ii) detecting with a probe an adenine to guanine change at nucleotide position 8 of SEQ ID NO: 1, wherein the presence of the mutation in the sample nucleic acid sequence indicates that the human has Rett syndrome.
Description:
RELATED APPLICATIONS
[0001] This application is a continuation of U.S. patent application Ser. No. 15/429,143, filed Feb. 9, 2017, which is a continuation of U.S. patent application Ser. No. 14/100,889, filed Dec. 9, 2013, now U.S. Pat. No. 9,605,314, which is a continuation of U.S. patent application Ser. No. 12/657,559, filed Jan. 21, 2010, now U.S. Pat. No. 8,637,236, which is a continuation-in-part of U.S. patent application Ser. No. 11/352,153, filed Feb. 9, 2006, now U.S. Pat. No. 7,670,773, which is a continuation of International Patent Application No. PCT/CA2005/000198, filed Feb. 17, 2005, which claims priority from U.S. Provisional Patent Application No. 60/544,311, filed Feb. 17, 2004. The contents of these applications are incorporated herein by reference in their entirety.
SEQUENCE LISTING
[0002] The instant application contains a Sequence Listing which has been submitted in ASCII format via EFS-Web and is hereby incorporated by reference in its entirety. Said ASCII copy, created on Feb. 28, 2014, is named 103779-0609_SL.txt and is 154,492 bytes in size.
BACKGROUND OF THE INVENTION
[0003] Neuropsychiatric disorders account for six of the ten highest-impact diseases worldwide, according to the World Health Organization. Their cost to the US economy is $100 billion, and one of every four persons entering physicians' offices has a diagnosable mental disorder.
[0004] Rett syndrome (RTT) (OMIM #312750) is characterized by onset, in girls, of a gradual slowing of neurodevelopment in the second half of the first year of life towards stagnation by age four, followed by regression and loss of acquired fine motor and communication skills. A pseudostationary period follows during which a picture of preserved ambulation, aberrant communication and stereotypic hand wringing approximates early autism. Regression, however, remains insidiously ongoing and ultimately results in profound mental retardation.
[0005] Up to 80% of patients with RTT have mutations in exons 3 and 4 of the 4-exon MECP2 gene (FIG. 1a) encoding the MeCP2 transcriptional repressor. The mutations in the remaining 20% of patients have remained elusive. In the known transcript of the gene, all four exons are utilized, the translation start site is in exon 2, and exon 1 and most of exon 2 form the 5' untranslated region (UTR). For clarity, this transcript is named MECP2E2 (previously MECP2A), and its encoded protein MeCP2E2 (previously MeCP2A).
[0006] No mutation specific to the MeCP2E2-defining exon 2 has been found to date, despite the analysis of several hundred patients for mutations in this exon. These studies did not include exon 1, as it was considered non-coding.
[0007] Non-inactivating MECP2 mutations have also been associated with phenotypes that overlap RTT such as mental retardation and autism. There is a need for the identification of further mutations to account for the remaining 20% of RTT patients so that methods of diagnosing and treating RTT can be identified.
[0008] Mutations in the Rett syndrome gene, MECP2, have also been found among autism patients as well as in patients with childhood onset psychosis, Angelman syndrome, non-syndromic mental retardation and neonatal encephalopathy, demonstrating that mutations in MECP2 may have diverse phenotypic consequences.
SUMMARY OF THE INVENTION
[0009] The present inventors have identified a novel open reading frame of the MECP2 gene, called MECP2E1. Inspection of the 5'UTR revealed that, whereas exon 2 has a number of in-frame stop codons upstream of the ATG, exon 1 contains an open reading frame, including an ATG, across its entire length. This open reading frame is carried by a transcript composed of exons 1, 3 and 4 of the MECP2 gene. MECP2E1 is similar to MECP2E2 (GenBank accession # NM_004992, SEQ ID NO. 1), except that nucleotides 71-193 are absent, corresponding to the splicing out of exon 2.
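The relationship between the two transcripts can be illustrated with a short Python sketch that removes a 1-based, inclusive nucleotide range from an E2-style sequence. SEQ ID NO. 1 is not reproduced here, so the example uses a hypothetical placeholder sequence; only the coordinates 71-193 are taken from the paragraph above.

```python
def splice_out(seq: str, start: int, end: int) -> str:
    """Return seq with the 1-based, inclusive range [start, end] removed."""
    if not 1 <= start <= end <= len(seq):
        raise ValueError("range must lie within the sequence")
    # Python slices are 0-based and half-open: nucleotide `start` is
    # seq[start - 1] and nucleotide `end` is seq[end - 1].
    return seq[: start - 1] + seq[end:]


# Hypothetical placeholder standing in for SEQ ID NO. 1 (not the real sequence).
e2_like = "ACGT" * 75  # 300 nt

e1_like = splice_out(e2_like, 71, 193)  # remove the span attributed to exon 2
print(len(e2_like) - len(e1_like))  # 123 nucleotides removed
```

Nucleotides 1-70 and everything after nucleotide 193 are kept unchanged and joined directly, which is what "splicing out" a contiguous exon means at the sequence level.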
[0010] Accordingly, the present invention provides an isolated nucleic acid molecule comprising a sequence encoding the MeCP2E1 protein. The invention also includes the corresponding polypeptide, MeCP2E1.
[0011] In one embodiment, the purified and isolated nucleic acid molecule comprises
[0012] (a) a nucleic acid sequence encoding a protein as shown in SEQ ID No. 4;
[0013] (b) a nucleic acid sequence complementary to (a);
[0014] (c) a nucleic acid sequence that has substantial homology to (a) or (b);
[0015] (d) a nucleic acid sequence that is an analog to a nucleic acid sequence of (a), (b), or (c);
[0016] (e) a fragment of (a) to (d) that is at least 15 bases, preferably 20 to 30 bases, and which will hybridize to a nucleic acid sequence of (a), (b), (c) or (d) under stringent hybridization conditions; or
[0017] (f) a nucleic acid molecule differing from any of the nucleic acids of (a) to (c) in codon sequences due to the degeneracy of the genetic code.
[0018] In a specific embodiment of the invention, an isolated nucleic acid molecule is provided having a sequence as shown in SEQ ID No. 3 or a fragment or variant thereof.
[0019] The inventors have found mutations in exon 1 of the MECP2E1 gene in patients with a neuropsychiatric or developmental disorder, such as Rett's syndrome and mental retardation. Accordingly, the present invention provides a method of detecting a neuropsychiatric disorder or developmental disorder comprising detecting a mutation or deletion in exon 1 of the MECP2E1 sequence (SEQ ID No. 3). A mutation can be detected by sequencing PCR products from genomic DNA using the mutation screening primers X1F/X1R (FIG. 1). Detection of insertion or deletion mutations may require cloning of the PCR product into a suitable plasmid vector, followed by transformation into E. coli and sequencing of clones from isolated colonies. Alternatively, a mutation can be detected by multiplex ligation-dependent probe amplification (MLPA) using 20 probe pairs that target the four MECP2 exons, six X-linked control regions and ten autosomal control regions. A mutation or deletion can also be detected by assaying for the protein product encoded by MECP2E1.
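MLPA results are commonly interpreted by comparing normalized peak heights between a patient and a reference sample. The Python sketch below shows one simple form of that dosage-quotient calculation: each target peak is normalized to the summed control-probe peaks of the same run, then divided by the same ratio in the reference. The probe names, peak heights and the 0.7/1.3 cut-offs are hypothetical, and dedicated MLPA analysis software may normalize differently. A quotient near 0.5 is consistent with loss of one of two copies.

```python
from typing import Dict, Iterable


def dosage_quotients(sample: Dict[str, float],
                     reference: Dict[str, float],
                     control_probes: Iterable[str]) -> Dict[str, float]:
    """Peak-height ratio of each target probe, normalized to control probes."""
    controls = list(control_probes)
    s_ctrl = sum(sample[p] for p in controls)
    r_ctrl = sum(reference[p] for p in controls)
    return {
        probe: (sample[probe] / s_ctrl) / (reference[probe] / r_ctrl)
        for probe in sample
        if probe not in controls
    }


def call_copy_number(dq: float, low: float = 0.7, high: float = 1.3) -> str:
    """Classify a dosage quotient; the default cut-offs are illustrative only."""
    if dq < low:
        return "deletion"
    if dq > high:
        return "duplication"
    return "normal"


# Hypothetical peak heights (arbitrary units) for a patient and a sex-matched control.
patient = {"E1": 500, "E2": 1000, "E3": 1000, "E4": 1000, "CTRL1": 1000, "CTRL2": 1000}
control = {"E1": 1000, "E2": 1000, "E3": 1000, "E4": 1000, "CTRL1": 1000, "CTRL2": 1000}

dqs = dosage_quotients(patient, control, ["CTRL1", "CTRL2"])
for probe, dq in dqs.items():
    print(probe, round(dq, 2), call_copy_number(dq))
```

Because the panel includes X-linked control regions, comparing a patient against a sex-matched reference avoids spurious quotients on X-chromosome probes.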
[0020] Other features and advantages of the present invention will become apparent from the following detailed description. It should be understood, however, that the detailed description and the specific examples while indicating preferred embodiments of the invention are given by way of illustration only, since various changes and modifications within the spirit and scope of the invention will become apparent to those skilled in the art from this detailed description.
BRIEF DESCRIPTION OF THE DRAWINGS
[0021] The patent or application file contains at least one drawing executed in color. Copies of this patent or patent application publication with color drawings will be provided by the Office upon request and payment of the necessary fee.
[0022] The invention will now be described in relation to the drawings in which:
[0023] FIGS. 1A-1F show MECP2 5' splice variants. a) Structure of the MECP2 gene. Numbered boxes indicate exons; asterisks indicate in-frame stop codons. In the traditional MECP2E2 splice variant, the start codon is in exon 2. In MECP2E1, exon 2 is not present and the start codon is in exon 1. HF/HR1 and MF/MR: human and mouse primer pairs used in the rtPCR experiments shown in panel c. HR2: a second human reverse primer, which confirms the results obtained with HR1 (data not shown). X1F/X1R: mutation screening primers (see FIG. 2). Primer sequences (5'-3'): HF-ctcggagagagggctgtg (SEQ ID No. 5), HR1-cttgaggggtttgtccttga (SEQ ID No. 6), HR2-cgtttgatcaccatgacctg (SEQ ID No. 7), MF-aggaggcgaggaggagagac (SEQ ID No. 8), MR-ctggctctgcagaatggtg (SEQ ID No. 9), X1F-ccatcacagccaatgacg (SEQ ID No. 19), X1R-agggggagggtagagaggag (SEQ ID No. 20). b) Examples of MECP2 ESTs. c) PCR results using the primers in (a) (HF/HR1 and MF/MR) on cDNA from the indicated adult tissues (except where indicated otherwise) and cell cultures; d.p.c.: days post coitum. d) Transcript-specific real-time quantitative PCR (SYBR Green detection method) on cDNA from the indicated tissues or cell cultures. e) 3' myc-tagged MeCP2E1 (and MeCP2E2) localize principally in the nucleus, and in indeterminate puncta in the cytoplasm. f) N-termini of the indicated proteins (SEQ ID NOS 30-42, respectively, in order of appearance); dashes indicate positions with no amino acid.
[0024] FIGS. 2A-2C show a deletion mutation in patient V1. a1) Sequence of the PCR product from genomic DNA using primers X1F/X1R (FIG. 1a). Note the mixed sequence. a2) and a3) Sequences of clones of the patient's wild-type and mutant alleles, respectively; the red box indicates the 11 nucleotides deleted in the mutated allele. b) Electropherograms of the same cloned wild-type and deleted alleles (SEQ ID NOS 43-46, respectively, in order of appearance). c) PCR on the indicated cDNAs using primers HF/HR1 (FIG. 1a,c). Lanes 1 and 2 (on 2.5% high resolution agarose) are from control and patient whole blood, respectively. Lanes 3 to 8 (on 6% denaturing polyacrylamide) are from control blood (3), patient blood (4), control fetal brain (5), control adult brain (6), control testis (7) and control genomic DNA (8). Note that expression of the patient's MECP2E2 transcript with the 11 bp exon 1 deletion (band at 266 bp) is not diminished compared to the non-deleted allele (277 bp). The 141 and 152 bp bands are the deleted and non-deleted MECP2E1 transcripts, respectively.
[0025] FIGS. 3A-3B show a deletion mutation in patient V2. MECP2 multiplex ligation-dependent probe amplification (MLPA) peak profiles are shown. Control loci are listed along the top. Boxed regions (E1-E4) indicate MECP2 exons 1-4. a) MLPA profile of a normal control. b) MLPA profile of patient V2, showing a hemizygous exon 1 deletion (asterisk). The result was consistently reproducible, and sequencing ruled out the possibility of a SNP interfering with the ligation efficiency of the MLPA reaction.
[0026] FIGS. 4A-4B show the characterization of the primary brain cell cultures by rtPCR (A) and IF (B). (A) Expression of Map2, Gfap and Nestin indicates that the cultures in B-27 medium were composed primarily of neurons and those in G-5 medium of glial cells. Fibroblasts from the same embryos were also cultured and used as negative controls. Whole brain tissue (15.5 dpc) was used as a positive control for Map2 and Nestin. (B) Double staining for neurons was performed with mouse anti-MAP2 and rabbit anti-GFAP antibodies, with DAPI counterstaining (blue). Most of the cells are neurons, which stained positively for MAP2 (green); only a negligible percentage of contaminating glial cells, staining positively for GFAP (red), was detected.
[0027] FIG. 5 shows the nucleotide sequences of the five MECP2 exon 1 variants identified in female MR patients. All sequences were obtained from single colonies after cloning the heterozygous PCR product into the pDRIVE vector (Qiagen). The ATG start codon is indicated by a red box, where possible. The resulting amino acid sequence is also indicated, with wild type sequence shown in red and changes indicated in green type. FIG. 5 discloses SEQ ID NOS 18 and 47-55, respectively, in order of appearance.
[0028] FIG. 6 shows a high resolution agarose gel (2.2%) of PCR product for MECP2 exon 1 for negative controls (Lanes 1 and 2), 3 bp insertion (Lanes 3 and 4), 9 bp insertion (Lane 5) and 2 bp deletion (Lane 6). A 100 bp size ladder (M; MBI Fermentas) flanks the PCR lanes.
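The primer sequences listed in the FIG. 1 legend can be screened quickly for GC content and an approximate melting temperature. The Python sketch below uses the Wallace rule, Tm ≈ 2(A+T) + 4(G+C) °C, which is only a crude estimate intended for short oligonucleotides; the source does not state how annealing temperatures were chosen, and nearest-neighbor methods give more accurate values.

```python
# Primer sequences (5'-3') as listed in the FIG. 1 legend.
PRIMERS = {
    "HF": "ctcggagagagggctgtg",
    "HR1": "cttgaggggtttgtccttga",
    "HR2": "cgtttgatcaccatgacctg",
    "MF": "aggaggcgaggaggagagac",
    "MR": "ctggctctgcagaatggtg",
    "X1F": "ccatcacagccaatgacg",
    "X1R": "agggggagggtagagaggag",
}


def gc_percent(seq: str) -> float:
    """Percentage of G and C bases in the sequence."""
    s = seq.upper()
    return 100.0 * (s.count("G") + s.count("C")) / len(s)


def wallace_tm(seq: str) -> int:
    """Rough melting temperature (deg C) by the Wallace rule: 2(A+T) + 4(G+C)."""
    s = seq.upper()
    at = s.count("A") + s.count("T")
    gc = s.count("G") + s.count("C")
    return 2 * at + 4 * gc


for name, seq in PRIMERS.items():
    print(f"{name}: {len(seq)} nt, GC {gc_percent(seq):.1f}%, Tm ~{wallace_tm(seq)} C")
```

For example, HF has 12 G/C and 6 A/T bases, giving an estimate of 2(6) + 4(12) = 60 °C.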
DETAILED DESCRIPTION OF THE INVENTION
[0029] The present inventors have identified an MECP2 splice variant that contributes new coding sequence, which may contain mutations in patients with neuropsychiatric disorders such as Rett's syndrome and mental retardation.
I. Nucleic Acid Molecules of the Invention
[0030] As hereinbefore mentioned, the present invention relates to isolated MECP2E1 nucleic acid molecules. The term "isolated" refers to a nucleic acid substantially free of cellular material or culture medium when produced by recombinant DNA techniques, or chemical precursors, or other chemicals when chemically synthesized.
[0031] The term "nucleic acid" is intended to include DNA and RNA and can be either double stranded or single stranded. The term is also intended to include a strand that is a mixture of nucleic acid molecules and nucleic acid analogs and/or nucleotide analogs, or that is made entirely of nucleic acid analogs and/or nucleotide analogs.
[0032] Broadly stated, the present invention provides an isolated nucleic acid molecule containing a sequence encoding the MECP2E1 transcript of the MECP2 gene. Accordingly, the present invention provides an isolated nucleic acid molecule containing a sequence encoding the MeCP2E1 protein shown in SEQ ID No. 4, or a fragment, variant, or analog thereof.
[0033] In one embodiment, the purified and isolated nucleic acid molecule comprises
[0034] (a) a nucleic acid sequence encoding a MECP2E1 protein as shown in SEQ ID No. 4;
[0035] (b) a nucleic acid sequence complementary to (a);
[0036] (c) a nucleic acid sequence that has substantial homology to (a) or (b);
[0037] (d) a nucleic acid sequence that is an analog to a nucleic acid sequence of (a), (b), or (c);
[0038] (e) a fragment of (a) to (d) that is at least 15 bases, preferably 20 to 30 bases, and which will hybridize to a nucleic acid sequence of (a), (b), (c) or (d) under stringent hybridization conditions; or
[0039] (f) a nucleic acid molecule differing from any of the nucleic acids of (a) to (c) in codon sequences due to the degeneracy of the genetic code.
[0040] In a specific embodiment of the invention, the isolated nucleic acid molecule has a sequence as shown in SEQ ID No. 3 or a fragment or variant thereof.
[0041] The term "MECP2E1" means a transcript isoform of the MECP2 gene that contains exons 1, 3 and 4 but lacks exon 2. This isoform was previously referred to as MECP2B but is now called MECP2E1, indicating the translation start site in exon 1. The term "MECP2E1" includes the nucleic acid sequence as shown in SEQ ID No. 3 as well as mutations, variants and fragments thereof that are associated with neuropsychiatric disorders and developmental disorders. "MECP2E1" can also be referred to as "MECP2_e1." The "MeCP2E1" protein can also be referred to as "MeCP2_e1." MECP2E2 is the transcript of the gene that contains exons 1, 2, 3 and 4. "MECP2E2" can also be referred to as "MECP2_e2." The "MeCP2E2" protein can also be referred to as "MeCP2_e2."
[0042] It will be appreciated that the invention includes nucleic acid molecules encoding truncations of the MeCP2E1 proteins of the invention, and analogs and homologs of the MeCP2E1 proteins of the invention and truncations thereof, as described below.
[0043] Further, it will be appreciated that the invention includes nucleic acid molecules comprising nucleic acid sequences having substantial sequence homology with the nucleic acid sequences of the invention and fragments thereof. The term "sequences having substantial sequence homology" means those nucleic acid sequences which have slight or inconsequential sequence variations from these sequences, i.e. the sequences function in substantially the same manner to produce functionally equivalent proteins. The variations may be attributable to local mutations or structural modifications.
[0044] Generally, nucleic acid sequences having substantial homology include nucleic acid sequences having at least 70%, preferably 80-90% identity with the nucleic acid sequences of the invention.
[0045] Sequence identity is most preferably assessed by the algorithm of the BLAST version 2.1 program advanced search (BLAST is a series of programs available online at www.ncbi.nlm.nih.gov/BLAST). The advanced BLAST search (www.ncbi.nlm.nih.gov/blast/blast.cgi?Jform=1) is set to default parameters (i.e. Matrix BLOSUM62; Gap existence cost 11; Per residue gap cost 1; Lambda ratio 0.85 default). For example, if a nucleotide sequence (called "Sequence A") has 90% identity to a portion of the nucleotide sequence in SEQ ID No. 3, then Sequence A will be identical to the referenced portion of the nucleotide sequence in SEQ ID No. 3, except that Sequence A may include up to 10 point mutations, such as substitutions with other nucleotides, per each 100 nucleotides of the referenced portion of the nucleotide sequence in SEQ ID No. 3. Nucleotide sequences functionally equivalent to the MECP2E1 transcript can occur in a variety of forms as described below.
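The identity arithmetic in the example above (90% identity allowing up to 10 substitutions per 100 nucleotides) can be made concrete with a short Python sketch. It only scores two sequences that are already aligned to the same length, counting substitutions as mismatches; it is not a reimplementation of BLAST, which also performs local alignment and handles gaps. The sequences are hypothetical.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent of aligned positions at which two equal-length sequences agree."""
    if len(a) != len(b) or not a:
        raise ValueError("expects two non-empty sequences of equal, aligned length")
    matches = sum(1 for x, y in zip(a.upper(), b.upper()) if x == y)
    return 100.0 * matches / len(a)


reference = "ACGT" * 25  # hypothetical 100-nt reference portion
variant_bases = list(reference)
for i in range(0, 100, 10):  # introduce 10 point substitutions
    variant_bases[i] = "C" if variant_bases[i] == "A" else "A"
variant = "".join(variant_bases)

print(percent_identity(reference, variant))  # 90.0
```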
[0046] The term "a nucleic acid sequence which is an analog" means a nucleic acid sequence which has been modified as compared to the sequence of (a), (b) or (c), wherein the modification does not alter the utility of the sequence as described herein. The modified sequence or analog may have improved properties over the sequence shown in (a), (b) or (c). One example of a modification to prepare an analog is to replace one of the naturally occurring bases (i.e. adenine, guanine, cytosine or thymine) of the sequence shown in SEQ ID No. 3 with a modified base such as xanthine, hypoxanthine, 2-aminoadenine, 6-methyl, 2-propyl and other alkyl adenines, 5-halo uracil, 5-halo cytosine, 6-aza uracil, 6-aza cytosine and 6-aza thymine, pseudo uracil, 4-thiouracil, 8-halo adenine, 8-aminoadenine, 8-thiol adenine, 8-thiolalkyl adenines, 8-hydroxyl adenine and other 8-substituted adenines, 8-halo guanines, 8-amino guanine, 8-thiol guanine, 8-thiolalkyl guanines, 8-hydroxyl guanine and other 8-substituted guanines, other aza and deaza uracils, thymidines, cytosines, adenines, or guanines, 5-trifluoromethyl uracil and 5-trifluoro cytosine.
[0047] Another example of a modification is to include modified phosphorus or oxygen heteroatoms in the phosphate backbone, short chain alkyl or cycloalkyl intersugar linkages, or short chain heteroatomic or heterocyclic intersugar linkages in the nucleic acid molecule shown in SEQ ID No. 3. For example, the nucleic acid sequences may contain phosphorothioates, phosphotriesters, methyl phosphonates, and phosphorodithioates.
[0048] A further example of an analog of a nucleic acid molecule of the invention is a peptide nucleic acid (PNA), wherein the deoxyribose (or ribose) phosphate backbone in the DNA (or RNA) is replaced with a polyamide backbone similar to that found in peptides (P. E. Nielsen, et al Science 1991, 254, 1497). PNA analogs have been shown to be resistant to degradation by enzymes and to have extended lifetimes in vivo and in vitro. PNAs also bind more strongly to a complementary DNA sequence owing to the lack of charge repulsion between the PNA strand and the DNA strand. Other nucleic acid analogs may contain nucleotides with polymer backbones, cyclic backbones, or acyclic backbones. For example, the nucleotides may have morpholino backbone structures (U.S. Pat. No. 5,034,506). The analogs may also contain groups such as reporter groups or groups for improving the pharmacokinetic or pharmacodynamic properties of the nucleic acid sequence.
[0049] Another aspect of the invention provides a nucleic acid molecule, and fragments thereof having at least 15 bases, which hybridizes to the nucleic acid molecules of the invention under hybridization conditions. Such nucleic acid molecules preferably hybridize to all or a portion of MECP2E1 or its complement under stringent conditions as defined herein (see Sambrook et al. (most recent edition) Molecular Cloning: A Laboratory Manual, Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y.; Ausubel et al. (eds.), 1995, Current Protocols in Molecular Biology, (John Wiley & Sons, NY)). The hybridizing portion of the hybridizing nucleic acids is typically at least 15 (e.g. 20, 25, 30 or 50) nucleotides in length. The hybridizing portion of the hybridizing nucleic acid is at least 80%, e.g. at least 95% or at least 98%, identical to the sequence of a portion or all of a nucleic acid encoding a MeCP2E1 polypeptide, or its complement. Hybridizing nucleic acids of the type described herein can be used, for example, as a cloning probe, a primer (e.g. a PCR primer) or a diagnostic probe. Hybridization of the oligonucleotide probe to a nucleic acid sample is typically performed under stringent conditions. Nucleic acid duplex or hybrid stability is expressed as the melting temperature (Tm), the temperature at which a probe dissociates from a target DNA. This melting temperature is used to define the required stringency conditions. If sequences are to be identified that are related and substantially identical to the probe, rather than identical, it is useful first to establish the lowest temperature at which only homologous hybridization occurs with a particular concentration of salt (e.g. SSC or SSPE). Then, assuming that 1% mismatching results in a 1 °C decrease in the Tm, the temperature of the final wash in the hybridization reaction is reduced accordingly (for example, if sequences having greater than 95% identity with the probe are sought, the final wash temperature is decreased by 5 °C). In practice, the change in Tm can be between 0.5 °C and 1.5 °C per 1% mismatch. Low stringency conditions involve hybridizing at about 1× SSC, 0.1% SDS at 50 °C; moderate stringency is about 1× SSC, 0.1% SDS at 60 °C; and high stringency conditions are 0.1× SSC, 0.1% SDS at 65 °C. The parameters of salt concentration and temperature can be varied to achieve the optimal level of identity between the probe and the target nucleic acid.
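The wash-temperature rule in paragraph [0049] is simple arithmetic, sketched below in Python. The starting temperature of 70 °C is hypothetical (in practice it is the empirically determined lowest temperature at which only homologous hybridization occurs); the default of 1 °C per 1% mismatch and the permitted 0.5-1.5 °C range come from the text.

```python
def final_wash_temp(stringent_temp_c: float, min_identity_pct: float,
                    deg_per_pct_mismatch: float = 1.0) -> float:
    """Lower the final wash temperature by a fixed amount per 1% tolerated mismatch.

    stringent_temp_c: lowest temperature at which only homologous
        hybridization occurs (determined empirically).
    deg_per_pct_mismatch: Tm drop per 1% mismatch; 1 deg C is the working
        assumption, and 0.5-1.5 deg C is the range observed in practice.
    """
    if not 0 < min_identity_pct <= 100:
        raise ValueError("identity must be in (0, 100]")
    if not 0.5 <= deg_per_pct_mismatch <= 1.5:
        raise ValueError("outside the 0.5-1.5 deg C range given in the text")
    return stringent_temp_c - (100 - min_identity_pct) * deg_per_pct_mismatch


# Hypothetical: only homologous hybridization occurs at 70 deg C,
# and sequences with greater than 95% identity to the probe are sought.
print(final_wash_temp(70, 95))       # 65.0
print(final_wash_temp(70, 95, 1.5))  # 62.5
```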
[0050] Isolated and purified nucleic acid molecules having sequences which differ from the nucleic acid sequence shown in SEQ ID No. 3 due to degeneracy in the genetic code are also within the scope of the invention. Because the genetic code is degenerate, other nucleic acid molecules that encode a polypeptide identical to the MeCP2E1 amino acid sequence (SEQ ID No. 4) may also be used.
[0051] The present invention also includes mutated forms of MECP2E1 associated with a neuropsychiatric disorder or developmental disorder, including the specific mutations listed in Table 1. Specifically, the following mutations are associated with Rett's syndrome: (1) an 11 bp deletion in nucleotides 38 to 54 shown in SEQ ID No. 1; (2) a deletion of exon 1 containing nucleotides 1-69 shown in SEQ ID No. 1; (3) an adenine to thymine change at nucleotide position 8 shown in SEQ ID No. 1; (4) a deletion of the sequence TG at nucleotide positions 70-71 in SEQ ID No. 1; (5) an adenine to guanine change at nucleotide position 8 shown in SEQ ID No. 1; (6) a cytosine to thymine change at nucleotide position 12 shown in SEQ ID No. 1; and (7) a deletion of the sequence TG at nucleotide positions 69 and 70 in SEQ ID No. 1.
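The effect of codon degeneracy can be shown with a small Python sketch based on the standard genetic code (NCBI translation table 1): two different nucleotide sequences translate to the same peptide. The example sequences are hypothetical and are not taken from SEQ ID No. 3.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(cds: str) -> str:
    """Translate a coding sequence codon by codon ('*' marks a stop codon)."""
    s = cds.upper().replace("U", "T")
    usable = len(s) - len(s) % 3  # ignore a trailing partial codon
    return "".join(CODON_TABLE[s[i:i + 3]] for i in range(0, usable, 3))


# Two hypothetical coding sequences that differ only at third codon positions.
seq_a = "ATGGCCGCCGAG"
seq_b = "ATGGCTGCAGAA"
print(translate(seq_a), translate(seq_b))  # MAAE MAAE
```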
[0052] The following mutations are associated with developmental delay: (1) an insertion of one or more copies of the trinucleotide sequence GCC between nucleotides 11 and 29 shown in SEQ ID No. 1; (2) a deletion of one or more copies of the trinucleotide sequence GCC between nucleotides 11 and 29 shown in SEQ ID No. 1; (3) an insertion of the nucleotide sequence GGA between nucleotides 38 and 54 shown in SEQ ID No. 1; (4) a deletion of the nucleotide sequence GC at nucleotides -38 and -39 upstream of nucleotide 1 shown in SEQ ID No. 1; and (5) a deletion of the nucleotide sequence AG at nucleotides -19 and -20 upstream of nucleotide 1 shown in SEQ ID No. 1.
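Whether these changes alter the reading frame follows from simple arithmetic. An indel whose length is not a multiple of three, such as the 11 bp or 2 bp (TG) deletions, shifts the downstream frame, whereas a GCC or GGA trinucleotide insertion or deletion does not. The Python sketch below illustrates this, together with an A-to-G substitution that destroys an ATG start codon, on a short hypothetical sequence; positions in the sketch do not correspond to SEQ ID No. 1, and the helper names are illustrative.

```python
def delete(seq: str, start: int, end: int) -> str:
    """Delete the 1-based, inclusive range [start, end]."""
    return seq[: start - 1] + seq[end:]


def insert_after(seq: str, pos: int, bases: str) -> str:
    """Insert `bases` immediately after 1-based position `pos`."""
    return seq[:pos] + bases + seq[pos:]


def substitute(seq: str, pos: int, ref: str, alt: str) -> str:
    """Replace the base at 1-based `pos`, checking the expected reference base."""
    if seq[pos - 1] != ref:
        raise ValueError(f"expected {ref} at position {pos}, found {seq[pos - 1]}")
    return seq[: pos - 1] + alt + seq[pos:]


def shifts_frame(length_change: int) -> bool:
    """Indels whose length is not a multiple of 3 shift the downstream reading frame."""
    return length_change % 3 != 0


# Hypothetical sequence with an ATG at positions 3-5 (not SEQ ID No. 1).
seq = "CCATGGCCGCCGCCAAGTGA"

print(shifts_frame(11), shifts_frame(2), shifts_frame(3))  # True True False
print(len(delete(seq, 6, 16)))                             # 9 (11 nt removed)
print(insert_after(seq, 5, "GCC"))                         # one extra GCC, frame kept
mutant = substitute(seq, 3, "A", "G")
print(mutant[2:5])                                         # GTG: the ATG start is lost
```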
[0053] Mutations (4) and (5) in the developmental delay group lie upstream of nucleotide 1 shown in SEQ ID No. 1; GenBank Accession number BX538060 contains the upstream sequence. Therefore, for greater clarity, mutation (4), which consists of a deletion of the nucleotide sequence GC at nucleotides -38 and -39, corresponds to nucleotides 11-12 of sequence BX538060; and mutation (5), which consists of a deletion of the nucleotide sequence AG at nucleotides -19 and -20, corresponds to nucleotides 30-31 of BX538060.
[0054] Nucleic acid molecules from MECP2E1 can be isolated by preparing a labeled nucleic acid probe based on all or part of the nucleic acid sequences as shown in SEQ ID No. 3, and using this labeled nucleic acid probe to screen an appropriate DNA library (e.g. a cDNA or genomic DNA library). Nucleic acids isolated by screening of a cDNA or genomic DNA library can be sequenced by standard techniques. Another method involves comparing the MECP2E1 sequence to other sequences, for example using bioinformatics techniques such as database searches or alignment strategies, and detecting the presence of a MECP2E1 nucleic acid sequence.
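Both correspondences given for the upstream mutations differ by the same constant (-39/-38 map to 11/12 and -20/-19 map to 30/31), so the conversion can be written as a fixed offset. The Python sketch below assumes that this offset of 50, inferred from those two pairs, holds across the upstream region; positive positions are deliberately not handled, because the text does not give their BX538060 equivalents.

```python
# Offset inferred from the two correspondences given for mutations (4) and (5):
# -39/-38 -> 11/12 and -20/-19 -> 30/31 in BX538060 (assumed constant upstream).
UPSTREAM_OFFSET = 50


def upstream_to_bx538060(pos: int) -> int:
    """Map a negative (upstream) position relative to nucleotide 1 of SEQ ID No. 1
    onto BX538060 numbering. Only negative positions are handled."""
    if pos >= 0:
        raise ValueError("only negative (upstream) positions are handled")
    return pos + UPSTREAM_OFFSET


print([upstream_to_bx538060(p) for p in (-39, -38, -20, -19)])  # [11, 12, 30, 31]
```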
[0055] Nucleic acid molecules of the invention can also be isolated by selectively amplifying a nucleic acid using the polymerase chain reaction (PCR) methods and cDNA or genomic DNA. It is possible to design synthetic oligonucleotide primers from the nucleic acid molecules as shown in SEQ ID No. 3 for use in PCR. A nucleic acid can be amplified from cDNA or genomic DNA using these oligonucleotide primers and standard PCR amplification techniques. The nucleic acid so amplified can be cloned into an appropriate vector and characterized by DNA sequence analysis. It will be appreciated that cDNA may be prepared from mRNA, by isolating total cellular mRNA by a variety of techniques, for example, by using the guanidinium-thiocyanate extraction procedure of Chirgwin et al., Biochemistry, 18, 5294-5299 (1979). cDNA is then synthesized from the mRNA using reverse transcriptase (for example, Moloney MLV reverse transcriptase available from Gibco/BRL, Bethesda, Md., or AMV reverse transcriptase available from Seikagaku America, Inc., St. Petersburg, Fla.).
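The PCR strategy above can be checked in silico before any bench work. The Python sketch below locates exact primer matches on one strand of a template and returns the predicted product. It ignores mismatches, the opposite strand and multiple binding sites, and the template is a hypothetical construct rather than MECP2 sequence. X1F and X1R are the screening primers from the FIG. 1 legend; the helper names are illustrative.

```python
from typing import Optional

COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def predict_amplicon(template: str, fwd: str, rev: str) -> Optional[str]:
    """Expected product for exact-match primers on the given strand, or None.

    The forward primer must occur as written; the reverse primer anneals to
    this strand, so its reverse complement must occur downstream of the
    forward site.
    """
    t = template.upper()
    f = fwd.upper()
    r_site = reverse_complement(rev.upper())
    i = t.find(f)
    if i < 0:
        return None
    j = t.find(r_site, i + len(f))
    if j < 0:
        return None
    return t[i:j + len(r_site)]


X1F = "ccatcacagccaatgacg"
X1R = "agggggagggtagagaggag"
# Hypothetical template built around the primer sites (not genomic MECP2 sequence).
template = "GATTACA" + X1F.upper() + "ACGTACGTAC" + reverse_complement(X1R.upper()) + "GATTACA"

amplicon = predict_amplicon(template, X1F, X1R)
print(len(amplicon))  # 48 = 18 (X1F) + 10 (spacer) + 20 (X1R site)
```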
[0056] An isolated nucleic acid molecule of the invention which is RNA can be isolated by cloning a cDNA encoding a novel protein of the invention into an appropriate vector which allows for transcription of the cDNA to produce an RNA molecule which encodes the MeCP2E1 protein. For example, a cDNA can be cloned downstream of a bacteriophage promoter, (e.g. a T7 promoter) in a vector, cDNA can be transcribed in vitro with T7 polymerase, and the resultant RNA can be isolated by standard techniques.
[0057] A nucleic acid molecule of the invention may also be chemically synthesized using standard techniques. Various methods of chemically synthesizing polydeoxynucleotides are known, including solid-phase synthesis which, like peptide synthesis, has been fully automated in commercially available DNA synthesizers (See e.g., Itakura et al. U.S. Pat. No. 4,598,049; Caruthers et al. U.S. Pat. No. 4,458,066; and Itakura U.S. Pat. Nos. 4,401,796 and 4,373,071).
[0058] The initiation codon and untranslated sequences of the nucleic acid molecules of the invention may be determined using currently available computer software designed for the purpose, such as PC/Gene (IntelliGenetics Inc., Calif.). Regulatory elements can be identified using conventional techniques. The function of the elements can be confirmed by using these elements to express a reporter gene which is operatively linked to the elements. These constructs may be introduced into cultured cells using standard procedures. In addition to identifying regulatory elements in DNA, such constructs may also be used to identify proteins interacting with the elements, using techniques known in the art.
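By way of illustration only, the short Python sketch below shows the basic logic such software can apply when assigning an initiation codon: it takes the longest ATG-initiated open reading frame on the supplied strand and treats the flanking sequence as the 5' and 3' untranslated regions. It assumes the standard genetic code, ignores Kozak context and the opposite strand, and is not PC/Gene; the function names and the toy transcript are hypothetical, and the transcript is not SEQ ID No. 3.

```python
from __future__ import annotations

# Stop codons of the standard genetic code (an assumption of this sketch).
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf(seq: str) -> tuple[int, int] | None:
    """Return (start, end) of the longest ATG-initiated ORF on this strand.

    `end` is exclusive and includes the stop codon. Reading frames that run
    off the end of the sequence without a stop codon are ignored.
    """
    seq = seq.upper()
    best = None
    for start in range(len(seq) - 2):
        if seq[start:start + 3] != "ATG":
            continue
        # Step codon by codon in this reading frame until a stop codon appears.
        for pos in range(start, len(seq) - 2, 3):
            if seq[pos:pos + 3] in STOP_CODONS:
                end = pos + 3
                if best is None or end - start > best[1] - best[0]:
                    best = (start, end)
                break
    return best


def split_transcript(seq: str) -> dict[str, str]:
    """Split a transcript into 5' UTR, coding sequence and 3' UTR."""
    orf = longest_orf(seq)
    if orf is None:
        raise ValueError("no complete ORF found")
    start, end = orf
    return {"5_utr": seq[:start], "cds": seq[start:end], "3_utr": seq[end:]}


# Hypothetical toy transcript, not a MECP2 sequence.
print(split_transcript("GGGATGAAATTTTAGCC"))
```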
[0059] The sequence of a nucleic acid molecule of the invention may be inverted relative to its normal presentation for transcription to produce an antisense nucleic acid molecule. Preferably, an antisense sequence is constructed by inverting a region preceding the initiation codon or an unconserved region. In particular, the nucleic acid sequences contained in the nucleic acid molecules of the invention or a fragment thereof, preferably a nucleic acid sequence shown in SEQ ID No. 3 may be inverted relative to its normal presentation for transcription to produce antisense nucleic acid molecules.
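An antisense molecule corresponds to the reverse complement of the sense sequence. The following Python sketch, offered as an illustration under stated assumptions rather than a design tool, derives the antisense sequence for a region spanning a chosen number of nucleotides upstream of, and including, the initiation codon. The function names, the `flank` parameter and the example sequence are hypothetical, and the sketch simply takes the first ATG as the initiation codon.

```python
# Watson-Crick complement for DNA bases (upper and lower case).
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement, i.e. the antisense strand read 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def antisense_to_start_region(seq: str, flank: int) -> str:
    """Antisense sequence covering `flank` nt upstream of the first ATG plus the ATG.

    Simplifying assumption: the first ATG in `seq` is the initiation codon.
    """
    start = seq.upper().find("ATG")
    if start == -1:
        raise ValueError("no ATG initiation codon found")
    region = seq[max(0, start - flank):start + 3]
    return reverse_complement(region)


# Hypothetical sequence; prints CATTGG.
print(antisense_to_start_region("GGCCAATGCC", flank=3))
```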
[0060] The antisense nucleic acid molecules of the invention, or fragments thereof, may be chemically synthesized using naturally occurring nucleotides or variously modified nucleotides designed to increase the biological stability of the molecules or the physical stability of the duplex formed with mRNA or the native gene, e.g. phosphorothioate derivatives and acridine-substituted nucleotides. The antisense sequences may also be produced biologically using an expression vector introduced into cells in the form of a recombinant plasmid, phagemid or attenuated virus, in which the antisense sequences are produced under the control of a high-efficiency regulatory region whose activity may be determined by the cell type into which the vector is introduced.
[0061] The invention also provides nucleic acids encoding fusion proteins comprising a novel protein of the invention and a selected protein, or a selectable marker protein (see below).
II. Novel Proteins of the Invention
[0062] The invention further includes an isolated MeCP2E1 protein encoded by the nucleic acid molecules of the invention. Within the context of the present invention, a protein of the invention may include various structural forms of the primary protein which retain biological activity.
[0063] Broadly stated, the present invention provides an isolated protein encoded by exons 1, 3 and 4 of the MECP2 gene.
[0064] In a preferred embodiment of the invention, the MeCP2E1 protein has the amino acid sequence as shown in SEQ ID No. 4 or a fragment or variant thereof.
[0065] The invention also includes mutated forms of the MeCP2E1 protein that are associated with a neuropsychiatric disorder or developmental disorder. Specifically, the invention includes the mutations in MECP2E1 described in Table 1.
[0066] In addition to full length amino acid sequences, the proteins of the present invention also include truncations of the protein, and analogs, and homologs of the protein and truncations thereof as described herein. Truncated proteins may comprise peptides of at least fifteen amino acid residues.
[0067] Analogs or variants of the protein having the amino acid sequence shown in SEQ ID No. 4, and/or of truncations thereof as described herein, may include, but are not limited to, amino acid sequences containing one or more amino acid substitutions, insertions, and/or deletions. Amino acid substitutions may be of a conserved or non-conserved nature. Conserved amino acid substitutions involve replacing one or more amino acids of the proteins of the invention with amino acids of similar charge, size, and/or hydrophobicity characteristics. When only conserved substitutions are made, the resulting analog should be functionally equivalent. Non-conserved substitutions involve replacing one or more amino acids of the amino acid sequence with one or more amino acids which possess dissimilar charge, size, and/or hydrophobicity characteristics.
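One way to picture the conserved versus non-conserved distinction is to group residues by side-chain properties and ask whether a substitution stays within a group. The Python sketch below does this with a hypothetical, simplified grouping (the `RESIDUE_CLASSES` table is an assumption; published classifications and substitution matrices differ in detail) and lists each substitution between two equal-length sequences. The peptides in the example are hypothetical.

```python
from __future__ import annotations

# Hypothetical, simplified physicochemical grouping for illustration only.
RESIDUE_CLASSES = {
    "aliphatic": set("GAVLIM"),
    "aromatic": set("FWY"),
    "polar_uncharged": set("STCNQ"),
    "acidic": set("DE"),
    "basic": set("KRH"),
    "proline": set("P"),
}


def residue_class(aa: str) -> str:
    """Return the (assumed) class name for a one-letter amino acid code."""
    for name, members in RESIDUE_CLASSES.items():
        if aa.upper() in members:
            return name
    raise ValueError(f"unknown amino acid code: {aa!r}")


def is_conserved_substitution(original: str, replacement: str) -> bool:
    """True when both residues fall in the same (assumed) class."""
    return residue_class(original) == residue_class(replacement)


def classify_substitutions(wild_type: str, variant: str) -> list[tuple[int, str, str, bool]]:
    """List (1-based position, original, replacement, conserved?) per substitution."""
    if len(wild_type) != len(variant):
        raise ValueError("substitution-only comparison requires equal lengths")
    return [
        (pos, a, b, is_conserved_substitution(a, b))
        for pos, (a, b) in enumerate(zip(wild_type, variant), start=1)
        if a != b
    ]


print(classify_substitutions("MDLK", "MELP"))  # hypothetical peptides
```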
[0068] One or more amino acid insertions may be introduced into the amino acid sequence shown in SEQ ID No. 4. Amino acid insertions may consist of single amino acid residues or sequential amino acids ranging from 2 to 15 amino acids in length. For example, amino acid insertions may be used to destroy target sequences so that the protein is no longer active. This procedure may be used in vivo to inhibit the activity of a protein of the invention.
[0069] Deletions may consist of the removal of one or more amino acids, or of discrete portions, from the amino acid sequence shown in SEQ ID No. 4. The deleted amino acids may or may not be contiguous. The resulting analog with a deletion mutation has a lower length limit of about 10 amino acids, preferably 100 amino acids.
[0070] Analogs of a protein of the invention may be prepared by introducing mutations into the nucleotide sequence encoding the protein. Mutations in nucleotide sequences constructed for expression of analogs of a protein of the invention must preserve the reading frame of the coding sequence. Furthermore, the mutations will preferably not create complementary regions that could hybridize to produce secondary mRNA structures, such as loops or hairpins, which could adversely affect translation of the mRNA.
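The reading-frame requirement can be checked with simple arithmetic: an insertion or deletion leaves the downstream frame intact only if its net length is a multiple of three, and even an in-frame point change can truncate the protein if it creates a stop codon. The Python sketch below illustrates both checks, assuming the standard genetic code; the function names and example coding sequences are hypothetical.

```python
from __future__ import annotations

STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code (assumed)


def preserves_reading_frame(inserted_nt: int, deleted_nt: int) -> bool:
    """An indel keeps the downstream frame only if the net change is a multiple of 3."""
    return (inserted_nt - deleted_nt) % 3 == 0


def first_stop_codon(cds: str) -> int | None:
    """1-based number of the first in-frame stop codon, or None if there is none."""
    cds = cds.upper()
    for i in range(0, len(cds) - 2, 3):
        if cds[i:i + 3] in STOP_CODONS:
            return i // 3 + 1
    return None


def introduces_premature_stop(wild_type_cds: str, mutant_cds: str) -> bool:
    """True if the mutant reaches an in-frame stop codon earlier than the wild type."""
    wt_stop = first_stop_codon(wild_type_cds)
    mut_stop = first_stop_codon(mutant_cds)
    return mut_stop is not None and (wt_stop is None or mut_stop < wt_stop)


# Hypothetical coding sequences: a C->T change turns codon 2 (CAA) into TAA.
print(preserves_reading_frame(inserted_nt=0, deleted_nt=3))        # True
print(introduces_premature_stop("ATGCAACCCTAG", "ATGTAACCCTAG"))  # True
```

In this numbering, a stop codon at codon n gives a polypeptide of n - 1 residues, counting the initiator methionine as residue 1.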
[0071] Mutations may be introduced at particular loci by synthesizing oligonucleotides containing a mutant sequence, flanked by restriction sites enabling ligation to fragments of the native sequence. Following ligation, the resulting reconstructed sequence encodes an analog having the desired amino acid insertion, substitution, or deletion.
[0072] Alternatively, oligonucleotide-directed site-specific mutagenesis procedures may be employed to provide an altered gene in which particular codons are altered according to the substitution, deletion, or insertion required. Deleted or truncated forms of a protein of the invention may also be constructed by utilizing convenient restriction endonuclease sites adjacent to the desired deletion. Following restriction digestion, overhangs may be filled in and the DNA religated. Exemplary methods of making the alterations set forth above are disclosed by Sambrook et al. (Molecular Cloning: A Laboratory Manual, 2nd Ed., Cold Spring Harbor Laboratory Press, 1989).
[0073] The proteins of the invention also include homologs of the amino acid sequence having the exon 1 region shown in SEQ ID No. 4 and/or truncations thereof as described herein.
[0074] A homologous protein includes a protein with an amino acid sequence having at least 70%, preferably 80-90%, identity with the amino acid sequence shown in SEQ ID No. 4 and including the exon 1 region characteristic of the MeCP2E1 protein. As with the nucleic acid molecules of the invention, identity is calculated according to methods known in the art. Sequence identity is most preferably assessed by the algorithm of BLAST version 2.1 advanced search. BLAST is a series of programs available online at www.ncbi.nlm.nih.gov/BLAST. The advanced BLAST search (www.ncbi.nlm.nih.gov/blast/blast.cgi?Jform=1) is set to default parameters (i.e. Matrix BLOSUM62; Gap existence cost 11; Per residue gap cost 1; Lambda ratio 0.85).
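Once two sequences have been aligned (for example with BLAST under the parameters above), percent identity is commonly reported as the number of identical aligned positions divided by the alignment length. The Python sketch below computes that figure from a pre-computed gapped alignment; it does not perform the alignment itself and is not BLAST. Counting gap columns in the denominator, the 70% default threshold and the example fragments are assumptions for illustration, and the check does not test for the exon 1 region also required above.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Identical, non-gap columns as a percentage of all alignment columns."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    if not aligned_a:
        raise ValueError("empty alignment")
    identical = sum(1 for a, b in zip(aligned_a, aligned_b) if a == b and a != gap)
    return 100.0 * identical / len(aligned_a)


def meets_homology_threshold(aligned_a: str, aligned_b: str, threshold: float = 70.0) -> bool:
    """Compare the identity figure with an (assumed) minimum percentage."""
    return percent_identity(aligned_a, aligned_b) >= threshold


# Hypothetical aligned fragments, gap shown as "-".
print(round(percent_identity("MAAK-V", "MAGKLV"), 1), meets_homology_threshold("MAAK-V", "MAGKLV"))
```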
[0075] The invention also contemplates isoforms of the proteins of the invention. An isoform contains the same number and kinds of amino acids as a protein of the invention, but the isoform has a different molecular structure. The isoforms contemplated by the present invention are those having the same properties as a protein of the invention as described herein.
[0076] The present invention also includes a protein of the invention conjugated with a selected protein, or a selectable marker protein (see below) to produce fusion proteins. Additionally, immunogenic portions of a protein of the invention are within the scope of the invention.
[0077] The proteins of the invention (including truncations, analogs, etc.) may be prepared using recombinant DNA methods. Accordingly, the nucleic acid molecules of the present invention having a sequence which encodes a protein of the invention may be incorporated in a known manner into an appropriate expression vector which ensures good expression of the protein. Possible expression vectors include but are not limited to cosmids, plasmids, or modified viruses (e.g. replication defective retroviruses, adenoviruses and adeno-associated viruses), so long as the vector is compatible with the host cell used. The expression vectors are "suitable for transformation of a host cell", which means that they contain a nucleic acid molecule of the invention and regulatory sequences, selected on the basis of the host cells to be used for expression, that are operatively linked to the nucleic acid molecule. "Operatively linked" is intended to mean that the nucleic acid is linked to regulatory sequences in a manner which allows expression of the nucleic acid.
[0078] The invention therefore contemplates a recombinant expression vector of the invention containing a nucleic acid molecule of the invention, or a fragment thereof, and the necessary regulatory sequences for the transcription and translation of the inserted protein-coding sequence. Suitable regulatory sequences may be derived from a variety of sources, including bacterial, fungal, or viral genes (for example, see the regulatory sequences described in Goeddel, Gene Expression Technology: Methods in Enzymology 185, Academic Press, San Diego, Calif. (1990)). Selection of appropriate regulatory sequences is dependent on the host cell chosen, and may be readily accomplished by one of ordinary skill in the art. Examples of such regulatory sequences include a transcriptional promoter and enhancer or RNA polymerase binding sequence, and a ribosomal binding sequence, including a translation initiation signal. Additionally, depending on the host cell chosen and the vector employed, other sequences, such as an origin of replication, additional DNA restriction sites, enhancers, and sequences conferring inducibility of transcription, may be incorporated into the expression vector. It will also be appreciated that the necessary regulatory sequences may be supplied by the native protein and/or its flanking regions.
[0079] The invention further provides a recombinant expression vector comprising a DNA molecule of the invention cloned into the expression vector in an antisense orientation. That is, the DNA molecule is operatively linked to a regulatory sequence in a manner which allows for expression, by transcription of the DNA molecule, of an RNA molecule which is antisense to a nucleotide sequence comprising the nucleotides shown in SEQ ID No. 3. Regulatory sequences operatively linked to the antisense nucleic acid can be chosen which direct the continuous expression of the antisense RNA molecule.
[0080] The recombinant expression vectors of the invention may also contain a selectable marker gene which facilitates the selection of host cells transformed or transfected with a recombinant molecule of the invention. Examples of selectable marker genes are genes encoding a protein that confers resistance to certain drugs, such as G418 and hygromycin, and genes encoding β-galactosidase, chloramphenicol acetyltransferase, or firefly luciferase. Transcription of the selectable marker gene is monitored by changes in the concentration of the selectable marker protein, such as β-galactosidase, chloramphenicol acetyltransferase, or firefly luciferase. If the selectable marker gene encodes a protein conferring antibiotic resistance, such as neomycin resistance, transformant cells can be selected with G418. Cells that have incorporated the selectable marker gene will survive, while the other cells die. This makes it possible to visualize and assay for expression of recombinant expression vectors of the invention and, in particular, to determine the effect of a mutation on expression and phenotype. It will be appreciated that selectable markers can be introduced on a separate vector from the nucleic acid of interest.
[0081] The recombinant expression vectors may also contain genes which encode a fusion moiety that increases expression of the recombinant protein, increases its solubility, and aids in the purification of a target recombinant protein by acting as a ligand in affinity purification. For example, a proteolytic cleavage site may be added to the target recombinant protein to allow separation of the recombinant protein from the fusion moiety after purification of the fusion protein.
[0082] Recombinant expression vectors can be introduced into host cells to produce a transformed host cell. The term "transformed host cell" is intended to include prokaryotic and eukaryotic cells which have been transformed or transfected with a recombinant expression vector of the invention. The terms "transformed with", "transfected with", "transformation" and "transfection" are intended to encompass introduction of nucleic acid (e.g. a vector) into a cell by one of many possible techniques known in the art. Prokaryotic cells can be transformed with nucleic acid by, for example, electroporation or calcium chloride-mediated transformation. Nucleic acid can be introduced into mammalian cells via conventional techniques such as calcium phosphate or calcium chloride co-precipitation, DEAE-dextran-mediated transfection, lipofectin, electroporation or microinjection. Suitable methods for transforming and transfecting host cells can be found in Sambrook et al. (Molecular Cloning: A Laboratory Manual, 2nd Edition, Cold Spring Harbor Laboratory Press (1989)), and other laboratory textbooks.
[0083] Suitable host cells include a wide variety of prokaryotic and eukaryotic host cells. For example, the proteins of the invention may be expressed in bacterial cells such as E. coli, insect cells (using baculovirus), yeast cells or mammalian cells. Other suitable host cells can be found in Goeddel, Gene Expression Technology: Methods in Enzymology 185, Academic Press, San Diego, Calif. (1990).
[0084] The proteins of the invention may also be prepared by chemical synthesis using techniques well known in the chemistry of proteins, such as solid phase synthesis (Merrifield, 1963, J. Am. Chem. Soc. 85:2149-2154) or synthesis in homogeneous solution (Houben-Weyl, 1987, Methods of Organic Chemistry, ed. E. Wünsch, Vol. 15 I and II, Thieme, Stuttgart).
III. Applications
A. Diagnostic Applications
[0085] As previously mentioned, the present inventors have isolated a novel splice variant of the MECP2 gene, MECP2E1, and have shown that exon 1 is deleted or mutated in people with neuropsychiatric disorders or developmental disorders such as Rett's syndrome or mental retardation. As a result, the present invention also includes a method of detecting a neuropsychiatric or developmental disorder by detecting a mutation or deletion in the MECP2E1 nucleic acid or MeCP2E1 protein.
[0086] The term "neuropsychiatric disorder" as used herein includes, but is not limited to, autism/autism spectrum disorder, epilepsy, Angelman syndrome, Prader-Willi syndrome, encephalopathy, schizophrenia, bipolar affective disorder, depression, obsessive compulsive disorder, panic disorder, attention deficit hyperactivity disorder, and ataxia.
[0087] The term "developmental disorder" includes but is not limited to, mental retardation.
(i) Detecting Mutations in the Nucleic Acid Sequence
[0088] In one embodiment, the present invention provides a method for detecting a neuropsychiatric or developmental disorder comprising detecting a deletion or mutation in exon 1 of the MECP2 gene in a sample obtained from an animal, preferably a mammal, more preferably a human.
[0089] The Examples and Table 1 summarize some of the mutations found in MECP2E1 in patients with Rett's syndrome or developmental delay (they are also described in Section I). Screening assays can be developed for each of the mutations. Examples of methods that can be used to detect mutations include sequencing, polymerase chain reaction, reverse transcription-polymerase chain reaction, denaturing HPLC, electrophoretic mobility, nucleic acid hybridization, fluorescent in situ hybridization and multiplex ligation-dependent probe amplification. Details of screening assays that may be employed are provided in Examples 3, 4 and 5.
[0090] Rett's syndrome has been shown to be caused by deletions in exon 1 of MECP2. Patients homozygous for these deletions can be detected by PCR-amplifying and sequencing exon 1 and flanking sequences using X1F/X1R primers. Consequently, the present invention includes a method for determining a deletion in exon 1 of the MECP2 gene by a method comprising:
[0091] (a) amplifying the nucleic acid sequences in the sample with primers X1F (5'-CCATCACAGCCAATGACG-3') (SEQ ID No. 19) and X1R (5'-AGGGGGAGGGTAGAGAGGAG-3') (SEQ ID No. 20) in a polymerase chain reaction;
[0092] (b) amplifying the nucleic acid sequences from a control with same primers;
[0093] (c) sequencing the amplified sequences; and
[0094] (d) comparing the sample sequences to the control sequences,
[0095] wherein deletion of nucleotides in the sample sequence compared to the control sequence indicates that the sample is from an animal with Rett's syndrome.
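Steps (c) and (d) amount to comparing two sequences and reporting bases present in the control but absent from the sample. The Python sketch below does this with the standard-library `difflib.SequenceMatcher`, a general-purpose diff tool standing in here for a proper sequence aligner; the function name and reads are hypothetical. Within repeats the reported deletion position may shift between equivalent placements, and a heterozygous sample gives mixed sequencing traces that this simple comparison does not resolve.

```python
from __future__ import annotations

from difflib import SequenceMatcher


def find_deletions(control: str, sample: str) -> list[tuple[int, int, str]]:
    """Return (start, end, deleted bases) in control coordinates (0-based, end exclusive).

    Only pure deletions are reported; substitutions and complex changes
    ("replace" opcodes) are ignored in this sketch.
    """
    # autojunk=False stops difflib from discarding frequent bases as "junk".
    matcher = SequenceMatcher(None, control, sample, autojunk=False)
    return [
        (i1, i2, control[i1:i2])
        for tag, i1, i2, _j1, _j2 in matcher.get_opcodes()
        if tag == "delete"
    ]


# Hypothetical reads: the sample lacks the GGGG block present in the control.
print(find_deletions("AAAACCCCGGGGTTTT", "AAAACCCCTTTT"))
```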
[0096] Additional exon 1 mutations not detectable by the PCR reaction can be identified using multiplex ligation-dependent probe amplification (MLPA) across all four exons. MLPA analysis is described in reference 5 and in Schouten, U.S. application Ser. No. 10/218,567 (publication number 2003/0108913), which are incorporated herein by reference. Accordingly, the present invention includes a method for determining a deletion in exon 1 of the MECP2 gene by performing MLPA analysis with 20 probe pairs that target the four MECP2 exons, six X-linked control regions and ten autosomal control regions.
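MLPA results are typically interpreted by normalizing each target probe's signal to control probes and comparing that value with the same ratio in a reference sample, giving a dosage ratio of roughly 1.0 when the copy number matches the reference and roughly 0.5 when one of two copies is deleted. The Python sketch below illustrates that arithmetic only. The peak values, the averaging of control probes, and the 0.7 and 1.3 cut-offs are hypothetical; because MECP2 is X-linked, a reference of the same sex as the subject is assumed.

```python
from __future__ import annotations

from statistics import mean

# Probe layout described in the text: 4 MECP2 exon probes + 6 X-linked + 10 autosomal controls.
PROBE_PAIRS = {"MECP2 exons": 4, "X-linked controls": 6, "autosomal controls": 10}


def dosage_ratio(sample_target: float, sample_controls: list[float],
                 ref_target: float, ref_controls: list[float]) -> float:
    """Target signal normalized to control probes, relative to a reference sample."""
    sample_norm = sample_target / mean(sample_controls)
    ref_norm = ref_target / mean(ref_controls)
    return sample_norm / ref_norm


def call_copy_number(ratio: float, low: float = 0.7, high: float = 1.3) -> str:
    """Hypothetical cut-offs; each laboratory sets its own validated thresholds."""
    if ratio < low:
        return "deletion"
    if ratio > high:
        return "duplication"
    return "normal"


# Hypothetical peak areas for an exon 1 probe, normalized to autosomal controls.
r = dosage_ratio(500.0, [1000.0, 980.0, 1020.0], 1000.0, [1000.0, 990.0, 1010.0])
print(sum(PROBE_PAIRS.values()), round(r, 2), call_copy_number(r))
```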
[0097] One skilled in the art will appreciate that other methods, in addition to those discussed above and in the Examples, can be used to detect mutations in exon 1 of the MECP2 gene. For example, in order to isolate nucleic acids from a sample, one can prepare nucleotide probes from the nucleic acid sequences of the invention. The nucleic acid probes described herein (for example, see FIG. 1) can also be used. A nucleotide probe may be labelled with a detectable marker such as a radioactive label which provides an adequate signal and has a sufficient half-life, such as P-32, H-3, C-14 or the like. Other detectable markers which may be used include antigens that are recognized by a specific labelled antibody, fluorescent compounds, enzymes, antibodies specific for a labelled antigen, and chemiluminescent compounds. An appropriate label may be selected having regard to the rate of hybridization and binding of the probe to the nucleotide to be detected and the amount of nucleotide available for hybridization.
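The half-life consideration can be made concrete with the decay law A/A0 = 0.5^(t / T½). The Python sketch below estimates the fraction of a label's activity remaining after a given time, using approximate half-lives for the three isotopes named (about 14.3 days for P-32, 12.3 years for H-3 and 5,730 years for C-14); the dictionary name and the elapsed time in the example are illustrative choices.

```python
# Approximate physical half-lives in days; H-3 and C-14 converted from years.
HALF_LIFE_DAYS = {
    "P-32": 14.3,
    "H-3": 12.3 * 365.25,
    "C-14": 5730 * 365.25,
}


def remaining_fraction(isotope: str, elapsed_days: float) -> float:
    """Fraction of the original activity left: A/A0 = 0.5 ** (t / T_half)."""
    return 0.5 ** (elapsed_days / HALF_LIFE_DAYS[isotope])


# Activity left four weeks after labelling (illustrative time point).
for isotope in HALF_LIFE_DAYS:
    print(isotope, f"{remaining_fraction(isotope, 28.6):.3f}")
```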
[0098] Accordingly, the present invention also relates to a method of detecting the presence of a nucleic acid molecule containing exon 1 of the MECP2 gene in a sample comprising contacting the sample under hybridization conditions with one or more nucleotide probes which hybridize to the nucleic acid molecule and are labelled with a detectable marker, and determining the degree of hybridization between the nucleic acid molecule in the sample and the nucleotide probes.
[0099] Hybridization conditions which may be used in the methods of the invention are known in the art and are described for example in Sambrook J, Fritsch E F, Maniatis T. In: Molecular Cloning, A Laboratory Manual, 1989. (Nolan C, Ed.), Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y. The hybridization product may be assayed using techniques known in the art. The nucleotide probe may be labelled with a detectable marker as described herein and the hybridization product may be assayed by detecting the detectable marker or the detectable change produced by the detectable marker.
[0100] Prior to hybridizing a sample with DNA probes, the sample can be treated with primers that flank the MECP2 gene in order to amplify the nucleic acid sequences in the sample. The primers used may be those described in the present application. For example, primers specific for human MECP2 include HF (ctcggagagagggctgtg) (SEQ ID No. 5), HR1 (cttgaggggtttgtccttga) (SEQ ID No. 6) and HR2 (cgtttgatcaccatgacctg) (SEQ ID No. 7). Primers for mouse MECP2 include MF (aggaggcgaggaggagagac) (SEQ ID No. 8) and MR (ctggctctgcagaatggtg) (SEQ ID No. 9). In addition, the sequence of the MECP2 gene provided herein also permits the identification and isolation, or synthesis, of new nucleotide sequences which may be used as primers to amplify a nucleic acid molecule of the invention. The primers may be used to amplify the genomic DNA of other species, and the PCR-amplified sequences can be examined to determine the relationship between the genes of various species.
[0101] The length and bases of the primers for use in the PCR are selected so that they will hybridize to different strands of the desired sequence, at relative positions along the sequence such that an extension product synthesized from one primer, when separated from its template, can serve as a template for extension of the other primer into a nucleic acid of defined length. Primers which may be used in the invention are oligonucleotides, i.e. molecules containing two or more deoxyribonucleotides of the nucleic acid molecule of the invention, which occur naturally as in a purified restriction endonuclease digest or are produced synthetically using techniques known in the art such as, for example, phosphotriester and phosphodiester methods (See Good et al. Nucl. Acid Res 4:2157, 1977) or automated techniques (See for example, Conolly, B. A. Nucleic Acids Res. 15(7): 3131, 1987). The primers are capable of acting as a point of initiation of synthesis when placed under conditions which permit the synthesis of a primer extension product complementary to the DNA sequence of the invention, i.e. in the presence of nucleotide substrates, an agent for polymerization such as DNA polymerase, and a suitable temperature and pH. Preferably, the primers do not form secondary structures, either by base pairing with other copies of the primer or by folding into a hairpin configuration. The primer preferably contains between about 7 and 25 nucleotides.
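Several of the design criteria above can be screened computationally. The Python sketch below checks primer length against the preferred 7-25 nucleotide range, reports GC fraction and a rough melting temperature by the Wallace rule (2 °C per A/T plus 4 °C per G/C, a rule of thumb for short oligonucleotides), and flags any 4-nucleotide stretch whose reverse complement also occurs in the primer as a crude indicator of hairpin or primer-dimer potential. The function names, the k-mer heuristic and its default length are assumptions, not a substitute for dedicated primer-design software; the sequences are the X1F, X1R and HF primers given in the text.

```python
# Watson-Crick complement for DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(COMPLEMENT)[::-1]


def gc_fraction(primer: str) -> float:
    primer = primer.upper()
    return (primer.count("G") + primer.count("C")) / len(primer)


def wallace_tm(primer: str) -> int:
    """Rough melting temperature (deg C) by the Wallace rule: 2(A+T) + 4(G+C)."""
    primer = primer.upper()
    at = primer.count("A") + primer.count("T")
    gc = primer.count("G") + primer.count("C")
    return 2 * at + 4 * gc


def has_self_complementary_kmer(primer: str, k: int = 4) -> bool:
    """Crude screen: does the reverse complement of any k-mer also occur in the primer?"""
    primer = primer.upper()
    kmers = {primer[i:i + k] for i in range(len(primer) - k + 1)}
    return any(reverse_complement(kmer) in kmers for kmer in kmers)


def check_primer(primer: str, min_len: int = 7, max_len: int = 25) -> dict:
    return {
        "length_ok": min_len <= len(primer) <= max_len,
        "gc_fraction": round(gc_fraction(primer), 2),
        "wallace_tm_c": wallace_tm(primer),
        "self_complementary_4mer": has_self_complementary_kmer(primer),
    }


# Primer sequences as given in the text (SEQ ID Nos. 19, 20 and 5).
for name, seq in {"X1F": "CCATCACAGCCAATGACG",
                  "X1R": "AGGGGGAGGGTAGAGAGGAG",
                  "HF": "ctcggagagagggctgtg"}.items():
    print(name, check_primer(seq))
```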
[0102] The primers may be labelled with detectable markers which allow for detection of the amplified products. Suitable detectable markers are radioactive markers such as P-32, S-35, I-125, and H-3; luminescent markers such as chemiluminescent markers, preferably luminol; fluorescent markers, preferably dansyl chloride, fluorescein-5-isothiocyanate, and 4-fluoro-7-nitrobenz-2-oxa-1,3-diazole; enzyme markers such as horseradish peroxidase, alkaline phosphatase, β-galactosidase, or acetylcholinesterase; or biotin.
[0103] It will be appreciated that the primers may contain non-complementary sequences provided that a sufficient amount of the primer contains a sequence which is complementary to a nucleic acid molecule of the invention or oligonucleotide fragment thereof, which is to be amplified. Restriction site linkers may also be incorporated into the primers allowing for digestion of the amplified products with the appropriate restriction enzymes facilitating cloning and sequencing of the amplified product.
[0104] In an embodiment of the invention, a method of determining the presence of a nucleic acid molecule of the invention is provided comprising treating the sample with primers which are capable of amplifying the nucleic acid molecule or a predetermined oligonucleotide fragment thereof in a polymerase chain reaction to form amplified sequences, under conditions which permit the formation of amplified sequences, and assaying for amplified sequences.
[0105] The polymerase chain reaction refers to a process for amplifying a target nucleic acid sequence as generally described in Innis et al., Academic Press, 1990; Mullis et al., U.S. Pat. No. 4,683,195; and Mullis, U.S. Pat. No. 4,683,202, which are incorporated herein by reference. Conditions for amplifying a nucleic acid template are described in M. A. Innis and D. H. Gelfand, PCR Protocols, A Guide to Methods and Applications M. A. Innis, D. H. Gelfand, J. J. Sninsky and T. J. White eds, pp 3-12, Academic Press 1989, which is also incorporated herein by reference.
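In the idealized exponential phase each cycle multiplies the number of target copies by (1 + E), where E is the amplification efficiency (E = 1 for perfect doubling), so N = N0(1 + E)^n after n cycles. The Python sketch below applies this formula; real reactions plateau as reagents are consumed, and the function names, efficiency values and copy numbers shown are hypothetical.

```python
def expected_copies(initial_copies: float, cycles: int, efficiency: float = 1.0) -> float:
    """Idealized exponential phase: N = N0 * (1 + E) ** n, with 0 < E <= 1."""
    if not 0.0 < efficiency <= 1.0:
        raise ValueError("efficiency must be in (0, 1]")
    return initial_copies * (1.0 + efficiency) ** cycles


def cycles_needed(initial_copies: float, target_copies: float, efficiency: float = 1.0) -> int:
    """Smallest number of cycles for the idealized model to reach `target_copies`."""
    if not 0.0 < efficiency <= 1.0:
        raise ValueError("efficiency must be in (0, 1]")
    if initial_copies <= 0:
        raise ValueError("initial_copies must be positive")
    cycles, copies = 0, float(initial_copies)
    while copies < target_copies:
        copies *= 1.0 + efficiency
        cycles += 1
    return cycles


print(expected_copies(100, 30))                 # 100 * 2**30 with perfect doubling
print(cycles_needed(100, 1e9, efficiency=0.9))  # hypothetical 90% efficiency
```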
[0106] The amplified products can be isolated and distinguished based on their respective sizes using techniques known in the art. For example, after amplification, the DNA sample can be separated on an agarose gel and visualized under ultraviolet (UV) light after staining with ethidium bromide. DNA may be amplified to a desired level and a further extension reaction may be performed to incorporate nucleotide derivatives having detectable markers, such as radioactively labelled or biotin-labelled nucleoside triphosphates. The primers may also be labelled with detectable markers as discussed above. The detectable markers may be analyzed by restriction and electrophoretic separation or other techniques known in the art.
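Before running a gel, the expected band size can be predicted from where the two primers anneal. The Python sketch below finds an exact match for the forward primer on the given strand and the reverse complement of the reverse primer downstream of it, and returns the distance spanned; it ignores mismatches, the opposite orientation and multiple binding sites. The function names, template and primer sequences are hypothetical, since the text does not give the MECP2 template needed to size the X1F/X1R product.

```python
from __future__ import annotations

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(COMPLEMENT)[::-1]


def expected_amplicon_length(template: str, forward: str, reverse: str) -> int | None:
    """Predicted product size for an exact-match primer pair, or None if a site is missing.

    The forward primer matches the given strand as written; the reverse primer
    anneals to that strand, so its reverse complement is searched for downstream.
    """
    template = template.upper()
    start = template.find(forward.upper())
    if start == -1:
        return None
    rev_site = reverse_complement(reverse)
    site = template.find(rev_site, start + len(forward))
    if site == -1:
        return None
    return site + len(rev_site) - start


# Hypothetical template: forward site, 10-nt spacer, reverse-primer site.
template = "TTT" + "ACGTAC" + "G" * 10 + "TACTGG" + "AAA"
print(expected_amplicon_length(template, "ACGTAC", "CCAGTA"))  # 22
```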
[0107] The conditions which may be employed in the methods of the invention using PCR are those which permit hybridization and amplification reactions to proceed in the presence of DNA in a sample and appropriate complementary hybridization primers. Conditions suitable for the polymerase chain reaction are generally known in the art. For example, see M. A. Innis and D. H. Gelfand, PCR Protocols, A guide to Methods and Applications M. A. Innis, D. H. Gelfand, J. J. Sninsky and T. J. White eds, pp 3-12, Academic Press 1989, which is incorporated herein by reference. Preferably, the PCR uses a polymerase obtained from the thermophilic bacterium Thermus aquaticus (Taq polymerase, GeneAmp Kit, Perkin Elmer Cetus), although other thermostable polymerases may also be used to amplify DNA template strands.
[0108] It will be appreciated that other techniques such as the Ligase Chain Reaction (LCR) and NASBA may be used to amplify a nucleic acid molecule of the invention (Barney in "PCR Methods and Applications", August 1991, Vol. 1(1), page 5, and European Published Application No. 0320308, published Jun. 14, 1989, and U.S. Pat. No. 5,130,238 to Malek).
(ii) Detecting the MeCP2E1 Protein
[0109] In another embodiment, the present invention provides a method for detecting a neuropsychiatric or developmental disorder comprising detecting a deletion or mutation in the MeCP2E1 protein in a sample from an animal.
[0110] The MeCP2E1 protein of the present invention may be detected in a biological sample using antibodies that are specific for MeCP2E1 using various immunoassays that are discussed below.
[0111] Conventional methods can be used to prepare the antibodies. For example, by using a peptide from the MeCP2E1 protein of the invention, polyclonal antisera or monoclonal antibodies can be made using standard methods. A mammal (e.g., a mouse, hamster, or rabbit) can be immunized with an immunogenic form of the peptide which elicits an antibody response in the mammal. Techniques for conferring immunogenicity on a peptide include conjugation to carriers or other techniques well known in the art. For example, the peptide can be administered in the presence of adjuvant. The progress of immunization can be monitored by detection of antibody titers in plasma or serum. Standard ELISA or other immunoassay procedures can be used with the immunogen as antigen to assess the levels of antibodies. Following immunization, antisera can be obtained and, if desired, polyclonal antibodies isolated from the sera.
[0112] To produce monoclonal antibodies, antibody-producing cells (lymphocytes) can be harvested from an immunized animal and fused with myeloma cells by standard somatic cell fusion procedures, thus immortalizing these cells and yielding hybridoma cells. Such techniques are well known in the art (e.g., the hybridoma technique originally developed by Kohler and Milstein (Nature 256, 495-497 (1975)) as well as other techniques such as the human B-cell hybridoma technique (Kozbor et al., Immunol. Today 4, 72 (1983)), the EBV-hybridoma technique to produce human monoclonal antibodies (Cole et al. Monoclonal Antibodies in Cancer Therapy (1985) Alan R. Liss, Inc., pages 77-96), and screening of combinatorial antibody libraries (Huse et al., Science 246, 1275 (1989))). Hybridoma cells can be screened immunochemically for production of antibodies specifically reactive with the peptide, and the monoclonal antibodies can be isolated. Therefore, the invention also contemplates hybridoma cells secreting monoclonal antibodies with specificity for a protein of the invention.
[0113] The term "antibody" as used herein is intended to include fragments thereof which also specifically react with a protein of the invention, or peptide thereof. Antibodies can be fragmented using conventional techniques and the fragments screened for utility in the same manner as described above. For example, F(ab')₂ fragments can be generated by treating antibody with pepsin. The resulting F(ab')₂ fragment can be treated to reduce disulfide bridges to produce Fab' fragments.
[0114] Chimeric antibody derivatives, i.e., antibody molecules that combine a non-human animal variable region and a human constant region, are also contemplated within the scope of the invention. Chimeric antibody molecules can include, for example, the antigen binding domain from an antibody of a mouse, rat, or other species, with human constant regions. Conventional methods may be used to make chimeric antibodies containing the immunoglobulin variable region which recognizes a protein of the invention (See, for example, Morrison et al., Proc. Natl. Acad. Sci. U.S.A. 81, 6851 (1984); Takeda et al., Nature 314, 452 (1985); Cabilly et al., U.S. Pat. No. 4,816,567; Boss et al., U.S. Pat. No. 4,816,397; Taniguchi et al., European Patent Publication EP171496; European Patent Publication 0173494; United Kingdom patent GB 2177096B).
[0115] Monoclonal or chimeric antibodies specifically reactive with a protein of the invention as described herein can be further humanized by producing human constant region chimeras, in which parts of the variable regions, particularly the conserved framework regions of the antigen-binding domain, are of human origin and only the hypervariable regions are of non-human origin. Such immunoglobulin molecules may be made by techniques known in the art (e.g., Teng et al., Proc. Natl. Acad. Sci. U.S.A., 80, 7308-7312 (1983); Kozbor et al., Immunology Today, 4, 72-79 (1983); Olsson et al., Meth. Enzymol., 92, 3-16 (1982); PCT Publication WO92/06193; or EP 0239400). Humanized antibodies can also be commercially produced (Scotgen Limited, 2 Holly Road, Twickenham, Middlesex, Great Britain).
[0116] Specific antibodies, or antibody fragments, reactive against a protein of the invention may also be generated by screening expression libraries encoding immunoglobulin genes, or portions thereof, expressed in bacteria with peptides produced from the nucleic acid molecules of the present invention. For example, complete Fab fragments, VH regions and Fv regions can be expressed in bacteria using phage expression libraries (See for example Ward et al., Nature 341, 544-546 (1989); Huse et al., Science 246, 1275-1281 (1989); and McCafferty et al. Nature 348, 552-554 (1990)).
[0117] Antibodies may also be prepared using DNA immunization. For example, an expression vector containing a nucleic acid of the invention (as described above) may be injected into a suitable animal such as a mouse. The protein of the invention will then be expressed in vivo and antibodies will be induced. The antibodies can be isolated and prepared as described above for protein immunization.
[0118] The antibodies may be labelled with a detectable marker including various enzymes, fluorescent materials, luminescent materials and radioactive materials. Examples of suitable enzymes include horseradish peroxidase, biotin, alkaline phosphatase, β-galactosidase, or acetylcholinesterase; examples of suitable fluorescent materials include umbelliferone, fluorescein, fluorescein isothiocyanate, rhodamine, dichlorotriazinylamine fluorescein, dansyl chloride or phycoerythrin; an example of a luminescent material includes luminol; and examples of suitable radioactive material include S-35, Cu-64, Ga-67, Zr-89, Ru-97, Tc-99m, Rh-105, Pd-109, In-111, I-123, I-125, I-131, Re-186, Au-198, Au-199, Pb-203, At-211, Pb-212 and Bi-212. The antibodies may also be labelled or conjugated to one partner of a ligand binding pair. Representative examples include avidin-biotin and riboflavin-riboflavin binding protein. Methods for conjugating or labelling the antibodies discussed above with the representative labels set forth above may be readily accomplished using conventional techniques.
[0119] The antibodies reactive against proteins of the invention (e.g. enzyme conjugates or labelled derivatives) may be used to detect a protein of the invention in various samples. For example, they may be used in any known immunoassay that relies on the binding interaction between an antigenic determinant of a protein of the invention and the antibodies. Examples of such assays are radioimmunoassays, enzyme immunoassays (e.g. ELISA), immunofluorescence, immunoprecipitation, latex agglutination, hemagglutination, and histochemical tests. Thus, the antibodies may be used to identify or quantify the amount of a protein of the invention in a sample in order to diagnose the presence of Rett's syndrome.
[0120] In a method of the invention a predetermined amount of a sample or concentrated sample is mixed with antibody or labelled antibody. The amount of antibody used in the process is dependent upon the labelling agent chosen. The resulting protein bound to antibody or labelled antibody may be isolated by conventional isolation techniques, for example, salting out, chromatography, electrophoresis, gel filtration, fractionation, absorption, polyacrylamide gel electrophoresis, agglutination, or combinations thereof.
[0121] The sample or antibody may be insolubilized; for example, the sample or antibody can be reacted using known methods with a suitable carrier. Examples of suitable carriers are Sepharose or agarose beads. When an insolubilized sample or antibody is used, protein bound to antibody or unreacted antibody is isolated by washing. For example, when the sample is blotted onto a nitrocellulose membrane, the antibody bound to a protein of the invention is separated from the unreacted antibody by washing with a buffer, for example, phosphate buffered saline (PBS) with bovine serum albumin (BSA).
[0122] When labelled antibody is used, the presence of MeCP2E1 can be determined by measuring the amount of labelled antibody bound to a protein of the invention in the sample or of the unreacted labelled antibody. The appropriate method of measuring the labelled material is dependent upon the labelling agent.
[0123] When unlabelled antibody is used in the method of the invention, the presence of MeCP2E1 can be determined by measuring the amount of antibody bound to the protein using substances that interact specifically with the antibody to cause agglutination or precipitation. In particular, labelled antibody against an antibody specific for a protein of the invention can be added to the reaction mixture. The presence of a protein of the invention can be determined by a suitable method from among the already described techniques, depending on the type of labelling agent. The antibody against an antibody specific for a protein of the invention can be prepared and labelled by conventional procedures known in the art which have been described herein. The antibody against an antibody specific for a protein of the invention may be a species-specific anti-immunoglobulin antibody or monoclonal antibody; for example, goat anti-rabbit antibody may be used to detect rabbit antibody specific for a protein of the invention.
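The paragraphs above leave the decision rule for calling a sample positive to the skilled reader. As an illustration only, the sketch below assumes a common convention that is not stated in this document: a sample is called positive when its signal (for example ELISA absorbance from bound labelled antibody) exceeds the mean of negative-control wells plus three standard deviations. The function names and control readings are hypothetical.

```python
from statistics import mean, stdev


def cutoff(negative_controls: list[float], k: float = 3.0) -> float:
    """Positivity threshold: mean of negative controls plus k sample standard deviations.

    The mean + 3 SD rule is an assumed convention, not one specified in the text.
    """
    if len(negative_controls) < 2:
        raise ValueError("need at least two negative controls to estimate spread")
    return mean(negative_controls) + k * stdev(negative_controls)


def is_positive(signal: float, negative_controls: list[float], k: float = 3.0) -> bool:
    """Return True if the measured signal exceeds the control-derived cutoff."""
    return signal > cutoff(negative_controls, k)


# Hypothetical absorbance readings, for illustration only.
negatives = [0.10, 0.12, 0.11, 0.09]
print(round(cutoff(negatives), 3))
print(is_positive(0.50, negatives), is_positive(0.12, negatives))
```

The multiplier `k` trades sensitivity against specificity; a validated assay would set it from its own control data.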
(iii) Kits
[0124] The reagents suitable for carrying out the methods of the invention may be packaged into convenient kits providing the necessary materials, packaged into suitable containers. Such kits may include all the reagents required to detect a nucleic acid molecule or protein of the invention in a sample by means of the methods described herein, and optionally suitable supports useful in performing the methods of the invention.
[0125] In one embodiment of the invention, the kit includes primers which are capable of amplifying a nucleic acid molecule of the invention or a predetermined oligonucleotide fragment thereof, all the reagents required to produce the amplified nucleic acid molecule or predetermined fragment thereof in the polymerase chain reaction, and means for assaying the amplified sequences. The kit may also include restriction enzymes to digest the PCR products. In another embodiment of the invention the kit contains a nucleotide probe which hybridizes with a nucleic acid molecule of the invention, reagents required for hybridization of the nucleotide probe with the nucleic acid molecule, and directions for its use. In a further embodiment of the invention the kit includes antibodies of the invention and reagents required for binding of the antibody to a protein of the invention in a sample.
[0126] The kits may include nucleic acid molecules, proteins or antibodies of the invention (described above) to detect or treat neuropsychiatric disorders and developmental disorders together with instructions for the use thereof.
[0127] The methods and kits of the present invention may be used to detect neuropsychiatric and developmental disorders such as Rett's syndrome and mental retardation. Samples which may be tested include bodily materials such as blood, urine, serum, tears, saliva, feces, tissues, organs, cells and the like. In addition to human samples, samples may be taken from mammals such as non-human primates, etc.
[0128] Before testing a sample in accordance with the methods described herein, the sample may be concentrated using techniques known in the art, such as centrifugation and filtration. For the hybridization and/or PCR-based methods described herein, nucleic acids may be extracted from cell extracts of the test sample using techniques known in the art.
B. Therapeutic Applications
[0129] As mentioned previously, the nucleic acid molecules of the present invention are deleted or mutated in people with neuropsychiatric disorders and developmental disorders. Accordingly, the present invention provides a method of treating or preventing neuropsychiatric disorders and developmental disorders by administering a nucleic acid sequence containing a sufficient portion of the MECP2E1 splice variant to treat or prevent neuropsychiatric disorders and developmental disorders. The present invention includes a use of a nucleic acid molecule or protein of the invention to treat or detect neuropsychiatric disorders and developmental disorders.
[0130] Recombinant molecules comprising a nucleic acid sequence or fragment thereof, may be directly introduced into cells or tissues in vivo using delivery vehicles such as retroviral vectors, adenoviral vectors and DNA virus vectors. They may also be introduced into cells in vivo using physical techniques such as microinjection and electroporation or chemical methods such as coprecipitation and incorporation of DNA into liposomes. Recombinant molecules may also be delivered in the form of an aerosol or by lavage.
[0131] The nucleic acid sequences may be formulated into pharmaceutical compositions for administration to subjects in a biologically compatible form suitable for administration in vivo. By "biologically compatible form suitable for administration in vivo" is meant a form of the substance to be administered in which any toxic effects are outweighed by the therapeutic effects. The substances may be administered to living organisms, including humans and animals. Administration of a therapeutically active amount of the pharmaceutical compositions of the present invention is defined as administration of an amount effective, at dosages and for periods of time necessary, to achieve the desired result. For example, a therapeutically active amount of a substance may vary according to factors such as the disease state, age, sex, and weight of the individual, and the ability of the substance to elicit a desired response in the individual. Dosage regimens may be adjusted to provide the optimum therapeutic response. For example, several divided doses may be administered daily or the dose may be proportionally reduced as indicated by the exigencies of the therapeutic situation.
[0132] The active substance may be administered in a convenient manner such as by injection (subcutaneous, intravenous, etc.), oral administration, inhalation, transdermal application, or rectal administration. Depending on the route of administration, the active substance may be coated in a material to protect the compound from the action of enzymes, acids and other natural conditions which may inactivate the compound.
[0133] The compositions described herein can be prepared by per se known methods for the preparation of pharmaceutically acceptable compositions which can be administered to subjects, such that an effective quantity of the active substance is combined in a mixture with a pharmaceutically acceptable vehicle. Suitable vehicles are described, for example, in Remington's Pharmaceutical Sciences (Remington's Pharmaceutical Sciences, Mack Publishing Company, Easton, Pa., USA 1985). On this basis, the compositions include, albeit not exclusively, solutions of the substances in association with one or more pharmaceutically acceptable vehicles or diluents, and contained in buffered solutions with a suitable pH and iso-osmotic with the physiological fluids.
C. Experimental Models
[0134] The present invention also includes methods and experimental models for studying the function of the MECP2 gene and MeCP2E1 protein. Cells, tissues and non-human animals that lack the MECP2E1 splice variant or partially lack MeCP2E1 expression may be developed using recombinant expression vectors having a specific deletion or mutation in the MECP2E1 gene. A recombinant expression vector may be used to inactivate or alter the MECP2 gene by homologous recombination and thereby create a MECP2E1-deficient cell, tissue or animal. In particular, a targeted mutation could be designed to abolish MECP2E1 while leaving MECP2E2 unaltered. This can be accomplished by targeting exon 1 of the MECP2 gene.
[0135] Null alleles may be generated in cells, such as embryonic stem cells, by deletion mutation. A recombinant MECP2 gene may also be engineered to contain an insertion mutation which inactivates MECP2E1. Such a construct may then be introduced into a cell, such as an embryonic stem cell, by a technique such as transfection, electroporation or injection. Cells lacking an intact MECP2 gene may then be identified, for example by Southern blotting, Northern blotting or by assaying for MECP2E1 using the methods described herein. Such cells may then be fused to embryonic stem cells to generate transgenic non-human animals deficient in MECP2E1. Germline transmission of the mutation may be achieved, for example, by aggregating the embryonic stem cells with early stage embryos, such as 8-cell embryos, in vitro; transferring the resulting blastocysts into recipient females; and generating germline transmission of the resulting aggregation chimeras. Such a mutant animal may be used to define specific cell populations, developmental patterns and in vivo processes normally dependent on MECP2E1 expression. The present invention also includes the preparation of tissue-specific knock-outs of the MECP2E1 variant.
[0136] The following non-limiting examples are illustrative of the present invention:
EXAMPLES
Example 1
Identification of MECP2E1 Splice Variant
[0137] Inspection of the 5'UTR revealed that, whereas exon 2 has a number of in-frame stops upstream of the ATG, exon 1 contains an open reading frame across its entire length including an ATG. Submitting a theoretical construct composed of exons 1, 3 and 4 to the ATGpr program (www.hri.co.jp/atgpr/), which predicts the likelihood of an ATG to be an initiation codon based on significance of its surrounding Kozak nucleotide context, returned a reliability score of 97% compared to 64% for MECP2E2. A search in EST databases identified eight examples of our theorized transcript (named MECP2E1) (FIG. 1b) (vs. 14 examples of MECP2E2). MECP2E1 would be predicted to encode a new variant, MeCP2E1, with an alternative longer N-terminus determined by exon 1.
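Two sequence features drive the reasoning above: in-frame stop codons upstream of an ATG (which rule out an upstream start in that frame) and the nucleotide context around the ATG. ATGpr is a trained predictor whose scoring is not reproduced here; the sketch below swaps in a much cruder, rule-based check (purine at -3 and G at +4, counting the A of ATG as +1). Function names are hypothetical, the MECP2E1 start context is taken from the forward primer given in [0147] (SEQ ID No. 16), and the UTR in the usage lines is a toy sequence.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def upstream_in_frame_stops(seq: str, atg: int) -> list[int]:
    """Return 0-based positions of stop codons upstream of, and in frame with, the ATG at `atg`."""
    seq = seq.upper()
    if seq[atg:atg + 3] != "ATG":
        raise ValueError(f"no ATG at position {atg}")
    # Step back one codon at a time from the ATG towards the 5' end.
    return [i for i in range(atg - 3, -1, -3) if seq[i:i + 3] in STOP_CODONS]


def kozak_context(seq: str, atg: int) -> str:
    """Crude Kozak check: purine at -3 and G at +4 relative to the A of ATG (+1)."""
    seq = seq.upper()
    purine_minus3 = atg >= 3 and seq[atg - 3] in "AG"
    g_plus4 = atg + 3 < len(seq) and seq[atg + 3] == "G"
    score = int(purine_minus3) + int(g_plus4)
    return {2: "strong", 1: "adequate", 0: "weak"}[score]


# MECP2E1 start context from the forward primer in [0147] (SEQ ID No. 16).
e1_context = "ggaaaATGgccg"
atg = e1_context.upper().find("ATG")
print(atg, kozak_context(e1_context, atg))

# Toy 5'UTR (hypothetical) with an in-frame TAA upstream of the ATG.
print(upstream_in_frame_stops("TAAGCCATGGCC", 6))
```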
Example 2
Expression of MECP2E1
[0138] To confirm that MECP2E1 is in fact expressed and not an artifact of cDNA library preparations, cDNA from a variety of tissues was PCR-amplified using a 5'-primer in exon 1 and a 3'-primer in exon 3 (FIG. 1a). Two PCR products corresponding to MECP2E2 and MECP2E1 by size and sequence were obtained in all tissues, including fetal and adult brain, and in brain subregions (FIG. 1c). Results in mouse were similar (FIG. 1c). The expression levels of the two transcripts in adult human brain were quantified. MECP2E1 expression is 10 times higher than MECP2E2 (FIG. 1d). The subcellular localization of MeCP2E1 following transfection of 3' myc-tagged MECP2E1 into COS-7 cells was found to be principally in the nucleus (FIG. 1e).
[0139] MECP2E1 was not detected in previous expression studies. Northern analyses reveal three transcripts, 1.9, 5 and 10.1 kb, with the differences in size due to alternative polyadenylation signal usage (4,6,8) (FIG. 1a). MECP2E1 differs from MECP2E2 in lacking the 124-nucleotide exon 2. At the 5 and 10.1 kb positions on the gel, the two transcripts would not be separable. In the 1.9 kb range, published northern blots do show a thick or double band likely corresponding to the two transcripts. Likewise, conventional western blot analysis would not allow resolution of the two MeCP2 isoforms (molecular weight difference <0.9 kDa; FIG. 1f).
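The <0.9 kDa figure can be checked by summing residue masses of the two isoform-specific N-termini, since the rest of the chain is shared. The E1-specific peptide is SEQ ID No. 18; the E2-specific peptide (MVAGMLGLR) is an assumption taken from the public reference sequence and is not given in this section, although its first residues match the exon 2 start codon context in the [0147] forward primer. Average residue masses are used.

```python
# Average amino acid residue masses in daltons.
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "L": 113.1594, "E": 129.1155, "M": 131.1926, "R": 156.1875,
}

E1_SPECIFIC = "MAAAAAAAPSGGGGGGEEERL"  # SEQ ID No. 18
# Assumption: E2-specific N-terminus as in the public reference sequence.
E2_SPECIFIC = "MVAGMLGLR"


def residue_mass(peptide: str) -> float:
    """Sum of residue masses.

    Both isoforms share the remaining chain, so the difference of these sums
    equals the mass difference between the full-length proteins.
    """
    return sum(RESIDUE_MASS[aa] for aa in peptide)


delta = residue_mass(E1_SPECIFIC) - residue_mass(E2_SPECIFIC)
extra_residues = len(E1_SPECIFIC) - len(E2_SPECIFIC)
print(f"MeCP2E1 - MeCP2E2: {delta:.0f} Da over {extra_residues} extra residues")
```

A difference of this size on proteins of roughly 50 kDa is well below what a conventional SDS-PAGE western blot resolves.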
Example 3
Mutations in MECP2E1 in Rett's Syndrome
[0140] To determine whether the new coding region is mutated in Rett's syndrome, exon 1 and flanking sequences were PCR-amplified and sequenced in 19 girls with typical RTT in whom no mutations had been found in the other exons. One patient (V1) was found to carry an 11 bp deletion in exon 1 (FIG. 2). The deletion lies within the predicted exon 1 open reading frame of MECP2E1 and causes a frameshift, producing an aberrant (missense) amino acid sequence followed by a premature stop codon after amino acid 36. It does not affect the coding sequence of MECP2E2. This sequence change was not found in 200 control individuals, including the patient's parents and brother.
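Whether a coding indel shifts the reading frame depends only on whether its net length is a multiple of three. The 11 bp deletion in patient V1 is not, which is why it causes a frameshift. A minimal sketch of that rule follows. The function name is hypothetical, and locating the resulting premature stop would also require the exon 1 sequence, which is not reproduced here.

```python
def indel_consequence(net_length_change: int) -> str:
    """Classify a coding-sequence indel by its net length change in bp.

    Negative values are deletions, positive values insertions.
    """
    if net_length_change == 0:
        return "no net length change"
    if net_length_change % 3 != 0:
        return "frameshift"
    codons = abs(net_length_change) // 3
    kind = "insertion" if net_length_change > 0 else "deletion"
    return f"in-frame {kind} of {codons} codon(s)"


for change in (-11, -3, 3, 9):
    print(change, indel_consequence(change))
```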
[0141] To search the remaining patients for additional exon 1 deletions not detectable by our PCR reaction, multiplex ligation-dependent probe amplification (MLPA) (5) was performed across all four exons and detected a hemizygous deletion of exon 1 in one patient (Patient V2; FIG. 3). Finally, an additional patient with an MLPA-detected deletion restricted to exon 1 was recently documented in abstract form, though the effect on MECP2E1 was not recognized (S. Boulanger et al. Am J Hum Genet 73, 572 (2003)).
[0142] In contrast, no mutation specific to the MeCP2E2-defining exon 2 has been found to date despite several hundred patients analyzed for mutations in this exon (31 publications; most reviewed in ref 3). These studies did not include exon 1 as it was considered non-coding.
[0143] Exon 1 deletions result in absent or truncated MeCP2E1 proteins. However, they also shorten the 5'UTR of MECP2E2 and could therefore affect its expression. This possibility was tested in patient V1 by RT-PCR on whole blood, and no reduction in MECP2E2 expression was found (FIG. 2c). In conclusion, the mutation data indicate that inactivation of MeCP2E1 is sufficient to cause RTT, whereas the same cannot be said, to date, of MeCP2E2.
Materials and Methods
[0144] PCR, manual sequencing, cloning, rtPCR, gel blotting. PCR amplification was performed using (NH₄)₂SO₄-containing PCR buffer (MBI Fermentas) with 1 M betaine and 200 µM dNTPs including 50% deaza-dGTP, with an initial 3-minute denaturing step at 95 °C, followed by 30 cycles of 95 °C for 30 s, 55 °C for 30 s and 72 °C for 45 s, and a final 7-minute soak at 72 °C. Following extraction from a 1% agarose gel, manual sequencing was performed using the Thermosequenase™ kit (USB/Amersham), and products were run on a 6% denaturing polyacrylamide gel for 3 hours. PCR products were cloned using the pDRIVE vector (Qiagen PCR cloning kit). Whole blood RNA was extracted using the PAXgene Blood RNA Kit (Qiagen). Reverse transcription was performed with random hexamers and a standard Superscript III protocol (Invitrogen). Human brain subregion cDNA was obtained from OriGene. The polyacrylamide gel in FIG. 2c was blotted onto Hybond N+ (Amersham) and hybridized with primer HF labeled at the 3' end with [α-³²P]-dCTP using terminal deoxynucleotidyl transferase (MBI Fermentas).
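The thermocycling program described above can be written as a small data structure, which also makes the total programmed hold time easy to check. This is an illustrative sketch: the class and constant names are hypothetical, and instrument-dependent ramp times between temperatures are ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hold:
    temp_c: float
    seconds: int


INITIAL_DENATURATION = Hold(95, 3 * 60)
CYCLE = (Hold(95, 30), Hold(55, 30), Hold(72, 45))  # denature, anneal, extend
N_CYCLES = 30
FINAL_SOAK = Hold(72, 7 * 60)


def total_hold_seconds() -> int:
    """Sum of programmed hold times, excluding temperature ramps."""
    per_cycle = sum(step.seconds for step in CYCLE)
    return INITIAL_DENATURATION.seconds + N_CYCLES * per_cycle + FINAL_SOAK.seconds


print(total_hold_seconds() / 60, "min")
```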
[0145] Preparation of neuronal and glial cultures. Cerebral cortices were prepared from 15.5 days postcoitum (15.5 dpc) embryos of CD-1 mice following the procedure of Yamasaki et al. (Yamasaki et al. Hum Mol Genet 12: 837-847, 2003). Briefly, fetal cerebral cortices without meninges were dissociated by mechanical trituration and digested with 0.25% trypsin with EDTA. After adding fetal bovine serum (FBS; GIBCO BRL), filtered cells were collected by centrifugation. The cell pellet was resuspended in Neurobasal medium (GIBCO BRL) supplemented with B-27 (GIBCO BRL) for growth of neurons or with G-5 (GIBCO BRL) for growth of glial cells. Cells were plated on polyethyleneimine-coated plastic dishes at a density of 2 × 10⁶ cells/ml. Cultures of neurons and glial cells were maintained in 5% CO₂ at 37 °C for 6 days and 12 days, respectively. Isolated brain cells were characterized by RT-PCR and immunofluorescence (IF) using the markers MAP2 (microtubule-associated protein 2) for neurons, GFAP (glial fibrillary acidic protein) for glial cells and NESTIN for progenitor cells. For IF, the following specific antibodies were used: mouse monoclonal anti-MAP2 (CHEMICON) and rabbit polyclonal anti-GFAP (DAKO). The primers used for RT-PCR were the same as those of Yamasaki et al. To obtain semi-quantitative PCR, the optimal cDNA concentration and number of cycles were determined using Gapdh amplification as an internal control. FIG. 4 shows the characterization of the primary brain cell cultures by RT-PCR (A) and IF (B).
[0146] Quantitative RT-PCR. To determine the quantity of the MECP2 transcripts in different tissues, we developed transcript-specific real-time quantitative PCR assays using the SYBR Green detection method (PE Applied Biosystems, ABI PRISM 7900 Sequence Detection System). The following MECP2E2-specific forward primer (25 nM) in exon 2 was designed: 5'-ctcaccagttcctgctttgatgt-3' (SEQ ID No. 12). The MECP2E1-specific forward primer (25 nM) was placed at the junction of exons 1 and 3: 5'-aggagagactggaagaaaagtc-3' (SEQ ID No. 10). Both assays used the same reverse primer (25 nM) in exon 3: 5'-cttgaggggtttgtccttga-3' (SEQ ID No. 11), producing fragments of 161 bp (MECP2E2) and 65 bp (MECP2E1). The corresponding transcript-specific primers (25 nM) for the mouse mecp2 transcripts (mecp2e2, 167 bp; mecp2e1, 71 bp) were 5'-ctcaccagttcctgctttgatgt-3' (SEQ ID No. 12) (MECP2E2), 5'-aggagagactggaggaaaagtc-3' (SEQ ID No. 13) (MECP2E1) and the common reverse primer 5'-cttaaacttcagtggcttgtctctg-3' (SEQ ID No. 14). PCR conditions were: 2 min at 50 °C, 10 min at 95 °C, and 40 cycles of 15 s at 95 °C and 85 s at 60 °C. The PCR reactions were performed in separate tubes, and absolute quantitation of the MECP2E2 and MECP2E1 transcripts was performed on cDNA from human adult brain, cerebellum, fibroblast and lymphoblast (Clontech, Palo Alto, USA), as well as from murine neuronal and glial cell cultures (see above). Results were analyzed using the standard curve method according to the manufacturer's instructions (PE Applied Biosystems, ABI PRISM 7900 Sequence Detection System). The standard curve was developed using dilutions of the transcript-specific purified PCR products.
[0147] Immunofluorescence light microscopy. 3'-myc-tagged MECP2E2 and MECP2E1 constructs (pcDNA3.1A-MECP2E2-myc and pcDNA3.1A-MECP2E1-myc) were generated by PCR amplification of the full-length cDNA of each transcript with BamHI (5') and XbaI (3') restriction sites attached, followed by cloning in-frame with myc into pcDNA3.1 version A (Invitrogen). The forward primer for MECP2E2 contained the start codon in exon 2 (5'-tatggatccATGgtagctgggat-3') (SEQ ID No. 15), while the forward primer for MECP2E1 included the start codon in exon 1 (5'-tatggatccggaaaATGgccg-3') (SEQ ID No. 16) (BamHI restriction site ggatcc; start codon uppercase). The reverse primer was the same for both amplifications (5'-gcgtctagagctaactctct-3') (SEQ ID No. 17) (XbaI restriction site tctaga). The template used for PCR was small intestine cDNA for MECP2E2 and skeletal muscle cDNA for MECP2E1. pcDNA3.1A-MECP2E2-myc and pcDNA3.1A-MECP2E1-myc (2 µg) were transfected into COS-7 cells using Lipofectamine (Invitrogen), and the cells were exposed to the lipid-DNA complex in DMEM (GIBCO) for 5 hours. Forty-eight hours post-transfection, the cultures were rinsed in PBS, fixed for 15 min at -20 °C in an acetone:methanol (1:1) mix, blocked for 1 hour (10% BSA in PBS) and incubated with anti-myc (Santa Cruz Biotechnology, 1:50 in blocking buffer) for 45 min at room temperature. After washing with PBS, slides were incubated with secondary antibody (FITC-labeled goat anti-mouse, Jackson ImmunoResearch Laboratories, 1:400, detectable through the green filter) in blocking solution, mounted with Dako Anti-Fade and analyzed by immunofluorescence light microscopy.
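A quick programmatic check can confirm that each cloning primer carries the intended restriction site and show where the start codon sits relative to it. The sketch uses the primer sequences from this paragraph; the helper names are hypothetical. It does not verify that the XbaI end keeps the insert in frame with myc.

```python
BAMHI = "GGATCC"  # BamHI recognition site
XBAI = "TCTAGA"   # XbaI recognition site

# Primer sequences from [0147]; the uppercase ATG marks the start codon.
PRIMERS = {
    "MECP2E2_forward": ("tatggatccATGgtagctgggat", BAMHI),
    "MECP2E1_forward": ("tatggatccggaaaATGgccg", BAMHI),
    "common_reverse": ("gcgtctagagctaactctct", XBAI),
}


def site_position(primer: str, site: str) -> int:
    """0-based position of a restriction site in a primer (case-insensitive), or -1."""
    return primer.upper().find(site)


for name, (primer, site) in PRIMERS.items():
    # The start codon is the only uppercase ATG, so a case-sensitive search finds it
    # (the reverse primer has none and reports -1).
    print(name, "site at", site_position(primer, site), "start codon at", primer.find("ATG"))
```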
[0148] MLPA analysis. MLPA was performed as described by Schouten et al., supra, and Schouten, supra. MECP2 test kits from MRC-Holland, Amsterdam, Netherlands (www.mrc-holland.com) were used; they consisted of 20 probe pairs targeting the four MECP2 exons, six X-linked control regions and ten autosomal control regions. Briefly, 100-200 ng of genomic DNA was denatured and hybridized with the probe mix overnight at 60 °C. The following morning, the paired probes were ligated using heat-stable Ligase-65 at 54 °C for 15 minutes. Ligation was followed by PCR with a common primer pair that hybridizes to the terminal ends of each ligation product. One PCR primer was FAM-labeled, and PCR conditions were as follows: 95 °C for 30 s, 60 °C for 30 s and 72 °C for 1 min. The resulting amplicons were analyzed on an ABI 3100 capillary electrophoresis instrument with ABI GeneScan software. All data management and comparisons to normal controls were done with Excel software.
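The comparison to normal controls is described here only as spreadsheet work. As an assumption-labelled sketch of the usual MLPA calculation, which is not necessarily the exact procedure used by the inventors: each probe's peak area is normalized to the summed reference-probe peaks of the same sample and then divided by the same ratio in a normal control. This gives a dosage quotient (DQ) near 1.0 for two copies and near 0.5 for a heterozygous deletion. The 0.7/1.3 thresholds, probe names and peak areas below are hypothetical.

```python
def dosage_quotients(sample: dict[str, float], control: dict[str, float],
                     reference_probes: list[str]) -> dict[str, float]:
    """Dosage quotient per probe: (sample / sample_ref) / (control / control_ref)."""
    sample_ref = sum(sample[p] for p in reference_probes)
    control_ref = sum(control[p] for p in reference_probes)
    return {p: (sample[p] / sample_ref) / (control[p] / control_ref) for p in sample}


def call_copy_number(dq: float, low: float = 0.7, high: float = 1.3) -> str:
    """Assumed thresholds: below `low` = deletion, above `high` = duplication."""
    if dq < low:
        return "deletion"
    if dq > high:
        return "duplication"
    return "normal"


# Hypothetical peak areas for a female patient and a normal female control.
refs = ["auto1", "auto2", "auto3"]
control = {"exon1": 1000.0, "exon2": 900.0, "auto1": 1100.0, "auto2": 950.0, "auto3": 1050.0}
patient = {"exon1": 480.0, "exon2": 880.0, "auto1": 1080.0, "auto2": 940.0, "auto3": 1030.0}
dq = dosage_quotients(patient, control, refs)
print({probe: (round(value, 2), call_copy_number(value)) for probe, value in dq.items()})
```

X-linked probes should be compared against sex-matched controls, since a normal male carries one copy.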
Discussion
[0149] Recently, studies in frog (Xenopus laevis) afforded important insight into the role of MeCP2 in neurodevelopmental transcription regulation. MeCP2 was shown to be a component of the SMRT complex involved in the regulation of genes involved in neuronal differentiation following developmental stage-specific mediation by Notch-Delta. The frog Mecp2 transcript targeted for silencing in these experiments is an orthologue of MECP2E1 (FIG. 1f). In fact, MeCP2E1 appears to be the only form of MeCP2 in non-mammalian vertebrates (FIG. 1f).
[0150] The new MeCP2 N-terminus is a distinctive 21-amino-acid peptide including polyalanine and polyglycine tracts (MAAAAAAAPSGGGGGGEEERL) (SEQ ID No. 18) (FIG. 1f). A similar N-terminus occurs in the ERK1 (MAPK3) extracellular signal-regulated kinase (FIG. 1f), a key common component of multiple signal transduction pathways. Intriguingly, in neurons both ERK1 and MeCP2 have been shown to be present in the post-synaptic compartment in addition to the nucleus, and ERK1 has been shown to translocate between the two compartments to link synaptic activity to transcriptional regulation. It is possible that MeCP2E1 similarly links synaptic function, in this case neurodevelopmental synaptic contact guidance, with transcriptional regulation. The only other proteins in which consecutive polyalanine and polyglycine tracts are found are some members of the homeobox (HOX) family, which, like MeCP2, are developmental transcription regulators.
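The polyalanine and polyglycine tracts in SEQ ID No. 18 can be located mechanically with a back-referencing regular expression. A sketch; the function name is hypothetical, and the default minimum tract length of five residues is an arbitrary illustrative threshold, not a definition used in the text.

```python
import re

E1_N_TERMINUS = "MAAAAAAAPSGGGGGGEEERL"  # SEQ ID No. 18


def homopolymer_tracts(peptide: str, min_len: int = 5) -> list[tuple[str, int, int]]:
    """Return (residue, 1-based start, length) for each single-residue run of length >= min_len."""
    if min_len < 2:
        raise ValueError("min_len must be at least 2")
    # (.) captures one residue; \1{n,} requires at least n further copies of it.
    pattern = re.compile(r"(.)\1{%d,}" % (min_len - 1))
    return [(m.group(1), m.start() + 1, len(m.group(0))) for m in pattern.finditer(peptide)]


print(homopolymer_tracts(E1_N_TERMINUS))
```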
[0151] Finally, non-inactivating MECP2 mutations have been associated with phenotypes that overlap RTT such as mental retardation and autism. The MeCP2 variant discovered in this study is a candidate for involvement in these disorders.
Example 4
Mutations in MECP2E1 in Mental Retardation
[0152] The inventors screened the MECP2E1 gene in N=401 autism probands and in N=493 patients with non-specific mental retardation. The autism probands comprised those recruited through the Hospital for Sick Children in Toronto (N=146; 114 male, 32 female) and from London, UK (N=13; 10 male, 3 female), as well as probands from multiplex families from the Autism Genetic Resource Exchange (AGRE; N=242; 100 female, 142 male). Local institutional ethics board approval was obtained, and written consent was given by participants. Anonymized DNA samples were also obtained for 293 female and 200 male patients with non-specific developmental delay/mental retardation who had been referred for fragile-X testing (but tested negative) to the Department of Pediatric Laboratory Medicine at the Hospital for Sick Children. Polymerase chain reaction followed by denaturing high performance liquid chromatography (DHPLC) was used for mutation detection, with PCR primers and conditions as described previously in Example 3. PCR products from female individuals suspected of carrying a sequence variant were cloned into the pDRIVE vector (Qiagen), and at least four clones were sequenced in forward and reverse directions using the automated BIGDYE™ Terminator v3.1 Cycle Sequencing Kit (ABI 3100). PCR products from males were excised from agarose gel, column purified and sequenced, also using the automated BIGDYE™ Terminator v3.1 Cycle Sequencing Kit (ABI 3100) in both forward and reverse directions. No mutations were identified in the autism screening set; however, sequence variants were identified in eight of the female MR cases (see FIG. 5), three of which result in insertion or deletion of amino acids within the polyalanine repeat stretch, and two of which result in insertion of a glycine residue within the polyglycine repeat at the N-terminal portion of MECP2E1.
The first individual identified was heterozygous for a deletion of a GpC dinucleotide positioned 45-46 bp upstream of the putative MECP2E1 start codon. This deletion could disrupt a potential SP1 transcription factor binding site (as predicted using AliBaba2.1 www.gene-regulation.com/pub/programs/alibaba2/index.html), and may also eliminate potentially methylatable cytosine residues. Another individual is heterozygous for an ApG dinucleotide deletion 26 bp upstream of the MECP2E1 start codon. Two individuals are heterozygous for a GGA trinucleotide insertion within a poly[GGA] stretch, which would result in an additional glycine residue within the predicted polyglycine stretch. A fifth individual is heterozygous for a GCC trinucleotide deletion within a triplet repeat stretch encoding polyalanine. Two individuals are heterozygous for a 9 bp insertion, also within the GCC trinucleotide repeat/polyalanine region, and would result in the polyalanine stretch being extended from seven to ten residues.
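At the protein level, each in-frame trinucleotide change within a GCC or GGA repeat lengthens or shortens the corresponding residue run by one, regardless of which repeat unit is affected. The sketch below applies the reported coding changes to the MeCP2E1 N-terminus (SEQ ID No. 18) and reports the resulting tract lengths. The two upstream dinucleotide deletions are omitted because they do not alter the coding sequence. Function and dictionary names are hypothetical.

```python
import re

E1_N_TERMINUS = "MAAAAAAAPSGGGGGGEEERL"  # SEQ ID No. 18


def first_tract(peptide: str, residue: str) -> re.Match:
    """Locate the first run of `residue` in the peptide."""
    match = re.search(re.escape(residue) + "+", peptide)
    if match is None:
        raise ValueError(f"no {residue} tract in peptide")
    return match


def resize_tract(peptide: str, residue: str, delta: int) -> str:
    """Lengthen (delta > 0) or shorten (delta < 0) the first run of `residue`."""
    m = first_tract(peptide, residue)
    new_len = len(m.group(0)) + delta
    if new_len < 0:
        raise ValueError("deletion longer than the tract")
    return peptide[:m.start()] + residue * new_len + peptide[m.end():]


# Reported coding variants expressed as residue changes (nucleotide change / 3).
VARIANTS = {
    "GGA insertion": ("G", +1),
    "GCC deletion": ("A", -1),
    "9 bp insertion in GCC repeat": ("A", +3),
}

for name, (residue, delta) in VARIANTS.items():
    variant = resize_tract(E1_N_TERMINUS, residue, delta)
    print(name, "->", residue, "tract length", len(first_tract(variant, residue).group(0)))
```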
[0153] The finding of amino acid sequence variation in ~2% of female non-specific MR cases, in a new isoform of a protein previously associated with a mental retardation syndrome, is extremely intriguing. Moreover, the fact that the variation occurs within a part of the protein that is conserved across many vertebrate species adds to the interest (100% identity to chimpanzee, orang-utan, macaque, cat and dog MeCP2E1 amino acid sequence). It would be particularly useful to know whether there are any specific phenotypic features among the individuals with the variants, how severe the symptoms are, and whether there are overlaps with or distinctions from the Rett syndrome phenotypes. However, since the DNAs were anonymized, it is not possible, in this instance, to correlate the mutations discovered with phenotypic features or severity. In an attempt to address this issue, a second sample set of MR cases (188 female and 96 male) from the Greenwood Genetic Center, South Carolina, was screened, followed by sequencing. No variants were found in the males, and two of the females carried the GGA insertion encoding an extra glycine residue.
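The ~2% figure can be checked against the counts given in Example 4. This assumes it refers to the five amino-acid-altering variants (three polyalanine, two polyglycine) among the 293 female MR samples. The sketch also attaches a 95% Wilson score interval. That interval is an illustrative addition, not a statistic reported by the inventors.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score confidence interval for a binomial proportion."""
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half_width, centre + half_width


# Cohort sizes from Example 4, with a consistency check on the stated totals.
autism = {"Toronto": 146, "London": 13, "AGRE": 242}
mr = {"female": 293, "male": 200}
assert sum(autism.values()) == 401 and sum(mr.values()) == 493

aa_variants_female = 5  # assumed numerator: 3 polyalanine + 2 polyglycine variants
p = aa_variants_female / mr["female"]
low, high = wilson_interval(aa_variants_female, mr["female"])
print(f"{p:.1%} (95% CI {low:.1%} to {high:.1%})")
```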
[0154] In the present study, three female MR patients were identified with a 3 bp insertion leading to an extra glycine residue within the polyglycine stretch at the N-terminal end of MeCP2E1. No disease association has previously been reported with expansion within a glycine repeat. The function of polyglycine stretches, either within the context of the MeCP2E1 protein or more generally, is not known, although a study of the Toc75 protein in plants suggests that a polyglycine stretch in the protein is essential for correct targeting of the protein to the chloroplast outer envelope. A similar function of protein trafficking may also be the case for mammalian proteins with polyglycine stretches, and for MeCP2E1.
[0155] The variants within the polyalanine tracts are of particular interest, because such tracts are rarely polymorphic and a number of small expansions (or duplications) within them have been reported to cause diseases, including cleidocranial dysplasia (RUNX2), oculopharyngeal muscular dystrophy (PABPN1) and mental retardation (ARX; this gene is also X-chromosomal and has a very broad array of phenotypes; see above). The majority of polyalanine disease genes encode transcription factors, although the PABPN1 gene encodes a polyadenylate-binding protein. On the one hand, among these diseases the smallest pathogenic repeats within the transcription factor genes are generally greater than 20 alanines in length. It could therefore be considered improbable that a stretch of alanines as short as that encoded by MECP2E1 could be pathogenic, and changes of 1 or 3 alanine residues could be considered likely to be rare polymorphisms. There is currently some uncertainty as to whether small expansions of 1 or 3 alanine residues within the ARX gene are pathogenic or innocent variants. On the other hand, oculopharyngeal muscular dystrophy is caused by mutations within a GCG tract in the PABPN1 gene that expand a polyalanine tract from just 10 alanine residues to between 12 and 17 alanine residues. Moreover, as with the polyalanine tract in MeCP2E1, the polyalanine tract in PABPN1 lies right at the N-terminal end of the protein. It is therefore possible that small mutations within repeat stretches in the N-terminal portion of a protein may be more detrimental than larger mutations located in the central portions of proteins.
[0156] A recently published study screened for mutations in MECP2 exon 1 among 97 Rett patients with no mutation in exons 2, 3 or 4, and among 146 controls. One of the Rett patients was found to have a 6 bp insertion within the polyalanine-encoding [GCC] stretch, but no such variations were observed among the controls. The variant was inherited from an unaffected mother, and the authors concluded that it was therefore unlikely to be etiologically relevant. However, it has also been demonstrated recently that even subtle changes in expression of MECP2 in mice can have profound neurological and behavioural consequences. Patients with the same MECP2 mutation may have very different phenotypic features and severity, and variation in X-inactivation pattern likely plays a role in this discordance. Thus it is plausible that variation in exon 1, either within the repeat stretches (changing the length of the polyalanine or polyglycine stretch) or in the region just upstream of the start codon, may affect function or expression levels, resulting in a neuropathological phenotype.
Example 5
Additional Mutations in MECP2E1 in Rett's Syndrome
[0157] The entire coding regions of exons 1, 2, 3 and 4 and their intronic flanking sequences were analyzed. Exons 2 to 4 were amplified by PCR with primer pairs designed using genomic sequence information from the Human Genome Project working draft site (UCSC, www.genome.ucsc.edu) and the Lasergene PrimerSelect program. The PCR products were run on a 2% agarose gel to confirm amplification before analysis for base changes by dHPLC (WAVE Nucleic Acid Fragment Analysis System from Transgenomic, San Jose, Calif.). Solvent A consisted of 0.1 mol/L triethylammonium acetate (TEAA) and 25% acetonitrile, and solvent B contained 1 M TEAA and 25% acetonitrile. PCR products showing a chromatographic variation on dHPLC were sequenced directly on an automatic sequencer (Gene Reader 4200). The sequencing data were analyzed using the DNA Star SeqMan software (Lasergene). Exon 1 was PCR-amplified and sequenced in all patients as described previously.
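The primers actually used are not disclosed here, and the PrimerSelect algorithm is not described. Purely as an illustration of the kind of quick check made during primer design, the sketch below computes GC content and a rough melting temperature by the Wallace rule (2 degrees C per A or T plus 4 degrees C per G or C), a crude approximation intended only for short oligonucleotides. The example primer, taken from positions 1 to 20 of SEQ ID No. 1, is hypothetical.

```python
def base_counts(primer: str) -> dict:
    """Count A, C, G and T in a primer (case-insensitive)."""
    primer = primer.upper()
    return {base: primer.count(base) for base in "ACGT"}


def gc_fraction(primer: str) -> float:
    counts = base_counts(primer)
    return (counts["G"] + counts["C"]) / len(primer)


def wallace_tm(primer: str) -> int:
    """Wallace rule: Tm ~ 2*(A+T) + 4*(G+C) degrees C; rough, short oligos only."""
    counts = base_counts(primer)
    return 2 * (counts["A"] + counts["T"]) + 4 * (counts["G"] + counts["C"])


hypothetical_primer = "ccggaaaatggccgccgccg"  # SEQ ID No. 1, positions 1-20
print(f"GC = {gc_fraction(hypothetical_primer):.0%}")   # GC = 75%
print(f"Tm ~ {wallace_tm(hypothetical_primer)} C")      # Tm ~ 70 C
```

For primers of typical PCR length, nearest-neighbor thermodynamic models are generally preferred over the Wallace rule.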
[0158] The first exon 1 mutation consists of two missing base pairs at the exon 1/intron 1 boundary. Because of the nature of the sequence in this region, we cannot resolve whether the two missing nucleotides are the first two base pairs of intron 1 (GT) or the last nucleotide of exon 1 (T) and the first nucleotide of intron 1 (G). In either case, the missing pair of nucleotides destroys the predicted consensus splice site and results in readthrough into intron 1 (data not shown). In the second patient with an exon 1 mutation, a 1A>T substitution (ATG to TTG) changes the initiating methionine codon into a leucine codon. MECP2E1 translation is therefore predicted to be greatly or totally hindered by the absence of a start codon, whereas MECP2E2 would be made normally (and appears unable to rescue the disease phenotype).
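The ambiguity described above arises because the deleted dinucleotide sits in a repeated context: exon 1 ends in T and intron 1 begins with GT, so removing "TG" (last exon base plus intron +1) or "GT" (intron +1 and +2) leaves an identical sequence. The sketch below checks this and enumerates every equivalent start position by sliding the deletion left and right. Only the exon 1 bases (positions 62 to 69 of SEQ ID No. 1) and the GT donor come from this document; the intron bases beyond +2 are not given here and are represented by N placeholders, so the right-hand limit of the shift is an assumption. HGVS nomenclature conventionally reports the most 3' equivalent position, which here corresponds to deletion of intron positions +1 and +2.

```python
EXON1_TAIL = "gagagact"          # SEQ ID No. 1 positions 62-69 (c.55-c.62); last base is T
INTRON1_HEAD = "gt" + "NNNNNN"   # GT donor from the text; N = bases not given here
genomic = EXON1_TAIL + INTRON1_HEAD


def delete(seq: str, start: int, length: int) -> str:
    """Remove seq[start:start+length] (0-based)."""
    return seq[:start] + seq[start + length:]


def equivalent_starts(seq: str, start: int, length: int) -> list:
    """All 0-based starts whose deletion of the same length yields the same product."""
    left = start
    # Shifting one base left is equivalent when the base entering equals the base leaving.
    while left > 0 and seq[left - 1] == seq[left - 1 + length]:
        left -= 1
    right = start
    while right + length < len(seq) and seq[right] == seq[right + length]:
        right += 1
    return list(range(left, right + 1))


exon_last = len(EXON1_TAIL) - 1   # index of c.62 (T)
intron_plus1 = len(EXON1_TAIL)    # index of intron +1 (G)

del_tg = delete(genomic, exon_last, 2)      # c.62 and intron +1
del_gt = delete(genomic, intron_plus1, 2)   # intron +1 and +2
print(del_tg == del_gt)                     # True

starts = equivalent_starts(genomic, intron_plus1, 2)
print(starts)                               # [7, 8]
most_3prime = starts[-1]
print(genomic[most_3prime:most_3prime + 2])  # gt
```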
Example 6
Additional Mutations in MECP2E1 in Rett's Syndrome
Patients
[0159] Thirty-five samples from females were referred to Children's Mercy Hospital for RTT testing over a two-year period from September 2004 through September 2006 (see, for example, Saunders, C. J., et al., "Novel Exon 1 Mutations in MECP2 Implicate Isoform MeCP2_e1 in Classical Rett Syndrome," American Journal of Medical Genetics, 149A: 1019-1023 (2009)). These patients had various clinical presentations, including autism, mental retardation, developmental delay and "Angelman-like" features, and only 9 patients fit the criteria for classical (N=7) or variant (N=2) RTT. Permission to review patient charts was obtained through the Children's Mercy Hospitals and Clinics' Institutional Review Board. In addition, 16 female patients were ascertained through either the Hospital for Sick Children or the Centre for Addiction and Mental Health in Toronto, with either autism and developmental delay (N=14) or Rett syndrome (N=2). This ascertainment was subsequent to the study reported by Mnatzakanian, G. et al., "A previously unidentified MECP2 open reading frame defines a new protein isoform relevant to Rett syndrome," Nat Genet., 36: 339-341 (2004), and there is no overlap of subjects between that study and the current study. Screening for mutations in MECP2 identified four patients with mutations involving exon 1.
[0160] Patient 1 was 20 years old at the time of testing and had a longstanding clinical diagnosis of RTT but had never undergone confirmatory DNA testing. She met the criteria for classical RTT, with the exception of acquired microcephaly (her head circumference is at the 15th percentile). Following normal perinatal development, she sat at 6 months, walked at 14 months and used simple words at 18 months, around which time she began to regress. She lost all speech as well as purposeful hand movements, which were replaced by a sifting activity. She now walks with a shuffling gait, exhibits some aggressive behavior, is nonverbal, and has medically intractable epilepsy.
[0161] Patient 2 was 7 years old at the time of testing. She met the criteria for classical RTT, with the exception of acquired microcephaly (head circumference at the 50th percentile). She had a period of normal development, smiling, rolling over and sitting at appropriate ages, but at around 10 months she exhibited global developmental delay. There was no clear regression in her skills at that point. Around the age of 2, she developed a stereotypic midline hand movement, with her left hand in her mouth and her right hand twirling her hair or rubbing it between her fingers. She commando-crawls for mobility and will take steps with assistance. She is very hirsute and has precocious puberty, with pubic hair development beginning at age 5. She has episodic seizures that do not require daily medication. She had previously tested negative for MECP2 mutations in exons 2-4, for MECP2 duplications and deletions, and in research testing involving sequencing of the MECP2 promoter region. The family came to the clinic in pursuit of mutation screening for the cyclin-dependent kinase-like 5 (CDKL5) gene, but on closer examination of the patient's medical record, it was discovered that exon 1 of MECP2 had not been sequenced.
[0162] Patient 3 was a 16-year-old female with a clinical diagnosis of Rett syndrome since 20 months of age. She had microcephaly, developmental regression, severe cognitive impairment, midline hand movements, a generalized tonic-clonic seizure disorder, loss of gait, diffuse hypertonicity, scoliosis treated with surgery, gastroesophageal reflux requiring a gastrostomy tube, and multiple hospitalizations for bacterial pneumonia. On her last admission for pneumonia, she succumbed to respiratory insufficiency and was not resuscitated. Brain autopsy showed microencephaly, subpial gliosis, minimal loss of Purkinje cells with gliosis, and isolated eosinophilic neurons in the dentate nucleus and brain stem. Previous testing of MECP2 exons 2-4 was negative.
[0163] Patient 4 had a clinical diagnosis of Rett syndrome since age 10. At birth, she had a normal head circumference but poor muscle tone. Global developmental delays, intense eye contact and screaming spells were noted in infancy. Teeth grinding, hand flapping and deterioration in fine motor skills began at age 3 to 4. Speech development was slow, but she acquired a vocabulary of about 25 words before the onset of speech loss at age 6, and she became nonverbal by age 10. She first walked at 14 months following intensive physiotherapy, and she still walks unassisted despite occasional loss of balance due to mild gait dyspraxia. Other significant medical history included scoliosis (treated with surgery) and chronic constipation. There is no history of seizures or acquired microcephaly. When the patient was 28 years old, the family sought molecular genetic testing to confirm the clinical diagnosis of Rett syndrome.
[0164] Research ethics board approval was obtained for the study, and written consent was obtained for the four patients described here.
Sequence Analysis
[0165] DNA from blood, or in the case of Patient 3 from cultured fibroblast cells, was extracted by a manual salting-out procedure (Lahiri, D. K. and Nurnberger, J. I., "A rapid non-enzymatic method for the preparation of HMW DNA from blood for RFLP studies," Nucleic Acids Res., 19: 5444 (1991)). For most of the 35 subjects, the entire MECP2 coding region (exons 1-4) was analyzed (primers and PCR conditions available upon request); for Patients 2 and 3, only exon 1 was analyzed, since the remaining coding region had previously been tested by an outside laboratory. Exon 1 of the MECP2 gene was PCR-amplified as described previously (Mnatzakanian, G. et al., "A previously unidentified MECP2 open reading frame defines a new protein isoform relevant to Rett syndrome," Nat Genet., 36: 339-341 (2004)) and verified on a 2% agarose gel. Fragments were purified using ExoSAP-IT (USB Corp., Cleveland, Ohio). Purified products were sequenced in both the forward and reverse directions by automated fluorescent dye-terminator sequencing using BigDye v3.0 (Applied Biosystems, Foster City, Calif.) and run on an ABI 310 (Applied Biosystems). For Patient 2, allele-specific sequence was obtained after cloning the heterozygous PCR product into a TA cloning vector (Invitrogen, Carlsbad, Calif.). The sequence data were compared to the MECP2 reference sequence AF030876 using Sequencher software (Gene Codes, Ann Arbor, Mich.).
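Because each product was sequenced in both directions, the reverse read has to be reverse-complemented before it can be compared base by base with the forward read. A minimal sketch of that comparison follows, assuming both reads are already trimmed to the same span with no insertions or deletions (software such as Sequencher performs a proper alignment instead). The reads are simulated from SEQ ID No. 1 and are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Complement each base, then reverse the order."""
    return seq.translate(COMPLEMENT)[::-1]


def discordant_positions(forward: str, reverse_read: str) -> list:
    """1-based positions where the forward read and the reverse-complemented
    reverse read disagree (assumes equal length and no indels)."""
    rc = reverse_complement(reverse_read).upper()
    return [i + 1 for i, (a, b) in enumerate(zip(forward.upper(), rc)) if a != b]


forward_read = "ccggaaaatggccgccgccg"            # hypothetical, SEQ ID No. 1 positions 1-20
reverse_read = reverse_complement(forward_read)  # simulated reverse-strand read
print(discordant_positions(forward_read, reverse_read))  # []

# Simulated reverse read carrying an A>T change at position 8 (c.1A>T)
variant = forward_read[:7] + "t" + forward_read[8:]
print(discordant_positions(forward_read, reverse_complement(variant)))  # [8]
```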
[0166] In silico analysis of the efficiency of translation start sites affected by exon 1 mutations was performed on MECP2 mRNA sequences using NetStart (www.cbs.dtu.dk/services/NetStart).
X-Chromosome Inactivation
[0167] X-chromosome inactivation was assessed on genomic DNA from peripheral blood leukocytes by methylation-sensitive restriction digestion followed by PCR amplification across the androgen receptor [CAG] repeat region, according to the method described by Plenge, R. M. et al., "Skewed X-chromosome inactivation is a common feature of X-linked mental retardation disorders," Am J Hum Genet., 71: 168-173 (2002).
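A calculation commonly used with this kind of androgen receptor assay can be sketched as follows. After digestion with a methylation-sensitive enzyme, mainly alleles on the methylated (inactive) X remain amplifiable, so the ratio of the two allele peak areas in the digested sample reflects the XCI pattern; dividing by the undigested peak areas corrects for unequal amplification of the two alleles. The peak areas below are hypothetical, and the 80:20 skewing cutoff is a commonly used convention assumed here rather than a value stated in this document.

```python
def xci_fraction(undigested_1: float, undigested_2: float,
                 digested_1: float, digested_2: float) -> float:
    """Fraction of the inactive-X signal from allele 1, normalized to the
    undigested sample to correct for unequal allele amplification."""
    r1 = digested_1 / undigested_1
    r2 = digested_2 / undigested_2
    return r1 / (r1 + r2)


def xci_pattern(fraction: float, skew_cutoff: float = 0.80) -> str:
    """Report as major:minor; the cutoff is an assumed convention."""
    major = max(fraction, 1 - fraction)
    label = "skewed" if major >= skew_cutoff else "random"
    return f"{round(major * 100)}:{round((1 - major) * 100)} ({label})"


# Hypothetical peak areas (undigested allele 1, allele 2, digested allele 1, allele 2)
print(xci_pattern(xci_fraction(1200, 800, 1080, 80)))   # 90:10 (skewed)
print(xci_pattern(xci_fraction(1000, 1000, 630, 370)))  # 63:37 (random)
```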
Results
[0168] In 51 samples tested for RTT, four unrelated patients with exon 1 mutations were identified.
[0169] In Patient 1, a c.1A>T mutation was detected that disrupts the initiation codon, changing it to a leucine codon. Because SEQ ID No. 1 contains non-coding exon sequence upstream of the start codon, the mutation is located at position 8 of SEQ ID No. 1, which corresponds to the first position of the coding sequence (c.1). In silico analysis of translation initiation using NetStart predicts that translation of MeCP2_e1 would be ablated, without any negative effect on translation of MeCP2_e2. The patient's mother tested negative for this mutation; however, the father's DNA was not available for testing. X-chromosome inactivation in peripheral blood leukocytes appeared to be random.
[0170] Patient 2 has a mutation, c.62+1delTG, affecting the splice donor site (Amir, R. E. et al., "Mutations in exon 1 of MECP2 are a rare cause of Rett syndrome," J Med Genet., 42: e15 (2005)). Because SEQ ID No. 1 contains non-coding exon sequence upstream of the start codon, the mutation is located at positions 69 and 70 of SEQ ID No. 1, which correspond to positions 62 and 63 of the coding sequence. Analysis of parental DNA showed that the mutation arose de novo; it was not present in either parent. This mutation is predicted to disrupt splicing of the MECP2E1 mRNA, and it may also affect translation of the MeCP2_e2 isoform from the exon 2-containing mRNA, MECP2E2 (Amir, R. E. et al., "Mutations in exon 1 of MECP2 are a rare cause of Rett syndrome," J Med Genet., 42: e15 (2005) and Saxena, A. et al., "Lost in translation: translational interference from a recurrent mutation in exon 1 of MECP2," J Med Genet., 43: 470-477 (2006)). This patient had a random pattern of X-chromosome inactivation in peripheral blood leukocytes.
[0171] Patient 3 had a C>T transition (c.5C>T) resulting in a missense mutation, A2V. Because SEQ ID No. 1 contains non-coding exon sequence upstream of the start codon, the mutation is located at position 12 of SEQ ID No. 1, which corresponds to the fifth position of the coding sequence (c.5). Although an alanine-to-valine substitution is conservative in retaining a nonpolar side chain, this residue is perfectly conserved throughout evolution and marks the beginning of a polyalanine stretch that is present in all vertebrate species (Harvey, C. G. et al., "Sequence Variants Within Exon 1 of MECP2 Occur in Females With Mental Retardation," Am J Med Genet (Neuropsychiatr Genet), 144: 355-360 (2007)). Although the role of this repeat is unknown, it contains multiple binding sites for the SP1 transcription factor, alteration of which could affect the rate of gene transcription. Both of this patient's parents tested negative for this mutation, indicating that it is a de novo mutation.
[0172] Patient 4 had an A>G transition (c.1A>G) that replaces the initiating methionine codon with a valine codon. Because SEQ ID No. 1 contains non-coding exon sequence upstream of the start codon, the mutation is located at position 8 of SEQ ID No. 1, which corresponds to the first position of the coding sequence (c.1). Both parents were negative for this mutation. As with Patient 1, this mutation is predicted to ablate translation of MeCP2_e1 without any negative effect on translation of MeCP2_e2. X-chromosome inactivation in peripheral blood leukocytes was skewed (90:10).
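The conversions between c. numbering and SEQ ID No. 1 positions in the paragraphs above amount to a fixed offset of 7, because SEQ ID No. 1 begins with 7 nt of non-coding sequence before the ATG. The sketch below, using the first 70 nt of SEQ ID No. 1 as printed in this document and the standard genetic code, applies each reported substitution and reads the first two codons in frame. The function names are illustrative; an in-frame "L" or "V" at codon 1 only shows the altered triplet, since initiation at that site is no longer expected.

```python
# First 70 nt of SEQ ID No. 1 as printed in this document's sequence listing.
SEQ1_1_70 = ("ccggaaaatg" "gccgccgccg" "ccgccgccgc" "gccgagcgga"
             "ggaggaggag" "gaggcgagga" "ggagagactg")
CDS_OFFSET = 7  # c.1 (the A of ATG) is position 8 of SEQ ID No. 1

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def c_to_seq1(c_position: int) -> int:
    """Map a coding (c.) position to a 1-based SEQ ID No. 1 position."""
    return c_position + CDS_OFFSET


def apply_substitution(seq: str, c_position: int, ref: str, alt: str) -> str:
    """Apply a c.-numbered substitution after checking the reference base."""
    pos = c_to_seq1(c_position)
    if seq[pos - 1].upper() != ref.upper():
        raise ValueError(f"reference mismatch at c.{c_position}")
    return seq[:pos - 1] + alt.lower() + seq[pos:]


def first_codons(seq: str, n: int = 2) -> str:
    """Translate the first n codons of the coding sequence, in frame."""
    cds = seq[CDS_OFFSET:CDS_OFFSET + 3 * n].upper()
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, 3 * n, 3))


print(first_codons(SEQ1_1_70))  # MA
for label, c_pos, ref, alt in [("c.1A>T", 1, "A", "T"),
                               ("c.1A>G", 1, "A", "G"),
                               ("c.5C>T", 5, "C", "T")]:
    mutant = apply_substitution(SEQ1_1_70, c_pos, ref, alt)
    print(label, c_to_seq1(c_pos), first_codons(mutant))
# c.1A>T 8 LA
# c.1A>G 8 VA
# c.5C>T 12 MV
```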
[0173] The presence in classic Rett patients of these missense and start codon mutations, which uniquely affect the MeCP2_e1 isoform, clearly indicates the importance of this isoform in the etiology of Rett syndrome. None of these sequence changes was identified in a previous study that screened MECP2 exon 1 in 1,811 subjects with developmental delay or autism and in 498 healthy adult control individuals (Harvey, C. G. et al., "Sequence Variants Within Exon 1 of MECP2 Occur in Females With Mental Retardation," Am J Med Genet (Neuropsychiatr Genet), 144: 355-360 (2007)).
Discussion
[0174] MECP2 was sequenced in 51 females with various clinical presentations, including developmental delay, autism, and atypical and classical RTT, who were referred to the laboratory for testing. In patients with identified mutations, X-chromosome inactivation was analyzed. Four patients were identified with exon 1 mutations: two affecting the start codon (c.1A>T and c.1A>G), one missense change (c.5C>T), and one previously reported splice site mutation (c.62+1delGT). All four patients met the criteria for classical RTT, and these findings therefore add support to previous reports that exon 1 mutations may be associated with a severe phenotype. These findings also add significant weight to the mounting evidence that the MeCP2_e1 isoform is the etiologically relevant form of the protein.
[0175] As discussed above, three mutations within exon 1 of the MECP2 gene were detected among the 35 clinical samples referred to Children's Mercy Hospital (CMH) for MECP2 sequencing, and one among the 16 samples from the Toronto patient set. All four were associated with classical RTT. Two of these patients had previously tested negative by molecular testing, which at the time included sequencing of exons 2-4 of the MECP2 gene. Following the reports of the second MeCP2 isoform (MeCP2_e1) and of the clinical utility of sequencing exon 1, these patients were tested for exon 1 mutations. The total number of distinct exon 1 mutations detected by sequencing is now 10. Two of these mutations, c.47_57del11nt and c.62+1delGT, have been found in more than one patient (see Table 2). This brings the number of Rett patients known to have a mutation within exon 1 of MECP2 to 14.
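The tallies in this paragraph can be reproduced directly from the mutation column of Table 2, one entry per reported patient; a short sketch:

```python
from collections import Counter

# Mutation column of Table 2, one entry per reported patient
TABLE2_MUTATIONS = [
    "c.1A>T", "c.1A>G", "c.5C>T", "c.23_27dup5nt", "c.30delCinsGA",
    "c.47_57del11nt", "c.47_57del11nt", "c.47_57del11nt", "c.47_57del11nt",
    "c.48_55dup", "c.59_60delGA", "c.62+1delGT", "c.62+1delGT",
    "c.62+2_62+3del",
]

counts = Counter(TABLE2_MUTATIONS)
recurrent = {mutation: n for mutation, n in counts.items() if n > 1}

print(len(TABLE2_MUTATIONS))  # 14 patients
print(len(counts))            # 10 distinct mutations
print(recurrent)              # {'c.47_57del11nt': 4, 'c.62+1delGT': 2}
```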
[0176] All mutations localized to exon 1 reported until recently have been either small insertions or deletions or large deletions removing the entire exon. The c.1A>T and c.1A>G mutations, which are single base pair changes, are the first point mutations to be reported in exon 1 of the MECP2 gene (see also Gauthier, J. et al., "Clinical stringency greatly improves mutation detection in Rett syndrome," Can J Neurol Sci, 32: 321-6 (2005)). The c.1A>T and c.1A>G mutations alter the initiation codon, which would most likely result in absent translation of MeCP2_e1. MeCP2_e2 would presumably be unaffected but is clearly unable to compensate, as evidenced by the patients' classic RTT symptoms. Patient 3 had a C>T transition (c.5C>T) resulting in a missense mutation, A2V. This alanine is a perfectly conserved residue that marks the beginning of a polyalanine stretch present in all vertebrate species (Harvey, C. G. et al., "Sequence Variants Within Exon 1 of MECP2 Occur in Females With Mental Retardation," Am J Med Genet (Neuropsychiatr Genet), 144: 355-360 (2007)). The role of this repeat is unknown, but it could play a role in the regulation of gene transcription, given the multiple binding sites for the SP1 transcription factor. Both of this patient's parents tested negative for this mutation, indicating that it is a de novo, most likely pathogenic mutation. This also emphasizes the functional importance of the N-terminal portion of MeCP2_e1. A number of lines of evidence point to the likelihood that the MeCP2_e1 isoform is more relevant to RTT etiology than MeCP2_e2: a) no exon 2 missense mutations (which should affect only MeCP2_e2) have been identified to date; b) MeCP2_e1 is the predominant isoform expressed in neuronal tissues (Kriaucionis, S. and Bird, A., "The major form of MECP2 has a novel N-terminus generated by alternative splicing," Nucleic Acids Res, 32: 1818-1823 (2004); Mnatzakanian, G. et al., "A previously unidentified MECP2 open reading frame defines a new protein isoform relevant to Rett syndrome," Nat Genet., 36: 339-341 (2004)); and c) MeCP2_e1 appears to be the ancestral form of the gene, as MeCP2_e2 is found only among the higher vertebrates (Mnatzakanian, G. et al., "A previously unidentified MECP2 open reading frame defines a new protein isoform relevant to Rett syndrome," Nat Genet., 36: 339-341 (2004) and Harvey, C. G. et al., "Sequence Variants Within Exon 1 of MECP2 Occur in Females With Mental Retardation," Am J Med Genet (Neuropsychiatr Genet), 144: 355-360 (2007)). On the other hand, analysis of the MECP2 exon 1 11 bp deletion (c.47_57del11nt (p.Gly16Glufs)) identified in a number of studies (Mnatzakanian, G. et al., "A previously unidentified MECP2 open reading frame defines a new protein isoform relevant to Rett syndrome," Nat Genet., 36: 339-341 (2004); Amir, R. E. et al., "Mutations in exon 1 of MECP2 are a rare cause of Rett syndrome," J Med Genet., 42: e15 (2005); Saxena, A. et al., "Lost in translation: translational interference from a recurrent mutation in exon 1 of MECP2," J Med Genet., 43: 470-477 (2006); and Ravn, K. et al., "Mutations found within exon 1 of MECP2 in Danish patients with Rett syndrome," Clin Genet., 67: 532-533 (2005)) has suggested that both isoforms of MeCP2 are disrupted in these patients, so a role for MeCP2_e2 in RTT etiology could not be excluded (Saxena, A. et al., "Lost in translation: translational interference from a recurrent mutation in exon 1 of MECP2," J Med Genet., 43: 470-477 (2006)). However, the missense and start codon mutations, in which only MeCP2_e1 is likely to be disrupted, cast further doubt on a role for MeCP2_e2 in the disorder.
[0177] Previous studies have concluded that sequencing exon 1 contributes little to the mutation detection rate in RTT, even in pre-selected populations such as classical RTT patients who had already tested negative for mutations in exons 2-4 of the gene (Amir, R. E. et al., "Mutations in exon 1 of MECP2 are a rare cause of Rett syndrome," J Med Genet., 42: e15 (2005); Evans, J. C. et al., "Variation in exon 1 coding region and promoter of MECP2 in Rett syndrome and controls," Eur J Hum Genet., 13: 124-126 (2005); and Quenard, A. et al., "Deleterious mutation in exon 1 of MECP2 in Rett syndrome," Eur J Med Genet., 49: 313-322 (2006)). However, the results of the study described herein, which spanned two years with a total of 51 female patients tested, only a minority of whom met the clinical criteria for classical RTT (9) or variant RTT (2), were quite different. Other clinical presentations, such as autism or developmental delay, which would be less likely to be associated with an MECP2 mutation, were much more frequent in this testing population. Seven other studies examining the exon 1 mutation frequency in Rett females have been published to date (see Table 3). All of these studies were restricted to patients meeting the criteria for classic or variant RTT and, except for one study (Quenard, A. et al., "Deleterious mutation in exon 1 of MECP2 in Rett syndrome," Eur J Med Genet., 49: 313-322 (2006)), all examined patients who had previously tested negative for mutations in exons 2-4. The detection rates for mutations within exon 1 in these studies range from 0% to 25% (see Table 3), with several groups concluding that exon 1 mutations are a rare cause of RTT (Amir, R. E. et al., "Mutations in exon 1 of MECP2 are a rare cause of Rett syndrome," J Med Genet., 42: e15 (2005); Evans, J. C. et al., "Variation in exon 1 coding region and promoter of MECP2 in Rett syndrome and controls," Eur J Hum Genet., 13: 124-126 (2005); and Quenard, A. et al., "Deleterious mutation in exon 1 of MECP2 in Rett syndrome," Eur J Med Genet., 49: 313-322 (2006)). In this study of 51 unselected patients, 4 had exon 1 mutations (7.8%). For comparison, if the numbers are restricted to only those patients who fit the classic or atypical RTT criteria, the exon 1 mutation frequency is 36%. The average detection rate from the reports listed in Table 3 is 8.1% (median 5%). Taken together, these data indicate that exon 1 mutations detectable by sequencing are slightly more common than previously reported (Amir, R. E. et al., "Mutations in exon 1 of MECP2 are a rare cause of Rett syndrome," J Med Genet., 42: e15 (2005); Evans, J. C. et al., "Variation in exon 1 coding region and promoter of MECP2 in Rett syndrome and controls," Eur J Hum Genet., 13: 124-126 (2005); and Quenard, A. et al., "Deleterious mutation in exon 1 of MECP2 in Rett syndrome," Eur J Med Genet., 49: 313-322 (2006)).
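The proportions quoted in this paragraph follow directly from the counts in Table 3 and in the present study. The sketch below reproduces the 7.8% overall rate, the 36% rate among the 11 patients meeting classic or variant RTT criteria, the pooled 14/496 total, and the 5% median of the per-report rates. The median is taken over all nine rows of Table 3 and is 5% whether or not the present study's row is included.

```python
from statistics import median

# (mutations found, patients screened) for each row of Table 3; last row = this study
TABLE3_ROWS = [(1, 19), (2, 63), (2, 212), (2, 10), (1, 20),
               (1, 20), (0, 97), (1, 4), (4, 51)]


def percent(found: int, tested: int) -> float:
    return 100 * found / tested


this_study = percent(4, 51)       # all 51 patients tested
rtt_only = percent(4, 9 + 2)      # only the 9 classical + 2 variant RTT patients
total_found = sum(found for found, _ in TABLE3_ROWS)
total_tested = sum(tested for _, tested in TABLE3_ROWS)
pooled = percent(total_found, total_tested)
row_median = median([percent(found, tested) for found, tested in TABLE3_ROWS])

print(f"{this_study:.1f}%")                                # 7.8%
print(f"{rtt_only:.0f}%")                                  # 36%
print(f"{total_found}/{total_tested}; {pooled:.1f}%")      # 14/496; 2.8%
print(f"median {row_median:.0f}%")                         # median 5%
```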
[0178] Although genotype-phenotype correlations are difficult to make in RTT because of differences in X-chromosome inactivation (XCI), several authors have observed that exon 1 mutations result in a severe RTT phenotype (Amir, R. E. et al., "Mutations in exon 1 of MECP2 are a rare cause of Rett syndrome," J Med Genet., 42: e15 (2005); Bartholdi, D. et al., "Clinical profiles of four patients with Rett syndrome carrying a novel exon 1 mutation or genomic rearrangement in the MECP2 gene," Clin Genet., 69: 319-326 (2006); and Chunshu, Y. et al., "A patient with classic Rett syndrome with a novel mutation in MECP2 exon 1," Clin Genet., 70: 530-531 (2006)). This could be because exon 1 mutations cause premature truncation of the more relevant, brain-dominant isoform (Kriaucionis, S. and Bird, A., "The major form of MECP2 has a novel N-terminus generated by alternative splicing," Nucleic Acids Res, 32: 1818-1823 (2004) and Mnatzakanian, G. et al., "A previously unidentified MECP2 open reading frame defines a new protein isoform relevant to Rett syndrome," Nat Genet., 36: 339-341 (2004)).
[0179] Of the 14 patients harboring mutations within exon 1, all but two had classic/severe RTT. The two patients with atypically mild RTT had the same c.47_57del11nt mutation, which has also been reported in classic RTT patients (Table 2); this difference in severity could be attributable to skewed XCI. All four of the patients in this study had classic RTT, and one of them died of pneumonia at the age of 16. Although the numbers are too small to be of any statistical significance, it is worth noting that 4 of the 14 patients listed in Table 2 died by the age of 25 (median age 17.5 years). RTT patients do have decreased survival compared to the general population, but survival to 20 years was 94% in a preliminary study of patients from Texas (del Junco, D. et al., "Survival in a large cohort of US girls and women with Rett syndrome," J Child Neurol., 8: 101-102 (1993), Abstract) and 85.3% in a large Australian cohort of 276 RTT patients (Laurvick, C. L. et al., "Rett syndrome in Australia: a review of the epidemiology," J Pediatr, 148: 347-352 (2006)).
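The median quoted above can be checked from the four deceased patients in Table 2, aged 16, 25, 19 and 6 1/2 years. The living patients' ages are censored observations, so these figures are descriptive only and are not a survival estimate directly comparable to the cohort figures cited.

```python
from statistics import median

AGES_AT_DEATH = [16, 25, 19, 6.5]  # deceased patients in Table 2 (years)
TABLE2_PATIENTS = 14

print(median(AGES_AT_DEATH))  # 17.5
share = len(AGES_AT_DEATH) / TABLE2_PATIENTS
print(f"{len(AGES_AT_DEATH)}/{TABLE2_PATIENTS} = {share:.0%}")  # 4/14 = 29%
```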
[0180] While the present invention has been described with reference to what are presently considered to be the preferred examples, it is to be understood that the invention is not limited to the disclosed examples. To the contrary, the invention is intended to cover various modifications and equivalent arrangements included within the spirit and scope of the appended claims.
[0181] All publications, patents and patent applications are herein incorporated by reference in their entirety to the same extent as if each individual publication, patent or patent application was specifically and individually indicated to be incorporated by reference in its entirety.
TABLE 1 MECP2E1 mutations or variants identified to date.
Nucleotide change | Position relative to NM_004992 (SEQ ID No. 1) | Amino acid change | Effect of change | Associated phenotype | Number of patients with mutation
11 bp deletion | Between 38 and 54 | Frameshift leads to nonsense mutation, premature truncation of protein after amino acid 36 | MECP2E1 disrupted, MECP2E2 not disrupted | Rett | 1
Exon 1 deletion | 1-69 | No MECP2E1 translation | MECP2E1 and MECP2E2 disrupted | Rett | 1
1A -> T | 8 | 1Met -> Leu | MECP2E1 disrupted, MECP2E2 possibly diminished | Rett | 1
del[TG] | 69 to 70 | Destroys exon 1/intron 1 splice site, resulting in read-through and nonsense translation, with truncation after amino acid 97 | MECP2E1 disrupted, MECP2E2 probably not disrupted | Rett | 1
ins[GCCGCCGCC] | Between nt 11 and 29 | ins[Ala]3 within N-terminal polyalanine stretch of MECP2E1 | May affect function and/or translation of MECP2E1, but not MECP2E2 | Developmental delay | 2
del[GCC] | Between nt 11 and 29 | del Ala within N-terminal polyalanine stretch of MECP2E1 | May affect function and/or translation of MECP2E1, but not MECP2E2 | Developmental delay | 1
ins[GGA] | Between 38 and 54 | ins Gly | May affect function and/or translation of MECP2E1, but not MECP2E2 | Developmental delay | 5
-45 del[GC] | -38 to -39 relative to BX538060 | In 5' UTR, 45 nt upstream of START codon; potential SP1 transcription factor binding site | May affect transcription or translation of MECP2E1 | Developmental delay | 1
-26 del[AG] | -19 to -20 relative to BX538060 | In 5' UTR, 26 nt upstream of START codon | May affect transcription or translation of MECP2E1 | Developmental delay | 1
"del" indicates a deletion; "ins" indicates an insertion.
TABLE 2 Summary of reported exon 1 sequence mutations in MECP2 to date.
Mutation | Patient age (age at death if deceased) | Cause of death | XCI | RTT phenotype
c.1A>T (p.Met1?) | 20 | n/a | 63:37 | classic
c.1A>G (p.Met1?) | 28 | n/a | 90:10 | classic
c.5C>T (p.A2V) | 16 | pneumonia | not done | classic
c.23_27dup5nt (p.Ser10Argfs) | 25 | not given | -- | classic
c.30delCinsGA (p.Ser10Argfs) | 19 | pneumonia | 70:30 | classic
c.47_57del11nt (p.Gly16Glufs) | 27 | n/a | -- | classic
c.47_57del11nt (p.Gly16Glufs) | 37 | n/a | -- | classic
c.47_57del11nt (p.Gly16Glufs) | ? | n/a | 44:56 | atypical (mild)
c.47_57del11nt (p.Gly16Glufs) | 13 | n/a | 73:27 | atypical (mild)
c.48_55dup (p.Glu19Alafs) | 5 | n/a | random | classic
c.59_60delGA (p.Arg20Thrfs) | 5 | n/a | 48:52 | classic
c.62+1delGT | 8 | n/a | 68:32 | classic
c.62+1delGT | 7 | n/a | 78:22 | classic
c.62+2_62+3del | 6 1/2 | not given | random | atypical (severe)
TABLE 3 Literature reports of exon 1 mutation frequency in females with RTT and variant RTT phenotype.
Frequency of mutations in exon 1 | Phenotype | Previously negative for exons 2-4 | Large gene rearrangements including exon 1
1/19; 5.2% | Typical RTT | Yes | 1 patient, exon 1
2/63; 3.2% | 38 classic RTT, 25 atypical RTT | Yes | Not tested
2/212; 0.9% | 211 typical RTT, 1 atypical (severe) RTT | No | 4 patients, large deletions*
2/10; 20% | Typical RTT | Yes | None
1/20; 5% | 12 classic RTT, 8 variant RTT | Yes | 1 patient, exons 1-2
1/20; 5% | Classic and atypical RTT | Yes | Not tested
0/97; 0% | 37 classic RTT and 60 atypical | Yes | None (not all were tested)
1/4; 25% | Classic RTT | Not specified | n/a
4/51; 7.8% | 9 classical RTT, 2 variant RTT (the rest have autism, MR, microcephaly, etc.) | 21 Patients | Not tested
Total: 14/496; 2.8% | | | 6 deletions
*One deletion including promoter and exon 1, one including exons 1-2, one including promoter and exons 1-2, and one complete gene deletion.
Sequence CWU
1
1
55110182DNAHomo sapiens 1ccggaaaatg gccgccgccg ccgccgccgc gccgagcgga
ggaggaggag gaggcgagga 60ggagagactg ctccataaaa atacagactc accagttcct
gctttgatgt gacatgtgac 120tccccagaat acaccttgct tctgtagacc agctccaaca
ggattccatg gtagctggga 180tgttagggct cagggaagaa aagtcagaag accaggacct
ccagggcctc aaggacaaac 240ccctcaagtt taaaaaggtg aagaaagata agaaagaaga
gaaagagggc aagcatgagc 300ccgtgcagcc atcagcccac cactctgctg agcccgcaga
ggcaggcaaa gcagagacat 360cagaagggtc aggctccgcc ccggctgtgc cggaagcttc
tgcctccccc aaacagcggc 420gctccatcat ccgtgaccgg ggacccatgt atgatgaccc
caccctgcct gaaggctgga 480cacggaagct taagcaaagg aaatctggcc gctctgctgg
gaagtatgat gtgtatttga 540tcaatcccca gggaaaagcc tttcgctcta aagtggagtt
gattgcgtac ttcgaaaagg 600taggcgacac atccctggac cctaatgatt ttgacttcac
ggtaactggg agagggagcc 660cctcccggcg agagcagaaa ccacctaaga agcccaaatc
tcccaaagct ccaggaactg 720gcagaggccg gggacgcccc aaagggagcg gcaccacgag
acccaaggcg gccacgtcag 780agggtgtgca ggtgaaaagg gtcctggaga aaagtcctgg
gaagctcctt gtcaagatgc 840cttttcaaac ttcgccaggg ggcaaggctg aggggggtgg
ggccaccaca tccacccagg 900tcatggtgat caaacgcccc ggcaggaagc gaaaagctga
ggccgaccct caggccattc 960ccaagaaacg gggccgaaag ccggggagtg tggtggcagc
cgctgccgcc gaggccaaaa 1020agaaagccgt gaaggagtct tctatccgat ctgtgcagga
gaccgtactc cccatcaaga 1080agcgcaagac ccgggagacg gtcagcatcg aggtcaagga
agtggtgaag cccctgctgg 1140tgtccaccct cggtgagaag agcgggaaag gactgaagac
ctgtaagagc cctgggcgga 1200aaagcaagga gagcagcccc aaggggcgca gcagcagcgc
ctcctcaccc cccaagaagg 1260agcaccacca ccatcaccac cactcagagt ccccaaaggc
ccccgtgcca ctgctcccac 1320ccctgccccc acctccacct gagcccgaga gctccgagga
ccccaccagc ccccctgagc 1380cccaggactt gagcagcagc gtctgcaaag aggagaagat
gcccagagga ggctcactgg 1440agagcgacgg ctgccccaag gagccagcta agactcagcc
cgcggttgcc accgccgcca 1500cggccgcaga aaagtacaaa caccgagggg agggagagcg
caaagacatt gtttcatcct 1560ccatgccaag gccaaacaga gaggagcctg tggacagccg
gacgcccgtg accgagagag 1620ttagctgact ttacacggag cggattgcaa agcaaaccaa
caagaataaa ggcagctgtt 1680gtctcttctc cttatgggta gggctctgac aaagcttccc
gattaactga aataaaaaat 1740attttttttt ctttcagtaa acttagagtt tcgtggcttc
agggtgggag tagttggagc 1800attggggatg tttttcttac cgacaagcac agtcaggttg
aagacctaac cagggccaga 1860agtagctttg cacttttcta aactaggctc cttcaacaag
gcttgctgca gatactactg 1920accagacaag ctgttgacca ggcacctccc ctcccgccca
aacctttccc ccatgtggtc 1980gttagagaca gagcgacaga gcagttgaga ggacactccc
gttttcggtg ccatcagtgc 2040cccgtctaca gctcccccag ctccccccac ctcccccact
cccaaccacg ttgggacagg 2100gaggtgtgag gcaggagaga cagttggatt ctttagagaa
gatggatatg accagtggct 2160atggcctgtg cgatcccacc cgtggtggct caagtctggc
cccacaccag ccccaatcca 2220aaactggcaa ggacgcttca caggacagga aagtggcacc
tgtctgctcc agctctggca 2280tggctaggag gggggagtcc cttgaactac tgggtgtaga
ctggcctgaa ccacaggaga 2340ggatggccca gggtgaggtg gcatggtcca ttctcaaggg
acgtcctcca acgggtggcg 2400ctagaggcca tggaggcagt aggacaaggt gcaggcaggc
tggcctgggg tcaggccggg 2460cagagcacag cggggtgaga gggattccta atcactcaga
gcagtctgtg acttagtgga 2520caggggaggg ggcaaagggg gaggagaaga aaatgttctt
ccagttactt tccaattctc 2580ctttagggac agcttagaat tatttgcact attgagtctt
catgttccca cttcaaaaca 2640aacagatgct ctgagagcaa actggcttga attggtgaca
tttagtccct caagccacca 2700gatgtgacag tgttgagaac tacctggatt tgtatatata
cctgcgcttg ttttaaagtg 2760ggctcagcac atagggttcc cacgaagctc cgaaactcta
agtgtttgct gcaattttat 2820aaggacttcc tgattggttt ctcttctccc cttccatttc
tgccttttgt tcatttcatc 2880ctttcacttc tttcccttcc tccgtcctcc tccttcctag
ttcatccctt ctcttccagg 2940cagccgcggt gcccaaccac acttgtcggc tccagtcccc
agaactctgc ctgccctttg 3000tcctcctgct gccagtacca gccccaccct gttttgagcc
ctgaggaggc cttgggctct 3060gctgagtccg acctggcctg tctgtgaaga gcaagagagc
agcaaggtct tgctctccta 3120ggtagccccc tcttccctgg taagaaaaag caaaaggcat
ttcccaccct gaacaacgag 3180ccttttcacc cttctactct agagaagtgg actggaggag
ctgggcccga tttggtagtt 3240gaggaaagca cagaggcctc ctgtggcctg ccagtcatcg
agtggcccaa caggggctcc 3300atgccagccg accttgacct cactcagaag tccagagtct
agcgtagtgc agcagggcag 3360tagcggtacc aatgcagaac tcccaagacc cgagctggga
ccagtacctg ggtccccagc 3420ccttcctctg ctcccccttt tccctcggag ttcttcttga
atggcaatgt tttgcttttg 3480ctcgatgcag acagggggcc agaacaccac acatttcact
gtctgtctgg tccatagctg 3540tggtgtaggg gcttagaggc atgggcttgc tgtgggtttt
taattgatca gttttcatgt 3600gggatcccat ctttttaacc tctgttcagg aagtccttat
ctagctgcat atcttcatca 3660tattggtata tccttttctg tgtttacaga gatgtctctt
atatctaaat ctgtccaact 3720gagaagtacc ttatcaaagt agcaaatgag acagcagtct
tatgcttcca gaaacaccca 3780caggcatgtc ccatgtgagc tgctgccatg aactgtcaag
tgtgtgttgt cttgtgtatt 3840tcagttattg tccctggctt ccttactatg gtgtaatcat
gaaggagtga aacatcatag 3900aaactgtcta gcacttcctt gccagtcttt agtgatcagg
aaccatagtt gacagttcca 3960atcagtagct taagaaaaaa ccgtgtttgt ctcttctgga
atggttagaa gtgagggagt 4020ttgccccgtt ctgtttgtag agtctcatag ttggactttc
tagcatatat gtgtccattt 4080ccttatgctg taaaagcaag tcctgcaacc aaactcccat
cagcccaatc cctgatccct 4140gatcccttcc acctgctctg ctgatgaccc ccccagcttc
acttctgact cttccccagg 4200aagggaaggg gggtcagaag agagggtgag tcctccagaa
ctcttcctcc aaggacagaa 4260ggctcctgcc cccatagtgg cctcgaactc ctggcactac
caaaggacac ttatccacga 4320gagcgcagca tccgaccagg ttgtcactga gaagatgttt
attttggtca gttgggtttt 4380tatgtattat acttagtcaa atgtaatgtg gcttctggaa
tcattgtcca gagctgcttc 4440cccgtcacct gggcgtcatc tggtcctggt aagaggagtg
cgtggcccac caggcccccc 4500tgtcacccat gacagttcat tcagggccga tggggcagtc
gtggttggga acacagcatt 4560tcaagcgtca ctttatttca ttcgggcccc acctgcagct
ccctcaaaga ggcagttgcc 4620cagcctcttt cccttccagt ttattccaga gctgccagtg
gggcctgagg ctccttaggg 4680ttttctctct atttccccct ttcttcctca ttccctcgtc
tttcccaaag gcatcacgag 4740tcagtcgcct ttcagcaggc agccttggcg gtttatcgcc
ctggcaggca ggggccctgc 4800agctctcatg ctgcccctgc cttggggtca ggttgacagg
aggttggagg gaaagcctta 4860agctgcagga ttctcaccag ctgtgtccgg cccagttttg
gggtgtgacc tcaatttcaa 4920ttttgtctgt acttgaacat tatgaagatg ggggcctctt
tcagtgaatt tgtgaacagc 4980agaattgacc gacagctttc cagtacccat ggggctaggt
cattaaggcc acatccacag 5040tctcccccac ccttgttcca gttgttagtt actacctcct
ctcctgacaa tactgtatgt 5100cgtcgagctc cccccaggtc tacccctccc ggccctgcct
gctggtgggc ttgtcatagc 5160cagtgggatt gccggtcttg acagctcagt gagctggaga
tacttggtca cagccaggcg 5220ctagcacagc tcccttctgt tgatgctgta ttcccatatc
aaaagacaca ggggacaccc 5280agaaacgcca catcccccaa tccatcagtg ccaaactagc
caacggcccc agcttctcag 5340ctcgctggat ggcggaagct gctactcgtg agcgccagtg
cgggtgcaga caatcttctg 5400ttgggtggca tcattccagg cccgaagcat gaacagtgca
cctgggacag ggagcagccc 5460caaattgtca cctgcttctc tgcccagctt ttcattgctg
tgacagtgat ggcgaaagag 5520ggtaataacc agacacaaac tgccaagttg ggtggagaaa
ggagtttctt tagctgacag 5580aatctctgaa ttttaaatca cttagtaagc ggctcaagcc
caggagggag cagagggata 5640cgagcggagt cccctgcgcg ggaccatctg gaattggttt
agcccaagtg gagcctgaca 5700gccagaactc tgtgtccccc gtctaaccac agctcctttt
ccagagcatt ccagtcaggc 5760tctctgggct gactgggcca ggggaggtta caggtaccag
ttctttaaga agatctttgg 5820gcatatacat ttttagcctg tgtcattgcc ccaaatggat
tcctgtttca agttcacacc 5880tgcagattct aggacctgtg tcctagactt cagggagtca
gctgtttcta gagttcctac 5940catggagtgg gtctggagga cctgcccggt gggggggcag
agccctgctc cctccgggtc 6000ttcctactct tctctctgct ctgacgggat ttgttgattc
tctccatttt ggtgtctttc 6060tcttttagat attgtatcaa tctttagaaa aggcatagtc
tacttgttat aaatcgttag 6120gatactgcct cccccagggt ctaaaattac atattagagg
ggaaaagctg aacactgaag 6180tcagttctca acaatttaga aggaaaacct agaaaacatt
tggcagaaaa ttacatttcg 6240atgtttttga atgaatacga gcaagctttt acaacagtgc
tgatctaaaa atacttagca 6300cttggcctga gatgcctggt gagcattaca ggcaagggga
atctggaggt agccgacctg 6360aggacatggc ttctgaacct gtcttttggg agtggtatgg
aaggtggagc gttcaccagt 6420gacctggaag gcccagcacc accctccttc ccactcttct
catcttgaca gagcctgccc 6480cagcgctgac gtgtcaggaa aacacccagg gaactaggaa
ggcacttctg cctgaggggc 6540agcctgcctt gcccactcct gctctgctcg cctcggatca
gctgagcctt ctgagctggc 6600ctctcactgc ctccccaagg ccccctgcct gccctgtcag
gaggcagaag gaagcaggtg 6660tgagggcagt gcaaggaggg agcacaaccc ccagctcccg
ctccgggctc cgacttgtgc 6720acaggcagag cccagaccct ggaggaaatc ctacctttga
attcaagaac atttggggaa 6780tttggaaatc tctttgcccc caaaccccca ttctgtccta
cctttaatca ggtcctgctc 6840agcagtgaga gcagatgagg tgaaaaggcc aagaggtttg
gctcctgccc actgatagcc 6900cctctccccg cagtgtttgt gtgtcaagtg gcaaagctgt
tcttcctggt gaccctgatt 6960atatccagta acacatagac tgtgcgcata ggcctgcttt
gtctcctcta tcctgggctt 7020ttgttttgct ttttagtttt gcttttagtt tttctgtccc
ttttatttaa cgcaccgact 7080agacacacaa agcagttgaa tttttatata tatatctgta
tattgcacaa ttataaactc 7140attttgcttg tggctccaca cacacaaaaa aagacctgtt
aaaattatac ctgttgctta 7200attacaatat ttctgataac catagcatag gacaagggaa
aataaaaaaa gaaaaaaaag 7260aaaaaaaaac gacaaatctg tctgctggtc acttcttctg
tccaagcaga ttcgtggtct 7320tttcctcgct tctttcaagg gctttcctgt gccaggtgaa
ggaggctcca ggcagcaccc 7380aggttttgca ctcttgtttc tcccgtgctt gtgaaagagg
tcccaaggtt ctgggtgcag 7440gagcgctccc ttgacctgct gaagtccgga acgtagtcgg
cacagcctgg tcgccttcca 7500cctctgggag ctggagtcca ctggggtggc ctgactcccc
cagtcccctt cccgtgacct 7560ggtcagggtg agcccatgtg gagtcagcct cgcaggcctc
cctgccagta gggtccgagt 7620gtgtttcatc cttcccactc tgtcgagcct gggggctgga
gcggagacgg gaggcctggc 7680ctgtctcgga acctgtgagc tgcaccaggt agaacgccag
ggaccccaga atcatgtgcg 7740tcagtccaag gggtcccctc caggagtagt gaagactcca
gaaatgtccc tttcttctcc 7800cccatcctac gagtaattgc atttgctttt gtaattctta
atgagcaata tctgctagag 7860agtttagctg taacagttct ttttgatcat ctttttttaa
taattagaaa caccaaaaaa 7920atccagaaac ttgttcttcc aaagcagaga gcattataat
caccagggcc aaaagcttcc 7980ctccctgctg tcattgcttc ttctgaggcc tgaatccaaa
agaaaaacag ccataggccc 8040tttcagtggc cgggctaccc gtgagccctt cggaggacca
gggctggggc agcctctggg 8100cccacatccg gggccagctc cggcgtgtgt tcagtgttag
cagtgggtca tgatgctctt 8160tcccacccag cctgggatag gggcagagga ggcgaggagg
ccgttgccgc tgatgtttgg 8220ccgtgaacag gtgggtgtct gcgtgcgtcc acgtgcgtgt
tttctgactg acatgaaatc 8280gacgcccgag ttagcctcac ccggtgacct ctagccctgc
ccggatggag cggggcccac 8340ccggttcagt gtttctgggg agctggacag tggagtgcaa
aaggcttgca gaacttgaag 8400cctgctcctt cccttgctac cacggcctcc tttccgtttg
atttgtcact gcttcaatca 8460ataacagccg ctccagagtc agtagtcaat gaatatatga
ccaaatatca ccaggactgt 8520tactcaatgt gtgccgagcc cttgcccatg ctgggctccc
gtgtatctgg acactgtaac 8580gtgtgctgtg tttgctcccc ttccccttcc ttctttgccc
tttacttgtc tttctggggt 8640ttttctgttt gggtttggtt tggtttttat ttctcctttt
gtgttccaaa catgaggttc 8700tctctactgg tcctcttaac tgtggtgttg aggcttatat
ttgtgtaatt tttggtgggt 8760gaaaggaatt ttgctaagta aatctcttct gtgtttgaac
tgaagtctgt attgtaacta 8820tgtttaaagt aattgttcca gagacaaata tttctagaca
ctttttcttt acaaacaaaa 8880gcattcggag ggagggggat ggtgactgag atgagagggg
agagctgaac agatgacccc 8940tgcccagatc agccagaagc cacccaaagc agtggagccc
aggagtccca ctccaagcca 9000gcaagccgaa tagctgatgt gttgccactt tccaagtcac
tgcaaaacca ggttttgttc 9060cgcccagtgg attcttgttt tgcttcccct ccccccgaga
ttattaccac catcccgtgc 9120ttttaaggaa aggcaagatt gatgtttcct tgaggggagc
caggagggga tgtgtgtgtg 9180cagagctgaa gagctgggga gaatggggct gggcccaccc
aagcaggagg ctgggacgct 9240ctgctgtggg cacaggtcag gctaatgttg gcagatgcag
ctcttcctgg acaggccagg 9300tggtgggcat tctctctcca aggtgtgccc cgtgggcatt
actgtttaag acacttccgt 9360cacatcccac cccatcctcc agggctcaac actgtgacat
ctctattccc caccctcccc 9420ttcccagggc aataaaatga ccatggaggg ggcttgcact
ctcttggctg tcacccgatc 9480gccagcaaaa cttagatgtg agaaaacccc ttcccattcc
atggcgaaaa catctcctta 9540gaaaagccat taccctcatt aggcatggtt ttgggctccc
aaaacacctg acagcccctc 9600cctcctctga gaggcggaga gtgctgactg tagtgaccat
tgcatgccgg gtgcagcatc 9660tggaagagct aggcagggtg tctgccccct cctgagttga
agtcatgctc ccctgtgcca 9720gcccagaggc cgagagctat ggacagcatt gccagtaaca
caggccaccc tgtgcagaag 9780ggagctggct ccagcctgga aacctgtctg aggttgggag
aggtgcactt ggggcacagg 9840gagaggccgg gacacactta gctggagatg tctctaaaag
ccctgtatcg tattcacctt 9900cagtttttgt gttttgggac aattacttta gaaaataagt
aggtcgtttt aaaaacaaaa 9960attattgatt gcttttttgt agtgttcaga aaaaaggttc
tttgtgtata gccaaatgac 10020tgaaagcact gatatattta aaaacaaaag gcaatttatt
aaggaaattt gtaccatttc 10080agtaaacctg tctgaatgta cctgtatacg tttcaaaaac
accccccccc cactgaatcc 10140ctgtaaccta tttattatat aaagagtttg ccttataaat
tt 10182

SEQ ID NO: 2 | Length: 486 | Type: PRT | Organism: Homo sapiens
Met Val Ala Gly Met Leu Gly Leu Arg Glu Glu Lys Ser Glu Asp Gln 16
Asp Leu Gln Gly Leu Lys Asp Lys Pro Leu Lys Phe Lys Lys Val Lys 32
Lys Asp Lys Lys Glu Glu Lys Glu Gly Lys His Glu Pro Val Gln Pro 48
Ser Ala His His Ser Ala Glu Pro Ala Glu Ala Gly Lys Ala Glu Thr 64
Ser Glu Gly Ser Gly Ser Ala Pro Ala Val Pro Glu Ala Ser Ala Ser 80
Pro Lys Gln Arg Arg Ser Ile Ile Arg Asp Arg Gly Pro Met Tyr Asp 96
Asp Pro Thr Leu Pro Glu Gly Trp Thr Arg Lys Leu Lys Gln Arg Lys 112
Ser Gly Arg Ser Ala Gly Lys Tyr Asp Val Tyr Leu Ile Asn Pro Gln 128
Gly Lys Ala Phe Arg Ser Lys Val Glu Leu Ile Ala Tyr Phe Glu Lys 144
Val Gly Asp Thr Ser Leu Asp Pro Asn Asp Phe Asp Phe Thr Val Thr 160
Gly Arg Gly Ser Pro Ser Arg Arg Glu Gln Lys Pro Pro Lys Lys Pro 176
Lys Ser Pro Lys Ala Pro Gly Thr Gly Arg Gly Arg Gly Arg Pro Lys 192
Gly Ser Gly Thr Thr Arg Pro Lys Ala Ala Thr Ser Glu Gly Val Gln 208
Val Lys Arg Val Leu Glu Lys Ser Pro Gly Lys Leu Leu Val Lys Met 224
Pro Phe Gln Thr Ser Pro Gly Gly Lys Ala Glu Gly Gly Gly Ala Thr 240
Thr Ser Thr Gln Val Met Val Ile Lys Arg Pro Gly Arg Lys Arg Lys 256
Ala Glu Ala Asp Pro Gln Ala Ile Pro Lys Lys Arg Gly Arg Lys Pro 272
Gly Ser Val Val Ala Ala Ala Ala Ala Glu Ala Lys Lys Lys Ala Val 288
Lys Glu Ser Ser Ile Arg Ser Val Gln Glu Thr Val Leu Pro Ile Lys 304
Lys Arg Lys Thr Arg Glu Thr Val Ser Ile Glu Val Lys Glu Val Val 320
Lys Pro Leu Leu Val Ser Thr Leu Gly Glu Lys Ser Gly Lys Gly Leu 336
Lys Thr Cys Lys Ser Pro Gly Arg Lys Ser Lys Glu Ser Ser Pro Lys 352
Gly Arg Ser Ser Ser Ala Ser Ser Pro Pro Lys Lys Glu His His His 368
His His His His Ser Glu Ser Pro Lys Ala Pro Val Pro Leu Leu Pro 384
Pro Leu Pro Pro Pro Pro Pro Glu Pro Glu Ser Ser Glu Asp Pro Thr 400
Ser Pro Pro Glu Pro Gln Asp Leu Ser Ser Ser Val Cys Lys Glu Glu 416
Lys Met Pro Arg Gly Gly Ser Leu Glu Ser Asp Gly Cys Pro Lys Glu 432
Pro Ala Lys Thr Gln Pro Ala Val Ala Thr Ala Ala Thr Ala Ala Glu 448
Lys Tyr Lys His Arg Gly Glu Gly Glu Arg Lys Asp Ile Val Ser Ser 464
Ser Met Pro Arg Pro Asn Arg Glu Glu Pro Val Asp Ser Arg Thr Pro 480
Val Thr Glu Arg Val Ser 486

SEQ ID NO: 3 | Length: 1504 | Type: DNA | Organism: Homo sapiens
ccggaaaatg gccgccgccg ccgccgccgc gccgagcgga ggaggaggag gaggcgagga 60
ggagagactg gaagaaaagt cagaagacca ggacctccag ggcctcaagg acaaacccct 120
caagtttaaa aaggtgaaga aagataagaa agaagagaaa gagggcaagc atgagcccgt 180
gcagccatca gcccaccact ctgctgagcc cgcagaggca ggcaaagcag agacatcaga 240
agggtcaggc tccgccccgg ctgtgccgga agcttctgcc tcccccaaac agcggcgctc 300
catcatccgt gaccggggac ccatgtatga tgaccccacc ctgcctgaag gctggacacg 360
gaagcttaag caaaggaaat ctggccgctc tgctgggaag tatgatgtgt atttgatcaa 420
tccccaggga aaagcctttc gctctaaagt ggagttgatt gcgtacttcg aaaaggtagg 480
cgacacatcc ctggacccta atgattttga cttcacggta actgggagag ggagcccctc 540
ccggcgagag cagaaaccac ctaagaagcc caaatctccc aaagctccag gaactggcag 600
aggccgggga cgccccaaag ggagcggcac cacgagaccc aaggcggcca cgtcagaggg 660
tgtgcaggtg aaaagggtcc tggagaaaag tcctgggaag ctccttgtca agatgccttt 720
tcaaacttcg ccagggggca aggctgaggg gggtggggcc accacatcca cccaggtcat 780
ggtgatcaaa cgccccggca ggaagcgaaa agctgaggcc gaccctcagg ccattcccaa 840
gaaacggggc cgaaagccgg ggagtgtggt ggcagccgct gccgccgagg ccaaaaagaa 900
agccgtgaag gagtcttcta tccgatctgt gcaggagacc gtactcccca tcaagaagcg 960
caagacccgg gagacggtca gcatcgaggt caaggaagtg gtgaagcccc tgctggtgtc 1020
caccctcggt gagaagagcg ggaaaggact gaagacctgt aagagccctg ggcggaaaag 1080
caaggagagc agccccaagg ggcgcagcag cagcgcctcc tcacccccca agaaggagca 1140
ccaccaccat caccaccact cagagtcccc aaaggccccc gtgccactgc tcccacccct 1200
gcccccacct ccacctgagc ccgagagctc cgaggacccc accagccccc ctgagcccca 1260
ggacttgagc agcagcgtct gcaaagagga gaagatgccc agaggaggct cactggagag 1320
cgacggctgc cccaaggagc cagctaagac tcagcccgcg gttgccaccg ccgccacggc 1380
cgcagaaaag tacaaacacc gaggggaggg agagcgcaaa gacattgttt catcctccat 1440
gccaaggcca aacagagagg agcctgtgga cagccggacg cccgtgaccg agagagttag 1500
ctga 1504

SEQ ID NO: 4 | Length: 498 | Type: PRT | Organism: Homo sapiens
Met Ala Ala Ala Ala Ala Ala Ala Pro Ser Gly Gly Gly Gly Gly Gly 16
Glu Glu Glu Arg Leu Glu Glu Lys Ser Glu Asp Gln Asp Leu Gln Gly 32
Leu Lys Asp Lys Pro Leu Lys Phe Lys Lys Val Lys Lys Asp Lys Lys 48
Glu Glu Lys Glu Gly Lys His Glu Pro Val Gln Pro Ser Ala His His 64
Ser Ala Glu Pro Ala Glu Ala Gly Lys Ala Glu Thr Ser Glu Gly Ser 80
Gly Ser Ala Pro Ala Val Pro Glu Ala Ser Ala Ser Pro Lys Gln Arg 96
Arg Ser Ile Ile Arg Asp Arg Gly Pro Met Tyr Asp Asp Pro Thr Leu 112
Pro Glu Gly Trp Thr Arg Lys Leu Lys Gln Arg Lys Ser Gly Arg Ser 128
Ala Gly Lys Tyr Asp Val Tyr Leu Ile Asn Pro Gln Gly Lys Ala Phe 144
Arg Ser Lys Val Glu Leu Ile Ala Tyr Phe Glu Lys Val Gly Asp Thr 160
Ser Leu Asp Pro Asn Asp Phe Asp Phe Thr Val Thr Gly Arg Gly Ser 176
Pro Ser Arg Arg Glu Gln Lys Pro Pro Lys Lys Pro Lys Ser Pro Lys 192
Ala Pro Gly Thr Gly Arg Gly Arg Gly Arg Pro Lys Gly Ser Gly Thr 208
Thr Arg Pro Lys Ala Ala Thr Ser Glu Gly Val Gln Val Lys Arg Val 224
Leu Glu Lys Ser Pro Gly Lys Leu Leu Val Lys Met Pro Phe Gln Thr 240
Ser Pro Gly Gly Lys Ala Glu Gly Gly Gly Ala Thr Thr Ser Thr Gln 256
Val Met Val Ile Lys Arg Pro Gly Arg Lys Arg Lys Ala Glu Ala Asp 272
Pro Gln Ala Ile Pro Lys Lys Arg Gly Arg Lys Pro Gly Ser Val Val 288
Ala Ala Ala Ala Ala Glu Ala Lys Lys Lys Ala Val Lys Glu Ser Ser 304
Ile Arg Ser Val Gln Glu Thr Val Leu Pro Ile Lys Lys Arg Lys Thr 320
Arg Glu Thr Val Ser Ile Glu Val Lys Glu Val Val Lys Pro Leu Leu 336
Val Ser Thr Leu Gly Glu Lys Ser Gly Lys Gly Leu Lys Thr Cys Lys 352
Ser Pro Gly Arg Lys Ser Lys Glu Ser Ser Pro Lys Gly Arg Ser Ser 368
Ser Ala Ser Ser Pro Pro Lys Lys Glu His His His His His His His 384
Ser Glu Ser Pro Lys Ala Pro Val Pro Leu Leu Pro Pro Leu Pro Pro 400
Pro Pro Pro Glu Pro Glu Ser Ser Glu Asp Pro Thr Ser Pro Pro Glu 416
Pro Gln Asp Leu Ser Ser Ser Val Cys Lys Glu Glu Lys Met Pro Arg 432
Gly Gly Ser Leu Glu Ser Asp Gly Cys Pro Lys Glu Pro Ala Lys Thr 448
Gln Pro Ala Val Ala Thr Ala Ala Thr Ala Ala Glu Lys Tyr Lys His 464
Arg Gly Glu Gly Glu Arg Lys Asp Ile Val Ser Ser Ser Met Pro Arg 480
Pro Asn Arg Glu Glu Pro Val Asp Ser Arg Thr Pro Val Thr Glu Arg 496
Val Ser 498

SEQ ID NO: 5 | Length: 18 | Type: DNA | Organism: Artificial Sequence | Description: Synthetic HF primer
ctcggagaga gggctgtg 18

SEQ ID NO: 6 | Length: 20 | Type: DNA | Organism: Artificial Sequence | Description: Synthetic HR1 primer
cttgaggggt ttgtccttga 20

SEQ ID NO: 7 | Length: 20 | Type: DNA | Organism: Artificial Sequence | Description: Synthetic HR2 primer
cgtttgatca ccatgacctg 20

SEQ ID NO: 8 | Length: 20 | Type: DNA | Organism: Artificial Sequence | Description: Synthetic MF primer
aggaggcgag gaggagagac 20

SEQ ID NO: 9 | Length: 19 | Type: DNA | Organism: Artificial Sequence | Description: Synthetic MR primer
ctggctctgc agaatggtg 19

SEQ ID NO: 10 | Length: 22 | Type: DNA | Organism: Artificial Sequence | Description: Synthetic MECP2B-specific primer
aggagagact ggaagaaaag tc 22

SEQ ID NO: 11 | Length: 20 | Type: DNA | Organism: Artificial Sequence | Description: Synthetic reverse primer
cttgaggggt ttgtccttga 20

SEQ ID NO: 12 | Length: 23 | Type: DNA | Organism: Artificial Sequence | Description: Synthetic MECP2A transcript-specific primer
ctcaccagtt cctgctttga tgt 23

SEQ ID NO: 13 | Length: 22 | Type: DNA | Organism: Artificial Sequence | Description: Synthetic MECP2B transcript-specific primer
aggagagact ggaggaaaag tc 22

SEQ ID NO: 14 | Length: 25 | Type: DNA | Organism: Artificial Sequence | Description: Synthetic reverse primer
cttaaacttc agtggcttgt ctctg 25

SEQ ID NO: 15 | Length: 23 | Type: DNA | Organism: Artificial Sequence | Description: Synthetic MECP2A forward primer
tatggatcca tggtagctgg gat 23

SEQ ID NO: 16 | Length: 21 | Type: DNA | Organism: Artificial Sequence | Description: Synthetic MECP2B forward primer
tatggatccg gaaaatggcc g 21

SEQ ID NO: 17 | Length: 20 | Type: DNA | Organism: Artificial Sequence | Description: Synthetic reverse primer
gcgtctagag ctaactctct 20

SEQ ID NO: 18 | Length: 21 | Type: PRT | Organism: Artificial Sequence | Description: Synthetic MeCP2 N-terminus peptide
Met Ala Ala Ala Ala Ala Ala Ala Pro Ser Gly Gly Gly Gly Gly Gly 16
Glu Glu Glu Arg Leu 21

SEQ ID NO: 19 | Length: 18 | Type: DNA | Organism: Artificial Sequence | Description: Synthetic X1F primer
ccatcacagc caatgacg 18

SEQ ID NO: 20 | Length: 20 | Type: DNA | Organism: Artificial Sequence | Description: Synthetic X1R primer
agggggaggg tagagaggag 20

SEQ ID NO: 21 | Length: 10171 | Type: DNA | Organism: Homo sapiens
ccggaaaatg
gccgccgccg ccgccgccgc gccgagcagg aggcgaggag gagagactgc 60tccataaaaa
tacagactca ccagttcctg ctttgatgtg acatgtgact ccccagaata 120caccttgctt
ctgtagacca gctccaacag gattccatgg tagctgggat gttagggctc 180agggaagaaa
agtcagaaga ccaggacctc cagggcctca aggacaaacc cctcaagttt 240aaaaaggtga
agaaagataa gaaagaagag aaagagggca agcatgagcc cgtgcagcca 300tcagcccacc
actctgctga gcccgcagag gcaggcaaag cagagacatc agaagggtca 360ggctccgccc
cggctgtgcc ggaagcttct gcctccccca aacagcggcg ctccatcatc 420cgtgaccggg
gacccatgta tgatgacccc accctgcctg aaggctggac acggaagctt 480aagcaaagga
aatctggccg ctctgctggg aagtatgatg tgtatttgat caatccccag 540ggaaaagcct
ttcgctctaa agtggagttg attgcgtact tcgaaaaggt aggcgacaca 600tccctggacc
ctaatgattt tgacttcacg gtaactggga gagggagccc ctcccggcga 660gagcagaaac
cacctaagaa gcccaaatct cccaaagctc caggaactgg cagaggccgg 720ggacgcccca
aagggagcgg caccacgaga cccaaggcgg ccacgtcaga gggtgtgcag 780gtgaaaaggg
tcctggagaa aagtcctggg aagctccttg tcaagatgcc ttttcaaact 840tcgccagggg
gcaaggctga ggggggtggg gccaccacat ccacccaggt catggtgatc 900aaacgccccg
gcaggaagcg aaaagctgag gccgaccctc aggccattcc caagaaacgg 960ggccgaaagc
cggggagtgt ggtggcagcc gctgccgccg aggccaaaaa gaaagccgtg 1020aaggagtctt
ctatccgatc tgtgcaggag accgtactcc ccatcaagaa gcgcaagacc 1080cgggagacgg
tcagcatcga ggtcaaggaa gtggtgaagc ccctgctggt gtccaccctc 1140ggtgagaaga
gcgggaaagg actgaagacc tgtaagagcc ctgggcggaa aagcaaggag 1200agcagcccca
aggggcgcag cagcagcgcc tcctcacccc ccaagaagga gcaccaccac 1260catcaccacc
actcagagtc cccaaaggcc cccgtgccac tgctcccacc cctgccccca 1320cctccacctg
agcccgagag ctccgaggac cccaccagcc cccctgagcc ccaggacttg 1380agcagcagcg
tctgcaaaga ggagaagatg cccagaggag gctcactgga gagcgacggc 1440tgccccaagg
agccagctaa gactcagccc gcggttgcca ccgccgccac ggccgcagaa 1500aagtacaaac
accgagggga gggagagcgc aaagacattg tttcatcctc catgccaagg 1560ccaaacagag
aggagcctgt ggacagccgg acgcccgtga ccgagagagt tagctgactt 1620tacacggagc
ggattgcaaa gcaaaccaac aagaataaag gcagctgttg tctcttctcc 1680ttatgggtag
ggctctgaca aagcttcccg attaactgaa ataaaaaata tttttttttc 1740tttcagtaaa
cttagagttt cgtggcttca gggtgggagt agttggagca ttggggatgt 1800ttttcttacc
gacaagcaca gtcaggttga agacctaacc agggccagaa gtagctttgc 1860acttttctaa
actaggctcc ttcaacaagg cttgctgcag atactactga ccagacaagc 1920tgttgaccag
gcacctcccc tcccgcccaa acctttcccc catgtggtcg ttagagacag 1980agcgacagag
cagttgagag gacactcccg ttttcggtgc catcagtgcc ccgtctacag 2040ctcccccagc
tccccccacc tcccccactc ccaaccacgt tgggacaggg aggtgtgagg 2100caggagagac
agttggattc tttagagaag atggatatga ccagtggcta tggcctgtgc 2160gatcccaccc
gtggtggctc aagtctggcc ccacaccagc cccaatccaa aactggcaag 2220gacgcttcac
aggacaggaa agtggcacct gtctgctcca gctctggcat ggctaggagg 2280ggggagtccc
ttgaactact gggtgtagac tggcctgaac cacaggagag gatggcccag 2340ggtgaggtgg
catggtccat tctcaaggga cgtcctccaa cgggtggcgc tagaggccat 2400ggaggcagta
ggacaaggtg caggcaggct ggcctggggt caggccgggc agagcacagc 2460ggggtgagag
ggattcctaa tcactcagag cagtctgtga cttagtggac aggggagggg 2520gcaaaggggg
aggagaagaa aatgttcttc cagttacttt ccaattctcc tttagggaca 2580gcttagaatt
atttgcacta ttgagtcttc atgttcccac ttcaaaacaa acagatgctc 2640tgagagcaaa
ctggcttgaa ttggtgacat ttagtccctc aagccaccag atgtgacagt 2700gttgagaact
acctggattt gtatatatac ctgcgcttgt tttaaagtgg gctcagcaca 2760tagggttccc
acgaagctcc gaaactctaa gtgtttgctg caattttata aggacttcct 2820gattggtttc
tcttctcccc ttccatttct gccttttgtt catttcatcc tttcacttct 2880ttcccttcct
ccgtcctcct ccttcctagt tcatcccttc tcttccaggc agccgcggtg 2940cccaaccaca
cttgtcggct ccagtcccca gaactctgcc tgccctttgt cctcctgctg 3000ccagtaccag
ccccaccctg ttttgagccc tgaggaggcc ttgggctctg ctgagtccga 3060cctggcctgt
ctgtgaagag caagagagca gcaaggtctt gctctcctag gtagccccct 3120cttccctggt
aagaaaaagc aaaaggcatt tcccaccctg aacaacgagc cttttcaccc 3180ttctactcta
gagaagtgga ctggaggagc tgggcccgat ttggtagttg aggaaagcac 3240agaggcctcc
tgtggcctgc cagtcatcga gtggcccaac aggggctcca tgccagccga 3300ccttgacctc
actcagaagt ccagagtcta gcgtagtgca gcagggcagt agcggtacca 3360atgcagaact
cccaagaccc gagctgggac cagtacctgg gtccccagcc cttcctctgc 3420tccccctttt
ccctcggagt tcttcttgaa tggcaatgtt ttgcttttgc tcgatgcaga 3480cagggggcca
gaacaccaca catttcactg tctgtctggt ccatagctgt ggtgtagggg 3540cttagaggca
tgggcttgct gtgggttttt aattgatcag ttttcatgtg ggatcccatc 3600tttttaacct
ctgttcagga agtccttatc tagctgcata tcttcatcat attggtatat 3660ccttttctgt
gtttacagag atgtctctta tatctaaatc tgtccaactg agaagtacct 3720tatcaaagta
gcaaatgaga cagcagtctt atgcttccag aaacacccac aggcatgtcc 3780catgtgagct
gctgccatga actgtcaagt gtgtgttgtc ttgtgtattt cagttattgt 3840ccctggcttc
cttactatgg tgtaatcatg aaggagtgaa acatcataga aactgtctag 3900cacttccttg
ccagtcttta gtgatcagga accatagttg acagttccaa tcagtagctt 3960aagaaaaaac
cgtgtttgtc tcttctggaa tggttagaag tgagggagtt tgccccgttc 4020tgtttgtaga
gtctcatagt tggactttct agcatatatg tgtccatttc cttatgctgt 4080aaaagcaagt
cctgcaacca aactcccatc agcccaatcc ctgatccctg atcccttcca 4140cctgctctgc
tgatgacccc cccagcttca cttctgactc ttccccagga agggaagggg 4200ggtcagaaga
gagggtgagt cctccagaac tcttcctcca aggacagaag gctcctgccc 4260ccatagtggc
ctcgaactcc tggcactacc aaaggacact tatccacgag agcgcagcat 4320ccgaccaggt
tgtcactgag aagatgttta ttttggtcag ttgggttttt atgtattata 4380cttagtcaaa
tgtaatgtgg cttctggaat cattgtccag agctgcttcc ccgtcacctg 4440ggcgtcatct
ggtcctggta agaggagtgc gtggcccacc aggcccccct gtcacccatg 4500acagttcatt
cagggccgat ggggcagtcg tggttgggaa cacagcattt caagcgtcac 4560tttatttcat
tcgggcccca cctgcagctc cctcaaagag gcagttgccc agcctctttc 4620ccttccagtt
tattccagag ctgccagtgg ggcctgaggc tccttagggt tttctctcta 4680tttccccctt
tcttcctcat tccctcgtct ttcccaaagg catcacgagt cagtcgcctt 4740tcagcaggca
gccttggcgg tttatcgccc tggcaggcag gggccctgca gctctcatgc 4800tgcccctgcc
ttggggtcag gttgacagga ggttggaggg aaagccttaa gctgcaggat 4860tctcaccagc
tgtgtccggc ccagttttgg ggtgtgacct caatttcaat tttgtctgta 4920cttgaacatt
atgaagatgg gggcctcttt cagtgaattt gtgaacagca gaattgaccg 4980acagctttcc
agtacccatg gggctaggtc attaaggcca catccacagt ctcccccacc 5040cttgttccag
ttgttagtta ctacctcctc tcctgacaat actgtatgtc gtcgagctcc 5100ccccaggtct
acccctcccg gccctgcctg ctggtgggct tgtcatagcc agtgggattg 5160ccggtcttga
cagctcagtg agctggagat acttggtcac agccaggcgc tagcacagct 5220cccttctgtt
gatgctgtat tcccatatca aaagacacag gggacaccca gaaacgccac 5280atcccccaat
ccatcagtgc caaactagcc aacggcccca gcttctcagc tcgctggatg 5340gcggaagctg
ctactcgtga gcgccagtgc gggtgcagac aatcttctgt tgggtggcat 5400cattccaggc
ccgaagcatg aacagtgcac ctgggacagg gagcagcccc aaattgtcac 5460ctgcttctct
gcccagcttt tcattgctgt gacagtgatg gcgaaagagg gtaataacca 5520gacacaaact
gccaagttgg gtggagaaag gagtttcttt agctgacaga atctctgaat 5580tttaaatcac
ttagtaagcg gctcaagccc aggagggagc agagggatac gagcggagtc 5640ccctgcgcgg
gaccatctgg aattggttta gcccaagtgg agcctgacag ccagaactct 5700gtgtcccccg
tctaaccaca gctccttttc cagagcattc cagtcaggct ctctgggctg 5760actgggccag
gggaggttac aggtaccagt tctttaagaa gatctttggg catatacatt 5820tttagcctgt
gtcattgccc caaatggatt cctgtttcaa gttcacacct gcagattcta 5880ggacctgtgt
cctagacttc agggagtcag ctgtttctag agttcctacc atggagtggg 5940tctggaggac
ctgcccggtg ggggggcaga gccctgctcc ctccgggtct tcctactctt 6000ctctctgctc
tgacgggatt tgttgattct ctccattttg gtgtctttct cttttagata 6060ttgtatcaat
ctttagaaaa ggcatagtct acttgttata aatcgttagg atactgcctc 6120ccccagggtc
taaaattaca tattagaggg gaaaagctga acactgaagt cagttctcaa 6180caatttagaa
ggaaaaccta gaaaacattt ggcagaaaat tacatttcga tgtttttgaa 6240tgaatacgag
caagctttta caacagtgct gatctaaaaa tacttagcac ttggcctgag 6300atgcctggtg
agcattacag gcaaggggaa tctggaggta gccgacctga ggacatggct 6360tctgaacctg
tcttttggga gtggtatgga aggtggagcg ttcaccagtg acctggaagg 6420cccagcacca
ccctccttcc cactcttctc atcttgacag agcctgcccc agcgctgacg 6480tgtcaggaaa
acacccaggg aactaggaag gcacttctgc ctgaggggca gcctgccttg 6540cccactcctg
ctctgctcgc ctcggatcag ctgagccttc tgagctggcc tctcactgcc 6600tccccaaggc
cccctgcctg ccctgtcagg aggcagaagg aagcaggtgt gagggcagtg 6660caaggaggga
gcacaacccc cagctcccgc tccgggctcc gacttgtgca caggcagagc 6720ccagaccctg
gaggaaatcc tacctttgaa ttcaagaaca tttggggaat ttggaaatct 6780ctttgccccc
aaacccccat tctgtcctac ctttaatcag gtcctgctca gcagtgagag 6840cagatgaggt
gaaaaggcca agaggtttgg ctcctgccca ctgatagccc ctctccccgc 6900agtgtttgtg
tgtcaagtgg caaagctgtt cttcctggtg accctgatta tatccagtaa 6960cacatagact
gtgcgcatag gcctgctttg tctcctctat cctgggcttt tgttttgctt 7020tttagttttg
cttttagttt ttctgtccct tttatttaac gcaccgacta gacacacaaa 7080gcagttgaat
ttttatatat atatctgtat attgcacaat tataaactca ttttgcttgt 7140ggctccacac
acacaaaaaa agacctgtta aaattatacc tgttgcttaa ttacaatatt 7200tctgataacc
atagcatagg acaagggaaa ataaaaaaag aaaaaaaaga aaaaaaaacg 7260acaaatctgt
ctgctggtca cttcttctgt ccaagcagat tcgtggtctt ttcctcgctt 7320ctttcaaggg
ctttcctgtg ccaggtgaag gaggctccag gcagcaccca ggttttgcac 7380tcttgtttct
cccgtgcttg tgaaagaggt cccaaggttc tgggtgcagg agcgctccct 7440tgacctgctg
aagtccggaa cgtagtcggc acagcctggt cgccttccac ctctgggagc 7500tggagtccac
tggggtggcc tgactccccc agtccccttc ccgtgacctg gtcagggtga 7560gcccatgtgg
agtcagcctc gcaggcctcc ctgccagtag ggtccgagtg tgtttcatcc 7620ttcccactct
gtcgagcctg ggggctggag cggagacggg aggcctggcc tgtctcggaa 7680cctgtgagct
gcaccaggta gaacgccagg gaccccagaa tcatgtgcgt cagtccaagg 7740ggtcccctcc
aggagtagtg aagactccag aaatgtccct ttcttctccc ccatcctacg 7800agtaattgca
tttgcttttg taattcttaa tgagcaatat ctgctagaga gtttagctgt 7860aacagttctt
tttgatcatc tttttttaat aattagaaac accaaaaaaa tccagaaact 7920tgttcttcca
aagcagagag cattataatc accagggcca aaagcttccc tccctgctgt 7980cattgcttct
tctgaggcct gaatccaaaa gaaaaacagc cataggccct ttcagtggcc 8040gggctacccg
tgagcccttc ggaggaccag ggctggggca gcctctgggc ccacatccgg 8100ggccagctcc
ggcgtgtgtt cagtgttagc agtgggtcat gatgctcttt cccacccagc 8160ctgggatagg
ggcagaggag gcgaggaggc cgttgccgct gatgtttggc cgtgaacagg 8220tgggtgtctg
cgtgcgtcca cgtgcgtgtt ttctgactga catgaaatcg acgcccgagt 8280tagcctcacc
cggtgacctc tagccctgcc cggatggagc ggggcccacc cggttcagtg 8340tttctgggga
gctggacagt ggagtgcaaa aggcttgcag aacttgaagc ctgctccttc 8400ccttgctacc
acggcctcct ttccgtttga tttgtcactg cttcaatcaa taacagccgc 8460tccagagtca
gtagtcaatg aatatatgac caaatatcac caggactgtt actcaatgtg 8520tgccgagccc
ttgcccatgc tgggctcccg tgtatctgga cactgtaacg tgtgctgtgt 8580ttgctcccct
tccccttcct tctttgccct ttacttgtct ttctggggtt tttctgtttg 8640ggtttggttt
ggtttttatt tctccttttg tgttccaaac atgaggttct ctctactggt 8700cctcttaact
gtggtgttga ggcttatatt tgtgtaattt ttggtgggtg aaaggaattt 8760tgctaagtaa
atctcttctg tgtttgaact gaagtctgta ttgtaactat gtttaaagta 8820attgttccag
agacaaatat ttctagacac tttttcttta caaacaaaag cattcggagg 8880gagggggatg
gtgactgaga tgagagggga gagctgaaca gatgacccct gcccagatca 8940gccagaagcc
acccaaagca gtggagccca ggagtcccac tccaagccag caagccgaat 9000agctgatgtg
ttgccacttt ccaagtcact gcaaaaccag gttttgttcc gcccagtgga 9060ttcttgtttt
gcttcccctc cccccgagat tattaccacc atcccgtgct tttaaggaaa 9120ggcaagattg
atgtttcctt gaggggagcc aggaggggat gtgtgtgtgc agagctgaag 9180agctggggag
aatggggctg ggcccaccca agcaggaggc tgggacgctc tgctgtgggc 9240acaggtcagg
ctaatgttgg cagatgcagc tcttcctgga caggccaggt ggtgggcatt 9300ctctctccaa
ggtgtgcccc gtgggcatta ctgtttaaga cacttccgtc acatcccacc 9360ccatcctcca
gggctcaaca ctgtgacatc tctattcccc accctcccct tcccagggca 9420ataaaatgac
catggagggg gcttgcactc tcttggctgt cacccgatcg ccagcaaaac 9480ttagatgtga
gaaaacccct tcccattcca tggcgaaaac atctccttag aaaagccatt 9540accctcatta
ggcatggttt tgggctccca aaacacctga cagcccctcc ctcctctgag 9600aggcggagag
tgctgactgt agtgaccatt gcatgccggg tgcagcatct ggaagagcta 9660ggcagggtgt
ctgccccctc ctgagttgaa gtcatgctcc cctgtgccag cccagaggcc 9720gagagctatg
gacagcattg ccagtaacac aggccaccct gtgcagaagg gagctggctc 9780cagcctggaa
acctgtctga ggttgggaga ggtgcacttg gggcacaggg agaggccggg 9840acacacttag
ctggagatgt ctctaaaagc cctgtatcgt attcaccttc agtttttgtg 9900ttttgggaca
attactttag aaaataagta ggtcgtttta aaaacaaaaa ttattgattg 9960cttttttgta
gtgttcagaa aaaaggttct ttgtgtatag ccaaatgact gaaagcactg 10020atatatttaa
aaacaaaagg caatttatta aggaaatttg taccatttca gtaaacctgt 10080ctgaatgtac
ctgtatacgt ttcaaaaaca cccccccccc actgaatccc tgtaacctat 10140ttattatata
aagagtttgc cttataaatt t
10171

SEQ ID NO: 22 | Length: 10113 | Type: DNA | Organism: Homo sapiens
gctccataaa aatacagact caccagttcc tgctttgatg
tgacatgtga ctccccagaa 60tacaccttgc ttctgtagac cagctccaac aggattccat
ggtagctggg atgttagggc 120tcagggaaga aaagtcagaa gaccaggacc tccagggcct
caaggacaaa cccctcaagt 180ttaaaaaggt gaagaaagat aagaaagaag agaaagaggg
caagcatgag cccgtgcagc 240catcagccca ccactctgct gagcccgcag aggcaggcaa
agcagagaca tcagaagggt 300caggctccgc cccggctgtg ccggaagctt ctgcctcccc
caaacagcgg cgctccatca 360tccgtgaccg gggacccatg tatgatgacc ccaccctgcc
tgaaggctgg acacggaagc 420ttaagcaaag gaaatctggc cgctctgctg ggaagtatga
tgtgtatttg atcaatcccc 480agggaaaagc ctttcgctct aaagtggagt tgattgcgta
cttcgaaaag gtaggcgaca 540catccctgga ccctaatgat tttgacttca cggtaactgg
gagagggagc ccctcccggc 600gagagcagaa accacctaag aagcccaaat ctcccaaagc
tccaggaact ggcagaggcc 660ggggacgccc caaagggagc ggcaccacga gacccaaggc
ggccacgtca gagggtgtgc 720aggtgaaaag ggtcctggag aaaagtcctg ggaagctcct
tgtcaagatg ccttttcaaa 780cttcgccagg gggcaaggct gaggggggtg gggccaccac
atccacccag gtcatggtga 840tcaaacgccc cggcaggaag cgaaaagctg aggccgaccc
tcaggccatt cccaagaaac 900ggggccgaaa gccggggagt gtggtggcag ccgctgccgc
cgaggccaaa aagaaagccg 960tgaaggagtc ttctatccga tctgtgcagg agaccgtact
ccccatcaag aagcgcaaga 1020cccgggagac ggtcagcatc gaggtcaagg aagtggtgaa
gcccctgctg gtgtccaccc 1080tcggtgagaa gagcgggaaa ggactgaaga cctgtaagag
ccctgggcgg aaaagcaagg 1140agagcagccc caaggggcgc agcagcagcg cctcctcacc
ccccaagaag gagcaccacc 1200accatcacca ccactcagag tccccaaagg cccccgtgcc
actgctccca cccctgcccc 1260cacctccacc tgagcccgag agctccgagg accccaccag
cccccctgag ccccaggact 1320tgagcagcag cgtctgcaaa gaggagaaga tgcccagagg
aggctcactg gagagcgacg 1380gctgccccaa ggagccagct aagactcagc ccgcggttgc
caccgccgcc acggccgcag 1440aaaagtacaa acaccgaggg gagggagagc gcaaagacat
tgtttcatcc tccatgccaa 1500ggccaaacag agaggagcct gtggacagcc ggacgcccgt
gaccgagaga gttagctgac 1560tttacacgga gcggattgca aagcaaacca acaagaataa
aggcagctgt tgtctcttct 1620ccttatgggt agggctctga caaagcttcc cgattaactg
aaataaaaaa tatttttttt 1680tctttcagta aacttagagt ttcgtggctt cagggtggga
gtagttggag cattggggat 1740gtttttctta ccgacaagca cagtcaggtt gaagacctaa
ccagggccag aagtagcttt 1800gcacttttct aaactaggct ccttcaacaa ggcttgctgc
agatactact gaccagacaa 1860gctgttgacc aggcacctcc cctcccgccc aaacctttcc
cccatgtggt cgttagagac 1920agagcgacag agcagttgag aggacactcc cgttttcggt
gccatcagtg ccccgtctac 1980agctccccca gctcccccca cctcccccac tcccaaccac
gttgggacag ggaggtgtga 2040ggcaggagag acagttggat tctttagaga agatggatat
gaccagtggc tatggcctgt 2100gcgatcccac ccgtggtggc tcaagtctgg ccccacacca
gccccaatcc aaaactggca 2160aggacgcttc acaggacagg aaagtggcac ctgtctgctc
cagctctggc atggctagga 2220ggggggagtc ccttgaacta ctgggtgtag actggcctga
accacaggag aggatggccc 2280agggtgaggt ggcatggtcc attctcaagg gacgtcctcc
aacgggtggc gctagaggcc 2340atggaggcag taggacaagg tgcaggcagg ctggcctggg
gtcaggccgg gcagagcaca 2400gcggggtgag agggattcct aatcactcag agcagtctgt
gacttagtgg acaggggagg 2460gggcaaaggg ggaggagaag aaaatgttct tccagttact
ttccaattct cctttaggga 2520cagcttagaa ttatttgcac tattgagtct tcatgttccc
acttcaaaac aaacagatgc 2580tctgagagca aactggcttg aattggtgac atttagtccc
tcaagccacc agatgtgaca 2640gtgttgagaa ctacctggat ttgtatatat acctgcgctt
gttttaaagt gggctcagca 2700catagggttc ccacgaagct ccgaaactct aagtgtttgc
tgcaatttta taaggacttc 2760ctgattggtt tctcttctcc ccttccattt ctgccttttg
ttcatttcat cctttcactt 2820ctttcccttc ctccgtcctc ctccttccta gttcatccct
tctcttccag gcagccgcgg 2880tgcccaacca cacttgtcgg ctccagtccc cagaactctg
cctgcccttt gtcctcctgc 2940tgccagtacc agccccaccc tgttttgagc cctgaggagg
ccttgggctc tgctgagtcc 3000gacctggcct gtctgtgaag agcaagagag cagcaaggtc
ttgctctcct aggtagcccc 3060ctcttccctg gtaagaaaaa gcaaaaggca tttcccaccc
tgaacaacga gccttttcac 3120ccttctactc tagagaagtg gactggagga gctgggcccg
atttggtagt tgaggaaagc 3180acagaggcct cctgtggcct gccagtcatc gagtggccca
acaggggctc catgccagcc 3240gaccttgacc tcactcagaa gtccagagtc tagcgtagtg
cagcagggca gtagcggtac 3300caatgcagaa ctcccaagac ccgagctggg accagtacct
gggtccccag cccttcctct 3360gctccccctt ttccctcgga gttcttcttg aatggcaatg
ttttgctttt gctcgatgca 3420gacagggggc cagaacacca cacatttcac tgtctgtctg
gtccatagct gtggtgtagg 3480ggcttagagg catgggcttg ctgtgggttt ttaattgatc
agttttcatg tgggatccca 3540tctttttaac ctctgttcag gaagtcctta tctagctgca
tatcttcatc atattggtat 3600atccttttct gtgtttacag agatgtctct tatatctaaa
tctgtccaac tgagaagtac 3660cttatcaaag tagcaaatga gacagcagtc ttatgcttcc
agaaacaccc acaggcatgt 3720cccatgtgag ctgctgccat gaactgtcaa gtgtgtgttg
tcttgtgtat ttcagttatt 3780gtccctggct tccttactat ggtgtaatca tgaaggagtg
aaacatcata gaaactgtct 3840agcacttcct tgccagtctt tagtgatcag gaaccatagt
tgacagttcc aatcagtagc 3900ttaagaaaaa accgtgtttg tctcttctgg aatggttaga
agtgagggag tttgccccgt 3960tctgtttgta gagtctcata gttggacttt ctagcatata
tgtgtccatt tccttatgct 4020gtaaaagcaa gtcctgcaac caaactccca tcagcccaat
ccctgatccc tgatcccttc 4080cacctgctct gctgatgacc cccccagctt cacttctgac
tcttccccag gaagggaagg 4140ggggtcagaa gagagggtga gtcctccaga actcttcctc
caaggacaga aggctcctgc 4200ccccatagtg gcctcgaact cctggcacta ccaaaggaca
cttatccacg agagcgcagc 4260atccgaccag gttgtcactg agaagatgtt tattttggtc
agttgggttt ttatgtatta 4320tacttagtca aatgtaatgt ggcttctgga atcattgtcc
agagctgctt ccccgtcacc 4380tgggcgtcat ctggtcctgg taagaggagt gcgtggccca
ccaggccccc ctgtcaccca 4440tgacagttca ttcagggccg atggggcagt cgtggttggg
aacacagcat ttcaagcgtc 4500actttatttc attcgggccc cacctgcagc tccctcaaag
aggcagttgc ccagcctctt 4560tcccttccag tttattccag agctgccagt ggggcctgag
gctccttagg gttttctctc 4620tatttccccc tttcttcctc attccctcgt ctttcccaaa
ggcatcacga gtcagtcgcc 4680tttcagcagg cagccttggc ggtttatcgc cctggcaggc
aggggccctg cagctctcat 4740gctgcccctg ccttggggtc aggttgacag gaggttggag
ggaaagcctt aagctgcagg 4800attctcacca gctgtgtccg gcccagtttt ggggtgtgac
ctcaatttca attttgtctg 4860tacttgaaca ttatgaagat gggggcctct ttcagtgaat
ttgtgaacag cagaattgac 4920cgacagcttt ccagtaccca tggggctagg tcattaaggc
cacatccaca gtctccccca 4980cccttgttcc agttgttagt tactacctcc tctcctgaca
atactgtatg tcgtcgagct 5040ccccccaggt ctacccctcc cggccctgcc tgctggtggg
cttgtcatag ccagtgggat 5100tgccggtctt gacagctcag tgagctggag atacttggtc
acagccaggc gctagcacag 5160ctcccttctg ttgatgctgt attcccatat caaaagacac
aggggacacc cagaaacgcc 5220acatccccca atccatcagt gccaaactag ccaacggccc
cagcttctca gctcgctgga 5280tggcggaagc tgctactcgt gagcgccagt gcgggtgcag
acaatcttct gttgggtggc 5340atcattccag gcccgaagca tgaacagtgc acctgggaca
gggagcagcc ccaaattgtc 5400acctgcttct ctgcccagct tttcattgct gtgacagtga
tggcgaaaga gggtaataac 5460cagacacaaa ctgccaagtt gggtggagaa aggagtttct
ttagctgaca gaatctctga 5520attttaaatc acttagtaag cggctcaagc ccaggaggga
gcagagggat acgagcggag 5580tcccctgcgc gggaccatct ggaattggtt tagcccaagt
ggagcctgac agccagaact 5640ctgtgtcccc cgtctaacca cagctccttt tccagagcat
tccagtcagg ctctctgggc 5700tgactgggcc aggggaggtt acaggtacca gttctttaag
aagatctttg ggcatataca 5760tttttagcct gtgtcattgc cccaaatgga ttcctgtttc
aagttcacac ctgcagattc 5820taggacctgt gtcctagact tcagggagtc agctgtttct
agagttccta ccatggagtg 5880ggtctggagg acctgcccgg tgggggggca gagccctgct
ccctccgggt cttcctactc 5940ttctctctgc tctgacggga tttgttgatt ctctccattt
tggtgtcttt ctcttttaga 6000tattgtatca atctttagaa aaggcatagt ctacttgtta
taaatcgtta ggatactgcc 6060tcccccaggg tctaaaatta catattagag gggaaaagct
gaacactgaa gtcagttctc 6120aacaatttag aaggaaaacc tagaaaacat ttggcagaaa
attacatttc gatgtttttg 6180aatgaatacg agcaagcttt tacaacagtg ctgatctaaa
aatacttagc acttggcctg 6240agatgcctgg tgagcattac aggcaagggg aatctggagg
tagccgacct gaggacatgg 6300cttctgaacc tgtcttttgg gagtggtatg gaaggtggag
cgttcaccag tgacctggaa 6360ggcccagcac caccctcctt cccactcttc tcatcttgac
agagcctgcc ccagcgctga 6420cgtgtcagga aaacacccag ggaactagga aggcacttct
gcctgagggg cagcctgcct 6480tgcccactcc tgctctgctc gcctcggatc agctgagcct
tctgagctgg cctctcactg 6540cctccccaag gccccctgcc tgccctgtca ggaggcagaa
ggaagcaggt gtgagggcag 6600tgcaaggagg gagcacaacc cccagctccc gctccgggct
ccgacttgtg cacaggcaga 6660gcccagaccc tggaggaaat cctacctttg aattcaagaa
catttgggga atttggaaat 6720ctctttgccc ccaaaccccc attctgtcct acctttaatc
aggtcctgct cagcagtgag 6780agcagatgag gtgaaaaggc caagaggttt ggctcctgcc
cactgatagc ccctctcccc 6840gcagtgtttg tgtgtcaagt ggcaaagctg ttcttcctgg
tgaccctgat tatatccagt 6900aacacataga ctgtgcgcat aggcctgctt tgtctcctct
atcctgggct tttgttttgc 6960tttttagttt tgcttttagt ttttctgtcc cttttattta
acgcaccgac tagacacaca 7020aagcagttga atttttatat atatatctgt atattgcaca
attataaact cattttgctt 7080gtggctccac acacacaaaa aaagacctgt taaaattata
cctgttgctt aattacaata 7140tttctgataa ccatagcata ggacaaggga aaataaaaaa
agaaaaaaaa gaaaaaaaaa 7200cgacaaatct gtctgctggt cacttcttct gtccaagcag
attcgtggtc ttttcctcgc 7260ttctttcaag ggctttcctg tgccaggtga aggaggctcc
aggcagcacc caggttttgc 7320actcttgttt ctcccgtgct tgtgaaagag gtcccaaggt
tctgggtgca ggagcgctcc 7380cttgacctgc tgaagtccgg aacgtagtcg gcacagcctg
gtcgccttcc acctctggga 7440gctggagtcc actggggtgg cctgactccc ccagtcccct
tcccgtgacc tggtcagggt 7500gagcccatgt ggagtcagcc tcgcaggcct ccctgccagt
agggtccgag tgtgtttcat 7560ccttcccact ctgtcgagcc tgggggctgg agcggagacg
ggaggcctgg cctgtctcgg 7620aacctgtgag ctgcaccagg tagaacgcca gggaccccag
aatcatgtgc gtcagtccaa 7680ggggtcccct ccaggagtag tgaagactcc agaaatgtcc
ctttcttctc ccccatccta 7740cgagtaattg catttgcttt tgtaattctt aatgagcaat
atctgctaga gagtttagct 7800gtaacagttc tttttgatca tcttttttta ataattagaa
acaccaaaaa aatccagaaa 7860cttgttcttc caaagcagag agcattataa tcaccagggc
caaaagcttc cctccctgct 7920gtcattgctt cttctgaggc ctgaatccaa aagaaaaaca
gccataggcc ctttcagtgg 7980ccgggctacc cgtgagccct tcggaggacc agggctgggg
cagcctctgg gcccacatcc 8040ggggccagct ccggcgtgtg ttcagtgtta gcagtgggtc
atgatgctct ttcccaccca 8100gcctgggata ggggcagagg aggcgaggag gccgttgccg
ctgatgtttg gccgtgaaca 8160ggtgggtgtc tgcgtgcgtc cacgtgcgtg ttttctgact
gacatgaaat cgacgcccga 8220gttagcctca cccggtgacc tctagccctg cccggatgga
gcggggccca cccggttcag 8280tgtttctggg gagctggaca gtggagtgca aaaggcttgc
agaacttgaa gcctgctcct 8340tcccttgcta ccacggcctc ctttccgttt gatttgtcac
tgcttcaatc aataacagcc 8400gctccagagt cagtagtcaa tgaatatatg accaaatatc
accaggactg ttactcaatg 8460tgtgccgagc ccttgcccat gctgggctcc cgtgtatctg
gacactgtaa cgtgtgctgt 8520gtttgctccc cttccccttc cttctttgcc ctttacttgt
ctttctgggg tttttctgtt 8580tgggtttggt ttggttttta tttctccttt tgtgttccaa
acatgaggtt ctctctactg 8640gtcctcttaa ctgtggtgtt gaggcttata tttgtgtaat
ttttggtggg tgaaaggaat 8700tttgctaagt aaatctcttc tgtgtttgaa ctgaagtctg
tattgtaact atgtttaaag 8760taattgttcc agagacaaat atttctagac actttttctt
tacaaacaaa agcattcgga 8820gggaggggga tggtgactga gatgagaggg gagagctgaa
cagatgaccc ctgcccagat 8880cagccagaag ccacccaaag cagtggagcc caggagtccc
actccaagcc agcaagccga 8940atagctgatg tgttgccact ttccaagtca ctgcaaaacc
aggttttgtt ccgcccagtg 9000gattcttgtt ttgcttcccc tccccccgag attattacca
ccatcccgtg cttttaagga 9060aaggcaagat tgatgtttcc ttgaggggag ccaggagggg
atgtgtgtgt gcagagctga 9120agagctgggg agaatggggc tgggcccacc caagcaggag
gctgggacgc tctgctgtgg 9180gcacaggtca ggctaatgtt ggcagatgca gctcttcctg
gacaggccag gtggtgggca 9240ttctctctcc aaggtgtgcc ccgtgggcat tactgtttaa
gacacttccg tcacatccca 9300ccccatcctc cagggctcaa cactgtgaca tctctattcc
ccaccctccc cttcccaggg 9360caataaaatg accatggagg gggcttgcac tctcttggct
gtcacccgat cgccagcaaa 9420acttagatgt gagaaaaccc cttcccattc catggcgaaa
acatctcctt agaaaagcca 9480ttaccctcat taggcatggt tttgggctcc caaaacacct
gacagcccct ccctcctctg 9540agaggcggag agtgctgact gtagtgacca ttgcatgccg
ggtgcagcat ctggaagagc 9600taggcagggt gtctgccccc tcctgagttg aagtcatgct
cccctgtgcc agcccagagg 9660ccgagagcta tggacagcat tgccagtaac acaggccacc
ctgtgcagaa gggagctggc 9720tccagcctgg aaacctgtct gaggttggga gaggtgcact
tggggcacag ggagaggccg 9780ggacacactt agctggagat gtctctaaaa gccctgtatc
gtattcacct tcagtttttg 9840tgttttggga caattacttt agaaaataag taggtcgttt
taaaaacaaa aattattgat 9900tgcttttttg tagtgttcag aaaaaaggtt ctttgtgtat
agccaaatga ctgaaagcac 9960tgatatattt aaaaacaaaa ggcaatttat taaggaaatt
tgtaccattt cagtaaacct 10020gtctgaatgt acctgtatac gtttcaaaaa cacccccccc
ccactgaatc cctgtaacct 10080atttattata taaagagttt gccttataaa ttt
10113

SEQ ID NO: 23
Length: 10182
Type: DNA
Organism: Homo sapiens
ccggaaattg gccgccgccg
ccgccgccgc gccgagcgga ggaggaggag gaggcgagga 60ggagagactg ctccataaaa
atacagactc accagttcct gctttgatgt gacatgtgac 120tccccagaat acaccttgct
tctgtagacc agctccaaca ggattccatg gtagctggga 180tgttagggct cagggaagaa
aagtcagaag accaggacct ccagggcctc aaggacaaac 240ccctcaagtt taaaaaggtg
aagaaagata agaaagaaga gaaagagggc aagcatgagc 300ccgtgcagcc atcagcccac
cactctgctg agcccgcaga ggcaggcaaa gcagagacat 360cagaagggtc aggctccgcc
ccggctgtgc cggaagcttc tgcctccccc aaacagcggc 420gctccatcat ccgtgaccgg
ggacccatgt atgatgaccc caccctgcct gaaggctgga 480cacggaagct taagcaaagg
aaatctggcc gctctgctgg gaagtatgat gtgtatttga 540tcaatcccca gggaaaagcc
tttcgctcta aagtggagtt gattgcgtac ttcgaaaagg 600taggcgacac atccctggac
cctaatgatt ttgacttcac ggtaactggg agagggagcc 660cctcccggcg agagcagaaa
ccacctaaga agcccaaatc tcccaaagct ccaggaactg 720gcagaggccg gggacgcccc
aaagggagcg gcaccacgag acccaaggcg gccacgtcag 780agggtgtgca ggtgaaaagg
gtcctggaga aaagtcctgg gaagctcctt gtcaagatgc 840cttttcaaac ttcgccaggg
ggcaaggctg aggggggtgg ggccaccaca tccacccagg 900tcatggtgat caaacgcccc
ggcaggaagc gaaaagctga ggccgaccct caggccattc 960ccaagaaacg gggccgaaag
ccggggagtg tggtggcagc cgctgccgcc gaggccaaaa 1020agaaagccgt gaaggagtct
tctatccgat ctgtgcagga gaccgtactc cccatcaaga 1080agcgcaagac ccgggagacg
gtcagcatcg aggtcaagga agtggtgaag cccctgctgg 1140tgtccaccct cggtgagaag
agcgggaaag gactgaagac ctgtaagagc cctgggcgga 1200aaagcaagga gagcagcccc
aaggggcgca gcagcagcgc ctcctcaccc cccaagaagg 1260agcaccacca ccatcaccac
cactcagagt ccccaaaggc ccccgtgcca ctgctcccac 1320ccctgccccc acctccacct
gagcccgaga gctccgagga ccccaccagc ccccctgagc 1380cccaggactt gagcagcagc
gtctgcaaag aggagaagat gcccagagga ggctcactgg 1440agagcgacgg ctgccccaag
gagccagcta agactcagcc cgcggttgcc accgccgcca 1500cggccgcaga aaagtacaaa
caccgagggg agggagagcg caaagacatt gtttcatcct 1560ccatgccaag gccaaacaga
gaggagcctg tggacagccg gacgcccgtg accgagagag 1620ttagctgact ttacacggag
cggattgcaa agcaaaccaa caagaataaa ggcagctgtt 1680gtctcttctc cttatgggta
gggctctgac aaagcttccc gattaactga aataaaaaat 1740attttttttt ctttcagtaa
acttagagtt tcgtggcttc agggtgggag tagttggagc 1800attggggatg tttttcttac
cgacaagcac agtcaggttg aagacctaac cagggccaga 1860agtagctttg cacttttcta
aactaggctc cttcaacaag gcttgctgca gatactactg 1920accagacaag ctgttgacca
ggcacctccc ctcccgccca aacctttccc ccatgtggtc 1980gttagagaca gagcgacaga
gcagttgaga ggacactccc gttttcggtg ccatcagtgc 2040cccgtctaca gctcccccag
ctccccccac ctcccccact cccaaccacg ttgggacagg 2100gaggtgtgag gcaggagaga
cagttggatt ctttagagaa gatggatatg accagtggct 2160atggcctgtg cgatcccacc
cgtggtggct caagtctggc cccacaccag ccccaatcca 2220aaactggcaa ggacgcttca
caggacagga aagtggcacc tgtctgctcc agctctggca 2280tggctaggag gggggagtcc
cttgaactac tgggtgtaga ctggcctgaa ccacaggaga 2340ggatggccca gggtgaggtg
gcatggtcca ttctcaaggg acgtcctcca acgggtggcg 2400ctagaggcca tggaggcagt
aggacaaggt gcaggcaggc tggcctgggg tcaggccggg 2460cagagcacag cggggtgaga
gggattccta atcactcaga gcagtctgtg acttagtgga 2520caggggaggg ggcaaagggg
gaggagaaga aaatgttctt ccagttactt tccaattctc 2580ctttagggac agcttagaat
tatttgcact attgagtctt catgttccca cttcaaaaca 2640aacagatgct ctgagagcaa
actggcttga attggtgaca tttagtccct caagccacca 2700gatgtgacag tgttgagaac
tacctggatt tgtatatata cctgcgcttg ttttaaagtg 2760ggctcagcac atagggttcc
cacgaagctc cgaaactcta agtgtttgct gcaattttat 2820aaggacttcc tgattggttt
ctcttctccc cttccatttc tgccttttgt tcatttcatc 2880ctttcacttc tttcccttcc
tccgtcctcc tccttcctag ttcatccctt ctcttccagg 2940cagccgcggt gcccaaccac
acttgtcggc tccagtcccc agaactctgc ctgccctttg 3000tcctcctgct gccagtacca
gccccaccct gttttgagcc ctgaggaggc cttgggctct 3060gctgagtccg acctggcctg
tctgtgaaga gcaagagagc agcaaggtct tgctctccta 3120ggtagccccc tcttccctgg
taagaaaaag caaaaggcat ttcccaccct gaacaacgag 3180ccttttcacc cttctactct
agagaagtgg actggaggag ctgggcccga tttggtagtt 3240gaggaaagca cagaggcctc
ctgtggcctg ccagtcatcg agtggcccaa caggggctcc 3300atgccagccg accttgacct
cactcagaag tccagagtct agcgtagtgc agcagggcag 3360tagcggtacc aatgcagaac
tcccaagacc cgagctggga ccagtacctg ggtccccagc 3420ccttcctctg ctcccccttt
tccctcggag ttcttcttga atggcaatgt tttgcttttg 3480ctcgatgcag acagggggcc
agaacaccac acatttcact gtctgtctgg tccatagctg 3540tggtgtaggg gcttagaggc
atgggcttgc tgtgggtttt taattgatca gttttcatgt 3600gggatcccat ctttttaacc
tctgttcagg aagtccttat ctagctgcat atcttcatca 3660tattggtata tccttttctg
tgtttacaga gatgtctctt atatctaaat ctgtccaact 3720gagaagtacc ttatcaaagt
agcaaatgag acagcagtct tatgcttcca gaaacaccca 3780caggcatgtc ccatgtgagc
tgctgccatg aactgtcaag tgtgtgttgt cttgtgtatt 3840tcagttattg tccctggctt
ccttactatg gtgtaatcat gaaggagtga aacatcatag 3900aaactgtcta gcacttcctt
gccagtcttt agtgatcagg aaccatagtt gacagttcca 3960atcagtagct taagaaaaaa
ccgtgtttgt ctcttctgga atggttagaa gtgagggagt 4020ttgccccgtt ctgtttgtag
agtctcatag ttggactttc tagcatatat gtgtccattt 4080ccttatgctg taaaagcaag
tcctgcaacc aaactcccat cagcccaatc cctgatccct 4140gatcccttcc acctgctctg
ctgatgaccc ccccagcttc acttctgact cttccccagg 4200aagggaaggg gggtcagaag
agagggtgag tcctccagaa ctcttcctcc aaggacagaa 4260ggctcctgcc cccatagtgg
cctcgaactc ctggcactac caaaggacac ttatccacga 4320gagcgcagca tccgaccagg
ttgtcactga gaagatgttt attttggtca gttgggtttt 4380tatgtattat acttagtcaa
atgtaatgtg gcttctggaa tcattgtcca gagctgcttc 4440cccgtcacct gggcgtcatc
tggtcctggt aagaggagtg cgtggcccac caggcccccc 4500tgtcacccat gacagttcat
tcagggccga tggggcagtc gtggttggga acacagcatt 4560tcaagcgtca ctttatttca
ttcgggcccc acctgcagct ccctcaaaga ggcagttgcc 4620cagcctcttt cccttccagt
ttattccaga gctgccagtg gggcctgagg ctccttaggg 4680ttttctctct atttccccct
ttcttcctca ttccctcgtc tttcccaaag gcatcacgag 4740tcagtcgcct ttcagcaggc
agccttggcg gtttatcgcc ctggcaggca ggggccctgc 4800agctctcatg ctgcccctgc
cttggggtca ggttgacagg aggttggagg gaaagcctta 4860agctgcagga ttctcaccag
ctgtgtccgg cccagttttg gggtgtgacc tcaatttcaa 4920ttttgtctgt acttgaacat
tatgaagatg ggggcctctt tcagtgaatt tgtgaacagc 4980agaattgacc gacagctttc
cagtacccat ggggctaggt cattaaggcc acatccacag 5040tctcccccac ccttgttcca
gttgttagtt actacctcct ctcctgacaa tactgtatgt 5100cgtcgagctc cccccaggtc
tacccctccc ggccctgcct gctggtgggc ttgtcatagc 5160cagtgggatt gccggtcttg
acagctcagt gagctggaga tacttggtca cagccaggcg 5220ctagcacagc tcccttctgt
tgatgctgta ttcccatatc aaaagacaca ggggacaccc 5280agaaacgcca catcccccaa
tccatcagtg ccaaactagc caacggcccc agcttctcag 5340ctcgctggat ggcggaagct
gctactcgtg agcgccagtg cgggtgcaga caatcttctg 5400ttgggtggca tcattccagg
cccgaagcat gaacagtgca cctgggacag ggagcagccc 5460caaattgtca cctgcttctc
tgcccagctt ttcattgctg tgacagtgat ggcgaaagag 5520ggtaataacc agacacaaac
tgccaagttg ggtggagaaa ggagtttctt tagctgacag 5580aatctctgaa ttttaaatca
cttagtaagc ggctcaagcc caggagggag cagagggata 5640cgagcggagt cccctgcgcg
ggaccatctg gaattggttt agcccaagtg gagcctgaca 5700gccagaactc tgtgtccccc
gtctaaccac agctcctttt ccagagcatt ccagtcaggc 5760tctctgggct gactgggcca
ggggaggtta caggtaccag ttctttaaga agatctttgg 5820gcatatacat ttttagcctg
tgtcattgcc ccaaatggat tcctgtttca agttcacacc 5880tgcagattct aggacctgtg
tcctagactt cagggagtca gctgtttcta gagttcctac 5940catggagtgg gtctggagga
cctgcccggt gggggggcag agccctgctc cctccgggtc 6000ttcctactct tctctctgct
ctgacgggat ttgttgattc tctccatttt ggtgtctttc 6060tcttttagat attgtatcaa
tctttagaaa aggcatagtc tacttgttat aaatcgttag 6120gatactgcct cccccagggt
ctaaaattac atattagagg ggaaaagctg aacactgaag 6180tcagttctca acaatttaga
aggaaaacct agaaaacatt tggcagaaaa ttacatttcg 6240atgtttttga atgaatacga
gcaagctttt acaacagtgc tgatctaaaa atacttagca 6300cttggcctga gatgcctggt
gagcattaca ggcaagggga atctggaggt agccgacctg 6360aggacatggc ttctgaacct
gtcttttggg agtggtatgg aaggtggagc gttcaccagt 6420gacctggaag gcccagcacc
accctccttc ccactcttct catcttgaca gagcctgccc 6480cagcgctgac gtgtcaggaa
aacacccagg gaactaggaa ggcacttctg cctgaggggc 6540agcctgcctt gcccactcct
gctctgctcg cctcggatca gctgagcctt ctgagctggc 6600ctctcactgc ctccccaagg
ccccctgcct gccctgtcag gaggcagaag gaagcaggtg 6660tgagggcagt gcaaggaggg
agcacaaccc ccagctcccg ctccgggctc cgacttgtgc 6720acaggcagag cccagaccct
ggaggaaatc ctacctttga attcaagaac atttggggaa 6780tttggaaatc tctttgcccc
caaaccccca ttctgtccta cctttaatca ggtcctgctc 6840agcagtgaga gcagatgagg
tgaaaaggcc aagaggtttg gctcctgccc actgatagcc 6900cctctccccg cagtgtttgt
gtgtcaagtg gcaaagctgt tcttcctggt gaccctgatt 6960atatccagta acacatagac
tgtgcgcata ggcctgcttt gtctcctcta tcctgggctt 7020ttgttttgct ttttagtttt
gcttttagtt tttctgtccc ttttatttaa cgcaccgact 7080agacacacaa agcagttgaa
tttttatata tatatctgta tattgcacaa ttataaactc 7140attttgcttg tggctccaca
cacacaaaaa aagacctgtt aaaattatac ctgttgctta 7200attacaatat ttctgataac
catagcatag gacaagggaa aataaaaaaa gaaaaaaaag 7260aaaaaaaaac gacaaatctg
tctgctggtc acttcttctg tccaagcaga ttcgtggtct 7320tttcctcgct tctttcaagg
gctttcctgt gccaggtgaa ggaggctcca ggcagcaccc 7380aggttttgca ctcttgtttc
tcccgtgctt gtgaaagagg tcccaaggtt ctgggtgcag 7440gagcgctccc ttgacctgct
gaagtccgga acgtagtcgg cacagcctgg tcgccttcca 7500cctctgggag ctggagtcca
ctggggtggc ctgactcccc cagtcccctt cccgtgacct 7560ggtcagggtg agcccatgtg
gagtcagcct cgcaggcctc cctgccagta gggtccgagt 7620gtgtttcatc cttcccactc
tgtcgagcct gggggctgga gcggagacgg gaggcctggc 7680ctgtctcgga acctgtgagc
tgcaccaggt agaacgccag ggaccccaga atcatgtgcg 7740tcagtccaag gggtcccctc
caggagtagt gaagactcca gaaatgtccc tttcttctcc 7800cccatcctac gagtaattgc
atttgctttt gtaattctta atgagcaata tctgctagag 7860agtttagctg taacagttct
ttttgatcat ctttttttaa taattagaaa caccaaaaaa 7920atccagaaac ttgttcttcc
aaagcagaga gcattataat caccagggcc aaaagcttcc 7980ctccctgctg tcattgcttc
ttctgaggcc tgaatccaaa agaaaaacag ccataggccc 8040tttcagtggc cgggctaccc
gtgagccctt cggaggacca gggctggggc agcctctggg 8100cccacatccg gggccagctc
cggcgtgtgt tcagtgttag cagtgggtca tgatgctctt 8160tcccacccag cctgggatag
gggcagagga ggcgaggagg ccgttgccgc tgatgtttgg 8220ccgtgaacag gtgggtgtct
gcgtgcgtcc acgtgcgtgt tttctgactg acatgaaatc 8280gacgcccgag ttagcctcac
ccggtgacct ctagccctgc ccggatggag cggggcccac 8340ccggttcagt gtttctgggg
agctggacag tggagtgcaa aaggcttgca gaacttgaag 8400cctgctcctt cccttgctac
cacggcctcc tttccgtttg atttgtcact gcttcaatca 8460ataacagccg ctccagagtc
agtagtcaat gaatatatga ccaaatatca ccaggactgt 8520tactcaatgt gtgccgagcc
cttgcccatg ctgggctccc gtgtatctgg acactgtaac 8580gtgtgctgtg tttgctcccc
ttccccttcc ttctttgccc tttacttgtc tttctggggt 8640ttttctgttt gggtttggtt
tggtttttat ttctcctttt gtgttccaaa catgaggttc 8700tctctactgg tcctcttaac
tgtggtgttg aggcttatat ttgtgtaatt tttggtgggt 8760gaaaggaatt ttgctaagta
aatctcttct gtgtttgaac tgaagtctgt attgtaacta 8820tgtttaaagt aattgttcca
gagacaaata tttctagaca ctttttcttt acaaacaaaa 8880gcattcggag ggagggggat
ggtgactgag atgagagggg agagctgaac agatgacccc 8940tgcccagatc agccagaagc
cacccaaagc agtggagccc aggagtccca ctccaagcca 9000gcaagccgaa tagctgatgt
gttgccactt tccaagtcac tgcaaaacca ggttttgttc 9060cgcccagtgg attcttgttt
tgcttcccct ccccccgaga ttattaccac catcccgtgc 9120ttttaaggaa aggcaagatt
gatgtttcct tgaggggagc caggagggga tgtgtgtgtg 9180cagagctgaa gagctgggga
gaatggggct gggcccaccc aagcaggagg ctgggacgct 9240ctgctgtggg cacaggtcag
gctaatgttg gcagatgcag ctcttcctgg acaggccagg 9300tggtgggcat tctctctcca
aggtgtgccc cgtgggcatt actgtttaag acacttccgt 9360cacatcccac cccatcctcc
agggctcaac actgtgacat ctctattccc caccctcccc 9420ttcccagggc aataaaatga
ccatggaggg ggcttgcact ctcttggctg tcacccgatc 9480gccagcaaaa cttagatgtg
agaaaacccc ttcccattcc atggcgaaaa catctcctta 9540gaaaagccat taccctcatt
aggcatggtt ttgggctccc aaaacacctg acagcccctc 9600cctcctctga gaggcggaga
gtgctgactg tagtgaccat tgcatgccgg gtgcagcatc 9660tggaagagct aggcagggtg
tctgccccct cctgagttga agtcatgctc ccctgtgcca 9720gcccagaggc cgagagctat
ggacagcatt gccagtaaca caggccaccc tgtgcagaag 9780ggagctggct ccagcctgga
aacctgtctg aggttgggag aggtgcactt ggggcacagg 9840gagaggccgg gacacactta
gctggagatg tctctaaaag ccctgtatcg tattcacctt 9900cagtttttgt gttttgggac
aattacttta gaaaataagt aggtcgtttt aaaaacaaaa 9960attattgatt gcttttttgt
agtgttcaga aaaaaggttc tttgtgtata gccaaatgac 10020tgaaagcact gatatattta
aaaacaaaag gcaatttatt aaggaaattt gtaccatttc 10080agtaaacctg tctgaatgta
cctgtatacg tttcaaaaac accccccccc cactgaatcc 10140ctgtaaccta tttattatat
aaagagtttg ccttataaat tt 10182

SEQ ID NO: 24
Length: 10180
Type: DNA
Organism: Homo sapiens
ccggaaaatg gccgccgccg ccgccgccgc gccgagcgga ggaggaggag gaggcgagga
60ggagagacct ccataaaaat acagactcac cagttcctgc tttgatgtga catgtgactc
120cccagaatac accttgcttc tgtagaccag ctccaacagg attccatggt agctgggatg
180ttagggctca gggaagaaaa gtcagaagac caggacctcc agggcctcaa ggacaaaccc
240ctcaagttta aaaaggtgaa gaaagataag aaagaagaga aagagggcaa gcatgagccc
300gtgcagccat cagcccacca ctctgctgag cccgcagagg caggcaaagc agagacatca
360gaagggtcag gctccgcccc ggctgtgccg gaagcttctg cctcccccaa acagcggcgc
420tccatcatcc gtgaccgggg acccatgtat gatgacccca ccctgcctga aggctggaca
480cggaagctta agcaaaggaa atctggccgc tctgctggga agtatgatgt gtatttgatc
540aatccccagg gaaaagcctt tcgctctaaa gtggagttga ttgcgtactt cgaaaaggta
600ggcgacacat ccctggaccc taatgatttt gacttcacgg taactgggag agggagcccc
660tcccggcgag agcagaaacc acctaagaag cccaaatctc ccaaagctcc aggaactggc
720agaggccggg gacgccccaa agggagcggc accacgagac ccaaggcggc cacgtcagag
780ggtgtgcagg tgaaaagggt cctggagaaa agtcctggga agctccttgt caagatgcct
840tttcaaactt cgccaggggg caaggctgag gggggtgggg ccaccacatc cacccaggtc
900atggtgatca aacgccccgg caggaagcga aaagctgagg ccgaccctca ggccattccc
960aagaaacggg gccgaaagcc ggggagtgtg gtggcagccg ctgccgccga ggccaaaaag
1020aaagccgtga aggagtcttc tatccgatct gtgcaggaga ccgtactccc catcaagaag
1080cgcaagaccc gggagacggt cagcatcgag gtcaaggaag tggtgaagcc cctgctggtg
1140tccaccctcg gtgagaagag cgggaaagga ctgaagacct gtaagagccc tgggcggaaa
1200agcaaggaga gcagccccaa ggggcgcagc agcagcgcct cctcaccccc caagaaggag
1260caccaccacc atcaccacca ctcagagtcc ccaaaggccc ccgtgccact gctcccaccc
1320ctgcccccac ctccacctga gcccgagagc tccgaggacc ccaccagccc ccctgagccc
1380caggacttga gcagcagcgt ctgcaaagag gagaagatgc ccagaggagg ctcactggag
1440agcgacggct gccccaagga gccagctaag actcagcccg cggttgccac cgccgccacg
1500gccgcagaaa agtacaaaca ccgaggggag ggagagcgca aagacattgt ttcatcctcc
1560atgccaaggc caaacagaga ggagcctgtg gacagccgga cgcccgtgac cgagagagtt
1620agctgacttt acacggagcg gattgcaaag caaaccaaca agaataaagg cagctgttgt
1680ctcttctcct tatgggtagg gctctgacaa agcttcccga ttaactgaaa taaaaaatat
1740ttttttttct ttcagtaaac ttagagtttc gtggcttcag ggtgggagta gttggagcat
1800tggggatgtt tttcttaccg acaagcacag tcaggttgaa gacctaacca gggccagaag
1860tagctttgca cttttctaaa ctaggctcct tcaacaaggc ttgctgcaga tactactgac
1920cagacaagct gttgaccagg cacctcccct cccgcccaaa cctttccccc atgtggtcgt
1980tagagacaga gcgacagagc agttgagagg acactcccgt tttcggtgcc atcagtgccc
2040cgtctacagc tcccccagct ccccccacct cccccactcc caaccacgtt gggacaggga
2100ggtgtgaggc aggagagaca gttggattct ttagagaaga tggatatgac cagtggctat
2160ggcctgtgcg atcccacccg tggtggctca agtctggccc cacaccagcc ccaatccaaa
2220actggcaagg acgcttcaca ggacaggaaa gtggcacctg tctgctccag ctctggcatg
2280gctaggaggg gggagtccct tgaactactg ggtgtagact ggcctgaacc acaggagagg
2340atggcccagg gtgaggtggc atggtccatt ctcaagggac gtcctccaac gggtggcgct
2400agaggccatg gaggcagtag gacaaggtgc aggcaggctg gcctggggtc aggccgggca
2460gagcacagcg gggtgagagg gattcctaat cactcagagc agtctgtgac ttagtggaca
2520ggggaggggg caaaggggga ggagaagaaa atgttcttcc agttactttc caattctcct
2580ttagggacag cttagaatta tttgcactat tgagtcttca tgttcccact tcaaaacaaa
2640cagatgctct gagagcaaac tggcttgaat tggtgacatt tagtccctca agccaccaga
2700tgtgacagtg ttgagaacta cctggatttg tatatatacc tgcgcttgtt ttaaagtggg
2760ctcagcacat agggttccca cgaagctccg aaactctaag tgtttgctgc aattttataa
2820ggacttcctg attggtttct cttctcccct tccatttctg ccttttgttc atttcatcct
2880ttcacttctt tcccttcctc cgtcctcctc cttcctagtt catcccttct cttccaggca
2940gccgcggtgc ccaaccacac ttgtcggctc cagtccccag aactctgcct gccctttgtc
3000ctcctgctgc cagtaccagc cccaccctgt tttgagccct gaggaggcct tgggctctgc
3060tgagtccgac ctggcctgtc tgtgaagagc aagagagcag caaggtcttg ctctcctagg
3120tagccccctc ttccctggta agaaaaagca aaaggcattt cccaccctga acaacgagcc
3180ttttcaccct tctactctag agaagtggac tggaggagct gggcccgatt tggtagttga
3240ggaaagcaca gaggcctcct gtggcctgcc agtcatcgag tggcccaaca ggggctccat
3300gccagccgac cttgacctca ctcagaagtc cagagtctag cgtagtgcag cagggcagta
3360gcggtaccaa tgcagaactc ccaagacccg agctgggacc agtacctggg tccccagccc
3420ttcctctgct cccccttttc cctcggagtt cttcttgaat ggcaatgttt tgcttttgct
3480cgatgcagac agggggccag aacaccacac atttcactgt ctgtctggtc catagctgtg
3540gtgtaggggc ttagaggcat gggcttgctg tgggttttta attgatcagt tttcatgtgg
3600gatcccatct ttttaacctc tgttcaggaa gtccttatct agctgcatat cttcatcata
3660ttggtatatc cttttctgtg tttacagaga tgtctcttat atctaaatct gtccaactga
3720gaagtacctt atcaaagtag caaatgagac agcagtctta tgcttccaga aacacccaca
3780ggcatgtccc atgtgagctg ctgccatgaa ctgtcaagtg tgtgttgtct tgtgtatttc
3840agttattgtc cctggcttcc ttactatggt gtaatcatga aggagtgaaa catcatagaa
3900actgtctagc acttccttgc cagtctttag tgatcaggaa ccatagttga cagttccaat
3960cagtagctta agaaaaaacc gtgtttgtct cttctggaat ggttagaagt gagggagttt
4020gccccgttct gtttgtagag tctcatagtt ggactttcta gcatatatgt gtccatttcc
4080ttatgctgta aaagcaagtc ctgcaaccaa actcccatca gcccaatccc tgatccctga
4140tcccttccac ctgctctgct gatgaccccc ccagcttcac ttctgactct tccccaggaa
4200gggaaggggg gtcagaagag agggtgagtc ctccagaact cttcctccaa ggacagaagg
4260ctcctgcccc catagtggcc tcgaactcct ggcactacca aaggacactt atccacgaga
4320gcgcagcatc cgaccaggtt gtcactgaga agatgtttat tttggtcagt tgggttttta
4380tgtattatac ttagtcaaat gtaatgtggc ttctggaatc attgtccaga gctgcttccc
4440cgtcacctgg gcgtcatctg gtcctggtaa gaggagtgcg tggcccacca ggcccccctg
4500tcacccatga cagttcattc agggccgatg gggcagtcgt ggttgggaac acagcatttc
4560aagcgtcact ttatttcatt cgggccccac ctgcagctcc ctcaaagagg cagttgccca
4620gcctctttcc cttccagttt attccagagc tgccagtggg gcctgaggct ccttagggtt
4680ttctctctat ttcccccttt cttcctcatt ccctcgtctt tcccaaaggc atcacgagtc
4740agtcgccttt cagcaggcag ccttggcggt ttatcgccct ggcaggcagg ggccctgcag
4800ctctcatgct gcccctgcct tggggtcagg ttgacaggag gttggaggga aagccttaag
4860ctgcaggatt ctcaccagct gtgtccggcc cagttttggg gtgtgacctc aatttcaatt
4920ttgtctgtac ttgaacatta tgaagatggg ggcctctttc agtgaatttg tgaacagcag
4980aattgaccga cagctttcca gtacccatgg ggctaggtca ttaaggccac atccacagtc
5040tcccccaccc ttgttccagt tgttagttac tacctcctct cctgacaata ctgtatgtcg
5100tcgagctccc cccaggtcta cccctcccgg ccctgcctgc tggtgggctt gtcatagcca
5160gtgggattgc cggtcttgac agctcagtga gctggagata cttggtcaca gccaggcgct
5220agcacagctc ccttctgttg atgctgtatt cccatatcaa aagacacagg ggacacccag
5280aaacgccaca tcccccaatc catcagtgcc aaactagcca acggccccag cttctcagct
5340cgctggatgg cggaagctgc tactcgtgag cgccagtgcg ggtgcagaca atcttctgtt
5400gggtggcatc attccaggcc cgaagcatga acagtgcacc tgggacaggg agcagcccca
5460aattgtcacc tgcttctctg cccagctttt cattgctgtg acagtgatgg cgaaagaggg
5520taataaccag acacaaactg ccaagttggg tggagaaagg agtttcttta gctgacagaa
5580tctctgaatt ttaaatcact tagtaagcgg ctcaagccca ggagggagca gagggatacg
5640agcggagtcc cctgcgcggg accatctgga attggtttag cccaagtgga gcctgacagc
5700cagaactctg tgtcccccgt ctaaccacag ctccttttcc agagcattcc agtcaggctc
5760tctgggctga ctgggccagg ggaggttaca ggtaccagtt ctttaagaag atctttgggc
5820atatacattt ttagcctgtg tcattgcccc aaatggattc ctgtttcaag ttcacacctg
5880cagattctag gacctgtgtc ctagacttca gggagtcagc tgtttctaga gttcctacca
5940tggagtgggt ctggaggacc tgcccggtgg gggggcagag ccctgctccc tccgggtctt
6000cctactcttc tctctgctct gacgggattt gttgattctc tccattttgg tgtctttctc
6060ttttagatat tgtatcaatc tttagaaaag gcatagtcta cttgttataa atcgttagga
6120tactgcctcc cccagggtct aaaattacat attagagggg aaaagctgaa cactgaagtc
6180agttctcaac aatttagaag gaaaacctag aaaacatttg gcagaaaatt acatttcgat
6240gtttttgaat gaatacgagc aagcttttac aacagtgctg atctaaaaat acttagcact
6300tggcctgaga tgcctggtga gcattacagg caaggggaat ctggaggtag ccgacctgag
6360gacatggctt ctgaacctgt cttttgggag tggtatggaa ggtggagcgt tcaccagtga
6420cctggaaggc ccagcaccac cctccttccc actcttctca tcttgacaga gcctgcccca
6480gcgctgacgt gtcaggaaaa cacccaggga actaggaagg cacttctgcc tgaggggcag
6540cctgccttgc ccactcctgc tctgctcgcc tcggatcagc tgagccttct gagctggcct
6600ctcactgcct ccccaaggcc ccctgcctgc cctgtcagga ggcagaagga agcaggtgtg
6660agggcagtgc aaggagggag cacaaccccc agctcccgct ccgggctccg acttgtgcac
6720aggcagagcc cagaccctgg aggaaatcct acctttgaat tcaagaacat ttggggaatt
6780tggaaatctc tttgccccca aacccccatt ctgtcctacc tttaatcagg tcctgctcag
6840cagtgagagc agatgaggtg aaaaggccaa gaggtttggc tcctgcccac tgatagcccc
6900tctccccgca gtgtttgtgt gtcaagtggc aaagctgttc ttcctggtga ccctgattat
6960atccagtaac acatagactg tgcgcatagg cctgctttgt ctcctctatc ctgggctttt
7020gttttgcttt ttagttttgc ttttagtttt tctgtccctt ttatttaacg caccgactag
7080acacacaaag cagttgaatt tttatatata tatctgtata ttgcacaatt ataaactcat
7140tttgcttgtg gctccacaca cacaaaaaaa gacctgttaa aattatacct gttgcttaat
7200tacaatattt ctgataacca tagcatagga caagggaaaa taaaaaaaga aaaaaaagaa
7260aaaaaaacga caaatctgtc tgctggtcac ttcttctgtc caagcagatt cgtggtcttt
7320tcctcgcttc tttcaagggc tttcctgtgc caggtgaagg aggctccagg cagcacccag
7380gttttgcact cttgtttctc ccgtgcttgt gaaagaggtc ccaaggttct gggtgcagga
7440gcgctccctt gacctgctga agtccggaac gtagtcggca cagcctggtc gccttccacc
7500tctgggagct ggagtccact ggggtggcct gactccccca gtccccttcc cgtgacctgg
7560tcagggtgag cccatgtgga gtcagcctcg caggcctccc tgccagtagg gtccgagtgt
7620gtttcatcct tcccactctg tcgagcctgg gggctggagc ggagacggga ggcctggcct
7680gtctcggaac ctgtgagctg caccaggtag aacgccaggg accccagaat catgtgcgtc
7740agtccaaggg gtcccctcca ggagtagtga agactccaga aatgtccctt tcttctcccc
7800catcctacga gtaattgcat ttgcttttgt aattcttaat gagcaatatc tgctagagag
7860tttagctgta acagttcttt ttgatcatct ttttttaata attagaaaca ccaaaaaaat
7920ccagaaactt gttcttccaa agcagagagc attataatca ccagggccaa aagcttccct
7980ccctgctgtc attgcttctt ctgaggcctg aatccaaaag aaaaacagcc ataggccctt
8040tcagtggccg ggctacccgt gagcccttcg gaggaccagg gctggggcag cctctgggcc
8100cacatccggg gccagctccg gcgtgtgttc agtgttagca gtgggtcatg atgctctttc
8160ccacccagcc tgggataggg gcagaggagg cgaggaggcc gttgccgctg atgtttggcc
8220gtgaacaggt gggtgtctgc gtgcgtccac gtgcgtgttt tctgactgac atgaaatcga
8280cgcccgagtt agcctcaccc ggtgacctct agccctgccc ggatggagcg gggcccaccc
8340ggttcagtgt ttctggggag ctggacagtg gagtgcaaaa ggcttgcaga acttgaagcc
8400tgctccttcc cttgctacca cggcctcctt tccgtttgat ttgtcactgc ttcaatcaat
8460aacagccgct ccagagtcag tagtcaatga atatatgacc aaatatcacc aggactgtta
8520ctcaatgtgt gccgagccct tgcccatgct gggctcccgt gtatctggac actgtaacgt
8580gtgctgtgtt tgctcccctt ccccttcctt ctttgccctt tacttgtctt tctggggttt
8640ttctgtttgg gtttggtttg gtttttattt ctccttttgt gttccaaaca tgaggttctc
8700tctactggtc ctcttaactg tggtgttgag gcttatattt gtgtaatttt tggtgggtga
8760aaggaatttt gctaagtaaa tctcttctgt gtttgaactg aagtctgtat tgtaactatg
8820tttaaagtaa ttgttccaga gacaaatatt tctagacact ttttctttac aaacaaaagc
8880attcggaggg agggggatgg tgactgagat gagaggggag agctgaacag atgacccctg
8940cccagatcag ccagaagcca cccaaagcag tggagcccag gagtcccact ccaagccagc
9000aagccgaata gctgatgtgt tgccactttc caagtcactg caaaaccagg ttttgttccg
9060cccagtggat tcttgttttg cttcccctcc ccccgagatt attaccacca tcccgtgctt
9120ttaaggaaag gcaagattga tgtttccttg aggggagcca ggaggggatg tgtgtgtgca
9180gagctgaaga gctggggaga atggggctgg gcccacccaa gcaggaggct gggacgctct
9240gctgtgggca caggtcaggc taatgttggc agatgcagct cttcctggac aggccaggtg
9300gtgggcattc tctctccaag gtgtgccccg tgggcattac tgtttaagac acttccgtca
9360catcccaccc catcctccag ggctcaacac tgtgacatct ctattcccca ccctcccctt
9420cccagggcaa taaaatgacc atggaggggg cttgcactct cttggctgtc acccgatcgc
9480cagcaaaact tagatgtgag aaaacccctt cccattccat ggcgaaaaca tctccttaga
9540aaagccatta ccctcattag gcatggtttt gggctcccaa aacacctgac agcccctccc
9600tcctctgaga ggcggagagt gctgactgta gtgaccattg catgccgggt gcagcatctg
9660gaagagctag gcagggtgtc tgccccctcc tgagttgaag tcatgctccc ctgtgccagc
9720ccagaggccg agagctatgg acagcattgc cagtaacaca ggccaccctg tgcagaaggg
9780agctggctcc agcctggaaa cctgtctgag gttgggagag gtgcacttgg ggcacaggga
9840gaggccggga cacacttagc tggagatgtc tctaaaagcc ctgtatcgta ttcaccttca
9900gtttttgtgt tttgggacaa ttactttaga aaataagtag gtcgttttaa aaacaaaaat
9960tattgattgc ttttttgtag tgttcagaaa aaaggttctt tgtgtatagc caaatgactg
10020aaagcactga tatatttaaa aacaaaaggc aatttattaa ggaaatttgt accatttcag
10080taaacctgtc tgaatgtacc tgtatacgtt tcaaaaacac ccccccccca ctgaatccct
10140gtaacctatt tattatataa agagtttgcc ttataaattt
10180

SEQ ID NO: 25
Length: 10191
Type: DNA
Organism: Homo sapiens
ccggaaaatg gccgccgccg ccgccgccgc cgccgccgcg
ccgagcggag gaggaggagg 60aggcgaggag gagagactgc tccataaaaa tacagactca
ccagttcctg ctttgatgtg 120acatgtgact ccccagaata caccttgctt ctgtagacca
gctccaacag gattccatgg 180tagctgggat gttagggctc agggaagaaa agtcagaaga
ccaggacctc cagggcctca 240aggacaaacc cctcaagttt aaaaaggtga agaaagataa
gaaagaagag aaagagggca 300agcatgagcc cgtgcagcca tcagcccacc actctgctga
gcccgcagag gcaggcaaag 360cagagacatc agaagggtca ggctccgccc cggctgtgcc
ggaagcttct gcctccccca 420aacagcggcg ctccatcatc cgtgaccggg gacccatgta
tgatgacccc accctgcctg 480aaggctggac acggaagctt aagcaaagga aatctggccg
ctctgctggg aagtatgatg 540tgtatttgat caatccccag ggaaaagcct ttcgctctaa
agtggagttg attgcgtact 600tcgaaaaggt aggcgacaca tccctggacc ctaatgattt
tgacttcacg gtaactggga 660gagggagccc ctcccggcga gagcagaaac cacctaagaa
gcccaaatct cccaaagctc 720caggaactgg cagaggccgg ggacgcccca aagggagcgg
caccacgaga cccaaggcgg 780ccacgtcaga gggtgtgcag gtgaaaaggg tcctggagaa
aagtcctggg aagctccttg 840tcaagatgcc ttttcaaact tcgccagggg gcaaggctga
ggggggtggg gccaccacat 900ccacccaggt catggtgatc aaacgccccg gcaggaagcg
aaaagctgag gccgaccctc 960aggccattcc caagaaacgg ggccgaaagc cggggagtgt
ggtggcagcc gctgccgccg 1020aggccaaaaa gaaagccgtg aaggagtctt ctatccgatc
tgtgcaggag accgtactcc 1080ccatcaagaa gcgcaagacc cgggagacgg tcagcatcga
ggtcaaggaa gtggtgaagc 1140ccctgctggt gtccaccctc ggtgagaaga gcgggaaagg
actgaagacc tgtaagagcc 1200ctgggcggaa aagcaaggag agcagcccca aggggcgcag
cagcagcgcc tcctcacccc 1260ccaagaagga gcaccaccac catcaccacc actcagagtc
cccaaaggcc cccgtgccac 1320tgctcccacc cctgccccca cctccacctg agcccgagag
ctccgaggac cccaccagcc 1380cccctgagcc ccaggacttg agcagcagcg tctgcaaaga
ggagaagatg cccagaggag 1440gctcactgga gagcgacggc tgccccaagg agccagctaa
gactcagccc gcggttgcca 1500ccgccgccac ggccgcagaa aagtacaaac accgagggga
gggagagcgc aaagacattg 1560tttcatcctc catgccaagg ccaaacagag aggagcctgt
ggacagccgg acgcccgtga 1620ccgagagagt tagctgactt tacacggagc ggattgcaaa
gcaaaccaac aagaataaag 1680gcagctgttg tctcttctcc ttatgggtag ggctctgaca
aagcttcccg attaactgaa 1740ataaaaaata tttttttttc tttcagtaaa cttagagttt
cgtggcttca gggtgggagt 1800agttggagca ttggggatgt ttttcttacc gacaagcaca
gtcaggttga agacctaacc 1860agggccagaa gtagctttgc acttttctaa actaggctcc
ttcaacaagg cttgctgcag 1920atactactga ccagacaagc tgttgaccag gcacctcccc
tcccgcccaa acctttcccc 1980catgtggtcg ttagagacag agcgacagag cagttgagag
gacactcccg ttttcggtgc 2040catcagtgcc ccgtctacag ctcccccagc tccccccacc
tcccccactc ccaaccacgt 2100tgggacaggg aggtgtgagg caggagagac agttggattc
tttagagaag atggatatga 2160ccagtggcta tggcctgtgc gatcccaccc gtggtggctc
aagtctggcc ccacaccagc 2220cccaatccaa aactggcaag gacgcttcac aggacaggaa
agtggcacct gtctgctcca 2280gctctggcat ggctaggagg ggggagtccc ttgaactact
gggtgtagac tggcctgaac 2340cacaggagag gatggcccag ggtgaggtgg catggtccat
tctcaaggga cgtcctccaa 2400cgggtggcgc tagaggccat ggaggcagta ggacaaggtg
caggcaggct ggcctggggt 2460caggccgggc agagcacagc ggggtgagag ggattcctaa
tcactcagag cagtctgtga 2520cttagtggac aggggagggg gcaaaggggg aggagaagaa
aatgttcttc cagttacttt 2580ccaattctcc tttagggaca gcttagaatt atttgcacta
ttgagtcttc atgttcccac 2640ttcaaaacaa acagatgctc tgagagcaaa ctggcttgaa
ttggtgacat ttagtccctc 2700aagccaccag atgtgacagt gttgagaact acctggattt
gtatatatac ctgcgcttgt 2760tttaaagtgg gctcagcaca tagggttccc acgaagctcc
gaaactctaa gtgtttgctg 2820caattttata aggacttcct gattggtttc tcttctcccc
ttccatttct gccttttgtt 2880catttcatcc tttcacttct ttcccttcct ccgtcctcct
ccttcctagt tcatcccttc 2940tcttccaggc agccgcggtg cccaaccaca cttgtcggct
ccagtcccca gaactctgcc 3000tgccctttgt cctcctgctg ccagtaccag ccccaccctg
ttttgagccc tgaggaggcc 3060ttgggctctg ctgagtccga cctggcctgt ctgtgaagag
caagagagca gcaaggtctt 3120gctctcctag gtagccccct cttccctggt aagaaaaagc
aaaaggcatt tcccaccctg 3180aacaacgagc cttttcaccc ttctactcta gagaagtgga
ctggaggagc tgggcccgat 3240ttggtagttg aggaaagcac agaggcctcc tgtggcctgc
cagtcatcga gtggcccaac 3300aggggctcca tgccagccga ccttgacctc actcagaagt
ccagagtcta gcgtagtgca 3360gcagggcagt agcggtacca atgcagaact cccaagaccc
gagctgggac cagtacctgg 3420gtccccagcc cttcctctgc tccccctttt ccctcggagt
tcttcttgaa tggcaatgtt 3480ttgcttttgc tcgatgcaga cagggggcca gaacaccaca
catttcactg tctgtctggt 3540ccatagctgt ggtgtagggg cttagaggca tgggcttgct
gtgggttttt aattgatcag 3600ttttcatgtg ggatcccatc tttttaacct ctgttcagga
agtccttatc tagctgcata 3660tcttcatcat attggtatat ccttttctgt gtttacagag
atgtctctta tatctaaatc 3720tgtccaactg agaagtacct tatcaaagta gcaaatgaga
cagcagtctt atgcttccag 3780aaacacccac aggcatgtcc catgtgagct gctgccatga
actgtcaagt gtgtgttgtc 3840ttgtgtattt cagttattgt ccctggcttc cttactatgg
tgtaatcatg aaggagtgaa 3900acatcataga aactgtctag cacttccttg ccagtcttta
gtgatcagga accatagttg 3960acagttccaa tcagtagctt aagaaaaaac cgtgtttgtc
tcttctggaa tggttagaag 4020tgagggagtt tgccccgttc tgtttgtaga gtctcatagt
tggactttct agcatatatg 4080tgtccatttc cttatgctgt aaaagcaagt cctgcaacca
aactcccatc agcccaatcc 4140ctgatccctg atcccttcca cctgctctgc tgatgacccc
cccagcttca cttctgactc 4200ttccccagga agggaagggg ggtcagaaga gagggtgagt
cctccagaac tcttcctcca 4260aggacagaag gctcctgccc ccatagtggc ctcgaactcc
tggcactacc aaaggacact 4320tatccacgag agcgcagcat ccgaccaggt tgtcactgag
aagatgttta ttttggtcag 4380ttgggttttt atgtattata cttagtcaaa tgtaatgtgg
cttctggaat cattgtccag 4440agctgcttcc ccgtcacctg ggcgtcatct ggtcctggta
agaggagtgc gtggcccacc 4500aggcccccct gtcacccatg acagttcatt cagggccgat
ggggcagtcg tggttgggaa 4560cacagcattt caagcgtcac tttatttcat tcgggcccca
cctgcagctc cctcaaagag 4620gcagttgccc agcctctttc ccttccagtt tattccagag
ctgccagtgg ggcctgaggc 4680tccttagggt tttctctcta tttccccctt tcttcctcat
tccctcgtct ttcccaaagg 4740catcacgagt cagtcgcctt tcagcaggca gccttggcgg
tttatcgccc tggcaggcag 4800gggccctgca gctctcatgc tgcccctgcc ttggggtcag
gttgacagga ggttggaggg 4860aaagccttaa gctgcaggat tctcaccagc tgtgtccggc
ccagttttgg ggtgtgacct 4920caatttcaat tttgtctgta cttgaacatt atgaagatgg
gggcctcttt cagtgaattt 4980gtgaacagca gaattgaccg acagctttcc agtacccatg
gggctaggtc attaaggcca 5040catccacagt ctcccccacc cttgttccag ttgttagtta
ctacctcctc tcctgacaat 5100actgtatgtc gtcgagctcc ccccaggtct acccctcccg
gccctgcctg ctggtgggct 5160tgtcatagcc agtgggattg ccggtcttga cagctcagtg
agctggagat acttggtcac 5220agccaggcgc tagcacagct cccttctgtt gatgctgtat
tcccatatca aaagacacag 5280gggacaccca gaaacgccac atcccccaat ccatcagtgc
caaactagcc aacggcccca 5340gcttctcagc tcgctggatg gcggaagctg ctactcgtga
gcgccagtgc gggtgcagac 5400aatcttctgt tgggtggcat cattccaggc ccgaagcatg
aacagtgcac ctgggacagg 5460gagcagcccc aaattgtcac ctgcttctct gcccagcttt
tcattgctgt gacagtgatg 5520gcgaaagagg gtaataacca gacacaaact gccaagttgg
gtggagaaag gagtttcttt 5580agctgacaga atctctgaat tttaaatcac ttagtaagcg
gctcaagccc aggagggagc 5640agagggatac gagcggagtc ccctgcgcgg gaccatctgg
aattggttta gcccaagtgg 5700agcctgacag ccagaactct gtgtcccccg tctaaccaca
gctccttttc cagagcattc 5760cagtcaggct ctctgggctg actgggccag gggaggttac
aggtaccagt tctttaagaa 5820gatctttggg catatacatt tttagcctgt gtcattgccc
caaatggatt cctgtttcaa 5880gttcacacct gcagattcta ggacctgtgt cctagacttc
agggagtcag ctgtttctag 5940agttcctacc atggagtggg tctggaggac ctgcccggtg
ggggggcaga gccctgctcc 6000ctccgggtct tcctactctt ctctctgctc tgacgggatt
tgttgattct ctccattttg 6060gtgtctttct cttttagata ttgtatcaat ctttagaaaa
ggcatagtct acttgttata 6120aatcgttagg atactgcctc ccccagggtc taaaattaca
tattagaggg gaaaagctga 6180acactgaagt cagttctcaa caatttagaa ggaaaaccta
gaaaacattt ggcagaaaat 6240tacatttcga tgtttttgaa tgaatacgag caagctttta
caacagtgct gatctaaaaa 6300tacttagcac ttggcctgag atgcctggtg agcattacag
gcaaggggaa tctggaggta 6360gccgacctga ggacatggct tctgaacctg tcttttggga
gtggtatgga aggtggagcg 6420ttcaccagtg acctggaagg cccagcacca ccctccttcc
cactcttctc atcttgacag 6480agcctgcccc agcgctgacg tgtcaggaaa acacccaggg
aactaggaag gcacttctgc 6540ctgaggggca gcctgccttg cccactcctg ctctgctcgc
ctcggatcag ctgagccttc 6600tgagctggcc tctcactgcc tccccaaggc cccctgcctg
ccctgtcagg aggcagaagg 6660aagcaggtgt gagggcagtg caaggaggga gcacaacccc
cagctcccgc tccgggctcc 6720gacttgtgca caggcagagc ccagaccctg gaggaaatcc
tacctttgaa ttcaagaaca 6780tttggggaat ttggaaatct ctttgccccc aaacccccat
tctgtcctac ctttaatcag 6840gtcctgctca gcagtgagag cagatgaggt gaaaaggcca
agaggtttgg ctcctgccca 6900ctgatagccc ctctccccgc agtgtttgtg tgtcaagtgg
caaagctgtt cttcctggtg 6960accctgatta tatccagtaa cacatagact gtgcgcatag
gcctgctttg tctcctctat 7020cctgggcttt tgttttgctt tttagttttg cttttagttt
ttctgtccct tttatttaac 7080gcaccgacta gacacacaaa gcagttgaat ttttatatat
atatctgtat attgcacaat 7140tataaactca ttttgcttgt ggctccacac acacaaaaaa
agacctgtta aaattatacc 7200tgttgcttaa ttacaatatt tctgataacc atagcatagg
acaagggaaa ataaaaaaag 7260aaaaaaaaga aaaaaaaacg acaaatctgt ctgctggtca
cttcttctgt ccaagcagat 7320tcgtggtctt ttcctcgctt ctttcaaggg ctttcctgtg
ccaggtgaag gaggctccag 7380gcagcaccca ggttttgcac tcttgtttct cccgtgcttg
tgaaagaggt cccaaggttc 7440tgggtgcagg agcgctccct tgacctgctg aagtccggaa
cgtagtcggc acagcctggt 7500cgccttccac ctctgggagc tggagtccac tggggtggcc
tgactccccc agtccccttc 7560ccgtgacctg gtcagggtga gcccatgtgg agtcagcctc
gcaggcctcc ctgccagtag 7620ggtccgagtg tgtttcatcc ttcccactct gtcgagcctg
ggggctggag cggagacggg 7680aggcctggcc tgtctcggaa cctgtgagct gcaccaggta
gaacgccagg gaccccagaa 7740tcatgtgcgt cagtccaagg ggtcccctcc aggagtagtg
aagactccag aaatgtccct 7800ttcttctccc ccatcctacg agtaattgca tttgcttttg
taattcttaa tgagcaatat 7860ctgctagaga gtttagctgt aacagttctt tttgatcatc
tttttttaat aattagaaac 7920accaaaaaaa tccagaaact tgttcttcca aagcagagag
cattataatc accagggcca 7980aaagcttccc tccctgctgt cattgcttct tctgaggcct
gaatccaaaa gaaaaacagc 8040cataggccct ttcagtggcc gggctacccg tgagcccttc
ggaggaccag ggctggggca 8100gcctctgggc ccacatccgg ggccagctcc ggcgtgtgtt
cagtgttagc agtgggtcat 8160gatgctcttt cccacccagc ctgggatagg ggcagaggag
gcgaggaggc cgttgccgct 8220gatgtttggc cgtgaacagg tgggtgtctg cgtgcgtcca
cgtgcgtgtt ttctgactga 8280catgaaatcg acgcccgagt tagcctcacc cggtgacctc
tagccctgcc cggatggagc 8340ggggcccacc cggttcagtg tttctgggga gctggacagt
ggagtgcaaa aggcttgcag 8400aacttgaagc ctgctccttc ccttgctacc acggcctcct
ttccgtttga tttgtcactg 8460cttcaatcaa taacagccgc tccagagtca gtagtcaatg
aatatatgac caaatatcac 8520caggactgtt actcaatgtg tgccgagccc ttgcccatgc
tgggctcccg tgtatctgga 8580cactgtaacg tgtgctgtgt ttgctcccct tccccttcct
tctttgccct ttacttgtct 8640ttctggggtt tttctgtttg ggtttggttt ggtttttatt
tctccttttg tgttccaaac 8700atgaggttct ctctactggt cctcttaact gtggtgttga
ggcttatatt tgtgtaattt 8760ttggtgggtg aaaggaattt tgctaagtaa atctcttctg
tgtttgaact gaagtctgta 8820ttgtaactat gtttaaagta attgttccag agacaaatat
ttctagacac tttttcttta 8880caaacaaaag cattcggagg gagggggatg gtgactgaga
tgagagggga gagctgaaca 8940gatgacccct gcccagatca gccagaagcc acccaaagca
gtggagccca ggagtcccac 9000tccaagccag caagccgaat agctgatgtg ttgccacttt
ccaagtcact gcaaaaccag 9060gttttgttcc gcccagtgga ttcttgtttt gcttcccctc
cccccgagat tattaccacc 9120atcccgtgct tttaaggaaa ggcaagattg atgtttcctt
gaggggagcc aggaggggat 9180gtgtgtgtgc agagctgaag agctggggag aatggggctg
ggcccaccca agcaggaggc 9240tgggacgctc tgctgtgggc acaggtcagg ctaatgttgg
cagatgcagc tcttcctgga 9300caggccaggt ggtgggcatt ctctctccaa ggtgtgcccc
gtgggcatta ctgtttaaga 9360cacttccgtc acatcccacc ccatcctcca gggctcaaca
ctgtgacatc tctattcccc 9420accctcccct tcccagggca ataaaatgac catggagggg
gcttgcactc tcttggctgt 9480cacccgatcg ccagcaaaac ttagatgtga gaaaacccct
tcccattcca tggcgaaaac 9540atctccttag aaaagccatt accctcatta ggcatggttt
tgggctccca aaacacctga 9600cagcccctcc ctcctctgag aggcggagag tgctgactgt
agtgaccatt gcatgccggg 9660tgcagcatct ggaagagcta ggcagggtgt ctgccccctc
ctgagttgaa gtcatgctcc 9720cctgtgccag cccagaggcc gagagctatg gacagcattg
ccagtaacac aggccaccct 9780gtgcagaagg gagctggctc cagcctggaa acctgtctga
ggttgggaga ggtgcacttg 9840gggcacaggg agaggccggg acacacttag ctggagatgt
ctctaaaagc cctgtatcgt 9900attcaccttc agtttttgtg ttttgggaca attactttag
aaaataagta ggtcgtttta 9960aaaacaaaaa ttattgattg cttttttgta gtgttcagaa
aaaaggttct ttgtgtatag 10020ccaaatgact gaaagcactg atatatttaa aaacaaaagg
caatttatta aggaaatttg 10080taccatttca gtaaacctgt ctgaatgtac ctgtatacgt
ttcaaaaaca cccccccccc 10140actgaatccc tgtaacctat ttattatata aagagtttgc
cttataaatt t 10191

<210> 26
<211> 10179
<212> DNA
<213> Homo sapiens

<400> 26
ccggaaaatg
gccgccgccg ccgccgcgcc gagcggagga ggaggaggag gcgaggagga 60gagactgctc
cataaaaata cagactcacc agttcctgct ttgatgtgac atgtgactcc 120ccagaataca
ccttgcttct gtagaccagc tccaacagga ttccatggta gctgggatgt 180tagggctcag
ggaagaaaag tcagaagacc aggacctcca gggcctcaag gacaaacccc 240tcaagtttaa
aaaggtgaag aaagataaga aagaagagaa agagggcaag catgagcccg 300tgcagccatc
agcccaccac tctgctgagc ccgcagaggc aggcaaagca gagacatcag 360aagggtcagg
ctccgccccg gctgtgccgg aagcttctgc ctcccccaaa cagcggcgct 420ccatcatccg
tgaccgggga cccatgtatg atgaccccac cctgcctgaa ggctggacac 480ggaagcttaa
gcaaaggaaa tctggccgct ctgctgggaa gtatgatgtg tatttgatca 540atccccaggg
aaaagccttt cgctctaaag tggagttgat tgcgtacttc gaaaaggtag 600gcgacacatc
cctggaccct aatgattttg acttcacggt aactgggaga gggagcccct 660cccggcgaga
gcagaaacca cctaagaagc ccaaatctcc caaagctcca ggaactggca 720gaggccgggg
acgccccaaa gggagcggca ccacgagacc caaggcggcc acgtcagagg 780gtgtgcaggt
gaaaagggtc ctggagaaaa gtcctgggaa gctccttgtc aagatgcctt 840ttcaaacttc
gccagggggc aaggctgagg ggggtggggc caccacatcc acccaggtca 900tggtgatcaa
acgccccggc aggaagcgaa aagctgaggc cgaccctcag gccattccca 960agaaacgggg
ccgaaagccg gggagtgtgg tggcagccgc tgccgccgag gccaaaaaga 1020aagccgtgaa
ggagtcttct atccgatctg tgcaggagac cgtactcccc atcaagaagc 1080gcaagacccg
ggagacggtc agcatcgagg tcaaggaagt ggtgaagccc ctgctggtgt 1140ccaccctcgg
tgagaagagc gggaaaggac tgaagacctg taagagccct gggcggaaaa 1200gcaaggagag
cagccccaag gggcgcagca gcagcgcctc ctcacccccc aagaaggagc 1260accaccacca
tcaccaccac tcagagtccc caaaggcccc cgtgccactg ctcccacccc 1320tgcccccacc
tccacctgag cccgagagct ccgaggaccc caccagcccc cctgagcccc 1380aggacttgag
cagcagcgtc tgcaaagagg agaagatgcc cagaggaggc tcactggaga 1440gcgacggctg
ccccaaggag ccagctaaga ctcagcccgc ggttgccacc gccgccacgg 1500ccgcagaaaa
gtacaaacac cgaggggagg gagagcgcaa agacattgtt tcatcctcca 1560tgccaaggcc
aaacagagag gagcctgtgg acagccggac gcccgtgacc gagagagtta 1620gctgacttta
cacggagcgg attgcaaagc aaaccaacaa gaataaaggc agctgttgtc 1680tcttctcctt
atgggtaggg ctctgacaaa gcttcccgat taactgaaat aaaaaatatt 1740tttttttctt
tcagtaaact tagagtttcg tggcttcagg gtgggagtag ttggagcatt 1800ggggatgttt
ttcttaccga caagcacagt caggttgaag acctaaccag ggccagaagt 1860agctttgcac
ttttctaaac taggctcctt caacaaggct tgctgcagat actactgacc 1920agacaagctg
ttgaccaggc acctcccctc ccgcccaaac ctttccccca tgtggtcgtt 1980agagacagag
cgacagagca gttgagagga cactcccgtt ttcggtgcca tcagtgcccc 2040gtctacagct
cccccagctc cccccacctc ccccactccc aaccacgttg ggacagggag 2100gtgtgaggca
ggagagacag ttggattctt tagagaagat ggatatgacc agtggctatg 2160gcctgtgcga
tcccacccgt ggtggctcaa gtctggcccc acaccagccc caatccaaaa 2220ctggcaagga
cgcttcacag gacaggaaag tggcacctgt ctgctccagc tctggcatgg 2280ctaggagggg
ggagtccctt gaactactgg gtgtagactg gcctgaacca caggagagga 2340tggcccaggg
tgaggtggca tggtccattc tcaagggacg tcctccaacg ggtggcgcta 2400gaggccatgg
aggcagtagg acaaggtgca ggcaggctgg cctggggtca ggccgggcag 2460agcacagcgg
ggtgagaggg attcctaatc actcagagca gtctgtgact tagtggacag 2520gggagggggc
aaagggggag gagaagaaaa tgttcttcca gttactttcc aattctcctt 2580tagggacagc
ttagaattat ttgcactatt gagtcttcat gttcccactt caaaacaaac 2640agatgctctg
agagcaaact ggcttgaatt ggtgacattt agtccctcaa gccaccagat 2700gtgacagtgt
tgagaactac ctggatttgt atatatacct gcgcttgttt taaagtgggc 2760tcagcacata
gggttcccac gaagctccga aactctaagt gtttgctgca attttataag 2820gacttcctga
ttggtttctc ttctcccctt ccatttctgc cttttgttca tttcatcctt 2880tcacttcttt
cccttcctcc gtcctcctcc ttcctagttc atcccttctc ttccaggcag 2940ccgcggtgcc
caaccacact tgtcggctcc agtccccaga actctgcctg ccctttgtcc 3000tcctgctgcc
agtaccagcc ccaccctgtt ttgagccctg aggaggcctt gggctctgct 3060gagtccgacc
tggcctgtct gtgaagagca agagagcagc aaggtcttgc tctcctaggt 3120agccccctct
tccctggtaa gaaaaagcaa aaggcatttc ccaccctgaa caacgagcct 3180tttcaccctt
ctactctaga gaagtggact ggaggagctg ggcccgattt ggtagttgag 3240gaaagcacag
aggcctcctg tggcctgcca gtcatcgagt ggcccaacag gggctccatg 3300ccagccgacc
ttgacctcac tcagaagtcc agagtctagc gtagtgcagc agggcagtag 3360cggtaccaat
gcagaactcc caagacccga gctgggacca gtacctgggt ccccagccct 3420tcctctgctc
ccccttttcc ctcggagttc ttcttgaatg gcaatgtttt gcttttgctc 3480gatgcagaca
gggggccaga acaccacaca tttcactgtc tgtctggtcc atagctgtgg 3540tgtaggggct
tagaggcatg ggcttgctgt gggtttttaa ttgatcagtt ttcatgtggg 3600atcccatctt
tttaacctct gttcaggaag tccttatcta gctgcatatc ttcatcatat 3660tggtatatcc
ttttctgtgt ttacagagat gtctcttata tctaaatctg tccaactgag 3720aagtacctta
tcaaagtagc aaatgagaca gcagtcttat gcttccagaa acacccacag 3780gcatgtccca
tgtgagctgc tgccatgaac tgtcaagtgt gtgttgtctt gtgtatttca 3840gttattgtcc
ctggcttcct tactatggtg taatcatgaa ggagtgaaac atcatagaaa 3900ctgtctagca
cttccttgcc agtctttagt gatcaggaac catagttgac agttccaatc 3960agtagcttaa
gaaaaaaccg tgtttgtctc ttctggaatg gttagaagtg agggagtttg 4020ccccgttctg
tttgtagagt ctcatagttg gactttctag catatatgtg tccatttcct 4080tatgctgtaa
aagcaagtcc tgcaaccaaa ctcccatcag cccaatccct gatccctgat 4140cccttccacc
tgctctgctg atgacccccc cagcttcact tctgactctt ccccaggaag 4200ggaagggggg
tcagaagaga gggtgagtcc tccagaactc ttcctccaag gacagaaggc 4260tcctgccccc
atagtggcct cgaactcctg gcactaccaa aggacactta tccacgagag 4320cgcagcatcc
gaccaggttg tcactgagaa gatgtttatt ttggtcagtt gggtttttat 4380gtattatact
tagtcaaatg taatgtggct tctggaatca ttgtccagag ctgcttcccc 4440gtcacctggg
cgtcatctgg tcctggtaag aggagtgcgt ggcccaccag gcccccctgt 4500cacccatgac
agttcattca gggccgatgg ggcagtcgtg gttgggaaca cagcatttca 4560agcgtcactt
tatttcattc gggccccacc tgcagctccc tcaaagaggc agttgcccag 4620cctctttccc
ttccagttta ttccagagct gccagtgggg cctgaggctc cttagggttt 4680tctctctatt
tccccctttc ttcctcattc cctcgtcttt cccaaaggca tcacgagtca 4740gtcgcctttc
agcaggcagc cttggcggtt tatcgccctg gcaggcaggg gccctgcagc 4800tctcatgctg
cccctgcctt ggggtcaggt tgacaggagg ttggagggaa agccttaagc 4860tgcaggattc
tcaccagctg tgtccggccc agttttgggg tgtgacctca atttcaattt 4920tgtctgtact
tgaacattat gaagatgggg gcctctttca gtgaatttgt gaacagcaga 4980attgaccgac
agctttccag tacccatggg gctaggtcat taaggccaca tccacagtct 5040cccccaccct
tgttccagtt gttagttact acctcctctc ctgacaatac tgtatgtcgt 5100cgagctcccc
ccaggtctac ccctcccggc cctgcctgct ggtgggcttg tcatagccag 5160tgggattgcc
ggtcttgaca gctcagtgag ctggagatac ttggtcacag ccaggcgcta 5220gcacagctcc
cttctgttga tgctgtattc ccatatcaaa agacacaggg gacacccaga 5280aacgccacat
cccccaatcc atcagtgcca aactagccaa cggccccagc ttctcagctc 5340gctggatggc
ggaagctgct actcgtgagc gccagtgcgg gtgcagacaa tcttctgttg 5400ggtggcatca
ttccaggccc gaagcatgaa cagtgcacct gggacaggga gcagccccaa 5460attgtcacct
gcttctctgc ccagcttttc attgctgtga cagtgatggc gaaagagggt 5520aataaccaga
cacaaactgc caagttgggt ggagaaagga gtttctttag ctgacagaat 5580ctctgaattt
taaatcactt agtaagcggc tcaagcccag gagggagcag agggatacga 5640gcggagtccc
ctgcgcggga ccatctggaa ttggtttagc ccaagtggag cctgacagcc 5700agaactctgt
gtcccccgtc taaccacagc tccttttcca gagcattcca gtcaggctct 5760ctgggctgac
tgggccaggg gaggttacag gtaccagttc tttaagaaga tctttgggca 5820tatacatttt
tagcctgtgt cattgcccca aatggattcc tgtttcaagt tcacacctgc 5880agattctagg
acctgtgtcc tagacttcag ggagtcagct gtttctagag ttcctaccat 5940ggagtgggtc
tggaggacct gcccggtggg ggggcagagc cctgctccct ccgggtcttc 6000ctactcttct
ctctgctctg acgggatttg ttgattctct ccattttggt gtctttctct 6060tttagatatt
gtatcaatct ttagaaaagg catagtctac ttgttataaa tcgttaggat 6120actgcctccc
ccagggtcta aaattacata ttagagggga aaagctgaac actgaagtca 6180gttctcaaca
atttagaagg aaaacctaga aaacatttgg cagaaaatta catttcgatg 6240tttttgaatg
aatacgagca agcttttaca acagtgctga tctaaaaata cttagcactt 6300ggcctgagat
gcctggtgag cattacaggc aaggggaatc tggaggtagc cgacctgagg 6360acatggcttc
tgaacctgtc ttttgggagt ggtatggaag gtggagcgtt caccagtgac 6420ctggaaggcc
cagcaccacc ctccttccca ctcttctcat cttgacagag cctgccccag 6480cgctgacgtg
tcaggaaaac acccagggaa ctaggaaggc acttctgcct gaggggcagc 6540ctgccttgcc
cactcctgct ctgctcgcct cggatcagct gagccttctg agctggcctc 6600tcactgcctc
cccaaggccc cctgcctgcc ctgtcaggag gcagaaggaa gcaggtgtga 6660gggcagtgca
aggagggagc acaaccccca gctcccgctc cgggctccga cttgtgcaca 6720ggcagagccc
agaccctgga ggaaatccta cctttgaatt caagaacatt tggggaattt 6780ggaaatctct
ttgcccccaa acccccattc tgtcctacct ttaatcaggt cctgctcagc 6840agtgagagca
gatgaggtga aaaggccaag aggtttggct cctgcccact gatagcccct 6900ctccccgcag
tgtttgtgtg tcaagtggca aagctgttct tcctggtgac cctgattata 6960tccagtaaca
catagactgt gcgcataggc ctgctttgtc tcctctatcc tgggcttttg 7020ttttgctttt
tagttttgct tttagttttt ctgtcccttt tatttaacgc accgactaga 7080cacacaaagc
agttgaattt ttatatatat atctgtatat tgcacaatta taaactcatt 7140ttgcttgtgg
ctccacacac acaaaaaaag acctgttaaa attatacctg ttgcttaatt 7200acaatatttc
tgataaccat agcataggac aagggaaaat aaaaaaagaa aaaaaagaaa 7260aaaaaacgac
aaatctgtct gctggtcact tcttctgtcc aagcagattc gtggtctttt 7320cctcgcttct
ttcaagggct ttcctgtgcc aggtgaagga ggctccaggc agcacccagg 7380ttttgcactc
ttgtttctcc cgtgcttgtg aaagaggtcc caaggttctg ggtgcaggag 7440cgctcccttg
acctgctgaa gtccggaacg tagtcggcac agcctggtcg ccttccacct 7500ctgggagctg
gagtccactg gggtggcctg actcccccag tccccttccc gtgacctggt 7560cagggtgagc
ccatgtggag tcagcctcgc aggcctccct gccagtaggg tccgagtgtg 7620tttcatcctt
cccactctgt cgagcctggg ggctggagcg gagacgggag gcctggcctg 7680tctcggaacc
tgtgagctgc accaggtaga acgccaggga ccccagaatc atgtgcgtca 7740gtccaagggg
tcccctccag gagtagtgaa gactccagaa atgtcccttt cttctccccc 7800atcctacgag
taattgcatt tgcttttgta attcttaatg agcaatatct gctagagagt 7860ttagctgtaa
cagttctttt tgatcatctt tttttaataa ttagaaacac caaaaaaatc 7920cagaaacttg
ttcttccaaa gcagagagca ttataatcac cagggccaaa agcttccctc 7980cctgctgtca
ttgcttcttc tgaggcctga atccaaaaga aaaacagcca taggcccttt 8040cagtggccgg
gctacccgtg agcccttcgg aggaccaggg ctggggcagc ctctgggccc 8100acatccgggg
ccagctccgg cgtgtgttca gtgttagcag tgggtcatga tgctctttcc 8160cacccagcct
gggatagggg cagaggaggc gaggaggccg ttgccgctga tgtttggccg 8220tgaacaggtg
ggtgtctgcg tgcgtccacg tgcgtgtttt ctgactgaca tgaaatcgac 8280gcccgagtta
gcctcacccg gtgacctcta gccctgcccg gatggagcgg ggcccacccg 8340gttcagtgtt
tctggggagc tggacagtgg agtgcaaaag gcttgcagaa cttgaagcct 8400gctccttccc
ttgctaccac ggcctccttt ccgtttgatt tgtcactgct tcaatcaata 8460acagccgctc
cagagtcagt agtcaatgaa tatatgacca aatatcacca ggactgttac 8520tcaatgtgtg
ccgagccctt gcccatgctg ggctcccgtg tatctggaca ctgtaacgtg 8580tgctgtgttt
gctccccttc cccttccttc tttgcccttt acttgtcttt ctggggtttt 8640tctgtttggg
tttggtttgg tttttatttc tccttttgtg ttccaaacat gaggttctct 8700ctactggtcc
tcttaactgt ggtgttgagg cttatatttg tgtaattttt ggtgggtgaa 8760aggaattttg
ctaagtaaat ctcttctgtg tttgaactga agtctgtatt gtaactatgt 8820ttaaagtaat
tgttccagag acaaatattt ctagacactt tttctttaca aacaaaagca 8880ttcggaggga
gggggatggt gactgagatg agaggggaga gctgaacaga tgacccctgc 8940ccagatcagc
cagaagccac ccaaagcagt ggagcccagg agtcccactc caagccagca 9000agccgaatag
ctgatgtgtt gccactttcc aagtcactgc aaaaccaggt tttgttccgc 9060ccagtggatt
cttgttttgc ttcccctccc cccgagatta ttaccaccat cccgtgcttt 9120taaggaaagg
caagattgat gtttccttga ggggagccag gaggggatgt gtgtgtgcag 9180agctgaagag
ctggggagaa tggggctggg cccacccaag caggaggctg ggacgctctg 9240ctgtgggcac
aggtcaggct aatgttggca gatgcagctc ttcctggaca ggccaggtgg 9300tgggcattct
ctctccaagg tgtgccccgt gggcattact gtttaagaca cttccgtcac 9360atcccacccc
atcctccagg gctcaacact gtgacatctc tattccccac cctccccttc 9420ccagggcaat
aaaatgacca tggagggggc ttgcactctc ttggctgtca cccgatcgcc 9480agcaaaactt
agatgtgaga aaaccccttc ccattccatg gcgaaaacat ctccttagaa 9540aagccattac
cctcattagg catggttttg ggctcccaaa acacctgaca gcccctccct 9600cctctgagag
gcggagagtg ctgactgtag tgaccattgc atgccgggtg cagcatctgg 9660aagagctagg
cagggtgtct gccccctcct gagttgaagt catgctcccc tgtgccagcc 9720cagaggccga
gagctatgga cagcattgcc agtaacacag gccaccctgt gcagaaggga 9780gctggctcca
gcctggaaac ctgtctgagg ttgggagagg tgcacttggg gcacagggag 9840aggccgggac
acacttagct ggagatgtct ctaaaagccc tgtatcgtat tcaccttcag 9900tttttgtgtt
ttgggacaat tactttagaa aataagtagg tcgttttaaa aacaaaaatt 9960attgattgct
tttttgtagt gttcagaaaa aaggttcttt gtgtatagcc aaatgactga 10020aagcactgat
atatttaaaa acaaaaggca atttattaag gaaatttgta ccatttcagt 10080aaacctgtct
gaatgtacct gtatacgttt caaaaacacc ccccccccac tgaatccctg 10140taacctattt
attatataaa gagtttgcct tataaattt
10179

<210> 27
<211> 10185
<212> DNA
<213> Homo sapiens

<400> 27
ccggaaaatg gccgccgccg ccgccgccgc gccgagcgga
ggaggaggag gaggaggcga 60ggaggagaga ctgctccata aaaatacaga ctcaccagtt
cctgctttga tgtgacatgt 120gactccccag aatacacctt gcttctgtag accagctcca
acaggattcc atggtagctg 180ggatgttagg gctcagggaa gaaaagtcag aagaccagga
cctccagggc ctcaaggaca 240aacccctcaa gtttaaaaag gtgaagaaag ataagaaaga
agagaaagag ggcaagcatg 300agcccgtgca gccatcagcc caccactctg ctgagcccgc
agaggcaggc aaagcagaga 360catcagaagg gtcaggctcc gccccggctg tgccggaagc
ttctgcctcc cccaaacagc 420ggcgctccat catccgtgac cggggaccca tgtatgatga
ccccaccctg cctgaaggct 480ggacacggaa gcttaagcaa aggaaatctg gccgctctgc
tgggaagtat gatgtgtatt 540tgatcaatcc ccagggaaaa gcctttcgct ctaaagtgga
gttgattgcg tacttcgaaa 600aggtaggcga cacatccctg gaccctaatg attttgactt
cacggtaact gggagaggga 660gcccctcccg gcgagagcag aaaccaccta agaagcccaa
atctcccaaa gctccaggaa 720ctggcagagg ccggggacgc cccaaaggga gcggcaccac
gagacccaag gcggccacgt 780cagagggtgt gcaggtgaaa agggtcctgg agaaaagtcc
tgggaagctc cttgtcaaga 840tgccttttca aacttcgcca gggggcaagg ctgagggggg
tggggccacc acatccaccc 900aggtcatggt gatcaaacgc cccggcagga agcgaaaagc
tgaggccgac cctcaggcca 960ttcccaagaa acggggccga aagccgggga gtgtggtggc
agccgctgcc gccgaggcca 1020aaaagaaagc cgtgaaggag tcttctatcc gatctgtgca
ggagaccgta ctccccatca 1080agaagcgcaa gacccgggag acggtcagca tcgaggtcaa
ggaagtggtg aagcccctgc 1140tggtgtccac cctcggtgag aagagcggga aaggactgaa
gacctgtaag agccctgggc 1200ggaaaagcaa ggagagcagc cccaaggggc gcagcagcag
cgcctcctca ccccccaaga 1260aggagcacca ccaccatcac caccactcag agtccccaaa
ggcccccgtg ccactgctcc 1320cacccctgcc cccacctcca cctgagcccg agagctccga
ggaccccacc agcccccctg 1380agccccagga cttgagcagc agcgtctgca aagaggagaa
gatgcccaga ggaggctcac 1440tggagagcga cggctgcccc aaggagccag ctaagactca
gcccgcggtt gccaccgccg 1500ccacggccgc agaaaagtac aaacaccgag gggagggaga
gcgcaaagac attgtttcat 1560cctccatgcc aaggccaaac agagaggagc ctgtggacag
ccggacgccc gtgaccgaga 1620gagttagctg actttacacg gagcggattg caaagcaaac
caacaagaat aaaggcagct 1680gttgtctctt ctccttatgg gtagggctct gacaaagctt
cccgattaac tgaaataaaa 1740aatatttttt tttctttcag taaacttaga gtttcgtggc
ttcagggtgg gagtagttgg 1800agcattgggg atgtttttct taccgacaag cacagtcagg
ttgaagacct aaccagggcc 1860agaagtagct ttgcactttt ctaaactagg ctccttcaac
aaggcttgct gcagatacta 1920ctgaccagac aagctgttga ccaggcacct cccctcccgc
ccaaaccttt cccccatgtg 1980gtcgttagag acagagcgac agagcagttg agaggacact
cccgttttcg gtgccatcag 2040tgccccgtct acagctcccc cagctccccc cacctccccc
actcccaacc acgttgggac 2100agggaggtgt gaggcaggag agacagttgg attctttaga
gaagatggat atgaccagtg 2160gctatggcct gtgcgatccc acccgtggtg gctcaagtct
ggccccacac cagccccaat 2220ccaaaactgg caaggacgct tcacaggaca ggaaagtggc
acctgtctgc tccagctctg 2280gcatggctag gaggggggag tcccttgaac tactgggtgt
agactggcct gaaccacagg 2340agaggatggc ccagggtgag gtggcatggt ccattctcaa
gggacgtcct ccaacgggtg 2400gcgctagagg ccatggaggc agtaggacaa ggtgcaggca
ggctggcctg gggtcaggcc 2460gggcagagca cagcggggtg agagggattc ctaatcactc
agagcagtct gtgacttagt 2520ggacagggga gggggcaaag ggggaggaga agaaaatgtt
cttccagtta ctttccaatt 2580ctcctttagg gacagcttag aattatttgc actattgagt
cttcatgttc ccacttcaaa 2640acaaacagat gctctgagag caaactggct tgaattggtg
acatttagtc cctcaagcca 2700ccagatgtga cagtgttgag aactacctgg atttgtatat
atacctgcgc ttgttttaaa 2760gtgggctcag cacatagggt tcccacgaag ctccgaaact
ctaagtgttt gctgcaattt 2820tataaggact tcctgattgg tttctcttct ccccttccat
ttctgccttt tgttcatttc 2880atcctttcac ttctttccct tcctccgtcc tcctccttcc
tagttcatcc cttctcttcc 2940aggcagccgc ggtgcccaac cacacttgtc ggctccagtc
cccagaactc tgcctgccct 3000ttgtcctcct gctgccagta ccagccccac cctgttttga
gccctgagga ggccttgggc 3060tctgctgagt ccgacctggc ctgtctgtga agagcaagag
agcagcaagg tcttgctctc 3120ctaggtagcc ccctcttccc tggtaagaaa aagcaaaagg
catttcccac cctgaacaac 3180gagccttttc acccttctac tctagagaag tggactggag
gagctgggcc cgatttggta 3240gttgaggaaa gcacagaggc ctcctgtggc ctgccagtca
tcgagtggcc caacaggggc 3300tccatgccag ccgaccttga cctcactcag aagtccagag
tctagcgtag tgcagcaggg 3360cagtagcggt accaatgcag aactcccaag acccgagctg
ggaccagtac ctgggtcccc 3420agcccttcct ctgctccccc ttttccctcg gagttcttct
tgaatggcaa tgttttgctt 3480ttgctcgatg cagacagggg gccagaacac cacacatttc
actgtctgtc tggtccatag 3540ctgtggtgta ggggcttaga ggcatgggct tgctgtgggt
ttttaattga tcagttttca 3600tgtgggatcc catcttttta acctctgttc aggaagtcct
tatctagctg catatcttca 3660tcatattggt atatcctttt ctgtgtttac agagatgtct
cttatatcta aatctgtcca 3720actgagaagt accttatcaa agtagcaaat gagacagcag
tcttatgctt ccagaaacac 3780ccacaggcat gtcccatgtg agctgctgcc atgaactgtc
aagtgtgtgt tgtcttgtgt 3840atttcagtta ttgtccctgg cttccttact atggtgtaat
catgaaggag tgaaacatca 3900tagaaactgt ctagcacttc cttgccagtc tttagtgatc
aggaaccata gttgacagtt 3960ccaatcagta gcttaagaaa aaaccgtgtt tgtctcttct
ggaatggtta gaagtgaggg 4020agtttgcccc gttctgtttg tagagtctca tagttggact
ttctagcata tatgtgtcca 4080tttccttatg ctgtaaaagc aagtcctgca accaaactcc
catcagccca atccctgatc 4140cctgatccct tccacctgct ctgctgatga cccccccagc
ttcacttctg actcttcccc 4200aggaagggaa ggggggtcag aagagagggt gagtcctcca
gaactcttcc tccaaggaca 4260gaaggctcct gcccccatag tggcctcgaa ctcctggcac
taccaaagga cacttatcca 4320cgagagcgca gcatccgacc aggttgtcac tgagaagatg
tttattttgg tcagttgggt 4380ttttatgtat tatacttagt caaatgtaat gtggcttctg
gaatcattgt ccagagctgc 4440ttccccgtca cctgggcgtc atctggtcct ggtaagagga
gtgcgtggcc caccaggccc 4500ccctgtcacc catgacagtt cattcagggc cgatggggca
gtcgtggttg ggaacacagc 4560atttcaagcg tcactttatt tcattcgggc cccacctgca
gctccctcaa agaggcagtt 4620gcccagcctc tttcccttcc agtttattcc agagctgcca
gtggggcctg aggctcctta 4680gggttttctc tctatttccc cctttcttcc tcattccctc
gtctttccca aaggcatcac 4740gagtcagtcg cctttcagca ggcagccttg gcggtttatc
gccctggcag gcaggggccc 4800tgcagctctc atgctgcccc tgccttgggg tcaggttgac
aggaggttgg agggaaagcc 4860ttaagctgca ggattctcac cagctgtgtc cggcccagtt
ttggggtgtg acctcaattt 4920caattttgtc tgtacttgaa cattatgaag atgggggcct
ctttcagtga atttgtgaac 4980agcagaattg accgacagct ttccagtacc catggggcta
ggtcattaag gccacatcca 5040cagtctcccc cacccttgtt ccagttgtta gttactacct
cctctcctga caatactgta 5100tgtcgtcgag ctccccccag gtctacccct cccggccctg
cctgctggtg ggcttgtcat 5160agccagtggg attgccggtc ttgacagctc agtgagctgg
agatacttgg tcacagccag 5220gcgctagcac agctcccttc tgttgatgct gtattcccat
atcaaaagac acaggggaca 5280cccagaaacg ccacatcccc caatccatca gtgccaaact
agccaacggc cccagcttct 5340cagctcgctg gatggcggaa gctgctactc gtgagcgcca
gtgcgggtgc agacaatctt 5400ctgttgggtg gcatcattcc aggcccgaag catgaacagt
gcacctggga cagggagcag 5460ccccaaattg tcacctgctt ctctgcccag cttttcattg
ctgtgacagt gatggcgaaa 5520gagggtaata accagacaca aactgccaag ttgggtggag
aaaggagttt ctttagctga 5580cagaatctct gaattttaaa tcacttagta agcggctcaa
gcccaggagg gagcagaggg 5640atacgagcgg agtcccctgc gcgggaccat ctggaattgg
tttagcccaa gtggagcctg 5700acagccagaa ctctgtgtcc cccgtctaac cacagctcct
tttccagagc attccagtca 5760ggctctctgg gctgactggg ccaggggagg ttacaggtac
cagttcttta agaagatctt 5820tgggcatata catttttagc ctgtgtcatt gccccaaatg
gattcctgtt tcaagttcac 5880acctgcagat tctaggacct gtgtcctaga cttcagggag
tcagctgttt ctagagttcc 5940taccatggag tgggtctgga ggacctgccc ggtggggggg
cagagccctg ctccctccgg 6000gtcttcctac tcttctctct gctctgacgg gatttgttga
ttctctccat tttggtgtct 6060ttctctttta gatattgtat caatctttag aaaaggcata
gtctacttgt tataaatcgt 6120taggatactg cctcccccag ggtctaaaat tacatattag
aggggaaaag ctgaacactg 6180aagtcagttc tcaacaattt agaaggaaaa cctagaaaac
atttggcaga aaattacatt 6240tcgatgtttt tgaatgaata cgagcaagct tttacaacag
tgctgatcta aaaatactta 6300gcacttggcc tgagatgcct ggtgagcatt acaggcaagg
ggaatctgga ggtagccgac 6360ctgaggacat ggcttctgaa cctgtctttt gggagtggta
tggaaggtgg agcgttcacc 6420agtgacctgg aaggcccagc accaccctcc ttcccactct
tctcatcttg acagagcctg 6480ccccagcgct gacgtgtcag gaaaacaccc agggaactag
gaaggcactt ctgcctgagg 6540ggcagcctgc cttgcccact cctgctctgc tcgcctcgga
tcagctgagc cttctgagct 6600ggcctctcac tgcctcccca aggccccctg cctgccctgt
caggaggcag aaggaagcag 6660gtgtgagggc agtgcaagga gggagcacaa cccccagctc
ccgctccggg ctccgacttg 6720tgcacaggca gagcccagac cctggaggaa atcctacctt
tgaattcaag aacatttggg 6780gaatttggaa atctctttgc ccccaaaccc ccattctgtc
ctacctttaa tcaggtcctg 6840ctcagcagtg agagcagatg aggtgaaaag gccaagaggt
ttggctcctg cccactgata 6900gcccctctcc ccgcagtgtt tgtgtgtcaa gtggcaaagc
tgttcttcct ggtgaccctg 6960attatatcca gtaacacata gactgtgcgc ataggcctgc
tttgtctcct ctatcctggg 7020cttttgtttt gctttttagt tttgctttta gtttttctgt
cccttttatt taacgcaccg 7080actagacaca caaagcagtt gaatttttat atatatatct
gtatattgca caattataaa 7140ctcattttgc ttgtggctcc acacacacaa aaaaagacct
gttaaaatta tacctgttgc 7200ttaattacaa tatttctgat aaccatagca taggacaagg
gaaaataaaa aaagaaaaaa 7260aagaaaaaaa aacgacaaat ctgtctgctg gtcacttctt
ctgtccaagc agattcgtgg 7320tcttttcctc gcttctttca agggctttcc tgtgccaggt
gaaggaggct ccaggcagca 7380cccaggtttt gcactcttgt ttctcccgtg cttgtgaaag
aggtcccaag gttctgggtg 7440caggagcgct cccttgacct gctgaagtcc ggaacgtagt
cggcacagcc tggtcgcctt 7500ccacctctgg gagctggagt ccactggggt ggcctgactc
ccccagtccc cttcccgtga 7560cctggtcagg gtgagcccat gtggagtcag cctcgcaggc
ctccctgcca gtagggtccg 7620agtgtgtttc atccttccca ctctgtcgag cctgggggct
ggagcggaga cgggaggcct 7680ggcctgtctc ggaacctgtg agctgcacca ggtagaacgc
cagggacccc agaatcatgt 7740gcgtcagtcc aaggggtccc ctccaggagt agtgaagact
ccagaaatgt ccctttcttc 7800tcccccatcc tacgagtaat tgcatttgct tttgtaattc
ttaatgagca atatctgcta 7860gagagtttag ctgtaacagt tctttttgat catctttttt
taataattag aaacaccaaa 7920aaaatccaga aacttgttct tccaaagcag agagcattat
aatcaccagg gccaaaagct 7980tccctccctg ctgtcattgc ttcttctgag gcctgaatcc
aaaagaaaaa cagccatagg 8040ccctttcagt ggccgggcta cccgtgagcc cttcggagga
ccagggctgg ggcagcctct 8100gggcccacat ccggggccag ctccggcgtg tgttcagtgt
tagcagtggg tcatgatgct 8160ctttcccacc cagcctggga taggggcaga ggaggcgagg
aggccgttgc cgctgatgtt 8220tggccgtgaa caggtgggtg tctgcgtgcg tccacgtgcg
tgttttctga ctgacatgaa 8280atcgacgccc gagttagcct cacccggtga cctctagccc
tgcccggatg gagcggggcc 8340cacccggttc agtgtttctg gggagctgga cagtggagtg
caaaaggctt gcagaacttg 8400aagcctgctc cttcccttgc taccacggcc tcctttccgt
ttgatttgtc actgcttcaa 8460tcaataacag ccgctccaga gtcagtagtc aatgaatata
tgaccaaata tcaccaggac 8520tgttactcaa tgtgtgccga gcccttgccc atgctgggct
cccgtgtatc tggacactgt 8580aacgtgtgct gtgtttgctc cccttcccct tccttctttg
ccctttactt gtctttctgg 8640ggtttttctg tttgggtttg gtttggtttt tatttctcct
tttgtgttcc aaacatgagg 8700ttctctctac tggtcctctt aactgtggtg ttgaggctta
tatttgtgta atttttggtg 8760ggtgaaagga attttgctaa gtaaatctct tctgtgtttg
aactgaagtc tgtattgtaa 8820ctatgtttaa agtaattgtt ccagagacaa atatttctag
acactttttc tttacaaaca 8880aaagcattcg gagggagggg gatggtgact gagatgagag
gggagagctg aacagatgac 8940ccctgcccag atcagccaga agccacccaa agcagtggag
cccaggagtc ccactccaag 9000ccagcaagcc gaatagctga tgtgttgcca ctttccaagt
cactgcaaaa ccaggttttg 9060ttccgcccag tggattcttg ttttgcttcc cctccccccg
agattattac caccatcccg 9120tgcttttaag gaaaggcaag attgatgttt ccttgagggg
agccaggagg ggatgtgtgt 9180gtgcagagct gaagagctgg ggagaatggg gctgggccca
cccaagcagg aggctgggac 9240gctctgctgt gggcacaggt caggctaatg ttggcagatg
cagctcttcc tggacaggcc 9300aggtggtggg cattctctct ccaaggtgtg ccccgtgggc
attactgttt aagacacttc 9360cgtcacatcc caccccatcc tccagggctc aacactgtga
catctctatt ccccaccctc 9420cccttcccag ggcaataaaa tgaccatgga gggggcttgc
actctcttgg ctgtcacccg 9480atcgccagca aaacttagat gtgagaaaac cccttcccat
tccatggcga aaacatctcc 9540ttagaaaagc cattaccctc attaggcatg gttttgggct
cccaaaacac ctgacagccc 9600ctccctcctc tgagaggcgg agagtgctga ctgtagtgac
cattgcatgc cgggtgcagc 9660atctggaaga gctaggcagg gtgtctgccc cctcctgagt
tgaagtcatg ctcccctgtg 9720ccagcccaga ggccgagagc tatggacagc attgccagta
acacaggcca ccctgtgcag 9780aagggagctg gctccagcct ggaaacctgt ctgaggttgg
gagaggtgca cttggggcac 9840agggagaggc cgggacacac ttagctggag atgtctctaa
aagccctgta tcgtattcac 9900cttcagtttt tgtgttttgg gacaattact ttagaaaata
agtaggtcgt tttaaaaaca 9960aaaattattg attgcttttt tgtagtgttc agaaaaaagg
ttctttgtgt atagccaaat 10020gactgaaagc actgatatat ttaaaaacaa aaggcaattt
attaaggaaa tttgtaccat 10080ttcagtaaac ctgtctgaat gtacctgtat acgtttcaaa
aacacccccc ccccactgaa 10140tccctgtaac ctatttatta tataaagagt ttgccttata
aattt 10185
SEQ ID NO: 28 (DNA, Homo sapiens, length 10227)
gggcgcgcgc
tccctcctct cggagagagg gctgtggtaa aagccgtccg gaaaatgcgc 60cgccgccgcc
gccgcgccga gcggaggagg aggaggaggc gaggaggaga gactgctcca 120taaaaataca
gactcaccag ttcctgcttt gatgtgacat gtgactcccc agaatacacc 180ttgcttctgt
agaccagctc caacaggatt ccatggtagc tgggatgtta gggctcaggg 240aagaaaagtc
agaagaccag gacctccagg gcctcaagga caaacccctc aagtttaaaa 300aggtgaagaa
agataagaaa gaagagaaag agggcaagca tgagcccgtg cagccatcag 360cccaccactc
tgctgagccc gcagaggcag gcaaagcaga gacatcagaa gggtcaggct 420ccgccccggc
tgtgccggaa gcttctgcct cccccaaaca gcggcgctcc atcatccgtg 480accggggacc
catgtatgat gaccccaccc tgcctgaagg ctggacacgg aagcttaagc 540aaaggaaatc
tggccgctct gctgggaagt atgatgtgta tttgatcaat ccccagggaa 600aagcctttcg
ctctaaagtg gagttgattg cgtacttcga aaaggtaggc gacacatccc 660tggaccctaa
tgattttgac ttcacggtaa ctgggagagg gagcccctcc cggcgagagc 720agaaaccacc
taagaagccc aaatctccca aagctccagg aactggcaga ggccggggac 780gccccaaagg
gagcggcacc acgagaccca aggcggccac gtcagagggt gtgcaggtga 840aaagggtcct
ggagaaaagt cctgggaagc tccttgtcaa gatgcctttt caaacttcgc 900cagggggcaa
ggctgagggg ggtggggcca ccacatccac ccaggtcatg gtgatcaaac 960gccccggcag
gaagcgaaaa gctgaggccg accctcaggc cattcccaag aaacggggcc 1020gaaagccggg
gagtgtggtg gcagccgctg ccgccgaggc caaaaagaaa gccgtgaagg 1080agtcttctat
ccgatctgtg caggagaccg tactccccat caagaagcgc aagacccggg 1140agacggtcag
catcgaggtc aaggaagtgg tgaagcccct gctggtgtcc accctcggtg 1200agaagagcgg
gaaaggactg aagacctgta agagccctgg gcggaaaagc aaggagagca 1260gccccaaggg
gcgcagcagc agcgcctcct caccccccaa gaaggagcac caccaccatc 1320accaccactc
agagtcccca aaggcccccg tgccactgct cccacccctg cccccacctc 1380cacctgagcc
cgagagctcc gaggacccca ccagcccccc tgagccccag gacttgagca 1440gcagcgtctg
caaagaggag aagatgccca gaggaggctc actggagagc gacggctgcc 1500ccaaggagcc
agctaagact cagcccgcgg ttgccaccgc cgccacggcc gcagaaaagt 1560acaaacaccg
aggggaggga gagcgcaaag acattgtttc atcctccatg ccaaggccaa 1620acagagagga
gcctgtggac agccggacgc ccgtgaccga gagagttagc tgactttaca 1680cggagcggat
tgcaaagcaa accaacaaga ataaaggcag ctgttgtctc ttctccttat 1740gggtagggct
ctgacaaagc ttcccgatta actgaaataa aaaatatttt tttttctttc 1800agtaaactta
gagtttcgtg gcttcagggt gggagtagtt ggagcattgg ggatgttttt 1860cttaccgaca
agcacagtca ggttgaagac ctaaccaggg ccagaagtag ctttgcactt 1920ttctaaacta
ggctccttca acaaggcttg ctgcagatac tactgaccag acaagctgtt 1980gaccaggcac
ctcccctccc gcccaaacct ttcccccatg tggtcgttag agacagagcg 2040acagagcagt
tgagaggaca ctcccgtttt cggtgccatc agtgccccgt ctacagctcc 2100cccagctccc
cccacctccc ccactcccaa ccacgttggg acagggaggt gtgaggcagg 2160agagacagtt
ggattcttta gagaagatgg atatgaccag tggctatggc ctgtgcgatc 2220ccacccgtgg
tggctcaagt ctggccccac accagcccca atccaaaact ggcaaggacg 2280cttcacagga
caggaaagtg gcacctgtct gctccagctc tggcatggct aggagggggg 2340agtcccttga
actactgggt gtagactggc ctgaaccaca ggagaggatg gcccagggtg 2400aggtggcatg
gtccattctc aagggacgtc ctccaacggg tggcgctaga ggccatggag 2460gcagtaggac
aaggtgcagg caggctggcc tggggtcagg ccgggcagag cacagcgggg 2520tgagagggat
tcctaatcac tcagagcagt ctgtgactta gtggacaggg gagggggcaa 2580agggggagga
gaagaaaatg ttcttccagt tactttccaa ttctccttta gggacagctt 2640agaattattt
gcactattga gtcttcatgt tcccacttca aaacaaacag atgctctgag 2700agcaaactgg
cttgaattgg tgacatttag tccctcaagc caccagatgt gacagtgttg 2760agaactacct
ggatttgtat atatacctgc gcttgtttta aagtgggctc agcacatagg 2820gttcccacga
agctccgaaa ctctaagtgt ttgctgcaat tttataagga cttcctgatt 2880ggtttctctt
ctccccttcc atttctgcct tttgttcatt tcatcctttc acttctttcc 2940cttcctccgt
cctcctcctt cctagttcat cccttctctt ccaggcagcc gcggtgccca 3000accacacttg
tcggctccag tccccagaac tctgcctgcc ctttgtcctc ctgctgccag 3060taccagcccc
accctgtttt gagccctgag gaggccttgg gctctgctga gtccgacctg 3120gcctgtctgt
gaagagcaag agagcagcaa ggtcttgctc tcctaggtag ccccctcttc 3180cctggtaaga
aaaagcaaaa ggcatttccc accctgaaca acgagccttt tcacccttct 3240actctagaga
agtggactgg aggagctggg cccgatttgg tagttgagga aagcacagag 3300gcctcctgtg
gcctgccagt catcgagtgg cccaacaggg gctccatgcc agccgacctt 3360gacctcactc
agaagtccag agtctagcgt agtgcagcag ggcagtagcg gtaccaatgc 3420agaactccca
agacccgagc tgggaccagt acctgggtcc ccagcccttc ctctgctccc 3480ccttttccct
cggagttctt cttgaatggc aatgttttgc ttttgctcga tgcagacagg 3540gggccagaac
accacacatt tcactgtctg tctggtccat agctgtggtg taggggctta 3600gaggcatggg
cttgctgtgg gtttttaatt gatcagtttt catgtgggat cccatctttt 3660taacctctgt
tcaggaagtc cttatctagc tgcatatctt catcatattg gtatatcctt 3720ttctgtgttt
acagagatgt ctcttatatc taaatctgtc caactgagaa gtaccttatc 3780aaagtagcaa
atgagacagc agtcttatgc ttccagaaac acccacaggc atgtcccatg 3840tgagctgctg
ccatgaactg tcaagtgtgt gttgtcttgt gtatttcagt tattgtccct 3900ggcttcctta
ctatggtgta atcatgaagg agtgaaacat catagaaact gtctagcact 3960tccttgccag
tctttagtga tcaggaacca tagttgacag ttccaatcag tagcttaaga 4020aaaaaccgtg
tttgtctctt ctggaatggt tagaagtgag ggagtttgcc ccgttctgtt 4080tgtagagtct
catagttgga ctttctagca tatatgtgtc catttcctta tgctgtaaaa 4140gcaagtcctg
caaccaaact cccatcagcc caatccctga tccctgatcc cttccacctg 4200ctctgctgat
gaccccccca gcttcacttc tgactcttcc ccaggaaggg aaggggggtc 4260agaagagagg
gtgagtcctc cagaactctt cctccaagga cagaaggctc ctgcccccat 4320agtggcctcg
aactcctggc actaccaaag gacacttatc cacgagagcg cagcatccga 4380ccaggttgtc
actgagaaga tgtttatttt ggtcagttgg gtttttatgt attatactta 4440gtcaaatgta
atgtggcttc tggaatcatt gtccagagct gcttccccgt cacctgggcg 4500tcatctggtc
ctggtaagag gagtgcgtgg cccaccaggc ccccctgtca cccatgacag 4560ttcattcagg
gccgatgggg cagtcgtggt tgggaacaca gcatttcaag cgtcacttta 4620tttcattcgg
gccccacctg cagctccctc aaagaggcag ttgcccagcc tctttccctt 4680ccagtttatt
ccagagctgc cagtggggcc tgaggctcct tagggttttc tctctatttc 4740cccctttctt
cctcattccc tcgtctttcc caaaggcatc acgagtcagt cgcctttcag 4800caggcagcct
tggcggttta tcgccctggc aggcaggggc cctgcagctc tcatgctgcc 4860cctgccttgg
ggtcaggttg acaggaggtt ggagggaaag ccttaagctg caggattctc 4920accagctgtg
tccggcccag ttttggggtg tgacctcaat ttcaattttg tctgtacttg 4980aacattatga
agatgggggc ctctttcagt gaatttgtga acagcagaat tgaccgacag 5040ctttccagta
cccatggggc taggtcatta aggccacatc cacagtctcc cccacccttg 5100ttccagttgt
tagttactac ctcctctcct gacaatactg tatgtcgtcg agctcccccc 5160aggtctaccc
ctcccggccc tgcctgctgg tgggcttgtc atagccagtg ggattgccgg 5220tcttgacagc
tcagtgagct ggagatactt ggtcacagcc aggcgctagc acagctccct 5280tctgttgatg
ctgtattccc atatcaaaag acacagggga cacccagaaa cgccacatcc 5340cccaatccat
cagtgccaaa ctagccaacg gccccagctt ctcagctcgc tggatggcgg 5400aagctgctac
tcgtgagcgc cagtgcgggt gcagacaatc ttctgttggg tggcatcatt 5460ccaggcccga
agcatgaaca gtgcacctgg gacagggagc agccccaaat tgtcacctgc 5520ttctctgccc
agcttttcat tgctgtgaca gtgatggcga aagagggtaa taaccagaca 5580caaactgcca
agttgggtgg agaaaggagt ttctttagct gacagaatct ctgaatttta 5640aatcacttag
taagcggctc aagcccagga gggagcagag ggatacgagc ggagtcccct 5700gcgcgggacc
atctggaatt ggtttagccc aagtggagcc tgacagccag aactctgtgt 5760cccccgtcta
accacagctc cttttccaga gcattccagt caggctctct gggctgactg 5820ggccagggga
ggttacaggt accagttctt taagaagatc tttgggcata tacattttta 5880gcctgtgtca
ttgccccaaa tggattcctg tttcaagttc acacctgcag attctaggac 5940ctgtgtccta
gacttcaggg agtcagctgt ttctagagtt cctaccatgg agtgggtctg 6000gaggacctgc
ccggtggggg ggcagagccc tgctccctcc gggtcttcct actcttctct 6060ctgctctgac
gggatttgtt gattctctcc attttggtgt ctttctcttt tagatattgt 6120atcaatcttt
agaaaaggca tagtctactt gttataaatc gttaggatac tgcctccccc 6180agggtctaaa
attacatatt agaggggaaa agctgaacac tgaagtcagt tctcaacaat 6240ttagaaggaa
aacctagaaa acatttggca gaaaattaca tttcgatgtt tttgaatgaa 6300tacgagcaag
cttttacaac agtgctgatc taaaaatact tagcacttgg cctgagatgc 6360ctggtgagca
ttacaggcaa ggggaatctg gaggtagccg acctgaggac atggcttctg 6420aacctgtctt
ttgggagtgg tatggaaggt ggagcgttca ccagtgacct ggaaggccca 6480gcaccaccct
ccttcccact cttctcatct tgacagagcc tgccccagcg ctgacgtgtc 6540aggaaaacac
ccagggaact aggaaggcac ttctgcctga ggggcagcct gccttgccca 6600ctcctgctct
gctcgcctcg gatcagctga gccttctgag ctggcctctc actgcctccc 6660caaggccccc
tgcctgccct gtcaggaggc agaaggaagc aggtgtgagg gcagtgcaag 6720gagggagcac
aacccccagc tcccgctccg ggctccgact tgtgcacagg cagagcccag 6780accctggagg
aaatcctacc tttgaattca agaacatttg gggaatttgg aaatctcttt 6840gcccccaaac
ccccattctg tcctaccttt aatcaggtcc tgctcagcag tgagagcaga 6900tgaggtgaaa
aggccaagag gtttggctcc tgcccactga tagcccctct ccccgcagtg 6960tttgtgtgtc
aagtggcaaa gctgttcttc ctggtgaccc tgattatatc cagtaacaca 7020tagactgtgc
gcataggcct gctttgtctc ctctatcctg ggcttttgtt ttgcttttta 7080gttttgcttt
tagtttttct gtccctttta tttaacgcac cgactagaca cacaaagcag 7140ttgaattttt
atatatatat ctgtatattg cacaattata aactcatttt gcttgtggct 7200ccacacacac
aaaaaaagac ctgttaaaat tatacctgtt gcttaattac aatatttctg 7260ataaccatag
cataggacaa gggaaaataa aaaaagaaaa aaaagaaaaa aaaacgacaa 7320atctgtctgc
tggtcacttc ttctgtccaa gcagattcgt ggtcttttcc tcgcttcttt 7380caagggcttt
cctgtgccag gtgaaggagg ctccaggcag cacccaggtt ttgcactctt 7440gtttctcccg
tgcttgtgaa agaggtccca aggttctggg tgcaggagcg ctcccttgac 7500ctgctgaagt
ccggaacgta gtcggcacag cctggtcgcc ttccacctct gggagctgga 7560gtccactggg
gtggcctgac tcccccagtc cccttcccgt gacctggtca gggtgagccc 7620atgtggagtc
agcctcgcag gcctccctgc cagtagggtc cgagtgtgtt tcatccttcc 7680cactctgtcg
agcctggggg ctggagcgga gacgggaggc ctggcctgtc tcggaacctg 7740tgagctgcac
caggtagaac gccagggacc ccagaatcat gtgcgtcagt ccaaggggtc 7800ccctccagga
gtagtgaaga ctccagaaat gtccctttct tctcccccat cctacgagta 7860attgcatttg
cttttgtaat tcttaatgag caatatctgc tagagagttt agctgtaaca 7920gttctttttg
atcatctttt tttaataatt agaaacacca aaaaaatcca gaaacttgtt 7980cttccaaagc
agagagcatt ataatcacca gggccaaaag cttccctccc tgctgtcatt 8040gcttcttctg
aggcctgaat ccaaaagaaa aacagccata ggccctttca gtggccgggc 8100tacccgtgag
cccttcggag gaccagggct ggggcagcct ctgggcccac atccggggcc 8160agctccggcg
tgtgttcagt gttagcagtg ggtcatgatg ctctttccca cccagcctgg 8220gataggggca
gaggaggcga ggaggccgtt gccgctgatg tttggccgtg aacaggtggg 8280tgtctgcgtg
cgtccacgtg cgtgttttct gactgacatg aaatcgacgc ccgagttagc 8340ctcacccggt
gacctctagc cctgcccgga tggagcgggg cccacccggt tcagtgtttc 8400tggggagctg
gacagtggag tgcaaaaggc ttgcagaact tgaagcctgc tccttccctt 8460gctaccacgg
cctcctttcc gtttgatttg tcactgcttc aatcaataac agccgctcca 8520gagtcagtag
tcaatgaata tatgaccaaa tatcaccagg actgttactc aatgtgtgcc 8580gagcccttgc
ccatgctggg ctcccgtgta tctggacact gtaacgtgtg ctgtgtttgc 8640tccccttccc
cttccttctt tgccctttac ttgtctttct ggggtttttc tgtttgggtt 8700tggtttggtt
tttatttctc cttttgtgtt ccaaacatga ggttctctct actggtcctc 8760ttaactgtgg
tgttgaggct tatatttgtg taatttttgg tgggtgaaag gaattttgct 8820aagtaaatct
cttctgtgtt tgaactgaag tctgtattgt aactatgttt aaagtaattg 8880ttccagagac
aaatatttct agacactttt tctttacaaa caaaagcatt cggagggagg 8940gggatggtga
ctgagatgag aggggagagc tgaacagatg acccctgccc agatcagcca 9000gaagccaccc
aaagcagtgg agcccaggag tcccactcca agccagcaag ccgaatagct 9060gatgtgttgc
cactttccaa gtcactgcaa aaccaggttt tgttccgccc agtggattct 9120tgttttgctt
cccctccccc cgagattatt accaccatcc cgtgctttta aggaaaggca 9180agattgatgt
ttccttgagg ggagccagga ggggatgtgt gtgtgcagag ctgaagagct 9240ggggagaatg
gggctgggcc cacccaagca ggaggctggg acgctctgct gtgggcacag 9300gtcaggctaa
tgttggcaga tgcagctctt cctggacagg ccaggtggtg ggcattctct 9360ctccaaggtg
tgccccgtgg gcattactgt ttaagacact tccgtcacat cccaccccat 9420cctccagggc
tcaacactgt gacatctcta ttccccaccc tccccttccc agggcaataa 9480aatgaccatg
gagggggctt gcactctctt ggctgtcacc cgatcgccag caaaacttag 9540atgtgagaaa
accccttccc attccatggc gaaaacatct ccttagaaaa gccattaccc 9600tcattaggca
tggttttggg ctcccaaaac acctgacagc ccctccctcc tctgagaggc 9660ggagagtgct
gactgtagtg accattgcat gccgggtgca gcatctggaa gagctaggca 9720gggtgtctgc
cccctcctga gttgaagtca tgctcccctg tgccagccca gaggccgaga 9780gctatggaca
gcattgccag taacacaggc caccctgtgc agaagggagc tggctccagc 9840ctggaaacct
gtctgaggtt gggagaggtg cacttggggc acagggagag gccgggacac 9900acttagctgg
agatgtctct aaaagccctg tatcgtattc accttcagtt tttgtgtttt 9960gggacaatta
ctttagaaaa taagtaggtc gttttaaaaa caaaaattat tgattgcttt 10020tttgtagtgt
tcagaaaaaa ggttctttgt gtatagccaa atgactgaaa gcactgatat 10080atttaaaaac
aaaaggcaat ttattaagga aatttgtacc atttcagtaa acctgtctga 10140atgtacctgt
atacgtttca aaaacacccc ccccccactg aatccctgta acctatttat 10200tatataaaga
gtttgcctta taaattt
10227
SEQ ID NO: 29 (DNA, Homo sapiens, length 10227)
gggcgcgcgc gctccctcct ctcggagagg gctgtggtaa
aagccgtccg gaaaatggcc 60gccgccgccg ccgccgccga gcggaggagg aggaggaggc
gaggaggaga gactgctcca 120taaaaataca gactcaccag ttcctgcttt gatgtgacat
gtgactcccc agaatacacc 180ttgcttctgt agaccagctc caacaggatt ccatggtagc
tgggatgtta gggctcaggg 240aagaaaagtc agaagaccag gacctccagg gcctcaagga
caaacccctc aagtttaaaa 300aggtgaagaa agataagaaa gaagagaaag agggcaagca
tgagcccgtg cagccatcag 360cccaccactc tgctgagccc gcagaggcag gcaaagcaga
gacatcagaa gggtcaggct 420ccgccccggc tgtgccggaa gcttctgcct cccccaaaca
gcggcgctcc atcatccgtg 480accggggacc catgtatgat gaccccaccc tgcctgaagg
ctggacacgg aagcttaagc 540aaaggaaatc tggccgctct gctgggaagt atgatgtgta
tttgatcaat ccccagggaa 600aagcctttcg ctctaaagtg gagttgattg cgtacttcga
aaaggtaggc gacacatccc 660tggaccctaa tgattttgac ttcacggtaa ctgggagagg
gagcccctcc cggcgagagc 720agaaaccacc taagaagccc aaatctccca aagctccagg
aactggcaga ggccggggac 780gccccaaagg gagcggcacc acgagaccca aggcggccac
gtcagagggt gtgcaggtga 840aaagggtcct ggagaaaagt cctgggaagc tccttgtcaa
gatgcctttt caaacttcgc 900cagggggcaa ggctgagggg ggtggggcca ccacatccac
ccaggtcatg gtgatcaaac 960gccccggcag gaagcgaaaa gctgaggccg accctcaggc
cattcccaag aaacggggcc 1020gaaagccggg gagtgtggtg gcagccgctg ccgccgaggc
caaaaagaaa gccgtgaagg 1080agtcttctat ccgatctgtg caggagaccg tactccccat
caagaagcgc aagacccggg 1140agacggtcag catcgaggtc aaggaagtgg tgaagcccct
gctggtgtcc accctcggtg 1200agaagagcgg gaaaggactg aagacctgta agagccctgg
gcggaaaagc aaggagagca 1260gccccaaggg gcgcagcagc agcgcctcct caccccccaa
gaaggagcac caccaccatc 1320accaccactc agagtcccca aaggcccccg tgccactgct
cccacccctg cccccacctc 1380cacctgagcc cgagagctcc gaggacccca ccagcccccc
tgagccccag gacttgagca 1440gcagcgtctg caaagaggag aagatgccca gaggaggctc
actggagagc gacggctgcc 1500ccaaggagcc agctaagact cagcccgcgg ttgccaccgc
cgccacggcc gcagaaaagt 1560acaaacaccg aggggaggga gagcgcaaag acattgtttc
atcctccatg ccaaggccaa 1620acagagagga gcctgtggac agccggacgc ccgtgaccga
gagagttagc tgactttaca 1680cggagcggat tgcaaagcaa accaacaaga ataaaggcag
ctgttgtctc ttctccttat 1740gggtagggct ctgacaaagc ttcccgatta actgaaataa
aaaatatttt tttttctttc 1800agtaaactta gagtttcgtg gcttcagggt gggagtagtt
ggagcattgg ggatgttttt 1860cttaccgaca agcacagtca ggttgaagac ctaaccaggg
ccagaagtag ctttgcactt 1920ttctaaacta ggctccttca acaaggcttg ctgcagatac
tactgaccag acaagctgtt 1980gaccaggcac ctcccctccc gcccaaacct ttcccccatg
tggtcgttag agacagagcg 2040acagagcagt tgagaggaca ctcccgtttt cggtgccatc
agtgccccgt ctacagctcc 2100cccagctccc cccacctccc ccactcccaa ccacgttggg
acagggaggt gtgaggcagg 2160agagacagtt ggattcttta gagaagatgg atatgaccag
tggctatggc ctgtgcgatc 2220ccacccgtgg tggctcaagt ctggccccac accagcccca
atccaaaact ggcaaggacg 2280cttcacagga caggaaagtg gcacctgtct gctccagctc
tggcatggct aggagggggg 2340agtcccttga actactgggt gtagactggc ctgaaccaca
ggagaggatg gcccagggtg 2400aggtggcatg gtccattctc aagggacgtc ctccaacggg
tggcgctaga ggccatggag 2460gcagtaggac aaggtgcagg caggctggcc tggggtcagg
ccgggcagag cacagcgggg 2520tgagagggat tcctaatcac tcagagcagt ctgtgactta
gtggacaggg gagggggcaa 2580agggggagga gaagaaaatg ttcttccagt tactttccaa
ttctccttta gggacagctt 2640agaattattt gcactattga gtcttcatgt tcccacttca
aaacaaacag atgctctgag 2700agcaaactgg cttgaattgg tgacatttag tccctcaagc
caccagatgt gacagtgttg 2760agaactacct ggatttgtat atatacctgc gcttgtttta
aagtgggctc agcacatagg 2820gttcccacga agctccgaaa ctctaagtgt ttgctgcaat
tttataagga cttcctgatt 2880ggtttctctt ctccccttcc atttctgcct tttgttcatt
tcatcctttc acttctttcc 2940cttcctccgt cctcctcctt cctagttcat cccttctctt
ccaggcagcc gcggtgccca 3000accacacttg tcggctccag tccccagaac tctgcctgcc
ctttgtcctc ctgctgccag 3060taccagcccc accctgtttt gagccctgag gaggccttgg
gctctgctga gtccgacctg 3120gcctgtctgt gaagagcaag agagcagcaa ggtcttgctc
tcctaggtag ccccctcttc 3180cctggtaaga aaaagcaaaa ggcatttccc accctgaaca
acgagccttt tcacccttct 3240actctagaga agtggactgg aggagctggg cccgatttgg
tagttgagga aagcacagag 3300gcctcctgtg gcctgccagt catcgagtgg cccaacaggg
gctccatgcc agccgacctt 3360gacctcactc agaagtccag agtctagcgt agtgcagcag
ggcagtagcg gtaccaatgc 3420agaactccca agacccgagc tgggaccagt acctgggtcc
ccagcccttc ctctgctccc 3480ccttttccct cggagttctt cttgaatggc aatgttttgc
ttttgctcga tgcagacagg 3540gggccagaac accacacatt tcactgtctg tctggtccat
agctgtggtg taggggctta 3600gaggcatggg cttgctgtgg gtttttaatt gatcagtttt
catgtgggat cccatctttt 3660taacctctgt tcaggaagtc cttatctagc tgcatatctt
catcatattg gtatatcctt 3720ttctgtgttt acagagatgt ctcttatatc taaatctgtc
caactgagaa gtaccttatc 3780aaagtagcaa atgagacagc agtcttatgc ttccagaaac
acccacaggc atgtcccatg 3840tgagctgctg ccatgaactg tcaagtgtgt gttgtcttgt
gtatttcagt tattgtccct 3900ggcttcctta ctatggtgta atcatgaagg agtgaaacat
catagaaact gtctagcact 3960tccttgccag tctttagtga tcaggaacca tagttgacag
ttccaatcag tagcttaaga 4020aaaaaccgtg tttgtctctt ctggaatggt tagaagtgag
ggagtttgcc ccgttctgtt 4080tgtagagtct catagttgga ctttctagca tatatgtgtc
catttcctta tgctgtaaaa 4140gcaagtcctg caaccaaact cccatcagcc caatccctga
tccctgatcc cttccacctg 4200ctctgctgat gaccccccca gcttcacttc tgactcttcc
ccaggaaggg aaggggggtc 4260agaagagagg gtgagtcctc cagaactctt cctccaagga
cagaaggctc ctgcccccat 4320agtggcctcg aactcctggc actaccaaag gacacttatc
cacgagagcg cagcatccga 4380ccaggttgtc actgagaaga tgtttatttt ggtcagttgg
gtttttatgt attatactta 4440gtcaaatgta atgtggcttc tggaatcatt gtccagagct
gcttccccgt cacctgggcg 4500tcatctggtc ctggtaagag gagtgcgtgg cccaccaggc
ccccctgtca cccatgacag 4560ttcattcagg gccgatgggg cagtcgtggt tgggaacaca
gcatttcaag cgtcacttta 4620tttcattcgg gccccacctg cagctccctc aaagaggcag
ttgcccagcc tctttccctt 4680ccagtttatt ccagagctgc cagtggggcc tgaggctcct
tagggttttc tctctatttc 4740cccctttctt cctcattccc tcgtctttcc caaaggcatc
acgagtcagt cgcctttcag 4800caggcagcct tggcggttta tcgccctggc aggcaggggc
cctgcagctc tcatgctgcc 4860cctgccttgg ggtcaggttg acaggaggtt ggagggaaag
ccttaagctg caggattctc 4920accagctgtg tccggcccag ttttggggtg tgacctcaat
ttcaattttg tctgtacttg 4980aacattatga agatgggggc ctctttcagt gaatttgtga
acagcagaat tgaccgacag 5040ctttccagta cccatggggc taggtcatta aggccacatc
cacagtctcc cccacccttg 5100ttccagttgt tagttactac ctcctctcct gacaatactg
tatgtcgtcg agctcccccc 5160aggtctaccc ctcccggccc tgcctgctgg tgggcttgtc
atagccagtg ggattgccgg 5220tcttgacagc tcagtgagct ggagatactt ggtcacagcc
aggcgctagc acagctccct 5280tctgttgatg ctgtattccc atatcaaaag acacagggga
cacccagaaa cgccacatcc 5340cccaatccat cagtgccaaa ctagccaacg gccccagctt
ctcagctcgc tggatggcgg 5400aagctgctac tcgtgagcgc cagtgcgggt gcagacaatc
ttctgttggg tggcatcatt 5460ccaggcccga agcatgaaca gtgcacctgg gacagggagc
agccccaaat tgtcacctgc 5520ttctctgccc agcttttcat tgctgtgaca gtgatggcga
aagagggtaa taaccagaca 5580caaactgcca agttgggtgg agaaaggagt ttctttagct
gacagaatct ctgaatttta 5640aatcacttag taagcggctc aagcccagga gggagcagag
ggatacgagc ggagtcccct 5700gcgcgggacc atctggaatt ggtttagccc aagtggagcc
tgacagccag aactctgtgt 5760cccccgtcta accacagctc cttttccaga gcattccagt
caggctctct gggctgactg 5820ggccagggga ggttacaggt accagttctt taagaagatc
tttgggcata tacattttta 5880gcctgtgtca ttgccccaaa tggattcctg tttcaagttc
acacctgcag attctaggac 5940ctgtgtccta gacttcaggg agtcagctgt ttctagagtt
cctaccatgg agtgggtctg 6000gaggacctgc ccggtggggg ggcagagccc tgctccctcc
gggtcttcct actcttctct 6060ctgctctgac gggatttgtt gattctctcc attttggtgt
ctttctcttt tagatattgt 6120atcaatcttt agaaaaggca tagtctactt gttataaatc
gttaggatac tgcctccccc 6180agggtctaaa attacatatt agaggggaaa agctgaacac
tgaagtcagt tctcaacaat 6240ttagaaggaa aacctagaaa acatttggca gaaaattaca
tttcgatgtt tttgaatgaa 6300tacgagcaag cttttacaac agtgctgatc taaaaatact
tagcacttgg cctgagatgc 6360ctggtgagca ttacaggcaa ggggaatctg gaggtagccg
acctgaggac atggcttctg 6420aacctgtctt ttgggagtgg tatggaaggt ggagcgttca
ccagtgacct ggaaggccca 6480gcaccaccct ccttcccact cttctcatct tgacagagcc
tgccccagcg ctgacgtgtc 6540aggaaaacac ccagggaact aggaaggcac ttctgcctga
ggggcagcct gccttgccca 6600ctcctgctct gctcgcctcg gatcagctga gccttctgag
ctggcctctc actgcctccc 6660caaggccccc tgcctgccct gtcaggaggc agaaggaagc
aggtgtgagg gcagtgcaag 6720gagggagcac aacccccagc tcccgctccg ggctccgact
tgtgcacagg cagagcccag 6780accctggagg aaatcctacc tttgaattca agaacatttg
gggaatttgg aaatctcttt 6840gcccccaaac ccccattctg tcctaccttt aatcaggtcc
tgctcagcag tgagagcaga 6900tgaggtgaaa aggccaagag gtttggctcc tgcccactga
tagcccctct ccccgcagtg 6960tttgtgtgtc aagtggcaaa gctgttcttc ctggtgaccc
tgattatatc cagtaacaca 7020tagactgtgc gcataggcct gctttgtctc ctctatcctg
ggcttttgtt ttgcttttta 7080gttttgcttt tagtttttct gtccctttta tttaacgcac
cgactagaca cacaaagcag 7140ttgaattttt atatatatat ctgtatattg cacaattata
aactcatttt gcttgtggct 7200ccacacacac aaaaaaagac ctgttaaaat tatacctgtt
gcttaattac aatatttctg 7260ataaccatag cataggacaa gggaaaataa aaaaagaaaa
aaaagaaaaa aaaacgacaa 7320atctgtctgc tggtcacttc ttctgtccaa gcagattcgt
ggtcttttcc tcgcttcttt 7380caagggcttt cctgtgccag gtgaaggagg ctccaggcag
cacccaggtt ttgcactctt 7440gtttctcccg tgcttgtgaa agaggtccca aggttctggg
tgcaggagcg ctcccttgac 7500ctgctgaagt ccggaacgta gtcggcacag cctggtcgcc
ttccacctct gggagctgga 7560gtccactggg gtggcctgac tcccccagtc cccttcccgt
gacctggtca gggtgagccc 7620atgtggagtc agcctcgcag gcctccctgc cagtagggtc
cgagtgtgtt tcatccttcc 7680cactctgtcg agcctggggg ctggagcgga gacgggaggc
ctggcctgtc tcggaacctg 7740tgagctgcac caggtagaac gccagggacc ccagaatcat
gtgcgtcagt ccaaggggtc 7800ccctccagga gtagtgaaga ctccagaaat gtccctttct
tctcccccat cctacgagta 7860attgcatttg cttttgtaat tcttaatgag caatatctgc
tagagagttt agctgtaaca 7920gttctttttg atcatctttt tttaataatt agaaacacca
aaaaaatcca gaaacttgtt 7980cttccaaagc agagagcatt ataatcacca gggccaaaag
cttccctccc tgctgtcatt 8040gcttcttctg aggcctgaat ccaaaagaaa aacagccata
ggccctttca gtggccgggc 8100tacccgtgag cccttcggag gaccagggct ggggcagcct
ctgggcccac atccggggcc 8160agctccggcg tgtgttcagt gttagcagtg ggtcatgatg
ctctttccca cccagcctgg 8220gataggggca gaggaggcga ggaggccgtt gccgctgatg
tttggccgtg aacaggtggg 8280tgtctgcgtg cgtccacgtg cgtgttttct gactgacatg
aaatcgacgc ccgagttagc 8340ctcacccggt gacctctagc cctgcccgga tggagcgggg
cccacccggt tcagtgtttc 8400tggggagctg gacagtggag tgcaaaaggc ttgcagaact
tgaagcctgc tccttccctt 8460gctaccacgg cctcctttcc gtttgatttg tcactgcttc
aatcaataac agccgctcca 8520gagtcagtag tcaatgaata tatgaccaaa tatcaccagg
actgttactc aatgtgtgcc 8580gagcccttgc ccatgctggg ctcccgtgta tctggacact
gtaacgtgtg ctgtgtttgc 8640tccccttccc cttccttctt tgccctttac ttgtctttct
ggggtttttc tgtttgggtt 8700tggtttggtt tttatttctc cttttgtgtt ccaaacatga
ggttctctct actggtcctc 8760ttaactgtgg tgttgaggct tatatttgtg taatttttgg
tgggtgaaag gaattttgct 8820aagtaaatct cttctgtgtt tgaactgaag tctgtattgt
aactatgttt aaagtaattg 8880ttccagagac aaatatttct agacactttt tctttacaaa
caaaagcatt cggagggagg 8940gggatggtga ctgagatgag aggggagagc tgaacagatg
acccctgccc agatcagcca 9000gaagccaccc aaagcagtgg agcccaggag tcccactcca
agccagcaag ccgaatagct 9060gatgtgttgc cactttccaa gtcactgcaa aaccaggttt
tgttccgccc agtggattct 9120tgttttgctt cccctccccc cgagattatt accaccatcc
cgtgctttta aggaaaggca 9180agattgatgt ttccttgagg ggagccagga ggggatgtgt
gtgtgcagag ctgaagagct 9240ggggagaatg gggctgggcc cacccaagca ggaggctggg
acgctctgct gtgggcacag 9300gtcaggctaa tgttggcaga tgcagctctt cctggacagg
ccaggtggtg ggcattctct 9360ctccaaggtg tgccccgtgg gcattactgt ttaagacact
tccgtcacat cccaccccat 9420cctccagggc tcaacactgt gacatctcta ttccccaccc
tccccttccc agggcaataa 9480aatgaccatg gagggggctt gcactctctt ggctgtcacc
cgatcgccag caaaacttag 9540atgtgagaaa accccttccc attccatggc gaaaacatct
ccttagaaaa gccattaccc 9600tcattaggca tggttttggg ctcccaaaac acctgacagc
ccctccctcc tctgagaggc 9660ggagagtgct gactgtagtg accattgcat gccgggtgca
gcatctggaa gagctaggca 9720gggtgtctgc cccctcctga gttgaagtca tgctcccctg
tgccagccca gaggccgaga 9780gctatggaca gcattgccag taacacaggc caccctgtgc
agaagggagc tggctccagc 9840ctggaaacct gtctgaggtt gggagaggtg cacttggggc
acagggagag gccgggacac 9900acttagctgg agatgtctct aaaagccctg tatcgtattc
accttcagtt tttgtgtttt 9960gggacaatta ctttagaaaa taagtaggtc gttttaaaaa
caaaaattat tgattgcttt 10020tttgtagtgt tcagaaaaaa ggttctttgt gtatagccaa
atgactgaaa gcactgatat 10080atttaaaaac aaaaggcaat ttattaagga aatttgtacc
atttcagtaa acctgtctga 10140atgtacctgt atacgtttca aaaacacccc ccccccactg
aatccctgta acctatttat 10200tatataaaga gtttgcctta taaattt
10227
SEQ ID NO: 30 (PRT, Homo sapiens, length 9)
Met Val Ala Gly Met Leu Gly Leu Arg
SEQ ID NO: 31 (PRT, Homo sapiens, length 21)
Met Ala Ala Ala Ala Ala Ala Ala Pro Ser Gly Gly Gly Gly Gly Gly
Glu Glu Glu Arg Leu
SEQ ID NO: 32 (PRT, Pan troglodytes, length 21)
Met Ala Ala Ala Ala Ala Ala Ala Pro Ser Gly Gly Gly Gly Gly Gly
Glu Glu Glu Arg Leu
SEQ ID NO: 33 (PRT, Mus musculus, length 26)
Met Ala Ala Ala Ala Ala Thr Ala Ala Ala Ala Ala Ala Pro Ser Gly
Gly Gly Gly Gly Gly Glu Glu Glu Arg Leu
SEQ ID NO: 34 (PRT, Rattus norvegicus, length 37)
Met Ala Ala Ala Ala Ala Ala Ala Ala Ala Ala Ala Ala Ala Ala Ala
Ala Ala Ala Ala Ala Ala Ala Ala Ala Pro Ser Gly Gly Gly Gly Gly
Glu Glu Glu Arg Leu
SEQ ID NO: 35 (PRT, Felis catus, length 21)
Met Ala Ala Ala Ala Ala Ala Ala Pro Ser Gly Gly Gly Gly Gly Gly
Glu Glu Glu Arg Leu
SEQ ID NO: 36 (PRT, Gallus gallus, length 15)
Met Ala Ala Ala Ala Ala Ala Ala Ala Gly Gly Glu Glu Arg Leu
SEQ ID NO: 37 (PRT, Xenopus laevis, length 11)
Met Ala Ala Ala Pro Ser Gly Glu Glu Arg Leu
SEQ ID NO: 38 (PRT, Danio rerio, length 11)
Met Ala Ala Ala Glu Ser Gly Glu Glu Arg Leu
SEQ ID NO: 39 (PRT, Fugu rubripes, length 9)
Met Ala Ala Val Glu Ser Gly Glu Glu
SEQ ID NO: 40 (PRT, Homo sapiens, length 13)
Met Ala Ala Ala Ala Ala Gln Gly Gly Gly Gly Gly Glu
SEQ ID NO: 41 (PRT, Mus musculus, length 14)
Met Ala Ala Ala Ala Ala Ala Pro Gly Gly Gly Gly Gly Glu
SEQ ID NO: 42 (PRT, Rattus norvegicus, length 14)
Met Ala Ala Ala Ala Ala Ala Pro Gly Gly Gly Gly Gly Glu
SEQ ID NO: 43 (PRT, Artificial Sequence, length 13; description: synthetic peptide)
Pro Ser Gly Gly Gly Gly Gly Gly Glu Glu Glu Arg Leu
SEQ ID NO: 44 (DNA, Artificial Sequence, length 39; description: synthetic oligonucleotide)
ccgagcggag gaggaggagg aggcgaggag gagagactg
SEQ ID NO: 45 (PRT, Artificial Sequence, length 14; description: synthetic peptide)
Ala Ala Pro Ser Gly Gly Gly Gly Gly Glu Thr Val Glu Trp
SEQ ID NO: 46 (DNA, Artificial Sequence, length 42; description: synthetic oligonucleotide)
cgccgcgccg agcggaggag gaggaggaga gactgtgagt gg
SEQ ID NO: 47 (DNA, Artificial Sequence, length 57; description: synthetic oligonucleotide)
ggcgtcggcg gcgcgcgctc cctcctctcg gagagaggct gtggtaaaag ccgtccc
SEQ ID NO: 48 (PRT, Artificial Sequence, length 22; description: synthetic peptide)
Met Ala Ala Ala Ala Ala Ala Ala Pro Ser Gly Gly Gly Gly Gly Gly
Gly Glu Glu Glu Arg Leu
SEQ ID NO: 49 (DNA, Artificial Sequence, length 55; description: synthetic oligonucleotide)
gaaaatggcc gccgccgccg ccgccgcgcc gagggaggag gaggaggagg agccg
SEQ ID NO: 50 (PRT, Artificial Sequence, length 20; description: synthetic peptide)
Met Ala Ala Ala Ala Ala Ala Pro Ser Gly Gly Gly Gly Gly Gly Glu
Glu Glu Arg Leu
SEQ ID NO: 51 (DNA, Artificial Sequence, length 59; description: synthetic oligonucleotide)
gaaaatggcc gccgccgccg ccgcgccgag cggaggagga ggaggaggcg aggaggaga
SEQ ID NO: 52 (PRT, Artificial Sequence, length 24; description: synthetic peptide)
Met Ala Ala Ala Ala Ala Ala Ala Ala Ala Ala Pro Ser Gly Gly Gly
Gly Gly Gly Glu Glu Glu Arg Leu
SEQ ID NO: 53 (DNA, Artificial Sequence, length 60; description: synthetic oligonucleotide)
gccgtccgga aaatggccgc cgccgccgcc gccgccgccg ccgcgccgag cggaggagga
SEQ ID NO: 54 (PRT, Artificial Sequence, length 21; description: synthetic peptide)
Met Ala Ala Ala Ala Ala Ala Ala Pro Ser Gly Gly Gly Gly Gly Gly
Glu Glu Glu Arg Leu
SEQ ID NO: 55 (DNA, Artificial Sequence, length 43; description: synthetic oligonucleotide)
cctcctctcg gagagggctg tggtaaaagc cgtccggaaa atc
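Several of the synthetic oligonucleotides in this listing contain an ATG start codon followed by runs of GCC/GCG alanine codons, so their reading frames can be checked against the peptide entries next to them. The Python sketch below is a hypothetical helper and is not part of the application: `translate_orf`, `CODON_TABLE` and `ONE_TO_THREE` are illustrative names, and it assumes the standard genetic code and that translation starts at the first ATG in the listed string. Under those assumptions, SEQ ID NO: 51 translates to the first 18 residues of SEQ ID NO: 50, and SEQ ID NO: 53 to the first 16 residues of SEQ ID NO: 52; whether the applicants intended these pairings is not stated in this section.

```python
from itertools import product

# Standard genetic code (assumed; NCBI translation table 1).
# Codons are enumerated with bases ordered t, c, a, g, first base varying slowest.
_BASES = "tcag"
_AA_1 = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(_BASES, repeat=3), _AA_1)}

# One-letter to three-letter residue names, matching the listing's notation.
ONE_TO_THREE = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
}


def translate_orf(dna: str) -> list[str]:
    """Translate from the first ATG until a stop codon or the end of the string.

    Only a/c/g/t characters are kept, so the spaces and position numbers used
    in the listing are ignored. An incomplete trailing codon is dropped.
    """
    seq = "".join(ch for ch in dna.lower() if ch in "acgt")
    start = seq.find("atg")
    if start < 0:
        return []
    residues = []
    for i in range(start, len(seq) - 2, 3):
        aa = CODON_TABLE[seq[i:i + 3]]
        if aa == "*":  # stop codon ends the reading frame
            break
        residues.append(ONE_TO_THREE[aa])
    return residues


if __name__ == "__main__":
    seq_51 = "gaaaatggcc gccgccgccg ccgcgccgag cggaggagga ggaggaggcg aggaggaga"
    seq_50 = ("Met Ala Ala Ala Ala Ala Ala Pro Ser Gly Gly Gly Gly Gly Gly "
              "Glu Glu Glu Arg Leu").split()
    print(" ".join(translate_orf(seq_51)))
    print(translate_orf(seq_51) == seq_50[:18])
```

The helper stops at the first in-frame stop codon and does not search other reading frames; a fuller analysis would need to handle alternative start sites and both strands.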