Patent application title: METHODS AND COMPOSITIONS FOR RNA-GUIDED TREATMENT OF HIV INFECTION
Inventors:
IPC8 Class: AA61K3846FI
USPC Class:
1 1
Class name:
Publication date: 2018-07-19
Patent application number: 20180200343
Abstract:
A method of treating a subject having or at risk for having a virus
infection, by administering a therapeutically effective amount of a
composition comprising a vector encoding a CRISPR-associated endonuclease
and at least two guide RNAs that are complementary to two target
sequences spanning from the 5'- to 3'-LTRs of the sequence in the virus,
and completely excising a fragment of greater than 9000-bp of integrated
proviral DNA that spanned from its 5'- to 3'-LTRs. A method of treating a
subject having or at risk for having a genetic caused disease, by
administering a therapeutically effective amount of a composition
comprising a vector encoding a CRISPR-associated endonuclease and at
least two guide RNAs that are complementary to two target sequences
spanning from the sequence of the subject's DNA greater than 9000-bp that
is chromosomally integrated and causes the genetic caused disease, and
excising the chromosomally integrated sequence.
Claims:
1. A method of treating a subject having or at risk for having an HIV-1
virus infection, including the steps of: administering to the subject a
therapeutically effective amount of a composition comprising a Clustered
Regularly Interspaced Short Palindromic Repeat (CRISPR)-associated
endonuclease, and two or more different multiplex guide RNAs (gRNAs),
wherein each of the at least two gRNAs is complementary to a different
target nucleic acid sequence in a long terminal repeat (LTR) of proviral
DNA of the virus that is unique from the genome of the host cell;
cleaving a double strand of the proviral DNA at a first target
protospacer sequence with the CRISPR-associated endonuclease; cleaving a
double strand of the proviral DNA at a second target protospacer sequence
with the CRISPR-associated endonuclease; completely excising a fragment
of greater than 9000-bp of integrated HIV-1 proviral DNA that spanned
from its 5'- to 3'-LTRs; and eradicating the HIV-1 proviral DNA from the
host cell.
2. The method of claim 1, wherein said administering step further includes the steps of: exposing a host cell to a composition including an isolated nucleic acid encoding the CRISPR-associated endonuclease; an isolated nucleic acid sequence encoding a first gRNA having a first spacer sequence that is complementary to a first target protospacer sequence in a proviral DNA; and an isolated nucleic acid encoding a second gRNA having a second spacer sequence that is complementary to a second target protospacer sequence in the proviral DNA; expressing in the host cell the CRISPR-associated endonuclease, the first gRNA, and the second gRNA; assembling, in the host cell, a first gene editing complex including the CRISPR-associated endonuclease and the first gRNA; and a second gene editing complex including the CRISPR-associated endonuclease and the second gRNA; directing the first gene editing complex to the first target protospacer sequence by complementary base pairing between the first spacer sequence and the first target protospacer sequence; and directing the second gene editing complex to the second target protospacer sequence by complementary base pairing between the second spacer sequence and the second target protospacer sequence.
3. The method of claim 2, wherein at least one of the first target protospacer sequence and the second target protospacer sequence is situated within the U3 region of the LTR.
4. The method of claim 3, wherein the first spacer sequence and the second spacer sequence each include a sequence complementary to a target protospacer sequence selected from the group consisting of SEQ ID NO: 96, SEQ ID NO: 121, SEQ ID NO: 87, and SEQ ID NO: 110.
5. The method of claim 3, wherein the first spacer sequence and the second spacer sequence include, respectively, a sequence complementary to the target protospacer sequences SEQ ID NO: 96 and SEQ ID NO: 121.
6. The method of claim 3, wherein the first spacer sequence and the second spacer sequence each include, respectively, a sequence complementary to the target protospacer sequences SEQ ID NO: 87 and SEQ ID NO: 110.
7. The method of claim 1, wherein the CRISPR-associated endonuclease is Cas9 or a human-optimized Cas9.
8. The method of claim 1, wherein the composition is encoded in a vector selected from the group consisting of a plasmid vector, a lentiviral vector, an adenoviral vector, and an adeno-associated virus vector.
9. The method of claim 1, wherein at least one of the gRNAs comprises a CRISPR RNA (crRNA) and a trans-activating CRISPR RNA (tracrRNA), which are expressed as separate nucleic acids.
10. The method of claim 1, wherein at least one of the gRNAs is engineered as an artificial fusion small guide RNA (sgRNA) comprised of a crRNA and a tracrRNA.
11. The method of claim 2, wherein said step of expressing in the host cell the CRISPR-associated endonuclease, the first gRNA, and the second gRNA, is further defined as stably expressing in the host cell the CRISPR-associated endonuclease, the first gRNA, and the second gRNA, and the method additionally includes the step of immunizing the host cell against new retroviral infection.
12. The method of claim 2, wherein the host cell is chosen from the group consisting of a CD4+ T cell, a macrophage, a monocyte, a gut associated lymphoid cell, a microglial cell, and an astrocyte.
13. A method of treating a subject having or at risk for having a genetic caused disease, including the steps of: administering to the subject a therapeutically effective amount of a composition comprising a Clustered Regularly Interspaced Short Palindromic Repeat (CRISPR)-associated endonuclease, and two or more different multiplex guide RNAs (gRNAs), wherein each of the at least two gRNAs is complementary to a different target nucleic acid sequence in a long terminal repeat (LTR) of the proviral DNA that is unique from the genome of the host cell, and wherein the gRNAs are complementary to two target sequences spanning from the sequence of the subject's DNA greater than 9000-bp that is chromosomally integrated and causes the genetic caused disease; cleaving a double strand of the DNA at a first target protospacer sequence with the CRISPR-associated endonuclease; cleaving a double strand of the DNA at a second target protospacer sequence with the CRISPR-associated endonuclease; excising the entire chromosomally integrated sequence; and eradicating the chromosomally integrated sequence from the host cell.
14. The method of claim 13, wherein said administering step further includes the steps of: exposing a host cell to a composition including an isolated nucleic acid encoding the CRISPR-associated endonuclease; an isolated nucleic acid sequence encoding a first gRNA having a first spacer sequence that is complementary to a first target protospacer sequence in the DNA; and an isolated nucleic acid encoding a second gRNA having a second spacer sequence that is complementary to a second target protospacer sequence in the DNA; expressing in the host cell the CRISPR-associated endonuclease, the first gRNA, and the second gRNA; assembling, in the host cell, a first gene editing complex including the CRISPR-associated endonuclease and the first gRNA; and a second gene editing complex including the CRISPR-associated endonuclease and the second gRNA; directing the first gene editing complex to the first target protospacer sequence by complementary base pairing between the first spacer sequence and the first target protospacer sequence; and directing the second gene editing complex to the second target protospacer sequence by complementary base pairing between the second spacer sequence and the second target protospacer sequence.
15. The method of claim 13, wherein at least one of the first target protospacer sequence and the second target protospacer sequence is situated within the U3 region of the LTR.
16. The method of claim 13, wherein the CRISPR-associated endonuclease is Cas9 or a human-optimized Cas9.
17. The method of claim 13, wherein the composition is encoded in a vector selected from the group consisting of a plasmid vector, a lentiviral vector, an adenoviral vector, and an adeno-associated virus vector.
18. The method of claim 13, wherein at least one of the gRNAs comprises a CRISPR RNA (crRNA) and a trans-activating CRISPR RNA (tracrRNA), which are expressed as separate nucleic acids.
19. The method of claim 13, wherein at least one of the gRNAs is engineered as an artificial fusion small guide RNA (sgRNA) comprised of a crRNA and a tracrRNA.
20. The method of claim 13, wherein the host cell is chosen from the group consisting of a CD4+ T cell, a macrophage, a monocyte, a gut associated lymphoid cell, a microglial cell, and an astrocyte.
21. The method of claim 13, wherein said method is performed prenatally.
Description:
BACKGROUND OF THE INVENTION
1. Technical Field
[0002] The present invention relates to compositions that specifically cleave target sequences in retroviruses, for example human immunodeficiency virus (HIV). Such compositions, which can include nucleic acids encoding a Clustered Regularly Interspaced Short Palindromic Repeat (CRISPR)-associated endonuclease and a guide RNA sequence complementary to a target sequence in a human immunodeficiency virus, can be administered to a subject having or at risk for contracting an HIV infection.
2. Background Art
[0003] For more than three decades since the discovery of HIV-1, AIDS has remained a major public health problem affecting more than 35.3 million people worldwide. AIDS remains incurable due to the permanent integration of HIV-1 into the host genome. Current therapy (highly active antiretroviral therapy or HAART) for controlling HIV-1 infection and impeding AIDS development profoundly reduces viral replication in cells that support HIV-1 infection and reduces plasma viremia to a minimal level. But HAART fails to suppress low-level viral genome expression and replication in tissues and fails to target the latently infected cells (for example, resting memory T cells, brain macrophages, microglia, astrocytes, and gut-associated lymphoid cells) that serve as a reservoir for HIV-1. Persistent HIV-1 infection is also linked to co-morbidities including heart and renal diseases, osteopenia, and neurological disorders. There is a continuing need for curative therapeutic strategies that target persistent viral reservoirs.
SUMMARY OF THE INVENTION
[0004] The present invention provides for a method of treating a subject having or at risk for having an HIV-1 virus infection, by administering to the subject a therapeutically effective amount of a composition comprising a Clustered Regularly Interspaced Short Palindromic Repeat (CRISPR)-associated endonuclease, and two or more different multiplex guide RNAs (gRNAs), wherein each of the at least two gRNAs is complementary to a different target nucleic acid sequence in a long terminal repeat (LTR) of proviral DNA of the virus that is unique from the genome of the host cell, cleaving a double strand of the proviral DNA at a first target protospacer sequence with the CRISPR-associated endonuclease, cleaving a double strand of the proviral DNA at a second target protospacer sequence with the CRISPR-associated endonuclease, completely excising a fragment of greater than 9000-bp of integrated HIV-1 proviral DNA that spanned from its 5'- to 3'-LTRs, and eradicating the HIV-1 proviral DNA from the host cell.
[0005] The present invention also provides for a method of treating a subject having or at risk for having a genetic caused disease, by administering to the subject a therapeutically effective amount of a composition comprising a Clustered Regularly Interspaced Short Palindromic Repeat (CRISPR)-associated endonuclease, and two or more different multiplex guide RNAs (gRNAs), wherein each of the at least two gRNAs is complementary to a different target nucleic acid sequence in a long terminal repeat (LTR) of the proviral DNA that is unique from the genome of the host cell, and wherein the gRNAs are complementary to two target sequences spanning from the sequence of the subject's DNA greater than 9000-bp that is chromosomally integrated and causes the genetic caused disease, cleaving a double strand of the DNA at a first target protospacer sequence with the CRISPR-associated endonuclease, cleaving a double strand of the DNA at a second target protospacer sequence with the CRISPR-associated endonuclease, excising the entire chromosomally integrated sequence, and eradicating the chromosomally integrated sequence from the host cell.
[0006] The details of one or more embodiments of the invention are set forth in the accompanying drawings and the description below. Other features, objects, and advantages of the invention will be apparent from the description and drawings, and from the claims.
BRIEF DESCRIPTION OF THE DRAWINGS
[0007] FIG. 1A, FIG. 1B, FIG. 1C, FIG. 1D, FIG. 1E, FIG. 1F, FIG. 1G, and FIG. 1H show that Cas9/LTR-gRNA suppresses HIV-1 reporter virus production in CHME5 microglial cells latently infected with HIV-1. FIG. 1A shows a representative gating diagram of EGFP flow cytometry revealing a dramatic reduction in TSA-induced reactivation of latent pNL4-3-ΔGag-d2EGFP reporter virus by stably expressed Cas9 plus LTR-A or -B, vs. empty U6-driven gRNA expression vector (U6-CAG). FIG. 1B shows a SURVEYOR Cel-I nuclease assay of PCR product (-453 to +43 within LTR) from selected LTR-A- or -B-expressing stable clones revealing dramatic indel mutation patterns (arrows). FIG. 1C shows a PCR fragment analysis of a precise deletion of the 190-bp region between the LTR-A and LTR-B cutting sites (arrowhead and arrow in FIG. 1D), leaving a 306-bp fragment (arrow in FIG. 1C) validated by TA-cloning and sequencing results. FIG. 1D discloses SEQ ID NOS 1-3, respectively, in order of appearance. FIG. 1E is a graph showing that subcloning of LTR-A/B stable clones reveals complete loss of reporter reactivation as determined by EGFP flow cytometry. FIG. 1F and FIG. 1G show elimination of the pNL4-3-ΔGag-d2EGFP proviral genome detected by standard (FIG. 1F) and real-time (FIG. 1G) PCR amplification of genomic DNA for EGFP and HIV-1 Rev response element (RRE); β-actin is a DNA purification and loading control. FIG. 1H shows PCR genotyping of LTR-A/B subclones (#8, 13) using primers to amplify a DNA fragment covering HIV-1 LTR U3/R/U5 regions (-411 to +129), revealing indels (a, deletion; c, insertion) and "intact" or combined LTR (b).
[0008] FIG. 2A, FIG. 2B, and FIG. 2C show that Cas9/LTR-gRNA efficiently eradicates latent HIV-1 virus from U1 monocytic cells. FIG. 2A shows a diagram showing excision of HIV-1 entire genome in chromosome Xp11.4. HIV-1 integration sites were identified using a Genome-Walker link PCR kit. Left, analysis of PCR amplicon lengths using a primer pair (P1/P2) targeting chromosome X integration site-flanking sequence reveals elimination of the entire HIV-1 genome (9709-bp), leaving two fragments (833- and 670-bp). FIG. 2B shows TA cloning and sequencing of the LTR fragment (833-bp) showing the host genomic sequence (small letters, 226-bp) and the partial sequences (634-27=607 bp) of 5'-LTR (underlined using dashes) and 3'-LTR (first underlined section) with a 27-bp deletion around the LTR-A targeting site (second underlined section). Bottom, two indel alleles identified from 15 sequenced clonal amplicons. The 670-bp fragment consists of a host sequence (226-bp) and the remaining LTR sequence (634-190=444 bp) after 190-bp excision by simultaneous cutting at LTR-A and B target sites. The underlined and highlighted sequences indicate the gRNA LTR-A target site and PAM. FIG. 2B discloses SEQ ID NOS 4-13, respectively, in order of appearance. FIG. 2C shows a functional analysis of LTR-A/B-induced eradication of HIV-1 genome, showing substantial blockade of TSA/PMA reactivation-induced p24 virion release. U1 cells were transfected with pX260-LTRs-A, -B, or -A/B. After 2-week puromycin selection, cells were treated with TSA (250 nM)/PMA for 2 days before p24 Gag ELISA was performed.
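The 833-bp and 670-bp fragment sizes quoted for FIG. 2A and FIG. 2B follow from simple arithmetic on the 226-bp host flanking sequence and the 634-bp LTR. The following is a minimal sketch of that arithmetic only; the function name `residual_amplicon` is hypothetical and is not part of the disclosed method.

```python
def residual_amplicon(host_flank_bp, ltr_bp, deleted_bp):
    """Expected amplicon length when one LTR copy, minus a deletion, remains."""
    return host_flank_bp + (ltr_bp - deleted_bp)

HOST_FLANK_BP = 226  # host genomic sequence amplified by P1/P2 (from the text)
LTR_BP = 634         # LTR length (from the text)

# 27-bp deletion around the LTR-A targeting site
indel_fragment = residual_amplicon(HOST_FLANK_BP, LTR_BP, 27)
# 190-bp excision by simultaneous cutting at LTR-A and LTR-B
excision_fragment = residual_amplicon(HOST_FLANK_BP, LTR_BP, 190)

print(indel_fragment, excision_fragment)  # 833 670
```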
[0009] FIG. 3A, FIG. 3B, FIG. 3C, and FIG. 3D show that stable expression of Cas9 plus LTR-A/B vaccinates TZM-bl cells against new HIV-1 virus infection. FIG. 3A shows immunocytochemistry (ICC) and Western blot (WB) analyses with anti-Flag antibody confirming the expression of Flag-Cas9 in TZM-bl stable clones puromycin (2 µg/ml)-selected for 2 weeks. FIG. 3B shows PCR genotyping of Cas9/LTR-A/B stable clones (c1-c7) revealing a close correlation of LTR excision with repression of LTR luciferase reporter activation. Fold changes represent TSA/PMA-induced levels over corresponding non-induction levels. FIG. 3C shows Cas9/LTR-A/B-expressing cells (c4) infected with pseudotyped-pNL4-3-Nef-EGFP lentivirus at the indicated multiplicity of infection (MOI), with infection efficiency measured by EGFP flow cytometry 2 d post-infection. FIG. 3D shows phase-contrast/fluorescence micrographs showing that LTR-A/B stable, but not control (U6-CAG; black) cells, are resistant to new infection (right panel) by pNL4-3-ΔE-EGFP HIV-1 reporter virus (gray).
[0010] FIG. 4A, FIG. 4B, FIG. 4C, and FIG. 4D illustrate the off-target effects of Cas9/LTR-A/B on the human genome. FIG. 4A is a SURVEYOR assay that shows no indel mutations in predicted/potential off-target regions in human TZM-bl and U1 cells. LTR-A on-target region (A) was used as a positive control and empty U6-CAG vector (U6) as a negative control. FIG. 4B shows whole-genome sequencing of an LTR-A/B stable TZM-bl subclone, with the numbers of called indels in the U6-CAG control and LTR-A/B samples; FIG. 4C shows detailed information on 10 called indels near gRNA target sites in both samples; and FIG. 4D shows the distribution of off-target called indels. FIG. 4C discloses SEQ ID NOS 14-15, respectively, in order of appearance.
[0011] FIG. 5 shows the LTR U3 sequence of the integrated lentiviral LTR-firefly luciferase reporter identified by TA-cloning and sequencing of PCR product (-411 to -10) from the genomic DNA of human TZM-bl cells. The protospacer and PAM (NGG) sequences of 4 gRNAs (LTR-A to D) and the predicted binding sites of indicated transcription factors are highlighted. The precise cleavage sites are marked with scissors. +1 indicates the transcriptional start site. FIG. 5 discloses SEQ ID NO: 16.
[0012] FIG. 6A, FIG. 6B, and FIG. 6C show that LTR-C and LTR-D remarkably suppress TSA-induced reactivation of latent pNL4-3-ΔGag-d2EGFP virus in CHME5 microglia cells. FIG. 6A is a diagram schematically showing pNL4-3-ΔGag-d2EGFP vector containing Tat, Rev, Env, Vpu, and Nef with the reporter gene d2EGFP. FIG. 6B shows a SURVEYOR assay showing indel mutations in the on-target LTR genome of Cas9/LTR-D but not Cas9/LTR-C transfected cells. FIG. 6C shows a representative gating diagram of EGFP flow cytometry showing a dramatic reduction in TSA-induced reactivation of latent pNL4-3-ΔGag-d2EGFP reporter viruses by stable expression of Cas9/LTR-C or LTR-D as compared with empty U6-driven gRNA expression vector (U6-CAG).
[0013] FIG. 7A, FIG. 7B, FIG. 7C, FIG. 7D, FIG. 7E, and FIG. 7F show that both LTR-C and LTR-D induced indel mutations and significantly decreased constitutive and TSA/PMA-induced luciferase activity in TZM-bl cells stably incorporated with HIV-1 LTR-firefly luciferase reporter gene. FIG. 7A shows a functional luciferase reporter assay revealing a significant reduction of LTR reactivation by LTR-C, LTR-D or both. FIG. 7B shows a SURVEYOR assay showing indel mutation in LTR DNA (-453 to +43) induced by LTR-C and LTR-D (upper arrow). A combination of LTR-C and LTR-D generates a 194 bp fragment (lower arrow) resulting from the deletion of 302 bp region between LTR-C and LTR-D. FIG. 7C and FIG. 7D show Sanger sequencing of 30 clones validating the indel efficiency at 23% for LTR-C and 13% for LTR-D and example chromatograms showing insertion/deletion. FIG. 7C discloses SEQ ID NOS 17-25, respectively, in order of appearance. FIG. 7D discloses SEQ ID NOS 26-30, respectively, in order of appearance. FIG. 7E shows PCR-restriction fragment length polymorphism (RFLP) analysis using BsaJ I to cut 5 sites (96, 102, 372, 386, 482) of the PCR product covering -453 to +43 of LTR showing two major bands (96 bp and 270 bp) in the U6-CAG control sample, but an additional 372 bp band (upper arrow) after LTR-C-induced indel mutation at the 96/102 sites, a 290 bp band (middle arrow) after LTR-D-induced mutations at the 372 site and a 180 bp fragment (lower arrow) after LTR-C/D-induced excision. FIG. 7F shows chromatograms showing the deletion of a 302 bp fragment between LTR-C and LTR-D (top) and an additional 17 bp deletion (bottom). Red arrows indicate the junction sites. *P<0.05 indicates a significant decrease in LTR-C or LTR-D-mediated luciferase activation compared to U6-CAG control. FIG. 7F discloses SEQ ID NOS 31-32, respectively, in order of appearance.
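The band pattern described for the BsaJ I RFLP analysis of FIG. 7E can be reasoned about by listing the fragments of a linear digest. The sketch below rests on two assumptions that the text does not state: that the -453 to +43 amplicon is 496 bp long (no position 0 in the numbering), and that the listed sites are distances in bp from the 5' end of the amplicon. The helper `digest_fragments` is a hypothetical name used only for illustration.

```python
def digest_fragments(amplicon_bp, cut_sites):
    """Return fragment lengths, 5' to 3', for a linear amplicon cut at cut_sites."""
    bounds = [0] + sorted(cut_sites) + [amplicon_bp]
    return [b - a for a, b in zip(bounds, bounds[1:])]

AMPLICON_BP = 453 + 43                    # assumed: -453 to +43 with no position 0
BSAJI_SITES = [96, 102, 372, 386, 482]    # sites listed in the text

# Unedited control: all five sites are cut.
control = digest_fragments(AMPLICON_BP, BSAJI_SITES)

# Assumed effect of an LTR-C indel: the 96 and 102 sites are both lost.
ltr_c_sites = [s for s in BSAJI_SITES if s not in (96, 102)]
ltr_c_mutant = digest_fragments(AMPLICON_BP, ltr_c_sites)

print(control)       # [96, 6, 270, 14, 96, 14]
print(ltr_c_mutant)  # [372, 14, 96, 14]
```

Under these assumptions the control digest contains the 96-bp and 270-bp fragments, and loss of the 96/102 sites yields the additional 372-bp fragment.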
[0014] FIG. 8A, FIG. 8B, and FIG. 8C illustrate the TA cloning and Sanger sequencing of PCR products from CHME5 subclones of LTR-A/B and empty U6-CAG control using primers covering HIV-1 LTR U3/R/U5 regions (-411 to +129). FIG. 8A shows possible combination of LTR-A and LTR-B cuts on both 5'- and 3'-LTRs generating potential fragments a-c as indicated. FIG. 8B shows blasting of fragment a (351 bp) showing 190 bp deletion between LTR-A and LTR-B cut sites. FIG. 8C shows a blast of fragment c (682 bp) showing a 175 bp insertion at the LTR-A cleavage site and a 27 bp deletion at the LTR-B cleavage site. FIG. 8C discloses SEQ ID NOS 33-34, respectively, in order of appearance.
[0015] FIG. 9A, FIG. 9B, FIG. 9C, and FIG. 9D demonstrate that Cas9/LTR-gRNA efficiently eradicates latent HIV-1 virus from U1 monocytic cells. FIG. 9A shows Sanger sequencing of a 1.1 kb fragment from long-range PCR using a primer pair (T492/T493) targeting a chromosome 2 integration site-flanking sequence (small letters, 467-bp), revealing elimination of the entire HIV-1 genome (9709-bp) and leaving a combined 5'-LTR (underlined using dashes) and 3'-LTR with a 6-bp insertion (boxed) precisely at the third nucleotide from the PAM (TGG) of the LTR-A targeting site (underlined) and a 4-bp deletion (nnnn). FIG. 9A discloses SEQ ID NO: 35. FIG. 9B is a representative DNA gel picture that shows specific eradication of the HIV-1 genome. NS, non-specific band. FIG. 9C and FIG. 9D are graphs of quantitative PCR analysis using the primer pair targeting the Gag gene (T457/T458), showing 85% efficiency of entire HIV-1 genome eradication in Cas9/LTR-A/B-expressing U1 cells. U1 cells were transfected with pX260 empty vector (U6-CAG) or LTRs-A/B-encoding vectors. After 2-week puromycin selection, the cellular genomic DNAs were used for absolute quantitative qPCR analysis using spiked pNL4-3-ΔE-EGFP human genomic DNA as a standard. **P<0.01 indicates a significant decrease compared to the U6-CAG control.
[0016] FIG. 10A, FIG. 10B, and FIG. 10C show that Cas9/LTR gRNAs effectively eradicate HIV-1 provirus in J-Lat latently infected T cells. FIG. 10A shows functional analysis by EGFP flow cytometry revealing approximately 50% reduction of PMA- and TNFα-induced reactivation of EGFP reporter viruses. FIG. 10B is a SURVEYOR assay that shows indel mutations (arrow) in the on-target LTR genome of Cas9/LTR-A/B transfected cells. J-Lat cells were transfected with pX260 empty vector or LTRs-A and -B. After 2-week puromycin selection, cells were treated with PMA or TNFα for 24 h. The genomic DNAs were subjected to PCR using primers covering HIV-1 LTR U3/R/U5 regions (-411 to +129) and the SURVEYOR assay was performed. **P<0.01 indicates a significant decrease compared to the U6-CAG control. FIG. 10C shows a PCR fragment analysis using primers covering HIV-1 LTR (-374 to +43), revealing a precise deletion of the 190-bp region between the LTR-A and LTR-B cutting sites and leaving a 227-bp fragment (arrow). House-keeping gene β-actin serves as a DNA purification and loading control.
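The 227-bp fragment in FIG. 10C and the 306-bp fragment in FIG. 1C can both be reproduced from the stated primer coordinates and the 190-bp deletion. The sketch below assumes the usual convention in which positions are numbered relative to the transcriptional start site (+1) with no position 0; the function name `amplicon_length` is hypothetical.

```python
def amplicon_length(start, end):
    """Length of a region given start-site-relative coordinates (no position 0)."""
    if start < 0 < end:
        # Region spans the start site: |start| upstream bases plus end downstream bases.
        return -start + end
    # Region lies entirely on one side of the start site.
    return end - start + 1

DELETION_BP = 190  # region between the LTR-A and LTR-B cutting sites (from the text)

jlat_remaining = amplicon_length(-374, 43) - DELETION_BP   # FIG. 10C primers
chme5_remaining = amplicon_length(-453, 43) - DELETION_BP  # FIG. 1C primers

print(jlat_remaining, chme5_remaining)  # 227 306
```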
[0017] FIG. 11A, FIG. 11B, FIG. 11C, and FIG. 11D show that genome editing efficiency depends upon the presence of Cas9 and gRNAs. FIG. 11A shows PCR genotyping reveals the absence of a U6-driven LTR-A or LTR-B expression cassette and FIG. 11B shows absence/reduction of CMV-driven Cas9 DNA in puromycin-selected TZM-bl subclones without any indication of genomic editing. Genomic DNAs from indicated subclones were subject to conventional (FIG. 11A) or real-time (FIG. 11B) PCR analyses using a primer pair covering U6 promoter (T351) and LTR-A (T354) or -B (T356), and targeting Cas9 (T477/T491). FIG. 11C and FIG. 11D show Cas9 protein expression is absent in ineffective TZM-bl subclones. FIG. 11C shows that the Flag-tagged Cas9 fusion protein was detected by Western blot (WB) and immunocytochemistry (ICC) with anti-Flag monoclonal antibody. HEK293T cell line stably expressing Flag-Cas9 was used as a positive control for WB. GAPDH serves as a protein loading control. Clone c6 contains Cas9 DNA but no Cas9 protein expression, suggesting a potential mechanism of epigenetic repression after puromycin selection. Clone c5 and c3 may represent a truncated Flag-Cas9 (tCas9). FIG. 11D shows that the nucleus was stained with Hoechst 33258.
[0018] FIG. 12A, FIG. 12B, FIG. 12C, and FIG. 12D demonstrate that stable expression of Cas9/LTR-A/B gRNAs in TZM-bl cells vaccinates against pseudotyped or native HIV-1 viruses. FIG. 12A shows that flow cytometry reveals a significant reduction of native pNL4-3-ΔE-EGFP reporter virus infection efficiency in Cas9/LTR-A/B expressing TZM-bl subclones. Real-time PCR analysis reveals suppression or elimination of viral RNA as shown in FIG. 12B and DNA as shown in FIG. 12C by Cas9/LTR-A/B gRNAs. FIG. 12D shows that the firefly-luciferase luminescent assay demonstrates dramatic inhibition of virus infection-stimulated LTR promoter activity by Cas9/LTR-A/B gRNAs. The stable Cas9/LTR-A/B gRNA-expressing TZM-bl cells were infected for 2 hours with indicated native HIV-1 viruses, and washed twice with PBS. At 2 days post-infection, cells were collected, fixed and analyzed by flow cytometry for EGFP expression (in FIG. 12A), or lysed for total RNA extraction and RT-qPCR (in FIG. 12B), genomic DNA purification for qPCR (in FIG. 12C) and luminescence measurement (in FIG. 12D). *P<0.05 and **P<0.01 indicate significant decreases compared to the U6-CAG control.
[0019] FIG. 13 shows the predicted LTR gRNAs and their off-target numbers (100% match). The 5'-LTR sense and antisense sequences (SEQ ID NOS 79-111 and 112-141, respectively) (634 bp) of pHR'-CMV-LacZ lentiviral vector (AF105229) were utilized to search for Cas9/gRNA target sites containing a 20-bp guide sequence (protospacer) plus the protospacer adjacent motif sequence (NGG) using Jack Lin's CRISPR/Cas9 gRNA finder tool (http://spot.colorado.edu/~slin/cas9.html). Each gRNA plus NGG (AGG, TGG, GGG, CGG) was blasted against available human genomic and transcript sequences with 1000 aligned sequences being displayed. After pressing Control+F, copy/paste the target sequence (1-23 through 9-23 nucleotides) and find the number of genomic targets with 100% match. The number of off-targets for each search was divided by 3 because of the repeated genome library. The number shown indicates the sum of 4 searches (NGG). The top number (for example, for gRNA sequence (sense): 20, 19, 19, 17, 16, 15, 14, 13, 12) indicates the gRNA target sequences farthest from NGG. The sequence and off-target numbers for the selected LTR-A/B and LTR-C/D are highlighted red and green, respectively.
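The target-site search described above (a 20-nt protospacer immediately followed by an NGG PAM, on both the sense and antisense strands) can be sketched in a few lines. This is an illustrative stand-in and not the gRNA finder tool named in the text; the function names and the toy input sequence are hypothetical, and the BLAST-based off-target counting is not reproduced.

```python
import re

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]

def find_targets(seq, guide_len=20):
    """Yield (strand, start, protospacer, pam) for each guide_len-nt site followed by NGG.

    start is the 0-based offset on the strand that was scanned.
    """
    # A lookahead keeps overlapping candidate sites from being skipped.
    pattern = re.compile(r"(?=([ACGT]{%d})([ACGT]GG))" % guide_len)
    sense = seq.upper()
    for strand, s in (("sense", sense), ("antisense", reverse_complement(sense))):
        for m in pattern.finditer(s):
            yield strand, m.start(), m.group(1), m.group(2)

# Toy 23-nt input (hypothetical, not an LTR sequence): one protospacer plus a TGG PAM.
toy = "ACGT" * 5 + "TGG"
hits = list(find_targets(toy))
print(hits)
```

Each reported protospacer would then be checked against the host genome for exact matches, which is the step the text performs with BLAST.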
[0020] FIG. 14 depicts the oligonucleotides for gRNA targeting sites and primers (SEQ ID NOS 36-78, respectively, in order of appearance) used for PCR and sequencing.
[0021] FIG. 15 shows the locations of predicted gRNA targeting sites of LTR-A and LTR-B and discloses "query Seq" sequences as SEQ ID NOS 142-252, and "ref Seq" sequences as SEQ ID NOS 253-363, all respectively, in order of appearance.
[0022] FIG. 16A, FIG. 16B, FIG. 16C, FIG. 16D, FIG. 16E, FIG. 16F, FIG. 16G, and FIG. 16H show that both LTR-C and LTR-D decreased constitutive and TSA/PMA-induced luciferase activity in TZM-bl cells stably incorporated with HIV-1 LTR firefly luciferase reporter gene, and that their combination induced precise genome excision. FIG. 16A shows that six gRNA targets were designed for the promoter region of HIV-LTR. FIG. 16A discloses SEQ ID NO: 16. TZM-bl cells were cotransfected with Cas9-EGFP and chimera gRNA expression cassette (PCR products) by lipofectamine 2000. FIG. 16B is a graph showing that after 3 d, EGFP-positive cells were sorted through FACS and 2000 cells per group were collected for luciferase assay. FIG. 16B discloses SEQ ID NO: 31. FIG. 16C is a graph showing that the population-sorted cells were cultured for 2 d and treated with TSA/PMA for 1 d before luciferase assay. The single cells were sorted into a 96-well plate and cultured until confluence for luciferase assay in the absence (shown in the graph of FIG. 16D) or presence (shown in the graph of FIG. 16E) of TSA/PMA for 1 d. FIG. 16F and FIG. 16G show that the PCR product from the population-sorted cells was analyzed with Surveyor Cel-I nuclease assay (FIG. 16F) and restriction fragment length polymorphism with BsaJI (FIG. 16G), showing a mutation (FIG. 16F) or uncut (FIG. 16G) band (red arrow). A 200 bp fragment (FIG. 16F, FIG. 16G, black arrow) resulting from the deletion of the 321 bp region between LTR-C and LTR-D as predicted (FIG. 16A, red arrowhead) was validated by TA-cloning and sequencing showing precise genomic excision (FIG. 16H). Sanger sequencing of PCR products from individual LTR-C and -D identified % and % indel mutation efficiency respectively. *P<0.05 indicates a statistically significant reduction using a Student's t-test compared to the corresponding U6-CAG control.
Protospace(E), Protospace(C), Protospace(A), Protospace(B), Protospace(D), and Protospace(F) correspond to SEQ ID NOS 365, 367, 369, 371, 373, and 375, respectively, in order of appearance.
[0023] FIG. 17A, FIG. 17B, FIG. 17C, FIG. 17D, FIG. 17E, FIG. 17F, FIG. 17G, and FIG. 17H show that Cas9/LTR-gRNA inhibited constitutive and inducible production of HIV-1 virus measured by EGFP flow cytometry in the HIV-1 latently infected CHME5 microglia cell line. The pHR' lentiviral vector containing Tat, Rev, Env, Vpu, and Nef with the reporter gene d2EGFP was transduced into human fetal microglia cell line CHME5, and a 400 bp deletion in the U3 region of the 3'-LTR is illustrated (shown in FIG. 17A). FIG. 17B is a graph showing that transient transfection of Cas9/gRNA, Human HIV-1 LTR-A, B alone or in combination decreased the intensity but not the percentage of EGFP due to suppression of LTR promoter activity. FIG. 17C is a graph showing that transient transfection of Cas9/gRNA, Human HIV-1 LTR-C, D alone or in combination decreased the intensity but not the percentage of EGFP due to suppression of LTR promoter activity. FIG. 17D and FIG. 17E are graphs showing that after antibiotic selection for 1-2 weeks, the percentage of EGFP cells was also reduced. FIG. 17F and FIG. 17G show that the PCR product from the stable selected clones was analyzed with Surveyor Cel-I nuclease assay, showing dramatic indel mutation in LTR-A and LTR-B but weak indel mutation in the combination of LTR-A/B (red arrow). A 331 bp fragment (shown in FIG. 17F and FIG. 17G, black arrow) resulting from the deletion of the 190 bp region between LTR-A and LTR-B as predicted (FIG. 17H, red arrowhead) was validated by TA-cloning and sequencing showing precise genomic excision (FIG. 17H). FIG. 17H discloses SEQ ID NOS 1-3, respectively, in order of appearance.
[0024] FIG. 18 shows the LTR of a representative HIV-1 sequence (SEQ ID NO: 376). The U3 region extends from nucleotide 1 to nucleotide 432 (SEQ ID NO: 377), the R region extends from nucleotide 432 to nucleotide 559 (SEQ ID NO: 378), and the U5 region extends from nucleotide 560 to nucleotide 634 (SEQ ID NO: 379).
[0025] FIG. 19 shows the LTR of a representative SIV sequence (SEQ ID NO: 380). The U3 region extends from nucleotide 1 to nucleotide 517 (SEQ ID NO: 381), the R region extends from nucleotide 518 to nucleotide 693 (SEQ ID NO: 382), and the U5 region extends from nucleotide 694 to nucleotide 818 (SEQ ID NO: 383).
DETAILED DESCRIPTION OF THE INVENTION
[0026] The present invention is based, in part, on our discovery that we could eliminate the integrated HIV-1 genome from HIV-1 infected cells by using the RNA-guided Clustered Regularly Interspaced Short Palindromic Repeat (CRISPR)-Cas9 nuclease system (Cas9/gRNA) in single and multiplex configurations. We identified highly specific targets within the HIV-1 LTR U3 region that were efficiently edited by Cas9/gRNA, inactivating viral gene expression and replication in latently infected microglial, promonocytic and T cells. Cas9/gRNAs caused neither genotoxicity nor off-target editing in the host cells, and completely excised a 9709-bp fragment of integrated proviral DNA that spanned from its 5'- to 3'-LTRs. Furthermore, the presence of multiplex gRNAs within Cas9-expressing cells prevented HIV-1 infection. Our results suggest that Cas9/gRNA can be engineered to provide a specific, efficacious prophylactic and therapeutic approach against AIDS.
[0027] Accordingly, the invention features compositions comprising a nucleic acid encoding a CRISPR-associated endonuclease and a guide RNA that is complementary to a target sequence in a retrovirus, e.g., HIV, as well as pharmaceutical formulations comprising a nucleic acid encoding a CRISPR-associated endonuclease and a guide RNA that is complementary to a target sequence in HIV. Also featured are compositions comprising a CRISPR-associated endonuclease polypeptide and a guide RNA that is complementary to a target sequence in HIV, as well as pharmaceutical formulations comprising a CRISPR-associated endonuclease polypeptide and a guide RNA that is complementary to a target sequence in HIV.
[0028] Also featured are methods of administering the compositions to treat a retroviral infection, e.g., HIV infection, methods of eliminating viral replication, and methods of preventing HIV infection. The therapeutic methods described herein can be carried out in connection with other antiretroviral therapies (e.g., HAART).
[0029] The clinical course of HIV infection can vary according to a number of factors, including the subject's genetic background, age, general health, nutrition, treatment received, and the HIV subtype. In general, most individuals develop flu-like symptoms within a few weeks or months of infection. The symptoms can include fever, headache, muscle aches, rash, chills, sore throat, mouth or genital ulcers, swollen lymph glands, joint pain, night sweats, and diarrhea. The intensity of the symptoms can vary from mild to severe depending upon the individual. During the acute phase, the HIV viral particles are attracted to and enter cells expressing the appropriate CD4 receptor molecules. Once the virus has entered the host cell, the HIV encoded reverse transcriptase generates a proviral DNA copy of the HIV RNA and the pro-viral DNA becomes integrated into the host cell genomic DNA. It is this HIV provirus that is replicated by the host cell, resulting in the release of new HIV virions which can then infect other cells. The methods and compositions of the invention are generally and variously useful for excision of integrated HIV proviral DNA, although the invention is not so limited, and the compositions may be administered to a subject at any stage of infection or to an uninfected subject who is at risk for HIV infection.
[0030] The primary HIV infection subsides within a few weeks to a few months, and is typically followed by a long clinical "latent" period which may last for up to 10 years. The latent period is also referred to as asymptomatic HIV infection or chronic HIV infection. The subject's CD4 lymphocyte numbers rebound, but not to pre-infection levels, and most subjects undergo seroconversion, that is, they have detectable levels of anti-HIV antibody in their blood, within 2 to 4 weeks of infection. During this latent period, there can be no detectable viral replication in peripheral blood mononuclear cells and little or no culturable virus in peripheral blood. During the latent period, also referred to as the clinical latency stage, people who are infected with HIV may experience no HIV-related symptoms, or only mild ones. However, the HIV virus continues to reproduce at very low levels. In subjects who have been treated with anti-retroviral therapies, this latent period may extend for several decades or more. However, subjects at this stage are still able to transmit HIV to others even if they are receiving anti-retroviral therapy, although anti-retroviral therapy reduces the risk of transmission. As noted above, anti-retroviral therapy does not suppress low levels of viral genome expression, nor does it efficiently target latently infected cells such as resting memory T cells, brain macrophages, microglia, astrocytes and gut-associated lymphoid cells.
[0031] Clinical signs and symptoms of AIDS (acquired immunodeficiency syndrome) appear as CD4 lymphocyte numbers decrease, resulting in irreversible damage to the immune system. Many patients also present with AIDS-related complications, including, for example, opportunistic infections such as tuberculosis, salmonellosis, cytomegalovirus, candidiasis, cryptococcal meningitis, toxoplasmosis, and cryptosporidiosis, as well as certain kinds of cancers, including for example, Kaposi's sarcoma, and lymphomas, as well as wasting syndrome, neurological complications, and HIV-associated nephropathy.
[0032] Compositions
[0033] The compositions of the invention include nucleic acids encoding a CRISPR-associated endonuclease, e.g., Cas9, and a guide RNA that is complementary to a target sequence in a retrovirus, e.g., HIV. In bacteria the CRISPR/Cas loci encode RNA-guided adaptive immune systems against mobile genetic elements (viruses, transposable elements and conjugative plasmids). Three types (I-III) of CRISPR systems have been identified. CRISPR clusters contain spacers, the sequences complementary to antecedent mobile elements. CRISPR clusters are transcribed and processed into mature CRISPR (Clustered Regularly Interspaced Short Palindromic Repeats) RNA (crRNA). The CRISPR-associated endonuclease, Cas9, belongs to the type II CRISPR/Cas system and has strong endonuclease activity to cut target DNA. Cas9 is guided by a mature crRNA that contains about 20 base pairs (bp) of unique target sequence (called spacer) and a trans-activated small RNA (tracrRNA) that serves as a guide for ribonuclease III-aided processing of pre-crRNA. The crRNA:tracrRNA duplex directs Cas9 to target DNA via complementary base pairing between the spacer on the crRNA and the complementary sequence (called protospacer) on the target DNA. Cas9 recognizes a trinucleotide (NGG) protospacer adjacent motif (PAM) to specify the cut site (the 3rd nucleotide from PAM). The crRNA and tracrRNA can be expressed separately or engineered into an artificial fusion small guide RNA (sgRNA) via a synthetic stem loop (AGAAAU) to mimic the natural crRNA/tracrRNA duplex. Such sgRNA, like shRNA, can be synthesized or in vitro transcribed for direct RNA transfection or expressed from U6 or H1-promoted RNA expression vector, although cleavage efficiencies of the artificial sgRNA are lower than those for systems with the crRNA and tracrRNA expressed separately.
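The PAM recognition and cut-site rule described above can be illustrated with a short sketch. It is a simplified illustration only, not the procedure used in the examples: it scans a single strand for NGG motifs, assumes a 20-nucleotide protospacer, and places the cut 3 nucleotides upstream of the PAM. The function name `find_ngg_sites` and its parameters are hypothetical.

```python
import re


def find_ngg_sites(dna, spacer_len=20):
    """Return (protospacer, pam, cut_index) for each NGG PAM on the given strand.

    Assumptions (illustrative only): the protospacer is the `spacer_len`
    bases immediately 5' of the PAM, and the cut falls 3 nt upstream of
    the PAM. `cut_index` is the 0-based index of the first base
    downstream of the cut.
    """
    dna = dna.upper()
    sites = []
    # A lookahead is used so that overlapping PAMs are all reported.
    for match in re.finditer(r"(?=([ACGT]GG))", dna):
        pam_start = match.start()
        if pam_start < spacer_len:
            continue  # not enough upstream sequence for a full protospacer
        protospacer = dna[pam_start - spacer_len:pam_start]
        sites.append((protospacer, match.group(1), pam_start - 3))
    return sites


# Usage with the LTR A target sequence (SEQ ID NO: 96) given in this document.
for protospacer, pam, cut in find_ngg_sites("ATCAGATATCCACTGACCTTTGG"):
    print(protospacer, pam, cut)
```

A fuller implementation would also scan the reverse-complement strand and handle the alternative PAMs of other Cas9 orthologs.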
[0034] The compositions of the invention can include a nucleic acid encoding a CRISPR-associated endonuclease. In some embodiments, the CRISPR-associated endonuclease can be a Cas9 nuclease. The Cas9 nuclease can have a nucleotide sequence identical to the wild type Streptococcus pyogenes sequence. In some embodiments, the CRISPR-associated endonuclease can be a sequence from other species, for example other Streptococcus species, such as thermophilus; Pseudomonas aeruginosa, Escherichia coli, or other sequenced bacterial genomes and archaea, or other prokaryotic microorganisms. Alternatively, the wild type Streptococcus pyogenes Cas9 sequence can be modified. The nucleic acid sequence can be codon optimized for efficient expression in mammalian cells, i.e., "humanized." A humanized Cas9 nuclease sequence can be, for example, the Cas9 nuclease sequence encoded by any of the expression vectors listed in Genbank accession numbers KM099231.1 GI:669193757; KM099232.1 GI:669193761; or KM099233.1 GI:669193765. Alternatively, the Cas9 nuclease sequence can be, for example, the sequence contained within a commercially available vector such as PX330 or PX260 from Addgene (Cambridge, Mass.). In some embodiments, the Cas9 endonuclease can have an amino acid sequence that is a variant or a fragment of any of the Cas9 endonuclease sequences of Genbank accession numbers KM099231.1 GI:669193757; KM099232.1 GI:669193761; or KM099233.1 GI:669193765 or the Cas9 amino acid sequence of PX330 or PX260 (Addgene, Cambridge, Mass.). The Cas9 nucleotide sequence can be modified to encode biologically active variants of Cas9, and these variants can have or can include, for example, an amino acid sequence that differs from a wild type Cas9 by virtue of containing one or more mutations (e.g., an addition, deletion, or substitution mutation or a combination of such mutations). One or more of the mutations can be a substitution (e.g., a conservative amino acid substitution).
For example, a biologically active variant of a Cas9 polypeptide can have an amino acid sequence with at least or about 50% sequence identity (e.g., at least or about 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 97%, 98%, or 99% sequence identity) to a wild type Cas9 polypeptide. Conservative amino acid substitutions typically include substitutions within the following groups: glycine and alanine; valine, isoleucine, and leucine; aspartic acid and glutamic acid; asparagine, glutamine, serine and threonine; lysine, histidine and arginine; and phenylalanine and tyrosine. The amino acid residues in the Cas9 amino acid sequence can be non-naturally occurring amino acid residues. Naturally occurring amino acid residues include those naturally encoded by the genetic code as well as non-standard amino acids (e.g., amino acids having the D-configuration instead of the L-configuration). The present peptides can also include amino acid residues that are modified versions of standard residues (e.g. pyrrolysine can be used in place of lysine and selenocysteine can be used in place of cysteine). Non-naturally occurring amino acid residues are those that have not been found in nature, but that conform to the basic formula of an amino acid and can be incorporated into a peptide. These include D-alloisoleucine(2R,3S)-2-amino-3-methylpentanoic acid and L-cyclopentyl glycine (S)-2-amino-2-cyclopentyl acetic acid. For other examples, one can consult textbooks or the worldwide web (a site is currently maintained by the California Institute of Technology and displays structures of non-natural amino acids that have been successfully incorporated into functional proteins).
[0035] The Cas9 nuclease sequence can be a mutated sequence. For example the Cas9 nuclease can be mutated in the conserved HNH and RuvC domains, which are involved in strand specific cleavage. For example, an aspartate-to-alanine (D10A) mutation in the RuvC catalytic domain allows the Cas9 nickase mutant (Cas9n) to nick rather than cleave DNA to yield single-stranded breaks, and the subsequent preferential repair through HDR can potentially decrease the frequency of unwanted indel mutations from off-target double-stranded breaks.
[0036] In some embodiments, compositions of the invention can include a CRISPR-associated endonuclease polypeptide encoded by any of the nucleic acid sequences described above. The terms "peptide," "polypeptide," and "protein" are used interchangeably herein, although typically they refer to peptide sequences of varying sizes. We may refer to the amino acid-based compositions of the invention as "polypeptides" to convey that they are linear polymers of amino acid residues, and to help distinguish them from full-length proteins. A polypeptide of the invention can "constitute" or "include" a fragment of a CRISPR-associated endonuclease, and the invention encompasses polypeptides that constitute or include biologically active variants of a CRISPR-associated endonuclease. It will be understood that the polypeptides can therefore include only a fragment of a CRISPR-associated endonuclease (or a biologically active variant thereof) but may include additional residues as well. Biologically active variants will retain sufficient activity to cleave target DNA.
[0037] The bonds between the amino acid residues can be conventional peptide bonds or another covalent bond (such as an ester or ether bond), and the polypeptides can be modified by amidation, phosphorylation or glycosylation. A modification can affect the polypeptide backbone and/or one or more side chains. Chemical modifications can be naturally occurring modifications made in vivo following translation of an mRNA encoding the polypeptide (e.g., glycosylation in a bacterial host) or synthetic modifications made in vitro. A biologically active variant of a CRISPR-associated endonuclease can include one or more structural modifications resulting from any combination of naturally occurring (i.e., made naturally in vivo) and synthetic modifications (i.e., naturally occurring or non-naturally occurring modifications made in vitro). Examples of modifications include, but are not limited to, amidation (e.g., replacement of the free carboxyl group at the C-terminus by an amino group); biotinylation (e.g., acylation of lysine or other reactive amino acid residues with a biotin molecule); glycosylation (e.g., addition of a glycosyl group to either asparagines, hydroxylysine, serine or threonine residues to generate a glycoprotein or glycopeptide); acetylation (e.g., the addition of an acetyl group, typically at the N-terminus of a polypeptide); alkylation (e.g., the addition of an alkyl group); isoprenylation (e.g., the addition of an isoprenoid group); lipoylation (e.g. attachment of a lipoate moiety); and phosphorylation (e.g., addition of a phosphate group to serine, tyrosine, threonine or histidine).
[0038] One or more of the amino acid residues in a biologically active variant may be a non-naturally occurring amino acid residue. Naturally occurring amino acid residues include those naturally encoded by the genetic code as well as non-standard amino acids (e.g., amino acids having the D-configuration instead of the L-configuration). The present peptides can also include amino acid residues that are modified versions of standard residues (e.g. pyrrolysine can be used in place of lysine and selenocysteine can be used in place of cysteine). Non-naturally occurring amino acid residues are those that have not been found in nature, but that conform to the basic formula of an amino acid and can be incorporated into a peptide. These include D-alloisoleucine(2R,3S)-2-amino-3-methylpentanoic acid and L-cyclopentyl glycine (S)-2-amino-2-cyclopentyl acetic acid. For other examples, one can consult textbooks or the worldwide web (a site is currently maintained by the California Institute of Technology and displays structures of non-natural amino acids that have been successfully incorporated into functional proteins).
[0039] Alternatively, or in addition, one or more of the amino acid residues in a biologically active variant can be a naturally occurring residue that differs from the naturally occurring residue found in the corresponding position in a wildtype sequence. In other words, biologically active variants can include one or more amino acid substitutions. We may refer to a substitution, addition, or deletion of amino acid residues as a mutation of the wildtype sequence. As noted, the substitution can replace a naturally occurring amino acid residue with a non-naturally occurring residue or just a different naturally occurring residue. Further the substitution can constitute a conservative or non-conservative substitution. Conservative amino acid substitutions typically include substitutions within the following groups: glycine and alanine; valine, isoleucine, and leucine; aspartic acid and glutamic acid; asparagine, glutamine, serine and threonine; lysine, histidine and arginine; and phenylalanine and tyrosine.
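The conservative substitution groups listed above lend themselves to a simple lookup. The following is a minimal sketch that classifies a single-residue substitution using only those six groups; the one-letter codes and the function name `is_conservative` are illustrative choices and are not part of the original disclosure.

```python
# Conservative substitution groups as listed in the text (one-letter codes).
CONSERVATIVE_GROUPS = [
    set("GA"),    # glycine, alanine
    set("VIL"),   # valine, isoleucine, leucine
    set("DE"),    # aspartic acid, glutamic acid
    set("NQST"),  # asparagine, glutamine, serine, threonine
    set("KHR"),   # lysine, histidine, arginine
    set("FY"),    # phenylalanine, tyrosine
]


def is_conservative(wild_type, variant):
    """Return True if replacing `wild_type` with `variant` stays within one group."""
    wild_type, variant = wild_type.upper(), variant.upper()
    if wild_type == variant:
        return False  # identical residues are not a substitution
    return any(wild_type in group and variant in group
               for group in CONSERVATIVE_GROUPS)


print(is_conservative("K", "R"))  # lysine to arginine
print(is_conservative("D", "A"))  # aspartate to alanine, as in the D10A nickase
```

Under this grouping, the D10A change discussed for the Cas9 nickase falls outside every group and is therefore classified as non-conservative.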
[0040] The polypeptides that are biologically active variants of a CRISPR-associated endonuclease can be characterized in terms of the extent to which their sequence is similar to or identical to the corresponding wild-type polypeptide. For example, the sequence of a biologically active variant can be at least or about 80% identical to corresponding residues in the wild-type polypeptide. For example, a biologically active variant of a CRISPR-associated endonuclease can have an amino acid sequence with at least or about 80% sequence identity (e.g., at least or about 85%, 90%, 95%, 97%, 98%, or 99% sequence identity) to a CRISPR-associated endonuclease or to a homolog or ortholog thereof.
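As an illustration of how a percent-identity threshold such as 80% might be evaluated, the sketch below computes identity over two sequences that have already been aligned (gaps written as "-"). Conventions for percent identity vary between tools; the denominator used here (all aligned columns that are not gap-gap) and the function name `percent_identity` are assumptions made for this example.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity of two pre-aligned sequences of equal length.

    Columns where both sequences have a gap are ignored; a residue
    aligned against a gap counts as a mismatch (an assumed convention).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    matches = 0
    compared = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" and b == "-":
            continue
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * matches / compared


# Hypothetical ten-residue fragments differing at one position.
identity = percent_identity("MDKKYSIGLD", "MDKKYSIGLA")
print(identity, identity >= 80.0)
```

In practice the alignment itself would be produced by a dedicated alignment tool before a calculation of this kind is applied.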
[0041] A biologically active variant of a CRISPR-associated endonuclease polypeptide will retain sufficient biological activity to be useful in the present methods. The biologically active variants will retain sufficient activity to function in targeted DNA cleavage. The biological activity can be assessed in ways known to one of ordinary skill in the art and includes, without limitation, in vitro cleavage assays or functional assays.
[0042] Polypeptides can be generated by a variety of methods including, for example, recombinant techniques or chemical synthesis. Once generated, polypeptides can be isolated and purified to any desired extent by means well known in the art. For example, one can use lyophilization following, for example, reversed phase (preferably) or normal phase HPLC, or size exclusion or partition chromatography on polysaccharide gel media such as Sephadex G-25. The composition of the final polypeptide may be confirmed by amino acid analysis after degradation of the peptide by standard means, by amino acid sequencing, or by FAB-MS techniques. Salts, including acid salts, esters, amides, and N-acyl derivatives of an amino group of a polypeptide may be prepared using methods known in the art, and such peptides are useful in the context of the present invention.
[0043] The compositions of the invention include a sequence encoding a guide RNA (gRNA) comprising a sequence that is complementary to a target sequence in a retrovirus. The retrovirus can be a lentivirus, for example, a human immunodeficiency virus, a simian immunodeficiency virus, a feline immunodeficiency virus or a bovine immunodeficiency virus. The human immunodeficiency virus can be HIV-1 or HIV-2. The target sequence can include a sequence from any HIV, for example, HIV-1 and HIV-2, and any circulating recombinant form thereof. The genetic variability of HIV is reflected in the multiple groups and subtypes that have been described. A collection of HIV sequences is compiled in the Los Alamos HIV databases and compendiums. The methods and compositions of the invention can be applied to HIV from any of those various groups, subtypes, and circulating recombinant forms. These include, for example, the HIV-1 major group (often referred to as Group M) and the minor groups, Groups N, O, and P, as well as, but not limited to, any of the following subtypes: A, B, C, D, F, G, H, J and K. The methods and compositions can also be applied to HIV-2 and any of the A, B, C, F or G clades (also referred to as "subtypes" or "groups"), as well as any circulating recombinant form of HIV-2.
[0044] The guide RNA can be a sequence complementary to a coding or a non-coding sequence. For example, the guide RNA can be complementary to an HIV sequence, such as a long terminal repeat (LTR) sequence, a protein coding sequence, or a regulatory sequence. In some embodiments, the guide RNA comprises a sequence that is complementary to an HIV long terminal repeat (LTR) region. The HIV-1 LTR is approximately 640 bp in length. An exemplary HIV-1 LTR is the sequence of SEQ ID NO: 376. An exemplary SIV LTR is the sequence of SEQ ID NO: 380. HIV-1 long terminal repeats (LTRs) are divided into U3, R and U5 regions. Exemplary HIV-1 LTR U3, R and U5 regions are SEQ ID NOs: 377, 378 and 379, respectively. Exemplary SIV LTR U3, R and U5 regions are SEQ ID NOs: 381, 382, and 383, respectively. The configurations of the U3, R, and U5 regions for exemplary HIV-1 and SIV sequences are shown in FIG. 18 and FIG. 19, respectively. LTRs contain all of the required signals for gene expression and are involved in the integration of a provirus into the genome of a host cell. For example, the basal or core promoter, a core enhancer and a modulatory region are found within U3, while the transactivation response element is found within R. In HIV-1, the U5 region includes several sub-regions, for example, TAR or trans-acting responsive element, which is involved in transcriptional activation; Poly A, which is involved in dimerization and genome packaging; PBS or primer binding site; Psi or the packaging signal; and DIS or dimer initiation site.
[0045] Useful guide sequences are complementary to the U3, R, or U5 region of the LTR. Exemplary guide RNA sequences that target the U3 region of HIV-1 are shown in FIG. 13. A guide RNA sequence can comprise, for example, a sequence complementary to the target protospacer sequence of:
TABLE-US-00001
LTR A: ATCAGATATCCACTGACCTTTGG (SEQ ID NO: 96),
LTR B: CAGCAGTTCTTGAAGTACTCCGG (SEQ ID NO: 121),
LTR C: GATTGGCAGAACTACACACCAGG (SEQ ID NO: 87), or
LTR D: GCGTGGCCTGGGCGGGACTGGGG (SEQ ID NO: 110).
[0046] The locations of LTR A (SEQ ID NO: 96), LTR B (SEQ ID NO: 121), LTR C (SEQ ID NO: 87) and LTR D (SEQ ID NO: 110) within the U3 (SEQ ID NO: 16) region are shown in FIG. 5. Additional exemplary guide RNA sequences that target the U3 region are listed in the table shown in FIG. 13 and can have the sequence of any of SEQ ID NOs: 79-111 and SEQ ID NOs: 111-141. In some embodiments, the guide sequence can comprise a sequence having 95% identity to any of SEQ ID NOs: 79-111 and SEQ ID NOs: 111-141. Thus, a guide RNA sequence can comprise, for example, a sequence having 95% identity to a sequence complementary to the target protospacer sequence of:
TABLE-US-00002
LTR A: ATCAGATATCCACTGACCTTTGG (SEQ ID NO: 96),
LTR B: CAGCAGTTCTTGAAGTACTCCGG (SEQ ID NO: 121),
LTR C: GATTGGCAGAACTACACACCAGG (SEQ ID NO: 87), or
LTR D: GCGTGGCCTGGGCGGGACTGGGG (SEQ ID NO: 110).
[0047] We may also refer to the guide RNA sequence as a spacer, e.g., spacer (A), spacer (B), spacer (C), and spacer (D).
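The relationship between the listed target sequences and the corresponding spacers can be sketched as follows. Each 23-nucleotide target above ends in an NGG motif, so the sketch assumes that the first 20 nucleotides are the protospacer and the last 3 are the PAM, and that the spacer RNA carries the protospacer sequence with U in place of T (so that it base-pairs with the opposite DNA strand). The helper names `split_target` and `spacer_rna` are hypothetical.

```python
# Target sequences (protospacer + PAM) as listed in this document.
TARGETS = {
    "LTR A": "ATCAGATATCCACTGACCTTTGG",  # SEQ ID NO: 96
    "LTR B": "CAGCAGTTCTTGAAGTACTCCGG",  # SEQ ID NO: 121
    "LTR C": "GATTGGCAGAACTACACACCAGG",  # SEQ ID NO: 87
    "LTR D": "GCGTGGCCTGGGCGGGACTGGGG",  # SEQ ID NO: 110
}


def split_target(target, pam_len=3):
    """Split a target into (protospacer, PAM), assuming a 3' PAM of `pam_len` nt."""
    return target[:-pam_len], target[-pam_len:]


def spacer_rna(target):
    """Return the spacer as RNA: the protospacer sequence with T replaced by U."""
    protospacer, _pam = split_target(target)
    return protospacer.replace("T", "U")


for name, target in TARGETS.items():
    protospacer, pam = split_target(target)
    print(name, spacer_rna(target), pam)
```

The scaffold portion of an sgRNA (the crRNA/tracrRNA-derived sequence) is not modeled here.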
[0048] The guide RNA sequence can be complementary to a sequence found within an HIV-1 U3, R, or U5 region reference sequence or consensus sequence. The invention is not so limited, however, and the guide RNA sequences can be selected to target any variant or mutant HIV sequence. In some embodiments, more than one guide RNA sequence is employed, for example a first guide RNA sequence and a second guide RNA sequence, with the first and second guide RNA sequences being complementary to target sequences in any of the above-mentioned retroviral regions. In some embodiments, the guide RNA can include a variant sequence or quasi-species sequence. In some embodiments, the guide RNA can be a sequence corresponding to a sequence in the genome of the virus harbored by the subject undergoing treatment. Thus, for example, the sequence of the particular U3, R, or U5 region in the HIV virus harbored by the subject can be obtained and guide RNAs complementary to the patient's particular sequences can be used.
[0049] In some embodiments, the guide RNA can be a sequence complementary to a protein coding sequence, for example, a sequence encoding one or more viral structural proteins (e.g., gag, pol, env and tat). Thus, the sequence can be complementary to a sequence within the gag polyprotein, e.g., MA (matrix protein, p17); CA (capsid protein, p24); SP1 (spacer peptide 1, p2); NC (nucleocapsid protein, p7); SP2 (spacer peptide 2, p1) and P6 protein; pol, e.g., reverse transcriptase (RT) and RNase H, integrase (IN), and HIV protease (PR); env, e.g., gp160, or a cleavage product of gp160, e.g., gp120 or SU, and gp41 or TM; or tat, e.g., the 72-amino acid one-exon Tat or the 86-101 amino-acid two-exon Tat. In some embodiments, the guide RNA can be a sequence complementary to a sequence encoding an accessory protein, including, for example, vif, nef (negative factor), vpu (Virus protein U) and tev.
[0050] In some embodiments, the sequence can be a sequence complementary to a structural or regulatory element, for example, an LTR, as described above; TAR (Target sequence for viral transactivation), the binding site for Tat protein and for cellular proteins, which consists of approximately the first 45 nucleotides of the viral mRNAs in HIV-1 (or the first 100 nucleotides in HIV-2) and forms a hairpin stem-loop structure; RRE (Rev responsive element), an RNA element encoded within the env region of HIV-1, consisting of approximately 200 nucleotides (positions 7710 to 8061 from the start of transcription in HIV-1, spanning the border of gp120 and gp41); PE (Psi element), a set of 4 stem-loop structures preceding and overlapping the Gag start codon; SLIP, a TTTTTT "slippery site", followed by a stem-loop structure; CRS (Cis-acting repressive sequences); and INS (Inhibitory/Instability RNA sequences), found, for example, at nucleotides 414 to 631 in the gag region of HIV-1.
[0051] The guide RNA sequence can be a sense or anti-sense sequence. The guide RNA sequence generally includes a proto-spacer adjacent motif (PAM). The sequence of the PAM can vary depending upon the specificity requirements of the CRISPR endonuclease used. In the CRISPR-Cas system derived from S. pyogenes, the target DNA typically immediately precedes a 5'-NGG proto-spacer adjacent motif (PAM). Thus, for the S. pyogenes Cas9, the PAM sequence can be AGG, TGG, CGG or GGG. Other Cas9 orthologs may have different PAM specificities. For example, Cas9 from S. thermophilus requires 5'-NNAGAA for CRISPR1 and 5'-NGGNG for CRISPR3, and Cas9 from Neisseria meningitidis requires 5'-NNNNGATT. The specific sequence of the guide RNA may vary, but, regardless of the sequence, useful guide RNA sequences will be those that minimize off-target effects while achieving high efficiency and complete ablation of the genomically integrated HIV-1 provirus. The length of the guide RNA sequence can vary from about 20 to about 60 or more nucleotides, for example about 20, about 21, about 22, about 23, about 24, about 25, about 26, about 27, about 28, about 29, about 30, about 31, about 32, about 33, about 34, about 35, about 36, about 37, about 38, about 39, about 40, about 45, about 50, about 55, about 60 or more nucleotides. Useful selection methods identify regions having extremely low homology between the foreign viral genome and the host cellular genome, including endogenous retroviral DNA, and include: bioinformatic screening using 12-bp+NGG target-selection criteria to exclude off-target human transcriptome or (even rarely) untranslated-genomic sites; avoiding transcription factor binding sites within the HIV-1 LTR promoter (potentially conserved in the host genome); selection of LTR-A- and -B-directed, 30-bp gRNAs and also a pre-crRNA system reflecting the original bacterial immune mechanism to enhance specificity/efficiency vs. a 20-bp gRNA-, chimeric crRNA-tracrRNA-based system; and WGS, Sanger sequencing and SURVEYOR assay, to identify and exclude potential off-target effects.
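One way to picture the 12-bp+NGG screening criterion is the sketch below, which searches a host sequence on both strands for the 12 PAM-proximal bases of a protospacer followed by an NGG motif. Treating the 12 bp as the PAM-proximal portion of the protospacer is an assumption of this example, as are the function names; a real screen would run against a full genome or transcriptome with a dedicated search tool and would typically tolerate mismatches.

```python
import re

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def seed_pam_hits(protospacer, host_seq, seed_len=12):
    """Find exact matches of the PAM-proximal seed followed by NGG.

    Returns a list of (strand, index) tuples. For the "-" strand the
    index refers to the reverse-complemented host sequence.
    """
    seed = protospacer.upper()[-seed_len:]
    # Lookahead allows overlapping matches to be reported.
    pattern = re.compile("(?=(" + re.escape(seed) + "[ACGT]GG))")
    host = host_seq.upper()
    hits = []
    for strand, seq in (("+", host), ("-", reverse_complement(host))):
        for match in pattern.finditer(seq):
            hits.append((strand, match.start()))
    return hits


# Hypothetical host fragment containing the LTR A seed followed by AGG.
host_fragment = "GGGG" + "TCCACTGACCTTAGG" + "AAAA"
print(seed_pam_hits("ATCAGATATCCACTGACCTT", host_fragment))
```

A candidate protospacer with no hits in the host sequence would pass this particular filter; the remaining criteria in the paragraph above are not modeled.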
[0052] The guide RNA sequence can be configured as a single sequence or as a combination of one or more different sequences, e.g., a multiplex configuration. Multiplex configurations can include combinations of two, three, four, five, six, seven, eight, nine, ten, or more different guide RNAs, for example any combination of sequences in U3, R, or U5. In some embodiments, combinations of LTR A, LTR B, LTR C and LTR D can be used. In some embodiments, combinations of any of the sequences LTR A (SEQ ID NO: 96), LTR B (SEQ ID NO: 121), LTR C (SEQ ID NO: 87), and LTR D (SEQ ID NO: 110), can be used. In some embodiments, any combinations of the sequences having the sequence of SEQ ID NOs: 79-111 and SEQ ID NOs: 111-141 can be used. When the compositions are administered in an expression vector, the guide RNAs can be encoded by a single vector. Alternatively, multiple vectors can be engineered to each include two or more different guide RNAs. Useful configurations will result in the excision of viral sequences between cleavage sites resulting in the ablation of HIV genome or HIV protein expression. Thus, the use of two or more different guide RNAs promotes excision of the viral sequences between the cleavage sites recognized by the CRISPR endonuclease. The excised region can vary in size from a single nucleotide to several thousand nucleotides. Exemplary excised regions are described in the examples.
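The excision that results from two cleavage sites can be sketched on a toy sequence. The example assumes that each target (protospacer plus PAM) occurs once on the given strand, that each cut falls 3 nucleotides upstream of its PAM, and that the two ends rejoin without insertions or deletions; none of these simplifications is asserted for the actual provirus, where each LTR target occurs in both the 5'- and 3'-LTR. The function names and the toy sequence are hypothetical.

```python
def cut_index(seq, target, pam_len=3):
    """Index of the first base downstream of the cut for `target` in `seq`.

    `target` is protospacer + PAM on the given strand; the cut is placed
    3 nt upstream of the PAM. Only the first occurrence is used.
    """
    start = seq.find(target)
    if start < 0:
        raise ValueError("target not found in sequence")
    pam_start = start + len(target) - pam_len
    return pam_start - 3


def excise(seq, target_a, target_b):
    """Return (joined_sequence, excised_fragment) for cuts at two targets."""
    first, second = sorted((cut_index(seq, target_a), cut_index(seq, target_b)))
    return seq[:first] + seq[second:], seq[first:second]


# Toy sequence: LTR A and LTR B targets separated by a 10-nt filler.
LTR_A = "ATCAGATATCCACTGACCTTTGG"
LTR_B = "CAGCAGTTCTTGAAGTACTCCGG"
toy = "AAAA" + LTR_A + "C" * 10 + LTR_B + "TTTT"

joined, excised = excise(toy, LTR_A, LTR_B)
print(len(toy), len(excised), len(joined))
```

With targets placed in both LTRs, the same arithmetic applied to the outermost cut sites would describe excision of the intervening proviral sequence.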
[0053] When the compositions are administered as a nucleic acid or are contained within an expression vector, the CRISPR endonuclease can be encoded by the same nucleic acid or vector as the guide RNA sequences. Alternatively or in addition, the CRISPR endonuclease can be encoded in a physically separate nucleic acid from the guide RNA sequences or in a separate vector.
[0054] In some embodiments, the RNA molecules (e.g., crRNA, tracrRNA, gRNA) are engineered to comprise one or more modified nucleobases. Known modifications of RNA molecules can be found, for example, in Genes VI, Chapter 9 ("Interpreting the Genetic Code"), Lewis, ed. (1997, Oxford University Press, New York), and Modification and Editing of RNA, Grosjean and Benne, eds. (1998, ASM Press, Washington D.C.). Modified RNA components include the following: 2'-O-methylcytidine; N.sup.4-methylcytidine; N.sup.4-2'-O-dimethylcytidine; N.sup.4-acetylcytidine; 5-methylcytidine; 5,2'-O-dimethylcytidine; 5-hydroxymethylcytidine; 5-formylcytidine; 2'-O-methyl-5-formylcytidine; 3-methylcytidine; 2-thiocytidine; lysidine; 2'-O-methyluridine; 2-thiouridine; 2-thio-2'-O-methyluridine; 3,2'-O-dimethyluridine; 3-(3-amino-3-carboxypropyl)uridine; 4-thiouridine; ribosylthymine; 5,2'-O-dimethyluridine; 5-methyl-2-thiouridine; 5-hydroxyuridine; 5-methoxyuridine; uridine 5-oxyacetic acid; uridine 5-oxyacetic acid methyl ester; 5-carboxymethyluridine; 5-methoxycarbonylmethyluridine; 5-methoxycarbonylmethyl-2'-O-methyluridine; 5-methoxycarbonylmethyl-2-thiouridine; 5-carbamoylmethyluridine; 5-carbamoylmethyl-2'-O-methyluridine; 5-(carboxyhydroxymethyl)uridine; 5-(carboxyhydroxymethyl)uridine methyl ester; 5-aminomethyl-2-thiouridine; 5-methylaminomethyluridine; 5-methylaminomethyl-2-thiouridine; 5-methylaminomethyl-2-selenouridine; 5-carboxymethylaminomethyluridine; 5-carboxymethylaminomethyl-2'-O-methyl-uridine; 5-carboxymethylaminomethyl-2-thiouridine; dihydrouridine; dihydroribosylthymine; 2'-O-methyladenosine; 2-methyladenosine; N.sup.6-methyladenosine; N.sup.6, N.sup.6-dimethyladenosine; N.sup.6,2'-O-trimethyladenosine; 2-methylthio-N.sup.6-isopentenyladenosine; N.sup.6-(cis-hydroxyisopentenyl)-adenosine; 2-methylthio-N.sup.6-(cis-hydroxyisopentenyl)-adenosine; N.sup.6-glycinylcarbamoyl adenosine; N.sup.6-threonylcarbamoyl adenosine;
N.sup.6-methyl-N.sup.6-threonylcarbamoyl adenosine; 2-methylthio-N.sup.6-methyl-N.sup.6-threonylcarbamoyl adenosine; N.sup.6-hydroxynorvalylcarbamoyl adenosine; 2-methylthio-N.sup.6-hydroxynorvalylcarbamoyl adenosine; 2'-O-ribosyladenosine (phosphate); inosine; 2'-O-methyl inosine; 1-methyl inosine; 1,2'-O-dimethyl inosine; 2'-O-methyl guanosine; 1-methyl guanosine; N.sup.2-methyl guanosine; N.sup.2,N.sup.2-dimethyl guanosine; N.sup.2,2'-O-dimethyl guanosine; N.sup.2,N.sup.2,2'-O-trimethyl guanosine; 2'-O-ribosyl guanosine (phosphate); 7-methyl guanosine; N.sup.2,7-dimethyl guanosine; N.sup.2,N.sup.2,7-trimethyl guanosine; wyosine; methylwyosine; under-modified hydroxywybutosine; wybutosine; hydroxywybutosine; peroxywybutosine; queuosine; epoxyqueuosine; galactosyl-queuosine; mannosyl-queuosine; 7-cyano-7-deazaguanosine; archaeosine [also called 7-formamido-7-deazaguanosine]; and 7-aminomethyl-7-deazaguanosine.
[0055] We may use the terms "nucleic acid" and "polynucleotide" interchangeably to refer to both RNA and DNA, including cDNA, genomic DNA, synthetic DNA, and DNA (or RNA) containing nucleic acid analogs, any of which may encode a polypeptide of the invention and all of which are encompassed by the invention. Polynucleotides can have essentially any three-dimensional structure. A nucleic acid can be double-stranded or single-stranded (i.e., a sense strand or an antisense strand). Non-limiting examples of polynucleotides include genes, gene fragments, exons, introns, messenger RNA (mRNA) and portions thereof, transfer RNA, ribosomal RNA, siRNA, micro-RNA, ribozymes, cDNA, recombinant polynucleotides, branched polynucleotides, plasmids, vectors, isolated DNA of any sequence, isolated RNA of any sequence, nucleic acid probes, and primers, as well as nucleic acid analogs. In the context of the present invention, nucleic acids can encode a fragment of a naturally occurring Cas9 or a biologically active variant thereof and a guide RNA, wherein the guide RNA is complementary to a sequence in HIV.
[0056] An "isolated" nucleic acid can be, for example, a naturally-occurring DNA molecule or a fragment thereof, provided that at least one of the nucleic acid sequences normally found immediately flanking that DNA molecule in a naturally-occurring genome is removed or absent. Thus, an isolated nucleic acid includes, without limitation, a DNA molecule that exists as a separate molecule, independent of other sequences (e.g., a chemically synthesized nucleic acid, or a cDNA or genomic DNA fragment produced by the polymerase chain reaction (PCR) or restriction endonuclease treatment). An isolated nucleic acid also refers to a DNA molecule that is incorporated into a vector, an autonomously replicating plasmid, a virus, or into the genomic DNA of a prokaryote or eukaryote. In addition, an isolated nucleic acid can include an engineered nucleic acid such as a DNA molecule that is part of a hybrid or fusion nucleic acid. A nucleic acid existing among many (e.g., dozens, or hundreds to millions) of other nucleic acids within, for example, cDNA libraries or genomic libraries, or gel slices containing a genomic DNA restriction digest, is not an isolated nucleic acid.
[0057] Isolated nucleic acid molecules can be produced by standard techniques. For example, polymerase chain reaction (PCR) techniques can be used to obtain an isolated nucleic acid containing a nucleotide sequence described herein, including nucleotide sequences encoding a polypeptide described herein. PCR can be used to amplify specific sequences from DNA as well as RNA, including sequences from total genomic DNA or total cellular RNA. Various PCR methods are described in, for example, PCR Primer: A Laboratory Manual, Dieffenbach and Dveksler, eds., Cold Spring Harbor Laboratory Press, 1995. Generally, sequence information from the ends of the region of interest or beyond is employed to design oligonucleotide primers that are identical or similar in sequence to opposite strands of the template to be amplified. Various PCR strategies also are available by which site-specific nucleotide sequence modifications can be introduced into a template nucleic acid.
[0058] Isolated nucleic acids also can be chemically synthesized, either as a single nucleic acid molecule (e.g., using automated DNA synthesis in the 3' to 5' direction using phosphoramidite technology) or as a series of oligonucleotides. For example, one or more pairs of long oligonucleotides (e.g., >50-100 nucleotides) can be synthesized that contain the desired sequence, with each pair containing a short segment of complementarity (e.g., about 15 nucleotides) such that a duplex is formed when the oligonucleotide pair is annealed. DNA polymerase is used to extend the oligonucleotides, resulting in a single, double-stranded nucleic acid molecule per oligonucleotide pair, which then can be ligated into a vector. Isolated nucleic acids of the invention also can be obtained by mutagenesis of, e.g., a naturally occurring portion of a Cas9-encoding DNA (in accordance with, for example, the formula above).
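By way of illustration only, the overlapping-oligonucleotide approach described above can be sketched in Python. The function names (`design_oligo_pair`, `extend_pair`), the placement of the overlap near the middle of the target, and the 120-nucleotide placeholder sequence are assumptions made for this sketch and are not taken from the specification. The sketch splits a desired sequence into one oligonucleotide pair sharing a 15-nucleotide complementary segment and then models polymerase extension of the annealed pair.

```python
# Minimal sketch: one oligonucleotide pair with a short complementary
# overlap, followed by a simulated polymerase fill-in.
# All names and the midpoint split are illustrative assumptions.

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (5'->3')."""
    return seq.translate(COMPLEMENT)[::-1]


def design_oligo_pair(target: str, overlap: int = 15) -> tuple[str, str]:
    """Split `target` into a forward and a reverse oligo whose 3' ends
    are complementary over `overlap` nucleotides."""
    if len(target) <= overlap:
        raise ValueError("target must be longer than the overlap")
    start = len(target) // 2 - overlap // 2  # overlap centered on the midpoint
    end = start + overlap
    forward = target[:end]                        # top strand, 5'->3'
    reverse = reverse_complement(target[start:])  # bottom strand, 5'->3'
    return forward, reverse


def extend_pair(forward: str, reverse: str, overlap: int = 15) -> str:
    """Model annealing plus polymerase extension; return the top strand
    of the resulting double-stranded molecule."""
    bottom_as_top = reverse_complement(reverse)
    if forward[-overlap:] != bottom_as_top[:overlap]:
        raise ValueError("oligos do not anneal over the expected overlap")
    return forward + bottom_as_top[overlap:]


if __name__ == "__main__":
    target = "ATGC" * 30  # 120-nt placeholder, not a real Cas9 sequence
    fwd, rev = design_oligo_pair(target)
    print(len(fwd), len(rev), extend_pair(fwd, rev) == target)
```

In this sketch each oligonucleotide is longer than 50 nucleotides, and the extended product reproduces the full desired sequence, which could then be ligated into a vector as described.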
[0059] Two nucleic acids or the polypeptides they encode may be described as having a certain degree of identity to one another. For example, a Cas9 protein and a biologically active variant thereof may be described as exhibiting a certain degree of identity. Alignments may be assembled by locating short Cas9 sequences in the Protein Information Resource (PIR) site, followed by analysis with the "short nearly identical sequences" Basic Local Alignment Search Tool (BLAST) algorithm on the NCBI website.
[0060] As used herein, the term "percent sequence identity" refers to the degree of identity between any given query sequence and a subject sequence. For example, a naturally occurring Cas9 can be the query sequence and a fragment of a Cas9 protein can be the subject sequence. Similarly, a fragment of a Cas9 protein can be the query sequence and a biologically active variant thereof can be the subject sequence.
[0061] To determine sequence identity, a query nucleic acid or amino acid sequence can be aligned to one or more subject nucleic acid or amino acid sequences, respectively, using the computer program ClustalW (version 1.83, default parameters), which allows alignments of nucleic acid or protein sequences to be carried out across their entire length (global alignment). See Chenna et al., Nucleic Acids Res. 31:3497-3500, 2003.
[0062] ClustalW calculates the best match between a query and one or more subject sequences and aligns them so that identities, similarities and differences can be determined. Gaps of one or more residues can be inserted into a query sequence, a subject sequence, or both, to maximize sequence alignments. For fast pairwise alignment of nucleic acid sequences, the following default parameters are used: word size: 2; window size: 4; scoring method: percentage; number of top diagonals: 4; and gap penalty: 5. For multiple alignments of nucleic acid sequences, the following parameters are used: gap opening penalty: 10.0; gap extension penalty: 5.0; and weight transitions: yes. For fast pairwise alignment of protein sequences, the following parameters are used: word size: 1; window size: 5; scoring method: percentage; number of top diagonals: 5; gap penalty: 3. For multiple alignment of protein sequences, the following parameters are used: weight matrix: blosum; gap opening penalty: 10.0; gap extension penalty: 0.05; hydrophilic gaps: on; hydrophilic residues: Gly, Pro, Ser, Asn, Asp, Gln, Glu, Arg, and Lys; residue-specific gap penalties: on. The output is a sequence alignment that reflects the relationship between sequences. ClustalW can be run, for example, at the Baylor College of Medicine Search Launcher site (searchlauncher.bcm.tmc.edu/multi-align/multi-align.html) and at the European Bioinformatics Institute site on the World Wide Web (ebi.ac.uk/clustalw).
[0063] To determine a percent identity between a query sequence and a subject sequence, ClustalW divides the number of identities in the best alignment by the number of residues compared (gap positions are excluded), and multiplies the result by 100. The output is the percent identity of the subject sequence with respect to the query sequence. It is noted that the percent identity value can be rounded to the nearest tenth. For example, 78.11, 78.12, 78.13, and 78.14 are rounded down to 78.1, while 78.15, 78.16, 78.17, 78.18, and 78.19 are rounded up to 78.2.
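As a non-limiting sketch of the calculation just described, the following Python code computes percent identity from two already-aligned sequences, excluding gap positions, and rounds to the nearest tenth with halves rounded up. It does not reproduce ClustalW itself; the function names, the use of "-" as the gap character, and the short test sequences are assumptions made for illustration.

```python
# Minimal sketch of the percent-identity arithmetic, assuming the two
# sequences have already been aligned (e.g., by ClustalW) and that "-"
# marks a gap. Names and inputs are illustrative assumptions.
from decimal import Decimal, ROUND_HALF_UP


def round_to_tenth(value: Decimal) -> Decimal:
    """Round to the nearest tenth; 78.15 rounds up to 78.2."""
    return value.quantize(Decimal("0.1"), rounding=ROUND_HALF_UP)


def percent_identity(aligned_query: str, aligned_subject: str,
                     gap: str = "-") -> Decimal:
    """Identities divided by residues compared (gap positions excluded),
    multiplied by 100 and rounded to the nearest tenth."""
    if len(aligned_query) != len(aligned_subject):
        raise ValueError("aligned sequences must have equal length")
    identities = 0
    compared = 0
    for q, s in zip(aligned_query.upper(), aligned_subject.upper()):
        if q == gap or s == gap:
            continue  # gap positions are not counted as compared residues
        compared += 1
        if q == s:
            identities += 1
    if compared == 0:
        raise ValueError("no residues to compare")
    return round_to_tenth(Decimal(identities) * 100 / Decimal(compared))


if __name__ == "__main__":
    # 8 residues compared (one gap column skipped), 7 identical.
    print(percent_identity("ACGT-ACGT", "ACGTTACGA"))
```

Decimal arithmetic is used here rather than binary floating point so that values such as 78.15 round up to 78.2 exactly as stated in the text.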
[0064] The nucleic acids and polypeptides described herein may be referred to as "exogenous". The term "exogenous" indicates that the nucleic acid or polypeptide is part of, or encoded by, a recombinant nucleic acid construct, or is not in its natural environment. For example, an exogenous nucleic acid can be a sequence from one species introduced into another species, i.e., a heterologous nucleic acid. Typically, such an exogenous nucleic acid is introduced into the other species via a recombinant nucleic acid construct. An exogenous nucleic acid can also be a sequence that is native to an organism and that has been reintroduced into cells of that organism. An exogenous nucleic acid that includes a native sequence can often be distinguished from the naturally occurring sequence by the presence of non-natural sequences linked to the exogenous nucleic acid, e.g., non-native regulatory sequences flanking a native sequence in a recombinant nucleic acid construct. In addition, stably transformed exogenous nucleic acids typically are integrated at positions other than the position where the native sequence is found.
[0065] Recombinant constructs are also provided herein and can be used to transform cells in order to express Cas9 and/or a guide RNA complementary to a target sequence in HIV. A recombinant nucleic acid construct comprises a nucleic acid encoding a Cas9 and/or a guide RNA complementary to a target sequence in HIV as described herein, operably linked to a regulatory region suitable for expressing the Cas9 and/or a guide RNA complementary to a target sequence in HIV in the cell. It will be appreciated that a number of nucleic acids can encode a polypeptide having a particular amino acid sequence. The degeneracy of the genetic code is well known in the art. For many amino acids, there is more than one nucleotide triplet that serves as the codon for the amino acid. For example, codons in the coding sequence for Cas9 can be modified such that optimal expression in a particular organism is obtained, using appropriate codon bias tables for that organism.
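The use of a codon bias table can be illustrated with the following Python sketch, which back-translates a short peptide by choosing, for each amino acid, the codon with the highest listed frequency. The table `TOY_CODON_USAGE`, its frequency values, and the function name `back_translate` are hypothetical placeholders introduced for this example; the frequencies are not measured values for any organism, and a real table for the organism of interest would be substituted.

```python
# Minimal sketch of codon selection from a codon-usage table.
# TOY_CODON_USAGE is a hypothetical, partial table; the frequencies are
# placeholders for illustration only, not data for any real organism.
TOY_CODON_USAGE = {
    "M": {"ATG": 1.0},
    "K": {"AAA": 0.4, "AAG": 0.6},
    "L": {"CTG": 0.5, "CTC": 0.2, "TTG": 0.3},
    "*": {"TAA": 0.3, "TGA": 0.7},  # stop codons
}


def back_translate(protein: str, usage: dict = TOY_CODON_USAGE) -> str:
    """Return a coding sequence that uses, for each residue, the codon
    with the highest frequency in `usage`."""
    codons = []
    for residue in protein:
        if residue not in usage:
            raise ValueError(f"no codon data for residue {residue!r}")
        choices = usage[residue]
        codons.append(max(choices, key=choices.get))
    return "".join(codons)


if __name__ == "__main__":
    # Several different DNA sequences encode the same peptide; this one
    # uses the most frequent codon listed for each residue.
    print(back_translate("MKL*"))
```

Because more than one codon serves most amino acids, replacing the table changes the nucleotide sequence without changing the encoded polypeptide.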
[0066] Vectors containing nucleic acids such as those described herein also are provided. A "vector" is a replicon, such as a plasmid, phage, or cosmid, into which another DNA segment may be inserted so as to bring about the replication of the inserted segment. Generally, a vector is capable of replication when associated with the proper control elements. Suitable vector backbones include, for example, those routinely used in the art such as plasmids, viruses, artificial chromosomes, BACs, YACs, or PACs. The term "vector" includes cloning and expression vectors, as well as viral vectors and integrating vectors. An "expression vector" is a vector that includes a regulatory region. A wide variety of host/expression vector combinations may be used to express the nucleic acid sequences described herein. Suitable expression vectors include, without limitation, plasmids and viral vectors derived from, for example, bacteriophage, baculoviruses, and retroviruses. Numerous vectors and expression systems are commercially available from such corporations as Novagen (Madison, Wis.), Clontech (Palo Alto, Calif.), Stratagene (La Jolla, Calif.), and Invitrogen/Life Technologies (Carlsbad, Calif.).
[0067] The vectors provided herein also can include, for example, origins of replication, scaffold attachment regions (SARs), and/or markers. A marker gene can confer a selectable phenotype on a host cell. For example, a marker can confer biocide resistance, such as resistance to an antibiotic (e.g., kanamycin, G418, bleomycin, or hygromycin). As noted above, an expression vector can include a tag sequence designed to facilitate manipulation or detection (e.g., purification or localization) of the expressed polypeptide. Tag sequences, such as green fluorescent protein (GFP), glutathione S-transferase (GST), polyhistidine, c-myc, hemagglutinin, or Flag.TM. tag (Kodak, New Haven, Conn.) sequences typically are expressed as a fusion with the encoded polypeptide. Such tags can be inserted anywhere within the polypeptide, including at either the carboxyl or amino terminus.
[0068] Additional expression vectors also can include, for example, segments of chromosomal, non-chromosomal and synthetic DNA sequences. Suitable vectors include derivatives of SV40 and known bacterial plasmids, e.g., E. coli plasmids col E1, pCR1, pBR322, pMal-C2, pET, pGEX, pMB9 and their derivatives; plasmids such as RP4; phage DNAs, e.g., the numerous derivatives of phage lambda, e.g., NM989, and other phage DNA, e.g., M13 and filamentous single stranded phage DNA; yeast plasmids such as the 2.mu. plasmid or derivatives thereof; vectors useful in eukaryotic cells, such as vectors useful in insect or mammalian cells; and vectors derived from combinations of plasmids and phage DNAs, such as plasmids that have been modified to employ phage DNA or other expression control sequences.
[0069] Yeast expression systems can also be used. For example, the non-fusion pYES2 vector (XbaI, SphI, XhoI, NotI, BstXI, EcoRI, BstXI, BamHI, SacI, KpnI, and HindIII cloning sites; Invitrogen) or the fusion pYESHisA, B, C (XbaI, SphI, XhoI, NotI, BstXI, EcoRI, BamHI, SacI, KpnI, and HindIII cloning sites, N-terminal peptide purified with ProBond resin and cleaved with enterokinase; Invitrogen), to mention just two, can be employed according to the invention. A yeast two-hybrid expression system can also be prepared in accordance with the invention.
[0070] The vector can also include a regulatory region. The term "regulatory region" refers to nucleotide sequences that influence transcription or translation initiation and rate, and stability and/or mobility of a transcription or translation product. Regulatory regions include, without limitation, promoter sequences, enhancer sequences, response elements, protein recognition sites, inducible elements, protein binding sequences, 5' and 3' untranslated regions (UTRs), transcriptional start sites, termination sequences, polyadenylation sequences, nuclear localization signals, and introns.
[0071] As used herein, the term "operably linked" refers to positioning of a regulatory region and a sequence to be transcribed in a nucleic acid so as to influence transcription or translation of such a sequence. For example, to bring a coding sequence under the control of a promoter, the translation initiation site of the translational reading frame of the polypeptide is typically positioned between one and about fifty nucleotides downstream of the promoter. A promoter can, however, be positioned as much as about 5,000 nucleotides upstream of the translation initiation site or about 2,000 nucleotides upstream of the transcription start site. A promoter typically comprises at least a core (basal) promoter. A promoter also may include at least one control element, such as an enhancer sequence, an upstream element or an upstream activation region (UAR). The choice of promoters to be included depends upon several factors, including, but not limited to, efficiency, selectability, inducibility, desired expression level, and cell- or tissue-preferential expression. It is a routine matter for one of skill in the art to modulate the expression of a coding sequence by appropriately selecting and positioning promoters and other regulatory regions relative to the coding sequence.
[0072] Vectors include, for example, viral vectors (such as adenoviruses ("Ad"), adeno-associated viruses (AAV), and vesicular stomatitis virus (VSV) and retroviruses), liposomes and other lipid-containing complexes, and other macromolecular complexes capable of mediating delivery of a polynucleotide to a host cell. Vectors can also comprise other components or functionalities that further modulate gene delivery and/or gene expression, or that otherwise provide beneficial properties to the targeted cells. As described and illustrated in more detail below, such other components include, for example, components that influence binding or targeting to cells (including components that mediate cell-type or tissue-specific binding); components that influence uptake of the vector nucleic acid by the cell; components that influence localization of the polynucleotide within the cell after uptake (such as agents mediating nuclear localization); and components that influence expression of the polynucleotide. Such components also might include markers, such as detectable and/or selectable markers that can be used to detect or select for cells that have taken up and are expressing the nucleic acid delivered by the vector. Such components can be provided as a natural feature of the vector (such as the use of certain viral vectors which have components or functionalities mediating binding and uptake), or vectors can be modified to provide such functionalities. Other vectors include those described by Chen et al; BioTechniques, 34: 167-171 (2003). A large variety of such vectors are known in the art and are generally available.
[0073] A "recombinant viral vector" refers to a viral vector comprising one or more heterologous gene products or sequences. Since many viral vectors exhibit size-constraints associated with packaging, the heterologous gene products or sequences are typically introduced by replacing one or more portions of the viral genome. Such viruses may become replication-defective, requiring the deleted function(s) to be provided in trans during viral replication and encapsidation (by using, e.g., a helper virus or a packaging cell line carrying gene products necessary for replication and/or encapsidation). Modified viral vectors in which a polynucleotide to be delivered is carried on the outside of the viral particle have also been described (see, e.g., Curiel, D T, et al. PNAS 88: 8850-8854, 1991).
[0074] Suitable nucleic acid delivery systems include a recombinant viral vector, typically with sequence from at least one of an adenovirus, adeno-associated virus (AAV), helper-dependent adenovirus, retrovirus, or hemagglutinating virus of Japan-liposome (HVJ) complex. In such cases, the viral vector comprises a strong eukaryotic promoter, e.g., a cytomegalovirus (CMV) promoter, operably linked to the polynucleotide. The recombinant viral vector can include one or more of the polynucleotides therein, preferably about one polynucleotide. In some embodiments, the viral vector used in the invention methods has a pfu (plaque forming units) of from about 10.sup.8 to about 5.times.10.sup.10 pfu. In embodiments in which the polynucleotide is to be administered with a non-viral vector, use of from about 0.1 nanograms to about 4000 micrograms will often be useful, e.g., about 1 nanogram to about 100 micrograms.
[0075] Additional vectors include viral vectors, fusion proteins and chemical conjugates. Retroviral vectors include Moloney murine leukemia viruses and HIV-based viruses. One HIV-based viral vector comprises at least two vectors wherein the gag and pol genes are from an HIV genome and the env gene is from another virus. DNA viral vectors include pox vectors such as orthopox or avipox vectors, herpesvirus vectors such as a herpes simplex I virus (HSV) vector [Geller, A. I. et al., J. Neurochem. 64:487 (1995); Lim, F., et al., in DNA Cloning: Mammalian Systems, D. Glover, Ed. (Oxford Univ. Press, Oxford, England) (1995); Geller, A. I. et al., Proc. Natl. Acad. Sci. USA 90:7603 (1993); Geller, A. I., et al., Proc. Natl. Acad. Sci. USA 87:1149 (1990)], Adenovirus Vectors [LeGal LaSalle et al., Science 259:988 (1993); Davidson, et al., Nat. Genet. 3:219 (1993); Yang, et al., J. Virol. 69:2004 (1995)] and Adeno-associated Virus Vectors [Kaplitt, M. G., et al., Nat. Genet. 8:148 (1994)].
[0076] Pox viral vectors introduce the gene into the cell's cytoplasm. Avipox virus vectors result in only short-term expression of the nucleic acid. Adenovirus vectors, adeno-associated virus vectors and herpes simplex virus (HSV) vectors may be indicated for some embodiments of the invention. The adenovirus vector results in shorter-term expression (e.g., less than about a month) than the adeno-associated virus vector, which, in some embodiments, may exhibit much longer expression. The particular vector chosen will depend upon the target cell and the condition being treated. The selection of appropriate promoters can readily be accomplished. An example of a suitable promoter is the 763-base-pair cytomegalovirus (CMV) promoter. Other suitable promoters which may be used for gene expression include, but are not limited to, the Rous sarcoma virus (RSV) promoter (Davis, et al., Hum Gene Ther 4:151 (1993)), the SV40 early promoter region, the herpes thymidine kinase promoter, the regulatory sequences of the metallothionein (MMT) gene, prokaryotic expression vectors such as the .beta.-lactamase promoter, the tac promoter, promoter elements from yeast or other fungi such as the Gal 4 promoter, the ADC (alcohol dehydrogenase) promoter, PGK (phosphoglycerate kinase) promoter, alkaline phosphatase promoter; and the animal transcriptional control regions, which exhibit tissue specificity and have been utilized in transgenic animals: elastase I gene control region which is active in pancreatic acinar cells, insulin gene control region which is active in pancreatic beta cells, immunoglobulin gene control region which is active in lymphoid cells, mouse mammary tumor virus control region which is active in testicular, breast, lymphoid and mast cells, albumin gene control region which is active in liver, alpha-fetoprotein gene control region which is active in liver, alpha 1-antitrypsin gene control region which is active in the liver, beta-globin gene control region which is active in myeloid cells, myelin basic protein gene control region which is active in oligodendrocyte cells in the brain, myosin light chain-2 gene control region which is active in skeletal muscle, and gonadotropin-releasing hormone gene control region which is active in the hypothalamus. Certain proteins can be expressed using their native promoter. Other elements that can enhance expression can also be included, such as an enhancer or a system that results in high levels of expression, such as a tat gene and tar element. This cassette can then be inserted into a vector, e.g., a plasmid vector such as pUC19, pUC118, pBR322, or other known plasmid vectors, that includes, for example, an E. coli origin of replication. See, Sambrook, et al., Molecular Cloning: A Laboratory Manual, Cold Spring Harbor Laboratory Press, (1989). The plasmid vector may also include a selectable marker such as the .beta.-lactamase gene for ampicillin resistance, provided that the marker polypeptide does not adversely affect the metabolism of the organism being treated. The cassette can also be bound to a nucleic acid binding moiety in a synthetic delivery system, such as the system disclosed in WO 95/22618.
[0077] If desired, the polynucleotides of the invention may also be used with a microdelivery vehicle such as cationic liposomes and adenoviral vectors. For a review of the procedures for liposome preparation, targeting and delivery of contents, see Mannino and Gould-Fogerite, BioTechniques, 6:682 (1988). See also, Felgner and Holm, Bethesda Res. Lab. Focus, 11(2):21 (1989) and Maurer, R. A., Bethesda Res. Lab. Focus, 11(2):25 (1989).
[0078] Replication-defective recombinant adenoviral vectors can be produced in accordance with known techniques. See, Quantin, et al., Proc. Natl. Acad. Sci. USA, 89:2581-2584 (1992); Stratford-Perricaudet, et al., J. Clin. Invest., 90:626-630 (1992); and Rosenfeld, et al., Cell, 68:143-155 (1992).
[0079] Another delivery method is to use single stranded DNA producing vectors which can produce the expressed products intracellularly. See for example, Chen et al, BioTechniques, 34: 167-171 (2003), which is incorporated herein, by reference, in its entirety.
[0080] Pharmaceutical Compositions
[0081] As described above, the compositions of the present invention can be prepared in a variety of ways known to one of ordinary skill in the art. Regardless of their original source or the manner in which they are obtained, the compositions of the invention can be formulated in accordance with their use. For example, the nucleic acids and vectors described above can be formulated within compositions for application to cells in tissue culture or for administration to a patient or subject. Any of the pharmaceutical compositions of the invention can be formulated for use in the preparation of a medicament, and particular uses are indicated below in the context of treatment, e.g., the treatment of a subject having an HIV infection or at risk for contracting an HIV infection. When employed as pharmaceuticals, any of the nucleic acids and vectors can be administered in the form of pharmaceutical compositions. These compositions can be prepared in a manner well known in the pharmaceutical art, and can be administered by a variety of routes, depending upon whether local or systemic treatment is desired and upon the area to be treated. Administration may be topical (including ophthalmic and to mucous membranes including intranasal, vaginal and rectal delivery), pulmonary (e.g., by inhalation or insufflation of powders or aerosols, including by nebulizer; intratracheal, intranasal, epidermal and transdermal), ocular, oral or parenteral. Methods for ocular delivery can include topical administration (eye drops), subconjunctival, periocular or intravitreal injection or introduction by balloon catheter or ophthalmic inserts surgically placed in the conjunctival sac. Parenteral administration includes intravenous, intra-arterial, subcutaneous, intraperitoneal or intramuscular injection or infusion; or intracranial, e.g., intrathecal or intraventricular administration. 
Parenteral administration can be in the form of a single bolus dose, or may be, for example, by a continuous perfusion pump. Pharmaceutical compositions and formulations for topical administration may include transdermal patches, ointments, lotions, creams, gels, drops, suppositories, sprays, liquids, powders, and the like. Conventional pharmaceutical carriers, aqueous, powder or oily bases, thickeners and the like may be necessary or desirable.
[0082] This invention also includes pharmaceutical compositions which contain, as the active ingredient, nucleic acids and vectors described herein in combination with one or more pharmaceutically acceptable carriers. We use the terms "pharmaceutically acceptable" (or "pharmacologically acceptable") to refer to molecular entities and compositions that do not produce an adverse, allergic or other untoward reaction when administered to an animal or a human, as appropriate. The term "pharmaceutically acceptable carrier," as used herein, includes any and all solvents, dispersion media, coatings, antibacterial, isotonic and absorption delaying agents, buffers, excipients, binders, lubricants, gels, surfactants and the like, that may be used as media for a pharmaceutically acceptable substance. In making the compositions of the invention, the active ingredient is typically mixed with an excipient, diluted by an excipient or enclosed within such a carrier in the form of, for example, a capsule, tablet, sachet, paper, or other container. When the excipient serves as a diluent, it can be a solid, semisolid, or liquid material (e.g., normal saline), which acts as a vehicle, carrier or medium for the active ingredient. Thus, the compositions can be in the form of tablets, pills, powders, lozenges, sachets, cachets, elixirs, suspensions, emulsions, solutions, syrups, aerosols (as a solid or in a liquid medium), lotions, creams, ointments, gels, soft and hard gelatin capsules, suppositories, sterile injectable solutions, and sterile packaged powders. As is known in the art, the type of diluent can vary depending upon the intended route of administration. The resulting compositions can include additional agents, such as preservatives. In some embodiments, the carrier can be, or can include, a lipid-based or polymer-based colloid. 
In some embodiments, the carrier material can be a colloid formulated as a liposome, a hydrogel, a microparticle, a nanoparticle, or a block copolymer micelle. As noted, the carrier material can form a capsule, and that material may be a polymer-based colloid.
[0083] The nucleic acid sequences of the invention can be delivered to an appropriate cell of a subject. This can be achieved by, for example, the use of a polymeric, biodegradable microparticle or microcapsule delivery vehicle, sized to optimize phagocytosis by phagocytic cells such as macrophages. For example, PLGA (poly-lacto-co-glycolide) microparticles approximately 1-10 .mu.m in diameter can be used. The polynucleotide is encapsulated in these microparticles, which are taken up by macrophages and gradually biodegraded within the cell, thereby releasing the polynucleotide. Once released, the DNA is expressed within the cell. A second type of microparticle is intended not to be taken up directly by cells, but rather to serve primarily as a slow-release reservoir of nucleic acid that is taken up by cells only upon release from the microparticle through biodegradation. These polymeric particles should therefore be large enough to preclude phagocytosis (i.e., larger than 5 .mu.m and preferably larger than 20 .mu.m). Another way to achieve uptake of the nucleic acid is using liposomes, prepared by standard methods. The nucleic acids can be incorporated alone into these delivery vehicles or co-incorporated with tissue-specific antibodies, for example antibodies that target cell types that are commonly latently infected reservoirs of HIV infection, for example, brain macrophages, microglia, astrocytes, and gut-associated lymphoid cells. Alternatively, one can prepare a molecular complex composed of a plasmid or other vector attached to poly-L-lysine by electrostatic or covalent forces. Poly-L-lysine binds to a ligand that can bind to a receptor on target cells. Delivery of "naked DNA" (i.e., without a delivery vehicle) to an intramuscular, intradermal, or subcutaneous site, is another means to achieve in vivo expression. 
In the relevant polynucleotides (e.g., expression vectors), the isolated nucleic acid sequence comprising a sequence encoding a CRISPR-associated endonuclease and a guide RNA is operatively linked to a promoter or enhancer-promoter combination. Promoters and enhancers are described above.
[0084] In some embodiments, the compositions of the invention can be formulated as a nanoparticle, for example, nanoparticles comprised of a core of high molecular weight linear polyethylenimine (LPEI) complexed with DNA and surrounded by a shell of polyethyleneglycol-modified (PEGylated) low molecular weight LPEI.
[0085] The nucleic acids and vectors may also be applied to a surface of a device (e.g., a catheter) or contained within a pump, patch, or other drug delivery device. The nucleic acids and vectors of the invention can be administered alone, or in a mixture, in the presence of a pharmaceutically acceptable excipient or carrier (e.g., physiological saline). The excipient or carrier is selected on the basis of the mode and route of administration. Suitable pharmaceutical carriers, as well as pharmaceutical necessities for use in pharmaceutical formulations, are described in Remington's Pharmaceutical Sciences (E. W. Martin), a well-known reference text in this field, and in the USP/NF (United States Pharmacopeia and the National Formulary).
[0086] In some embodiments, the compositions may be formulated as a topical gel for blocking sexual transmission of HIV. The topical gel can be applied directly to the skin or mucous membranes of the male or female genital region prior to sexual activity. Alternatively or in addition the topical gel can be applied to the surface or contained within a male or female condom or diaphragm.
[0087] In some embodiments, the compositions can be formulated as a nanoparticle encapsulating a nucleic acid encoding Cas9 or a variant Cas9 and a guide RNA sequence complementary to a target HIV or vector comprising a nucleic acid encoding Cas9 and a guide RNA sequence complementary to a target HIV. Alternatively, the compositions can be formulated as a nanoparticle encapsulating a CRISPR-associated endonuclease polypeptide, e.g., Cas9 or a variant Cas9 and a guide RNA sequence complementary to a target.
[0088] The present formulations can encompass a vector encoding Cas9 and a guide RNA sequence complementary to a target HIV. The guide RNA sequence can include a sequence complementary to a single region, e.g. LTR A, B, C, or D or it can include any combination of sequences complementary to LTR A, B, C, and D. Alternatively the sequence encoding Cas9 and the sequence encoding the guide RNA sequence can be on separate vectors.
[0089] Methods of Treatment
[0090] The compositions disclosed herein are generally and variously useful for treatment of a subject having a retroviral infection, e.g., an HIV infection. We may refer to a subject, patient, or individual interchangeably. The methods are useful for targeting any HIV, for example, HIV-1, HIV-2, and any circulating recombinant form thereof. A subject is effectively treated whenever a clinically beneficial result ensues. This may mean, for example, a complete resolution of the symptoms of a disease, a decrease in the severity of the symptoms of the disease, or a slowing of the disease's progression. These methods can further include the steps of a) identifying a subject (e.g., a patient and, more specifically, a human patient) who has an HIV infection; and b) providing to the subject a composition comprising a nucleic acid encoding a CRISPR-associated nuclease, e.g., Cas9, and a guide RNA complementary to an HIV target sequence, e.g. an HIV LTR. A subject can be identified using standard clinical tests, for example, immunoassays to detect the presence of HIV antibodies or the HIV polypeptide p24 in the subject's serum, or through HIV nucleic acid amplification assays. An amount of such a composition provided to the subject that results in a complete resolution of the symptoms of the infection, a decrease in the severity of the symptoms of the infection, or a slowing of the infection's progression is considered a therapeutically effective amount. The present methods may also include a monitoring step to help optimize dosing and scheduling as well as predict outcome. In some methods of the present invention, one can first determine whether a patient has a latent HIV-1 infection, and then make a determination as to whether or not to treat the patient with one or more of the compositions described herein. Monitoring can also be used to detect the onset of drug resistance and to rapidly distinguish responsive patients from nonresponsive patients. 
In some embodiments, the methods can further include the step of determining the nucleic acid sequence of the particular HIV harbored by the patient and then designing the guide RNA to be complementary to those particular sequences. For example, one can determine the nucleic acid sequence of a subject's LTR U3, R or U5 region and then design one or more guide RNAs to be precisely complementary to the patient's sequences.
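As a non-limiting illustration of the design step described above, candidate target sites in a sequenced LTR can be enumerated computationally. The following is a minimal sketch, assuming a Cas9 that recognizes a 20-nucleotide protospacer immediately 5' of an NGG protospacer adjacent motif (PAM); the function names, the default spacer length, the PAM pattern, and the toy input are assumptions made for illustration and are not taken from the Examples.

```python
import re

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def find_protospacers(seq, spacer_len=20, pam_regex="[ACGT]GG"):
    """List (start, protospacer, pam) for every forward-strand candidate site.

    Assumption: a candidate is spacer_len nucleotides immediately 5' of a
    PAM matching pam_regex (NGG by default).
    """
    # A lookahead is used so that overlapping candidate sites are all reported.
    pattern = re.compile(rf"(?=([ACGT]{{{spacer_len}}})({pam_regex}))")
    return [(m.start(), m.group(1), m.group(2))
            for m in pattern.finditer(seq.upper())]


# Hypothetical toy sequence, not a real LTR.
toy = "TTTT" + "ACGTACGTACGTACGTACGT" + "TGG" + "AAAA"
for start, protospacer, pam in find_protospacers(toy):
    print(start, protospacer, pam)
```

Sites on the opposite strand can be listed by passing `reverse_complement(seq)` to the same function. Candidates found this way would still need to be screened against the host genome for uniqueness before use.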
[0091] The compositions are also useful for the treatment, for example, as a prophylactic treatment, of a subject at risk for having a retroviral infection, e.g., an HIV infection. These methods can further include the steps of a) identifying a subject at risk for having an HIV infection; and b) providing to the subject a composition comprising a nucleic acid encoding a CRISPR-associated nuclease, e.g., Cas9, and a guide RNA complementary to an HIV target sequence, e.g. an HIV LTR. A subject at risk for having an HIV infection can be, for example, any sexually active individual engaging in unprotected sex, i.e., engaging in sexual activity without the use of a condom; a sexually active individual having another sexually transmitted infection; an intravenous drug user; or an uncircumcised man. A subject at risk for having an HIV infection can be, for example, an individual whose occupation may bring him or her into contact with HIV-infected populations, e.g., healthcare workers or first responders. A subject at risk for having an HIV infection can be, for example, an inmate in a correctional setting or a sex worker, that is, an individual who uses sexual activity for income, employment, or nonmonetary items such as food, drugs, or shelter.
[0092] The compositions can also be administered to a pregnant or lactating woman having an HIV infection in order to reduce the likelihood of transmission of HIV from the mother to her offspring. A pregnant woman infected with HIV can pass the virus to her offspring transplacentally in utero, at the time of delivery through the birth canal or following delivery, through breast milk. The compositions disclosed herein can be administered to the HIV infected mother either prenatally, perinatally or postnatally during the breast-feeding period, or any combination of prenatal, perinatal, and postnatal administration. Compositions can be administered to the mother along with standard antiretroviral therapies as described below. In some embodiments, the compositions of the invention are also administered to the infant immediately following delivery and, in some embodiments, at intervals thereafter. The infant also can receive standard antiretroviral therapy.
[0093] The methods and compositions disclosed herein are useful for the treatment of retroviral infections. Exemplary retroviruses include human immunodeficiency viruses, e.g. HIV-1, HIV-2; simian immunodeficiency virus (SIV); feline immunodeficiency virus (FIV); bovine immunodeficiency virus (BIV); equine infectious anemia virus (EIAV); and caprine arthritis/encephalitis virus (CAEV). The methods disclosed herein can be applied to a wide range of species, e.g., humans, non-human primates (e.g., monkeys), horses or other livestock, dogs, cats, ferrets or other mammals kept as pets, rats, mice, or other laboratory animals.
[0094] The methods of the invention can be expressed in terms of the preparation of a medicament. Accordingly, the invention encompasses the use of the agents and compositions described herein in the preparation of a medicament. The compounds described herein are useful in therapeutic compositions and regimens or for the manufacture of a medicament for use in treatment of diseases or conditions as described herein.
[0095] Any composition described herein can be administered to any part of the host's body for subsequent delivery to a target cell. A composition can be delivered to, without limitation, the brain, the cerebrospinal fluid, joints, nasal mucosa, blood, lungs, intestines, muscle tissues, skin, or the peritoneal cavity of a mammal. In terms of routes of delivery, a composition can be administered by intravenous, intracranial, intraperitoneal, intramuscular, subcutaneous, intrarectal, intravaginal, intrathecal, intratracheal, intradermal, or transdermal injection, by oral or nasal administration, or by gradual perfusion over time. In a further example, an aerosol preparation of a composition can be given to a host by inhalation.
[0096] The dosage required will depend on the route of administration, the nature of the formulation, the nature of the patient's illness, the patient's size, weight, surface area, age, and sex, other drugs being administered, and the judgment of the attending clinicians. Wide variations in the needed dosage are to be expected in view of the variety of cellular targets and the differing efficiencies of various routes of administration. Variations in these dosage levels can be adjusted using standard empirical routines for optimization, as is well understood in the art. Administrations can be single or multiple (e.g., 2- or 3-, 4-, 6-, 8-, 10-, 20-, 50-, 100-, 150-, or more fold). Encapsulation of the compounds in a suitable delivery vehicle (e.g., polymeric microparticles or implantable devices) may increase the efficiency of delivery.
[0097] The duration of treatment with any composition provided herein can be any length of time from as short as one day to as long as the life span of the host (e.g., many years). For example, a compound can be administered once a week (for, for example, 4 weeks to many months or years); once a month (for, for example, three to twelve months or for many years); or once a year for a period of 5 years, ten years, or longer. It is also noted that the frequency of treatment can be variable. For example, the present compounds can be administered once (or twice, three times, etc.) daily, weekly, monthly, or yearly.
[0098] An effective amount of any composition provided herein can be administered to an individual in need of treatment. The term "effective" as used herein refers to any amount that induces a desired response while not inducing significant toxicity in the patient. Such an amount can be determined by assessing a patient's response after administration of a known amount of a particular composition. In addition, the level of toxicity, if any, can be determined by assessing a patient's clinical symptoms before and after administering a known amount of a particular composition. It is noted that the effective amount of a particular composition administered to a patient can be adjusted according to a desired outcome as well as the patient's response and level of toxicity. Significant toxicity can vary for each particular patient and depends on multiple factors including, without limitation, the patient's disease state, age, and tolerance to side effects.
[0099] Any method known to those in the art can be used to determine if a particular response is induced. Clinical methods that can assess the degree of a particular disease state can be used to determine if a response is induced. The particular methods used to evaluate a response will depend upon the nature of the patient's disorder, the patient's age, and sex, other drugs being administered, and the judgment of the attending clinician.
[0100] The compositions may also be administered with another therapeutic agent, for example, an antiretroviral agent used in HAART. Exemplary antiretroviral agents include reverse transcriptase inhibitors (e.g., nucleoside/nucleotide reverse transcriptase inhibitors such as zidovudine, emtricitabine, lamivudine and tenofovir; and non-nucleoside reverse transcriptase inhibitors such as efavirenz, nevirapine, rilpivirine); protease inhibitors, e.g., tipranavir, darunavir, indinavir; entry inhibitors, e.g., maraviroc; fusion inhibitors, e.g., enfuvirtide; or integrase inhibitors, e.g., raltegravir, dolutegravir. Exemplary antiretroviral agents can also include multi-class combination agents, for example, combinations of emtricitabine, efavirenz, and tenofovir; combinations of emtricitabine, rilpivirine, and tenofovir; or combinations of elvitegravir, cobicistat, emtricitabine and tenofovir.
[0101] Concurrent administration of two or more therapeutic agents does not require that the agents be administered at the same time or by the same route, as long as there is an overlap in the time period during which the agents are exerting their therapeutic effect. Simultaneous or sequential administration is contemplated, as is administration on different days or weeks. The therapeutic agents may be administered under a metronomic regimen, e.g., continuous low-doses of a therapeutic agent.
[0102] Dosage, toxicity and therapeutic efficacy of such compositions can be determined by standard pharmaceutical procedures in cell cultures or experimental animals, e.g., for determining the LD50 (the dose lethal to 50% of the population) and the ED50 (the dose therapeutically effective in 50% of the population). The dose ratio between toxic and therapeutic effects is the therapeutic index and it can be expressed as the ratio LD50/ED50.
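By way of illustration only, the ratio defined above can be computed as follows. This is a minimal sketch; the function name and the numerical values are hypothetical and do not represent measured data for any composition described herein.

```python
def therapeutic_index(ld50, ed50):
    """Return the therapeutic index, i.e. the ratio of LD50 to ED50.

    Both doses must be positive and expressed in the same units.
    """
    if ld50 <= 0 or ed50 <= 0:
        raise ValueError("LD50 and ED50 must be positive")
    return ld50 / ed50


# Hypothetical doses in arbitrary but identical units.
print(therapeutic_index(ld50=200.0, ed50=10.0))
```

A larger index indicates a wider margin between the dose that is effective and the dose that is lethal in 50% of the population.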
[0103] The data obtained from the cell culture assays and animal studies can be used in formulating a range of dosage for use in humans. The dosage of such compositions lies preferably within a range of circulating concentrations that include the ED50 with little or no toxicity. The dosage may vary within this range depending upon the dosage form employed and the route of administration utilized. For any composition used in the method of the invention, the therapeutically effective dose can be estimated initially from cell culture assays. A dose may be formulated in animal models to achieve a circulating plasma concentration range that includes the IC50 (i.e., the concentration of the test compound which achieves a half-maximal inhibition of symptoms) as determined in cell culture. Such information can be used to more accurately determine useful doses in humans. Levels in plasma may be measured, for example, by high performance liquid chromatography.
[0104] As described, a therapeutically effective amount of a composition (i.e., an effective dosage) means an amount sufficient to produce a therapeutically (e.g., clinically) desirable result. The compositions can be administered from one or more times per day to one or more times per week, including once every other day. The skilled artisan will appreciate that certain factors can influence the dosage and timing required to effectively treat a subject, including but not limited to the severity of the disease or disorder, previous treatments, the general health and/or age of the subject, and other diseases present. Moreover, treatment of a subject with a therapeutically effective amount of the compositions of the invention can include a single treatment or a series of treatments.
[0105] The compositions described herein are suitable for use in a variety of drug delivery systems described above. Additionally, in order to enhance the in vivo serum half-life of the administered compound, the compositions may be encapsulated, introduced into the lumen of liposomes, prepared as a colloid, or other conventional techniques may be employed which provide an extended serum half-life of the compositions. A variety of methods are available for preparing liposomes, as described in, e.g., Szoka, et al., U.S. Pat. Nos. 4,235,871, 4,501,728 and 4,837,028 each of which is incorporated herein by reference. Furthermore, one may administer the drug in a targeted drug delivery system, for example, in a liposome coated with a tissue-specific antibody. The liposomes will be targeted to and taken up selectively by the organ.
[0106] Also provided are methods of inactivating a retrovirus, for example a lentivirus such as a human immunodeficiency virus, a simian immunodeficiency virus, a feline immunodeficiency virus, or a bovine immunodeficiency virus, in a mammalian cell. The human immunodeficiency virus can be HIV-1 or HIV-2. The human immunodeficiency virus can be a chromosomally integrated provirus. The mammalian cell can be any cell type infected by HIV, including, but not limited to, CD4+ lymphocytes, macrophages, fibroblasts, monocytes, T lymphocytes, B lymphocytes, natural killer cells, dendritic cells such as Langerhans cells and follicular dendritic cells, hematopoietic stem cells, endothelial cells, brain microglial cells, and gastrointestinal epithelial cells. Such cell types include those cell types that are typically infected during a primary infection, for example, a CD4+ lymphocyte, a macrophage, or a Langerhans cell, as well as those cell types that make up latent HIV reservoirs, i.e., a latently infected cell.
[0107] The methods can include exposing the cell to a composition comprising an isolated nucleic acid encoding a gene editing complex comprising a CRISPR-associated endonuclease and one or more guide RNAs wherein the guide RNA is complementary to a target nucleic acid sequence in the retrovirus. In a preferred embodiment, as previously described, the method of inactivating a proviral DNA integrated into the genome of a host cell latently infected with a retrovirus includes the steps of treating the host cell with a composition comprising a CRISPR-associated endonuclease, and two or more different guide RNAs (gRNAs), wherein each of the at least two gRNAs is complementary to a different target nucleic acid sequence in the proviral DNA; and inactivating the proviral DNA. The at least two gRNAs can be configured as a single sequence or as a combination of one or more different sequences, e.g., a multiplex configuration. Multiplex configurations can include combinations of two, three, four, five, six, seven, eight, nine, ten, or more different gRNAs, for example any combination of sequences in U3, R, or U5. In some embodiments, combinations of LTR A, LTR B, LTR C and LTR D can be used. In some embodiments, combinations of any of the sequences LTR A (SEQ ID NO: 96), LTR B (SEQ ID NO: 121), LTR C (SEQ ID NO: 87), and LTR D (SEQ ID NO: 110), can be used. In experiments described in the Examples, the use of two different gRNAs caused the excision of the viral sequences between the cleavage sites recognized by the CRISPR endonuclease. The excised region can include the entire HIV-1 genome. The treating step can take place in vivo, that is, the compositions can be administered directly to a subject having HIV infection. The methods are not so limited however, and the treating step can take place ex vivo. 
For example, a cell or plurality of cells, or a tissue explant, can be removed from a subject having an HIV infection and placed in culture, and then treated with a composition comprising a CRISPR-associated endonuclease and a guide RNA wherein the guide RNA is complementary to the nucleic acid sequence in the human immunodeficiency virus. As described above, the composition can be a nucleic acid encoding a CRISPR-associated endonuclease and a guide RNA wherein the guide RNA is complementary to the nucleic acid sequence in the human immunodeficiency virus; an expression vector comprising the nucleic acid sequence; or a pharmaceutical composition comprising a nucleic acid encoding a CRISPR-associated endonuclease and a guide RNA wherein the guide RNA is complementary to the nucleic acid sequence in the human immunodeficiency virus; or an expression vector comprising the nucleic acid sequence. In some embodiments, the gene editing complex can comprise a CRISPR-associated endonuclease polypeptide and a guide RNA wherein the guide RNA is complementary to the nucleic acid sequence in the human immunodeficiency virus.
[0108] Regardless of whether compositions are administered as nucleic acids or polypeptides, they are formulated in such a way as to promote uptake by the mammalian cell. Useful vector systems and formulations are described above. In some embodiments the vector can deliver the compositions to a specific cell type. The invention is not so limited, however, and other methods of DNA delivery such as chemical transfection, using, for example, calcium phosphate, DEAE dextran, liposomes, lipoplexes, surfactants, and perfluorochemical liquids are also contemplated, as are physical delivery methods, such as electroporation, microinjection, ballistic particles, and "gene gun" systems.
[0109] Standard methods, for example, immunoassays to detect the CRISPR-associated endonuclease, or nucleic acid-based assays such as PCR to detect the gRNA, can be used to confirm that the complex has been taken up and expressed by the cell into which it has been introduced. The engineered cells can then be reintroduced into the subject from whom they were derived as described below.
[0110] The gene editing complex comprises a CRISPR-associated nuclease, e.g., Cas9, and a guide RNA complementary to the retroviral target sequence, for example, an HIV target sequence. The gene editing complex can introduce various mutations into the proviral DNA. The mechanism by which such mutations inactivate the virus can vary, for example the mutation can affect proviral replication, viral gene expression or proviral excision. The mutations may be located in regulatory sequences or structural gene sequences and result in defective production of HIV. The mutation can comprise a deletion. The size of the deletion can vary from a single nucleotide base pair to about 10,000 base pairs. In some embodiments, the deletion can include all or substantially all of the proviral sequence. In some embodiments the deletion can include the entire proviral sequence. The mutation can comprise an insertion; that is the addition of one or more nucleotide base pairs to the pro-viral sequence. The size of the inserted sequence also may vary, for example from about one base pair to about 300 nucleotide base pairs. The mutation can comprise a point mutation, that is, the replacement of a single nucleotide with another nucleotide. Useful point mutations are those that have functional consequences, for example, mutations that result in the conversion of an amino acid codon into a termination codon or that result in the production of a nonfunctional protein.
[0111] In exemplary multiplex methods for inactivating proviral DNA integrated into the genome of a host cell, as demonstrated in Examples 2-5, two different gRNA sequences are deployed, with each gRNA sequence targeting a different site in the proviral DNA. That is, the methods include the steps of exposing the host cell to a composition including an isolated nucleic acid encoding a CRISPR-associated endonuclease; an isolated nucleic acid sequence encoding a first gRNA having a first spacer sequence that is complementary to a first target protospacer sequence in a proviral DNA; and an isolated nucleic acid encoding a second gRNA having a second spacer sequence that is complementary to a second target protospacer sequence in the proviral DNA; expressing in the host cell the CRISPR-associated endonuclease, the first gRNA, and the second gRNA; assembling, in the host cell, a first gene editing complex including the CRISPR-associated endonuclease and the first gRNA; and a second gene editing complex including the CRISPR-associated endonuclease and the second gRNA; directing the first gene editing complex to the first target protospacer sequence by complementary base pairing between the first spacer sequence and the first target protospacer sequence; directing the second gene editing complex to the second target protospacer sequence by complementary base pairing between the second spacer sequence and the second target protospacer sequence; cleaving the proviral DNA at the first target protospacer sequence with the CRISPR-associated endonuclease; cleaving the proviral DNA at the second target protospacer sequence with the CRISPR-associated endonuclease; and inducing at least one mutation in the proviral DNA. The same multiplex method is readily incorporated into methods for treating a subject having a human immunodeficiency virus, and for reducing the risk of a human immunodeficiency virus infection. 
It will be understood that the term "composition" can include not only a mixture of components, but also separate components that are not necessarily administered simultaneously. As a non-limiting example, a composition according to the present invention can include separate component preparations of nucleic acid sequences encoding a Cas9 nuclease, a first gRNA, and a second gRNA, with each component being administered sequentially in an infusion, during a time frame that results in a host cell being exposed to all three components.
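The outcome of the two cleaving steps recited above can be illustrated in silico. The following is a minimal sketch under stated assumptions: each protospacer occurs once on the forward strand of the modeled sequence, the endonuclease makes a blunt cut 3 bp upstream of the PAM, and the two outer ends are rejoined without insertions or deletions. The function names and the toy sequences are hypothetical and are not sequences from the Examples.

```python
def cut_index(seq, protospacer, cut_offset=3):
    """Return the index of the blunt cut for a forward-strand protospacer.

    Assumption: the cut falls cut_offset bp upstream of the PAM, i.e.
    cut_offset bp before the 3' end of the protospacer.
    """
    start = seq.find(protospacer)
    if start == -1:
        raise ValueError("protospacer not found in sequence")
    return start + len(protospacer) - cut_offset


def excise(seq, first_protospacer, second_protospacer):
    """Cut at both target sites; return (rejoined_sequence, excised_fragment)."""
    left, right = sorted((cut_index(seq, first_protospacer),
                          cut_index(seq, second_protospacer)))
    return seq[:left] + seq[right:], seq[left:right]


# Hypothetical toy model: flank + site 1 + PAM + intervening DNA + site 2 + PAM + flank.
site1 = "AAAACCCCAAAACCCCAAAA"
site2 = "TTTTGGGGTTTTGGGGTTTT"
toy = "GATC" + site1 + "TGG" + "ACACACAC" + site2 + "AGG" + "CTAG"

rejoined, fragment = excise(toy, site1, site2)
print(len(toy), len(rejoined), len(fragment))
```

In this simplified model, everything between the two cut positions is removed as a single fragment. In a cell, repair of the joined ends can also introduce small insertions or deletions, which the sketch does not model.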
[0112] In other embodiments, the compositions comprise a cell which has been transformed or transfected with one or more Cas/gRNA vectors. In some embodiments, the methods of the invention can be applied ex vivo. That is, a subject's cells can be removed from the body and treated with the compositions in culture to excise HIV sequences and the treated cells returned to the subject's body. The cells can be the subject's own cells, haplotype-matched cells, or a cell line. The cells can be irradiated to prevent replication. In some embodiments, the cells are human leukocyte antigen (HLA)-matched, autologous, cell lines, or combinations thereof. In other embodiments the cells can be stem cells, for example, embryonic stem cells (ES cells) or induced pluripotent stem cells (iPS cells). ES cells and iPS cells have been established from many animal species, including humans. These types of pluripotent stem cells would be the most useful source of cells for regenerative medicine because, with appropriate induction, they are capable of differentiating into almost all organs while retaining the ability to divide actively and maintain their pluripotency. iPS cells, in particular, can be established from self-derived somatic cells, and therefore are not likely to cause ethical and social issues, in comparison with ES cells, which are produced by destruction of embryos. Further, iPS cells, which are self-derived cells, make it possible to avoid rejection reactions, which are the biggest obstacle to regenerative medicine or transplantation therapy.
[0113] The gRNA expression cassette can be easily delivered to a subject by methods known in the art, for example, methods which deliver siRNA. In some aspects, the Cas may be a fragment wherein the active domains of the Cas molecule are included, thereby cutting down on the size of the molecule. Thus, the Cas9/gRNA molecules can be used clinically, similar to the approaches taken by current gene therapy. In particular, Cas9/multiplex gRNA stable expression stem cells or iPS cells for cell transplantation therapy as well as HIV-1 vaccination will be developed for use in subjects.
[0114] Transduced cells are prepared for reinfusion according to established methods. After a period of about 2-4 weeks in culture, the cells may number between 1×10^6 and 1×10^10. In this regard, the growth characteristics of cells vary from patient to patient and from cell type to cell type. About 72 hours prior to reinfusion of the transduced cells, an aliquot is taken for analysis of phenotype, and percentage of cells expressing the therapeutic agent. For administration, cells of the present invention can be administered at a rate determined by the LD50 of the cell type, and the side effects of the cell type at various concentrations, as applied to the mass and overall health of the patient. Administration can be accomplished via single or divided doses. Adult stem cells may also be mobilized using exogenously administered factors that stimulate their production and egress from tissues or spaces that may include, but are not restricted to, bone marrow or adipose tissues.
[0115] Articles of Manufacture
[0116] The compositions described herein can be packaged in suitable containers labeled, for example, for use as a therapy to treat a subject having a retroviral infection, for example, an HIV infection, or a subject at risk for contracting a retroviral infection, for example, an HIV infection. The containers can include a composition comprising a nucleic acid sequence encoding a CRISPR-associated endonuclease, for example, a Cas9 endonuclease, and a guide RNA complementary to a target sequence in a human immunodeficiency virus, or a vector encoding that nucleic acid, and one or more of a suitable stabilizer, carrier molecule, flavoring, and/or the like, as appropriate for the intended use. Accordingly, packaged products (e.g., sterile containers containing one or more of the compositions described herein and packaged for storage, shipment, or sale at concentrated or ready-to-use concentrations) and kits, including at least one composition of the invention, e.g., a nucleic acid sequence encoding a CRISPR-associated endonuclease, for example, a Cas9 endonuclease, and a guide RNA complementary to a target sequence in a human immunodeficiency virus, or a vector encoding that nucleic acid, and instructions for use, are also within the scope of the invention. A product can include a container (e.g., a vial, jar, bottle, bag, or the like) containing one or more compositions of the invention. In addition, an article of manufacture further may include, for example, packaging materials, instructions for use, syringes, delivery devices, buffers or other control reagents for treating or monitoring the condition for which prophylaxis or treatment is required.
[0117] In some embodiments, the kits can include one or more additional antiretroviral agents, for example, a reverse transcriptase inhibitor, a protease inhibitor or an entry inhibitor. The additional agents can be packaged together in the same container as a nucleic acid sequence encoding a CRISPR-associated endonuclease, for example, a Cas9 endonuclease, and a guide RNA complementary to a target sequence in a human immunodeficiency virus, or a vector encoding that nucleic acid or they can be packaged separately. The nucleic acid sequence encoding a CRISPR-associated endonuclease, for example, a Cas9 endonuclease, and a guide RNA complementary to a target sequence in a human immunodeficiency virus, or a vector encoding that nucleic acid and the additional agent may be combined just before use or administered separately.
[0118] The product may also include a legend (e.g., a printed label or insert or other medium describing the product's use (e.g., an audio- or videotape)). The legend can be associated with the container (e.g., affixed to the container) and can describe the manner in which the compositions therein should be administered (e.g., the frequency and route of administration), indications therefor, and other uses. The compositions can be ready for administration (e.g., present in dose-appropriate units), and may include one or more additional pharmaceutically acceptable adjuvants, carriers or other diluents and/or an additional therapeutic agent. Alternatively, the compositions can be provided in a concentrated form with a diluent and instructions for dilution.
Example 1: Materials and Methods
[0119] Plasmid preparation: Vectors containing human Cas9 and gRNA expression cassette, pX260, and pX330 (Addgene) were utilized to create various constructs, LTR-A, B, C, and D.
[0120] Cell culture and stable cell lines: TZM-bl reporter and U1 cell lines were obtained from the NIH AIDS Reagent Program, and CHME5 microglial cells are known in the art.
[0121] Immunohistochemistry and Western Blot: Standard methods for immunocytochemical observation of the cells and evaluation of protein expression by Western blot were utilized.
[0122] Firefly-luciferase assay: Cells were lysed 24 h post-treatment using Passive Lysis Buffer (Promega) and assayed with a Luciferase Reporter Gene Assay kit (Promega) according to the manufacturer's protocol. Luciferase activity was normalized to the number of cells determined by a parallel MTT assay (Vybrant, Invitrogen).
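The normalization described above can be sketched as a simple per-well calculation. This is a minimal illustration that assumes the MTT absorbance of each well is used directly as a proxy for cell number and that results are then expressed relative to a control well; the function names and readings are hypothetical and are not data from the Examples.

```python
def normalize_to_mtt(luciferase, mtt):
    """Divide each well's luciferase reading by its parallel MTT reading."""
    if len(luciferase) != len(mtt):
        raise ValueError("each luciferase reading needs a matching MTT reading")
    return [luc / cells for luc, cells in zip(luciferase, mtt)]


def relative_to_control(values, control_index=0):
    """Express normalized values as a fraction of the control well."""
    reference = values[control_index]
    return [value / reference for value in values]


# Hypothetical readings: first well is the untreated control.
normalized = normalize_to_mtt([1000.0, 500.0], [0.5, 0.5])
print(relative_to_control(normalized))
```

Normalizing before comparison keeps a drop in luciferase signal from being attributed to the treatment when it is in fact due to fewer viable cells.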
[0123] p24 ELISA: After infection or reactivation, the levels of HIV-1 viral load in the supernatants were quantified by p24 Gag ELISA (Advanced BioScience Laboratories, Inc) following the manufacturer's protocol. To assess cell viability upon treatments, MTT assay was performed in parallel according to the manufacturer's manual (Vybrant, Invitrogen).
[0124] EGFP Flow cytometry: Cells were trypsinized, washed with PBS and fixed in 2% paraformaldehyde for 10 min at room temperature, then washed twice with PBS and analyzed using a Guava EasyCyte Mini flow cytometer (Guava Technologies).
[0125] HIV-1 reporter virus preparation and infections: HEK293T cells were transfected using Lipofectamine 2000 reagent (Invitrogen) with pNL4-3-ΔE-EGFP (NIH AIDS Research and Reference Reagent Program). After 48 h, the supernatant was collected, 0.45 µm filtered and titered in HeLa cells using EGFP as an infection marker. For viral infection, stable Cas9/gRNA TZM-bl cells were incubated 2 h with diluted viral stock, and then washed twice with PBS. At 2 and 4 d post-infection, cells were collected, fixed and analyzed by flow cytometry for EGFP expression, or genomic DNA purification was performed for PCR and whole genome sequencing.
[0126] Genomic DNA amplification, PCR, TA-cloning, and Sanger sequencing, GenomeWalker link PCR: Standard methods for DNA manipulation for cloning and sequencing were utilized. For identification of the integration sites of HIV-1, the Lenti-X™ integration site analysis kit was used.
[0127] Surveyor assay: The presence of mutations in PCR products was examined using a SURVEYOR Mutation Detection Kit (Transgenomic) according to the protocol from the manufacturer. Briefly, heterogeneous PCR product was denatured for 10 min at 95°C and hybridized by gradual cooling using a thermocycler. Next, 300 ng of hybridized DNA (9 µl) was subjected to digestion with 0.25 µl of SURVEYOR Nuclease in the presence of 0.25 µl SURVEYOR Enhancer S and 15 mM MgCl2 for 4 h at 42°C. Then Stop Solution was added and samples were resolved in a 2% agarose gel together with equal amounts of undigested PCR product controls.
[0128] Some PCR products were used for restriction fragment length polymorphism analysis. Equal amounts of the PCR products were digested with BsaJI. Digested DNA was separated on an ethidium bromide-containing agarose gel (2%). For sequencing, PCR products were cloned using a TA Cloning® Kit Dual Promoter with pCR™ II vector (Invitrogen). The insert was confirmed by digestion with EcoRI and positive clones were sent to Genewiz for Sanger sequencing.
[0129] Selection of LTR target sites, whole genome sequencing and bioinformatics and statistical analysis. We utilized Jack Lin's CRISPR/Cas9 gRNA finder tool for initial identification of potential target sites within the LTR.
[0130] Plasmid preparation. DNA segment expressing LTR-A or LTR-B for pre-crRNA was cloned into the pX260 vector that contains the puromycin selection gene (Addgene, plasmid #42229). DNA segments expressing LTR-C or LTR-D for the chimeric crRNA-tracrRNA were cloned into the pX330 vector (Addgene, plasmid #42230). Both vectors contain a humanized Cas9 coding sequence driven by a CAG promoter and a gRNA expression cassette driven by a human U6 promoter. The vectors were digested with BbsI and treated with Antarctic Phosphatase, and the linearized vector was purified with a Quick nucleotide removal kit (Qiagen). A pair of oligonucleotides for each targeting site (FIG. 14, AlphaDNA) was annealed, phosphorylated, and ligated to the linearized vector. The gRNA expression cassette was sequenced with U6 sequencing primer (FIG. 14) in GENEWIZ. For pX330 vectors, we designed a pair of universal PCR primers with overhang digestion sites (FIG. 14) that can tease out the gRNA expression cassette (U6-gRNA-crRNA-stem-tracrRNA) for direct transfection or subcloning to other vectors.
[0131] Cell culture. The TZM-bl reporter cell line from Dr. John C. Kappes, Dr. Xiaoyun Wu and Tranzyme Inc., the U1/HIV-1 cell line from Dr. Thomas Folks and the J-Lat full-length clone from Dr. Eric Verdin were obtained through the NIH AIDS Reagent Program, Division of AIDS, NIAID, NIH. The CHME5/HIV fetal microglia cell line was generated as previously described. TZM-bl and CHME5 cells were cultured in Dulbecco's minimal essential medium high glucose supplemented with 10% heat-inactivated fetal bovine serum (FBS) and 1% penicillin/streptomycin. U1 and J-Lat cells were cultured in RPMI 1640 containing 2.0 mM L-glutamine, 10% FBS and 1% penicillin/streptomycin.
[0132] Stable cell lines and subcloning. TZM-bl or CHME5/HIV cells were seeded in 6-well plates at 1.5×10^5 cells/well and transfected using Lipofectamine 2000 reagent (Invitrogen) with 1 μg of pX260 (for LTR-A and B) or 1 μg/0.1 μg of pX330/pX260 (for LTR-C and D) plasmids. The next day, cells were transferred into 100-mm dishes and incubated with growth medium containing 1 μg/ml of puromycin (Sigma). Two weeks later, surviving cell colonies were isolated using cloning cylinders (Corning). U1 cells (1.5×10^5) were electroporated with 1 μg of DNA using a 10 μl tip and 3×10 ms 1400 V impulses on the Neon™ Transfection System (Invitrogen). Cells were selected with 0.5 μg/ml of puromycin for two weeks. The stable clones were subcultured using a limiting dilution method in 96-well plates and single cell-derived subclones were maintained for further studies.
[0133] Immunocytochemistry and western blot. The Cas9/gRNA stable expression TZM-bl cells were cultured in 8-well chamber slides for 2 days and fixed for 10 min in 4% paraformaldehyde/PBS. After three rinses, the cells were treated with 0.5% Triton X-100/PBS for 20 min and blocked in 10% donkey serum for 1 h. Cells were incubated overnight at 4 °C with mouse anti-Flag M2 primary antibody (1:500, Sigma). After rinsing three times, cells were incubated for 1 h with donkey anti-mouse Alexa-Fluor-594 secondary antibodies, and incubated with Hoechst 33258 for 5 min. After three rinses with PBS, the cells were coverslipped with anti-fading aqueous mounting media (Biomeda) and analyzed under a Leica DMI6000B fluorescence microscope.
[0134] TZM-bl cells cultured in 6-well plates were solubilized in 200 μl of Triton X-100-based lysis buffer containing 20 mM Tris-HCl (pH 7.4), 1% Triton X-100, 5 mM ethylenediaminetetraacetic acid, 5 mM dithiothreitol, 150 mM NaCl, 1 mM phenylmethylsulfonyl fluoride, 1× nuclear extraction proteinase inhibitor cocktail (Cayman Chemical, Ann Arbor, Mich.), 1 mM sodium orthovanadate and 30 mM NaF. Cell lysates were rotated at 4 °C for 30 min. Nuclear and cellular debris was cleared by centrifugation at 20,000 g for 20 min at 4 °C. Equal amounts of lysate proteins (20 μg) were denatured by boiling for 5 min in sodium dodecyl sulphate (SDS) sample buffer, fractionated by SDS-polyacrylamide gel electrophoresis in tris-glycine buffer, and transferred to nitrocellulose membrane (BioRad). The SeeBlue prestained standards (Invitrogen) were used as a molecular weight reference. Blots were blocked in 5% BSA/tris-buffered saline (pH 7.6) plus 0.1% Tween-20 (TBS-T) for 1 h and then incubated overnight at 4 °C with mouse anti-Flag M2 monoclonal antibody (1:1000, Sigma) or mouse anti-GAPDH monoclonal antibody (1:3000, Santa Cruz Biotechnology). After washing with TBS-T, the blots were incubated with IRDye 680LT-conjugated anti-mouse antibody for 1 h at room temperature. Membranes were scanned and analyzed using an Odyssey Infrared Imaging System (LI-COR Biosciences).
[0135] Firefly-luciferase assay. Cells were lysed 24 h post-treatment using Passive Lysis Buffer (Promega) and assayed with a Luciferase Reporter Gene Assay kit (Promega) according to the protocol of the manufacturer. Luciferase activity was normalized to the number of cells determined by parallel MTT assay (Vybrant, Invitrogen).
[0136] p24 ELISA. After infection or reactivation, the HIV-1 viral load levels in the supernatants were quantified by p24 Gag ELISA (Advanced BioScience Laboratories, Inc) following the manufacturer's protocol. To assess cell viability upon treatments, an MTT assay was performed in parallel according to the manufacturer's protocol (Vybrant, Invitrogen).
[0137] EGFP Flow cytometry. Cells were trypsinized, washed with PBS and fixed in 2% paraformaldehyde for 10 min at room temperature, then washed twice with PBS and analyzed using a Guava EasyCyte Mini flow cytometer (Guava Technologies).
[0138] HIV-1 reporter virus preparation and infections. HEK293T cells were transfected using Lipofectamine 2000 reagent (Invitrogen) with pNL4-3-ΔE-EGFP, SF162 and JRFL (NIH AIDS Research and Reference Reagent Program). For pseudotyped pNL4-3-ΔE-EGFP, the VSVG vector was cotransfected. After 48 h, the supernatant was collected, filtered through a 0.45 μm filter and titered in HeLa cells using expressed EGFP as an infection marker. For viral infection, stable Cas9/gRNA TZM-bl cells were incubated for 2 h with a diluted viral stock and washed twice with PBS. At 2 and 4 days post-infection, cells were collected, fixed and analyzed by flow cytometry for EGFP expression, or genomic DNA was purified for PCR and whole genome sequencing.
[0139] Genomic DNA purification, PCR, TA-cloning and Sanger sequencing. Genomic DNA was isolated from cells using an ArchivePure DNA cell/tissue purification kit (5PRIME) according to the protocol recommended by the manufacturer. One hundred ng of extracted DNA was subjected to PCR with a high-fidelity FailSafe PCR kit (Epicentre) using the primers listed in FIG. 14. Three steps of standard PCR were carried out for 30 cycles with 55 °C annealing and 72 °C extension. The products were resolved in a 2% agarose gel. The bands of interest were gel-purified and cloned into pCRII T-A vector (Invitrogen), and the nucleotide sequence of individual clones was determined by sequencing at Genewiz using universal T7 and/or SP6 primers.
[0140] Conventional and real-time reverse transcription (RT)-PCR. For total RNA extraction, cells were processed with an RNeasy Mini kit (Qiagen) as per the manufacturer's instructions. Potential residual genomic DNA was removed through on-column DNase digestion with an RNase-Free DNase Set (Qiagen). One μg of RNA from each sample was reverse transcribed into cDNA using random hexanucleotide primers with a High Capacity cDNA Reverse Transcription Kit (Invitrogen, Grand Island, N.Y.). Conventional PCR was performed using a standard protocol. Quantitative PCR (qPCR) analyses were carried out in a LightCycler480 (Roche) using a SYBR® Green PCR Master Mix Kit (Applied Biosystems). The RT reactions were diluted to 5 ng of total RNA per microliter and 2 μl was used in a 20-μl PCR reaction. For qPCR analysis of HIV-1 proviruses, 50 ng of genomic DNA was used. The primers were synthesized by AlphaDNA and are shown in FIG. 14. The primers for human housekeeping genes GAPDH and RPL13A were obtained from RealTimePrimers (Elkins Park, Pa.). Each sample was tested in triplicate. Cycle threshold (Ct) values were obtained graphically for the target genes and housekeeping genes. The difference in Ct values between the housekeeping gene and target gene was represented as the ΔCt value. The ΔΔCt values were obtained by subtracting the ΔCt values of control samples from those of experimental samples. Relative fold or percentage change was calculated as 2^(-ΔΔCt). In some cases, absolute quantification was performed using the pNL4-3-ΔE-EGFP plasmid spiked in human genomic DNA as a standard. The number of HIV-1 viral copies was calculated based on a standard curve after normalization to the housekeeping gene.
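The relative quantification described above can be sketched as follows. This is a minimal illustration, not the analysis script used in the study: the function name and the Ct values are hypothetical, and ΔCt is taken here as Ct(target) minus Ct(housekeeping), which is the usual convention for the 2^(-ΔΔCt) method.

```python
from statistics import mean


def relative_fold_change(ct_target_exp, ct_ref_exp, ct_target_ctrl, ct_ref_ctrl):
    """Return 2^(-ddCt) from replicate Ct values (hypothetical helper).

    Each argument is a list of replicate Ct values (e.g. triplicates).
    Assumption: dCt = Ct(target gene) - Ct(housekeeping gene).
    """
    d_ct_exp = mean(ct_target_exp) - mean(ct_ref_exp)     # experimental sample
    d_ct_ctrl = mean(ct_target_ctrl) - mean(ct_ref_ctrl)  # control sample
    dd_ct = d_ct_exp - d_ct_ctrl                          # experimental minus control
    return 2 ** (-dd_ct)


# Made-up triplicates: the target crosses threshold 3 cycles later in the
# experimental sample, i.e. roughly an 8-fold reduction relative to control.
fold = relative_fold_change(
    ct_target_exp=[27.0, 27.0, 27.0],
    ct_ref_exp=[18.0, 18.0, 18.0],
    ct_target_ctrl=[24.0, 24.0, 24.0],
    ct_ref_ctrl=[18.0, 18.0, 18.0],
)
print(f"relative level vs. control: {fold:.3f}")
```

A value below 1 indicates a lower target level in the experimental sample than in the control; multiplying by 100 gives the percentage of the control level.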
[0141] GenomeWalker link PCR and long-range PCR. The integration sites of HIV-1 in host cells were identified using a Lenti-X™ Integration Site Analysis kit (Clontech) following the manufacturer's instructions. Briefly, high quality genomic DNA was extracted from U1 cells using a NucleoSpin Tissue kit (Clontech). To construct the viral integration libraries, each genomic DNA sample was digested separately with the blunt-end-generating restriction enzymes DraI, SspI or HpaI overnight at 37 °C. The digestion efficiency was verified by electrophoresis on 0.6% agarose. The digested DNA was purified using a NucleoSpin Gel and PCR Clean-Up kit, followed by ligation of the digested genomic DNA fragments to the GenomeWalker™ Adaptor at 16 °C overnight. The ligation reaction was stopped by incubation at 70 °C for 5 min and diluted 5 times with TE buffer. The primary PCR was performed on the DNA segments with adaptor primer 1 (AP1) and LTR-specific primer 1 (LSP1) using Advantage 2 Polymerase Mix, followed by a secondary (nested) PCR using AP2 and LSP2 primers (FIG. 14). The secondary PCR products were separated on a 1.5% ethidium bromide-containing agarose gel. The major bands were gel-purified and cloned into pCRII T-A vector (Invitrogen), and the nucleotide sequence of individual clones was determined by sequencing at Genewiz using universal T7 and SP6 primers. The sequence reads were analyzed by NCBI BLAST searching. Two integration sites of HIV-1 in U1 cells were identified in chromosomes X and 2. A pair of primers covering each integration site (FIG. 14) was synthesized by AlphaDNA. Long-range PCR using the U1 genomic DNA was performed with a Phusion High-Fidelity PCR kit (New England Biolabs) following the manufacturer's protocol. The PCR products were visualized on a 1% agarose gel and validated by Sanger sequencing.
[0142] Surveyor assay. The presence of mutations in PCR products was tested using a SURVEYOR Mutation Detection Kit (Transgenomic) according to the manufacturer's protocol. Briefly, heterogeneous PCR products were denatured for 10 min at 95 °C and hybridized by gradual cooling using a thermocycler. Next, 300 ng of hybridized DNA (9 μl) was digested with 0.25 μl of SURVEYOR Nuclease in the presence of 0.25 μl SURVEYOR Enhancer S and 15 mM MgCl2 for 4 h at 42 °C. Stop Solution was then added and samples were resolved in a 2% agarose gel together with equal amounts of undigested PCR products.
[0143] Some PCR products were used for restriction fragment length polymorphism analysis. Equal amounts of PCR products were digested with BsaJI. Digested DNA was separated on an ethidium bromide-containing agarose gel (2%). For sequencing, PCR products were cloned using a TA Cloning® Kit Dual Promoter with pCR™ II vector (Invitrogen). The insert was confirmed by digestion with EcoRI and positive clones were sent to Genewiz for Sanger sequencing.
[0144] Selection of LTR target sites and prediction of potential off-target sites. For initial studies, because the LTR may mutate during passaging, we obtained the LTR promoter sequence (-411 to -10) of the integrated lentiviral LTR-luciferase reporter by TA-cloning and sequencing of PCR products from the genome of human TZM-bl cells. This promoter sequence has a 100% match to the 5'-LTR of the pHR'-CMV-LacZ lentiviral vector (AF105229). Thus, sense and antisense sequences of the full-length pHR' 5'-LTR (634 bp) were utilized to search for Cas9/gRNA target sites containing a 20 bp gRNA targeting sequence plus the PAM sequence (NRG) using Jack Lin's CRISPR/Cas9 gRNA finder tool. The number of potential off-targets with an exact match was predicted by blasting each gRNA targeting sequence plus NRG (AGG, TGG, GGG and CGG; AAG, TAG, GAG, CAG) against all available human genomic and transcript sequences using the NCBI/blastn suite with an E-value cutoff of 1,000 and a word size of 7. The results page was then searched (Control+F) with the pasted target sequence (nucleotides 1-23 through 9-23) to find the number of genomic targets with a 100% match to the target sequence. The number of off-targets for each search was divided by 3 because of the repeated genome library.
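The target-site search described above (a 20-nt targeting sequence followed by an NRG PAM, scanned on both sense and antisense strands) can be illustrated with a short sketch. It is not the gRNA finder tool named in the text; the function names and the toy input sequence are hypothetical, and positions on the antisense strand are reported relative to the reverse-complemented sequence.

```python
import re

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase or lowercase DNA string."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def find_target_sites(seq, protospacer_len=20):
    """List (strand, start, protospacer, pam) for every NRG-adjacent site.

    A zero-width lookahead is used so that overlapping candidate sites
    are all reported. 'start' is 0-based on the scanned strand.
    """
    pattern = re.compile(r"(?=([ACGT]{%d})([ACGT][AG]G))" % protospacer_len)
    sites = []
    for strand, strand_seq in (("+", seq.upper()), ("-", reverse_complement(seq))):
        for match in pattern.finditer(strand_seq):
            sites.append((strand, match.start(), match.group(1), match.group(2)))
    return sites


# Toy input (not an LTR sequence): one site on the sense strand only.
for site in find_target_sites("A" * 20 + "TGG"):
    print(site)
```

Each reported protospacer plus its PAM is the 23-nt query that would then be checked against the host genome for exact matches.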
[0145] Whole genome sequencing and bioinformatics analysis. The control subclone C1 and experimental subclone AB7 of TZM-bl cells were validated for target cut efficiency and functional suppression of the LTR-luciferase reporter. The genomic DNA was isolated with a NucleoSpin Tissue kit (Clontech). The DNA samples were submitted to the NextGen sequencing facility at Temple University Fox Chase Cancer Center. Duplicate genomic DNA libraries were prepared from each subclone using a NEBNext Ultra DNA Library Prep Kit for Illumina (New England Biolabs) following the manufacturer's instructions. All libraries were sequenced with paired-end 141-bp reads in two Illumina Rapid Run flowcells on a HiSeq 2500 instrument (Illumina). Demultiplexed read data from the sequenced libraries were sent to AccuraScience, LLC for professional bioinformatics analysis. Briefly, the raw reads were mapped against the human genome (hg19) and the HIV-1 genome using Bowtie2. A genomic analysis toolkit (GATK, version 2.8.1) was used for duplicate read removal, local alignment, base quality recalibration and indel calling. The confidence scores 10 and 30 were the thresholds for low quality (LowQual) and high confidence calling (PASS). The potential off-target sites of LTR-A and LTR-B with various mismatches were predicted by the NCBI/blastn suite as described above and by a CRISPR Design Tool. All the potential gRNA target sites (FIG. 15) were used to map the ±300 bp regions around each indel identified by GATK. The locations of the overlapped regions in the human genome and HIV-1 genome were compared between the control C1 and experimental AB7.
[0146] Statistical analysis. The quantitative data represent the mean ± standard deviation from 3-5 independent experiments, and were evaluated by Student's t-test or ANOVA with the Newman-Keuls multiple comparison test. A p value <0.05 or <0.01 was considered statistically significant.
Example 2: Cas9/LTR-gRNA Suppresses HIV-1 Reporter Virus Production in CHME5 Microglial Cells Latently Infected with HIV-1
[0147] We assessed the ability of HIV-1-directed guide RNAs (gRNAs) to abrogate LTR transcriptional activity and eradicate proviral DNA from the genomes of latently-infected myeloid cells that serve as HIV-1 reservoirs in the brain, a particularly intractable target population. Our strategy was focused on targeting the HIV-1 LTR promoter U3 region. By bioinformatic screening and efficiency/off-target prediction, we identified four gRNA targets (protospacers; LTRs A-D) that avoid conserved transcription factor binding sites, minimizing the likelihood of altering host gene expression (FIG. 5 and FIG. 13). We inserted DNA fragments complementary to gRNAs A-D into a humanized Cas9 expression vector (A/B in pX260; C/D in pX330) and tested their individual and combined abilities to alter the integrated HIV-1 genome activity. We first utilized the microglial cell line CHME5, which harbors integrated copies of a single round HIV-1 vector that includes the 5' and 3' LTRs, and a gene encoding an enhanced green fluorescent protein (EGFP) reporter replacing Gag (pNL4-3-ΔGag-d2EGFP). Treating CHME5 cells with trichostatin A (TSA), a histone deacetylase inhibitor, reactivates transcription from the majority of the integrated proviruses and leads to expression of EGFP and the remaining HIV-1 proteome. Expression of gRNAs plus Cas9 markedly decreased the fraction of TSA-induced EGFP-positive CHME5 cells (FIG. 1A and FIG. 6). We detected insertion/deletion gene mutations (indels) for LTRs A-D (FIG. 1B and FIG. 6B) using a Cel I nuclease-based heteroduplex-specific SURVEYOR assay. Similarly, expressing gRNAs targeting LTRs C and D in HeLa-derived TZM-bl cells, which contain stably incorporated HIV-1 LTR copies driving a firefly-luciferase reporter gene, suppressed viral promoter activity (FIG. 7A) and elicited indels within the LTR U3 region (FIG. 7B, FIG. 7C, and FIG. 7D), as demonstrated by SURVEYOR and Sanger sequencing. Moreover, the combined expression of LTR C/D-targeting gRNAs in these cells caused excision of the predicted 302-bp viral DNA sequence, and emergence of the residual 194-bp fragment (FIG. 7E and FIG. 7F).
[0148] Multiplex expression of LTR-A/B gRNAs in mixed clonal CHME5 cells caused deletion of a 190-bp fragment between the A and B target sites and led to indels to various extents (FIG. 1C and FIG. 1D). Among >20 puromycin-selected stable subclones, we found cell populations with complete blockade of TSA-induced HIV-1 proviral reactivation, as determined by flow cytometry for EGFP (FIG. 1E). PCR-based analysis for EGFP and the HIV-1 Rev response element (RRE) in the proviral genome validated the eradication of the HIV-1 genome (FIG. 1F, FIG. 1G). Furthermore, sequencing of the PCR products revealed that the entire 5'-3' LTR-spanning viral genome was deleted, yielding a 351-bp fragment via a 190-bp excision between cleavage sites A and B (FIG. 1G and FIG. 8B), and a 682-bp fragment with a 175-bp insertion and a 27-bp deletion at the LTR-A and -B sites respectively (FIG. 8C). The residual HIV-1 genome (FIG. 1F, FIG. 1G, and FIG. 1H) may reflect the presence of trace Cas9/gRNA-negative cells. These results indicate that LTR-targeting Cas9/gRNAs A/B eradicate the HIV-1 genome and block its reactivation in latently infected microglial cells.
Example 3: Cas9/LTR-gRNA Efficiently Eradicates Latent HIV-1 Virus from U1 Monocytic Cells
[0149] The promonocytic U-937 cell subclone U1, an HIV-1 latency model for infected perivascular macrophages and monocytes, is chronically HIV-1-infected and exhibits low level constitutive viral gene expression and replication. GenomeWalker mapping detected two integrated proviral DNA copies at chromosomes Xp11.4 (FIG. 2A) and 2p21 (FIG. 9A) in U1 cells. A 9935-bp DNA fragment representing the entire 9709-bp proviral HIV-1 DNA plus a flanking 226-bp X-chromosome-derived sequence (FIG. 2A), and a 10176-bp fragment containing the 9709-bp HIV-1 genome plus its flanking 467-bp chromosome 2-derived sequence (FIG. 9A, FIG. 9B) were identified by long-range PCR analysis of the parental control or empty-vector (U6-CAG) U1 cells. The 226-bp and 467-bp fragments represent the predicted segment from the other copy of chromosome X and 2 respectively, which lacked the integrated proviral DNA. In U1 cells expressing LTR-A/B gRNAs and Cas9, we found two additional DNA fragments of 833 and 670 bp in chromosome X and one additional 1102-bp fragment in chromosome 2. Thus, gRNAs A/B enabled Cas9 to excise the HIV-1 5'-3' LTR-spanning viral genome segment in both chromosomes. The 833-bp fragment includes the expected 226 bp from the host genome and a 607-bp viral LTR sequence with a 27-bp deletion around the LTR-A site (FIG. 2A and FIG. 2B). The 670-bp fragment encompassed a 226-bp host sequence and a residual 444-bp viral LTR sequence after 190-bp fragment excision (FIG. 1D), caused by gRNAs-A/B-guided cleavage at both LTRs (FIG. 2A). The additional fragments did not emerge via circular LTR integration, because they were absent in the parental U1 cells, and such a circular LTR viral genome configuration occurs immediately after HIV-1 infection but is short lived and intolerant to repeated passaging. These cells exhibited substantially decreased HIV-1 viral load, shown by the functional p24 ELISA replication assay (FIG. 2C) and real-time PCR analysis (FIG. 9C, FIG. 9D). The detectable but low residual viral load and reactivation may result from cell population heterogeneity and/or incomplete genome editing. We also validated the ablation of the HIV-1 genome by Cas9/LTR-A/B gRNAs in latently infected J-Lat T cells harboring integrated HIV-R7/E-/EGFP using flow cytometry analysis, SURVEYOR assay and PCR genotyping (FIG. 10), supporting the results of previous reports on HIV-1 proviral deletion in Jurkat T cells by Cas9/gRNA and ZFN. Taken together, our results suggest that the multiplex LTR-gRNAs/Cas9 system efficiently suppresses HIV-1 replication and reactivation in latently HIV-1-infected "reservoir" (microglial, monocytic and T) cells typical of human latent HIV-1 infection, and in TZM-bl cells highly sensitive for detecting HIV-1 transcription and reactivation. Single or multiplex gRNAs targeting 5'- and 3'-LTRs effectively eradicated the entire HIV-1 genome.
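The chromosome X fragment sizes reported above are mutually consistent, which can be checked with simple arithmetic. The sketch below is only a bookkeeping aid with hypothetical names. The 634-bp LTR length is an inference from the stated figures (607 bp remaining after a 27-bp deletion, 444 bp remaining after a 190-bp excision), and it matches the full-length LTR size given in the methods.

```python
PROVIRUS_BP = 9709   # entire proviral HIV-1 DNA, as stated in the text
FLANK_X_BP = 226     # chromosome X flanking sequence in the amplicon
FLANK_2_BP = 467     # chromosome 2 flanking sequence in the amplicon
LTR_BP = 634         # inferred: 607 + 27 = 444 + 190 = 634


def unedited_amplicon(flank_bp):
    """Long-range PCR product spanning an intact integrated provirus."""
    return flank_bp + PROVIRUS_BP


def edited_amplicon(flank_bp, deleted_bp):
    """Product left when the provirus collapses to one LTR carrying a deletion."""
    return flank_bp + (LTR_BP - deleted_bp)


print(unedited_amplicon(FLANK_X_BP))     # intact provirus, chromosome X
print(unedited_amplicon(FLANK_2_BP))     # intact provirus, chromosome 2
print(edited_amplicon(FLANK_X_BP, 27))   # single LTR with 27-bp deletion
print(edited_amplicon(FLANK_X_BP, 190))  # single LTR after 190-bp A/B excision
```

The four values reproduce the 9935-, 10176-, 833- and 670-bp fragments described in the text. The 1102-bp chromosome 2 fragment is not modeled here because the text does not give its breakdown.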
Example 4: Stable Expression of Cas9 Plus LTR-A/B Vaccinates TZM-bl Cells Against New HIV-1 Virus Infection
[0150] We next tested whether combined Cas9/LTR gRNAs can immunize cells against HIV-1 infection using stable Cas9/gRNAs-A and -B-expressing TZM-bl-based clones (FIG. 3A). Two of 7 puromycin-selected subclones exhibited efficient excision of the 190-bp LTR-A/B site-spanning DNA fragment (FIG. 3B). However, the remaining 5 subclones exhibited no excision (FIG. 3B) and no indel mutations, as verified by Sanger sequencing. PCR genotyping using primers targeting Cas9 and U6-LTR showed that none of these ineffective subclones retained the integrated copies of the Cas9/LTR-A/B gRNA expression cassettes (FIG. 11A, FIG. 11B). As a result, no expression of full-length Cas9 was detected (FIG. 11C, FIG. 11D). The long-term expression of Cas9/LTR-A/B gRNAs did not adversely affect cell growth or viability, suggesting a low occurrence of off-target interference with the host genome or Cas9-induced toxicity in this model. We assessed de novo HIV-1 replication by infecting cells with the VSVG-pseudotyped pNL4-3-ΔE-EGFP reporter virus, with EGFP-positivity by flow cytometry indicating HIV-1 replication. Unlike the control U6-CAG cells, the cells stably expressing Cas9/gRNAs LTRs-A/B failed to support HIV-1 replication at 2 d post infection, indicating that they were immunized effectively against new HIV-1 infection (FIG. 3C and FIG. 3D). A similar immunity against HIV-1 was observed in Cas9/LTR-A/B gRNA-expressing cells infected with native T-tropic X4 strain pNL4-3-ΔE-EGFP reporter virus (FIG. 12A) or native M-tropic R5 strains such as SF162 and JRFL (FIG. 12B, FIG. 12C, and FIG. 12D).
Example 5: Off-Target Effects of Cas9/LTR-A/B on Human Genome
[0151] The appeal of Cas9/gRNA as an interventional approach rests on its highly specific on-target indel-producing cleavage, but multiplex gRNAs could potentially cause host genome mutagenesis and chromosomal disorders, cytotoxicity, genotoxicity, or oncogenesis. Fairly low viral-human genome homology reduces this risk, but the human genome contains numerous endogenous retroviral genomes that are potentially susceptible to HIV-1-directed gRNAs. Therefore, we assessed off-target effects of selected HIV-1 LTR gRNAs on the human genome. Because the 12-14-bp seed sequence nearest the protospacer-adjacent motif (PAM) region (NGG) is critical for cleavage specificity, we searched >14-bp seed+NGG, and found no off-target candidate sites for LTR gRNAs A-D (FIG. 13). As expected, progressively shorter gRNA segments yielded increasing numbers of off-target cleavage sites 100% matched to the corresponding on-target sequences (i.e., NGG+13 bp yielded 6, 0, 2 and 9 off-target sites, respectively, whereas NGG+12 bp yielded 16, 5, 16 and 29; FIG. 13). From human genomic DNA we obtained a 500-800-bp sequence covering one of the predicted off-target sites using high-fidelity PCR, and analyzed the potential mutations by SURVEYOR and Sanger sequencing. We found no mutations (see representative off-target sites #1, 5 and 6 in TZM-bl and U1 cells; FIG. 4A).
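The seed-length dependence described above (shorter seed+NGG queries matching more sites) can be illustrated by counting exact matches of the PAM-proximal seed followed by NGG on both strands of a reference sequence. This is a sketch only: the study used NCBI blastn against the human genome, whereas the function below does a plain exact-match scan, and the protospacer and toy "genome" string are made up.

```python
import re

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def count_seed_matches(genome, protospacer, seed_len):
    """Count exact occurrences of (last seed_len nt of protospacer) + NGG.

    Both strands are scanned; a lookahead keeps overlapping hits.
    The seed is taken from the 3' (PAM-proximal) end of the protospacer.
    """
    seed = protospacer.upper()[-seed_len:]
    pattern = re.compile("(?=%s[ACGT]GG)" % re.escape(seed))
    forward = genome.upper()
    reverse = forward.translate(_COMPLEMENT)[::-1]
    return sum(1 for _ in pattern.finditer(forward)) + \
        sum(1 for _ in pattern.finditer(reverse))


# Hypothetical 20-nt protospacer and a toy reference containing only its
# 12-nt PAM-proximal seed followed by TGG.
protospacer = "GATTACAGATTACAGATTAC"
toy_genome = "CCCC" + "ATTACAGATTAC" + "TGG" + "CCCC"

for seed_len in (20, 14, 12):
    print(seed_len, count_seed_matches(toy_genome, protospacer, seed_len))
```

In this toy case only the 12-nt seed finds a match, mirroring the trend in the text that relaxing the seed length increases the number of candidate off-target sites.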
[0152] To assess the risk of off-target effects comprehensively, we performed whole genome sequencing (WGS) using the stable Cas9/gRNA A/B-expressing and control U6-CAG TZM-bl cells (FIG. 4B, FIG. 4C, and FIG. 4D). We identified 676,105 indels, using a genome analysis toolkit (GATK, v.2.8.1) with human (hg19) and HIV-1 genomes as reference sequences. Among the indels, 24% occurred in the U6-CAG control, 26% in the LTR-A/B subclone, and 50% in both (FIG. 4B). Such a substantial inter-sample indel-calling discrepancy could suggest off-target effects, but most likely results from limited indel-calling confidence, limited WGS coverage (15-30×), and cellular heterogeneity. GATK reported only confidently-identified indels: some found in the U6-CAG control but not in the LTR-A/B subclone, and others in the LTR-A/B subclone but not in the U6-CAG control. We expected abundant missing indel calls for both samples due to the limited WGS coverage. Such limited indel-calling confidence also implies the possibility of false negatives: missed indels occurring in LTR-A/B but not U6-CAG controls. Cellular heterogeneity may reflect variability of Cas9/gRNA editing efficiency and effects of passaging. Therefore, we tested whether each indel was LTR-A/B gRNA-induced, by analyzing the ±300 bp flanking each indel against LTRs-A/-B-targeted sites of the HIV-1 genome and predicted/potential gRNA off-target sites of the host genome (FIG. 15). For sequences 100% matched to one containing the seed (12-bp) plus NRG, we identified only 8 overlapped regions of 92 potential off-target sites against 676,105 indels: 6 indels occurring in both samples, and 2 only in the U6-CAG control (FIG. 4C, FIG. 4D). We also identified 2 indels on the HIV-1 LTR that occurred only in the LTR-A/B subclone but, as expected, not in the U6-CAG control (FIG. 4C). The results suggest that LTR-A/B gRNAs induce the indicated on-target indels, but no off-target indels, consistent with prior findings using deep sequencing of PCR products covering predicted/potential off-target sites.
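The overlap test described above (does the ±300 bp window around an indel contain a predicted gRNA site, and is that indel specific to one sample?) can be sketched as an interval comparison. The data structures, function name and coordinates below are hypothetical; the actual analysis was performed on GATK output by a bioinformatics provider.

```python
def indels_near_sites(indels, sites, window=300):
    """Return indels whose +/-window region overlaps a predicted site.

    indels: iterable of (contig, position)
    sites:  iterable of (contig, start, end), inclusive coordinates
    """
    hits = []
    for contig, pos in indels:
        low, high = pos - window, pos + window
        for site_contig, start, end in sites:
            # Two closed intervals overlap when each starts before the other ends.
            if contig == site_contig and start <= high and end >= low:
                hits.append((contig, pos))
                break
    return hits


# Made-up coordinates for illustration only.
predicted_sites = [("chr1", 1200, 1222), ("HIV-1", 100, 129)]
control_indels = [("chr1", 1000), ("chr2", 1000)]
edited_indels = [("chr1", 1000), ("chr1", 5000), ("HIV-1", 110)]

control_hits = set(indels_near_sites(control_indels, predicted_sites))
edited_hits = set(indels_near_sites(edited_indels, predicted_sites))

print("shared:", sorted(control_hits & edited_hits))
print("edited only:", sorted(edited_hits - control_hits))
```

An indel near a predicted site that appears in both samples is unlikely to be gRNA-induced, whereas one that appears only in the edited sample is a candidate editing event, which is the logic applied to the on-target LTR indels in the text.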
[0153] Our combined approaches minimized off-target effects while achieving high efficiency and complete ablation of the genomically integrated HIV-1 provirus. In addition to an extremely low homology between the foreign viral genome and host cellular genome including endogenous retroviral DNA, the key design attributes in our study included: bioinformatic screening using the strictest 12-bp+NGG target-selection criteria to exclude off-target human transcriptome or (even rarely) untranslated-genomic sites; avoiding transcription factor binding sites within the HIV-1 LTR promoter (potentially conserved in the host genome); selection of LTR-A- and -B-directed, 30-bp gRNAs and also a pre-crRNA system reflecting the original bacterial immune mechanism to enhance specificity/efficiency vs. a 20-bp gRNA-, chimeric crRNA-tracrRNA-based system; and WGS, Sanger sequencing and SURVEYOR assay, to identify and exclude potential off-target effects. Indeed, the use of newly developed Cas9 double-nicking and RNA-guided FokI nuclease may further assist identification of new targets within the various conserved regions of HIV-1 with reduced off-target effects.
[0154] Our results show that the HIV-1 Cas9/gRNA system has the ability to target more than one copy of the LTR, which are positioned on different chromosomes, suggesting that this genome editing system can alter the DNA sequence of HIV-1 in latently infected patient's cells harboring multiple proviral DNAs. To further ensure high editing efficacy and consistency of our technology, one may consider the most stable region of HIV-1 genome as a target to eradicate HIV-1 in patient samples, which may not harbor only one strain of HIV-1. Alternatively, one may develop personalized treatment modalities based on the data from deep sequencing of the patient-derived viral genome prior to engineering therapeutic Cas9/gRNA molecules.
[0155] Our results also demonstrate that Cas9/gRNA genome editing can be used to immunize cells against HIV-1 infection. The preventative vaccination is independent of HIV-1 strain's diversity because the system targets genomic sequences regardless of how the viruses enter the infected cells. The preexistence of the Cas9/gRNA system in cells led to a rapid elimination of the new HIV-1 before it integrates into the host genome. One may explore various systems for delivery of Cas9/LTR-gRNA for immunizing high-risk subjects, e.g., gene therapies (viral vector and nanoparticle) and transplantation of autologous Cas9/gRNA-modified bone marrow stem/progenitor cells or inducible pluripotent stem cells for eradicating HIV-1 infection.
[0156] Here, we demonstrated the high specificity of Cas9/gRNAs in editing HIV-1 target genome. Results from subclone data revealed the strict dependence of genome editing on the presence of both Cas9 and gRNA. Moreover, only one nucleotide mismatch in the designed gRNA target will disable the editing potency. In addition, all of our 4 designed LTR gRNAs worked well with different cell lines, indicating that the editing is more efficient in the HIV-1 genome than the host cellular genome, wherein not all designed gRNAs are functional, which may be due to different epigenetic regulation, variable genome accessibility, or other reasons. Given the ease and rapidity of Cas9/gRNA development, even if HIV-1 mutations confer resistance to one Cas9/gRNA-based therapy, as described above, HIV-1 variants can be genotyped to enable another personalized therapy for individual patients.
[0157] A number of embodiments of the invention have been described. Nevertheless, it will be understood that various modifications may be made without departing from the spirit and scope of the invention. Accordingly, other embodiments are within the scope of the following claims.
Sequence Listing (38 sequences)

SEQ ID NO: 1; length 30; DNA; Human immunodeficiency virus 1
gccagggatc agatatccac tgacctttgg 30

SEQ ID NO: 2; length 34; DNA; Human immunodeficiency virus 1
tccggagtac ttcaagaact gctgacatcg agct 34

SEQ ID NO: 3; length 19; DNA; Artificial Sequence; Description of Artificial Sequence: Synthetic oligonucleotide
ccactgacta cttcaagaa 19

SEQ ID NO: 4; length 859; DNA; Artificial Sequence; Description of Artificial Sequence: Synthetic polynucleotide
modified_base (289)..(313): a, c, t, g, unknown or other
misc_feature (289)..(313): n is a, c, g, or t
ctaggtgatt aggatattct acaatccaaa ttcttaccag tttgggatta ttcaaattgg 60
gcaccttggc agatatgttt tgaaaactgc taggcaaagc attctggaag aatagacaaa 120
gaagtaataa aatataacaa aaagcagtgg aagttacaaa aaaaaatgtt tctcttttgg 180
aagggctaat ttggtcccaa agaagacaag atatccttga tctgtggatc taccacacac 240
aaggctactt ccctgattgg cagaactaca acaccagggc cagggatcnn nnnnnnnnnn 300
nnnnnnnnnn nnnttcaagt tagtaccagt tgagccaggg caggtagaag aggccaatga 360
aggagagaac aacaccttgt tacaccctat gagcctgcat gggatggagg acccggaggg 420
agaagtatta gtgtggaagt ttgacagcct cctagcattt cgtcacatgg cccgagagct 480
gcatccggag tactacaaag actgctgaca tcgagttttc tacaagggac tttccgctgg 540
ggactttcca gggaggtgtg gcctgggcgg gactggggag tggcgagccc tcagatgctg 600
catataagca gctgcttttt gcctgtactg ggtctctctg gttagaccag atctgagcct 660
gggagctctc tggctagcta gggaacccac tgcttaagcc tcaataaagc ttgccttgag 720
tgctacaagt agtgtgtgcc cgtctgttgt gtgactctgg taactagaga tccctcagac 780
ccttttagtc agtgtggaaa atctctagca tctttaaagt acagaatgcc aaaacaggaa 840
ggattgataa gatagtcgt 859

SEQ ID NO: 5; length 10; DNA; Human immunodeficiency virus 1
tcttttggaa 10

SEQ ID NO: 6; length 76; DNA; Human immunodeficiency virus 1
gattggcaga actacacacc agggccaggg atcagatatc cactgacctt tggatggtgc 60
ttcaagttag taccag 76

SEQ ID NO: 7; length 10; DNA; Human immunodeficiency virus 1
tctttaaagt 10

SEQ ID NO: 8; length 10; DNA; Artificial Sequence; Description of Artificial Sequence: Synthetic oligonucleotide
tcttttggaa 10

SEQ ID NO: 9; length 63; DNA; Artificial Sequence; Description of Artificial Sequence: Synthetic oligonucleotide
gattggcaga actacaacac cagggccagg gatcagatgg atggtgcttc aagttagtac 60
cag 63

SEQ ID NO: 10; length 10; DNA; Artificial Sequence; Description of Artificial Sequence: Synthetic oligonucleotide
tctttaaagt 10

SEQ ID NO: 11; length 10; DNA; Artificial Sequence; Description of Artificial Sequence: Synthetic oligonucleotide
tcttttggaa 10

SEQ ID NO: 12; length 50; DNA; Artificial Sequence; Description of Artificial Sequence: Synthetic oligonucleotide
gattggcaga actacaacac cagggccagg gatcttcaag
ttagtaccag 501310DNAArtificial SequenceDescription of
Artificial Sequence Syntheticoligonucleotide 13tctttaaagt
101424DNAArtificial
SequenceDescription of Artificial Sequence Syntheticoligonucleotide
14gagatcctgt ctcaaaaaaa agtt
241517DNAArtificial SequenceDescription of Artificial Sequence
Syntheticoligonucleotide 15atctatccat gagggcg
1716402DNAHuman immunodeficiency virus 1
16gatctgtgga tctaccacac acaaggctac ttccctgatt ggcagaacta cacaccaggg
60ccagggatca gatatccact gacctttgga tggtgctaca agctagtacc agttgagcaa
120gagaaggtag aagaagccaa tgaaggagag aacacccgct tgttacaccc tgtgagcctg
180catgggatgg atgacccgga gagagaagta ttagagtgga ggtttgacag ccgcctagca
240tttcatcaca tggcccgaga gctgcatccg gagtacttca agaactgctg acatcgagct
300tgctacaagg gactttccgc tggggacttt ccagggaggc gtggcctggg cgggactggg
360gagtggcgag ccctcagatg ctgcatataa gcagctgctt tt
4021731DNAHuman immunodeficiency virus 1 17ccctgattgg cagaactaca
caccagggcc a 311832DNAArtificial
SequenceDescription of Artificial Sequence Syntheticoligonucleotide
18ccctgattgg cagaactaca acaccagggc ca
321932DNAArtificial SequenceDescription of Artificial Sequence
Syntheticoligonucleotide 19ccctgattgg cagaactaca acaccagggc ca
322032DNAArtificial SequenceDescription of
Artificial Sequence Syntheticoligonucleotide 20ccctgattgg cagaactaca
acaccagggc ca 322130DNAArtificial
SequenceDescription of Artificial Sequence Syntheticoligonucleotide
21ccctgattgg cagaactaca accagggcca
302229DNAArtificial SequenceDescription of Artificial Sequence
Syntheticoligonucleotide 22ccctgattgg cagaactaca ccagggcca
292329DNAArtificial SequenceDescription of
Artificial Sequence Syntheticoligonucleotide 23ccctgattgg cagaactaca
ccagggcca 292426DNAArtificial
SequenceDescription of Artificial Sequence Syntheticoligonucleotide
24ccctgattgg cagaactaca gggcca
262529DNAArtificial SequenceDescription of Artificial Sequence
Syntheticoligonucleotide 25ccctgattgg cagaactaca gggccaggg
292686DNAHuman immunodeficiency virus 1
26gactttccag ggaggcgtgg cctgggcggg actggggagt ggcgagccct cagatgctgc
60atataagcag cggtgaagcc gaattc
862786DNAArtificial SequenceDescription of Artificial Sequence
Syntheticoligonucleotide 27gactttccag ggaggcgtgg cctgggcggg actggggggt
ggcgagccct cagatgctgc 60atataagcag cggtgaagcc gaattc
862888DNAArtificial SequenceDescription of
Artificial Sequence Syntheticoligonucleotide 28gactttccag ggaggcgtgg
cctgggcggg tatctgggga gtggcgagcc ctcagatgct 60gcatataagc agcggtgaag
ccgaattc 882985DNAArtificial
SequenceDescription of Artificial Sequence Syntheticoligonucleotide
29gactttccag gggggcgtgg cctgggcggg actggggagt ggcgagccct cagatgctgc
60ataaagcagc ggtgaagccg aattc
853023DNAArtificial SequenceDescription of Artificial Sequence
Syntheticoligonucleotide 30gactttccag ggaagccgaa ttc
233125DNAArtificial SequenceDescription of
Artificial Sequence Syntheticoligonucleotide 31gattggcaga actacactgg
ggagt 253226DNAArtificial
SequenceDescription of Artificial Sequence Syntheticoligonucleotide
32gattggcaga actacacctc agatgc
263328DNAArtificial SequenceDescription of Artificial Sequence
Syntheticoligonucleotide 33catcacatgg cccgctgctg acatcgag
283455DNAHuman immunodeficiency virus 1
34catcacgtgg cccgagagct gcatccggag tacttcaaga actgctgaca tcgag
55351106DNAArtificial SequenceDescription of Artificial Sequence
Syntheticpolynucleotidemodified_base(152)..(155)a, c, t, g, unknown or
othermisc_feature(152)..(155)n is a, c, g, or t 35gctattgtat ctgatcacaa
gctgttaaaa gcggtcatgc cacttcttga atgctttgca 60gctggaaggg ctaatttggt
cccaaagaag acaagatatc cttgatctgt ggatctacca 120cacacaaggc tacttccctg
attggcagaa cnnnncacca gggccaggga tcagatatcc 180actgaccatc cactttggat
ggtgcttcaa gttagtacca gttgagccag ggcaggtaga 240agaggccaat gaaggagaga
acaacacctt gttacaccct atgagcctgc atgggatgga 300ggacccggag ggagaagtat
tagtgtggaa gtttgacagc ctcctagcat ttcgtcacat 360ggcccgagag ctgcatccgg
agtactacaa agactgctga catcgagttt tctacaaggg 420actttccgct ggggactttc
cagggaggtg tggcctgggc gggactgggg agtggcgagc 480cctcagatgc tgcatataag
cagctgcttt ttgcctgtac tgggtctctc tggttagacc 540agatctgagc ctgggagctc
tctggctagc tagggaaccc actgcttaag cctcaataaa 600gcttgccttg agtgctacaa
gtagtgtgtg cccgtctgtt gtgtgactct ggtaactaga 660gatccctcag acccttttag
tcagtgtgga aaatctctag cagcagctta gaaatttttt 720ccaccagagg ccgggcgtgg
tggctcacgc ctgtaatccc agcactttgg gaggccgagg 780tgggcggatc acctgaagtc
aggagttcga gaccagcctc aacatggaga aaccccatct 840ctactaaaaa tacaaaatta
gctgggcgtg gtggtgcatg cctgtaatcc cagctacttg 900ggaggctgag acaggataat
tgcttgaacc tggaaggcag aggttgcggt gagccgagat 960tgcgccattg cattccagcc
tgggcaacag gagcgaaact tcgtctcaaa aaaaaaaaaa 1020aaagacattt tttccaccag
ataccctaga tcatgactgt taagtctggc cttccacgaa 1080gccctaggac ctggacacac
aatcaa 11063636DNAArtificial
SequenceDescription of Artificial Sequence Syntheticoligonucleotide
36aaacagggcc agggatcaga tatccactga ccttgt
363735DNAArtificial SequenceDescription of Artificial Sequence
Syntheticoligonucleotide 37taaacaaggt cagtggatat ctgatccctg gccct
353836DNAArtificial SequenceDescription of
Artificial Sequence Syntheticoligonucleotide 38aaacagctcg atgtcagcag
ttcttgaagt actcgt 363935DNAArtificial
SequenceDescription of Artificial Sequence Syntheticoligonucleotide
39taaacgagta cttcaagaac tgctgacatc gagct
354024DNAArtificial SequenceDescription of Artificial Sequence
Syntheticoligonucleotide 40caccgattgg cagaactaca cacc
244124DNAArtificial SequenceDescription of
Artificial Sequence Syntheticoligonucleotide 41aaacggtgtg tagttctgcc aatc
244224DNAArtificial
SequenceDescription of Artificial Sequence Syntheticoligonucleotide
42caccgcgtgg cctgggcggg actg
244324DNAArtificial SequenceDescription of Artificial Sequence
Syntheticoligonucleotide 43aaaccagtcc cgcccaggcc acgc
244424DNAArtificial SequenceDescription of
Artificial Sequence Syntheticprimer 44tggaagggct aattcactcc caac
244524DNAArtificial SequenceDescription
of Artificial Sequence Syntheticprimer 45ccgagagctc ccaggctcag atct
244627DNAArtificial
SequenceDescription of Artificial Sequence Syntheticprimer 46caccgatctg
tggatctacc acacaca
274724DNAArtificial SequenceDescription of Artificial Sequence
Syntheticprimer 47aaacgagtca cacaacagac gggc
244837DNAArtificial SequenceDescription of Artificial
Sequence Syntheticprimer 48cgcctcgagg atccgagggc ctatttccca tgattcc
374935DNAArtificial SequenceDescription of
Artificial Sequence Syntheticprimer 49tgtgaattca ggcgggccat ttaccgtaag
ttatg 355025DNAArtificial
SequenceDescription of Artificial Sequence Syntheticprimer 50acgactatct
tatcaatcct tcctg
255126DNAArtificial SequenceDescription of Artificial Sequence
Syntheticprimer 51ctaggtgatt aggatattct acaatc
265224DNAArtificial SequenceDescription of Artificial
Sequence Syntheticprimer 52gctattgtat ctgatcacaa gctg
245324DNAArtificial SequenceDescription of
Artificial Sequence Syntheticprimer 53ttgattgtgt gtccaggtcc tagg
245423DNAArtificial SequenceDescription
of Artificial Sequence Syntheticprimer 54gcaagggcga ggagctgttc acc
235524DNAArtificial
SequenceDescription of Artificial Sequence Syntheticprimer 55ttgtagttgc
cgtcgtcctt gaag
245623DNAArtificial SequenceDescription of Artificial Sequence
Syntheticprimer 56aatggtacat caggccatat cac
235723DNAArtificial SequenceDescription of Artificial
Sequence Syntheticprimer 57cccactgtgt ttagcatggt att
235823DNAArtificial SequenceDescription of
Artificial Sequence Syntheticprimer 58cacagcatca agaagaacct gat
235924DNAArtificial SequenceDescription
of Artificial Sequence Syntheticprimer 59tcttccgtct ggtgtatctt cttc
246028DNAArtificial
SequenceDescription of Artificial Sequence Syntheticprimer 60cgccaagctt
gaataggagc tttgttcc
286130DNAArtificial SequenceDescription of Artificial Sequence
Syntheticprimer 61ctaggatcca ggagctgttg atcctttagg
306223DNAArtificial SequenceDescription of Artificial
Sequence Syntheticoligonucleotide 62gtggactttg gatggtgaga tag
236323DNAArtificial SequenceDescription
of Artificial Sequence Syntheticoligonucleotide 63gcctggcaag agtgaactga
gtc 236423DNAArtificial
SequenceDescription of Artificial Sequence Syntheticoligonucleotide
64aagataatga gttgtggcag agc
236524DNAArtificial SequenceDescription of Artificial Sequence
Syntheticoligonucleotide 65tctacctggt aatccagcat ctgg
246623DNAArtificial SequenceDescription of
Artificial Sequence Syntheticoligonucleotide 66ataggaggaa ggcaccaaga ggg
236723DNAArtificial
SequenceDescription of Artificial Sequence Syntheticoligonucleotide
67aatgatgctt tggtcctact cct
236824DNAArtificial SequenceDescription of Artificial Sequence
Syntheticoligonucleotide 68tgctcttgct actctggcat gtac
246923DNAArtificial SequenceDescription of
Artificial Sequence Syntheticoligonucleotide 69aatctacctc tgagagctgc agg
237023DNAArtificial
SequenceDescription of Artificial Sequence Syntheticoligonucleotide
70tcagacacag ctgaagcaga ggc
237123DNAArtificial SequenceDescription of Artificial Sequence
Syntheticoligonucleotide 71atgccagtgt cagtagatgt cag
237224DNAArtificial SequenceDescription of
Artificial Sequence Syntheticoligonucleotide 72tcaagatcag ccagagtgca catg
247323DNAArtificial
SequenceDescription of Artificial Sequence Syntheticoligonucleotide
73tgctcttccg agcctctctg gag
237422DNAArtificial SequenceDescription of Artificial Sequence
Syntheticoligonucleotide 74atggactatc atatgcttac cg
227528DNAArtificial SequenceDescription of
Artificial Sequence Syntheticoligonucleotide 75gcttcagcaa gccgagtcct
gcgtcgag 287628DNAArtificial
SequenceDescription of Artificial Sequence Syntheticoligonucleotide
76gctcctctgg tttccctttc gctttcaa
287722DNAArtificial SequenceDescription of Artificial Sequence
Syntheticoligonucleotide 77gtaatacgac tcactatagg gc
227819DNAArtificial SequenceDescription of
Artificial Sequence Syntheticoligonucleotide 78actatagggc acgcgtggt
197923DNAArtificial
SequenceDescription of Artificial Sequence Syntheticoligonucleotide
79tcagaccctt ttagtcagtg tgg
238023DNAArtificial SequenceDescription of Artificial Sequence
Syntheticoligonucleotide 80ttgcttgtac tgggtctctc tgg
238123DNAArtificial SequenceDescription of
Artificial Sequence Syntheticoligonucleotide 81cagctgcttt ttgcttgtac tgg
238223DNAArtificial
SequenceDescription of Artificial Sequence Syntheticoligonucleotide
82ctgacatcga gcttgctaca agg
238323DNAArtificial SequenceDescription of Artificial Sequence
Syntheticoligonucleotide 83ccgcctagca tttcatcaca tgg
238423DNAArtificial SequenceDescription of
Artificial Sequence Syntheticoligonucleotide 84cggagagaga agtattagag tgg
238523DNAArtificial
SequenceDescription of Artificial Sequence Syntheticoligonucleotide
85agtaccagtt gagcaagaga agg
238623DNAArtificial SequenceDescription of Artificial Sequence
Syntheticoligonucleotide 86gatatccact gacctttgga tgg
238723DNAArtificial SequenceDescription of
Artificial Sequence Syntheticoligonucleotide 87gattggcaga actacacacc agg
238823DNAArtificial
SequenceDescription of Artificial Sequence Syntheticoligonucleotide
88cacaaggcta cttccctgat tgg
238923DNAArtificial SequenceDescription of Artificial Sequence
Syntheticoligonucleotide 89ctgtggatct accacacaca agg
239023DNAArtificial SequenceDescription of
Artificial Sequence Syntheticoligonucleotide 90tgggagctct ctggctaact agg
239123DNAArtificial
SequenceDescription of Artificial Sequence Syntheticoligonucleotide
91ggttagacca gatctgagcc tgg
239223DNAArtificial SequenceDescription of Artificial Sequence
Syntheticoligonucleotide 92tgctacaagg gactttccgc tgg
239323DNAArtificial SequenceDescription of
Artificial Sequence Syntheticoligonucleotide 93agagagaagt attagagtgg agg
239423DNAArtificial
SequenceDescription of Artificial Sequence Syntheticoligonucleotide
94ttacaccctg tgagcctgca tgg
239523DNAArtificial SequenceDescription of Artificial Sequence
Syntheticoligonucleotide 95aaggtagaag aagccaatga agg
239623DNAArtificial SequenceDescription of
Artificial Sequence Syntheticoligonucleotide 96atcagatatc cactgacctt tgg
239723DNAArtificial
SequenceDescription of Artificial Sequence Syntheticoligonucleotide
97gacaagatat ccttgatctg tgg
239823DNAArtificial SequenceDescription of Artificial Sequence
Syntheticoligonucleotide 98gcccgtctgt tgtgtgactc tgg
239923DNAArtificial SequenceDescription of
Artificial Sequence Syntheticoligonucleotide 99atctgagcct gggagctctc tgg
2310023DNAArtificial
SequenceDescription of Artificial Sequence Syntheticoligonucleotide
100ctttccgctg gggactttcc agg
2310123DNAArtificial SequenceDescription of Artificial Sequence
Syntheticoligonucleotide 101cagaactaca caccagggcc agg
2310223DNAArtificial SequenceDescription of
Artificial Sequence Syntheticoligonucleotide 102cctgcatggg atggatgacc cgg
2310323DNAArtificial
SequenceDescription of Artificial Sequence Syntheticoligonucleotide
103ccctgtgagc ctgcatggga tgg
2310423DNAArtificial SequenceDescription of Artificial Sequence
Syntheticoligonucleotide 104ctttccaggg aggcgtggcc tgg
2310523DNAArtificial SequenceDescription of
Artificial Sequence Syntheticoligonucleotide 105ggggactttc cagggaggcg tgg
2310623DNAArtificial
SequenceDescription of Artificial Sequence Syntheticoligonucleotide
106ccgctgggga ctttccaggg agg
2310723DNAArtificial SequenceDescription of Artificial Sequence
Syntheticoligonucleotide 107catggcccga gagctgcatc cgg
2310823DNAArtificial SequenceDescription of
Artificial Sequence Syntheticoligonucleotide 108gcctgggcgg gactggggag tgg
2310923DNAArtificial
SequenceDescription of Artificial Sequence Syntheticoligonucleotide
109aggcgtggcc tgggcgggac tgg
2311023DNAArtificial SequenceDescription of Artificial Sequence
Syntheticoligonucleotide 110gcgtggcctg ggcgggactg ggg
2311123DNAArtificial SequenceDescription of
Artificial Sequence Syntheticoligonucleotide 111ccagggaggc gtggcctggg cgg
2311223DNAArtificial
SequenceDescription of Artificial Sequence Syntheticoligonucleotide
112tgtggtagat ccacagatca agg
2311323DNAArtificial SequenceDescription of Artificial Sequence
Syntheticoligonucleotide 113ggtgtgtagt tctgccaatc agg
2311423DNAArtificial SequenceDescription of
Artificial Sequence Syntheticoligonucleotide 114gtcagtggat atctgatccc tgg
2311523DNAArtificial
SequenceDescription of Artificial Sequence Syntheticoligonucleotide
115tagcaccatc caaaggtcag tgg
2311623DNAArtificial SequenceDescription of Artificial Sequence
Syntheticoligonucleotide 116tagcttgtag caccatccaa agg
2311723DNAArtificial SequenceDescription of
Artificial Sequence Syntheticoligonucleotide 117tctaccttct cttgctcaac tgg
2311823DNAArtificial
SequenceDescription of Artificial Sequence Syntheticoligonucleotide
118cactctaata cttctctctc cgg
2311923DNAArtificial SequenceDescription of Artificial Sequence
Syntheticoligonucleotide 119ccatgtgatg aaatgctagg cgg
2312023DNAArtificial SequenceDescription of
Artificial Sequence Syntheticoligonucleotide 120gggccatgtg atgaaatgct agg
2312123DNAArtificial
SequenceDescription of Artificial Sequence Syntheticoligonucleotide
121cagcagttct tgaagtactc cgg
2312223DNAArtificial SequenceDescription of Artificial Sequence
Syntheticoligonucleotide 122ctgcttatat gcagcatctg agg
2312323DNAArtificial SequenceDescription of
Artificial Sequence Syntheticoligonucleotide 123cacactactt gaagcactca agg
2312423DNAArtificial
SequenceDescription of Artificial Sequence Syntheticoligonucleotide
124taccagagtc acacaacaga cgg
2312523DNAArtificial SequenceDescription of Artificial Sequence
Syntheticoligonucleotide 125acactgacta aaagggtctg agg
2312623DNAArtificial SequenceDescription of
Artificial Sequence Syntheticoligonucleotide 126caaggatatc ttgtcttcgt tgg
2312723DNAArtificial
SequenceDescription of Artificial Sequence Syntheticoligonucleotide
127cagggaagta gccttgtgtg tgg
2312823DNAArtificial SequenceDescription of Artificial Sequence
Syntheticoligonucleotide 128gcgggtgttc tctccttcat tgg
2312923DNAArtificial SequenceDescription of
Artificial Sequence Syntheticoligonucleotide 129tagttagcca gagagctccc agg
2313023DNAArtificial
SequenceDescription of Artificial Sequence Syntheticoligonucleotide
130ctttattgag gcttaagcag tgg
2313123DNAArtificial SequenceDescription of Artificial Sequence
Syntheticoligonucleotide 131actcaaggca agctttattg agg
2313223DNAArtificial SequenceDescription of
Artificial Sequence Syntheticoligonucleotide 132ggatatctga tccctggccc tgg
2313323DNAArtificial
SequenceDescription of Artificial Sequence Syntheticoligonucleotide
133ggctcacagg gtgtaacaag cgg
2313423DNAArtificial SequenceDescription of Artificial Sequence
Syntheticoligonucleotide 134tccatcccat gcaggctcac agg
2313523DNAArtificial SequenceDescription of
Artificial Sequence Syntheticoligonucleotide 135agtactccgg atgcagctct cgg
2313623DNAArtificial
SequenceDescription of Artificial Sequence Syntheticoligonucleotide
136agagctccca ggctcagatc tgg
2313723DNAArtificial SequenceDescription of Artificial Sequence
Syntheticoligonucleotide 137gattttccac actgactaaa agg
2313823DNAArtificial SequenceDescription of
Artificial Sequence Syntheticoligonucleotide 138ccgggtcatc catcccatgc agg
2313923DNAArtificial
SequenceDescription of Artificial Sequence Syntheticoligonucleotide
139cctccctgga aagtccccag cgg
2314023DNAArtificial SequenceDescription of Artificial Sequence
Syntheticoligonucleotide 140gccactcccc agtcccgccc agg
2314123DNAArtificial SequenceDescription of
Artificial Sequence Syntheticoligonucleotide 141ccgcccaggc cacgcctccc tgg
2314223DNAHuman
immunodeficiency virus 1 142atcagatatc cactgacctt tgg
2314322DNAHuman immunodeficiency virus 1
143tcagatatcc actgaccttt gg
2214422DNAHuman immunodeficiency virus 1 144tcagatatcc actgaccttt gg
2214521DNAHuman immunodeficiency
virus 1 145cagatatcca ctgacctttg g
2114621DNAHuman immunodeficiency virus 1 146cagatatcca ctgacctttg
g 2114720DNAHuman
immunodeficiency virus 1 147agatatccac tgacctttgg
2014820DNAHuman immunodeficiency virus 1
148agatatccac tgacctttgg
2014919DNAHuman immunodeficiency virus 1 149gatatccact gacctttgg
1915019DNAHuman immunodeficiency
virus 1 150gatatccact gacctttgg
1915118DNAHuman immunodeficiency virus 1 151atatccactg acctttgg
1815218DNAHuman
immunodeficiency virus 1 152atatccactg acctttgg
1815317DNAHuman immunodeficiency virus 1
153tatccactga ccttggg
1715417DNAHuman immunodeficiency virus 1 154tatccactga cctttgg
1715517DNAHuman immunodeficiency
virus 1 155tatccactga cctttgg
1715617DNAHuman immunodeficiency virus 1 156tatccactga ccttaag
1715717DNAHuman
immunodeficiency virus 1 157tatccactga ccttgag
1715816DNAHuman immunodeficiency virus 1
158atccactgac cttagg
1615916DNAHuman immunodeficiency virus 1 159atccactgac cttagg
1616016DNAHuman immunodeficiency
virus 1 160atccactgac cttggg
1616116DNAHuman immunodeficiency virus 1 161atccactgac cttggg
1616216DNAHuman
immunodeficiency virus 1 162atccactgac cttggg
1616316DNAHuman immunodeficiency virus 1
163atccactgac cttggg
1616416DNAHuman immunodeficiency virus 1 164atccactgac ctttgg
1616516DNAHuman immunodeficiency
virus 1 165atccactgac ctttgg
1616616DNAHuman immunodeficiency virus 1 166atccactgac ctttgg
1616716DNAHuman
immunodeficiency virus 1 167atccactgac cttaag
1616816DNAHuman immunodeficiency virus 1
168atccactgac cttaag
1616916DNAHuman immunodeficiency virus 1 169atccactgac cttcag
1617016DNAHuman immunodeficiency
virus 1 170atccactgac cttcag
1617116DNAHuman immunodeficiency virus 1 171atccactgac cttgag
1617216DNAHuman
immunodeficiency virus 1 172atccactgac cttgag
1617315DNAHuman immunodeficiency virus 1
173tccactgacc ttagg
1517415DNAHuman immunodeficiency virus 1 174tccactgacc ttagg
1517515DNAHuman immunodeficiency
virus 1 175tccactgacc ttagg
1517615DNAHuman immunodeficiency virus 1 176tccactgacc ttagg
1517715DNAHuman
immunodeficiency virus 1 177tccactgacc ttagg
1517815DNAHuman immunodeficiency virus 1
178tccactgacc ttagg
1517915DNAHuman immunodeficiency virus 1 179tccactgacc ttggg
1518015DNAHuman immunodeficiency
virus 1 180tccactgacc ttggg
1518115DNAHuman immunodeficiency virus 1 181tccactgacc ttggg
1518215DNAHuman
immunodeficiency virus 1 182tccactgacc ttggg
1518315DNAHuman immunodeficiency virus 1
183tccactgacc ttggg
1518415DNAHuman immunodeficiency virus 1 184tccactgacc ttggg
1518515DNAHuman immunodeficiency
virus 1 185tccactgacc ttggg
1518615DNAHuman immunodeficiency virus 1 186tccactgacc ttggg
1518715DNAHuman
immunodeficiency virus 1 187tccactgacc tttgg
1518815DNAHuman immunodeficiency virus 1
188tccactgacc tttgg
1518915DNAHuman immunodeficiency virus 1 189tccactgacc tttgg
1519015DNAHuman immunodeficiency
virus 1 190tccactgacc tttgg
1519115DNAHuman immunodeficiency virus 1 191tccactgacc tttgg
1519215DNAHuman
immunodeficiency virus 1 192tccactgacc tttgg
1519315DNAHuman immunodeficiency virus 1
193tccactgacc tttgg
1519415DNAHuman immunodeficiency virus 1 194tccactgacc tttgg
1519515DNAHuman immunodeficiency
virus 1 195tccactgacc tttgg
1519615DNAHuman immunodeficiency virus 1 196tccactgacc ttaag
1519715DNAHuman
immunodeficiency virus 1 197tccactgacc ttaag
1519815DNAHuman immunodeficiency virus 1
198tccactgacc ttaag
1519915DNAHuman immunodeficiency virus 1 199tccactgacc ttaag
1520015DNAHuman immunodeficiency
virus 1 200tccactgacc ttaag
1520115DNAHuman immunodeficiency virus 1 201tccactgacc ttcag
1520215DNAHuman
immunodeficiency virus 1 202tccactgacc ttcag
1520315DNAHuman immunodeficiency virus 1
203tccactgacc ttcag
1520415DNAHuman immunodeficiency virus 1 204tccactgacc ttcag
1520515DNAHuman immunodeficiency
virus 1 205tccactgacc ttcag
1520615DNAHuman immunodeficiency virus 1 206tccactgacc ttcag
1520715DNAHuman
immunodeficiency virus 1 207tccactgacc ttcag
1520815DNAHuman immunodeficiency virus 1
208tccactgacc ttcag
1520915DNAHuman immunodeficiency virus 1 209tccactgacc ttcag
1521015DNAHuman immunodeficiency
virus 1 210tccactgacc ttcag
1521115DNAHuman immunodeficiency virus 1 211tccactgacc ttcag
1521215DNAHuman
immunodeficiency virus 1 212tccactgacc ttcag
1521315DNAHuman immunodeficiency virus 1
213tccactgacc ttgag
1521415DNAHuman immunodeficiency virus 1 214tccactgacc ttgag
1521515DNAHuman immunodeficiency
virus 1 215tccactgacc ttgag
1521615DNAHuman immunodeficiency virus 1 216tccactgacc ttgag
1521715DNAHuman
immunodeficiency virus 1 217tccactgacc ttgag
1521815DNAHuman immunodeficiency virus 1
218tccactgacc ttgag
1521915DNAHuman immunodeficiency virus 1 219tccactgacc ttgag
1522015DNAHuman immunodeficiency
virus 1 220tccactgacc ttgag
1522115DNAHuman immunodeficiency virus 1 221tccactgacc ttgag
1522215DNAHuman
immunodeficiency virus 1 222tccactgacc tttag
1522315DNAHuman immunodeficiency virus 1
223tccactgacc tttag
1522415DNAHuman immunodeficiency virus 1 224tccactgacc tttag
1522515DNAHuman immunodeficiency
virus 1 225tccactgacc tttag
1522615DNAHuman immunodeficiency virus 1 226tccactgacc tttag
1522723DNAHuman
immunodeficiency virus 1 227cagcagttct tgaagtactc cgg
2322822DNAHuman immunodeficiency virus 1
228agcagttctt gaagtactcc gg
2222921DNAHuman immunodeficiency virus 1 229gcagttcttg aagtactccg g
2123020DNAHuman immunodeficiency
virus 1 230cagttcttga agtactccgg
2023119DNAHuman immunodeficiency virus 1 231agttcttgaa gtactccgg
1923218DNAHuman
immunodeficiency virus 1 232gttcttgaag tactccgg
1823317DNAHuman immunodeficiency virus 1
233ttcttgaagt actccgg
1723416DNAHuman immunodeficiency virus 1 234tcttgaagta ctccgg
1623516DNAHuman immunodeficiency
virus 1 235tcttgaagta ctctag
1623615DNAHuman immunodeficiency virus 1 236cttgaagtac tcagg
1523715DNAHuman
immunodeficiency virus 1 237cttgaagtac tcagg
1523815DNAHuman immunodeficiency virus 1
238cttgaagtac tcagg
1523915DNAHuman immunodeficiency virus 1 239cttgaagtac tcagg
1524015DNAHuman immunodeficiency
virus 1 240cttgaagtac tccgg
1524115DNAHuman immunodeficiency virus 1 241cttgaagtac tctgg
1524215DNAHuman
immunodeficiency virus 1 242cttgaagtac tcaag
1524315DNAHuman immunodeficiency virus 1
243cttgaagtac tcaag
1524415DNAHuman immunodeficiency virus 1 244cttgaagtac tcaag
1524515DNAHuman immunodeficiency
virus 1 245cttgaagtac tcaag
1524615DNAHuman immunodeficiency virus 1 246cttgaagtac tcaag
1524715DNAHuman
immunodeficiency virus 1 247cttgaagtac tccag
1524815DNAHuman immunodeficiency virus 1
248cttgaagtac tccag
1524915DNAHuman immunodeficiency virus 1 249cttgaagtac tccag
1525015DNAHuman immunodeficiency
virus 1 250cttgaagtac tccag
1525115DNAHuman immunodeficiency virus 1 251cttgaagtac tctag
1525215DNAHuman
immunodeficiency virus 1 252cttgaagtac tctag
1525323DNAHuman immunodeficiency virus 1
253atcagatatc cactgacctt tgg
2325422DNAHuman immunodeficiency virus 1 254tcagatatcc actgaccttt gg
2225522DNAHuman immunodeficiency
virus 1 255tcagatatcc actgaccttt gg
2225621DNAHuman immunodeficiency virus 1 256cagatatcca ctgacctttg
g 2125721DNAHuman
immunodeficiency virus 1 257cagatatcca ctgacctttg g
2125820DNAHuman immunodeficiency virus 1
258agatatccac tgacctttgg
2025920DNAHuman immunodeficiency virus 1 259agatatccac tgacctttgg
2026019DNAHuman immunodeficiency
virus 1 260gatatccact gacctttgg
1926119DNAHuman immunodeficiency virus 1 261gatatccact gacctttgg
1926218DNAHuman
immunodeficiency virus 1 262atatccactg acctttgg
1826318DNAHuman immunodeficiency virus 1
263atatccactg acctttgg
1826417DNAHuman immunodeficiency virus 1 264tatccactga ccttggg
1726517DNAHuman immunodeficiency
virus 1 265tatccactga cctttgg
1726617DNAHuman immunodeficiency virus 1 266tatccactga cctttgg
1726717DNAHuman
immunodeficiency virus 1 267tatccactga ccttaag
1726817DNAHuman immunodeficiency virus 1
268tatccactga ccttgag
1726916DNAHuman immunodeficiency virus 1 269atccactgac cttagg
1627016DNAHuman immunodeficiency
virus 1 270atccactgac cttagg
1627116DNAHuman immunodeficiency virus 1 271atccactgac cttggg
1627216DNAHuman
immunodeficiency virus 1 272atccactgac cttggg
1627316DNAHuman immunodeficiency virus 1
273atccactgac cttggg
1627416DNAHuman immunodeficiency virus 1 274atccactgac cttggg
1627516DNAHuman immunodeficiency
virus 1 275atccactgac ctttgg
1627616DNAHuman immunodeficiency virus 1 276atccactgac ctttgg
1627716DNAHuman
immunodeficiency virus 1 277atccactgac ctttgg
1627816DNAHuman immunodeficiency virus 1
278atccactgac cttaag
1627916DNAHuman immunodeficiency virus 1 279atccactgac cttaag
1628016DNAHuman immunodeficiency
virus 1 280atccactgac cttcag
1628116DNAHuman immunodeficiency virus 1 281atccactgac cttcag
1628216DNAHuman
immunodeficiency virus 1 282atccactgac cttgag
1628316DNAHuman immunodeficiency virus 1
283atccactgac cttgag
1628415DNAHuman immunodeficiency virus 1 284tccactgacc ttagg
1528515DNAHuman immunodeficiency
virus 1 285tccactgacc ttagg
1528615DNAHuman immunodeficiency virus 1 286tccactgacc ttagg
1528715DNAHuman
immunodeficiency virus 1 287tccactgacc ttagg
1528815DNAHuman immunodeficiency virus 1
288tccactgacc ttagg
1528915DNAHuman immunodeficiency virus 1 289tccactgacc ttagg
1529015DNAHuman immunodeficiency
virus 1 290tccactgacc ttggg
1529115DNAHuman immunodeficiency virus 1 291tccactgacc ttggg
1529215DNAHuman
immunodeficiency virus 1 292tccactgacc ttggg
1529315DNAHuman immunodeficiency virus 1
293tccactgacc ttggg
1529415DNAHuman immunodeficiency virus 1 294tccactgacc ttggg
1529515DNAHuman immunodeficiency
virus 1 295tccactgacc ttggg
1529615DNAHuman immunodeficiency virus 1 296tccactgacc ttggg
1529715DNAHuman
immunodeficiency virus 1 297tccactgacc ttggg
1529815DNAHuman immunodeficiency virus 1
298tccactgacc tttgg
1529915DNAHuman immunodeficiency virus 1 299tccactgacc tttgg
1530015DNAHuman immunodeficiency
virus 1 300tccactgacc tttgg
1530115DNAHuman immunodeficiency virus 1 301tccactgacc tttgg
1530215DNAHuman
immunodeficiency virus 1 302tccactgacc tttgg
1530315DNAHuman immunodeficiency virus 1
303tccactgacc tttgg
1530415DNAHuman immunodeficiency virus 1 304tccactgacc tttgg
1530515DNAHuman immunodeficiency
virus 1 305tccactgacc tttgg
1530615DNAHuman immunodeficiency virus 1 306tccactgacc tttgg
1530715DNAHuman
immunodeficiency virus 1 307tccactgacc ttaag
1530815DNAHuman immunodeficiency virus 1
308tccactgacc ttaag
1530915DNAHuman immunodeficiency virus 1 309tccactgacc ttaag
1531015DNAHuman immunodeficiency
virus 1 310tccactgacc ttaag
1531115DNAHuman immunodeficiency virus 1 311tccactgacc ttaag
1531215DNAHuman
immunodeficiency virus 1 312tccactgacc ttcag
1531315DNAHuman immunodeficiency virus 1
313tccactgacc ttcag
1531415DNAHuman immunodeficiency virus 1 314tccactgacc ttcag
1531515DNAHuman immunodeficiency
virus 1 315tccactgacc ttcag
1531615DNAHuman immunodeficiency virus 1 316tccactgacc ttcag
1531715DNAHuman
immunodeficiency virus 1 317tccactgacc ttcag
1531815DNAHuman immunodeficiency virus 1
318tccactgacc ttcag
1531915DNAHuman immunodeficiency virus 1 319tccactgacc ttcag
1532015DNAHuman immunodeficiency
virus 1 320tccactgacc ttcag
1532115DNAHuman immunodeficiency virus 1 321tccactgacc ttcag
1532215DNAHuman
immunodeficiency virus 1 322tccactgacc ttcag
1532315DNAHuman immunodeficiency virus 1
323tccactgacc ttcag
1532415DNAHuman immunodeficiency virus 1 324tccactgacc ttgag
1532515DNAHuman immunodeficiency
virus 1 325tccactgacc ttgag
1532615DNAHuman immunodeficiency virus 1 326tccactgacc ttgag
1532715DNAHuman
immunodeficiency virus 1 327tccactgacc ttgag
1532815DNAHuman immunodeficiency virus 1
328tccactgacc ttgag
1532915DNAHuman immunodeficiency virus 1 329tccactgacc ttgag
1533015DNAHuman immunodeficiency
virus 1 330tccactgacc ttgag
1533115DNAHuman immunodeficiency virus 1 331tccactgacc ttgag
1533215DNAHuman
immunodeficiency virus 1 332tccactgacc ttgag
1533315DNAHuman immunodeficiency virus 1
333tccactgacc tttag
1533415DNAHuman immunodeficiency virus 1 334tccactgacc tttag
1533515DNAHuman immunodeficiency
virus 1 335tccactgacc tttag
1533615DNAHuman immunodeficiency virus 1 336tccactgacc tttag
1533715DNAHuman
immunodeficiency virus 1 337tccactgacc tttag
1533823DNAHuman immunodeficiency virus 1
338cagcagttct tgaagtactc cgg
2333922DNAHuman immunodeficiency virus 1 339agcagttctt gaagtactcc gg
2234021DNAHuman immunodeficiency
virus 1 340gcagttcttg aagtactccg g
2134120DNAHuman immunodeficiency virus 1 341cagttcttga agtactccgg
2034219DNAHuman
immunodeficiency virus 1 342agttcttgaa gtactccgg
1934318DNAHuman immunodeficiency virus 1
343gttcttgaag tactccgg
1834417DNAHuman immunodeficiency virus 1 344ttcttgaagt actccgg
1734516DNAHuman immunodeficiency
virus 1 345tcttgaagta ctccgg
1634616DNAHuman immunodeficiency virus 1 346tcttgaagta ctctag
1634715DNAHuman
immunodeficiency virus 1 347cttgaagtac tcagg
1534815DNAHuman immunodeficiency virus 1
348cttgaagtac tcagg
1534915DNAHuman immunodeficiency virus 1 349cttgaagtac tcagg
1535015DNAHuman immunodeficiency
virus 1 350cttgaagtac tcagg
1535115DNAHuman immunodeficiency virus 1 351cttgaagtac tccgg
1535215DNAHuman
immunodeficiency virus 1 352cttgaagtac tctgg
1535315DNAHuman immunodeficiency virus 1
353cttgaagtac tcaag
1535415DNAHuman immunodeficiency virus 1 354cttgaagtac tcaag
1535515DNAHuman immunodeficiency
virus 1 355cttgaagtac tcaag
1535615DNAHuman immunodeficiency virus 1 356cttgaagtac tcaag
1535715DNAHuman
immunodeficiency virus 1 357cttgaagtac tcaag
1535815DNAHuman immunodeficiency virus 1
358cttgaagtac tccag
1535915DNAHuman immunodeficiency virus 1 359cttgaagtac tccag
1536015DNAHuman immunodeficiency
virus 1 360cttgaagtac tccag
1536115DNAHuman immunodeficiency virus 1 361cttgaagtac tccag
1536215DNAHuman
immunodeficiency virus 1 362cttgaagtac tctag
1536315DNAHuman immunodeficiency virus 1
363cttgaagtac tctag
1536423DNAHuman immunodeficiency virus 1 364gatctgtgga tctaccacac aca
2336526DNAHuman immunodeficiency
virus 1 365gatctgtgga tctaccacac acaagg
2636620DNAHuman immunodeficiency virus 1 366gattggcaga actacacacc
2036723DNAHuman
immunodeficiency virus 1 367gattggcaga actacacacc agg
2336827DNAHuman immunodeficiency virus 1
368gccagggatc agatatccac tgacctt
2736930DNAHuman immunodeficiency virus 1 369gccagggatc agatatccac
tgacctttgg 3037030DNAHuman
immunodeficiency virus 1 370gagtacttca agaactgctg acatcgagct
3037133DNAHuman immunodeficiency virus 1
371ccggagtact tcaagaactg ctgacatcga gct
3337220DNAHuman immunodeficiency virus 1 372gcgtggcctg ggcgggactg
2037323DNAHuman immunodeficiency
virus 1 373gcgtggcctg ggcgggactg ggg
2337422DNAHuman immunodeficiency virus 1 374tcagatgctg catataagca
gc 2237525DNAHuman
immunodeficiency virus 1 375ccctcagatg ctgcatataa gcagc
25376634DNAArtificial Sequence Description of
Artificial Sequence Synthetic polynucleotide 376tggaagggct aattcactcc
caacgaagac aagatatcct tgatctgtgg atctaccaca 60cacaaggcta cttccctgat
tggcagaact acacaccagg gccagggatc agatatccac 120tgacctttgg atggtgctac
aagctagtac cagttgagca agagaaggta gaagaagcca 180atgaaggaga gaacacccgc
ttgttacacc ctgtgagcct gcatgggatg gatgacccgg 240agagagaagt attagagtgg
aggtttgaca gccgcctagc atttcatcac atggcccgag 300agctgcatcc ggagtacttc
aagaactgct gacatcgagc ttgctacaag ggactttccg 360ctggggactt tccagggagg
cgtggcctgg gcgggactgg ggagtggcga gccctcagat 420gctgcatata agcagctgct
ttttgcttgt actgggtctc tctggttaga ccagatctga 480gcctgggagc tctctggcta
actagggaac ccactgctta agcctcaata aagcttgcct 540tgagtgcttc aagtagtgtg
tgcccgtctg ttgtgtgact ctggtaacta gagatccctc 600agaccctttt agtcagtgtg
gaaaatctct agca 634377453DNAArtificial
Sequence Description of Artificial Sequence Synthetic polynucleotide
377tggaagggct aattcactcc caacgaagac aagatatcct tgatctgtgg atctaccaca
60cacaaggcta cttccctgat tggcagaact acacaccagg gccagggatc agatatccac
120tgacctttgg atggtgctac aagctagtac cagttgagca agagaaggta gaagaagcca
180atgaaggaga gaacacccgc ttgttacacc ctgtgagcct gcatgggatg gatgacccgg
240agagagaagt attagagtgg aggtttgaca gccgcctagc atttcatcac atggcccgag
300agctgcatcc ggagtacttc aagaactgct gacatcgagc ttgctacaag ggactttccg
360ctggggactt tccagggagg cgtggcctgg gcgggactgg ggagtggcga gccctcagat
420gctgcatata agcagctgct ttttgcttgt act
45337897DNAArtificial Sequence Description of Artificial Sequence
Synthetic oligonucleotide 378gggtctctct ggttagacca gatctgagcc tgggagctct
ctggctaact agggaaccca 60ctgcttaagc ctcaataaag cttgccttga gtgcttc
9737984DNAArtificial Sequence Description of
Artificial Sequence Synthetic oligonucleotide 379aagtagtgtg tgcccgtctg
ttgtgtgact ctggtaacta gagatccctc agaccctttt 60agtcagtgtg gaaaatctct
agca 84380818DNAArtificial
Sequence Description of Artificial Sequence Synthetic polynucleotide
380tggaagggat ttattacagt gcaagaagac atagaatctt agacatatac ttagaaaagg
60aagaaggcat cataccagat tggcaggatt acacctcagg accaggaatt agatacccaa
120agacatttgg ctggctatgg aaattagtcc ctgtaaatgt atcagatgag gcacaggagg
180atgaggagca ttatttaatg catccagctc aaacttccca gtgggatgac ccttggggag
240aggttctagc atggaagttt gatccaactc tggcctacac ttatgaggca tatgttagat
300acccagaaga gtttggaagc aagtcaggcc tgtcagagga agaggttaga agaaggctaa
360ccgcaagagg ccttcttaac atggctgaca agaaggaaac tcgctgaaac agcagggact
420ttccacaagg ggatgttacg gggaggtact ggggaggagc cggtcgggaa cgcccacttt
480cttgatgtat aaatatcact gcatttcgct ctgtattcag tcgctctgcg gagaggctgg
540cagattgagc cctgggaggt tctctccagc actagcaggt agagcctggg tgttccctgc
600tagactctca ccagcacttg gccggtgctg ggcagagtga ctccacgctt gcttgcttaa
660agccctcttc aataaagctg ccattttaga agtaagctag tgtgtgttcc catctctcct
720agccgccgcc tggtcaactc ggtactcaat aataagaaga ccctggtctg ttaggaccct
780ttctgctttg ggaaaccgaa gcaggaaaat ccctagca
818381517DNAArtificial Sequence Description of Artificial Sequence
Synthetic polynucleotide 381tggaagggat ttattacagt gcaagaagac atagaatctt
agacatatac ttagaaaagg 60aagaaggcat cataccagat tggcaggatt acacctcagg
accaggaatt agatacccaa 120agacatttgg ctggctatgg aaattagtcc ctgtaaatgt
atcagatgag gcacaggagg 180atgaggagca ttatttaatg catccagctc aaacttccca
gtgggatgac ccttggggag 240aggttctagc atggaagttt gatccaactc tggcctacac
ttatgaggca tatgttagat 300acccagaaga gtttggaagc aagtcaggcc tgtcagagga
agaggttaga agaaggctaa 360ccgcaagagg ccttcttaac atggctgaca agaaggaaac
tcgctgaaac agcagggact 420ttccacaagg ggatgttacg gggaggtact ggggaggagc
cggtcgggaa cgcccacttt 480cttgatgtat aaatatcact gcatttcgct ctgtatt
517382176DNAArtificial Sequence Description of
Artificial Sequence Synthetic polynucleotide 382cagtcgctct gcggagaggc
tggcagattg agccctggga ggttctctcc agcactagca 60ggtagagcct gggtgttccc
tgctagactc tcaccagcac ttggccggtg ctgggcagag 120tgactccacg cttgcttgct
taaagccctc ttcaataaag ctgccatttt agaagt 176383125DNAArtificial
Sequence Description of Artificial Sequence Synthetic polynucleotide
383aagctagtgt gtgttcccat ctctcctagc cgccgcctgg tcaactcggt actcaataat
60aagaagaccc tggtctgtta ggaccctttc tgctttggga aaccgaagca ggaaaatccc
120tagca
12538414825DNAHuman immunodeficiency virus 1 384tggaagggct aatttggtcc
caaaaaagac aagagatcct tgatctgtgg atctaccaca 60cacaaggcta cttccctgat
tggcagaact acacaccagg gccagggatc agatatccac 120tgacctttgg atggtgcttc
aagttagtac cagttgaacc agagcaagta gaagaggcca 180atgaaggaga gaacaacagc
ttgttacacc ctatgagcca gcatgggatg gaggacccgg 240agggagaagt attagtgtgg
aagtttgaca gcctcctagc atttcgtcac atggcccgag 300agctgcatcc ggagtactac
aaagactgct gacatcgagc tttctacaag ggactttccg 360ctggggactt tccagggagg
tgtggcctgg gcgggactgg ggagtggcga gccctcagat 420gctacatata agcagctgct
ttttgcctgt actgggtctc tctggttaga ccagatctga 480gcctgggagc tctctggcta
actagggaac ccactgctta agcctcaata aagcttgcct 540tgagtgctca aagtagtgtg
tgcccgtctg ttgtgtgact ctggtaacta gagatccctc 600agaccctttt agtcagtgtg
gaaaatctct agcagtggcg cccgaacagg gacttgaaag 660cgaaagtaaa gccagaggag
atctctcgac gcaggactcg gcttgctgaa gcgcgcacgg 720caagaggcga ggggcggcga
ctggtgagta cgccaaaaat tttgactagc ggaggctaga 780aggagagaga tgggtgcgag
agcgtcggta ttaagcgggg gagaattaga taaatgggaa 840aaaattcggt taaggccagg
gggaaagaaa caatataaac taaaacatat agtatgggca 900agcagggagc tagaacgatt
cgcagttaat cctggccttt tagagacatc agaaggctgt 960agacaaatac tgggacagct
acaaccatcc cttcagacag gatcagaaga acttagatca 1020ttatataata caatagcagt
cctctattgt gtgcatcaaa ggatagatgt aaaagacacc 1080aaggaagcct tagataagat
agaggaagag caaaacaaaa gtaagaaaaa ggcacagcaa 1140gcagcagctg acacaggaaa
caacagccag gtcagccaaa attaccctat agtgcagaac 1200ctccaggggc aaatggtaca
tcaggccata tcacctagaa ctttaaatgc atgggtaaaa 1260gtagtagaag agaaggcttt
cagcccagaa gtaataccca tgttttcagc attatcagaa 1320ggagccaccc cacaagattt
aaataccatg ctaaacacag tggggggaca tcaagcagcc 1380atgcaaatgt taaaagagac
catcaatgag gaagctgcag aatgggatag attgcatcca 1440gtgcatgcag ggcctattgc
accaggccag atgagagaac caaggggaag tgacatagca 1500ggaactacta gtacccttca
ggaacaaata ggatggatga cacataatcc acctatccca 1560gtaggagaaa tctataaaag
atggataatc ctgggattaa ataaaatagt aagaatgtat 1620agccctacca gcattctgga
cataagacaa ggaccaaagg aaccctttag agactatgta 1680gaccgattct ataaaactct
aagagccgag caagcttcac aagaggtaaa aaattggatg 1740acagaaacct tgttggtcca
aaatgcgaac ccagattgta agactatttt aaaagcattg 1800ggaccaggag cgacactaga
agaaatgatg acagcatgtc agggagtggg gggacccggc 1860cataaagcaa gagttttggc
tgaagcaatg agccaagtaa caaatccagc taccataatg 1920atacagaaag gcaattttag
gaaccaaaga aagactgtta agtgtttcaa ttgtggcaaa 1980gaagggcaca tagccaaaaa
ttgcagggcc cctaggaaaa agggctgttg gaaatgtgga 2040aaggaaggac accaaatgaa
agattgtact gagagacagg ctaatttttt agggaagatc 2100tggccttccc acaagggaag
gccagggaat tttcttcaga gcagaccaga gccaacagcc 2160ccaccagaag agagcttcag
gtttggggaa gagacaacaa ctccctctca gaagcaggag 2220ccgatagaca aggaactgta
tcctttagct tccctcagat cactctttgg cagcgacccc 2280tcgtcacaat aaagataggg
gggcaattaa aggaagctct attagataca ggagcagatg 2340atacagtatt agaagaaatg
aatttgccag gaagatggaa accaaaaatg atagggggaa 2400ttggaggttt tatcaaagta
agacagtatg atcagatact catagaaatc tgcggacata 2460aagctatagg tacagtatta
gtaggaccta cacctgtcaa cataattgga agaaatctgt 2520tgactcagat tggctgcact
ttaaattttc ccattagtcc tattgagact gtaccagtaa 2580aattaaagcc aggaatggat
ggcccaaaag ttaaacaatg gccattgaca gaagaaaaaa 2640taaaagcatt agtagaaatt
tgtacagaaa tggaaaagga aggaaaaatt tcaaaaattg 2700ggcctgaaaa tccatacaat
actccagtat ttgccataaa gaaaaaagac agtactaaat 2760ggagaaaatt agtagatttc
agagaactta ataagagaac tcaagatttc tgggaagttc 2820aattaggaat accacatcct
gcagggttaa aacagaaaaa atcagtaaca gtactggatg 2880tgggcgatgc atatttttca
gttcccttag ataaagactt caggaagtat actgcattta 2940ccatacctag tataaacaat
gagacaccag ggattagata tcagtacaat gtgcttccac 3000agggatggaa aggatcacca
gcaatattcc agtgtagcat gacaaaaatc ttagagcctt 3060ttagaaaaca aaatccagac
atagtcatct atcaatacat ggatgatttg tatgtaggat 3120ctgacttaga aatagggcag
catagaacaa aaatagagga actgagacaa catctgttga 3180ggtggggatt taccacacca
gacaaaaaac atcagaaaga acctccattc ctttggatgg 3240gttatgaact ccatcctgat
aaatggacag tacagcctat agtgctgcca gaaaaggaca 3300gctggactgt caatgacata
cagaaattag tgggaaaatt gaattgggca agtcagattt 3360atgcagggat taaagtaagg
caattatgta aacttcttag gggaaccaaa gcactaacag 3420aagtagtacc actaacagaa
gaagcagagc tagaactggc agaaaacagg gagattctaa 3480aagaaccggt acatggagtg
tattatgacc catcaaaaga cttaatagca gaaatacaga 3540agcaggggca aggccaatgg
acatatcaaa tttatcaaga gccatttaaa aatctgaaaa 3600caggaaagta tgcaagaatg
aagggtgccc acactaatga tgtgaaacaa ttaacagagg 3660cagtacaaaa aatagccaca
gaaagcatag taatatgggg aaagactcct aaatttaaat 3720tacccataca aaaggaaaca
tgggaagcat ggtggacaga gtattggcaa gccacctgga 3780ttcctgagtg ggagtttgtc
aatacccctc ccttagtgaa gttatggtac cagttagaga 3840aagaacccat aataggagca
gaaactttct atgtagatgg ggcagccaat agggaaacta 3900aattaggaaa agcaggatat
gtaactgaca gaggaagaca aaaagttgtc cccctaacgg 3960acacaacaaa tcagaagact
gagttacaag caattcatct agctttgcag gattcgggat 4020tagaagtaaa catagtgaca
gactcacaat atgcattggg aatcattcaa gcacaaccag 4080ataagagtga atcagagtta
gtcagtcaaa taatagagca gttaataaaa aaggaaaaag 4140tctacctggc atgggtacca
gcacacaaag gaattggagg aaatgaacaa gtagataaat 4200tggtcagtgc tggaatcagg
aaagtactat ttttagatgg aatagataag gcccaagaag 4260aacatgagaa atatcacagt
aattggagag caatggctag tgattttaac ctaccacctg 4320tagtagcaaa agaaatagta
gccagctgtg ataaatgtca gctaaaaggg gaagccatgc 4380atggacaagt agactgtagc
ccaggaatat ggcagctaga ttgtacacat ttagaaggaa 4440aagttatctt ggtagcagtt
catgtagcca gtggatatat agaagcagaa gtaattccag 4500cagagacagg gcaagaaaca
gcatacttcc tcttaaaatt agcaggaaga tggccagtaa 4560aaacagtaca tacagacaat
ggcagcaatt tcaccagtac tacagttaag gccgcctgtt 4620ggtgggcggg gatcaagcag
gaatttggca ttccctacaa tccccaaagt caaggagtaa 4680tagaatctat gaataaagaa
ttaaagaaaa ttataggaca ggtaagagat caggctgaac 4740atcttaagac agcagtacaa
atggcagtat tcatccacaa ttttaaaaga aaagggggga 4800ttggggggta cagtgcaggg
gaaagaatag tagacataat agcaacagac atacaaacta 4860aagaattaca aaaacaaatt
acaaaaattc aaaattttcg ggtttattac agggacagca 4920gagatccagt ttggaaagga
ccagcaaagc tcctctggaa aggtgaaggg gcagtagtaa 4980tacaagataa tagtgacata
aaagtagtgc caagaagaaa agcaaagatc atcagggatt 5040atggaaaaca gatggcaggt
gatgattgtg tggcaagtag acaggatgag gattaacaca 5100tggaaaagat tagtaaaaca
ccatatgtat atttcaagga aagctaagga ctggttttat 5160agacatcact atgaaagtac
taatccaaaa ataagttcag aagtacacat cccactaggg 5220gatgctaaat tagtaataac
aacatattgg ggtctgcata caggagaaag agactggcat 5280ttgggtcagg gagtctccat
agaatggagg aaaaagagat atagcacaca agtagaccct 5340gacctagcag accaactaat
tcatctgcac tattttgatt gtttttcaga atctgctata 5400agaaatacca tattaggacg
tatagttagt cctaggtgtg aatatcaagc aggacataac 5460aaggtaggat ctctacagta
cttggcacta gcagcattaa taaaaccaaa acagataaag 5520ccacctttgc ctagtgttag
gaaactgaca gaggacagat ggaacaagcc ccagaagacc 5580aagggccaca gagggagcca
tacaatgaat ggacactaga gcttttagag gaacttaaga 5640gtgaagctgt tagacatttt
cctaggatat ggctccataa cttaggacaa catatctatg 5700aaacttacgg ggatacttgg
gcaggagtgg aagccataat aagaattctg caacaactgc 5760tgtttatcca tttcagaatt
gggtgtcgac atagcagaat aggcgttact cgacagagga 5820gagcaagaaa tggagccagt
agatcctaga ctagagccct ggaagcatcc aggaagtcag 5880cctaaaactg cttgtaccaa
ttgctattgt aaaaagtgtt gctttcattg ccaagtttgt 5940ttcatgacaa aagccttagg
catctcctat ggcaggaaga agcggagaca gcgacgaaga 6000gctcatcaga acagtcagac
tcatcaagct tctctatcaa agcagtaagt agtacatgta 6060atgcaaccta taatagtagc
aatagtagca ttagtagtag caataataat agcaatagtt 6120gtgtggtcca tagtaatcat
agaatatagg aaaatattaa gacaaagaaa aatagacagg 6180ttaattgata gactaataga
aagagcagaa gacagtggca atgagagtga aggagaagta 6240tcagcacttg tggagatggg
ggtggaaatg gggcaccatg ctccttggga tattgatgat 6300ctgtagtgct acagaaaaat
tgtgggtcac agtctattat ggggtacctg tgtggaagga 6360agcaaccacc actctatttt
gtgcatcaga tgctaaagca tatgatacag aggtacataa 6420tgtttgggcc acacatgcct
gtgtacccac agaccccaac ccacaagaag tagtattggt 6480aaatgtgaca gaaaatttta
acatgtggaa aaatgacatg gtagaacaga tgcatgagga 6540tataatcagt ttatgggatc
aaagcctaaa gccatgtgta aaattaaccc cactctgtgt 6600tagtttaaag tgcactgatt
tgaagaatga tactaatacc aatagtagta gcgggagaat 6660gataatggag aaaggagaga
taaaaaactg ctctttcaat atcagcacaa gcataagaga 6720taaggtgcag aaagaatatg
cattctttta taaacttgat atagtaccaa tagataatac 6780cagctatagg ttgataagtt
gtaacacctc agtcattaca caggcctgtc caaaggtatc 6840ctttgagcca attcccatac
attattgtgc cccggctggt tttgcgattc taaaatgtaa 6900taataagacg ttcaatggaa
caggaccatg tacaaatgtc agcacagtac aatgtacaca 6960tggaatcagg ccagtagtat
caactcaact gctgttaaat ggcagtctag cagaagaaga 7020tgtagtaatt agatctgcca
atttcacaga caatgctaaa accataatag tacagctgaa 7080cacatctgta gaaattaatt
gtacaagacc caacaacaat acaagaaaaa gtatccgtat 7140ccagagggga ccagggagag
catttgttac aataggaaaa ataggaaata tgagacaagc 7200acattgtaac attagtagag
caaaatggaa tgccacttta aaacagatag ctagcaaatt 7260aagagaacaa tttggaaata
ataaaacaat aatctttaag caatcctcag gaggggaccc 7320agaaattgta acgcacagtt
ttaattgtgg aggggaattt ttctactgta attcaacaca 7380actgtttaat agtacttggt
ttaatagtac ttggagtact gaagggtcaa ataacactga 7440aggaagtgac acaatcacac
tcccatgcag aataaaacaa tttataaaca tgtggcagga 7500agtaggaaaa gcaatgtatg
cccctcccat cagtggacaa attagatgtt catcaaatat 7560tactgggctg ctattaacaa
gagatggtgg taataacaac aatgggtccg agatcttcag 7620acctggagga ggcgatatga
gggacaattg gagaagtgaa ttatataaat ataaagtagt 7680aaaaattgaa ccattaggag
tagcacccac caaggcaaag agaagagtgg tgcagagaga 7740aaaaagagca gtgggaatag
gagctttgtt ccttgggttc ttgggagcag caggaagcac 7800tatgggcgca gcgtcaatga
cgctgacggt acaggccaga caattattgt ctgatatagt 7860gcagcagcag aacaatttgc
tgagggctat tgaggcgcaa cagcatctgt tgcaactcac 7920agtctggggc atcaaacagc
tccaggcaag aatcctggct gtggaaagat acctaaagga 7980tcaacagctc ctggggattt
ggggttgctc tggaaaactc atttgcacca ctgctgtgcc 8040ttggaatgct agttggagta
ataaatctct ggaacagatt tggaataaca tgacctggat 8100ggagtgggac agagaaatta
acaattacac aagcttaata cactccttaa ttgaagaatc 8160gcaaaaccag caagaaaaga
atgaacaaga attattggaa ttagataaat gggcaagttt 8220gtggaattgg tttaacataa
caaattggct gtggtatata aaattattca taatgatagt 8280aggaggcttg gtaggtttaa
gaatagtttt tgctgtactt tctatagtga atagagttag 8340gcagggatat tcaccattat
cgtttcagac ccacctccca atcccgaggg gacccgacag 8400gcccgaagga atagaagaag
aaggtggaga gagagacaga gacagatcca ttcgattagt 8460gaacggatcc ttagcactta
tctgggacga tctgcggagc ctgtgcctct tcagctacca 8520ccgcttgaga gacttactct
tgattgtaac gaggattgtg gaacttctgg gacgcagggg 8580gtgggaagcc ctcaaatatt
ggtggaatct cctacagtat tggagtcagg aactaaagaa 8640tagtgctgtt aacttgctca
atgccacagc catagcagta gctgagggga cagatagggt 8700tatagaagta ttacaagcag
cttatagagc tattcgccac atacctagaa gaataagaca 8760gggcttggaa aggattttgc
tataagatgg gtggcaagtg gtcaaaaagt agtgtgattg 8820gatggcctgc tgtaagggaa
agaatgagac gagctgagcc agcagcagat ggggtgggag 8880cagtatctcg agacctagaa
aaacatggag caatcacaag tagcaataca gcagctaaca 8940atgctgcttg tgcctggcta
gaagcacaag aggaggaaga ggtgggtttt ccagtcacac 9000ctcaggtacc tttaagacca
atgacttaca aggcagctgt agatcttagc cactttttaa 9060aagaaaaggg gggactggaa
gggctaattc actcccaaag aagacaagat atccttgatc 9120tgtggatcta ccacacacaa
ggctacttcc ctgattggca gaactacaca ccagggccag 9180gggtcagata tccactgacc
tttggatggt gctacaagct agtaccagtt gagccagata 9240aggtagaaga ggccaataaa
ggagagaaca ccagcttgtt acaccctgtg agcctgcatg 9300gaatggatga ccctgagaga
gaagtgttag agtggaggtt tgacagccgc ctagcatttc 9360atcacgtggc ccgagagctg
catccggagt acttcaagaa ctgctgacat cgagcttgct 9420acaagggact ttccgctggg
gactttccag ggaggcgtgg cctgggcggg actggggagt 9480ggcgagccct cagatgctgc
atataagcag ctgctttttg cctgtactgg gtctctctgg 9540ttagaccaga tctgagcctg
ggagctctct ggctaactag ggaacccact gcttaagcct 9600caataaagct tgccttgagt
gcttcaagta gtgtgtgccc gtctgttgtg tgactctggt 9660aactagagat ccctcagacc
cttttagtca gtgtggaaaa tctctagcac ccaggaggta 9720gaggttgcag tgagccaaga
tcgcgccact gcattccagc ctgggcaaga aaacaagact 9780gtctaaaata ataataataa
gttaagggta ttaaatatat ttatacatgg aggtcataaa 9840aatatatata tttgggctgg
gcgcagtggc tcacacctgc gcccggccct ttgggaggcc 9900gaggcaggtg gatcacctga
gtttgggagt tccagaccag cctgaccaac atggagaaac 9960cccttctctg tgtattttta
gtagatttta ttttatgtgt attttattca caggtatttc 10020tggaaaactg aaactgtttt
tcctctactc tgataccaca agaatcatca gcacagagga 10080agacttctgt gatcaaatgt
ggtgggagag ggaggttttc accagcacat gagcagtcag 10140ttctgccgca gactcggcgg
gtgtccttcg gttcagttcc aacaccgcct gcctggagag 10200aggtcagacc acagggtgag
ggctcagtcc ccaagacata aacacccaag acataaacac 10260ccaacaggtc caccccgcct
gctgcccagg cagagccgat tcaccaagac gggaattagg 10320atagagaaag agtaagtcac
acagagccgg ctgtgcggga gaacggagtt ctattatgac 10380tcaaatcagt ctccccaagc
attcggggat cagagttttt aaggataact tagtgtgtag 10440ggggccagtg agttggagat
gaaagcgtag ggagtcgaag gtgtcctttt gcgccgagtc 10500agttcctggg tgggggccac
aagatcggat gagccagttt atcaatccgg gggtgccagc 10560tgatccatgg agtgcagggt
ctgcaaaata tctcaagcac tgattgatct taggttttac 10620aatagtgatg ttaccccagg
aacaatttgg ggaaggtcag aatcttgtag cctgtagctg 10680catgactcct aaaccataat
ttcttttttg tttttttttt tttatttttg agacagggtc 10740tcactctgtc acctaggctg
gagtgcagtg gtgcaatcac agctcactgc agcctcaacg 10800tcgtaagctc aagcgatcct
cccacctcag cctgcctggt agctgagact acaagcgacg 10860ccccagttaa tttttgtatt
tttggtagag gcagcgtttt gccgtgtggc cctggctggt 10920ctcgaactcc tgggctcaag
tgatccagcc tcagcctccc aaagtgctgg gacaaccggg 10980gccagtcact gcacctggcc
ctaaaccata atttctaatc ttttggctaa tttgttagtc 11040ctacaaaggc agtctagtcc
ccaggcaaaa agggggtttg tttcgggaaa gggctgttac 11100tgtctttgtt tcaaactata
aactaagttc ctcctaaact tagttcggcc tacacccagg 11160aatgaacaag gagagcttgg
aggttagaag cacgatggaa ttggttaggt cagatctctt 11220tcactgtctg agttataatt
ttgcaatggt ggttcaaaga ctgcccgctt ctgacaccag 11280tcgctgcatt aatgaatcgg
ccaacgcgcg gggagaggcg gtttgcgtat tgggcgctct 11340tccgcttcct cgctcactga
ctcgctgcgc tcggtcgttc ggctgcggcg agcggtatca 11400gctcactcaa aggcggtaat
acggttatcc acagaatcag gggataacgc aggaaagaac 11460atgtgagcaa aaggccagca
aaaggccagg aaccgtaaaa aggccgcgtt gctggcgttt 11520ttccataggc tccgcccccc
tgacgagcat cacaaaaatc gacgctcaag tcagaggtgg 11580cgaaacccga caggactata
aagataccag gcgtttcccc ctggaagctc cctcgtgcgc 11640tctcctgttc cgaccctgcc
gcttaccgga tacctgtccg cctttctccc ttcgggaagc 11700gtggcgcttt ctcatagctc
acgctgtagg tatctcagtt cggtgtaggt cgttcgctcc 11760aagctgggct gtgtgcacga
accccccgtt cagcccgacc gctgcgcctt atccggtaac 11820tatcgtcttg agtccaaccc
ggtaagacac gacttatcgc cactggcagc agccactggt 11880aacaggatta gcagagcgag
gtatgtaggc ggtgctacag agttcttgaa gtggtggcct 11940aactacggct acactagaag
aacagtattt ggtatctgcg ctctgctgaa gccagttacc 12000ttcggaaaaa gagttggtag
ctcttgatcc ggcaaacaaa ccaccgctgg tagcggtggt 12060ttttttgttt gcaagcagca
gattacgcgc agaaaaaaag gatctcaaga agatcctttg 12120atcttttcta cggggtctga
cgctcagtgg aacgaaaact cacgttaagg gattttggtc 12180atgagattat caaaaaggat
cttcacctag atccttttaa attaaaaatg aagttttaaa 12240tcaatctaaa gtatatatga
gtaaacttgg tctgacagtt accaatgctt aatcagtgag 12300gcacctatct cagcgatctg
tctatttcgt tcatccatag ttgcctgact ccccgtcgtg 12360tagataacta cgatacggga
gggcttacca tctggcccca gtgctgcaat gataccgcga 12420gacccacgct caccggctcc
agatttatca gcaataaacc agccagccgg aagggccgag 12480cgcagaagtg gtcctgcaac
tttatccgcc tccatccagt ctattaattg ttgccgggaa 12540gctagagtaa gtagttcgcc
agttaatagt ttgcgcaacg ttgttgccat tgctacaggc 12600atcgtggtgt cacgctcgtc
gtttggtatg gcttcattca gctccggttc ccaacgatca 12660aggcgagtta catgatcccc
catgttgtgc aaaaaagcgg ttagctcctt cggtcctccg 12720atcgttgtca gaagtaagtt
ggccgcagtg ttatcactca tggttatggc agcactgcat 12780aattctctta ctgtcatgcc
atccgtaaga tgcttttctg tgactggtga gtactcaacc 12840aagtcattct gagaatagtg
tatgcggcga ccgagttgct cttgcccggc gtcaatacgg 12900gataataccg cgccacatag
cagaacttta aaagtgctca tcattggaaa acgttcttcg 12960gggcgaaaac tctcaaggat
cttaccgctg ttgagatcca gttcgatgta acccactcgt 13020gcacccaact gatcttcagc
atcttttact ttcaccagcg tttctgggtg agcaaaaaca 13080ggaaggcaaa atgccgcaaa
aaagggaata agggcgacac ggaaatgttg aatactcata 13140ctcttccttt ttcaatatta
ttgaagcatt tatcagggtt attgtctcat gagcggatac 13200atatttgaat gtatttagaa
aaataaacaa ataggggttc cgcgcacatt tccccgaaaa 13260gtgccacctg acgtctaaga
aaccattatt atcatgacat taacctataa aaataggcgt 13320atcacgaggc cctttcgtct
cgcgcgtttc ggtgatgacg gtgaaaacct ctgacacatg 13380cagctcccgg agacggtcac
agcttgtctg taagcggatg ccgggagcag acaagcccgt 13440cagggcgcgt cagcgggtgt
tggcgggtgt cggggctggc ttaactatgc ggcatcagag 13500cagattgtac tgagagtgca
ccatatgcgg tgtgaaatac cgcacagatg cgtaaggaga 13560aaataccgca tcaggcgcca
ttcgccattc aggctgcgca actgttggga agggcgatcg 13620gtgcgggcct cttcgctatt
acgccagggg aggcagagat tgcagtaagc tgagatcgca 13680gcactgcact ccagcctggg
cgacagagta agactctgtc tcaaaaataa aataaataaa 13740tcaatcagat attccaatct
tttcctttat ttatttattt attttctatt ttggaaacac 13800agtccttcct tattccagaa
ttacacatat attctatttt tctttatatg ctccagtttt 13860ttttagacct tcacctgaaa
tgtgtgtata caaaatctag gccagtccag cagagcctaa 13920aggtaaaaaa taaaataata
aaaaataaat aaaatctagc tcactccttc acatcaaaat 13980ggagatacag ctgttagcat
taaataccaa ataacccatc ttgtcctcaa taattttaag 14040cgcctctctc caccacatct
aactcctgtc aaaggcatgt gccccttccg ggcgctctgc 14100tgtgctgcca accaactggc
atgtggactc tgcagggtcc ctaactgcca agccccacag 14160tgtgccctga ggctgcccct
tccttctagc ggctgccccc actcggcttt gctttcccta 14220gtttcagtta cttgcgttca
gccaaggtct gaaactaggt gcgcacagag cggtaagact 14280gcgagagaaa gagaccagct
ttacaggggg tttatcacag tgcaccctga cagtcgtcag 14340cctcacaggg ggtttatcac
attgcaccct gacagtcgtc agcctcacag ggggtttatc 14400acagtgcacc cttacaatca
ttccatttga ttcacaattt ttttagtctc tactgtgcct 14460aacttgtaag ttaaatttga
tcagaggtgt gttcccagag gggaaaacag tatatacagg 14520gttcagtact atcgcatttc
aggcctccac ctgggtcttg gaatgtgtcc cccgaggggt 14580gatgactacc tcagttggat
ctccacaggt cacagtgaca caagataacc aagacacctc 14640ccaaggctac cacaatgggc
cgccctccac gtgcacatgg ccggaggaac tgccatgtcg 14700gaggtgcaag cacacctgcg
catcagagtc cttggtgtgg agggagggac cagcgcagct 14760tccagccatc cacctgatga
acagaaccta gggaaagccc cagttctact tacaccagga 14820aaggc
1482538510535DNASimian
immunodeficiency virus 385gcatgcacat tttaaaggct tttgctaaat atagccaaaa
gtccttctac aaattttcta 60agagttctga ttcaaagcag taacaggcct tgtctcatca
tgaactttgg catttcatct 120acagctaagt ttatatcata aatagttctt tacaggcagc
accaacttat acccttatag 180catactttac tgtgtgaaaa ttgcatcttt cattaagctt
actgtaaatt tactggctgt 240cttccttgca ggtttctgga agggatttat tacagtgcaa
gaagacatag aatcttagac 300atatacttag aaaaggaaga aggcatcata ccagattggc
aggattacac ctcaggacca 360ggaattagat acccaaagac atttggctgg ctatggaaat
tagtccctgt aaatgtatca 420gatgaggcac aggaggatga ggagcattat ttaatgcatc
cagctcaaac ttcccagtgg 480gatgaccctt ggggagaggt tctagcatgg aagtttgatc
caactctggc ctacacttat 540gaggcatatg ttagataccc agaagagttt ggaagcaagt
caggcctgtc agaggaagag 600gttagaagaa ggctaaccgc aagaggcctt cttaacatgg
ctgacaagaa ggaaactcgc 660tgaaacagca gggactttcc acaaggggat gttacgggga
ggtactgggg aggagccggt 720cgggaacgcc cactttcttg atgtataaat atcactgcat
ttcgctctgt attcagtcgc 780tctgcggaga ggctggcaga ttgagccctg ggaggttctc
tccagcacta gcaggtagag 840cctgggtgtt ccctgctaga ctctcaccag cacttggccg
gtgctgggca gagtgactcc 900acgcttgctt gcttaaagcc ctcttcaata aagctgccat
tttagaagta agctagtgtg 960tgttcccatc tctcctagcc gccgcctggt caactcggta
ctcaataata agaagaccct 1020ggtctgttag gaccctttct gctttgggaa accgaagcag
gaaaatccct agcagattgg 1080cgcctgaaca gggacttgaa ggagagtgag agactcctga
gtacggctga gtgaaggcag 1140taagggcggc aggaaccaac cacgacggag tgctcctata
aaggcgcggg tcggtaccag 1200acggcgtgag gagcgggaga ggaagaggcc tccggttgca
ggtaagtgca acacaaaaaa 1260gaaatagctg tcttttatcc aggaaggggt aataagatag
agtgggagat gggcgtgaga 1320aactccgtct tgtcagggaa gaaagcagat gaattagaaa
aaattaggct acgacccaac 1380ggaaagaaaa agtacatgtt gaagcatgta gtatgggcag
caaatgaatt agatagattt 1440ggattagcag aaagcctgtt ggagaacaaa gaaggatgtc
aaaaaatact ttcggtctta 1500gctccattag tgccaacagg ctcagaaaat ttaaaaagcc
tttataatac tgtctgcgtc 1560atctggtgca ttcacgcaga agagaaagtg aaacacactg
aggaagcaaa acagatagtg 1620cagagacacc tagtggtgga aacaggaaca acagaaacta
tgccaaaaac aagtagacca 1680acagcaccat ctagcggcag aggaggaaat tacccagtac
aacaaatagg tggtaactat 1740gtccacctgc cattaagccc gagaacatta aatgcctggg
taaaattgat agaggaaaag 1800aaatttggag cagaagtagt gccaggattt caggcactgt
cagaaggttg caccccctat 1860gacattaatc agatgttaaa ttgtgtggga gaccatcaag
cggctatgca gattatcaga 1920gatattataa acgaggaggc tgcagattgg gacttgcagc
acccacaacc agctccacaa 1980caaggacaac ttagggagcc gtcaggatca gatattgcag
gaacaactag ttcagtagat 2040gaacaaatcc agtggatgta cagacaacag aaccccatac
cagtaggcaa catttacagg 2100agatggatcc aactggggtt gcaaaaatgt gtcagaatgt
ataacccaac aaacattcta 2160gatgtaaaac aagggccaaa agagccattt cagagctatg
tagacaggtt ctacaaaagt 2220ttaagagcag aacagacaga tgcagcagta aagaattgga
tgactcaaac actgctgatt 2280caaaatgcta acccagattg caagctagtg ctgaaggggc
tgggtgtgaa tcccacccta 2340gaagaaatgc tgacggcttg tcaaggagta ggggggccgg
gacagaaggc tagattaatg 2400gcagaagccc tgaaagaggc cctcgcacca gtgccaatcc
cttttgcagc agcccaacag 2460aggggaccaa gaaagccaat taagtgttgg aattgtggga
aagagggaca ctctgcaagg 2520caatgcagag ccccaagaag acagggatgc tggaaatgtg
gaaaaatgga ccatgttatg 2580gccaaatgcc cagacagaca ggcgggtttt ttaggccttg
gtccatgggg aaagaagccc 2640cgcaatttcc ccatggctca agtgcatcag gggctgatgc
caactgctcc cccagaggac 2700ccagctgtgg atctgctaaa gaactacatg cagttgggca
agcagcagag agaaaagcag 2760agagaaagca gagagaagcc ttacaaggag gtgacagagg
atttgctgca cctcaattct 2820ctctttggag gagaccagta gtcactgctc atattgaagg
acagcctgta gaagtattac 2880tggatacagg ggctgatgat tctattgtaa caggaataga
gttaggtcca cattataccc 2940caaaaatagt aggaggaata ggaggtttta ttaatactaa
agaatacaaa aatgtagaaa 3000tagaagtttt aggcaaaagg attaaaggga caatcatgac
aggggacacc ccgattaaca 3060tttttggtag aaatttgcta acagctctgg ggatgtctct
aaattttccc atagctaaag 3120tagagcctgt aaaagtcgcc ttaaagccag gaaaggatgg
accaaaattg aagcagtggc 3180cattatcaaa agaaaagata gttgcattaa gagaaatctg
tgaaaagatg gaaaaggatg 3240gtcagttgga ggaagctccc ccgaccaatc catacaacac
ccccacattt gctataaaga 3300aaaaggataa gaacaaatgg agaatgctga tagattttag
ggaactaaat agggtcactc 3360aggactttac ggaagtccaa ttaggaatac cacaccctgc
aggactagca aaaaggaaaa 3420gaattacagt actggatata ggtgatgcat atttctccat
acctctagat gaagaattta 3480ggcagtacac tgcctttact ttaccatcag taaataatgc
agagccagga aaacgataca 3540tttataaggt tctgcctcag ggatggaagg ggtcaccagc
catcttccaa tacactatga 3600gacatgtgct agaacccttc aggaaggcaa atccagatgt
gaccttagtc cagtatatgg 3660atgacatctt aatagctagt gacaggacag acctggaaca
tgacagggta gttttacagt 3720caaaggaact cttgaatagc atagggtttt ctaccccaga
agagaaattc caaaaagatc 3780ccccatttca atggatgggg tacgaattgt ggccaacaaa
atggaagttg caaaagatag 3840agttgccaca aagagagacc tggacagtga atgatataca
gaagttagta ggagtattaa 3900attgggcagc tcaaatttat ccaggtataa aaaccaaaca
tctctgtagg ttaattagag 3960gaaaaatgac tctaacagag gaagttcagt ggactgagat
ggcagaagca gaatatgagg 4020aaaataaaat aattctcagt caggaacaag aaggatgtta
ttaccaagaa ggcaagccat 4080tagaagccac ggtaataaag agtcaggaca atcagtggtc
ttataaaatt caccaagaag 4140acaaaatact gaaagtagga aaatttgcaa agataaagaa
tacacatacc aatggagtga 4200gactattagc acatgtaata cagaaaatag gaaaggaagc
aatagtgatc tggggacagg 4260tcccaaaatt ccacttacca gttgagaagg atgtatggga
acagtggtgg acagactatt 4320ggcaggtaac ctggataccg gaatgggatt ttatctcaac
accaccgcta gtaagattag 4380tcttcaatct agtgaaggac cctatagagg gagaagaaac
ctattataca gatggatcat 4440gtaataaaca gtcaaaagaa gggaaagcag gatatatcac
agataggggc aaagacaaag 4500taaaagtgtt agaacagact actaatcaac aagcagaatt
ggaagcattt ctcatggcat 4560tgacagactc agggccaaag gcaaatatta tagtagattc
acaatatgtt atgggaataa 4620taacaggatg ccctacagaa tcagagagca ggctagttaa
tcaaataata gaagaaatga 4680ttaaaaagtc agaaatttat gtagcatggg taccagcaca
caaaggtata ggaggaaacc 4740aagaaataga ccacctagtt agtcaaggga ttagacaagt
tctcttcttg gaaaagatag 4800agccagcaca agaagaacat gataaatacc atagtaatgt
aaaagaattg gtattcaaat 4860ttggattacc cagaatagtg gccagacaga tagtagacac
ctgtgataaa tgtcatcaga 4920aaggagaggc tatacatggg caggcaaatt cagatctagg
gacttggcaa atggattgta 4980cccatctaga gggaaaaata atcatagttg cagtacatgt
agctagtgga ttcatagaag 5040cagaggtaat tccacaagag acaggaagac agacagcact
atttctgtta aaattggcag 5100gcagatggcc tattacacat ctacacacag ataatggtgc
taactttgct tcgcaagaag 5160taaagatggt tgcatggtgg gcagggatag agcacacctt
tggggtacca tacaatccac 5220agagtcaggg agtagtggaa gcaatgaatc accacctgaa
aaatcaaata gatagaatca 5280gggaacaagc aaattcagta gaaaccatag tattaatggc
agttcattgc atgaatttta 5340aaagaagggg aggaataggg gatatgactc cagcagaaag
attaattaac atgatcacta 5400cagaacaaga gatacaattt caacaatcaa aaaactcaaa
atttaaaaat tttcgggtct 5460attacagaga aggcagagat caactgtgga agggacccgg
tgagctattg tggaaagggg 5520aaggagcagt catcttaaag gtagggacag acattaaggt
agtacccaga agaaaggcta 5580aaattatcaa agattatgga ggaggaaaag aggtggatag
cagttcccac atggaggata 5640ccggagaggc tagagaggtg gcatagcctc ataaaatatc
tgaaatataa aactaaagat 5700ctacaaaagg tttgctatgt gccccatttt aaggtcggat
gggcatggtg gacctgcagc 5760agagtaatct tcccactaca ggaaggaagc catttagaag
tacaagggta ttggcatttg 5820acaccagaaa aagggtggct cagtacttat gcagtgagga
taacctggta ctcaaagaac 5880ttttggacag atgtaacacc aaactatgca gacattttac
tgcatagcac ttatttccct 5940tgctttacag cgggagaagt gagaagggcc atcaggggag
aacaactgct gtcttgctgc 6000aggttcccga gagctcataa gtaccaggta ccaagcctac
agtacttagc actgaaagta 6060gtaagcgatg tcagatccca gggagagaat cccacctgga
aacagtggag aagagacaat 6120aggagaggcc ttcgaatggc taaacagaac agtagaggag
ataaacagag aggcggtaaa 6180ccacctacca agggagctaa ttttccaggt ttggcaaagg
tcttgggaat actggcatga 6240tgaacaaggg atgtcaccaa gctatgtaaa atacagatac
ttgtgtttaa tacaaaaggc 6300tttatttatg cattgcaaga aaggctgtag atgtctaggg
gaaggacatg gggcaggggg 6360atggagacca ggacctcctc ctcctccccc tccaggacta
gcataaatgg aagaaagacc 6420tccagaaaat gaaggaccac aaagggaacc atgggatgaa
tgggtagtgg aggttctgga 6480agaactgaaa gaagaagctt taaaacattt tgatcctcgc
ttgctaactg cacttggtaa 6540tcatatctat aatagacatg gagacaccct tgagggagca
ggagaactca ttagaatcct 6600ccaacgagcg ctcttcatgc atttcagagg cggatgcatc
cactccagaa tcggccaacc 6660tgggggagga aatcctctct cagctatacc gccctctaga
agcatgctat aacacatgct 6720attgtaaaaa gtgttgctac cattgccagt tttgttttct
taaaaaaggc ttggggatat 6780gttatgagca atcacgaaag agaagaagaa ctccgaaaaa
ggctaaggct aatacatctt 6840ctgcatcaaa caagtaagta tgggatgtct tgggaatcag
ctgcttatcg ccatcttgct 6900tttaagtgtc tatgggatct attgtactct atatgtcaca
gtcttttatg gtgtaccagc 6960ttggaggaat gcgacaattc ccctcttttg tgcaaccaag
aatagggata cttggggaac 7020aactcagtgc ctaccagata atggtgatta ttcagaagtg
gcccttaatg ttacagaaag 7080ctttgatgcc tggaataata cagtcacaga acaggcaata
gaggatgtat ggcaactctt 7140tgagacctca ataaagcctt gtgtaaaatt atccccatta
tgcattacta tgagatgcaa 7200taaaagtgag acagatagat ggggattgac aaaatcaata
acaacaacag catcaacaac 7260atcaacgaca gcatcagcaa aagtagacat ggtcaatgag
actagttctt gtatagccca 7320ggataattgc acaggcttgg aacaagagca aatgataagc
tgtaaattca acatgacagg 7380gttaaaaaga gacaagaaaa aagagtacaa tgaaacttgg
tactctgcag atttggtatg 7440tgaacaaggg aataacactg gtaatgaaag tagatgttac
atgaaccact gtaacacttc 7500tgttatccaa gagtcttgtg acaaacatta ttgggatgct
attagattta ggtattgtgc 7560acctccaggt tatgctttgc ttagatgtaa tgacacaaat
tattcaggct ttatgcctaa 7620atgttctaag gtggtggtct cttcatgcac aaggatgatg
gagacacaga cttctacttg 7680gtttggcttt aatggaacta gagcagaaaa tagaacttat
atttactggc atggtaggga 7740taataggact ataattagtt taaataagta ttataatcta
acaatgaaat gtagaagacc 7800aggaaataag acagttttac cagtcaccat tatgtctgga
ttggttttcc actcacaacc 7860aatcaatgat aggccaaagc aggcatggtg ttggtttgga
ggaaaatgga aggatgcaat 7920aaaagaggtg aagcagacca ttgtcaaaca tcccaggtat
actggaacta acaatactga 7980taaaatcaat ttgacggctc ctggaggagg agatccggaa
gttaccttca tgtggacaaa 8040ttgcagagga gagttcctct actgtaaaat gaattggttt
ctaaattggg tagaagatag 8100gaatacagct aaccagaagc caaaggaaca gcataaaagg
aattacgtgc catgtcatat 8160tagacaaata atcaacactt ggcataaagt aggcaaaaat
gtttatttgc ctccaagaga 8220gggagacctc acgtgtaact ccacagtgac cagtctcata
gcaaacatag attggattga 8280tggaaaccaa actaatatca ccatgagtgc agaggtggca
gaactgtatc gattggaatt 8340gggagattat aaattagtag agatcactcc aattggcttg
gcccccacag atgtgaagag 8400gtacactact ggtggcacct caagaaataa aagaggggtc
tttgtgctag ggttcttggg 8460ttttctcgca acggcaggtt ctgcaatggg cgcggcgtcg
ttgacgctga ccgctcagtc 8520ccgaacttta ttggctggga tagtgcagca acagcaacag
ctgttggacg tggtcaagag 8580acaacaagaa ttgttgcgac tgaccgtctg gggaacaaag
aacctccaga ctagggtcac 8640tgccatcgag aagtacttaa aggaccaggc gcagctgaat
gcttggggat gtgcgtttag 8700acaagtctgc cacactactg taccatggcc aaatgcaagt
ctaacaccaa agtggaacaa 8760tgagacttgg caagagtggg agcgaaaggt tgacttcttg
gaagaaaata taacagccct 8820cctagaggag gcacaaattc aacaagagaa gaacatgtat
gaattacaaa agttgaatag 8880ctgggatgtg tttggcaatt ggtttgacct tgcttcttgg
ataaagtata tacaatatgg 8940agtttatata gttgtaggag taatactgtt aagaatagtg
atctatatag tacaaatgct 9000agctaagtta aggcaggggt ataggccagt gttctcttcc
ccaccctctt atttccagca 9060gacccatatc caacaggacc cggcactgcc aaccagagaa
ggcaaagaaa gagacggtgg 9120agaaggcggt ggcaacagct cctggccttg gcagatagaa
tatattcatt tcctgatccg 9180ccaactgata cgcctcttga cttggctatt cagcaactgc
agaaccttgc tatcgagagt 9240ataccagatc ctccaaccaa tactccagag gctctctgcg
accctacaga ggattcgaga 9300agtcctcagg actgaactga cctacctaca atatgggtgg
agctatttcc atgaggcggt 9360ccaggccgtc tggagatctg cgacagagac tcttgcgggc
gcgtggggag acttatggga 9420gactcttagg agaggtggaa gatggatact cgcaatcccc
aggaggatta gacaagggct 9480tgagctcact ctcttgtgag ggacagaaat acaatcaggg
acagtatatg aatactccat 9540ggagaaaccc agctgaagag agagaaaaat tagcatacag
aaaacaaaat atggatgata 9600tagatgagta agatgatgac ttggtagggg tatcagtgag
gccaaaagtt cccctaagaa 9660caatgagtta caaattggca atagacatgt ctcattttat
aaaagaaaag gggggactgg 9720aagggattta ttacagtgca agaagacata gaatcttaga
catatactta gaaaaggaag 9780aaggcatcat accagattgg caggattaca cctcaggacc
aggaattaga tacccaaaga 9840catttggctg gctatggaaa ttagtccctg taaatgtatc
agatgaggca caggaggatg 9900aggagcatta tttaatgcat ccagctcaaa cttcccagtg
ggatgaccct tggggagagg 9960ttctagcatg gaagtttgat ccaactctgg cctacactta
tgaggcatat gttagatacc 10020cagaagagtt tggaagcaag tcaggcctgt cagaggaaga
ggttagaaga aggctaaccg 10080caagaggcct tcttaacatg gctgacaaga aggaaactcg
ctgaaacagc agggactttc 10140cacaagggga tgttacgggg aggtactggg gaggagccgg
tcgggaacgc ccactttctt 10200gatgtataaa tatcactgca tttcgctctg tattcagtcg
ctctgcggag aggctggcag 10260attgagccct gggaggttct ctccagcact agcaggtaga
gcctgggtgt tccctgctag 10320actctcacca gcacttggcc ggtgctgggc agagtgactc
cacgcttgct tgcttaaagc 10380cctcttcaat aaagctgcca ttttagaagt aagctagtgt
gtgttcccat ctctcctagc 10440cgccgcctgg tcaactcggt actcaataat aagaagaccc
tggtctgtta ggaccctttc 10500tgctttggga aaccgaagca ggaaaatccc tagca
10535
SEQ ID NO: 386, length 9713, DNA, Human immunodeficiency virus 2
agtcgctctg cggagaggct ggcagattga gccctgggag gttctctcca gcactagcag
60gtagagcctg ggtgttccct gctagactct caccggtgct tggccggcac tgggcagacg
120gctccacgct tgcttgctta aaagacctct taataaagct gccagttaga agcaagttaa
180gtgtgtgttc ccatctctcc tagtcgccgc ctggtcattc ggtgttcatc tgaataacaa
240gaccctggtc tgttaggacc ctttctgctt tgggaaacca aagcaggaaa atccctagca
300ggttggcgcc cgaacaggga cttagagaag actgaaaagc cttggaacac ggctgagtga
360aggcagtaag ggcggcagga acaaaccacg acggagtgct cctagaaagg cgcaggccaa
420ggtaccaaag gcggcgtgtg gagcgggagt aaagaggcct ccgggtgaag gtaagtacct
480acaccaaaaa attgtagcca ggaagggctt gttatcctac ctttagacag gtagaagatt
540gtgggagatg ggcgcgagaa actccgtctt gaaagggaaa aaagcagacg aattagaaac
600aattaggtta cggcccggcg gaaagaaaaa atacaggcta aagcatattg tgtgggcagc
660gaatgaattg gacagattcg gattagcaga gagcctgttg gagtcaaaag aaggttgcca
720aagaattctt acagttttag gtccattagt accgacaggt tcagaaaatt taaaaagcct
780ttttaatact gtctgcgtca tttggtgcat acacgcagaa gagaaagtga aagatactga
840aggagcaaaa caaatagtac agagacatct agcggcagaa acaggaactg cagagaaaat
900gccaaataca agtagaccaa cagcaccacc tagcgggaag ggaggaaact tccccgtaca
960acaagtaggc ggcaattata cccatgtgcc gctgagtcct cgaaccctaa atgcttgggt
1020aaaattagta gaggaaaaga agttcggggc agaggtagtg ccaggatttc aggcactctc
1080agaaggctgc acgccctatg atatcaacca aatgcttaat tgtgtgggcg accatcaagc
1140agctatgcaa ataatcaggg agatcgttaa tgaagaagca gcagattggg atgtgcaaca
1200tccaatacca ggtcccttac cagcggggca gcttagagaa ccaagagggt ctgacatagc
1260agggacaaca agcacagtag atgaacagat ccagtggatg tttaggccac aaaatcccgt
1320accagtggga aacatctata ggagatggat ccagatagga ctgcagaagt gcgtcaggat
1380gtacaacccg accaacatcc tagacataaa acaaggacca aaggaaccat tccaaagtta
1440tgtagataga ttctacaaaa gcttgagggc agaacaaaca gatccagcag tgaagaattg
1500gatgacccag acactactag tacagaatgc caacccagac tgtaaattag tactaaaagg
1560actagggatg aatcctacct tagaagagat gctaaccgcc tgccaagggg taggtgggcc
1620aggccagaaa gctagactaa tggcagaagc cttaaaagag gccttgacac cagcccctat
1680cccatttgca gcagcccagc agaaaaggac aattaaatgc tggaattgtg gaaaggaagg
1740acactcggca agacaatgcc gagcacctag aagacagggc tgctggaagt gtggtaaacc
1800aggacatgtc atagcaaatt gcccagatag acaggtgggt tttttaggga tgggcccccg
1860gggaaagaag ccccgcaact tccccgtggc ccaagtcccg caggggctaa caccaacagc
1920acccccagta gatccagcag tggacctact ggagaattat atgcagcaag gaaaaagaca
1980aagagaacag agagagagac catacaaaga agtgacagag gacttactgc acctcgagca
2040gggagaggca ccatgcagag agacgacaga ggacttgctg cacctcaatt ctctcttttg
2100aaaagaccag tagtcacggc atacgtcgag ggccagccag tagaagttct gctagacacg
2160ggggctgacg actcaatagt agcagggata gagttaggga gcaattatag tccaaagata
2220gtaggaggaa tagggggatt cataaatacc aaggaatata aaaatgtaaa aatagaagtt
2280ttaggtaaaa aggtaagggc caccataatg acaggtgaca ccccaatcaa catttttggc
2340agaaatattc tgacagcctt aggcatgtca ttaaatttac cagtcgccaa aatagaacca
2400ataaaaataa tgttaaagcc aggaaaagat ggaccaaaac tgaggcaatg gcccttaaca
2460aaagaaaaaa tagaggcact aaaagaaatc tgtgaaaaaa tggaaagaga aggccagcta
2520gaggaagcgc ctccaactaa tccttataac acccccacat ttgcaatcaa gaaaaaggac
2580aaaaataaat ggaggatgct aatagatttt agagaactaa acaaggtaac tcaagatttc
2640acagaaattc agttaggaat tccacaccca gcaggattgg ccaagaaaaa aagaattact
2700gtactagata taggggatgc ttacttttcc ataccactac atgaagactt tagacagtat
2760actgcattta ctttaccatc aataaacaat gcagaaccag gaaaaagata tatatataag
2820gtcctgcctc agggatggaa ggggtcacca gcaatttttc aatacacaat gaggcaggtc
2880ttagaaccat tcagaaaagc aaacctagat gtcattatca ttcagtacat ggatgatatc
2940ctaatagcta gtgacaggac agatctagaa catgacaagg tggtcctgca gctaaaggaa
3000cttctaaata acctaggatt ttctacccca gatgagaagt tccaaaagga ccctccatac
3060cactggatgg gctatgaact gtggccaact aagtggaagc tgcagaagat acagttgccc
3120caaaaagatg tatggacagt aaatgacatc caaaagttag tgggtgtctt aaactgggca
3180gcacaaatct acccagggat aaaaaccaga cacttatgta agctaattag aggaaaaatg
3240acactcacag aagaagtaca gtggacagaa ctagcagagg cggagttaga agagaacaag
3300attatcttaa gccaggagca agagggacac tattaccaag aagaaaaaga gttagaagca
3360acagtccaaa aggatcaaga caatcagtgg acatataaag tacaccaggg agagaaaatt
3420ctaaaagtag ggaaatatgc aaagataaaa aatacccata ccaatggggt cagattgtta
3480gcacaagtag ttcaaaagat aggaaaagaa gcactaatca tttggggacg aataccaaaa
3540tttcacctac cagtagaaag agagacatgg gaacagtggt gggatgacta ctggcaggtg
3600acatggatcc ctgactggga cttcgtatct accccgccgc tggtcagact agcatttaac
3660ctggtaaaag atcctatacc aagaacagag actttctaca cagatggatc ctgcaatagg
3720caatcaaagg aaggaaaagc aggatatgta acagatagag ggagagacaa ggtaaggatg
3780ctagaacaaa ctaccaatca gcaagcagaa ttagaagcct ttgcaatggc actaacagac
3840tcaggtccaa aagccaatat tatagtagac tcacagtatg taatggggat agtagcaggc
3900cagccaacag aatcagagag tagaatagta aatcaaatca tagaggagat gataaaaaag
3960gaagcaatct atgttgcatg ggtcccagcc cataaaggca taggagggaa tcaggaggta
4020gatcagttag taagtcaggg catcagacaa gtgttgttcc tggaaaaaat agagcccgct
4080caggaagaac atgagaaata ccatagcaat gtaaaagaac tatcccataa atttggattg
4140cccaaattag tagcaagaca aatagtaaac acatgtgccc aatgtcaaca gaaaggggag
4200gctatacatg ggcaagtaga tgcagaatta ggcacttggc aaatggactg cacacactta
4260gaaggaaaga tcattatagt agcagtacat gttgcaagtg gattcataga agcagaagtc
4320atcccacagg aatcaggaag gcagacagca ctcttcctat taaaactggc cagtaggtgg
4380ccaataacac acttgcacac agataatggt gccaacttca cttcacagga agtaaaaatg
4440gtagcatggt gggtaggtat agaacaatct ttcggagtac cttacaatcc acaaagccaa
4500ggagtagtag aagcaatgaa tcaccaccta aaaaatcaga taagtagaat tagagaacag
4560gcaaatacag tagaaacaat agtactgatg gcaacacact gcatgaattt taaaagaagg
4620ggaggaatag gggatatgac cccagcagaa agactaatca atatgatcac cacagaacaa
4680gaaatacaat tcctccacgc caaaaattca aaattaaaaa attttcgggt ctatttcaga
4740gaaggcagag atcagctgtg gaaaggaccc ggggaactac tgtggaaggg agacggagca
4800gtcatagtca aggtagggac agacataaaa gtagtaccaa ggaggaaagc caagatcatc
4860aaagactatg gaggaaggca agaactggat agtggttccc acttggaggg tgccagggag
4920gatggagaaa tggcatagcc ttgtcaaata tctaaaatac agaacaaaag atctagaaga
4980cgtgtgctat gttccccacc ataaagtagg atgggcatgg tggacttgca gcagggtaat
5040attcccatta aagggaaaca gtcatctaga aatacaggca tattggaacc taacgccaga
5100aaaaggatgg ctctcctctt attcagtaag aatgacttgg tatacggaaa ggttctggac
5160agatgttacc ccagactgtg cagactccct aatacatagc acttatttct cttgctttac
5220agcaggtgaa gtaagaagag ccatcagagg ggaaaagtta ttgtcctgct gcaattatcc
5280ccaagcccat agagcccagg taccgtcact ccaatttttg gccttagtgg tagtgcagca
5340aaatgacaga ccccagagaa acggtacccc caggaaacag tggcgaagag actatcgaag
5400aggccttcaa ttggctagac aggacggtag aagccataaa cagagaggca gtgaatcacc
5460tgccccgaga gcttattttc caggtgtggc agaggtcctg gagatactgg catgatgaac
5520aagggatgtc acaaagttac acaaagtata gatatttgtg cttaatacag aaggctatgt
5580tcacacattg taagagaggg tgcacttgcc tggggggagg acatgggcca ggagggtgga
5640gaccaggacc tccccctcct ccccctccag gtctagtcta atgactgaag caccaacaga
5700gtttcccccg gaggatggga ccccaccgag ggaaccaggg gatgagtgga taatagaaat
5760cctgagaaaa ataaagaaag aagctttaaa gcattttgac cctcgcttgc taactgctct
5820tggcaactat atccatacta gacatggaga cacccttgaa ggcgccagag agctcattaa
5880tgtcctacaa cgagccctct tcatgcactt cagagcggga tgtaggctct caagaattgg
5940ccaaacaggg ggaagaactc ctttcccagc tacatcgacc cctagaacca tgcaataaca
6000aatgctattg taaaggatgc tgcttccact gccagctgtg ttttttaaac aaggggctcg
6060ggatatgtta tgaccggaag ggcagacgaa gaagaactcc gaagaaaact aaggctcatt
6120catcttctgc atcagacaag tgagtatgat gggtggtaga aatcagctgc ttgttgccat
6180tttgctaact agtacttgct tgatatattg caccaattat gtgactgttt tctatggcat
6240acccgcgtgg agaaatgcat ccattcccct cttttgtgca accaagaata gggatacttg
6300gggaaccata cagtgcttgc cagacaatga tgattatcag gagataactt tgaatgtgac
6360agaggctttc gatgcatggg ataatacagt aacagaacaa gcaatagaag atgtctggaa
6420tctatttgag acatcaataa aaccatgtgt caaattaacg cctttatgtg tagcaatgag
6480atgtaacaac acagatgcaa ggaacacaac cacacccaca acagcatccc cgcgtacaat
6540aaaacccgtg acagagataa gtgagaattc ctcatgcata cgcgcaaaca actgctcagg
6600attgggagaa gaagaggtgg tcaattgtca attcaatatg acaggattag agagagataa
6660gaaaaagcaa tatagtgaga catggtactc gaaggatgta gtttgtgaag gaaatggcac
6720cacagataca tgttacatga accattgcaa cacatcggtc atcacagagt catgtgacaa
6780gcactattgg gatgctatga ggtttagata ctgtgcacca ccaggttttg ccctactaag
6840atgcaatgat accaattatt caggctttgc gcccaattgc tctaaggtag tagctgctac
6900atgcaccaga atgatggaaa cgcaaacttc tacatggttt ggctttaatg gcactagagc
6960agaaaataga acatttatct attggcatgg tagggataac agaactatca tcagcttaaa
7020caaatattat aatctcacta tacattgtaa gaggccagga aataagacag tggtaccaat
7080aacacttatg tcagggttaa ggtttcactc ccagccggtc atcaataaaa gacccagaca
7140agcatggtgt tggttcaaag gtgaatggaa gggagccatg caggaggtga aggaaaccct
7200tgcaaaacat cccaggtata aaggaaccaa tgaaacaaag aatattaact ttacagcacc
7260aggaaagggc tcagacccag aggtggcata catgtggact aactgcagag gagaatttct
7320ctactgcaac atgacttggt tcctcaattg gatagaaaat aagacacacc gcaattatgt
7380accgtgccat ataagacaaa taattaacac ctggcataag gtagggaaaa atgtatattt
7440gcctcccagg gaaggggagt tgacctgcaa ctcaacagta actagcataa ttgctaacat
7500tgatgcaaat ggaaataata caaatattac ctttagtgca gaggtggcag aactataccg
7560attagagttg ggagattata aattggtaga aataacacca attggcttcg cacctacagc
7620agaaaaaaga tactcctcta ctccaatgag gaacaagaga ggtgtgttcg tgctagggtt
7680cttgggtttt ctcgcaacag caggctctgc aatgggcgcg gcgtccttaa cgctgtcggc
7740tcagtctcgg actttactgg ccgggatagt gcagcaacag caacagctgt tggacgtggt
7800caagagacaa caggaaatgt tgcgactgac cgtctgggga acaaaaaatc tccaggcaag
7860agtcactgct atcgagaagt acttaaagga ccaggcgcaa ctaaattcat ggggatgtgc
7920atttagacaa gtctgccaca ctactgtacc atgggtaaat gataccttaa cgcctgagtg
7980gaacaatatg acgtggcaag aatgggaagg caaaatccgc gacctggagg caaatatcag
8040tcaacaatta gaacaagcac aaattcagca agagaagaat atgtatgaac tacaaaagtt
8100aaatagctgg gatgtttttg gtaactggtt tgacttaacc tcctggatca agtatattca
8160atatggagtt tatataataa taggaatagt agttcttaga atagtaatat atatagtaca
8220gatgttaagt agacttagaa agggctatag gcctgttttc tcttcccccc ccggttacct
8280ccaacagatc catatccaca aggactggga acagccagcc agagaagaaa cagaagaaga
8340cgttggaaac aacgttggag acagctcgtg gccttggccg ataagatata tacatttcct
8400gatccaccag ctgattcgcc tcttggccgg actatacaac atctgcagga acttactatc
8460caggatctcc ctgaccctcc gaccagtttt ccagagtctt cagagggcac tgacagcaat
8520cagagactgg ctaagaactg acgcagccta cttgcagtat gggtgcgagt ggatccaagg
8580agcgttccag gccttcgcaa gggctacgag agagactctt gcgggcacgt ggagagactt
8640gtggggggca ctgcagcgga tcgggagggg aatacttgca gtcccaagaa gaatcaggca
8700gggagcagag atcgccctcc tatgagggac agcggtatca gcagggagac tttatgaata
8760ccccatggag aaccccagca aaagaagggg agaaagaatt gtacaagcaa caaaatagag
8820atgatgtaga ttcggatgat gatgacctag taggggtctc tgtcacacca agagtaccac
8880taagagaatt gacacataga ttagcaatag atgtgtcaca ttttataaaa gaaaaagggg
8940gactggaagg gatgtattac agtgagagaa gacatagaat cttagacata taccttgaaa
9000aggaagaagg gataattgca gattggcaga actatactca tgggccagga ataagatacc
9060caatgttctt tgggtggcta tggaagctag taccagtaga tgtcacacga caggaggagg
9120acgatgggac tcactgttta ctacacccag cacaaacaag caggtttgat gacccgcatg
9180gggaaacact gatatggaag tttgacccca cgctggctca tgattacaag gcttttatcc
9240tgcacccaga ggaatttggg cataagtcag gcctgccaga agaagactgg aaggcaagac
9300tgaaagcaag agggatacca tttagttaga gacaggaaca gctatatttg gccagggcag
9360gaaataacta ctgaaaacag ctgagactgc agggactttc cgaaggggct gtaaccaggg
9420gagggacatg ggaggagccg gtggggaacg ccctcatact ttctgtataa agatacccgc
9480tgcttgcatt gtacttcagt cgctctgcgg agaggctggc agattgagcc ctgggaggtt
9540ctctccagca ctagcaggta gagcctgggt gttccctgct agactctcac cggtgcttgg
9600ccggcactgg gcagacggct ccacgcttgc ttgcttaaaa gacctcttaa taaagctgcc
9660agttagaagc aagttaagtg tgtgttccca tctctcctag tcgccgcctg gtc
9713
SEQ ID NO: 387, length 11878, DNA, Artificial Sequence
Description of Artificial Sequence: Synthetic polynucleotide
gcctcactga ttaagcattg gtaactgtca gaccaagttt
actcatatat actttagatt 60gatttaaaac ttcattttta atttaaaagg atctaggtga
agatcctttt tgataatctc 120atgaccaaaa tcccttaacg tgagttttcg ttccactgag
cgtcagaccc cgtagaaaag 180atcaaaggat cttcttgaga tccttttttt ctgcgcgtaa
tctgctgctt gcaaacaaaa 240aaaccaccgc taccagcggt ggtttgtttg ccggatcaag
agctaccaac tctttttccg 300aaggtaactg gcttcagcag agcgcagata ccaaatactg
ttcttctagt gtagccgtag 360ttaggccacc acttcaagaa ctctgtagca ccgcctacat
acctcgctct gctaatcctg 420ttaccagtgg ctgctgccag tggcgataag tcgtgtctta
ccgggttgga ctcaagacga 480tagttaccgg ataaggcgca gcggtcgggc tgaacggggg
gttcgtgcac acagcccagc 540ttggagcgaa cgacctacac cgaactgaga tacctacagc
gtgagctatg agaaagcgcc 600acgcttcccg aagggagaaa ggcggacagg tatccggtaa
gcggcagggt cggaacagga 660gagcgcacga gggagcttcc agggggaaac gcctggtatc
tttatagtcc tgtcgggttt 720cgccacctct gacttgagcg tcgatttttg tgatgctcgt
caggggggcg gagcctatgg 780aaaaacgcca gcaacgcggc ctttttacgg ttcctggcct
tttgctggcc ttttgctcac 840atgttctttc ctgcgttatc ccctgattct gtggataacc
gtattaccgc ctttgagtga 900gctgataccg ctcgccgcag ccgaacgacc gagcgcagcg
agtcagtgag cgaggaagcg 960gaagagcgcc caatacgcaa accgcctctc cccgcgcgtt
ggccgattca ttaatgcagc 1020tggcacgaca ggtttcccga ctggaaagcg ggcagtgagc
gcaacgcaat taatgtgagt 1080tagctcactc attaggcacc ccaggcttta cactttatgc
ttccggctcg tatgttgtgt 1140ggaattgtga gcggataaca atttcacaca ggaaacagct
atgaccatga ttacgccaag 1200ctatttaggt gacactatag aatactcaag cttgggggga
tcctctagag tcgacctgca 1260ggcatgctat ttgatgaatt aactacactt aaaataatac
aattattatt aaattttttt 1320ttgatttatt tattaatttt taaacttaat catttgtatt
tgggaggaat tatatatatc 1380tttataatta ttttattttt ttttattttt ttattttttt
attattatta ttttttttta 1440tttttttttt ttactgtatc aaagaaaaac ctttaaaaaa
aaaattataa tttccccatc 1500ttactatatt tttaatacat acgttttaag gaattaaatt
agacaaaagc tatattatgc 1560tttacatata attagaattt ataaacgttt ggttattaga
tatttcatgt ctcagtaaag 1620tctttcaata catatgtaaa aaaatatata tgaatacaca
taagttgtta atatatttta 1680tatgcataaa tgtataaata tatatatata tatatatata
tgtatgtatg tatatgtgtg 1740tatatgaaat tatttcaatg tttaattttt taaattttaa
tttttttttt tttttttttt 1800tttattatgt atattgatct ttattattta aatattactt
ttttcgtttt ttcttctttt 1860tattattttt tttttttttt atattttata caaatggtaa
ttcaaataaa aggtataaat 1920ttatatttaa ttttctttta tggataaata aaagaaaaat
ataaatatat aaaaatataa 1980aaatatatat atgtatattg gggtgatgat aaaatgaaag
ataatatata tatatatata 2040tctttatttt tttttttttg tagaccccat tgtgagtaca
taaatatatt atataactcg 2100ggagcatcag tcatggaatt cttatttctt tttctttttt
gcctggccgg cctttttcgt 2160ggccgccggc cttttgtcgc ctcccagctg agacaggtcg
atccgtgtct cgtacaggcc 2220ggtgatgctc tggtggatca gggtggcgtc cagcacctct
ttggtgctgg tgtacctctt 2280ccggtcgatg gtggtgtcaa agtacttgaa ggcggcaggg
gctcccagat tggtcagggt 2340aaacaggtgg atgatattct cggcctgctc tctgatgggc
ttatcccggt gcttgttgta 2400ggcggacagc actttgtcca gattagcgtc ggccaggatc
actctcttgg agaactcgct 2460gatctgctcg atgatctcgt ccaggtagtg cttgtgctgt
tccacaaaca gctgtttctg 2520ctcattatcc tcgggggagc ccttcagctt ctcatagtgg
ctggccaggt acaggaagtt 2580cacatatttg gagggcaggg ccagttcgtt tcccttctgc
agttcgccgg cagaggccag 2640cattctcttc cggccgtttt ccagctcgaa cagggagtac
ttaggcagct tgatgatcag 2700gtcctttttc acttctttgt agcccttggc ttccagaaag
tcgatgggat tcttctcgaa 2760gctgcttctt tccatgatgg tgatccccag cagctctttc
acactcttca gtttcttgga 2820cttgcccttt tccactttgg ccaccaccag cacagaatag
gccacggtgg ggctgtcgaa 2880gccgccgtac ttcttagggt cccagtcctt ctttctggcg
atcagcttat cgctgttcct 2940cttgggcagg atagactctt tgctgaagcc gcctgtctgc
acctcggtct ttttcacgat 3000attcacttgg ggcatgctca gcactttccg cacggtggca
aaatcccggc ccttatccca 3060cacgatctcc ccggtttcgc cgtttgtctc gatcagaggc
cgcttccgga tctcgccgtt 3120ggccagggta atctcggtct tgaaaaagtt catgatgttg
ctgtagaaga agtacttggc 3180ggtagccttg ccgatttcct gctcgctctt ggcgatcatc
ttccgcacgt cgtacacctt 3240gtagtcgccg tacacgaact cgctttccag cttagggtac
tttttgatca gggcggttcc 3300cacgacggcg ttcaggtagg cgtcgtgggc gtggtggtag
ttgttgatct cgcgcacttt 3360gtaaaactgg aaatccttcc ggaaatcgga caccagcttg
gacttcaggg tgatcacttt 3420cacttcccgg atcagcttgt cattctcgtc gtacttagtg
ttcatccggg agtccaggat 3480ctgtgccacg tgctttgtga tctgccgggt ttccaccagc
tgtctcttga tgaagccggc 3540cttatccagt tcgctcaggc cgcctctctc ggccttggtc
agattgtcga actttctctg 3600ggtaatcagc ttggcgttca gcagctgccg ccagtagttc
ttcatcttct tcacgacctc 3660ttcggagggc acgttgtcgc tcttgccccg gttcttgtcg
cttctggtca gcaccttgtt 3720gtcgatggag tcgtccttca gaaagctctg aggcacgata
tggtccacat cgtagtcgga 3780cagccggttg atgtccagtt cctggtccac gtacatatcc
cgcccattct gcaggtagta 3840caggtacagc ttctcgttct gcagctgggt gttttccacg
gggtgttctt tcaggatctg 3900gctgcccagc tctttgatgc cctcttcgat ccgcttcatt
ctctcgcggc tgttcttctg 3960tcccttctgg gtggtctggt tctctctggc catttcgatc
acgatgttct cgggcttgtg 4020ccggcccatc actttcacga gctcgtccac caccttcact
gtctgcagga tgcccttctt 4080aatggcgggg ctgccggcca gattggcaat gtgctcgtgc
aggctatcgc cctggccgga 4140cacctgggct ttctggatgt cctctttaaa ggtcaggctg
tcgtcgtgga tcagctgcat 4200gaagtttctg ttggcgaagc cgtcggactt caggaaatcc
aggattgtct tgccggactg 4260cttgtcccgg atgccgttga tcagcttccg gctcagcctg
ccccagccgg tgtatctccg 4320ccgcttcagc tgcttcatca ctttgtcgtc gaacaggtgg
gcataggttt tcagccgttc 4380ctcgatcatc tctctgtcct caaacagtgt cagggtcagc
acgatatctt ccagaatgtc 4440ctcgttttcc tcattgtcca ggaagtcctt gtccttgata
attttcagca gatcgtggta 4500tgtgcccagg gaggcgttga accgatcttc cacgccggag
atttccacgg agtcgaagca 4560ctcgattttc ttgaagtagt cctctttcag ctgcttcacg
gtcactttcc ggttggtctt 4620gaacagcagg tccacgatgg cctttttctg ctcgccgctc
aggaaggcgg gctttctcat 4680tccctcggtc acgtatttca ctttggtcag ctcgttatac
acggtgaagt actcgtacag 4740caggctgtgc ttgggcagca ccttctcgtt gggcaggttc
ttatcgaagt tggtcatccg 4800ctcgatgaag ctctgggcgg aagcgccctt gtccaccact
tcctcgaagt tccagggggt 4860gatggtttcc tcgctctttc tggtcatcca ggcgaatctg
ctgtttcccc tggccagagg 4920gcccacgtag taggggatgc ggaaggtcag gatcttctcg
atcttttccc ggttgtcctt 4980caggaatggg taaaaatctt cctgccgccg cagaatggcg
tgcagctctc ccaggtggat 5040ctggtggggg atgctgccgt tgtcgaaggt ccgctgcttc
cgcagcaggt cctctctgtt 5100cagcttcacg agcagttcct cggtgccgtc catcttttcc
aggatgggct tgatgaactt 5160gtagaactct tcctggctgg ctccgccgtc aatgtagccg
gcgtagccgt tcttgctctg 5220gtcgaagaaa atctctttgt acttctcagg cagctgctgc
cgcacgagag ctttcagcag 5280ggtcaggtcc tggtggtgct cgtcgtatct cttgatcata
gaggcgctca ggggggcctt 5340ggtgatctcg gtgttcactc tcaggatgtc gctcagcagg
atggcgtcgg acaggttctt 5400ggcggccaga aacaggtcgg cgtactggtc gccgatctgg
gccagcaggt tgtccaggtc 5460gtcgtcgtag gtgtccttgc tcagctgcag tttggcatcc
tcggccaggt cgaagttgct 5520cttgaagttg ggggtcaggc ccaggctcag ggcaatcagg
tttccgaaca ggccattctt 5580cttctcgccg ggcagctggg cgatcagatt ttccagccgt
ctgctcttgc tcagtctggc 5640agacaggatg gccttggcgt ccacgccgct ggcgttgatg
gggttttcct cgaacagctg 5700gttgtaggtc tgcaccagct ggatgaacag cttgtccacg
tcgctgttgt cggggttcag 5760gtcgccctcg atcaggaagt ggccccggaa cttgatcatg
tgggccaggg ccagatagat 5820cagccgcagg tcggccttgt cggtgctgtc caccagtttc
tttctcaggt ggtagatggt 5880ggggtacttc tcgtggtagg ccacctcgtc cacgatgttg
ccgaagatgg ggtgccgctc 5940gtgcttctta tcctcttcca ccaggaagga ctcttccagt
ctgtggaaga agctgtcgtc 6000caccttggcc atctcgttgc tgaagatctc ttgcagatag
cagatccggt tcttccgtct 6060ggtgtatctt cttctggcgg ttctcttcag ccgggtggcc
tcggctgttt cgccgctgtc 6120gaacagcagg gctccgatca ggttcttctt gatgctgtgc
cggtcggtgt tgcccagcac 6180cttgaatttc ttgctgggca ccttgtactc gtcggtgatc
acggcccagc ccacagagtt 6240ggtgccgatg tccaggccga tgctgtactt cttgtcggct
gctgggactc cgtggatacc 6300gaccttccgc ttcttctttg gggccatctt atcgtcatcg
tctttgtaat caatatcatg 6360atccttgtag tctccgtcgt ggtccttata gtccattttt
ctcgagggat cctgatatat 6420ttctattagg tatttattat tataaaatat aaatcttgaa
tgataataaa taaaatatta 6480gttattcctt ttctagttta aaatatacat attataaata
tatatatata tatatatatt 6540tttattgtga caagaatata taattataaa ttatattatt
tatttttgta tttttttttt 6600tttttttttt tttttctttt tttgttttat ttttcttttt
ttttataaat attatttttt 6660tcttttatca tgcacattgg aataatacat taatatatat
atatatatta tattatacat 6720atattgaata atgtttataa aaaatgcata acttatatga
atataatttt ttttaaatat 6780gacaaaaaga aaaaaaaaaa aaaccaaaaa aaattaaaat
tgaaatgaaa tatataaata 6840tattatttat atatattata cattgtttaa tactactaca
tgtatatata tatattatat 6900atatatatat atatcaattt tttcaaaaat aaattaatat
aaaaagaggg gaaaaaaaaa 6960aaaaaaaaaa aaaaaagata attaagtaag catttaaaaa
tatataaatt gataatatat 7020aaaattaatc acatataaaa gcttataaac actaggttag
ctaattcgct tgtaagaggt 7080actctcgttt atgcaaaact atttgatata gcattttaac
aagtacacat atatatatgt 7140aatatatata ctatatatat ctattgcatg tgtactaagc
atgtgcatgg catccccttt 7200ttctcgtgtt taaaacagtt tgtatgataa aatataaagg
atttgaaaaa gagaaaaaaa 7260tatatgatct catcctatat agcgccataa tttttatttg
ggttgaataa aattttctac 7320taaatttagg tgtaagtaaa ataatggaat atatataagt
acaataaaaa agtgcataaa 7380ttaaaaaatt tttataataa atattttttt taaaaaagtc
aataataata ttaaatatat 7440ataacacagg attatatatg ttcactacaa ttttttatat
tataatataa attcttttca 7500attttcattt tattttacat acactttcct tttttgtcac
tatattttaa tattcacata 7560tttagtttaa atactggcta tttctttcta catttgctag
taacaattgt gtagtgctta 7620aatatataca cacacctaaa acttacaaag tatcctagga
ccatggccaa gcctttgtct 7680caagaagaat ccaccctcat tgaaagagca acggctacaa
tcaacagcat ccccatctct 7740gaagactaca gcgtcgccag cgcagctctc tctagcgacg
gccgcatctt cactggtgtc 7800aatgtatatc attttactgg gggaccttgt gcagaactcg
tggtgctggg cactgctgct 7860gctgcggcag ctggcaacct gacttgtatc gtcgcgatcg
gaaatgagaa caggggcatc 7920ttgagcccct gcggacggtg ccgacaggtg cttctcgatc
tgcatcctgg gatcaaagcc 7980atagtgaagg acagtgatgg acagccgacg gcagttggga
ttcgtgaatt gctgccctct 8040ggttatgtgt gggagggcta accgcgggta ccccattaaa
tttatttaat aatagattaa 8100aaatattata aaaataaaaa cataaacaca gaaattacaa
aaaaaataca tatgaatttt 8160ttttttgtaa tcttccttat aaatatagaa taatgaatca
tataaaacat atcattattc 8220atttatttac atttaaaatt attgtttcag tatctttaat
ttattatgta tatataaaaa 8280taacttacaa ttttattaat aaacaatata tgtttattaa
ttcatgtttt gtaatttatg 8340ggatagcgat tttttttact gtctgtattt tcttttttaa
ttatgtttta attgtattta 8400ttttattttt attattgttc tttttatagt attattttaa
aacaaaatgt attttctaag 8460aacttataat aataataata taaattttaa taaaaattat
atttatcttt tacaatatga 8520acataaagta caacattaat atatagcttt taatattttt
attcctaatc atgtaaatct 8580taaatttttc tttttaaaca tatgttaaat atttatttct
cattatatat aagaacatat 8640ttattacatc tagaggtacc gagctcgttt tcgacactgg
atggcggcgt tagtatcgaa 8700tcgacagcag tatagcgacc agcattcaca tacgattgac
gcatgatatt actttctgcg 8760cacttaactt cgcatctggg cagatgatgt cgaggcgaaa
aaaaatataa atcacgctaa 8820catttgatta aaatagaaca actacaatat aaaaaaacta
tacaaatgac aagttcttga 8880aaacaagaat ctttttattg tcagtactga ttagaaaaac
tcatcgagca tcaaatgaaa 8940ctgcaattta ttcatatcag gattatcaat accatatttt
tgaaaaagcc gtttctgtaa 9000tgaaggagaa aactcaccga ggcagttcca taggatggca
agatcctggt atcggtctgc 9060gattccgact cgtccaacat caatacaacc tattaatttc
ccctcgtcaa aaataaggtt 9120atcaagtgag aaatcaccat gagtgacgac tgaatccggt
gagaatggca aaagcttatg 9180catttctttc cagacttgtt caacaggcca gccattacgc
tcgtcatcaa aatcactcgc 9240atcaaccaaa ccgttattca ttcgtgattg cgcctgagcg
agacgaaata cgcgatcgct 9300gttaaaagga caattacaaa caggaatcga atgcaaccgg
cgcaggaaca ctgccagcgc 9360atcaacaata ttttcacctg aatcaggata ttcttctaat
acctggaatg ctgttttgcc 9420ggggatcgca gtggtgagta accatgcatc atcaggagta
cggataaaat gcttgatggt 9480cggaagaggc ataaattccg tcagccagtt tagtctgacc
atctcatctg taacatcatt 9540ggcaacgcta cctttgccat gtttcagaaa caactctggc
gcatcgggct tcccatacaa 9600tcgatagatt gtcgcacctg attgcccgac attatcgcga
gcccatttat acccatataa 9660atcagcatcc atgttggaat ttaatcgcgg cctcgaaacg
tgagtctttt ccttacccat 9720ggttgtttat gttcggatgt gatgtgagaa ctgtatccta
gcaagatttt aaaaggaagt 9780atatgaaaga agaacctcag tggcaaatcc taacctttta
tatttctcta caggggcgcg 9840gcgtggggac aattcaacgc gtctgtgagg ggagcgtttc
cctgctcgca ggtctgcagc 9900gaggagccgt aatttttgct tcgcgccgtg cggccatcaa
aatgtatgga tgcaaatgat 9960tatacatggg gatgtatggg ctaaatgtac gggcgacagt
cacatcatgc ccctgagctg 10020cgcacgtcaa gactgtcaag gagggtattc tgggcctcca
tgtcgctggc ctaacattag 10080taatgtaggt ctgactttca ctcatataag tcttatggta
actaaactaa ggtcttacct 10140ttactgatat atgtcttact ttcactaact taggtattac
ttttactaac ttaggtctta 10200aattcagtaa ctaaggtcat acttcgacta actaaggtct
tacattcact gatataggtc 10260ttatgattac taacttaggt cctaatttga ctaacataag
tcctaacatt agtaatgtag 10320gtcttaactt aactaactta ggtcttacct tcactaatat
aggtcttaat attactgact 10380taagtaatta aggtactaac ttaggtcgta aggtaactaa
tatataggtc ttaaggtaac 10440taatttaggt cttgacttaa taaatatagg tcctaacata
aatagtatag gtcctaatat 10500aagtactata ggccttaact taaccaacat aggtcctaac
ataagttata taggtcttaa 10560cgtaactaac ataagtcatt aaggtactaa gtttggtctt
aatttaacaa taacatgtcg 10620ctggcctaac attagtaatg taggtctgac tttcactcat
ataagtctta tggtaactaa 10680actaaggtct tacctttact gatatatgtc ttactttcac
taacttaggt attactttta 10740ctaacttagg tcttaaattc agtaactaag gtcatacttc
gactaactaa ggtcttacat 10800tcactgatat aggtcttatg attactaact taggtcctaa
tttgactaac ataagtccta 10860acattagtaa tgtaggtctt aacttaacta acttaggtct
taccttcact aatataggtc 10920ttaatattac tgacttaagt aattaaggta ctaacttagg
tcgtaaggta actaatatat 10980aggtcttaag gtaactaatt taggtcttga cttaataaat
ataggtccta acataaatag 11040tataggtcct aatataagta ctataggcct taacttaacc
aacataggtc ctaacataag 11100ttatataggt cttaacgtaa ctaacataag tcattaaggt
actaagtttg gtcttaattt 11160aacaataacc atgtcgctgg ccgggtggtc ttaatttaac
aaatatagac catgtcgctg 11220gccgggtgac ccggcgggga cgaggcaagc taaacagatc
ctcgtgatac gcctattttt 11280ataggttaat gtcatgataa taatggtttc ttaggacgga
tcgcttgcct gtaacttaca 11340cgcgcctcgt atcttttaat gatggaataa tttgggaatt
tactctgtgt ttatttattt 11400ttatgttttg tatttggatt ttagaaagta aataaagaag
gtagaagagt tacggaatga 11460agaaaaaaaa ataaacaaag gtttaaaaaa tttcaacaaa
aagcgtactt tacatatata 11520tttattagac aagaaaagca gattaaatag atatacattc
gattaacgat aagtaaaatg 11580taaaatcaca ggattttcgt gtgtggtctt ctacacagac
aagatgaaac aattcggcat 11640taatacctga gagcaggaag agcaagataa aaggtagtat
ttgttggcga tccccctaga 11700gtcttttaca tcttcggaaa acaaaaacta ttttttcttt
aatttctttt tttactttct 11760atttttaatt tatatattta tattaaaaaa tttaaattat
aattattttt atagcacgtg 11820atgaaaagga cccaggtggc acttttcggg gaaatctcga
cctgcagcgt acgaagct 11878
SEQ ID NO: 388, length 12044, DNA, Artificial Sequence
Description of Artificial Sequence: Synthetic polynucleotide
gcctcactga ttaagcattg
gtaactgtca gaccaagttt actcatatat actttagatt 60gatttaaaac ttcattttta
atttaaaagg atctaggtga agatcctttt tgataatctc 120atgaccaaaa tcccttaacg
tgagttttcg ttccactgag cgtcagaccc cgtagaaaag 180atcaaaggat cttcttgaga
tccttttttt ctgcgcgtaa tctgctgctt gcaaacaaaa 240aaaccaccgc taccagcggt
ggtttgtttg ccggatcaag agctaccaac tctttttccg 300aaggtaactg gcttcagcag
agcgcagata ccaaatactg ttcttctagt gtagccgtag 360ttaggccacc acttcaagaa
ctctgtagca ccgcctacat acctcgctct gctaatcctg 420ttaccagtgg ctgctgccag
tggcgataag tcgtgtctta ccgggttgga ctcaagacga 480tagttaccgg ataaggcgca
gcggtcgggc tgaacggggg gttcgtgcac acagcccagc 540ttggagcgaa cgacctacac
cgaactgaga tacctacagc gtgagctatg agaaagcgcc 600acgcttcccg aagggagaaa
ggcggacagg tatccggtaa gcggcagggt cggaacagga 660gagcgcacga gggagcttcc
agggggaaac gcctggtatc tttatagtcc tgtcgggttt 720cgccacctct gacttgagcg
tcgatttttg tgatgctcgt caggggggcg gagcctatgg 780aaaaacgcca gcaacgcggc
ctttttacgg ttcctggcct tttgctggcc ttttgctcac 840atgttctttc ctgcgttatc
ccctgattct gtggataacc gtattaccgc ctttgagtga 900gctgataccg ctcgccgcag
ccgaacgacc gagcgcagcg agtcagtgag cgaggaagcg 960gaagagcgcc caatacgcaa
accgcctctc cccgcgcgtt ggccgattca ttaatgcagc 1020tggcacgaca ggtttcccga
ctggaaagcg ggcagtgagc gcaacgcaat taatgtgagt 1080tagctcactc attaggcacc
ccaggcttta cactttatgc ttccggctcg tatgttgtgt 1140ggaattgtga gcggataaca
atttcacaca ggaaacagct atgaccatga ttacgccaag 1200ctatttaggt gacactatag
aatactcaag cttgggggga tcctctagag tcgactaata 1260cgactcacta taggaacata
atctatagcg gcgttttaga gctagaaata gcaagttaaa 1320ataaggctag tccgttatca
acttgaaaaa gtggcaccga gtcggtgcta gcataacccc 1380ttggggcctc taaacgggtc
ttgaggggtt ttttggtcga cctgcaggca tgctatttga 1440tgaattaact acacttaaaa
taatacaatt attattaaat ttttttttga tttatttatt 1500aatttttaaa cttaatcatt
tgtatttggg aggaattata tatatcttta taattatttt 1560attttttttt atttttttat
ttttttatta ttattatttt tttttatttt ttttttttac 1620tgtatcaaag aaaaaccttt
aaaaaaaaaa ttataatttc cccatcttac tatattttta 1680atacatacgt tttaaggaat
taaattagac aaaagctata ttatgcttta catataatta 1740gaatttataa acgtttggtt
attagatatt tcatgtctca gtaaagtctt tcaatacata 1800tgtaaaaaaa tatatatgaa
tacacataag ttgttaatat attttatatg cataaatgta 1860taaatatata tatatatata
tatatatgta tgtatgtata tgtgtgtata tgaaattatt 1920tcaatgttta attttttaaa
ttttaatttt tttttttttt ttttttttta ttatgtatat 1980tgatctttat tatttaaata
ttactttttt cgttttttct tctttttatt attttttttt 2040ttttttatat tttatacaaa
tggtaattca aataaaaggt ataaatttat atttaatttt 2100cttttatgga taaataaaag
aaaaatataa atatataaaa atataaaaat atatatatgt 2160atattggggt gatgataaaa
tgaaagataa tatatatata tatatatctt tatttttttt 2220tttttgtaga ccccattgtg
agtacataaa tatattatat aactcgggag catcagtcat 2280ggaattctta tttctttttc
ttttttgcct ggccggcctt tttcgtggcc gccggccttt 2340tgtcgcctcc cagctgagac
aggtcgatcc gtgtctcgta caggccggtg atgctctggt 2400ggatcagggt ggcgtccagc
acctctttgg tgctggtgta cctcttccgg tcgatggtgg 2460tgtcaaagta cttgaaggcg
gcaggggctc ccagattggt cagggtaaac aggtggatga 2520tattctcggc ctgctctctg
atgggcttat cccggtgctt gttgtaggcg gacagcactt 2580tgtccagatt agcgtcggcc
aggatcactc tcttggagaa ctcgctgatc tgctcgatga 2640tctcgtccag gtagtgcttg
tgctgttcca caaacagctg tttctgctca ttatcctcgg 2700gggagccctt cagcttctca
tagtggctgg ccaggtacag gaagttcaca tatttggagg 2760gcagggccag ttcgtttccc
ttctgcagtt cgccggcaga ggccagcatt ctcttccggc 2820cgttttccag ctcgaacagg
gagtacttag gcagcttgat gatcaggtcc tttttcactt 2880ctttgtagcc cttggcttcc
agaaagtcga tgggattctt ctcgaagctg cttctttcca 2940tgatggtgat ccccagcagc
tctttcacac tcttcagttt cttggacttg cccttttcca 3000ctttggccac caccagcaca
gaataggcca cggtggggct gtcgaagccg ccgtacttct 3060tagggtccca gtccttcttt
ctggcgatca gcttatcgct gttcctcttg ggcaggatag 3120actctttgct gaagccgcct
gtctgcacct cggtcttttt cacgatattc acttggggca 3180tgctcagcac tttccgcacg
gtggcaaaat cccggccctt atcccacacg atctccccgg 3240tttcgccgtt tgtctcgatc
agaggccgct tccggatctc gccgttggcc agggtaatct 3300cggtcttgaa aaagttcatg
atgttgctgt agaagaagta cttggcggta gccttgccga 3360tttcctgctc gctcttggcg
atcatcttcc gcacgtcgta caccttgtag tcgccgtaca 3420cgaactcgct ttccagctta
gggtactttt tgatcagggc ggttcccacg acggcgttca 3480ggtaggcgtc gtgggcgtgg
tggtagttgt tgatctcgcg cactttgtaa aactggaaat 3540ccttccggaa atcggacacc
agcttggact tcagggtgat cactttcact tcccggatca 3600gcttgtcatt ctcgtcgtac
ttagtgttca tccgggagtc caggatctgt gccacgtgct 3660ttgtgatctg ccgggtttcc
accagctgtc tcttgatgaa gccggcctta tccagttcgc 3720tcaggccgcc tctctcggcc
ttggtcagat tgtcgaactt tctctgggta atcagcttgg 3780cgttcagcag ctgccgccag
tagttcttca tcttcttcac gacctcttcg gagggcacgt 3840tgtcgctctt gccccggttc
ttgtcgcttc tggtcagcac cttgttgtcg atggagtcgt 3900ccttcagaaa gctctgaggc
acgatatggt ccacatcgta gtcggacagc cggttgatgt 3960ccagttcctg gtccacgtac
atatcccgcc cattctgcag gtagtacagg tacagcttct 4020cgttctgcag ctgggtgttt
tccacggggt gttctttcag gatctggctg cccagctctt 4080tgatgccctc ttcgatccgc
ttcattctct cgcggctgtt cttctgtccc ttctgggtgg 4140tctggttctc tctggccatt
tcgatcacga tgttctcggg cttgtgccgg cccatcactt 4200tcacgagctc gtccaccacc
ttcactgtct gcaggatgcc cttcttaatg gcggggctgc 4260cggccagatt ggcaatgtgc
tcgtgcaggc tatcgccctg gccggacacc tgggctttct 4320ggatgtcctc tttaaaggtc
aggctgtcgt cgtggatcag ctgcatgaag tttctgttgg 4380cgaagccgtc ggacttcagg
aaatccagga ttgtcttgcc ggactgcttg tcccggatgc 4440cgttgatcag cttccggctc
agcctgcccc agccggtgta tctccgccgc ttcagctgct 4500tcatcacttt gtcgtcgaac
aggtgggcat aggttttcag ccgttcctcg atcatctctc 4560tgtcctcaaa cagtgtcagg
gtcagcacga tatcttccag aatgtcctcg ttttcctcat 4620tgtccaggaa gtccttgtcc
ttgataattt tcagcagatc gtggtatgtg cccagggagg 4680cgttgaaccg atcttccacg
ccggagattt ccacggagtc gaagcactcg attttcttga 4740agtagtcctc tttcagctgc
ttcacggtca ctttccggtt ggtcttgaac agcaggtcca 4800cgatggcctt tttctgctcg
ccgctcagga aggcgggctt tctcattccc tcggtcacgt 4860atttcacttt ggtcagctcg
ttatacacgg tgaagtactc gtacagcagg ctgtgcttgg 4920gcagcacctt ctcgttgggc
aggttcttat cgaagttggt catccgctcg atgaagctct 4980gggcggaagc gcccttgtcc
accacttcct cgaagttcca gggggtgatg gtttcctcgc 5040tctttctggt catccaggcg
aatctgctgt ttcccctggc cagagggccc acgtagtagg 5100ggatgcggaa ggtcaggatc
ttctcgatct tttcccggtt gtccttcagg aatgggtaaa 5160aatcttcctg ccgccgcaga
atggcgtgca gctctcccag gtggatctgg tgggggatgc 5220tgccgttgtc gaaggtccgc
tgcttccgca gcaggtcctc tctgttcagc ttcacgagca 5280gttcctcggt gccgtccatc
ttttccagga tgggcttgat gaacttgtag aactcttcct 5340ggctggctcc gccgtcaatg
tagccggcgt agccgttctt gctctggtcg aagaaaatct 5400ctttgtactt ctcaggcagc
tgctgccgca cgagagcttt cagcagggtc aggtcctggt 5460ggtgctcgtc gtatctcttg
atcatagagg cgctcagggg ggccttggtg atctcggtgt 5520tcactctcag gatgtcgctc
agcaggatgg cgtcggacag gttcttggcg gccagaaaca 5580ggtcggcgta ctggtcgccg
atctgggcca gcaggttgtc caggtcgtcg tcgtaggtgt 5640ccttgctcag ctgcagtttg
gcatcctcgg ccaggtcgaa gttgctcttg aagttggggg 5700tcaggcccag gctcagggca
atcaggtttc cgaacaggcc attcttcttc tcgccgggca 5760gctgggcgat cagattttcc
agccgtctgc tcttgctcag tctggcagac aggatggcct 5820tggcgtccac gccgctggcg
ttgatggggt tttcctcgaa cagctggttg taggtctgca 5880ccagctggat gaacagcttg
tccacgtcgc tgttgtcggg gttcaggtcg ccctcgatca 5940ggaagtggcc ccggaacttg
atcatgtggg ccagggccag atagatcagc cgcaggtcgg 6000ccttgtcggt gctgtccacc
agtttctttc tcaggtggta gatggtgggg tacttctcgt 6060ggtaggccac ctcgtccacg
atgttgccga agatggggtg ccgctcgtgc ttcttatcct 6120cttccaccag gaaggactct
tccagtctgt ggaagaagct gtcgtccacc ttggccatct 6180cgttgctgaa gatctcttgc
agatagcaga tccggttctt ccgtctggtg tatcttcttc 6240tggcggttct cttcagccgg
gtggcctcgg ctgtttcgcc gctgtcgaac agcagggctc 6300cgatcaggtt cttcttgatg
ctgtgccggt cggtgttgcc cagcaccttg aatttcttgc 6360tgggcacctt gtactcgtcg
gtgatcacgg cccagcccac agagttggtg ccgatgtcca 6420ggccgatgct gtacttcttg
tcggctgctg ggactccgtg gataccgacc ttccgcttct 6480tctttggggc catcttatcg
tcatcgtctt tgtaatcaat atcatgatcc ttgtagtctc 6540cgtcgtggtc cttatagtcc
atttttctcg agggatcctg atatatttct attaggtatt 6600tattattata aaatataaat
cttgaatgat aataaataaa atattagtta ttccttttct 6660agtttaaaat atacatatta
taaatatata tatatatata tatattttta ttgtgacaag 6720aatatataat tataaattat
attatttatt tttgtatttt tttttttttt tttttttttt 6780tctttttttg ttttattttt
cttttttttt ataaatatta tttttttctt ttatcatgca 6840cattggaata atacattaat
atatatatat atattatatt atacatatat tgaataatgt 6900ttataaaaaa tgcataactt
atatgaatat aatttttttt aaatatgaca aaaagaaaaa 6960aaaaaaaaac caaaaaaaat
taaaattgaa atgaaatata taaatatatt atttatatat 7020attatacatt gtttaatact
actacatgta tatatatata ttatatatat atatatatat 7080caattttttc aaaaataaat
taatataaaa agaggggaaa aaaaaaaaaa aaaaaaaaaa 7140aagataatta agtaagcatt
taaaaatata taaattgata atatataaaa ttaatcacat 7200ataaaagctt ataaacacta
ggttagctaa ttcgcttgta agaggtactc tcgtttatgc 7260aaaactattt gatatagcat
tttaacaagt acacatatat atatgtaata tatatactat 7320atatatctat tgcatgtgta
ctaagcatgt gcatggcatc ccctttttct cgtgtttaaa 7380acagtttgta tgataaaata
taaaggattt gaaaaagaga aaaaaatata tgatctcatc 7440ctatatagcg ccataatttt
tatttgggtt gaataaaatt ttctactaaa tttaggtgta 7500agtaaaataa tggaatatat
ataagtacaa taaaaaagtg cataaattaa aaaattttta 7560taataaatat tttttttaaa
aaagtcaata ataatattaa atatatataa cacaggatta 7620tatatgttca ctacaatttt
ttatattata atataaattc ttttcaattt tcattttatt 7680ttacatacac tttccttttt
tgtcactata ttttaatatt cacatattta gtttaaatac 7740tggctatttc tttctacatt
tgctagtaac aattgtgtag tgcttaaata tatacacaca 7800cctaaaactt acaaagtatc
ctaggaccat ggccaagcct ttgtctcaag aagaatccac 7860cctcattgaa agagcaacgg
ctacaatcaa cagcatcccc atctctgaag actacagcgt 7920cgccagcgca gctctctcta
gcgacggccg catcttcact ggtgtcaatg tatatcattt 7980tactggggga ccttgtgcag
aactcgtggt gctgggcact gctgctgctg cggcagctgg 8040caacctgact tgtatcgtcg
cgatcggaaa tgagaacagg ggcatcttga gcccctgcgg 8100acggtgccga caggtgcttc
tcgatctgca tcctgggatc aaagccatag tgaaggacag 8160tgatggacag ccgacggcag
ttgggattcg tgaattgctg ccctctggtt atgtgtggga 8220gggctaaccg cgggtacccc
attaaattta tttaataata gattaaaaat attataaaaa 8280taaaaacata aacacagaaa
ttacaaaaaa aatacatatg aatttttttt ttgtaatctt 8340ccttataaat atagaataat
gaatcatata aaacatatca ttattcattt atttacattt 8400aaaattattg tttcagtatc
tttaatttat tatgtatata taaaaataac ttacaatttt 8460attaataaac aatatatgtt
tattaattca tgttttgtaa tttatgggat agcgattttt 8520tttactgtct gtattttctt
ttttaattat gttttaattg tatttatttt atttttatta 8580ttgttctttt tatagtatta
ttttaaaaca aaatgtattt tctaagaact tataataata 8640ataatataaa ttttaataaa
aattatattt atcttttaca atatgaacat aaagtacaac 8700attaatatat agcttttaat
atttttattc ctaatcatgt aaatcttaaa tttttctttt 8760taaacatatg ttaaatattt
atttctcatt atatataaga acatatttat tacatctaga 8820ggtaccgagc tcgttttcga
cactggatgg cggcgttagt atcgaatcga cagcagtata 8880gcgaccagca ttcacatacg
attgacgcat gatattactt tctgcgcact taacttcgca 8940tctgggcaga tgatgtcgag
gcgaaaaaaa atataaatca cgctaacatt tgattaaaat 9000agaacaacta caatataaaa
aaactataca aatgacaagt tcttgaaaac aagaatcttt 9060ttattgtcag tactgattag
aaaaactcat cgagcatcaa atgaaactgc aatttattca 9120tatcaggatt atcaatacca
tatttttgaa aaagccgttt ctgtaatgaa ggagaaaact 9180caccgaggca gttccatagg
atggcaagat cctggtatcg gtctgcgatt ccgactcgtc 9240caacatcaat acaacctatt
aatttcccct cgtcaaaaat aaggttatca agtgagaaat 9300caccatgagt gacgactgaa
tccggtgaga atggcaaaag cttatgcatt tctttccaga 9360cttgttcaac aggccagcca
ttacgctcgt catcaaaatc actcgcatca accaaaccgt 9420tattcattcg tgattgcgcc
tgagcgagac gaaatacgcg atcgctgtta aaaggacaat 9480tacaaacagg aatcgaatgc
aaccggcgca ggaacactgc cagcgcatca acaatatttt 9540cacctgaatc aggatattct
tctaatacct ggaatgctgt tttgccgggg atcgcagtgg 9600tgagtaacca tgcatcatca
ggagtacgga taaaatgctt gatggtcgga agaggcataa 9660attccgtcag ccagtttagt
ctgaccatct catctgtaac atcattggca acgctacctt 9720tgccatgttt cagaaacaac
tctggcgcat cgggcttccc atacaatcga tagattgtcg 9780cacctgattg cccgacatta
tcgcgagccc atttataccc atataaatca gcatccatgt 9840tggaatttaa tcgcggcctc
gaaacgtgag tcttttcctt acccatggtt gtttatgttc 9900ggatgtgatg tgagaactgt
atcctagcaa gattttaaaa ggaagtatat gaaagaagaa 9960cctcagtggc aaatcctaac
cttttatatt tctctacagg ggcgcggcgt ggggacaatt 10020caacgcgtct gtgaggggag
cgtttccctg ctcgcaggtc tgcagcgagg agccgtaatt 10080tttgcttcgc gccgtgcggc
catcaaaatg tatggatgca aatgattata catggggatg 10140tatgggctaa atgtacgggc
gacagtcaca tcatgcccct gagctgcgca cgtcaagact 10200gtcaaggagg gtattctggg
cctccatgtc gctggcctaa cattagtaat gtaggtctga 10260ctttcactca tataagtctt
atggtaacta aactaaggtc ttacctttac tgatatatgt 10320cttactttca ctaacttagg
tattactttt actaacttag gtcttaaatt cagtaactaa 10380ggtcatactt cgactaacta
aggtcttaca ttcactgata taggtcttat gattactaac 10440ttaggtccta atttgactaa
cataagtcct aacattagta atgtaggtct taacttaact 10500aacttaggtc ttaccttcac
taatataggt cttaatatta ctgacttaag taattaaggt 10560actaacttag gtcgtaaggt
aactaatata taggtcttaa ggtaactaat ttaggtcttg 10620acttaataaa tataggtcct
aacataaata gtataggtcc taatataagt actataggcc 10680ttaacttaac caacataggt
cctaacataa gttatatagg tcttaacgta actaacataa 10740gtcattaagg tactaagttt
ggtcttaatt taacaataac atgtcgctgg cctaacatta 10800gtaatgtagg tctgactttc
actcatataa gtcttatggt aactaaacta aggtcttacc 10860tttactgata tatgtcttac
tttcactaac ttaggtatta cttttactaa cttaggtctt 10920aaattcagta actaaggtca
tacttcgact aactaaggtc ttacattcac tgatataggt 10980cttatgatta ctaacttagg
tcctaatttg actaacataa gtcctaacat tagtaatgta 11040ggtcttaact taactaactt
aggtcttacc ttcactaata taggtcttaa tattactgac 11100ttaagtaatt aaggtactaa
cttaggtcgt aaggtaacta atatataggt cttaaggtaa 11160ctaatttagg tcttgactta
ataaatatag gtcctaacat aaatagtata ggtcctaata 11220taagtactat aggccttaac
ttaaccaaca taggtcctaa cataagttat ataggtctta 11280acgtaactaa cataagtcat
taaggtacta agtttggtct taatttaaca ataaccatgt 11340cgctggccgg gtggtcttaa
tttaacaaat atagaccatg tcgctggccg ggtgacccgg 11400cggggacgag gcaagctaaa
cagatcctcg tgatacgcct atttttatag gttaatgtca 11460tgataataat ggtttcttag
gacggatcgc ttgcctgtaa cttacacgcg cctcgtatct 11520tttaatgatg gaataatttg
ggaatttact ctgtgtttat ttatttttat gttttgtatt 11580tggattttag aaagtaaata
aagaaggtag aagagttacg gaatgaagaa aaaaaaataa 11640acaaaggttt aaaaaatttc
aacaaaaagc gtactttaca tatatattta ttagacaaga 11700aaagcagatt aaatagatat
acattcgatt aacgataagt aaaatgtaaa atcacaggat 11760tttcgtgtgt ggtcttctac
acagacaaga tgaaacaatt cggcattaat acctgagagc 11820aggaagagca agataaaagg
tagtatttgt tggcgatccc cctagagtct tttacatctt 11880cggaaaacaa aaactatttt
ttctttaatt tcttttttta ctttctattt ttaatttata 11940tatttatatt aaaaaattta
aattataatt atttttatag cacgtgatga aaaggaccca 12000ggtggcactt ttcggggaaa
tctcgacctg cagcgtacga agct 12044
SEQ ID NO: 389
Length: 12044
Type: DNA
Organism: Artificial Sequence
Description of Artificial Sequence: Synthetic polynucleotide
Sequence 389:
gcctcactga ttaagcattg gtaactgtca gaccaagttt actcatatat actttagatt
60gatttaaaac ttcattttta atttaaaagg atctaggtga agatcctttt tgataatctc
120atgaccaaaa tcccttaacg tgagttttcg ttccactgag cgtcagaccc cgtagaaaag
180atcaaaggat cttcttgaga tccttttttt ctgcgcgtaa tctgctgctt gcaaacaaaa
240aaaccaccgc taccagcggt ggtttgtttg ccggatcaag agctaccaac tctttttccg
300aaggtaactg gcttcagcag agcgcagata ccaaatactg ttcttctagt gtagccgtag
360ttaggccacc acttcaagaa ctctgtagca ccgcctacat acctcgctct gctaatcctg
420ttaccagtgg ctgctgccag tggcgataag tcgtgtctta ccgggttgga ctcaagacga
480tagttaccgg ataaggcgca gcggtcgggc tgaacggggg gttcgtgcac acagcccagc
540ttggagcgaa cgacctacac cgaactgaga tacctacagc gtgagctatg agaaagcgcc
600acgcttcccg aagggagaaa ggcggacagg tatccggtaa gcggcagggt cggaacagga
660gagcgcacga gggagcttcc agggggaaac gcctggtatc tttatagtcc tgtcgggttt
720cgccacctct gacttgagcg tcgatttttg tgatgctcgt caggggggcg gagcctatgg
780aaaaacgcca gcaacgcggc ctttttacgg ttcctggcct tttgctggcc ttttgctcac
840atgttctttc ctgcgttatc ccctgattct gtggataacc gtattaccgc ctttgagtga
900gctgataccg ctcgccgcag ccgaacgacc gagcgcagcg agtcagtgag cgaggaagcg
960gaagagcgcc caatacgcaa accgcctctc cccgcgcgtt ggccgattca ttaatgcagc
1020tggcacgaca ggtttcccga ctggaaagcg ggcagtgagc gcaacgcaat taatgtgagt
1080tagctcactc attaggcacc ccaggcttta cactttatgc ttccggctcg tatgttgtgt
1140ggaattgtga gcggataaca atttcacaca ggaaacagct atgaccatga ttacgccaag
1200ctatttaggt gacactatag aatactcaag cttgggggga tcctctagag tcgactaata
1260cgactcacta taggaaatga tatggatttt gggttttaga gctagaaata gcaagttaaa
1320ataaggctag tccgttatca acttgaaaaa gtggcaccga gtcggtgcta gcataacccc
1380ttggggcctc taaacgggtc ttgaggggtt ttttggtcga cctgcaggca tgctatttga
1440tgaattaact acacttaaaa taatacaatt attattaaat ttttttttga tttatttatt
1500aatttttaaa cttaatcatt tgtatttggg aggaattata tatatcttta taattatttt
1560attttttttt atttttttat ttttttatta ttattatttt tttttatttt ttttttttac
1620tgtatcaaag aaaaaccttt aaaaaaaaaa ttataatttc cccatcttac tatattttta
1680atacatacgt tttaaggaat taaattagac aaaagctata ttatgcttta catataatta
1740gaatttataa acgtttggtt attagatatt tcatgtctca gtaaagtctt tcaatacata
1800tgtaaaaaaa tatatatgaa tacacataag ttgttaatat attttatatg cataaatgta
1860taaatatata tatatatata tatatatgta tgtatgtata tgtgtgtata tgaaattatt
1920tcaatgttta attttttaaa ttttaatttt tttttttttt ttttttttta ttatgtatat
1980tgatctttat tatttaaata ttactttttt cgttttttct tctttttatt attttttttt
2040ttttttatat tttatacaaa tggtaattca aataaaaggt ataaatttat atttaatttt
2100cttttatgga taaataaaag aaaaatataa atatataaaa atataaaaat atatatatgt
2160atattggggt gatgataaaa tgaaagataa tatatatata tatatatctt tatttttttt
2220tttttgtaga ccccattgtg agtacataaa tatattatat aactcgggag catcagtcat
2280ggaattctta tttctttttc ttttttgcct ggccggcctt tttcgtggcc gccggccttt
2340tgtcgcctcc cagctgagac aggtcgatcc gtgtctcgta caggccggtg atgctctggt
2400ggatcagggt ggcgtccagc acctctttgg tgctggtgta cctcttccgg tcgatggtgg
2460tgtcaaagta cttgaaggcg gcaggggctc ccagattggt cagggtaaac aggtggatga
2520tattctcggc ctgctctctg atgggcttat cccggtgctt gttgtaggcg gacagcactt
2580tgtccagatt agcgtcggcc aggatcactc tcttggagaa ctcgctgatc tgctcgatga
2640tctcgtccag gtagtgcttg tgctgttcca caaacagctg tttctgctca ttatcctcgg
2700gggagccctt cagcttctca tagtggctgg ccaggtacag gaagttcaca tatttggagg
2760gcagggccag ttcgtttccc ttctgcagtt cgccggcaga ggccagcatt ctcttccggc
2820cgttttccag ctcgaacagg gagtacttag gcagcttgat gatcaggtcc tttttcactt
2880ctttgtagcc cttggcttcc agaaagtcga tgggattctt ctcgaagctg cttctttcca
2940tgatggtgat ccccagcagc tctttcacac tcttcagttt cttggacttg cccttttcca
3000ctttggccac caccagcaca gaataggcca cggtggggct gtcgaagccg ccgtacttct
3060tagggtccca gtccttcttt ctggcgatca gcttatcgct gttcctcttg ggcaggatag
3120actctttgct gaagccgcct gtctgcacct cggtcttttt cacgatattc acttggggca
3180tgctcagcac tttccgcacg gtggcaaaat cccggccctt atcccacacg atctccccgg
3240tttcgccgtt tgtctcgatc agaggccgct tccggatctc gccgttggcc agggtaatct
3300cggtcttgaa aaagttcatg atgttgctgt agaagaagta cttggcggta gccttgccga
3360tttcctgctc gctcttggcg atcatcttcc gcacgtcgta caccttgtag tcgccgtaca
3420cgaactcgct ttccagctta gggtactttt tgatcagggc ggttcccacg acggcgttca
3480ggtaggcgtc gtgggcgtgg tggtagttgt tgatctcgcg cactttgtaa aactggaaat
3540ccttccggaa atcggacacc agcttggact tcagggtgat cactttcact tcccggatca
3600gcttgtcatt ctcgtcgtac ttagtgttca tccgggagtc caggatctgt gccacgtgct
3660ttgtgatctg ccgggtttcc accagctgtc tcttgatgaa gccggcctta tccagttcgc
3720tcaggccgcc tctctcggcc ttggtcagat tgtcgaactt tctctgggta atcagcttgg
3780cgttcagcag ctgccgccag tagttcttca tcttcttcac gacctcttcg gagggcacgt
3840tgtcgctctt gccccggttc ttgtcgcttc tggtcagcac cttgttgtcg atggagtcgt
3900ccttcagaaa gctctgaggc acgatatggt ccacatcgta gtcggacagc cggttgatgt
3960ccagttcctg gtccacgtac atatcccgcc cattctgcag gtagtacagg tacagcttct
4020cgttctgcag ctgggtgttt tccacggggt gttctttcag gatctggctg cccagctctt
4080tgatgccctc ttcgatccgc ttcattctct cgcggctgtt cttctgtccc ttctgggtgg
4140tctggttctc tctggccatt tcgatcacga tgttctcggg cttgtgccgg cccatcactt
4200tcacgagctc gtccaccacc ttcactgtct gcaggatgcc cttcttaatg gcggggctgc
4260cggccagatt ggcaatgtgc tcgtgcaggc tatcgccctg gccggacacc tgggctttct
4320ggatgtcctc tttaaaggtc aggctgtcgt cgtggatcag ctgcatgaag tttctgttgg
4380cgaagccgtc ggacttcagg aaatccagga ttgtcttgcc ggactgcttg tcccggatgc
4440cgttgatcag cttccggctc agcctgcccc agccggtgta tctccgccgc ttcagctgct
4500tcatcacttt gtcgtcgaac aggtgggcat aggttttcag ccgttcctcg atcatctctc
4560tgtcctcaaa cagtgtcagg gtcagcacga tatcttccag aatgtcctcg ttttcctcat
4620tgtccaggaa gtccttgtcc ttgataattt tcagcagatc gtggtatgtg cccagggagg
4680cgttgaaccg atcttccacg ccggagattt ccacggagtc gaagcactcg attttcttga
4740agtagtcctc tttcagctgc ttcacggtca ctttccggtt ggtcttgaac agcaggtcca
4800cgatggcctt tttctgctcg ccgctcagga aggcgggctt tctcattccc tcggtcacgt
4860atttcacttt ggtcagctcg ttatacacgg tgaagtactc gtacagcagg ctgtgcttgg
4920gcagcacctt ctcgttgggc aggttcttat cgaagttggt catccgctcg atgaagctct
4980gggcggaagc gcccttgtcc accacttcct cgaagttcca gggggtgatg gtttcctcgc
5040tctttctggt catccaggcg aatctgctgt ttcccctggc cagagggccc acgtagtagg
5100ggatgcggaa ggtcaggatc ttctcgatct tttcccggtt gtccttcagg aatgggtaaa
5160aatcttcctg ccgccgcaga atggcgtgca gctctcccag gtggatctgg tgggggatgc
5220tgccgttgtc gaaggtccgc tgcttccgca gcaggtcctc tctgttcagc ttcacgagca
5280gttcctcggt gccgtccatc ttttccagga tgggcttgat gaacttgtag aactcttcct
5340ggctggctcc gccgtcaatg tagccggcgt agccgttctt gctctggtcg aagaaaatct
5400ctttgtactt ctcaggcagc tgctgccgca cgagagcttt cagcagggtc aggtcctggt
5460ggtgctcgtc gtatctcttg atcatagagg cgctcagggg ggccttggtg atctcggtgt
5520tcactctcag gatgtcgctc agcaggatgg cgtcggacag gttcttggcg gccagaaaca
5580ggtcggcgta ctggtcgccg atctgggcca gcaggttgtc caggtcgtcg tcgtaggtgt
5640ccttgctcag ctgcagtttg gcatcctcgg ccaggtcgaa gttgctcttg aagttggggg
5700tcaggcccag gctcagggca atcaggtttc cgaacaggcc attcttcttc tcgccgggca
5760gctgggcgat cagattttcc agccgtctgc tcttgctcag tctggcagac aggatggcct
5820tggcgtccac gccgctggcg ttgatggggt tttcctcgaa cagctggttg taggtctgca
5880ccagctggat gaacagcttg tccacgtcgc tgttgtcggg gttcaggtcg ccctcgatca
5940ggaagtggcc ccggaacttg atcatgtggg ccagggccag atagatcagc cgcaggtcgg
6000ccttgtcggt gctgtccacc agtttctttc tcaggtggta gatggtgggg tacttctcgt
6060ggtaggccac ctcgtccacg atgttgccga agatggggtg ccgctcgtgc ttcttatcct
6120cttccaccag gaaggactct tccagtctgt ggaagaagct gtcgtccacc ttggccatct
6180cgttgctgaa gatctcttgc agatagcaga tccggttctt ccgtctggtg tatcttcttc
6240tggcggttct cttcagccgg gtggcctcgg ctgtttcgcc gctgtcgaac agcagggctc
6300cgatcaggtt cttcttgatg ctgtgccggt cggtgttgcc cagcaccttg aatttcttgc
6360tgggcacctt gtactcgtcg gtgatcacgg cccagcccac agagttggtg ccgatgtcca
6420ggccgatgct gtacttcttg tcggctgctg ggactccgtg gataccgacc ttccgcttct
6480tctttggggc catcttatcg tcatcgtctt tgtaatcaat atcatgatcc ttgtagtctc
6540cgtcgtggtc cttatagtcc atttttctcg agggatcctg atatatttct attaggtatt
6600tattattata aaatataaat cttgaatgat aataaataaa atattagtta ttccttttct
6660agtttaaaat atacatatta taaatatata tatatatata tatattttta ttgtgacaag
6720aatatataat tataaattat attatttatt tttgtatttt tttttttttt tttttttttt
6780tctttttttg ttttattttt cttttttttt ataaatatta tttttttctt ttatcatgca
6840cattggaata atacattaat atatatatat atattatatt atacatatat tgaataatgt
6900ttataaaaaa tgcataactt atatgaatat aatttttttt aaatatgaca aaaagaaaaa
6960aaaaaaaaac caaaaaaaat taaaattgaa atgaaatata taaatatatt atttatatat
7020attatacatt gtttaatact actacatgta tatatatata ttatatatat atatatatat
7080caattttttc aaaaataaat taatataaaa agaggggaaa aaaaaaaaaa aaaaaaaaaa
7140aagataatta agtaagcatt taaaaatata taaattgata atatataaaa ttaatcacat
7200ataaaagctt ataaacacta ggttagctaa ttcgcttgta agaggtactc tcgtttatgc
7260aaaactattt gatatagcat tttaacaagt acacatatat atatgtaata tatatactat
7320atatatctat tgcatgtgta ctaagcatgt gcatggcatc ccctttttct cgtgtttaaa
7380acagtttgta tgataaaata taaaggattt gaaaaagaga aaaaaatata tgatctcatc
7440ctatatagcg ccataatttt tatttgggtt gaataaaatt ttctactaaa tttaggtgta
7500agtaaaataa tggaatatat ataagtacaa taaaaaagtg cataaattaa aaaattttta
7560taataaatat tttttttaaa aaagtcaata ataatattaa atatatataa cacaggatta
7620tatatgttca ctacaatttt ttatattata atataaattc ttttcaattt tcattttatt
7680ttacatacac tttccttttt tgtcactata ttttaatatt cacatattta gtttaaatac
7740tggctatttc tttctacatt tgctagtaac aattgtgtag tgcttaaata tatacacaca
7800cctaaaactt acaaagtatc ctaggaccat ggccaagcct ttgtctcaag aagaatccac
7860cctcattgaa agagcaacgg ctacaatcaa cagcatcccc atctctgaag actacagcgt
7920cgccagcgca gctctctcta gcgacggccg catcttcact ggtgtcaatg tatatcattt
7980tactggggga ccttgtgcag aactcgtggt gctgggcact gctgctgctg cggcagctgg
8040caacctgact tgtatcgtcg cgatcggaaa tgagaacagg ggcatcttga gcccctgcgg
8100acggtgccga caggtgcttc tcgatctgca tcctgggatc aaagccatag tgaaggacag
8160tgatggacag ccgacggcag ttgggattcg tgaattgctg ccctctggtt atgtgtggga
8220gggctaaccg cgggtacccc attaaattta tttaataata gattaaaaat attataaaaa
8280taaaaacata aacacagaaa ttacaaaaaa aatacatatg aatttttttt ttgtaatctt
8340ccttataaat atagaataat gaatcatata aaacatatca ttattcattt atttacattt
8400aaaattattg tttcagtatc tttaatttat tatgtatata taaaaataac ttacaatttt
8460attaataaac aatatatgtt tattaattca tgttttgtaa tttatgggat agcgattttt
8520tttactgtct gtattttctt ttttaattat gttttaattg tatttatttt atttttatta
8580ttgttctttt tatagtatta ttttaaaaca aaatgtattt tctaagaact tataataata
8640ataatataaa ttttaataaa aattatattt atcttttaca atatgaacat aaagtacaac
8700attaatatat agcttttaat atttttattc ctaatcatgt aaatcttaaa tttttctttt
8760taaacatatg ttaaatattt atttctcatt atatataaga acatatttat tacatctaga
8820ggtaccgagc tcgttttcga cactggatgg cggcgttagt atcgaatcga cagcagtata
8880gcgaccagca ttcacatacg attgacgcat gatattactt tctgcgcact taacttcgca
8940tctgggcaga tgatgtcgag gcgaaaaaaa atataaatca cgctaacatt tgattaaaat
9000agaacaacta caatataaaa aaactataca aatgacaagt tcttgaaaac aagaatcttt
9060ttattgtcag tactgattag aaaaactcat cgagcatcaa atgaaactgc aatttattca
9120tatcaggatt atcaatacca tatttttgaa aaagccgttt ctgtaatgaa ggagaaaact
9180caccgaggca gttccatagg atggcaagat cctggtatcg gtctgcgatt ccgactcgtc
9240caacatcaat acaacctatt aatttcccct cgtcaaaaat aaggttatca agtgagaaat
9300caccatgagt gacgactgaa tccggtgaga atggcaaaag cttatgcatt tctttccaga
9360cttgttcaac aggccagcca ttacgctcgt catcaaaatc actcgcatca accaaaccgt
9420tattcattcg tgattgcgcc tgagcgagac gaaatacgcg atcgctgtta aaaggacaat
9480tacaaacagg aatcgaatgc aaccggcgca ggaacactgc cagcgcatca acaatatttt
9540cacctgaatc aggatattct tctaatacct ggaatgctgt tttgccgggg atcgcagtgg
9600tgagtaacca tgcatcatca ggagtacgga taaaatgctt gatggtcgga agaggcataa
9660attccgtcag ccagtttagt ctgaccatct catctgtaac atcattggca acgctacctt
9720tgccatgttt cagaaacaac tctggcgcat cgggcttccc atacaatcga tagattgtcg
9780cacctgattg cccgacatta tcgcgagccc atttataccc atataaatca gcatccatgt
9840tggaatttaa tcgcggcctc gaaacgtgag tcttttcctt acccatggtt gtttatgttc
9900ggatgtgatg tgagaactgt atcctagcaa gattttaaaa ggaagtatat gaaagaagaa
9960cctcagtggc aaatcctaac cttttatatt tctctacagg ggcgcggcgt ggggacaatt
10020caacgcgtct gtgaggggag cgtttccctg ctcgcaggtc tgcagcgagg agccgtaatt
10080tttgcttcgc gccgtgcggc catcaaaatg tatggatgca aatgattata catggggatg
10140tatgggctaa atgtacgggc gacagtcaca tcatgcccct gagctgcgca cgtcaagact
10200gtcaaggagg gtattctggg cctccatgtc gctggcctaa cattagtaat gtaggtctga
10260ctttcactca tataagtctt atggtaacta aactaaggtc ttacctttac tgatatatgt
10320cttactttca ctaacttagg tattactttt actaacttag gtcttaaatt cagtaactaa
10380ggtcatactt cgactaacta aggtcttaca ttcactgata taggtcttat gattactaac
10440ttaggtccta atttgactaa cataagtcct aacattagta atgtaggtct taacttaact
10500aacttaggtc ttaccttcac taatataggt cttaatatta ctgacttaag taattaaggt
10560actaacttag gtcgtaaggt aactaatata taggtcttaa ggtaactaat ttaggtcttg
10620acttaataaa tataggtcct aacataaata gtataggtcc taatataagt actataggcc
10680ttaacttaac caacataggt cctaacataa gttatatagg tcttaacgta actaacataa
10740gtcattaagg tactaagttt ggtcttaatt taacaataac atgtcgctgg cctaacatta
10800gtaatgtagg tctgactttc actcatataa gtcttatggt aactaaacta aggtcttacc
10860tttactgata tatgtcttac tttcactaac ttaggtatta cttttactaa cttaggtctt
10920aaattcagta actaaggtca tacttcgact aactaaggtc ttacattcac tgatataggt
10980cttatgatta ctaacttagg tcctaatttg actaacataa gtcctaacat tagtaatgta
11040ggtcttaact taactaactt aggtcttacc ttcactaata taggtcttaa tattactgac
11100ttaagtaatt aaggtactaa cttaggtcgt aaggtaacta atatataggt cttaaggtaa
11160ctaatttagg tcttgactta ataaatatag gtcctaacat aaatagtata ggtcctaata
11220taagtactat aggccttaac ttaaccaaca taggtcctaa cataagttat ataggtctta
11280acgtaactaa cataagtcat taaggtacta agtttggtct taatttaaca ataaccatgt
11340cgctggccgg gtggtcttaa tttaacaaat atagaccatg tcgctggccg ggtgacccgg
11400cggggacgag gcaagctaaa cagatcctcg tgatacgcct atttttatag gttaatgtca
11460tgataataat ggtttcttag gacggatcgc ttgcctgtaa cttacacgcg cctcgtatct
11520tttaatgatg gaataatttg ggaatttact ctgtgtttat ttatttttat gttttgtatt
11580tggattttag aaagtaaata aagaaggtag aagagttacg gaatgaagaa aaaaaaataa
11640acaaaggttt aaaaaatttc aacaaaaagc gtactttaca tatatattta ttagacaaga
11700aaagcagatt aaatagatat acattcgatt aacgataagt aaaatgtaaa atcacaggat
11760tttcgtgtgt ggtcttctac acagacaaga tgaaacaatt cggcattaat acctgagagc
11820aggaagagca agataaaagg tagtatttgt tggcgatccc cctagagtct tttacatctt
11880cggaaaacaa aaactatttt ttctttaatt tcttttttta ctttctattt ttaatttata
11940tatttatatt aaaaaattta aattataatt atttttatag cacgtgatga aaaggaccca
12000ggtggcactt ttcggggaaa tctcgacctg cagcgtacga agct
12044