Patent application title: ARC-BASED CAPSIDS AND USES THEREOF
IPC8 Class: AA61K4800FI
Publication date: 2022-03-24
Patent application number: 20220088224
Abstract:
Disclosed herein, in certain embodiments, are recombinant Arc and
endogenous Gag polypeptides, and methods of using recombinant Arc and
endogenous Gag polypeptides.
Claims:
1. A capsid comprising a recombinant endogenous Gag polypeptide and a
therapeutic agent.
2. The capsid of claim 1, wherein the therapeutic agent is a nucleic acid.
3. The capsid of claim 2, wherein the nucleic acid is an RNA.
4. (canceled)
5. (canceled)
6. The capsid of claim 1, wherein the recombinant endogenous Gag polypeptide is a human endogenous Gag polypeptide.
7. The capsid of claim 6, wherein the recombinant human endogenous Gag polypeptide is an endogenous Gag polypeptide comprising: a) an amino acid sequence that is SEQ ID NO: 16 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 16; b) an amino acid sequence that is SEQ ID NO: 17 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 17; c) an amino acid sequence that is SEQ ID NO: 18 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 18; d) an amino acid sequence that is SEQ ID NO: 19 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 19; e) an amino acid sequence that is SEQ ID NO: 20 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 20; f) an amino acid sequence that is SEQ ID NO: 21 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 21; g) an amino acid sequence that is SEQ ID NO: 22 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 22; h) an amino acid sequence that is SEQ ID NO: 23 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 23; i) an amino acid sequence that is SEQ ID NO: 24 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 24; j) an amino acid sequence that is SEQ ID NO: 25 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 25; or k) an amino acid sequence that is SEQ ID NO: 26 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 26; or l) an amino acid sequence that is SEQ ID NO: 27 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 27; or m) an amino acid sequence that is SEQ ID NO: 28 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 28.
8. A capsid comprising a recombinant Arc polypeptide or a recombinant endogenous Gag polypeptide, wherein the recombinant Arc polypeptide is not a rat Arc polypeptide or a Drosophila melanogaster Arc polypeptide.
9. The capsid of claim 8, further comprising a cargo.
10. The capsid of claim 9, wherein the cargo is a nucleic acid.
11. The capsid of claim 10, wherein the cargo is an RNA.
12. The capsid of claim 9, wherein the cargo is a therapeutic agent.
13. The capsid of claim 8, wherein the recombinant Arc polypeptide is an Arc polypeptide comprising: a) an amino acid sequence that is SEQ ID NO: 2 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 2; b) an amino acid sequence that is SEQ ID NO: 3 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 3; c) an amino acid sequence that is SEQ ID NO: 4 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 4; d) an amino acid sequence that is SEQ ID NO: 5 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 5; e) an amino acid sequence that is SEQ ID NO: 6 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 6; f) an amino acid sequence that is SEQ ID NO: 7 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 7; g) an amino acid sequence that is SEQ ID NO: 8 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 8; h) an amino acid sequence that is SEQ ID NO: 9 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 9; i) an amino acid sequence that is SEQ ID NO: 10 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 10; or j) an amino acid sequence that is SEQ ID NO: 11 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 11; or k) an amino acid sequence that is SEQ ID NO: 12 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 12; or l) an amino acid sequence that is SEQ ID NO: 13 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 13; or m) an amino acid sequence that is SEQ ID NO: 14 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 14; or n) an amino acid sequence that is SEQ ID NO: 15 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 15.
14. The capsid of claim 8, wherein the recombinant endogenous Gag polypeptide is an endogenous Gag polypeptide comprising: a) an amino acid sequence that is SEQ ID NO: 16 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 16; b) an amino acid sequence that is SEQ ID NO: 17 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 17; c) an amino acid sequence that is SEQ ID NO: 18 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 18; d) an amino acid sequence that is SEQ ID NO: 19 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 19; e) an amino acid sequence that is SEQ ID NO: 20 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 20; f) an amino acid sequence that is SEQ ID NO: 21 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 21; or g) an amino acid sequence that is SEQ ID NO: 22 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 22; or h) an amino acid sequence that is SEQ ID NO: 23 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 23; or i) an amino acid sequence that is SEQ ID NO: 24 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 24; or j) an amino acid sequence that is SEQ ID NO: 25 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 25; or k) an amino acid sequence that is SEQ ID NO: 26 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 26; or l) an amino acid sequence that is SEQ ID NO: 27 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 27; or m) an amino acid sequence that is SEQ ID NO: 28 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 28.
15. A vector comprising DNA encoding the recombinant Arc polypeptide or the recombinant endogenous Gag polypeptide of claim 8.
16. A method of delivering a cargo to a cell comprising administering the capsid of claim 9 to the cell.
17. The method of claim 16, wherein the cell is a eukaryotic cell.
18. The method of claim 16, wherein the cell is a vertebrate cell.
19. The method of claim 16, wherein the cell is a mammalian cell.
20. The method of claim 16, wherein the cell is a human cell.
21. The method of claim 16, wherein the cargo is a nucleic acid.
22. The method of claim 21, wherein the cell expresses a gene encoded by the nucleic acid.
23. The method of claim 16, wherein the cargo is a therapeutic agent.
24. A method of transfecting a nucleic acid into a cell comprising administering the capsid of claim 10 to the cell.
25. The capsid of claim 8, wherein the recombinant Arc polypeptide is an Arc polypeptide comprising an amino acid sequence that is SEQ ID NO: 1 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 1.
Description:
CROSS REFERENCE
[0001] This application is a national phase entry of International Application No. PCT/US2019/051786, filed Sep. 18, 2019, which claims the benefit of U.S. Provisional Patent Application No. 62/733,015, filed Sep. 18, 2018, each of which is incorporated herein by reference in its entirety.
SEQUENCE LISTING
[0002] The instant application contains a Sequence Listing which has been submitted electronically in ASCII format and is hereby incorporated by reference in its entirety. Said ASCII copy, created on Dec. 6, 2021, is named 54838-702_601_SL.txt, and is 148,316 bytes in size.
SUMMARY OF THE DISCLOSURE
[0003] Disclosed herein, in certain embodiments, are recombinant and engineered Arc polypeptides and recombinant and engineered endogenous Gag (endo-Gag) polypeptides. In some embodiments, also included are Arc-based capsids and endo-Gag based capsids, either loaded or empty, and methods of preparing the capsids. Additionally included are methods of delivery of the Arc-based capsids and endo-Gag-based capsids to a site of interest.
[0004] Disclosed herein, in certain embodiments, is a capsid comprising a recombinant Arc polypeptide or a recombinant endogenous Gag polypeptide and a therapeutic agent. In some embodiments, the therapeutic agent is a nucleic acid. In some embodiments, the nucleic acid is an RNA. In some embodiments, the recombinant Arc polypeptide is a human Arc polypeptide comprising an amino acid sequence that is SEQ ID NO: 1 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 1. In some embodiments, the recombinant Arc polypeptide is an Arc polypeptide comprising: a) an amino acid sequence that is SEQ ID NO: 2 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 2; b) an amino acid sequence that is SEQ ID NO: 3 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 3; c) an amino acid sequence that is SEQ ID NO: 4 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 4; d) an amino acid sequence that is SEQ ID NO: 5 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 5; e) an amino acid sequence that is SEQ ID NO: 6 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 6; f) an amino acid sequence that is SEQ ID NO: 7 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 7; g) an amino acid sequence that is SEQ ID NO: 8 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 8; h) an amino acid sequence that is SEQ ID NO: 9 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 9; i) an amino acid sequence that is SEQ ID NO: 10 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 10; or j) an amino acid sequence that is SEQ ID NO: 11 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 11; or k) an amino acid sequence that is SEQ ID NO: 12 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 12; or l) an amino acid sequence that is SEQ ID NO: 13 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 13; or m) an amino acid sequence that is SEQ ID NO: 14 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 14; or n) an amino acid sequence that is SEQ ID NO: 15 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 15. In some embodiments, the recombinant endogenous Gag polypeptide is a human endogenous Gag polypeptide. In some embodiments, the recombinant endogenous Gag polypeptide is an endogenous Gag polypeptide comprising: a) an amino acid sequence that is SEQ ID NO: 16 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 16; b) an amino acid sequence that is SEQ ID NO: 17 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 17; c) an amino acid sequence that is SEQ ID NO: 18 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 18; d) an amino acid sequence that is SEQ ID NO: 19 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 19; e) an amino acid sequence that is SEQ ID NO: 20 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 20; f) an amino acid sequence that is SEQ ID NO: 21 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 21; or g) an amino acid sequence that is SEQ ID NO: 22 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 22; or h) an amino acid sequence that is SEQ ID NO: 23 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 23; or i) an amino acid sequence that is SEQ ID NO: 24 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 24; or j) an amino acid sequence that is SEQ ID NO: 25 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 25; or k) an amino acid sequence that is SEQ ID NO: 26 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 26; or l) an amino acid sequence that is SEQ ID NO: 27 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 27 or m) an amino acid sequence that is SEQ ID NO: 28 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 28.
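By way of illustration only, the recurring "at least 90% identical" criterion can be sketched as a percent-identity check over two sequences that have already been aligned. The sketch below is not a definition adopted by this disclosure: the function names, the use of "-" as the gap symbol, and the choice to divide identical positions by the number of alignment columns containing at least one residue are all assumptions, and other conventions for computing percent identity exist. The toy sequences are hypothetical and are not any SEQ ID NO of the Sequence Listing.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity of two pre-aligned sequences (gaps written as '-').

    Assumed convention: identical residues divided by the number of
    alignment columns in which at least one sequence has a residue.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")

    columns = 0   # alignment columns that are not gap-versus-gap
    matches = 0   # columns where both sequences carry the same residue
    for res_a, res_b in zip(aligned_a, aligned_b):
        if res_a == "-" and res_b == "-":
            continue  # ignore columns that are gaps in both sequences
        columns += 1
        if res_a == res_b:
            matches += 1

    if columns == 0:
        raise ValueError("no comparable alignment columns")
    return 100.0 * matches / columns


def meets_identity_threshold(aligned_a: str, aligned_b: str,
                             threshold: float = 90.0) -> bool:
    """True if the percent identity is at least the threshold (default 90)."""
    return percent_identity(aligned_a, aligned_b) >= threshold
```

Under these assumptions, two aligned ten-residue toy sequences that differ at a single position are 90% identical and would satisfy the threshold, whereas a further gap in one of them lowers the identity to 80%.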
[0005] Disclosed herein, in certain embodiments, is a capsid comprising a recombinant Arc polypeptide or a recombinant endogenous Gag polypeptide, wherein the recombinant Arc polypeptide is not a rat Arc polypeptide or a human Arc polypeptide. In some embodiments, the capsid further comprises a cargo. In some embodiments, the cargo is a nucleic acid. In some embodiments, the cargo is an RNA. In some embodiments, the cargo is a therapeutic agent. In some embodiments, the recombinant Arc polypeptide is an Arc polypeptide comprising: a) an amino acid sequence that is SEQ ID NO: 2 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 2; b) an amino acid sequence that is SEQ ID NO: 3 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 3; c) an amino acid sequence that is SEQ ID NO: 4 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 4; d) an amino acid sequence that is SEQ ID NO: 5 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 5; e) an amino acid sequence that is SEQ ID NO: 6 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 6; f) an amino acid sequence that is SEQ ID NO: 7 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 7; g) an amino acid sequence that is SEQ ID NO: 8 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 8; h) an amino acid sequence that is SEQ ID NO: 9 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 9; i) an amino acid sequence that is SEQ ID NO: 10 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 10; or j) an amino acid sequence that is SEQ ID NO: 11 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 11; or k) an amino acid sequence that is SEQ ID NO: 12 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 12; or l) an amino acid sequence that is SEQ ID NO: 13 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 13; or m) an amino acid sequence that is SEQ ID NO: 14 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 14; or n) an amino acid sequence that is SEQ ID NO: 15 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 15. In some embodiments, the recombinant endogenous Gag polypeptide is an endogenous Gag polypeptide comprising: a) an amino acid sequence that is SEQ ID NO: 16 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 16; b) an amino acid sequence that is SEQ ID NO: 17 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 17; c) an amino acid sequence that is SEQ ID NO: 18 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 18; d) an amino acid sequence that is SEQ ID NO: 19 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 19; e) an amino acid sequence that is SEQ ID NO: 20 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 20; f) an amino acid sequence that is SEQ ID NO: 21 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 21; or g) an amino acid sequence that is SEQ ID NO: 22 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 22; or h) an amino acid sequence that is SEQ ID NO: 23 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 23; or i) an amino acid sequence that is SEQ ID NO: 24 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 24; or j) an amino acid sequence that is SEQ ID NO: 25 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 25; or k) an amino acid sequence that is SEQ ID NO: 26 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 26; or l) an amino acid sequence that is SEQ ID NO: 27 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 27 or m) an amino acid sequence that is SEQ ID NO: 28 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 28.
[0006] Disclosed herein, in certain embodiments, is a vector comprising DNA encoding a recombinant Arc polypeptide or a recombinant endogenous Gag polypeptide. In some embodiments, the vector further encodes a therapeutic agent. In some embodiments, the therapeutic agent is a nucleic acid. In some embodiments, the nucleic acid is an RNA. In some embodiments, the recombinant Arc polypeptide is a human Arc polypeptide comprising an amino acid sequence that is SEQ ID NO: 1 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 1. In some embodiments, the recombinant Arc polypeptide is an Arc polypeptide comprising: a) an amino acid sequence that is SEQ ID NO: 2 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 2; b) an amino acid sequence that is SEQ ID NO: 3 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 3; c) an amino acid sequence that is SEQ ID NO: 4 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 4; d) an amino acid sequence that is SEQ ID NO: 5 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 5; e) an amino acid sequence that is SEQ ID NO: 6 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 6; f) an amino acid sequence that is SEQ ID NO: 7 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 7; g) an amino acid sequence that is SEQ ID NO: 8 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 8; h) an amino acid sequence that is SEQ ID NO: 9 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 9; i) an amino acid sequence that is SEQ ID NO: 10 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 10; or j) an amino acid sequence that is SEQ ID NO: 11 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 11; or k) an amino acid sequence that is SEQ ID NO: 12 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 12; or l) an amino acid sequence that is SEQ ID NO: 13 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 13; or m) an amino acid sequence that is SEQ ID NO: 14 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 14; or n) an amino acid sequence that is SEQ ID NO: 15 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 15. In some embodiments, the recombinant endogenous Gag polypeptide is a human endogenous Gag polypeptide. In some embodiments, the recombinant endogenous Gag polypeptide is an endogenous Gag polypeptide comprising: a) an amino acid sequence that is SEQ ID NO: 16 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 16; b) an amino acid sequence that is SEQ ID NO: 17 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 17; c) an amino acid sequence that is SEQ ID NO: 18 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 18; d) an amino acid sequence that is SEQ ID NO: 19 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 19; e) an amino acid sequence that is SEQ ID NO: 20 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 20; f) an amino acid sequence that is SEQ ID NO: 21 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 21; or g) an amino acid sequence that is SEQ ID NO: 22 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 22; or h) an amino acid sequence that is SEQ ID NO: 23 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 23; or i) an amino acid sequence that is SEQ ID NO: 24 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 24; or j) an amino acid sequence that is SEQ ID NO: 25 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 25; or k) an amino acid sequence that is SEQ ID NO: 26 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 26; or l) an amino acid sequence that is SEQ ID NO: 27 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 27 or m) an amino acid sequence that is SEQ ID NO: 28 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 28.
[0007] Disclosed herein, in certain embodiments, is a vector comprising DNA encoding a recombinant Arc polypeptide or a recombinant endogenous Gag polypeptide, wherein the recombinant Arc polypeptide is not a rat Arc polypeptide or a human Arc polypeptide. In some embodiments, the vector further encodes a cargo. In some embodiments, the cargo is a nucleic acid. In some embodiments, the cargo is an RNA. In some embodiments, the cargo is a therapeutic agent. In some embodiments, the recombinant Arc polypeptide is an Arc polypeptide comprising: a) an amino acid sequence that is SEQ ID NO: 2 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 2; b) an amino acid sequence that is SEQ ID NO: 3 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 3; c) an amino acid sequence that is SEQ ID NO: 4 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 4; d) an amino acid sequence that is SEQ ID NO: 5 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 5; e) an amino acid sequence that is SEQ ID NO: 6 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 6; f) an amino acid sequence that is SEQ ID NO: 7 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 7; g) an amino acid sequence that is SEQ ID NO: 8 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 8; h) an amino acid sequence that is SEQ ID NO: 9 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 9; i) an amino acid sequence that is SEQ ID NO: 10 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 10; or j) an amino acid sequence that is SEQ ID NO: 11 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 11; or k) an amino acid sequence that is SEQ ID NO: 12 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 12; or l) an amino acid sequence that is SEQ ID NO: 13 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 13; or m) an amino acid sequence that is SEQ ID NO: 14 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 14; or n) an amino acid sequence that is SEQ ID NO: 15 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 15. In some embodiments, the recombinant endogenous Gag polypeptide is an endogenous Gag polypeptide comprising: a) an amino acid sequence that is SEQ ID NO: 16 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 16; b) an amino acid sequence that is SEQ ID NO: 17 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 17; c) an amino acid sequence that is SEQ ID NO: 18 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 18; d) an amino acid sequence that is SEQ ID NO: 19 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 19; e) an amino acid sequence that is SEQ ID NO: 20 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 20; f) an amino acid sequence that is SEQ ID NO: 21 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 21; or g) an amino acid sequence that is SEQ ID NO: 22 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 22; or h) an amino acid sequence that is SEQ ID NO: 23 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 23; or i) an amino acid sequence that is SEQ ID NO: 24 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 24; or j) an amino acid sequence that is SEQ ID NO: 25 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 25; or k) an amino acid sequence that is SEQ ID NO: 26 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 26; or l) an amino acid sequence that is SEQ ID NO: 27 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 27 or m) an amino acid sequence that is SEQ ID NO: 28 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 28.
[0008] Disclosed herein, in certain embodiments, is a method of delivering a cargo to a cell comprising administering to the cell a capsid comprising a recombinant Arc polypeptide or a recombinant endogenous Gag polypeptide and a therapeutic agent. In some embodiments, the therapeutic agent is a nucleic acid. In some embodiments, the nucleic acid is an RNA. In some embodiments, the recombinant Arc polypeptide is a human Arc polypeptide comprising an amino acid sequence that is SEQ ID NO: 1 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 1. In some embodiments, the recombinant Arc polypeptide is an Arc polypeptide comprising: a) an amino acid sequence that is SEQ ID NO: 2 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 2; b) an amino acid sequence that is SEQ ID NO: 3 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 3; c) an amino acid sequence that is SEQ ID NO: 4 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 4; d) an amino acid sequence that is SEQ ID NO: 5 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 5; e) an amino acid sequence that is SEQ ID NO: 6 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 6; f) an amino acid sequence that is SEQ ID NO: 7 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 7; g) an amino acid sequence that is SEQ ID NO: 8 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 8; h) an amino acid sequence that is SEQ ID NO: 9 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 9; i) an amino acid sequence that is SEQ ID NO: 10 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 10; or j) an amino acid sequence that is SEQ ID NO: 11 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 11; or k) an amino acid sequence that is SEQ ID NO: 12 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 12; or l) an amino acid sequence that is SEQ ID NO: 13 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 13; or m) an amino acid sequence that is SEQ ID NO: 14 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 14; or n) an amino acid sequence that is SEQ ID NO: 15 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 15. In some embodiments, the recombinant endogenous Gag polypeptide is a human endogenous Gag polypeptide. In some embodiments, the recombinant endogenous Gag polypeptide is an endogenous Gag polypeptide comprising: a) an amino acid sequence that is SEQ ID NO: 16 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 16; b) an amino acid sequence that is SEQ ID NO: 17 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 17; c) an amino acid sequence that is SEQ ID NO: 18 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 18; d) an amino acid sequence that is SEQ ID NO: 19 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 19; e) an amino acid sequence that is SEQ ID NO: 20 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 20; f) an amino acid sequence that is SEQ ID NO: 21 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 21; or g) an amino acid sequence that is SEQ ID NO: 22 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 22; or h) an amino acid sequence that is SEQ ID NO: 23 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 23; or i) an amino acid sequence that is SEQ ID NO: 24 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 24; or j) an amino acid sequence that is SEQ ID NO: 25 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 25; or k) an amino acid sequence that is SEQ ID NO: 26 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 26; or l) an amino acid sequence that is SEQ ID NO: 27 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 27 or m) an amino acid sequence that is SEQ ID NO: 28 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 28. In some embodiments, the cell is a eukaryotic cell. In some embodiments, the cell is a vertebrate cell. In some embodiments, the cell is a mammalian cell. In some embodiments, the cell is a human cell. In some embodiments, the cargo is a nucleic acid. In some embodiments, the cell expresses a gene encoded by the nucleic acid. In some embodiments, the cargo is a therapeutic agent.
[0009] Disclosed herein, in certain embodiments, is a method of delivering a cargo to a cell comprising administering to the cell a capsid comprising a recombinant Arc polypeptide or a recombinant endogenous Gag polypeptide, wherein the recombinant Arc polypeptide is not a rat Arc polypeptide or a human Arc polypeptide. In some embodiments, the capsid further comprises a cargo. In some embodiments, the cargo is a nucleic acid. In some embodiments, the cargo is an RNA. In some embodiments, the cargo is a therapeutic agent. In some embodiments, the recombinant Arc polypeptide is an Arc polypeptide comprising: a) an amino acid sequence that is SEQ ID NO: 2 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 2; b) an amino acid sequence that is SEQ ID NO: 3 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 3; c) an amino acid sequence that is SEQ ID NO: 4 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 4; d) an amino acid sequence that is SEQ ID NO: 5 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 5; e) an amino acid sequence that is SEQ ID NO: 6 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 6; f) an amino acid sequence that is SEQ ID NO: 7 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 7; g) an amino acid sequence that is SEQ ID NO: 8 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 8; h) an amino acid sequence that is SEQ ID NO: 9 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 9; i) an amino acid sequence that is SEQ ID NO: 10 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 10; or j) an amino acid sequence that is SEQ ID NO: 11 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 11; or k) an amino acid sequence that is SEQ ID NO: 12 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 12; or l) an amino acid sequence that is SEQ ID NO: 13 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 13; or m) an amino acid sequence that is SEQ ID NO: 14 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 14; or n) an amino acid sequence that is SEQ ID NO: 15 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 15. In some embodiments, the recombinant endogenous Gag polypeptide is an endogenous Gag polypeptide comprising: a) an amino acid sequence that is SEQ ID NO: 16 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 16; b) an amino acid sequence that is SEQ ID NO: 17 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 17; c) an amino acid sequence that is SEQ ID NO: 18 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 18; d) an amino acid sequence that is SEQ ID NO: 19 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 19; e) an amino acid sequence that is SEQ ID NO: 20 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 20; f) an amino acid sequence that is SEQ ID NO: 21 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 21; or g) an amino acid sequence that is SEQ ID NO: 22 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 22; or h) an amino acid sequence that is SEQ ID NO: 23 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 23; or i) an amino acid sequence that is SEQ ID NO: 24 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 24; or j) an amino acid sequence that is SEQ ID NO: 25 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 25; or k) an amino acid sequence that is SEQ ID NO: 26 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 26; or l) an amino acid sequence that is SEQ ID NO: 27 or an amino acid sequence that is
at least 90% identical to the SEQ ID NO: 27; or m) an amino acid sequence that is SEQ ID NO: 28 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 28. In some embodiments, the cell is a eukaryotic cell. In some embodiments, the cell is a vertebrate cell. In some embodiments, the cell is a mammalian cell. In some embodiments, the cell is a human cell. In some embodiments, the cargo is a nucleic acid. In some embodiments, the cell expresses a gene encoded by the nucleic acid. In some embodiments, the cargo is a therapeutic agent.
[0010] Disclosed herein, in certain embodiments, is a method of transfecting a nucleic acid into a cell comprising administering to the cell a capsid comprising a recombinant Arc polypeptide or a recombinant endogenous Gag polypeptide and a therapeutic agent. In some embodiments, the therapeutic agent is a nucleic acid. In some embodiments, the nucleic acid is an RNA. In some embodiments, the recombinant Arc polypeptide is a human Arc polypeptide comprising an amino acid sequence that is SEQ ID NO: 1 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 1. In some embodiments, the recombinant Arc polypeptide is an Arc polypeptide comprising: a) an amino acid sequence that is SEQ ID NO: 2 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 2; b) an amino acid sequence that is SEQ ID NO: 3 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 3; c) an amino acid sequence that is SEQ ID NO: 4 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 4; d) an amino acid sequence that is SEQ ID NO: 5 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 5; e) an amino acid sequence that is SEQ ID NO: 6 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 6; f) an amino acid sequence that is SEQ ID NO: 7 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 7; g) an amino acid sequence that is SEQ ID NO: 8 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 8; h) an amino acid sequence that is SEQ ID NO: 9 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 9; i) an amino acid sequence that is SEQ ID NO: 10 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 10; or j) an amino acid sequence that is SEQ ID NO: 11 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 11; or k) an amino acid sequence that is SEQ ID NO: 12 or an amino acid 
sequence that is at least 90% identical to the SEQ ID NO: 12; or l) an amino acid sequence that is SEQ ID NO: 13 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 13; or m) an amino acid sequence that is SEQ ID NO: 14 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 14; or n) an amino acid sequence that is SEQ ID NO: 15 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 15. In some embodiments, the recombinant endogenous Gag polypeptide is a human endogenous Gag polypeptide. In some embodiments, the recombinant endogenous Gag polypeptide is an endogenous Gag polypeptide comprising: a) an amino acid sequence that is SEQ ID NO: 16 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 16; b) an amino acid sequence that is SEQ ID NO: 17 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 17; c) an amino acid sequence that is SEQ ID NO: 18 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 18; d) an amino acid sequence that is SEQ ID NO: 19 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 19; e) an amino acid sequence that is SEQ ID NO: 20 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 20; f) an amino acid sequence that is SEQ ID NO: 21 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 21; or g) an amino acid sequence that is SEQ ID NO: 22 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 22; or h) an amino acid sequence that is SEQ ID NO: 23 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 23; or i) an amino acid sequence that is SEQ ID NO: 24 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 24; or j) an amino acid sequence that is SEQ ID NO: 25 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 25; or k) an amino acid sequence that is SEQ ID NO: 26 or 
an amino acid sequence that is at least 90% identical to the SEQ ID NO: 26; or l) an amino acid sequence that is SEQ ID NO: 27 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 27; or m) an amino acid sequence that is SEQ ID NO: 28 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 28.
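The "at least 90% identical" language recited for each SEQ ID NO above can be illustrated with a short calculation. The following Python sketch is illustrative only. It assumes the two sequences have already been aligned, with gaps written as "-", and it counts identical residues over all aligned columns, which is only one of several conventions in use. The function names and the toy peptide strings are hypothetical and are not taken from any SEQ ID NO of the disclosure; any definition of percent identity given elsewhere in the disclosure controls.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity of two pre-aligned sequences (gaps written as '-').

    Assumed convention: identical residues divided by the number of aligned
    columns, skipping columns in which both sequences have a gap.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    matches = 0
    columns = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" and b == "-":
            continue  # column carries no information
        columns += 1
        if a == b:  # both-gap columns were skipped, so this is a residue match
            matches += 1
    if columns == 0:
        raise ValueError("no aligned columns to compare")
    return 100.0 * matches / columns


def meets_identity_threshold(aligned_a: str, aligned_b: str,
                             threshold: float = 90.0) -> bool:
    """True if the pair is at least `threshold` percent identical."""
    return percent_identity(aligned_a, aligned_b) >= threshold


# Toy 10-residue peptides (hypothetical, not from the sequence listing).
reference = "MKTAYIAKQR"
variant = "MKTAYIAKQK"  # one substitution at the last position
print(percent_identity(reference, variant))          # 9 of 10 columns match
print(meets_identity_threshold(reference, variant))  # compared against 90%
```

In practice the alignment itself would come from an established alignment tool, and the choice of denominator (alignment length, shorter sequence, or reference length) can change whether a borderline sequence meets the threshold.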
[0011] Disclosed herein, in certain embodiments, is a method of transfecting a nucleic acid into a cell comprising administering to the cell a capsid comprising a recombinant Arc polypeptide or a recombinant endogenous Gag polypeptide, wherein the recombinant Arc polypeptide is not a rat Arc polypeptide or a human Arc polypeptide. In some embodiments, the capsid further comprises a cargo. In some embodiments, the cargo is a nucleic acid. In some embodiments, the cargo is an RNA. In some embodiments, the cargo is a therapeutic agent. In some embodiments, the recombinant Arc polypeptide is an Arc polypeptide comprising: a) an amino acid sequence that is SEQ ID NO: 2 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 2; b) an amino acid sequence that is SEQ ID NO: 3 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 3; c) an amino acid sequence that is SEQ ID NO: 4 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 4; d) an amino acid sequence that is SEQ ID NO: 5 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 5; e) an amino acid sequence that is SEQ ID NO: 6 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 6; f) an amino acid sequence that is SEQ ID NO: 7 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 7; g) an amino acid sequence that is SEQ ID NO: 8 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 8; h) an amino acid sequence that is SEQ ID NO: 9 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 9; i) an amino acid sequence that is SEQ ID NO: 10 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 10; or j) an amino acid sequence that is SEQ ID NO: 11 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 11; or k) an amino acid sequence that is SEQ ID NO: 12 or an amino acid sequence that is at least 90% identical to the 
SEQ ID NO: 12; or l) an amino acid sequence that is SEQ ID NO: 13 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 13; or m) an amino acid sequence that is SEQ ID NO: 14 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 14; or n) an amino acid sequence that is SEQ ID NO: 15 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 15. In some embodiments, the recombinant endogenous Gag polypeptide is an endogenous Gag polypeptide comprising: a) an amino acid sequence that is SEQ ID NO: 12 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 12; b) an amino acid sequence that is SEQ ID NO: 13 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 13; c) an amino acid sequence that is SEQ ID NO: 14 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 14; d) an amino acid sequence that is SEQ ID NO: 15 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 15; e) an amino acid sequence that is SEQ ID NO: 16 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 16; f) an amino acid sequence that is SEQ ID NO: 17 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 17; g) an amino acid sequence that is SEQ ID NO: 18 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 18; h) an amino acid sequence that is SEQ ID NO: 19 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 19; i) an amino acid sequence that is SEQ ID NO: 20 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 20; j) an amino acid sequence that is SEQ ID NO: 21 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 21; or k) an amino acid sequence that is SEQ ID NO: 22 or an amino acid sequence that is at least 90% identical to the SEQ ID NO: 22.
[0012] Disclosed herein, in certain embodiments, is an engineered Arc or endo-Gag polypeptide comprising a cargo binding domain and at least one capsid forming subunit from an Arc or endo-Gag polypeptide. In some embodiments, the cargo binding domain comprises a nucleic acid binding domain. In some embodiments, the cargo binding domain comprises a polypeptide that binds to a small molecule. In some embodiments, the cargo binding domain comprises a polypeptide that binds to a protein, a peptide, or an antibody or binding fragment thereof. In some embodiments, the cargo binding domain comprises a polypeptide that binds to a peptidomimetic or a nucleotidomimetic. In some embodiments, the at least one capsid forming subunit comprises a polypeptide that corresponds to the CA N-lobe and/or CA C-lobe of SEQ ID NO: 1. In some embodiments, the engineered Arc or endo-Gag polypeptide further comprises a second capsid forming subunit from a different species of an Arc or endo-Gag polypeptide. In some embodiments, the second capsid forming subunit comprises a polypeptide that corresponds to the N-lobe and/or C-lobe of SEQ ID NO: 1. In some embodiments, the at least one capsid forming subunit and the second capsid forming subunit are each independently selected from a species of Arc or endo-Gag selected from a mammal, a rodent, a bird, a reptile, a fish, an insect, a fungus, or a plant. In some embodiments, the at least one capsid forming subunit and the second capsid forming subunit are from two different species. In some embodiments, the cargo binding domain is fused either directly or via a linker to the C-terminus of the at least one capsid forming subunit. In some embodiments, the cargo binding domain is fused either directly or via a linker to the N-terminus of the at least one capsid forming subunit. In some embodiments, the second capsid forming subunit is fused either directly or via a linker to the C-terminus of the at least one capsid forming subunit. 
In some embodiments, the second capsid forming subunit is fused either directly or via a linker to the N-terminus of the at least one capsid forming subunit. In some embodiments, the cargo binding domain is fused either directly or via a linker to the N-terminus of the at least one capsid forming subunit and the second capsid forming subunit is fused either directly or via a linker to the C-terminus of the at least one capsid forming subunit. In some embodiments, the cargo binding domain is fused either directly or via a linker to the C-terminus of the at least one capsid forming subunit and the second capsid forming subunit is fused either directly or via a linker to the N-terminus of the at least one capsid forming subunit. In some embodiments, the engineered Arc or endo-Gag polypeptide further comprises a second polypeptide. In some embodiments, the second polypeptide is fused either directly or via a linker to the at least one capsid forming subunit. In some embodiments, the second polypeptide is fused either directly or via a linker to the cargo binding domain. In some embodiments, the second polypeptide is a protein, or an antibody or a binding fragment thereof. In some embodiments, the protein is a human protein or a viral protein. In some embodiments, the protein is a human Gag-like protein. In some embodiments, the protein is a de novo engineered protein designed to bind to a target receptor of interest. In some embodiments, the second polypeptide guides the delivery of a capsid formed by the engineered Arc or endo-Gag polypeptide to a target site of interest.
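The fusion arrangements described in paragraph [0012] can be summarized by writing each construct in N-terminus to C-terminus order. The following Python sketch is a hypothetical illustration only: the function names, the dictionary keys, the toy residue strings, and the "GGS" linker are assumptions chosen for readability and do not appear in the disclosure. An empty linker string models a direct fusion.

```python
from typing import Dict, List


def fuse(parts: List[str], linker: str = "") -> str:
    """Join sequence parts in N-to-C order; an empty linker is a direct fusion."""
    if not parts or any(not p for p in parts):
        raise ValueError("every part must be a non-empty sequence")
    return linker.join(parts)


def fusion_layouts(cargo_binding: str, capsid_1: str, capsid_2: str,
                   linker: str = "") -> Dict[str, str]:
    """Four of the arrangements recited in the text, written N-to-C."""
    return {
        # cargo binding domain on the N-terminus of the capsid forming subunit
        "cargo_at_N_terminus": fuse([cargo_binding, capsid_1], linker),
        # cargo binding domain on the C-terminus of the capsid forming subunit
        "cargo_at_C_terminus": fuse([capsid_1, cargo_binding], linker),
        # cargo binding domain at N-terminus, second subunit at C-terminus
        "cargo_N_second_subunit_C": fuse(
            [cargo_binding, capsid_1, capsid_2], linker),
        # cargo binding domain at C-terminus, second subunit at N-terminus
        "cargo_C_second_subunit_N": fuse(
            [capsid_2, capsid_1, cargo_binding], linker),
    }


# Toy placeholder residues (hypothetical, not real domain sequences).
layouts = fusion_layouts("RRKK", "MAVLE", "MSTQD", linker="GGS")
for name, sequence in layouts.items():
    print(name, sequence)
```

The sketch only orders the parts; it does not model folding, capsid assembly, or whether a given linker is suitable for a particular construct.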
[0013] Disclosed herein, in certain embodiments, is a truncated Arc or endo-Gag polypeptide wherein a portion that is not involved with capsid-formation, nucleic acid binding, or delivery is removed. In some embodiments, the portion comprises a matrix (MA) domain, a reverse transcriptase (RT) domain, a nucleotide binding domain, or a combination thereof, provided that the nucleotide binding domain is not a human Arc RNA binding domain. In some embodiments, the portion comprises a CA C-lobe domain. In some embodiments, the portion comprises an N-terminal deletion, a C-terminal deletion, or a combination thereof. In some embodiments, the N-terminal deletion comprises a deletion of up to 10 amino acids, 20 amino acids, 30 amino acids, or 50 amino acids. In some embodiments, the C-terminal deletion comprises a deletion of up to 10 amino acids, 20 amino acids, 30 amino acids, or 50 amino acids.
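The N-terminal and C-terminal deletions of paragraph [0013] can be sketched as simple sequence slicing. The following Python example is illustrative only. It assumes the polypeptide is represented as a one-letter amino acid string, and it enumerates only the named upper bounds (10, 20, 30, and 50 residues), although "up to" also covers every shorter deletion. The function names and the placeholder sequence are hypothetical.

```python
from itertools import product
from typing import Dict, Tuple

# Upper bounds named in the text; 0 means that terminus is left intact.
DELETION_LIMITS = (0, 10, 20, 30, 50)


def truncate(sequence: str, n_del: int = 0, c_del: int = 0) -> str:
    """Remove n_del residues from the N-terminus and c_del from the C-terminus."""
    if n_del < 0 or c_del < 0:
        raise ValueError("deletion lengths must be non-negative")
    if n_del + c_del >= len(sequence):
        raise ValueError("deletions would remove the entire sequence")
    return sequence[n_del:len(sequence) - c_del]


def truncation_variants(sequence: str) -> Dict[Tuple[int, int], str]:
    """Map (n_del, c_del) to the truncated sequence for every feasible pair."""
    variants = {}
    for n_del, c_del in product(DELETION_LIMITS, repeat=2):
        if n_del + c_del < len(sequence):
            variants[(n_del, c_del)] = truncate(sequence, n_del, c_del)
    return variants


# Placeholder 120-residue string (hypothetical, not a real Arc sequence).
toy_sequence = "A" * 120
variants = truncation_variants(toy_sequence)
print(len(variants))             # number of (n_del, c_del) combinations
print(len(variants[(10, 20)]))   # length after removing 10 + 20 residues
```

Whether a given truncation still forms capsids is an experimental question that this enumeration does not address.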
[0014] Disclosed herein, in certain embodiments, is an Arc-based or endo-Gag-based capsid comprising an engineered Arc or endo-Gag polypeptide, which may be a truncated Arc or endo-Gag polypeptide, and a cargo encapsulated by the capsid formed by the engineered Arc or endo-Gag polypeptide. In some embodiments, the cargo is a nucleic acid molecule. In some embodiments, the nucleic acid molecule is DNA, RNA, or a mixture of DNA and RNA. In some embodiments, the DNA and the RNA are each independently single-stranded, double-stranded, or a mixture of single-stranded and double-stranded. In some embodiments, the cargo is a small molecule. In some embodiments, the cargo is a protein. In some embodiments, the cargo is a peptide. In some embodiments, the cargo is an antibody or a binding fragment thereof. In some embodiments, the cargo is a peptidomimetic or a nucleotidomimetic. In some embodiments, the Arc-based or endo-Gag-based capsid comprises one or more additional capsid subunits from one or more species of Arc or endo-Gag proteins that are different from the engineered Arc or endo-Gag polypeptide. In some embodiments, the Arc-based or endo-Gag-based capsid comprises one or more additional capsid subunits from non-Arc proteins. In some embodiments, the one or more additional capsid subunits comprise Copia protein, ASPRV1 protein, a protein from the SCAN domain family, a protein encoded by the Paraneoplastic Ma antigen family (e.g., PNMA5, PNMA6, PNMA6A, and PNMA6B), a protein from the retrotransposon Gag-like family (e.g., RTL3, RTL6, RTL8A, RTL8B), or a combination thereof. In some embodiments, the one or more additional capsid subunits comprise BOP, LDOC1, MOAP1, PEG10, PNMA3, PNMA5, PNMA6A, PNMA6B, RTL3, RTL6, RTL8A, RTL8B, and ZNF18. In some embodiments, the capsid has a diameter of at least 1 nm, 2 nm, 3 nm, 4 nm, 5 nm, 10 nm, 15 nm, 20 nm, 25 nm, 30 nm, 50 nm, 80 nm, 100 nm, 120 nm, 150 nm, 200 nm, 250 nm, 300 nm, 500 nm, 600 nm, or more. 
In some embodiments, the capsid has a diameter of from about 1 nm to about 600 nm, from about 1 nm to about 500 nm, from about 1 nm to about 200 nm, from about 1 nm to about 100 nm, from about 1 nm to about 50 nm, or from about 1 nm to about 30 nm. In some embodiments, the capsid has a reduced off-target effect. In some embodiments, the capsid does not have an off-target effect. In some embodiments, the capsid is formed ex vivo. In some embodiments, the capsid is formed in vitro.
[0015] Disclosed herein, in certain embodiments, is a nucleic acid polymer encoding a recombinant or engineered Arc polypeptide or a recombinant or engineered endogenous Gag polypeptide described herein.
[0016] Disclosed herein, in certain embodiments, is a vector comprising a nucleic acid polymer encoding a recombinant or engineered Arc polypeptide or a recombinant or engineered endogenous Gag polypeptide described herein.
[0017] Disclosed herein, in certain embodiments, is a method of preparing a loaded Arc-based or endo-Gag-based capsid comprising: incubating a plurality of recombinant or engineered Arc polypeptides or a plurality of recombinant or engineered endo-Gag polypeptides with a cargo in a solution for a time sufficient to generate the loaded capsid. In some embodiments, the method further comprises mixing the solution comprising the plurality of engineered Arc or endo-Gag polypeptides with a plurality of non-Arc or non-endo-Gag capsid forming subunits prior to incubating with the cargo. In some embodiments, the plurality of non-Arc or non-endo-Gag capsid forming subunits are mixed with the plurality of recombinant or engineered Arc or endo-Gag polypeptides at a ratio of 1:1, 2:1, 3:1, 4:1, 5:1, 6:1, 7:1, 8:1, 9:1, or 10:1. In some embodiments, the plurality of non-Arc or non-endo-Gag capsid forming subunits are mixed with the plurality of engineered Arc or endo-Gag polypeptides at a ratio of 1:2, 1:3, 1:4, 1:5, 1:6, 1:7, 1:8, 1:9, or 1:10. In some embodiments, the method further comprises mixing the solution comprising the plurality of truncated Arc or endo-Gag polypeptides with a plurality of non-Arc or non-endo-Gag capsid forming subunits prior to incubating with the cargo. In some embodiments, the plurality of non-Arc or non-endo-Gag capsid forming subunits are mixed with the plurality of truncated Arc or endo-Gag polypeptides at a ratio of 1:1, 2:1, 3:1, 4:1, 5:1, 6:1, 7:1, 8:1, 9:1, or 10:1. In some embodiments, the plurality of non-Arc or non-endo-Gag capsid forming subunits are mixed with the plurality of truncated Arc or endo-Gag polypeptides at a ratio of 1:2, 1:3, 1:4, 1:5, 1:6, 1:7, 1:8, 1:9, or 1:10. In some embodiments, the plurality of engineered Arc or endo-Gag polypeptides is obtained from a bacterial cell system, an insect cell system, or a mammalian cell system. 
In some embodiments, the plurality of engineered Arc or endo-Gag polypeptides is obtained from a cell-free system. In some embodiments, the plurality of truncated Arc or endo-Gag polypeptides is obtained from a bacterial cell system, an insect cell system, or a mammalian cell system. In some embodiments, the plurality of truncated Arc or endo-Gag polypeptides is obtained from a cell-free system. In some embodiments, the loaded Arc-based or endo-Gag-based capsid is formulated for systemic administration. In some embodiments, the loaded Arc-based or endo-Gag-based capsid is formulated for local administration. In some embodiments, the loaded Arc-based or endo-Gag-based capsid is formulated for parenteral administration. In some embodiments, the loaded Arc-based or endo-Gag-based capsid is formulated for oral administration. In some embodiments, the loaded Arc-based or endo-Gag-based capsid is formulated for topical administration. In some embodiments, the loaded Arc-based or endo-Gag-based capsid is formulated for sublingual or aerosol administration.
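The mixing ratios recited in paragraph [0017] (for example 3:1 or 1:4) translate into component amounts by simple proportion. The following Python sketch is illustrative only. It assumes the ratio is read as non-Arc or non-endo-Gag subunits to Arc or endo-Gag polypeptides and that both are measured in the same unit (for instance moles); the disclosure does not state the basis of the ratio in this passage. The function name and the example total are hypothetical.

```python
from typing import Tuple


def split_by_ratio(total_amount: float, ratio: str) -> Tuple[float, float]:
    """Split a total into two components according to a ratio such as '3:1'.

    Returns (first_component, second_component) in the same unit as the total.
    """
    first_str, second_str = ratio.split(":")
    first, second = int(first_str), int(second_str)
    if first <= 0 or second <= 0:
        raise ValueError("ratio terms must be positive integers")
    if total_amount < 0:
        raise ValueError("total amount must be non-negative")
    parts = first + second
    return total_amount * first / parts, total_amount * second / parts


# Example: 100 units of total subunit (unit and total are assumptions).
non_arc, arc = split_by_ratio(100.0, "3:1")
print(non_arc, arc)  # 75.0 25.0

non_arc, arc = split_by_ratio(100.0, "1:4")
print(non_arc, arc)  # 20.0 80.0
```

If the ratio were instead intended on a mass basis, each amount would need to be converted using the molecular weight of the corresponding subunit before mixing.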
[0018] Disclosed herein, in certain embodiments, is use of an engineered or recombinant Arc-based or endo-Gag-based capsid for delivery of a cargo to a site of interest, comprising contacting a cell at the site of interest with an Arc-based or endo-Gag-based capsid for a time sufficient to facilitate cellular uptake of the capsid. In some embodiments, the cell is a tumor cell. In some embodiments, the tumor cell is a solid tumor cell. In some embodiments, the solid tumor cell is a cell from a bladder cancer, breast cancer, brain cancer, colorectal cancer, kidney cancer, liver cancer, lung cancer, pancreatic cancer, prostate cancer, skin cancer, stomach cancer, or thyroid cancer. In some embodiments, the tumor cell is from a hematologic malignancy. In some embodiments, the hematologic malignancy is a B-cell malignancy or a T-cell malignancy. In some embodiments, the hematologic malignancy is chronic lymphocytic leukemia (CLL), small lymphocytic lymphoma (SLL), diffuse large B-cell lymphoma (DLBCL), follicular lymphoma, mantle cell lymphoma, Burkitt lymphoma, cutaneous T-cell lymphoma, or peripheral T-cell lymphoma. In some embodiments, the cell is a somatic cell. In some embodiments, the cell is a stem cell or a progenitor cell. In some embodiments, the cell is a mesenchymal stem or progenitor cell. In some embodiments, the cell is a hematopoietic stem or progenitor cell. In some embodiments, the cell is a muscle cell, a skin cell, a blood cell, or an immune cell. In some embodiments, a target protein is overexpressed or is depleted in the cell. In some embodiments, a target gene in the cell has one or more mutations. In some embodiments, the cell comprises an impaired splicing mechanism. In some embodiments, the use is an in vivo use. In some embodiments, the Arc-based capsid is administered systemically to a subject. In some embodiments, the Arc-based or endo-Gag-based capsid is administered via local administration to a subject. 
In some embodiments, the Arc-based or endo-Gag-based capsid is administered parenterally to a subject. In some embodiments, the Arc-based capsid is administered orally to a subject. In some embodiments, the Arc-based or endo-Gag-based capsid is administered topically to a subject. In some embodiments, the Arc-based or endo-Gag-based capsid is administered via sublingual or aerosol administration to a subject. In some embodiments, the use is an in vitro or ex vivo use.
[0019] Disclosed herein, in certain embodiments, is a kit comprising an engineered Arc or endo-Gag polypeptide, a truncated Arc or endo-Gag polypeptide, a vector encoding a recombinant or engineered Arc or endo-Gag polypeptide, or an Arc-based or endo-Gag-based capsid.
BRIEF DESCRIPTION OF THE DRAWINGS
[0020] Various aspects of the disclosure are set forth with particularity in the appended claims. A better understanding of the features and advantages of the present disclosure will be obtained by reference to the following detailed description that sets forth illustrative embodiments, in which the principles of the disclosure are utilized, and the accompanying drawings below.
[0021] FIG. 1 is a representation of exemplary Arc polypeptides.
[0022] FIG. 2 is a representation of exemplary engineered Arc polypeptides.
[0023] FIG. 3 illustrates an exemplary method of engineering an Arc polypeptide to (A) carry a specific cargo (e.g., an RNA payload) or (B) remove an off-function effect.
[0024] FIG. 4A shows the isolation of 6×His-tagged human Arc by elution from a HisTrap column with an imidazole gradient.
[0025] FIG. 4B shows the separation of 6×His-tagged human Arc from residual nucleic acids on a Mono Q column eluted with a NaCl gradient.
[0026] FIG. 5 shows a transmission electron microscope image of negatively stained human Arc capsids.
[0027] FIG. 6 shows transmission electron microscope images of negatively stained capsids formed from recombinantly expressed Arc orthologs.
[0028] FIG. 7 shows transmission electron microscope images of negatively stained capsids formed from recombinantly expressed endo-Gag proteins.
[0029] FIG. 8 shows selective internalization of Alexa594-labeled Arc capsids by HeLa cells.
[0030] FIG. 9 shows the delivery of Cre RNA to HeLa cells by Arc capsids.
[0031] FIG. 10 illustrates methods for screening Arc and endo-Gag gene candidates for the ability to transmit a heterologous RNA payload.
DETAILED DESCRIPTION OF THE DISCLOSURE
[0032] Administering diagnostic or therapeutic agents to a site of interest with precision has presented an ongoing challenge. Available methods of delivering nucleic acids to cells have myriad limitations. For example, AAV viral vectors often used for gene therapy are immunogenic, have a limited payload capacity of <3 kb, suffer from poor biodistribution, can only be administered by direct injection, and pose a risk of disrupting host genes by integration. Non-viral methods have different limitations. Liposomes are primarily delivered to the liver. Extracellular vesicles have a limited payload capacity of <1 kb, limited scalability, and purification difficulties. Thus, there is a recognized need for new methods of delivering therapeutic payloads.
[0033] Most molecules do not possess inherent affinity for a particular site in the body. In other cases, the administered agents accumulate either in the liver and the kidney for clearance or in unintended tissue or cell types. Methods for improving delivery include coating the agent of choice with hydrophobic compounds or polymers. Such an approach increases the duration of the agent in circulation and augments hydrophobicity for cellular uptake. On the other hand, this approach does not actively direct cargo to the site of interest for delivery.
[0034] To specifically target sites where therapy is needed, therapeutic compounds are optionally fused to moieties such as ligands, antibodies, and aptamers that recognize and bind to receptors displayed on the surface of targeted cells. Upon reaching a cell of interest, the therapeutic compound is optionally further delivered to an intracellular target. For example, a therapeutic RNA can be translated to a protein if it comes into contact with a ribosome in the cytoplasm of the cell.
[0035] Arc (activity-regulated cytoskeleton-associated protein) regulates the endocytic trafficking of α-amino-3-hydroxy-5-methylisoxazole-4-propionic acid (AMPA)-type glutamate receptors. Arc activities have been linked to synaptic strength and neuronal plasticity. Phenotypes of loss of Arc in experimental murine models include defective formation of long-term memory and reduced neuronal activity and plasticity.
[0036] Arc exhibits similar molecular properties to retroviral Gag proteins. The Arc gene may have originated from the Ty3/gypsy retrotransposon. An endogenous Gag (endo-Gag) protein is any protein endogenous to a eukaryotic organism, including Arc, that has predicted and annotated similarity to viral Gag proteins. Exemplary endo-Gag proteins are disclosed in Campillos M, Doerks T, Shah P K, and Bork P, Computational characterization of multiple Gag-like human proteins, Trends Genet. 2006 November; 22(11):585-9. An endo-Gag protein is optionally recombinantly expressed by any host cell, including a prokaryotic or eukaryotic cell, or a bacterial, yeast, insect, vertebrate, mammalian, or human cell. As described herein, in some embodiments an endo-Gag protein assembles into an endo-Gag capsid.
[0037] Disclosed herein, in certain embodiments, are Arc and endo-Gag polypeptides which assemble into a capsid for delivery of a cargo of interest. In some embodiments, also described herein are engineered Arc and endo-Gag polypeptides which assemble into a capsid for delivery of a cargo of interest. In additional embodiments, described herein are capsids, e.g., Arc-based or endo-Gag-based capsids, for delivery of a cargo of interest.
Arc Polypeptides and Endogenous Gag Polypeptides
[0038] In certain embodiments, disclosed herein is an Arc polypeptide. In certain embodiments, disclosed herein is an endo-Gag polypeptide. It should be understood that endo-Gag sequences are optional substitutes for Arc sequences to form any type of engineered Arc polypeptide described in this section.
[0039] In some instances, Arc is a non-human Arc polypeptide. In some instances, the Arc polypeptide comprises a full-length Arc polypeptide (e.g., a full-length non-human Arc polypeptide). In other instances, the Arc polypeptide comprises a fragment of non-human Arc, such as a truncated Arc polypeptide, that participates in the formation of a capsid. In additional instances, the Arc polypeptide comprises one or more domains of a non-human Arc polypeptide, in which at least one of the domains participates in the formation of a capsid. In further instances, the Arc polypeptide is a recombinant Arc polypeptide.
[0040] In some instances, endo-Gag is a non-human endo-Gag polypeptide. In some instances, the endo-Gag polypeptide comprises a full-length endo-Gag polypeptide (e.g., a full-length non-human endo-Gag polypeptide). In other instances, the endo-Gag polypeptide comprises a fragment of non-human endo-Gag, such as a truncated endo-Gag polypeptide, that participates in the formation of a capsid. In additional instances, the endo-Gag polypeptide comprises one or more domains of a non-human endo-Gag polypeptide, in which at least one of the domains participates in the formation of a capsid. In further instances, the endo-Gag polypeptide is a recombinant endo-Gag polypeptide.
[0041] In some embodiments, the Arc is a human Arc polypeptide with at least its RNA binding domain modified to bind to a cargo that is not native to the human Arc. In some instances, the Arc polypeptide comprises a full-length human Arc polypeptide with at least its RNA binding domain modified to bind to a cargo that is not native to the human Arc protein. In other instances, the Arc polypeptide comprises a human Arc fragment comprising modification(s) in at least its RNA binding domain. In additional instances, the Arc polypeptide comprises one or more domains of a human Arc polypeptide, in which at least one of the domains participates in the formation of a capsid and in which the RNA binding domain is modified to bind to a cargo that the native human Arc protein does not bind. In further instances, the Arc polypeptide is a recombinant human Arc polypeptide with at least the RNA binding domain modified to enable loading of a cargo that is not native to the human Arc protein.
[0042] In some embodiments, the endo-Gag is a human endo-Gag polypeptide with at least its RNA binding domain modified to bind to a cargo that is not native to the human endo-Gag. In some instances, the endo-Gag polypeptide comprises a full-length human endo-Gag polypeptide with at least its RNA binding domain modified to bind to a cargo that is not native to the human endo-Gag protein. In other instances, the endo-Gag polypeptide comprises a human endo-Gag fragment comprising modification(s) in at least its RNA binding domain to bind to a cargo that a native human endo-Gag protein does not bind. In additional instances, the endo-Gag polypeptide comprises one or more domains of a human endo-Gag polypeptide, in which at least one of the domains participates in the formation of a capsid and in which the RNA binding domain is modified to bind to a cargo that is not native to the human endo-Gag protein. In further instances, the endo-Gag polypeptide is a recombinant human endo-Gag polypeptide with at least the RNA binding domain modified to enable loading of a cargo that is not native to the human endo-Gag protein.
[0043] In some instances, the Arc or endo-Gag polypeptide is an engineered Arc or endo-Gag polypeptide. As used herein, an engineered polypeptide is a recombinant polypeptide that is not identical in sequence to a full length, wild-type polypeptide. In some instances, the engineered Arc or endo-Gag polypeptide comprises a fragment of an Arc or endo-Gag polypeptide from a first species and at least an additional fragment from an Arc or endo-Gag polypeptide of a second species. In some cases, the first Arc or endo-Gag polypeptide is selected from a kingdom member of animalia, plantae, fungi, or protista. In some cases, the first species is selected from a mammal, a rodent, a bird, a reptile, a fish, a vertebrate, a eukaryote, an insect, a fungus, or a plant. In some cases, the second Arc polypeptide is selected from a kingdom member of animalia, plantae, fungi, or protista that is the same or different than the first Arc or endo-Gag polypeptide. In some cases, the second species is selected from a mammal, a rodent, a bird, a reptile, a fish, a vertebrate, a eukaryote, an insect, a fungus, or a plant that is different from the first species.
[0044] In some embodiments, an exemplary mammalian Arc or endo-Gag protein for expression as a recombinant or engineered Arc polypeptide is from the species Homo sapiens. Additional exemplary species of primate Arc or endo-Gag proteins for expression as a recombinant or engineered Arc polypeptide include: Gorilla, Pongo abelii, Pan paniscus, Macaca nemestrina, Chlorocebus sabaeus, Papio anubis, Rhinopithecus roxellana, Macaca fascicularis, Nomascus leucogenys, Callithrix jacchus, Aotus nancymaae, Cebus capucinus imitator, Saimiri boliviensis boliviensis, Otolemur garnettii, and Macaca mulatta.
[0045] An exemplary species list of rodent Arc or endo-Gag proteins for expression as a recombinant or engineered Arc or endo-Gag polypeptide includes: Fukomys damarensis, Microcebus murinus, Heterocephalus glaber, Propithecus coquereli, Marmota marmota marmota, Galeopterus variegatus, Cavia porcellus, Dipodomys ordii, Octodon degus, Castor canadensis, Nannospalax galili, Carlito syrichta, Chinchilla lanigera, Mus musculus, Ictidomys tridecemlineatus, Rattus norvegicus, Microtus ochrogaster, Otolemur garnettii, Meriones unguiculatus, Cricetulus griseus, Neotoma lepida, Jaculus jaculus, Mustela putorius furo, Mesocricetus auratus, Tupaia chinensis, Chrysochloris asiatica, Elephantulus edwardii, Erinaceus europaeus, Ochotona princeps, Sorex araneus, Monodelphis domestica, Echinops telfairi, and Condylura cristata.
[0046] An exemplary species list of Arc or endo-Gag proteins for expression as a recombinant or engineered Arc or endo-Gag polypeptide includes: Vulpes vulpes, Canis lupus dingo, Felis catus, Panthera pardus, Callorhinus ursinus, Odobenus rosmarus divergens, Equus asinus, Sus scrofa, Manis javanica, Ceratotherium simum simum, Leptonychotes weddellii, Enhydra lutris kenyoni, Lipotes vexillifer, Bos grunniens, Bubalus bubalis, Camelus dromedarius, Vicugna pacos, Orcinus orca, Neomonachus schauinslandi, Tursiops truncatus, Bos taurus, Capra hircus, Delphinapterus leucas, Ovis aries musimon, Balaenoptera acutorostrata scammoni, Neophocaena asiaeorientalis asiaeorientalis, Miniopterus natalensis, Pteropus alecto, Physeter catodon, Loxodonta africana, Orycteropus afer afer, Bos mutus, Desmodus rotundus, Hipposideros armiger, Ailuropoda melanoleuca, Trichechus manatus latirostris, Rousettus latirostris, Rousettus aegyptiacus, Eptesicus fuscus, Rhinolophus sinicus, Cervus elaphus hippelaphus, Odocoileus virginianus texanus, Pantholops hodgsonii, Camelus bactrianus, Sarcophilus harrisii, Phascolarctos cinereus, and Ornithorhynchus anatinus.
[0047] An exemplary species list of bird Arc or endo-Gag proteins for expression as a recombinant or engineered Arc or endo-Gag polypeptide includes: Gallus gallus, Corvus cornix cornix, Parus major, Corvus brachyrhynchos, Dromaius novaehollandiae, and Apteryx rowi.
[0048] An exemplary species list of reptile Arc protein for expression as a recombinant or engineered Arc or endo-Gag polypeptide includes: Python bivittatus, Pogona vitticeps, Anolis carolinensis, Protobothrops mucrosquamatus, Alligator sinensis, Crocodylus porosus, Gavialis gangeticus, Alligator mississippiensis, Pelodiscus sinensis, Terrapene mexicana triunguis, Chrysemys picta bellii, Chelonia mydas, Nanorana parkeri, Xenopus tropicalis, Xenopus laevis, and Latimeria chalumnae.
[0049] An exemplary species list of fish Arc protein for expression as a recombinant or engineered Arc or endo-Gag polypeptide includes: Oncorhynchus mykiss, Acanthochromis polyacanthus, Oncorhynchus kisutch, Carassius auratus, and Austrofundulus limnaeus.
[0050] An exemplary species list of insect Arc or endo-Gag proteins for expression as a recombinant or engineered Arc or endo-Gag polypeptide includes: Drosophila serrata, Drosophila bipectinata, Solenopsis invicta, Temnothorax curvispinosus, Drosophila melanogaster, Agrilus planipennis, Camponotus floridanus, Pogonomyrmex barbatus, Nilaparvata lugens, Bombyx mori, Tribolium castaneum, and Leptinotarsa decemlineata.
[0051] An exemplary species list of plant Arc or endo-Gag proteins for expression as a recombinant or engineered Arc or endo-Gag polypeptide includes Spinacia oleracea and Erythranthe guttata.
[0052] An exemplary species list of fungi proteins for expression as a recombinant or engineered Arc or endo-Gag polypeptide includes: Saccharomyces cerevisiae, Rhizopus delemar, Fusarium oxysporum, Cryptococcus neoformans, Rhizophagus irregularis, Fusarium fujikuroi, Candida albicans, Trichophyton rubrum, Pyrenophora tritici-repentis, Rhizopus microsporus, Rhizoctonia solani, Aspergillus flavus, Verticillium dahliae, Fusarium verticillioides, Aspergillus niger, Fusarium graminearum, Aspergillus fumigatus, Zymoseptoria tritici, and Trichoderma harzianum.
[0053] An exemplary species list of protist Arc or endo-Gag proteins for expression as a recombinant or engineered Arc or endo-Gag polypeptide includes: Entamoeba histolytica, Paulinella micropora, Guillardia theta, Oxyrrhis marina, Seminavis robusta, Euglena longa, Naegleria gruberi, and Trichomonas vaginalis.
[0054] In some instances, Arc or endo-Gag comprises a capsid assembly/forming (CA) domain, a cargo binding domain (e.g., an RNA binding domain), and optionally a matrix (MA) domain, a reverse transcriptase (RT) domain, or a combination thereof. In some cases, the CA domain is further divided into an N-lobe domain and a C-lobe domain. In some cases, the cargo binding domain comprises an RNA binding domain, a DNA binding domain, a protein binding domain, a peptide binding domain, an antibody binding domain, a small molecule binding domain, or a peptidomimetic/nucleotidomimetic binding domain. Exemplary cargo binding domains include, but are not limited to, domains from GPCRs, antibodies or binding fragments thereof, lipoproteins, integrins, tyrosine kinases, DNA-binding proteins, RNA-binding proteins, nucleases, ligases, proteases, integrases, isomerases, phosphatases, GTPases, aromatases, esterases, adaptor proteins, G-proteins, GEFs, cytokines, interleukins, interleukin receptors, interferons, interferon receptors, caspases, transcription factors, neurotrophic factors and their receptors, growth factors and their receptors, signal recognition particle and receptor components, extracellular matrix proteins, integral components of membrane, ribosomal proteins, translation elongation factors, translation initiation factors, GPI-anchored proteins, tissue factors, dystrophin, utrophin, dystrobrevin, any fusions, combinations, subunits, derivatives, or domains thereof.
[0055] In some embodiments, one or more non-essential regions which are not involved in capsid formation or nucleic acid binding are removed from an Arc or endo-Gag protein to generate an Arc or endo-Gag polypeptide. In such instances, one or more non-essential regions, e.g., an N-terminal region (e.g., up to 10 amino acids, up to 20 amino acids, up to 30 amino acids, or up to 50 amino acids), a C-terminal region (e.g., up to 10 amino acids, up to 20 amino acids, up to 30 amino acids, or up to 50 amino acids), a RT domain, a MA domain, or a combination thereof, are deleted from an Arc or endo-Gag protein to generate an Arc or endo-Gag polypeptide. In some cases, only the essential regions involved in capsid assembly/forming and cargo binding remain in an Arc or endo-Gag polypeptide. In additional cases, only the essential region involved in capsid assembly/forming (e.g., the N-lobe and/or the C-lobe) remains in an Arc polypeptide.
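As a minimal illustrative sketch of the terminal deletions described above, the following Python helper removes a chosen number of residues from each end of an amino acid string. The function name `trim_termini` and the toy sequence in the usage note are hypothetical and are not taken from the disclosure; deciding which residues are in fact non-essential requires experimental or structural information that this sketch does not model.

```python
def trim_termini(sequence: str, n_term: int = 0, c_term: int = 0) -> str:
    """Return `sequence` with `n_term` residues removed from the N-terminus
    and `c_term` residues removed from the C-terminus.

    Hypothetical helper for illustration only; it performs plain string
    slicing and knows nothing about domain boundaries.
    """
    if n_term < 0 or c_term < 0:
        raise ValueError("deletion lengths must be non-negative")
    if n_term + c_term >= len(sequence):
        raise ValueError("deletions would remove the entire sequence")
    # Keep everything between the two deleted terminal regions.
    return sequence[n_term:len(sequence) - c_term]
```

For example, `trim_termini("ABCDEFGHIJ", 2, 3)` keeps the middle five letters of the placeholder string.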
[0056] In certain embodiments, the RT domain, the MA domain, and/or the endogenous RNA binding domain are replaced with other cargo binding domains: for example, replaced with a DNA binding domain, a protein binding domain, a peptide binding domain, an antibody binding domain, a small molecule binding domain, a peptidomimetic binding domain, or a nucleotidomimetic binding domain. In some embodiments, an Arc or endo-Gag polypeptide comprises truncations or modifications of domains involved in capsid forming, nucleic acid binding, or delivery.
[0057] In some embodiments, the Arc or endo-Gag polypeptide comprises a MA domain, a CA N-lobe, a CA C-lobe, a cargo binding domain, and a RT domain. In some instances, the Arc polypeptide comprises from N-terminus to C-terminus the following domains: the MA domain, the CA N-lobe, the CA C-lobe, the RT domain, and the cargo binding domain. In some instances, the Arc or endo-Gag polypeptide comprises from N-terminus to C-terminus the following domains: the MA domain, the RT domain, the cargo binding domain, the CA N-lobe, and the CA C-lobe. In some instances, the Arc or endo-Gag polypeptide comprises from N-terminus to C-terminus the following domains: the cargo binding domain, the MA domain, the RT domain, the CA N-lobe, and the CA C-lobe. In some instances, the domains are arranged in an order that does not impede capsid assembly and cargo binding. In some instances, each of the domains is either directly or indirectly fused to the respective two flanking domains.
[0058] In some embodiments, the Arc or endo-Gag polypeptide comprises a MA domain, a CA N-lobe, a CA C-lobe, and a cargo binding domain. In some instances, the Arc or endo-Gag polypeptide comprises from N-terminus to C-terminus the following domains: the MA domain, the CA N-lobe, the CA C-lobe, and the cargo binding domain. In some instances, the Arc polypeptide comprises from N-terminus to C-terminus the following domains: the MA domain, the cargo binding domain, the CA N-lobe, and the CA C-lobe. In some instances, the Arc or endo-Gag polypeptide comprises from N-terminus to C-terminus the following domains: the cargo binding domain, the MA domain, the CA N-lobe, and the CA C-lobe. In some instances, the domains are arranged in an order that does not impede capsid assembly and cargo binding. In some instances, each of the domains is either directly or indirectly fused to the respective two flanking domains.
[0059] In some embodiments, the Arc or endo-Gag polypeptide comprises a CA N-lobe, a CA C-lobe, and a cargo binding domain. In some instances, the Arc or endo-Gag polypeptide comprises from N-terminus to C-terminus the following domains: the CA N-lobe, the CA C-lobe, and the cargo binding domain. In some instances, the Arc or endo-Gag polypeptide comprises from N-terminus to C-terminus the following domains: the cargo binding domain, the CA N-lobe, and the CA C-lobe. In some instances, the domains are arranged in an order that does not impede capsid assembly and cargo binding. In some instances, each of the domains is either directly or indirectly fused to the respective two flanking domains.
[0060] In some embodiments, the Arc or endo-Gag polypeptide comprises a CA N-lobe and a cargo binding domain. In some instances, the Arc or endo-Gag polypeptide comprises from N-terminus to C-terminus the following domains: the CA N-lobe and the cargo binding domain. In some instances, the Arc or endo-Gag polypeptide comprises from N-terminus to C-terminus the following domains: the cargo binding domain and the CA N-lobe. In some instances, the domains are arranged in an order that does not impede capsid assembly and cargo binding. In some instances, the two domains are either directly or indirectly fused to each other.
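The N-terminus to C-terminus domain orders enumerated in paragraphs [0058] to [0060] can be written down as ordered lists of domain labels, as in the following illustrative Python sketch. The labels (`MA`, `CA_N`, `CA_C`, `CARGO`), the `ARCHITECTURES` table, and both helper functions are hypothetical names introduced here for illustration; they stand for domains only and carry no sequence information from the disclosure.

```python
from typing import List

# Domain orders (N-terminus first) corresponding to the arrangements
# described above. Labels are placeholders, not sequences.
ARCHITECTURES: List[List[str]] = [
    ["MA", "CA_N", "CA_C", "CARGO"],
    ["MA", "CARGO", "CA_N", "CA_C"],
    ["CARGO", "MA", "CA_N", "CA_C"],
    ["CA_N", "CA_C", "CARGO"],
    ["CARGO", "CA_N", "CA_C"],
    ["CA_N", "CARGO"],
    ["CARGO", "CA_N"],
]


def has_required_domains(order: List[str]) -> bool:
    """True if the order contains exactly one CA N-lobe and one cargo binding domain."""
    return order.count("CA_N") == 1 and order.count("CARGO") == 1


def describe(order: List[str]) -> str:
    """Render a domain order as an N-to-C string, e.g. 'N-CA_N-CARGO-C'."""
    return "N-" + "-".join(order) + "-C"
```

Such a representation only records the order of domains; whether a given order impedes capsid assembly or cargo binding is an experimental question.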
[0061] In some embodiments, the Arc or endo-Gag polypeptide is engineered to comprise a cargo binding domain, a CA domain, a MA domain, or a RT domain from one or more additional species to generate an engineered Arc polypeptide. For example, the engineered Arc or endo-Gag polypeptide comprises a cargo binding domain, a CA domain, a MA domain, or a RT domain from a first species and a cargo binding domain, a CA domain, a MA domain, or a RT domain from a second species. In some cases, the first species is selected from a eukaryote, a vertebrate, a human, a mammal, a rodent, a bird, a reptile, a fish, an insect, a fungus, or a plant. In some cases, the second species is selected from a eukaryote, a vertebrate, a human, a mammal, a rodent, a bird, a reptile, a fish, an insect, a fungus, or a plant that is different from the first species.
[0062] In some instances, the engineered Arc or endo-Gag polypeptide comprises a cargo binding domain from a first species and a CA domain (e.g., a CA N-lobe and optionally a CA C-lobe) from a second species. The engineered Arc or endo-Gag polypeptide optionally comprises a MA domain and an RT domain from either the first species or the second species. In some cases, the first species is selected from a eukaryote, a vertebrate, a human, a mammal, a rodent, a bird, a reptile, a fish, an insect, a fungus, or a plant. In some cases, the second species is selected from a eukaryote, a vertebrate, a human, a mammal, a rodent, a bird, a reptile, a fish, an insect, a fungus, or a plant that is different from the first species.
[0063] In some instances, the engineered Arc or endo-Gag polypeptide comprises a cargo binding domain, a first CA domain, a second CA domain, and optionally a MA domain and/or a RT domain. In some cases, the cargo binding domain, the first CA domain, and optionally a MA domain and/or a RT domain are from a first species and the second CA domain is from a second species. In some cases, the first CA domain is from a first species and the cargo binding domain, the second CA domain, and optionally a MA domain and/or a RT domain are from a second species. In some instances, the domains are arranged in an order that does not impede capsid assembly and cargo binding. In some instances, each of the domains is either directly or indirectly fused to the respective two adjacent domains.
[0064] In some instances, the engineered Arc or endo-Gag polypeptide comprises a cargo binding domain, a first CA domain, and a second CA domain. In some cases, the cargo binding domain and the first CA domain are from a first species and the second CA domain is from a second species. In some cases, the first CA domain is from a first species and the cargo binding domain and the second CA domain are from a second species. In such cases, the engineered Arc or endo-Gag polypeptide comprises from the N-terminus to the C-terminus the following domains: a cargo binding domain, a first CA domain, and a second CA domain. In such cases, the engineered Arc or endo-Gag polypeptide comprises from the N-terminus to the C-terminus the following domains: a first CA domain, a cargo binding domain, and a second CA domain. In such cases, the engineered Arc or endo-Gag polypeptide comprises from the N-terminus to the C-terminus the following domains: a first CA domain, a second CA domain, and a cargo binding domain. In some instances, the domains are arranged in an order that does not impede capsid assembly and cargo binding. In some instances, each of the domains is either directly or indirectly fused to the respective two flanking domains.
[0065] In some instances, the engineered Arc or endo-Gag polypeptide further comprises a second polypeptide. In some instances, the second polypeptide is fused directly or indirectly via a linker to one or more of: a cargo binding domain, a first CA domain, a second CA domain, a MA domain if present, or a RT domain if present. In some cases, the second polypeptide is a protein (e.g., a human protein), an antibody or binding fragment thereof, a viral protein, a Gag-like protein (e.g., a human Gag-like protein), or a de novo engineered protein designed to bind to a target receptor of interest. In some instances, the antibody or binding fragment thereof comprises a humanized antibody or binding fragment thereof, a murine antibody or binding fragment thereof, a chimeric antibody or binding fragment thereof, a monoclonal antibody or binding fragment thereof, a multi-specific antibody or binding fragment thereof, a bispecific antibody or binding fragment thereof, a monovalent Fab', a divalent Fab₂, F(ab)'₃ fragments, a single-chain variable fragment (scFv), a bis-scFv, an (scFv)₂, a diabody, a minibody, a nanobody, a triabody, a tetrabody, a disulfide stabilized Fv protein (dsFv), a single-domain antibody (sdAb), an Ig NAR, a camelid antibody or binding fragment thereof, or a chemically modified derivative thereof. In some instances, the second polypeptide guides the delivery of a capsid formed by the engineered Arc polypeptide to a target site of interest.
[0066] In some embodiments, a nucleic acid sequence or amino acid sequence of the disclosure (for example, encoding an Arc polypeptide or endo-Gag polypeptide) has at least 70% homology, at least 71% homology, at least 72% homology, at least 73% homology, at least 74% homology, at least 75% homology, at least 76% homology, at least 77% homology, at least 78% homology, at least 79% homology, at least 80% homology, at least 81% homology, at least 82% homology, at least 83% homology, at least 84% homology, at least 85% homology, at least 86% homology, at least 87% homology, at least 88% homology, at least 89% homology, at least 90% homology, at least 91% homology, at least 92% homology, at least 93% homology, at least 94% homology, at least 95% homology, at least 96% homology, at least 97% homology, at least 98% homology, at least 99% homology, at least 99.1% homology, at least 99.2% homology, at least 99.3% homology, at least 99.4% homology, at least 99.5% homology, at least 99.6% homology, at least 99.7% homology, at least 99.8% homology, at least 99.9% homology, or at least 99.99% homology to an amino acid sequence provided herein. Various methods and software programs are used to determine the homology between two or more sequences, such as NCBI BLAST, Clustal W, MAFFT, Clustal Omega, AlignMe, Praline, or another suitable method or algorithm.
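By way of a hedged illustration, a percent identity value of the kind referred to above can be computed from two sequences that have already been aligned by one of the listed tools. The Python sketch below assumes gaps are written as `-` and uses one common convention (identical columns divided by the number of columns that are not gap-only); other tools may use a different denominator. The function names and the toy sequences in the usage note are hypothetical and are not SEQ ID NOs of the disclosure.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity of two pre-aligned sequences of equal length.

    Assumed convention: matches / columns, where columns in which both
    sequences carry a gap ('-') are ignored.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b)
               if not (a == "-" and b == "-")]
    if not columns:
        raise ValueError("alignment has no informative columns")
    # A column counts as a match only if both residues are identical non-gaps.
    matches = sum(1 for a, b in columns if a == b and a != "-")
    return 100.0 * matches / len(columns)


def meets_threshold(aligned_a: str, aligned_b: str, threshold: float = 90.0) -> bool:
    """True if the percent identity is at least `threshold` (default 90%)."""
    return percent_identity(aligned_a, aligned_b) >= threshold
```

For instance, two ten-residue placeholder strings that differ at a single position give 90% identity under this convention and therefore satisfy an "at least 90%" criterion.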
[0067] In certain embodiments, the Arc polypeptide is a human polypeptide having the amino acid sequence of SEQ ID NO: 1 or a sequence having at least 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identity to SEQ ID NO: 1.
[0068] In certain embodiments, the Arc polypeptide is a killer whale polypeptide having the amino acid sequence of SEQ ID NO: 2 or a sequence having at least 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identity to SEQ ID NO: 2.
[0069] In certain embodiments, the Arc polypeptide is a white tailed deer polypeptide having the amino acid sequence of SEQ ID NO: 3 or a sequence having at least 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identity to SEQ ID NO: 3.
[0070] In certain embodiments, the Arc polypeptide is a platypus polypeptide having the amino acid sequence of SEQ ID NO: 4 or a sequence having at least 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identity to SEQ ID NO: 4.
[0071] In certain embodiments, the Arc polypeptide is a goose polypeptide having the amino acid sequence of SEQ ID NO: 5 or a sequence having at least 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identity to SEQ ID NO: 5.
[0072] In certain embodiments, the Arc polypeptide is a Dalmatian pelican polypeptide having the amino acid sequence of SEQ ID NO: 6 or a sequence having at least 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identity to SEQ ID NO: 6.
[0073] In certain embodiments, the Arc polypeptide is a white tailed eagle polypeptide having the amino acid sequence of SEQ ID NO: 7 or a sequence having at least 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identity to SEQ ID NO: 7.
[0074] In certain embodiments, the Arc polypeptide is a king cobra polypeptide having the amino acid sequence of SEQ ID NO: 8 or a sequence having at least 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identity to SEQ ID NO: 8.
[0075] In certain embodiments, the Arc polypeptide is a ray finned fish polypeptide having the amino acid sequence of SEQ ID NO: 9 or a sequence having at least 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identity to SEQ ID NO: 9.
[0076] In certain embodiments, the Arc polypeptide is a sperm whale polypeptide having the amino acid sequence of SEQ ID NO: 10 or a sequence having at least 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identity to SEQ ID NO: 10.
[0077] In certain embodiments, the Arc polypeptide is a turkey polypeptide having the amino acid sequence of SEQ ID NO: 11 or a sequence having at least 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identity to SEQ ID NO: 11.
[0078] In certain embodiments, the Arc polypeptide is a central bearded dragon polypeptide having the amino acid sequence of SEQ ID NO: 12 or a sequence having at least 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identity to SEQ ID NO: 12.
[0079] In certain embodiments, the Arc polypeptide is a Chinese alligator polypeptide having the amino acid sequence of SEQ ID NO: 13 or a sequence having at least 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identity to SEQ ID NO: 13.
[0080] In certain embodiments, the Arc polypeptide is an American alligator polypeptide having the amino acid sequence of SEQ ID NO: 14 or a sequence having at least 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identity to SEQ ID NO: 14.
[0081] In certain embodiments, the Arc polypeptide is a Japanese gecko polypeptide having the amino acid sequence of SEQ ID NO: 15 or a sequence having at least 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identity to SEQ ID NO: 15.
[0082] In certain embodiments, the endo-Gag polypeptide is a human PNMA3 polypeptide having the amino acid sequence of SEQ ID NO: 16 or a sequence having at least 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identity to SEQ ID NO: 16.
[0083] In certain embodiments, the endo-Gag polypeptide is a human PNMA5 polypeptide having the amino acid sequence of SEQ ID NO: 17 or a sequence having at least 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identity to SEQ ID NO: 17.
[0084] In certain embodiments, the endo-Gag polypeptide is a human PNMA6A polypeptide having the amino acid sequence of SEQ ID NO: 18 or a sequence having at least 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identity to SEQ ID NO: 18.
[0085] In certain embodiments, the endo-Gag polypeptide is a human PNMA6B polypeptide having the amino acid sequence of SEQ ID NO: 19 or a sequence having at least 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identity to SEQ ID NO: 19.
[0086] In certain embodiments, the endo-Gag polypeptide is a human RTL3 polypeptide having the amino acid sequence of SEQ ID NO: 20 or a sequence having at least 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identity to SEQ ID NO: 20.
[0087] In certain embodiments, the endo-Gag polypeptide is a human RTL6 polypeptide having the amino acid sequence of SEQ ID NO: 21 or a sequence having at least 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identity to SEQ ID NO: 21.
[0088] In certain embodiments, the endo-Gag polypeptide is a human RTL8A polypeptide having the amino acid sequence of SEQ ID NO: 22 or a sequence having at least 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identity to SEQ ID NO: 22.
[0089] In certain embodiments, the endo-Gag polypeptide is a human RTL8B polypeptide having the amino acid sequence of SEQ ID NO: 23 or a sequence having at least 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identity to SEQ ID NO: 23.
[0090] In certain embodiments, the endo-Gag polypeptide is a human BOP polypeptide having the amino acid sequence of SEQ ID NO: 24 or a sequence having at least 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identity to SEQ ID NO: 24.
[0091] In certain embodiments, the endo-Gag polypeptide is a human LDOC1 polypeptide having the amino acid sequence of SEQ ID NO: 25 or a sequence having at least 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identity to SEQ ID NO: 25.
[0092] In certain embodiments, the endo-Gag polypeptide is a human ZNF18 polypeptide having the amino acid sequence of SEQ ID NO: 26 or a sequence having at least 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identity to SEQ ID NO: 26.
[0093] In certain embodiments, the endo-Gag polypeptide is a human MOAP1 polypeptide having the amino acid sequence of SEQ ID NO: 27 or a sequence having at least 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identity to SEQ ID NO: 27.
[0094] In certain embodiments, the endo-Gag polypeptide is a human PEG10 polypeptide having the amino acid sequence of SEQ ID NO: 28 or a sequence having at least 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identity to SEQ ID NO: 28.
[0095] In some cases, the recombinant Arc or endo-Gag polypeptide is an Arc polypeptide illustrated in FIG. 1.
[0096] In some cases, the engineered Arc or endo-Gag polypeptide is an engineered Arc polypeptide illustrated in FIG. 2.
Linkers
[0097] In certain embodiments, a polypeptide of the disclosure comprises a linker. In some embodiments, the linker is a peptide linker. In some instances, the linker is a rigid linker. In other instances, the linker is a flexible linker. In some cases, the linker is a non-cleavable linker. In other cases, the linker is a cleavable linker. In additional cases, the linker comprises a linear structure, or a non-linear structure (e.g., a cyclic structure).
[0098] In certain embodiments, non-cleavable linkers comprise short peptides of varying lengths. Exemplary non-cleavable linkers include (EAAAK)n (SEQ ID NO: 70), or (EAAAR)n (SEQ ID NO: 71), where n is from 1 to 5, and up to 30 residues of glutamic acid-proline or lysine-proline repeats. In some embodiments, the non-cleavable linker comprises (GGGGS)n (SEQ ID NO: 72) or (GGGS)n (SEQ ID NO: 73), wherein n is 1 to 10; KESGSVSSEQLAQFRSLD (SEQ ID NO: 74); or EGKSSGSGSESKST (SEQ ID NO: 75). In some embodiments, the non-cleavable linker comprises a poly-Gly/Ala polymer.
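The repeat-unit linkers above follow a simple pattern, a short peptide unit repeated n times, which the following illustrative Python sketch makes concrete. The `LINKER_UNITS` table and the `repeat_linker` function are hypothetical names; the allowed ranges of n are those stated in the paragraph above.

```python
# Repeat units and the inclusive range of n stated for each.
LINKER_UNITS = {
    "EAAAK": (1, 5),
    "EAAAR": (1, 5),
    "GGGGS": (1, 10),
    "GGGS": (1, 10),
}


def repeat_linker(unit: str, n: int) -> str:
    """Return the linker sequence (unit)n, checking n against the stated range."""
    if unit not in LINKER_UNITS:
        raise KeyError(f"unknown linker unit: {unit}")
    low, high = LINKER_UNITS[unit]
    if not low <= n <= high:
        raise ValueError(f"n must be between {low} and {high} for {unit}")
    return unit * n
```

For example, `repeat_linker("GGGGS", 3)` yields a 15-residue flexible linker, while `repeat_linker("EAAAK", 6)` is rejected because n exceeds 5.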
[0099] In certain embodiments, the linker is a cleavable linker, e.g., an extracellular cleavable linker or an intracellular cleavable linker. In some instances, the linker is designed for cleavage in the presence of particular conditions or in a particular environment (e.g., under physiological condition). For example, the design of a linker for cleavage by specific conditions, such as by a specific enzyme, allows the targeting of cellular uptake to a specific location.
[0100] In some embodiments, the linker is a pH-sensitive linker. In one instance, the linker is cleaved under basic pH conditions. In another instance, the linker is cleaved under acidic pH conditions.
[0101] In some embodiments, the linker is cleaved in vivo by endogenous enzymes (e.g., proteases) such as serine proteases including but not limited to thrombin, metalloproteases, furin, cathepsin B, necrotic enzymes (e.g., calpains), and the like. Exemplary cleavable linkers include, but are not limited to, GGAANLVRGG (SEQ ID NO: 76); SGRIGFLRTA (SEQ ID NO: 77); SGRSA (SEQ ID NO: 78); GFLG (SEQ ID NO: 79); ALAL (SEQ ID NO: 80); FK; PIC(Et)F-F (SEQ ID NO: 81), where C(Et) indicates S-ethylcysteine; PR(S/T)(L/I)(S/T) (SEQ ID NO: 82); DEVD (SEQ ID NO: 83); GWEHDG (SEQ ID NO: 84); RPLALWRS (SEQ ID NO: 85); or a combination thereof.
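As an illustrative sketch only, some of the exemplary cleavable linker sequences above can be located in a polypeptide string with regular expressions, with the degenerate positions of PR(S/T)(L/I)(S/T) written as character classes. The `MOTIFS` table, the `find_motifs` function, and the test sequence in the usage note are hypothetical; the presence of a motif in a sequence does not by itself establish that cleavage occurs.

```python
import re
from typing import List, Tuple

# Linker notation from the text mapped to an equivalent regular expression.
MOTIFS = {
    "PR(S/T)(L/I)(S/T)": "PR[ST][LI][ST]",
    "DEVD": "DEVD",
    "GFLG": "GFLG",
    "ALAL": "ALAL",
}


def find_motifs(sequence: str) -> List[Tuple[str, int]]:
    """Return (motif name, 0-based start index) pairs, sorted by position."""
    hits = []
    for name, pattern in MOTIFS.items():
        for match in re.finditer(pattern, sequence):
            hits.append((name, match.start()))
    return sorted(hits, key=lambda hit: hit[1])
```

Applied to the placeholder string `"AAPRSLTGGDEVDKK"`, the function reports the degenerate motif at index 2 and DEVD at index 9.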
Capsids
[0102] In some embodiments, disclosed herein is a capsid. In some instances, the capsid comprises an Arc polypeptide and/or an endo-Gag polypeptide such as a Copia protein, ASPRV1 protein, a protein from the SCAN domain family, a protein encoded by the Paraneoplastic Ma antigen family, a protein or a combination of proteins chosen from the retrotransposon Gag-like family, or a combination thereof. Exemplary endo-Gag polypeptides are BOP, LDOC1, MOAP1, PEG10, PNMA3, PNMA5, PNMA6A, PNMA6B, RTL3, RTL6, RTL8A, RTL8B, and ZNF18. In some instances, the Arc polypeptide, the Copia protein, the ASPRV1 protein, the protein from the SCAN domain family, the protein encoded by the Paraneoplastic Ma antigen family, and the protein or a combination of proteins chosen from the retrotransposon Gag-like family are each independently a full-length polypeptide. In other instances, the Arc polypeptide, the Copia protein, the ASPRV1 protein, the protein from the SCAN domain family, the protein encoded by the Paraneoplastic Ma antigen family, and the protein or a combination of proteins chosen from the retrotransposon Gag-like family are each independently a functional fragment thereof, e.g., that is capable of forming a subunit of a capsid.
Arc-Based Capsids and Endo-Gag-Based Capsids
[0103] In some embodiments, the capsid comprises an Arc-based capsid. In some embodiments, the capsid comprises an endo-Gag-based capsid. In some instances, the Arc-based and/or endo-Gag-based capsid comprises a plurality of recombinant Arc polypeptides and/or endo-Gag polypeptides described above, a plurality of engineered Arc polypeptides and/or endo-Gag polypeptides described above, or a combination thereof. In some cases, the Arc-based capsid comprises a plurality of recombinant Arc polypeptides. In other cases, the Arc-based capsid comprises a plurality of engineered Arc polypeptides. In some cases, the endo-Gag-based capsid comprises a plurality of recombinant endo-Gag polypeptides. In other cases, the endo-Gag-based capsid comprises a plurality of engineered endo-Gag polypeptides.
[0104] In some embodiments, the Arc-based or endo-Gag-based capsid comprises a first plurality of Arc and/or endo-Gag polypeptides from a first species and a second plurality of Arc and/or endo-Gag polypeptides from at least a second species. In some cases, the first species is selected from a eukaryote, a vertebrate, a human, a mammal, a rodent, a bird, a reptile, a fish, an insect, a fungus, or a plant. In some cases, the second species is selected from a eukaryote, a vertebrate, a human, a mammal, a rodent, a bird, a reptile, a fish, an insect, a fungus, or a plant that is different from the first species.
[0105] In some instances, the ratio of the first plurality of Arc or endo-Gag polypeptides to the second plurality of Arc or endo-Gag polypeptides is 1:1, 2:1, 3:1, 4:1, 5:1, 6:1, 7:1, 8:1, 9:1, 10:1, 20:1, 50:1, or 100:1. In some cases, the ratio of the first plurality of Arc or endo-Gag polypeptides to the second plurality of Arc or endo-Gag polypeptides is 1:1. In some cases, the ratio of the first plurality of Arc or endo-Gag polypeptides to the second plurality of Arc or endo-Gag polypeptides is 2:1. In some cases, the ratio of the first plurality of Arc or endo-Gag polypeptides to the second plurality of Arc or endo-Gag polypeptides is 4:1. In some cases, the ratio of the first plurality of Arc or endo-Gag polypeptides to the second plurality of Arc or endo-Gag polypeptides is 5:1. In some cases, the ratio of the first plurality of Arc or endo-Gag polypeptides to the second plurality of Arc or endo-Gag polypeptides is 8:1. In some cases, the ratio of the first plurality of Arc or endo-Gag polypeptides to the second plurality of Arc or endo-Gag polypeptides is 10:1. In some cases, the ratio of the first plurality of Arc or endo-Gag polypeptides to the second plurality of Arc or endo-Gag polypeptides is 20:1. In some cases, the ratio of the first plurality of Arc or endo-Gag polypeptides to the second plurality of Arc or endo-Gag polypeptides is 50:1. In some cases, the ratio of the first plurality of Arc or endo-Gag polypeptides to the second plurality of Arc or endo-Gag polypeptides is 100:1. In some instances, the ratio is the comparison in molar concentration. In some instances, the ratio is the comparison in the number of capsid forming subunits (e.g., each of the recombinant or engineered Arc or endo-Gag polypeptides forms a capsid subunit).
[0106] In some instances, the ratio of the first plurality of Arc or endo-Gag polypeptides to the second plurality of Arc or endo-Gag polypeptides is 1:2, 1:3, 1:4, 1:5, 1:6, 1:7, 1:8, 1:9, 1:10, 1:20, or 1:50. In some cases, the ratio of the first plurality of Arc or endo-Gag polypeptides to the second plurality of Arc or endo-Gag polypeptides is 1:2. In some cases, the ratio of the first plurality of Arc or endo-Gag polypeptides to the second plurality of Arc or endo-Gag polypeptides is 1:5. In some cases, the ratio of the first plurality of Arc or endo-Gag polypeptides to the second plurality of Arc or endo-Gag polypeptides is 1:8. In some cases, the ratio of the first plurality of Arc or endo-Gag polypeptides to the second plurality of Arc or endo-Gag polypeptides is 1:10. In some cases, the ratio of the first plurality of Arc or endo-Gag polypeptides to the second plurality of Arc or endo-Gag polypeptides is 1:20. In some cases, the ratio of the first plurality of Arc or endo-Gag polypeptides to the second plurality of Arc or endo-Gag polypeptides is 1:50. In some instances, the ratio is the comparison in molar concentration. In some instances, the ratio is the comparison in the number of capsid forming subunits (e.g., each of the recombinant or engineered Arc or endo-Gag polypeptides forms a capsid subunit).
[0107] In some embodiments, the Arc-based capsid or endo-Gag-based capsid comprises a plurality of recombinant or engineered Arc polypeptides and a plurality of non-Arc proteins. Exemplary species of non-Arc proteins include but are not limited to, Copia, ASPRV1, a protein or a combination of proteins chosen from the SCAN domain family, a protein or a combination of proteins chosen from the Paraneoplastic Ma antigen family, and a protein or a combination of proteins chosen from the retrotransposon Gag-like family. Exemplary species of non-Arc proteins include BOP, LDOC1, MOAP1, PEG10, PNMA3, PNMA5, PNMA6A, PNMA6B, RTL3, RTL6, RTL8A, RTL8B, and ZNF18.
[0108] In some instances, the ratio of the plurality of recombinant or engineered Arc polypeptides to the plurality of non-Arc proteins is 1:1, 2:1, 3:1, 4:1, 5:1, 6:1, 7:1, 8:1, 9:1, 10:1, 20:1, 50:1, or 100:1. In some cases, the ratio of the plurality of recombinant or engineered Arc polypeptides to the plurality of non-Arc proteins is 1:1. In some cases, the ratio of the plurality of recombinant or engineered Arc polypeptides to the plurality of non-Arc proteins is 2:1. In some cases, the ratio of the plurality of recombinant or engineered Arc polypeptides to the plurality of non-Arc proteins is 4:1. In some cases, the ratio of the plurality of recombinant or engineered Arc polypeptides to the plurality of non-Arc proteins is 5:1. In some cases, the ratio of the plurality of recombinant or engineered Arc polypeptides to the plurality of non-Arc proteins is 8:1. In some cases, the ratio of the plurality of recombinant or engineered Arc polypeptides to the plurality of non-Arc proteins is 10:1. In some cases, the ratio of the plurality of recombinant or engineered Arc polypeptides to the plurality of non-Arc proteins is 20:1. In some cases, the ratio of the plurality of recombinant or engineered Arc polypeptides to the plurality of non-Arc proteins is 50:1. In some cases, the ratio of the plurality of recombinant or engineered Arc polypeptides to the plurality of non-Arc proteins is 100:1. In some instances, the ratio is the comparison in molar concentration. In some instances, the ratio is the comparison in the number of capsid forming subunits (e.g., each of the recombinant or engineered Arc polypeptide forms a capsid subunit).
[0109] In some instances, the ratio of the plurality of recombinant or engineered Arc polypeptides to the plurality of non-Arc proteins is 1:2, 1:3, 1:4, 1:5, 1:6, 1:7, 1:8, 1:9, 1:10, 1:20, or 1:50. In some cases, the ratio of the plurality of recombinant or engineered Arc polypeptides to the plurality of non-Arc proteins is 1:2. In some cases, the ratio of the plurality of recombinant or engineered Arc polypeptides to the plurality of non-Arc proteins is 1:5. In some cases, the ratio of the plurality of recombinant or engineered Arc polypeptides to the plurality of non-Arc proteins is 1:8. In some cases, the ratio of the plurality of recombinant or engineered Arc polypeptides to the plurality of non-Arc proteins is 1:10. In some cases, the ratio of the plurality of recombinant or engineered Arc polypeptides to the plurality of non-Arc proteins is 1:20. In some cases, the ratio of the plurality of recombinant or engineered Arc polypeptides to the plurality of non-Arc proteins is 1:50. In some instances, the ratio is the comparison in molar concentration. In some instances, the ratio is the comparison in the number of capsid forming subunits (e.g., each of the recombinant or engineered Arc polypeptide forms a capsid subunit).
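As a rough illustration of the subunit-count reading of the ratios above, the following Python sketch splits a total number of capsid forming subunits between two pluralities at a given ratio. The function name `split_subunits` and the total of 240 subunits are hypothetical and chosen only for illustration; they are not taken from the disclosure, and rounding to whole subunits is likewise an assumption.

```python
from fractions import Fraction


def split_subunits(total_subunits, first, second):
    """Split a total subunit count between two pluralities at a first:second ratio.

    All arguments are integers. The total is a hypothetical, illustrative value.
    Returns (subunits from the first plurality, subunits from the second plurality).
    """
    if total_subunits < 0 or first <= 0 or second <= 0:
        raise ValueError("total must be non-negative and ratio terms positive")
    # Exact fraction of subunits contributed by the first plurality.
    share_first = Fraction(first, first + second)
    # Round to a whole number of subunits; the remainder goes to the second plurality.
    n_first = round(total_subunits * share_first)
    return n_first, total_subunits - n_first


# Hypothetical capsid of 240 subunits at a 2:1 ratio and at a 1:5 ratio.
print(split_subunits(240, 2, 1))  # (160, 80)
print(split_subunits(240, 1, 5))  # (40, 200)
```

When the ratio is instead read as a comparison in molar concentration, the same proportions would apply to the concentrations of the two polypeptide populations rather than to subunit counts.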
[0110] In some embodiments, the capsid has a diameter of at least 1 nm, or more. In some instances, the capsid has a diameter of at least 2 nm, 3 nm, 4 nm, 5 nm, 10 nm, 15 nm, 20 nm, 25 nm, 30 nm, 40 nm, 50 nm, 60 nm, 70 nm, 80 nm, 90 nm, 100 nm, 150 nm, 200 nm, 300 nm, 400 nm, 500 nm, 600 nm, or more. In some instances, the capsid has a diameter of at least 5 nm, or more. In some cases, the capsid has a diameter of at least 10 nm, or more. In some instances, the capsid has a diameter of at least 20 nm, or more. In some cases, the capsid has a diameter of at least 30 nm, or more. In some cases, the capsid has a diameter of at least 40 nm, or more. In some cases, the capsid has a diameter of at least 50 nm, or more. In some cases, the capsid has a diameter of at least 80 nm, or more. In some cases, the capsid has a diameter of at least 100 nm, or more. In some cases, the capsid has a diameter of at least 200 nm, or more. In some cases, the capsid has a diameter of at least 300 nm, or more. In some cases, the capsid has a diameter of at least 400 nm, or more. In some cases, the capsid has a diameter of at least 500 nm, or more. In some cases, the capsid has a diameter of at least 600 nm, or more.
[0111] In some embodiments, the capsid has a diameter of at most 1 nm, or less. In some instances, the capsid has a diameter of at most 2 nm, 3 nm, 4 nm, 5 nm, 10 nm, 15 nm, 20 nm, 25 nm, 30 nm, 40 nm, 50 nm, 60 nm, 70 nm, 80 nm, 90 nm, 100 nm, 150 nm, 200 nm, 300 nm, 400 nm, 500 nm, 600 nm, or less. In some instances, the capsid has a diameter of at most 5 nm, or less. In some cases, the capsid has a diameter of at most 10 nm, or less. In some instances, the capsid has a diameter of at most 20 nm, or less. In some cases, the capsid has a diameter of at most 30 nm, or less. In some cases, the capsid has a diameter of at most 40 nm, or less. In some cases, the capsid has a diameter of at most 50 nm, or less. In some cases, the capsid has a diameter of at most 80 nm, or less. In some cases, the capsid has a diameter of at most 100 nm, or less. In some cases, the capsid has a diameter of at most 200 nm, or less. In some cases, the capsid has a diameter of at most 300 nm, or less. In some cases, the capsid has a diameter of at most 400 nm, or less. In some cases, the capsid has a diameter of at most 500 nm, or less. In some cases, the capsid has a diameter of at most 600 nm, or less.
[0112] In some embodiments, the capsid has a diameter of about 1 nm, 2 nm, 3 nm, 4 nm, 5 nm, 10 nm, 15 nm, 20 nm, 25 nm, 30 nm, 40 nm, 50 nm, 60 nm, 70 nm, 80 nm, 90 nm, 100 nm, 150 nm, 200 nm, 300 nm, 400 nm, 500 nm, or 600 nm. In some instances, the capsid has a diameter of about 5 nm. In some cases, the capsid has a diameter of about 10 nm. In some instances, the capsid has a diameter of about 20 nm. In some cases, the capsid has a diameter of about 30 nm. In some cases, the capsid has a diameter of about 40 nm. In some cases, the capsid has a diameter of about 50 nm. In some cases, the capsid has a diameter of about 80 nm. In some cases, the capsid has a diameter of about 100 nm. In some cases, the capsid has a diameter of about 200 nm. In some cases, the capsid has a diameter of about 300 nm. In some cases, the capsid has a diameter of about 400 nm. In some cases, the capsid has a diameter of about 500 nm. In some cases, the capsid has a diameter of about 600 nm.
[0113] In some embodiments, the capsid has a diameter of from about 1 nm to about 600 nm. In some instances, the capsid has a diameter of from about 2 nm to about 500 nm, from about 2 nm to about 400 nm, from about 2 nm to about 300 nm, from about 2 nm to about 200 nm, from about 2 nm to about 100 nm, from about 2 nm to about 50 nm, from about 2 nm to about 30 nm, from about 20 nm to about 400 nm, from about 20 nm to about 300 nm, from about 20 nm to about 200 nm, from about 20 nm to about 100 nm, from about 20 nm to about 50 nm, from about 20 nm to about 30 nm, from about 30 nm to about 500 nm, from about 30 nm to about 400 nm, from about 30 nm to about 300 nm, from about 30 nm to about 200 nm, from about 30 nm to about 100 nm, from about 30 nm to about 50 nm, from about 50 nm to about 300 nm, from about 50 nm to about 200 nm, from about 50 nm to about 100 nm, from about 2 nm to about 25 nm, from about 2 nm to about 20 nm, from about 2 nm to about 10 nm, from about 5 nm to about 25 nm, from about 5 nm to about 20 nm, from about 5 nm to about 10 nm, from about 10 nm to about 25 nm, or from about 10 nm to about 20 nm.
[0114] In some embodiments, the capsid has a reduced off-target effect. In some cases, the off-target effect is less than 10%, 5%, 4%, 3%, 2%, 1%, or 0.5%. In some cases, the off-target effect is no more than 10%, 5%, 4%, 3%, 2%, 1%, or 0.5%.
[0115] In some cases, the capsid does not have an off-target effect.
[0116] In certain embodiments, the formation of Arc and/or endo-Gag-based capsids occurs either ex vivo or in vitro.
[0117] In some instances, the Arc and/or endo-Gag-based capsid is assembled in vivo.
[0118] In some instances, the Arc and/or endo-Gag-based capsid is stable at room temperature. In some cases, the Arc and/or endo-Gag-based capsid is empty. In other cases, the Arc and/or endo-Gag-based capsid is loaded (for example, loaded with a cargo and/or a therapeutic agent, e.g., a DNA or an RNA).
[0119] In some instances, the Arc and/or endo-Gag-based capsid is stable at a temperature from about 2 °C to about 37 °C. In some instances, the Arc and/or endo-Gag-based capsid is stable at a temperature from about 2 °C to about 8 °C, about 2 °C to about 4 °C, about 20 °C to about 37 °C, about 25 °C to about 37 °C, about 20 °C to about 30 °C, about 25 °C to about 30 °C, or about 30 °C to about 37 °C. In some cases, the Arc and/or endo-Gag-based capsid is empty. In other cases, the Arc and/or endo-Gag-based capsid is loaded (for example, loaded with a cargo and/or a therapeutic agent, e.g., a DNA or an RNA).
[0120] In some instances, the Arc and/or endo-Gag-based capsid is stable for at least about 1 day, 2 days, 4 days, 5 days, 7 days, 14 days, 28 days, 30 days, 60 days, 2 months, 3 months, 4 months, 5 months, 6 months, 12 months, 18 months, 24 months, 3 years, 5 years, or longer. In some cases, the Arc and/or endo-Gag-based capsids have minimum degradation, e.g., less than 10%, 5%, 4%, 3%, 2%, 1%, or 0.5% of the total population of the Arc and/or endo-Gag-based capsids is degraded. In some cases, the Arc and/or endo-Gag-based capsid is empty. In other cases, the Arc and/or endo-Gag-based capsid is loaded (for example, loaded with a therapeutic agent, e.g., a DNA or an RNA).
Additional Capsids
[0121] In some embodiments, the capsid comprises the Copia protein. In some instances, the Copia protein is from Drosophila melanogaster (UniProtKB--P04146), Ceratitis capitata (UniProtKB--W8BHY5), or Drosophila simulans (UniProtKB--Q08461).
[0122] In some embodiments, the capsid comprises the protein ASPRV1. The ASPRV1 protein is a structural protein that participates in the development and maintenance of the skin barrier. In some instances, the protein ASPRV1 is from Homo sapiens (UniProtKB--Q53RT3).
[0123] In some embodiments, the capsid comprises a protein from the SCAN domain family. The SCAN domain family is a superfamily of zinc finger transcription factors. The SCAN domain is also known as the leucine-rich region (LeR) and functions as a protein interaction domain that mediates self-association or selective association with other proteins.
[0124] In some embodiments, the capsid comprises a protein from the Paraneoplastic Ma antigen family. The Paraneoplastic Ma antigen family comprises about 14 members of neuro- and testis-specific proteins.
[0125] In some embodiments, the capsid comprises a protein encoded by a Retrotransposon Gag-like gene.
[0126] In some embodiments, the capsid comprises BOP, LDOC1, MOAP1, PEG10, PNMA3, PNMA5, PNMA6A, PNMA6B, RTL3, RTL6, RTL8A, RTL8B, and/or ZNF18.
Cargos
[0127] In some embodiments, a composition of the disclosure (for example, a capsid) comprises a cargo. In some embodiments, the cargo is a therapeutic agent. In some embodiments, the cargo is a nucleic acid molecule, a small molecule, a protein, a peptide, an antibody or binding fragment thereof, a peptidomimetic, or a nucleotidomimetic. In some instances, the cargo is a therapeutic cargo comprising, e.g., one or more drugs. In some instances, the cargo comprises a diagnostic tool for profiling, e.g., one or more markers (such as markers associated with one or more disease phenotypes). In additional instances, the cargo comprises an imaging tool.
[0128] In some instances, the cargo is a nucleic acid molecule. Exemplary nucleic acid molecules include DNA, RNA, or a mixture of DNA and RNA. In some instances, the nucleic acid molecule is a DNA polymer. In some cases, the DNA is a single stranded DNA polymer. In other cases, the DNA is a double stranded DNA polymer. In additional cases, the DNA is a hybrid of single and double stranded DNA polymer.
[0129] In some embodiments, the nucleic acid molecule is an RNA polymer, e.g., a single stranded RNA polymer, a double stranded RNA polymer, or a hybrid of single and double stranded RNA polymers. In some instances, the RNA comprises and/or encodes an antisense oligoribonucleotide, an siRNA, an mRNA, a tRNA, an rRNA, an snRNA, an shRNA, a microRNA, or a non-coding RNA.
[0130] In some embodiments, the nucleic acid molecule comprises a hybrid of DNA and RNA.
[0131] In some embodiments, the nucleic acid molecule is an antisense oligonucleotide, optionally comprising DNA, RNA, or a hybrid of DNA and RNA.
[0132] In some instances, the nucleic acid molecule comprises and/or encodes an mRNA molecule.
[0133] In some embodiments, the nucleic acid molecule comprises and/or encodes an RNAi molecule. In some cases, the RNAi molecule is a microRNA (miRNA) molecule. In other cases, the RNAi molecule is a siRNA molecule. The miRNA and/or siRNA are optionally double-stranded or in the form of a hairpin, and are further optionally encapsulated as precursor molecules.
[0134] In some embodiments, the nucleic acid molecule is for use in a nucleic acid-based therapy. In some instances, the nucleic acid molecule is for regulating gene expression (e.g., modulating mRNA translation or degradation), modulating RNA splicing, or RNA interference. In some cases, the nucleic acid molecule comprises and/or encodes an antisense oligonucleotide, microRNA molecule, siRNA molecule, mRNA molecule, for use in regulation of gene expression, modulating RNA splicing, or RNA interference.
[0135] In some instances, the nucleic acid molecule is for use in gene editing. Exemplary gene editing systems include, but are not limited to, CRISPR-Cas systems, zinc finger nuclease (ZFN) systems, and transcription activator-like effector nuclease (TALEN) systems. In some cases, the nucleic acid molecule comprises and/or encodes a component involved in the CRISPR-Cas systems, ZFN systems, or the TALEN systems.
[0136] In some cases, the nucleic acid molecule is for use in antigen production for therapeutic and/or prophylactic vaccine production. For example, the nucleic acid molecule encodes an antigen that is expressed and elicits a desirable immune response (e.g., a pro-inflammatory immune response, an anti-inflammatory immune response, a B cell response, an antibody response, a T cell response, a CD4+ T cell response, a CD8+ T cell response, a Th1 immune response, a Th2 immune response, a Th17 immune response, a Treg immune response, or a combination thereof).
[0137] In some cases, the nucleic acid molecule comprises a nucleic acid enzyme. Nucleic acid enzymes are RNA molecules (e.g., ribozymes) or DNA molecules (e.g., deoxyribozymes) that have catalytic activities. In some instances, the nucleic acid molecule is a ribozyme. In other instances, the nucleic acid molecule is a deoxyribozyme. In some cases, the nucleic acid molecule is an MNAzyme, which functions as a biosensor and/or a molecular switch (see, e.g., Mokany, et al., "MNAzymes, a versatile new class of nucleic acid enzymes that can function as biosensors and molecular switches," JACS 132(2): 1051-1059 (2010)).
[0138] In some instances, exemplary targets of the nucleic acid molecule include, but are not limited to, UL123 (human cytomegalovirus), APOB, AR (androgen receptor) gene, KRAS, PCSK9, CFTR, and SMN (e.g., SMN2).
[0139] In some embodiments, the nucleic acid molecule is at least 5 nucleotides or more in length. In some instances, the nucleic acid molecule is at least 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 35, 40, 50, 60, 70, 80, 90, 100, 150, 200, 250, 300, 400, 500, 1000, 1500, 2000, 3000, 4000, 5000, 6000, 7000, 8000, 9000 nucleotides or more in length. In some instances, the nucleic acid molecule is at least 10 nucleotides or more in length. In some instances, the nucleic acid molecule is at least 12 nucleotides or more in length. In some instances, the nucleic acid molecule is at least 15 nucleotides or more in length. In some instances, the nucleic acid molecule is at least 18 nucleotides or more in length. In some instances, the nucleic acid molecule is at least 19 nucleotides or more in length. In some instances, the nucleic acid molecule is at least 20 nucleotides or more in length. In some instances, the nucleic acid molecule is at least 21 nucleotides or more in length. In some instances, the nucleic acid molecule is at least 22 nucleotides or more in length. In some instances, the nucleic acid molecule is at least 23 nucleotides or more in length. In some instances, the nucleic acid molecule is at least 24 nucleotides or more in length. In some instances, the nucleic acid molecule is at least 25 nucleotides or more in length. In some instances, the nucleic acid molecule is at least 26 nucleotides or more in length. In some instances, the nucleic acid molecule is at least 27 nucleotides or more in length. In some instances, the nucleic acid molecule is at least 28 nucleotides or more in length. In some instances, the nucleic acid molecule is at least 29 nucleotides or more in length. In some instances, the nucleic acid molecule is at least 30 nucleotides or more in length. In some instances, the nucleic acid molecule is at least 40 nucleotides or more in length. 
In some instances, the nucleic acid molecule is at least 50 nucleotides or more in length. In some instances, the nucleic acid molecule is at least 100 nucleotides or more in length. In some instances, the nucleic acid molecule is at least 200 nucleotides or more in length. In some instances, the nucleic acid molecule is at least 300 nucleotides or more in length. In some instances, the nucleic acid molecule is at least 500 nucleotides or more in length. In some instances, the nucleic acid molecule is at least 1000 nucleotides or more in length. In some instances, the nucleic acid molecule is at least 2000 nucleotides or more in length. In some instances, the nucleic acid molecule is at least 3000 nucleotides or more in length. In some instances, the nucleic acid molecule is at least 4000 nucleotides or more in length. In some instances, the nucleic acid molecule is at least 5000 nucleotides or more in length. In some instances, the nucleic acid molecule is at least 8000 nucleotides or more in length.
[0140] In some embodiments, the nucleic acid molecule is at most 12 nucleotides or less in length. In some instances, the nucleic acid molecule is at most 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 35, 40, 50, 60, 70, 80, 90, 100, 150, 200, 250, 300, 400, 500, 1000, 1500, 2000, 3000, 4000, 5000, 6000, 7000, 8000, 9000 nucleotides or less in length. In some instances, the nucleic acid molecule is at most 15 nucleotides or less in length. In some instances, the nucleic acid molecule is at most 18 nucleotides or less in length. In some instances, the nucleic acid molecule is at most 19 nucleotides or less in length. In some instances, the nucleic acid molecule is at most 20 nucleotides or less in length. In some instances, the nucleic acid molecule is at most 21 nucleotides or less in length. In some instances, the nucleic acid molecule is at most 22 nucleotides or less in length. In some instances, the nucleic acid molecule is at most 23 nucleotides or less in length. In some instances, the nucleic acid molecule is at most 24 nucleotides or less in length. In some instances, the nucleic acid molecule is at most 25 nucleotides or less in length. In some instances, the nucleic acid molecule is at most 26 nucleotides or less in length. In some instances, the nucleic acid molecule is at most 27 nucleotides or less in length. In some instances, the nucleic acid molecule is at most 28 nucleotides or less in length. In some instances, the nucleic acid molecule is at most 29 nucleotides or less in length. In some instances, the nucleic acid molecule is at most 30 nucleotides or less in length. In some instances, the nucleic acid molecule is at most 40 nucleotides or less in length. In some instances, the nucleic acid molecule is at most 50 nucleotides or less in length. In some instances, the nucleic acid molecule is at most 100 nucleotides or less in length. 
In some instances, the nucleic acid molecule is at most 200 nucleotides or less in length. In some instances, the nucleic acid molecule is at most 300 nucleotides or less in length. In some instances, the nucleic acid molecule is at most 500 nucleotides or less in length. In some instances, the nucleic acid molecule is at most 1000 nucleotides or less in length. In some instances, the nucleic acid molecule is at most 2000 nucleotides or less in length. In some instances, the nucleic acid molecule is at most 3000 nucleotides or less in length. In some instances, the nucleic acid molecule is at most 4000 nucleotides or less in length. In some instances, the nucleic acid molecule is at most 5000 nucleotides or less in length. In some instances, the nucleic acid molecule is at most 8000 nucleotides or less in length.
[0141] In some embodiments, the nucleic acid molecule is about 5 nucleotides in length. In some instances, the nucleic acid molecule is about 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 35, 40, 50, 60, 70, 80, 90, 100, 150, 200, 250, 300, 400, 500, 1000, 1500, 2000, 3000, 4000, 5000, 6000, 7000, 8000, 9000 nucleotides in length. In some instances, the nucleic acid molecule is about 10 nucleotides in length. In some instances, the nucleic acid molecule is about 12 nucleotides in length. In some instances, the nucleic acid molecule is about 15 nucleotides in length. In some instances, the nucleic acid molecule is about 18 nucleotides in length. In some instances, the nucleic acid molecule is about 19 nucleotides in length. In some instances, the nucleic acid molecule is about 20 nucleotides in length. In some instances, the nucleic acid molecule is about 21 nucleotides in length. In some instances, the nucleic acid molecule is about 22 nucleotides in length. In some instances, the nucleic acid molecule is about 23 nucleotides in length. In some instances, the nucleic acid molecule is about 24 nucleotides in length. In some instances, the nucleic acid molecule is about 25 nucleotides in length. In some instances, the nucleic acid molecule is about 26 nucleotides in length. In some instances, the nucleic acid molecule is about 27 nucleotides in length. In some instances, the nucleic acid molecule is about 28 nucleotides in length. In some instances, the nucleic acid molecule is about 29 nucleotides in length. In some instances, the nucleic acid molecule is about 30 nucleotides in length. In some instances, the nucleic acid molecule is about 40 nucleotides in length. In some instances, the nucleic acid molecule is about 50 nucleotides in length. In some instances, the nucleic acid molecule is about 100 nucleotides in length. In some instances, the nucleic acid molecule is about 200 nucleotides in length. 
In some instances, the nucleic acid molecule is about 300 nucleotides in length. In some instances, the nucleic acid molecule is about 500 nucleotides in length. In some instances, the nucleic acid molecule is about 1000 nucleotides in length. In some instances, the nucleic acid molecule is about 2000 nucleotides in length. In some instances, the nucleic acid molecule is about 3000 nucleotides in length. In some instances, the nucleic acid molecule is about 4000 nucleotides in length. In some instances, the nucleic acid molecule is about 5000 nucleotides in length. In some instances, the nucleic acid molecule is about 8000 nucleotides in length.
[0142] In some embodiments, the nucleic acid molecule is from about 5 to about 10,000 nucleotides in length. In some instances, the nucleic acid molecule is from about 5 to about 9000 nucleotides in length, from about 5 to about 8000 nucleotides in length, from about 5 to about 7000 nucleotides in length, from about 5 to about 6000 nucleotides in length, from about 5 to about 5000 nucleotides in length, from about 5 to about 4000 nucleotides in length, from about 5 to about 3000 nucleotides in length, from about 5 to about 2000 nucleotides in length, from about 5 to about 1000 nucleotides in length, from about 5 to about 500 nucleotides in length, from about 5 to about 100 nucleotides in length, from about 5 to about 50 nucleotides in length, from about 5 to about 40 nucleotides in length, from about 5 to about 30 nucleotides in length, from about 5 to about 25 nucleotides in length, from about 5 to about 20 nucleotides in length, from about 10 to about 8000 nucleotides in length, from about 10 to about 7000 nucleotides in length, from about 10 to about 6000 nucleotides in length, from about 10 to about 5000 nucleotides in length, from about 10 to about 4000 nucleotides in length, from about 10 to about 3000 nucleotides in length, from about 10 to about 2000 nucleotides in length, from about 10 to about 1000 nucleotides in length, from about 10 to about 500 nucleotides in length, from about 10 to about 100 nucleotides in length, from about 10 to about 50 nucleotides in length, from about 10 to about 40 nucleotides in length, from about 10 to about 30 nucleotides in length, from about 10 to about 25 nucleotides in length, from about 10 to about 20 nucleotides in length, from about 18 to about 8000 nucleotides in length, from about 18 to about 7000 nucleotides in length, from about 18 to about 6000 nucleotides in length, from about 18 to about 5000 nucleotides in length, from about 18 to about 4000 nucleotides in length, from about 18 to about 3000 nucleotides in 
length, from about 18 to about 2000 nucleotides in length, from about 18 to about 1000 nucleotides in length, from about 18 to about 500 nucleotides in length, from about 18 to about 100 nucleotides in length, from about 18 to about 50 nucleotides in length, from about 18 to about 40 nucleotides in length, from about 18 to about 30 nucleotides in length, from about 18 to about 25 nucleotides in length, from about 12 to about 50 nucleotides in length, from about 20 to about 40 nucleotides in length, from about 20 to about 30 nucleotides in length, or from about 25 to about 30 nucleotides in length.
[0143] In some embodiments, the nucleic acid molecule comprises natural, synthetic, or artificial nucleotide analogues or bases. In some cases, the nucleic acid molecule comprises combinations of DNA, RNA and/or nucleotide analogues. In some instances, the synthetic or artificial nucleotide analogues or bases comprise modifications at one or more of ribose moiety, phosphate moiety, nucleoside moiety, or a combination thereof.
[0144] In some embodiments, a nucleotide analogue or artificial nucleotide base described above comprises a nucleic acid with a modification at a 2' hydroxyl group of the ribose moiety. In some instances, the modification includes an H, OR, R, halo, SH, SR, NH2, NHR, NR2, or CN, wherein R is an alkyl moiety. Exemplary alkyl moieties include, but are not limited to, halogens, sulfurs, thiols, thioethers, thioesters, amines (primary, secondary, or tertiary), amides, ethers, esters, alcohols and oxygen. In some instances, the alkyl moiety further comprises a modification. In some instances, the modification comprises an azo group, a keto group, an aldehyde group, a carboxyl group, a nitro group, a nitroso group, a nitrile group, a heterocycle (e.g., imidazole, hydrazino or hydroxylamino) group, an isocyanate or cyanate group, or a sulfur containing group (e.g., sulfoxide, sulfone, sulfide, or disulfide). In some instances, the alkyl moiety further comprises a hetero substitution. In some instances, the carbon of the heterocyclic group is substituted by a nitrogen, oxygen or sulfur. In some instances, the heterocyclic substitution includes, but is not limited to, morpholino, imidazole, and pyrrolidino.
[0145] In some instances, the modification at the 2' hydroxyl group is a 2'-O-methyl modification or a 2'-O-methoxyethyl (2'-O-MOE) modification. In some cases, the 2'-O-methyl modification adds a methyl group to the 2' hydroxyl group of the ribose moiety whereas the 2'-O-methoxyethyl modification adds a methoxyethyl group to the 2' hydroxyl group of the ribose moiety.
[0146] In some instances, the modification at the 2' hydroxyl group is a 2'-O-aminopropyl modification in which an extended amine group comprising a propyl linker binds the amine group to the 2' oxygen. In some instances, this modification neutralizes the phosphate-derived overall negative charge of the oligonucleotide molecule by introducing one positive charge from the amine group per sugar and thereby improves cellular uptake properties due to its zwitterionic properties.
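The charge balance described above can be sketched as simple arithmetic. The following Python example is a simplified illustration only: it assumes that each internucleotide phosphodiester linkage carries one negative charge, that each 2'-O-aminopropyl amine is protonated and carries one positive charge, and that there are no terminal phosphates or other charged groups. The function name `net_backbone_charge` is hypothetical and does not appear in the disclosure.

```python
def net_backbone_charge(n_nucleotides, n_aminopropyl):
    """Estimate the net charge of an oligonucleotide under simplifying assumptions.

    Assumes one negative charge per internucleotide phosphodiester linkage
    (n_nucleotides - 1 linkages), one positive charge per 2'-O-aminopropyl
    modification, and no other charged groups.
    """
    if n_nucleotides < 1:
        raise ValueError("an oligonucleotide needs at least one nucleotide")
    if not 0 <= n_aminopropyl <= n_nucleotides:
        raise ValueError("at most one 2'-O-aminopropyl modification per sugar")
    negative = n_nucleotides - 1  # phosphodiester linkages
    positive = n_aminopropyl      # protonated amines, one per modified sugar
    return positive - negative


# Hypothetical 21-mer: unmodified, then with 20 modified sugars.
print(net_backbone_charge(21, 0))   # -20
print(net_backbone_charge(21, 20))  # 0
```

Under these assumptions, a molecule with one modified sugar per phosphodiester linkage carries equal numbers of positive and negative charges, which is the zwitterionic character referred to above.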
[0147] In some instances, the modification at the 2' hydroxyl group is a locked or bridged ribose modification (e.g., locked nucleic acid or LNA) in which the oxygen molecule bound at the 2' carbon is linked to the 4' carbon by a methylene group, thus forming a 2'-C,4'-C-oxy-methylene-linked bicyclic ribonucleotide monomer.
[0148] In some embodiments, additional modifications at the 2' hydroxyl group include 2'-deoxy, T-deoxy-2'-fluoro, 2'-O-aminopropyl (2'-O-AP), 2'-O-dimethylaminoethyl (2'-O-DMAOE), 2'-O-dimethylaminopropyl (2'-O-DMAP), T-O-dimethylaminoethyloxyethyl (2'-O-DMAEOE), or 2'-O--N-methylacetamido (2'-O-NMA).
[0149] In some embodiments, a nucleotide analogue comprises a modified base such as, but not limited to, 5-propynyluridine, 5-propynylcytidine, 6-methyladenine, 6-methylguanine, N, N, -dimethyladenine, 2-propyladenine, 2propylguanine, 2-aminoadenine, 1-methylinosine, 3-methyluridine, 5-methylcytidine, 5-methyluridine and other nucleotides having a modification at the 5 position, 5-(2-amino) propyl uridine, 5-halocytidine, 5-halouridine, 4-acetylcytidine, 1-methyladenosine, 2-methyladenosine, 3-methylcytidine, 6-methyluridine, 2-methylguanosine, 7-methylguanosine, 2, 2-dimethylguanosine, 5-methylaminoethyluridine, 5-methyloxyuridine, deazanucleotides (such as 7-deaza-adenosine, 6-azouridine, 6-azocytidine, or 6-azothymidine), 5-methyl-2-thiouridine, other thio bases (such as 2-thiouridine, 4-thiouridine, and 2-thiocytidine), dihydrouridine, pseudouridine, queuosine, archaeosine, naphthyl and substituted naphthyl groups, any O-and N-alkylated purines and pyrimidines (such as N6-methyladenosine, 5-methylcarbonylmethyluridine, uridine 5-oxyacetic acid, pyridine-4-one, or pyridine-2-one), phenyl and modified phenyl groups such as aminophenol or 2,4, 6-trimethoxy benzene, modified cytosines that act as G-clamp nucleotides, 8-substituted adenines and guanines, 5-substituted uracils and thymines, azapyrimidines, carboxyhydroxyalkyl nucleotides, carboxyalkylaminoalkyi nucleotides, and alkylcarbonylalkylated nucleotides. Modified nucleotides also include those nucleotides that are modified with respect to the sugar moiety, as well as nucleotides having sugars or analogs thereof that are not ribosyl. For example, the sugar moieties, in some cases are or are based on, mannoses, arabinoses, glucopyranoses, galactopyranoses, 4'-thioribose, and other sugars, heterocycles, or carbocycles. The term nucleotide also includes universal bases. By way of example, universal bases include but are not limited to 3-nitropyrrole, 5-nitroindole, or nebularine.
[0150] In some embodiments, a nucleotide analogue further comprises a morpholino, a peptide nucleic acid (PNA), a methylphosphonate nucleotide, a thiolphosphonate nucleotide, a 2'-fluoro N3-P5'-phosphoramidite, or a 1',5'-anhydrohexitol nucleic acid (HNA). Morpholino or phosphorodiamidate morpholino oligo (PMO) comprises synthetic molecules whose structure mimics natural nucleic acid structure but deviates from the normal sugar and phosphate structures. In some instances, the five-membered ribose ring is substituted with a six-membered morpholino ring containing four carbons, one nitrogen, and one oxygen. In some cases, the ribose monomers are linked by a phosphorodiamidate group instead of a phosphate group. In such cases, the backbone alterations remove all positive and negative charges, making morpholinos neutral molecules capable of crossing cellular membranes without the aid of cellular delivery agents such as those used by charged oligonucleotides.
[0151] In some embodiments, peptide nucleic acid (PNA) does not contain a sugar ring or phosphate linkage, and the bases are attached and appropriately spaced by oligoglycine-like molecules, thereby eliminating the backbone charge.
[0152] In some embodiments, one or more modifications optionally occur at the internucleotide linkage. In some instances, modified internucleotide linkage includes, but is not limited to, phosphorothioates; phosphorodithioates; methylphosphonates; 5'-alkylenephosphonates; 5'-methylphosphonate; 3'-alkylene phosphonates; borontrifluoridates; borano phosphate esters and selenophosphates of 3'-5' linkage or 2'-5' linkage; phosphotriesters; thionoalkylphosphotriesters; hydrogen phosphonate linkages; alkyl phosphonates; alkylphosphonothioates; arylphosphonothioates; phosphoroselenoates; phosphorodiselenoates; phosphinates; phosphoramidates; 3'-alkylphosphoramidates; aminoalkylphosphoramidates; thionophosphoramidates; phosphoropiperazidates; phosphoroanilothioates; phosphoroanilidates; ketones; sulfones; sulfonamides; carbonates; carbamates; methylenehydrazos; methylenedimethylhydrazos; formacetals; thioformacetals; oximes; methyleneiminos; methylenemethyliminos; thioamidates; linkages with riboacetyl groups; aminoethyl glycine; silyl or siloxane linkages; alkyl or cycloalkyl linkages with or without heteroatoms of, for example, 1 to 10 carbons that are saturated or unsaturated and/or substituted and/or contain heteroatoms; linkages with morpholino structures, amides, or polyamides wherein the bases are attached to the aza nitrogens of the backbone directly or indirectly; and combinations thereof.
[0153] In some embodiments, one or more modifications comprise a modified phosphate backbone in which the modification generates a neutral or uncharged backbone. In some instances, the phosphate backbone is modified by alkylation to generate an uncharged or neutral phosphate backbone. As used herein, alkylation includes methylation, ethylation, and propylation. In some cases, an alkyl group, as used herein in the context of alkylation, refers to a linear or branched saturated hydrocarbon group containing from 1 to 6 carbon atoms. In some instances, exemplary alkyl groups include, but are not limited to, methyl, ethyl, n-propyl, isopropyl, n-butyl, isobutyl, sec-butyl, tert-butyl, n-pentyl, isopentyl, neopentyl, hexyl, isohexyl, 1,1-dimethylbutyl, 2,2-dimethylbutyl, 3,3-dimethylbutyl, and 2-ethylbutyl groups. In some cases, a modified phosphate is a phosphate group as described in U.S. Pat. No. 9,481,905.
[0154] In some embodiments, additional modified phosphate backbones comprise methylphosphonate, ethylphosphonate, methylthiophosphonate, or methoxyphosphonate. In some cases, the modified phosphate is methylphosphonate. In some cases, the modified phosphate is ethylphosphonate. In some cases, the modified phosphate is methylthiophosphonate. In some cases, the modified phosphate is methoxyphosphonate.
[0155] In some embodiments, one or more modifications further optionally include modifications of the ribose moiety, phosphate backbone and the nucleoside, or modifications of the nucleotide analogues at the 3' or the 5' terminus. For example, the 3' terminus optionally includes a 3' cationic group, or is optionally modified by inverting the nucleoside at the 3'-terminus with a 3'-3' linkage. In another alternative, the 3'-terminus is optionally conjugated with an aminoalkyl group, e.g., a 3' C5-aminoalkyl dT. In an additional alternative, the 3'-terminus is optionally conjugated with an abasic site, e.g., with an apurinic or apyrimidinic site. In some instances, the 5'-terminus is conjugated with an aminoalkyl group, e.g., a 5'-O-alkylamino substituent. In some cases, the 5'-terminus is conjugated with an abasic site, e.g., with an apurinic or apyrimidinic site.
[0156] In some embodiments, exemplary nucleic acid cargos include, but are not limited to, Fomivirsen, Mipomersen, AZD5312 (AstraZeneca), Nusinersen, and SB010 (Sterna Biologicals).
Small Molecules
[0157] In some embodiments, the cargo is a small molecule. In some instances, the small molecule is an inhibitor (e.g., a pan inhibitor or a selective inhibitor). In other instances, the small molecule is an activator. In additional cases, the small molecule is an agonist, antagonist, a partial agonist, a mixed agonist/antagonist, or a competitive antagonist.
[0158] In some embodiments, the small molecule is a drug that falls under the class of analgesics, antianxiety drugs, antiarrhythmics, antibacterials, antibiotics, anticoagulants and thrombolytics, anticonvulsants, antidepressants, antidiarrheals, antiemetics, antifungals, antihistamines, antihypertensives, anti-inflammatories, antineoplastics, antipsychotics, antipyretics, antivirals, barbiturates, beta-blockers, bronchodilators, cold cures, corticosteroids, cough suppressants, cytotoxics, decongestants, diuretics, expectorants, hormones, hypoglycemics, immunosuppressives, laxatives, muscle relaxants, sex hormones, sleeping drugs, or tranquilizers.
[0159] In some embodiments, the small molecule is an inhibitor, e.g., an inhibitor of a kinase pathway such as the Tyrosine kinase pathway or a Serine/Threonine kinase pathway. In some cases, the small molecule is a dual protein kinase inhibitor. In some cases, the small molecule is a lipid kinase inhibitor.
[0160] In some cases, the small molecule is a neuraminidase inhibitor.
[0161] In some cases, the small molecule is a carbonic anhydrase inhibitor.
[0162] In some embodiments, exemplary targets of the small molecule include, but are not limited to, vascular endothelial growth factor receptor 1 (VEGFR1), vascular endothelial growth factor receptor 2 (VEGFR2), vascular endothelial growth factor receptor 3 (VEGFR3), fibroblast growth factor receptor 1 (FGFR1), fibroblast growth factor receptor 2 (FGFR2), fibroblast growth factor receptor 3 (FGFR3), fibroblast growth factor receptor 4 (FGFR4), cyclin-dependent kinase 4 (CDK4), cyclin-dependent kinase 6 (CDK6), a receptor tyrosine kinase, a phosphoinositide 3-kinase (PI3K) isoform (e.g., PI3Kδ, also known as p110δ), Janus kinase 1 (JAK1), Janus kinase 3 (JAK3), a receptor from the family of platelet-derived growth factor receptors (PDGF-R), and carbonic anhydrase (e.g., carbonic anhydrase I).
[0163] In some embodiments, the small molecule targets a viral protein, e.g., a viral envelope protein. In some embodiments, the small molecule decreases viral adsorption to a host cell. In some embodiments, the small molecule decreases viral entry into a host cell. In some embodiments, the small molecule decreases viral replication in a host or a host cell. In some embodiments, the small molecule decreases viral assembly.
[0164] In some embodiments, exemplary small molecule cargos include, but are not limited to, lenvatinib, palbociclib, regorafenib, idelalisib, tofacitinib, nintedanib, zanamivir, ethoxzolamide, and artemisinin.
Proteins
[0165] In some embodiments, the cargo is a protein. In some instances, the protein is a full-length protein. In other instances, the protein is a fragment, e.g., a functional fragment. In some cases, the protein is a naturally occurring protein. In additional cases, the protein is a de novo engineered protein. In further cases, the protein is a fusion protein. In further cases, the protein is a recombinant protein. Exemplary proteins include, but are not limited to, Fc fusion proteins, anticoagulants, blood factors, bone morphogenetic proteins, enzymes, growth factors, hormones, interferons, interleukins, and thrombolytics.
[0166] In some instances, the protein is for use in an enzyme replacement therapy.
[0167] In some cases, the protein is for use in antigen production for therapeutic and/or prophylactic vaccine production. For example, the protein comprises an antigen that elicits a desirable immune response (e.g., a pro-inflammatory immune response, an anti-inflammatory immune response, a B cell response, an antibody response, a T cell response, a CD4+ T cell response, a CD8+ T cell response, a Th1 immune response, a Th2 immune response, a Th17 immune response, a Treg immune response, or a combination thereof).
[0168] In some instances, exemplary protein cargos include, but are not limited to, romiplostim, liraglutide, a human growth hormone (rHGH), human insulin (BHI), follicle-stimulating hormone (FSH), Factor VIII, erythropoietin (EPO), granulocyte colony-stimulating factor (G-CSF), alpha-galactosidase A, alpha-L-iduronidase, N-acetylgalactosamine-4-sulfatase, dornase alfa, tissue plasminogen activator (TPA), glucocerebrosidase, interferon-beta-1a, insulin-like growth factor 1 (IGF-1), or rasburicase.
Peptides
[0169] In some embodiments, the cargo is a peptide. In some instances, the peptide is a naturally occurring peptide. In other instances, the peptide is an artificial engineered peptide or a recombinant peptide. In some cases, the peptide targets a G-protein coupled receptor, an ion channel, a microbe, an anti-microbial target, a catalytic or other Ig-family of receptors, an intracellular target, a membrane-anchored target, or an extracellular target.
[0170] In some cases, the peptide comprises at least 2 amino acids. In some cases, the peptide comprises at least 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 25, 30, 35, 40, 45, 50, 60, 70, 80, 90, 100 amino acids. In some cases, the peptide comprises at least 10 amino acids. In some cases, the peptide comprises at least 15 amino acids. In some cases, the peptide comprises at least 20 amino acids. In some cases, the peptide comprises at least 30 amino acids. In some cases, the peptide comprises at least 40 amino acids. In some cases, the peptide comprises at least 50 amino acids. In some cases, the peptide comprises at least 60 amino acids. In some cases, the peptide comprises at least 70 amino acids. In some cases, the peptide comprises at least 80 amino acids. In some cases, the peptide comprises at least 90 amino acids. In some cases, the peptide comprises at least 100 amino acids.
[0171] In some cases, the peptide comprises at most 3 amino acids. In some cases, the peptide comprises at most 4, 5, 6, 7, 8, 9, 10, 15, 20, 25, 30, 35, 40, 45, 50, 60, 70, 80, 90, 100 amino acids. In some cases, the peptide comprises at most 10 amino acids. In some cases, the peptide comprises at most 15 amino acids. In some cases, the peptide comprises at most 20 amino acids. In some cases, the peptide comprises at most 30 amino acids. In some cases, the peptide comprises at most 40 amino acids. In some cases, the peptide comprises at most 50 amino acids. In some cases, the peptide comprises at most 60 amino acids. In some cases, the peptide comprises at most 70 amino acids. In some cases, the peptide comprises at most 80 amino acids. In some cases, the peptide comprises at most 90 amino acids. In some cases, the peptide comprises at most 100 amino acids.
[0172] In some cases, the peptide has a molecular weight from about 1 to about 10 kDa. In some cases, the peptide has a molecular weight from about 1 to about 9 kDa, about 1 to about 6 kDa, about 1 to about 5 kDa, about 1 to about 4 kDa, about 1 to about 3 kDa, about 2 to about 8 kDa, about 2 to about 6 kDa, about 2 to about 4 kDa, about 1.2 to about 2.8 kDa, about 1.5 to about 2.5 kDa, or about 1.5 to about 2 kDa.
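As a rough illustration of how the residue counts recited above relate to the molecular-weight ranges, the following sketch estimates the mass of a linear peptide from its length. It assumes an average residue mass of about 110 Da plus one water molecule (about 18 Da), which is a common rule of thumb and not a value stated in this disclosure; the function and constant names are hypothetical.

```python
# Rule-of-thumb estimate only. AVG_RESIDUE_DA is an assumed average residue
# mass, not a value taken from this disclosure.
AVG_RESIDUE_DA = 110.0
WATER_DA = 18.0  # one water retained at the termini of a linear peptide


def estimate_mass_kda(n_residues: int) -> float:
    """Approximate mass, in kDa, of a linear peptide with n_residues residues."""
    if n_residues < 1:
        raise ValueError("n_residues must be a positive integer")
    return (n_residues * AVG_RESIDUE_DA + WATER_DA) / 1000.0


def in_range_kda(n_residues: int, low_kda: float = 1.0, high_kda: float = 10.0) -> bool:
    """True if the estimated mass lies within [low_kda, high_kda]."""
    return low_kda <= estimate_mass_kda(n_residues) <= high_kda
```

Under this assumption, a peptide of roughly 10 to 90 residues falls within the range of about 1 to about 10 kDa. Actual masses depend on sequence composition and on any modifications or cyclization.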
[0173] In some embodiments, the peptide is a cyclic peptide. In some instances, the cyclic peptide is a macrocyclic peptide. In other instances, the cyclic peptide is a constrained peptide. The cyclic peptides are assembled with varied linkages, such as for example, head-to-tail, head-to-side-chain, side-chain to tail, and side-chain to side-chain linkages. In some instances, a cyclic peptide (e.g., a macrocyclic or a constrained peptide) has a molecular weight from about 500 Dalton to about 2000 Dalton. In other instances, a cyclic peptide (e.g., a macrocyclic or a constrained peptide) ranges from about 10 amino acids to about 100 amino acids, from about 10 amino acids to about 70 amino acids, or from about 10 amino acids to about 50 amino acids.
[0174] In some cases, the peptide is for use in antigen production for therapeutic and/or prophylactic vaccine production. For example, the peptide comprises an antigen that elicits a desirable immune response (e.g., a pro-inflammatory immune response, an anti-inflammatory immune response, a B cell response, an antibody response, a T cell response, a CD4+ T cell response, a CD8+ T cell response, a Th1 immune response, a Th2 immune response, a Th17 immune response, a Treg immune response, or a combination thereof).
[0175] In some embodiments, the peptide comprises natural amino acids, unnatural amino acids, or a combination thereof. In some instances, an amino acid residue refers to a molecule containing both an amino group and a carboxyl group. Suitable amino acids include, without limitation, both the D- and L-isomers of the naturally-occurring amino acids, as well as non-naturally occurring amino acids prepared by organic synthesis or other metabolic routes. The term amino acid, as used herein, includes, without limitation, α-amino acids, natural amino acids, non-natural amino acids, and amino acid analogs.
[0176] In some instances, α-amino acid refers to a molecule containing both an amino group and a carboxyl group bound to a carbon which is designated the α-carbon.
[0177] In some instances, β-amino acid refers to a molecule containing both an amino group and a carboxyl group in a β configuration.
[0178] In some embodiments, an amino acid analog is a racemic mixture. In some instances, the D isomer of the amino acid analog is used. In some cases, the L isomer of the amino acid analog is used. In some instances, the amino acid analog comprises chiral centers that are in the R or S configuration.
[0179] In some embodiments, exemplary peptide cargos include, but are not limited to, peginesatide, insulin, adrenocorticotropic hormone (ACTH), calcitonin, oxytocin, vasopressin, octreotide, and leuprorelin.
[0180] In some embodiments, exemplary peptide cargos include, but are not limited to, Telavancin, Dalbavancin, Oritavancin, Anidulafungin, Lanreotide, Pasireotide, Romidepsin, Linaclotide, and Peginesatide.
Antibodies
[0181] In some embodiments, the cargo is an antibody or a binding fragment thereof. In some instances, the antibody or binding fragment thereof comprises a humanized antibody or binding fragment thereof, murine antibody or binding fragment thereof, chimeric antibody or binding fragment thereof, monoclonal antibody or binding fragment thereof, bispecific antibody or binding fragment thereof, monovalent Fab', divalent Fab₂, F(ab)'₃ fragments, single-chain variable fragment (scFv), bis-scFv, (scFv)₂, diabody, minibody, nanobody, triabody, tetrabody, disulfide stabilized Fv protein (dsFv), single-domain antibody (sdAb), Ig NAR, camelid antibody or binding fragment thereof, or a chemically modified derivative thereof.
[0182] In some instances, the antibody or binding fragment thereof recognizes a cell surface protein. In some instances, the cell surface protein is an antigen expressed by a cancerous cell. In some instances, the cell surface protein is a neoepitope. In some instances, the cell surface protein comprises one or more mutations compared to a wild-type protein. Exemplary cancer antigens include, but are not limited to, alpha fetoprotein, ASLG659, B7-H3, BAFF-R, Brevican, CA125 (MUC16), CA15-3, CA19-9, carcinoembryonic antigen (CEA), CA242, CRIPTO (CR, CR1, CRGF, CRIPTO, TDGF1, teratocarcinoma-derived growth factor), CTLA-4, CXCR5, E16 (LAT1, SLC7A5), FcRH2 (IFGP4, IRTA4, SPAP1A (SH2 domain containing phosphatase anchor protein 1a), SPAP1B, SPAP1C), epidermal growth factor, ETBR, Fc receptor-like protein 1 (FCRH1), GEDA, HLA-DOB (Beta subunit of MHC class II molecule (Ia antigen)), human chorionic gonadotropin, ICOS, IL-2 receptor, IL20Rα, Immunoglobulin superfamily receptor translocation associated 2 (IRTA2), L6, Lewis Y, Lewis X, MAGE-1, MAGE-2, MAGE-3, MAGE-4, MART1, mesothelin, MDP, MPF (SMR, MSLN), MCP1 (CCL2), macrophage inhibitory factor (MIF), MPG, MSG783, mucin, MUC1-KLH, Napi3b (SLC34A2), nectin-4, Neu oncogene product, NCA, placental alkaline phosphatase, prostate specific membrane antigen (PSMA), prostatic acid phosphatase, PSCA hlg, anti-transferrin receptor, p97, Purinergic receptor P2X ligand-gated ion channel 5 (P2X5), LY64 (Lymphocyte antigen 64 (RP105)), gp100, P21, six transmembrane epithelial antigen of prostate (STEAP1), STEAP2, Sema 5b, tumor-associated glycoprotein 72 (TAG-72), TrpM4 (BR22450, FLJ20041, TRPM4, TRPM4B, transient receptor potential cation channel, subfamily M, member 4) and the like.
[0183] In some instances, the cell surface protein comprises clusters of differentiation (CD) cell surface markers. Exemplary CD cell surface markers include, but are not limited to, CD1, CD2, CD3, CD4, CD5, CD6, CD7, CD8, CD9, CD10, CD11a, CD11b, CD11c, CD11d, CDw12, CD13, CD14, CD15, CD15s, CD16, CDw17, CD18, CD19, CD20, CD21, CD22, CD23, CD24, CD25, CD26, CD27, CD28, CD29, CD30, CD31, CD32, CD33, CD34, CD35, CD36, CD37, CD38, CD39, CD40, CD41, CD42, CD43, CD44, CD45, CD45RO, CD45RA, CD45RB, CD46, CD47, CD48, CD49a, CD49b, CD49c, CD49d, CD49e, CD49f, CD50, CD51, CD52, CD53, CD54, CD55, CD56, CD57, CD58, CD59, CDw60, CD61, CD62E, CD62L (L-selectin), CD62P, CD63, CD64, CD65, CD66a, CD66b, CD66c, CD66d, CD66e, CD71, CD79 (e.g., CD79a, CD79b), CD90, CD95 (Fas), CD103, CD104, CD125 (IL5RA), CD134 (OX40), CD137 (4-1BB), CD152 (CTLA-4), CD221, CD274, CD279 (PD-1), CD319 (SLAMF7), CD326 (EpCAM), and the like.
[0184] In some embodiments, exemplary antibodies or binding fragments thereof include, but are not limited to, zalutumumab (HuMax-EGFr, Genmab), abagovomab (Menarini), abituzumab (Merck), adecatumumab (MT201), alacizumab pegol, alemtuzumab (Campath®, MabCampath, or Campath-1H; Leukosite), AlloMune (BioTransplant), amatuximab (Morphotek, Inc.), anti-VEGF (Genentech), anatumomab mafenatox, apolizumab (hu1D10), ascrinvacumab (Pfizer Inc.), atezolizumab (MPDL3280A; Genentech/Roche), B43.13 (OvaRex, AltaRex Corporation), basiliximab (Simulect®, Novartis), belimumab (Benlysta®, GlaxoSmithKline), bevacizumab (Avastin®, Genentech), blinatumomab (Blincyto, AMG103; Amgen), BEC2 (ImClone Systems Inc.), carlumab (Janssen Biotech), catumaxomab (Removab, Trion Pharma), CEAcide (Immunomedics), cetuximab (Erbitux®, ImClone), citatuzumab bogatox (VB6-845), cixutumumab (IMC-A12, ImClone Systems Inc.), conatumumab (AMG 655, Amgen), dacetuzumab (SGN-40, huS2C6; Seattle Genetics, Inc.), daratumumab (Darzalex®, Janssen Biotech), detumomab, drozitumab (Genentech), durvalumab (MedImmune), dusigitumab (MedImmune), edrecolomab (MAb17-1A, Panorex, Glaxo Wellcome), elotuzumab (Empliciti™, Bristol-Myers Squibb), emibetuzumab (Eli Lilly), enavatuzumab (Facet Biotech Corp.), enfortumab vedotin (Seattle Genetics, Inc.), enoblituzumab (MGA271, MacroGenics, Inc.), ensituximab (Neogenix Oncology, Inc.), epratuzumab (LymphoCide, Immunomedics, Inc.), ertumaxomab (Rexomun®, Trion Pharma), etaracizumab (Abegrin, MedImmune), farletuzumab (MORAb-003, Morphotek, Inc.), FBTA05 (Lymphomun, Trion Pharma), ficlatuzumab (AVEO Pharmaceuticals), figitumumab (CP-751871, Pfizer), flanvotumab (ImClone Systems), fresolimumab (GC1008, Sanofi-Aventis), futuximab, glaximab, ganitumab (Amgen), girentuximab (Rencarex®, Wilex AG), IMAB362 (Claudiximab, Ganymed Pharmaceuticals AG), imalumab (Baxalta), IMC-1C11 (ImClone Systems), IMC-C225 (ImClone Systems Inc.), imgatuzumab (Genentech/Roche), intetumumab (Centocor, Inc.), ipilimumab (Yervoy®, Bristol-Myers Squibb), iratumumab (Medarex, Inc.), isatuximab (SAR650984, Sanofi-Aventis), labetuzumab (CEA-CIDE, Immunomedics), lexatumumab (ETR2-ST01, Cambridge Antibody Technology), lintuzumab (SGN-33, Seattle Genetics), lucatumumab (Novartis), lumiliximab, mapatumumab (HGS-ETR1, Human Genome Sciences), matuzumab (EMD 72000, Merck), milatuzumab (hLL1, Immunomedics, Inc.), mitumomab (BEC-2, ImClone Systems), narnatumab (ImClone Systems), necitumumab (Portrazza™, Eli Lilly), nesvacumab (Regeneron Pharmaceuticals), nimotuzumab (h-R3, BIOMAb EGFR, TheraCIM, Theraloc, or CIMAher; Biotech Pharmaceutical Co.), nivolumab (Opdivo®, Bristol-Myers Squibb), obinutuzumab (Gazyva or Gazyvaro; Hoffmann-La Roche), ocaratuzumab (AME-133v, LY2469298; Mentrik Biotech, LLC), ofatumumab (Arzerra®, Genmab), onartuzumab (Genentech), ontuxizumab (Morphotek, Inc.), oregovomab (OvaRex®, AltaRex Corp.), otlertuzumab (Emergent BioSolutions), panitumumab (ABX-EGF, Amgen), pankomab (Glycotope GMBH), parsatuzumab (Genentech), patritumab, pembrolizumab (Keytruda®, Merck), pemtumomab (Theragyn, Antisoma), pertuzumab (Perjeta, Genentech), pidilizumab (CT-011, Medivation), polatuzumab vedotin (Genentech/Roche), pritumumab, racotumomab (Vaxira®, Recombio), ramucirumab (Cyramza®, ImClone Systems Inc.), rituximab (Rituxan®, Genentech), robatumumab (Schering-Plough), seribantumab (Sanofi/Merrimack Pharmaceuticals, Inc.), sibrotuzumab, siltuximab (Sylvant™, Janssen Biotech), Smart M195 (Protein Design Labs, Inc.), Smart 1D10 (Protein Design Labs, Inc.), tabalumab (LY2127399, Eli Lilly), taplitumomab paptox, tenatumomab, teprotumumab (Roche), tetulomab, TGN1412 (CD28-SuperMAB or TAB08), tigatuzumab (CS-1008, Daiichi Sankyo), tositumomab, trastuzumab (Herceptin®), tremelimumab (CP-672,206; Pfizer), tucotuzumab celmoleukin (EMD Pharmaceuticals), ublituximab, urelumab (BMS-663513, Bristol-Myers Squibb), volociximab (M200, Biogen Idec), and zatuximab.
[0185] In some instances, the antibody or binding fragment thereof is an antibody-drug conjugate (ADC). In some cases, the payload of the ADC comprises, for example, but is not limited to, an auristatin derivative, maytansine, a maytansinoid, a taxane, a calicheamicin, cemadotin, a duocarmycin, a pyrrolobenzodiazepine (PBD), or a tubulysin. In some instances, the payload comprises monomethyl auristatin E (MMAE) or monomethyl auristatin F (MMAF). In some instances, the payload comprises DM1 (mertansine) or DM4. In some instances, the payload comprises a pyrrolobenzodiazepine dimer.
Additional Cargos
[0186] In some embodiments, the cargo is a peptidomimetic. A peptidomimetic is a small protein-like polymer designed to mimic a peptide. In some instances, the peptidomimetic comprises D-peptides. In other instances, the peptidomimetic comprises L-peptides. Exemplary peptidomimetics include peptoids and β-peptides.
[0187] In some embodiments, the cargo is a nucleotidomimetic.
Vectors and Expression Systems
[0188] In certain embodiments, the Arc polypeptides, endo-Gag polypeptides, engineered Arc and engineered endo-Gag polypeptides described supra are encoded by plasmid vectors. In some embodiments, vectors include any suitable vectors derived from either eukaryotic or prokaryotic sources. In some cases, vectors are obtained from bacteria (e.g., E. coli), insects, yeast (e.g., Pichia pastoris), algae, or mammalian sources.
[0189] Exemplary bacterial vectors include pACYC177, pASK75, pBAD vector series, pBADM vector series, pET vector series, pETM vector series, pGEX vector series, pHAT, pHAT2, pMal-c2, pMal-p2, pQE vector series, pRSET A, pRSET B, pRSET C, pTrcHis2 series, pZA31-Luc, pZE21-MCS-1, pFLAG ATS, pFLAG CTS, pFLAG MAC, pFLAG Shift-12c, pTAC-MAT-1, pFLAG CTC, or pTAC-MAT-2.
[0190] Exemplary insect vectors include pFastBac1, pFastBac DUAL, pFastBac ET, pFastBac HTa, pFastBac HTb, pFastBac HTc, pFastBac M30a, pFastBac M30b, pFastBac M30c, pVL1392, pVL1393, pVL1393 M10, pVL1393 M11, pVL1393 M12, FLAG vectors such as pPolh-FLAG1 or pPolh-MAT 2, or MAT vectors such as pPolh-MAT1, or pPolh-MAT2.
[0191] In some cases, yeast vectors include Gateway® pDEST™ 14 vector, Gateway® pDEST™ 15 vector, Gateway® pDEST™ 17 vector, Gateway® pDEST™ 24 vector, Gateway® pYES-DEST52 vector, pBAD-DEST49 Gateway® destination vector, pAO815 Pichia vector, pFLD1 Pichia pastoris vector, pGAPZ A, B, & C Pichia pastoris vector, pPIC3.5K Pichia vector, pPIC6 A, B, & C Pichia vector, pPIC9K Pichia vector, pTEF1/Zeo, pYES2 yeast vector, pYES2/CT yeast vector, pYES2/NT A, B, & C yeast vector, or pYES3/CT yeast vector.
[0192] Exemplary algae vectors include pChlamy-4 vector or MCS vector.
[0193] Examples of mammalian vectors include transient expression vectors or stable expression vectors. Mammalian transient expression vectors include p3×FLAG-CMV 8, pFLAG-Myc-CMV 19, pFLAG-Myc-CMV 23, pFLAG-CMV 2, pFLAG-CMV 6a,b,c, pFLAG-CMV 5.1, pFLAG-CMV 5a,b,c, p3×FLAG-CMV 7.1, pFLAG-CMV 20, p3×FLAG-Myc-CMV 24, pCMV-FLAG-MAT1, pCMV-FLAG-MAT2, pBICEP-CMV 3, or pBICEP-CMV 4. Mammalian stable expression vectors include pFLAG-CMV 3, p3×FLAG-CMV 9, p3×FLAG-CMV 13, pFLAG-Myc-CMV 21, p3×FLAG-Myc-CMV 25, pFLAG-CMV 4, p3×FLAG-CMV 10, p3×FLAG-CMV 14, pFLAG-Myc-CMV 22, p3×FLAG-Myc-CMV 26, pBICEP-CMV 1, or pBICEP-CMV 2.
[0194] In some instances, a cell-free system is a mixture of cytoplasmic and/or nuclear components from a cell and is used for in vitro nucleic acid synthesis. In some cases, a cell-free system utilizes either prokaryotic cell components or eukaryotic cell components. Sometimes, a nucleic acid synthesis is obtained in a cell-free system based on, for example, Drosophila cells, Xenopus eggs, or HeLa cells (ATCC® CCL-2™). Exemplary cell-free systems include, but are not limited to, E. coli S30 Extract system, E. coli T7 S30 system, or PURExpress®.
Host Cells
[0195] In some embodiments, a host cell includes any suitable cell such as a naturally derived cell or a genetically modified cell. In some instances, a host cell is a production host cell. In some instances, a host cell is a eukaryotic cell. In other instances, a host cell is a prokaryotic cell. In some cases, a eukaryotic cell includes fungi (e.g., a yeast cell), an animal cell, or a plant cell. In some cases, a prokaryotic cell is a bacterial cell. Examples of bacterial cells include gram-positive bacteria or gram-negative bacteria. In some embodiments, the gram-negative bacteria are anaerobic, rod-shaped, or both.
[0196] In some instances, gram-positive bacteria include Actinobacteria, Firmicutes or Tenericutes. In some cases, gram-negative bacteria include Aquificae, Deinococcus-Thermus, Fibrobacteres-Chlorobi/Bacteroidetes (FCB group), Fusobacteria, Gemmatimonadetes, Nitrospirae, Planctomycetes-Verrucomicrobia/Chlamydiae (PVC group), Proteobacteria, Spirochaetes or Synergistetes. In some embodiments, the bacteria are Acidobacteria, Chloroflexi, Chrysiogenetes, Cyanobacteria, Deferribacteres, Dictyoglomi, Thermodesulfobacteria or Thermotogae. In some embodiments, a bacterial cell is Escherichia coli, Clostridium botulinum, or Coli bacilli.
[0197] Exemplary prokaryotic host cells include, but are not limited to, BL21, Mach1™, DH10B™, TOP10, DH5α, DH10Bac™, OmniMax™, MegaX™, DH12S™, INV110, TOP10F', INVαF, TOP10/P3, ccdB Survival, PIR1, PIR2, Stbl2™, Stbl3™, or Stbl4™.
[0198] In some instances, animal cells include a cell from a vertebrate or from an invertebrate. In some cases, an animal cell includes a cell from a marine invertebrate, fish, insects, amphibian, reptile, mammal, or human. In some cases, a fungus cell includes a yeast cell, such as brewer's yeast, baker's yeast, or wine yeast.
[0199] Fungi include ascomycetes such as yeast, mold, filamentous fungi, basidiomycetes, or zygomycetes. In some instances, yeast includes Ascomycota or Basidiomycota. In some cases, Ascomycota includes Saccharomycotina (true yeasts, e.g. Saccharomyces cerevisiae (baker's yeast)) or Taphrinomycotina (e.g. Schizosaccharomycetes (fission yeasts)). In some cases, Basidiomycota includes Agaricomycotina (e.g. Tremellomycetes) or Pucciniomycotina (e.g. Microbotryomycetes).
[0200] Exemplary yeast or filamentous fungi include, for example, the genus: Saccharomyces, Schizosaccharomyces, Candida, Pichia, Hansenula, Kluyveromyces, Zygosaccharomyces, Yarrowia, Trichosporon, Rhodosporidium, Aspergillus, Fusarium, or Trichoderma. Exemplary yeast or filamentous fungi include, for example, the species: Saccharomyces cerevisiae, Schizosaccharomyces pombe, Candida utilis, Candida boidinii, Candida albicans, Candida tropicalis, Candida stellatoidea, Candida glabrata, Candida krusei, Candida parapsilosis, Candida guilliermondii, Candida viswanathii, Candida lusitaniae, Rhodotorula mucilaginosa, Pichia methanolica, Pichia angusta, Pichia pastoris, Pichia anomala, Hansenula polymorpha, Kluyveromyces lactis, Zygosaccharomyces rouxii, Yarrowia lipolytica, Trichosporon pullulans, Rhodosporidium toruloides, Aspergillus niger, Aspergillus nidulans, Aspergillus awamori, Aspergillus oryzae, Trichoderma reesei, Brettanomyces bruxellensis, Candida stellata, Torulaspora delbrueckii, Zygosaccharomyces bailii, Cryptococcus neoformans, Cryptococcus gattii, or Saccharomyces boulardii.
[0201] Exemplary yeast host cells include, but are not limited to, Pichia pastoris yeast strains such as GS115, KM71H, SMD1168, SMD1168H, and X-33; and Saccharomyces cerevisiae yeast strains such as INVSc1.
[0202] In some instances, additional animal cells include cells obtained from a mollusk, arthropod, annelid or sponge. In some cases, an additional animal cell is a mammalian cell, e.g., from a human, primate, ape, equine, bovine, porcine, canine, feline or rodent. In some cases, a rodent includes mouse, rat, hamster, gerbil, chinchilla, fancy rat, or guinea pig.
[0203] Exemplary mammalian host cells include, but are not limited to, 293A cell line, 293FT cell line, 293F cells, 293 H cells, CHO DG44 cells, CHO-S cells, CHO-K1 cells, Expi293F™ cells, Flp-In™ T-REx™ 293 cell line, Flp-In™-293 cell line, Flp-In™-3T3 cell line, Flp-In™-BHK cell line, Flp-In™-CHO cell line, Flp-In™-CV-1 cell line, Flp-In™-Jurkat cell line, FreeStyle™ 293-F cells, FreeStyle™ CHO-S cells, GripTite™ 293 MSR cell line, GS-CHO cell line, HepaRG™ cells, T-REx™ Jurkat cell line, Per.C6 cells, T-REx™-293 cell line, T-REx™-CHO cell line, and T-REx™-HeLa cell line.
[0204] In some instances, a mammalian host cell is a primary cell. In some instances, a mammalian host cell is a stable cell line, or a cell line that has incorporated a genetic material of interest into its own genome and has the capability to express the product of the genetic material after many generations of cell division. In some cases, a mammalian host cell is a transient cell line, or a cell line that has not incorporated a genetic material of interest into its own genome and does not have the capability to express the product of the genetic material after many generations of cell division.
[0205] Exemplary insect host cells include, but are not limited to, Drosophila S2 cells, Sf9 cells, Sf21 cells, High Five™ cells, and expresSF+® cells.
[0206] In some instances, plant cells include a cell from algae. Exemplary algal cell lines include, but are not limited to, strains from Chlamydomonas reinhardtii 137c, or Synechococcus elongatus PCC 7942.
Methods of Use
[0207] Disclosed herein, in certain embodiments, are methods of preparing a capsid which encapsulates a cargo. In some embodiments, the method comprises incubating a plurality of Arc or endo-Gag polypeptides, engineered Arc or endo-Gag polypeptides, and/or recombinant Arc or endo-Gag polypeptides with a cargo in a solution for a time sufficient to generate a loaded Arc-based capsid or endo-Gag-based capsid.
[0208] In some instances, the method comprises mixing a solution comprising a plurality of engineered and/or recombinant Arc polypeptides with a plurality of non-Arc capsid forming subunits prior to incubating with the cargo. In some cases, the plurality of non-Arc capsid forming subunits are mixed with the plurality of engineered and/or recombinant Arc polypeptides at a ratio of 1:1, 2:1, 3:1, 4:1, 5:1, 6:1, 7:1, 8:1, 9:1, or 10:1. In other cases, the plurality of non-Arc capsid forming subunits are mixed with the plurality of engineered and/or recombinant Arc polypeptides at a ratio of 1:2, 1:3, 1:4, 1:5, 1:6, 1:7, 1:8, 1:9, or 1:10.
[0209] In some cases, the time sufficient to generate a loaded Arc-based capsid or endo-Gag-based capsid is at least about 5 minutes, at least about 10 minutes, at least about 20 minutes, at least about 30 minutes, at least about 1 hour, at least about 2 hours, at least about 4 hours, at least about 6 hours, at least about 10 hours, at least about 12 hours, at least about 24 hours, or more.
[0210] In some cases, the Arc-based capsid or endo-Gag-based capsid is prepared at a temperature from about 2° C. to about 37° C. In some instances, the Arc-based capsid or endo-Gag-based capsid is prepared at a temperature from about 2° C. to about 8° C., about 2° C. to about 4° C., about 20° C. to about 37° C., about 25° C. to about 37° C., about 20° C. to about 30° C., about 25° C. to about 30° C., or about 30° C. to about 37° C.
[0211] In some cases, the Arc-based capsid or endo-Gag-based capsid is prepared at room temperature.
[0212] In some instances, the Arc-based capsid or endo-Gag-based capsid is further formulated for systemic administration.
[0213] In some instances, the Arc-based capsid or endo-Gag-based capsid is further formulated for local administration.
[0214] In some instances, the Arc-based capsid or endo-Gag-based capsid is further formulated for parenteral (e.g., intra-arterial, intra-articular, intradermal, intralesional, intramuscular, intraocular, intraosseous infusion, intraperitoneal, intrathecal, intravenous, intravitreal, or subcutaneous) administration.
[0215] In some instances, the Arc-based capsid or endo-Gag-based capsid is further formulated for topical administration.
[0216] In some instances, the Arc-based capsid or endo-Gag-based capsid is further formulated for oral administration.
[0217] In some instances, the Arc-based capsid or endo-Gag-based capsid is further formulated for sublingual administration.
[0218] In some instances, the Arc-based capsid or endo-Gag-based capsid is further formulated for aerosol administration.
[0219] In certain embodiments, also described herein is a use of an Arc-based capsid or endo-Gag-based capsid for delivery of a cargo to a site of interest. In some instances, the method comprises contacting a cell at the site of interest with an Arc-based capsid or endo-Gag-based capsid for a time sufficient to facilitate cellular uptake of the capsid.
[0220] In some cases, the cell is a muscle cell, a skin cell, a blood cell, or an immune cell (e.g., a T cell or a B cell).
[0221] In some instances, the cell is a tumor cell, e.g., a solid tumor cell or a cell from a hematologic malignancy. In some cases, the solid tumor cell is a cell from a bladder cancer, breast cancer, brain cancer, colorectal cancer, kidney cancer, liver cancer, lung cancer, pancreatic cancer, prostate cancer, skin cancer, stomach cancer, or thyroid cancer. In some cases, the cell from a hematologic malignancy is from a B-cell malignancy or a T-cell malignancy. In some cases, the cell is from a leukemia, a lymphoma, a myeloma, chronic lymphocytic leukemia (CLL), small lymphocytic lymphoma (SLL), diffuse large B cell lymphoma (DLBCL), follicular lymphoma, mantle cell lymphoma, Burkitt lymphoma, cutaneous T-cell lymphoma, peripheral T cell lymphoma, multiple myeloma, plasmacytoma, acute lymphoblastic leukemia (ALL), acute myeloid leukemia (AML), or chronic myeloid leukemia (CML).
[0222] In some embodiments, the cell is a somatic cell. In some instances, the cell is a blood cell, a skin cell, a connective tissue cell, a bone cell, a muscle cell, or a cell from an organ.
[0223] In some embodiments, the cell is an epithelial cell, a connective tissue cell, a muscular cell, or a neuron.
[0224] In some instances, the cell is an endodermal cell, a mesodermal cell, or an ectodermal cell. In some instances, the endoderm comprises cells of the respiratory system, the intestine, the liver, the gallbladder, the pancreas, the islets of Langerhans, the thyroid, or the hindgut. In some cases, the mesoderm comprises osteochondroprogenitor cells, muscle cells, cells from the digestive system, renal stem cells, cells from the reproductive system, or cells from the circulatory system (such as endothelial cells). Exemplary cells from the ectoderm comprise epithelial cells, cells of the anterior pituitary, cells of the peripheral nervous system, cells of the neuroendocrine system, cells of the eyes, cells of the central nervous system, cells of the ependyma, or cells of the pineal gland. In some cases, cells derived from the central and peripheral nervous system comprise neurons, Schwann cells, satellite glial cells, oligodendrocytes, or astrocytes. In some cases, neurons further comprise interneurons, pyramidal neurons, GABAergic neurons, dopaminergic neurons, serotonergic neurons, glutamatergic neurons, motor neurons from the spinal cord, or inhibitory spinal neurons.
[0225] In some embodiments, the cell is a stem cell or a progenitor cell. In some cases, the cell is a mesenchymal stem or progenitor cell. In other cases, the cell is a hematopoietic stem or progenitor cell.
[0226] In some cases, a target protein is overexpressed or is depleted in the cell. In some cases, the target protein is overexpressed in the cell. In additional cases, the target protein is depleted in the cell.
[0227] In some cases, a target gene in the cell has one or more mutations.
[0228] In some cases, the cell comprises an impaired splicing mechanism.
[0229] In some instances, the Arc-based capsid is administered systemically to a subject in need thereof.
[0230] In other instances, the Arc-based capsid or endo-Gag-based capsid is administered locally to a subject in need thereof.
[0231] In some embodiments, the Arc-based capsid or endo-Gag-based capsid is administered parenterally, orally, topically, sublingually, or by aerosol to a subject in need thereof. In some cases, the Arc-based capsid or endo-Gag-based capsid is administered parenterally to a subject in need thereof. In other cases, the Arc-based capsid or endo-Gag-based capsid is administered orally to a subject in need thereof. In additional cases, the Arc-based capsid or endo-Gag-based capsid is administered topically, sublingually, or by aerosol to a subject in need thereof.
[0232] In some embodiments, a delivery component is combined with an Arc-based capsid or endo-Gag-based capsid for a targeted delivery to a site of interest. In some instances, the delivery component comprises a carrier, e.g., an extracellular vesicle such as a micelle, a liposome, or a microvesicle; or a viral envelope.
[0233] In some instances, the delivery component serves as a primary delivery vehicle for an Arc-based capsid or endo-Gag-based capsid which does not comprise its own delivery component (e.g., in which the second polypeptide is not present). In such cases, the delivery component directs the Arc-based capsid or endo-Gag-based capsid to a target site of interest and optionally facilitates intracellular uptake.
[0234] In other instances, the delivery component enhances target specificity and/or sensitivity of an Arc-based capsid's second polypeptide. In such cases, the delivery component enhances the specificity and/or affinity of the Arc-based capsid or endo-Gag-based capsid to the target site. In additional cases, the delivery component enhances the specificity and/or affinity by about 2-fold, 3-fold, 4-fold, 5-fold, 6-fold, 7-fold, 8-fold, 9-fold, 10-fold, 20-fold, 30-fold, 50-fold, 100-fold, 200-fold, 500-fold, or more. In further cases, the delivery component enhances the specificity and/or affinity by about 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, 100%, 200%, 500%, or more. Further still, the delivery component optionally minimizes off-target effect by about 2-fold, 3-fold, 4-fold, 5-fold, 6-fold, 7-fold, 8-fold, 9-fold, 10-fold, 20-fold, 30-fold, 50-fold, 100-fold, 200-fold, 500-fold, or more. Further still, the delivery component optionally minimizes off-target effect by about 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, 100%, 200%, 500%, or more.
[0235] In additional instances, the delivery component serves as a first vehicle that transports an Arc-based capsid to a general target region (e.g., a tumor microenvironment) and the Arc-based or endo-Gag-based capsid's second polypeptide serves as a second delivery molecule that drives the Arc-based capsid or endo-Gag-based capsid to the specific target site and optionally facilitates intracellular uptake. In such cases, the delivery component minimizes off-target effect by about 2-fold, 3-fold, 4-fold, 5-fold, 6-fold, 7-fold, 8-fold, 9-fold, 10-fold, 20-fold, 30-fold, 50-fold, 100-fold, 200-fold, 500-fold, or more. In such cases, the delivery component minimizes off-target effect by about 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, 100%, 200%, 500%, or more.
[0236] In further instances, the delivery component serves as a first vehicle that transports an Arc-based capsid to a target site of interest and the Arc-based or endo-Gag-based capsid's second polypeptide serves as a second delivery molecule that facilitates intracellular uptake.
[0237] In some embodiments, the delivery component comprises an extracellular vesicle. In some instances, the extracellular vesicle comprises a microvesicle, a liposome, or a micelle. In some instances, the extracellular vesicle has a diameter of from about 10 nm to about 2000 nm, from about 10 nm to about 1000 nm, from about 10 nm to about 800 nm, from about 20 nm to about 600 nm, from about 30 nm to about 500 nm, from about 50 nm to about 200 nm, or from about 80 nm to about 100 nm.
[0238] In some embodiments, the delivery component comprises a microvesicle. Also known as circulating microvesicles or microparticles, microvesicles are membrane-bound vesicles that comprise phospholipids. In some instances, the microvesicle has a diameter of from about 50 nm to about 1000 nm, from about 100 nm to about 800 nm, from about 200 nm to about 500 nm, or from about 50 nm to about 400 nm.
[0239] In some instances, the microvesicle originates from cell membrane inversion, exocytosis, shedding, blebbing, or budding. In some instances, the microvesicles are generated from differentiated cells. In other instances, the microvesicles are generated from undifferentiated cells, e.g., blast cells, progenitor cells, or stem cells.
[0240] In some embodiments, the delivery component comprises a liposome. In some instances, the liposome comprises a plurality of lipopeptides, which are presented on the surface of the liposome, for targeted delivery to a site or region of interest. In some cases, the liposomes fuse with the target cell, whereby the contents of the liposome are then emptied into the target cell. In some cases, a liposome is endocytosed by cells that are phagocytic. Endocytosis is then followed by intralysosomal degradation of liposomal lipids and release of the encapsulated agents.
[0241] Exemplary liposomes suitable for incorporation include, but are not limited to, multilamellar vesicles (MLV), oligolamellar vesicles (OLV), unilamellar vesicles (UV), small unilamellar vesicles (SUV), medium-sized unilamellar vesicles (MUV), large unilamellar vesicles (LUV), giant unilamellar vesicles (GUV), multivesicular vesicles (MVV), single or oligolamellar vesicles made by reverse-phase evaporation method (REV), multilamellar vesicles made by the reverse-phase evaporation method (MLV-REV), stable plurilamellar vesicles (SPLV), frozen and thawed MLV (FATMLV), vesicles prepared by extrusion methods (VET), vesicles prepared by French press (FPV), vesicles prepared by fusion (FUV), dehydration-rehydration vesicles (DRV), and bubblesomes (BSV). In some instances, a liposome comprises Amphipol (A8-35). Techniques for preparing liposomes are described in, for example, COLLOIDAL DRUG DELIVERY SYSTEMS, vol. 66 (J. Kreuter ed., Marcel Dekker, Inc. (1994)).
[0242] Depending on the method of preparation, liposomes are unilamellar or multilamellar, and vary in size with diameters ranging from about 20 nm to greater than about 1000 nm.
[0243] In some instances, liposomes provided herein also comprise carrier lipids. In some embodiments the carrier lipids are phospholipids. Carrier lipids capable of forming liposomes include, but are not limited to, dipalmitoylphosphatidylcholine (DPPC), phosphatidylcholine (PC; lecithin), phosphatidic acid (PA), phosphatidylglycerol (PG), phosphatidylethanolamine (PE), or phosphatidylserine (PS). Other suitable phospholipids further include distearoylphosphatidylcholine (DSPC), dimyristoylphosphatidylcholine (DMPC), dipalmitoylphosphatidylglycerol (DPPG), distearoylphosphatidylglycerol (DSPG), dimyristoylphosphatidylglycerol (DMPG), dipalmitoylphosphatidic acid (DPPA), dimyristoylphosphatidic acid (DMPA), distearoylphosphatidic acid (DSPA), dipalmitoylphosphatidylserine (DPPS), dimyristoylphosphatidylserine (DMPS), distearoylphosphatidylserine (DSPS), dipalmitoylphosphatidylethanolamine (DPPE), dimyristoylphosphatidylethanolamine (DMPE), distearoylphosphatidylethanolamine (DSPE) and the like, or combinations thereof. In some embodiments, the liposomes further comprise a sterol (e.g., cholesterol) which modulates liposome formation. The carrier lipids are optionally any non-phosphate polar lipids.
[0244] In some embodiments, the delivery component comprises a micelle. In some instances, the micelle has a diameter from about 2 nm to about 250 nm, from about 20 nm to about 200 nm, from about 20 nm to about 100 nm, or from about 50 nm to about 100 nm.
[0245] In some instances, the micelle is a polymeric micelle, characterized by a core shell structure, in which the hydrophobic core is surrounded by a hydrophilic shell. In some cases, the hydrophilic shell further comprises a hydrophilic polymer or copolymer and a pH sensitive component.
[0246] Exemplary hydrophilic polymers or copolymers include, but are not limited to, poly(N-substituted acrylamides), poly(N-acryloyl pyrrolidine), poly(N-acryloyl piperidine), poly(N-acryl-L-amino acid amides), poly(ethyl oxazoline), methylcellulose, hydroxypropyl acrylate, hydroxyalkyl cellulose derivatives and poly(vinyl alcohol), poly(N-isopropylacrylamide), poly(N-vinyl-2-pyrrolidone), polyethyleneglycol derivatives, and combinations thereof.
[0247] The pH-sensitive moiety includes, but is not limited to, an alkylacrylic acid such as methacrylic acid, ethylacrylic acid, propyl acrylic acid and butyl acrylic acid, or an amino acid such as glutamic acid.
[0248] In some instances, the hydrophobic moiety constitutes the core of the micelle and includes, for example, a single alkyl chain, such as octadecyl acrylate or a double chain alkyl compound such as phosphatidylethanolamine or dioctadecylamine. In some cases, the hydrophobic moiety is optionally a water insoluble polymer such as a poly(lactic acid) or a poly(ε-caprolactone).
[0249] Polymeric micelles exhibiting pH-sensitive properties are also contemplated and are formed, e.g., by using pH-sensitive polymers including, but not limited to, copolymers from methacrylic acid, methacrylic acid esters and acrylic acid esters, polyvinyl acetate phthalate, hydroxypropyl methyl cellulose phthalate, cellulose acetate phthalate, or cellulose acetate trimellitate.
[0250] In some embodiments, the delivery component comprises a viral envelope. Viral envelopes comprise glycoproteins, phospholipids, and additional proteins obtained from a host. In some instances, the viral envelope is permissive to a wide range of target cells. In other instances, the viral envelope is non-permissive and is specific to a target cell of interest. In some cases, the viral envelope comprises a cell-specific binding protein and optionally a fusogenic molecule that aids in the fusion of the cargo into a target cell. In some cases, the viral envelope comprises an endogenous viral envelope. In other cases, the viral envelope is a modified envelope, comprising one or more foreign proteins.
[0251] In some instances, the viral envelope is derived from a DNA virus. Exemplary enveloped DNA viruses include viruses from the family of Herpesviridae, Poxviridae, and Hepadnaviridae.
[0252] In other instances, the viral envelope is derived from an RNA virus. Exemplary enveloped RNA viruses include viruses from the family of Bunyaviridae, Coronaviridae, Filoviridae, Flaviviridae, Orthomyxoviridae, Paramyxoviridae, Rhabdoviridae, and Togaviridae.
[0253] In additional instances, the viral envelope is derived from a virus from the family of Retroviridae.
[0254] In some embodiments, the viral envelope is from an oncolytic virus, such as an oncolytic DNA virus from the family of Herpesviridae (for example, HSV1) or Poxviridae (for example, Vaccinia virus and myxoma virus); or an oncolytic RNA virus from the family of Rhabdoviridae (for example, VSV) or Paramyxoviridae (for example MV and NDV).
[0255] In some instances, the viral envelope further comprises a foreign or engineered protein that binds to an antigen or a cell surface molecule. Exemplary antigens and cell surface molecules for targeting include, but are not limited to, P-glycoprotein, Her2/Neu, erythropoietin (EPO), epidermal growth factor receptor (EGFR), vascular endothelial growth factor receptor (VEGF-R), cadherin, carcinoembryonic antigen (CEA), CD4, CD8, CD19, CD20, CD33, CD34, CD45, CD117 (c-kit), CD133, HLA-A, HLA-B, HLA-C, chemokine receptor 5 (CCR5), stem cell marker ABCG2 transporter, ovarian cancer antigen CA125, immunoglobulins, integrins, prostate specific antigen (PSA), prostate stem cell antigen (PSCA), dendritic cell-specific intercellular adhesion molecule 3-grabbing nonintegrin (DC-SIGN), thyroglobulin, granulocyte-macrophage colony stimulating factor (GM-CSF), myogenic differentiation promoting factor-1 (MyoD-1), Leu-7 (CD57), LeuM-1, cell proliferation-associated human nuclear antigen defined by the monoclonal antibody Ki-67 (Ki-67), viral envelope proteins, HIV gp120, or transferrin receptor.
[0256] In some embodiments, the Arc-based capsid or endo-Gag-based capsid is for in vitro use.
[0257] In some instances, the Arc-based capsid or endo-Gag-based capsid is for ex vivo use.
[0258] In some cases, the Arc-based capsid or endo-Gag-based capsid is for in vivo use.
Kits/Article of Manufacture
[0259] Disclosed herein, in certain embodiments, are kits and articles of manufacture for use with one or more methods described herein. Such kits include a carrier, package, or container that is compartmentalized to receive one or more containers such as vials, tubes, and the like, each of the container(s) comprising one of the separate elements to be used in a method described herein. Suitable containers include, for example, bottles, vials, syringes, and test tubes. In one embodiment, the containers are formed from a variety of materials such as glass or plastic.
[0260] For example, the container(s) include a recombinant or engineered Arc or endo-Gag polypeptide described above. Such kits optionally include an identifying description or label or instructions relating to its use in the methods described herein. For example, a kit typically includes labels listing contents and/or instructions for use, and package inserts with instructions for use. A set of instructions will also typically be included.
Certain Terminologies
[0261] Unless defined otherwise, all technical and scientific terms used herein have the same meaning as is commonly understood. It is to be understood that the detailed description is exemplary and explanatory only and is not restrictive of any subject matter claimed. In this application, the use of the singular includes the plural unless specifically stated otherwise. It must be noted that, as used in the specification, the singular forms "a," "an" and "the" include plural referents unless the context clearly dictates otherwise. In this application, the use of "or" means "and/or" unless stated otherwise. Furthermore, use of the term "including" as well as other forms, such as "include", "includes," and "included," is not limiting.
[0262] Although various features of the invention may be described in the context of a single embodiment, the features may also be provided separately or in any suitable combination. Conversely, although the invention may be described herein in the context of separate embodiments for clarity, the invention may also be implemented in a single embodiment.
[0263] Reference in the specification to "some embodiments", "an embodiment", "one embodiment" or "other embodiments" means that a particular feature, structure, or characteristic described in connection with the embodiments is included in at least some embodiments, but not necessarily all embodiments, of the inventions.
[0264] As used herein, ranges and amounts can be expressed as "about" a particular value or range. About also includes the exact amount. Hence "about 5 µL" means "about 5 µL" and also "5 µL." Generally, the term "about" includes an amount that would be expected to be within experimental error.
[0265] The section headings used herein are for organizational purposes only and are not to be construed as limiting the subject matter described.
[0266] As used herein, the sequence of a CA N-lobe described herein corresponds to the human CA N-lobe. In some instances, the human CA N-lobe comprises residues 207-278 of SEQ ID NO: 1. In some instances, a CA N-lobe described herein comprises about 30%, 40%, 50%, 60%, 70%, 80%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% sequence identity to residues 207-278 of SEQ ID NO: 1. In some cases, a CA N-lobe described herein shares a structural similarity with the human CA N-lobe. For example, a CA N-lobe described herein shares about 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% structural similarity with the human CA N-lobe. In some cases, the CA N-lobe shares a high structural similarity (e.g., 80%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% structural similarity) but does not share a high sequence identity (e.g., the sequence identity is lower than 80%, lower than 70%, lower than 60%, lower than 50%, lower than 40%, or lower than 30%). In some cases, the CA N-lobe comprises residues 207-278 of SEQ ID NO: 1.
[0267] As used herein, the sequence of a CA C-lobe described herein corresponds to the human CA C-lobe. In some instances, the human CA C-lobe comprises residues 278-370 of SEQ ID NO: 1. In some instances, a CA C-lobe described herein comprises about 30%, 40%, 50%, 60%, 70%, 80%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% sequence identity to residues 278-370 of SEQ ID NO: 1. In some cases, a CA C-lobe described herein shares a structural similarity with the human CA C-lobe. For example, a CA C-lobe described herein shares about 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% structural similarity with the human CA C-lobe. In some cases, the CA C-lobe shares a high structural similarity (e.g., 80%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% structural similarity) but does not share a high sequence identity (e.g., the sequence identity is lower than 80%, lower than 70%, lower than 60%, lower than 50%, lower than 40%, or lower than 30%). In some cases, the CA C-lobe comprises residues 278-370 of SEQ ID NO: 1.
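As an illustration only, the residue windows and percent sequence identity referred to above can be sketched in Python. The sketch assumes the two sequences have already been aligned without gaps and are of equal length, which is a simplification; the function names and the toy strings are hypothetical and are not SEQ ID NO: 1 or any sequence of Table 1.

```python
def window(seq: str, start: int, end: int) -> str:
    """Return residues start..end of a sequence (1-based, inclusive)."""
    if start < 1 or end > len(seq) or start > end:
        raise ValueError("residue range falls outside the sequence")
    return seq[start - 1:end]


def percent_identity(a: str, b: str) -> float:
    """Percent of identical positions in two pre-aligned, equal-length strings."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


# Toy strings for demonstration; NOT real Arc sequences.
reference = "MKTAYIAKQR"
candidate = "MKTAYLAKQK"
print(percent_identity(window(reference, 3, 8), window(candidate, 3, 8)))
```

With a full-length reference sequence, window(reference, 207, 278) would select the 72 residues recited for the N-lobe and window(reference, 278, 370) the 93 residues recited for the C-lobe. Sequences of different lengths would first need a proper alignment, which this sketch does not perform.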
[0268] As used herein, the terms "individual(s)", "subject(s)" and "patient(s)" mean any mammal. In some embodiments, the mammal is a human. In some embodiments, the mammal is a non-human. None of the terms require or are limited to situations characterized by the supervision (e.g. constant or intermittent) of a health care worker (e.g. a doctor, a registered nurse, a nurse practitioner, a physician's assistant, an orderly or a hospice worker).
EXAMPLES
[0269] These examples are provided for illustrative purposes only and not to limit the scope of the claims provided herein.
Example 1--Construction of DNA Vectors Encoding Recombinant Arc Proteins and Engineered Arc Proteins
[0270] To construct recombinant DNA vectors for Arc expression, full length cDNA open reading frames, excluding the initial methionine, are inserted into a cloning vector and subsequently transferred into an expression vector according to standard methods. The same approach is used to construct recombinant DNA vectors for expressing endo-Gag proteins. Human Arc cDNA includes an annotated matrix domain (MA) and a capsid domain. The capsid domain has an N-terminal lobe (NTD) and a C-terminal lobe (CTD). FIG. 1 illustrates the structure of the Human Arc protein and the predicted structure of Arc from Python, Platypus, and Orca.
[0271] cDNAs encoding engineered Arc proteins are optionally generated by recombining Arc sequences from different species (FIG. 2), by inserting functional domains from other proteins into an Arc protein (FIG. 3A), by modifying the sequence of an Arc protein (FIG. 3B), and/or by any combination of the approaches exemplified in FIGS. 2-3. cDNAs encoding engineered endo-Gag proteins are likewise generated by recombining endo-Gag sequences from different species, by inserting functional domains from other proteins into an endo-Gag protein, by modifying the sequence of an endo-Gag protein, and/or by any combination of these approaches. Furthermore, an engineered endo-Gag protein optionally contains Arc sequences and an engineered Arc protein optionally contains endo-Gag sequences. Engineered Arc and endo-Gag protein monomers assemble into capsids.
[0272] cDNAs encoding the Arc and endo-Gag proteins of Table 1 were inserted into an expression vector derived from pET-41a(+) (EMD Millipore (Novagen) Cat #70566). The entire cloning site of pET-41a(+) was removed and replaced with DNA having the nucleotide sequence of SEQ ID NO: 57, which encodes an alternative N-terminal tag having the amino acid sequence of SEQ ID NO: 58 and comprising a 6×His tag (SEQ ID NO: 59), a 6 amino acid spacer (SEQ ID NO: 60), and an AcTEV™ cleavage site (SEQ ID NO: 61). Arc and endo-Gag open reading frames without their starting methionine codon were inserted after the AcTEV™ cleavage site by Gibson assembly (Gibson D G, Young L, Chuang R Y, Venter J C, Hutchison C A 3rd, Smith H O (2009). "Enzymatic assembly of DNA molecules up to several hundred kilobases". Nature Methods. 6 (5): 343-345). After expression and AcTEV™ cleavage, the N-terminus of the resulting Arc or endo-Gag protein has a single residual glycine from the AcTEV™ cleavage site.
TABLE-US-00001
SEQ ID NO: 57  ATGCATCACCATCACCATCACGGCTCAGGGTCTGGTAGCGAAAATCTGTACTTCCAGGGG
SEQ ID NO: 58  MHHHHHHGSGSGSENLYFQG
SEQ ID NO: 59  HHHHHH
SEQ ID NO: 60  GSGSGS
SEQ ID NO: 61  ENLYFQG
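The relationship between the tag sequences listed above can be checked with a short Python sketch. It translates SEQ ID NO: 57 with the standard genetic code and compares the result with SEQ ID NO: 58, then shows which residue remains when the ENLYFQG site is cut between Q and G. The function names and the "XXXX" placeholder standing in for a downstream open reading frame are hypothetical and used for illustration only.

```python
# Standard genetic code, built from the conventional TCAG codon ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

SEQ_ID_57 = "ATGCATCACCATCACCATCACGGCTCAGGGTCTGGTAGCGAAAATCTGTACTTCCAGGGG"
SEQ_ID_58 = "MHHHHHHGSGSGSENLYFQG"
TEV_SITE = "ENLYFQG"  # SEQ ID NO: 61


def translate(dna: str) -> str:
    """Translate a DNA open reading frame codon by codon."""
    dna = dna.upper()
    if len(dna) % 3 != 0:
        raise ValueError("sequence length is not a multiple of three")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


def residual_after_cleavage(tagged_protein: str) -> str:
    """Return the C-terminal product of a cut between Q and G of ENLYFQG."""
    site_start = tagged_protein.index(TEV_SITE)
    cut = site_start + len(TEV_SITE) - 1  # index of the final G of the site
    return tagged_protein[cut:]


tag = translate(SEQ_ID_57)
print(tag == SEQ_ID_58)
# "XXXX" is a hypothetical placeholder for the Arc or endo-Gag residues.
print(residual_after_cleavage(tag + "XXXX"))
```

The 60 nucleotides encode 20 residues: the initiating methionine, six histidines (SEQ ID NO: 59), the GSGSGS spacer (SEQ ID NO: 60), and the ENLYFQG cleavage site (SEQ ID NO: 61), consistent with the single residual glycine described for the cleaved protein.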
TABLE-US-00002
TABLE 1
Sequences of Arc and endo-Gag polypeptides and nucleotides.

Gene     Common name             Proper name                      Sequence ID       SEQ ID NO:    SEQ ID NO:
name     (species)               (species)                                          (amino acid)  (DNA)
Arc      Human                   Homo sapiens                     NP_056008.1       1             29
Arc      Killer Whale            Orcinus orca                     XP_004265337.1    2             30
Arc      White Tailed Deer       Odocoileus virginianus texanus   XP_020755692.1    3             31
Arc      Platypus                Ornithorhynchus anatinus         XP_001512750.1    4             32
Arc      Goose                   Anser cygnoides domesticus       XP_013046406.1    5             33
Arc      Dalmatian Pelican       Pelecanus crispus                KFQ60200.1        6             34
Arc      White Tailed Eagle      Haliaeetus albicilla             KFQ04633.1        7             35
Arc      King Cobra              Ophiophagus hannah               ETE60609.1        8             36
Arc      Ray Finned Fish         Austrofundulus limnaeus          XP_013881732.1    9             37
Arc      Sperm Whale             Physeter catodon                 XP_007119193.2    10            38
Arc      Turkey                  Meleagris gallopavo              XP_010707654.1    11            39
Arc      Central Bearded Dragon  Pogona vitticeps                 XP_020633722.1    12            40
Arc      Chinese Alligator       Alligator sinensis               XP_006027442.1    13            41
Arc      American Alligator      Alligator mississippiensis       XP_019337372.1    14            42
Arc      Japanese Gekko          Gekko japonicus                  XP_015273745.1    15            43
PNMA3    Human                   Homo sapiens                     NP_001269464.1    16            44
PNMA5    Human                   Homo sapiens                     NP_001096620.1    17            45
PNMA6A   Human                   Homo sapiens                     NP_116271.3       18            46
PNMA6B   Human                   Homo sapiens                     SP_P0C5W0.1       19            47
RTL3     Human                   Homo sapiens                     NP_689907.1       20            48
RTL6     Human                   Homo sapiens                     NP_115663.2       21            49
RTL8A    Human                   Homo sapiens                     NP_001071640.1    22            50
RTL8B    Human                   Homo sapiens                     NP_01071641.1     23            51
BOP      Human                   Homo sapiens                     NP_078903.3       24            52
LDOC1    Human                   Homo sapiens                     NP_036449.1       25            53
ZNF18    Human                   Homo sapiens                     NP_001290210.1    26            54
MOAP1    Human                   Homo sapiens                     AAG31786.1        27            55
PEG10    Human                   Homo sapiens                     NP_055883.2       28            56
Example 2--Expression and Purification of Arc and Endo-Gag Proteins
[0273] Expression vector constructs comprising Arc and endo-Gag open reading frames were transformed into the Rosetta 2 (DE3)pLysS E. coli strain (Millipore Sigma, Cat #71403). Arc or endo-Gag expression was induced with 0.1 mM IPTG followed by a 16-hour incubation at 16° C. Cell pellets were lysed by sonication in 20 mM sodium phosphate pH 7.4, 0.1 M NaCl, 40 mM imidazole, 1 mM DTT, and 10% glycerol. The lysate was treated with excess TURBO DNase (Thermo Fisher Scientific, Cat #AM2238), RNase Cocktail (Thermo Fisher Scientific, Cat #AM2286), and Benzonase Nuclease (Millipore Sigma, Cat #71205) to eliminate nucleic acids. NaCl was added to the lysate to adjust the NaCl concentration to 0.5 M, followed by centrifugation and filtration to remove cellular debris. 6×His-tagged recombinant protein was loaded onto a HisTrap HP column (GE Healthcare, Cat #17-5247-01), washed with buffer A (20 mM sodium phosphate pH 7.4, 0.5 M NaCl, 40 mM imidazole, and 10% glycerol), and eluted with a linear gradient of buffer B (20 mM sodium phosphate pH 7.4, 0.5 M NaCl, 500 mM imidazole, and 10% glycerol). Collection tubes were supplemented in advance with 10 µl of 0.5 M EDTA pH 8.0 per 1 ml eluate. The resulting Arc or endo-Gag protein is generally more than 95% pure as revealed by SDS-PAGE analysis, with a yield of up to 50 mg per 1 L of bacterial culture (FIG. 4A).
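The two concentration adjustments in the purification above reduce to simple mass balances, sketched below in Python. The 5 M NaCl stock and the 100 ml lysate volume are assumptions chosen for illustration, since the protocol does not state the form in which NaCl was added or the lysate volume; the function names are likewise hypothetical.

```python
def stock_volume_to_raise(v0_ml: float, c0_m: float, target_m: float, stock_m: float) -> float:
    """Volume of stock (ml) to add so the mixture reaches target_m.

    Mass balance: c0 * v0 + stock * v = target * (v0 + v).
    """
    if not c0_m <= target_m < stock_m:
        raise ValueError("need c0 <= target < stock concentration")
    return v0_ml * (target_m - c0_m) / (stock_m - target_m)


def final_concentration(stock_m: float, stock_vol: float, sample_vol: float) -> float:
    """Concentration (M) after adding stock_vol of stock to sample_vol of sample."""
    return stock_m * stock_vol / (stock_vol + sample_vol)


# Assumed: 100 ml of lysate at 0.1 M NaCl, adjusted to 0.5 M with a 5 M stock.
nacl_ml = stock_volume_to_raise(v0_ml=100.0, c0_m=0.1, target_m=0.5, stock_m=5.0)
print(f"5 M NaCl to add: {nacl_ml:.2f} ml")

# From the protocol: 10 µl of 0.5 M EDTA per 1 ml (1000 µl) of eluate.
edta_m = final_concentration(stock_m=0.5, stock_vol=10.0, sample_vol=1000.0)
print(f"Final EDTA: {edta_m * 1000:.2f} mM")
```

Under these assumptions, roughly 8.9 ml of stock is needed per 100 ml of lysate, and the EDTA supplied in advance gives a final concentration of just under 5 mM in each collected fraction.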
[0274] Residual nucleic acid was removed by anion exchange chromatography on a mono Q 5/50 GL column (GE Healthcare, Cat #17516601). Before loading onto the column, recombinant protein was buffer exchanged to buffer C (20 mM Tris-HCl pH 8.0, 100 mM NaCl, and 10% glycerol) using "Pierce Protein Concentrator PES, 10K MWCO, 5-20 ml" (Thermo Scientific, Cat #88528) according to the manufacturer's protocol. After loading, the mono Q resin was washed with 2 ml of buffer C. Arc and endo-Gag proteins were eluted using a linear gradient of buffer D (20 mM Tris-HCl pH 8.0, 500 mM NaCl, and 10% glycerol). RNA efficiently separated from Arc and eluted at 600 mM NaCl (FIG. 4B).
[0275] The N-terminal 6×His tag and spacer were removed by concentrating peak fractions of the mono Q purified Arc using a 10 kDa MWCO PES concentrator and then treating them with 10% v/v AcTEV™ Protease (Invitrogen™ #12575023). The cleavage efficiency was above 99% as revealed by SDS-PAGE. The protein was then diluted into HisTrap buffer A and cleaned with HisTrap HP resin. The resulting purified Arc has an N-terminal glycine residue and does not contain the initial methionine.
Example 3--Capsid Assembly
[0276] Cleaved Arc protein (1 mg/mL) was loaded into a 20 kDa MWCO dialysis cassette and dialyzed overnight in 1 M sodium phosphate (pH 7.5) at room temperature. The following day, the solution was removed from the cassette, transferred to microcentrifuge tubes, and spun at maximum speed for 5 minutes in a tabletop centrifuge. The supernatant was transferred to a 100 kDa MWCO Regenerated Cellulose Amicon Ultrafiltration Centrifugal concentrator. The buffer was exchanged to PBS pH 7.5 and the volume was reduced 20-fold.
[0277] Capsid assembly was assayed by transmission electron microscopy. EM grids (Carbon Support Film, Square Grid, 400 mesh, 5-6 nm, Copper, CF400-Cu-UL) were prepared by glow discharge. A 5 µL sample of purified Arc was applied to the grid for 20 seconds and then wicked away using filter paper. The grid was then washed with MilliQ H₂O, stained with 5 µL of 1% uranyl acetate in H₂O for 30 seconds, and air dried for 1 minute. Images of Arc capsids were acquired using a FEI Talos L120C TEM equipped with a Gatan 4k × 4k OneView camera. FIG. 5 shows concentrated human Arc capsids. FIG. 6 shows capsids formed from recombinantly expressed Arc orthologs from other vertebrate species. FIG. 7 shows capsids formed from recombinantly expressed endo-Gag genes from other vertebrate species.
Example 4--Selective Cellular Internalization of Arc Capsids
[0278] Capsids assembled from isolated recombinant human Arc protein (0.5 mg/ml) were fluorescently labeled by reacting with a 50-fold molar excess of Alexa Fluor™ 594 NHS ester dye (Invitrogen™ #A20004) (dissolved in DMSO) in PBS (pH 8.5). Reactions were allowed to proceed for 2 hours in the dark. Alexa594-labeled capsids were then dialyzed against PBS (pH 7.5) overnight at room temperature in the dark with at least two buffer exchanges to remove any unreacted dye.
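The dye concentration implied by a 50-fold molar excess follows from the molar concentration of the protein. The sketch below is illustrative only: the monomer molecular weight of 45,000 Da is an assumed round figure for human Arc and is not stated in the procedure, and the function names are hypothetical.

```python
def protein_molarity_um(conc_mg_per_ml, mw_da):
    """Convert a protein concentration in mg/ml to micromolar.
    mg/ml equals g/L, so g/L divided by g/mol gives mol/L; times 1e6 gives µM."""
    if mw_da <= 0:
        raise ValueError("molecular weight must be positive")
    return conc_mg_per_ml / mw_da * 1e6


def dye_concentration_um(conc_mg_per_ml, mw_da, fold_excess):
    """Dye concentration (µM) for a given fold molar excess over protein."""
    return fold_excess * protein_molarity_um(conc_mg_per_ml, mw_da)


if __name__ == "__main__":
    ARC_MW_DA = 45_000  # assumption: approximate Arc monomer mass, not from the text
    protein_um = protein_molarity_um(0.5, ARC_MW_DA)
    dye_um = dye_concentration_um(0.5, ARC_MW_DA, 50)
    print(f"Arc monomer: {protein_um:.1f} uM; dye at 50-fold excess: {dye_um:.0f} uM")
```

The calculation is per monomer; a different assumed molecular weight scales the result proportionally.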
[0279] HeLa cells (ATCC® CCL-2™) were seeded 24 hours prior to the experiment in 96-well plates at counts such that they reached ~80% confluency at the time of treatment. Labeled capsids were then spiked into complete tissue culture media to a final capsid concentration of 0.05 mg/ml. Treatments proceeded for 4 hours at 37° C., and the cells were then washed 3 times with imaging media (DMEM, no phenol red, with 10% FBS and 20 mM HEPES) containing 10 µg/ml Hoechst nuclear stain prior to imaging. Fluorescence microscopy revealed a punctate staining pattern, suggesting that the Arc capsids were internalized by the HeLa cells (FIG. 8). Little or no intracellular staining was observed after administration of Alexa Fluor™ 594-labeled bovine serum albumin (BSA) (final concentration of 0.05 mg/ml) or 45.6 µM Alexa Fluor™ 594 under identical conditions.
Example 5--Heterologous RNA Delivery by Arc Capsids
[0280] Human Arc capsids were loaded with Cre RNA by spiking in excess RNA during capsid formation (by dialysis into 1 M sodium phosphate). Cre RNA-loaded capsids were administered to HeLa cells in biological triplicate at a final capsid concentration of 0.05 mg/ml for 4 hours at 37° C. The cells were then washed 3 times with ice-cold 1× PBS prior to RNA extraction (Invitrogen™ TRIzol™ Reagent #15596026). Purified cell-associated RNA was quantified by qPCR in technical triplicate, normalizing values to cellular GAPDH levels, and comparing to Escherichia coli rrsA mRNA and Arc RNA that could have carried over from protein purification. Table 2 shows the primers used for the PCR reaction. The amount of cell-associated Cre RNA detected was >27-fold higher when Arc capsids were loaded with Cre RNA compared to control capsids not loaded with Cre RNA (FIG. 9).
TABLE-US-00003 TABLE 2 Primers for qPCR quantification of RNA delivered by Arc capsids to HeLa cells
Gene - Primer    Sequence                      SEQ ID NO:
GAPDH-F          AAGCTCATTTCCTGGTATGACAACGA    62
GAPDH-R          AGGGTCTCTCTCTTCCTCTTGTGCT     63
rrsA-F           GCTCAACCTGGGAACTGCATCTGAT     64
rrsA-R           TAATCCTGTTTGCTCCCCACGCTTT     65
Arc CDS-F        GGCCCCTCAGCTCCAGTGATTC        66
Arc CDS-R        CCTGTTGTCACTCTCCTGGCTCTGA     67
Cre CDS-F        GCCAAGACATAAGAAACCTCGCCT      68
Cre CDS-R        GTGAATCAACATCCTCCCTCCGTC      69
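One common way to express qPCR results normalized to a reference gene such as GAPDH is the comparative Ct (2^-ΔΔCt) calculation. The sketch below assumes that method and approximately 100% primer efficiency; the text does not state which calculation was used. The Ct values are hypothetical placeholders chosen for illustration and are not measurements from this example.

```python
from statistics import mean


def delta_ct(target_cts, reference_cts):
    """Mean target Ct minus mean reference Ct for one sample
    (each list holds technical replicates)."""
    return mean(target_cts) - mean(reference_cts)


def fold_change(sample_target, sample_ref, control_target, control_ref):
    """Relative abundance of the target in sample versus control,
    2 ** -(dCt_sample - dCt_control), assuming ~100% primer efficiency."""
    ddct = delta_ct(sample_target, sample_ref) - delta_ct(control_target, control_ref)
    return 2.0 ** (-ddct)


if __name__ == "__main__":
    # Hypothetical technical-triplicate Ct values, for illustration only.
    loaded_cre = [20.0, 20.0, 20.0]     # Cre, cells treated with Cre RNA-loaded capsids
    loaded_gapdh = [18.0, 18.0, 18.0]   # GAPDH, same cells
    control_cre = [25.0, 25.0, 25.0]    # Cre, cells treated with unloaded capsids
    control_gapdh = [18.0, 18.0, 18.0]  # GAPDH, same cells

    fc = fold_change(loaded_cre, loaded_gapdh, control_cre, control_gapdh)
    print(f"Cre RNA fold change versus control: {fc:.1f}")
```

With biological triplicates, the same calculation would be applied per replicate and the resulting fold changes summarized.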
[0281] FIG. 10 illustrates an alternative method of demonstrating the delivery of a heterologous RNA by an Arc or endo-Gag capsid. 6×His-tagged Arc or endo-Gag genes are expressed in a host cell. The resulting Arc monomers are mixed with translatable Cre mRNA under capsid-forming conditions to form Cre mRNA-loaded capsids. Cre-loaded capsids are then administered to LoxP-luciferase reporter mice. Upon successful delivery of Cre mRNA into mouse cells and subsequent translation of Cre recombinase protein, LoxP sites of the reporter are recombined, leading to luciferase expression, which is optionally detected by bioluminescence imaging upon administration of luciferin. This method is used to test the transmission potential of candidate Arc and endo-Gag genes. A positive luciferase signal indicates that the candidate Arc or endo-Gag gene encodes an Arc or endo-Gag protein capable of assembling into capsids that incorporate a heterologous cargo and deliver that cargo to a target cell.
[0282] While preferred embodiments of the present invention have been shown and described herein, it will be obvious to those skilled in the art that such embodiments are provided by way of example only. Numerous variations, changes, and substitutions will now occur to those skilled in the art without departing from the invention. It should be understood that various alternatives to the embodiments of the invention described herein may be employed in practicing the invention. It is intended that the following claims define the scope of the invention and that methods and structures within the scope of these claims and their equivalents be covered thereby.
TABLE-US-00004 TABLE 3 Arc and endo-Gag amino acid and nucleotide sequences SEQ ID NO: 1 GELDHRTSGGLHAYPGPRGGQVAKPNVILQIGKCRAEMLEHVRRTHRHLLAEVSKQVERELKGLHRSVGKLESN LDGYVPTSDSQRWKKSIKACLCRCQETIANLERWVKREMHVWREVFYRLERWADRLESTGGKYPVGSESARHT VSVGVGGPESYCHEADGYDYTVSPYAITPPPAAGELPGQEPAEAQQYQPWVPGEDGQPSPGVDTQIFEDPREF LSHLEEYLRQVGGSEEYWLSQIQNHMNGPAKKWWEFKQGSVKNWVEFKKEFLQYSEGTLSREAIQRELDLPQ KQGEPLDQFLWRKRDLYQTLYVDADEEEIIQYVVGTLQPKLKRFLRHPLPKTLEQLIQRGMEVQDDLEQAAEPA GPHLPVEDEAETLTPAPNSESVASDRTQPE SEQ ID NO: 2 GELDQRTTGGLHAYPAPRGGPVAKPNVILQIGKCRAEMLEHVRRTHRHLLTEVSKQVERELKGLHRSVGKLESN LDGYVPTGDSQRWRKSIKACLCRCQETIANLERWVKREMHVWREVFYRLERWADRLESMGGKYPVGSNPSR HTTSVGVGGPESYGHEADTYDYTVSPYAITPPPAAGELPGQEAVEAQQYPPWGLGEDGQPSPGVDTQIFEDPR EFLSHLEEYLRQVGGSEEYWLSQIQNHMNGPAKKWWEYKQGSVKNWVEFKKEFLQYSEGALSREAVQRELDL PQKQGEPLDQFLWRKRDLYQTLYVDADEEEIIQYVVGTLQPKLKRFLRPPLPKTLEQLIQKGMEVEDGLEQVAE- P ASPHLPTEEESEALTPALTSESVASDRTQPE SEQ ID NO: 3 GELDHRTTGGLHAYPAPRGGPAAKPNVILQIGKCRAEMLEHVRRTHRHLLAEVSKQVERELKGLHRSVGKLESN LDGYVPTGDSQRWKKSIKACLSRCQETIANLERWVKREMHVWREVFYRLERWADRLESGGGKYPVGSDPARH TVSVGVGGPESYCQDADNYDYTVSPYAITPPPAAGQLPGQEEVEAQQYPPWAPGEDGQLSPGVDTQVFEDPR EFLRHLEDYLRQVGGSEEYWLSQIQNHMNGPAKKWWEYKQGSVKNWVEFKKEFLQYSEGTLSREAIQRELDL PQKQGEPLDQFLWRKRDLYQTLYVDAEEEEIIQYVVGTLQPKLKRFLRPPLPKTLEQLIQKGMEVQDGLEQAAE- P AAEEAEALTPALTNESVASDRTQPE SEQ ID NO: 4 GELDRLNPSSGLHPSSGLHPYPGLRGGATAKPNVILQIGKCRAEMLEHVRKTHRHLLTEVSRQVERELKGLHKS- V GKLESNLDGYVPSSDSQRWKKSIKACLSRCQETIAHLERWVKREMNVWREVFYRLERWADRLEAMGGKYPAG EQARRTVSVGVGGPETCCPGDESYDCPISPYAVPPSTGESPESLDQGDQHYQQWFALPEESPVSPGVDTQIFED PREFLRHLEKYLKQVGGTEEDWLSQIQNHMNGPAKKWWEYKQGSVKNWLEFKKEFLQYSEGTLTRDALKREL DLPQKQGEPLDQFLWRKRDLYQTLYVDADEEEIIQYVVGTLQPKLKRFLHHPLPKTLEQLIQRGQEVQNGLEPT- D DPAGQRTQSEDNDESLTPAVTNESTASEGTLPE SEQ ID NO: 5 GQLDNVTNAGIHSFQGHRGVANKPNVILQIGKCRAEMLEHVRRTHRHLLSEVSKQVERELKGLQKSVGKLENN LEDHVPTDNQRWKKSIKACLARCQETIAHLERWVKREMNVWKEVFFRLEKWADRLESMGGKYCPGEHGKQT VSVGVGGPEIRPSEGEIYDYALDMSQMYALTPPPGEMPSIPQAHDSYQWVSVSEDAPASPVETQVFEDPREFLS 
HLEEYLKQVGGTEEYWLSQIQNHMNGPAKKWWEYKQDSVKNWVEFKKEFLQYSEGTLTRDAIKRELDLPQKE GEPLDQFLWRKRDLYQTLYVDADEEEIIQYVVGTLQPKLKRFLSYPLPKTLEQLIQRGKEVQGNMDHSDEPSPQ- R TPEIQSGDSVESMPPSTTASPVPSNGTQPEPPSPPATVI SEQ ID NO: 6 GQLDNVTNAGIHSFQGHRGVANKPNVILQIGKCRAEMLEHVRRTHRHLLSEVSKQVERELKGLQKSVGKLENN LEDHVPTDNQRWKKSIKACLARCQETIAHLERWVKREMNVWKEVFFRLEKWADRLESMGGKYCPGEHGKQT VSVGVGGPEIRPSEGEIYDYALDMSQMYALTPPPGEVPSIPQAHDSYQWVSVSEDAPASPVETQVFEDPREFLS HLEEYLKQVGGTEEYWLSQIQNHMNGPAKKWWEYKQDSVKNWVEFKKEFLQYSEGTLTRDAIKRELDLPQKE GEPLDQFLWRKRDLYQTLYVDADEEEIIQYVVGTLQPKLKRFLSYPLPKTLEQLIQRGKEVQGNMDHSEEPSPQ- R TPEIQSGDSVDSVPPSTTASPVPSNGTQPE SEQ ID NO: 7 GQLDNVTNAGIHSFQGHRGVANKPNVILQIGKCRAEMLEHVRRTHRHLLSEVSKQVERELKGLQKSVGKLENN LEDHVPTDNQRWKKSIKACLARCQETIAHLERWVKREMNVWKEVFFRLEKWADRLESMGGKYCPGDHGKQT VSVGVGGPEIRPSEGEIYDYALDMSQMYALTPPPGEVPSIPQAHDSYQWVSTSEDAPASPVETQVFEDPREFLS HLEEYLKQVGGTEEYWLSQIQNHMNGPAKKWWEYKQDSVKNWVEFKKEFLQYSEGTLTRDAIKRELDLPQKE GEPLDQFLWRKRDLYQTLYVDADEEEIIQYVVGTLQPKLKRFLSYPLPKTLEQLIQRGKEVQGNMDHSEEPSPQ- R TPEIQSGDSVDSVPPSTTASPVPSNGTQPE SEQ ID NO: 8 GSWGLQRHVADERRGLATPTYGAVCSIREKKASQLSGQSCLEKELLGWKCTEAIVEMMQVDNFNHGNLHSCQ GHRGMANHKPNVILQIGKCRAEMLDHVRRTHRHLLTEVSKQVERELKSLQKSVGKLENNLEDHVPSAAENQR WKKSIKACLARCQETIAHLERWVKREINVWKEVFFRLEKWADRLESGGGKYGPGDQSRQTVSVGVGAPEIQPR KEEIYDYALDMSQMYALTPPPMGEDPNVPQSHDSYQWITISDDSPPSPVETQIFEDPREFLTHLEDYLKQVGGT EEYWLSQIQNHMNGPAKKWWEYKQDSVKNWLEFKKEFLQYSEGTLTRDAIKQELDLPQKDGEPLDQFLWRK RDLYQTLYIDAEEEEVIQYVVGTLQPKLKRFLSHPYPKTLEQLIQRGKEVEGNLDNSEEPSPQRSPKHQLGGSV- ESL PPSSTASPVASDETHPDVSAPPVTVI SEQ ID NO: 9 GDGETQAENPSTSLNNTDEDILEQLKKIVMDQQHLYQKELKASFEQLSRKMFSQMEQMNSKQTDLLLEHQKQ TVKHVDKRVEYLRAQFDASLGWRLKEQHADITTKIIPEIIQTVKEDISLCLSTLCSIAEDIQTSRATIVTGHAA- VQTH PVDLLGEHHLGTTGHPRLQSTRVGKPDDVPESPVSLFMQGEARSRIVGKSPIKLQFPTFGKANDSSDPLQYLER- C EDFLALNPLTDEELMATLRNVLHGTSRDWWDVARHKIQTWREFNKHFRAAFLSEDYEDELAERVRNRIQKEDE SIRDFAYMYQSLCKRWNPAICEGDVVKLILKNINPQLPSQLRSRVTTVDELVRLGQQLEKDRQNQLQYELRKSS- G KIIQKSSSCETSALPNTKSTPNQQNPATSNRPPQVYCWRCKGHHAPASCPQWKADKHRAQPSRSSGPQTLTNL QAQDI SEQ ID NO: 
10 GELDQRAAGGLRAYPAPRGGPVAKPSVILQIGKCRAEMLEHVRRTHRHLLTEVSKQVERELKGLHRSVGKLEGN LDGYVPTGDSQRWKKSIKACLCRCQETIANLERWVKREMHVWREVFYRLERWADRLESMGGKYPVGTNPSR HTVSVGVGGPEGYSHEADTYDYTVSPYAITPPPAAGELPGQEAVEAQQYPPWGLGEDGQPGPGVDTQIFEDP REFLSHLEEYLRQVGGSEEYWLSQIQNHMNGPAKKWWEFKQGSVKNWVEFKKEFLQYSEGTLSREAIQRELDL PQKQGEPLDQFLWRKRDLYQTLYVDAEEEEIIQYVVGTLQPKLKRFLRPPLPKTLEQLIQKGMEVQDGLEQAAE- P ASPRLPPEEESEALTPALTSESVASDRTQPE SEQ ID NO: 11 GQLDNVTNAGIHSFQGHRGVANKPNVILQIGKCRAEMLEHVRRTHRHLLSEVSKQVERELKGLQKSVGKLENN LEDHVPTDNQRWKKSIKACLARCQETIAHLERWVKREMNVWKEVFFRLEKWADRLESMGGKYCPGEHGKQT VSVGVGGPEIRPSEGEIYDYALDMSQMYALTPGPGEVPSIPQAHDSYQWVSVSEDAPASPVETQIFEDPHEFLS HLEEYLKQVGGTEEYWLSQIQNHMNGPAKKWWEYKQDSVKNWVEFKKEFLQYSEGTLTRDAIKRELDLPQKE GEPLDQFLWRKRDLYQTLYVDADEEEIIQYVVGTLQPKLKRFLSYPLPKTLEQLIQRGKEVQGNMDHSEEPSPQ- R TPEIQSGDSVESMPPSTTASPVPSNGTQPEPPSPPATVI SEQ ID NO: 12 GQLENINQGSLHAFQGHRGVVHNNKPNVILQIGKCRAEMLEHVRRTHRHLLTEVSKQVERELKGLQKSVGKLE NNLEDHVPSAAENQRWKKSIKACLARCQETIANLERWVKREMNVWKEVFFRLERWADRLESGGGKYCHADQ GRQTVSVGVGGPEVRPSEGEIYDYALDMSQMYALTPPPMGDVPVIPQPHDSYQWVTDPEEAPPSPVETQIFE DPREFLTHLEDYLKQVGGTEEYWLSQIQNHMNGPAKKWWEYKQDSVKNWLEFKKEFLQYSEGTLTRDAIKQE LDLPQKEGEPLDQFLWRKRDLYQTLYVEAEEEEVIQYVVGTLQPKLKRFLSHPYPKTLEQLIQRGKEVEGNLDN- SE EPSPQRTPEHQLGDSVESLPPSTTASPAGSDKTQPEISLPPTTVI SEQ ID NO: 13 GQLDSVTNAGVHTYQGHRSVANKPNVILQIGKCRTEMLEHVRRTHRHLLTEVSKQVERELKGLQKSVGKLENN LEDHVPTDNQRWKKSIKACLARCQETIAHLERWVKREMNVWKEVFFRLERWADRLESMGGKYCPTDSARQT VSVGVGGPEIRPSEGEIYDYALDMSQMYALTPSPGELPSVPQPHDSYQWVTSPEDAPASPVETQVFEDPREFLC HLEEYLKQVGGTEEYWLSQIQNHMNGPAKKWWEYKQDTVKNWVEFKKEFLQYSEGTLTRDAIKRELDLPQKD GEPLDQFLWRKRDLYQTLYIDADEEQIIQYVVGTLQPKLKRFLSYPLPKTLEQLIQKGKEVQGSLDHSEEPSPQ- RA SEARTGDSVETLPPSTTTSPNTSSGTQPEAPSPPATVI SEQ ID NO: 14 GQLDSVTNAGVHTYQGHRGVANKPNVILQIGKCRTEMLEHVRRTHRHLLTEVSKQVERELKGLQKSVGKLENN LEDHVPTDNQRWKKSIKACLARCQETIAHLERWVKREMNVWKEVFFRLERWADRLESMGGKYCPTDSARQT VSVGVGGPEIRPSEGEIYDYALDMSQMYALTPSPGELPSIPQPHDSYQWVTSPEDAPASPVETQVFEDPREFLC HLEEYLKQVGGTEEYWLSQIQNHMNGPAKKWWEYKQDTVKNWVEFKKEFLQYSEGTLTRDAIKRELDLPQKD 
GEPLDQFLWRKRDLYQTLYIDADEEQIIQYVVGTLQPKLKRFLSYPLPKTLEQLIQKGKEVQGSLDHSEEPSPQ- RA SEARTGDSVESLPPSTTTSPNASSGTQPEAPSPPATVI SEQ ID NO: 15 GQLENVNHGNLHSFQGHRGGVANKPNVILQIGKCRAEMLDHVRRTHRHLLTEVSKQVERELKGLQKSVGKLE NNLEDHVPSAVENQRWKKSIKACLSRCQETIAHLERWVKREMNVWKEVFFRLERWADRLESGGGKYCHGDN HRQTVSVGVGGPEVRPSEGEIYDYALDMSQMYALTPPSPGDVPVVSQPHDSYQWVTVPEDTPPSPVETQIFED PREFLTHLEDYLKQVGGTEEYWLSQIQNHMNGPAKKWWEYKQDSVKNWLEFKKEFLQYSEGTLTRDAIKEELD LPQKDGEPLDQFLWRKRDLYQTLYVEADEEEVIQYVVGTLQPKLKRFLSHPYPKTLEQLIQRGKEVEGNLDNSE- E PTPQRTPEHQLCGSVESLPPSSTVSPVASDGTQPETSPLPATVI SEQ ID NO: 16 GPLTLLQDWCRGEHLNTRRCMLILGIPEDCGEDEFEETLQEACRHLGRYRVIGRMFRREENAQAILLELAQDID- Y ALLPREIPGKGGPWEVIVKPRNSDGEFLNRLNRFLEEERRTVSDMNRVLGSDTNCSAPRVTISPEFWTWAQTLG AAVQPLLEQMLYRELRVFSGNTISIPGALAFDAWLEHTTEMLQMWQVPEGEKRRRLMECLRGPALQVVSGLR ASNASITVEECLAALQQVFGPVESHKIAQVKLCKAYQEAGEKVSSFVLRLEPLLQRAVENNVVSRRNVNQTRLK- R VLSGATLPDKLRDKLKLMKQRRKPPGFLALVKLLREEEEWEATLGPDRESLEGLEVAPRPPARITGVGAVPLPA- S GNSFDARPSQGYRRRRGRGQHRRGGVARAGSRGSRKRKRHTFCYSCGEDGHIRVQCINPSNLLLAKETKEILEG
GEREAQTNSR SEQ ID NO: 17 GALTLLEDWCKGMDMDPRKALLIVGIPMECSEVEIQDTVKAGLQPLCAYRVLGRMFRREDNAKAVFIELADTV NYTTLPSHIPGKGGSWEVVVKPRNPDDEFLSRLNYFLKDEGRSMTDVARALGCCSLPAESLDAEVMPQVRSPPL EPPKESMWYRKLKVFSGTASPSPGEETFEDWLEQVTEIMPIWQVSEVEKRRRLLESLRGPALSIMRVLQANNDS ITVEQCLDALKQIFGDKEDFRASQFRFLQTSPKIGEKVSTFLLRLEPLLQKAVHKSPLSVRSTDMIRLKHLLAR- VAM TPALRGKLELLDQRGCPPNFLELMKLIRDEEEWENTEAVMKNKEKPSGRGRGASGRQARAEASVSAPQATVQA RSFSDSSPQTIQGGLPPLVKRRRLLGSESTRGEDHGQATYPKAENQTPGREGPQAAGEELGNEAGAGAMSHPK PWET SEQ ID NO: 18 GAVTMLQDWCRWMGVNARRGLLILGIPEDCDDAEFQESLEAALRPMGHFTVLGKAFREEDNATAALVELDRE VNYALVPREIPGTGGPWNVVFVPRCSGEEFLGLGRVFHFPEQEGQMVESVAGALGVGLRRVCWLRSIGQAVQ PWVEAVRCQSLGVFSGRDQPAPGEESFEVWLDHTTEMLHVWQGVSERERRRRLLEGLRGTALQLVHALLAEN PARTAQDCLAALAQVFGDNESQATIRVKCLTAQQQSGERLSAFVLRLEVLLQKAMEKEALARASADRVRLRQM LTRAHLTEPLDEALRKLRMAGRSPSFLEMLGLVRESEAWEASLARSVRAQTQEGAGARAGAQAVARASTKVEA VPGGPGREPEGLLQAGGQEAEELLQEGLKPVLEECDN SEQ ID NO: 19 GAVTMLQDWCRWMGVNARRGLLILGIPEDCDDAEFQESLEAALRPMGHFTVLGKVFREEDNATAALVELDRE VNYALVPREIPGTGGPWNVVFVPRCSGEEFLGLGRVFHFPEQEGQMVESVAGALGVGLRRVCWLRSIGQAVQ PWVEAVRYQSLGVFSGRDQPAPGEESFEVWLDHTTEMLHVWQGVSERERRRRLLEGLRGTALQLVHALLAEN PARTAQDCLAALAQVFGDNESQATIRVKCLTAQQQSGERLSAFVLRLEVLLQKAMEKEALARASADRVRLRQM LTRAHLTEPLDEALRKLRMAGRSPSFLEMLGLVRESEAWEASLARSVRAQTQEGAGARAGAQAVARASTKVEA VPGGPGREPEGLRQAGGQEAEELLQEGLKPVLEECDN SEQ ID NO: 20 GVEDLAASYIVLKLENEIRQAQVQWLMEENAALQAQIPELQKSQAAKEYDLLRKSSEAKEPQKLPEHMNPPAA WEAQKTPEFKEPQKPPEPQDLLPWEPPAAWELQEAPAAPESLAPPATRESQKPPMAHEIPTVLEGQGPANTQ DATIAQEPKNSEPQDPPNIEKPQEAPEYQETAAQLEFLELPPPQEPLEPSNAQEFLELSAAQESLEGLIVVETS- AAS EFPQAPIGLEATDFPLQYTLTFSGDSQKLPEFLVQLYSYMRVRGHLYPTEAALVSFVGNCFSGRAGWWFQLLLD- I QSPLLEQCESFIPVLQDTFDNPENMKDANQCIHQLCQGEGHVATHFHLIAQELNWDESTLWIQFQEGLASSIQ DELSHTSPATNLSDLITQCISLEEKPDPNPLGKSSSAEGDGPESPPAENQPMQAAINCPHISEAEWVRWHKGRL CLYCGYPGHFARDCPVKPHQALQAGNIQACQ SEQ ID NO: 21 GVQPQTSKAESPALAASPNAQMDDVIDTLTSLRLTNSALRREASTLRAEKANLTNMLESVMAELTLLRTRARIP- G ALQITPPISSITSNGTRPMTTPPTSLPEPFSGDPGRLAGFLMQMDRFMIFQASRFPGEAERVAFLVSRLTGEAE- K 
WAIPHMQPDSPLRNNYQGFLAELRRTYKSPLRHARRAQIRKTSASNRAVRERQMLCRQLASAGTGPCPVHPAS NGTSPAPALPARARNL SEQ ID NO: 22 GDGRVQLMKALLAGPLRPAARRWRNPIPFPETFDGDTDRLPEFIVQTSSYMFVDENTFSNDALKVTFLITRLTG- P ALQWVIPYIRKESPLLNDYRGFLAEMKRVFGWEEDEDF SEQ ID NO: 23 GEGRVQLMKALLARPLRPAARRWRNP1PFPETFDGDTDRLPEFIVQTSSYMFVDENTFSNDALKVTFLITRLTG- P ALQWVIPYIKKESPLLSDYRGFLAEMKRVFGWEEDEDF SEQ ID NO: 24 GPRGRCRQQGPRIPIWAAANYANAHPWQQMDKASPGVAYTPLVDPW1ERPCCGDTVCVRTTMEQKSTASG TCGGKPAERGPLAGHMPSSRPHRVDFCWVPGSDPGTFDGSPWLLDRFLAQLGDYMSFHFEHYQDNISRVCEI LRRLTGRAQAWAAPYLDGDLPLPDDYELFCQDLKEVVQDPNSFAEYHAVVICPLPLASSQLPVAPQLPVVRQYL ARFLEGLALDMGTAPRSLPAAMATPAVSGSNSVSRSALFEQQLTKESTPGPKEPPVLPSSTCSSKPGPVEPASS- Q PEEAAPTPVPRLSESANPPAQRPDPAHPGGPKPQKTEEEVLETEGDQEVSLGTPQEVVEAPETPGEPPLSPGF SEQ ID NO: 25 GVDELVLLLHALLMRHRALSIENSQLMEQLRLLVCERASLLRQVRPPSCPVPFPETFNGESSRLPEFIVQTASY- ML VNENRFCNDAMKVAFLISLLTGEAEEWVVPYIEMDSP1LGDYRAFLDEMKQCFGWDDDEDDDDEEEEDDY SEQ ID NO: 26 GPVDLGQALGLLPSLAKAEDSQFSESDAALQEELSSPETARQLFRQFRYQVMSGPHETLKQLRKLCFQWLQPEV HTKEQILEILMLEQFLTILPGEIQMWVRKQCPGSGEEAVTLVESLKGDPQRLWQWISIQVLGQDILSEKMESPS- C QVGEVEPHLEVVPQELGLENSSSGPGELLSHIVKEESDTEAELALAASQPARLEERLIRDQDLGASLLPAAPQE- Q WRQLDSTQKEQYWDLMLETYGKMVSGAGISHPKSDLTNSIEFGEELAGIYLHVNEK1PRPTCIGDRQENDKENL NLENHRDQELLHASCQASGEVPSQASLRGFFTEDEPGCFGEGENLPEALQNIQDEGTGEQLSPQERISEKQLGQ HLPNPHSGEMSTMWLEEKRETSQKGQPRAPMAQKLPTCRECGKTFYRNSQL1FHQRTHIGETYFQCTICKKAF LRSSDFVKHQRTHTGEKPCKCDYCGKGFSDFSGLRHHEKIHTGEKPYKCPICEKSFIQRSNFNRHQRVHTGEKP- Y KCSHCGKSFSWSSSLDKHQRSHLGKKPFQ SEQ ID NO: 27 GTLRLLEDWCRGMDMNPRKALLIAGISQSCSVAE1EEALQAGLAPLGEYRLLGRMFRRDENRKVALVGLTAETS HALVPKEIPGKGGIWRVIFKPPDPDNTFLSRLNEFLAGEGMTVGELSRALGHENGSLDPEQGMIPEMWAPMLA QALEALQPALQCLKYKKLRVFSGRESPEPGEEEFGRWMFHTTQMIKAWQVPDVEKRRRLLESLRGPALDVIRVL KINNPLITVDECLQALEEVFGVTDNPRELQVKYLTTYHKDEEKLSAYVLRLEPLLQKLVQRGAIERDAVNQARL- DQ VIAGAVHKTIRRELNLPEDGPAPGFLQLLVLIKDYEAAEEEEALLQAILEGNFT SEQ ID NO: 28 GTERRRDELSEEINNLREKVMKQSEENNNLQSQVQKLTEENTTLREQVEPTPEDEDDDIELRGAAAAAAPPPP1- E 
EECPEDLPEKFDGNPDMLAPFMAQCQIFMEKSTRDFSVDRVRVCFVTSMMTGRAARWASAKLERSHYLMHN YPAFMMEMKHVFEDPQRREVAKRKIRRLRQGMGSVIDYSNAFQMIAQDLDWNEPALIDQYHEGLSDHIQEEL SHLEVAKSLSALIGQCIHIERRLARAAAARKPRSPPRALVLPHIASHHQVDPTEPVGGARMRLTQEEKERRRKL- NL CLYCGTGGHYADNCPAKASKSSPAGKLPGPAVEGPSATGPEIIRSPQDDASSPHLQVMLQIHLPGRHTLFVRAM IDSGASGNFIDHEYVAQNGIPLRIKDWP1LVEAIDGRPIASGPVVHETHDLIVDLGDHREVLSFDVTQSPFFPV- VL GVRWLSTHDPNITWSTRSIVFDSEYCRYHCRMYSP1PPSLPPPAPQPPLYYPVDGYRVYQPVRYYYVQNVYTPV DEHVYPDHRLVDPHIEMIPGAHSIPSGHVYSLSEPEMAALRDFVARNVKDGLITPTIAPNGAQVLQVKRGWKL QVSYDCRAPNNFTIQNQYPRLSIPNLEDQAHLATYTEFVPQIPGYQTYPTYAAYPTYPVGFAWYPVGRDGQGRS LYVPVMITWNPHWYRQPPVPQYPPPQPPPPPPPPPPPPSYSTL SEQ ID NO: 29 GGGGAGCTGGACCACCGGACCAGCGGCGGGCTCCACGCCTACCCCGGGCCGCGGGGCGGGCAGGTGGCC AAGCCCAACGTGATCCTGCAGATCGGGAAGTGCCGGGCCGAGATGCTGGAGCACGTGCGGCGGACGCAC CGGCACCTGCTGGCCGAGGTGTCCAAGCAGGTGGAGCGCGAGCTGAAGGGGCTGCACCGGTCGGTCGGG AAGCTGGAGAGCAACCTGGACGGCTACGTGCCCACGAGCGACTCGCAGCGCTGGAAGAAGTCCATCAAG GCCTGCCTGTGCCGCTGCCAGGAGACCATCGCCAACCTGGAGCGCTGGGTCAAGCGCGAGATGCACGTGT GGCGCGAGGTGTTCTACCGCCTGGAGCGCTGGGCCGACCGCCTGGAGTCCACGGGCGGCAAGTACCCGGT GGGCAGCGAGTCAGCCCGCCACACCGTTTCCGTGGGCGTGGGGGGTCCCGAGAGCTACTGCCACGAGGC AGACGGCTACGACTACACCGTCAGCCCCTACGCCATCACCCCGCCCCCAGCCGCTGGCGAGCTGCCCGGGC AGGAGCCCGCCGAGGCCCAGCAGTACCAGCCGTGGGTCCCCGGCGAGGACGGGCAGCCCAGCCCCGGCG TGGACACGCAGATCTTCGAGGACCCTCGAGAGTTCCTGAGCCACCTAGAGGAGTACTTGCGGCAGGTGGG CGGCTCTGAGGAGTACTGGCTGTCCCAGATCCAGAATCACATGAACGGGCCGGCCAAGAAGTGGTGGGA GTTCAAGCAGGGCTCCGTGAAGAACTGGGTGGAGTTCAAGAAGGAGTTCCTGCAGTACAGCGAGGGCAC GCTGTCCCGAGAGGCCATCCAGCGCGAGCTGGACCTGCCGCAGAAGCAGGGCGAGCCGCTGGACCAGTTC CTGTGGCGCAAGCGGGACCTGTACCAGACGCTCTACGTGGACGCGGACGAGGAGGAGATCATCCAGTAC GIGGTGGGCACCCTGCAGCCCAAGCTCAAGCGTTTCCTGCGCCACCCCCTGCCCAAGACCCTGGAGCAGCT CATCCAGAGGGGCATGGAGGTGCAGGATGACCTGGAGCAGGCGGCCGAGCCGGCCGGCCCCCACCTCCC GGTGGAGGATGAGGCGGAGACCCTCACGCCCGCCCCCAACAGCGAGTCCGTGGCCAGTGACCGGACCCA GCCCGAG SEQ ID NO: 30 GGGGAATTGGATCAACGTACTACCGGIGGCCTTCACGCATACCCTGCACCACGCGGGGGCCCTGTCGCGA 
AGCCAAATGTCATCCTGCAGATTGGGAAGTGCCGGGCTGAGATGCTGGAGCACGTCCGTCGGACGCATCG TCATCTTCTTACTGAGGTGTCAAAACAGGTGGAGCGTGAACTCAAAGGCTTGCACCGCAGCGTTGGGAAAC TTGAAAGCAACTTAGATGGCTATGTGCCGACTGGCGACAGCCAGCGTTGGCGTAAGTCCATCAAAGCATGT TTGTGTCGTTGCCAGGAAACGATTGCAAACCTGGAGCGTTGGGTCAAACGGGAGATGCATGTCTGGCGTG AAGTATTTTATCGTTTAGAGCGTTGGGCCGATCGTTTAGAGAGCATGGGTGGTAAGTACCCTGTGGGGAGC AACCCTTCTCGGCATACGACGTCAGTCGGTGTTGGCGGGCCGGAGTCCTACGGTCATGAAGCGGACACCTA CGACTATACCGTAAGCCCTTATGCTATTACCCCACCACCTGCGGCCGGCGAATTACCTGGCCAGGAAGCCG TTGAGGCTCAACAATACCCTCCTTGGGGGCTGGGCGAGGATGGTCAACCTAGCCCAGGGGTAGACACGCA AATCTTTGAGGACCCACGGGAGTTTCTTTCCCACCTGGAAGAATACCTGCGTCAGGTTGGTGGGAGCGAAG AATACTGGCTGTCACAAATTCAAAACCATATGAATGGTCCTGCAAAAAAATGGIGGGAATATAAACAGGGT TCCGTGAAAAACTGGGTTGAGTTTAAAAAGGAGTTTCTTCAATATTCCGAGGGCGCCCTCAGTCGGGAGGC GGTCCAACGCGAGTTGGACTTGCCACAGAAACAGGGGGAACCACTCGATCAATTCCTTTGGCGGAAACGT GACCTTTACCAGACATTGTACGTGGATGCAGATGAGGAAGAAATTATCCAATATGTTGTGGGGACCCTGCA GCCGAAACTGAAACGTTTCCTTCGCCCGCCGCTGCCTAAAACGTTGGAACAACTTATTCAGAAAGGTATGG AGGTCGAGGATGGCTTAGAACAAGTCGCAGAGCCGGCCTCGCCACACTTGCCTACAGAGGAGGAATCGGA GGCGCTGACCCCAGCACTTACATCAGAGTCAGTGGCATCAGACCGGACACAACCAGAG SEQ ID NO: 31 GGGGAGTTAGATCACCGTACAACGGGGGGGTTGCACGCATACCCTGCTCCACGTGGCGGGCCGGCAGCTA AGCCAAACGTAATCCTGCAGATTGGGAAGTGCCGGGCAGAGATGTTGGAGCACGTCCGGCGGACCCACCG GCACCTCCTGGCTGAAGTGTCTAAACAAGTAGAACGGGAACTCAAAGGTCTTCATCGTAGCGTCGGGAAAT
TGGAATCGAATTTGGACGGGTATGTTCCTACAGGCGACTCACAGCGGTGGAAAAAGAGCATCAAGGCCTG CCTGAGTCGCTGCCAGGAGACGATTGCTAACCTCGAACGCTGGGTTAAGCGGGAGATGCACGTTTGGCGC GAAGTCTTCTACCGGCTGGAGCGTTGGGCTGATCGGCTCGAATCTGGTGGGGGTAAGTATCCAGTTGGGT CCGACCCTGCTCGCCACACAGTCTCAGTTGGCGTAGGTGGGCCGGAGTCGTATTGCCAAGATGCGGACAA CTATGATTATACAGTTTCCCCATACGCGATCACACCACCGCCGGCAGCAGGGCAGCTGCCAGGTCAGGAAG AGGTTGAGGCCCAGCAGTATCCACCATGGGCCCCAGGGGAAGACGGCCAGCTTTCTCCTGGGGTGGACAC TCAAGTTTTTGAAGATCCGCGTGAATTTCTGCGGCATTTAGAAGATTATCTCCGCCAGGTCGGGGGGTCTG AAGAGTATTGGTTAAGCCAAATTCAAAACCATATGAACGGCCCGGCCAAGAAGTGGTGGGAGTACAAGCA AGGGTCTGTGAAAAATTGGGTGGAGTTTAAGAAAGAATTCTTGCAATATTCTGAGGGCACTCTTTCGCGTG AAGCCATCCAACGCGAACTCGACTTACCGCAGAAACAAGGGGAACCTCTCGACCAATTTCTGTGGCGCAAA CGCGACCTGTACCAGACTCTTTACGTCGATGCTGAGGAGGAAGAAATTATTCAATACGTAGTTGGCACACT GCAGCCTAAGCTTAAACGGTTTTTACGTCCACCATTGCCGAAGACGCTTGAACAACTCATCCAGAAGGGTA TGGAGGTTCAAGATGGTCTGGAACAGGCAGCGGAACCAGCGGCGGAGGAGGCAGAAGCCCTGACACCTG CGTTAACTAACGAGTCTGTCGCGAGCGACCGCACCCAGCCGGAA SEQ ID NO: 32 GGGGAATTAGACCGCCTGAACCCAAGCTCAGGCCTGCATCCATCCTCTGGTTTGCATCCATACCCAGGTCTC CGGGGCGGGGCAACCGCGAAGCCTAATGTCATTTTGCAAATTGGCAAATGCCGTGCGGAAATGCTTGAAC ACGTCCGCAAAACTCACCGTCATCTCCTCACAGAAGTATCGCGCCAAGTAGAACGCGAGCTCAAAGGCCTT CACAAAAGTGTTGGCAAGTTGGAATCAAATCTTGATGGGTACGTACCGTCAAGCGACTCCCAACGCTGGAA GAAAAGCATTAAGGCGTGCTTATCCCGTTGCCAAGAGACGATTGCGCATTTAGAACGCTGGGTTAAACGTG AAATGAATGTATGGCGTGAGGTGTTCTACCGTTTGGAACGTTGGGCGGACCGTCTGGAGGCTATGGGCGG TAAGTATCCTGCCGGTGAGCAGGCCCGGCGTACAGTTTCAGTGGGCGTTGGGGGCCCTGAGACATGTTGT CCAGGGGATGAAAGTTATGATTGTCCGATTTCTCCGTATGCAGTTCCACCTTCCACCGGCGAGTCTCCGGAA TCCTTAGACCAAGGGGATCAGCACTATCAGCAGTGGTTTGCCCTCCCGGAGGAGTCCCCTGTTAGCCCTGG GGTTGATACCCAGATCTTTGAAGATCCTCGCGAGTTTTTACGTCATCTGGAGAAGTACCTGAAACAAGTCG GCGGGACAGAGGAAGACTGGCTTTCTCAAATCCAGAATCACATGAATGGGCCGGCGAAGAAGTGGTGGG AGTACAAGCAAGGGAGTGTTAAGAATTGGCTTGAATTTAAGAAGGAATTTTTACAGTATTCGGAGGGCAC ACTGACGCGGGACGCGTTGAAACGTGAACTGGATCTCCCACAGAAACAAGGCGAACCACTTGATCAATTTT TATGGCGGAAGCGCGACTTATATCAGACACTCTACGTTGACGCCGATGAAGAGGAAATCATTCAGTACGTC 
GTGGGCACTCTTCAGCCGAAATTAAAACGCTTTCTCCATCACCCACTCCCTAAGACGCTTGAGCAGCTTATC CAACGGGGCCAAGAAGTTCAGAATGGTCTGGAGCCTACCGACGATCCTGCAGGCCAACGCACTCAATCGG AGGACAACGACGAAAGCCTTACCCCTGCCGTCACCAATGAGAGTACTGCAAGCGAGGGCACCCTGCCAGA G SEQ ID NO: 33 GGGCAGCTTGATAACGTTACAAACGCGGGCATCCACTCCTTCCAGGGGCATCGTGGCGTAGCGAATAAGC CAAATGTCATTCTGCAAATTGGTAAATGTCGTGCGGAAATGCTGGAGCACGTTCGCCGCACCCACCGCCAT TTATTATCTGAAGTATCTAAGCAGGTAGAACGTGAGCTGAAAGGGCTGCAAAAGTCCGTGGGCAAGCTCG AGAATAACTTGGAGGATCATGTCCCTACAGATAACCAACGCTGGAAGAAGTCCATTAAAGCGTGCTTGGCT CGTTGTCAAGAGACTATCGCGCATTTAGAGCGTTGGGTGAAACGCGAAATGAACGTCTGGAAGGAGGTGT TTTTCCGGCTGGAAAAGTGGGCAGACCGGCTGGAGTCAATGGGTGGCAAGTACTGCCCGGGCGAACACG GGAAACAAACCGTCAGTGTAGGCGTGGGGGGTCCTGAAATCCGGCCTTCGGAGGGGGAAATTTATGATTA TGCTCTGGATATGAGCCAGATGTATGCACTCACCCCACCTCCAGGCGAAATGCCATCAATCCCACAAGCCCA TGACAGCTATCAGTGGGTTAGTGTCTCAGAAGATGCCCCGGCGAGCCCTGTCGAAACCCAGGTATTTGAGG ACCCTCGGGAATTCCTGTCTCACCTGGAGGAATACCTGAAGCAGGTAGGCGGCACGGAGGAGTATTGGTT GTCCCAGATCCAGAATCACATGAATGGTCCGGCAAAAAAATGGTGGGAATATAAACAGGACTCCGTTAAA AACTGGGTTGAGTTTAAAAAGGAATTCTTGCAATACTCTGAAGGTACTTTAACTCGGGATGCTATTAAGCGT GAACTCGACTTGCCGCAAAAGGAAGGTGAACCTCTTGACCAATTCCTTTGGCGGAAGCGGGACCTCTATCA GACACTTTACGTGGACGCGGATGAGGAGGAGATCATTCAGTATGTGGTCGGTACCCTGCAGCCGAAGCTC AAGCGTTTCCTGAGCTATCCTCTCCCAAAGACTTTAGAACAGCTCATCCAGCGCGGTAAAGAAGTGCAGGG TAACATGGATCACTCCGATGAGCCTTCGCCGCAGCGTACACCTGAAATTCAATCAGGTGACTCCGTAGAAT CTATGCCACCTTCAACAACGGCATCTCCGGTTCCATCTAATGGTACCCAACCTGAGCCGCCGAGCCCGCCAG CCACCGTTATC SEQ ID NO: 34 GGGCAACTTGACAACGTAACAAACGCTGGGATTCACTCCTTTCAGGGCCACCGCGGTGTCGCCAACAAGCC AAACGTAATCTTGCAAATTGGCAAATGCCGTGCGGAGATGTTGGAACACGTTCGTCGTACACATCGTCACT TGCTGTCGGAAGTCTCTAAACAAGTAGAACGTGAACTTAAAGGGCTTCAAAAGTCAGTCGGCAAATTGGAA AACAACCTTGAAGACCATGTACCAACCGACAATCAGCGTTGGAAAAAGTCTATCAAAGCTTGCCTGGCCCG TTGTCAAGAGACGATTGCTCACCTGGAGCGGTGGGTAAAGCGCGAGATGAATGTGTGGAAAGAGGTCTTC TTCCGCTTGGAAAAATGGGCCGACCGTTTGGAGTCCATGGGCGGTAAATATTGTCCGGGTGAACATGGTA AGCAAACAGTCTCTGTGGGCGTTGGTGGGCCGGAGATTCGGCCTTCTGAAGGCGAGATTTACGATTATGC 
GCTCGACATGTCCCAGATGTATGCGCTTACACCACCACCGGGCGAGGTACCAAGCATTCCTCAAGCGCATG ACAGTTATCAGTGGGTTAGCGTATCCGAAGACGCTCCTGCCTCGCCGGTAGAGACCCAGGTTTTTGAAGAT CCTCGTGAATTTTTAAGCCACTTGGAGGAGTATTTGAAGCAGGTAGGGGGGACAGAGGAATATTGGCTGT CTCAGATCCAGAACCACATGAATGGCCCGGCTAAAAAGTGGTGGGAATACAAACAAGATTCGGTAAAGAA TTGGGTAGAATTTAAAAAGGAGTTTTTACAGTACTCAGAGGGGACTCTCACGCGTGATGCGATCAAACGCG AGTTGGATCTTCCTCAAAAAGAGGGGGAGCCACTCGATCAGTTCCTCTGGCGCAAGCGGGATCTCTACCAA ACACTCTACGTAGACGCAGACGAAGAAGAGATCATCCAGTACGTGGTGGGTACGCTCCAGCCGAAACTCA AACGTTTCCTCAGCTACCCACTTCCTAAGACTCTGGAACAACTGATTCAGCGGGGCAAAGAGGTCCAGGGT AACATGGACCATTCAGAGGAACCTAGTCCGCAACGTACACCTGAGATCCAATCTGGGGATTCTGTCGATTC GGTTCCACCTTCTACAACAGCGTCTCCGGTGCCGTCAAATGGGACCCAACCAGAG SEQ ID NO: 35 GGGCAGCTTGATAATGTAACCAATGCAGGTATCCACTCTTTCCAGGGTCACCGCGGTGTGGCAAACAAGCC AAATGTTATTCTGCAAATTGGTAAGTGTCGCGCTGAGATGTTAGAACACGTCCGGCGCACGCATCGGCATC TCCTGTCAGAGGTTTCAAAGCAGGTAGAGCGTGAATTAAAGGGCCTCCAGAAGTCCGTAGGTAAACTCGA AAATAATCTTGAAGACCACGTTCCTACCGATAATCAACGGTGGAAAAAGTCAATCAAGGCGTGCTTAGCAC GGTGTCAGGAAACGATCGCGCACCTCGAACGTTGGGTGAAGCGCGAAATGAATGTCTGGAAAGAAGTGTT CTTCCGGCTTGAGAAGTGGGCTGATCGGCTCGAATCCATGGGTGGCAAATATTGTCCAGGTGATCATGGCA AGCAAACGGTCTCCGTCGGTGTTGGTGGTCCGGAAATCCGGCCGAGCGAGGGTGAAATCTATGACTACGC TCTTGATATGTCCCAGATGTATGCACTCACTCCTCCGCCGGGTGAGGTCCCGTCGATCCCGCAGGCGCATGA CTCATACCAATGGGTGTCGACTAGCGAAGACGCACCAGCCTCCCCTGTTGAAACTCAAGTATTCGAGGACC CGCGTGAGTTCCTGAGCCATTTAGAGGAGTACCTTAAGCAGGTTGGTGGTACCGAGGAATACTGGTTGAG CCAGATTCAGAATCACATGAACGGGCCGGCTAAGAAATGGTGGGAATACAAGCAGGATTCAGTCAAGAAT TGGGTCGAATTTAAGAAGGAGTTTTTGCAGTACAGTGAGGGGACGCTCACACGCGACGCTATCAAACGGG AGCTGGACCTGCCACAAAAGGAGGGTGAACCGCTTGATCAGTTTCTTTGGCGCAAGCGTGATCTGTATCAA ACCCTGTATGTGGACGCTGACGAAGAAGAGATCATTCAGTACGTGGTTGGGACTCTGCAACCAAAGCTGA AGCGTTTTCTTTCTTATCCTCTCCCTAAGACACTGGAACAGTTAATCCAACGTGGCAAGGAGGTCCAGGGTA ATATGGACCACTCTGAGGAACCGAGCCCGCAACGTACTCCTGAAATTCAGAGCGGGGATAGTGTCGACTC AGTTCCTCCAAGTACGACCGCATCCCCGGTCCCAAGTAACGGTACCCAACCAGAG SEQ ID NO: 36 
GGGTCTTGGGGCTTGCAACGTCACGTGGCTGATGAACGTCGTGGCCTCGCTACGCCTACCTACGGCGCGGT TTGTTCCATTCGGGAGAAAAAAGCCTCCCAACTGAGCGGCCAGAGCTGTTTGGAGAAAGAGTTGCTTGGTT GGAAATGTACGGAGGCAATCGTGGAAATGATGCAAGTCGATAACTTTAACCACGGTAACTTACATAGCTGC CAAGGCCATCGGGGGATGGCAAATCACAAACCGAACGTAATCCTTCAAATCGGGAAATGTCGCGCAGAAA TGTTAGACCACGTGCGTCGCACCCACCGCCATCTCTTGACGGAGGTTTCGAAGCAGGTAGAACGCGAATTG AAGTCTCTCCAAAAGTCGGTTGGCAAGCTCGAGAATAATCTGGAAGACCACGTGCCATCGGCAGCGGAGA ACCAACGTTGGAAGAAATCAATTAAAGCCTGCCTGGCCCGGTGCCAAGAAACAATTGCTCACCTCGAACGC TGGGTTAAACGCGAAATCAACGTCTGGAAAGAAGTATTCTTTCGTCTGGAGAAGTGGGCGGACCGCCTTG AGTCGGGTGGGGGCAAGTATGGGCCTGGTGACCAAAGTCGTCAAACTGTAAGTGTCGGTGTTGGGGCCCC AGAAATCCAACCGCGGAAAGAAGAAATCTATGACTACGCTCTCGACATGTCGCAGATGTATGCCTTAACAC CACCGCCGATGGGTGAAGACCCAAACGTACCTCAATCCCACGATAGCTACCAGTGGATTACCATCTCAGAC GATTCACCTCCGTCGCCAGTGGAAACTCAAATTTTCGAGGATCCACGCGAATTCCTTACCCATCTCGAGGAT TATCTTAAGCAAGTGGGCGGGACTGAAGAATATTGGTTGAGTCAGATTCAAAATCATATGAACGGTCCGGC CAAGAAATGGTGGGAGTACAAACAAGATTCCGTGAAAAACTGGTTGGAATTCAAGAAGGAATTCCTTCAA TACTCTGAGGGTACTTTGACACGTGACGCAATTAAACAAGAACTTGACTTACCGCAGAAGGACGGCGAGCC ATTGGATCAATTTCTTTGGCGGAAGCGGGACCTGTATCAGACGCTCTATATTGATGCAGAGGAGGAAGAA GTAATCCAATACGTTGTTGGCACACTCCAACCGAAATTAAAACGTTTCCTTTCCCACCCGTATCCGAAAACTT TGGAACAGTTAATCCAACGTGGGAAAGAGGIGGAAGGCAACCTCGATAACTCTGAGGAGCCTAGCCCGCA ACGGAGTCCAAAGCACCAATTGGGTGGTAGCGTCGAGAGCCTCCCACCTTCGTCGACCGCAAGTCCTGTTG CGTCAGACGAGACTCACCCAGACGTGAGCGCACCTCCGGTAACGGTGATT SEQ ID NO: 37 GGGGACGGCGAGACTCAAGCTGAGAATCCATCTACCAGCTTGAACAACACTGACGAAGATATCTTGGAAC AGCTCAAGAAAATTGTCATGGATCAACAACACCTGTATCAGAAAGAATTAAAGGCATCTTTTGAACAACTC AGTCGCAAAATGTTTTCCCAGATGGAACAAATGAATAGCAAGCAAACGGATCTGCTTTTAGAACATCAAAA ACAGACTGTCAAACATGTAGACAAGCGCGTGGAGTATTTGCGGGCGCAATTCGATGCATCGTTAGGCTGG CGGTTGAAAGAGCAACACGCGGATATTACGACCAAAATCATTCCTGAGATCATCCAAACGGTGAAGGAAG ATATTAGCCTGTGTCTTTCTACGCTCTGCAGTATCGCTGAAGATATCCAGACATCACGGGCTACCACTGTCA CAGGGCATGCTGCCGTACAAACCCATCCTGTGGATCTTTTGGGTGAACACCATTTAGGGACCACGGGGCAC 
CCACGCTTACAGTCGACCCGTGTAGGGAAACCAGACGACGTACCTGAGTCGCCGGTAAGCCTGTTTATGCA AGGTGAGGCGCGTTCCCGGATCGTTGGCAAGAGTCCGATTAAACTGCAATTTCCGACGTTCGGCAAAGCA AACGATTCTTCCGACCCACTCCAATATCTGGAGCGGTGTGAGGACTTTCTTGCTCTTAACCCTTTAACTGATG AGGAACTTATGGCTACTTTGCGGAATGTGTTACATGGCACCTCTCGGGATTGGIGGGATGTCGCACGTCAT AAAATCCAAACTTGGCGTGAGTTTAATAAACACTTCCGGGCGGCTTTCCTCAGCGAGGATTATGAAGATGA GTTGGCTGAGCGCGTCCGTAACCGCATCCAAAAAGAAGATGAGTCTATCCGCGATTTCGCTTATATGTATC AGTCCTTGTGCAAGCGGTGGAACCCTGCTATCTGCGAAGGTGATGTAGTAAAGCTCATCCTGAAGAACATC AATCCACAACTGCCGTCTCAGTTACGCTCCCGGGTCACGACCGTGGATGAGCTTGTTCGCTTGGGCCAGCA GCTTGAAAAAGATCGTCAGAATCAGCTCCAATATGAGCTTCGGAAGAGTTCCGGCAAAATTATCCAAAAAT CTAGTTCGTGCGAAACTTCAGCGCTCCCGAACACGAAGAGTACACCTAATCAACAAAACCCTGCTACCAGT AACCGTCCTCCACAGGTGTATTGCTGGCGGTGTAAGGGTCACCATGCCCCTGCCTCTTGTCCGCAATGGAA AGCTGATAAGCACCGTGCGCAACCTTCGCGGAGTTCTGGGCCACAAACTCTGACTAATCTCCAAGCTCAAG
ACATC SEQ ID NO: 38 GGGGAATTGGATCAACGTGCGGCAGGGGGCTTGCGCGCGTACCCGGCGCCGCGTGGTGGTCCAGTTGCC AAACCGAGCGTAATTCTTCAGATTGGTAAGTGCCGCGCTGAGATGCTGGAACACGTCCGCCGCACGCATCG CCATCTTCTGACGGAGGTAAGTAAACAAGTGGAGCGCGAACTCAAGGGGTTACATCGGTCTGTCGGTAAG TTGGAGGGCAATTTAGACGGCTATGTGCCTACCGGTGATTCCCAACGCTGGAAAAAAAGTATCAAGGCGT GTCTCTGCCGGTGTCAGGAAACAATTGCAAATCTCGAGCGTTGGGTGAAACGTGAGATGCATGTTTGGCGT GAGGTATTCTATCGTTTGGAACGGTGGGCAGACCGTTTGGAGTCTATGGGGGGCAAGTATCCGGTGGGCA CTAACCCGTCGCGGCACACAGTAAGTGTCGGGGTAGGGGGCCCGGAAGGCTATTCTCATGAAGCGGATAC TTATGACTACACGGTGTCTCCGTATGCTATCACGCCACCGCCTGCCGCGGGTGAGTTGCCTGGTCAAGAGG CTGTCGAGGCACAACAGTACCCTCCATGGGGTCTGGGGGAGGACGGGCAACCAGGTCCGGGCGTGGACA CGCAGATTTTTGAGGACCCTCGCGAATTTTTGAGCCACTTAGAGGAGTACCTGCGGCAAGTAGGGGGGAG TGAAGAGTACTGGTTATCGCAAATTCAAAATCATATGAATGGCCCTGCGAAGAAATGGTGGGAGTTCAAAC AGGGGTCAGTCAAGAATTGGGTCGAGTTTAAGAAAGAATTTTTGCAATACAGTGAGGGTACGTTGAGTCG CGAGGCCATCCAACGTGAACTGGACCTCCCTCAGAAGCAGGGGGAGCCGTTAGATCAATTTTTATGGCGG AAACGTGACTTATACCAAACCCTCTACGTTGACGCTGAGGAAGAAGAAATTATTCAATATGTTGTCGGTAC GCTGCAGCCAAAGCTGAAGCGGTTCCTCCGTCCTCCACTCCCTAAAACCTTAGAACAATTAATCCAAAAAGG CATGGAAGTTCAGGACGGGTTAGAACAAGCGGCCGAACCGGCCTCTCCGCGTCTGCCGCCGGAAGAGGA GAGTGAGGCTCTTACGCCTGCGCTCACGAGCGAATCAGTAGCCTCCGATCGGACACAGCCAGAG SEQ ID NO: 39 GGGCAGCTTGACAATGTGACGAACGCGGGGATTCACAGCTTTCAAGGGCACCGCGGCGTCGCCAACAAAC CGAATGTCATTCTGCAAATCGGTAAATGTCGTGCTGAAATGCTTGAGCACGTTCGTCGTACCCATCGTCACT TGCTTTCTGAAGTATCAAAACAAGTGGAGCGGGAACTCAAAGGCCTGCAAAAGTCAGTGGGTAAATTGGA GAATAACCTCGAAGACCATGTACCTACAGACAACCAGCGGTGGAAAAAATCTATCAAGGCATGCCTCGCTC GTTGCCAGGAGACTATTGCCCATCTTGAGCGGTGGGTGAAACGTGAAATGAACGTATGGAAGGAAGTATT TTTTCGCTTAGAGAAGTGGGCTGATCGTCTTGAATCGATGGGCGGCAAGTACTGTCCTGGGGAACACGGC AAACAAACTGTATCTGTCGGCGTGGGGGGCCCGGAGATCCGGCCATCGGAAGGGGAAATTTATGATTATG CTCTCGACATGTCCCAAATGTATGCTCTCACACCAGGGCCAGGGGAAGTACCGTCAATTCCGCAAGCACAC GACAGCTACCAATGGGTATCTGTGAGCGAGGACGCGCCTGCCTCTCCGGTTGAGACGCAAATCTTTGAGG ACCCACATGAATTTTTGTCTCATCTTGAAGAATATCTCAAACAGGTTGGCGGCACAGAAGAATACTGGTTAT 
CTCAGATCCAGAATCACATGAACGGCCCGGCTAAAAAGTGGTGGGAGTATAAGCAAGATTCCGTAAAGAA CTGGGTCGAATTCAAGAAAGAGTTTCTTCAATACTCTGAGGGTACTCTGACGCGCGATGCAATTAAGCGGG AGTTAGACCTTCCACAAAAAGAGGGGGAGCCTCTTGACCAGTTCCTGTGGCGTAAGCGCGACCTCTATCAG ACACTTTACGTCGACGCTGATGAAGAAGAGATTATTCAATATGTTGTGGGTACCCTGCAGCCAAAGCTTAA GCGTTTCCTTAGCTACCCACTTCCGAAAACTCTGGAGCAGCTCATTCAACGCGGTAAGGAAGTGCAGGGCA ACATGGACCACTCTGAAGAGCCTAGCCCGCAGCGCACTCCTGAAATCCAATCAGGTGACAGTGTGGAGTCA ATGCCGCCGTCAACCACCGCTTCTCCGGTACCTAGCAACGGGACGCAACCAGAGCCTCCAAGCCCACCGGC TACAGTCATC SEQ ID NO: 40 GGGCAACTTGAGAATATTAACCAAGGTTCCCTGCACGCGTTTCAGGGTCATCGCGGCGTGGTCCATAACAA CAAGCCTAACGTTATTCTCCAGATCGGGAAGTGCCGCGCCGAAATGCTGGAGCATGTGCGGCGCACCCATC GCCATTTGCTCACTGAAGTATCAAAACAGGTGGAGCGTGAGTTGAAGGGGTTGCAGAAAAGTGTAGGCAA ACTTGAAAATAATTTAGAAGACCACGTACCAAGTGCGGCTGAGAACCAACGCTGGAAGAAGTCGATTAAA GCCTGCTTAGCGCGTTGTCAGGAGACCATTGCGAACTTGGAACGCTGGGTTAAACGTGAGATGAATGTTTG GAAGGAGGTCTTTTTCCGCTTAGAGCGCTGGGCAGATCGCCTCGAATCCGGGGGTGGCAAGTACTGCCAT GCAGACCAGGGTCGCCAAACTGTCAGCGTAGGTGTTGGTGGTCCTGAAGTGCGTCCGTCTGAAGGTGAAA TTTACGATTACGCGTTGGATATGAGCCAAATGTACGCCTTGACTCCGCCGCCTATGGGTGATGTTCCAGTAA TTCCTCAGCCGCATGACAGTTATCAGTGGGTGACAGATCCGGAAGAAGCGCCACCAAGTCCGGTTGAGAC ACAAATTTTCGAGGACCCTCGGGAGTTTCTGACCCATCTTGAGGATTATTTAAAACAAGTCGGCGGGACAG AGGAATATTGGCTCTCACAGATCCAAAATCATATGAATGGGCCAGCGAAAAAGTGGTGGGAATATAAACA GGATAGTGTGAAGAACTGGCTTGAGTTCAAAAAAGAATTCTTGCAGTACTCAGAAGGCACGTTAACGCGG GACGCTATTAAACAGGAACTTGACCTTCCACAAAAAGAAGGGGAACCGCTGGATCAATTCCTCTGGCGCAA ACGCGATTTGTACCAAACTCTCTACGTCGAGGCAGAAGAAGAGGAGGTCATCCAATATGTAGTTGGCACAC TGCAACCAAAACTGAAGCGGTTTCTTTCTCATCCGTACCCTAAAACCCTGGAGCAACTCATCCAGCGCGGGA AGGAAGTTGAGGGGAATTTGGACAATAGTGAAGAACCGTCTCCACAGCGGACCCCAGAACATCAGCTGGG GGACAGTGTGGAATCTTTGCCGCCTAGTACTACGGCTTCGCCTGCCGGTTCGGATAAAACGCAACCTGAGA TTAGCTTACCTCCAACTACAGTCATT SEQ ID NO: 41 GGGCAATTAGATTCGGTAACCAATGCGGGCGTCCACACCTACCAGGGCCATCGGAGCGTCGCCAATAAAC CTAACGTCATTCTTCAAATCGGGAAATGTCGGACTGAGATGCTGGAGCATGTCCGTCGGACTCATCGCCAC 
CTGCTCACAGAAGTGTCAAAGCAAGTGGAACGTGAACTCAAGGGCTTACAGAAGAGCGTGGGCAAACTGG AAAACAATCTTGAAGACCATGTCCCAACTGACAATCAGCGGTGGAAGAAGTCAATCAAGGCATGTCTCGCG CGTTGCCAAGAGACCATTGCTCACCTTGAGCGGTGGGTGAAACGTGAAATGAACGTGTGGAAGGAGGTGT TCTTCCGGTTAGAACGCTGGGCCGACCGCCTTGAATCAATGGGTGGTAAATACTGCCCGACGGACTCTGCA CGTCAGACAGTTAGCGTTGGGGTGGGGGGCCCGGAAATTCGGCCTAGTGAAGGCGAAATCTATGACTACG CGCTCGATATGAGCCAAATGTACGCTCTTACGCCGTCACCGGGCGAATTGCCGTCCGTCCCTCAACCGCATG ATTCATACCAGTGGGTCACTAGTCCGGAAGACGCTCCGGCGTCACCAGTTGAAACGCAGGTATTCGAGGAT CCTCGGGAGTTCTTGTGTCATTTGGAAGAGTACCTGAAGCAGGTTGGCGGTACAGAGGAATATTGGCTGA GCCAGATTCAGAATCATATGAATGGTCCTGCAAAAAAGTGGTGGGAATATAAACAAGACACGGTTAAGAA TTGGGTGGAATTCAAGAAGGAGTTCTTACAATACAGTGAGGGTACACTTACCCGTGATGCGATTAAGCGG GAATTAGACCTCCCGCAAAAGGACGGTGAGCCTCTGGATCAATTTTTATGGCGTAAGCGTGACCTCTATCA GACATTATACATTGATGCCGATGAAGAACAGATCATTCAGTACGTCGTGGGGACATTGCAACCTAAACTCA AGCGGTTCTTGTCCTATCCACTTCCAAAAACTCTTGAACAATTAATCCAGAAAGGGAAGGAGGTGCAGGGT TCACTTGACCACAGCGAGGAGCCGAGTCCTCAACGTGCGAGCGAGGCTCGGACGGGCGATAGTGTGGAA ACCTTGCCGCCTTCTACCACTACATCACCAAATACGTCATCTGGTACACAGCCAGAGGCACCATCGCCTCCA GCGACGGTAATC SEQ ID NO: 42 GGGCAGTTAGACAGTGTGACTAACGCCGGGGTGCATACGTACCAGGGGCACCGCGGGGTCGCCAATAAG CCAAATGTAATTCTCCAGATTGGGAAGTGTCGTACAGAGATGTTGGAACATGTCCGTCGCACTCATCGCCA CTTGCTCACCGAGGTCTCCAAACAAGTAGAACGCGAACTCAAGGGGCTCCAGAAGAGTGTTGGGAAGTTG GAGAATAACCTCGAAGACCACGTTCCGACAGATAACCAACGGTGGAAAAAGTCTATTAAAGCCTGTCTCGC CCGTTGTCAAGAGACAATCGCACACTTGGAACGCTGGGTCAAACGGGAGATGAATGTGTGGAAGGAAGTC TTCTTCCGTCTCGAGCGGTGGGCGGATCGTTTAGAAAGTATGGGCGGTAAATATTGCCCAACTGACTCGGC TCGTCAAACGGTGTCGGTTGGCGTAGGCGGCCCGGAAATTCGCCCTAGCGAGGGTGAGATCTATGACTAT GCACTTGACATGAGTCAGATGTATGCGTTAACTCCGTCGCCAGGGGAGCTTCCAAGTATTCCACAGCCTCA CGATAGTTATCAATGGGTAACTTCTCCTGAAGACGCCCCAGCATCCCCAGTTGAGACACAAGTATTCGAGG ACCCTCGTGAGTTTCTCTGTCACCTCGAGGAGTACCTTAAACAGGTAGGCGGGACCGAAGAGTACTGGTTA TCGCAAATCCAAAACCATATGAATGGTCCTGCCAAAAAGTGGTGGGAGTATAAACAAGATACTGTGAAGA ATTGGGTAGAGTTCAAGAAAGAGTTCTTACAGTACTCTGAGGGGACGTTAACTCGTGATGCGATCAAGCGC
GAATTGGATTTACCTCAGAAGGACGGCGAGCCACTCGACCAGTTCTTATGGCGCAAGCGTGACTTGTATCA AACCCTTTATATCGATGCTGACGAGGAACAAATTATCCAGTACGTAGTCGGTACGTTGCAACCAAAACTTAA ACGCTTTCTGAGCTACCCATTACCTAAAACGTTGGAGCAACTGATCCAGAAAGGTAAAGAGGTGCAAGGG AGCCTGGATCATAGTGAAGAACCGAGCCCTCAGCGGGCTTCTGAAGCTCGGACCGGTGATAGCGTCGAAT CTTTACCACCTAGTACCACAACCAGCCCGAATGCGTCATCTGGTACCCAACCTGAAGCGCCTTCCCCACCTG CTACAGTCATT SEQ ID NO: 43 GGGCAGCTCGAGAATGTCAACCATGGGAACCTCCATTCTTTTCAAGGTCATCGCGGCGGCGTCGCCAACAA GCCAAACGTTATCTTGCAGATCGGTAAATGTCGTGCAGAGATGCTGGACCACGTCCGGCGGACCCACCGG CATTTACTGACAGAGGTATCGAAACAGGTTGAACGTGAGTTGAAGGGGTTACAGAAATCAGTAGGGAAAT TAGAAAATAACTTAGAAGACCATGTCCCTTCAGCCGTTGAAAACCAGCGTTGGAAAAAATCGATCAAGGCC TGCCTTTCCCGCTGCCAAGAGACCATTGCCCACCTTGAGCGTTGGGTGAAGCGCGAGATGAACGTATGGAA AGAGGTTTTCTTCCGCTTAGAGCGGTGGGCAGATCGGTTGGAATCTGGGGGCGGGAAATATTGTCACGGT GATAATCATCGTCAAACAGTATCAGTCGGTGTTGGCGGCCCTGAGGTACGTCCATCTGAAGGCGAAATTTA CGATTACGCTCTCGACATGTCGCAAATGTACGCTTTAACACCGCCTAGCCCAGGGGATGTGCCTGTAGTTA GCCAGCCGCACGACAGCTATCAGTGGGTTACGGTTCCGGAGGATACCCCTCCATCCCCGGTGGAGACGCA AATCTTCGAGGACCCACGGGAGTTCTTGACCCACTTAGAGGATTACTTAAAGCAAGTGGGGGGTACAGAG GAATATTGGTTATCTCAGATCCAGAATCACATGAACGGGCCAGCCAAGAAGTGGTGGGAGTATAAGCAAG ACTCAGTAAAAAATTGGCTCGAGTTTAAGAAGGAATTCCTTCAGTATTCCGAGGGGACACTTACGCGCGAC GCTATCAAGGAAGAACTTGACCTCCCGCAAAAGGACGGGGAACCTCTTGATCAGTTCCTGTGGCGCAAGC GCGACTTGTACCAGACCCTGTACGTGGAGGCGGATGAGGAGGAGGTGATCCAGTATGTTGTGGGGACTTT ACAACCTAAATTAAAGCGTTTTCTCTCACACCCTTACCCGAAAACGTTAGAGCAACTTATCCAACGGGGCAA AGAGGTGGAAGGGAACCTCGACAATTCAGAGGAACCAACACCTCAGCGTACTCCAGAACACCAACTGTGT GGTTCTGTAGAATCGCTGCCTCCTTCCTCTACCGTCAGTCCAGTGGCTAGCGATGGTACTCAACCTGAGACT TCGCCATTGCCAGCGACTGTTATT SEQ ID NO: 44 GGGCCATTGACGTTGTTACAAGACTGGTGTCGTGGTGAACATTTAAACACCCGCCGGTGCATGTTGATCCT CGGTATCCCAGAAGATTGCGGCGAGGATGAGTTCGAAGAGACACTTCAGGAGGCGTGTCGCCATTTAGGG CGGTACCGCGTGATCGGCCGCATGTTCCGTCGTGAGGAAAATGCCCAAGCGATCCTCTTGGAATTGGCGCA GGATATTGACTATGCCTTACTCCCTCGGGAAATCCCTGGGAAAGGCGGGCCTTGGGAGGTAATTGTGAAG
CCGCGTAATTCCGACGGCGAATTCTTAAATCGGCTTAATCGCTTTCTTGAAGAGGAGCGCCGTACGGTCTCC GATATGAACCGTGTTTTGGGCTCGGATACTAACTGTTCAGCTCCTCGTGTCACCATTAGTCCTGAATTCTGG ACTTGGGCACAGACGCTGGGCGCAGCTGTCCAACCATTGCTCGAACAGATGCTCTACCGGGAGTTACGGG TCTTCAGTGGCAATACGATTTCCATCCCAGGTGCTCTCGCTTTTGACGCGTGGCTGGAGCATACCACGGAAA TGCTTCAAATGTGGCAGGTGCCTGAAGGGGAGAAACGGCGGCGCTTGATGGAGTGTTTGCGGGGGCCAG CCCTGCAAGTCGTTAGTGGGTTACGTGCATCGAATGCCAGTATCACTGTCGAAGAGTGTCTTGCTGCACTG CAGCAGGTATTCGGTCCAGTGGAAAGTCATAAGATTGCCCAAGTAAAGTTATGCAAAGCTTACCAGGAGG CTGGGGAAAAAGTAAGCAGCTTCGTTTTGCGTTTGGAGCCACTGCTTCAGCGTGCTGTAGAAAACAACGTG GTCAGTCGCCGCAATGTCAACCAAACACGTCTTAAGCGTGTTCTGTCGGGCGCCACCCTTCCTGACAAGCTG CGTGATAAATTGAAGTTAATGAAACAGCGCCGTAAACCGCCGGGTTTCTTGGCGTTGGTTAAACTGTTACG
TGAAGAGGAGGAGTGGGAGGCCACCTTAGGGCCAGACCGCGAGTCATTGGAGGGGTTAGAAGTGGCACC GCGCCCGCCAGCACGGATTACGGGTGTTGGCGCAGTACCTCTTCCGGCATCCGGGAATTCATTTGATGCCC GTCCTTCGCAAGGGTACCGGCGCCGTCGGGGTCGTGGTCAGCACCGTCGGGGCGGCGTTGCTCGTGCAGG CTCTCGTGGCTCTCGTAAGCGGAAACGGCACACCTTCTGCTATTCCTGTGGTGAGGATGGCCATATTCGTGT CCAATGCATTAACCCTAGCAATCTCCTGTTGGCTAAGGAGACCAAAGAGATTTTGGAAGGGGGAGAACGT GAAGCGCAAACGAATTCACGT SEQ ID NO: 45 GGGGCTCTTACGCTCTTAGAAGACTGGTGTAAGGGTATGGACATGGACCCGCGGAAGGCTCTCCTGATTGT AGGTATTCCGATGGAATGCAGTGAGGTGGAAATCCAGGATACAGTTAAAGCTGGTCTTCAACCTCTGTGCG CTTATCGTGTACTCGGCCGTATGTTCCGGCGGGAGGATAATGCGAAGGCTGTTTTCATTGAGCTGGCAGAC ACCGTGAATTACACCACGTTACCGTCTCACATTCCGGGTAAAGGGGGTTCCTGGGAAGTCGTTGTTAAACC TCGGAACCCTGACGACGAGTTCCTTTCTCGGCTTAACTACTTCTTGAAAGATGAGGGCCGCTCGATGACGG ATGTCGCCCGGGCACTGGGGTGCTGTAGCTTACCTGCGGAATCACTGGACGCGGAAGTAATGCCACAGGT CCGCTCCCCACCATTAGAACCTCCAAAAGAGAGTATGTGGTACCGTAAGTTAAAAGTGTTTAGTGGTACCG CGTCGCCTTCGCCGGGGGAGGAGACATTTGAGGACTGGTTAGAGCAAGTCACCGAGATCATGCCTATCTG GCAAGTATCTGAAGTTGAAAAGCGCCGTCGGTTACTGGAGTCACTCCGGGGCCCGGCACTCTCAATTATGC GCGTGTTACAAGCCAATAACGATAGCATTACCGTTGAACAGTGTTTGGATGCATTAAAGCAGATCTTTGGC GACAAGGAAGACTTCCGTGCCTCTCAATTTCGTTTTCTTCAAACGTCCCCTAAAATTGGGGAGAAGGTGAGT ACGTTCCTGCTGCGTTTAGAGCCACTCTTGCAAAAGGCCGTTCACAAGAGCCCACTTTCGGTACGTAGTACT GATATGATTCGGTTAAAGCACCTGTTGGCACGCGTAGCCATGACCCCGGCACTGCGTGGTAAACTCGAATT ACTCGACCAACGCGGGTGCCCACCTAATTTTCTTGAGCTGATGAAGCTGATCCGGGATGAGGAAGAGTGG GAGAATACTGAAGCTGTGATGAAAAATAAAGAGAAACCTTCAGGTCGTGGCCGCGGTGCATCAGGCCGTC AAGCTCGCGCCGAGGCCAGTGTAAGTGCTCCGCAAGCAACAGTCCAAGCACGTAGCTTCTCTGATTCTAGC CCGCAGACGATTCAGGGGGGCTTACCACCTCTTGTCAAGCGTCGGCGCCTTTTGGGTTCGGAGAGCACACG TGGGGAAGACCACGGGCAAGCTACTTATCCGAAAGCAGAGAATCAGACTCCAGGGCGTGAGGGCCCGCA GGCGGCTGGGGAGGAACTTGGTAATGAGGCCGGGGCCGGCGCGATGTCCCACCCGAAACCGTGGGAAAC C SEQ ID NO: 46 GGGGCTGTGACAATGCTCCAGGACTGGTGCCGTTGGATGGGCGTGAACGCTCGGCGGGGGCTGTTAATCT TAGGTATCCCTGAAGACTGTGACGATGCAGAGTTCCAAGAGTCGTTAGAAGCTGCACTCCGTCCTATGGGT CACTTTACTGTACTCGGTAAGGCCTTCCGCGAGGAAGACAACGCTACCGCTGCGCTGGTGGAATTAGATCG
CGAGGTTAATTACGCACTTGTTCCACGCGAAATTCCGGGCACCGGCGGGCCTTGGAACGTCGTGTTCGTTC CTCGGTGCTCCGGCGAGGAATTCCTGGGGTTAGGCCGCGTGTTCCACTTTCCTGAACAGGAGGGCCAAATG GTAGAATCGGTTGCGGGGGCACTGGGGGTAGGTCTGCGCCGCGTGTGTTGGTTACGCTCGATCGGGCAA GCTGTACAACCATGGGTAGAAGCTGTTCGCTGCCAAAGCTTAGGGGTATTTAGTGGTCGTGATCAACCTGC ACCTGGTGAAGAAAGCTTCGAGGTCTGGTTGGATCATACGACCGAGATGTTGCATGTGTGGCAAGGCGTG TCGGAACGGGAACGGCGCCGTCGTCTGCTGGAAGGGCTGCGTGGCACAGCCTTACAACTTGTACATGCCTT ACTGGCAGAAAATCCGGCACGGACAGCACAAGATTGCTTGGCTGCATTAGCCCAAGTTTTTGGTGATAACG AAAGCCAGGCAACGATTCGTGTTAAATGTTTGACAGCCCAACAGCAGAGTGGCGAACGCCTCTCTGCGTTC GTTCTCCGCTTAGAAGTACTTCTGCAAAAGGCTATGGAGAAGGAAGCATTGGCGCGCGCGTCAGCGGATC GGGTGCGTCTTCGTCAGATGCTGACACGCGCACATCTCACAGAGCCGTTGGATGAAGCCTTACGGAAATTG CGTATGGCAGGGCGTTCTCCGTCTTTTTTGGAAATGCTCGGCTTAGTACGCGAGTCAGAGGCCTGGGAGGC AAGTCTGGCTCGGTCCGTCCGGGCGCAAACCCAGGAGGGTGCAGGGGCCCGGGCGGGGGCCCAAGCAGT TGCGCGTGCCAGCACTAAGGTTGAAGCTGTACCTGGTGGCCCTGGCCGGGAGCCAGAAGGTCTCCTCCAA GCCGGGGGCCAAGAAGCGGAAGAACTTCTCCAAGAGGGCTTAAAGCCGGTTTTAGAGGAATGTGACAAT SEQ ID NO: 47 GGGGCGGTCACCATGTTGCAAGACTGGTGTCGGTGGATGGGCGTGAATGCTCGGCGGGGTTTATTGATCT TGGGTATCCCAGAAGACTGTGACGACGCCGAGTTTCAGGAGTCGCTCGAGGCCGCCCTTCGTCCAATGGG GCATTTTACGGTTCTGGGCAAGGTGTTCCGTGAAGAGGATAACGCTACAGCAGCTCTTGTGGAGCTTGACC GTGAGGTGAATTATGCGTTAGTACCTCGCGAGATTCCAGGTACCGGTGGGCCATGGAACGTAGTCTTCGTC CCACGTTGCTCGGGGGAGGAATTTCTGGGGCTTGGGCGCGTATTCCACTTTCCAGAACAGGAAGGGCAGA TGGTCGAAAGCGTAGCAGGCGCTCTTGGCGTTGGTCTCCGGCGCGTGTGCTGGTTACGCTCCATCGGCCAA GCAGTCCAACCATGGGTTGAAGCCGTACGCTATCAATCTTTAGGTGTCTTCTCAGGCCGTGACCAGCCGGC GCCTGGTGAGGAATCCTTCGAAGTCTGGCTCGATCATACAACTGAGATGCTGCATGTATGGCAAGGTGTCT CAGAGCGGGAACGGCGGCGGCGGTTATTAGAGGGGCTCCGTGGGACTGCGCTCCAATTAGTACATGCGCT TTTGGCCGAAAATCCAGCCCGTACTGCCCAAGATTGTCTGGCAGCACTCGCCCAAGTATTCGGCGACAACG AATCGCAGGCAACAATCCGCGTAAAGTGTCTTACAGCACAGCAGCAGTCAGGGGAACGTCTTAGTGCGTTC GTTCTGCGGCTGGAAGTGTTACTCCAGAAAGCCATGGAAAAGGAGGCATTGGCTCGCGCGAGCGCTGACC GTGTACGTCTGCGGCAAATGCTTACTCGCGCACATCTCACCGAGCCTCTCGATGAAGCACTGCGGAAACTG
CGCATGGCAGGCCGCAGCCCGTCTTTCCTGGAAATGTTAGGCTTAGTCCGGGAGTCCGAAGCCTGGGAGG CCAGTCTGGCACGGTCAGTGCGGGCACAAACGCAAGAGGGTGCAGGGGCACGGGCGGGTGCACAAGCA GTTGCACGTGCCTCCACTAAAGTTGAGGCAGTGCCGGGTGGGCCAGGCCGTGAACCGGAGGGTTTGCGCC AAGCCGGCGGGCAGGAAGCCGAAGAATTACTCCAAGAAGGTTTAAAACCGGTTTTGGAGGAATGCGATAA C SEQ ID NO: 48 GGGGTGGAAGATTTGGCGGCATCTTACATCGTATTAAAGCTTGAGAACGAAATCCGGCAGGCGCAGGTCC AATGGTTAATGGAGGAAAACGCCGCCCTGCAGGCCCAGATCCCTGAACTTCAAAAGTCGCAAGCCGCGAA GGAGTATGATCTTCTGCGTAAATCTTCGGAGGCGAAGGAGCCGCAAAAACTGCCAGAACATATGAATCCAC CGGCCGCTTGGGAAGCACAAAAGACTCCAGAGTTTAAGGAACCACAGAAACCTCCTGAACCACAGGATTT GCTTCCTTGGGAGCCGCCTGCTGCCTGGGAGTTGCAAGAAGCACCGGCTGCCCCTGAGTCACTGGCTCCGC CTGCAACCCGTGAGTCTCAGAAACCACCTATGGCGCATGAAATCCCTACTGTATTGGAGGGGCAAGGGCCT GCCAACACACAAGACGCTACGATTGCTCAAGAACCAAAGAATAGCGAGCCGCAAGACCCTCCAAATATCG AGAAACCTCAGGAAGCTCCGGAATATCAAGAAACAGCGGCACAGTTGGAGTTTTTAGAACTTCCTCCACCT CAGGAGCCACTCGAACCGAGCAATGCGCAAGAATTTCTCGAGTTGTCGGCTGCCCAGGAGTCCTTAGAAG GCCTCATTGTAGTTGAAACGTCCGCGGCTTCGGAGTTCCCACAGGCTCCTATCGGGCTTGAAGCCACCGAC TTTCCGCTGCAGTACACGCTTACCTTCTCTGGCGACAGCCAGAAGTTGCCAGAATTTTTGGTCCAACTCTAC AGTTATATGCGGGTACGTGGGCACTTATACCCTACCGAGGCGGCGTTAGTGTCGTTTGTAGGCAATTGTTT CTCAGGGCGCGCGGGCTGGTGGTTTCAGTTGCTTTTGGATATCCAGTCGCCTCTGTTAGAACAGTGTGAAA GTTTTATCCCGGTTCTCCAAGACACATTTGACAATCCGGAAAACATGAAGGACGCAAACCAATGCATCCACC AGCTTTGTCAGGGCGAGGGTCATGTGGCCACACACTTCCACCTCATTGCACAAGAGCTTAATTGGGATGAA AGCACGCTGTGGATCCAGTTCCAGGAAGGCCTGGCCTCATCCATCCAGGATGAACTTTCCCATACATCGCCT GCTACCAACCTGAGTGATCTGATTACTCAATGCATCTCATTAGAGGAAAAGCCTGACCCAAACCCGTTAGG GAAGTCCTCCTCGGCGGAGGGGGATGGCCCGGAAAGTCCGCCAGCAGAAAACCAACCTATGCAAGCTGCG ATCAATTGTCCTCACATTTCCGAAGCAGAGTGGGTTCGTTGGCACAAAGGCCGGCTTTGTCTCTATTGCGGC TATCCGGGTCACTTCGCACGTGATTGCCCAGTGAAGCCACACCAGGCGTTACAGGCAGGGAACATTCAGGC TTGCCAA SEQ ID NO: 49 GGGGTGCAGCCGCAGACTAGCAAAGCTGAATCGCCGGCTCTCGCTGCCTCACCGAACGCACAAATGGATG ACGTTATTGATACATTAACCTCCCTGCGTCTGACGAATTCGGCTCTGCGGCGGGAGGCTAGCACTCTTCGG GCCGAGAAAGCAAATTTAACTAATATGCTCGAGTCAGTGATGGCCGAGTTAACGCTGTTACGGACCCGTGC 
GCGGATTCCGGGGGCCCTGCAGATTACGCCACCAATTTCGTCTATTACTAGCAACGGTACTCGCCCGATGA CGACTCCTCCAACTAGTTTACCTGAACCGTTTTCTGGCGATCCTGGCCGGTTAGCTGGTTTCCTTATGCAGAT GGACCGTTTTATGATCTTTCAAGCTAGCCGGTTTCCAGGGGAGGCAGAGCGTGTTGCGTTCCTGGTGTCGC GCTTAACTGGCGAAGCAGAAAAATGGGCCATTCCTCACATGCAACCAGACTCTCCTTTGCGTAACAACTATC AAGGCTTCTTAGCAGAGTTACGGCGGACCTATAAGAGCCCGTTGCGTCACGCCCGGCGGGCGCAAATCCG GAAGACATCGGCCTCGAACCGGGCAGTCCGTGAACGCCAAATGCTTTGCCGGCAACTTGCATCAGCAGGT ACAGGCCCATGCCCGGTACACCCTGCTAGTAACGGGACTTCCCCGGCACCGGCATTACCAGCACGGGCGC GTAACTTA SEQ ID NO: 50 GGGGACGGTCGGGTACAGTTGATGAAGGCTTTATTGGCTGGCCCTTTACGTCCGGCGGCACGCCGTTGGC GGAATCCTATTCCATTTCCAGAGACTTTTGATGGGGATACTGATCGCCTCCCGGAGTTTATCGTCCAAACTT CGTCCTACATGTTCGTTGACGAAAATACTTTCTCTAACGACGCTCTGAAAGTGACATTTCTCATTACCCGGCT GACAGGTCCAGCCTTGCAATGGGTCATTCCGTACATTCGTAAAGAAAGCCCGCTTCTTAACGACTATCGGG GTTTCCTGGCCGAGATGAAGCGGGTTTTTGGGTGGGAAGAGGACGAGGACTTT SEQ ID NO: 51 GGGGAAGGTCGGGTGCAACTTATGAAAGCGTTGCTTGCCCGCCCGCTTCGTCCAGCAGCACGTCGCTGGC GGAATCCAATTCCTTTCCCGGAGACTTTTGACGGGGACACCGATCGGCTCCCAGAGTTCATTGTGCAGACG TCAAGCTATATGTTCGTGGATGAGAACACGTTCTCTAACGACGCGTTGAAAGTGACTTTCTTAATTACGCGT TTGACTGGCCCGGCTTTACAATGGGTGATTCCATACATTAAGAAAGAGTCACCGCTTCTCAGTGATTATCGC GGTTTTTTAGCCGAGATGAAGCGGGTCTTCGGGTGGGAAGAAGACGAAGACTTT SEQ ID NO: 52 GGGCCGCGTGGGCGTTGCCGTCAACAAGGTCCTCGGATTCCGATTTGGGCAGCGGCCAACTATGCCAACG CCCACCCGTGGCAACAAATGGATAAGGCTTCGCCAGGCGTTGCTTACACACCTTTGGTTGATCCTTGGATTG AGCGGCCTTGTTGCGGTGACACGGTTTGTGTGCGCACCACAATGGAACAGAAGAGCACAGCGTCAGGCAC TTGTGGTGGTAAGCCTGCTGAGCGTGGTCCTCTCGCGGGGCATATGCCGAGCTCACGCCCACATCGGGTTG ATTTCTGTTGGGTTCCTGGTAGCGACCCAGGCACATTCGACGGCAGTCCATGGCTCTTAGATCGCTTTTTGG CGCAACTTGGTGATTACATGAGTTTTCACTTTGAACACTACCAGGACAATATCAGCCGTGTCTGCGAGATTC TTCGTCGGTTAACGGGCCGCGCTCAGGCATGGGCTGCTCCTTACCTGGACGGGGACCTTCCACTGCCAGAC GACTACGAATTGTTTTGTCAAGACCTTAAGGAGGTAGTACAGGACCCTAACAGTTTCGCCGAGTATCACGC CGTGGTGACTTGTCCACTCCCTCTTGCTTCGTCCCAACTTCCTGTAGCTCCTCAGCTTCCGGTGGTACGCCAA TACCTTGCGCGCTTCTTGGAGGGCCTTGCTTTGGATATGGGTACGGCGCCTCGGTCACTCCCGGCCGCTAT 
GGCCACACCGGCAGTCTCCGGCTCGAACTCCGTTTCTCGTTCTGCCTTATTTGAACAACAACTCACAAAGGA ATCCACTCCAGGCCCGAAAGAGCCACCTGTTCTCCCTAGCTCGACTTGCTCTAGCAAACCGGGTCCTGTCGA ACCAGCCAGTTCACAACCTGAAGAGGCTGCTCCTACCCCGGTGCCGCGTTTGTCAGAGTCGGCTAACCCAC CGGCTCAGCGTCCAGACCCTGCTCACCCTGGTGGTCCTAAACCACAAAAAACCGAAGAGGAAGTTTTAGAA ACTGAGGGGGACCAGGAAGTTAGCCTGGGGACGCCGCAGGAGGTCGTAGAAGCGCCGGAAACACCAGG TGAACCACCGCTCAGCCCTGGGTTC SEQ ID NO: 53 GGGGTTGATGAATTGGTGCTCTTGTTGCACGCGCTGTTAATGCGCCATCGGGCGCTTTCCATTGAAAATTCT CAGTTGATGGAGCAACTTCGCTTGTTGGTCTGCGAACGGGCGAGCCTTCTTCGTCAGGTACGTCCGCCGAG
CTGTCCAGTGCCATTTCCTGAGACTTTTAACGGGGAGTCATCACGGTTACCTGAGTTCATCGTCCAAACCGC AAGCTATATGTTAGTTAATGAAAATCGCTTTTGCAATGACGCAATGAAAGTCGCTTTTTTGATTAGCCTTCTT ACTGGTGAAGCAGAAGAATGGGTCGTCCCATACATTGAGATGGATTCACCAATTCTTGGGGACTACCGTGC GTTCTTGGATGAGATGAAGCAGTGTTTTGGGTGGGACGATGATGAAGATGACGACGATGAGGAAGAGGA GGATGACTAT SEQ ID NO: 54 GGGCCTGTGGATTTAGGTCAGGCTTTGGGGTTGTTGCCATCCCTCGCTAAGGCCGAAGATTCCCAATTTAG CGAAAGCGATGCAGCTTTACAGGAGGAATTGTCTTCTCCGGAAACCGCACGGCAACTTTTTCGTCAATTTCG CTATCAAGTCATGTCGGGGCCTCATGAAACACTGAAACAGTTACGGAAGTTATGTTTTCAGTGGCTGCAAC CTGAAGTCCATACAAAGGAACAAATCCTCGAAATTCTGATGCTGGAACAGTTCTTGACCATTCTGCCTGGTG AAATTCAGATGTGGGTCCGCAAGCAGTGCCCTGGTAGTGGGGAGGAGGCGGTTACGTTAGTAGAATCCCT GAAAGGTGATCCACAACGGCTCTGGCAATGGATCTCCATCCAAGTCCTGGGTCAGGATATCCTGTCTGAGA AAATGGAGTCACCTTCTTGCCAGGTGGGCGAAGTGGAGCCACACCTGGAAGTTGTACCTCAGGAACTGGG GTTAGAGAATTCATCTTCAGGGCCGGGGGAACTTCTTTCGCACATCGTGAAAGAGGAGTCTGACACTGAAG CAGAGTTGGCGTTAGCGGCATCCCAGCCAGCTCGTTTGGAAGAACGGCTGATTCGGGATCAGGACCTTGG GGCGTCCCTCCTCCCGGCAGCACCGCAGGAGCAATGGCGTCAATTAGACAGCACTCAAAAAGAACAATATT GGGACCTGATGCTGGAGACCTACGGCAAAATGGTATCCGGCGCGGGTATCTCACACCCGAAGTCCGATTT AACGAACTCAATTGAGTTCGGTGAAGAGTTGGCAGGTATTTATTTACATGTAAACGAAAAGATTCCGCGGC CTACCTGCATTGGTGACCGCCAAGAAAACGACAAAGAAAACCTTAATTTGGAAAACCATCGTGACCAGGAA TTATTACATGCCAGCTGCCAGGCCTCGGGCGAAGTGCCATCCCAGGCATCGTTACGTGGCTTCTTTACCGAG GACGAACCTGGTTGCTTCGGCGAAGGGGAGAACCTTCCTGAGGCACTTCAGAATATCCAGGATGAGGGGA CTGGCGAACAGCTGAGCCCGCAAGAACGCATTAGTGAAAAACAGTTGGGTCAACATTTGCCAAATCCGCAC TCGGGGGAGATGTCGACGATGTGGCTTGAAGAAAAACGGGAGACCAGCCAGAAAGGCCAACCACGTGCA CCAATGGCGCAGAAATTGCCAACGTGCCGCGAATGTGGCAAAACGTTTTATCGCAATAGTCAACTTATCTTT CACCAACGCACACACACCGGTGAGACATATTTTCAATGCACCATCTGCAAAAAGGCGTTTCTCCGGTCATCT GATTTCGTGAAACATCAGCGGACTCATACTGGCGAAAAACCTTGTAAATGTGACTATTGTGGCAAGGGCTT TAGTGATTTTAGCGGGCTTCGGCATCACGAGAAGATCCATACCGGCGAGAAGCCATACAAGTGTCCAATCT GTGAGAAATCTTTCATCCAGCGCAGTAATTTTAACCGCCACCAACGGGTTCACACCGGTGAAAAGCCTTATA AATGCTCGCATTGTGGCAAGAGCTTCAGCTGGAGCTCCTCGCTCGATAAGCATCAACGTTCACATCTGGGG AAGAAGCCGTTCCAA SEQ ID NO: 55
GGGACTCTCCGCTTACTTGAGGATTGGTGTCGGGGGATGGACATGAACCCACGTAAGGCCCTTCTTATCGC CGGGATTTCCCAGTCATGTTCAGTCGCCGAGATTGAAGAGGCGCTCCAAGCCGGGCTTGCTCCTTTAGGCG AGTATCGTCTCCTTGGGCGGATGTTTCGCCGCGATGAAAATCGCAAAGTAGCGTTGGTTGGTCTCACAGCT GAAACTAGCCATGCGCTTGTACCTAAAGAAATTCCTGGTAAAGGCGGGATCTGGCGGGTTATTTTTAAACC ACCGGACCCGGACAATACGTTTCTTTCTCGTTTGAATGAGTTCCTCGCGGGCGAGGGGATGACGGTGGGG GAACTTAGTCGTGCTCTTGGTCACGAAAATGGGTCATTAGACCCTGAACAGGGTATGATTCCGGAAATGTG GGCGCCGATGCTGGCACAGGCTCTGGAGGCTCTCCAACCGGCTTTACAGTGCCTTAAGTACAAGAAGCTGC GCGTTTTTTCAGGGCGCGAGTCTCCAGAGCCGGGTGAGGAGGAATTCGGCCGTTGGATGTTCCATACCACC CAGATGATCAAAGCGTGGCAGGTGCCGGATGTCGAGAAACGCCGCCGGCTGTTGGAATCACTCCGCGGGC CGGCACTTGACGTTATTCGGGTTCTGAAAATTAACAACCCGTTAATTACGGTAGATGAATGTTTGCAAGCAC TTGAAGAGGTCTTTGGGGTGACTGACAATCCTCGGGAATTGCAAGTAAAATACTTAACGACCTACCATAAG GACGAGGAGAAATTATCAGCCTACGTACTGCGGCTGGAACCGCTGCTGCAGAAGCTCGTCCAGCGGGGGG CTATTGAACGGGACGCTGTTAATCAGGCTCGCCTGGATCAGGTAATCGCTGGGGCGGTACATAAAACTATC CGCCGTGAGCTGAACCTGCCTGAAGACGGGCCGGCGCCAGGCTTTCTTCAACTCCTCGTTTTGATTAAGGA TTACGAGGCAGCTGAAGAGGAGGAAGCATTACTTCAGGCCATTCTTGAAGGGAACTTTACT SEQ ID NO: 56 GGGACAGAACGGCGTCGCGACGAATTAAGTGAAGAAATTAATAATCTTCGTGAAAAGGTTATGAAACAGA GTGAGGAAAACAACAATCTTCAATCCCAAGTCCAGAAACTCACTGAGGAGAATACTACACTCCGTGAGCAA GTTGAACCTACACCTGAAGATGAAGATGACGACATTGAGTTGCGGGGCGCAGCAGCCGCAGCCGCGCCTC CGCCGCCGATCGAGGAGGAATGCCCGGAGGATTTACCGGAAAAATTTGATGGTAATCCGGACATGTTAGC GCCATTCATGGCCCAGTGCCAAATTTTTATGGAAAAGTCTACGCGCGATTTTAGTGTAGATCGCGTACGTGT ATGTTTTGTGACGAGCATGATGACTGGTCGCGCAGCCCGTTGGGCGTCAGCGAAATTGGAGCGGTCGCAC TACCTGATGCATAATTACCCGGCGTTCATGATGGAGATGAAACACGTGTTTGAAGACCCGCAGCGGCGGG AGGTGGCCAAACGCAAGATCCGGCGGTTGCGGCAGGGCATGGGCAGCGTAATTGATTATAGTAATGCGTT TCAAATGATTGCGCAGGATCTGGATTGGAATGAACCTGCTCTCATTGATCAATATCATGAAGGGCTTAGTG ACCATATTCAAGAGGAACTCTCTCACCTGGAAGTGGCTAAATCTCTCTCCGCCCTTATTGGCCAATGCATTC ATATTGAGCGCCGTCTTGCACGTGCTGCTGCCGCTCGGAAACCGCGTAGTCCACCACGGGCTTTAGTGCTC CCACATATCGCGTCACACCATCAAGTAGATCCTACTGAGCCAGTGGGGGGTGCACGCATGCGCTTAACCCA
AGAAGAAAAGGAACGTCGTCGTAAGCTGAATTTATGCCTGTACTGCGGCACTGGTGGCCATTATGCCGATA ACTGTCCTGCCAAAGCCAGTAAGTCAAGCCCGGCTGGGAAACTTCCAGGTCCTGCCGTCGAGGGCCCTTCT GCTACCGGCCCAGAGATTATCCGCTCCCCGCAAGACGATGCGTCGTCGCCTCATCTCCAGGTAATGCTCCAA ATCCACCTCCCTGGCCGGCACACACTCTTTGTCCGGGCGATGATTGACTCTGGGGCGTCTGGTAATTTTATT GATCACGAGTATGTTGCTCAAAATGGTATCCCTCTCCGGATCAAAGACTGGCCTATTCTGGTTGAAGCCATC GATGGCCGTCCGATCGCGAGCGGTCCTGTGGTTCATGAAACGCATGACCTCATCGTTGATCTGGGTGACCA CCGTGAAGTATTATCCTTTGATGTGACTCAGTCACCGTTTTTTCCAGTTGTTTTGGGCGTCCGTTGGCTTTCG ACTCACGATCCTAACATCACGTGGTCGACACGGTCGATTGTCTTCGATTCGGAATATTGTCGTTATCATTGC CGCATGTATTCACCAATTCCGCCGTCTCTCCCGCCGCCTGCGCCGCAACCTCCTCTGTATTACCCGGTGGAC GGTTACCGTGTTTACCAGCCAGTTCGCTACTACTACGTACAAAACGTGTACACGCCTGTTGATGAACACGTG TACCCAGATCACCGCCTGGTCGACCCTCATATTGAGATGATCCCGGGTGCGCACTCGATCCCATCGGGCCAT GTTTATTCCTTGTCTGAGCCAGAAATGGCCGCCTTACGGGATTTTGTGGCCCGGAATGTCAAAGACGGCCT GATTACCCCGACAATTGCACCAAACGGTGCTCAGGTGTTGCAGGTGAAGCGGGGCTGGAAGTTGCAAGTC AGCTATGATTGTCGTGCGCCAAACAACTTCACTATTCAGAACCAATATCCACGTCTCAGCATCCCTAATCTCG AGGACCAGGCACATCTTGCAACATATACTGAATTTGTACCTCAGATTCCTGGCTATCAGACTTATCCTACGT ATGCTGCCTACCCAACATACCCGGTAGGTTTCGCATGGTACCCAGTAGGCCGGGACGGGCAGGGCCGCTCT TTATATGTTCCTGTCATGATTACATGGAACCCGCATTGGTACCGCCAGCCTCCGGTCCCACAGTACCCACCTC CTCAACCTCCACCACCTCCGCCGCCTCCTCCACCGCCACCTTCTTACTCGACATTA
Sequence CWU 1 1
851396PRTHomo sapiens 1Gly Glu Leu Asp His Arg Thr Ser Gly Gly Leu His Ala
Tyr Pro Gly1 5 10 15Pro
Arg Gly Gly Gln Val Ala Lys Pro Asn Val Ile Leu Gln Ile Gly 20
25 30Lys Cys Arg Ala Glu Met Leu Glu
His Val Arg Arg Thr His Arg His 35 40
45Leu Leu Ala Glu Val Ser Lys Gln Val Glu Arg Glu Leu Lys Gly Leu
50 55 60His Arg Ser Val Gly Lys Leu Glu
Ser Asn Leu Asp Gly Tyr Val Pro65 70 75
80Thr Ser Asp Ser Gln Arg Trp Lys Lys Ser Ile Lys Ala
Cys Leu Cys 85 90 95Arg
Cys Gln Glu Thr Ile Ala Asn Leu Glu Arg Trp Val Lys Arg Glu
100 105 110Met His Val Trp Arg Glu Val
Phe Tyr Arg Leu Glu Arg Trp Ala Asp 115 120
125Arg Leu Glu Ser Thr Gly Gly Lys Tyr Pro Val Gly Ser Glu Ser
Ala 130 135 140Arg His Thr Val Ser Val
Gly Val Gly Gly Pro Glu Ser Tyr Cys His145 150
155 160Glu Ala Asp Gly Tyr Asp Tyr Thr Val Ser Pro
Tyr Ala Ile Thr Pro 165 170
175Pro Pro Ala Ala Gly Glu Leu Pro Gly Gln Glu Pro Ala Glu Ala Gln
180 185 190Gln Tyr Gln Pro Trp Val
Pro Gly Glu Asp Gly Gln Pro Ser Pro Gly 195 200
205Val Asp Thr Gln Ile Phe Glu Asp Pro Arg Glu Phe Leu Ser
His Leu 210 215 220Glu Glu Tyr Leu Arg
Gln Val Gly Gly Ser Glu Glu Tyr Trp Leu Ser225 230
235 240Gln Ile Gln Asn His Met Asn Gly Pro Ala
Lys Lys Trp Trp Glu Phe 245 250
255Lys Gln Gly Ser Val Lys Asn Trp Val Glu Phe Lys Lys Glu Phe Leu
260 265 270Gln Tyr Ser Glu Gly
Thr Leu Ser Arg Glu Ala Ile Gln Arg Glu Leu 275
280 285Asp Leu Pro Gln Lys Gln Gly Glu Pro Leu Asp Gln
Phe Leu Trp Arg 290 295 300Lys Arg Asp
Leu Tyr Gln Thr Leu Tyr Val Asp Ala Asp Glu Glu Glu305
310 315 320Ile Ile Gln Tyr Val Val Gly
Thr Leu Gln Pro Lys Leu Lys Arg Phe 325
330 335Leu Arg His Pro Leu Pro Lys Thr Leu Glu Gln Leu
Ile Gln Arg Gly 340 345 350Met
Glu Val Gln Asp Asp Leu Glu Gln Ala Ala Glu Pro Ala Gly Pro 355
360 365His Leu Pro Val Glu Asp Glu Ala Glu
Thr Leu Thr Pro Ala Pro Asn 370 375
380Ser Glu Ser Val Ala Ser Asp Arg Thr Gln Pro Glu385 390
3952396PRTOrcinus orca 2Gly Glu Leu Asp Gln Arg Thr Thr
Gly Gly Leu His Ala Tyr Pro Ala1 5 10
15Pro Arg Gly Gly Pro Val Ala Lys Pro Asn Val Ile Leu Gln
Ile Gly 20 25 30Lys Cys Arg
Ala Glu Met Leu Glu His Val Arg Arg Thr His Arg His 35
40 45Leu Leu Thr Glu Val Ser Lys Gln Val Glu Arg
Glu Leu Lys Gly Leu 50 55 60His Arg
Ser Val Gly Lys Leu Glu Ser Asn Leu Asp Gly Tyr Val Pro65
70 75 80Thr Gly Asp Ser Gln Arg Trp
Arg Lys Ser Ile Lys Ala Cys Leu Cys 85 90
95Arg Cys Gln Glu Thr Ile Ala Asn Leu Glu Arg Trp Val
Lys Arg Glu 100 105 110Met His
Val Trp Arg Glu Val Phe Tyr Arg Leu Glu Arg Trp Ala Asp 115
120 125Arg Leu Glu Ser Met Gly Gly Lys Tyr Pro
Val Gly Ser Asn Pro Ser 130 135 140Arg
His Thr Thr Ser Val Gly Val Gly Gly Pro Glu Ser Tyr Gly His145
150 155 160Glu Ala Asp Thr Tyr Asp
Tyr Thr Val Ser Pro Tyr Ala Ile Thr Pro 165
170 175Pro Pro Ala Ala Gly Glu Leu Pro Gly Gln Glu Ala
Val Glu Ala Gln 180 185 190Gln
Tyr Pro Pro Trp Gly Leu Gly Glu Asp Gly Gln Pro Ser Pro Gly 195
200 205Val Asp Thr Gln Ile Phe Glu Asp Pro
Arg Glu Phe Leu Ser His Leu 210 215
220Glu Glu Tyr Leu Arg Gln Val Gly Gly Ser Glu Glu Tyr Trp Leu Ser225
230 235 240Gln Ile Gln Asn
His Met Asn Gly Pro Ala Lys Lys Trp Trp Glu Tyr 245
250 255Lys Gln Gly Ser Val Lys Asn Trp Val Glu
Phe Lys Lys Glu Phe Leu 260 265
270Gln Tyr Ser Glu Gly Ala Leu Ser Arg Glu Ala Val Gln Arg Glu Leu
275 280 285Asp Leu Pro Gln Lys Gln Gly
Glu Pro Leu Asp Gln Phe Leu Trp Arg 290 295
300Lys Arg Asp Leu Tyr Gln Thr Leu Tyr Val Asp Ala Asp Glu Glu
Glu305 310 315 320Ile Ile
Gln Tyr Val Val Gly Thr Leu Gln Pro Lys Leu Lys Arg Phe
325 330 335Leu Arg Pro Pro Leu Pro Lys
Thr Leu Glu Gln Leu Ile Gln Lys Gly 340 345
350Met Glu Val Glu Asp Gly Leu Glu Gln Val Ala Glu Pro Ala
Ser Pro 355 360 365His Leu Pro Thr
Glu Glu Glu Ser Glu Ala Leu Thr Pro Ala Leu Thr 370
375 380Ser Glu Ser Val Ala Ser Asp Arg Thr Gln Pro Glu385
390 3953390PRTOdocoileus virginianus
texanus 3Gly Glu Leu Asp His Arg Thr Thr Gly Gly Leu His Ala Tyr Pro Ala1
5 10 15Pro Arg Gly Gly
Pro Ala Ala Lys Pro Asn Val Ile Leu Gln Ile Gly 20
25 30Lys Cys Arg Ala Glu Met Leu Glu His Val Arg
Arg Thr His Arg His 35 40 45Leu
Leu Ala Glu Val Ser Lys Gln Val Glu Arg Glu Leu Lys Gly Leu 50
55 60His Arg Ser Val Gly Lys Leu Glu Ser Asn
Leu Asp Gly Tyr Val Pro65 70 75
80Thr Gly Asp Ser Gln Arg Trp Lys Lys Ser Ile Lys Ala Cys Leu
Ser 85 90 95Arg Cys Gln
Glu Thr Ile Ala Asn Leu Glu Arg Trp Val Lys Arg Glu 100
105 110Met His Val Trp Arg Glu Val Phe Tyr Arg
Leu Glu Arg Trp Ala Asp 115 120
125Arg Leu Glu Ser Gly Gly Gly Lys Tyr Pro Val Gly Ser Asp Pro Ala 130
135 140Arg His Thr Val Ser Val Gly Val
Gly Gly Pro Glu Ser Tyr Cys Gln145 150
155 160Asp Ala Asp Asn Tyr Asp Tyr Thr Val Ser Pro Tyr
Ala Ile Thr Pro 165 170
175Pro Pro Ala Ala Gly Gln Leu Pro Gly Gln Glu Glu Val Glu Ala Gln
180 185 190Gln Tyr Pro Pro Trp Ala
Pro Gly Glu Asp Gly Gln Leu Ser Pro Gly 195 200
205Val Asp Thr Gln Val Phe Glu Asp Pro Arg Glu Phe Leu Arg
His Leu 210 215 220Glu Asp Tyr Leu Arg
Gln Val Gly Gly Ser Glu Glu Tyr Trp Leu Ser225 230
235 240Gln Ile Gln Asn His Met Asn Gly Pro Ala
Lys Lys Trp Trp Glu Tyr 245 250
255Lys Gln Gly Ser Val Lys Asn Trp Val Glu Phe Lys Lys Glu Phe Leu
260 265 270Gln Tyr Ser Glu Gly
Thr Leu Ser Arg Glu Ala Ile Gln Arg Glu Leu 275
280 285Asp Leu Pro Gln Lys Gln Gly Glu Pro Leu Asp Gln
Phe Leu Trp Arg 290 295 300Lys Arg Asp
Leu Tyr Gln Thr Leu Tyr Val Asp Ala Glu Glu Glu Glu305
310 315 320Ile Ile Gln Tyr Val Val Gly
Thr Leu Gln Pro Lys Leu Lys Arg Phe 325
330 335Leu Arg Pro Pro Leu Pro Lys Thr Leu Glu Gln Leu
Ile Gln Lys Gly 340 345 350Met
Glu Val Gln Asp Gly Leu Glu Gln Ala Ala Glu Pro Ala Ala Glu 355
360 365Glu Ala Glu Ala Leu Thr Pro Ala Leu
Thr Asn Glu Ser Val Ala Ser 370 375
380Asp Arg Thr Gln Pro Glu385 3904401PRTOrnithorhynchus
anatinus 4Gly Glu Leu Asp Arg Leu Asn Pro Ser Ser Gly Leu His Pro Ser
Ser1 5 10 15Gly Leu His
Pro Tyr Pro Gly Leu Arg Gly Gly Ala Thr Ala Lys Pro 20
25 30Asn Val Ile Leu Gln Ile Gly Lys Cys Arg
Ala Glu Met Leu Glu His 35 40
45Val Arg Lys Thr His Arg His Leu Leu Thr Glu Val Ser Arg Gln Val 50
55 60Glu Arg Glu Leu Lys Gly Leu His Lys
Ser Val Gly Lys Leu Glu Ser65 70 75
80Asn Leu Asp Gly Tyr Val Pro Ser Ser Asp Ser Gln Arg Trp
Lys Lys 85 90 95Ser Ile
Lys Ala Cys Leu Ser Arg Cys Gln Glu Thr Ile Ala His Leu 100
105 110Glu Arg Trp Val Lys Arg Glu Met Asn
Val Trp Arg Glu Val Phe Tyr 115 120
125Arg Leu Glu Arg Trp Ala Asp Arg Leu Glu Ala Met Gly Gly Lys Tyr
130 135 140Pro Ala Gly Glu Gln Ala Arg
Arg Thr Val Ser Val Gly Val Gly Gly145 150
155 160Pro Glu Thr Cys Cys Pro Gly Asp Glu Ser Tyr Asp
Cys Pro Ile Ser 165 170
175Pro Tyr Ala Val Pro Pro Ser Thr Gly Glu Ser Pro Glu Ser Leu Asp
180 185 190Gln Gly Asp Gln His Tyr
Gln Gln Trp Phe Ala Leu Pro Glu Glu Ser 195 200
205Pro Val Ser Pro Gly Val Asp Thr Gln Ile Phe Glu Asp Pro
Arg Glu 210 215 220Phe Leu Arg His Leu
Glu Lys Tyr Leu Lys Gln Val Gly Gly Thr Glu225 230
235 240Glu Asp Trp Leu Ser Gln Ile Gln Asn His
Met Asn Gly Pro Ala Lys 245 250
255Lys Trp Trp Glu Tyr Lys Gln Gly Ser Val Lys Asn Trp Leu Glu Phe
260 265 270Lys Lys Glu Phe Leu
Gln Tyr Ser Glu Gly Thr Leu Thr Arg Asp Ala 275
280 285Leu Lys Arg Glu Leu Asp Leu Pro Gln Lys Gln Gly
Glu Pro Leu Asp 290 295 300Gln Phe Leu
Trp Arg Lys Arg Asp Leu Tyr Gln Thr Leu Tyr Val Asp305
310 315 320Ala Asp Glu Glu Glu Ile Ile
Gln Tyr Val Val Gly Thr Leu Gln Pro 325
330 335Lys Leu Lys Arg Phe Leu His His Pro Leu Pro Lys
Thr Leu Glu Gln 340 345 350Leu
Ile Gln Arg Gly Gln Glu Val Gln Asn Gly Leu Glu Pro Thr Asp 355
360 365Asp Pro Ala Gly Gln Arg Thr Gln Ser
Glu Asp Asn Asp Glu Ser Leu 370 375
380Thr Pro Ala Val Thr Asn Glu Ser Thr Ala Ser Glu Gly Thr Leu Pro385
390 395 400Glu5404PRTAnser
cygnoides domesticus 5Gly Gln Leu Asp Asn Val Thr Asn Ala Gly Ile His Ser
Phe Gln Gly1 5 10 15His
Arg Gly Val Ala Asn Lys Pro Asn Val Ile Leu Gln Ile Gly Lys 20
25 30Cys Arg Ala Glu Met Leu Glu His
Val Arg Arg Thr His Arg His Leu 35 40
45Leu Ser Glu Val Ser Lys Gln Val Glu Arg Glu Leu Lys Gly Leu Gln
50 55 60Lys Ser Val Gly Lys Leu Glu Asn
Asn Leu Glu Asp His Val Pro Thr65 70 75
80Asp Asn Gln Arg Trp Lys Lys Ser Ile Lys Ala Cys Leu
Ala Arg Cys 85 90 95Gln
Glu Thr Ile Ala His Leu Glu Arg Trp Val Lys Arg Glu Met Asn
100 105 110Val Trp Lys Glu Val Phe Phe
Arg Leu Glu Lys Trp Ala Asp Arg Leu 115 120
125Glu Ser Met Gly Gly Lys Tyr Cys Pro Gly Glu His Gly Lys Gln
Thr 130 135 140Val Ser Val Gly Val Gly
Gly Pro Glu Ile Arg Pro Ser Glu Gly Glu145 150
155 160Ile Tyr Asp Tyr Ala Leu Asp Met Ser Gln Met
Tyr Ala Leu Thr Pro 165 170
175Pro Pro Gly Glu Met Pro Ser Ile Pro Gln Ala His Asp Ser Tyr Gln
180 185 190Trp Val Ser Val Ser Glu
Asp Ala Pro Ala Ser Pro Val Glu Thr Gln 195 200
205Val Phe Glu Asp Pro Arg Glu Phe Leu Ser His Leu Glu Glu
Tyr Leu 210 215 220Lys Gln Val Gly Gly
Thr Glu Glu Tyr Trp Leu Ser Gln Ile Gln Asn225 230
235 240His Met Asn Gly Pro Ala Lys Lys Trp Trp
Glu Tyr Lys Gln Asp Ser 245 250
255Val Lys Asn Trp Val Glu Phe Lys Lys Glu Phe Leu Gln Tyr Ser Glu
260 265 270Gly Thr Leu Thr Arg
Asp Ala Ile Lys Arg Glu Leu Asp Leu Pro Gln 275
280 285Lys Glu Gly Glu Pro Leu Asp Gln Phe Leu Trp Arg
Lys Arg Asp Leu 290 295 300Tyr Gln Thr
Leu Tyr Val Asp Ala Asp Glu Glu Glu Ile Ile Gln Tyr305
310 315 320Val Val Gly Thr Leu Gln Pro
Lys Leu Lys Arg Phe Leu Ser Tyr Pro 325
330 335Leu Pro Lys Thr Leu Glu Gln Leu Ile Gln Arg Gly
Lys Glu Val Gln 340 345 350Gly
Asn Met Asp His Ser Asp Glu Pro Ser Pro Gln Arg Thr Pro Glu 355
360 365Ile Gln Ser Gly Asp Ser Val Glu Ser
Met Pro Pro Ser Thr Thr Ala 370 375
380Ser Pro Val Pro Ser Asn Gly Thr Gln Pro Glu Pro Pro Ser Pro Pro385
390 395 400Ala Thr Val
Ile6395PRTPelecanus crispus 6Gly Gln Leu Asp Asn Val Thr Asn Ala Gly Ile
His Ser Phe Gln Gly1 5 10
15His Arg Gly Val Ala Asn Lys Pro Asn Val Ile Leu Gln Ile Gly Lys
20 25 30Cys Arg Ala Glu Met Leu Glu
His Val Arg Arg Thr His Arg His Leu 35 40
45Leu Ser Glu Val Ser Lys Gln Val Glu Arg Glu Leu Lys Gly Leu
Gln 50 55 60Lys Ser Val Gly Lys Leu
Glu Asn Asn Leu Glu Asp His Val Pro Thr65 70
75 80Asp Asn Gln Arg Trp Lys Lys Ser Ile Lys Ala
Cys Leu Ala Arg Cys 85 90
95Gln Glu Thr Ile Ala His Leu Glu Arg Trp Val Lys Arg Glu Met Asn
100 105 110Val Trp Lys Glu Val Phe
Phe Arg Leu Glu Lys Trp Ala Asp Arg Leu 115 120
125Glu Ser Met Gly Gly Lys Tyr Cys Pro Gly Glu His Gly Lys
Gln Thr 130 135 140Val Ser Val Gly Val
Gly Gly Pro Glu Ile Arg Pro Ser Glu Gly Glu145 150
155 160Ile Tyr Asp Tyr Ala Leu Asp Met Ser Gln
Met Tyr Ala Leu Thr Pro 165 170
175Pro Pro Gly Glu Val Pro Ser Ile Pro Gln Ala His Asp Ser Tyr Gln
180 185 190Trp Val Ser Val Ser
Glu Asp Ala Pro Ala Ser Pro Val Glu Thr Gln 195
200 205Val Phe Glu Asp Pro Arg Glu Phe Leu Ser His Leu
Glu Glu Tyr Leu 210 215 220Lys Gln Val
Gly Gly Thr Glu Glu Tyr Trp Leu Ser Gln Ile Gln Asn225
230 235 240His Met Asn Gly Pro Ala Lys
Lys Trp Trp Glu Tyr Lys Gln Asp Ser 245
250 255Val Lys Asn Trp Val Glu Phe Lys Lys Glu Phe Leu
Gln Tyr Ser Glu 260 265 270Gly
Thr Leu Thr Arg Asp Ala Ile Lys Arg Glu Leu Asp Leu Pro Gln 275
280 285Lys Glu Gly Glu Pro Leu Asp Gln Phe
Leu Trp Arg Lys Arg Asp Leu 290 295
300Tyr Gln Thr Leu Tyr Val Asp Ala Asp Glu Glu Glu Ile Ile Gln Tyr305
310 315 320Val Val Gly Thr
Leu Gln Pro Lys Leu Lys Arg Phe Leu Ser Tyr Pro 325
330 335Leu Pro Lys Thr Leu Glu Gln Leu Ile Gln
Arg Gly Lys Glu Val Gln 340 345
350Gly Asn Met Asp His Ser Glu Glu Pro Ser Pro Gln Arg Thr Pro Glu
355 360 365Ile Gln Ser Gly Asp Ser Val
Asp Ser Val Pro Pro Ser Thr Thr Ala 370 375
380Ser Pro Val Pro Ser Asn Gly Thr Gln Pro Glu385
390 3957395PRTHaliaeetus albicilla 7Gly Gln Leu Asp Asn
Val Thr Asn Ala Gly Ile His Ser Phe Gln Gly1 5
10 15His Arg Gly Val Ala Asn Lys Pro Asn Val Ile
Leu Gln Ile Gly Lys 20 25
30Cys Arg Ala Glu Met Leu Glu His Val Arg Arg Thr His Arg His Leu
35 40 45Leu Ser Glu Val Ser Lys Gln Val
Glu Arg Glu Leu Lys Gly Leu Gln 50 55
60Lys Ser Val Gly Lys Leu Glu Asn Asn Leu Glu Asp His Val Pro Thr65
70 75 80Asp Asn Gln Arg Trp
Lys Lys Ser Ile Lys Ala Cys Leu Ala Arg Cys 85
90 95Gln Glu Thr Ile Ala His Leu Glu Arg Trp Val
Lys Arg Glu Met Asn 100 105
110Val Trp Lys Glu Val Phe Phe Arg Leu Glu Lys Trp Ala Asp Arg Leu
115 120 125Glu Ser Met Gly Gly Lys Tyr
Cys Pro Gly Asp His Gly Lys Gln Thr 130 135
140Val Ser Val Gly Val Gly Gly Pro Glu Ile Arg Pro Ser Glu Gly
Glu145 150 155 160Ile Tyr
Asp Tyr Ala Leu Asp Met Ser Gln Met Tyr Ala Leu Thr Pro
165 170 175Pro Pro Gly Glu Val Pro Ser
Ile Pro Gln Ala His Asp Ser Tyr Gln 180 185
190Trp Val Ser Thr Ser Glu Asp Ala Pro Ala Ser Pro Val Glu
Thr Gln 195 200 205Val Phe Glu Asp
Pro Arg Glu Phe Leu Ser His Leu Glu Glu Tyr Leu 210
215 220Lys Gln Val Gly Gly Thr Glu Glu Tyr Trp Leu Ser
Gln Ile Gln Asn225 230 235
240His Met Asn Gly Pro Ala Lys Lys Trp Trp Glu Tyr Lys Gln Asp Ser
245 250 255Val Lys Asn Trp Val
Glu Phe Lys Lys Glu Phe Leu Gln Tyr Ser Glu 260
265 270Gly Thr Leu Thr Arg Asp Ala Ile Lys Arg Glu Leu
Asp Leu Pro Gln 275 280 285Lys Glu
Gly Glu Pro Leu Asp Gln Phe Leu Trp Arg Lys Arg Asp Leu 290
295 300Tyr Gln Thr Leu Tyr Val Asp Ala Asp Glu Glu
Glu Ile Ile Gln Tyr305 310 315
320Val Val Gly Thr Leu Gln Pro Lys Leu Lys Arg Phe Leu Ser Tyr Pro
325 330 335Leu Pro Lys Thr
Leu Glu Gln Leu Ile Gln Arg Gly Lys Glu Val Gln 340
345 350Gly Asn Met Asp His Ser Glu Glu Pro Ser Pro
Gln Arg Thr Pro Glu 355 360 365Ile
Gln Ser Gly Asp Ser Val Asp Ser Val Pro Pro Ser Thr Thr Ala 370
375 380Ser Pro Val Pro Ser Asn Gly Thr Gln Pro
Glu385 390 3958465PRTOphiophagus hannah
8Gly Ser Trp Gly Leu Gln Arg His Val Ala Asp Glu Arg Arg Gly Leu1
5 10 15Ala Thr Pro Thr Tyr Gly
Ala Val Cys Ser Ile Arg Glu Lys Lys Ala 20 25
30Ser Gln Leu Ser Gly Gln Ser Cys Leu Glu Lys Glu Leu
Leu Gly Trp 35 40 45Lys Cys Thr
Glu Ala Ile Val Glu Met Met Gln Val Asp Asn Phe Asn 50
55 60His Gly Asn Leu His Ser Cys Gln Gly His Arg Gly
Met Ala Asn His65 70 75
80Lys Pro Asn Val Ile Leu Gln Ile Gly Lys Cys Arg Ala Glu Met Leu
85 90 95Asp His Val Arg Arg Thr
His Arg His Leu Leu Thr Glu Val Ser Lys 100
105 110Gln Val Glu Arg Glu Leu Lys Ser Leu Gln Lys Ser
Val Gly Lys Leu 115 120 125Glu Asn
Asn Leu Glu Asp His Val Pro Ser Ala Ala Glu Asn Gln Arg 130
135 140Trp Lys Lys Ser Ile Lys Ala Cys Leu Ala Arg
Cys Gln Glu Thr Ile145 150 155
160Ala His Leu Glu Arg Trp Val Lys Arg Glu Ile Asn Val Trp Lys Glu
165 170 175Val Phe Phe Arg
Leu Glu Lys Trp Ala Asp Arg Leu Glu Ser Gly Gly 180
185 190Gly Lys Tyr Gly Pro Gly Asp Gln Ser Arg Gln
Thr Val Ser Val Gly 195 200 205Val
Gly Ala Pro Glu Ile Gln Pro Arg Lys Glu Glu Ile Tyr Asp Tyr 210
215 220Ala Leu Asp Met Ser Gln Met Tyr Ala Leu
Thr Pro Pro Pro Met Gly225 230 235
240Glu Asp Pro Asn Val Pro Gln Ser His Asp Ser Tyr Gln Trp Ile
Thr 245 250 255Ile Ser Asp
Asp Ser Pro Pro Ser Pro Val Glu Thr Gln Ile Phe Glu 260
265 270Asp Pro Arg Glu Phe Leu Thr His Leu Glu
Asp Tyr Leu Lys Gln Val 275 280
285Gly Gly Thr Glu Glu Tyr Trp Leu Ser Gln Ile Gln Asn His Met Asn 290
295 300Gly Pro Ala Lys Lys Trp Trp Glu
Tyr Lys Gln Asp Ser Val Lys Asn305 310
315 320Trp Leu Glu Phe Lys Lys Glu Phe Leu Gln Tyr Ser
Glu Gly Thr Leu 325 330
335Thr Arg Asp Ala Ile Lys Gln Glu Leu Asp Leu Pro Gln Lys Asp Gly
340 345 350Glu Pro Leu Asp Gln Phe
Leu Trp Arg Lys Arg Asp Leu Tyr Gln Thr 355 360
365Leu Tyr Ile Asp Ala Glu Glu Glu Glu Val Ile Gln Tyr Val
Val Gly 370 375 380Thr Leu Gln Pro Lys
Leu Lys Arg Phe Leu Ser His Pro Tyr Pro Lys385 390
395 400Thr Leu Glu Gln Leu Ile Gln Arg Gly Lys
Glu Val Glu Gly Asn Leu 405 410
415Asp Asn Ser Glu Glu Pro Ser Pro Gln Arg Ser Pro Lys His Gln Leu
420 425 430Gly Gly Ser Val Glu
Ser Leu Pro Pro Ser Ser Thr Ala Ser Pro Val 435
440 445Ala Ser Asp Glu Thr His Pro Asp Val Ser Ala Pro
Pro Val Thr Val 450 455
460Ile465
SEQ ID NO: 9 (length 451, PRT, Austrofundulus limnaeus)
Gly Asp Gly Glu Thr Gln Ala Glu
Asn Pro Ser Thr Ser Leu Asn Asn1 5 10
15Thr Asp Glu Asp Ile Leu Glu Gln Leu Lys Lys Ile Val Met
Asp Gln 20 25 30Gln His Leu
Tyr Gln Lys Glu Leu Lys Ala Ser Phe Glu Gln Leu Ser 35
40 45Arg Lys Met Phe Ser Gln Met Glu Gln Met Asn
Ser Lys Gln Thr Asp 50 55 60Leu Leu
Leu Glu His Gln Lys Gln Thr Val Lys His Val Asp Lys Arg65
70 75 80Val Glu Tyr Leu Arg Ala Gln
Phe Asp Ala Ser Leu Gly Trp Arg Leu 85 90
95Lys Glu Gln His Ala Asp Ile Thr Thr Lys Ile Ile Pro
Glu Ile Ile 100 105 110Gln Thr
Val Lys Glu Asp Ile Ser Leu Cys Leu Ser Thr Leu Cys Ser 115
120 125Ile Ala Glu Asp Ile Gln Thr Ser Arg Ala
Thr Thr Val Thr Gly His 130 135 140Ala
Ala Val Gln Thr His Pro Val Asp Leu Leu Gly Glu His His Leu145
150 155 160Gly Thr Thr Gly His Pro
Arg Leu Gln Ser Thr Arg Val Gly Lys Pro 165
170 175Asp Asp Val Pro Glu Ser Pro Val Ser Leu Phe Met
Gln Gly Glu Ala 180 185 190Arg
Ser Arg Ile Val Gly Lys Ser Pro Ile Lys Leu Gln Phe Pro Thr 195
200 205Phe Gly Lys Ala Asn Asp Ser Ser Asp
Pro Leu Gln Tyr Leu Glu Arg 210 215
220Cys Glu Asp Phe Leu Ala Leu Asn Pro Leu Thr Asp Glu Glu Leu Met225
230 235 240Ala Thr Leu Arg
Asn Val Leu His Gly Thr Ser Arg Asp Trp Trp Asp 245
250 255Val Ala Arg His Lys Ile Gln Thr Trp Arg
Glu Phe Asn Lys His Phe 260 265
270Arg Ala Ala Phe Leu Ser Glu Asp Tyr Glu Asp Glu Leu Ala Glu Arg
275 280 285Val Arg Asn Arg Ile Gln Lys
Glu Asp Glu Ser Ile Arg Asp Phe Ala 290 295
300Tyr Met Tyr Gln Ser Leu Cys Lys Arg Trp Asn Pro Ala Ile Cys
Glu305 310 315 320Gly Asp
Val Val Lys Leu Ile Leu Lys Asn Ile Asn Pro Gln Leu Pro
325 330 335Ser Gln Leu Arg Ser Arg Val
Thr Thr Val Asp Glu Leu Val Arg Leu 340 345
350Gly Gln Gln Leu Glu Lys Asp Arg Gln Asn Gln Leu Gln Tyr
Glu Leu 355 360 365Arg Lys Ser Ser
Gly Lys Ile Ile Gln Lys Ser Ser Ser Cys Glu Thr 370
375 380Ser Ala Leu Pro Asn Thr Lys Ser Thr Pro Asn Gln
Gln Asn Pro Ala385 390 395
400Thr Ser Asn Arg Pro Pro Gln Val Tyr Cys Trp Arg Cys Lys Gly His
405 410 415His Ala Pro Ala Ser
Cys Pro Gln Trp Lys Ala Asp Lys His Arg Ala 420
425 430Gln Pro Ser Arg Ser Ser Gly Pro Gln Thr Leu Thr
Asn Leu Gln Ala 435 440 445Gln Asp
Ile 450
SEQ ID NO: 10 (length 396, PRT, Physeter catodon)
Gly Glu Leu Asp Gln Arg Ala Ala Gly
Gly Leu Arg Ala Tyr Pro Ala1 5 10
15Pro Arg Gly Gly Pro Val Ala Lys Pro Ser Val Ile Leu Gln Ile
Gly 20 25 30Lys Cys Arg Ala
Glu Met Leu Glu His Val Arg Arg Thr His Arg His 35
40 45Leu Leu Thr Glu Val Ser Lys Gln Val Glu Arg Glu
Leu Lys Gly Leu 50 55 60His Arg Ser
Val Gly Lys Leu Glu Gly Asn Leu Asp Gly Tyr Val Pro65 70
75 80Thr Gly Asp Ser Gln Arg Trp Lys
Lys Ser Ile Lys Ala Cys Leu Cys 85 90
95Arg Cys Gln Glu Thr Ile Ala Asn Leu Glu Arg Trp Val Lys
Arg Glu 100 105 110Met His Val
Trp Arg Glu Val Phe Tyr Arg Leu Glu Arg Trp Ala Asp 115
120 125Arg Leu Glu Ser Met Gly Gly Lys Tyr Pro Val
Gly Thr Asn Pro Ser 130 135 140Arg His
Thr Val Ser Val Gly Val Gly Gly Pro Glu Gly Tyr Ser His145
150 155 160Glu Ala Asp Thr Tyr Asp Tyr
Thr Val Ser Pro Tyr Ala Ile Thr Pro 165
170 175Pro Pro Ala Ala Gly Glu Leu Pro Gly Gln Glu Ala
Val Glu Ala Gln 180 185 190Gln
Tyr Pro Pro Trp Gly Leu Gly Glu Asp Gly Gln Pro Gly Pro Gly 195
200 205Val Asp Thr Gln Ile Phe Glu Asp Pro
Arg Glu Phe Leu Ser His Leu 210 215
220Glu Glu Tyr Leu Arg Gln Val Gly Gly Ser Glu Glu Tyr Trp Leu Ser225
230 235 240Gln Ile Gln Asn
His Met Asn Gly Pro Ala Lys Lys Trp Trp Glu Phe 245
250 255Lys Gln Gly Ser Val Lys Asn Trp Val Glu
Phe Lys Lys Glu Phe Leu 260 265
270Gln Tyr Ser Glu Gly Thr Leu Ser Arg Glu Ala Ile Gln Arg Glu Leu
275 280 285Asp Leu Pro Gln Lys Gln Gly
Glu Pro Leu Asp Gln Phe Leu Trp Arg 290 295
300Lys Arg Asp Leu Tyr Gln Thr Leu Tyr Val Asp Ala Glu Glu Glu
Glu305 310 315 320Ile Ile
Gln Tyr Val Val Gly Thr Leu Gln Pro Lys Leu Lys Arg Phe
325 330 335Leu Arg Pro Pro Leu Pro Lys
Thr Leu Glu Gln Leu Ile Gln Lys Gly 340 345
350Met Glu Val Gln Asp Gly Leu Glu Gln Ala Ala Glu Pro Ala
Ser Pro 355 360 365Arg Leu Pro Pro
Glu Glu Glu Ser Glu Ala Leu Thr Pro Ala Leu Thr 370
375 380Ser Glu Ser Val Ala Ser Asp Arg Thr Gln Pro Glu385
390 395
SEQ ID NO: 11 (length 404, PRT, Meleagris gallopavo)
Gly
Gln Leu Asp Asn Val Thr Asn Ala Gly Ile His Ser Phe Gln Gly1
5 10 15His Arg Gly Val Ala Asn Lys
Pro Asn Val Ile Leu Gln Ile Gly Lys 20 25
30Cys Arg Ala Glu Met Leu Glu His Val Arg Arg Thr His Arg
His Leu 35 40 45Leu Ser Glu Val
Ser Lys Gln Val Glu Arg Glu Leu Lys Gly Leu Gln 50 55
60Lys Ser Val Gly Lys Leu Glu Asn Asn Leu Glu Asp His
Val Pro Thr65 70 75
80Asp Asn Gln Arg Trp Lys Lys Ser Ile Lys Ala Cys Leu Ala Arg Cys
85 90 95Gln Glu Thr Ile Ala His
Leu Glu Arg Trp Val Lys Arg Glu Met Asn 100
105 110Val Trp Lys Glu Val Phe Phe Arg Leu Glu Lys Trp
Ala Asp Arg Leu 115 120 125Glu Ser
Met Gly Gly Lys Tyr Cys Pro Gly Glu His Gly Lys Gln Thr 130
135 140Val Ser Val Gly Val Gly Gly Pro Glu Ile Arg
Pro Ser Glu Gly Glu145 150 155
160Ile Tyr Asp Tyr Ala Leu Asp Met Ser Gln Met Tyr Ala Leu Thr Pro
165 170 175Gly Pro Gly Glu
Val Pro Ser Ile Pro Gln Ala His Asp Ser Tyr Gln 180
185 190Trp Val Ser Val Ser Glu Asp Ala Pro Ala Ser
Pro Val Glu Thr Gln 195 200 205Ile
Phe Glu Asp Pro His Glu Phe Leu Ser His Leu Glu Glu Tyr Leu 210
215 220Lys Gln Val Gly Gly Thr Glu Glu Tyr Trp
Leu Ser Gln Ile Gln Asn225 230 235
240His Met Asn Gly Pro Ala Lys Lys Trp Trp Glu Tyr Lys Gln Asp
Ser 245 250 255Val Lys Asn
Trp Val Glu Phe Lys Lys Glu Phe Leu Gln Tyr Ser Glu 260
265 270Gly Thr Leu Thr Arg Asp Ala Ile Lys Arg
Glu Leu Asp Leu Pro Gln 275 280
285Lys Glu Gly Glu Pro Leu Asp Gln Phe Leu Trp Arg Lys Arg Asp Leu 290
295 300Tyr Gln Thr Leu Tyr Val Asp Ala
Asp Glu Glu Glu Ile Ile Gln Tyr305 310
315 320Val Val Gly Thr Leu Gln Pro Lys Leu Lys Arg Phe
Leu Ser Tyr Pro 325 330
335Leu Pro Lys Thr Leu Glu Gln Leu Ile Gln Arg Gly Lys Glu Val Gln
340 345 350Gly Asn Met Asp His Ser
Glu Glu Pro Ser Pro Gln Arg Thr Pro Glu 355 360
365Ile Gln Ser Gly Asp Ser Val Glu Ser Met Pro Pro Ser Thr
Thr Ala 370 375 380Ser Pro Val Pro Ser
Asn Gly Thr Gln Pro Glu Pro Pro Ser Pro Pro385 390
395 400Ala Thr Val Ile
SEQ ID NO: 12 (length 409, PRT, Pogona vitticeps)
Gly Gln Leu Glu Asn Ile Asn Gln Gly Ser Leu His Ala Phe Gln Gly1
5 10 15His Arg Gly Val Val His
Asn Asn Lys Pro Asn Val Ile Leu Gln Ile 20 25
30Gly Lys Cys Arg Ala Glu Met Leu Glu His Val Arg Arg
Thr His Arg 35 40 45His Leu Leu
Thr Glu Val Ser Lys Gln Val Glu Arg Glu Leu Lys Gly 50
55 60Leu Gln Lys Ser Val Gly Lys Leu Glu Asn Asn Leu
Glu Asp His Val65 70 75
80Pro Ser Ala Ala Glu Asn Gln Arg Trp Lys Lys Ser Ile Lys Ala Cys
85 90 95Leu Ala Arg Cys Gln Glu
Thr Ile Ala Asn Leu Glu Arg Trp Val Lys 100
105 110Arg Glu Met Asn Val Trp Lys Glu Val Phe Phe Arg
Leu Glu Arg Trp 115 120 125Ala Asp
Arg Leu Glu Ser Gly Gly Gly Lys Tyr Cys His Ala Asp Gln 130
135 140Gly Arg Gln Thr Val Ser Val Gly Val Gly Gly
Pro Glu Val Arg Pro145 150 155
160Ser Glu Gly Glu Ile Tyr Asp Tyr Ala Leu Asp Met Ser Gln Met Tyr
165 170 175Ala Leu Thr Pro
Pro Pro Met Gly Asp Val Pro Val Ile Pro Gln Pro 180
185 190His Asp Ser Tyr Gln Trp Val Thr Asp Pro Glu
Glu Ala Pro Pro Ser 195 200 205Pro
Val Glu Thr Gln Ile Phe Glu Asp Pro Arg Glu Phe Leu Thr His 210
215 220Leu Glu Asp Tyr Leu Lys Gln Val Gly Gly
Thr Glu Glu Tyr Trp Leu225 230 235
240Ser Gln Ile Gln Asn His Met Asn Gly Pro Ala Lys Lys Trp Trp
Glu 245 250 255Tyr Lys Gln
Asp Ser Val Lys Asn Trp Leu Glu Phe Lys Lys Glu Phe 260
265 270Leu Gln Tyr Ser Glu Gly Thr Leu Thr Arg
Asp Ala Ile Lys Gln Glu 275 280
285Leu Asp Leu Pro Gln Lys Glu Gly Glu Pro Leu Asp Gln Phe Leu Trp 290
295 300Arg Lys Arg Asp Leu Tyr Gln Thr
Leu Tyr Val Glu Ala Glu Glu Glu305 310
315 320Glu Val Ile Gln Tyr Val Val Gly Thr Leu Gln Pro
Lys Leu Lys Arg 325 330
335Phe Leu Ser His Pro Tyr Pro Lys Thr Leu Glu Gln Leu Ile Gln Arg
340 345 350Gly Lys Glu Val Glu Gly
Asn Leu Asp Asn Ser Glu Glu Pro Ser Pro 355 360
365Gln Arg Thr Pro Glu His Gln Leu Gly Asp Ser Val Glu Ser
Leu Pro 370 375 380Pro Ser Thr Thr Ala
Ser Pro Ala Gly Ser Asp Lys Thr Gln Pro Glu385 390
395 400Ile Ser Leu Pro Pro Thr Thr Val Ile
405
SEQ ID NO: 13 (length 404, PRT, Alligator sinensis)
Gly Gln Leu Asp Ser Val Thr Asn
Ala Gly Val His Thr Tyr Gln Gly1 5 10
15His Arg Ser Val Ala Asn Lys Pro Asn Val Ile Leu Gln Ile
Gly Lys 20 25 30Cys Arg Thr
Glu Met Leu Glu His Val Arg Arg Thr His Arg His Leu 35
40 45Leu Thr Glu Val Ser Lys Gln Val Glu Arg Glu
Leu Lys Gly Leu Gln 50 55 60Lys Ser
Val Gly Lys Leu Glu Asn Asn Leu Glu Asp His Val Pro Thr65
70 75 80Asp Asn Gln Arg Trp Lys Lys
Ser Ile Lys Ala Cys Leu Ala Arg Cys 85 90
95Gln Glu Thr Ile Ala His Leu Glu Arg Trp Val Lys Arg
Glu Met Asn 100 105 110Val Trp
Lys Glu Val Phe Phe Arg Leu Glu Arg Trp Ala Asp Arg Leu 115
120 125Glu Ser Met Gly Gly Lys Tyr Cys Pro Thr
Asp Ser Ala Arg Gln Thr 130 135 140Val
Ser Val Gly Val Gly Gly Pro Glu Ile Arg Pro Ser Glu Gly Glu145
150 155 160Ile Tyr Asp Tyr Ala Leu
Asp Met Ser Gln Met Tyr Ala Leu Thr Pro 165
170 175Ser Pro Gly Glu Leu Pro Ser Val Pro Gln Pro His
Asp Ser Tyr Gln 180 185 190Trp
Val Thr Ser Pro Glu Asp Ala Pro Ala Ser Pro Val Glu Thr Gln 195
200 205Val Phe Glu Asp Pro Arg Glu Phe Leu
Cys His Leu Glu Glu Tyr Leu 210 215
220Lys Gln Val Gly Gly Thr Glu Glu Tyr Trp Leu Ser Gln Ile Gln Asn225
230 235 240His Met Asn Gly
Pro Ala Lys Lys Trp Trp Glu Tyr Lys Gln Asp Thr 245
250 255Val Lys Asn Trp Val Glu Phe Lys Lys Glu
Phe Leu Gln Tyr Ser Glu 260 265
270Gly Thr Leu Thr Arg Asp Ala Ile Lys Arg Glu Leu Asp Leu Pro Gln
275 280 285Lys Asp Gly Glu Pro Leu Asp
Gln Phe Leu Trp Arg Lys Arg Asp Leu 290 295
300Tyr Gln Thr Leu Tyr Ile Asp Ala Asp Glu Glu Gln Ile Ile Gln
Tyr305 310 315 320Val Val
Gly Thr Leu Gln Pro Lys Leu Lys Arg Phe Leu Ser Tyr Pro
325 330 335Leu Pro Lys Thr Leu Glu Gln
Leu Ile Gln Lys Gly Lys Glu Val Gln 340 345
350Gly Ser Leu Asp His Ser Glu Glu Pro Ser Pro Gln Arg Ala
Ser Glu 355 360 365Ala Arg Thr Gly
Asp Ser Val Glu Thr Leu Pro Pro Ser Thr Thr Thr 370
375 380Ser Pro Asn Thr Ser Ser Gly Thr Gln Pro Glu Ala
Pro Ser Pro Pro385 390 395
400Ala Thr Val Ile
SEQ ID NO: 14 (length 404, PRT, Alligator mississippiensis)
Gly Gln Leu Asp
Ser Val Thr Asn Ala Gly Val His Thr Tyr Gln Gly1 5
10 15His Arg Gly Val Ala Asn Lys Pro Asn Val
Ile Leu Gln Ile Gly Lys 20 25
30Cys Arg Thr Glu Met Leu Glu His Val Arg Arg Thr His Arg His Leu
35 40 45Leu Thr Glu Val Ser Lys Gln Val
Glu Arg Glu Leu Lys Gly Leu Gln 50 55
60Lys Ser Val Gly Lys Leu Glu Asn Asn Leu Glu Asp His Val Pro Thr65
70 75 80Asp Asn Gln Arg Trp
Lys Lys Ser Ile Lys Ala Cys Leu Ala Arg Cys 85
90 95Gln Glu Thr Ile Ala His Leu Glu Arg Trp Val
Lys Arg Glu Met Asn 100 105
110Val Trp Lys Glu Val Phe Phe Arg Leu Glu Arg Trp Ala Asp Arg Leu
115 120 125Glu Ser Met Gly Gly Lys Tyr
Cys Pro Thr Asp Ser Ala Arg Gln Thr 130 135
140Val Ser Val Gly Val Gly Gly Pro Glu Ile Arg Pro Ser Glu Gly
Glu145 150 155 160Ile Tyr
Asp Tyr Ala Leu Asp Met Ser Gln Met Tyr Ala Leu Thr Pro
165 170 175Ser Pro Gly Glu Leu Pro Ser
Ile Pro Gln Pro His Asp Ser Tyr Gln 180 185
190Trp Val Thr Ser Pro Glu Asp Ala Pro Ala Ser Pro Val Glu
Thr Gln 195 200 205Val Phe Glu Asp
Pro Arg Glu Phe Leu Cys His Leu Glu Glu Tyr Leu 210
215 220Lys Gln Val Gly Gly Thr Glu Glu Tyr Trp Leu Ser
Gln Ile Gln Asn225 230 235
240His Met Asn Gly Pro Ala Lys Lys Trp Trp Glu Tyr Lys Gln Asp Thr
245 250 255Val Lys Asn Trp Val
Glu Phe Lys Lys Glu Phe Leu Gln Tyr Ser Glu 260
265 270Gly Thr Leu Thr Arg Asp Ala Ile Lys Arg Glu Leu
Asp Leu Pro Gln 275 280 285Lys Asp
Gly Glu Pro Leu Asp Gln Phe Leu Trp Arg Lys Arg Asp Leu 290
295 300Tyr Gln Thr Leu Tyr Ile Asp Ala Asp Glu Glu
Gln Ile Ile Gln Tyr305 310 315
320Val Val Gly Thr Leu Gln Pro Lys Leu Lys Arg Phe Leu Ser Tyr Pro
325 330 335Leu Pro Lys Thr
Leu Glu Gln Leu Ile Gln Lys Gly Lys Glu Val Gln 340
345 350Gly Ser Leu Asp His Ser Glu Glu Pro Ser Pro
Gln Arg Ala Ser Glu 355 360 365Ala
Arg Thr Gly Asp Ser Val Glu Ser Leu Pro Pro Ser Thr Thr Thr 370
375 380Ser Pro Asn Ala Ser Ser Gly Thr Gln Pro
Glu Ala Pro Ser Pro Pro385 390 395
400Ala Thr Val Ile
SEQ ID NO: 15 (length 408, PRT, Gekko japonicus)
Gly Gln Leu Glu Asn
Val Asn His Gly Asn Leu His Ser Phe Gln Gly1 5
10 15His Arg Gly Gly Val Ala Asn Lys Pro Asn Val
Ile Leu Gln Ile Gly 20 25
30Lys Cys Arg Ala Glu Met Leu Asp His Val Arg Arg Thr His Arg His
35 40 45Leu Leu Thr Glu Val Ser Lys Gln
Val Glu Arg Glu Leu Lys Gly Leu 50 55
60Gln Lys Ser Val Gly Lys Leu Glu Asn Asn Leu Glu Asp His Val Pro65
70 75 80Ser Ala Val Glu Asn
Gln Arg Trp Lys Lys Ser Ile Lys Ala Cys Leu 85
90 95Ser Arg Cys Gln Glu Thr Ile Ala His Leu Glu
Arg Trp Val Lys Arg 100 105
110Glu Met Asn Val Trp Lys Glu Val Phe Phe Arg Leu Glu Arg Trp Ala
115 120 125Asp Arg Leu Glu Ser Gly Gly
Gly Lys Tyr Cys His Gly Asp Asn His 130 135
140Arg Gln Thr Val Ser Val Gly Val Gly Gly Pro Glu Val Arg Pro
Ser145 150 155 160Glu Gly
Glu Ile Tyr Asp Tyr Ala Leu Asp Met Ser Gln Met Tyr Ala
165 170 175Leu Thr Pro Pro Ser Pro Gly
Asp Val Pro Val Val Ser Gln Pro His 180 185
190Asp Ser Tyr Gln Trp Val Thr Val Pro Glu Asp Thr Pro Pro
Ser Pro 195 200 205Val Glu Thr Gln
Ile Phe Glu Asp Pro Arg Glu Phe Leu Thr His Leu 210
215 220Glu Asp Tyr Leu Lys Gln Val Gly Gly Thr Glu Glu
Tyr Trp Leu Ser225 230 235
240Gln Ile Gln Asn His Met Asn Gly Pro Ala Lys Lys Trp Trp Glu Tyr
245 250 255Lys Gln Asp Ser Val
Lys Asn Trp Leu Glu Phe Lys Lys Glu Phe Leu 260
265 270Gln Tyr Ser Glu Gly Thr Leu Thr Arg Asp Ala Ile
Lys Glu Glu Leu 275 280 285Asp Leu
Pro Gln Lys Asp Gly Glu Pro Leu Asp Gln Phe Leu Trp Arg 290
295 300Lys Arg Asp Leu Tyr Gln Thr Leu Tyr Val Glu
Ala Asp Glu Glu Glu305 310 315
320Val Ile Gln Tyr Val Val Gly Thr Leu Gln Pro Lys Leu Lys Arg Phe
325 330 335Leu Ser His Pro
Tyr Pro Lys Thr Leu Glu Gln Leu Ile Gln Arg Gly 340
345 350Lys Glu Val Glu Gly Asn Leu Asp Asn Ser Glu
Glu Pro Thr Pro Gln 355 360 365Arg
Thr Pro Glu His Gln Leu Cys Gly Ser Val Glu Ser Leu Pro Pro 370
375 380Ser Ser Thr Val Ser Pro Val Ala Ser Asp
Gly Thr Gln Pro Glu Thr385 390 395
400Ser Pro Leu Pro Ala Thr Val Ile
405
SEQ ID NO: 16 (length 455, PRT, Homo sapiens)
Gly Pro Leu Thr Leu Leu Gln Asp Trp Cys Arg Gly
Glu His Leu Asn1 5 10
15Thr Arg Arg Cys Met Leu Ile Leu Gly Ile Pro Glu Asp Cys Gly Glu
20 25 30Asp Glu Phe Glu Glu Thr Leu
Gln Glu Ala Cys Arg His Leu Gly Arg 35 40
45Tyr Arg Val Ile Gly Arg Met Phe Arg Arg Glu Glu Asn Ala Gln
Ala 50 55 60Ile Leu Leu Glu Leu Ala
Gln Asp Ile Asp Tyr Ala Leu Leu Pro Arg65 70
75 80Glu Ile Pro Gly Lys Gly Gly Pro Trp Glu Val
Ile Val Lys Pro Arg 85 90
95Asn Ser Asp Gly Glu Phe Leu Asn Arg Leu Asn Arg Phe Leu Glu Glu
100 105 110Glu Arg Arg Thr Val Ser
Asp Met Asn Arg Val Leu Gly Ser Asp Thr 115 120
125Asn Cys Ser Ala Pro Arg Val Thr Ile Ser Pro Glu Phe Trp
Thr Trp 130 135 140Ala Gln Thr Leu Gly
Ala Ala Val Gln Pro Leu Leu Glu Gln Met Leu145 150
155 160Tyr Arg Glu Leu Arg Val Phe Ser Gly Asn
Thr Ile Ser Ile Pro Gly 165 170
175Ala Leu Ala Phe Asp Ala Trp Leu Glu His Thr Thr Glu Met Leu Gln
180 185 190Met Trp Gln Val Pro
Glu Gly Glu Lys Arg Arg Arg Leu Met Glu Cys 195
200 205Leu Arg Gly Pro Ala Leu Gln Val Val Ser Gly Leu
Arg Ala Ser Asn 210 215 220Ala Ser Ile
Thr Val Glu Glu Cys Leu Ala Ala Leu Gln Gln Val Phe225
230 235 240Gly Pro Val Glu Ser His Lys
Ile Ala Gln Val Lys Leu Cys Lys Ala 245
250 255Tyr Gln Glu Ala Gly Glu Lys Val Ser Ser Phe Val
Leu Arg Leu Glu 260 265 270Pro
Leu Leu Gln Arg Ala Val Glu Asn Asn Val Val Ser Arg Arg Asn 275
280 285Val Asn Gln Thr Arg Leu Lys Arg Val
Leu Ser Gly Ala Thr Leu Pro 290 295
300Asp Lys Leu Arg Asp Lys Leu Lys Leu Met Lys Gln Arg Arg Lys Pro305
310 315 320Pro Gly Phe Leu
Ala Leu Val Lys Leu Leu Arg Glu Glu Glu Glu Trp 325
330 335Glu Ala Thr Leu Gly Pro Asp Arg Glu Ser
Leu Glu Gly Leu Glu Val 340 345
350Ala Pro Arg Pro Pro Ala Arg Ile Thr Gly Val Gly Ala Val Pro Leu
355 360 365Pro Ala Ser Gly Asn Ser Phe
Asp Ala Arg Pro Ser Gln Gly Tyr Arg 370 375
380Arg Arg Arg Gly Arg Gly Gln His Arg Arg Gly Gly Val Ala Arg
Ala385 390 395 400Gly Ser
Arg Gly Ser Arg Lys Arg Lys Arg His Thr Phe Cys Tyr Ser
405 410 415Cys Gly Glu Asp Gly His Ile
Arg Val Gln Cys Ile Asn Pro Ser Asn 420 425
430Leu Leu Leu Ala Lys Glu Thr Lys Glu Ile Leu Glu Gly Gly
Glu Arg 435 440 445Glu Ala Gln Thr
Asn Ser Arg 450 455
SEQ ID NO: 17 (length 448, PRT, Homo sapiens)
Gly Ala Leu
Thr Leu Leu Glu Asp Trp Cys Lys Gly Met Asp Met Asp1 5
10 15Pro Arg Lys Ala Leu Leu Ile Val Gly
Ile Pro Met Glu Cys Ser Glu 20 25
30Val Glu Ile Gln Asp Thr Val Lys Ala Gly Leu Gln Pro Leu Cys Ala
35 40 45Tyr Arg Val Leu Gly Arg Met
Phe Arg Arg Glu Asp Asn Ala Lys Ala 50 55
60Val Phe Ile Glu Leu Ala Asp Thr Val Asn Tyr Thr Thr Leu Pro Ser65
70 75 80His Ile Pro Gly
Lys Gly Gly Ser Trp Glu Val Val Val Lys Pro Arg 85
90 95Asn Pro Asp Asp Glu Phe Leu Ser Arg Leu
Asn Tyr Phe Leu Lys Asp 100 105
110Glu Gly Arg Ser Met Thr Asp Val Ala Arg Ala Leu Gly Cys Cys Ser
115 120 125Leu Pro Ala Glu Ser Leu Asp
Ala Glu Val Met Pro Gln Val Arg Ser 130 135
140Pro Pro Leu Glu Pro Pro Lys Glu Ser Met Trp Tyr Arg Lys Leu
Lys145 150 155 160Val Phe
Ser Gly Thr Ala Ser Pro Ser Pro Gly Glu Glu Thr Phe Glu
165 170 175Asp Trp Leu Glu Gln Val Thr
Glu Ile Met Pro Ile Trp Gln Val Ser 180 185
190Glu Val Glu Lys Arg Arg Arg Leu Leu Glu Ser Leu Arg Gly
Pro Ala 195 200 205Leu Ser Ile Met
Arg Val Leu Gln Ala Asn Asn Asp Ser Ile Thr Val 210
215 220Glu Gln Cys Leu Asp Ala Leu Lys Gln Ile Phe Gly
Asp Lys Glu Asp225 230 235
240Phe Arg Ala Ser Gln Phe Arg Phe Leu Gln Thr Ser Pro Lys Ile Gly
245 250 255Glu Lys Val Ser Thr
Phe Leu Leu Arg Leu Glu Pro Leu Leu Gln Lys 260
265 270Ala Val His Lys Ser Pro Leu Ser Val Arg Ser Thr
Asp Met Ile Arg 275 280 285Leu Lys
His Leu Leu Ala Arg Val Ala Met Thr Pro Ala Leu Arg Gly 290
295 300Lys Leu Glu Leu Leu Asp Gln Arg Gly Cys Pro
Pro Asn Phe Leu Glu305 310 315
320Leu Met Lys Leu Ile Arg Asp Glu Glu Glu Trp Glu Asn Thr Glu Ala
325 330 335Val Met Lys Asn
Lys Glu Lys Pro Ser Gly Arg Gly Arg Gly Ala Ser 340
345 350Gly Arg Gln Ala Arg Ala Glu Ala Ser Val Ser
Ala Pro Gln Ala Thr 355 360 365Val
Gln Ala Arg Ser Phe Ser Asp Ser Ser Pro Gln Thr Ile Gln Gly 370
375 380Gly Leu Pro Pro Leu Val Lys Arg Arg Arg
Leu Leu Gly Ser Glu Ser385 390 395
400Thr Arg Gly Glu Asp His Gly Gln Ala Thr Tyr Pro Lys Ala Glu
Asn 405 410 415Gln Thr Pro
Gly Arg Glu Gly Pro Gln Ala Ala Gly Glu Glu Leu Gly 420
425 430Asn Glu Ala Gly Ala Gly Ala Met Ser His
Pro Lys Pro Trp Glu Thr 435 440
445
SEQ ID NO: 18 (length 399, PRT, Homo sapiens)
Gly Ala Val Thr Met Leu Gln Asp Trp Cys Arg Trp
Met Gly Val Asn1 5 10
15Ala Arg Arg Gly Leu Leu Ile Leu Gly Ile Pro Glu Asp Cys Asp Asp
20 25 30Ala Glu Phe Gln Glu Ser Leu
Glu Ala Ala Leu Arg Pro Met Gly His 35 40
45Phe Thr Val Leu Gly Lys Ala Phe Arg Glu Glu Asp Asn Ala Thr
Ala 50 55 60Ala Leu Val Glu Leu Asp
Arg Glu Val Asn Tyr Ala Leu Val Pro Arg65 70
75 80Glu Ile Pro Gly Thr Gly Gly Pro Trp Asn Val
Val Phe Val Pro Arg 85 90
95Cys Ser Gly Glu Glu Phe Leu Gly Leu Gly Arg Val Phe His Phe Pro
100 105 110Glu Gln Glu Gly Gln Met
Val Glu Ser Val Ala Gly Ala Leu Gly Val 115 120
125Gly Leu Arg Arg Val Cys Trp Leu Arg Ser Ile Gly Gln Ala
Val Gln 130 135 140Pro Trp Val Glu Ala
Val Arg Cys Gln Ser Leu Gly Val Phe Ser Gly145 150
155 160Arg Asp Gln Pro Ala Pro Gly Glu Glu Ser
Phe Glu Val Trp Leu Asp 165 170
175His Thr Thr Glu Met Leu His Val Trp Gln Gly Val Ser Glu Arg Glu
180 185 190Arg Arg Arg Arg Leu
Leu Glu Gly Leu Arg Gly Thr Ala Leu Gln Leu 195
200 205Val His Ala Leu Leu Ala Glu Asn Pro Ala Arg Thr
Ala Gln Asp Cys 210 215 220Leu Ala Ala
Leu Ala Gln Val Phe Gly Asp Asn Glu Ser Gln Ala Thr225
230 235 240Ile Arg Val Lys Cys Leu Thr
Ala Gln Gln Gln Ser Gly Glu Arg Leu 245
250 255Ser Ala Phe Val Leu Arg Leu Glu Val Leu Leu Gln
Lys Ala Met Glu 260 265 270Lys
Glu Ala Leu Ala Arg Ala Ser Ala Asp Arg Val Arg Leu Arg Gln 275
280 285Met Leu Thr Arg Ala His Leu Thr Glu
Pro Leu Asp Glu Ala Leu Arg 290 295
300Lys Leu Arg Met Ala Gly Arg Ser Pro Ser Phe Leu Glu Met Leu Gly305
310 315 320Leu Val Arg Glu
Ser Glu Ala Trp Glu Ala Ser Leu Ala Arg Ser Val 325
330 335Arg Ala Gln Thr Gln Glu Gly Ala Gly Ala
Arg Ala Gly Ala Gln Ala 340 345
350Val Ala Arg Ala Ser Thr Lys Val Glu Ala Val Pro Gly Gly Pro Gly
355 360 365Arg Glu Pro Glu Gly Leu Leu
Gln Ala Gly Gly Gln Glu Ala Glu Glu 370 375
380Leu Leu Gln Glu Gly Leu Lys Pro Val Leu Glu Glu Cys Asp Asn385
390 395
SEQ ID NO: 19 (length 399, PRT, Homo sapiens)
Gly Ala Val
Thr Met Leu Gln Asp Trp Cys Arg Trp Met Gly Val Asn1 5
10 15Ala Arg Arg Gly Leu Leu Ile Leu Gly
Ile Pro Glu Asp Cys Asp Asp 20 25
30Ala Glu Phe Gln Glu Ser Leu Glu Ala Ala Leu Arg Pro Met Gly His
35 40 45Phe Thr Val Leu Gly Lys Val
Phe Arg Glu Glu Asp Asn Ala Thr Ala 50 55
60Ala Leu Val Glu Leu Asp Arg Glu Val Asn Tyr Ala Leu Val Pro Arg65
70 75 80Glu Ile Pro Gly
Thr Gly Gly Pro Trp Asn Val Val Phe Val Pro Arg 85
90 95Cys Ser Gly Glu Glu Phe Leu Gly Leu Gly
Arg Val Phe His Phe Pro 100 105
110Glu Gln Glu Gly Gln Met Val Glu Ser Val Ala Gly Ala Leu Gly Val
115 120 125Gly Leu Arg Arg Val Cys Trp
Leu Arg Ser Ile Gly Gln Ala Val Gln 130 135
140Pro Trp Val Glu Ala Val Arg Tyr Gln Ser Leu Gly Val Phe Ser
Gly145 150 155 160Arg Asp
Gln Pro Ala Pro Gly Glu Glu Ser Phe Glu Val Trp Leu Asp
165 170 175His Thr Thr Glu Met Leu His
Val Trp Gln Gly Val Ser Glu Arg Glu 180 185
190Arg Arg Arg Arg Leu Leu Glu Gly Leu Arg Gly Thr Ala Leu
Gln Leu 195 200 205Val His Ala Leu
Leu Ala Glu Asn Pro Ala Arg Thr Ala Gln Asp Cys 210
215 220Leu Ala Ala Leu Ala Gln Val Phe Gly Asp Asn Glu
Ser Gln Ala Thr225 230 235
240Ile Arg Val Lys Cys Leu Thr Ala Gln Gln Gln Ser Gly Glu Arg Leu
245 250 255Ser Ala Phe Val Leu
Arg Leu Glu Val Leu Leu Gln Lys Ala Met Glu 260
265 270Lys Glu Ala Leu Ala Arg Ala Ser Ala Asp Arg Val
Arg Leu Arg Gln 275 280 285Met Leu
Thr Arg Ala His Leu Thr Glu Pro Leu Asp Glu Ala Leu Arg 290
295 300Lys Leu Arg Met Ala Gly Arg Ser Pro Ser Phe
Leu Glu Met Leu Gly305 310 315
320Leu Val Arg Glu Ser Glu Ala Trp Glu Ala Ser Leu Ala Arg Ser Val
325 330 335Arg Ala Gln Thr
Gln Glu Gly Ala Gly Ala Arg Ala Gly Ala Gln Ala 340
345 350Val Ala Arg Ala Ser Thr Lys Val Glu Ala Val
Pro Gly Gly Pro Gly 355 360 365Arg
Glu Pro Glu Gly Leu Arg Gln Ala Gly Gly Gln Glu Ala Glu Glu 370
375 380Leu Leu Gln Glu Gly Leu Lys Pro Val Leu
Glu Glu Cys Asp Asn385 390
395
SEQ ID NO: 20 (length 475, PRT, Homo sapiens)
Gly Val Glu Asp Leu Ala Ala Ser Tyr Ile Val Leu
Lys Leu Glu Asn1 5 10
15Glu Ile Arg Gln Ala Gln Val Gln Trp Leu Met Glu Glu Asn Ala Ala
20 25 30Leu Gln Ala Gln Ile Pro Glu
Leu Gln Lys Ser Gln Ala Ala Lys Glu 35 40
45Tyr Asp Leu Leu Arg Lys Ser Ser Glu Ala Lys Glu Pro Gln Lys
Leu 50 55 60Pro Glu His Met Asn Pro
Pro Ala Ala Trp Glu Ala Gln Lys Thr Pro65 70
75 80Glu Phe Lys Glu Pro Gln Lys Pro Pro Glu Pro
Gln Asp Leu Leu Pro 85 90
95Trp Glu Pro Pro Ala Ala Trp Glu Leu Gln Glu Ala Pro Ala Ala Pro
100 105 110Glu Ser Leu Ala Pro Pro
Ala Thr Arg Glu Ser Gln Lys Pro Pro Met 115 120
125Ala His Glu Ile Pro Thr Val Leu Glu Gly Gln Gly Pro Ala
Asn Thr 130 135 140Gln Asp Ala Thr Ile
Ala Gln Glu Pro Lys Asn Ser Glu Pro Gln Asp145 150
155 160Pro Pro Asn Ile Glu Lys Pro Gln Glu Ala
Pro Glu Tyr Gln Glu Thr 165 170
175Ala Ala Gln Leu Glu Phe Leu Glu Leu Pro Pro Pro Gln Glu Pro Leu
180 185 190Glu Pro Ser Asn Ala
Gln Glu Phe Leu Glu Leu Ser Ala Ala Gln Glu 195
200 205Ser Leu Glu Gly Leu Ile Val Val Glu Thr Ser Ala
Ala Ser Glu Phe 210 215 220Pro Gln Ala
Pro Ile Gly Leu Glu Ala Thr Asp Phe Pro Leu Gln Tyr225
230 235 240Thr Leu Thr Phe Ser Gly Asp
Ser Gln Lys Leu Pro Glu Phe Leu Val 245
250 255Gln Leu Tyr Ser Tyr Met Arg Val Arg Gly His Leu
Tyr Pro Thr Glu 260 265 270Ala
Ala Leu Val Ser Phe Val Gly Asn Cys Phe Ser Gly Arg Ala Gly 275
280 285Trp Trp Phe Gln Leu Leu Leu Asp Ile
Gln Ser Pro Leu Leu Glu Gln 290 295
300Cys Glu Ser Phe Ile Pro Val Leu Gln Asp Thr Phe Asp Asn Pro Glu305
310 315 320Asn Met Lys Asp
Ala Asn Gln Cys Ile His Gln Leu Cys Gln Gly Glu 325
330 335Gly His Val Ala Thr His Phe His Leu Ile
Ala Gln Glu Leu Asn Trp 340 345
350Asp Glu Ser Thr Leu Trp Ile Gln Phe Gln Glu Gly Leu Ala Ser Ser
355 360 365Ile Gln Asp Glu Leu Ser His
Thr Ser Pro Ala Thr Asn Leu Ser Asp 370 375
380Leu Ile Thr Gln Cys Ile Ser Leu Glu Glu Lys Pro Asp Pro Asn
Pro385 390 395 400Leu Gly
Lys Ser Ser Ser Ala Glu Gly Asp Gly Pro Glu Ser Pro Pro
405 410 415Ala Glu Asn Gln Pro Met Gln
Ala Ala Ile Asn Cys Pro His Ile Ser 420 425
430Glu Ala Glu Trp Val Arg Trp His Lys Gly Arg Leu Cys Leu
Tyr Cys 435 440 445Gly Tyr Pro Gly
His Phe Ala Arg Asp Cys Pro Val Lys Pro His Gln 450
455 460Ala Leu Gln Ala Gly Asn Ile Gln Ala Cys Gln465
470 475
SEQ ID NO: 21 (length 239, PRT, Homo sapiens)
Gly Val Gln Pro
Gln Thr Ser Lys Ala Glu Ser Pro Ala Leu Ala Ala1 5
10 15Ser Pro Asn Ala Gln Met Asp Asp Val Ile
Asp Thr Leu Thr Ser Leu 20 25
30Arg Leu Thr Asn Ser Ala Leu Arg Arg Glu Ala Ser Thr Leu Arg Ala
35 40 45Glu Lys Ala Asn Leu Thr Asn Met
Leu Glu Ser Val Met Ala Glu Leu 50 55
60Thr Leu Leu Arg Thr Arg Ala Arg Ile Pro Gly Ala Leu Gln Ile Thr65
70 75 80Pro Pro Ile Ser Ser
Ile Thr Ser Asn Gly Thr Arg Pro Met Thr Thr 85
90 95Pro Pro Thr Ser Leu Pro Glu Pro Phe Ser Gly
Asp Pro Gly Arg Leu 100 105
110Ala Gly Phe Leu Met Gln Met Asp Arg Phe Met Ile Phe Gln Ala Ser
115 120 125Arg Phe Pro Gly Glu Ala Glu
Arg Val Ala Phe Leu Val Ser Arg Leu 130 135
140Thr Gly Glu Ala Glu Lys Trp Ala Ile Pro His Met Gln Pro Asp
Ser145 150 155 160Pro Leu
Arg Asn Asn Tyr Gln Gly Phe Leu Ala Glu Leu Arg Arg Thr
165 170 175Tyr Lys Ser Pro Leu Arg His
Ala Arg Arg Ala Gln Ile Arg Lys Thr 180 185
190Ser Ala Ser Asn Arg Ala Val Arg Glu Arg Gln Met Leu Cys
Arg Gln 195 200 205Leu Ala Ser Ala
Gly Thr Gly Pro Cys Pro Val His Pro Ala Ser Asn 210
215 220Gly Thr Ser Pro Ala Pro Ala Leu Pro Ala Arg Ala
Arg Asn Leu225 230 235
SEQ ID NO: 22 (length 113, PRT, Homo sapiens)
Gly Asp Gly Arg Val Gln Leu Met Lys Ala Leu Leu Ala Gly Pro
Leu1 5 10 15Arg Pro Ala
Ala Arg Arg Trp Arg Asn Pro Ile Pro Phe Pro Glu Thr 20
25 30Phe Asp Gly Asp Thr Asp Arg Leu Pro Glu
Phe Ile Val Gln Thr Ser 35 40
45Ser Tyr Met Phe Val Asp Glu Asn Thr Phe Ser Asn Asp Ala Leu Lys 50
55 60Val Thr Phe Leu Ile Thr Arg Leu Thr
Gly Pro Ala Leu Gln Trp Val65 70 75
80Ile Pro Tyr Ile Arg Lys Glu Ser Pro Leu Leu Asn Asp Tyr
Arg Gly 85 90 95Phe Leu
Ala Glu Met Lys Arg Val Phe Gly Trp Glu Glu Asp Glu Asp 100
105 110Phe
SEQ ID NO: 23 (length 113, PRT, Homo sapiens)
Gly Glu Gly
Arg Val Gln Leu Met Lys Ala Leu Leu Ala Arg Pro Leu1 5
10 15Arg Pro Ala Ala Arg Arg Trp Arg Asn
Pro Ile Pro Phe Pro Glu Thr 20 25
30Phe Asp Gly Asp Thr Asp Arg Leu Pro Glu Phe Ile Val Gln Thr Ser
35 40 45Ser Tyr Met Phe Val Asp Glu
Asn Thr Phe Ser Asn Asp Ala Leu Lys 50 55
60Val Thr Phe Leu Ile Thr Arg Leu Thr Gly Pro Ala Leu Gln Trp Val65
70 75 80Ile Pro Tyr Ile
Lys Lys Glu Ser Pro Leu Leu Ser Asp Tyr Arg Gly 85
90 95Phe Leu Ala Glu Met Lys Arg Val Phe Gly
Trp Glu Glu Asp Glu Asp 100 105
110Phe
SEQ ID NO: 24 (length 364, PRT, Homo sapiens)
Gly Pro Arg Gly Arg Cys Arg Gln Gln Gly Pro
Arg Ile Pro Ile Trp1 5 10
15Ala Ala Ala Asn Tyr Ala Asn Ala His Pro Trp Gln Gln Met Asp Lys
20 25 30Ala Ser Pro Gly Val Ala Tyr
Thr Pro Leu Val Asp Pro Trp Ile Glu 35 40
45Arg Pro Cys Cys Gly Asp Thr Val Cys Val Arg Thr Thr Met Glu
Gln 50 55 60Lys Ser Thr Ala Ser Gly
Thr Cys Gly Gly Lys Pro Ala Glu Arg Gly65 70
75 80Pro Leu Ala Gly His Met Pro Ser Ser Arg Pro
His Arg Val Asp Phe 85 90
95Cys Trp Val Pro Gly Ser Asp Pro Gly Thr Phe Asp Gly Ser Pro Trp
100 105 110Leu Leu Asp Arg Phe Leu
Ala Gln Leu Gly Asp Tyr Met Ser Phe His 115 120
125Phe Glu His Tyr Gln Asp Asn Ile Ser Arg Val Cys Glu Ile
Leu Arg 130 135 140Arg Leu Thr Gly Arg
Ala Gln Ala Trp Ala Ala Pro Tyr Leu Asp Gly145 150
155 160Asp Leu Pro Leu Pro Asp Asp Tyr Glu Leu
Phe Cys Gln Asp Leu Lys 165 170
175Glu Val Val Gln Asp Pro Asn Ser Phe Ala Glu Tyr His Ala Val Val
180 185 190Thr Cys Pro Leu Pro
Leu Ala Ser Ser Gln Leu Pro Val Ala Pro Gln 195
200 205Leu Pro Val Val Arg Gln Tyr Leu Ala Arg Phe Leu
Glu Gly Leu Ala 210 215 220Leu Asp Met
Gly Thr Ala Pro Arg Ser Leu Pro Ala Ala Met Ala Thr225
230 235 240Pro Ala Val Ser Gly Ser Asn
Ser Val Ser Arg Ser Ala Leu Phe Glu 245
250 255Gln Gln Leu Thr Lys Glu Ser Thr Pro Gly Pro Lys
Glu Pro Pro Val 260 265 270Leu
Pro Ser Ser Thr Cys Ser Ser Lys Pro Gly Pro Val Glu Pro Ala 275
280 285Ser Ser Gln Pro Glu Glu Ala Ala Pro
Thr Pro Val Pro Arg Leu Ser 290 295
300Glu Ser Ala Asn Pro Pro Ala Gln Arg Pro Asp Pro Ala His Pro Gly305
310 315 320Gly Pro Lys Pro
Gln Lys Thr Glu Glu Glu Val Leu Glu Thr Glu Gly 325
330 335Asp Gln Glu Val Ser Leu Gly Thr Pro Gln
Glu Val Val Glu Ala Pro 340 345
350Glu Thr Pro Gly Glu Pro Pro Leu Ser Pro Gly Phe 355
360
SEQ ID NO: 25 (length 146, PRT, Homo sapiens)
Gly Val Asp Glu Leu Val Leu Leu Leu His
Ala Leu Leu Met Arg His1 5 10
15Arg Ala Leu Ser Ile Glu Asn Ser Gln Leu Met Glu Gln Leu Arg Leu
20 25 30Leu Val Cys Glu Arg Ala
Ser Leu Leu Arg Gln Val Arg Pro Pro Ser 35 40
45Cys Pro Val Pro Phe Pro Glu Thr Phe Asn Gly Glu Ser Ser
Arg Leu 50 55 60Pro Glu Phe Ile Val
Gln Thr Ala Ser Tyr Met Leu Val Asn Glu Asn65 70
75 80Arg Phe Cys Asn Asp Ala Met Lys Val Ala
Phe Leu Ile Ser Leu Leu 85 90
95Thr Gly Glu Ala Glu Glu Trp Val Val Pro Tyr Ile Glu Met Asp Ser
100 105 110Pro Ile Leu Gly Asp
Tyr Arg Ala Phe Leu Asp Glu Met Lys Gln Cys 115
120 125Phe Gly Trp Asp Asp Asp Glu Asp Asp Asp Asp Glu
Glu Glu Glu Asp 130 135 140Asp
Tyr145
SEQ ID NO: 26 (length 549, PRT, Homo sapiens)
Gly Pro Val Asp Leu Gly Gln Ala Leu Gly Leu
Leu Pro Ser Leu Ala1 5 10
15Lys Ala Glu Asp Ser Gln Phe Ser Glu Ser Asp Ala Ala Leu Gln Glu
20 25 30Glu Leu Ser Ser Pro Glu Thr
Ala Arg Gln Leu Phe Arg Gln Phe Arg 35 40
45Tyr Gln Val Met Ser Gly Pro His Glu Thr Leu Lys Gln Leu Arg
Lys 50 55 60Leu Cys Phe Gln Trp Leu
Gln Pro Glu Val His Thr Lys Glu Gln Ile65 70
75 80Leu Glu Ile Leu Met Leu Glu Gln Phe Leu Thr
Ile Leu Pro Gly Glu 85 90
95Ile Gln Met Trp Val Arg Lys Gln Cys Pro Gly Ser Gly Glu Glu Ala
100 105 110Val Thr Leu Val Glu Ser
Leu Lys Gly Asp Pro Gln Arg Leu Trp Gln 115 120
125Trp Ile Ser Ile Gln Val Leu Gly Gln Asp Ile Leu Ser Glu
Lys Met 130 135 140Glu Ser Pro Ser Cys
Gln Val Gly Glu Val Glu Pro His Leu Glu Val145 150
155 160Val Pro Gln Glu Leu Gly Leu Glu Asn Ser
Ser Ser Gly Pro Gly Glu 165 170
175Leu Leu Ser His Ile Val Lys Glu Glu Ser Asp Thr Glu Ala Glu Leu
180 185 190Ala Leu Ala Ala Ser
Gln Pro Ala Arg Leu Glu Glu Arg Leu Ile Arg 195
200 205Asp Gln Asp Leu Gly Ala Ser Leu Leu Pro Ala Ala
Pro Gln Glu Gln 210 215 220Trp Arg Gln
Leu Asp Ser Thr Gln Lys Glu Gln Tyr Trp Asp Leu Met225
230 235 240Leu Glu Thr Tyr Gly Lys Met
Val Ser Gly Ala Gly Ile Ser His Pro 245
250 255Lys Ser Asp Leu Thr Asn Ser Ile Glu Phe Gly Glu
Glu Leu Ala Gly 260 265 270Ile
Tyr Leu His Val Asn Glu Lys Ile Pro Arg Pro Thr Cys Ile Gly 275
280 285Asp Arg Gln Glu Asn Asp Lys Glu Asn
Leu Asn Leu Glu Asn His Arg 290 295
300Asp Gln Glu Leu Leu His Ala Ser Cys Gln Ala Ser Gly Glu Val Pro305
310 315 320Ser Gln Ala Ser
Leu Arg Gly Phe Phe Thr Glu Asp Glu Pro Gly Cys 325
330 335Phe Gly Glu Gly Glu Asn Leu Pro Glu Ala
Leu Gln Asn Ile Gln Asp 340 345
350Glu Gly Thr Gly Glu Gln Leu Ser Pro Gln Glu Arg Ile Ser Glu Lys
355 360 365Gln Leu Gly Gln His Leu Pro
Asn Pro His Ser Gly Glu Met Ser Thr 370 375
380Met Trp Leu Glu Glu Lys Arg Glu Thr Ser Gln Lys Gly Gln Pro
Arg385 390 395 400Ala Pro
Met Ala Gln Lys Leu Pro Thr Cys Arg Glu Cys Gly Lys Thr
405 410 415Phe Tyr Arg Asn Ser Gln Leu
Ile Phe His Gln Arg Thr His Thr Gly 420 425
430Glu Thr Tyr Phe Gln Cys Thr Ile Cys Lys Lys Ala Phe Leu
Arg Ser 435 440 445Ser Asp Phe Val
Lys His Gln Arg Thr His Thr Gly Glu Lys Pro Cys 450
455 460Lys Cys Asp Tyr Cys Gly Lys Gly Phe Ser Asp Phe
Ser Gly Leu Arg465 470 475
480His His Glu Lys Ile His Thr Gly Glu Lys Pro Tyr Lys Cys Pro Ile
485 490 495Cys Glu Lys Ser Phe
Ile Gln Arg Ser Asn Phe Asn Arg His Gln Arg 500
505 510Val His Thr Gly Glu Lys Pro Tyr Lys Cys Ser His
Cys Gly Lys Ser 515 520 525Phe Ser
Trp Ser Ser Ser Leu Asp Lys His Gln Arg Ser His Leu Gly 530
535 540Lys Lys Pro Phe Gln545
27 351 PRT Homo sapiens
27
Gly Thr Leu Arg Leu Leu Glu Asp Trp Cys Arg Gly Met Asp Met Asn1
5 10 15Pro Arg Lys Ala Leu Leu
Ile Ala Gly Ile Ser Gln Ser Cys Ser Val 20 25
30Ala Glu Ile Glu Glu Ala Leu Gln Ala Gly Leu Ala Pro
Leu Gly Glu 35 40 45Tyr Arg Leu
Leu Gly Arg Met Phe Arg Arg Asp Glu Asn Arg Lys Val 50
55 60Ala Leu Val Gly Leu Thr Ala Glu Thr Ser His Ala
Leu Val Pro Lys65 70 75
80Glu Ile Pro Gly Lys Gly Gly Ile Trp Arg Val Ile Phe Lys Pro Pro
85 90 95Asp Pro Asp Asn Thr Phe
Leu Ser Arg Leu Asn Glu Phe Leu Ala Gly 100
105 110Glu Gly Met Thr Val Gly Glu Leu Ser Arg Ala Leu
Gly His Glu Asn 115 120 125Gly Ser
Leu Asp Pro Glu Gln Gly Met Ile Pro Glu Met Trp Ala Pro 130
135 140Met Leu Ala Gln Ala Leu Glu Ala Leu Gln Pro
Ala Leu Gln Cys Leu145 150 155
160Lys Tyr Lys Lys Leu Arg Val Phe Ser Gly Arg Glu Ser Pro Glu Pro
165 170 175Gly Glu Glu Glu
Phe Gly Arg Trp Met Phe His Thr Thr Gln Met Ile 180
185 190Lys Ala Trp Gln Val Pro Asp Val Glu Lys Arg
Arg Arg Leu Leu Glu 195 200 205Ser
Leu Arg Gly Pro Ala Leu Asp Val Ile Arg Val Leu Lys Ile Asn 210
215 220Asn Pro Leu Ile Thr Val Asp Glu Cys Leu
Gln Ala Leu Glu Glu Val225 230 235
240Phe Gly Val Thr Asp Asn Pro Arg Glu Leu Gln Val Lys Tyr Leu
Thr 245 250 255Thr Tyr His
Lys Asp Glu Glu Lys Leu Ser Ala Tyr Val Leu Arg Leu 260
265 270Glu Pro Leu Leu Gln Lys Leu Val Gln Arg
Gly Ala Ile Glu Arg Asp 275 280
285Ala Val Asn Gln Ala Arg Leu Asp Gln Val Ile Ala Gly Ala Val His 290
295 300Lys Thr Ile Arg Arg Glu Leu Asn
Leu Pro Glu Asp Gly Pro Ala Pro305 310
315 320Gly Phe Leu Gln Leu Leu Val Leu Ile Lys Asp Tyr
Glu Ala Ala Glu 325 330
335Glu Glu Glu Ala Leu Leu Gln Ala Ile Leu Glu Gly Asn Phe Thr
340 345 350
28 708 PRT Homo sapiens
28
Gly Thr
Glu Arg Arg Arg Asp Glu Leu Ser Glu Glu Ile Asn Asn Leu1 5
10 15Arg Glu Lys Val Met Lys Gln Ser
Glu Glu Asn Asn Asn Leu Gln Ser 20 25
30Gln Val Gln Lys Leu Thr Glu Glu Asn Thr Thr Leu Arg Glu Gln
Val 35 40 45Glu Pro Thr Pro Glu
Asp Glu Asp Asp Asp Ile Glu Leu Arg Gly Ala 50 55
60Ala Ala Ala Ala Ala Pro Pro Pro Pro Ile Glu Glu Glu Cys
Pro Glu65 70 75 80Asp
Leu Pro Glu Lys Phe Asp Gly Asn Pro Asp Met Leu Ala Pro Phe
85 90 95Met Ala Gln Cys Gln Ile Phe
Met Glu Lys Ser Thr Arg Asp Phe Ser 100 105
110Val Asp Arg Val Arg Val Cys Phe Val Thr Ser Met Met Thr
Gly Arg 115 120 125Ala Ala Arg Trp
Ala Ser Ala Lys Leu Glu Arg Ser His Tyr Leu Met 130
135 140His Asn Tyr Pro Ala Phe Met Met Glu Met Lys His
Val Phe Glu Asp145 150 155
160Pro Gln Arg Arg Glu Val Ala Lys Arg Lys Ile Arg Arg Leu Arg Gln
165 170 175Gly Met Gly Ser Val
Ile Asp Tyr Ser Asn Ala Phe Gln Met Ile Ala 180
185 190Gln Asp Leu Asp Trp Asn Glu Pro Ala Leu Ile Asp
Gln Tyr His Glu 195 200 205Gly Leu
Ser Asp His Ile Gln Glu Glu Leu Ser His Leu Glu Val Ala 210
215 220Lys Ser Leu Ser Ala Leu Ile Gly Gln Cys Ile
His Ile Glu Arg Arg225 230 235
240Leu Ala Arg Ala Ala Ala Ala Arg Lys Pro Arg Ser Pro Pro Arg Ala
245 250 255Leu Val Leu Pro
His Ile Ala Ser His His Gln Val Asp Pro Thr Glu 260
265 270Pro Val Gly Gly Ala Arg Met Arg Leu Thr Gln
Glu Glu Lys Glu Arg 275 280 285Arg
Arg Lys Leu Asn Leu Cys Leu Tyr Cys Gly Thr Gly Gly His Tyr 290
295 300Ala Asp Asn Cys Pro Ala Lys Ala Ser Lys
Ser Ser Pro Ala Gly Lys305 310 315
320Leu Pro Gly Pro Ala Val Glu Gly Pro Ser Ala Thr Gly Pro Glu
Ile 325 330 335Ile Arg Ser
Pro Gln Asp Asp Ala Ser Ser Pro His Leu Gln Val Met 340
345 350Leu Gln Ile His Leu Pro Gly Arg His Thr
Leu Phe Val Arg Ala Met 355 360
365Ile Asp Ser Gly Ala Ser Gly Asn Phe Ile Asp His Glu Tyr Val Ala 370
375 380Gln Asn Gly Ile Pro Leu Arg Ile
Lys Asp Trp Pro Ile Leu Val Glu385 390
395 400Ala Ile Asp Gly Arg Pro Ile Ala Ser Gly Pro Val
Val His Glu Thr 405 410
415His Asp Leu Ile Val Asp Leu Gly Asp His Arg Glu Val Leu Ser Phe
420 425 430Asp Val Thr Gln Ser Pro
Phe Phe Pro Val Val Leu Gly Val Arg Trp 435 440
445Leu Ser Thr His Asp Pro Asn Ile Thr Trp Ser Thr Arg Ser
Ile Val 450 455 460Phe Asp Ser Glu Tyr
Cys Arg Tyr His Cys Arg Met Tyr Ser Pro Ile465 470
475 480Pro Pro Ser Leu Pro Pro Pro Ala Pro Gln
Pro Pro Leu Tyr Tyr Pro 485 490
495Val Asp Gly Tyr Arg Val Tyr Gln Pro Val Arg Tyr Tyr Tyr Val Gln
500 505 510Asn Val Tyr Thr Pro
Val Asp Glu His Val Tyr Pro Asp His Arg Leu 515
520 525Val Asp Pro His Ile Glu Met Ile Pro Gly Ala His
Ser Ile Pro Ser 530 535 540Gly His Val
Tyr Ser Leu Ser Glu Pro Glu Met Ala Ala Leu Arg Asp545
550 555 560Phe Val Ala Arg Asn Val Lys
Asp Gly Leu Ile Thr Pro Thr Ile Ala 565
570 575Pro Asn Gly Ala Gln Val Leu Gln Val Lys Arg Gly
Trp Lys Leu Gln 580 585 590Val
Ser Tyr Asp Cys Arg Ala Pro Asn Asn Phe Thr Ile Gln Asn Gln 595
600 605Tyr Pro Arg Leu Ser Ile Pro Asn Leu
Glu Asp Gln Ala His Leu Ala 610 615
620Thr Tyr Thr Glu Phe Val Pro Gln Ile Pro Gly Tyr Gln Thr Tyr Pro625
630 635 640Thr Tyr Ala Ala
Tyr Pro Thr Tyr Pro Val Gly Phe Ala Trp Tyr Pro 645
650 655Val Gly Arg Asp Gly Gln Gly Arg Ser Leu
Tyr Val Pro Val Met Ile 660 665
670Thr Trp Asn Pro His Trp Tyr Arg Gln Pro Pro Val Pro Gln Tyr Pro
675 680 685Pro Pro Gln Pro Pro Pro Pro
Pro Pro Pro Pro Pro Pro Pro Pro Ser 690 695
700Tyr Ser Thr Leu705
29 1188 DNA Homo sapiens
29
ggggagctgg accaccggac
cagcggcggg ctccacgcct accccgggcc gcggggcggg 60caggtggcca agcccaacgt
gatcctgcag atcgggaagt gccgggccga gatgctggag 120cacgtgcggc ggacgcaccg
gcacctgctg gccgaggtgt ccaagcaggt ggagcgcgag 180ctgaaggggc tgcaccggtc
ggtcgggaag ctggagagca acctggacgg ctacgtgccc 240acgagcgact cgcagcgctg
gaagaagtcc atcaaggcct gcctgtgccg ctgccaggag 300accatcgcca acctggagcg
ctgggtcaag cgcgagatgc acgtgtggcg cgaggtgttc 360taccgcctgg agcgctgggc
cgaccgcctg gagtccacgg gcggcaagta cccggtgggc 420agcgagtcag cccgccacac
cgtttccgtg ggcgtggggg gtcccgagag ctactgccac 480gaggcagacg gctacgacta
caccgtcagc ccctacgcca tcaccccgcc cccagccgct 540ggcgagctgc ccgggcagga
gcccgccgag gcccagcagt accagccgtg ggtccccggc 600gaggacgggc agcccagccc
cggcgtggac acgcagatct tcgaggaccc tcgagagttc 660ctgagccacc tagaggagta
cttgcggcag gtgggcggct ctgaggagta ctggctgtcc 720cagatccaga atcacatgaa
cgggccggcc aagaagtggt gggagttcaa gcagggctcc 780gtgaagaact gggtggagtt
caagaaggag ttcctgcagt acagcgaggg cacgctgtcc 840cgagaggcca tccagcgcga
gctggacctg ccgcagaagc agggcgagcc gctggaccag 900ttcctgtggc gcaagcggga
cctgtaccag acgctctacg tggacgcgga cgaggaggag 960atcatccagt acgtggtggg
caccctgcag cccaagctca agcgtttcct gcgccacccc 1020ctgcccaaga ccctggagca
gctcatccag aggggcatgg aggtgcagga tgacctggag 1080caggcggccg agccggccgg
cccccacctc ccggtggagg atgaggcgga gaccctcacg 1140cccgccccca acagcgagtc
cgtggccagt gaccggaccc agcccgag 1188
30 1188 DNA Orcinus orca
30
ggggaattgg atcaacgtac taccggtggc cttcacgcat accctgcacc acgcgggggc
60cctgtcgcga agccaaatgt catcctgcag attgggaagt gccgggctga gatgctggag
120cacgtccgtc ggacgcatcg tcatcttctt actgaggtgt caaaacaggt ggagcgtgaa
180ctcaaaggct tgcaccgcag cgttgggaaa cttgaaagca acttagatgg ctatgtgccg
240actggcgaca gccagcgttg gcgtaagtcc atcaaagcat gtttgtgtcg ttgccaggaa
300acgattgcaa acctggagcg ttgggtcaaa cgggagatgc atgtctggcg tgaagtattt
360tatcgtttag agcgttgggc cgatcgttta gagagcatgg gtggtaagta ccctgtgggg
420agcaaccctt ctcggcatac gacgtcagtc ggtgttggcg ggccggagtc ctacggtcat
480gaagcggaca cctacgacta taccgtaagc ccttatgcta ttaccccacc acctgcggcc
540ggcgaattac ctggccagga agccgttgag gctcaacaat accctccttg ggggctgggc
600gaggatggtc aacctagccc aggggtagac acgcaaatct ttgaggaccc acgggagttt
660ctttcccacc tggaagaata cctgcgtcag gttggtggga gcgaagaata ctggctgtca
720caaattcaaa accatatgaa tggtcctgca aaaaaatggt gggaatataa acagggttcc
780gtgaaaaact gggttgagtt taaaaaggag tttcttcaat attccgaggg cgccctcagt
840cgggaggcgg tccaacgcga gttggacttg ccacagaaac agggggaacc actcgatcaa
900ttcctttggc ggaaacgtga cctttaccag acattgtacg tggatgcaga tgaggaagaa
960attatccaat atgttgtggg gaccctgcag ccgaaactga aacgtttcct tcgcccgccg
1020ctgcctaaaa cgttggaaca acttattcag aaaggtatgg aggtcgagga tggcttagaa
1080caagtcgcag agccggcctc gccacacttg cctacagagg aggaatcgga ggcgctgacc
1140ccagcactta catcagagtc agtggcatca gaccggacac aaccagag
1188
31 1170 DNA Odocoileus virginianus texanus
31
ggggagttag atcaccgtac
aacggggggg ttgcacgcat accctgctcc acgtggcggg 60ccggcagcta agccaaacgt
aatcctgcag attgggaagt gccgggcaga gatgttggag 120cacgtccggc ggacccaccg
gcacctcctg gctgaagtgt ctaaacaagt agaacgggaa 180ctcaaaggtc ttcatcgtag
cgtcgggaaa ttggaatcga atttggacgg gtatgttcct 240acaggcgact cacagcggtg
gaaaaagagc atcaaggcct gcctgagtcg ctgccaggag 300acgattgcta acctcgaacg
ctgggttaag cgggagatgc acgtttggcg cgaagtcttc 360taccggctgg agcgttgggc
tgatcggctc gaatctggtg ggggtaagta tccagttggg 420tccgaccctg ctcgccacac
agtctcagtt ggcgtaggtg ggccggagtc gtattgccaa 480gatgcggaca actatgatta
tacagtttcc ccatacgcga tcacaccacc gccggcagca 540gggcagctgc caggtcagga
agaggttgag gcccagcagt atccaccatg ggccccaggg 600gaagacggcc agctttctcc
tggggtggac actcaagttt ttgaagatcc gcgtgaattt 660ctgcggcatt tagaagatta
tctccgccag gtcggggggt ctgaagagta ttggttaagc 720caaattcaaa accatatgaa
cggcccggcc aagaagtggt gggagtacaa gcaagggtct 780gtgaaaaatt gggtggagtt
taagaaagaa ttcttgcaat attctgaggg cactctttcg 840cgtgaagcca tccaacgcga
actcgactta ccgcagaaac aaggggaacc tctcgaccaa 900tttctgtggc gcaaacgcga
cctgtaccag actctttacg tcgatgctga ggaggaagaa 960attattcaat acgtagttgg
cacactgcag cctaagctta aacggttttt acgtccacca 1020ttgccgaaga cgcttgaaca
actcatccag aagggtatgg aggttcaaga tggtctggaa 1080caggcagcgg aaccagcggc
ggaggaggca gaagccctga cacctgcgtt aactaacgag 1140tctgtcgcga gcgaccgcac
ccagccggaa
1170
32 1203 DNA Ornithorhynchus anatinus
32
ggggaattag accgcctgaa cccaagctca
ggcctgcatc catcctctgg tttgcatcca 60tacccaggtc tccggggcgg ggcaaccgcg
aagcctaatg tcattttgca aattggcaaa 120tgccgtgcgg aaatgcttga acacgtccgc
aaaactcacc gtcatctcct cacagaagta 180tcgcgccaag tagaacgcga gctcaaaggc
cttcacaaaa gtgttggcaa gttggaatca 240aatcttgatg ggtacgtacc gtcaagcgac
tcccaacgct ggaagaaaag cattaaggcg 300tgcttatccc gttgccaaga gacgattgcg
catttagaac gctgggttaa acgtgaaatg 360aatgtatggc gtgaggtgtt ctaccgtttg
gaacgttggg cggaccgtct ggaggctatg 420ggcggtaagt atcctgccgg tgagcaggcc
cggcgtacag tttcagtggg cgttgggggc 480cctgagacat gttgtccagg ggatgaaagt
tatgattgtc cgatttctcc gtatgcagtt 540ccaccttcca ccggcgagtc tccggaatcc
ttagaccaag gggatcagca ctatcagcag 600tggtttgccc tcccggagga gtcccctgtt
agccctgggg ttgataccca gatctttgaa 660gatcctcgcg agtttttacg tcatctggag
aagtacctga aacaagtcgg cgggacagag 720gaagactggc tttctcaaat ccagaatcac
atgaatgggc cggcgaagaa gtggtgggag 780tacaagcaag ggagtgttaa gaattggctt
gaatttaaga aggaattttt acagtattcg 840gagggcacac tgacgcggga cgcgttgaaa
cgtgaactgg atctcccaca gaaacaaggc 900gaaccacttg atcaattttt atggcggaag
cgcgacttat atcagacact ctacgttgac 960gccgatgaag aggaaatcat tcagtacgtc
gtgggcactc ttcagccgaa attaaaacgc 1020tttctccatc acccactccc taagacgctt
gagcagctta tccaacgggg ccaagaagtt 1080cagaatggtc tggagcctac cgacgatcct
gcaggccaac gcactcaatc ggaggacaac 1140gacgaaagcc ttacccctgc cgtcaccaat
gagagtactg caagcgaggg caccctgcca 1200gag
1203
33 1212 DNA Anser cygnoides domesticus
33
gggcagcttg ataacgttac aaacgcgggc atccactcct tccaggggca tcgtggcgta
60gcgaataagc caaatgtcat tctgcaaatt ggtaaatgtc gtgcggaaat gctggagcac
120gttcgccgca cccaccgcca tttattatct gaagtatcta agcaggtaga acgtgagctg
180aaagggctgc aaaagtccgt gggcaagctc gagaataact tggaggatca tgtccctaca
240gataaccaac gctggaagaa gtccattaaa gcgtgcttgg ctcgttgtca agagactatc
300gcgcatttag agcgttgggt gaaacgcgaa atgaacgtct ggaaggaggt gtttttccgg
360ctggaaaagt gggcagaccg gctggagtca atgggtggca agtactgccc gggcgaacac
420gggaaacaaa ccgtcagtgt aggcgtgggg ggtcctgaaa tccggccttc ggagggggaa
480atttatgatt atgctctgga tatgagccag atgtatgcac tcaccccacc tccaggcgaa
540atgccatcaa tcccacaagc ccatgacagc tatcagtggg ttagtgtctc agaagatgcc
600ccggcgagcc ctgtcgaaac ccaggtattt gaggaccctc gggaattcct gtctcacctg
660gaggaatacc tgaagcaggt aggcggcacg gaggagtatt ggttgtccca gatccagaat
720cacatgaatg gtccggcaaa aaaatggtgg gaatataaac aggactccgt taaaaactgg
780gttgagttta aaaaggaatt cttgcaatac tctgaaggta ctttaactcg ggatgctatt
840aagcgtgaac tcgacttgcc gcaaaaggaa ggtgaacctc ttgaccaatt cctttggcgg
900aagcgggacc tctatcagac actttacgtg gacgcggatg aggaggagat cattcagtat
960gtggtcggta ccctgcagcc gaagctcaag cgtttcctga gctatcctct cccaaagact
1020ttagaacagc tcatccagcg cggtaaagaa gtgcagggta acatggatca ctccgatgag
1080ccttcgccgc agcgtacacc tgaaattcaa tcaggtgact ccgtagaatc tatgccacct
1140tcaacaacgg catctccggt tccatctaat ggtacccaac ctgagccgcc gagcccgcca
1200gccaccgtta tc
1212
34 1185 DNA Pelecanus crispus
34
gggcaacttg acaacgtaac aaacgctggg
attcactcct ttcagggcca ccgcggtgtc 60gccaacaagc caaacgtaat cttgcaaatt
ggcaaatgcc gtgcggagat gttggaacac 120gttcgtcgta cacatcgtca cttgctgtcg
gaagtctcta aacaagtaga acgtgaactt 180aaagggcttc aaaagtcagt cggcaaattg
gaaaacaacc ttgaagacca tgtaccaacc 240gacaatcagc gttggaaaaa gtctatcaaa
gcttgcctgg cccgttgtca agagacgatt 300gctcacctgg agcggtgggt aaagcgcgag
atgaatgtgt ggaaagaggt cttcttccgc 360ttggaaaaat gggccgaccg tttggagtcc
atgggcggta aatattgtcc gggtgaacat 420ggtaagcaaa cagtctctgt gggcgttggt
gggccggaga ttcggccttc tgaaggcgag 480atttacgatt atgcgctcga catgtcccag
atgtatgcgc ttacaccacc accgggcgag 540gtaccaagca ttcctcaagc gcatgacagt
tatcagtggg ttagcgtatc cgaagacgct 600cctgcctcgc cggtagagac ccaggttttt
gaagatcctc gtgaattttt aagccacttg 660gaggagtatt tgaagcaggt aggggggaca
gaggaatatt ggctgtctca gatccagaac 720cacatgaatg gcccggctaa aaagtggtgg
gaatacaaac aagattcggt aaagaattgg 780gtagaattta aaaaggagtt tttacagtac
tcagagggga ctctcacgcg tgatgcgatc 840aaacgcgagt tggatcttcc tcaaaaagag
ggggagccac tcgatcagtt cctctggcgc 900aagcgggatc tctaccaaac actctacgta
gacgcagacg aagaagagat catccagtac 960gtggtgggta cgctccagcc gaaactcaaa
cgtttcctca gctacccact tcctaagact 1020ctggaacaac tgattcagcg gggcaaagag
gtccagggta acatggacca ttcagaggaa 1080cctagtccgc aacgtacacc tgagatccaa
tctggggatt ctgtcgattc ggttccacct 1140tctacaacag cgtctccggt gccgtcaaat
gggacccaac cagag 1185
35 1185 DNA Haliaeetus albicilla
35
gggcagcttg ataatgtaac caatgcaggt atccactctt tccagggtca ccgcggtgtg
60gcaaacaagc caaatgttat tctgcaaatt ggtaagtgtc gcgctgagat gttagaacac
120gtccggcgca cgcatcggca tctcctgtca gaggtttcaa agcaggtaga gcgtgaatta
180aagggcctcc agaagtccgt aggtaaactc gaaaataatc ttgaagacca cgttcctacc
240gataatcaac ggtggaaaaa gtcaatcaag gcgtgcttag cacggtgtca ggaaacgatc
300gcgcacctcg aacgttgggt gaagcgcgaa atgaatgtct ggaaagaagt gttcttccgg
360cttgagaagt gggctgatcg gctcgaatcc atgggtggca aatattgtcc aggtgatcat
420ggcaagcaaa cggtctccgt cggtgttggt ggtccggaaa tccggccgag cgagggtgaa
480atctatgact acgctcttga tatgtcccag atgtatgcac tcactcctcc gccgggtgag
540gtcccgtcga tcccgcaggc gcatgactca taccaatggg tgtcgactag cgaagacgca
600ccagcctccc ctgttgaaac tcaagtattc gaggacccgc gtgagttcct gagccattta
660gaggagtacc ttaagcaggt tggtggtacc gaggaatact ggttgagcca gattcagaat
720cacatgaacg ggccggctaa gaaatggtgg gaatacaagc aggattcagt caagaattgg
780gtcgaattta agaaggagtt tttgcagtac agtgagggga cgctcacacg cgacgctatc
840aaacgggagc tggacctgcc acaaaaggag ggtgaaccgc ttgatcagtt tctttggcgc
900aagcgtgatc tgtatcaaac cctgtatgtg gacgctgacg aagaagagat cattcagtac
960gtggttggga ctctgcaacc aaagctgaag cgttttcttt cttatcctct ccctaagaca
1020ctggaacagt taatccaacg tggcaaggag gtccagggta atatggacca ctctgaggaa
1080ccgagcccgc aacgtactcc tgaaattcag agcggggata gtgtcgactc agttcctcca
1140agtacgaccg catccccggt cccaagtaac ggtacccaac cagag
1185
36 1395 DNA Ophiophagus hannah
36
gggtcttggg gcttgcaacg tcacgtggct
gatgaacgtc gtggcctcgc tacgcctacc 60tacggcgcgg tttgttccat tcgggagaaa
aaagcctccc aactgagcgg ccagagctgt 120ttggagaaag agttgcttgg ttggaaatgt
acggaggcaa tcgtggaaat gatgcaagtc 180gataacttta accacggtaa cttacatagc
tgccaaggcc atcgggggat ggcaaatcac 240aaaccgaacg taatccttca aatcgggaaa
tgtcgcgcag aaatgttaga ccacgtgcgt 300cgcacccacc gccatctctt gacggaggtt
tcgaagcagg tagaacgcga attgaagtct 360ctccaaaagt cggttggcaa gctcgagaat
aatctggaag accacgtgcc atcggcagcg 420gagaaccaac gttggaagaa atcaattaaa
gcctgcctgg cccggtgcca agaaacaatt 480gctcacctcg aacgctgggt taaacgcgaa
atcaacgtct ggaaagaagt attctttcgt 540ctggagaagt gggcggaccg ccttgagtcg
ggtgggggca agtatgggcc tggtgaccaa 600agtcgtcaaa ctgtaagtgt cggtgttggg
gccccagaaa tccaaccgcg gaaagaagaa 660atctatgact acgctctcga catgtcgcag
atgtatgcct taacaccacc gccgatgggt 720gaagacccaa acgtacctca atcccacgat
agctaccagt ggattaccat ctcagacgat 780tcacctccgt cgccagtgga aactcaaatt
ttcgaggatc cacgcgaatt ccttacccat 840ctcgaggatt atcttaagca agtgggcggg
actgaagaat attggttgag tcagattcaa 900aatcatatga acggtccggc caagaaatgg
tgggagtaca aacaagattc cgtgaaaaac 960tggttggaat tcaagaagga attccttcaa
tactctgagg gtactttgac acgtgacgca 1020attaaacaag aacttgactt accgcagaag
gacggcgagc cattggatca atttctttgg 1080cggaagcggg acctgtatca gacgctctat
attgatgcag aggaggaaga agtaatccaa 1140tacgttgttg gcacactcca accgaaatta
aaacgtttcc tttcccaccc gtatccgaaa 1200actttggaac agttaatcca acgtgggaaa
gaggtggaag gcaacctcga taactctgag 1260gagcctagcc cgcaacggag tccaaagcac
caattgggtg gtagcgtcga gagcctccca 1320ccttcgtcga ccgcaagtcc tgttgcgtca
gacgagactc acccagacgt gagcgcacct 1380ccggtaacgg tgatt
1395
37 1353 DNA Austrofundulus limnaeus
37
ggggacggcg agactcaagc tgagaatcca tctaccagct tgaacaacac tgacgaagat
60atcttggaac agctcaagaa aattgtcatg gatcaacaac acctgtatca gaaagaatta
120aaggcatctt ttgaacaact cagtcgcaaa atgttttccc agatggaaca aatgaatagc
180aagcaaacgg atctgctttt agaacatcaa aaacagactg tcaaacatgt agacaagcgc
240gtggagtatt tgcgggcgca attcgatgca tcgttaggct ggcggttgaa agagcaacac
300gcggatatta cgaccaaaat cattcctgag atcatccaaa cggtgaagga agatattagc
360ctgtgtcttt ctacgctctg cagtatcgct gaagatatcc agacatcacg ggctaccact
420gtcacagggc atgctgccgt acaaacccat cctgtggatc ttttgggtga acaccattta
480gggaccacgg ggcacccacg cttacagtcg acccgtgtag ggaaaccaga cgacgtacct
540gagtcgccgg taagcctgtt tatgcaaggt gaggcgcgtt cccggatcgt tggcaagagt
600ccgattaaac tgcaatttcc gacgttcggc aaagcaaacg attcttccga cccactccaa
660tatctggagc ggtgtgagga ctttcttgct cttaaccctt taactgatga ggaacttatg
720gctactttgc ggaatgtgtt acatggcacc tctcgggatt ggtgggatgt cgcacgtcat
780aaaatccaaa cttggcgtga gtttaataaa cacttccggg cggctttcct cagcgaggat
840tatgaagatg agttggctga gcgcgtccgt aaccgcatcc aaaaagaaga tgagtctatc
900cgcgatttcg cttatatgta tcagtccttg tgcaagcggt ggaaccctgc tatctgcgaa
960ggtgatgtag taaagctcat cctgaagaac atcaatccac aactgccgtc tcagttacgc
1020tcccgggtca cgaccgtgga tgagcttgtt cgcttgggcc agcagcttga aaaagatcgt
1080cagaatcagc tccaatatga gcttcggaag agttccggca aaattatcca aaaatctagt
1140tcgtgcgaaa cttcagcgct cccgaacacg aagagtacac ctaatcaaca aaaccctgct
1200accagtaacc gtcctccaca ggtgtattgc tggcggtgta agggtcacca tgcccctgcc
1260tcttgtccgc aatggaaagc tgataagcac cgtgcgcaac cttcgcggag ttctgggcca
1320caaactctga ctaatctcca agctcaagac atc
1353
38 1188 DNA Physeter catodon
38
ggggaattgg atcaacgtgc ggcagggggc
ttgcgcgcgt acccggcgcc gcgtggtggt 60ccagttgcca aaccgagcgt aattcttcag
attggtaagt gccgcgctga gatgctggaa 120cacgtccgcc gcacgcatcg ccatcttctg
acggaggtaa gtaaacaagt ggagcgcgaa 180ctcaaggggt tacatcggtc tgtcggtaag
ttggagggca atttagacgg ctatgtgcct 240accggtgatt cccaacgctg gaaaaaaagt
atcaaggcgt gtctctgccg gtgtcaggaa 300acaattgcaa atctcgagcg ttgggtgaaa
cgtgagatgc atgtttggcg tgaggtattc 360tatcgtttgg aacggtgggc agaccgtttg
gagtctatgg ggggcaagta tccggtgggc 420actaacccgt cgcggcacac agtaagtgtc
ggggtagggg gcccggaagg ctattctcat 480gaagcggata cttatgacta cacggtgtct
ccgtatgcta tcacgccacc gcctgccgcg 540ggtgagttgc ctggtcaaga ggctgtcgag
gcacaacagt accctccatg gggtctgggg 600gaggacgggc aaccaggtcc gggcgtggac
acgcagattt ttgaggaccc tcgcgaattt 660ttgagccact tagaggagta cctgcggcaa
gtagggggga gtgaagagta ctggttatcg 720caaattcaaa atcatatgaa tggccctgcg
aagaaatggt gggagttcaa acaggggtca 780gtcaagaatt gggtcgagtt taagaaagaa
tttttgcaat acagtgaggg tacgttgagt 840cgcgaggcca tccaacgtga actggacctc
cctcagaagc agggggagcc gttagatcaa 900tttttatggc ggaaacgtga cttataccaa
accctctacg ttgacgctga ggaagaagaa 960attattcaat atgttgtcgg tacgctgcag
ccaaagctga agcggttcct ccgtcctcca 1020ctccctaaaa ccttagaaca attaatccaa
aaaggcatgg aagttcagga cgggttagaa 1080caagcggccg aaccggcctc tccgcgtctg
ccgccggaag aggagagtga ggctcttacg 1140cctgcgctca cgagcgaatc agtagcctcc
gatcggacac agccagag 1188
39 1212 DNA Meleagris gallopavo
39
gggcagcttg acaatgtgac gaacgcgggg attcacagct ttcaagggca ccgcggcgtc
60gccaacaaac cgaatgtcat tctgcaaatc ggtaaatgtc gtgctgaaat gcttgagcac
120gttcgtcgta cccatcgtca cttgctttct gaagtatcaa aacaagtgga gcgggaactc
180aaaggcctgc aaaagtcagt gggtaaattg gagaataacc tcgaagacca tgtacctaca
240gacaaccagc ggtggaaaaa atctatcaag gcatgcctcg ctcgttgcca ggagactatt
300gcccatcttg agcggtgggt gaaacgtgaa atgaacgtat ggaaggaagt attttttcgc
360ttagagaagt gggctgatcg tcttgaatcg atgggcggca agtactgtcc tggggaacac
420ggcaaacaaa ctgtatctgt cggcgtgggg ggcccggaga tccggccatc ggaaggggaa
480atttatgatt atgctctcga catgtcccaa atgtatgctc tcacaccagg gccaggggaa
540gtaccgtcaa ttccgcaagc acacgacagc taccaatggg tatctgtgag cgaggacgcg
600cctgcctctc cggttgagac gcaaatcttt gaggacccac atgaattttt gtctcatctt
660gaagaatatc tcaaacaggt tggcggcaca gaagaatact ggttatctca gatccagaat
720cacatgaacg gcccggctaa aaagtggtgg gagtataagc aagattccgt aaagaactgg
780gtcgaattca agaaagagtt tcttcaatac tctgagggta ctctgacgcg cgatgcaatt
840aagcgggagt tagaccttcc acaaaaagag ggggagcctc ttgaccagtt cctgtggcgt
900aagcgcgacc tctatcagac actttacgtc gacgctgatg aagaagagat tattcaatat
960gttgtgggta ccctgcagcc aaagcttaag cgtttcctta gctacccact tccgaaaact
1020ctggagcagc tcattcaacg cggtaaggaa gtgcagggca acatggacca ctctgaagag
1080cctagcccgc agcgcactcc tgaaatccaa tcaggtgaca gtgtggagtc aatgccgccg
1140tcaaccaccg cttctccggt acctagcaac gggacgcaac cagagcctcc aagcccaccg
1200gctacagtca tc
1212
40 1227 DNA Pogona vitticeps
40
gggcaacttg agaatattaa ccaaggttcc
ctgcacgcgt ttcagggtca tcgcggcgtg 60gtccataaca acaagcctaa cgttattctc
cagatcggga agtgccgcgc cgaaatgctg 120gagcatgtgc ggcgcaccca tcgccatttg
ctcactgaag tatcaaaaca ggtggagcgt 180gagttgaagg ggttgcagaa aagtgtaggc
aaacttgaaa ataatttaga agaccacgta 240ccaagtgcgg ctgagaacca acgctggaag
aagtcgatta aagcctgctt agcgcgttgt 300caggagacca ttgcgaactt ggaacgctgg
gttaaacgtg agatgaatgt ttggaaggag 360gtctttttcc gcttagagcg ctgggcagat
cgcctcgaat ccgggggtgg caagtactgc 420catgcagacc agggtcgcca aactgtcagc
gtaggtgttg gtggtcctga agtgcgtccg 480tctgaaggtg aaatttacga ttacgcgttg
gatatgagcc aaatgtacgc cttgactccg 540ccgcctatgg gtgatgttcc agtaattcct
cagccgcatg acagttatca gtgggtgaca 600gatccggaag aagcgccacc aagtccggtt
gagacacaaa ttttcgagga ccctcgggag 660tttctgaccc atcttgagga ttatttaaaa
caagtcggcg ggacagagga atattggctc 720tcacagatcc aaaatcatat gaatgggcca
gcgaaaaagt ggtgggaata taaacaggat 780agtgtgaaga actggcttga gttcaaaaaa
gaattcttgc agtactcaga aggcacgtta 840acgcgggacg ctattaaaca ggaacttgac
cttccacaaa aagaagggga accgctggat 900caattcctct ggcgcaaacg cgatttgtac
caaactctct acgtcgaggc agaagaagag 960gaggtcatcc aatatgtagt tggcacactg
caaccaaaac tgaagcggtt tctttctcat 1020ccgtacccta aaaccctgga gcaactcatc
cagcgcggga aggaagttga ggggaatttg 1080gacaatagtg aagaaccgtc tccacagcgg
accccagaac atcagctggg ggacagtgtg 1140gaatctttgc cgcctagtac tacggcttcg
cctgccggtt cggataaaac gcaacctgag 1200attagcttac ctccaactac agtcatt
1227
41 1212 DNA Alligator sinensis
41
gggcaattag attcggtaac caatgcgggc gtccacacct accagggcca tcggagcgtc
60gccaataaac ctaacgtcat tcttcaaatc gggaaatgtc ggactgagat gctggagcat
120gtccgtcgga ctcatcgcca cctgctcaca gaagtgtcaa agcaagtgga acgtgaactc
180aagggcttac agaagagcgt gggcaaactg gaaaacaatc ttgaagacca tgtcccaact
240gacaatcagc ggtggaagaa gtcaatcaag gcatgtctcg cgcgttgcca agagaccatt
300gctcaccttg agcggtgggt gaaacgtgaa atgaacgtgt ggaaggaggt gttcttccgg
360ttagaacgct gggccgaccg ccttgaatca atgggtggta aatactgccc gacggactct
420gcacgtcaga cagttagcgt tggggtgggg ggcccggaaa ttcggcctag tgaaggcgaa
480atctatgact acgcgctcga tatgagccaa atgtacgctc ttacgccgtc accgggcgaa
540ttgccgtccg tccctcaacc gcatgattca taccagtggg tcactagtcc ggaagacgct
600ccggcgtcac cagttgaaac gcaggtattc gaggatcctc gggagttctt gtgtcatttg
660gaagagtacc tgaagcaggt tggcggtaca gaggaatatt ggctgagcca gattcagaat
720catatgaatg gtcctgcaaa aaagtggtgg gaatataaac aagacacggt taagaattgg
780gtggaattca agaaggagtt cttacaatac agtgagggta cacttacccg tgatgcgatt
840aagcgggaat tagacctccc gcaaaaggac ggtgagcctc tggatcaatt tttatggcgt
900aagcgtgacc tctatcagac attatacatt gatgccgatg aagaacagat cattcagtac
960gtcgtgggga cattgcaacc taaactcaag cggttcttgt cctatccact tccaaaaact
1020cttgaacaat taatccagaa agggaaggag gtgcagggtt cacttgacca cagcgaggag
1080ccgagtcctc aacgtgcgag cgaggctcgg acgggcgata gtgtggaaac cttgccgcct
1140tctaccacta catcaccaaa tacgtcatct ggtacacagc cagaggcacc atcgcctcca
1200gcgacggtaa tc
1212
42 1212 DNA Alligator mississippiensis
42
gggcagttag acagtgtgac
taacgccggg gtgcatacgt accaggggca ccgcggggtc 60gccaataagc caaatgtaat
tctccagatt gggaagtgtc gtacagagat gttggaacat 120gtccgtcgca ctcatcgcca
cttgctcacc gaggtctcca aacaagtaga acgcgaactc 180aaggggctcc agaagagtgt
tgggaagttg gagaataacc tcgaagacca cgttccgaca 240gataaccaac ggtggaaaaa
gtctattaaa gcctgtctcg cccgttgtca agagacaatc 300gcacacttgg aacgctgggt
caaacgggag atgaatgtgt ggaaggaagt cttcttccgt 360ctcgagcggt gggcggatcg
tttagaaagt atgggcggta aatattgccc aactgactcg 420gctcgtcaaa cggtgtcggt
tggcgtaggc ggcccggaaa ttcgccctag cgagggtgag 480atctatgact atgcacttga
catgagtcag atgtatgcgt taactccgtc gccaggggag 540cttccaagta ttccacagcc
tcacgatagt tatcaatggg taacttctcc tgaagacgcc 600ccagcatccc cagttgagac
acaagtattc gaggaccctc gtgagtttct ctgtcacctc 660gaggagtacc ttaaacaggt
aggcgggacc gaagagtact ggttatcgca aatccaaaac 720catatgaatg gtcctgccaa
aaagtggtgg gagtataaac aagatactgt gaagaattgg 780gtagagttca agaaagagtt
cttacagtac tctgagggga cgttaactcg tgatgcgatc 840aagcgcgaat tggatttacc
tcagaaggac ggcgagccac tcgaccagtt cttatggcgc 900aagcgtgact tgtatcaaac
cctttatatc gatgctgacg aggaacaaat tatccagtac 960gtagtcggta cgttgcaacc
aaaacttaaa cgctttctga gctacccatt acctaaaacg 1020ttggagcaac tgatccagaa
aggtaaagag gtgcaaggga gcctggatca tagtgaagaa 1080ccgagccctc agcgggcttc
tgaagctcgg accggtgata gcgtcgaatc tttaccacct 1140agtaccacaa ccagcccgaa
tgcgtcatct ggtacccaac ctgaagcgcc ttccccacct 1200gctacagtca tt
1212
43 1224 DNA Gekko japonicus
43
gggcagctcg agaatgtcaa ccatgggaac ctccattctt ttcaaggtca tcgcggcggc
60gtcgccaaca agccaaacgt tatcttgcag atcggtaaat gtcgtgcaga gatgctggac
120cacgtccggc ggacccaccg gcatttactg acagaggtat cgaaacaggt tgaacgtgag
180ttgaaggggt tacagaaatc agtagggaaa ttagaaaata acttagaaga ccatgtccct
240tcagccgttg aaaaccagcg ttggaaaaaa tcgatcaagg cctgcctttc ccgctgccaa
300gagaccattg cccaccttga gcgttgggtg aagcgcgaga tgaacgtatg gaaagaggtt
360ttcttccgct tagagcggtg ggcagatcgg ttggaatctg ggggcgggaa atattgtcac
420ggtgataatc atcgtcaaac agtatcagtc ggtgttggcg gccctgaggt acgtccatct
480gaaggcgaaa tttacgatta cgctctcgac atgtcgcaaa tgtacgcttt aacaccgcct
540agcccagggg atgtgcctgt agttagccag ccgcacgaca gctatcagtg ggttacggtt
600ccggaggata cccctccatc cccggtggag acgcaaatct tcgaggaccc acgggagttc
660ttgacccact tagaggatta cttaaagcaa gtggggggta cagaggaata ttggttatct
720cagatccaga atcacatgaa cgggccagcc aagaagtggt gggagtataa gcaagactca
780gtaaaaaatt ggctcgagtt taagaaggaa ttccttcagt attccgaggg gacacttacg
840cgcgacgcta tcaaggaaga acttgacctc ccgcaaaagg acggggaacc tcttgatcag
900ttcctgtggc gcaagcgcga cttgtaccag accctgtacg tggaggcgga tgaggaggag
960gtgatccagt atgttgtggg gactttacaa cctaaattaa agcgttttct ctcacaccct
1020tacccgaaaa cgttagagca acttatccaa cggggcaaag aggtggaagg gaacctcgac
1080aattcagagg aaccaacacc tcagcgtact ccagaacacc aactgtgtgg ttctgtagaa
1140tcgctgcctc cttcctctac cgtcagtcca gtggctagcg atggtactca acctgagact
1200tcgccattgc cagcgactgt tatt
1224
44 1365 DNA Homo sapiens
44
gggccattga cgttgttaca agactggtgt cgtggtgaac
atttaaacac ccgccggtgc 60atgttgatcc tcggtatccc agaagattgc ggcgaggatg
agttcgaaga gacacttcag 120gaggcgtgtc gccatttagg gcggtaccgc gtgatcggcc
gcatgttccg tcgtgaggaa 180aatgcccaag cgatcctctt ggaattggcg caggatattg
actatgcctt actccctcgg 240gaaatccctg ggaaaggcgg gccttgggag gtaattgtga
agccgcgtaa ttccgacggc 300gaattcttaa atcggcttaa tcgctttctt gaagaggagc
gccgtacggt ctccgatatg 360aaccgtgttt tgggctcgga tactaactgt tcagctcctc
gtgtcaccat tagtcctgaa 420ttctggactt gggcacagac gctgggcgca gctgtccaac
cattgctcga acagatgctc 480taccgggagt tacgggtctt cagtggcaat acgatttcca
tcccaggtgc tctcgctttt 540gacgcgtggc tggagcatac cacggaaatg cttcaaatgt
ggcaggtgcc tgaaggggag 600aaacggcggc gcttgatgga gtgtttgcgg gggccagccc
tgcaagtcgt tagtgggtta 660cgtgcatcga atgccagtat cactgtcgaa gagtgtcttg
ctgcactgca gcaggtattc 720ggtccagtgg aaagtcataa gattgcccaa gtaaagttat
gcaaagctta ccaggaggct 780ggggaaaaag taagcagctt cgttttgcgt ttggagccac
tgcttcagcg tgctgtagaa 840aacaacgtgg tcagtcgccg caatgtcaac caaacacgtc
ttaagcgtgt tctgtcgggc 900gccacccttc ctgacaagct gcgtgataaa ttgaagttaa
tgaaacagcg ccgtaaaccg 960ccgggtttct tggcgttggt taaactgtta cgtgaagagg
aggagtggga ggccacctta 1020gggccagacc gcgagtcatt ggaggggtta gaagtggcac
cgcgcccgcc agcacggatt 1080acgggtgttg gcgcagtacc tcttccggca tccgggaatt
catttgatgc ccgtccttcg 1140caagggtacc ggcgccgtcg gggtcgtggt cagcaccgtc
ggggcggcgt tgctcgtgca 1200ggctctcgtg gctctcgtaa gcggaaacgg cacaccttct
gctattcctg tggtgaggat 1260ggccatattc gtgtccaatg cattaaccct agcaatctcc
tgttggctaa ggagaccaaa 1320gagattttgg aagggggaga acgtgaagcg caaacgaatt
cacgt 1365
45 1344 DNA Homo sapiens
45
ggggctctta cgctcttaga
agactggtgt aagggtatgg acatggaccc gcggaaggct 60ctcctgattg taggtattcc
gatggaatgc agtgaggtgg aaatccagga tacagttaaa 120gctggtcttc aacctctgtg
cgcttatcgt gtactcggcc gtatgttccg gcgggaggat 180aatgcgaagg ctgttttcat
tgagctggca gacaccgtga attacaccac gttaccgtct 240cacattccgg gtaaaggggg
ttcctgggaa gtcgttgtta aacctcggaa ccctgacgac 300gagttccttt ctcggcttaa
ctacttcttg aaagatgagg gccgctcgat gacggatgtc 360gcccgggcac tggggtgctg
tagcttacct gcggaatcac tggacgcgga agtaatgcca 420caggtccgct ccccaccatt
agaacctcca aaagagagta tgtggtaccg taagttaaaa 480gtgtttagtg gtaccgcgtc
gccttcgccg ggggaggaga catttgagga ctggttagag 540caagtcaccg agatcatgcc
tatctggcaa gtatctgaag ttgaaaagcg ccgtcggtta 600ctggagtcac tccggggccc
ggcactctca attatgcgcg tgttacaagc caataacgat 660agcattaccg ttgaacagtg
tttggatgca ttaaagcaga tctttggcga caaggaagac 720ttccgtgcct ctcaatttcg
ttttcttcaa acgtccccta aaattgggga gaaggtgagt 780acgttcctgc tgcgtttaga
gccactcttg caaaaggccg ttcacaagag cccactttcg 840gtacgtagta ctgatatgat
tcggttaaag cacctgttgg cacgcgtagc catgaccccg 900gcactgcgtg gtaaactcga
attactcgac caacgcgggt gcccacctaa ttttcttgag 960ctgatgaagc tgatccggga
tgaggaagag tgggagaata ctgaagctgt gatgaaaaat 1020aaagagaaac cttcaggtcg
tggccgcggt gcatcaggcc gtcaagctcg cgccgaggcc 1080agtgtaagtg ctccgcaagc
aacagtccaa gcacgtagct tctctgattc tagcccgcag 1140acgattcagg ggggcttacc
acctcttgtc aagcgtcggc gccttttggg ttcggagagc 1200acacgtgggg aagaccacgg
gcaagctact tatccgaaag cagagaatca gactccaggg 1260cgtgagggcc cgcaggcggc
tggggaggaa cttggtaatg aggccggggc cggcgcgatg 1320tcccacccga aaccgtggga
aacc 1344
46 1197 DNA Homo sapiens
46
ggggctgtga caatgctcca ggactggtgc cgttggatgg gcgtgaacgc tcggcggggg
60ctgttaatct taggtatccc tgaagactgt gacgatgcag agttccaaga gtcgttagaa
120gctgcactcc gtcctatggg tcactttact gtactcggta aggccttccg cgaggaagac
180aacgctaccg ctgcgctggt ggaattagat cgcgaggtta attacgcact tgttccacgc
240gaaattccgg gcaccggcgg gccttggaac gtcgtgttcg ttcctcggtg ctccggcgag
300gaattcctgg ggttaggccg cgtgttccac tttcctgaac aggagggcca aatggtagaa
360tcggttgcgg gggcactggg ggtaggtctg cgccgcgtgt gttggttacg ctcgatcggg
420caagctgtac aaccatgggt agaagctgtt cgctgccaaa gcttaggggt atttagtggt
480cgtgatcaac ctgcacctgg tgaagaaagc ttcgaggtct ggttggatca tacgaccgag
540atgttgcatg tgtggcaagg cgtgtcggaa cgggaacggc gccgtcgtct gctggaaggg
600ctgcgtggca cagccttaca acttgtacat gccttactgg cagaaaatcc ggcacggaca
660gcacaagatt gcttggctgc attagcccaa gtttttggtg ataacgaaag ccaggcaacg
720attcgtgtta aatgtttgac agcccaacag cagagtggcg aacgcctctc tgcgttcgtt
780ctccgcttag aagtacttct gcaaaaggct atggagaagg aagcattggc gcgcgcgtca
840gcggatcggg tgcgtcttcg tcagatgctg acacgcgcac atctcacaga gccgttggat
900gaagccttac ggaaattgcg tatggcaggg cgttctccgt cttttttgga aatgctcggc
960ttagtacgcg agtcagaggc ctgggaggca agtctggctc ggtccgtccg ggcgcaaacc
1020caggagggtg caggggcccg ggcgggggcc caagcagttg cgcgtgccag cactaaggtt
1080gaagctgtac ctggtggccc tggccgggag ccagaaggtc tcctccaagc cgggggccaa
1140gaagcggaag aacttctcca agagggctta aagccggttt tagaggaatg tgacaat
1197
47 1197 DNA Homo sapiens
47
ggggcggtca ccatgttgca agactggtgt cggtggatgg
gcgtgaatgc tcggcggggt 60ttattgatct tgggtatccc agaagactgt gacgacgccg
agtttcagga gtcgctcgag 120gccgcccttc gtccaatggg gcattttacg gttctgggca
aggtgttccg tgaagaggat 180aacgctacag cagctcttgt ggagcttgac cgtgaggtga
attatgcgtt agtacctcgc 240gagattccag gtaccggtgg gccatggaac gtagtcttcg
tcccacgttg ctcgggggag 300gaatttctgg ggcttgggcg cgtattccac tttccagaac
aggaagggca gatggtcgaa 360agcgtagcag gcgctcttgg cgttggtctc cggcgcgtgt
gctggttacg ctccatcggc 420caagcagtcc aaccatgggt tgaagccgta cgctatcaat
ctttaggtgt cttctcaggc 480cgtgaccagc cggcgcctgg tgaggaatcc ttcgaagtct
ggctcgatca tacaactgag 540atgctgcatg tatggcaagg tgtctcagag cgggaacggc
ggcggcggtt attagagggg 600ctccgtggga ctgcgctcca attagtacat gcgcttttgg
ccgaaaatcc agcccgtact 660gcccaagatt gtctggcagc actcgcccaa gtattcggcg
acaacgaatc gcaggcaaca 720atccgcgtaa agtgtcttac agcacagcag cagtcagggg
aacgtcttag tgcgttcgtt 780ctgcggctgg aagtgttact ccagaaagcc atggaaaagg
aggcattggc tcgcgcgagc 840gctgaccgtg tacgtctgcg gcaaatgctt actcgcgcac
atctcaccga gcctctcgat 900gaagcactgc ggaaactgcg catggcaggc cgcagcccgt
ctttcctgga aatgttaggc 960ttagtccggg agtccgaagc ctgggaggcc agtctggcac
ggtcagtgcg ggcacaaacg 1020caagagggtg caggggcacg ggcgggtgca caagcagttg
cacgtgcctc cactaaagtt 1080gaggcagtgc cgggtgggcc aggccgtgaa ccggagggtt
tgcgccaagc cggcgggcag 1140gaagccgaag aattactcca agaaggttta aaaccggttt
tggaggaatg cgataac 1197481425DNAHomo sapiens 48ggggtggaag atttggcggc
atcttacatc gtattaaagc ttgagaacga aatccggcag 60gcgcaggtcc aatggttaat
ggaggaaaac gccgccctgc aggcccagat ccctgaactt 120caaaagtcgc aagccgcgaa
ggagtatgat cttctgcgta aatcttcgga ggcgaaggag 180ccgcaaaaac tgccagaaca
tatgaatcca ccggccgctt gggaagcaca aaagactcca 240gagtttaagg aaccacagaa
acctcctgaa ccacaggatt tgcttccttg ggagccgcct 300gctgcctggg agttgcaaga
agcaccggct gcccctgagt cactggctcc gcctgcaacc 360cgtgagtctc agaaaccacc
tatggcgcat gaaatcccta ctgtattgga ggggcaaggg 420cctgccaaca cacaagacgc
tacgattgct caagaaccaa agaatagcga gccgcaagac 480cctccaaata tcgagaaacc
tcaggaagct ccggaatatc aagaaacagc ggcacagttg 540gagtttttag aacttcctcc
acctcaggag ccactcgaac cgagcaatgc gcaagaattt 600ctcgagttgt cggctgccca
ggagtcctta gaaggcctca ttgtagttga aacgtccgcg 660gcttcggagt tcccacaggc
tcctatcggg cttgaagcca ccgactttcc gctgcagtac 720acgcttacct tctctggcga
cagccagaag ttgccagaat ttttggtcca actctacagt 780tatatgcggg tacgtgggca
cttataccct accgaggcgg cgttagtgtc gtttgtaggc 840aattgtttct cagggcgcgc
gggctggtgg tttcagttgc ttttggatat ccagtcgcct 900ctgttagaac agtgtgaaag
ttttatcccg gttctccaag acacatttga caatccggaa 960aacatgaagg acgcaaacca
atgcatccac cagctttgtc agggcgaggg tcatgtggcc 1020acacacttcc acctcattgc
acaagagctt aattgggatg aaagcacgct gtggatccag 1080ttccaggaag gcctggcctc
atccatccag gatgaacttt cccatacatc gcctgctacc 1140aacctgagtg atctgattac
tcaatgcatc tcattagagg aaaagcctga cccaaacccg 1200ttagggaagt cctcctcggc
ggagggggat ggcccggaaa gtccgccagc agaaaaccaa 1260cctatgcaag ctgcgatcaa
ttgtcctcac atttccgaag cagagtgggt tcgttggcac 1320aaaggccggc tttgtctcta
ttgcggctat ccgggtcact tcgcacgtga ttgcccagtg 1380aagccacacc aggcgttaca
ggcagggaac attcaggctt gccaa 142549717DNAHomo sapiens
49ggggtgcagc cgcagactag caaagctgaa tcgccggctc tcgctgcctc accgaacgca
60caaatggatg acgttattga tacattaacc tccctgcgtc tgacgaattc ggctctgcgg
120cgggaggcta gcactcttcg ggccgagaaa gcaaatttaa ctaatatgct cgagtcagtg
180atggccgagt taacgctgtt acggacccgt gcgcggattc cgggggccct gcagattacg
240ccaccaattt cgtctattac tagcaacggt actcgcccga tgacgactcc tccaactagt
300ttacctgaac cgttttctgg cgatcctggc cggttagctg gtttccttat gcagatggac
360cgttttatga tctttcaagc tagccggttt ccaggggagg cagagcgtgt tgcgttcctg
420gtgtcgcgct taactggcga agcagaaaaa tgggccattc ctcacatgca accagactct
480cctttgcgta acaactatca aggcttctta gcagagttac ggcggaccta taagagcccg
540ttgcgtcacg cccggcgggc gcaaatccgg aagacatcgg cctcgaaccg ggcagtccgt
600gaacgccaaa tgctttgccg gcaacttgca tcagcaggta caggcccatg cccggtacac
660cctgctagta acgggacttc cccggcaccg gcattaccag cacgggcgcg taactta
71750339DNAHomo sapiens 50ggggacggtc gggtacagtt gatgaaggct ttattggctg
gccctttacg tccggcggca 60cgccgttggc ggaatcctat tccatttcca gagacttttg
atggggatac tgatcgcctc 120ccggagttta tcgtccaaac ttcgtcctac atgttcgttg
acgaaaatac tttctctaac 180gacgctctga aagtgacatt tctcattacc cggctgacag
gtccagcctt gcaatgggtc 240attccgtaca ttcgtaaaga aagcccgctt cttaacgact
atcggggttt cctggccgag 300atgaagcggg tttttgggtg ggaagaggac gaggacttt
33951339DNAHomo sapiens 51ggggaaggtc gggtgcaact
tatgaaagcg ttgcttgccc gcccgcttcg tccagcagca 60cgtcgctggc ggaatccaat
tcctttcccg gagacttttg acggggacac cgatcggctc 120ccagagttca ttgtgcagac
gtcaagctat atgttcgtgg atgagaacac gttctctaac 180gacgcgttga aagtgacttt
cttaattacg cgtttgactg gcccggcttt acaatgggtg 240attccataca ttaagaaaga
gtcaccgctt ctcagtgatt atcgcggttt tttagccgag 300atgaagcggg tcttcgggtg
ggaagaagac gaagacttt 339521092DNAHomo sapiens
52gggccgcgtg ggcgttgccg tcaacaaggt cctcggattc cgatttgggc agcggccaac
60tatgccaacg cccacccgtg gcaacaaatg gataaggctt cgccaggcgt tgcttacaca
120cctttggttg atccttggat tgagcggcct tgttgcggtg acacggtttg tgtgcgcacc
180acaatggaac agaagagcac agcgtcaggc acttgtggtg gtaagcctgc tgagcgtggt
240cctctcgcgg ggcatatgcc gagctcacgc ccacatcggg ttgatttctg ttgggttcct
300ggtagcgacc caggcacatt cgacggcagt ccatggctct tagatcgctt tttggcgcaa
360cttggtgatt acatgagttt tcactttgaa cactaccagg acaatatcag ccgtgtctgc
420gagattcttc gtcggttaac gggccgcgct caggcatggg ctgctcctta cctggacggg
480gaccttccac tgccagacga ctacgaattg ttttgtcaag accttaagga ggtagtacag
540gaccctaaca gtttcgccga gtatcacgcc gtggtgactt gtccactccc tcttgcttcg
600tcccaacttc ctgtagctcc tcagcttccg gtggtacgcc aataccttgc gcgcttcttg
660gagggccttg ctttggatat gggtacggcg cctcggtcac tcccggccgc tatggccaca
720ccggcagtct ccggctcgaa ctccgtttct cgttctgcct tatttgaaca acaactcaca
780aaggaatcca ctccaggccc gaaagagcca cctgttctcc ctagctcgac ttgctctagc
840aaaccgggtc ctgtcgaacc agccagttca caacctgaag aggctgctcc taccccggtg
900ccgcgtttgt cagagtcggc taacccaccg gctcagcgtc cagaccctgc tcaccctggt
960ggtcctaaac cacaaaaaac cgaagaggaa gttttagaaa ctgaggggga ccaggaagtt
1020agcctgggga cgccgcagga ggtcgtagaa gcgccggaaa caccaggtga accaccgctc
1080agccctgggt tc
109253438DNAHomo sapiens 53ggggttgatg aattggtgct cttgttgcac gcgctgttaa
tgcgccatcg ggcgctttcc 60attgaaaatt ctcagttgat ggagcaactt cgcttgttgg
tctgcgaacg ggcgagcctt 120cttcgtcagg tacgtccgcc gagctgtcca gtgccatttc
ctgagacttt taacggggag 180tcatcacggt tacctgagtt catcgtccaa accgcaagct
atatgttagt taatgaaaat 240cgcttttgca atgacgcaat gaaagtcgct tttttgatta
gccttcttac tggtgaagca 300gaagaatggg tcgtcccata cattgagatg gattcaccaa
ttcttgggga ctaccgtgcg 360ttcttggatg agatgaagca gtgttttggg tgggacgatg
atgaagatga cgacgatgag 420gaagaggagg atgactat
438541647DNAHomo sapiens 54gggcctgtgg atttaggtca
ggctttgggg ttgttgccat ccctcgctaa ggccgaagat 60tcccaattta gcgaaagcga
tgcagcttta caggaggaat tgtcttctcc ggaaaccgca 120cggcaacttt ttcgtcaatt
tcgctatcaa gtcatgtcgg ggcctcatga aacactgaaa 180cagttacgga agttatgttt
tcagtggctg caacctgaag tccatacaaa ggaacaaatc 240ctcgaaattc tgatgctgga
acagttcttg accattctgc ctggtgaaat tcagatgtgg 300gtccgcaagc agtgccctgg
tagtggggag gaggcggtta cgttagtaga atccctgaaa 360ggtgatccac aacggctctg
gcaatggatc tccatccaag tcctgggtca ggatatcctg 420tctgagaaaa tggagtcacc
ttcttgccag gtgggcgaag tggagccaca cctggaagtt 480gtacctcagg aactggggtt
agagaattca tcttcagggc cgggggaact tctttcgcac 540atcgtgaaag aggagtctga
cactgaagca gagttggcgt tagcggcatc ccagccagct 600cgtttggaag aacggctgat
tcgggatcag gaccttgggg cgtccctcct cccggcagca 660ccgcaggagc aatggcgtca
attagacagc actcaaaaag aacaatattg ggacctgatg 720ctggagacct acggcaaaat
ggtatccggc gcgggtatct cacacccgaa gtccgattta 780acgaactcaa ttgagttcgg
tgaagagttg gcaggtattt atttacatgt aaacgaaaag 840attccgcggc ctacctgcat
tggtgaccgc caagaaaacg acaaagaaaa ccttaatttg 900gaaaaccatc gtgaccagga
attattacat gccagctgcc aggcctcggg cgaagtgcca 960tcccaggcat cgttacgtgg
cttctttacc gaggacgaac ctggttgctt cggcgaaggg 1020gagaaccttc ctgaggcact
tcagaatatc caggatgagg ggactggcga acagctgagc 1080ccgcaagaac gcattagtga
aaaacagttg ggtcaacatt tgccaaatcc gcactcgggg 1140gagatgtcga cgatgtggct
tgaagaaaaa cgggagacca gccagaaagg ccaaccacgt 1200gcaccaatgg cgcagaaatt
gccaacgtgc cgcgaatgtg gcaaaacgtt ttatcgcaat 1260agtcaactta tctttcacca
acgcacacac accggtgaga catattttca atgcaccatc 1320tgcaaaaagg cgtttctccg
gtcatctgat ttcgtgaaac atcagcggac tcatactggc 1380gaaaaacctt gtaaatgtga
ctattgtggc aagggcttta gtgattttag cgggcttcgg 1440catcacgaga agatccatac
cggcgagaag ccatacaagt gtccaatctg tgagaaatct 1500ttcatccagc gcagtaattt
taaccgccac caacgggttc acaccggtga aaagccttat 1560aaatgctcgc attgtggcaa
gagcttcagc tggagctcct cgctcgataa gcatcaacgt 1620tcacatctgg ggaagaagcc
gttccaa 1647551053DNAHomo sapiens
55gggactctcc gcttacttga ggattggtgt cgggggatgg acatgaaccc acgtaaggcc
60cttcttatcg ccgggatttc ccagtcatgt tcagtcgccg agattgaaga ggcgctccaa
120gccgggcttg ctcctttagg cgagtatcgt ctccttgggc ggatgtttcg ccgcgatgaa
180aatcgcaaag tagcgttggt tggtctcaca gctgaaacta gccatgcgct tgtacctaaa
240gaaattcctg gtaaaggcgg gatctggcgg gttattttta aaccaccgga cccggacaat
300acgtttcttt ctcgtttgaa tgagttcctc gcgggcgagg ggatgacggt gggggaactt
360agtcgtgctc ttggtcacga aaatgggtca ttagaccctg aacagggtat gattccggaa
420atgtgggcgc cgatgctggc acaggctctg gaggctctcc aaccggcttt acagtgcctt
480aagtacaaga agctgcgcgt tttttcaggg cgcgagtctc cagagccggg tgaggaggaa
540ttcggccgtt ggatgttcca taccacccag atgatcaaag cgtggcaggt gccggatgtc
600gagaaacgcc gccggctgtt ggaatcactc cgcgggccgg cacttgacgt tattcgggtt
660ctgaaaatta acaacccgtt aattacggta gatgaatgtt tgcaagcact tgaagaggtc
720tttggggtga ctgacaatcc tcgggaattg caagtaaaat acttaacgac ctaccataag
780gacgaggaga aattatcagc ctacgtactg cggctggaac cgctgctgca gaagctcgtc
840cagcgggggg ctattgaacg ggacgctgtt aatcaggctc gcctggatca ggtaatcgct
900ggggcggtac ataaaactat ccgccgtgag ctgaacctgc ctgaagacgg gccggcgcca
960ggctttcttc aactcctcgt tttgattaag gattacgagg cagctgaaga ggaggaagca
1020ttacttcagg ccattcttga agggaacttt act
1053562124DNAHomo sapiens 56gggacagaac ggcgtcgcga cgaattaagt gaagaaatta
ataatcttcg tgaaaaggtt 60atgaaacaga gtgaggaaaa caacaatctt caatcccaag
tccagaaact cactgaggag 120aatactacac tccgtgagca agttgaacct acacctgaag
atgaagatga cgacattgag 180ttgcggggcg cagcagccgc agccgcgcct ccgccgccga
tcgaggagga atgcccggag 240gatttaccgg aaaaatttga tggtaatccg gacatgttag
cgccattcat ggcccagtgc 300caaattttta tggaaaagtc tacgcgcgat tttagtgtag
atcgcgtacg tgtatgtttt 360gtgacgagca tgatgactgg tcgcgcagcc cgttgggcgt
cagcgaaatt ggagcggtcg 420cactacctga tgcataatta cccggcgttc atgatggaga
tgaaacacgt gtttgaagac 480ccgcagcggc gggaggtggc caaacgcaag atccggcggt
tgcggcaggg catgggcagc 540gtaattgatt atagtaatgc gtttcaaatg attgcgcagg
atctggattg gaatgaacct 600gctctcattg atcaatatca tgaagggctt agtgaccata
ttcaagagga actctctcac 660ctggaagtgg ctaaatctct ctccgccctt attggccaat
gcattcatat tgagcgccgt 720cttgcacgtg ctgctgccgc tcggaaaccg cgtagtccac
cacgggcttt agtgctccca 780catatcgcgt cacaccatca agtagatcct actgagccag
tggggggtgc acgcatgcgc 840ttaacccaag aagaaaagga acgtcgtcgt aagctgaatt
tatgcctgta ctgcggcact 900ggtggccatt atgccgataa ctgtcctgcc aaagccagta
agtcaagccc ggctgggaaa 960cttccaggtc ctgccgtcga gggcccttct gctaccggcc
cagagattat ccgctccccg 1020caagacgatg cgtcgtcgcc tcatctccag gtaatgctcc
aaatccacct ccctggccgg 1080cacacactct ttgtccgggc gatgattgac tctggggcgt
ctggtaattt tattgatcac 1140gagtatgttg ctcaaaatgg tatccctctc cggatcaaag
actggcctat tctggttgaa 1200gccatcgatg gccgtccgat cgcgagcggt cctgtggttc
atgaaacgca tgacctcatc 1260gttgatctgg gtgaccaccg tgaagtatta tcctttgatg
tgactcagtc accgtttttt 1320ccagttgttt tgggcgtccg ttggctttcg actcacgatc
ctaacatcac gtggtcgaca 1380cggtcgattg tcttcgattc ggaatattgt cgttatcatt
gccgcatgta ttcaccaatt 1440ccgccgtctc tcccgccgcc tgcgccgcaa cctcctctgt
attacccggt ggacggttac 1500cgtgtttacc agccagttcg ctactactac gtacaaaacg
tgtacacgcc tgttgatgaa 1560cacgtgtacc cagatcaccg cctggtcgac cctcatattg
agatgatccc gggtgcgcac 1620tcgatcccat cgggccatgt ttattccttg tctgagccag
aaatggccgc cttacgggat 1680tttgtggccc ggaatgtcaa agacggcctg attaccccga
caattgcacc aaacggtgct 1740caggtgttgc aggtgaagcg gggctggaag ttgcaagtca
gctatgattg tcgtgcgcca 1800aacaacttca ctattcagaa ccaatatcca cgtctcagca
tccctaatct cgaggaccag 1860gcacatcttg caacatatac tgaatttgta cctcagattc
ctggctatca gacttatcct 1920acgtatgctg cctacccaac atacccggta ggtttcgcat
ggtacccagt aggccgggac 1980gggcagggcc gctctttata tgttcctgtc atgattacat
ggaacccgca ttggtaccgc 2040cagcctccgg tcccacagta cccacctcct caacctccac
cacctccgcc gcctcctcca 2100ccgccacctt cttactcgac atta
21245760DNAArtificial SequenceDescription of
Artificial Sequence Synthetic oligonucleotide 57atgcatcacc
atcaccatca cggctcaggg tctggtagcg aaaatctgta cttccagggg
605820PRTArtificial SequenceDescription of Artificial Sequence Synthetic
peptide 58Met His His His His His His Gly Ser Gly Ser Gly Ser Glu Asn
Leu1 5 10 15Tyr Phe Gln
Gly 20596PRTArtificial SequenceDescription of Artificial
Sequence Synthetic 6xHis tag 59His His His His His His1
5606PRTArtificial SequenceDescription of Artificial Sequence Synthetic
peptide 60Gly Ser Gly Ser Gly Ser1 5617PRTArtificial
SequenceDescription of Artificial Sequence Synthetic peptide 61Glu
Asn Leu Tyr Phe Gln Gly1 56226DNAArtificial
SequenceDescription of Artificial Sequence Synthetic primer
62aagctcattt cctggtatga caacga
266325DNAArtificial SequenceDescription of Artificial Sequence Synthetic
primer 63agggtctctc tcttcctctt gtgct
256425DNAArtificial SequenceDescription of Artificial Sequence
Synthetic primer 64gctcaacctg ggaactgcat ctgat
256525DNAArtificial SequenceDescription of Artificial
Sequence Synthetic primer 65taatcctgtt tgctccccac gcttt
256622DNAArtificial SequenceDescription of
Artificial Sequence Synthetic primer 66ggcccctcag ctccagtgat tc
226725DNAArtificial
SequenceDescription of Artificial Sequence Synthetic primer
67cctgttgtca ctctcctggc tctga
256824DNAArtificial SequenceDescription of Artificial Sequence Synthetic
primer 68gccaagacat aagaaacctc gcct
246924DNAArtificial SequenceDescription of Artificial Sequence
Synthetic primer 69gtgaatcaac atcctccctc cgtc
247025PRTArtificial SequenceDescription of Artificial
Sequence Synthetic PeptideMISC_FEATURE(1)..(25)This sequence may
encompass 1-5 "Glu Ala Ala Ala Lys" repeating units 70Glu Ala Ala
Ala Lys Glu Ala Ala Ala Lys Glu Ala Ala Ala Lys Glu1 5
10 15Ala Ala Ala Lys Glu Ala Ala Ala Lys
20 257125PRTArtificial SequenceDescription of
Artificial Sequence Synthetic PeptideMISC_FEATURE(1)..(25)This
sequence may encompass 1-5 "Glu Ala Ala Ala Arg" repeating units
71Glu Ala Ala Ala Arg Glu Ala Ala Ala Arg Glu Ala Ala Ala Arg Glu1
5 10 15Ala Ala Ala Arg Glu Ala
Ala Ala Arg 20 257250PRTArtificial
SequenceDescription of Artificial Sequence Synthetic
PolypeptideMISC_FEATURE(1)..(50)This sequence may encompass 1-10 "Gly Gly
Gly Gly Ser" repeating units 72Gly Gly Gly Gly Ser Gly Gly Gly Gly
Ser Gly Gly Gly Gly Ser Gly1 5 10
15Gly Gly Gly Ser Gly Gly Gly Gly Ser Gly Gly Gly Gly Ser Gly
Gly 20 25 30Gly Gly Ser Gly
Gly Gly Gly Ser Gly Gly Gly Gly Ser Gly Gly Gly 35
40 45Gly Ser 507340PRTArtificial SequenceDescription
of Artificial Sequence Synthetic
PolypeptideMISC_FEATURE(1)..(40)This sequence may encompass 1-10 "Gly Gly
Gly Ser" repeating units 73Gly Gly Gly Ser Gly Gly Gly Ser Gly Gly
Gly Ser Gly Gly Gly Ser1 5 10
15Gly Gly Gly Ser Gly Gly Gly Ser Gly Gly Gly Ser Gly Gly Gly Ser
20 25 30Gly Gly Gly Ser Gly Gly
Gly Ser 35 407418PRTArtificial
SequenceDescription of Artificial Sequence Synthetic Peptide 74Lys
Glu Ser Gly Ser Val Ser Ser Glu Gln Leu Ala Gln Phe Arg Ser1
5 10 15Leu Asp7514PRTArtificial
SequenceDescription of Artificial Sequence Synthetic peptide 75Glu
Gly Lys Ser Ser Gly Ser Gly Ser Glu Ser Lys Ser Thr1 5
107610PRTArtificial SequenceDescription of Artificial
Sequence Synthetic peptide 76Gly Gly Ala Ala Asn Leu Val Arg Gly
Gly1 5 107710PRTArtificial
SequenceDescription of Artificial Sequence Synthetic peptide 77Ser
Gly Arg Ile Gly Phe Leu Arg Thr Ala1 5
10785PRTArtificial SequenceDescription of Artificial Sequence Synthetic
peptide 78Ser Gly Arg Ser Ala1 5794PRTArtificial
SequenceDescription of Artificial Sequence Synthetic peptide 79Gly
Phe Leu Gly1804PRTArtificial SequenceDescription of Artificial Sequence
Synthetic peptide 80Ala Leu Ala Leu1815PRTArtificial
SequenceDescription of Artificial Sequence Synthetic
PeptideMOD_RES(3)..(3)S-ethylcysteine 81Pro Ile Cys Phe Phe1
5825PRTArtificial SequenceDescription of Artificial Sequence Synthetic
peptideMOD_RES(3)..(3)Ser or ThrMOD_RES(4)..(4)Leu or
IleMOD_RES(5)..(5)Ser or Thr 82Pro Arg Xaa Xaa Xaa1
5834PRTArtificial SequenceDescription of Artificial Sequence Synthetic
peptide 83Asp Glu Val Asp1846PRTArtificial SequenceDescription of
Artificial Sequence Synthetic peptide 84Gly Trp Glu His Asp Gly1
5858PRTArtificial SequenceDescription of Artificial Sequence
Synthetic peptide 85Arg Pro Leu Ala Leu Trp Arg Ser1 5
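As an illustrative consistency check on the listing above, the 60-nucleotide synthetic oligonucleotide of SEQ ID NO: 57 can be translated with the standard genetic code and compared with the 20-residue peptide of SEQ ID NO: 58. It can also be compared with the concatenation of Met, SEQ ID NO: 59 (6xHis tag), SEQ ID NO: 60 (Gly-Ser linker) and SEQ ID NO: 61 (Glu Asn Leu Tyr Phe Gln Gly). The sketch below is not part of the application as filed. The function name `translate` and the constant names are hypothetical, chosen only for illustration. It assumes that reading frame 1 and the standard codon table apply.

```python
# Minimal sketch (assumption: standard genetic code, reading frame 1).
# Sequences are copied from SEQ ID NOs: 57-61 of the listing; all names
# below are hypothetical helpers, not terms used in the application.

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Build the 64-entry codon table in the conventional TCAG order.
CODON_TABLE = dict(
    zip((a + b + c for a in BASES for b in BASES for c in BASES), AMINO)
)

SEQ_57_DNA = (
    "atgcatcacc" "atcaccatca" "cggctcaggg"
    "tctggtagcg" "aaaatctgta" "cttccagggg"
)
SEQ_59_HIS_TAG = "HHHHHH"    # His x6
SEQ_60_LINKER = "GSGSGS"     # Gly Ser Gly Ser Gly Ser
SEQ_61_SITE = "ENLYFQG"      # Glu Asn Leu Tyr Phe Gln Gly


def translate(dna: str) -> str:
    """Translate a DNA string codon by codon into one-letter amino acids."""
    dna = dna.upper()
    if len(dna) % 3 != 0:
        raise ValueError("sequence length is not a multiple of three")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


protein = translate(SEQ_57_DNA)
print(len(SEQ_57_DNA), len(protein), protein)
```

Under these assumptions the translation is 20 residues long and equals Met followed by SEQ ID NOs: 59, 60 and 61 in order, which agrees with the residues listed for SEQ ID NO: 58.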