Patent application number | Description | Published |
20110119370 | MEASURING NETWORK PERFORMANCE FOR CLOUD SERVICES - Described is a technology by which a content server downloads an active content measuring tool object in response to a client request for a page. When loaded, the measuring tool object makes network measurements, including by direct socket access, and returns measurement results. As part of its operations, the measuring tool object may request measurement assignments from a central controller, and/or return those results to the central controller. Measurement assignments may be directed towards determining a round trip time/latency, measuring throughput, packet loss rate, detecting in-flight modification of content and/or detecting the presence of a middle box, including the presence of a caching proxy server middle box. The measurement results may be used to evaluate hypothetical deployment of a number of servers and/or geographic locations for those servers. | 05-19-2011 |
20110134909 | DATA COMMUNICATION WITH COMPENSATION FOR PACKET LOSS - Described is a technology by which a relay is coupled (e.g., by a wire) to a network and (e.g., by a wireless link) to an endpoint. Incoming data packets directed towards the endpoint are processed by the relay according to an error correction scheme, such as one that replicates packets. The reprocessed packets, which in general are more robust against packet loss, are then sent to the endpoint. For outgoing data packets received from the endpoint, the relay reprocesses the outgoing packets based upon the error correction scheme, such as to remove redundant packets before transmitting them to the network over the wire. Also described are various error correction schemes, and various types of computing devices that may be used as relays. The relay may be built into the network infrastructure, and/or a directory service may be employed to automatically find a suitable relay node for an endpoint device. | 06-09-2011 |
20110227790 | CUCKOO HASHING TO STORE BEACON REFERENCE DATA - Storing and retrieving beacon reference data in a truncated cuckoo hash table. Checksums of beacon identifiers associated with beacons are used to retrieve beacon reference data describing locations of the beacons in a hash table. The data is stored in one or more hash tables by cuckoo hashing to eliminate aliasing. The hash tables are provided to devices such as mobile devices. The devices retrieve the beacon reference data from the tables using beacon identifiers of observed beacons. Location information for the devices is inferred using the retrieved beacon reference data. The cuckoo hash tables consume less memory storage space and obfuscate the beacon reference data. | 09-22-2011 |
20110270964 | USING DNS REFLECTION TO MEASURE NETWORK PERFORMANCE - A top level domain name system (DNS) server receives a DNS query from a local DNS resolver, the DNS query requesting a network address corresponding to a domain name. The top level DNS server reflects the local DNS resolver to a reflector DNS server. The reflector DNS server reflects the local DNS resolver to a collector DNS server, which in turn returns the network address to the local DNS resolver. The reflector DNS server and collector DNS server are both in the same data center, and one or more network performance measurements for communications between the local DNS resolver and the data center are determined based on the communications between the local DNS resolver and both the reflector DNS server and the collector DNS server. | 11-03-2011 |
20110276744 | FLASH MEMORY CACHE INCLUDING FOR USE WITH PERSISTENT KEY-VALUE STORE - Described is using flash memory, RAM-based data structures and mechanisms to provide a flash store for caching data items (e.g., key-value pairs) in flash pages. A RAM-based index maps data items to flash pages, and a RAM-based write buffer maintains data items to be written to the flash store, e.g., when a full page can be written. A recycle mechanism makes used pages in the flash store available by destaging a data item to a hard disk or reinserting it into the write buffer, based on its access pattern. The flash store may be used in a data deduplication system, in which the data items comprise chunk-identifier, metadata pairs, in which each chunk-identifier corresponds to a hash of a chunk of data that indicates. The RAM and flash are accessed with the chunk-identifier (e.g., as a key) to determine whether a chunk is a new chunk or a duplicate. | 11-10-2011 |
20110276780 | Fast and Low-RAM-Footprint Indexing for Data Deduplication - The subject disclosure is directed towards a data deduplication technology in which a hash index service's index maintains a hash index in a secondary storage device such as a hard drive, along with a compact index table and look-ahead cache in RAM that operate to reduce the I/O to access the secondary storage device during deduplication operations. Also described is a session cache for maintaining data during a deduplication session, and encoding of a read-only compact index table for efficiency. | 11-10-2011 |
20110276781 | Fast and Low-RAM-Footprint Indexing for Data Deduplication - The subject disclosure is directed towards a data deduplication technology in which a hash index service's index maintains a hash index in a secondary storage device such as a hard drive, along with a compact index table and look-ahead cache in RAM that operate to reduce the I/O to access the secondary storage device during deduplication operations. Also described is a session cache for maintaining data during a deduplication session, and encoding of a read-only compact index table for efficiency. | 11-10-2011 |
20110299526 | MULTIPARTY REAL TIME CONTENT DELIVERY - Described is a distributed peer-assisted multicast content delivery system (e.g., a multiparty conferencing application) that uses an adaptive link rate control protocol to discover and adapt to an arbitrary topology quickly and converge to efficient link rate allocations allowed by an underlying network. Link rates are regularly obtained and used to determine trees for sending packets to other nodes. Network coding is used to implement data multicast so that mixtures (i.e., linear combinations) of the packets are transmitted in the network. The redundant packets may be differentiated from non-redundant (“innovative”) packets such that network conditions may be measured by link innovation and/or session innovation. | 12-08-2011 |
20120102298 | Low RAM Space, High-Throughput Persistent Key-Value Store using Secondary Memory - Described is using flash memory (or other secondary storage), RAM-based data structures and mechanisms to access key-value pairs stored in the flash memory using only a low RAM space footprint. A mapping (e.g. hash) function maps key-value pairs to a slot in a RAM-based index. The slot includes a pointer that points to a bucket of records on flash memory that each had keys that mapped to the slot. The bucket of records is arranged as a linear-chained linked list, e.g., with pointers from the most-recently written record to the earliest written record. Also described are compacting non-contiguous records of a bucket onto a single flash page, and garbage collection. Still further described is load balancing to reduce variation in bucket sizes, using a bloom filter per slot to avoid unnecessary searching, and splitting a slot into sub-slots. | 04-26-2012 |
20120166401 | Using Index Partitioning and Reconciliation for Data Deduplication - The subject disclosure is directed towards a data deduplication technology in which a hash index service's index is partitioned into subspace indexes, with less than the entire hash index service's index cached to save memory. The subspace index is accessed to determine whether a data chunk already exists or needs to be indexed and stored. The index may be divided into subspaces based on criteria associated with the data to index, such as file type, data type, time of last usage, and so on. Also described is subspace reconciliation, in which duplicate entries in subspaces are detected so as to remove entries and chunks from the deduplication system. Subspace reconciliation may be performed at off-peak time, when more system resources are available, and may be interrupted if resources are needed. Subspaces to reconcile may be based on similarity, including via similarity of signatures that each compactly represents the subspace's hashes. | 06-28-2012 |
20120166448 | Adaptive Index for Data Deduplication - The subject disclosure is directed towards a data deduplication technology in which a hash index service's index and/or indexing operations are adaptable to balance deduplication performance savings, throughput and resource consumption. The indexing service may employ hierarchical chunking using different levels of granularity corresponding to chunk size, a sampled compact index table that contains compact signatures for less than all of the hash index's (or subspace's) hash values, and/or selective subspace indexing based on similarity of a subspace's data to another subspace's data and/or to incoming data chunks. | 06-28-2012 |
20120271909 | GLOBAL TRAFFIC MANAGEMENT USING MODIFIED HOSTNAME - A particular method includes receiving a request from a client at a server and sending a global traffic management identifier in response to the request from the client. The global traffic management identifier is determined based on an attribute of the client. In response to the client requesting access to a service based on a modified hostname of the service, a data center associated with the service is identified based on the modified hostname of the service. The modified hostname identifies the global traffic management identifier, and the identified data center is useable by the client to access the service. | 10-25-2012 |
20130054544 | Content Aware Chunking for Achieving an Improved Chunk Size Distribution - The subject disclosure is directed towards partitioning a file into chunks that satisfy a chunk size restriction, such as maximum and minimum chunk sizes, using a sliding window. For file positions within the chunk size restriction, a signature representative of a window fingerprint is compared with a target pattern, with a chunk boundary candidate identified if matched. Other signatures and patterns are then checked to determine a highest ranking signature (corresponding to a lowest numbered Rule) to associate with that chunk boundary candidate, or set an actual boundary if the highest ranked signature is matched. If the maximum chunk size is reached without matching the highest ranked signature, the chunking mechanism regresses to set the boundary based on the candidate with the next highest ranked signature (if no candidates, the boundary is set at the maximum). Also described is setting chunk boundaries based upon pattern detection (e.g., runs of zeros). | 02-28-2013 |
20130054782 | DETERMINATION OF UNAUTHORIZED CONTENT SOURCES - A plurality of network addresses from a distributed client is obtained, at least a first portion of the obtained network addresses including resolved network address responses to distributed client requests for resolved network addresses corresponding to one or more network location indicators associated with a first web service. Test content is obtained, based on one or more of the network addresses included in the first portion. It is determined whether the obtained test content includes unauthorized content. | 02-28-2013 |
20130114421 | ADAPTIVE BANDWIDTH ESTIMATION - It can be determined whether relative one way delay for data packets in a data stream exceeds a delay threshold. If so, then a delay congestion signal indicating that the relative one way delay exceeds the delay threshold can be generated. The delay congestion signal can be used in calculating an adaptive bandwidth estimate for the data stream. A packet loss rate congestion signal may also be used in calculating the bandwidth estimate. It can be determined whether a data stream of data packets is in a contention state. If the data stream is in the contention state, then an adaptive bandwidth estimate can be calculated for the data stream using a first bandwidth estimation technique. If the data stream is not in the contention state, then the bandwidth estimate for the data stream can be calculated using a second bandwidth estimation technique. | 05-09-2013 |
20130263151 | Consistent Hashing Table for Workload Distribution - Described is a technology by which a consistent hashing table of bins maintains values representing nodes of a distributed system. An assignment stage uses a consistent hashing function and a selection algorithm to assign values that represent the nodes to the bins. In an independent mapping stage, a mapping mechanism deterministically maps an object identifier/key to one of the bins as a mapped-to bin. | 10-03-2013 |
20130282964 | FLASH MEMORY CACHE INCLUDING FOR USE WITH PERSISTENT KEY-VALUE STORE - Described is using flash memory, RAM-based data structures and mechanisms to provide a flash store for caching data items (e.g., key-value pairs) in flash pages. A RAM-based index maps data items to flash pages, and a RAM-based write buffer maintains data items to be written to the flash store, e.g., when a full page can be written. A recycle mechanism makes used pages in the flash store available by destaging a data item to a hard disk or reinserting it into the write buffer, based on its access pattern. The flash store may be used in a data deduplication system, in which the data items comprise chunk-identifier, metadata pairs, in which each chunk-identifier corresponds to a hash of a chunk of data that indicates. The RAM and flash are accessed with the chunk-identifier (e.g., as a key) to determine whether a chunk is a new chunk or a duplicate. | 10-24-2013 |
20130282965 | FLASH MEMORY CACHE INCLUDING FOR USE WITH PERSISTENT KEY-VALUE STORE - Described is using flash memory, RAM-based data structures and mechanisms to provide a flash store for caching data items (e.g., key-value pairs) in flash pages. A RAM-based index maps data items to flash pages, and a RAM-based write buffer maintains data items to be written to the flash store, e.g., when a full page can be written. A recycle mechanism makes used pages in the flash store available by destaging a data item to a hard disk or reinserting it into the write buffer, based on its access pattern. The flash store may be used in a data deduplication system, in which the data items comprise chunk-identifier, metadata pairs, in which each chunk-identifier corresponds to a hash of a chunk of data that indicates. The RAM and flash are accessed with the chunk-identifier (e.g., as a key) to determine whether a chunk is a new chunk or a duplicate. | 10-24-2013 |
20130346672 | Multi-Tiered Cache with Storage Medium Awareness - The subject disclosure is directed towards a multi-tiered cache having cache tiers with different access properties. Objects are written to a selected tier of the cache based upon object-related properties and/or cache-related properties. In one aspect, objects are stored in an active log among a plurality of logs. The active log is sealed upon reaching a target size, with a new active log opened. Garbage collecting is performed on a sealed log, such as the sealed log with the most garbage therein. | 12-26-2013 |
20140189348 | Integrated Data Deduplication and Encryption - The subject disclosure is directed towards encryption and deduplication integration between computing devices and a network resource. Files are partitioned into data blocks and deduplicated via removal of duplicate data blocks. Using multiple cryptographic keys, each data block is encrypted and stored at the network resource but can only be decrypted by an authorized user, such as a domain entity having an appropriate deduplication domain-based cryptographic key. Another cryptographic key, referred to as a content-derived cryptographic key, ensures that duplicate data blocks encrypt to substantially equivalent encrypted data. | 07-03-2014 |
20140244604 | PREDICTING DATA COMPRESSIBILITY USING DATA ENTROPY ESTIMATION - The subject disclosure is directed towards predicting compressibility of a data block, and using the predicted compressibility in determining whether a data block, if compressed, will be sufficiently compressible to justify compression. In one aspect, data of the data block is processed to obtain an entropy estimate of the data block, e.g., based upon distinct value estimation. The compressibility prediction may be used in conjunction with a chunking mechanism of a data deduplication system. | 08-28-2014 |
20140280664 | CACHING CONTENT ADDRESSABLE DATA CHUNKS FOR STORAGE VIRTUALIZATION - The subject disclosure is directed towards using primary data deduplication concepts for more efficient access of data via content addressable caches. Chunks of data, such as deduplicated data chunks, are maintained in a fast access client-side cache, such as containing chunks based upon access patterns. The chunked content is content addressable via a hash or other unique identifier of that content in the system. When a chunk is needed, the client-side cache (or caches) is checked for the chunk before going to a file server for the chunk. The file server may likewise maintain content addressable (chunk) caches. Also described are cache maintenance, management and organization, including pre-populating caches with chunks, as well as using RAM and/or solid-state storage device caches. | 09-18-2014 |
20140380125 | ERASURE CODING ACROSS MULTIPLE ZONES - In various embodiments, methods and systems for erasure coding data across multiple storage zones are provided. This may be accomplished by dividing a data chunk into a plurality of sub-fragments. Each of the plurality of sub-fragments is associated with a zone. Zones comprise buildings, data centers, and geographic regions providing a storage service. A plurality of reconstruction parities is computed. Each of the plurality of reconstruction parities is computed using at least one sub-fragment from the plurality of sub-fragments. The plurality of reconstruction parities comprises at least one cross-zone parity. The at least one cross-zone parity is assigned to a parity zone. The cross-zone parity provides cross-zone reconstruction of a portion of the data chunk. | 12-25-2014 |
20140380126 | ERASURE CODING ACROSS MULTIPLE ZONES AND SUB-ZONES - In various embodiments, methods and systems for erasure coding data across multiple storage zones are provided. This may be accomplished by dividing a data chunk into a plurality of sub-fragments. Each of the plurality of sub-fragments is associated with a zone. Zones comprise buildings, data centers, and geographic regions providing a storage service. A plurality of reconstruction parities is computed. Each of the plurality of reconstruction parities is computed using at least one sub-fragment from the plurality of sub-fragments. The plurality of reconstruction parities comprises at least one cross-zone parity. The at least one cross-zone parity is assigned to a parity zone. The cross-zone parity provides cross-zone reconstruction of a portion of the data chunk. | 12-25-2014 |
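
The cross-zone parity idea in the two erasure coding entries above can be illustrated with a minimal sketch. It assumes the simplest possible code, a single XOR parity over one fragment per data zone, which is a stand-in chosen for brevity and not the coding scheme claimed in the applications. All function names here (`split_into_fragments`, `cross_zone_parity`, `reconstruct`) are hypothetical.

```python
from functools import reduce
from typing import List, Optional


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def split_into_fragments(chunk: bytes, zones: int) -> List[bytes]:
    """Divide a chunk into one equal-size fragment per data zone.

    The tail is zero-padded so every fragment has the same length.
    """
    size = -(-len(chunk) // zones)  # ceiling division
    padded = chunk.ljust(size * zones, b"\0")
    return [padded[i * size:(i + 1) * size] for i in range(zones)]


def cross_zone_parity(fragments: List[bytes]) -> bytes:
    """Compute one parity over all zones' fragments (assumed: plain XOR)."""
    return reduce(xor_bytes, fragments)


def reconstruct(fragments: List[Optional[bytes]], parity: bytes) -> bytes:
    """Rebuild the single missing fragment (marked None) from the rest."""
    survivors = [f for f in fragments if f is not None]
    if len(survivors) != len(fragments) - 1:
        raise ValueError("XOR parity can repair exactly one missing fragment")
    # XOR of the parity with every surviving fragment yields the lost one.
    return reduce(xor_bytes, survivors, parity)


if __name__ == "__main__":
    chunk = b"cross-zone erasure coding demo"
    fragments = split_into_fragments(chunk, zones=2)
    parity = cross_zone_parity(fragments)  # stored in a third, parity zone

    # Simulate losing the first data zone entirely.
    damaged = [None, fragments[1]]
    recovered = reconstruct(damaged, parity)
    print(recovered == fragments[0])
```

With two data zones and one parity zone, any single zone can fail and the chunk remains recoverable. A scheme that tolerates more failures, or that adds parities local to a zone, would need a stronger code than XOR.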