Patent application number | Description | Published |
20080235201 | Consistent weighted sampling of multisets and distributions - Techniques are provided that identify near-duplicate items in large collections of items. A list of (value, frequency) pairs is received, and a sample (value, instance) is returned. The value is chosen from the values in the list, and the instance is a value less than the corresponding frequency, in such a way that the probability of selecting the same sample from two lists is equal to the similarity of the two lists. | 09-25-2008 |
20090132571 | EFFICIENT USE OF RANDOMNESS IN MIN-HASHING - Documents that are near-duplicates may be determined using techniques such as min-hashing. Randomness that is used in these techniques may be based on sequences of bits. The sequences of bits may be generated from a string of bits, with the sequences determined by parsing the string at each occurrence of a particular value, such as the value “1”. | 05-21-2009 |
20100070511 | REDUCING USE OF RANDOMNESS IN CONSISTENT UNIFORM HASHING - Documents that are near-duplicates may be determined using techniques involving consistent uniform hashing. A biased bit may be placed in the leading position of a sequence of bits that may be generated and subsequently used in comparison techniques to determine near-duplicate documents. Unbiased bits may be used in subsequent positions of the sequence of bits, after the biased bit, for use in comparison techniques. Samples may be used collectively, as opposed to individually, in the generation of biased bits. Sequences of bits may thus be produced not on a single sample basis, but for multiple samples, thereby amortizing the cost of generating randomness for the samples. Less than one bit of randomness per sample may be used. | 03-18-2010 |
20100262964 | Virtual Machine Packing Method Using Scarcity - A method for packing virtual machines onto host devices may calculate scarcity values for several different parameters. A host's scarcity for a parameter may be determined by multiplying the host's capacity for that parameter by the overall scarcity of that parameter. The sum of a host's scarcity for all the parameters determines the host's overall scarcity. The method attempts to populate the hosts having the highest scarcity with a group of virtual machines selected for compatibility with the host. In many cases, several different scenarios may be evaluated and an optimal scenario implemented. The method gives a high priority to those virtual machines that consume scarce resources, with the scarcity being a function of the available hardware and the virtual machines that may be placed on it. | 10-14-2010 |
20100281478 | MULTIPHASE VIRTUAL MACHINE HOST CAPACITY PLANNING - A virtual machine distribution system is described herein that uses a multiphase approach that provides a fast layout of virtual machines on physical computers followed by at least one verification phase that verifies that the layout is correct. During the fast layout phase, the system uses a dimension-aware vector bin-packing algorithm to determine an initial fit of virtual machines to physical hardware based on rescaled resource utilizations calculated against hardware models. During the verification phase, the system uses a virtualization model to check the recommended fit of virtual machine guests to physical hosts created during the fast layout phase to ensure that the distribution will not over-utilize any host given the overhead associated with virtualization. The system modifies the layout to eliminate any identified overutilization. Thus, the virtual machine distribution system provides the advantages of a fast, automated layout planning process with the robustness of slower, exhaustive processes. | 11-04-2010 |
20110067030 | FLOW BASED SCHEDULING - A job scheduler may schedule concurrent distributed jobs in a computer cluster by assigning tasks from the running jobs to compute nodes while balancing fairness with efficiency. Determining which tasks to assign to the compute nodes may be performed using a network flow graph. The weights on at least some of the edges of the graph encode data locality, and the capacities provide constraints that ensure fairness. A min-cost flow technique may be used to perform an assignment of the tasks represented by the network flow graph. Thus, online task scheduling with locality may be mapped onto a network flow graph, which in turn may be used to determine a scheduling assignment using min-cost flow techniques. The costs may encode data locality, fairness, and starvation-freedom. | 03-17-2011 |
20110208763 | DIFFERENTIALLY PRIVATE DATA RELEASE - A query log includes a list of queries and a count for each query representing the number of times that the query was received by a search engine. In order to provide differential privacy protection to the queries, noise is generated and added to each count, and queries that have counts that fall below a threshold are removed from the query log. A distribution associated with a function used to generate the noise is referenced to determine a distribution of the number of times that a hypothetical query having a zero count would have its count exceed the threshold after the addition of noise. A number of random queries equal to a sample from that distribution is added to the query log, each with a count that is greater than the threshold count. | 08-25-2011 |
20130275977 | VIRTUAL MACHINE PACKING METHOD USING SCARCITY - A method for packing virtual machines onto host devices may calculate scarcity values for several different parameters. A host's scarcity for a parameter may be determined by multiplying the host's capacity for that parameter by the overall scarcity of that parameter. The sum of a host's scarcity for all the parameters determines the host's overall scarcity. The method attempts to populate the hosts having the highest scarcity with a group of virtual machines selected for compatibility with the host. In many cases, several different scenarios may be evaluated and an optimal scenario implemented. The method gives a high priority to those virtual machines that consume scarce resources, with the scarcity being a function of the available hardware and the virtual machines that may be placed on it. | 10-17-2013 |
20140283091 | DIFFERENTIALLY PRIVATE LINEAR QUERIES ON HISTOGRAMS - The privacy of linear queries on histograms is protected. A database containing private data is queried. Base decomposition is performed to recursively compute an orthonormal basis for the database space. Using correlated (or Gaussian) noise and/or least squares estimation, an answer having differential privacy is generated and provided in response to the query. In some implementations, the differential privacy is ε-differential privacy (pure differential privacy) or is (ε,δ)-differential privacy (i.e., approximate differential privacy). In some implementations, the data in the database may be dense. Such implementations may use correlated noise without using least squares estimation. In other implementations, the data in the database may be sparse. Such implementations may use least squares estimation with or without using correlated noise. | 09-18-2014 |
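
The scarcity calculation described in the two virtual machine packing entries (20100262964 and 20130275977) can be illustrated with a small sketch. It is not the patented method. The abstracts do not say how the overall scarcity of a parameter is computed, so the sketch assumes it is total virtual machine demand divided by total host capacity. The function names, the dictionary layout, and the sample figures are all hypothetical. Only the two steps stated in the abstracts are taken from the source: a host's scarcity for a parameter is its capacity multiplied by that parameter's overall scarcity, and a host's overall scarcity is the sum over all parameters.

```python
from typing import Dict, List

Resources = Dict[str, float]  # parameter name -> amount (hypothetical layout)


def parameter_scarcity(hosts: Dict[str, Resources],
                       vms: Dict[str, Resources]) -> Dict[str, float]:
    """Overall scarcity per parameter.

    ASSUMPTION: scarcity = total VM demand / total host capacity.
    The abstracts do not define this quantity.
    """
    params = set()
    for capacity in hosts.values():
        params.update(capacity)
    scarcity = {}
    for p in sorted(params):
        supply = sum(h.get(p, 0.0) for h in hosts.values())
        demand = sum(v.get(p, 0.0) for v in vms.values())
        scarcity[p] = demand / supply if supply else float("inf")
    return scarcity


def host_scarcity(hosts: Dict[str, Resources],
                  scarcity: Dict[str, float]) -> Dict[str, float]:
    """Per host: sum over parameters of capacity * overall parameter scarcity."""
    return {
        name: sum(amount * scarcity[p] for p, amount in capacity.items())
        for name, capacity in hosts.items()
    }


def rank_hosts(hosts: Dict[str, Resources],
               vms: Dict[str, Resources]) -> List[str]:
    """Host names ordered from highest to lowest overall scarcity."""
    totals = host_scarcity(hosts, parameter_scarcity(hosts, vms))
    return sorted(totals, key=totals.get, reverse=True)


if __name__ == "__main__":
    # Hypothetical sample data for illustration only.
    hosts = {"h1": {"cpu": 8, "mem": 16}, "h2": {"cpu": 16, "mem": 16}}
    vms = {"a": {"cpu": 4, "mem": 4}, "b": {"cpu": 8, "mem": 4}}
    print(parameter_scarcity(hosts, vms))
    print(rank_hosts(hosts, vms))
```

With this sample data, CPU is the scarcer parameter under the assumed definition, so the host with more CPU capacity ranks first and would be the first candidate for placement. Selecting a compatible group of virtual machines for that host and evaluating alternative scenarios are left out of the sketch.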