Patent application number | Description | Published |
20080243481 | Large Language Models in Machine Translation - Systems, methods, and computer program products for machine translation are provided. In some implementations a system is provided. The system includes a language model including a collection of n-grams from a corpus, each n-gram having a corresponding relative frequency in the corpus and an order n corresponding to a number of tokens in the n-gram, each n-gram corresponding to a backoff n-gram having an order of n-1 and a collection of backoff scores, each backoff score associated with an n-gram, the backoff score determined as a function of a backoff factor and a relative frequency of a corresponding backoff n-gram in the corpus. | 10-02-2008 |
20080262828 | Encoding and Adaptive, Scalable Accessing of Distributed Models - Systems, methods, and apparatus for accessing distributed models in automated machine processing, including using large language models in machine translation, speech recognition and other applications. | 10-23-2008 |
20100005080 | SYSTEM AND METHOD FOR ANALYZING DATA RECORDS - A method and system for analyzing data records includes allocating groups of records to respective processes of a first plurality of processes executing in parallel. In each respective process of the first plurality of processes, for each record in the group of records allocated to the respective process, a query is applied to the record so as to produce zero or more values. Zero or more emit operators are applied to each of the zero or more produced values so as to add corresponding information to an intermediate data structure. Information from a plurality of the intermediate data structures is aggregated to produce output data. | 01-07-2010 |
20100114965 | SYSTEM AND METHOD FOR IMPROMPTU SHARED COMMUNICATION SPACES - Communications between entities who may share common interests. For entities determined to be sharing common interests (e.g., searching using the same terms or topics, browsing a page, a site or a group of topically related sites), options for communication among the entities are provided. For example, a chat room may be dynamically created for persons who are currently searching or browsing the same or related information. As another example, a “homepage” may be created for each query and contain various types of information related to the query. A permission module controls which entities may participate, what types of information (and from what sources) an entity can (or desires to) receive, and what types of information the entity may (or desires to) share. | 05-06-2010 |
20110022605 | DOCUMENT SCORING BASED ON LINK-BASED CRITERIA - A method may include receiving a document and an initial score for the document; determining that there has been a decrease in a rate or quantity of new links that point to the document over time; classifying the document as stale in response to the determining; decreasing the initial score for the document, resulting in an updated score; and ranking the document with regard to at least one other document based, at least in part, on the score. | 01-27-2011 |
20110029542 | DOCUMENT SCORING BASED ON DOCUMENT INCEPTION DATE - A system may determine a document inception date associated with a document, generate a score for the document based, at least in part, on the document inception date, and rank the document with regard to at least one other document based, at least in part, on the score. | 02-03-2011 |
20110153577 | Query Processing System and Method for Use with Tokenspace Repository - A search engine server system receives from a client system a search query and identifies a set of documents in accordance with the search query. A content snippet corresponding to content in a respective document of the identified set of documents is generated, the content snippet associated with at least one query term of the one or more query terms in the search query. A response to the search query is returned to the client system, the response including information identifying at least the respective document and including the content snippet. Generating the content snippet includes performing a first decompression operation on first token identifiers, from a compressed document repository, to provide a set of second token identifiers, and performing a second decompression operation on the set of second token identifiers to recover uncompressed content comprising a portion of the respective document. | 06-23-2011 |
20110179118 | Shared Communication Space Invitations - A computer-implemented method of providing invitations to a shared communication space, performed by a server system, includes providing the shared communication space, which includes content associated with a set of characteristics, and identifying a user, in accordance with a set of characteristics associated with the user and the set of characteristics associated with the content in the shared communication space. The method further includes sending to the identified user an invitation to participate in the shared communication space, and upon acceptance of the invitation by the user, enabling access by the user to the shared communication space and enabling the user to exchange information with other participants in the shared communication space via the shared communication space. | 07-21-2011 |
20110258185 | DOCUMENT SCORING BASED ON DOCUMENT CONTENT UPDATE - A system may determine a measure of how a content of a document changes over time, generate a score for the document based, at least in part, on the measure of how the content of the document changes over time, and rank the document with regard to at least one other document based, at least in part, on the score. | 10-20-2011 |
20110264671 | DOCUMENT SCORING BASED ON DOCUMENT CONTENT UPDATE - A system may determine a measure of how a content of a document changes over time, generate a score for the document based, at least in part, on the measure of how the content of the document changes over time, and rank the document with regard to at least one other document based, at least in part, on the score. | 10-27-2011 |
20120005199 | DOCUMENT SCORING BASED ON DOCUMENT CONTENT UPDATE - A system may determine a measure of how a content of a document changes over time, generate a score for the document based, at least in part, on the measure of how the content of the document changes over time, and rank the document with regard to at least one other document based, at least in part, on the score. | 01-05-2012 |
20120016870 | DOCUMENT SCORING BASED ON QUERY ANALYSIS - A system may determine an extent to which a document is selected when the document is included in a set of search results, generate a score for the document based, at least in part, on the extent to which the document is selected when the document is included in a set of search results; and rank the document with regard to at least one other document based, at least in part, on the score. | 01-19-2012 |
20120016871 | DOCUMENT SCORING BASED ON QUERY ANALYSIS - A system may determine an extent to which a document is selected when the document is included in a set of search results, generate a score for the document based, at least in part, on the extent to which the document is selected when the document is included in a set of search results; and rank the document with regard to at least one other document based, at least in part, on the score. | 01-19-2012 |
20120016874 | DOCUMENT SCORING BASED ON QUERY ANALYSIS - A system may determine an extent to which a document is selected when the document is included in a set of search results, generate a score for the document based, at least in part, on the extent to which the document is selected when the document is included in a set of search results; and rank the document with regard to at least one other document based, at least in part, on the score. | 01-19-2012 |
20120016888 | DOCUMENT SCORING BASED ON QUERY ANALYSIS - A system may determine an extent to which a document is selected when the document is included in a set of search results, generate a score for the document based, at least in part, on the extent to which the document is selected when the document is included in a set of search results; and rank the document with regard to at least one other document based, at least in part, on the score. | 01-19-2012 |
20120016889 | DOCUMENT SCORING BASED ON QUERY ANALYSIS - A system may determine an extent to which a document is selected when the document is included in a set of search results, generate a score for the document based, at least in part, on the extent to which the document is selected when the document is included in a set of search results; and rank the document with regard to at least one other document based, at least in part, on the score. | 01-19-2012 |
20120023098 | DOCUMENT SCORING BASED ON QUERY ANALYSIS - A system may determine an extent to which a document is selected when the document is included in a set of search results, generate a score for the document based, at least in part, on the extent to which the document is selected when the document is included in a set of search results; and rank the document with regard to at least one other document based, at least in part, on the score. | 01-26-2012 |
20120209838 | DOCUMENT SCORING BASED ON QUERY ANALYSIS - A system may determine an extent to which a document is selected when the document is included in a set of search results, generate a score for the document based, at least in part, on the extent to which the document is selected when the document is included in a set of search results; and rank the document with regard to at least one other document based, at least in part, on the score. | 08-16-2012 |
20120215787 | System and Method for Analyzing Data Records - A method and system for analyzing data records includes allocating groups of records to respective processes of a first plurality of processes executing in parallel. In each respective process of the first plurality of processes, for each record in the group of records allocated to the respective process, a query is applied to the record so as to produce zero or more values. Zero or more emit operators are applied to each of the zero or more produced values so as to add corresponding information to an intermediate data structure. Information from a plurality of the intermediate data structures is aggregated to produce output data. | 08-23-2012 |
20130046530 | ENCODING AND ADAPTIVE, SCALABLE ACCESSING OF DISTRIBUTED MODELS - Systems, methods, and apparatus for accessing distributed models in automated machine processing, including using large language models in machine translation, speech recognition and other applications. | 02-21-2013 |
20130212076 | Generating Content Snippets Using a Tokenspace Repository - A search engine server system receives from a client system a search query and identifies a set of documents in accordance with the search query. A content snippet corresponding to content in a respective document of the identified set of documents is generated, the content snippet associated with at least one query term of the one or more query terms in the search query. A response to the search query is returned to the client system, the response including information identifying at least the respective document and including the content snippet. Generating the content snippet includes performing a first decompression operation on first token identifiers, from a compressed document repository, to provide a set of second token identifiers, and performing a second decompression operation on the set of second token identifiers to recover uncompressed content comprising a portion of the respective document. | 08-15-2013 |
20130346059 | LARGE LANGUAGE MODELS IN MACHINE TRANSLATION - Systems, methods, and computer program products for machine translation are provided. In some implementations a system is provided. The system includes a language model including a collection of n-grams from a corpus, each n-gram having a corresponding relative frequency in the corpus and an order n corresponding to a number of tokens in the n-gram, each n-gram corresponding to a backoff n-gram having an order of n−1 and a collection of backoff scores, each backoff score associated with an n-gram, the backoff score determined as a function of a backoff factor and a relative frequency of a corresponding backoff n-gram in the corpus. | 12-26-2013 |
20140096138 | System and Method For Large-Scale Data Processing Using an Application-Independent Framework - A large-scale data processing system and method for processing data in a distributed and parallel processing environment is disclosed. The system comprises a set of interconnected computing systems, each having one or more processors and memory. The set of interconnected computing systems include: a set of application-independent map modules for reading portions of input files containing data, and for producing intermediate data values by applying at least one user-specified, application-specific map operation to the data; a set of intermediate data structures distributed among a plurality of the interconnected computing systems for storing the intermediate data values; and a set of application-independent reduce modules, distinct from the plurality of application-independent map modules, for producing final output data by applying at least one user-specified, application-specific reduce operation to the intermediate data values. | 04-03-2014 |
20140257787 | ENCODING AND ADAPTIVE, SCALABLE ACCESSING OF DISTRIBUTED MODELS - Systems, methods, and apparatus for accessing distributed models in automated machine processing, including using large language models in machine translation, speech recognition and other applications. | 09-11-2014 |
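The backoff scoring summarized in the "Large Language Models in Machine Translation" entries above (20080243481 and 20130346059) can be illustrated with a minimal sketch. It assumes a scheme in which an unseen n-gram falls back to its order n-1 suffix and the score is multiplied by a fixed backoff factor at each step. The function names, the factor value 0.4, and the toy corpus are illustrative assumptions, not details taken from the applications.

```python
from collections import Counter


def count_ngrams(tokens, max_order):
    """Count every n-gram of order 1..max_order in a token list."""
    counts = Counter()
    for n in range(1, max_order + 1):
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts


def backoff_score(ngram, counts, total_tokens, backoff_factor=0.4):
    """Score an n-gram by relative frequency, backing off when unseen.

    backoff_factor=0.4 is an assumed value chosen for illustration.
    """
    ngram = tuple(ngram)
    penalty = 1.0
    while len(ngram) > 1:
        if counts[ngram] > 0:
            # Relative frequency of the n-gram given its (n-1)-token context.
            return penalty * counts[ngram] / counts[ngram[:-1]]
        # Unseen: drop the oldest token and apply the backoff factor.
        ngram = ngram[1:]
        penalty *= backoff_factor
    # Unigram case: relative frequency in the whole corpus.
    return penalty * counts[ngram] / total_tokens


tokens = "the cat sat on the mat".split()
counts = count_ngrams(tokens, max_order=3)
seen = backoff_score(("the", "cat"), counts, len(tokens))
unseen = backoff_score(("sat", "the", "cat"), counts, len(tokens))
```

Here `seen` is computed directly from the bigram counts, while `unseen` falls back from the unobserved trigram to the bigram ("the", "cat") and is scaled by the backoff factor once.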
Patent application number | Description | Published |
20110040733 | SYSTEMS AND METHODS FOR GENERATING STATISTICS FROM SEARCH ENGINE QUERY LOGS - A computer-implemented method includes calculating first statistics about a user-identified event within a first subset of a database of events; selecting a second subset of the database of events based on said first statistics; calculating second statistics about the user-identified event within the second subset of the database of events; merging the first and second statistics as statistics of the user-identified event within the entire database of events; and generating a result including at least a portion of the merged statistics of the user-identified event. | 02-17-2011 |
20110119139 | Identifying related information given content and/or presenting related information in association with content-related advertisements - The usefulness of content (target content), such as advertisements, may be increased by determining additional content and providing such additional content in association with the content. The target content may be text, a Web page, a URL, a search query, etc. The additional content might be related suggested queries (e.g. “Try a search for ______”), news articles (or excerpts or summaries thereof), reviews (or excerpts or summaries thereof), advertisements, user group messages, etc. | 05-19-2011 |
20110179023 | Methods and Apparatus for Employing Usage Statistics in Document Retrieval - Methods and apparatus consistent with the invention provide improved organization of documents responsive to a search query. In one embodiment, a search query is received and a list of responsive documents is identified. The responsive documents are organized based in whole or in part on usage statistics. | 07-21-2011 |
20120023073 | Efficient Indexing of Documents with Similar Content - A set of documents may be stored and indexed as a compressed sequence of tokens. A set of documents is grouped into clusters. Sequences of tokens representing the clusters of documents are encoded to elide some repeating instances of tokens. A compressed sequence of tokens is generated from the compressed cluster sequences of tokens. Queries on the compressed sequence are performed by identifying cluster sequences within the compressed sequence that are likely to have documents that satisfy the query and then identifying, within these identified clusters, the documents that actually satisfy the query. | 01-26-2012 |
20120209726 | IDENTIFYING RELATED INFORMATION GIVEN CONTENT AND/OR PRESENTING RELATED INFORMATION IN ASSOCIATION WITH CONTENT-RELATED ADVERTISEMENTS - The usefulness of content (target content), such as advertisements, may be increased by determining additional content and providing such additional content in association with the content. The target content may be text, a Web page, a URL, a search query, etc. The additional content might be related suggested queries (e.g. “Try a search for ______”), news articles (or excerpts or summaries thereof), reviews (or excerpts or summaries thereof), advertisements, user group messages, etc. | 08-16-2012 |
20120215765 | Systems and Methods for Generating Statistics from Search Engine Query Logs - A computer-implemented method includes calculating first statistics about a user-identified event within a first subset of a database of events; selecting a second subset of the database of events based on said first statistics; calculating second statistics about the user-identified event within the second subset of the database of events; merging the first and second statistics as statistics of the user-identified event within the entire database of events; and generating a result including at least a portion of the merged statistics of the user-identified event. | 08-23-2012 |
20120226705 | METHODS AND APPARATUS FOR EMPLOYING USAGE STATISTICS IN DOCUMENT RETRIEVAL - Methods and apparatus consistent with the invention provide improved organization of documents responsive to a search query. In one embodiment, a search query is received and a list of responsive documents is identified. The responsive documents are organized based in whole or in part on usage statistics. | 09-06-2012 |
20120303622 | Efficient Indexing of Documents with Similar Content - A computer system comprising one or more processors and memory groups a set of documents into a plurality of clusters. Each cluster includes one or more documents of the set of documents and a respective cluster of documents of the plurality of clusters includes respective cluster data corresponding to a plurality of documents including a first document and a second document. The computer system determines that the second document includes duplicate data that is duplicative of corresponding data in the first document, identifies a respective subset of the respective cluster data that excludes at least a subset of the duplicate data, and generates an index of the respective subset of the respective cluster data. | 11-29-2012 |
20130110909 | Redundant Data Requests with Cancellation | 05-02-2013 |
20130212092 | Multi-Stage Query Processing System and Method for Use with Tokenspace Repository - A multi-stage query processing system and method enables multi-stage query scoring, including “snippet” generation, through incremental document reconstruction facilitated by a multi-tiered mapping scheme. At one or more stages of a multi-stage query processing system a set of relevancy scores are used to select a subset of documents for presentation as an ordered list to a user. The set of relevancy scores can be derived in part from one or more sets of relevancy scores determined in prior stages of the multi-stage query processing system. In some embodiments, the multi-stage query processing system is capable of executing one or more passes on a user query, and using information from each pass to expand the user query for use in a subsequent pass to improve the relevancy of documents in the ordered list. | 08-15-2013 |
20130297592 | Associating Application-Specific Methods with Tables Used for Data Storage - A method of accessing data includes storing a table that includes a plurality of tablets corresponding to distinct non-overlapping table portions. Respective pluralities of tablet access objects and application objects are stored in a plurality of servers. A distinct application object and distinct tablet are associated with each tablet access object. Each application object corresponds to a distinct instantiation of an application associated with the table. The tablet access objects and associated application objects are redistributed among the servers in accordance with a first load-balancing criterion. A first request directed to a respective tablet is received from a client. In response, the tablet access object associated with the respective tablet is used to perform a data access operation on the respective tablet, and the application object associated with the respective tablet is used to perform an additional computational operation to produce a result to be returned to the client. | 11-07-2013 |
20140279773 | Scoring Concept Terms Using a Deep Network - Methods, systems, and apparatus, including computer programs encoded on computer storage media, for scoring concept terms using a deep network. One of the methods includes receiving an input comprising a plurality of features of a resource, wherein each feature is a value of a respective attribute of the resource; processing each of the features using a respective embedding function to generate one or more numeric values; processing the numeric values to generate an alternative representation of the features of the resource, wherein processing the floating point values comprises applying one or more non-linear transformations to the floating point values; and processing the alternative representation of the input to generate a respective relevance score for each concept term in a pre-determined set of concept terms, wherein each of the respective relevance scores measures a predicted relevance of the corresponding concept term to the resource. | 09-18-2014 |
20140351029 | METHODS AND APPARATUS FOR SERVING RELEVANT ADVERTISEMENTS - The relevance of advertisements to a user's interests is improved. In one implementation, the content of a web page is analyzed to determine a list of one or more topics associated with that web page. An advertisement is considered to be relevant to that web page if it is associated with keywords belonging to the list of one or more topics. One or more of these relevant advertisements may be provided for rendering in conjunction with the web page or related web pages. | 11-27-2014 |
20150356461 | TRAINING DISTILLED MACHINE LEARNING MODELS - Methods, systems, and apparatus, including computer programs encoded on computer storage media, for training a distilled machine learning model. One of the methods includes training a cumbersome machine learning model, wherein the cumbersome machine learning model is configured to receive an input and generate a respective score for each of a plurality of classes; and training a distilled machine learning model on a plurality of training inputs, wherein the distilled machine learning model is also configured to receive inputs and generate scores for the plurality of classes, comprising: processing each training input using the cumbersome machine learning model to generate a cumbersome target soft output for the training input; and training the distilled machine learning model to, for each of the training inputs, generate a soft output that matches the cumbersome target soft output for the training input. | 12-10-2015 |
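The soft-output matching described in the "Training Distilled Machine Learning Models" entry above (20150356461) can be sketched as a loss function. This is a minimal illustration under stated assumptions: the soft outputs are taken to be softmax distributions over class scores, the mismatch is measured with cross-entropy, and the `temperature` parameter is an assumption that the abstract does not mention. The function names are hypothetical.

```python
import math


def softmax(scores, temperature=1.0):
    """Turn class scores into a probability distribution (a soft output)."""
    scaled = [s / temperature for s in scores]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def soft_target_loss(distilled_scores, cumbersome_scores, temperature=1.0):
    """Cross-entropy between the cumbersome model's soft output (the target)
    and the distilled model's soft output for one training input."""
    target = softmax(cumbersome_scores, temperature)
    predicted = softmax(distilled_scores, temperature)
    return -sum(t * math.log(p) for t, p in zip(target, predicted))


cumbersome = [2.0, 1.0, 0.0]
matching_loss = soft_target_loss([2.0, 1.0, 0.0], cumbersome)
mismatched_loss = soft_target_loss([0.0, 1.0, 2.0], cumbersome)
```

Training the distilled model would then amount to adjusting its parameters to reduce this loss over the training inputs; the loss is lowest when the two soft outputs agree.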
Patent application number | Description | Published |
20100076954 | Representative Document Selection for Sets of Duplicate Documents in a Web Crawler System - Duplicate documents are detected in a web crawler system. Upon receiving a newly crawled document, a set of documents, if any, sharing the same content as the newly crawled document is identified. Information identifying the newly crawled document and the selected set of documents is merged into information identifying a new set of documents. Duplicate documents are included and excluded from the new set of documents based on a query independent metric for each such document. A single representative document for the new set of documents is identified in accordance with a set of predefined conditions. | 03-25-2010 |
20100100437 | Suggesting and/or providing ad serving constraint information - Targeting information (also referred to as ad “serving constraints”) or candidate targeting information for an advertisement is identified. Targeting information may be identified by extracting topics or concepts from, and/or generating topics or concepts based on, ad information, such as information from a Web page to which an ad is linked (or some other Web page of interest to the ad or advertiser). The topics or concepts may be relevant queries associated with the Web page of interest, clusters, etc. | 04-22-2010 |
20100174605 | METHODS AND APPARATUS FOR SERVING RELEVANT ADVERTISEMENTS - The relevance of advertisements to a user's interests is improved. In one implementation, the content of a web page is analyzed to determine a list of one or more topics associated with that web page. An advertisement is considered to be relevant to that web page if it is associated with keywords belonging to the list of one or more topics. One or more of these relevant advertisements may be provided for rendering in conjunction with the web page or related web pages. | 07-08-2010 |
20100185513 | SERVING ADVERTISEMENTS BASED ON CONTENT - Advertisers are permitted to put targeted ads on pages on the web (or some other document of any media type). The present invention may do so by (i) obtaining content that includes available spots for ads, (ii) determining ads relevant to content, and/or (iii) combining content with ads determined to be relevant to the content. | 07-22-2010 |
20110145731 | SERVING CONTENT-RELEVANT ADVERTISEMENTS WITH CLIENT-SIDE DEVICE SUPPORT - A client-side application (such as a browser, a browser plug-in, a browser toolbar plug-in, etc. on an end user's computer) is used to support the serving of content-relevant ads to the client device. The client-side application may provide such support by sending document information (such as a document identifier, document content, content relevance information, etc.) to a content ad server. The client-side application may also be used to combine content of the document and the content-relevant ads. For example, the client-side application may combine content of the document and the ads in a window (e.g., in a browser window), may provide the ads in a window above, below, adjacent to a document window, may provide the ads in “chrome” of the browser, etc. | 06-16-2011 |
20110191309 | SERVING ADVERTISEMENTS BASED ON CONTENT - Advertisers are permitted to put targeted ads on pages on the web (or some other document of any media type). The present invention may do so by (i) obtaining content that includes available spots for ads, (ii) determining ads relevant to content, and/or (iii) combining content with ads determined to be relevant to the content. | 08-04-2011 |
20110276561 | Representative Document Selection for Sets of Duplicate Documents in a Web Crawler System - Duplicate documents are detected in a web crawler system. Upon receiving a newly crawled document, a set of documents, if any, sharing the same content as the newly crawled document is identified. Information identifying the newly crawled document and the selected set of documents is merged into information identifying a new set of documents. Duplicate documents are included and excluded from the new set of documents based on a query independent metric for each such document. A single representative document for the new set of documents is identified in accordance with a set of predefined conditions. | 11-10-2011 |
20120173334 | METHODS AND APPARATUS FOR SERVING RELEVANT ADVERTISEMENTS - The relevance of advertisements to a user's interests is improved. In one implementation, the content of a web page is analyzed to determine a list of one or more topics associated with that web page. An advertisement is considered to be relevant to that web page if it is associated with keywords belonging to the list of one or more topics. One or more of these relevant advertisements may be provided for rendering in conjunction with the web page or related web pages. | 07-05-2012 |
20120323896 | REPRESENTATIVE DOCUMENT SELECTION FOR A SET OF DUPLICATE DOCUMENTS - Systems and methods for indexing a representative document from a set of duplicate documents are disclosed. Disclosed systems and methods comprise selecting a first document in a plurality of documents on the basis that the first document is associated with a query independent score. Each respective document in the plurality of documents has a fingerprint that indicates that the respective document has substantially identical content to every other document in the plurality of documents. Disclosed systems and methods further comprise indexing, in accordance with the query independent score, the first document thereby producing an indexed first document. With respect to the plurality of documents, only the indexed first document is included in a document index. | 12-20-2012 |
20140040027 | SERVING ADVERTISEMENTS BASED ON CONTENT - Advertisers are permitted to put targeted ads on pages on the web (or some other document of any media type). The present invention may do so by (i) obtaining content that includes available spots for ads, (ii) determining ads relevant to content, and/or (iii) combining content with ads determined to be relevant to the content. | 02-06-2014 |
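The representative-document selection described in the duplicate-document entries above (20100076954, 20110276561, 20120323896) can be illustrated roughly as follows. The sketch assumes that duplicates are detected by hashing the full content and that the representative is simply the duplicate with the highest query-independent score. The applications speak of fingerprints and "predefined conditions" without specifying either in the abstracts, so both choices, along with the function names and sample URLs, are assumptions.

```python
import hashlib


def fingerprint(content):
    """Assumed fingerprint: a SHA-256 digest of the document content."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()


def pick_representatives(documents):
    """Pick one representative URL per set of duplicate documents.

    documents: iterable of (url, content, query_independent_score) tuples.
    Returns a dict mapping each fingerprint to the chosen URL.
    """
    best = {}
    for url, content, score in documents:
        key = fingerprint(content)
        # Keep the highest-scoring document seen so far for this content.
        if key not in best or score > best[key][1]:
            best[key] = (url, score)
    return {key: url for key, (url, _score) in best.items()}


crawled = [
    ("a.example/page", "same text", 0.2),
    ("b.example/page", "same text", 0.9),
    ("c.example/other", "different text", 0.1),
]
representatives = pick_representatives(crawled)
```

Only the URLs in `representatives` would go into the index; the other members of each duplicate set are left out.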
Patent application number | Description | Published |
20090037393 | System and Method of Accessing a Document Efficiently Through Multi-Tier Web Caching - Upon receipt of a document request, a client assistant examines its cache for the document. If not successful, a server searches for the requested document in its cache. If the server copy is still not fresh or not found, the server seeks the document from its host. If the host cannot provide the copy, the server seeks it from a document repository. Certain documents are identified from the document repository as being fresh or stable. Information about each of these identified documents is transmitted to the server which inserts entries into an index if the index does not already contain an entry for the document. If and when this particular document is requested, the document will not be present in the server, however the server will contain an entry directing the server to obtain the document from the document repository rather than the document's web host. | 02-05-2009 |
20120271852 | System and Method of Accessing a Document Efficiently Through Multi-Tier Web Caching - Upon receipt of a document request, a client assistant examines its cache for the document. If not successful, a server searches for the requested document in its cache. If the server copy is still not fresh or not found, the server seeks the document from its host. If the host cannot provide the copy, the server seeks it from a document repository. Certain documents are identified from the document repository as being fresh or stable. Information about each of these identified documents is transmitted to the server which inserts entries into an index if the index does not already contain an entry for the document. If and when this particular document is requested, the document will not be present in the server, however the server will contain an entry directing the server to obtain the document from the document repository rather than the document's web host. | 10-25-2012 |
20130339295 | Organizing Data in a Distributed Storage System - A distributed storage system is provided. The distributed storage system includes multiple front-end servers and zones for managing data for clients. Data within the distributed storage system is associated with a plurality of accounts and divided into a plurality of groups, each group including a plurality of splits, each split being associated with a respective account, and each group having multiple tablets and each tablet managed by a respective tablet server of the distributed storage system. Data associated with different accounts may be replicated within the distributed storage system using different data replication policies. The amount of data for an account is not limited, because new splits can be added to the distributed storage system. In response to a client request for a particular account's data, a front-end server communicates such request to a particular zone that has the client-requested data and returns the client-requested data to the requesting client. | 12-19-2013 |
20130339318 | METHOD AND SYSTEM FOR DELETING OBSOLETE FILES FROM A FILE SYSTEM - A method for deleting obsolete files from a file system is provided. The method includes: receiving a request to delete a reference to a target file in a file system from a file reference data structure, wherein the file reference data structure includes target file names and reference file names; identifying a reference file name in the file reference data structure, wherein the reference file name includes a file name of the target file; deleting a reference file from the file system, wherein the reference file has the identified reference file name; checking whether the file system includes at least one reference file whose file name matches the file name of the target file; if there is no such reference file in the file system: deleting the target file from the file system; and deleting the file name of the target file from the file reference data structure. | 12-19-2013 |
20130346540 | Storing and Moving Data in a Distributed Storage System - A system, computer-readable storage medium storing at least one program, and a computer-implemented method for identifying a storage group in a distributed storage system into which data is to be stored are presented. A data structure including information relating to storage groups in a distributed storage system is maintained, where a respective entry in the data structure for a respective storage group includes placement metrics for the respective storage group. A request to identify a storage group into which data is to be stored is received from a computer system. The data structure is used to determine an identifier for a storage group whose placement metrics satisfy a selection criterion. The identifier for the storage group whose placement metrics satisfy the selection criterion is returned to the computer system. | 12-26-2013 |
20140025899 | Efficiently Updating and Deleting Data in a Data Storage System - A method of storing data is disclosed. The method is performed on a data storage server having one or more processors and memory storing one or more programs for execution by the one or more processors. The data storage server receives a first and second data request, the requests including a first and second range of one or more keys and an associated first and second value respectively. The data storage server identifies one or more overlap points associated with the first range and the second range. For each of the overlap points, the data storage server then creates data items including ranges of keys, the ranges of each data item including one or more keys that are either: (a) the keys between a terminal key of the first or second range and the overlap point, or (b) the keys between two adjacent overlap points. | 01-23-2014 |
20150046525 | REDUNDANT DATA REQUESTS WITH CANCELLATION - A method of processing a request, performed by a respective server, is provided in which a request is received from a client. After receiving the request, a determination is made as to whether at least a first predefined number of other servers have a task-processing status for the request indicating that the other servers have undertaken performance of a task-processing operation for the request. When fewer than the first number of other servers in the set of other servers have the task-processing status for the request, a processing-status message is sent to one or more of the servers in the set of other servers indicating that the respective server is performing the task-processing operation. Upon completion of the task-processing operation, a result of the processing is sent to the client contingent upon a status of the other servers in the set of other servers. | 02-12-2015 |
20150178383 | Classifying Data Objects - Methods, systems, and apparatus, including computer programs encoded on computer storage media, for classifying data objects. One of the methods includes obtaining data that associates each term in a vocabulary of terms with a respective high-dimensional representation of the term; obtaining classification data for a data object, wherein the classification data includes a respective score for each of a plurality of categories, and wherein each of the categories is associated with a respective category label; computing an aggregate high-dimensional representation for the data object from high-dimensional representations for the category labels associated with the categories and the respective scores; identifying a first term in the vocabulary of terms having a high-dimensional representation that is closest to the aggregate high-dimensional representation; and selecting the first term as a category label for the data object. | 06-25-2015 |
20150178596 | Label Consistency for Image Analysis - Systems and techniques are disclosed for labeling objects within an image. The objects may be labeled by selecting an option from a plurality of options such that each option is a potential label for the object. An option may have an option score associated with it. Additionally, a relation score may be calculated for a first option and a second option corresponding to a second object in an image. The relation score may be based on a frequency, probability, or observance corresponding to the co-occurrence of text associated with the first option and the second option in a text corpus such as the World Wide Web. An option may be selected as a label for an object based on a global score calculated based at least on an option score and relation score associated with the option. | 06-25-2015 |
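
Illustrative note on application 20150178596 (Label Consistency for Image Analysis): the abstract says only that a label may be selected from a global score based at least on an option score and a relation score. The sketch below is one possible reading, not the method claimed in the application. It assumes the global score is a plain sum of per-object option scores and pairwise relation scores, and that all label combinations can be searched exhaustively. The function name `select_labels`, the data layout, and the example scores are all hypothetical.

```python
from itertools import product


def select_labels(option_scores, relation_scores):
    """Pick one label per object by maximizing an assumed additive global score.

    option_scores: list with one dict per object, mapping candidate label -> option score.
    relation_scores: dict mapping frozenset({label_a, label_b}) -> relation score
                     (for example, derived from text co-occurrence counts).
    Returns (best_labels, best_global_score).
    """
    best_labels, best_score = None, float("-inf")
    # Exhaustive search over every combination of candidate labels (assumption:
    # the number of objects and options is small enough for this to be feasible).
    for combo in product(*[sorted(options) for options in option_scores]):
        # Sum of the option scores for the chosen labels.
        score = sum(option_scores[i][label] for i, label in enumerate(combo))
        # Add a relation score for every pair of chosen labels; missing pairs count as 0.
        for i in range(len(combo)):
            for j in range(i + 1, len(combo)):
                score += relation_scores.get(frozenset((combo[i], combo[j])), 0.0)
        if score > best_score:
            best_labels, best_score = combo, score
    return best_labels, best_score


# Hypothetical example: two objects in one image, two candidate labels each.
objects = [
    {"mouse": 0.5, "rat": 0.6},
    {"keyboard": 0.9, "piano": 0.3},
]
relations = {
    frozenset(("mouse", "keyboard")): 0.8,
    frozenset(("rat", "keyboard")): 0.1,
}
labels, score = select_labels(objects, relations)
```

In this made-up example, "rat" has the higher option score for the first object when considered alone, but the strong relation between "mouse" and "keyboard" makes that pair the combination with the highest global score.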