Patent application number | Description | Published |
20110093464 | SYSTEM AND METHOD FOR GROUPING MULTIPLE STREAMS OF DATA - A document clustering system and method of assigning a document to a cluster of documents containing related content are provided. Each cluster is associated with a cluster summary describing the content of the documents in the cluster. The method comprises: determining, at a document clustering system, whether the document should be grouped with one or more previously created cluster summaries, the previously created cluster summaries being stored in a memory in a B-tree data structure; and if it is determined that the document should not be grouped with the one or more previously created cluster summaries, then creating, at a document clustering system, a cluster summary based on the content of the document and storing the created cluster summary in the B-tree data structure. | 04-21-2011 |
20120330938 | SYSTEM AND METHOD FOR FILTERING DOCUMENTS - A method and document separation system for separating a set of related documents is described. In one aspect, the method comprises: determining, on a document selection system, quality scores for a plurality of the documents in the set of related documents; obtaining a similarity score for a plurality of pairs of documents in the set of related documents; and on the document selection system, obtaining a first subset of related documents which solves an optimization problem, the first subset of related documents including a portion of the documents in the set of related documents, the optimization problem being a function of one or more quality scores of the documents assigned to the first subset of related documents and one or more similarity scores of pairs of documents assigned to the first subset of related documents. | 12-27-2012 |
20120330969 | SYSTEMS AND METHODS FOR RANKING DOCUMENT CLUSTERS - Document cluster ranking systems and methods of ranking document clusters are described. In some example embodiments, the method comprises: obtaining, at a document cluster ranking system, a value associated with a first feature for each of a plurality of document clusters; based on the values associated with the first feature, automatically generating, at the document cluster ranking system, a plurality of first feature bins, each first feature bin defining a range of values and a bin identifier; and obtaining a score for one of the document clusters, by: i) identifying the first feature bin having a range of values which includes the obtained value associated with the first feature for that one of the document clusters; and ii) determining a score for that document cluster based on the first feature bin identifier for the identified first feature bin. | 12-27-2012 |
20140067812 | SYSTEMS AND METHODS FOR RANKING DOCUMENT CLUSTERS - Document cluster ranking systems and methods of ranking document clusters are described. In some example embodiments, the method comprises: obtaining, at a document cluster ranking system, a value associated with a first feature for each of a plurality of document clusters; based on the values associated with the first feature, automatically generating, at the document cluster ranking system, a plurality of first feature bins, each first feature bin defining a range of values and a bin identifier; and obtaining a score for one of the document clusters, by: i) identifying the first feature bin having a range of values which includes the obtained value associated with the first feature for that one of the document clusters; and ii) determining a score for that document cluster based on the first feature bin identifier for the identified first feature bin. | 03-06-2014 |
20150149295 | METHOD AND SYSTEM FOR GENERATING RECOMMENDATIONS BASED ON MEDIA USAGE AND PURCHASE BEHAVIOR - A server generates item recommendation lists for users, whose behaviors are registered in relation to user library items and assigned relevance in relation to available inventory items. First, similarity is computed between any pair of inventory items based on user-specific relevance factors. Second, similarity is measured between inventory items over content and/or metadata corresponding to the inventory items. Overall similarity is computed between the library and the available inventory items based on a sum of the first and second measured similarities. Available inventory items are recommended based on the computed overall similarity score. Relevance, which may diminish over time, may be assigned based on behavior type and duration. | 05-28-2015 |
20150154497 | CONTENT BASED SIMILARITY DETECTION - A computer-implemented method includes computing a hash of each word in a collection of books to produce a numerical integer token using a reduced representation and computing an Inverse Document Frequency (IDF) vector comprising the number of books the token appears in, for every token in the collection of books. The method also includes creating a token occurrence count vector for each book in the collection and normalizing the token occurrence count vector using the IDF vector to create a Term Frequency-Inverse Document Frequency (TF-IDF) vector. Further, the method includes reducing each TF-IDF vector by using random projections to obtain a final signature representing each book in the collection and, using a trained machine learning algorithm, determining whether each of the list of candidate books is similar to the target book. | 06-04-2015 |
20150154671 | SYSTEM AND METHOD FOR AUTOMATIC ELECTRONIC DOCUMENT IDENTIFICATION - A system and method for automatically identifying an electronic document. The method includes accessing, within an electronic device, an electronic document and extracting text from the electronic document. A signature is then determined based on the text of the electronic document and the signature is communicated over a communication channel. The method further includes receiving an identifier of the electronic document over the communication channel. In one embodiment, the identifier is determined by a server matching the signature against a signature library. The method further includes receiving a bookmark associated with the electronic document. | 06-04-2015 |
20150379609 | GENERATING RECOMMENDATIONS FOR UNFAMILIAR USERS BY UTILIZING SOCIAL SIDE INFORMATION - System and method for identifying commodities for recommendation to a target user based on side information that is pertinent to a specific target user and extrinsic to the commodities. Training data is exploited to derive a statistical correlation between users' side information of a plurality of attributes with a plurality of commodities. The training data includes side information of a set of training users and a plurality of commodities towards which the training users have manifested preference. Based on the derived statistical correlation and the target user's side information, a probability distribution representing the target user's tendency to purchase the plurality of commodities can be determined. As a result, a list of commodities can be automatically selected from the plurality of commodities and recommended to the target user. | 12-31-2015 |
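
The signature pipeline outlined in the abstract of application 20150154497 (hash words to integer tokens, count the books each token appears in, build a TF-IDF vector per book, reduce it with random projections) can be sketched roughly as follows. This is an illustrative sketch only, not the method claimed in the application. The bucket count, the signature length, the `log(N / df)` weighting, the sign-bit projection and every function name are assumptions made for the example. The trained machine learning step that makes the final similarity decision is left out, and a plain Hamming-based comparison stands in for it.

```python
import hashlib
import math
import random
from collections import Counter

NUM_BUCKETS = 1 << 12   # assumed size of the reduced token space
SIGNATURE_BITS = 64     # assumed signature length


def token_of(word):
    """Hash a word to an integer token in a reduced range."""
    digest = hashlib.md5(word.lower().encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % NUM_BUCKETS


def document_frequency(books):
    """For every token, count the number of books it appears in."""
    df = Counter()
    for text in books:
        df.update({token_of(w) for w in text.split()})
    return df


def tfidf(text, df, num_books):
    """Weight token occurrence counts by log(N / df); unseen tokens are skipped."""
    counts = Counter(token_of(w) for w in text.split())
    return {t: c * math.log(num_books / df[t])
            for t, c in counts.items() if df[t] > 0}


def projection_row(token):
    """Pseudo-random Gaussian row for one token, reproducible from the token id."""
    rng = random.Random(token)
    return [rng.gauss(0.0, 1.0) for _ in range(SIGNATURE_BITS)]


def signature(vec):
    """Project a sparse TF-IDF vector onto random directions and keep the signs."""
    sums = [0.0] * SIGNATURE_BITS
    for token, weight in vec.items():
        for i, r in enumerate(projection_row(token)):
            sums[i] += weight * r
    bits = 0
    for i, s in enumerate(sums):
        if s > 0:
            bits |= 1 << i
    return bits


def similarity(sig_a, sig_b):
    """Fraction of signature bits on which two books agree."""
    differing = bin(sig_a ^ sig_b).count("1")
    return 1.0 - differing / SIGNATURE_BITS


def signatures_for(books):
    """Compute one signature per book in the collection."""
    df = document_frequency(books)
    return [signature(tfidf(text, df, len(books))) for text in books]
```

In this sketch, `signatures_for` takes a list of book texts and returns one integer signature per book; `similarity` then compares any two of them. Keeping only the sign of each projection is one common way to obtain a compact signature, chosen here for brevity rather than taken from the application.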