Patent application number | Description | Published |
20100005088 | Using An Encyclopedia To Build User Profiles - Described are various embodiments which enable organizations to track and use knowledge and expertise of their associated individuals. An organization can use exemplary embodiments to automatically summarize the expertise of each individual from documents available from internal or external web sites. For example, a web crawler crawls a computer network to identify documents that name an individual. Summaries of the documents are generated based on articles in an encyclopedia, and a profile is built of the individual using the summaries. These summaries are used for automatically searching and automatically discovering individuals having particular knowledge or expertise on certain topics and subjects. | 01-07-2010 |
20100268701 | NAVIGATIONAL RANKING FOR FOCUSED CRAWLING - Systems and methods of navigational ranking for focused crawling are disclosed. In an exemplary embodiment, a method may include using a classifier to distinguish at least one target web page from other web pages on a website. The method may also include modeling the web pages on the website by a directed graph G=(V, E), wherein each web page is represented by a vertex (V), and a link between two web pages is represented by an edge (E). The method may also include assigning each web page (u) in V a weight p(u) based on the classifier, to calculate a navigational ranking indicating relevance of a web page. | 10-21-2010 |
20100293116 | URL AND ANCHOR TEXT ANALYSIS FOR FOCUSED CRAWLING - Systems and methods of URL and anchor text analysis for focused crawling are disclosed. In an exemplary embodiment, a method may include training a focused crawler by: obtaining a training set of at least URLs or anchor text for a website, computing a score for the training set, extracting a plurality of features of the training set, and computing a score for each of the plurality of features. The features identify key information contained in the website. The method may also include executing a trained focused crawler on other websites. | 11-18-2010 |
20100293159 | SYSTEMS AND METHODS FOR EXTRACTING PHRASES FROM TEXT - Systems and methods for extracting phrases from text are disclosed. In an exemplary embodiment, a method may include preprocessing desired phrases into at least one phrase indexing data structure for efficient matching. The method may also include scanning text to construct a hash table including keys and corresponding entries. The method may also include locating suffix trie trees for each word in the hash table. The method may also include matching each position in the hash table against the suffix trie trees, and outputting phrases matched in the scanned text. | 11-18-2010 |
20110173145 | CLASSIFICATION OF A DOCUMENT ACCORDING TO A WEIGHTED SEARCH TREE CREATED BY GENETIC ALGORITHMS - A device for classifying a document comprises a module to generate a data tree structure and configured to assign terms to a first plurality of nodes of the data tree structure, where each of the first plurality of nodes is assigned a weight. In assigning the weights of the first plurality of nodes, a first generation of combinations of possible weights assignable as the weights of the first plurality of nodes is obtained, and a second generation of combinations of possible weights assignable as the weights of the first plurality of nodes is obtained by performing the genetic algorithms on the first generation of combinations of possible weights. The device determines whether the document is in a document class based at least on the weights of the first plurality of nodes. | 07-14-2011 |
20110282825 | COMPARING AND IDENTIFYING SIMILAR TRACKS - The location of a user over time is monitored by a mobile device. The monitored locations are organized into tracks that describe a path or route that the user took over a period of time. Segments that correspond to each of the tracks are determined. The segments may correspond to roads on a map, or some other standardization. The segments are associated with their corresponding tracks, and used to identify similar tracks or to generate similarity scores for pairs of tracks. | 11-17-2011 |
20120271806 | GENERATING DOMAIN-BASED TRAINING DATA FOR TAIL QUERIES - Training data is provided for tail queries based on a phenomenon in search engine user behavior, referred to herein as “domain trust,” as an indication of user preferences for individual URLs in search results returned by a search engine for tail queries. Also disclosed are methods for generating training data in a search engine by forming a collection of query+URL pairs, identifying domains in the collection, and labeling each domain. Other implementations are directed to ranking search results generated by a search engine by measuring domain trust for each domain corresponding to each URL from among a plurality of URLs and then ranking each URL by its measured domain trust. | 10-25-2012 |
20140164388 | QUERY AND INDEX OVER DOCUMENTS - A document index is generated from a set of documents and is used to identify documents that match one or more queries. A tree is generated for each document with a node corresponding to each object of the document. The nodes of the generated trees are merged or combined to generate the document index, which is itself a tree. In addition, an inverted index is generated for each node of the index that identifies the tree(s) that the node originated from. When a query is received, the query is first executed against the document index tree; during the execution, appropriate set operations are applied to the inverted indices associated with the nodes matched by the query. The resulting set identifies the documents that may match the query. The query is then executed on the identified documents. | 06-12-2014 |
20140280047 | SCALABLE, SCHEMALESS DOCUMENT QUERY MODEL - Query models for document sets (such as XML documents or records in a relational database) typically involve a schema defining the structure of the documents. However, rigidly defined schemas often raise difficulties with document validation over even inconsequential structural variations. Additionally, queries developed against schema-constrained documents are often sensitive to structural details and variations that are inconsequential to the query, resulting in inaccurate results and development complications, and such queries may break upon schema changes. Instead, query models for hierarchically structured documents may enable “twig” queries that specify only the structural details of document nodes that are relevant to the query (e.g., students in a student database having a sibling named “Lee” and a teacher named “Smith,” irrespective of unrelated structural details of the document). Such “twig” query models may enable a more natural query development, and continued accuracy of queries in the event of unrelated schema variations and changes. | 09-18-2014 |
20140283091 | DIFFERENTIALLY PRIVATE LINEAR QUERIES ON HISTOGRAMS - The privacy of linear queries on histograms is protected. A database containing private data is queried. Base decomposition is performed to recursively compute an orthonormal basis for the database space. Using correlated (or Gaussian) noise and/or least squares estimation, an answer having differential privacy is generated and provided in response to the query. In some implementations, the differential privacy is ε-differential privacy (pure differential privacy) or is (ε,δ)-differential privacy (i.e., approximate differential privacy). In some implementations, the data in the database may be dense. Such implementations may use correlated noise without using least squares estimation. In other implementations, the data in the database may be sparse. Such implementations may use least squares estimation with or without using correlated noise. | 09-18-2014 |
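
As a rough illustration of what answering linear queries on a histogram under ε-differential privacy can look like, the sketch below uses the textbook Laplace mechanism. This is a deliberately simpler, swapped-in technique: it is not the base decomposition, correlated noise, or least squares estimation described in application 20140283091. The function names, the example data, and the assumption that neighboring databases differ by one individual (one bin count changing by 1) are illustrative only.

```python
import random


def answer_queries(queries, histogram):
    """Exact answers: each query row is a weight vector over histogram bins."""
    return [sum(w * count for w, count in zip(row, histogram)) for row in queries]


def l1_sensitivity(queries):
    """Largest L1 norm of any column of the query matrix.

    Assumption: neighboring databases differ in one individual, so a single
    bin count changes by 1 and the answers move by one column of the matrix.
    """
    num_bins = len(queries[0])
    return max(sum(abs(row[j]) for row in queries) for j in range(num_bins))


def private_answers(queries, histogram, epsilon, rng=random):
    """Laplace mechanism: add Laplace(sensitivity / epsilon) noise per answer."""
    exact = answer_queries(queries, histogram)
    sensitivity = l1_sensitivity(queries)
    if sensitivity == 0:
        return exact  # all-zero queries reveal nothing, so no noise is needed
    scale = sensitivity / epsilon
    # The difference of two i.i.d. exponential samples with mean `scale`
    # is distributed as Laplace(0, scale).
    return [
        a + rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
        for a in exact
    ]


if __name__ == "__main__":
    histogram = [3, 5, 2, 7]  # hypothetical bin counts
    queries = [[1, 1, 0, 0], [0, 1, 1, 1], [1, 1, 1, 1]]  # hypothetical range queries
    print(private_answers(queries, histogram, epsilon=0.5, rng=random.Random(0)))
```

Because this sketch adds independent noise to every answer, its error grows with the number of overlapping queries. Correlating the noise across queries is one way to reduce that error, which is the direction the abstract above describes.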