Patent application number | Description | Published |
20080222135 | SPAM SCORE PROPAGATION FOR WEB SPAM DETECTION - A SPAM detection system is provided. The system includes a graph clustering component to analyze web data. A link analysis component can be associated with the graph clustering component to facilitate SPAM detection in accordance with the web data. | 09-11-2008 |
20080222725 | GRAPH STRUCTURES AND WEB SPAM DETECTION - A SPAM detection system is provided. The system includes a graph clustering component to analyze web data. A link analysis component can be associated with the graph clustering component to facilitate SPAM detection in accordance with the web data. | 09-11-2008 |
20080222726 | NEIGHBORHOOD CLUSTERING FOR WEB SPAM DETECTION - A SPAM detection system is provided. The system includes a graph clustering component to analyze web data. A link analysis component can be associated with the graph clustering component to facilitate SPAM detection in accordance with the web data. | 09-11-2008 |
20090030916 | LOCAL COMPUTATION OF RANK CONTRIBUTIONS - The claimed subject matter relates to an architecture that can identify, store, and/or output local contributions to a rank of a vertex in a directed graph. The architecture can receive a directed graph and a parameter, and examine a local subset of vertices (e.g., local to a given vertex) in order to determine a local supporting set. The local supporting set can include a local set of vertices that each contributes a minimum fraction of the parameter to a rank of the vertex. The local supporting set can be the basis for an estimate of the supporting set and/or rank of the vertex for the entire graph and can be employed as a means for detecting link or web spam as well as other influence-based social network applications. | 01-29-2009 |
20090222435 | LOCALLY COMPUTABLE SPAM DETECTION FEATURES AND ROBUST PAGERANK - The claimed subject matter provides a system and/or a method that facilitates reducing spam in search results. An interface can obtain web graph information that represents a web of pages. A spam detection component can determine one or more features based at least in part on the web graph information. The one or more features can provide indications that a particular page of the web graph is spam. In addition, a robust rank component is provided that limits the amount of contribution a single page can provide to the target page. | 09-03-2009 |
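
The robust rank idea in the last entry above (limiting how much any single page can contribute to a target page) can be illustrated with a minimal sketch. It assumes the per-page contributions to the target have already been computed by some other means; the function name `robust_rank`, the `cap` parameter, and the sample values are hypothetical and are not taken from the application.

```python
from typing import Dict


def robust_rank(contributions: Dict[str, float], cap: float) -> float:
    """Sum per-page contributions to a target page, capping each one.

    `contributions` maps a source page to its (assumed precomputed)
    contribution to the target's rank. `cap` is a hypothetical upper
    bound on what any single page may contribute.
    """
    if cap < 0:
        raise ValueError("cap must be non-negative")
    # A page contributing more than `cap` is counted as exactly `cap`,
    # so a handful of heavy contributors cannot dominate the total.
    return sum(min(value, cap) for value in contributions.values())


if __name__ == "__main__":
    # Illustrative values only: one page supplies most of the rank.
    sample = {"a": 0.5, "b": 0.1, "c": 0.05}
    print("uncapped:", sum(sample.values()))
    print("capped:  ", robust_rank(sample, cap=0.2))
```

A large gap between the uncapped and capped totals suggests that the target's rank depends on a few heavy contributors, which is the kind of signal a spam feature might use.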
Patent application number | Description | Published |
20120130925 | DECOMPOSABLE RANKING FOR EFFICIENT PRECOMPUTING - Methods and computer storage media are provided for generating an algorithm used to provide preliminary rankings to candidate documents. A final ranking function that provides final rankings for documents is analyzed to identify potential preliminary ranking features, such as static ranking features that are query independent and dynamic atom-isolated components that are related to a single atom. Preliminary ranking features are selected from the potential preliminary ranking features based on many factors. Using these selected features, an algorithm is generated to provide a preliminary ranking to the candidate documents before the most relevant documents are passed to the final ranking stage. | 05-24-2012 |
20120130984 | DYNAMIC QUERY MASTER AGENT FOR QUERY EXECUTION - A preliminary segment root and a final segment root are selected for each segment. Each time a search query is received, a set of nodes in each segment that will be used to resolve the search query is identified. A preliminary segment root is selected from the set of nodes. Based on statistical data from each node in the set of nodes indicating each node's capability to act as a final segment root that assembles query-execution data, the preliminary segment root algorithmically selects the final segment root. The other nodes in the set of nodes are notified regarding the identity of the final segment root. | 05-24-2012 |
20120130994 | MATCHING FUNNEL FOR LARGE DOCUMENT INDEX - Search results are identified and returned in response to search queries by evaluating and pruning candidate documents in multiple stages. The process employs a search index that indexes atoms found in documents and pre-computed scores for document/atom pairs. When a search query is received, atoms are identified from the search query and a reformulated query is generated based on the identified atoms. The reformulated query is used to identify matching documents, and a preliminary score is generated for matching documents using a simplified scoring function and pre-computed scores in the search index. Documents are pruned based on preliminary scores, and the remaining documents are evaluated using a final ranking algorithm that provides a final set of ranked documents, which is used to generate search results to return in response to the search query. | 05-24-2012 |
20120130995 | EFFICIENT FORWARD RANKING IN A SEARCH ENGINE - Methods and computer storage media are provided for generating entries for documents in a forward index. A document and its document identification are received, in addition to static features that are query-independent. The document is parsed into tokens to form a token stream corresponding to the document. Relevant data used to calculate rankings of the document is identified and a position of the data is determined. The entry is then generated from the document identification, the token stream of the document, the static features, and the positional information of the relevant data. The entry is stored in the forward index. | 05-24-2012 |
20120130996 | TIERING OF POSTING LISTS IN SEARCH ENGINE INDEX - A search index includes tiered posting lists. Each posting list in the search index corresponds with a different atom and includes a list of documents containing the particular atom. Additionally, a rank is stored with each document listed in a posting list for a given atom representing the relevance of the atom to the context of each document. At least some of the posting lists in the search index are tiered. A tiered posting list is divided into a number of tiers with the tiers being ordered by rank while each tier is internally ordered by document. Employing tiered posting lists within the search index allows a search engine to evaluate search queries in a manner that allows for a number of efficiencies and precise stopping. | 05-24-2012 |
20120130997 | HYBRID-DISTRIBUTION MODEL FOR SEARCH ENGINE INDEXES - Methods and systems are provided for using a hybrid-distribution system to identify relevant documents based on a search query. A group of documents is assigned to a particular segment. The group of documents is indexed both by atom and by document to form a reverse index and a forward index. Both indexes are divided amongst each node in that segment so that each node is responsible for storing and accessing a different portion of both the reverse and forward indexes. The reverse index portion is accessed on each of a first set of nodes to identify a first set of documents that is relevant to a particular search query. Document identifications associated with the first set of documents are used to identify a second set of nodes that access their forward index portions to limit the number of relevant documents to a second set of documents. | 05-24-2012 |
20120173510 | PRIORITY HASH INDEX - A priority hash index provides efficient lookup of posting lists for search query terms. The priority hash index is a data structure in which hash values for terms are distributed across multiple storage devices based on importance of the terms and access speeds of the storage devices. Terms are grouped into search lists with each search list including a storage location on each storage device. When a search query is received, a term is identified and hashed, both to determine a location on the first storage device and to generate a unique hash value for the term. The locations on the storage device for the term's search list are sequentially read until the hash value for the term is located to access the posting list for the term. | 07-05-2012 |
20130297621 | DECOMPOSABLE RANKING FOR EFFICIENT PRECOMPUTING - Methods and computer storage media are provided for generating an algorithm used to provide preliminary rankings to candidate documents. A final ranking function that provides final rankings for documents is analyzed to identify potential preliminary ranking features, such as static ranking features that are query independent and dynamic atom-isolated components that are related to a single atom. Preliminary ranking features are selected from the potential preliminary ranking features based on many factors. Using these selected features, an algorithm is generated to provide a preliminary ranking to the candidate documents before the most relevant documents are passed to the final ranking stage. | 11-07-2013 |
20140324819 | EFFICIENT FORWARD RANKING IN A SEARCH ENGINE - Methods and computer storage media are provided for generating entries for documents in a forward index. A document and its document identification are received, in addition to static features that are query-independent. The document is parsed into tokens to form a token stream corresponding to the document. Relevant data used to calculate rankings of the document is identified and a position of the data is determined. The entry is then generated from the document identification, the token stream of the document, the static features, and the positional information of the relevant data. The entry is stored in the forward index. | 10-30-2014 |
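
The staged evaluation described in the MATCHING FUNNEL entry above (match, score cheaply from pre-computed values, prune, then rank fully) can be sketched roughly as follows. This is an illustration under stated assumptions, not the method claimed in the application: the index layout, the choice of summing pre-computed scores as the simplified scoring function, the name `funnel_search`, and the `keep` parameter are all hypothetical.

```python
from typing import Callable, Dict, List

# Hypothetical index layout: atom -> {doc_id: pre-computed score for
# that document/atom pair}.
Index = Dict[str, Dict[str, float]]


def funnel_search(
    index: Index,
    query_atoms: List[str],
    final_rank: Callable[[str], float],
    keep: int,
) -> List[str]:
    """Return doc ids ordered by `final_rank` after a pruning stage."""
    postings = [index.get(atom, {}) for atom in query_atoms]
    if not postings:
        return []

    # Stage 1: matching. Here a document matches only if it contains
    # every query atom (an assumption made for this sketch).
    matching = set(postings[0])
    for posting in postings[1:]:
        matching &= set(posting)

    # Stage 2: preliminary score, assumed to be the sum of the
    # pre-computed document/atom scores.
    prelim = {doc: sum(p[doc] for p in postings) for doc in matching}

    # Stage 3: prune to the `keep` best preliminary scores
    # (ties broken by doc id so the result is deterministic).
    survivors = sorted(prelim, key=lambda d: (-prelim[d], d))[:keep]

    # Stage 4: apply the expensive final ranking to the survivors only.
    return sorted(survivors, key=lambda d: (-final_rank(d), d))


if __name__ == "__main__":
    # Illustrative data only.
    index = {
        "cat": {"d1": 0.9, "d2": 0.4, "d3": 0.1},
        "toy": {"d1": 0.2, "d2": 0.8, "d4": 0.5},
    }
    final_scores = {"d1": 5.0, "d2": 1.0}
    print(funnel_search(index, ["cat", "toy"], final_scores.get, keep=2))
```

The point of the design is that `final_rank` runs only on the survivors of the pruning stage, so its cost scales with `keep` rather than with the number of matching documents.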