Patent application number | Description | Published |
20090094200 | Method for Admission-controlled Caching - A method of caching the results of a search engine query divides a search engine cache into two parts, controlled and uncontrolled, and determines, through an admission policy, in which part the query results should be cached. In one implementation, the admission policy estimates whether a query is likely to be frequent or infrequent in the future by analyzing various features of the query. | 04-09-2009 |
20090094416 | SYSTEM AND METHOD FOR CACHING POSTING LISTS - A method of caching posting lists to a search engine cache calculates the ratios between the frequencies of the query terms in a past query log and the sizes of the posting lists for each term, and uses these ratios to determine which posting lists should be cached by sorting the ratios in decreasing order and storing to the cache those posting lists corresponding to the highest ratio values. Further, a method of finding an optimal allocation between two parts of a search engine cache evaluates a past query stream based on a relationship between various properties of the stream and the total size of the cache, and uses this information to determine the respective sizes of both parts of the cache. | 04-09-2009 |
20090157652 | METHOD AND SYSTEM FOR QUANTIFYING THE QUALITY OF SEARCH RESULTS BASED ON COHESION - A method and system for quantifying the quality of search results from a search engine based on cohesion. The method and system include modeling a set of search engine search results as a cluster and measuring the cohesion of the cluster. In an embodiment, the cohesion of the cluster is the average similarity of the cluster elements to a centroid vector. The centroid vector is the average of the weights of the vectors of the cluster. The similarity between the centroid vector and the cluster's elements is the cosine similarity measure. Each document in the set of search results is represented by a vector where each cell of the vector represents a stemmed word. Each cell has a cell value which is the frequency of the corresponding stemmed word in a document multiplied by a weight that takes into account the location of the stemmed word within the document. | 06-18-2009 |
20090164895 | EXTRACTING SEMANTIC RELATIONS FROM QUERY LOGS - Methods, systems, and apparatuses for associating queries of a query log are provided. The query log lists a plurality of queries and a set of clicked URLs for each query. Each query is designated to be a node of a plurality of nodes. A plurality of edges is determined. A URL is designated to be an edge for a pair of queries if the URL is indicated as clicked in the sets of clicked URLs for both queries of the pair. The nodes and edges are displayed in a graph. Each edge may be displayed in the graph as a line connected between a pair of nodes that correspond to the pair of queries of the pair of nodes. The edges may be classified. Furthermore, the edges and/or the nodes may be weighted. Edges and/or nodes may be filtered from display based on their weights and/or on other criteria. | 06-25-2009 |
20100030768 | CLASSIFYING DOCUMENTS USING IMPLICIT FEEDBACK AND QUERY PATTERNS - Methods and apparatus are described for classifying documents using a document representation model based on implicit user feedback obtained from search engine queries. The model may be used to achieve better results in non-supervised tasks such as clustering and labeling through the incorporation of usage data obtained from the search engine queries. | 02-04-2010 |
20100094853 | SYSTEM AND METHODOLOGY FOR A MULTI-SITE SEARCH ENGINE - Techniques for query processing in a multi-site search engine are described. During an indexing phase, each site of a multi-site search engine indexes a set of assigned web resources and each site calculates, for each term in the set of assigned web resources, a site-specific upper bound ranking score on the contribution of the term to the search engine ranking function for a query containing the term. During a propagation phase, all sites exchange their site-specific upper bound ranking scores with each other. In response to a site receiving a query, the site determines the set of locally matching resources and compares the ranking score of a locally matching resource with the site-specific upper bound ranking scores for the terms of the query that were received during the propagation phase and determines whether to communicate the query to other sites. By exchanging appropriately defined site-specific upper bound ranking scores, the site initially receiving the query can determine whether the locally matching resources would be identical to the resources obtained from a single-site search system without having to communicate the query to each of the other sites. | 04-15-2010 |
20100161145 | SEARCH ENGINE DESIGN AND COMPUTATIONAL COST ANALYSIS - A computer implemented system for search engine facility architecting and design. The system estimates the costs of power and networking based on system parameters, such as average CPU utilization, connection time, and bytes transferred over the network. Regional distribution of facilities may be evaluated to take into account the various parameters and optimize the cost and speed of the systems being designed. The parameters used in analyzing and formulating an architecture are independent of a particular indexing or query processing technique. | 06-24-2010 |
20100235346 | MULTI-TIERED SYSTEM FOR SEARCHING LARGE COLLECTIONS IN PARALLEL - The system includes a pre-retrieval predictor which determines, with a certain degree of confidence, which collection to submit the query to. The query is then submitted to either one collection or multiple collections in parallel. When the results are returned, they are assessed, and if they are deemed adequate they are shown to the user. If they are inadequate, the results from the smaller and larger collections are merged and shown to the user. Only if the predictor did not send the query to more than one collection and the result is inadequate is the query sent to the other collections and executed sequentially. Overall, large scale searching can be accomplished much more efficiently with no degradation in the quality of the retrieved results and a small increase in processing cost. | 09-16-2010 |
20110270965 | Methods for Web Site Analysis - A specification of a target web site is received. A number of field web sites related to the target web site are identified. Data values are acquired for a set of metrics for the target and each field web site. These data values are processed to evaluate a standing of the target web site relative to the field web sites, while maintaining anonymity of the field web sites. An average web site is determined by respectively averaging data values for the field web sites. A bounding web site is characterized by the best data values from among all field web sites. Target web site data values are compared to average and/or bounding web site data values at a given time. Variations in differences between target web site data values and corresponding average and/or bounding web site data values over time determine improvement and/or success of the target web site. | 11-03-2011 |
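
The posting-list selection described in application 20090094416 (rank terms by the ratio of query-log frequency to posting-list size, then cache the lists with the highest ratios) can be illustrated with a minimal sketch. This is not the patented implementation: the function name `select_posting_lists`, the dictionary inputs, the fixed `capacity` budget, and the rule of skipping a list that no longer fits are all assumptions made for illustration.

```python
def select_posting_lists(term_freq, list_size, capacity):
    """Return terms whose posting lists to cache, highest frequency/size ratio first.

    term_freq: hypothetical mapping term -> frequency in a past query log
    list_size: hypothetical mapping term -> posting list size (same unit as capacity)
    capacity:  assumed total cache budget
    """
    # Rank terms by frequency-to-size ratio in decreasing order;
    # terms without a known positive list size are ignored.
    ranked = sorted(
        (t for t in term_freq if list_size.get(t, 0) > 0),
        key=lambda t: term_freq[t] / list_size[t],
        reverse=True,
    )
    cached, used = [], 0
    for term in ranked:
        size = list_size[term]
        # Assumption: a list that does not fit in the remaining space is skipped.
        if used + size <= capacity:
            cached.append(term)
            used += size
    return cached


if __name__ == "__main__":
    freq = {"a": 100, "b": 50, "c": 10}
    size = {"a": 1000, "b": 100, "c": 10}
    print(select_posting_lists(freq, size, capacity=120))
```

With these made-up numbers the ratios are 0.1 for `a`, 0.5 for `b`, and 1.0 for `c`, so the frequently queried but very long list for `a` is left out in favor of the two shorter lists.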