Inventors list

Assignees list

Classification tree browser

Top 100 Inventors

Top 100 Assignees

Shuming Shi, Beijing CN

Shuming Shi, Beijing CN

Patent application number	Description	Published
20080215561	SCORING RELEVANCE OF A DOCUMENT BASED ON IMAGE TEXT - A method and system for determining relevance of a document having text and images to a text string is provided. A scoring system identifies image text associated with an image of the document. The scoring system calculates an image score indicating relevance of the image text to the text string. The image score may be used in many applications, such as searching, summary generation, and document classification, image search, and image classification.	09-04-2008
20080215563	Pseudo-Anchor Text Extraction for Vertical Search - A search method uses pseudo-anchor text associated with search objects to improve search performance. The pseudo-anchor text may be extracted in combination with an identifier of the search objects (such as a pseudo-URL) from a digital corpus such as a collection of documents. Pseudo-anchor texts for each object are preferably extracted from candidate anchor blocks using a machine learning based approach. The pseudo-anchor texts are made available for searching and used to help ranking the objects in a search result to improve search performance. Method may be used in vertical search of objects such as published articles, products and images that lack explicit URL and anchor text information.	09-04-2008
20080313177	ADDING DOMINANT MEDIA ELEMENTS TO SEARCH RESULTS - A method and system for determining dominance of the media elements of display pages is provided. The dominance system provides a scoring mechanism for scoring the dominance of media elements of display pages based on features of each media element of the display page. To generate the scores for the media elements of the display page, the dominance system first identifies the media elements and then identifies the features of the media elements. The dominance system then scores the identified media elements using the provided scoring mechanism and the identified features.	12-18-2008
20090024607	QUERY SELECTION FOR EFFECTIVELY LEARNING RANKING FUNCTIONS - A learning system for a search ranking function model may include a computer program that iteratively refines the model using new queries and associated documents from an unlabeled training set. The unlabeled training set may include a set of queries for which the associated documents have not been labeled as “relevant” or otherwise labeled. The new queries may be selected based on a similarity to and an accuracy of each neighbor from a labeled training set, such as a labeled validation set. Upon selection, the documents associated with the new queries may be labeled. The new queries and their associated documents may be accumulated into a labeled training set, such as a labeled training set, and a refined model may be learned based on the augmented labeled training set. The model may be iteratively refined until it is determined that the model is adequate.	01-22-2009
20100145956	PSEUDO-ANCHOR TEXT EXTRACTION - A search method uses pseudo-anchor text associated with search objects to improve search performance. The pseudo-anchor text may be extracted in combination with an identifier of the search objects (such as a pseudo-URL) from a digital corpus such as a collection of documents. Pseudo-anchor texts for each object are preferably extracted from candidate anchor blocks using a machine learning based approach. The pseudo-anchor texts are made available for searching and used to help rank the objects in a search result to improve search performance. The method may be used in vertical search of objects such as published articles, products and images that lack explicit URLs and anchor text information.	06-10-2010
20110078131	EXPERIMENTAL WEB SEARCH SYSTEM - Described is the running of search-related experiments on a full (or partial) offline snapshot copy of the search engine documents of an actual production system. A snapshot experimentation subsystem runs experimental code related to web searches on the offline data, including to run experimental index building code to build an experimental index (e.g., to test a new document feature), and/or to run experimental search-related code, such as to rank search results according to experimental ranking code, to implement an experimental search strategy, and/or to generate experimental captions.	03-31-2011
20110078132	FLEXIBLE INDEXING AND RANKING FOR SEARCH - Described is a flexible framework for index building and document retrieval in a search environment that allows different search scenario applications to reuse index building and document retrieval code for non-scenario-specific functionality. Interfaces to various functionality of an index builder and retrieval engine are defined. An application calls the interfaces to specify custom code to perform a search scenario when needed, or use default code when non-scenario-specific functionality may be used.	03-31-2011
20110087660	SCORING RELEVANCE OF A DOCUMENT BASED ON IMAGE TEXT - A method and system for determining relevance of a document having text and images to a text string is provided. A scoring system identifies image text associated with an image of the document. The scoring system calculates an image score indicating relevance of the image text to the text string. The image score may be used in many applications, such as searching, summary generation, and document classification, image search, and image classification.	04-14-2011
20110137886	Data-Centric Search Engine Architecture - Described is a data-centric web search engine technology/architecture, in which document metadata, including offline-extracted metadata, is used as part of a search indexing and ranking pipeline. A web data management component receives crawled documents and extracts document metadata from the documents. An indexing component uses the document metadata to build an index for the documents. A serving component uses the index and the document metadata to serve content, e.g., search results. Also described is the use of query metadata extracted from queries of a query log for use in the pipeline.	06-09-2011
20110264658	WEB OBJECT RETRIEVAL BASED ON A LANGUAGE MODEL - A method and system is provided for determining relevance of an object to a term based on a language model. The relevance system provides records extracted from web pages that relate to the object. To determine the relevance of the object to a term, the relevance system first determines, for each record of the object, a probability of generating that term using a language model of the record of that object. The relevance system then calculates the relevance of the object to the term by combining the probabilities. The relevance system may also weight the probabilities based on the accuracy or reliability of the extracted information for each data source.	10-27-2011
20110270821	ADDING DOMINANT MEDIA ELEMENTS TO SEARCH RESULTS - A method and system for determining dominance of the media elements of display pages is provided. The dominance system provides a scoring mechanism for scoring the dominance of media elements of display pages based on features of each media element of the display page. To generate the scores for the media elements of the display page, the dominance system first identifies the media elements and then identifies the features of the media elements. The dominance system then scores the identified media elements using the provided scoring mechanism and the identified features.	11-03-2011
20120030206	Employing Topic Models for Semantic Class Mining - A topic modeling architecture is used to discover high-quality semantic classes from a large collection of raw semantic classes (RASCs) for use in generating responses to queries. A specific semantic class is identified from a collection of RASCs, and a preprocessing operation is conducted to remove one or more items with a semantic class frequency less than a predetermined threshold. A topic model is then applied to the specific semantic class for each of the items that remain in the specific semantic class after the preprocessing operation. A postprocessing operation is then conducted on the items of the specific semantic class to merge and sort the results of the topic model and generate final semantic classes for use by a search engine to respond to a query.	02-02-2012
20140143223	Search Query User Interface - This disclosure describes, in part, techniques for operating a search query user interface to allow seamless creating, editing and/or refining of a search query using various interactive functions. The techniques described herein may display a search query divided into segments. A selection of a segment of the search query may then be received. One or more alternatives to the selected segment may then be presented. Next, a selection of one of the presented alternative may be received. As a result, the search query may be altered using the selected alternative. Furthermore, the techniques described herein allow a user to operate on a search query using query substitution, expansion, association and/or history functions.	05-22-2014

Patent applications by Shuming Shi, Beijing CN