Wei-Ying Ma, Beijing CN

Patent application number	Description	Published
20080205770	Generating a Multi-Use Vocabulary based on Image Data - Functionality is described for generating a vocabulary from a source dataset of image items or other non-textual items. The vocabulary serves as a tool for retrieving items from a target dataset in response to queries. The vocabulary has at least one characteristic that allows it to be used to retrieve items from multiple different target datasets. A target dataset can have a different size than the source dataset and/or a different type than the source dataset. The enabling characteristic may correspond to a size of the source dataset above a prescribed minimum number of items and/or a size of the vocabulary above a prescribed minimum number of words.	08-28-2008
20080215561	SCORING RELEVANCE OF A DOCUMENT BASED ON IMAGE TEXT - A method and system for determining relevance of a document having text and images to a text string is provided. A scoring system identifies image text associated with an image of the document. The scoring system calculates an image score indicating relevance of the image text to the text string. The image score may be used in many applications, such as searching, summary generation, document classification, image search, and image classification.	09-04-2008
20080243829	SPECTRAL CLUSTERING USING SEQUENTIAL SHRINKAGE OPTIMIZATION - A clustering system initially applies an eigenvalue decomposition solver for a number of iterations to a clustering objective function. The eigenvalue decomposition solver generates an eigenvector that is an initial approximation of a solution to the objective function. The clustering system fixes the eigenvector values for the identified objects. The clustering system then reformulates the objective function to focus on the objects whose clusters have not yet been determined. The clustering system then applies an eigenvalue decomposition solver for a number of iterations to the reformulated objective function to generate new values for the eigenvector for the objects whose clusters have not yet been determined. The clustering system then repeats the process of identifying objects, reformulating the objective function, and applying an eigenvalue decomposition solver for a number of iterations until a termination criterion is satisfied.	10-02-2008
20080256068	METHOD AND SYSTEM FOR CALCULATING IMPORTANCE OF A BLOCK WITHIN A DISPLAY PAGE - A method and system for identifying the importance of information areas of a display page. An importance system identifies information areas or blocks of a web page. A block of a web page represents an area of the web page that appears to relate to a similar topic. The importance system provides the characteristics or features of a block to an importance function that generates an indication of the importance of that block to its web page. The importance system “learns” the importance function by generating a model based on the features of blocks and the user-specified importance of those blocks. To learn the importance function, the importance system asks users to provide an indication of the importance of blocks of web pages in a collection of web pages.	10-16-2008
20080263042	OBJECT SIMILARITY SEARCH IN HIGH-DIMENSIONAL VECTOR SPACES - An object search system generates a hierarchical clustering of objects of a collection based on similarity of the objects. The object search system generates a separate hierarchical clustering of objects for multiple features of the objects. To identify objects similar to a target object, the object search system first generates a feature vector for the target object. For each feature of the feature vector, the object search system uses the hierarchical clustering of objects to identify the cluster of objects that is most “feature similar” to that feature of the target object. The object search system indicates the similarity of each candidate object based on the features for which the candidate object is similar.	10-23-2008
20080270334	CLASSIFYING FUNCTIONS OF WEB BLOCKS BASED ON LINGUISTIC FEATURES - A classification system trains a classifier to classify blocks of the web page into various classifications of the function of the block. The classification system trains a classifier using training web pages. To train a classifier, the classification system identifies the blocks of the training web pages, generates feature vectors for the blocks that include a linguistic feature, and inputs classification labels for each block. The classification system learns the coefficients of the classifier using any of a variety of machine learning techniques. The classification system can then use the classifier to classify blocks of web pages.	10-30-2008
20080275862	SPECTRAL CLUSTERING USING SEQUENTIAL MATRIX COMPRESSION - A clustering system generates an original Laplacian matrix representing objects and their relationships. The clustering system initially applies an eigenvalue decomposition solver to the original Laplacian matrix for a number of iterations. The clustering system then identifies the elements of the resultant eigenvector that are stable. The clustering system then aggregates the elements of the original Laplacian matrix corresponding to the identified stable elements and forms a new Laplacian matrix that is a compressed form of the original Laplacian matrix. The clustering system repeats the applying of the eigenvalue decomposition solver and the generating of new compressed Laplacian matrices until the new Laplacian matrix is small enough so that a final solution can be generated in a reasonable amount of time.	11-06-2008
20080281821	Concept Network - A concept network that can be generated in response to a user query. Various embodiments include analysis of structure information, for example, where such information is based at least in part on Universal Resource Locators (URLs) of Web sites or data storage locations. A concept network may be used with a search tool where the search tool searches a plurality of sites (e.g., Web sites, data storage locations, etc.). In such an example, each site location is arranged with a node. Certain ones of the nodes are connected by at least one link. The concept network selects a portion of certain ones of the nodes based on the link, wherein the at least one link is used for content purposes.	11-13-2008
20080313177	ADDING DOMINANT MEDIA ELEMENTS TO SEARCH RESULTS - A method and system for determining dominance of the media elements of display pages is provided. The dominance system provides a scoring mechanism for scoring the dominance of media elements of display pages based on features of each media element of the display page. To generate the scores for the media elements of the display page, the dominance system first identifies the media elements and then identifies the features of the media elements. The dominance system then scores the identified media elements using the provided scoring mechanism and the identified features.	12-18-2008
20080319974	MINING GEOGRAPHIC KNOWLEDGE USING A LOCATION AWARE TOPIC MODEL - Mining geographic knowledge using a location aware topic model is provided. A location system estimates topics and locations associated with documents based on a location aware topic (“LAT”) model. The location system generates the model from a collection of documents that are labeled with their associated locations. The location system generates collection level parameters based on an LDA-style model. To generate the collection level parameters, the location system estimates probabilities of latent topics, locations, and words of the collection. After the model is generated, the location system uses the collection level parameters to estimate probabilities of topics and locations being associated with target documents.	12-25-2008
20090006189	DISPLAYING OF ADVERTISEMENT-INFUSED THUMBNAILS OF IMAGES - An image advertisement system of a computing device displays as part of a display page an advertisement-infused thumbnail of an image prior to displaying the image. The image advertisement system initially receives a display page with an indication of an image to be displayed as part of the display page. The image advertisement system generates an advertisement-infused thumbnail of the image by combining advertisement content with a thumbnail of the image. The image advertisement system then displays the display page with the advertisement-infused thumbnail of the image in place of the image. The image advertisement system then replaces the displayed advertisement-infused thumbnail with the image.	01-01-2009
20090019066	HYBRID LOCATION AND KEYWORD INDEX - A method and system for generating a hybrid index for indexing objects based on location and keyword attributes and performing location-based searching is provided. A search system performs a location-based search using a hybrid index that indexes both location and keyword attributes of objects. The search system generates the hybrid index either using the location attribute as the primary index or the keyword attribute as the primary index. When the location attribute is the primary index, the keyword attribute is the secondary index, and vice versa. To generate the hybrid index, the search system identifies the values for the keyword and location attributes of each object. The search system generates the primary index to map each value of a first attribute to a secondary index. The search system thus generates, for each value of the first attribute, a secondary index to map values of a second attribute to objects that have the associated values of the first and second attributes. The search system then uses the hybrid index to perform location-based searching.	01-15-2009
20090041366	GENERATING SEARCH REQUESTS FROM MULTIMODAL QUERIES - A method and system for generating a search request from a multimodal query that includes a query image and query text is provided. The multimodal query system identifies images of a collection that are textually related to the query image based on similarity between words associated with each image and the query text. The multimodal query system then selects those images of the identified images that are visually related to the query image. The multimodal query system may formulate a search request based on keywords of web pages that contain the selected images and submit that search request to a search engine service.	02-12-2009
20090043764	AUGMENTING A TRAINING SET FOR DOCUMENT CATEGORIZATION - A method and system for augmenting a training set used to train a classifier of documents is provided. The augmentation system augments a training set with training data derived from features of documents based on a document hierarchy. The training data of the initial training set may be derived from the root documents of the hierarchies of documents. The augmentation system generates additional training data that includes an aggregate feature that represents the overall characteristics of a hierarchy of documents, rather than just the root document. After the training data is generated, the augmentation system augments the initial training set with the newly generated training data.	02-12-2009
20090060351	Visual Language Modeling for Image Classification - Systems and methods for visual language modeling for image classification are described. In one aspect the systems and methods model training images corresponding to multiple image categories as matrices of visual words. Visual language models are generated from the matrices. In view of a given image, for example, provided by a user or from the Web, the systems and methods determine an image category corresponding to the given image. This image categorization is accomplished by maximizing the posterior probability of visual words associated with the given image over the visual language models. The image category, or a result corresponding to the image category, is presented to the user.	03-05-2009
20090063455	Bipartite Graph Reinforcement Modeling to Annotate Web Images - Systems and methods for bipartite graph reinforcement modeling to annotate web images are described. In one aspect the systems and methods implement bipartite graph reinforcement modeling operations to identify a set of annotations that are relevant to a Web image. The systems and methods annotate the Web image with the identified annotations. The systems and methods then index the annotated Web image. Responsive to receiving an image search query from a user, wherein the image search query comprises information relevant to at least a subset of the identified annotations, the image search engine service presents the annotated Web image to the user.	03-05-2009
20090074306	Estimating Word Correlations from Images - Word correlations are estimated using a content-based method, which uses visual features of image representations of the words. The image representations of the subject words may be generated by retrieving images from data sources (such as the Internet) using image search with the subject words as query words. One aspect of the techniques is based on calculating the visual distance or visual similarity between the sets of retrieved images corresponding to each query word. The other is based on calculating the visual consistence among the set of the retrieved images corresponding to a conjunctive query word. The combination of the content-based method and a text-based method may produce even better results.	03-19-2009
20090076800	Dual Cross-Media Relevance Model for Image Annotation - A dual cross-media relevance model (DCMRM) is used for automatic image annotation. In contrast to the traditional relevance models which calculate the joint probability of words and images over a training image database, the DCMRM model estimates the joint probability by calculating the expectation over words in a predefined lexicon. The DCMRM model may be advantageous because a predefined lexicon potentially has better behavior than a training image database. The DCMRM model also takes advantage of content-based techniques and image search techniques to define the word-to-image and word-to-word relations involved in image annotation. Both relations can be estimated by using image search techniques on the web data as well as available training data.	03-19-2009
20090106019	METHOD AND SYSTEM FOR PRIORITIZING COMMUNICATIONS BASED ON SENTENCE CLASSIFICATIONS - A method and system for prioritizing communications based on classifications of sentences within the communications is provided. A sentence classification system may classify sentences of communications according to various classifications such as “sentence mode.” The sentence classification system trains a sentence classifier using training data and then classifies sentences using the trained sentence classifier. After the sentences of a communication are classified, a document ranking system may generate a rank for the communication based on the classifications of the sentences within the communication. The document ranking system trains a document rank classifier using training data and then calculates the rank of communications using the trained document rank classifier.	04-23-2009
20090119284	METHOD AND SYSTEM FOR CLASSIFYING DISPLAY PAGES USING SUMMARIES - A method and system for classifying display pages based on automatically generated summaries of display pages. A web page classification system uses a web page summarization system to generate summaries of web pages. The summary of a web page may include the sentences of the web page that are most closely related to the primary topic of the web page. The summarization system may combine the benefits of multiple summarization techniques to identify the sentences of a web page that represent the primary topic of the web page. Once the summary is generated, the classification system may apply conventional classification techniques to the summary to classify the web page. The classification system may use conventional classification techniques such as a Naïve Bayesian classifier or a support vector machine to identify the classifications of a web page based on the summary generated by the summarization system.	05-07-2009
20090125800	Function-based Object Model for Web Page Display in a Mobile Device - By understanding a website author's intention through an analysis of the function of a website, website content can be adapted for presentation or rendering in a manner that more closely appreciates and respects the function behind the website. A website's function is analyzed so that its content can be adapted to different client environments. A function-based object model (FOM) identifies objects associated with a website, and analyzes those objects in terms of their functions. Desktop oriented websites are adapted for mobile devices based on the FOM and on a mobile control intermediary language.	05-14-2009
20090150327	CALCULATING WEB PAGE IMPORTANCE BASED ON A CONDITIONAL MARKOV RANDOM WALK - An importance system calculates the importance of pages using a conditional Markov random walk model rather than a conventional Markov random walk model. The importance system calculates the importance of pages factoring in the importance of sites that contain those pages. The importance system may factor in the importance of sites based on the strength of the correlation of the importance of a page to the importance of a site. The strength of the correlation may be based upon the depth of the page within the site. The importance system may iteratively calculate the importance of the pages using “conditional” transition probabilities. During each iteration, the importance system may recalculate the conditional transition probabilities based on the importance of sites that are derived from the recalculated importance of pages during the iteration.	06-11-2009
20090216787	INDEXING LARGE-SCALE GPS TRACKS - Described is a technology by which uploaded GPS data is indexed according to spatio-temporal relationships to facilitate efficient insertion and retrieval. The indexes may be converted to significantly smaller-sized data structures when new updates to that structure are not likely. GPS data is processed into a track of spatially-partitioned segments such that each segment has a cell. Each cell has an associated temporal index (a compressed start-end tree), into which data for that cell's segments are inserted. The temporal index may include an end time index that relates each segment's end time to a matching start time index. Given query input comprising a spatial predicate and a temporal predicate, tracks may be searched for by determining which spatial candidate cells may contain matching results. For each candidate cell, the search accesses the cell's associated temporal index to find any track or tracks that correspond to the temporal predicate.	08-27-2009
20090265363	FORUM WEB PAGE CLUSTERING BASED ON REPETITIVE REGIONS - Described is a technology by which forum web pages are processed into clusters for classification purposes, including by determining repetitive regions between pages and associating pages that have similar repetitive regions into a common cluster. Patterns corresponding to the regions are determined, and a feature set based at least in part on those patterns (e.g., pattern frequency) is extracted from the page. The feature set of a page is compared against the feature set of another page to determine similarity therewith, e.g., via a feature space distance computation that is evaluated against a threshold distance.	10-22-2009
20090277322	Scalable Music Recommendation by Search - An exemplary method includes providing a music collection of a particular scale, determining a distance parameter for locality sensitive hashing based at least in part on the scale of the music collection and constructing an index for the music collection. Another exemplary method includes providing a song, extracting snippets from the song, analyzing time-varying timbre characteristics of the snippets and constructing one or more queries based on the analyzing. Such exemplary methods may be implemented by a portable device configured to maintain an index, to perform searches based on selected songs or portions of songs and to generate playlists from search results. Other exemplary methods, devices, systems, etc., are also disclosed.	11-12-2009
20090281906	Music Recommendation using Emotional Allocation Modeling - An exemplary method includes defining a vocabulary for emotions; extracting descriptions for songs; generating distributions for the songs in an emotion space based at least in part on the vocabulary and the extracted descriptions; extracting salient words from a document; generating a distribution for the document in an emotion space based at least in part on the vocabulary and the extracted salient words; and matching the distribution for the document to one or more of the distributions for the songs. Various other exemplary methods, devices, systems, etc., are also disclosed.	11-12-2009
20090282032	TOPIC DISTILLATION VIA SUBSITE RETRIEVAL - A method and system for generating a search result for a query of hierarchically organized documents based on retrieval of subtrees that are key resources for topic distillation is provided. The retrieval system may identify documents relevant to a query using conventional searching techniques. The retrieval system then calculates a subtree feature for subtrees that have an identified document as their root. After the retrieval system calculates the subtree feature for the subtrees, the retrieval system may generate a subtree relevance score for each subtree based on its subtree feature. The retrieval system may then order the identified documents based on their corresponding subtree relevances.	11-12-2009
20090313706	METHOD AND SYSTEM FOR DETECTING WHEN AN OUTGOING COMMUNICATION CONTAINS CERTAIN CONTENT - A method and system for detecting whether an outgoing communication contains confidential information or other target information is provided. The detection system is provided with a collection of documents that contain confidential information, referred to as “confidential documents.” When the detection system is provided with an outgoing communication, it compares the content of the outgoing communication to the content of the confidential documents. If the outgoing communication contains confidential information, then the detection system may prevent the outgoing communication from being sent outside the organization. The detection system detects confidential information based on the similarity between the content of an outgoing communication and the content of confidential documents that are known to contain confidential information.	12-17-2009
20090319883	Automatic Video Annotation through Search and Mining - Described is a technology in which a new video is automatically annotated based on terms mined from the text associated with similar videos. In a search phase, searching by one or more various search modalities (e.g., text, concept and/or video) finds a set of videos that are similar to a new video. Text associated with the new video and with the set of videos is obtained, such as by automatic speech recognition that generates transcripts. A mining mechanism combines the associated text of the similar videos with that of the new video to find the terms that annotate the new video. For example, the mining mechanism creates a new term frequency vector by combining term frequency vectors for the set of similar videos with a term frequency vector for the new video, and provides the mined terms by fitting a Zipf curve to the new term frequency vector.	12-24-2009
20090327237	WEB FORUM CRAWLING USING SKELETAL LINKS - A method and system for identifying informative links of a web site for use in crawling the web site is provided. A forum crawler analyzes sample web pages of a web forum to identify informative links and then crawls the web forum by following links determined to be informative and not following other links. The forum crawler system determines whether links are informative based on whether they are part of the overall structure of the web site or are used to select sequential information that has been split onto multiple web pages.	12-31-2009
20090327342	DENSITY-BASED CO-LOCATION PATTERN DISCOVERY - Described is using density to efficiently mine co-location patterns, such as closely located businesses frequently found together in business listing databases, geographic search logs, and/or GPS-based data. A data space of such information is geographically partitioned into a grid of cells, with dense cells scanned first. A dynamic upper bound of prevalence measure of co-location patterns is maintained during the scanning process. If the current upper bound is smaller than a threshold, the scanning is stopped, thereby significantly reducing the computation cost for processing many cells, while providing suitable results.	12-31-2009
20100010945	METHOD AND SYSTEM FOR WEB RESOURCE LOCATION CLASSIFICATION AND DETECTION - A method and system for identifying locations associated with a web resource is provided. The location system identifies three different types of geographic locations: a provider location, a content location, and a serving location. A provider location identifies the geographic location of the entity that provides the web resource. A content location identifies the geographic location that is the subject of the web resource. A serving location identifies the geographic scope that the web page reaches. An application can select to use the type of location that is of particular interest.	01-14-2010
20100023508	SEARCH ENGINE ENHANCEMENT USING MINED IMPLICIT LINKS - An implicit links enhancement system and method for search engines that generates implicit links obtained from mining user access logs to facilitate enhanced local searching of web sites and intranets. Embodiments of the implicit links search enhancement system and method include extracting implicit links by mining users' access patterns and then using a modified link analysis algorithm to re-rank search results obtained from traditional search engines. More specifically, embodiments of the method include extracting implicit links from a user access log, generating an implicit links graph from the extracted implicit links, and computing page rankings using the implicit links graph. The implicit links are extracted from the log using a two-item sequential pattern mining technique. Search results obtained from a search engine are re-ranked based on an implicit links analysis performed using an updated implicit links graph, a modified re-ranking formula, and at least one re-ranking technique.	01-28-2010
20100049772	EXTRACTION OF ANCHOR EXPLANATORY TEXT BY MINING REPEATED PATTERNS - A method and system for identifying explanatory text for a referenced web page based on a reference to the referenced web page contained in a repeated pattern of a referencing web page is provided. An anchor explanatory text (“AET”) system uses the hierarchical organization of the web page to identify a repeated pattern of hierarchical elements that contain references to other display pages. After the AET system identifies a repeated pattern, it identifies the dominant reference or anchor within each occurrence of the pattern. The AET system uses the explanatory text surrounding a dominant anchor as a description of the referenced web page.	02-25-2010
20100057798	METHOD AND SYSTEM FOR ADAPTING SEARCH RESULTS TO PERSONAL INFORMATION NEEDS - A method and system for adapting search results of a query to the information needs of the user submitting the query is provided. A search system analyzes click-through triplets indicating that a user submitted a query and that the user selected a document from the results of the query. To overcome the large size and sparseness of the click-through data, the search system when presented with an input triplet comprising a user, a query, and a document determines a probability that the user will find the input document important by smoothing the click-through triplets. The search system then orders documents of the result based on the probability of their importance to the input user.	03-04-2010
20100073372	METHOD AND SYSTEM FOR PROGRESSIVE IMAGE TRANSMISSION - A method and system for transmitting an image progressively is provided. The transmission system identifies a first region and a second region of the image. The transmission system also identifies a first resolution and a second resolution. The transmission system then transmits the image by transmitting, in the following order, the first region in the first resolution, the second region in the first resolution, the first region in the second resolution, and the second region in the second resolution. The transmission system may identify the regions based on the likelihood of being the focus of user attention.	03-25-2010
20100088647	USER INTERFACE FOR VIEWING CLUSTERS OF IMAGES - A method and system for providing a user interface for presenting images of clusters of an image search result is provided. The user interface system displays the search result in a cluster/view form using a cluster panel and a view panel. The cluster panel contains a cluster area for each cluster. The view panel may contain thumbnails of images of the search result in a list view or a mix view. When a user selects a cluster area from the cluster panel, the user interface system displays a list view of thumbnails for that cluster in the view panel. The user interface system may display a thumbnail list near a cluster area of the cluster panel. The thumbnail list contains mini-thumbnails of the images of the selected cluster. The user interface system may also display a detail view of an image in the view panel when a user selects an image.	04-08-2010
20100153292	Making Friend and Location Recommendations Based on Location Similarities - Method for making a recommendation to a first user in a computing network, including calculating one or more similarity scores between the first user and one or more remaining users in the network, identifying a portion of the remaining users having the highest similarity scores, identifying one or more locations visited by the portion of the remaining users but not by the first user, determining an interest level of the first user in each location, ranking the locations based on the interest levels, and displaying the locations based on the ranking as a first recommendation.	06-17-2010
20100169178	Advertising Method for Image Search - A method for advertising in response to an image search. One or more keywords may be received. The keywords may be for searching one or more images on the network. The images may be retrieved based on the keywords. One or more advertisements may be selected based on a first visual content of the images and a second visual content of the one or more advertisements. The one or more advertisements may be displayed.	07-01-2010
20100179759	Detecting Spatial Outliers in a Location Entity Dataset - Disclosed herein are one or more embodiments that arrange a plurality of location entities into a hierarchy of location descriptors. One or more of the disclosed embodiments may determine whether one of the location entities is a spatial outlier based at least in part on presence of one or more other location entities within a predetermined distance of the one location entity. Also, the other location entities and the one location entity may share a location descriptor.	07-15-2010
20100191686	Answer Ranking In Community Question-Answering Sites - In some implementations, a plurality of first questions and corresponding first answers are identified at a community question-answer (CQA) site as a plurality of first question-answer (q-a) pairs. A query thread comprised of a second question and a plurality of candidate second answers is selected for making a determination of answer quality. A set of the first questions that are similar to the second question are identified from the plurality of first questions. First linking features between the identified set of first questions and their corresponding first answers are used for determining an analogy with second linking features between the second question and candidate answers for ranking the candidate answers.	07-29-2010
20100205168	Thread-Based Incremental Web Forum Crawling - The incremental web forum crawling technique described herein is a web forum crawling technique that employs a thread-wise strategy that takes into account thread-level statistics, for example, the number of replies and the frequency of replies, to estimate the activity trend of each thread. To extract such statistical information, the technique employs a simple yet very robust approach to extract the timestamp of each post in a discussion thread. It also employs a regression model to predict the time of the next post for each thread.	08-12-2010
20100205176	Discovering City Landmarks from Online Journals - A blog-based city landmark discovery framework is described to discover and summarize popular scenes and their representative views from blog photos to provide online personalized tourist suggestions. First, a location extraction algorithm is implemented to infer geographical associations of blog photos from their contextual descriptors, thus providing the ability to harvest city scene photos from web blogs. Second, a visual-textual hierarchical clustering scheme is adopted to organize crawled photos into a scene-view structure, and a PhotoRank algorithm is presented to discover representative views within each scene by treating the representative photo selection problem as a popularity ranking problem in a visual correlation environment. Third, author, context and content issues are evaluated in a unified Landmark-HITS model to discover representative scenes as well as build author correlations. The author correlations further facilitate a collaborative filtering process for online personalized tourist suggestions based on an author's previous travel logs.	08-12-2010
20100211308	IDENTIFYING INTERESTING LOCATIONS - Interesting location identification embodiments are presented that generally involve identifying and providing the interesting locations found in a given geospatial region. This is accomplished by modeling the location histories of multiple individuals who traveled through the region of interest, and identifying interesting locations in the region based on the number of individuals visiting a location weighted in terms of the travel experience of those individuals. A prescribed number of the top most interesting locations in a specified region can be provided upon request. In addition, prescribed numbers of the top most popular travel sequences through the interesting locations and the top most experienced travelers in the specified region can be provided as well.	08-19-2010
20100211533	EXTRACTING STRUCTURED DATA FROM WEB FORUMS - The web forum data extraction technique is designed to extract structured data from web forums using both page-level information and site-level knowledge. To do this, the technique finds the kinds of page objects a forum site has, which object a page belongs to, and how different page objects are connected with each other. This information can be obtained by reconstructing the sitemap of the target forum, which is based on a Data Object Model of the target forum. The web forum data extraction technique collects three kinds of evidence for data extraction: 1) inner-page features which cover both semantic and layout information on an individual page; 2) inter-vertex features which describe linkage-related observations; and 3) inner-vertex features which characterize interrelationships among pages in one vertex. The technique employs Markov Logic Networks to combine the types of evidence statistically for inference and thereby can extract the desired structures.	08-19-2010
20100211927	WEBSITE DESIGN PATTERN MODELING - Website design pattern modeling technique embodiments are presented that model a website's design patterns. This can be based on the website's layout elements, its URL tokens, or both. When based on both, the design patterns can be modeled separately using first the layout elements and then the URL tokens, or vice versa. Alternately, the modeling can be based on coupled layout and URL token patterns. In operation, the modeling involves first identifying layout elements and/or URL tokens found on at least some of the pages of the website. The website design patterns are then modeled based on the occurrences of the identified layout elements and/or URL tokens in pages of the website. In cases where a coupled modeling scheme is employed, a modeling technique that exploits the correlations between the layout elements and URL tokens is used.	08-19-2010
20100281009	HIERARCHICAL CONDITIONAL RANDOM FIELDS FOR WEB EXTRACTION - A method and system for labeling object information of an information page is provided. A labeling system identifies an object record of an information page based on the labeling of object elements within an object record and labels object elements based on the identification of an object record that contains the object elements. To identify the records and label the elements, the labeling system generates a hierarchical representation of blocks of an information page. The labeling system identifies records and elements within the records by propagating probability-related information of record labels and element labels through the hierarchy of the blocks. The labeling system generates a feature vector for each block to represent the block and calculates a probability of a label for a block being correct based on a score derived from the feature vectors associated with related blocks. The labeling system searches for the labeling of records and elements that has the highest probability of being correct.	11-04-2010
20110072047	Interest Learning from an Image Collection for Advertising - Described herein is a technology that facilitates learning interests for advertising based on automated analysis of images. In several embodiments a person's interests are automatically learned based on the person's photographs for targeted advertising. Techniques are described that facilitate automatically detecting a user's interest from images and suggesting user-targeted ads. As described herein, these techniques include computer-annotating images with learned tags, performing topic learning to obtain an interest model, and performing advertisement matching and ranking based on the interest model.	03-24-2011
20110078159	Long-Query Retrieval - Described herein is a technology that facilitates efficient large-scale similarity-based retrieval. In several embodiments, documents, images, and/or other multimedia files are compactly represented and efficiently indexed to enable robust search using a long query in a large-scale corpus. As described herein, these techniques include performing decomposition of a file, e.g., a document or document-like representation. The techniques use dimension reduction to obtain three parts: topic-related words (major semantics), document-specific words (minor semantics), and background words. The major semantics are represented in a feature vector and the minor semantics as keywords. Using the techniques described, file vectors are matched in a topic model and the results are ranked based on the keywords.	03-31-2011
20110087660	SCORING RELEVANCE OF A DOCUMENT BASED ON IMAGE TEXT - A method and system for determining relevance of a document having text and images to a text string is provided. A scoring system identifies image text associated with an image of the document. The scoring system calculates an image score indicating relevance of the image text to the text string. The image score may be used in many applications, such as searching, summary generation, document classification, image search, and image classification.	04-14-2011
20110194780	OBJECT SIMILARITY SEARCH IN HIGH-DIMENSIONAL VECTOR SPACES - An object search system generates a hierarchical clustering of objects of a collection based on similarity of the objects. The object search system generates a separate hierarchical clustering of objects for multiple features of the objects. To identify objects similar to a target object, the object search system first generates a feature vector for the target object. For each feature of the feature vector, the object search system uses the hierarchical clustering of objects to identify the cluster of objects that is most “feature similar” to that feature of the target object. The object search system indicates the similarity of each candidate object based on the features for which the candidate object is similar.	08-11-2011
20110264658	WEB OBJECT RETRIEVAL BASED ON A LANGUAGE MODEL - A method and system is provided for determining relevance of an object to a term based on a language model. The relevance system provides records extracted from web pages that relate to the object. To determine the relevance of the object to a term, the relevance system first determines, for each record of the object, a probability of generating that term using a language model of the record of that object. The relevance system then calculates the relevance of the object to the term by combining the probabilities. The relevance system may also weight the probabilities based on the accuracy or reliability of the extracted information for each data source.	10-27-2011
20110264659	TRAINING A RANKING FUNCTION USING PROPAGATED DOCUMENT RELEVANCE - A method and system for propagating the relevance of labeled documents to a query to unlabeled documents is provided. The propagation system provides training data that includes queries, documents labeled with their relevance to the queries, and unlabeled documents. The propagation system then calculates the similarity between pairs of documents in the training data. The propagation system then propagates the relevance of the labeled documents to similar, but unlabeled, documents. The propagation system may iteratively propagate labels of the documents until the labels converge on a solution. The training data with the propagated relevances can then be used to train a ranking function.	10-27-2011
20110270821	ADDING DOMINANT MEDIA ELEMENTS TO SEARCH RESULTS - A method and system for determining dominance of the media elements of display pages is provided. The dominance system provides a scoring mechanism for scoring the dominance of media elements of display pages based on features of each media element of the display page. To generate the scores for the media elements of the display page, the dominance system first identifies the media elements and then identifies the features of the media elements. The dominance system then scores the identified media elements using the provided scoring mechanism and the identified features.	11-03-2011
20110282798	Making Friend and Location Recommendations Based on Location Similarities - Method for making a recommendation to a first user in a computing network, including calculating one or more similarity scores between the first user and one or more remaining users in the network, identifying a portion of the remaining users having the highest similarity scores, identifying one or more locations visited by the portion of the remaining users but not by the first user, determining an interest level of the first user in each location, ranking the locations based on the interest levels, and displaying the locations based on the ranking as a first recommendation.	11-17-2011
20110295775	ASSOCIATING MEDIA WITH METADATA OF NEAR-DUPLICATES - Techniques for identifying near-duplicates of a media object and associating metadata of the near-duplicates with the media object are described herein. One or more devices implementing the techniques are configured to identify the near duplicates based at least on similarity attributes included in the media object. Metadata is then extracted from the near-duplicates and is associated with the media object as descriptors of the media object to enable discovery of the media object based on the descriptors.	12-01-2011
20110307425	ORGANIZING SEARCH RESULTS - Many users make use of search engines to locate desired internet content by submitting search queries. For example, a user may search for photos, applications, websites, videos, documents, and/or information regarding people, places, and things. Unfortunately, search engines may provide a plethora of information that a user may be left to sift through to find relevant content. Accordingly, one or more systems and/or techniques for organizing search results are disclosed herein. In particular, user generated content, such as photos, may be retrieved based upon a search query. The user generated content may be grouped into clusters of user generated content having similar features. Search results of the search query may be obtained and organized based upon comparing the search results with the clusters. The organized search results and/or a table of contents comprising the clusters may be presented to provide an enhanced user experience.	12-15-2011
20110307436	PATTERN TREE-BASED RULE LEARNING - A pattern tree is constructed based on a plurality of key-value pairs representing portions of a data set. In some implementations, the pattern tree may be used for learning one or more rules for interacting with a source of the data set.	12-15-2011
20110314013	RANKING ADVERTISEMENTS - While browsing, a user may interact with a wide variety of images. The user may upload and share images taken with a digital camera and/or search for images using a search engine. Because images are rich in contextual information, it may be advantageous to provide additional information, such as adjacent market advertising based upon matching advertisements with contextual information of the images. Accordingly, a query image may be used to retrieve a video frame set. The video frame set may be expanded with related video frames, which may comprise video frames correlating to adjacent markets. The expanded video frame set may be grouped into clusters of similar frames. The clusters may be used to rank advertisements based upon how similar the advertisements are to the clusters and/or video frames within the clusters. In this way, one or more ranked advertisements may be presented with the query image.	12-22-2011
20120005565	Small Form Factor Web Browsing - A large web page is analyzed and partitioned into smaller sub-pages so that a user can navigate the web page on a small form factor device. The user can browse the sub-pages to find and read information in the content of the large web page. The partitioning can be performed at a web server, an edge server, at the small form factor device, or can be distributed across one or more such devices. The analysis leverages design habits of a web page author to extract a representation structure of an authored web page. The extracted representation structure includes high level structure using several markup language tag selection rules and low level structure using visual boundary detection in which visual units of the low level structure are provided by clustering markup language tags. User viewing habits can be learned to display favorite parts of a web page.	01-05-2012
20120093371	GENERATING SEARCH REQUESTS FROM MULTIMODAL QUERIES - A method and system for generating a search request from a multimodal query that includes a query image and query text is provided. The multimodal query system identifies images of a collection that are textually related to the query image based on similarity between words associated with each image and the query text. The multimodal query system then selects those images of the identified images that are visually related to the query image. The multimodal query system may formulate a search request based on keywords of web pages that contain the selected images and submit that search request to a search engine service.	04-19-2012
20120109950	METHOD AND SYSTEM FOR CALCULATING IMPORTANCE OF A BLOCK WITHIN A DISPLAY PAGE - A method and system for identifying the importance of information areas of a display page is provided. An importance system identifies information areas or blocks of a web page. A block of a web page represents an area of the web page that appears to relate to a similar topic. The importance system provides the characteristics or features of a block to an importance function that generates an indication of the importance of that block to its web page. The importance system “learns” the importance function by generating a model based on the features of blocks and the user-specified importance of those blocks. To learn the importance function, the importance system asks users to provide an indication of the importance of blocks of web pages in a collection of web pages.	05-03-2012
20120114197	BUILDING A PERSON PROFILE DATABASE - Names of entities, such as people, in an image may be identified automatically. Visually similar images of entities are retrieved, including text proximate to the visually similar images. The collected text is mined for names of entities, and the detected names are analyzed. A name may be associated with the entity in the image, based on the analysis.	05-10-2012
20120117052	WEB FORUM CRAWLING USING SKELETAL LINKS - A method and system for identifying informative links of a web site for use in crawling the web site is provided. A forum crawler analyzes sample web pages of a web forum to identify informative links and then crawls the web forum by following links determined to be informative and not following other links. The forum crawler system determines whether links are informative based on whether they are part of the overall structure of the web site or are used to select sequential information that has been split onto multiple web pages.	05-10-2012
20120125178	SCALABLE MUSIC RECOMMENDATION BY SEARCH - An exemplary method includes providing a music collection of a particular scale, determining a distance parameter for locality sensitive hashing based at least in part on the scale of the music collection and constructing an index for the music collection. Another exemplary method includes providing a song, extracting snippets from the song, analyzing time-varying timbre characteristics of the snippets and constructing one or more queries based on the analyzing. Such exemplary methods may be implemented by a portable device configured to maintain an index, to perform searches based on selected songs or portions of songs and to generate playlists from search results. Other exemplary methods, devices, systems, etc., are also disclosed.	05-24-2012
20120253899	TABLE APPROACH FOR DETERMINING QUALITY SCORES - Some implementations construct a quality score table based on historic data collected for a plurality of ad-keyword pairs. An ad-keyword pair may be selected for determining a quality score. One or more advertisement parameters may be determined for the selected ad-keyword pair. Based on the one or more advertisement parameters, the quality score for the selected ad-keyword pair may be determined from the quality score table. In some implementations, the quality score table is constructed by iteratively cutting a directed graph representing the advertisement parameters and the historic data. Further, in some implementations, the table may be smoothed using a smoothing operation.	10-04-2012
20120253927	MACHINE LEARNING APPROACH FOR DETERMINING QUALITY SCORES - Some implementations generate a mapping function using one or more historic performance indicators for a set of ad-keyword pairs and one or more advertisement metrics extracted from the set of ad-keyword pairs. The mapping function may be applied to map one or more advertisement metrics of a particular ad-keyword pair to determine a quality score for the particular ad-keyword pair. For example, the quality score may be used when determining whether to select an advertisement for display or may be provided as feedback to an advertiser. Additionally, in some implementations, the mapping function may be applied to determine a quality score for a new ad-keyword pair that has not yet accumulated historic information.	10-04-2012
20120253945	BID TRAFFIC ESTIMATION - Some implementations provide techniques for estimating impression numbers. For example, a log of advertisement bidding data may be used to generate and train an impression estimation model. In some implementations, an impression estimation component may use a boost regression technique to determine a predicted impression value range based on a proposed bid received from an advertiser. For example, the predicted impression value range may be determined based on a predicted estimation error. Additionally, in some instances, the predicted impression value range may be evaluated using one or more evaluation metrics.	10-04-2012
20120296897	Text to Image Translation - Techniques are described for online real time text to image translation suitable for virtually any submitted query. Semantic classes and associated analogous items for each of the semantic classes are determined for the submitted query. One or more requests are formulated that are associated with analogous items. The requests are used to obtain web based images and associated surrounding text. The web based images are used to obtain associated near-duplicate images. The surrounding text of images is analyzed to create high-quality text associated with each semantic class of the submitted query. One or more query dependent classifiers are trained online in real time to remove noisy images. A scoring function is used to score the images. The images with the highest score are returned as a query response.	11-22-2012
20130226693	Allocating Deals to Visitors in a Group-Buying Service - Functionality is described herein for allocating group-buying deals in a group-buying service. In certain implementations, the functionality operates by receiving deal information from deal-providing entities (such as merchants). The deal information describes plural deals. The functionality then assigns a number of impressions to each deal so as to maximize revenue provided to an entity which administers the group-buying service. This yields allocation information. The functionality then presents deals to users in accordance with the allocation information. For example, if the allocated number of impressions for a certain deal is x, then the functionality will provide x opportunities for users to select this deal.	08-29-2013
20130246167	Cost-Per-Action Model Based on Advertiser-Reported Actions - According to a cost-per-action advertising model, advertisers submit ads with cost-per-action bids. Ad auctions are conducted and winning ads are returned with contextually relevant search results. Each time a winning ad is selected by a user, resulting in the user being redirected to a website associated with the advertiser, a selected impression and a price is recorded for the winning ad. Periodically, an advertiser submits a report indicating a number of actions attributed to the ads that have occurred through the advertiser website. The advertiser is then charged a fee for each reported action based on the recorded prices for the winning ads and based on the number of selected impressions recorded for the winning ads.	09-19-2013
20130287302	IDENTIFICATION OF DUPLICATES WITHIN AN IMAGE SPACE - Implementations for identifying duplicate images in an image space are described. An image space is partitioned into a plurality of coarse clusters based on signatures of the images within the image space. The signatures are determined from compact descriptors of the images. Refined clusters that include one or more images of an individual coarse cluster are created based on pair-wise comparisons of the compact descriptors of images in the coarse cluster, and the refined clusters are identified as sets of duplicate images. The refined clusters are grown by searching in similar coarse clusters for images to add to the refined clusters.	10-31-2013
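
Illustrative sketch for entry 20100153292 (Making Friend and Location Recommendations Based on Location Similarities). The abstract lists the steps (score user similarity, keep the most similar users, collect locations they visited that the first user has not, estimate interest, rank) but does not say how similarity or interest is computed. The Python below is only one possible reading, not the method claimed in the application: cosine similarity over visit counts, the similarity-weighted interest sum, the function names, and the sample data are all assumptions made for illustration.

```python
from math import sqrt


def cosine_similarity(a, b):
    """Cosine similarity between two visit-count dicts {location: visits}.

    Assumption: the abstract does not name a similarity measure.
    """
    dot = sum(a[loc] * b[loc] for loc in set(a) & set(b))
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def recommend_locations(visits, user, k=2):
    """Rank locations the user has not visited (hypothetical helper)."""
    # Step 1: similarity score between the first user and each remaining user.
    scores = {
        other: cosine_similarity(visits[user], visits[other])
        for other in visits
        if other != user
    }
    # Step 2: keep the k remaining users with the highest similarity scores.
    neighbours = sorted(scores, key=scores.get, reverse=True)[:k]
    # Steps 3-4: for locations visited by those users but not by the first
    # user, estimate interest as a similarity-weighted visit count (assumed).
    interest = {}
    for other in neighbours:
        for loc, count in visits[other].items():
            if loc not in visits[user]:
                interest[loc] = interest.get(loc, 0.0) + scores[other] * count
    # Step 5: rank by interest, highest first (ties broken by name).
    return sorted(interest, key=lambda loc: (-interest[loc], loc))


# Made-up sample data: user -> {location: number of visits}.
visits = {
    "alice": {"museum": 3, "park": 1},
    "bob": {"museum": 2, "park": 1, "cafe": 4},
    "carol": {"museum": 1, "harbor": 2},
    "dave": {"stadium": 5},
}

print(recommend_locations(visits, "alice", k=2))
```

With this sample data, "bob" and "carol" are the two users most similar to "alice", so only locations they visited that "alice" has not are candidates for recommendation.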
