Jun Wu, Saratoga, CA US

Patent application number	Description	Published
20090055168	Word Detection - Methods, systems, and apparatus, including computer program products, in which data from web documents are partitioned into a training corpus and a development corpus are provided. First word probabilities for words are determined for the training corpus, and second word probabilities for the words are determined for the development corpus. Uncertainty values based on the word probabilities for the training corpus and the development corpus are compared, and new words are identified based on the comparison.	02-26-2009
20090055381	Domain Dictionary Creation - Methods, systems, and apparatus, including computer program products, to identify topic words in a document corpus that includes topic documents related to a topic are disclosed. A reference topic word divergence value based on the document corpus and the topic document corpus is determined. A candidate topic word divergence value for a candidate topic word is determined based on the document corpus and the topic document corpus. The candidate topic word is determined to be a topic word if the candidate topic word divergence value is greater than the reference topic word divergence value.	02-26-2009
20090070097	USER INPUT CLASSIFICATION - Systems and methods of classifying user input are disclosed. The user input can be, for example, in the form of Roman characters. An ambiguous word (e.g., a word that is a non-pinyin word written in Roman characters and a valid pinyin word) can be identified in the user input. Contextual words (e.g., words adjacent to the ambiguous word) are classified as a pinyin context or a non-pinyin context. The ambiguous word is classified based on the context of the contextual words.	03-12-2009
20090077037	SUGGESTING ALTERNATIVE QUERIES IN QUERY RESULTS - Methods, systems, and apparatus, including computer program products, for suggesting alternative queries based on original query search results. In one aspect, a method includes receiving search results for a first query, where each search result refers to a respective resource and includes a snippet of content from the respective resource, receiving one or more suggested second queries, for each of the suggested second queries: selecting a set of words in one of the snippets to represent the suggested second query, associating the suggested second query with the set so that a user can interact with a word in the set to invoke the suggested second query, and marking the set so as to indicate that the user can interact with a word in the set to invoke the suggested second query, and transmitting the search results including each marked set to a client device for presentation to the user.	03-19-2009
20100005086	RESOURCE LOCATOR SUGGESTIONS FROM INPUT CHARACTER SEQUENCE - Methods, systems, and apparatus, including computer program products, in which an input method editor receives Roman character inputs, identifies keywords for candidate sets of a non-Roman character, and identifies an associated resource location. Upon identifying an associated resource location, associating the resource location with the candidate set of non-Roman characters.	01-07-2010
20100180199	DETECTING NAME ENTITIES AND NEW WORDS - Various aspects can be implemented for detecting name entities and/or new words from input entries. In general, one aspect can be a method that includes receiving an input entry comprising a text string. The method also includes identifying segmentation information from the input entry. The method further includes generating a candidate text string from the text string of the input entry based on the segmentation information. Other implementations of this aspect include corresponding systems, apparatus, and processing engines.	07-15-2010
20100306139	CJK NAME DETECTION - Aspects directed to name detection are provided. A method includes generating a raw name detection model using a collection of family names and an annotated corpus including a collection of n-grams, each n-gram having a corresponding probability of occurring. The method includes applying the raw name detection model to a collection of semi-structured data to form annotated semi-structured data identifying n-grams identifying names and n-grams not identifying names and applying the raw name detection model to a large unannotated corpus to form a large annotated corpus data identifying n-grams of the large unannotated corpus identifying names and n-grams not identifying names. The method includes generating a name detection model, including deriving a name model using the annotated semi-structured data identifying names and the large annotated corpus data identifying names, deriving a not-name model using the semi-structured data not identifying names, and deriving a language model using the large annotated corpus.	12-02-2010
20110022952	Determining Proximity Measurements Indicating Respective Intended Inputs - Determination of proximity measurements indicative of respective intended inputs is disclosed. User inputs are received, where each user input is one of a predefined plurality of inputs that each map to multiple characters in a language. Rates of user selections of candidates decoded from the user inputs into the language are received, where each of the candidates includes one or more characters in the language. User inputs for the candidates having low rates of selection as non-selected user inputs are identified. User inputs for the candidates having high rates of selection as intended inputs are identified. The intended user inputs to the non-selected user inputs are compared to identify one or more misspelled input and intended input pairs. A proximity measurement for each misspelled input and intended input pair is determined based on a ratio of the number of times corresponding candidates for the misspelled input were not selected to the number of times the misspelled input was entered.	01-27-2011
20110137642	Word Detection - Methods, systems, and apparatus, including computer program products, in which data from web documents are partitioned into a training corpus and a development corpus are provided. First word probabilities for words are determined for the training corpus, and second word probabilities for the words are determined for the development corpus. Uncertainty values based on the word probabilities for the training corpus and the development corpus are compared, and new words are identified based on the comparison.	06-09-2011
20110238413	DOMAIN DICTIONARY CREATION - Methods, systems, and apparatus, including computer program products, to identify topic words in a collection of documents that includes topic documents related to a topic are disclosed. A reference topic word divergence value based on a document collection and the topic document collection is determined. A candidate topic word divergence value for a candidate topic word is determined based on the document collection and the topic document collection. The candidate topic word is determined to be a topic word if the candidate topic word divergence value is greater than the reference topic word divergence value.	09-29-2011
20110296374	CUSTOM LANGUAGE MODELS - Systems, methods, and apparatuses including computer program products for generating a custom language model. In one implementation, a method is provided. The method includes receiving a collection of documents; clustering the documents into one or more clusters; generating a cluster vector for each cluster of the one or more clusters; generating a target vector associated with a target profile; comparing the target vector with each of the cluster vectors; selecting one or more of the one or more clusters based on the comparison; and generating a language model using documents from the one or more selected clusters.	12-01-2011
20130103696	Suggesting and Refining User Input Based on Original User Input - Systems and methods to generate modified/refined user inputs based on the original user input, such as a search query, are disclosed. The method may be implemented for Roman-based and/or non-Roman based language such as Chinese. The method may generally include receiving an original user input and identifying core terms therein, determining potential alternative inputs by replacing core term(s) in the original input with another term according to a similarity matrix and/or substituting a word sequence in the original input with another word sequence according to an expansion/contraction table where one word sequence is a substring of the other, computing likelihood of each potential alternative input, and selecting most likely alternative inputs according to a predetermined criteria, e.g., likelihood of the alternative input being at least that of the original input. A cache containing pre-computed original user inputs and corresponding alternative inputs may be provided.	04-25-2013
20140012839	SUGGESTING ALTERNATIVE QUERIES IN QUERY RESULTS - Methods, systems, and apparatus, including computer program products, for suggesting alternative queries based on original query search results. In one aspect, a method includes receiving search results for a first query, where each search result refers to a respective resource and includes a snippet of content from the respective resource, receiving one or more suggested second queries, for each of the suggested second queries: selecting a set of words in one of the snippets to represent the suggested second query, associating the suggested second query with the set so that a user can interact with a word in the set to invoke the suggested second query, and marking the set so as to indicate that the user can interact with a word in the set to invoke the suggested second query, and transmitting the search results including each marked set to a client device for presentation to the user.	01-09-2014
20140258892	RESOURCE LOCATOR SUGGESTIONS FROM INPUT CHARACTER SEQUENCE - Methods, systems, and apparatus, including computer program products, in which an input method editor receives Roman character inputs, identifies keywords for candidate sets of a non-Roman character, and identifies an associated resource location. Upon identifying an associated resource location, associating the resource location with the candidate set of non-Roman characters.	09-11-2014

Patent applications by Jun Wu, Saratoga, CA US