Patent application number | Description | Published |
20080228928 | Multimedia content filtering - Methods and apparatuses to filter multimedia content are described. The multimedia content in one embodiment is analyzed for one or more parameters. The multimedia content in one embodiment is filtered based on the one or more parameters using a latent semantic mapping (“LSM”) filter. In one embodiment, the one or more parameters include information about a structure of the multimedia content. A tag that encapsulates the one or more parameters may be generated. Then, the tag is input into the latent semantic mapping filter. In one embodiment, the LSM filter is trained to recognize the multimedia content based on the one or more parameters. In one embodiment, more than two categories are provided for a multimedia content. The multimedia content is classified in more than two categories using the LSM filter. The multimedia content may be blocked based on the classifying. | 09-18-2008 |
20100082327 | SYSTEMS AND METHODS FOR MAPPING PHONEMES FOR TEXT TO SPEECH SYNTHESIS - Algorithms for synthesizing speech used to identify media assets are provided. Speech may be selectively synthesized form text strings associated with media assets. A text string may be normalized and its native language determined for obtaining a target phoneme for providing human-sounding speech in a language (e.g., dialect or accent) that is familiar to a user. The algorithms may be implemented on a system including several dedicated render engines. The system may be part of a back end coupled to a front end including storage for media assets and associated synthesized speech, and a request processor for receiving and processing requests that result in providing the synthesized speech. The front end may communicate media assets and associated synthesized speech content over a network to host devices coupled to portable electronic devices on which the media assets and synthesized speech are played back. | 04-01-2010 |
20100082344 | SYSTEMS AND METHODS FOR SELECTIVE RATE OF SPEECH AND SPEECH PREFERENCES FOR TEXT TO SPEECH SYNTHESIS - Algorithms for synthesizing speech used to identify media assets are provided. Speech may be selectively synthesized form text strings associated with media assets. A text string may be normalized and its native language determined for obtaining a target phoneme for providing human-sounding speech in a language (e.g., dialect or accent) that is familiar to a user. The algorithms may be implemented on a system including several dedicated render engines. The system may be part of a back end coupled to a front end including storage for media assets and associated synthesized speech, and a request processor for receiving and processing requests that result in providing the synthesized speech. The front end may communicate media assets and associated synthesized speech content over a network to host devices coupled to portable electronic devices on which the media assets and synthesized speech are played back. | 04-01-2010 |
20100082348 | SYSTEMS AND METHODS FOR TEXT NORMALIZATION FOR TEXT TO SPEECH SYNTHESIS - Algorithms for synthesizing speech used to identify media assets are provided. Speech may be selectively synthesized form text strings associated with media assets. A text string may be normalized and its native language determined for obtaining a target phoneme for providing human-sounding speech in a language (e.g., dialect or accent) that is familiar to a user. The algorithms may be implemented on a system including several dedicated render engines. The system may be part of a back end coupled to a front end including storage for media assets and associated synthesized speech, and a request processor for receiving and processing requests that result in providing the synthesized speech. The front end may communicate media assets and associated synthesized speech content over a network to host devices coupled to portable electronic devices on which the media assets and synthesized speech are played back. | 04-01-2010 |
20100082349 | SYSTEMS AND METHODS FOR SELECTIVE TEXT TO SPEECH SYNTHESIS - Algorithms for synthesizing speech used to identify media assets are provided. Speech may be selectively synthesized form text strings associated with media assets. A text string may be normalized and its native language determined for obtaining a target phoneme for providing human-sounding speech in a language (e.g., dialect or accent) that is familiar to a user. The algorithms may be implemented on a system including several dedicated render engines. The system may be part of a back end coupled to a front end including storage for media assets and associated synthesized speech, and a request processor for receiving and processing requests that result in providing the synthesized speech. The front end may communicate media assets and associated synthesized speech content over a network to host devices coupled to portable electronic devices on which the media assets and synthesized speech are played back. | 04-01-2010 |
20110112825 | SENTIMENT PREDICTION FROM TEXTUAL DATA - A semantically organized domain space is created from a training corpus. Affective data are mapped onto the domain space to generate affective anchors for the domain space. A sentiment associated with an input text is determined based the affective anchors. A speech output may be generated from the input text based on the determined sentiment. | 05-12-2011 |
20140088964 | Exemplar-Based Latent Perceptual Modeling for Automatic Speech Recognition - Methods, systems, and computer-readable media related to selecting observation-specific training data (also referred to as “observation-specific exemplars”) from a general training corpus, and then creating, from the observation-specific training data, a focused, observation-specific acoustic model for recognizing the observation in an output domain are disclosed. In one aspect, a global speech recognition model is established based on an initial set of training data; a plurality of input speech segments to be recognized in an output domain are received; and for each of the plurality of input speech segments: a respective set of focused training data relevant to the input speech segment is identified in the global speech recognition model; a respective focused speech recognition model is generated based on the respective set of focused training data; and the respective focused speech recognition model is provided to a recognition device for recognizing the input speech segment in the output domain. | 03-27-2014 |
Patent application number | Description | Published |
20110004475 | METHODS AND APPARATUSES FOR AUTOMATIC SPEECH RECOGNITION - Exemplary embodiments of methods and apparatuses for automatic speech recognition are described. First model parameters associated with a first representation of an input signal are generated. The first representation of the input signal is a discrete parameter representation. Second model parameters associated with a second representation of the input signal are generated. The second representation of the input signal includes a continuous parameter representation of residuals of the input signal. The first representation of the input signal includes discrete parameters representing first portions of the input signal. The second representation includes discrete parameters representing second portions of the input signal that are smaller than the first portions. Third model parameters are generated to couple the first representation of the input signal with the second representation of the input signal. The first representation and the second representation of the input signal are mapped into a vector space. | 01-06-2011 |
20120011124 | UNSUPERVISED DOCUMENT CLUSTERING USING LATENT SEMANTIC DENSITY ANALYSIS - According to one embodiment, a latent semantic mapping (LSM) space is generated from a collection of a plurality of documents, where the LSM space includes a plurality of document vectors, each representing one of the documents in the collection. For each of the document vectors considered as a centroid document vector, a group of document vectors is identified in the LSM space that are within a predetermined hypersphere diameter from the centroid document vector. As a result, multiple groups of document vectors are formed. The predetermined hypersphere diameter represents a predetermined closeness measure among the document vectors in the LSM space. Thereafter, a group from the plurality of groups is designated as a cluster of document vectors, where the designated group contains a maximum number of document vectors among the plurality of groups. | 01-12-2012 |
20120308138 | Multi-resolution spatial feature extraction for automatic handwriting recognition - A first technique of recognizing content is disclosed, including: determining a first value representative of a pixel content present at a first set of pixels associated with a first distance from a pixel under consideration; determining a second value representative of a pixel content present at a second set of pixels associated with a second distance from the pixel under consideration; and using the first and second values to compute one or more spatial features associated with the pixel under consideration for purposes of content recognition. A second technique of recognizing content is also disclosed, including: determining, for a pixel, a first value representative of a first feature associated with a set of pixels associated with a first direction from the pixel; and determining, for the pixel, a second value representative of a second feature associated with a set of pixels associated with a second direction from the pixel. | 12-06-2012 |
20120308143 | Integrating feature extraction via local sequential embedding for automatic handwriting recognition - Integrating features is disclosed, including: determining a value associated with a temporal feature for a point; determining a value associated with a spatial feature associated with the temporal feature; including the value associated with a spatial feature and the value associated with the temporal feature into a feature vector; and using the feature vector to decode for a character. Determining a transform is also disclosed, including: determining, for a point associated with a sequence of points, a set of points including: the point, a first subset of points of the sequence preceding a sequence position associated with the point, and a second subset of points following the sequence position associated with the point; and determining the transform associated with the point based at least in part on the set of points. | 12-06-2012 |
20130311487 | SEMANTIC SEARCH USING A SINGLE-SOURCE SEMANTIC MODEL - Techniques for providing semantic search of a data store are disclosed. A similarity metric of a document comprising the data store to a concept represented in a semantic model derived at least in part from a reference source that includes content not included in the data store is determined. A relevance metric of a search query to the concept is computed. The similarity metric and the relevance metric are used to determine, at least in part, a ranking of the document with respect to the search query. | 11-21-2013 |
20140195237 | FAST, LANGUAGE-INDEPENDENT METHOD FOR USER AUTHENTICATION BY VOICE - A method and system for training a user authentication by voice signal are described. In one embodiment, a set of feature vectors are decomposed into speaker-specific recognition units. The speaker-specific recognition units are used to compute distribution values to train the voice signal. In addition, spectral feature vectors are decomposed into speaker-specific characteristic units which are compared to the speaker-specific distribution values. If the speaker-specific characteristic units are within a threshold limit of the speaker-specific distribution values, the speech signal is authenticated. | 07-10-2014 |
20140324435 | COMBINED STATISTICAL AND RULE-BASED PART-OF-SPEECH TAGGING FOR TEXT-TO-SPEECH SYNTHESIS - In response to a word of a text sequence, a first part-of-speech (POS) tag is generated using a statistical part-of-speech (POS) tagger based on a corpus of trained text sequences, each representing a likely POS of a word for a given text sequence. A second POS tag is generated using a rule-based POS tagger based on a set of one or more rules associated with a type of an application associated with the text sequence. A final POS tag is assigned to the word of the text sequence for TTS synthesis based on the first POS tag and the second POS tag. | 10-30-2014 |
20140361983 | REAL-TIME STROKE-ORDER AND STROKE-DIRECTION INDEPENDENT HANDWRITING RECOGNITION - Methods, systems, and computer-readable media related to a technique for providing handwriting input functionality on a user device. A handwriting recognition module is trained to have a repertoire comprising multiple non-overlapping scripts and capable of recognizing tens of thousands of characters using a single handwriting recognition model. The handwriting input module provides real-time, stroke-order and stroke-direction independent handwriting recognition for multi-character handwriting input. In particular, real-time, stroke-order and stroke-direction independent handwriting recognition is provided for multi-character, or sentence level Chinese handwriting recognition. User interfaces for providing the handwriting input functionality are also disclosed. | 12-11-2014 |
20140363074 | MULTI-SCRIPT HANDWRITING RECOGNITION USING A UNIVERSAL RECOGNIZER - Methods, systems, and computer-readable media related to a technique for providing handwriting input functionality on a user device. A handwriting recognition module is trained to have a repertoire comprising multiple non-overlapping scripts and capable of recognizing tens of thousands of characters using a single handwriting recognition model. The handwriting input module provides real-time, stroke-order and stroke-direction independent handwriting recognition. User interfaces for providing the handwriting input functionality are also disclosed. | 12-11-2014 |
20140363082 | INTEGRATING STROKE-DISTRIBUTION INFORMATION INTO SPATIAL FEATURE EXTRACTION FOR AUTOMATIC HANDWRITING RECOGNITION - Methods, systems, and computer-readable media related to a technique for providing handwriting input functionality on a user device. A handwriting recognition module is trained to have a repertoire comprising multiple non-overlapping scripts and capable of recognizing tens of thousands of characters using a single handwriting recognition model. The handwriting input module provides real-time, stroke-order and stroke-direction independent handwriting recognition. In some embodiments, temporally-derived features are used to improve recognition accuracy without compromising the stroke-order and stroke-direction independence of the recognition system. | 12-11-2014 |
20140363083 | MANAGING REAL-TIME HANDWRITING RECOGNITION - Methods, systems, and computer-readable media related to a technique for providing handwriting input functionality on a user device. A handwriting recognition module is trained to have a repertoire comprising multiple non-overlapping scripts and capable of recognizing tens of thousands of characters using a single handwriting recognition model. The handwriting input module provides real-time, stroke-order and stroke-direction independent handwriting recognition for multi-character handwriting input. In particular, real-time, stroke-order and stroke-direction independent handwriting recognition is provided for multi-character, or sentence level Chinese handwriting recognition. User interfaces for providing the handwriting input functionality are also disclosed. | 12-11-2014 |
20140365880 | UNIFIED RANKING WITH ENTROPY-WEIGHTED INFORMATION FOR PHRASE-BASED SEMANTIC AUTO-COMPLETION - Methods, systems, and computer-readable media related to a technique for combining two or more aspects of predictive information for auto-completion of user input, in particular, user commands directed to an intelligent digital assistant. Specifically, predictive information based on (1) usage frequency, (2) usage recency, and (3) semantic information encapsulated in an ontology (e.g., a network of domains) implemented by the digital assistant, are integrated in a balanced and sensible way within a unified framework, such that a consistent ranking of all completion candidates across all domains may be achieved. Auto-completions are selected and presented based on the unified ranking of all completion candidates. | 12-11-2014 |
20140365949 | MANAGING REAL-TIME HANDWRITING RECOGNITION - Methods, systems, and computer-readable media related to a technique for providing handwriting input functionality on a user device. A handwriting recognition module is trained to have a repertoire comprising multiple non-overlapping scripts and capable of recognizing tens of thousands of characters using a single handwriting recognition model. The handwriting input module provides real-time, stroke-order and stroke-direction independent handwriting recognition for multi-character handwriting input. In particular, real-time, stroke-order and stroke-direction independent handwriting recognition is provided for multi-character, or sentence level Chinese handwriting recognition. User interfaces for providing the handwriting input functionality are also disclosed. | 12-11-2014 |