Patent application number | Description | Published |
20100145687 | REMOVING NOISE FROM SPEECH - Method for removing noise from a digital speech waveform, including receiving the digital speech waveform having the noise contained therein, segmenting the digital speech waveform into one or more frames, each frame having a clean portion and a noisy portion, extracting a feature component from each frame, creating an nonlinear speech distortion model from the feature components, creating a statistical noise model by making a Piecewise Linear Approximation (PLA) of the nonlinear speech distortion model, determining the clean portion of each frame using the statistical noise model, a log power spectra of each frame, and a model of a digital speech waveform recorded in a noise controlled environment, and constructing a clean digital speech waveform from each clean portion of each frame. | 06-10-2010 |
20100239168 | SEMI-TIED COVARIANCE MODELLING FOR HANDWRITING RECOGNITION - Described is a technology by which handwriting recognition is performed using a semi-tied covariance modeling (STC) that requires far less memory than other models such as MQDF. Offline training, such as via maximum likelihood and/or minimum classification error techniques, provides classification data. The classification data includes semi-tied transforms that are shared by classes, along with a class-dependent diagonal matrix and a mean vector corresponding to each class. The semi-tied transforms and class-dependent diagonal matrices are obtained by processing a precision matrix for each class. In online recognition, received handwritten input (e.g., an East Asian character) is classified into a class, based upon the class-dependent diagonal matrices and the semi-tied transforms, by a STC recognizer that outputs similarity scores for candidates and a decision rule that selects the most likely class. | 09-23-2010 |
20100246941 | PRECISION CONSTRAINED GAUSSIAN MODEL FOR HANDWRITING RECOGNITION - Described is a technology by which handwriting recognition is performed using a precision constrained Gaussian model (PCGM) that requires far less memory than other models such as MQDF. Offline training, such as via maximum likelihood and/or minimum classification error techniques, provides classification data. The classification data includes basis matrices that are shared by classes, along with weighting coefficients and a mean vector corresponding to each class. The base matrices and weights are obtained by expanding a precision matrix for each class. In online recognition, received handwritten input (e.g., an East Asian character) is classified into a class, based upon the per-class mean vector and weighting coefficients, and the basis matrices, by a PCGM recognizer that outputs similarity scores for candidates and a decision rule that selects the most likely class. | 09-30-2010 |
20100262423 | FEATURE COMPENSATION APPROACH TO ROBUST SPEECH RECOGNITION - Described is a technology by which a feature compensation approach to speech recognition uses a high-order vector Taylor series (HOVTS) approximation of a model of distortions to improve recognition accuracy. Speech recognizer models trained with clean speech degrade when later dealing with speech that is corrupted by additive noises and convolutional distortions. The approach attempts to remove any such noise/distortions from the input speech. To use the HOVTS approximation, a Gaussian mixture model is trained and used to convert cepstral domain feature vectors to log spectrum components. HOVTS computes statistics for the components, which are transformed back to the cepstral domain. A noise/distortion estimate is obtained, and used to provide a clean speech estimate to the recognizer. | 10-14-2010 |
20110157012 | RECOGNIZING INTERACTIVE MEDIA INPUT - Techniques and systems for inputting data to interactive media devices are disclosed herein. In some aspects, a sensing device senses an object as it moves in a trajectory indicative of a desired input to an interactive media device. Recognition software may be used to translate the trajectory into various suggested characters or navigational commands. The suggested characters may be ranked based on a likelihood of being an intended input. The suggested characters may be displayed on a user interface at least in part based on the rank and made available for selection as the intended input. | 06-30-2011 |
20110257976 | Robust Speech Recognition - Speech recognition includes structured modeling, irrelevant variability normalization and unsupervised online adaptation of speech recognition parameters. | 10-20-2011 |
20110262033 | COMPACT HANDWRITING RECOGNITION - One or more techniques and/or systems are disclosed for constructing a compact handwriting character classifier. A precision constrained Gaussian model (PCGM) based handwriting classifier is trained by estimating parameters for the PCGM under minimum classification error (MCE) criterion, such as by using a computer-based processor. The estimated parameters of the trained PCGM classifier are compressed using split vector quantization (VQ) (e.g., and in some embodiments, scalar quantization) to compact the handwriting recognizer in computer-based memory. | 10-27-2011 |
20110268351 | AFFINE DISTORTION COMPENSATION - One or more techniques and/or systems are disclosed for compensating for affine distortions in handwriting recognition. Orientation estimation is performed on a handwriting sample to generate a set of likely characters for the sample. An estimated affine transform is determined for the sample by applying hidden Markov model (HMM) based minimax testing to the sample using the set of likely characters. The estimated affine transform is applied to the sample to compensate for the affine distortions of the sample, yielding an affine distortion compensated sample. | 11-03-2011 |
20110280484 | FEATURE DESIGN FOR HMM-BASED HANDWRITING RECOGNITION - The disclosed architecture is a new feature extraction approach to handwriting recognition. Given an handwriting sample (e.g., from an online source), a sequence of time-ordered dominant points are extracted, which include stroke-endings, points corresponding to local extrema of curvature, and points with a large distance to the chords formed by pairs of previously identified neighboring dominant points. At each dominant point, a multi-dimensional feature vector is extracted, which includes a combination of coordinate features, delta features, and double-delta features. | 11-17-2011 |
20120280974 | PHOTO-REALISTIC SYNTHESIS OF THREE DIMENSIONAL ANIMATION WITH FACIAL FEATURES SYNCHRONIZED WITH SPEECH - Dynamic texture mapping is used to create a photorealistic three dimensional animation of an individual with facial features synchronized with desired speech. Audiovisual data of an individual reading a known script is obtained and stored in an audio library and an image library. The audiovisual data is processed to extract feature vectors used to train a statistical model. An input audio feature vector corresponding to desired speech with which the animation will be synchronized is provided. The statistical model is used to generate a trajectory of visual feature vectors that corresponds to the input audio feature vector. These visual feature vectors are used to identify a matching image sequence from the image library. The resulting sequence of images, concatenated from the image library, provides a photorealistic image sequence with facial features, such as lip movements, synchronized with the desired speech. This image sequence is applied to the three-dimensional model. | 11-08-2012 |
20130103383 | TRANSLATING LANGUAGE CHARACTERS IN MEDIA CONTENT - Some implementations disclosed herein provide techniques and arrangements to enable translating language characters in media content. For example, some implementations receive a user selection of a first portion of media content. Some implementations disclosed herein may, based on the first portion, identify a second portion of the media content. The second portion of the media content may include one or more first characters of a first language. Some implementations disclosed herein may create an image that includes the second portion of the media content and may send the image to a server. Some implementations disclosed herein may receive one or more second characters of a second language corresponding to a translation of the one or more first characters of the first language from the server. | 04-25-2013 |
20130185070 | NORMALIZATION BASED DISCRIMINATIVE TRAINING FOR CONTINUOUS SPEECH RECOGNITION - A speech recognition system trains a plurality of feature transforms and a plurality of acoustic models using an irrelevant variability normalization based discriminative training. The speech recognition system employs the trained feature transforms to absorb or ignore variability within an unknown speech that is irrelevant to phonetic classification. The speech recognition system may then recognize the unknown speech using the trained recognition models. The speech recognition system may further perform an unsupervised adaptation to adapt the feature transforms for the unknown speech and thus increase the accuracy of recognizing the unknown speech. | 07-18-2013 |
20130251249 | ROTATION-FREE RECOGNITION OF HANDWRITTEN CHARACTERS - A character recognition system receives an unknown character and recognizes the character based on a pre-trained recognition model. Prior to recognizing the character, the character recognition system may pre-process the character to rotate the character to a normalized orientation. By rotating the character to a normalized orientation in both training and recognition stages, the character recognition system releases the pre-trained recognition model from considering character prototypes in different orientations and thereby speeds up recognition of the unknown character. In one example, the character recognition system rotates the character to the normalized orientation by aligning a line between a sum of coordinates of starting points and a sum of coordinates of ending points of each stroke of the character with a normalized direction. | 09-26-2013 |
20140257814 | Posterior-Based Feature with Partial Distance Elimination for Speech Recognition - A high-dimensional posterior-based feature with partial distance elimination may be utilized for speech recognition. The log likelihood values of a large number of Gaussians are needed to generate the high-dimensional posterior feature. Gaussians with very small log likelihoods are associated with zero posterior values. Log likelihoods for Gaussians for a speech frame may be evaluated with a partial distance elimination method. If the partial distance of a Gaussian is already too small, the Gaussian will have a zero posterior value. The partial distance may be calculated by sequentially adding individual dimensions in a group of dimensions. The partial distance elimination occurs when less than all of the dimensions in the group are sequentially added. | 09-11-2014 |