Patent application number | Description | Published |
20100106505 | USING WORD CONFIDENCE SCORE, INSERTION AND SUBSTITUTION THRESHOLDS FOR SELECTED WORDS IN SPEECH RECOGNITION - A method and system for improving the accuracy of a speech recognition system using word confidence score (WCS) processing is introduced. Parameters in a decoder are selected to minimize a weighted total error rate, such that deletion errors are weighted more heavily than substitution and insertion errors. The distribution of WCS values differs depending on whether a word was correctly identified and, if not, on the type of error. This difference is used to determine WCS thresholds for insertion and substitution errors. By processing the hypothetical word (HYP), which is the output of the decoder, a modified HYP (mHYP) is determined. In some circumstances, depending on the WCS value in relation to the insertion and substitution threshold values, mHYP is set equal to null, a substituted HYP, or HYP. | 04-29-2010 |
20100145677 | System and Method for Making a User Dependent Language Model - A language model for a speech recognition engine is made based on user-viewed data files. The data files are reviewed and texts are extracted therefrom. The language model is generated based on the extracted texts. Transcriptions of previous user statements are not required. Different weighting factors can be applied to elements of the extracted texts based on the nature of the data files. The weighting factors are then considered during generation of the language model. A user dependent and application independent language model can be created prior to initial use of the speech recognition engine. | 06-10-2010 |
20100250240 | SYSTEM AND METHOD FOR TRAINING AN ACOUSTIC MODEL WITH REDUCED FEATURE SPACE VARIATION - Feature space variation associated with specific text elements is reduced by training an acoustic model with a phoneme set, dictionary and transcription set configured to better distinguish the specific text elements and at least some specific phonemes associated therewith. The specific text elements can include the most frequently occurring text elements from a text data set, which can include text data beyond the transcriptions of a training data set. The specific text elements can be identified using a text element distribution table sorted by occurrence within the text data set. Specific phonemes can be limited to consonant phonemes to improve speed and accuracy. | 09-30-2010 |
20100332230 | PHONETIC DISTANCE MEASUREMENT SYSTEM AND RELATED METHODS - Phonetic distances are empirically measured as a function of the recognition error rates of a speech recognition engine. The error rates are determined by comparing a recognized speech file with a reference file. The phonetic distances can be normalized to earlier measurements. The phonetic distances and error rates can also be used to improve speech recognition engine grammar selection, as an aid in language training and evaluation, and in other applications. | 12-30-2010 |
20110196668 | Integrated Language Model, Related Systems and Methods - An integrated language model includes an upper-level language model component and a lower-level language model component, with the upper-level language model component including a non-terminal and the lower-level language model component being applied to the non-terminal. The upper-level and lower-level language model components can be of the same or different language model formats, including finite state grammar (FSG) and statistical language model (SLM) formats. Systems and methods for making integrated language models allow designation of language model formats for the upper-level and lower-level components and identification of non-terminals. Automatic non-terminal replacement and retention criteria can be used to facilitate the generation of one or both language model components, which can include the modification of existing language models. | 08-11-2011 |
20120014537 | System and Method for Automatic Microphone Volume Setting - Optimal microphone volumes are automatically set for computer applications based on determination of peak volume levels and noise levels from one or more digital audio captures. The peak volume levels and noise levels can be advantageously determined based on distribution curves of sample volume levels in the digital audio captures. Clipping can be automatically compensated for by estimating peak unclipped capture volume levels from the distribution curves. | 01-19-2012 |
20120046946 | SYSTEM AND METHOD FOR MERGING AUDIO DATA STREAMS FOR USE IN SPEECH RECOGNITION APPLICATIONS - A system and method for merging audio data streams receive audio data streams from separate inputs, independently transform each data stream from the time to the frequency domain, and generate separate feature data sets for the transformed data streams. Feature data from each of the separate feature data sets is selected to form a merged feature data set that is output to a decoder for recognition purposes. The separate inputs can include an ear microphone and a mouth microphone. | 02-23-2012 |
20140163989 | INTEGRATED LANGUAGE MODEL, RELATED SYSTEMS AND METHODS - An integrated language model includes an upper-level language model component and a lower-level language model component, with the upper-level language model component including a non-terminal and the lower-level language model component being applied to the non-terminal. The upper-level and lower-level language model components can be of the same or different language model formats, including finite state grammar (FSG) and statistical language model (SLM) formats. Systems and methods for making integrated language models allow designation of language model formats for the upper-level and lower-level components and identification of non-terminals. Automatic non-terminal replacement and retention criteria can be used to facilitate the generation of one or both language model components, which can include the modification of existing language models. | 06-12-2014 |
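
As an illustration of the thresholding idea in the first entry (20100106505), the following is a minimal sketch only. The abstract does not give the exact decision rule, so the ordering of the two thresholds, the comparison directions, the function name `modify_hypothesis`, and the optional `substitute` argument are all assumptions made for this example, not the method claimed in the application.

```python
from typing import Optional


def modify_hypothesis(
    hyp: str,
    wcs: float,
    insertion_threshold: float,
    substitution_threshold: float,
    substitute: Optional[str] = None,
) -> Optional[str]:
    """Return a modified hypothesis (mHYP) for one decoded word.

    Assumed rule (hypothetical, not taken from the application):
      - WCS below the insertion threshold: treat the word as a likely
        insertion error and drop it (mHYP is null, here None).
      - WCS below the substitution threshold and a substitute is known:
        treat the word as a likely substitution error and replace it.
      - Otherwise keep the decoder output unchanged.
    """
    if wcs < insertion_threshold:
        return None
    if wcs < substitution_threshold and substitute is not None:
        return substitute
    return hyp


if __name__ == "__main__":
    # Threshold values are arbitrary placeholders chosen for the demo.
    for score in (0.1, 0.4, 0.9):
        print(score, modify_hypothesis("four", score, 0.2, 0.6, substitute="for"))
```

In practice the thresholds would be derived from the observed WCS distributions for correct words, insertion errors, and substitution errors, as the abstract describes; the constants above are placeholders.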