Class / Patent application number | Description | Number of patent applications / Date published |
704234000 | Normalizing | 19 |
20080201141 | SPEECH FILTERS - Utterances by a speaker are analyzed by an appropriate computational system. The spoken words are recognized and indexed to their respective analogs which are used to tailor the speech sequence to conform to a pre-determined standard of speech characteristics which could be fixed for a given language or chosen based on the regional characteristics of the said common language target for a communication session. The selected audio sequences are then tailored or synthesized into the normalized characteristics and inserted into the outgoing speech stream such that the resulting audio sequence exhibits reduced speech characteristics deemed undesirable. | 08-21-2008 |
20090018828 | Automatic Speech Recognition System - An automatic speech recognition system includes: a sound source localization module for localizing a sound direction of a speaker based on the acoustic signals detected by the plurality of microphones; a sound source separation module for separating a speech signal of the speaker from the acoustic signals according to the sound direction; an acoustic model memory which stores direction-dependent acoustic models that are adjusted to a plurality of directions at intervals; an acoustic model composition module which composes an acoustic model adjusted to the sound direction, which is localized by the sound source localization module, based on the direction-dependent acoustic models, the acoustic model composition module storing the acoustic model in the acoustic model memory; and a speech recognition module which recognizes the features extracted by a feature extractor as character information using the acoustic model composed by the acoustic model composition module. | 01-15-2009 |
20090157400 | SPEECH RECOGNITION SYSTEM AND METHOD WITH CEPSTRAL NOISE SUBTRACTION - The invention relates to a speech recognition system and method with cepstral noise subtraction. The speech recognition system and method utilize a first scalar coefficient, a second scalar coefficient, and a determining condition to limit the process for the cepstral feature vector, so as to avoid excessive enhancement or subtraction in the cepstral feature vector, so that the operation of the cepstral feature vector is performed properly to improve the anti-noise ability in speech recognition. Furthermore, the speech recognition system and method can be applied in any environment, and have a low complexity and can be easily integrated into other systems, so as to provide the user with a more reliable and stable speech recognition result. | 06-18-2009 |
20090259465 | LOW LATENCY REAL-TIME VOCAL TRACT LENGTH NORMALIZATION - A method and system for training an automatic speech recognition system are provided. The method includes separating training data into speaker specific segments, and for each speaker specific segment, performing the following acts: generating spectral data, selecting a first warping factor and warping the spectral data, and comparing the warped spectral data with a speech model. The method also includes iteratively performing the steps of selecting another warping factor and generating another warped spectral data, comparing the other warped spectral data with the speech model, and if the other warping factor produces a closer match to the speech model, saving the other warping factor as the best warping factor for the speaker specific segment. The system includes modules configured to control a processor in the system to perform the steps of the method. | 10-15-2009 |
20090319265 | Method and system for efficient pacing of speech for transcription - A method and system for improving the efficiency of real-time and non-real-time speech transcription by machine speech recognizers, human dictation typists, and human voicewriters using speech recognizers. In particular, the pacing with which recorded speech is presented to transcriptionists is automatically adjusted by monitoring the transcriptionists' output by comparing the output acoustically or phonetically to the presented recorded speech as well as monitoring the resulting transcription, and accordingly adjusting the pacing. | 12-24-2009 |
20100094626 | METHOD AND APPARATUS FOR LOCATING SPEECH KEYWORD AND SPEECH RECOGNITION SYSTEM - It is an object of the present invention to provide a method and apparatus for locating a keyword of a speech and a speech recognition system. The method includes the steps of: by extracting feature parameters from frames constituting the recognition target speech, forming a feature parameter vector sequence that represents the recognition target speech; by normalizing the feature parameter vector sequence using a codebook containing a plurality of codebook vectors, obtaining a feature trace of the recognition target speech in a vector space; and specifying the position of a keyword by matching prestored keyword template traces with the feature trace. According to the present invention, a keyword template trace and a feature space trace of a target speech are drawn in accordance with an identical codebook. This causes resampling to be unnecessary in performing linear movement matching of speech wave frames having similar phonological feature structures. This makes it possible to improve the speed of location and recognition while ensuring the precision of recognition. | 04-15-2010 |
20100131270 | METHOD AND SYSTEM FOR REDUCING RECEPTION OF UNWANTED MESSAGES - The invention relates to a method for determining a characteristic pattern for a speech message that is supplied in the form of a numerically encoded audio signal generated by means of a sampling process. Said method comprises at least the following steps for determining the characteristic pattern on the basis of the numerically encoded audio signal: in a first step, non-speech portions of the audio signal are suppressed in that irrelevant frequency ranges are filtered out by applying a suitable signal filter, particularly a bandpass filter, to the audio signal; in a second step, a copy command (SQR) is used in order to copy all elements of the numerically encoded audio signal into the positive number range; in a third step, an audio signal sampling rate characterizing the sampling process is adjusted; in a fourth step, the new value range of all elements of the numerically encoded audio signal is scaled with regard to a maximum value and a mean value, said new value range being the result of the adjustment of the sampling rate. The invention further relates to a system for carrying out the disclosed method as well as devices and a corresponding communication network. | 05-27-2010 |
20100324893 | SYSTEM AND METHOD FOR IMPROVING ROBUSTNESS OF SPEECH RECOGNITION USING VOCAL TRACT LENGTH NORMALIZATION CODEBOOKS - Disclosed are systems, methods, and computer readable media for performing speech recognition. The method embodiment comprises (1) selecting a codebook from a plurality of codebooks with a minimal acoustic distance to a received speech sample, the plurality of codebooks generated by a process of (a) computing a vocal tract length for each of a plurality of speakers, (b) for each of the plurality of speakers, clustering speech vectors, and (c) creating a codebook for each speaker, the codebook containing entries for the respective speaker's vocal tract length, speech vectors, and an optional vector weight for each speech vector, (2) applying the respective vocal tract length associated with the selected codebook to normalize the received speech sample for use in speech recognition, and (3) recognizing the received speech sample based on the respective vocal tract length associated with the selected codebook. | 12-23-2010 |
20120259632 | Online Maximum-Likelihood Mean and Variance Normalization for Speech Recognition - A feature transform for speech recognition is described. An input speech utterance is processed to produce a sequence of representative speech vectors. A time-synchronous speech recognition pass is performed using a decoding search to determine a recognition output corresponding to the speech input. The decoding search includes, for each speech vector after some first threshold number of speech vectors, estimating a feature transform based on the preceding speech vectors in the utterance and partial decoding results of the decoding search. The current speech vector is then adjusted based on the current feature transform, and the adjusted speech vector is used in a current frame of the decoding search. | 10-11-2012 |
20130179164 | VEHICLE VOICE INTERFACE SYSTEM CALIBRATION METHOD - A vehicle voice interface system calibration method comprising electronically convolving voice command data with voice impulse response data, electronically convolving audio system output data with feedback impulse response data, and calibrating the vehicle voice interface system. The voice command data is electronically convolved with voice impulse response data representing a voice acoustic signal path between an artificial mouth simulator and a first microphone, to simulate a voice acoustic transfer function pertaining to the passenger compartment. The audio system output data is convolved with feedback impulse response data representing a feedback acoustic signal path between a vehicle audio system output and a second microphone, to simulate a feedback acoustic transfer function pertaining to the passenger compartment. The voice interface system is calibrated to recognize voice commands represented by the voice command data based on the simulated voice and feedback acoustic transfer functions. | 07-11-2013 |
20140207448 | ADAPTIVE ONLINE FEATURE NORMALIZATION FOR SPEECH RECOGNITION - A speech recognition system adaptively estimates a warping factor used to reduce speaker variability. The warping factor is estimated using a small window (e.g. 100 ms) of speech. The warping factor is adaptively adjusted as more speech is obtained until the warping factor converges or a pre-defined maximum number of adaptations is reached. The speaker may be placed into a group selected from two or more groups based on characteristics that are associated with the speaker's window of speech. Different step sizes may be used within the different groups when estimating the warping factor. VTLN is applied to the speech input using the estimated warping factor. A linear transformation, including a bias term, may also be computed to assist in normalizing the speech along with the application of the VTLN. | 07-24-2014 |
20140222423 | Method and Apparatus for Efficient I-Vector Extraction - Most speaker recognition systems use i-vectors which are compact representations of speaker voice characteristics. Typical i-vector extraction procedures are complex in terms of computations and memory usage. According to an embodiment, a method and corresponding apparatus for speaker identification comprise determining a representation for each component of a variability operator, representing statistical inter- and intra-speaker variability of voice features with respect to a background statistical model, in terms of an orthogonal operator common to all components of the variability operator and having a first dimension larger than a second dimension of the components of the variability operator; computing statistical voice characteristics of a particular speaker using the determined representations; and employing the statistical voice characteristics of the particular speaker in performing speaker recognition. Computing the voice characteristics, by using the determined representations, results in significant reduction in memory usage and substantial increase in execution speed. | 08-07-2014 |
20150088498 | LOW LATENCY REAL-TIME VOCAL TRACT LENGTH NORMALIZATION - A method and system for training an automatic speech recognition system are provided. The method includes separating training data into speaker specific segments, and for each speaker specific segment, performing the following acts: generating spectral data, selecting a first warping factor and warping the spectral data, and comparing the warped spectral data with a speech model. The method also includes iteratively performing the steps of selecting another warping factor and generating another warped spectral data, comparing the other warped spectral data with the speech model, and if the other warping factor produces a closer match to the speech model, saving the other warping factor as the best warping factor for the speaker specific segment. The system includes modules configured to control a processor in the system to perform the steps of the method. | 03-26-2015 |
20150348536 | METHOD AND DEVICE FOR RECOGNIZING SPEECH - A speech is recognized using ACF factors extracted from running autocorrelation functions calculated from the speech. The extracted ACF factors are a W | 12-03-2015 |
20160004501 | AUDIO COMMAND INTENT DETERMINATION SYSTEM AND METHOD - Methods and apparatus are provided for generating aircraft cabin control commands from verbal speech onboard an aircraft. An audio command supplied to an audio input device is processed. Each word of the processed audio command is compared to words stored in a vocabulary map to determine a word type of each word. Each determined word type is processed to determine if an intent of the audio command is discernable. If the intent is discernable, an aircraft cabin control command is generated based on the discerned intent. If a partial intent is discernable, feedback is generated. | 01-07-2016 |
20160042732 | SYSTEM AND METHOD FOR ROBUST ACCESS AND ENTRY TO LARGE STRUCTURED DATA USING VOICE FORM-FILLING - A method, apparatus and machine-readable medium are provided. A phonotactic grammar is utilized to perform speech recognition on received speech and to generate a phoneme lattice. A document shortlist is generated based on using the phoneme lattice to query an index. A grammar is generated from the document shortlist. Data for each of at least one input field is identified based on the received speech and the generated grammar. | 02-11-2016 |
20160049144 | SYSTEM AND METHOD FOR UNIFIED NORMALIZATION IN TEXT-TO-SPEECH AND AUTOMATIC SPEECH RECOGNITION - A system, method and computer-readable storage devices are disclosed for using a single set of normalization protocols and a single language lexicon (or dictionary) for both TTS and ASR. The system receives input (which is either text to be converted to speech or ASR training text), then normalizes the input. The system produces, using the normalized input and a dictionary configured for both automatic speech recognition and text-to-speech processing, output which is either phonemes corresponding to the input or text corresponding to the input for training the ASR system. When the output is phonemes corresponding to the input, the system generates speech by performing prosody generation and unit selection synthesis using the phonemes. When the output is text corresponding to the input, the system trains both an acoustic model and a language model for use in future speech recognition. | 02-18-2016 |
20160098993 | SPEECH PROCESSING APPARATUS, SPEECH PROCESSING METHOD AND COMPUTER-READABLE MEDIUM - A speech processing apparatus, method and non-transitory computer-readable storage medium are disclosed. A speech processing apparatus may include a memory storing instructions, and at least one processor configured to process the instructions to calculate an acoustic diversity degree value representing a degree of variation in types of sounds included in a speech signal representing a speech, on a basis of the speech signal, and compensate for a recognition feature value calculated to recognize specific attribute information from the speech signal, using the acoustic diversity degree value. | 04-07-2016 |
20160171975 | VOICE WAKEUP DETECTING DEVICE AND METHOD | 06-16-2016 |
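
The warping-factor search described in the two LOW LATENCY REAL-TIME VOCAL TRACT LENGTH NORMALIZATION entries (20090259465 and 20150088498) can be illustrated with a minimal sketch. It is not the patented implementation. It assumes a linear frequency warp, a squared-error comparison against a single reference spectrum standing in for the "speech model", and a fixed list of candidate factors. The function names `warp_spectrum`, `distance`, and `best_warping_factor` are hypothetical.

```python
def warp_spectrum(spectrum, alpha):
    """Linearly warp the frequency axis (assumed warp, for illustration).

    Output bin k reads the input at position k * alpha, using linear
    interpolation and clamping to the last bin.
    """
    n = len(spectrum)
    warped = []
    for k in range(n):
        pos = min(k * alpha, n - 1)
        lo = int(pos)
        hi = min(lo + 1, n - 1)
        frac = pos - lo
        warped.append((1 - frac) * spectrum[lo] + frac * spectrum[hi])
    return warped


def distance(a, b):
    """Squared-error distance; a stand-in for comparison with a speech model."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def best_warping_factor(spectrum, model, factors):
    """Try each candidate factor and keep the one that matches the model best."""
    best_alpha = factors[0]
    best_dist = distance(warp_spectrum(spectrum, best_alpha), model)
    for alpha in factors[1:]:
        d = distance(warp_spectrum(spectrum, alpha), model)
        if d < best_dist:  # closer match: save as the best factor so far
            best_alpha, best_dist = alpha, d
    return best_alpha


# Toy data: the speaker's spectral peak sits at bin 6, the model's at bin 3.
speaker = [0.0] * 10
speaker[6] = 1.0
model = [0.0] * 10
model[3] = 1.0
chosen = best_warping_factor(speaker, model, [0.5, 1.0, 2.0])
```

In this toy case a factor of 2.0 moves the speaker's peak from bin 6 to bin 3, so it is the candidate that the search keeps. A real system would run the search per speaker-specific segment and score against a trained acoustic model rather than a single reference spectrum.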