Detect speech in noise

Subclass of:

704 - Data processing: speech signal processing, linguistics, language translation, and audio compression/decompression

704200000 - SPEECH SIGNAL PROCESSING

704231000 - Recognition

Document	Title	Date
20080228477	Method and Device For Processing a Voice Signal For Robust Speech Recognition - A speech signal is processed for subsequent speech recognition. The speech signal is tainted by noise and represents at least one speech command. The following steps are executed: a) recording of the noise-tainted speech signal; b) use of noise reduction on the speech signal to generate a noise-reduced speech signal; c) normalization of the noise-reduced speech signal to a target signal value with the aid of a normalization factor, to generate a noise-reduced, normalized speech signal.	09-18-2008
20080228478	Targeted speech - A system detects a speech segment that may include unvoiced, fully voiced, or mixed voice content. The system includes a digital converter that converts a time-varying input signal into a digital-domain signal. A window function passes signals within a programmed aural frequency range while substantially blocking signals above and below the programmed aural frequency range when multiplied by an output of the digital converter. A frequency converter converts the signals passing within the programmed aural frequency range into a plurality of frequency bins. A background voice detector estimates the strength of a background speech segment relative to the noise of selected portions of the aural spectrum. A noise estimator estimates a maximum distribution of noise to an average of an acoustic noise power of some of the plurality of frequency bins. A voice detector compares the strength of a desired speech segment to a criterion based on an output of the background voice detector and an output of the noise estimator.	09-18-2008
20080235013	METHOD AND APPARATUS FOR ESTIMATING NOISE BY USING HARMONICS OF VOICE SIGNAL - Disclosed is a method and an apparatus for estimating noise included in a sound signal during sound signal processing. The method includes estimating harmonics components in a frame of an input sound signal; using the estimated harmonics components, computing a Voice Presence Probability (VPP) on the frame of the input sound signal; determining a weight of an equation necessary to estimate a noise spectrum, depending on the computed VPP; and using the determined weight and the equation necessary to estimate a noise spectrum, estimating the noise spectrum, and updating the noise spectrum.	09-25-2008
20080249771	System and method of voice activity detection in noisy environments - An efficient voice activity detection method and system suitable for real-time operation in low SNR (signal-to-noise) environments corrupted by non-Gaussian non-stationary background noise. The method utilizes rank order statistics to generate a binary voice detection output based on deviations between a short-term energy magnitude signal and a short-term noise reference signal. The method does not require voice-free training periods to track the background noise nor is it susceptible to rapid changes in overall noise level making it very robust. In addition a long-term adaptation mechanism is applied to reject harmonic or tonal interference.	10-09-2008
20080249772	APPARATUS AND METHOD FOR ENHANCING SPEECH INTELLIGIBILITY IN A MOBILE TERMINAL - An apparatus and a method for enhancing speech intelligibility in a mobile terminal. A complex spectrum calculator calculates complex spectra of one input frame of an input speech signal by Fourier transform; a speech level calculator calculates its instant levels; an average speech level calculator calculates an average speech level of the speech frame using the instant levels, if the input frame is a speech frame; a scaling factor calculator calculates scaling factors by comparing the average speech level with the instant levels; an HPF characteristic calculator calculates amplitude characteristics using the scaling factors; an HPF high-pass-filters the complex spectra using the amplitude characteristics; a synthesizer converts high-pass-filtered signals to time signals by inverse Fourier transform and synthesizes the time signals; and a combiner outputs an enhanced intelligibility speech signal by combining the synthesized time signal with the input frame.	10-09-2008
20080281591	METHOD OF PATTERN RECOGNITION USING NOISE REDUCTION UNCERTAINTY - A method and apparatus are provided for using the uncertainty of a noise-removal process during pattern recognition. In particular, noise is removed from a representation of a portion of a noisy signal to produce a representation of a cleaned signal. In the meantime, an uncertainty associated with the noise removal is computed and is used with the representation of the cleaned signal to modify a probability for a phonetic state in the recognition system. In particular embodiments, the uncertainty is used to modify a probability distribution, by increasing the variance in each Gaussian distribution by the amount equal to the estimated variance of the cleaned signal, which is used in decoding the phonetic state sequence in a pattern recognition task.	11-13-2008
20080294432	Signal enhancement and speech recognition - Provides speech enhancement techniques which are effective even for extemporaneous noise without a noise interval and unknown extemporaneous noise. An example of a signal enhancement device includes: spectral subtraction means for subtracting a given reference signal from an input signal containing a target signal and a noise signal by spectral subtraction; an adaptive filter applied to the reference signal; and coefficient control means for controlling a filter coefficient of the adaptive filter in order to reduce components of the noise signal in the input signal. In the signal enhancement device, a database of a signal model concerning the target signal expressing a given feature by means of a given statistical model is provided, and the filter coefficient is controlled based on the likelihood of the signal model with respect to an output signal from the spectral subtraction means.	11-27-2008
20080300871	METHOD AND APPARATUS FOR IDENTIFYING ACOUSTIC BACKGROUND ENVIRONMENTS TO ENHANCE AUTOMATIC SPEECH RECOGNITION - Disclosed are systems, methods, and computer readable media for identifying an acoustic environment of a caller. The method embodiment comprises analyzing acoustic features of a received audio signal from a caller, receiving meta-data information, classifying a background environment of the caller based on the analyzed acoustic features and the meta-data, selecting an acoustic model matched to the classified background environment from a plurality of acoustic models, and performing speech recognition on the received audio signal using the selected acoustic model.	12-04-2008
20080306736	METHOD AND SYSTEM FOR A SUBBAND ACOUSTIC ECHO CANCELLER WITH INTEGRATED VOICE ACTIVITY DETECTION - Methods and systems for a subband acoustic echo canceller with integrated voice activity detection are disclosed and may include adjusting transmit and/or receive powers of wirelessly communicated audio signals based on voice activity detection via subband analysis of the wirelessly communicated audio signals. The receive power may be adjusted by utilizing a reduced duty cycle, or by conveying voice activity detection information via an asynchronous control channel in a Bluetooth application. A plurality of subbands may be generated utilizing a fast Fourier transform, and a first subset of the subbands corresponding to voice activity may be selected and a second subset of the subbands may be selected that corresponds to background noise. The processing of the subsets may be dynamically adjusted due to variations in the voice activity or background noise. Comfort noise may be generated and transmitted at a reduced bandwidth utilizing the second subset of the subbands.	12-11-2008
20080312918	VOICE PERFORMANCE EVALUATION SYSTEM AND METHOD FOR LONG-DISTANCE VOICE RECOGNITION - A system and a method are provided for evaluating a voice performance in order to recognize a long-distance voice. The system implements a voice performance evaluation function for long-distance voice input in a robot. Particularly, in robots including a network robot, voice recognition must be performed normally so that a speaking subject and a surrounding situation can be recognized by the robot. Accordingly, in order to obtain optimal voice quality, it is important to find a noise removal algorithm through an optimal hardware configuration and an optimal combination of that hardware configuration and software. Therefore, a method is provided for finding a noise removal algorithm appropriate for each case, including one case where the distance from the speaking subject is fixed and another case where the distance from the speaking subject changes. As a result, optimal voice quality can be obtained regardless of the noise environment even when the speaking subject is a long distance away from the robot.	12-18-2008
20090006088	SYSTEM AND METHOD OF PERFORMING SPEECH RECOGNITION BASED ON A USER IDENTIFIER - Speech recognition models are dynamically re-configurable based on user information, application information, background information such as background noise and transducer information such as transducer response characteristics to provide users with alternate input modes to keyboard text entry. Word recognition lattices are generated for each data field of an application and dynamically concatenated into a single word recognition lattice. A language model is applied to the concatenated word recognition lattice to determine the relationships between the word recognition lattices and repeated until the generated word recognition lattices are acceptable or differ from a predetermined value only by a threshold amount. These techniques of dynamic re-configurable speech recognition provide for deployment of speech recognition on small devices such as mobile phones and personal digital assistants as well as in environments such as office, home or vehicle while maintaining the accuracy of the speech recognition.	01-01-2009
20090012786	Adaptive Noise Cancellation - Speech-free noise estimation by cancellation of speech content from an audio input where the speech content is estimated by noise suppression. Adaptive noise cancellation with primary and noise-reference inputs and an adaptive noise cancellation filter for estimating primary noise from noise-reference input. Speech Suppressor (Noise Estimation) applied to noise-reference input provides speech-free noise estimates for noise cancellation in the primary input.	01-08-2009
20090030679	AMBIENT NOISE INJECTION FOR USE IN SPEECH RECOGNITION - A method of ambient noise injection for use with speech recognition in a production vehicle. The method includes the steps of monitoring audio including user speech, receiving an utterance from the user speech, retrieving vehicle-specific ambient noise, and prepending the vehicle-specific ambient noise to the utterance before pre-processing and decoding the utterance.	01-29-2009
20090043577	SIGNAL PRESENCE DETECTION USING BI-DIRECTIONAL COMMUNICATION DATA - A system and method for using bi-directional conversation data to improve signal presence detection are disclosed. The detector module is adapted to communicate with a signal enhancement module. The detector module collects data from a transmit direction of the connection and a receive direction of a data connection. The collected data from the transmit and the receive direction is used to classify at least one of data in the transmit direction and data in the receive direction. Responsive to the classification, the signal enhancement module enhances data in one of the transmit direction and the receive direction. Hence, data classification accuracy is improved by using data from both the transmit and receive directions. In one embodiment, the detector module applies a voice activity detection module (VAD) process to detect the presence or absence of voice data in the collected data.	02-12-2009
20090055173	SUB BAND VAD - The present invention relates to a voice detector being responsive to an input signal being divided into sub-signals representing a frequency sub-band, comprising: means to calculate, for each sub-band, an SNR value snr[n] based on a corresponding sub-signal for each sub-band and a background signal for each sub-band. The voice detector further comprises: means to calculate a power SNR value for each sub-band, wherein at least one of said power SNR values is calculated based on a non-linear function, means to form a single value snr_sum based on the calculated power SNR values, and means to compare said single value snr_sum and a given threshold value vad_thr to make a voice activity decision vad_prim presented on an output port. The invention also relates to a voice activity detector, a node and a method for selectively suppressing sub-bands in a voice detector.	02-26-2009
20090063143	SYSTEM FOR SPEECH SIGNAL ENHANCEMENT IN A NOISY ENVIRONMENT THROUGH CORRECTIVE ADJUSTMENT OF SPECTRAL NOISE POWER DENSITY ESTIMATIONS - A system that estimates the spectral noise power density of an audio signal includes a spectral noise power density estimation unit, a correction term processor, and a combination processor. The spectral noise power density estimation unit may provide a first estimate of the spectral noise power density of the audio signal. The correction term processor may provide a time dependent correction term based, at least in part, on a spectral noise power density estimation error of the actual spectral noise power density. The correction term may be determined so that the spectral noise power density estimation error is reduced. The combination processor may combine the first estimate with the correction term to obtain a second estimate of the spectral noise power density that may be used for subsequent signal processing to enhance a desired signal component of the audio signal.	03-05-2009
20090070108	METHOD AND SYSTEM FOR IDENTIFYING SPEECH SOUND AND NON-SPEECH SOUND IN AN ENVIRONMENT - In a method and system for identifying speech sound and non-speech sound in an environment, a speech signal and other non-speech signals are identified from a mixed sound source having a plurality of channels. The method includes the following steps: (a) using a blind source separation (BSS) unit to separate the mixed sound source into a plurality of sound signals; (b) storing spectrum of each of the sound signals; (c) calculating spectrum fluctuation of each of the sound signals in accordance with stored past spectrum information and current spectrum information sent from the blind source separation unit; and (d) identifying one of the sound signals that has a largest spectrum fluctuation as the speech signal.	03-12-2009
20090076813	METHOD FOR SPEECH RECOGNITION USING UNCERTAINTY INFORMATION FOR SUB-BANDS IN NOISE ENVIRONMENT AND APPARATUS THEREOF - According to a method and apparatus for speech recognition in noise environment of the present invention using uncertainty information for sub-band, uncertainty information of each sub-band is extracted from estimated clean speech using noise modeling, and helps to extract speech features that are robust to noise using the extracted uncertainty information as a weight with respect to each sub-band. Also, an acoustic model is converted according to each sub-band weight, and speech recognition is performed based on the converted acoustic model and the extracted speech features. As a result, while the noise modeling over time is not so accurate, noise influence resulting from sub-bands having high corruption can be reduced according to the uncertainty information of the corresponding sub-band, and speech recognition performance in complex noise environments can be improved.	03-19-2009
20090076814	Apparatus and method for determining speech signal - Provided are a method and apparatus for discriminating a speech signal. The apparatus for discriminating a speech signal includes: an input signal quality improver for reducing additional noise from an acoustic signal received from outside; a first start/end-point detector for receiving the acoustic signal from the input signal quality improver and detecting an end-point of a speech signal included in the acoustic signal; a voiced-speech feature extractor for extracting voiced-speech features of the input signal included in the acoustic signal received from the first start/end-point detector; a voiced-speech/unvoiced-speech discrimination model for storing a voiced-speech model parameter corresponding to a discrimination reference of the voiced-speech feature parameter extracted from the voiced-speech feature extractor; and a voiced-speech/unvoiced-speech discriminator for discriminating a voiced-speech portion using the voiced-speech features extracted by the voiced-speech feature extractor and the voiced-speech discrimination model parameter of the voiced/unvoiced-speech discrimination model.	03-19-2009
20090076815	Speech Recognition Apparatus, Speech Recognition Apparatus and Program Thereof - Provided is a method for canceling background noise of a sound source other than a target direction sound source in order to realize highly accurate speech recognition, and a system using the same. In terms of directional characteristics of a microphone array, due to a capability of approximating a power distribution of each angle of each of possible various sound source directions by use of a sum of coefficient multiples of a base form angle power distribution of a target sound source measured beforehand by base form angle by using a base form sound, and power distribution of a non-directional background sound by base form, only a component of the target sound source direction is extracted at a noise suppression part. In addition, when the target sound source direction is unknown, at a sound source localization part, a distribution for minimizing the approximate residual is selected from base form angle power distributions of various sound source directions to assume a target sound source direction. Further, maximum likelihood estimation is executed by using voice data of the component of the sound source direction passed through these processes, and a voice model obtained by predetermined modeling of the voice data, and speech recognition is carried out based on an obtained assumption value.	03-19-2009
20090089053	MULTIPLE MICROPHONE VOICE ACTIVITY DETECTOR - Voice activity detection using multiple microphones can be based on a relationship between an energy at each of a speech reference microphone and a noise reference microphone. The energy output from each of the speech reference microphone and the noise reference microphone can be determined. A speech to noise energy ratio can be determined and compared to a predetermined voice activity threshold. In another embodiment, the absolute value of the autocorrelation of the speech and noise reference signals is determined and a ratio based on autocorrelation values is determined. Ratios that exceed the predetermined threshold can indicate the presence of a voice signal. The speech and noise energies or autocorrelations can be determined using a weighted average or over a discrete frame size.	04-02-2009
20090089054	APPARATUS AND METHOD OF NOISE AND ECHO REDUCTION IN MULTIPLE MICROPHONE AUDIO SYSTEMS - Multiple microphone noise suppression apparatus and methods are described herein. The apparatus and methods implement a variety of noise suppression techniques and apparatus that can be selectively applied to signals received using multiple microphones. The microphone signals received at each of the multiple microphones can be independently processed to cancel echo signal components that can be generated from a local audio source. The echo cancelled signals may be processed by some or all modules within a signal separator that operates to separate or otherwise isolate a speech signal from noise signals. The signal separator can include a pre-processing de-correlator followed by a blind source separator. The output of the blind source separator can be post filtered to provide post separation de-correlation. The separated speech and noise signals can be non-linearly processed for further noise reduction, and additional post processing can be implemented following the non-linear processing.	04-02-2009
20090112584	DYNAMIC NOISE REDUCTION - A speech enhancement system improves the speech quality and intelligibility of a speech signal. The system includes a time-to-frequency converter that converts segments of a speech signal into frequency bands. A signal detector measures the signal power of the frequency bands of each speech segment. A background noise estimator measures a background noise detected in the speech signal. A dynamic noise reduction controller dynamically models the background noise in the speech signal. The speech enhancement renders a speech signal perceptually pleasing to a listener by dynamically attenuating a portion of the noise that occurs in a portion of the spectrum of the speech signal.	04-30-2009
20090125304	METHOD AND APPARATUS TO DETECT VOICE ACTIVITY - A method and apparatus to detect voice activity by using a zero-crossing rate includes removing noise included in an audio signal, adding a random signal having energy of a predetermined size to the audio signal from which noise is removed, extracting predetermined voice detection parameters from the audio signal to which the random signal is added, and comparing the extracted predetermined voice detection parameters with a threshold value and determining voice and non-voice activities.	05-14-2009
20090125305	METHOD AND APPARATUS FOR DETECTING VOICE ACTIVITY - A robust method and apparatus to detect voice activity based on the power level of an audio frame. The method may include performing primary active/non-active voice period determination of an input audio frame according to a power level of the audio frame, extracting a noise power prediction value and a signal power prediction value by referring to power levels of current and previous audio frames according to a primary active/non-active voice period determination value, and performing secondary active/non-active voice period determination for the input audio frame by comparing the extracted signal power prediction value with the extracted noise power prediction value.	05-14-2009
20090132248	TIME-DOMAIN RECEIVE-SIDE DYNAMIC CONTROL - A system improves the speech intelligibility and the speech quality of a speech segment. The system includes a dynamic controller that detects a background noise from an input by modeling a signal. A variable gain amplifier adjusts the variable gain of the amplifier in response to an output of the dynamic controller. A shaping filter adjusts a speech signal by tilting portions of the speech signal of the dynamic controller.	05-21-2009
20090150144	Robust voice detector for receive-side automatic gain control - A voice detector improves voice output quality. The voice detector may be incorporated into a cellphone, hands-free car phone, or any other device that provides voice output. The voice detector provides excellent voice output quality even when signal dropouts and other significant signal artifacts are present in the received signal. Not only does the high quality voice output improve the listening experience, it also benefits downstream processing systems that further process the voice signal.	06-11-2009
20090150145	Learning word segmentation from non-white space languages corpora - Illustrative embodiments provide a computer implemented method, apparatus, and computer program product for learning word segmentation from non-white space language corpora. In one illustrative embodiment, the computer implemented method receives text input characters and calculates a ratio-measure for each pair of characters in the input characters. The computer implemented method further determines whether the ratio-measure of each pair of characters is equal to a predetermined threshold value. Responsive to determining the ratio-measure is less than the predetermined threshold value, and a local-minimum value, the computer method further identifies the pair as a weak pair and breaks the weak pair of characters.	06-11-2009
20090150146	MICROPHONE ARRAY BASED SPEECH RECOGNITION SYSTEM AND TARGET SPEECH EXTRACTING METHOD OF THE SYSTEM - A microphone-array-based speech recognition system using blind source separation (BSS) and a target speech extraction method in the system are provided. The speech recognition system performs an independent component analysis (ICA) to separate mixed signals input through a plurality of microphones into sound-source signals, extracts one target speech spoken for speech recognition from the separated sound-source signals by using a Gaussian mixture model (GMM) or a hidden Markov Model (HMM), and automatically recognizes a desired speech from the extracted target speech. Accordingly, it is possible to obtain a high speech recognition rate even in a noise environment.	06-11-2009
20090177468	SPEECH RECOGNITION WITH NON-LINEAR NOISE REDUCTION ON MEL-FREQUENCY CEPTRA - In an automatic speech recognition system, a feature extractor extracts features from a speech signal, and speech is recognized by the automatic speech recognition system based on the extracted features. Noise reduction as part of the feature extractor is provided by feature enhancement in which feature-domain noise reduction in the form of Mel-frequency cepstra is provided based on the minimum mean square error criterion. Specifically, the devised method takes into account the random phase between the clean speech and the mixing noise. The feature-domain noise reduction is performed in a dimension-wise fashion to the individual dimensions of the feature vectors input to the automatic speech recognition system, in order to perform environment-robust speech recognition.	07-09-2009
20090187402	Performance Prediction For An Interactive Speech Recognition System - The present invention provides an interactive speech recognition system and a corresponding method for determining a performance level of a speech recognition procedure on the basis of recorded background noise. The inventive system effectively exploits speech pauses that occur before the user enters speech that becomes subject to speech recognition. Preferably, the inventive performance prediction makes effective use of trained noise classification models. Moreover, predicted performance levels are indicated to the user in order to give a reliable feedback of the performance of the speech recognition procedure. In this way the interactive speech recognition system may react to noise conditions that are inappropriate for generating reliable speech recognition.	07-23-2009
20090192795	System and method for receiving audible input in a vehicle - A steering wheel system for a vehicle. The steering wheel system includes a first microphone mounted in a steering wheel and a second microphone mounted in the vehicle. The first and second microphones are each configured to receive an audible input. The audible input includes an oral command component and a noise component. The steering wheel system also includes a controller configured to identify the noise component by determining that the noise component received at the first microphone is out of phase with the noise component received at the second microphone. The controller is configured to cancel the noise component from the audible input.	07-30-2009
20090192796	FILTERING OF BEAMFORMED SPEECH SIGNALS - The invention relates to speech signal processing that detects a speech signal from more than one microphone and obtains microphone signals that are processed by a beamformer to obtain a beamformed signal that is post-filtered with a filter that employs adaptable filter weights to obtain an enhanced beamformed signal, with the post-filter adapting the filter weights with previously learned filter weights.	07-30-2009
20090198492	ADAPTIVE NOISE MODELING SPEECH RECOGNITION SYSTEM - An adaptive noise modeling speech recognition system improves speech recognition by modifying an activation of the system's grammar rules or models based on detected noise characteristics. An adaptive noise modeling speech recognition system includes a sensor that receives acoustic data having a speech component and a noise component. A processor analyzes the acoustic data and generates a noise indicator that identifies a characteristic of the noise component. An integrating decision logic processes the noise indicator and generates a noise model activation data structure that includes data that may be used by a speech recognition engine to adjust the activation of associated grammar rules or models.	08-06-2009
20090210224	SYSTEM, METHOD AND PROGRAM FOR SPEECH PROCESSING - The present invention relates to a system, method and program for speech recognition. In an embodiment of the invention a method for processing a speech signal consists of receiving a power spectrum of a speech signal and generating a log power spectrum signal of the power spectrum. The method further consists of performing discrete cosine transformation on the log power spectrum signal and cutting off cepstrum upper and lower terms of the discrete cosine transformed signal. The method further consists of performing inverse discrete cosine transformation on the signal from which the cepstrum upper and lower terms are cut off. The method further consists of converting the inverse discrete cosine transformed signal so as to bring the signal back to a power spectrum domain and filtering the power spectrum of the speech signal by using, as a filter, the signal which is brought back to the power spectrum domain.	08-20-2009
20090216529	ELECTRONIC DEVICES AND METHODS THAT ADAPT FILTERING OF A MICROPHONE SIGNAL RESPONSIVE TO RECOGNITION OF A TARGETED SPEAKER'S VOICE - Electronic devices and methods are disclosed that adaptively filter a microphone signal responsive to recognition of a targeted speaker's voice. An electronic device can include a microphone, a speaker characterization circuit, an adaptive sound filter circuit, and a speaker recognition circuit. The speaker characterization circuit operates in a training mode to learn characteristics of the targeted speaker's voice component in the microphone signal, and to store the learned characteristics. The adaptive sound filter circuit adaptively filters the microphone signal responsive to a control signal. The speaker recognition circuit uses the learned characteristics to recognize the presence of the targeted speaker's voice in the microphone signal and to regulate the control signal to cause the adaptive sound filter circuit to adapt the filtering to increase the targeted speaker's voice component of the microphone signal relative to other components.	08-27-2009
20090216530	Interference detector - A system improves speech detection or processing by identifying registration signals. The system encodes a limited frequency band by varying the amplitude of a pulse width modulated signal between predefined values. The signal is separated into frequency bins that identify amplitude and phase. The registration signal is measured by comparing a difference in average acoustic power in a plurality of adjacent bins over time.	08-27-2009
20090222263	Method and Apparatus for Transmitting Speech Data To a Remote Device In a Distributed Speech Recognition System - A method of transmitting speech data to a remote device in a distributed speech recognition system, includes the steps of: dividing an input speech signal into frames; calculating, for each frame, a voice activity value representative of the presence of speech activity in the frame; grouping the frames into multiframes, each multiframe including a predetermined number of frames; calculating, for each multiframe, a voice activity marker representative of the number of frames in the multiframe representing speech activity; and selectively transmitting, on the basis of the voice activity marker associated with each multiframe, the multiframes to the remote device.	09-03-2009
20090222264	SUB-BAND CODEC WITH NATIVE VOICE ACTIVITY DETECTION - An augmented version of a Low-Complexity Sub-band Coder (LC-SBC) is described herein that is better suited than conventional LC-SBC for wideband voice communication in the Bluetooth™ framework, where minimizing the power consumption is of paramount importance. The augmented version of LC-SBC utilizes certain techniques associated with speech coding, such as Voice Activity Detection (VAD), to reduce bandwidth usage and power consumption while maintaining voice quality. The augmented version of LC-SBC reduces the average bit rate used for transmitting wideband speech in a manner that does not add significant computational complexity. Furthermore, the augmented version of LC-SBC may advantageously be implemented in a manner that does not require any modification of the underlying logic/structure of LC-SBC.	09-03-2009
20090240496	SPEECH RECOGNIZER AND SPEECH RECOGNIZING METHOD - According to one aspect of the invention, a speech recognizer includes: an audio data acquiring portion configured to acquire audio data via a microphone; a speech section detecting portion configured to detect a talking start time and a talking end time based on the audio data; a spoken word identifying portion configured to identify the audio in a speech section from the talking start time to the talking end time; and a noise suppressing portion configured to suppress a generation of a noise from an electrical noise source for the speech section.	09-24-2009
20090254341	APPARATUS, METHOD, AND COMPUTER PROGRAM PRODUCT FOR JUDGING SPEECH/NON-SPEECH - A spectrum calculating unit calculates, for each of the frames, a spectrum by performing a frequency analysis on an acoustic signal. An estimating unit estimates a noise spectrum. An energy calculating unit calculates an energy characteristic amount. An entropy calculating unit calculates a normalized spectral entropy value. A generating unit generates a characteristic vector based on the energy characteristic amounts and the normalized spectral entropy values that have been calculated for a plurality of frames. A likelihood calculating unit calculates a speech likelihood value of a target frame that corresponds to the characteristic vector. In a case where the speech likelihood value is larger than a threshold value, a judging unit judges that the target frame is a speech frame.	10-08-2009
20090254342	DETECTING BARGE-IN IN A SPEECH DIALOGUE SYSTEM - A method for detecting barge-in in a speech dialogue system comprising determining whether a speech prompt is output by the speech dialogue system, and detecting whether speech activity is present in an input signal based on a time-varying sensitivity threshold of a speech activity detector and/or based on speaker information, where the sensitivity threshold is increased if output of a speech prompt is determined and decreased if no output of a speech prompt is determined. If speech activity is detected in the input signal, the speech prompt may be interrupted or faded out. A speech dialogue system configured to detect barge-in is also disclosed.	10-08-2009
20090265169	Techniques for Comfort Noise Generation in a Communication System - A technique of operating a communication device includes dividing a frequency band associated with a background noise signal into respective sub-bands. Respective individual level estimates for each of the respective sub-bands are then determined. A total level estimate for the background noise signal is determined. Finally, a comfort noise signal (whose characteristics are based on the respective individual level estimates and the total level estimate) is provided.	10-22-2009
20090271188	Adjusting A Speech Engine For A Mobile Computing Device Based On Background Noise - Methods, apparatus, and products are disclosed for adjusting a speech engine for a mobile computing device based on background noise, the mobile computing device operatively coupled to a microphone, that include: sampling, through the microphone, background noise for a plurality of operating environments in which the mobile computing device operates; generating, for each operating environment, a noise model in dependence upon the sampled background noise for that operating environment; and configuring the speech engine for the mobile computing device with the noise model for the operating environment in which the mobile computing device currently operates.	10-29-2009
20090271189	Testing A Grammar Used In Speech Recognition For Reliability In A Plurality Of Operating Environments Having Different Background Noise - Methods, systems, and products for testing a grammar used in speech recognition for reliability in a plurality of operating environments having different background noise that include: receiving recorded background noise for each of the plurality of operating environments; generating a test speech utterance for recognition by a speech recognition engine using a grammar; mixing the test speech utterance with each recorded background noise, resulting in a plurality of mixed test speech utterances, each mixed test speech utterance having different background noise; performing, for each of the mixed test speech utterances, speech recognition using the grammar and the mixed test speech utterance, resulting in speech recognition results for each of the mixed test speech utterances; and evaluating, for each recorded background noise, speech recognition reliability of the grammar in dependence upon the speech recognition results for the mixed test speech utterance having that recorded background noise.	10-29-2009
20090271190	Method and Apparatus for Voice Activity Determination - In accordance with an example embodiment of the invention, there is provided an apparatus for detecting voice activity in an audio signal. The apparatus comprises a first voice activity detector for making a first voice activity detection decision based at least in part on the voice activity of a first audio signal received from a first microphone. The apparatus also comprises a second voice activity detector for making a second voice activity detection decision based at least in part on an estimate of a direction of the first audio signal and an estimate of a direction of a second audio signal received from a second microphone. The apparatus further comprises a classifier for making a third voice activity detection decision based at least in part on the first and second voice activity detection decisions.	10-29-2009
20090276213	ROBUST DOWNLINK SPEECH AND NOISE DETECTOR - A voice activity detection process is robust to low and high signal-to-noise ratio speech and to signal loss. A process divides an aural signal into one or more bands. Signal magnitudes of frequency components and the respective noise components are estimated. A noise adaptation rate modifies estimates of noise components based on differences between the signal and the estimated noise and on signal variability.	11-05-2009
20090281805	INTEGRATED SPEECH INTELLIGIBILITY ENHANCEMENT SYSTEM AND ACOUSTIC ECHO CANCELLER - A system and method is described that improves the intelligibility of a far-end telephone speech signal to a user of a telephony device in the presence of near-end background noise. As described herein, the system and method improves the intelligibility of the far-end telephone speech signal in a manner that does not require user input and that minimizes the distortion of the far-end telephone speech signal. The system is integrated with an acoustic echo canceller and shares information therewith.	11-12-2009
20090287485	ADAPTIVELY FILTERING A MICROPHONE SIGNAL RESPONSIVE TO VIBRATION SENSED IN A USER'S FACE WHILE SPEAKING - Electronic devices and methods are disclosed that adaptively filter a microphone signal responsive to vibration that is sensed in the face of a user speaking into a microphone of the device. An electronic device can include a microphone, a vibration sensor, a vibration characterization unit, and an adaptive sound filter. The microphone generates a microphone signal that can include a user speech component and a background noise component. The vibration sensor senses vibration of the face while a user speaks into the microphone, and generates a vibration signal containing frequency components that are indicative of the sensed vibration. The vibration characterization unit generates speech characterization data that characterize at least one of the frequency components of the vibration signal that is associated with the speech component of the microphone signal. The adaptive sound filter filters the microphone signal using filter coefficients that are tuned in response to the speech characterization data to generate a filtered speech signal with an attenuated background noise component relative to the user speech component from the microphone signal.	11-19-2009
20090299741	Detection and Use of Acoustic Signal Quality Indicators - A computer-driven device assists a user in self-regulating speech control of the device. The device processes an input signal representing human speech to compute acoustic signal quality indicators indicating conditions likely to be problematic to speech recognition, and advises the user of those conditions.	12-03-2009
20090299742	SYSTEMS, METHODS, APPARATUS, AND COMPUTER PROGRAM PRODUCTS FOR SPECTRAL CONTRAST ENHANCEMENT - Systems, methods, and apparatus for spectral contrast enhancement of speech signals, based on information from a noise reference that is derived by a spatially selective processing filter from a multichannel sensed audio signal, are disclosed.	12-03-2009
20100004928	VOICE/MUSIC DETERMINING APPARATUS AND METHOD - A voice/music determining apparatus is configured to calculate first feature parameters for discriminating between a voice signal and a musical signal; and calculate second feature parameters for discriminating between a musical signal and a background-sound-superimposed voice signal. A first score is calculated to indicate likelihood that the input audio signal is a voice signal or a musical signal as a sum of weight-multiplied first feature parameters. A second score is calculated to indicate likelihood that the input audio signal is a musical signal or a background-sound-superimposed voice signal as a sum of weight-multiplied second feature parameters. It is determined whether the input audio signal is a voice signal or a musical signal on the basis of the first score. Further, it is determined on the basis of the second score whether an input audio signal determined to be a musical signal is a background-sound-superimposed voice signal.	01-07-2010
20100004929	APPARATUS AND METHOD FOR CANCELING NOISE OF VOICE SIGNAL IN ELECTRONIC APPARATUS - An apparatus and a method for canceling noise in a voice signal in an electronic apparatus are provided. The apparatus includes a Generalized Sidelobe Canceller (GSC) and a decision unit. The GSC cancels noise components from signals with different phases input via a plurality of microphones. The decision unit estimates a Signal-to-Noise Ratio (SNR) of an input signal to determine a step-size of a filter included in the GSC.	01-07-2010
20100017206	Sound source separation method and system using beamforming technique - A system and method for sound source separation. The system and method use a beamforming technique. The sound source separation system includes a windowing processor; a DFT transformer; a transfer function estimator; and a noise estimator. The system also includes a voice signal extractor that cancels individual voice signals, except an individual voice signal that is desired to be extracted among individual voice signals, from the integrated voice signals. The system further includes a voice signal detector that cancels a noise part provided through the noise estimator from a transfer function of an individual voice signal which is desired to be detected and extracts a noise-canceled individual voice signal. Even when two or more sound sources are simultaneously input, the sound sources can be separated from each other and separately stored and managed, or an initial sound source can be stored and managed.	01-21-2010
20100049514	DYNAMIC SPEECH SHARPENING - An enhanced system for speech interpretation is provided. The system may include receiving a user verbalization and generating one or more preliminary interpretations of the verbalization by identifying one or more phonemes in the verbalization. An acoustic grammar may be used to map the phonemes to syllables or words, and the acoustic grammar may include one or more linking elements to reduce a search space associated with the grammar. The preliminary interpretations may be subject to various post-processing techniques to sharpen accuracy of the preliminary interpretation. A heuristic model may assign weights to various parameters based on a context, a user profile, or other domain knowledge. A probable interpretation may be identified based on a confidence score for each of a set of candidate interpretations generated by the heuristic model. The model may be augmented or updated based on various information associated with the interpretation of the verbalization.	02-25-2010
20100057454	SYSTEM AND METHOD FOR ECHO CANCELLATION - An echo canceller for improved recognition and removal of an echo from a communication device. The echo canceller can dynamically reduce echo using an improved energy estimator and an improved adaptive filter. The improved energy estimator can determine if conversation is in a single talk period or a double talk period based on the combined energy of both the near end background noise and speech. The improved adaptive filter can reduce echo by dynamically changing adaptation speed or step size. In double talk, the adaptive filter(s) can dynamically slow-down or stop adaptation. In single talk, the filter can dynamically increase the speed of adaptation to improve accuracy, or decrease adaptation speed for stability.	03-04-2010
20100070274	APPARATUS AND METHOD FOR SPEECH RECOGNITION BASED ON SOUND SOURCE SEPARATION AND SOUND SOURCE IDENTIFICATION - An apparatus for speech recognition based on source separation and identification includes: a sound source separator for separating mixed signals, which are input to two or more microphones, into sound source signals by using independent component analysis (ICA), and estimating direction information of the separated sound source signals; and a speech recognizer for calculating normalized log likelihood probabilities of the separated sound source signals. The apparatus further includes a speech signal identifier for identifying a sound source corresponding to a user's speech signal by using both the estimated direction information and the reliability information based on the normalized log likelihood probabilities.	03-18-2010
20100076757	ADAPTING A COMPRESSED MODEL FOR USE IN SPEECH RECOGNITION - A speech recognition system includes a receiver component that receives a distorted speech utterance. The speech recognition system also includes an adaptor component that selectively adapts parameters of a compressed model used to recognize at least a portion of the distorted speech utterance, wherein the adaptor component selectively adapts the parameters of the compressed model based at least in part upon the received distorted speech utterance.	03-25-2010
20100076758	PHASE SENSITIVE MODEL ADAPTATION FOR NOISY SPEECH RECOGNITION - A speech recognition system described herein includes a receiver component that receives a distorted speech utterance. The speech recognition system also includes an updater component that is in communication with a first model and a second model, wherein the updater component automatically updates parameters of the second model based at least in part upon joint estimates of additive and convolutive distortions output by the first model, wherein the joint estimates of additive and convolutive distortions are estimates of distortions based on a phase-sensitive model in the speech utterance received by the receiver component. Further, distortions other than additive and convolutive distortions, including other stationary and nonstationary sources, can also be estimated and used to update the parameters of the second model.	03-25-2010
20100076759	APPARATUS AND METHOD FOR RECOGNIZING A SPEECH - A noisy vector is extracted from a noisy speech, which is a clean speech on which a noise is superimposed. A noise parameter of the noise is estimated from the noisy vector. A prior distribution parameter of a clean vector of the clean speech is already stored. A joint Gaussian distribution parameter between the clean vector and the noisy vector is calculated by unscented transformation, from the noise parameter and the prior distribution parameter. A posterior distribution parameter of the clean vector is calculated by the joint Gaussian distribution parameter, from the noisy vector. By comparing the posterior distribution parameter with a standard pattern of each word previously stored, a word sequence of the noisy speech is output.	03-25-2010
20100082340	SPEECH RECOGNITION SYSTEM AND METHOD FOR GENERATING A MASK OF THE SYSTEM - The speech recognition system of the present invention includes: a sound source separating section which separates mixed speeches from multiple sound sources; a mask generating section which generates a soft mask which can take continuous values between 0 and 1 for each separated speech according to reliability of separation in separating operation of the sound source separating section; and a speech recognizing section which recognizes speeches separated by the sound source separating section using soft masks generated by the mask generating section.	04-01-2010
20100082341	Speaker recognition device and method using voice signal analysis - A device includes a speaker recognition device operable to perform a method that identifies a speaker using voice signal analysis. The speaker recognition device and method identifies the speaker by analyzing a voice signal and comparing the signal with voice signal characteristics of speakers, which are statistically classified. The device and method is applicable to a case where a voice signal is a voiced sound or a voiceless sound or to a case where no information on a voice signal is present. Since voice/non-voice determination is performed, the speaker can be reliably identified from the voice signal. The device and method is adaptable to applications that require a real-time process due to a small amount of data to be calculated and fast processing. Furthermore, the device and method can be variously applied to portable devices due to low power consumption.	04-01-2010
20100088093	Voice Command Acquisition System and Method - A voice command acquisition method and system for motor vehicles is improved in that noise source information is obtained directly from the vehicle system bus. Upon receiving an input signal with a voice command, the system bus is queried for one or more possible sources of a noise component in the input signal. In addition to vehicle-internal information (e.g., window status, fan blower speed, vehicle speed), the system may acquire external information (e.g., weather status) in order to better classify the noise component in the input signal. If the noise source is found to be a window, for example, the driver may be prompted to close the window. In addition, if the fan blower is at a high speed level, it may be slowed down automatically.	04-08-2010
20100088094	DEVICE AND METHOD FOR VOICE ACTIVITY DETECTION - A voice activity detection (VAD) device and method are disclosed, so that the VAD threshold can be adaptive to the background noise variation. The VAD device includes: a background analyzing unit, adapted to: analyze background noise features of a current signal according to an input VAD judgment result, obtain parameters related to the background noise variation, and output these parameters; a VAD threshold adjusting unit, adapted to: obtain a bias of the VAD threshold according to the parameters output by the background analyzing unit, and output the bias of the VAD threshold; and a VAD judging unit, adapted to: modify a VAD threshold to be modified according to the bias of the VAD threshold output by the VAD threshold adjusting unit, judge the background noise by using the modified VAD threshold, and output a VAD judgment result.	04-08-2010
20100094624	SYSTEM AND METHOD FOR MACHINE-BASED DETERMINATION OF SPEECH INTELLIGIBILITY IN AN AIRCRAFT DURING FLIGHT OPERATIONS - A method for effecting a machine-based determination of speech intelligibility in an aircraft during flight operations includes: (a) in no particular order: (1) providing a representation of a machine-based speech evaluating signal; and (2) providing a representation of in-flight noise; (b) combining the representation of a machine-based speech evaluation signal and the representation of in-flight noise to obtain a combined noise signal; and (c) employing the combined noise signal to present the machine-based determination of speech intelligibility in an aircraft during flight operations.	04-15-2010
20100094625	METHODS AND APPARATUS FOR NOISE ESTIMATION - A system and method are disclosed for noise level/spectrum estimation and speech activity detection. Some embodiments include a probabilistic model to estimate noise level and subsequently detect the presence of speech. These embodiments outperform standard voice activity detectors (VADs), producing improved detection in a variety of noisy environments.	04-15-2010
20100114570	APPARATUS AND METHOD FOR RESTORING VOICE - An apparatus and a method for restoring voice are provided. The apparatus reduces noise included in a voice signal input to a microphone and outputs a voice signal having reduced noise, detects harmonic frequencies from the voice signal having reduced noise, and, according to the detected harmonic frequencies, restores the voice signal having reduced noise to approximately its original state before being input to the microphone.	05-06-2010
20100121636	Multisensory Speech Detection - A computer-implemented method of multisensory speech detection is disclosed. The method comprises determining an orientation of a mobile device and determining an operating mode of the mobile device based on the orientation of the mobile device. The method further includes identifying speech detection parameters that specify when speech detection begins or ends based on the determined operating mode and detecting speech from a user of the mobile device based on the speech detection parameters.	05-13-2010
20100131268	VOICE-ESTIMATION INTERFACE AND COMMUNICATION SYSTEM - An apparatus having a voice-estimation (VE) interface that probes the vocal tract of a user with sub-threshold acoustic waves to estimate the user's voice while the user speaks silently or audibly in a noisy or socially sensitive environment. In one embodiment, the VE interface is integrated into a cell phone that directs an estimated-voice signal over a network to a remote party to enable (i) the user to have a conversation with the remote party without disturbing other people, e.g., at a meeting, conference, movie, or performance, and (ii) the remote party to more-clearly hear the user whose voice would otherwise be overwhelmed by a relatively loud ambient noise due to the user being, e.g., in a nightclub, disco, or flying aircraft.	05-27-2010
20100131269	SYSTEMS, METHODS, APPARATUS, AND COMPUTER PROGRAM PRODUCTS FOR ENHANCED ACTIVE NOISE CANCELLATION - Uses of an enhanced sidetone signal in an active noise cancellation operation are disclosed.	05-27-2010
20100153104	Noise Suppressor for Robust Speech Recognition - Described is noise reduction technology generally for speech input in which a noise-suppression related gain value for the frame is determined based upon a noise level associated with that frame in addition to the signal to noise ratios (SNRs). In one implementation, a noise reduction mechanism is based upon minimum mean square error, Mel-frequency cepstra noise reduction technology. A high gain value (e.g., one) is set to accomplish little or no noise suppression when the noise level is below a threshold low level, and a low gain value is set or computed to accomplish large noise suppression above a threshold high noise level. A noise-power dependent function, e.g., a log-linear interpolation, is used to compute the gain between the thresholds. Smoothing may be performed by modifying the gain value based upon a prior frame's gain value. Also described is learning parameters used in noise reduction via a step-adaptive discriminative learning algorithm.	06-17-2010
20100161326	SPEECH RECOGNITION SYSTEM AND METHOD - A speech recognition system includes: a speed level classifier for measuring a moving speed of a moving object by using a noise signal at an initial time of speech recognition to determine a speed level of the moving object; a first speech enhancement unit for enhancing sound quality of an input speech signal of the speech recognition by using a Wiener filter, if the speed level of the moving object is equal to or lower than a specific level; and a second speech enhancement unit for enhancing the sound quality of the input speech signal by using a Gaussian mixture model, if the speed level of the moving object is higher than the specific level. The system further includes an end point detection unit for detecting start and end points, and an elimination unit for eliminating sudden noise components based on a sudden noise Gaussian mixture model.	06-24-2010
20100169089	Voice Recognizing Apparatus, Voice Recognizing Method, Voice Recognizing Program, Interference Reducing Apparatus, Interference Reducing Method, and Interference Reducing Program - A voice recognizing apparatus includes a microphone	07-01-2010
20100169090	WEIGHTED SEQUENTIAL VARIANCE ADAPTATION WITH PRIOR KNOWLEDGE FOR NOISE ROBUST SPEECH RECOGNITION - A method for adapting acoustic models used for automatic speech recognition is provided. The method includes estimating noise in a portion of a speech signal, determining a first estimated variance scaling vector using an estimated 2-order polynomial and the noise estimation, wherein the estimated 2-order polynomial represents a priori knowledge of a dependency of a variance scaling vector on noise, determining a second estimated variance scaling vector using statistics from prior portions of the speech signal, determining a variance scaling factor using the first estimated variance scaling vector and the second estimated variance scaling vector, and using the variance scaling factor to adapt an acoustic model.	07-01-2010
20100198592	Method for recognizing and interpreting patterns in noisy data sequences - This invention maps possibly noisy digital input from any of a number of different hardware or software sources such as keyboards, automatic speech recognition systems, cell phones, smart phones or the web onto an interpretation consisting of an action and one or more physical objects, such as robots, machinery, vehicles, etc. or digital objects such as data files, tables and databases. Tables and lists of (i) homonyms and misrecognitions, (ii) thematic relation patterns, and (iii) lexicons are used to generate alternative forms of the input which are scored to determine the best interpretation of the noisy input. The actions may be executed internally or output to any device which contains a digital component such as, but not limited to, a computer, a robot, a cell phone, a smart phone or the web. This invention may be implemented on sequential and parallel compute engines and systems.	08-05-2010
20100198593	Speech Enhancement with Noise Level Estimation Adjustment - Enhancing speech components of an audio signal composed of speech and noise components includes controlling the gain of the audio signal in ones of its subbands, wherein the gain in a subband is reduced as the level of estimated noise components increases with respect to the level of speech components, wherein the level of estimated noise components is determined at least in part by (1) comparing an estimated noise components level with the level of the audio signal in the subband and increasing the estimated noise components level in the subband by a predetermined amount when the input signal level in the subband exceeds the estimated noise components level in the subband by a limit for more than a defined time, or (2) obtaining and monitoring the signal-to-noise ratio in the subband and increasing the estimated noise components level in the subband by a predetermined amount when the signal-to-noise ratio in the subband exceeds a limit for more than a defined time.	08-05-2010
20100204987	In-vehicle speech recognition device - A speech recognition device is disclosed. The device obtains sound of speech of a user and an image of a lip shape of the user. The device determines whether a sudden noise is generated while the user is speaking. When it is determined that a sudden noise is not generated, the device recognizes content of the speech based on the sound of the speech. When it is determined that a sudden noise is generated, the device recognizes the content of the speech based on the image of the lip shape of the user.	08-12-2010
20100204988	SPEECH RECOGNITION METHOD - A speech recognition method includes receiving a speech input signal in a first noise environment which includes a sequence of observations, determining the likelihood of a sequence of words arising from the sequence of observations using an acoustic model, adapting the model trained in a second noise environment to that of the first environment, wherein adapting the model trained in the second environment to that of the first environment includes using second order or higher order Taylor expansion coefficients derived for a group of probability distributions and the same expansion coefficient is used for the whole group.	08-12-2010
20100211388	Speech Enhancement with Voice Clarity - A method for enhancing speech components of an audio signal composed of speech and noise components processes subbands of the audio signal, the processing including controlling the gain of the audio signal in ones of the subbands, wherein the gain in a subband is controlled at least by processes that convey either additive/subtractive differences in gain or multiplicative ratios of gain so as to reduce gain in a subband as the level of noise components increases with respect to the level of speech components in the subband and increase gain in a subband when speech components are present in subbands of the audio signal, the processes each responding to subbands of the audio signal and controlling gain independently of each other to provide a processed subband audio signal.	08-19-2010
20100217590	SPEAKER LOCALIZATION SYSTEM AND METHOD - A system and method for performing speaker localization is described. The system and method utilizes speaker recognition to provide an estimate of the direction of arrival (DOA) of speech sound waves emanating from a desired speaker with respect to a microphone array included in the system. Candidate DOA estimates may be preselected or generated by one or more other DOA estimation techniques. The system and method is suited to support steerable beamforming as well as other applications that utilize or benefit from DOA estimation. The system and method provides robust performance even in systems and devices having small microphone arrays and thus may advantageously be implemented to steer a beamformer in a cellular telephone or other mobile telephony terminal featuring a speakerphone mode.	08-26-2010
20100241428	METHOD AND SYSTEM FOR BEAMFORMING USING A MICROPHONE ARRAY	09-23-2010
20100262423	FEATURE COMPENSATION APPROACH TO ROBUST SPEECH RECOGNITION - Described is a technology by which a feature compensation approach to speech recognition uses a high-order vector Taylor series (HOVTS) approximation of a model of distortions to improve recognition accuracy. Speech recognizer models trained with clean speech degrade when later dealing with speech that is corrupted by additive noises and convolutional distortions. The approach attempts to remove any such noise/distortions from the input speech. To use the HOVTS approximation, a Gaussian mixture model is trained and used to convert cepstral domain feature vectors to log spectrum components. HOVTS computes statistics for the components, which are transformed back to the cepstral domain. A noise/distortion estimate is obtained, and used to provide a clean speech estimate to the recognizer.	10-14-2010
20100262424	Method of Eliminating Background Noise and a Device Using the Same - The present invention provides a method of eliminating background noise and a device using the same. The method of eliminating background noise comprises the steps of: detecting an effective value of a received audio signal, and generating an average power signal of the received audio signal; generating a noise eliminating control signal by comparing the average power signal with a first threshold; and eliminating the noise, and amplifying the voice signal using the noise eliminating control signal. A device of eliminating background noise comprises a detecting unit, which is configured to detect an effective value, and generate an average power signal of the received audio signal; a first signal generating unit, which is configured to generate a noise eliminating control signal; and an amplifying unit, which is configured to eliminate the noise, and amplify the voice signal.	10-14-2010
20100262425	NOISE SUPPRESSION DEVICE AND NOISE SUPPRESSION METHOD - Disclosed is a noise suppression device capable of better noise suppression by means of a simpler structure and with a lighter computational load. A noise suppression device (	10-14-2010
20100268533	APPARATUS AND METHOD FOR DETECTING SPEECH - A speech detection apparatus and method are provided. The speech detection apparatus and method determine whether a frame is speech or not using feature information extracted from an input signal. The speech detection apparatus may estimate a situation related to an input frame and determine which feature information is required for speech detection for the input frame in the estimated situation. The speech detection apparatus may detect a speech signal using dynamic feature information that may be more suitable to the situation of a particular frame, instead of using the same feature information for each and every frame.	10-21-2010
20100292987	CIRCUIT STARTUP METHOD AND CIRCUIT STARTUP APPARATUS UTILIZING UTTERANCE ESTIMATION FOR USE IN SPEECH PROCESSING SYSTEM PROVIDED WITH SOUND COLLECTING DEVICE - A circuit startup method utilizing utterance estimation in a speech processing system including a sound collecting device is provided. The circuit startup method includes a subset power supply step of supplying power to the sound collecting device and a signal processing circuit, and a sound collecting step of inputting a sound from the sound collecting device through the signal processing circuit. The circuit startup method further includes an utterance estimation step of estimating whether or not a speech is contained in the inputted sound, and a power supply step of supplying power to the speech processing circuit for an utterance interval when it is estimated that a speech is contained from an estimation result of the utterance estimation step.	11-18-2010
20100299144	METHOD AND APPARATUS FOR THE USE OF CROSS MODAL ASSOCIATION TO ISOLATE INDIVIDUAL MEDIA SOURCES - Apparatus for isolation of a media stream of a first modality from a complex media source having at least two media modalities, and multiple objects, and events, comprises: recording devices for the different modalities; an associator for associating between events recorded in said first modality and events recorded in said second modality, and providing an association output; and an isolator that uses the association output for isolating those events in the first mode correlating with events in the second mode associated with a predetermined object, thereby to isolate an isolated media stream associated with said predetermined object. Thus it is possible to identify events such as hand or mouth movements, and associate these with sounds, and then produce a filtered track of only those sounds associated with the events. In this way a particular speaker or musical instrument can be isolated from a complex scene.	11-25-2010
20100299145	ACOUSTIC DATA PROCESSOR AND ACOUSTIC DATA PROCESSING METHOD - An acoustic data processor according to the present invention is used for processing acoustic data including signal sounds to reduce noises generated by a mechanical apparatus. The acoustic data processor includes a motion status obtaining section for obtaining motion status of the mechanical apparatus, an acoustic data obtaining section for obtaining acoustic data corresponding to the obtained motion status, and a database for storing various motion statuses of the mechanical apparatus in a unit time and corresponding acoustic data as templates. The acoustic data processor further includes a database searching section for searching the database to retrieve the template having the motion status closest to the obtained motion status; and a template subtraction section for subtracting the acoustic data of the template having the motion status closest to the obtained motion status from the obtained acoustic data to reduce noises generated by the mechanical apparatus.	11-25-2010
20100318354	NOISE ADAPTIVE TRAINING FOR SPEECH RECOGNITION - Technologies are described herein for noise adaptive training to achieve robust automatic speech recognition. Through the use of these technologies, a noise adaptive training (NAT) approach may use both clean and corrupted speech for training. The NAT approach may normalize the environmental distortion as part of the model training. A set of underlying “pseudo-clean” model parameters may be estimated directly. This may be done without point estimation of clean speech features as an intermediate step. The pseudo-clean model parameters learned from the NAT technique may be used with a Vector Taylor Series (VTS) adaptation. Such adaptation may support decoding noisy utterances during the operating phase of an automatic voice recognition system.	12-16-2010
20110004472	Speech Recognition Using Channel Verification - A method for automatic speech recognition includes determining for an input signal a plurality of scores representative of certainties that the input signal is associated with corresponding states of a speech recognition model, using the speech recognition model and the determined scores to compute an average signal, computing a difference value representative of a difference between the input signal and the average signal, and processing the input signal in accordance with the difference value.	01-06-2011
20110010171	Singular Value Decomposition for Improved Voice Recognition in Presence of Multi-Talker Background Noise - A system and method for providing speech recognition functionality offers improved accuracy and robustness in noisy environments having multiple speakers. The described technique includes receiving speech energy and converting the received speech energy to a digitized form. The digitized speech energy is decomposed into features that are then projected into a feature space having multiple speaker subspaces. The projected features fall either into one of the multiple speaker subspaces or outside of all speaker subspaces. A speech recognition operation is performed on a selected one of the multiple speaker subspaces to resolve the utterance to a command or data.	01-13-2011
20110010172	NOISE REDUCTION SYSTEM USING A SENSOR BASED SPEECH DETECTOR - Speech detection is a technique to determine and classify periods of speech. In a normal conversation, each speaker speaks less than half the time. The remaining time is devoted to listening to the other end and pauses between speech and silence. The classification is usually done by comparing the signal energy to a threshold. Classifying speech as noise and noise as speech may affect the performance of the communication device. The current invention overcomes such problems by utilizing an alternate sensor signal indicating the presence or absence of speech. In the current invention, the communication device receives an audio signal via single or multiple microphones. The speech sensor may generate a unique signal based on the facial, bone, lips and/or throat movements. The system then combines the information received by the microphones and the speech sensor to decide the presence or absence of speech. This decision can be used in the coding, compression, noise reduction and other aspects of signal processing.	01-13-2011
20110015925	SPEECH RECOGNITION SYSTEM AND METHOD - A speech recognition method, comprising:	01-20-2011
20110029308	Speech & Music Discriminator for Multi-Media Application - The present invention relates to means and methods of classifying speech and music signals in voice communication systems, devices, telephones, and methods, and more specifically, to systems, devices, and methods that automate control when either speech or music is detected over communication links. The present invention provides a novel system and method for monitoring the audio signal, analyzing selected audio signal components, comparing the results of analysis with a pre-determined threshold value, and classifying the audio signal as either speech or music.	02-03-2011
20110029309	SIGNAL SEPARATING APPARATUS AND SIGNAL SEPARATING METHOD - Provided are a signal separating apparatus and a signal separating method capable of solving the permutation problem and separating user speech to be extracted. The signal separating apparatus separates a specific speech signal and a noise signal from a received sound signal. First, a joint probability density distribution estimation unit of a permutation solving unit calculates joint probability density distributions of the respective separated signals. Then, a classifying determination unit of the permutation solving unit determines classifying based on shapes of the calculated joint probability density distributions.	02-03-2011
20110029310	PROCEDURE FOR PROCESSING NOISY SPEECH SIGNALS, AND APPARATUS AND COMPUTER PROGRAM THEREFOR - Provided are a noise state determination method and an apparatus and a computer readable recording medium therefor. A noisy speech signal processing method according to the present invention includes calculating a transformed spectrum by transforming an input noisy speech signal to a frequency domain; calculating a smoothed magnitude spectrum by reducing magnitude differences of the transformed spectrum between neighboring frames; calculating a search spectrum which represents an estimated noise component of the smoothed magnitude spectrum; and calculating an identification ratio which represents a ratio of a noise component included in the input noisy speech signal, by using the smoothed magnitude spectrum and the search spectrum. Since a small amount of calculation is required and a large-capacity memory is not required, the present invention may be easily implemented as hardware or software. Also, since an adaptive operation is performed with respect to each frequency sub-band, the accuracy of determining a noise state may be improved.	02-03-2011
20110035216	SPEECH RECOGNITION METHOD FOR ALL LANGUAGES WITHOUT USING SAMPLES - The invention can recognize several languages at the same time without using samples. The key technique is that features of known words in any language are extracted from unknown words or continuous voices. These unknown words, represented by matrices, are spread in the 144-dimensional space. The feature of a known word of any language, represented by a matrix, is simulated by the surrounding unknown words.	02-10-2011
20110040560	METHOD AND MEANS FOR DECODING BACKGROUND NOISE INFORMATION - A basic idea of the invention is to ascertain information on the course of the bit rate switching during an active speech phase. According to the invention, during the speech phase, information on the percentage proportion of broadband active speech frames in comparison to narrowband active speech frames is compiled on the part of the decoder. A high percentage proportion of broadband active speech frames indicates that a broadband use is preferred on the part of the codec and therefore a need exists for synthesizing noise information in broadband form during a DTX phase.	02-17-2011
20110054891	METHOD OF FILTERING NON-STEADY LATERAL NOISE FOR A MULTI-MICROPHONE AUDIO DEVICE, IN PARTICULAR A "HANDS-FREE" TELEPHONE DEVICE FOR A MOTOR VEHICLE - The method comprises the following steps in the frequency domain:	03-03-2011
20110054892	SYSTEM FOR DETECTING SPEECH INTERVAL AND RECOGNIZING CONTINUOUS SPEECH IN A NOISY ENVIRONMENT THROUGH REAL-TIME RECOGNITION OF CALL COMMANDS - The present invention relates to a continuous speech recognition system that is very robust in a noisy environment. In order to recognize continuous speech smoothly in a noisy environment, the system selects call commands, configures a minimum recognition network in token, which consists of the call commands and mute intervals including noises, recognizes the inputted speech continuously in real time, analyzes the reliability of speech recognition continuously and recognizes the continuous speech from a speaker. When a speaker delivers a call command, the system for detecting the speech interval and recognizing continuous speech in a noisy environment through the real-time recognition of call commands measures the reliability of the speech after recognizing the call command, and recognizes the speech from the speaker by transferring the speech interval following the call command to a continuous speech-recognition engine at the moment when the system recognizes the call command.	03-03-2011
20110071824	Systems and Methods for Multiple Pitch Tracking - An apparatus includes a function module, a strength module, and a filter module. The function module compares an input signal, which has a component, to a first delayed version of the input signal and a second delayed version of the input signal to produce a multi-dimensional model. The strength module calculates a strength of each extremum from a plurality of extrema of the multi-dimensional model based on a value of at least one opposite extremum of the multi-dimensional model. The strength module then identifies a first extremum from the plurality of extrema, which is associated with a pitch of the component of the input signal, that has the strength greater than the strength of the remaining extrema. The filter module extracts the pitch of the component from the input signal based on the strength of the first extremum.	03-24-2011
20110071825	DEVICE, METHOD AND PROGRAM FOR VOICE DETECTION AND RECORDING MEDIUM - To this end, a voice detection device includes a band-based power calculation unit that calculates a total of signal power values (sub-band power) of signals entered from the microphones from one preset frequency width (sub-band) to another. The voice detection device also includes a band-based noise estimation unit that estimates the sub-band based noise power, and a sub-band based SNR calculation unit. The sub-band based SNR calculation unit calculates a sub-band SNR from one sub-band to another to output the largest one of the sub-band SNRs as an SNR for a microphone of interest. The voice detection device further includes a voice/non-voice decision unit that determines the voice/non-voice using the SNR for the microphone of interest.	03-24-2011
20110099010	MULTI-CHANNEL NOISE SUPPRESSION SYSTEM - Techniques are described herein that provide multi-channel noise suppression based on a Teager energy ratio. A Teager energy ratio is a ratio of an average Teager energy operator (TEO) energy of a first signal to an average TEO energy of a second signal. The average TEO energy of a signal is defined by the equation:	04-28-2011
20110106533	Multi-Microphone Voice Activity Detector - A dual microphone voice activity detector system is presented. A voice activity detector system estimates the signal level and noise level at each microphone. A level differential between the two microphones of nearby sounds such as the signal is greater than the level differential of more distant sounds such as the noise. Thus, the voice activity detector detects the presence of nearby sounds.	05-05-2011
20110125497	Method and System for Voice Activity Detection - A method of voice activity detection is provided that includes measuring a first signal level in a first sample of a first audio signal from a first audio capture device and a second signal level in a second sample of a second audio signal from a second audio capture device, and detecting voice activity based on the first signal level, the second signal level, and an activity threshold.	05-26-2011
20110144987	USING PITCH DURING SPEECH RECOGNITION POST-PROCESSING TO IMPROVE RECOGNITION ACCURACY - A method of automated speech recognition in a vehicle. The method includes receiving audio in the vehicle, pre-processing the received audio to generate acoustic feature vectors, decoding the generated acoustic feature vectors to produce at least one speech hypothesis, and post-processing the at least one speech hypothesis using pitch to improve speech recognition accuracy. The speech hypothesis can be accepted as recognized speech during post-processing if pitch is present in the received audio. Alternatively, a pitch count for the received audio can be determined, N-best speech hypotheses can be post-processed by comparing the pitch count to syllable counts associated with the speech hypotheses, and the speech hypothesis having a syllable count equal to the pitch count can be accepted as recognized speech.	06-16-2011
20110144988	EMBEDDED AUDITORY SYSTEM AND METHOD FOR PROCESSING VOICE SIGNAL - An embedded auditory system includes a voice detecting unit for receiving a voice signal as an input and dividing the voice signal into a voice section and a non-voice section; a noise removing unit for removing a noise in the voice section of the voice signal using noise information in the non-voice section of the voice signal; and a keyword spotting unit for extracting a feature vector from the voice signal noise-removed by the noise removing unit and detecting a keyword from the voice section of the voice signal using the feature vector. A method for processing a voice signal includes receiving a voice signal as an input and dividing the voice signal into a voice section and a non-voice section; removing a noise in the voice section of the voice signal using noise information in the non-voice section of the voice signal; and extracting a feature vector from the voice signal noise-removed by the noise removing unit and detecting a keyword from the voice section of the voice signal using the feature vector.	06-16-2011
20110161078	PITCH MODEL FOR NOISE ESTIMATION - Pitch is tracked for individual samples, which are taken much more frequently than an analysis frame. Speech is identified based on the tracked pitch and the speech components of the signal are removed with a time-varying filter, leaving only an estimate of a time-varying noise signal. This estimate is then used to generate a time-varying noise model which, in turn, can be used to enhance speech related systems.	06-30-2011
20110166856	NOISE PROFILE DETERMINATION FOR VOICE-RELATED FEATURE - Systems, methods, and devices for noise profile determination for a voice-related feature of an electronic device are provided. In one example, an electronic device capable of such noise profile determination may include a microphone and data processing circuitry. When a voice-related feature of the electronic device is not in use, the microphone may obtain ambient sounds. The data processing circuitry may determine a noise profile based at least in part on the obtained ambient sounds. The noise profile may enable the data processing circuitry to at least partially filter other ambient sounds obtained when the voice-related feature of the electronic device is in use.	07-07-2011
20110178800	Distortion Measurement for Noise Suppression System - The present technology measures distortion introduced by a noise suppression system. The distortion may be measured as the difference between a noise-reduced speech signal and an estimated idealized noise reduced reference (EINRR). The EINRR may be determined from a speech component and noise component that are pre-processed, and the EINRR may be used with masks associated with energies lost and added in the speech component and noise component. The EINRR may be calculated on a time varying basis.	07-21-2011
20110184734	METHOD AND APPARATUS FOR VOICE ACTIVITY DETECTION, AND ENCODER - A method and an apparatus for Voice Activity Detection (VAD) and an encoder are provided. The method for VAD includes: acquiring a fluctuant feature value of a background noise when an input signal is the background noise, in which the fluctuant feature value is used to represent fluctuation of the background noise; performing adaptive adjustment on a VAD decision criterion related parameter according to the fluctuant feature value; and performing VAD decision on the input signal by using the decision criterion related parameter on which the adaptive adjustment is performed. The method, the apparatus, and the encoder can be adaptive to fluctuation of the background noise to perform VAD decision, so as to enhance the VAD decision performance, save limited channel bandwidth resources, and use the channel bandwidth efficiently.	07-28-2011
20110202340	SPEAKER VERIFICATION - A speaker verification method is proposed that first builds a general model of user utterances using a set of general training speech data. The user also trains the system by providing a training utterance, such as a passphrase or other spoken utterance. Then in a test phase, the user provides a test utterance which includes some background noise as well as a test voice sample. The background noise is used to bring the condition of the training data closer to that of the test voice sample by modifying the training data and a reduced set of the general data, before creating adapted training and general models. Match scores are generated based on the comparison between the adapted models and the test voice sample, with a final match score calculated based on the difference between the match scores. This final match score gives a measure of the degree of matching between the test voice sample and the training utterance and is based on the degree of matching between the speech characteristics from extracted feature vectors that make up the respective speech signals, and is not a direct comparison of the raw signals themselves. Thus, the method can be used to verify a speaker without necessarily requiring the speaker to provide an identical test phrase to the phrase provided in the training sample.	08-18-2011
20110208520	VOICE ACTIVITY DETECTION BASED ON PLURAL VOICE ACTIVITY DETECTORS - A voice activity detection (VAD) system includes a first voice activity detector, a second voice activity detector and control logic. The first voice activity detector is included in a device and produces a first VAD signal. The second voice activity detector is located externally to the device and produces a second VAD signal. The control logic combines the first and second VAD signals into a VAD output signal. Voice activity may be detected based on the VAD output signal. The second VAD signal can be represented as a flag included in a packet containing digitized audio. The packet can be transmitted to the device from the externally located VAD over a wireless link.	08-25-2011
20110208521	Hidden Markov Model for Speech Processing with Training Method - A method, system and apparatus are shown for identifying non-language speech sounds in a speech or audio signal. An audio signal is segmented and feature vectors are extracted from the segments of the audio signal. The segment is classified using a hidden Markov model (HMM) that has been trained on sequences of these feature vectors. Post-processing components can be utilized to enhance classification. An embodiment is described in which the hidden Markov model is used to classify a segment as a language speech sound or one of a variety of non-language speech sounds. Another embodiment is described in which the hidden Markov model is trained using discriminative learning.	08-25-2011
20110213611	METHOD AND DEVICE FOR CONTROLLING THE TRANSPORT OF AN OBJECT TO A PREDETERMINED DESTINATION - A method and a device control the transport of an object to a predetermined destination. The object is provided with information on a destination to which the object is to be transported. The destination information with which the object is provided is inputted into a speech detection station. A speech recognition system evaluates the destination information detected by the speech detection station. A conveying device transports the object. The destination, the information of which is provided to the object, is determined. The evaluation result of the speech recognition system is used to determine the destination. A release signal is produced. The release signal triggers two processes: the speech detection station is released for the input of destination information on another object. The conveying device transports the object. The transport of the object to the determined destination is triggered.	09-01-2011
20110213612	Acoustic Signal Classification System - A system classifies the source of an input signal. The system determines whether a sound source belongs to classes that may include human speech, musical instruments, machine noise, or other classes of sound sources. The system is robust, performing classification despite variation in sound level and noise masking. Additionally, the system consumes relatively few computational resources and adapts over time to provide consistently accurate classification.	09-01-2011
20110224979	Enhancing Speech Recognition Using Visual Information - Speech recognition device uses visual information to narrow down the range of likely adaptation parameters even before a speaker makes an utterance. Images of the speaker and/or the environment are collected using an image capturing device, and then processed to extract biometric features and environmental features. The extracted features and environmental features are then used to estimate adaptation parameters. A voice sample may also be collected to refine the adaptation parameters for more accurate speech recognition.	09-15-2011
20110224980	SPEECH RECOGNITION SYSTEM AND SPEECH RECOGNIZING METHOD - A speech recognition system according to the present invention includes a sound source separating section which separates mixed speeches from multiple sound sources from one another; a mask generating section which generates a soft mask which can take continuous values between 0 and 1 for each frequency spectral component of a separated speech signal using distributions of speech signal and noise against separation reliability of the separated speech signal; and a speech recognizing section which recognizes speeches separated by the sound source separating section using soft masks generated by the mask generating section.	09-15-2011
20110231186	SPEECH DETECTION METHOD - A speech detection method is presented, which includes the following steps. A first voice captured device samples a first signal and a second voice captured device samples a second signal. The first voice captured device is closer to a speech signal source than the second voice captured device. A first energy corresponding to the first signal within an interval is calculated, a second energy corresponding to the second signal within the interval is calculated, and a first ratio is calculated according to the first energy and the second energy. The first ratio is transformed into a second ratio. A threshold value is set. It is determined whether the speech signal source is detected by comparing the second ratio and the threshold value.	09-22-2011
20110231187	VOICE PROCESSING DEVICE, VOICE PROCESSING METHOD AND PROGRAM - A voice processing device includes a zone detection unit which detects a voice zone including a voice signal or a non-steady sound zone including a non-steady signal other than the voice signal from an input signal and a filter calculation unit that calculates a filter coefficient for holding the voice signal in the voice zone and for suppressing the non-steady signal in the non-steady sound zone according to the detection result by the zone detection unit, in which the filter calculation unit calculates the filter coefficient by using a filter coefficient calculated in the non-steady sound zone for the voice zone and using a filter coefficient calculated in the voice zone for the non-steady sound zone.	09-22-2011
20110238416	Acoustic Model Adaptation Using Splines - Described is a technology by which a speech recognizer is adapted to perform in noisy environments using linear spline interpolation to approximate the nonlinear relationship between clean speech, noise, and noisy speech. Linear spline parameters that minimize the error between predicted noisy features and actual noisy features are learned from training data, along with variance data that reflect regression errors. Also described is compensating for linear channel distortion and updating noise and channel parameters during speech recognition decoding.	09-29-2011
20110238417	SPEECH DETECTION APPARATUS - According to one embodiment, a speech detection apparatus includes a first acoustic signal analyzing unit configured to analyze a frequency spectrum of a first acoustic signal, and a feature extracting unit configured to remove a frequency spectrum of the first acoustic signal from a third acoustic signal, which is obtained by suppressing an echo component of the first acoustic signal contained in a second acoustic signal, so as to extract a feature of a frequency spectrum of the third acoustic signal.	09-29-2011
20110238418	Method and Device for Tracking Background Noise in Communication System - A method and a device for tracking background noise in a communication system, where the method includes: calculating a SNR of a current frame according to input audio signals; increasing a frame counter, and calculating tone features and signal steadiness features of the current frame if the SNR of the current frame is not smaller than a first threshold; judging the possibility of a time window including a noise interval according to the calculated tone feature values and signal steadiness feature values of each frame of the time window when the frame counter is increased to the length of the time window; and extracting noise features in the time window. Existence of background noise is analyzed continuously in a time window, so that background noise that changes frequently and dramatically can be detected or tracked rapidly.	09-29-2011
20110246193	SIGNAL SEPARATION METHOD, AND COMMUNICATION SYSTEM SPEECH RECOGNITION SYSTEM USING THE SIGNAL SEPARATION METHOD - A method for signal separation, communication system, and voice recognition system using the method are disclosed. The method which is performed by an apparatus for signal separation includes receiving a mixed signal, wherein a first signal based on a first sound source signal and a second signal based on a second sound source signal are mixed, via a single voice input sensor, applying the modified BSS algorithm for separating the first sound source signal and the second sound source signal based on the received mixed signal, and separating the first sound source signal according to the result of applying the modified BSS algorithm.	10-06-2011
20110257971	Camera-Assisted Noise Cancellation and Speech Recognition - Methods, system, and articles are described herein for receiving an audio input and a facial image sequence for a period of time, in which the audio input includes speech input from multiple speakers. The audio input is extracted based on the received facial image sequence to extract a speech input of a particular speaker.	10-20-2011
20110282663	TRANSIENT NOISE REJECTION FOR SPEECH RECOGNITION - A method of and system for transient noise rejection for improved speech recognition. The method comprises the steps of (a) receiving audio including user speech and at least some transient noise associated with the speech, (b) converting the received audio into digital data, (c) segmenting the digital data into acoustic frames, and (d) extracting acoustic feature vectors from the acoustic frames. The method also comprises the steps of (e) evaluating the acoustic frames for transient noise on a frame-by-frame basis, (f) rejecting those acoustic frames having transient noise, (g) accepting as speech frames those acoustic frames having no transient noise and, thereafter, (h) recognizing the user speech using the speech frames.	11-17-2011
20110288860	SYSTEMS, METHODS, APPARATUS, AND COMPUTER-READABLE MEDIA FOR PROCESSING OF SPEECH SIGNALS USING HEAD-MOUNTED MICROPHONE PAIR - A noise cancelling headset for voice communications contains a microphone at each of the user's ears and a voice microphone. The headset shares the use of the ear microphones for improving signal-to-noise ratio on both the transmit path and the receive path.	11-24-2011
20110307253	Speech and Noise Models for Speech Recognition - An audio signal generated by a device based on audio input from a user may be received. The audio signal may include at least a user audio portion that corresponds to one or more user utterances recorded by the device. A user speech model associated with the user may be accessed and a determination may be made that background audio in the audio signal is below a defined threshold. In response to determining that the background audio in the audio signal is below the defined threshold, the accessed user speech model may be adapted based on the audio signal to generate an adapted user speech model that models speech characteristics of the user. Noise compensation may be performed on the received audio signal using the adapted user speech model to generate a filtered audio signal with reduced background audio compared to the received audio signal.	12-15-2011
20110313763	PICKUP SIGNAL PROCESSING APPARATUS, METHOD, AND PROGRAM PRODUCT - According to one embodiment, a pickup signal processing apparatus includes microphones, a sound determining unit, a signal level calculating unit, a setting unit, and a calculating unit. The sound determining unit determines whether pickup signals picked up by the microphones are signals from a neighboring sound source or a background noise signal. The signal level calculating unit calculates the signal levels for the microphones. The setting unit sets a gain value of at least one microphone and reduces a difference between the signal levels for the microphones on the basis of the signal levels for the microphones, when determined that the pickup signal is the background noise signal. The calculating unit multiplies the pickup signal of the at least one microphone by the gain value set by the setting unit.	12-22-2011
20120004909	SPEECH AUDIO PROCESSING - A speech processing engine is provided that in some embodiments, employs Kalman filtering with a particular speaker's glottal information to clean up an audio speech signal for more efficient automatic speech recognition.	01-05-2012
20120022863	METHOD AND APPARATUS FOR VOICE ACTIVITY DETECTION - A method and apparatus for detecting voice activity are disclosed. The method of detecting voice activity is performed in a Continuous Listening environment which includes: extracting a feature parameter from a frame signal; determining whether the frame signal is a voice signal or a noise signal by comparing the feature parameter with model parameters of a plurality of comparison signals, respectively; and outputting the frame signal when the frame signal is determined to be a voice signal. The apparatus includes a classifier module which extracts a feature parameter from a frame signal, and generates labeling information with respect to the frame signal by comparing the feature parameter with model parameters of a plurality of comparison signals; and	01-26-2012
20120022864	METHOD AND DEVICE FOR CLASSIFYING BACKGROUND NOISE CONTAINED IN AN AUDIO SIGNAL - Embodiments of methods and devices for classifying background noise contained in an audio signal are disclosed. In one embodiment, the device includes a module for extracting from the audio signal a background noise signal, termed the noise signal. Also included is a second module that calculates a first parameter, termed the temporal indicator. The temporal indicator relates to the temporal evolution of the noise signal. The second module also calculates a second parameter, termed the frequency indicator. The frequency indicator relates to the frequency spectrum of the noise signal. Finally, the device includes a third module that classifies the background noise by selecting, as a function of the calculated values of the temporal indicator and of the frequency indicator, a class of background noise from among a predefined set of classes of background noise.	01-26-2012
20120046944	ENVIRONMENT RECOGNITION OF AUDIO INPUT - The present disclosure introduces a new technique for environmental recognition of audio input using feature selection. In one embodiment, audio data may be identified using feature selection. A plurality of audio descriptors may be ranked by calculating a Fisher's discriminant ratio for each audio descriptor. Next, a configurable number of highest ranking audio descriptors based on the Fisher's discriminant ratio of each audio descriptor are selected to obtain a selected feature set. The selected feature set is then applied to audio data. Other embodiments are also described.	02-23-2012
20120078624	METHOD FOR DETECTING VOICE SECTION FROM TIME-SPACE BY USING AUDIO AND VIDEO INFORMATION AND APPARATUS THEREOF - The present invention relates to a method for detecting a voice section in time-space by using audio and video information. According to an embodiment of the present invention, a method for detecting a voice section from time-space by using audio and video information comprises the steps of: detecting a voice section in an audio signal which is inputted into a microphone array; verifying a speaker from the detected voice section; sensing the face of the speaker by using a video signal which is inputted into a camera if the speaker is successfully verified, and then estimating the direction of the face of the speaker; and determining the detected voice section as the voice section of the speaker if the estimated face direction corresponds to a reference direction which is previously stored.	03-29-2012
20120078625	WAVEFORM ANALYSIS OF SPEECH - A waveform analysis of speech is disclosed. Embodiments include methods for analyzing captured sounds produced by animals, such as human vowel sounds, and accurately determining the sound produced. Some embodiments utilize computer processing to identify the location of the sound within a waveform, select a particular time within the sound, and measure a fundamental frequency and one or more formants at the particular time. Embodiments compare the fundamental frequency and the one or more formants to known thresholds and multiples of the fundamental frequency, such as by a computer-run algorithm. The results of this comparison identify the sound with a high degree of accuracy.	03-29-2012
20120084084	Noise cancellation device for communications in high noise environments - This invention presents a noise cancellation device for improved personal face-to-face and radio communications in high noise environments. The device comprises speech acquisition components, an audio signal processing module, a loudspeaker, and a radio interface. With the noise cancellation device, the signal-to-noise ratio can be improved by as much as 30 dB.	04-05-2012
20120084085	METHOD AND DEVICE FOR TRACKING BACKGROUND NOISE IN COMMUNICATION SYSTEM - A method and a device for tracking background noise in a communication system are provided. The method includes: calculating a SNR of a current frame according to input audio signals; increasing a frame counter, and calculating tone features and signal steadiness features of the current frame if the SNR of the current frame is not less than a first threshold; determining the possibility of a time window including a noise interval according to the calculated tone feature values and signal steadiness feature values of each frame of the time window when the frame counter is increased to the length of the time window; and extracting noise features in the time window. Existence of background noise is analyzed continuously in a time window, so that background noise that changes frequently.	04-05-2012
20120095761	SPEECH RECOGNITION SYSTEM AND SPEECH RECOGNIZING METHOD - A speech recognition system and a speech recognizing method for high-accuracy speech recognition in the environment with ego noise are provided. A speech recognition system according to the present invention includes a sound source separating and speech enhancing section; an ego noise predicting section; and a missing feature mask generating section for generating missing feature masks using outputs of the sound source separating and speech enhancing section and the ego noise predicting section; an acoustic feature extracting section for extracting an acoustic feature of each sound source using an output for said each sound source of the sound source separating and speech enhancing section; and a speech recognizing section for performing speech recognition using outputs of the acoustic feature extracting section and the missing feature masks.	04-19-2012
20120101819	SYSTEM AND A METHOD FOR PROVIDING SOUND SIGNALS - A sound system, the sound system including: (i) a processor, configured to: (a) receive a requested sound signal and an ambient sound input signal; and (b) generate a modified requested signal by processing, in response to a desired level of ambient sound that is defined by a user, the requested sound signal and the ambient sound input signal, wherein an inclusion level of the ambient sound input signal in the modified requested signal is responsive to the desired level of ambient sound; and (ii) a signal provider configured to provide the modified requested signal to multiple speakers of a headset.	04-26-2012
20120109647	System Enhancement of Speech Signals - A system enhances speech by detecting a speaker's utterance through a first microphone positioned a first distance from a source of interference. A second microphone may detect the speaker's utterance at a different position. A monitoring device may estimate the power level of a first microphone signal. A synthesizer may synthesize part of the first microphone signal by processing the second microphone signal. The synthesis may occur when the power level is below a predetermined level.	05-03-2012
20120123776	ADJUSTING A SPEECH ENGINE FOR A MOBILE COMPUTING DEVICE BASED ON BACKGROUND NOISE - Methods, apparatus, and products are disclosed for adjusting a speech engine for a mobile computing device based on background noise, the mobile computing device operatively coupled to a microphone, that include: sampling, through the microphone, background noise for a plurality of operating environments in which the mobile computing device operates; generating, for each operating environment, a noise model in dependence upon the sampled background noise for that operating environment; and configuring the speech engine for the mobile computing device with the noise model for the operating environment in which the mobile computing device currently operates.	05-17-2012
20120123777	ADJUSTING A SPEECH ENGINE FOR A MOBILE COMPUTING DEVICE BASED ON BACKGROUND NOISE - Methods, apparatus, and products are disclosed for adjusting a speech engine for a mobile computing device based on background noise, the mobile computing device operatively coupled to a microphone, that include: sampling, through the microphone, background noise for a plurality of operating environments in which the mobile computing device operates; generating, for each operating environment, a noise model in dependence upon the sampled background noise for that operating environment; and configuring the speech engine for the mobile computing device with the noise model for the operating environment in which the mobile computing device currently operates.	05-17-2012
20120130713	SYSTEMS, METHODS, AND APPARATUS FOR VOICE ACTIVITY DETECTION - Systems, methods, apparatus, and machine-readable media for voice activity detection in a single-channel or multichannel audio signal are disclosed.	05-24-2012
20120158404	APPARATUS AND METHOD FOR ISOLATING MULTI-CHANNEL SOUND SOURCE - In an apparatus and method for isolating a multi-channel sound source, the probability of speaker presence calculated when noise of a sound source signal separated by GSS is estimated is used to calculate a gain. Thus, it is not necessary to additionally calculate the probability of speaker presence when calculating the gain, the speaker's voice signal can be easily and quickly separated from peripheral noise, and reverb and distortion are minimized. As such, if several interference sound sources, each of which has directivity, and speakers are simultaneously present in a room with high reverb, a plurality of sound sources generated from several microphones can be separated from one another with low sound quality distortion, and the reverb can also be removed.	06-21-2012
20120166190	APPARATUS FOR REMOVING NOISE FOR SOUND/VOICE RECOGNITION AND METHOD THEREOF - The present invention has been made in an effort to provide an apparatus for removing noise for sound/voice recognition, which removes a TV sound corresponding to a noise signal by using an adaptive filter capable of adapting a filter coefficient in order to remove an analogue signal and performs sound and/or voice recognition, and a method thereof.	06-28-2012
20120173234	VOICE ACTIVITY DETECTION APPARATUS, VOICE ACTIVITY DETECTION METHOD, PROGRAM THEREOF, AND RECORDING MEDIUM - The processing efficiency and estimation accuracy of a voice activity detection apparatus are improved. An acoustic signal analyzer receives a digital acoustic signal containing a speech signal and a noise signal, generates a non-speech GMM and a speech GMM adapted to a noise environment, by using a silence GMM and a clean-speech GMM in each frame of the digital acoustic signal, and calculates the output probabilities of dominant Gaussian distributions of the GMMs. A speech state probability to non-speech state probability ratio calculator calculates a speech state probability to non-speech state probability ratio based on a state transition model of a speech state and a non-speech state, by using the output probabilities; and a voice activity detection unit judges, from the speech state probability to non-speech state probability ratio, whether the acoustic signal in the frame is in the speech state or in the non-speech state and outputs only the acoustic signal in the speech state.	07-05-2012
20120185247	UNIFIED MICROPHONE PRE-PROCESSING SYSTEM AND METHOD - A unified microphone pre-processing system includes a plurality of microphones arranged within a vehicle passenger compartment, a processing circuit or system configured to receive signals from one or more of the plurality of microphones, and the processing circuit configured to enhance the received signals for use by at least two of a telephony processing application, an automatic speech recognition processing application, and a noise cancellation processing application. The method includes receiving signals from one or more of a plurality of microphones arranged within a vehicle passenger compartment, and enhancing the received signals for use by at least two of a telephony processing application, an automatic speech recognition processing application, and a noise cancellation processing application. A computer readable medium containing executable instructions to cause a processor to perform a method in accordance with an embodiment of the invention is also described.	07-19-2012
20120185248	VOICE DETECTOR AND A METHOD FOR SUPPRESSING SUB-BANDS IN A VOICE DETECTOR - Embodiments of the present invention relate to a voice detector receiving an input signal that is divided into sub-signals that represent a frequency sub-band. The voice detector calculates, for each sub-band, a signal-to-noise (SNR) value based on a corresponding sub-signal for each sub-band and a background signal for each sub-band. The voice detector also calculates a power SNR value for each sub-band, where at least one of the power SNR values is calculated based on a non-linear function. The voice detector forms a single value based on the calculated power SNR values and compares the single value and a given threshold value to make a voice activity decision presented on an output port.	07-19-2012
20120191450	SYSTEM AND METHOD FOR NOISE REDUCTION IN PROCESSING SPEECH SIGNALS BY TARGETING SPEECH AND DISREGARDING NOISE - A system and method for processing a speech signal delivered in a noisy channel or with ambient noise that focuses on a subset of harmonics that are least corrupted by noise, that disregards the signal harmonics with low signal-to-noise ratio(s), and that disregards amplitude modulations inconsistent with speech.	07-26-2012
20120197639	SYSTEM THAT DETECTS AND IDENTIFIES PERIODIC INTERFERENCE - A system improves speech detection or processing by identifying registration signals. The system encodes a limited frequency band by varying the amplitude of a pulse width modulated signal between predefined values. The signal is separated into frequency bins that identify amplitude and phase. The registration signal is measured by comparing a difference in average acoustic power in a plurality of adjacent bins over time.	08-02-2012
20120203549	NOISE REJECTION APPARATUS, NOISE REJECTION METHOD AND NOISE REJECTION PROGRAM - A speech-segment determination process is performed to determine whether audio data is a speech segment. A result of the speech-segment determination process is memorized. A noise rejection process is performed to reject a noise component of the audio data while performing an adaptive process to change filter coefficients for adaptive filtration if a result of the determination process indicates that the audio data is not the speech segment. The noise component is rejected with no adaptive process if the result of the determination process indicates that the audio data is the speech segment. The determination process is performed again to the audio data having the noise component rejected and the rejection process is performed again to the audio data if a result of the determination process performed again is different from the memorized result of the determination process.	08-09-2012
20120203550	INTERIOR REARVIEW MIRROR SYSTEM FOR VEHICLE - An interior rearview mirror system suitable for use in a vehicle includes an interior rearview mirror assembly having a mirror head and a reflective element. The mirror head includes a first microphone operable to generate a first analog signal and a second microphone operable to generate a second analog signal. The first analog signal is converted to a first digital signal by at least one analog to digital converter and the second analog signal is converted to a second digital signal by the at least one analog to digital converter. A digital sound processor is operable to process the first and second digital signals. Responsive to the processing of the first and second digital signals, the digital sound processor generates a digital output, and the digital output, at least in part, distinguishes a human voice present in the vehicle from noise present in the vehicle.	08-09-2012
20120209603	ACOUSTIC VOICE ACTIVITY DETECTION - Techniques for acoustic voice activity detection (AVAD) are described, including detecting a signal associated with a subband from a microphone, performing an operation on data associated with the signal, the operation generating a value associated with the subband, and determining whether the value distinguishes the signal from noise by using the value to determine a signal-to-noise ratio and comparing the value to a threshold.	08-16-2012
20120209604	Method And Background Estimator For Voice Activity Detection - The present invention relates to a method and a background estimator in a voice activity detector for updating a background noise estimate for an input signal. The input signal for a current frame is received and it is determined whether the current frame of the input signal comprises non-noise. Further, an additional determination is performed whether the current frame of the non-noise input comprises noise by analyzing characteristics at least related to correlation and energy level of the input signal, and the background noise estimate is updated if it is determined that the current frame comprises noise.	08-16-2012
20120226498	MOTION-BASED VOICE ACTIVITY DETECTION - Motion-based voice activity detection may be provided. A data stream may be received and a determination may be made whether at least one non-audio element associated with the data stream indicates that the data stream comprises speech. In response to determining that the at least one non-audio element associated with the data stream indicates that the data stream comprises speech, a speech to text conversion may be performed on at least one audio element associated with the data stream.	09-06-2012
20120232895	APPARATUS AND METHOD FOR DISCRIMINATING SPEECH, AND COMPUTER READABLE MEDIUM - According to one embodiment, an apparatus for discriminating speech/non-speech of a first acoustic signal includes a weight assignment unit, a feature extraction unit, and a speech/non-speech discrimination unit. The weight assignment unit is configured to assign a weight to each frequency band, based on a frequency spectrum of the first acoustic signal including a user's speech and a frequency spectrum of a second acoustic signal including a disturbance sound. The feature extraction unit is configured to extract a feature from the frequency spectrum of the first acoustic signal, based on the weight of each frequency band. The speech/non-speech discrimination unit is configured to discriminate speech/non-speech of the first acoustic signal, based on the feature.	09-13-2012
20120232896	METHOD AND AN APPARATUS FOR VOICE ACTIVITY DETECTION - A voice activity detection apparatus (	09-13-2012
20120239394	ERRONEOUS DETECTION DETERMINATION DEVICE, ERRONEOUS DETECTION DETERMINATION METHOD, AND STORAGE MEDIUM STORING ERRONEOUS DETECTION DETERMINATION PROGRAM - An erroneous detection determination device includes: a signal acquisition unit configured to acquire, from each of microphones, a plurality of audio signals relating to ambient sound including sound from a sound source in a certain direction; a result acquisition unit configured to acquire a recognition result including voice activity information indicating the inclusion of a voice activity relating to at least one of the audio signals; a calculation unit configured to calculate, for each of audio signals on the basis of the signals in respective unit times and the certain direction, a speech arrival rate representing the proportion of the sound from the certain direction to the ambient sound in each of the unit times; and an error detection unit configured to determine, on the basis of the recognition result and the speech arrival rate, whether or not the voice activity information is the result of erroneous detection.	09-20-2012
20120245933	ADAPTIVE AMBIENT SOUND SUPPRESSION AND SPEECH TRACKING - A device for suppressing ambient sounds from speech received by a microphone array is provided. One embodiment of the device comprises a microphone array, a processor, an analog-to-digital converter, and memory comprising instructions stored therein that are executable by the processor. The instructions stored in the memory are configured to receive a plurality of digital sound signals, each digital sound signal based on an analog sound signal originating at the microphone array, receive a multi-channel speaker signal, generate a monophonic approximation signal of the multi-channel speaker signal, apply a linear acoustic echo canceller to suppress a first ambient sound portion of each digital sound signal, generate a combined directionally-adaptive sound signal from a combination of each digital sound signal by a combination of time-invariant and adaptive beamforming techniques, and apply one or more nonlinear noise suppression techniques to suppress a second ambient sound portion of the combined directionally-adaptive sound signal.	09-27-2012
20120259628	ACCELEROMETER VECTOR CONTROLLED NOISE CANCELLING METHOD - A telecommunication device is disclosed, comprising: a microphone array comprising a plurality of microphones, wherein each microphone receives an analogue acoustic signal; a position sensing device for determining how the telecommunication device is positioned in three-dimensions with respect to a user's mouth; at least one analogue/digital converter for converting each analogue acoustic signal into a digital signal; a digital signal processor for performing signal processing on the received digital signals comprising a controller, a plurality of delay circuits for delaying each received signal based on an input from the controller and a plurality of preamplifiers for adjusting the gain of each received signal based on a gain input from the controller, wherein the controller selects the appropriate delay and gain values applied to each received signal to remove noise from the received signals based on the determined position of the telecommunication device. A method for creating and controlling a location of a virtual microphone near a telecommunication device so as to reduce background noise in a speech signal is also disclosed.	10-11-2012
20120259629	NOISE REDUCTION COMMUNICATION DEVICE - To provide a noise reduction transmitter which can secure clarity of sounds collected in very noisy environments and maintain a quality of sounds without devising a noise insulation cover particularly.	10-11-2012
20120259630	DISPLAY APPARATUS AND VOICE CONVERSION METHOD THEREOF - The voice conversion method of a display apparatus includes: in response to the receipt of a first video frame, detecting one or more entities from the first video frame; in response to the selection of one of the detected entities, storing the selected entity; in response to the selection of one of a plurality of previously-stored voice samples, storing the selected voice sample in connection with the selected entity; and in response to the receipt of a second video frame including the selected entity, changing a voice of the selected entity based on the selected voice sample and outputting the changed voice.	10-11-2012
20120259631	Speech and Noise Models for Speech Recognition - An audio signal generated by a device based on audio input from a user may be received. The audio signal may include at least a user audio portion that corresponds to one or more user utterances recorded by the device. A user speech model associated with the user may be accessed and a determination may be made that background audio in the audio signal is below a defined threshold. In response to determining that the background audio in the audio signal is below the defined threshold, the accessed user speech model may be adapted based on the audio signal to generate an adapted user speech model that models speech characteristics of the user. Noise compensation may be performed on the received audio signal using the adapted user speech model to generate a filtered audio signal with reduced background audio compared to the received audio signal.	10-11-2012
20120265526	APPARATUS AND METHOD FOR VOICE ACTIVITY DETECTION - An input signal is received. A plurality of electrical characteristics from the input signal is obtained. A plurality of acoustic features is determined from the obtained electrical characteristics and each of the acoustic features being different from the others. At least some of the acoustic features are compared to a plurality of predetermined criteria. Based upon the comparing of the acoustic features to the plurality of predetermined criteria, it is determined when the signal is a voice signal or a noise signal.	10-18-2012
20120284023	METHOD OF SELECTING ONE MICROPHONE FROM TWO OR MORE MICROPHONES, FOR A SPEECH PROCESSOR SYSTEM SUCH AS A "HANDS-FREE" TELEPHONE DEVICE OPERATING IN A NOISY ENVIRONMENT - The method comprises the steps of: digitizing sound signals picked up simultaneously by two microphones (N, M); executing a short-term Fourier transform on the signals (x	11-08-2012
20120290297	Speaker Liveness Detection - A signal representative of an unpredictable audio stimulus is provided to a putative live speaker within a putative live recording environment. A second signal purportedly emanating from the putative live speaker and/or the environment is received. This second signal is examined for influence of the unpredictable audio stimulus on the putative live speaker and/or the putative live recording environment. The examining includes at least one of audio feedback analysis, Lombard analysis, and evoked otoacoustic response analysis. Based on the examining, a determination is made as to whether the putative live speaker is an actual live speaker and/or whether the putative live recording environment is an actual live recording environment.	11-15-2012
20120303366	SYSTEM FOR DETECTING SPEECH WITH BACKGROUND VOICE ESTIMATES AND NOISE ESTIMATES - A system detects a speech segment that may include unvoiced, fully voiced, or mixed voice content. The system includes a window function that passes signals within a programmed aural frequency range while substantially blocking signals above and below the programmed aural frequency range. A frequency converter converts the signals passing within the programmed aural frequency range into a plurality of frequency bins. A background voice detector estimates the strength of a background speech segment relative to the noise of selected portions of the aural spectrum. A noise estimator estimates a maximum distribution of noise to an average of an acoustic noise power of some of the plurality of frequency bins. A voice detector compares the strength of a desired speech segment to a maximum of an output of the background voice detector and an output of the noise estimator.	11-29-2012
20120303367	Robust Noise Estimation - An enhancement system improves the estimate of noise from a received signal. The system includes a spectrum monitor that divides a portion of the signal at more than one frequency resolution. Adaptation logic derives a noise adaptation factor of the received signal. A plurality of devices tracks the characteristics of an estimated noise in the received signal and modifies multiple noise adaptation rates. Weighting logic applies the modified noise adaptation rates derived from the signal divided at a first frequency resolution to the signal divided at a second frequency resolution.	11-29-2012
20120310640	MIC COVERING DETECTION IN PERSONAL AUDIO DEVICES - A personal audio device, such as a wireless telephone, includes a noise canceling circuit that adaptively generates an anti-noise signal from a reference microphone signal and injects the anti-noise signal into the speaker or other transducer output to cause cancellation of ambient audio sounds. An error microphone may also be provided proximate the speaker to estimate an electro-acoustical path from the noise canceling circuit through the transducer. A processing circuit uses the reference and/or error microphone, optionally along with a microphone provided for capturing near-end speech, to determine whether one of the reference or error microphones is obstructed by comparing their received signal content and takes action to avoid generation of erroneous anti-noise.	12-06-2012
20120310641	Method And Apparatus For Voice Activity Determination - In accordance with an example embodiment of the invention, there is provided an apparatus for detecting voice activity in an audio signal. The apparatus comprises a first voice activity detector for making a first voice activity detection decision based at least in part on the voice activity of a first audio signal received from a first microphone. The apparatus also comprises a second voice activity detector for making a second voice activity detection decision based at least in part on an estimate of a direction of the first audio signal and an estimate of a direction of a second audio signal received from a second microphone. The apparatus further comprises a classifier for making a third voice activity detection decision based at least in part on the first and second voice activity detection decisions.	12-06-2012
20120316872	ADAPTIVE ACTIVE NOISE CANCELING FOR HANDSET - Embodiments of the present invention provide an adaptive noise canceling system. The adaptive noise canceling system may be used in a handset to cancel background noise by generating an anti-noise signal. The adaptive noise canceling system may include a first input to receive a first signal from a feedforward microphone; a second input to receive a second signal from an error microphone; a controller coupled to the inputs, the controller configured to adaptively generate an anti-noise signal according to the received signals, wherein the controller derives a profile of the anti-noise signal from the first signal and derives a magnitude of the anti-noise signal from both the first and second signals; and an output to transmit the anti-noise signal to a speaker.	12-13-2012
20120330655	VOICE RECOGNITION DEVICE - A voice recognition device includes a voice recognition dictionary in which a word which is recognized as a result of voice recognition on an inputted voice is registered, a reply voice data storage unit for storing recorded voice data about words registered in the voice recognition dictionary, a dialog control unit for, when a word registered in the voice recognition dictionary is recognized, acquiring recorded voice data corresponding to the word from the reply voice data storage unit, a reproduction noise reduction unit for carrying out a process of reducing noise included in the recorded voice data, an amplitude adjusting unit for adjusting an amplitude of the recorded voice data in which the noise has been reduced to a predetermined amplitude level, and a voice reproduction unit for reproducing a voice from the amplitude-adjusted recorded voice data.	12-27-2012
20120330656	VOICE ACTIVITY DETECTION - Discrimination between two classes comprises receiving a set of frames including an input signal and determining at least two different feature vectors for each of the frames. Discrimination between two classes further comprises classifying the two different feature vectors using sets of preclassifiers trained for at least two classes of events and from that classification, and determining values for at least one weighting factor. Discrimination between two classes still further comprises calculating a combined feature vector for each of the received frames by applying the weighting factor to the feature vectors and classifying the combined feature vector for each of the frames by using a set of classifiers trained for at least two classes of events.	12-27-2012
20120330657	SPEECH FEATURE EXTRACTION APPARATUS, SPEECH FEATURE EXTRACTION METHOD, AND SPEECH FEATURE EXTRACTION PROGRAM - A speech feature extraction apparatus, speech feature extraction method, and speech feature extraction program. A speech feature extraction apparatus includes: a first difference calculation module to: (i) receive, as an input, a spectrum of a speech signal segmented into frames for each frequency bin; and (ii) calculate a delta spectrum for each of the frames, where the delta spectrum is a difference of the spectrum within continuous frames for the frequency bin; and a first normalization module to normalize the delta spectrum of the frame for the frequency bin by dividing the delta spectrum by a function of an average spectrum; where the average spectrum is an average of spectra through all frames that are overall speech for the frequency bin; and where an output of the first normalization module is defined as a first delta feature.	12-27-2012
20130006622	ADAPTIVE CONFERENCE COMFORT NOISE - A continuous comfort noise is provided that is overlaid for the entire duration of a conference call scenario. The comfort noise may be adapted to match the levels of the actual background noise detected on one or more of the conference call participant's devices on the transmitting end(s) of a conference call as well as the participants' speech levels. The comfort noise may also be adapted to the type of listening device employed on the receiving end of a conference call. The comfort noise level may be customized to an appropriate and comfortable level for the type of listening device being used, and the system may continuously mix the comfort noise with incoming audio signals for the entire duration of a conference call, lowering the comfort noise level gradually during speaking periods for additional user experience improvement.	01-03-2013
20130006623	SPEECH RECOGNITION USING VARIABLE-LENGTH CONTEXT - Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for recognizing speech using a variable length of context. Speech data and data identifying a candidate transcription for the speech data are received. A phonetic representation for the candidate transcription is accessed. Multiple test sequences are extracted for a particular phone in the phonetic representation. Each of the multiple test sequences includes a different set of contextual phones surrounding the particular phone. Data indicating that an acoustic model includes data corresponding to one or more of the multiple test sequences is received. From among the one or more test sequences, the test sequence that includes the highest number of contextual phones is selected. A score for the candidate transcription is generated based on the data from the acoustic model that corresponds to the selected test sequence.	01-03-2013
20130006624	SOUND SOURCES SEPARATION AND MONITORING USING DIRECTIONAL COHERENT ELECTROMAGNETIC WAVES - An apparatus and a method that achieve physical separation of sound sources by pointing directly a beam of coherent electromagnetic waves (i.e. laser). Analyzing the physical properties of a beam reflected from the vibrations generating sound source enables the reconstruction of the sound signal generated by the sound source, eliminating the noise component added to the original sound signal. In addition, the use of multiple electromagnetic waves beams or a beam that rapidly skips from one sound source to another allows the physical separation of these sound sources. Aiming each beam to a different sound source ensures the independence of the sound signals sources and therefore provides full sources separation.	01-03-2013
20130030803	MICROPHONE-ARRAY-BASED SPEECH RECOGNITION SYSTEM AND METHOD - A microphone-array-based speech recognition system combines a noise cancelling technique for cancelling noise of input speech signals from an array of microphones, according to at least an inputted threshold. The system receives noise-cancelled speech signals outputted by a noise masking module through at least a speech model and at least a filler model, then computes a confidence measure score with the at least a speech model and the at least a filler model for each threshold and each noise-cancelled speech signal, and adjusts the threshold to continue the noise cancelling for achieving a maximum confidence measure score, thereby outputting a speech recognition result related to the maximum confidence measure score.	01-31-2013
20130035935	DEVICE AND METHOD FOR DETERMINING SEPARATION CRITERION OF SOUND SOURCE, AND APPARATUS AND METHOD FOR SEPARATING SOUND SOURCE - The present invention allows a man to recognize a location of a sound source in a three-dimensional space using two ears and applies a method of separating a sound source in a certain orientation to improve the performance of an application technology using a speech in a noisy environment. The present invention acquires a speech signal using two sensors and determines an orientation angle of a sound source in a zero-crossing point step with respect to a frequency separated signal with a band pass filter bank. An object of the present invention is to obtain excellent sound source orientation detection and division performance which is difficult to be obtained in an existing crossing correlation method calculated in units of time frames in a noisy environment with a plurality of sound sources.	02-07-2013
20130046536	Method and Apparatus for Performing Song Detection on Audio Signal - Methods and apparatuses for performing song detection on an audio signal are described. Clips of the audio signal are classified into classes comprising music. Class boundaries of music clips are detected as candidate boundaries of a first type. Combinations including non-overlapped sections are derived. Each section meets the following conditions: 1) including at least one music segment longer than a predetermined minimum song duration, 2) shorter than a predetermined maximum song duration, 3) both starting and ending with a music clip, and 4) a proportion of the music clips in each of the sections is greater than a predetermined minimum proportion. In this way, various possible song partitions in the audio signal can be obtained for investigation.	02-21-2013
20130054235	TRULY HANDSFREE SPEECH RECOGNITION IN HIGH NOISE ENVIRONMENTS - Embodiments of the present invention improve content manipulation systems and methods using speech recognition. In one embodiment, the present invention includes a method comprising configuring a recognizer to recognize utterances in the presence of a background audio signal having particular audio characteristics. A composite signal comprising a first audio signal and a spoken utterance of a user is received by the recognizer, where the first audio signal comprises the particular audio characteristics used to configure the recognizer so that the recognizer is desensitized to the first audio signal. The spoken utterance is recognized in the presence of the first audio signal when the spoken utterance is one of the predetermined utterances. An operation is performed on the first audio signal.	02-28-2013
20130054236	METHOD FOR THE DETECTION OF SPEECH SEGMENTS - A method for the detection of noise and speech segments in a digital audio input signal, the input signal being divided into a plurality of frames including a first stage in which a first classification of a frame as noise is performed if the mean energy value for this frame and the previous N frames is not greater than a first energy threshold, N>1, a second stage in which for each frame that has not been classified as noise in the first stage it is decided if the frame is classified as noise or as speech based on combining at least a first criterion of spectral similarity of the frame with acoustic noise and speech models, a second criterion of analysis of the energy of the frame and a third criterion of duration, and of using a state machine for detecting the beginning of a segment as an accumulation of a determined number of consecutive frames with acoustic similarity greater than a first threshold and for detecting the end of the segment; a third stage in which the classification as speech or as noise of the signal frames carried out in the second stage is reviewed using criteria of duration.	02-28-2013
20130060567	Front-End Noise Reduction for Speech Recognition Engine - VoIP phones according to the present invention include a microphone, which may be internal or external, and allow the user to communicate unobtrusively, check voice mail and conduct other activities in an environment which can be noisy in general and extremely noisy sometimes. Speech recognition functionality may also be used to generate and send touch tone or DTMF tones such as in response to call trees or voice recognition functionality used by airlines, credit card companies, voice mail systems, and other applications. A system and method of audio processing which provides enhanced speech recognition is provided. Audio input is received at the microphone which is processed by adaptive noise cancellation to generate an enhanced audio signal. The operation of the speech recognition engine and the adaptive noise canceller may be advantageously controlled based on Voice Activity Detection (VAD).	03-07-2013
20130073285	Robust Downlink Speech and Noise Detector - A voice activity detection process is robust to low and high signal-to-noise ratio speech and signal loss. A process divides an aural signal into one or more bands. Signal magnitudes of frequency components and the respective noise components are estimated. A noise adaptation rate modifies estimates of noise components based on differences between the signal to the estimated noise and signal variability.	03-21-2013
20130085753	Hybrid Client/Server Speech Recognition In A Mobile Device - A computing device is able to use an embedded speech recognizer and a network speech recognizer for speech recognition. In response to detecting speech in the captured audio, the computing device may forward the captured audio to its embedded speech recognizer and to a speech client for the network speech recognizer. The embedded speech recognizer provides an embedded-recognizer result for the captured audio. If a network-recognition criterion is met, the speech client forwards the captured audio to the network speech recognizer and receives a network-recognizer result for the captured audio from the network speech recognizer. A speech recognition result for the captured audio is forwarded to at least one application, wherein the speech recognition result is based on at least one of the embedded-recognizer result and the network-recognizer result.	04-04-2013
20130096915	System and Method for Dynamic Noise Adaptation for Robust Automatic Speech Recognition - A speech processing method and arrangement are described. A dynamic noise adaptation (DNA) model characterizes a speech input reflecting effects of background noise. A null noise DNA model characterizes the speech input based on reflecting a null noise mismatch condition. A DNA interaction model performs Bayesian model selection and re-weighting of the DNA model and the null noise DNA model to realize a modified DNA model characterizing the speech input for automatic speech recognition and compensating for noise to a varying degree depending on relative probabilities of the DNA model and the null noise DNA model.	04-18-2013
20130103397	SYSTEMS, DEVICES AND METHODS FOR LIST DISPLAY AND MANAGEMENT - Exemplary embodiments provide systems, devices and methods that allow creation and management of lists of items in an integrated manner on an interactive graphical user interface. A user may speak a plurality of list items in a natural unbroken manner to provide an audio input stream into an audio input device. Exemplary embodiments may automatically process the audio input stream to convert the stream into a text output, and may process the text output into one or more n-grams that may be used as list items to populate a list on a user interface.	04-25-2013
20130103398	Method and Apparatus for Audio Signal Classification - An apparatus comprising at least one processor and at least one memory including computer program code, the at least one memory and the computer program code configured to, with the at least one processor, cause the apparatus at least to perform determining a signal identification value for an audio signal, determining at least one noise level value for the audio signal, comparing the signal identification value against a signal identification threshold and each of the at least one noise level value against an associated noise level threshold, and identifying the audio signal dependent on the comparison.	04-25-2013
20130132077	Semi-Supervised Source Separation Using Non-Negative Techniques - Systems and methods for semi-supervised source separation using non-negative techniques are described. In some embodiments, various techniques disclosed herein may enable the separation of signals present within a mixture, where one or more of the signals may be emitted by one or more different sources. In audio-related applications, for instance, a signal mixture may include speech (e.g., from a human speaker) and noise (e.g., background noise). In some cases, speech may be separated from noise using a speech model developed from training data. A noise model may be created, for example, during the separation process (e.g., “on-the-fly”) and in the absence of corresponding training data.	05-23-2013
20130132078	VOICE ACTIVITY SEGMENTATION DEVICE, VOICE ACTIVITY SEGMENTATION METHOD, AND VOICE ACTIVITY SEGMENTATION PROGRAM - Provided is a noise-robust voice activity segmentation device which updates parameters used in the determination of voice-active segments without burdening the user, and also provided are a voice activity segmentation method and a voice activity segmentation program.	05-23-2013
20130138437	SPEECH RECOGNITION APPARATUS BASED ON CEPSTRUM FEATURE VECTOR AND METHOD THEREOF - A speech recognition apparatus, includes a reliability estimating unit configured to estimate reliability of a time-frequency segment from an input voice signal; and a reliability reflecting unit configured to reflect the reliability of the time-frequency segment to a normalized cepstrum feature vector extracted from the input speech signal and a cepstrum average vector included for each state of an HMM in decoding. Further, the speech recognition apparatus includes a cepstrum transforming unit configured to transform the cepstrum feature vector and the average vector through a discrete cosine transformation matrix and calculate a transformed cepstrum vector. Furthermore, the speech recognition apparatus includes an output probability calculating unit configured to calculate an output probability value of time-frequency segments of the input speech signal by applying the transformed cepstrum vector to the cepstrum feature vector and the average vector.	05-30-2013
20130144618	METHODS AND ELECTRONIC DEVICES FOR SPEECH RECOGNITION - A disclosed embodiment provides a speech recognition method to be performed by an electronic device. The method includes: collecting user-specific information that is specific to a user through the user's usage of the electronic device; recording an utterance made by the user; letting a remote server generate a remote speech recognition result for the recorded utterance; generating rescoring information for the recorded utterance based on the collected user-specific information; and letting the remote speech recognition result be rescored based on the rescoring information.	06-06-2013
20130179163	IN-CAR COMMUNICATION SYSTEM FOR MULTIPLE ACOUSTIC ZONES - An In-Car Communication (ICC) system supports the communication paths within a car by receiving the speech signals of a speaking passenger and playing them back for one or more listening passengers. Signal processing tasks are split into a microphone related part and into a loudspeaker related part. A sound processing system suitable for use in a vehicle having multiple acoustic zones includes a plurality of microphone In-Car Communication (Mic-ICC) instances coupled and a plurality of loudspeaker In-Car Communication (Ls-ICC) instances. The system further includes a dynamic audio routing matrix with a controller and coupled to the Mic-ICC instances, a mixer coupled to the plurality of Mic-ICC instances and a distributor coupled to the Ls-ICC instances.	07-11-2013
20130185065	METHOD AND SYSTEM FOR USING SOUND RELATED VEHICLE INFORMATION TO ENHANCE SPEECH RECOGNITION - An audio signal may be received, in a processor associated with a vehicle. Sound related vehicle information representing one or more sounds may be received by the processor. The sound related vehicle information may or may not include an audio signal. A speech recognition process or system may be modified based on the sound related vehicle information.	07-18-2013
20130185066	METHOD AND SYSTEM FOR USING VEHICLE SOUND INFORMATION TO ENHANCE AUDIO PROMPTING - Sound related vehicle information representing one or more sounds may be received in a processor associated with a vehicle. The sound related vehicle information may or may not include an audio signal. An audio signal output to a passenger may be modified based on the sound related vehicle information.	07-18-2013
20130185067	NOISE REDUCTION METHOD, PROGRAM PRODUCT AND APPARATUS - A probability model represented as the product of the probability distribution of a mismatch vector g (or clean speech x) with an observed value y as a factor and the probability distribution of a mismatch vector g (or clean speech x) with a confidence index β for each band as a factor, executes MMSE estimation on the probability model, and estimates a clean speech estimated value x̂. As a result, each band influences the result of MMSE estimation, with a degree of contribution in accordance with the level of its confidence. Further, the higher the S/N ratio of observation speech, the more the output value becomes shifted to the observed value. As a result, the output of a front-end is optimized.	07-18-2013
20130185068	SPEECH RECOGNITION DEVICE, SPEECH RECOGNITION METHOD AND PROGRAM - The present invention provides a speech recognition device that includes a threshold value candidate generation unit which extracts a feature indicating likeliness of being speech from a temporal sequence of input sound, and generates a plurality of threshold value candidates for discriminating between speech and non-speech; a speech determination unit which, by comparing the feature indicating likeliness of being speech with the plurality of threshold value candidates, determines respective speech sections, and outputs determination information as a result of the determination; a search unit which corrects each of the speech sections represented by the determination information, using a speech model and a non-speech model; and a parameter update unit which estimates a threshold value for determining a speech section, on the basis of distribution profiles of the feature respectively in utterance sections and in non-utterance sections, within each of the corrected speech sections, and makes an update with the threshold value.	07-18-2013
20130191124	VOICE PROCESSING APPARATUS, METHOD AND PROGRAM - Provided is a voice processing apparatus including a feature quantity calculation section extracting a feature quantity from a target frame of an input voice signal, a sound pressure estimation candidate point updating section making each frame of the input voice signal a sound pressure estimation candidate point, retaining the feature quantity of each sound pressure estimation candidate point, and updating the sound pressure estimation candidate point based on the feature quantity of the sound pressure estimation candidate point and the feature quantity of the target frame, a sound pressure estimation section calculating an estimated sound pressure of the input voice signal, based on the feature quantity of the sound pressure estimation candidate point, a gain calculation section calculating a gain applied to the input voice signal based on the estimated sound pressure, and a gain application section performing a gain adjustment of the input voice signal based on the gain.	07-25-2013
20130204617	APPARATUS, SYSTEM AND METHOD FOR NOISE CANCELLATION AND COMMUNICATION FOR INCUBATORS AND RELATED DEVICES - Systems, apparatuses and methods for integrating adaptive noise cancellation (ANC) with communication features in an enclosure, such as an incubator, bed, and the like. Utilizing one or more error and reference microphones, a controller for a noise cancellation portion reduces noise within a quiet area of the enclosure. Voice communications are provided to allow external voice signals to be transmitted to the enclosure with minimized interference with noise processing. Vocal communications from within the enclosure may be processed to determine certain characteristics/features of the vocal communications. Using these characteristics, certain emotive and/or physiological states may be identified.	08-08-2013
20130211832	SPEECH SIGNAL PROCESSING RESPONSIVE TO LOW NOISE LEVELS - A method of speech recognition in a vehicle. Audio including noise and a speech signal representative of an utterance from a user is received via a microphone, and a signal-to-noise ratio (SNR) for the received audio is calculated using a processor. It is determined whether the calculated SNR is greater than a predetermined SNR. If so, then a noise distribution is identified for addition to the received audio, and noise corresponding to the identified noise distribution is injected into the received audio to produce noise-injected audio including the speech signal.	08-15-2013
20130218560	METHOD AND APPARATUS FOR AUDIO INTELLIGIBILITY ENHANCEMENT AND COMPUTING APPARATUS - Method and apparatus for audio intelligibility enhancement and computing apparatus are provided. The method includes the following steps. Environment noise is detected by performing voice activity detection according to a detected audio signal from at least a microphone of a computing device. Noise information is obtained according to the detected environment noise and a first audio signal. A second audio signal is outputted by boosting the first audio signal under an adjustable headroom by the computing device according to the noise information and the first audio signal.	08-22-2013
20130226575	SYSTEMS AND METHODS FOR INTERACTIVELY ACCESSING HOSTED SERVICES USING VOICE COMMUNICATIONS - Systems and methods for an interactive voice response system are described herein. In one embodiment, the system may include a voice recognition module, a session manager, and a voice generator module. An utterance received at the voice recognition module may be converted into one or more structures using a lexicon tied to an ontology. Concepts in the utterance may then be identified. If sufficient information has been provided to identify a relevant service, corresponding text responses associated with that service may then be converted into voice messages by the voice generator.	08-29-2013
20130231929	SPEECH RECOGNITION DEVICE, SPEECH RECOGNITION METHOD, AND COMPUTER READABLE MEDIUM - The present invention can increase the types of noises that can be dealt with enough to enable speech recognition with a speech recognition rate of high accuracy.	09-05-2013
20130238327	SPEECH RECOGNITION PROCESSING DEVICE AND SPEECH RECOGNITION PROCESSING METHOD - A speech recognition processing device includes a speech synthesis part, a speech output part, a speech input part, and a speech recognition part. A first synthesized sound and a second synthesized sound synthesized by the speech synthesis part are output from the speech output part. Noise information is obtained from a sound signal input from the speech input part between an output period of the first synthesized sound and an output period of the second synthesized sound, and the noise information is used for noise removal processing in the speech recognition part.	09-12-2013
20130238328	Method and Apparatus for Speech Segmentation - Machine-readable media, methods, apparatus and system for speech segmentation are described. In some embodiments, a fuzzy rule may be determined to discriminate a speech segment from a non-speech segment. An antecedent of the fuzzy rule may include an input variable and an input variable membership. A consequent of the fuzzy rule may include an output variable and an output variable membership. An instance of the input variable may be extracted from a segment. An input variable membership function associated with the input variable membership and an output variable membership function associated with the output variable membership may be trained. The instance of the input variable, the input variable membership function, the output variable, and the output variable membership function may be operated, to determine whether the segment is the speech segment or the non-speech segment.	09-12-2013
20130246062	System and Method for Robust Estimation and Tracking the Fundamental Frequency of Pseudo Periodic Signals in the Presence of Noise - Method and system for tracking fundamental frequencies of pseudo-periodic signals in the presence of noise that include receiving a time-frequency representation of signals measured in a predefined environment; estimating and tracking a fundamental frequency of a respective pseudo-periodic signal at each time frame of the time-frequency representation by tracking detections of harmonious frequencies in the time-frequency representation over time; and outputting each respective estimated fundamental frequency associated with the pseudo-periodic signal of each respective time frame.	09-19-2013
20130253924	Speech Conversation Support Apparatus, Method, and Program - According to one embodiment, a speech conversation support apparatus includes a division unit, an analysis unit, a detection unit, an estimation unit and an output unit. The division unit divides a speech data item including a word item and a sound item into a plurality of divided speech data items. The analysis unit obtains an analysis result. The detection unit detects, for each divided speech data item, at least one clue expression indicating one of an instruction by a user and a state of the user. The estimation unit estimates, if the clue expression is detected, playback data item from at least one divided speech data item corresponding to a speech uttered before the clue expression is detected. The output unit outputs the playback data item.	09-26-2013
20130253925	Speech Recognition in a Lighting Apparatus - Acoustic energy is received at a lighting apparatus to create acoustic data, and speech recognition is performed on the acoustic data to determine one or more words. A message based on the acoustic data is sent across a network from the lighting apparatus.	09-26-2013
20130275128	CHANNEL DETECTION IN NOISE USING SINGLE CHANNEL DATA - Methods related to Generalized Mutual Interdependence Analysis (GMIA), a low complexity statistical method for projecting data in a subspace that captures invariant properties of the data, are implemented on a processor based system. GMIA methods are applied to the signal processing problem of voice activity detection and classification. Real-world conversational speech data are modeled to fit the GMIA assumptions. Low complexity GMIA computations extract reliable features for classification of sound under noisy conditions and operate with small amounts of data. A speaker is characterized by a slow varying or invariant channel that is learned and is tracked from single channel data by GMIA methods.	10-17-2013
20130282372	SYSTEMS AND METHODS FOR AUDIO SIGNAL PROCESSING - A method for detecting voice activity by an electronic device is described. The method includes detecting near end speech based on a near end voiced speech detector and at least one single channel voice activity detector. The near end voiced speech detector is associated with a harmonic statistic based on a speech pitch histogram.	10-24-2013
20130282373	SYSTEMS AND METHODS FOR AUDIO SIGNAL PROCESSING - A method for restoring a processed speech signal by an electronic device is described. The method includes obtaining at least one audio signal. The method also includes performing bin-wise voice activity detection based on the at least one audio signal. The method further includes restoring the processed speech signal based on the bin-wise voice activity detection.	10-24-2013
20130289982	RECORDING MEDIUM - A recording medium is provided that records a separating step of separating a mixed sound signal in which a plurality of excitations are mixed into the respective excitations, and a step of performing speech detection on the plurality of separated excitation signals, judging whether or not the plurality of excitation signals are speech and generating speech section information indicating speech/non-speech information for each excitation signal. The recording medium also includes at least one of a step of calculating and analyzing an utterance overlap duration using the speech section information for combinations of the plurality of excitation signals and a step of calculating and analyzing a silence duration. The recording medium further includes a step of calculating a degree of establishment of a conversation indicating the degree of establishment of a conversation based on the extracted utterance overlap duration or the silence duration.	10-31-2013
20130297305	NON-SPATIAL SPEECH DETECTION SYSTEM AND METHOD OF USING SAME - A non-spatial speech detection system includes a plurality of microphones whose output is supplied to a fixed beamformer. An adaptive beamformer is used for receiving the output of the plurality of microphones and one or more processors are used for processing an output from the fixed beamformer and identifying speech from noise through the use of an algorithm utilizing a covariance matrix.	11-07-2013
20130297306	Adaptive Equalization System - An adaptive equalization system that adjusts the spectral shape of a speech signal based on an intelligibility measurement of the speech signal may improve the intelligibility of the output speech signal. Such an adaptive equalization system may include a speech intelligibility measurement module, a spectral shape adjustment module, and an adaptive equalization module. The speech intelligibility measurement module is configured to calculate a speech intelligibility measurement of a speech signal. The spectral shape adjustment module is configured to generate a weighted long-term speech curve based on a first predetermined long-term average speech curve, a second predetermined long-term average speech curve, and the speech intelligibility measurement. The adaptive equalization module is configured to adapt equalization coefficients for the speech signal based on the weighted long-term speech curve.	11-07-2013
20130304463	NOISE CANCELLATION METHOD - An embodiment of the invention provides a noise cancellation method for an electronic device. The method comprises: receiving an audio signal; applying a Fast Fourier Transform operation on the audio signal to generate a sound spectrum; acquiring a first spectrum corresponding to a noise and a second spectrum corresponding to a human voice signal from the sound spectrum; estimating a center frequency according to the first spectrum and the second spectrum; and applying a high pass filtering operation to the sound spectrum according to the center frequency.	11-14-2013
20130304464	METHOD AND APPARATUS FOR ADAPTIVELY DETECTING A VOICE ACTIVITY IN AN INPUT AUDIO SIGNAL - The disclosure provides a method and an apparatus for adaptively detecting a voice activity in an input audio signal composed of frames. The method comprises the steps of: determining a noise characteristic of the input signal based on a received frame of the input audio signal; deriving a voice activity detection (VAD) parameter based on the noise characteristic of the input audio signal; and comparing the derived VAD parameter with a threshold value to provide a voice activity detection decision.	11-14-2013
20130311176	MRI Compatible Headset - A wireless headset capable of receiving audio signals transmitted wirelessly and compatible for use in an MRI scanner is disclosed. The headset includes a first wireless module connected to the first earphone and a second wireless module connected to the second earphone. Each wireless module is electrically connected to a speaker in the respective earphone. The first wireless module receives the audio signal from a remote source and coordinates transmission of the audio signal to each of the speakers. The compact nature of each earphone minimizes the length of wire runs. In addition, the headset is made of materials having low magnetic susceptibility such that they will not be affected by the magnetic field from the MRI scanner.	11-21-2013
20130317816	METHOD FOR RECOGNIZING AND INTERPRETING PATTERNS IN NOISY DATA SEQUENCES - This invention maps possibly noisy digital input from any of a number of different hardware or software sources such as keyboards, automatic speech recognition systems, cell phones, smart phones or the web onto an interpretation consisting of an action and one or more physical objects, such as robots, machinery, vehicles, etc. or digital objects such as data files, tables and databases. Tables and lists of (i) homonyms and misrecognitions, (ii) thematic relation patterns, and (iii) lexicons are used to generate alternative forms of the input which are scored to determine the best interpretation of the noisy input. The actions may be executed internally or output to any device which contains a digital component such as, but not limited to, a computer, a robot, a cell phone, a smart phone or the web. This invention may be implemented on sequential and parallel compute engines and systems.	11-28-2013
20130332157	AUDIO NOISE ESTIMATION AND AUDIO NOISE REDUCTION USING MULTIPLE MICROPHONES - Digital signal processing techniques for automatically reducing audible noise from a sound recording that contains speech. A noise suppression system uses two types of noise estimators, including a more aggressive one and a less aggressive one. Decisions are made on how to select or combine their outputs into a usable noise estimate in different speech and noise conditions. A 2-channel noise estimator is described. Other embodiments are also described and claimed.	12-12-2013
20140006019	APPARATUS FOR AUDIO SIGNAL PROCESSING	01-02-2014
20140012573	SIGNAL PROCESSING APPARATUS HAVING VOICE ACTIVITY DETECTION UNIT AND RELATED SIGNAL PROCESSING METHODS - A signal processing apparatus includes a speech recognition system and a voice activity detection unit. The voice activity detection unit is coupled to the speech recognition system, and arranged for detecting whether an audio signal is a voice signal and accordingly generating a voice activity detection result to the speech recognition system to control whether the speech recognition system should perform speech recognition upon the audio signal.	01-09-2014
20140039886	OPERATING METHOD FOR VOICE ACTIVITY DETECTION/SILENCE SUPPRESSION SYSTEM - A Voice Activity Detection/Silence Suppression (VAD/SS) system is connected to a channel of a transmission pipe. The channel provides a pathway for the transmission of energy. A method for operating a VAD/SS system includes detecting the energy on the channel, and activating or suppressing activation of the VAD/SS system depending upon the nature of the energy detected on the channel.	02-06-2014
20140058726	Automated difference recognition between speaking sounds and music - The present invention relates to means and methods of automated difference recognition between speech and music signals in voice communication systems, devices, telephones, and methods, and more specifically, to systems, devices, and methods that automate control when either speech or music is detected over communication links. The present invention provides a novel system and method for monitoring the audio signal, analyzing selected audio signal components, comparing the results of analysis with a pre-determined threshold value, and classifying the audio signal as either speech or music.	02-27-2014
20140067387	Utilizing Scalar Operations for Recognizing Utterances During Automatic Speech Recognition in Noisy Environments - Scalar operations for model adaptation or feature enhancement may be utilized for recognizing an utterance during automatic speech recognition in a noisy environment. An utterance including distorted speech generated from a transmission source for delivery to a receiver, may be received by a computer. The distorted speech may be caused by the noisy environment and channel distortion. Computations using scalar operations in the form of an algorithm may then be performed for recognizing the utterance. As a result of performing all of the computations with scalar operations, computational complexity is very small in comparison to matrix and vector operations. Vector Taylor Series with diagonal Jacobian approximation may also be utilized as a distortion-model-based noise robust algorithm with scalar operations.	03-06-2014
20140067388	ROBUST VOICE ACTIVITY DETECTION IN ADVERSE ENVIRONMENTS - A method and a system for robust voice activity detection under adverse environments are provided. The apparatus includes a controller for controlling a signal receiving module, a signal blocking module, a silent/non-silent classification module for discriminating silent blocks by comparing a temporal feature to a threshold, a total variation filtering module for enhancing voiced portions and reducing an effect of background noises, a frame division module for dividing a filtered signal into small frames, a residual processing module for estimating a noise floor, a silent/non-silent frame classification module, a voice/non-voice signal frame classification module based on autocorrelation features of a total variation filtered signal, a binary-flag merging and deletion module, a voice endpoint detection and correction module, and a voice endpoint storing/sending module. A decision-tree is arranged based on time and memory complexity of feature extraction methods. The system is able to determine voice region endpoints under different adverse environments.	03-06-2014
20140074464	THOUGHT RECOLLECTION AND SPEECH ASSISTANCE DEVICE - Some embodiments of the inventive subject matter may include a method for detecting speech loss and supplying appropriate recollection data to the user. The method can include detecting a speech stream from a user. The method can include converting the speech stream to text. The method can include storing the text. The method can include detecting an interruption to the speech stream, wherein the interruption to the speech stream indicates speech loss by the user. The method can include searching a catalog using the text as a search parameter to find relevant catalog data. The method can include presenting the relevant catalog data to remind the user about the speech stream.	03-13-2014
20140095157	Method and Device for Voice Operated Control - At least one exemplary embodiment is directed to a method and device for voice operated control with learning. The method can include measuring a first sound received from a first microphone, measuring a second sound received from a second microphone, detecting a spoken voice based on an analysis of measurements taken at the first and second microphone, learning from the analysis when the user is speaking and a speaking level in noisy environments, training a decision unit from the learning to be robust to a detection of the spoken voice in the noisy environments, mixing the first sound and the second sound to produce a mixed signal, and controlling the production of the mixed signal based on the learning of one or more aspects of the spoken voice and ambient sounds in the noisy environments.	04-03-2014
20140122068	SIGNAL PROCESSING APPARATUS, SIGNAL PROCESSING METHOD AND COMPUTER PROGRAM PRODUCT - According to an embodiment, a signal processing apparatus includes an ambient sound estimating unit, a representative component estimating unit, a voice estimating unit, and a filter generating unit. The ambient sound estimating unit is configured to estimate, from the feature, an ambient sound component that is non-stationary among ambient sound components having a feature. The representative component estimating unit is configured to estimate a representative component representing ambient sound components estimated from one or more features for a time period, based on a largest value among the ambient sound components within the time period. The voice estimating unit is configured to estimate, from the feature, a voice component having the feature. The filter generating unit is configured to generate a filter for extracting a voice component and an ambient sound component from the feature, based on the voice component and the representative component.	05-01-2014
20140136194	METHODS AND APPARATUS FOR IDENTIFYING FRAUDULENT CALLERS - The methods, apparatus, and systems described herein are designed to identify fraudulent callers. A voice print of a call is created and compared to known voice prints to determine if it matches one or more of the known voice prints. The methods include a pre-processing step to separate speech from non-speech, selecting a number of elements that affect the voice print the most, and/or computing an adjustment factor based on the scores of each received voice print against known voice prints.	05-15-2014
20140163978	SPEECH RECOGNITION POWER MANAGEMENT - Power consumption for a computing device may be managed by one or more keywords. For example, if an audio input obtained by the computing device includes a keyword, a network interface module and/or an application processing module of the computing device may be activated. The audio input may then be transmitted via the network interface module to a remote computing device, such as a speech recognition server. Alternately, the computing device may be provided with a speech recognition engine configured to process the audio input for on-device speech recognition.	06-12-2014
20140163979	VOICE PROCESSING DEVICE, VOICE PROCESSING METHOD - A voice processing device includes: a processor; and a memory which stores a plurality of instructions, which when executed by the processor, cause the processor to execute, receiving a first signal including a plurality of voice segments; controlling such that a non-voice segment with a length equal to or greater than a predetermined first threshold value exists between at least one of the plurality of voice segments; and outputting a second signal including the plurality of voice segments and the controlled non-voice segment.	06-12-2014
20140172424	PRESERVING AUDIO DATA COLLECTION PRIVACY IN MOBILE DEVICES - Techniques are disclosed for using the hardware and/or software of the mobile device to obscure speech in the audio data before a context determination is made by a context awareness application using the audio data. In particular, a subset of a continuous audio stream is captured such that speech (words, phrases and sentences) cannot be reliably reconstructed from the gathered audio. The subset is analyzed for audio characteristics, and a determination can be made regarding the ambient environment.	06-19-2014
20140180685	SIGNAL PROCESSING DEVICE, SIGNAL PROCESSING METHOD, AND COMPUTER PROGRAM PRODUCT - According to an embodiment, a signal processing device includes a background calculator, a signal generator, an extractor, a similarity calculator, and a mixer. The background calculator is configured to calculate a first background signal in which a speech signal is removed, based on the acoustic signals. The signal generator is configured to generate a reference signal from at least one of the acoustic signals. The extractor is configured to extract a second background signal by removing a speech signal from the reference signal. The similarity calculator is configured to calculate a similarity between feature data of the background signals. The mixer is configured to calculate a weighted sum of the background signals in such a way that a greater weight is given to the first background signal as the similarity is higher and a greater weight is given to the second background signal as the similarity is lower.	06-26-2014
20140188467	VIBRATION SENSOR AND ACOUSTIC VOICE ACTIVITY DETECTION SYSTEMS (VADS) FOR USE WITH ELECTRONIC SYSTEMS - A voice activity detector (VAD) combines the use of an acoustic VAD and a vibration sensor VAD as appropriate to the conditions under which a host device is operated. The VAD includes a first detector receiving a first signal and a second detector receiving a second signal. The VAD includes a first VAD component coupled to the first and second detectors. The first VAD component determines that the first signal corresponds to voiced speech when energy resulting from at least one operation on the first signal exceeds a first threshold. The VAD includes a second VAD component coupled to the second detector. The second VAD component determines that the second signal corresponds to voiced speech when a ratio of a second parameter corresponding to the second signal and a first parameter corresponding to the first signal exceeds a second threshold.	07-03-2014
20140195228	VOICE ACTIVITY DETECTION/SILENCE SUPPRESSION SYSTEM - A Voice Activity Detection/Silence Suppression (VAD/SS) system is connected to a channel of a transmission pipe. The channel provides a pathway for the transmission of energy. A method for operating a VAD/SS system includes detecting the energy on the channel, and activating or suppressing activation of the VAD/SS system depending upon the nature of the energy detected on the channel.	07-10-2014
20140200887	SOUND PROCESSING DEVICE AND SOUND PROCESSING METHOD - A sound processing device includes a noise suppression unit configured to suppress a noise component included in an input sound signal, an auxiliary noise addition unit configured to add auxiliary noise to the input sound signal, whose noise component has been suppressed by the noise suppression unit, to generate an auxiliary noise-added signal, a distortion calculation unit configured to calculate a degree of distortion of the auxiliary noise-added signal, and a control unit configured to control an addition amount by which the auxiliary noise addition unit adds the auxiliary noise based on the degree of distortion calculated by the distortion calculation unit.	07-17-2014
20140207446	INDEFINITE SPEECH INPUTS - Embodiments are disclosed that relate to the use of speech inputs including indefinite quantitative terms as computing device inputs. For example, one disclosed embodiment provides a method of operating a computing device, the method including receiving a speech input comprising an indefinite quantitative term, determining a definite quantity corresponding to the indefinite quantitative term, and applying the definite quantity to an action performed via the computing device in response to the speech input.	07-24-2014
20140207447	VOICE IDENTIFICATION METHOD AND APPARATUS - Embodiments of the present invention provide a voice identification method, which includes: obtaining voice data; obtaining a confidence value according to the voice data; obtaining a noise scenario according to the voice data; obtaining a confidence threshold corresponding to the noise scenario; and if the confidence value is greater than or equal to the confidence threshold, processing the voice data. An apparatus is also provided. The method and apparatus that flexibly adjust the confidence threshold according to the noise scenario greatly improve a voice identification rate under a noise environment.	07-24-2014
20140214418	SOUND PROCESSING DEVICE AND SOUND PROCESSING METHOD - A sound processing device includes a first noise suppression unit configured to suppress a noise component included in an input sound signal using a first suppression amount, a second noise suppression unit configured to suppress the noise component included in the input sound signal using a second suppression amount greater than the first suppression amount, a speech section detection unit configured to detect whether the sound signal whose noise component has been suppressed by the second noise suppression unit includes a speech section having a speech for every predetermined time, and a speech recognition unit configured to perform a speech recognizing process on a section, which is detected to be a speech section by the speech section detection unit, in the sound signal whose noise component has been suppressed by the first noise suppression unit.	07-31-2014
20140236594	ASSISTIVE DEVICE FOR CONVERTING AN AUDIO SIGNAL INTO A VISUAL REPRESENTATION - A device for converting an audio signal into a visual representation, the device comprising at least one receiver for receiving the audio signal; a signal processing unit for processing the received audio signal; a converter for converting the processed audio signal into a visual representation; and projecting means for projecting the visual representation onto a display, wherein the display comprises an embedded grating structure.	08-21-2014
20140244249	System and Method for Identification of Intent Segment(s) in Caller-Agent Conversations - Identification of an intent of a conversation can be useful for real-time or post-processing purposes. According to example embodiments, a method, and corresponding apparatus, of identifying at least one intent-bearing utterance in a conversation comprises determining at least one feature for each utterance among a subset of utterances of the conversation; classifying each utterance among the subset of utterances, using a classifier, as an intent classification or a non-intent classification based at least in part on a subset of the at least one determined feature; and selecting at least one utterance, with intent classification, as an intent-bearing utterance based at least in part on classification results by the classifier. Through identification of an intent-bearing utterance, a call center, for example, can provide improved service for callers through, for example, more effective directing of a call to a live agent.	08-28-2014
20140244250	CARDIOID BEAM WITH A DESIRED NULL BASED ACOUSTIC DEVICES, SYSTEMS, AND METHODS - An acoustic system includes first one or more acoustic elements designed and arranged in a first manner to facilitate generation of a first signal that includes mostly undesired audio, substantially void of desired audio, in response to a presence of the desired audio and the undesired audio. Second one or more acoustic elements are designed and arranged in a second complementary manner to facilitate generation of a second signal that includes both the desired and the undesired audio, in response to the presence of the desired audio and the undesired audio. A signal extraction component receives the first signal and the second signal. The signal extraction component further includes an inhibit component. The inhibit component is coupled to the first signal and the second signal. A delay element is coupled to a path of the second signal. The delay element introduces a deterministic delay to the second signal. A value of the deterministic delay is selected to model reverberation of the environment that the system is used in. The first signal is input to the adaptive filter and an output of the inhibit component is in communication with the adaptive filter to control adaptive filtering. An output of the adaptive filter is a first input to an adder and the output of the delay element is a second input to the adder. The adder subtracts the first input from the second input to create an output, which is the desired audio.	08-28-2014
20140244251	Systems and Methods for Dynamic Re-Configurable Speech Recognition - Speech recognition models are dynamically re-configurable based on user information, background information such as background noise and transducer information such as transducer response characteristics to provide users with alternate input modes to keyboard text entry. The techniques of dynamic re-configurable speech recognition provide for deployment of speech recognition on small devices such as mobile phones and personal digital assistants as well as in environments such as office, home or vehicle while maintaining the accuracy of the speech recognition.	08-28-2014
20140249812	ROBUST SPEECH BOUNDARY DETECTION SYSTEM AND METHOD - A system for audio processing comprising an initial background statistical model system configured to generate an initial background statistical model using a predetermined sample size of audio data. A parameter computation system configured to generate parametric data for the audio data including cepstral and energy parameters. A background statistics computation system configured to generate preliminary background statistics for determining whether speech has been detected. A first speech detection system configured to determine whether speech was present in the initial sample of audio data. An adaptive background statistical model system configured to provide an adaptive background statistical model for use in continuous processing of audio data for speech detection. A parameter computation system configured to calculate cepstral parameters, energy parameters and other suitable parameters for speech detection. A speech/non-speech classification system configured to classify individual frames as speech frames or non-speech frames, based on the computed parameters and the adaptive background statistical model data. A background statistics update system configured to update the background statistical model based on detected speech and non-speech frames. A second speech detection system configured to perform speech detection processing and to generate a suitable indicator for use in processing audio data that is determined to include speech signals.	09-04-2014
20140278391	APPARATUS AND METHOD TO CLASSIFY SOUND TO DETECT SPEECH - Audio frames are classified as either speech, non-transient background noise, or transient noise events. Probabilities of speech or transient noise event, or other metrics may be calculated to indicate confidence in classification. Frames classified as speech or noise events are not used in updating models (e.g., spectral subtraction noise estimates, silence model, background energy estimates, signal-to-noise ratio) of non-transient background noise. Frame classification affects acceptance/rejection of recognition hypothesis. Classifications and other audio related information may be determined by circuitry in a headset, and sent (e.g., wirelessly) to a separate processor-based recognition device.	09-18-2014
20140278392	Method and Apparatus for Pre-Processing Audio Signals - The disclosure is directed to pre-processing audio signals. In one implementation, an electronic device receives an audio signal that has audio information, obtains auxiliary information (such as location, velocity, direction, light, proximity of objects, and temperature), and determines, based on the audio information and the auxiliary information, a type of audio environment in which the electronic device is operating. The device selects an audio pre-processing procedure based on the determined audio environment type and pre-processes the audio signal according to the selected pre-processing procedure. The device may then perform speech recognition on the pre-processed audio signal.	09-18-2014
20140278393	Apparatus and Method for Power Efficient Signal Conditioning for a Voice Recognition System - A disclosed method includes monitoring an audio signal energy level while having a plurality of signal processing components deactivated and activating at least one signal processing component in response to a detected change in the audio signal energy level. The method may include activating and running a voice activity detector on the audio signal in response to the detected change where the voice activity detector is the at least one signal processing component. The method may further include activating and running the noise suppressor only if a noise estimator determines that noise suppression is required. The method may activate and run a noise type classifier to determine the noise type based on information received from the noise estimator and may select a noise suppressor algorithm, from a group of available noise suppressor algorithms, where the selected noise suppressor algorithm is the most power consumption efficient.	09-18-2014
20140278394	Apparatus and Method for Beamforming to Obtain Voice and Noise Signals - One method of operation includes beamforming a plurality of microphone outputs to obtain a plurality of virtual microphone audio channels. Each virtual microphone audio channel corresponds to a beamform. The virtual microphone audio channels include at least one voice channel and at least one noise channel. The method includes performing voice activity detection on the at least one voice channel and adjusting a corresponding voice beamform until voice activity detection indicates that voice is present on the at least one voice channel. Another method beamforms the plurality of microphone outputs to obtain a plurality of virtual microphone audio channels, where each virtual microphone audio channel corresponds to a beamform, and with at least one voice channel and at least one noise channel. The method performs voice recognition on the at least one voice channel and adjusts the corresponding voice beamform to improve a voice recognition confidence metric.	09-18-2014
20140278395	Method and Apparatus for Determining a Motion Environment Profile to Adapt Voice Recognition Processing - A method and apparatus for determining a motion environment profile to adapt voice recognition processing includes a device receiving an acoustic signal including a speech signal, which is provided to a voice recognition module. The method also includes determining a motion profile for the device, determining a temperature profile for the device, and determining a noise profile for the acoustic signal. The method further includes determining, from the motion, temperature, and noise profiles, a motion environment profile for the device and adapting voice recognition processing for the speech signal based on the motion environment profile.	09-18-2014
20140278396	ACOUSTIC SIGNAL MODIFICATION - Systems and methods for modifying acoustic signals are provided by one or more microphones using acoustic transfer functions. The acoustic transfer functions may be determined based in part on an acoustic model and on a determined location of an acoustic element.	09-18-2014
20140278397	SPEAKER-IDENTIFICATION-ASSISTED UPLINK SPEECH PROCESSING SYSTEMS AND METHODS - Methods, systems, and apparatuses are described for performing speaker-identification-assisted speech processing in an uplink path of a communication device. In accordance with certain embodiments, a communication device includes speaker identification (SID) logic that is configured to identify the identity of a near-end speaker. Knowledge of the identity of the near-end speaker is then used to improve the performance of one or more uplink speech processing algorithms implemented on the communication device.	09-18-2014
20140278398	APPARATUSES AND METHODS TO DETECT AND OBTAIN DESIRED AUDIO - A device and method to detect desired audio includes a ratio calculator. The ratio calculator calculates a ratio between a primary acoustic signal and a reference acoustic signal. The primary acoustic signal contains desired audio and undesired audio and the reference acoustic signal contains mostly undesired audio, substantially void of desired audio. A long-term mean value calculator is coupled to the ratio calculator. The long-term mean value calculator maintains an average of the ratio. A comparator is coupled to the ratio calculator and the long-term mean value calculator. The comparator compares the ratio with the average. Desired audio is detected when the ratio is greater than the average by a threshold amount.	09-18-2014
20140278399	SPEECH FRAGMENT DETECTION FOR MANAGEMENT OF INTERACTION IN A REMOTE CONFERENCE - A conferencing system and method involves conducting a conference between endpoints. The conference can be a videoconference in which audio data and video data are exchanged or can be an audio-only conference. Audio of the conference is obtained from one of the endpoints, and speech is detected in the obtained audio. The detected speech is analyzed to determine that the detected speech constitutes a speech fragment, and an indicia indicative of the determined speech fragment is generated. For a videoconference, the indicia can be a visual cue to be added to video for the given endpoint when displayed at other endpoints. For an audio-only conference, the indicia can be an audio cue to be added to the audio of the conference at the other endpoints.	09-18-2014
20140303970	ADAPTING SPEECH RECOGNITION ACOUSTIC MODELS WITH ENVIRONMENTAL AND SOCIAL CUES - An acoustic model adaptation system includes a memory device and a model selector engine coupled to the memory device. The model selector engine is configured to compile information of environmental conditions to identify a current speech environment for audio input into a speech recognizer on a device. The model selector engine is further configured to compare the information of the environmental conditions with profiles of acoustic models. Each profile associates with an acoustic model. Each acoustic model compensates for background noise or acoustical distortions of the audio input. The model selector engine is further configured to select a first acoustic model for the speech recognizer based on the information of the environmental conditions exclusive of audio input from the user.	10-09-2014
20140303971	TERMINAL AND CONTROL METHOD THEREOF - This specification provides a terminal including a microphone that is configured to receive a user's voice input for controlling an operation of the terminal, an analyzing unit that is configured to sense a degree of proximity between the user's mouth and the microphone while the voice is input, an output unit that is configured to output at least one of visible data and audible data based on the voice, and a controller that is configured to restrict the output of the audible data when the degree of proximity is smaller than a preset range and a volume of the voice is below a preset reference volume.	10-09-2014
20140303972	Method and Apparatus for Identifying Acoustic Background Environments Based on Time and Speed to Enhance Automatic Speech Recognition - Disclosed are systems, methods, and computer readable media for identifying an acoustic environment of a caller. The method embodiment comprises analyzing acoustic features of a received audio signal from a caller, receiving meta-data information, classifying a background environment of the caller based on the analyzed acoustic features and the meta-data, selecting an acoustic model matched to the classified background environment from a plurality of acoustic models, and performing speech recognition on the received audio signal using the selected acoustic model.	10-09-2014
20140309994	APPARATUS AND METHOD FOR VOICE PROCESSING - An apparatus and a corresponding method for voice processing are provided. The apparatus includes a sound receiver, a camera, and a processor. The sound receiver receives a sound signal. The camera takes a video. The processor is coupled to the sound receiver and the camera. The processor obtains a voice onset time (VOT) of the sound signal, detects a human face in the video, detects a change time of a mouth contour of the human face, and verifies at least one preset condition. When all of the preset conditions are true, the processor performs speech recognition on the sound signal. The at least one preset condition includes that a difference between the VOT and the change time is smaller than a threshold value.	10-16-2014
20140316778	NOISE CANCELLATION FOR VOICE ACTIVATION - Devices, methods, and systems for noise cancellation for voice activation are described herein. For example, one or more embodiments include determining, with a computing device, whether a background noise received with the computing device is recognized, creating a stored noise cancellation filter based on the background noise when the background noise is not recognized, receiving a signal that includes the background noise and a voice command at the computing device, filtering out the background noise from the signal with the stored noise cancellation filter, and processing the voice command.	10-23-2014
20140324421	VOICE PROCESSING APPARATUS AND VOICE PROCESSING METHOD - A voice processing apparatus includes: a voice receptor configured to collect a user voice, convert the user voice into a first voice signal, and to output the first voice signal; an audio processor configured to process a sound output through a speaker to output an audio signal; a memory unit configured to store the first voice signal output from the voice receptor and the audio signal output from the audio processor; an echo cancelor configured to remove an echo from the first voice signal to generate a second voice signal; and a first controller configured to control the echo cancelor to generate the second voice signal based on the first voice signal and the audio signal stored in the memory unit.	10-30-2014
20140343935	APPARATUS AND METHOD FOR PERFORMING ASYNCHRONOUS SPEECH RECOGNITION USING MULTIPLE MICROPHONES - An apparatus and method for performing asynchronous speech recognition using multiple microphones are disclosed. The apparatus includes a microphone selection unit, a signal-to-noise ratio measurement unit, a speech recognition and verification unit, and a final recognition result output unit. The microphone selection unit selects two or more microphones responsive to a user's voice from among a plurality of microphones distributed around the user. The signal-to-noise ratio measurement unit measures the signal to noise ratios of inputs of the selected two or more microphones. The speech recognition and verification unit performs speech recognition using the input of the microphone having a highest signal to noise ratio, and verifies the speech recognition using the inputs of the remaining microphones. The final recognition result output unit outputs the final recognition results of the user's voice based on the results of the speech recognition and verification unit.	11-20-2014
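The microphone ranking step in entry 20140343935 can be sketched as follows. This is only an illustration, assuming signal and noise power estimates are already available per microphone; the function names and the dictionary layout are hypothetical.

```python
import math


def snr_db(signal_power, noise_power):
    """Signal-to-noise ratio in decibels from two power estimates."""
    return 10.0 * math.log10(signal_power / noise_power)


def rank_microphones(powers):
    """Sort microphone names by SNR, highest first.

    powers: dict mapping a microphone name to (signal_power, noise_power).
    The first name would feed recognition; the rest would feed verification.
    """
    return sorted(powers, key=lambda name: snr_db(*powers[name]), reverse=True)


ranked = rank_microphones({"a": (10, 1), "b": (100, 1), "c": (4, 2)})
primary, verifiers = ranked[0], ranked[1:]
print(primary, verifiers)
```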
20140350926	Voice Controlled Audio Recording System with Adjustable Beamforming - A method of operation beamforms a plurality of microphone outputs to obtain a plurality of virtual microphone audio channels with at least one audio output channel and at least one audio control channel. The method performs voice recognition on the audio control channel to detect voice commands for controlling audio output channel attributes, and adjusts an audio channel attribute in response to detecting a voice command. Adjusting an attribute of the audio channel may be accomplished by, for example, controlling one or more parameters of an adjustable beamformer. The detected voice commands for controlling audio channel attributes may include voice commands for controlling audio sensitivity zooming, panning in a specified direction, focusing on a specified direction, blocking a specified direction, mixing a narrator's voice, blocking a narrator's voice, or reducing background noise. An apparatus that performs the method of operation is also disclosed.	11-27-2014
20140350927	DEVICE AND METHOD FOR SUPPRESSING NOISE SIGNAL, DEVICE AND METHOD FOR DETECTING SPECIAL SIGNAL, AND DEVICE AND METHOD FOR DETECTING NOTIFICATION SOUND - Provided is a noise signal suppressing device including: an input unit configured to receive a sound signal; a time/frequency converting unit; an independent peak spectrum extracting unit configured to extract a peak spectrum having independence; a persistence determining unit configured to determine that the peak spectrum having independence persists for a predetermined period or longer; a noise-signal suppressing unit configured to suppress the peak spectrum having independence as the noise signal. The independent peak spectrum extracting unit includes: a first peak extracting unit configured to extract a peak spectrum having higher energy than that of an adjacent frequency signal, and a second peak extracting unit configured to extract a peak spectrum maintaining a frequency interval of equal to or larger than a predetermined value with respect to a peak spectrum adjacent thereto as the peak spectrum having independence.	11-27-2014
20140358534	General Sound Decomposition Models - Sound decomposition models are described. In one or more implementations, a plurality of individual models is generated for respective ones of a plurality of sound sources. The plurality of models is collected to form a universal audio model that is configured to support sound decomposition of sound data through use of one or more of the models. The plurality of models is not generated using a sound source that originated at least a portion of the sound data.	12-04-2014
20140358535	METHOD OF EXECUTING VOICE RECOGNITION OF ELECTRONIC DEVICE AND ELECTRONIC DEVICE USING THE SAME - A method of performing a voice command function in an electronic device includes detecting voice of a user, acquiring one or more pieces of attribute information from the voice, and authenticating the user by comparing the attribute information with pre-stored authentic attribute information, using a recognition model. An electronic device includes a voice input module configured to detect a voice of a user, a first processor configured to acquire one or more pieces of attribute information from the voice and authenticate the user by comparing the attribute information with a recognition model, and a second processor configured to, when the attribute information matches the recognition model, activate the voice command function, receive a voice command of the user, and execute an application corresponding to the voice command. Other embodiments are also disclosed.	12-04-2014
20140372113	MICROPHONE AND VOICE ACTIVITY DETECTION (VAD) CONFIGURATIONS FOR USE WITH COMMUNICATION SYSTEMS - Communication systems are described, including both portable handset and headset devices, which use a number of microphone configurations to receive acoustic signals of an environment. The microphone configurations include, for example, a two microphone array including two unidirectional microphones, and a two-microphone array including one unidirectional microphone and one omnidirectional microphone. The communication systems also include Voice Activity Detection (VAD) devices to provide information of human voicing activity. Components of the communications systems receive the acoustic signals and voice activity signals and, in response, automatically generate control signals from data of the voice activity signals. Components of the communication systems use the control signals to automatically select a denoising method appropriate to data of frequency subbands of the acoustic signals. The selected denoising method is applied to the acoustic signals to generate denoised acoustic signals when the acoustic signal includes speech and noise.	12-18-2014
20150012267	COUNTERMEASURES FOR VOICE RECOGNITION DETERIORATION DUE TO EXTERIOR NOISE FROM PASSING VEHICLES - Mitigating disruption to a voice recognition system in a vehicle caused by a passing source of noise is provided. Sensors sense an approaching truck or the like that is likely to disrupt operation of the in-vehicle voice recognition system. Countermeasures are initiated to mitigate the disruption.	01-08-2015
20150012268	SPEECH PROCESSING DEVICE, SPEECH PROCESSING METHOD, AND SPEECH PROCESSING PROGRAM - A speech processing device includes a reverberation characteristic selection unit configured to correlate correction data indicating a contribution of a reverberation component based on a corresponding reverberation characteristic with an adaptive acoustic model which is trained using reverbed speech to which a reverberation based on the corresponding reverberation characteristic is added for each of reverberation characteristics, to calculate likelihoods based on the adaptive acoustic models for a recorded speech, and to select correction data corresponding to the adaptive acoustic model having the calculated highest likelihood, and a dereverberation unit configured to remove the reverberation component from the speech based on the correction data.	01-08-2015
20150012269	SPEECH PROCESSING DEVICE, SPEECH PROCESSING METHOD, AND SPEECH PROCESSING PROGRAM - A speech processing device includes a distance acquisition unit configured to acquire a distance between a sound collection unit configured to record speech from a sound source and the sound source, a reverberation characteristic estimation unit configured to estimate a reverberation characteristic based on the distance acquired by the distance acquisition unit, a correction data generation unit configured to generate correction data indicating a contribution of a reverberation component from the reverberation characteristic estimated by the reverberation characteristic estimation unit; and a dereverberation unit configured to remove the reverberation component from the speech by correcting the amplitude of the speech based on the correction data.	01-08-2015
20150012270	SYSTEMS AND METHODS FOR IMPROVING AUDIO CONFERENCING SERVICES - Systems and methods are disclosed herein for improving audio conferencing services. One aspect relates to processing audio content of a conference. A first audio signal is received from a first conference participant, and a start and an end of a first utterance by the first conference participant are detected from the first audio signal. A second audio signal is received from a second conference participant, and a start and an end of a second utterance by the second conference participant are detected from the second audio signal. The second conference participant is provided with at least a portion of the first utterance, wherein at least one of start time, start point, and duration is determined based at least in part on the start, end, or both, of the second utterance.	01-08-2015
20150019215	ELECTRIC EQUIPMENT AND CONTROL METHOD THEREOF - An electric equipment including a communication unit to communicate with at least one electric equipment in a predetermined space through a network, a sound collection unit to collect sound in the predetermined space, a voice recognition unit to recognize a voice from the collected sound, and a controller to transmit a noise reduction control signal to the at least one electric equipment when the recognized voice is an operation command. Voice recognition is performed in a state in which surrounding noise is reduced, thereby improving performance of the voice recognition and thus improving operational accuracy of an electric equipment. In addition, a voice recognition rate is increased, thereby improving user satisfaction.	01-15-2015
20150025880	Method for Processing Speech Signals Using an Ensemble of Speech Enhancement Procedures - A method processes an acoustic signal that is a mixture of a target signal and interfering signals by first enhancing the acoustic signal by a set of enhancement procedures to produce a set of initial enhanced signals. Then, an ensemble learning procedure is applied to the acoustic signal and the set of initial enhancement signals to produce features of the acoustic signal.	01-22-2015
20150025881	SPEECH SIGNAL SEPARATION AND SYNTHESIS BASED ON AUDITORY SCENE ANALYSIS AND SPEECH MODELING - Provided are systems and methods for generating clean speech from a speech signal representing a mixture of a noise and speech. The clean speech may be generated from synthetic speech parameters. The synthetic speech parameters are derived based on the speech signal components and a model of speech using auditory and speech production principles. The modeling may utilize a source-filter structure of the speech signal. One or more spectral analyses on the speech signal are performed to generate spectral representations. The feature data is derived based on a spectral representation. The features corresponding to the target speech according to a model of speech are grouped and separated from the feature data. The synthetic speech parameters, including spectral envelope, pitch data and voice classification data are generated based on features corresponding to the target speech.	01-22-2015
20150032446	METHOD AND SYSTEM FOR SIGNAL TRANSMISSION CONTROL - An audio signal with a temporal sequence of blocks or frames is received or accessed. Features are determined as characterizing aggregately the sequential audio blocks/frames that have been processed recently, relative to current time. The feature determination exceeds a specificity criterion and is delayed, relative to the recently processed audio blocks/frames. Voice activity indication is detected in the audio signal. VAD is based on a decision that exceeds a preset sensitivity threshold and is computed over a brief time period, relative to blocks/frames duration, and relates to current block/frame features. The VAD and the recent feature determination are combined with state related information, which is based on a history of previous feature determinations that are compiled from multiple features, determined over a time prior to the recent feature determination time period. Decisions to commence or terminate the audio signal, or related gains, are outputted based on the combination.	01-29-2015
20150032447	Determining a Harmonicity Measure for Voice Processing - A method, an apparatus, and a computer-readable medium configured with instructions that when executed carry out the method for determining a measure of harmonicity. In one embodiment the method includes selecting candidate fundamental frequencies within a range, and for each candidate determining a mask or retrieving a pre-calculated mask that has a positive value for each frequency that contributes to harmonicity, and a negative value for each frequency that contributes to inharmonicity. A candidate harmonicity measure is calculated for each candidate fundamental by summing the product of the mask and the magnitude measure spectrum. The harmonicity measure is selected as the maximum of the candidate harmonicity measures.	01-29-2015
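The mask-and-sum procedure in entry 20150032447 can be sketched on a discrete spectrum. This is a simplified illustration under stated assumptions: candidates are expressed as integer bin indices, and the mask weights `pos` and `neg` are hypothetical values, not those of the patent.

```python
def harmonic_mask(f0_bin, n_bins, pos=1.0, neg=-0.5):
    """Mask that is positive at multiples of f0_bin and negative elsewhere."""
    mask = [neg] * n_bins
    for k in range(f0_bin, n_bins, f0_bin):
        mask[k] = pos
    return mask


def harmonicity(magnitudes, candidate_bins):
    """Return (best_score, best_f0_bin) over the candidate fundamentals."""
    best_score, best_f0 = float("-inf"), None
    for f0 in candidate_bins:
        mask = harmonic_mask(f0, len(magnitudes))
        # Candidate measure: sum of mask times magnitude over all bins.
        score = sum(m * x for m, x in zip(mask, magnitudes))
        if score > best_score:
            best_score, best_f0 = score, f0
    return best_score, best_f0


# Toy spectrum with peaks at bins 3, 6, 9, and 12.
spectrum = [1.0 if b in (3, 6, 9, 12) else 0.0 for b in range(13)]
print(harmonicity(spectrum, [2, 3, 4, 5]))
```

In this toy spectrum the candidate at bin 3 collects all four peaks under positive mask values, so it yields the maximum measure.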
20150039303	SPEECH RECOGNITION - A speech recognition system comprises: an input, for receiving an input signal from at least one microphone; a first buffer, for storing the input signal; a noise reduction block, for receiving the input signal and generating a noise reduced input signal; a speech recognition engine, for receiving either the input signal output from the first buffer or the noise reduced input signal from the noise reduction block; and a selection circuit for directing either the input signal output from the first buffer or the noise reduced input signal from the noise reduction block to the speech recognition engine.	02-05-2015
20150039304	Voice Activity Detection Using A Soft Decision Mechanism - Voice activity detection (VAD) is an enabling technology for a variety of speech based applications. Herein disclosed is a robust VAD algorithm that is also language independent. Rather than classifying short segments of the audio as either “speech” or “silence”, the VAD as disclosed herein employs a soft-decision mechanism. The VAD outputs a speech-presence probability, which is based on a variety of characteristics.	02-05-2015
20150039305	CONTROLLER FOR VOICE-CONTROLLED DEVICE AND ASSOCIATED METHOD - A controller for a voice-controlled device is provided. The controller includes a setting module and a recognition module. The setting module generates a threshold according to an environmental parameter. The recognition module compares a confidence score of speech recognition with the threshold to accordingly execute voice control.	02-05-2015
20150058002	Detecting Wind Noise In An Audio Signal - A method of detecting wind noise in an audio signal includes calculating a power spectrum of the current frame, evaluating whether the current frame is non-stationary, evaluating whether an energy content of the current frame is concentrated at low frequencies and evaluating whether a periodicity is present in the power spectrum. The method further includes determining the presence of wind noise without speech in the current frame if the current frame is non-stationary, the energy content is concentrated at low frequencies, and a periodicity is not present. The periodicity of the power spectrum is analyzed using cepstrum coefficients. An improved wind noise detection may be achieved by analyzing the spectral characteristics of recorded audio signals.	02-26-2015
20150058003	SPEECH RECOGNITION SYSTEM - A system includes a speech recognition processor, a depth sensor coupled to the speech recognition processor, and an array of microphones coupled to the speech recognition processor. The depth sensor is operable to calculate a distance and a direction from the array of microphones to a source of audio data. The speech recognition processor is operable to select an acoustic model as a function of the distance and the direction from the array of microphones to the source of audio data. The speech recognition processor is operable to apply the distance measure in the microphone array beam formation so as to boost portions of the signals originating from the source of audio data and to suppress portions of the signals resulting from noise.	02-26-2015
20150058004	AUGMENTED MULTI-TIER CLASSIFIER FOR MULTI-MODAL VOICE ACTIVITY DETECTION - Disclosed herein are systems, methods, and computer-readable storage media for detecting voice activity in a media signal in an augmented, multi-tier classifier architecture. A system configured to practice the method can receive, from a first classifier, a first voice activity indicator detected in a first modality for a human subject. Then, the system can receive, from a second classifier, a second voice activity indicator detected in a second modality for the human subject, wherein the first voice activity indicator and the second voice activity indicator are based on the human subject at a same time, and wherein the first modality and the second modality are different. The system can concatenate, via a third classifier, the first voice activity indicator and the second voice activity indicator with original features of the human subject, to yield a classifier output, and determine voice activity based on the classifier output.	02-26-2015
20150066497	Cloud Based Adaptive Learning for Distributed Sensors - A low power sound recognition sensor is configured to receive an analog signal that may contain a signature sound. Sound parameter information is extracted from the analog signal and compared to a sound parameter reference stored locally with the sound recognition sensor to detect when the signature sound is received in the analog signal. A trigger signal is generated when a signature sound is detected. A portion of the extracted sound parameter information is sent to a remote training location for adaptive training when a signature sound detection error occurs. An updated sound parameter reference from the remote training location is received in response to the adaptive training.	03-05-2015
20150066498	Analog to Information Sound Signature Detection - A low power sound recognition sensor is configured to receive an analog signal that may contain a signature sound. The received analog signal is evaluated using a detection portion of the analog section to determine when background noise on the analog signal is exceeded. A feature extraction portion of the analog section is triggered to extract sparse sound parameter information from the analog signal when the background noise is exceeded. An initial truncated portion of the sound parameter information is compared to a truncated sound parameter database stored locally with the sound recognition sensor to detect when there is a likelihood that the expected sound is being received in the analog signal. A trigger signal is generated to trigger classification logic when the likelihood that the expected sound is being received exceeds a threshold value.	03-05-2015
20150066499	MONAURAL SPEECH FILTER - A system receives monaural sound which includes speech and background noises. The received sound is divided by frequency and time into time-frequency units (TFUs). Each TFU is classified as speech or non-speech by a processing unit. The processing unit for each frequency range includes at least one of a deep neural network (DNN) or a linear support vector machine (LSVM). The DNN extracts and classifies the features of the TFU and includes a pre-trained stack of Restricted Boltzmann Machines (RBM), and each RBM includes a visible and a hidden layer. The LSVM classifies each TFU based on extracted features from the DNN, including those from the visible layer of the first RBM, and those from the hidden layer of the last RBM in the stack. The LSVM and DNN include training with a plurality of training noises. Each TFU classified as speech is output.	03-05-2015
20150066500	SPEECH PROCESSING DEVICE, SPEECH PROCESSING METHOD, AND SPEECH PROCESSING PROGRAM - A speech processing device includes a speech recognition unit configured to sequentially recognize recognition segments from an input speech, a reverberation influence storage unit configured to store a degree of reverberation influence indicating an influence of a reverberation based on a preceding speech to a subsequent speech subsequent to the preceding speech and a recognition segment group including a plurality of recognition segments in correlation with each other, a reverberation influence selection unit configured to select the degree of reverberation influence corresponding to the recognition segment group which includes the plurality of recognition segments recognized by the speech recognition unit from the reverberation influence storage unit, and a reverberation reduction unit configured to remove a reverberation component weighted with the degree of reverberation influence from the speech from which at least a part of recognition segments of the recognition segment group is recognized.	03-05-2015
20150073787	VOICE FILTERING METHOD, APPARATUS AND ELECTRONIC EQUIPMENT - Embodiments of the present invention provide a voice filtering method and apparatus and electronic equipment. The voice filtering method includes: determining a reference spectral characteristic to which a voice characteristic of a subscriber to be analyzed corresponds; and filtering an input sound signal according to the reference spectral characteristic. With the embodiments of the present invention, transmission effects of voices for different subscribers to be analyzed can be enhanced by using voice characteristics of the subscribers to be analyzed, so as to more efficiently transmit voice information.	03-12-2015
20150106087	Efficient Discrimination of Voiced and Unvoiced Sounds - A method is disclosed for discriminating voiced and unvoiced sounds in speech. The method detects characteristic waveform features of voiced and unvoiced sounds, by applying integral and differential functions to the digitized sound signal in the time domain. Laboratory tests demonstrate extremely high reliability in separating voiced and unvoiced sounds. The method is very fast and computationally efficient. The method enables voice activation in resource-limited and battery-limited devices, including mobile devices, wearable devices, and embedded controllers. The method also enables reliable command identification in applications that recognize only predetermined commands. The method is suitable as a pre-processor for natural language speech interpretation, improving recognition and responsiveness. The method enables realtime coding or compression of speech according to the sound type, improving transmission efficiency.	04-16-2015
20150106088	SPEECH PROCESSING - A technique for enhancing a speech signal captured in a noisy environment is provided. According to an example embodiment, the technique comprises obtaining a current time frame of a noise-suppressed voice signal, derived on the basis of a current time frame of a source audio signal comprising a source voice signal, detecting input voice characteristics for the current time frame of the noise-suppressed voice signal, obtaining reference voice characteristics for said current time frame, said reference voice characteristics being descriptive of the source voice signal in a noise-free or low-noise environment, and creating a current time frame of a modified voice signal by modifying said current time frame of the noise-suppressed voice signal in response to a difference between the detected input voice characteristics and the reference voice characteristics exceeding a predetermined threshold.	04-16-2015
20150112671	Headset Interview Mode - Methods and apparatuses for headsets are disclosed. In one example, a headset includes a processor, a communications interface, a user interface, and a speaker. The headset includes a microphone array including two or more microphones arranged to detect sound and output two or more microphone output signals. The headset further includes a memory storing an application executable by the processor configured to operate the headset in a first mode utilizing a first set of signal processing parameters to process the two or more microphone output signals and operate the headset in a second mode utilizing a second set of signal processing parameters to process the two or more microphone output signals.	04-23-2015
20150112672	VOICE QUALITY ENHANCEMENT TECHNIQUES, SPEECH RECOGNITION TECHNIQUES, AND RELATED SYSTEMS - An echo canceller can be arranged to receive an input signal and to receive a reference signal. The echo canceller can subtract a linear component of the reference signal from the input signal. A noise suppressor can suppress non-linear effects of the reference signal in the input signal in correspondence with a large number of selectable parameters. Such suppression can be provided on a frequency-by-frequency basis, with a unique set of tunable parameters selected for each frequency. A degree of suppression provided by the noise suppressor can correspond to an estimate of residual echo remaining after the one or more linear components of the reference signal have been subtracted from the input signal, to an estimated double-talk probability, and to an estimated signal-to-noise ratio of near-end speech in the input signal for each respective frequency. A speech recognizer can receive a processed input signal from the noise suppressor.	04-23-2015
20150112673	Acoustic Activity Detection Apparatus and Method - Streaming audio is received. The streaming audio includes a frame having a plurality of samples. An energy estimate is obtained for the plurality of samples. The energy estimate is compared to at least one threshold. In addition, a band pass estimate of the signal is determined. An energy estimate is obtained for the band-passed plurality of samples. The two energy estimates are compared to at least one threshold each. Based upon the comparison operation, a determination is made as to whether speech is detected.	04-23-2015
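The two-energy comparison in entry 20150112673 can be sketched as below. Several choices here are assumptions for illustration only: the band-pass is a crude cascade of a one-pole high-pass and a one-pole low-pass, the thresholds and filter coefficients are arbitrary, and the two comparisons are combined with a logical AND (the abstract does not say how they are combined).

```python
import math


def mean_energy(samples):
    """Mean squared amplitude of a frame."""
    return sum(s * s for s in samples) / len(samples)


def crude_band_pass(samples, hp_coeff=0.9, lp_coeff=0.5):
    """One-pole high-pass followed by one-pole low-pass (illustrative only)."""
    out = []
    hp_prev_out = 0.0
    prev_in = 0.0
    lp_state = 0.0
    for x in samples:
        hp = hp_coeff * (hp_prev_out + x - prev_in)  # removes DC and slow drift
        prev_in, hp_prev_out = x, hp
        lp_state += lp_coeff * (hp - lp_state)  # attenuates high frequencies
        out.append(lp_state)
    return out


def speech_detected(frame, full_threshold=0.05, band_threshold=0.05):
    """True when both the full-band and band-passed energies exceed thresholds."""
    full = mean_energy(frame)
    band = mean_energy(crude_band_pass(frame))
    return full > full_threshold and band > band_threshold


# 20 ms frames at an assumed 8 kHz sample rate.
tone = [math.sin(2 * math.pi * 1000 * n / 8000) for n in range(160)]
silence = [0.0] * 160
dc_offset = [1.0] * 160
print(speech_detected(tone), speech_detected(silence), speech_detected(dc_offset))
```

The constant-offset frame has high full-band energy but little band-passed energy, which is the kind of case the second estimate is meant to reject in this sketch.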
20150120292	Method for Identifying Speech and Music Components of a Sound Signal - The present invention relates to means and methods of automated difference recognition between speech and music signals in voice communication systems, devices, telephones, and methods, and more specifically, to systems, devices, and methods that automate control when either speech or music is detected over communication links. The present invention provides a novel system and method for monitoring the audio signal, analyzing selected audio signal components, comparing the results of analysis with a pre-determined threshold value, and classifying the audio signal either as speech or music.	04-30-2015
20150127338	CO-TALKER NULLING FOR AUTOMATIC SPEECH RECOGNITION SYSTEMS - Speech from a driver and speech from a passenger in a vehicle are selected directionally using two or more microphones. Samples of speech from a driver picked up by a first microphone are delayed until samples of the speech picked up by a second microphone are in phase with the speech picked up by the first microphone. Samples of a passenger's speech picked up by the second microphone are delayed until samples of the passenger's speech picked up by the first microphone are out-of-phase with the speech picked up by the second microphone.	05-07-2015
20150142432	Ambient Condition Detector with Processing of Incoming Audible Commands Followed by Speech Recognition - A monitoring system includes at least one detector having a housing which carries at least one ambient condition sensor. Control circuits carried by the housing are coupled to the sensor. A separate audio input transducer carried by the housing is coupled to the control circuits, wherein the circuits include signal processing circuits, coupled to the transducer. The signal processing circuits cancel predetermined audio received from the transducer and output a processed speech signal which is coupled to speech recognition circuitry. The speech recognition circuitry recognizes selected speech to implement predetermined functions.	05-21-2015
20150149166	METHOD AND APPARATUS FOR DETECTING SPEECH/NON-SPEECH SECTION - Provided is an apparatus for detecting a speech/non-speech section. The apparatus includes an acquisition unit which obtains inter-channel relation information of a stereo audio signal, a classification unit which classifies each element of the stereo audio signal into a center channel element and a surround element on the basis of the inter-channel relation information, a calculation unit which calculates an energy ratio value between a center channel signal composed of center channel elements and a surround channel signal composed of surround elements, for each frame, and an energy ratio value between the stereo audio signal and a mono signal generated on the basis of the stereo audio signal, and a judgment unit which determines a speech section and a non-speech section from the stereo audio signal by comparing the energy ratio values.	05-28-2015
20150294667	NOISE CANCELLATION APPARATUS AND METHOD - Disclosed herein is a noise cancellation apparatus and method, which select in advance parameters to be used for noise cancellation in a reference voice signal section by generating a reference voice signal in advance before a voice signal is generated, thus improving noise cancellation effects. The noise cancellation apparatus includes a parameter initialization unit for determining an initial value of a parameter to be used for noise cancellation, based on reference signals filtered for respective frequencies, a parameter estimation unit for receiving the initial value of the parameter, and estimating the parameter in response to signals that are input after being filtered for respective frequencies, a gain estimation unit for calculating gains for respective frequencies based on the parameter from the parameter estimation unit, and a gain application unit for cancelling noise by applying the gains to the signals that are input after being filtered for respective frequencies.	10-15-2015
20150302853	APPARATUS AND METHOD TO CLASSIFY SOUND TO DETECT SPEECH - Audio frames are classified as either speech, non-transient background noise, or transient noise events. Probabilities of speech or transient noise event, or other metrics may be calculated to indicate confidence in classification. Frames classified as speech or noise events are not used in updating models (e.g., spectral subtraction noise estimates, silence model, background energy estimates, signal-to-noise ratio) of non-transient background noise. Frame classification affects acceptance/rejection of recognition hypothesis. Classifications and other audio related information may be determined by circuitry in a headset, and sent (e.g., wirelessly) to a separate processor-based recognition device.	10-22-2015
20150317998	METHOD AND APPARATUS FOR RECOGNIZING SPEECH, AND METHOD AND APPARATUS FOR GENERATING NOISE-SPEECH RECOGNITION MODEL - An apparatus and method for recognizing a speech, and an apparatus and method for generating a noise-speech recognition model are provided. The speech recognition apparatus includes a location determiner configured to determine a location of the apparatus, a noise model generator configured to generate a noise model corresponding to the location by collecting noise data related to the location, and a noise model transmitter configured to transmit the noise model to a server.	11-05-2015
20150348545	VOICE FOCUS ENABLED BY PREDETERMINED TRIGGERS - Provided are techniques for voice focus enabled by predetermined triggers. Voice recognition is used to identify one or more pre-determined triggers from a voice of a speaker. In response to identifying the one or more pre-determined triggers, a voice recognition template is dynamically created for the voice of the speaker, and the voice recognition template and voice isolation are used to focus on the voice from the speaker.	12-03-2015
20150348546	AUDIO PROCESSING APPARATUS AND AUDIO PROCESSING METHOD - An audio processing apparatus and an audio processing method are described. In one embodiment, the audio processing apparatus includes an audio masker separator for separating from a first audio signal an audio material comprising a sound other than stationary noise and utterance meaningful in semantics, as an audio masker candidate. The apparatus also includes a first context analyzer for obtaining statistics regarding contextual information of detected audio masker candidates, and a masker library builder for building a masker library or updating an existing masker library by adding, based on the statistics, at least one audio masker candidate as an audio masker into the masker library, wherein audio maskers in the masker library are used to be inserted into a target position in a second audio signal to conceal defects in the second audio signal.	12-03-2015
20150348553	VOICE FOCUS ENABLED BY PREDETERMINED TRIGGERS - Provided are techniques for voice focus enabled by predetermined triggers. Voice recognition is used to identify one or more pre-determined triggers from a voice of a speaker. In response to identifying the one or more pre-determined triggers, a voice recognition template is dynamically created for the voice of the speaker, and the voice recognition template and voice isolation are used to focus on the voice from the speaker.	12-03-2015
20150355649	HOME AUTOMATION CONTROL SYSTEM - A smart premises controller unit for processing data received from a plurality of sensors and for controlling and monitoring one or more pieces of equipment in at least one room in a premises responsive to the processing includes an alarm resolver to activate an alarm, a climate resolver to control a climate in the room and a presence resolver to determine at least the presence of a human in the room. Each resolver receives input from a subset of the plurality of sensors and each resolver comprises a sensor processor and scorer for at least one of its associated subset of sensors. Each resolver comprises a set of models for each of its associated sensors according to its type of resolver. Each sensor processor and scorer matches its sensor data against its models to produce a score for each of its associated models.	12-10-2015
20150364136	ADAPTIVE BEAM FORMING DEVICES, METHODS, AND SYSTEMS - Devices, methods, systems, and computer-readable media for adaptive beam forming are described herein. One or more embodiments include a method for adaptive beam forming, comprising: receiving a voice command at a number of microphones, determining an instruction based on the received voice command, calculating a confidence level of the determined instruction, determining feedback based on the confidence level of the determined instruction, and altering a beam of the number of microphones based on the feedback.	12-17-2015
20150364137	SPATIAL AUDIO DATABASE BASED NOISE DISCRIMINATION - Methods, systems, and computer-readable and executable instructions for spatial audio database based noise discrimination are described herein. For example, one or more embodiments include comparing a sound received from a plurality of microphones to a spatial audio database, discriminating a speech command and a background noise from the received sound based on the comparison to the spatial audio database, and determining an instruction based on the discriminated speech command.	12-17-2015
20150371634	TERMINAL AND SERVER OF SPEAKER-ADAPTATION SPEECH-RECOGNITION SYSTEM AND METHOD FOR OPERATING THE SYSTEM - Provided are a terminal and server of a speaker-adaptation speech-recognition system and a method for operating the system. The terminal in the speaker-adaptation speech-recognition system includes a speech recorder which transmits speech data of a speaker to a speech-recognition server, a statistical variable accumulator which receives a statistical variable including acoustic statistical information about speech of the speaker from the speech-recognition server which recognizes the transmitted speech data, and accumulates the received statistical variable, a conversion parameter generator which generates a conversion parameter about the speech of the speaker using the accumulated statistical variable and transmits the generated conversion parameter to the speech-recognition server, and a result displaying user interface which receives and displays result data when the speech-recognition server recognizes the speech data of the speaker using the transmitted conversion parameter and transmits the recognized result data.	12-24-2015
20150371639	DYNAMIC THRESHOLD FOR SPEAKER VERIFICATION - Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for a dynamic threshold for speaker verification are disclosed. In one aspect, a method includes the actions of receiving, for each of multiple utterances of a hotword, a data set including at least a speaker verification confidence score, and environmental context data. The actions further include selecting from among the data sets, a subset of the data sets that are associated with a particular environmental context. The actions further include selecting a particular data set from among the subset of data sets based on one or more selection criteria. The actions further include selecting, as a speaker verification threshold for the particular environmental context, the speaker verification confidence score. The actions further include providing the speaker verification threshold for use in performing speaker verification of utterances that are associated with the particular environmental context.	12-24-2015
20150379989	VOICE-CONTROLLED INFORMATION EXCHANGE PLATFORM, SUCH AS FOR PROVIDING INFORMATION TO SUPPLEMENT ADVERTISING - A system with an associated method for preloading advertisements by a server to a user's device is disclosed. In response to inquiries made by members of a user group, the system presents advertisements to the members and keeps a record of these presentations. Next, the system identifies those advertisements which have been frequently presented to the members, and preloads the identified advertisements on the device of a user who belongs to the user group. Subsequently, upon receiving a specific inquiry from the user's device, the system determines a response to the specific inquiry. When the determined response contains one of the preloaded advertisements, the system sends an instruction to the user's device to present the preloaded advertisement to the user.	12-31-2015
20150379990	DETECTION AND ENHANCEMENT OF MULTIPLE SPEECH SOURCES - A new method for enhancing the speech of multiple speakers in an enclosure (e.g., home, office, etc) using a microphone array is developed. In the method, the direction of arrival of speech sources and non-speech sources are determined and a beamformer-response mask to enhance and suppress the desired and non-desired acoustic sources, respectively, is constructed. To obtain a beamformer that closely approximates the mask, combinations of pre-computed beamformers are optimally combined together.	12-31-2015
20160012819	Server-Side ASR Adaptation to Speaker, Device and Noise Condition via Non-ASR Audio Transmission	01-14-2016
20160019026	DISTINGUISHING SPEECH FROM MULTIPLE USERS IN A COMPUTER INTERACTION - Speech from multiple users is distinguished. In one example, an apparatus has a sensor to determine a position of a speaker, a microphone array to receive audio from the speaker and from other simultaneous audio sources, and a processor to select a pre-determined filter based on the determined position and to apply the selected filter to the received audio to separate the audio from the speaker from the audio from the other simultaneous audio sources.	01-21-2016
20160019890	Vehicle State-Based Hands-Free Phone Noise Reduction With Learning Capability - This disclosure generally relates to a system, apparatus, and method for achieving a vehicle state-based hands free noise reduction feature. A noise reduction tool is provided for applying a noise reduction strategy on a sound input that uses machine learning to develop future noise reduction strategies, where the noise reduction strategies include analyzing vehicle operational state information and external information that are predicted to contribute to cabin noise and selecting noise reducing pre-filter options based on the analysis. The machine learning may further be supplemented by off-line training to generate a speech quality performance measure for the sound input that may be referenced by the noise reduction tool for further noise reduction strategies.	01-21-2016
20160027438	Concurrent Segmentation of Multiple Similar Vocalizations - Various implementations disclosed herein include a training module configured to concurrently segment a plurality of vocalization instances of a voiced sound pattern (VSP) as vocalized by a particular speaker, who is identifiable by a corresponding set of vocal characteristics. Aspects of various implementations are used to determine a concurrent segmentation of multiple similar instances of a VSP using a modified hierarchical agglomerative clustering (HAC) process adapted to jointly and simultaneously segment multiple similar instances of the VSP. Information produced from multiple instances of a VSP vocalized by a particular speaker characterizes how the particular speaker vocalizes the VSP and how those vocalizations may vary between instances. In turn, in some implementations, the information produced using the modified HAC process is sufficient to determine a more reliable detection (and/or matching) threshold metric(s) for detecting and matching the VSP as vocalized by the particular speaker.	01-28-2016
20160055846	METHOD AND APPARATUS FOR SPEECH RECOGNITION USING UNCERTAINTY IN NOISY ENVIRONMENT - A method for speech recognition in accordance with the present invention includes: extracting a speech feature from an inputted speech signal; estimating a noise component of the speech signal; compensating the extracted speech feature by use of the estimated noise component; transforming a given acoustic model based on the extracted speech feature, the compensated speech feature, and the noise component; and performing speech recognition by use of the compensated speech feature and the transformed acoustic model.	02-25-2016
20160063997	Multi-Sourced Noise Suppression - Systems and methods for multi-sourced noise suppression are provided. An example system may receive streams of audio data including a voice signal and noise, the voice signal including a spoken word. The streams of audio data are provided by distributed audio devices. The system can assign weights to the audio streams based at least partially on quality of the audio streams. The weights of audio streams can be determined based on signal-to-noise ratios (SNRs). The system may further process, based on the weights, the audio stream to generate cleaned speech. Each audio device comprises microphone(s) and can be associated with the Internet of Things (IoT), such that the audio devices are Internet of Things devices. The processing can include noise suppression and reduction and echo cancellation. The cleaned speech can be provided to a remote device for further processing which may include Automatic Speech Recognition (ASR).	03-03-2016
20160064000	SOUND SOURCE-SEPARATING DEVICE AND SOUND SOURCE-SEPARATING METHOD - A sound source-separating device includes a sound-collecting part, an imaging part, a sound signal-evaluating part, an image signal-evaluating part, a selection part that selects whether to estimate a sound source direction based on the first sound signal or the first image signal, a person position-estimating part that estimates a sound source direction using the first image signal, a sound source direction-estimating part that estimates a sound source direction, a sound source-separating part that extracts a second sound signal corresponding to the sound source direction from the first sound signal, an image-extracting part that extracts a second image signal of an area corresponding to the estimated sound source direction from the first image signal, and an image-combining part that changes a third image signal of an area other than the area for the second image signal and combines the third image signal with the second image signal.	03-03-2016
20160071526	ACOUSTIC SOURCE TRACKING AND SELECTION - The present disclosure relates generally to improving acoustic source tracking and selection and, more particularly, to techniques for acoustic source tracking and selection using motion or position information. Embodiments of the present disclosure include systems designed to select and track acoustic sources. In one embodiment, the system may be realized as an integrated circuit including a microphone array, motion sensing circuitry, position sensing circuitry, analog-to-digital converter (ADC) circuitry configured to convert analog audio signals from the microphone array into digital audio signals for further processing, and a digital signal processor (DSP) or other circuitry for processing the digital audio signals based on motion data and other sensor data. Sensor data may be correlated to the analog or digital audio signals to improve source separation or other audio processing.	03-10-2016
20160071529	SIGNAL PROCESSING APPARATUS, SIGNAL PROCESSING METHOD, SIGNAL PROCESSING PROGRAM - This invention provides a signal processing apparatus for improving the speech determination accuracy in an input sound. The signal processing apparatus includes a transformer that transforms an input signal into an amplitude component signal in a frequency domain, a calculator that calculates a norm of a change in the amplitude component signal in a frequency direction, an accumulator that accumulates the norm of the change in the amplitude component signal in the frequency direction calculated by the calculator, and an analyzer that analyzes speech in the input signal in accordance with an accumulated value of the norm of the change in the amplitude component signal in the frequency direction calculated by the accumulator.	03-10-2016
20160078884	METHOD AND BACKGROUND ESTIMATOR FOR VOICE ACTIVITY DETECTION - The present invention relates to a method and a background estimator in voice activity detector for updating a background noise estimate for an input signal. The input signal for a current frame is received and it is determined whether the current frame of the input signal comprises non-noise. Further, an additional determination is performed whether the current frame of the non-noise input comprises noise by analyzing characteristics at least related to correlation and energy level of the input signal, and background noise estimate is updated if it is determined that the current frame comprises noise.	03-17-2016
20160086602	SOUND SIGNAL PROCESSING METHOD, AND SOUND SIGNAL PROCESSING APPARATUS AND VEHICLE EQUIPPED WITH THE APPARATUS - A sound signal processing method, the sound signal processing apparatus and the vehicle equipped with the apparatus, in which the sound signal processing apparatus includes a spatial filtering unit configured to obtain a filtered signal including a target signal by a spatial filtering by applying a spatial filter to an input signal, and a mask application unit configured to obtain an output signal by applying a mask to the filtered signal. The mask may be obtained by using a spatial selectivity between the target signal and noise of the target signal.	03-24-2016
20160098989	SYSTEM AND METHOD FOR PROCESSING AN AUDIO SIGNAL CAPTURED FROM A MICROPHONE - A system and method for processing an audio signal captured from a microphone may reproduce a known audio signal with an audio transducer into an acoustic space. The known audio signal may include content from one or more audio sources. A microphone audio signal may be captured from the acoustic space where the microphone audio signal comprises the known audio signal and one or more unknown audio signals. Processing control information may be accessed. The known audio signal may be reduced in the microphone audio signal responsive to the processing control information where the processing control information indicates one or more characteristics of a downstream audio processor that processes the microphone audio signal.	04-07-2016
20160099008	HEARING DEVICE COMPRISING A LOW-LATENCY SOUND SOURCE SEPARATION UNIT - The application relates to a hearing device comprising a) an input unit for delivering a time varying electric input signal representing an audio signal comprising at least two sound sources, b) a cyclic analysis buffer unit of length A adapted for storing the last A audio samples, c) a cyclic synthesis buffer unit of length L, where L is smaller than A, adapted for storing the last L audio samples, which are intended to be separated in individual sound sources, d) a database having stored recorded sound examples from said at least two sound sources, each entry in the database being termed an atom, the atoms originating from audio samples from first and second buffers corresponding in size to said synthesis and analysis buffer units, where for each atom, the audio samples from the first buffer overlaps with the audio samples from the second buffer, and where atoms originating from the first buffer constitute a reconstruction dictionary, and where atoms originating from the second buffer constitute an analysis dictionary. The application further relates to a method of separating audio sources, and e) a sound source separation unit for separating said electric input signal to provide separated signals representing said at least two sound sources, the sound source separation unit being configured to determine the most optimal representation (W) of the last A samples given the atoms in the analysis dictionary of the database, and to generate said at least two sound sources by combining atoms in the reconstruction dictionary of the database using the optimal representation (W). The invention may e.g. be used for hearing devices, e.g. hearing aids, headsets, ear phones, active ear protection systems, handsfree telephone systems, mobile telephones, teleconferencing systems, public address systems, classroom amplification systems, etc.	04-07-2016
20160118042	SELECTIVE NOISE SUPPRESSION DURING AUTOMATIC SPEECH RECOGNITION - An automatic speech recognition engine and a method of using the engine is described. The method pertains to front-end processing an audio signal and includes the steps of: identifying a plurality of voiced-frames of the audio signal; determining that one or more of the plurality of voiced-frames have a signal-to-noise (SNR) value greater than a first predetermined threshold; and based on the determination, bypassing noise suppression for the one or more of the plurality of voiced-frames.	04-28-2016
20160118046	Location-Based Conversational Understanding - Location-based conversational understanding may be provided. Upon receiving a query from a user, an environmental context associated with the query may be generated. The query may be interpreted according to the environmental context. The interpreted query may be executed and at least one result associated with the query may be provided to the user.	04-28-2016
20160118063	DEEP TAGGING BACKGROUND NOISES - In a method for deep tagging a recording, a computer records audio comprising speech from one or more people. The computer detects a non-speech sound within the audio. The computer determines that the non-speech sound corresponds to a type of sound, and in response, associates a descriptive term with a time of occurrence of the non-speech sound within the recorded audio to form a searchable tag. The computer stores the searchable tag as metadata of the recorded audio.	04-28-2016
20160133252	VOICE RECOGNITION DEVICE AND METHOD IN VEHICLE - A voice recognition system in a vehicle includes: a first microphone mounted in the vehicle that collects voice data of an occupant of the vehicle; a second microphone provided in a mobile device of the occupant that collects voice data of the occupant; and a voice recognition device connected to the mobile device through local wireless communication including a noise elimination portion eliminating noise in the voice data collected by the first microphone or the second microphone and a voice recognition portion performing voice recognition using the voice data from which noise is eliminated by the noise elimination portion.	05-12-2016
20160133269	SYSTEM AND METHOD FOR IMPROVING NOISE SUPPRESSION FOR AUTOMATIC SPEECH RECOGNITION - Method for improving noise suppression for ASR starts with a microphone receiving an audio signal including speech signal and noise signal. In each frame for frequency band of audio signal, a noise estimator detects ambient noise level and generates noise estimate value based on estimated ambient noise level, variable noise suppression target controller generates suppression target value using noise estimate value and logistic function, a gain value calculator generates a gain value based on suppression target value and noise estimate value, and combiner enhances the audio signal by the gain value to generate a clean audio signal in each frame for all frequency bands. Logistic function models desired noise suppression level that varies based on ambient noise level. Variable level of noise suppression includes low attenuation for low noise levels and progressively higher attenuation for higher noise level. Other embodiments are also described.	05-12-2016
20160140954	SPEECH RECOGNITION SYSTEM AND SPEECH RECOGNITION METHOD - A speech recognition system includes a collector for collecting speech data of a speaker, an articulation pattern classifier for extracting feature points of the speech data of the speaker and selecting an articulation pattern model corresponding to the feature points, a parameter tuner for tuning a parameter which is a reference for recognizing a speech command by using the selected articulation pattern model, and a speech recognition engine for recognizing the speech command of the speaker based on the tuned parameter.	05-19-2016
20160148614	SPEECH RECOGNITION SYSTEM AND SPEECH RECOGNITION METHOD - A speech recognition system includes a transfer function storage storing a vehicle transfer function, which represents an acoustic environment in a vehicle and frequency response characteristic of a microphone; a signal-to-noise ratio (SNR) estimator estimating an SNR of an input signal received from the microphone; a speech section determiner determining a speech section to which the vehicle transfer function is applied based on the SNR; a frequency pattern extractor extracting a feature pattern of the speech signal of which the frequency distortion is compensated; and a speech recognition engine recognizing a speech command by using the feature pattern.	05-26-2016
20160155441	Computer Implemented System and Method for Identifying Significant Speech Frames Within Speech Signals	06-02-2016
20160171976	VOICE WAKEUP DETECTING DEVICE WITH DIGITAL MICROPHONE AND ASSOCIATED METHOD	06-16-2016
20160180843	SYSTEM AND METHOD OF USING NEURAL TRANSFORMS OF ROBUST AUDIO FEATURES FOR SPEECH PROCESSING	06-23-2016
20160189716	SPEECH RECOGNITION WAKE-UP OF A HANDHELD PORTABLE ELECTRONIC DEVICE - A system and method for parallel speech recognition processing of multiple audio signals produced by multiple microphones in a handheld portable electronic device. In one embodiment, a primary processor transitions to a power-saving mode while an auxiliary processor remains active. The auxiliary processor then monitors the speech of a user of the device to detect a wake-up command by speech recognition processing the audio signals in parallel. When the auxiliary processor detects the command it then signals the primary processor to transition to active mode. The auxiliary processor may also identify to the primary processor which microphone resulted in the command being recognized with the highest confidence. Other embodiments are also described.	06-30-2016
20160189730	SPEECH SEPARATION METHOD AND SYSTEM - An example of the present invention discloses a speech separation method and a system, the method comprises: receiving a mixture speech signal to be separated; extracting a speech feature of the mixture speech signal; inputting the extracted speech feature of the mixture speech signal into a regression model for speech separation, obtaining an estimated speech feature of a target speech signal; synthesizing to obtain the target speech signal according to the estimated speech feature. Speech separation effect can be improved effectively using the present invention.	06-30-2016
20160203833	Voice Activity Detection Method and Device	07-14-2016
20160253994	SYSTEM AND METHOD FOR CALIBRATING A SPEECH RECOGNITION SYSTEM TO AN OPERATING ENVIRONMENT	09-01-2016
20160253995	VOICE RECOGNITION METHOD, VOICE RECOGNITION DEVICE, AND ELECTRONIC DEVICE	09-01-2016
20160379662	METHOD, APPARATUS AND SERVER FOR PROCESSING NOISY SPEECH - According to an embodiment, a power spectrum iteration factor is determined according to a noisy speech and a background noise, and a moving average power spectrum of the speech is obtained according to the power spectrum iteration factor. A server is able to trace the noisy speech according to the power spectrum iteration factor.	12-29-2016
20160379670	METHOD FOR DETECTING AUDIO SIGNAL AND APPARATUS - Embodiments disclosed herein provide a method for detecting an audio signal and an apparatus, where the method includes determining an input audio signal as a to-be-determined audio signal; determining an enhanced segmental signal-to-noise ratio (SSNR) of the audio signal, where the enhanced SSNR is greater than a reference SSNR; and comparing the enhanced SSNR with a voice activity detection (VAD) decision threshold to determine whether the audio signal is an active signal. According to the method and the apparatus provided in the embodiments, an active voice and an inactive voice can be accurately distinguished.	12-29-2016
20170236528	AUDIO PROCESSING CIRCUIT AND METHOD FOR REDUCING NOISE IN AN AUDIO SIGNAL	08-17-2017
20190146747	ROBUST VOICE ACTIVITY DETECTOR SYSTEM FOR USE WITH AN EARPHONE	05-16-2019
20190147870	INFORMATION PROCESSING APPARATUS AND INFORMATION PROCESSING METHOD	05-16-2019
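
Illustration for entry 20160118042 (SELECTIVE NOISE SUPPRESSION DURING AUTOMATIC SPEECH RECOGNITION): the abstract describes identifying voiced frames, checking whether their SNR exceeds a first predetermined threshold, and bypassing noise suppression for those frames. The following is a minimal Python sketch of that gating idea only, not the patented implementation. The function names, the per-frame power-based SNR estimate, the fixed noise power, and the choice to suppress all non-bypassed frames are assumptions made for the example; the abstract does not specify any of them.

```python
import math


def frame_snr_db(frame_power, noise_power):
    """SNR of one frame in dB, given mean frame power and a noise power estimate."""
    return 10.0 * math.log10(frame_power / noise_power)


def selective_suppress(frames, noise_power, snr_threshold_db, suppress):
    """Apply `suppress` to every frame except voiced frames with high SNR.

    frames           -- list of (samples, is_voiced) pairs (hypothetical layout)
    noise_power      -- assumed-known noise power estimate (must be > 0)
    snr_threshold_db -- the "first predetermined threshold", in dB
    suppress         -- any noise-suppression callable: samples -> samples
    """
    out = []
    for samples, is_voiced in frames:
        # Mean power of the frame.
        power = sum(x * x for x in samples) / len(samples)
        high_snr = power > 0 and frame_snr_db(power, noise_power) > snr_threshold_db
        if is_voiced and high_snr:
            out.append(list(samples))      # bypass: frame passes through untouched
        else:
            out.append(suppress(samples))  # all other frames are suppressed
    return out


# Stand-in suppressor for the example: plain attenuation by half.
def halve(samples):
    return [0.5 * x for x in samples]


frames = [
    ([1.0, -1.0], True),   # voiced, about 20 dB SNR: bypassed
    ([0.1, -0.1], True),   # voiced, about 0 dB SNR: suppressed
    ([1.0, -1.0], False),  # not voiced: suppressed
]
result = selective_suppress(frames, noise_power=0.01, snr_threshold_db=10.0, suppress=halve)
```

Only the first frame is returned unchanged here; the other two go through the stand-in suppressor. A real front end would estimate the noise power adaptively and use an actual voicing detector, both of which are outside the scope of this sketch.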
