Patent application number | Description | Published |
20110268315 | Scalable Media Fingerprint Extraction - Derivation of a fingerprint includes generating feature matrices based on one or more training images, generating projection matrices based on the feature matrices in a training process, and deriving a fingerprint for one or more images by, at least in part, projecting a feature matrix based on the one or more images onto the projection matrices generated in the training process. | 11-03-2011 |
20120046772 | Low Complexity Auditory Event Boundary Detection - An auditory event boundary detector employs down-sampling of the input digital audio signal without an anti-aliasing filter, resulting in a narrower bandwidth intermediate signal with aliasing. Spectral changes of that intermediate signal, indicating event boundaries, may be detected using an adaptive filter to track a linear predictive model of the samples of the intermediate signal. Changes in the magnitude or power of the filter error correspond to changes in the spectrum of the input audio signal. The adaptive filter converges at a rate consistent with the duration of auditory events, so filter error magnitude or power changes indicate event boundaries. The detector is much less complex than methods employing time-to-frequency transforms for the full bandwidth of the audio signal. | 02-23-2012 |
20120237047 | NONLINEAR REFERENCE SIGNAL PROCESSING FOR ECHO SUPPRESSION - An echo suppression system and method, and a computer-readable storage medium that is configured with instructions that when executed carry out echo suppression. Each of the system and the method includes the elements of a linear echo suppressor having a reference signal path, with a nonlinearity introduced in the reference signal path to introduce energy in spectral bands. Unlike an echo canceller, the echo suppression system and method are relatively robust to errors in the introduced nonlinearity. | 09-20-2012 |
20120308038 | Sound Source Localization Apparatus and Method - Sound source localization apparatuses and methods are described. A frame amplitude difference vector is calculated based on short time frame data acquired through an array of microphones. The frame amplitude difference vector reflects differences between amplitudes captured by microphones of the array during recording the short time frame data. Similarity between the frame amplitude difference vector and each of a plurality of reference frame amplitude difference vectors is evaluated. Each of the plurality of reference frame amplitude difference vectors reflects differences between amplitudes captured by microphones of the array during recording sound from one of a plurality of candidate locations. A desired location of sound source is estimated based at least on the candidate locations and associated similarity. The sound source localization can be performed based at least on amplitude difference. | 12-06-2012 |
20130022214 | Method and System for Touch Gesture Detection in Response to Microphone Output - In some embodiments, a method for processing output of at least one microphone of a device (e.g., a headset) to identify at least one touch gesture exerted by a user on the device, including by distinguishing the gesture from input to the microphone other than a touch gesture intended by the user, and by distinguishing between a tap exerted by the user on the device and at least one dynamic gesture exerted by the user on the device, where the output of the at least one microphone is also indicative of ambient sound (e.g., voice utterances). Other embodiments are systems for detecting ambient sound (e.g., voice utterances) and touch gestures, each including a device including at least one microphone and a processor coupled and configured to process output of each microphone to identify at least one touch gesture exerted by a user on the device. | 01-24-2013 |
20130308784 | SYSTEM AND METHOD FOR WIND DETECTION AND SUPPRESSION - In one embodiment, a pickup system includes a wind detector and a wind suppressor. The wind detector has a plurality of analyzers each configured to analyze first and second input signals, and a combiner configured to combine outputs of the plurality of analyzers and issue, based on the combined outputs, a wind level indication signal indicative of wind activity. The analyzers can be selected from a group of analyzers including a spectral slope analyzer, a ratio analyzer, a coherence analyzer, a phase variance analyzer and the like. The wind suppressor has a ratio calculator configured to generate a ratio of the first and second input signals, and a mixer configured to select one of the first or second input signals and to apply to the selected input signal one of first or second panning coefficients based on the wind level indication signal and on the ratio. | 11-21-2013 |
20130322640 | POST-PROCESSING INCLUDING MEDIAN FILTERING OF NOISE SUPPRESSION GAINS - A method of post-processing raw banded gains for applying to an audio signal, an apparatus to generate banded post-processed gains, and a tangible computer-readable storage medium comprising instructions that when executed carry out the method. The raw banded gains are determined by input processing one or more input audio signals. The method includes applying post-processing to the raw banded gains to generate banded post-processed gains, generating a particular post-processed gain for a particular frequency band, including median filtering using raw gain values for frequency bands adjacent to the particular frequency band. One or more properties of the post-processing depend on classification of the one or more input audio signals. | 12-05-2013 |
20140126745 | COMBINED SUPPRESSION OF NOISE, ECHO, AND OUT-OF-LOCATION SIGNALS - A system, a method, logic embodied in a computer-readable medium, and a computer-readable medium comprising instructions that when executed carry out a method. The method processes: (a) a plurality of input signals, e.g., signals from a plurality of spatially separated microphones; and, for echo suppression, (b) one or more reference signals, e.g., signals from or to be rendered by one or more loudspeakers and that can cause echoes. The method processes the input signals and one or more reference signals to carry out in an integrated manner simultaneous noise suppression and out-of-location signal suppression, and in some versions, echo suppression. | 05-08-2014 |
20140240447 | Layered Mixing for Sound Field Conferencing System - A conferencing server ( | 08-28-2014 |
20140241528 | Sound Field Analysis System - In one embodiment, a sound field is mapped by extracting spatial angle information, diffusivity information, and optionally, sound level information. The extracted information is mapped for representation in the form of a Riemann sphere, wherein spatial angle varies longitudinally, diffusivity varies latitudinally, and level varies radially along the sphere. A more generalized mapping employs mapping the spatial angle and diffusivity information onto a representative region exhibiting variations in direction of arrival that correspond to the extracted spatial information and variations in distance that correspond to the extracted diffusivity information. | 08-28-2014 |
20140278380 | Spectral and Spatial Modification of Noise Captured During Teleconferencing - In some embodiments, a method for modifying noise captured at endpoints of a teleconferencing system, including steps of capturing noise at each endpoint, and modifying the captured noise to generate modified noise having a frequency-amplitude spectrum which matches a target spectrum and a spatial property set which matches a target spatial property set. In other embodiments, a teleconferencing method including steps of: at endpoints of a teleconferencing system, determining audio frames indicative of audio captured at each endpoint, each of a subset of the frames indicative of noise but not a significant level of speech; and at each endpoint, generating modified frames indicative of modified noise having a frequency-amplitude spectrum which matches a target spectrum and a spatial property set which matches a target spatial property set, and generating encoded audio including by encoding the modified frames. Other aspects are systems configured to perform any embodiment of the method. | 09-18-2014 |
20150023514 | Method and Apparatus for Acoustic Echo Control - Embodiments of method and apparatus for acoustic echo control are described. According to the method, an echo energy-based doubletalk detection is performed to determine whether there is a doubletalk in a microphone signal with reference to a loudspeaker signal. A spectral similarity between spectra of the microphone signal and the loudspeaker signal is calculated. It is determined that there is no doubletalk in the microphone signal if the spectral similarity is higher than a threshold level. Adaption of an adaptive filter for applying acoustic echo cancellation or acoustic echo suppression on the microphone signal is enabled if it is determined that there is no doubletalk in the microphone signal through the echo energy-based doubletalk detection, or there is no doubletalk through the spectral similarity-based doubletalk detection. | 01-22-2015 |
20150030017 | VOICE COMMUNICATION METHOD AND APPARATUS AND METHOD AND APPARATUS FOR OPERATING JITTER BUFFER - Voice communication method and apparatus and method and apparatus for operating jitter buffer are described. Audio blocks are acquired in sequence. Each of the audio blocks includes one or more audio frames. Voice activity detection is performed on the audio blocks. In response to deciding voice onset for a present one of the audio blocks, a subsequence of the sequence of the acquired audio blocks is retrieved. The subsequence precedes the present audio block immediately. The subsequence has a predetermined length and non-voice is decided for each audio block in the subsequence. The present audio block and the audio blocks in the subsequence are transmitted to a receiving party. The audio blocks in the subsequence are identified as reprocessed audio blocks. In response to deciding non-voice for the present audio block, the present audio block is cached. | 01-29-2015 |
20150030180 | POST-PROCESSING GAINS FOR SIGNAL ENHANCEMENT - A method, an apparatus, and logic to post-process raw gains determined by input processing to generate post-processed gains, comprising using one or both of delta gain smoothing and decision-directed gain smoothing. The delta gain smoothing comprises applying a smoothing filter to the raw gain with a smoothing factor that depends on the gain delta: the absolute value of the difference between the raw gain for the current frame and the post-processed gain for a previous frame. The decision-directed gain smoothing comprises converting the raw gain to a signal-to-noise ratio, applying a smoothing filter with a smoothing factor to the signal-to-noise ratio to calculate a smoothed signal-to-noise ratio, and converting the smoothed signal-to-noise ratio to determine the second smoothed gain, with smoothing factor possibly dependent on the gain delta. | 01-29-2015 |
20150032446 | METHOD AND SYSTEM FOR SIGNAL TRANSMISSION CONTROL - An audio signal with a temporal sequence of blocks or frames is received or accessed. Features are determined as characterizing aggregately the sequential audio blocks/frames that have been processed recently, relative to current time. The feature determination exceeds a specificity criterion and is delayed, relative to the recently processed audio blocks/frames. Voice activity indication is detected in the audio signal. VAD is based on a decision that exceeds a preset sensitivity threshold and is computed over a brief time period, relative to blocks/frames duration, and relates to current block/frame features. The VAD and the recent feature determination are combined with state related information, which is based on a history of previous feature determinations that are compiled from multiple features, determined over a time prior to the recent feature determination time period. Decisions to commence or terminate the audio signal, or related gains, are outputted based on the combination. | 01-29-2015 |
20150032447 | Determining a Harmonicity Measure for Voice Processing - A method, an apparatus, and a computer-readable medium configured with instructions that when executed carry out the method for determining a measure of harmonicity. In one embodiment the method includes selecting candidate fundamental frequencies within a range, and for each candidate determining a mask or retrieving a pre-calculated mask that has a positive value for each frequency that contributes to harmonicity, and a negative value for each frequency that contributes to inharmonicity. A candidate harmonicity measure is calculated for each candidate fundamental by summing the product of the mask and the magnitude measure spectrum. The harmonicity measure is selected as the maximum of the candidate harmonicity measures. | 01-29-2015 |
20150049583 | Conferencing Device Self Test - A plurality of acoustic sensors in a non-anechoic environment are calibrated with the aim of removing manufacturing tolerances and degradation over time but preserving position-dependent differences between the sensors. The sensors are excited by an acoustic stimulus which has either time-dependent characteristics or finite duration. The calibration is to be based on diffuse-field excitation only, in which indirect propagation (including single or multiple reflections) dominates over any direct-path excitation. For this purpose, the calibration process considers only a non-initial portion of sensor outputs and/or of an impulse response derived therefrom. Based on these data, a frequency-dependent magnitude response function is estimated and compared with a target response function, from which a calibration function is derived. | 02-19-2015 |
20150051906 | Hierarchical Active Voice Detection - One or more audio signals are processed using a multi-stage (hierarchical) voice and/or signal activity detector (VAD/SAD). A first stage is capable of reducing the workload bandwidth by employing an inexpensive VAD/SAD processor. One or more subsequent stages may further process the audio signals from the first stage. Other implementations may include a first stage that also performs continuity preservation between last blocks of audio signal and the first blocks of audio after it is detected that relevant audio signals are resumed. In yet other implementations, the first stage may extract features from audio signals when they are presented in their coded domain, and possibly with little or no decoding of the audio signal. | 02-19-2015 |
20150078594 | System and Method of Speaker Cluster Design and Rendering - A method of outputting audio in a teleconferencing environment includes receiving audio streams, processing the audio streams according to information regarding effective spatial positions, and outputting, by at least three speakers arranged in more than one dimension, the audio streams having been processed. The information regarding the plurality of effective spatial positions corresponds to a perceived spatial scene that extends beyond the speakers in at least two dimensions. In this manner, participants in the teleconference perceive the audio from the remote participants as originating at different positions in the teleconference room. | 03-19-2015 |
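
The harmonicity measure summarized in entry 20150032447 above (a mask per candidate fundamental, a sum of mask times magnitude spectrum, then the maximum over candidates) can be illustrated with a minimal sketch. This is not the patented implementation. The function names `build_mask` and `harmonicity`, the nearest-bin assignment of harmonics, and the mask weights (+1/n_pos on harmonic bins, -1/n_neg on the rest) are all assumptions chosen only to make the idea concrete.

```python
def build_mask(f0_hz, num_bins, bin_hz):
    """Mask for one candidate fundamental (hypothetical weighting).

    Bins nearest a harmonic of f0_hz get a positive weight; all other
    bins get a negative weight. The weights are normalized so a flat
    spectrum scores zero (an assumption, not taken from the abstract).
    """
    harmonic_bins = set()
    k = 1
    while True:
        b = round(k * f0_hz / bin_hz)  # nearest bin to the k-th harmonic
        if b >= num_bins:
            break
        harmonic_bins.add(b)
        k += 1
    n_pos = len(harmonic_bins)
    n_neg = num_bins - n_pos
    if n_pos == 0 or n_neg == 0:
        return [0.0] * num_bins
    pos, neg = 1.0 / n_pos, -1.0 / n_neg
    return [pos if b in harmonic_bins else neg for b in range(num_bins)]


def harmonicity(magnitudes, bin_hz, candidates_hz):
    """Return (best_score, best_f0) over the candidate fundamentals."""
    best_score, best_f0 = float("-inf"), None
    for f0 in candidates_hz:
        mask = build_mask(f0, len(magnitudes), bin_hz)
        # Candidate measure: sum of mask times magnitude spectrum.
        score = sum(m * x for m, x in zip(mask, magnitudes))
        if score > best_score:
            best_score, best_f0 = score, f0
    return best_score, best_f0


if __name__ == "__main__":
    # Synthetic spectrum: 64 bins of 50 Hz, peaks at multiples of 200 Hz.
    mags = [1.0 if b % 4 == 0 and b > 0 else 0.0 for b in range(64)]
    print(harmonicity(mags, 50.0, [150.0, 200.0, 250.0]))
```

Since the abstract mentions retrieving pre-calculated masks, a real system would likely cache the masks per candidate instead of rebuilding them for every frame as this sketch does.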