Class / Patent application number | Description | Number of patent applications / Date published |
704208000 | Voiced or unvoiced | 43 |
20090125301 | VOICING DETECTION MODULES IN A SYSTEM FOR AUTOMATIC TRANSCRIPTION OF SUNG OR HUMMED MELODIES - The technology disclosed relates to audio signal processing. It includes a series of modules that individually are useful to solve audio signal processing problems. Among the problems addressed are buzz removal, selecting a pitch candidate among pitch candidates based on local continuity of pitch and regional octave consistency, making small adjustments in pitch, ensuring that a selected pitch is consistent with harmonic peaks, determining whether a given frame or region of frames includes a harmonic, voiced signal, extracting harmonics from voice signals, and detecting vibrato. One environment in which these modules are useful is transcribing singing or humming into a symbolic melody. Another environment that would usefully employ some of these modules is speech processing. Some of the modules, such as buzz removal, are useful in many other environments as well. | 05-14-2009 |
20090182556 | PITCH ESTIMATION AND MARKING OF A SIGNAL REPRESENTING SPEECH - Methods, systems, and machine-readable media are disclosed for processing a signal representing speech. According to one embodiment, a method of processing a signal representing speech can comprise receiving a frame of the signal representing speech, classifying the frame as a voiced frame, and parsing the voiced frame into one or more regions based on occurrence of one or more events within the voiced frame. For example, the one or more events can comprise one or more glottal pulses. The one or more regions may collectively represent less than all of the voiced frame. | 07-16-2009 |
20100017202 | Method and apparatus for determining coding mode - Provided is a method and apparatus for determining a signal coding mode. The signal coding mode may be determined or changed according to whether a current frame corresponds to a silence period and by using a history of speech or music presence possibilities. | 01-21-2010 |
20100017203 | AUTOMATIC LEVEL CONTROL OF SPEECH SIGNALS - A method and apparatus for processing audio signals. The method includes receiving an audio signal as a sequence of digital samples, said audio signal containing a speech portion and a non-speech portion, dividing said sequence of digital samples into a sequence of sub-frames, selecting a set of sub-frames from said sequence of sub-frames, said set including a current sub-frame, determining whether a difference of peak values for any pair of sub-frames is greater than a pre-determined threshold, wherein said pair of sub-frames are contained in said set of sub-frames, and concluding that said current sub-frame represents said speech portion if said difference of peak values exceeds said pre-determined threshold. | 01-21-2010 |
20100088089 | Speech Synthesizer - Synthesizing a set of digital speech samples corresponding to a selected voicing state includes dividing speech model parameters into frames, with a frame of speech model parameters including pitch information, voicing information determining the voicing state in one or more frequency regions, and spectral information. First and second digital filters are computed using, respectively, first and second frames of speech model parameters, with the frequency responses of the digital filters corresponding to the spectral information in frequency regions for which the voicing state equals the selected voicing state. A set of pulse locations are determined, and sets of first and second signal samples are produced using the pulse locations and, respectively, the first and second digital filters. Finally, the sets of first and second signal samples are combined to produce a set of digital speech samples corresponding to the selected voicing state. | 04-08-2010 |
20100094620 | Voice Transcoder - First encoded voice bits are transcoded into second encoded voice bits by dividing the first encoded voice bits into one or more received frames, with each received frame containing multiple ones of the first encoded voice bits. First parameter bits for at least one of the received frames are generated by applying error control decoding to one or more of the encoded voice bits contained in the received frame, speech parameters are computed from the first parameter bits, and the speech parameters are quantized to produce second parameter bits. Finally, a transmission frame is formed by applying error control encoding to one or more of the second parameter bits, and the transmission frame is included in the second encoded voice bits. | 04-15-2010 |
20100125452 | PITCH RANGE REFINEMENT - A method of refining a pitch period estimation of a signal, the method comprising: for each of a plurality of portions of the signal, scanning over a predefined range of time offsets to find an estimate of the pitch period of the portion within the predefined range of time offsets; identifying the average pitch period of the estimated pitch periods of the portions; determining a refined range of time offsets in dependence on the average pitch period, the refined range of time offsets being narrower than the predefined range of time offsets; and for a subsequent portion of the signal, scanning over the refined range of time offsets to find an estimate of the pitch period of the subsequent portion. | 05-20-2010 |
20100145688 | Method and apparatus for encoding/decoding speech signal using coding mode - An apparatus and a method to encode and decode a speech signal using an encoding mode are provided. An encoding apparatus may select an encoding mode of a frame included in an input speech signal, and encode a frame having an unvoiced mode for an unvoiced speech as the selected encoding mode. | 06-10-2010 |
20100250246 | SPEECH SIGNAL EVALUATION APPARATUS, STORAGE MEDIUM STORING SPEECH SIGNAL EVALUATION PROGRAM, AND SPEECH SIGNAL EVALUATION METHOD - A speech signal evaluation apparatus includes: an acquisition unit that acquires, as a first frame, a speech signal of a specified length from speech signals; a first detection unit that detects, on the basis of a speech condition, whether the first frame is voiced or unvoiced; a variation calculation unit that, when the first frame is unvoiced, calculates a variation in a spectrum associated with the first frame on the basis of a spectrum of the first frame and a spectrum of a second frame that is unvoiced and precedes the first frame in time; and a second detection unit that detects, on the basis of a non-stationary condition based on the variation in spectrum, whether the variation of the first frame satisfies the non-stationary condition. | 09-30-2010 |
20110029304 | HYBRID INSTANTANEOUS/DIFFERENTIAL PITCH PERIOD CODING - A hybrid instantaneous/differential encoding technique is described herein that may be used to reduce the bit rate required to encode a pitch period associated with a segment of a speech signal in a manner that will result in relatively little or no degradation of a decoded speech signal generated using the encoded pitch period. The hybrid instantaneous/differential encoding technique is advantageously applicable to any speech codec that encodes a pitch period associated with a segment of a speech signal. | 02-03-2011 |
20110035213 | Method and Device for Sound Activity Detection and Sound Signal Classification - A device and method for estimating a tonality of a sound signal comprise: calculating a current residual spectrum of the sound signal; detecting peaks in the current residual spectrum; calculating a correlation map between the current residual spectrum and a previous residual spectrum for each detected peak; and calculating a long-term correlation map based on the calculated correlation map, the long-term correlation map being indicative of a tonality in the sound signal. | 02-10-2011 |
20110099006 | AUTOMATED AND ENHANCED NOTE TAKING FOR ONLINE COLLABORATIVE COMPUTING SESSIONS - In one embodiment, during participation in an online collaborative computing session, a computer process associated with the session may monitor an audio stream of the session for a predefined action-inducing phrase. In response to the phrase, a subsequent segment of the session is recorded, such that a report may be generated containing any recorded segments of the session (e.g., and dynamically sent to participants of the session). | 04-28-2011 |
20110153317 | GENDER DETECTION IN MOBILE PHONES - An apparatus for wireless communications includes a processing system. The processing system is configured to receive an input sound stream of a user, split the input sound stream into a plurality of frames, classify each of the frames as one selected from the group consisting of a non-speech frame and a speech frame, determine a pitch of each of the frames in a subset of the speech frames, and identify a gender of the user from the determined pitch. To determine the pitch, the processing system is configured to filter the speech frames to compute an error signal, compute an autocorrelation of the error signal, find a maximum autocorrelation value, and set the pitch to an index of the maximum autocorrelation value. | 06-23-2011 |
20110153318 | Method and system for speech bandwidth extension - There is provided a method or a device for extending a bandwidth of a first band speech signal to generate a second band speech signal wider than the first band speech signal and including the first band speech signal. The method comprises receiving a segment of the first band speech signal having a low cut off frequency and a high cut off frequency; determining the high cut off frequency of the segment; determining whether the segment is voiced or unvoiced; if the segment is voiced, applying a first bandwidth extension function to the segment to generate a first bandwidth extension in high frequencies; if the segment is unvoiced, applying a second bandwidth extension function to the segment to generate a second bandwidth extension in the high frequencies; using the first bandwidth extension and the second bandwidth extension to extend the first band speech signal beyond the high cut off frequency. | 06-23-2011 |
20110218801 | METHOD FOR ERROR CONCEALMENT IN THE TRANSMISSION OF SPEECH DATA WITH ERRORS - The invention relates to a method for outputting a speech signal. Speech signal frames are received and are used in a predetermined sequence in order to produce a speech signal to be output. If one speech signal frame to be received is not received, then a substitute speech signal frame is used in its place, which is produced as a function of a previously received speech signal frame. According to the invention, when the previously received speech signal frame contains a voiceless speech signal, the substitute speech signal frame is produced by means of a noise signal. | 09-08-2011 |
20110257965 | INTEROPERABLE VOCODER - Encoding a sequence of digital speech samples into a bit stream includes dividing the digital speech samples into one or more frames and computing a set of model parameters for the frames. The set of model parameters includes at least a first parameter conveying pitch information. The voicing state of a frame is determined and the first parameter conveying pitch information is modified to designate the determined voicing state of the frame, if the determined voicing state of the frame is equal to one of a set of reserved voicing states. The model parameters are quantized to generate quantizer bits which are used to produce the bit stream. | 10-20-2011 |
20110264447 | SYSTEMS, METHODS, AND APPARATUS FOR SPEECH FEATURE DETECTION - Implementations and applications are disclosed for detection of a transition in a voice activity state of an audio signal, based on a change in energy that is consistent in time across a range of frequencies of the signal. | 10-27-2011 |
20110282658 | Method and Apparatus for Audio Source Separation - The present invention relates to co-channel audio source separation. In one embodiment a first frequency-related representation of plural regions of the acoustic signal is prepared over time, and a two-dimensional transform of plural two-dimensional localized regions of the first frequency-related representation, each less than an entire frequency range of the first frequency related representation, is obtained to provide a two-dimensional compressed frequency-related representation with respect to each two dimensional localized region. For each of the plural regions, at least one pitch is identified. The pitch from the plural regions is processed to provide multiple pitch estimates over time. In another embodiment, a mixed acoustic signal is processed by localizing multiple time-frequency regions of a spectrogram of the mixed acoustic signal to obtain one or more acoustic properties. A separate pitch estimate of each of the multiple acoustic signals at a time point are provided by combining the one or more acoustic properties. At least one of the multiple acoustic signals is recovered using the separate pitch estimates. | 11-17-2011 |
20120022859 | AUTOMATIC MARKING METHOD FOR KARAOKE VOCAL ACCOMPANIMENT - An automatic marking method for Karaoke vocal accompaniment is provided. In the method, pitch, beat position and volume of a singer are compared with the original pitch, beat position and volume of the theme of a song to generate a score of pitch, a score of beat and a score of emotion respectively, so as to obtain a weighted total score in a weighted marking method. By using the method, the pitch, beat position and volume error of each section of the song sung by the singer can be exactly worked out, and a pitch curve and a volume curve can be displayed, so that the singer can learn which part is sung incorrectly and which part needs to be enhanced. The present invention also has the advantages of dual effects of teaching and entertainment, high practicability and technical advancement. | 01-26-2012 |
20120158401 | MUSIC DETECTION USING SPECTRAL PEAK ANALYSIS - In one embodiment, a music detection (MD) module accumulates sets of one or more frames and performs FFT processing on each set to recover a set of coefficients, each corresponding to a different frequency k. For each frame, the module identifies candidate musical tones by searching for peak values in the set of coefficients. If a coefficient corresponds to a peak, then a variable TONE[k] corresponding to the coefficient is set equal to one. Otherwise, the variable is set equal to zero. For each variable TONE[k] having a value of one, a corresponding accumulator A[k] is increased. Candidate musical tones that are short in duration are filtered out by comparing each accumulator A[k] to a minimum duration threshold. A determination is made as to whether or not music is present based on a number of candidate musical tones and a sum of candidate musical tone durations using a state machine. | 06-21-2012 |
20120239389 | AUDIO SIGNAL PROCESSING METHOD AND DEVICE - Disclosed is an audio signal processing method comprising the steps of: receiving an audio signal containing current frame data; generating a first temporary output signal for the current frame when an error occurs in the current frame data, by carrying out frame error concealment with respect to the current frame data a random codebook; generating a parameter by carrying out one or more of short-term prediction, long-term prediction and a fixed codebook search based on the first temporary output signal; and memory updating the parameter for the next frame; wherein the parameter comprises one or more of pitch gain, pitch delay, fixed codebook gain and a fixed codebook. | 09-20-2012 |
20130024192 | ATMOSPHERE EXPRESSION WORD SELECTION SYSTEM, ATMOSPHERE EXPRESSION WORD SELECTION METHOD, AND PROGRAM - Disclosed is an information display system provided with: a signal analyzing unit which analyzes the audio signals obtained from a predetermined location and which generates ambient sound information regarding the sound generated at the predetermined location; and an ambient expression selection unit which selects an ambient expression which expresses the content of what a person is feeling from the sound generated at the predetermined location on the basis of the ambient sound information. | 01-24-2013 |
20130041658 | SYSTEM AND METHOD OF PROCESSING A SOUND SIGNAL INCLUDING TRANSFORMING THE SOUND SIGNAL INTO A FREQUENCY-CHIRP DOMAIN - A system and method may be configured to process an audio signal. The system and method may track pitch, chirp rate, and/or harmonic envelope across the audio signal, may reconstruct sound represented in the audio signal, and/or may segment or classify the audio signal. A transform may be performed on the audio signal to place the audio signal in a frequency chirp domain that enhances the sound parameter tracking, reconstruction, and/or classification. | 02-14-2013 |
20130144612 | Pitch Period Segmentation of Speech Signals - A method for automatic segmentation of pitch periods of speech waveforms takes a speech waveform, a corresponding fundamental frequency contour of the speech waveform, that can be computed by some standard fundamental frequency detection algorithm, and optionally the voicing information of the speech waveform, that can be computed by some standard voicing detection algorithm, as inputs and calculates the corresponding pitch period boundaries of the speech waveform as outputs by iteratively (1) calculating the Fast Fourier Transform (FFT) of a speech segment having a length of approximately two periods, the period being calculated as the inverse of the mean fundamental frequency associated with these speech segments; (2) placing the pitch period boundary either at the position where the phase of the third FFT coefficient is −180 degrees, at the position where the correlation coefficient of two speech segments shifted within the two-period-long analysis frame is maximized, or at a position calculated as a combination of both measures; and (3) repeatedly shifting the analysis frame one period length further until the end of the speech waveform is reached. | 06-06-2013 |
20130144613 | Half-Rate Vocoder - Encoding a sequence of digital speech samples into a bit stream includes dividing the digital speech samples into one or more frames, computing model parameters for a frame, and quantizing the model parameters to produce pitch bits conveying pitch information, voicing bits conveying voicing information, and gain bits conveying signal level information. One or more of the pitch bits are combined with one or more of the voicing bits and one or more of the gain bits to create a first parameter codeword that is encoded with an error control code to produce a first FEC codeword that is included in a bit stream for the frame. The process may be reversed to decode the bit stream. | 06-06-2013 |
20130262098 | VOICE ANALYSIS APPARATUS, VOICE SYNTHESIS APPARATUS, VOICE ANALYSIS SYNTHESIS SYSTEM - A speech analysis apparatus is provided. An F0 extraction part extracts a pitch value from speech information. A spectrum extraction part extracts spectrum information from the speech information. An MVF extraction part extracts a maximum voiced frequency and allows boundary information for respectively filtering a harmonic component and a non-harmonic component to be obtained. According to the speech analysis apparatus, speech synthesis apparatus, and speech analysis synthesis system of the present invention, speech that is closer to the original voice and is more natural may be synthesized. Also, speech may be represented with less data capacity. | 10-03-2013 |
20130262099 | APPARATUS AND METHOD FOR APPLYING PITCH FEATURES IN AUTOMATIC SPEECH RECOGNITION - According to one embodiment, an apparatus for applying pitch features in automatic speech recognition is provided. The apparatus includes a distribution evaluation module, normalization module, and random value adjusting module. The distribution evaluation module evaluates the global distribution of pitch features of voiced frames in speech signals, and the global distribution of random values for unvoiced frames in speech signals. The normalization module normalizes the global distribution of random values for unvoiced frames based on the global distribution of pitch features of voiced frames. The random value adjusting module adjusts random values for unvoiced frames based on the normalized global distribution, so that the adjusted random values can be assigned to unvoiced frames in speech signals as pitch features of the unvoiced frames. | 10-03-2013 |
20130346071 | METHOD OF SIMULTANEOUSLY TRANSFORMING A PLURALITY OF VOICE SIGNALS INPUT TO A COMMUNICATIONS SYSTEM - A method of simultaneously transforming at least two input voice signals x | 12-26-2013 |
20140006017 | SYSTEMS, METHODS, APPARATUS, AND COMPUTER-READABLE MEDIA FOR GENERATING OBFUSCATED SPEECH SIGNAL | 01-02-2014 |
20140006018 | VOICE PROCESSING APPARATUS | 01-02-2014 |
20140046658 | FRAME BASED AUDIO SIGNAL CLASSIFICATION - An audio classifier for frame based audio signal classification includes a feature extractor configured to determine, for each of a predetermined number of consecutive frames, feature measures representing at least the following features: auto correlation, frame signal energy, inter-frame signal energy variation. A feature measure comparator is configured to compare each determined feature measure to at least one corresponding predetermined feature interval. A frame classifier is configured to calculate, for each feature interval, a fraction measure representing the total number of corresponding feature measures that fall within the feature interval, and to classify the latest of the consecutive frames as speech if each fraction measure lies within a corresponding fraction interval, and as non-speech otherwise. | 02-13-2014 |
20140081629 | Audio Classification Based on Perceptual Quality for Low or Medium Bit Rates - The quality of encoded signals can be improved by reclassifying AUDIO signals carrying non-speech data as VOICE signals when periodicity parameters of the signal satisfy one or more criteria. In some embodiments, only low or medium bit rate signals are considered for re-classification. The periodicity parameters can include any characteristic or set of characteristics indicative of periodicity. For example, the periodicity parameter may include pitch differences between subframes in the audio signal, a normalized pitch correlation for one or more subframes, an average normalized pitch correlation for the audio signal, or combinations thereof. Audio signals which are re-classified as VOICED signals may be encoded in the time-domain, while audio signals that remain classified as AUDIO signals may be encoded in the frequency-domain. | 03-20-2014 |
20140114653 | PITCH ESTIMATOR - An apparatus comprising an analysis window definer configured to define at least one analysis window for a first audio signal, wherein the at least one analysis window definer is configured to be dependent on the first audio signal and a pitch estimator configured to determine a first pitch estimate for the first audio signal, wherein the pitch estimator is dependent on the first audio signal sample values within the analysis window. | 04-24-2014 |
20140142932 | Method for Producing Audio File and Terminal Device - Embodiments of the present invention provide a method for producing an audio file and a terminal device. The method includes recording a user's voice to obtain audio information; generating a score curve according to the audio information and displaying the score curve; receiving a polishing instruction sent by the user by operating the score curve; adjusting the audio information according to the polishing instruction; and generating an audio file. The technical solutions provided in the present invention enable the user to create a song of himself or herself on the terminal device, thereby improving functions of the terminal device and meeting an application requirement of the user. | 05-22-2014 |
20140180683 | DYNAMICALLY ADAPTED PITCH CORRECTION BASED ON AUDIO INPUT - Systems and methods for adjusting pitch of an audio signal include detecting input notes in the audio signal, mapping the input notes to corresponding output notes, each output note having an associated upper note boundary and lower note boundary, and modifying at least one of the upper note boundary and the lower note boundary of at least one output note in response to previously received input notes. Pitch of the input notes may be shifted to match an associated pitch of corresponding output notes. Delay of the pitch shifting process may be dynamically adjusted based on detected stability of the input notes. | 06-26-2014 |
20140222421 | STREAMING ENCODER, PROSODY INFORMATION ENCODING DEVICE, PROSODY-ANALYZING DEVICE, AND DEVICE AND METHOD FOR SPEECH SYNTHESIZING - A speech-synthesizing device includes a hierarchical prosodic module, a prosody-analyzing device, and a prosody-synthesizing unit. The hierarchical prosodic module generates at least a first hierarchical prosodic model. The prosody-analyzing device receives a low-level linguistic feature, a high-level linguistic feature and a first prosodic feature, and generates at least a prosodic tag based on the low-level linguistic feature, the high-level linguistic feature, the first prosodic feature and the first hierarchical prosodic model. The prosody-synthesizing unit synthesizes a second prosodic feature based on the hierarchical prosodic module, the low-level linguistic feature and the prosodic tag. | 08-07-2014 |
20140297272 | INTELLIGENT INTERACTIVE VOICE COMMUNICATION SYSTEM AND METHOD - The present invention generally relates to intelligent voice communication systems. Specifically, this invention relates to systems and methods for providing intelligent interactive voice communication services to users of a telephony means. Preferred embodiments of the invention are directed to providing interactive voice communication services in the form of intelligent and interactive automated prank calling services. | 10-02-2014 |
20140343934 | Method, Apparatus, and Speech Synthesis System for Classifying Unvoiced and Voiced Sound - A method, apparatus, and speech synthesis system are disclosed for classifying unvoiced and voiced sound. The method includes: setting an unvoiced and voiced sound classification question set; using speech training data and the unvoiced and voiced sound classification question set for training a sound classification model of a binary decision tree structure, where the binary decision tree structure includes non-leaf nodes and leaf nodes, the non-leaf nodes represent questions in the unvoiced and voiced sound classification question set, and the leaf nodes represent unvoiced and voiced sound classification results; and receiving speech test data, and using the trained sound classification model to decide whether the speech test data is unvoiced sound or voiced sound. | 11-20-2014 |
20150032445 | NOISE ESTIMATION APPARATUS, NOISE ESTIMATION METHOD, NOISE ESTIMATION PROGRAM, AND RECORDING MEDIUM - A noise estimation apparatus which estimates a non-stationary noise component on the basis of the likelihood maximization criterion is provided. The noise estimation apparatus obtains the variance of a noise signal that causes a large value to be obtained by weighted addition of the sums each of which is obtained by adding the product of the log likelihood of a model of an observed signal expressed by a Gaussian distribution in a speech segment and a speech posterior probability in each frame, and the product of the log likelihood of a model of an observed signal expressed by a Gaussian distribution in a non-speech segment and a non-speech posterior probability in each frame, by using complex spectra of a plurality of observed signals up to the current frame. | 01-29-2015 |
20160118054 | METHOD FOR RECOVERING LOST FRAMES - A method for recovering lost frames in a media bitstream is provided. When a frame loss event occurs, a decoder obtains a synthesized high frequency band signal of a current lost frame, and recovery information related to the current lost frame. The decoder determines a global gain gradient of the current lost frame, and further determines a global gain of the current lost frame according to the global gain gradient and a global gain of each frame in previous M frames of the current lost frame. A high frequency band signal of the current lost frame is obtained by adjusting the synthesized high frequency band signal of the current lost frame according to the global gain and a subframe gain of the current lost frame. The process enables natural and smooth transitions of the high frequency band signal between the frames, and attenuates noises in the high frequency band signal. | 04-28-2016 |
20160155456 | Audio Signal Classification Method and Apparatus | 06-02-2016 |
20160379673 | SPEECH SECTION DETECTION DEVICE, VOICE PROCESSING SYSTEM, SPEECH SECTION DETECTION METHOD, AND COMPUTER PROGRAM PRODUCT - According to an embodiment, a speech section detection device includes a reception unit and a detection unit. The reception unit is configured to receive, from an external device, a first voice signal that is a signal in which likelihood indicating a probability of speech is equal to or more than a first threshold. The detection unit is configured to detect, from the first voice signal, a second voice signal that is a signal of a section in which the likelihood is equal to or more than a second threshold that is larger than the first threshold. | 12-29-2016 |
20170236526 | SOUND QUALITY IMPROVING METHOD AND DEVICE, SOUND DECODING METHOD AND DEVICE, AND MULTIMEDIA DEVICE EMPLOYING SAME | 08-17-2017 |
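Several of the applications listed above rely on the same primitive: estimating a pitch period by locating the peak of an autocorrelation function over a candidate lag range (20110153317 spells out the steps explicitly: filter, compute an autocorrelation, find the maximum, take its index; 20100125452 and 20140114653 refine the lag range around such estimates). The following is a minimal illustrative sketch of that primitive in Python; the frame length, sampling rate, and the 50-400 Hz search band are assumptions chosen for the example, not parameters from any listed application.

```python
import math

def autocorr(frame, lag):
    # Unnormalized autocorrelation of the frame at a single lag.
    n = len(frame)
    return sum(frame[i] * frame[i + lag] for i in range(n - lag))

def estimate_pitch(frame, fs, fmin=50.0, fmax=400.0):
    """Return (pitch_hz, strength): the pitch whose lag maximizes the
    energy-normalized autocorrelation over the [fmin, fmax] search band."""
    lag_min = int(fs / fmax)               # shortest period considered
    lag_max = int(fs / fmin)               # longest period considered
    r0 = autocorr(frame, 0)                # frame energy, for normalization
    if r0 <= 0.0:
        return 0.0, 0.0                    # silent frame: no pitch
    best_lag, best_r = 0, -1.0
    for lag in range(lag_min, min(lag_max, len(frame) - 1) + 1):
        r = autocorr(frame, lag) / r0      # normalized correlation in [-1, 1]
        if r > best_r:
            best_r, best_lag = r, lag
    return fs / best_lag, best_r

# Usage: a pure 200 Hz tone sampled at 8 kHz should yield a 200 Hz estimate.
fs = 8000
frame = [math.sin(2 * math.pi * 200 * t / fs) for t in range(400)]
pitch, strength = estimate_pitch(frame, fs)
```

The `strength` value (the peak normalized correlation) is the kind of quantity a codec would also threshold to decide whether a frame is periodic enough to be treated as voiced at all.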
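The class theme itself, deciding whether a frame is voiced or unvoiced, is commonly approximated with two inexpensive features: short-time energy (high for voiced frames, near zero for silence) and zero-crossing rate (high for noise-like unvoiced frames). The applications above each use richer feature sets (e.g. the autocorrelation and inter-frame energy variation of 20140046658, or the decision-tree model of 20140343934), so the sketch below, with its illustrative thresholds, is a baseline rather than any listed method.

```python
import math
import random

def zero_crossing_rate(frame):
    # Fraction of adjacent sample pairs whose signs differ.
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)

def short_time_energy(frame):
    # Mean squared sample value over the frame.
    return sum(s * s for s in frame) / len(frame)

def classify_frame(frame, energy_thresh=0.01, zcr_thresh=0.25):
    """Label a frame 'voiced', 'unvoiced', or 'silence'.
    Both thresholds are illustrative assumptions, not tuned values."""
    if short_time_energy(frame) < energy_thresh:
        return "silence"
    return "unvoiced" if zero_crossing_rate(frame) > zcr_thresh else "voiced"

# Usage: a 150 Hz tone, uniform noise, and an all-zero frame at 8 kHz.
fs = 8000
voiced = [math.sin(2 * math.pi * 150 * t / fs) for t in range(320)]
random.seed(0)
unvoiced = [random.uniform(-0.5, 0.5) for _ in range(320)]
silence = [0.0] * 320
```

A periodic tone crosses zero only twice per cycle, so its zero-crossing rate stays far below that of broadband noise, which changes sign on roughly half of all sample pairs; that gap is what makes this two-feature rule workable as a first pass.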