Entries |
Document | Title - Abstract | Date (MM-DD-YYYY) |
20080208573 | Speech Signal Coding - The present invention relates to methods and apparatuses for speech signal encoding and decoding. In accordance with the invention, a discrete-time speech signal is encoded by identifying a speech element in the speech signal. If the speech element is identified for the first time, the encoder … | 08-28-2008 |
20080215316 | METHOD AND SYSTEM FOR TRIMMING AUDIO FILES - A system for automatically trimming an audio file based upon textual content associated with the audio file is provided. The source of the textual content may be an electronic document or written language text. The textual content may include predefined hints, a text mark, or an end-of-phrase punctuation mark. The system generates a trimming instruction based upon textual content corresponding to the audio file, and the audio file is trimmed based upon the trimming instruction. | 09-04-2008 |
20080221877 | User interactive apparatus and method, and computer program product - A response storage unit stores a response, a watching degree relative to a display unit, and an output form of the response to a speaker and the display unit. An extracting unit extracts a request from a speech recognition result. A response determining unit determines a response based on the extracted request. A direction detector detects a viewing direction of a user. A watching-degree determining unit determines a watching degree based on the viewing direction. An output controller obtains an output form corresponding to the response and the determined watching degree from the response storage unit, and outputs the response to the speaker and the display unit according to the obtained output form. | 09-11-2008 |
20080228472 | Audio Data Packet Format and Decoding Method thereof and Method for Correcting Mobile Communication Terminal Codec Setup Error and Mobile Communication Terminal Performance Same - Disclosed is an audio data packet format for transmitting an MPEG-4 HE-AAC frame via a voice channel of a mobile communication network, a method for decoding the audio data packet format, a method for correcting a codec setup error by identifying a codec used to encode sound source data inserted into a data field of voice slot data, based on the sequence number of the voice slot data, and correcting the codec setup error when a codec set up in a mobile communication terminal is different from the codec used to encode the sound source data, and a mobile communication terminal adapted to correct a codec setup error. | 09-18-2008 |
20080255831 | Speech Band Extension Device - A speech band extension device … | 10-16-2008 |
20080262837 | METHOD AND SYSTEM OF DYNAMICALLY ADJUSTING A SPEECH OUTPUT RATE TO MATCH A SPEECH INPUT RATE | 10-23-2008 |
20080275696 | Method of Audio Encoding - There is described a method of encoding an input signal … | 11-06-2008 |
20090006084 | LOW-COMPLEXITY FRAME ERASURE CONCEALMENT - A system is described that performs frame erasure concealment (FEC) to generate frames of an output speech signal corresponding to erased frames of an encoded bit-stream in a manner that conceals the quality-degrading effects of such erased frames. An embodiment of the invention advantageously does not introduce additional delay, has a lower state memory requirement than the FEC technique specified in G.711 Appendix I, and produces better speech quality than the FEC technique specified in G.711 Appendix I while still allowing for reduced computational complexity and code size. | 01-01-2009 |
20090043570 | METHOD FOR PROCESSING SPEECH SIGNAL DATA - Method for processing speech signal data. A speech signal is divided into frames. Each frame is characterized by a frame number T representing a unique interval of time. The speech signal is characterized by a power spectrum with respect to frame T and frequency band ω. A speech segment and a reverberation segment of the speech signal are determined. L filter coefficients W(k) (k = 1, 2, …, L) respectively corresponding to the L frames immediately preceding frame T are computed such that the L filter coefficients minimize a function Φ that is a linear combination of a sum of squares of a residual speech power in the reverberation segment and a sum of squares of a subtracted speech power in the speech segment. The computed L filter coefficients are stored within storage media of the computing apparatus. | 02-12-2009 |
20090043571 | Methods and Systems for Sample Rate Conversion - Methods and systems for sample rate conversion convert a sampled signal to a higher data rate signal. Conversion pulses are received, having a conversion rate that is higher than the sample rate of the sampled signal. Sample points are then reconstructed from the sampled signal, in real time, on either side of a conversion pulse. An interpolation is performed between the reconstructed sample points, at the time of the conversion pulse. The interpolation results are outputted in real time. The process is repeated for additional conversion pulses. The outputted interpolated amplitudes form the higher data rate signal having a data rate equal to the conversion rate. Sample rate conversion is thus performed in real time according to the higher data rate clock, rather than with fixed ratios. As a result, when the higher data rate clock is affected by, for example, jitter or other frequency variations, the higher data rate samples immediately track the lower data rate samples. This helps to ensure that the output higher data rate data tracks the lower rate data, thus providing a more accurate sample rate conversion. | 02-12-2009 |
20090063140 | ENCODING AND DECODING OF AUDIO SIGNALS USING COMPLEX-VALUED FILTER BANKS - An encoder … | 03-05-2009 |
20090070104 | ACOUSTIC COMMUNICATION SYSTEM - A number of encoders for encoding a data signal within an audio signal are provided. In some of the encoders, the audio signal is separated into a tonal part and a residual part, and the data signal is shaped based on the residual part. In other encoders, the data signal is separated into a tonal part and a residual part, and the data signal is combined with the audio signal in dependence upon the residual part. In other encoders, the rate at which the data is encoded within the audio signal is varied in dependence upon the audio signal. There are also described various decoders associated with the described encoders. | 03-12-2009 |
20090144054 | Embedded system to perform frame switching - The present patent discloses an embedded transient detection module that improves the quality of the audio encoder while requiring less computational power than existing schemes. The module uses a long frame when the input audio signal is in a steady state and a short frame when there are transients in the signal. | 06-04-2009 |
20090157396 | Voice data signal recording and retrieving - Embodiments related to recording and retrieving of voice data signals are described and depicted. | 06-18-2009 |
20090182557 | SOUND/VOICE PROCESSING APPARATUS, SOUND/VOICE PROCESSING METHOD, AND SOUND/VOICE PROCESSING PROGRAM - If a delay occurs in execution of sound/voice processing application software, and, as a result, MIC data is stored in a plurality of buffers, then a CPU identifies, based on a buffer list, a buffer in which newest MIC data is stored. The CPU reads the newest MIC data from the identified buffer and adjusts an output sound/voice level depending on an external sound/voice level, using the newest MIC data. | 07-16-2009 |
20090271183 | PRODUCING TIME UNIFORM FEATURE VECTORS - Methods, systems, and machine-readable media are disclosed for processing a signal representing speech. According to one embodiment, processing a signal representing speech can comprise receiving a frame of the signal representing speech, the frame comprising a voiced frame. One or more cords can be extracted from the voiced frame based on the occurrence of one or more events within the frame. For example, the one or more events can comprise one or more glottal pulses. The one or more cords can collectively comprise less than all of the frame. The one or more cords can be normalized on a time basis. For example, each of the one or more cords can begin with the onset of a glottal pulse and extend to a point prior to the onset of a neighboring glottal pulse but may exclude a portion of the frame prior to the onset of the neighboring glottal pulse. | 10-29-2009 |
20090276210 | STEREO AUDIO ENCODING APPARATUS, STEREO AUDIO DECODING APPARATUS, AND METHOD THEREOF - Disclosed is a stereo speech decoding device and others capable of reducing a stereo speech encoding bit rate and suppressing degradation of speech quality. In this device, a section … | 11-05-2009 |
20090287479 | SOUND FRAME LENGTH ADAPTATION - A method of producing time domain sound data (B) from sound parameters (A), the method comprising the steps of: forming first frames, each first frame containing sound parameters representing sound; forming second frames from the first frames, each second frame containing transform domain sound data derived from the sound parameters, the transform domain sound data of each second frame representing sound having a specific time domain length, and each second frame having a length corresponding with an efficient inverse transform; inversely transforming the second frames into third frames (G … | 11-19-2009 |
20090306974 | SYSTEM AND METHOD OF AN IN-BAND MODEM FOR DATA COMMUNICATIONS OVER DIGITAL WIRELESS COMMUNICATION NETWORKS - A system is provided for transmitting information through a speech codec (in-band) such as found in a wireless communication network. A modulator transforms the data into a spectrally noise-like signal based on the mapping of a shaped pulse to predetermined positions within a modulation frame, and the signal is efficiently encoded by a speech codec. A synchronization sequence provides modulation frame timing at the receiver and is detected based on analysis of a correlation peak pattern. A request/response protocol provides reliable transfer of data using message redundancy, retransmission, and/or robust modulation modes dependent on the communication channel conditions. | 12-10-2009 |
20100023322 | APPARATUS AND METHOD FOR GENERATING AUDIO SUBBAND VALUES AND APPARATUS AND METHOD FOR GENERATING TIME-DOMAIN AUDIO SAMPLES - An embodiment of an apparatus for generating audio subband values in audio subband channels includes an analysis windower for windowing a frame of time-domain audio input samples being in a time sequence extending from an early sample to a later sample using an analysis window function including a sequence of window coefficients to obtain windowed samples. The analysis window function includes a first number of window coefficients derived from a larger window function including a sequence of a larger second number of window coefficients, wherein the window coefficients of the window function are derived by an interpolation of window coefficients of the larger window function. The apparatus further includes a calculator for calculating the audio subband values using the windowed samples. | 01-28-2010 |
20100145691 | GLOBAL BOUNDARY-CENTRIC FEATURE EXTRACTION AND ASSOCIATED DISCONTINUITY METRICS - Portions from time-domain speech segments are extracted. Feature vectors that represent the portions in a vector space are created. The feature vectors incorporate phase information of the portions. A distance between the feature vectors in the vector space is determined. In one aspect, the feature vectors are created by constructing a matrix W from the portions and decomposing the matrix W. In one aspect, decomposing the matrix W comprises extracting global boundary-centric features from the portions. In one aspect, the portions include at least one pitch period. In another aspect, the portions include centered pitch periods. | 06-10-2010 |
20100153098 | DATA COMPRESSION FORMAT - An encoder for compressing a plurality of independent mono audio channels into a recording and generating a restricted set of additional parameters used to master an audio track of a storage device is described. The plurality of independent mono audio channels are constructed such that the storage device can be played using a solid state disk player so that, in a first mode, all of the plurality of independent mono audio channels are played as the recording and, in a second mode, the original channels are reconstructed using a higher sample rate. A corresponding decoder and an audio system comprising such an encoder and decoder are also described. | 06-17-2010 |
20100179806 | METHOD FOR PHASE MISMATCH CALIBRATION FOR AN ARRAY MICROPHONE AND PHASE CALIBRATION MODULE FOR THE SAME - The invention provides a phase calibration module, calibrating phase mismatch between microphone signals output by a plurality of microphones of an array microphone. In one embodiment, the phase calibration module comprises a subband filter, a delay calculation module, and a delay compensation filter. The subband filter extracts a high frequency component and a low frequency component from each of the microphone signals to obtain a plurality of high-frequency component signals and a plurality of low-frequency component signals. The delay calculation module calculates delays between the low-frequency component signals. The delay compensation filter then compensates the low-frequency component signals for phase mismatches therebetween according to the calculated delays to obtain a plurality of calibrated low-frequency component signals. | 07-15-2010 |
20100191525 | Gateway With Voice - In one aspect of the present invention, a network gateway is configured to facilitate online and offline bi-directional communication between a number of near-end data and telephony devices and far-end data termination devices via a hybrid fiber coaxial network and a cable modem termination system. The described network gateway combines a QAM receiver, a transmitter, a DOCSIS MAC, a CPU, a voice and audio processor, an Ethernet MAC, and a USB controller to provide high performance and robust operation. | 07-29-2010 |
20110099007 | NOISE ESTIMATION USING AN ADAPTIVE SMOOTHING FACTOR BASED ON A TEAGER ENERGY RATIO IN A MULTI-CHANNEL NOISE SUPPRESSION SYSTEM - Techniques are described herein that provide multi-channel noise suppression based on a Teager energy ratio. A Teager energy ratio is a ratio of an average Teager energy operator (TEO) energy of a first signal to an average TEO energy of a second signal. The average TEO energy of a signal is defined by the equation: | 04-28-2011 |
20110112830 | SYSTEM AND METHOD FOR LOW OVERHEAD VOICE AUTHENTICATION - A system and method are provided to authenticate a voice in a frequency domain. A voice in the time domain is transformed to a signal in the frequency domain. The first harmonic is set to a predetermined frequency and the other harmonic components are equalized. Similarly, the amplitude of the first harmonic is set to a predetermined amplitude, and the harmonic components are also equalized. The voice signal is then filtered. The amplitudes of each of the harmonic components are then digitized into bits to form at least part of a voice ID. In another system and method, a voice is authenticated in a time domain. The initial rise time, initial fall time, second rise time, second fall time and final oscillation time are digitized into bits to form at least part of a voice ID. The voice IDs are used to authenticate a user's voice. | 05-12-2011 |
20110153319 | Method and Apparatus to Prepare Listener-Interest-Filtered Works - An embodiment of the present invention is a method of storing a speed contour for use in playback of at least a portion of an audio or audio-visual work including: (a) generating one or more speed contours and/or average speed contours and/or democratic speed contours for the audio or audio-visual work; (b) storing the one or more speed contours and/or average speed contours and/or democratic speed contours in a database; and (c) associating retrieval information with the one or more stored contours in the database. | 06-23-2011 |
20110172994 | PROCESSING OF VOICE INPUTS - This is directed to processing voice inputs received by an electronic device while prompts are provided. In particular, this is directed to providing a sequence of prompts to a user (e.g., voice over prompts) while monitoring for a voice input. When the voice input is received, a characteristic time stamp can be identified for the voice input, and can be compared to periods or windows associated with each of the provided prompts. The electronic device can then determine that the prompt corresponding to a window that includes the characteristic time stamp was the prompt to which the user wished to apply the voice input. The device can process the voice input to extract a user instruction, and apply the instruction to the identified prompt (e.g., and perform an operation associated with the prompt). | 07-14-2011 |
20110208517 | TIME-WARPING OF AUDIO SIGNALS FOR PACKET LOSS CONCEALMENT - Packet loss concealment (PLC) systems and methods are described that use time-warping to merge a concealment signal generated to replace one or more bad frames of an audio signal with a received signal representing one or more subsequent good frames of the audio signal in a manner that avoids signal discontinuity and audible artifacts resulting therefrom. Prediction-based PLC systems and methods are also described that use time-warping to conceal the loss of one or more frames containing a transition region in a manner that will not result in an audible artifact. | 08-25-2011 |
20110218802 | Continuous Speech Recognition - A computerized method for continuous speech recognition using a speech recognition engine and a phoneme model. The computerized method inputs a speech signal into the speech recognition engine. Based on the phoneme model, the speech signal is indexed by scoring for the phonemes of the phoneme model and a time-ordered list of phoneme candidates and respective scores resulting from the scoring are produced. The phoneme candidates are input with the scores from the time-ordered list. Word transcription candidates are typically input from a dictionary and words are built by selecting from the word transcription candidates based on the scores. A stream of transcriptions is outputted corresponding to the input speech signal. The stream of transcriptions is re-scored by searching for and detecting anomalous word transcriptions in the stream of transcriptions to produce second scores. | 09-08-2011 |
20110295599 | Aligning Scheme for Audio Signals - Methods, devices, and computer programs described herein may segment a reference signal that corresponds to a non-degraded signal into a plurality of reference signal segments; generate filter coefficients based on each reference signal segment; and filter each reference signal segment with its corresponding generated filter coefficients. The methods, devices, and computer programs may also filter a degraded signal, which includes a delayed signal of the reference signal, with each set of generated filter coefficients to produce a number of filtered degraded signals equal to the number of reference signal segments; perform time-wise alignment for each filtered degraded signal with respect to each corresponding filtered reference signal segment; and output a time offset based on the performing. | 12-01-2011 |
20110301945 | SPEECH SIGNAL PROCESSING SYSTEM, SPEECH SIGNAL PROCESSING METHOD AND SPEECH SIGNAL PROCESSING PROGRAM PRODUCT FOR OUTPUTTING SPEECH FEATURE - A speech signal processing system which outputs a speech feature, divides an input speech signal into frames so that each pair of consecutive frames have a frame shift length equal to at least one period of the speech signal and have an overlap equal to at least a predetermined length, applies discrete Fourier transform to each of the frames, calculates a CSP coefficient for the pair, searches a predetermined search range in which a speech wave lags a period equal to at least one period to obtain the maximum value of the CSP coefficient for the pair, and generates time-series data of the maximum CSP coefficient values arranged in the order in which the frames appear. A method and a computer readable article of manufacture for implementing the same are also provided. | 12-08-2011 |
20110313760 | SIGNAL DECOMPOSITION, ANALYSIS AND RECONSTRUCTION - The present invention provides a system and method for representing quasi-periodic (“qp”) waveforms comprising, representing a plurality of limited decompositions of the qp waveform, wherein each decomposition includes a first and second amplitude value and at least one time value. In some embodiments, each of the decompositions is phase adjusted such that the arithmetic sum of the plurality of limited decompositions reconstructs the qp waveform. These decompositions are stored into a data structure having a plurality of attributes. Optionally, these attributes are used to reconstruct the qp waveform, or patterns or features of the qp wave can be determined by using various pattern-recognition techniques. Some embodiments provide a system that uses software, embedded hardware or firmware to carry out the above-described method. Some embodiments use a computer-readable medium to store the data structure and/or instructions to execute the method. | 12-22-2011 |
20120084081 | SYSTEM AND METHOD FOR PERFORMING SPEECH ANALYTICS - Disclosed herein are systems, methods, and non-transitory computer-readable storage media for performing trend analysis of speech. A system practicing the method receives a speech trend analysis request having candidate feature constraints, an objective function with respect to a speech trend to be analyzed, and a set of speech record constraints. The system selects a subset of speech records from the group of speech records based on the set of speech record constraints to yield selected speech records, identifies features in the selected speech records based on the set of candidate feature constraints to yield identified features, and assigns a weight to each of the identified features based on the objective function. Then the system ranks the identified features by their respective weights to yield ranked identified features, and outputs at least one of the ranked identified features associated with a speech-based trend in response to the speech trend analysis request. | 04-05-2012 |
20120158402 | System And Method For Adjusting Floor Controls Based On Conversational Characteristics Of Participants - A system and method for automatically adjusting floor controls based on conversational characteristics is provided. Audio streams are received, which each originate from an audio source. Floor controls for a current configuration including at least a portion of the audio streams are maintained. Conversational characteristics shared by two or more of the audio sources are determined. Possible configurations for the audio streams are identified based on the conversational characteristics. An analysis of the current configuration and the possible configurations is performed. A change threshold comprising a minimum number of timeslices for at least one of the current configuration and one of the possible configurations is applied to the analysis. When the analysis satisfies the change threshold, the floor controls are automatically adjusted. The audio streams are mixed into one or more outputs based on the adjusted floor controls. | 06-21-2012 |
20120158403 | VOICE REPRODUCTION APPARATUS AND VOICE REPRODUCTION METHOD - A voice reproduction apparatus includes an ambient sound analysis unit to analyze a characteristic of an ambient sound, a characteristic analysis unit to analyze an acoustic characteristic of a signal for reproduction, a reproduction timing adjusting unit to record the signal for reproduction and to read the signal for reproduction at a reproduction timing of follow-up reproduction, a reproduction speed changing unit to change a reproduction speed of the read signal for reproduction, and a control unit to control the reproduction timing adjusting unit so that the signal for reproduction is reproduced at the reproduction timing corresponding to an analysis result of the ambient sound analysis unit and to control the reproduction speed changing unit so that the signal for reproduction is reproduced at the reproduction speed corresponding to the analysis result of the ambient sound analysis unit and the acoustic characteristic obtained by the characteristic analysis unit. | 06-21-2012 |
20120179460 | CREATION AND USE OF TEST CASES FOR AUTOMATED TESTING OF MEDIA-BASED APPLICATIONS - A method for testing an automated interactive media system. The method can include establishing a communication session with the automated interactive media system. In response to receiving control and/or media information from the automated interactive media system, pre-recorded control and/or media information can be propagated to the automated interactive media system. The pre-recorded control and/or media information can be recorded in real time. | 07-12-2012 |
20120215528 | SPEECH RECOGNITION SYSTEM, SPEECH RECOGNITION REQUEST DEVICE, SPEECH RECOGNITION METHOD, SPEECH RECOGNITION PROGRAM, AND RECORDING MEDIUM - Provided is a speech recognition system, including: a first information processing device including a speech recognition processing unit for receiving data to be used for speech recognition transmitted via a network, carrying out speech recognition processing, and returning resultant data; and a second information processing device connected to the first information processing device via the network. The second information processing device performs conversion of the data into data having a format that disables a content thereof from being perceived and also enables the speech recognition processing unit to perform the speech recognition processing. Thereafter, the second information processing device transmits the data to be used for the speech recognition by the speech recognition processing unit and constructs resultant data returned from the first information processing device into a content of a valid and perceivable recognition result. | 08-23-2012 |
20120221327 | METHOD, DEVICE AND SYSTEM FOR VOICE ENCODING/DECODING - A method, a device and a system for voice encoding/decoding are disclosed in the present invention. The method includes: assembling an input pulse code modulation signal into one signal according to a designated time slot and assembly manner; and encoding the assembled signal according to a designated encoding manner to output an encoded voice signal. In the present invention, because the process of assembling or splitting the signal may be implemented in software, voice with a 7 kHz spectrum may be encoded/decoded in the current network without replacing its hardware. | 08-30-2012 |
20120245929 | TERMINAL DEVICE, AUDIO OUTPUT METHOD, AND INFORMATION PROCESSING SYSTEM - In an audio output terminal device, a buffer control unit adjusts the buffer size of a jitter buffer in accordance with the setting of a sound output mode instructed in an instruction receiving unit. If the instruction receiving unit acknowledges an instruction for setting an audio output mode that requires low delay in outputting sound, the buffer control unit reduces the buffer size of the jitter buffer. Further, the buffer control unit controls, in accordance with the instructed setting of the sound output mode, timing for allowing a media buffer to transmit one or more voice packets to the jitter buffer. | 09-27-2012 |
20120265524 | Method And Apparatus Of Visual Feedback For Latency In Communication Media - A method and apparatus are provided for visualizing the latency in a conversation between a local speaker and at least one remote speaker separated from the local speaker by a communication medium. A latency estimate is obtained. A timing indication of at least the end of a conversational turn by the local speaker is obtained, and an outbound graphic is displayed, indicating the progress of at least the end-of-turn across the communication medium toward the remote speaker. The outbound graphical indication is displayed with a transit time across the medium that is derived from the latency estimate. An inbound graphic is displayed, indicating the progress across the communication medium toward the local speaker, of a start of a conversational turn by the remote speaker, which is imputed to begin when the remote speaker receives the local speaker's end-of-turn. The inbound graphical indication is displayed with a transit time across the medium that is derived from the latency estimate. | 10-18-2012 |
20120265525 | ENCODING METHOD, DECODING METHOD, ENCODER APPARATUS, DECODER APPARATUS, PROGRAM AND RECORDING MEDIUM - In encoding, pitch periods for time series signals in a predetermined time interval are calculated, and a code corresponding thereto is output. In that encoding, the resolutions for expressing the pitch periods and/or a pitch period encoding mode are switched according to whether an index indicating a periodicity and/or stationarity level of the time series signals satisfies a condition indicating high or low in periodicity and/or stationarity. In that decoding, according to whether an index indicating a periodicity and/or stationarity level, the index being included in or obtained from an input code corresponding to the predetermined time interval, satisfies a condition indicating high periodicity and/or stationarity, a decoding mode for a code, included in the input code, corresponding to pitch periods is switched to decode the code corresponding to the pitch periods to obtain the pitch periods corresponding to the predetermined time interval. | 10-18-2012 |
20120296642 | METHOD AND APPARATUS FOR TEMPORAL SPEECH SCORING - A method and apparatus for speech analysis, comprising detecting at least one temporal characteristic of at least one speech of at least one speaker, and deducing at least one quantitative score from the at least one temporal characteristic, where the at least one quantitative score indicates at least one extent of at least one behavioral aspect of the at least one speaker. | 11-22-2012 |
20130066626 | SPEECH ENHANCEMENT METHOD - A speech enhancement method is disclosed. The method includes the steps of: receiving a plurality of frames of sound signals by a microphone array; calculating an inter-aural time difference for each frequency band of each frame of the sound signals corresponding to at least one two-microphone set of the microphone array; calculating a plurality of values of cumulative histograms according to the calculated inter-aural time differences; determining a first inter-aural time difference threshold according to the calculated values of the cumulative histograms; and filtering the plurality of frames of sound signals according to the first inter-aural time difference threshold. | 03-14-2013 |
20130124200 | Noise-Robust Template Matching - Noise robust template matching may be performed. First features of a first signal may be computed. Based at least on a portion of the first features, second features of a second signal may be computed. A new signal may be generated based on at least another portion of the first features and on at least a portion of the second features. | 05-16-2013 |
20130253921 | Method and Apparatus to Prepare Listener-Interest-Filtered Works - An embodiment of the present invention is a method of presenting at least a portion of an audio or audio-visual work including: (a) retrieving an average speed contour or a democratic speed contour from a database apparatus; and (b) presenting the at least a portion at a playback apparatus using the retrieved average speed contour or democratic speed contour to provide presentation rates. | 09-26-2013 |
20130297299 | Sparse Auditory Reproducing Kernel (SPARK) Features for Noise-Robust Speech and Speaker Recognition - The speech feature extraction algorithm is based on a hierarchical combination of auditory similarity and pooling functions. Computationally efficient features referred to as “Sparse Auditory Reproducing Kernel” (SPARK) coefficients are extracted under the hypothesis that the noise-robust information in the speech signal is embedded in a reproducing kernel Hilbert space (RKHS) spanned by overcomplete, nonlinear, and time-shifted gammatone basis functions. The feature extraction algorithm first computes the kernel-based similarity between the speech signal and the time-shifted gammatone functions, then prunes the features using a simple pooling technique (a “MAX” operation). Different hyper-parameters and kernel functions may be used to enhance the performance of a SPARK-based speech recognizer. | 11-07-2013 |
20140088957 | METHOD AND APPARATUS FOR PERFORMING PACKET LOSS OR FRAME ERASURE CONCEALMENT - A method for performing packet loss or Frame Erasure Concealment (FEC) for a speech coder receives encoded frames of compressed speech information transmitted from an encoder. The method determines whether an encoded frame has been lost, corrupted in transmission, or erased. It also synthesizes properly received frames and decides on an overlap-add window to use in combining a portion of the synthesized speech signal with a subsequent speech signal resulting from a received and decoded packet, where the size of the overlap-add window is based on the unavailability of packets. If it is determined that an encoded frame has been lost, corrupted in transmission, or erased, the method performs an overlap-add operation on the portion of the synthesized speech signal and the subsequent speech signal, using the decided-on overlap-add window. | 03-27-2014 |
20140172420 | AUDIO OR VOICE SIGNAL PROCESSOR - A voice or audio signal processor for processing network packets received over a communication network to provide an output signal. The voice or audio signal processor comprises a jitter buffer configured to buffer the received network packets, a voice or audio decoder configured to decode the network packets buffered by the jitter buffer to obtain a decoded voice or audio signal, a controllable time scaler configured to amend the length of the decoded voice or audio signal to obtain a time-scaled voice or audio signal as the output signal, and an adaptation control means configured to control the operation of the time scaler depending on a processing complexity measure. | 06-19-2014 |
20140180684 | Systems, Methods, and Apparatus for Assigning Three-Dimensional Spatial Data to Sounds and Audio Files - Embodiments of the disclosure can include systems, methods, and apparatus for assigning three-dimensional spatial data to sounds and audio files. In one embodiment, a method can include receiving at least one audio signal, receiving sonic spatial data, associating the sonic spatial data with the at least one audio signal, associating the at least one audio signal and sonic spatial data with a time code, and storing the sonic spatial data, the at least one audio signal, and time code in an encoded sound file. | 06-26-2014 |
20140236586 | METHOD AND APPARATUS FOR COMMUNICATING MESSAGES AMONGST A NODE, DEVICE AND A USER OF A DEVICE - A method and apparatus that modifies static media, such as music files being played to a user of the device, upon the generation or receipt of an alert, notification or message. The information in the alert, notification or message can then be incorporated into the media files and communicated to the user. In a further embodiment, a user's response to the communicated information can be sensed using one or more sensors and transducers to provide feedback to the device, and then optionally to a node in a system. | 08-21-2014 |
20140244245 | METHOD FOR SOUNDPROOFING AN AUDIO SIGNAL BY AN ALGORITHM WITH A VARIABLE SPECTRAL GAIN AND A DYNAMICALLY MODULATABLE HARDNESS - The method comprises, in the frequency domain: estimating ( | 08-28-2014 |
20150066493 | TIME WARP ACTIVATION SIGNAL PROVIDER, AUDIO SIGNAL ENCODER, METHOD FOR PROVIDING A TIME WARP ACTIVATION SIGNAL, METHOD FOR ENCODING AN AUDIO SIGNAL AND COMPUTER PROGRAMS - An audio encoder has a window function controller, a windower, a time warper with a final quality check functionality, a time/frequency converter, and a TNS stage or a quantizer encoder. The window function controller, the time warper, the TNS stage or an additional noise filling analyzer are controlled by signal analysis results obtained by a time warp analyzer or a signal classifier. Furthermore, a decoder applies a noise filling operation using a manipulated noise filling estimate that depends on a harmonic or speech characteristic of the audio signal. | 03-05-2015 |
20150120286 | APPARATUS, PROCESS, AND PROGRAM FOR COMBINING SPEECH AND AUDIO DATA - There is provided a speech processing apparatus including: a data obtaining unit which obtains music progression data defining a property of one or more time points or one or more time periods along the progression of music; a determining unit which, using the music progression data obtained by the data obtaining unit, determines an output time point at which a speech is to be output during reproduction of the music; and an audio output unit which outputs the speech at the output time point determined by the determining unit during reproduction of the music. | 04-30-2015 |
20150302859 | Scalable And Embedded Codec For Speech And Audio Signals - A system and method for processing audio and speech signals is disclosed, which provides compatibility over a range of communication devices operating at different sampling frequencies and/or bit rates. The analyzer of the system divides the input signal into different portions, at least one of which carries information sufficient to provide an intelligible reconstruction of the input signal. The analyzer also encodes separate information about other portions of the signal in an embedded manner, so that a smooth transition can be achieved from low bit-rate to high bit-rate applications. Accordingly, communication devices operating at different sampling rates and/or bit rates can extract the corresponding information from the output bit stream of the analyzer. In the present invention, embedded information generally relates to separate parameters of the input signal, or to additional resolution in the transmission of original signal parameters. Non-linear techniques for enhancing the overall performance of the system are also disclosed, as is a novel method of improving the quantization of signal parameters. In a specific embodiment, the input signal is processed in two or more modes depending on the state of the signal in a frame. When the signal is determined to be in a transition state, the encoder provides phase information about N sinusoids, which the decoder uses to improve the quality of the output signal at low bit rates. | 10-22-2015 |
20150332687 | APPARATUS AND METHOD FOR DETERMINING AUDIO AND/OR VISUAL TIME SHIFT - A system for determining the time offset of an audio signature from a reference signature time stamp may have a server connected to one or more remote devices. The server may have a receiver connected to one or more communication channels, configured to receive an audio signature generated by a remote device and transmitted over the communication channel. The system may have a database at or connected to the server that contains one or more reference audio signatures, each with a time stamp also stored in the database. A query engine may compare the remote audio signature to one or more reference audio signatures stored in the database. A processor may be provided to compare a reference timestamp to a timestamp associated with the remote audio signature. The system may be used to evaluate viewing habits, particularly delayed viewing and program playback manipulation such as fast-forward, slow motion, skip, rewind and program abandonment. | 11-19-2015 |
20150380058 | METHOD, DEVICE, TERMINAL, AND SYSTEM FOR AUDIO RECORDING AND PLAYING - A recording method includes receiving a mark start instruction in a process of recording audio data and establishing a mark event according to the mark start instruction. The mark event is configured to mark the audio data. The method further includes recording at least one parameter of the mark event, receiving a mark end instruction, and ending recording of the at least one parameter of the mark event according to the mark end instruction to obtain a mark data structure. | 12-31-2015 |
20160086606 | Automated Speech Recognition Proxy System for Natural Language Understanding - An interactive response system mixes HSR subsystems with ASR subsystems to enhance the overall capability of voice user interfaces. The system lets imperfect ASR subsystems nonetheless relieve the burden on HSR subsystems. An ASR proxy is used to implement an IVR system, and the proxy dynamically determines how many ASR and HSR subsystems are to perform recognition for any particular utterance, based on factors such as the confidence thresholds of the ASRs and the availability of human resources for HSRs. | 03-24-2016 |