Class / Patent application number | Description | Number of patent applications / Date published |
704215000 | Silence decision | 7 |
20090106020 | AUDIO GLITCH REDUCTION - To reduce audio glitches, the rendering buffer of an audio application is pre-filled with natural-sounding audio rather than zeros. For every frame of audio sent for rendering, the rendering buffer is also pre-filled, or the signal in the buffer is stretched, in anticipation of a glitch. If the glitch does not occur, the stretched signal is overwritten and the end user does not notice it. If the glitch does occur, the rendering buffer is already filled with a stretched version of the previous audio, which may produce acceptable sound. After recovery from the glitch, any new data is smoothly merged into the synthesized audio that was generated beforehand. | 04-23-2009 |
20090198490 | RESPONSE TIME WHEN USING A DUAL FACTOR END OF UTTERANCE DETERMINATION TECHNIQUE - The present invention discloses a solution for a speech processing system to determine end-of-utterance (EOU) events. The solution is a modified dual-factor technique, where one factor is based upon the number of silence frames received and the second is based upon an end-of-path occurrence. The solution permits a set of configurable timeout delay values to be established, which application developers can configure on an application-specific basis. The solution can speed up EOU determinations made through a dual-factor technique by situationally making the finalization determination before the silence frame window is full. | 08-06-2009 |
20090259460 | SILENCE-BASED ADAPTIVE REAL-TIME VOICE AND VIDEO TRANSMISSION METHODS AND SYSTEM - A client for silence-based adaptive real-time voice and video (SAVV) transmission methods and systems detects the activity of a voice stream of conversational speech, aggressively transmits the corresponding video frames if silence has been detected in the sending or receiving voice stream, and adaptively generates and transmits key frames of the video stream according to characteristics of the conversational speech. In one aspect, a coordination management module generates video frame generation, segmentation, and transmission strategies according to feedback from the voice encoder of the SAVV client and the user's instructions. In another aspect, the coordination management module generates those strategies according to feedback from the voice decoder of the SAVV client and the user's instructions. In one example, the coordination management module adaptively generates a key video frame when silence is detected in the receiving voice stream. | 10-15-2009 |
20100094621 | System and Method for Assessing Script Running Time - A method and system for assessing script running time are disclosed. According to one embodiment, a computer-implemented method comprises receiving a first input from a client, wherein the first input comprises a script, the script comprising spoken audio and video elements. A second input is received from the client, wherein the second input comprises a words-per-minute value. A third input is received from the client, wherein the third input comprises a non-verbal duration. A duration of the spoken audio of the script is calculated by dividing the number of words of spoken content in the script by the words-per-minute value. A duration of the script is calculated by summing the duration of the spoken audio and the non-verbal duration, and the duration of the script is returned to the client. | 04-15-2010 |
20100106490 | Method and Speech Encoder with Length Adjustment of DTX Hangover Period - The present invention relates to a speech encoder comprising: a voice activity detector (VAD) configured to receive speech frames and to generate a speech decision (VAD_flag); a speech/SID encoder configured to receive said speech frames and to generate a signal identifying speech frames based on the encoder decision (SP), which in turn is based on the speech decision (VAD_flag) and a DTX hangover period; and a SID synchronizer configured to transmit a signal (TxType) comprising speech frames, SID frames, and No_data frames. The speech encoder further comprises a signal analyzer configured to analyze the energy values of speech frames within the DTX hangover period, and a DTX handler configured to adjust the length of the DTX hangover period in response to the analysis performed by the signal analyzer. The invention also relates to a method for estimating the characteristics of a DTX hangover period in a speech encoder. | 04-29-2010 |
20100106491 | Voice Activity Detection and Silence Suppression in a Packet Network - The present invention is a system and method that improves upon voice activity detection by packetizing actual noise signals, typically background noise. In accordance with the present invention, an access network receives an input voice signal (including noise) and converts it into a packetized voice signal. The packetized voice signal is transmitted via a network to an egress network. The egress network receives the packetized voice signal, converts it into an output voice signal, and outputs that signal. The egress network also extracts and stores noise packets from the received packetized voice signal and converts the packetized noise into an output noise signal. When the access network ceases to receive the input voice signal while the call is still ongoing, the access network instructs the egress network to continually output the output noise signal. | 04-29-2010 |
20150325256 | METHOD AND APPARATUS FOR DETECTING VOICE SIGNAL - The invention discloses a method including: framing a continuous voice sample in units of a first frame length to obtain a plurality of first timeframes, detecting the energy of each first timeframe, and determining a target first timeframe containing a potential abrupt exception of the voice signal by analyzing the relationship between the energies of the first timeframes; framing the continuous voice sample in units of a second frame length to obtain a plurality of second timeframes, and processing each second timeframe to acquire a tone feature; and determining, by analyzing the tone feature of at least one second timeframe, including at least one target second timeframe, whether the potential abrupt exception contained in the target first timeframe within the target second timeframe is a real abrupt exception of the voice signal. | 11-12-2015 |
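The pre-fill scheme in 20090106020 can be sketched roughly as follows. This is a hypothetical illustration, not the patented implementation: the "stretch" here is naive sample repetition, where a real system would use a pitch-preserving time-stretch and crossfade on recovery, and the quarter-frame tail length is an invented parameter.

```python
def render_with_prefill(render_buffer, frame, stretch):
    """Copy a frame into the render buffer, then append a stretched tail
    of that frame so a late producer leaves plausible audio, not zeros."""
    del render_buffer[:]             # a new frame overwrites any old pre-fill
    render_buffer.extend(frame)      # the audio actually scheduled to play
    tail = frame[-max(1, len(frame) // 4):]
    for s in tail:                   # pre-fill: stretched tail, played only
        render_buffer.extend([s] * stretch)  # if the next frame arrives late
    return render_buffer
```

If the next frame arrives on time, the call overwrites the stretched tail and the listener never hears it, matching the abstract's "overwritten and the end user does not notice it" behavior.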
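The dual-factor EOU logic of 20090198490 reduces to a silence-frame counter plus an end-of-path flag that permits early finalization. A minimal sketch, with the class name, thresholds, and method names all invented for illustration:

```python
class DualFactorEOU:
    def __init__(self, silence_window=30, early_finalize_after=10):
        # silence_window: silence frames needed for a timeout-based EOU
        # early_finalize_after: shorter, configurable threshold used once
        # an end-of-path event has been observed
        self.silence_window = silence_window
        self.early_finalize_after = early_finalize_after
        self.silence_count = 0
        self.end_of_path = False

    def observe(self, is_silence, end_of_path=False):
        """Feed one frame; return True when EOU should be declared."""
        self.end_of_path = self.end_of_path or end_of_path
        self.silence_count = self.silence_count + 1 if is_silence else 0
        if self.end_of_path and self.silence_count >= self.early_finalize_after:
            return True      # situational finalization before the window fills
        return self.silence_count >= self.silence_window
```

The configurable thresholds correspond to the abstract's "set of configurable timeout delay values" that developers tune per application.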
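The SAVV coordination policy of 20090259460 can be caricatured as a silence-driven bandwidth and key-frame decision. The function, its return values, and the bandwidth shares below are purely illustrative assumptions, not figures from the patent:

```python
def plan_video(is_silent_now, was_silent_before):
    """Decide video strategy for the current interval of a SAVV-style call."""
    # Generate a key frame at the transition into silence, per the example
    # in the abstract; transmit video aggressively while speech is silent.
    send_key_frame = is_silent_now and not was_silent_before
    bandwidth_share = 0.9 if is_silent_now else 0.4  # aggressive during silence
    return send_key_frame, bandwidth_share
```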
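The running-time arithmetic of 20100094621 is simple enough to state directly. A minimal sketch, assuming the spoken content is supplied as a word count and durations are in seconds (the patent does not fix these units):

```python
def script_running_time(word_count, words_per_minute, non_verbal_seconds):
    """Estimated script duration: spoken time plus non-verbal time."""
    spoken_seconds = word_count / words_per_minute * 60.0
    return spoken_seconds + non_verbal_seconds
```

For example, a 300-word script read at 150 words per minute with 30 seconds of non-verbal material runs 150 seconds.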
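The hangover adjustment in 20100106490 amounts to shortening the DTX hangover when the analyzed frame energies show the tail is uniformly low-energy. A hedged sketch; the energy threshold, minimum length, and one-frame trim step are invented here:

```python
def adjust_hangover(hangover_energies, current_len, low_energy=0.01, min_len=2):
    """Return the new DTX hangover length given energies of frames
    observed within the current hangover period."""
    if all(e < low_energy for e in hangover_energies):
        return max(min_len, current_len - 1)  # clearly noise: trim the hangover
    return current_len                        # speech-like tail: keep it whole
```

Shortening the hangover this way trades a little clipping risk for fewer needlessly transmitted speech frames before SID/No_data frames take over.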
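The egress-side behavior in 20100106491 — cache noise packets from the live stream, then replay them when voice stops mid-call — can be sketched with a bounded deque standing in for the noise store. Class and method names are illustrative assumptions:

```python
from collections import deque
from itertools import cycle

class EgressNoisePlayer:
    def __init__(self, max_noise_packets=8):
        self.noise_store = deque(maxlen=max_noise_packets)

    def receive(self, packet, is_noise):
        # Extract and store noise packets from the incoming stream.
        if is_noise:
            self.noise_store.append(packet)
        return packet  # voice (or noise) is rendered as received

    def comfort_noise(self, n):
        # When voice stops but the call is still up, continually replay
        # stored real noise instead of emitting dead silence.
        if not self.noise_store:
            return [b"\x00"] * n  # nothing cached: fall back to silence
        src = cycle(self.noise_store)
        return [next(src) for _ in range(n)]
```

Replaying captured background noise avoids the unnatural "dead line" effect that plain silence suppression produces at the listener's end.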
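Finally, the two-resolution scheme of 20150325256 — short frames locate a candidate energy jump, longer frames covering that point are then checked with a tone feature — can be sketched as below. The energy-ratio test and the caller-supplied tone test are illustrative stand-ins for the patent's unspecified analyses:

```python
def frame(samples, length):
    """Split samples into consecutive non-overlapping frames."""
    return [samples[i:i + length]
            for i in range(0, len(samples) - length + 1, length)]

def energy(f):
    return sum(x * x for x in f)

def candidate_jumps(samples, short_len, ratio=4.0):
    """First stage: indices of short frames whose energy jumps sharply."""
    energies = [energy(f) for f in frame(samples, short_len)]
    return [i for i in range(1, len(energies))
            if energies[i] > ratio * (energies[i - 1] + 1e-9)]

def confirm_with_tone(samples, jump_index, short_len, long_len, tone_test):
    """Second stage: test the long frame containing the candidate with a
    tone-feature predicate before reporting a real abrupt exception."""
    center = jump_index * short_len
    start = (center // long_len) * long_len
    return tone_test(samples[start:start + long_len])
```

The two passes over the same continuous sample, at two frame lengths, mirror the abstract's "first timeframes" (energy) and "second timeframes" (tone feature) stages.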