Inventors list

Assignees list

Classification tree browser

Top 100 Inventors

Top 100 Assignees

Endpoint detection

Subclass of:

704 - Data processing: speech signal processing, linguistics, language translation, and audio compression/decompression

704200000 - SPEECH SIGNAL PROCESSING

704231000 - Recognition

704246000 - Voice recognition

Patent class list (only not empty are listed)

Deeper subclasses:

Class / Patent application number	Description	Number of patent applications / Date published
704248000	Endpoint detection	15
20100030559	System and method for an endpoint detection of speech for improved speech recognition in noisy environments - According to a disclosed embodiment, an endpointer determines the background energy of a first portion of a speech signal, and a cepstral computing module extracts one or more features of the first portion. The endpointer calculates an average distance of the first portion based on the features. Subsequently, an energy computing module measures the energy of a second portion of the speech signal, and the cepstral computing module extracts one or more features of the second portion. Based on the features of the second portion, the endpointer calculates a distance of the second portion. Thereafter, the endpointer contrasts the energy of the second portion with the background energy of the first portion, and compares the distance of the second portion with the distance of the first portion. The second portion of the speech signal is classified by the endpointer as speech or non-speech based on the contrast and the comparison.	02-04-2010
20100153110	VOICE RECOGNITION SYSTEM AND METHOD OF A MOBILE COMMUNICATION DEVICE - A voice recognition system and method of a mobile communication device. The mobile communication device has a storage device which stores voice templates and characteristics of each of the voice templates. The method recognizes voice data form a sound input, calculates a similarity ratio between characteristics of the sound input and the characteristics of each of the voice templates, and sorts the voice templates according to the similarity ratio in a list of text symbols representing the voice templates. The list of text symbols can be selected as a proper text input by a user.	06-17-2010
20110224987	DETECTION OF VOICE INACTIVITY WITHIN A SOUND STREAM - A method for identifying end of voiced speech within an audio stream of a noisy environment employs a speech discriminator. The discriminator analyzes each window of the audio stream, producing an output corresponding to the window. The output is used to classify the window in one of several classes, for example, (	09-15-2011
20120191455	System and Method for an Endpoint Detection of Speech for Improved Speech Recognition in Noisy Environments - According to a disclosed embodiment, an endpointer determines the background energy of a first portion of a speech signal, and a cepstral computing module extracts one or more features of the first portion. The endpointer calculates an average distance of the first portion based on the features. Subsequently, an energy computing module measures the energy of a second portion of the speech signal, and the cepstral computing module extracts one or more features of the second portion. Based on the features of the second portion, the endpointer calculates a distance of the second portion. Thereafter, the endpointer contrasts the energy of the second portion with the background energy of the first portion, and compares the distance of the second portion with the distance of the first portion. The second portion of the speech signal is classified by the endpointer as speech or non-speech based on the contrast and the comparison.	07-26-2012
20120271633	INTERACTIVE DEVICE - The present invention provides an interactive device which allows quick utterance recognition results and sequential output thereof and which diminishes a recognition rate decrease even if user's utterance is divided by a short interval into frames for quick decision. The interactive device: sets a recognition section for voice recognition; performs voice recognition for the recognition section; when the voice recognition includes a key phrase, determines response actions corresponding thereto; and executes the response actions. The interactive device repeatedly updates the set recognition terminal point to a frame which is the predetermined time length ahead of the set recognition terminal point to set a plurality of recognition sections. The interactive device performs voice recognition for each recognition section.	10-25-2012
20130144622	SPEECH PROCESSING DEVICE AND SPEECH PROCESSING METHOD - A speech processing device which can accurately extract a conversation group from among a plurality of speakers, even when a conversation group formed of three or more people is present. This device (	06-06-2013
20130179168	IMAGE DISPLAY APPARATUS AND METHOD OF CONTROLLING THE SAME - Provided are an image display apparatus and a method of controlling the same. The image display apparatus enabling voice recognition includes: a first voice inputter which receives a user-side audio signal; an audio outputter which outputs an audio signal processed by the image display apparatus; a first voice recognizer which recognizes the user-side audio signal received through the first voice inputter; and a controller which decreases a volume of the audio signal output through the audio outputter to a predetermined level if a voice recognition start command is received.	07-11-2013
20130238335	ENDPOINT DETECTION APPARATUS FOR SOUND SOURCE AND METHOD THEREOF - An apparatus for detecting endpoints of sound signals when sound sources vocalized from a remote site are processed even if a plurality of speakers exists and an interference sound being input from a direction different from a direction of one speaker, and a method thereof, wherein in an environment in which a plurality of sound sources exists, the existence and the length of the sound source being input according to each direction is determined and the endpoint is found, thereby improving the performance of the post-processing, and speech being input from a direction other than a direction of speech from a speaker vocalized at a remote area from a sound source collecting unit is distinguished while the speech from the speaker is being recorded, thereby enabling a remote sound source recognition without restriction on the installation region of a microphone.	09-12-2013
20140046665	Computer-Implemented System And Method For Enhancing Visual Representation To Individuals Participating In A Conversation - A system and method for enhancing visual representation to individuals participating in a conversation is provided. Visual data for a plurality of individuals participating in one or more conversations is analyzed. Possible conversational configurations of the individuals are generated. Each possible conversational configuration includes one or more pair-wise probabilities of at least two of the individuals. A probability weight is assigned to each of the pair-wise probabilities having a likelihood that the individuals of that pair-wise probability are participating in a conversation. A probability of each possible conversational configuration is determined by combining the probability weights for the pair-wise probabilities of that possible conversational configuration. The possible conversational configuration with the highest probability is selected as a most probable configuration. The individuals participating in the conversations based on the pair-wise probabilities with the most probable configuration are determined. Visual representation is enhanced for each individual participating in the determined conversations.	02-13-2014
20140149117	METHOD AND SYSTEM FOR IDENTIFICATION OF SPEECH SEGMENTS - A system for distinguishing and identifying speech segments originating from speech of one or more relevant speakers in a predefined detection area. The system includes an optical system which outputs optical patterns, each representing audio signals as detected by the optical system in the area within a specific time frame; and a computer processor which receives each of the outputted optical patterns and analyses each respective optical pattern to provide information that enables identification of speech segments thereby, by identifying blank spaces in the optical pattern, which define beginning or ending of each respective speech segment.	05-29-2014
20140163986	VOICE-BASED CAPTCHA METHOD AND APPARATUS - Disclosed herein is a voice-based CAPTCHA method and apparatus which can perform a CAPTCHA procedure using the voice of a human being. In the voice-based CAPTCHA) method, a plurality of uttered sounds of a user are collected. A start point and an end point of a voice from each of the collected uttered sounds are detected and then speech sections are detected. Uttered sounds of the respective detected speech sections are compared with reference uttered sounds, and then it is determined whether the uttered sounds are correctly uttered sounds. It is determined whether the uttered sounds have been made by an identical speaker if it is determined that the uttered sounds are correctly uttered sounds. Accordingly, a CAPTCHA procedure is performed using the voice of a human being, and thus it can be easily checked whether a human being has personally made a response using a voice online	06-12-2014
20140379345	METHOD AND APPARATUS FOR DETECTING SPEECH ENDPOINT USING WEIGHTED FINITE STATE TRANSDUCER - Disclosed are an apparatus and a method for detecting a speech endpoint using a WFST. The apparatus in accordance with an embodiment of the present invention includes: a speech decision portion configured to receive frame units of feature vector converted from a speech signal and to analyze and classify the received feature vector into a speech class or a noise class; a frame level WFST configured to receive the speech class and the noise class and to convert the speech class and the noise class to a WFST format; a speech level WFST configured to detect a speech endpoint by analyzing a relationship between the speech class and noise class and a preset state; a WFST combination portion configured to combine the frame level WFST with the speech level WFST; and an optimization portion configured to optimize the combined WFST having the frame level WFST and the speech level WFST combined therein to have a minimum route.	12-25-2014
20150371665	ROBUST END-POINTING OF SPEECH SIGNALS USING SPEAKER RECOGNITION - Systems and processes for robust end-pointing of speech signals using speaker recognition are provided. In one example process, a stream of audio having a spoken user request can be received. A first likelihood that the stream of audio includes user speech can be determined. A second likelihood that the stream of audio includes user speech spoken by an authorized user can be determined. A start-point or an end-point of the spoken user request can be determined based at least in part on the first likelihood and the second likelihood.	12-24-2015
20160005394	VOICE RECOGNITION APPARATUS, VOICE RECOGNITION METHOD AND PROGRAM - There is provided an apparatus and a method for rapidly extracting a target sound from a sound signal where a variety of sounds are mixed generated from a plurality of the sound sources. There is a voice recognition unit including a tracking unit for detecting a sound source direction and a voice segment to execute a sound source extraction process, and a voice recognition unit for inputting a sound source extraction result to execute a voice recognition process. In the tracking unit, a segment being created management unit that creates and manages a voice segment per unit of sound source sequentially detects a sound source direction, sequentially updates a voice segment estimated by connecting a detection result to a time direction, creates an extraction filter for a sound source extraction after a predetermined time is elapsed, and sequentially creates a sound source extraction result by sequentially applying the extraction filter to an input voice signal. The voice recognition unit sequentially executes the voice recognition process to a partial sound source extraction result to output a voice recognition result.	01-07-2016
20170236532	SYSTEMS AND METHODS FOR IMPROVING AUDIO CONFERENCING SERVICES	08-17-2017