Patent application number | Description | Published |
20140074467 | Speaker Separation in Diarization - The system and method of separating speakers in an audio file include obtaining an audio file. The audio file is transcribed into at least one text file by a transcription server. Homogenous speech segments are identified within the at least one text file. The audio file is segmented into homogenous audio segments that correspond to the identified homogenous speech segments. The homogenous audio segments of the audio file are separated into a first speaker audio file and a second speaker audio file. The first speaker audio file and the second speaker audio file are transcribed to produce a diarized transcript. | 03-13-2014 |
20150025887 | Blind Diarization of Recorded Calls with Arbitrary Number of Speakers - In a method of diarization of audio data, audio data is segmented into a plurality of utterances. Each utterance is represented as an utterance model representative of a plurality of feature vectors. The utterance models are clustered. A plurality of speaker models are constructed from the clustered utterance models. A hidden Markov model is constructed of the plurality of speaker models. A sequence of identified speaker models is decoded. | 01-22-2015 |
20150039304 | Voice Activity Detection Using A Soft Decision Mechanism - Voice activity detection (VAD) is an enabling technology for a variety of speech-based applications. Herein disclosed is a robust VAD algorithm that is also language independent. Rather than classifying short segments of the audio as either “speech” or “silence”, the VAD as disclosed herein employs a soft-decision mechanism. The VAD outputs a speech-presence probability, which is based on a variety of characteristics. | 02-05-2015 |
20150039306 | System and Method of Automated Evaluation of Transcription Quality - Systems and methods automatically evaluate transcription quality. Audio data is obtained. The audio data is segmented into a plurality of utterances with a voice activity detector operating on a computer processor. The plurality of utterances are transcribed into at least one word lattice with a large vocabulary continuous speech recognition system operating on the processor. A minimum Bayes risk decoder is applied to the at least one word lattice to create at least one confusion network. At least one conformity ratio is calculated from the at least one confusion network. | 02-05-2015 |
20150066504 | System and Method for Determining the Compliance of Agent Scripts - Systems and methods of script identification in audio data. The audio data is segmented into a plurality of utterances. A script model representative of a script text is obtained. The plurality of utterances are decoded with the script model. A determination is made as to whether the script text occurred in the audio data. | 03-05-2015 |