Entries |
Document | Title | Date |
20080208578 | Robust Speaker-Dependent Speech Recognition System - The present invention provides a method of incorporating speaker-dependent expressions into a speaker-independent speech recognition system that provides training data for a plurality of environmental conditions and for a plurality of speakers. The speaker-dependent expression is transformed into a sequence of feature vectors, and a mixture density of the set of speaker-independent training data is determined that has a minimum distance to the generated sequence of feature vectors. The determined mixture density is then assigned to a Hidden Markov Model (HMM) state of the speaker-dependent expression. Therefore, speaker-dependent training data and references no longer have to be explicitly stored in the speech recognition system. Moreover, by representing a speaker-dependent expression by speaker-independent training data, an environmental adaptation is inherently provided. Additionally, the invention provides generation of artificial feature vectors on the basis of the speaker-dependent expression, providing a substantial improvement in the robustness of the speech recognition system with respect to varying environmental conditions. | 08-28-2008 |
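The minimum-distance assignment described in this abstract can be sketched as follows. This is an illustrative simplification (plain Euclidean distance to mixture mean vectors rather than a full likelihood-based criterion), and all function names and the toy vectors are hypothetical:

```python
import math

def nearest_mixture(feature_vec, mixture_means):
    # Index of the speaker-independent mixture mean closest to the vector.
    # Plain Euclidean distance stands in for the patent's distance measure.
    best_idx, best_dist = -1, float("inf")
    for idx, mean in enumerate(mixture_means):
        dist = math.sqrt(sum((f - m) ** 2 for f, m in zip(feature_vec, mean)))
        if dist < best_dist:
            best_idx, best_dist = idx, dist
    return best_idx

def assign_hmm_states(feature_seq, mixture_means):
    # Assign each feature vector of the speaker-dependent expression to
    # the nearest speaker-independent mixture (one HMM state per vector).
    return [nearest_mixture(v, mixture_means) for v in feature_seq]

# Hypothetical 2-D feature vectors and three mixture means.
means = [(0.0, 0.0), (5.0, 5.0), (10.0, 0.0)]
seq = [(0.2, -0.1), (4.8, 5.3), (9.5, 0.4)]
print(assign_hmm_states(seq, means))  # → [0, 1, 2]
```

Because only the indices of speaker-independent densities are stored, no speaker-dependent reference vectors need to be kept, which is the storage benefit the abstract claims.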
20080215322 | Method and System for Generating Training Data for an Automatic Speech Recogniser - The invention describes a method and a system for generating training data (D | 09-04-2008 |
20080221884 | MOBILE ENVIRONMENT SPEECH PROCESSING FACILITY - In embodiments of the present invention improved capabilities are described for a mobile environment speech processing facility. The present invention may provide for the entering of text into a software application resident on a mobile communication facility, where recorded speech may be presented by the user using the mobile communications facility's resident capture facility. Transmission of the recording may be provided through a wireless communication facility to a speech recognition facility, and may be accompanied by information related to the software application. Results may be generated utilizing the speech recognition facility that may be independent of structured grammar, and may be based at least in part on the information relating to the software application and the recording. The results may then be transmitted to the mobile communications facility, where they may be loaded into the software application. In embodiments, the user may be allowed to alter the results that are received from the speech recognition facility. | 09-11-2008 |
20080249773 | METHOD AND SYSTEM FOR THE AUTOMATIC GENERATION OF SPEECH FEATURES FOR SCORING HIGH ENTROPY SPEECH - A method and system for automatically generating a scoring model for scoring a speech sample are disclosed. One or more training speech samples are received in response to a prompt. One or more speech features are determined for each of the training speech samples. A scoring model is then generated based on the speech features. At least one of the training speech samples may be a high entropy speech sample. An evaluation speech sample is received and a score is assigned to the evaluation speech sample using the scoring model. The evaluation speech sample may be a high entropy speech sample. | 10-09-2008 |
20080281593 | Apparatus for Reducing Spurious Insertions in Speech Recognition - Techniques for improving an automatic baseform generation system. More particularly, the invention provides techniques for reducing insertion of spurious speech events in a word or phone sequence generated by an automatic baseform generation system. Such automatic baseform generation techniques may be accomplished by enhancing the scores of long-lasting speech events with respect to the scores of short-lasting events. For example, this may be achieved by merging competing candidates that relate to the same speech event (e.g., phone or word) and that overlap in time into a single candidate, the score of which may be equal to the sum of the scores of the merged candidates. | 11-13-2008 |
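The candidate-merging step this abstract describes can be sketched as below; the `(label, start, end, score)` representation and the integer scores are hypothetical simplifications:

```python
def overlaps(a, b):
    # Two (label, start, end, ...) candidates overlap in time.
    return a[1] < b[2] and b[1] < a[2]

def merge_candidates(cands):
    # Merge candidates for the same speech event (same phone/word label)
    # that overlap in time into a single candidate whose score is the sum
    # of the parts, boosting long-lasting events over short spurious ones.
    merged = []
    for label, start, end, score in sorted(cands, key=lambda c: c[1]):
        for m in merged:
            if m[0] == label and overlaps(m, (label, start, end)):
                m[1], m[2], m[3] = min(m[1], start), max(m[2], end), m[3] + score
                break
        else:
            merged.append([label, start, end, score])
    return merged

# Two overlapping "AH" candidates collapse into one with summed score.
print(merge_candidates([("AH", 0, 30, 4), ("AH", 20, 50, 3), ("T", 60, 70, 2)]))
# → [['AH', 0, 50, 7], ['T', 60, 70, 2]]
```

Summing the scores is exactly what favors long-lasting events: a phone supported by several overlapping hypotheses ends up with a higher score than a short isolated insertion.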
20080300876 | Speech Recognition Device Using Statistical Language Model - [Object] To provide recognition of natural speech for a speech application in a grammar method with little effort and cost. | 12-04-2008 |
20090006091 | APPARATUSES AND METHODS FOR HANDLING RECORDED VOICE STRINGS - An apparatus, method, and computer program product for facilitating the identification and manipulation of recorded voice strings is provided. The apparatus includes a processor for receiving a voice string that has been recorded. The processor automatically assigns the recorded voice string a name that is indicative of the content of the voice string or of a characteristic of the voice string but which may include other information regarding the voice string. Thus, the voice string may be assigned a name that provides the user with an idea of the content or circumstance of the voice string when it was recorded without requiring the user to input a name for the recorded voice string. In this way, the user may be able to access the recorded voice string more easily. The apparatus may also include a microphone, memory element, and/or a display for presenting a list of recorded voice strings. | 01-01-2009 |
20090030687 | ADAPTING AN UNSTRUCTURED LANGUAGE MODEL SPEECH RECOGNITION SYSTEM BASED ON USAGE - A user may control a mobile communication facility through recognized speech provided to the mobile communication facility. Speech that is recorded by a user using a mobile communication facility resident capture facility is transmitted through a wireless communication facility to a speech recognition facility. The speech recognition facility generates results using an unstructured language model based at least in part on information relating to the recording. The results are transmitted to the mobile communications facility, where an action is performed on the mobile communication facility based on the results, and the speech recognition facility is adapted based on usage. | 01-29-2009 |
20090030688 | TAGGING SPEECH RECOGNITION RESULTS BASED ON AN UNSTRUCTURED LANGUAGE MODEL FOR USE IN A MOBILE COMMUNICATION FACILITY APPLICATION - Entering information into a software application resident on a mobile communication facility comprises recording speech presented by a user using a mobile communication facility resident capture facility, transmitting the recording through a wireless communication facility to a speech recognition facility, transmitting information relating to the software application to the speech recognition facility, generating results utilizing the speech recognition facility using an unstructured language model based at least in part on the information relating to the software application and the recording, tagging the results with information about the words in the results, transmitting the results and tags to the mobile communications facility, and loading the results and tags into the software application. | 01-29-2009 |
20090037172 | Method for generating a vector codebook, method and device for compressing data, and distributed speech recognition system - A method for compressing data, the data being represented by an input vector having Q features, wherein Q is an integer higher than 1, including the steps of 1) providing a vector codebook of sub-sets of indexed Q-feature reference vectors and threshold values associated with the sub-sets for a prefixed feature; 2) identifying a sub-set of reference vectors among the sub-sets by progressively comparing the value of a feature of the input vector which corresponds to the prefixed feature, with the threshold values associated with the sub-sets; and 3) identifying the reference vector which, within the sub-set identified in step 2), provides the lowest distortion with respect to the input vector. | 02-05-2009 |
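The two-stage search in steps 2) and 3) of this abstract might look like the sketch below; the threshold layout, the squared-error distortion measure, and the toy codebook are hypothetical:

```python
def find_subset(x_feature, thresholds):
    # Step 2: progressively compare the prefixed feature of the input
    # vector with the subset thresholds to pick one subset.
    for idx, t in enumerate(thresholds):
        if x_feature <= t:
            return idx
    return len(thresholds)

def quantize(x, subsets, thresholds, feature_idx=0):
    # Step 3: exhaustive nearest-neighbour search, but only inside the
    # subset selected above (squared error as a stand-in for distortion).
    subset = subsets[find_subset(x[feature_idx], thresholds)]
    return min(subset, key=lambda r: sum((a - b) ** 2 for a, b in zip(x, r)))

subsets = [[(0, 0), (1, 2)], [(3, 3), (4, 1)], [(6, 6), (7, 2)]]
thresholds = [2, 5]  # subset 0: feature <= 2; subset 1: <= 5; else subset 2
print(quantize((0.9, 1.8), subsets, thresholds))  # → (1, 2)
```

The threshold pre-selection is the point of the method: the full-distortion comparison runs only over one subset instead of the whole codebook.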
20090119103 | SPEAKER RECOGNITION SYSTEM - A method automatically recognizes speech received through an input. The method accesses one or more speaker-independent speaker models. The method detects whether the received speech input matches a speaker model according to an adaptable predetermined criterion. The method creates a speaker model assigned to a speaker model set when no match occurs based on the input. | 05-07-2009 |
20090132249 | MODIFYING METHOD FOR SPEECH MODEL AND MODIFYING MODULE THEREOF - A modifying method for a speech model and a modifying module thereof are provided. The modifying method is as follows. First, a correct sequence of a speech is generated according to a correct sequence generating method and the speech model. Next, a candidate sequence generating method is selected from a plurality of candidate sequence generating methods, and a candidate sequence of the speech is generated according to the selected candidate sequence generating method and the speech model. Finally, the speech model is modified according to the correct sequence and the candidate sequence. Therefore, the present invention increases a discrimination of the speech model. | 05-21-2009 |
20090138263 | Data Process unit and data process unit control program - To provide a data process unit and data process unit control program which are suitable for generating acoustic models for unspecified speakers taking distribution of diversifying feature parameters into consideration under such specific conditions as the type of speaker, speech lexicons, speech styles, and speech environment and which are suitable for providing acoustic models intended for unspecified speakers and adapted to speech of a specific person. | 05-28-2009 |
20090150148 | VOICE RECOGNITION APPARATUS AND MEMORY PRODUCT - A voice recognition apparatus reduces false recognition caused by matching against phrases composed of a small number of syllables. The apparatus performs a recognition process, by pronunciation unit such as a syllable, on voice data based on voice produced by a speaker, and further performs recognition by a method such as Word Spotting for matching against the phrases stored in the phrase database. The voice recognition apparatus then performs a recognition process that compares the result of the recognition by pronunciation unit with extended phrases obtained by adding an additional phrase before and/or behind the respective phrases. | 06-11-2009 |
20090157401 | Semantic Decoding of User Queries - An intelligent query system for processing voiced-based queries is disclosed, which uses semantic based processing to identify the question posed by the user by understanding the meaning of the user's utterance. Based on identifying the meaning of the utterance, the system selects a single answer that best matches the user's query. The answer that is paired to this single question is then retrieved and presented to the user. The system, as implemented, accepts environmental variables selected by the user and is scalable to provide answers to a variety and quantity of user-initiated queries. | 06-18-2009 |
20090228275 | SYSTEM FOR ENHANCING LIVE SPEECH WITH INFORMATION ACCESSED FROM THE WORLD WIDE WEB - A system that includes a speaker workstation and a system that includes an auditor device. The speaker workstation is configured to perform a method for generating a Speech Hyperlink-Time table in conjunction with a system of universal time. The speaker workstation creates a Speech Hyperlink table. While a speech is being spoken by a speaker, the speaker workstation recognizes each hyperlinked term of the Speech Hyperlink table being spoken by the speaker, and for each recognized hyperlinked term, generates a row in the Speech Hyperlink-Time table. The auditor device is configured to perform a method for processing a speech in conjunction with a system of universal time. The auditor device determines and records, in a record of a Selections Hyperlink-Time table, a universal time corresponding to a hyperlinked term spoken during a speech. | 09-10-2009 |
20090319269 | Method of Trainable Speaker Diarization - A novel and useful method of using labeled training data and machine learning tools to train a speaker diarization system. Intra-speaker variability profiles are created from training data consisting of an audio stream labeled where speaker changes occur (i.e. which participant is speaking at any given time). These intra-speaker variability profiles are then applied to an unlabeled audio stream to segment the audio stream into speaker homogeneous segments and to cluster segments according to speaker identity. | 12-24-2009 |
20100057461 | METHOD AND SYSTEM FOR CREATING OR UPDATING ENTRIES IN A SPEECH RECOGNITION LEXICON - In a method and a system ( | 03-04-2010 |
20100057462 | Speech Recognition - The present invention relates to a method for speech recognition of a speech signal comprising the steps of providing at least one codebook comprising codebook entries, in particular, multivariate Gaussians of feature vectors, that are frequency weighted such that higher weights are assigned to entries corresponding to frequencies below a predetermined level than to entries corresponding to frequencies above the predetermined level and processing the speech signal for speech recognition comprising extracting at least one feature vector from the speech signal and matching the feature vector with the entries of the codebook. | 03-04-2010 |
20100063817 | ACOUSTIC MODEL REGISTRATION APPARATUS, TALKER RECOGNITION APPARATUS, ACOUSTIC MODEL REGISTRATION METHOD AND ACOUSTIC MODEL REGISTRATION PROCESSING PROGRAM - An acoustic model registration apparatus, a talker recognition apparatus, an acoustic model registration method and an acoustic model registration processing program, each of which reliably prevents an acoustic model having a low recognition capability for a talker from being registered, are provided. | 03-11-2010 |
20100070276 | METHOD AND APPARATUS FOR INTERACTION OR DISCOURSE ANALYTICS - A method and apparatus for analyzing and segmenting a vocal interaction captured in a test audio source, the test audio source captured within an environment. The method and apparatus first use text and acoustic features extracted from the interaction with tagging information, for constructing a model. Then, at production time, text and acoustic features are extracted from the interactions, and by applying the model, tagging information is retrieved for the interaction, enabling analysis, flow visualization or further processing of the interaction. | 03-18-2010 |
20100106501 | Updating a Voice Template - Updating a voice template for recognizing a speaker on the basis of a voice uttered by the speaker is disclosed. Stored voice templates indicate distinctive characteristics of utterances from speakers. Distinctive characteristics are extracted for a specific speaker based on a voice message utterance received from that speaker. The distinctive characteristics are compared to the characteristics indicated by the stored voice templates to select a template that matches within a predetermined threshold. The selected template is updated on the basis of the extracted characteristics. | 04-29-2010 |
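A minimal sketch of this select-then-update loop, assuming Euclidean distance over fixed-length characteristic vectors; the function names, the exponential-smoothing update, and the rate are hypothetical choices, not the patent's stated method:

```python
import math

def match_template(features, templates, threshold):
    # Index of the stored template whose characteristics fall within the
    # predetermined threshold of the extracted ones, or None if no match.
    best = min(range(len(templates)), key=lambda i: math.dist(features, templates[i]))
    return best if math.dist(features, templates[best]) <= threshold else None

def update_template(template, features, rate=0.1):
    # Nudge the selected template toward the newly extracted
    # characteristics (a simple exponential-smoothing update).
    return [t + rate * (f - t) for t, f in zip(template, features)]

templates = [[1.0, 2.0], [5.0, 5.0]]
i = match_template([1.2, 2.1], templates, threshold=0.5)
if i is not None:
    templates[i] = update_template(templates[i], [1.2, 2.1])
```

A small update rate keeps the template stable while still letting it track gradual drift in the speaker's voice.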
20100131272 | APPARATUS AND METHOD FOR GENERATING AND VERIFYING A VOICE SIGNATURE OF A MESSAGE AND COMPUTER READABLE MEDIUM THEREOF - Apparatuses and methods for generating and verifying a voice signature of a message and computer readable medium thereof are provided. The generation and verification ends both use the same set of pronounceable symbols. The set of pronounceable symbols comprises a plurality of pronounceable units, and each of the pronounceable units comprises an index and a pronounceable symbol. The generation end converts the message into a message digest by a hash function and generates a plurality of designated pronounceable symbols according to the message digest. A user utters the designated pronounceable symbols to generate the voice signature. After receiving the message and the voice signature, the verification end performs voice authentication to determine a user identity of the voice signature, performs speech recognition to determine the relation between the message and the voice signature, and determines whether the user generates the voice signature for the message. | 05-27-2010 |
20100138222 | Method for Adapting a Codebook for Speech Recognition - A method for adapting a codebook for speech recognition, wherein the codebook is from a set of codebooks comprising a speaker-independent codebook and at least one speaker-dependent codebook is disclosed. A speech input is received and a feature vector based on the received speech input is determined. For each of the Gaussian densities, a first mean vector is estimated using an expectation process and taking into account the determined feature vector. For each of the Gaussian densities, a second mean vector using an Eigenvoice adaptation is determined taking into account the determined feature vector. For each of the Gaussian densities, the mean vector is set to a convex combination of the first and the second mean vector. Thus, this process allows for adaptation during operation and does not require a lengthy training phase. | 06-03-2010 |
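The per-density convex combination in this abstract reduces to one line; `alpha` and the toy mean vectors are hypothetical, and a real system would apply this to every Gaussian mean vector of the codebook:

```python
def combined_mean(em_mean, eigen_mean, alpha):
    # Convex combination (alpha in [0, 1]) of the expectation-step mean
    # and the Eigenvoice-adapted mean for one Gaussian density.
    return [alpha * a + (1 - alpha) * b for a, b in zip(em_mean, eigen_mean)]

print(combined_mean([2.0, 4.0], [0.0, 0.0], alpha=0.5))  # → [1.0, 2.0]
```

The interpolation is what allows adaptation during operation: the Eigenvoice term gives a robust prior-like estimate from little data, while the expectation-step term dominates as more speech accumulates.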
20100153108 | METHOD FOR DYNAMIC LEARNING OF INDIVIDUAL VOICE PATTERNS - The present invention is a system and method for generating a personal voice font, including monitoring voice segments automatically from phone conversations of a user by a voice learning processor to generate a personalized voice font and delivering the personalized voice font (PVF) to a server. | 06-17-2010 |
20100153109 | METHOD AND APPARATUS FOR SPEECH SEGMENTATION - Machine-readable media, methods, apparatus and system for speech segmentation are described. In some embodiments, a fuzzy rule may be determined to discriminate a speech segment from a non-speech segment. An antecedent of the fuzzy rule may include an input variable and an input variable membership. A consequent of the fuzzy rule may include an output variable and an output variable membership. An instance of the input variable may be extracted from a segment. An input variable membership function associated with the input variable membership and an output variable membership function associated with the output variable membership may be trained. The instance of the input variable, the input variable membership function, the output variable, and the output variable membership function may be operated, to determine whether the segment is the speech segment or the non-speech segment. | 06-17-2010 |
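One way to picture a single such fuzzy rule, with triangular membership functions; the input variables (frame energy and zero-crossing rate), the breakpoints, and the 0.5 decision cutoff are entirely hypothetical:

```python
def tri(x, a, b, c):
    # Triangular membership function: rises from a, peaks at b, falls to c.
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

def is_speech(energy, zcr):
    # Hypothetical rule: IF energy is high AND zero-crossing rate is
    # moderate THEN the segment is speech (min as the fuzzy AND).
    degree = min(tri(energy, 0.3, 1.0, 1.7), tri(zcr, 0.1, 0.4, 0.7))
    return degree > 0.5, degree

print(is_speech(1.0, 0.4))  # → (True, 1.0)
```

Training, as described in the abstract, would adjust the membership-function breakpoints rather than hard-code them as done here.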
20100161330 | SPEECH MODELS GENERATED USING COMPETITIVE TRAINING, ASYMMETRIC TRAINING, AND DATA BOOSTING - Speech models are trained using one or more of three different training systems. They include competitive training which reduces a distance between a recognized result and a true result, data boosting which divides and weights training data, and asymmetric training which trains different model components differently. | 06-24-2010 |
20100169093 | INFORMATION PROCESSING APPARATUS, METHOD AND RECORDING MEDIUM FOR GENERATING ACOUSTIC MODEL - An information processing apparatus for speech recognition includes a first speech dataset storing speech data uttered by low recognition rate speakers; a second speech dataset storing speech data uttered by a plurality of speakers; a third speech dataset storing speech data to be mixed with the speech data of the second speech dataset; a similarity calculating part obtaining, for each piece of the speech data in the second speech dataset, a degree of similarity to a given average voice in the first speech dataset; a speech data selecting part recording the speech data, the degree of similarity of which is within a given selection range, as selected speech data in the third speech dataset; and an acoustic model generating part generating a first acoustic model using the speech data recorded in the second speech dataset and the third speech dataset. | 07-01-2010 |
20100204990 | SPEECH ANALYZER AND SPEECH ANALYSIS METHOD - A speech analyzer includes a vocal tract and sound source separating unit which separates a vocal tract feature and a sound source feature from an input speech, based on a speech generation model, a fundamental frequency stability calculating unit which calculates a temporal stability of a fundamental frequency of the input speech in the sound source feature, from the separated sound source feature, a stable analyzed period extracting unit which extracts time information of a stable period, based on the temporal stability, and a vocal tract feature interpolation unit which interpolates a vocal tract feature which is not included in the stable period, using a vocal tract feature included in the extracted stable period. | 08-12-2010 |
20100250251 | ADAPTATION FOR STATISTICAL LANGUAGE MODEL - Architecture that suppresses the unexpected appearance of words by applying appropriate restrictions to long-term and short-term memory. The quickness of adaptation is also realized by leveraging the restriction. The architecture includes a history component for processing user input history for conversion of a phonetic string by a conversion process that output conversion results, and an adaptation component for adapting the conversion process to the user input history based on restriction(s) applied to short-term memory that impacts word appearances during the conversion process. The architecture performs probability boosting based on context-dependent probability differences (short-term memory), and dynamic linear-interpolation between long-term memory and baseline language model based on frequency of preceding context of word (long-term memory). | 09-30-2010 |
20100268536 | SYSTEM AND METHOD FOR IMPROVING PERFORMANCE OF SEMANTIC CLASSIFIERS IN SPOKEN DIALOG SYSTEMS - A method and apparatus for continuously improving the performance of semantic classifiers in the scope of spoken dialog systems are disclosed. Rule-based or statistical classifiers are replaced with better performing rule-based or statistical classifiers and/or certain parameters of existing classifiers are modified. The replacement classifiers or new parameters are trained and tested on a collection of transcriptions and annotations of utterances which are generated manually or in a partially automated fashion. Automated quality assurance leads to more accurate training and testing data, higher classification performance, and feedback into the design of the spoken dialog system by suggesting changes to improve system behavior. | 10-21-2010 |
20100292988 | SYSTEM AND METHOD FOR SPEECH RECOGNITION - A speech recognition system samples speech signals having the same meaning, and obtains frequency spectrum images of the speech signals. Training objects are obtained by modifying the frequency spectrum images to be the same width. The speech recognition system obtains specific data of the speech signals by analyzing the training objects. The specific data is linked with the meaning of the speech signals. The specific data may include probability values representing probabilities that the training objects appear at different points in an image area of the training objects. A speech command may be sampled, and a frequency spectrum image of the speech command is modified to be the same width as the training objects. The speech recognition system can determine a meaning of a speech command by determining a matching degree of the modified frequency spectrum image of the speech command and the specific data of the speech signals. | 11-18-2010 |
20100324897 | AUDIO RECOGNITION DEVICE AND AUDIO RECOGNITION METHOD - Acoustic models and language models are learned according to a speaking length which indicates a length of a speaking section in speech data, and a speech recognition process is implemented by using the learned acoustic models and language models. A speech recognition apparatus includes means ( | 12-23-2010 |
20110004473 | APPARATUS AND METHOD FOR ENHANCED SPEECH RECOGNITION - A method and apparatus for improving speech recognition results for an audio signal captured within an organization, comprising: receiving the audio signal captured by a capturing or logging device; extracting a phonetic feature and an acoustic feature from the audio signal; decoding the phonetic feature into a phonetic searchable structure; storing the phonetic searchable structure and the acoustic feature in an index; performing phonetic search for a word or a phrase in the phonetic searchable structure to obtain a result; activating an audio analysis engine which receives the acoustic feature to validate the result and obtain an enhanced result. | 01-06-2011 |
20110029311 | VOICE PROCESSING DEVICE AND METHOD, AND PROGRAM - There is provided a voice processing device. The device includes: score calculation unit configured to calculate a score indicating compatibility of a voice signal input on the basis of an utterance of a user with each of plural pieces of intention information indicating each of a plurality of intentions; intention selection unit configured to select the intention information indicating the intention of the utterance of the user among the plural pieces of intention information on the basis of the score calculated by the score calculation unit; and intention reliability calculation unit configured to calculate the reliability with respect to the intention information selected by the intention selection unit on the basis of the score calculated by the score calculation unit. | 02-03-2011 |
20110029312 | METHODS AND SYSTEMS FOR ADAPTING A MODEL FOR A SPEECH RECOGNITION SYSTEM - Methods are disclosed for identifying possible errors made by a speech recognition system without using a transcript of words input to the system. A method for model adaptation for a speech recognition system includes determining an error rate, corresponding to either recognition of instances of a word or recognition of instances of various words, without using a transcript of words input to the system. The method may further include adjusting an adaptation, of the model for the word or various models for the various words, based on the error rate. Apparatus are disclosed for identifying possible errors made by a speech recognition system without using a transcript of words input to the system. An apparatus for model adaptation for a speech recognition system includes a processor adapted to estimate an error rate, corresponding to either recognition of instances of a word or recognition of instances of various words, without using a transcript of words input to the system. The apparatus may further include a controller adapted to adjust an adaptation of the model for the word or various models for the various words, based on the error rate. | 02-03-2011 |
20110046952 | ACOUSTIC MODEL LEARNING DEVICE AND SPEECH RECOGNITION DEVICE - Parameters of a first variation model, a second variation model and an environment-independent acoustic model are estimated in such a way that an integrated degree of fitness obtained by integrating a degree of fitness of the first variation model to the sample speech data, a degree of fitness of the second variation model to the sample speech data, and a degree of fitness of the environment-independent acoustic model to the sample speech data becomes the maximum. Therefore, when constructing an acoustic model by using sample speech data affected by a plurality of acoustic environments, the effect on a speech which is caused by each of the acoustic environments can be extracted with high accuracy. | 02-24-2011 |
20110060588 | Method and System for Automatic Speech Recognition with Multiple Contexts - A method and a system for activating functions including a first function and a second function, wherein the system is embedded in an apparatus, are disclosed. The system includes a control configured to be activated by a plurality of activation styles, wherein the control generates a signal indicative of a particular activation style from multiple activation styles; and controller configured to activate either the first function or the second function based on the particular activation style, wherein the first function is configured to be executed based only on the activation style, and wherein the second function is further configured to be executed based on a speech input. | 03-10-2011 |
20110071829 | IMAGE PROCESSING APPARATUS, SPEECH RECOGNITION PROCESSING APPARATUS, CONTROL METHOD FOR SPEECH RECOGNITION PROCESSING APPARATUS, AND COMPUTER-READABLE STORAGE MEDIUM FOR COMPUTER PROGRAM - An image processing apparatus includes a speech input portion that receives an input of speech from a user, a dictionary storage portion that stores a dictionary configured by phrase information pieces for recognizing the speech, a compound phrase generation portion that generates a plurality of compound phrases formed by all combinations of a plurality of predetermined phrases in different orders, a compound phrase registration portion that registers the plurality of compound phrases that have been generated in the dictionary as the phrase information pieces, a speech recognition portion that, in a case where speech including a speech phrase formed by the plurality of predetermined phrases said in an arbitrary order has been input, performs speech recognition on the speech by searching the dictionary for a compound phrase that matches the speech phrase. | 03-24-2011 |
20110082696 | SYSTEM AND METHOD FOR SPEECH-ENABLED ACCESS TO MEDIA CONTENT - Disclosed herein are systems, methods, and computer-readable storage media for generating a speech recognition model for a media content retrieval system. The method causes a computing device to retrieve information describing media available in a media content retrieval system, construct a graph that models how the media are interconnected based on the retrieved information, rank the information describing the media based on the graph, and generate a speech recognition model based on the ranked information. The information can be a list of actors, directors, composers, titles, and/or locations. The graph that models how the media are interconnected can further model pieces of common information between two or more media. The method can further cause the computing device to weight the graph based on the retrieved information. The graph can further model relative popularity information in the list. The method can rank information based on a PageRank algorithm. | 04-07-2011 |
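Ranking "based on a PageRank algorithm" over the media graph can be sketched with plain power iteration; the toy graph, the damping factor, and the dangling-node handling are hypothetical:

```python
def pagerank(graph, damping=0.85, iters=50):
    # graph: {node: [outgoing neighbours]}. Returns a rank per node.
    nodes = list(graph)
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iters):
        new = {n: (1 - damping) / len(nodes) for n in nodes}
        for n, outs in graph.items():
            targets = outs or nodes  # dangling node: spread rank evenly
            for m in targets:
                new[m] += damping * rank[n] / len(targets)
        rank = new
    return rank

# A title linked from both an actor and a director outranks either alone,
# so its name would be favored by the generated speech recognition model.
ranks = pagerank({"movie": ["actor", "director"],
                  "actor": ["movie"], "director": ["movie"]})
```

Weighting the edges by the retrieved metadata (as the abstract suggests) would only change the `damping * rank[n] / len(targets)` share each neighbour receives.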
20110093265 | Systems and Methods for Creating and Using Geo-Centric Language Models - Systems and methods for creating and using geo-centric language models are provided herein. An exemplary method includes assigning each of a plurality of listings to a local service area, determining a geographic center for the local service area, computing a listing density for the local service area, and selecting a desired number of listings for a geo-centric listing set. The geo-centric listing set includes a subset of the plurality of listings. The exemplary method further includes dividing the local service area into regions based upon the listing density and the number of listings in the geo-centric listing set, and building a language model for the geo-centric listing set. | 04-21-2011 |
20110144991 | Compressing Feature Space Transforms - Methods for compressing a transform associated with a feature space are presented. For example, a method for compressing a transform associated with a feature space includes obtaining the transform including a plurality of transform parameters, assigning each of a plurality of quantization levels for the plurality of transform parameters to one of a plurality of quantization values, and assigning each of the plurality of transform parameters to one of the plurality of quantization values to which one of the plurality of quantization levels is assigned. One or more of obtaining the transform, assigning of each of the plurality of quantization levels, and assigning of each of the transform parameters are implemented as instruction code executed on a processor device. Further, a Viterbi algorithm may be employed for use in non-uniform level/value assignments. | 06-16-2011 |
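The level/value assignment can be sketched as nearest-value scalar quantization; the parameter list and quantization values are hypothetical, and the Viterbi-based non-uniform assignment the abstract mentions is not shown:

```python
def quantize_params(params, values):
    # Assign each transform parameter to the nearest quantization value;
    # returns the per-parameter level indices and the reconstructed values.
    indices = [min(range(len(values)), key=lambda i: abs(p - values[i]))
               for p in params]
    return indices, [values[i] for i in indices]

idx, recon = quantize_params([0.9, -0.2, 0.1], [-1.0, 0.0, 1.0])
print(idx, recon)  # → [2, 1, 1] [1.0, 0.0, 0.0]
```

Only the small index list and the shared value table need to be stored, which is what compresses the feature-space transform.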
20110144992 | UNSUPERVISED LEARNING USING GLOBAL FEATURES, INCLUDING FOR LOG-LINEAR MODEL WORD SEGMENTATION - Described is a technology for performing unsupervised learning using global features extracted from unlabeled examples. The unsupervised learning process may be used to train a log-linear model, such as for use in morphological segmentation of words. For example, segmentations of the examples are sampled based upon the global features to produce a segmented corpus and log-linear model, which are then iteratively reprocessed to produce a final segmented corpus and a log-linear model. | 06-16-2011 |
20110144993 | Disfluent-utterance tracking system and method - A disfluent-utterance tracking system includes a speech transducer; one or more targeted-disfluent-utterance records stored in a memory; a real-time speech recording mechanism operatively connected with the speech transducer for recording a real-time utterance; and an analyzer operatively coupled with the targeted-disfluent-utterance record and with the real-time speech recording mechanism, the analyzer configured to compare one or more real-time snippets of the recorded speech with the targeted-disfluent-utterance record to determine and indicate to a user a level of correlation therebetween. | 06-16-2011 |
20110153327 | SYSTEMS AND METHODS FOR IDENTITY MATCHING BASED ON PHONETIC AND EDIT DISTANCE MATCHING - According to embodiments of the present disclosure, a matching module is configured to accurately match a probe identity of an entity to a collection of entities. The matching module is configured to match the probe identity of the entity to the collection of entities based on a combination of phonetic matching processes and edit distance processes. The matching module is configured to create phonetic groups for name parts of identities in the collection. The matching module is configured to compare probe name parts of the probe identity to the name parts associated with the phonetic groups. | 06-23-2011 |
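The combination of phonetic grouping and edit-distance comparison described above might look like the following sketch. Soundex and Levenshtein distance are used here as stand-in algorithms; the application does not specify these particular choices, and the `max_edits` threshold is invented.

```python
def soundex(name):
    """American Soundex: first letter plus up to three digit codes."""
    codes = {**dict.fromkeys("BFPV", "1"), **dict.fromkeys("CGJKQSXZ", "2"),
             **dict.fromkeys("DT", "3"), "L": "4",
             **dict.fromkeys("MN", "5"), "R": "6"}
    name = name.upper()
    result, prev = name[0], codes.get(name[0], "")
    for ch in name[1:]:
        code = codes.get(ch, "")
        if code and code != prev:
            result += code
        if ch not in "HW":  # H and W do not reset the previous code
            prev = code
    return (result + "000")[:4]

def edit_distance(a, b):
    """Levenshtein distance via a rolling dynamic-programming row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1,
                            prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]

def match(probe, collection, max_edits=2):
    """A probe name part matches a stored name part if both fall in the
    same phonetic group and are within max_edits character edits."""
    group = soundex(probe)
    return [name for name in collection
            if soundex(name) == group
            and edit_distance(probe.upper(), name.upper()) <= max_edits]
```

For example, `match("Rupert", ["Robert", "Roberts", "Jackson"])` keeps only "Robert": all of "Rupert", "Robert", and "Roberts" share the phonetic group R163, but "Roberts" is three edits away.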
20110166858 | METHOD OF RECOGNIZING SPEECH - A method for recognizing speech involves presenting an utterance to a speech recognition system and determining, via the speech recognition system, that the utterance contains a particular expression, where the particular expression is capable of being associated with at least two different meanings. The method further involves splitting the utterance into a plurality of speech frames, where each frame is assigned a predetermined time segment and a frame number, and indexing the utterance to i) a predetermined frame number, or ii) a predetermined time segment. The indexing of the utterance identifies that one of the frames includes the particular expression. Then the frame including the particular expression is re-presented to the speech recognition system to verify that the particular expression was actually recited in the utterance. | 07-07-2011 |
20110218804 | SPEECH PROCESSOR, A SPEECH PROCESSING METHOD AND A METHOD OF TRAINING A SPEECH PROCESSOR - A speech recognition method, the method involving: | 09-08-2011 |
20110231189 | METHODS AND APPARATUS FOR EXTRACTING ALTERNATE MEDIA TITLES TO FACILITATE SPEECH RECOGNITION - Techniques for generating a set of one or more alternate titles associated with stored digital media content and updating a speech recognition system to enable the speech recognition system to recognize the set of alternate titles. The system operates on an original media title to extract a set of alternate media titles by applying at least one rule to the original title. The extracted set of alternate media titles are used to update the speech recognition system prior to runtime. In one aspect rules that are applied to original titles are determined by analyzing a corpus of original titles and corresponding possible alternate media titles that a user may use to refer to the original titles. | 09-22-2011 |
20110231190 | METHOD OF AND SYSTEM FOR PROVIDING ADAPTIVE RESPONDENT TRAINING IN A SPEECH RECOGNITION APPLICATION - A system for conducting a telephonic speech recognition application includes an automated telephone device for making telephonic contact with a respondent and a speech recognition device which, upon the telephonic contact being made, presents the respondent with at least one introductory prompt for the respondent to reply to; receives a spoken response from the respondent; and performs a speech recognition analysis on the spoken response to determine a capability of the respondent to complete the application. If the speech recognition device, based on the spoken response to the introductory prompt, determines that the respondent is capable of completing the application, the speech recognition device presents at least one application prompt to the respondent. If the speech recognition device, based on the spoken response to the introductory prompt, determines that the respondent is not capable of completing the application, the speech recognition system presents instructions on completing the application to the respondent. | 09-22-2011 |
20110231191 | Weight Coefficient Generation Device, Voice Recognition Device, Navigation Device, Vehicle, Weight Coefficient Generation Method, and Weight Coefficient Generation Program - A weight coefficient generation device, a speech recognition device, a navigation system, a vehicle, a weight coefficient generation method, and a weight coefficient generation program are provided for the purpose of improving the speech recognition performance for place names. In order to address the above purpose, an address database | 09-22-2011 |
20110276329 | SPEECH DIALOGUE APPARATUS, DIALOGUE CONTROL METHOD, AND DIALOGUE CONTROL PROGRAM - A speech dialogue apparatus, a dialogue control method, and a dialogue control program are provided, whereby an appropriate dialogue control is enabled by determining a user's proficiency level in a dialogue behavior correctly and performing an appropriate dialogue control according to the user's proficiency level correctly determined, without being influenced by an accidental one-time behavior of the user. An input unit | 11-10-2011 |
20110295602 | APPARATUS AND METHOD FOR MODEL ADAPTATION FOR SPOKEN LANGUAGE UNDERSTANDING - An apparatus and a method are provided for building a spoken language understanding model. Labeled data may be obtained for a target application. A new classification model may be formed for use with the target application by using the labeled data for adaptation of an existing classification model. In some implementations, the existing classification model may be used to determine the most informative examples to label. | 12-01-2011 |
20110301953 | SYSTEM AND METHOD OF MULTI MODEL ADAPTATION AND VOICE RECOGNITION - Provided is a voice recognition system that adapts a speaker's voice, feature by feature, to each of a basic voice model and new independent multi-models, stores the results, and provides stable real-time voice recognition through recognition using a multi-adaptive model. | 12-08-2011 |
20120010885 | System and Method for Unsupervised and Active Learning for Automatic Speech Recognition - A system and method is provided for combining active and unsupervised learning for automatic speech recognition. This process enables a reduction in the amount of human supervision required for training acoustic and language models and an increase in the performance given the transcribed and un-transcribed data. | 01-12-2012 |
20120046946 | SYSTEM AND METHOD FOR MERGING AUDIO DATA STREAMS FOR USE IN SPEECH RECOGNITION APPLICATIONS - A system and method for merging audio data streams receive audio data streams from separate inputs, independently transform each data stream from the time to the frequency domain, and generate separate feature data sets for the transformed data streams. Feature data from each of the separate feature data sets is selected to form a merged feature data set that is output to a decoder for recognition purposes. The separate inputs can include an ear microphone and a mouth microphone. | 02-23-2012 |
20120059653 | METHODS AND SYSTEMS FOR OBTAINING LANGUAGE MODELS FOR TRANSCRIBING COMMUNICATIONS - A method for producing speech recognition results on a device includes receiving first speech recognition results, obtaining a language model, wherein the language model represents information stored on the device, and using the first speech recognition results and the language model to generate second speech recognition results. | 03-08-2012 |
20120059654 | SPEAKER-ADAPTIVE SYNTHESIZED VOICE - An objective is to provide a technique for accurately reproducing features of a fundamental frequency of a target-speaker's voice on the basis of only a small amount of learning data. A learning apparatus learns shift amounts from a reference source F0 pattern to a target F0 pattern of a target-speaker's voice. The learning apparatus associates a source F0 pattern of a learning text to a target F0 pattern of the same learning text by associating their peaks and troughs. For each of points on the target F0 pattern, the learning apparatus obtains shift amounts in a time-axis direction and in a frequency-axis direction from a corresponding point on the source F0 pattern in reference to a result of the association, and learns a decision tree using, as an input feature vector, linguistic information obtained by parsing the learning text, and using, as an output feature vector, the calculated shift amounts. | 03-08-2012 |
20120072217 | SYSTEM AND METHOD FOR USING PROSODY FOR VOICE-ENABLED SEARCH - Disclosed herein are systems, methods, and non-transitory computer-readable storage media for approximating relevant responses to a user query with voice-enabled search. A system practicing the method receives a word lattice generated by an automatic speech recognizer based on a user speech and a prosodic analysis of the user speech, generates a reweighted word lattice based on the word lattice and the prosodic analysis, approximates based on the reweighted word lattice one or more relevant responses to the query, and presents to a user the responses to the query. The prosodic analysis examines metalinguistic information of the user speech and can identify the most salient subject matter of the speech, assess how confident a speaker is in the content of his or her speech, and identify the attitude, mood, emotion, sentiment, etc. of the speaker. Other information not described in the content of the speech can also be used. | 03-22-2012 |
20120101821 | SPEECH RECOGNITION APPARATUS - A speech recognition apparatus is disclosed. The apparatus converts a speech signal into digitized speech data, and performs speech recognition based on the speech data. The apparatus makes a comparison between the speech data inputted the last time and the speech data inputted the time before the last time in response to a user's indication that the speech recognition results in erroneous recognition multiple times in a row. When the speech data inputted the last time is determined to substantially match the speech data inputted the time before the last time, the apparatus outputs a guidance prompting the user to utter the input target by calling it by another name. | 04-26-2012 |
20120116762 | SYSTEM AND METHOD FOR COMMUNICATION TERMINAL SURVEILLANCE BASED ON SPEAKER RECOGNITION - A Candidate Isolation System (CIS) detects subscribers of phone call services as candidates to be surveillance targets. A Voice Matching System (VMS) then decides whether or not a given candidate Communication Terminal (CT) should be tracked by determining, using speaker recognition techniques, whether the subscriber operating the candidate CT is a known target subscriber. The CIS receives from the network call event data that relate to CTs in the network. The CIS detects candidate CTs using a unique candidate isolation process, which applies predefined selection criteria to the received call event data. | 05-10-2012 |
20120130715 | METHOD AND APPARATUS FOR GENERATING A VOICE-TAG - According to one embodiment, an apparatus for generating a voice-tag includes an input unit, a recognition unit, and a combination unit. The input unit is configured to input a registration speech. The recognition unit is configured to recognize the registration speech to obtain N-best recognition results, wherein N is an integer greater than or equal to 2. The combination unit is configured to combine the N-best recognition results as a voice-tag of the registration speech. | 05-24-2012 |
20120203553 | RECOGNITION DICTIONARY CREATING DEVICE, VOICE RECOGNITION DEVICE, AND VOICE SYNTHESIZER - A recognition dictionary creating device includes a user dictionary in which a phoneme label string of an inputted voice is registered and an interlanguage acoustic data mapping table in which a correspondence between phoneme labels in different languages is defined, and refers to the interlanguage acoustic data mapping table to convert the phoneme label string registered in the user dictionary and expressed in a language set at the time of creating the user dictionary into a phoneme label string expressed in another language which the recognition dictionary creating device has switched. | 08-09-2012 |
20120215535 | METHOD AND APPARATUS FOR AUTOMATIC CORRELATION OF MULTI-CHANNEL INTERACTIONS - A method and apparatus for multi-channel categorization, comprising capturing a vocal interaction and a non-vocal interaction, using logging or capturing devices; retrieving a first word from the vocal interaction and a second word from the non-vocal interaction; assigning the vocal interaction into a first category using the first word; assigning the non-vocal interaction into a second category using the second word; and associating the first category and the second category into a multi-channel category, thus aggregating the vocal interaction and the non-vocal interaction. | 08-23-2012 |
20120221333 | Phonetic Features for Speech Recognition - Techniques are disclosed for using phonetic features for speech recognition. For example, a method comprises the steps of obtaining a first dictionary and a training data set associated with a speech recognition system, computing one or more support parameters from the training data set, transforming the first dictionary into a second dictionary, wherein the second dictionary is a function of one or more phonetic labels of the first dictionary, and using the one or more support parameters to select one or more samples from the second dictionary to create a set of one or more exemplar-based class identification features for a pattern recognition task. | 08-30-2012 |
20120232902 | SYSTEM AND METHOD FOR SPEECH RECOGNITION MODELING FOR MOBILE VOICE SEARCH - Disclosed herein are systems, methods, and non-transitory computer-readable storage media for generating an acoustic model for use in speech recognition. A system configured to practice the method first receives training data and identifies non-contextual lexical-level features in the training data. Then the system infers sentence-level features from the training data and generates a set of decision trees by node-splitting based on the non-contextual lexical-level features and the sentence-level features. The system decorrelates training vectors, based on the training data, for each decision tree in the set of decision trees to approximate full-covariance Gaussian models, and then can train an acoustic model for use in speech recognition based on the training data, the set of decision trees, and the training vectors. | 09-13-2012 |
20120239399 | VOICE RECOGNITION DEVICE - Disclosed is a voice recognition device which creates a recognition dictionary (statically-created dictionary) in advance for a vocabulary having words to be recognized whose number is equal to or larger than a threshold, and creates a recognition dictionary (dynamically-created dictionary) for a vocabulary having words to be recognized whose number is smaller than the threshold in an interactive situation. | 09-20-2012 |
20120245939 | METHOD AND SYSTEM FOR CONSIDERING INFORMATION ABOUT AN EXPECTED RESPONSE WHEN PERFORMING SPEECH RECOGNITION - A speech recognition system receives and analyzes speech input from a user in order to recognize and accept a response from the user. Under certain conditions, information about the response expected from the user may be available. In these situations, the available information about the expected response is used to modify the behavior of the speech recognition system by taking this information into account. The modified behavior of the speech recognition system according to the invention has several embodiments including: comparing the observed speech features to the models of the expected response separately from the usual hypothesis search in order to speed up the recognition system; modifying the usual hypothesis search to emphasize the expected response; updating and adapting the models when the recognized speech matches the expected response to improve the accuracy of the recognition system. | 09-27-2012 |
20120271631 | SPEECH RECOGNITION USING MULTIPLE LANGUAGE MODELS - In accordance with one embodiment, a method of generating language models for speech recognition includes identifying a plurality of utterances in training data corresponding to speech, generating a frequency count of each utterance in the plurality of utterances, generating a high-frequency plurality of utterances from the plurality of utterances having a frequency that exceeds a predetermined frequency threshold, generating a low-frequency plurality of utterances from the plurality of utterances having a frequency that is below the predetermined frequency threshold, generating a grammar-based language model using the high-frequency plurality of utterances as training data, and generating a statistical language model using the low-frequency plurality of utterances as training data. | 10-25-2012 |
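The frequency-based partitioning step described above can be sketched as follows. The utterances and threshold are invented for illustration, the actual grammar-based and statistical language-model training is not shown, and counting utterances at the threshold as "high-frequency" is a simplifying assumption (the abstract itself leaves the boundary case unspecified).

```python
from collections import Counter

def split_by_frequency(utterances, threshold):
    """Partition training utterances by frequency: frequent ones would
    feed a grammar-based language model, rare ones a statistical one."""
    counts = Counter(utterances)
    high = [u for u, c in counts.items() if c >= threshold]  # at/above threshold
    low = [u for u, c in counts.items() if c < threshold]    # below threshold
    return high, low

# Hypothetical training data.
data = ["call home", "call home", "call home",
        "play jazz", "set alarm", "set alarm"]
high, low = split_by_frequency(data, threshold=2)
```

Here the repeated commands land in the high-frequency set (well suited to a fixed grammar), while the one-off utterance falls to the statistical model's training data.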
20130085756 | System and Method of Semi-Supervised Learning for Spoken Language Understanding Using Semantic Role Labeling - A system and method are disclosed for providing semi-supervised learning for a spoken language understanding module using semantic role labeling. The method embodiment relates to a method of generating a spoken language understanding module. Steps in the method comprise selecting at least one predicate/argument pair as an intent from a set of the most frequent predicate/argument pairs for a domain, labeling training data using mapping rules associated with the selected at least one predicate/argument pair, training a call-type classification model using the labeled training data, re-labeling the training data using the call-type classification model and iteratively several of the above steps until training set labels converge. | 04-04-2013 |
20130090926 | MOBILE DEVICE CONTEXT INFORMATION USING SPEECH DETECTION - Systems and methods for speech detection in association with a mobile device are described herein. A method described herein for identifying presence of speech associated with a mobile device includes obtaining a plurality of audio samples from the mobile device while the mobile device operates in a mode distinct from a voice call operating mode, generating spectrogram data from the plurality of audio samples, and determining whether the plurality of audio samples include information indicative of speech by classifying the spectrogram data. | 04-11-2013 |
20130110511 | System, Method and Program for Customized Voice Communication | 05-02-2013 |
20130132083 | GENERIC FRAMEWORK FOR LARGE-MARGIN MCE TRAINING IN SPEECH RECOGNITION - A method and apparatus for training an acoustic model are disclosed. A training corpus is accessed and converted into an initial acoustic model. Scores are calculated for a correct class and competitive classes, respectively, for each token given the initial acoustic model. Also, a sample-adaptive window bandwidth is calculated for each training token. From the calculated scores and the sample-adaptive window bandwidth values, loss values are calculated based on a loss function. The loss function, which may be derived from a Bayesian risk minimization viewpoint, can include a margin value that moves a decision boundary such that token-to-boundary distances for correct tokens that are near the decision boundary are maximized. The margin can either be a fixed margin or can vary monotonically as a function of algorithm iterations. The acoustic model is updated based on the calculated loss values. This process can be repeated until an empirical convergence is met. | 05-23-2013 |
20130151252 | SYSTEM AND METHOD FOR STANDARDIZED SPEECH RECOGNITION - Disclosed herein are systems, methods, and computer-readable storage media for selecting a speech recognition model in a standardized speech recognition infrastructure. The system receives speech from a user, and if a user-specific supervised speech model associated with the user is available, retrieves the supervised speech model. If the user-specific supervised speech model is unavailable and if an unsupervised speech model is available, the system retrieves the unsupervised speech model. If the user-specific supervised speech model and the unsupervised speech model are unavailable, the system retrieves a generic speech model associated with the user. Next the system recognizes the received speech from the user with the retrieved model. In one embodiment, the system trains a speech recognition model in a standardized speech recognition infrastructure. In another embodiment, the system handshakes with a remote application in a standardized speech recognition infrastructure. | 06-13-2013 |
20130166295 | METHOD AND APPARATUS FOR SPEAKER-CALIBRATED SPEAKER DETECTION - The present invention relates to a method and apparatus for speaker-calibrated speaker detection. One embodiment of a method for generating a speaker model for use in detecting a speaker of interest includes identifying one or more speech features that best distinguish the speaker of interest from a plurality of impostor speakers and then incorporating the speech features in the speaker model. | 06-27-2013 |
20130166296 | METHOD AND APPARATUS FOR GENERATING SPEAKER-SPECIFIC SPOKEN PASSWORDS - The present invention relates to a method and apparatus for generating speaker-specific spoken passwords. One embodiment of a method for generating a spoken password for use by a speaker of interest includes identifying one or more speech features that best distinguish the speaker of interest from a plurality of impostor speakers and incorporating the speech features in the spoken password. | 06-27-2013 |
20130166297 | Discriminative Training of Document Transcription System - A system is provided for training an acoustic model for use in speech recognition. In particular, such a system may be used to perform training based on a spoken audio stream and a non-literal transcript of the spoken audio stream. Such a system may identify text in the non-literal transcript which represents concepts having multiple spoken forms. The system may attempt to identify the actual spoken form in the audio stream which produced the corresponding text in the non-literal transcript, and thereby produce a revised transcript which more accurately represents the spoken audio stream. The revised, and more accurate, transcript may be used to train the acoustic model using discriminative training techniques, thereby producing a better acoustic model than that which would be produced using conventional techniques, which perform training based directly on the original non-literal transcript. | 06-27-2013 |
20130185070 | NORMALIZATION BASED DISCRIMINATIVE TRAINING FOR CONTINUOUS SPEECH RECOGNITION - A speech recognition system trains a plurality of feature transforms and a plurality of acoustic models using an irrelevant variability normalization based discriminative training. The speech recognition system employs the trained feature transforms to absorb or ignore variability within an unknown speech that is irrelevant to phonetic classification. The speech recognition system may then recognize the unknown speech using the trained recognition models. The speech recognition system may further perform an unsupervised adaptation to adapt the feature transforms for the unknown speech and thus increase the accuracy of recognizing the unknown speech. | 07-18-2013 |
20130211835 | SPEECH RECOGNITION CIRCUIT AND METHOD - A speech recognition circuit comprising a circuit for providing state identifiers which identify states corresponding to nodes or groups of adjacent nodes in a lexical tree, and for providing scores corresponding to said state identifiers, the lexical tree comprising a model of words; a memory structure for receiving and storing state identifiers identified by a node identifier identifying a node or group of adjacent nodes, said memory structure being adapted to allow lookup to identify particular state identifiers, reading of the scores corresponding to the state identifiers, and writing back of the scores to the memory structure after modification of the scores; an accumulator for receiving score updates corresponding to particular state identifiers from a score update generating circuit which generates the score updates using audio input, for receiving scores from the memory structure, and for modifying said scores by adding said score updates to said scores; and a selector circuit for selecting at least one node or group of adjacent nodes of the lexical tree according to said scores. | 08-15-2013 |
20130262114 | Crowdsourced, Grounded Language for Intent Modeling in Conversational Interfaces - Different advantageous embodiments provide a crowdsourcing method for modeling user intent in conversational interfaces. One or more stimuli are presented to a plurality of describers. One or more sets of describer data are captured from the plurality of describers using a data collection mechanism. The one or more sets of describer data are processed to generate one or more models. Each of the one or more models is associated with a specific stimulus from the one or more stimuli. | 10-03-2013 |
20130268272 | TEXT DEPENDENT SPEAKER RECOGNITION WITH LONG-TERM FEATURE BASED ON FUNCTIONAL DATA ANALYSIS - One or more test features are extracted from a time domain signal. The test features are represented by discrete data. The discrete data is represented for each of the one or more test features by a corresponding one or more fitting functions, which are defined in terms of a finite number of continuous basis functions and a corresponding finite number of expansion coefficients. Each fitting function is compressed through Functional Principal Component Analysis (FPCA) to generate corresponding sets of principal components. Each principal component for a given test feature is uncorrelated to each other principal component for the given test feature. A distance between a set of principal components for the given test feature and a set of principal components for one or more training features is calculated with the processing system. The test feature is classified according to the distance calculated with the processing system. | 10-10-2013 |
20130289989 | Sampling Training Data for an Automatic Speech Recognition System Based on a Benchmark Classification Distribution - A set of benchmark text strings may be classified to provide a set of benchmark classifications. The benchmark text strings in the set may correspond to a benchmark corpus of benchmark utterances in a particular language. A benchmark classification distribution of the set of benchmark classifications may be determined. A respective classification for each text string in a corpus of text strings may also be determined. Text strings from the corpus of text strings may be sampled to form a training corpus of training text strings such that the classifications of the training text strings have a training text string classification distribution that is based on the benchmark classification distribution. The training corpus of training text strings may be used to train an automatic speech recognition (ASR) system. | 10-31-2013 |
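The distribution-matched sampling described above might be sketched as follows, under stated assumptions: the classifier, corpus, benchmark distribution, and sample size are all invented, and per-class target counts are simply rounded proportions of the requested total rather than whatever scheme the application actually uses.

```python
from collections import Counter, defaultdict
import random

def sample_to_distribution(texts, classify, benchmark_dist, n, seed=0):
    """Draw n training strings so that their class proportions follow the
    benchmark classification distribution."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for t in texts:
        by_class[classify(t)].append(t)
    sample = []
    for cls, frac in benchmark_dist.items():
        k = round(n * frac)  # target count for this class
        sample.extend(rng.sample(by_class[cls], min(k, len(by_class[cls]))))
    return sample

# Hypothetical classifier and corpus.
classify = lambda t: "question" if t.endswith("?") else "statement"
corpus = ["what time is it?", "where am I?", "how far?",
          "turn left", "stop here", "go home"]
sample = sample_to_distribution(
    corpus, classify, {"question": 0.5, "statement": 0.5}, n=4)
dist = Counter(classify(t) for t in sample)
```

The resulting training corpus of four strings contains two questions and two statements, matching the 50/50 benchmark distribution before any ASR training takes place.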
20130289990 | System and Dialog Manager Developed Using Modular Spoken-Dialog Components - A dialog manager and spoken dialog service having a dialog manager generated according to a method comprising selecting a top level flow controller based on application type, selecting available reusable subdialogs for each application part, developing a subdialog for each application part not having an available subdialog and testing and deploying the spoken dialog service using the selected top level flow controller, selected reusable subdialogs and developed subdialogs. The dialog manager capable of handling context shifts in a spoken dialog with a user. Application dependencies are established in the top level flow controller thus enabling the subdialogs to be reusable and to be capable of managing context shifts and mixed initiative dialogs. | 10-31-2013 |
20130297310 | GENERATING ACOUSTIC MODELS - This document describes methods, systems, techniques, and computer program products for generating and/or modifying acoustic models. Acoustic models and/or transformations for a target language/dialect can be generated and/or modified using acoustic models and/or transformations from a source language/dialect. | 11-07-2013 |
20130332164 | NAME RECOGNITION SYSTEM - A speech recognition system uses, in one embodiment, an extended phonetic dictionary that is obtained by processing words in a user's set of databases, such as a user's contacts database, with a set of pronunciation guessers. The speech recognition system can use a conventional phonetic dictionary and the extended phonetic dictionary to recognize speech inputs that are user requests to use the contacts database, for example, to make a phone call, etc. The extended phonetic dictionary can be updated in response to changes in the contacts database, and the set of pronunciation guessers can include pronunciation guessers for a plurality of locales, each locale having its own pronunciation guesser. | 12-12-2013 |
20140046662 | METHOD AND SYSTEM FOR ACOUSTIC DATA SELECTION FOR TRAINING THE PARAMETERS OF AN ACOUSTIC MODEL - A system and method are presented for acoustic data selection of a particular quality for training the parameters of an acoustic model, such as a Hidden Markov Model and Gaussian Mixture Model, for example, in automatic speech recognition systems in the speech analytics field. A raw acoustic model may be trained using a given speech corpus and maximum likelihood criteria. A series of operations are performed, such as a forced Viterbi-alignment, calculations of likelihood scores, and phoneme recognition, for example, to form a subset corpus of training data. During the process, audio files of a quality that does not meet a criterion, such as poor quality audio files, may be automatically rejected from the corpus. The subset may then be used to train a new acoustic model. | 02-13-2014 |
20140052444 | SYSTEM AND METHODS FOR MATCHING AN UTTERANCE TO A TEMPLATE HIERARCHY - A system and methods for matching at least one word of an utterance against a set of template hierarchies to select the best matching template or set of templates corresponding to the utterance. Certain embodiments of the system and methods determine at least one exact, inexact, or partial match between the at least one word of the utterance and at least one term within the template hierarchy to select and populate a template or set of templates corresponding to the utterance. The populated template or set of templates may then be used to generate a narrative template or a report template. | 02-20-2014 |
20140058731 | Method and System for Selectively Biased Linear Discriminant Analysis in Automatic Speech Recognition Systems - A system and method are presented for selectively biased linear discriminant analysis in automatic speech recognition systems. Linear Discriminant Analysis (LDA) may be used to improve the discrimination between the hidden Markov model (HMM) tied-states in the acoustic feature space. The between-class and within-class covariance matrices may be biased based on the observed recognition errors of the tied-states, such as shared HMM states of the context dependent tri-phone acoustic model. The recognition errors may be obtained from a trained maximum-likelihood acoustic model utilizing the tied-states which may then be used as classes in the analysis. | 02-27-2014 |
20140067393 | MODEL LEARNING DEVICE, MODEL GENERATION METHOD, AND COMPUTER PROGRAM PRODUCT - According to an embodiment, a model learning device learns a model having a full covariance matrix shared among a plurality of Gaussian distributions. The device includes a first calculator to calculate, from training data, frequencies of occurrence and sufficient statistics of the Gaussian distributions contained in the model; and a second calculator to select, on the basis of the frequencies of occurrence and the sufficient statistics, a sharing structure in which a covariance matrix is shared among Gaussian distributions, and calculate the full covariance matrix shared in the selected sharing structure. | 03-06-2014 |
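For a single fully shared covariance, the second calculator's final step in 20140067393 reduces to pooling the centered sufficient statistics weighted by occupancy counts. A minimal sketch (the sharing-structure selection itself is omitted, and the toy statistics are invented):

```python
import numpy as np

def shared_covariance(stats):
    """Pool per-Gaussian sufficient statistics into one full covariance.

    stats: list of (count, scatter) pairs, where scatter is the centered
    second-order statistic sum((x - mu_k)(x - mu_k)^T) for Gaussian k.
    """
    total_count = sum(n for n, _ in stats)
    total_scatter = sum(s for _, s in stats)
    return total_scatter / total_count

stats = [(2, 2.0 * np.eye(2)),   # frequent, tight component
         (2, 6.0 * np.eye(2))]   # equally frequent, broader component
cov = shared_covariance(stats)
```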
20140088964 | Exemplar-Based Latent Perceptual Modeling for Automatic Speech Recognition - Methods, systems, and computer-readable media related to selecting observation-specific training data (also referred to as “observation-specific exemplars”) from a general training corpus, and then creating, from the observation-specific training data, a focused, observation-specific acoustic model for recognizing the observation in an output domain are disclosed. In one aspect, a global speech recognition model is established based on an initial set of training data; a plurality of input speech segments to be recognized in an output domain are received; and for each of the plurality of input speech segments: a respective set of focused training data relevant to the input speech segment is identified in the global speech recognition model; a respective focused speech recognition model is generated based on the respective set of focused training data; and the respective focused speech recognition model is provided to a recognition device for recognizing the input speech segment in the output domain. | 03-27-2014 |
20140129222 | SPEECH RECOGNITION SYSTEM, RECOGNITION DICTIONARY REGISTRATION SYSTEM, AND ACOUSTIC MODEL IDENTIFIER SERIES GENERATION APPARATUS - When it is determined that sound data is unrecognizable through a speech recognition process by a first speech recognition unit ( | 05-08-2014 |
20140142942 | UTILIZING MULTIPLE PROCESSING UNITS FOR RAPID TRAINING OF HIDDEN MARKOV MODELS - A method of optimizing the calculation of matching scores between phone states and acoustic frames across a matrix of an expected progression of phone states aligned with an observed progression of acoustic frames within an utterance is provided. The matrix has a plurality of cells associated with a characteristic acoustic frame and a characteristic phone state. A first set and second set of cells that meet a threshold probability of matching a first phone state or a second phone state, respectively, are determined. The phone states are stored on a local cache of a first core and a second core, respectively. The first and second sets of cells are also provided to the first core and second core, respectively. Further, matching scores of each characteristic state and characteristic observation of each cell of the first set of cells and of the second set of cells are calculated. | 05-22-2014 |
20140156273 | Method and System for Information Recognition - A system and a method perform information recognition. The method arranges data base information in a data base information structure. The method matches input information to the data base information using at least one matching algorithm and using a matching information structure. In accordance with the system and the method, the matching information structure differs from the data base information structure. | 06-05-2014 |
20140200891 | Semantic Graphs and Conversational Agents - Semantic clustering techniques are described. In various implementations, a conversational agent is configured to perform semantic clustering of a corpus of user utterances. Semantic clustering may be used to provide a variety of functionality, such as to group a corpus of utterances into semantic clusters in which each cluster pertains to a similar topic. These clusters may then be leveraged to identify topics and assess their relative importance, as for example to prioritize topics whose handling by the conversation agent should be improved. A variety of utterances may be processed using these techniques, such as spoken words, textual descriptions entered via live chat, instant messaging, a website interface, email, SMS, a social network, a blogging or micro-blogging interface, and so on. | 07-17-2014 |
20140207457 | FALSE ALARM REDUCTION IN SPEECH RECOGNITION SYSTEMS USING CONTEXTUAL INFORMATION - A system and method are presented for using spoken word verification to reduce false alarms by exploiting global and local contexts on a lexical level, a phoneme level, and on an acoustical level. The reduction of false alarms may occur through a process that determines whether a word has been detected or if it is a false alarm. Training examples are used to generate models of internal and external contexts which are compared to test word examples. The word may be accepted or rejected based on comparison results. Comparison may be performed either at the end of the process or at multiple steps of the process to determine whether the word is rejected. | 07-24-2014 |
20140207458 | Concise Dynamic Grammars Using N-Best Selection - A method and apparatus derive a dynamic grammar composed of a subset of a plurality of data elements that are each associated with one of a plurality of reference identifiers. The present invention generates a set of selection identifiers on the basis of a user-provided first input identifier and determines which of these selection identifiers are present in a set of pre-stored reference identifiers. The present invention creates a dynamic grammar that includes those data elements that are associated with those reference identifiers that are matched to any of the selection identifiers. Based on a user-provided second identifier and on the data elements of the dynamic grammar, the present invention selects one of the reference identifiers in the dynamic grammar. | 07-24-2014 |
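The grammar-building step of 20140207458 can be sketched as expanding the first input into selection identifiers, then keeping only those found among the stored references. The `expand` callback stands in for a real N-best generator; the identifiers are made up.

```python
def build_dynamic_grammar(first_input, reference_db, expand):
    """Expand the first user identifier into selection identifiers,
    keep those present among the stored reference identifiers, and
    collect their data elements into a small grammar."""
    selection_ids = expand(first_input)            # e.g. N-best hypotheses
    return {r: reference_db[r] for r in selection_ids if r in reference_db}

reference_db = {"1234": "Alice", "1243": "Bob", "9999": "Carol"}
# Hypothetical expander: the input plus one digit-transposition variant.
nbest = lambda s: [s, s[:2] + s[3] + s[2]]
grammar = build_dynamic_grammar("1234", reference_db, nbest)
print(sorted(grammar))  # → ['1234', '1243']
```

The user's second identifier would then be recognized against this much smaller grammar rather than the full reference set.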
20140214420 | FEATURE SPACE TRANSFORMATION FOR PERSONALIZATION USING GENERALIZED I-VECTOR CLUSTERING - Personalization for Automatic Speech Recognition (ASR) is associated with a particular device. A generalized i-vector clustering method is used to train i-vector parameters on utterances received from a device and to classify test utterances from the same device. A sub-loading matrix and a residual noise term may be used when determining the personalization. A Universal Background Model (UBM) is trained using the utterances. The UBM is applied to obtain i-vectors of training utterances received from a device and a Gaussian Mixture Model (GMM) is trained using the i-vectors. During testing, the i-vector for each utterance received from the device is estimated using the device's UBM. The utterance is then assigned to the cluster with the closest centroid in the GMM. For each utterance, the i-vector and the residual noise estimation is performed. Hyperparameter estimation is also performed. The i-vector estimation and hyperparameter estimation are performed until convergence. | 07-31-2014 |
20140214421 | PROSODIC AND LEXICAL ADDRESSEE DETECTION - Prosodic features are used for discriminating computer-directed speech from human-directed speech. Statistics and models describing energy/intensity patterns over time, speech/pause distributions, pitch patterns, vocal effort features, and speech segment duration patterns may be used for prosodic modeling. The prosodic features for at least a portion of an utterance are monitored over a period of time to determine a shape associated with the utterance. A score may be determined to assist in classifying the current utterance as human directed or computer directed without relying on knowledge of preceding utterances or utterances following the current utterance. Outside data may be used for training lexical addressee detection systems for the H-H-C scenario. H-C training data can be obtained from a single-user H-C collection, and H-H speech can be modeled using general conversational speech. H-C and H-H language models may also be adapted using interpolation with small amounts of matched H-H-C data. | 07-31-2014 |
20140214422 | METHOD AND SYSTEM FOR DETECTING BOUNDARY OF COARTICULATED UNITS FROM ISOLATED SPEECH - The application provides a method and system for determinism in non-linear systems for speech processing, particularly automatic speech segmentation for building speech recognition systems. More particularly, the application enables a method and system for detecting the boundary of coarticulated units from isolated speech using a recurrence plot. | 07-31-2014 |
20140222425 | SPEECH RECOGNITION LEARNING METHOD USING 3D GEOMETRIC INFORMATION AND SPEECH RECOGNITION METHOD USING 3D GEOMETRIC INFORMATION - Provided are a speech recognition learning method using 3D geometric information and a speech recognition method by using 3D geometric information. The method performs learning by using 3D geometric information for learning or information derived from the 3D geometric information to generate a recognizer, and the speech recognition method performs speech recognition by applying 3D geometric information on a physical object correlated to or dependent on voice or information derived from the 3D geometric information to the recognizer. | 08-07-2014 |
20140222426 | System and Method of Providing an Automated Data-Collection in Spoken Dialog Systems - The invention relates to a system and method for gathering data for use in a spoken dialog system. An aspect of the invention is generally referred to as an automated hidden human that performs data collection automatically at the beginning of a conversation with a user in a spoken dialog system. The method comprises presenting an initial prompt to a user, recognizing a received user utterance using an automatic speech recognition engine and classifying the recognized user utterance using a spoken language understanding module. If the recognized user utterance is not understood or classifiable to a predetermined acceptance threshold, then the method re-prompts the user. If the recognized user utterance is not classifiable to a predetermined rejection threshold, then the method transfers the user to a human as this may imply a task-specific utterance. The received and classified user utterance is then used for training the spoken dialog system. | 08-07-2014 |
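The threshold logic of 20140222426 can be sketched as a three-way router on the spoken language understanding confidence. The confidence values and threshold names below are assumptions, not taken from the patent:

```python
def route_utterance(confidence, accept_threshold, reject_threshold):
    """Accept, re-prompt, or hand the caller to a human based on the
    SLU classification confidence (threshold values are hypothetical)."""
    if confidence >= accept_threshold:
        return "accept"            # classified well enough to proceed
    if confidence >= reject_threshold:
        return "reprompt"          # poorly understood: ask the user again
    return "transfer_to_human"     # likely a task-specific utterance

print(route_utterance(0.92, 0.8, 0.3))  # → accept
```

Accepted and classified utterances would then feed back into training the spoken dialog system.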
20140244254 | FACILITATING DEVELOPMENT OF A SPOKEN NATURAL LANGUAGE INTERFACE - A development system is described for facilitating the development of a spoken natural language (SNL) interface. The development system receives seed templates from a developer, each of which provides a command phrasing that can be used to invoke a function, when spoken by an end user. The development system then uses one or more development resources, such as a crowdsourcing system and a paraphrasing system, to provide additional templates. This yields an extended set of templates. A generation system then generates one or more models based on the extended set of templates. A user device may install the model(s) for use in interpreting commands spoken by an end user. When the user device recognizes a command, it may automatically invoke a function associated with that command. Overall, the development system provides an easy-to-use tool for producing an SNL interface. | 08-28-2014 |
20140257810 | PATTERN CLASSIFIER DEVICE, PATTERN CLASSIFYING METHOD, COMPUTER PROGRAM PRODUCT, LEARNING DEVICE, AND LEARNING METHOD - According to an embodiment, a pattern classifier device includes a decision unit, an execution unit, a calculator, and a determination unit. The decision unit is configured to decide a subclass to which the input pattern is to belong, based on attribute information of the input pattern. The execution unit is configured to determine whether the input pattern belongs to a class that is divided into subclasses, using a weak classifier allocated to the decided subclass, and output a result of the determination and a reliability of the weak classifier. The calculator is configured to calculate an integrated value obtained by integrating an evaluation value based on the determination result and the reliability. The determination unit is configured to repeat the determination processing when a termination condition of the determination processing is not satisfied, and terminate the determination processing and output the integrated value when the termination condition has been satisfied. | 09-11-2014 |
20140278413 | TRAINING AN AT LEAST PARTIAL VOICE COMMAND SYSTEM - An electronic device with one or more processors and memory includes a procedure for training a digital assistant. In some embodiments, the device detects an impasse in a dialogue between the digital assistant and a user including a speech input. During a learning session, the device utilizes a subsequent clarification input from the user to adjust intent inference or task execution associated with the speech input to produce a satisfactory response. In some embodiments, the device identifies a pattern of success or failure associated with an aspect previously used to complete a task and generates a hypothesis regarding a parameter used in speech recognition, intent inference or task execution as a cause for the pattern. Then, the device tests the hypothesis by altering the parameter for a subsequent completion of the task and adopts or rejects the hypothesis based on feedback information collected from the subsequent completion. | 09-18-2014 |
20140324427 | SYSTEM AND DIALOG MANAGER DEVELOPED USING MODULAR SPOKEN-DIALOG COMPONENTS - A dialog manager and spoken dialog service having a dialog manager generated according to a method comprising selecting a top level flow controller based on application type, selecting available reusable subdialogs for each application part, developing a subdialog for each application part not having an available subdialog and testing and deploying the spoken dialog service using the selected top level flow controller, selected reusable subdialogs and developed subdialogs. The dialog manager is capable of handling context shifts in a spoken dialog with a user. Application dependencies are established in the top level flow controller, thus enabling the subdialogs to be reusable and to be capable of managing context shifts and mixed initiative dialogs. | 10-30-2014 |
20140337026 | METHOD, APPARATUS, AND PROGRAM FOR GENERATING TRAINING SPEECH DATA FOR TARGET DOMAIN - A method and system for generating training data for a target domain using speech data of a source domain. The training data generation method including: reading out a Gaussian mixture model (GMM) of a target domain trained with a clean speech data set of the target domain; mapping, by referring to the GMM of the target domain, a set of source domain speech data received as an input to the set of target domain speech data on a basis of a channel characteristic of the target domain speech data; and adding a noise of the target domain to the mapped set of source domain speech data to output a set of pseudo target domain speech data. | 11-13-2014 |
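A crude stand-in for the mapping step of 20140337026: shift each source-domain frame by a channel offset tied to its nearest target-domain GMM mean, then overlay target-domain noise. The patent's posterior-weighted GMM mapping is simplified here to a hard nearest-mean assignment, and all values are invented.

```python
import numpy as np

def map_to_target_domain(source_feats, target_means, channel_offsets,
                         noise_std, rng):
    """Map source-domain frames toward the target domain's channel
    characteristic, then add target-domain noise to produce
    pseudo target-domain speech data."""
    mapped = []
    for x in source_feats:
        k = np.argmin(np.linalg.norm(target_means - x, axis=1))
        mapped.append(x + channel_offsets[k])   # nearest-mean channel shift
    mapped = np.array(mapped)
    return mapped + rng.normal(0.0, noise_std, mapped.shape)

target_means = np.array([[0., 0.], [10., 10.]])
channel_offsets = np.array([[1., 1.], [-1., -1.]])
source = np.array([[0.5, 0.5], [9., 9.]])
pseudo = map_to_target_domain(source, target_means, channel_offsets,
                              noise_std=0.0, rng=np.random.default_rng(0))
```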
20140350931 | LANGUAGE MODEL TRAINED USING PREDICTED QUERIES FROM STATISTICAL MACHINE TRANSLATION - A Statistical Machine Translation (SMT) model is trained using pairs of sentences that include content obtained from one or more content sources (e.g. feed(s)) with corresponding queries that have been used to access the content. A query click graph may be used to assist in determining candidate pairs for the SMT training data. All/portion of the candidate pairs may be used to train the SMT model. After training the SMT model using the SMT training data, the SMT model is applied to content to determine predicted queries that may be used to search for the content. The predicted queries are used to train a language model, such as a query language model. The query language model may be interpolated other language models, such as a background language model, as well as a feed language model trained using the content used in determining the predicted queries. | 11-27-2014 |
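The interpolation mentioned at the end of 20140350931 is, in its simplest linear form, a weighted sum of per-model probabilities. The bigram dictionaries and weights below are illustrative assumptions:

```python
def interpolated_prob(history, word, models, weights, floor=1e-9):
    """Linearly interpolate several language models; weights should sum
    to 1, and unseen n-grams fall back to a small floor probability."""
    return sum(w * m.get((history, word), floor)
               for m, w in zip(models, weights))

query_lm      = {("the", "weather"): 0.2}   # trained on predicted queries
background_lm = {("the", "weather"): 0.1}
p = interpolated_prob("the", "weather", [query_lm, background_lm], [0.7, 0.3])
```

A feed language model trained on the underlying content could be added as a third component in the same way.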
20140358538 | METHODS AND SYSTEMS FOR SHAPING DIALOG OF SPEECH SYSTEMS - Methods and systems are provided for shaping speech dialog of a speech system. In one embodiment, a method includes: receiving data related to a first utterance from a user of the speech system; processing the data based on at least one attribute processing technique that determines at least one attribute of the first utterance; determining a shaping pattern based on the at least one attribute; and generating a speech prompt based on the shaping pattern. | 12-04-2014 |
20140358539 | METHOD AND APPARATUS FOR BUILDING A LANGUAGE MODEL - A method includes: acquiring data samples; performing categorized sentence mining in the acquired data samples to obtain categorized training samples for multiple categories; building a text classifier based on the categorized training samples; classifying the data samples using the text classifier to obtain a class vocabulary and a corpus for each category; mining the corpus for each category according to the class vocabulary for the category to obtain a respective set of high-frequency language templates; training on the templates for each category to obtain a template-based language model for the category; training on the corpus for each category to obtain a class-based language model for the category; training on the class vocabulary for each category to obtain a lexicon-based language model for the category; building a speech decoder according to an acoustic model, the class-based language model and the lexicon-based language model for any given field, and the data samples. | 12-04-2014 |
20150051909 | PATTERN RECOGNITION APPARATUS AND PATTERN RECOGNITION METHOD - Provided is a pattern recognition apparatus for creating multiple systems and combining the multiple systems to improve the recognition performance, including a discriminative training unit for constructing model parameters of a second or subsequent system based on an output tendency of a previously-constructed model so as to be different from the output tendency of the previously-constructed model. Accordingly, when multiple systems are combined, the recognition performance can be improved without trial and error. | 02-19-2015 |
20150058013 | AUTOMATED VERBAL FLUENCY ASSESSMENT - Techniques are described for calculating one or more verbal fluency scores for a person. An example method includes classifying, by a computing device, samples of audio data of speech of a person, based on amplitudes of the samples, into a first class of samples including speech or sound and a second class of samples including silence. The method further includes analyzing the first class of samples to determine a number of words spoken by the person, and calculating a verbal fluency score for the person based at least in part on the determined number of words spoken by the person. | 02-26-2015 |
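The two-class amplitude split and word count in 20150058013 can be approximated by counting sufficiently long above-threshold runs of samples. All constants below are hypothetical:

```python
def verbal_fluency_score(samples, amp_threshold, min_run, duration_s):
    """Classify samples into speech/sound vs. silence by amplitude,
    count speech runs as a rough proxy for words, and score words
    per minute."""
    words, run = 0, 0
    for s in samples:
        if abs(s) >= amp_threshold:
            run += 1                      # speech-or-sound class
        else:
            if run >= min_run:            # run long enough to be a word
                words += 1
            run = 0
    if run >= min_run:
        words += 1
    return words, words * 60.0 / duration_s

samples = [0, 0, 5, 6, 7, 0, 0, 4, 5, 0]
print(verbal_fluency_score(samples, amp_threshold=3, min_run=2,
                           duration_s=10.0))  # → (2, 12.0)
```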
20150058014 | SYSTEM AND METHOD FOR MANAGING CONVERSATION - A conversation management system includes: a training unit that generates an articulation speech act and an entity name of a training corpus, that generates a lexical syntactic pattern, and that estimates a speech act and an entity name of a training corpus; a database that stores the articulation speech act, the entity name, and the lexical syntactic pattern of the training corpus; an execution unit that generates an articulation speech act and an entity name of a user, that generates a user lexical syntactic pattern, that estimates a speech act and an entity name of a user, that searches for an articulation pair corresponding to a user articulation at the database using a search condition including the estimated user speech act and the generated user lexical syntactic pattern, and that generates a final response by selecting an articulation template using a restriction condition including an estimated entity name among the found articulation pair; and an output unit that outputs a final response that is generated by the execution unit. | 02-26-2015 |
20150058015 | VOICE PROCESSING APPARATUS, VOICE PROCESSING METHOD, AND PROGRAM - A voice processing apparatus includes a voice quality determining unit configured to determine a target speaker determining method used for a voice quality conversion in accordance with a determining method control value for instructing the target speaker determining method of determining a target speaker whose voice quality is targeted to the voice quality conversion, and determine the target speaker in accordance with the target speaker determining method. | 02-26-2015 |
20150073794 | SPEECH SYLLABLE/VOWEL/PHONE BOUNDARY DETECTION USING AUDITORY ATTENTION CUES - In syllable or vowel or phone boundary detection during speech, an auditory spectrum may be determined for an input window of sound and one or more multi-scale features may be extracted from the auditory spectrum. Each multi-scale feature can be extracted using a separate two-dimensional spectro-temporal receptive filter. One or more feature maps corresponding to the one or more multi-scale features can be generated and an auditory gist vector can be extracted from each of the one or more feature maps. A cumulative gist vector may be obtained through augmentation of each auditory gist vector extracted from the one or more feature maps. One or more syllable or vowel or phone boundaries in the input window of sound can be detected by mapping the cumulative gist vector to one or more syllable or vowel or phone boundary characteristics using a machine learning algorithm. | 03-12-2015 |
20150073795 | User Programmable Voice Command Recognition Based On Sparse Features - A low power sound recognition sensor is configured to receive an analog signal that may contain a signature sound. Sparse sound parameter information is extracted from the analog signal. The extracted sparse sound parameter information is processed using a speaker dependent sound signature database stored in the sound recognition sensor to identify sounds or speech contained in the analog signal. The sound signature database may include several user enrollments for a sound command each representing an entire word or multiword phrase. The extracted sparse sound parameter information may be compared to the multiple user enrolled signatures using cosine distance, Euclidean distance, correlation distance, etc., for example. | 03-12-2015 |
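The comparison step of 20150073795 scores extracted sparse features against multiple user enrollments per command, for example by cosine distance. The feature vectors and command names below are made up:

```python
import math

def cosine_distance(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm

def best_enrollment(features, enrollments):
    """Return the command whose closest user-enrolled signature has the
    smallest cosine distance to the extracted sparse features."""
    return min(enrollments,
               key=lambda cmd: min(cosine_distance(features, e)
                                   for e in enrollments[cmd]))

enrollments = {
    "lights_on":  [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0]],
    "lights_off": [[0.0, 1.0, 0.0]],
}
print(best_enrollment([1.0, 0.05, 0.0], enrollments))  # → lights_on
```

Euclidean or correlation distance could be substituted for `cosine_distance` without changing the surrounding logic.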
20150081297 | SYSTEM AND METHOD FOR UNSUPERVISED AND ACTIVE LEARNING FOR AUTOMATIC SPEECH RECOGNITION - A system and method are provided for combining active and unsupervised learning for automatic speech recognition. This process enables a reduction in the amount of human supervision required for training acoustic and language models and an increase in performance given the transcribed and un-transcribed data. | 03-19-2015 |
20150088509 | ANTI-SPOOFING - A system and method for classifying whether audio data received in a speaker recognition system is genuine or a spoof, using a Gaussian classifier. | 03-26-2015 |
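A two-class Gaussian classifier of the kind named in 20150088509 can be reduced to a log-likelihood ratio between a genuine model and a spoof model. The diagonal covariances and every parameter value here are assumptions:

```python
import numpy as np

def gaussian_llr(x, genuine, spoof):
    """Log-likelihood ratio under two diagonal Gaussians;
    positive => classify the audio as genuine."""
    def loglike(x, mu, var):
        return -0.5 * np.sum(np.log(2.0 * np.pi * var) + (x - mu) ** 2 / var)
    return loglike(x, *genuine) - loglike(x, *spoof)

genuine = (np.zeros(2), np.ones(2))        # (mean, variance) per dimension
spoof   = (np.full(2, 3.0), np.ones(2))
print(gaussian_llr(np.zeros(2), genuine, spoof) > 0)  # → True
```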
20150088510 | SYSTEM AND METHOD FOR ROBUST ACCESS AND ENTRY TO LARGE STRUCTURED DATA USING VOICE FORM-FILLING - A method, apparatus and machine-readable medium are provided. A phonotactic grammar is utilized to perform speech recognition on received speech and to generate a phoneme lattice. A document shortlist is generated based on using the phoneme lattice to query an index. A grammar is generated from the document shortlist. Data for each of at least one input field is identified based on the received speech and the generated grammar. | 03-26-2015 |
20150100317 | SPEECH RECOGNITION DEVICE - A speech recognition device starts to generate dictionary data for each type of name based on name data and paraphrase data, and executes dictionary registration of the dictionary data. The speech recognition device obtains text information that is the same as the text information used to generate the dictionary data the previous time. When back-up data corresponding to that previous text information has been generated, the speech recognition device executes the dictionary registration of the dictionary data generated as the back-up data. Further, a dictionary data generation device executes the dictionary registration of the dictionary data based on given name data every time the dictionary data generation device completes generation of the dictionary data based on the given name data. | 04-09-2015 |
20150112679 | METHOD FOR BUILDING LANGUAGE MODEL, SPEECH RECOGNITION METHOD AND ELECTRONIC APPARATUS - A method for building a language model, a speech recognition method and an electronic apparatus are provided. The speech recognition method includes the following steps. Phonetic transcriptions of a speech signal are obtained from an acoustic model. Phonetic spellings matching the phonetic transcriptions are obtained according to the phonetic transcriptions and a syllable acoustic lexicon. According to the phonetic spellings, a plurality of text sequences and a plurality of text sequence probabilities are obtained from a language model. Each phonetic spelling is matched to a candidate sentence table; a word probability of each phonetic spelling matching a word in a sentence of the sentence table is obtained; and the word probabilities of the phonetic spellings are calculated so as to obtain the text sequence probabilities. The text sequence corresponding to the largest of the text sequence probabilities is selected as the recognition result of the speech signal. | 04-23-2015 |
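The final scoring step of 20150112679 multiplies word probabilities along each candidate text sequence and keeps the best. The candidate sequences and probabilities below are invented for illustration:

```python
def best_text_sequence(candidates):
    """candidates maps a text sequence to the word probabilities of its
    phonetic spellings; the sequence probability is their product."""
    def seq_prob(word_probs):
        p = 1.0
        for w in word_probs:
            p *= w
        return p
    return max(candidates, key=lambda text: seq_prob(candidates[text]))

candidates = {
    "today is sunny":   [0.4, 0.9, 0.8],        # product 0.288
    "two day is sunny": [0.2, 0.5, 0.9, 0.8],   # product 0.072
}
print(best_text_sequence(candidates))  # → today is sunny
```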
20150120297 | VOICE-RESPONSIVE BUILDING MANAGEMENT SYSTEM - A voice-responsive building management system is described herein. One system includes an interface, a dynamic grammar builder, and a speech processing engine. The interface is configured to receive a speech card of a user, wherein the speech card of the user includes speech training data of the user and domain vocabulary for applications of the building management system for which the user is authorized. The dynamic grammar builder is configured to generate grammar from a building information model of the building management system. The speech processing engine is configured to receive a voice command or voice query from the user, and execute the voice command or voice query using the speech training data of the user, the domain vocabulary, and the grammar generated from the building information model. | 04-30-2015 |
20150134331 | Always-On Audio Control for Mobile Device - In an embodiment, an integrated circuit may include one or more CPUs, a memory controller, and a circuit configured to remain powered on when the rest of the SOC is powered down. The circuit may be configured to receive audio samples from a microphone, and match those audio samples against a predetermined pattern to detect a possible command from a user of the device that includes the SOC. In response to detecting the predetermined pattern, the circuit may cause the memory controller to power up so that audio samples may be stored in the memory to which the memory controller is coupled. The circuit may also cause the CPUs to be powered on and initialized, and the operating system (OS) may boot. During the time that the CPUs are initializing and the OS is booting, the circuit and the memory may be capturing the audio samples. | 05-14-2015 |
20150134332 | SPEECH RECOGNITION METHOD AND DEVICE - A speech recognition method and device are disclosed. The method includes: acquiring a text file specified by a user, and extracting a command word from the text file, to obtain a command word list; comparing the command word list with a command word library, to confirm whether the command word list includes a new command word; if the command word list includes the new command word, generating a corresponding new pronunciation dictionary and a new language model; merging the new language model into a language model library; and receiving speech, and performing speech recognition on the speech according to an acoustic model, a pronunciation dictionary, and the language model library. Command words acquired online are closely related to online content; therefore, the number of the command words is limited and far less than the number of frequently used words. | 05-14-2015 |
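The online registration step of 20150134332 amounts to diffing extracted command words against the existing library and building entries only for the new ones. The `pronounce` callback below stands in for real grapheme-to-phoneme conversion; all names are hypothetical.

```python
def register_new_commands(text, command_library, pronounce):
    """Extract command words from a user-specified text, keep those not
    yet in the library, and build pronunciation entries for them."""
    new_words = sorted(set(text.split()) - command_library)
    return {w: pronounce(w) for w in new_words}

library = {"play", "stop"}
entries = register_new_commands("play pause rewind stop", library,
                                pronounce=lambda w: "-".join(w))
print(sorted(entries))  # → ['pause', 'rewind']
```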
20150302850 | EMAIL-LIKE USER INTERFACE FOR TRAINING NATURAL LANGUAGE SYSTEMS - An email-like user interface displays a list of user logs determined by applying user-specified list criteria to user logs received in a natural language (NL) training environment. The list comprises a subset of the received user logs in order to minimize the number of actions required to configure and train the NL configuration system in a semi-supervised manner, thereby improving the quality and accuracy of the NL configuration system. To determine a list of user logs relevant for training, the user logs can be filtered, sorted, grouped and searched within the email-like user interface. A training interface to a network of instances that comprises a plurality of NL configuration systems leverages a crowd-sourcing community of developers in order to efficiently create a customizable NL configuration system. | 10-22-2015 |
20150317974 | SYSTEM AND METHOD FOR CREATING VOICE PROFILES FOR SPECIFIC DEMOGRAPHICS - Systems, methods, and computer-readable storage devices for receiving an utterance from a user and analyzing the utterance to identify the demographics of the user. The system then analyzes the utterance to determine the prosody of the utterance, and retrieves from the Internet data associated with the determined demographics. Using the retrieved data, the system retrieves, also from the Internet, recorded speech matching the identified prosody. The recorded speech, which is based on the demographic data of the utterance and has a prosody matching the utterance, is then saved to a database for future use in generating speech specific to the user. | 11-05-2015 |
20150332669 | EFFICIENT APPARATUS AND METHOD FOR AUDIO SIGNATURE GENERATION USING MOTION - An automatic content recognition system that includes a user device for the purpose of capturing audio and generating an audio signature. The user device may be a smartphone or tablet. The system is also capable of determining whether a user device is in motion and refraining from audio monitoring and/or generating audio signatures when the user device is in motion. Motion may also be used to reduce the frequency of audio monitoring and/or signature generation. The system may have a database within the user device or the user device may communicate with a server having a database that contains reference audio signatures. | 11-19-2015 |
20150348555 | Voice Recognition Device, Voice Recognition Program, and Voice Recognition Method - It is an object of the present invention to provide a technology for a speech recognition device having higher convenience. The speech recognition device according to the present invention includes: a storage unit for storing screen definition information, in which a screen is associated with an option on the screen, and selection history information identifying a number of selected times for each of the options; a touch instruction reception unit for receiving an instruction through a touching operation; a voice instruction reception unit for receiving an instruction through an operation using a voice; and an option reading unit for conducting, when reception of the instruction conducted by the touch instruction reception unit is restricted on a predetermined screen, voice outputs of the options on the predetermined screen in order corresponding to the number of selected times in which the voice instruction reception unit receives an instruction regarding any one of the options output by the option reading unit. | 12-03-2015 |
20150371629 | SYSTEM AND METHOD FOR ENABLING SEARCH AND RETRIEVAL OPERATIONS TO BE PERFORMED FOR DATA ITEMS AND RECORDS USING DATA OBTAINED FROM ASSOCIATED VOICE FILES - A method and system are provided for using the contents of voice files as a basis for enabling search and other selection operations for data items that are associated with those voice files. Voice files may be received having associations with other data items, such as images or records. A corresponding text file is generated for each of the one or more voice files using programmatic means, such as a speech-to-text application. Each text file is provided an association with a data item based on the association of the voice file that served as the basis of its creation. Each text file is then made available for the performance of search and selection operations that result in the identification of associated data items. | 12-24-2015 |
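The pipeline in the entry above (transcribe each voice file, carry over its association to a data item, then search the transcripts) can be sketched as below. The `transcribe` callable stands in for the speech-to-text application the abstract mentions; all names here are hypothetical.

```python
def index_voice_files(voice_files, transcribe):
    """Build a searchable index: data-item id -> transcript of its voice file.

    `voice_files` maps a data-item id (e.g. an image or record id) to the
    path of its associated voice file; `transcribe` is the speech-to-text
    component, injected so any engine can be used.
    """
    return {item_id: transcribe(path) for item_id, path in voice_files.items()}

def search(index, query):
    """Return the ids of data items whose transcript contains the query term."""
    q = query.lower()
    return [item_id for item_id, text in index.items() if q in text.lower()]
```

A search over the text index then resolves directly to the associated images or records, which is the selection operation the abstract describes.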
20160027432 | Speaker Dependent Voiced Sound Pattern Template Mapping - Various implementations disclosed herein include a training module configured to produce a set of segment templates from a concurrent segmentation of a plurality of vocalization instances of a VSP vocalized by a particular speaker, who is identifiable by a corresponding set of vocal characteristics. Each segment template provides a stochastic characterization of how each of one or more portions of a VSP is vocalized by the particular speaker in accordance with the corresponding set of vocal characteristics. Additionally, in various implementations, the training module includes systems, methods and/or devices configured to produce a set of VSP segment maps that each provide a quantitative characterization of how respective segments of the plurality of vocalization instances vary in relation to a corresponding one of a set of segment templates. | 01-28-2016 |
20160042733 | DYNAMIC GEO-FENCING FOR VOICE RECOGNITION DICTIONARY - There is provided a mobile electronic device and a computer-implemented method. The method includes: receiving, by an electronic device, location parameters comprising a location of the electronic device, a direction of travel of the electronic device and a speed of the electronic device; configuring, by the electronic device, a dynamic geo-fenced area based on the location parameters, wherein the dynamic geo-fenced area surrounds the location of the electronic device; retrieving, by the electronic device, a Voice Recognition (VR) dictionary subset comprising data associated with the dynamic geo-fenced area from a VR dictionary, wherein the data comprises a broadcast station name and a broadcast frequency associated with the broadcast station name; and performing voice recognition using the VR dictionary subset. | 02-11-2016 |
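One way to realize the dynamic geo-fence above is to shift the fence center ahead along the direction of travel and grow its radius with speed, then filter the VR dictionary to entries inside the fence. This is a hypothetical heuristic sketch using a planar small-distance approximation, not the patent's method; all constants and names are assumptions.

```python
import math

M_PER_DEG_LAT = 111_320.0  # approximate meters per degree of latitude

def dynamic_geofence(lat, lon, heading_deg, speed_mps,
                     lookahead_s=60.0, base_radius_m=2000.0):
    """Return (center, radius_m) shifted ahead of travel and scaled by speed."""
    ahead_m = speed_mps * lookahead_s
    dlat = ahead_m * math.cos(math.radians(heading_deg)) / M_PER_DEG_LAT
    dlon = (ahead_m * math.sin(math.radians(heading_deg))
            / (M_PER_DEG_LAT * math.cos(math.radians(lat))))
    return (lat + dlat, lon + dlon), base_radius_m + ahead_m

def inside_fence(lat, lon, center, radius_m):
    """Planar approximation: is the point within radius_m of the fence center?"""
    dy = (lat - center[0]) * M_PER_DEG_LAT
    dx = (lon - center[1]) * M_PER_DEG_LAT * math.cos(math.radians(center[0]))
    return math.hypot(dx, dy) <= radius_m

def dictionary_subset(entries, center, radius_m):
    """Keep VR-dictionary entries (station name, frequency, lat, lon) in the fence."""
    return [e for e in entries if inside_fence(e["lat"], e["lon"], center, radius_m)]
```

The recognizer then loads only the subset, e.g. station names receivable near the driver's projected position.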
20160086597 | TECHNOLOGY FOR RESPONDING TO REMARKS USING SPEECH SYNTHESIS - The present invention is provided with: a voice input section that receives a remark (a question) via a voice signal; a reply creation section that creates a voice sequence of a reply (response) to the remark; a pitch analysis section that analyzes the pitch of a first segment (e.g., word ending) of the remark; and a voice generation section (a voice synthesis section, etc.) that generates a reply, in the form of voice, represented by the voice sequence. The voice generation section controls the pitch of the entire reply in such a manner that the pitch of a second segment (e.g., word ending) of the reply assumes a predetermined pitch (e.g., five degrees down) with respect to the pitch of the first segment of the remark. Such an arrangement can synthesize a replying voice that gives a natural feel to the user. | 03-24-2016 |
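The pitch rule above can be sketched numerically: set the reply's ending pitch a fixed musical interval below the remark's ending pitch, then scale the whole reply contour to land there. Interpreting "five degrees down" as a fifth (seven semitones) is an assumption here, as are all names; this is an illustrative sketch, not the patent's algorithm.

```python
def target_reply_pitch(remark_ending_hz: float, semitones_down: int = 7) -> float:
    """Pitch a fixed interval below the remark's word-ending pitch.

    Seven semitones (a fifth) is one reading of the abstract's
    'five degrees down'; equal temperament: one semitone = 2**(1/12).
    """
    return remark_ending_hz * 2.0 ** (-semitones_down / 12.0)

def shift_pitch_contour(contour_hz, remark_ending_hz):
    """Scale the entire reply contour so its final pitch hits the target,
    controlling 'the pitch of the entire reply' as the abstract describes."""
    if not contour_hz:
        return []
    factor = target_reply_pitch(remark_ending_hz) / contour_hz[-1]
    return [p * factor for p in contour_hz]
```

Scaling the whole contour (rather than only the ending) preserves the reply's relative intonation while anchoring its ending at the desired interval.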
20160086599 | Speech Recognition Model Construction Method, Speech Recognition Method, Computer System, Speech Recognition Apparatus, Program, and Recording Medium - A construction method for a speech recognition model, in which a computer system includes: a step of acquiring alignment between speech of each of a plurality of speakers and a transcript of the speaker; a step of joining transcripts of the respective ones of the plurality of speakers along a time axis, creating a transcript of speech of mixed speakers obtained from synthesized speech of the speakers, and replacing predetermined transcribed portions of the plurality of speakers overlapping on the time axis with a unit which represents a simultaneous speech segment; and a step of constructing at least one of an acoustic model and a language model which make up a speech recognition model, based on the transcript of the speech of the mixed speakers. | 03-24-2016 |
20160098986 | SYSTEM AND METHOD OF AUTOMATIC SPEECH RECOGNITION USING ON-THE-FLY WORD LATTICE GENERATION WITH WORD HISTORIES - A system, article, and method of automatic speech recognition using on-the-fly word lattice generation with word histories. | 04-07-2016 |
20160104476 | Cognitive Security for Voice Phishing Activity - An approach is provided in which a question answer system monitors a voice conversation between a first entity and a second entity. During the conversation, the question answer system parses the conversation into information phrases, and constructs the information phrases into a current conversation pattern. The question answer system identifies deceptive conversation properties of the current conversation by analyzing the current conversation pattern against domain-based conversation patterns. The question answer system, in turn, sends an alert message to the first entity to notify the first entity of the identified deceptive conversation properties. | 04-14-2016 |
20160104477 | METHOD FOR THE INTERPRETATION OF AUTOMATIC SPEECH RECOGNITION - A device for automated improvement of digital speech interpretation on a computer system includes: a speech recognizer, configured to recognize digitally input speech; a speech interpreter, configured to accept the output of the speech recognizer as an input, and to manage a digital vocabulary with keywords and their synonyms in a database in order to trigger a specific function; and a speech synthesizer, configured to automatically synthesize the keywords and to feed them to the speech recognizer in order to then insert its output as further synonyms into the database of the speech interpreter if they differ from the keywords or their synonyms. | 04-14-2016 |
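The feedback loop in the entry above (synthesize each keyword, run the synthesized audio back through the recognizer, and add the recognizer's output as a new synonym if it differs) can be sketched as follows. The TTS and ASR components are injected as callables since the abstract names no specific engines; all names here are hypothetical.

```python
def harvest_synonyms(vocabulary, synthesize, recognize):
    """Grow the speech interpreter's synonym database from recognizer drift.

    `vocabulary` maps each keyword to its set of known synonyms.
    `synthesize` turns a keyword into audio; `recognize` turns audio back
    into text. If the round trip yields something new, store it as a synonym.
    """
    for keyword, synonyms in vocabulary.items():
        heard = recognize(synthesize(keyword))
        if heard != keyword and heard not in synonyms:
            synonyms.add(heard)
    return vocabulary
```

Over repeated runs this captures systematic recognizer substitutions (e.g. homophones), so a later misrecognition of a keyword can still trigger the intended function.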
20160379625 | LANGUAGE MODEL BIASING MODULATION - Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for modulating language model biasing. In some implementations, context data is received. A likely context associated with a user is determined based on at least a portion of the context data. One or more language model biasing parameters based at least on the likely context associated with the user is selected. A context confidence score associated with the likely context based on at least a portion of the context data is determined. One or more language model biasing parameters based at least on the context confidence score is adjusted. A baseline language model based at least on the one or more of the adjusted language model biasing parameters is biased. The baseline language model is provided for use by an automated speech recognizer (ASR). | 12-29-2016 |
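One simple way to modulate biasing by context confidence, as described above, is to interpolate the baseline and context language models with a mixture weight scaled by the confidence score. This is a hedged sketch of the general idea, not the patent's formula; `max_weight` and the linear scaling are assumptions.

```python
import math

def biased_logprob(base_logprob, context_logprob, confidence, max_weight=0.8):
    """Interpolate baseline and context LM probabilities for one word.

    `confidence` in [0, 1] is the context confidence score; at 0 the
    baseline model is used unchanged, and the context model's influence
    grows linearly up to `max_weight` as confidence rises.
    """
    w = max_weight * confidence
    # Linear interpolation in probability space, returned as a log-prob.
    p = (1.0 - w) * math.exp(base_logprob) + w * math.exp(context_logprob)
    return math.log(p)
```

The ASR decoder would score hypotheses with the interpolated model, so a weak context signal only nudges the baseline rather than overriding it.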
20170236058 | GENERATING A TRAINING MODEL BASED ON FEEDBACK | 08-17-2017 |
20190147853 | QUANTIZED DIALOG LANGUAGE MODEL FOR DIALOG SYSTEMS | 05-16-2019 |
20220139375 | MEMORY DETERIORATION DETECTION AND AMELIORATION - Memory deterioration detection and evaluation includes capturing a plurality of human utterances with a voice interface and generating, for a user, a human utterances corpus that comprises human utterances selected from the plurality of human utterances based on meanings of the human utterances as determined by natural language processing by a computer processor. Based on data generated in response to signals sensed by one or more sensing devices operatively coupled with the computer processor, contextual information corresponding to one or more human utterances of the corpus is determined. Patterns among the corpus of human utterances are recognized based on pattern recognition performed by the computer processor using one or more machine learning models. Based on the pattern recognition, a change in memory functioning of the user is identified. The identified change is classified, based on the contextual information, as to whether the change is likely due to memory impairment of the user. | 05-05-2022 |