Entries |
Document | Title | Date |
20080201148 | SYSTEM AND METHOD FOR GENERATING AND USING AN ARRAY OF DYNAMIC GRAMMAR - A system and method for generating dynamic grammars for use by a speech recognition system in response to signals from sensors indicative of the position and/or movement of a vehicle or platform, such as an aircraft or helicopter. | 08-21-2008 |
20080208583 | METHOD AND APPARATUS FOR BUILDING ASSET BASED NATURAL LANGUAGE CALL ROUTING APPLICATION WITH LIMITED RESOURCES - A method of processing limited natural language data to automatically develop an optimal feature set, bypassing the standard Wizard of OZ (WOZ) approach is provided. The method provides for building natural language understanding models or for processing existing data from other domains, such as the Internet, for domain-specific adaptation through the use of an optimal feature set. Consequently, when the optimal feature set is passed on to any engine, the optimal feature set produces robust models that can be used for natural language call routing. | 08-28-2008 |
20080221892 | SYSTEMS AND METHODS FOR AN AUTONOMOUS AVATAR DRIVER - The autonomous avatar driver is useful in association with language sources. A sourcer may receive dialog from the language source. It may also, in some embodiments, receive external data from data sources. A segmentor may convert characters, represent particles and split dialog. A parser may then apply a link grammar, analyze grammatical mood, tag the dialog and prune dialog variants. A semantic engine may lookup token frames, generate semantic lexicons and semantic networks, and resolve ambiguous co-references. An analytics engine may filter common words from dialog, analyze N-grams, count lemmatized words, and analyze nodes. A pragmatics analyzer may resolve slang, generate knowledge templates, group proper nouns and estimate affect of dialog. A recommender may generate tag clouds, cluster the language sources into neighborhoods, recommend social networking to individuals and businesses, and generate contextual advertising. Lastly, a response generator may generate responses for the autonomous avatar using the analyzed dialog. The response generator may also incorporate the generated recommendations. | 09-11-2008 |
20080221893 | SYSTEM AND METHOD FOR DYNAMIC LEARNING - New language constantly emerges from complex, collaborative human-human interactions like meetings, such as when a presenter handwrites a new term on a whiteboard while saying it redundantly. The system and method described include devices for receiving various types of human communication activities (e.g., speech, writing and gestures) presented in a multimodally redundant manner, include processors and recognizers for segmenting or parsing, and then recognizing selected sub-word units such as phonemes and syllables, and then include alignment, refinement, and integration modules to find an exact, or at least an approximate, match to the one or more terms that were presented in the multimodally redundant manner. Once the system has performed a successful integration, one or more terms may be newly enrolled into a database of the system, which permits the system to continuously learn and provide an association for proper names, abbreviations, acronyms, symbols, and other forms of communicated language. | 09-11-2008 |
20080228486 | METHOD AND SYSTEM HAVING HYPOTHESIS TYPE VARIABLE THRESHOLDS - A method (and system) for spoken dialog confirmation classifies a plurality of spoken dialog hypotheses, and assigns a threshold to each class of spoken dialog hypotheses. | 09-18-2008 |
20080235021 | Indexing Digitized Speech With Words Represented In The Digitized Speech - Indexing digitized speech with words represented in the digitized speech, with a multimodal digital audio editor operating on a multimodal device supporting modes of user interaction, the modes of user interaction including a voice mode and one or more non-voice modes, the multimodal digital audio editor operatively coupled to an ASR engine, including providing by the multimodal digital audio editor to the ASR engine digitized speech for recognition; receiving in the multimodal digital audio editor from the ASR engine recognized user speech including a recognized word, also including information indicating where, in the digitized speech, representation of the recognized word begins; and inserting by the multimodal digital audio editor the recognized word, in association with the information indicating where, in the digitized speech, representation of the recognized word begins, into a speech recognition grammar, the speech recognition grammar voice enabling user interface commands of the multimodal digital audio editor. | 09-25-2008 |
20080235022 | Automatic Speech Recognition With Dynamic Grammar Rules - Automatic speech recognition implemented with a speech recognition grammar of a multimodal application in an ASR engine, the multimodal application operating on a multimodal device supporting multiple modes of user interaction including a voice mode, the multimodal application operatively coupled to the ASR engine, including: matching by the ASR engine at least one static rule of the speech recognition grammar with at least one word of a voice utterance, yielding a matched value, the matched value specified by the grammar to be required for processing of a dynamic rule of the grammar; and dynamically defining at run time the dynamic rule of the grammar as a new static rule in dependence upon the matched value, the dynamic rule comprising a rule that is specified by the grammar as a rule that is not to be processed by the ASR until after the at least one static rule has been matched. | 09-25-2008 |
20080235023 | SYSTEMS AND METHODS FOR RESPONDING TO NATURAL LANGUAGE SPEECH UTTERANCE - Systems and methods for receiving natural language queries and/or commands and executing the queries and/or commands. The systems and methods overcome the deficiencies of prior art speech query and response systems through the application of a complete speech-based information query, retrieval, presentation and command environment. This environment makes significant use of context, prior information, domain knowledge, and user specific profile data to achieve a natural environment for one or more users making queries or commands in multiple domains. Through this integrated approach, a complete speech-based natural language query and response environment can be created. The systems and methods create, store and use extensive personal profile information for each user, thereby improving the reliability of determining the context and presenting the expected results for a particular question or command. | 09-25-2008 |
20080243507 | NATURAL ERROR HANDLING IN SPEECH RECOGNITION - A user interface, and associated techniques, that permit a fast and efficient way of correcting speech recognition errors, or of diminishing their impact. The user may correct mistakes in a natural way, essentially by repeating the information that was incorrectly recognized previously. Such a mechanism closely approximates what human-to-human dialogue would be in similar circumstances. Such a system fully takes advantage of all the information provided by the user, and on its own estimates the quality of the recognition in order to determine the correct sequence of words in the fewest number of steps. | 10-02-2008 |
20080249775 | Information exchange system and method - A method receives a request for information regarding a product or service from a user. The received request is provided to a speech processing system which attempts to generate an automated response to the received request. If the speech processing system generates a response to the received request, that response is provided to the user. However, if the speech processing system does not generate a response to the received request, the user is referred to an advisor to handle the received request. | 10-09-2008 |
20080255845 | Speech Based Query System Using Semantic Decoding - An intelligent query system for processing voice-based queries is disclosed, which uses a combination of both statistical and semantic based processing to identify the question posed by the user by understanding the meaning of the user's utterance. Based on identifying the meaning of the utterance, the system selects a single answer that best matches the user's query. The answer that is paired to this single question is then retrieved and presented to the user. The system, as implemented, accepts environmental variables selected by the user and is scalable to provide answers to a variety and quantity of user-initiated queries. | 10-16-2008 |
20080270135 | METHOD AND SYSTEM FOR USING A STATISTICAL LANGUAGE MODEL AND AN ACTION CLASSIFIER IN PARALLEL WITH GRAMMAR FOR BETTER HANDLING OF OUT-OF-GRAMMAR UTTERANCES - A method (and system) of handling out-of-grammar utterances includes building a statistical language model for a dialog state, generating sentences and semantic interpretations for the sentences using finite state grammar, building a statistical action classifier, receiving user input, carrying out recognition with the finite state grammar, carrying out recognition with the statistical language model, using the statistical action classifier to find semantic interpretations, comparing an output from the finite state grammar and an output from the statistical language model, deciding which output of the output from the finite state grammar and the output from the statistical language model to keep as a final recognition output, selecting the final recognition output, and outputting the final recognition result, wherein the statistical action classifier, the finite state grammar and the statistical language model are used in conjunction to carry out speech recognition and interpretation. | 10-30-2008 |
20080270136 | Methods and Apparatus for Use in Speech Recognition Systems for Identifying Unknown Words and for Adding Previously Unknown Words to Vocabularies and Grammars of Speech Recognition Systems - The present invention concerns methods and apparatus for identifying and assigning meaning to words not recognized by a vocabulary or grammar of a speech recognition system. In an embodiment of the invention, the word may be in an acoustic vocabulary of the speech recognition system, but may be unrecognized by an embedded grammar of a language model of the speech recognition system. In another embodiment of the invention, the word may not be recognized by any vocabulary associated with the speech recognition system. In embodiments of the invention, at least one hypothesis is generated for an utterance not recognized by the speech recognition system. If the at least one hypothesis meets at least one predetermined criterion, one or more words corresponding to the at least one hypothesis are added to the vocabulary of the speech recognition system. In other embodiments of the invention, before adding the word to the vocabulary of the speech recognition system, the at least one hypothesis may be presented to the user of the speech recognition system to determine if that is what the user intended when the user spoke. | 10-30-2008 |
20080275704 | Method for a System of Performing a Dialogue Communication with a User - The present invention relates to a method for a system ( | 11-06-2008 |
20080300881 | Speech Recognition Device Using Statistical Language Model | 12-04-2008 |
20080319751 | SYSTEMS AND METHODS FOR RESPONDING TO NATURAL LANGUAGE SPEECH UTTERANCE - Systems and methods for receiving natural language queries and/or commands and executing the queries and/or commands. The systems and methods overcome the deficiencies of prior art speech query and response systems through the application of a complete speech-based information query, retrieval, presentation and command environment. This environment makes significant use of context, prior information, domain knowledge, and user specific profile data to achieve a natural environment for one or more users making queries or commands in multiple domains. Through this integrated approach, a complete speech-based natural language query and response environment can be created. The systems and methods create, store and use extensive personal profile information for each user, thereby improving the reliability of determining the context and presenting the expected results for a particular question or command. | 12-25-2008 |
20090018833 | MODEL WEIGHTING, SELECTION AND HYPOTHESES COMBINATION FOR AUTOMATIC SPEECH RECOGNITION AND MACHINE TRANSLATION - A translation method and system include a recognition engine having a plurality of models each being employed to decode a same utterance to provide an output. A model combiner is configured to assign probabilities to each model output and configured to assign weights to the outputs of the plurality of models based on the probabilities to provide a best performing model for the context of the utterance. | 01-15-2009 |
20090018834 | Personal Virtual Assistant - A computer-based virtual assistant includes a virtual assistant application running on a computer capable of receiving human voice communications from a user of a remote user interface and transmitting a vocalization to the remote user interface, the virtual assistant application enabling the user to access email and voicemail messages of the user, the virtual assistant application selecting a responsive action to a verbal query or instruction received from the remote user interface and transmitting a vocalization characterizing the selected responsive action to the remote user interface, and the virtual assistant waiting a predetermined period of time, and if no canceling indication is received from the remote user interface, proceeding to perform the selected responsive action, and if a canceling indication is received from the remote user interface halting the selected responsive action and transmitting a new vocalization to the remote user interface. Also a method of using the virtual assistant. | 01-15-2009 |
20090018835 | Personal Virtual Assistant - A computer-based virtual assistant includes a virtual assistant application running on a computer capable of receiving human voice communications from a user of a remote user interface and transmitting a vocalization to the remote user interface, the virtual assistant application enabling the user to access email and voicemail messages of the user, the virtual assistant application selecting a responsive action to a verbal query or instruction received from the remote user interface and transmitting a vocalization characterizing the selected responsive action to the remote user interface, and the virtual assistant waiting a predetermined period of time, and if no canceling indication is received from the remote user interface, proceeding to perform the selected responsive action, and if a canceling indication is received from the remote user interface halting the selected responsive action and transmitting a new vocalization to the remote user interface. Also a method of using the virtual assistant. | 01-15-2009 |
20090024391 | SPEECH RECOGNITION SYSTEM AND METHOD - According to the present invention, a method for integrating processes with a multi-faceted human centered interface is provided. The interface is facilitated to implement a hands free, voice driven environment to control processes and applications. A natural language model is used to parse voice initiated commands and data, and to route those voice initiated inputs to the required applications or processes. The use of an intelligent context based parser allows the system to intelligently determine what processes are required to complete a task which is initiated using natural language. A single window environment provides an interface which is comfortable to the user by preventing the occurrence of distracting windows from appearing. The single window has a plurality of facets which allow distinct viewing areas. Each facet has an independent process routing its outputs thereto. As other processes are activated, each facet can reshape itself to bring a new process into one of the viewing areas. All activated processes are executed simultaneously to provide true multitasking. | 01-22-2009 |
20090024392 | SPEECH RECOGNITION DICTIONARY COMPILATION ASSISTING SYSTEM, SPEECH RECOGNITION DICTIONARY COMPILATION ASSISTING METHOD AND SPEECH RECOGNITION DICTIONARY COMPILATION ASSISTING PROGRAM - A speech recognition dictionary compilation assisting system for efficiently creating/updating a speech recognition dictionary/language model with reduced speech recognition errors by using text data available at low cost. The system comprises a recognition dictionary storage section ( | 01-22-2009 |
20090030692 | Natural Language System and Method Based on Unisolated Performance Metric - A natural language business system and method is developed to understand the underlying meaning of a person's speech, such as during a transaction with the business system. The system includes a speech recognition engine, an action classification engine, and a control module. The control module causes the system to execute an inventive method wherein the speech recognition and action classification models may be recursively optimized on an unisolated performance metric that is pertinent to the overall performance of the natural language business system, as opposed to the isolated model-specific criteria previously employed. | 01-29-2009 |
20090043582 | METHOD AND SYSTEM FOR CREATION OF VOICE TRAINING PROFILES WITH MULTIPLE METHODS WITH UNIFORM SERVER MECHANISM USING HETEROGENEOUS DEVICES - A system and method for creating user voice profiles enables a user to create a single user voice profile that is compatible with one or more voice servers. Such a system includes a training server that receives audio information from a client associated with a user and stores the audio information and corresponding textual information. The system further includes a training server adaptor. The training server adaptor is configured to receive a voice profile format and a communication protocol corresponding to one of the plurality of voice servers, convert the audio information and corresponding textual information into a format compatible with the voice profile format and communication protocol corresponding to the one of the plurality of voice servers, and provide the converted audio information and corresponding textual information to the one of the plurality of voice servers. | 02-12-2009 |
20090048839 | SPEECH RECOGNITION APPARATUS AND METHOD THEREOF - A speech recognition apparatus includes a first grammar storage unit configured to store one or more grammar segments, a second grammar storage unit configured to store one or more grammar segments, a first decoder configured to carry out a decoding process by referring to the grammar segment stored in the second grammar storage unit, a grammar transfer unit configured to transfer a trailing grammar segment from the first grammar storage unit to the second grammar storage unit, a second decoder configured to operate in parallel to the grammar transfer unit and carry out the decoding process by referring to the grammar segment stored in the second grammar storage unit, and a recognition control unit configured to monitor the state of transfer of the trailing grammar segment carried out by the grammar transfer unit and activate both decoders by switching the operation thereof according to the state of transfer of the grammar segment. | 02-19-2009 |
20090055184 | Creation and Use of Application-Generic Class-Based Statistical Language Models for Automatic Speech Recognition - A method of creating an application-generic class-based SLM includes, for each of a plurality of speech applications, parsing a corpus of utterance transcriptions to produce a first output set, in which expressions identified in the corpus are replaced with corresponding grammar tags from a grammar that is specific to the application. The method further includes, for each of the plurality of speech applications, replacing each of the grammar tags in the first output set with a class identifier of an application-generic class, to produce a second output set. The method further includes processing the resulting second output sets with a statistical language model (SLM) trainer to generate an application-generic class-based SLM. (An illustrative sketch of this tag-to-class substitution appears after this listing.) | 02-26-2009 |
20090055185 | VOICE CHAT SYSTEM, INFORMATION PROCESSING APPARATUS, SPEECH RECOGNITION METHOD, KEYWORD DATA ELECTRODE DETECTION METHOD, AND PROGRAM - A voice chat system includes a plurality of information processing apparatuses that perform a voice chat while performing speech recognition and a search server connected to the plural information processing apparatuses via a communication network. The search server discloses a search keyword list containing the search keywords searched by the search server to at least one of the plural information processing apparatuses. The at least one information processing apparatus includes a recognition word dictionary generating unit that acquires the search keyword list from the search server to generate a recognition word dictionary containing words for use in the speech recognition, and a speech recognition unit that performs speech recognition on voice data obtained from a dialog of the conversation during the voice chat by referencing a recognition database containing the recognition word dictionary. | 02-26-2009 |
20090070112 | Automatic reading tutoring - A method of providing automatic reading tutoring is disclosed. The method includes retrieving a textual indication of a story from a data store and creating a language model including constructing a target context free grammar indicative of a first portion of the story. A first acoustic input is received and a speech recognition engine is employed to recognize the first acoustic input. An output of the speech recognition engine is compared to the language model and a signal indicative of whether the output of the speech recognition matches at least a portion of the target context free grammar is provided. | 03-12-2009 |
20090070113 | SYSTEM FOR HANDLING FREQUENTLY ASKED QUESTIONS IN A NATURAL LANGUAGE DIALOG SERVICE - A voice-enabled help desk service is disclosed. The service comprises an automatic speech recognition module for recognizing speech from a user, a spoken language understanding module for understanding the output from the automatic speech recognition module, a dialog management module for generating a response to speech from the user, a natural voices text-to-speech synthesis module for synthesizing speech to generate the response to the user, and a frequently asked questions module. The frequently asked questions module handles frequently asked questions from the user by changing voices and providing predetermined prompts to answer the frequently asked question. | 03-12-2009 |
20090076818 | METHOD AND SYSTEM OF BUILDING A GRAMMAR RULE WITH BASEFORMS GENERATED DYNAMICALLY FROM USER UTTERANCES | 03-19-2009 |
20090089060 | Document Based Character Ambiguity Resolution - Methods and apparatus for document based ambiguous character resolution. An application searches a document for words that do not contain ambiguous characters and adds them to a dictionary, then searches the document for words that do contain ambiguous characters. For each ambiguous word, a set of candidate solutions is created by resolving the ambiguous characters in all possible ways. The dictionary is searched for words matching members of the candidate solution set. When a single member is matched, the ambiguous characters are resolved accordingly. When no member or more than one member is matched, a user is prompted to resolve the ambiguous characters. Alternatively, when more than one member is matched, the ambiguous characters are resolved to obtain the largest word, the smallest word, the most words, or the fewest words. | 04-02-2009 |
20090150156 | SYSTEM AND METHOD FOR PROVIDING A NATURAL LANGUAGE VOICE USER INTERFACE IN AN INTEGRATED VOICE NAVIGATION SERVICES ENVIRONMENT - A conversational, natural language voice user interface may provide an integrated voice navigation services environment. The voice user interface may enable a user to make natural language requests relating to various navigation services, and further, may interact with the user in a cooperative, conversational dialogue to resolve the requests. Through dynamic awareness of context, available sources of information, domain knowledge, user behavior and preferences, and external systems and devices, among other things, the voice user interface may provide an integrated environment in which the user can speak conversationally, using natural language, to issue queries, commands, or other requests relating to the navigation services provided in the environment. | 06-11-2009 |
20090157404 | GRAMMAR WEIGHTING VOICE RECOGNITION INFORMATION - A device receives a voice recognition statistic from a voice recognition application and applies a grammar improvement rule based on the voice recognition statistic. The device also automatically adjusts a weight of the voice recognition statistic based on the grammar improvement rule, and outputs the weight adjusted voice recognition statistic for use in the voice recognition application. | 06-18-2009 |
20090157405 | USING PARTIAL INFORMATION TO IMPROVE DIALOG IN AUTOMATIC SPEECH RECOGNITION SYSTEMS - A method, system and computer readable device for recognizing a partial utterance in an automatic speech recognition (ASR) system, wherein said method comprises the steps of receiving, by an ASR recognition unit, an input signal representing a speech utterance or word and transcribing the input signal into text; interpreting, by an ASR interpreter unit, whether the text is either a positive or a negative match to a list of automated options by matching the text with a grammar or semantic database representing the list of automated options, wherein if the ASR interpreter unit results in said positive match, proceeding to a next input signal, and if the ASR interpreter unit results in said negative match, rejecting the text as representing said partial utterance; and processing, by a linguistic filtering unit, the rejected text to derive a correct match between the rejected text and the grammar or semantic database. The derived word is then used for responding to the user in the next dialog turn in order to reduce or eliminate churn in the human-computer spoken dialog interaction. | 06-18-2009 |
20090171663 | REDUCING A SIZE OF A COMPILED SPEECH RECOGNITION GRAMMAR - The present invention discloses creating and using speech recognition grammars of reduced size. The reduced speech recognition grammars can include a set of entries, each entry having a unique identifier and a phonetic representation that is used when matching speech input against the entries. Each entry can lack a textual spelling corresponding to the phonetic representation. The reduced speech recognition grammar can be digitally encoded and stored in a computer readable media, such as a hard drive or flash memory of a portable speech enabled device. | 07-02-2009 |
20090171664 | SYSTEMS AND METHODS FOR RESPONDING TO NATURAL LANGUAGE SPEECH UTTERANCE - Systems and methods for receiving natural language queries and/or commands and executing the queries and/or commands. The systems and methods overcome the deficiencies of prior art speech query and response systems through the application of a complete speech-based information query, retrieval, presentation and command environment. This environment makes significant use of context, prior information, domain knowledge, and user specific profile data to achieve a natural environment for one or more users making queries or commands in multiple domains. Through this integrated approach, a complete speech-based natural language query and response environment can be created. The systems and methods create, store and use extensive personal profile information for each user, thereby improving the reliability of determining the context and presenting the expected results for a particular question or command. | 07-02-2009 |
20090198496 | Aspect oriented programmable dialogue manager and apparatus operated thereby - A dialogue system enabling a natural language interaction between a user and a machine having a script interpreter capable of executing dialogue specifications formed according to the rules of an aspect oriented programming language. The script interpreter further contains an advice executor which operates in a weaver type fashion using an appropriately defined select function to determine at most one advice to be executed at join points identified by pointcuts. | 08-06-2009 |
20090222267 | AUTOMATED SENTENCE PLANNING IN A TASK CLASSIFICATION SYSTEM - The invention relates to a task classification system ( | 09-03-2009 |
20090240500 | SPEECH RECOGNITION APPARATUS AND METHOD - A speech recognition apparatus includes a storage unit which stores vocabularies, each of the vocabularies including plural word body data, each of the word body data obtained by removing a specific word head from a word or sentence, and stores at least one word head portion including labeled nodes to express at least one common word head common to at least two of the vocabularies, an instruction receiving unit which receives an instruction of a target vocabulary and an instruction of an operation, a grammar network generating unit which generates, when adding is instructed, a grammar network containing the word head portion, the target vocabulary and connection information indicating that each of the word body data contained in the target vocabulary is connected to a specific one of the labeled nodes contained in the word head portion, and a speech recognition unit which executes speech recognition using the generated grammar network. | 09-24-2009 |
20090248416 | SYSTEM AND METHOD OF SPOKEN LANGUAGE UNDERSTANDING USING WORD CONFUSION NETWORKS - Word lattices that are generated by an automatic speech recognition system are used to generate a modified word lattice that is usable by a spoken language understanding module. In one embodiment, the spoken language understanding module determines a set of salient phrases by calculating an intersection of the modified word lattice, which is optionally preprocessed, and a finite state machine that includes a plurality of salient grammar fragments. | 10-01-2009 |
20090299745 | SYSTEM AND METHOD FOR AN INTEGRATED, MULTI-MODAL, MULTI-DEVICE NATURAL LANGUAGE VOICE SERVICES ENVIRONMENT - A system and method for an integrated, multi-modal, multi-device natural language voice services environment may be provided. In particular, the environment may include a plurality of voice-enabled devices each having intent determination capabilities for processing multi-modal natural language inputs in addition to knowledge of the intent determination capabilities of other devices in the environment. Further, the environment may be arranged in a centralized manner, a distributed peer-to-peer manner, or various combinations thereof. As such, the various devices may cooperate to determine intent of multi-modal natural language inputs, and commands, queries, or other requests may be routed to one or more of the devices best suited to take action in response thereto. | 12-03-2009 |
20090306984 | SYSTEM FOR AND METHOD OF AUTOMATED QUALITY MONITORING - A system and method according to the present invention automates call monitoring activities to evaluate and directly improve agent-customer interactions. Rather than listening to an entire call or monitoring only a small fraction of all the calls made in the contact center, the system performs highly accurate, automated evaluations of all customer interactions. By automating the time-consuming aspect of monitoring calls, the system empowers contact center operators to address quality issues, more accurately measure, coach and reward agents, and identify business-critical trends. | 12-10-2009 |
20090326947 | SYSTEM AND METHOD FOR SPOKEN TOPIC OR CRITERION RECOGNITION IN DIGITAL MEDIA AND CONTEXTUAL ADVERTISING - Systems and methods for automated analysis and targeting of digital media based upon spoken topic or criterion recognition of the digital media are provided. Pre-specified criteria are used as the starting point for a top-down topic or criterion recognition approach. Individual words used in the audio track of the digital media are recognized only in context of each candidate topic or criterion hypothesis, thus yielding greater accuracy than two-step approaches that first transcribe speech and then recognize topic based upon the transcription. | 12-31-2009 |
20100010814 | ENHANCING MEDIA PLAYBACK WITH SPEECH RECOGNITION - A method for enhancing a media file to enable speech-recognition of spoken navigation commands can be provided. The method can include receiving a plurality of textual items based on subject matter of the media file and generating a grammar for each textual item, thereby generating a plurality of grammars for use by a speech recognition engine. The method can further include associating a time stamp with each grammar, wherein a time stamp indicates a location in the media file of a textual item corresponding with a grammar. The method can further include associating the plurality of grammars with the media file, such that speech recognized by the speech recognition engine is associated with a corresponding location in the media file. | 01-14-2010 |
20100023331 | Speech recognition semantic classification training - An automated method is described for developing an automated speech input semantic classification system such as a call routing system. A set of semantic classifications is defined for classification of input speech utterances, where each semantic classification represents a specific semantic classification of the speech input. The semantic classification system is trained from training data having little or no in-domain manually transcribed training data, and then operated to assign input speech utterances to the defined semantic classifications. Adaptation training data based on input speech utterances is collected with manually assigned semantic labels. When the adaptation training data satisfies a pre-determined adaptation criteria, the semantic classification system is automatically retrained based on the adaptation training data. | 01-28-2010 |
20100030560 | SPEECH RECOGNITION SYSTEM, SPEECH RECOGNITION METHOD, AND SPEECH RECOGNITION PROGRAM - Disclosed is a speech recognition system in which a common data processing means performs speech recognition of a speech captured by a speech input means to generate recognition result hypotheses which are not biased toward any one of the applications, and an adaptation data processing means regenerates recognition result hypotheses, using adaptation data and adaptation processing for each application. The adaptation data processing means provides to each application the recognition result recalculated for each application. | 02-04-2010 |
20100049520 | SYSTEMS AND METHODS FOR AUTOMATICALLY DETERMINING CULTURE-BASED BEHAVIOR IN CUSTOMER SERVICE INTERACTIONS - Systems and methods are provided to automatically determine culture-based behavioral tendencies and preferences of individuals in the context of customer service interactions. For example, systems and methods are provided to process natural language dialog input of an individual to detect linguistic features indicative of individualistic and collectivistic behavioral tendencies and predict whether such individual will be cooperative or uncooperative with automated customer service. | 02-25-2010 |
20100049521 | SELECTIVE ENABLEMENT OF SPEECH RECOGNITION GRAMMARS - A method for processing speech audio in a network connected client device can include selecting a speech grammar for use in a speech recognition system in the network connected client device; characterizing the selected speech grammar; and, based on the characterization, determining whether to process the speech grammar locally in the network connected client device, or remotely in a speech server in the network. In one aspect of the invention, the selecting step can include establishing a communications session with a speech server; and, querying the speech server for a speech grammar over the established communications session. Additionally, the selecting step can further include registering the speech grammar in the speech recognition system. | 02-25-2010 |
20100057463 | System and Method for Generating Natural Language Phrases From User Utterances in Dialog Systems - Embodiments of a dialog system that employs a corpus-based approach to generate responses based on a given number of semantic constraint-value pairs are described. The system makes full use of the data from the user input to produce dialog system responses in combination with a template generator. The system primarily utilizes constraint values in order to realize efficiencies based on the more frequent tasks performed in real dialog systems although rhetorical or discourse aspects of the dialog could also be included in a similar way, that is, labeling the data with such information and performing a training process. The benefits of this system include higher quality user-aligned responses, broader coverage, faster response time, and shorter development cycles. | 03-04-2010 |
20100082343 | SEQUENTIAL SPEECH RECOGNITION WITH TWO UNEQUAL ASR SYSTEMS - Sequential speech recognition using two unequal automatic speech recognition (ASR) systems may be provided. The system may provide two sets of vocabulary data. A determination may be made as to whether entries in one set of vocabulary data are likely to be confused with entries in the other set of vocabulary data. If confusion is likely, a decoy entry from one set of the vocabulary data may be placed in the other set of vocabulary data to ensure more efficient and accurate speech recognition processing may take place. | 04-01-2010 |
20100100383 | SYSTEM AND METHOD FOR SEARCHING WEBPAGE WITH VOICE CONTROL - The present invention relates to a system and method for searching a webpage by voice control. After an explorer opens a webpage, a natural language processing module searches a plurality of texts and hyperlinks in the webpage and remarks them with numbers and colors, and generates a constructive concept script representing them after analyzing the text, the hyperlink and the remarks. The constructive concept script and the corresponding linking commands can be stored in the database. The user only has to speak out the number, color or keyword; the system receives the user's voice signal by a sound receiving device and generates the constructive concept script by a voice identification module in order to match with a plurality of constructive concept scripts stored in the database, and executes the command of the resulting match. Thus, the user can directly say the number, color or keyword, and the explorer will link to the desired webpage accordingly. Because the operation in the explorer is convenient and prompt, the user can retrieve the desired webpage in a shorter time. | 04-22-2010 |
20100100384 | Speech Recognition System with Display Information - A language processing system may determine a display form of a spoken word by analyzing the spoken form using a language model that includes dictionary entries for display forms of homonyms. The homonyms may include trade names as well as given names and other phrases. The language processing system may receive spoken language and produce a display form of the language while displaying the proper form of the homonym. Such a system may be used in search systems where audio input is converted to a graphical display of a portion of the spoken input. | 04-22-2010 |
20100114577 | METHOD AND DEVICE FOR THE NATURAL-LANGUAGE RECOGNITION OF A VOCAL EXPRESSION - The invention relates to a method and a device for the natural-language recognition of a vocal expression. A vocal expression of a person is detected and converted into a voice signal to be processed by a voice recognition device. Afterwards, the voice signal is analyzed at the same time or sequentially in a plurality of voice recognition branches of the voice recognition device using a plurality of grammars, wherein the recognition process is successfully completed if the analysis of the voice signal in at least one voice recognition branch supplies a positive recognition result. | 05-06-2010 |
20100131274 | SYSTEM AND METHOD FOR DIALOG MODELING - Disclosed herein are systems, computer-implemented methods, and computer-readable media for dialog modeling. The method includes receiving spoken dialogs annotated to indicate dialog acts and task/subtask information, parsing the spoken dialogs with a hierarchical, parse-based dialog model which operates incrementally from left to right and which only analyzes a preceding dialog context to generate parsed spoken dialogs, and constructing a functional task structure of the parsed spoken dialogs. The method can further either interpret user utterances with the functional task structure of the parsed spoken dialogs or plan system responses to user utterances with the functional task structure of the parsed spoken dialogs. The parse-based dialog model can be a shift-reduce model, a start-complete model, or a connection path model. | 05-27-2010 |
20100131275 | FACILITATING MULTIMODAL INTERACTION WITH GRAMMAR-BASED SPEECH APPLICATIONS - Multimodal interaction with grammar-based speech applications may be facilitated with a device by presenting permissible phrases that are in-grammar based on acceptable terms that are in-vocabulary and that have been recognized from a spoken utterance. In an example embodiment, a spoken utterance having two or more terms is received. The two or more terms include one or more acceptable terms. An index is searched using the acceptable terms as query terms. From the searching of the index, permissible phrase(s) are produced that include the acceptable terms. The index is a searchable data structure that represents multiple possible grammar paths that are ascertainable based on acceptable values for each term position of a grammar-based speech application. The permissible phrase(s) are presented to a user as option(s) that may be selected to conduct multimodal interaction with the device. | 05-27-2010 |
20100145699 | ADAPTATION OF AUTOMATIC SPEECH RECOGNITION ACOUSTIC MODELS - Methods and systems for adapting acoustic models are disclosed. A user terminal may determine a phoneme distribution of a text corpus, determine an acoustic model gain distribution of phonemes of an acoustic model before and after adaptation of the acoustic model, determine a desired phoneme distribution based on the phoneme distribution and the acoustic model gain distribution, generate an adaptation sentence based on the desired phoneme distribution, and generate a prompt requesting a user speak the adaptation sentence. | 06-10-2010 |
20100145700 | MOBILE SYSTEMS AND METHODS FOR RESPONDING TO NATURAL LANGUAGE SPEECH UTTERANCE - Mobile systems and methods that overcome the deficiencies of prior art speech-based interfaces for telematics applications through the use of a complete speech-based information query, retrieval, presentation and local or remote command environment. This environment makes significant use of context, prior information, domain knowledge, and user specific profile data to achieve a natural environment for one or more users making queries or commands in multiple domains. Through this integrated approach, a complete speech-based natural language query and response environment can be created. The invention creates, stores and uses extensive personal profile information for each user, thereby improving the reliability of determining the context and presenting the expected results for a particular question or command. The invention may organize domain specific behavior and information into agents that are distributable or updateable over a wide area network. The invention can be used in dynamic environments such as those of mobile vehicles to control and communicate with both vehicle systems and remote systems and devices. | 06-10-2010 |
20100153112 | PROGRESSIVELY REFINING A SPEECH-BASED SEARCH - Disclosed are editing methods that are added to speech-based searching to allow users to better understand textual queries submitted to a search engine and to easily edit their speech queries. According to some embodiments, the user begins to speak. The user's speech is translated into a textual query and submitted to a search engine. The results of the search are presented to the user. As the user continues to speak, the user's speech query is refined based on the user's further speech. The refined speech query is converted to a textual query which is again submitted to the search engine. The refined results are presented to the user. This process continues as long as the user continues to refine the query. Some embodiments present the textual query to the user and allow the user to use both speech-based and non-speech-based tools to edit the textual query. | 06-17-2010 |
20100161336 | SYSTEM AND METHOD FOR ENHANCING SPEECH RECOGNITION ACCURACY - Disclosed herein are systems, computer-implemented methods, and computer-readable media for enhancing speech recognition accuracy. The method includes dividing a system dialog turn into segments based on timing of probable user responses, generating a weighted grammar for each segment, exclusively activating the weighted grammar generated for a current segment of the dialog turn during the current segment of the dialog turn, and recognizing user speech received during the current segment using the activated weighted grammar generated for the current segment. The method can further include assigning a probability to the weighted grammar based on historical user responses, wherein activating each weighted grammar is based on the assigned probability. Weighted grammars can be generated based on a user profile. A weighted grammar can be generated for two or more segments. Exclusively activating each weighted grammar can include a transition period blending the previously activated grammar and the grammar to be activated. | 06-24-2010 |
20100161337 | SYSTEM AND METHOD FOR RECOGNIZING SPEECH WITH DIALECT GRAMMARS - Disclosed herein are systems, computer-implemented methods, and computer-readable media for recognizing speech. The method includes receiving speech from a user, perceiving at least one speech dialect in the received speech, selecting at least one grammar from a plurality of optimized dialect grammars based on at least one score associated with the perceived at least one speech dialect, and recognizing the received speech with the selected at least one grammar. Selecting at least one grammar can be further based on a user profile. Multiple grammars can be blended. Predefined parameters can include pronunciation differences, vocabulary, and sentence structure. Optimized dialect grammars can be domain specific. The method can further include recognizing initial received speech with a generic grammar until an optimized dialect grammar is selected. Selecting at least one grammar from a plurality of optimized dialect grammars can be based on a certainty threshold. | 06-24-2010 |
20100204994 | SYSTEMS AND METHODS FOR RESPONDING TO NATURAL LANGUAGE SPEECH UTTERANCE - Systems and methods for receiving natural language queries and/or commands and executing the queries and/or commands. The systems and methods overcome the deficiencies of prior art speech query and response systems through the application of a complete speech-based information query, retrieval, presentation and command environment. This environment makes significant use of context, prior information, domain knowledge, and user specific profile data to achieve a natural environment for one or more users making queries or commands in multiple domains. Through this integrated approach, a complete speech-based natural language query and response environment can be created. The systems and methods create, store and use extensive personal profile information for each user, thereby improving the reliability of determining the context and presenting the expected results for a particular question or command. | 08-12-2010 |
20100241431 | System and Method for Multi-Modal Input Synchronization and Disambiguation - Embodiments of a dialog system that utilizes a multi-modal input interface for recognizing user input in human-machine interaction (HMI) systems are described. Embodiments include a component that receives user input from a plurality of different user input mechanisms (multi-modal input) and performs certain synchronization and disambiguation processes. The multi-modal input component synchronizes and integrates the information obtained from different modalities, disambiguates the input, and recovers from any errors that might be produced with respect to any of the user inputs. Such a system effectively addresses any ambiguity associated with the user input and corrects for errors in the human-machine interaction. | 09-23-2010 |
20100286985 | SYSTEMS AND METHODS FOR RESPONDING TO NATURAL LANGUAGE SPEECH UTTERANCE - Systems and methods for receiving natural language queries and/or commands and executing the queries and/or commands. The systems and methods overcome the deficiencies of prior art speech query and response systems through the application of a complete speech-based information query, retrieval, presentation and command environment. This environment makes significant use of context, prior information, domain knowledge, and user specific profile data to achieve a natural environment for one or more users making queries or commands in multiple domains. Through this integrated approach, a complete speech-based natural language query and response environment can be created. The systems and methods create, store and use extensive personal profile information for each user, thereby improving the reliability of determining the context and presenting the expected results for a particular question or command. | 11-11-2010 |
20100318359 | APPLICATION-DEPENDENT INFORMATION FOR RECOGNITION PROCESSING - Architecture for integrating application-dependent information into a constraints component at deployment time or when available. In terms of a general grammar, the constraints component can include or be a general grammar that comprises application-independent information and is structured in such a way that application-dependent information can be integrated into the general grammar without loss of fidelity. The general grammar includes a probability space and reserves a section of the probability space for the integration of application-dependent information. An integration component integrates the application-dependent information into the reserved section of the probability space for recognition processing. The application-dependent information is integrated into the reserved section of the probability space at deployment time or when available. The general grammar is structured to support the integration and improve the overall system. | 12-16-2010 |
20110010177 | QUESTION AND ANSWER DATABASE EXPANSION APPARATUS AND QUESTION AND ANSWER DATABASE EXPANSION METHOD - A question and answer database expansion apparatus includes: a question and answer database in which questions and answers corresponding to the questions are registered in association with each other, a first speech recognition unit which carries out speech recognition for an input sound signal by using a language model based on the question and answer database, and outputs a first speech recognition result as the recognition result, a second speech recognition unit which carries out speech recognition for the input sound signal by using a language model based on a large vocabulary database, and outputs a second speech recognition result as the recognition result, and a question detection unit which detects an unregistered utterance, which is not registered in the question and answer database, from the input sound based on the first speech recognition result and the second speech recognition result, and outputs the detected unregistered utterance. | 01-13-2011 |
20110015928 | COMBINATION AND FEDERATION OF LOCAL AND REMOTE SPEECH RECOGNITION - Techniques to provide automatic speech recognition at a local device are described. An apparatus may include an audio input to receive audio data indicating a task. The apparatus may further include a local recognizer component to receive the audio data, to pass the audio data to a remote recognizer while receiving the audio data, and to recognize speech from the audio data. The apparatus may further include a federation component operative to receive one or more recognition results from the local recognizer and/or the remote recognizer, and to federate a plurality of recognition results to produce a most likely result. The apparatus may further include an application to perform the task indicated by the most likely result. Other embodiments are described and claimed. | 01-20-2011 |
20110077944 | SPEECH RECOGNITION MODULE AND APPLICATIONS THEREOF - A speech recognition module includes an acoustic front-end module, a sound detection module, and a word detection module. The acoustic front-end module generates a plurality of representations of frames from a digital audio signal and generates speech characteristic probabilities for the plurality of frames. The sound detection module determines a plurality of estimated utterances from the plurality of representations and the speech characteristic probabilities. The word detection module determines one or more words based on the plurality of estimated utterances and the speech characteristics probabilities. | 03-31-2011 |
20110093271 | MULTIMODAL NATURAL LANGUAGE QUERY SYSTEM FOR PROCESSING AND ANALYZING VOICE AND PROXIMITY-BASED QUERIES - The present disclosure provides a natural language query system and method for processing and analyzing multimodally-originated queries, including voice and proximity-based queries. The natural language query system includes a Web-enabled device including a speech input module for receiving a voice-based query in natural language form from a user and a location/proximity module for receiving location/proximity information from a location/proximity device. The query system also includes a speech conversion module for converting the voice-based query in natural language form to text in natural language form and a natural language processing module for converting the text in natural language form to text in searchable form. The query system further includes a semantic engine module for converting the text in searchable form to a formal database query and a database-look-up module for using the formal database query to obtain a result related to the voice-based query in natural language form from a database. | 04-21-2011 |
20110137654 | SYSTEM AND METHOD FOR EFFICIENT TRACKING OF MULTIPLE DIALOG STATES WITH INCREMENTAL RECOMBINATION - Disclosed herein are systems, methods, and computer-readable storage media for tracking multiple dialog states. A system practicing the method receives an N-best list of speech recognition candidates, a list of current partitions, and a belief for each of the current partitions. A partition is a group of dialog states. In an outer loop, the system iterates over the N-best list of speech recognition candidates. In an inner loop, the system performs a split, update, and recombination process to generate a fixed number of partitions after each speech recognition candidate in the N-best list. The system recognizes speech based on the N-best list and the fixed number of partitions. The split process can perform all possible splits on all partitions. The update process can compute an estimated new belief. The estimated new belief can be a product of ASR reliability, user likelihood to produce this action, and an original belief. (An illustrative sketch of this split/update/recombine loop appears after this listing.) | 06-09-2011 |
20110191107 | Structure for Grammar and Dictionary Representation in Voice Recognition and Method for Simplifying Link and Node-Generated Grammars - A speech recognition engine is provided with an acoustic model and a layered grammar and dictionary library. The layered grammar and dictionary library includes a language and non-grammar layer that supplies types of rules a grammar definition layer can use and defines non-grammar the speech recognition engine should ignore. The layered grammar and dictionary library also includes a dictionary layer that defines phonetic transcriptions for word groups the speech recognition engine is meant to recognize when voice input is received. The layered grammar and dictionary library further includes a grammar definition layer that applies rules from the language and non-grammar layer to define combinations of word groups the speech recognition system is meant to recognize. Voice input is received at a speech recognition engine and is processed using the acoustic model and the layered grammar and dictionary library. | 08-04-2011 |
20110208527 | Voice Activatable System for Providing the Correct Spelling of a Spoken Word - A voice activatable system for providing the correct spelling of a spoken word is disposed in an elongated body of a writing instrument such as a ball point pen. The system includes a microphone whose output is fed to an amplifier and an analog-to-digital converter, and from there to a speech recognition program; the output of the speech recognition program is fed to a computer, namely a word processor/controller that includes a database. The output of the speech recognition is compared with the digital library of words and, when a match is found, it is amplified and fed to a digital-to-analog converter. The output of the digital-to-analog converter is fed to a speaker that repeats the word with the correct pronunciation followed by the correct spelling of the word. The system includes a battery for powering the system as well as an on/off switch and a repeat button for repeating information from the system. | 08-25-2011 |
20110213616 | "System and Method for the Adaptive Use of Uncertainty Information in Speech Recognition to Assist in the Recognition of Natural Language Phrases" - A speech recognition system includes a natural language processing component and an automated speech recognition component distinct from each other such that uncertainty in speech recognition is isolated from uncertainty in natural language understanding, wherein the natural language processing component and an automated speech recognition component communicate corresponding weighted meta-information representative of the uncertainty. | 09-01-2011 |
20110218808 | SYSTEM AND METHOD FOR SPELLING RECOGNITION USING SPEECH AND NON-SPEECH INPUT - A system and method for non-speech input or keypad-aided word and spelling recognition is disclosed. The method includes generating an unweighted grammar, selecting a database of words, generating a weighted grammar using the unweighted grammar and a statistical letter model trained on the database of words, receiving speech from a user after receiving the non-speech input and after generating the weighted grammar, and performing automatic speech recognition on the speech and non-speech input using the weighted grammar. If a confidence is below a predetermined level, then the method includes receiving non-speech input from the user, disambiguating possible spellings by generating a letter lattice based on a user input modality, and constraining the letter lattice and generating a new letter string of possible word spellings until a letter string is correctly recognized. | 09-08-2011 |
20120072222 | Automatic Detection, Summarization And Reporting Of Business Intelligence Highlights From Automated Dialog Systems - A method and system for reporting data from a spoken dialog service is disclosed. The method comprises extracting data regarding user dialogs using a dialog logging module in the spoken dialog service, analyzing the data to identify trends and reporting the trends. The data may be presented in a visual form for easier consumption. The method may also relate to identifying data within the control or outside the control of a service provider that is used to adjust the spoken dialog service to maximize customer retention. | 03-22-2012 |
20120232905 | METHODOLOGY TO IMPROVE FAILURE PREDICTION ACCURACY BY FUSING TEXTUAL DATA WITH RELIABILITY MODEL - A method and system for developing reliability models from unstructured text documents, such as text verbatim descriptions from service technicians. An ontology, or data model, and heuristic rules are used to identify and extract failure modes and parts from the text verbatim comments associated with specific labor codes from service events. Like-meaning but differently-worded terms are then merged using text similarity scoring techniques. The resultant failure modes are used to create enhanced reliability models, where component reliability is predicted in terms of individual failure modes instead of aggregated for the component. The enhanced reliability models provide improved reliability prediction for the component, and also provide insight into aspects of the component design which can be improved in the future. | 09-13-2012 |
20120278080 | COMMUNICATION DEVICE FOR DETERMINING CONTEXTUAL INFORMATION - A method and communication device for determining contextual information is provided. Textual information is received from at least one of an input device and a communication interface at the communication device. The textual information is processed to automatically extract contextual data embedded in the textual information in response to the receiving. Supplementary contextual data is automatically retrieved based on the contextual data from a remote data source via the communication interface in response to the processing. The supplementary contextual data is automatically rendered at the display device in association with the contextual data in response to receiving the supplementary contextual data. | 11-01-2012 |
20120310648 | NAME IDENTIFICATION RULE GENERATING APPARATUS AND NAME IDENTIFICATION RULE GENERATING METHOD - A name identification rule generating method, includes: generating an abstract syntax tree by removing a portion of an input sentence unrelated to a process in analysis of syntax of the input sentence by a computer; setting, in generating the abstract syntax tree, nodes corresponding to a plurality of arguments at the same layer; and generating, in generating the abstract syntax tree, a first character string pattern including a second character string corresponding to a node of the abstract syntax tree where a number of types of terminal symbols on the node is equal to or smaller than a certain multiple of a number of types of processes that call the input sentence. | 12-06-2012 |
20130006640 | Automatic Language Model Update - A method for generating a speech recognition model includes accessing a baseline speech recognition model, obtaining information related to recent language usage from search queries, and modifying the speech recognition model to revise probabilities of a portion of a sound occurrence based on the information. The portion of a sound may include a word. Also, a method for generating a speech recognition model, includes receiving at a search engine from a remote device an audio recording and a transcript that substantially represents at least a portion of the audio recording, synchronizing the transcript with the audio recording, extracting one or more letters from the transcript and extracting the associated pronunciation of the one or more letters from the audio recording, and generating a dictionary entry in a pronunciation dictionary. | 01-03-2013 |
20130013311 | METHOD AND APPARATUS FOR ADAPTING A LANGUAGE MODEL IN RESPONSE TO ERROR CORRECTION - The present invention relates to a method and apparatus for adapting a language model in response to error correction. One embodiment of a method for processing an input signal including human language includes receiving the input signal and applying a statistical language model combined with a separate, corrective language model to the input signal in order to produce a processing result. | 01-10-2013 |
20130096919 | APPARATUS AND ASSOCIATED METHOD FOR MODIFYING MEDIA DATA ENTERED PURSUANT TO A MEDIA FUNCTION - A white board function, and an associated method, for a wireless, or other, device. Entry of graphical and audio media pursuant to the white board function is detected and correlated. A search is performed to locate substitute graphical media amenable for substitution for the entered graphical media. If located, the substitute graphical media is substituted for the entered graphical media. | 04-18-2013 |
20130117024 | STRUCTURED TERM RECOGNITION - A method, system and computer program product for recognizing terms in a specified corpus. In one embodiment, the method comprises providing a set of known terms t ∈ T, each of the known terms t belonging to a set of types Γ (t)={γ | 05-09-2013 |
20130132086 | METHODS AND SYSTEMS FOR ADAPTING GRAMMARS IN HYBRID SPEECH RECOGNITION ENGINES FOR ENHANCING LOCAL SR PERFORMANCE - A speech recognition method includes providing a processor communicatively coupled to each of a local speech recognition engine and a server-based speech recognition engine. A first speech input is inputted into the server-based speech recognition engine. A first recognition result from the server-based speech recognition engine is received at the processor. The first recognition result is based on the first speech input. The first recognition result is stored in a memory device in association with the first speech input. A second speech input is inputted into the local speech recognition engine. The first recognition result is retrieved from the memory device. A second recognition result is produced by the local speech recognition engine. The second recognition result is based on the second speech input and is dependent upon the retrieved first recognition result. | 05-23-2013 |
20130159001 | SATISFYING SPECIFIED INTENT(S) BASED ON MULTIMODAL REQUEST(S) - Techniques are described herein that are capable of satisfying specified intent(s) based on multimodal request(s). A multimodal request is a request that includes at least one request of a first type and at least one request of a second type that is different from the first type. Example types of request include but are not limited to a speech request, a text command, a tactile command, and a visual command. A determination is made that one or more entities in visual content are selected in accordance with an explicit scoping command from a user. In response, speech understanding functionality is automatically activated, and audio signals are automatically monitored for speech requests from the user to be processed using the speech understanding functionality. | 06-20-2013 |
20130185074 | Paraphrasing of User Requests and Results by Automated Digital Assistant - Methods, systems, and computer readable storage medium related to operating an intelligent digital assistant are disclosed. A user request is received, the user request including at least a speech input received from a user. In response to the user request, (1) an echo of the speech input based on a textual interpretation of the speech input, and (2) a paraphrase of the user request based at least in part on a respective semantic interpretation of the speech input are presented to the user. | 07-18-2013 |
20130218564 | System and Method for Providing a Natural Language Interface to a Database - A system and method for providing a natural language interface to a database or the Internet. The method provides a response from a database to a natural language query. The method comprises receiving a user query, extracting key data from the user query, submitting the extracted key data to a database search engine to retrieve the top n pages from the database, processing the top n pages through a natural language dialog engine, and providing a response based on processing the top n pages. | 08-22-2013 |
20130218565 | Enhanced Media Playback with Speech Recognition - A method for enhancing a media file to enable speech-recognition of spoken navigation commands can be provided. The method can include receiving a plurality of textual items based on subject matter of the media file and generating a grammar for each textual item, thereby generating a plurality of grammars for use by a speech recognition engine. The method can further include associating a time stamp with each grammar, wherein a time stamp indicates a location in the media file of a textual item corresponding with a grammar. The method can further include associating the plurality of grammars with the media file, such that speech recognized by the speech recognition engine is associated with a corresponding location in the media file. | 08-22-2013 |
20130275136 | SYSTEM AND METHOD FOR ENHANCING SPEECH RECOGNITION ACCURACY - Disclosed herein are systems, computer-implemented methods, and computer-readable media for enhancing speech recognition accuracy. The method includes dividing a system dialog turn into segments based on timing of probable user responses, generating a weighted grammar for each segment, exclusively activating the weighted grammar generated for a current segment of the dialog turn during the current segment of the dialog turn, and recognizing user speech received during the current segment using the activated weighted grammar generated for the current segment. The method can further include assigning probability to the weighted grammar based on historical user responses and activating each weighted grammar is based on the assigned probability. Weighted grammars can be generated based on a user profile. A weighted grammar can be generated for two or more segments. Exclusively activating each weighted grammar can include a transition period blending the previously activated grammar and the grammar to be activated. | 10-17-2013 |
20130289996 | MULTIPASS ASR CONTROLLING MULTIPLE APPLICATIONS - A multipass processing system includes a first grammar-based speech recognition system that compares a spoken utterance to a sub-grammar. The sub-grammar includes keywords or key phrases from active grammars that each uniquely identifies one of many application engines. The first grammar-based speech recognition system generates a first grammar-based speech recognition result and a first grammar-based confidence score. A demultiplexer receives the spoken utterance through an input. The demultiplexer transmits the spoken utterance to one of many other grammar-based speech recognition systems based on the first grammar-based speech recognition result. | 10-31-2013 |
20130289997 | CONTEXT-BASED INTERACTIVE PLUSH TOY - An interactive toy for interacting with a user while a story is being read aloud from a book or played from a movie/video. The toy includes a speech recognition unit that receives and detects certain triggering phrases as they are read aloud or played from a companion literary work. The triggering phrase read aloud from the book or played in the movie/video may have independent significance or may only have significance when combined with other phrases read aloud from the book or played in the movie/video. | 10-31-2013 |
20130297314 | RESCORING METHOD AND APPARATUS IN DISTRIBUTED ENVIRONMENT - Disclosed are a distributed environment rescoring method and apparatus. A distributed environment rescoring method in accordance with the present invention includes generating a word lattice by performing voice recognition on received voice, converting the word lattice into a word confusion network formed from the temporal connection of confusion sets clustered based on temporal redundancy and phoneme similarities, generating a list of subword confusion networks based on the entropy values of the respective confusion sets included in the word confusion network, and generating a modified word confusion network by modifying a list of the subword confusion networks through distributed environment rescoring. | 11-07-2013 |
20130297315 | Enhanced Accuracy for Speech Recognition Grammars - Disclosed herein are methods and systems for recognizing speech. A method embodiment comprises comparing received speech with a precompiled grammar based on a database and if the received speech matches data in the precompiled grammar then returning a result based on the matched data. If the received speech does not match data in the precompiled grammar, then dynamically compiling a new grammar based only on new data added to the database after the compiling of the precompiled grammar. The database may comprise a directory of names. | 11-07-2013 |
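A rough sketch of the fallback flow described in 20130297315, under the simplifying assumption that matching is done on an already-recognized text string rather than on audio; the data layout and function names are illustrative only.

```python
def recognize_name(recognized_text, precompiled_grammar, database, compiled_at):
    """precompiled_grammar: set of names compiled earlier from the database.
    database: iterable of (name, added_timestamp) records.
    compiled_at: timestamp at which the precompiled grammar was built."""
    if recognized_text in precompiled_grammar:
        return recognized_text                    # match found in the precompiled grammar
    # Fallback: dynamically compile a small grammar from only the records
    # added to the database after the precompiled grammar was built.
    new_grammar = {name for name, added in database if added > compiled_at}
    if recognized_text in new_grammar:
        return recognized_text
    return None                                   # no match in either grammar
```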
20130304473 | SYSTEM AND METHOD FOR PROCESSING MULTI-MODAL DEVICE INTERACTIONS IN A NATURAL LANGUAGE VOICE SERVICES ENVIRONMENT - A system and method for processing multi-modal device interactions in a natural language voice services environment may be provided. In particular, one or more multi-modal device interactions may be received in a natural language voice services environment that includes one or more electronic devices. The multi-modal device interactions may include a non-voice interaction with at least one of the electronic devices or an application associated therewith, and may further include a natural language utterance relating to the non-voice interaction. Context relating to the non-voice interaction and the natural language utterance may be extracted and combined to determine an intent of the multi-modal device interaction, and a request may then be routed to one or more of the electronic devices based on the determined intent of the multi-modal device interaction. | 11-14-2013 |
20130339021 | Intent Discovery in Audio or Text-Based Conversation - Techniques, an apparatus and an article of manufacture for identifying one or more utterances that are likely to carry the intent of a speaker, from a conversation between two or more parties. A method includes obtaining an input of a set of utterances in chronological order from a conversation between two or more parties, computing an intent confidence value of each utterance by summing intent confidence scores from each of the constituent words of the utterance, wherein intent confidence scores capture each word's influence on the subsequent utterances in the conversation based on (i) the uniqueness of the word in the conversation and (ii) the number of times the word subsequently occurs in the conversation, and generating a ranked order of the utterances from highest to lowest intent confidence value, wherein the highest intent value corresponds to the utterance which is most likely to carry the intent of the speaker. | 12-19-2013 |
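The scoring in 20130339021 can be pictured with the sketch below; the exact formula is an assumption (uniqueness taken as the inverse of a word's total count, weighted by how often the word recurs in later utterances), not the patent's definition.

```python
from collections import Counter

def rank_utterances_by_intent(utterances):
    """utterances: chronologically ordered list of utterances, each a list of words.
    Score = sum over the utterance's words of (uniqueness) x (later recurrences)."""
    totals = Counter(w.lower() for utt in utterances for w in utt)
    scored = []
    for i, utt in enumerate(utterances):
        later = Counter(w.lower() for following in utterances[i + 1:] for w in following)
        score = 0.0
        for w in {w.lower() for w in utt}:
            uniqueness = 1.0 / totals[w]      # rarer words in the conversation weigh more
            score += uniqueness * later[w]    # words echoed in later utterances weigh more
        scored.append((score, " ".join(utt)))
    return sorted(scored, reverse=True)       # highest intent confidence first
```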
20130339022 | System and method for a cooperative conversational voice user interface - A cooperative conversational voice user interface is provided. The cooperative conversational voice user interface may build upon short-term and long-term shared knowledge to generate one or more explicit and/or implicit hypotheses about an intent of a user utterance. The hypotheses may be ranked based on varying degrees of certainty, and an adaptive response may be generated for the user. Responses may be worded based on the degrees of certainty and to frame an appropriate domain for a subsequent utterance. In one implementation, misrecognitions may be tolerated, and conversational course may be corrected based on subsequent utterances and/or responses. | 12-19-2013 |
20130346080 | System And Method For Performing Distributed Speech Recognition - A system and method for performing distributed speech recognition is provided. Audio data is received on a main recognizer and on each of a plurality of secondary recognizers. Secondary grammars are transmitted to each of the secondary recognizers. The secondary recognizers each perform speech recognition on the audio data using the secondary grammar for that secondary recognizer. A new grammar is constructed based on results of the speech recognition by each of the secondary recognizers. The main recognizer performs speech recognition on the audio data using the new grammar. | 12-26-2013 |
20140012579 | DETECTING POTENTIAL SIGNIFICANT ERRORS IN SPEECH RECOGNITION RESULTS - In some embodiments, recognition results produced by a speech processing system (which may include two or more recognition results, including a top recognition result and one or more alternative recognition results) based on an analysis of a speech input, are evaluated for indications of potential errors. In some embodiments, the indications of potential errors may include discrepancies between recognition results that are meaningful for a domain, such as medically-meaningful discrepancies. The evaluation of the recognition results may be carried out using any suitable criteria, including one or more criteria that differ from criteria used by an ASR system in determining the top recognition result and the alternative recognition results from the speech input. In some embodiments, a recognition result may additionally or alternatively be processed to determine whether the recognition result includes a word or phrase that is unlikely to appear in a domain to which speech input relates. | 01-09-2014 |
20140012580 | DETECTING POTENTIAL SIGNIFICANT ERRORS IN SPEECH RECOGNITION RESULTS - In some embodiments, the recognition results produced by a speech processing system (which may include a top recognition result and one or more alternative recognition results) based on an analysis of a speech input, are evaluated for indications of potential significant errors. In some embodiments, the recognition results may be evaluated to determine whether a meaning of any of the alternative recognition results differs from a meaning of the top recognition result in a manner that is significant for the domain. In some embodiments, one or more of the recognition results may be evaluated to determine whether the result(s) include one or more words or phrases that, when included in a result, would change a meaning of the result in a manner that would be significant for the domain. | 01-09-2014 |
20140012581 | DETECTING POTENTIAL SIGNIFICANT ERRORS IN SPEECH RECOGNITION RESULTS - In some embodiments, the recognition results produced by a speech processing system (which may include two or more recognition results, including a top recognition result and one or more alternative recognition results) based on an analysis of a speech input, are evaluated for indications of potential significant errors. In some embodiments, the recognition results may be evaluated using one or more sets of words and/or phrases, such as pairs of words/phrases that may include words/phrases that are acoustically similar to one another and/or that, when included in a result, would change a meaning of the result in a manner that would be significant for a domain. The recognition results may be evaluated using the set(s) of words/phrases to determine, when the top result includes a word/phrase from a set of words/phrases, whether any of the alternative recognition results includes any of the other, corresponding words/phrases from the set. | 01-09-2014 |
20140012582 | DETECTING POTENTIAL SIGNIFICANT ERRORS IN SPEECH RECOGNITION RESULTS - In some embodiments, a recognition result produced by a speech processing system based on an analysis of a speech input is evaluated for indications of potential errors. In some embodiments, sets of words/phrases that may be acoustically similar or otherwise confusable, the misrecognition of which can be significant in the domain, may be used together with a language model to evaluate a recognition result to determine whether the recognition result includes such an indication. In some embodiments, a word/phrase of a set that appears in the result is iteratively replaced with each of the other words/phrases of the set. The result of the replacement may be evaluated using a language model to determine a likelihood of the newly-created string of words appearing in a language and/or domain. The likelihood may then be evaluated to determine whether the result of the replacement is sufficiently likely for an alert to be triggered. | 01-09-2014 |
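A hedged sketch of the replacement-and-rescoring idea in 20140012582: each word from a confusable set that appears in the recognition result is swapped for the other members of its set, and the resulting string is scored with a language model; an alert is raised when the alternative is nearly as likely as the original. The example confusable sets, the `lm_log_prob` interface, and the margin are assumptions.

```python
# Example confusable sets; in practice these would be curated for the domain.
CONFUSABLE_SETS = [{"hypertension", "hypotension"}, {"fifteen", "fifty"}]

def flag_potential_errors(result_words, lm_log_prob, margin=1.0):
    """result_words: recognition result as a list of words.
    lm_log_prob: callable scoring a list of words with a language model.
    Returns (original, alternative) pairs whose swap stays within `margin`
    log-probability units of the original result."""
    alerts = []
    base = lm_log_prob(result_words)
    for i, word in enumerate(result_words):
        for conf_set in CONFUSABLE_SETS:
            if word in conf_set:
                for alternative in conf_set - {word}:
                    candidate = result_words[:i] + [alternative] + result_words[i + 1:]
                    if lm_log_prob(candidate) >= base - margin:
                        alerts.append((word, alternative))   # replacement is also plausible
    return alerts
```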
20140019132 | INFORMATION PROCESSING APPARATUS, INFORMATION PROCESSING METHOD, DISPLAY CONTROL APPARATUS, AND DISPLAY CONTROL METHOD - There is provided an information processing apparatus including an information acquiring unit that acquires information to identify an editing point of content including a voice, on the basis of language analysis of the content, and an information output unit that outputs the acquired information. | 01-16-2014 |
20140019133 | DATA PROCESSING METHOD, PRESENTATION METHOD, AND CORRESPONDING APPARATUSES - A data processing method includes obtaining text information corresponding to a presented content, the presented content comprising a plurality of areas; performing text analysis on the text information to obtain a first keyword sequence, the first keyword sequence including area keywords associated with at least one area of the plurality of areas; obtaining speech information related to the presented content, the speech information at least comprising a current speech segment; and using a first model network to perform analysis on the current speech segment to determine the area corresponding to the current speech segment, wherein the first model network comprises the first keyword sequence. | 01-16-2014 |
20140025380 | SYSTEM, METHOD AND PROGRAM PRODUCT FOR PROVIDING AUTOMATIC SPEECH RECOGNITION (ASR) IN A SHARED RESOURCE ENVIRONMENT - A speech recognition system, method of recognizing speech and a computer program product therefor. A client device identified with a context for an associated user selectively streams audio to a provider computer, e.g., a cloud computer. Speech recognition receives streaming audio, maps utterances to specific textual candidates and determines a likelihood of a correct match for each mapped textual candidate. A context model selectively winnows candidates to resolve recognition ambiguity according to context whenever multiple textual candidates are recognized as potential matches for the same mapped utterance. Matches are used to update the context model, which may be used for multiple users in the same context. | 01-23-2014 |
20140032217 | NATURAL LANGUAGE SYSTEM AND METHOD BASED ON UNISOLATED PERFORMANCE METRIC - A natural language business system and method is developed to understand the underlying meaning of a person's speech, such as during a transaction with the business system. The system includes a speech recognition engine, an action classification engine, and a control module. The control module causes the system to execute an inventive method wherein the speech recognition and action classification models may be recursively optimized on an unisolated performance metric that is pertinent to the overall performance of the natural language business system, as opposed to the isolated model-specific criteria previously employed. | 01-30-2014 |
20140039895 | METHOD FOR USING PAUSES DETECTED IN SPEECH INPUT TO ASSIST IN INTERPRETING THE INPUT DURING CONVERSATIONAL INTERACTION FOR INFORMATION RETRIEVAL - A method for using speech disfluencies detected in speech input to assist in interpreting the input is provided. The method includes providing access to a set of content items with metadata describing the content items, and receiving a speech input intended to identify a desired content item. The method further includes detecting a speech disfluency in the speech input and determining a measure of confidence of a user in a portion of the speech input following the speech disfluency. If the confidence measure is lower than a threshold value, the method includes determining an alternative query input based on replacing the portion of the speech input following the speech disfluency with another word or phrase. The method further includes selecting content items based on comparing the speech input, the alternative query input (when the confidence measure is low), and the metadata associated with the content items. | 02-06-2014 |
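An illustrative sketch of the flow in 20140039895; disfluency detection, the per-word confidence estimate, and the substitution vocabulary are all stand-ins for whatever the real system would use.

```python
FILLERS = {"um", "uh", "er"}   # assumed disfluency markers

def build_queries(words, word_confidence, synonyms, threshold=0.5):
    """words: recognized speech input as a list of words (fillers included).
    word_confidence: dict word -> recognizer confidence.
    synonyms: dict word -> list of replacement words."""
    cleaned = [w for w in words if w not in FILLERS]
    queries = [" ".join(cleaned)]                      # the original query, fillers removed
    for i, w in enumerate(words):
        if w in FILLERS and i + 1 < len(words):
            uncertain = words[i + 1]                   # portion following the disfluency
            if word_confidence.get(uncertain, 1.0) < threshold:
                for alt in synonyms.get(uncertain, []):
                    queries.append(" ".join(alt if x == uncertain else x for x in cleaned))
    return queries     # the originals and alternatives are then compared against content metadata
```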
20140039896 | Methods and System for Grammar Fitness Evaluation as Speech Recognition Error Predictor - A plurality of statements are received from within a grammar structure. Each of the statements is formed by a number of word sets. A number of alignment regions across the statements are identified by aligning the statements on a word set basis. Each aligned word set represents an alignment region. A number of potential confusion zones are identified across the statements. Each potential confusion zone is defined by words from two or more of the statements at corresponding positions outside the alignment regions. For each of the identified potential confusion zones, phonetic pronunciations of the words within the potential confusion zone are analyzed to determine a measure of confusion probability between the words when audibly processed by a speech recognition system during the computing event. An identity of the potential confusion zones across the statements and their corresponding measure of confusion probability are reported to facilitate grammar structure improvement. | 02-06-2014 |
20140074477 | System and Method of Spoken Language Understanding in Human Computer Dialogs - A system and method are disclosed that improve automatic speech recognition in a spoken dialog system. The method comprises partitioning speech recognizer output into self-contained clauses, identifying a dialog act in each of the self-contained clauses, qualifying dialog acts by identifying a current domain object and/or a current domain action, and determining whether further qualification is possible for the current domain object and/or current domain action. If further qualification is possible, then the method comprises identifying another domain action and/or another domain object associated with the current domain object and/or current domain action, reassigning the another domain action and/or another domain object as the current domain action and/or current domain object and then recursively qualifying the new current domain action and/or current object. This process continues until nothing is left to qualify. | 03-13-2014 |
20140081641 | MULTIMODAL DISAMBIGUATION OF SPEECH RECOGNITION - The present invention provides a speech recognition system combined with one or more alternate input modalities to ensure efficient and accurate text input. The speech recognition system achieves less than perfect accuracy due to limited processing power, environmental noise, and/or natural variations in speaking style. The alternate input modalities use disambiguation or recognition engines to compensate for reduced keyboards, sloppy input, and/or natural variations in writing style. The ambiguity remaining in the speech recognition process is mostly orthogonal to the ambiguity inherent in the alternate input modality, such that the combination of the two modalities resolves the recognition errors efficiently and accurately. The invention is especially well suited for mobile devices with limited space for keyboards or touch-screen input. | 03-20-2014 |
20140156279 | CONTENT SEARCHING APPARATUS, CONTENT SEARCH METHOD, AND CONTROL PROGRAM PRODUCT - According to one embodiment, a content searching apparatus includes: a search condition generator configured to perform voice recognition in parallel with an input of a natural language voice giving an instruction for a search for a piece of content, and to generate search conditions sequentially; a searching module configured to perform a content search while updating the search condition used in the search as the search condition is generated; and a search result display configured to update the search condition used in the content search and a result of the content search based on the search condition to be displayed as the search condition is generated. | 06-05-2014 |
20140163989 | INTEGRATED LANGUAGE MODEL, RELATED SYSTEMS AND METHODS - An integrated language model includes an upper-level language model component and a lower-level language model component, with the upper-level language model component including a non-terminal and the lower-level language model component being applied to the non-terminal. The upper-level and lower-level language model components can be of the same or different language model formats, including finite state grammar (FSG) and statistical language model (SLM) formats. Systems and methods for making integrated language models allow designation of language model formats for the upper-level and lower-level components and identification of non-terminals. Automatic non-terminal replacement and retention criteria can be used to facilitate the generation of one or both language model components, which can include the modification of existing language models. | 06-12-2014 |
20140188477 | METHOD FOR CORRECTING A SPEECH RESPONSE AND NATURAL LANGUAGE DIALOGUE SYSTEM - A natural language dialogue system and a method capable of correcting a speech response are provided. The method includes the following steps. A first speech input is received. At least one keyword included in the first speech input is parsed to obtain a candidate list having at least one report answer. One of the report answers is selected from the candidate list as a first report answer, and a first speech response is output according to the first report answer. A second speech input is received and parsed to determine whether the first report answer is correct. If the first report answer is incorrect, another report answer other than the first report answer is selected from the candidate list as a second report answer. According to the second report answer, a second speech response is output. | 07-03-2014 |
20140188478 | NATURAL LANGUAGE DIALOGUE METHOD AND NATURAL LANGUAGE DIALOGUE SYSTEM - A natural language dialogue method and a natural language dialogue system are provided. In the method, a first speech input is received and parsed to generate at least one keyword included in the first speech input, so that a candidate list including at least one report answer is obtained. According to a properties database, one report answer is selected from the candidate list, and a first speech response is output according to the report answer. Other speech inputs are received, and a user's preference data is captured from the speech inputs. The user's preference data is stored in the properties database. | 07-03-2014 |
20140200892 | Method and Apparatus to Model and Transfer the Prosody of Tags across Languages - Identify, Capture, Retain and Synthesize Non-Linguistic and Discourse Components of Speech across Languages | 07-17-2014 |
20140200893 | SYSTEMS AND METHODS FOR FILTERING OBJECTIONABLE CONTENT - Systems and methods for filtering media containing objectionable content are described. Marker files that list the times objectionable content occurs in audio content (such as a song, podcast, audio associated with a video or television program, or the like) can be stored in a user device. When a user plays audio content for which a marker file exists, the system can automatically filter out the objectionable content marked in the marker file from playback of the audio content. The system may also provide functionality for the user to specify a level of filtering to be applied or even specific words to be filtered from audio content. | 07-17-2014 |
20140214426 | SYSTEM AND METHOD FOR IMPROVING VOICE COMMUNICATION OVER A NETWORK - Systems and methods for improving communication over a network are provided. A system for improving communication over a network, comprises a detection module capable of detecting data indicating a problem with a communication between at least two participants communicating via communication devices over the network, a management module capable of analyzing the data to determine whether a participant is dissatisfied with the communication, wherein the management module includes a determining module capable of determining that the participant is dissatisfied, and identifying an event causing the dissatisfaction, and a resolution module capable of providing a solution for eliminating the problem. | 07-31-2014 |
20140222432 | WIRELESS COMMUNICATION CHANNEL OPERATION METHOD AND SYSTEM OF PORTABLE TERMINAL - A voice talk function-enabled terminal and voice talk control method for outputting distinct content based on the current emotional state, age, and gender of the user are provided. The mobile terminal supporting a voice talk function includes a display unit, an audio processing unit, which selects content corresponding to a first criterion associated with a user in response to a user input, determines a content output scheme based on a second criterion associated with the user, and outputs the selected content through the display unit and audio processing unit according to the content output scheme. | 08-07-2014 |
20140244261 | CONVERSION OF NON-BACK-OFF LANGUAGE MODELS FOR EFFICIENT SPEECH DECODING - Techniques for conversion of non-back-off language models for use in speech decoders. For example, an apparatus is configured to convert a non-back-off language model to a back-off language model. The converted back-off language model is pruned. The converted back-off language model is usable for decoding speech. | 08-28-2014 |
20140249821 | SYSTEM AND METHOD FOR PROCESSING NATURAL LANGUAGE UTTERANCES - Systems and methods for receiving natural language queries and/or commands and executing the queries and/or commands. The systems and methods overcome the deficiencies of prior art speech query and response systems through the application of a complete speech-based information query, retrieval, presentation and command environment. This environment makes significant use of context, prior information, domain knowledge, and user specific profile data to achieve a natural environment for one or more users making queries or commands in multiple domains. Through this integrated approach, a complete speech-based natural language query and response environment can be created. The systems and methods create, store and use extensive personal profile information for each user, thereby improving the reliability of determining the context and presenting the expected results for a particular question or command. | 09-04-2014 |
20140249822 | SYSTEM AND METHOD FOR PROCESSING MULTI-MODAL DEVICE INTERACTIONS IN A NATURAL LANGUAGE VOICE SERVICES ENVIRONMENT - A system and method for processing multi-modal device interactions in a natural language voice services environment may be provided. In particular, one or more multi-modal device interactions may be received in a natural language voice services environment that includes one or more electronic devices. The multi-modal device interactions may include a non-voice interaction with at least one of the electronic devices or an application associated therewith, and may further include a natural language utterance relating to the non-voice interaction. Context relating to the non-voice interaction and the natural language utterance may be extracted and combined to determine an intent of the multi-modal device interaction, and a request may then be routed to one or more of the electronic devices based on the determined intent of the multi-modal device interaction. | 09-04-2014 |
20140278424 | KERNEL DEEP CONVEX NETWORKS AND END-TO-END LEARNING - Data associated with spoken language may be obtained. An analysis of the obtained data may be initiated for understanding of the spoken language using a deep convex network that is integrated with a kernel trick. The resulting kernel deep convex network may also be constructed by stacking one shallow kernel network over another with concatenation of the output vector of the lower network with the input data vector. A probability associated with a slot that is associated with slot-filling may be determined, based on local, discriminative features that are extracted using the kernel deep convex network. | 09-18-2014 |
20140278425 | DATA SHREDDING FOR SPEECH RECOGNITION LANGUAGE MODEL TRAINING UNDER DATA RETENTION RESTRICTIONS - Training speech recognizers, e.g., their language or acoustic models, using actual user data is useful, but retaining personally identifiable information may be restricted in certain environments due to regulations. Accordingly, a method or system is provided for enabling training of a language model which includes producing segments of text in a text corpus and counts corresponding to the segments of text, the text corpus being in a depersonalized state. The method further includes enabling a system to train a language model using the segments of text in the depersonalized state and the counts. Because the data is depersonalized, actual data may be used, enabling speech recognizers to keep up-to-date with user trends in speech and usage, among other benefits. | 09-18-2014 |
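One way to picture the depersonalized segments-plus-counts representation in 20140278425 is the sketch below, which assumes shredding amounts to keeping only short n-gram segments and their counts; the actual shredding procedure is not specified here.

```python
from collections import Counter

def shred_corpus(transcripts, n=3):
    """Return counts of n-word segments; the originating utterances are discarded,
    so no full (potentially identifying) sentence is retained."""
    counts = Counter()
    for text in transcripts:
        words = text.split()
        for i in range(len(words) - n + 1):
            counts[tuple(words[i:i + n])] += 1
    return counts

def ngram_probability(counts, context, word):
    """Estimate P(word | context) from the shredded segment counts (unsmoothed)."""
    context = tuple(context)
    total = sum(c for seg, c in counts.items() if seg[:-1] == context)
    return counts.get(context + (word,), 0) / total if total else 0.0
```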
20140278426 | DATA SHREDDING FOR SPEECH RECOGNITION ACOUSTIC MODEL TRAINING UNDER DATA RETENTION RESTRICTIONS - Training speech recognizers, e.g., their language or acoustic models, using actual user data is useful, but retaining personally identifiable information may be restricted in certain environments due to regulations. Accordingly, a method or system is provided for enabling training of an acoustic model which includes dynamically shredding a speech corpus to produce text segments and depersonalized audio features corresponding to the text segments. The method further includes enabling a system to train an acoustic model using the text segments and the depersonalized audio features. Because the data is depersonalized, actual data may be used, enabling speech recognizers to keep up-to-date with user trends in speech and usage, among other benefits. | 09-18-2014 |
20140278427 | DYNAMIC DIALOG SYSTEM AGENT INTEGRATION - A method for dialog agent integration comprises discovering a dialog agent required for a dialog request including dialog information comprising terms required for audio feedback in a service domain required for the dialog request, extracting the dialog information from the discovered dialog agent, integrating the dialog information to existing dialog information of a dialog system (DS) that provides dialog functionality for an electronic device, and expanding the service domain dialog functionality of the DS with the integrated dialog information. | 09-18-2014 |
20140278428 | TRACKING SPOKEN LANGUAGE USING A DYNAMIC ACTIVE VOCABULARY - Systems and methods to provide a set of dictionaries and highlighting lists for speech recognition and highlighting, where the speech recognition focuses only on a limited scope of vocabulary as present in a document. The systems and methods allow a rapid and accurate matching of the utterance with the available text, and appropriately indicate the location in the text or signal any errors made during reading. Described herein is a system and method to create speech recognition systems focused on reading a fixed text and providing feedback on what they read to improve literacy, aid those with disabilities, and to make the reading experience more efficient and fun. | 09-18-2014 |
20140288936 | LINGUISTIC MODEL DATABASE FOR LINGUISTIC RECOGNITION, LINGUISTIC RECOGNITION DEVICE AND LINGUISTIC RECOGNITION METHOD, AND LINGUISTIC RECOGNITION SYSTEM - A method of building a database for a linguistic recognition device is provided. The method includes storing common linguistic model data configured to infer a word or a sentence from a character acquired by recognizing a language input by a user in a storage section of a linguistic recognition device, collecting recognition-related information related to the user after storing the common linguistic data, and analyzing the collected recognition-related information to be stored as individual linguistic model data. | 09-25-2014 |
20140288937 | SYSTEM AND METHOD FOR HANDLING MISSING SPEECH DATA - Disclosed herein are systems, computer-implemented methods, and tangible computer-readable media for handling missing speech data. The computer-implemented method includes receiving speech with a missing segment, generating a plurality of hypotheses for the missing segment, identifying a best hypothesis for the missing segment, and recognizing the received speech by inserting the identified best hypothesis for the missing segment. In another method embodiment, the final step is replaced with synthesizing the received speech by inserting the identified best hypothesis for the missing segment. In one aspect, the method further includes identifying a duration for the missing segment and generating the plurality of hypotheses of the identified duration for the missing segment. The step of identifying the best hypothesis for the missing segment can be based on speech context, a pronouncing lexicon, and/or a language model. Each hypothesis can have an identical acoustic score. | 09-25-2014 |
20140297283 | Concept Cloud in Smart Phone Applications - An automated arrangement is described for conducting natural language interactions with a human user. A user interface is provided for user communication in a given active natural language interaction with a natural language application during an automated dialog session. An automatic speech recognition (ASR) engine processes unknown user speech inputs from the user interface to produce corresponding speech recognition results. A natural language concept module processes the speech recognition results to develop corresponding natural language concept items. A concept item storage holds selected concept items for reuse in a subsequent natural language interaction with the user during the automated dialog session. | 10-02-2014 |
20140297284 | USING CONTEXT INFORMATION TO FACILITATE PROCESSING OF COMMANDS IN A VIRTUAL ASSISTANT - A virtual assistant uses context information to supplement natural language or gestural input from a user. Context helps to clarify the user's intent and to reduce the number of candidate interpretations of the user's input, and reduces the need for the user to provide excessive clarification input. Context can include any available information that is usable by the assistant to supplement explicit user input to constrain an information-processing problem and/or to personalize results. Context can be used to constrain solutions during various phases of processing, including, for example, speech recognition, natural language processing, task flow processing, and dialog generation. | 10-02-2014 |
20140324434 | SYSTEMS AND METHODS FOR PROVIDING METADATA-DEPENDENT LANGUAGE MODELS - Techniques for generating language models. The techniques include: obtaining language data comprising training data and associated values for one or more metadata attributes, the language data comprising a plurality of instances of language data, an instance of language data comprising an instance of training data and one or more metadata attribute values associated with the instance of training data; identifying, by processing the language data using at least one processor, a set of one or more of the metadata attributes to use for clustering the instances of training data into a plurality of clusters; clustering the training data instances based on their respective values for the identified set of metadata attributes into the plurality of clusters; and generating a language model for each of the plurality of clusters. | 10-30-2014 |
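A minimal sketch of the clustering-then-modeling pipeline in 20140324434, assuming dictionary-shaped training instances and a simple unigram model per cluster; both choices are illustrative.

```python
from collections import Counter, defaultdict

def cluster_by_metadata(instances, attributes):
    """instances: list of dicts like {"text": "...", "metadata": {...}}.
    attributes: names of the metadata attributes chosen for clustering."""
    clusters = defaultdict(list)
    for inst in instances:
        key = tuple(inst["metadata"].get(a) for a in attributes)
        clusters[key].append(inst["text"])
    return clusters

def build_language_models(clusters):
    """Build one (unigram) language model per metadata cluster."""
    models = {}
    for key, texts in clusters.items():
        counts = Counter(w for t in texts for w in t.split())
        total = sum(counts.values())
        models[key] = {w: c / total for w, c in counts.items()}
    return models
```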
20140337032 | Multiple Recognizer Speech Recognition - The subject matter of this specification can be embodied in, among other things, a method that includes receiving audio data that corresponds to an utterance, obtaining a first transcription of the utterance that was generated using a limited speech recognizer. The limited speech recognizer includes a speech recognizer that includes a language model that is trained over a limited speech recognition vocabulary that includes one or more terms from a voice command grammar, but that includes fewer than all terms of an expanded grammar. A second transcription of the utterance is obtained that was generated using an expanded speech recognizer. The expanded speech recognizer includes a speech recognizer that includes a language model that is trained over an expanded speech recognition vocabulary that includes all of the terms of the expanded grammar. The utterance is classified based at least on a portion of the first transcription or the second transcription. | 11-13-2014 |
20140343944 | METHOD OF VISUAL VOICE RECOGNITION WITH SELECTION OF GROUPS OF MOST RELEVANT POINTS OF INTEREST - The method comprises steps of: a) forming a starting set of microstructures of n points of interest, each defined by a tuple of order n, with n≧1; b) determining, for each tuple, associated structured visual characteristics, based on local gradient and/or movement descriptors of the points of interest; and c) iteratively searching for and selecting the most discriminant tuples. Step c) operates by: c1) applying to the set of tuples an algorithm of the Multi-Kernel Learning MKL type; c2) extracting a sub-set of tuples producing the highest relevancy scores; c3) aggregating to these tuples an additional tuple to obtain a new set of tuples of higher order; c4) determining structured visual characteristics associated to each aggregated tuple; c5) selecting a new sub-set of most discriminant tuples; and c6) reiterating steps c1) to c5) up to a maximal order N. | 11-20-2014 |
20140343945 | METHOD OF VISUAL VOICE RECOGNITION BY FOLLOWING-UP THE LOCAL DEFORMATIONS OF A SET OF POINTS OF INTEREST OF THE SPEAKER'S MOUTH - The method comprises steps of: a) for each point of interest of each image, calculating a local gradient descriptor and a local movement descriptor; b) forming microstructures of n points of interest, each defined by a tuple of order n, with n≧1; c) determining, for each tuple of a vector of structured visual characteristics (d | 11-20-2014 |
20140350939 | Systems and Methods for Adding Punctuations - Systems and methods are provided for adding punctuations. For example, one or more first feature units are identified in a voice file taken as a whole; the voice file is divided into multiple segments; one or more second feature units are identified in the voice file; a first aggregate weight of first punctuation states of the voice file and a second aggregate weight of second punctuation states of the voice file are determined, using a language model established based on word separation and third semantic features; a weighted calculation is performed to generate a third aggregate weight based on at least information associated with the first aggregate weight and the second aggregate weight; and one or more final punctuations are added to the voice file based on at least information associated with the third aggregate weight. | 11-27-2014 |
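A hedged sketch of the weighted combination in 20140350939: the whole-file pass and the segment pass are reduced to opaque score tables, and an interpolation weight `alpha` plus a decision threshold, both assumptions, produce the final punctuation choices.

```python
def combine_punctuation_scores(whole_file_scores, segment_scores, alpha=0.5):
    """Each argument maps (position, punctuation_mark) -> weight from one pass.
    Returns the interpolated third aggregate weight for every candidate."""
    combined = {}
    for key in set(whole_file_scores) | set(segment_scores):
        combined[key] = (alpha * whole_file_scores.get(key, 0.0)
                         + (1.0 - alpha) * segment_scores.get(key, 0.0))
    return combined

def choose_punctuation(combined, threshold=0.6):
    """Keep the best-scoring mark at each position if it clears the threshold."""
    best = {}
    for (pos, mark), w in combined.items():
        if w >= threshold and w > best.get(pos, (None, 0.0))[1]:
            best[pos] = (mark, w)
    return {pos: mark for pos, (mark, w) in best.items()}
```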
20140358545 | Multiple Parallel Dialogs in Smart Phone Applications - An arrangement is described for conducting natural language dialogs with a user on a mobile device using automatic speech recognition (ASR) and multiple different dialog applications. A user interface provides for user interaction with the dialog applications in natural language dialogs. An ASR engine processes unknown speech inputs from the user to produce corresponding speech recognition results. A dialog concept module develops dialog concept items from the speech recognition results and stores the dialog concept items and additional dialog information in a dialog concept database. A dialog processor accesses dialog concept database information and coordinates operation of the ASR engine and the dialog applications to conduct with the user a plurality of separate parallel natural language dialogs in the dialog applications. | 12-04-2014 |
20140365222 | MOBILE SYSTEMS AND METHODS OF SUPPORTING NATURAL LANGUAGE HUMAN-MACHINE INTERACTIONS - A mobile system is provided that includes speech-based and non-speech-based interfaces for telematics applications. The mobile system identifies and uses context, prior information, domain knowledge, and user specific profile data to achieve a natural environment for users that submit requests and/or commands in multiple domains. The invention creates, stores and uses extensive personal profile information for each user, thereby improving the reliability of determining the context and presenting the expected results for a particular question or command. The invention may organize domain specific behavior and information into agents, that are distributable or updateable over a wide area network. | 12-11-2014 |
20140372122 | Determining Word Sequence Constraints for Low Cognitive Speech Recognition - A method for recognizing speech including a sequence of words determines a shape of a gesture and a location of the gesture with respect to a display device showing a set of interpretations of the speech. The method determines a type of the word sequence constraint based on the shape of the gesture and determines a value of the word sequence constraint based on the location of the gesture. Next, the speech is recognized using the word sequence constraint. | 12-18-2014 |
20140379349 | System and Method for Tightly Coupling Automatic Speech Recognition and Search - Disclosed herein are systems, methods, and computer-readable storage media for performing a search. A system configured to practice the method first receives from an automatic speech recognition (ASR) system a word lattice based on a speech query and receives indexed documents from an information repository. The system composes, based on the word lattice and the indexed documents, at least one triple including a query word, a selected indexed document, and a weight. The system generates an N-best path through the word lattice based on the at least one triple and re-ranks ASR output based on the N-best path. The system aggregates each weight across the query words to generate N-best listings and returns search results to the speech query based on the re-ranked ASR output and the N-best listings. The lattice can be a confusion network, the arc density of which can be adjusted for a desired performance level. | 12-25-2014 |
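A simplified sketch of the word-document-weight aggregation in 20140379349, with the lattice reduced to a bag of query words carrying posterior weights and the index reduced to a word-to-document map; both reductions are assumptions.

```python
from collections import defaultdict

def rank_documents(query_word_weights, inverted_index, n_best=10):
    """query_word_weights: dict word -> posterior weight from the ASR lattice.
    inverted_index: dict word -> list of (document_id, relevance)."""
    scores = defaultdict(float)
    for word, word_weight in query_word_weights.items():
        for doc_id, relevance in inverted_index.get(word, []):
            scores[doc_id] += word_weight * relevance     # aggregate across query words
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:n_best]                                # N-best listing of documents
```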
20150019227 | SYSTEM, DEVICE AND METHOD FOR PROCESSING INTERLACED MULTIMODAL USER INPUT - A device, method and system are provided for interpreting and executing operations based on multimodal input received at a computing device. The multimodal input can include one or more verbal and non-verbal inputs, such as a combination of speech and gesture inputs received substantially concurrently via suitable user interface means provided on the computing device. One or more target objects is identified from the non-verbal input, and text is recognized from the verbal input. An interaction object is generated using the recognized text and identified target objects, and thus comprises a natural language expression with embedded target objects. The interaction object is then processed to identify one or more operations to be executed. | 01-15-2015 |
20150032454 | Menu Hierarchy Skipping Dialog For Directed Dialog Speech Recognition - A method and a processing device for managing an interactive speech recognition system are provided. Whether a voice input relates to expected input, at least partially, of any one of a group of menus different from a current menu is determined. If the voice input relates to the expected input, at least partially, of any one of the group of menus different from the current menu, skipping to the one of the group of menus is performed. The group of menus different from the current menu includes menus at multiple hierarchical levels. | 01-29-2015 |
20150058018 | MULTIPLE PASS AUTOMATIC SPEECH RECOGNITION METHODS AND APPARATUS - In some aspects, a method of recognizing speech that comprises natural language and at least one word specified in at least one domain-specific vocabulary is provided. The method comprises performing a first speech processing pass comprising identifying, in the speech, a first portion including the natural language and a second portion including the at least one word specified in the at least one domain-specific vocabulary, and recognizing the first portion including the natural language. The method further comprises performing a second speech processing pass comprising recognizing the second portion including the at least one word specified in the at least one domain-specific vocabulary. | 02-26-2015 |
20150088519 | DETECTING POTENTIAL SIGNIFICANT ERRORS IN SPEECH RECOGNITION RESULTS - In some embodiments, a recognition result produced by a speech processing system based on an analysis of a speech input is evaluated for indications of potential errors. In some embodiments, sets of words/phrases that may be acoustically similar or otherwise confusable, the misrecognition of which can be significant in the domain, may be used together with a language model to evaluate a recognition result to determine whether the recognition result includes such an indication. In some embodiments, a word/phrase of a set that appears in the result is iteratively replaced with each of the other words/phrases of the set. The result of the replacement may be evaluated using a language model to determine a likelihood of the newly-created string of words appearing in a language and/or domain. The likelihood may then be evaluated to determine whether the result of the replacement is sufficiently likely for an alert to be triggered. | 03-26-2015 |
20150095033 | TECHNIQUES FOR UPDATING A PARTIAL DIALOG STATE - Embodiments provide for tracking a partial dialog state as part of managing a dialog state space, but the embodiments are not so limited. A method of an embodiment jointly models partial state update and named entity recognition using a sequence-based classification or other model, wherein recognition of named entities and a partial state update can be performed in a single processing stage at runtime to generate a distribution over partial dialog states. A system of an embodiment is configured to generate a distribution over partial dialog states at runtime in part using a sequence classification decoding or other algorithm to generate one or more partial dialog state hypotheses and/or a confidence score or measure associated with each hypothesis. Other embodiments are included. | 04-02-2015 |
20150106100 | EFFICIENT EMPIRICAL COMPUTATION AND UTILIZATION OF ACOUSTIC CONFUSABILITY - Efficient empirical determination, computation, and use of an acoustic confusability measure comprises: (1) an empirically derived acoustic confusability measure, comprising a means for determining the acoustic confusability between any two textual phrases in a given language, where the measure of acoustic confusability is empirically derived from examples of the application of a specific speech recognition technology, where the procedure does not require access to the internal computational models of the speech recognition technology, and does not depend upon any particular internal structure or modeling technique, and where the procedure is based upon iterative improvement from an initial estimate; (2) techniques for efficient computation of empirically derived acoustic confusability measure, comprising means for efficient application of an acoustic confusability score, allowing practical application to very large-scale problems; and (3) a method for using acoustic confusability measures to make principled choices about which specific phrases to make recognizable by a speech recognition application. | 04-16-2015 |
20150112684 | Content-Aware Speaker Recognition - A content-aware speaker recognition system includes technologies to, among other things, analyze phonetic content of a speech sample, incorporate phonetic content of the speech sample into a speaker model, and use the phonetically-aware speaker model for speaker recognition. | 04-23-2015 |
20150112685 | SPEECH RECOGNITION METHOD AND ELECTRONIC APPARATUS USING THE METHOD - A speech recognition method and an electronic apparatus using the method are provided. In the method, a feature vector obtained from a speech signal is inputted to a plurality of speech recognition modules, and a plurality of string probabilities and a plurality of candidate strings are obtained from the speech recognition modules respectively. The candidate string corresponding to the largest one of the plurality of string probabilities is selected as a recognition result of the speech signal. | 04-23-2015 |
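The selection rule in the entry above, keeping the candidate string from whichever recognition module reports the highest string probability, reduces to an argmax over module outputs. The module interface below is assumed for illustration only.

```python
def select_result(feature_vector, modules):
    """Run each recognition module and keep the candidate string whose
    reported string probability is largest."""
    best_string, best_prob = None, float("-inf")
    for module in modules:
        candidate, prob = module(feature_vector)
        if prob > best_prob:
            best_string, best_prob = candidate, prob
    return best_string

# Hypothetical modules returning (candidate string, probability).
modules = [lambda fv: ("call mom", 0.72), lambda fv: ("call tom", 0.65)]
print(select_result([0.1, 0.4], modules))  # -> "call mom"
```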
20150120302 | METHOD AND SYSTEM FOR PERFORMING TERM ANALYSIS IN SOCIAL DATA - Disclosed is a system, method, and computer program product for allowing an entity to access social media data, and to perform term analysis upon that data. The approach is capable of accessing data across multiple types of internet-based sources of social data and commentary. A user interface is provided that allows the user to view and interact with the results of performing term analysis. | 04-30-2015 |
20150127347 | DETECTING SPEECH INPUT PHRASE CONFUSION RISK - Embodiments are disclosed that relate to identifying phonetically similar speech grammar terms during computer program development. For example, one disclosed embodiment provides a method including providing a speech grammar development tool configured to receive input of a text representation of each of a plurality of proposed speech grammar terms, convert each text representation to a phonetic representation of the speech grammar term, compare the phonetic representation of the speech grammar term to the phonetic representations of other speech grammar terms using a weighted similarity matrix, and provide an output regarding risk of confusion between two proposed speech grammar terms based upon a comparison of the phonetic representations of the two proposed speech grammar terms. The method further includes receiving data regarding incorrect speech grammar term identification, and modifying one or more weights in the weighted similarity matrix based upon the data. | 05-07-2015 |
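One way to read the weighted-similarity-matrix comparison in the entry above is as a weighted edit distance over phoneme strings, where substitution costs come from the matrix and the misrecognition data would later adjust those weights. The phoneme inventory and costs below are invented for illustration.

```python
def weighted_distance(p1, p2, sub_cost, indel=1.0):
    """Weighted edit distance between two phoneme sequences; lower values
    suggest a higher risk of confusion between grammar terms."""
    m, n = len(p1), len(p2)
    d = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        d[i][0] = i * indel
    for j in range(1, n + 1):
        d[0][j] = j * indel
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            same = p1[i - 1] == p2[j - 1]
            cost = 0.0 if same else sub_cost.get(frozenset((p1[i - 1], p2[j - 1])), 1.0)
            d[i][j] = min(d[i - 1][j] + indel, d[i][j - 1] + indel, d[i - 1][j - 1] + cost)
    return d[m][n]

# Hypothetical substitution costs: acoustically close phonemes are cheap to swap.
costs = {frozenset(("m", "n")): 0.3, frozenset(("t", "d")): 0.4}
print(weighted_distance(["s", "t", "o", "p"], ["s", "d", "o", "p"], costs))  # 0.4
```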
20150142443 | SYNTAX PARSING APPARATUS BASED ON SYNTAX PREPROCESSING AND METHOD THEREOF - The present disclosure relates to a syntax parsing apparatus based on syntax preprocessing and a method thereof. Specifically, the present disclosure parses syntaxes that can be parsed by rules and patterns without ambiguity through syntax parsing preprocessing, draws all possible syntax parsing results by applying syntax rules based on a result of syntax parsing preprocessing in which ambiguity is partially resolved, and resolves structural ambiguity by applying a statistical syntax parsing model learned from a syntax tree attachment learning corpus, so as to reduce ambiguity in rule-based syntax parsing and to resolve ambiguity by a statistics-based scheme, so that parsing correctness and processing efficiency in a syntax parsing method can be enhanced. | 05-21-2015 |
20150149176 | SYSTEM AND METHOD FOR TRAINING A CLASSIFIER FOR NATURAL LANGUAGE UNDERSTANDING - Disclosed herein are systems, methods, and computer-readable storage devices for building classifiers in a semi-supervised or unsupervised way. An example system implementing the method can receive a human-generated map which identifies categories of transcriptions. Then the system can receive a set of machine transcriptions. The system can process each machine transcription in the set of machine transcriptions via a set of natural language understanding classifiers, to yield a machine map, the machine map including a set of classifications and a classification score for each machine transcription in the set of machine transcriptions. Then the system can generate silver annotated data by combining the human-generated map and the machine map. The algorithm can include different branches for when the machine transcription is available, when partial results are available, when no results are found for the machine transcription, and so forth. | 05-28-2015 |
20150149177 | Sharing Intents to Provide Virtual Assistance in a Multi-Person Dialog - A computing system is operable as virtual personal assistant (VPA) to understand relationships between different instances of natural language dialog expressed by different people in a multi-person conversational dialog session. The VPA can develop a common resource, a shared intent, which represents the VPA's semantic understanding of at least a portion of the multi-person dialog experience. The VPA can store and manipulate multiple shared intents, and can alternate between different shared intents as the multi-person conversation unfolds. With the shared intents, the computing system can generate useful action items and present the action items to one or more of the participants in the dialog session. | 05-28-2015 |
20150294665 | UNSUPERVISED TRAINING METHOD, TRAINING APPARATUS, AND TRAINING PROGRAM FOR N-GRAM LANGUAGE MODEL - A computer-based, unsupervised training method for an N-gram language model includes reading, by a computer, recognition results obtained as a result of speech recognition of speech data; acquiring, by the computer, a reliability for each of the read recognition results; referring, by the computer, to the recognition result and the acquired reliability to select an N-gram entry; and training, by the computer, the N-gram language model for one or more of the selected N-gram entries using all recognition results. | 10-15-2015 |
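The entry-selection step above can be sketched as: keep only N-gram entries that occur in recognition results whose reliability clears a threshold, then count those entries over all recognition results. The reliability field and threshold below are assumptions for illustration.

```python
from collections import Counter

def select_and_count_ngrams(results, n=3, min_reliability=0.8):
    """Select N-gram entries from reliable recognition results, then count
    the selected entries over all recognition results."""
    selected = set()
    for text, reliability in results:
        if reliability >= min_reliability:
            words = text.split()
            selected.update(zip(*(words[i:] for i in range(n))))
    counts = Counter()
    for text, _ in results:  # training uses all results, per the abstract
        words = text.split()
        for gram in zip(*(words[i:] for i in range(n))):
            if gram in selected:
                counts[gram] += 1
    return counts

results = [("turn the lights on", 0.95), ("turn the lights off", 0.55)]
print(select_and_count_ngrams(results))
```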
20150310859 | Method and Apparatus For Passive Data Acquisition In Speech Recognition and Natural Language Understanding - Speech recognition systems often process speech by employing models and analyzing audio data. An embodiment of the method and corresponding system described herein allows for passive monitoring of, for example, conversation between users to determine context that is used to prime models for later speech recognition requests submitted to the speech recognition system. The embodiment improves the results of the speech recognition system by updating speech recognition models with contextual information from the conversation. This increases the probability that the speech recognition system produces interpretations that are contextually relevant to the conversation. | 10-29-2015 |
20150310862 | DEEP LEARNING FOR SEMANTIC PARSING INCLUDING SEMANTIC UTTERANCE CLASSIFICATION - One or more aspects of the subject disclosure are directed towards performing a semantic parsing task, such as classifying text corresponding to a spoken utterance into a class. Feature data representative of input data is provided to a semantic parsing mechanism that uses a deep model trained at least in part via unsupervised learning using unlabeled data. For example, if used in a classification task, a classifier may use an associated deep neural network that is trained to have an embeddings layer corresponding to at least one of words, phrases, or sentences. The layers are learned from unlabeled data, such as query click log data. | 10-29-2015 |
20150325235 | Language Model Optimization For In-Domain Application - Systems and methods are provided for optimizing language models for in-domain applications through an iterative, joint-modeling approach that expresses training material as alternative representations of higher-level tokens, such as named entities and carrier phrases. From a first language model, an in-domain training corpus may be represented as a set of alternative parses of tokens. Statistical information determined from these parsed representations may be used to produce a second (or updated) language model, which is further optimized for the domain. The second language model may be used to determine another alternative parsed representation of the corpus for a next iteration, and the statistical information determined from this representation may be used to produce a third (or further updated) language model. Through each iteration, a language model may be determined that is further optimized for the domain. | 11-12-2015 |
20150325237 | USER QUERY HISTORY EXPANSION FOR IMPROVING LANGUAGE MODEL ADAPTATION - Query history expansion may be provided. Upon receiving a spoken query from a user, an adapted language model may be applied to convert the spoken query to text. The adapted language model may comprise a plurality of queries interpolated from the user's previous queries and queries associated with other users. The spoken query may be executed and the results of the spoken query may be provided to the user. | 11-12-2015 |
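The interpolation of the user's own query history with queries from other users, as described in the entry above, can be sketched as a linear mixture of unigram models. The mixing weight and toy counts below are assumptions, not the disclosed adaptation scheme.

```python
from collections import Counter

def interpolated_unigram(user_queries, global_queries, lam=0.7):
    """P(w) = lam * P_user(w) + (1 - lam) * P_global(w): a simple stand-in
    for adapting a language model with the user's query history."""
    u = Counter(w for q in user_queries for w in q.split())
    g = Counter(w for q in global_queries for w in q.split())
    u_total, g_total = sum(u.values()), sum(g.values())
    vocab = set(u) | set(g)
    return {w: lam * u[w] / u_total + (1 - lam) * g[w] / g_total for w in vocab}

model = interpolated_unigram(["weather seattle"], ["weather", "news seattle"])
print(model["seattle"])  # user history boosts this word's probability
```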
20150325238 | Voice Recognition Method And Electronic Device - A voice recognition method and an electronic device are described. The method is applicable in an electronic device having a voice recognition system. The method includes acquiring first voice information of a user; recognizing the first voice information on the basis of a first recognition file library, acquiring a first recognition result, where the first recognition file library is a recognition file library updated from a second recognition file library of the voice recognition system on the basis of usage information expressing usage and syntax habits of the user, the first recognition file library includes an M-number of recognition entries, the second recognition file library includes an N-number of recognition entries, where M is an integer greater than or equal to one, and N is an integer greater than or equal to one. | 11-12-2015 |
20150332665 | SYSTEM AND METHOD FOR DATA-DRIVEN SOCIALLY CUSTOMIZED MODELS FOR LANGUAGE GENERATION - Systems, methods, and computer-readable storage devices for generating speech using a presentation style specific to a user, and in particular the user's social group. Systems configured according to this disclosure can then use the resulting, personalized, text and/or speech in a spoken dialogue or presentation system to communicate with the user. For example, a system practicing the disclosed method can receive speech from a user, identify the user, and respond to the received speech by applying a personalized natural language generation model. The personalized natural language generation model provides communications which can be specific to the identified user. | 11-19-2015 |
20150332672 | Knowledge Source Personalization To Improve Language Models - Systems and methods are provided for improving language models for speech recognition by personalizing knowledge sources utilized by the language models to specific users or user-population characteristics. A knowledge source, such as a knowledge graph, is personalized for a particular user by mapping entities or user actions from usage history for the user, such as query logs, to the knowledge source. The personalized knowledge source may be used to build a personal language model by training a language model with queries corresponding to entities or entity pairs that appear in usage history. In some embodiments, a personalized knowledge source for a specific user can be extended based on personalized knowledge sources of similar users. | 11-19-2015 |
20150340035 | AUTOMATED GENERATION OF PHONEMIC LEXICON FOR VOICE ACTIVATED COCKPIT MANAGEMENT SYSTEMS - A system, method and program for acquiring from an input text a character string set and generating the pronunciation thereof which should be recognized as a word are disclosed. The system selects, from an input text, plural candidate character strings which are phonemic character candidates or allophones to be recognized as a word; generates plural pronunciation candidates of the selected candidate character string and outputs the optimum pronunciation candidate to be recognized as a word; generates a phonemic dictionary by combining data in which the pronunciation candidate with optimal recognition is respectively associated with the character strings; generates recognition data in which character strings respectively indicating plural words contained in the input speech are associated with pronunciations; and outputs a combination contained in the recognition data, out of combinations each consisting of one of the candidate character strings and one of the pronunciation candidates with the optimum recognition. | 11-26-2015 |
20150348541 | Generating Language Models - Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for generating language models. In some implementations, data is accessed that indicates a set of classes corresponding to a concept. A first language model is generated in which a first class represents the concept. A second language model is generated in which second classes represent the concept. Output of the first language model and the second language model is obtained, and the outputs are evaluated. A class from the set of classes is selected based on evaluating the output of the first language model and the output of the second language model. In some implementations, the first class and the second class are selected from a parse tree or other data that indicates relationships among the classes in the set of classes. | 12-03-2015 |
20150348543 | Speech Recognition of Partial Proper Names by Natural Language Processing - A method for speech recognition of partial proper names is described which includes natural language processing (NLP), partial name candidate generation, speech recognition and post processing. Natural language processing techniques including shallow and deep parsing are applied to long proper names to identify syntactic units (for example, noun phrases). The syntactic units form a basis for generating a candidate list of partial names for each original full name. A partial name is part of the original name, with some words omitted, or word order changed, or even word substitution. After candidate partial names are generated, their phonetic transcriptions are incorporated into a model for a speech recognizer to recognize the partial names in a speech recognition system. | 12-03-2015 |
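The candidate-generation step in the entry above, producing shorter name variants from the syntactic units of a full proper name, can be approximated by keeping ordered subsets of phrase chunks. The chunking here is hard-coded for the sketch rather than produced by a shallow or deep parser.

```python
from itertools import combinations

def partial_name_candidates(chunks):
    """Generate partial-name candidates by keeping non-empty subsets of the
    syntactic chunks of a full name, preserving the original word order."""
    candidates = set()
    for r in range(1, len(chunks) + 1):
        for combo in combinations(range(len(chunks)), r):
            candidates.add(" ".join(chunks[i] for i in combo))
    return sorted(candidates, key=len)

# Chunks a shallow parser might return for a long organization name.
chunks = ["Saint Mary's", "Hospital", "of San Francisco"]
for cand in partial_name_candidates(chunks):
    print(cand)  # e.g. "Saint Mary's Hospital", "Hospital of San Francisco", ...
```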
20150348569 | SEMANTIC-FREE TEXT ANALYSIS FOR IDENTIFYING TRAITS - A method, system, and/or computer program product uses speech traits of an entity to predict a future state of the entity. Units of speech are collected from a stream of speech that is generated by a first entity. Tokens from the stream of speech are identified, where each token identifies a particular unit of speech from the stream of speech, and where identification of the tokens is semantic-free. Nodes in a first speech graph are populated with the tokens, and a first shape of the first speech graph is identified. The first shape is matched to a second shape, where the second shape is of a second speech graph from a second entity in a known category. The first entity is assigned to the known category, and a future state of the first entity is predicted based on the first entity being assigned to the known category. | 12-03-2015 |
20150364132 | STRUCTURED NATURAL LANGUAGE REPRESENTATIONS - In accordance with aspects of the disclosure, a computing device may identify a prompt associated with an automated dialog application. An application expectation of the automated dialog application may be identified. The application expectation may comprise a structured natural language representation for a natural language response to the prompt. The computing device may receive natural language input responsive to the prompt, populate one or more data fields of the structured natural language representation with at least a portion of the natural language input, and may respond to the application expectation using the one or more data fields of the structured natural language representation. | 12-17-2015 |
20150364134 | GEO-SPATIAL EVENT PROCESSING - A geo-spatial grammar comprises rules, syntax, and other means by which a data input is determined to have a meaning associated with a particular event. The event may then be provided to an application, such as a calendaring or messaging application. As a benefit, an input, such as a user speaking the phrase, “I'll be there in an hour,” may be interpreted, via the geo-spatial grammar, as an event (e.g., “I'll be in the office,” “I'll join you for dinner,” “I'll be home,” etc.). An application may then perform an action based upon the event (e.g., reschedule the meeting that starts in five minutes, present directions to the restaurant on the user's car's navigation system, notify the user's spouse, etc.). | 12-17-2015 |
20160005400 | SPEECH-RECOGNITION DEVICE AND SPEECH-RECOGNITION METHOD - With respect to speech data | 01-07-2016 |
20160042735 | Dialog Flow Management In Hierarchical Task Dialogs - Methods and systems for managing multiple tasks using a dialog are presented. In some embodiments, a processor may parse a first natural language user input received at a user device to extract task related information from the first natural language user input. In response to identifying that the first natural language user input comprises a request to perform a first task, the processor may initiate execution of the first task. The user device may receive a second natural language user input after execution of the first task has been initiated which requests execution of a second task. The processor may initiate execution of the second task before execution of the first task is complete. | 02-11-2016 |
20160049152 | SYSTEM AND METHOD FOR HYBRID PROCESSING IN A NATURAL LANGUAGE VOICE SERVICES ENVIRONMENT - A system and method for hybrid processing in a natural language voice services environment that includes a plurality of multi-modal devices may be provided. In particular, the hybrid processing may generally include the plurality of multi-modal devices cooperatively interpreting and processing one or more natural language utterances included in one or more multi-modal requests. For example, a virtual router may receive various messages that include encoded audio corresponding to a natural language utterance contained in a multi-modal interaction provided to one or more of the devices. The virtual router may then analyze the encoded audio to select a cleanest sample of the natural language utterance and communicate with one or more other devices in the environment to determine an intent of the multi-modal interaction. The virtual router may then coordinate resolving the multi-modal interaction based on the intent of the multi-modal interaction. | 02-18-2016 |
20160063994 | Query Rewrite Corrections - Methods, systems, and apparatus, including computer programs encoded on computer storage media, for natural language processing. One of the methods includes receiving a first voice query; generating a first recognition output; receiving a second voice query; determining from a recognition of the second voice query that the second voice query triggers a correction request; using the first recognition output and the recognition of the second voice query to determine a plurality of candidate corrections; scoring each candidate correction; and generating a corrected recognition output for a particular candidate correction having a score that satisfies a threshold value. | 03-03-2016 |
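A rough sketch of the correction flow above: detect a correction trigger in the second query, splice its correction phrase into the first recognition output to form candidates, score them, and keep a candidate that clears a threshold. The trigger phrase, scoring function, and threshold are all placeholders.

```python
def candidate_corrections(first_output, second_query, trigger="no i said "):
    """If the second query triggers a correction, substitute its correction
    phrase for each word of the first recognition output in turn."""
    if not second_query.lower().startswith(trigger):
        return []
    replacement = second_query[len(trigger):].strip()
    words = first_output.split()
    return [" ".join(words[:i] + [replacement] + words[i + 1:]) for i in range(len(words))]

def best_correction(candidates, score_fn, threshold=0.5):
    scored = [(score_fn(c), c) for c in candidates]
    best = max(scored, default=(0.0, None))
    return best[1] if best[0] >= threshold else None

cands = candidate_corrections("call jon smith", "no I said john")
print(best_correction(cands, lambda c: 0.9 if "john smith" in c else 0.1))
```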
20160086389 | METHODS AND SYSTEMS FOR PROCESSING SPEECH TO ASSIST MAINTENANCE OPERATIONS - Methods and systems are provided for recording natural conversation of a user of a vehicle. In one embodiment, a method includes: recognizing speech from the recording; processing the recognized speech to determine a meaning associated with the speech; identifying a category of the speech based on the meaning; and generating a maintenance report to be used by a maintainer of the vehicle based on the category and the speech. | 03-24-2016 |
20160086601 | SYSTEM AND METHOD FOR USING SEMANTIC AND SYNTACTIC GRAPHS FOR UTTERANCE CLASSIFICATION - Disclosed herein is a system, method and computer readable medium storing instructions related to semantic and syntactic information in a language understanding system. The method embodiment of the invention is a method for classifying utterances during a natural language dialog between a human and a computing device. The method comprises receiving a user utterance; generating a semantic and syntactic graph associated with the received utterance, extracting all n-grams as features from the generated semantic and syntactic graph and classifying the utterance. Classifying the utterance may be performed in any number of ways, such as using the extracted n-grams, the semantic and syntactic graph, or written rules. | 03-24-2016 |
20160092160 | USER ADAPTIVE INTERFACES - Systems and methods for providing a user adaptive natural language interface are disclosed. The disclosed embodiments may receive and analyze user input to derive current user behavior data, including data indicative of characteristics of the user input. The user input is classified based on prior user behavior data previously logged during one or more previous user-system interactions and the current user behavior data to generate a classification of the user input. Machine learning algorithms can be employed to classify the user input. User adaptive utterances are selected based on the user input and the classification of the user input. The user-system interaction is logged for use as prior user behavior data in future user-system interactions. A response to the user input is generated, including synthesizing output speech from the user adaptive utterances selected. Example applications of the disclosed systems and methods provide user adaptive navigation directions in navigation systems. | 03-31-2016 |
20160098988 | AUTOMATIC DATA-DRIVEN DIALOG DISCOVERY SYSTEM - Methods and systems for providing help prompts to a user of an automated dialog system are presented. In some embodiments, a computing device may receive a help request from the user of an automated dialog system. The help request may comprise a user request for information about one or more capabilities of the automated dialog system. The computing device may identify information expected to be input by the user to request that the automated dialog system perform its one or more capabilities. A natural language help prompt may be generated to provide guidance to the user to provide the identified information expected to be input. | 04-07-2016 |
20160111092 | System And Method For Distributed Speech Recognition - A system and method for distributed speech recognition is provided. A prompt is provided to a caller during a call. One or more audio responses are received from the caller in response to the prompt. Distributed speech recognition is performed on the audio responses by providing a non-overlapping section of a main grammar to each of a plurality of secondary recognizers for each audio response. Speech recognition is performed on the audio responses by each of the secondary recognizers using the non-overlapping section of the main grammar associated with that secondary recognizer. A new grammar is generated based on results of the speech recognition from each of the secondary recognizers. Further speech recognition is performed on the audio responses against the new grammar and a further prompt is selected for providing to the caller based on results of the distributed speech recognition. | 04-21-2016 |
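The distribution step in the entry above, handing each secondary recognizer a non-overlapping section of the main grammar, is essentially a partition of the grammar's phrase list followed by a merge of per-section results into a new grammar. The round-robin split and stub recognizer below are assumptions for illustration.

```python
def partition_grammar(phrases, num_recognizers):
    """Split the main grammar into non-overlapping sections, one per
    secondary recognizer (round-robin keeps the sections balanced)."""
    return [phrases[i::num_recognizers] for i in range(num_recognizers)]

def recognize_distributed(audio, phrases, num_recognizers, recognize_fn):
    sections = partition_grammar(phrases, num_recognizers)
    # Each secondary recognizer decodes the audio against only its section;
    # the per-section results are merged into a new, smaller grammar.
    new_grammar = []
    for section in sections:
        new_grammar.extend(recognize_fn(audio, section))
    return new_grammar

# Stub recognizer: pretend anything containing "account" matched the audio.
stub = lambda audio, grammar: [p for p in grammar if "account" in p]
print(recognize_distributed(b"...", ["check account balance", "pay bill",
                                     "close account", "open account"], 2, stub))
```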
20160118060 | METHODS AND SYSTEMS FOR PROCESSING A MULTIMEDIA CONTENT - The disclosed embodiments illustrate methods and systems for processing multimedia content. The method includes extracting one or more words from an audio stream associated with multimedia content. Each word has one or more associated timestamps indicative of temporal occurrences of said word in said multimedia content. The method further includes creating a word cloud of said one or more words in said multimedia content based on a measure of emphasis laid on each word in said multimedia content and said one or more timestamps associated with said one or more words. The method further includes presenting one or more multimedia snippets, of said multimedia content, associated with a word selected by a user from said word cloud. Each of said one or more multimedia snippets corresponds to said one or more timestamps associated with occurrences of said word in said multimedia content. | 04-28-2016 |
20160133250 | SYSTEM AND METHOD FOR ENHANCING SPEECH RECOGNITION ACCURACY - Disclosed herein are systems, computer-implemented methods, and computer-readable media for enhancing speech recognition accuracy. The method includes dividing a system dialog turn into segments based on timing of probable user responses, generating a weighted grammar for each segment, exclusively activating the weighted grammar generated for a current segment of the dialog turn during the current segment of the dialog turn, and recognizing user speech received during the current segment using the activated weighted grammar generated for the current segment. The method can further include assigning probability to the weighted grammar based on historical user responses and activating each weighted grammar is based on the assigned probability. Weighted grammars can be generated based on a user profile. A weighted grammar can be generated for two or more segments. Exclusively activating each weighted grammar can include a transition period blending the previously activated grammar and the grammar to be activated. | 05-12-2016 |
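The segment-wise activation in the entry above can be read as a schedule keyed on elapsed time within the system's dialog turn, with each segment's grammar weights derived from historical user responses. The timing boundaries and counts below are invented for the sketch.

```python
def build_weighted_grammar(historical_responses):
    """Assign each phrase a probability proportional to how often users
    said it historically during this segment of the dialog turn."""
    total = sum(historical_responses.values())
    return {phrase: count / total for phrase, count in historical_responses.items()}

def active_grammar(elapsed_s, schedule):
    """Exclusively activate the grammar for the segment containing elapsed_s."""
    for (start, end), grammar in schedule:
        if start <= elapsed_s < end:
            return grammar
    return {}

schedule = [
    ((0, 3), build_weighted_grammar({"yes": 80, "no": 20})),
    ((3, 8), build_weighted_grammar({"repeat that": 50, "operator": 50})),
]
print(active_grammar(4.2, schedule))  # grammar for the second segment
```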
20160148612 | System and Method of Determining a Domain and/or an Action Related to a Natural Language Input - The disclosure relates to methods, systems and other embodiments directed to determining an information domain match for a natural language (NL) input (e.g., a spoken utterance), and confirming whether the NL input is correctly matched to the information domain. For example, after receiving an NL input, a first information domain to which the NL input belongs and a feature value set may be determined based on a semantic pattern matching technique. Further, a second information domain to which the NL input belongs, and a corresponding confidence score related to the second information domain may be determined. The second information domain may be determined based on a first statistical classification technique. Based on the determined feature value set and the confidence score related to the second information domain, it may be confirmed whether the NL input correctly belongs to the first information domain, e.g., based on a second statistical classification technique. | 05-26-2016 |
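The confirmation step in the entry above combines the feature values from the semantic pattern match with the statistical classifier's domain and confidence score. The sketch below stands in for the second-stage statistical classifier with a hand-written decision rule, and every feature name and threshold is an assumption.

```python
def confirm_domain(first_domain, feature_values, second_domain, confidence,
                   agree_bonus=0.2, accept_threshold=0.6):
    """Decide whether the NL input really belongs to the first (pattern-matched)
    domain, using the second (statistical) domain and its confidence score."""
    score = sum(feature_values.values()) / max(len(feature_values), 1)
    if second_domain == first_domain:
        score += agree_bonus * confidence  # classifiers agree -> boost
    else:
        score -= agree_bonus * confidence  # classifiers disagree -> penalize
    return score >= accept_threshold

features = {"pattern_coverage": 0.7, "slot_fill_ratio": 0.5}
print(confirm_domain("navigation", features, "navigation", confidence=0.9))  # True
```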
20160155439 | SYSTEM AND METHOD OF SPOKEN LANGUAGE UNDERSTANDING IN HUMAN COMPUTER DIALOGS | 06-02-2016 |
20160180840 | SYSTEMS AND METHODS FOR IMPROVING SPEECH RECOGNITION PERFORMANCE BY GENERATING COMBINED INTERPRETATIONS | 06-23-2016 |
20160188292 | SYSTEM AND METHOD FOR INTERPRETING NATURAL LANGUAGE INPUTS BASED ON STORAGE OF THE INPUTS - In certain implementations, a system and method for interpreting natural language inputs based on storage of the inputs is provided. A natural language input of a user may be obtained. The natural language input may be obtained via an input mode. The natural language input may be processed to determine a first interpretation of the natural language input. The natural language input may be stored based on a data format associated with the input mode. The natural language input may be obtained from storage. The natural language input obtained from storage may be reprocessed to determine a second interpretation of the natural language input. | 06-30-2016 |
20160196257 | GRAMMAR CORRECTING METHOD AND APPARATUS | 07-07-2016 |
20160203121 | ANALYSIS OBJECT DETERMINATION DEVICE AND ANALYSIS OBJECT DETERMINATION METHOD | 07-14-2016 |
20160253989 | SPEECH RECOGNITION ERROR DIAGNOSIS | 09-01-2016 |
20160379629 | METHOD AND SYSTEM OF AUTOMATIC SPEECH RECOGNITION WITH DYNAMIC VOCABULARIES - A system, article, and method of automatic speech recognition with dynamic vocabularies is described herein. | 12-29-2016 |
20170236514 | Integration and Probabilistic Control of Electronic Devices | 08-17-2017 |
20180025722 | SYSTEM AND METHOD FOR ENHANCING SPEECH RECOGNITION ACCURACY USING WEIGHTED GRAMMARS BASED ON USER PROFILE INCLUDING DEMOGRAPHIC, ACCOUNT, TIME AND DATE INFORMATION | 01-25-2018 |
20180025725 | SYSTEMS AND METHODS FOR ACTIVATING A VOICE ASSISTANT AND PROVIDING AN INDICATOR THAT THE VOICE ASSISTANT HAS ASSISTANCE TO GIVE | 01-25-2018 |
20180025726 | CREATING COORDINATED MULTI-CHATBOTS USING NATURAL DIALOGUES BY MEANS OF KNOWLEDGE BASE | 01-25-2018 |
20190147044 | UNDERSPECIFICATION OF INTENTS IN A NATURAL LANGUAGE PROCESSING SYSTEM | 05-16-2019 |
20190147850 | INTEGRATION OF THIRD PARTY VIRTUAL ASSISTANTS | 05-16-2019 |
20190147868 | VOICE INTERACTION METHOD AND APPARATUS, TERMINAL, SERVER AND READABLE STORAGE MEDIUM | 05-16-2019 |
20190147873 | METHOD FOR CHECKING AN ONBOARD SPEECH DETECTION SYSTEM OF A MOTOR VEHICLE AND CONTROL DEVICE AND MOTOR VEHICLE | 05-16-2019 |
20190147875 | CONTINUOUS TOPIC DETECTION AND ADAPTION IN AUDIO ENVIRONMENTS | 05-16-2019 |
20190147877 | METHOD FOR ASSISTING HUMAN-COMPUTER INTERACTION AND COMPUTER-READABLE MEDIUM | 05-16-2019 |
20220139372 | NATURAL LANGUAGE DOMAIN CORPUS DATA SET CREATION BASED ON ENHANCED ROOT UTTERANCES - Systems and methods for generating a natural language domain corpus to train a machine learning natural language understanding process. A base utterance expressing an intent and an intent profile indicating at least one of categories, keywords, concepts, sentiment, entities, or emotion of the intent are received. Machine translation translates the base utterance into a plurality of foreign language utterances and back into respective utterances in the target natural language to create a normalized utterance set. Analysis of each utterance in the normalized utterance set determines respective meta information for each such utterance. Comparison of the meta information to the intent profile determines a highest ranking matching utterance within the normalized utterance set. A set of natural language data to train a machine learning natural language understanding process is created based on further natural language translations of the highest ranking matching utterance. | 05-05-2022 |
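The round-trip translation and ranking loop in the entry above can be sketched with stand-in translation and analysis functions. A real system would call a machine translation service and an NLU analyzer, so every function body here is a placeholder, as is the set-overlap ranking.

```python
def normalize_by_round_trip(base_utterance, languages, translate, back_translate):
    """Translate the base utterance out and back through several pivot
    languages to obtain a normalized utterance set."""
    return {back_translate(translate(base_utterance, lang), lang) for lang in languages}

def rank_against_profile(utterances, analyze, intent_profile):
    """Score each normalized utterance by overlap between its meta
    information and the intent profile; return the best match."""
    def overlap(utt):
        return len(analyze(utt) & intent_profile)
    return max(utterances, key=overlap)

# Placeholder translation/analysis functions for the sketch.
translate = lambda text, lang: f"[{lang}] {text}"
back_translate = lambda text, lang: text.split("] ", 1)[1].lower()
analyze = lambda utt: set(utt.split())
profile = {"book", "flight", "travel"}

utts = normalize_by_round_trip("Book a flight", ["fr", "de"], translate, back_translate)
print(rank_against_profile(utts, analyze, profile))
```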