Entries |
Document | Title | Date |
20080208582 | Methods for statistical analysis of speech - Computer-implemented methods and apparatus are provided to facilitate the recognition of the content of a body of speech data. In one embodiment, a method for analyzing verbal communication is provided, comprising acts of producing an electronic recording of a plurality of spoken words; processing the electronic recording to identify a plurality of word alternatives for each of the spoken words, each of the plurality of word alternatives being identified by comparing a portion of the electronic recording with a lexicon, and each of the plurality of word alternatives being assigned a probability of correctly identifying a spoken word; loading the word alternatives and the probabilities to a database for subsequent analysis; and examining the word alternatives and the probabilities to determine at least one characteristic of the plurality of spoken words. | 08-28-2008 |
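The word-alternatives structure this abstract describes can be pictured as a simple word lattice. Below is a minimal sketch in Python; the lattice contents, function names, and the particular "characteristic" computed are illustrative assumptions, not taken from the patent.

```python
# Hypothetical word lattice: one dict of {alternative: probability} per spoken word.
lattice = [
    {"recognize": 0.6, "wreck a nice": 0.4},
    {"speech": 0.7, "beach": 0.3},
]

def best_transcript(lattice):
    """Pick the highest-probability alternative for each spoken word."""
    return " ".join(max(slot, key=slot.get) for slot in lattice)

def transcript_confidence(lattice):
    """One possible 'characteristic': the mean top-alternative probability."""
    return sum(max(slot.values()) for slot in lattice) / len(lattice)
```

In a scheme like this, the per-word alternatives and probabilities would be persisted to a database so that later analyses (such as the confidence figure above) can be run without re-decoding the audio.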
20080215325 | TECHNIQUE FOR ACCURATELY DETECTING SYSTEM FAILURE - An apparatus, method and program for dividing a conversational dialog into utterances. The apparatus includes a computer processor; a word database for storing spellings and pronunciations of words; a grammar database for storing syntactic rules on words; a pause detecting section which detects a pause location in a channel making a main speech among conversational dialogs inputted in at least two channels; an acknowledgement detecting section which detects an acknowledgement location in a channel not making the main speech; a boundary-candidate extracting section which extracts boundary candidates in the main speech, by extracting pauses existing within a predetermined range before and after a base point that is the acknowledgement location; and a recognizing unit which outputs a word string of the main speech segmented by one of the extracted boundary candidates after dividing the segmented speech into optimal utterances in reference to the word database and grammar database. | 09-04-2008 |
20080215326 | SPEAKER ADAPTATION OF VOCABULARY FOR SPEECH RECOGNITION - A phonetic vocabulary for a speech recognition system is adapted to a particular speaker's pronunciation. A speaker can be attributed specific pronunciation styles, which can be identified from specific pronunciation examples. Consequently, a phonetic vocabulary can be reduced in size, which can improve recognition accuracy and recognition speed. | 09-04-2008 |
20080215327 | Method For Processing Speech Data For A Distributed Recognition System - Speech signal information is formatted, processed and transported in accordance with a format adapted for TCP/IP protocols used on the Internet and other communications networks. NULL characters are used for indicating the end of a voice segment. The method is useful for distributed speech recognition systems such as a client-server system, typically implemented on an intranet or over the Internet based on user queries at his/her computer, a PDA, or a workstation using a speech input interface. | 09-04-2008 |
20080221889 | MOBILE CONTENT SEARCH ENVIRONMENT SPEECH PROCESSING FACILITY - In embodiments of the present invention improved capabilities are described for a mobile environment speech processing facility. The present invention may provide for the entering of text into a content search software application resident on a mobile communication facility, where speech may be recorded using the mobile communications facility's resident capture facility. Transmission of the recording may be provided through a wireless communication facility to a speech recognition facility. Results may be generated utilizing the speech recognition facility that may be independent of structured grammar, and may be based at least in part on the information relating to the recording. The results may then be transmitted to the mobile communications facility, where they may be loaded into the content search software application. In embodiments, the user may be allowed to alter the results that are received from the speech recognition facility. In addition, the speech recognition facility may be adapted based on usage. | 09-11-2008 |
20080221890 | UNSUPERVISED LEXICON ACQUISITION FROM SPEECH AND TEXT - Techniques for acquiring, from an input text and an input speech, a set of a character string and a pronunciation thereof which should be recognized as a word. A system according to the present invention: selects, from an input text, plural candidate character strings which are candidates to be recognized as a word; generates plural pronunciation candidates of the selected candidate character strings; generates frequency data by combining data in which the generated pronunciation candidates are respectively associated with the character strings; generates recognition data in which character strings respectively indicating plural words contained in the input speech are associated with pronunciations; and selects and outputs a combination contained in the recognition data, out of combinations each consisting of one of the candidate character strings and one of the pronunciation candidates. | 09-11-2008 |
20080228483 | Method, Device And System for Implementing Speech Recognition Function - The present disclosure provides a method, a device and a system for implementing a speech recognition function, in which a media resource control device controls a media resource processing device to recognize a speech input by a user via the H.248 protocol. The method includes receiving, by the media resource processing device, an H.248 message carrying a speech recognition instruction and a related parameter sent by the media resource control device; performing speech recognition according to the speech recognition instruction and the parameter; and reporting a recognition result to the media resource control device. A corresponding device and system for implementing the speech recognition function are further provided. | 09-18-2008 |
20080228484 | Techniques for Aiding Speech-to-Speech Translation - Techniques for assisting in translation are provided. A speech recognition hypothesis is obtained, corresponding to a source language utterance. Information retrieval is performed on a supplemental database, based on a situational context, to obtain at least one word string that is related to the source language utterance. The speech recognition hypothesis and the word string are then formatted for display to a user, to facilitate an appropriate selection by the user for translation. | 09-18-2008 |
20080235018 | Method and System for Determining the Topic of a Conversation and Locating and Presenting Related Content - A method and system are disclosed for determining the topic of a conversation and obtaining and presenting related content. The disclosed system provides a “creative inspirator” in an ongoing conversation. The system extracts keywords from the conversation and utilizes the keywords to determine the topic(s) being discussed. The disclosed system then conducts searches to obtain supplemental content based on the topic(s) of the conversation. The content can be presented to the participants in the conversation to supplement their discussion. A method is also disclosed for determining the topic of a text document including transcripts of audio tracks, newspaper articles, and journal papers. | 09-25-2008 |
20080235019 | AGE DETERMINATION USING SPEECH - A device may include logic configured to receive voice data from a user, identify a result from the voice data, calculate a confidence score associated with the result, and determine a likely age range associated with the user based on the confidence score. | 09-25-2008 |
20080243505 | Method for variable resolution and error control in spoken language understanding - A method for variable resolution and error control in spoken language understanding (SLU) allows arranging the categories of the SLU into a hierarchy of different levels of specificity. The pre-determined hierarchy is used to identify different types of errors such as high-cost errors and low-cost errors and trade, if necessary, high-cost errors for low-cost errors. | 10-02-2008 |
20080262841 | APPARATUS AND METHOD FOR RENDERING CONTENTS, CONTAINING SOUND DATA, MOVING IMAGE DATA AND STATIC IMAGE DATA, HARMLESS - A method of rendering multimedia contents harmless is described. The method includes: reading out a predetermined word and the contents from a recording apparatus; replacing the predetermined word in transcript data with a different word, and setting the transcript data including the different word, and the predetermined word, respectively, as transcript data of harmless contents, and as transcript data of unique information; replacing the predetermined word with the different word, and setting the sound data including the different word and the predetermined word according to a time when the predetermined word appears in the firstly mentioned transcript data, respectively, as sound data of the harmless contents, and as sound data of the unique information; replacing the predetermined word in the presentation data with the different word, and setting the presentation data including the different word, and the predetermined word, respectively, as presentation data of the harmless contents, and as presentation data of the unique information; recording the harmless contents; and recording the unique information. | 10-23-2008 |
20080262842 | PORTABLE COMPUTER WITH SPEECH RECOGNITION FUNCTION AND METHOD FOR PROCESSING SPEECH COMMAND THEREOF - A portable computer with a speech recognition function and a method for processing a speech command thereof are disclosed. In the method, the speech command has Y command character strings, wherein Y is a positive integer greater than or equal to one. The method includes providing a plurality of speech recognition databases and loading the corresponding speech recognition database in response to executing the X-th command character string of the speech command, wherein X is a positive integer greater than or equal to one and less than or equal to Y. When the string corresponding to the X-th command character string is found in the loaded speech recognition database, an operation designated by the X-th command character string is executed, and when X is not equal to Y, one is added to X. | 10-23-2008 |
20080270133 | Speech model refinement with transcription error detection - A reliable transcription error-checking algorithm uses a word confidence score and a word duration probability to detect transcription errors, yielding improved results through the automatic detection of transcription errors in a corpus. The transcription error-checking algorithm is combined with model training so as to use a current model to detect transcription errors, remove utterances which contain incorrect transcriptions (or manually fix the found errors), and retrain the model. This process can be repeated for several iterations to obtain an improved speech recognition model. The speech model is employed to achieve speech-transcription alignment and obtain word boundaries. A speech recognizer is then utilized to generate a word lattice. Using the word boundaries and the word lattice, error detection is computed using a word confidence score and a word duration probability. | 10-30-2008 |
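The error-flagging step in the abstract above can be sketched in a few lines of Python. The thresholds, names, and tuple format here are assumptions for illustration; the patent's actual scoring functions are not specified in this listing.

```python
def flag_suspect_words(aligned_words, conf_min=0.5, dur_prob_min=0.05):
    """Return words whose recognizer confidence or duration probability
    falls below a threshold, marking them as possible transcription errors.

    aligned_words: list of (word, confidence, duration_probability) tuples,
    as might be produced by speech-transcription alignment plus a word lattice.
    """
    return [word for word, conf, dur_prob in aligned_words
            if conf < conf_min or dur_prob < dur_prob_min]
```

Utterances containing flagged words could then be removed (or hand-corrected) before the model is retrained, and the detect-fix-retrain loop repeated for several iterations.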
20080270134 | Hybrid-captioning system - A hybrid-captioning system for editing captions for spoken utterances within video includes an editor-type caption-editing subsystem, a line-based caption-editing subsystem, and a mechanism. The editor-type subsystem is that in which captions are edited for spoken utterances within the video on a groups-of-line basis without respect to particular lines of the captions and without respect to temporal positioning of the captions in relation to the spoken utterances. The line-based subsystem is that in which captions are edited for spoken utterances within the video on a line-by-line basis with respect to particular lines of the captions and with respect to temporal positioning of the captions in relation to the spoken utterances. For each section of spoken utterances within the video, the mechanism is to select the editor-type or the line-based subsystem to provide captions for the section of spoken utterances in accordance with predetermined criteria. | 10-30-2008 |
20080281596 | CONTINUOUS ADAPTATION IN DETECTION SYSTEMS VIA SELF-TUNING FROM TARGET POPULATION SUBSETS - The present invention provides a system and method for treating distortion propagated through a detection system. The system includes a compensation module that compensates for untreated distortions propagating through the detection compensation system, a user model pool that comprises a plurality of model sets, and a model selector that selects at least one model set from the plurality of model sets in the user model pool. The compensation is accomplished by continually producing scores distributed according to a prescribed distribution for the at least one model set and mitigating the adverse effects of the scores being distorted and lying off a pre-set operating point. | 11-13-2008 |
20080288253 | AUTOMATIC SPEECH RECOGNITION METHOD AND APPARATUS, USING NON-LINEAR ENVELOPE DETECTION OF SIGNAL POWER SPECTRA - An automatic speech recognition method includes converting an acoustic signal into a digital signal; determining a power spectrum of at least one portion of the digital signal; and non-linearly determining envelope values of the power spectrum at a plurality of respective frequencies, based on a combination of the power spectrum with a filter function. Non-linearly determining envelope values involves calculating each envelope value based on a respective number of values of the power spectrum and of the filter function, and the respective number of values is correlated to the respective frequency of the envelope value. | 11-20-2008 |
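One way to picture a frequency-dependent envelope of this general kind is sketched below. This is only an illustration under assumed choices (a max combiner, a window whose width grows linearly with bin index, and an arbitrary filter function of the offset); the patented computation may differ in all of these details.

```python
def nonlinear_envelope(power_spectrum, filter_fn, width_per_bin=0.1):
    """Envelope whose value at bin k combines a number of spectrum values
    that grows with k, each weighted by a filter function of the offset."""
    n = len(power_spectrum)
    env = []
    for k in range(n):
        half = 1 + int(width_per_bin * k)   # window widens with frequency
        lo, hi = max(0, k - half), min(n, k + half + 1)
        env.append(max(power_spectrum[j] * filter_fn(k - j)
                       for j in range(lo, hi)))
    return env
```

With a flat (rectangular) filter, each envelope value is simply the local maximum of the spectrum over a window that widens toward higher frequencies.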
20080288254 | Voice recognition apparatus and navigation apparatus - A voice recognition apparatus recognizes a speaker's voice collected by a microphone, determines whether a telephone number is grouped into categories based on an inclusion of vocabulary in the telephone number that divides the telephone number into groups such as an area code, a city code and a subscriber number, and displays the telephone number in a display part in a grouped form of the area code, city code and subscriber number. | 11-20-2008 |
20080294436 | SPEECH RECOGNITION FOR IDENTIFYING ADVERTISEMENTS AND/OR WEB PAGES - A device may identify terms in a speech signal using speech recognition. The device may further retain one or more of the identified terms by comparing them to a set of words and send the retained terms and information associated with the retained terms to a remote device. The device may also receive messages that are related to the retained terms and to the information associated with the retained terms from the remote device. | 11-27-2008 |
20080294437 | LANGUAGE UNDERSTANDING DEVICE - A language understanding device includes: a language understanding model storing unit configured to store word transition data including pre-transition states, input words, predefined outputs corresponding to the input words, word weight information, and post-transition states, and concept weighting data including concepts obtained from language understanding results for at least one word, and concept weight information corresponding to the concepts; a finite state transducer processing unit configured to output understanding result candidates including the predefined outputs, to accumulate word weights so as to obtain a cumulative word weight, and to sequentially perform state transition operations; a concept weighting processing unit configured to accumulate concept weights so as to obtain a cumulative concept weight; and an understanding result determination unit configured to determine an understanding result from the understanding result candidates by referring to the cumulative word weight and the cumulative concept weight. | 11-27-2008 |
20080294438 | Apparatus for the processing of sales - The invention relates to an apparatus for the processing of sales of articles of a product assortment to a customer, in particular to store scales, having a microphone device for listening to a conversation between a customer and a salesperson for the conversion of connected spoken words of the conversation into an electrical speech signal, having a speech recognition device for the generation of a speech recognition result representing the words from the electrical speech signal having a comparator for the comparison of the speech recognition result with keywords stored in a memory device of the apparatus for keyword recognition in the speech recognition result, with at least some of the stored keywords being product names which define a group of keywords, and having a control device which, on the detection of a keyword belonging to the group and/or of a permitted combination of keywords, which includes a keyword belonging to the group, is adapted to output a piece of information associated with the detected keyword or with the detected permitted combination and available via a data source or to output an offer for the output of the information, in particular in the form of a section menu, by means of an output device of the apparatus. | 11-27-2008 |
20080294439 | Speech screening - This invention relates to screening of spoken audio data so as to detect threat words or phrases. The method is particularly useful for protecting children or vulnerable adults from unsuitable content and/or suspicious or threatening contact with others via a communication medium. The method is applicable to screening speech transmitted over a computer network such as the internet and provides screening of access to stored content, e.g. audio or multimedia data files, as well as real time speech such as live broadcasts or communication via voice over IP or similar communication protocols. The method allows an administrator, e.g. a parent, to identify groups of threat words or phrases to be monitored, to set user access levels and to determine appropriate responses when threat words or phrases are detected. | 11-27-2008 |
20080294440 | METHOD AND SYSTEM FOR ASSESSING PRONUNCIATION DIFFICULTIES OF NON-NATIVE SPEAKERS - The present disclosure presents a useful metric for assessing the relative difficulty which non-native speakers face in pronouncing a given utterance and a method and systems for using such a metric in the evaluation and assessment of the utterances of non-native speakers. In an embodiment, the metric may be based on both known sources of difficulty for language learners and a corpus-based measure of cross-language sound differences. The method may be applied to speakers who primarily speak a first language speaking utterances in any non-native second language. | 11-27-2008 |
20080300878 | Method For Transporting Speech Data For A Distributed Recognition System - Speech signal information is formatted, processed and transported in accordance with a format adapted for TCP/IP protocols used on the Internet and other communications networks. NULL characters are used for indicating the end of a voice segment. The method is useful for distributed speech recognition systems such as a client-server system, typically implemented on an intranet or over the Internet based on user queries at his/her computer, a PDA, or a workstation using a speech input interface. | 12-04-2008 |
20080312927 | Computer assisted interactive writing instruction method - A computer assisted method of instruction for writing that is directed toward students of English as a second language. The method includes instruction for a plurality of composition types with interactive assistance in the writing strategies and tone of writing. The student selects the type of composition and can choose from a plurality of instructions in writing strategy displayed and with audio playback. The student interactively proceeds with writing and revision as the strategies are reviewed until completion of the writing process. | 12-18-2008 |
20080319748 | Conversation System and Conversation Software - A first domain satisfying a first condition concerning a current utterance understanding result and a second domain satisfying a second condition concerning a selection history are specified. For each of the first and second domains, indices representing reliability in consideration of the utterance understanding history, selection history, and utterance generation history are evaluated. Based on the evaluation results, one of the first, second, and third domains is selected as a current domain according to a selection rule. | 12-25-2008 |
20090006095 | LEARNING TO REORDER ALTERNATES BASED ON A USER'S PERSONALIZED VOCABULARY - Learning to reorder alternates based on a user's personalized vocabulary may be provided. An alternate list provided to a user for replacing words input by the user via a character recognition application may be reordered based on data previously viewed or input by the user (personal data). The alternate list may contain generic data, for example, words for possible substitution with one or more words input by the user. By using the user's personal data and statistical learning methodologies in conjunction with generic data in the alternate list, the alternate list can be reordered to present a top alternate that more closely reflect the user's vocabulary. Accordingly, the user is presented with a top alternate that is more likely to be used by the user to replace data incorrectly input. | 01-01-2009 |
20090012789 | Method and process for performing category-based analysis, evaluation, and prescriptive practice creation upon stenographically written and voice-written text files - System and method for electronically identifying and analyzing the type and frequency of errors and mismatches in a stenographically or voice written text against a stored master file and dynamically creating personalized user feedback, drills, and practice based on identified errors and mismatches from within the context of the stored master file. The system provides the user with a plurality of methods to enter a text file for error identification and analysis including both realtime and non-realtime input. The text input is then compared to a stored master file through a word-by-word iterative process which produces a comparison of writing input and stored master wherein errors and mismatches are identified and grouped in a plurality of pre-defined and user-selected categories, each of which is color-coded to facilitate pattern recognition of type and frequency of errors and mismatches in the submitted writing. | 01-08-2009 |
20090012790 | SPEECH RECOGNITION APPARATUS AND CONTROL METHOD THEREOF - A speech recognition apparatus which improves the sound quality of speech output as a speech recognition result is provided. The speech recognition apparatus includes a recognition unit, which recognizes speech based on a recognition dictionary, and a registration unit, which registers a dictionary entry of a new recognition word in the recognition dictionary. The recognition unit includes a generation unit, which generates a dictionary entry including speech of the new recognition word and feature parameters of the speech, and a modification unit, which makes a modification for improving the sound quality of the speech included in the dictionary entry generated by the generation unit. The recognition unit includes a speech output unit, which outputs speech which is included in a dictionary entry corresponding to the recognition result of input speech, and is modified by the modification unit. | 01-08-2009 |
20090012791 | REFERENCE PATTERN ADAPTATION APPARATUS, REFERENCE PATTERN ADAPTATION METHOD AND REFERENCE PATTERN ADAPTATION PROGRAM - A method and apparatus for carrying out adaptation using input speech data information even at a low reference pattern recognition performance. | 01-08-2009 |
20090018832 | INFORMATION COMMUNICATION TERMINAL, INFORMATION COMMUNICATION SYSTEM, INFORMATION COMMUNICATION METHOD, INFORMATION COMMUNICATION PROGRAM, AND RECORDING MEDIUM RECORDING THEREOF - An information communication terminal is provided. | 01-15-2009 |
20090037174 | Understanding spoken location information based on intersections - In one embodiment, the present system recognizes a user's speech input using an automatically generated probabilistic context free grammar for street names that maps all pronunciation variations of a street name to a single canonical representation during recognition. A tokenizer expands the representation using position-dependent phonetic tokens and an intersection classifier classifies an intersection, despite the presence of recognition errors and incomplete street names. | 02-05-2009 |
20090037175 | Confidence measure generation for speech related searching - A voice search system has a speech recognizer, a search component, and a dialog manager. A confidence measure generator receives speech recognition features from the speech recognizer, search features from the search component, and dialog features from the dialog manager, and calculates an overall confidence measure for voice search results based upon the features received. The invention can be extended to include the generation of additional features, based on those received from the individual components of the voice search system. | 02-05-2009 |
20090043580 | System and Method for Controlling the Operation of a Device by Voice Commands - The present invention includes a speech recognition system comprising a light element, a power control switch, the power control switch varying the power delivered to the light element, a controller, a microphone, a speech recognizer coupled to the microphone for recognizing speech input signals and transmitting recognition results to the controller, and a speech synthesizer coupled to the controller for generating synthesized speech, wherein the controller varies the power to the light element in accordance with the recognition results received from the speech recognizer. Embodiments of the invention may alternatively include a low power wake up circuit. In another embodiment, the present invention is a method of controlling a device by voice commands. | 02-12-2009 |
20090055180 | System and method for optimizing speech recognition in a vehicle - A system is provided for controlling personalized settings in a vehicle. The system includes a microphone for receiving spoken commands from a person in the vehicle, a location recognizer for identifying the location of the speaker, and an identity recognizer for identifying the identity of the speaker. The system also includes a speech recognizer for recognizing the received spoken commands. The system further includes a controller for processing the identified location, identity and commands of the speaker. The controller controls one or more feature settings based on the identified location, identified identity and recognized spoken commands of the speaker. The system also optimizes the beamforming microphone array used in the vehicle. | 02-26-2009 |
20090055181 | MOBILE TERMINAL AND METHOD OF INPUTTING MESSAGE THERETO - A mobile terminal and a method of inputting a message thereto are provided. The method of inputting a message includes analyzing an input voice signal and determining whether the voice signal corresponds to a message modification instruction, and modifying, if the voice signal corresponds to a message modification instruction, a message according to the voice signal. A user can thereby input a message through the input of a voice in the mobile terminal and modify the input message. | 02-26-2009 |
20090063147 | PHONETIC, SYNTACTIC AND CONCEPTUAL ANALYSIS DRIVEN SPEECH RECOGNITION SYSTEM AND METHOD - A new approach to speech recognition that reacts to concepts conveyed through speech, which shifts the balance of power in speech recognition from straight sound recognition and statistical models to a more powerful and complete approach determining and addressing conveyed concepts. A probabilistically unbiased multi-phoneme recognition process is employed, followed by a phoneme stream analysis process that builds the list of candidate words derived from recognized phonemes, followed by a permutation analysis process that produces sequences of candidate words with high potential of being syntactically valid, and finally, by processing targeted syntactic sequences in a conceptual analysis process to generate the utterance's conceptual representation that can be used to produce an adequate response. Applications include improving accuracy or automatically generating punctuation for transcription and dictation, word or concept spotting in audio streams, concept spotting in electronic text, customer support, call routing and other command/response scenarios. | 03-05-2009 |
20090063148 | CALIBRATION OF WORD SPOTS SYSTEM, METHOD, AND COMPUTER PROGRAM PRODUCT - An example embodiment of the invention may include a system, a method and/or a computer program product for enabling calibration of word spots resulting from a spoken query, including, e.g., but not limited to, presenting a plurality of word spots to a user, each of the plurality of word spots having a confidence level; determining by the user whether at least one of the plurality of word spots is a hit or a false positive by determining whether the at least one of the plurality of word spots matches at least one word in the spoken query; receiving a maximum acceptable percentage of false positives from the user; and determining an acceptable confidence threshold value for the spoken query by locating the smallest confidence level in the plurality of word spots below which the percentage of word spots in the plurality of word spots that are false positives exceeds the maximum acceptable percentage of false positives. | 03-05-2009 |
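The threshold search described in this abstract can be sketched as follows. The data layout and names are assumptions: each spot is a (confidence, is_false_positive) pair, where the false-positive flag reflects the user's judgment.

```python
def calibrate_threshold(spots, max_fp_percent):
    """Return the smallest confidence level in `spots` below which the
    percentage of false positives exceeds `max_fp_percent`, or None if
    no confidence level satisfies the condition."""
    for level in sorted({conf for conf, _ in spots}):
        below = [is_fp for conf, is_fp in spots if conf < level]
        # Percentage of false positives among spots below this level.
        if below and 100.0 * sum(below) / len(below) > max_fp_percent:
            return level
    return None
```

For example, with spots at confidences 0.2 and 0.4 marked false positives and spots at 0.6 and 0.8 marked hits, a 50% ceiling yields a threshold of 0.4: everything below 0.4 is entirely false positives.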
20090063149 | Speech retrieval apparatus - A speech retrieval apparatus derives a time series of pitch or power values of speech input as a retrieval condition, obtains a pattern of local maxima, local minima, and inflection points in the time series, compares this pattern with similar patterns obtained from speech stored in a speech database, and outputs only stored speech for which the compared patterns approximately match. Correct retrieval results are thereby obtained even from speech input including multiple accent nuclei. | 03-05-2009 |
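The extrema-pattern idea above can be sketched like this. It is a simplified illustration that tracks only local maxima and minima (not inflection points) and matches on exact pattern equality; the names and the matching rule are assumptions.

```python
def extremum_pattern(series):
    """Encode local maxima as 'M' and local minima as 'm', in order."""
    pattern = []
    for i in range(1, len(series) - 1):
        if series[i-1] < series[i] > series[i+1]:
            pattern.append("M")
        elif series[i-1] > series[i] < series[i+1]:
            pattern.append("m")
    return "".join(pattern)

def approx_match(query_series, stored_series):
    """Retrieve on pattern equality; a real system would allow tolerance."""
    return extremum_pattern(query_series) == extremum_pattern(stored_series)
```

Because the comparison is over the shape of the contour rather than raw pitch values, two utterances with different absolute pitch but the same sequence of rises and falls (e.g. multiple accent nuclei) would still match.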
20090070111 | METHODS, SYSTEMS, AND COMPUTER PROGRAM PRODUCTS FOR SPOKEN LANGUAGE GRAMMAR EVALUATION - A method, system, and computer program product for spoken language grammar evaluation are provided. The method includes playing a recorded question to a candidate, recording a spoken answer from the candidate, and converting the spoken answer into text. The method further includes comparing the text to a grammar database, calculating a spoken language grammar evaluation score based on the comparison, and outputting the spoken language grammar evaluation score. | 03-12-2009 |
20090083034 | VEHICLE CONTROL - The present invention relates to voice-activated vehicle control, and to the control of UAVs (unmanned air vehicles) using speech in particular. A method of controlling a vehicle is provided that includes receiving one or more instructions issued as speech and analyzing the speech using speech recognition software to provide a sequence of words and a word confidence measure for each word so recognized. The sequence of words is analyzed to identify a semantic concept corresponding to an instruction, and a semantic confidence level for the identified semantic concept is derived at least in part with reference to the word confidence measures of the words associated with the semantic concept. A spoken confirmation of the semantic concept so identified based on the semantic confidence level is provided, and the semantic concept is used to provide a control input for the vehicle. | 03-26-2009 |
20090089057 | SPOKEN LANGUAGE GRAMMAR IMPROVEMENT TOOL AND METHOD OF USE - A system and method of improving language skills and, more particularly, a spoken language grammar improvement tool and method of use are provided. The method includes monitoring and analyzing a user's speech pattern; matching a stored undesirable phrase and/or word with the user's speech pattern; and providing feedback to the user when a match is found between the stored undesirable phrase and/or word and the user's speech pattern. A system for monitoring spoken language includes a computer infrastructure being operable to: detect a user's speech pattern; compare the user's speech pattern with stored undesirable words and/or phrases; and provide a notification type to a user that the user's speech pattern matches with at least one of the stored words and/or phrases. | 04-02-2009 |
20090089058 | Part-of-speech tagging using latent analogy - Methods and apparatuses to assign part-of-speech tags to words are described. An input sequence of words is received. A global fabric of a corpus having training sequences of words may be analyzed in a vector space. Global semantic information associated with the input sequence of words may be extracted based on the analyzing. A part-of-speech tag may be assigned to a word of the input sequence based on POS tags from pertinent words in relevant training sequences identified using the global semantic information. The input sequence may be mapped into a vector space. A neighborhood associated with the input sequence may be formed in the vector space wherein the neighborhood represents one or more training sequences that are globally relevant to the input sequence. | 04-02-2009 |
20090094031 | Method, Apparatus and Computer Program Product for Providing Text Independent Voice Conversion - An apparatus for providing text independent voice conversion may include a first voice conversion model and a second voice conversion model. The first voice conversion model may be trained with respect to conversion of training source speech to synthetic speech corresponding to the training source speech. The second voice conversion model may be trained with respect to conversion to training target speech from synthetic speech corresponding to the training target speech. An output of the first voice conversion model may be communicated to the second voice conversion model to process source speech input into the first voice conversion model into target speech corresponding to the source speech as the output of the second voice conversion model. | 04-09-2009 |
20090094032 | Systems and methods of performing speech recognition using sensory inputs of human position - Embodiments of the present invention improve methods of performing speech recognition using sensory inputs of human position. In one embodiment, the present invention includes a speech recognition method comprising sensing a change in position of at least one part of a human body, selecting a recognition set based on the change of position, receiving a speech input signal, and recognizing the speech input signal in the context of the selected recognition set. | 04-09-2009 |
20090094033 | Systems and methods of performing speech recognition using historical information - Embodiments of the present invention improve speech recognition using historical information. In one embodiment, the present invention includes a method of performing speech recognition comprising receiving an identifier specifying a user of a kiosk, retrieving history information about the user using the identifier, receiving speech input, recognizing said speech input in the context of a first recognition set, resulting in first recognition results, and modifying the first recognition results using the history information. | 04-09-2009 |
20090106026 | Speech recognition method, device, and computer program - A speech recognition method including, for a spoken expression: a) providing a vocabulary of words including predetermined subsets of words, b) assigning to each word of at least one subset an individual score as a function of the value of a criterion of the acoustic resemblance of that word to a portion of the spoken expression, c) for a plurality of subsets, assigning to each subset of the plurality of subsets a composite score corresponding to a sum of the individual scores of the words of said subset, d) determining at least one preferred subset having the highest composite score. | 04-23-2009 |
20090112593 | SYSTEM FOR RECOGNIZING SPEECH FOR SEARCHING A DATABASE - A system is provided for recognizing speech for searching a database. The system receives speech input as a spoken search request and then processes the speech input in a speech recognition step using a vocabulary for recognizing the spoken request. By processing the speech input, words recognized in the speech input and included in the vocabulary are obtained to form at least one hypothesis. The hypothesis is then utilized to search a database using the at least one hypothesis as a search query. A search result is then received from the database and provided to the user. | 04-30-2009 |
20090119107 | SPEECH RECOGNITION BASED ON SYMBOLIC REPRESENTATION OF A TARGET SENTENCE - Systems and methods for processing a user speech input to determine whether the user has correctly read a target sentence string are provided. One disclosed method may include receiving a sentence array including component words of the target sentence string and processing the sentence array to generate a symbolic representation of the target sentence string. The symbolic representation may include a subset of words selected from the component words of the target sentence string, having fewer words than the sentence array. The method may include processing user speech input to recognize in the user speech input each of the words in the subset of words in the symbolic representation of the target sentence string. The method may further include, upon recognizing the subset of words, making a determination that the user has correctly read the target sentence string. | 05-07-2009 |
20090132250 | ROBOT APPARATUS WITH VOCAL INTERACTIVE FUNCTION AND METHOD THEREFOR - The present invention provides a robot apparatus with a vocal interactive function. The robot apparatus | 05-21-2009 |
20090138264 | SPEECH TO DTMF GENERATION - A method of speech to DTMF generation involving ASR-enabled and DTMF-controlled communications systems. The ASR-enabled system is used to recognize speech received from the DTMF-controlled telecommunications system using sampling rate independent speech recognition. It then identifies a speech segment contained in the speech received from the DTMF-controlled system that corresponds with at least one keyword associated with user-defined data. Then, the ASR-enabled system transmits at least one DTMF signal to the DTMF-controlled system in response to the identified speech segment. This allows a user of an ASR-enabled system such as a vehicle telematics unit to at least partially automate access to the DTMF-controlled system using the telematics unit, so that voice mailbox numbers, passwords, and the like normally entered via a telephone keypad can be automatically sent to the DTMF-controlled system from the telematics unit without having to be manually input each time by the user. | 05-28-2009 |
20090138265 | Joint Discriminative Training of Multiple Speech Recognizers - Adjusting model parameters is described for a speech recognition system that combines recognition outputs from multiple speech recognition processes. Discriminative adjustments are made to model parameters of at least one acoustic model based on a joint discriminative criterion over multiple complementary acoustic models to lower recognition word error rate in the system. | 05-28-2009 |
20090164216 | IN-VEHICLE CIRCUMSTANTIAL SPEECH RECOGNITION - A method of circumstantial speech recognition in a vehicle. A plurality of parameters associated with a plurality of vehicle functions are monitored as an indication of current vehicle circumstances. At least one vehicle function is identified as a candidate for user-intended ASR control based on user interaction with the vehicle. The identified vehicle function is then used to disambiguate between potential commands contained in speech received from the user. | 06-25-2009 |
20090171662 | Robust Information Extraction from Utterances - The performance of traditional speech recognition systems (as applied to information extraction or translation) decreases significantly with larger domain size and scarce training data, as well as under noisy environmental conditions. This invention mitigates these problems through the introduction of a novel predictive feature extraction method which combines linguistic and statistical information for representation of information embedded in a noisy source language. The predictive features are combined with text classifiers to map the noisy text to one of the semantically or functionally similar groups. The features used by the classifier can be syntactic, semantic, and statistical. | 07-02-2009 |
20090177471 | MODEL DEVELOPMENT AUTHORING, GENERATION AND EXECUTION BASED ON DATA AND PROCESSOR DEPENDENCIES - A recognition (e.g., speech, handwriting, etc.) model build process that is declarative and data-dependence-based. Process steps are defined in a declarative language as individual processors having input/output data relationships and data dependencies of predecessors and subsequent process steps. A compiler is utilized to generate the model building sequence. The compiler uses the input data and output data files of each model build processor to determine the sequence of model building and automatically orders the processing steps based on the declared input/output relationship (the user does not need to determine the order of execution). The compiler also automatically detects ill-defined processes, including cyclic definition and data being produced by more than one action. The user can add, change and/or modify a process by editing a declaration file and rerunning the compiler, whereby a new process is automatically generated. | 07-09-2009 |
20090210228 | System for Dynamic Management of Customer Direction During Live Interaction - A system for customer interaction includes a telephony-enabled device for receiving voice calls from customers, a voice recognition engine connected to the telephony-enabled device for monitoring the voice channel, and an application server connected to the voice recognition engine for receiving notification when specific keywords, phrases, or tones are detected. The system is characterized in that the application server selects scripts for presentation to the customer based at least in part on the notifications received from the voice recognition engine. | 08-20-2009 |
20090210229 | Processing Received Voice Messages - A voice message processing system shortens received voice messages to reduce the time a user must spend in reviewing the user's voice messages. In some embodiments, a data file associated with a caller is created and updated with words and associated audio files that may be used to replace longer words or phrases in future voice messages from the caller. A user may manually configure preferences to aggressively shorten messages in some embodiments. A speech synthesizer may be employed to replace text in messages when sufficient audio files are not stored to provide sufficient processing of messages. An audible indicator may be played with a revised message to allow a user to play back at least a portion of the original, received message without the substituted portions. Such systems provide a user the opportunity to review messages in a reduced time. | 08-20-2009 |
20090216533 | STORED PHRASE REUTILIZATION WHEN TESTING SPEECH RECOGNITION - A set of audio phrases and corresponding phrase characteristics can be maintained, such as in a database. The phrase characteristics can include a translation of speech in the associated audio phrase. A finite state grammar that includes a set of textual phrases can be received. A software algorithm can execute to compare the set of textual phrases against the translations associated with the maintained audio phrases. A result of the software algorithm execution can be produced, where the result indicates phrase coverage for the finite state grammar based upon the audio phrases. | 08-27-2009 |
20090216534 | VOICE-ACTIVATED EMERGENCY MEDICAL SERVICES COMMUNICATION AND DOCUMENTATION SYSTEM - A method of documenting information as well as a documentation and communication system for documenting information with a wearable computing device of the type that includes a processing unit and a touchscreen display is provided. The method includes displaying at least one screen on the touchscreen display. A field on the screen in which to enter data is selected and speech input from a user is received. The speech input is converted to machine readable input and the machine readable input is displayed in the field on the at least one screen. | 08-27-2009 |
20090248415 | USE OF METADATA TO POST PROCESS SPEECH RECOGNITION OUTPUT - A method of utilizing metadata stored in a computer-readable medium to assist in the conversion of an audio stream to a text stream. The method compares personally identifiable data, such as a user's electronic address book and/or Caller/Recipient ID information (in the case of processing voice mail to text), to the n-best results generated by a speech recognition engine for each word that is output by the engine. A goal of this comparison is to correct a possible misrecognition of a spoken proper noun, such as a name or company, with its proper textual form, or of a spoken phone number with a correctly formatted phone number in Arabic numerals, to improve the overall accuracy of the output of the voice recognition system. | 10-01-2009 |
20090254344 | ACTIVE LABELING FOR SPOKEN LANGUAGE UNDERSTANDING - A spoken language understanding method and system are provided. The method includes classifying a set of labeled candidate utterances based on a previously trained classifier, generating classification types for each candidate utterance, receiving confidence scores for the classification types from the trained classifier, sorting the classified utterances based on an analysis of the confidence score of each candidate utterance compared to a respective label of the candidate utterance, and rechecking candidate utterances according to the analysis. The system includes modules configured to control a processor in the system to perform the steps of the method. | 10-08-2009 |
20090265171 | SEGMENTING WORDS USING SCALED PROBABILITIES - Systems, methods, and apparatuses including computer program products for segmenting words using scaled probabilities. In one implementation, a method is provided. The method includes receiving a probability of an n-gram identifying a word, determining a number of atomic units in the corresponding n-gram, identifying a scaling weight depending on the number of atomic units in the n-gram, and applying the scaling weight to the probability of the n-gram identifying a word to determine a scaled probability of the n-gram identifying a word. | 10-22-2009 |
20090271199 | Records Disambiguation In A Multimodal Application Operating On A Multimodal Device - Methods, apparatus, and products are disclosed for record disambiguation in a multimodal application operating on a multimodal device, the multimodal device supporting multiple modes of interaction including at least a voice mode and a visual mode, that include: prompting, by the multimodal application, a user to identify a particular record among a plurality of records; receiving, by the multimodal application in response to the prompt, a voice utterance from the user; determining, by the multimodal application, that the voice utterance ambiguously identifies more than one of the plurality of records; generating, by the multimodal application, a user interaction to disambiguate the records ambiguously identified by the voice utterance in dependence upon record attributes of the records ambiguously identified by the voice utterance; and selecting, by the multimodal application for further processing, one of the records ambiguously identified by the voice utterance in dependence upon the user interaction. | 10-29-2009 |
20090276219 | VOICE INPUT SYSTEM AND VOICE INPUT METHOD - In the present invention, a voice input system and a voice input method are provided. The voice input method includes the steps of: (A) initiating a speech recognition process by a first input associated with a first parameter of a first speech recognition subject; (B) providing a voice and a searching space constructed by a speech recognition model associated with the first speech recognition subject; (C) obtaining a sub-searching space from the searching space based on the first parameter; (D) searching at least one candidate item associated with the voice from the sub-searching space; and (E) showing the at least one candidate item. | 11-05-2009 |
20090276220 | MEASURING DOUBLE TALK PERFORMANCE - A system evaluates a hands free communication system. The system automatically selects a consonant-vowel-consonant (CVC), vowel-consonant-vowel (VCV), or other combination of sounds from an intelligent database. The selection is transmitted with another communication stream that temporally overlaps the selection. The quality of the communication system is evaluated through an automatic speech recognition engine. The evaluation occurs at a location remote from the transmitted selection. | 11-05-2009 |
20090292540 | SYSTEM AND METHOD FOR EXCERPT CREATION - A method including displaying content on a display of a device, receiving a speech input designating a segment of the content to be excerpted and transferring the excerpted content to a predetermined location for storage and retrieval. | 11-26-2009 |
20090292541 | METHODS AND APPARATUS FOR ENHANCING SPEECH ANALYTICS - Methods and apparatus for the enhancement of speech to text engines, by providing indications of the correctness of the found words, based on additional sources besides the internal indication provided by the STT engine. The enhanced indications comprise sources of data such as acoustic features, CTI features, phonetic search and others. The apparatus and methods also enable the detection of important or significant keywords found in audio files, thus enabling more efficient usages, such as further processing or transfer of interactions to relevant agents, escalation of issues, or the like. The methods and apparatus employ a training phase in which a word model and a key phrase model are generated for determining an enhanced correctness indication for a word and an enhanced importance indication for a key phrase, based on the additional features. | 11-26-2009 |
20090306983 | USER ACCESS AND UPDATE OF PERSONAL HEALTH RECORDS IN A COMPUTERIZED HEALTH DATA STORE VIA VOICE INPUTS - Systems and methods for enabling user access and update of personal health records stored in a health data store via voice inputs are provided. The system may include a computer program having a recognizer module configured to process structured word data of a user voice input received from a voice platform, to produce a set of tagged structured word data based on a healthcare-specific glossary. The computer program may further include a health data store interface configured to apply a rule set to the tagged structured word data to produce a query to the health data store and receive a response from the health data store based on the query, and a grammar generator configured to generate a reply sentence based on the response received from the health data store and pass the reply sentence to the voice platform to be played as a voice reply to the user. | 12-10-2009 |
20090319272 | METHOD AND SYSTEM FOR VOICE ORDERING UTILIZING PRODUCT INFORMATION - A method for voice ordering utilizing catalog taxonomies and hierarchical categorization relationships in product information management (PIM) systems includes: prompting a user with a query to input speech into a speech recognition engine; translating the inputted speech into a series of words; querying a product information management component (PIM) based on the series of words; wherein the querying is performed as a matching algorithm against PIM category and attribute keywords; returning coded results to a voice synthesizer to produce at least one of: a voice response, and a text response to the user; and wherein the coded results indicate one or more of: a not found message for zero matches, a confirmation of a suitable single match, and a request for additional information in the event that more than one matching item, category, or item attribute was found in the PIM. | 12-24-2009 |
20100017210 | SYSTEM AND METHOD FOR SEARCHING STORED AUDIO DATA BASED ON A SEARCH PATTERN - A system for searching stored audio data is described. The system includes a memory configured to store audio data received from a radio receiver and a processing circuit. The processing circuit is configured to receive a search pattern, search the stored audio data for the search pattern, and provide audio data based on the search. | 01-21-2010 |
20100023330 | SPEED PODCASTING - Embodiments of the present invention address deficiencies of the art in respect to podcasting and provide a method, system and computer program product for speed podcasting. In an embodiment of the invention, a speed podcasting method can include speech recognizing an audio portion of a podcast, parsing the speech recognized audio portion to identify essential words, and playing back only audio segments and corresponding video segments of the podcast including the essential words while excluding from playback audio segments and corresponding video segments of the podcast including non-essential words. | 01-28-2010 |
20100036665 | GENERATING SPEECH-ENABLED USER INTERFACES - Methods, systems, and apparatus, including computer program products, for automatically creating a speech-based user interface involve identifying a software service definition that includes service inputs, service outputs, and context data and accessing a standard user interface incorporating the service inputs and outputs. The standard user interface defines a set of valid inputs for the service input and a set of available outputs, at least one of which is based on the context data. Audio data is associated with at least some of the inputs in the set of valid inputs to define a set of valid speech inputs. A speech-based user interface is automatically generated from the standard user interface and the set of valid speech inputs. | 02-11-2010 |
20100036666 | METHOD AND SYSTEM FOR PROVIDING META DATA FOR A WORK - A method for providing meta data for a work includes designating a file for uploading data associated therewith to a telematics unit operatively connected to a vehicle; and using meta data associated with the designated file, obtaining phonetic meta data for the designated file from an on-line service. The method further includes creating a phonetic meta data file associated with the designated file and including the obtained phonetic meta data, and transferring the phonetic meta data file to the telematics unit. Also disclosed herein is a system for providing the same. | 02-11-2010 |
20100049516 | METHOD OF USING MICROPHONE CHARACTERISTICS TO OPTIMIZE SPEECH RECOGNITION PERFORMANCE - A system and method for tuning a speech recognition engine to an individual microphone using a database containing acoustical models for a plurality of microphones. Microphone performance characteristics are obtained from a microphone at a speech recognition engine, the database is searched for an acoustical model that matches the characteristics, and the speech recognition engine is then modified based on the matching acoustical model. | 02-25-2010 |
20100049517 | AUTOMATIC ANSWERING DEVICE, AUTOMATIC ANSWERING SYSTEM, CONVERSATION SCENARIO EDITING DEVICE, CONVERSATION SERVER, AND AUTOMATIC ANSWERING METHOD - An automatic answering device and an automatic answering method for automatically answering a user utterance are configured: to prepare a conversation scenario that is a set of input sentences and reply sentences, the input sentences each corresponding to a user utterance assumed to be uttered by a user, the reply sentences each being an automatic reply to the inputted sentence; to accept a user utterance; to determine the reply sentence to the accepted user utterance on the basis of the conversation scenario; and to present the determined reply sentence to the user. Data of the conversation scenario have a data structure that enables the inputted sentences and the reply sentences to be expressed in a state transition diagram in which each of the inputted sentences is defined as a morphism and the reply sentence corresponding to the inputted sentence is defined as an object. | 02-25-2010 |
20100063818 | MULTI-TIERED VOICE FEEDBACK IN AN ELECTRONIC DEVICE - This invention is directed to providing voice feedback to a user of an electronic device. Because each electronic device display may include several speakable elements (i.e., elements for which voice feedback is provided), the elements may be ordered. To do so, the electronic device may associate a tier with the display of each speakable element. The electronic device may then provide voice feedback for displayed speakable elements based on the associated tier. To reduce the complexity in designing the voice feedback system, the voice feedback features may be integrated in a Model View Controller (MVC) design used for displaying content to a user. For example, the model and view of the MVC design may include additional variables associated with speakable properties. The electronic device may receive audio files for each speakable element using any suitable approach, including for example by providing a host device with a list of speakable elements and directing a text to speech engine of the host device to generate and provide the audio files. | 03-11-2010 |
20100063819 | LANGUAGE MODEL LEARNING SYSTEM, LANGUAGE MODEL LEARNING METHOD, AND LANGUAGE MODEL LEARNING PROGRAM - A language model learning system for learning a language model on an identifiable basis relating to a word error rate used in speech recognition. The language model learning system ( | 03-11-2010 |
20100063820 | CORRELATING VIDEO IMAGES OF LIP MOVEMENTS WITH AUDIO SIGNALS TO IMPROVE SPEECH RECOGNITION - A speech recognition device can include an audio signal receiver configured to receive audio signals from a speech source, a video signal receiver configured to receive video signals from the speech source, and a processing unit configured to process the audio signals and the video signals. In addition, the speech recognition device can include a conversion unit configured to convert the audio signals and the video signals to recognizable speech, and an implementation unit configured to implement a task based on the recognizable speech. | 03-11-2010 |
20100076764 | METHOD OF DIALING PHONE NUMBERS USING AN IN-VEHICLE SPEECH RECOGNITION SYSTEM - A method of dialing phone numbers using an in-vehicle speech recognition system includes receiving speech input at a vehicle, separating the speech input into a word segment and a digit segment, identifying the letters in the word segment, converting the letters in the word segment to digits, and operating an alphanumeric keypad based on the digit segment and the converted word segment. | 03-25-2010 |
20100088097 | USER FRIENDLY SPEAKER ADAPTATION FOR SPEECH RECOGNITION - Improved performance and user experience are provided for a speech recognition application and system by utilizing, for example, offline adaptation without tedious effort by a user. Interactions with a user may be in the form of a quiz, game, or other scenario wherein the user may implicitly provide vocal input for adaptation data. Queries with a plurality of candidate answers may be designed in an optimal and efficient way, and presented to the user, wherein detected speech from the user is then matched to one of the candidate answers, and may be used to adapt an acoustic model to the particular speaker for speech recognition. | 04-08-2010 |
20100100381 | System and Method for Automatic Verification of the Understandability of Speech - The present invention relates to a system and method for automatically verifying that a message received from a user is intelligible. In an exemplary embodiment, a message is received from the user. A speech level of the user's message may be measured and compared to a pre-determined speech level threshold to determine whether the measured speech level is below the pre-determined speech level threshold. A signal-to-noise ratio of the user's message may be measured and compared to a pre-determined signal-to-noise ratio threshold to determine whether the measured signal-to-noise ratio of the message is below the pre-determined signal-to-noise ratio threshold. An estimate of intelligibility for the user's message may be calculated and compared to an intelligibility threshold to determine whether the calculated estimate of intelligibility is below the intelligibility threshold. If any of the measured speech level, measured signal-to-noise ratio and calculated estimate of intelligibility of the user's message are determined to be below their respective thresholds, the user may be prompted to repeat at least a portion of the message. | 04-22-2010 |
20100106505 | USING WORD CONFIDENCE SCORE, INSERTION AND SUBSTITUTION THRESHOLDS FOR SELECTED WORDS IN SPEECH RECOGNITION - A method and system for improving the accuracy of a speech recognition system using word confidence score (WCS) processing is introduced. Parameters in a decoder are selected to minimize a weighted total error rate, such that deletion errors are weighted more heavily than substitution and insertion errors. The occurrence distribution in WCS is different depending on whether the word was correctly identified and based on the type of error. This is used to determine thresholds in WCS for insertion and substitution errors. By processing the hypothetical word (HYP) (output of the decoder), a mHYP (modified HYP) is determined. In some circumstances, depending on the WCS's value in relation to insertion and substitution threshold values, mHYP is set equal to: null, a substituted HYP, or HYP. | 04-29-2010 |
20100114574 | RETRIEVAL USING A GENERALIZED SENTENCE COLLOCATION - A method and system for identifying documents relevant to a query that specifies a part of speech is provided. A retrieval system receives from a user an input query that includes a word and a part of speech. Upon receiving an input query that includes a word and a part of speech, the retrieval system identifies documents with a sentence that includes that word collocated with a word that is used as that part of speech. The retrieval system displays to the user an indication of the identified documents. | 05-06-2010 |
20100114575 | System and Method for Extracting a Specific Situation From a Conversation - A system, method, and computer readable article of manufacture for extracting a specific situation in a conversation. The system includes: an acquisition unit for acquiring speech voice data of speakers in the conversation; a specific expression detection unit for detecting the speech voice data of a specific expression from speech voice data of a specific speaker in the conversation; and a specific situation extraction unit for extracting, from the speech voice data of the speakers in the conversation, a portion of the speech voice data that forms a speech pattern that includes the speech voice data of the specific expression detected by the specific expression detection unit. | 05-06-2010 |
20100125456 | System and Method for Recognizing Proper Names in Dialog Systems - Embodiments of a dialog system that utilizes contextual information to perform recognition of proper names are described. Unlike existing name recognition methods for large name lists, which generally focus strictly on the static aspect of the names, embodiments of the present system take into account the temporal, recency, and context effects present when names are used, and formulate new questions to further constrain the search space or grammar for recognition of past and current utterances. | 05-20-2010 |
20100153111 | INPUT DEVICE AND INPUT METHOD FOR MOBILE BODY - Provided is an input device for a mobile body that allows a safe input operation when operating equipment such as a car, regardless of whether the mobile body is traveling or stopped. The input device includes: an input section ( | 06-17-2010 |
20100161333 | ADAPTIVE PERSONAL NAME GRAMMARS - In one embodiment, an adaptive personal name grammar improves speech recognition by limiting or weighting the scope of potential addressable names based upon meta-information relative to the communications patterns, environmental considerations, or sociological/professional hierarchy of a user to increase the likelihood of a positive match. | 06-24-2010 |
20100161334 | UTTERANCE VERIFICATION METHOD AND APPARATUS FOR ISOLATED WORD N-BEST RECOGNITION RESULT - An utterance verification method for an isolated word N-best speech recognition result includes: calculating log likelihoods of a context-dependent phoneme and an anti-phoneme model based on an N-best speech recognition result for an input utterance; measuring a confidence score of an N-best speech-recognized word using the log likelihoods; calculating distance between phonemes for the N-best speech-recognized word; comparing the confidence score with a threshold and the distance with a predetermined mean of distances; and accepting the N-best speech-recognized word when the compared results for the confidence score and the distance correspond to acceptance. | 06-24-2010 |
20100169095 | DATA PROCESSING APPARATUS, DATA PROCESSING METHOD, AND PROGRAM - A data processing apparatus includes a speech recognition unit configured to perform continuous speech recognition on speech data, a related word acquiring unit configured to acquire a word related to at least one word obtained through the continuous speech recognition as a related word that is related to content corresponding to content data including the speech data, and a speech retrieval unit configured to retrieve an utterance of the related word from the speech data so as to acquire the related word whose utterance has been retrieved as metadata for the content. | 07-01-2010 |
20100185445 | MACHINE, SYSTEM AND METHOD FOR USER-GUIDED TEACHING AND MODIFYING OF VOICE COMMANDS AND ACTIONS EXECUTED BY A CONVERSATIONAL LEARNING SYSTEM - A machine, system and method for user-guided teaching and modifications of voice commands and actions to be executed by a conversational learning system. The machine includes a system bus for communicating data and control signals received from the conversational learning system to a computer system, a vehicle data and control bus for connecting devices and sensors in the machine, a bridge module for connecting the vehicle data and control bus to the system bus, machine subsystems coupled to the vehicle data and control bus having a respective user interface for receiving a voice command or input signal from a user, a memory coupled to the system bus for storing action command sequences learned for a new voice command and a processing unit coupled to the system bus for automatically executing the action command sequences learned when the new voice command is spoken. | 07-22-2010 |
20100185446 | SPEECH RECOGNITION SYSTEM AND DATA UPDATING METHOD - Provided is a speech recognition system installed in a terminal coupled to a server via a network. The terminal holds map data including a landmark. The speech recognition system manages recognition data including a word corresponding to a name of the landmark, and sends update area information and an updated time to the server. When the recognition data of the area indicated by the update area information sent from the terminal has changed after the updated time, the server generates difference data between the latest recognition data and the recognition data of the update area as of the updated time, and sends the generated difference data and the map data of the update area to the terminal. The terminal updates its map data based on the map data sent from the server, and the speech recognition system updates the recognition data managed by the terminal based on the difference data. | 07-22-2010 |
20100217596 | WORD SPOTTING FALSE ALARM PHRASES - In one aspect, a method for processing media includes accepting a query. One or more language patterns are identified that are similar to the query. A putative instance of the query is located in the media. The putative instance is associated with a corresponding location in the media. The media in a vicinity of the putative instance is compared to the identified language patterns and data characterizing the putative instance of the query is provided according to the comparing of the media to the language patterns, for example, as a score for the putative instance that is determined according to the comparing of the media to the language patterns. | 08-26-2010 |
20100217597 | Systems and Methods for Monitoring Speech Data Labelers - Systems and methods for using an annotation guide to label utterances and speech data with a call type are disclosed. A method embodiment monitors labelers of speech data by presenting via a processor a test utterance to a labeler, receiving input from the labeler that selects a particular call type from a list of call types and determining via the processor if the labeler labeled the test utterance correctly. Based on the determining step, the method performs at least one of the following: revising the annotation guide, retraining the labeler or altering the test utterance. | 08-26-2010 |
20100228548 | TECHNIQUES FOR ENHANCED AUTOMATIC SPEECH RECOGNITION - Techniques for enhanced automatic speech recognition are described. An enhanced ASR system may be operative to generate an error correction function representing a mapping between a supervised set of parameters and an unsupervised training set of parameters generated using the same set of acoustic training data, and to apply the error correction function to an unsupervised testing set of parameters to form a corrected set of parameters used to perform speaker adaptation. Other embodiments are described and claimed. | 09-09-2010 |
20100286984 | METHOD FOR SPEECH RECOGNITION - A method for voice recognition of a spoken expression comprising a plurality of expression parts to be recognized. Partial voice recognition is performed on a first selected expression part and, depending on a selection of hits detected for the first expression part by the partial voice recognition, voice recognition of the first and further expression parts is executed. | 11-11-2010 |
20100318356 | APPLICATION OF USER-SPECIFIED TRANSFORMATIONS TO AUTOMATIC SPEECH RECOGNITION RESULTS - Textual transcription of speech is generated and formatted according to user-specified transformation and behavior requirements for a speech recognition system having input grammars and transformations. An apparatus may include a speech recognition platform configured to receive a user-specified transformation requirement, recognize speech in speech data according to a set of recognition grammars, and apply transformations to the recognized speech according to the user-specified transformation requirement. The apparatus may further be configured to receive a user-specified behavior requirement and transform the recognized speech according to the behavior requirement. Other embodiments are described and claimed. | 12-16-2010 |
20100318357 | VOICE CONTROL OF MULTIMEDIA CONTENT - Techniques are described for managing various types of content in various ways, such as based on voice commands or other voice-based control instructions provided by a user. In some situations, at least some of the content being managed includes content of a variety of types, such as music and other audio information, photos, images, non-television video information, videogames, Internet Web pages and other data, etc., which may be managed via the voice controls in a variety of ways, such as to allow a user to locate and identify content of potential interest, to schedule recordings of selected content, to manage previously recorded content (e.g., to play or delete the content), to control live television, etc. | 12-16-2010 |
20100324899 | VOICE RECOGNITION SYSTEM, VOICE RECOGNITION METHOD, AND VOICE RECOGNITION PROCESSING PROGRAM - A speech recognition system for rapidly performing recognition processing while maintaining recognition quality is provided. The speech recognition system includes a speech input device which inputs speech and displays a recognition result, and a speech recognition device which receives the speech from the speech input device, performs recognition processing, and sends the recognition result back to the speech input device. The speech input device includes a user dictionary section which stores words used for recognizing the input speech, and a reduced user dictionary creation unit which extracts words corresponding to the input speech from the user dictionary and creates a reduced user dictionary. The speech recognition device has a speech recognition unit which receives the input speech and the reduced user dictionary from the speech input device and recognizes the input speech based on the reduced user dictionary and a system dictionary provided beforehand. | 12-23-2010 |
20100332229 | APPARATUS CONTROL BASED ON VISUAL LIP SHAPE RECOGNITION - An information processing apparatus that includes an image acquisition unit to acquire a temporal sequence of frames of image data, a detecting unit to detect a lip area and a lip image from each of the frames of the image data, a recognition unit to recognize a word based on the detected lip images of the lip areas, and a controller to control an operation at the information processing apparatus based on the word recognized by the recognition unit. | 12-30-2010 |
20110004475 | METHODS AND APPARATUSES FOR AUTOMATIC SPEECH RECOGNITION - Exemplary embodiments of methods and apparatuses for automatic speech recognition are described. First model parameters associated with a first representation of an input signal are generated. The first representation of the input signal is a discrete parameter representation. Second model parameters associated with a second representation of the input signal are generated. The second representation of the input signal includes a continuous parameter representation of residuals of the input signal. The first representation of the input signal includes discrete parameters representing first portions of the input signal. The second representation includes discrete parameters representing second portions of the input signal that are smaller than the first portions. Third model parameters are generated to couple the first representation of the input signal with the second representation of the input signal. The first representation and the second representation of the input signal are mapped into a vector space. | 01-06-2011 |
20110029313 | METHODS AND SYSTEMS FOR ADAPTING A MODEL FOR A SPEECH RECOGNITION SYSTEM - Methods are disclosed for identifying possible errors made by a speech recognition system without using a transcript of words input to the system. A method for model adaptation for a speech recognition system includes determining an error rate, corresponding to either recognition of instances of a word or recognition of instances of various words, without using a transcript of words input to the system. The method may further include adjusting an adaptation of the model for the word, or various models for the various words, based on the error rate. An apparatus is also disclosed for identifying possible errors made by a speech recognition system without using a transcript of words input to the system. An apparatus for model adaptation for a speech recognition system includes a processor adapted to estimate an error rate, corresponding to either recognition of instances of a word or recognition of instances of various words, without using a transcript of words input to the system. The apparatus may further include a controller adapted to adjust an adaptation of the model for the word, or various models for the various words, based on the error rate. | 02-03-2011 |
20110040562 | WORD CLOUD AUDIO NAVIGATION - The present invention is directed generally to linking a collection of words and/or phrases with the locations in a video and/or audio stream where those words and/or phrases occur, and/or to associating a collection of words and/or phrases with a call history. | 02-17-2011 |
20110060589 | Multi-Purpose Contextual Control - A method and a system for activating functions, including a first function and a second function, wherein the system is embedded in an apparatus, are disclosed. The system includes a control configured to be activated by a plurality of activation styles, wherein the control generates a signal indicative of a particular activation style from the multiple activation styles; and a controller configured to activate either the first function or the second function based on the particular activation style, wherein the first function is configured to be executed based only on the activation style, and the second function is further configured to be executed based on a speech input. | 03-10-2011 |
20110066435 | IMAGE TRANSMITTING APPARATUS, IMAGE TRANSMITTING METHOD, AND IMAGE TRANSMITTING PROGRAM EMBODIED ON COMPUTER READABLE MEDIUM - An MFP includes an accepting portion to accept an image and a speech, a speech recognition portion to recognize the accepted speech, a display screen generating portion, in response to an event that a keyword included in a predetermined output setting is recognized by the speech recognition portion, to generate a display screen in accordance with the output setting, the display screen including at least one of the accepted image and an image of object data that is stored in advance independently from the accepted image, and a transmission control portion to transmit, in accordance with the output setting, the generated display screen to at least one of a plurality of PCs operated respectively by a plurality of users. | 03-17-2011 |
20110066436 | Speaker intent analysis system - A speaker intent analysis system and method for validating the truthfulness and intent of a plurality of participants' responses to questions. A computer stores, retrieves, and transmits a series of questions to be answered audibly by participants. The participants' answers are received by a data processor, which analyzes and records the participants' speech parameters to determine the likelihood of dishonesty. In addition to analyzing participants' speech parameters for distinguishing stress or other abnormality, the processor may be equipped with voice recognition software to screen responses that, while not dishonest, are indicative of possible malfeasance on the part of the participants. Once the responses are analyzed, the processor produces an output indicative of the participant's credibility. The output may be sent to proper parties and/or devices such as a web page, computer, e-mail, PDA, pager, database, report, etc. for appropriate action. | 03-17-2011 |
20110071833 | SPEECH RETRIEVAL APPARATUS AND SPEECH RETRIEVAL METHOD - Disclosed are a speech retrieval apparatus and a speech retrieval method for searching, in a speech database, for an audio file matching an input search term by using an acoustic model serialization code, a phonemic code, a sub-word unit, and a speech recognition result of speech. The speech retrieval apparatus comprises a first conversion device, a first division device, a first speech retrieval unit creation device, a second conversion device, a second division device, a second speech retrieval unit creation device, and a matching device. The speech retrieval method comprises a first conversion step, a first division step, a first speech retrieval unit creation step, a second conversion step, a second division step, a second speech retrieval unit creation step, and a matching step. | 03-24-2011 |
20110071834 | SYSTEM AND METHOD FOR IMPROVING TEXT INPUT IN A SHORTHAND-ON-KEYBOARD INTERFACE - A word pattern recognition system improves text input entered via a shorthand-on-keyboard interface. A core lexicon comprises commonly used words in a language; an extended lexicon comprises words not included in the core lexicon. The system only directly outputs words from the core lexicon. Candidate words from the extended lexicon can be outputted and simultaneously admitted to the core lexicon upon user selection. A concatenation module enables a user to input parts of a long word separately. A compound word module combines two common shorter words whose concatenation forms a long word. | 03-24-2011 |
20110093269 | METHOD AND SYSTEM FOR CONSIDERING INFORMATION ABOUT AN EXPECTED RESPONSE WHEN PERFORMING SPEECH RECOGNITION - A speech recognition system receives and analyzes speech input from a user in order to recognize and accept a response from the user. Under certain conditions, information about the response expected from the user may be available. In these situations, the available information about the expected response is used to modify the behavior of the speech recognition system by taking this information into account. The modified behavior of the speech recognition system comprises adjusting the rejection threshold when speech input matches the predetermined expected response. | 04-21-2011 |
20110125499 | SPEECH RECOGNITION - Systems, methods, and apparatus, including computer program products for accepting a predetermined vocabulary-dependent characterization of a set of audio signals, the predetermined characterization including an identification of putative occurrences of each of a plurality of vocabulary items in the set of audio signals, the plurality of vocabulary items included in the vocabulary; accepting a new vocabulary item not included in the vocabulary; accepting putative occurrences of the new vocabulary item in the set of audio signals; and generating, by an analysis engine of a speech processing system, an augmented characterization of the set of audio signals based on the identified putative occurrences of the new vocabulary item. | 05-26-2011 |
20110125500 | AUTOMATED DISTORTION CLASSIFICATION - A method of and system for automated distortion classification. The method includes steps of (a) receiving audio including a user speech signal and at least some distortion associated with the signal; (b) pre-processing the received audio to generate acoustic feature vectors; (c) decoding the generated acoustic feature vectors to produce a plurality of hypotheses for the distortion; and (d) post-processing the plurality of hypotheses to identify at least one distortion hypothesis of the plurality of hypotheses as the received distortion. The system can include one or more distortion models including distortion-related acoustic features representative of various types of distortion and used by a decoder to compare the acoustic feature vectors with the distortion-related acoustic features to produce the plurality of hypotheses for the distortion. | 05-26-2011 |
20110125501 | Method and device for automatic recognition of given keywords and/or terms within voice data - The present invention relates to a method of and a device ( | 05-26-2011 |
20110131046 | FEATURES FOR UTILIZATION IN SPEECH RECOGNITION - A computer-implemented speech recognition system described herein includes a receiver component that receives a plurality of detected units of an audio signal, wherein the audio signal comprises a speech utterance of an individual. A selector component selects a subset of the plurality of detected units that correspond to a particular time-span. A generator component generates at least one feature with respect to the particular time-span, wherein the at least one feature is one of an existence feature, an expectation feature, or an edit distance feature. Additionally, a statistical speech recognition model outputs at least one word that corresponds to the particular time-span based at least in part upon the at least one feature generated by the generator component. | 06-02-2011 |
20110144994 | Automatic Sound Level Control - In one or more embodiments, one or more methods and/or systems described can perform determining two or more words of a written language from first data, determining at least one of a noise level external to a mobile device and a location of the mobile device, determining a sound output level based on the at least one of the noise level external to the mobile device and the location of the mobile device, and generating sound data based on the two or more words of the written language and the sound output level. The first data can include, for example, portable document format data that can include first text and/or an image that can include second text. In one or more embodiments, the location can be determined by using at least one of a global positioning system receiver and a location of an access point communicating with the mobile device. | 06-16-2011 |
20110144995 | SYSTEM AND METHOD FOR TIGHTLY COUPLING AUTOMATIC SPEECH RECOGNITION AND SEARCH - Disclosed herein are systems, methods, and computer-readable storage media for performing a search. A system configured to practice the method first receives from an automatic speech recognition (ASR) system a word lattice based on a speech query and receives indexed documents from an information repository. The system composes, based on the word lattice and the indexed documents, at least one triple including a query word, a selected indexed document, and a weight. The system generates an N-best path through the word lattice based on the at least one triple and re-ranks the ASR output based on the N-best path. The system aggregates each weight across the query words to generate N-best listings and returns search results to the speech query based on the re-ranked ASR output and the N-best listings. The lattice can be a confusion network, the arc density of which can be adjusted for a desired performance level. | 06-16-2011 |
20110144996 | ANALYZING AND PROCESSING A VERBAL EXPRESSION CONTAINING MULTIPLE GOALS - Disclosed is a method for parsing a verbal expression received from a user to determine whether or not the expression contains a multiple-goal command. Specifically, known techniques are applied to extract terms from the verbal expression. The extracted terms are assigned to categories. If two or more terms are found in the parsed verbal expression that are in associated categories and that do not overlap one another temporally, then the confidence levels of these terms are compared. If the confidence levels are similar, then the terms may be parallel entries in the verbal expression and may represent multiple goals. If a multiple-goal command is found, then the command is either presented to the user for review and possible editing or is executed. If the parsed multiple-goal command is presented to the user for review, then the presentation can be made via any appropriate interface including voice and text interfaces. | 06-16-2011 |
20110153328 | OBSCENE CONTENT ANALYSIS APPARATUS AND METHOD BASED ON AUDIO DATA ANALYSIS - Provided is an obscene content analysis apparatus and method. The obscene content analysis apparatus includes a content input unit that receives content; an input data buffering unit that buffers the received content, wherein buffering is performed on content corresponding to at least the length of a previously set analysis section; an obscenity analysis determining unit that determines whether the analysis section of audio data extracted from the buffered content is obscene by using a previously generated audio-based obscenity determining model, and marks the analysis section with an obscenity mark when it is determined to be obscene; a reproduction data buffering unit that accumulates and stores content in which obscenity has been determined by the obscenity analysis determining unit; and a content reproducing unit that reproduces the content while blocking any analysis section marked with the obscenity mark. | 06-23-2011 |
20110161082 | METHODS AND SYSTEMS FOR ASSESSING AND IMPROVING THE PERFORMANCE OF A SPEECH RECOGNITION SYSTEM - A method for assessing a performance of a speech recognition system may include determining a grade, corresponding to either recognition of instances of a word or recognition of instances of various words among a set of words, wherein the grade indicates a level of the performance of the system and the grade is based on a recognition rate and at least one recognition factor. An apparatus for assessing a performance of a speech recognition system may include a processor that determines a grade, corresponding to either recognition of instances of a word or recognition of instances of various words among a set of words, wherein the grade indicates a level of the performance of the system and wherein the grade is based on a recognition rate and at least one recognition factor. | 06-30-2011 |
20110161083 | METHODS AND SYSTEMS FOR ASSESSING AND IMPROVING THE PERFORMANCE OF A SPEECH RECOGNITION SYSTEM - A method for assessing a performance of a speech recognition system may include determining a grade, corresponding to either recognition of instances of a word or recognition of instances of various words among a set of words, wherein the grade indicates a level of the performance of the system and the grade is based on a recognition rate and at least one recognition factor. An apparatus for assessing a performance of a speech recognition system may include a processor that determines a grade, corresponding to either recognition of instances of a word or recognition of instances of various words among a set of words, wherein the grade indicates a level of the performance of the system and wherein the grade is based on a recognition rate and at least one recognition factor. | 06-30-2011 |
20110178802 | APPARATUS FOR CLASSIFYING OR DISAMBIGUATING DATA - A computing system has a data storage device ( | 07-21-2011 |
20110191105 | Systems and Methods for Word Offensiveness Detection and Processing Using Weighted Dictionaries and Normalization - Computer-implemented systems and methods are provided for identifying language that would be considered obscene or otherwise offensive to a user or proprietor of a system. A plurality of offensive words are received, where each offensive word is associated with a severity score identifying the offensiveness of that word. A string of words is received. A distance between a candidate word and each offensive word in the plurality of offensive words is calculated, and a plurality of offensiveness scores for the candidate word are calculated, each offensiveness score based on the calculated distance between the candidate word and the offensive word and the severity score of the offensive word. A determination is made as to whether the candidate word is an offender word, where the candidate word is deemed to be an offender word when the highest offensiveness score in the plurality of offensiveness scores exceeds an offensiveness threshold value. | 08-04-2011 |
20110191106 | WORD RECOGNITION SYSTEM AND METHOD FOR CUSTOMER AND EMPLOYEE ASSESSMENT - One-to-many comparisons of callers' words and/or voice prints with known words and/or voice prints to identify any substantial matches between them. When a customer communicates with a particular entity, such as a customer service center, the system makes a recording of the real-time call including both the customer's and agent's voices. The system segments the recording to extract different words, such as words of anger. The system may also segment at least a portion of the customer's voice to create a tone profile, and it formats the segmented words and tone profiles for network transmission to a server. The server compares the customer's words and/or tone profiles with multiple known words and/or tone profiles stored on a database to determine any substantial matches. The identification of any matches may be used for a variety of purposes, such as providing representative feedback or customer follow-up. | 08-04-2011 |
20110196678 | SPEECH RECOGNITION APPARATUS AND SPEECH RECOGNITION METHOD - A distance calculation unit ( | 08-11-2011 |
20110202342 | MULTI-MODAL WEB INTERACTION OVER WIRELESS NETWORK - A system, apparatus, and method are disclosed for receiving user input at a client device, interpreting the user input to identify a selection of at least one of a plurality of web interaction modes, producing a corresponding client request based in part on the user input and the web interaction mode, and sending the client request to a server via a network. | 08-18-2011 |
20110208526 | METHOD FOR VARIABLE RESOLUTION AND ERROR CONTROL IN SPOKEN LANGUAGE UNDERSTANDING - A method for variable resolution and error control in spoken language understanding (SLU) allows arranging the categories of the SLU into a hierarchy of different levels of specificity. The pre-determined hierarchy is used to identify different types of errors, such as high-cost errors and low-cost errors, and to trade, if necessary, high-cost errors for low-cost errors. | 08-25-2011 |
20110218805 | SPOKEN TERM DETECTION APPARATUS, METHOD, PROGRAM, AND STORAGE MEDIUM - A spoken term detection apparatus includes a processor; processing performed by the processor includes: a feature extraction process extracting an acoustic feature from speech data accumulated in an accumulation part and storing the extracted acoustic feature in an acoustic feature storage part; a first calculation process calculating a standard score from a similarity between an acoustic feature stored in the acoustic feature storage part and an acoustic model stored in an acoustic model storage part, and storing the standard score in a standard score storage part; a second calculation process comparing an acoustic model corresponding to an input keyword with the acoustic feature stored in the acoustic feature storage part to calculate a score of the keyword; and a retrieval process retrieving speech data including the keyword from the speech data accumulated in the accumulation part based on the score of the keyword calculated by the second calculation process and the standard score stored in the standard score storage part. | 09-08-2011 |
20110218806 | DETERMINING TEXT TO SPEECH PRONUNCIATION BASED ON AN UTTERANCE FROM A USER - Systems and methods are provided for automatically building a native phonetic lexicon for a speech-based application trained to process a native (base) language, wherein the native phonetic lexicon includes native phonetic transcriptions (base forms) for non-native (foreign) words which are automatically derived from non-native phonetic transcriptions of the non-native words. | 09-08-2011 |
20110218807 | Method for Automated Sentence Planning in a Task Classification System - The invention relates to a method for sentence planning ( | 09-08-2011 |
20110238419 | BINAURAL METHOD AND BINAURAL CONFIGURATION FOR VOICE CONTROL OF HEARING DEVICES - A binaural configuration and an associated method utilize first and second hearing devices for voice control of the hearing devices by voice commands. The configuration contains a first voice recognition module in the first hearing device and a second voice recognition module in the second hearing device. The second voice recognition module uses information data from the first voice recognition module to recognize the voice commands. An advantage is that the rate of erroneously recognized voice commands (“false alarms”) is reduced. | 09-29-2011 |
20110270612 | Computer-Implemented Systems and Methods for Estimating Word Accuracy for Automatic Speech Recognition - Systems and methods are provided for scoring non-native, spontaneous speech. A spontaneous speech sample is received, where the sample is of spontaneous speech spoken by a non-native speaker. Automatic speech recognition is performed on the sample using an automatic speech recognition system to generate a transcript of the sample, where a speech recognizer metric is determined by the automatic speech recognition system. A word accuracy rate estimate is determined for the transcript of the sample generated by the automatic speech recognition system based on the speech recognizer metric. The spontaneous speech sample is scored using a preferred scoring model when the word accuracy rate estimate satisfies a threshold, and the spontaneous speech sample is scored using an alternate scoring model when the word accuracy rate estimate fails to satisfy the threshold. | 11-03-2011 |
20110288867 | NAMETAG CONFUSABILITY DETERMINATION - A method of and system for managing nametags including receiving a command from a user to store a nametag, prompting the user to input a number to be stored in association with the nametag, receiving an input for the number from the user, prompting the user to input the nametag to be stored in association with the number, receiving an input for the nametag from the user, processing the nametag input, and calculating confusability of the nametag input in multiple individual domains including a nametag domain, a number domain, and a command domain. | 11-24-2011 |
20110288868 | DISAMBIGUATION OF CONTACT INFORMATION USING HISTORICAL DATA - Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for disambiguating contact information. A method includes receiving an audio signal, generating an affinity score based on a frequency with which a user has previously communicated with a contact associated with an item of contact information, and further based on a recency of one or more past interactions between the user and the contact associated with the item of contact information, inferring a probability that the user intends to initiate a communication using the item of contact information based on the affinity score generated for the item of contact information, and generating a communication initiation grammar. | 11-24-2011 |
20110295605 | SPEECH RECOGNITION SYSTEM AND METHOD WITH ADJUSTABLE MEMORY USAGE - This speech recognition system provides a function capable of adjusting memory usage according to different target resources. It extracts a sequence of feature vectors from the input speech signal. A module for constructing the search space reads a text file and generates a word-level search space in an off-line phase. After removing redundancy, the word-level search space is expanded to a phone-level one represented by a tree structure. This may be performed by combining information from a dictionary, which gives the mapping from a word to its phonetic sequence(s). In the online phase, a decoder traverses the search space, takes the dictionary and at least one acoustic model as input, computes scores of the feature vectors, and outputs a decoding result. | 12-01-2011 |
20110301955 | Predicting and Learning Carrier Phrases for Speech Input - Predicting and learning users' intended actions on an electronic device based on free-form speech input. Users' actions can be monitored to develop a list of carrier phrases having one or more actions that correspond to the carrier phrases. A user can speak a command into a device to initiate an action. The spoken command can be parsed and compared to a list of carrier phrases. If the spoken command matches one of the known carrier phrases, the corresponding action(s) can be presented to the user for selection. If the spoken command does not match one of the known carrier phrases, search results (e.g., Internet search results) corresponding to the spoken command can be presented to the user. The actions of the user in response to the presented action(s) and/or the search results can be monitored to update the list of carrier phrases. | 12-08-2011 |
20110307257 | METHODS AND APPARATUS FOR REAL-TIME INTERACTION ANALYSIS IN CALL CENTERS - A method and system for indicating in real time that an interaction is associated with a problem or issue, comprising: receiving a segment of an interaction in which a representative of the organization participates; extracting a feature from the segment; extracting a global feature associated with the interaction; aggregating the feature and the global feature; and classifying the segment or the interaction in association with the problem or issue by applying a model to the feature and the global feature. The method and system may also use features extracted from earlier segments within the interaction. The method and system can also evaluate the model based on features extracted from training interactions and manual tagging assigned to the interactions or segments thereof. | 12-15-2011 |
20110307258 | REAL-TIME APPLICATION OF INTERACTION ANALYTICS - A method and apparatus for providing real-time assistance related to an interaction associated with a contact center, comprising steps or components for: receiving at least a part of an audio signal of an interaction captured by a capturing device associated with an organization, and metadata information associated with the interaction; performing audio analysis of the at least part of the audio signal, while the interaction is still in progress, to obtain audio information; categorizing at least a part of the metadata information and the audio information, to determine a category associated with the interaction, while the interaction is still in progress; and taking an action associated with the category. | 12-15-2011 |
20110313767 | SYSTEM AND METHOD FOR DATA INTENSIVE LOCAL INFERENCE - Disclosed herein are systems, methods, and non-transitory computer-readable storage media for approximating an accent source. A system practicing the method collects data associated with customer specific services, generates country-specific or dialect-specific weights for each service in the customer specific services list, generates a summary weight based on an aggregation of the country-specific or dialect-specific weights, and sets an interactive voice response system language model based on the summary weight and the country-specific or dialect-specific weights. The interactive voice response system can also change the user interface based on the interactive voice response system language model. The interactive voice response system can tune a voice recognition algorithm based on the summary weight and the country-specific weights. The interactive voice response system can adjust phoneme matching in the language model based on a possibility that the speaker is using other languages. | 12-22-2011 |
20110313768 | COMPOUND GESTURE-SPEECH COMMANDS - A multimedia entertainment system combines both gestures and voice commands to provide an enhanced control scheme. A user's body position or motion may be recognized as a gesture, and may be used to provide context to recognize user generated sounds, such as speech input. Likewise, speech input may be recognized as a voice command, and may be used to provide context to recognize a body position or motion as a gesture. Weights may be assigned to the inputs to facilitate processing. When a gesture is recognized, a limited set of voice commands associated with the recognized gesture are loaded for use. Further, additional sets of voice commands may be structured in a hierarchical manner such that speaking a voice command from one set of voice commands leads to the system loading a next set of voice commands. | 12-22-2011 |
20110320201 | SOUND VERIFICATION SYSTEM USING TEMPLATES - An audio signal verification system is presented for verifying that a sound is from a predetermined source. Various methods for analyzing the sound are presented, and these methods may be combined to varying degrees to determine an appropriate correlation with a predefined pattern. Moreover, a confidence level or other indication may be used to indicate that the determination was successful. The sound may be reduced to templates with varying degrees of richness. Different templates may be created using the same sound source, and different sounds from the same source may be aggregated to form a single template. Comparisons may be made between a sound, or a template derived from that sound, and stored sounds or templates derived from those stored sounds. Moreover, comparisons can be made using templates of different richness to achieve confidence levels, and confidence levels may be represented based on the results of the comparisons. | 12-29-2011 |
20110320202 | LOCATION VERIFICATION SYSTEM USING SOUND TEMPLATES - A system using sound templates is presented that may receive a first template for an audio signal and compare it to templates from different sound sources to determine a correlation between them. A location history database is created that assists in identifying the location of a user in response to audio templates generated by the user over time and at different locations. Comparisons can be made using templates of different richness to achieve confidence levels, and confidence levels may be represented based on the results of the comparisons. Queries may be run against the database to track users by templates generated from their voices. In addition, background information may be filtered out of the voice signal and separately compared against the database to assist in identifying a location based on the background noise. | 12-29-2011 |
20120022871 | SPEECH RECOGNITION CIRCUIT USING PARALLEL PROCESSORS - A speech recognition circuit comprises a memory containing lexical data for word recognition, the lexical data comprising a plurality of lexical data structures stored in each of a plurality of parts of the memory; and a parallel processor structure connected to the memory to process speech parameters by performing parallel processing on a plurality of the lexical data structures. | 01-26-2012 |
20120029919 | USING LINGUISTICALLY-AWARE VARIABLES IN COMPUTER-GENERATED TEXT - One embodiment of the present invention provides a system for placing linguistically-aware variables in computer-generated text. During operation, the system receives a sentence at a computer system, wherein the sentence comprises two or more words. Next, the system analyzes the sentence to identify a first variable, wherein the first variable is a place-holder for a first word. The system then receives the first word. After that, the system automatically determines a gender of the first word. Next, the system analyzes the sentence to identify a first dependent word that is dependent on the first word, wherein a spelling of the first dependent word is dependent on the gender of the first word. The system then determines the spelling of the first dependent word that corresponds to the gender of the first word. Next, the system replaces the first variable in the sentence with the first word. If necessary, the system modifies the spelling of the first dependent word in the sentence to match the gender of the first word. Finally, the system outputs the sentence. | 02-02-2012 |
20120035930 | Keyword Alerting in Conference Calls - A conferencing system is disclosed in which a participant in a conference call can program the embodiment to listen for one or more “keywords” in the conference call. The keywords might be a participant's name, words associated with him or her, or words associated with his or her area of knowledge. The embodiment uses speech recognition technology to listen for those words. When the embodiment detects that those words have been spoken, it alerts the participant—using audible, visual, and/or tactile signals—that the participant's attention to the call is warranted. When the keywords are chosen wisely, the benefit can be great. | 02-09-2012 |
20120035931 | Automatically Monitoring for Voice Input Based on Context - In one implementation, a computer-implemented method includes detecting a current context associated with a mobile computing device and determining, based on the current context, whether to switch the mobile computing device from a current mode of operation to a second mode of operation during which the mobile computing device monitors ambient sounds for voice input that indicates a request to perform an operation. The method can further include, in response to determining whether to switch to the second mode of operation, activating one or more microphones and a speech analysis subsystem associated with the mobile computing device so that the mobile computing device receives a stream of audio data. The method can also include providing output on the mobile computing device that is responsive to voice input that is detected in the stream of audio data and that indicates a request to perform an operation. | 02-09-2012 |
20120059655 | METHODS AND APPARATUS FOR PROVIDING INPUT TO A SPEECH-ENABLED APPLICATION PROGRAM - Some embodiments are directed to allowing a user to provide speech input intended for a speech-enabled application program into a mobile communications device, such as a smartphone, that is not connected to the computer that executes the speech-enabled application program. The mobile communications device may provide the user's speech input as audio data to a broker application executing on a server, which determines to which computer the received audio data is to be provided. When the broker application determines the computer to which the audio data is to be provided, it sends the audio data to that computer. In some embodiments, automated speech recognition may be performed on the audio data before it is provided to the computer. In such embodiments, instead of providing the audio data, the broker application may send the recognition result generated from performing automated speech recognition to the identified computer. | 03-08-2012 |
20120072219 | SYSTEM AND METHOD FOR ENHANCING VOICE-ENABLED SEARCH BASED ON AUTOMATED DEMOGRAPHIC IDENTIFICATION - Disclosed herein are systems, methods, and non-transitory computer-readable storage media for approximating responses to a user speech query in voice-enabled search based on metadata that include demographic features of the speaker. A system practicing the method recognizes received speech from a speaker to generate recognized speech, identifies metadata about the speaker from the received speech, and feeds the recognized speech and the metadata to a question-answering engine. Identifying the metadata about the speaker is based on voice characteristics of the received speech. The demographic features can include age, gender, socio-economic group, nationality, and/or region. The metadata identified about the speaker from the received speech can be combined with or override self-reported speaker demographic information. | 03-22-2012 |
20120072220 | Matching text sets - Matching text sets is disclosed, including: extracting a text set from data associated with a current period; storing the text set with a plurality of text sets; extracting a keyword from the text set; determining a weight value associated with the keyword associated with the text set; determining a degree of similarity between the text set and another text set based at least in part on a weight value associated with the keyword associated with the text set and a weight value associated with a keyword associated with the other text set; and determining whether the text set is related to the other text set based at least in part on the determined degree of similarity. | 03-22-2012 |
20120072221 | DISTRIBUTED VOICE USER INTERFACE - A distributed voice user interface system includes a local device which receives speech input issued from a user. Such speech input may specify a command or a request by the user. The local device performs preliminary processing of the speech input and determines whether it is able to respond to the command or request by itself. If not, the local device initiates communication with a remote system for further processing of the speech input. | 03-22-2012 |
20120089397 | LANGUAGE MODEL GENERATING DEVICE, METHOD THEREOF, AND RECORDING MEDIUM STORING PROGRAM THEREOF - A text in a corpus including a set of world wide web (web) pages is analyzed. At least one word appropriate for a document type set according to a voice recognition target is extracted based on an analysis result. A word set is generated from the extracted at least one word. A retrieval engine is caused to perform a retrieval process using the generated word set as a retrieval query of the retrieval engine on the Internet, and a link to a web page from the retrieval result is acquired. A language model for voice recognition is generated from the acquired web page. | 04-12-2012 |
20120095765 | AUTOMATICALLY PROVIDING A USER WITH SUBSTITUTES FOR POTENTIALLY AMBIGUOUS USER-DEFINED SPEECH COMMANDS - A method for alleviating ambiguity issues of new user-defined speech commands. An original command for a user-defined speech command can be received. It can then be determined if the original command is likely to be confused with a set of existing speech commands. When confusion is unlikely, the original command can be automatically stored. When confusion is likely, a substitute command that is unlikely to be confused with existing commands can be automatically determined. The substitute can be presented as an alternative to the original command and can be selectively stored as the user-defined speech command. | 04-19-2012 |
20120109652 | Leveraging Interaction Context to Improve Recognition Confidence Scores - On a computing device a speech utterance is received from a user. The speech utterance is a section of a speech dialog that includes a plurality of speech utterances. One or more features from the speech utterance are identified. Each identified feature from the speech utterance is a specific characteristic of the speech utterance. One or more features from the speech dialog are identified. Each identified feature from the speech dialog is associated with one or more events in the speech dialog. The one or more events occur prior to the speech utterance. One or more identified features from the speech utterance and one or more identified features from the speech dialog are used to calculate a confidence score for the speech utterance. | 05-03-2012 |
20120116764 | Speech recognition method on sentences in all languages - A speech recognition method for sentences in all languages is provided. A sentence can be a word, a name, or a sentence. All sentences are represented by E×P = 12×12 matrices of linear predictive coding cepstra (LPCC). 1000 different voices are transformed into 1000 matrices of LPCC to represent 1000 databases. E×P matrices of known sentences, after deletion of time intervals between two words, are put into their closest databases. To classify an unknown sentence, the distance is used to find its F closest databases, and then, from the known sentences in those F databases, a known sentence is found to be the unknown one. The invention needs no samples and can find a sentence in one second using Visual Basic. Any person, without training, can immediately and freely communicate with a computer in any language. It can recognize up to 7200 English words, 500 sentences of any language, and 500 Chinese words. | 05-10-2012 |
20120116765 | SPEECH PROCESSING DEVICE, METHOD, AND STORAGE MEDIUM - A speech recognition unit ( | 05-10-2012 |
20120143609 | METHOD AND SYSTEM FOR PROVIDING SPEECH RECOGNITION - An approach for providing speech recognition is disclosed. A name is retrieved from a user based on data provided by the user. The user is prompted for a name of the user. A first audio input is received from the user in response to the prompt. Speech recognition is applied to the first audio input using a name grammar database to output a recognized name. A determination is made whether the recognized name matches the retrieved name. If no match is determined, the user is re-prompted for the name of the user for a second audio input. Speech recognition is applied to the second audio input using a confidence database having fewer entries than the name grammar database. | 06-07-2012 |
20120185252 | CONFIDENCE MEASURE GENERATION FOR SPEECH RELATED SEARCHING - A method of generating a confidence measure generator is provided for use in a voice search system, the voice search system including voice search components comprising a speech recognition system, a dialog manager and a search system. The method includes selecting voice search features, from a plurality of the voice search components, to be considered by the confidence measure generator in generating a voice search confidence measure. The method includes training a model, using a computer processor, to generate the voice search confidence measure based on selected voice search features. | 07-19-2012 |
20120209610 | VOICED PROGRAMMING SYSTEM AND METHOD - Provided herein are systems and methods for using context-sensitive speech recognition logic in a computer to create a software program, including context-aware voice entry of instructions that make up a software program, automatic context-sensitive instruction formatting, and automatic context-sensitive insertion-point positioning. | 08-16-2012 |
20120215538 | Performance measurement for customer contact centers - In one embodiment, a method includes identifying a first communication from a customer, identifying a second communication from the customer following a response to the first communication from a contact center, and analyzing the first and second communications at a contact center network device to determine a change in sentiment from the first communication to the second communication. An apparatus for contact center performance measurement is also disclosed. | 08-23-2012 |
20120239402 | SPEECH RECOGNITION DEVICE AND METHOD - A speech recognition device includes, a speech recognition section that conducts a search, by speech recognition, on audio data stored in a first memory section to extract word-spoken portions where plural words transferred are each spoken and, of the word-spoken portions extracted, rejects the word-spoken portion for the word designated as a rejecting object; an acquisition section that obtains a derived word of a designated search target word, the derived word being generated in accordance with a derived word generation rule stored in a second memory section or read out from the second memory section; a transfer section that transfers the derived word and the search target word to the speech recognition section, the derived word being set as an outputting object or a rejecting object by the acquisition section; and an output section that outputs the word-spoken portion extracted and not rejected in the search. | 09-20-2012 |
20120271634 | Context Based Voice Activity Detection Sensitivity - A speech dialog system is described that adjusts a voice activity detection threshold during a speech dialog prompt to reflect a context-based probability of user barge-in speech occurring. For example, the context-based probability may be based on the location of one or more transition relevance places in the speech dialog prompt. | 10-25-2012 |
20120278078 | INPUT AND DISPLAYED INFORMATION DEFINITION BASED ON AUTOMATIC SPEECH RECOGNITION DURING A COMMUNICATION SESSION - Methods and systems for providing contextually relevant information to a user are provided. In particular, a user context is determined. The determination of the user context can be made from information stored on or entered in a user device. The determined user context is provided to an automatic speech recognition (ASR) engine as a watch list. A voice stream is monitored by the ASR engine. In response to the detection of a word on the watch list by the ASR engine, the context engine is notified. The context engine then modifies a display presented to the user, to provide a selectable item that the user can select to access relevant information. | 11-01-2012 |
20120290303 | SPEECH RECOGNITION SYSTEM AND METHOD BASED ON WORD-LEVEL CANDIDATE GENERATION - A speech recognition system and method based on word-level candidate generation are provided. The speech recognition system may include a speech recognition result verifying unit to verify a word sequence and a candidate word for at least one word included in the word sequence when the word sequence and the candidate word are provided as a result of speech recognition. A word sequence displaying unit may display the word sequence in which the at least one word is visually distinguishable from other words of the word sequence. The word sequence displaying unit may display the word sequence by replacing the at least one word with the candidate word when the at least one word is selected by a user. | 11-15-2012 |
20120296652 | OBTAINING INFORMATION ON AUDIO VIDEO PROGRAM USING VOICE RECOGNITION OF SOUNDTRACK - A method for obtaining information on an audio video program being presented on a consumer electronics (CE) device includes receiving at the CE device a viewer command to recognize the audio video program being presented on the CE device. The method also includes receiving signals from a microphone representative of audio from the audio video program as sensed by the microphone as the audio is played real time on the CE device. The method then includes executing voice recognition on the signals from the microphone to determine words in the audio from the audio video program as sensed by the microphone. Words are then uploaded to an Internet server, where they are correlated to at least one audio video script. The method then includes receiving back from the Internet server information correlated by the server using the words to the audio video program. | 11-22-2012 |
20120316877 | DYNAMICALLY ADDING PERSONALIZATION FEATURES TO LANGUAGE MODELS FOR VOICE SEARCH - A dynamic exponential, feature-based, language model is continually adjusted per utterance by a user, based on the user's usage history. This adjustment of the model is done incrementally per user, over a large number of users, each with a unique history. The user history can include previously recognized utterances, text queries, and other user inputs. The history data for a user is processed to derive features. These features are then added into the language model dynamically for that user. | 12-13-2012 |
20120316878 | VOICE RECOGNITION GRAMMAR SELECTION BASED ON CONTEXT - The subject matter of this specification can be embodied in, among other things, a method that includes receiving geographical information derived from a non-verbal user action associated with a first computing device. The non-verbal user action implies an interest of a user in a geographic location. The method also includes identifying a grammar associated with the geographic location using the derived geographical information and outputting a grammar indicator for use in selecting the identified grammar for voice recognition processing of vocal input from the user. | 12-13-2012 |
20120323576 | AUTOMATED ADVERSE DRUG EVENT ALERTS - Event audio data that is based on verbal utterances associated with a pharmaceutical event associated with a patient may be received. Medical history information associated with the patient may be obtained, based on information included in a medical history repository. At least one text string that matches at least one interpretation of the event audio data may be obtained, based on information included in a pharmaceutical speech repository, information included in a speech accent repository, and a drug matching function, the at least one text string being associated with a pharmaceutical drug. One or more adverse drug event (ADE) alerts may be determined based on matching the at least one text string and medical history attributes associated with the patient with ADE attributes obtained from an ADE repository. An ADE alert report may be generated, based on the determined one or more ADE alerts. | 12-20-2012 |
20130006637 | HIERARCHICAL METHODS AND APPARATUS FOR EXTRACTING USER INTENT FROM SPOKEN UTTERANCES - Improved techniques are disclosed for permitting a user to employ more human-based grammar (i.e., free form or conversational input) while addressing a target system via a voice system. For example, a technique for determining intent associated with a spoken utterance of a user comprises the following steps/operations. Decoded speech uttered by the user is obtained. An intent is then extracted from the decoded speech uttered by the user. The intent is extracted in an iterative manner such that a first class is determined after a first iteration and a sub-class of the first class is determined after a second iteration. The first class and the sub-class of the first class are hierarchically indicative of the intent of the user, e.g., a target and data that may be associated with the target. | 01-03-2013 |
20130006638 | Electronic Devices with Voice Command and Contextual Data Processing Capabilities - An electronic device may capture a voice command from a user. The electronic device may store contextual information about the state of the electronic device when the voice command is received. The electronic device may transmit the voice command and the contextual information to computing equipment such as a desktop computer or a remote server. The computing equipment may perform a speech recognition operation on the voice command and may process the contextual information. The computing equipment may respond to the voice command. The computing equipment may also transmit information to the electronic device that allows the electronic device to respond to the voice command. | 01-03-2013 |
20130006639 | SYSTEM AND METHOD FOR IMPROVING TEXT INPUT IN A SHORTHAND-ON-KEYBOARD INTERFACE - A word pattern recognition system improves text input entered via a shorthand-on-keyboard interface. A core lexicon comprises commonly used words in a language; an extended lexicon comprises words not included in the core lexicon. The system only directly outputs words from the core lexicon. Candidate words from the extended lexicon can be outputted and simultaneously admitted to the core lexicon upon user selection. A concatenation module enables a user to input parts of a long word separately. A compound word module combines two common shorter words whose concatenation forms a long word. | 01-03-2013 |
20130013310 | SPEECH RECOGNITION SYSTEM - A speech recognition system comprising a recognition dictionary for use in speech recognition and a controller configured to recognize an inputted speech by using the recognition dictionary is disclosed. The controller detects a speech section based on a signal level of the inputted speech, recognizes speech data corresponding to the speech section by using the recognition dictionary, and displays, in the form of a list, a recognition result of the recognition process and a correspondence item that corresponds to the recognition result. The correspondence item displayed in the list is manually operable. | 01-10-2013 |
20130035938 | APPARATUS AND METHOD FOR RECOGNIZING VOICE - The present invention includes a hierarchical search process. The hierarchical search process includes three steps. In a first step, a word boundary is determined using a recognition method of determining a following word dependent on a preceding word, and a word boundary detector. In a second step, word unit based recognition is performed in each area by dividing an input voice into a plurality of areas based on the determined word boundary. Finally, in a third step, a language model is applied to induce an optimal sentence recognition result with respect to a candidate word that is determined for each area. The present invention may improve the voice recognition performance, and particularly, the sentence unit based consecutive voice recognition performance. | 02-07-2013 |
20130041667 | MULTIMODAL DISAMBIGUATION OF SPEECH RECOGNITION - The present invention provides a speech recognition system combined with one or more alternate input modalities to ensure efficient and accurate text input. The speech recognition system achieves less than perfect accuracy due to limited processing power, environmental noise, and/or natural variations in speaking style. The alternate input modalities use disambiguation or recognition engines to compensate for reduced keyboards, sloppy input, and/or natural variations in writing style. The ambiguity remaining in the speech recognition process is mostly orthogonal to the ambiguity inherent in the alternate input modality, such that the combination of the two modalities resolves the recognition errors efficiently and accurately. The invention is especially well suited for mobile devices with limited space for keyboards or touch-screen input. | 02-14-2013 |
20130060570 | SYSTEM AND METHOD FOR ADVANCED TURN-TAKING FOR INTERACTIVE SPOKEN DIALOG SYSTEMS - Disclosed herein are systems, methods, and non-transitory computer-readable storage media for advanced turn-taking in an interactive spoken dialog system. A system configured according to this disclosure can incrementally process speech prior to completion of the speech utterance, and can communicate partial speech recognition results upon finding particular conditions. A first condition which, if found, allows the system to communicate partial speech recognition results, is that the most recent word found in the partial results is statistically likely to be the termination of the utterance, also known as a terminal node. A second condition is the determination that all search paths within a speech lattice converge to a common node, also known as a pinch node, before branching out again. Upon finding either condition, the system can communicate the partial speech recognition results. Stability and correctness probabilities can also determine which partial results are communicated. | 03-07-2013 |
20130060571 | INTEGRATED LOCAL AND CLOUD BASED SPEECH RECOGNITION - A system for integrating local speech recognition with cloud-based speech recognition in order to provide an efficient natural user interface is described. In some embodiments, a computing device determines a direction associated with a particular person within an environment and generates an audio recording associated with the direction. The computing device then performs local speech recognition on the audio recording in order to detect a first utterance spoken by the particular person and to detect one or more keywords within the first utterance. The first utterance may be detected by applying voice activity detection techniques to the audio recording. The first utterance and the one or more keywords are subsequently transferred to a server which may identify speech sounds within the first utterance associated with the one or more keywords and adapt one or more speech recognition techniques based on the identified speech sounds. | 03-07-2013 |
20130080171 | BACKGROUND SPEECH RECOGNITION ASSISTANT - In one embodiment, a method receives an acoustic input signal at a speech recognizer configured to recognize the acoustic input signal in an always on mode. A set of responses based on the recognized acoustic input signal is determined and ranked based on criteria. A computing device determines if the response should be output based on a ranking of the response. The method determines an output method in a plurality of output methods based on the ranking of the response and outputs the response using the output method if it is determined the response should be output. | 03-28-2013 |
20130096918 | RECOGNIZING DEVICE, COMPUTER-READABLE RECORDING MEDIUM, RECOGNIZING METHOD, GENERATING DEVICE, AND GENERATING METHOD - A recognizing device includes a memory and a processor coupled to the memory. The memory stores words included in a sentence and positional information indicating a position of the words in the sentence. The processor executes a process including comparing an input voice signal with reading information of a character string that connects a plurality of words stored in the memory to calculate a similarity; calculating a connection score indicating a proximity between the plurality of connected words based on positional information of the words stored in the memory; and determining a character string corresponding to the voice signal based on the similarity and the connection score. | 04-18-2013 |
20130166302 | Methods and Apparatus for Audio Input for Customization of Digital Displays - Aspects of customizing digital signage are addressed. For example, an audio feed may be analyzed for keywords occurring in potential customers' speech. These keywords are then employed to customize display screens of a digital display. | 06-27-2013 |
20130173269 | METHODS, APPARATUSES AND COMPUTER PROGRAM PRODUCTS FOR JOINT USE OF SPEECH AND TEXT-BASED FEATURES FOR SENTIMENT DETECTION - An apparatus for generating a review based in part on detected sentiment may include a processor and memory storing executable computer code causing the apparatus to at least perform operations including determining a location(s) of the apparatus and a time(s) that the location(s) was determined responsive to capturing voice data of speech content associated with spoken reviews of entities. The computer program code may further cause the apparatus to analyze textual and acoustic data corresponding to the voice data to detect whether the textual or acoustic data includes words indicating a sentiment(s) of a user speaking the speech content. The computer program code may further cause the apparatus to generate a review of an entity corresponding to a spoken review(s) based on assigning a predefined sentiment to a word(s) responsive to detecting that the word indicates the sentiment of the user. Corresponding methods and computer program products are also provided. | 07-04-2013 |
20130204622 | ENHANCED CONTEXT AWARENESS FOR SPEECH RECOGNITION - A method comprising establishing a call connection. | 08-08-2013 |
20130218562 | Sound Recognition Operation Apparatus and Sound Recognition Operation Method - According to one embodiment, a sound recognition operation apparatus includes a sound detection module, a keyword detection module, an audio mute module, and a transmission module. The sound detection module is configured to detect sound. The keyword detection module is configured to detect a particular keyword using voice recognition when the sound detection module detects sound. The audio mute module is configured to transmit an operation signal for muting audio sound when the keyword detection module detects the keyword. The transmission module is configured to recognize the voice command after the keyword is detected by the keyword detection module, and transmit an operation signal corresponding to the voice command. | 08-22-2013 |
20130268274 | System and Method for Efficient Tracking of Multiple Dialog States with Incremental Recombination - Disclosed herein are systems, methods, and computer-readable storage media for tracking multiple dialog states. A system practicing the method receives an N-best list of speech recognition candidates, a list of current partitions, and a belief for each of the current partitions. A partition is a group of dialog states. In an outer loop, the system iterates over the N-best list of speech recognition candidates. In an inner loop, the system performs a split, update, and recombination process to generate a fixed number of partitions after each speech recognition candidate in the N-best list. The system recognizes speech based on the N-best list and the fixed number of partitions. The split process can perform all possible splits on all partitions. The update process can compute an estimated new belief. The estimated new belief can be a product of ASR reliability, user likelihood to produce this action, and an original belief. | 10-10-2013 |
20130289993 | SPEAK AND TOUCH AUTO CORRECTION INTERFACE - The disclosure describes an overall system/method for developing a “speak and touch auto correction interface,” referred to as STACI, which is far superior to existing user interfaces, including the widely adopted QWERTY keyboard. Using STACI, a user speaks and types a word at the same time. The redundant information from the two modes, namely speech and the letters typed, enables the user to sloppily and partially type words. The result is a very fast and accurate enhanced keyboard interface enabling document production on computing devices such as phones and tablets. | 10-31-2013 |
20130297312 | Systems and Methods for Off-Board Voice-Automated Web Searching - A system for surfing the web includes a mobile system for processing and transmitting through a wireless link a voice stream spoken by a user of the mobile system and a data center for processing the voice stream received into voice web search information. The continuous voice stream includes a web search request. The data center performs automated voice recognition processing on the voice web search information to recognize components of the web search request, confirms the recognized components of the web search request through interactive speech exchanges with the user through the wireless link and the mobile system, selectively allows human data center operator intervention to assist in identifying the selected recognized web search components having a recognition confidence below a selected threshold value, and downloads web search results pertaining to the web search request for transmission to the mobile system derived from the confirmed recognized web search components. | 11-07-2013 |
20130304471 | Contextual Voice Query Dilation - A method, an apparatus and an article of manufacture for contextual voice query dilation in a Spoken Web search. The method includes determining a context in which a voice query is created, generating a set of multiple voice query terms based on the context and information derived by a speech recognizer component pertaining to the voice query, and processing the set of query terms with at least one dilation operator to produce a dilated set of queries. A method for performing a search on a voice query is provided, including generating a set of multiple query terms based on information derived by a speech recognizer component processing a voice query, processing the set with multiple dilation operators to produce multiple dilated sub-sets of query terms, selecting at least one query term from each dilated sub-set to compose a query set, and performing a search on the query set. | 11-14-2013 |
20130317823 | CUSTOMIZED VOICE ACTION SYSTEM - Systems, methods, and computer-readable media that may be used to modify a voice action system to include voice actions provided by advertisers or users are provided. One method includes receiving electronic voice action bids from advertisers to modify the voice action system to include a specific voice action (e.g., a triggering phrase and an action). One or more bids may be selected. The method includes, for each of the selected bids, modifying data associated with the voice action system to include the voice action associated with the bid, such that the action associated with the respective voice action is performed when voice input from a user is received that the voice action system determines to correspond to the triggering phrase associated with the respective voice action. | 11-28-2013 |
20130317824 | System and Method for Detecting Synthetic Speaker Verification - Disclosed herein are systems, methods, and tangible computer-readable media for detecting synthetic speaker verification. The method comprises receiving a plurality of speech samples of the same word or phrase for verification, comparing each of the plurality of speech samples to each other, denying verification if the plurality of speech samples demonstrate little variance over time or are the same, and verifying the plurality of speech samples if they demonstrate sufficient variance over time. One embodiment further adds that each of the plurality of speech samples is collected at different times or in different contexts. In other embodiments, variance is based on a pre-determined threshold, or the threshold for variance is adjusted based on a need for authentication certainty. In another embodiment, if the initial comparison is inconclusive, additional speech samples are received. | 11-28-2013 |
20130325474 | SPEECH RECOGNITION ADAPTATION SYSTEMS BASED ON ADAPTATION DATA - Computationally implemented methods and systems include receiving indication of initiation of a speech-facilitated transaction between a party and a target device, and receiving adaptation data correlated to the party. The receiving is facilitated by a particular device associated with the party. The adaptation data is at least partly based on previous adaptation data derived at least in part from one or more previous speech interactions of the party. The methods and systems also include applying the received adaptation data correlated to the party to the target device, and processing speech from the party using the target device to which the received adaptation data has been applied. In addition to the foregoing, other aspects are described in the claims, drawings, and text. | 12-05-2013 |
20130332167 | AUDIO ANIMATION METHODS AND APPARATUS - According to some aspects, a method is provided for producing an interactive audio presentation, at least in part, by traversing a plurality of audio animations; each audio animation comprises a plurality of frames, and each frame comprises a duration, at least one audio element, and at least one gate indicating criteria for transitioning to, and the identification of, a subsequent frame and/or a subsequent animation. The method comprises rendering a first audio animation, receiving input from the user associated with the presentation, selecting a second audio animation based, at least in part, on the input, and rendering the second audio animation. Some aspects include a system for performing the above method, and some aspects include a computer-readable medium storing instructions that perform the above method when executed by at least one processor. | 12-12-2013 |
20130332168 | VOICE ACTIVATED SEARCH AND CONTROL FOR APPLICATIONS - A method for voice activated search and control comprises converting, using an electronic device, multiple first speech signals into one or more first words. The one or more first words are used for determining a first phrase contextually related to an application space. The first phrase is used for performing a first action within the application space. Multiple second speech signals are converted, using the electronic device, into one or more second words. The one or more second words are used for determining a second phrase contextually related to the application space. The second phrase is used for performing a second action that is associated with a result of the first action within the application space. | 12-12-2013 |
20130339019 | SYSTEMS AND METHODS FOR MANAGING AN EMERGENCY SITUATION - Systems and methods for managing an emergency situation are provided herein. According to some embodiments, the present technology may relate to a security system and method for monitoring, detecting, and providing notification and/or response measures in response to an emergency situation regarding a user. | 12-19-2013 |
20140006028 | COMPUTER IMPLEMENTED METHODS AND APPARATUS FOR SELECTIVELY INTERACTING WITH A SERVER TO BUILD A LOCAL DICTATION DATABASE FOR SPEECH RECOGNITION AT A DEVICE | 01-02-2014 |
20140039894 | SPEECH RECOGNITION SYSTEM AND METHOD USING GROUP CALL STATISTICS - An enhanced speech recognition system and method are provided that may be used with a voice recognition wireless communication system. The enhanced speech recognition system and method take advantage of group to group calling statistics to improve the recognition of names by the speech recognition system. | 02-06-2014 |
20140052445 | VOICE SEARCH AND RESPONSE BASED ON RELEVANCY - A user seeking information relevant to the purchase of a home improvement product or other product submits a query to an automated system. The system transforms the user's voice query into a text statement and searches a knowledge base for candidate responses. Quality scores for the candidate responses are determined. If no candidate response having at least a minimum quality score is identified, the query is sent to a second device associated with an agent. The agent response is provided to the user and stored in the knowledge base for future use. | 02-20-2014 |
20140067395 | SYSTEMS AND METHODS FOR ENGAGING AN AUDIENCE IN A CONVERSATIONAL ADVERTISEMENT - A system and method are described for engaging an audience in a conversational advertisement. A conversational advertising system converses with an audience using spoken words. The conversational advertising system uses a speech recognition application to convert an audience's spoken input into text and a text-to-speech application to transform text of a response to speech that is to be played to the audience. The conversational advertising system follows an advertisement script to guide the audience in a conversation. | 03-06-2014 |
20140074474 | IDENTIFYING MEDIA CONTENT - Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for receiving (i) audio data that encodes a spoken natural language query, and (ii) environmental audio data, obtaining a transcription of the spoken natural language query, determining a particular content type associated with one or more keywords in the transcription, providing at least a portion of the environmental audio data to a content recognition engine, and identifying a content item that has been output by the content recognition engine, and that matches the particular content type. | 03-13-2014 |
20140074475 | SPEECH RECOGNITION RESULT SHAPING APPARATUS, SPEECH RECOGNITION RESULT SHAPING METHOD, AND NON-TRANSITORY STORAGE MEDIUM STORING PROGRAM - There is provided a speech recognition result forming apparatus. | 03-13-2014 |
20140088967 | APPARATUS AND METHOD FOR SPEECH RECOGNITION - Apparatus for speech recognition includes a recognition unit configured to recognize a speech signal and to generate a first recognition result, a transmitting unit that transmits at least one of the speech signal and a recognition feature to a server, a receiving unit that receives a second recognition result from the server, a result generating unit configured to generate a third recognition result, a result storage unit that stores the third recognition result and a dictionary update unit configured to update the client recognition dictionary. | 03-27-2014 |
20140095163 | HANDSFREE DEVICE WITH CONTINUOUS KEYWORD RECOGNITION - A handsfree device, which is coupled to a data processing device, may be operable to monitor at least one audio stream for occurrence of at least one keyword. Upon recognition of the at least one keyword, the handsfree device may establish a first connection between the handsfree device and the data processing device for launching a voice interface in the data processing device. The handsfree device may send audio data received after the recognition of the at least one keyword to the data processing device, via the first connection, for responding to the audio data via the voice interface. During a keyword configuration operation, the handsfree device may send at least one inputted keyword to the data processing device for recording. The handsfree device may receive, via a second connection, the recorded at least one keyword from the data processing device for keyword configuration of the handsfree device. | 04-03-2014 |
20140100851 | METHOD FOR CUSTOMER FEEDBACK MEASUREMENT IN PUBLIC PLACES UTILIZING SPEECH RECOGNITION TECHNOLOGY - A method, a system and a computer program product for enabling a customer response speech recognition unit to dynamically receive customer feedback. The customer response speech recognition unit is positioned at a customer location. The speech recognition unit is automatically initialized when one or more spoken words are detected. The response statements of customers are dynamically received by the customer response speech recognition unit at the customer location, in real time. The customer response speech recognition unit determines when the one or more spoken words of the customer response statement are associated with a score in a database. An analysis of the words is performed to generate a score that reflects the evaluation of the subject by the customer. The score is dynamically updated as new evaluations are received, and the score is displayed within a graphical user interface (GUI) to be viewed by one or more potential customers. | 04-10-2014 |
20140122078 | Low Power Mechanism for Keyword Based Hands-Free Wake Up in Always ON-Domain - A low-power, keyword-based speech recognition hardware architecture for hands-free wake up of devices is provided. This system can be used in an always-on domain for detection of voice activity, owing to its low-power operation. The system goes into a deep low-power state by deactivating all non-required processes if no activity is detected for a pre-specified time. Upon detection of valid voice activity, the system searches for the spoken keyword; if a valid keyword is detected, all application processes are activated and the system enters full functional mode, and if the voice activity does not contain a valid keyword present in the database, the system goes back into the deep low-power state. | 05-01-2014 |
20140129225 | FILTERING SOME PORTIONS OF A MULTIMEDIA STREAM - A computing platform may comprise a receiver, which may identify inappropriate words in the original audio portion of a multimedia stream. The receiver may allow a user to store the inappropriate words in a memory and then compare the words in the original audio stream with the inappropriate words stored in the memory. If there is a match, such words may be filtered or blanked out of the original audio portion to generate a modified audio portion. The modified audio portion and the video portion may be synchronized to generate a synchronized multimedia stream, which may be free of inappropriate words. | 05-08-2014 |
20140149118 | APPARATUS AND METHOD FOR DRIVING ELECTRIC DEVICE USING SPEECH RECOGNITION - An apparatus and method for driving an electric device using speech recognition are disclosed. The method includes switching from an operation stop state to a pre-operation speech recognition standby mode if a first keyword is recognized in the operation stop state; starting basic operation of the device if a second keyword is recognized in the pre-operation speech recognition standby mode; and performing a corresponding command if at least one of a plurality of command languages that may be recognized by a user command is recognized within a predefined time during the basic operation, wherein the first keyword, the second keyword, and the plurality of command languages are different from one another. | 05-29-2014 |
20140156277 | INFORMATION PROCESSING DEVICE AND CONTENT RETRIEVAL METHOD - According to one embodiment, an information processing device includes: an input module configured to receive voice input; a display controller configured to identify from the input voice a keyword and a single piece of attribute information associated with the keyword to be used for content retrieval, and to cause a display to show the identified keyword, the identified attribute information, and attribute candidate information that is associated with the identified keyword and selectable as an alternative to the identified attribute information; and a retrieval instructing module configured to give an instruction for the content retrieval using the identified keyword and the selected attribute candidate information. | 06-05-2014 |
20140180690 | Hybrid Hashing Scheme for Active HMMS - Embodiments of the present invention include a data storage device and a method for storing data in a hash table. The data storage device can include a first memory device, a second memory device, and a processing device. The first memory device is configured to store one or more data elements. The second memory device is configured to store one or more status bits at one or more respective table indices. In addition, each of the table indices is mapped to a corresponding table index in the first memory device. The processing device is configured to calculate one or more hash values based on the one or more data elements. | 06-26-2014 |
20140180691 | SYSTEMS AND METHODS FOR HANDS-FREE VOICE CONTROL AND VOICE SEARCH - In one embodiment the present invention includes a method comprising receiving an acoustic input signal and processing the acoustic input signal with a plurality of acoustic recognition processes configured to recognize the same target sound. Different acoustic recognition processes start processing different segments of the acoustic input signal at different time points in the acoustic input signal. In one embodiment, initial states in the recognition processes may be configured on each time step. | 06-26-2014 |
20140188473 | VOICE INSPECTION GUIDANCE - A system is provided that includes an inspection instrument. The inspection instrument includes communications circuitry, configured to communicatively couple the inspection instrument with information services; an audio input device, configured to receive inspection audio; and a processor, configured to query the information services via sending a request to the information services based upon an analysis of keywords, phrases, or both interpreted from the inspection audio. A voice recognition system configured to analyze the inspection audio for the keywords, phrases, or both is also provided. | 07-03-2014 |
20140188474 | System and Method of Lattice-Based Search for Spoken Utterance Retrieval - A system and method are disclosed for retrieving audio segments from a spoken document. The spoken document preferably is one having moderate word error rates such as telephone calls or teleconferences. The method comprises converting speech associated with a spoken document into a lattice representation and indexing the lattice representation of speech. These steps are performed typically off-line. Upon receiving a query from a user, the method further comprises searching the indexed lattice representation of speech and returning retrieved audio segments from the spoken document that match the user query. | 07-03-2014 |
20140195238 | METHOD AND APPARATUS OF CONFIDENCE MEASURE CALCULATION - An apparatus that calculates a confidence measure of a target word string specified in a recognition result includes: an alternative candidate generator which generates an alternative candidate word string in the position of the target word string; a classifier training unit which trains a classifier which is configured to discriminate between the target word string and the alternative candidate word string; a feature extractor which extracts a feature value representing an adjacent context in the position of the target word string; and a confidence measure calculator which determines whether the true word string in the position of the target word string is the target word string or the alternative candidate word string by using the classifier and the feature value, and calculates a confidence measure of the target word string on the basis of the determination result. | 07-10-2014 |
20140222429 | VIDEO CONFERENCE CALL CONVERSATION TOPIC SHARING SYSTEM - Systems and methods are disclosed herein for presenting topics of conversation during a call, comprising: connecting, by a computer, a first device and a second device over a network; opening, by a computer, an audio channel that facilitates audio communication between a first user of the first device and a second user of the second device; receiving, by a computer, an audio stream over the audio channel; analyzing, by a computer, the audio stream to determine spoken words said by either the first or second users; correlating, by a computer, the determined spoken words to determine a topic of conversation; and displaying, by a computer, the topic of conversation in an information post to a remote terminal connected to the network. | 08-07-2014 |
20140236600 | METHOD AND DEVICE FOR KEYWORD DETECTION - An electronic device with one or more processors and memory trains an acoustic model with an international phonetic alphabet (IPA) phoneme mapping collection and audio samples in different languages, where the acoustic model includes: a foreground model; and a background model. The device generates a phone decoder based on the trained acoustic model. The device collects keyword audio samples, decodes the keyword audio samples with the phone decoder to generate phoneme sequence candidates, and selects a keyword phoneme sequence from the phoneme sequence candidates. After obtaining the keyword phoneme sequence, the device detects one or more keywords in an input audio signal with the trained acoustic model, including: matching phonemic keyword portions of the input audio signal with phonemes in the keyword phoneme sequence with the foreground model; and filtering out phonemic non-keyword portions of the input audio signal with the background model. | 08-21-2014 |
20140249820 | VOICE CONTROL DEVICE AND METHOD FOR DECIDING RESPONSE OF VOICE CONTROL ACCORDING TO RECOGNIZED SPEECH COMMAND AND DETECTION OUTPUT DERIVED FROM PROCESSING SENSOR DATA - A voice control device has a speech command recognizer, a sensor data processor and a decision making circuit. The speech command recognizer is arranged for performing speech command recognition to output a recognized speech command. The sensor data processor is arranged for processing sensor data generated from at least one auxiliary sensor to generate a detection output. The decision making circuit is arranged for deciding a response of the voice control device according to the recognized speech command and the detection output. The same speech command is able to trigger different responses according to the detection output (e.g., detected motion). Besides, an adaptive training process may be employed to improve the accuracy of the sensor data processor. Hence, the voice control device may have improved performance of the voice control feature due to a reduced occurrence probability of miss errors and false alarm errors. | 09-04-2014 |
20140257813 | MICROPHONE CIRCUIT ASSEMBLY AND SYSTEM WITH SPEECH RECOGNITION - A microphone circuit assembly for an external application processor, such as a programmable Digital Signal Processor, may include a microphone preamplifier and analog-to-digital converter to generate microphone signal samples at a first predetermined sample rate. A speech feature extractor is configured for receipt and processing of predetermined blocks of the microphone signal samples to extract speech feature vectors representing speech features of the microphone signal samples. The microphone circuit assembly may include a speech vocabulary comprising a target word or target phrase of human speech encoded as a set of target feature vectors and a decision circuit is configured to compare the speech feature vectors generated by the speech feature extractor with the target feature vectors to detect the target speech word or phrase. | 09-11-2014 |
20140278421 | SYSTEM AND METHODS FOR IMPROVING LANGUAGE PRONUNCIATION - A system and methods for analyzing pronunciation, detecting errors, and providing automatic feedback to help non-native speakers improve pronunciation of a foreign language are provided. The system employs publicly available, high-accuracy third-party automatic speech recognizers available via the Internet to analyze and identify mispronunciations. | 09-18-2014 |
20140278422 | INDEXING DIGITIZED SPEECH WITH WORDS REPRESENTED IN THE DIGITIZED SPEECH - Indexing digitized speech with words represented in the digitized speech, with a multimodal digital audio editor operating on a multimodal device supporting modes of user interaction, the modes of user interaction including a voice mode and one or more non-voice modes, the multimodal digital audio editor operatively coupled to an ASR engine, including providing by the multimodal digital audio editor to the ASR engine digitized speech for recognition; receiving in the multimodal digital audio editor from the ASR engine recognized user speech including a recognized word, also including information indicating where, in the digitized speech, representation of the recognized word begins; and inserting by the multimodal digital audio editor the recognized word, in association with the information indicating where, in the digitized speech, representation of the recognized word begins, into a speech recognition grammar, the speech recognition grammar voice enabling user interface commands of the multimodal digital audio editor. | 09-18-2014 |
20140288933 | METHOD AND SYSTEM FOR CONSIDERING INFORMATION ABOUT AN EXPECTED RESPONSE WHEN PERFORMING SPEECH RECOGNITION - A speech recognition system receives and analyzes speech input from a user in order to recognize and accept a response from the user. Under certain conditions, information about the response expected from the user may be available. In these situations, the available information about the expected response is used to modify the behavior of the speech recognition system by taking this information into account. The modified behavior of the speech recognition system comprises adjusting the rejection threshold when speech input matches the predetermined expected response. | 09-25-2014 |
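The threshold adjustment described in the entry above can be sketched as follows (the function name, parameter names, and threshold values are illustrative assumptions; the filing does not publish an implementation):

```python
def accept(hypothesis, score, expected=None,
           base_threshold=0.5, relaxed_threshold=0.3):
    """Accept or reject a recognition hypothesis, relaxing the
    rejection threshold when the hypothesis matches the response
    expected from the user."""
    threshold = relaxed_threshold if hypothesis == expected else base_threshold
    return score >= threshold
```

Here a marginal score of 0.4 is accepted only when the hypothesis matches the expected response, which is the behavior change the abstract describes.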
20140297281 | SPEECH PROCESSING METHOD, DEVICE AND SYSTEM - A speech processing method executed by a computer, the speech processing method includes: extracting, based on speech recognition of input speech data, a plurality of word candidates including a first word candidate and a second word candidate from a memory, the plurality of word candidates being candidates for a word corresponding to the input speech data; determining at least one different part between the first word candidate and the second word candidate based on a comparison between the first word candidate and the second word candidate; and outputting the first word candidate with emphasis on the at least one different part. | 10-02-2014 |
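The "emphasis on the different part" step in the entry above can be sketched with a standard sequence comparison (a minimal illustration; the filing does not publish an implementation, and the bracket markup stands in for whatever emphasis a real UI would apply):

```python
import difflib

def emphasize_difference(first, second):
    """Return `first` with the parts that differ from `second`
    marked in brackets (a real UI might use bold or color instead)."""
    out = []
    for op, i1, i2, _j1, _j2 in difflib.SequenceMatcher(
            a=first, b=second).get_opcodes():
        segment = first[i1:i2]
        if op == "equal":
            out.append(segment)
        elif segment:  # skip pure insertions, which add nothing to `first`
            out.append("[" + segment + "]")
    return "".join(out)
```

For the candidate pair "affect"/"effect", only the first character is marked: `emphasize_difference("affect", "effect")` yields `"[a]ffect"`.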
20140337028 | MESSAGE-TRIGGERED VOICE COMMAND INTERFACE IN PORTABLE ELECTRONIC DEVICES - The embodiments provided herein are directed to a system and method of message-triggered voice command interface in portable electronic devices. The voice command interface is normally not activated until a message (e.g., an e-mail, a text message, or a voice mail) has been received by a portable electronic device. The arrival of a message triggers the voice command interface by activating one or more speech recognition routines for a predetermined time period corresponding to the one or more speech recognition routines. The voice command interface comes to an end when the predetermined time period expires or the user has no further commands. | 11-13-2014 |
20140337029 | SPEECH RECOGNITION WITH A PLURALITY OF MICROPHONES - At least first and second microphones with different frequency responses form part of a speech recognition system. The microphones are coupled to a processor that is configured to recognize a spoken word based on the microphone signals. The processor classifies the spoken word, and weights the signals from the microphones based on the classification of the spoken word. | 11-13-2014 |
20140337030 | ADAPTIVE AUDIO FRAME PROCESSING FOR KEYWORD DETECTION - A method of detecting a target keyword from an input sound for activating a function in a mobile device is disclosed. In this method, a first plurality of sound features is received in a buffer, and a second plurality of sound features is received in the buffer. While receiving each of the second plurality of sound features in the buffer, a first number of the sound features are processed from the buffer. The first number of the sound features includes two or more sound features. Further, the method may include determining a keyword score for each of the processed sound features and detecting the input sound as the target keyword if at least one of the keyword scores is greater than a threshold score. | 11-13-2014 |
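The buffered scoring loop in the entry above can be sketched roughly as follows (the scoring function and batch size are assumptions; real sound features would come from an acoustic front end rather than raw numbers):

```python
from collections import deque

def detect_keyword(feature_stream, score_fn, threshold, batch_size=2):
    """As each new sound feature arrives in the buffer, process a
    small batch of buffered features, compute a keyword score for
    each, and detect the keyword when any score meets the threshold."""
    buffer = deque()
    for feature in feature_stream:
        buffer.append(feature)
        batch = [buffer.popleft()
                 for _ in range(min(batch_size, len(buffer)))]
        if any(score_fn(f) >= threshold for f in batch):
            return True
    return False
```

Processing more than one buffered feature per arriving feature is what lets the detector catch up after bursts of input, which is the adaptive aspect the abstract emphasizes.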
20140350935 | Voice Controlled Audio Recording or Transmission Apparatus with Keyword Filtering - A method includes obtaining a plurality of audio channels using a plurality of microphone outputs having at least one audio control channel and at least one audio output channel. When a keyword is detected on the audio control channel using voice recognition, adaptive filtering is performed to attenuate the keyword from the audio output channel. An apparatus operative to perform the method includes a plurality of microphones that provide a plurality of audio channels with at least one audio output channel and at least one audio control channel. Voice command recognition logic is operatively coupled to the plurality of microphones to receive the at least one audio control channel. The voice command recognition logic detects keywords on the audio control channel, and filter logic with at least one adaptive filter performs adaptive filtering to attenuate the keyword from the at least one audio output channel. | 11-27-2014 |
20140350936 | ELECTRONIC DEVICE - According to at least one embodiment, an electronic device includes storage and a processor. The storage stores a database including a plurality of names. The processor outputs an identified name based on a search of the database for a first name having one or more characteristics in common with a character string associated with speech data. | 11-27-2014 |
20140350937 | VOICE PROCESSING DEVICE AND VOICE PROCESSING METHOD - A voice processing device includes a processor; and a memory which stores a plurality of instructions which, when executed by the processor, cause the processor to execute: acquiring an input voice; detecting a sound period included in the input voice and a silent period adjacent to a back end of the sound period; calculating a number of words included in the sound period; and controlling a length of the silent period according to the number of words. | 11-27-2014 |
20140350938 | SYSTEM AND METHOD FOR DETECTING SYNTHETIC SPEAKER VERIFICATION - Disclosed herein are systems, methods, and tangible computer-readable media for detecting synthetic speaker verification. The method comprises receiving a plurality of speech samples of the same word or phrase for verification, comparing each of the plurality of speech samples to each other, denying verification if the plurality of speech samples demonstrate little variance over time or are the same, and verifying the plurality of speech samples if the plurality of speech samples demonstrates sufficient variance over time. One embodiment further adds that each of the plurality of speech samples is collected at different times or in different contexts. In other embodiments, variance is based on a pre-determined threshold or the threshold for variance is adjusted based on a need for authentication certainty. In another embodiment, if the initial comparison is inconclusive, additional speech samples are received. | 11-27-2014 |
20140365220 | SPEECH RECOGNITION CIRCUIT USING PARALLEL PROCESSORS - A speech recognition circuit comprises an input buffer for receiving processed speech parameters. A lexical memory contains lexical data for word recognition. The lexical data comprises a plurality of lexical tree data structures. Each lexical tree data structure comprises a model of words having common prefix components. An initial component of each lexical tree structure is unique. A plurality of lexical tree processors are connected in parallel to the input buffer for processing the speech parameters in parallel to perform parallel lexical tree processing for word recognition by accessing the lexical data in the lexical memory. A results memory is connected to the lexical tree processors for storing processing results from the lexical tree processors and lexical tree identifiers to identify lexical trees to be processed by the lexical tree processors. A controller controls the lexical tree processors to process lexical trees identified in the results memory by performing parallel processing on a plurality of said lexical tree data structures. | 12-11-2014 |
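The shared-prefix property of the lexical trees above can be illustrated with a simple trie (a minimal sketch; the circuit's actual node layout, score memory, and parallel scheduling are not reproduced here):

```python
def build_lexical_tree(words):
    """Build a prefix tree in which words with a common prefix share
    their initial nodes, so a processor can score the shared prefix
    once for the whole group of words."""
    root = {}
    for word in words:
        node = root
        for symbol in word:
            node = node.setdefault(symbol, {})
        node["#"] = word  # word-end marker
    return root
```

For the vocabulary ["cat", "car"], the nodes for "c" and "a" exist once and branch only at the final symbol, which is the prefix sharing the abstract relies on for parallel word recognition.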
20140372120 | System and Method for Recognizing Speech - A system and a method recognize speech including a sequence of words. A set of interpretations of the speech is generated using an acoustic model and a language model, and, for each interpretation, a score representing correctness of an interpretation in representing the sequence of words is determined to produce a set of scores. Next, the set of scores is updated based on a consistency of each interpretation with a constraint determined in response to receiving a word sequence constraint. | 12-18-2014 |
20140379346 | VIDEO ANALYSIS BASED LANGUAGE MODEL ADAPTATION - Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for receiving audio data obtained by a microphone of a wearable computing device, wherein the audio data encodes a user utterance, receiving image data obtained by a camera of the wearable computing device, identifying one or more image features based on the image data, identifying one or more concepts based on the one or more image features, selecting one or more terms associated with a language model used by a speech recognizer to generate transcriptions, adjusting one or more probabilities associated with the language model that correspond to one or more of the selected terms based on the relevance of one or more of the selected terms to the one or more concepts, and obtaining a transcription of the user utterance using the speech recognizer. | 12-25-2014 |
20150073802 | DEALING WITH SWITCH LATENCY IN SPEECH RECOGNITION - In embodiments of the present invention improved capabilities are described for interacting with a mobile communication facility comprising receiving a switch activation from a user to initiate a speech recognition recording session, wherein the speech recognition recording session comprises a voice command from the user followed by the speech to be recognized from the user; recording the speech recognition recording session using a mobile communication facility resident capture facility; recognizing at least a portion of the voice command as an indication that user speech for recognition will begin following the end of the at least a portion of the voice command; recognizing the recorded speech using a speech recognition facility to produce an external output; and using the selected output to perform a function on the mobile communication facility. | 03-12-2015 |
20150081303 | PERSONNEL TRANSPORT DEVICE WITH COMPUTERIZED INFORMATION AND DISPLAY APPARATUS - A transport device which includes computerized apparatus useful for obtaining and displaying information. In one embodiment, the computerized apparatus includes a display device and speech recognition apparatus configured to receive user speech and/or other input and enable performance of various tasks, such as obtaining desired information. In one variant, the computerized apparatus is configured to establish a plurality of ad hoc data links with portable user devices present within the transport device, thereby enabling passengers to, among other things, wirelessly and simultaneously obtain data from a remote server or network such as the Internet. In another variant, the transport device includes a plurality of display devices within its passenger compartment so as to permit substantially simultaneous individual user viewing and interaction. | 03-19-2015 |
20150088515 | PRIMARY SPEAKER IDENTIFICATION FROM AUDIO AND VIDEO DATA - An aspect provides a method, including: receiving image data from a visual sensor of an information handling device; receiving audio data from one or more microphones of the information handling device; identifying, using one or more processors, human speech in the audio data; identifying, using the one or more processors, a pattern of visual features in the image data associated with speaking; matching, using the one or more processors, the human speech in the audio data with the pattern of visual features in the image data associated with speaking; selecting, using the one or more processors, a primary speaker from among matched human speech; assigning control to the primary speaker; and performing one or more actions based on audio input of the primary speaker. Other aspects are described and claimed. | 03-26-2015 |
20150088516 | DETECTING POTENTIAL SIGNIFICANT ERRORS IN SPEECH RECOGNITION RESULTS - In some embodiments, the recognition results produced by a speech processing system (which may include two or more recognition results, including a top recognition result and one or more alternative recognition results) based on an analysis of a speech input, are evaluated for indications of potential significant errors. In some embodiments, the recognition results may be evaluated to determine whether a meaning of any of the alternative recognition results differs from a meaning of the top recognition result in a manner that is significant for a domain, such as the medical domain. In some embodiments, words and/or phrases that may be confused by an ASR system may be determined and associated in sets of words and/or phrases. Words and/or phrases that may be determined include those that change a meaning of a phrase or sentence when included in the phrase/sentence. | 03-26-2015 |
20150088517 | DETECTING POTENTIAL SIGNIFICANT ERRORS IN SPEECH RECOGNITION RESULTS - In some embodiments, the recognition results produced by a speech processing system (which may include two or more recognition results, including a top recognition result and one or more alternative recognition results) based on an analysis of a speech input, are evaluated for indications of potential significant errors. In some embodiments, the recognition results may be evaluated to determine whether a meaning of any of the alternative recognition results differs from a meaning of the top recognition result in a manner that is significant for a domain, such as the medical domain. In some embodiments, words and/or phrases that may be confused by an ASR system may be determined and associated in sets of words and/or phrases. Words and/or phrases that may be determined include those that change a meaning of a phrase or sentence when included in the phrase/sentence. | 03-26-2015 |
20150088518 | APPARATUS AND METHOD FOR MULTIPLE DEVICE VOICE CONTROL - In an environment including multiple electronic devices that are each capable of being controlled by a user's voice command, an individual device is able to distinguish a voice command intended particularly for the device from among other voice commands that are intended for other devices present in the common environment. The device is able to accomplish this distinction by identifying unique attributes belonging to the device itself from within a user's voice command. Thus only voice commands that include attribute information that are supported by the device will be recognized by the device, and other voice commands that include attribute information that are not supported by the device may be effectively ignored for voice control purposes of the device. | 03-26-2015 |
20150095030 | CENTRALIZED METHOD AND SYSTEM FOR CLARIFYING VOICE COMMANDS - A method and system for facilitating centralized interaction with a user includes providing a recognized voice command to a plurality of application modules. A plurality of interpretations of the voice command are generated by at least one of the plurality of application modules. A centralized interface module visually renders the plurality of interpretations of the voice command on a centralized display. An indication of selection of an interpretation is received from the user. | 04-02-2015 |
20150120300 | VOICE RECOGNITION DEVICE - According to a voice recognition device of this invention, with respect to a keyword extracted by a voice recognition unit from a speech content by a user, display contents each displayed by an operation by the user and their respective numbers of display times are stored as history information, and a search level is set through determination of whether or not the same operations and displays have been made by a predetermined number of times or more. This makes it possible, at the next time the same keyword is extracted, to immediately present information of such a level that the user requires, and thus, detailed information necessary for the user can always be provided efficiently, so that the convenience of the user is enhanced. | 04-30-2015 |
20150120301 | Information Recognition Method and Apparatus - An information recognition method and apparatus are provided. The method includes receiving, by a terminal, voice information, extracting a voice feature from the voice information, performing matching calculation on the voice feature and a phoneme string corresponding to each candidate text in multiple candidate texts to obtain a recognition result, where the recognition result includes at least one command word and a label corresponding to the at least one command word, and recognizing, according to the label corresponding to the at least one command word, an operation instruction corresponding to the voice information. A terminal recognizes text information, which is corresponding to voice information input by a user, as an operation instruction. | 04-30-2015 |
20150127345 | Name Based Initiation of Speech Recognition - A computer-implemented method includes listening for audio name information indicative of a name of a computer, with the computer configured to listen for the audio name information in a first power mode that promotes a conservation of power; detecting the audio name information indicative of the name of the computer; after detection of the audio name information, switching to a second power mode that promotes a performance of speech recognition; receiving audio command information; and performing speech recognition on the audio command information. | 05-07-2015 |
20150134334 | MEDIA ITEM SELECTION USING USER-SPECIFIC GRAMMAR - A storage machine holds instructions executable by a logic machine to receive a digital representation of a spoken command. The digital representation is provided to a speech recognizer trained with a user-specific grammar library. The logic machine then receives from the speech recognizer a confidence rating for each of a plurality of different media items. The confidence rating indicates the likelihood that the media item is named in the spoken command. The logic machine then automatically plays back the media item with a greatest confidence rating. | 05-14-2015 |
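Selecting the item to play back in the entry above reduces to an argmax over the recognizer's ratings (a trivial sketch with illustrative names; the user-specific grammar training itself is not shown):

```python
def select_media_item(confidence_ratings):
    """Pick the media item whose confidence rating -- the likelihood
    that it is the item named in the spoken command -- is greatest."""
    return max(confidence_ratings, key=confidence_ratings.get)
```

Given ratings such as {"song_a": 0.4, "song_b": 0.9, "song_c": 0.2}, the device would automatically play back "song_b".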
20150134335 | DETECTING POTENTIAL SIGNIFICANT ERRORS IN SPEECH RECOGNITION RESULTS - In some embodiments, the recognition results produced by a speech processing system (which may include two or more recognition results, including a top recognition result and one or more alternative recognition results) based on an analysis of a speech input, are evaluated for indications of potential significant errors. In some embodiments, the recognition results may be evaluated using one or more sets of words and/or phrases, such as pairs of words/phrases that may include words/phrases that are acoustically similar to one another and/or that, when included in a result, would change a meaning of the result in a manner that would be significant for a domain. The recognition results may be evaluated using the set(s) of words/phrases to determine, when the top result includes a word/phrase from a set of words/phrases, whether any of the alternative recognition results includes any of the other, corresponding words/phrases from the set. | 05-14-2015 |
20150142442 | IDENTIFYING A CONTACT - A method of identifying a contact in a communication system using voice input, the method comprising: receiving an input string of characters, the input string representing a contact and being normally unpronounceable by a human voice when spoken literally; performing at least one transforming step to transform at least one character of the input string to thereby generate a pronounceable name for the contact; and outputting the pronounceable name for use in establishing a communication event with the contact using voice input. | 05-21-2015 |
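One possible transforming step for the entry above can be sketched as follows (the specific rules are purely illustrative assumptions; the claim only requires that at least one character be transformed so the result is pronounceable):

```python
def pronounceable_name(raw):
    """Turn a contact string that is unpronounceable when spoken
    literally, such as 'j.doe_42', into a speakable name by replacing
    separators with spaces, dropping digits, and capitalizing parts."""
    for sep in "._-":
        raw = raw.replace(sep, " ")
    letters_only = "".join(ch for ch in raw if not ch.isdigit())
    return " ".join(part.capitalize() for part in letters_only.split())
```

The pronounceable output could then be matched against voice input when establishing a communication event with the contact.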
20150294666 | DEVICE INCLUDING SPEECH RECOGNITION FUNCTION AND METHOD OF RECOGNIZING SPEECH - A device including a speech recognition function which recognizes speech from a user, includes: a loudspeaker which outputs speech to a space; a microphone which collects speech in the space; a first speech recognition unit which recognizes the speech collected by the microphone; a command control unit which issues a command for controlling the device, based on the speech recognized by the first speech recognition unit; and a control unit which prohibits the command issuance unit from issuing the command, based on the speech to be output from the loudspeaker. | 10-15-2015 |
20150302847 | KEYWORD MODEL GENERATION FOR DETECTING USER-DEFINED KEYWORD - According to an aspect of the present disclosure, a method for generating a keyword model of a user-defined keyword in an electronic device is disclosed. The method includes receiving at least one input indicative of the user-defined keyword, determining a sequence of subwords from the at least one input, generating the keyword model associated with the user-defined keyword based on the sequence of subwords and a subword model of the subwords, wherein the subword model is configured to model a plurality of acoustic features of the subwords based on a speech database, and providing the keyword model associated with the user-defined keyword to a voice activation unit configured with a keyword model associated with a predetermined keyword. | 10-22-2015 |
20150310852 | SPEECH EFFECTIVENESS RATING - In an approach to determining speech effectiveness, a computer receives speech input. The computer determines, based, at least in part, on the received speech input, whether the speech input is one of: a conversation with words spoken by two or more people during a predetermined time interval, and a presentation with words spoken by one person and not any other person during a predetermined time interval. The computer detects at least one problem with the speech input. If the speech input is the presentation, the computer weights, by a first factor, the detected at least one problem with the speech input based on the speech input being a presentation and not a conversation, and if the speech input is a conversation, the computer weights, by a second factor, the detected at least one problem with the speech input based on the speech input being a conversation and not a presentation. | 10-29-2015 |
20150310856 | SPEECH RECOGNITION APPARATUS, SPEECH RECOGNITION METHOD, AND TELEVISION SET - A speech recognition apparatus includes: a speech acquisition unit which acquires speech uttered by a user; a recognition result acquisition unit which acquires a result of recognition performed on the acquired speech; an extraction unit which, when the recognition result includes a keyword and a selection command that is used for selecting one of selectable information items, extracts a selection candidate that includes the keyword; a selection mode switching unit which, when more than one selection candidate is extracted, switches a selection mode from a first selection mode that allows selection among the selectable information items to a second selection mode that allows selection among the selection candidates; a display control unit which changes a display manner of the display information, according to the second selection mode switched from the first selection mode; and a selection unit which selects one of the selection candidates, according to an entry from the user. | 10-29-2015 |
20150331658 | Controlling Audio Players Using Environmental Audio Analysis - A method, system, and computer program product containing instructions for analyzing audio input to a receiver coupled to an audio player to identify an audio event as one of a plurality of pre-determined audio event types. In response to identifying the audio event, the audio player is caused to adjust an audio output. Adjusting the audio output may include causing the audio player to pause playing audio output or to lower the volume of the audio output. The audio input to the receiver may be recorded. In response to identifying the audio event, the audio player may be caused to replay a recorded portion of the audio input. The recorded portion of the audio input may include a portion recorded prior to identifying the audio event. | 11-19-2015 |
20150348539 | SPEECH RECOGNITION SYSTEM - A system has a speech recognition unit | 12-03-2015 |
20150348547 | METHOD FOR SUPPORTING DYNAMIC GRAMMARS IN WFST-BASED ASR - Systems and processes are disclosed for recognizing speech using a weighted finite state transducer (WFST) approach. Dynamic grammars can be supported by constructing the final recognition cascade during runtime using difference grammars. In a first grammar, non-terminals can be replaced with a weighted phone loop that produces sequences of mono-phone words. In a second grammar, at runtime, non-terminals can be replaced with sub-grammars derived from user-specific usage data including contact, media, and application lists. Interaction frequencies associated with these entities can be used to weight certain words over others. With all non-terminals replaced, a static recognition cascade with the first grammar can be composed with the personalized second grammar to produce a user-specific WFST. User speech can then be processed to generate candidate words having associated probabilities, and the likeliest result can be output. | 12-03-2015 |
20150356968 | Systems and Methods of Interpreting Speech Data - Methods and systems are provided for interpreting speech data. A method and system for recognizing speech involving a filter module to generate a set of processed audio data based on raw audio data; a translation module to provide a set of translation results for the raw audio data; and a decision module to select the text data that represents the raw audio data. A method for minimizing noise in audio signals received by a microphone array is also described. A method and system of automatic entry of data into one or more data fields involving receiving a processed audio data; and operating a processing module to: search in a trigger dictionary for a field identifier that corresponds to the trigger identifier; identify a data field associated with a data field identifier corresponding to the field identifier; and providing content data associated with the trigger identifier to the identified data field. | 12-10-2015 |
20150364129 | Language Identification - Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for language identification. In some implementations, speech data for an utterance is received and provided to (i) a language identification module and (ii) multiple speech recognizers that are each configured to recognize speech in a different language. From the language identification module, language identification scores corresponding to different languages are received, the language identification scores each indicating a likelihood that the utterance is speech in the corresponding language. A language model confidence score that indicates a level of confidence that a language model has in a transcription of the utterance in a language corresponding to the language model is received. A language is selected based on the language identification scores and the language model confidence scores. | 12-17-2015 |
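Fusing the two score sources in the entry above can be sketched as a weighted sum (the weight and the linear combination are assumptions; the filing does not specify the exact fusion rule):

```python
def select_language(lid_scores, lm_confidences, lid_weight=0.5):
    """Combine per-language identification scores with per-language
    model confidence scores and select the highest-scoring language."""
    combined = {
        lang: lid_weight * lid_scores[lang]
              + (1.0 - lid_weight) * lm_confidences.get(lang, 0.0)
        for lang in lid_scores
    }
    return max(combined, key=combined.get)
```

With lid_scores {"en": 0.6, "fr": 0.4} and lm_confidences {"en": 0.3, "fr": 0.9}, the language-model confidence outweighs the identification score and French is selected.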
20150371632 | ENTITY NAME RECOGNITION - Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for recognizing names of entities in speech. In one aspect, a method includes actions of receiving an utterance that includes (i) a first term that indicates a particular entity type, and (ii) a second term that indicates an entity name. Additional actions include obtaining a phonetic representation of the second term and determining that the phonetic representation of the second term matches a particular phonetic representation of a particular canonical name of a set of canonical names associated with a particular entity. Further actions include outputting a reference name associated with the particular entity as a transcription of the second term. | 12-24-2015 |
20150371635 | System and Method for Processing Speech to Identify Keywords or Other Information - A system and method are provided for performing speech processing. A system includes an audio detection system configured to receive a signal including speech and a memory having stored therein a database of keyword models forming an ensemble of filters associated with each keyword in the database. A processor is configured to receive the signal including speech from the audio detection system, decompose the signal including speech into a sparse set of phonetic impulses, and access the database of keywords and convolve the sparse set of phonetic impulses with the ensemble of filters. The processor is further configured to identify keywords within the signal including speech based on a result of the convolution and to control operation of the electronic system based on the keywords identified. | 12-24-2015 |
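The matching step above convolves a sparse set of phonetic impulses with each keyword filter; discrete convolution with a sparse input can be sketched as follows (a minimal illustration; the actual phonetic decomposition and filter ensemble are not reproduced):

```python
def convolve(signal, kernel):
    """Discrete convolution of a sparse impulse sequence with one
    keyword filter, skipping zero impulses for efficiency."""
    out = [0.0] * (len(signal) + len(kernel) - 1)
    for i, s in enumerate(signal):
        if s == 0:
            continue  # sparsity: only nonzero impulses contribute
        for j, k in enumerate(kernel):
            out[i + j] += s * k
    return out
```

Peaks in the convolution output indicate positions where the impulse pattern aligns with a keyword's filter.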
20160005397 | SPEECH RECOGNITION CIRCUIT AND METHOD - A speech recognition circuit comprising a circuit for providing state identifiers which identify states corresponding to nodes or groups of adjacent nodes in a lexical tree, and for providing scores corresponding to said state identifiers, the lexical tree comprising a model of words. The circuit includes: a memory structure for receiving and storing state identifiers identified by a node identifier identifying a node or group of adjacent nodes, the memory structure being adapted to allow lookup to identify particular state identifiers, reading of the scores corresponding to the state identifiers, and writing back of the scores to the memory structure after modification of the scores; an accumulator for receiving score updates corresponding to particular state identifiers from a score update generating circuit which generates the score updates using audio input, for receiving scores from the memory structure, and for modifying said scores by adding said score updates to said scores; and a selector circuit for selecting at least one node or group of adjacent nodes of the lexical tree according to said scores. | 01-07-2016 |
20160012686 | AUTOMATICALLY ACTIVATED VISUAL INDICATORS ON COMPUTING DEVICE | 01-14-2016 |
20160012820 | MULTILEVEL SPEECH RECOGNITION METHOD AND APPARATUS | 01-14-2016 |
20160019888 | ORDER ENTRY SYSTEM AND ORDER ENTRY METHOD - There is provided an order entry system including: a first microphone that picks up speech regarding order details of a first speaker; a second microphone that picks up speech regarding the order details of a second speaker for checking the order details of the first speaker; a speech recognizer that recognizes the speech regarding the order details of the first speaker which is picked up by the first microphone and the speech regarding the order details of the second speaker which is picked up by the second microphone; and an order data output that displays on a display, a display screen of order data regarding the order details of the first speaker, including a first speech recognition result of the speech regarding the order details of the first speaker and a second speech recognition result of the speech regarding the order details of the second speaker. | 01-21-2016 |
20160026627 | System And Method For Enhancing Voice-Enabled Search Based On Automated Demographic Identification - Disclosed herein are systems, methods, and non-transitory computer-readable storage media for approximating responses to a user speech query in voice-enabled search based on metadata that include demographic features of the speaker. A system practicing the method recognizes received speech from a speaker to generate recognized speech, identifies metadata about the speaker from the received speech, and feeds the recognized speech and the metadata to a question-answering engine. Identifying the metadata about the speaker is based on voice characteristics of the received speech. The demographic features can include age, gender, socio-economic group, nationality, and/or region. The metadata identified about the speaker from the received speech can be combined with or override self-reported speaker demographic information. | 01-28-2016 |
20160034458 | SPEECH RECOGNITION APPARATUS AND METHOD THEREOF - There is provided a speech recognition controlling method which includes extracting a keyword by crawling a webpage, adding the keyword to a lexicon in which a plurality of words are registered and updating the lexicon, recognizing, in response to a user speech being input, the speech based on the updated lexicon, performing a search according to the recognized result, and displaying a result of the search. | 02-04-2016 |
20160071516 | KEYWORD DETECTION USING SPEAKER-INDEPENDENT KEYWORD MODELS FOR USER-DESIGNATED KEYWORDS - A method, which is performed by an electronic device, for obtaining a speaker-independent keyword model of a keyword designated by a user is disclosed. The method may include receiving at least one sample sound from the user indicative of the keyword. The method may also generate a speaker-dependent keyword model for the keyword based on the at least one sample sound, send a request for the speaker-independent keyword model of the keyword to a server in response to generating the speaker-dependent keyword model, and receive the speaker-independent keyword model adapted for detecting the keyword spoken by a plurality of users from the server. | 03-10-2016 |
20160111084 | SPEECH RECOGNITION DEVICE AND SPEECH RECOGNITION METHOD - A speech recognition device includes: a collector collecting speech data of a first speaker from a speech-based device; a first storage accumulating the speech data of the first speaker; a learner learning the speech data of the first speaker accumulated in the first storage and generating an individual acoustic model of the first speaker based on the learned speech data; a second storage storing the individual acoustic model of the first speaker and a generic acoustic model; a feature vector extractor extracting a feature vector from the speech data of the first speaker when a speech recognition request is received from the first speaker; and a speech recognizer selecting either one of the individual acoustic model of the first speaker and the generic acoustic model based on an accumulated amount of the speech data of the first speaker and recognizing a speech command using the extracted feature vector and the selected acoustic model. | 04-21-2016 |
20160111090 | HYBRIDIZED AUTOMATIC SPEECH RECOGNITION - A system and method of providing speech received in a vehicle to an automatic speech recognition (ASR) system includes: receiving speech at the vehicle from a vehicle occupant; providing the received speech to a remotely-located ASR system and a vehicle-based ASR system; and thereafter determining a confidence level for the speech processed by the vehicle-based ASR system; presenting in the vehicle results from the vehicle-based ASR system when the determined confidence level is above a predetermined confidence threshold; and presenting in the vehicle results from the remotely-located ASR system when the determined confidence level is not above the predetermined confidence threshold. | 04-21-2016 |
20160125874 | METHOD AND APPARATUS FOR OPTIMIZING A SPEECH RECOGNITION RESULT - According to one embodiment, an apparatus for optimizing a speech recognition result comprises: a receiving unit configured to receive a speech recognition result; a calculating unit configured to calculate a pronunciation similarity between a segment of the speech recognition result and a key word in a key word list; and a replacing unit configured to replace the segment with the key word in a case where the pronunciation similarity is higher than a first threshold. | 05-05-2016 |
20160133253 | CONCATENATED EXPECTED RESPONSES FOR SPEECH RECOGNITION - A speech recognition system used for hands-free data entry receives and analyzes speech input to recognize and accept a user's response. Under certain conditions, a user's response might be expected. In these situations, the expected response may modify the behavior of the speech recognition system to improve performance. For example, if the hypothesis of a user's response matches the expected response then there is a high probability that the user's response was recognized correctly. This information may be used to make adjustments. An expected response may include expected response parts, each part containing expected words. By considering an expected response as the concatenation of expected response parts, each part may be considered independently for the purposes of adjusting an acceptance algorithm, adjusting a model, or recording an apparent error. In this way, the speech recognition system may make modifications based on a wide range of user responses. | 05-12-2016 |
20160155454 | SYSTEMS AND METHODS FOR MANAGING AN EMERGENCY SITUATION | 06-02-2016 |
20160180837 | SYSTEM AND METHOD OF SPEECH RECOGNITION | 06-23-2016 |
20160180846 | SPEECH RECOGNITION APPARATUS, VEHICLE INCLUDING THE SAME, AND METHOD OF CONTROLLING THE SAME | 06-23-2016 |
20160180848 | CUSTOMIZED VOICE ACTION SYSTEM | 06-23-2016 |
20160180851 | Systems and Methods for Continual Speech Recognition and Detection in Mobile Computing Devices | 06-23-2016 |
20160189708 | AUDIBLE PROXIMITY MESSAGING - Methods, systems, and computer program products for providing audible proximity messaging are disclosed. A computer-implemented method may include receiving a message for communication to one or more users, receiving a keyword associated with the message, analyzing an audio track to determine whether the keyword exists in the audio track, matching the keyword to the audio track, identifying one or more locations of the keyword in the audio track, converting the message to an audible format, determining whether to provide the message to a user based on one or more conditions associated with the user, and providing the message to a user when the keyword is played during the audio track. In some examples, the message may be an audio message played when the keyword plays in the audio track based on one or more of a user preference, a user location, a current user activity, and/or other factors. | 06-30-2016 |
20160379628 | DISCOVERING WINDOWS IN TEMPORAL PREDICATES - A method and system are provided. The method includes separating a predicate that specifies a set of events into a temporal part and a non-temporal part. The method further includes comparing the temporal part of the predicate against a predicate of a known window type. The method also includes determining whether the temporal part of the predicate matches the predicate of the known window type. The method additionally includes replacing (i) the non-temporal part of the predicate by a filter, and (ii) the temporal part of the predicate by an instance of the known window type, responsive to the temporal part of the temporal predicate matching the predicate of the known window type. The instance is parameterized with substitutions used to match the temporal part of the predicate to the predicate of the known window type. | 12-29-2016 |
20160379634 | CONTROL DEVICE, CONTROL METHOD, AND PROGRAM - A control device includes: a storage that stores a dialog model in which a question to a user, a reply candidate to the question from the user, and a control content of each electronic device are associated with an input query from the user; an acquirer that acquires environmental data in the surroundings of the user; a calculator that, based on the environmental data, calculates environment prediction data predicting the environment in the surroundings of the user after a predetermined period of time has elapsed in cases where each control content corresponding to the input query is executed; and a question selector that, based on the environment prediction data, selects the question corresponding to the control content that maximizes data indicative of the degree of comfort of the user's surrounding environment in cases where each control content is executed. | 12-29-2016 |
20160379635 | ACTIVATING SPEECH PROCESS - A method of processing received data representing speech comprises monitoring the received data to detect the presence of data representing a first portion of a trigger phrase in said received data. On detection of the data representing the first portion of the trigger phrase, a control signal is sent to activate a speech processing block. The received data is monitored to detect the presence of data representing a second portion of the trigger phrase in said received data. If the control signal to activate the speech processing block has previously been sent, then, on detection of the data representing the second portion of the trigger phrase, the activation of the speech processing block is maintained. | 12-29-2016 |
20170236510 | VOICE PROCESSING DEVICE | 08-17-2017 |
20180025728 | IMAGE DISPLAY APPARATUS AND METHOD OF CONTROLLING THE SAME | 01-25-2018 |
20190147851 | INFORMATION PROCESSING APPARATUS, INFORMATION PROCESSING SYSTEM, INFORMATION PROCESSING METHOD, AND STORAGE MEDIUM WHICH STORES INFORMATION PROCESSING PROGRAM THEREIN | 05-16-2019 |
20190147860 | METHOD AND APPARATUS FOR IDENTIFYING INFORMATION | 05-16-2019 |
20190147862 | METHOD AND APPARATUS FOR PROVIDING VOICE SERVICE | 05-16-2019 |
20190147867 | DIALOGUE SYSTEM AND METHOD FOR CONTROLLING THEREOF | 05-16-2019 |
20190147869 | VOICE INTERACTION METHOD AND APPARATUS, TERMINAL, SERVER AND READABLE STORAGE MEDIUM | 05-16-2019 |
20190147872 | INFORMATION PROCESSING DEVICE | 05-16-2019 |
20190147880 | INTELLIGENT LIST READING | 05-16-2019 |
20190147881 | INFORMATION PROCESSING DEVICE, RECEPTION DEVICE, AND INFORMATION PROCESSING METHOD | 05-16-2019 |
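Several entries above describe the same basic arbitration step: entry 20160111090 (hybridized ASR) presents the on-device result only when its confidence clears a threshold and otherwise falls back to the remote result, and entry 20160111084 applies a similar accumulation-based switch between acoustic models. A minimal sketch of that confidence-threshold pattern follows; all names, the threshold value, and the function signature are illustrative assumptions, not taken from the patents.

```python
# Hypothetical sketch of the confidence-threshold arbitration described in
# entry 20160111090: trust the vehicle-based (local) ASR hypothesis when its
# confidence is above a predetermined threshold, otherwise present the
# remotely-located ASR system's hypothesis instead.

CONFIDENCE_THRESHOLD = 0.85  # illustrative value; the patent leaves it unspecified

def select_asr_result(local_text: str, local_confidence: float,
                      remote_text: str) -> str:
    """Return the transcript to present to the vehicle occupant."""
    if local_confidence > CONFIDENCE_THRESHOLD:
        return local_text   # local result is confident enough to use
    return remote_text      # fall back to the server-side result

# A low-confidence local hypothesis defers to the remote one:
print(select_asr_result("call bob", 0.62, "call rob"))        # -> call rob
# A high-confidence local hypothesis is used directly:
print(select_asr_result("navigate home", 0.93, "navigate"))   # -> navigate home
```

The design choice this illustrates is latency-versus-accuracy: the local recognizer answers immediately, and the remote result is only needed when local confidence is low.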