Patent application number | Description | Published |
20080235164 | Apparatus, method and computer program product providing a hierarchical approach to command-control tasks using a brain-computer interface - Disclosed is a method, a computer program product, and a device that are responsive to detected mental states of a user to perform selection processes to execute a task. The method includes providing a hierarchical multi-level decision tree structure comprised of internal nodes and leaf nodes, where the decision tree structure represents a task. The method further includes navigating, using information derived from detected mental states of the user, through levels of the decision tree structure to reach a leaf node to accomplish the task. The step of navigating includes selecting, using the information derived from the detected mental states of the user, between attribute values associated with internal nodes of the decision tree structure. As non-limiting examples, the device may be a communication device, and the task may be a name dialing or a command/control task. | 09-25-2008 |
20080255827 | Voice Conversion Training and Data Collection - It may be desirable to provide a way to collect high quality speech training data without undue burden to the user. Speech training data may be collected during normal usage of a device. In this way, the collection of speech training data may be effectively transparent to the user, without the need for a distinct collection mode from the user's point of view. For example, where the device is or includes a phone (such as a cellular phone), when the user makes or receives a phone call to/from another party, speech training data may be automatically collected from one or both of the parties during the phone call. | 10-16-2008 |
20080262838 | Method, apparatus and computer program product for providing voice conversion using temporal dynamic features - An apparatus for providing voice conversion using temporal dynamic features includes a feature extractor and a transformation element. The feature extractor may be configured to extract dynamic feature vectors from source speech. The transformation element may be in communication with the feature extractor and configured to apply a first conversion function to a signal including the extracted dynamic feature vectors to produce converted dynamic feature vectors. The first conversion function may have been trained using at least dynamic feature data associated with training source speech and training target speech. The transformation element may be further configured to produce converted speech based on an output of applying the first conversion function. | 10-23-2008 |
20090094031 | Method, Apparatus and Computer Program Product for Providing Text Independent Voice Conversion - An apparatus for providing text independent voice conversion may include a first voice conversion model and a second voice conversion model. The first voice conversion model may be trained with respect to conversion of training source speech to synthetic speech corresponding to the training source speech. The second voice conversion model may be trained with respect to conversion to training target speech from synthetic speech corresponding to the training target speech. An output of the first voice conversion model may be communicated to the second voice conversion model to process source speech input into the first voice conversion model into target speech corresponding to the source speech as the output of the second voice conversion model. | 04-09-2009 |
20090157385 | Inverse Text Normalization - Embodiments are directed to efficient multilingual inverse text normalization (ITN) of text in spoken form to produce normalized text for display. Embodiments are directed to preprocessing the multilingual text into a language-independent representation, tokenizing text in spoken form, segmenting the tokenized text into ITN items by grouping consecutive words using an ITN lexicon, classifying the ITN items into ITN categories by using the ITN lexicon or tagged information from a language model, applying one or more ITN rules, selected based on the ITN categories into which the ITN items have been classified, to rewrite the ITN items, and post-processing the ITN items and outputting inversely normalized text in written form for display. The ITN lexicon may include ITN lexicon entries that are each located within an ITN category in the ITN lexicon. | 06-18-2009 |
20090171657 | Hybrid Approach in Voice Conversion - A hybrid approach is described for combining frequency warping and Gaussian Mixture Modeling (GMM) to achieve better speaker identity and speech quality. To train the voice conversion GMM model, line spectral frequency and other features are extracted from a set of source sounds to generate a source feature vector and from a set of target sounds to generate a target feature vector. The GMM model is estimated based on the aligned source feature vector and the target feature vector. A mixture-specific warping function is generated for each set of mixture mean pairs of the GMM model, and a warping function is generated based on a weighting of each of the mixture-specific warping functions. The warping function can be used to convert sounds received from a source speaker to approximate speech of a target speaker. | 07-02-2009 |
20090326945 | METHODS, APPARATUSES, AND COMPUTER PROGRAM PRODUCTS FOR PROVIDING A MIXED LANGUAGE ENTRY SPEECH DICTATION SYSTEM - An apparatus may include a processor configured to receive vocabulary entry data. The processor may be further configured to determine a class for the received vocabulary entry data. The processor may be additionally configured to identify one or more languages for the vocabulary entry data based upon the determined class. The processor may also be configured to generate a phoneme sequence for the vocabulary entry data for each identified language. Corresponding methods and computer program products are also provided. | 12-31-2009 |
20100088097 | USER FRIENDLY SPEAKER ADAPTATION FOR SPEECH RECOGNITION - Performance and user experience of speech recognition applications and systems may be improved by using, for example, offline adaptation without tedious effort by the user. Interactions with a user may be in the form of a quiz, game, or other scenario wherein the user may implicitly provide vocal input for adaptation data. Queries with a plurality of candidate answers may be designed in an optimal and efficient way and presented to the user, wherein detected speech from the user is then matched to one of the candidate answers and may be used to adapt an acoustic model to the particular speaker for speech recognition. | 04-08-2010 |
20100145699 | ADAPTATION OF AUTOMATIC SPEECH RECOGNITION ACOUSTIC MODELS - Methods and systems for adapting acoustic models are disclosed. A user terminal may determine a phoneme distribution of a text corpus, determine an acoustic model gain distribution of phonemes of an acoustic model before and after adaptation of the acoustic model, determine a desired phoneme distribution based on the phoneme distribution and the acoustic model gain distribution, generate an adaptation sentence based on the desired phoneme distribution, and generate a prompt requesting that a user speak the adaptation sentence. | 06-10-2010 |
20110320741 | METHOD AND APPARATUS PROVIDING FOR DIRECT CONTROLLED ACCESS TO A DYNAMIC USER PROFILE - An apparatus may include a profile determiner configured to determine a user profile. A contextual characteristic determiner may be configured to determine contextual characteristics relating to the apparatus and/or the user of the apparatus such that the profile determiner may infer user preferences and thereby create a dynamic portion of the user profile. An index builder may be configured to build an index of profile categories included within the user profile. A subscription registrar may cause the user profile to be registered for sharing with a service provider. Thereby a profile manager may provide for direct controlled access to the user profile which may be limited by user selection of permission levels and/or profile categories which are shared. Thereby access to the user profile may occur directly with the apparatus without storing the user profile on a separate server. | 12-29-2011 |
20140249815 | METHOD, APPARATUS AND COMPUTER PROGRAM PRODUCT FOR PROVIDING TEXT INDEPENDENT VOICE CONVERSION - An apparatus for providing text independent voice conversion may include a first voice conversion model and a second voice conversion model. The first voice conversion model may be trained with respect to conversion of training source speech to synthetic speech corresponding to the training source speech. The second voice conversion model may be trained with respect to conversion to training target speech from synthetic speech corresponding to the training target speech. An output of the first voice conversion model may be communicated to the second voice conversion model to process source speech input into the first voice conversion model into target speech corresponding to the source speech as the output of the second voice conversion model. | 09-04-2014 |