Class / Patent application number | Description | Number of patent applications / Date published |
704259000 | Neural network | 6 |
20080243510 | OVERLAPPING SCREEN READING OF NON-SEQUENTIAL TEXT - Embodiments of the present invention address deficiencies of the art in respect to screen reading non-sequential text and provide a method, system and computer program product for overlapping screen reading of non-sequential text, such as a tag cloud or Web page header. In an embodiment of the invention, an overlapping screen reading method for a non-sequential list of words can include computing different speech synthesis parameters for different words in a non-sequential list of words, generating different audio forms for each of the different words according to the different speech synthesis parameters, and overlappingly merging the generated different audio forms into a single audio stream. The speech synthesis parameters can include, for instance, separation, volume, tone and location speech synthesis parameters. Thereafter, the method can include playing back the single audio stream to simulate a natural visual scanning of the non-sequential list of words. | 10-02-2008 |
20140358546 | HYBRID PREDICTIVE MODEL FOR ENHANCING PROSODIC EXPRESSIVENESS - Systems and methods for prosody prediction include extracting features from runtime data using a parametric model. The features from runtime data are compared with features from training data using an exemplar-based model to predict prosody of the runtime data. The features from the training data are paired with exemplars from the training data and stored on a computer readable storage medium. | 12-04-2014 |
20140358547 | HYBRID PREDICTIVE MODEL FOR ENHANCING PROSODIC EXPRESSIVENESS - Systems and methods for prosody prediction include extracting features from runtime data using a parametric model. The features from runtime data are compared with features from training data using an exemplar-based model to predict prosody of the runtime data. The features from the training data are paired with exemplars from the training data and stored on a computer readable storage medium. | 12-04-2014 |
20150073804 | DEEP NETWORKS FOR UNIT SELECTION SPEECH SYNTHESIS - Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for selecting acoustic samples for unit selection speech synthesis. The methods, systems, and apparatus include actions of receiving target acoustic features output from a neural network that has been trained to predict acoustic features given linguistic features. Additional actions include determining a distance between the target acoustic features and acoustic features of a stored acoustic sample. Further actions include selecting the acoustic sample to be used in speech synthesis based at least on the determined distance and synthesizing speech based on the selected acoustic sample. | 03-12-2015 |
20150364127 | ADVANCED RECURRENT NEURAL NETWORK BASED LETTER-TO-SOUND - The technology relates to performing letter-to-sound conversion utilizing recurrent neural networks (RNNs). The RNNs may be implemented as RNN modules for letter-to-sound conversion. The RNN modules receive text input and convert the text to corresponding phonemes. In determining the corresponding phonemes, the RNN modules may analyze the letters of the text and the letters surrounding the text being analyzed. The RNN modules may also analyze the letters of the text in reverse order. The RNN modules may also receive contextual information about the input text. The letter-to-sound conversion may then also be based on the contextual information that is received. The determined phonemes may be utilized to generate synthesized speech from the input text. | 12-17-2015 |
20150364128 | HYPER-STRUCTURE RECURRENT NEURAL NETWORKS FOR TEXT-TO-SPEECH - The technology relates to converting text to speech utilizing recurrent neural networks (RNNs). The recurrent neural networks may be implemented as multiple modules for determining properties of the text. In embodiments, a part-of-speech RNN module, a letter-to-sound RNN module, a linguistic prosody tagger RNN module, and a context awareness and semantic mining RNN module may all be utilized. The properties from the RNN modules are processed by a hyper-structure RNN module that determines the phonetic properties of the input text based on the outputs of the other RNN modules. The hyper-structure RNN module may generate a generation sequence that is capable of being converted into audible speech by a speech synthesizer. The generation sequence may also be optimized by a global optimization module prior to being synthesized into audible speech. | 12-17-2015 |
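The overlapping screen-reading method in 20080243510 merges per-word audio, rendered with different synthesis parameters, into a single stream. A minimal sketch of that overlap-and-merge step, assuming each word has already been synthesized into a list of float samples with a hypothetical per-word gain and start offset (the function name `merge_overlapping` and the clip representation are illustrative, not from the patent):

```python
def merge_overlapping(clips):
    """Mix (samples, gain, start_offset) clips into one audio stream.

    Each clip is a tuple of (list of float samples, volume gain,
    start offset in samples). Overlapping regions are summed, which
    is what lets multiple words play back concurrently.
    """
    # Total stream length is set by whichever clip ends last.
    length = max(start + len(samples) for samples, _, start in clips)
    stream = [0.0] * length
    for samples, gain, start in clips:
        for i, sample in enumerate(samples):
            stream[start + i] += gain * sample
    return stream
```

In a fuller implementation the separation, tone, and location parameters from the abstract would also vary per word (e.g. stereo panning for location); here only volume and temporal separation are modeled.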
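The selection step in 20150073804 compares neural-network target acoustic features against stored samples by distance and picks the closest. A minimal sketch under the assumption that features are plain numeric vectors and the distance is Euclidean (the patent does not specify a metric; `select_sample` and the dict layout are hypothetical names for illustration):

```python
import math

def select_sample(target_features, stored_samples):
    """Return the stored sample whose acoustic features are nearest
    (Euclidean distance) to the network's predicted target features.

    stored_samples: list of dicts like {"id": ..., "features": [...]}
    """
    def distance(features):
        return math.sqrt(sum((t - f) ** 2
                             for t, f in zip(target_features, features)))
    return min(stored_samples, key=lambda s: distance(s["features"]))
```

A real unit-selection system would combine this target cost with a join cost between consecutive samples; the sketch shows only the single-sample distance criterion the abstract describes.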