Class / Patent application number | Description | Number of patent applications / Date published |
704266000 | Specialized model | 15 |
20080201150 | VOICE CONVERSION APPARATUS AND SPEECH SYNTHESIS APPARATUS - A conversion rule and a rule selection parameter are stored. The conversion rule converts a spectral parameter of a source speaker to a spectral parameter of a target speaker. The rule selection parameter represents the spectral parameter of the source speaker. A first conversion rule for the start timing and a second conversion rule for the end timing of a speech unit of the source speaker are selected by the spectral parameters of the start timing and the end timing. An interpolation coefficient corresponding to the spectral parameter of each timing in the speech unit is calculated from the first conversion rule and the second conversion rule. A third conversion rule corresponding to the spectral parameter of each timing in the speech unit is calculated by interpolating the first conversion rule and the second conversion rule with the interpolation coefficient. The spectral parameter of each timing is converted to a spectral parameter of the target speaker by the third conversion rule. A spectrum acquired from the spectral parameter of the target speaker is compensated by a spectral compensation quantity. A speech waveform is generated from the compensated spectrum. | 08-21-2008
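The interpolation scheme in the abstract above can be illustrated with a small sketch. It assumes each conversion rule is a linear map (a matrix `A` and bias `b`, a form the abstract does not specify) and that the interpolation coefficient is simply a frame's relative position in the unit, whereas the patent derives it from the spectral parameters themselves. All function names are invented for illustration.

```python
def apply_rule(rule, x):
    """Apply a linear conversion rule (A, b) to a spectral parameter vector x."""
    A, b = rule
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) + b_i
            for row, b_i in zip(A, b)]

def interpolate_rules(rule_start, rule_end, alpha):
    """Third conversion rule: elementwise blend of the first and second rules."""
    A0, b0 = rule_start
    A1, b1 = rule_end
    A = [[(1 - alpha) * a0 + alpha * a1 for a0, a1 in zip(r0, r1)]
         for r0, r1 in zip(A0, A1)]
    b = [(1 - alpha) * x0 + alpha * x1 for x0, x1 in zip(b0, b1)]
    return A, b

def convert_unit(frames, rule_start, rule_end):
    """Convert every frame of a speech unit with an interpolated third rule."""
    n = len(frames)
    out = []
    for t, x in enumerate(frames):
        # assumed coefficient: relative position of the frame in the unit
        alpha = t / (n - 1) if n > 1 else 0.0
        out.append(apply_rule(interpolate_rules(rule_start, rule_end, alpha), x))
    return out
```

With the start rule set to the identity and the end rule to a doubling map, the converted unit glides smoothly from the source values to twice the source values.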
20090313025 | Automatic Segmentation in Speech Synthesis - A method and system are disclosed that automatically segment speech to generate a speech inventory. The method includes initializing a Hidden Markov Model (HMM) using seed input data, performing a segmentation of the HMM into speech units to generate phone labels, and correcting the segmentation of the speech units. Correcting the segmentation of the speech units includes re-estimating the HMM based on a current version of the phone labels, embedded re-estimation of the HMM, and updating the current version of the phone labels using spectral boundary correction. The system includes modules configured to control a processor to perform steps of the method. | 12-17-2009
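A toy sketch of the spectral boundary correction step, the part of the abstract that updates the phone labels. It assumes frames are feature vectors and that boundary correction means snapping an HMM-derived boundary to the largest nearby frame-to-frame spectral change; this is an invented simplification, not the patent's actual procedure.

```python
import math

def spectral_change(features, t):
    """Frame-to-frame spectral distance at frame index t."""
    return math.dist(features[t - 1], features[t])

def correct_boundary(features, boundary, window=3):
    """Move a phone boundary to the frame with the largest spectral
    change within a small search window around the initial boundary."""
    lo = max(1, boundary - window)
    hi = min(len(features) - 1, boundary + window)
    return max(range(lo, hi + 1), key=lambda t: spectral_change(features, t))

# The abstract's outer loop would alternate: re-estimate the HMM from the
# current labels, re-segment, then apply a correction like this per label.
```

On a feature track that jumps from one spectral region to another at frame 5, a misplaced boundary at frame 4 or 7 is pulled back to the true change point.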
20090319275 | SPEECH SYNTHESIZING DEVICE, SPEECH SYNTHESIZING SYSTEM, LANGUAGE PROCESSING DEVICE, SPEECH SYNTHESIZING METHOD AND RECORDING MEDIUM - A speech synthesizing device includes: a text accepting unit for accepting text data; an extracting unit for extracting a special character including a pictographic character, a face mark or a symbol from text data accepted by the text accepting unit; a dictionary database in which a plurality of special characters and a plurality of phonetic expressions for each special character are registered; a selecting unit for selecting a phonetic expression of an extracted special character from the dictionary database when the extracting unit extracts the special character; a converting unit for converting the text data accepted by the accepting unit to a phonogram in accordance with a phonetic expression selected by the selecting unit in association with the extracted special character; and a speech synthesizing unit for synthesizing a voice from a phonogram obtained by the converting unit. | 12-24-2009
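A minimal sketch of the extract/select/convert pipeline described above. The dictionary entries and the default selection policy (first registered reading) are invented for illustration; the patent registers multiple phonetic expressions per special character and selects among them.

```python
# Hypothetical dictionary database: special character -> phonetic expressions.
SPECIAL_CHAR_DICT = {
    "(^_^)": ["smiling face", "happy face"],
    "(T_T)": ["crying face", "sad face"],
    "☆": ["star", "twinkle"],
}

def to_phonogram(text, choose=lambda readings: readings[0]):
    """Replace each registered special character with a selected phonetic
    expression, yielding text a speech synthesizing unit can pronounce."""
    for char, readings in SPECIAL_CHAR_DICT.items():
        if char in text:
            text = text.replace(char, " " + choose(readings) + " ")
    return " ".join(text.split())
```

Passing a different `choose` callable stands in for the selecting unit picking an alternative phonetic expression for the same face mark.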
20100049523 | SYSTEM AND METHOD FOR CONFIGURING VOICE SYNTHESIS - Systems and methods for providing synthesized speech in a manner that takes into account the environment where the speech is presented. A method embodiment includes: based on a listening environment and at least one other parameter, selecting an approach from a plurality of approaches for presenting synthesized speech in the listening environment; presenting synthesized speech according to the selected approach; and, based on natural language input received from a user indicating an inability to understand the presented synthesized speech, selecting a second approach from the plurality of approaches and presenting subsequent synthesized speech using the second approach. | 02-25-2010
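The select-then-fall-back loop above can be sketched in a few lines. The environment-to-approach table, the approach names, and the keyword test for "cannot understand" feedback are all assumptions made for illustration.

```python
# Hypothetical ranked approaches per listening environment.
APPROACH_BY_ENV = {
    "quiet": ["normal", "slower"],
    "noisy": ["louder", "slower_and_louder"],
    "mobile": ["shorter_prompts", "louder"],
}

CANNOT_UNDERSTAND = ("what", "pardon", "repeat", "understand")

class VoiceConfigurator:
    def __init__(self, environment):
        self.ranked = APPROACH_BY_ENV.get(environment, ["normal"])
        self.index = 0

    @property
    def approach(self):
        # current approach, clamped to the last available fallback
        return self.ranked[min(self.index, len(self.ranked) - 1)]

    def handle_user_input(self, utterance):
        # Natural-language feedback indicating the user could not understand
        # the presented speech triggers selection of the next approach.
        if any(word in utterance.lower() for word in CANNOT_UNDERSTAND):
            self.index += 1
        return self.approach
```

A noisy environment starts with the "louder" approach; a "can you repeat that?" from the user switches subsequent synthesis to the second approach.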
20100076768 | SPEECH SYNTHESIZING APPARATUS, METHOD, AND PROGRAM - Disclosed is a speech synthesizing apparatus including: a segment selection unit that selects a segment suited to a target segment environment from candidate segments; a prosody change amount calculation unit that calculates the prosody change amount of each candidate segment based on prosody information of the candidate segments and the target segment environment; a selection criterion calculation unit that calculates a selection criterion based on the prosody change amount; a candidate selection unit that narrows down selection candidates based on the prosody change amount and the selection criterion; and an optimum segment search unit that searches for an optimum segment from among the narrowed-down candidate segments. | 03-25-2010
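The four-stage pipeline above (change amount, criterion, narrowing, optimum search) can be sketched as follows. The prosody representation (a pitch/duration pair) and the criterion statistic (the median) are assumptions; the abstract does not specify either.

```python
def prosody_change(candidate, target):
    """Prosody change amount: absolute difference between a candidate's
    prosody (assumed (pitch_hz, duration_s) pair) and the target
    segment environment."""
    return sum(abs(c - t) for c, t in zip(candidate, target))

def select_segment(candidates, target):
    # 1. prosody change amount for each candidate segment
    changes = [prosody_change(c, target) for c in candidates]
    # 2. selection criterion computed from the change amounts (median here)
    criterion = sorted(changes)[len(changes) // 2]
    # 3. narrow down to candidates at or below the criterion...
    shortlist = [c for c, d in zip(candidates, changes) if d <= criterion]
    # 4. ...then search the narrowed set for the optimum segment
    return min(shortlist, key=lambda c: prosody_change(c, target))
```

Narrowing first keeps the final search cheap: the optimum search only visits candidates that already passed the criterion.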
20100145706 | Speech Synthesizing Device, Speech Synthesizing Method, and Program - An object of the present invention is to provide a device and a method for generating synthesized speech with an utterance form that matches music. A musical genre estimation unit of the speech synthesizing device estimates the musical genre to which a received music signal belongs, and an utterance form selection unit references an utterance form information storage unit to determine an utterance form from the musical genre. A prosody generation unit references a prosody generation rule storage unit, selected from among the prosody generation rule storage units according to the determined utterance form, to generate prosody for the synthesized speech. | 06-10-2010
20100305949 | SPEECH SYNTHESIS DEVICE, SPEECH SYNTHESIS METHOD, AND SPEECH SYNTHESIS PROGRAM - It is possible to provide a speech synthesis device, speech synthesis method, and speech synthesis program which can improve speech quality and reduce the amount of calculation with a preferable balance between the two. The speech synthesis device includes a sub-score calculation unit. | 12-02-2010
20110046957 | SYSTEM AND METHOD FOR SPEECH SYNTHESIS USING FREQUENCY SPLICING - Techniques are disclosed for frequency splicing in which speech segments used in the creation of a final speech waveform are constructed, at least in part, by combining (e.g., summing) a small number (e.g., two) of component speech segments that overlap substantially, or entirely, in time but have spectral energy that occupies disjoint, or substantially disjoint, frequency ranges. The component speech segments may be derived from speech segments produced by different speakers or from different speech segments produced by the same speaker. Depending on the embodiment, frequency splicing may supplement rule-based, concatenative, hybrid, or limited-vocabulary speech synthesis systems to provide various advantages. | 02-24-2011 |
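A numpy-free toy of the frequency-splicing idea above: two component segments that overlap in time but occupy disjoint frequency bands are combined by sample-wise summation. The naive DFT, the frame length, and the band edges are all assumptions for illustration.

```python
import cmath
import math

N = 64  # frame length (assumed)

def dft(x):
    """Naive discrete Fourier transform (kept dependency-free)."""
    return [sum(x[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N))
            for k in range(N)]

def idft(X):
    return [(sum(X[k] * cmath.exp(2j * math.pi * k * n / N)
                 for k in range(N)) / N).real for n in range(N)]

def band_limit(x, lo_bin, hi_bin):
    """Keep only DFT bins whose frequency index lies in [lo_bin, hi_bin]."""
    X = dft(x)
    return idft([X[k] if lo_bin <= min(k, N - k) <= hi_bin else 0.0
                 for k in range(N)])

def tone(freq_bin):
    return [math.cos(2 * math.pi * freq_bin * n / N) for n in range(N)]

# Component segments from two (hypothetical) speakers, overlapping in time.
speaker_a = [u + v for u, v in zip(tone(3), tone(20))]
speaker_b = [u + v for u, v in zip(tone(5), tone(17))]

# Frequency splicing: speaker A's low band plus speaker B's high band,
# summed sample-wise to form the final speech segment.
low_component = band_limit(speaker_a, 0, 8)
high_component = band_limit(speaker_b, 9, N // 2)
spliced = [a + b for a, b in zip(low_component, high_component)]
```

The spliced segment's spectrum contains speaker A's low-band content (bin 3) and speaker B's high-band content (bin 17), while each speaker's out-of-band content (bins 20 and 5 respectively) is absent.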
20110112840 | SYNTHETIC SOUND GENERATION METHOD AND APPARATUS - A synthetic sound generation method for generating a synthetic sound that, when heard, makes a listener recall the image of an actual sound signal (a sound signal other than a speech signal, of which the listener knows what sound it is), comprising the steps of: extracting a signal of a predetermined frequency band from an inputted speech signal; extracting an amplitude envelope curve component of the extracted signal; extracting a signal of a predetermined frequency band from the actual sound signal; and multiplying the amplitude envelope curve component of the inputted speech signal by the extracted predetermined frequency band signal of the actual sound signal. | 05-12-2011
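A small sketch of the final multiplication step: the speech band's amplitude envelope modulates the band-limited actual sound. Extracting the envelope via rectification plus a moving average is an assumed simplification of the abstract's "amplitude envelope curve component".

```python
def envelope(signal, window=8):
    """Amplitude envelope curve of a signal: moving average of the
    rectified samples (an assumed simplification of the extraction step)."""
    n = len(signal)
    env = []
    for i in range(n):
        lo, hi = max(0, i - window), min(n, i + window + 1)
        env.append(sum(abs(s) for s in signal[lo:hi]) / (hi - lo))
    return env

def modulate(speech_band, actual_band):
    """Multiply the speech band's amplitude envelope by the band-limited
    actual sound signal, sample by sample."""
    return [e * a for e, a in zip(envelope(speech_band), actual_band)]
```

The result carries the actual sound's spectral content shaped by the speech signal's loudness contour, which is what lets the listener recall the actual sound while "hearing" the speech rhythm.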
20140142946 | SYSTEM AND METHOD FOR VOICE TRANSFORMATION - The present invention is a method and system to convert a speech signal into a parametric representation in terms of timbre vectors, and to recover the speech signal therefrom. The speech signal is first segmented into non-overlapping frames using glottal closure instant information, each frame is converted into an amplitude spectrum using a Fourier analyzer, and Laguerre functions are then used to generate a set of coefficients which constitute a timbre vector. A sequence of timbre vectors can be subject to a variety of manipulations. The new timbre vectors are converted back into voice signals by first transforming them into amplitude spectra using Laguerre functions, then generating phase spectra from the amplitude spectra using the Kramers-Kronig relations. A Fourier transformer converts the amplitude spectra and phase spectra into elementary waveforms, which are then superposed to become the output voice. The method and system can be used for voice transformation, speech synthesis, and automatic speech recognition. | 05-22-2014
20150095035 | WIDEBAND SPEECH PARAMETERIZATION FOR HIGH QUALITY SYNTHESIS, TRANSFORMATION AND QUANTIZATION - A method for speech parameterization and coding of a continuous speech signal. The method comprises dividing said speech signal into a plurality of speech frames, and for each one of the plurality of speech frames, modeling said speech frame by a first harmonic modeling to produce a plurality of harmonic model parameters, reconstructing an estimated frame signal from the plurality of harmonic model parameters, subtracting the estimated frame signal from the speech frame to produce a first harmonic model residual, performing at least one second harmonic modeling analysis on the first harmonic model residual to determine at least one set of second harmonic model components, removing the at least one set of second harmonic model components from the first harmonic model residual to produce a harmonically-filtered residual signal, and processing the harmonically-filtered residual signal with analysis-by-synthesis techniques to produce vectors of codebook indices and corresponding gains. | 04-02-2015
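The model/reconstruct/subtract loop above can be sketched for the first harmonic modeling stage. This toy assumes the fundamental falls on an integer DFT bin so the cosine/sine basis is orthogonal and the least-squares fit reduces to scaled inner products; the patent's actual analysis is not specified at this level of detail.

```python
import math

def harmonic_model(frame, f0_bin, n_harmonics):
    """First harmonic modeling: least-squares cos/sin amplitudes at
    integer multiples of the fundamental frequency bin."""
    N = len(frame)
    params = []
    for h in range(1, n_harmonics + 1):
        a = 2.0 / N * sum(x * math.cos(2 * math.pi * h * f0_bin * n / N)
                          for n, x in enumerate(frame))
        b = 2.0 / N * sum(x * math.sin(2 * math.pi * h * f0_bin * n / N)
                          for n, x in enumerate(frame))
        params.append((a, b))
    return params

def reconstruct(params, f0_bin, N):
    """Estimated frame signal from the harmonic model parameters."""
    return [sum(a * math.cos(2 * math.pi * h * f0_bin * n / N) +
                b * math.sin(2 * math.pi * h * f0_bin * n / N)
                for h, (a, b) in enumerate(params, start=1))
            for n in range(N)]

def harmonic_residual(frame, params, f0_bin):
    """Subtract the estimated frame signal to obtain the model residual."""
    est = reconstruct(params, f0_bin, len(frame))
    return [x - y for x, y in zip(frame, est)]
```

For a frame that truly is a sum of harmonics, the recovered amplitudes match the generating ones and the residual energy is negligible; real speech leaves a residual that the second-stage analysis then processes.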
20150127349 | Method and System for Cross-Lingual Voice Conversion - A method and system are disclosed for cross-lingual voice conversion. A speech-to-speech system may include hidden Markov model (HMM) based speech modeling for both recognizing input speech and synthesizing output speech. A cross-lingual HMM may be initially set to an output HMM trained with a voice of an output speaker in an output language. An auxiliary HMM may be trained with a voice of an auxiliary speaker in an input language. A matching procedure, carried out under a transform that compensates for speaker differences, may be used to match each HMM state of the output HMM to an HMM state of the auxiliary HMM. The HMM states of the cross-lingual HMM may be replaced with the matched states. Transforms may be applied to adapt the cross-lingual HMM to the voices of the auxiliary speaker and of an input speaker. The cross-lingual HMM may be used for speech synthesis. | 05-07-2015
20150127350 | Method and System for Non-Parametric Voice Conversion - A method and system are disclosed for non-parametric voice conversion. A text-to-speech (TTS) synthesis system may include hidden Markov model (HMM) based speech modeling for synthesizing output speech. A converted HMM may be initially set to a source HMM trained with a voice of a source speaker. A parametric representation of speech may be extracted from speech of a target speaker to generate a set of target-speaker vectors. A matching procedure, carried out under a transform that compensates for speaker differences, may be used to match each HMM state of the source HMM to a target-speaker vector. The HMM states of the converted HMM may be replaced with the matched target-speaker vectors. Transforms may be applied to further adapt the converted HMM to the voice of the target speaker. The converted HMM may be used to synthesize speech with voice characteristics of the target speaker. | 05-07-2015
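The matching procedure in this and the preceding entry can be sketched as nearest-neighbor matching of state mean vectors. Per-dimension mean/variance normalization stands in here for the speaker-compensating transform; the actual transform used in the patents is not specified in the abstracts.

```python
import math

def normalize(vectors):
    """Per-dimension zero-mean, unit-variance scaling: a simple stand-in
    for a transform that compensates for speaker differences."""
    dims = list(zip(*vectors))
    means = [sum(d) / len(d) for d in dims]
    stds = [max(math.sqrt(sum((x - m) ** 2 for x in d) / len(d)), 1e-12)
            for d, m in zip(dims, means)]
    return [[(x - m) / s for x, m, s in zip(v, means, stds)] for v in vectors]

def match_states(source_state_means, target_vectors):
    """For each source HMM state (represented by its mean vector), return
    the index of the nearest target-speaker vector after normalization."""
    ns = normalize(source_state_means)
    nt = normalize(target_vectors)
    return [min(range(len(nt)), key=lambda j: math.dist(s, nt[j]))
            for s in ns]
```

Because both sets are normalized first, a target speaker whose vectors are merely shifted and scaled versions of the source states still matches state-for-state.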
20150348535 | METHOD FOR FORMING THE EXCITATION SIGNAL FOR A GLOTTAL PULSE MODEL BASED PARAMETRIC SPEECH SYNTHESIS SYSTEM - A method is presented for forming the excitation signal for a glottal pulse model based parametric speech synthesis system. In one embodiment, fundamental frequency values are used to form the excitation signal. The excitation is modeled using a voice source pulse selected from a database of a given speaker. The voice source signal is segmented into glottal segments, which are used in vector representation to identify the glottal pulse used for formation of the excitation signal. Use of a novel distance metric and preserving the original signals extracted from the speaker's voice samples helps capture low frequency information of the excitation signal. In addition, segment edge artifacts are removed by applying a unique segment joining method to improve the quality of synthetic speech while creating a true representation of the voice quality of a speaker. | 12-03-2015
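A minimal sketch of forming an excitation signal from fundamental frequency values: one selected glottal pulse is placed per pitch period and overlap-added. Windowing the pulse before summation stands in for the patent's segment-joining step, since it tapers segment edges so adjacent pulses blend without clicks; the pulse selection by distance metric is omitted here.

```python
import math

def hann(n, length):
    """Hann window sample (assumes length > 1)."""
    return 0.5 - 0.5 * math.cos(2 * math.pi * n / (length - 1))

def excitation(pulse, f0_values, sample_rate):
    """Pitch-synchronous overlap-add of one windowed glottal pulse
    per pitch period, driven by the fundamental frequency values."""
    P = len(pulse)
    windowed = [pulse[n] * hann(n, P) for n in range(P)]
    periods = [round(sample_rate / f0) for f0 in f0_values]
    out = [0.0] * (sum(periods) + P)  # room for the last pulse's tail
    pos = 0
    for T in periods:
        for n in range(P):
            out[pos + n] += windowed[n]
        pos += T
    return out
```

Varying the `f0_values` sequence changes the pulse spacing and therefore the perceived pitch, while the pulse shape itself carries the speaker's voice quality.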
20160189705 | QUANTITATIVE F0 CONTOUR GENERATING DEVICE AND METHOD, AND MODEL LEARNING DEVICE AND METHOD FOR F0 CONTOUR GENERATION - [Object] An object is to provide an F0 contour synthesizing device based on a statistical model that clarifies the correspondence between linguistic information and the F0 contour while maintaining accuracy. | 06-30-2016