Patent application number | Description | Published |
20080312931 | SPEECH SYNTHESIS METHOD, SPEECH SYNTHESIS SYSTEM, AND SPEECH SYNTHESIS PROGRAM - A speech synthesis system stores a group of speech units in a memory, selects a plurality of speech units from the group based on prosodic information of target speech, the speech units selected corresponding to each of segments which are obtained by segmenting a phoneme string of the target speech and minimizing distortion of synthetic speech generated from the speech units selected to the target speech, generates a new speech unit corresponding to the each of the segments, by fusing the speech units selected, to obtain a plurality of new speech units corresponding to the segments respectively, and generates synthetic speech by concatenating the new speech units. | 12-18-2008 |
20090018836 | SPEECH SYNTHESIS SYSTEM AND SPEECH SYNTHESIS METHOD - In a speech synthesis, a selecting unit selects one string from first speech unit strings corresponding to a first segment sequence obtained by dividing a phoneme string corresponding to target speech into segments. The selecting unit performs repeatedly generating, based on maximum W second speech unit strings corresponding to a second segment sequence as a partial sequence of the first sequence, third speech unit strings corresponding to a third segment sequence obtained by adding a segment to the second sequence, and selecting maximum W strings from the third strings based on a evaluation value of each of the third strings. The value is obtained by correcting a total cost of each of the third string candidate with a penalty coefficient for each of the third strings. The coefficient is based on a restriction concerning quickness of speech unit data acquisition, and depends on extent in which the restriction is approached. | 01-15-2009 |
20090055158 | Speech translation apparatus and method - A speech translation apparatus includes a speech recognition unit configured to recognize input speech of a first language to generate a first text of the first language, an extraction unit configured to compare original prosody information of the input speech with first synthesized prosody information based on the first text to extract paralinguistic information about each of first words of the first text, a machine translation unit configured to translate the first text to a second text of a second language, a mapping unit configured to allocate the paralinguistic information about each of the first words to each of second words of the second text in accordance with synonymity, a generating unit configured to generate second synthesized prosody information based on the paralinguistic information allocated to each of the second words, and a speech synthesis unit configured to synthesize output speech based on the second synthesized prosody information. | 02-26-2009 |
20100049522 | VOICE CONVERSION APPARATUS AND METHOD AND SPEECH SYNTHESIS APPARATUS AND METHOD - A voice conversion apparatus stores, in a parameter memory, target speech spectral parameters of target speech, stores, in a voice conversion rule memory, a voice conversion rule for converting voice quality of source speech into voice quality of the target speech, extracts, from an input source speech, a source speech spectral parameter of the input source speech, converts extracted source speech spectral parameter into a first conversion spectral parameter by using the voice conversion rule, selects target speech spectral parameter similar to the first conversion spectral parameter from the parameter memory, generates an aperiodic component spectral parameter representing from selected target speech spectral parameter, mixes a periodic component spectral parameter included in the first conversion spectral parameter with the aperiodic component spectral parameter, to obtain a second conversion spectral parameter, and generates a speech waveform from the second conversion spectral parameter. | 02-25-2010 |
20110087488 | SPEECH SYNTHESIS APPARATUS AND METHOD - According to an embodiment, a speech synthesis apparatus includes a selecting unit configured to select speaker's parameters one by one for respective speakers and obtain a plurality of speakers' parameters, the speaker's parameters being prepared for respective pitch waveforms corresponding to speaker's speech sounds, the speaker's parameters including formant frequencies, formant phases, formant powers, and window functions concerning respective formants that are contained in the respective pitch waveforms. The apparatus includes a mapping unit configured to make formants correspond to each other between the plurality of speakers' parameters using a cost function based on the formant frequencies and the formant powers. The apparatus includes a generating unit configured to generate an interpolated speaker's parameter by interpolating, at desired interpolation ratios, the formant frequencies, formant phases, formant powers, and window functions of formants which are made to correspond to each other. | 04-14-2011 |
20140052446 | PROSODY EDITING APPARATUS AND METHOD - According to one embodiment, a prosody editing apparatus includes a storage, a first selection unit, a search unit, a normalization unit, a mapping unit, a display, a second selection unit, a restoring unit and a replacing unit. The search unit searches the storage for one or more second prosodic patterns corresponding to attribute information that matches attribute information of the selected phrase. The mapping maps each of the normalized second prosodic patterns on a low-dimensional space. The restoring unit restores a restored prosodic pattern according to the selected coordinates. The replacing unit replaces prosody of synthetic speech generated based on the selected phrase by the restored prosodic pattern. | 02-20-2014 |
20140052447 | SPEECH SYNTHESIS APPARATUS, METHOD, AND COMPUTER-READABLE MEDIUM - According to one embodiment, a speech synthesis apparatus is provided with generation, normalization, interpolation and synthesis units. The generation unit generates a first parameter using a prosodic control dictionary of a target speaker and one or more second parameters using a prosodic control dictionary of one or more standard speakers based on language information for an input text. The normalization unit normalizes the one or more second parameters based a normalization parameter. The interpolation unit interpolates the first parameter and the one or more normalized second parameters based on weight information to generate a third parameter and the synthesis unit generates synthesized speech using the third parameter. | 02-20-2014 |
Patent application number | Description | Published |
20090043568 | Accent information extracting apparatus and method thereof - An accent type is determined by outputting mora synchronized signals, extracting a pitch pattern which is a variation pattern of a voice height (fundamental frequency) from a speech signal entered by a user, generating mora synchronized pattern from the pitch pattern and the mora synchronized signal, storing typical patterns for respective accent types, collating the mora synchronized pattern and reference accent pattern, calculating matching of the mora synchronized patterns with respect to the respective accent types, referring the matching and determining the accent type. | 02-12-2009 |
20090055188 | PITCH PATTERN GENERATION METHOD AND APPARATUS THEREOF - The prosody control unit pattern generation module generates pitch patterns in respective prosody control units based on language attribute information, the phoneme duration and emphasis degree information, the modification method decision module decides a modification method by smoothing processing with respect to the pitch pattern in a connection portion between the prosody control unit and at least one of previous and next prosody control units based on at least emphasis degree information to generate modification method information, and the pattern connection module modifies pitch patterns generated in respective prosody control units by smoothing processing according to the modification method information and connects them to generate a sentence pitch pattern corresponding to a text to be a target for speech synthesis. | 02-26-2009 |
20090112580 | Speech processing apparatus and method of speech processing - The speech processing apparatus configured to split a first speech waveform and a second speech waveform into a plurality of frequency bands respectively to generate a first band speech waveform and a second band speech waveform each being a component of each frequency band; determine an overlap-added position between the first band speech waveform and the second band speech waveform by the each frequency band so that a high cross correlation between the first band speech waveform and the second band speech waveform is obtained; and overlap-add the first band speech waveform and the second band speech waveform by the each frequency band on the basis of the overlap-added position and integrates overlap-added band speech waveforms in the plurality of frequency bands over all the plurality of frequency bands to generate a concatenated speech waveform. | 04-30-2009 |
20090150157 | SPEECH PROCESSING APPARATUS AND PROGRAM - A word dictionary including sets of a character string which constitutes a word, a phoneme sequence which constitutes pronunciation of the word and a part of speech of the word is referenced, an entered text is analyzed, the entered text is divided into one or more subtexts, a phoneme sequence and a part of speech sequence are generated for each subtext, the part of speech sequence of the subtext and a list of part of speech sequence are collated to determine whether the phonetic sound of the subtext is to be converted or not, and the phonetic sounds of the phoneme sequence in the subtext whose phonetic sounds are determined to be converted are converted. | 06-11-2009 |
20090216537 | SPEECH SYNTHESIS APPARATUS AND METHOD THEREOF - A speech synthesis apparatus includes a text obtaining device that obtains text data for speech synthesis from the outside, a language processor that carries out morphological analysis/parsing to the text data, a prosodic processor that outputs, to a speech synthesizer, a synthesis unit string based on the prosodic and language related attributes of the text data such as accents and word classes, the speech synthesizer that generates synthesized speech from the synthesis unit string, and a speech waveform output device that reproduces a prescribed amount of output synthesized speech after it is accumulated or sequentially as it is output. | 08-27-2009 |
20100211392 | SPEECH SYNTHESIZING DEVICE, METHOD AND COMPUTER PROGRAM PRODUCT - The speech synthesizing device acquires numerical data at regular time intervals, each piece of the numerical data representing a value having a plurality of digits, detects a change between two values represented by the numerical data that is acquired at two consecutive times, determines which digit of the value represented by the numerical data is used to generate speech data depending on the detected change, generates numerical information that indicates the determined digit of the value represented by the numerical data, and generates speech data from the digit indicated by the numerical information. | 08-19-2010 |
20110246199 | SPEECH SYNTHESIZER - According to one embodiment, a speech synthesizer generates a speech segment sequence and synthesizes speech by connecting speech segments of the generated speech segment sequence. If a speech segment of a synthesized first speech segment sequence is different from the speech segment of a synthesized second speech segment sequence having the same synthesis unit as the first speech segment sequence, the speech synthesizer disables the speech segment of the first speech segment sequence that is different from the speech segment of the second speech segment sequence. | 10-06-2011 |
20120065981 | TEXT PRESENTATION APPARATUS, TEXT PRESENTATION METHOD, AND COMPUTER PROGRAM PRODUCT - According to an embodiment, a text presentation apparatus presenting text for a speaker to read aloud for voice recording includes: a text storing unit for storing first text; a presenting unit for presenting the first text; a determination unit for determining whether or not the first text needs to be replaced, on the basis of a speaker's input for the first text presented; a preliminary text storing unit for storing preliminary text; a select unit configured to select, if it is determined that the first text needs to be replaced, second text to replace the first text from among the preliminary text, the selecting being performed on the basis of attribute information describing an attribute of the first text and on the basis of at least one of attribute information describing pronunciation of the first text and attribute information describing a stress type of the first text; and a control unit configured to control the presenting unit so that the presenting unit presents the second text. | 03-15-2012 |
20120185244 | SPEECH PROCESSING DEVICE, SPEECH PROCESSING METHOD, AND COMPUTER PROGRAM PRODUCT - According to one embodiment, in a speech processing device, an extractor windows a part of the speech signal and extracts a partial waveform. A calculator performs frequency analysis of the partial waveform to calculate a frequency spectrum. An estimator generates an artificial waveform that is a waveform according to an interval between the pitch marks for each harmonic component having a frequency that is a predetermined multiple of a fundamental frequency of the speech signal and estimates harmonic spectral features representing characteristics of the frequency spectrum of the harmonic component from each of the artificial waveforms. A separator separates the partial waveform into a periodic component produced from periodic vocal-fold vibration as an acoustic source and an aperiodic component produced from aperiodic acoustic sources other than the vocal-fold vibration by using the respective harmonic spectral features and the frequency spectrum of the partial waveform. | 07-19-2012 |
Patent application number | Description | Published |
20090048844 | Speech synthesis method and apparatus - A phoneme sequence corresponding to a target speech is divided into a plurality of segments. A plurality of speech units for each segment is selected from a speech unit memory that stores speech units having at least one frame. The plurality of speech units has a prosodic feature accordant or similar to the target speech. A formant parameter having at least one formant frequency is generated for each frame of the plurality of speech units. A fused formant parameter of each frame is generated from formant parameters of each frame of the plurality of speech units. A fused speech unit of each segment is generated from the fused formant parameter of each frame. A synthesized speech is generated by concatenating the fused speech unit of each segment. | 02-19-2009 |
20090144053 | SPEECH PROCESSING APPARATUS AND SPEECH SYNTHESIS APPARATUS - An information extraction unit extracts spectral envelope information of L-dimension from each frame of speech data. The spectral envelope information does not have a spectral fine structure. A basis storage unit stores N bases (L>N>1). Each basis is differently a frequency band having a maximum as a peak frequency in a spectral domain having L-dimension. A value corresponding to a frequency outside the frequency band along a frequency axis of the spectral domain is zero. Two frequency bands of which two peak frequencies are adjacent along the frequency axis partially overlap. A parameter calculation unit minimizes a distortion between the spectral envelope information and a linear combination of each basis with a coefficient by changing the coefficient, and sets the coefficient of each basis from which the distortion is minimized to a spectral envelope parameter of the spectral envelope information. | 06-04-2009 |
20110238420 | METHOD AND APPARATUS FOR EDITING SPEECH, AND METHOD FOR SYNTHESIZING SPEECH - According to one embodiment, a method for editing speech is disclosed. The method can generate speech information from a text. The speech information includes phonologic information and prosody information. The method can divide the speech information into a plurality of speech units, based on at least one of the phonologic information and the prosody information. The method can search at least two speech units from the plurality of speech units. At least one of the phonologic information and the prosody information in the at least two speech units are identical or similar. In addition, the method can store a speech unit waveform corresponding to one of the at least two speech units as a representative speech unit into a memory. | 09-29-2011 |
20120239390 | APPARATUS AND METHOD FOR SUPPORTING READING OF DOCUMENT, AND COMPUTER READABLE MEDIUM - According to one embodiment, an apparatus for supporting reading of a document includes a model storage unit, a document acquisition unit, a feature information extraction, and an utterance style estimation unit. The model storage unit is configured to store a model which has trained a correspondence relationship between first feature information and an utterance style. The first feature information is extracted from a plurality of sentences in a training document. The document acquisition unit is configured to acquire a document to be read. The feature information extraction unit is configured to extract second feature information from each sentence in the document to be read. The utterance style estimation unit is configured to compare the second feature information of a plurality of sentences in the document to be read with the model, and to estimate an utterance style of the each sentence of the document to be read. | 09-20-2012 |
20130080155 | APPARATUS AND METHOD FOR CREATING DICTIONARY FOR SPEECH SYNTHESIS - Apparatus for creating a dictionary for speech synthesis includes a sentence storage unit configured to store N sentences, a sentence display unit configured to selectively display a first sentence which is one of the N sentences, a recording unit configured to record each user speech, a necessity determination unit configured to make a determination of whether to create the dictionary, a dictionary creation unit configured to create the dictionary by utilizing the user speech, and a speech synthesis unit configured to convert a second sentence to a synthesized speech with the dictionary. The determination unit makes the determination under a condition that the recording unit records the user speech of M first sentences (M is less than N) and the determination is based on at least one of an instruction from the user, M and an amount of the recorded user speech. | 03-28-2013 |
20130226584 | SPEECH SYNTHESIS APPARATUS AND METHOD - A speech synthesizing apparatus includes a selector configured to select a plurality of speech units for synthesizing a speech of an input phoneme sequence by referring to speech unit information stored in an information memory. Speech unit waveforms corresponding to the speech units are acquired from a plurality of speech unit waveforms stored in a waveform memory, and the speech is synthesized by concatenating the speech unit waveforms acquired. When acquiring the speech unit waveforms, at least two speech unit waveforms from a continuous region of the waveform memory are copied onto a buffer by one access, wherein a data quantity of the at least two speech unit waveforms is less than or equal to a size of the buffer. | 08-29-2013 |
20140180681 | SPEECH SYNTHESIS APPARATUS AND METHOD - A waveform memory that stores a plurality of speech unit waveforms corresponding to respective speech units, wherein an address order of the speech unit waveforms is determined by a sort order of speech units included in a speech unit sequence corresponding to a phoneme sequence of training data, and the speech units included in the speech unit sequence are selected so as to synthesize a speech of the phone sequence. | 06-26-2014 |