Entries |
Document | Title | Date |
20080243508 | Prosody-pattern generating apparatus, speech synthesizing apparatus, and computer program product and method thereof - Normalization parameters are generated at a normalization-parameter generating unit by calculating the mean values and the standard deviations of an initial prosody pattern and a prosody pattern of a training sentence of a speech corpus. Then, the variance range or variance width of the initial prosody pattern is normalized at the prosody-pattern normalizing unit in accordance with the normalization parameters. As a result, a prosody pattern that is similar to human speech and improved in naturalness can be generated with a small amount of calculation. | 10-02-2008 |
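The normalization described in this abstract amounts to matching the mean and variance width of the initial prosody pattern to those measured from the training corpus. A minimal sketch follows; the function names and list-based representation are illustrative, not the patented implementation:

```python
# Illustrative sketch: rescale an initial prosody pattern so its mean
# and variance width match those of a training-corpus prosody pattern.

def normalization_params(pattern):
    """Return (mean, standard deviation) of a prosody pattern."""
    n = len(pattern)
    mean = sum(pattern) / n
    var = sum((x - mean) ** 2 for x in pattern) / n
    return mean, var ** 0.5

def normalize(initial, corpus):
    """Shift and scale `initial` so it has the mean and variance
    width of `corpus` (the normalization parameters)."""
    m_i, s_i = normalization_params(initial)
    m_c, s_c = normalization_params(corpus)
    scale = s_c / s_i if s_i else 1.0
    return [m_c + (x - m_i) * scale for x in initial]
```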
20080243509 | Speech module - A speech module ( | 10-02-2008 |
20080275705 | Battery Tester with Wireless Voice Status Messages - Methods and systems are disclosed for providing wireless data transfer and voice messages in a voltage measurement device. In some embodiments, the methods and systems dynamically construct voice messages that substantially correspond to text messages displayed on a display of the measurement device. The voice messages may be assembled from words and phrases that have been prerecorded and stored in a voice module of the measurement device. A wireless communication module transmits the voice messages from the voice module to a wireless receiver that may be worn or carried by the operator. The wireless communication module also facilitates wireless transfer of data from the measurement device to a computer. Such an arrangement allows an operator to conduct tests in noisy, cramped, and/or hazardous environments without having to divert his/her eyes to see or strain his/her ears to hear the measurement device. | 11-06-2008 |
20080281597 | Information processing system and storage medium storing information processing program - A plurality of input devices each includes a speaker, operation data transmitting means, voice data receiving means, and voice controlling means. An information processing apparatus includes voice storing means, object displaying means, operation data acquiring means, pointing position determining means, object specifying means, voice reading means, and voice data transmitting means. The pointing position determining means specifies, for each of the input devices, a pointing position on a screen based on operation data transmitted from the operation data transmitting means. The voice reading means reads voice data corresponding to the pointing position for each of the input devices. The voice data transmitting means transmits the voice data to each of the input devices. The voice controlling means outputs voice from the speaker based on the voice data. | 11-13-2008 |
20080319752 | SPEECH SYNTHESIZER GENERATING SYSTEM AND METHOD THEREOF - A speech synthesizer generating system and a method thereof are provided. A speech synthesizer generator in the speech synthesizer generating system automatically generates a speech synthesizer conforming to a speech output specification input by a user. In addition, a recording script is automatically generated by a recording script generator in the speech synthesizer generating system according to the speech output specification, and a customized or expanded speech material is recorded according to the recording script. After the speech material is uploaded to the speech synthesizer generating system, the speech synthesizer generator automatically generates a speech synthesizer conforming to the speech output specification. The speech synthesizer then synthesizes and outputs a speech output at a user end. | 12-25-2008 |
20090037177 | METHOD AND DEVICE FOR PROVIDING 3D AUDIO WORK - A method for providing a 3D audio work includes providing a one-ear HRTF filter and a related function synthesizer storing a related function therein, and inputting sound signals into the one-ear HRTF filter. The sound signals are converted into one-ear output sound signals which are received by one ear and synthesized to output sound signals for the other ear. A method for providing the related function includes inputting sound signals into HRTF filters of opposite ears and obtaining output sound signals which respectively act as raw signals and target signals. The raw signals are synthesized by a synthesizer to output sound signals which are compared with the target signals. A related function registered in the synthesizer is accordingly regulated so as to obtain the related function which satisfies a minimum difference between the output sound signals from the synthesizer and the target signals. | 02-05-2009 |
20090132252 | Unsupervised Topic Segmentation of Acoustic Speech Signal - Disclosed methods and apparatus segment a signal, such as an acoustic speech signal, into coherent segments, such as coherent topics. In the case of an acoustic speech signal, the segmentation relies on only raw acoustic information and may be performed without requiring access to, or generation of, a transcript of the acoustic speech signal. Recurring acoustic patterns are found by matching pairs of sounds, based on acoustic similarity. Information about distributional similarity from multiple local comparisons is aggregated and is further processed to fill gaps in the data by growing regions that represent recurring acoustic patterns. Selection criteria are used to identify coherent topics represented by the grown regions and topic boundaries therebetween. Another signal, such as a video signal, may be partitioned according to topic boundaries identified in an acoustic speech signal that is related to the video signal. Other (non-acoustic) one-dimensional signals, such as electrocardiogram (EKG) signals, may be automatically segmented into parts, such as parts that relate to normal and to abnormal heart beats. | 05-21-2009 |
20090132253 | Context-aware unit selection - Methods and apparatuses to perform context-aware unit selection for natural language processing are described. Streams of information associated with input units are received. The streams of information are analyzed in a context associated with first candidate units to determine a first set of weights of the streams of information. A first candidate unit is selected from the first candidate units based on the first set of weights of the streams of information. The streams of information are analyzed in the context associated with second candidate units to determine a second set of weights of the streams of information. A second candidate unit is selected from the second candidate units to concatenate with the first candidate unit based on the second set of weights of the streams of information. | 05-21-2009 |
20090138267 | Audio Coding System Using Temporal Shape of a Decoded Signal to Adapt Synthesized Spectral Components - A receiver in an audio coding system receives a signal conveying frequency subband signals representing an audio signal. The subband signals are examined to assess one or more characteristics of the audio signal including temporal shape. Spectral components are synthesized having the one or more assessed characteristics, integrated with the subband signals and passed through a synthesis filterbank to generate an output signal. | 05-28-2009 |
20090157406 | Acoustic Signal Transmission Method And Acoustic Signal Transmission Apparatus - The acoustic signal transmission method is based on generating a synthesized sound electrical signal by electrically synthesizing an audible sound signal and another signal different from the audible sound signal at the sending side, transmitting the synthesized sound electrical signal, and extracting the other signal from the synthesized sound electrical signal at the receiving side. Here, generation of the synthesized sound electrical signal is made by using a data hiding technique, for example. Accordingly, the acoustic signal represented by the synthesized sound electrical signal can be heard by human ears in the same way as the audible sound signal, and the synthesized other signal cannot be detected by human ears. The synthesized sound electrical signal can be transmitted as a sound wave in air space, as an electrical signal through a transmission line, or as radio signals such as infrared and electromagnetic waves. It is also possible to transport or distribute the synthesized sounds by recording on recording media such as compact discs and DVDs. Also, the signal extracted from the synthesized sound electrical signal enables applications such as controlling machines such as robots and transmitting text data such as information for car navigation systems, computer network addresses, and commercial business information. | 06-18-2009 |
20090164219 | Accelerometer-Based Control of Wearable Devices - Accelerometer-based orientation and/or movement detection for controlling wearable devices, such as wrist-worn audio recorders and wristwatches. A wrist-worn audio recorder can use an accelerometer to detect the orientation and/or movement of a user's wrist and subsequently activate a corresponding audio-recorder function, for instance recording or playback. A wearable device with a vibration mechanism can use this method to remind a user of an undesirable movement such as restless leg movement. Likewise, a talking wristwatch can use this method to activate audio reporting of time when a user moves or orients his or her wrist in close proximity to his or her ear. In such applications, and many others, accelerometer-based control of the wearable device offers significant advantages over conventional means of control, particularly in terms of ease of use and durability. | 06-25-2009 |
20090171665 | METHOD AND APPARATUS FOR CREATING AND MODIFYING NAVIGATION VOICE SYNTAX - Techniques are described for enabling flexible and dynamic creation and/or modification of voice data for a position-determining device. In some embodiments, a voice package is provided that includes a language database and a plurality of audio files. The language database specifies appropriate syntax and vocabulary for information that is intended for audio output by a position-determining device. The audio files include words and/or phrases that may be accessed by the position-determining device to communicate the information via audible output. Some embodiments utilize a voice package toolkit to construct and/or customize one or more parts of a voice package. | 07-02-2009 |
20090281808 | VOICE DATA CREATION SYSTEM, PROGRAM, SEMICONDUCTOR INTEGRATED CIRCUIT DEVICE, AND METHOD FOR PRODUCING SEMICONDUCTOR INTEGRATED CIRCUIT DEVICE - A voice data creation system includes a dictionary data memory section that stores dictionary data for generating synthesized voice data corresponding to text data; an edition processing section that displays an edition screen for editing a voice guidance message as a sentence including a plurality of phrases to receive edition input information so as to perform an edition processing based on the edition input information; a list information generation processing section that generates list information relating to each sentence and phrases included in the each sentence based on a result of the edition processing; a phrase voice data generating section that determines a target phrase for voice data creation based on the list information to generate and maintain voice data corresponding to the target phrase determined for voice data creation based on the dictionary data; and a memory write information generating section that determines a target phrase to be stored in a voice data memory based on the list information to generate memory write information including voice data of the target phrase determined to be stored in the memory.
In the system, the edition processing section divides a sentence input on the edition screen into a plurality of phrases based on text data of the sentence input; the list information generation processing section specifies the phrases included in the sentence and a reproduction order of the phrases based on a result of the sentence division to generate sentence information including phrase specification information of the phrases included in the sentence and sequence information relating to the reproduction order of the phrases; the phrase voice data generating section generates synthesized voice data corresponding to text data of the target phrase for voice data creation based on the dictionary data; and the memory write information generating section determines the target phrase to be stored in the memory such that voice data of a phrase used commonly in a plurality of sentences and a phrase used a plurality of times in a single sentence are not duplicately stored. | 11-12-2009 |
20090306985 | SYSTEM AND METHOD FOR SYNTHETICALLY GENERATED SPEECH DESCRIBING MEDIA CONTENT - Disclosed herein are systems, methods, and computer-readable media for providing an automatic synthetically generated voice describing media content, the method comprising receiving one or more pieces of metadata for a primary media content, selecting at least one piece of metadata for output, and outputting the at least one piece of metadata as synthetically generated speech with the primary media content. Other aspects of the invention involve alternative output, such as outputting speech simultaneously with the primary media content, outputting speech during gaps in the primary media content, translating metadata in a foreign language, and tailoring voice, accent, and language to match the metadata and/or primary media content. A user may control output via a user interface, or output may be customized based on preferences in a user profile. | 12-10-2009 |
20100082344 | SYSTEMS AND METHODS FOR SELECTIVE RATE OF SPEECH AND SPEECH PREFERENCES FOR TEXT TO SPEECH SYNTHESIS - Algorithms for synthesizing speech used to identify media assets are provided. Speech may be selectively synthesized from text strings associated with media assets. A text string may be normalized and its native language determined for obtaining a target phoneme for providing human-sounding speech in a language (e.g., dialect or accent) that is familiar to a user. The algorithms may be implemented on a system including several dedicated render engines. The system may be part of a back end coupled to a front end including storage for media assets and associated synthesized speech, and a request processor for receiving and processing requests that result in providing the synthesized speech. The front end may communicate media assets and associated synthesized speech content over a network to host devices coupled to portable electronic devices on which the media assets and synthesized speech are played back. | 04-01-2010 |
20100094631 | APPARATUS AND METHOD FOR SYNTHESIZING AN OUTPUT SIGNAL - An apparatus for synthesizing a rendered output signal having a first audio channel and a second audio channel includes a decorrelator stage for generating a decorrelator signal based on a downmix signal, and a combiner for performing a weighted combination of the downmix signal and a decorrelated signal based on parametric audio object information, downmix information and target rendering information. The combiner solves the problem of optimally combining matrixing with decorrelation for a high quality stereo scene reproduction of a number of individual audio objects using a multichannel downmix. | 04-15-2010 |
20100145701 | USER VOICE MIXING DEVICE, VIRTUAL SPACE SHARING SYSTEM, COMPUTER CONTROL METHOD, AND INFORMATION STORAGE MEDIUM - A sensation of presence of voice chat in a virtual space is enhanced. A user speech synthesizer is used in a virtual space sharing system where information processing devices share the virtual space. The user speech synthesizer comprises a speech data acquiring section ( | 06-10-2010 |
20100145702 | ASSOCIATION OF CONTEXT DATA WITH A VOICE-MESSAGE COMPONENT - Disclosed are a system, method, and article of manufacture for associating context data with a voice-message component. The context data may be encoded into a voice message signal. The context data may be associated with the voice-message component according to an attribute of the voice message. The attribute of the voice message may include at least one of a word, a phrase, a voice timbre, a duration of a pause between two words, a volume of a voice, and an ambient sound. The context data may be selected according to a meaning of the attribute of the voice-message component. | 06-10-2010 |
20100153113 | AUTOMATIC CREATION AND TRANSMISSION OF DATA ORIGINATING FROM ENTERPRISE INFORMATION SYSTEMS AS AUDIO PODCASTS - A system and method for the automatic creation and transmission of data and information originating from enterprise information systems, such as ERP, CRM, SRM, and so on, as audio podcasts, for playback on portable media players, mobile devices or personal computers. | 06-17-2010 |
20100286986 | Methods and Apparatus for Rapid Acoustic Unit Selection From a Large Speech Corpus - A speech synthesis system can select recorded speech fragments, or acoustic units, from a very large database of acoustic units to produce artificial speech. The selected acoustic units are chosen to minimize a combination of target and concatenation costs for a given sentence. However, as concatenation costs, which are measures of the mismatch between sequential pairs of acoustic units, are expensive to compute, processing can be greatly reduced by pre-computing and caching the concatenation costs. Unfortunately, the number of possible sequential pairs of acoustic units makes such caching prohibitive. However, statistical experiments reveal that while about 85% of the acoustic units are typically used in common speech, less than 1% of the possible sequential pairs of acoustic units occur in practice. A method for constructing an efficient concatenation cost database is provided by synthesizing a large body of speech, identifying the acoustic unit sequential pairs generated and their respective concatenation costs, and storing those concatenation costs likely to occur. By constructing a concatenation cost database in this fashion, the processing power required at run-time is greatly reduced with negligible effect on speech quality. | 11-11-2010 |
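The caching strategy in this abstract exploits the observation that fewer than 1% of possible unit pairs ever occur: precompute costs only for pairs observed in synthesized training speech, and fall back to run-time computation for the rest. A sketch, with the cost function and class names purely illustrative:

```python
# Illustrative sparse concatenation-cost cache: only unit pairs that
# actually occurred in a synthesized training corpus are precomputed;
# unseen pairs fall back to on-the-fly computation.

def concat_cost(unit_a, unit_b):
    # Stand-in mismatch measure; a real system would compare spectra,
    # pitch, and energy at the join point between the two units.
    return abs(hash(unit_a) - hash(unit_b)) % 100

class ConcatCostCache:
    def __init__(self, observed_pairs):
        # Precompute costs only for pairs seen in training speech.
        self.cache = {p: concat_cost(*p) for p in observed_pairs}

    def cost(self, unit_a, unit_b):
        # Cheap lookup for common pairs, full computation otherwise.
        return self.cache.get((unit_a, unit_b),
                              concat_cost(unit_a, unit_b))
```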
20110046954 | PORTABLE AUDIO CONTROL SYSTEM AND AUDIO CONTROL DEVICE THEREOF - A portable audio control system that controls an audio signal transmitted from an electronic device, including an earphone device and an audio control device. The audio control device includes an audio source receiver, a signal synthesis module, and an audio output unit. The audio source receiver, which is connected with the electronic device, is used for receiving the audio signal. The signal synthesis module receives both the audio signal and a voice signal coming from an external audio source, and then synthesizes those signals. The audio output unit outputs the synthesized sound to the earphone device. When users utilize the portable audio control system to connect with the electronic device, both the sound from the electronic device and the external voice or song can be heard at the same time. | 02-24-2011 |
20110054902 | SINGING VOICE SYNTHESIS SYSTEM, METHOD, AND APPARATUS - A singing voice synthesis system is provided. The storage unit stores at least one tune. The tempo unit provides a set of tempo cues in accordance with a selected tune from the at least one tune. The input unit receives a plurality of original voice signals corresponding to the selected tune. The processing unit processes the original voice signals and generates a synthesized singing voice signal according to the selected tune. | 03-03-2011 |
20110066438 | CONTEXTUAL VOICEOVER - A method for providing voice feedback with playback of media on an electronic device is provided. In one embodiment, the method may include determining one or more characteristics of the media with which the voice feedback is associated. For instance, the media may include a song, and the determined characteristics could include one or more of genre, reverberation, pitch, balance, timbre, tempo, or the like. The method may also include processing the voice feedback to alter characteristics thereof based on the one or more determined characteristics of the associated media. Additional methods, devices, and manufactures are also disclosed. | 03-17-2011 |
20110093272 | MEDIA PROCESS SERVER APPARATUS AND MEDIA PROCESS METHOD THEREFOR - A media process server apparatus has a speech synthesis data storage device for storing, after categorizing into emotions, data for speech synthesis in association with a user identifier, a text analyzer for determining, from a text message received from a message server apparatus, emotion of text, and a speech data synthesizer for generating speech data with emotional expression by synthesizing speech corresponding to the text, using data for speech synthesis that corresponds to the determined emotion and that is in association with a user identifier of a user who is a transmitter of the text message. | 04-21-2011 |
20110119061 | METHOD AND SYSTEM FOR DIALOG ENHANCEMENT - A method and system for enhancing dialog determined by an audio input signal. In some embodiments the input signal is a stereo signal, and the system includes an analysis subsystem configured to analyze the stereo signal to generate filter control values, and a filtering subsystem including upmixing circuitry configured to upmix the input signal to generate a speech channel and non-speech channels and a peaking filter configured to filter the speech channel to enhance dialog while being steered by at least one of the control values. The filtering subsystem also includes ducking circuitry for attenuating the non-speech channels while being steered by at least some of the control values, and downmixing circuitry configured to combine outputs of the peaking filter and ducking circuitry to generate a filtered stereo output. In some embodiments, the system is configured to downmix a multichannel input signal to generate a downmixed stereo signal, an analysis subsystem is configured to analyze the downmixed stereo signal to generate filter control values, and a filtering subsystem is configured to generate a dialog-enhanced audio signal in response to the input signal while being steered by at least some of the filter control values. Preferably, the filter control values are generated without use of feedback including by generating power ratios (for pairs of speech and non-speech channels) and preferably also shaping in nonlinear fashion and scaling at least one of the power ratios. | 05-19-2011 |
20110144997 | VOICE SYNTHESIS MODEL GENERATION DEVICE, VOICE SYNTHESIS MODEL GENERATION SYSTEM, COMMUNICATION TERMINAL DEVICE AND METHOD FOR GENERATING VOICE SYNTHESIS MODEL - A voice synthesis model generation device, a voice synthesis model generation system, a communication terminal device, and a method for generating a voice synthesis model all of which are capable of preferably acquiring a user's voice. A voice synthesis model generation system is configured to include a mobile communication terminal device and a voice synthesis model generation device. The mobile communication terminal device includes a characteristic amount extraction portion that extracts a characteristic amount of input voice, and a text data acquisition portion that acquires text data from the voice. The voice synthesis model generation device includes a voice synthesis model generation portion that generates a voice synthesis model based on the characteristic amount and the text data that are acquired by a learning information acquisition portion, an image information generation portion that generates image information based on a parameter based on the characteristic amount and the text data, and an information output portion that transmits the image information to the mobile communication terminal device. | 06-16-2011 |
20110246199 | SPEECH SYNTHESIZER - According to one embodiment, a speech synthesizer generates a speech segment sequence and synthesizes speech by connecting speech segments of the generated speech segment sequence. If a speech segment of a synthesized first speech segment sequence is different from the speech segment of a synthesized second speech segment sequence having the same synthesis unit as the first speech segment sequence, the speech synthesizer disables the speech segment of the first speech segment sequence that is different from the speech segment of the second speech segment sequence. | 10-06-2011 |
20110313770 | ELECTRONIC EMERGENCY MESSAGING SYSTEM - An electronic alert apparatus comprises a radio frequency receiver configured to receive and identify an emergency message preamble that indicates an impending transmission of an emergency message sent at a first data rate and an emergency message addressed to a shared device address sent at a second data rate. The electronic alert apparatus also includes a processor operatively connected to the radio frequency receiver and configured to decode the encoded emergency message. A memory is operatively connected to the processor and configured to store the decoded emergency message. A display is operatively connected to the processor and configured to present the decoded emergency message. A power source is configured to supply electrical power to the processor, and a housing is configured to at least partially enclose the processor, the memory, and the power source. | 12-22-2011 |
20110313771 | METHOD AND DEVICE FOR AUDIBLY INSTRUCTING A USER TO INTERACT WITH A FUNCTION - A method for audibly instructing a user to interact with a function. A function is associated with a user-written selectable item. The user-written selectable item is recognized on a surface. In response to recognizing the user-written selectable item, a first instructional message related to the operation of the function is audibly rendered without requiring further interaction from the user. | 12-22-2011 |
20120016674 | Modification of Speech Quality in Conversations Over Voice Channels - Techniques are disclosed for modifying speech quality in a conversation over a voice channel. For example, a method for modifying a speech quality associated with a spoken utterance transmittable over a voice channel comprises the following steps. The spoken utterance is obtained prior to an intended recipient of the spoken utterance receiving the spoken utterance. An existing speech quality of the spoken utterance is determined. The existing speech quality of the spoken utterance is compared to at least one desired speech quality associated with at least one previously obtained spoken utterance to determine whether the existing speech quality substantially matches the desired speech quality. At least one characteristic of the spoken utterance is modified to change the existing speech quality of the spoken utterance to the desired speech quality when the existing speech quality does not substantially match the desired speech quality. The spoken utterance is presented with the desired speech quality to the intended recipient. | 01-19-2012 |
20120065977 | System and Method for Teaching Non-Lexical Speech Effects - Herein, a method is disclosed, which may include delexicalizing a first speech segment to provide a first prosodic speech signal; storing data indicative of the first prosodic speech signal in a computer memory; audibly playing the first speech segment to a language student; prompting the student to recite the speech segment; and recording audible speech uttered by the student in response to the prompt. | 03-15-2012 |
20120065978 | VOICE PROCESSING DEVICE - In voice processing, a first distribution generation unit approximates a distribution of feature information representative of voice of a first speaker per a unit interval thereof as a mixed probability distribution which is a mixture of a plurality of first probability distributions corresponding to a plurality of different phones. A second distribution generation unit also approximates a distribution of feature information representative of voice of a second speaker as a mixed probability distribution which is a mixture of a plurality of second probability distributions. A function generation unit generates, for each phone, a conversion function for converting the feature information of voice of the first speaker to that of the second speaker based on respective statistics of the first and second probability distributions that correspond to the phone. | 03-15-2012 |
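One common form of the per-phone conversion function this abstract describes is a shift-and-rescale built from the means and variances of the matched source and target Gaussians. The univariate sketch below is a simplification for illustration; the patented method may use full multivariate statistics:

```python
# Simplified univariate sketch of a per-phone conversion function:
# map a source-speaker feature so it follows the target speaker's
# distribution, using the two Gaussians' means and deviations.

def make_conversion_function(mu_src, sigma_src, mu_tgt, sigma_tgt):
    """Return f(x) converting a source-speaker feature value to the
    target speaker's feature space for one phone."""
    def convert(x):
        # Standardize against the source Gaussian, then re-express
        # in the target Gaussian's scale and location.
        return mu_tgt + (sigma_tgt / sigma_src) * (x - mu_src)
    return convert
```

In practice one such function would be generated per phone, and the conversion applied frame by frame using the phone identity of each frame.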
20120072223 | SYSTEM AND METHOD FOR CONFIGURING VOICE SYNTHESIS - Systems and methods for providing synthesized speech in a manner that takes into account the environment where the speech is presented. A method embodiment includes: based on a listening environment and at least one other parameter, selecting an approach from a plurality of approaches for presenting synthesized speech in the listening environment; presenting synthesized speech according to the selected approach; and, based on natural language input received from a user indicating an inability to understand the presented synthesized speech, selecting a second approach from the plurality of approaches and presenting subsequent synthesized speech using the second approach. | 03-22-2012 |
20120078632 | VOICE-BAND EXTENDING APPARATUS AND VOICE-BAND EXTENDING METHOD - A voice-band extending apparatus includes a fast Fourier transform (FFT) unit, a signal-to-noise ratio (SNR) calculation processing unit, a band selecting unit, an extension-signal creating unit, an addition unit, and an inverse fast Fourier transform (IFFT) unit. The FFT unit performs the Fourier transform on an input signal that is input from the outside. The SNR calculation processing unit calculates an SNR with respect to each of the bands in the input signal. The band selecting unit selects the band whose SNR exceeds a threshold and is the maximum, based on the respective SNRs of the bands. The extension-signal creating unit creates an extension signal based on a signal acquired by the band selecting unit. The addition unit adds the extension signal to the input signal, and creates a band-extended signal. The IFFT unit performs the inverse fast Fourier transform on the band-extended signal, and creates an output signal. | 03-29-2012 |
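The band-selection step in this pipeline (pick the band whose SNR exceeds a threshold and is the largest among those) can be sketched as below. The band layout, power representation, and names are illustrative assumptions, not the patented design:

```python
import math

# Illustrative band-selection step of a band-extension pipeline:
# given per-band signal and noise power estimates, return the index
# of the band whose SNR both exceeds a threshold and is the maximum.

def select_band(signal_power, noise_power, threshold_db=10.0):
    """`signal_power` and `noise_power` are parallel lists, one entry
    per frequency band. Returns the best band index, or None if no
    band's SNR exceeds the threshold."""
    best, best_snr = None, None
    for i, (s, n) in enumerate(zip(signal_power, noise_power)):
        snr = 10.0 * math.log10(s / n)  # per-band SNR in dB
        if snr > threshold_db and (best_snr is None or snr > best_snr):
            best, best_snr = i, snr
    return best
```

The selected band's content would then seed the extension-signal creation before the addition and IFFT stages.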
20120089399 | Voice Over Short Messaging Service - A method of operating a mobile communication device is described. A text message is received over a wireless messaging channel, wherein the text message contains a non-text representation of an utterance. The non-text representation is extracted from the text message, and an audio representation of the spoken utterance is synthesized from the non-text representation. | 04-12-2012 |
20120095767 | VOICE QUALITY CONVERSION DEVICE, METHOD OF MANUFACTURING THE VOICE QUALITY CONVERSION DEVICE, VOWEL INFORMATION GENERATION DEVICE, AND VOICE QUALITY CONVERSION SYSTEM - A device includes: an input speech separation unit which separates an input speech into vocal tract information and voicing source information; a mouth opening degree calculation unit which calculates a mouth opening degree from the vocal tract information; a target vowel database storage unit which stores pieces of vowel information on a target speaker; an agreement degree calculation unit which calculates a degree of agreement between the calculated mouth opening degree and a mouth opening degree included in the vowel information; a target vowel selection unit which selects the vowel information from among the pieces of vowel information, based on the calculated agreement degree; a vowel transformation unit which transforms the vocal tract information on the input speech, using vocal tract information included in the selected vowel information; and a synthesis unit which generates a synthetic speech using the transformed vocal tract information and the voicing source information. | 04-19-2012 |
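The target-vowel selection step in this abstract, choosing stored vowel information whose mouth-opening degree best agrees with the degree computed from the input, can be sketched as a nearest-match search. The dictionary structure and key names are illustrative assumptions:

```python
# Illustrative sketch of target-vowel selection: pick the stored
# vowel entry whose mouth-opening degree best agrees with the degree
# calculated from the input speech's vocal tract information.

def select_target_vowel(input_degree, vowel_entries):
    """`vowel_entries` is a list of dicts, each holding a target
    speaker's vowel information under a 'mouth_opening' key. Highest
    agreement = smallest absolute difference in opening degree."""
    return min(vowel_entries,
               key=lambda e: abs(e["mouth_opening"] - input_degree))
```

The chosen entry's vocal tract information would then drive the vowel transformation before resynthesis with the original voicing source.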
20120109653 | Very Low Bit Rate Signal Coder and Decoder - Improved oscillator-based source modeling methods for estimating model parameters, for evaluating model quality, for restoring the input from the model parameters, and for improving performance over methods known in the art are disclosed. An application of these innovations to speech coding is described. The improved oscillator model is derived from the information contained in the current input signal as well as from some form of data history, often the restored versions of the earlier processed data. Operations can be performed in real time, and compression can be achieved at a user-specified level of performance and, in some cases, without information loss. The new model can be combined with methods in the existing art in order to complement the properties of these methods, to improve overall performance. The present invention is effective for very low bit-rate coding/compression and decoding/decompression of digital signals, including digitized speech and audio signals. | 05-03-2012 |
20120130717 | Real-time Animation for an Expressive Avatar - Techniques for providing real-time animation for a personalized cartoon avatar are described. In one example, a process trains one or more animated models to provide a set of probabilistic motions of one or more upper body parts based on speech and motion data. The process links one or more predetermined phrases that represent emotional states to the one or more animated models. After creation of the models, the process receives real-time speech input. Next, the process identifies an emotional state to be expressed based on the one or more predetermined phrases matching in context to the real-time speech input. The process then generates an animated sequence of motions of the one or more upper body parts by applying the one or more animated models in response to the real-time speech input. | 05-24-2012 |
20120136663 | METHODS AND APPARATUS FOR RAPID ACOUSTIC UNIT SELECTION FROM A LARGE SPEECH CORPUS - A speech synthesis system can select recorded speech fragments, or acoustic units, from a very large database of acoustic units to produce artificial speech. The selected acoustic units are chosen to minimize a combination of target and concatenation costs for a given sentence. However, as concatenation costs, which are measures of the mismatch between sequential pairs of acoustic units, are expensive to compute, processing can be greatly reduced by pre-computing and caching the concatenation costs. However, the number of possible sequential pairs of acoustic units makes such caching prohibitive. Statistical experiments reveal that while about 85% of the acoustic units are typically used in common speech, less than 1% of the possible sequential pairs of acoustic units occur in practice. The system synthesizes a large body of speech, identifies the acoustic unit sequential pairs generated and their respective concatenation costs, and stores those concatenation costs likely to occur. | 05-31-2012 |
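The caching strategy in this abstract (synthesize a training body of speech, record which sequential pairs actually occur, and cache only those costs) can be sketched as a small class. The cost function below is a stand-in invented for the example; a real join cost would compare spectral, pitch, and energy features at the unit boundary.

```python
def concat_cost(unit_a, unit_b):
    # Hypothetical, deterministic stand-in for an expensive join-cost measure.
    return abs(sum(map(ord, unit_a)) - sum(map(ord, unit_b))) / 100.0

class ConcatCostCache:
    """Cache only the sequential pairs observed while synthesizing a large
    body of speech, as the abstract describes; unseen pairs fall back to
    the expensive computation at run time."""
    def __init__(self):
        self.table = {}

    def train(self, unit_sequences):
        # Identify the sequential pairs actually generated and store costs.
        for seq in unit_sequences:
            for a, b in zip(seq, seq[1:]):
                if (a, b) not in self.table:
                    self.table[(a, b)] = concat_cost(a, b)

    def cost(self, a, b):
        # Cheap lookup for the ~1% of pairs that occur in practice.
        if (a, b) in self.table:
            return self.table[(a, b)]
        return concat_cost(a, b)
```

The point of the design is that the cache stays small (pairs observed in practice) while still answering every query correctly via the fallback path.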
20120150542 | TELEPHONE OR OTHER DEVICE WITH SPEAKER-BASED OR LOCATION-BASED SOUND FIELD PROCESSING - A method includes obtaining audio data representing audio content from at least one speaker. The method also includes spatially processing the audio data to create at least one sound field, where each sound field has a spatial characteristic that is unique to a specific speaker. The method further includes generating the at least one sound field using the processed audio data. The audio data could represent audio content from multiple speakers, and generating the at least one sound field could include generating multiple sound fields around a listener. The spatially processing could include performing beam forming to create multiple directional beams, and generating the multiple sound fields around the listener could include generating the directional beams with different apparent origins around the listener. The method could further include separating the audio data based on speaker, where each sound field is associated with the audio data from one of the speakers. | 06-14-2012 |
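The beam-forming step this abstract mentions (directional beams with different apparent origins per speaker) is, at its simplest, delay-and-sum: each channel is delayed so wavefronts align for a chosen origin. The sketch below is a minimal toy with integer sample delays assumed to be precomputed from the desired apparent origin; it is not the patented processing.

```python
import numpy as np

def delay_and_sum(audio, channel_delays):
    """Minimal delay-and-sum sketch: sum delayed copies of one channel's
    audio so the combined wavefront appears to come from a chosen origin.
    `channel_delays` are integer sample delays per output channel."""
    out = np.zeros(len(audio) + max(channel_delays))
    for d in channel_delays:
        out[d:d + len(audio)] += audio
    return out / len(channel_delays)
```

Real systems use fractional delays and per-channel gains, but the alignment idea is the same.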
20120197645 | Electronic Apparatus - An electronic apparatus includes a communication module, a storage module, a manipulation module, a voice output control module, and a control module. The communication module receives book data delivered externally. The storage module stores the received book data. The manipulation module converts a manipulation of a user into an electrical signal. The voice output control module reproduces, as a voice, the book data based on the manipulation while controlling the reproduction speed of the voice. The control module determines a part that is important to the user, stores, in the storage module, a position of voice reproduction of the book data, and synchronizes the position of the voice reproduction with a reproduction position in the book data. | 08-02-2012 |
20130013312 | METHOD AND SYSTEM FOR PRESELECTION OF SUITABLE UNITS FOR CONCATENATIVE SPEECH - A system and method for improving the response time of text-to-speech synthesis using triphone contexts. The method includes receiving input text and selecting a plurality of N phoneme units from a triphone unit selection database as candidate phonemes for synthesized speech based on the input text, wherein the triphone unit selection database comprises triphone units each comprising three phones, and, if the candidate phonemes are available in the triphone unit selection database, applying a cost process to select a set of phonemes from the candidate phonemes. If no candidate phonemes are available in the triphone unit selection database, the method includes applying a single phoneme approach to select single phonemes for synthesis, the single phonemes used in synthesis independent of a triphone structure. | 01-10-2013 |
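The triphone-first, single-phoneme-fallback selection described above can be sketched in a few lines. The databases and the "cost process" (here just a minimum over toy candidate costs) are assumptions invented for the example, not the patent's actual data structures.

```python
def preselect_units(phonemes, triphone_db, single_db):
    """Prefer candidates indexed by triphone context (left, center, right);
    if none exist for a position, fall back to single-phoneme candidates.
    Candidate values are toy numeric costs; min() stands in for the cost
    process that picks the best candidate."""
    selected = []
    padded = ["#"] + phonemes + ["#"]  # "#" marks an utterance boundary
    for i, ph in enumerate(phonemes):
        tri = (padded[i], ph, padded[i + 2])
        candidates = triphone_db.get(tri)
        if candidates:
            selected.append(min(candidates))   # triphone context available
        else:
            selected.append(min(single_db[ph]))  # single-phoneme fallback
    return selected
```

The response-time benefit comes from the triphone index shrinking the candidate set before any expensive cost computation runs.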
20130035940 | ELECTROLARYNGEAL SPEECH RECONSTRUCTION METHOD AND SYSTEM THEREOF - The invention provides an electrolaryngeal speech reconstruction method and a system thereof. Firstly, model parameters are extracted from the collected speech as a parameter library, then facial images of a speaker are acquired and then transmitted to an image analyzing and processing module to obtain the voice onset and offset times and the vowel classes, then a waveform of a voice source is synthesized by a voice source synthesis module, finally, the waveform of the above voice source is output by an electrolarynx vibration output module, wherein the voice source synthesis module firstly sets the model parameters of a glottal voice source so as to synthesize the waveform of the glottal voice source, and then a waveguide model is used to simulate sound transmission in a vocal tract and select shape parameters of the vocal tract according to the vowel classes. | 02-07-2013 |
20130066631 | PARAMETRIC SPEECH SYNTHESIS METHOD AND SYSTEM - The present invention provides a parametric speech synthesis method and a parametric speech synthesis system. The method comprises sequentially processing each frame of speech of each phone in a phone sequence of an input text as follows: for a current phone, extracting a corresponding statistic model from a statistic model library and using model parameters of the statistic model that correspond to the current frame of the current phone as rough values of currently predicted speech parameters; according to the rough values and information about a predetermined number of speech frames occurring before the current time point, obtaining smoothed values of the currently predicted speech parameters; according to global mean values and global standard deviation ratios of the speech parameters obtained through statistics, performing global optimization on the smoothed values of the speech parameters to generate necessary speech parameters; and synthesizing the generated speech parameters to obtain a frame of speech synthesized for the current frame of the current phone. With this solution, the RAM capacity needed for speech synthesis will not increase with the length of the synthesized speech, and the time length of the synthesized speech is no longer limited by the RAM. | 03-14-2013 |
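The per-frame flow in this abstract (rough value from the model, smoothing against recent history, then global optimization with global mean and standard-deviation ratio) can be sketched for a single scalar parameter. The blending weight `alpha` and the specific smoothing rule are assumptions for the example; the patent does not specify them here.

```python
def smooth_and_globally_optimize(rough, history, alpha, g_mean, g_std_ratio):
    """One frame of parameter generation: blend the model's rough value
    with the mean of recent frames (smoothing), then rescale around the
    global mean by the global standard-deviation ratio, restoring the
    dynamic range lost to smoothing."""
    prev = sum(history) / len(history) if history else rough
    smoothed = alpha * rough + (1 - alpha) * prev
    # Global optimization step: expand/contract variance around g_mean.
    return g_mean + (smoothed - g_mean) * g_std_ratio
```

Because each frame depends only on a fixed-length history, memory use is constant regardless of utterance length, which is exactly the RAM property the abstract claims.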
20130166303 | ACCESSING MEDIA DATA USING METADATA REPOSITORY - A computer-implemented method includes receiving, in a computer system, a user query comprising at least a first term, parsing the user query to at least determine whether the user query assigns a field to the first term, the parsing resulting in a parsed query that conforms to a predefined format, performing a search in a metadata repository using the parsed query, the metadata repository embodied in a computer readable medium and including triplets generated based on multiple modes of metadata for video content, the search identifying a set of candidate scenes from the video content, ranking the set of candidate scenes according to a scoring metric into a ranked scene list, and generating an output from the computer system that includes at least part of the ranked scene list, the output generated in response to the user query. | 06-27-2013 |
20130238337 | VOICE QUALITY CONVERSION SYSTEM, VOICE QUALITY CONVERSION DEVICE, VOICE QUALITY CONVERSION METHOD, VOCAL TRACT INFORMATION GENERATION DEVICE, AND VOCAL TRACT INFORMATION GENERATION METHOD - A voice quality conversion system includes: an analysis unit which analyzes sounds of plural vowels of different types to generate first vocal tract shape information for each type of the vowels; a combination unit which combines, for each type of the vowels, the first vocal tract shape information on that type of vowel and the first vocal tract shape information on a different type of vowel to generate second vocal tract shape information on that type of vowel; and a synthesis unit which (i) combines vocal tract shape information on a vowel included in input speech and the second vocal tract shape information on the same type of vowel to convert vocal tract shape information on the input speech, and (ii) generates a synthetic sound using the converted vocal tract shape information and voicing source information on the input speech to convert the voice quality of the input speech. | 09-12-2013 |
20130253934 | SOCIAL BROADCASTING USER EXPERIENCE - A method of providing user participation in a social broadcast environment is disclosed. A network communication is received from a user of a broadcast that includes a preference data indicating a preference of the user that a promoted content be included in the broadcast. Via a responsive network communication, a feedback data is provided to the user that includes a predicted future time at which the promoted content may be included in the broadcast. | 09-26-2013 |
20130325476 | APPARATUS AND METHOD FOR GENERATING WAVE FIELD SYNTHESIS SIGNALS - An apparatus and method for generating a wave field synthesis (WFS) signal in consideration of a height of a speaker are disclosed. The WFS signal generation apparatus may include a waveform propagation distance determination unit to determine a propagation distance of a waveform propagated from a sound source based on a height of a speaker, and a WFS signal generation unit to generate a WFS signal corresponding to the speaker using the propagation distance of the waveform. | 12-05-2013 |
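The height-aware propagation distance in this abstract reduces to 3-D geometry: the distance from the virtual source to a speaker includes the speaker's height, and that distance sets the per-speaker delay and attenuation. The sketch below is a toy of that geometry, not the patent's signal generation; the 1/r gain law is an assumption.

```python
import math

def wfs_delay_and_gain(source_xy, speaker_xy, speaker_height, c=343.0):
    """Per-speaker WFS driving parameters: the propagation distance is the
    3-D distance from source to speaker (height included), giving a delay
    (distance / speed of sound) and a 1/r attenuation."""
    dx = speaker_xy[0] - source_xy[0]
    dy = speaker_xy[1] - source_xy[1]
    r = math.sqrt(dx * dx + dy * dy + speaker_height ** 2)
    return r / c, 1.0 / max(r, 1e-9)
```

Ignoring height (using only the 2-D distance) would under-estimate the delay for elevated speakers, which is presumably the error the invention corrects.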
20130325477 | SPEECH SYNTHESIS SYSTEM, SPEECH SYNTHESIS METHOD AND SPEECH SYNTHESIS PROGRAM - A speech synthesis system includes: a training database storing training data which is a set of features extracted from speech waveform data; a feature space division unit which divides a feature space, which is a space associated with the training data, into partial spaces; a sparse or dense state detection unit which detects a sparse or dense state for each partial space of the divided feature space, generates sparse or dense information which is information indicating the sparse or dense state, and outputs the sparse or dense information; and a pronunciation information correcting unit which corrects pronunciation information which is used for speech synthesis based on the outputted sparse or dense information. | 12-05-2013 |
20140046667 | SYSTEM FOR CREATING MUSICAL CONTENT USING A CLIENT TERMINAL - A system for creating musical content using a client terminal, wherein diverse musical information, such as a desired lyric, musical scale, duration, and singing technique, is input from an online or cloud computer, an embedded terminal, or other such client terminal, using technology for generating musical vocal content by computer speech synthesis. Speech in which cadence is expressed in accordance with the musical scale is synthesized, produced for the applicable duration, and transmitted to the client terminal. | 02-13-2014 |
20140067396 | SEGMENT INFORMATION GENERATION DEVICE, SPEECH SYNTHESIS DEVICE, SPEECH SYNTHESIS METHOD, AND SPEECH SYNTHESIS PROGRAM - A segment information generation device includes a waveform cutout unit that cuts out a speech waveform from natural speech at a time period not depending on a pitch frequency of the natural speech. A feature parameter extraction unit extracts a feature parameter of a speech waveform from the speech waveform cut out by the waveform cutout unit. A time domain waveform generation unit generates a time domain waveform based on the feature parameter. | 03-06-2014 |
20140081642 | System and Method for Configuring Voice Synthesis - Systems and methods for providing synthesized speech in a manner that takes into account the environment where the speech is presented. A method embodiment includes: based on a listening environment and at least one other parameter, selecting an approach from a plurality of approaches for presenting synthesized speech in the listening environment; presenting synthesized speech according to the selected approach; and, based on natural language input received from a user indicating an inability to understand the presented synthesized speech, selecting a second approach from the plurality of approaches and presenting subsequent synthesized speech using the second approach. | 03-20-2014 |
20140136207 | VOICE SYNTHESIZING METHOD AND VOICE SYNTHESIZING APPARATUS - A voice synthesizing apparatus includes a first receiver configured to receive first utterance control information generated by detecting a start of a manipulation on a manipulating member by a user, a first synthesizer configured to synthesize, in response to a reception of the first utterance control information, a first voice corresponding to a first phoneme in a phoneme sequence of a voice to be synthesized to output the first voice, a second receiver configured to receive second utterance control information generated by detecting a completion of the manipulation on the manipulating member or a manipulation on a different manipulating member, and a second synthesizer configured to synthesize, in response to a reception of the second utterance control information, a second voice including at least the first phoneme and a succeeding phoneme being subsequent to the first phoneme of the voice to be synthesized to output the second voice. | 05-15-2014 |
20140163990 | TENNIS UMPIRE - A system for communicating game information in real time by audibly announcing game information in human speech and in a preferred embodiment also visually displaying game information, the system comprising an electronic umpire unit that receives game change information transmitted by an electronic point transmitter that may be worn by a game player or spectator. | 06-12-2014 |
20140163991 | SYSTEMS AND METHODS FOR SOURCE SIGNAL SEPARATION - A method of processing a signal, including taking a signal formed from a plurality of source signal emitters and expressed in an original domain, decomposing the signal into a mathematical representation of a plurality of constituent elements in an alternate domain, analyzing the plurality of constituent elements to associate at least a subset of the constituent elements with at least one of the plurality of source signal emitters, separating at least a subset of the constituent elements based on the association and reconstituting at least a subset of constituent elements to produce an output signal in at least one of the original domain, the alternate domain and another domain. | 06-12-2014 |
20140180695 | GENERATION OF CONVERSATION TO ACHIEVE A GOAL - Conversation to reach a goal may be created by stitching together pieces of past conversations. Conversations are stored and indexed. A user specifies a goal that the user would like to achieve through conversation. Pieces of conversation that could achieve that goal are retrieved and/or stitched together from smaller conversation fragments, and the resulting conversation pieces are evaluated for merit. The merit evaluator is pluggable so that different merit calculations may be used for various different situations. The conversation may be displayed or spoken to the user as a prompt, so that the user can engage in a real conversation with a real person based on the guidance received. The system can react to the current state of the conversation, and may change conversational strategies or even conversational goals during the course of the conversation. | 06-26-2014 |
20140236602 | Synthesizing Vowels and Consonants of Speech - For speech synthesis, a vowel module sequentially applies a vowel filter set to form a vowel with a source signal. The vowel filter set is selected from a logical arc traversing a vowel filter array. The vowel filter array includes a plurality of vowel filters organized as a logical space. Vowel filters traversed by the logical arc are selected. A consonant module sequentially applies a consonant filter set from a consonant filter array to form a consonant with the source signal. The consonant filter set is selected in response to a discrete consonant value. | 08-21-2014 |
20140244262 | VOICE SYNTHESIZING METHOD, VOICE SYNTHESIZING APPARATUS AND COMPUTER-READABLE RECORDING MEDIUM - A voice synthesizing apparatus includes a manipulation determiner configured to determine a manipulation position which is moved according to a manipulation of a user, and a voice synthesizer configured to generate, in response to an instruction to generate a voice in which a second phoneme follows a first phoneme, a voice signal so that vocalization of the first phoneme starts before the manipulation position reaches a reference position and that vocalization from the first phoneme to the second phoneme is made when the manipulation position reaches the reference position. | 08-28-2014 |
20140309997 | Information Processing Apparatus and Control Method - One embodiment provides an information processing apparatus, including: a sound synthesizer configured to output a combined sound signal of an alarm sound signal and a remaining sound signal other than the alarm sound signal; a sound separator configured to separate the combined sound signal into a human voice signal and a background sound signal; an alarm sound detector configured to detect whether a background sound corresponding to the background sound signal output from the sound separator includes an alarm sound or not; and an alarm sound receiver configured to, when the alarm sound detector detects that the background sound includes the alarm sound, read the alarm sound signal corresponding to the detected alarm sound, and further combine the read alarm sound signal with the combined sound signal to thereby output an adjusted combined sound signal. | 10-16-2014 |
20140350940 | System and Method for Generalized Preselection for Unit Selection Synthesis - Disclosed herein are systems, computer-implemented methods, and computer-readable storage media for unit selection synthesis. The method causes a computing device to add a supplemental phoneset to a speech synthesizer front end having an existing phoneset, modify a unit preselection process based on the supplemental phoneset, preselect units from the supplemental phoneset and the existing phoneset based on the modified unit preselection process, and generate speech based on the preselected units. The supplemental phoneset can be a variation of the existing phoneset, can include a word boundary feature, can include a cluster feature where initial consonant clusters and some word boundaries are marked with diacritics, can include a function word feature which marks units as originating from a function word or a content word, and/or can include a pre-vocalic or post-vocalic feature. The speech synthesizer front end can incorporate the supplemental phoneset as an extra feature. | 11-27-2014 |
20150088520 | VOICE SYNTHESIZER - A candidate voice segment sequence generator | 03-26-2015 |
20150112686 | AUDIO DATA SYNTHESIS TERMINAL, AUDIO DATA RECORDING TERMINAL, AUDIO DATA SYNTHESIS METHOD, AUDIO OUTPUT METHOD, AND PROGRAM - A time difference calculation unit calculates a time difference between its own terminal and another terminal, based on the time at which output of the first sound from the audio output module is started, the time at which input of a sound corresponding to the audio data to the audio input module is started, the time indicated by the first information, and the time indicated by the second information. | 04-23-2015 |
20150120303 | SENTENCE SET GENERATING DEVICE, SENTENCE SET GENERATING METHOD, AND COMPUTER PROGRAM PRODUCT - According to an embodiment, a sentence set generating device includes an importance degree storage, a frequency storage, a calculator, and a selector. The importance degree storage is configured to store therein a degree of importance of each of a plurality of acoustic units. The frequency storage is configured to store therein a frequency of appearance of each of the acoustic units in a second sentence set. The calculator is configured to calculate a score of a first sentence included in a first sentence set, from a degree of rarity corresponding to the frequency of appearance of each acoustic unit in the first sentence and from a degree of importance of the each acoustic unit. The selector is configured to, from sentences included in the first sentence set, select a sentence having a score higher than other sentences, and add the selected sentence to the second sentence set. | 04-30-2015 |
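The selection loop in the sentence-set abstract (score each candidate sentence from per-unit importance and rarity relative to the already-selected set, then add the best) is naturally greedy. The exact scoring formula is not given in the abstract; the `importance / (1 + frequency)` rarity weighting below is an assumption for the sketch.

```python
from collections import Counter

def select_sentences(first_set, n, importance):
    """Greedy sketch: score each sentence by unit importance weighted by
    rarity (inverse of how often the unit already appears in the selected
    second set), then repeatedly move the highest-scoring sentence over."""
    freq = Counter()   # appearance counts of acoustic units in second set
    second_set = []
    for _ in range(n):
        def score(sentence):
            return sum(importance.get(u, 0.0) / (1 + freq[u])
                       for u in sentence)
        best = max((s for s in first_set if s not in second_set),
                   key=score, default=None)
        if best is None:
            break
        second_set.append(best)
        freq.update(best)  # rarer units now count for less next round
    return second_set
```

Re-scoring after every addition is what steers the set toward covering important but still-rare acoustic units.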
20150310849 | CONVERSATION-SENTENCE GENERATION DEVICE, CONVERSATION-SENTENCE GENERATION METHOD, AND CONVERSATION-SENTENCE GENERATION PROGRAM - A conversation-sentence generation device according to the invention of this application includes: an input unit that receives, as input information, a conversation sentence given from a user to an agent, and clue information based on which a physical and psychological state of the agent is estimated; an agent state storing unit that stores the physical and psychological state of the agent as an agent state; an agent state estimating unit that estimates a new agent state based on the input information and the agent state; an utterance intention generating unit that generates, based on the input information and the agent state, an utterance intention directed from the agent to the user; a conversation sentence generating unit that generates, based on the input information, the agent state, and the utterance intention, a conversation sentence given from the agent to the user; and an output unit that outputs the conversation sentence generated by the conversation sentence generating unit. | 10-29-2015 |
20150310850 | SYSTEM AND METHOD FOR SINGING SYNTHESIS - A singing synthesis section generates singing by integrating into one singing a plurality of vocals sung by a singer a plurality of times, or vocals in which the parts that he/she does not like are sung again. A music audio signal playback section plays back the music audio signal from a signal portion, or its immediately preceding signal, corresponding to a character in the lyrics when the character displayed on the display screen is selected by a character selecting section. An estimation and analysis data storing section automatically aligns the lyrics with the vocal, decomposes the vocal into three elements (pitch, power, and timbre), and stores them. A data selecting section allows the user to select each of the three elements for respective time periods of phonemes. The data editing section modifies the time periods of the three elements in alignment with the modified time periods of the phonemes. | 10-29-2015 |
20150333717 | METHODS AND APPARATUS FOR PROCESSING AUDIO SIGNALS - A method for processing an audio signal (i(t)), comprises: receiving a first set (x(t)) of time-varying signals representing a first sound comprised in the audio signal (i(t)), the first set (x(t)) of time-varying signals comprising an amplitude modulation signal (a(t)), a carrier frequency signal (f | 11-19-2015 |
20150380014 | METHOD OF SINGING VOICE SEPARATION FROM AN AUDIO MIXTURE AND CORRESPONDING APPARATUS - Separation of a singing voice source from an audio mixture by using auxiliary information related to temporal activity of the different audio sources to improve the separation process. An audio signal is produced from symbolic digital musical score and symbolic digital lyrics information related to a singing voice in the audio mixture. By means of Non-negative Matrix Factorization (NMF), characteristics of the audio mixture and of the produced audio signal are used to produce an estimated singing voice and an estimated accompaniment through Wiener filtering. | 12-31-2015 |
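The NMF-plus-Wiener-filtering step in this separation abstract can be sketched generically: factorize a magnitude spectrogram, then use the per-source model magnitudes to build a soft (Wiener) mask over the mixture. This is the standard technique the abstract names, not the patent's score-and-lyrics-informed variant; the toy matrices and the plain multiplicative-update NMF below are assumptions for the example.

```python
import numpy as np

def nmf(V, rank, iters=100, seed=0):
    """Multiplicative-update NMF (Euclidean distance): V ~= W @ H."""
    rng = np.random.default_rng(seed)
    W = rng.random((V.shape[0], rank)) + 1e-3
    H = rng.random((rank, V.shape[1])) + 1e-3
    for _ in range(iters):
        H *= (W.T @ V) / (W.T @ W @ H + 1e-9)
        W *= (V @ H.T) / (W @ H @ H.T + 1e-9)
    return W, H

def wiener_separate(mix_mag, voice_model, accomp_model):
    """Soft (Wiener) mask from two per-source magnitude models: each
    source estimate is the mixture weighted by its model's share of the
    total modeled energy at every time-frequency bin."""
    total = voice_model + accomp_model + 1e-9
    voice = mix_mag * (voice_model / total)
    accomp = mix_mag * (accomp_model / total)
    return voice, accomp
```

In the patent's setting, the score and lyrics information would constrain when the voice components are allowed to be active, which is what makes the separation "informed".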
20190147849 | NATURAL LANGUAGE GENERATION BASED ON USER SPEECH STYLE | 05-16-2019 |
20190149959 | METHOD FOR CONTROLLING A VIRTUAL TALK GROUP MEMBER TO PERFORM AN ASSIGNMENT | 05-16-2019 |