Patent application number | Description | Published |
20090306985 | SYSTEM AND METHOD FOR SYNTHETICALLY GENERATED SPEECH DESCRIBING MEDIA CONTENT - Disclosed herein are systems, methods, and computer readable-media for providing an automatic synthetically generated voice describing media content, the method comprising receiving one or more pieces of metadata for a primary media content, selecting at least one piece of metadata for output, and outputting the at least one piece of metadata as synthetically generated speech with the primary media content. Other aspects of the invention involve alternative output, output speech simultaneously with the primary media content, output speech during gaps in the primary media content, translate metadata in foreign language, tailor voice, accent, and language to match the metadata and/or primary media content. A user may control output via a user interface or output may be customized based on preferences in a user profile. | 12-10-2009 |
20110065428 | SYSTEMS AND METHODS FOR SELECTING AN OUTPUT MODALITY IN A MOBILE DEVICE - A method and system for selecting the output modality of an application in a mobile device from attributes of the mobile device includes a detection of at least one attribute of the mobile device, automatically identifying available modalities for the output of the application based on the attribute, and selecting a preferred output modality from the available modalities. The output of the application is converted to the output modality and transmitted through an output interface selected based on the attributes measured. | 03-17-2011 |
20120130709 | SYSTEM AND METHOD FOR BUILDING AND EVALUATING AUTOMATIC SPEECH RECOGNITION VIA AN APPLICATION PROGRAMMER INTERFACE - Disclosed herein are systems, methods, and non-transitory computer-readable storage media for building an automatic speech recognition system through an Internet API. A network-based automatic speech recognition server configured to practice the method receives feature streams, transcriptions, and parameter values as inputs from a network client independent of knowledge of internal operations of the server. The server processes the inputs to train an acoustic model and a language model, and transmits the acoustic model and the language model to the network client. The server can also generate a log describing the processing and transmit the log to the client. On the server side, a human expert can intervene to modify how the server processes the inputs. The inputs can include an additional feature stream generated from speech by algorithms in the client's proprietary feature extraction. | 05-24-2012 |
20120134507 | Methods, Systems, and Products for Voice Control - Methods, systems, and computer program products provide voice control of electronic devices. Speech and a beacon signal are received. A directional microphone is aligned to a source of the beacon signal. A voice command in the speech is received and executed. | 05-31-2012 |
20130317824 | System and Method for Detecting Synthetic Speaker Verification - Disclosed herein are systems, methods, and tangible computer readable-media for detecting synthetic speaker verification. The method comprises receiving a plurality of speech samples of the same word or phrase for verification, comparing each of the plurality of speech samples to each other, denying verification if the plurality of speech samples demonstrate little variance over time or are the same, and verifying the plurality of speech samples if the plurality of speech samples demonstrates sufficient variance over time. One embodiment further adds that each of the plurality of speech samples is collected at different times or in different contexts. In other embodiments, variance is based on a pre-determined threshold or the threshold for variance is adjusted based on a need for authentication certainty. In another embodiment, if the initial comparison is inconclusive, additional speech samples are received. | 11-28-2013 |
20140099594 | Methods, Systems, and Products for Monitoring Health - Methods, systems, and products monitor a person's regimen for medicinal and dietary restrictions. When the person's regimen requires a liquid medication or supplement, an oral instrument is commanded to dispense a dosage of fluid. The oral instrument stores a reservoir of the fluid. If the oral instrument is a spoon, for example, the spoon may automatically dispense cough syrup or other medicine. A toothbrush, likewise, may automatically dispense mouthwash. A sensor may confirm presence of the oral instrument in the person's mouth, thus ensuring the dosage of fluid is ingested. | 04-10-2014 |
20140101084 | Methods, Systems, and Products for Interfacing with Neurological and Biological Networks - Methods, systems, and products provide interfaces between intrahost networks and interhost networks within biological hosts. Neuroregional translations are performed to route communications to and from the biological hosts. Bioregional translations may also be performed to route communications to and from the biological hosts. | 04-10-2014 |
20140101296 | Methods, Systems, and Products for Prediction of Mood - Methods, systems, and products predict emotional moods. Predicted moods may then be used to configure devices and machinery. A communications device may be configured to a mood of a user. A car may adjust to the mood of an operator. Even assembly lines may be configured, based on the mood of operators. Machinery and equipment may thus adopt performance and safety precautions that account for varying moods. | 04-10-2014 |
20140101740 | Methods, Systems, and Products for Authentication of Users - Methods, systems, and products authenticate users for access to devices, applications, and services. Skills of a user are learned over time, such that an electronic model of random subject matter may be generated. The user is prompted to interpret the random subject matter, such as with a drawing, physical arrangement, or performance. The user's interpretation is then compared to the electronic model of the random subject matter. If the user is truly who they purport to be, their interpretation will match the electronic model, thus authenticating the user. If interpretation fails to match the electronic model, authentication may be denied. | 04-10-2014 |
20140126741 | Methods, Systems, and Products for Personalized Feedback - Methods, systems, and computer program products provide personalized feedback in a cloud-based environment. A client device routes image data and audio data to a server for analysis. The server analyzes the image data to recognize people of interest. The server also analyzes the audio data to generate audible feedback. Because the server performs image recognition and audio processing, the client device is relieved of these intensive operations. | 05-08-2014 |
20140145873 | Electromagnetic Reflection Profiles - Methods, systems, and products determine electromagnetic reflective characteristics of ambient environments. A wireless communications device sends a cellular impulse and receives reflections of the cellular impulse. The cellular impulse and the reflections of the cellular impulse may be compared to determine the electromagnetic reflective characteristics of an ambient environment. | 05-29-2014 |
20140156697 | Methods, Systems, and Products for Recalling and Retrieving Documentary Evidence - Methods, systems, and products help users recall memories and search for content of those memories. When a user cannot recall a memory, the user is prompted with questions to help recall the memory. As the user answers the questions, a virtual recollection of the memory is synthesized from the answers to the questions. When the user is satisfied with the virtual recollection of the memory, a database of content may be searched for the virtual recollection of the memory. Video data, for example, may be retrieved that matches the virtual recollection of the memory. The video data is thus historical data documenting past events. | 06-05-2014 |
20140162607 | System and Method for Answering a Communication Notification - Disclosed herein are systems, methods, and computer readable-media for answering a communication notification. The method for answering a communication notification comprises receiving a notification of communication from a user, converting information related to the notification to speech, outputting the information as speech to the user, and receiving from the user an instruction to accept or ignore the incoming communication associated with the notification. In one embodiment, information related to the notification comprises one or more of a telephone number, an area code, a geographic origin of the request, caller id, a voice message, address book information, a text message, an email, a subject line, an importance level, a photograph, a video clip, metadata, an IP address, or a domain name. Another embodiment involves notification assigned an importance level and repeat attempts at notification if it is of high importance. | 06-12-2014 |
20140163960 | REAL-TIME EMOTION TRACKING SYSTEM - Devices, systems, methods, media, and programs for detecting an emotional state change in an audio signal are provided. A plurality of segments of the audio signal is received, with the plurality of segments being sequential. Each segment of the plurality of segments is analyzed, and, for each segment, an emotional state and a confidence score of the emotional state are determined. The emotional state and the confidence score of each segment are sequentially analyzed, and a current emotional state of the audio signal is tracked throughout each of the plurality of segments. For each segment, it is determined whether the current emotional state of the audio signal changes to another emotional state based on the emotional state and the confidence score of the segment. | 06-12-2014 |
20140237577 | Methods, Systems, and Products for Authentication of Users - Methods, systems, and products authenticate users for access to devices, applications, and services. Skills of a user are learned over time, such that an electronic model of random subject matter may be generated. The user is prompted to interpret the random subject matter, such as with a drawing, physical arrangement, or performance. The user's interpretation is then compared to the electronic model of the random subject matter. If the user is truly who they purport to be, their interpretation will match the electronic model, thus authenticating the user. If interpretation fails to match the electronic model, authentication may be denied. | 08-21-2014 |
20140350938 | SYSTEM AND METHOD FOR DETECTING SYNTHETIC SPEAKER VERIFICATION - Disclosed herein are systems, methods, and tangible computer readable-media for detecting synthetic speaker verification. The method comprises receiving a plurality of speech samples of the same word or phrase for verification, comparing each of the plurality of speech samples to each other, denying verification if the plurality of speech samples demonstrate little variance over time or are the same, and verifying the plurality of speech samples if the plurality of speech samples demonstrates sufficient variance over time. One embodiment further adds that each of the plurality of speech samples is collected at different times or in different contexts. In other embodiments, variance is based on a pre-determined threshold or the threshold for variance is adjusted based on a need for authentication certainty. In another embodiment, if the initial comparison is inconclusive, additional speech samples are received. | 11-27-2014 |
20140379350 | System and Method for Synthetically Generated Speech Describing Media Content - Disclosed herein are systems, methods, and computer readable-media for providing an automatic synthetically generated voice describing media content, the method comprising receiving one or more pieces of metadata for a primary media content, selecting at least one piece of metadata for output, and outputting the at least one piece of metadata as synthetically generated speech with the primary media content. Other aspects of the invention involve alternative output, output speech simultaneously with the primary media content, output speech during gaps in the primary media content, translate metadata in foreign language, tailor voice, accent, and language to match the metadata and/or primary media content. A user may control output via a user interface or output may be customized based on preferences in a user profile. | 12-25-2014 |
20150046160 | Systems, Computer-Implemented Methods, and Tangible Computer-Readable Storage Media For Transcription Alignment - Disclosed herein are systems, computer-implemented methods, and tangible computer-readable storage media for captioning a media presentation. The method includes receiving automatic speech recognition (ASR) output from a media presentation and a transcription of the media presentation. The method includes selecting via a processor a pair of anchor words in the media presentation based on the ASR output and transcription and generating captions by aligning the transcription with the ASR output between the selected pair of anchor words. The transcription can be human-generated. Selecting pairs of anchor words can be based on a similarity threshold between the ASR output and the transcription. In one variation, commonly used words on a stop list are ineligible as anchor words. The method includes outputting the media presentation with the generated captions. The presentation can be a recording of a live event. | 02-12-2015 |
20150072739 | System and Method for Answering a Communication Notification - Disclosed herein are systems, methods, and computer readable-media for answering a communication notification. The method for answering a communication notification comprises receiving a notification of communication from a user, converting information related to the notification to speech, outputting the information as speech to the user, and receiving from the user an instruction to accept or ignore the incoming communication associated with the notification. In one embodiment, information related to the notification comprises one or more of a telephone number, an area code, a geographic origin of the request, caller id, a voice message, address book information, a text message, an email, a subject line, an importance level, a photograph, a video clip, metadata, an IP address, or a domain name. Another embodiment involves notification assigned an importance level and repeat attempts at notification if it is of high importance. | 03-12-2015 |
20150073805 | SYSTEM AND METHOD FOR DISTRIBUTED VOICE MODELS ACROSS CLOUD AND DEVICE FOR EMBEDDED TEXT-TO-SPEECH - Disclosed herein are systems, methods, and computer-readable storage media for intelligent caching of concatenative speech units for use in speech synthesis. A system configured to practice the method can identify a speech synthesis context, and determine, based on a local cache of text-to-speech units for a text-to-speech voice and based on the speech synthesis context, additional text-to-speech units which are not in the local cache. The system can request from a server the additional text-to-speech units, and store the additional text-to-speech units in the local cache. The system can then synthesize speech using the text-to-speech units and the additional text-to-speech units in the local cache. The system can prune the cache as the context changes, based on availability of local storage, or after synthesizing the speech. The local cache can store a core set of text-to-speech units associated with the text-to-speech voice that cannot be pruned from the local cache. | 03-12-2015 |
20150145995 | ENHANCED VIEW FOR CONNECTED CARS - Network connectivity is used to share relevant visual and other sensory information between vehicles, as well as delivering relevant information provided by network services to create an enhanced view of the vehicle's surroundings. The enhanced view is presented to the occupants of the vehicle to provide an improved driving experience and/or enable the occupants to take proper action (e.g., avoid obstacles, identify traffic delays, etc.). In one example, the enhanced view comprises information that is not visible to the naked eye and/or cannot be currently sensed by the vehicle's sensors (e.g., due to a partial or blocked view, low visibility conditions, hardware capabilities of the vehicle's sensors, position of the vehicle's sensors, etc.). | 05-28-2015 |
20150220830 | Routing Policies for Biological Hosts - Methods, systems, and products provide interfaces between intrahost networks and interhost networks within biological hosts. Neuroregional translations are performed to route communications to and from the biological hosts. Bioregional translations may also be performed to route communications to and from the biological hosts. | 08-06-2015 |
20150235655 | REAL-TIME EMOTION TRACKING SYSTEM - Devices, systems, methods, media, and programs for detecting an emotional state change in an audio signal are provided. A plurality of segments of the audio signal is received, with the plurality of segments being sequential. Each segment of the plurality of segments is analyzed, and, for each segment, an emotional state and a confidence score of the emotional state are determined. The emotional state and the confidence score of each segment are sequentially analyzed, and a current emotional state of the audio signal is tracked throughout each of the plurality of segments. For each segment, it is determined whether the current emotional state of the audio signal changes to another emotional state based on the emotional state and the confidence score of the segment. | 08-20-2015 |
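The segment-by-segment tracking described in the REAL-TIME EMOTION TRACKING SYSTEM entry can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration rather than the method claimed in the application: the `Segment` type, the `track_emotion` function, and the fixed confidence `threshold` are hypothetical names, and the rule "switch state only when a segment disagrees with the current state and its confidence meets the threshold" is just one plausible reading of the abstract.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple


@dataclass(frozen=True)
class Segment:
    """One sequential slice of the audio signal (hypothetical type)."""
    emotion: str       # emotional state estimated for this segment
    confidence: float  # confidence score of that estimate, in [0, 1]


def track_emotion(
    segments: Iterable[Segment], threshold: float = 0.6
) -> Tuple[List[str], List[int]]:
    """Track the current emotional state across sequential segments.

    Returns the tracked state after each segment and the indices of the
    segments at which the tracked state changed. The threshold rule is an
    illustrative assumption, not taken from the application.
    """
    current: Optional[str] = None
    states: List[str] = []
    changes: List[int] = []
    for index, segment in enumerate(segments):
        if current is None:
            # The first segment initializes the tracked state.
            current = segment.emotion
        elif segment.emotion != current and segment.confidence >= threshold:
            # A confident, differing estimate changes the tracked state.
            current = segment.emotion
            changes.append(index)
        states.append(current)
    return states, changes
```

With this rule, a low-confidence segment that disagrees with the current state is ignored, so a single noisy estimate does not register as a change of emotional state.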
Patent application number | Description | Published |
20090300041 | Method and System for Training a Text-to-Speech Synthesis System Using a Specific Domain Speech Database - A method and system are disclosed that train a text-to-speech synthesis system for use in speech synthesis. The method includes generating a speech database of audio files comprising domain-specific voices having various prosodies, and training a text-to-speech synthesis system using the speech database by selecting audio segments having a prosody based on at least one dialog state. The system includes a processor, a speech database of audio files, and modules for implementing the method. | 12-03-2009 |
20120136664 | SYSTEM AND METHOD FOR CLOUD-BASED TEXT-TO-SPEECH WEB SERVICES - Disclosed herein are systems, methods, and non-transitory computer-readable storage media for generating speech. One variation of the method is from a server side, and another variation of the method is from a client side. The server side method, as implemented by a network-based automatic speech processing system, includes first receiving, from a network client independent of knowledge of internal operations of the system, a request to generate a text-to-speech voice. The request can include speech samples, transcriptions of the speech samples, and metadata describing the speech samples. The system extracts sound units from the speech samples based on the transcriptions and generates an interactive demonstration of the text-to-speech voice based on the sound units, the transcriptions, and the metadata, wherein the interactive demonstration hides a back end processing implementation from the network client. The system provides access to the interactive demonstration to the network client. | 05-31-2012 |
20140098240 | METHOD AND APPARATUS FOR PROCESSING COMMANDS DIRECTED TO A MEDIA CENTER - A system that incorporates teachings of the subject disclosure may include, for example, a method for controlling a steering of a plurality of cameras to identify a plurality of potential sources, identifying the plurality of potential sources according to image data provided by the plurality of cameras, assigning a beam of a plurality of beams of a plurality of microphones to each of the plurality of potential sources, detecting a first command comprising one of a first audible cue based on signals from a portion of the plurality of microphones, a first visual cue based on image data from one of the plurality of cameras, or both for controlling a media center, and configuring the media center according to the first command. Other embodiments are disclosed. | 04-10-2014 |
20140101689 | SYSTEM AND METHOD FOR A COMMUNICATION EXCHANGE WITH AN AVATAR IN A MEDIA COMMUNICATION SYSTEM - A system that incorporates teachings of the present disclosure may include, for example, a processor that causes a STB to present an avatar. The processor can receive from the STB a response of the user, detect from the response a change in an emotional state of the user, adapt a search for media content according to the change in the emotional state of the user, and adapt a portion of the characteristics of the avatar relating to emotional feedback according to the change in the emotional state of the user. The processor can cause the STB to present the adapted avatar presenting content from a media content source identified from the adapted search for media content. Other embodiments are disclosed. | 04-10-2014 |
20140157152 | SYSTEM AND METHOD FOR DISTRIBUTING AN AVATAR - A system that incorporates teachings of the present disclosure may include, for example, a first computing device having a controller to present an avatar having characteristics that correlate to a user profile and that conform to operating characteristics of the first computing device, and transmit to a second computing device operational information associated with the avatar for reproducing at least in part the avatar at said second computing device. Other embodiments are disclosed. | 06-05-2014 |
20150040147 | PRESENTATION OF AN AVATAR IN ASSOCIATION WITH A MERCHANT SYSTEM - A system that incorporates teachings of the present disclosure may include, for example, an avatar engine having a controller to retrieve a user profile, cause a presentation device to present a user an avatar having characteristics that correlate to the user profile, detect one or more responses of the user, identify from the one or more responses a need to communicate with a merchant system, establish a communication session with the merchant system, receive a notification from the merchant system of a merchant avatar engine, establish communication with the merchant avatar engine, adapt the characteristics of the avatar at least in part according to instructions supplied by the merchant avatar engine, and cause the presentation device to present the user the adapted avatar. Other embodiments are disclosed. | 02-05-2015 |
20150095930 | System and Method for a Communication Exchange with an Avatar in a Media Communication System - A system that incorporates teachings of the present disclosure may include, for example, a processor that causes a STB to present an avatar. The processor can receive from the STB a response of the user, detect from the response a change in an emotional state of the user, adapt a search for media content according to the change in the emotional state of the user, and adapt a portion of the characteristics of the avatar relating to emotional feedback according to the change in the emotional state of the user. The processor can cause the STB to present the adapted avatar presenting content from a media content source identified from the adapted search for media content. Other embodiments are disclosed. | 04-02-2015 |
20150149159 | SYSTEM AND METHOD FOR NETWORK BANDWIDTH MANAGEMENT FOR ADJUSTING AUDIO QUALITY - Disclosed herein are systems, methods, and computer-readable storage devices for processing audio signals. An example system configured to practice the method receives audio at a device to be transmitted to a remote speech processing system. The system analyzes one of noise conditions, need for an enhanced speech quality, and network load to yield an analysis. Based on the analysis, the system determines to bypass user-defined options for enhancing audio for speech processing. Then, based on the analysis, the system can modify an audio transmission parameter used to transmit the audio from the device to the remote speech processing system. The audio transmission parameter can be one of an amount of coding, a chosen codec, or a number of audio channels, for example. | 05-28-2015 |
20150149285 | TARGETING MEDIA DELIVERY TO A MOBILE AUDIENCE - A system that incorporates the subject disclosure may perform, for example, operations including determining a representative trajectory of a number of mobile devices relative to a media presentation device, such as a digital billboard. An audience of the number of mobile devices is identified and user characteristics are obtained of the audience. A representative interest of the audience is determined from the user characteristics of the audience, and a media content item is selected according to the representative interest and the representative trajectory. The media content item is presented at the media presentation device to expose the audience to the media content item. Other embodiments are disclosed. | 05-28-2015 |
20150221298 | System and Method for Cloud-Based Text-to-Speech Web Services - Disclosed herein are systems, methods, and non-transitory computer-readable storage media for generating speech. One variation of the method is from a server side, and another variation of the method is from a client side. The server side method, as implemented by a network-based automatic speech processing system, includes first receiving, from a network client independent of knowledge of internal operations of the system, a request to generate a text-to-speech voice. The request can include speech samples, transcriptions of the speech samples, and metadata describing the speech samples. The system extracts sound units from the speech samples based on the transcriptions and generates an interactive demonstration of the text-to-speech voice based on the sound units, the transcriptions, and the metadata, wherein the interactive demonstration hides a back end processing implementation from the network client. The system provides access to the interactive demonstration to the network client. | 08-06-2015 |