Entries
Document | Title | Date |
20080201142 | METHOD AND APPARATUS FOR AUTOMATIC CREATION OF AN INTERACTIVE LOG BASED ON REAL-TIME CONTENT | 08-21-2008 |
20080201143 | SYSTEM AND METHOD FOR MULTI-MODAL AUDIO MINING OF TELEPHONE CONVERSATIONS - A system and method for the automated monitoring of inmate telephone calls as well as multi-modal search, retrieval and playback capabilities for said calls. A general term for such capabilities is multi-modal audio mining. The invention is designed to provide an efficient means for organizations such as correctional facilities to identify and monitor the contents of telephone conversations and to provide evidence of possible inappropriate conduct and/or criminal activity of inmates by analyzing monitored telephone conversations for events, including, but not limited to, the addition of third parties, the discussion of particular topics, and the mention of certain entities. | 08-21-2008 |
20080215321 | Pitch model for noise estimation - Pitch is tracked for individual samples, which are taken much more frequently than an analysis frame. Speech is identified based on the tracked pitch and the speech components of the signal are removed with a time-varying filter, leaving only an estimate of a time-varying noise signal. This estimate is then used to generate a time-varying noise model which, in turn, can be used to enhance speech related systems. | 09-04-2008 |
20080221879 | MOBILE ENVIRONMENT SPEECH PROCESSING FACILITY - In embodiments of the present invention improved capabilities are described for a mobile environment speech processing facility. The present invention may provide for the entering of text into a software application resident on a mobile communication facility, where recorded speech may be presented by the user using the mobile communications facility's resident capture facility. Transmission of the recording may be provided through a wireless communication facility to a speech recognition facility, and may be accompanied by information related to the software application. Results may be generated utilizing the speech recognition facility that may be independent of structured grammar, and may be based at least in part on the information relating to the software application and the recording. The results may then be transmitted to the mobile communications facility, where they may be loaded into the software application. In addition, the speech recognition facility may be adapted based on usage. | 09-11-2008 |
20080221880 | Mobile music environment speech processing facility - In embodiments of the present invention improved capabilities are described for a mobile environment speech processing facility. The present invention may provide for the entering of text into a music software application resident on a mobile communication facility, where speech may be recorded using the mobile communications facility's resident capture facility. Transmission of the recording may be provided through a wireless communication facility to a speech recognition facility. Results may be generated utilizing the speech recognition facility that may be independent of structured grammar, and may be based at least in part on the information relating to the recording. The results may then be transmitted to the mobile communications facility, where they may be loaded into the music software application. In embodiments, the user may be allowed to alter the results that are received from the speech recognition facility. In addition, the speech recognition facility may be adapted based on usage. | 09-11-2008 |
20080221881 | Recognition of Speech in Editable Audio Streams - A speech processing system divides a spoken audio stream into partial audio streams, referred to as “snippets.” The system may divide a portion of the audio stream into two snippets at a position at which the speaker performed an editing operation, such as pausing and then resuming recording, or rewinding and then resuming recording. The snippets may be transmitted sequentially to a consumer, such as an automatic speech recognizer or a playback device, as the snippets are generated. The consumer may process (e.g., recognize or play back) the snippets as they are received. The consumer may modify its output in response to editing operations reflected in the snippets. The consumer may process the audio stream while it is being created and transmitted even if the audio stream includes editing operations that invalidate previously-transmitted partial audio streams, thereby enabling shorter turnaround time between dictation and consumption of the complete audio stream. | 09-11-2008 |
20080221882 | SYSTEM FOR EXCLUDING UNWANTED DATA FROM A VOICE RECORDING - An apparatus and method for the preparation of a censored recording of an audio source according to a procedure whereby no tangible, durable version of the original audio data is created in the course of preparing the censored record. Further, a method is provided for identifying target speech elements in a primary speech text by iteratively using portions of already identified target elements to locate further target elements that contain identical portions. The target speech elements, once identified, are removed from the primary speech text or rendered unintelligible to produce a censored record of the primary speech text. Copies of such censored primary speech text elements may be transmitted and stored with reduced security precautions. | 09-11-2008 |
20080221883 | HANDS FREE CONTACT DATABASE INFORMATION ENTRY AT A COMMUNICATION DEVICE - A method, system, and program provides for hands free contact database information entry at a communication device. A recording system at a communication device detects a user initiation to record. Responsive to detecting the user initiation to record, the recording system records the ongoing conversation supported between the communication device and a second remote communication device. The recording system converts the recording of the conversation into text. Next, the recording system extracts contact information from the text. Then, the recording system stores the extracted contact information in an entry of the contact database, such that contact information is added to the contact database of the communication device without manual entry of the contact information by the user. | 09-11-2008 |
20080228479 | Data transcription and management system and method - What is disclosed is a data gathering, storage and management system, which includes a database in which data files are stored. The database includes a series of selected keywords each associated with one or more content files, the content files comprising advertisements, information, on-screen control buttons for performing a series of functions, and links for access to websites and other sources of information. The system accepts audio data files and identifies keywords that may be heard in the audio file. In one embodiment, the audio data file is transcribed and keywords are searched for in the transcribed text. The identified keywords from the audio data file are compared with the selected keywords, and at least one content file is selected for display for each retrieved keyword in the list which matches a selected keyword. The content file is displayed to the user so that the displayed content is relevant to the produced audio, which may be a recording of a conversation, a speech, or issued voice commands. What is disclosed is a method and system for providing relevant content to a user based upon speech. | 09-18-2008 |
20080228480 | SPEECH RECOGNITION METHOD, SPEECH RECOGNITION SYSTEM, AND SERVER THEREOF - A speech recognition method comprises a model selection step, which selects a recognition model based on characteristic information of the input speech, and a speech recognition step, which translates the input speech into text data based on the selected recognition model. | 09-18-2008 |
20080235014 | Method and System for Processing Dictated Information - A method and a system for processing dictated information into a dynamic form are disclosed. The method comprises presenting an image ( | 09-25-2008 |
20080243500 | Automatic Editing Using Probabilistic Word Substitution Models - An input sequence of unstructured speech recognition text is transformed into output structured document text. A probabilistic word substitution model is provided which establishes association probabilities indicative of target structured document text correlating with source unstructured speech recognition text. The input sequence of unstructured speech recognition text is looked up in the word substitution model to determine likelihoods of the represented structured document text corresponding to the text in the input sequence. Then, a most likely sequence of structured document text is generated as an output. | 10-02-2008 |
20080243501 | Location-Based Responses to Telephone Requests - A method for receiving processed information at a remote device is described. The method includes transmitting from the remote device a verbal request to a first information provider and receiving a digital message from the first information provider in response to the transmitted verbal request. The digital message includes a symbolic representation indicator associated with a symbolic representation of the verbal request and data used to control an application. The method also includes transmitting, using the application, the symbolic representation indicator to a second information provider for generating results to be displayed on the remote device. | 10-02-2008 |
20080255837 | Method for locating an audio segment within an audio file - A method for locating an audio segment within an audio file comprising (i) providing a first transcribed text file associated with the audio file; (ii) providing a second transcribed text file associated with the audio file; (iii) receiving a user input defining a text segment corresponding to the audio segment to be located; (iv) searching for the text segment in the first transcribed text file; and (v) displaying only those occurrences of the text segment within the first transcribed text file that are also a match to occurrences of the text segment within the second transcribed text file. | 10-16-2008 |
20080255838 | AUDIBLE PRESENTATION AND VERBAL INTERACTION OF HTML-LIKE FORM CONSTRUCTS - A method of synchronizing an audio and visual presentation in a multi-modal browser. A form having at least one field requiring user-supplied information is transmitted over a network to a multi-modal browser. Blank fields within the form are filled in by the user, who provides verbal interaction, tactile interaction, or a combination of the two. The browser moves to the next field requiring user-provided input. Finally, the form exits after the user has supplied input for all required fields. The method also provides a synchronized verbal and visual presentation by the browser, reading aloud the headings for the fields to be filled out and typing in what the user says. | 10-16-2008 |
20080270128 | Text Input System and Method Based on Voice Recognition - Provided is a text input system and method based on voice recognition. The system includes: an input unit for receiving part of text, i.e., partial text; a voice input unit for receiving the entire text of the partial text by voice; a voice recognition preprocessing unit for analyzing the voice inputted through the voice input unit and transmitting the partial text inputted through the input unit with voice analysis information; a voice recognizing unit for creating a list of recognition candidates by using the partial text transmitted from the voice recognition preprocessing unit, performing voice recognition and selecting a text among the recognition candidates; and an output unit for outputting the final voice-recognized text. | 10-30-2008 |
20080275700 | Method of and System for Modifying Messages - The invention describes a method of and a system for modifying an input message (IM) containing audio content, which method comprises the steps of converting the audio content (A) of the input message (IM) into elements of a text representation (TR), segmenting the audio content (A) of the input message (IM) into constituent phonetic elements (A | 11-06-2008 |
20080275701 | SYSTEM AND METHOD FOR RETRIEVING DATA BASED ON TOPICS OF CONVERSATION - A method includes performing computerized monitoring with a computer of at least one side of a telephone conversation, which includes spoken words, between a first person and a second person, automatically identifying at least one topic of the conversation, automatically performing a search for information related to the at least one topic, and outputting a result of the search. Also a system for performing the method. | 11-06-2008 |
20080275702 | System and method for providing digital dictation capabilities over a wireless device - A system and method for providing digital dictation capabilities over a wireless device. The system and method enables digital dictations to be recorded on a wireless device, such as a BlackBerry smartphone, and then uploaded wirelessly to a remote location, such as a server, for transcription. Features of the wireless device, such as the display and trackball, can be used to control the dictation. | 11-06-2008 |
20080281592 | Method and Apparatus for Annotating Video Content With Metadata Generated Using Speech Recognition Technology - A method and apparatus is provided for annotating video content with metadata generated using speech recognition technology. The method begins by rendering video content on a display device. A segment of speech is received from a user such that the speech segment annotates a portion of the video content currently being rendered. The speech segment is converted to a text-segment and the text-segment is associated with the rendered portion of the video content. The text segment is stored in a selectively retrievable manner so that it is associated with the rendered portion of the video content. | 11-13-2008 |
20080288249 | Method and System for Dynamic Creation of Contexts - A method and a system for a speech recognition system ( | 11-20-2008 |
20080288250 | REAL-TIME TRANSCRIPTION SYSTEM - A transcription system and method that includes a transcription terminal for recording electronically generated text as units of transcribed text, and a conversion unit for translating the units of transcribed text into a generally accurate transcript of the electronically generated text and converting said transcript into a signal to be transmitted to an authorized receiving unit over a communication link. The system and method optionally includes any of a presentation object to be transmitted to the authorized receiving unit, a wireless access point for transmitting serial data representing the transcript, and suppression of an automatic network identifier. | 11-20-2008 |
20080288251 | Tracking Time Using Portable Recorders and Speech Recognition - In general, the present invention converts speech, preferably recorded on a portable recorder, to text, analyzes the text, and determines voice commands and times when the voice commands occurred. Task names are associated with voice commands and time segments. These time segments and tasks may be packaged as time increments and stored (e.g., in a file or database) for further processing. Preferably, phrase grammar rules are used when analyzing the text, as this helps to determine voice commands. Using phrase grammar rules also allows the text to contain a variety of topics, only some of which are pertinent to tracking time. | 11-20-2008 |
20080294433 | Automatic Text-Speech Mapping Tool - A text-speech mapping method. Silence segments for incoming speech data are obtained. Incoming transcript data is preprocessed. The incoming transcript data comprises a written document of the speech data. Possible candidate sentence endpoints based on the silence segments are found. A best match sentence endpoint is selected based on a forced alignment score. The next sentence is set to begin immediately after the current sentence endpoint, and the process of finding candidate sentence endpoints, selecting the best match sentence endpoint, and setting the next sentence is repeated until all sentences for the incoming speech data are mapped. The process is repeated for each mapped sentence to provide word level mapping. | 11-27-2008 |
20080294434 | Live Media Captioning Subscription Framework for Mobile Devices - A subscription-based system provides transcribed audio information to one or more mobile devices. Some techniques feature a system for providing subscription services for currently-generated (e.g., not stored) information (e.g., caption information, transcribed audio) for one or more mobile devices for a live/current audio event. There can be a communication network for communicating to the one or more mobile devices, a transcriber configured for transcribing the event to generate information (e.g., caption information, transcribed audio). Caption data includes transcribed data and control code data. The system includes a subscription gateway configured for live/current transfer of the transcribed data to the one or more mobile devices. The subscription gateway is configured to provide access for the transcribed data to the one or more mobile devices. User preferences for subscribers can be set and/or updated by mobile device users and/or GPS-capable mobile devices to receive feeds for the live/current audio event. | 11-27-2008 |
20080300872 | SCALABLE SUMMARIES OF AUDIO OR VISUAL CONTENT - Providing for browsing a summary of content formed of keywords that can scale to a user-defined level of detail is disclosed herein. Components of a system can include a summarization component that extracts keywords related to the content and associates the keywords with portions thereof, and a zooming component that displays a number of keywords based on a keyword/keyphrase relevance rank and a zoom factor. Additionally, a speech to text component can translate speech associated with the content into text, wherein the keywords are extracted from the translated text. Consequently, the claimed subject matter can present a variable hierarchy of keywords to form a scalable summary of such recorded content. | 12-04-2008 |
20080300873 | Systems And Methods For Securely Transcribing Voicemail Messages - A system or method for securely transcribing voicemail messages includes answering a call within a secure communication provider, the secure communication provider recording audio of the call, sending the audio to a voicemail transcription service via a secure communication link, transcribing the audio into text, and sending the text to the secure communication provider via the secure communication link, the audio and the text not being permanently stored and not being available for interpretation by humans during this transcription method. | 12-04-2008 |
20080300874 | SPEECH SKILLS ASSESSMENT - An approach to evaluating a person's speech skills includes automatically processing speech of a person and text some or all of which corresponds to the speech. In some examples, a job application procedure includes collecting speech from an applicant, and using text corresponding to the collected speech to automatically assess speech skills of the applicant. The text may include text that is presented to the applicant and the speech collected from the applicant can include the applicant reading the presented text. | 12-04-2008 |
20080306737 | SYSTEMS AND METHODS FOR CLASSIFYING AND REPRESENTING GESTURAL INPUTS - Gesture and handwriting recognition agents provide possible interpretations of electronic ink. Recognition is performed on both individual strokes and combinations of strokes in the input ink lattice. The interpretations of electronic ink are classified and encoded as symbol complexes where symbols convey specific attributes of the contents of the stroke. The use of symbol complexes to represent strokes in the input ink lattice facilitates reference to sets of entities of a specific type. | 12-11-2008 |
20080312919 | Method and System for Speech Based Document History Tracking - A method and a system of history tracking corrections in a speech based document are disclosed. The speech based document comprises one or more sections of text recognized or transcribed from sections of speech, wherein the sections of speech are dictated by a user and processed by a speech recognizer in a speech recognition system into corresponding sections of text of the speech based document. The method comprises associating of at least one speech attribute ( | 12-18-2008 |
20080312920 | SPEECH-TO-SPEECH GENERATION SYSTEM AND METHOD - An expressive speech-to-speech generation system which can generate expressive speech output by using expressive parameters extracted from the original speech signal to drive the standard TTS system. The system comprises: speech recognition means, machine translation means, text-to-speech generation means, expressive parameter detection means for extracting expressive parameters from the speech of language A, and expressive parameter mapping means for mapping the expressive parameters extracted by the expressive parameter detection means from language A to language B, and driving the text-to-speech generation means by the mapping results to synthesize expressive speech. | 12-18-2008 |
20080319742 | SYSTEM AND METHOD FOR POSTING TO A BLOG OR WIKI USING A TELEPHONE - The present invention discloses a system and method for creating, editing, and posting a BLOG or a WIKI using a telephone. In the invention, a voice-based, real-time telephone communication can be established between a user and a voice response system. User speech can be received over the communication. The user speech can be speech-to-text converted to produce text. The text can be added to a BLOG or a WIKI, which can be posted to a server. The telephone communication can be terminated. The newly posted BLOG or WIKI can be served by the server to clients. | 12-25-2008 |
20080319743 | ASR-Aided Transcription with Segmented Feedback Training - An ASR-aided transcription system with segmented feedback training is provided, the system including a transcription process manager configured to extract a first segment and a second segment from an audio input of speech uttered by a speaker, and an ASR engine configured to operate in a first speech recognition mode to convert the first speech segment into a first text transcript using a speaker-independent acoustic model and a speaker-independent language model, operate in a first training mode to create a speaker-specific acoustic model and a speaker-specific language model by adapting the speaker-independent acoustic model and the speaker-independent language model using either of the first segment and a corrected version of the first text transcript, and operate in a second speech recognition mode to convert the second speech segment into a second text transcript using the speaker-specific acoustic model and the speaker-specific language model. | 12-25-2008 |
20080319744 | METHOD AND SYSTEM FOR RAPID TRANSCRIPTION - A method and system for producing and working with transcripts according to the invention eliminates the foregoing time inefficiencies. By dispersing a source recording to a transcription team in small segments, so that team members transcribe segments in parallel, a rapid transcription process delivers a fully edited transcript within minutes. Clients can view accurate, grammatically correct, proofread and fact-checked documents that shadow live proceedings by mere minutes. The rapid transcript includes time coding, speaker identification and summary. A viewer application allows a client to view a video recording side-by-side with a transcript. Clicking on a word in the transcript locates the corresponding recorded content; advancing a recording to a particular point locates and displays the corresponding spot in the transcript. The recording is viewed using common video features, and may be downloaded. The client can edit the transcript and insert comments. Any number of colleagues can view and edit simultaneously. | 12-25-2008 |
20080319745 | METHOD AND DEVICE FOR PROVIDING SPEECH-TO-TEXT ENCODING AND TELEPHONY SERVICE - A machine-readable medium and a network device are provided for speech-to-text translation. Speech packets are received at a broadband telephony interface and stored in a buffer. The speech packets are processed and textual representations thereof are displayed as words on a display device. Speech processing is activated and deactivated in response to a command from a subscriber. | 12-25-2008 |
20090006089 | METHOD AND APPARATUS FOR STORING REAL TIME INFORMATION ON A MOBILE COMMUNICATION DEVICE - A method and apparatus that stores information on a mobile communication device is disclosed. The method may include receiving a first signal from a user, initiating a recording of information spoken by at least one of the user, a voice mail recording, a recorded message, and a party engaged in the telephone call with the user based on the received first signal, receiving a second signal from the user, stopping the recording of the information based on the second signal being received, converting the recorded information to text, and storing the converted text to a designated location. | 01-01-2009 |
20090006090 | IMAGE COMMUNICATION APPARATUS AND CONTROL METHOD OF THE SAME - An image communication apparatus includes: an image pickup unit which picks up a user image of a user and processes the user image into a user image signal; an audio input unit which receives a user audio signal of the user; an encoder which encodes the user image signal processed by the image pickup unit and the user audio signal; a communication unit which receives an encoded image signal and an encoded audio signal from outside and transmits the user image signal and the user audio signal which are encoded by the encoder; a decoder which decodes the encoded image signal and the encoded audio signal which are received through the communication unit; and a controller which converts at least one of the user audio signal, the user image signal, the decoded image signal and the decoded audio signal into a data file which is stored. | 01-01-2009 |
20090012787 | DIALOG PROCESSING SYSTEM, DIALOG PROCESSING METHOD AND COMPUTER PROGRAM - A dialog processing system which includes a target expression data extraction unit for extracting, from a plurality of utterance data inputted by an utterance data input unit and obtained by converting the contents of a plurality of conversations in one field, a plurality of target expression data each including a pattern matching portion that matches an utterance pattern, the utterance pattern being inputted by an utterance pattern input unit and being an utterance structure derived from the contents of field-independent general conversations; a feature extraction unit for retrieving the pattern matching portions from the plurality of extracted target expression data and extracting feature quantities common to the plurality of pattern matching portions; and a mandatory data extraction unit for extracting mandatory data in the one field included in the plurality of utterance data by use of the extracted feature quantities. | 01-08-2009 |
20090012788 | SIGN LANGUAGE TRANSLATION SYSTEM - The translation system of a preferred embodiment includes an input element that receives an input language as audio information, an output element that displays an output language as visual information, and a remote server coupled to the input element and the output element, the remote server including a database of sign language images; and a processor that receives the input language from the input element, translates the input language into the output language, and transmits the output language to the output element, wherein the output language is a series of the sign language images that correspond to the input language and that are coupled to one another with substantially seamless continuity, such that the ending position of a first image is blended into the starting position of a second image. | 01-08-2009 |
20090018829 | Speech Recognition Dialog Management - Described is a speech recognition dialog management system that allows more open-ended conversations between virtual agents and people than are possible using just agent-directed dialogs. The system uses both novel dialog context switching and learning algorithms based on spoken interactions with people. The context switching is performed through processing multiple dialog goals in a last-in-first-out (LIFO) pattern. The recognition accuracy for these new flexible conversations is improved through automated learning from processing errors and addition of new grammars. | 01-15-2009 |
20090018830 | SPEECH CONTROL OF COMPUTING DEVICES - The invention relates to techniques of controlling a computing device via speech. A method realization of the proposed techniques comprises the steps of transforming speech input into a text string comprising one or more input words; performing a context-related mapping of the input words to one or more functions for controlling the computing device; and preparing an execution of the identified function. Another realization is related to a remote speech control of computing devices. | 01-15-2009 |
20090024389 | Text oriented, user-friendly editing of a voicemail message - A system in one embodiment includes a server associated with a unified messaging system (UMS). The server records speech of a user as an audio data file, translates the audio data file into a text data file, and maps each word within the text data file to a corresponding segment of audio data in the audio data file. A graphical user interface (GUI) of a message editor running on an endpoint associated with the user displays the text data file on the endpoint and allows the user to identify a portion of the text data file for replacement. The server is further operable to record new speech of the user as new audio data and to replace one or more segments of the audio data file corresponding to the portion of the text with the new audio data. | 01-22-2009 |
20090030680 | Method and System of Indexing Speech Data - A method and system of indexing speech data. The method includes indexing word transcripts including a timestamp for a word occurrence; and indexing sub-word transcripts including a timestamp for a sub-word occurrence. A timestamp in the index indicates the time and duration of occurrence of the word or sub-word in the speech data, and word and sub-word occurrences can be correlated using the timestamps. A method of searching speech transcripts is also provided in which a search query in the form of a phrase to be searched includes at least one in-vocabulary word and at least one out-of-vocabulary word. The method of searching includes extracting the search terms from the phrase, retrieving a list of occurrence of words for an in-vocabulary search term from an index of words having timestamps, retrieving a list of occurrences of sub-words for an out-of-vocabulary search term from an index of sub-words having timestamps, and merging the retrieved lists of occurrences of words and sub-words according to their timestamps. | 01-29-2009 |
20090030681 | CONTROLLING A SET-TOP BOX VIA REMOTE SPEECH RECOGNITION - A device may receive over a network a digitized speech signal from a remote control that accepts speech. In addition, the device may convert the digitized speech signal into text, use the text to obtain command information applicable to a set-top box, and send the command information to the set-top box to control presentation of multimedia content on a television in accordance with the command information. | 01-29-2009 |
20090030682 | System and method for publishing media files - A method for publishing a digital media file. The method includes the steps of receiving the digital media file containing speech, converting the speech to text, identifying a keyword in the text, retrieving, for the keyword, a corresponding URL from a database, inserting into the text a hyperlink linking the keyword with the corresponding URL, and making the media file and the text available to a subscriber. | 01-29-2009 |
20090037170 | METHOD AND APPARATUS FOR VOICE COMMUNICATION USING ABBREVIATED TEXT MESSAGES - The present invention relates generally to an apparatus and method for capturing and producing voice using abbreviated text messages, specifically, to translate voice into an abbreviated message text format for transmission in a communication system and to translate abbreviated message text received from a communication system to voice. | 02-05-2009 |
20090037171 | Real-time voice transcription system - The real-time voice transcription system provides a speech recognition system and method that includes use of speech and spatial-temporal acoustic data to enhance speech recognition probabilities while simultaneously identifying the speaker. Real-time edit capability is provided enabling a user to train the system during a transcription session. The system may be connected to user computers via local network and/or wide area network means. | 02-05-2009 |
20090048829 | Differential Dynamic Content Delivery With Text Display In Dependence Upon Sound Level - Differential dynamic content delivery including providing a session document for a presentation, where the session document includes a session grammar and a session structured document; selecting from the session structured document a classified structural element in dependence upon user classifications of a user participant in the presentation; presenting the selected structural element to the user; streaming speech to the user from one or more users participating in the presentation; converting the speech to text; detecting a total sound level for the user; and determining whether to display the text in dependence upon the total sound level for the user. | 02-19-2009 |
20090048830 | Conceptual analysis driven data-mining and dictation system and method - A new approach to speech recognition reacts to concepts conveyed through speech, shifting the balance of power in speech recognition from straight sound recognition and statistical models to a more powerful and complete approach that determines and addresses conveyed concepts. A probabilistically unbiased multi-phoneme recognition process is employed, followed by a phoneme stream analysis process that builds the list of candidate words derived from recognized phonemes, followed by a permutation analysis process that produces sequences of candidate words with high potential of being syntactically valid, and finally, by processing targeted syntactic sequences in a conceptual analysis process to generate the utterance's conceptual representation that can be used to produce an adequate response. Applications include improving accuracy or automatically generating punctuation for transcription and dictation, word or concept spotting in audio streams, concept spotting in electronic text, customer support, call routing and other command/response scenarios. | 02-19-2009 |
20090048831 | Scripting support for data identifiers, voice recognition and speech in a telnet session - Methods of adding data identifiers and speech/voice recognition functionality are disclosed. A telnet client runs one or more scripts that add data identifiers to data fields in a telnet session. The input data is inserted in the corresponding fields based on data identifiers. Scripts run only on the telnet client without modifications to the server applications. Further disclosed are methods for providing speech recognition and voice functionality to telnet clients. Portions of input data are converted to voice and played to the user. A user also may provide input to certain fields of the telnet session by using his voice. Scripts running on the telnet client convert the user's voice into text, which is inserted into the corresponding fields. | 02-19-2009 |
20090048832 | SPEECH-TO-TEXT SYSTEM, SPEECH-TO-TEXT METHOD, AND SPEECH-TO-TEXT PROGRAM - [Problems] To provide a speech-to-text system and the like capable of matching speech data with edit result text, whether acquired by editing recognition result text or newly written as text information. | 02-19-2009 |
20090048833 | Automated Extraction of Semantic Content and Generation of a Structured Document from Speech - Techniques are disclosed for automatically generating structured documents based on speech, including identification of relevant concepts and their interpretation. In one embodiment, a structured document generator uses an integrated process to generate a structured textual document (such as a structured textual medical report) based on a spoken audio stream. The spoken audio stream may be recognized using a language model which includes a plurality of sub-models arranged in a hierarchical structure. Each of the sub-models may correspond to a concept that is expected to appear in the spoken audio stream. Different portions of the spoken audio stream may be recognized using different sub-models. The resulting structured textual document may have a hierarchical structure that corresponds to the hierarchical structure of the language sub-models that were used to generate the structured textual document. | 02-19-2009 |
20090048834 | Audio Signal De-Identification - Techniques are disclosed for automatically de-identifying spoken audio signals. In particular, techniques are disclosed for automatically removing personally identifying information from spoken audio signals and replacing such information with non-personally identifying information. De-identification of a spoken audio signal may be performed by automatically generating a report based on the spoken audio signal. The report may include concept content (e.g., text) corresponding to one or more concepts represented by the spoken audio signal. The report may also include timestamps indicating temporal positions of speech in the spoken audio signal that corresponds to the concept content. Concept content that represents personally identifying information is identified. Audio corresponding to the personally identifying concept content is removed from the spoken audio signal. The removed audio may be replaced with non-personally identifying audio. | 02-19-2009 |
20090055174 | Method and apparatus for automatically completing text input using speech recognition - Provided are a method and apparatus for automatically completing a text input using speech recognition. The method includes: receiving a first part of a text from a user through a text input device; recognizing a speech of the user, which corresponds to the text; and completing a remaining part of the text based on the first part of the text and the recognized speech. Therefore, accuracy of the text input and convenience of the speech recognition can be ensured, and a non-input part of the text can be easily input based on the input part of the text and the recognized speech at a high speed. | 02-26-2009 |
20090055175 | CONTINUOUS SPEECH TRANSCRIPTION PERFORMANCE INDICATION - A method of providing speech transcription performance indication includes receiving, at a user device, data representing text transcribed from an audio stream by an ASR system, and data representing a metric associated with the audio stream; displaying, via the user device, said text; and via the user device, providing, in user-perceptible form, an indicator of said metric. Another method includes displaying, by a user device, text transcribed from an audio stream by an ASR system; and via the user device, providing, in user-perceptible form, an indicator of a level of background noise of the audio stream. Another method includes receiving data representing an audio stream; converting said data representing an audio stream to text via an ASR system; determining a metric associated with the audio stream; transmitting data representing said text to a user device; and transmitting data representing said metric to the user device. | 02-26-2009 |
20090070109 | Speech-to-Text Transcription for Personal Communication Devices - A speech-to-text transcription system for a personal communication device (PCD) is housed in a communications server that is communicatively coupled to one or more PCDs. A user of the PCD dictates an e-mail, for example, into the PCD. The PCD converts the user's voice into a speech signal that is transmitted to the speech-to-text transcription system located in the server. The speech-to-text transcription system transcribes the speech signal into a text message. The text message is then transmitted by the server to the PCD. Upon receiving the text message, the user carries out corrections on erroneously transcribed words before using the text message in various applications. | 03-12-2009 |
20090076816 | ASSISTIVE LISTENING SYSTEM WITH DISPLAY AND SELECTIVE VISUAL INDICATORS FOR SOUND SOURCES - A portable assistive listening system for enhancing sound for hearing impaired individuals includes a functional hearing aid and a separate handheld digital signal processing (DSP) device. The invention focuses on a handheld DSP device that provides a visual cue to the user representing the source of an intermittent incoming sound. It is known that it is easier to distinguish and recognize sounds when the user has knowledge of the sound source. The system provides for various wired and/or wireless audio inputs from, for example, a television, a wireless microphone on a person, a doorbell, a telephone, a smoke alarm, etc. The wireless audio sources are linked to the DSP and can be identified as a particular type of source. For example, the telephone input is associated with a graphical image of a telephone, and the smoke alarm is associated with a graphical image of a smoke alarm. The DSP is configured and arranged to monitor the audio sources and will visually display the graphical image of the input source when sound input is detected from the input. Accordingly, when the telephone rings, the DSP device will display the image of the phone as a visual cue to the user that the phone is ringing. Additionally, the DSP will turn on backlight of the display as an added visual cue that there is an incoming audio signal. | 03-19-2009 |
20090083032 | METHODS AND SYSTEMS FOR DYNAMICALLY UPDATING WEB SERVICE PROFILE INFORMATION BY PARSING TRANSCRIBED MESSAGE STRINGS - Systems, methods, and software for parsing and/or filtering message strings of text messages and/or instant messages in order to identify keywords, phrases, or fragments as a function of which user preferences of user profiles are dynamically updated are disclosed. Such systems, methods, and software are utilized in the context of a communication system including text messaging, instant messaging, or both. Furthermore, such communication system preferably includes an automatic speech recognition (ASR) system. Additionally, ad impressions are selected and delivered to users based, at least in part, on the parsing and/or filtering and/or data maintained in user profiles as dynamically updated from time to time. The ad impression preferably is delivered within a text message or within an instant message conversation and is generally unobtrusive. Revenues preferably may be generated from the delivering of the ad impressions, whereby a provider of instant messaging or text messaging may further derive monetary benefit from providing such service and whereby users of such service may be provided with contextually relevant information in an unobtrusive manner. | 03-26-2009 |
20090089055 | Method and apparatus for identification of conference call participants - A system including a conferencing telephone coupled to or in communication with an identification service. The identification service is configured to poll user devices of conference participants to determine or confirm identities. In response, the user devices transmit audio electronic business cards, which can include user voice samples and/or preprocessed voice recognition data. The identification service stores the resulting audio electronic business card data. When the corresponding participant speaks during the conference, the identification service identifies the speaker. | 04-02-2009 |
20090094027 | Method, Apparatus and Computer Program Product for Providing Improved Voice Conversion - An apparatus for providing improved voice conversion includes a sub-feature generator and a transformation element. The sub-feature generator may be configured to define sub-feature units with respect to a feature of source speech. The transformation element may be configured to perform voice conversion of the source speech to target speech based on the conversion of the sub-feature units to corresponding target speech sub-feature units using a conversion model trained with respect to converting training source speech sub-feature units to training target speech sub-feature units. | 04-09-2009 |
20090094028 | SYSTEMS AND METHODS FOR MAINTENANCE KNOWLEDGE MANAGEMENT - Knowledge-based information can be captured and processed to create a library of such knowledge. A maintenance worker performing a task for an asset can record audio and/or video information during the performance, and can upload the recording to a maintenance system. The system processes the recording to produce a text file corresponding to any speech during the recording, and generates a search index allowing the text file to be searched by a user. If the task is performed in the context of a work order, for example, information from the work order can be associated with the text file so that a user can search by text search, keyword, task, or other such information. A user then can locate and access the text file and/or the corresponding recording for playback. | 04-09-2009 |
20090099845 | Methods and system for capturing voice files and rendering them searchable by keyword or phrase - A system for capturing voice files and rendering them searchable, comprising one or more devices capable of capturing audio speech electronically, a recorder coupled to the devices for retrieving audio speech, a controller coupled to the recorder, a recognition engine adapted to transcribe audio speech into text, and a database system is disclosed. In the system, the controller causes the recorder to capture audio speech from at least one of the devices, the recorder stores the audio speech as data in the database system, and the recognition engine subsequently retrieves the audio speech data, transcribes the audio speech data into text, and stores the text and data associating the text data with at least the audio speech data in the database system for subsequent retrieval by a search application. | 04-16-2009 |
20090119100 | ASSOCIATING ANNOTATION RECORDING WITH A CELL PHONE NUMBER - A method, system and computer program product for creating voice annotations during a mobile phone call. During the phone call a user engages a trigger on the communication device prompting the phone to first mute the device of the user, and then record an audible message. The audible message, or voice annotation, is automatically linked to the current call information. The voice annotation may be transcribed and stored as a textual annotation. The voice or textual annotation may be retrieved utilizing a graphical user interface (GUI). | 05-07-2009 |
20090119101 | Transcript Alignment - An approach to alignment of transcripts with recorded audio is tolerant of moderate transcript inaccuracies, untranscribed speech, and significant non-speech noise. In one aspect, a number of search terms are formed from the transcript such that each search term is associated with a location within the transcript. Possible locations of the search terms are then determined in the audio recording. The audio recording and the transcript are then aligned using the possible locations of the search terms. In another aspect a search expression is accepted, and then a search is performed for spoken occurrences of the search expression in an audio recording. This search includes searching for text occurrences of the search expression in a text transcript of the audio recording, and searching for spoken occurrences of the search expression in the audio recording. | 05-07-2009 |
20090138262 | SYSTEMS AND METHODS TO INDEX AND SEARCH VOICE SITES - A method comprises crawling and indexing voice sites and storing results in an index; receiving a search request in voice from a user via a telephone; performing speech recognition on the voice search request and converting the request from voice to text; parsing the query; and performing a search on the index and ranking the search results. Search results may be filtered based on attributes such as location and context. Filtered search results may be presented to the user in categories to enable easy voice browsing of the search results by the user. Computer program code and systems are also provided. | 05-28-2009 |
20090144057 | Method, Apparatus, and Program for Certifying a Voice Profile When Transmitting Text Messages for Synthesized Speech - A mechanism is provided for authenticating and using a personal voice profile. The voice profile may be issued by a trusted third party, such as a certification authority. The personal voice profile may include information for generating a digest or digital signature for text messages. A speech synthesis system may speak the text message using the voice characteristics, such as prosodic characteristics, only if the voice profile is authenticated and the text message is valid and free of tampering. | 06-04-2009 |
20090150147 | RECORDING AUDIO METADATA FOR STORED IMAGES - A method of processing audio signals recorded during display of image data from a media file on a display device to produce semantic understanding data and associating such data with the original media file, includes: separating a desired audio signal from the aggregate mixture of audio signals; analyzing the separated signal for purposes of gaining semantic understanding; and associating the semantic information obtained from the audio signals recorded during image display with the original media file. | 06-11-2009 |
20090164214 | System, method and software program for enabling communications between customer service agents and users of communication devices - The present invention provides a system, method and software application for enabling a customer service agent to efficiently communicate with users of a communication device. When a user enters speech input into his communication device, the speech is converted to text, and the text is displayed to the customer service agent on the agent's computer screen. Alternately, the user's speech input is provided to the customer service agent in the form of an audio file. The agent types a response, and the agent's response is provided to the user on the user's communication device. The agent's response may be converted to speech and played to the user, and/or the agent's response may be displayed as text on the display screen of the user's communication device. | 06-25-2009 |
20090171659 | METHODS AND APPARATUS FOR IMPLEMENTING DISTRIBUTED MULTI-MODAL APPLICATIONS - Embodiments include methods and apparatus for synchronizing data and focus between visual and voice views associated with distributed multi-modal applications. An embodiment includes a client device adapted to render a visual display that includes at least one multi-modal display element for which input data is receivable though a visual modality and a voice modality. When the client detects a user utterance via the voice modality, the client sends uplink audio data representing the utterance to a speech recognizer. An application server receives a speech recognition result generated by the speech recognizer, and sends a voice event response to the client. The voice event response is sent as a response to an asynchronous HTTP voice event request previously sent to the application server by the client. The client may then send another voice event request to the application server in response to receiving the voice event response. | 07-02-2009 |
20090177469 | SYSTEM FOR RECORDING AND ANALYSING MEETINGS - A system for producing a transcript of a meeting having n attendees, the attendees being identified as ID | 07-09-2009 |
20090177470 | DISTRIBUTED DICTATION/TRANSCRIPTION SYSTEM - A distributed dictation/transcription system is provided. The system provides a client station, dictation manager, and dictation server networked such that the dictation manager can select a dictation server to transcribe audio from the client station. The dictation manager selects one of a plurality of dictation servers based on conventional load balancing as well as on a determination of which of the dictation servers may already have a user profile uploaded. Moreover, while selecting a dictation server and/or uploading a profile, the user or client at the client station may begin dictating, which audio would be stored in a buffer of dictation manager until a dictation server was selected and/or available. The user would receive in real time or near real time a display of the textual data that may be corrected by the user. The corrective textual data may be transmitted back to the dictation manager to update the user profile. | 07-09-2009 |
20090182559 | CONTEXT SENSITIVE MULTI-STAGE SPEECH RECOGNITION - A system enables devices to recognize and process speech. The system includes a database that retains one or more lexical lists. A speech input detects a verbal utterance and generates a speech signal corresponding to the detected verbal utterance. A processor generates a phonetic representation of the speech signal that is designated a first recognition result. The processor generates variants of the phonetic representation based on context information provided by the phonetic representation. One or more of the variants of the phonetic representation selected by the processor are designated as a second recognition result. The processor matches the second recognition result with stored phonetic representations of one or more of the stored lexical lists. | 07-16-2009 |
20090182560 | USING A PHYSICAL PHENOMENON DETECTOR TO CONTROL OPERATION OF A SPEECH RECOGNITION ENGINE - A transmission device such as a cell phone or other mobile communication device includes a physical phenomenon detection device to perform a “push to talk” function by detecting the occurrence of a particular physical phenomenon and using such detection to start and stop recording an utterance for subsequent analysis by a speech recognition engine. A method of controlling operation of a speech recognition engine in response to detection of a physical phenomenon includes detecting or sensing, via a physical phenomenon detection unit, a predetermined physical phenomenon representative of an intent to invoke operation of a speech recognition engine. In response to the detection or sensing of the predetermined physical phenomenon, a signal is transmitted to a control unit in a communication device. In response to the receipt of the transmitted signal, the utterance received from a user via the communication device is recorded, and the recorded utterance is provided to a speech recognition engine for operation thereon. The user may thus effectuate operation of the speech recognition engine upon the utterance by causing the physical phenomenon to occur. | 07-16-2009 |
20090187403 | INFORMATION PROCESSING SYSTEM, INFORMATION PROCESSING APPARATUS, INFORMATION PROCESSING PROGRAM AND RECORDING MEDIUM - A copyright managing information processing apparatus includes a storage module for storing copyrighted content including audio data; a first topic module for recognizing audio data in content opened to the public by a to-be-opened information processing apparatus, converting the audio data into text data, extracting keywords from the text data, and conducting topic processing using the keywords to create topic information; a second topic module for recognizing audio data in content stored in the storage means, converting the audio data into text data, extracting keywords from the text data, and conducting topic processing using the keywords to create topic information; and a similarity determining module for comparing the topic information generated by the first topic module with that created by the second topic module for thereby determining presence or absence of similarity therebetween. | 07-23-2009 |
20090192797 | Talk text - The patent that I am requesting is for an existing product on the market. It is for voice transponding or activated texting. Rather than typing a text message, the user can simply speak into their cell phone and the message will be typed into the text message. | 07-30-2009 |
20090198493 | System and Method for Unsupervised and Active Learning for Automatic Speech Recognition - A system and method is provided for combining active and unsupervised learning for automatic speech recognition. This process enables a reduction in the amount of human supervision required for training acoustic and language models and an increase in the performance given the transcribed and un-transcribed data. | 08-06-2009 |
20090204399 | SPEECH DATA SUMMARIZING AND REPRODUCING APPARATUS, SPEECH DATA SUMMARIZING AND REPRODUCING METHOD, AND SPEECH DATA SUMMARIZING AND REPRODUCING PROGRAM - Necessary portions of stored speech data representing conference content are summarized and reproduced in a predetermined time. Conference speech is summarized and reproduced using a speech data summarizing and reproducing apparatus comprising a speech data divider for dividing and structuring conference speech data into several utterance unit data based on utterers, distributed documents, the occurrence frequency of words in speech recognition results, and pauses, an importance level calculator for determining important utterance unit data based on the occurrence frequency of keywords, the information of utterers, and data specified by the user, a summarizer for extracting important utterance unit data and summarizing them within a specified time, and a speech data reproducer for reproducing the summarized speech data in chronological order or an order of importance levels with auxiliary information added thereto. | 08-13-2009 |
20090210225 | SUPPORTING ELECTRONIC TASK MANAGEMENT SYSTEMS VIA TELEPHONE - The disclosed personal information management (PIM) system supports tasks and reminders via an audio user interface. The user creates a task object via a telephone call to the server. The task object may include an audio recording of the user's voice received during the telephone call. The system may convert the user's speech to text and may store the text in the task object. The system may include other structured data further defining the task such as calling party number, due date, start date, priority, status, percentage complete, categories, or the like. As stored by the system, the task may appear with the user's other tasks in the user's client. The PIM system may provide outbound telephone calls to the user as reminders associated with the user's tasks. The user receiving the reminder call may hear voice prompts, computer generated speech, and/or the audio recording associated with the task. | 08-20-2009 |
20090216531 | PROVIDING TEXT INPUT USING SPEECH DATA AND NON-SPEECH DATA - Systems, methods, and computer readable media providing a speech input interface. The interface can receive speech input and non-speech input from a user through a user interface. The speech input can be converted to text data and the text data can be combined with the non-speech input for presentation to a user. | 08-27-2009 |
20090216532 | Automatic Extraction and Dissemination of Audio Impression - A method of creating a voice message is described. A dictated audio input is converted by automatic speech recognition to produce a structured text report that includes report fields with report field data extracted from the dictated audio input. A report message is created for transmission over an electronic communication system to a message recipient. The report message has message fields with message field data based on corresponding report field data. A message audio extract is automatically extracted from a portion of the dictated audio input and attached to the report message. And the report message with the message audio extract attachment is forwarded over the electronic communication system to the message recipient. | 08-27-2009 |
20090228274 | USE OF INTERMEDIATE SPEECH TRANSCRIPTION RESULTS IN EDITING FINAL SPEECH TRANSCRIPTION RESULTS - A communication system includes at least one transmitting device and at least one receiving device, one or more network systems for connecting the transmitting device to the receiving device, and an automatic speech recognition (“ASR”) system, including an ASR engine. A user speaks an utterance into the transmitting device, and the recorded speech audio is sent to the ASR engine. The ASR engine returns intermediate transcription results to the transmitting device, which displays the intermediate transcription results in real-time to the user. The intermediate transcription results are also correlated by utterance fragment to final transcription results and displayed to the user. The user may use the information thus presented to make decisions as to whether to edit the final transcription results or to speak the utterance again, thereby repeating the process. The intermediate transcription results may also be used by the user to edit the final transcription results. | 09-10-2009 |
20090240497 | METHOD AND SYSTEM FOR MESSAGE ALERT AND DELIVERY USING AN EARPIECE - A method for an earpiece to manage a delivery of a message can include receiving a notice that a message is available at a communication device, parsing the notice for header information that identifies at least a portion of the message, and requesting a subsequent delivery of at least a portion of the message from the communication device if at least one keyword in the header information is in an acceptance list. Other embodiments are disclosed. | 09-24-2009 |
20090248410 | LIGHTED TEXT DISPLAY FOR MOTOR VEHICLES - A system and method are disclosed enabling drivers of a vehicle to enter, edit and post messages on a graphical display attached to the interior or exterior of a vehicle. Voice recognition software allows the driver of the vehicle to input, and the system to display, a message without the need for diverting his or her attention from the road or otherwise manually interacting with the system. | 10-01-2009 |
20090271191 | METHOD AND SYSTEMS FOR SIMPLIFYING COPYING AND PASTING TRANSCRIPTIONS GENERATED FROM A DICTATION BASED SPEECH-TO-TEXT SYSTEM - A computer-implemented method for simplifying the pasting of textual transcriptions from a transcription engine into an application is described. An audio file is sent to a transcription engine. A textual transcription file of the audio file is received from the transcription engine. The textual transcription file is automatically loaded into a copy buffer. The textual transcription file is pasted from the copy buffer into an application. | 10-29-2009 |
20090271192 | METHOD AND SYSTEMS FOR MEASURING USER PERFORMANCE WITH SPEECH-TO-TEXT CONVERSION FOR DICTATION SYSTEMS - A computer-implemented system and method for evaluating the performance of a user using a dictation system is provided. The system and method include receiving a text or transcription file generated from user audio. A performance metric, such as, for example, words/minute or errors is generated based on the transcription file. The performance metric is provided to an administrator so the administrator can evaluate the performance of the user using the dictation system. | 10-29-2009 |
20090271193 | SUPPORT DEVICE, PROGRAM AND SUPPORT METHOD - A support device, program and support method for supporting generation of text from speech data. The support device includes a confirmed rate calculator, a candidate obtaining unit and a selector. The confirmed rate calculator calculates a confirmed utterance rate which is an utterance rate of a confirmed part having already-confirmed text in the speech data. The candidate obtaining unit obtains multiple candidate character strings resulting from a speech recognition of an unconfirmed part having unconfirmed text in the speech data. The selector preferentially selects, from among the plurality of candidate character strings, a candidate character string whose utterance time consumed in uttering the candidate character string at the confirmed utterance rate is closest to an utterance time of the unconfirmed part of the speech data. | 10-29-2009 |
20090271194 | Speech recognition and transcription among users having heterogeneous protocols - A system is disclosed for facilitating speech recognition and transcription among users employing incompatible protocols for generating, transcribing, and exchanging speech. The system includes a system transaction manager that receives a speech information request from at least one of the users. The speech information request includes formatted spoken text generated using a first protocol. The system also includes a speech recognition and transcription engine, which communicates with the system transaction manager. The speech recognition and transcription engine receives the speech information request from the system transaction manager and generates a transcribed response, which includes a formatted transcription of the formatted speech. The system transmits the response to the system transaction manager, which routes the response to one or more of the users. The latter users employ a second protocol to handle the response, which may be the same as or different from the first protocol. The system transaction manager utilizes a uniform system protocol for handling the speech information request and the response. | 10-29-2009 |
20090276214 | METHOD FOR DUAL CHANNEL MONITORING ON A RADIO DEVICE - A method for dual channel monitoring on a radio device as provided enables efficient use of communication network resources. The method includes receiving at the radio device a first speech signal over a first channel, while simultaneously receiving at the radio device a second speech signal over a second channel. The first speech signal is then processed at the radio device to generate a text transcription of the first speech signal, and the text transcription of the first speech signal is displayed on a display screen of the radio device. An audible voice signal is then produced from a speaker that is operatively connected to the radio device simultaneously with displaying the text transcription of the first speech signal. | 11-05-2009 |
20090276215 | METHODS AND SYSTEMS FOR CORRECTING TRANSCRIBED AUDIO FILES - Methods and systems for correcting transcribed text. One method includes receiving audio data from one or more audio data sources and transcribing the audio data based on a voice model to generate text data. The method also includes making the text data available to a plurality of users over at least one computer network and receiving corrected text data over the at least one computer network from the plurality of users. In addition, the method can include modifying the voice model based on the corrected text data. | 11-05-2009 |
20090281806 | SYSTEM AND METHOD FOR SPELLING RECOGNITION USING SPEECH AND NON-SPEECH INPUT - A system and method for non-speech input or keypad-aided word and spelling recognition is disclosed. The method includes generating an unweighted grammar, selecting a database of words, generating a weighted grammar using the unweighted grammar and a statistical letter model trained on the database of words, receiving speech from a user after receiving the non-speech input and after generating the weighted grammar, and performing automatic speech recognition on the speech and non-speech input using the weighted grammar. If a confidence is below a predetermined level, then the method includes receiving non-speech input from the user, disambiguating possible spellings by generating a letter lattice based on a user input modality, and constraining the letter lattice and generating a new letter string of possible word spellings until a letter string is correctly recognized. | 11-12-2009 |
20090287486 | Methods and Apparatus to Generate a Speech Recognition Library - Methods and apparatus to generate a speech recognition library for use by a speech recognition system are disclosed. An example method comprises identifying a plurality of video segments having closed caption data corresponding to a phrase, the plurality of video segments associated with respective ones of a plurality of audio data segments, computing a plurality of difference metrics between a baseline audio data segment associated with the phrase and respective ones of the plurality of audio data segments, selecting a set of the plurality of audio data segments based on the plurality of difference metrics, identifying a first one of the audio data segments in the set as a representative audio data segment, determining a first phonetic transcription of the representative audio data segment, and adding the first phonetic transcription to a speech recognition library when the first phonetic transcription differs from a second phonetic transcription associated with the phrase in the speech recognition library. | 11-19-2009 |
20090287487 | Systems and Methods for a Visual Indicator to Track Medical Report Dictation Progress - Certain embodiments of the present invention provide a system for medical report dictation including a database component, a voice recognition component, and a user interface component. The database component is adapted to store a plurality of available templates. Each of the plurality of available templates is associated with a template cue. Each template cue includes a list of elements. The voice recognition component is adapted to convert a voice data input to a transcription data output. The user interface component is adapted to receive voice data from a user related to an image and the user interface component is adapted to present a visual indicator to the user. The visual indicator is based on a template cue associated with a template selected from the plurality of available templates. The user interface utilizes the voice recognition component to update the visual indicator. | 11-19-2009 |
20090287488 | TEXT DISPLAY, TEXT DISPLAY METHOD, AND PROGRAM - A text display in which speech information can be effectively conveyed in a text to the user. The text display comprises a speech input section ( | 11-19-2009 |
20090292539 | SYSTEM AND METHOD FOR THE SECURE, REAL-TIME, HIGH ACCURACY CONVERSION OF GENERAL QUALITY SPEECH INTO TEXT - Described is a speech-to-text conversion system and method that provides secure, real-time and high-accuracy conversion of general-quality speech into text. The system is designed to interface with external devices and services, providing a simple and convenient manner to transcribe audio that may be stored elsewhere, such as a wireless phone's voice mail, or occurring between two or more parties, such as a conference call. The first step in the system's process ensures secure and private transcription by separating an audio stream into many audio shreds, each of which has a duration of only a few seconds and cannot reveal the context of the conversation. A workforce of geographically distributed transcription agents who transcribe the audio shreds is able to generate transcription in real time, with many agents working in parallel on a single conversation. No one agent (or group of agents) receives a sufficient number of audio shreds to reconstruct the context of any conversation. The use of human transcribers allows the system to overcome limitations typical of computer-based speech recognition and permits accurate transcription of general-quality speech even in acoustically hostile environments. | 11-26-2009 |
20090299743 | METHOD AND SYSTEM FOR TRANSCRIBING TELEPHONE CONVERSATION TO TEXT - Methods and systems for transcribing portions of a telephone conversation to text enables users to request transcription such as by pressing a button on a mobile device, with the request transmitted to a server including transcription software. The server transcribes some or all of the telephone conversation to text, and transmits the text to the mobile device. The text data may be scanned for selected information, and only the selected information transmitted to the mobile device. The selected information may be automatically stored in memory of the mobile device, such as in an address book. | 12-03-2009 |
20090306979 | Data processing system for autonomously building speech identification and tagging data - A method, system, and computer program product for autonomously transcribing and building tagging data of a conversation. A corpus processing agent monitors a conversation and utilizes a speech recognition agent to identify the spoken languages, speakers, and emotional patterns of speakers of the conversation. While monitoring the conversation, the corpus processing agent determines emotional patterns by monitoring voice modulation of the speakers and evaluating the context of the conversation. When the conversation is complete, the corpus processing agent determines synonyms and paraphrases of spoken words and phrases of the conversation taking into consideration any localized dialect of the speakers. Additionally, metadata of the conversation is created and stored in a link database, for comparison with other processed conversations. A corpus, a transcription of the conversation containing metadata links, is then created. The corpus processing agent also determines the frequency of spoken keywords and phrases and compiles a popularity index. | 12-10-2009 |
20090306980 | MOBILE TERMINAL AND TEXT CORRECTING METHOD IN THE SAME - A mobile terminal including a voice receiving unit configured to receive input voice, a controller configured to convert the received input voice to text, a display configured to display the converted text, and an input unit configured to select a word included in the displayed converted text. Further, the controller is further configured to control the display to display a plurality of possible candidate words corresponding to the selected word in an arrangement in which a corresponding displayed candidate word is displayed with a proximity from the selected word that is based on how similar the corresponding candidate word is to the selected word. | 12-10-2009 |
20090306981 | Systems and methods for conversation enhancement - This invention description details systems and methods for improving human conversations by enhancing conversation participants' ability to: (1) distill out and record core ideas of conversations; (2) classify and prioritize these key concepts; (3) recollect commitments and issues and take appropriate action; and (4) analyze and uncover new insight from the linkage of these ideas with those from other conversations. | 12-10-2009 |
20090313013 | SIGN LANGUAGE CAPABLE MOBILE PHONE - A mobile phone includes a display, a data capturing module, and a sign language translating system. The data capturing module is configured to capture images of a user communicating with sign language. The sign language translating system is connected to the data capturing module. The sign language translating system includes a data input module and a data output module. The data input module is configured to receive and transform a text data or a speech data to a corresponding sign language image, and display the sign language image on the display, the data output module is configured to receive and transform a sign language image captured by the data capturing module to a corresponding text data or a corresponding speech data. | 12-17-2009 |
20090313014 | MOBILE TERMINAL AND METHOD FOR RECOGNIZING VOICE THEREOF - A method for detecting a character or a word emphasized by a user from a voice inputted in a mobile terminal to refer it as meaningful information for a voice recognition, or emphatically displaying the user-emphasized character or word in a pre-set format when the inputted voice is converted into text, and a mobile terminal implementing the same are disclosed. The mobile terminal includes: a microphone to receive a voice of user; a controller to convert the received voice into corresponding text and detect a character or a word emphatically pronounced by the user from the voice; and a display unit to emphatically display the detected character or word in a pre-set format when the converted text is displayed. | 12-17-2009 |
20090319266 | MULTIMODAL INPUT USING SCRATCHPAD GRAPHICAL USER INTERFACE TO EDIT SPEECH TEXT INPUT WITH KEYBOARD INPUT - A system and method for multimodal input into an application program. The method may include performing speech recognition on speech audio input to thereby produce recognized speech text input for insertion into a document of an application program, the document having keyboard focus. The method may also include identifying the document as being text service framework unaware. The method may further include displaying the recognized speech text input in a scratchpad graphical user interface for editing the recognized speech text input. The method may further include reflecting keyboard input bound for the document to the scratchpad graphical user interface, while preserving the keyboard focus of the document. The method may also include displaying the reflected keyboard input on the scratchpad graphical user interface, to thereby effect edits in the recognized speech text input. | 12-24-2009 |
20090319267 | METHOD, A SYSTEM AND A DEVICE FOR CONVERTING SPEECH - An arrangement for converting speech into text comprises a mobile device ( | 12-24-2009 |
20090326936 | VOICE RECOGNITION DEVICE, VOICE RECOGNITION METHOD, AND VOICE RECOGNITION PROGRAM - A voice recognition device, method, and program for operating a plurality of control objects by recognizing a plurality of user-provided verbal commands. The voice recognition device determines a control object and control content from predefined types of control objects and contents, based on a recognition result of the input verbal command. A voice recognition unit converts input verbal commands into a text expressed with a series of words, a first parsing unit performs an identification process of a first control candidate group as a control candidate for the control object and control content, a second parsing unit performs an identification process of a second control candidate group as a control candidate for the control object and control content, and a control candidate identification unit identifies a final control candidate group for determining the control object and control content from the first control candidate group and the second control candidate group. | 12-31-2009 |
20090326937 | USING PERSONALIZED HEALTH INFORMATION TO IMPROVE SPEECH RECOGNITION - The claimed subject matter provides systems and/or methods that improve speech recognition in the medical context. The system includes mechanisms that access personal health records associated with patients and/or analyze the personal health records for current diseases and/or past ailments. The system thereafter acquires attributes associated with the diseases or ailments and dynamically populates a speech model with these attributes. The speech model utilizes the attributes associated with the diseases or ailments to more accurately transcribe a voice pattern into text that can be projected on a visual display or persisted to a storage device. | 12-31-2009 |
20090326938 | MULTIWORD TEXT CORRECTION - A method including detecting a selection of a plurality of erroneous words in text presented on a display of a device, in an automatic speech recognition system, receiving sequentially dictated corrections for the selected erroneous words in a single, continuous operation where each dictated correction corresponds to at least one of the selected erroneous words, and replacing the plurality of erroneous words with one or more corresponding words of the dictated corrections where each erroneous word is matched with the one or more corresponding words of the dictated corrections in an order the erroneous words appear according to a reading direction of the text. | 12-31-2009 |
20090326939 | SYSTEM AND METHOD FOR TRANSCRIBING AND DISPLAYING SPEECH DURING A TELEPHONE CALL - A system and method for providing speech transcription to a user during a telephone call may include a receiver configured to receive a telecommunications signal forming a telephone call. The telecommunications signal communicates speech data representative of words spoken by a telephone call participant. A processing unit may be in communication with the receiver and be configured to transcribe the speech data representative of words into text. A display unit may be in communication with the processing unit and be configured to display the text for a user during the telephone call. | 12-31-2009 |
20090326940 | AUTOMATED VOICE-OPERATED USER SUPPORT - An information device for voice-operated support of a user includes a storage medium, a knowledge database, a processing unit, an input device, a recording component, a transcription component, and an ontological analysis component. A signal is detected by the input device and stored by the recording component via the processing unit in the storage medium. The signal is transformed into a corresponding text by the transcription component via the processing unit and stored in the storage medium. The ontological analysis component categorizes the text via the processing unit using the knowledge database and processes the text using the categorization and the knowledge database via the processing unit. | 12-31-2009 |
20090326941 | SPEECH RECOGNITION CIRCUIT USING PARALLEL PROCESSORS - A speech recognition circuit comprises an input buffer for receiving processed speech parameters. A lexical memory contains lexical data for word recognition. The lexical data comprises a plurality of lexical tree data structures. Each lexical tree data structure comprises a model of words having common prefix components. An initial component of each lexical tree structure is unique. A plurality of lexical tree processors are connected in parallel to the input buffer for processing the speech parameters in parallel to perform parallel lexical tree processing for word recognition by accessing the lexical data in the lexical memory. A results memory is connected to the lexical tree processors for storing processing results from the lexical tree processors and lexical tree identifiers to identify lexical trees to be processed by the lexical tree processors. A controller controls the lexical tree processors to process lexical trees identified in the results memory by performing parallel processing on a plurality of said lexical tree data structures. | 12-31-2009 |
20100030557 | Voice and text communication system, method and apparatus - The disclosure relates to systems, methods and apparatus to convert speech to text and vice versa. One apparatus comprises a vocoder, a speech to text conversion engine, a text to speech conversion engine, and a user interface. The vocoder is operable to convert speech signals into packets and convert packets into speech signals. The speech to text conversion engine is operable to convert speech to text. The text to speech conversion engine is operable to convert text to speech. The user interface is operable to receive a user selection of a mode from among a plurality of modes, wherein a first mode enables the speech to text conversion engine, a second mode enables the text to speech conversion engine, and a third mode enables the speech to text conversion engine and the text to speech conversion engine. | 02-04-2010 |
20100036661 | Methods and Systems for Providing Grammar Services - A computing system, comprising: an I/O platform for interfacing with a user; and a processing entity configured to implement a dialog with the user via the I/O platform. The processing entity is further configured for: identifying a grammar template and an instantiation context associated with a current point in the dialog; causing creation of an instantiated grammar model from the grammar template and the instantiation context; storing the instantiated grammar model in a memory; and interpreting user input received via the I/O platform in accordance with the instantiated grammar model. Also, a grammar authoring environment supporting a variety of grammar development tools is disclosed. | 02-11-2010 |
20100036662 | JOURNALING DEVICE AND INFORMATION MANAGEMENT SYSTEM - A portable journaling device is useful for thought and behavior tracking, documenting, optimizing, teaching, and training. The device is arranged and configured to receive and store voice data, and to transmit the received voice data to a server. The server transcribes the voice data, if possible, and stores the transcribed data. If the server is unable to transcribe the voice data, the voice data is automatically forwarded to a data transcription service for manual transcription. The manually transcribed data is then received by the server where the manually transcribed data is stored. | 02-11-2010 |
20100042409 | AUTOMATED VOICE SYSTEM AND METHOD - A voice browser for use in a customer interaction management system for interacting with a customer, based on data contained in one or more generic knowledge trees. The system can traverse the knowledge tree based on customer responses to presented questions until a leaf node of the knowledge tree is reached. The information contained in the leaf node is then presented to the customer. | 02-18-2010 |
20100057455 | Method and System for 3D Lip-Synch Generation with Data-Faithful Machine Learning - A method for generating three-dimensional speech animation is provided using data-driven and machine learning approaches. It utilizes the most relevant part of the captured utterances for the synthesis of input phoneme sequences. If highly relevant data are missing or lacking, then it utilizes less relevant (but more abundant) data and relies more heavily on machine learning for the lip-synch generation. | 03-04-2010 |
20100057456 | VOICE RESPONSE UNIT MAPPING - A system, method and program product for mapping voice response units (VRUs). A system is provided that includes: an interrogation system for interrogating a VRU and gathering a hierarchical set of options associated with the VRU; a map building system for converting the hierarchical set of options into a VRU map suitable for display; and a user interface for displaying the VRU map to an end user. | 03-04-2010 |
20100057457 | SPEECH RECOGNITION SYSTEM AND PROGRAM THEREFOR - An unknown word is additionally registered in a speech recognition dictionary by utilizing a correction result, and a new pronunciation of the word that has been registered in a speech recognition dictionary is additionally registered in the speech recognition dictionary, thereby increasing the accuracy of speech recognition. The start time and finish time of each phoneme unit in speech data corresponding to each phoneme included in a phoneme sequence acquired by a phoneme sequence converting section | 03-04-2010 |
20100057458 | IMAGE PROCESSING APPARATUS, IMAGE PROCESSING PROGRAM AND IMAGE PROCESSING METHOD - Regarding audio data related to document data, an image processing apparatus pertaining to the present invention generates text data by using a speech recognition technology in advance, and determines delimiter positions in the text data and the audio data in correspondence. In a keyword search, if a keyword is detected in the text data, the image processing apparatus plays the audio data from a delimiter that is immediately before the keyword. | 03-04-2010 |
20100057459 | VOICE RECOGNITION SYSTEM FOR INTERACTIVELY GATHERING INFORMATION TO GENERATE DOCUMENTS - A voice recognition system for interactively gathering information to generate a document, form, or application. A user establishes a connection with the voice recognition system and provides verbal responses to a plurality of verbal questions generated by the voice recognition system to compile a document, form or application. The voice recognition system converts the user's verbal responses to textually converted responses. | 03-04-2010 |
20100057460 | VERBAL LABELS FOR ELECTRONIC MESSAGES - Verbal labels for electronic messages, as well as systems and methods for making and using such labels, are disclosed. A verbal label is a label containing audio data (such as a digital audio file of a user's voice and/or a speaker template thereof) that is associated with one or more electronic messages. Verbal labels permit a user to more efficiently manipulate e-mail and other electronic messages by voice. For example, a user can add such labels verbally to an e-mail or to a group of e-mails, thereby permitting these messages to be sorted and retrieved more easily. | 03-04-2010 |
20100063814 | APPARATUS, METHOD AND COMPUTER PROGRAM PRODUCT FOR RECOGNIZING SPEECH - A speech recognition apparatus includes a document input unit configured to input a document including a reference term which a user refers to; a vocabulary storage unit configured to store a vocabulary list including a group of notation information, reading information and part of speech; a hypernym hyponym relation storage unit configured to store a hypernym hyponym relation tree on a concept between terms; a hypernym acquisition unit configured to search a hypernym of the reference term from the hypernym hyponym relation tree and to acquire the notation information and the part of speech of the hypernym from the vocabulary list; a correspondence storage unit configured to store a correspondence list showing correspondence between the hypernym and the reference term; a display unit configured to display the hypernym; a speech input unit configured to input speech, including the hypernym of the reference term, which the user speaks from the display unit; a speech recognition unit configured to convert the speech into text information by using the vocabulary list; a replacing unit configured to replace the hypernym, which is included in the text information, with the reference term; and an output unit configured to output the text information replaced by the replacing unit. | 03-11-2010 |
20100063815 | REAL-TIME TRANSCRIPTION - A computing system accepts audio from one or more sources, parses the audio into chunks, and transcribes the chunks in substantially real time. Some transcription is performed automatically, while other transcription is performed by humans who listen to the audio and enter the words spoken and/or the intent of the caller (such as directions given to the system). The system provides for participants a user interface that is updated in substantially real time with the transcribed text from the audio stream(s). A single audio line can be used for simple transcription, and multiple audio lines are used to provide a real-time transcript of a conference call, deposition, or the like. A pool of analysts creates, checks, and/or corrects transcription, and callers/observers can even assist in the correction process through their respective user interfaces. Ads derived from the transcript are displayed together with the text in substantially real time. | 03-11-2010 |
20100070275 | SPEECH TO MESSAGE PROCESSING - Voice message processors are configured to produce text representations of voice messages. The text representations can be compacted based on one or more abbreviation libraries or rule libraries. Abbreviation processing can be applied to produce a compact text representation based on display properties of a destination device or to enhance user perception. Text representation length can be reduced based on abbreviations in a standard abbreviation list, a user specific abbreviation list, or a combination of standard and custom lists. In some examples, text length is shortened based on stored rules. Mobile stations are configured to receive text representations of voice messages and request delivery of the associated voice messages based on message identifiers or message availability indicators that are presented on a mobile station display. Network elements comprise a voice message processor and are configured to produce text representations and to deliver text representations or voice messages as requested by message recipients. | 03-18-2010 |
20100076760 | DIALOG FILTERING FOR FILLING OUT A FORM - The invention discloses a system and method for filling out a form from a dialog between a caller and a call center agent. The caller and the caller center agent can have the dialog in the form of telephone conversation, instant messaging chat or email exchange. The system and method provides a list of named entities specific to the call center operation and uses a translation and transcription minor to filter relevant elements from the dialog between the caller and the call center agent. The relevant elements filtered from the dialog are subsequently displayed on the call center agent's computer screen to fill out application forms automatically or through drag and drop operations by the call center agent. | 03-25-2010 |
20100076761 | Decoding-Time Prediction of Non-Verbalized Tokens - Non-verbalized tokens, such as punctuation, are automatically predicted and inserted into a transcription of speech in which the tokens were not explicitly verbalized. Token prediction may be integrated with speech decoding, rather than performed as a post-process to speech decoding. | 03-25-2010 |
20100076762 | Coarticulation Method for Audio-Visual Text-to-Speech Synthesis - A method for generating animated sequences of talking heads in text-to-speech applications wherein a processor samples a plurality of frames comprising image samples. The processor reads first data comprising one or more parameters associated with noise-producing orifice images of sequences of at least three concatenated phonemes which correspond to an input stimulus. The processor reads, based on the first data, second data comprising images of a noise-producing entity. The processor generates an animated sequence of the noise-producing entity. | 03-25-2010 |
20100088095 | METHODS AND SYSTEM TO GENERATE DATA ASSOCIATED WITH A MEDICAL REPORT USING VOICE INPUTS - Methods and system to generate data associated with a medical report using voice inputs are described herein. In one example implementation, a computer-implemented method to automatically generate data associated with a medical report using voice inputs received during a first encounter includes receiving a voice input from a source and determining an identity of the source. Additionally, the method includes performing a speech-to-text conversion on the voice input to generate a text string representing the voice input and associating the text string with the identity of the source. Further, the example method includes identifying and selecting one or more keywords from the text string. The one or more keywords are associated with one or more data fields. Further still, the method includes populating the one or more data fields with the identified keywords according to values associated with the identified keywords and the identity of the source. | 04-08-2010 |
20100088096 | Hand held speech recognition device - A hand held device is used for interactively converting speech into text with at least one speaker. The device includes: a screen for displaying text; at least one voice input source for receiving speech from a single speaker; a sound processor operably connected to the voice input source; a storage device capable of storing an operating system, a speech recognition engine, speech-to-text applications and data files; a power source; a navigation system; and a control system operably connected to the screen, each voice input source, the storage device, the power source and the navigation system. | 04-08-2010 |
20100094627 | AUTOMATIC IDENTIFICATION OF TAGS FOR USER GENERATED CONTENT - A method and system for automatically identifying tags for a media item. An audio track associated with a media item is analyzed. References to individuals in the audio track are compared to known acquaintances of a user. Matches are identified as potential tags. Duplicate matches can be presented to the user for resolution. | 04-15-2010 |
20100094628 | System and Method for Latency Reduction for Automatic Speech Recognition Using Partial Multi-Pass Results - A system and method is provided for reducing latency for automatic speech recognition. In one embodiment, intermediate results produced by multiple search passes are used to update a display of transcribed text. | 04-15-2010 |
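The partial multi-pass idea in the entry above can be illustrated with a small sketch (all class and method names here are invented for illustration, not taken from the application): a display commits text from a slower, more accurate pass while showing the fastest pass's hypothesis as a revisable tail.

```python
class IncrementalTranscript:
    """Show committed text immediately; revise the tentative tail as
    later, more accurate search passes finish."""

    def __init__(self):
        self.committed = []   # words finalized by the slower, accurate pass
        self.tentative = []   # rough words from the fastest pass

    def on_partial_result(self, words):
        # A fast first pass yields a rough hypothesis for immediate display.
        self.tentative = list(words)

    def on_final_result(self, words):
        # A slower, more accurate pass replaces the tentative words.
        self.committed.extend(words)
        self.tentative = []

    def render(self):
        return " ".join(self.committed + self.tentative)
```

The perceived latency is that of the fastest pass, while the displayed text converges to the accuracy of the slowest pass.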
20100100376 | VISUALIZATION INTERFACE OF CONTINUOUS WAVEFORM MULTI-SPEAKER IDENTIFICATION - A method implemented in a computer infrastructure having computer executable code having programming instructions tangibly embodied on a computer readable storage medium. The programming instructions are operable to receive a current waveform of a communication between a plurality of participants. Additionally, the programming instructions are operable to create a voiceprint from the current waveform if the current waveform is of a human voice. Furthermore, the programming instructions are operable to determine one of whether a match exists between the voiceprint and one library waveform of one or more library waveforms, whether a correlation exists between the voiceprint and a number of library waveforms of the one or more library waveforms and whether the voiceprint is unique. Additionally, the programming instructions are operable to transcribe the current waveform into text and provide a match indication display (MID) indicating an association between the current waveform and the one or more library waveforms based on the determining. | 04-22-2010 |
20100100377 | GENERATING AND PROCESSING FORMS FOR RECEIVING SPEECH DATA - A system and method for dynamically generating and processing forms for receiving data, such as text-based data or speech data provided over a telephone, mobile device, via a computer and microphone, etc. is disclosed. A form developer can use a toolkit provided by the system to create forms that end-users connect to and complete. The system provides a user-friendly interface for the form developer to create various input fields for the form and impose parameters on the data that may be used to complete or populate those fields. These fields may be included to receive specific information, such as the name of the person filling out the form, or may be free-form, allowing a user to provide a continuous stream of information. Furthermore, the system allows a form developer to establish means for providing access to the form and set access limits on the form. Other aspects are disclosed herein. | 04-22-2010 |
20100100378 | METHOD OF AND SYSTEM FOR IMPROVING ACCURACY IN A SPEECH RECOGNITION SYSTEM - A method for transcribing an audio response includes: | 04-22-2010 |
20100100379 | VOICE RECOGNITION CORRELATION RULE LEARNING SYSTEM, VOICE RECOGNITION CORRELATION RULE LEARNING PROGRAM, AND VOICE RECOGNITION CORRELATION RULE LEARNING METHOD - A speech recognition rule learning device is connected to a speech recognition device that uses conversion rules for conversion between a first-type character string expressing a sound and a second-type character string for forming a recognition result. The character string recording unit records a first-type character string and a corresponding second-type character string. The extraction unit extracts second-type learned character string candidates. The rule learning unit extracts, from the second-type learned character string candidates, a second-type learned character string that matches at least part of the second-type character string in the character string recording unit; extracts a first-type learned character string from the first-type character string in the character string recording unit; and adds the correspondence relationship between the first-type learned character string and the second-type learned character string to the conversion rules. | 04-22-2010 |
20100106498 | SYSTEM AND METHOD FOR TARGETED ADVERTISING - Disclosed herein are systems, methods, and computer readable-media for targeted advertising, the method including receiving an audio stream containing user speech from a first device, generating text based on the speech contained in the audio stream, identifying at least one key phrase in the text, receiving from an advertiser an advertisement related to the identified at least one key phrase, and displaying the advertisement. In one aspect, the method further includes receiving from an advertiser a set of rules associated with the received advertisement and displaying the advertisement in accordance with the associated set of rules. The first device can be a converged voice and data communications device connected to a network. The communications device can generate text based on the speech. In one aspect, the method displays the advertisement on one or both of a converged voice and data communications device and a second communications device. A central server can generate text based on the speech. At least one key phrase in the text can be identified based on a confidence score threshold. In another aspect, the method further includes receiving multiple audio streams containing speech from a same user and generating text based on the speech contained in the multiple audio streams. The advertisement can be displayed after the audio stream terminates. | 04-29-2010 |
20100106499 | METHODS AND APPARATUS FOR LANGUAGE IDENTIFICATION - In a multi-lingual environment, a method and apparatus for determining a language spoken in a speech utterance. The method and apparatus test acoustic feature vectors extracted from the utterances against acoustic models associated with one or more of the languages. Speech to text is then performed for the language indicated by the acoustic testing, followed by textual verification of the resulting text. During verification, the resulting text is processed by language specific NLP and verified against textual models associated with the language. The system is self-learning, i.e., once a language is verified or rejected, the relevant feature vectors are used for enhancing one or more acoustic models associated with one or more languages, so that acoustic determination may improve. | 04-29-2010 |
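The acoustic stage of the language-identification entry above — scoring feature vectors against per-language acoustic models and picking the best — can be sketched as follows (the model interface and the toy scoring functions are assumptions for illustration; the application additionally verifies the choice textually via NLP):

```python
import math

def identify_language(feature_vectors, models):
    """models: {language: scoring function returning a log-likelihood
    per feature vector}. Return the language whose acoustic model
    scores the whole utterance highest."""
    best_lang, best_score = None, -math.inf
    for lang, score_fn in models.items():
        score = sum(score_fn(v) for v in feature_vectors)
        if score > best_score:
            best_lang, best_score = lang, score
    return best_lang
```

In the described system the winner is only a candidate: speech-to-text and textual verification follow, and the outcome feeds back into the acoustic models.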
20100106500 | METHOD AND SYSTEM FOR ENHANCING VERBAL COMMUNICATION SESSIONS - An approach is provided for enhancing verbal communication sessions. A verbal component of a communication session is converted into textual information. The converted textual information is scanned for a text string to trigger an application. The application is invoked to provide supplemental information about the textual information or to perform an action in response to the textual information for or on behalf of a party of the communication session. The supplemental information or a confirmation of the action is transmitted to the party. | 04-29-2010 |
20100114571 | INFORMATION RETRIEVAL SYSTEM, INFORMATION RETRIEVAL METHOD, AND INFORMATION RETRIEVAL PROGRAM - An information retrieval system comprises: a speech input unit for inputting speech; an information storage unit for storing information with which speech information, of a length with which text degree of similarity is computable, is associated as a retrieval tag; an information selection unit for comparing a feature of each spoken content item extracted from each item of said speech information, with a feature of spoken content extracted from said input speech, to select information with which speech information similar to input speech is associated. The system further comprises an output unit for outputting information selected by said information selection unit, as information associated with input speech. | 05-06-2010 |
20100121637 | Semi-Automatic Speech Transcription - A semi-automatic speech transcription system of the invention leverages the complementary capabilities of human and machine, building a system which combines automatic and manual approaches. With the invention, collected audio data is automatically distilled into speech segments, using signal processing and pattern recognition algorithms. The detected speech segments are presented to a human transcriber using a transcription tool with a streamlined transcription interface, requiring the transcriber to simply “listen and type”. This eliminates the need to manually navigate the audio, coupling the human effort to the amount of speech, rather than the amount of audio. Errors produced by the automatic system can be quickly identified by the human transcriber, which are used to improve the automatic system performance. The automatic system is tuned to maximize the human transcriber efficiency. The result is a system which takes considerably less time than purely manual transcription approaches to produce a complete transcription. | 05-13-2010 |
20100121638 | SYSTEM AND METHOD FOR AUTOMATIC SPEECH TO TEXT CONVERSION - Speech recognition is performed in near-real-time and improved by exploiting events and event sequences, employing machine learning techniques including boosted classifiers, ensembles, detectors and cascades and using perceptual clusters. Speech recognition is also improved using tandem processing. An automatic punctuator injects punctuation into recognized text streams. | 05-13-2010 |
20100131271 | SYSTEMS AND METHODS TO REDIRECT AUDIO BETWEEN CALLERS AND VOICE APPLICATIONS - A call center environment is provided that allows a customer service representative to populate a workstation display screen with data using either keystrokes or voice input. The voice input is provided to the workstation using a voice overlay and voice platform to convert audio into data usable by the workstation to populate the screen. | 05-27-2010 |
20100138221 | DEDICATED HARDWARE/SOFTWARE VOICE-TO-TEXT SYSTEM - A text preparation system has a first and a second CPU, with the first dedicated to a conventional voice-to-text software and the second to all other functions including a voice-to-text correction software. Voice commands enable the user to initiate the first and the second voice-to-text software and associated lexicons alternately, the second software and lexicon providing a corrections mode for errors made by the first voice-to-text software. | 06-03-2010 |
20100145694 | REPLYING TO TEXT MESSAGES VIA AUTOMATED VOICE SEARCH TECHNIQUES - An automated “Voice Search Message Service” provides a voice-based user interface for generating text messages from an arbitrary speech input. Specifically, the Voice Search Message Service provides a voice-search information retrieval process that evaluates user speech inputs to select one or more probabilistic matches from a database of pre-defined or user-defined text messages. These probabilistic matches are also optionally sorted in terms of relevancy. A single text message from the probabilistic matches is then selected and automatically transmitted to one or more intended recipients. Optionally, one or more of the probabilistic matches are presented to the user for confirmation or selection prior to transmission. Correction or recovery of speech recognition errors is avoided since the probabilistic matches are intended to paraphrase the user speech input rather than exactly reproduce that speech, though exact matches are possible. Consequently, potential distractions to the user are significantly reduced relative to conventional speech recognition techniques. | 06-10-2010 |
20100153105 | SYSTEM AND METHOD FOR REFERRING TO ENTITIES IN A DISCOURSE DOMAIN - Disclosed herein are systems, computer-implemented methods, and tangible computer-readable media for referring to entities. The method includes receiving domain-specific training data of sentences describing a target entity in a context, extracting a speaker history and a visual context from the training data, selecting attributes of the target entity based on at least one of the speaker history, the visual context, and speaker preferences, generating a text expression referring to the target entity based on at least one of the selected attributes, the speaker history, and the context, and outputting the generated text expression. The weighted finite-state automaton can represent partial orderings of word pairs in the domain-specific training data. The weighted finite-state automaton can be speaker specific or speaker independent. The weighted finite-state automaton can include a set of weighted partial orderings of the training data for each possible realization. | 06-17-2010 |
20100153106 | CONVERSATION MAPPING - A method may include receiving communications associated with a communication session. The communication session may correspond to a telephone conversation, text-based conversation or a multimedia conversation. The method may also include identifying portions of the communication session and storing the identified portions. The method may further include receiving a request to retrieve information associated with the communication session and providing to a display, information associated with the identified portions. | 06-17-2010 |
20100161327 | SYSTEM-EFFECTED METHODS FOR ANALYZING, PREDICTING, AND/OR MODIFYING ACOUSTIC UNITS OF HUMAN UTTERANCES FOR USE IN SPEECH SYNTHESIS AND RECOGNITION - A computer-implemented method for automatically analyzing, predicting, and/or modifying acoustic units of prosodic human speech utterances for use in speech synthesis or speech recognition. Possible steps include: initiating analysis of acoustic wave data representing the human speech utterances, via the phase state of the acoustic wave data; using one or more phase state defined acoustic wave metrics as common elements for analyzing, and optionally modifying, pitch, amplitude, duration, and other measurable acoustic parameters of the acoustic wave data, at predetermined time intervals; analyzing acoustic wave data representing a selected acoustic unit to determine the phase state of the acoustic unit; and analyzing the acoustic wave data representing the selected acoustic unit to determine at least one acoustic parameter of the acoustic unit with reference to the determined phase state of the selected acoustic unit. Also included are systems for implementing the described and related methods. | 06-24-2010 |
20100169091 | DEVICE, SYSTEM AND METHOD FOR PROVIDING TARGETED ADVERTISEMENTS AND CONTENT - An aspect of the present invention is drawn to an audio data processing device for use by a user to control a system and for use with a microphone, a user demographic profiles database and a content/ad database. The microphone may be operable to detect speech and to generate speech data based on the detected speech. The user demographic profiles database may be capable of having demographic data stored therein. The content/ad database may be capable of having at least one of content data and advertisement data stored therein. The audio data processing device includes a voice recognition portion, a voice analysis portion and a speech to text portion. The voice recognition portion may be operable to process user instructions based on the speech data. The voice analysis portion may be operable to determine characteristics of the user based on the speech data. The speech to text portion may be operable to determine interests of the user. | 07-01-2010 |
20100169092 | VOICE INTERFACE OCX - A medical dictation workflow system can be customized from the selection of available user application programs. A voice interface OCX can interface speech technologies with the selected user application programs of the medical dictation workflow system. The medical dictation workflow system may be directed to generating reports through filling out defined fields. The fields can be generated through a tracking system subscribing to a core reporting system and requesting certain information be captured or through a user. The voice interface OCX can provide macros so a user can customize the fields, navigate among the fields, or fill in the fields with data through a voice recognition engine or a wave player control. The data entered into the fields can be automatically entered into corresponding database elements of a database. | 07-01-2010 |
20100174543 | SYSTEMS FOR DISPLAYING CONVERSIONS OF TEXT EQUIVALENTS - Embodiments of the invention include a system for displaying an audit diagram. The system includes a monitor capable of electronically displaying the audit diagram. The monitor includes a text equivalent constructed from an input text, and a conversion representation including an operator indicator, a result arrow, and a rule arrow. | 07-08-2010 |
20100179811 | IDENTIFYING KEYWORD OCCURRENCES IN AUDIO DATA - Occurrences of one or more keywords in audio data are identified using a speech recognizer employing a language model to derive a transcript of the keywords. The transcript is converted into a phoneme sequence. The phonemes of the phoneme sequence are mapped to the audio data to derive a time-aligned phoneme sequence that is searched for occurrences of keyword phoneme sequences corresponding to the phonemes of the keywords. Searching includes computing a confusion matrix. The language model used by the speech recognizer is adapted to keywords by increasing the likelihoods of the keywords in the language model. For each potential occurrence of a keyword detected, a corresponding subset of the audio data may be played back to an operator to confirm whether the potential occurrence corresponds to an actual occurrence of the keyword. | 07-15-2010 |
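The search step in the entry above can be sketched minimally: scan a time-aligned phoneme sequence for the keyword's phoneme sequence and return the matching start times. This sketch covers only exact matching; the application additionally scores near-misses via a confusion matrix, which is omitted here. The data layout is an assumption for illustration.

```python
def find_keyword(aligned_phonemes, keyword_phonemes):
    """aligned_phonemes: list of (phoneme, start_sec) pairs from the
    recognizer. Return the start times of each exact occurrence of
    the keyword's phoneme sequence."""
    seq = [p for p, _ in aligned_phonemes]
    k = len(keyword_phonemes)
    hits = []
    for i in range(len(seq) - k + 1):
        if seq[i:i + k] == keyword_phonemes:
            hits.append(aligned_phonemes[i][1])
    return hits
```

Each returned start time identifies the audio subset that could be played back to an operator for confirmation.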
20100185443 | System and Method for Processing Speech - Systems and methods for processing speech are provided. A system may include a speech recognition interface and a processor. The processor may convert speech received from a call at the speech recognition interface to at least one word string. The processor may parse each word string of the at least one word string into first objects and first actions. The processor may access a synonym table to determine second objects and second actions based on the first objects and the first actions. The processor may also select a preferred object and a preferred action from the second objects and the second actions. | 07-22-2010 |
20100198594 | MOBILE PHONE COMMUNICATION GAP RECOVERY - Mobile phone signals may be corrupted by noise, fading, interference with other signals, and low strength field coverage of a transmitting and/or a receiving mobile phone as they pass through the communication network (e.g., free space). Because of the corruption of the mobile phone signal, a voice conversation between a caller and a receiver may be interrupted and there may be gaps in a received oral communication from one or more participants in the voice conversation forcing either or both the caller and the receiver to repeat the conversation. Transmitting a transcript of the oral communication along with a voice signal comprising the oral communication can help ensure that voice conversation is not interrupted due to a corrupted voice signal. The transcript of the oral communication can be used to retrieve parts of the oral communication lost in transmission (e.g., by fading, etc.) to make the conversation more fluid. | 08-05-2010 |
20100198595 | SYSTEMS AND METHODS FOR INTERACTIVELY ACCESSING HOSTED SERVICES USING VOICE COMMUNICATIONS - In a system comprising a voice recognition module, a session manager, and a voice generator module, a method for providing a service to a user comprises receiving an utterance via the voice recognition module; converting the utterance into one or more structures using lexicon tied to an ontology; identifying concepts in the utterance using the structures; provided the utterance includes sufficient information, selecting a service based on the concepts; generating a text message based on the selected service; and converting the text message to a voice message using the voice generator. | 08-05-2010 |
20100198596 | MESSAGE TRANSCRIPTION, VOICE QUERY AND QUERY DELIVERY SYSTEM - A message transmission system accepts a telephone call from a user who wishes to send an e-mail message, send an SMS message, perform an Internet query or retrieve his or her electronic mail. The voice call is transcribed and the message is sent, or the question in the voice call is transcribed and answered by an agent. Any number of agents connect to a central site over an Internet connection and transcribe messages or answer queries in an assembly line like fashion. In addition, a Web query delivery system accepts a query or statement from a user; the query is transcribed, classified, and then broadcast over any medium to any number of experts or web sites that desire to answer the particular type of query received. The entire query is delivered to an expert or web site who provides a full answer to the user. | 08-05-2010 |
20100204989 | APPARATUS AND METHOD FOR QUEUING JOBS IN A DISTRIBUTED DICTATION/TRANSCRIPTION SYSTEM - A distributed dictation/transcription system is provided. The system provides a client station, dictation manager, and dictation server connected such that the dictation manager can select a dictation server to transcribe audio from the client station. A job queue at the dictation manager queues the audio to be provided to the dictation servers. The dictation manager reviews all jobs in the job queue and sends audio with a user profile matching a user profile already uploaded to the dictation server, regardless of whether the matching audio is next in the job queue. If alternative audio has been pending over a predetermined amount of time or has a higher priority, the alternative audio is sent to the dictation server. | 08-12-2010 |
20100211389 | System of communication employing both voice and text - The disclosed invention comprises a method of communication that integrates both speech to text technology and text to speech technology. In its simplest form, one user employs a communication device having means for converting vocal signals into text; this converted text is then sent to the other user. This recipient is presented with the sender's text and to respond, he can enter text which is then output to the first user as speech sounds. This system creates an opportunity for two users to carry on a conversation, one using his voice (and hearing a synthesized voice in response) and the other using text (and receiving speech rendered as text): the first user has a voice conversation; the second user has a text based conversation. This system allows a user to select his preferred method of communication, regardless of the selection of his communication partner. | 08-19-2010 |
20100217591 | VOWEL RECOGNITION SYSTEM AND METHOD IN SPEECH TO TEXT APPLICATIONS - The present invention provides systems, software and methods for accurate vowel detection in speech to text conversion, the method including the steps of applying a voice recognition algorithm to a first user speech input so as to detect known words and residual undetected words; and detecting at least one undetected vowel from the residual undetected words by applying a user-fitted vowel recognition algorithm to vowels from the known words so as to accurately detect the vowels in the undetected words in the speech input, to enhance conversion of voice to text. | 08-26-2010 |
20100223055 | MOBILE WIRELESS COMMUNICATIONS DEVICE WITH SPEECH TO TEXT CONVERSION AND RELATED METHODS - A mobile wireless communications device may include a housing and a wireless transceiver carried by the housing. The mobile wireless communications device may also include at least one audio transducer carried by the housing, and a controller cooperating with the wireless transceiver to perform at least one wireless communications function. The controller may also cooperate with the at least one audio transducer to convert speech input through the audio transducer to converted text, determine a proposed modification for the converted text, and output from the audio transducer the proposed modification for the converted text. | 09-02-2010 |
20100223056 | VARIOUS APPARATUS AND METHODS FOR A SPEECH RECOGNITION SYSTEM - A method, apparatus, and system are described for a continuous speech recognition engine that includes a fine speech recognizer model, a coarse sound representation generator, and a coarse match generator. The fine speech recognizer model receives a time coded sequence of sound feature frames, applies a speech recognition process to the sound feature frames and determines at least a best guess at each recognizable word that corresponds to the sound feature frames. The coarse sound representation generator generates a coarse sound representation of the recognized word. The coarse match generator determines a likelihood of the coarse sound representation actually being the recognized word based on comparing the coarse sound representation of the recognized word to a database containing the known sound of that recognized word and assigns the likelihood as a robust confidence level parameter to that recognized word. | 09-02-2010 |
20100228546 | SYSTEM AND METHODS FOR PROVIDING VOICE TRANSCRIPTION - A system and methods are provided for providing SIP-based voice transcription services. A computer implemented method includes: transcribing a Session Initiation Protocol (SIP) based conversation between one or more users from voice to text transcription; identifying each of the one or more users that are speaking using a device SIP_ID of the one or more users; marking the identity of the one or more users that are speaking in the text transcription; and providing the text transcription of the speaking user to non-speaking users. | 09-09-2010 |
20100228547 | METHOD AND APPARATUS FOR ANALYZING DISCUSSION REGARDING MEDIA PROGRAMS - A system that incorporates teachings of the present disclosure may include, for example, a device, such as a set-top box, including a controller to detect a plurality of users engaging in a voice conference to discuss a presentation of a media program, convert speech dialog detected in the voice conference to textual dialog, detect from the textual dialog a behavioral profile of at least one of the plurality of users, and identify at least one of advertisement content and marketable media content based on the behavioral profile of the at least one user. Other embodiments are disclosed. | 09-09-2010 |
20100241429 | Systems And Methods For Punctuating Voicemail Transcriptions - A system, method and software product punctuates voicemail transcription text. A transcription text of the voicemail message is generated and the pauses between words of the transcribed text are determined. Ellipses are inserted into the transcription text at the position of “er” and “ahh” type words and pauses between words of the transcribed text. | 09-23-2010 |
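The voicemail-punctuation entry above lends itself to a small sketch: insert ellipses where filler words occur and where inter-word gaps exceed a threshold. The input layout, the filler list beyond "er"/"ahh", the 0.7-second threshold, and the choice to replace the filler word itself with an ellipsis are all assumptions for illustration.

```python
def punctuate(words):
    """words: list of (token, start_sec, end_sec) from a transcription.
    Insert an ellipsis at filler words and wherever the silent gap
    before the next word exceeds a threshold."""
    FILLERS = {"er", "ahh", "um", "uh"}   # assumed set of filler tokens
    PAUSE_SEC = 0.7                       # assumed pause threshold
    out = []
    for i, (tok, start, end) in enumerate(words):
        if tok.lower() in FILLERS:
            out.append("...")             # ellipsis replaces the filler
            continue
        out.append(tok)
        if i + 1 < len(words) and words[i + 1][1] - end > PAUSE_SEC:
            out.append("...")             # ellipsis marks a long pause
    return " ".join(out)
```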
20100250248 | COMBINED SPEECH AND TOUCH INPUT FOR OBSERVATION SYMBOL MAPPINGS - The invention relates to systems and/or methodologies for enabling combined speech and touch inputs for observation symbol mappings. More particularly, the current innovation leverages the commonality of touch screen display text entry and speech recognition based text entry to increase the speed and accuracy of text entry via mobile devices. Touch screen devices often contain small and closely grouped keypads that can make it difficult for a user to press the intended character; by combining touch screen based text entry with speech recognition based text entry the aforementioned limitation can be overcome efficiently and conveniently. | 09-30-2010 |
20100250249 | Communication control apparatus, communication control method, and computer-readable medium storing a communication control program - A communication control apparatus for communicating sound and image with another communication control apparatus via a network, includes a sound input device that acquires sound data from a sound of a user's speech, a level measuring device that measures a volume level of sound data input from the sound input device, a first determining device that determines whether the volume level measured by the level measuring device is smaller than a predetermined standard volume value, a sound recognizing device that executes sound recognition of the sound data so as to create text data when the first determining device determines that the volume level is smaller than the standard volume value, and a transmitting device that transmits the text data created by the sound recognizing device to the other communication control apparatus. | 09-30-2010 |
20100250250 | SYSTEMS AND METHODS FOR GENERATING A HYBRID TEXT STRING FROM TWO OR MORE TEXT STRINGS GENERATED BY MULTIPLE AUTOMATED SPEECH RECOGNITION SYSTEMS - A hybrid text generator is disclosed that generates a hybrid text string from multiple text strings that are produced from an audio input by multiple automated speech recognition systems. The hybrid text generator receives metadata that describes a time-location that each word from the multiple text strings is located in the audio input. The hybrid text generator matches words between the multiple text strings using the metadata and generates a hybrid text string that includes the matched words. The hybrid text generator utilizes confidence scores associated with words that do not match between the multiple text strings to determine whether to add an unmatched word to the hybrid text string. | 09-30-2010 |
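The hybrid-text entry above can be illustrated with a minimal sketch. It assumes the two recognizers' hypotheses have already been aligned word-for-word using the time metadata, so the remaining work is the agree/disagree decision; the tuple layout and function name are assumptions, not from the application.

```python
def hybrid_text(hyp_a, hyp_b):
    """hyp_a, hyp_b: time-aligned lists of (word, start_sec, end_sec,
    confidence) from two recognizers. Keep words they agree on; for
    disagreements, keep the word with the higher confidence score."""
    out = []
    for (wa, _, _, ca), (wb, _, _, cb) in zip(hyp_a, hyp_b):
        if wa == wb:
            out.append(wa)                      # recognizers agree
        else:
            out.append(wa if ca >= cb else wb)  # pick higher confidence
    return " ".join(out)
```

A fuller implementation would handle insertions and deletions (words present in only one hypothesis) during the time-based alignment step, which this sketch glosses over.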
20100268534 | TRANSCRIPTION, ARCHIVING AND THREADING OF VOICE COMMUNICATIONS - Described is a technology that provides highly accurate speech-recognized text transcripts of conversations, particularly telephone or meeting conversations. Speech is received for recognition when it is at a high quality and separate for each user, that is, independent of any transmission. Moreover, because the speech is received separately, a personalized recognition model adapted to each user's voice and vocabulary may be used. The separately recognized text is then merged into a transcript of the communication. The transcript may be labeled with the identity of each user that spoke the corresponding speech. The output of the transcript may be dynamic as the conversation takes place, or may occur later, such as contingent upon each user agreeing to release his or her text. The transcript may be incorporated into the text or data of another program, such as to insert it as a thread in a larger email conversation or the like. | 10-21-2010 |
20100286982 | System and Method for Automatic Merging of Multiple Time-Stamped Transcriptions - A system for automatically merging multiple time-stamped transcriptions is provided. The system includes a transcription server for receiving a signal having time-stamp information, a splitter, a merging utility, and a text output. A method for automatic merging of multiple time-stamped transcriptions comprises the following steps: transferring a signal having timestamp information encoded therein to a splitter which yields a mixed audio output having resultant corresponding audio channels, transferring the mixed audio output to a transcription server which thereby yields one or more text outputs, and the text outputs being merged by a merging utility with the timestamps included in the signal thereby providing a single text file. | 11-11-2010 |
20100299146 | Speech Capabilities Of A Multimodal Application - Improving speech capabilities of a multimodal application including receiving, by the multimodal browser, a media file having a metadata container; retrieving, by the multimodal browser, from the metadata container a speech artifact related to content stored in the media file for inclusion in the speech engine available to the multimodal browser; determining whether the speech artifact includes a grammar rule or a pronunciation rule; if the speech artifact includes a grammar rule, modifying, by the multimodal browser, the grammar of the speech engine to include the grammar rule; and if the speech artifact includes a pronunciation rule, modifying, by the multimodal browser, the lexicon of the speech engine to include the pronunciation rule. | 11-25-2010 |
20100299147 | SPEECH-TO-SPEECH TRANSLATION - Systems and methods for facilitating communication including recognizing speech in a first language represented in a first audio signal; forming a first text representation of the speech; processing the first text representation to form data representing a second audio signal; and causing presentation of the second audio signal to a second user while responsive to an interrupt signal from a first user. In some embodiments, processing the first text representation includes translating the first text representation to a second text representation in a second language and processing the second text representation to form the data representing the second audio signal. Some embodiments include accepting an interrupt signal from the first user and interrupting the presentation of the second audio signal. | 11-25-2010 |
20100305945 | REPRESENTING GROUP INTERACTIONS - Disclosed is a system for generating a representation of a group interaction, the system comprising: a transcription module adapted to generate a transcript of the group interaction from audio source data representing the group interaction, the transcript comprising a sequence of lines of text, each line corresponding to an audible utterance in the audio source data; and a labeling module adapted to generate a conversation path from the transcript by labeling each transcript line with an identifier identifying the speaker of the corresponding utterance in the audio source data; and generate the representation of the group interaction by associating the conversation path with a plurality of voice profiles, each voice profile corresponding to an identified speaker in the conversation path. | 12-02-2010 |
20100324894 | Voice to Text to Voice Processing - Technologies are generally described for voice to text to voice processing. An audio signal can be preprocessed and translated into text prior to being processed in the textual domain. The text domain processing or subsequent text to voice regeneration can seek to improve clarity, correct grammar, adjust vocabulary level, remove profanity, correct slang, alter dialect, alter accent, or provide other modifications of various oral communication characteristics. The processed text may be translated back into the audio domain for delivery to a listener. The processing at each stage may be driven by a set of objectives and constraints set by the speaker, the listener, a third party, or any combination of explicit or implicit participants. The voice processing may translate the voice content from a specific human language to the same human language with various improvements. The processing may also involve translation into one or more other languages. | 12-23-2010 |
20100324895 | SYNCHRONIZATION FOR DOCUMENT NARRATION - Disclosed are techniques and systems for synchronizing an audio file with a sequence of words displayed on a user interface. | 12-23-2010 |
20100332225 | TRANSCRIPT ALIGNMENT - Some general aspects relate to systems and methods for media processing. One aspect, for example, relates to a method for aligning multimedia recording with a transcript. A group of search terms are formed from the transcript, with each search term being associated with a location within the transcript. Putative locations of the search terms are determined in a time interval of the multimedia recording. For each search term, zero or more putative locations are determined and, for at least some of the search terms, multiple putative locations are determined in the time interval of the multimedia recording. According to a first sequencing constraint, a first representation of a group of sequences each of a subset of the putative locations of the search terms is formed. A second representation of a group of sequences each of a subset of the search terms is formed. Using the first and the second representations, the time interval of the multimedia recording is partially aligned with the transcript. | 12-30-2010 |
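One way to make the sequencing-constraint idea above concrete is a toy dynamic program: pick at most one putative audio location per search term so that the chosen locations increase monotonically through the recording, maximizing the number of aligned terms. This sketches only the constraint, not the representation-based alignment method the abstract actually describes.

```python
def align_terms(term_locations):
    """Choose one putative time per search term under a monotonic
    sequencing constraint, maximizing the count of aligned terms.

    `term_locations` is a list, in transcript order, of lists of
    candidate times (possibly empty) for each search term.
    Returns one chosen time per term, or None for unaligned terms.
    """
    # Each state is (last_chosen_time, aligned_count, choices_so_far).
    best = [(float("-inf"), 0, [])]
    for candidates in term_locations:
        # Option 1: leave this term unaligned in every existing state.
        new_best = [(t, c, ch + [None]) for t, c, ch in best]
        for cand in candidates:
            # Option 2: align here, extending the best earlier state
            # whose last chosen time does not exceed this candidate.
            t, c, ch = max(
                (s for s in best if s[0] <= cand),
                key=lambda s: s[1],
            )
            new_best.append((cand, c + 1, ch + [cand]))
        best = new_best
    return max(best, key=lambda s: s[1])[2]
```

The state list grows only by the number of candidates per term, so the program stays small for realistic term counts; a production aligner would also score acoustic confidence, which is omitted here.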
20100332226 | MOBILE TERMINAL AND CONTROLLING METHOD THEREOF - A mobile terminal and controlling method thereof are disclosed, by which a specific content and another content associated with the specific content can be quickly searched using a user's voice. The present invention includes inputting a voice for a search for a specific content provided to the mobile terminal via a microphone, analyzing a meaning of the inputted voice, searching a memory for at least one content to which a voice name having a meaning associated with the analyzed voice is tagged, and displaying the searched at least one content. | 12-30-2010 |
20110010173 | System for Analyzing Interactions and Reporting Analytic Results to Human-Operated and System Interfaces in Real Time - A computerized system for advising one communicant in electronic communication between two or more communicants has apparatus monitoring and recording interaction between the communicants, software executing from a machine-readable medium and providing analytics, the software functions including rendering speech into text, and analyzing the rendered text for topics, performing communicant verification, and detecting changes in communicant emotion. Advice is offered to the one communicant during the interaction, based on results of the analytics. | 01-13-2011 |
20110010174 | MULTIMODAL DISAMBIGUATION OF SPEECH RECOGNITION - The present invention provides a speech recognition system combined with one or more alternate input modalities to ensure efficient and accurate text input. The speech recognition system achieves less than perfect accuracy due to limited processing power, environmental noise, and/or natural variations in speaking style. The alternate input modalities use disambiguation or recognition engines to compensate for reduced keyboards, sloppy input, and/or natural variations in writing style. The ambiguity remaining in the speech recognition process is mostly orthogonal to the ambiguity inherent in the alternate input modality, such that the combination of the two modalities resolves the recognition errors efficiently and accurately. The invention is especially well suited for mobile devices with limited space for keyboards or touch-screen input. | 01-13-2011 |
20110015926 | WORD DETECTION FUNCTIONALITY OF A MOBILE COMMUNICATION TERMINAL - Voice communication by first and second users in a voice communication session that facilitates communication between a first device through which a first user communicates and a second device through which a second user communicates is enabled. Words spoken in the voice communication session between the first device and the second device are monitored. Presence of one or more key words as a subset of less than all of the monitored words spoken in the voice communication session is determined from the monitored words spoken in the voice communication session. The one or more key words are displayed on a display screen. | 01-20-2011 |
20110015927 | SYSTEM AND METHOD FOR EFFICIENT LASER PROCESSING OF A MOVING WEB-BASED MATERIAL - An automatic speech recognition system recognizes user changes to dictated text and infers whether such changes result from the user changing his/her mind, or whether such changes are a result of a recognition error. If a recognition error is detected, the system uses the type of user correction to modify itself to reduce the chance that such recognition error will occur again. Accordingly, the system and methods provide for significant speech recognition learning with little or no additional user interaction. | 01-20-2011 |
20110022386 | SPEECH RECOGNITION TUNING TOOL - Systems and methods for tuning a dictionary of a speech recognition system includes accessing a voice mail record of a user, accessing a recorded audio file of a name of the user in the voice mail record spoken by the user, providing the audio file to a speech recognition system, processing the audio file in the speech recognition system and obtaining a text result, determining whether a confidence score of the text result is below a predetermined threshold, and adding, at least, the name of the user to a list of low confidence names. Alternate spellings for the low confidence names can then be added to the dictionary. | 01-27-2011 |
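The thresholding step in the tuning tool above can be sketched in a few lines: run each recorded name through a recognizer and collect the names whose confidence falls below a cutoff. The `recognize` callback and its `(text, confidence)` return shape are assumptions for illustration.

```python
def collect_low_confidence_names(records, recognize, threshold=0.6):
    """Collect user names whose recorded pronunciation the recognizer
    handles poorly, as candidates for alternate dictionary spellings.

    `records` maps a user's name to that user's recorded audio;
    `recognize` is an assumed callback returning (text, confidence).
    """
    low_confidence = []
    for name, audio in records.items():
        _text, confidence = recognize(audio)
        if confidence < threshold:
            low_confidence.append(name)
    return low_confidence
```

A tuning pass would then generate alternate spellings (e.g. phonetic variants) for each collected name and add them to the recognizer's dictionary.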
20110022387 | CORRECTING TRANSCRIBED AUDIO FILES WITH AN EMAIL-CLIENT INTERFACE - Methods and systems for requesting a transcription of audio data. One method includes displaying a send-for-transcription button within an email-client interface on a computer-controlled display, and automatically sending a selected email message and associated audio data to a transcription server as a request for a transcription of the associated audio data when a user selects the send-for-transcription button. | 01-27-2011 |
20110035217 | SPEECH-DRIVEN SELECTION OF AN AUDIO FILE - A system and method for detecting a refrain in an audio file having vocal components. The method and system includes generating a phonetic transcription of a portion of the audio file, analyzing the phonetic transcription and identifying a vocal segment in the generated phonetic transcription that is repeated frequently. The method and system further relate to the speech-driven selection based on similarity of detected refrain and user input. | 02-10-2011 |
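A crude version of the repeated-segment detection above can be sketched by counting fixed-length windows over a phoneme sequence and keeping the most frequent one. The fixed window length and exact-match criterion are simplifying assumptions; real refrain detection would tolerate phonetic variation and variable segment lengths.

```python
from collections import Counter

def find_refrain(phonemes, window=4, min_count=2):
    """Return the most frequently repeated fixed-length segment in a
    phonetic transcription, or None if nothing repeats.

    `phonemes` is a list of phoneme symbols; every contiguous window
    of `window` symbols is counted, and the modal window is returned
    if it occurs at least `min_count` times.
    """
    if len(phonemes) < window:
        return None
    counts = Counter(
        tuple(phonemes[i:i + window]) for i in range(len(phonemes) - window + 1)
    )
    segment, count = counts.most_common(1)[0]
    return list(segment) if count >= min_count else None
```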
20110035218 | Live Media Captioning Subscription Framework for Mobile Devices - A subscription-based system provides transcribed audio information to one or more mobile devices. Some techniques feature a system for providing subscription services for currently-generated (e.g., not stored) information (e.g., caption information, transcribed audio) for one or more mobile devices for a live/current audio event. There can be a communication network for communicating to the one or more mobile devices, a transcriber configured for transcribing the event to generate information (e.g., caption information, transcribed audio). Caption data includes transcribed data and control code data. The system includes a subscription gateway configured for live/current transfer of the transcribed data to the one or more mobile devices. The subscription gateway is configured to provide access for the transcribed data to the one or more mobile devices. User preferences for subscribers can be set and/or updated by mobile device users and/or GPS-capable mobile devices to receive feeds for the live/current audio event. | 02-10-2011 |
20110046950 | Wireless Dictaphone Features and Interface - A system and method for integrating a communications system with a dictation system on a mobile device includes displaying a first graphical user interface screen on a display of the mobile device, the first graphical user interface screen including a first plurality of selections that, when selected by a user, enable the user to dictate and create one or more voice files for sending to a receiving server; and automatically displaying a second graphical user interface screen on the display of the mobile device when the communications system receives an incoming call, said second graphical user interface screen indicating suspension of dictation functionality and enabling telephone functionality. | 02-24-2011 |

20110054893 | SYSTEM AND METHOD FOR GENERATING USER MODELS FROM TRANSCRIBED DIALOGS - Disclosed herein are systems, computer-implemented methods, and computer-readable storage media for generating personalized user models. The method includes receiving automatic speech recognition (ASR) output of speech interactions with a user, receiving an ASR transcription error model characterizing how ASR transcription errors are made, generating guesses of a true transcription and a user model via an expectation maximization (EM) algorithm based on the error model and the respective ASR output where the guesses will converge to a personalized user model which maximizes the likelihood of the ASR output. The ASR output can be unlabeled. The method can include casting speech interactions as a dynamic Bayesian network with four variables: (s), (u), (r), (m), and encoding relationships between (s), (u), (r), (m) as conditional probability tables. At each dialog turn (r) and (m) are known and (s) and (u) are hidden. | 03-03-2011 |
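The expectation-maximization idea above can be shown in miniature at the word level: given unlabeled ASR outputs and an error model P(observed | true), EM alternates between guessing the true words (E-step) and re-estimating the user model (M-step). This is a deliberately simplified sketch; the patent casts full dialogs as a dynamic Bayesian network over (s), (u), (r), (m), not isolated words.

```python
def em_user_model(asr_outputs, error_model, vocab, iterations=20):
    """Estimate a user model P(true word) from unlabeled ASR outputs
    via EM, given an error model P(observed | true).

    `error_model[w]` maps each observed word to its probability given
    that the user truly said `w` (an assumed, toy representation).
    """
    # Start from a uniform user model.
    p_true = {w: 1.0 / len(vocab) for w in vocab}
    for _ in range(iterations):
        counts = {w: 0.0 for w in vocab}
        for observed in asr_outputs:
            # E-step: posterior over the true word given the observation.
            joint = {w: p_true[w] * error_model[w].get(observed, 0.0) for w in vocab}
            total = sum(joint.values())
            if total == 0:
                continue
            for w in vocab:
                counts[w] += joint[w] / total
        # M-step: re-normalize expected counts into a new user model.
        total = sum(counts.values())
        p_true = {w: counts[w] / total for w in vocab}
    return p_true
```

Each iteration provably does not decrease the likelihood of the observed ASR outputs, so the guesses converge toward the personalized model the abstract describes.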
20110054894 | SPEECH RECOGNITION THROUGH THE COLLECTION OF CONTACT INFORMATION IN MOBILE DICTATION APPLICATION - In embodiments of the present invention improved capabilities are described for improving speech recognition through the collection of contact information on a mobile communication facility comprising capturing speech presented by a user using a resident capture facility on the mobile communication facility; transmitting at least a portion of the captured speech as data through a wireless communication facility to a speech recognition facility; prompting the user to manage contacts associated with usage of the mobile communication facility; transmitting contact names to the speech recognition facility, wherein the contact names are used by the speech recognition facility to at least one of tune, enhance, and improve the speech recognition of the speech recognition facility; generating speech-to-text results for the captured speech utilizing the speech recognition facility based at least in part on at least one of a contact name; transmitting the text results from the speech recognition facility to the mobile communications facility; and entering the text results into a text field on the mobile communications facility. | 03-03-2011 |
20110054895 | UTILIZING USER TRANSMITTED TEXT TO IMPROVE LANGUAGE MODEL IN MOBILE DICTATION APPLICATION - In embodiments of the present invention improved capabilities are described for utilizing user transmitted text to improve language modeling in converting voice to text on a mobile communication facility comprising capturing speech presented by a user using a resident capture facility on the mobile communication facility; transmitting at least a portion of the captured speech as data through a wireless communication facility to a speech recognition facility; generating speech-to-text results for the captured speech utilizing the speech recognition facility; transmitting the text results from the speech recognition facility to the mobile communications facility; entering the text results into a text field on the mobile communications facility; monitoring for a user selected transmission of the entered text results through a communications application on the mobile communications facility; and receiving the user selected transmitted text at the speech recognition facility and using it to improve the performance of the speech recognition facility. | 03-03-2011 |
20110054896 | SENDING A COMMUNICATIONS HEADER WITH VOICE RECORDING TO SEND METADATA FOR USE IN SPEECH RECOGNITION AND FORMATTING IN MOBILE DICTATION APPLICATION - In embodiments of the present invention improved capabilities are described for sending a communications header with voice recording to send metadata for use in speech recognition and formatting when converting voice to text on a mobile communication facility comprising capturing speech presented by a user using a resident capture facility on the mobile communication facility; transmitting a communications header to a speech recognition facility from the mobile communication facility via a wireless communications facility, wherein the communications header includes at least one of device name, network provider, network type, audio source, a display parameter for the wireless communications facility, geographic location, and phone number information; transmitting at least a portion of the captured speech as data through a wireless communication facility to a speech recognition facility; generating speech-to-text results for the captured speech utilizing the speech recognition facility based at least in part on the communications header; transmitting the text results from the speech recognition facility to the mobile communications facility; and entering the text results into a text field on the mobile communication facility. | 03-03-2011 |
20110054897 | TRANSMITTING SIGNAL QUALITY INFORMATION IN MOBILE DICTATION APPLICATION - In embodiments of the present invention improved capabilities are described for transmitting signal quality information when converting voice to text on a mobile communication facility comprising capturing speech presented by a user using a resident capture facility on the mobile communication facility; transmitting at least a portion of the captured speech as data through a wireless communication facility to a speech recognition facility; generating speech-to-text results for the captured speech utilizing the speech recognition facility; transmitting the text results from the speech recognition facility to the mobile communications facility, including text from the speech-to-text results and information on signal quality, such information including at least one of signal-to-noise ratio, clipping, and energy; and entering the text results into a text field on the mobile communications facility. | 03-03-2011 |
20110054898 | MULTIPLE WEB-BASED CONTENT SEARCH USER INTERFACE IN MOBILE SEARCH APPLICATION - In embodiments of the present invention improved capabilities are described for a multiple web-based content search user interface in searching for web content on a mobile communication facility comprising capturing speech presented by a user using a resident capture facility on the mobile communication facility; transmitting at least a portion of the captured speech as data through a wireless communication facility to a speech recognition facility; generating speech-to-text results utilizing the speech recognition facility; and transmitting text from the speech-to-text results along with URL usage information configured to enable a user to conduct a search on the mobile communication facility. | 03-03-2011 |
20110054899 | COMMAND AND CONTROL UTILIZING CONTENT INFORMATION IN A MOBILE VOICE-TO-SPEECH APPLICATION - In embodiments of the present invention improved capabilities are described for controlling a mobile communication facility utilizing content information comprising accepting speech presented by a user using a resident capture facility on the mobile communication facility while the user engages an interface that enables a command mode for the mobile communications facility; processing the speech using a resident speech recognition facility to recognize command elements and content elements; transmitting at least a portion of the speech through a wireless communication facility to a remote speech recognition facility; transmitting information from the mobile communication facility to the remote speech recognition facility, wherein the information includes information about a command recognized by the resident speech recognition facility and information about the content recognized by the resident speech recognition facility; generating speech-to-text results utilizing the remote speech recognition facility based at least in part on the speech and on the information related to the mobile communication facility; and transmitting the text results for use on the mobile communications facility. | 03-03-2011 |
20110054900 | HYBRID COMMAND AND CONTROL BETWEEN RESIDENT AND REMOTE SPEECH RECOGNITION FACILITIES IN A MOBILE VOICE-TO-SPEECH APPLICATION - In embodiments of the present invention improved capabilities are described for hybrid command and control between resident and remote speech recognition facilities in controlling a mobile communication facility comprising accepting speech presented by a user using a resident capture facility on the mobile communication facility while the user engages an interface that enables a command mode for the mobile communications facility; processing the speech using a resident speech recognition facility to recognize command elements and content elements; transmitting at least a portion of the speech through a wireless communication facility to a remote speech recognition facility; transmitting information from the mobile communication facility to the remote speech recognition facility, wherein the information includes information about a command recognizable by the resident speech recognition facility; generating speech-to-text results utilizing a hybrid of the resident speech recognition facility and the remote speech recognition facility based at least in part on the speech and on the information related to the mobile communication facility; and transmitting the text results for use on the mobile communications facility. | 03-03-2011 |
20110060587 | COMMAND AND CONTROL UTILIZING ANCILLARY INFORMATION IN A MOBILE VOICE-TO-SPEECH APPLICATION - In embodiments of the present invention improved capabilities are described for controlling a mobile communication facility utilizing ancillary information comprising accepting speech presented by a user using a resident capture facility on the mobile communication facility while the user engages an interface that enables a command mode for the mobile communications facility; processing the speech using a resident speech recognition facility to recognize command elements and content elements; transmitting at least a portion of the speech through a wireless communication facility to a remote speech recognition facility; transmitting information from the mobile communication facility to the remote speech recognition facility, wherein the information includes information about a command recognizable by the resident speech recognition facility and at least one of language, location, display type, model, identifier, network provider, and phone number associated with the mobile communication facility; generating speech-to-text results utilizing the remote speech recognition facility based at least in part on the speech and on the information related to the mobile communication facility; and transmitting the text results for use on the mobile communications facility. | 03-10-2011 |
20110066431 | HAND-HELD INPUT APPARATUS AND INPUT METHOD FOR INPUTTING DATA TO A REMOTE RECEIVING DEVICE - A hand-held input apparatus includes an input unit, a translator and a wireless transmitter. The input unit generates an input signal. The translator receives the input signal from the input unit, converts the input signal to a meaningful text and translates the meaningful text to a translated signal according to a protocol used in a remote receiving device. The wireless transmitter wirelessly transmits the translated signal to the remote receiving device. | 03-17-2011 |
20110066432 | CONTENT FILTERING FOR A DIGITAL AUDIO SIGNAL - According to some embodiments, content filtering is provided for a digital audio signal. | 03-17-2011 |
20110071826 | METHOD AND APPARATUS FOR ORDERING RESULTS OF A QUERY - A method and apparatus for ordering results from a query is provided herein. During operation, a spoken query is received and converted to a textual representation, such as a word lattice. Search strings are then created from the word lattice. For example a set search strings may be created from the N-grams, such as unigrams and bigrams, of the word lattice. The search strings may be ordered and truncated based on confidence values assigned to the n-grams by the speech recognition system. The set of search strings are sent to at least one search engine, and search results are obtained. The search results are then re-arranged or reordered based on a semantic similarity between the search results and the word lattice. | 03-24-2011 |
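The n-gram extraction and confidence-based ordering above can be sketched with a flat word sequence standing in for the word lattice (a simplifying assumption: a real lattice holds multiple competing hypotheses per position).

```python
def make_search_strings(word_confidences, max_strings=5):
    """Build unigram and bigram search strings from recognized words,
    ordered by confidence and truncated to `max_strings`.

    `word_confidences` is a list of (word, confidence) pairs from the
    recognizer; a bigram's confidence is taken as the minimum of its
    two word confidences (an illustrative scoring choice).
    """
    candidates = []
    for word, conf in word_confidences:
        candidates.append((conf, word))  # unigrams
    for (w1, c1), (w2, c2) in zip(word_confidences, word_confidences[1:]):
        candidates.append((min(c1, c2), f"{w1} {w2}"))  # bigrams
    candidates.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in candidates[:max_strings]]
```

The resulting strings would each be sent to a search engine, and the merged results re-ranked by semantic similarity to the original lattice, as the abstract describes.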
20110071827 | GENERATION AND SELECTION OF SPEECH RECOGNITION GRAMMARS FOR CONDUCTING SEARCHES - Various processes are disclosed for generating and selecting speech recognition grammars for conducting searches by voice. In one such process, search queries are selected from a search query log for incorporation into speech recognition grammar. The search query log may include or consist of search queries specified by users without the use of voice. Another disclosed process enables a user to efficiently submit a search query by partially spelling the search query (e.g., on a telephone keypad or via voice utterances) and uttering the full search query. The user's partial spelling is used to select a particular speech recognition grammar for interpreting the utterance of the full search query. | 03-24-2011 |
20110077941 | Enabling Spoken Tags - Techniques for assigning a spoken tag in a telecom web platform are provided. The techniques include receiving a spoken tag, comparing the spoken tag to a set of one or more template tags, if the spoken tag is a match to a template tag, assigning the spoken tag and updating frequency of the tag in the set of one or more template tags, and if the spoken tag is not a match to a template tag, assigning the spoken tag and registering the spoken tag as a new tag in the set of one or more template tags. | 03-31-2011 |
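The match-or-register logic above reduces to a small update over a tag table. Exact text matching is a simplifying assumption here; a real telecom web platform would compare spoken tags acoustically or via recognized text with a similarity threshold.

```python
def assign_spoken_tag(spoken, template_tags):
    """Assign a spoken tag: if it matches an existing template tag,
    bump that tag's frequency; otherwise register it as a new tag.

    `template_tags` maps tag text to an observed frequency count and
    is updated in place.
    """
    if spoken in template_tags:
        template_tags[spoken] += 1  # matched: update frequency
    else:
        template_tags[spoken] = 1   # no match: register as new tag
    return spoken
```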
20110087491 | Method and system for efficient management of speech transcribers - A method and system for improving the efficiency of speech transcription by automating the management of a varying pool of human and machine transcribers having diverse qualifications, skills, and reliability for a fluctuating load of speech transcription tasks of diverse requirements such as accuracy, promptness, privacy, and security, from sources of diverse characteristics such as language, dialect, accent, speech style, voice type, vocabulary, audio quality, and duration. | 04-14-2011 |
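One plausible core of the pool management described above is matching tasks to qualified, available transcribers. The abstract does not specify the matching policy, so the greedy priority-first assignment below, and the task/worker record shapes, are illustrative assumptions only.

```python
def assign_tasks(tasks, transcribers):
    """Greedily assign transcription tasks to a mixed pool of human
    and machine transcribers, highest-priority tasks first.

    `tasks`: list of dicts with 'id', 'requires' (a set of required
    skills), and optional 'priority'. `transcribers`: list of dicts
    with 'name', 'skills' (a set), and 'busy'; updated in place.
    """
    assignments = {}
    for task in sorted(tasks, key=lambda t: t.get("priority", 0), reverse=True):
        for worker in transcribers:
            # A worker qualifies if free and its skills cover the task.
            if not worker["busy"] and task["requires"] <= worker["skills"]:
                assignments[task["id"]] = worker["name"]
                worker["busy"] = True
                break
    return assignments
```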
20110093263 | Automated Video Captioning - An automated closed captioning, captioning, or subtitle generation system that automatically generates the captioning text from the audio signal in a submitted online video and then allows the user to type in any corrections, after which it adds the captioning text to the video, allowing users to enable the captioning as needed. The user text review and correction step allows the text prediction model to accumulate additional corrected data with each use, thereby improving the accuracy of the text generation over time and use of the system. | 04-21-2011 |
20110093264 | Providing Information Services Related to Multimodal Inputs - A system and method provides information services related to multimodal inputs. Several different types of data used as multimodal inputs are described. Also described are various methods involving the generation of contexts using multimodal inputs, synthesizing context-information service mappings and identifying and providing information services. | 04-21-2011 |
20110099011 | Detecting And Communicating Biometrics Of Recorded Voice During Transcription Process - A method and system for determining and communicating biometrics of a recorded speaker in a voice transcription process. An interactive voice response system receives a request from a user for a transcription of a voice file. A profile associated with the requesting user is obtained, wherein the profile comprises biometric parameters and preferences defined by the user. The requested voice file is analyzed for biometric elements according to the parameters specified in the user's profile. Responsive to detecting biometric elements in the voice file that conform to the parameters specified in the user's profile, a transcription output of the voice file is modified according to the preferences specified in the user's profile for the detected biometric elements to form a modified transcription output file. The modified transcription output file may then be provided to the requesting user. | 04-28-2011 |
20110106534 | Voice Actions on Computing Devices - A computer-implemented method includes receiving spoken input at a computing device from a user of the computing device, the spoken input including a carrier phrase and a subject to which the carrier phrase is directed, providing at least a portion of the spoken input to a server system in audio form for speech-to-text conversion by the server system, the portion including the subject to which the carrier phrase is directed, receiving from the server system instructions for automatically performing an operation on the computing device, the operation including an action defined by the carrier phrase using parameters defined by the subject, and automatically performing the operation on the computing device. | 05-05-2011 |
20110106535 | Caption presentation method and apparatus using same - A caption presentation method and an apparatus using the method, by which caption and information related to the caption can be provided together in a broadcast receiver or in an image reproducer that displays the caption in a closed caption method. The method includes detecting subject information from a caption signal; obtaining visual information with respect to the caption, based on the detected caption subject information; and displaying the visual information and the caption signal together. | 05-05-2011 |
20110112832 | AUTO-TRANSCRIPTION BY CROSS-REFERENCING SYNCHRONIZED MEDIA RESOURCES - A media archive comprising a plurality of media resources associated with events that occurred during a time interval are processed to synchronize the media resources. Sequences of patterns are identified in each media resource of the media archive. Elements of the sequences associated with different media resources are correlated such that a set of correlated elements is associated with the same event that occurred in the given time interval. The synchronization information of the processed media resources is represented in a flexible and extensible data format. The synchronization information is used for correction of errors occurring in the media resources of a media archive, for enhancing processes identifying information in media resources, for example by transcription of audio resources or by optical character recognition of images. | 05-12-2011 |
20110112833 | REAL-TIME TRANSCRIPTION OF CONFERENCE CALLS - Described herein are embodiments of systems, methods and computer program products for real-time transcription of conference calls that employ voice activity detection, audio snippet capture, and multiple transcription instances to deliver practical real-time or near real-time conference call transcription. | 05-12-2011 |
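The voice activity detection and snippet capture named above can be sketched with a toy energy-based detector: frames whose mean absolute amplitude exceeds a threshold are grouped into speech snippets, which a real system would then fan out to parallel transcription instances. Frame size and threshold values are illustrative assumptions.

```python
def split_into_snippets(samples, frame_size=4, energy_threshold=0.1):
    """Toy energy-based voice activity detection: split a sample
    stream into speech snippets (runs of frames whose mean absolute
    amplitude meets a threshold), suitable for dispatch to parallel
    transcription workers.
    """
    snippets, current = [], []
    for i in range(0, len(samples), frame_size):
        frame = samples[i:i + frame_size]
        energy = sum(abs(s) for s in frame) / len(frame)
        if energy >= energy_threshold:
            current.extend(frame)       # speech continues
        elif current:
            snippets.append(current)    # silence ends a snippet
            current = []
    if current:
        snippets.append(current)
    return snippets
```

Transcribing short snippets as they close, rather than the whole call at the end, is what makes near real-time delivery practical.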
20110112834 | Communication method and terminal - A communication method and terminal assist hearing and speech impaired persons. The communication method includes generating a text by combining at least one character input in a text call mode. The text is converted to a speech, and the converted speech is transmitted to a counterparty terminal. The communication terminal of the present invention is capable of converting the text input by the user to a speech signal for the counterparty terminal and converting the speech signal received from the counterparty terminal to a text such that it is possible for the user to conduct a communication with the counterparty regardless of whether the counterparty is a hearing or speech impaired or whether the counterparty's terminal supports the text call service, resulting in reduction of the text call service implementation complexity and improvement of user convenience. | 05-12-2011 |
20110112835 | COMMENT RECORDING APPARATUS, METHOD, PROGRAM, AND STORAGE MEDIUM - A comment recording apparatus, including a voice input device and a voice output device for recording and playing back comment voice, includes a voice obtaining unit, a voice recognition unit, a morphological analysis unit, and a display generation unit. The voice obtaining unit obtains comment voice as voice data, and registers the obtained voice data to a voice database for each topic specified by a topic specification device and each comment-delivered participant identified from the voice data. The voice recognition unit conducts a voice recognition process on the voice data to obtain text information. The morphological analysis unit conducts a morphological analysis on the text information, and registers a keyword extracted from words obtained by the morphological analysis unit to a keyword database with topic and comment-delivered participant along with voice. The display generation unit displays the keyword in a matrix while relating the keyword to a topic and a comment-delivering participant. | 05-12-2011 |
20110112836 | METHOD AND DEVICE FOR CONVERTING SPEECH - Electronic device and method for obtaining a digital speech signal and a control command relating to the digital speech signal while obtaining the digital speech signal, and for temporally associating the control command with a substantially corresponding time instant in the digital speech signal to which the control command was directed, wherein the control command determines one or more punctuation marks or another, optionally symbolic, elements to be at least logically positioned at a text location corresponding to the communication instant relative to the digital speech signal so as to cultivate the speech to text conversion procedure. | 05-12-2011 |
20110112837 | METHOD AND DEVICE FOR CONVERTING SPEECH - Electronic device and method for speech to text conversion procedure, wherein the overall conversion result may include smaller portions with multiple conversion options that are audibly and optionally visually or tactilely reproduced for user confirmation, thereby resulting in enhanced conversion accuracy with minimal additional effort by the user. | 05-12-2011 |
20110119058 | METHOD AND SYSTEM FOR THE CREATION OF A PERSONALIZED VIDEO - A method and system for creating a personalized video destined for an intended recipient, comprising gathering personal information about the intended recipient, selecting a non-personalized video, retrieving the selected non-personalized video along with associated customizable elements, setting the customizable elements according to the personal information of the intended recipient and assembling the non-personalized video and the set customizable elements to create the personalized video. | 05-19-2011 |
20110131041 | Systems And Methods For Synthesis Of Motion For Animation Of Virtual Heads/Characters Via Voice Processing In Portable Devices - Systems and methods consistent with the innovations herein relate to communication using a virtual humanoid animated during call processing. According to one exemplary implementation, the animation may be performed using a system of recognition of spoken vowels for animation of the lips, which may also be associated with the recognition of DTMF tones for animation of head movements and facial features. The innovations herein may be generally implemented in portable devices such as PDAs, cell phones and Smart Phones that have access to mobile telephony. | 06-02-2011 |
20110144989 | SYSTEM AND METHOD FOR AUDIBLE TEXT CENTER SUBSYSTEM - Disclosed herein are systems, methods, and computer-readable storage media for sending a spoken message as a text message. The method includes initiating a connection with a first subscriber, receiving from the first subscriber a spoken message and spoken information associated with at least one recipient address. The method further includes converting the spoken message to text via an audible text center subsystem (ATCS), and delivering the text to the recipient address. The method can also include verifying a subscription status of the first subscriber, or delivering the text to the recipient address based on retrieved preferences of the first subscriber. The preferences can be retrieved from a consolidated network repository or embedded within the spoken message. Text and the spoken message can be delivered to the same or different recipient addresses. The method can include updating recipient addresses based on a received oral command from the first subscriber. | 06-16-2011 |
20110144990 | RATING SPEECH NATURALNESS OF SPEECH UTTERANCES BASED ON A PLURALITY OF HUMAN TESTERS - A method that includes: generating an utterance-specific scoring model for each one of a plurality of obtained speech utterances, each scoring model usable to estimate a level of speech naturalness for a respective one of the obtained speech utterances; presenting a plurality of human-testers with some of the obtained speech utterances; receiving, for each presented speech utterance, a plurality of human tester generated speech utterances being at least one human repetition of the presented speech utterance; updating the scoring model for each presented speech utterance, based on respective human-tester generated speech utterances; and obtaining a speech naturalness score for each presented speech utterance by respectively applying the updated utterance-specific scoring model to each presented speech utterance. | 06-16-2011 |
20110153322 | DIALOG MANAGEMENT SYSTEM AND METHOD FOR PROCESSING INFORMATION-SEEKING DIALOGUE - A dialog management apparatus and method for processing an information-seeking dialogue with a user and providing a service to the user by prompting the user for a task-oriented dialogue may be provided. A hierarchical topic plan in which pieces of information are organized in a hierarchy according to topics corresponding to services may be used to prompt the user to change an information-seeking dialogue to a task-oriented dialogue, and the user may be provided with a service. | 06-23-2011 |
20110153323 | METHOD AND SYSTEM FOR CONTROLLING EXTERNAL OUTPUT OF A MOBILE DEVICE - A method and system are provided that control an external output function of a mobile device according to control interactions received via a microphone. The method includes activating the microphone according to preset optional information when the mobile device enters an external output mode, performing an external output operation in the external output mode, detecting an interaction based on sound information in the external output mode, and controlling the external output according to the interaction. | 06-23-2011 |
20110153324 | Language Model Selection for Speech-to-Text Conversion - Methods, computer program products and systems are described for converting speech to text. Sound information is received at a computer server system from an electronic device, where the sound information is from a user of the electronic device. A context identifier indicates a context within which the user provided the sound information. The context identifier is used to select, from among multiple language models, a language model appropriate for the context. Speech in the sound information is converted to text using the selected language model. The text is provided for use by the electronic device. | 06-23-2011 |
20110153325 | Multi-Modal Input on an Electronic Device - A computer-implemented input-method editor process includes receiving a request from a user for an application-independent input method editor having written and spoken input capabilities, identifying that the user is about to provide spoken input to the application-independent input method editor, and receiving a spoken input from the user. The spoken input corresponds to input to an application and is converted to text that represents the spoken input. The text is provided as input to the application. | 06-23-2011 |
20110161079 | Grammar and Template-Based Speech Recognition of Spoken Utterances - The present invention relates to a communication system, comprising a database including classes of speech templates, in particular, classified according to a predetermined grammar; an input configured to receive and to digitize speech signals corresponding to a spoken utterance; a speech recognizer configured to receive and recognize the digitized speech signals; and wherein the speech recognizer is configured to recognize the digitized speech signals based on speech templates stored in the database and a predetermined grammatical structure. | 06-30-2011 |
20110161080 | Speech to Text Conversion - Methods, computer program products and systems are described for speech-to-text conversion. A voice input is received from a user of an electronic device and contextual metadata is received that describes a context of the electronic device at a time when the voice input is received. Multiple base language models are identified, where each base language model corresponds to a distinct textual corpus of content. Using the contextual metadata, an interpolated language model is generated based on contributions from the base language models. The contributions are weighted according to a weighting for each of the base language models. The interpolated language model is used to convert the received voice input to a textual output. The voice input is received at a computer server system that is remote to the electronic device. The textual output is transmitted to the electronic device. | 06-30-2011 |
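The entry above describes generating an interpolated language model from weighted contributions of several base models. A minimal sketch of that weighted mixing, with invented toy unigram models and weights purely for illustration (the abstract does not specify the actual interpolation formula):

```python
# Hypothetical sketch: mix base language model distributions P(word)
# using per-model weights chosen from contextual metadata.  The corpora,
# words, and weights below are invented for illustration only.

def interpolate(base_models, weights):
    """Return a word distribution mixed from several base models."""
    assert abs(sum(weights) - 1.0) < 1e-9, "weights must sum to 1"
    vocab = set()
    for model in base_models:
        vocab.update(model)
    return {word: sum(w * m.get(word, 0.0)
                      for m, w in zip(base_models, weights))
            for word in vocab}

# Two toy unigram models: one from "search query" text, one from "SMS" text.
search_lm = {"weather": 0.6, "pizza": 0.4}
sms_lm = {"weather": 0.1, "lol": 0.9}

# Contextual metadata (say, the user dictating into a search box) might
# weight the search corpus more heavily.
lm = interpolate([search_lm, sms_lm], [0.8, 0.2])
print(round(lm["weather"], 2))  # 0.8*0.6 + 0.2*0.1 = 0.5
```

In a real system the base models would be n-gram or neural models over distinct textual corpora, and the weights would be learned or looked up from the device's contextual metadata rather than hard-coded.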
20110208522 | METHOD AND APPARATUS FOR DETECTION OF SENTIMENT IN AUTOMATED TRANSCRIPTIONS - A method and apparatus for automatically detecting sentiment in interactions. The method and apparatus include a training phase, in which a model is generated from features extracted from training interactions and tagging information, and a run-time phase, in which the model is used to detect sentiment in further interactions. | 08-25-2011 |
20110208523 | VOICE-TO-DACTYLOLOGY CONVERSION METHOD AND SYSTEM - A voice-to-dactylology conversion method and system include primarily a transmitting communication device operating in collaboration with a receiving communication device. An ordinary user can use the transmitting communication device to send a voice message to the receiving communication device. At this time, the receiving communication device converts the voice message into a corresponding dactylology image message and displays the dactylology image message in motion pictures on a screen of the receiving communication device, allowing a deaf-mute to understand a message to be expressed by the other party. On the other hand, the deaf-mute can use the receiving communication device to select images to be expressed, arrange and combine the images, followed by converting them into the voice message to be sent to the transmitting communication device. As a result, the communication method of the deaf-mute can be improved significantly. | 08-25-2011 |
20110213613 | Automatic Language Model Update - A method for generating a speech recognition model includes accessing a baseline speech recognition model, obtaining information related to recent language usage from search queries, and modifying the speech recognition model to revise probabilities of a portion of a sound occurrence based on the information. The portion of a sound may include a word. Also, a method for generating a speech recognition model, includes receiving at a search engine from a remote device an audio recording and a transcript that substantially represents at least a portion of the audio recording, synchronizing the transcript with the audio recording, extracting one or more letters from the transcript and extracting the associated pronunciation of the one or more letters from the audio recording, and generating a dictionary entry in a pronunciation dictionary. | 09-01-2011 |
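The first method in the entry above revises word probabilities in a baseline model based on recent search-query usage. A pure-Python sketch of one plausible way to do that, blending baseline probabilities with frequencies counted from recent queries; the blend factor and words are invented for illustration, not taken from the patent:

```python
# Hypothetical sketch: shift baseline word probabilities toward
# frequencies observed in recent search queries.  "blend" controls how
# strongly recent usage overrides the baseline (an assumed mechanism).
from collections import Counter

def update_model(baseline, queries, blend=0.3):
    """Return baseline P(word) mixed with recent-query frequencies."""
    counts = Counter(word for q in queries for word in q.split())
    total = sum(counts.values())
    updated = {}
    for word in set(baseline) | set(counts):
        recent_p = counts[word] / total if total else 0.0
        updated[word] = (1 - blend) * baseline.get(word, 0.0) + blend * recent_p
    return updated

# A trending query term ("solstice") gains probability mass even if the
# baseline model had never seen it.
model = update_model({"weather": 0.7, "pizza": 0.3},
                     ["solstice time", "solstice today"])
```

The second method in the entry (aligning an audio recording with its transcript to mint pronunciation-dictionary entries) would sit downstream of an ASR forced aligner and is not sketched here.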
20110224981 | DYNAMIC SPEECH RECOGNITION AND TRANSCRIPTION AMONG USERS HAVING HETEROGENEOUS PROTOCOLS - A system is disclosed for facilitating free form dictation, including directed dictation and constrained recognition and/or structured transcription among users having heterogeneous native (legacy) protocols for generating, transcribing, and exchanging recognized and transcribed speech. The system includes at least one system transaction manager having a “system protocol,” to receive a verified, streamed speech information request from at least one authorized user employing a first legacy user protocol. The speech information request which includes spoken text and system commands is generated using a user interface capable of bi-directional communication with the system transaction manager and supporting dictation applications, including prompts to direct user dictation in response to user system protocol commands and systems transaction manager commands. A speech recognition and/or transcription engine (ASR), in communication with the systems transaction manager, receives the speech information request from the system transaction manager, generates a transcribed response, which can include a formatted transcription, and transmits the response to the system transaction manager. The system transaction manager routes the response to one or more of the users employing a second protocol, which may be the same as or different than the first protocol. In another embodiment, the system employs a virtual sound driver for streaming free form dictation to any ASR, regardless of the ASR's ability to recognize and/or transcribe spoken text from any input source such as, for example, a live microphone or line input. In another embodiment, the system employs a buffer to facilitate the system's use of ASRs requiring input data to be in batches, while providing the user with an uninterrupted, seamless dictating experience. | 09-15-2011 |
20110246194 | Indicia to indicate a dictation application is capable of receiving audio - A client station having access to an application is provided. The application has at least one indicia having a first configuration and a second configuration different from the first configuration. The second configuration indicates that the application is able to accept input. | 10-06-2011 |
20110246195 | HIERARCHICAL QUICK NOTE TO ALLOW DICTATED CODE PHRASES TO BE TRANSCRIBED TO STANDARD CLAUSES - A dictation system that allows using trainable code phrases is provided. The dictation system operates by receiving audio and recognizing the audio as text. The text/audio may contain code phrases that are identified by a comparator that matches the text/audio and replaces the code phrase with a standard clause that is associated with the code phrase. The database or memory containing the code phrases is loaded with matched standard clauses that may be identified to provide a hierarchical system such that certain code phrases may have multiple meanings depending on the user. | 10-06-2011 |
20110246196 | INTEGRATED VOICE BIOMETRICS CLOUD SECURITY GATEWAY - A triple factor authentication in one step method and system is disclosed. According to one embodiment, an Integrated Voice Biometrics Cloud Security Gateway (IVCS Gateway) intercepts an access request to a resource server from a user using a user device. IVCS Gateway then authenticates the user by placing a call to the user device and sending a challenge message prompting the user to respond by voice. After receiving the voice sample of the user, the voice sample is compared against a stored voice biometrics record for the user. The voice sample is also converted into a text phrase and compared against a stored secret text phrase. In an alternative embodiment, an IVCS Gateway that is capable of making non-binary access decisions and associating multiple levels of access with a single user or group is described. | 10-06-2011 |
20110246197 | METHOD, APPARATUS, AND PROGRAM FOR CERTIFYING A VOICE PROFILE WHEN TRANSMITTING TEXT MESSAGES FOR SYNTHESIZED SPEECH - A mechanism is provided for authenticating and using a personal voice profile. The voice profile may be issued by a trusted third party, such as a certification authority. The personal voice profile may include information for generating a digest or digital signature for text messages. A speech synthesis system may speak the text message using the voice characteristics, such as prosodic characteristics, only if the voice profile is authenticated and the text message is valid and free of tampering. | 10-06-2011 |
20110251843 | COMPENSATION OF INTRA-SPEAKER VARIABILITY IN SPEAKER DIARIZATION - A method, system, and computer program product compensation of intra-speaker variability in speaker diarization are provided. The method includes: dividing a speech session into segments of duration less than an average duration between speaker change; parameterizing each segment by a time dependent probability density function supervector, for example, using a Gaussian Mixture Model; computing a difference between successive segment supervectors; and computing a scatter measure such as a covariance matrix of the difference as an estimate of intra-speaker variability. The method further includes compensating the speech session for intra-speaker variability using the estimate of intra-speaker variability. | 10-13-2011 |
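The entry above lists concrete steps: parameterize short segments as supervectors, difference successive supervectors (assumed to share a speaker, since segment duration is below the average speaker-turn length), and take a scatter measure of the differences as the intra-speaker variability estimate. A pure-Python sketch of the covariance-of-differences step, with toy vectors standing in for real GMM-derived supervectors:

```python
# Minimal sketch of the covariance-of-successive-differences estimate
# described above.  Real supervectors would come from a Gaussian Mixture
# Model fit per segment; these lists are toy stand-ins.
from statistics import mean

def diff_covariance(supervectors):
    """Covariance matrix of differences between successive supervectors."""
    diffs = [[b - a for a, b in zip(u, v)]
             for u, v in zip(supervectors, supervectors[1:])]
    dim = len(diffs[0])
    mu = [mean(col) for col in zip(*diffs)]
    n = len(diffs)
    return [[sum((d[i] - mu[i]) * (d[j] - mu[j]) for d in diffs) / (n - 1)
             for j in range(dim)]
            for i in range(dim)]

# Four toy 2-dimensional segment supervectors -> a 2x2 scatter estimate.
cov = diff_covariance([[0.0, 0.0], [1.0, 2.0], [0.0, 0.0], [1.0, 2.0]])
print(len(cov), len(cov[0]))  # 2 2
```

The resulting matrix could then be used to whiten or project the supervector space before clustering, compensating the session for intra-speaker variability as the abstract describes.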
20110257972 | SYSTEM AND METHOD FOR LOCATION TRACKING USING AUDIO INPUT - An electronic device and method of location tracking adapted to enhance a user's ability to recall or return to a former location. The electronic device may record audio, such as the user's speech and/or speech from others. The location at which the speech is recorded is determined and stored. The speech may be converted to text, which is associated with the determined location. The converted text may be indexed for searching. A user may perform a text-based search for words that the user may recall speaking and/or hearing at the location. Returned search results may remind the user of the location and provide directions for returning to the location. | 10-20-2011 |
20110257973 | VEHICLE USER INTERFACE SYSTEMS AND METHODS - A control system for mounting in a vehicle and for providing information to a portable electronic device for processing by the portable electronic device is shown and described. The control system includes a first interface for communicating with the portable electronic device and a memory device. The control system also includes a processing circuit communicably coupled to the first interface and the memory device, the processing circuit configured to extract information from the memory device and to provide the information to the first interface so that the first interface communicates the information to the portable electronic device. The processing circuit is further configured to determine the capabilities of the portable electronic device based on data received from the portable electronic device via the first interface and to determine whether or not to communicate the information to the portable electronic device based on the determined capabilities. | 10-20-2011 |
20110264451 | METHODS AND SYSTEMS FOR TRAINING DICTATION-BASED SPEECH-TO-TEXT SYSTEMS USING RECORDED SAMPLES - A method and apparatus useful to train speech recognition engines is provided. Many of today's speech recognition engines require training for particular individuals to accurately convert speech to text. The training requires the use of significant resources for certain applications. To alleviate the resources, a trainer is provided with the text transcription and the audio file. The trainer updates the text based on the audio file. The changes are provided to the speech recognition engine to train it and update the user profile. In certain aspects, the training is reversible as it is possible to over train the system such that the trained system is actually less proficient. | 10-27-2011 |
20110270609 | REAL-TIME SPEECH-TO-TEXT CONVERSION IN AN AUDIO CONFERENCE SESSION - Various embodiments of systems, methods, and computer programs are disclosed for providing real-time resources to participants in an audio conference session. One embodiment is a method for providing real-time resources to participants in an audio conference session via a communication network. One such method comprises: a conferencing system establishing an audio conference session between a plurality of computing devices via a communication network, each computing device generating a corresponding audio stream comprising a speech signal; and in real-time during the audio conference session, a server: receiving and processing the audio streams to determine the speech signals; extracting words from the speech signals; analyzing the extracted words to determine a relevant keyword being discussed in the audio conference session; identifying a resource related to the relevant keyword; and providing the resource to one or more of the computing devices. | 11-03-2011 |
20110276325 | Training A Transcription System - According to certain embodiments, training a transcription system includes accessing recorded voice data of a user from one or more sources. The recorded voice data comprises voice samples. A transcript of the recorded voice data is accessed. The transcript comprises text representing one or more words of each voice sample. The transcript and the recorded voice data are provided to a transcription system to generate a voice profile for the user. The voice profile comprises information used to convert a voice sample to corresponding text. | 11-10-2011 |
20110276326 | METHOD AND SYSTEM FOR OPERATIONAL IMPROVEMENTS IN DISPATCH CONSOLE SYSTEMS IN A MULTI-SOURCE ENVIRONMENT - A method and system for operational improvements in a dispatch console in a multi-source environment includes receiving | 11-10-2011 |
20110276327 | VOICE-TO-EXPRESSIVE TEXT - A method including receiving a vocal input including words spoken by a user; determining vocal characteristics associated with the vocal input; mapping the vocal characteristics to textual characteristics; and generating a voice-to-expressive text that includes, in addition to text corresponding to the words spoken by the user, a textual representation of the vocal characteristics based on the mapping. | 11-10-2011 |
20110276328 | APPLICATION SERVER FOR REDUCING AMBIANCE NOISE IN AN AUSCULTATION SIGNAL, AND FOR RECORDING COMMENTS WHILE AUSCULTATING A PATIENT WITH AN ELECTRONIC STETHOSCOPE - An application server for reducing ambiance noise in an auscultation signal, and for recording comments while auscultating a patient with an electronic stethoscope. This application server (AS) comprises: means (SPH) for receiving samples of a raw auscultation signal representing auscultation sounds mixed with ambiance sounds, this raw auscultation signal being transmitted by a first microphone | 11-10-2011 |
20110282664 | METHOD AND SYSTEM FOR ASSISTING INPUT OF TEXT INFORMATION FROM VOICE DATA - Methods and systems for providing services and/or computing resources are provided. A method may include converting voice data into text data and tagging at least one portion of the text data in the text conversion file with at least one tag, the at least one tag indicating that the at least one portion of the text data includes a particular type of data. The method may also include displaying the text data on a display such that the at least one portion of text data is displayed with at least one associated graphical element indicating that the at least one portion of text data is associated with the at least one tag. The at least one portion of text data may be a selectable item on the display allowing a user interfacing with the display to select the at least one portion of text data in order to apply the at least one portion of text data to an application. | 11-17-2011 |
20110288861 | Audio Synchronization For Document Narration with User-Selected Playback - Disclosed are techniques and systems to provide a narration of a text. In some aspects, the techniques and systems described herein include generating a timing file that includes elapsed time information for expected portions of text that provides an elapsed time period from a reference time in an audio recording to each portion of text in recognized portions of text. | 11-24-2011 |
20110288862 | Methods and Systems for Performing Synchronization of Audio with Corresponding Textual Transcriptions and Determining Confidence Values of the Synchronization - Methods and systems for performing audio synchronization with corresponding textual transcription and determining confidence values of the timing-synchronization are provided. Audio and a corresponding text (e.g., transcript) may be synchronized in a forward and reverse direction using speech recognition to output a time-annotated audio-lyrics synchronized data. Metrics can be computed to quantify and/or qualify a confidence of the synchronization. Based on the metrics, example embodiments describe methods for enhancing an automated synchronization process to possibly adapted Hidden Markov Models (HMMs) to the synchronized audio for use during the speech recognition. Other examples describe methods for selecting an appropriate HMM for use. | 11-24-2011 |
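The entry above synchronizes audio with its transcript in both forward and reverse directions and computes metrics to quantify confidence in the result. One plausible such metric, sketched with invented timestamps (the patent does not disclose this exact formula): score each word by how closely the two alignment passes agree on its start time.

```python
# Hypothetical confidence metric for audio-text synchronization:
# the fraction of words whose forward-pass and reverse-pass alignments
# agree within a tolerance.  Timestamps are invented; real ones would
# come from a speech-recognition forced aligner run in each direction.

def sync_confidence(forward_starts, reverse_starts, tolerance=0.5):
    """Fraction of words whose two alignments agree within `tolerance` s."""
    agree = sum(1 for f, r in zip(forward_starts, reverse_starts)
                if abs(f - r) <= tolerance)
    return agree / len(forward_starts)

# Three words: the first two alignments agree, the third diverges badly,
# so confidence is 2/3 and that word might be flagged for re-alignment.
score = sync_confidence([0.0, 1.2, 2.5], [0.1, 1.1, 4.0])
```

A low score could then trigger the adaptation step the abstract mentions, such as adapting the Hidden Markov Models to the audio and re-running the alignment.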
20110288863 | VOICE STREAM AUGMENTED NOTE TAKING - Voice stream augmented note taking may be provided. An audio stream associated with at least one speaker may be recorded and converted into text chunks. A text entry may be received from a user, such as in an electronic document. The text entry may be compared to the text chunks to identify matches, and the matching text chunks may be displayed to the user for selection. | 11-24-2011 |
20110301951 | ELECTRONIC QUESTIONNAIRE - A questionnaire is presented to a user in a more efficient manner in which the user is more likely to participate. The questionnaire is sent electronically to the user's vehicle and presented audibly to the user. The user responds audibly to the questions in the questionnaire. The user's responses are converted to text and sent back to the provider server for tallying. | 12-08-2011 |
20110301952 | SPEECH RECOGNITION PROCESSING SYSTEM AND SPEECH RECOGNITION PROCESSING METHOD - The present invention provides a speech recognition processing system in which speech recognition processing is executed in parallel by plural speech recognizing units. Before text data as the speech recognition result is output from each of the speech recognizing units, information indicating each speaker is displayed in parallel on a display in the order in which each speech was uttered. When the text data is output from each of the speech recognizing units, the text data is associated with the information indicating each speaker and the text data is displayed. | 12-08-2011 |
20110307254 | SPEECH RECOGNITION INVOLVING A MOBILE DEVICE - A system and method of speech recognition involving a mobile device. Speech input is received | 12-15-2011 |
20110307255 | System and Method for Conversion of Speech to Displayed Media Data - A method for instantaneous, real-time conversion of sound into media data, with the ability to project, print, copy, or manipulate such media data. The invention relates to a method for converting speech to a text string, recognizing the text string, and then displaying the media data that corresponds with the text string. | 12-15-2011 |
20110313764 | System and Method for Latency Reduction for Automatic Speech Recognition Using Partial Multi-Pass Results - A system and method is provided for reducing latency for automatic speech recognition. In one embodiment, intermediate results produced by multiple search passes are used to update a display of transcribed text. | 12-22-2011 |
20110320197 | METHOD FOR INDEXING MULTIMEDIA INFORMATION - It comprises analyzing audio content of multimedia files and performing a speech to text transcription thereof automatically by means of an ASR process, and selecting acoustic and language models adapted for the ASR process at least before the latter processes the multimedia file, i.e. “a priori”. | 12-29-2011 |
20110320198 | INTERACTIVE ENVIRONMENT FOR PERFORMING ARTS SCRIPTS - One or more embodiments present a script to a user in an interactive script environment. A digital representation of a manuscript is analyzed. This digital representation includes a set of roles and a set of information associated with each role in the set of roles. An active role in the set of roles that is associated with a given user is identified based on the analyzing. At least a portion of the manuscript is presented to the given user via a user interface. The portion includes at least a subset of information in the set of information. Information within the set of information that is associated with the active role is presented in a visually different manner than information within the set of information that is associated with a non-active role, which is a role that is associated with a user other than the given user. | 12-29-2011 |
20110320199 | METHOD AND APPARATUS FOR FUSING VOICED PHONEME UNITS IN TEXT-TO-SPEECH - According to one embodiment, an apparatus for fusing voiced phoneme units in Text-To-Speech, includes a reference unit selection module configured to select a reference unit from the plurality of units based on pitch cycle information of each unit and the number of pitch cycles of the target segment. The apparatus includes a template creation module configured to create a template based on the reference unit selected by the reference unit selection module and the number of pitch cycles of the target segment, wherein the number of pitch cycles of the template is the same as that of the target segment. The apparatus includes a pitch cycle alignment module configured to align pitch cycles of each unit of the plurality of units except the reference unit with pitch cycles of the template by using a dynamic programming algorithm. | 12-29-2011 |
20120004910 | System and method for speech processing and speech to text - Systems and method for processing speech from a user is disclosed. In the system of the present invention, the user's speech is received as an input audio stream. The input audio stream is converted to text that corresponds to the input audio stream. The converted text is converted to an echo audio stream. Then, the echo audio stream is sent to the user. This process is performed in real time. Accordingly, the user is able to determine whether or not the speech to text process was correct, or that his or her speech was correctly converted to text. If the conversion was incorrect, the user is able to correct the conversion process by using editing commands. The corresponding text is then analyzed to determine the operation which it demands. Then, the operation is performed on the corresponding text. | 01-05-2012 |
20120004911 | Method and Apparatus for Identifying Video Program Material or Content via Nonlinear Transformations - A system for identification of video content in a video signal is provided via a sound track audio signal. The audio signal is processed with filtering and nonlinear transformations to extract voice signals from the sound track channel. The extracted voice signals are coupled to a speech recognition system to provide, in text form, the words of the video content, which are later compared with a reference library of words or dialog from known video programs or movies. Other attributes of the video signal or transport stream may be combined with closed caption data or closed caption text for identification purposes. Example attributes include DVS/SAP information, time code information, histograms, and or rendered video or pictures. | 01-05-2012 |
20120010883 | TRANSCRIPTION DATA EXTRACTION - A computer program product, for performing data determination from medical record transcriptions, resides on a computer-readable medium and includes computer-readable instructions for causing a computer to obtain a medical transcription of a dictation, the dictation being from medical personnel and concerning a patient, analyze the transcription for an indicating phrase associated with a type of data desired to be determined from the transcription, the type of desired data being relevant to medical records, determine whether data indicated by text disposed proximately to the indicating phrase is of the desired type, and store an indication of the data if the data is of the desired type. | 01-12-2012 |
20120016671 | Tool and method for enhanced human machine collaboration for rapid and accurate transcriptions - A system and methods for transcribing text from audio and video files including a set of transcription hosts and an automatic speech recognition system. ASR word-lattices are dynamically selected from either a text box or word-lattice graph wherein the most probable text sequences are presented to the transcriptionist. Secure transcriptions may be accomplished by segmenting a digital audio file into a set of audio slices for transcription by a plurality of transcriptionists. No one transcriptionist is aware of the final transcribed text, only small portions of transcribed text. Secure and high quality transcriptions may be accomplished by segmenting a digital audio file into a set of audio slices, sending them serially to a set of transcriptionists and updating the acoustic and language models at each step to improve the word-lattice accuracy. | 01-19-2012 |
20120022865 | System and Method for Efficiently Reducing Transcription Error Using Hybrid Voice Transcription - A system and method for efficiently reducing transcription error using hybrid voice transcription is provided. A voice stream is parsed from a call into utterances. An initial transcribed value and corresponding recognition score are assigned to each utterance. A transcribed message is generated for the call and includes the initial transcribed values. A threshold is applied to the recognition scores to identify those utterances with recognition scores below the threshold as questionable utterances. At least one questionable utterance is compared to other questionable utterances from other calls and a group of similar questionable utterances is formed. One or more of the similar questionable utterances is selected from the group. A common manual transcription value is received for the selected similar questionable utterances. The common manual transcription value is assigned to the remaining similar questionable utterances in the group. | 01-26-2012 |
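The threshold-and-group flow described in 20120022865 above might be sketched as follows. This is a rough illustration under assumed data shapes, not the patented implementation; the utterance fields and the "identical initial transcribed value" similarity criterion are assumptions for the sketch.

```python
# Hybrid voice transcription sketch: flag low-confidence utterances,
# group similar ones across calls, and propagate one manual value.

def hybrid_transcribe(utterances, threshold=0.75):
    """utterances: list of dicts with 'text' (initial transcribed value)
    and 'score' (recognition score). Returns (message, questionable)."""
    message = [u["text"] for u in utterances]
    questionable = [u for u in utterances if u["score"] < threshold]
    return message, questionable

def group_similar(questionable):
    """Group questionable utterances; here 'similar' simply means an
    identical initial transcribed value (an assumed criterion)."""
    groups = {}
    for u in questionable:
        groups.setdefault(u["text"], []).append(u)
    return groups

def apply_manual_value(group, manual_text):
    """Assign one common manual transcription to every group member."""
    for u in group:
        u["text"] = manual_text
```

The payoff of the grouping step is that a single human transcription can correct many low-confidence utterances at once, instead of one manual pass per utterance.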
20120022866 | Language Model Selection for Speech-to-Text Conversion - Methods, computer program products and systems are described for converting speech to text. Sound information is received at a computer server system from an electronic device, where the sound information is from a user of the electronic device. A context identifier indicates a context within which the user provided the sound information. The context identifier is used to select, from among multiple language models, a language model appropriate for the context. Speech in the sound information is converted to text using the selected language model. The text is provided for use by the electronic device. | 01-26-2012 |
20120022867 | Speech to Text Conversion - Methods, computer program products and systems are described for speech-to-text conversion. A voice input is received from a user of an electronic device and contextual metadata is received that describes a context of the electronic device at a time when the voice input is received. Multiple base language models are identified, where each base language model corresponds to a distinct textual corpus of content. Using the contextual metadata, an interpolated language model is generated based on contributions from the base language models. The contributions are weighted according to a weighting for each of the base language models. The interpolated language model is used to convert the received voice input to a textual output. The voice input is received at a computer server system that is remote to the electronic device. The textual output is transmitted to the electronic device. | 01-26-2012 |
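The weighted interpolation in 20120022866 above can be illustrated at the unigram level. This is a minimal sketch: real systems interpolate full n-gram models, and the weights would be derived from the contextual metadata rather than passed in directly as they are here.

```python
# Linear interpolation of base language models (unigram sketch).
# Each base model corresponds to a distinct corpus; the weights
# (assumed given) reflect how relevant each corpus is to the context.

def interpolate(base_models, weights):
    """base_models: list of dicts mapping word -> probability.
    weights: one weight per model, summing to 1."""
    vocab = set()
    for model in base_models:
        vocab.update(model)
    return {w: sum(wt * m.get(w, 0.0) for m, wt in zip(base_models, weights))
            for w in vocab}
```

With a casual-SMS corpus weighted heavily, colloquial words end up more probable in the interpolated model than formal ones, which is the behavior the abstract describes.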
20120022868 | Word-Level Correction of Speech Input - The subject matter of this specification can be implemented in, among other things, a computer-implemented method for correcting words in transcribed text including receiving speech audio data from a microphone. The method further includes sending the speech audio data to a transcription system. The method further includes receiving a word lattice transcribed from the speech audio data by the transcription system. The method further includes presenting one or more transcribed words from the word lattice. The method further includes receiving a user selection of at least one of the presented transcribed words. The method further includes presenting one or more alternate words from the word lattice for the selected transcribed word. The method further includes receiving a user selection of at least one of the alternate words. The method further includes replacing the selected transcribed word in the presented transcribed words with the selected alternate word. | 01-26-2012 |
20120029917 | APPARATUS AND METHOD FOR PROVIDING MESSAGES IN A SOCIAL NETWORK - A system that incorporates teachings of the present disclosure may include, for example, a server including a controller to receive audio signals and content identification information from a media processor, generate text representing a voice message based on the audio signals, determine an identity of media content based on the content identification information, generate an enhanced message having text and additional content where the additional content is obtained by the controller based on the identity of the media content, and transmit the enhanced message to the media processor for presentation on the display device, where the enhanced message is accessible by one or more communication devices that are associated with a social network and remote from the media processor. Other embodiments are disclosed. | 02-02-2012 |
20120029918 | SYSTEMS AND METHODS FOR RECORDING, SEARCHING, AND SHARING SPOKEN CONTENT IN MEDIA FILES - Systems for recording, searching for, and sharing media files among a plurality of users are disclosed. The systems include a server that is configured to receive, index, and store a plurality of media files, which are received by the server from a plurality of sources, within at least one database in communication with the server. In addition, the server is configured to make one or more of the media files accessible to one or more persons—other than the original sources of such media files. Still further, the server is configured to transcribe the media files into text; receive and publish comments associated with the media files within a graphical user interface of a website; and allow users to query and playback excerpted portions of such media files. | 02-02-2012 |
20120035923 | IN-VEHICLE TEXT MESSAGING EXPERIENCE ENGINE - The disclosed invention provides a system and apparatus for providing a telematics system user with an improved texting experience. A messaging experience engine database enables voice avatar/personality selection, acronym conversion, shorthand conversion, and custom audio and video mapping. As an interpreter of the messaging content that is passed through the telematics system, the system eliminates the need for a user to manually manipulate a texting device, or to read such a device. The system recognizes functional content and executes actions based on the identified functional content. | 02-09-2012 |
20120035924 | DISAMBIGUATING INPUT BASED ON CONTEXT - In one implementation, a computer-implemented method includes receiving, at a mobile computing device, ambiguous user input that indicates more than one of a plurality of commands; and determining a current context associated with the mobile computing device that indicates where the mobile computing device is currently located. The method can further include disambiguating the ambiguous user input by selecting a command from the plurality of commands based on the current context associated with the mobile computing device; and causing output associated with performance of the selected command to be provided by the mobile computing device. | 02-09-2012 |
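The context-based selection in 20120035924 above reduces, in its simplest form, to a two-level lookup. The command table and context names below are invented for illustration; the actual disclosure covers richer context signals than location alone.

```python
# Disambiguating an ambiguous input that matches several commands by
# using the device's current context (sketch with an assumed table).

COMMAND_TABLE = {
    "directions": {              # one ambiguous user input
        "in_car": "launch_turn_by_turn_navigation",
        "walking": "launch_pedestrian_map",
    },
}

def disambiguate(ambiguous_input, current_context):
    """Select the command matching both the input and the context;
    returns None when the context does not resolve the ambiguity."""
    candidates = COMMAND_TABLE.get(ambiguous_input, {})
    return candidates.get(current_context)
```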
20120035925 | Population of Lists and Tasks from Captured Voice and Audio Content - Automatic capture and population of task and list items in an electronic task or list surface via voice or audio input through an audio recording-capable mobile computing device is provided. A voice or audio task or list item may be captured for entry into a task application interface or into a list authoring surface interface for subsequent use as task items, reminders, “to do” items, list items, agenda items, work organization outlines, and the like. Captured voice or audio content may be transcribed locally or remotely, and transcribed content may be populated into a task or list authoring surface user interface that may be displayed on the capturing device (e.g., mobile telephone), or that may be stored remotely and subsequently displayed in association with a number of applications on a number of different computing devices. | 02-09-2012 |
20120035926 | Platform for Enabling Voice Commands to Resolve Phoneme Based Domain Name Registrations - A method, apparatus, and system are directed towards employing machine representations of phonemes to generate and manage domain names, and/or messaging addresses. A user of a computing device may provide an audio input signal such as obtained from human language sounds. The audio input signal is received at a phoneme encoder that converts the sounds into machine representations of the sounds using a phoneme representation viewable as a sequence of alpha-numeric values. The sequence of alpha-numeric values may then be combined with a host name, or the like, to generate a URI, a message address, or the like. The generated URI, message address, or the like, may then be used to communicate over a network. | 02-09-2012 |
20120059651 | MOBILE COMMUNICATION DEVICE FOR TRANSCRIBING A MULTI-PARTY CONVERSATION - A mobile communications device includes a network interface for communicating over a wide-area network, an input/output interface for communicating over a PAN, and a display. The communication device also includes one or more processors for executing machine-executable instructions and one or more machine-readable storage media for storing the machine-executable instructions. The instructions, when executed by the one or more processors, implement a voice proximity component, a speech-to-text component and a user interface. The voice proximity component is configured to select a first user's voice from among a plurality of user voices. The first user voice belongs to a user who is in closest proximity to the mobile communication device. The speech-to-text component is configured to convert to text, in real time, speech received from the first user but not the other users. The user interface is arranged for displaying the text on the display as it is received over the PAN from the other mobile communication devices. | 03-08-2012 |
20120059652 | METHODS AND SYSTEMS FOR OBTAINING LANGUAGE MODELS FOR TRANSCRIBING COMMUNICATIONS - A method for transcribing a spoken communication includes acts of receiving a spoken first communication from a first sender to a first recipient, obtaining information relating to a second communication, which is different from the first communication, from a second sender to a second recipient, using the obtained information to obtain a language model, and using the language model to transcribe the spoken first communication. | 03-08-2012 |
20120065969 | System and Method for Contextual Social Network Communications During Phone Conversation - An embodiment of the invention includes methods and systems for contextual social network communications during a phone conversation. A telephone conversation between a first user and at least one second user is monitored. More specifically, a monitor identifies terms spoken by the first user and the second user during the telephone conversation. The terms spoken are translated into textual keywords by a translating module. One or more of the second user's web applications are searched by a processor for portion(s) of the second user's web applications that include at least one of the keywords. The processor also searches one or more of the first user's web applications for portion(s) of the first user's web applications that include at least one of the keywords. The portion(s) of the second user's web applications and the portion(s) of the first user's web applications are displayed to the first user during the telephone conversation. | 03-15-2012 |
20120065970 | SYSTEM AND METHOD FOR PROVIDING GROUP DISCUSSIONS - A system and method for providing a discussion, including receiving by a processor text related to a discussion; converting by the processor the text to voice; storing by the processor in a memory the converted voice; receiving by the processor voice related to the discussion; storing by the processor in the memory the received voice; receiving by the processor a request to play voice related to at least part of the discussion; and transmitting by the processor audio containing the voice identified by the request related to the at least part of the discussion. | 03-15-2012 |
20120065971 | VOICE CONTROL OF MULTIMEDIA AND COMMUNICATIONS DEVICES - A method for operating a communications device can include receiving a plurality of spoken commands uttered by a user, the plurality of spoken commands comprising a custom written communication message to be displayed. The method can also include executing a speech recognition engine to recognize and convert each of the spoken commands into corresponding electronic signals that selectively enable and operatively control each of a plurality of multimedia units and at least one light array, wherein the electronic signals are configured to cause multiple light units of the light array to be selectively activated and display the custom written communication message. The method can further include transmitting audio signals received from different ones of the plurality of multimedia units to a radio via a preset open radio channel for broadcasting the audio signals through at least one speaker connected to the radio. | 03-15-2012 |
20120078626 | SYSTEMS AND METHODS FOR CONVERTING SPEECH IN MULTIMEDIA CONTENT TO TEXT - Methods and systems for converting speech to text are disclosed. One method includes analyzing multimedia content to determine the presence of closed captioning data. The method includes, upon detecting closed captioning data, indexing the closed captioning data as associated with the multimedia content. The method also includes, upon failure to detect closed captioning data in the multimedia content, extracting audio data from multimedia content, the audio data including speech data, performing a plurality of speech to text conversions on the speech data to create a plurality of transcripts of the speech data, selecting text from one or more of the plurality of transcripts to form an amalgamated transcript, and indexing the amalgamated transcript as associated with the multimedia content. | 03-29-2012 |
20120078627 | ELECTRONIC DEVICE WITH TEXT ERROR CORRECTION BASED ON VOICE RECOGNITION DATA - During operation of an electronic device such as a cellular telephone with a touch screen display or other electronic equipment, a voice recognition engine may gather data on spoken words. Data on the spoken words that are recognized may be maintained in a spoken word database maintained by an input processor with an autocorrection engine. A user may supply text input that contains mistyped words to the electronic device using the touch screen or a keyboard. The input processor may use the autocorrection engine to automatically replace mistyped words with corrected versions of the mistyped words. The corrected words may be displayed in real time as the user supplies the text input. The autocorrection engine may make word correction decisions based at least partly on information in the spoken word database. | 03-29-2012 |
20120078628 | HEAD-MOUNTED TEXT DISPLAY SYSTEM AND METHOD FOR THE HEARING IMPAIRED - The head-mounted text display system for the hearing impaired is a speech-to-text system, in which spoken words are converted into a visual textual display and displayed to the user in passages containing a selected number of words. The system includes a head-mounted visual display, such as eyeglass-type dual liquid crystal displays or the like, and a controller. The controller includes an audio receiver, such as a microphone or the like, for receiving spoken language and converting the spoken language into electrical signals. The controller further includes a speech-to-text module for converting the electrical signals representative of the spoken language to a textual data signal representative of individual words. A transmitter associated with the controller transmits the textual data signal to a receiver associated with the head-mounted display. The textual data is then displayed to the user in passages containing a selected number of individual words. | 03-29-2012 |
20120078629 | MEETING SUPPORT APPARATUS, METHOD AND PROGRAM - According to one embodiment, a meeting support apparatus includes a storage unit, a determination unit, and a generation unit. The storage unit is configured to store storage information for each of a plurality of words, the storage information indicating a word of the words, pronunciation information on the word, and a pronunciation recognition frequency. The determination unit is configured to generate emphasis determination information including an emphasis level that represents whether a first word should be highlighted and represents a degree of highlighting determined in accordance with a pronunciation recognition frequency of a second word when the first word is highlighted, based on whether the storage information includes a second set corresponding to a first set, and based on the pronunciation recognition frequency of the second word when the second set is included. The generation unit is configured to generate an emphasis character string based on the emphasis determination information when the first word is highlighted. | 03-29-2012 |
20120084086 | SYSTEM AND METHOD FOR OPEN SPEECH RECOGNITION - Disclosed herein are systems, methods and non-transitory computer-readable media for performing speech recognition across different applications or environments without model customization or prior knowledge of the domain of the received speech. The disclosure includes recognizing received speech with a collection of domain-specific speech recognizers, determining a speech recognition confidence for each of the speech recognition outputs, selecting speech recognition candidates based on a respective speech recognition confidence for each speech recognition output, and combining selected speech recognition candidates to generate text based on the combination. | 04-05-2012 |
20120089394 | Visual Display of Semantic Information - Techniques involving visual display of information related to matching user utterances against graph patterns are described. In one or more implementations, an utterance of a user is obtained that has been indicated as corresponding to a graph pattern through linguistic analysis. The utterance is displayed in a user interface as a representation of the graph pattern. | 04-12-2012 |
20120089395 | SYSTEM AND METHOD FOR NEAR REAL-TIME IDENTIFICATION AND DEFINITION QUERY - A method of operating a communication system includes generating a transcript of at least a portion of a conversation between a plurality of users. The transcript includes a plurality of subsets of characters. The method further includes displaying the transcript on a plurality of communication devices, identifying an occurrence of at least one selected subset of characters from the plurality of subsets of characters, and querying a definition source for at least one definition for the selected subset of characters. The definition for the selected subset of characters is displayed on the plurality of communication devices. | 04-12-2012 |
20120109648 | Speech Morphing Communication System - A communication system is described. The communication system including an automatic speech recognizer configured to receive a speech signal and to convert the speech signal into a text sequence. The communication system also including a speech analyzer configured to receive the speech signal. The speech analyzer configured to extract paralinguistic characteristics from the speech signal. In addition, the communication system includes a speech output device coupled with the automatic speech recognizer and the speech analyzer. The speech output device configured to convert the text sequence into an output speech signal based on the extracted paralinguistic characteristics. | 05-03-2012 |
20120116761 | Minimum Converted Trajectory Error (MCTE) Audio-to-Video Engine - Embodiments of an audio-to-video engine are disclosed. In operation, the audio-to-video engine generates facial movement (e.g., a virtual talking head) based on an input speech. The audio-to-video engine receives the input speech and recognizes the input speech as a source feature vector. The audio-to-video engine then determines a Maximum A Posterior (MAP) mixture sequence based on the source feature vector. The MAP mixture sequence may be a function of a refined Gaussian Mixture Model (GMM). The audio-to-video engine may then use the MAP to estimate video feature parameters. The video feature parameters are then interpreted as facial movement. The facial movement may be stored as data to a storage module and/or it may be displayed as video to a display device. | 05-10-2012 |
20120123778 | Security Control for SMS and MMS Support Using Unified Messaging System - A method and apparatus for providing security control of short messaging service (SMS) messages and multimedia messaging service (MMS) messages in a unified messaging (UM) system are disclosed. An SMS or MMS message directed to a recipient mailbox in a UM system is received. It is determined that the recipient mailbox is a secondary mailbox associated with a primary mailbox in the UM system. The message is audited according to an audit policy associated with the recipient mailbox. | 05-17-2012 |
20120123779 | MOBILE DEVICES, METHODS, AND COMPUTER PROGRAM PRODUCTS FOR ENHANCING SOCIAL INTERACTIONS WITH RELEVANT SOCIAL NETWORKING INFORMATION - Devices, methods, and computer program products are for facilitating enhanced social interactions using a mobile device. A method for facilitating an enhanced social interaction using a mobile device includes receiving an audio input at the mobile device, determining a salient portion of the audio input, receiving relevant information associated with the salient portion, and presenting the relevant information via the mobile device. | 05-17-2012 |
20120130714 | SYSTEM AND METHOD FOR GENERATING CHALLENGE UTTERANCES FOR SPEAKER VERIFICATION - Disclosed herein are systems, methods, and non-transitory computer-readable storage media relating to speaker verification. In one aspect, a system receives a first user identity from a second user, and, based on the identity, accesses voice characteristics. The system randomly generates a challenge sentence according to a rule and/or grammar, based on the voice characteristics, and prompts the second user to speak the challenge sentence. The system verifies that the second user is the first user if the spoken challenge sentence matches the voice characteristics. In an enrollment aspect, the system constructs an enrollment phrase that covers a minimum threshold of unique speech sounds based on speaker-distinctive phonemes, phoneme clusters, and prosody. The user then utters the enrollment phrase, and the system extracts voice characteristics for the user from the uttered phrase. The system generates a user profile, based on the voice characteristics, for generating random challenge sentences according to a grammar. | 05-24-2012 |
20120143605 | CONFERENCE TRANSCRIPTION BASED ON CONFERENCE DATA - In one implementation, a collaboration server is a conference bridge or other network device configured to host an audio and/or video conference among a plurality of conference participants. The collaboration server sends conference data and a media stream including speech to a speech recognition engine. The conference data may include the conference roster or text extracted from documents or other files shared in the conference. The speech recognition engine updates a default language model according to the conference data and transcribes the speech in the media stream based on the updated language model. In one example, the performance of the default language model, the updated language model, or both may be tested using a confidence interval or submitted for approval of the conference participant. | 06-07-2012 |
20120143606 | METHOD AND SYSTEM FOR TESTING CLOSED CAPTION CONTENT OF VIDEO ASSETS - A method and system for monitoring video assets provided by a multimedia content distribution network includes testing closed captions provided in output video signals. A video and audio portion of a video signal are acquired during a time period that a closed caption occurs. A first text string is extracted from a text portion of a video image, while a second text string is extracted from speech content in the audio portion. A degree of matching between the strings is evaluated based on a threshold to determine when a caption error occurs. Various operations may be performed when the caption error occurs, including logging caption error data and sending notifications of the caption error. | 06-07-2012 |
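The "degree of matching ... evaluated based on a threshold" step in 20120143606 above can be sketched with a standard string-similarity ratio. The 0.8 cutoff is an arbitrary illustrative value, and a production system would normalize punctuation and align timestamps first.

```python
import difflib

# Caption testing sketch: compare the text extracted from the video
# image (OCR) against the text extracted from the audio (ASR); a low
# similarity ratio signals a closed-caption error.

def caption_error(ocr_text, asr_text, threshold=0.8):
    """Return True when the two strings match too poorly, i.e. when
    a caption error should be logged/notified."""
    ratio = difflib.SequenceMatcher(None, ocr_text.lower(),
                                    asr_text.lower()).ratio()
    return ratio < threshold
```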
20120143607 | MULTIMODAL DISAMBIGUATION OF SPEECH RECOGNITION - The present invention provides a speech recognition system combined with one or more alternate input modalities to ensure efficient and accurate text input. The speech recognition system achieves less than perfect accuracy due to limited processing power, environmental noise, and/or natural variations in speaking style. The alternate input modalities use disambiguation or recognition engines to compensate for reduced keyboards, sloppy input, and/or natural variations in writing style. The ambiguity remaining in the speech recognition process is mostly orthogonal to the ambiguity inherent in the alternate input modality, such that the combination of the two modalities resolves the recognition errors efficiently and accurately. The invention is especially well suited for mobile devices with limited space for keyboards or touch-screen input. | 06-07-2012 |
20120150537 | FILTERING CONFIDENTIAL INFORMATION IN VOICE AND IMAGE DATA - Confidential information included in image and voice data is filtered in an apparatus that includes an extraction unit for extracting a character string from an image frame, and a conversion unit for converting audio data to a character string. The apparatus also includes a determination unit for determining, in response to contents of a database, whether at least one of the image frame and the audio data include confidential information. The apparatus also includes a masking unit for concealing contents of the image frame by masking the image frame in response to determining that the image frame includes confidential information, and for making the audio data inaudible by masking the audio data in response to determining that the audio data includes confidential information. The playback unit included in the apparatus is for playing back the image frame and the audio data. | 06-14-2012 |
20120150538 | VOICE MESSAGE CONVERTER - A textual representation of a voice message is provided to a communication device, such as a mobile phone, for example, when the mobile phone is operating in a silent mode. The voice message is input by a caller and converted to phonemes. A text representation of the voice message is transmitted to the mobile phone. The representation includes characters based on the phonemes, with well-known words being represented in an easily understood shorthand format. | 06-14-2012 |
20120158405 | SYNCHRONISE AN AUDIO CURSOR AND A TEXT CURSOR DURING EDITING - A speech recognition device. | 06-21-2012 |
20120166191 | ELECTRONIC BOOK WITH VOICE EMULATION FEATURES - A method and system for providing text-to-audio conversion of an electronic book displayed on a viewer. A user selects a portion of displayed text and converts it into audio. The text-to-audio conversion may be performed via a header file and pre-recorded audio for each electronic book, via text-to-speech conversion, or other available means. The user may select manual or automatic text-to audio conversion. The automatic text-to-audio conversion may be performed by automatically turning the pages of the electronic book or by the user manually turning the pages. The user may also select to convert the entire electronic book, or portions of it, into audio. The user may also select an option to receive an audio definition of a particular word in the electronic book. The present invention allows a user to control the system by selecting options from a screen or by entering voice commands. | 06-28-2012 |
20120166192 | PROVIDING TEXT INPUT USING SPEECH DATA AND NON-SPEECH DATA - Systems, methods, and computer readable media providing a speech input interface. The interface can receive speech input and non-speech input from a user through a user interface. The speech input can be converted to text data and the text data can be combined with the non-speech input for presentation to a user. | 06-28-2012 |
20120166193 | METHOD AND SYSTEM FOR AUTOMATIC TRANSCRIPTION PRIORITIZATION - A visual toolkit for prioritizing speech transcription is provided. The toolkit can include a logger. | 06-28-2012 |
20120173235 | Offline Generation of Subtitles - One embodiment described herein may take the form of a system or method for generating subtitles (also known as “closed captioning”) of an audio component of a multimedia presentation automatically for one or more stored presentations. In general, the system or method may access one or more multimedia programs stored on a storage medium, either as an entire program or in portions. Upon retrieval, the system or method may perform an analysis of the audio component of the program and generate a subtitle text file that corresponds to the audio component. In one embodiment, the system or method may perform a speech recognition analysis on the audio component to generate the subtitle text file. | 07-05-2012 |
20120173236 | SPEECH TO TEXT CONVERTING DEVICE AND METHOD - A speech to text converting device includes a display, a voice receiving module, a voice recognition module, an input module, and a control module. The voice receiving module receives a speech within a certain period of time. The voice recognition module converts the speech to voice data. The control module establishes text data corresponding to the voice data and displays the text data, any inputted words, and the relevant time period. | 07-05-2012 |
20120179465 | REAL TIME GENERATION OF AUDIO CONTENT SUMMARIES - Audio content is converted to text using speech recognition software. The text is then associated with a distinct voice or a generic placeholder label if no distinction can be made. From the text and voice information, a word cloud is generated based on key words and key speakers. A visualization of the cloud displays as it is being created. Words grow in size in relation to their dominance. When it is determined that the predominant words or speakers have changed, the word cloud is complete. That word cloud continues to be displayed statically and a new word cloud display begins based upon a new set of predominant words or a new predominant speaker or set of speakers. This process may continue until the meeting is concluded. At the end of the meeting, the completed visualization may be saved to a storage device, sent to selected individuals, removed, or any combination of the preceding. | 07-12-2012 |
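The "words grow in size in relation to their dominance" behavior in 20120179465 above amounts to mapping word frequency to a font size. The stop-word list and size range below are invented for the sketch; the disclosure additionally segments clouds by speaker changes, which this fragment omits.

```python
from collections import Counter

# Word-cloud sizing sketch: font size scales linearly with a word's
# frequency relative to the most dominant word.

STOP_WORDS = {"the", "a", "an", "and", "to", "of", "we", "is"}

def word_cloud_sizes(text, min_size=10, max_size=40):
    """Map each key word in the transcript text to a display size."""
    words = [w for w in text.lower().split() if w not in STOP_WORDS]
    counts = Counter(words)
    top = counts.most_common(1)[0][1]
    return {w: min_size + (max_size - min_size) * c // top
            for w, c in counts.items()}
```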
20120179466 | SPEECH TO TEXT CONVERTING DEVICE AND METHOD - A speech to text converting device includes a display, a voice receiving module, a voice recognition module, an identity recognition module, and a control module. The voice receiving module receives a voice signal. The voice recognition module converts the voice signal to voice data and produces text data corresponding to the voice data. The identity recognition module receives the voice signal and establishes identity data corresponding to the voice signal. The control module displays the text data and the identity data together on the display. | 07-12-2012 |
20120185249 | METHOD AND SYSTEM FOR SPEECH BASED DOCUMENT HISTORY TRACKING - A method and a system of history tracking corrections in a speech based document. The speech based document comprises one or more sections of text recognized or transcribed from sections of speech, wherein the sections of speech are dictated by a user and processed by a speech recognizer in a speech recognition system into corresponding sections of text of the speech based document. The method comprises associating at least one speech attribute to each section of text in the speech based document, said speech attribute comprising information related to said section of text, respectively; presenting said speech based document on a presenting unit; detecting an action being performed within any of said sections of text; and updating information of said speech attributes related to the kind of action detected on one of said sections of text for updating said speech based document. | 07-19-2012 |
20120185250 | Distributed Dictation/Transcription System - A distributed dictation/transcription system is provided. The system provides a client station, dictation manager, and dictation server networked such that the dictation manager selects a dictation server to transcribe audio from the client station. The dictation manager selects one of a plurality of dictation servers based on conventional load balancing and on a determination of whether the user profile is already uploaded to a dictation server. While a dictation server is being selected or a profile uploaded, the client may begin dictating; the audio is stored in a buffer of the dictation manager until a dictation server is selected or available. The user may receive in real time or near real time a display of the textual data that may be corrected by the user to update the user profile. | 07-19-2012 |
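The selection rule in this entry combines load balancing with a check for an already-uploaded user profile. One plausible reading, sketched below (the dict-based server records and field names are assumptions for illustration):

```python
def pick_dictation_server(servers, user):
    # Prefer servers that already hold the user's voice profile (avoids
    # re-uploading it); among the candidates, pick the least-loaded one.
    with_profile = [s for s in servers if user in s["profiles"]]
    pool = with_profile or servers
    return min(pool, key=lambda s: s["load"])

servers = [
    {"name": "a", "load": 2, "profiles": {"alice"}},
    {"name": "b", "load": 0, "profiles": set()},
]
```

Under this rule, a user with a profile on a busier server still goes there, trading load for the cost of a profile upload; a user with no uploaded profile simply gets the least-loaded server.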
20120191451 | STORAGE AND ACCESS OF DIGITAL CONTENT - In one embodiment, the invention provides a method, comprising providing a first communications channel to transmit digital content to a notes-access application for storage against a particular user, the first communications channel being selected from the group consisting of an SMS channel, an MMS channel, a fax channel, an e-mail channel, and an IM channel; responsive to receiving digital content from said user via the first communications channel, storing said digital content in the database associated with said notes-access application; and providing a second communications channel to the notes-access application whereby the digital content stored by the notes-access application against said user is provided to said user, the second communications channel being selected from the group consisting of an SMS channel, an MMS channel, a fax channel, an e-mail channel, and an IM channel. | 07-26-2012 |
20120191452 | REPRESENTING GROUP INTERACTIONS - Disclosed is a system for generating a representation of a group interaction, the system comprising: a transcription module adapted to generate a transcript of the group interaction from audio source data representing the group interaction, the transcript comprising a sequence of lines of text, each line corresponding to an audible utterance in the audio source data; and a labeling module adapted to generate a conversation path from the transcript by labeling each transcript line with an identifier identifying the speaker of the corresponding utterance in the audio source data; and generate the representation of the group interaction by associating the conversation path with a plurality of voice profiles, each voice profile corresponding to an identified speaker in the conversation path. | 07-26-2012 |
20120197640 | System and Method for Unsupervised and Active Learning for Automatic Speech Recognition - A system and method is provided for combining active and unsupervised learning for automatic speech recognition. This process enables a reduction in the amount of human supervision required for training acoustic and language models and an increase in the performance given the transcribed and un-transcribed data. | 08-02-2012 |
20120203551 | AUTOMATED FOLLOW UP FOR E-MEETINGS - Embodiments of the present invention provide a method, system and computer program product for automated follow-up for e-meetings. In an embodiment of the invention, a method for automated follow-up for e-meetings is provided. The method includes monitoring content provided to an e-meeting managed by an e-meeting server executing in memory of a host computer. The method also includes applying a rule in a rules base to the monitored content. Finally, the method includes triggering generation of a follow up item in response to applying the rule to the monitored content. | 08-09-2012 |
20120203552 | CONTROLLING A SET-TOP BOX VIA REMOTE SPEECH RECOGNITION - A device may receive over a network a digitized speech signal from a remote control that accepts speech. In addition, the device may convert the digitized speech signal into text, use the text to obtain command information applicable to a set-top box, and send the command information to the set-top box to control presentation of multimedia content on a television in accordance with the command information. | 08-09-2012 |
20120209605 | METHOD AND APPARATUS FOR DATA EXPLORATION OF INTERACTIONS - Retrieving data from audio interactions associated with an organization. Retrieving the data comprises: receiving a corpus containing interactions; performing natural language processing on a text document representing an interaction from the corpus; extracting at least one keyphrase from the text document; assigning a rank to the at least one keyphrase; modeling relations between at least two keyphrases using the rank; and identifying topics relevant for the organization from the relations. | 08-16-2012 |
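This entry's pipeline ranks extracted keyphrases and then models relations between them using the ranks. A toy sketch of one way that could look, ranking by document frequency and weighting co-occurrence relations by the combined ranks (all names and the weighting scheme are illustrative assumptions):

```python
from collections import Counter
from itertools import combinations

def rank_and_relate(docs):
    # Rank each keyphrase by how many documents it appears in, then
    # model a relation between two keyphrases whenever they co-occur
    # in a document, weighted by the sum of their ranks.
    rank = Counter(p for doc in docs for p in set(doc))
    relations = Counter()
    for doc in docs:
        for a, b in combinations(sorted(set(doc)), 2):
            relations[(a, b)] += rank[a] + rank[b]
    return rank, relations

rank, relations = rank_and_relate([["refund", "billing"], ["refund", "cancel"]])
```

High-weight relation edges would then point at candidate topics relevant to the organization.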
20120209606 | METHOD AND APPARATUS FOR INFORMATION EXTRACTION FROM INTERACTIONS - Obtaining information from audio interactions associated with an organization. The information may comprise entities, relations or events. The method comprises: receiving a corpus comprising audio interactions; performing audio analysis on audio interactions of the corpus to obtain text documents; performing linguistic analysis of the text documents; matching the text documents with one or more rules to obtain one or more matches; and unifying or filtering the matches. | 08-16-2012 |
20120209607 | METHOD AND APPARATUS FOR SCROLLING TEXT DISPLAY OF VOICE CALL OR MESSAGE DURING VIDEO DISPLAY SESSION - A method and communication device disclosed includes displaying a video on a display, converting voice audio data to textual data by applying voice-to-text conversion, and displaying the textual data as scrolling text displayed along with the video on the display and either above, below or across the video. The method may further include receiving a voice call indication from a network, providing the voice call indication to a user interface where the voice call indication corresponds to an incoming voice call; and receiving a user input for receiving the voice call and displaying the voice call as scrolling text. In another embodiment, a method includes displaying application related data on a display; converting voice audio data to textual data by applying voice-to-text conversion; converting the textual data to a video format; and displaying the textual data as scrolling text over the application related data on the display. | 08-16-2012 |
20120215532 | HEARING ASSISTANCE SYSTEM FOR PROVIDING CONSISTENT HUMAN SPEECH - Broadly speaking, the embodiments disclosed herein describe an apparatus, system, and method that allows a user of a hearing assistance system to perceive consistent human speech. The consistent human speech can be based upon user specific preferences. | 08-23-2012 |
20120215533 | Method of and System for Error Correction in Multiple Input Modality Search Engines - A method of and system for error correction in multiple input modality search engines is presented. A method of processing input information based on an information type of the input information includes receiving input information for performing a search for identifying at least one item desired by a user and determining an information type associated with the input information. The method also includes forming a query input for identifying the at least one item desired by the user based on the input information and on the information type. The method further includes submitting the query input to at least one search engine system. | 08-23-2012 |
20120215534 | System and Method for Automatic Storage and Retrieval of Emergency Information - A vehicle communication system includes a computer processor in communication with a memory circuit, a transceiver in communication with the processor and operable to communicate with one or more wireless devices, and one or more storage locations storing one or more pieces of emergency contact information. In this illustrative system, the processor is operable to establish communication with a first wireless device through the transceiver. Upon detection of an emergency event by at least one vehicle based sensor system, the vehicle communication system is operable to contact an emergency operator. The vehicle communication system is further operable to display one or more of the one or more pieces of emergency contact information in a selectable manner. Upon selection of one of the one or more pieces of emergency contact information, the vehicle computing system places a call to a phone number associated with the selected emergency contact. | 08-23-2012 |
20120221330 | LEVERAGING SPEECH RECOGNIZER FEEDBACK FOR VOICE ACTIVITY DETECTION - A voice activity detection (VAD) module analyzes a media file, such as an audio file or a video file, to determine whether one or more frames of the media file include speech. A speech recognizer generates feedback relating to an accuracy of the VAD determination. The VAD module leverages the feedback to improve subsequent VAD determinations. The VAD module also utilizes a look-ahead window associated with the media file to adjust estimated probabilities or VAD decisions for previously processed frames. | 08-30-2012 |
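The look-ahead adjustment described in this entry revisits VAD estimates for already-processed frames once later frames are seen. A minimal sketch of that idea, smoothing each frame's speech probability over a small look-ahead window (the averaging rule is an assumption; the patent does not specify the adjustment function):

```python
def smooth_vad(probs, window=2):
    # Replace each frame's speech probability with the mean over itself
    # and the next `window` frames, so a burst of speech just ahead
    # raises the estimate for the current frame.
    out = []
    for i in range(len(probs)):
        seg = probs[i:i + window + 1]
        out.append(sum(seg) / len(seg))
    return out
```

The recognizer-feedback loop in the entry would additionally tune thresholds or probabilities based on whether recognized words actually landed in frames marked as speech.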
20120221331 | Method and Apparatus for Automatically Building Conversational Systems - A system and method provides a natural language interface to world-wide web content. Either in advance or dynamically, webpage content is parsed using a parsing algorithm. A person using a telephone interface can provide speech information, which is converted to text and used to automatically fill in input fields on a webpage form. The form is then submitted to a database search and a response is generated. Information contained on the responsive webpage is extracted and converted to speech via a text-to-speech engine and communicated to the person. | 08-30-2012 |
20120221332 | SYSTEM AND METHOD FOR REFERRING TO ENTITIES IN A DISCOURSE DOMAIN - Systems, methods, and non-transitory computer-readable media for referring to entities. The method includes receiving domain-specific training data of sentences describing a target entity in a context, extracting a speaker history and a visual context from the training data, selecting attributes of the target entity based on at least one of the speaker history, the visual context, and speaker preferences, generating a text expression referring to the target entity based on at least one of the selected attributes, the speaker history, and the context, and outputting the generated text expression. The weighted finite-state automaton can represent partial orderings of word pairs in the domain-specific training data. The weighted finite-state automaton can be speaker specific or speaker independent. The weighted finite-state automaton can include a set of weighted partial orderings of the training data for each possible realization. | 08-30-2012 |
20120226499 | SCRIPTING SUPPORT FOR DATA IDENTIFIERS, VOICE RECOGNITION AND SPEECH IN A TELNET SESSION - Methods of adding data identifiers and speech/voice recognition functionality are disclosed. A telnet client runs one or more scripts that add data identifiers to data fields in a telnet session. The input data is inserted in the corresponding fields based on data identifiers. Scripts run only on the telnet client without modifications to the server applications. Further disclosed are methods for providing speech recognition and voice functionality to telnet clients. Portions of input data are converted to voice and played to the user. A user also may provide input to certain fields of the telnet session by using his voice. Scripts running on the telnet client convert the user's voice into text, which is then inserted into the corresponding fields. | 09-06-2012 |
20120232897 | Locating Products in Stores Using Voice Search From a Communication Device - A user can locate products by dialing a number from any phone and accessing an automatic voice recognition system. Reply is made to the user with information locating the product using a store's product location data converted to automatic voice responses. Smart phone and mobile web access to a product database is enabled using voice-to-text and text search. A taxonomy enables product search requests by product descriptions and/or product brand names, and enables synonyms and phonetic enhancements to the system. Search results are related to products and product categories with concise organization. Relevant advertisements, promotional offers and coupons are delivered based upon search and taxonomy elements. Search requests generate dynamic interior maps of a product's location inside the shopper's location, assisting a shopper to efficiently shop the location for listed items. Business intelligence of product categories enables rapid scaling across retail segments. | 09-13-2012 |
20120232898 | SYSTEM AND METHOD OF PROVIDING AN AUTOMATED DATA-COLLECTION IN SPOKEN DIALOG SYSTEMS - The invention relates to a system and method for gathering data for use in a spoken dialog system. An aspect of the invention is generally referred to as an automated hidden human that performs data collection automatically at the beginning of a conversation with a user in a spoken dialog system. The method comprises presenting an initial prompt to a user, recognizing a received user utterance using an automatic speech recognition engine and classifying the recognized user utterance using a spoken language understanding module. If the recognized user utterance is not understood or classifiable to a predetermined acceptance threshold, then the method re-prompts the user. If the recognized user utterance is not classifiable to a predetermined rejection threshold, then the method transfers the user to a human as this may imply a task-specific utterance. The received and classified user utterance is then used for training the spoken dialog system. | 09-13-2012 |
20120239395 | Selection of Text Prediction Results by an Accessory - A method for entering text in a text input field using a non-keyboard type accessory includes selecting a character for entry into the text field presented by a portable computing device. The portable computing device determines whether text suggestions are available based on the character. If text suggestions are available, the portable computing device can determine the text suggestions and send them to the accessory, which in turn can display the suggestions on a display. A user operating the accessory can select one of the text suggestions, expressly reject the text suggestions, or ignore the text suggestions. If a text suggestion is selected, the accessory can send the selected text to the portable computing device for populating the text field. | 09-20-2012 |
20120239396 | MULTIMODAL REMOTE CONTROL - A method and system for operating a remotely controlled device may use multimodal remote control commands that include a gesture command and a speech command. The gesture command may be interpreted from a gesture performed by a user, while the speech command may be interpreted from speech utterances made by the user. The gesture and speech utterances may be simultaneously received by the remotely controlled device in response to displaying a user interface configured to receive multimodal commands. | 09-20-2012 |
20120239397 | Digital Ink Database Searching Using Handwriting Feature Synthesis - A method of searching a digital ink database is disclosed. The digital ink database is associated with a specific author. The method starts by receiving a computer text query from an input device. The computer text query is then mapped to a set of feature vectors using a handwriting model of that specific author. As a result, the set of feature vectors approximates features that would have been extracted had that specific author written the computer query text by hand. Finally, the set of feature vectors is used to search the digital ink database. | 09-20-2012 |
20120245934 | SPEECH RECOGNITION DEPENDENT ON TEXT MESSAGE CONTENT - A method of automatic speech recognition. An utterance is received from a user in reply to a text message, via a microphone that converts the reply utterance into a speech signal. The speech signal is processed using at least one processor to extract acoustic data from the speech signal. An acoustic model is identified from a plurality of acoustic models to decode the acoustic data, and using a conversational context associated with the text message. The acoustic data is decoded using the identified acoustic model to produce a plurality of hypotheses for the reply utterance. | 09-27-2012 |
20120245935 | ELECTRONIC DEVICE AND SERVER FOR PROCESSING VOICE MESSAGE - An electronic device includes a voice processing unit, a wireless communication unit, and a combining unit. The voice processing unit receives speech signals. The wireless communication unit sends the speech signals to a server. The server converts the speech signals into a text message. The wireless communication unit receives the text message from the server. The combining unit combines the text message and the speech signals into a combined message. The wireless communication unit further sends the combined message to a recipient. A related server is also provided. | 09-27-2012 |
20120245936 | Device to Capture and Temporally Synchronize Aspects of a Conversation and Method and System Thereof - A system, device, and method for capturing and temporally synchronizing different aspect of a conversation is presented. The method includes receiving an audible statement, receiving a note temporally corresponding to an utterance in the audible statement, creating a first temporal marker comprising temporal information related to the note, transcribing the utterance into a transcribed text, creating a second temporal marker comprising temporal information related to the transcribed text, temporally synchronizing the audible statement, the note, and the transcribed text. Temporally synchronizing comprises associating a time point in the audible statement with the note using the first temporal marker, associating the time point in the audible statement with the transcribed text using the second temporal marker, and associating the note with the transcribed text using the first temporal marker and second temporal marker. | 09-27-2012 |
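The synchronization step in this entry associates notes and transcribed text with points in the audio via temporal markers. A small sketch of one plausible form of that association, linking each note to transcript segments whose timestamps fall within a tolerance of the note's marker (the data shapes and tolerance rule are illustrative assumptions):

```python
def link_by_time(notes, transcript, tolerance=1.0):
    # notes: list of (marker_time_seconds, note_text)
    # transcript: list of (segment_start_seconds, segment_text)
    # Associate each note with every transcript segment whose start
    # time lies within `tolerance` seconds of the note's marker.
    links = {}
    for t_note, text in notes:
        links[text] = [seg for t_seg, seg in transcript
                       if abs(t_seg - t_note) <= tolerance]
    return links

links = link_by_time([(5.0, "action item")],
                     [(4.5, "we should ship"), (9.0, "next topic")])
```

With both markers in hand, playback can jump from a note to the matching audio time point and highlight the corresponding transcribed text.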
20120245937 | Voice Rendering Of E-mail With Tags For Improved User Experience - Tags, such as XML tags, are inserted into email to separate email content from signature blocks, privacy notices and confidentiality notices, and to separate original email messages from replies and replies from further replies. The tags are detected by a system that renders email as speech, such as voice command platform or network-based virtual assistant or message center. The system can render an original email message in one voice mode and the reply in a different voice mode. The tags can be inserted to identify a voice memo in which a user responds to a particular portion of an email message. | 09-27-2012 |
20120245938 | METHODS, SYSTEMS, AND COMPUTER PROGRAM PRODUCTS FOR MANAGING AUDIO AND/OR VIDEO INFORMATION VIA A WEB BROADCAST - Provided Web broadcast information is managed by annotation markers. At least one marker is received that annotates the audio and/or video information and the annotated audio and/or video information is saved in an electronically searchable file. | 09-27-2012 |
20120253801 | AUTOMATIC DETERMINATION OF AND RESPONSE TO A TOPIC OF A CONVERSATION - A system, computer-readable medium, and method for automatically determining a topic of a conversation and responding to the topic determination are provided. In the method, an active topic is defined as a first topic in response to execution of an application. The first topic includes first text defining a plurality of phrases, a probability of occurrence associated with each of the plurality of phrases, and a response associated with each of the plurality of phrases. Speech text recognized from a recorded audio signal is received. Recognition of the speech text is based at least partially on the probability of occurrence associated with each of the plurality of phrases of the first topic. A phrase of the plurality of phrases associated with the received speech text is identified. The response associated with the identified phrase is performed by the computing device. The response includes instructions defining an action triggered by occurrence of the received speech text, wherein the action includes defining the active topic as a second topic. | 10-04-2012 |
20120253802 | Location-Based Conversational Understanding - Location-based conversational understanding may be provided. Upon receiving a query from a user, an environmental context associated with the query may be generated. The query may be interpreted according to the environmental context. The interpreted query may be executed and at least one result associated with the query may be provided to the user. | 10-04-2012 |
20120253803 | VOICE RECOGNITION DEVICE AND VOICE RECOGNITION METHOD - According to embodiments, a voice inputting unit converts voice into a digital signal. The state detecting unit includes an acceleration sensor, and detects movement and/or a state of an equipment main body. The holding unit stores movement or state pattern models of predetermined movement or a state of the equipment main body and predetermined voice recognition process patterns corresponding to the models. The pattern detecting unit detects whether or not movement and/or a state of the equipment main body from the state detecting unit matches the movement or state pattern models stored in the holding unit, and detects a voice recognition process pattern corresponding to the matched model. The voice recognition process executing unit executes the voice recognition process on the digital signal output from the voice inputting unit according to the detected voice recognition process pattern. | 10-04-2012 |
20120253804 | VOICE PROCESSOR AND VOICE PROCESSING METHOD - According to one embodiment, a voice processor includes: a storage module; a converter; a character string converter; a similarity calculator; and an output module. The storage module stores therein first character string information and a first phoneme symbol corresponding thereto in association with each other. The converter converts an input voice into a second phoneme symbol. The character string converter converts the second phoneme symbol into second character string information in which content of the voice is described in a natural language. The similarity calculator calculates similarity between the input voice and a portion of the first character string information stored in the storage module using at least one of the second phoneme symbol converted by the converter and the second character string information converted by the character string converter. The output module outputs the first character string information based on the similarity calculated by the similarity calculator. | 10-04-2012 |
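The similarity calculator in this entry compares phoneme symbols of the input voice against stored character-string entries. The patent does not name a metric; a common choice for phoneme-sequence comparison is normalized Levenshtein distance, sketched here as an assumption:

```python
def phoneme_similarity(a, b):
    # Similarity = 1 - (edit distance / length of the longer sequence),
    # computed over two phoneme-symbol strings with standard dynamic
    # programming. 1.0 means identical sequences.
    m, n = len(a), len(b)
    d = [[i + j if 0 in (i, j) else 0 for j in range(n + 1)]
         for i in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + (a[i - 1] != b[j - 1]))  # substitution
    return 1 - d[m][n] / max(m, n, 1)
```

The output module would then return the stored first-character-string entry with the highest such score.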
20120259633 | AUDIO-INTERACTIVE MESSAGE EXCHANGE - A completely hands free exchange of messages, especially in portable devices, is provided through a combination of speech recognition, text-to-speech (TTS), and detection algorithms. An incoming message may be read aloud to a user and the user enabled to respond to the sender with a reply message through audio input upon determining whether the audio interaction mode is proper. Users may also be provided with options for responding in a different communication mode (e.g., a call) or perform other actions. Users may further be enabled to initiate a message exchange using natural language. | 10-11-2012 |
20120259634 | MUSIC PLAYBACK DEVICE, MUSIC PLAYBACK METHOD, PROGRAM, AND DATA CREATION DEVICE - There is provided a music playback device comprising a playback unit configured to playback music, an analysis unit configured to analyze lyrics of the music and extract a word or a phrase included in the lyrics, an acquisition unit configured to acquire an image using the word or the phrase extracted by the analysis unit, and a display control unit configured to, during playback of the music, cause a display device to display the image acquired by the acquisition unit. | 10-11-2012 |
20120259635 | Document Certification and Security System - A system for the storing of client information in an independent repository is disclosed. Client data may be uploaded by client or those authorized by client or collected and stored by the repository. Data about the client file such as, for example, the time of upload and modifications are stored in a metadata file associated with the client file. | 10-11-2012 |
20120259636 | METHOD AND APPARATUS FOR PROCESSING SPOKEN SEARCH QUERIES - Some embodiments relate to a method of performing a search for content on the Internet, in which a user may speak a search query and speech recognition may be performed on the spoken query to generate a text search query to be provided to a plurality of search engines. This enables a user to speak the search query rather than having to type it, and also allows the user to provide the search query only once, rather than having to provide it separately to multiple different search engines. | 10-11-2012 |
20120265527 | INTERACTIVE VOICE RECOGNITION ELECTRONIC DEVICE AND METHOD - An interactive voice recognition electronic device converts a received voice signal to a text, and searches a voice database to find a matched voice text of the converted text. The matched voice text is taken as a recognized voice text of the voice signal if the matched voice text exists in the voice database. The electronic device obtains a predetermined number of similar voice texts if no matched voice text exists in the voice database. The electronic device converts the predetermined number of similar voice texts to voice signals, outputs the converted voice signals in turn, and selects one of the similar voice texts as the recognized voice text according to the selection of the user. The electronic device obtains the associated answer text of the recognized voice text in the voice database and converts the answer text to voice signals. | 10-18-2012 |
20120265528 | Using Context Information To Facilitate Processing Of Commands In A Virtual Assistant - A virtual assistant uses context information to supplement natural language or gestural input from a user. Context helps to clarify the user's intent and to reduce the number of candidate interpretations of the user's input, and reduces the need for the user to provide excessive clarification input. Context can include any available information that is usable by the assistant to supplement explicit user input to constrain an information-processing problem and/or to personalize results. Context can be used to constrain solutions during various phases of processing, including, for example, speech recognition, natural language processing, task flow processing, and dialog generation. | 10-18-2012 |
20120265529 | SYSTEMS AND METHODS FOR OBTAINING AND DISPLAYING AN X-RAY IMAGE - A method and system for transcription of spoken language into continuous text for a user comprising the steps of inputting spoken language of at least one user or of a communication partner of the at least one user into a mobile device of the respective user, wherein the input spoken language of the user is transported within a corresponding stream of voice over IP data packets to a transcription server; transforming the spoken language transported within the respective stream of voice over IP data packets into continuous text by means of a speech recognition algorithm run by said transcription server, wherein said speech recognition algorithm is selected depending on a natural language or dialect spoken in the area of the current position of said mobile device; and outputting said transformed continuous text forwarded by said transcription server to said mobile device of the respective user or to a user terminal of the respective user in real time. | 10-18-2012 |
20120278071 | TRANSCRIPTION SYSTEM - A transcription system automates the control of the playback of the audio to accommodate the user's ability to transcribe the words spoken. In some examples, a delay between playback and typed input is estimated by processing the typed words using a wordspotting approach. The estimated delay is used as in input to an automated speed control, for example, to maintain a target or maximum delay between playback and typed input. | 11-01-2012 |
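This entry feeds the estimated typist lag into an automated playback speed control. A minimal proportional-control sketch of that loop (the gain, target, and clamp values are illustrative assumptions; the entry only says the estimated delay drives the control):

```python
def playback_rate(delay, target=3.0, k=0.1, lo=0.5, hi=1.0):
    # Slow playback in proportion to how far the typist's estimated
    # lag (seconds) exceeds the target delay; clamp to [lo, hi].
    rate = 1.0 - k * max(0.0, delay - target)
    return max(lo, min(hi, rate))
```

Fed with fresh delay estimates from the wordspotting alignment, this keeps the lag hovering near the target instead of growing without bound.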
20120278072 | REMOTE HEALTHCARE SYSTEM AND HEALTHCARE METHOD USING THE SAME - A remote healthcare system includes a healthcare staff terminal which includes an input part configured to input text to be transmitted to a patient by a healthcare staff member, and a first transmitter-receiver part configured to transmit the text and a qualifier of the healthcare staff member; a server which includes a second transmitter-receiver part configured to receive the text and the qualifier of the healthcare staff member transmitted from the healthcare staff terminal, an acoustic source database having an acoustic source of the healthcare staff member stored therein, and a converter configured to change the text into voice using the stored acoustic source of the healthcare staff member; and a patient terminal which includes a third transmitter-receiver part configured to receive the voice converted from the text and the text transmitted by the second transmitter-receiver part of the server, and an output part configured to output the voice to the patient who is managed by the healthcare staff member. | 11-01-2012 |
20120278073 | MOBILE SYSTEMS AND METHODS OF SUPPORTING NATURAL LANGUAGE HUMAN-MACHINE INTERACTIONS - A mobile system is provided that includes speech-based and non-speech-based interfaces for telematics applications. The mobile system identifies and uses context, prior information, domain knowledge, and user specific profile data to achieve a natural environment for users that submit requests and/or commands in multiple domains. The invention creates, stores and uses extensive personal profile information for each user, thereby improving the reliability of determining the context and presenting the expected results for a particular question or command. The invention may organize domain specific behavior and information into agents that are distributable or updateable over a wide area network. | 11-01-2012 |
20120278074 | MULTISENSORY SPEECH DETECTION - A computer-implemented method of multisensory speech detection is disclosed. The method comprises determining an orientation of a mobile device and determining an operating mode of the mobile device based on the orientation of the mobile device. The method further includes identifying speech detection parameters that specify when speech detection begins or ends based on the determined operating mode and detecting speech from a user of the mobile device based on the speech detection parameters. | 11-01-2012 |
20120284024 | Text Interface Device and Method in Voice Communication - A computerized communication device has a display screen, a mechanism for a user to select words or phrases displayed on the display screen, and software executing from a non-transitory physical medium, the software providing a function for providing audio signal output in a connected voice-telephone call from the text words or phrases selected by a user. | 11-08-2012 |
20120290298 | SYSTEM AND METHOD FOR OPTIMIZING SPEECH RECOGNITION AND NATURAL LANGUAGE PARAMETERS WITH USER FEEDBACK - Disclosed herein are systems, methods, and non-transitory computer-readable storage media for assigning saliency weights to words of an ASR model. The saliency values assigned to words within an ASR model are based on human perception judgments of previous transcripts. These saliency values are applied as weights to modify an ASR model such that the results of the weighted ASR model in converting a spoken document to a transcript provide a more accurate and useful transcription to the user. | 11-15-2012 |
20120290299 | Translating Between Spoken and Written Language - Techniques for converting spoken speech into written speech are provided. The techniques include transcribing input speech via speech recognition, mapping each spoken utterance from input speech into a corresponding formal utterance, and mapping each formal utterance into a stylistically formatted written utterance. | 11-15-2012 |
20120290300 | APPARATUS AND METHOD FOR FOREIGN LANGUAGE STUDY - The apparatus for foreign language study includes: a voice recognition device configured to recognize a speech entered by a user and convert the speech into a speech text; a speech intent recognition device configured to extract a user speech intent for the speech text using skill level information of the user and dialogue context information; and a feedback processing device configured to extract a different expression depending on the user speech intent and a speech situation of the user. According to the present invention, the intent of a learner's speech may be determined even though the learner's skill is low, and customized expressions for various situations may be provided to the learner. | 11-15-2012 |
20120290301 | METHOD AND SYSTEM OF ENABLING INTELLIGENT AND LIGHTWEIGHT SPEECH TO TEXT TRANSCRIPTION THROUGH DISTRIBUTED ENVIRONMENT - A system includes at least one wireless client device, a service manager, and a plurality of voice transcription servers. The service manager includes a resource management service and a profile management service. The client device communicates the presence of a voice transcription task to the resource management service. The resource management service surveys the plurality of voice transcription servers and selects one voice transcription server based on a set of predefined criteria. The resource management service then communicates an address of the selected server to the profile management service, which then transmits a trained voice profile or default profile to the selected server. The address of the selected server is then sent to the client device, which then transmits an audio stream to the server. Finally, the selected server transcribes the audio stream to a textual format. | 11-15-2012 |
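The server-selection step performed by the resource management service in 20120290301 could look like the following. The specific criteria (reachability, then load, then latency) are assumptions for illustration; the application only says "a set of predefined criteria":

```python
# Sketch of the resource-manager selection step (20120290301).
# The ranking criteria and record fields are illustrative assumptions.

def select_server(servers: list) -> str:
    """Return the address of the least-loaded reachable transcription server,
    breaking ties by round-trip latency. Each server is a dict with
    'address', 'reachable', 'load', and 'latency_ms' keys."""
    candidates = [s for s in servers if s["reachable"]]
    if not candidates:
        raise RuntimeError("no transcription server available")
    best = min(candidates, key=lambda s: (s["load"], s["latency_ms"]))
    return best["address"]
```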
20120296646 | MULTI-MODE TEXT INPUT - Concepts and technologies are described herein for multi-mode text input. In accordance with the concepts and technologies disclosed herein, content is received. The content can include one or more input indicators. The input indicators can indicate that user input can be used in conjunction with consumption or use of the content. The application is configured to analyze the content to determine context associated with the content and/or the client device executing the application. The application also is configured to determine, based upon the content and/or the contextual information, which input device to use to obtain input associated with use or consumption of the content. Input captured with the input device can be converted to text and used during use or consumption of the content. | 11-22-2012 |
20120296647 | INFORMATION PROCESSING APPARATUS - In an embodiment, an information processing apparatus includes: a converting unit; a selecting unit; a dividing unit; a generating unit; and a display processing unit. The converting unit recognizes a voice input from a user and converts it into a character string. The selecting unit selects characters from the character string according to designation of the user. The dividing unit converts the selected characters into phonetic characters and divides the phonetic characters into phonetic characters of sound units. The generating unit extracts similar character candidates corresponding to each of the divided phonetic characters of the sound units from a similar character dictionary, which stores, in association with each other, phonetic characters of sound units that are similar in sound, and generates correction character candidates for the selected characters. The display processing unit makes a display unit display the generated correction character candidates selectable by the user. | 11-22-2012 |
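Candidate generation from a similar-character dictionary, as in 20120296647, can be sketched by swapping one sound unit at a time for a similar-sounding one. The dictionary entries below are invented examples:

```python
# Sketch of correction-candidate generation (20120296647).
# The similar-character dictionary entries are invented for illustration.

SIMILAR = {"ka": ["ga", "kya"], "to": ["do"]}  # sound unit -> similar units

def correction_candidates(units: list) -> list:
    """For a sequence of phonetic sound units, produce candidate strings
    obtained by replacing one unit with a similar-sounding unit."""
    out = []
    for i, u in enumerate(units):
        for alt in SIMILAR.get(u, []):
            out.append("".join(units[:i] + [alt] + units[i + 1:]))
    return out
```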
20120303368 | NUMBER-ASSISTANT VOICE INPUT SYSTEM, NUMBER-ASSISTANT VOICE INPUT METHOD FOR VOICE INPUT SYSTEM AND NUMBER-ASSISTANT VOICE CORRECTING METHOD FOR VOICE INPUT SYSTEM - The present invention discloses a number-assistant voice input system, a number-assistant voice input method for a voice input system and a number-assistant voice correcting method for a voice input system, which apply software to drive a voice input system of an electronic device to provide a voice input logic circuit module. The voice input logic circuit module defines the pronunciation of numbers 1 to 26 as the paths to respectively input letters A to Z in the voice input system and allows users to selectively input or correct a letter by reading a number from 1 to 26 instead of a letter from A to Z. | 11-29-2012 |
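The digit-to-letter convention of 20120303368 maps spoken numbers 1 through 26 onto letters A through Z. A direct sketch (the function names are my own):

```python
# Sketch of the number-assistant mapping (20120303368):
# spoken numbers 1-26 stand in for letters A-Z.

def number_to_letter(n: int) -> str:
    """Map 1 -> 'A', 2 -> 'B', ..., 26 -> 'Z'."""
    if not 1 <= n <= 26:
        raise ValueError("expected a number from 1 to 26")
    return chr(ord("A") + n - 1)

def spell_by_numbers(numbers: list) -> str:
    """Spell a word by reading one number per letter, e.g. 3-1-2 -> 'CAB'."""
    return "".join(number_to_letter(n) for n in numbers)
```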
20120310642 | AUTOMATICALLY CREATING A MAPPING BETWEEN TEXT DATA AND AUDIO DATA - Techniques are provided for creating a mapping that maps locations in audio data (e.g., an audio book) to corresponding locations in text data (e.g., an e-book). Techniques are provided for using a mapping between audio data and text data, whether the mapping is created automatically or manually. A mapping may be used for bookmark switching, where a bookmark established in one version of a digital work is used to identify a corresponding location in another version of the digital work. Alternatively, the mapping may be used to play audio that corresponds to text selected by a user. Alternatively, the mapping may be used to automatically highlight text in response to audio that corresponds to the text being played. Alternatively, the mapping may be used to determine where an annotation created in one media context (e.g., audio) will be consumed in another media context (e.g., text). | 12-06-2012 |
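Bookmark switching as in 20120310642 needs a lookup from a text location to an audio timestamp. One assumed representation is a sorted list of anchor pairs with linear interpolation between them; this is a sketch, not the application's own method:

```python
# Sketch of text-offset -> audio-timestamp lookup for bookmark switching
# (20120310642). Anchor pairs and linear interpolation are assumptions.

import bisect

def text_to_audio(anchors: list, offset: int) -> float:
    """anchors: sorted (char_offset, seconds) pairs tying text positions
    to audio times. Interpolate linearly between the surrounding anchors."""
    offsets = [a[0] for a in anchors]
    i = bisect.bisect_right(offsets, offset) - 1
    i = max(0, min(i, len(anchors) - 2))      # clamp to a valid segment
    (o0, t0), (o1, t1) = anchors[i], anchors[i + 1]
    frac = (offset - o0) / (o1 - o0)
    return t0 + frac * (t1 - t0)
```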
20120310643 | METHODS AND APPARATUS FOR PROOFING OF A TEXT INPUT - Techniques for presenting data input as a plurality of data chunks including a first data chunk and a second data chunk. The techniques include converting the plurality of data chunks to a textual representation comprising a plurality of text chunks including a first text chunk corresponding to the first data chunk and a second text chunk corresponding to the second data chunk, and providing a presentation of at least part of the textual representation such that the first text chunk is presented differently than the second text chunk to, when presented, assist a user in proofing the textual representation. | 12-06-2012 |
20120310644 | INSERTION OF STANDARD TEXT IN TRANSCRIPTION - A computer program product, for automatically editing a medical record transcription, resides on a computer-readable medium and includes computer-readable instructions for causing a computer to obtain a first medical transcription of a dictation, the dictation being from medical personnel and concerning a patient, analyze the first medical transcription for presence of a first trigger phrase associated with a first standard text block, determine that the first trigger phrase is present in the first medical transcription if an actual phrase in the first medical transcription corresponds with the first trigger phrase, and insert the first standard text block into the first medical transcription. | 12-06-2012 |
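The trigger-phrase mechanism of 20120310644 detects a phrase in the transcription and inserts the associated standard text block. A minimal sketch; the trigger table and the choice to substitute the block in place of the trigger are my assumptions:

```python
# Sketch of trigger-phrase expansion in a medical transcription (20120310644).
# The trigger table and in-place substitution are illustrative assumptions.

STANDARD_TEXT = {
    "normal physical exam": ("Heart: regular rate and rhythm. "
                             "Lungs: clear to auscultation."),
}

def expand_triggers(transcription: str) -> str:
    """Replace each detected trigger phrase with its standard text block."""
    lowered = transcription.lower()
    for trigger, block in STANDARD_TEXT.items():
        i = lowered.find(trigger)
        if i != -1:
            transcription = (transcription[:i] + block
                             + transcription[i + len(trigger):])
            lowered = transcription.lower()
    return transcription
```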
20120310645 | INTEGRATION OF EMBEDDED AND NETWORK SPEECH RECOGNIZERS - A method, computer program product, and system are provided for performing a voice command on a client device. The method can include translating, using a first speech recognizer located on the client device, an audio stream of a voice command to a first machine-readable voice command and generating a first query result using the first machine-readable voice command to query a client database. In addition, the audio stream can be transmitted to a remote server device that translates the audio stream to a second machine-readable voice command using a second speech recognizer. Further, the method can include receiving a second query result from the remote server device, where the second query result is generated by the remote server device using the second machine-readable voice command and displaying the first query result and the second query result on the client device. | 12-06-2012 |
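The dual-recognizer flow of 20120310645 queries a fast on-device recognizer immediately and merges in the network result when it arrives. A sketch with stand-in recognizers (the recognizer outputs and merge policy are invented):

```python
# Sketch of combining embedded and network speech recognizers (20120310645).
# The stand-in recognizers and duplicate-collapsing merge are assumptions.

def embedded_recognize(audio: bytes) -> str:
    """Stand-in for the on-device recognizer (fast, smaller vocabulary)."""
    return "call mom"

def server_recognize(audio: bytes) -> str:
    """Stand-in for the remote recognizer (slower, usually more accurate)."""
    return "call tom"

def recognize(audio: bytes) -> list:
    """Return results in display order: the local result first, with the
    remote result appended once it arrives; duplicates are collapsed."""
    results = [embedded_recognize(audio)]
    remote = server_recognize(audio)
    if remote not in results:
        results.append(remote)
    return results
```

Displaying the local result immediately hides the network round trip; the merged list then shows both query results, as the abstract describes.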
20120316873 | METHOD OF PROVIDING INFORMATION AND MOBILE TELECOMMUNICATION TERMINAL THEREOF - A method of providing information of a mobile communication terminal, and a mobile communication terminal for performing the method, are provided. The method includes determining whether a search command event has been generated during a call with a counterpart terminal, converting a voice signal received from a microphone into a text when the generation of search command event is determined to have been generated, identifying information matching the text in a memory, and sending the information to the counterpart terminal. | 12-13-2012 |
20120316874 | RADIOLOGY VERIFICATION SYSTEM AND METHOD - A system and method of radiology verification is provided. The verification may be implemented as a standalone software utility, as part of a radiology imaging graphical user interface, or within a more complex computing system configured for generating radiology reports. | 12-13-2012 |
20120316875 | HOSTED SPEECH HANDLING - Embodiments of the invention provide systems and methods for speech signal handling. Speech handling according to one embodiment of the present invention can be performed via a hosted architecture. An electrical signal representing human speech can be analyzed with an Automatic Speech Recognizer (ASR) hosted on a different server from a media server or other server hosting a service utilizing speech input. Neither server need be located at the same location as the user. The spoken sounds can be accepted as input to and handled with a media server which identifies parts of the electrical signal that contain a representation of speech. This architecture can serve any user who has a web-browser and Internet access, either on a PC, PDA, cell phone, tablet, or any other computing device. | 12-13-2012 |
20120323572 | Document Extension in Dictation-Based Document Generation Workflow - An automatic speech recognizer is used to produce a structured document representing the contents of human speech. A best practice is applied to the structured document to produce a conclusion, such as a conclusion that required information is missing from the structured document. Content is inserted into the structured document based on the conclusion, thereby producing a modified document. The inserted content may be obtained by prompting a human user for the content and receiving input representing the content from the human user. | 12-20-2012 |
20120330658 | SYSTEMS AND METHODS TO PRESENT VOICE MESSAGE INFORMATION TO A USER OF A COMPUTING DEVICE - Systems and methods to process and/or present information relating to voice messages for a user that are received from other persons. In one embodiment, a method implemented in a data processing system includes: receiving first data associated with prior communications or activities for a first user on a mobile device; receiving a voice message for the first user; transcribing the voice message using the first data to provide a transcribed message; and sending the transcribed message to the mobile device for display to the user. | 12-27-2012 |
20120330659 | INFORMATION PROCESSING DEVICE, INFORMATION PROCESSING SYSTEM, INFORMATION PROCESSING METHOD, AND INFORMATION PROCESSING PROGRAM - An information processing device includes a display data creating unit configured to create display data including characters representing the content of an utterance based on a sound and a symbol surrounding the characters and indicating a first direction, and an image combining unit configured to determine the position of the display data based on a display position of an image representing a sound source of the utterance, and to combine the display data and the image of the sound source so that an orientation in which the sound is radiated is matched with the first direction. | 12-27-2012 |
20120330660 | Detecting and Communicating Biometrics of Recorded Voice During Transcription Process - A method and system for determining and communicating biometrics of a recorded speaker in a voice transcription process. An interactive voice response system receives a request from a user for a transcription of a voice file. A profile associated with the requesting user is obtained, wherein the profile comprises biometric parameters and preferences defined by the user. The requested voice file is analyzed for biometric elements according to the parameters specified in the user's profile. Responsive to detecting biometric elements in the voice file that conform to the parameters specified in the user's profile, a transcription output of the voice file is modified according to the preferences specified in the user's profile for the detected biometric elements to form a modified transcription output file. The modified transcription output file may then be provided to the requesting user. | 12-27-2012 |
20120330661 | Electronic Devices with Voice Command and Contextual Data Processing Capabilities - An electronic device may capture a voice command from a user. The electronic device may store contextual information about the state of the electronic device when the voice command is received. The electronic device may transmit the voice command and the contextual information to computing equipment such as a desktop computer or a remote server. The computing equipment may perform a speech recognition operation on the voice command and may process the contextual information. The computing equipment may respond to the voice command. The computing equipment may also transmit information to the electronic device that allows the electronic device to respond to the voice command. | 12-27-2012 |
20130006625 | EXTENDED VIDEOLENS MEDIA ENGINE FOR AUDIO RECOGNITION - A system, method, and computer program product for automatically analyzing multimedia data audio content are disclosed. Embodiments receive multimedia data, detect portions having specified audio features, and output a corresponding subset of the multimedia data and generated metadata. Audio content features, including voices, non-voice sounds, and closed captioning from downloaded or streaming movies or video clips, are identified much as a human would identify them, but in essentially real time. Particular speakers and the most meaningful content sounds and words and corresponding time-stamps are recognized via database comparison, and may be presented in order of match probability. Embodiments responsively pre-fetch related data, recognize locations, and provide related advertisements. The content features may be also sent to search engines so that further related content may be identified. User feedback and verification may improve the embodiments over time. | 01-03-2013 |
20130006626 | VOICE-BASED TELECOMMUNICATION LOGIN - A voice-based telecommunications login system which includes a login process controller; a speech recognition module; a speaker verification module; a speech synthesis module; and a user database. Responsive to a user-provided first verbal answer to a first verbal question, the first verbal answer is converted to text and compared with data previously stored in the user database. The speech synthesis module provides a second question to the user, and responsive to a user-provided second verbal answer to the second question, the speaker verification module compares the second verbal answer with a voice print of the user previously stored in the user database and validates that the second verbal answer matches a voice print of the user previously stored in the user database. Also disclosed is a method of logging in to the telecommunications system and a computer program product for logging in to the telecommunications system. | 01-03-2013 |
20130006627 | Method and System for Communicating Between a Sender and a Recipient Via a Personalized Message Including an Audio Clip Extracted from a Pre-Existing Recording - A method of communicating between a sender and a recipient via a personalized message is disclosed comprising: (a) identifying text, via the user interface of a communication device, of a desired lyric phrase from within a pre-existing audio recording; (b) extracting audio substantially associated with the desired lyric phrase from the pre-existing recording into a desired audio clip; (c) inputting personalized text via the user interface; (d) creating the personalized message with the sender identification, the personalized text and access to the desired audio clip; (e) sending an electronic message to the electronic address of the recipient, wherein the electronic message may be an SMS/EMS/MMS message, instant message or email message including a link to the personalized message or an EMS/MMS or email message including the personalized message. An associated method of earning money from the communication along with associated systems are also disclosed. | 01-03-2013 |
20130006628 | GENERATING REPRESENTATIONS OF GROUP INTERACTIONS - A transcript of a group interaction is generated from audio source data representing the group interaction. The transcript includes a sequence of lines of text, each line corresponding to an audible utterance in the audio source data. A conversation path is generated from the transcript by labeling each transcript line with an identifier identifying the speaker of the corresponding utterance in the audio source data. A representation of the group interaction is generated by associating the conversation path with a set of voice profiles, each voice profile corresponding to an identified speaker in the conversation path. | 01-03-2013 |
20130013305 | METHOD AND SUBSYSTEM FOR SEARCHING MEDIA CONTENT WITHIN A CONTENT-SEARCH SERVICE SYSTEM - Various embodiments of the present invention include concept-service components of content-search-service systems which employ ontologies and vocabularies prepared for particular categories of content at particular times in order to score transcripts prepared from content items to enable a search-service component of a content-search-service system to assign estimates of the relatedness of portions of a content item to search criteria in order to render search results to clients of the content-search-service system. The concept-service component processes a search request to generate lists of related terms, and then employs the lists of related terms to process transcripts in order to score transcripts based on information contained in the ontologies. | 01-10-2013 |
20130013306 | TRANSCRIPTION DATA EXTRACTION - A computer program product, for performing data determination from medical record transcriptions, resides on a computer-readable medium and includes computer-readable instructions for causing a computer to obtain a medical transcription of a dictation, the dictation being from medical personnel and concerning a patient, analyze the transcription for an indicating phrase associated with a type of data desired to be determined from the transcription, the type of desired data being relevant to medical records, determine whether data indicated by text disposed proximately to the indicating phrase is of the desired type, and store an indication of the data if the data is of the desired type. | 01-10-2013 |
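Indicating-phrase extraction as in 20130013306 looks for a phrase in the transcription and checks whether the proximate text is of the desired type. A sketch for one data type, blood pressure; the phrase, regex, and proximity window are my assumptions:

```python
# Sketch of indicating-phrase data extraction (20130013306): find text near
# the phrase "blood pressure" and verify it has the shape of a reading.
# The regex and 20-character proximity window are illustrative assumptions.

import re

def extract_blood_pressure(transcription: str):
    """Return (systolic, diastolic) found near 'blood pressure', else None."""
    m = re.search(r"blood pressure\D{0,20}?(\d{2,3})\s*(?:/|over)\s*(\d{2,3})",
                  transcription, re.IGNORECASE)
    if not m:
        return None
    return int(m.group(1)), int(m.group(2))
```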
20130013307 | DIFFERENTIAL DYNAMIC CONTENT DELIVERY WITH TEXT DISPLAY IN DEPENDENCE UPON SIMULTANEOUS SPEECH - Differential dynamic content delivery including providing a session document for a presentation, wherein the session document includes a session grammar and a session structured document; selecting from the session structured document a classified structural element in dependence upon user classifications of a user participant in the presentation; presenting the selected structural element to the user; streaming presentation speech to the user including individual speech from at least one user participating in the presentation; converting the presentation speech to text; detecting whether the presentation speech contains simultaneous individual speech from two or more users; and displaying the text if the presentation speech contains simultaneous individual speech from two or more users. | 01-10-2013 |
20130018655 | CONTINUOUS SPEECH TRANSCRIPTION PERFORMANCE INDICATION - A method of providing speech transcription performance indication includes receiving, at a user device, data representing text transcribed from an audio stream by an ASR system, and data representing a metric associated with the audio stream; displaying, via the user device, said text; and via the user device, providing, in user-perceptible form, an indicator of said metric. Another method includes displaying, by a user device, text transcribed from an audio stream by an ASR system; and via the user device, providing, in user-perceptible form, an indicator of a level of background noise of the audio stream. Another method includes receiving data representing an audio stream; converting said data representing an audio stream to text via an ASR system; determining a metric associated with the audio stream; transmitting data representing said text to a user device; and transmitting data representing said metric to the user device. | 01-17-2013 |
20130018656 | FILTERING TRANSCRIPTIONS OF UTTERANCES - A method for facilitating mobile phone messaging, such as text messaging and instant messaging, includes receiving audio data communicated from the mobile communication device, the audio data representing an utterance that is intended to be at least a portion of the text of the message that is to be sent from the mobile phone to a recipient; transcribing the utterance to text based on the received audio data to generate a transcription; and applying a filter to the transcribed text to generate a filtered transcription, the text of which is intended to mimic language patterns of mobile device messaging that is performed manually by users. The method may also be applied to the audio data of a voicemail, with the filtered, transcribed text being communicated to a mobile phone as, for example, an SMS text message. | 01-17-2013 |
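The filter in 20130018656 rewrites a formal transcription to mimic the language patterns of manual texting. A sketch using a substitution table; the table entries and longest-phrase-first ordering are invented for illustration:

```python
# Sketch of a texting-style transcription filter (20130018656).
# The substitution table is an invented example.

TEXTING = {"see you": "cu", "tonight": "2nite", "you": "u", "are": "r"}

def texting_filter(transcript: str) -> str:
    """Lowercase the transcript and apply texting-style substitutions,
    longest phrases first so 'see you' wins over the shorter 'you'."""
    text = " ".join(transcript.lower().split())
    for phrase in sorted(TEXTING, key=len, reverse=True):
        text = text.replace(phrase, TEXTING[phrase])
    return text
```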
20130024195 | CORRECTIVE FEEDBACK LOOP FOR AUTOMATED SPEECH RECOGNITION - A method for facilitating the updating of a language model includes receiving, at a client device, via a microphone, an audio message corresponding to speech of a user; communicating the audio message to a first remote server; receiving, at the client device, a result transcribed from the audio message at the first remote server using an automatic speech recognition system (“ASR”); receiving, at the client device from the user, an affirmation of the result; storing, at the client device, the result in association with an identifier corresponding to the audio message; and communicating, to a second remote server, the stored result together with the identifier. | 01-24-2013 |
20130030804 | SYSTEMS AND METHODS FOR IMPROVING THE ACCURACY OF A TRANSCRIPTION USING AUXILIARY DATA SUCH AS PERSONAL DATA - A method is described for improving the accuracy of a transcription generated by an automatic speech recognition (ASR) engine. A personal vocabulary is maintained that includes replacement words. The replacement words in the personal vocabulary are obtained from personal data associated with a user. A transcription is received of an audio recording. The transcription is generated by an ASR engine using an ASR vocabulary and includes a transcribed word that represents a spoken word in the audio recording. Data is received that is associated with the transcribed word. A replacement word from the personal vocabulary is identified, which is used to re-score the transcription and replace the transcribed word. | 01-31-2013 |
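The personal-vocabulary correction of 20130030804 replaces a transcribed word with a close match from the user's personal data. The sketch below stands in edit distance for whatever similarity score the engine actually uses; the threshold is also an assumption:

```python
# Sketch of personal-vocabulary replacement (20130030804). Edit distance
# stands in for the real similarity score; the threshold is an assumption.

def edit_distance(a: str, b: str) -> int:
    """Standard Levenshtein distance via dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,            # deletion
                           cur[j - 1] + 1,         # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

def personalize(word: str, personal_vocab: list, max_dist: int = 2) -> str:
    """Replace `word` with the closest personal word, if close enough."""
    best = min(personal_vocab, key=lambda w: edit_distance(word, w))
    return best if edit_distance(word, best) <= max_dist else word
```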
20130030805 | TRANSCRIPTION SUPPORT SYSTEM AND TRANSCRIPTION SUPPORT METHOD - According to one embodiment, a transcription support system supports transcription work to convert voice data to text. The system includes a first storage unit configured to store therein the voice data; a playback unit configured to play back the voice data; a second storage unit configured to store therein voice indices, each of which associates a character string obtained from a voice recognition process with voice positional information, for which the voice positional information is indicative of a temporal position in the voice data and corresponds to the character string; a text creating unit that creates the text in response to an operation input of a user; and an estimation unit configured to estimate already-transcribed voice positional information indicative of a position at which the creation of the text is completed in the voice data based on the voice indices. | 01-31-2013 |
20130030806 | TRANSCRIPTION SUPPORT SYSTEM AND TRANSCRIPTION SUPPORT METHOD - In an embodiment, a transcription support system includes: a first storage, a playback unit, a second storage, a text creating unit, an estimating unit, and a setting unit. The first storage stores the voice data therein; a playback unit plays back the voice data; and a second storage stores voice indices, each of which associates a character string obtained from a voice recognition process with voice positional information, for which the voice positional information is indicative of a temporal position in the voice data and corresponds to the character string. The text creating unit creates text; the estimating unit estimates already-transcribed voice positional information based on the voice indices; and the setting unit sets a playback starting position that indicates a position at which playback is started in the voice data based on the already-transcribed voice positional information. | 01-31-2013 |
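The estimation step shared by 20130030805 and 20130030806 can be sketched as follows: check which voice-index strings already appear in the typed text and take the latest end time among them as the already-transcribed position. The index format is an assumption:

```python
# Sketch of estimating the already-transcribed position from voice indices
# (20130030805 / 20130030806). The (string, end_time) index format is an
# illustrative assumption.

def estimate_position(text: str, voice_indices: list) -> float:
    """voice_indices: (recognized_string, end_time_seconds) pairs in voice
    order. Return the end time of the latest index string the transcriber
    has already typed; playback can resume from there."""
    position = 0.0
    for string, end_time in voice_indices:
        if string in text:
            position = max(position, end_time)
    return position
```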
20130030807 | WIRELESS SPEECH RECOGNITION TOOL - The wireless voice recognition system for data retrieval comprises a server, a database and an input/output device, operably connected to the server. When the user speaks, the voice transmission is converted into a data stream using a specialized user interface. The input/output device and the server exchange the data stream. The server uses a programming interface having an engine to match and compare the stream of audible data to a data element of selected searchable information. A data element of recognized information is generated and transferred to the input/output device for user verification. | 01-31-2013 |
20130035936 | LANGUAGE TRANSCRIPTION - A transcription system is applicable to transcription for a language in which there is limited pronunciation and/or acoustic data. A transcription station is configured using pronunciation data and acoustic data for use with the language. The pronunciation data and/or the acoustic data is initially from another dialect of a language, another language from a language group, or is universal (e.g., not specific to any particular language). A partial transcription of the audio recording is accepted via the transcription station (e.g., from a transcriptionist). One or more repetitions of one or more portions of the partial transcription are identified in the audio recording, and can be accepted during transcription. The pronunciation data and/or the acoustic data is updated in a bootstrapping manner during transcription, thereby improving the efficiency of the transcription process. | 02-07-2013 |
20130035937 | System And Method For Efficiently Transcribing Verbal Messages To Text - A system and method for efficiently transcribing verbal messages to text is provided. Verbal messages are received and at least one of the verbal messages is divided into segments. Automatically recognized text is determined for each of the segments by performing speech recognition and a confidence rating is assigned to the automatically recognized text for each segment. A threshold is applied to the confidence ratings and those segments with confidence ratings that fall below the threshold are identified. The segments that fall below the threshold are assigned to one or more human agents, starting with those segments that have the lowest confidence ratings. Transcription is received from each human agent for the segments assigned to that agent. The transcription is assembled with the automatically recognized text of the segments not assigned to the human agents as a text message for the at least one verbal message. | 02-07-2013 |
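The hybrid routing in 20130035937 can be sketched directly: segments whose confidence falls below the threshold go to human agents, lowest confidence first, and the final message merges agent transcriptions with the automatic text in message order. The data shapes below are assumptions:

```python
# Sketch of confidence-threshold routing and reassembly (20130035937).
# Segment records and the return shapes are illustrative assumptions.

def route_segments(segments: list, threshold: float):
    """segments: [{'text': str, 'conf': float}, ...] in message order.
    Returns (auto_indices, human_queue): indices kept automatic, and the
    human-review queue ordered lowest-confidence first."""
    auto = [i for i, s in enumerate(segments) if s["conf"] >= threshold]
    humans = sorted((i for i, s in enumerate(segments)
                     if s["conf"] < threshold),
                    key=lambda i: segments[i]["conf"])
    return auto, humans

def assemble(segments: list, human_text: dict) -> str:
    """Merge agent transcriptions back into the message, in segment order."""
    return " ".join(human_text.get(i, s["text"])
                    for i, s in enumerate(segments))
```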
20130041661 | AUDIO COMMUNICATION ASSESSMENT - A device may include a communication interface configured to receive audio signals associated with audible communications from a user; an output device; and logic. The logic may be configured to determine one or more audio qualities associated with the audio signals, map the one or more audio qualities to at least one value, generate audio-related information based on the mapping, and provide, via the output device during the audible communications, the audio-related information to the user. | 02-14-2013 |
20130041662 | SYSTEM AND METHOD OF CONTROLLING SERVICES ON A DEVICE USING VOICE DATA - A device and method to control applications using voice data. In one embodiment, a method includes detecting voice data from a user, converting the voice data to text data, matching the text data to an identifier, the identifier being associated with a list of identifiers for controlling operation of the application, and controlling the application based on the identifier matched with the text data. In another embodiment, voice data may be received from a control device. | 02-14-2013 |
20130041663 | COMMUNICATION APPLICATION FOR CONDUCTING CONVERSATIONS INCLUDING MULTIPLE MEDIA TYPES IN EITHER A REAL-TIME MODE OR A TIME-SHIFTED MODE - A communication application configured to support a conversation among participants over a communication network. The communication application is configured to (i) support one or more media types within the context of the conversation, (ii) interleave the one or more media types in a time-indexed order within the context of the conversation, (iii) enable the participants to render the conversation including the interleaved one or more media types in either a real-time rendering mode or time-shifted rendering mode, and (iv) seamlessly transition the conversation between the two modes so that the conversation may take place substantially live when in the real-time rendering mode or asynchronously when in the time-shifted rendering mode. | 02-14-2013 |
20130041664 | Method and Apparatus for Annotating Video Content With Metadata Generated Using Speech Recognition Technology - A method and apparatus is provided for annotating video content with metadata generated using speech recognition technology. The method begins by rendering video content on a display device. A segment of speech is received from a user such that the speech segment annotates a portion of the video content currently being rendered. The speech segment is converted to a text-segment and the text-segment is associated with the rendered portion of the video content. The text segment is stored in a selectively retrievable manner so that it is associated with the rendered portion of the video content. | 02-14-2013 |
20130046537 | Systems and Methods for Providing an Electronic Dictation Interface - Some embodiments disclosed herein store a target application and a dictation application. The target application may be configured to receive input from a user. The dictation application interface may include a full overlay mode option, where in response to selection of the full overlay mode option, the dictation application interface is automatically sized and positioned over the target application interface to fully cover a text area of the target application interface to appear as if the dictation application interface is part of the target application interface. The dictation application may be further configured to receive an audio dictation from the user, convert the audio dictation into text, provide the text in the dictation application interface and, in response to receiving a first user command to complete the dictation, automatically copy the text from the dictation application interface and insert the text into the target application interface. | 02-21-2013 |
20130046538 | VISUALIZATION INTERFACE OF CONTINUOUS WAVEFORM MULTI-SPEAKER IDENTIFICATION - A method implemented in a computer infrastructure having computer executable code having programming instructions tangibly embodied on a computer readable storage medium. The programming instructions are operable to receive a current waveform of a communication between a plurality of participants. Additionally, the programming instructions are operable to create a voiceprint from the current waveform if the current waveform is of a human voice. Furthermore, the programming instructions are operable to determine one of whether a match exists between the voiceprint and one library waveform of one or more library waveforms, whether a correlation exists between the voiceprint and a number of library waveforms of the one or more library waveforms and whether the voiceprint is unique. Additionally, the programming instructions are operable to transcribe the current waveform into text and provide a match indication display (MID) indicating an association between the current waveform and the one or more library waveforms based on the determining. | 02-21-2013 |
20130054237 | COMMUNICATIONS SYSTEM WITH SPEECH-TO-TEXT CONVERSION AND ASSOCIATED METHODS - A communications system includes a first communications device cooperating with a second communications device. The first communications device multiplexes a digital speech message and a corresponding text message into a multiplexed signal, and wirelessly transmits the multiplexed signal. The second communications device wirelessly receives the multiplexed signal, de-multiplexes the multiplexed signal into the digital speech message and the corresponding text message, decodes the speech message for an audio output transducer, and operates a text processor on the corresponding text message for display. The corresponding text message is displayed in synchronization with the speech message output by the audio output transducer. A memory is coupled to the text processor for storing the text message, and the text processor is configured to display the stored text message. | 02-28-2013 |
20130054238 | Using Multiple Modality Input to Feedback Context for Natural Language Understanding - Input context for a statistical dialog manager may be provided. Upon receiving a spoken query from a user, the query may be categorized according to at least one context clue. The spoken query may then be converted to text according to a statistical dialog manager associated with the category of the query and a response to the spoken query may be provided to the user. | 02-28-2013 |
20130054239 | System, Method and Computer Program Product for Dataset Authoring and Presentation with Timer and Randomizer - A system, method and computer program product for authoring and presenting discrete data elements and datasets on any computing device are described. Said datasets can comprise typed, entered or speech-converted text, numbers, images, and sounds. Said system and method feature a user-controlled timer that can be set in intervals of one or more milliseconds and can be used to display said data elements in said dataset in succession. Another feature described is a randomizer which can present said data elements in said dataset in an unpredictable and random order. | 02-28-2013 |
20130054240 | APPARATUS AND METHOD FOR RECOGNIZING VOICE BY USING LIP IMAGE - An apparatus and a method for recognizing a voice by using a lip image are provided. The apparatus includes: a voice recognizer which recognizes a voice of a user and outputs text information based on the recognized voice; a lip shape detector which detects a lip shape of the user; and a voice recognition result verifier which determines whether the text information output by the voice recognizer is correct, by using a result of the detection by the lip shape detector. | 02-28-2013 |
20130054241 | RAPID TRANSCRIPTION BY DISPERSING SEGMENTS OF SOURCE MATERIAL TO A PLURALITY OF TRANSCRIBING STATIONS - A method and system for producing and working with transcripts according to the invention eliminates time inefficiencies. By dispersing a source recording to a transcription team in small segments, so that team members transcribe segments in parallel, a rapid transcription process delivers a fully edited transcript within minutes. Clients can view accurate, grammatically correct, proofread and fact-checked documents that shadow live proceedings by mere minutes. The rapid transcript includes time coding, speaker identification and summary. A viewer application allows a client to view a video recording side-by-side with a transcript. Clicking on a word in the transcript locates the corresponding recorded content; advancing a recording to a particular point locates and displays the corresponding spot in the transcript. The recording is viewed using common video features, and may be downloaded. The client can edit the transcript and insert comments. Any number of colleagues can view and edit simultaneously. | 02-28-2013 |
20130060568 | OBSERVATION PLATFORM FOR PERFORMING STRUCTURED COMMUNICATIONS - Using structured communications within an organization or retail environment, the users establish a fabric of communications that allows external users of devices or applications to integrate in a way that is non-disruptive, measured and structured. An observation platform may be used for performing structured communications. A signal is received from a first communication device at a second communication device associated with a computer system, wherein the computer system is associated with an organization, wherein a first characteristic of the signal corresponds to an audible source and a second characteristic of the signal corresponds to information indicative of a geographic position of the first communication device. | 03-07-2013 |
20130066630 | AUDIO TRANSCRIPTION GENERATOR AND EDITOR - A system for correcting errors in automatically generated audio transcriptions includes an audio recorder, a computerized transcription generator, a voice recording, a collection of link data, transcription text, an audio player, a system of cross linking, and a text editor including a text display with a cursor. The system permits a user to correct transcription errors using techniques of jump to position; show position; and track playback. | 03-14-2013 |
20130080162 | User Query History Expansion for Improving Language Model Adaptation - Query history expansion may be provided. Upon receiving a spoken query from a user, an adapted language model may be applied to convert the spoken query to text. The adapted language model may comprise a plurality of queries interpolated from the user's previous queries and queries associated with other users. The spoken query may be executed and the results of the spoken query may be provided to the user. | 03-28-2013 |
20130080163 | INFORMATION PROCESSING APPARATUS, INFORMATION PROCESSING METHOD AND COMPUTER PROGRAM PRODUCT - According to an embodiment, an information processing apparatus includes a storage unit, a detector, an acquisition unit, and a search unit. The storage unit is configured to store therein voice indices, each of which associates a character string included in voice text data obtained from a voice recognition process with voice positional information, the voice positional information indicating a temporal position in the voice data and corresponding to the character string. The acquisition unit acquires reading information being at least a part of a character string representing a reading of a phrase to be transcribed from the voice data played back. The search unit specifies, as search targets, character strings whose associated voice positional information is included in the played-back section information among the character strings included in the voice indices, and retrieves a character string including the reading represented by the reading information from among the specified character strings. | 03-28-2013 |
20130080164 | Selective Feedback For Text Recognition Systems - This specification describes technologies relating to recognition of text in various media. In general, one aspect of the subject matter described in this specification can be embodied in methods that include receiving an input signal including data representing one or more words and passing the input signal to a text recognition system that generates a recognized text string based on the input signal. The methods may further include receiving the recognized text string from the text recognition system. The methods may further include presenting the recognized text string to a user and receiving a corrected text string based on input from the user. The methods may further include checking if an edit distance between the corrected text string and the recognized text string is below a threshold. If the edit distance is below the threshold, the corrected text string may be passed to the text recognition system for training purposes. | 03-28-2013 |
20130085754 | Interactive Text Editing - A method for providing suggestions includes capturing audio that includes speech and receiving textual content from a speech recognition engine. The speech recognition engine performs speech recognition on the audio signal to obtain the textual content, which includes one or more passages. The method also includes receiving a selection of a portion of a first word in a passage in the textual content, wherein the passage includes multiple words, and retrieving a set of suggestions that can potentially replace the first word. At least one suggestion from the set of suggestions provides a multi-word suggestion for potentially replacing the first word. The method further includes displaying, on a display device, the set of suggestions, and highlighting a portion of the textual content, as displayed on the display device, for potentially changing to one of the suggestions from the set of suggestions. | 04-04-2013 |
20130085755 | Systems And Methods For Continual Speech Recognition And Detection In Mobile Computing Devices - The present application describes systems, articles of manufacture, and methods for continuous speech recognition for mobile computing devices. One embodiment includes determining whether a mobile computing device is receiving operating power from an external power source or a battery power source, and activating a trigger word detection subroutine in response to determining that the mobile computing device is receiving power from the external power source. In some embodiments, the trigger word detection subroutine operates continually while the mobile computing device is receiving power from the external power source. The trigger word detection subroutine includes determining whether a plurality of spoken words received via a microphone includes one or more trigger words, and in response to determining that the plurality of spoken words includes at least one trigger word, launching an application corresponding to the at least one trigger word included in the plurality of spoken words. | 04-04-2013 |
20130090924 | DEVICE, SYSTEM AND METHOD FOR ENABLING SPEECH RECOGNITION ON A PORTABLE DATA DEVICE - Devices, systems and methods for converting analog data to digital data or digital data to analog data for enabling speech recognition processing on a portable data device are provided. The system includes at least one portable data device including an input module configured to receive analog audio signals; a processing module configured to convert the analog audio signals to digital audio data; a communication module configured to transmit the digital audio data to a remote processor and to receive digital text data from the remote processor; and a display module for displaying the received digital text data; the remote processor configured for receiving digital audio data, converting the digital audio data to digital text data and transmitting the converted digital text data to the at least one portable data device; and a communications network for coupling the remote processor to the at least one portable data device. | 04-11-2013 |
20130096916 | MULTICHANNEL DEVICE UTILIZING A CENTRALIZED OUT-OF-BAND AUTHENTICATION SYSTEM (COBAS) - A multichannel security system is disclosed, which system is for granting and denying access to a host computer in response to a demand from an access-seeking individual and computer. The access-seeker has a peripheral device operative within an authentication channel to communicate with the security system. The access-seeker initially presents identification and password data over an access channel which is intercepted and transmitted to the security computer. The security computer then communicates with the access-seeker. A biometric analyzer—a voice or fingerprint recognition device—operates upon instructions from the authentication program to analyze the monitored parameter of the individual. In the security computer, a comparator matches the biometric sample with stored data, and, upon obtaining a match, provides authentication. The security computer instructs the host computer to grant access and communicates the same to the access-seeker, whereupon access is initiated over the access channel. | 04-18-2013 |
20130103399 | DETERMINING AND CONVEYING CONTEXTUAL INFORMATION FOR REAL TIME TEXT - Aspects relate to machine recognition of human voices in live or recorded audio content, and delivering text derived from such live or recorded content as real time text, with contextual information derived from characteristics of the audio. For example, volume information can be encoded as larger and smaller font sizes. Speaker changes can be detected and indicated through text additions, or color changes to the font. A variety of other context information can be detected and encoded in graphical rendition commands available through RTT, or by extending the information provided with RTT packets, and processing that extended information accordingly for modifying the display of the RTT text content. | 04-25-2013 |
20130103400 | Document Transcription System Training - A system is provided for training an acoustic model for use in speech recognition. In particular, such a system may be used to perform training based on a spoken audio stream and a non-literal transcript of the spoken audio stream. Such a system may identify text in the non-literal transcript which represents concepts having multiple spoken forms. The system may attempt to identify the actual spoken form in the audio stream which produced the corresponding text in the non-literal transcript, and thereby produce a revised transcript which more accurately represents the spoken audio stream. The revised, and more accurate, transcript may be used to train the acoustic model, thereby producing a better acoustic model than that which would be produced using conventional techniques, which perform training based directly on the original non-literal transcript. | 04-25-2013 |
20130103401 | METHOD AND SYSTEM FOR SPEECH BASED DOCUMENT HISTORY TRACKING - A method and a system of history tracking corrections in a speech based document are disclosed. The speech based document comprises one or more sections of text recognized or transcribed from sections of speech, wherein the sections of speech are dictated by a user and processed by a speech recognizer in a speech recognition system into corresponding sections of text of the speech based document. The method comprises associating at least one speech attribute ( | 04-25-2013 |
20130110509 | DISTRIBUTED USER INPUT TO TEXT GENERATED BY A SPEECH TO TEXT TRANSCRIPTION SERVICE | 05-02-2013 |
20130110510 | NATURAL LANGUAGE CALL ROUTER | 05-02-2013 |
20130117018 | VOICE CONTENT TRANSCRIPTION DURING COLLABORATION SESSIONS - A method, computer program product, and system for voice content transcription during collaboration sessions is described. A method may comprise receiving an indication to provide one or more real-time voice content-to-text content transcriptions to a first collaboration session participant. The one or more real-time voice content-to-text content transcriptions may correspond to voice content of a second collaboration session participant in one or more collaboration sessions including the first collaboration session participant and the second collaboration session participant. The method may additionally comprise defining a preference for the first collaboration session participant to receive the one or more real-time voice content-to-text content transcriptions corresponding to the voice content of the second collaboration session participant in the one or more collaboration sessions including the first collaboration session participant and the second collaboration session participant based upon, at least in part, the indication. | 05-09-2013 |
20130117019 | Remote Laboratory Gateway - A remote laboratory gateway enables a plurality of students to access and control a laboratory experiment remotely. Access is provided by an experimentation gateway, which is configured to provide secure access to the experiment via a network-centric, web-enabled interface graphical user interface. Experimental hardware is directly controlled by an experiment controller, which is communicatively coupled to the experimentation gateway and which may be a software application, a standalone computing device, or a virtual machine hosted on the experimentation gateway. The remote laboratory of the present specification may be configured for a software-as-a-service business model. | 05-09-2013 |
20130117020 | PERSONALIZED ADVERTISEMENT DEVICE BASED ON SPEECH RECOGNITION SMS SERVICE, AND PERSONALIZED ADVERTISEMENT EXPOSURE METHOD BASED ON SPEECH RECOGNITION SMS SERVICE - Disclosed are a personalized advertisement device based on speech recognition SMS services and a personalized advertisement exposure method based on speech recognition SMS services. The present invention provides a personalized advertisement device based on speech recognition SMS services and a personalized advertisement exposure method based on speech recognition SMS services capable of maximizing the effect of advertisement by grasping a user's intention, emotional state, and positional information from speech data uttered by the user during a process of providing speech recognition SMS services, configuring advertisements based thereon, and exposing the configured advertisements to the user. | 05-09-2013 |
20130117021 | MESSAGE AND VEHICLE INTERFACE INTEGRATION SYSTEM AND METHOD - A method and system uses an integration application to extract an information feature from a message and to provide the information feature to a vehicle interface device which acts on the information feature to provide a service. The extracted information feature may be automatically acted upon, or may be outputted for review, editing, and/or selection before being acted on. The vehicle interface device may include a navigation system, infotainment system, telephone, and/or a head unit. The message may be received by the vehicle interface device or from a portable or remote device in linked communication with the vehicle interface device. The message may be a voice-based or text-based message. The service may include placing a call, sending a message, or providing navigation instructions using the information feature. An off-board or back-end service provider in communication with the integration application may extract and/or transcribe the information feature and/or provide a service. | 05-09-2013 |
20130117022 | Personalized Vocabulary for Digital Assistant - Methods, systems, and computer readable storage medium related to operating an intelligent digital assistant are disclosed. A text string is obtained from a speech input received from a user. The received text string is interpreted to derive a representation of user intent based at least in part on a plurality of words associated with a user and stored in memory associated with the user, the plurality of words including words from a plurality of user interactions with an automated assistant. At least one domain, a task, and at least one parameter for the task, are identified based at least in part on the representation of user intent. The identified task is performed. An output is provided to the user, where the output is related to the performance of the task. | 05-09-2013 |
20130124202 | METHOD AND APPARATUS FOR PROCESSING SCRIPTS AND RELATED DATA - Provided in some embodiments is a method including receiving ordered script words indicative of dialogue words to be spoken, receiving audio data corresponding to at least a portion of the dialogue words to be spoken and including timecodes associated with dialogue words, generating a matrix of the ordered script words versus the dialogue words, aligning the matrix to determine hard-alignment points that include matching consecutive sequences of ordered script words with corresponding sequences of dialogue words, partitioning the matrix of ordered script words into sub-matrices bounded by adjacent hard-alignment points and including corresponding sub-sets of the script and dialogue words between the hard-alignment points, and aligning each of the sub-matrices. The alignment of the sub-matrices including: matching script and dialogue words of the sub-sets, assigning timecodes for matched ordered script words, and interpolating timecodes for the unmatched script words based on the timecodes of the matched script words. | 05-16-2013 |
20130124203 | Aligning Scripts To Dialogues For Unmatched Portions Based On Matched Portions - Provided in some embodiments is a computer implemented method that includes providing script data including script words indicative of dialogue words to be spoken, providing recorded dialogue audio data corresponding to at least a portion of the dialogue words to be spoken, wherein the recorded dialogue audio data includes timecodes associated with recorded audio dialogue words, matching at least some of the script words to corresponding recorded audio dialogue words to determine alignment points, determining that a set of unmatched script words are accurate based on the matching of at least some of the script words matched to corresponding recorded audio dialogue words, generating time-aligned script data including the script words and their corresponding timecodes and the set of unmatched script words determined to be accurate based on the matching of at least some of the script words matched to corresponding recorded audio dialogue words. | 05-16-2013 |
20130124204 | Displaying Sound Indications On A Wearable Computing System - Example methods and systems for displaying one or more indications that indicate (i) the direction of a source of sound and (ii) the intensity level of the sound are disclosed. A method may involve receiving audio data corresponding to sound detected by a wearable computing system. Further, the method may involve analyzing the audio data to determine both (i) a direction from the wearable computing system of a source of the sound and (ii) an intensity level of the sound. Still further, the method may involve causing the wearable computing system to display one or more indications that indicate (i) the direction of the source of the sound and (ii) the intensity level of the sound. | 05-16-2013 |
20130132079 | INTERACTIVE SPEECH RECOGNITION - A first plurality of audio features associated with a first utterance may be obtained. A first text result associated with a first speech-to-text translation of the first utterance may be obtained based on an audio signal analysis associated with the audio features, the first text result including at least one first word. A first set of audio features correlated with at least a first portion of the first speech-to-text translation associated with the at least one first word may be obtained. A display of at least a portion of the first text result that includes the at least one first word may be initiated. A selection indication may be received, indicating an error in the first speech-to-text translation, the error associated with the at least one first word. | 05-23-2013 |
20130132080 | SYSTEM AND METHOD FOR CROWD-SOURCED DATA LABELING - Disclosed herein are systems, methods, and non-transitory computer-readable storage media for crowd-sourced data labeling. The system requests a respective response from each of a set of entities. The set of entities includes crowd workers. Next, the system incrementally receives a number of responses from the set of entities until at least one of an accuracy threshold is reached and m responses are received, wherein the accuracy threshold is based on characteristics of the number of responses. Finally, the system generates an output response based on the number of responses. | 05-23-2013 |
20130132081 | CONTENTS PROVIDING SCHEME USING SPEECH INFORMATION - An apparatus for providing contents based on speech information is provided. The apparatus includes a speech information reception unit configured to receive speech information from a first device, a device identification unit configured to receive device information of the first device from the first device and identify the first device based on the received device information, a speech information translation unit configured to translate the speech information into text information according to the received device information, and a contents provision unit configured to search for contents based on the translated text information, and provide the searched contents to a second device. | 05-23-2013 |
20130138438 | SYSTEMS AND METHODS FOR CAPTURING, PUBLISHING, AND UTILIZING METADATA THAT ARE ASSOCIATED WITH MEDIA FILES - Systems for recording, searching for, and obtaining metadata that are relevant to a plurality of media files are disclosed. The systems generally include a server that is configured to receive, index, and store a plurality of media files, which are received by the server from a plurality of sources, within at least one database in communication with the server. In addition, the server is configured to make one or more of the media files accessible to, and searchable by, one or more persons other than the original sources of such media files. Still further, the server is configured to display metadata that are associated with each media file. Such metadata may include links to one or more profile pages that are published within one or more social networks, with each of such profile pages being correlated with a unique voice signature that is detected within each media file. In addition, these metadata may include a geographical area from which each media file is provided to the server; a date on which each media file was created; a popularity index that is assigned to each media file; one or more theme categories that are assigned to each media file; or combinations of the above. | 05-30-2013 |
20130144619 | ENHANCED VOICE CONFERENCING - Techniques for ability enhancement are described. Some embodiments provide an ability enhancement facilitator system (“AEFS”) configured to enhance voice conferencing among multiple speakers. In one embodiment, the AEFS receives data that represents utterances of multiple speakers who are engaging in a voice conference with one another. The AEFS then determines speaker-related information, such as by identifying a current speaker, locating an information item (e.g., an email message, document) associated with the speaker, or the like. The AEFS then informs a user of the speaker-related information, such as by presenting the speaker-related information on a display of a conferencing device associated with the user. | 06-06-2013 |
20130151250 | HYBRID SPEECH RECOGNITION - Described is a technology by which speech is locally and remotely recognized in a hybrid way. Speech is input and recognized locally, with remote recognition invoked if locally recognized speech data was not confidently recognized. The part of the speech that was not confidently recognized is sent to the remote recognizer, along with any confidently recognized text, which the remote recognizer may use as context data in interpreting the part of the speech data that was sent. Alternative text candidates may be sent instead of corresponding speech to the remote recognizer. | 06-13-2013 |
20130151251 | AUTOMATIC DIALOG REPLACEMENT BY REAL-TIME ANALYTIC PROCESSING - An automated method and apparatus for automatic dialog replacement having an optional I/O interface converts an A/V stream into a format suitable for automated processing. The I/O interface feeds the A/V stream to a dubbing engine for generating new dubbed dialog from said A/V stream. A dubber/slicer replaces the original dialog with the new dubbed dialog in the A/V stream. The I/O interface then transmits the A/V stream that is enhanced with a new dubbed dialog. | 06-13-2013 |
20130158991 | METHODS AND SYSTEMS FOR COMMUNICATING AUDIO CAPTURED ONBOARD AN AIRCRAFT - Methods and systems are provided for communicating information from an aircraft to a computer system at a ground location. One exemplary method involves obtaining an audio input from an audio input device onboard the aircraft, generating text data comprising a textual representation of the one or more words of the audio input, and communicating the text data to the computer system at the ground location. | 06-20-2013 |
20130158992 | SPEECH PROCESSING SYSTEM AND METHOD - An exemplary speech processing method includes extracting voice features from the stored audio files. Next, the method extracts the speech of a speaker from one or more audio files that contain voice features matching a selected voice model, to form a single audio file, implements a speech-to-text algorithm to create a textual file based on the single audio file, and further records time points. The method then associates each of the words in the converted text with a corresponding recorded time point. Next, the method searches for an input keyword in the converted textual file. The method further obtains the time point associated with the word first appearing in the textual file that matches the keyword, and further controls an audio play device to play the single audio file at the determined time point. | 06-20-2013 |
20130158993 | Audio User Interface With Audio Cursor - An audio user interface is provided in which items are represented in an audio field by corresponding synthesized sound sources from where sounds related to the items appear to emanate. An audio cursor, in the form of a synthesized sound source from which a distinctive cursor sound emanates, is movable in the audio field under user control. Upon the cursor being moved close to an item-representing sound source, a related audible indication is generated by modifying the sounds emanating from at least one of that item-representing sound source and the cursor. In one embodiment, this audible indication also indicates the current distance between the cursor and item-representing sound source and also the direction of the latter from the cursor. | 06-20-2013 |
20130158994 | RETRIEVAL AND PRESENTATION OF NETWORK SERVICE RESULTS FOR MOBILE DEVICE USING A MULTIMODAL BROWSER - A method of obtaining information using a mobile device can include receiving a request including speech data from the mobile device, and querying a network service using query information extracted from the speech data, whereby search results are received from the network service. The search results can be formatted for presentation on a display of the mobile device. The search results further can be sent, along with a voice grammar generated from the search results, to the mobile device. The mobile device then can render the search results. | 06-20-2013 |
20130158995 | METHODS AND APPARATUSES RELATED TO TEXT CAPTION ERROR CORRECTION - Systems and methods related to providing error correction in a text caption are disclosed. A method may comprise displaying a text caption including one or more blocks of text on each of a first device and a second device remote from the first device. The method may also include generating another block of text and replacing a block of text of the text caption with the another block of text. Furthermore, the method may include displaying the text caption on the second device having the block of text of the first text caption replaced by the another block of text. | 06-20-2013 |
20130166292 | Accessing Content Using a Source-Specific Content-Adaptable Dialogue - A system for accessing content maintains a set of content selections associated with a first user. The system receives first original content from a first content source associated with a first one of the content selections associated with the first user. The system applies, to the first original content, a first rule (such as a parsing rule) that is specific to the first one of the content selections, to produce first derived content. The system changes the state of at least one component of a human-machine dialogue system (such as a text-to-act engine, a dialogue manager, or an act-to-text engine) based on the first derived content. The system may apply a second rule (such as a dialogue rule) to the first derived content to produce rule output and change the state of the human-machine dialogue system based on the rule output. | 06-27-2013 |
20130166293 | AUTOMATIC DISCLOSURE DETECTION - A method of detecting pre-determined phrases to determine compliance quality is provided. The method includes determining whether at least one of an event or a precursor event has occurred based on a comparison between pre-determined phrases and a communication between a sender and a recipient in a communications network, and rating the recipient based on the presence of the pre-determined phrases associated with the event or the presence of the pre-determined phrases associated with the precursor event in the communication. | 06-27-2013 |
20130173265 | Speech-to-online-text system - Speech-to-text software, sometimes known as dictation software, lets you talk to the computer in some form and have the computer react appropriately to what you are saying. This is entirely different from text-to-speech software, which reads out text already in the computer. Speech-to-online-text software allows you to speak words into the webpage of an Internet-capable device, and also supports the capabilities provided by speech-to-text software. The hardware required to support this technology is an Internet-capable device and a compatible microphone. This capability will be especially useful for communicating in different languages and dialects around the world. | 07-04-2013 |
20130179165 | DYNAMIC PRESENTATION AID - Performing operations for dynamic display element management. The operations include receiving a verbal input. The operations also include automatically obtaining a display element from an element repository. The display element is a graphical representation of at least a portion of the verbal input. The display element includes a graphical image having a plurality of characteristics. The operations also include evaluating at least one of the plurality of characteristics relative to a present state of a display. The operations also include sending the display element to the display based on the evaluation of the present state of the display. | 07-11-2013 |
20130179166 | VOICE CONVERSION DEVICE, PORTABLE TELEPHONE TERMINAL, VOICE CONVERSION METHOD, AND RECORD MEDIUM - A portable-telephone terminal frees the user from repeatedly performing a correction process. A voice-conversion device includes a voice-recognition unit accepting a voice and converting the voice into a character string; a display unit displaying the character string; a correction unit accepting a correction command that causes a word or a phrase being a part of a character string displayed on the display unit to be corrected and correcting the word or phrase corresponding to the correction command; a storage unit storing a word or a phrase corrected by the correction unit; and a control unit generating a selection candidate corresponding to the corrected word or phrase of the character string and displaying the selection candidate as a recognition-result candidate of the voice on the display unit if the corrected word or phrase has been stored in the storage unit when the voice-recognition unit converts the voice into the character string. | 07-11-2013 |
20130185069 | AMUSEMENT SYSTEM - A technique for allowing a virtual experience of more realistic live performance. A main apparatus reproduces music data and audience video data recording a video image of audience. A user holds a microphone and makes a live performance for the audience displayed on a monitor. The microphone sends voice data and motion information of the microphone to the main apparatus. The main apparatus determines that the user makes a live performance when the user calls on the audience with a specific phrase and performs an action corresponding to the specific phrase. The main apparatus reproduces reaction data recording a video image and sound indicating a reaction of the audience to the live performance. | 07-18-2013 |
20130191125 | TRANSCRIPTION SUPPORTING SYSTEM AND TRANSCRIPTION SUPPORTING METHOD - A transcription supporting system for the conversion of voice data to text data includes a first storage module, a playing module, a voice recognition module, an index generating module, a second storage module, a text forming module, and an estimation module. The first storage module stores the voice data. The playing module plays the voice data. The voice recognition module executes the voice recognition processing on the voice data. The index generating module generates a voice index that makes the plural text strings generated in the voice recognition processing correspond to voice position data. The second storage module stores the voice index. The text forming module forms text corresponding to input of a user correcting or editing the generated text strings. The estimation module estimates the formed voice position indicating the last position in the voice data where the user corrected/confirmed the voice recognition. | 07-25-2013 |
20130197908 | Speech Processing in Telecommunication Networks - Systems and methods for speech processing in telecommunication networks are described. In some embodiments, a method may include receiving speech transmitted over a network, causing the speech to be converted to text, and identifying the speech as predetermined speech in response to the text matching a stored text associated with the predetermined speech. The stored text may have been obtained, for example, by subjecting the predetermined speech to a network impairment condition. The method may further include identifying terms within the text that match terms within the stored text (e.g., despite not being identical to each other), calculating a score between the text and the stored text, and determining that the text matches the stored text in response to the score meeting a threshold value. In some cases, the method may also identify one of a plurality of speeches based on a selected one of a plurality of stored texts. | 08-01-2013 |
20130197909 | METHODS AND SYSTEMS FOR CORRECTING TRANSCRIBED AUDIO FILES - Methods and systems for correcting transcribed text. One method includes receiving audio data from one or more audio data sources and transcribing the audio data based on a voice model to generate text data. The method also includes making the text data available to a plurality of users over at least one computer network and receiving corrected text data over the at least one computer network from the plurality of users. In addition, the method can include modifying the voice model based on the corrected text data. | 08-01-2013 |
20130197910 | System and Method for Audible Text Center Subsystem - A system, method, and computer-readable storage device for sending a spoken message as a text message. The method includes initiating a connection with a first subscriber, receiving from the first subscriber a spoken message and spoken disambiguating information associated with at least one recipient address. The method further includes converting the spoken message to text via an audible text center subsystem (ATCS), and delivering the text to the recipient address. The method can also include verifying a subscription status of the first subscriber, or delivering the text to the recipient address based on retrieved preferences of the first subscriber. The preferences can be retrieved from a consolidated network repository or embedded within the spoken message. Text and the spoken message can be delivered to the same or different recipient addresses. The method can include updating recipient addresses based on a received oral command from the first subscriber. | 08-01-2013 |
20130197911 | Method and System For Endpoint Automatic Detection of Audio Record - A method and system for automatic endpoint detection of an audio record is provided. The method comprises the following steps: acquiring an audio record text and determining the text endpoint acoustic model for the audio record text; acquiring the audio record data of each frame in turn, starting from the audio record start frame in the audio record data; determining the characteristic acoustic model of the decoding optimal path for the currently acquired frame of the audio record data; comparing the characteristic acoustic model of the decoding optimal path acquired from the current frame of the audio record data with the endpoint acoustic model to determine whether they are the same; and if so, updating a mute duration threshold with a second time threshold, wherein the second time threshold is less than a first time threshold. This method can improve the recognition efficiency of audio record endpoint detection. | 08-01-2013 |
20130204618 | Methods and Systems for Dictation and Transcription - Automated delivery and filing of transcribed material prepared from dictated audio files into a central record-keeping system are presented. A user dictates information from any location, uploads that audio file to a transcriptionist to be transcribed, and the transcribed material is automatically delivered into a central record keeping system, filed with the appropriate client or matter file, and the data stored in the designated appropriate fields within those client or matter files. Also described is the recordation of meetings from multiple sources using mobile devices and the detection of the active or most prominent speaker at given intervals in the meeting. Further, text boxes on websites are completed using an audio recording application and offsite transcription. | 08-08-2013 |
20130204619 | SYSTEMS AND METHODS FOR VOICE-GUIDED OPERATIONS - A method includes transforming textual material data into a multimodal data structure including a plurality of classes selected from the group consisting of output, procedural information, and contextual information to produce transformed textual data, storing the transformed textual data on a memory device, retrieving, in response to a user request via a multimodal interface, requested transformed textual data and presenting the retrieved transformed textual data to the user via the multimodal interface. | 08-08-2013 |
20130211833 | TECHNIQUES FOR OVERLAYING A CUSTOM INTERFACE ONTO AN EXISTING KIOSK INTERFACE - Techniques for overlaying a custom interface onto an existing kiosk interface are provided. An event is detected that triggers a kiosk to process an agent that overlays, and without modifying, the kiosk's existing interface. The agent alters screen features and visual presentation of the existing interface and provides additional alternative operations for navigating and executing features defined in the existing interface. In an embodiment, the agent provides a custom interface overlaid onto the existing interface to provide a customer-facing interface for individuals that are sight impaired. | 08-15-2013 |
20130211834 | AUTOMATED INTERPRETATION OF CLINICAL ENCOUNTERS WITH CULTURAL CUES - A method, system and a computer program product for an automated interpretation and/or translation are disclosed. An automated interpretation and/or translation occurs by receiving language-based content from a user. The received language-based content is processed to interpret and/or translate the received language-based content into a target language. Also, a presence of a cultural sensitivity in the received language-based content is detected. Further, an appropriate guidance for dealing with the detected cultural sensitivity is provided. | 08-15-2013 |
20130226576 | Conference Call Service with Speech Processing for Heavily Accented Speakers - Speech recognition processing captures phonemes of words in a spoken speech string and retrieves text of words corresponding to particular combinations of phonemes from a phoneme dictionary. A text-to-speech synthesizer then can produce and substitute a synthesized pronunciation of that word in the speech string. If the speech recognition processing fails to recognize a particular combination of phonemes of a word, as spoken, as may occur when a word is spoken with an accent or when the speaker has a speech impediment, the speaker is prompted to clarify the word by entry, as text, from a keyboard or the like for storage in the phoneme dictionary such that a synthesized pronunciation of the word can be played out when the initially unrecognized spoken word is again encountered in a speech string to improve intelligibility, particularly for conference calls. | 08-29-2013 |
20130226577 | MESSAGE PREVIEW CONTROL - Embodiments of the invention relate generally to computing devices and systems, as well as software, computer programs, applications, and user interfaces, and more particularly, to systems, devices and methods to facilitate message preview control. For example, the method may include generating representations for messages to present on an interface, and detecting selection of the representation for the message. Further, the method can include presenting preview information for the message, which can be an electronic facsimile. The representations for the messages can include a representation for an electronic facsimile, as well as a voice message and an email. | 08-29-2013 |
20130226578 | ASYNCHRONOUS VIDEO INTERVIEW SYSTEM - Aspects of an asynchronous video interview system and related techniques include a server that receives a plurality of pre-recorded video prompts, generates an interview script, transmits a video prompt from the interview script to be displayed at a client computing device, and receives a streamed video response from the client computing device. The server can perform algorithmic analysis on content of the video response. In another aspect, a server obtains response preference data indicating a timing parameter for a response. In another aspect, a video prompt and an information supplement (e.g., a news item) that relates to the content of the video prompt are transmitted. In another aspect, a server automatically selects a video prompt (e.g., a follow-up question) to be displayed at the client computing device (e.g., based on a response or information about an interviewee). | 08-29-2013 |
20130226579 | SYSTEMS AND METHODS FOR INTERACTIVELY ACCESSING HOSTED SERVICES USING VOICE COMMUNICATIONS - Systems and methods for an interactive voice response system are described herein. In one embodiment, the system may include a voice recognition module, a session manager, and a voice generator module. An utterance received at the voice recognition module may be converted into one or more structures using a lexicon tied to an ontology. Concepts in the utterance may then be identified. If sufficient information has been provided to identify a relevant service, corresponding text responses associated with that service may then be converted into voice messages by the voice generator. | 08-29-2013 |
20130231930 | METHOD AND APPARATUS FOR AUTOMATICALLY FILTERING AN AUDIO SIGNAL - A computer implemented method and apparatus for automatically filtering an audio input to make a filtered recording comprising: identifying words used in an audio input, determining whether each identified word is contained in a dictionary of banned words, and creating a filtered recording as an audio output, wherein each word identified in the audio input that is found in the dictionary of banned words, is automatically deleted or replaced in the audio output used to make the filtered recording. | 09-05-2013 |
20130231931 | SYSTEM, METHOD, AND APPARATUS FOR GENERATING, CUSTOMIZING, DISTRIBUTING, AND PRESENTING AN INTERACTIVE AUDIO PUBLICATION - Systems, methods, and apparatuses for generating, customizing, distributing, and presenting an interactive audio publication to a user are provided. A plurality of text-based and/or speech-based content items is converted into voice-navigable interactive audio content items that include segmented audio data, embedded visual content, and accompanying metadata. An audio publication is generated by associating one or more audio content items with one or more audio publication sections, and generating metadata that defines the audio publication structure. Assembled audio publications may be used to generate one or more new custom audio publications for a user by utilizing one or more user-defined custom audio publication templates. Audio publications are delivered to a user for presentation on an enabled presentation system. The user is enabled to navigate and interact with the audio publication, using voice commands and/or a button interface, in a manner similar to browsing visually-oriented content. | 09-05-2013 |
20130238329 | METHODS AND APPARATUS FOR GENERATING CLINICAL REPORTS - Techniques for documenting a clinical procedure involve transcribing audio data comprising audio of one or more clinical personnel speaking while performing the clinical procedure. Examples of applicable clinical procedures include sterile procedures such as surgical procedures, as well as non-sterile procedures such as those conventionally involving a core code reporter. The transcribed audio data may be analyzed to identify relevant information for documenting the clinical procedure, and a text report including the relevant information documenting the clinical procedure may be automatically generated. | 09-12-2013 |
20130238330 | METHODS AND APPARATUS FOR GENERATING CLINICAL REPORTS - Techniques for documenting a clinical procedure involve transcribing audio data comprising audio of one or more clinical personnel speaking while performing the clinical procedure. Examples of applicable clinical procedures include sterile procedures such as surgical procedures, as well as non-sterile procedures such as those conventionally involving a core code reporter. The transcribed audio data may be analyzed to identify relevant information for documenting the clinical procedure, and a text report including the relevant information documenting the clinical procedure may be automatically generated. | 09-12-2013 |
20130238331 | Transcoding Voice to/from Text Based on Location of a Communication Device - A device, method, and system for routing communications to an output of a communications device, such as a mobile telephone, based on the format of an incoming communication and an output mode of the communications device is disclosed. An incoming speech communication can be delivered to a speaker output or forwarded to a format converter to create a text communication that can be delivered to a display output. An incoming text communication can be delivered to a display output or forwarded to a format converter to create a speech communication for delivery to a speaker output. The output mode of the communication device can be set according to device settings, application settings, or location of the device, or a combination thereof. The invention provides new delivery options for communications which can be more appropriate for a location or current use of the communication device than those previously available. | 09-12-2013 |
20130246063 | System and Methods for Providing Animated Video Content with a Spoken Language Segment - A system and methods are disclosed which provide simple and rapid animated content creation, particularly for more life-like synthesis of voice segments associated with an animated element. A voice input tool enables quick creation of spoken language segments for animated characters. Speech is converted to text. That text may be reconverted to speech with prosodic elements added. The text, prosodic elements, and voice may be edited. | 09-19-2013 |
20130253926 | SPEECH DIALOGUE SYSTEM, TERMINAL APPARATUS, AND DATA CENTER APPARATUS - A speech dialogue system includes a data center apparatus and a terminal apparatus. The data center apparatus acquires answer information for request information obtained in a speech recognition process for speech data from a terminal apparatus, creates a scenario including the answer information, creates first synthesized speech data concerning the answer information, transmits the first synthesized speech data to the terminal apparatus, and transmits the scenario to the terminal apparatus while the first synthesized speech data is being created in the creating the first synthesized speech data. The terminal apparatus creates second synthesized speech data concerning the answer information in the received scenario, receives the first synthesized speech data, selects one of the first synthesized speech data and the second synthesized speech data based on a determination result regarding whether the reception of the first synthesized speech data is completed, and reproduces speech. | 09-26-2013 |
20130253927 | METHOD AND APPARATUS FOR ANALYZING DISCUSSION REGARDING MEDIA PROGRAMS - A process and system including a device including a controller to detect a plurality of users engaging in a voice conference related to a presentation of a media program, convert speech dialog detected in the voice conference to textual dialog, detect from the textual dialog a behavioral profile of at least one of the plurality of users, and identify at least one of advertisement content and marketable media content for the plurality of users based on the behavioral profile of the at least one user. Other embodiments are disclosed. | 09-26-2013 |
20130253928 | Voice Control For Asynchronous Notifications - A computing device may receive an incoming communication and, in response, generate a notification that indicates that the incoming communication can be accessed using a particular application on the communication device. The computing device may further provide an audio signal indicative of the notification and automatically activate a listening mode. The computing device may receive a voice input during the listening mode, and an input text may be obtained based on speech recognition performed upon the voice input. A command may be detected in the input text. In response to the command, the computing device may generate an output text that is based on at least the notification and provide a voice output that is generated from the output text via speech synthesis. The voice output identifies at least the particular application. | 09-26-2013 |
20130253929 | MOBILE SYSTEMS AND METHODS OF SUPPORTING NATURAL LANGUAGE HUMAN-MACHINE INTERACTIONS - A mobile system is provided that includes speech-based and non-speech-based interfaces for telematics applications. The mobile system identifies and uses context, prior information, domain knowledge, and user specific profile data to achieve a natural environment for users that submit requests and/or commands in multiple domains. The invention creates, stores and uses extensive personal profile information for each user, thereby improving the reliability of determining the context and presenting the expected results for a particular question or command. The invention may organize domain specific behavior and information into agents that are distributable or updateable over a wide area network. | 09-26-2013 |
20130262103 | Verbal Intelligibility Analyzer for Audio Announcement Systems - A device and method are disclosed for testing the intelligibility of audio announcement systems. The device may include a microphone, a translation engine, a processor, a memory associated with the processor, and a display. The microphone of the analyzer may be coupled to the translation engine, which in-turn may be coupled to the processor, which is in-turn may be coupled to the memory and the display. The translation engine can convert audio speech input from the microphone into data output. The processor can receive the data output and can apply a scoring algorithm thereto. The algorithm can compare the received data against data that is stored in the memory of the analyzer and calculates the accuracy of the received data. The algorithm may translate the calculated accuracy into a standardized STI intelligibility score that is then presented on the display of the analyzer. | 10-03-2013 |
20130262104 | Procurement System - A procurement system may include a first interface configured to receive a query from a user, a command module configured to parameterize the query, an intelligent search and match engine configured to compare the parameterized query with stored queries in a historical knowledge base and, in the event the parameterized query does not match a stored query within the historical knowledge base, search for a match in a plurality of knowledge models, and a response solution engine configured to receive a system response ID from the intelligent search and match engine, the response solution engine being configured to initiate a system action by interacting with sub-system and related databases to generate a system response. | 10-03-2013 |
20130262105 | DYNAMIC LONG-DISTANCE DEPENDENCY WITH CONDITIONAL RANDOM FIELDS - Dynamic features are utilized with CRFs to handle long-distance dependencies of output labels. The dynamic features present a probability distribution involved in explicit distance from/to a special output label that is pre-defined according to each application scenario. Besides the number of units in the segment (from the previous special output label to the current unit), the dynamic features may also include the sum of any basic features of units in the segment. Since the added dynamic features are involved in the distance from the previous specific label, the searching lattice associated with Viterbi searching is expanded to distinguish the nodes with various distances. The dynamic features may be used in a variety of different applications, such as Natural Language Processing, Text-To-Speech and Automatic Speech Recognition. For example, the dynamic features may be used to assist in prosodic break and pause prediction. | 10-03-2013 |
20130262106 | METHOD AND SYSTEM FOR AUTOMATIC DOMAIN ADAPTATION IN SPEECH RECOGNITION APPLICATIONS - A system and method for adapting a language model to a specific environment by receiving interactions captured in the specific environment, generating a collection of documents from documents retrieved from external resources, detecting in the collection of documents terms related to the environment that are not included in an initial language model, and adapting the initial language model to include the terms detected. | 10-03-2013 |
20130262107 | Multimodal Natural Language Query System for Processing and Analyzing Voice and Proximity-Based Queries - The disclosure provides a natural language query system and method for processing and analyzing multimodally-originated queries, including voice and proximity-based queries. The natural language query system includes a Web-enabled device including a speech input module for receiving a voice-based query in natural language form from a user and a location/proximity module for receiving location/proximity information from a location/proximity device. The query system also includes a speech conversion module for converting the voice-based query in natural language form to text in natural language form and a natural language processing module for converting the text in natural language form to text in searchable form. The query system further includes a semantic engine module for converting the text in searchable form to a formal database query and a database-look-up module for using the formal database query to obtain a result related to the voice-based query in natural language form from a database. | 10-03-2013 |
20130262108 | AUDIO SYNCHRONIZATION FOR DOCUMENT NARRATION WITH USER-SELECTED PLAYBACK - Disclosed are techniques and systems to provide a narration of a text. In some aspects, the techniques and systems described herein include generating a timing file that includes elapsed time information for expected portions of text, providing an elapsed time period from a reference time in an audio recording to each portion of text in the recognized portions of text. | 10-03-2013 |
20130262109 | TEXT TO SPEECH METHOD AND SYSTEM - A text-to-speech method for simulating a plurality of different voice characteristics includes dividing inputted text into a sequence of acoustic units; selecting voice characteristics for the inputted text; converting the sequence of acoustic units to a sequence of speech vectors using an acoustic model having a plurality of model parameters provided in clusters each having at least one sub-cluster and describing probability distributions which relate an acoustic unit to a speech vector; and outputting the sequence of speech vectors as audio with the selected voice characteristics. A parameter of a predetermined type of each probability distribution is expressed as a weighted sum of parameters of the same type using voice characteristic dependent weighting. In converting the sequence of acoustic units to a sequence of speech vectors, the voice characteristic dependent weights for the selected voice characteristics are retrieved for each cluster such that there is one weight per sub-cluster. | 10-03-2013 |
20130262110 | Unsupervised Language Model Adaptation for Automated Speech Scoring - Systems and methods are provided for generating a transcript of a speech sample response to a test question. The speech sample response to the test question is provided to a language model, where the language model is configured to perform an automated speech recognition function. The language model is adapted to the test question to improve the automated speech recognition function by providing to the language model automated speech recognition data related to the test question, Internet data related to the test question, or human-generated transcript data related to the test question. The transcript of the speech sample is generated using the adapted language model. | 10-03-2013 |
20130262111 | AUTOMATED VOICE AND SPEECH LABELING - A system and method for voice and speech analysis which correlates a speaker signal source and a normalized signal comprising measurements of input acoustic data to a database of language, dialect, accent, and/or speaker attributes in order to create a transcription of the input acoustic data. | 10-03-2013 |
20130262112 | METHOD AND SYSTEM FOR USING CONVERSATIONAL BIOMETRICS AND SPEAKER IDENTIFICATION/VERIFICATION TO FILTER VOICE STREAMS - A method implemented in a computer infrastructure having computer executable code having programming instructions tangibly embodied on a computer readable storage medium. The programming instructions are operable to receive an audio stream of a communication between a plurality of participants. Additionally, the programming instructions are operable to filter the audio stream of the communication into separate audio streams, one for each of the plurality of participants, wherein each of the separate audio streams contains portions of the communication attributable to a respective participant of the plurality of participants. Furthermore, the programming instructions are operable to output the separate audio streams to a storage system. | 10-03-2013 |
20130262113 | METHOD AND SYSTEM FOR PROCESSING DICTATED INFORMATION - A method and system for processing dictated information into a dynamic form are disclosed. The method comprises presenting an image ( | 10-03-2013 |
20130275129 | METHODS, APPARATUSES, AND SYSTEMS FOR PROVIDING TIMELY USER CUES PERTAINING TO SPEECH RECOGNITION - An automatic speech recognition engine may generate text or tokens that correspond to audio data. For example, the automatic speech recognition engine may generate first text or first speech tokens corresponding to a first portion of audio data. The automatic speech recognition engine may further generate second text or second speech tokens that correspond to a first portion of the audio data and a second portion of the audio data. The text or speech tokens generated by the automatic speech recognition engine may be provided to a device for presentation thereon. In some embodiments, the automatic speech recognition engine generates the second text or second speech tokens substantially while the first text or first speech tokens are presented on the device. | 10-17-2013 |
20130275130 | SPEECH RECOGNITION APPARATUS, METHOD OF RECOGNIZING SPEECH, AND COMPUTER READABLE MEDIUM FOR THE SAME - A speech recognition apparatus includes: a recognition device that recognizes a speech of a user and generates a speech character string; a display device that displays the speech character string; a reception device that receives an input of a correction character string, which is used for correction of the speech character string, through an operation portion; and a correction device that corrects the speech character string with using the correction character string. | 10-17-2013 |
20130275131 | METHOD AND SYSTEM FOR DYNAMIC CREATION OF CONTEXTS - A method and a system for a speech recognition system in which an electronic speech-based document is associated with a document template and comprises one or more sections of text recognized or transcribed from sections of speech. The sections of speech are transcribed by the speech recognition system into corresponding sections of text of the electronic speech-based document. The method includes the steps of dynamically creating sub contexts and associating the sub contexts to sections of text of the document template. | 10-17-2013 |
20130275132 | Method and Apparatus for Automatically Building Conversational Systems - A system and method provides a natural language interface to world-wide web content. Either in advance or dynamically, webpage content is parsed using a parsing algorithm. A person using a telephone interface can provide speech information, which is converted to text and used to automatically fill in input fields on a webpage form. The form is then submitted to a database search and a response is generated. Information contained on the responsive webpage is extracted and converted to speech via a text-to-speech engine and communicated to the person. | 10-17-2013 |
20130275133 | Electronic Pen with Printable Arrangement - An electronic pen includes a voice recognition module for converting voice messages to text messages, a digital scanning module for scanning the content of the digital copy and on the paper, a storage module, an interaction and control module, and a printing module for exporting the content selected with the electronic pen through a sliding movement on the paper. | 10-17-2013 |
20130275134 | INFORMATION EQUIPMENT - An information equipment for displaying shortcut keys for operating the information equipment on a screen and rearranging the displayed shortcut keys on the screen based on a recognition result of an input voice, the information equipment including: a voice recognition processor referring to a recognition dictionary database memory to output text as a recognition result of an input voice; and a shortcut key rearranging mechanism referring to a conversion database memory, where an association relation between a function of the information equipment and text is written, mapping the recognition result text onto a function of the information equipment, and displaying a shortcut key to the function on the display screen. | 10-17-2013 |
20130289983 | ELECTRONIC DEVICE AND METHOD OF CONTROLLING THE SAME - An electronic device and a method of controlling the electronic device are provided. According to an embodiment, the electronic device may recognize a first sound signal output from at least one external device connectable through a communication unit and control a sound output of at least one of the at least one external device or the sound output unit when a second sound signal is output through the sound output unit. | 10-31-2013 |
20130289984 | Preserving Privacy in Natural Language Databases - An apparatus and a method for preserving privacy in natural language databases are provided. Natural language input may be received. At least one of sanitizing or anonymizing the natural language input may be performed to form a clean output. The clean output may be stored. | 10-31-2013 |
20130289985 | System and Method for Generating User Models From Transcribed Dialogs - Disclosed herein are systems, computer-implemented methods, and computer-readable storage media for generating personalized user models. The method includes receiving automatic speech recognition (ASR) output of speech interactions with a user, receiving an ASR transcription error model characterizing how ASR transcription errors are made, generating guesses of a true transcription and a user model via an expectation maximization (EM) algorithm based on the error model and the respective ASR output where the guesses will converge to a personalized user model which maximizes the likelihood of the ASR output. The ASR output can be unlabeled. The method can include casting speech interactions as a dynamic Bayesian network with four variables: (s), (u), (r), (m), and encoding relationships between (s), (u), (r), (m) as conditional probability tables. At each dialog turn (r) and (m) are known and (s) and (u) are hidden. | 10-31-2013 |
20130289986 | WEARABLE HEADSET WITH SELF-CONTAINED VOCAL FEEDBACK AND VOCAL COMMAND - A headset includes a wearable body, first and second earphones extending from the wearable body, controls for controlling an external communication/multimedia device wirelessly, a microphone for picking up vocal data from a user of the headset system and a signal processing unit. The signal processing unit includes circuitry for processing the vocal data into a distinctly audible vocal feedback signal, circuitry for enhancing the vocal feedback signal thereby producing an enhanced vocal feedback signal and circuitry for mixing the enhanced vocal feedback signal with audio signals originating from the external communication/multimedia device, thereby producing a mixed output signal and then sending the mixed output signal to the user via the earphones. The external communication/multimedia device comprises a vocal command application and the headset further comprises a vocal command control for sending vocal commands to the external communication/multimedia device and to the vocal command application. | 10-31-2013 |
20130297307 | DICTATION WITH INCREMENTAL RECOGNITION OF SPEECH - A dictation module is described herein which receives and interprets a complete utterance of the user in incremental fashion, that is, one incremental portion at a time. The dictation module also provides rendered text in incremental fashion. The rendered text corresponds to the dictation module's interpretation of each incremental portion. The dictation module also allows the user to modify any part of the rendered text, as it becomes available. In one case, for instance, the dictation module provides a marking menu which includes multiple options by which a user can modify a selected part of the rendered text. The dictation module also uses the rendered text (as modified or unmodified by the user using the marking menu) to adjust one or more models used by the dictation module to interpret the user's utterance. | 11-07-2013 |
20130297308 | METHOD FOR DISPLAYING TEXT ASSOCIATED WITH AUDIO FILE AND ELECTRONIC DEVICE - The present disclosure may provide an electronic device capable of audio recording. The electronic device may include a recording function unit configured to record an external sound to store it as an audio file, a conversion unit configured to convert a voice contained in the sound into a text based on a speech-to-text (STT) conversion, and a controller configured to detect a core keyword from the text, and set the detected core keyword to at least part of a file name for the audio file. | 11-07-2013 |
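The core-keyword-to-file-name step in the abstract above might be sketched as follows; the abstract does not specify how the core keyword is detected, so word frequency, the stop-word list, and all names here are illustrative assumptions:

```python
import re
from collections import Counter

# A hypothetical stop-word list; a real device would use a fuller one.
STOPWORDS = {"the", "a", "an", "of", "to", "and", "in", "on", "for", "is"}

def core_keyword_filename(stt_text, extension=".m4a", max_keywords=2):
    """Pick the most frequent non-stopword(s) from STT text as a file name.

    Frequency stands in for "core keyword" detection, which the abstract
    leaves unspecified.
    """
    words = [w for w in re.findall(r"[a-z']+", stt_text.lower())
             if w not in STOPWORDS]
    top = [w for w, _ in Counter(words).most_common(max_keywords)]
    return "_".join(top) + extension

print(core_keyword_filename(
    "the budget meeting covered the budget for next quarter"))
```

A recording about a budget meeting would thus be saved under a name like `budget_meeting.m4a` rather than a timestamp, which is the searchability gain the abstract describes.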
20130297309 | PERFORMING SPEECH RECOGNITION OVER A NETWORK AND USING SPEECH RECOGNITION RESULTS - Systems, methods and apparatus for generating, distributing, and using speech recognition models. A shared speech processing facility is used to support speech recognition for a wide variety of devices with limited capabilities including business computer systems, personal data assistants, etc., which are coupled to the speech processing facility via a communications channel, e.g., the Internet. Devices with audio capture capability record and transmit to the speech processing facility, via the Internet, digitized speech and receive speech processing services, e.g., speech recognition model generation and/or speech recognition services, in response. The Internet is used to return speech recognition models and/or information identifying recognized words or phrases. The speech processing facility can be used to provide speech recognition capabilities to devices without such capabilities and/or to augment a device's speech processing capability. Voice dialing, telephone control and/or other services are provided by the speech processing facility in response to speech recognition results. | 11-07-2013 |
20130304465 | METHOD AND SYSTEM FOR AUDIO-VIDEO INTEGRATION - A method of dictation is described which, after transcription, integrates video into text at locations designated by the user. A user collects audio and visual information using an application that has been installed on the user's mobile device. The user designates, using the application, the desired location of the video files within the audio file. Both the audio and video information are uploaded to a transcription provider. The transcription provider uses transcription software to transcribe the audio files into text, the transcription software being able to identify to the transcriptionist the location within the audio file at which each video is to be inserted. The transcribed document with integrated audio and video is then delivered back to the user. | 11-14-2013 |
20130304466 | METHOD AND DEVICE FOR PROVIDING SPEECH-TO-TEXT ENCODING AND TELEPHONY SERVICE - A machine-readable medium and a network device are provided for speech-to-text translation. Speech packets are received at a broadband telephony interface and stored in a buffer. The speech packets are processed and textual representations thereof are displayed as words on a display device. Speech processing is activated and deactivated in response to a command from a subscriber. | 11-14-2013 |
20130304467 | Word-Level Correction of Speech Input - The subject matter of this specification can be implemented in, among other things, a computer-implemented method for correcting words in transcribed text including receiving speech audio data from a microphone. The method further includes sending the speech audio data to a transcription system. The method further includes receiving a word lattice transcribed from the speech audio data by the transcription system. The method further includes presenting one or more transcribed words from the word lattice. The method further includes receiving a user selection of at least one of the presented transcribed words. The method further includes presenting one or more alternate words from the word lattice for the selected transcribed word. The method further includes receiving a user selection of at least one of the alternate words. The method further includes replacing the selected transcribed word in the presented transcribed words with the selected alternate word. | 11-14-2013 |
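The word-lattice correction flow above can be sketched as follows, modeling the lattice simplistically as ranked candidates per word slot rather than a full lattice graph; the data and function names are illustrative, not the patent's API:

```python
# Each slot holds (word, confidence) pairs ranked by recognizer confidence.
lattice = [
    [("I", 0.98)],
    [("scream", 0.55), ("ice cream", 0.45)],
    [("for", 0.97)],
    [("ice cream", 0.60), ("I scream", 0.40)],
]

def best_path(lattice):
    """Initial transcription: the top candidate in each slot."""
    return [slot[0][0] for slot in lattice]

def alternates(lattice, slot_index):
    """Alternate words the UI can offer when the user selects a word."""
    return [w for w, _ in lattice[slot_index][1:]]

def replace(transcript, slot_index, word):
    """Swap the selected word for the user's chosen alternate."""
    transcript = list(transcript)
    transcript[slot_index] = word
    return transcript

words = best_path(lattice)
fixed = replace(words, 1, alternates(lattice, 1)[0])
print(" ".join(fixed))
```

The key point of the claim is that corrections come from the lattice the recognizer already produced, so the user picks among plausible alternates instead of retyping.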
20130311177 | AUTOMATED COLLABORATIVE ANNOTATION OF CONVERGED WEB CONFERENCE OBJECTS - A methodology may be provided that automatically annotates web conference application sharing (e.g., sharing scenes and/or slides) based on voice and/or web conference data. In one specific example, a methodology may be provided that threads the annotations and assigns authorship to the correct resources. | 11-21-2013 |
20130311178 | METHOD AND ELECTRONIC DEVICE FOR EASILY SEARCHING FOR VOICE RECORD - A method of marking a specific time point through a familiar pattern input that can be instantaneously applied to a portion the user desires to memorize or highlight during audio recording. An electronic device according to an embodiment disclosed in the present disclosure may include a storage unit configured to store audio data and the recording information of the audio data; a controller configured to convert an input audio signal into audio data to store the audio data; a display unit configured to display one or more texts based on the execution of a speech-to-text (STT) for the input audio signal; and an input unit configured to receive a specific pattern input or a selection input for part of the texts from the user while receiving the audio signal. | 11-21-2013 |
20130311179 | Electronic Device with Text Error Correction Based on Voice Recognition Data - During operation of an electronic device such as a cellular telephone with a touch screen display or other electronic equipment, a voice recognition engine may gather data on spoken words. Data on the spoken words that are recognized may be stored in a spoken word database maintained by an input processor with an autocorrection engine. A user may supply text input that contains mistyped words to the electronic device using the touch screen or a keyboard. The input processor may use the autocorrection engine to automatically replace mistyped words with corrected versions of the mistyped words. The corrected words may be displayed in real time as the user supplies the text input. The autocorrection engine may make word correction decisions based at least partly on information in the spoken word database. | 11-21-2013 |
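The spoken-word-biased autocorrection described above might look like the following sketch; the scoring formula and the bias weight are assumptions, since the abstract only says the decision is based "at least partly" on the spoken word database:

```python
import difflib

def autocorrect(typed, vocabulary, spoken_counts, bias=0.1):
    """Rank correction candidates, boosting words the user has spoken.

    `spoken_counts` plays the role of the spoken word database gathered by
    the voice recognition engine; the similarity + capped-count score is an
    illustrative assumption.
    """
    def score(candidate):
        similarity = difflib.SequenceMatcher(None, typed, candidate).ratio()
        return similarity + bias * min(spoken_counts.get(candidate, 0), 5)
    return max(vocabulary, key=score)

vocab = ["their", "there", "three"]
spoken = {"there": 4}          # the user has said "there" often
print(autocorrect("thre", vocab, spoken))
```

Here "thre" is equally similar to "there" and "three", and the spoken-word evidence breaks the tie in favor of the word the user actually says.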
20130311180 | REMOTE ACCESS SYSTEM AND METHOD AND INTELLIGENT AGENT THEREFOR - The invention relates to remote access systems and methods using automatic speech recognition to access a computer system. The invention also relates to an intelligent agent resident on the computer system for facilitating remote access to, and receipt of, information on the computer system through speech recognition or text-to-speech read-back. The remote access systems and methods can be used by a user of the computer system while traveling. The user can dial into a server system which is configured to interact with the user by automatic speech recognition and text-to-speech conversion. The server system establishes a connection to an intelligent agent running on the user's remotely located computer system by packet communication over a public network. The intelligent agent sources information on the user's computer system or a network accessible to the computer system, processes the information and transmits it to the server system over the public network. The server system converts the information into speech signals and transmits the speech signals to a telephone operated by the user. | 11-21-2013 |
20130311181 | SYSTEMS AND METHODS FOR IDENTIFYING CONCEPTS AND KEYWORDS FROM SPOKEN WORDS IN TEXT, AUDIO, AND VIDEO CONTENT - Systems for identifying, summarizing, and communicating topics and keywords included within an input file are disclosed. The systems include a server that receives one or more input files from an external source; conducts a speech-to-text transcription (when the input file is an audio or video file); and applies an algorithm to the text in order to analyze the content therein. The algorithm calculates a total score for each word included within the text, which is calculated using a variety of metrics that include: a length of each word in relation to a mean length of words, the frequency of letter groups used within each word, the frequency of repetition of each word and word sequences, a part of speech that is represented by each word, and membership of each word within a custom set of words. The systems are further capable of generating a graphical representation of each input file, which distinguishes those parts of the input file that exhibit a higher total score from those that do not. In addition, the systems allow users to publish commentary—through an email interface—to such graphical representations of the input files. | 11-21-2013 |
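The total-score computation named in the abstract above can be sketched as below. The abstract lists the metrics but not the formula, so the weights, the bigram list, and the equal-weight sum are assumptions, and the part-of-speech term is omitted to keep the sketch self-contained:

```python
import re
from collections import Counter

# A tiny illustrative set of common English letter groups.
COMMON_BIGRAMS = {"th", "he", "in", "er", "an", "re"}

def score_words(text, custom_set=frozenset()):
    """Total score per word from: length vs. mean length, letter-group
    frequency, repetition, and membership in a custom word set."""
    words = re.findall(r"[a-zA-Z]+", text.lower())
    mean_len = sum(len(w) for w in words) / len(words)
    freq = Counter(words)
    scores = {}
    for w in set(words):
        length_term = len(w) / mean_len
        bigram_term = sum(w[i:i + 2] in COMMON_BIGRAMS
                          for i in range(len(w) - 1)) / max(len(w) - 1, 1)
        repeat_term = freq[w]
        custom_term = 2.0 if w in custom_set else 0.0
        scores[w] = length_term + bigram_term + repeat_term + custom_term
    return scores

s = score_words("neural networks train neural models", {"neural"})
print(max(s, key=s.get))
```

The per-word totals are exactly what a graphical summary would shade: high-scoring regions of the transcript stand out from low-scoring ones.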
20130317817 | Method and Apparatus for Applying Steganography in a Signed Model - Computer models are powerful resources that can be accessed by remote users. Models can be copied without authorization or can become an out-of-date version. A model with a signature, referred to herein as a “signed” model, can indicate the signature without affecting usage by users who are unaware that the model contains the signature. The signed model can respond to an input in a steganographic way such that only the designer of the model knows that the signature is embedded in the model. The response is a way to check the source or other characteristics of the model. The signed model can include embedded signatures of various degrees of detectability to respond to select steganographic inputs with steganographic outputs. In this manner, a designer of signed models can prove whether an unauthorized copy of the signed model is being used by a third party while using publicly available user interfaces. | 11-28-2013 |
20130317818 | Systems and Methods for Captioning by Non-Experts - Methods and systems for captioning speech in real-time are provided. Embodiments utilize captionists, who may be non-expert captionists, to transcribe a speech using a worker interface. Each worker is provided with the speech or portions of the speech, and is asked to transcribe all or portions of what they receive. The transcriptions received from each worker are aligned and combined to create a resulting caption. Automated speech recognition systems may be integrated by serving in the role of one or more workers, or integrated in other ways. Workers may work locally (able to hear the speech) and/or workers may work remotely, the speech being provided to them as an audio stream. Worker performance may be measured and used to provide feedback into the system such that overall performance is improved. | 11-28-2013 |
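The align-and-combine step for non-expert captions might be sketched as a per-slot majority vote; this assumes the alignment stage has already mapped each worker's words onto shared slots (real systems would use something like multiple sequence alignment), and all names are illustrative:

```python
from collections import Counter

def combine_captions(partial_transcripts):
    """Merge aligned partial transcriptions by majority vote per slot.

    Each transcript is a list of words with None where that worker
    missed or skipped a word.
    """
    length = max(len(t) for t in partial_transcripts)
    merged = []
    for i in range(length):
        votes = Counter(t[i] for t in partial_transcripts
                        if i < len(t) and t[i] is not None)
        merged.append(votes.most_common(1)[0][0] if votes else "<gap>")
    return merged

workers = [
    ["the", "quick", None, "fox"],
    ["the", "quack", "brown", None],
    [None, "quick", "brown", "fox"],
]
print(" ".join(combine_captions(workers)))
```

No single worker captured the whole phrase, yet the vote recovers it, which is the core idea of combining non-expert partial captions.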
20130317819 | System and Method for Unsupervised and Active Learning for Automatic Speech Recognition - A system and method is provided for combining active and unsupervised learning for automatic speech recognition. This process enables a reduction in the amount of human supervision required for training acoustic and language models and an increase in the performance given the transcribed and un-transcribed data. | 11-28-2013 |
20130325461 | BUSINESS PLATFORM VOICE INTERACTIONS - According to some embodiments, a user device may receive business enterprise information from a remote enterprise server. The user device may then automatically convert at least some of the business enterprise information into speech output provided to a user of the user device. Speech input from the user may be received and converted by the user device. The user device may then interact with the remote enterprise server in accordance with the converted speech input and the business enterprise information. | 12-05-2013 |
20130325462 | AUTOMATIC TAG EXTRACTION FROM AUDIO ANNOTATED PHOTOS - A system and method for assigning one or more tags to an image file. In one aspect, a server computer receives an image file captured by a client device. In one embodiment, the image file includes an audio component embedded therein by the client device, where the audio component was spoken by a user of the client device as a tag of the image file. The server computer determines metadata associated with the image file and identifies a dictionary of potential textual tags from the metadata. The server computer determines a textual tag from the audio component and from the dictionary of potential textual tags. The server computer then associates the textual tag with the image file as additional metadata. | 12-05-2013 |
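The step of resolving the spoken tag against the metadata-derived dictionary might be sketched as follows; the n-best list, the dictionary contents, and the fallback behavior are illustrative assumptions:

```python
def choose_tag(asr_hypotheses, candidate_tags):
    """Pick a textual tag by intersecting ASR n-best output with the
    dictionary of potential tags built from the image's metadata.

    `asr_hypotheses` is ordered best-first; if no hypothesis matches the
    dictionary, fall back to the top hypothesis.
    """
    candidates = {t.lower() for t in candidate_tags}
    for hyp in asr_hypotheses:
        if hyp.lower() in candidates:
            return hyp
    return asr_hypotheses[0] if asr_hypotheses else None

# Dictionary derived from (hypothetical) location metadata near the photo.
tags_from_metadata = ["Golden Gate Bridge", "Fisherman's Wharf", "Alcatraz"]
nbest = ["golden gait bridge", "golden gate bridge"]
print(choose_tag(nbest, tags_from_metadata))
```

Constraining recognition to metadata-plausible tags is what lets the system recover "golden gate bridge" even when the top ASR hypothesis is a homophone error.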
20130325463 | SYSTEM, APPARATUS, AND METHOD FOR IDENTIFYING RELATED CONTENT BASED ON EYE MOVEMENTS - An apparatus, system, and method for identifying related content based on eye movements are disclosed. The system, apparatus, and method display content to a user concurrently in two or more windows, identify areas in each of the windows where a user's eyes focus, extract keywords from the areas where the user's eyes focus in each of the windows, search a communications network for related content using the keywords, and notify the user of the related content by displaying it concurrently with the two or more windows. The keywords are extracted from one or more locations in the two or more windows in which the user's eyes pause for a predetermined amount of time or, when the user's eyes pause on an image, from at least one of the text adjacent to and the metadata associated with that image. | 12-05-2013 |
20130325464 | METHOD FOR DISPLAYING WORDS AND PROCESSING DEVICE AND COMPUTER PROGRAM PRODUCT THEREOF - The disclosure provides a method for displaying words. In the method, a speech signal is received. A pitch contour and an energy contour of the speech signal are extracted. Speech recognition is performed on the speech signal to recognize a plurality of words corresponding to the speech signal and determine time alignment information of each of the plurality of words. At least one display parameter of each of the plurality of words is determined according to the pitch contour, the energy contour and the time alignment information of each of the plurality of words. Thus, the plurality of words is integrated into a sentence according to the at least one display parameter of each of the plurality of words. Then, the sentence is displayed on at least one display device. | 12-05-2013 |
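The mapping from prosody to display parameters described above might be sketched like this; the abstract does not give the mapping, so louder-means-larger and higher-pitch-means-emphasis are illustrative assumptions, with per-word mean pitch and energy presumed already extracted via the time alignment:

```python
def display_params(words, pitch, energy, base_size=12):
    """Assign a font size and an emphasis flag to each recognized word
    from its mean pitch and energy over the aligned speech segment."""
    mean_energy = sum(energy) / len(energy)
    mean_pitch = sum(pitch) / len(pitch)
    styled = []
    for w, p, e in zip(words, pitch, energy):
        size = round(base_size * (e / mean_energy))
        styled.append({"word": w, "size": size, "emphasis": p > mean_pitch})
    return styled

out = display_params(["I", "really", "mean", "it"],
                     pitch=[110, 180, 120, 115],
                     energy=[0.5, 1.2, 0.6, 0.5])
print(out[1])
```

The stressed word "really" ends up larger and emphasized, so the rendered sentence visually carries the speaker's prosody.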
20130325465 | MEDICAL IMAGE READING SYSTEM - A teleradiology processing system comprises a central reading station having a plurality of computer work areas, a plurality of transcriptionist stations each having a transcriptionist computer and a shared audio communication channel that facilitates audio communication between the central reading station and each of the transcriptionist stations. A database stores a plurality of medical case files, where the database is accessible by each of the plurality of transcriptionist computers. In operation, each transcriptionist computer takes control of a uniquely assigned one of the plurality of computer work areas of the central reading station to provide information from a pre-fetched medical case. Moreover, each transcriptionist computer prepares a report based upon a received code transmitted to the corresponding transcriptionist station over the shared audio communication channel. Each report, once approved by the specialist at the central reading station, is written to the database. | 12-05-2013 |
20130325466 | SYSTEM AND METHOD FOR CONTROLLING INTERACTIVE VIDEO USING VOICE - A system for controlling interactive video using voice commands incorporates: a control module, a voice-to-text conversion module, a text comparison module, an interactive content overlay module, a dynamic content display module, and a content storage module. The control module is implemented as a mobile application executing on a suitable mobile device, which is configured to transmit voice commands received from the user to the server. The voice-to-text conversion module operates on the server and performs conversion of the user's voice commands to text commands. The text comparison module is deployed on the provider's server and performs comparison of the words in the converted text commands received from the conversion module with the keywords stored in the interactive video file. The interactive content overlay module displays the interactive content corresponding to the user's commands by overlaying it over the video content displayed to the user. The dynamic content display module dynamically displays the interactive content to the user. The content storage module controls the storing of the content on the server. | 12-05-2013 |
20130325467 | SYSTEMS AND METHODS FOR PRESENTING AUDIO MESSAGES - Systems and methods for presenting audio messages are provided. In some aspects, a method includes receiving an audio message from a first user and generating a text-based representation of the audio message. The method also includes generating one or more identification tags based on the text-based representation of the audio message. At least one of the one or more identification tags includes a subject of the audio message. The method also includes presenting at least one of the text-based representation of the audio message or the one or more identification tags to a second user using a graphical user interface. | 12-05-2013 |
20130332158 | Apparatus and Methods Using a Pattern Matching Speech Recognition Engine to Train a Natural Language Speech Recognition Engine - The technology of the present application provides a speech recognition system with at least two different speech recognition engines or a single engine speech recognition engine with at least two different modes of operation. The first speech recognition engine or mode is used to match audio to text, which text may be words or phrases. The matched audio and text is used by a training module to train a user profile for a natural language speech recognition engine, which is at least one of the two different speech recognition engines or modes. An evaluation module evaluates when the user profile is sufficiently trained to convert the speech recognition engine from the first speech recognition engine or mode to the natural language speech recognition engine or mode. | 12-12-2013 |
20130332159 | USING FAN THROTTLING TO ENHANCE DICTATION ACCURACY - A dictation computer that includes a fan speed regulator is described. The fan speed regulator monitors a speech recognition unit to determine when the speech recognition unit is activated. Upon detection that the speech recognition unit is activated, the fan speed regulator ducks the speed of a cooling fan embedded within the dictation computer to an optimized speed of rotation over a delay time interval. The fan speed regulator may include components to adapt the optimized speed and delay time to the characteristics of the dictation computer and the user. Other embodiments are also described. | 12-12-2013 |
20130332160 | SMART PHONE WITH SELF-TRAINING, LIP-READING AND EYE-TRACKING CAPABILITIES - Smartphones and other portable electronic devices include self-training, lip-reading, and/or eye-tracking capabilities. In one disclosed method, an eye-tracking application is operative to use the video camera of the device to track the eye movements of the user while text is being entered or read on the display. If it is determined that the user is moving at a rate of speed associated with motor vehicle travel, as through GPS or other methods, a determination is made as to whether the user is engaged in a text-messaging session; if the user is looking away from the device during the text-messaging session, it may be inferred that the user is texting while driving, and corrective actions may be taken. | 12-12-2013 |
20130332161 | Distributed Dictation/Transcription System - A distributed dictation/transcription system is provided. The system provides a client station, dictation manager, and dictation server networked such that the dictation manager selects a dictation server to transcribe audio from the client station. The dictation manager selects one of a plurality of dictation servers based on conventional load balancing and on a determination of whether the user profile is already uploaded to a dictation server. While selecting a dictation server or uploading a profile, the client may begin dictating, the audio being stored in a buffer of the dictation manager until a dictation server is selected or available. The user may receive in real time or near real time a display of the textual data that may be corrected by the user to update the user profile. | 12-12-2013 |
20130332162 | Systems and Methods for Recognizing Textual Identifiers Within a Plurality of Words - Methods and systems for recognizing textual identifiers within a plurality of words are described. A textual representation of a voice input is received from a user. The textual representation includes a plurality of words. A keyword is identified in the textual representation. It is determined whether one or more words adjacent to the keyword correspond to a textual identifier of a collection of textual identifiers. Responsive to a determination that the one or more adjacent words correspond to a textual identifier, the keyword and the one or more adjacent words are replaced with the textual identifier. | 12-12-2013 |
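The keyword-plus-adjacent-words replacement above can be sketched as follows; the keyword "hashtag", the identifier table, and the span search are illustrative assumptions:

```python
def merge_identifiers(words, keyword, identifiers, max_span=3):
    """Replace `keyword` and its adjacent words with a known textual
    identifier when the adjacent words match the identifier collection.

    `identifiers` maps a normalized phrase to its canonical identifier;
    longer spans are tried first.
    """
    out, i = [], 0
    while i < len(words):
        if words[i] == keyword:
            for span in range(max_span, 0, -1):
                phrase = " ".join(words[i + 1:i + 1 + span]).lower()
                if phrase in identifiers:
                    out.append(identifiers[phrase])
                    i += 1 + span
                    break
            else:
                out.append(words[i])   # keyword kept if nothing matches
                i += 1
        else:
            out.append(words[i])
            i += 1
    return out

ids = {"throwback thursday": "#ThrowbackThursday"}
print(merge_identifiers("post hashtag throwback thursday now".split(),
                        "hashtag", ids))
```

Speaking "post hashtag throwback thursday now" thus yields the single token `#ThrowbackThursday` in place of the keyword and the matched adjacent words.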
20130339014 | Channel Normalization Using Recognition Feedback - A computer-implemented arrangement is described for performing cepstral mean normalization (CMN) in automatic speech recognition. A current CMN function is stored in a computer memory as a previous CMN function. The current CMN function is updated based on a current audio input to produce an updated CMN function. The updated CMN function is used to process the current audio input to produce a processed audio input. Automatic speech recognition of the processed audio input is performed to determine representative text. If the audio input is not recognized as representative text, the updated CMN function is replaced with the previous CMN function. | 12-19-2013 |
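The update-then-rollback loop described above might be sketched as follows; the exponential-smoothing update rule is an assumed form (the abstract does not give the update equation), and the recognizer is stubbed:

```python
def update_cmn(prev_mean, frames, alpha=0.9):
    """Update the cepstral mean from the current audio's frames.

    Exponential smoothing is an illustrative choice of update rule.
    """
    dim = len(prev_mean)
    frame_mean = [sum(f[d] for f in frames) / len(frames) for d in range(dim)]
    return [alpha * m + (1 - alpha) * fm
            for m, fm in zip(prev_mean, frame_mean)]

def recognize_with_rollback(frames, cmn_mean, recognizer):
    """Apply the updated CMN; keep it only if recognition succeeds.

    `recognizer` returns text or None; on failure the previous CMN
    function is restored, as the abstract describes.
    """
    previous = list(cmn_mean)              # saved "previous CMN function"
    updated = update_cmn(cmn_mean, frames)
    normalized = [[x - m for x, m in zip(f, updated)] for f in frames]
    text = recognizer(normalized)
    if text is None:
        return previous, None              # roll back the update
    return updated, text

frames = [[1.0, 2.0], [3.0, 4.0]]
mean, text = recognize_with_rollback(frames, [0.0, 0.0], lambda f: None)
print(mean)   # failure: mean rolled back to the previous function
```

Gating the normalization update on recognition feedback keeps a bad audio segment (noise, cross-talk) from corrupting the channel estimate used for later utterances.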
20130339015 | TERMINAL APPARATUS AND CONTROL METHOD THEREOF - A terminal apparatus is provided. The terminal apparatus includes a voice collecting unit which collects a user's voice, a communicating unit which transmits the collected user's voice to an external server and which receives response information in response to the user's voice, a voice converting unit which converts the response information into voice signals, a voice outputting unit which outputs the converted voice signals, and a controller which analyzes at least one of a frequency and a tone of the collected user's voice and controls the voice converting unit so that the response information is converted into voice signals having voice features corresponding to the analyzed result. | 12-19-2013 |
20130339016 | SPEECH RECOGNITION AND TRANSCRIPTION AMONG USERS HAVING HETEROGENEOUS PROTOCOLS - A system for facilitating free form dictation, including directed dictation and constrained recognition and/or structured transcription among users having heterogeneous protocols for generating, transcribing, and exchanging recognized and transcribed speech. The system includes a system transaction manager having a “system protocol,” to receive a speech information request from an authorized user. The speech information request is generated using a user interface capable of bi-directional communication with the system transaction manager and supporting dictation applications. A speech recognition and/or transcription engine (ASR), in communication with the system transaction manager, receives the speech information request, generates a transcribed response, and transmits the response to the system transaction manager. The system transaction manager routes the response to one or more of the users. In another embodiment, the system employs a virtual sound driver for streaming free form dictation to any ASR. | 12-19-2013 |
20130339017 | Speech to Message Processing - Voice message processors are configured to produce text representations of voice messages. The text representations can be compacted based on one or more abbreviation libraries or rule libraries. Abbreviation processing can be applied to produce a compact text representation based on display properties of a destination device or to enhance user perception. Text representation length can be reduced based on abbreviations in a standard abbreviation list, a user specific abbreviation list, or a combination of standard and custom lists. In some examples, text length is shortened based on stored rules. | 12-19-2013 |
20130346075 | FACILITATION OF CONCURRENT CONSUMPTION OF MEDIA CONTENT BY MULTIPLE USERS USING SUPERIMPOSED ANIMATION - Embodiments of apparatus, computer-implemented methods, systems, devices, and computer-readable media are described herein for facilitation of concurrent consumption of media content by a first user of a first computing device and a second user of a second computing device. In various embodiments, facilitation may include superimposition of an animation of the second user over the media content presented on the first computing device, based on captured visual data of the second user received from the second computing device. In various embodiments, the animation may be visually emphasized on determination of the first user's interest in the second user. In various embodiments, facilitation may include conditional alteration of captured visual data of the first user based at least in part on whether the second user has been assigned a trusted status, and transmittal of the altered or unaltered visual data of the first user to the second computing device. | 12-26-2013 |
20130346076 | VISUAL CONFIRMATION OF VOICE RECOGNIZED TEXT INPUT - A computing device receives an audio input from a user. The computing device determines a series of words from the audio input. The computing device outputs, for display, one or more substituted symbols. The one or more substituted symbols correspond to at least a portion of the series of words. In response to determining that receipt of the audio input has completed, the computing device outputs, for display, alphanumeric characters comprising the series of words in place of the one or more substituted symbols. | 12-26-2013 |
20130346077 | DYNAMIC LANGUAGE MODEL - Methods, systems, and apparatus, including computer programs encoded on computer storage media, for speech recognition. One of the methods includes receiving a base language model for speech recognition including a first word sequence having a base probability value; receiving a voice search query associated with a query context; determining that a customized language model is to be used when the query context satisfies one or more criteria associated with the customized language model; obtaining the customized language model, the customized language model including the first word sequence having an adjusted probability value being the base probability value adjusted according to the query context; and converting the voice search query to a text search query based on one or more probabilities, each of the probabilities corresponding to a word sequence in a group of one or more word sequences, the group including the first word sequence having the adjusted probability value. | 12-26-2013 |
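The context-conditioned probability adjustment in entry 20130346077 can be sketched as follows; the criteria predicates, multiplicative boost factors, and dictionary representation of the language model are simplifying assumptions, not the claimed method.

```python
def choose_model(base_probs, query_context, custom_models):
    """Return a customized language model when the query context satisfies a
    model's criteria; otherwise fall back to the base model."""
    for criteria, boosts in custom_models:
        if criteria(query_context):
            adjusted = dict(base_probs)
            for seq, factor in boosts.items():
                # base probability value adjusted according to the query context
                adjusted[seq] = min(1.0, base_probs.get(seq, 0.0) * factor)
            return adjusted
    return base_probs

def best_sequence(probs):
    """Convert to text by picking the word sequence with the highest probability."""
    return max(probs, key=probs.get)
```

With a maps-app context boosting "pizza near me", the adjusted model can outrank a sequence that the base model prefers.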
20130346078 | MIXED MODEL SPEECH RECOGNITION - In one aspect, a method comprises accessing audio data generated by a computing device based on audio input from a user, the audio data encoding one or more user utterances. The method further comprises generating a first transcription of the utterances by performing speech recognition on the audio data using a first speech recognizer that employs a language model based on user-specific data. The method further comprises generating a second transcription of the utterances by performing speech recognition on the audio data using a second speech recognizer that employs a language model independent of user-specific data. The method further comprises determining that the second transcription of the utterances includes a term from a predefined set of one or more terms. The method further comprises, based on determining that the second transcription of the utterance includes the term, providing an output of the first transcription of the utterance. | 12-26-2013 |
20130346079 | SPEECH RECOGNITION AND TRANSCRIPTION AMONG USERS HAVING HETEROGENEOUS PROTOCOLS - A system for facilitating free form dictation, including directed dictation and constrained recognition and/or structured transcription among users having heterogeneous protocols for generating, transcribing, and exchanging recognized and transcribed speech. The system includes a system transaction manager having a “system protocol,” to receive a speech information request from an authorized user. The speech information request is generated using a user interface capable of bi-directional communication with the system transaction manager and supporting dictation applications. A speech recognition and/or transcription engine (ASR), in communication with the system transaction manager, receives the speech information request, generates a transcribed response, and transmits the response to the system transaction manager. The system transaction manager routes the response to one or more of the users. In another embodiment, the system employs a virtual sound driver for streaming free form dictation to any ASR. | 12-26-2013 |
20140006020 | TRANSCRIPTION METHOD, APPARATUS AND COMPUTER PROGRAM PRODUCT | 01-02-2014 |
20140006021 | METHOD FOR ADJUSTING DISCRETE MODEL COMPLEXITY IN AN AUTOMATIC SPEECH RECOGNITION SYSTEM | 01-02-2014 |
20140006022 | DISPLAY APPARATUS, METHOD FOR CONTROLLING DISPLAY APPARATUS, AND INTERACTIVE SYSTEM | 01-02-2014 |
20140012574 | INTERACTIVE TIMELINE FOR PRESENTING AND ORGANIZING TASKS - A system, method and computer program for performing voice commands and presenting results on an interactive timeline is disclosed. A user may utter a voice command (e.g. into a mobile device) which is processed to derive the intention, specifically by determining the domain, at least one task and at least one parameter for the task. A services component performs the task identified and presents the results. In various embodiments, the results are presented on a timeline and may be grouped together by domains and presented chronologically. A search history view may also be viewed that includes search results sorted chronologically each of which is represented graphically by an icon that represents a search domain. A voice command may be presented by a text representation with an edit button, a resay button, and a progress bar. The text representation may be modified while the natural language processing is being performed. | 01-09-2014 |
20140019126 | SPEECH-TO-TEXT RECOGNITION OF NON-DICTIONARY WORDS USING LOCATION DATA - Speech-to-text recognition of non-dictionary words by an electronic device having a speech-to-text recognizer and a global positioning system (GPS) includes receiving a user's speech and attempting to convert the speech to text using at least a word dictionary; in response to a portion of the speech being unrecognizable, determining if the speech contains a location-based phrase that contains a term relating to any combination of a geographic origin or destination, a current location, and a route; retrieving from a global positioning system location data that are within geographical proximity to the location-based phrase, wherein the location data include any combination of street names, business names, places of interest, and municipality names; updating the word dictionary by temporarily adding words from the location data to the word dictionary; and using the updated word dictionary to convert the previously unrecognized portion of the speech to text. | 01-16-2014 |
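The temporary-dictionary step of entry 20140019126 reduces to a simple retry against an augmented vocabulary. In this sketch the token-level lookup stands in for real speech recognition, and the `nearby_names` list stands in for GPS-retrieved street, business, and municipality names; both are illustrative assumptions.

```python
def recognize_tokens(tokens, dictionary, nearby_names):
    """Mark each token as recognized or not; on a miss, retry against a
    dictionary temporarily augmented with nearby place names from GPS data."""
    # temporary update: add location-data words without mutating the base dictionary
    augmented = set(dictionary) | {n.lower() for n in nearby_names}
    out = []
    for tok in tokens:
        if tok.lower() in dictionary:
            out.append((tok, "dictionary"))   # recognized by the base word dictionary
        elif tok.lower() in augmented:
            out.append((tok, "location"))     # recovered via the location data
        else:
            out.append((tok, "unknown"))      # still unrecognizable
    return out
```

Because the augmented set is built fresh per call, the location words are "temporarily added" in the abstract's sense: they never persist in the base dictionary.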
20140019127 | METHOD FOR CORRECTING VOICE RECOGNITION ERROR AND BROADCAST RECEIVING APPARATUS APPLYING THE SAME - A method for correcting a voice recognition error and a broadcast receiving apparatus applying the same are provided. The method for correcting the voice recognition error includes, receiving a user's spoken command, recognizing the user's spoken command and determining text corresponding to the user's spoken command, if a user command to correct the determined text is input, displaying a text correction user interface in which a morpheme of the determined text and an indicator are associated with each other, and correcting the morpheme of the determined text by selecting the associated indicator of the text correction UI. Accordingly, the broadcast receiving apparatus exactly corrects the misrecognized word with a word desired by the user. | 01-16-2014 |
20140019128 | Voice Based System and Method for Data Input - Described herein are systems and methods for transforming a speech input into machine-interpretable structured data. In some embodiments, a system may include an automated speech recognition (ASR) engine configured to receive a live speech input and to continuously generate a text of the live speech input, a natural language processing (NLP) engine configured to transform the text into machine-interpretable structured data, and a user interface device configured to display the live speech input and a corresponding portion of the structured data in a predetermined order with respect to the structured data. In some embodiments, the method may include the steps of receiving a speech input with a speech capture component of a user interface device, generating a text from the speech input, identifying textual cues in the text, modifying the text based on the textual cues, and transforming the modified text into machine-interpretable structured data. | 01-16-2014 |
20140019129 | DIFFERENTIAL DYNAMIC CONTENT DELIVERY WITH TEXT DISPLAY IN DEPENDENCE UPON SIMULTANEOUS SPEECH - Differential dynamic content delivery including providing a session document for a presentation, wherein the session document includes a session grammar and a session structured document; selecting from the session structured document a classified structural element in dependence upon user classifications of a user participant in the presentation; presenting the selected structural element to the user; streaming presentation speech to the user including individual speech from at least one user participating in the presentation; converting the presentation speech to text; detecting whether the presentation speech contains simultaneous individual speech from two or more users; and displaying the text if the presentation speech contains simultaneous individual speech from two or more users. | 01-16-2014 |
20140032214 | System and Method for Adapting Automatic Speech Recognition Pronunciation by Acoustic Model Restructuring - Disclosed herein are systems, computer-implemented methods, and computer-readable storage media for recognizing speech by adapting automatic speech recognition pronunciation by acoustic model restructuring. The method identifies an acoustic model and a matching pronouncing dictionary trained on typical native speech in a target dialect. The method collects speech from a new speaker resulting in collected speech and transcribes the collected speech to generate a lattice of plausible phonemes. Then the method creates a custom speech model for representing each phoneme used in the pronouncing dictionary by a weighted sum of acoustic models for all the plausible phonemes, wherein the pronouncing dictionary does not change, but the model of the acoustic space for each phoneme in the dictionary becomes a weighted sum of the acoustic models of phonemes of the typical native speech. Finally the method includes recognizing via a processor additional speech from the target speaker using the custom speech model. | 01-30-2014 |
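The "weighted sum of acoustic models" idea in entry 20140032214 can be illustrated with one-dimensional Gaussian acoustic models. The Gaussian form, the single feature dimension, and the fixed weights are all assumptions made for the sketch; a real system would use full acoustic models and weights learned from the new speaker's phoneme lattice.

```python
import math

def gaussian_loglike(x, mean, var):
    """Log-likelihood of a 1-D feature under a Gaussian acoustic model."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)

def restructured_loglike(x, weights, models):
    """Score a frame for one dictionary phoneme as the log of a weighted sum
    of the acoustic models of all plausible phonemes (the dictionary itself
    is unchanged; only the acoustic-space model is restructured)."""
    total = sum(w * math.exp(gaussian_loglike(x, *models[p]))
                for p, w in weights.items())
    return math.log(total)
```

With all weight on one phoneme the score collapses to that phoneme's own model; mixing in a worse-matching phoneme lowers the score for a frame the original phoneme fit well.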
20140032215 | Asynchronous Video Interview System - Aspects of an asynchronous video interview system and related techniques include a server that receives a plurality of pre-recorded video prompts, generates an interview script, transmits a video prompt from the interview script to be displayed at a client computing device, and receives a streamed video response from the client computing device. The server can perform algorithmic analysis on content of the video response. In another aspect, a server obtains response preference data indicating a timing parameter for a response. In another aspect, a video prompt and an information supplement (e.g., a news item) that relates to the content of the video prompt are transmitted. In another aspect, a server automatically selects a video prompt (e.g., a follow-up question) to be displayed at the client computing device (e.g., based on a response or information about an interviewee). | 01-30-2014 |
20140039887 | IDENTIFYING CORRESPONDING REGIONS OF CONTENT - A content alignment service may generate content synchronization information to facilitate the synchronous presentation of audio content and textual content. In some embodiments, a region of the textual content whose correspondence to the audio content is uncertain may be analyzed to determine whether the region of textual content corresponds to one or more words that are audibly presented in the audio content, or whether the region of textual content is a mismatch with respect to the audio content. In some embodiments, words in the textual content that correspond to words in the audio content are synchronously presented, while mismatched words in the textual content may be skipped to maintain synchronous presentation. Accordingly, in one example application, an audiobook is synchronized with an electronic book, so that as the electronic book is displayed, corresponding words of the audiobook are audibly presented. | 02-06-2014 |
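The mismatch-skipping behavior of entry 20140039887 can be sketched with a greedy word-level alignment; real alignment would work on timed recognition output rather than exact string matches, so treat this purely as an illustration of skipping to maintain synchronous presentation.

```python
def align(audio_words, text_words):
    """Align audiobook words to e-book words; mismatched text regions are
    marked as skipped (None) so synchronous highlighting stays on track."""
    i, out = 0, []
    for word in text_words:
        if i < len(audio_words) and audio_words[i] == word:
            out.append((word, i))    # word is audibly presented at audio index i
            i += 1
        else:
            out.append((word, None)) # mismatch: skip to maintain synchronization
    return out
```

A text-only region (say, an illustration caption absent from the narration) gets `None` positions, while the surrounding narrated words keep consecutive audio indices.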
20140039888 | SPEECH RECOGNITION MODELS BASED ON LOCATION INDICIA - Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for performing speech recognition using models that are based on where, within a building, a speaker makes an utterance are disclosed. The methods, systems, and apparatus include actions of receiving data corresponding to an utterance, and obtaining location indicia for an area within a building where the utterance was spoken. Further actions include selecting one or more models for speech recognition based on the location indicia, wherein each of the selected one or more models is associated with a weight based on the location indicia. Additionally, the actions include generating a composite model using the selected one or more models and the respective weights of the selected one or more models. And the actions also include generating a transcription of the utterance using the composite model. | 02-06-2014 |
20140039889 | SYSTEMS AND METHODS FOR PROVIDING AN ELECTRONIC DICTATION INTERFACE - Some embodiments disclosed herein store a target application and a dictation application. The target application may be configured to receive input from a user. The dictation application interface may include a full overlay mode option, where in response to selection of the full overlay mode option, the dictation application interface is automatically sized and positioned over the target application interface to fully cover a text area of the target application interface to appear as if the dictation application interface is part of the target application interface. The dictation application may be further configured to receive an audio dictation from the user, convert the audio dictation into text, provide the text in the dictation application interface and in response to receiving a first user command to complete the dictation, automatically copy the text from the dictation application interface and inserting the text into the target application interface. | 02-06-2014 |
20140046660 | METHOD AND SYSTEM FOR VOICE BASED MOOD ANALYSIS - A computer-implemented method for voice based mood analysis includes receiving an acoustic speech of a plurality of words from a user in response to the user utilizing speech to text mode. The computer-implemented method also includes analyzing the acoustic speech to distinguish voice patterns. Further, the computer-implemented method includes measuring a plurality of tone parameters from the voice patterns, wherein the tone parameters comprises voice decibel, timbre and pitch. Furthermore, the computer-implemented method includes identifying mood of the user based on the plurality of tone parameters. Moreover, the computer-implemented method includes streaming appropriate web content to the user based on the mood of the user. | 02-13-2014 |
20140046661 | APPARATUSES, METHODS AND SYSTEMS TO PROVIDE TRANSLATIONS OF INFORMATION INTO SIGN LANGUAGE OR OTHER FORMATS - Some embodiments provide methods of providing a translation of information to a translated format comprising: receiving information in a first format; identifying the first format, wherein the first format is one of a plurality of different formats configured to be received; processing the information in accordance with the first format and extracting one or more speech elements from the information; identifying, through at least one processor configured to translate the received information, one or more sign language identifiers corresponding to the one or more extracted speech elements, wherein at least one of the one or more sign language identifiers directly corresponds to a synonym of at least one of the one or more speech elements; and causing one or more sign language clips corresponding to at least one of the one or more sign language identifiers to be reproduced on a display of a displaying device. | 02-13-2014 |
20140052442 | System and Method for the Transformation and Canonicalization of Semantically Structured Data - A method of transforming and canonicalizing semantically structured data includes obtaining data from a network of computers, applying text patterns to the obtained data and placing the data in a first data file, providing a second data file containing the obtained data in a uniform format, and generating interface specific sentences from the data in the second data file. | 02-20-2014 |
20140058727 | MULTIMEDIA RECORDING SYSTEM AND METHOD - A multimedia recording system is provided. The multimedia recording system includes a storage module, a recognition module, and a tagging module. The storage module stores a multimedia file corresponding to multimedia data with audio content, wherein the multimedia data is received through a computer network. The recognition module converts the audio content of the multimedia data into text. The tagging module produces tag information according to the text, wherein the tag information corresponds to portion(s) of the multimedia file. The disclosure further provides a multimedia recording method. | 02-27-2014 |
20140058728 | Speech Recognition with Parallel Recognition Tasks - The subject matter of this specification can be embodied in, among other things, a method that includes receiving an audio signal and initiating speech recognition tasks by a plurality of speech recognition systems (SRS's). Each SRS is configured to generate a recognition result specifying possible speech included in the audio signal and a confidence value indicating a confidence in a correctness of the speech result. The method also includes completing a portion of the speech recognition tasks including generating one or more recognition results and one or more confidence values for the one or more recognition results, determining whether the one or more confidence values meets a confidence threshold, aborting a remaining portion of the speech recognition tasks for SRS's that have not generated a recognition result, and outputting a final recognition result based on at least one of the generated one or more speech results. | 02-27-2014 |
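The abort-on-confidence scheme of entry 20140058728 maps naturally onto concurrent futures. In this sketch each recognizer is a callable returning a `(text, confidence)` pair; the thread-pool mechanics and the best-so-far tie-handling are assumptions, not the patented design.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def recognize_parallel(audio, recognizers, threshold):
    """Run several speech recognizers concurrently; return the best result,
    aborting remaining recognizers once one meets the confidence threshold."""
    best = None
    with ThreadPoolExecutor(max_workers=len(recognizers)) as pool:
        futures = [pool.submit(r, audio) for r in recognizers]
        for fut in as_completed(futures):
            text, confidence = fut.result()
            if best is None or confidence > best[1]:
                best = (text, confidence)
            if confidence >= threshold:
                for other in futures:   # abort recognition tasks not yet started
                    other.cancel()
                break
    return best
```

Note that `Future.cancel` only prevents not-yet-started tasks from running; already-running recognizers finish, which matches the abstract's "SRS's that have not generated a recognition result."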
20140058729 | SPEECH RECOGNITION SYSTEM, SPEECH RECOGNITION REQUEST DEVICE, SPEECH RECOGNITION METHOD, SPEECH RECOGNITION PROGRAM, AND RECORDING MEDIUM - Provided is a speech recognition system, including: a first information processing device including a speech recognition processing unit for receiving data to be used for speech recognition transmitted via a network, carrying out speech recognition processing, and returning resultant data; and a second information processing device connected to the first information processing device via the network. The second information processing device converts the data into a format that prevents its content from being perceived while still enabling the speech recognition processing unit to perform the speech recognition processing. Thereafter, the second information processing device transmits the data to be used for the speech recognition by the speech recognition processing unit and reconstructs the resultant data returned from the first information processing device into a valid and perceivable recognition result. | 02-27-2014 |
20140058730 | SYSTEMS AND METHODS FOR EVENT AND INCIDENT REPORTING AND MANAGEMENT - Systems and methods for information and action management including managing and communicating critical and non-critical information relating to certain emergency services events or incidents as well as other applications. More specifically, systems and methods for information and action management including a plurality of mobile interface units that include one or more of a language translation sub-system, an action receipt sub-system, a voice-to-text conversion sub-system, a media management sub-system, a revision management sub-system that restricts the abilities of some users, and a report generation sub-system that creates reports operatively coupled to the language translation sub-system, the action receipt sub-system, the voice-to-text conversion sub-system, the media management sub-system, and the revision management sub-system to auto-populate report fields. | 02-27-2014 |
20140067389 | Apparatus and Method for Queuing Jobs in a Distributed Dictation/Transcription System - A distributed dictation/transcription system is provided. The system provides a client station, dictation manager, and dictation server connected such that the dictation manager can select a dictation server to transcribe audio from the client station. A job queue at the dictation manager queues the audio to be provided to the dictation servers. The dictation manager reviews all jobs in the job queue and sends audio whose user profile matches a user profile already uploaded to the dictation server, regardless of whether that audio is next in the job queue. If alternative audio has been pending over a predetermined amount of time or has a higher priority, the alternative audio is sent to the dictation server. | 03-06-2014 |
20140067390 | Computer-Implemented System And Method For Transcribing Verbal Messages - A computer-implemented system and method for transcribing verbal messages is provided. Verbal messages each comprising audio content are received. Automatically recognized text is generated for the audio content of at least one of the verbal messages. A turn-around processing time is applied to the verbal message. The automatically recognized text and verbal message are transferred to a human agent when an expected processing time of the verbal message satisfies the turn-around processing time. At least a portion of the automatically recognized text is replaced with manual transcription from the human agent. The automatically recognized text and manual transcription are provided as a text message. | 03-06-2014 |
20140074465 | SYSTEM AND METHOD TO GENERATE A NARRATOR SPECIFIC ACOUSTIC DATABASE WITHOUT A PREDEFINED SCRIPT - A system and method for generating an acoustic database for a particular narrator that does not require said narrator to recite a pre-defined script. The system and method generate an acoustic database by using voice recognition or speech-to-text algorithms to automatically generate a text script of a voice message while simultaneously or near-simultaneously sampling the voice message to create the acoustic database. The acoustic database may be associated with the narrator of the voice message by an identifier, such as a telephone number. The acoustic database may then be used by a text-to-speech processor to read a text message when the narrator is identified as the sender of the text message, providing an audio output of the contents of the text message with a simulation of the sender's voice. The user of the system may be provided an audio message that sounds like the originator of the text message. | 03-13-2014 |
20140074466 | ANSWERING QUESTIONS USING ENVIRONMENTAL CONTEXT - Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for receiving audio data encoding an utterance and environmental data, obtaining a transcription of the utterance, identifying an entity using the environmental data, submitting a query to a natural language query processing engine, wherein the query includes at least a portion of the transcription and data that identifies the entity, and obtaining one or more results of the query. | 03-13-2014 |
20140074467 | Speaker Separation in Diarization - A system and method for separating speakers in an audio file, including obtaining an audio file. The audio file is transcribed into at least one text file by a transcription server. Homogenous speech segments are identified within the at least one text file. The audio file is segmented into homogenous audio segments that correspond to the identified homogenous speech segments. The homogenous audio segments of the audio file are separated into a first speaker audio file and a second speaker audio file. The first speaker audio file and the second speaker audio file are then transcribed to produce a diarized transcript. | 03-13-2014 |
20140081633 | Voice-Based Media Searching - Methods and systems for searching for media items using a voice-based digital assistant are described. Natural language text strings corresponding to search queries are provided. The search queries include query terms. The text strings may correspond to speech inputs input by a user into an electronic device. At least one information source is searched to identify at least one parameter associated with at least one of the query terms. The parameters include at least one of a time parameter, a date parameter, or a geo-code parameter. The parameters are compared to tags of media items to identify matches. In some implementations, media items whose tags match the parameter are presented to the user. | 03-20-2014 |
20140081634 | LEVERAGING HEAD MOUNTED DISPLAYS TO ENABLE PERSON-TO-PERSON INTERACTIONS - Various arrangements for using an augmented reality device are presented. Speech spoken by a person in a real-world scene may be captured by an augmented reality (AR) device. It may be determined that a second AR device is to receive data on the speech. The second AR device may not have been present for the speech when initially spoken. Data corresponding to the speech may be transmitted to the second augmented reality device. | 03-20-2014 |
20140081635 | Providing Text Input Using Speech Data and Non-Speech Data - Systems, methods, and computer readable media providing a speech input interface. The interface can receive speech input and non-speech input from a user through a user interface. The speech input can be converted to text data and the text data can be combined with the non-speech input for presentation to a user. | 03-20-2014 |
20140088961 | Captioning Using Socially Derived Acoustic Profiles - Mechanisms for performing dynamic automatic speech recognition on a portion of multimedia content are provided. Multimedia content is segmented into homogeneous segments of content with regard to speakers and background sounds. For the at least one segment, a speaker providing speech in an audio track of the at least one segment is identified using information retrieved from a social network service source. A speech profile for the speaker is generated using information retrieved from the social network service source, an acoustic profile for the segment is generated based on the generated speech profile, and an automatic speech recognition engine is dynamically configured for operation on the at least one segment based on the acoustic profile. Automatic speech recognition operations are performed on the audio track of the at least one segment to generate a textual representation of speech content in the audio track corresponding to the speaker. | 03-27-2014 |
20140088962 | APPARATUS AND METHODS FOR MANAGING RESOURCES FOR A SYSTEM USING VOICE RECOGNITION - The technology of the present application provides a method and apparatus for managing resources for a system using voice recognition. The method and apparatus include maintaining a database of historical data regarding a plurality of users. The historical database maintains data regarding the training resources required for users to achieve an accuracy score using voice recognition. A resource calculation module determines from the historical data an expected amount of training resources necessary to train a new user to the accuracy score. | 03-27-2014 |
20140095158 | CONTINUOUS AMBIENT VOICE CAPTURE AND USE - An apparatus, system and method for continuously capturing ambient voice and using it to update content delivered to a user of an electronic device are provided. Subsets of words are continuously extracted from speech and used to deliver content relevant to the subsets of words. | 04-03-2014 |
20140095159 | IMAGE PROCESSING APPARATUS AND CONTROL METHOD THEREOF AND IMAGE PROCESSING SYSTEM - An image processing apparatus including: an image processor which processes a broadcasting signal to display an image based on the processed broadcasting signal; a communication unit which is connected to a server; a voice input unit which receives a user's speech; a voice processor which processes performance of a preset corresponding operation according to a voice command corresponding to the speech; and a controller which processes the voice command corresponding to the speech through one of the voice processor and the server if the speech is input through the voice input unit. If the voice command includes a keyword relating to a call sign of a broadcasting channel, the controller controls one of the voice processor and the server to select a recommended call sign corresponding to the keyword according to a predetermined selection condition, and performs a corresponding operation under the voice command with respect to the broadcasting channel of the recommended call sign. | 04-03-2014 |
20140095160 | CORRECTING TEXT WITH VOICE PROCESSING - The present invention relates to voice processing and provides a method and system for correcting a text. The method comprising: determining a target text unit to be corrected in a text; receiving a reference voice segment input by the user for the target text unit; determining a reference text unit whose pronunciation is similar to a word in the target text unit based on the reference voice segment; and correcting the word in the target text unit in the text by the reference text unit. The present invention enables the user to easily correct errors in the text vocally. | 04-03-2014 |
20140108010 | VOICE-ENABLED DOCUMENTS FOR FACILITATING OPERATIONAL PROCEDURES - A voice-enabled document system facilitates execution of service delivery operations by eliminating the need for manual or visual interaction during information retrieval by an operator. Access to voice-enabled documents can facilitate operations for mobile vendors, on-site or field-service repairs, medical service providers, food service providers, and the like. Service providers can access the voice-enabled documents by using a client device to retrieve the document, display it on a screen, and, via voice commands initiate playback of selected audio files containing information derived from text data objects selected from the document. Data structures that are components of a voice-enabled document include audio playback files and a logical association that links the audio playback files to user-selectable fields, and to a set of voice commands. | 04-17-2014 |
20140114656 | ELECTRONIC DEVICE CAPABLE OF GENERATING TAG FILE FOR MEDIA FILE BASED ON SPEAKER RECOGNITION - An electronic device with a speaker recognition function is provided. The electronic device includes a speaker recognition unit that can perform speaker recognition on a media file including speech content, thus determining the speakers of the speech content. The processor of the electronic device determines the time durations during which each of the speakers is speaking, and generates a tag file including the identities of the speakers and the time durations corresponding to each of the speakers. The tag file is associated with the media file. | 04-24-2014 |
20140114657 | APPARATUS AND METHOD FOR INSERTING MATERIAL INTO TRANSCRIPTS - A system and method for placing and displaying advertising into a document including transcribing text in real-time in a recording device; communicating the transcribed text to a computer configured to embed an advertisement into the transcribed text; receiving a request from a user to access the transcribed text with the embedded advertisement; and communicating the transcribed text with the embedded advertisement to a user's peripheral device. | 04-24-2014 |
20140114658 | METHODS AND SYSTEMS FOR CORRECTING TRANSCRIBED AUDIO FILES - Methods and systems for correcting transcribed text. One method includes receiving audio data from one or more audio data sources and transcribing the audio data based on a voice model to generate text data. The method also includes making the text data available to a plurality of users over at least one computer network and receiving corrected text data over the at least one computer network from the plurality of users. In addition, the method can include modifying the voice model based on the corrected text data. | 04-24-2014 |
20140114659 | IDENTIFYING MEDIA CONTENT - Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for receiving (i) audio data that encodes a spoken natural language query, and (ii) environmental audio data, obtaining a transcription of the spoken natural language query, determining a particular content type associated with one or more keywords in the transcription, providing at least a portion of the environmental audio data to a content recognition engine, and identifying a content item that has been output by the content recognition engine, and that matches the particular content type. | 04-24-2014 |
20140122069 | Automatic Speech Recognition Accuracy Improvement Through Utilization of Context Analysis - A mechanism is provided for utilizing content analytics to automate corrections and improve speech recognition accuracy. A set of current corrected content elements is identified within a transcribed corrected media. Each current corrected content element in the set of current corrected content elements is weighted with an assigned weight based on one or more predetermined weighting conditions and a context of the transcribed corrected media. A confidence level is associated with each corrected content element based on the assigned weight. The set of current corrected content elements and the confidence level associated with each current corrected content element are stored in a storage device for use in a subsequent transcription correction. | 05-01-2014 |
20140122070 | GRAPHIC DISPLAY SYSTEM FOR ASSISTING VEHICLE OPERATORS - A system for converting audible air traffic control instructions for pilots operating from an air facility to textual format. The system may comprise a processor connected to a jack of the standard pilot headset and a separate portable display screen connected to the processor. The processor may have a language converting functionality which can recognize traffic control nomenclature and display messages accordingly. Displayed text may be limited to information intended for a specific aircraft. The display may show hazardous discrepancies between authorized altitudes and headings and actual altitudes and headings. The display may be capable of correction by the user, and may utilize Global Positioning System (GPS) to obtain appropriate corrections. The system may date and time stamp communications and hold the same in memory. The system may have computer style user functions such as scrollability and operating prompts. | 05-01-2014 |
20140122071 | Method and System for Voice Recognition Employing Multiple Voice-Recognition Techniques - A method and system for voice recognition are disclosed. In one example embodiment, the method includes receiving voice input information by way of a receiver on a mobile device and performing, by way of at least one processing device on the mobile device, first and second processing operations respectively with respect to first and second voice input portions, respectively, which respectively correspond to and are based at least indirectly upon different respective portions of the voice input information. The first processing operation includes a speech-to-text operation and the second processing operation includes an alternate processing operation. Additionally, the method includes generating recognized voice information based at least indirectly upon results from the first and second processing operations, and performing at least one action based at least in part upon the recognized voice information, where the at least one action includes outputting at least one signal by an output device. | 05-01-2014 |
20140122072 | Text-to-Speech System for Stitcher - A text-to-speech system for a stitcher includes a tablet device in operative communication with the stitcher; the tablet device further comprising a display screen; a memory; a microprocessor; a communication module; a command input device; a speaker; and a text-to-speech algorithm. An associated method includes the steps of accepting a command for operation of the stitcher from a user; transmitting the command to the text-to-speech algorithm; and producing an audible confirmation of the command through the speaker. | 05-01-2014 |
20140122073 | PERSONAL AUDIO ASSISTANT DEVICE AND METHOD - A system for delivery of personalized audio services including a first microphone for capturing audio, a voice controlled interface coupled to the first microphone for detecting a voice command, a communication module for accessing information from a network in response to the detected voice command to provide results from accessing the network, a memory for storing a user profile and a user listening history, and a processor for modifying the results based on the detected voice command, the user profile and the user listening history. Other embodiments are disclosed. | 05-01-2014 |
20140129219 | Computer-Implemented System And Method For Masking Special Data - A computer-implemented system and method for masking special data is provided. Speakers of a call recording are identified. The call recording is separated into strands corresponding to each of the speakers. A prompt list of elements that prompt the speaker of the other strand to utter special information is applied to one of the strands. At least one of the elements of the prompt list is identified in the one strand. A special information candidate is identified in the other strand and is located after a location in time where the element was found in the voice recording of the one strand. A confidence score is assigned to the element located in the one strand and to the special information candidate in the other strand. The confidence scores are combined and a threshold is applied. The special information candidate is rendered unintelligible when the combined confidence scores satisfy the threshold. | 05-08-2014 |
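The confidence-combination step described in entry 20140129219 above can be sketched in a few lines. This is a minimal illustration, not the patented implementation: the function name, the simple averaging rule, and the masking character are all assumptions of this sketch.

```python
# Hypothetical sketch of the masking decision: a prompt element found in one
# speaker's strand and a special-information candidate found in the other
# strand each carry a confidence score; the combined score is thresholded to
# decide whether the candidate is rendered unintelligible.

def mask_candidate(prompt_conf, candidate_conf, text, threshold=0.7):
    """Return the candidate text, masked if the combined confidence suffices."""
    combined = (prompt_conf + candidate_conf) / 2.0  # assumed combination rule
    if combined >= threshold:
        return "*" * len(text)  # render unintelligible
    return text

print(mask_candidate(0.9, 0.8, "4111-1111"))  # masked
print(mask_candidate(0.3, 0.4, "tomorrow"))   # left intact
```

In practice the combination rule and threshold would be tuned so that, say, a credit-card number uttered right after "what's your card number?" is masked, while ordinary speech is left audible.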
20140129220 | SPEAKER AND CALL CHARACTERISTIC SENSITIVE OPEN VOICE SEARCH - Techniques disclosed herein include systems and methods for open-domain voice-enabled searching that is speaker sensitive. Techniques include using speech information, speaker information, and information associated with a spoken query to enhance open voice search results. This includes integrating a textual index with a voice index to support the entire search cycle. Given a voice query, the system can execute two matching processes simultaneously. This can include a text matching process based on the output of speech recognition, as well as a voice matching process based on characteristics of a caller or user voicing a query. Characteristics of the caller can include output of voice feature extraction and metadata about the call. The system clusters callers according to these characteristics. The system can use specific voice and text clusters to modify speech recognition results, as well as modifying search results. | 05-08-2014 |
20140136195 | Voice-Operated Internet-Ready Ubiquitous Computing Device and Method Thereof - In one aspect of this disclosure, a voice-operated internet-ready ubiquitous computing device and a method implemented by the computing device are disclosed. A sound input associated with a user is received at the first computing device, and in response to receiving the sound input, the first computing device establishes a voice session associated with the user. The first computing device then determines a session quality associated with the voice input, and sends the session quality to a further computing device such as a second computing device or a server. The first computing device will receive a request to transfer the voice session to a second computing device; and in response to receiving the request, transfers the voice session to the second computing device. | 05-15-2014 |
20140136196 | SYSTEM AND METHOD FOR POSTING MESSAGE BY AUDIO SIGNAL - A system for posting a message by an audio signal is provided. The system has: a communication unit used to connect the system to a communications network; an audio receiving unit used to receive a first audio signal; a display unit; and a processing unit, connected to the communication unit, the audio receiving unit and the display unit, used to recognize the first audio signal to generate a first string, determine a target object from a display screen displayed on the display unit according to the first string, and automatically generate a message corresponding to the target object, and post the message on a social network through the communication unit. | 05-15-2014 |
20140136197 | ACCURACY IMPROVEMENT OF SPOKEN QUERIES TRANSCRIPTION USING CO-OCCURRENCE INFORMATION - Techniques disclosed herein include systems and methods for voice-enabled searching. Techniques include a co-occurrence based approach to improve accuracy of the 1-best hypothesis for non-phrase voice queries, as well as for phrased voice queries. A co-occurrence model is used in addition to a statistical natural language model and acoustic model to recognize spoken queries, such as spoken queries for searching a search engine. Given an utterance and an associated list of automated speech recognition n-best hypotheses, the system rescores the different hypotheses using co-occurrence information. For each hypothesis, the system estimates a frequency of co-occurrence within web documents. Combined scores from a speech recognizer and a co-occurrence engine can be combined to select a best hypothesis with a lower word error rate. | 05-15-2014 |
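The rescoring idea in entry 20140136197 above, combining a speech recognizer's score with a co-occurrence frequency to pick a better hypothesis from the n-best list, can be illustrated as follows. The interpolation weight, log smoothing, and the toy co-occurrence counts are assumptions for demonstration, not values from the patent.

```python
# Illustrative rescoring of an ASR n-best list with co-occurrence information:
# each hypothesis's recognizer score is interpolated with the (log-smoothed)
# frequency with which the hypothesis co-occurs in web documents.

import math

def rescore(nbest, cooc_counts, alpha=0.7):
    """nbest: list of (hypothesis, asr_score); cooc_counts: phrase -> count."""
    rescored = []
    for hyp, asr_score in nbest:
        cooc = math.log1p(cooc_counts.get(hyp, 0))  # smoothed co-occurrence
        rescored.append((hyp, alpha * asr_score + (1 - alpha) * cooc))
    return max(rescored, key=lambda pair: pair[1])[0]

nbest = [("wreck a nice beach", 0.52), ("recognize speech", 0.48)]
counts = {"recognize speech": 5400, "wreck a nice beach": 12}
print(rescore(nbest, counts))  # the frequent phrase wins despite a lower ASR score
```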
20140136198 | CORRECTING TEXT WITH VOICE PROCESSING - The present invention relates to voice processing and provides a method and system for correcting a text. The method comprises: determining a target text unit to be corrected in a text; receiving a reference voice segment input by the user for the target text unit; determining a reference text unit whose pronunciation is similar to a word in the target text unit based on the reference voice segment; and correcting the word in the target text unit in the text with the reference text unit. The present invention enables the user to easily correct errors in the text vocally. | 05-15-2014 |
20140136199 | CORRECTING TRANSCRIBED AUDIO FILES WITH AN EMAIL-CLIENT INTERFACE - Methods and systems for requesting a transcription of audio data. One method includes displaying a send-for-transcription button within an email-client interface on a computer-controlled display, and automatically sending a selected email message and associated audio data to a transcription server as a request for a transcription of the associated audio data when a user selects the send-for-transcription button. | 05-15-2014 |
20140142937 | GESTURE-AUGMENTED SPEECH RECOGNITION - Methods and systems may provide for generating text based on speech input and recognizing one or more hand gestures. Additionally, an adaptation of the text may be conducted based on the hand gestures. In one example, the hand gestures are associated with operations such as punctuation insertion operations, cursor movement operations, text selection operations, capitalization operations, pause of speech recognition operations, resume of speech recognition operations, application-specific actions, and so forth, wherein the adaptation of the text includes the associated operation. | 05-22-2014 |
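The gesture-to-operation association in entry 20140142937 above amounts to a dispatch from recognized gestures to text-adaptation operations. The gesture names and their assigned operations below are invented for illustration; the patent lists the operation categories but not this mapping.

```python
# Hypothetical dispatch table associating recognized hand gestures with text
# adaptations (punctuation insertion, capitalization, etc.).

def capitalize_last(text):
    """Capitalize the most recently dictated word."""
    words = text.split()
    words[-1] = words[-1].capitalize()
    return " ".join(words)

GESTURE_ACTIONS = {
    "tap": lambda text: text + ".",   # punctuation insertion
    "swipe_up": capitalize_last,      # capitalization
    "fist": lambda text: text,        # pause of recognition (no-op on the text)
}

def adapt(text, gestures):
    """Apply the operation associated with each recognized gesture in turn."""
    for gesture in gestures:
        text = GESTURE_ACTIONS.get(gesture, lambda t: t)(text)
    return text

print(adapt("send it to bob", ["swipe_up", "tap"]))  # "send it to Bob."
```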
20140142938 | MESSAGE PROCESSING DEVICE - A message processing device includes a speech input recognition unit; an event processing unit which processes an event including a speech recognition result or a command; an expert unit including a plurality of expert modules, each of which processes the event in cooperation with the event processing unit; and an execution history management unit which manages the execution history of the expert modules. For recording user speech as message text, the expert unit is provided with a speech processing expert module for producing standard-format text, a speech processing expert module for producing free-format text, and a transmission expert module for sending the message text. | 05-22-2014 |
20140142939 | METHOD AND SYSTEM FOR VOICE TO TEXT REPORTING FOR MEDICAL IMAGE SOFTWARE - A system and method for voice to text reporting for medical image software. The system and method may optionally include a separate voice to text engine, for converting the voice report to text, and also some type of medical image software, for providing medical image processing capabilities. According to at least some embodiments, both capabilities are provided remotely to the user's computer, and may optionally be provided through a “zero footprint” on the user's computer. | 05-22-2014 |
20140142940 | Diarization Using Linguistic Labeling - Systems and methods of diarization using linguistic labeling include receiving a set of diarized textual transcripts. At least one heuristic is automatically applied to the diarized textual transcripts to select transcripts likely to be associated with an identified group of speakers. The selected transcripts are analyzed to create at least one linguistic model. The linguistic model is applied to transcripted audio data to label a portion of the transcripted audio data as having been spoken by the identified group of speakers. Still further embodiments of diarization using linguistic labeling may serve to label agent speech and customer speech in a recorded and transcripted customer service interaction. | 05-22-2014 |
20140142941 | GENERATION OF TIMED TEXT USING SPEECH-TO-TEXT TECHNOLOGY, AND APPLICATIONS THEREOF - Embodiments relate to generation of timed text in web video. In an embodiment, a computer-implemented method generates timed text for online video. In the method, a request to play a timed text track of a video incorporated into a web video service is received from a client computing device. Prior to receipt of the request, audio of the video is processed to determine intermediate timed text data. The intermediate timed text data lacks a complete text transcription of the audio, but includes data to enable the complete text transcription to be generated when playing the video. In response to receipt of the request, a text transcription of the audio is determined using the intermediate data with an automated speech-to-text algorithm. Finally, the text transcription of the audio is sent to the client computing device for display along with the video. | 05-22-2014 |
20140149113 | SPEECH RECOGNITION - A speech recognition system, according to an example embodiment, includes a data storage to store speech training data. A training engine determines consecutive breakout periods in the speech training data, calculates forward and backward probabilities for the breakout periods, and generates a speech recognition Hidden Markov Model (HMM) from the forward and backward probabilities calculated for the breakout periods. | 05-29-2014 |
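The forward and backward probabilities that the training engine in entry 20140149113 computes per breakout period are the standard HMM quantities; a compact version over a toy two-state model is shown below. The parameters are arbitrary demonstration values, and the breakout-period segmentation itself is not modeled here.

```python
# Forward-backward computation over a toy two-state HMM. A useful sanity
# check: the total observation probability is the same whether accumulated
# from the forward or the backward values.

def forward_backward(obs, pi, A, B):
    n, T = len(pi), len(obs)
    fwd = [[0.0] * n for _ in range(T)]
    bwd = [[0.0] * n for _ in range(T)]
    for s in range(n):                       # forward initialization
        fwd[0][s] = pi[s] * B[s][obs[0]]
    for t in range(1, T):                    # forward recursion
        for s in range(n):
            fwd[t][s] = B[s][obs[t]] * sum(fwd[t-1][r] * A[r][s] for r in range(n))
    for s in range(n):                       # backward initialization
        bwd[T-1][s] = 1.0
    for t in range(T - 2, -1, -1):           # backward recursion
        for s in range(n):
            bwd[t][s] = sum(A[s][r] * B[r][obs[t+1]] * bwd[t+1][r] for r in range(n))
    return fwd, bwd

pi = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.9, 0.1], [0.2, 0.8]]
fwd, bwd = forward_backward([0, 1, 0], pi, A, B)
p_fwd = sum(fwd[-1])
p_bwd = sum(pi[s] * B[s][0] * bwd[0][s] for s in range(2))
print(round(p_fwd, 6) == round(p_bwd, 6))  # True
```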
20140149114 | Automatic Decision Support - Speech is transcribed to produce a transcript. At least some of the text in the transcript is encoded as data. These codings may be verified for accuracy and corrected if inaccurate. The resulting transcript is provided to a decision support system to perform functions such as checking for drug-drug, drug-allergy, and drug-procedure interactions, and checking against clinical performance measures (such as recommended treatments). Alerts and other information output by the decision support system are associated with the transcript. The transcript and associated decision support output are provided to a physician to assist the physician in reviewing the transcript and in taking any appropriate action in response to the transcript. | 05-29-2014 |
20140149115 | Computer-Implemented System And Method For Voice Transcription Error Reduction - A computer-implemented system and method for voice transcription error reduction is provided. Speech utterances are obtained from a voice stream and each speech utterance is associated with a transcribed value and a confidence score. Those utterances whose transcribed values are associated with lower confidence scores are identified as questionable utterances. One of the questionable utterances is selected from the voice stream. A predetermined number of questionable utterances from other voice streams and having transcribed values similar to the transcribed value of the selected questionable utterance are identified as a pool of related utterances. A further transcribed value is received for each of a plurality of the questionable utterances in the pool of related utterances. A transcribed message is generated for the voice stream using those transcribed values with higher confidence scores and the further transcribed value for the selected questionable utterance. | 05-29-2014 |
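The selection step in entry 20140149115 above, keeping high-confidence transcribed values and replacing low-confidence ones with a further value derived from a pool of related utterances, can be sketched as follows. Using a majority vote over the pool is an assumption of this sketch; the patent only says a further transcribed value is received.

```python
# Hedged sketch: utterances below a confidence threshold are treated as
# questionable and replaced by the majority value of a pool of similar
# questionable utterances drawn from other voice streams.

from collections import Counter

def resolve(utterances, pool, threshold=0.8):
    """utterances: list of (transcribed_value, confidence); pool: candidate values."""
    fallback = Counter(pool).most_common(1)[0][0]  # majority of related pool
    return [value if conf >= threshold else fallback
            for value, conf in utterances]

stream = [("please ship", 0.95), ("???", 0.31), ("today", 0.9)]
related_pool = ["my order", "my order", "my ardor"]
print(" ".join(resolve(stream, related_pool)))  # "please ship my order today"
```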
20140156271 | SYSTEM AND METHOD FOR BROADCASTING CAPTIONS - There is disclosed one or more methods, systems and components therefor for broadcasting captions of a presenter's speech to audience members to accompany the live viewing of the presentation. A host captioning device converts the presenter's speech to text and communicates the text to and for presentation by an audience member's client device. The communication session between the host captioning device and the client device is established by an invitation request from the host captioning device in response to a registration request from the client device. The captioning information may be communicated in real time as text. The host captioning device either connects to a network or provides one itself, thereby serving as an access point for the client devices. | 06-05-2014 |
20140156272 | VOICE ENTRY VIN METHOD AND APPARATUS - A mobile electronic device such as a smart phone receives a spoken vehicle identification number (VIN) by a user. The smart phone device interprets the spoken input as interpreted text. The device removes any spaces, punctuation, words, or sound-alike words from the interpreted text. The device replaces any prohibited characters with corresponding acceptable letters. The device displays the resulting character string as a VIN to the user for comparison to the VIN on a vehicle. The device may communicate to a database to obtain make, model, and model year information corresponding to the VIN and display that vehicle information to the user for confirmation. | 06-05-2014 |
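The clean-up pass in entry 20140156272 above is straightforward to sketch: spaces and punctuation are stripped, and prohibited VIN letters are replaced with acceptable characters. VINs genuinely never contain the letters I, O, or Q; the specific substitution map below (I→1, O→0, Q→0) is an assumption of this sketch, chosen because those are the digits the letters are usually misheard as.

```python
# Hypothetical normalization of speech-interpreted text into a VIN candidate:
# keep only alphanumerics, uppercase them, and map prohibited letters to digits.

def normalize_vin(interpreted_text):
    substitutions = {"I": "1", "O": "0", "Q": "0"}  # assumed mapping
    chars = []
    for ch in interpreted_text.upper():
        if ch.isalnum():                 # drop spaces and punctuation
            chars.append(substitutions.get(ch, ch))
    return "".join(chars)

print(normalize_vin("1hg cm82633a I04352"))  # "1HGCM82633A104352"
```

The resulting 17-character string would then be shown to the user for comparison against the vehicle's plate, as the abstract describes.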
20140163980 | MULTIMEDIA MESSAGE HAVING PORTIONS OF MEDIA CONTENT WITH AUDIO OVERLAY - A multimedia message is generated according to media content portions identified by a message input. The media content portions are identified from among media content that can include videos, images, and audio content, whether or not that audio content is originally associated with the respective media content portions. The media content portions correspond to words or phrases of the message input. A multimedia message is generated having one or more of the media content portions corresponding to the words or phrases received. Audio content of the media content portions can be separated and reassembled with media content portions different from those with which it was originally associated, so that the multimedia message comprises media content portions paired with audio content portions different from their initial ones. | 06-12-2014 |
20140163981 | Combining Re-Speaking, Partial Agent Transcription and ASR for Improved Accuracy / Human Guided ASR - A speech transcription system is described for producing a representative transcription text from one or more different audio signals representing one or more different speakers participating in a speech session. A preliminary transcription module develops a preliminary transcription of the speech session using automatic speech recognition having a preliminary recognition accuracy performance. A speech selection module enables user selection of one or more portions of the preliminary transcription to receive higher accuracy transcription processing. A final transcription module is responsive to the user selection for developing a final transcription output for the speech session having a final recognition accuracy performance for the selected one or more portions which is higher than the preliminary recognition accuracy performance. | 06-12-2014 |
20140163982 | Human Transcriptionist Directed Posterior Audio Source Separation - A graphical user interface is described for human guided audio source separation in a multi-speaker automated transcription system receiving audio signals representing speakers participating together in a speech session. A speaker avatar for each speaker is distributed about a user interface display to suggest speaker positions relative to each other during the speech session. There also is a speaker highlight element on the interface display for visually highlighting a specific speaker avatar corresponding to an active speaker in the speech session to aid a human transcriptionist listening to the speech session to identify the active speaker. A speech signal processor performs signal processing of the audio signals to isolate an audio signal corresponding to the highlighted speaker avatar. A session transcription processor performs automatic speech recognition (ASR) of the signal processed audio signal for the speech session as supervised by the human transcriptionist and reflecting position of the speaker highlight element. | 06-12-2014 |
20140163983 | DISPLAY DEVICE FOR CONVERTING VOICE TO TEXT AND METHOD THEREOF - A display device for converting a voice to a text and displaying the converted text and a method thereof are disclosed. The display device comprises a storage unit configured to store the voice data; a display unit configured to display the text; a sensor unit configured to detect a user input to the display unit; and a processor configured to convert the voice data to the text and display the converted text in the display unit, wherein the processor provides a text preview interface displaying at least a part of the text in the display unit, in response to a first user input, and provides a text output interface displaying the text in the display unit, in response to a second user input, and the text output interface displays the text in accordance with the second user input. | 06-12-2014 |
20140172425 | METHOD FOR SENDING MULTI-MEDIA MESSAGES WITH CUSTOMIZED AUDIO - A system and method of creating a customized multi-media message to a recipient is disclosed. The multi-media message is created by a sender and contains an animated entity that delivers an audible message. The sender chooses the animated entity from a plurality of animated entities. The system receives a text message from the sender and receives a sender audio message associated with the text message. The sender audio message is associated with the chosen animated entity to create the multi-media message. The multi-media message is delivered by the animated entity using as the voice the sender audio message wherein the mouth movements of the animated entity conform to the sender audio message. | 06-19-2014 |
20140172426 | Method for Processing Speech of Particular Speaker, Electronic System for the Same, and Program for Electronic System - An object of the present invention is to process the speech of a particular speaker. The present invention provides a technique for collecting speech, analyzing the collected speech to extract the features of the speech, grouping the speech, or text corresponding to the speech, on the basis of the extracted features, presenting the result of the grouping to a user, and, when one or more of the groups is selected by the user, enhancing, reducing, or cancelling the speech of a speaker associated with the selected group. | 06-19-2014 |
20140180686 | SELF CONTAINED BREATHING AND COMMUNICATION APPARATUS - A self-contained breathing and communication apparatus is described that can facilitate communication between a first user and a second user. A microphone can record sound when the first user speaks. The microphone can convert the recorded sound to a voice signal. A voice activity detection processor can detect spoken words and informative sounds of the first user from the converted voice signal. For this detection, the voice activity detection processor can remove noise from the voice signal. A voice-to-text processor can convert the detected words and informative sounds to a text message. A transmitter of the transmitting module can transmit the text message to a receiver of the second user via a communication network. A display device of the second user can display the text message on a graphical user interface. Related methods, apparatus, systems, techniques and articles are also described. | 06-26-2014 |
20140180687 | Method And Apparatus For Automatic Conversion Of Audio Data To Electronic Fields of Text Data - A computer system capable of administering a search engine program with a marketing component, which includes at least one client device operatively connected to a host server through a communication network to communicate data between the client device and the host server. A compilation program receives data in audio format for conversion into a text format. The converted data is combined with a second source of data for storage and display on client devices. | 06-26-2014 |
20140188468 | APPARATUS, SYSTEM AND METHOD FOR CALCULATING PASSPHRASE VARIABILITY - An apparatus, system and method for calculating passphrase variability are disclosed. The passphrase variability value can then be used for generating phonetically rich passwords in text-dependent speaker recognition systems, or for estimating the variability of the input passphrase in text-independent system during the enrolling process in a speech recognition security system. | 07-03-2014 |
20140188469 | DIFFERENTIAL DYNAMIC CONTENT DELIVERY WITH TEXT DISPLAY IN DEPENDENCE UPON SIMULTANEOUS SPEECH - Differential dynamic content delivery including providing a session document for a presentation, wherein the session document includes a session grammar and a session structured document; selecting from the session structured document a classified structural element in dependence upon user classifications of a user participant in the presentation; presenting the selected structural element to the user; streaming presentation speech to the user including individual speech from at least one user participating in the presentation; converting the presentation speech to text; detecting whether the presentation speech contains simultaneous individual speech from two or more users; and displaying the text if the presentation speech contains simultaneous individual speech from two or more users. | 07-03-2014 |
20140195229 | Methodology for Live Text Broadcasting - A Transcription Engine is able to broadcast over the Internet streaming text associated with the broadcast to registered and authenticated end users who may be hearing impaired or may have difficulty understanding the language used in the broadcast. The end users' understanding of the information being broadcast is improved because of the availability of the associated text. The Transcription Engine comprises an authentication server, a database server and a Transcription server. End users are first registered automatically at a website associated with the Transcription Engine. End users can then log in and are authenticated automatically by the Transcription Engine prior to being given access to a live or recorded broadcast. The end users obtain access to the associated text broadcast via the Internet after having been authenticated by the Transcription Engine. | 07-10-2014 |
20140195230 | DISPLAY APPARATUS AND METHOD FOR CONTROLLING THE SAME - A display apparatus is provided. The display apparatus includes: an output unit; a voice collector which collects a user's voice; a first communication unit which transmits the user's voice to a first server and receives text information which corresponds to the user's voice; a second communication unit which transmits the received text information to a second server; and a controller which, when response information which corresponds to the text information is received, controls the output unit to output a system response which corresponds to an utterance intention of the user based on the response information, and when the user's utterance intention is related to at least one of performance of a function of the display apparatus and a search for a content, the system response includes an additional question which relates to the at least one of the performance of the function and the search for the content. | 07-10-2014 |
20140195231 | REUSABLE MULTIMODAL APPLICATION - A method and system are disclosed herein for accepting multimodal inputs and deriving synchronized and processed information. A reusable multimodal application is provided on the mobile device. A user transmits a multimodal command to the multimodal platform via the mobile network. The one or more modes of communication that are inputted are transmitted to the multimodal platform(s) via the mobile network(s) and thereafter synchronized and processed at the multimodal platform. The synchronized and processed information is transmitted to the multimodal application. If required, the user verifies and appropriately modifies the synchronized and processed information. The verified and modified information are transferred from the multimodal application to the visual application. The final result(s) are derived by inputting the verified and modified results into the visual application. | 07-10-2014 |
20140200888 | System and Method for Generating a Script for a Web Conference - A system includes an interface operable to detect a plurality of active audio streams in a plurality of multimedia streams, each multimedia stream associated with a particular user. The system further includes a processor operable to generate a text translation of each active audio stream and generate a script comprising the text translation of each active audio stream and an indication of the particular user associated with each active audio stream, the text translations being ordered according to times associated with the respective corresponding active audio stream. | 07-17-2014 |
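The script-assembly step in entry 20140200888 above, ordering the per-user text translations by the times of their audio streams, reduces to a chronological sort with speaker attribution. The record layout used here is an assumption for illustration.

```python
# Minimal sketch of building a conference script: each text translation is
# tagged with its user and start time, then emitted in chronological order.

def build_script(translations):
    """translations: list of (start_time, user, text) tuples."""
    lines = [f"{user}: {text}"
             for start_time, user, text in sorted(translations)]
    return "\n".join(lines)

script = build_script([
    (12.4, "Bob", "I can share my screen."),
    (3.1, "Alice", "Let's get started."),
])
print(script.splitlines()[0])  # "Alice: Let's get started."
```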
20140200889 | System and Method for Speech Recognition Using Pitch-Synchronous Spectral Parameters - The present invention defines a pitch-synchronous parametrical representation of speech signals as the basis of speech recognition, and discloses methods of generating the said pitch-synchronous parametrical representation from speech signals. The speech signal first goes through a pitch-mark picking program to identify the pitch periods. The speech signal is then segmented into pitch-synchronous frames. An ends-matching program equalizes the values at the two ends of the waveform in each frame. Using Fourier analysis, the speech signal in each frame is converted into a pitch-synchronous amplitude spectrum. Using Laguerre functions, the said amplitude spectrum is converted into a unit vector, referred to as the timbre vector. By using a database of correlated phonemes and timbre vectors, the most likely phoneme sequence of an input speech signal can be decoded in the acoustic stage of a speech recognition system. | 07-17-2014 |
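The framing, ends-matching, and Fourier steps described above can be sketched as below, assuming each frame spans one pitch period between consecutive pitch marks; the Laguerre-function step that yields the timbre vector is omitted, and the sample signal is a hypothetical stand-in:

```python
import math

def pitch_synchronous_frames(signal, pitch_marks):
    """Segment the waveform into frames, one pitch period each,
    bounded by consecutive pitch marks."""
    return [signal[a:b] for a, b in zip(pitch_marks, pitch_marks[1:])]

def match_ends(frame):
    """Ends-matching: subtract a linear ramp so the values at the
    two ends of the frame are equalized before Fourier analysis."""
    n = len(frame)
    step = (frame[-1] - frame[0]) / (n - 1)
    return [x - i * step for i, x in enumerate(frame)]

def amplitude_spectrum(frame):
    """Magnitude of the DFT of one pitch-synchronous frame."""
    n = len(frame)
    return [abs(sum(x * complex(math.cos(2 * math.pi * k * i / n),
                                -math.sin(2 * math.pi * k * i / n))
                    for i, x in enumerate(frame)))
            for k in range(n // 2 + 1)]

frames = pitch_synchronous_frames([0.0, 0.5, 1.0, 0.5, 0.0, -0.5], [0, 4])
spec = amplitude_spectrum(match_ends(frames[0]))
```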
20140207449 | USING SPEECH TO TEXT FOR DETECTING COMMERCIALS AND ALIGNING EDITED EPISODES WITH TRANSCRIPTS - Methods and apparatus, including computer program products, for using speech to text for detecting commercials and aligning edited episodes with transcripts. A method includes receiving an original video or audio having a transcript, receiving an edited video or audio of the original video or audio, applying a speech-to-text process to the received original video or audio having a transcript, applying a speech-to-text process to the received edited video or audio, and applying an alignment to determine locations of the edits. | 07-24-2014 |
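The final alignment step — comparing the two speech-to-text outputs to locate the edits — can be sketched with a standard sequence alignment over word lists. The patent does not specify the alignment algorithm; using `difflib.SequenceMatcher` here is an illustrative assumption:

```python
from difflib import SequenceMatcher

def edit_locations(original_words, edited_words):
    """Align the speech-to-text output of the original against that
    of the edited version, and report the word ranges of the
    original that were cut or changed by the edit."""
    matcher = SequenceMatcher(a=original_words, b=edited_words,
                              autojunk=False)
    return [(tag, i1, i2)
            for tag, i1, i2, j1, j2 in matcher.get_opcodes()
            if tag != "equal"]

cuts = edit_locations(
    "intro segment buy our product outro".split(),
    "intro segment outro".split(),
)
```

Here the three-word run cut from the original (a commercial, say) shows up as a single `delete` span over its word indices.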
20140207450 | Real-Time Customizable Media Content Filter - According to one embodiment of the present disclosure, an approach is provided in which a processor receives a media stream that includes media content. The processor selects a media stream segment included in the media stream, and generates annotated data based upon a portion of the media content included in the selected media stream segment. The processor, in turn, compares the annotated data with obfuscation preferences that correspond to prohibited content, and modifies the media stream segment in response to the comparison. | 07-24-2014 |
20140207451 | Method and Apparatus of Adaptive Textual Prediction of Voice Data - Typical textual prediction of voice data employs a predefined implementation arrangement of a single or multiple prediction sources. Using a predefined implementation arrangement of the prediction sources may not provide good prediction performance in a consistent manner as voice data quality varies. Prediction performance may be improved by employing adaptive textual prediction. According to at least one embodiment, a configuration of a plurality of prediction sources, used for textual interpretation of the voice data, is determined based at least in part on one or more features associated with the voice data or one or more a priori interpretations of the voice data. A textual output prediction of the voice data is then generated using the plurality of prediction sources according to the determined configuration. Employing an adaptive configuration of the text prediction sources facilitates providing more accurate text transcripts of the voice data. | 07-24-2014 |
20140207452 | VISUAL FEEDBACK FOR SPEECH RECOGNITION SYSTEM - Embodiments are disclosed that relate to providing visual feedback in a speech recognition system. For example, one disclosed embodiment provides a method including displaying a graphical feedback indicator having a variable appearance dependent upon a state of the speech recognition system. The method further comprises receiving a speech input, modifying an appearance of the graphical feedback indicator in a first manner if the speech input is heard and understood by the system, and modifying the appearance of the graphical feedback indicator in a different manner than the first manner if the speech input is heard and not understood. | 07-24-2014 |
20140207453 | METHOD AND APPARATUS FOR EDITING VOICE RECOGNITION RESULTS IN PORTABLE DEVICE - Disclosed is a method of editing voice recognition results in a portable device. The method includes a process of converting the voice recognition results into text and displaying the text in a touch panel, a process of recognizing a touch interaction in the touch panel, a process of analyzing an intent of execution of the recognized touch interaction, and a process of editing contents of the text based on the analyzed intent of execution. | 07-24-2014 |
20140207454 | TEXT REPRODUCTION DEVICE, TEXT REPRODUCTION METHOD AND COMPUTER PROGRAM PRODUCT - According to an embodiment, a text reproduction device includes a setting unit, an acquiring unit, an estimating unit, and a modifying unit. The setting unit is configured to set a pause position delimiting text in response to input data that is input by the user during reproduction of speech data. The acquiring unit is configured to acquire a reproduction position of the speech data being reproduced when the pause position is set. The estimating unit is configured to estimate a more accurate position corresponding to the pause position by matching the text around the pause position with the speech data around the reproduction position. The modifying unit is configured to modify the reproduction position to the estimated more accurate position in the speech data, and set the pause position so that reproduction of the speech data can be started from the modified reproduction position when the pause position is designated by the user. | 07-24-2014 |
20140207455 | Automated Communication Techniques - Various technologies and techniques are disclosed for providing an autoresponder that allows subscribers to opt-in to one or more autoresponder campaigns using their spoken voice. Voice input is received from a subscriber and converted to text. The subscriber is added to at least one campaign. A contact communication identifier is stored in a subscriber contact record from the text that was converted from the voice input. One or more messages are sent to the subscriber using the contact communication identifier, and according to a schedule specified in the campaign. A virtual seminar playback system is described that simulates a live virtual seminar and allows subscribers to access a playback of a media recording over a communication connection at a specified time. An autoresponder system is described that delivers messages to subscribers in multiple available formats, based upon selections received by the subscribers. | 07-24-2014 |
20140222424 | METHOD AND APPARATUS FOR CONTEXTUAL TEXT TO SPEECH CONVERSION - The present specification discloses systems and methods for contextual text to speech conversion, in part, by interpreting the contextual format of the underlying document, and modifying the literal text so as to reflect that context in the conversion, thereby converting text to contextually appropriate speech. | 08-07-2014 |
20140236595 | RECOGNIZING ACCENTED SPEECH - Techniques | 08-21-2014 |
20140236596 | EMOTION DETECTION IN VOICEMAIL - Methods and apparatus for processing a voicemail message to generate a textual representation of at least a portion of the voicemail message. At least one emotion expressed in the voicemail message is determined by applying at least one emotion classifier to the voicemail message and/or the textual representation. An indication of the determined at least one emotion is provided in a manner associated with the textual representation of the at least a portion of the voicemail message. | 08-21-2014 |
20140236597 | SYSTEM AND METHOD FOR SUPERVISED CREATION OF PERSONALIZED SPEECH SAMPLES LIBRARIES IN REAL-TIME FOR TEXT-TO-SPEECH SYNTHESIS - A system and method for supervised creation of a speech samples library for text-to-speech synthesis are provided. The method includes tracking at least one speech unit in an existing speech samples library to determine if the existing speech samples library achieves a desired quality; receiving at least one speech sample; analyzing the at least one received speech sample to identify at least one speech unit necessitated to obtain the desired quality of the speech samples library; and storing the at least one necessary speech unit in the speech samples library. | 08-21-2014 |
20140244252 | METHOD FOR PREPARING A TRANSCRIPT OF A CONVERSATION - A method for providing participants to a multiparty meeting with a transcript of the meeting, comprising the steps of: establishing a meeting among two or more participants; exchanging during said meeting voice data as well as documents; uploading at least a part of said voice data and at least a part of said documents to a remote speech recognition server | 08-28-2014 |
20140244253 | Systems and Methods for Continual Speech Recognition and Detection in Mobile Computing Devices - The present application describes systems, articles of manufacture, and methods for continuous speech recognition for mobile computing devices. One embodiment includes determining whether a mobile computing device is receiving operating power from an external power source or a battery power source, and activating a trigger word detection subroutine in response to determining that the mobile computing device is receiving power from the external power source. In some embodiments, the trigger word detection subroutine operates continually while the mobile computing device is receiving power from the external power source. The trigger word detection subroutine includes determining whether a plurality of spoken words received via a microphone includes one or more trigger words, and in response to determining that the plurality of spoken words includes at least one trigger word, launching an application corresponding to the at least one trigger word included in the plurality of spoken words. | 08-28-2014 |
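The embodiment above gates continual trigger-word detection on the device's power source. A minimal sketch of that control flow follows; the power-source check, trigger map, and application names are illustrative assumptions, not the application's API:

```python
def on_external_power(power_source: str) -> bool:
    """Illustrative stand-in for querying whether the device is
    drawing operating power from an external source or a battery."""
    return power_source == "external"

def detect_trigger(spoken_words, trigger_to_app):
    """Return the application mapped to the first trigger word
    found among the spoken words, or None if there is none."""
    for word in spoken_words:
        if word in trigger_to_app:
            return trigger_to_app[word]
    return None

def handle_speech(power_source, spoken_words, trigger_to_app):
    """Run the trigger word detection subroutine only while the
    device is on external power, per the embodiment above."""
    if not on_external_power(power_source):
        return None  # on battery: subroutine stays inactive
    return detect_trigger(spoken_words, trigger_to_app)

app = handle_speech("external", ["ok", "navigate", "home"],
                    {"navigate": "maps"})
```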
20140249813 | Methods and Systems for Interfaces Allowing Limited Edits to Transcripts - A transcript interface for displaying a plurality of words of a transcript in a text editor can be provided and configured to receive a command to edit the transcript. Limited edits to data corresponding to the transcript can be made in response to commands received via the user interface module. For example, edits may be limited to selection of a single word in the text editor for editing via a given command. The edit may affect an adjacent word in some instances, such as when two adjacent words are merged. In some embodiments, data corresponding to the selected word of the transcript is changed to reflect the edit without changing data defining the relative timing of those words of the transcript that are not adjacent to the selected word. | 09-04-2014 |
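The merge case above — two adjacent words combined while all non-adjacent timing data is left untouched — can be sketched as follows; the word-record shape is an illustrative assumption:

```python
def merge_with_next(words, i):
    """Merge word i with word i+1: the merged token keeps word i's
    start time and word i+1's end time, and every other word's
    timing data is left unchanged, per the limited-edit constraint."""
    a, b = words[i], words[i + 1]
    merged = {"text": a["text"] + b["text"],
              "start": a["start"], "end": b["end"]}
    return words[:i] + [merged] + words[i + 2:]

transcript = [
    {"text": "any", "start": 0.0, "end": 0.3},
    {"text": "way", "start": 0.3, "end": 0.6},
    {"text": "ok",  "start": 0.7, "end": 0.9},
]
edited = merge_with_next(transcript, 0)
```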
20140249814 | OBJECT RECOGNITION SYSTEM AND AN OBJECT RECOGNITION METHOD - An object recognition system is applicable to practical use, and utilizes image information besides speech information to improve recognition accuracy. The object recognition system comprises a speech recognition unit to determine candidates for a result of speech recognition on input speech and their likelihoods, and an image model generation unit to generate image models of a predetermined number of the candidates having the highest likelihoods. The system further comprises an image likelihood calculation unit to calculate image likelihoods of input images based on the image models, and an object recognition unit to perform object recognition using the image likelihoods. At the time of generating the image model of the candidate, the image model generation unit first searches an image model database, and, when the image model of the candidate is not found in the database, the image model generation unit generates said image model from image information on the web. | 09-04-2014 |
20140249815 | METHOD, APPARATUS AND COMPUTER PROGRAM PRODUCT FOR PROVIDING TEXT INDEPENDENT VOICE CONVERSION - An apparatus for providing text independent voice conversion may include a first voice conversion model and a second voice conversion model. The first voice conversion model may be trained with respect to conversion of training source speech to synthetic speech corresponding to the training source speech. The second voice conversion model may be trained with respect to conversion to training target speech from synthetic speech corresponding to the training target speech. An output of the first voice conversion model may be communicated to the second voice conversion model to process source speech input into the first voice conversion model into target speech corresponding to the source speech as the output of the second voice conversion model. | 09-04-2014 |
20140257806 | FLEXIBLE ANIMATION FRAMEWORK FOR CONTEXTUAL ANIMATION DISPLAY - A method of providing a custom response in conjunction with an application providing speech interaction is described. In one embodiment, the method comprises determining a context of a current interaction with the user, and identifying an associated custom animation, when the associated custom animation exists. The method further comprises displaying the custom animation by overlaying it over a native response of the application. In one embodiment, when no custom animation exists, the method determines whether there is a default animation, and displays the default animation as part of a state change of the application. | 09-11-2014 |
20140257807 | SPEECH RECOGNITION AND INTERPRETATION SYSTEM - A method of providing a task assistant comprising starting to receive speech input from a user, and identifying a format associated with a destination for speech input based on a flag associated with the destination field. When the format comprises dictation, converting the speech to text, and inserting it into the destination location, and when the format comprises an intent, determining a meaning of the input, and sending a formatted query to an application. The method further comprises receiving data from the application in response to the intent and providing a response to the user through multimodal output. | 09-11-2014 |
20140257808 | APPARATUS AND METHOD FOR REQUESTING A TERMINAL TO PERFORM AN ACTION ACCORDING TO AN AUDIO COMMAND - An apparatus and method for performing a function on a terminal according to a received audio command are provided. The method includes receiving an audio command, determining a command target based on the audio command, and performing a function associated with the command target. | 09-11-2014 |
20140278400 | Search Results Using Intonation Nuances - Systems and methods for responding to an audio query are presented. More particularly, vocalization nuances of a vocalized search query (audio query) are identified and utilized in responding to the audio query. In addition to converting the audio query to a textual representation, vocalization nuances of the audio query are identified. Search results are identified according to the textual representation of the audio query and in light of the vocalization nuances. A search results presentation is prepared in response to the audio query, where the search results presentation is based on the identified search results and also based on the vocalization nuances. The search results presentation is returned in response to the audio query. | 09-18-2014 |
20140278401 | IDENTIFYING CORRESPONDING POSITIONS IN DIFFERENT REPRESENTATIONS OF A TEXTUAL WORK - Described herein are techniques for determining corresponding positions between different representations of a textual work. In some of the techniques, portions of one or more representations may be processed. A determination of a corresponding position may be made in response to a request received from a user, such as a reader that desires to switch between representations. The request may indicate a position in one representation and the representation to which the user would like to switch. In response to receiving the request, one or more portions of one or more representations of a textual work may be processed. In some techniques, a corresponding position between different representations may be determined without processing the entirety of one or more representations of the textual work. For example, a corresponding position may be determined without processing an entire audio representation. | 09-18-2014 |
20140278402 | Automatic Channel Selective Transcription Engine - An Automatic Channel Selective Transcription Engine (ACSTE) is provided that is capable of establishing a telephone call that provides automatic transcription of the voice signals of the different parties to the telephone call even during periods, in the telephone call, where two or more of the parties are speaking simultaneously. The ACSTE has access to the transmit voice signals of each of the parties to the telephone call. Each of the transmit voice signals is transmitted through its respective transmit voice channel. The transmit voice signals are processed, isolated and transcribed automatically to become associated text signals, which are transmitted over text channels to corresponding parties to the telephone call. | 09-18-2014 |
20140278403 | SYSTEMS AND METHODS FOR INTERACTIVE SYNTHETIC CHARACTER DIALOGUE - Various of the disclosed embodiments concern systems and methods for conversation-based human-computer interactions. In some embodiments, the system includes a plurality of interactive scenes. A user may access each scene and engage in conversation with a synthetic character regarding an activity associated with that active scene. In certain embodiments, a central server may house a plurality of waveforms associated with the synthetic character's speech, and may dynamically deliver the waveforms to a user device in conjunction with the operation of an artificial intelligence. In other embodiments, the character's speech is generated using a text-to-speech system. | 09-18-2014 |
20140278404 | AUDIO MERGE TAGS - A method of creating a message. The method includes recording a message. The method also includes identifying an audio merge tag in the message. The method further includes replacing the audio merge tag with alternative audio. | 09-18-2014 |
20140278405 | AUTOMATIC NOTE TAKING WITHIN A VIRTUAL MEETING - Arrangements relate to automatically taking notes in a virtual meeting. The virtual meeting has meeting content that includes a plurality of meeting content streams. One or more of the meeting content streams is in a non-text format. The one or more meeting content streams in a non-text format can be converted into text. As a result, the plurality of meeting content streams is in text format. The text of the plurality of meeting content streams can be analyzed to identify a key element within the text. Consolidated system notes that include the key element can be generated. | 09-18-2014 |
20140278406 | OBTAINING DATA FROM UNSTRUCTURED DATA FOR A STRUCTURED DATA COLLECTION - Techniques for obtaining data from unstructured data for a structured data collection include receiving unstructured data that includes text; identifying an attribute associated with a structured data collection; obtaining at least one of historical data associated with the attribute or additional data associated with a user of the computing system; identifying one or more terms from the unstructured data as being associated with the attribute based on at least one of the historical data or the additional data; and storing the identified one or more terms in a data record of the unstructured data collection. | 09-18-2014 |
20140278407 | LANGUAGE MODELING OF COMPLETE LANGUAGE SEQUENCES - Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for language modeling of complete language sequences. Training data indicating language sequences is accessed, and counts for a number of times each language sequence occurs in the training data are determined. A proper subset of the language sequences is selected, and a first component of a language model is trained. The first component includes first probability data for assigning scores to the selected language sequences. A second component of the language model is trained based on the training data, where the second component includes second probability data for assigning scores to language sequences that are not included in the selected language sequences. Adjustment data that normalizes the second probability data with respect to the first probability data is generated, and the first component, the second component, and the adjustment data are stored. | 09-18-2014 |
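The two-component structure above — an exact first component over selected frequent whole sequences, a second component over the rest, and adjustment data normalizing the two — can be sketched with simple count-based estimates. The relative-frequency estimator is an illustrative assumption; the abstract does not fix how either component's probabilities are trained:

```python
from collections import Counter

def train_two_component_lm(sequences, k):
    """Count whole-sequence occurrences in the training data, keep
    the k most frequent sequences in an exact first component, model
    the remainder in a second component, and compute the adjustment
    mass that normalizes the second component against the first."""
    counts = Counter(sequences)
    total = sum(counts.values())
    first = {s: c / total for s, c in counts.most_common(k)}
    tail = {s: c for s, c in counts.items() if s not in first}
    tail_total = sum(tail.values())
    second = {s: c / tail_total for s, c in tail.items()}
    adjustment = tail_total / total  # mass left to the second component
    return first, second, adjustment

def score(seq, first, second, adjustment):
    """Score with the first component when the sequence was selected,
    otherwise with the normalized second component."""
    return first.get(seq, adjustment * second.get(seq, 0.0))

model = train_two_component_lm(["a", "a", "a", "b", "b", "c"], k=1)
```

With this toy data the two components together still form a proper distribution: the scores of "a", "b", and "c" sum to one.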
20140278408 | MOBILE TERMINAL AND METHOD OF CONTROLLING THE MOBILE TERMINAL - A mobile terminal including a wireless communication unit configured to wirelessly communicate with at least one other terminal; a memory configured to store recorded voice data; a display unit configured to display a graphic object representing a reproduction progress of the recorded voice data; and a controller configured to receive a selection signal indicating a portion of the graphic object has been selected, select a section of the recorded voice data including a point-in-time at which the graphic object is selected, convert keyword voice data included in the selected section of the recorded voice data to keyword text data, and display the keyword text data on the display unit. | 09-18-2014 |
20140278409 | PRESERVING PRIVACY IN NATURAL LANGUAGE DATABASES - An apparatus and a method for preserving privacy in natural language databases are provided. Natural language input may be received. At least one of sanitizing or anonymizing the natural language input may be performed to form a clean output. The clean output may be stored. | 09-18-2014 |
20140278410 | TEXT PROCESSING USING NATURAL LANGUAGE UNDERSTANDING - Techniques for converting spoken speech into written speech are provided. The techniques include transcribing input speech via speech recognition, mapping each spoken utterance from input speech into a corresponding formal utterance, and mapping each formal utterance into a stylistically formatted written utterance. | 09-18-2014 |
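The two mapping stages above — spoken utterance to formal utterance, then formal utterance to a stylistically formatted written one — can be sketched with a toy rule-based pass. The filler list and formatting rules are illustrative assumptions, not the disclosed technique:

```python
FILLERS = {"um", "uh", "like"}  # assumed disfluency list, for illustration

def to_formal(spoken):
    """Map a spoken utterance to a formal utterance by dropping
    filler words, then format it as written text with sentence
    capitalization and terminal punctuation."""
    words = [w for w in spoken.lower().split() if w not in FILLERS]
    sentence = " ".join(words)
    return sentence[:1].upper() + sentence[1:] + "."

written = to_formal("um the meeting is uh at noon")
```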
20140288929 | Multi-Modal Input on an Electronic Device - A computer-implemented input-method editor process includes receiving a request from a user for an application-independent input method editor having written and spoken input capabilities, identifying that the user is about to provide spoken input to the application-independent input method editor, and receiving a spoken input from the user. The spoken input corresponds to input to an application and is converted to text that represents the spoken input. The text is provided as input to the application. | 09-25-2014 |
20140297276 | EDITING APPARATUS, EDITING METHOD, AND COMPUTER PROGRAM PRODUCT - According to an embodiment, an editing apparatus includes a receiver and a controller. The receiver is configured to receive input data. The controller is configured to produce one or more operable target objects from the input data, receive operation through a screen, and produce an editing result object by performing editing processing on the target object designated in the operation. | 10-02-2014 |
20140297277 | Systems and Methods for Automated Scoring of Spoken Language in Multiparty Conversations - Systems and methods are provided for scoring spoken language in multiparty conversations. A computer receives a conversation between an examinee and at least one interlocutor. The computer selects a portion of the conversation. The portion includes one or more examinee utterances and one or more interlocutor utterances. The computer assesses the portion using one or more metrics, such as: a pragmatic metric for measuring a pragmatic fit of the one or more examinee utterances; a speech act metric for measuring a speech act appropriateness of the one or more examinee utterances; a speech register metric for measuring a speech register appropriateness of the one or more examinee utterances; and an accommodation metric for measuring a level of accommodation of the one or more examinee utterances. The computer computes a final score for the portion of the conversation based on the one or more metrics applied. | 10-02-2014 |
20140297278 | METHODS AND APPARATUS FOR LINKING EXTRACTED CLINICAL FACTS TO TEXT - A plurality of clinical facts may be extracted from a free-form narration of a patient encounter provided by a clinician. The plurality of clinical facts may include a first fact and a second fact. The first fact may be extracted from a first portion of the free-form narration, and the second fact may be extracted from a second portion of the free-form narration. A first indicator that indicates a first linkage between the first fact and the first portion of the free-form narration may be provided to a user. A second indicator, different from the first indicator, that indicates a second linkage between the second fact and the second portion of the free-form narration may also be provided to the user. | 10-02-2014 |
20140297279 | SYSTEM AND METHOD USING FEEDBACK SPEECH ANALYSIS FOR IMPROVING SPEAKING ABILITY - A speech analysis system and method for analyzing speech. The system includes: a voice recognition system for converting inputted speech to text; an analytics system for generating feedback information by analyzing the inputted speech and text; and a feedback system for outputting the feedback information. | 10-02-2014 |
20140303973 | Minimum Bayesian Risk Methods for Automatic Speech Recognition - A hypothesis space of a search graph may be determined. The hypothesis space may include n hypothesis-space transcriptions of an utterance, each selected from a search graph that includes t>n transcriptions of the utterance. An evidence space of the search graph may also be determined. The evidence space may include m evidence-space transcriptions of the utterance that are randomly selected from the search graph, where t>m. For each particular hypothesis-space transcription in the hypothesis space, an expected word error rate may be calculated by comparing the particular hypothesis-space transcription to each of the evidence-space transcriptions. Based on the expected word error rates, a lowest expected word error rate may be obtained, and the particular hypothesis-space transcription that is associated with the lowest expected word error rate may be provided. | 10-09-2014 |
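The selection step above can be sketched directly: for each hypothesis-space transcription, compute its expected word error against every evidence-space transcription and keep the minimizer. This unweighted word-level edit distance is a simplifying assumption; the abstract does not fix the error measure's details:

```python
def edit_distance(a, b):
    """Word-level Levenshtein distance between two transcriptions
    (given as word lists), computed with a rolling DP row."""
    dp = list(range(len(b) + 1))
    for i, wa in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, wb in enumerate(b, 1):
            prev, dp[j] = dp[j], min(dp[j] + 1,       # deletion
                                     dp[j - 1] + 1,   # insertion
                                     prev + (wa != wb))  # substitution
    return dp[-1]

def min_bayes_risk(hypotheses, evidence):
    """Return the hypothesis-space transcription with the lowest
    expected word error rate against the evidence space."""
    def expected_errors(hyp):
        return sum(edit_distance(hyp.split(), ev.split())
                   for ev in evidence) / len(evidence)
    return min(hypotheses, key=expected_errors)

best = min_bayes_risk(
    ["recognize speech", "wreck a nice beach"],
    ["recognize speech", "recognize speech", "wreck a nice beach"],
)
```

Because two of the three evidence-space samples agree with the first hypothesis, it carries the lower expected error and is selected.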
20140303974 | TEXT GENERATOR, TEXT GENERATING METHOD, AND COMPUTER PROGRAM PRODUCT - According to an embodiment, a text generator includes a recognizer, a selector, and a generation unit. The recognizer is configured to recognize an acquired sound and obtain recognized character strings in recognition units and confidence levels of the recognized character strings. The selector is configured to select at least one of the recognized character strings used for a transcribed sentence on the basis of at least one of a parameter about transcription accuracy and a parameter about a workload needed for transcription. The generation unit is configured to generate the transcribed sentence using the selected recognized character strings. | 10-09-2014 |
20140303975 | INFORMATION PROCESSING APPARATUS, INFORMATION PROCESSING METHOD AND COMPUTER PROGRAM - There is provided an information processing apparatus including a history acquiring unit configured to acquire histories of information obtained by analysis of voice information including utterance content by a speaker, and a display control section configured to identifiably display each acquired history as history information in an order in which the corresponding histories are recorded in association with display information corresponding to voice recognition. | 10-09-2014 |
20140303976 | METHOD AND SYSTEM FOR DYNAMIC CREATION OF CONTEXTS - A method and a system for a speech recognition system, in which an electronic speech-based document is associated with a document template and comprises one or more sections of text recognized or transcribed from sections of speech. The sections of speech are transcribed by the speech recognition system into corresponding sections of text of the electronic speech-based document. The method includes the steps of dynamically creating sub-contexts and associating the sub-contexts to sections of text of the document template. | 10-09-2014 |
20140303977 | Synchronized Transcription Rules Handling - Methods, systems, and software are disclosed for providing rule handling functionality in a distributed transcription environment. Some embodiments provide client-server workflow management for providing and supporting distributed transcription services. Other embodiments provide audio-to-text synchronization to support certain transcription functionality. Still other embodiments provide logging functionality to support quality, personnel, billing, and/or other enterprise tasks. And other embodiments provide functionality to support rule generation, editing, validation, and/or execution. | 10-09-2014 |
20140309995 | Content-Based Audio Playback Emphasis - Techniques are disclosed for facilitating the process of proofreading draft transcripts of spoken audio streams. In general, proofreading of a draft transcript is facilitated by playing back the corresponding spoken audio stream with an emphasis on those regions in the audio stream that are highly relevant or likely to have been transcribed incorrectly. Regions may be emphasized by, for example, playing them back more slowly than regions that are of low relevance and likely to have been transcribed correctly. Emphasizing those regions of the audio stream that are most important to transcribe correctly and those regions that are most likely to have been transcribed incorrectly increases the likelihood that the proofreader will accurately correct any errors in those regions, thereby improving the overall accuracy of the transcript. | 10-16-2014 |
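The emphasis policy above — play back more slowly the regions that are highly relevant or likely misrecognized — can be sketched as a mapping from per-region relevance and recognition confidence to a playback rate. The linear mapping and the rate bounds are illustrative assumptions:

```python
def playback_rate(confidence, relevance, base=1.0, slow=0.5):
    """Map a region's recognition confidence and relevance to a
    playback rate: the more relevant and the less confidently
    transcribed a region is, the closer its rate moves from the
    normal `base` toward the emphasized `slow` rate."""
    emphasis = relevance * (1.0 - confidence)  # in [0, 1]
    return base - (base - slow) * emphasis

# A confident, low-relevance region vs. an uncertain, relevant one.
rates = [playback_rate(c, r) for c, r in [(0.95, 0.2), (0.3, 1.0)]]
```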
20140316779 | OBSERVATION PLATFORM FOR USING STRUCTURED COMMUNICATIONS - Observation platform for using structured communications. A signal from a first communication device is received at a second communication device associated with a computer system, wherein a first characteristic of the signal corresponds to an audible source and a second characteristic of the signal corresponds to information indicative of a geographic position of the first communication device. A first user associated with the first communication device is recognized at the computer system. Context information for the signal is derived at the computer system associated with the second communication device. A copy of at least one characteristic of the signal is stored in a storage medium, wherein the copy of at least one characteristic of the signal is available for developing performance metrics. The signal is relayed to a destination derived from the context information. | 10-23-2014 |
20140316780 | METHOD AND SYSTEM FOR PROVIDING AN AUTOMATED WEB TRANSCRIPTION SERVICE - A system, method and computer readable medium that provides an automated web transcription service is disclosed. The method may include receiving input speech from a user using a communications network, recognizing the received input speech, understanding the recognized speech, transcribing the understood speech to text, storing the transcribed text in a database, receiving a request via a web page to display the transcribed text, retrieving transcribed text from the database, and displaying the transcribed text to the requester using the web page. | 10-23-2014 |
20140316781 | WIRELESS TERMINAL AND INFORMATION PROCESSING METHOD OF THE WIRELESS TERMINAL - The present invention relates to a wireless terminal, an information processing method and a recording medium. The wireless terminal according to the present invention comprises a short-range communication section for connecting a short-range communication channel with a communication apparatus included in a vehicle; a checking section for checking whether the wireless terminal is positioned in the vehicle, based on connection information of the short-range communication channel through the short-range communication section; a first converting section for converting text information received through one of N applications for transceiving text information into speech information, the applications being included in the wireless terminal, in the case that the wireless terminal is positioned in the vehicle, according to a result of the checking by the checking section; a processing section for outputting the converted speech information through a speaker of the wireless terminal; an input section for receiving speech information from a user of the wireless terminal; and a second converting section for converting the received speech information into text information. The processing section transmits the converted text information through the application for transceiving text information. | 10-23-2014 |
20140324422 | SYNCHRONOUS AUDIO DISTRIBUTION TO PORTABLE COMPUTING DEVICES - Analog audio inputs are processed by a digital signal processor and rebroadcast over a local wireless network synchronous with a separate broadcast device or in real time with a live audio source. A software application on a portable computing device provides end-users the option to select from a plurality of audio source streams, cached translations of preprocessed audio streams and speech-to-text captioning. An embodiment of the invention relays wirelessly received digital audio streams converted from an analog source to the portable computing device then onto a hearing aid. | 10-30-2014 |
20140324423 | Document Extension in Dictation-Based Document Generation Workflow - An automatic speech recognizer is used to produce a structured document representing the contents of human speech. A best practice is applied to the structured document to produce a conclusion, such as a conclusion that required information is missing from the structured document. Content is inserted into the structured document based on the conclusion, thereby producing a modified document. The inserted content may be obtained by prompting a human user for the content and receiving input representing the content from the human user. | 10-30-2014 |
20140324424 | METHOD FOR PROVIDING A SUPPLEMENTARY VOICE RECOGNITION SERVICE AND APPARATUS APPLIED TO SAME - Disclosed are a method of providing a voice recognition supplementary service and an apparatus applied to the same. The method includes: generating voice information corresponding to a designated step by a provision of a voice recognition service to a terminal device and text information corresponding to the voice information; providing the voice information generated according to the designated step to the terminal device; and transmitting the generated text information to the terminal device simultaneously with the provision of the voice information and continuously displaying the transmitted text information such that the text information is synchronized with the corresponding voice information provided to the terminal device. | 10-30-2014 |
20140330558 | Enhancing Speech Recognition with Domain-Specific Knowledge to Detect Topic-Related Content - Methods, systems, and computer-readable storage media for providing action items from audio within an enterprise context. In some implementations, actions include determining a context of audio that is to be processed, providing training data to a speech recognition component, the training data being provided based on the context, receiving text from the speech recognition component, processing the text to identify one or more action items by identifying one or more concepts within the text and matching the one or more concepts to respective transitions in an automaton, and providing the one or more action items for display to one or more users. | 11-06-2014 |
20140330559 | DEVICE, SYSTEM, METHOD, AND COMPUTER-READABLE MEDIUM FOR PROVIDING INTERACTIVE ADVERTISING - A method, device, system, and computer medium for providing interactive advertising are provided. For example, a device may request an advertisement from a remote server, receive the advertisement, receive a response from a user who is listening and/or watching the advertisement, and transmit the response to the server for further action. The user may input a response by speaking. A server may receive an advertisement request from the device, select an advertisement based on pre-defined one or more criteria, transmit the selected advertisement to the device for play, receive from the device a response to the selected advertisement, and then perform an action corresponding to the received response. | 11-06-2014 |
20140330560 | USER AUTHENTICATION OF VOICE CONTROLLED DEVICES - Methods, systems, and devices are described herein. One method can include receiving a voice command from a user at a voice controlled device, determining a presence of the user to the device using a sensor, converting the voice command to a device specific command, and performing the device specific command using the device in response to the determined presence. | 11-06-2014 |
20140330561 | VOICE RECOGNITION METHOD AND ELECTRONIC DEVICE THEREOF - A voice recognition method and an electronic device thereof are provided, including executing a voice recognition function, setting a voice recognition mode based on information input from at least one sensor of the electronic device, and processing an input voice according to the set mode. | 11-06-2014 |
20140330562 | Method and Apparatus for Obtaining Information from the Web - An intelligent conversation system augmenting a conversation between two or more individuals uses a speech to text block configured to convert voices of the conversation into text and a determination circuit configured to determine topics from the text of the conversation. Search parameters determined by the determination circuit from the topics are sent to the Internet, search results corresponding to the search parameters are received from the Internet, and a memory is configured to store the search results received from the Internet. The speech to text block is configured to convert the search results to speech. An earphone is configured to transmit the speech to one of the two or more individuals. The speech is used by one of the individuals to augment the conversation. | 11-06-2014 |
20140337023 | SPEECH TO TEXT CONVERSION - Embodiments that relate to converting audio inputs from an environment into text are disclosed. For example, in one disclosed embodiment a speech conversion program receives audio inputs from a microphone array of a head-mounted display device. Image data is captured from the environment, and one or more possible faces are detected from image data. Eye-tracking data is used to determine a target face on which a user is focused. A beamforming technique is applied to at least a portion of the audio inputs to identify target audio inputs that are associated with the target face. The target audio inputs are converted into text that is displayed via a transparent display of the head-mounted display device. | 11-13-2014 |
20140343936 | CALENDARING ACTIVITIES BASED ON COMMUNICATION PROCESSING - A method is provided in one embodiment and includes establishing a communication session involving a first endpoint and a second endpoint that are associated with a session, the first endpoint being associated with a first identifier and the second endpoint being associated with a second identifier. The method also includes evaluating first data for the first endpoint; evaluating second data for the second endpoint; and determining whether to initiate a calendaring activity based, at least in part, on the first data and the second data. In more specific embodiments, the method includes evaluating a first availability associated with the first endpoint; evaluating a second availability associated with the second endpoint; and suggesting a future meeting based, at least in part, on the first availability and the second availability. | 11-20-2014 |
20140343937 | INTERRUPT MODE FOR COMMUNICATION APPLICATIONS - An interrupt mode for messaging applications intended to run on smart phones, tablets and computers. The interrupt mode enables the automatic rendering of incoming messages, in accordance with various embodiments, when (i) the application is closed, (ii) the conversation for which the message pertains has not been selected for participation, (iii) the interrupt mode has been designated for the sender of the message or (iv) any combination of (i) through (iii). When a message is rendered in the interrupt mode, the media of the message is automatically rendered. As a result, the user of the communication device is interrupted. | 11-20-2014 |
20140343938 | APPARATUS FOR RECORDING CONVERSATION AND METHOD THEREOF - A method for recording conversation is provided. The method includes capturing content, receiving at least one voice signal, distinguishing at least one person corresponding to the at least one voice signal by analyzing the at least one voice signal, converting the at least one voice signal into a text corresponding to the at least one voice signal, and displaying the text in the captured content to correspond to the distinguished at least one person. | 11-20-2014 |
20140343939 | Discriminative Training of Document Transcription System - A system is provided for training an acoustic model for use in speech recognition. In particular, such a system may be used to perform training based on a spoken audio stream and a non-literal transcript of the spoken audio stream. Such a system may identify text in the non-literal transcript which represents concepts having multiple spoken forms. The system may attempt to identify the actual spoken form in the audio stream which produced the corresponding text in the non-literal transcript, and thereby produce a revised transcript which more accurately represents the spoken audio stream. The revised, and more accurate, transcript may be used to train the acoustic model using discriminative training techniques, thereby producing a better acoustic model than that which would be produced using conventional techniques, which perform training based directly on the original non-literal transcript. | 11-20-2014 |
20140343940 | METHOD AND APPARATUS FOR AN EXEMPLARY AUTOMATIC SPEECH RECOGNITION SYSTEM - An exemplary computer system configured to use multiple automatic speech recognizers (ASRs) with a plurality of language and acoustic models to increase the accuracy of speech recognition. | 11-20-2014 |
20140343941 | VISUALIZATION INTERFACE OF CONTINUOUS WAVEFORM MULTI-SPEAKER IDENTIFICATION - A method implemented in a computer infrastructure having computer executable code having programming instructions tangibly embodied on a computer readable storage medium. The programming instructions are operable to receive a current waveform of a communication between a plurality of participants. Additionally, the programming instructions are operable to create a voiceprint from the current waveform if the current waveform is of a human voice. Furthermore, the programming instructions are operable to determine one of whether a match exists between the voiceprint and one library waveform of one or more library waveforms, whether a correlation exists between the voiceprint and a number of library waveforms of the one or more library waveforms and whether the voiceprint is unique. Additionally, the programming instructions are operable to transcribe the current waveform into text and provide a match indication display (MID) indicating an association between the current waveform and the one or more library waveforms based on the determining. | 11-20-2014 |
20140350928 | Method For Finding Elements In A Webpage Suitable For Use In A Voice User Interface - A voice interface for web pages or other documents identifies interactive elements such as links, obtains one or more phrases of each interactive element, such as link text, title text and alternative text for images, and adds the phrases to a grammar which is used for speech recognition. A click event is generated for an interactive element having a phrase which is a best match for the voice command of a user. In one aspect, the phrases of currently-displayed elements of the document are used for speech recognition. In another aspect, phrases which are not displayed, such as title text and alternative text for images, are used in the grammar. In another aspect, updates to the document are detected and the grammar is updated accordingly so that the grammar is synchronized with the current state of the document. | 11-27-2014 |
20140350929 | METHOD AND APPARATUS FOR MANAGING AUDIO DATA IN ELECTRONIC DEVICE - A method and an apparatus for managing audio data of an electronic device which allow preliminary identification of audio data are provided. The method includes converting at least a part of audio data to a text; storing the converted text as preview data of the audio data; and displaying the stored preview data in response to a request for a preview of the audio data. | 11-27-2014 |
20140350930 | Real Time Generation of Audio Content Summaries - Audio content is converted to text using speech recognition software. The text is then associated with a distinct voice or a generic placeholder label if no distinction can be made. From the text and voice information, a word cloud is generated based on key words and key speakers. A visualization of the cloud displays as it is being created. Words grow in size in relation to their dominance. When it is determined that the predominant words or speakers have changed, the word cloud is complete. That word cloud continues to be displayed statically and a new word cloud display begins based upon a new set of predominant words or a new predominant speaker or set of speakers. This process may continue until the meeting is concluded. At the end of the meeting, the completed visualization may be saved to a storage device, sent to selected individuals, removed, or any combination of the preceding. | 11-27-2014 |
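The word-cloud stage of the entry above can be sketched briefly: count the non-stopword terms in the transcribed text and scale each word's display size in proportion to its dominance. This is a minimal illustration only, not the patented method; the stopword list, size range, and function name are assumptions.

```python
from collections import Counter

def word_cloud_sizes(text, min_size=10, max_size=48, stopwords=None):
    """Map each word's frequency to a font size proportional to its dominance."""
    stopwords = stopwords or {"the", "a", "an", "and", "of", "to", "is"}
    words = [w.lower().strip(".,!?") for w in text.split()]
    counts = Counter(w for w in words if w and w not in stopwords)
    if not counts:
        return {}
    top = max(counts.values())  # the predominant word anchors the scale
    return {w: min_size + (max_size - min_size) * c // top
            for w, c in counts.items()}
```

Repeating this computation over a sliding window, and starting a new cloud when the predominant words change, would approximate the described behavior of freezing one cloud and beginning the next.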
20140358536 | DATA PROCESSING METHOD AND ELECTRONIC DEVICE THEREOF - A method for operating an electronic device is provided. The method includes converting voice data into text data, displaying the text data, selecting a first section in the text data, and outputting voice data of a second section corresponding to the first section in the text data. | 12-04-2014 |
20140358537 | System and Method for Combining Speech Recognition Outputs From a Plurality of Domain-Specific Speech Recognizers Via Machine Learning - Disclosed herein are systems, methods and non-transitory computer-readable media for performing speech recognition across different applications or environments without model customization or prior knowledge of the domain of the received speech. The disclosure includes recognizing received speech with a collection of domain-specific speech recognizers, determining a speech recognition confidence for each of the speech recognition outputs, selecting speech recognition candidates based on a respective speech recognition confidence for each speech recognition output, and combining selected speech recognition candidates to generate text based on the combination. | 12-04-2014 |
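The selection step in the entry above (keep candidates that clear a confidence threshold, then combine them) can be sketched with a simple argmax stand-in for the machine-learned combination stage. The recognizer interface, threshold, and names here are assumptions for illustration.

```python
def combine_recognizers(audio, recognizers, min_confidence=0.5):
    """Run every domain-specific recognizer on the audio, keep candidates
    whose confidence clears the threshold, and return the best transcription.
    A learned combiner would replace the final argmax."""
    candidates = []
    for name, recognize in recognizers.items():
        text, confidence = recognize(audio)
        if confidence >= min_confidence:
            candidates.append((confidence, name, text))
    if not candidates:
        return None  # no recognizer was confident enough
    candidates.sort(reverse=True)
    return candidates[0][2]
```

Because no single model is customized to the input's domain, the domain whose recognizer "understands" the speech best naturally wins the selection.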
20140365213 | System and Method of Improving Communication in a Speech Communication System - A speech communication system and a method of improving communication in such a speech communication system between at least a first user and a second user may be configured so the system (a) transcribes a recorded portion of a speech communication between the at least first and second user to form a transcribed portion, (b) selects and marks at least one of the words of the transcribed portion which is considered to be a keyword of the speech communication, (c) performs a search for each keyword and produces at least one definition for each keyword, (d) calculates a trustworthiness factor for each keyword, each trustworthiness factor indicating a calculated validity of the respective definition(s), and (e) displays the transcribed portion as well as each of the keywords together with the respective definition and the trustworthiness factor thereof to at least one of the first user and the second user. | 12-11-2014 |
20140365214 | Character Data Entry - Methods and apparatuses for entry of character data are disclosed. In one example, a user action is received at a headset input user interface. The user action is correlated to a character data previously stored. The character data is transmitted to a host device, the character data configured to be automatically entered in one or more data fields. | 12-11-2014 |
20140365215 | METHOD FOR PROVIDING SERVICE BASED ON MULTIMODAL INPUT AND ELECTRONIC DEVICE THEREOF - A method for operating an electronic device is provided. The method includes receiving a voice signal and an input, extracting data corresponding to the input, converting the voice signal to text data, setting an association relationship between the converted text data and the extracted data, and generating a response for the voice signal based on the converted text data, the extracted data, and the set association relationship. | 12-11-2014 |
20140365216 | SYSTEM AND METHOD FOR USER-SPECIFIED PRONUNCIATION OF WORDS FOR SPEECH SYNTHESIS AND RECOGNITION - The method is performed at an electronic device with one or more processors and memory storing one or more programs for execution by the one or more processors. A first speech input including at least one word is received. A first phonetic representation of the at least one word is determined, the first phonetic representation comprising a first set of phonemes selected from a speech recognition phonetic alphabet. The first set of phonemes is mapped to a second set of phonemes to generate a second phonetic representation, where the second set of phonemes is selected from a speech synthesis phonetic alphabet. The second phonetic representation is stored in association with a text string corresponding to the at least one word. | 12-11-2014 |
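The phoneme-set mapping in the entry above amounts to translating a recognition-alphabet sequence into a synthesis-alphabet sequence and storing it against the word's text. The table below is a hypothetical ARPABET-style-to-IPA-style fragment; real tables depend on the specific recognizer and synthesizer engines.

```python
# Hypothetical mapping from a recognition phonetic alphabet (ARPABET-style
# symbols) to a synthesis alphabet (IPA-style); illustrative only.
ASR_TO_TTS = {"B": "b", "AA": "ɑ", "R": "ɹ", "T": "t", "S": "s"}

lexicon = {}  # text string -> synthesis-alphabet pronunciation

def learn_pronunciation(word, asr_phonemes, table=ASR_TO_TTS):
    """Map the recognized phoneme sequence into the synthesis alphabet and
    store it in association with the word's text string."""
    try:
        lexicon[word] = [table[p] for p in asr_phonemes]
    except KeyError as err:
        raise ValueError(f"no synthesis phoneme for {err.args[0]!r}") from None
    return lexicon[word]
```

Once stored, the synthesizer can pronounce the word the way the user spoke it, even if the spelling would suggest otherwise.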
20140365217 | CONTENT CREATION SUPPORT APPARATUS, METHOD AND PROGRAM - According to one embodiment, a content creation support apparatus includes a speech synthesis unit, a speech recognition unit, an extraction unit, a detection unit, a presentation unit and a selection unit. The speech synthesis unit performs a speech synthesis on a first text. The speech recognition unit performs a speech recognition on the synthesized speech to obtain a second text. The extraction unit extracts feature values by performing a morphological analysis on each of the first and second texts. The detection unit compares a first feature value of a first difference string and a second feature value of a second difference string. The presentation unit presents correction candidate(s) according to the second feature value. The selection unit selects one of the correction candidates in accordance with an instruction from a user. | 12-11-2014 |
20140372114 | Self-Directed Machine-Generated Transcripts - In one aspect, this application describes a computer-readable storage medium storing instructions that, when executed by one or more processing devices, cause the one or more processing devices to perform operations that include receiving, from a user of a computing device, a spoken input that includes a note and an activation phrase that indicates an intent to record the note. The operations also include determining a target address based at least in part on an identifier associated with a registered user of the computing device, wherein the target address is determined without receiving, from the user, an input indicating the target address when the spoken input is received. The operations also include defining a communication that includes a machine-generated transcript of the note, and sending the communication to the target address. | 12-18-2014 |
20140372115 | Self-Directed Machine-Generated Transcripts - In one aspect, this application describes a computer-readable storage medium storing instructions that, when executed by one or more processing devices, cause the one or more processing devices to perform operations that include receiving, from a user of a computing device, a spoken input that includes a note and an activation phrase that indicates an intent to record the note. The operations also include determining a target address based at least in part on an identifier associated with a registered user of the computing device, wherein the target address is determined without receiving, from the user, an input indicating the target address when the spoken input is received. The operations also include defining a communication that includes a machine-generated transcript of the note, and sending the communication to the target address. | 12-18-2014 |
20140372116 | Robotic System with Verbal Interaction - A method and apparatus for moving an object. A verbal instruction for moving the object is received. The verbal instruction is converted into text. A logical representation of the verbal instruction is generated. A movement of a robotic system that corresponds to the verbal instruction for moving the object using a model of an environment in which the object and the robotic system are located is identified. A set of commands used by the robotic system for the movement of the robotic system is identified. The set of commands is sent to the robotic system. | 12-18-2014 |
20140372117 | TRANSCRIPTION SUPPORT DEVICE, METHOD, AND COMPUTER PROGRAM PRODUCT - According to an embodiment, a transcription support device includes a first voice acquisition unit, a second voice acquisition unit, a recognizer, a text acquisition unit, an information acquisition unit, a determination unit, and a controller. The first voice acquisition unit acquires a first voice to be transcribed. The second voice acquisition unit acquires a second voice uttered by a user. The recognizer recognizes the second voice to generate a first text. The text acquisition unit acquires a second text obtained by correcting the first text by the user. The information acquisition unit acquires reproduction information representing a reproduction section of the first voice. The determination unit determines a reproduction speed of the first voice on the basis of the first voice, the second voice, the second text, and the reproduction information. The controller reproduces the first voice at the determined reproduction speed. | 12-18-2014 |
20140372118 | METHOD AND APPARATUS FOR EXEMPLARY CHIP ARCHITECTURE - A dynamically configurable automatic speech recognizer where either or both of the acoustic model file and the language model file are changeable to improve the accuracy of human speech recognition. | 12-18-2014 |
20140379334 | NATURAL LANGUAGE UNDERSTANDING AUTOMATIC SPEECH RECOGNITION POST PROCESSING - In an automatic speech recognition post processing system, speech recognition results are received from an automatic speech recognition service. The speech recognition results may include transcribed speech, an intent classification and/or extracted fields of intent parameters. The speech recognition results are post processed for use in a specified context. All or a portion of the speech recognition results are compared to keywords that are sensitive to the specified context. The post processed speech recognition results are provided to an appropriate application which is operable to utilize the context sensitive product of post processing. | 12-25-2014 |
20140379335 | METHOD AND DEVICE OF MATCHING SPEECH INPUT TO TEXT - A method and device for matching speech to text are disclosed, the method including: receiving a speech input, the mentioned speech input carrying input speech information; obtaining initial text corresponding to the input speech information, and respective pinyin of the initial text; generating at least one approximate pinyin for the initial text based on predetermined pronunciation similarity information; and from a preset mapping relationship table, obtaining additional text corresponding to the respective pinyin of the initial text or to the at least one approximate pinyin of the initial text, wherein the preset mapping relationship table includes a respective record for each word in a word database, including respective pinyin and at least one respective approximate pinyin for said each word, and a respective mapping relation between said respective pinyin, said at least one respective approximate pinyin, and said each word. | 12-25-2014 |
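The approximate-pinyin generation in the entry above can be sketched with a small table of confusable fragments (e.g., retroflex vs. flat initials). The confusion pairs and function names are assumptions; a production table would be far larger.

```python
# Hypothetical confusion pairs for commonly merged pinyin initials/finals.
SIMILAR = {"zh": "z", "ch": "c", "sh": "s", "n": "l", "in": "ing"}

def approximate_pinyin(syllable):
    """Generate approximate spellings by substituting confusable fragments."""
    variants = set()
    for a, b in SIMILAR.items():
        if a in syllable:
            variants.add(syllable.replace(a, b))
        if b in syllable:
            variants.add(syllable.replace(b, a))
    variants.discard(syllable)
    return variants

def lookup(syllable, mapping_table):
    """Return words whose exact or approximate pinyin matches the input,
    per the preset mapping-relationship table the abstract describes."""
    keys = {syllable} | approximate_pinyin(syllable)
    return [word for key in keys for word in mapping_table.get(key, [])]
```

Storing the approximate spellings alongside the exact pinyin in the mapping table, as the abstract does, turns fuzzy matching into plain dictionary lookups at query time.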
20140379336 | EAR-BASED WEARABLE NETWORKING DEVICE, SYSTEM, AND METHOD - An ear-based wearable networking device, system, and method is disclosed. The wearable networking device, system, and method is configured to monitor and process plural conversations in proximity of a user and store the associated information in a cloud system. The cloud system is configured to, based on the associated information, process, archive, create alerts, create reminders, retrieve, etc. | 12-25-2014 |
20140379337 | METHOD AND SYSTEM FOR TESTING CLOSED CAPTION CONTENT OF VIDEO ASSETS - A method and system for monitoring video assets provided by a multimedia content distribution network includes testing closed captions provided in output video signals. A video and audio portion of a video signal are acquired during a time period that a closed caption occurs. A first text string is extracted from a text portion of a video image, while a second text string is extracted from speech content in the audio portion. A degree of matching between the strings is evaluated based on a threshold to determine when a caption error occurs. Various operations may be performed when the caption error occurs, including logging caption error data and sending notifications of the caption error. | 12-25-2014 |
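The threshold comparison in the entry above (caption text extracted from the video image versus text from speech recognition of the audio) can be sketched with a standard-library string matcher. The normalization, default threshold, and names are assumptions.

```python
from difflib import SequenceMatcher

def caption_error(caption_text, speech_text, threshold=0.8):
    """Compare the caption string against the speech-recognized string and
    flag a caption error when the match ratio falls below the threshold."""
    def normalize(s):
        return " ".join(s.lower().split())  # case- and whitespace-insensitive
    ratio = SequenceMatcher(None, normalize(caption_text),
                            normalize(speech_text)).ratio()
    return ratio < threshold, ratio
```

When the error flag is raised, the monitoring system would log the caption error data and send notifications, as the abstract describes.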
20150012271 | SPEECH RECOGNITION USING DOMAIN KNOWLEDGE - In some implementations, data that indicates multiple candidate transcriptions for an utterance is received. For each of the candidate transcriptions, data relating to use of the candidate transcription as a search query is received, a score that is based on the received data is provided to a trained classifier, and a classifier output for the candidate transcription is received. One or more of the candidate transcriptions may be selected based on the classifier outputs. | 01-08-2015 |
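The reranking idea in the entry above (score each candidate transcription by how it behaves as a search query) can be sketched with a fixed linear scorer standing in for the trained classifier. The feature names and weights are invented for illustration.

```python
def select_transcription(candidates):
    """candidates: list of (text, features), where features describe the
    candidate's behavior as a search query. A trained classifier would
    produce the score; a fixed linear stand-in is used here."""
    weights = {"click_rate": 2.0, "result_count": 0.001}
    def score(features):
        return sum(weights.get(name, 0.0) * value
                   for name, value in features.items())
    return max(candidates, key=lambda c: score(c[1]))[0]
```

The intuition is that the correctly spelled transcription tends to attract clicks and results as a query, so search-usage signals disambiguate acoustically similar hypotheses.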
20150012272 | WIRELESS TERMINAL AND INFORMATION PROCESSING METHOD OF THE WIRELESS TERMINAL - The present invention relates to a wireless terminal, an information processing method and a recording medium. The wireless terminal according to the present invention comprises a checking section for checking whether the wireless terminal is positioned in a vehicle; a first converting section for converting text information received through one of N applications for transceiving text information into speech information, the applications being included in the wireless terminal, in the case that the wireless terminal is positioned in the vehicle, according to a result of the checking by the checking section; a processing section for outputting the converted speech information through a speaker of the wireless terminal; an input section for receiving speech information from a user of the wireless terminal; and a second converting section for converting the received speech information into text information. The processing section transmits the converted text information through the application for transceiving text information. | 01-08-2015 |
20150019216 | PERFORMING AN OPERATION RELATIVE TO TABULAR DATA BASED UPON VOICE INPUT - Described herein are various technologies pertaining to performing an operation relative to tabular data based upon voice input. An ASR system includes a language model that is customized based upon content of the tabular data. The ASR system receives a voice signal that is representative of speech of a user. The ASR system creates a transcription of the voice signal based upon the ASR being customized with the content of the tabular data. The operation relative to the tabular data is performed based upon the transcription of the voice signal. | 01-15-2015 |
20150019217 | SYSTEMS AND METHODS FOR RESPONDING TO NATURAL LANGUAGE SPEECH UTTERANCE - Systems and methods are provided for receiving speech and non-speech communications of natural language questions and/or commands, transcribing the speech and non-speech communications to textual messages, and executing the questions and/or commands. The invention applies context, prior information, domain knowledge, and user specific profile data to achieve a natural environment for one or more users presenting questions or commands across multiple domains. The systems and methods create, store and use extensive personal profile information for each user, thereby improving the reliability of determining the context of the speech and non-speech communications and presenting the expected results for a particular question or command. | 01-15-2015 |
20150025882 | METHOD FOR OPERATING CONVERSATION SERVICE BASED ON MESSENGER, USER INTERFACE AND ELECTRONIC DEVICE USING THE SAME - A method and a terminal for operating a conversation service function based on a messenger are provided. The terminal includes a radio frequency communication unit configured to support transmission and reception of conversation information including at least one of text data, still image data, video image data, and voice data during an operation of a conversation function based on the messenger, a display unit configured to display a conversation function screen according to an operation of the conversation function based on the messenger, and a controller configured to output a text message based on the conversation information, and to output a thumbnail image corresponding to one of the received still image data and the received video image data on a user designation profile image region, when the still image data and the video image data are received during the operation of the conversation function based on the messenger. | 01-22-2015 |
20150025883 | METHOD AND APPARATUS FOR RECOGNIZING VOICE IN PORTABLE DEVICE - A method and an apparatus for recognizing voice in a portable terminal, and more particularly, a method and an apparatus for recognizing voice by re-combining commands in a portable terminal, are provided. The method of controlling an application in a portable terminal includes displaying a voice control application, extracting keywords in a unit of a command from a received voice when receiving the voice, and classifying the keywords, rearranging the classified keywords according to a set control order, and generating a final command, and executing a function by processing the final command. | 01-22-2015 |
20150025884 | CORRECTIVE FEEDBACK LOOP FOR AUTOMATED SPEECH RECOGNITION - A method for facilitating the updating of a language model includes receiving, at a client device, via a microphone, an audio message corresponding to speech of a user; communicating the audio message to a first remote server; receiving, at the client device, a result, transcribed at the first remote server using an automatic speech recognition system (“ASR”), from the audio message; receiving, at the client device from the user, an affirmation of the result; storing, at the client device, the result in association with an identifier corresponding to the audio message; and communicating, to a second remote server, the stored result together with the identifier. | 01-22-2015 |
20150025885 | SYSTEM AND METHOD OF DICTATION FOR A SPEECH RECOGNITION COMMAND SYSTEM - In embodiments of the present invention, a system and computer-implemented method for enabling dictation may include parsing standard reports in order to identify a plurality of logical phrases in the report used for discrete sections and descriptions. In the report method, the phrases may be parsed and identifier words throughout the report may be compared to eliminate ambiguities. The method may then involve constructing text macros that follow the parsed text, thereby enabling the user to speak the identifiers to indicate full, formatted text. Finally, the report method may involve constructing a mnemonic document so both beginner and experienced users can easily read the identifiers out loud to produce a report. The result of the method is an intuitive, notes-style way to use speech commands to quickly produce a standard, formatted report. | 01-22-2015 |
20150032448 | METHOD AND APPARATUS FOR EXPANSION OF SEARCH QUERIES ON LARGE VOCABULARY CONTINUOUS SPEECH RECOGNITION TRANSCRIPTS - The subject matter discloses a method for expansion of search queries on large vocabulary continuous speech recognition transcripts comprising: obtaining a textual transcript of audio interaction generated by the large vocabulary continuous speech recognition; generating a topic model from the textual transcripts; said topic model comprises a plurality of topics wherein each topic of the plurality of topics comprises a list of keywords; obtaining a search term; associating a topic from the topic model with the search term; and generating a list of candidate term expansion words by selecting keywords from the list of keywords of the associated topic; said candidate term expansion words are of high probability to be substitution errors of the search term that are generated by the large vocabulary continuous speech recognition. | 01-29-2015 |
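The association step in the entry above (find the topic related to a search term, then draw expansion candidates from that topic's keywords) can be sketched as below. The topic model is represented as a plain dict of keyword lists, and the substring-based matching is an assumption; a real system would use the model's learned term-topic probabilities.

```python
def expand_query(term, topic_model):
    """Pick the topic whose keyword list best matches the search term, then
    propose that topic's remaining keywords as expansion candidates -- words
    likely to appear where the recognizer substituted an error for the term."""
    best_topic, best_score = None, 0
    for topic, keywords in topic_model.items():
        score = sum(1 for k in keywords if term in k or k in term)
        if score > best_score:
            best_topic, best_score = topic, score
    if best_topic is None:
        return []
    return [k for k in topic_model[best_topic] if k != term]
```

Searching the transcripts for the term plus its expansion candidates recovers hits that exact matching would miss because of recognition substitution errors.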
20150032449 | Method and Apparatus for Using Convolutional Neural Networks in Speech Recognition - Speech recognition techniques are employed in a variety of applications and services serving large numbers of users. As such, there is an increasing demand for speech recognition systems with enhanced performance. Specifically, enhanced performance in large vocabulary continuous speech recognition (LVCSR) systems is a market demand. Herein, convolutional neural networks are explored as an alternative speech recognition approach and different CNN architectures are tested. According to at least one example embodiment, a method and corresponding apparatus for performing speech recognition comprise employing a CNN with at least two convolutional layers and at least two fully-connected layers in speech recognition. Using the CNN, a textual representation of input audio data may be provided based on output data generated by the CNN. | 01-29-2015 |
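The architecture named in the abstract (at least two convolutional layers followed by at least two fully-connected layers) can be illustrated with a toy forward pass. The weights below are fixed illustrative values; a real acoustic model would learn them and operate on spectral features, not a short raw sequence.

```python
def conv1d(x, kernel):
    """Valid 1D convolution (cross-correlation, as in most CNN libraries)."""
    k = len(kernel)
    return [sum(x[i + j] * kernel[j] for j in range(k))
            for i in range(len(x) - k + 1)]

def relu(x):
    return [max(0.0, v) for v in x]

def dense(x, weights, bias):
    """Fully-connected layer: `weights` holds one row per output unit."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, bias)]

def cnn_forward(x):
    h = relu(conv1d(x, [1.0, -1.0]))                 # conv layer 1: difference filter
    h = relu(conv1d(h, [0.5, 0.5]))                  # conv layer 2: smoothing filter
    h = relu(dense(h, [[1.0] * len(h), [-1.0] * len(h)], [0.0, 1.0]))  # FC layer 1
    return dense(h, [[1.0, 1.0]], [0.0])             # FC layer 2: single output score

score = cnn_forward([0.0, 1.0, 0.0, 1.0, 1.0, 0.0])
```

In an LVCSR system the final layer would instead emit per-frame phone-state posteriors consumed by a decoder.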
20150032450 | APPARATUS AND METHOD FOR PROVIDING ENHANCED TELEPHONIC COMMUNICATIONS - A system that incorporates teachings of the present disclosure may include, for example, a device that determines a presence of a first communication device being nearby and engaged in an active voice call with a second communication device. An offer to transcribe a portion of the active voice call is presented in response to the determination. An audio portion of the active voice call is transcribed in response to an acceptance of the offer, to generate a textual transcription. The textual transcription is provided for presentation on a first display device. In response to detecting a presence of the first communication device being nearby another device, provision of the textual transcription to the first display device is discontinued, while the textual transcription is provided to the other device for presentation at a second display device. Other embodiments are disclosed. | 01-29-2015 |
20150039306 | System and Method of Automated Evaluation of Transcription Quality - Systems and methods automatedly evaluate a transcription quality. Audio data is obtained. The audio data is segmented into a plurality of utterances with a voice activity detector operating on a computer processor. The plurality of utterances are transcribed into at least one word lattice with a large vocabulary continuous speech recognition system operating on the processor. A minimum Bayes risk decoder is applied to the at least one word lattice to create at least one confusion network. At least one conformity ratio is calculated from the at least one confusion network. | 02-05-2015 |
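The abstract does not publish a formula for its "conformity ratio", so the sketch below adopts one plausible reading as an assumption: the fraction of confusion-network slots whose best word clearly dominates (posterior at or above a threshold), used as a proxy for transcription quality.

```python
def conformity_ratio(confusion_network, threshold=0.7):
    """confusion_network: list of slots, each a dict mapping word -> posterior.
    Returns the fraction of slots whose top posterior meets the threshold."""
    if not confusion_network:
        return 0.0
    confident = sum(1 for slot in confusion_network
                    if max(slot.values()) >= threshold)
    return confident / len(confusion_network)

cn = [
    {"hello": 0.9, "hollow": 0.1},   # confident slot
    {"world": 0.5, "whirled": 0.5},  # ambiguous slot
    {"again": 0.8, "a": 0.2},        # confident slot
]
ratio = conformity_ratio(cn)
```

A low ratio flags a transcription whose lattice left many competing hypotheses, i.e. one likely to need human review.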
20150039307 | INTERFACING DEVICE AND METHOD FOR SUPPORTING SPEECH DIALOGUE SERVICE - An interfacing device and method to support a speech dialogue service based on a multi-modal input are provided. The method includes executing an interface for the speech dialogue service, receiving a user input, through the executed interface, including a voice input and a non-voice input, transmitting, as a request signal to a server, at least one of the voice input or a text extracted from the voice input when the received user input is the voice input, transmitting, as the request signal to the server, a text extracted from the non-voice input when the received user input is the non-voice input, receiving a result of dialogue recognition in response to the request signal from the server, and executing a response to the received user input on the basis of the received result of dialogue recognition. | 02-05-2015 |
20150039308 | APPARATUS, SERVER, AND METHOD FOR PROVIDING CONVERSATION TOPIC - A conversation topic providing method includes: converting voice data, of a conversation of a user who is on a phone, into text; selecting a keyword, indicating an intention of the user, from the text; obtaining information of interest with respect to the keyword; and determining topics relating to the keyword based on user information. | 02-05-2015 |
20150046158 | METHOD AND APPARATUS FOR VOICE MODIFICATION DURING A CALL - A method for voice modification during a telephone call comprising receiving a source audio signal associated with at least one participant, wherein the source audio signal comprises a voice of the at least one participant, detecting a source dialect of the at least one participant, selecting a target dialect based on at least a characteristic of a target participant, creating a modulated audio signal based on the source audio signal, the source dialect, and the target dialect, and transmitting the modulated audio signal to the target participant. | 02-12-2015 |
20150046159 | UNSUPERVISED AND ACTIVE LEARNING IN AUTOMATIC SPEECH RECOGNITION FOR CALL CLASSIFICATION - Utterance data that includes at least a small amount of manually transcribed data is provided. Automatic speech recognition is performed on ones of the utterance data not having a corresponding manual transcription to produce automatically transcribed utterances. A model is trained using all of the manually transcribed data and the automatically transcribed utterances. A predetermined number of utterances not having a corresponding manual transcription are intelligently selected and manually transcribed. Ones of the automatically transcribed data as well as ones having a corresponding manual transcription are labeled. In another aspect of the invention, audio data is mined from at least one source, and a language model is trained for call classification from the mined audio data to produce a language model. | 02-12-2015 |
20150046160 | Systems, Computer-Implemented Methods, and Tangible Computer-Readable Storage Media For Transcription Alignment - Disclosed herein are systems, computer-implemented methods, and tangible computer-readable storage media for captioning a media presentation. The method includes receiving automatic speech recognition (ASR) output from a media presentation and a transcription of the media presentation. The method includes selecting via a processor a pair of anchor words in the media presentation based on the ASR output and transcription and generating captions by aligning the transcription with the ASR output between the selected pair of anchor words. The transcription can be human-generated. Selecting pairs of anchor words can be based on a similarity threshold between the ASR output and the transcription. In one variation, commonly used words on a stop list are ineligible as anchor words. The method includes outputting the media presentation with the generated captions. The presentation can be a recording of a live event. | 02-12-2015 |
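The anchor-word selection described above can be sketched as follows. This is a simplified illustration: anchors are words common to the ASR output and the human transcription (excluding a stop list), matched greedily in order; a real system would also exploit ASR word timings, which are omitted here.

```python
STOP_WORDS = {"the", "a", "an", "and", "of", "to"}  # illustrative stop list

def find_anchor_pairs(asr_words, ref_words):
    """Return (asr_index, ref_index) pairs for non-stop words that occur,
    in order, in both the ASR output and the reference transcription."""
    anchors, j = [], 0
    for i, w in enumerate(asr_words):
        if w in STOP_WORDS:
            continue                      # stop-list words are ineligible anchors
        try:
            j = ref_words.index(w, j)     # next in-order occurrence in reference
        except ValueError:
            continue                      # no match ahead; not an anchor
        anchors.append((i, j))
        j += 1
    return anchors

asr = "the cat sta on the mat".split()   # ASR output with a recognition error
ref = "the cat sat on the mat".split()   # human transcription
pairs = find_anchor_pairs(asr, ref)
```

Caption text between consecutive anchor pairs is then taken from the transcription, while the ASR side supplies the timing of the anchors themselves.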
20150051908 | METHODS AND APPARATUSES RELATED TO TEXT CAPTION ERROR CORRECTION - Systems and methods related to providing error correction in a text caption are disclosed. A method may comprise displaying a text caption including one or more blocks of text on each of a first device and a second device remote from the first device. The method may also include generating another block of text and replacing a block of text of the text caption with the another block of text. Furthermore, the method may include displaying the text caption on the second device having the block of text of the first text caption replaced by the another block of text. | 02-19-2015 |
20150058005 | Automatic Collection of Speaker Name Pronunciations - An audio stream is segmented into a plurality of time segments using speaker segmentation and recognition (SSR), with each time segment corresponding to a speaker's name, producing an SSR transcript. The audio stream is transcribed into a plurality of word regions using automatic speech recognition (ASR), with each of the word regions having a measure of the confidence in the accuracy of the translation, producing an ASR transcript. Word regions with a relatively low confidence in the accuracy of the translation are identified. The low confidence regions are filtered using named entity recognition (NER) rules to identify low confidence regions that are likely names. The NER rules associate a region that is identified as a likely name with the name of the speaker corresponding to the current, the previous, or the next time segment. All of the likely name regions associated with that speaker's name are selected. | 02-26-2015 |
20150058006 | PHONETIC ALIGNMENT FOR USER-AGENT DIALOGUE RECOGNITION - A method for speech to text transcription uses a knowledge base containing solution descriptions, each describing, in words, a solution to a respective problem. An audio recording of a dialogue between an agent and a user in which the agent had access to the knowledge base is received. A sequence of phonemes based on the agent's part of the audio recording is identified and from this, a preliminary transcription is made which includes a sequence of words recognized as corresponding to phonemes in the identified sequence of phonemes together with any unrecognized phonemes from the phoneme sequence that are not recognized as corresponding to one of the recognized words. The preliminary transcription is revised by replacing one or more of the unrecognized phonemes with a word or words from a solution description that includes words which match adjacent words of the sequence of recognized words. | 02-26-2015 |
20150058007 | METHOD FOR MODIFYING TEXT DATA CORRESPONDING TO VOICE DATA AND ELECTRONIC DEVICE FOR THE SAME - A method of modifying text data corresponding to voice data includes reproducing voice data included in a voice file; displaying text data included in the voice file; determining whether a user input for editing the text data is received; and editing the text data in response to the user input. | 02-26-2015 |
20150058008 | INFORMATION PROCESSING APPARATUS, METHOD FOR PROCESSING INFORMATION, AND PROGRAM - An information processing apparatus includes a specific-evaluation information acquisition unit that acquires an evaluation of a predetermined content item as a specific evaluation, the evaluation having been input by a user in accordance with an ordinal scale; a language-evaluation information extraction unit that acquires a language evaluation from language information regarding an evaluation sentence in which an evaluation of the predetermined content item is expressed in a language, the evaluation sentence having been input by the user; and a recommendation unit that recommends a content item that matches the user's preference in accordance with whether the specific evaluation is a positive or negative evaluation and whether the language evaluation is a positive or negative evaluation. | 02-26-2015 |
20150058009 | APPARATUS AND METHOD FOR PROVIDING MESSAGES IN A SOCIAL NETWORK - A system that incorporates teachings of the present disclosure may include, for example, a server including a controller to receive audio signals and content identification information from a media processor, generate text representing a voice message based on the audio signals, determine an identity of media content based on the content identification information, generate an enhanced message having text and additional content where the additional content is obtained by the controller based on the identity of the media content, and transmit the enhanced message to the media processor for presentation on the display device, where the enhanced message is accessible by one or more communication devices that are associated with a social network and remote from the media processor. Other embodiments are disclosed. | 02-26-2015 |
20150066501 | PROVIDING AN ELECTRONIC SUMMARY OF SOURCE CONTENT - A technique provides an electronic summary of source content. The technique involves performing, on the source content, a content recognition operation to electronically generate text output from the source content. The technique further involves electronically evaluating text portions of the text output based on predefined usability criteria to produce a respective set of usability properties for each text portion of the text output. The technique further involves providing, as the electronic summary of the source content, summarization output which summarizes the source content. The summarization output includes a particular text portion of the text output which is selected from the text portions of the text output based on the respective set of usability properties for each text portion of the text output. | 03-05-2015 |
20150066502 | System and Method of Automated Model Adaptation - Methods, systems, and computer readable media for automated transcription model adaptation includes obtaining audio data from a plurality of audio files. The audio data is transcribed to produce at least one audio file transcription which represents a plurality of transcription alternatives for each audio file. Speech analytics are applied to each audio file transcription. A best transcription is selected from the plurality of transcription alternatives for each audio file. Statistics from the selected best transcription are calculated. An adapted model is created from the calculated statistics. | 03-05-2015 |
20150066503 | System and Method of Automated Language Model Adaptation - Systems and methods of automated adaptation of a language model for transcription of audio data include obtaining audio data. The audio data is transcribed with a language model to produce a plurality of audio file transcriptions. A quality of the plurality of audio file transcriptions is evaluated. At least one best transcription from a plurality of audio file transcriptions is selected based upon the evaluated quality. Statistics are calculated from the selected at least one best transcription from the plurality of audio file transcriptions. The language model is modified from the calculated statistics. | 03-05-2015 |
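The adaptation loop in the abstract above can be sketched as follows, under stated assumptions: the speech-analytics quality evaluation is stood in for by a supplied confidence score, and the "language model" is reduced to a unigram distribution blended by linear interpolation. All weights and scores are illustrative.

```python
from collections import Counter

def adapt_language_model(base_counts, transcriptions, mix=0.3):
    """transcriptions: list of (text, quality) alternatives for one audio file.
    Selects the best-quality transcription, counts its words, and blends the
    resulting distribution into the base model with weight `mix`."""
    best_text, _ = max(transcriptions, key=lambda t: t[1])
    new_counts = Counter(best_text.split())
    total_base = sum(base_counts.values()) or 1
    total_new = sum(new_counts.values()) or 1
    vocab = set(base_counts) | set(new_counts)
    return {w: (1 - mix) * base_counts.get(w, 0) / total_base
               + mix * new_counts.get(w, 0) / total_new
            for w in vocab}

base = Counter({"hello": 3, "world": 1})
alts = [("hello word", 0.4), ("hello world", 0.9)]  # two transcription alternatives
model = adapt_language_model(base, alts)
```

Iterating this over many audio files gradually pulls the model toward the vocabulary and frequencies of the transcribed domain.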
20150066504 | System and Method for Determining the Compliance of Agent Scripts - Systems and methods of script identification in audio data. The audio data is segmented into a plurality of utterances. A script model representative of a script text is obtained. The plurality of utterances are decoded with the script model. A determination is made if the script text occurred in the audio data. | 03-05-2015 |
20150066505 | Transcription of Speech - A speech media transcription system comprises a playback device arranged to play back speech delimited in segments. The system is programmed to provide, for a segment being transcribed, an adaptive estimate of the proportion of the segment that has not been transcribed by a transcriber. The device is arranged to play back that proportion of the segment, optionally after having already played back the entire segment. Additionally, a segmentation engine is arranged to divide speech media into a plurality of segments by identifying speech as such and using timing information but without using a machine conversion of the speech media into text or a representation of text. | 03-05-2015 |
20150066506 | System and Method of Text Zoning - A method of zoning a transcription of audio data includes separating the transcription of audio data into a plurality of utterances. A probability that each word in an utterance is a meaning unit boundary is calculated. The utterance is split into two new utterances at a word with a maximum calculated probability. At least one of the two new utterances that is shorter than a maximum utterance threshold is identified as a meaning unit. | 03-05-2015 |
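The zoning step described above can be sketched directly: given a per-word probability of being a meaning-unit boundary, split at the highest-probability word and accept any resulting piece at or under the length threshold as a meaning unit. The probabilities would come from a trained model; here they are simply given.

```python
def split_utterance(words, boundary_probs, max_len):
    """Return (meaning_units, leftovers): pieces at or under max_len are
    accepted as meaning units; longer pieces remain for further splitting."""
    if len(words) <= max_len:
        return [words], []
    # split before the word with the highest boundary probability
    cut = max(range(1, len(words)), key=lambda i: boundary_probs[i])
    units, leftovers = [], []
    for piece in (words[:cut], words[cut:]):
        (units if len(piece) <= max_len else leftovers).append(piece)
    return units, leftovers

words = ["thanks", "for", "calling", "how", "can", "i", "help"]
probs = [0.0, 0.1, 0.2, 0.9, 0.3, 0.1, 0.2]   # illustrative boundary probabilities
units, rest = split_utterance(words, probs, max_len=4)
```

Pieces left in `rest` would be re-split recursively until every piece fits the threshold.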
20150073788 | MIXTURE OF N-GRAM LANGUAGE MODELS - Methods, systems, and apparatus, including computer programs encoded on computer storage media, for creating a static language model from a mixture of n-gram language models. One of the methods includes receiving a set of development sentences W, receiving a set of language models G | 03-12-2015 |
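Because the abstract above is truncated, the weight-estimation criterion in this sketch is an assumption: interpolation weights for the component models are chosen by grid search to maximize log-likelihood on the development sentences W, and the models are reduced to unigram distributions for brevity.

```python
import math

def log_likelihood(sentences, models, weights, vocab_size=1000):
    """Log-likelihood of dev sentences under a linear mixture of unigram
    models; unseen words back off to a uniform 1/vocab_size estimate."""
    ll = 0.0
    for s in sentences:
        for w in s.split():
            p = sum(wt * m.get(w, 1.0 / vocab_size)
                    for wt, m in zip(weights, models))
            ll += math.log(p)
    return ll

def best_mixture(dev_sentences, models, steps=10):
    """Two-model grid search over the weight of model 0."""
    best = max((i / steps for i in range(steps + 1)),
               key=lambda a: log_likelihood(dev_sentences, models, (a, 1 - a)))
    return best, 1 - best

news_lm = {"market": 0.3, "stocks": 0.3, "hello": 0.01}   # illustrative unigrams
chat_lm = {"hello": 0.4, "thanks": 0.3, "market": 0.01}
w0, w1 = best_mixture(["market stocks", "market hello"], [news_lm, chat_lm])
```

With the chosen weights fixed, the component models can be merged once into a single static model for deployment.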
20150073789 | CONVERTING DATA BETWEEN USERS - A method and system for converting voice data to text data between users is provided. The method includes receiving voice data from at least one user and determining phoneme data items corresponding to the voice data. Conversion candidate string representations of the phoneme data items are identified by referencing a conversion dictionary defining the conversion candidate string representations for each phoneme data item. The plurality of conversion candidate string representations are scored and a specified conversion candidate string representation is selected as text data based on the scores. The text data is transmitted to a terminal device accessed by the at least one user. | 03-12-2015 |
20150073790 | AUTO TRANSCRIPTION OF VOICE NETWORKS - The systems, methods, and devices of the various embodiments enable a transcription of voice communications to be provided in parallel with an audio recording of the voice communications. | 03-12-2015 |
20150081291 | MOBILE TERMINAL AND METHOD OF CONTROLLING THE SAME - A mobile terminal including a wireless communication unit configured to perform wireless communication; a microphone configured to receive an input voice; a touch screen; and a controller configured to receive a written touch input on the touch screen corresponding to the input voice, recognize the input voice while a voice recognition mode is activated, and display extracted information extracted from the recognized input voice on the touch screen based on a comparison of the recognized input voice and the written input. | 03-19-2015 |
20150081292 | METHOD AND DEVICE FOR AUTOMATICALLY MANAGING AUDIO AIR CONTROL MESSAGES ON AN AIRCRAFT - Methods and devices for automatically managing audio air control messages on an aircraft are described. The device comprises a unit for automatically transcribing an audio message received on board an aircraft into a textual message, a unit for automatically processing the textual message in order to extract all the indications included in this message, and a unit for automatically displaying, for each extracted indication, on at least one screen of the aircraft cockpit, an information message relating to the indication and a validation request for the pilot. | 03-19-2015 |
20150081293 | SPEECH RECOGNITION USING PHONEME MATCHING - A system, method and computer program is provided for generating customized text representations of audio commands. A first speech recognition module may be used for generating a first text representation of an audio command based on a general language grammar. A second speech recognition module may be used for generating a second text representation of the audio command, the second module including a custom language grammar that may include contacts for a particular user. Entity extraction is applied to the second text representation and the entities are checked against a file containing personal language. If the entities are found in the user-specific language, the two text representations may be fused into a combined text representation and named entity recognition may be performed again to extract further entities. | 03-19-2015 |
20150081294 | SPEECH RECOGNITION FOR USER SPECIFIC LANGUAGE - A system, method and computer program is provided for generating customized text representations of audio commands. A first speech recognition module may be used for generating a first text representation of an audio command based on a general language grammar. A second speech recognition module may be used for generating a second text representation of the audio command, the second module including a custom language grammar that may include contacts for a particular user. Entity extraction is applied to the second text representation and the entities are checked against a file containing personal language. If the entities are found in the user-specific language, the two text representations may be fused into a combined text representation and named entity recognition may be performed again to extract further entities. | 03-19-2015 |
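The two-grammar fusion shared by the two abstracts above can be sketched as follows. This is a simplified reading: the general-grammar and personal-grammar hypotheses are fused word by word, preferring the personal hypothesis wherever it produced a name found in the user's contact file. The entity check is plain token lookup; real named entity recognition would be richer.

```python
def fuse_hypotheses(general, personal, contacts):
    """Word-by-word fusion of two equal-length hypotheses: keep the
    personal-grammar word when it is a known contact, else the general one."""
    fused = []
    for g, p in zip(general.split(), personal.split()):
        fused.append(p if p in contacts else g)
    return " ".join(fused)

contacts = {"Anneke"}                       # user-specific language file (illustrative)
general = "call an echo now"                # general-grammar hypothesis
personal = "call Anneke echo now"           # custom-grammar hypothesis
fused = fuse_hypotheses(general, personal, contacts)
```

A second named-entity pass over the fused text could then extract "Anneke" as the call target, as the abstracts describe.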
20150088499 | ENHANCED VOICE COMMAND OF COMPUTING DEVICES - Embodiments provide user access to software functionality such as enterprise-related software applications and accompanying actions and data. An example method includes receiving natural language input; analyzing the natural language input and selecting one or more portions of the natural language input; employing the one or more keywords to select software functionality; and presenting one or more user interface controls in combination with a representation of the natural language input, wherein the one or more user interface controls are adapted to facilitate user access to the software functionality. In a more specific embodiment, the natural language input is functionally augmented via in-line tagging of keywords or phrases, wherein the tags act as user interface controls for accessing selected software functionality. | 03-26-2015 |
20150088500 | WEARABLE COMMUNICATION ENHANCEMENT DEVICE - Embodiments disclosed herein may include a wearable apparatus including a frame having a memory and processor associated therewith. The apparatus may include a camera associated with the frame and in communication with the processor, the camera configured to track an eye of a wearer. The apparatus may also include at least one microphone associated with the frame. The at least one microphone may be configured to receive a directional instruction from the processor. The directional instruction may be based upon an adaptive beamforming analysis performed in response to a detected eye movement from the infrared camera. The apparatus may also include a speaker associated with the frame configured to provide an audio signal received at the at least one microphone to the wearer. | 03-26-2015 |
20150088501 | METHODS AND APPARATUS FOR SIGNAL SHARING TO IMPROVE SPEECH UNDERSTANDING - Methods and devices are described for allowing users to use portable computer devices such as smart phones to share microphone signals and/or closed captioning text generated by speech recognition processing of the microphone signals. Under user direction, the portable devices exchange messages to form a signal sharing group to facilitate their conversation. | 03-26-2015 |
20150088502 | Voice Recognition System For Interactively Gathering Information To Generate Documents - A voice recognition system for interactively gathering information to generate a document, form, or application. A user establishes a connection with the voice recognition system and provides verbal responses to a plurality of verbal questions generated by the voice recognition system to compile a document, form or application. The voice recognition system converts the user's verbal responses to textually converted responses. | 03-26-2015 |
20150088503 | CONTEXTUAL CONVERSION PLATFORM FOR GENERATING PRIORITIZED REPLACEMENT TEXT FOR SPOKEN CONTENT OUTPUT - A contextual conversion platform, and method for converting text-to-speech, are described that can convert content of a target to spoken content. Embodiments of the contextual conversion platform can identify certain contextual characteristics of the content, from which can be generated a spoken content input. This spoken content input can include tokens, e.g., words and abbreviations, to be converted to the spoken content, as well as substitution tokens that are selected from contextual repositories based on the context identified by the contextual conversion platform. | 03-26-2015 |
20150088504 | Computer-Assisted Abstraction of Data and Document Coding - A computer-assisted method of abstracting and coding data that includes receiving one or more documents is disclosed. The methods and systems extract information from a record based on extraction rules that correspond to an identified record type, determine codes corresponding to the information extracted from the record, present the correspondence between the extracted information and the codes, receive from the user-input device a validation of the correspondence between the extracted information and one of the codes, and output a report including the validated information and the validated code. | 03-26-2015 |
20150088505 | AUDIO SYNCHRONIZATION FOR DOCUMENT NARRATION WITH USER-SELECTED PLAYBACK - Disclosed are techniques and systems to provide a narration of a text. In some aspects, the techniques and systems described herein include generating a timing file that includes elapsed time information for expected portions of text that provides an elapsed time period from a reference time in an audio recording to each portion of text in recognized portions of text. | 03-26-2015 |
20150100313 | PERSONIFICATION OF COMPUTING DEVICES FOR REMOTE ACCESS - Techniques described herein relate to remote access of computing devices. In one implementation, a method may include receiving a voice command from a first computing device associated with a user and parsing the voice command. The parsing may include determining a label, assigned by the user, to identify a second computing device associated with the user, and an action associated with the second computing device. The method may further include transmitting an indication of the action to the second computing device; receiving results, from the second computing device, relating to execution of the action by the second computing device; and transmitting the results to the first computing device. | 04-09-2015 |
20150100314 | MULTIPLE WEB-BASED CONTENT CATEGORY SEARCHING IN MOBILE SEARCH APPLICATION - In embodiments of the present invention improved capabilities are described for multiple web-based content category searching for web content on a mobile communication facility comprising capturing speech presented by a user using a resident capture facility on the mobile communication facility; transmitting at least a portion of the captured speech as data through a wireless communication facility to a speech recognition facility; generating speech-to-text results for the captured speech utilizing the speech recognition facility; and transmitting the text results and a plurality of formatting rules specifying how search text may be used to form a query for a search capability on the mobile communications facility, wherein each formatting rule is associated with a category of content to be searched. | 04-09-2015 |
20150100315 | METHODS AND APPARATUS FOR CONDUCTING INTERNET PROTOCOL TELEPHONY COMMUNICATIONS - IP telephony communications are conducted by sending both audio data produced by a CODEC that represents received spoken audio input, and a textual representation of the spoken audio input. A receiving device utilizes the textual representation of the spoken audio input to help recreate the spoken audio input when a portion of the CODEC data is missing. The textual representation can be generated by a speech-to-text function. Alternatively, the textual representation can be a notation of extracted phonemes. | 04-09-2015 |
20150106089 | Name Based Initiation of Speech Recognition - A computer-implemented method includes listening for audio name information indicative of a name of a computer, with the computer configured to listen for the audio name information in a first power mode that promotes a conservation of power; detecting the audio name information indicative of the name of the computer; after detection of the audio name information, switching to a second power mode that promotes a performance of speech recognition; receiving audio command information; and performing speech recognition on the audio command information. | 04-16-2015 |
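The two-mode flow in the abstract above can be sketched as a small state machine: a low-power listener only matches the device's name, and hearing it switches to a high-power mode that runs full recognition on the next utterance. The "recognizer" here is a stub that echoes its input; the class and its behavior are illustrative only.

```python
class NameActivatedRecognizer:
    LOW, HIGH = "low-power", "high-power"

    def __init__(self, name):
        self.name = name.lower()
        self.mode = self.LOW

    def hear(self, audio_text):
        """Feed one utterance; returns recognized command text, or None
        while the device is still waiting for its name."""
        if self.mode == self.LOW:
            if self.name in audio_text.lower():
                self.mode = self.HIGH        # wake on the device's name
            return None
        command = audio_text                  # stand-in for full speech recognition
        self.mode = self.LOW                  # drop back to conserve power
        return command

r = NameActivatedRecognizer("jarvis")
```

Only the cheap name matcher runs continuously; the expensive recognizer is powered only between the wake event and the following command.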
20150106090 | DISPLAY APPARATUS AND METHOD OF PERFORMING VOICE CONTROL - A voice control method and display apparatus are provided. The voice control method includes converting a voice of a user into text in response to the voice being input during a voice input mode; performing a control operation corresponding to the text; determining whether speech of the user has finished based on a result of the performing the control operation; awaiting input of a subsequent voice of the user during a predetermined standby time in response to determining that the speech of the user has not finished; and releasing the voice input mode in response to determining that the speech of the user has finished. | 04-16-2015 |
20150106091 | CONFERENCE TRANSCRIPTION SYSTEM AND METHOD - A system and method include processing multiple individual participant speech in a conference call with an audio speech recognition system to create a transcript for each participant, assembling the transcripts into a single transcript having participant identification for each speaker in the single transcript, and making the transcript searchable. In one embodiment, encoder states are dynamically tracked and the state of each encoder is continuously tracked to allow interchange of state between encoders without creating audio artifacts, and re-initializing an encoder during a brief period of natural silence for encoders whose states continuously diverge. In yet a further embodiment, tracking of how each of multiple users has joined a conference call is performed to determine and utilize different messaging mechanisms for users. | 04-16-2015 |
20150106092 | SYSTEM, METHOD, AND COMPUTER PROGRAM FOR INTEGRATING VOICE-TO-TEXT CAPABILITY INTO CALL SYSTEMS - An interface for handling patient requests in voice calls sent from a call system includes a voice recognition engine that processes the voice calls and generates text data based on the requests, an analytics and reporting engine that improves efficiency of the voice recognition engine, and an alarm and routing engine that formats the text data for transmitting the text data to an intended recipient. | 04-16-2015 |
20150106093 | Systems and Methods for Providing an Electronic Dictation Interface - Some embodiments disclosed herein store a target application and a dictation application. The target application may be configured to receive input from a user. The dictation application interface may include a full overlay mode option, where in response to selection of the full overlay mode option, the dictation application interface is automatically sized and positioned over the target application interface to fully cover a text area of the target application interface to appear as if the dictation application interface is part of the target application interface. The dictation application may be further configured to receive an audio dictation from the user, convert the audio dictation into text, provide the text in the dictation application interface and in response to receiving a first user command to complete the dictation, automatically copy the text from the dictation application interface and inserting the text into the target application interface. | 04-16-2015 |
20150106094 | SYSTEM AND METHOD FOR MULTILINGUAL TRANSCRIPTION SERVICE WITH AUTOMATED NOTIFICATION SERVICES - A system and method for generating and managing a secure, multi-user project database. | 04-16-2015 |
20150112674 | METHOD FOR BUILDING ACOUSTIC MODEL, SPEECH RECOGNITION METHOD AND ELECTRONIC APPARATUS - A method for building acoustic model, a speech recognition method and an electronic apparatus are provided. The speech recognition method includes the following steps. A plurality of phonetic transcriptions of a speech signal is obtained from an acoustic model. A plurality of vocabularies matching the phonetic transcriptions are obtained according to each phonetic transcription and a syllable acoustic lexicon, wherein the syllable acoustic lexicon includes the vocabularies corresponding to the phonetic transcription, and the vocabulary having at least one phonetic transcription includes a code corresponding to the phonetic transcription. A plurality of strings and a plurality of string probabilities are obtained from a language model according to the code of each of the vocabularies. | 04-23-2015 |
20150112675 | SPEECH RECOGNITION METHOD AND ELECTRONIC APPARATUS - A speech recognition method and an electronic apparatus are provided. The speech recognition method includes the following steps. A plurality of phonetic transcriptions of a speech signal is obtained according to an acoustic model. A phonetic spelling and intonation information matched to the phonetic transcriptions are obtained according to a phonetic transcription sequence and a syllable acoustic lexicon of the invention. According to the phonetic spellings and the intonation information, a plurality of phonetic spelling sequences and a plurality of phonetic spelling sequence probabilities are obtained from a language model. The phonetic spelling sequence corresponding to a largest one among the phonetic spelling sequence probabilities is selected as a recognition result of the speech signal. | 04-23-2015 |
20150112676 | ENHANCED CAPTURE, MANAGEMENT AND DISTRIBUTION OF LIVE PRESENTATIONS - Techniques are provided for converting live presentations into electronic media and managing captured media assets for distribution. An exemplary system includes capture devices that capture media assets of live presentations comprising a session, including image data of sequentially presented visual aids accompanying the live presentations and audio data. Each capture device has an interface for real-time image data marking of the image data for identification of individual images and session marking of the image data for demarcation of individual presentations of the session. A centralized device processes the captured media assets and automatically divides the captured media assets into discrete files associated with the individual presentations based on the session markings. An administrative tool manages the processed media assets to produce modified presentations and enables modification of the visual aid images identified by the image data markings. A production device formats the modified presentations for distribution on distribution media. | 04-23-2015 |
20150112677 | Document Editing Using Anchors - A user edits text in a draft document by providing input including left and right “anchor” text and replacement text. In response, a document editing system identifies an instance of the left anchor text followed by the right anchor text in the draft document, and replaces text between these instances with the replacement text specified by the user. For example, the user may type a string containing the left anchor text followed by the replacement text followed by the right anchor text, in response to which the system may perform the replacement just described. As a result, the user may specify both the location of, and a correction for, text in the draft document without using cursor keys or other navigation commands to navigate to the location of the text to be corrected, thereby increasing correction efficiency by avoiding the delay associated with such manual navigation. | 04-23-2015 |
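The anchor-based replacement described in this entry can be sketched in a few lines. This is a minimal illustration, not the patented implementation: it assumes plain-string anchors, first-match semantics, and an unchanged draft when either anchor is missing; the function name is illustrative.

```python
def replace_between_anchors(draft, left, right, replacement):
    """Replace the text between the first occurrence of `left` and the
    first following occurrence of `right` with `replacement`.
    Returns the draft unchanged if either anchor is not found."""
    start = draft.find(left)
    if start == -1:
        return draft  # left anchor not found
    after_left = start + len(left)
    end = draft.find(right, after_left)
    if end == -1:
        return draft  # right anchor not found after the left anchor
    return draft[:after_left] + replacement + draft[end:]
```

For example, `replace_between_anchors("the quick brwn fox", "quick ", " fox", "brown")` corrects the misrecognized word without any cursor navigation, which is the efficiency gain the abstract claims.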
20150120293 | METHOD AND SYSTEM FOR ADJUSTING USER SPEECH IN A COMMUNICATION SESSION - A system that incorporates the subject disclosure may, for example, receive user speech captured at a second end user device during a communication session between the second end user device and a first end user device, apply speech recognition to the user speech, identify an unclear word in the user speech based on the speech recognition, adjust the user speech to generate adjusted user speech by replacing all or a portion of the unclear word with replacement audio content, and provide the adjusted user speech to the first end user device during the communication session. Other embodiments are disclosed. | 04-30-2015 |
20150120294 | APPLIANCES FOR PROVIDING USER-SPECIFIC RESPONSE TO VOICE COMMANDS - Generally the present disclosure is directed to appliances that provide a user-specific response to a received voice command. In particular, the appliance can store a plurality of voice samples respectively associated with a plurality of users. The appliance can also store one or more preferences for each of the plurality of users. For example, the preferences can be input by the user and/or learned or inferred over time. When the appliance receives a human speech signal or voice command, it can match the received speech signal against one or more of the plurality of voice samples to identify the user. The preferences stored and associated with the identified user can then be obtained and the appliance can perform any requested operations in accordance with the obtained preferences. In such fashion, the appliance can provide a user-specific response to a received voice command. | 04-30-2015 |
20150120295 | VOICE-BASED INTERACTIVE CONTENT AND USER INTERFACE - A method, device, system, and computer medium for providing interactive advertising are provided. For example, a device may request an advertisement from a remote server, receive the advertisement, receive a response from a user who is listening and/or watching the advertisement, and transmit the response to the server for further action. The user may input a response by speaking. A server may receive an advertisement request from the device, select an advertisement based on pre-defined one or more criteria, transmit the selected advertisement to the device for play, receive from the device a response to the selected advertisement, and then perform an action corresponding to the received response. | 04-30-2015 |
20150127339 | CROSS-LANGUAGE SPEECH RECOGNITION - Embodiments that relate to identifying potential cross-language speech recognition problems are disclosed. For example, in one disclosed embodiment a speech recognition problem detection program receives a target word in a non-native language from a target application. A phonetic transcription of the target word comprising a plurality of target phonetic units is acquired. The program determines that at least one of the target phonetic units is not found in a plurality of native phonetic units associated with a native language. In response, a warning of the potential cross-language speech recognition problem may be outputted for display on a display device. The warning may comprise the target word. | 05-07-2015 |
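The core check in this entry, determining whether a target word's phonetic units all exist in the native language's inventory, reduces to a set-membership test. The sketch below is a simplified illustration of that idea (the function names and the warning format are assumptions, not taken from the patent):

```python
def missing_phonetic_units(target_units, native_units):
    """Return the target phonetic units absent from the native
    inventory, preserving their order of appearance."""
    native = set(native_units)
    return [u for u in target_units if u not in native]

def check_cross_language_word(word, target_units, native_units):
    """Return a warning string (containing the target word) if any
    target phonetic unit is missing, else None."""
    missing = missing_phonetic_units(target_units, native_units)
    if missing:
        return f"Potential recognition problem for '{word}': {missing}"
    return None
```

In a real system the phonetic transcription would come from a grapheme-to-phoneme module; here it is passed in directly to keep the sketch self-contained.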
20150127340 | CAPTURE - According to some embodiments, a capture device records presentations. According to some embodiments, the recorded presentations are made available for online review and comment by audience members. According to some embodiments, the recorded presentations are parsed. | 05-07-2015 |
20150127341 | Event Driven Motion Systems - A motion system for allowing a person to cause a desired motion operation to be performed, comprising a network, a motion machine, a speech to text converter, a message protocol generator, an instant message receiver, and a motion services system. The motion machine is capable of performing motion operations. The speech to text converter generates a digital representation of a spoken motion message spoken by the person. The message protocol generator generates a digital motion command based on the digital representation of the spoken motion message and causes the digital motion command to be transmitted over the network. The instant message receiver receives the digital motion command. The motion services system causes the motion machine to perform the desired motion operation based on the digital motion command received by the instant message receiver. | 05-07-2015 |
20150142433 | Irregular Pattern Identification using Landmark based Convolution - Pattern identification using convolution is described. In one or more implementations, a representation of a pattern is obtained that is described using data points that include frequency coordinates, time coordinates, and energy values. An identification is made as to whether sound data described using irregularly positioned data points includes the pattern, the identifying including use of a convolution of the frequency or time coordinates to determine correspondence with the representation of the pattern. | 05-21-2015 |
20150142434 | Illustrated Story Creation System and Device - The present invention is directed to a method and device for creating an illustrated story or narrative. As a user dictates a narrative or story, the user's voice is recognized by a computer program operated on a computer and translated to text. The computer includes a microphone, a processor, and a display, and can access a database that includes images. A program then interprets the text using an algorithm to select particular words from the translated text and associates an image file from the database that corresponds to the selected words. The user may optionally provide other input to select characters, themes, and objects. An algorithm processes the input from these various sources and displays the selected image. | 05-21-2015 |
20150142435 | System and Method for Speech-Based Navigation and Interaction with a Device's Visible Screen Elements Using a Corresponding View Hierarchy | 05-21-2015 |
20150149167 | DYNAMIC SELECTION AMONG ACOUSTIC TRANSFORMS - Aspects of this disclosure are directed to accurately transforming speech data into one or more word strings that represent the speech data. A speech recognition device may receive the speech data from a user device and an indication of the user device. The speech recognition device may execute a speech recognition algorithm using one or more user and acoustic condition specific transforms that are specific to the user device and an acoustic condition of the speech data. The execution of the speech recognition algorithm may transform the speech data into one or more word strings that represent the speech data. The speech recognition device may estimate which one of the one or more word strings more accurately represents the received speech data. | 05-28-2015 |
20150149168 | VOICE-ENABLED DIALOG INTERACTION WITH WEB PAGES - Voice enabled dialog with web pages is provided. An Internet address of a web page is received including an area with which a user of a client device can specify information. The web page is loaded using the received Internet address of the web page. A task structure of the web page is then extracted. An abstract representation of the web page is then generated. A dialog script, based on the abstract representation of the web page, is then provided. Spoken information received from the user is converted into text and the converted text is inserted into the area. | 05-28-2015 |
20150149169 | METHOD AND APPARATUS FOR PROVIDING MOBILE MULTIMODAL SPEECH HEARING AID - A method, computer-readable storage device and apparatus for processing an utterance are disclosed. For example, the method captures the utterance made by a speaker, captures a video of the speaker making the utterance, sends the utterance and the video to a speech to text transcription device, receives a text representing the utterance from the speech to text transcription device, wherein the text is presented on a screen of a mobile endpoint device, and sends the utterance to a hearing aid device. | 05-28-2015 |
20150149170 | NOTE PROMPT SYSTEM AND METHOD USED FOR INTELLIGENT GLASSES - A note prompt system and method for intelligent glasses are disclosed, where text content is detected and reading information comprising original content and an associated note is acquired. When the detected text content includes the original content, the associated note is displayed on the glasses of the intelligent glasses, thereby improving the convenience of note prompting. | 05-28-2015 |
20150149171 | Contextual Audio Recording - During a conversation over the network, a microphone attachable to or included in a mobile computer system is used to input audio speech from the user of the computer system. The audio speech is processed into audio speech data. In the audio speech data, the processor monitors for a keyword previously defined by the user. Upon detecting the keyword in the audio speech data, a contextual portion of the audio speech data is extracted including the keyword. The contextual portion of the audio speech data may be converted to text and stored in memory of the computer system or on the network. | 05-28-2015 |
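The extraction step in this entry, pulling a contextual portion of the speech around a user-defined keyword, can be illustrated over a transcribed word sequence. This is a deliberately simplified sketch (real systems would operate on audio with timestamps; the window size and function name are assumptions):

```python
def extract_context(words, keyword, window=3):
    """Return, for each occurrence of `keyword` in the transcribed word
    list, the span of up to `window` words on either side of it."""
    spans = []
    for i, w in enumerate(words):
        if w.lower() == keyword.lower():
            spans.append(words[max(0, i - window): i + window + 1])
    return spans
```

Each returned span corresponds to the "contextual portion" that the abstract says may be converted to text and stored.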
20150149172 | WORD CLOUD AUDIO NAVIGATION - The present invention is directed generally to linking a collection of words and/or phrases with locations in a video and/or audio stream where the words and/or phrases occur, and/or to associating a collection of words and/or phrases with a call history. | 05-28-2015 |
20150289756 | METHOD FOR DETERMINING AT LEAST ONE RELEVANT SINGLE IMAGE OF A DENTAL SUBJECT - The invention relates to a method for determining at least one relevant single image, wherein a plurality of single optical images are generated during a continuous optical measurement ( | 10-15-2015 |
20150294668 | Word-Level Correction of Speech Input - The subject matter of this specification can be implemented in, among other things, a computer-implemented method for correcting words in transcribed text including receiving speech audio data from a microphone. The method further includes sending the speech audio data to a transcription system. The method further includes receiving a word lattice transcribed from the speech audio data by the transcription system. The method further includes presenting one or more transcribed words from the word lattice. The method further includes receiving a user selection of at least one of the presented transcribed words. The method further includes presenting one or more alternate words from the word lattice for the selected transcribed word. The method further includes receiving a user selection of at least one of the alternate words. The method further includes replacing the selected transcribed word in the presented transcribed words with the selected alternate word. | 10-15-2015 |
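The correction flow in this entry, selecting a transcribed word and replacing it with an alternate from the word lattice, can be sketched with the lattice reduced to a mapping from word position to ranked candidates. This representation and the function name are illustrative assumptions, not the patent's data model:

```python
def correct_word(words, lattice, position, choice_index):
    """Return a copy of the transcribed `words` with the word at
    `position` replaced by the user's chosen alternate from the
    lattice candidates for that position. Out-of-range choices
    leave the transcript unchanged."""
    corrected = list(words)
    alternates = lattice.get(position, [])
    if 0 <= choice_index < len(alternates):
        corrected[position] = alternates[choice_index]
    return corrected
```

In the patented flow the alternates would first be presented to the user; here the user's selection is simply passed in as an index.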
20150294669 | Speaker and Call Characteristic Sensitive Open Voice Search - Techniques disclosed herein include systems and methods for open-domain voice-enabled searching that is speaker sensitive. Techniques include using speech information, speaker information, and information associated with a spoken query to enhance open voice search results. This includes integrating a textual index with a voice index to support the entire search cycle. Given a voice query, the system can execute two matching processes simultaneously. This can include a text matching process based on the output of speech recognition, as well as a voice matching process based on characteristics of a caller or user voicing a query. Characteristics of the caller can include output of voice feature extraction and metadata about the call. The system clusters callers according to these characteristics. The system can use specific voice and text clusters to modify speech recognition results, as well as modifying search results. | 10-15-2015 |
20150294671 | SECURITY ALARM SYSTEM WITH ADAPTIVE SPEECH PROCESSING - A regional monitoring system includes speech recognition circuitry having smart filtering capability to interpret speech input from a user to provide interactions between the user and the system. Received voice commands can be filtered using key words to interpret security commands which can then be executed. The system can provide audible feedback using one or more of prerecorded voice data files or synthesized speech. | 10-15-2015 |
20150310863 | METHOD AND APPARATUS FOR SPEAKER DIARIZATION - A method and apparatus records at a first mobile device, separately, each of an upstream component and a downstream component of speech data associated with users of the first mobile device and a second mobile device in a full-duplex communication system. Speech endpointing is performed on each recorded component to delimit speech chunks in each component using timing information common to both components. The speech chunks are converted to text chunks using at least one automatic speech recognition process, and the text chunks are displayed, based on the timing information, in chronological order on a graphical user interface of the first mobile device as diarized text. | 10-29-2015 |
20150310864 | METHOD, INTERACTION DEVICE, SERVER, AND SYSTEM FOR SPEECH RECOGNITION - Embodiments of the present invention provide a method, an apparatus, and a system for speech recognition. A third-party application corresponding to a speech signal of a user can be determined according to the speech signal and by means of semantic analysis; third-party application registry information is searched for and a third-party program is started, so that the user does not need to tap the third-party application to start the corresponding program, thereby providing more intelligent service for the user and facilitating use for the user. | 10-29-2015 |
20150310879 | SPEECH ENDPOINTING BASED ON WORD COMPARISONS - Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for speech endpointing based on word comparisons are described. In one aspect, a method includes the actions of obtaining a transcription of an utterance. The actions further include determining, as a first value, a quantity of text samples in a collection of text samples that (i) include terms that match the transcription, and (ii) do not include any additional terms. The actions further include determining, as a second value, a quantity of text samples in the collection of text samples that (i) include terms that match the transcription, and (ii) include one or more additional terms. The actions further include classifying the utterance as a likely incomplete utterance or not a likely incomplete utterance based at least on comparing the first value and the second value. | 10-29-2015 |
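The comparison at the heart of this entry, counting text samples that match the transcription exactly versus samples that match but continue with additional terms, can be sketched directly. Two simplifying assumptions are made here: "include terms that match" is approximated as a term-level prefix match, and the classification rule is a bare majority comparison of the two counts:

```python
def classify_utterance(transcription, text_samples):
    """Classify a transcription as a likely incomplete utterance by
    comparing exact-match sample counts against counts of samples
    that begin with the transcription but include additional terms."""
    terms = transcription.split()
    exact = 0      # first value: samples with no additional terms
    extended = 0   # second value: samples with one or more extra terms
    for sample in text_samples:
        sample_terms = sample.split()
        if sample_terms == terms:
            exact += 1
        elif sample_terms[:len(terms)] == terms and len(sample_terms) > len(terms):
            extended += 1
    return "likely incomplete" if extended > exact else "likely complete"
```

With samples like "what is the weather" and "what is the weather today", the transcription "what is the weather" scores more extended matches than exact ones, so endpointing would wait rather than cut the utterance off.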
20150317975 | Method and Devices for Language Determination for Voice to Text Transcription of Phone Calls - The present invention provides methods, devices and systems for determining a language, among a plurality of languages available for voice to text transcription of phone calls between a caller and a recipient, provided by an answering machine system, characterized in that at least two of said available languages are proposed to the caller based on: a phone country code corresponding to said caller, a phone country code corresponding to said recipient, a language comprised in a set of languages available for the transcription by said answering machine system, and a language selected automatically on the basis of parameters set by the caller or the recipient; wherein said caller selects said language by interacting with said answering machine system, and a corresponding voice message is transcribed into text of the selected language for forwarding to said recipient. | 11-05-2015 |
20150317976 | SYSTEM AND METHOD FOR CREATING MYSTORE VIDEO RECORDINGS AND EMBEDDED TEXT - A system for creating mystore video recordings with embedded text is provided. The system comprises a mobile device with video recording functionality and voice recognition functionality and an application stored on the mobile device. When executed on the mobile device, the application recognizes a first spoken keyword during recording of a first video and stores a first utterance, the first utterance spoken immediately following the first spoken keyword. The application further recognizes a second spoken keyword during recording of the first video and stores a second utterance, the second utterance spoken immediately following the second spoken keyword. The application further converts the first utterance to a first text string, converts the second utterance to a second text string, and embeds the first text string and the second text string into the first video. | 11-05-2015 |
20150317979 | METHOD FOR DISPLAYING MESSAGE AND ELECTRONIC DEVICE - A method for displaying a message is provided. The method includes receiving a speech signal; converting, to a text representation, at least a part of the speech signal corresponding to a voice message object; and displaying, within the voice message object, a part of the text representation, corresponding to the at least the part of the speech signal, a first object selectable to fully view the text representation, and a second object selectable to play back the speech signal. | 11-05-2015 |
20150317996 | System and Method of Improving Communication in a Speech Communication System - A speech communication system and a method of improving communication in such a speech communication system between at least a first user and a second user may be configured so the system (a) transcribes a recorded portion of a speech communication between the at least first and second user to form a transcribed portion, (b) selects and marks at least one of the words of the transcribed portion which is considered to be a keyword of the speech communication, (c) performs a search for each keyword and produces at least one definition for each keyword, (d) calculates a trustworthiness factor for each keyword, each trustworthiness factor indicating a calculated validity of the respective definition(s), and (e) displays the transcribed portion as well as each of the keywords together with the respective definition and the trustworthiness factor thereof to at least one of the first user and the second user. | 11-05-2015 |
20150326949 | DISPLAY OF DATA OF EXTERNAL SYSTEMS IN SUBTITLES OF A MULTI-MEDIA SYSTEM - A computer-implemented method for displaying data from external computing systems in subtitles of a multi-media system is provided. The computer-implemented method comprises analyzing data of an incoming media stream from at least one external computing system, wherein the data is analyzed to identify at least one of a text-based data, a voice-based data, or a video-based data of the at least one external computing system that is associated with the multi-media system. The computer-implemented method further comprises augmenting at least one subtitle of the multi-media system with the identified and converted at least one of the text-based data, the voice-based data, or the video-based data. The computer-implemented method further comprises generating at least one annotation of the multi-media system with the identified and converted at least one of the text-based data, the voice-based data, or the video-based data. | 11-12-2015 |
20150332673 | REVISING LANGUAGE MODEL SCORES BASED ON SEMANTIC CLASS HYPOTHESES - Techniques for improved speech recognition disclosed herein include applying a statistical language model to a free-text input utterance to obtain a plurality of candidate word sequences for automatic speech recognition of the input utterance, each of the plurality of candidate word sequences having a corresponding initial score generated by the statistical language model. For one or more of the plurality of candidate word sequences, each of the one or more candidate word sequences may be analyzed to generate one or more hypotheses for a semantic class of at least one token in the respective candidate word sequence. The initial scores generated by the statistical language model for at least the one or more candidate word sequences may be revised based at least in part on the one or more hypotheses for the semantic class of the at least one token in each of the one or more candidate word sequences. | 11-19-2015 |
20150339940 | METHOD AND SYSTEM FOR CONSTRUCTED RESPONSE GRADING - A method and system for constructed response grading for spoken language is disclosed. The method and system are computer implemented and involve a crowdsourcing step to derive evaluation features. The method includes steps for posting a speech test through an automated speech assessment tool; receiving candidate responses from candidates for the speech test; delivering the candidate responses to crowdsource volunteers; receiving crowdsourced responses from the crowdsource volunteers, where the crowdsourced responses comprise a transcription of the speech test; deriving features from the transcription; and deriving individual scores based on the features, where the individual scores are representative of the pronunciation score, fluency score, content organization score and grammar score of the spoken language for each candidate. | 11-26-2015 |
20150340024 | Language Modeling Using Entities - Among other things, this document describes a computer-implemented method. The method can include obtaining a plurality of text samples. For each of one or more text samples in the plurality of text samples, the text sample can be annotated with one or more labels that indicate respective classes to which one or more terms in the text sample are assigned, wherein annotating the text sample comprises determining that at least one term in the text sample corresponds to a first entity in a data structure of interconnected entities and determining a classification of the first entity within the data structure of interconnected entities. The method can include generating a class-based training set of text samples. A class-based language model can be trained using the class-based training set of text samples. A plurality of class-specific language models can be trained. | 11-26-2015 |
20150340034 | RECOGNIZING SPEECH USING NEURAL NETWORKS - Methods, systems, and apparatus, including computer programs encoded on computer storage media, for recognizing speech using neural networks. One of the methods includes receiving an audio input; processing the audio input using an acoustic model to generate a respective phoneme score for each of a plurality of phoneme labels; processing one or more of the phoneme scores using an inverse pronunciation model to generate a respective grapheme score for each of a plurality of grapheme labels; and processing one or more of the grapheme scores using a language model to generate a respective text label score for each of a plurality of text labels. | 11-26-2015 |
20150340036 | SYSTEMS AND METHODS FOR TRANSCRIPTION TRANSFER - Included are systems and methods for transcription transfer. In some embodiments a method includes receiving text data in an electronic format, determining a header in the text data, and in response to determining the header in the text data, determining a segment associated with the header. Some embodiments may include providing a dialog box associated with the text data, where the dialog box includes a first option for inserting the segment into a user interface provided by a destination application, receiving a user selection of the first option, and inserting the segment into a predetermined portion of the user interface provided by the destination application. | 11-26-2015 |
20150340037 | SYSTEM AND METHOD OF PROVIDING VOICE-MESSAGE CALL SERVICE - Provided are a system and method of providing a voice-message call service. A mobile device that performs a call with an external mobile device comprises a control unit configured to obtain text, the text converted from voice data that is exchanged between the mobile device and the external mobile device, during the call between the mobile device and the external mobile device, and obtain input text input to the mobile device and provided text that is received from the external mobile device; and a display unit configured to arrange the text, the input text, and the provided text and display the arranged text, input text, and provided text on a screen of the device, during the call between the mobile device and the external mobile device. | 11-26-2015 |
20150340038 | IDENTIFYING CORRESPONDING REGIONS OF CONTENT - A content alignment service may generate content synchronization information to facilitate the synchronous presentation of audio content and textual content. In some embodiments, a region of the textual content whose correspondence to the audio content is uncertain may be analyzed to determine whether the region of textual content corresponds to one or more words that are audibly presented in the audio content, or whether the region of textual content is a mismatch with respect to the audio content. In some embodiments, words in the textual content that correspond to words in the audio content are synchronously presented, while mismatched words in the textual content may be skipped to maintain synchronous presentation. Accordingly, in one example application, an audiobook is synchronized with an electronic book, so that as the electronic book is displayed, corresponding words of the audiobook are audibly presented. | 11-26-2015 |
20150348538 | SPEECH SUMMARY AND ACTION ITEM GENERATION - Techniques for generating summaries and action items associated with speech are described. Disclosed are techniques for receiving data representing an audio signal including speech, determining one or more words associated with the speech, determining one or more vocal fingerprints associated with the speech, and identifying a keyword associated with the speech using the one or more words and the one or more vocal fingerprints. Presentation of the keyword may be made at a loudspeaker, a display, another user interface, and the like. A summary, including meta-data and a content summary, may be generated from one or more keywords, and the summary may be presented to a user. | 12-03-2015 |
20150348540 | System and Method for Optimizing Speech Recognition and Natural Language Parameters with User Feedback - Disclosed herein are systems, methods, and non-transitory computer-readable storage media for assigning saliency weights to words of an ASR model. The saliency values assigned to words within an ASR model are based on human perception judgments of previous transcripts. These saliency values are applied as weights to modify an ASR model such that the results of the weighted ASR model in converting a spoken document to a transcript provide a more accurate and useful transcription to the user. | 12-03-2015 |
20150348548 | REDUCING THE NEED FOR MANUAL START/END-POINTING AND TRIGGER PHRASES - Systems and processes for selectively processing and responding to a spoken user input are provided. In one example, audio input containing a spoken user input can be received at a user device. The spoken user input can be identified from the audio input by identifying start and end-points of the spoken user input. It can be determined whether or not the spoken user input was intended for a virtual assistant based on contextual information. The determination can be made using a rule-based system or a probabilistic system. If it is determined that the spoken user input was intended for the virtual assistant, the spoken user input can be processed and an appropriate response can be generated. If it is instead determined that the spoken user input was not intended for the virtual assistant, the spoken user input can be ignored and/or no response can be generated. | 12-03-2015 |
20150348549 | BETTER RESOLUTION WHEN REFERENCING TO CONCEPTS - Systems and processes for operating a virtual assistant programmed to refer to shared domain concepts using concept nodes are provided. In some examples, to process a textual representation of user speech using an active ontology having these concept nodes, a primary user intent can be determined from the textual representation of user speech. Concepts referred to by the primary user intent can be identified, and substrings of the textual representation of user speech corresponding to the concepts can be identified. Secondary user intents for the substrings can be determined and a task flow based on the primary user intent and the secondary user intents can be generated and performed. | 12-03-2015 |
20150348550 | Speech-to-text input method and system combining gaze tracking technology - A speech-to-text input method includes: receiving a speech input from a user; converting the speech input into text through speech recognition; displaying the recognized text to the user; determining a gaze position of the user on a display by tracking the eye movement of the user; displaying an edit cursor at the gaze position when the gaze position is located at the displayed text; receiving a speech edit command from the user; recognizing the speech edit command through speech recognition; and editing the text at the edit cursor according to the recognized speech edit command. | 12-03-2015 |
20150348551 | MULTI-COMMAND SINGLE UTTERANCE INPUT METHOD - Systems and processes are disclosed for handling a multi-part voice command for a virtual assistant. Speech input can be received from a user that includes multiple actionable commands within a single utterance. A text string can be generated from the speech input using a speech transcription process. The text string can be parsed into multiple candidate substrings based on domain keywords, imperative verbs, predetermined substring lengths, or the like. For each candidate substring, a probability can be determined indicating whether the candidate substring corresponds to an actionable command. Such probabilities can be determined based on semantic coherence, similarity to user request templates, querying services to determine manageability, or the like. If the probabilities exceed a threshold, the user intent of each substring can be determined, processes associated with the user intents can be executed, and an acknowledgment can be provided to the user. | 12-03-2015 |
20150348552 | DYNAMIC SPEECH RECOGNITION AND TRANSCRIPTION AMONG USERS HAVING HETEROGENEOUS PROTOCOLS - A system is disclosed for facilitating free form dictation, including directed dictation and constrained recognition and/or structured transcription among users having heterogeneous native (legacy) protocols for generating, transcribing, and exchanging recognized and transcribed speech. The system includes at least one system transaction manager having a “system protocol,” to receive a verified, streamed speech information request from at least one authorized user employing a first legacy user protocol. The speech information request which includes spoken text and system commands is generated using a user interface capable of bi-directional communication with the system transaction manager and supporting dictation applications, including prompts to direct user dictation in response to user system protocol commands and systems transaction manager commands. A speech recognition and/or transcription engine (ASR), in communication with the systems transaction manager, receives the speech information request from the system transaction manager, generates a transcribed response, which can include a formatted transcription, and transmits the response to the system transaction manager. The system transaction manager routes the response to one or more of the users employing a second protocol, which may be the same as or different than the first protocol. In another embodiment, the system employs a virtual sound driver for streaming free form dictation to any ASR, regardless of the ASR's ability to recognize and/or transcribe spoken text from any input source such as, for example, a live microphone or line input. In another embodiment, the system employs a buffer to facilitate the system's use of ASRs requiring input data to be in batches, while providing the user with an uninterrupted, seamless dictating experience. | 12-03-2015 |
20150356836 | CONVERSATION CUES WITHIN AUDIO CONVERSATIONS - In many scenarios, a device may detect one or more audio conversations, and may be capable of evaluating such audio conversations, e.g., in order to present a text transcript to a user. However, the user's attention to such audio conversations may waver, and the user may miss the audio conversation and/or an opportunity to participate in the audio conversation. Presented herein are techniques for enabling devices to assist users in such scenarios by monitoring audio conversations to detect conversation cues that pertain to the user (e.g., the user's name, names of the user's friends, and/or topics of interest to the user). Upon detecting a conversation cue within an audio conversation that pertains to the user, the device notifies the user (e.g., alerting the user that the audio conversation may be of interest, and/or presenting a text transcript of the portion of the audio conversation containing the conversation cue). | 12-10-2015 |
20150364131 | SYSTEM AND METHOD FOR UNSUPERVISED AND ACTIVE LEARNING FOR AUTOMATIC SPEECH RECOGNITION - A system and method is provided for combining active and unsupervised learning for automatic speech recognition. This process enables a reduction in the amount of human supervision required for training acoustic and language models and an increase in the performance given the transcribed and un-transcribed data. | 12-17-2015 |
20150364140 | Portable Electronic Equipment and Method of Operating a User Interface - A portable electronic equipment comprises a speech to text conversion module configured to generate a text by performing a speech to text conversion. A gaze tracking device is configured to track an eye gaze direction of a user on a display on which the text is displayed. The portable electronic equipment is configured to selectively activate a text editing function based on the tracked eye gaze direction. | 12-17-2015 |
20150364141 | METHOD AND DEVICE FOR PROVIDING USER INTERFACE USING VOICE RECOGNITION - A method of providing a user interface (UI), includes generating first feature information indicating a feature of a voice signal, and converting the voice signal to a first text. The method further includes visually changing the first text based on the first feature information, and providing the UI displaying the changed first text. | 12-17-2015 |
20150371627 | VOICE DIALOG SYSTEM USING HUMOROUS SPEECH AND METHOD THEREOF - A voice dialog system and a voice dialog method that generate and use humorous speech are disclosed. The voice dialog system includes a speech analysis unit that analyzes a user's intention by receiving user speech and converting the user speech into text, a humorous speech generation unit that generates humorous speech using a core word or an abbreviated word included in the user speech based on the user's intention, a chatting speech generation unit that generates chatting speech as a response corresponding to the user's intention, and a final speech selection unit that selects final speech from among the humorous speech and the chatting speech. Thus, a user is provided with humorous speech so that the user does not feel bored and has fun using the chatting dialog system. | 12-24-2015 |
20150371636 | SYSTEM AND METHOD FOR PROVIDING VOICE COMMUNICATION FROM TEXTUAL AND PRE-RECORDED RESPONSES - An approach is provided for detecting a voice call directed to a user. The approach involves presenting a user interface for interacting with the voice call, wherein the user interface includes a control option for selecting a pre-recorded word or phrase from the user; for generating a custom-created audio word or phrase from one or more phonemes pre-recorded by the user; or a combination thereof. The approach also involves interjecting the pre-recorded word or phrase, the custom-created audio word or phrase, or a combination thereof into the voice call. | 12-24-2015 |
20150371637 | METHODS AND APPARATUS FOR ASSOCIATING DICTATION WITH AN ELECTRONIC RECORD - According to some aspects, a method of associating dictation with an electronic record in a system having a dictation system comprising a dictation application for capture of speech input and a separate electronic records system for managing electronic records is provided. The method comprises receiving, by the dictation application, speech input from a user corresponding to a dictation to be associated with an electronic record of the electronic records system, obtaining, by the dictation application, a job identifier associated with the dictation, providing, by the dictation application, the job identifier and audio data based on the speech input for transcription, obtaining, by the dictation application, a dictation marker comprising the job identifier and one or more delimiters, and causing the dictation marker to be inserted into the electronic record. | 12-24-2015 |
20150373196 | SYSTEM FOR ANALYZING INTERACTIONS AND REPORTING ANALYTIC RESULTS TO HUMAN OPERATED AND SYSTEM INTERFACES IN REAL TIME - A computerized system for advising one communicant in electronic communication between two or more communicants has apparatus monitoring and recording interaction between the communicants, software executing from a machine-readable medium and providing analytics, the software functions including rendering speech into text, and analyzing the rendered text for topics, performing communicant verification, and detecting changes in communicant emotion. Advice is offered to the one communicant during the interaction, based on results of the analytics. | 12-24-2015 |
20150373428 | Clarifying Audible Verbal Information in Video Content - A method at a server includes: receiving a user request to clarify audible verbal information associated with a media content item playing in proximity to a client device, where the user request includes an audio sample of the media content item and a user query, and the audio sample corresponds to a portion of the media content item proximate in time to issuance of the user query; in response to the user request: identifying the media content item and a first playback position in the media content corresponding to the audio sample; in accordance with the first playback position and identity of the media content item, obtaining textual information corresponding to the user query for a respective portion of the media content item; and transmitting to the client device at least a portion of the textual information. | 12-24-2015 |
20150379995 | SYSTEMS AND METHODS FOR A NAVIGATION SYSTEM UTILIZING DICTATION AND PARTIAL MATCH SEARCH - A method of searching navigation system data includes receiving a spoken utterance from a user, processing the spoken utterance to produce a dictation text substantially corresponding to the spoken utterance, and querying the navigation system data with the dictation text using an approximate string matching criterion and producing a results list associated therewith. | 12-31-2015 |
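A minimal illustration of the "approximate string matching" query above, using Levenshtein edit distance as the matching criterion (one common choice; the abstract does not specify which metric is used, and the example entries are invented).

```python
# Rank navigation entries by edit distance to the dictated text and keep
# those within a tolerance, mimicking a partial-match search.

def edit_distance(a, b):
    """Classic dynamic-programming Levenshtein distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                  # deletion
                           cur[j - 1] + 1,               # insertion
                           prev[j - 1] + (ca != cb)))    # substitution
        prev = cur
    return prev[-1]

def query_navigation_data(dictation, entries, max_distance=3):
    """Return entries ranked by edit distance to the dictated text."""
    scored = [(edit_distance(dictation.lower(), e.lower()), e) for e in entries]
    return [e for d, e in sorted(scored) if d <= max_distance]

results = query_navigation_data("main stret",
                                ["Main Street", "Maple Drive", "Main St"])
```

Here the misdictated "main stret" still surfaces the intended destinations despite having no exact match.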
20150379996 | APPARATUS FOR SYNCHRONOUSLY PROCESSING TEXT DATA AND VOICE DATA - The apparatus for synchronously processing text data and voice data, comprises: a storing unit for storing text data and voice data; a text data dividing section for dividing the text data; a text data phoneme converting section for phonemically converting the divided text data; a text data phoneme conversion accumulated value calculating section for calculating accumulated values of text data phoneme conversion values; a voice data dividing section for dividing the voice data; a reading data phoneme converting section for phonemically converting the divided voice data; a voice data phoneme conversion accumulated value calculating section for calculating accumulated values of voice data phoneme conversion values; a phrase corresponding data producing section for producing phrase corresponding data; and an output section for synchronously outputting the text data and the divided voice data. | 12-31-2015 |
20150382070 | METHOD, ELECTRONIC DEVICE, AND COMPUTER PROGRAM PRODUCT - According to one embodiment, a method by an electronic device includes: receiving audio data including a voice of a user; performing voice recognition to translate the audio data to text including a first character string corresponding to the voice; determining whether the first character string is registered in conversion information; displaying, when the first character string is registered in the conversion information, a second character string associated with the first character string in the conversion information; receiving, when an instruction is received from the user and when the first character string is not registered in the conversion information, a third character string obtained by editing the first character string; and registering, when the third character string is found in program information, the third character string in the conversion information so as to associate the third character string with the first character string. | 12-31-2015 |
20160005402 | Content-Based Audio Playback Emphasis - Techniques are disclosed for facilitating the process of proofreading draft transcripts of spoken audio streams. In general, proofreading of a draft transcript is facilitated by playing back the corresponding spoken audio stream with an emphasis on those regions in the audio stream that are highly relevant or likely to have been transcribed incorrectly. Regions may be emphasized by, for example, playing them back more slowly than regions that are of low relevance and likely to have been transcribed correctly. Emphasizing those regions of the audio stream that are most important to transcribe correctly and those regions that are most likely to have been transcribed incorrectly increases the likelihood that the proofreader will accurately correct any errors in those regions, thereby improving the overall accuracy of the transcript. | 01-07-2016 |
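The emphasis policy described above (play suspect or high-value regions more slowly) reduces to a per-region playback-rate decision. The thresholds and rate multipliers below are invented for illustration; confidence and relevance are assumed to come from the recognizer and the application.

```python
# Slow playback for low-confidence or high-relevance regions; speed up the rest.

def playback_rate(confidence, relevance, base_rate=1.0):
    """Return a rate multiplier; 1.0 means normal speed."""
    if confidence < 0.5 or relevance > 0.8:
        return base_rate * 0.6   # emphasized: slower
    return base_rate * 1.2       # de-emphasized: slightly faster

def schedule(regions):
    """regions: list of (text, confidence, relevance) tuples."""
    return [(text, playback_rate(c, r)) for text, c, r in regions]

plan = schedule([("patient denies", 0.95, 0.3),
                 ("metoprolol 50 mg", 0.4, 0.9)])
```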
20160012821 | RAPID TRANSCRIPTION BY DISPERSING SEGMENTS OF SOURCE MATERIAL TO A PLURALITY OF TRANSCRIBING STATIONS | 01-14-2016 |
20160019892 | PROCEDURE TO AUTOMATE/SIMPLIFY INTERNET SEARCH BASED ON AUDIO CONTENT FROM A VEHICLE RADIO - Audio obtained from a car radio is converted to digital data that is stored in a circular buffer, the size of which enables at least several seconds of audio to be recorded continuously. When a driver or passenger hears something of interest, data in the circular buffer is converted to strings of text. The text obtained from the recorded data is presented on a display device where individual text strings can be selected for transmission to an Internet search engine running on a computer or saved for the future use. The results of the Internet search are presented on the display device. | 01-21-2016 |
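The circular buffer in the entry above can be sketched as a small ring buffer that always holds the most recent N samples, oldest first on read-out. Capacity here is tiny for illustration; a real buffer would size N from sample rate times the desired number of seconds.

```python
# Minimal ring buffer: continuously overwrites the oldest sample so the last
# N samples are always available for snapshot.

class CircularBuffer:
    def __init__(self, capacity):
        self.capacity = capacity
        self.data = [None] * capacity
        self.index = 0
        self.full = False

    def write(self, sample):
        self.data[self.index] = sample
        self.index = (self.index + 1) % self.capacity
        if self.index == 0:
            self.full = True

    def snapshot(self):
        """Return buffered samples, oldest first."""
        if not self.full:
            return self.data[:self.index]
        return self.data[self.index:] + self.data[:self.index]

buf = CircularBuffer(4)
for s in [1, 2, 3, 4, 5, 6]:
    buf.write(s)
```

After six writes into a four-slot buffer, only the last four samples survive, in arrival order.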
20160019893 | METHOD FOR CONTROLLING SPEECH-RECOGNITION TEXT-GENERATION SYSTEM AND METHOD FOR CONTROLLING MOBILE TERMINAL - A method for controlling a speech-recognition text-generation system that captures speech and converts the captured speech into character strings through speech recognition includes: determining whether or not the character strings include a predetermined phrase; specifying, in a case where the predetermined phrase is determined to be included, a character string associated with the predetermined phrase among the character strings as a first character string which is a deletion candidate; and displaying the first character string in a first display form on a display terminal and displaying a second character string, which is a character string other than the first character string, in a second display form on the display terminal. | 01-21-2016 |
20160019894 | VOICE INFORMATION CONTROL METHOD AND TERMINAL DEVICE - A voice information control method for a terminal device used in a system including a server device which creates text data on the basis of voice information received from the terminal device, the method including: acquiring a plurality of items of first voice information; specifying a time interval that includes second voice information, which is one of the plurality of items of first voice information and which includes spoken voice of a first speaker who uses the terminal device; and transmitting the second voice information included in the specified time interval to the server device. | 01-21-2016 |
20160027434 | UNSUPERVISED AND ACTIVE LEARNING IN AUTOMATIC SPEECH RECOGNITION FOR CALL CLASSIFICATION - Utterance data that includes at least a small amount of manually transcribed data is provided. Automatic speech recognition is performed on ones of the utterance data not having a corresponding manual transcription to produce automatically transcribed utterances. A model is trained using all of the manually transcribed data and the automatically transcribed utterances. A predetermined number of utterances not having a corresponding manual transcription are intelligently selected and manually transcribed. Ones of the automatically transcribed data as well as ones having a corresponding manual transcription are labeled. In another aspect of the invention, audio data is mined from at least one source, and a language model is trained for call classification from the mined audio data to produce a language model. | 01-28-2016 |
20160027439 | PROVIDING PRE-COMPUTED HOTWORD MODELS - Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for obtaining, for each of multiple words or sub-words, audio data corresponding to multiple users speaking the word or sub-word; training, for each of the multiple words or sub-words, a pre-computed hotword model for the word or sub-word based on the audio data for the word or sub-word; receiving a candidate hotword from a computing device; identifying one or more pre-computed hotword models that correspond to the candidate hotword; and providing the identified, pre-computed hotword models to the computing device. | 01-28-2016 |
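The lookup step in the abstract above (serve a candidate hotword by returning pre-computed models for its words or sub-words) can be sketched as a dictionary lookup. The model values are placeholder strings; real entries would be trained acoustic models.

```python
# Serve a candidate hotword phrase from a store of per-word models,
# reporting which parts have no pre-computed model.

PRECOMPUTED = {
    "ok": "model_ok",
    "hello": "model_hello",
    "computer": "model_computer",
}

def models_for_candidate(candidate_hotword):
    """Return (found_models, missing_parts) for a candidate phrase."""
    parts = candidate_hotword.lower().split()
    found = [PRECOMPUTED[p] for p in parts if p in PRECOMPUTED]
    missing = [p for p in parts if p not in PRECOMPUTED]
    return found, missing

found, missing = models_for_candidate("OK computer")
```

Missing parts would presumably trigger training of a new per-word model before the device can be served.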
20160027442 | SUMMARIZATION OF AUDIO DATA - Aspects of the present invention disclose a method, computer program product, a service, and a system for generating a summary of audio on one or more computing devices. The method includes one or more processors retrieving an audio recording. The method further includes one or more processors identifying supplemental information associated with the audio recording, wherein the supplemental information includes information associated with content in the audio recording and information associated with one or more speakers of the audio recording. The method further includes one or more processors converting the audio recording to a transcript of the audio recording. The method further includes one or more processors generating a summary of the transcript of the audio recording based at least in part on the identified supplemental information. | 01-28-2016 |
20160027443 | CONTINUOUS SPEECH TRANSCRIPTION PERFORMANCE INDICATION - A method of providing speech transcription performance indication includes receiving, at a user device data representing text transcribed from an audio stream by an ASR system, and data representing a metric associated with the audio stream; displaying, via the user device, said text; and via the user device, providing, in user-perceptible form, an indicator of said metric. Another method includes displaying, by a user device, text transcribed from an audio stream by an ASR system; and via the user device, providing, in user-perceptible form, an indicator of a level of background noise of the audio stream. Another method includes receiving data representing an audio stream; converting said data representing an audio stream to text via an ASR system; determining a metric associated with the audio stream; transmitting data representing said text to a user device; and transmitting data representing said metric to the user device. | 01-28-2016 |
20160035345 | Automatic Language Model Update - A method for generating a speech recognition model includes accessing a baseline speech recognition model, obtaining information related to recent language usage from search queries, and modifying the speech recognition model to revise probabilities of a portion of a sound occurrence based on the information. The portion of a sound may include a word. Also, a method for generating a speech recognition model, includes receiving at a search engine from a remote device an audio recording and a transcript that substantially represents at least a portion of the audio recording, synchronizing the transcript with the audio recording, extracting one or more letters from the transcript and extracting the associated pronunciation of the one or more letters from the audio recording, and generating a dictionary entry in a pronunciation dictionary. | 02-04-2016 |
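A rough sketch of the first method above: revise a baseline unigram model toward word frequencies observed in recent search queries. The interpolation weight is an illustrative assumption; the abstract does not say how probabilities are revised.

```python
# Interpolate baseline word probabilities with recent-usage estimates
# derived from search-query word counts.

from collections import Counter

def update_model(baseline, recent_queries, weight=0.3):
    counts = Counter(w for q in recent_queries for w in q.lower().split())
    total = sum(counts.values())
    updated = {}
    for word in set(baseline) | set(counts):
        p_base = baseline.get(word, 0.0)
        p_recent = counts[word] / total if total else 0.0
        updated[word] = (1 - weight) * p_base + weight * p_recent
    return updated

model = update_model({"weather": 0.5, "news": 0.5},
                     ["weather today", "weather radar"])
```

Words absent from the baseline ("today", "radar") enter the model with small probability, while unseen baseline words ("news") decay toward the recent-usage estimate.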
20160035348 | Speech-Based Search Using Descriptive Features of Surrounding Objects - A natural language query arrangement is described for a mobile environment. An automatic speech recognition (ASR) engine can process an unknown speech input from a user to produce corresponding recognition text. A natural language understanding module can extract natural language concept information from the recognition text. A query classifier uses the recognition text and the natural language concept information to assign to the speech input a query intent related to one or more objects in the mobile environment. An environment database contains information descriptive of objects in the mobile environment. A query search engine searches the environment database based on the query intent, the natural language concept information, and the recognition text to determine corresponding search results, which can be presented to the user. | 02-04-2016 |
20160035349 | ELECTRONIC APPARATUS AND METHOD OF SPEECH RECOGNITION THEREOF - An electronic apparatus and a method of speech recognition thereof are disclosed. According to the method of speech recognition of the electronic apparatus, the method includes receiving a speech of a speaker, extracting phonemic characteristics for recognizing the speech and voice print characteristics for registering the speaker by analyzing the received speech of the speaker, and, in response to the speech of the speaker corresponding to a registered trigger word or phrase based on the extracted phonemic characteristics, changing an execution mode to a speech recognition mode of the electronic apparatus and registering the extracted voice print characteristics as voice print characteristics of the speaker who spoke the speech. | 02-04-2016 |
20160035353 | CONVERSATIONAL AGENTS - Methods, systems, and apparatus, including computer programs encoded on computer storage media, for handing off a user conversation between computer-implemented agents. One of the methods includes receiving, by a computer-implemented agent specific to a user device, a digital representation of speech encoding an utterance, determining, by the computer-implemented agent, that the utterance specifies a requirement to establish a communication with another computer-implemented agent, and establishing, by the computer-implemented agent, a communication between the other computer-implemented agent and the user device. | 02-04-2016 |
20160042738 | VOICE RECORDING AND PROCESSING APPARATUS FOR USE IN SPEECH-TO-TEXT CONVERSION AND ANALYSIS SYSTEMS - Embodiments of the present invention are generally directed towards voice processing systems and methods of use thereof. Specifically, embodiments of the present invention are directed to providing an apparatus for recording and processing of voice data for transmission and use in speech-to-text analysis systems. Preferred embodiments of the present invention provide an apparatus configured to record call data from one or more sources and provide processing and transmission services on the recorded call data that allow for the data to be utilized and consumed in one or more remote speech-to-text analysis systems. | 02-11-2016 |
20160049146 | METHOD AND APPARATUS TO GENERATE A SPEECH RECOGNITION LIBRARY - Methods and apparatus to generate a speech recognition library for use by a speech recognition system are disclosed. An example method comprises identifying a plurality of video segments having closed caption data corresponding to a phrase, the plurality of video segments associated with respective ones of a plurality of audio data segments, computing a plurality of difference metrics between a baseline audio data segment associated with the phrase and respective ones of the plurality of audio data segments, selecting a set of the plurality of audio data segments based on the plurality of difference metrics, identifying a first one of the audio data segments in the set as a representative audio data segment, determining a first phonetic transcription of the representative audio data segment, and adding the first phonetic transcription to a speech recognition library when the first phonetic transcription differs from a second phonetic transcription associated with the phrase in the speech recognition library. | 02-18-2016 |
20160049151 | SYSTEM AND METHOD OF PROVIDING SPEECH PROCESSING IN USER INTERFACE - Disclosed are systems, methods and computer-readable media for enabling speech processing in a user interface of a device. The method includes receiving an indication of a field in a user interface of a device, the indication also signaling that speech will follow, receiving the speech from the user at the device, the speech being associated with the field, transmitting the speech as a request to a public, common network node that receives and processes speech, processing the transmitted speech and returning text associated with the speech to the device, and inserting the text into the field. Upon a second indication from the user, the system processes the text in the field as programmed by the user interface. The present disclosure provides a speech mash-up application for a user interface of a mobile or desktop device that does not require expensive speech processing technologies. | 02-18-2016 |
20160055849 | RESPONSE GENERATION METHOD, RESPONSE GENERATION APPARATUS, AND RESPONSE GENERATION PROGRAM - A response generation method includes a step of recognizing a voice of a user, a step of analyzing a structure of the recognized voice, a step of generating a free response sentence in response to the voice of the user based on the analyzed voice structure and outputting the generated free response sentence, a step of generating the recognized voice of the user as a repeat response sentence, and a step of outputting the generated repeat response sentence before outputting the free response sentence based on the voice structure. | 02-25-2016 |
20160055850 | INFORMATION PROCESSING DEVICE, INFORMATION PROCESSING SYSTEM, INFORMATION PROCESSING METHOD, AND INFORMATION PROCESSING PROGRAM - An information processing device includes a first information processing unit, a communication unit, and a control unit. The first information processing unit performs predetermined information processing on input data to generate first processing result data. The communication unit is capable of receiving second processing result data generated by a second information processing unit capable of executing the same kind of information processing as the information processing on the input data under a condition with higher versatility. The control unit selects either the first processing result data or the second processing result data according to the use environment of the device. | 02-25-2016 |
20160055851 | METHODOLOGY FOR LIVE TEXT BROADCASTING - A Transcription Engine is able to broadcast over the Internet streaming text associated with a broadcast to registered and authenticated end users who may be hearing impaired or may have difficulty understanding the language used in the broadcast. The end users' understanding of the information being broadcast is improved because of the availability of the associated text. The Transcription Engine comprises an authentication server, a database server and a Transcription server. End users first register automatically at a website associated with the Transcription Engine. End users can then log in and are authenticated automatically by the Transcription Engine prior to being given access to a live or recorded broadcast. The end users obtain access to the associated text broadcast via the Internet after having been authenticated by the Transcription Engine. | 02-25-2016 |
20160063999 | AIRCRAFT AND INSTRUMENTATION SYSTEM FOR VOICE TRANSCRIPTION OF RADIO COMMUNICATIONS - Aircraft instrumentation systems, aircraft, and controllers are provided for transcribing radio communications. An aircraft instrumentation system and an aircraft include a radio device, a display device, and a controller. The controller is communicatively coupled with the radio device and the display device. The controller is configured to monitor the radio device and recognize a voice communication received over the radio device. The controller is further configured to generate an electronic transcript of the voice communication and control the display device to display a transcript of the voice communication. | 03-03-2016 |
20160078149 | IDENTIFICATION AND VERIFICATION OF FACTUAL ASSERTIONS IN NATURAL LANGUAGE - A system for verifying factual assertions in natural language. Receiving a text input. Identifying a verifiable factual assertion in the text. Forming a query based on the verifiable factual assertion. Searching a corpus based on the query. Determining the veracity of the verifiable factual assertion based on the search results. | 03-17-2016 |
20160078864 | IDENTIFYING UN-STORED VOICE COMMANDS - Devices, methods, and computer-readable and executable instructions for identifying un-stored voice commands are described herein. For example, one or more embodiments include a microphone component configured to capture an un-stored voice command issued by a user and a speech recognition engine. The speech recognition engine can be configured to convert the un-stored voice command to device recognizable text, compare the device recognized text of the un-stored voice command to a plurality of stored voice commands of a voice controlled device, and identify a stored voice command among the plurality of stored voice commands based on the comparison of the device recognizable text of the un-stored voice command to the plurality of stored voice commands. | 03-17-2016 |
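The comparison step in the entry above (match the recognized text of an un-stored command against the device's stored commands) can be sketched with a fuzzy string match. `difflib` similarity stands in for whatever comparison the device actually performs, and the stored commands are invented examples.

```python
# Match an unrecognized command's text to the closest stored voice command.

import difflib

STORED_COMMANDS = ["turn on the lights", "lock the front door", "set thermostat"]

def identify_stored_command(unstored_text, cutoff=0.6):
    """Return the best stored command above the similarity cutoff, else None."""
    matches = difflib.get_close_matches(unstored_text.lower(),
                                        STORED_COMMANDS, n=1, cutoff=cutoff)
    return matches[0] if matches else None

match = identify_stored_command("turn the lights on")
```

A reworded utterance still resolves to the stored command, while unrelated input falls below the cutoff.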
20160078867 | ELECTRONIC DEVICE AND METHOD FOR MANAGING VOICE ENTERED TEXT USING GESTURING - An electronic device for managing voice entered text using gesturing comprises a housing, display, power source, speech recognition module, gesture recognition module, and processor. A first speech input is detected, and textual words are displayed. One or more swipe gestures are detected, and a direction of the swipe gesture(s) is determined. Each textual word is highlighted one-by-one along a path of the direction of the swipe gesture(s) highlighting for each swipe gesture. For one embodiment, a second speech input may be detected and a highlighted textual word may be substituted with a second textual word. For another embodiment, a type of the swipe gesture(s) may be determined. A textual word adjacent to a currently highlighted word may be highlighted next for the first type, and a textual word non-adjacent to the currently highlighted word may be highlighted next for the second type. | 03-17-2016 |
20160086620 | METHOD FOR SENDING MULTI-MEDIA MESSAGES WITH CUSTOMIZED AUDIO - A system and method of creating a customized multi-media message to a recipient is disclosed. The multi-media message is created by a sender and contains an animated entity that delivers an audible message. The sender chooses the animated entity from a plurality of animated entities. The system receives a text message from the sender and receives a sender audio message associated with the text message. The sender audio message is associated with the chosen animated entity to create the multi-media message. The multi-media message is delivered by the animated entity using as the voice the sender audio message wherein the mouth movements of the animated entity conform to the sender audio message. | 03-24-2016 |
20160092161 | MOBILE TERMINAL AND CONTROLLING METHOD THEREOF - A mobile terminal including a wireless communication unit configured to wirelessly communicate with at least one other terminal; a microphone configured to receive a first voice input; a touchscreen display; a controller configured to convert the first voice input into first text, display the first text on the touchscreen display, receive a selection signal indicating a selection of the displayed first text, receive a second voice input via the microphone, convert the second voice input into second text, combine the first text and the second text, and display the combined text on the touchscreen display. | 03-31-2016 |
20160093298 | CACHING APPARATUS FOR SERVING PHONETIC PRONUNCIATIONS - Systems and processes for generating a shared pronunciation lexicon and using the shared pronunciation lexicon to interpret spoken user inputs received by a virtual assistant are provided. In one example, the process can include receiving pronunciations for words or named entities from multiple users. The pronunciations can be tagged with context tags and stored in the shared pronunciation lexicon. The shared pronunciation lexicon can then be used to interpret a spoken user input received by a user device by determining a relevant subset of the shared pronunciation lexicon based on contextual information associated with the user device and performing speech-to-text conversion on the spoken user input using the determined subset of the shared pronunciation lexicon. | 03-31-2016 |
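The shared, context-tagged lexicon above can be sketched as a mapping from word to tagged pronunciations, with lookup narrowed to the subset whose tags overlap the requesting device's context. The tags and phoneme strings below are invented examples.

```python
# Shared pronunciation lexicon: contributions are tagged with context, and
# lookup returns only pronunciations whose tags match the device's context.

from collections import defaultdict

class SharedLexicon:
    def __init__(self):
        # word -> list of (pronunciation, context_tags)
        self.entries = defaultdict(list)

    def add(self, word, pronunciation, context_tags):
        self.entries[word].append((pronunciation, set(context_tags)))

    def subset_for_context(self, word, device_context):
        """Pronunciations whose tags overlap the device's context."""
        ctx = set(device_context)
        return [p for p, tags in self.entries[word] if tags & ctx]

lex = SharedLexicon()
lex.add("Worcester", "W UH S T ER", {"en-US", "place-name"})
lex.add("Worcester", "W AO R CH EH S T ER", {"spelling-pronunciation"})
subset = lex.subset_for_context("Worcester", {"en-US"})
```

Restricting speech-to-text to the context-relevant subset keeps the search space small while still benefiting from pronunciations contributed by other users.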
20160093302 | SYSTEMS AND METHODS FOR CONVERTING TAXIWAY VOICE COMMANDS INTO TAXIWAY TEXTUAL COMMANDS - Systems and methods are provided for converting taxiway voice commands into taxiway textual commands. In various embodiments, the systems can comprise a radio receiver that is configured to receive the taxiway voice commands from an air traffic control center, a voice recognition processor coupled to the radio receiver that is configured to receive and convert the taxiway voice commands into the taxiway textual commands, and/or a taxiway clearance display coupled to the voice recognition processor that is configured to receive and display the taxiway textual commands. | 03-31-2016 |
20160093304 | SPEAKER IDENTIFICATION AND UNSUPERVISED SPEAKER ADAPTATION TECHNIQUES - Systems and processes for generating a speaker profile for use in performing speaker identification for a virtual assistant are provided. One example process can include receiving an audio input including user speech and determining whether a speaker of the user speech is a predetermined user based on a speaker profile for the predetermined user. In response to determining that the speaker of the user speech is the predetermined user, the user speech can be added to the speaker profile and operation of the virtual assistant can be triggered. In response to determining that the speaker of the user speech is not the predetermined user, the user speech can be added to an alternate speaker profile and operation of the virtual assistant may not be triggered. In some examples, contextual information can be used to verify results produced by the speaker identification process. | 03-31-2016 |
20160098995 | SPEECH TO TEXT TRAINING METHOD AND SYSTEM - An illustrative method includes receiving, at a processor of a computing device, an audio voice signal of a first call participant during a first call, where the first call is a communication across a communication network. The method further includes determining an identity of the first call participant and determining a speech to text profile associated with the identity of the first call participant, where the speech to text profile includes at least one rule for transcribing a word in the audio voice signal into text. The method further includes generating a text output, where the text output is a transcribed version of a plurality of words identified in the audio voice signal of the first call participant. At least one of the plurality of words identified is identified using the at least one rule. | 04-07-2016 |
20160104482 | DYNAMICALLY BIASING LANGUAGE MODELS - Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for speech recognition. In one aspect, a method comprises receiving audio data encoding one or more utterances; performing a first speech recognition on the audio data; identifying a context based on the first speech recognition; performing a second speech recognition on the audio data that is biased towards the context; and providing an output of the second speech recognition. | 04-14-2016 |
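The two-pass scheme in this entry (recognize, identify a context, then re-recognize biased toward that context) can be illustrated as a simple n-best rescoring step. This is a minimal sketch, not the patented method: the scoring scheme, the boost value, and the contacts-derived context are all assumptions for illustration.

```python
def rescore_with_context(nbest, context_terms, boost=2.0):
    """Second-pass selection: boost hypothesis scores for words matching
    a context identified from a first recognition pass."""
    def biased(pair):
        text, score = pair
        return score + sum(boost for w in text.split() if w in context_terms)
    return max(nbest, key=biased)[0]

# Hypothetical n-best list from a first pass (text, score).
nbest = [("call jon", 5.0), ("call john", 4.5), ("fall down", 3.0)]
context = {"john"}  # e.g. derived from the user's contact list
print(rescore_with_context(nbest, context))  # prints "call john"
```

Without the context set, the top-scoring first-pass hypothesis would win; the bias flips the decision toward the contextually plausible one.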
20160104484 | ELECTRONIC DEVICE AND METHOD FOR SPOKEN INTERACTION THEREOF - A method of operating an electronic device is provided, the method including: receiving, by the electronic device that includes a display and a voice receiving device, a sequence of speech elements through the voice receiving device; displaying, on the display by the electronic device, first information that is based on at least a part of a first speech element out of the speech elements; and displaying, on the display by the electronic device, second information, which is different than the first information and is based on at least a part of a second speech element that is received later than the first speech element among the speech elements. | 04-14-2016 |
20160117310 | METHODS AND SYSTEMS FOR CORRECTING TRANSCRIBED AUDIO FILES - Methods and systems for correcting transcribed text. One method includes receiving audio data from one or more audio data sources and transcribing the audio data based on a voice model to generate text data. The method also includes making the text data available to a plurality of users over at least one computer network and receiving corrected text data over the at least one computer network from the plurality of users. In addition, the method can include modifying the voice model based on the corrected text data. | 04-28-2016 |
20160118044 | MOBILE THOUGHT CATCHER SYSTEM - A voice capture device helps users capture and act upon fleeting thoughts. In response to user activation a processor stores an audio file corresponding to a finite amount of audio captured by a microphone of the voice capture device. The processor automatically transfers the audio file to one or more servers either directly via the Internet or via an intermediate device at a later time when such transfer is possible. The one or more servers automatically convert the audio file to a corresponding text record, automatically add the text record to a history of text records for the user, and send a copy of the history of text records for the user to a computing device associated with the user. The user can thereby utilize the computing device to review the history of text records and be reminded of actions that need to be taken. | 04-28-2016 |
20160118045 | METATAGGING OF CAPTIONS - A method for the real-time metatagging and captioning of an event. The method for the real-time metatagging and captioning of an event may include embedding metatag information in a caption file provided by a captioner. The embedded metatag information may allow a user to access additional information via the text of the captioned event. The metatag information may be embedded using a captioning device that both creates the text code and embeds the metatag code. | 04-28-2016 |

20160118050 | NON-STANDARD SPEECH DETECTION SYSTEM AND METHOD - The present invention relates to a non-standard speech detection system and method whereby a speech is analyzed based on models that are trained using personalized speech for each individual. The model is stored in a database and used to analyze a speech in real time to determine the content and behavior of an individual who is a party to a conversation that produces the speech. The results of the analysis can be used to determine if a conversation takes place under normal circumstances or under extraneous circumstances. | 04-28-2016 |
20160125878 | VEHICLE AND HEAD UNIT HAVING VOICE RECOGNITION FUNCTION, AND METHOD FOR VOICE RECOGNIZING THEREOF - A vehicle having a voice recognition function includes: a wireless communication unit configured to wirelessly transmit and receive data; a voice recognition unit configured to convert a voice signal inputted from a particular user into a digital signal and to extract voice data from the digital signal; a text converter configured to convert the voice data into text; and a control unit configured to request and receive phonebook data from a mobile communication terminal in the vehicle when a wireless connection with the mobile communication terminal is recognized and to generate example data by combining the phonebook data with supplementary data expected to be inputted as a voice signal from a user and by deleting duplicate data in combinations of the phonebook data and the supplementary data. | 05-05-2016 |
20160133247 | AUTOMATIC ACCURACY ESTIMATION FOR AUDIO TRANSCRIPTIONS - Embodiments of the present invention provide an approach for estimating the accuracy of a transcription of a voice recording. Specifically, in a typical embodiment, each word of a transcription of a voice recording is checked against a customer-specific dictionary and/or a common language dictionary. The number of words not found in either dictionary is determined. An accuracy number for the transcription is calculated from the number of said words not found and the total number of words in the transcription. | 05-12-2016 |
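The accuracy number described in this entry is a simple ratio of dictionary hits to total words. A minimal sketch follows; the 0 to 100 scale, the lowercasing, and the dictionary contents are assumptions for illustration, not details from the filing.

```python
def transcription_accuracy(transcript, customer_dict, common_dict):
    """Estimate transcription accuracy as the percentage of words found
    in either the customer-specific or the common language dictionary."""
    words = transcript.lower().split()
    if not words:
        return 0.0
    not_found = sum(1 for w in words
                    if w not in customer_dict and w not in common_dict)
    return 100.0 * (len(words) - not_found) / len(words)

common = {"the", "order", "was", "shipped", "on", "monday"}
customer = {"acme"}  # customer-specific terms
# "shiped" is in neither dictionary, so 5 of 6 words are found.
print(transcription_accuracy("the acme order was shiped monday",
                             customer, common))
```

A real system would tokenize more carefully (punctuation, numerals, proper nouns), but the estimate itself is just this arithmetic.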
20160133256 | SCRIPT COMPLIANCE IN SPOKEN DOCUMENTS - A method, computerized apparatus and computer program product for determining script compliance in interactions, the method comprising: receiving one or more indexed audio interactions; receiving a text representing a script; automatically extracting two or more key terms from the script; automatically generating a query representing the script, comprising: receiving one or more constraints associated with the at least two key terms; and determining spotted key terms of the key terms that appear in the indexed audio interactions; determining complied constraints; and determining a relevance score for each of the indexed audio interactions, based on the spotted key terms and the complied constraints. | 05-12-2016 |
20160133257 | METHOD FOR DISPLAYING TEXT AND ELECTRONIC DEVICE THEREOF - A method of operating an electronic device is provided, which includes comparing gain values acquired on the basis of voices collected from at least two microphones, determining at least one speaker included in a displayed content on the basis of the compared gain values, and displaying a voice of the determined speaker in a text format in an area of a display around the determined speaker. | 05-12-2016 |
20160133258 | Word-Level Correction of Speech Input - The subject matter of this specification can be implemented in, among other things, a computer-implemented method for correcting words in transcribed text including receiving speech audio data from a microphone. The method further includes sending the speech audio data to a transcription system. The method further includes receiving a word lattice transcribed from the speech audio data by the transcription system. The method further includes presenting one or more transcribed words from the word lattice. The method further includes receiving a user selection of at least one of the presented transcribed words. The method further includes presenting one or more alternate words from the word lattice for the selected transcribed word. The method further includes receiving a user selection of at least one of the alternate words. The method further includes replacing the selected transcribed word in the presented transcribed words with the selected alternate word. | 05-12-2016 |
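The correction flow in this entry (present the best path through a word lattice, let the user pick a word, show its alternates, and swap in the chosen one) can be sketched with a toy lattice. The class and data below are illustrative assumptions; a real lattice carries scores and arcs rather than ranked candidate lists.

```python
class WordLattice:
    """Toy word lattice: one ranked candidate list per word position."""
    def __init__(self, slots):
        self.slots = slots  # list of lists, best candidate first

    def best_path(self):
        return [candidates[0] for candidates in self.slots]

    def alternates(self, index):
        return self.slots[index][1:]

def replace_word(transcript, index, alternate):
    """Replace one transcribed word with a user-selected alternate."""
    fixed = list(transcript)
    fixed[index] = alternate
    return fixed

lattice = WordLattice([["I"], ["sore", "saw", "soar"], ["the"],
                       ["ship", "sheep"]])
words = lattice.best_path()      # ['I', 'sore', 'the', 'ship']
alts = lattice.alternates(1)     # ['saw', 'soar'] shown to the user
words = replace_word(words, 1, alts[0])
print(" ".join(words))           # prints "I saw the ship"
```

Keeping the lattice around, instead of only the 1-best string, is what makes the one-tap correction possible: the alternates were already hypothesized during recognition.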
20160140955 | SPEECH RECOGNITION CANDIDATE SELECTION BASED ON NON-ACOUSTIC INPUT - A method includes the following steps. A speech input is received. At least two speech recognition candidates are generated from the speech input. A scene related to the speech input is observed using one or more non-acoustic sensors. The observed scene is segmented into one or more regions. One or more properties for the one or more regions are computed. One of the speech recognition candidates is selected based on the one or more computed properties of the one or more regions. | 05-19-2016 |
20160140966 | Portable speech transcription, interpreter and calculation device - A portable speech transcription and calculation device generates text and algorithmic solutions in response to speech commands. The device includes a speaker for recording speeches, an automatic speech recognition processor to generate text from recorded speeches, a formatting processor to format the text into a desired style, an algorithmic processor for calculating algorithms, and a display device to return the corresponding texts to the respective users. The device is efficacious in transcribing a speech command into text. The text may then display onto a display screen, print from an attached printer, transmit to a portable storage device, or transfer to a remote processor through email or facsimile. Complex algorithmic and accounting calculations may also be performed in response to the speech command. Eliminating the need for input devices such as keyboards reduces stress on the hand and wrist. Handicapped and blind users may also benefit from the device. | 05-19-2016 |
20160148616 | METHOD AND APPARATUS FOR RECOGNIZING SPEECH BY LIP READING - A dictation device includes: an audio input device configured to receive a voice utterance including a plurality of words; a video input device configured to receive video of lip motion during the voice utterance; a memory portion; a controller configured according to instructions in the memory portion to generate first data packets including an audio stream representative of the voice utterance and a video stream representative of the lip motion; and a transceiver for sending the first data packets to a server end device and receiving second data packets including combined dictation based upon the audio stream and the video stream from the server end device. In the combined dictation, first dictation generated based upon the audio stream has been corrected by second dictation generated based upon the video stream. | 05-26-2016 |
20160148617 | MULTI-MODE TEXT INPUT - Concepts and technologies are described herein for multi-mode text input. In accordance with the concepts and technologies disclosed herein, content is received. The content can include one or more input indicators. The input indicators can indicate that user input can be used in conjunction with consumption or use of the content. The application is configured to analyze the content to determine context associated with the content and/or the client device executing the application. The application also is configured to determine, based upon the content and/or the contextual information, which input device to use to obtain input associated with use or consumption of the content. Input captured with the input device can be converted to text and used during use or consumption of the content. | 05-26-2016 |
20160154624 | MOBILE TERMINAL AND CONTROLLING METHOD THEREOF | 06-02-2016 |
20160154788 | SYSTEM AND DIALOG MANAGER DEVELOPED USING MODULAR SPOKEN-DIALOG COMPONENTS | 06-02-2016 |
20160155435 | AIRCRAFT SYSTEMS AND METHODS FOR REDUCING AND DETECTING READ-BACK AND HEAR-BACK ERRORS | 06-02-2016 |
20160163308 | Word-Level Correction of Speech Input - The subject matter of this specification can be implemented in, among other things, a computer-implemented method for correcting words in transcribed text including receiving speech audio data from a microphone. The method further includes sending the speech audio data to a transcription system. The method further includes receiving a word lattice transcribed from the speech audio data by the transcription system. The method further includes presenting one or more transcribed words from the word lattice. The method further includes receiving a user selection of at least one of the presented transcribed words. The method further includes presenting one or more alternate words from the word lattice for the selected transcribed word. The method further includes receiving a user selection of at least one of the alternate words. The method further includes replacing the selected transcribed word in the presented transcribed words with the selected alternate word. | 06-09-2016 |
20160163316 | MOBILE WIRELESS COMMUNICATIONS DEVICE WITH SPEECH TO TEXT CONVERSION AND RELATED METHODS - A communications device and method are provided for converting speech to text and applying corrections to the text. The communications device may include at least one audio interface, such a microphone and/or speaker, and at least one communications subsystem, as well as a controller or processor operative to receive speech input using the at least one audio interface, convert the speech input to input text, correct the input text to corrected text, and send the corrected text over a network using the communications subsystem. The corrected text may involve the application of proposed modification, such as a grammatical correction or ambiguity resolution, to the input text. The application of the proposed modification may be based upon the receipt of an instruction to accept or reject the proposed correction or resolution. The instruction may be a spoken instruction. | 06-09-2016 |
20160163318 | METADATA EXTRACTION OF NON-TRANSCRIBED VIDEO AND AUDIO STREAMS - A system and computer based method for transcribing and extracting metadata from a source media. A processor-based server extracts audio and video stream from the source media. A speech recognition engine processes the audio and/or video stream to transcribe the audio and/or video stream into a time-aligned textual transcription and to extract audio amplitude by time interval, thereby providing a time-aligned machine transcribed media. The server processor measures the aural amplitude of the extracted audio amplitude and assigns a numerical value that is normalized to a single, normalized, universal amplitude scale. A database stores the time-aligned machine transcribed media, time-aligned video frames and the assigned value from the normalized amplitude scale. | 06-09-2016 |
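The normalization step in this entry, assigning each measured amplitude a value on a single universal scale, amounts to peak-relative scaling. A minimal sketch; the 0 to 10 range and peak-based normalization are assumptions, since the filing does not specify the scale.

```python
def normalize_amplitude(samples, scale_max=10.0):
    """Map per-interval amplitude measurements onto a single, universal
    0..scale_max scale, normalized against the recording's peak."""
    peak = max(abs(s) for s in samples)
    if peak == 0:
        return [0.0 for _ in samples]
    return [round(scale_max * abs(s) / peak, 2) for s in samples]

# Hypothetical amplitudes extracted per time interval from an audio stream.
print(normalize_amplitude([0.1, -0.5, 0.25]))  # prints [2.0, 10.0, 5.0]
```

Normalizing per recording makes amplitude values comparable across media captured at different gain levels, which is what lets the database store one scale for all transcribed sources.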
20160163331 | ELECTRONIC DEVICE AND METHOD FOR VISUALIZING AUDIO DATA - According to one embodiment, an electronic device displays a first block including speech segments, wherein a main speaker of the first block is visually distinguishable. When the first block includes a first speech segment of a first speaker and a second speech segment of a second speaker, the first speech segment is longer than the second speech segment, and the second speaker is not a speaker whose amount of speech in the sequence of the audio data is smaller than that of the first speaker or a first amount, the first speaker is determined as the main speaker of the first block. | 06-09-2016 |
20160163333 | CONFERENCING SYSTEM AND METHOD FOR CONTROLLING THE CONFERENCING SYSTEM - A communication system and a method can be configured to facilitate the performance of a conference. The system can include a conference organizer terminal and at least two participants' terminals each assigned to respective conference participants who each log in to start a conference on the communication system. The communication system can be configured to calculate a decision situation at a particular point in time of the ongoing conference by analyzing the views expressed by the conference participants during the conference and send data relating to the decision situation for that point in time to the conference organizer's terminal and/or other conference participant terminals for use in facilitating the conference. In some embodiments, such data can be used to assist the conference participants in recognizing when there is a consensus on at least one decision to be made during the conference. | 06-09-2016 |
20160171981 | Method for Embedding Voice Mail in a Spoken Utterance Using a Natural Language Processing Computer System | 06-16-2016 |
20160171982 | HIGH INTELLIGIBILITY VOICE ANNOUNCEMENT SYSTEM | 06-16-2016 |
20160171983 | Processing and Cross Reference of Realtime Natural Language Dialog for Live Annotations | 06-16-2016 |
20160173958 | BROADCASTING RECEIVING APPARATUS AND CONTROL METHOD THEREOF | 06-16-2016 |
20160179831 | SYSTEMS AND METHODS FOR TEXTUAL CONTENT CREATION FROM SOURCES OF AUDIO THAT CONTAIN SPEECH | 06-23-2016 |
20160180849 | METHOD FOR PRODUCING AND RECOGNIZING BARCODE INFORMATION BASED ON VOICE, AND RECORDING MEDIUM | 06-23-2016 |
20160189710 | METHOD AND APPARATUS FOR SPEECH RECOGNITION - A method and apparatus for speech recognition are disclosed. The speech recognition apparatus includes a processor configured to process a received speech signal, generate a word sequence based on a phoneme sequence generated from the speech signal, generate a syllable sequence corresponding to a word element among words comprised in the word sequence based on the phoneme sequence, and determine a text corresponding to a recognition result of the speech signal based on the word sequence and the syllable sequence. | 06-30-2016 |
20160189712 | ENGINE, SYSTEM AND METHOD OF PROVIDING AUDIO TRANSCRIPTIONS FOR USE IN CONTENT RESOURCES - The present invention relates to the transcription of audio, and, more particularly, to an engine, system and method of providing audio transcriptions for use in content resources. | 06-30-2016 |
20160189713 | APPARATUS AND METHOD FOR AUTOMATICALLY CREATING AND RECORDING MINUTES OF MEETING - An electronic apparatus, and a method thereof, for automatically acquiring and revising minutes of a meeting includes identifying one or more speakers from audio signals recorded during a meeting, based on pre-sampled audio signals and a voice feature table stored in a non-transitory storage medium. The audio signals are converted to text and divided into paragraphs, one paragraph being attributable to one speaker, and each speaker has a given user name. Original minutes of the meeting are prepared based on the text and a meeting minutes template stored in the non-transitory storage medium, then revised and issued to all relevant persons. | 06-30-2016 |
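The paragraph-per-speaker step this entry describes, converting diarized audio to text and grouping it by speaker turn under each speaker's user name, can be sketched as follows. The segment tuples and name table are hypothetical inputs, standing in for the apparatus's speaker identification and voice feature table.

```python
def build_minutes(segments, names):
    """Group consecutive transcript segments by speaker into paragraphs,
    one paragraph per speaker turn, labeled with the given user name."""
    turns = []
    for speaker_id, text in segments:
        name = names.get(speaker_id, speaker_id)
        if turns and turns[-1][0] == name:
            turns[-1][1].append(text)   # same speaker keeps talking
        else:
            turns.append((name, [text]))
    return ["{}: {}".format(name, " ".join(texts)) for name, texts in turns]

segments = [("s1", "Welcome everyone."), ("s1", "Let's begin."),
            ("s2", "Thanks.")]
names = {"s1": "Alice", "s2": "Bob"}
for line in build_minutes(segments, names):
    print(line)
```

Filling these paragraphs into a minutes template, and circulating the result for revision, would follow as a separate formatting step.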
20160203818 | WIRELESS CAPTION COMMUNICATION SERVICE SYSTEM | 07-14-2016 |
20160203819 | Text Conversion Method and Device | 07-14-2016 |
20160253314 | AUTOMATIC CAPTURE OF INFORMATION FROM AUDIO DATA AND COMPUTER OPERATING CONTEXT | 09-01-2016 |
20160378850 | DETERMINING PREFERRED COMMUNICATION EXPLANATIONS USING RECORD-RELEVANCY TIERS - In one example of the disclosure, data indicative of a word or phrase communicated during a meeting including a plurality of participants is obtained. For each participant, records electronically accessible to the participant are identified, and each record is associated with a tier from a hierarchy of record-relevancy tiers. A set of explanations for the communication and associated scores is identified, including for each participant, beginning with a most relevant tier, searching the records accessible to the participant tier by tier until an explanation is identified, and assigning a score to the explanation according to the tier associated with the record in which the explanation is found. A preferred explanation for the communication is determined based upon the scores, and a display of the preferred explanation is caused. | 12-29-2016 |
20160379169 | CONFERENCE INFORMATION ACCUMULATING APPARATUS, METHOD, AND COMPUTER PROGRAM PRODUCT - According to an embodiment, a conference information accumulating apparatus is for accumulating conference information. The apparatus includes a generator and a calculator. The generator is configured to generate a user interface screen either for creating minutes of a conference based on the conference information or for viewing the created minutes. The calculator is configured to calculate a correlation between a written text that is a unit in which the minutes are written and the conference information, based on a predetermined operation performed using the user interface screen by a minutes creator. The generator generates, upon detection of the conference information that is correlated with the written text, the user interface screen enabling a reference to the conference information. | 12-29-2016 |
20160379624 | RECOGNITION RESULT OUTPUT DEVICE, RECOGNITION RESULT OUTPUT METHOD, AND COMPUTER PROGRAM PRODUCT - According to an embodiment, a speech recognition result output device includes a storage and processing circuitry. The storage is configured to store a language model for speech recognition. The processing circuitry is coupled to the storage and configured to acquire a phonetic sequence, convert the phonetic sequence into a phonetic sequence feature vector, convert the phonetic sequence feature vector into graphemes using the language model, and output the graphemes. | 12-29-2016 |
20160379630 | SPEECH RECOGNITION SERVICES - Various systems and methods for providing speech recognition services are described herein. A user device for providing speech recognition services includes a speech module to maintain a speech recognition model of a user of the user device; a user interaction module to detect an initiation of an interaction between the user and a target device; and a transmission module to transmit the speech recognition model to the target device, the target device to use the speech recognition model to enhance a speech recognition process executed by the target device during the interaction between the user and the target device. | 12-29-2016 |
20160379636 | SYSTEM AND METHOD FOR HANDLING A SPOKEN USER REQUEST - Method for handling a spoken user request of a user, executable by each one of at least two applications installed on an electronic device, comprising determining that the spoken user request corresponds to an action executable by an other one of the at least two of the applications; and causing execution of the action by the other one of the at least two of the applications. Method for handling a spoken user request received from a user of an electronic device, comprising detecting reception of a spoken user request by a first application; transferring the spoken user request to the second application by the first application; determining, by the second application, that the spoken user request corresponds to an action executable by a third application; and causing, by the second application, execution of the action by the third application. Electronic devices configured to carry out the methods are also disclosed. | 12-29-2016 |
20160379638 | INPUT SPEECH QUALITY MATCHING - A system matches text-to-speech (TTS) or other output to a quality of an input spoken utterance. The system uses trained models to detect a speech quality and generates an indicator of the speech quality. The speech quality may be determined from audio or non-audio data. The indicator is sent to downstream components of the system such as a command processor or TTS system. The output of the system is then determined using the indicator of speech quality, thus customizing an output of the system to the manner in which the utterance was spoken. | 12-29-2016 |
20160379639 | PRIVACY-PRESERVING TRAINING CORPUS SELECTION - The present disclosure relates to training a speech recognition system. A system that includes an automated speech recognizer and receives data from a client device. The system determines that at least a portion of the received data is likely sensitive data. Before the at least a portion of the received data is deleted, the system provides the at least a portion of the received data to a model training engine that trains recognition models for the automated speech recognizer. After the at least a portion of the received data is provided, the system deletes the at least a portion of the received data. | 12-29-2016 |
20160379640 | SYSTEM AND METHOD FOR AIRCRAFT VOICE-TO-TEXT COMMUNICATION WITH MESSAGE VALIDATION - The embodiments described herein can provide improved reliability and accuracy of communication to and from an aircraft, such as aircraft-to-aircraft communication and aircraft-to-air traffic control (ATC) communication. In particular, the systems and methods can improve the reliability and accuracy of aircraft communication by performing voice-to-text conversions on voice communications and performing a validation check on the resulting voice-to-text converted message. This validation check determines a measure of validation for the voice-to-text converted message. In one embodiment, the voice-to-text converted message can be displayed to a user with a visual indicator of the measure of validation. Thus, the user can be made aware of the measure of validation when viewing the voice-to-text converted message. Such a system and method can be used to provide the user with increased information regarding the reliability and accuracy of the voice-to-text converted message, and thus can provide improved communication to and from the aircraft. | 12-29-2016 |
20160379641 | Auto-Generation of Notes and Tasks From Passive Recording - Systems and methods, and computer-readable media bearing instructions for executing one or more actions associated with a predetermined feature detected in an ongoing content stream are presented. As the ongoing content stream is passively recorded, the content stream is monitored for any one of a plurality of predetermined features. Upon detecting a predetermined feature in the ongoing content stream, one or more actions associated with the detected feature are carried out with regard to the recorded content in the passive recording buffer. | 12-29-2016 |
20170236517 | CONTEXTUAL NOTE TAKING | 08-17-2017 |
20190147049 | METHOD AND APPARATUS FOR PROCESSING INFORMATION | 05-16-2019 |
20190147878 | GENERATING AND TRANSMITTING INVOCATION REQUEST TO APPROPRIATE THIRD-PARTY AGENT | 05-16-2019 |
20190147903 | CONFERENCING SYSTEM AND METHOD FOR CONTROLLING THE CONFERENCING SYSTEM | 05-16-2019 |
20220139400 | SPEECH TRANSCRIPTION USING MULTIPLE DATA SOURCES - This disclosure describes transcribing speech using audio, image, and other data. A system is described that includes an audio capture system configured to capture audio data associated with a plurality of speakers, an image capture system configured to capture images of one or more of the plurality of speakers, and a speech processing engine. The speech processing engine may be configured to recognize a plurality of speech segments in the audio data, identify, for each speech segment of the plurality of speech segments and based on the images, a speaker associated with the speech segment, transcribe each of the plurality of speech segments to produce a transcription of the plurality of speech segments including, for each speech segment in the plurality of speech segments, an indication of the speaker associated with the speech segment, and analyze the transcription to produce additional data derived from the transcription. | 05-05-2022 |