Patent application number | Description | Published |
20090055175 | CONTINUOUS SPEECH TRANSCRIPTION PERFORMANCE INDICATION - A method of providing speech transcription performance indication includes receiving, at a user device, data representing text transcribed from an audio stream by an ASR system, and data representing a metric associated with the audio stream; displaying, via the user device, said text; and via the user device, providing, in user-perceptible form, an indicator of said metric. Another method includes displaying, by a user device, text transcribed from an audio stream by an ASR system; and via the user device, providing, in user-perceptible form, an indicator of a level of background noise of the audio stream. Another method includes receiving data representing an audio stream; converting said data representing an audio stream to text via an ASR system; determining a metric associated with the audio stream; transmitting data representing said text to a user device; and transmitting data representing said metric to the user device. | 02-26-2009 |
20090076917 | FACILITATING PRESENTATION OF ADS RELATING TO WORDS OF A MESSAGE - Targeted delivery of contextually relevant ad impressions to a mobile device is provided. The ad impressions are delivered within text messages and/or instant message chat threads. Monetizing of text messaging and instant messaging by providers of such services is achieved, while providing unobtrusive and contextually relevant information to users of such services. | 03-19-2009 |
20090124272 | FILTERING TRANSCRIPTIONS OF UTTERANCES - A method for facilitating mobile phone messaging, such as text messaging and instant messaging, includes receiving audio data communicated from a mobile communication device, the audio data representing an utterance that is intended to be at least a portion of the text of the message that is to be sent from the mobile phone to a recipient; transcribing the utterance to text based on the received audio data to generate a transcription; and applying a filter to the transcribed text to generate a filtered transcription, the text of which is intended to mimic language patterns of mobile device messaging that is performed manually by users. The method may also be applied to the audio data of a voicemail, with the filtered, transcribed text being communicated to a mobile phone as, for example, an SMS text message. | 05-14-2009 |
20090182560 | USING A PHYSICAL PHENOMENON DETECTOR TO CONTROL OPERATION OF A SPEECH RECOGNITION ENGINE - A transmission device such as a cell phone or other mobile communication device includes a physical phenomenon detection device to perform a “push to talk” function by detecting the occurrence of a particular physical phenomenon and using such detection to start and stop recording an utterance for subsequent analysis by a speech recognition engine. A method of controlling operation of a speech recognition engine in response to detection of a physical phenomenon includes detecting or sensing, via a physical phenomenon detection unit, a predetermined physical phenomenon representative of an intent to invoke operation of a speech recognition engine. In response to the detection or sensing of the predetermined physical phenomenon, a signal is transmitted to a control unit in a communication device. In response to the receipt of the transmitted signal, the utterance received from a user via the communication device is recorded, and the recorded utterance is provided to a speech recognition engine for operation thereon. The user may thus effectuate operation of the speech recognition engine upon the utterance by causing the physical phenomenon to occur. | 07-16-2009 |
20090228274 | USE OF INTERMEDIATE SPEECH TRANSCRIPTION RESULTS IN EDITING FINAL SPEECH TRANSCRIPTION RESULTS - A communication system includes at least one transmitting device and at least one receiving device, one or more network systems for connecting the transmitting device to the receiving device, and an automatic speech recognition (“ASR”) system, including an ASR engine. A user speaks an utterance into the transmitting device, and the recorded speech audio is sent to the ASR engine. The ASR engine returns intermediate transcription results to the transmitting device, which displays the intermediate transcription results in real-time to the user. The intermediate transcription results are also correlated by utterance fragment to final transcription results and displayed to the user. The user may use the information thus presented to make decisions as to whether to edit the final transcription results or to speak the utterance again, thereby repeating the process. The intermediate transcription results may also be used by the user to edit the final transcription results. | 09-10-2009 |
20090240488 | CORRECTIVE FEEDBACK LOOP FOR AUTOMATED SPEECH RECOGNITION - A method for facilitating the updating of a language model includes receiving, at a client device, via a microphone, an audio message corresponding to speech of a user; communicating the audio message to a first remote server; receiving, at the client device, a result, transcribed at the first remote server using an automatic speech recognition system (“ASR”), from the audio message; receiving, at the client device from the user, an affirmation of the result; storing, at the client device, the result in association with an identifier corresponding to the audio message; and communicating, to a second remote server, the stored result together with the identifier. | 09-24-2009 |
20090248415 | USE OF METADATA TO POST PROCESS SPEECH RECOGNITION OUTPUT - A method of utilizing metadata stored in a computer-readable medium to assist in the conversion of an audio stream to a text stream. The method compares personally identifiable data, such as a user's electronic address book and/or Caller/Recipient ID information (in the case of processing voice mail to text), to the n-best results generated by a speech recognition engine for each word that is output by the engine. A goal of this comparison is to improve the overall accuracy of the speech recognition output by correcting a possible misrecognition of a spoken proper noun, such as a person's or company's name, to its proper textual form, or by rendering a spoken phone number as a correctly formatted number in Arabic numerals. | 10-01-2009 |
20120303445 | FACILITATING PRESENTATION OF ADS RELATING TO WORDS OF A MESSAGE - Targeted delivery of contextually relevant ad impressions to a mobile device is provided. The ad impressions are delivered within text messages and/or instant message chat threads. Monetizing of text messaging and instant messaging by providers of such services is achieved, while providing unobtrusive and contextually relevant information to users of such services. | 11-29-2012 |
20130018655 | CONTINUOUS SPEECH TRANSCRIPTION PERFORMANCE INDICATION - A method of providing speech transcription performance indication includes receiving, at a user device, data representing text transcribed from an audio stream by an ASR system, and data representing a metric associated with the audio stream; displaying, via the user device, said text; and via the user device, providing, in user-perceptible form, an indicator of said metric. Another method includes displaying, by a user device, text transcribed from an audio stream by an ASR system; and via the user device, providing, in user-perceptible form, an indicator of a level of background noise of the audio stream. Another method includes receiving data representing an audio stream; converting said data representing an audio stream to text via an ASR system; determining a metric associated with the audio stream; transmitting data representing said text to a user device; and transmitting data representing said metric to the user device. | 01-17-2013 |
20130018656 | FILTERING TRANSCRIPTIONS OF UTTERANCES - A method for facilitating mobile phone messaging, such as text messaging and instant messaging, includes receiving audio data communicated from a mobile communication device, the audio data representing an utterance that is intended to be at least a portion of the text of the message that is to be sent from the mobile phone to a recipient; transcribing the utterance to text based on the received audio data to generate a transcription; and applying a filter to the transcribed text to generate a filtered transcription, the text of which is intended to mimic language patterns of mobile device messaging that is performed manually by users. The method may also be applied to the audio data of a voicemail, with the filtered, transcribed text being communicated to a mobile phone as, for example, an SMS text message. | 01-17-2013 |
20130024195 | CORRECTIVE FEEDBACK LOOP FOR AUTOMATED SPEECH RECOGNITION - A method for facilitating the updating of a language model includes receiving, at a client device, via a microphone, an audio message corresponding to speech of a user; communicating the audio message to a first remote server; receiving, at the client device, a result, transcribed at the first remote server using an automatic speech recognition system (“ASR”), from the audio message; receiving, at the client device from the user, an affirmation of the result; storing, at the client device, the result in association with an identifier corresponding to the audio message; and communicating, to a second remote server, the stored result together with the identifier. | 01-24-2013 |
20130080179 | USING A PHYSICAL PHENOMENON DETECTOR TO CONTROL OPERATION OF A SPEECH RECOGNITION ENGINE - A device may include a physical phenomenon detector. The physical phenomenon detector may detect a physical phenomenon related to the device. In response to detecting the physical phenomenon, the device may record audio data that includes speech. The speech may be transcribed with a speech recognition engine. The speech recognition engine may be included in the device, or may be included with a remote computing device with which the device may communicate. | 03-28-2013 |
20150025884 | CORRECTIVE FEEDBACK LOOP FOR AUTOMATED SPEECH RECOGNITION - A method for facilitating the updating of a language model includes receiving, at a client device, via a microphone, an audio message corresponding to speech of a user; communicating the audio message to a first remote server; receiving, at the client device, a result, transcribed at the first remote server using an automatic speech recognition system (“ASR”), from the audio message; receiving, at the client device from the user, an affirmation of the result; storing, at the client device, the result in association with an identifier corresponding to the audio message; and communicating, to a second remote server, the stored result together with the identifier. | 01-22-2015 |
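The metadata post-processing summarized for application 20090248415 can be pictured as a rescoring step over the engine's per-word n-best list. The sketch below is a hypothetical illustration, not the patented method: it assumes each output word arrives with a best-first list of (hypothesis, score) pairs, that the address book is a plain list of contact names, and that a case-insensitive exact match is enough to override the engine's top choice. The function name `correct_with_contacts` is invented for illustration, and phone-number formatting is not covered.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical n-best list for one output word: (hypothesis, engine score)
# pairs, assumed to be sorted best-first by the recognizer.
NBest = Sequence[Tuple[str, float]]


def correct_with_contacts(nbest_per_word: List[NBest], contacts: List[str]) -> List[str]:
    """Pick one word per slot, preferring an n-best hypothesis that matches a contact.

    Assumption: a case-insensitive exact match against any token of a
    contact name is enough to replace the engine's top hypothesis.
    """
    # Map lowercased contact tokens to the spelling stored in the address book.
    known: Dict[str, str] = {}
    for name in contacts:
        for token in name.split():
            known[token.lower()] = token

    output: List[str] = []
    for nbest in nbest_per_word:
        chosen = nbest[0][0]  # default: the engine's best hypothesis
        for hypothesis, _score in nbest:
            if hypothesis.lower() in known:
                # Highest-ranked hypothesis that matches a contact wins.
                chosen = known[hypothesis.lower()]
                break
        output.append(chosen)
    return output


words = [[("call", 0.9)], [("catherine", 0.6), ("kathryn", 0.4)]]
print(correct_with_contacts(words, ["Kathryn Smith"]))  # ['call', 'Kathryn']
```

In this toy example the lower-scored hypothesis "kathryn" replaces "catherine" because it matches the address book, and it is emitted with the contact's stored capitalization.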
Patent application number | Description | Published |
20080243502 | PARTIALLY FILLING MIXED-INITIATIVE FORMS FROM UTTERANCES HAVING SUB-THRESHOLD CONFIDENCE SCORES BASED UPON WORD-LEVEL CONFIDENCE DATA - The invention discloses prompting for a spoken response that provides input for multiple elements. A single spoken utterance including content for multiple elements can be received, where each element is mapped to a data field. The spoken utterance can be speech-to-text converted to derive values for each of the multiple elements. An utterance-level confidence score can be determined, which can fall below an associated certainty threshold. Element-level confidence scores for each of the derived elements can then be ascertained. A first set of the multiple elements can have element-level confidence scores above an associated certainty threshold and a second set can have scores below it. Values can be stored in data fields mapped to the first set. A prompt for input for the second set can be played. Accordingly, data fields are partially filled in based upon the original speech utterance, and a second prompt is played for the unfilled fields. | 10-02-2008 |
20090044146 | ASSOCIATING FILE TYPES WITH WEB-BASED APPLICATIONS FOR AUTOMATICALLY LAUNCHING THE ASSOCIATED APPLICATION - The present invention discloses a launching engine configured to automatically launch a Web site and load an electronic document responsive to a launching event for the electronic document. The launching engine can be a component of a computer operating system (e.g., MAC OS, OS/2, WINDOWS XP, etc.) or a graphics management component (e.g., KDE, GNOME, etc.) of a computer. A launching event can be initiated by user selection of a document icon, a user selection of an electronic document from a file management application, a launching script for the electronic document triggered by a media insertion action, and the like. | 02-12-2009 |
20090199101 | SYSTEMS AND METHODS FOR INPUTTING GRAPHICAL DATA INTO A GRAPHICAL INPUT FIELD | 08-06-2009 |
20120046950 | RETRIEVAL AND PRESENTATION OF NETWORK SERVICE RESULTS FOR MOBILE DEVICE USING A MULTIMODAL BROWSER - A method of obtaining information using a mobile device can include receiving a request including speech data from the mobile device, and querying a network service using query information extracted from the speech data, whereby search results are received from the network service. The search results can be formatted for presentation on a display of the mobile device. The search results further can be sent, along with a voice grammar generated from the search results, to the mobile device. The mobile device then can render the search results. | 02-23-2012 |
20120166199 | HOSTED VOICE RECOGNITION SYSTEM FOR WIRELESS DEVICES - Methods, systems, and software for converting the audio input of a user of a hand-held client device or mobile phone into a textual representation by means of a backend server accessed by the device through a communications network. The text is then inserted into or used by an application of the client device to send a text message, instant message, email, or to insert a request into a web-based application or service. In one embodiment, the method includes the steps of initializing or launching the application on the device; recording and transmitting the recorded audio message from the client device to the backend server through a client-server communication protocol; converting the transmitted audio message into the textual representation in the backend server; and sending the converted text message back to the client device or forwarding it on to an alternate destination directly from the server. | 06-28-2012 |
20130158994 | RETRIEVAL AND PRESENTATION OF NETWORK SERVICE RESULTS FOR MOBILE DEVICE USING A MULTIMODAL BROWSER - A method of obtaining information using a mobile device can include receiving a request including speech data from the mobile device, and querying a network service using query information extracted from the speech data, whereby search results are received from the network service. The search results can be formatted for presentation on a display of the mobile device. The search results further can be sent, along with a voice grammar generated from the search results, to the mobile device. The mobile device then can render the search results. | 06-20-2013 |
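The partial form-filling logic summarized for application 20080243502 reduces to a threshold split over element-level confidence scores. The sketch below is a hypothetical illustration under stated assumptions: the recognizer has already produced an utterance-level score plus a value and confidence for each element, all scores lie in [0, 1], and an utterance that clears its own threshold is accepted whole (the abstract only describes the below-threshold case). The names `ElementResult` and `fill_form` and the 0.7 default thresholds are invented for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class ElementResult:
    """Hypothetical recognizer output for one form element."""
    field: str         # data field the element is mapped to
    value: str         # speech-to-text value derived for the element
    confidence: float  # element-level confidence score, assumed in [0, 1]


def fill_form(utterance_confidence: float,
              elements: List[ElementResult],
              utterance_threshold: float = 0.7,
              element_threshold: float = 0.7) -> Tuple[Dict[str, str], List[str]]:
    """Return (filled fields, fields that still need a follow-up prompt)."""
    # Assumption: a confident utterance fills every field at once.
    if utterance_confidence >= utterance_threshold:
        return {e.field: e.value for e in elements}, []

    filled: Dict[str, str] = {}
    reprompt: List[str] = []
    for e in elements:
        if e.confidence >= element_threshold:
            filled[e.field] = e.value  # first set: store the derived value
        else:
            reprompt.append(e.field)   # second set: ask for it again
    return filled, reprompt


results = [ElementResult("departure_city", "Boston", 0.92),
           ElementResult("arrival_city", "Austin", 0.41)]
filled, reprompt = fill_form(0.55, results)
print(filled, reprompt)  # {'departure_city': 'Boston'} ['arrival_city']
```

Returning the unfilled field names, rather than playing a prompt directly, keeps the sketch independent of any particular voice platform; a caller would use `reprompt` to build the second prompt.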