Patent application number | Description | Published |
20080221881 | Recognition of Speech in Editable Audio Streams - A speech processing system divides a spoken audio stream into partial audio streams, referred to as “snippets.” The system may divide a portion of the audio stream into two snippets at a position at which the speaker performed an editing operation, such as pausing and then resuming recording, or rewinding and then resuming recording. The snippets may be transmitted sequentially to a consumer, such as an automatic speech recognizer or a playback device, as the snippets are generated. The consumer may process (e.g., recognize or play back) the snippets as they are received. The consumer may modify its output in response to editing operations reflected in the snippets. The consumer may process the audio stream while it is being created and transmitted even if the audio stream includes editing operations that invalidate previously-transmitted partial audio streams, thereby enabling shorter turnaround time between dictation and consumption of the complete audio stream. | 09-11-2008 |
20090048833 | Automated Extraction of Semantic Content and Generation of a Structured Document from Speech - Techniques are disclosed for automatically generating structured documents based on speech, including identification of relevant concepts and their interpretation. In one embodiment, a structured document generator uses an integrated process to generate a structured textual document (such as a structured textual medical report) based on a spoken audio stream. The spoken audio stream may be recognized using a language model which includes a plurality of sub-models arranged in a hierarchical structure. Each of the sub-models may correspond to a concept that is expected to appear in the spoken audio stream. Different portions of the spoken audio stream may be recognized using different sub-models. The resulting structured textual document may have a hierarchical structure that corresponds to the hierarchical structure of the language sub-models that were used to generate the structured textual document. | 02-19-2009 |
20090048834 | Audio Signal De-Identification - Techniques are disclosed for automatically de-identifying spoken audio signals. In particular, techniques are disclosed for automatically removing personally identifying information from spoken audio signals and replacing such information with non-personally identifying information. De-identification of a spoken audio signal may be performed by automatically generating a report based on the spoken audio signal. The report may include concept content (e.g., text) corresponding to one or more concepts represented by the spoken audio signal. The report may also include timestamps indicating temporal positions of speech in the spoken audio signal that corresponds to the concept content. Concept content that represents personally identifying information is identified. Audio corresponding to the personally identifying concept content is removed from the spoken audio signal. The removed audio may be replaced with non-personally identifying audio. | 02-19-2009 |
20090132239 | Audio Signal De-Identification - Techniques are disclosed for automatically de-identifying spoken audio signals. In particular, techniques are disclosed for automatically removing personally identifying information from spoken audio signals and replacing such information with non-personally identifying information. De-identification of a spoken audio signal may be performed by automatically generating a report based on the spoken audio signal. The report may include concept content (e.g., text) corresponding to one or more concepts represented by the spoken audio signal. The report may also include timestamps indicating temporal positions of speech in the spoken audio signal that corresponds to the concept content. Concept content that represents personally identifying information is identified. Audio corresponding to the personally identifying concept content is removed from the spoken audio signal. The removed audio may be replaced with non-personally identifying audio. | 05-21-2009 |
20100057450 | Hybrid Speech Recognition - A hybrid speech recognition system uses a client-side speech recognition engine and a server-side speech recognition engine to produce speech recognition results for the same speech. An arbitration engine produces speech recognition output based on one or both of the client-side and server-side speech recognition results. | 03-04-2010 |
20100057451 | Distributed Speech Recognition Using One Way Communication - A speech recognition client sends a speech stream and control stream in parallel to a server-side speech recognizer over a network. The network may be an unreliable, low-latency network. The server-side speech recognizer recognizes the speech stream continuously. The speech recognition client receives recognition results from the server-side recognizer in response to requests from the client. The client may remotely reconfigure the state of the server-side recognizer during recognition. | 03-04-2010 |
20100076761 | Decoding-Time Prediction of Non-Verbalized Tokens - Non-verbalized tokens, such as punctuation, are automatically predicted and inserted into a transcription of speech in which the tokens were not explicitly verbalized. Token prediction may be integrated with speech decoding, rather than performed as a post-process to speech decoding. | 03-25-2010 |
20100299135 | Automated Extraction of Semantic Content and Generation of a Structured Document from Speech - Techniques are disclosed for automatically generating structured documents based on speech, including identification of relevant concepts and their interpretation. In one embodiment, a structured document generator uses an integrated process to generate a structured textual document (such as a structured textual medical report) based on a spoken audio stream. The spoken audio stream may be recognized using a language model which includes a plurality of sub-models arranged in a hierarchical structure. Each of the sub-models may correspond to a concept that is expected to appear in the spoken audio stream. Different portions of the spoken audio stream may be recognized using different sub-models. The resulting structured textual document may have a hierarchical structure that corresponds to the hierarchical structure of the language sub-models that were used to generate the structured textual document. | 11-25-2010 |
20100318347 | Content-Based Audio Playback Emphasis - Techniques are disclosed for facilitating the process of proofreading draft transcripts of spoken audio streams. In general, proofreading of a draft transcript is facilitated by playing back the corresponding spoken audio stream with an emphasis on those regions in the audio stream that are highly relevant or likely to have been transcribed incorrectly. Regions may be emphasized by, for example, playing them back more slowly than regions that are of low relevance and likely to have been transcribed correctly. Emphasizing those regions of the audio stream that are most important to transcribe correctly and those regions that are most likely to have been transcribed incorrectly increases the likelihood that the proofreader will accurately correct any errors in those regions, thereby improving the overall accuracy of the transcript. | 12-16-2010 |
20110238415 | Hybrid Speech Recognition - A hybrid speech recognition system uses a client-side speech recognition engine and a server-side speech recognition engine to produce speech recognition results for the same speech. An arbitration engine produces speech recognition output based on one or both of the client-side and server-side speech recognition results. | 09-29-2011 |
20110282687 | Clinical Data Reconciliation as Part of a Report Generation Solution - An automated system updates electronic medical records (EMRs) based on dictated reports, without requiring manual data entry into on-screen forms. A dictated report is transcribed by an automatic speech recognizer, and facts are extracted from the report and stored in encoded form. Information from a patient's report is also stored in encoded form. The resulting encoded information from the report and EMR are reconciled with each other, and changes to be made to the EMR are identified based on the reconciliation. The identified changes are made to the EMR automatically, without requiring manual data entry into the EMR. | 11-17-2011 |
20110288857 | Distributed Speech Recognition Using One Way Communication - A speech recognition client sends a speech stream and control stream in parallel to a server-side speech recognizer over a network. The network may be an unreliable, low-latency network. The server-side speech recognizer recognizes the speech stream continuously. The speech recognition client receives recognition results from the server-side recognizer in response to requests from the client. The client may remotely reconfigure the state of the server-side recognizer during recognition. | 11-24-2011 |
20110289405 | Monitoring User Interactions With A Document Editing System - A human editor uses a document editing system to edit a draft document. The editor's editing behavior is monitored and logged. Statistics are developed from the log to produce an assessment of the editor's productivity. This assessment, in combination with assessments of other editors, may be used to develop behavioral metrics which indicate correlations between editing behaviors and productivity. The behavioral metrics may be used to identify the relative contribution of different editing behaviors to efficient editing. Such information about individual editing behaviors may be used to evaluate the productivity of individual editors based on their editing behaviors, to identify behaviors which individual editors could adopt to improve their productivities, and to identify changes to the editing system itself for improving editor productivity. An editor's editing behavior may be “played back” and observed by a human in an attempt to identify the causes of the editor's poor productivity. | 11-24-2011 |
20120041950 | Providing Computable Guidance to Relevant Evidence in Question-Answering Systems - A computer-based system includes a computer-processable definition of a region in a data set. The system identifies a region of the data set based on the definition of the region. The system provides output to a user representing a question and the identified region of the data set. The system may also automatically generate an answer to the question based on the question and the data set, and provide output to the user representing the answer. The system may generate the answer based on a subset of the data set, and provide output to the user representing the subset of the data set. The user may provide feedback on the first answer to the system, which the system may use to improve subsequent answers to the same and other questions, and to disable the system's automatic question-answering function in response to disagreement between the user and the system. | 02-16-2012 |
20120078763 | User Feedback in Semi-Automatic Question Answering Systems - A system applies rules to a set of documents to generate codes, such as billing codes for use in medical billing. A human operator provides input specifying whether the generated codes are correct. Based on the input from the human operator, the system attempts to identify which clause(s) in the rules which were relied on to generate the particular code are correct and which such clause(s) are incorrect. The system then assigns praise to components of the system responsible for generating codes in the correct clauses, and assigns blame to components of the system responsible for generating codes in the incorrect clauses. Such blame and praise may then be used to determine whether particular code-generating components are insufficiently reliable. The system may disable, or take other remedial action in response to, insufficiently reliable code-generating components. | 03-29-2012 |
20120089629 | Structured Searching of Dynamic Structured Document Corpuses - A system includes a document corpus containing structured documents, which contain both text and annotations of the text. The system also includes a search engine which is adapted to perform structured searches of the structured documents. As new types of annotations are added to the system, the search engine is updated automatically to become capable of performing structured searches for the new types of annotations. For example, if a new natural language processing (NLP) component, adapted to generate annotations of a new type, is added to the system, then the system automatically updates a query language to include a definition of the new type of annotation. The search engine may then immediately be capable of processing structured queries which refer to the new type of annotation. | 04-12-2012 |
20120296639 | Verification of Extracted Data - Facts are extracted from speech and recorded in a document using codings. Each coding represents an extracted fact and includes a code and a datum. The code may represent a type of the extracted fact and the datum may represent a value of the extracted fact. The datum in a coding is rendered based on a specified feature of the coding. For example, the datum may be rendered as boldface text to indicate that the coding has been designated as an “allergy.” In this way, the specified feature of the coding (e.g., “allergy”-ness) is used to modify the manner in which the datum is rendered. A user inspects the rendering and provides, based on the rendering, an indication of whether the coding was accurately designated as having the specified feature. A record of the user's indication may be stored, such as within the coding itself. | 11-22-2012 |
20120296644 | Hybrid Speech Recognition - A hybrid speech recognition system uses a client-side speech recognition engine and a server-side speech recognition engine to produce speech recognition results for the same speech. An arbitration engine produces speech recognition output based on one or both of the client-side and server-side speech recognition results. | 11-22-2012 |
20120296645 | Distributed Speech Recognition Using One Way Communication - A speech recognition client sends a speech stream and control stream in parallel to a server-side speech recognizer over a network. The network may be an unreliable, low-latency network. The server-side speech recognizer recognizes the speech stream continuously. The speech recognition client receives recognition results from the server-side recognizer in response to requests from the client. The client may remotely reconfigure the state of the server-side recognizer during recognition. | 11-22-2012 |
20120303365 | Audio Signal De-Identification - Techniques are disclosed for automatically de-identifying spoken audio signals. In particular, techniques are disclosed for automatically removing personally identifying information from spoken audio signals and replacing such information with non-personally identifying information. De-identification of a spoken audio signal may be performed by automatically generating a report based on the spoken audio signal. The report may include concept content (e.g., text) corresponding to one or more concepts represented by the spoken audio signal. The report may also include timestamps indicating temporal positions of speech in the spoken audio signal that corresponds to the concept content. Concept content that represents personally identifying information is identified. Audio corresponding to the personally identifying concept content is removed from the spoken audio signal. The removed audio may be replaced with non-personally identifying audio. | 11-29-2012 |
20120316871 | Speech Recognition Using Loosely Coupled Components - An automatic speech recognition system includes an audio capture component, a speech recognition processing component, and a result processing component which are distributed among two or more logical devices and/or two or more physical devices. In particular, the audio capture component may be located on a different logical device and/or physical device from the result processing component. For example, the audio capture component may be on a computer connected to a microphone into which a user speaks, while the result processing component may be on a terminal server which receives speech recognition results from a speech recognition processing server. | 12-13-2012 |
20120323557 | Speech Recognition Using Context-Aware Recognition Models - Inputs provided into user interface elements of an application are observed. Records are made of the inputs and the state(s) the application was in while the inputs were provided. For each state, a corresponding language model is trained based on the input(s) provided to the application while the application was in that state. When the application is next observed to be in a previously-observed state, a language model associated with the application's current state is applied to recognize speech input provided by a user and thereby to generate speech recognition output that is provided to the application. An application's state at a particular time may include the user interface element(s) that are displayed and/or in focus at that time. | 12-20-2012 |
20120323572 | Document Extension in Dictation-Based Document Generation Workflow - An automatic speech recognizer is used to produce a structured document representing the contents of human speech. A best practice is applied to the structured document to produce a conclusion, such as a conclusion that required information is missing from the structured document. Content is inserted into the structured document based on the conclusion, thereby producing a modified document. The inserted content may be obtained by prompting a human user for the content and receiving input representing the content from the human user. | 12-20-2012 |
20120323598 | Using Alternative Sources of Evidence in Computer-Assisted Billing Coding - A computerized billing code generator reviews billing source data containing both admissible first data (such as physician's notes) and inadmissible second data (such as nurse's notes). The billing code generator determines whether to generate a request to review the first data based on both the first data and the second data. For example, the billing code generator may generate the request in response to determining that the second data contains information that is inconsistent with information contained in the first data. As another example, the billing code generator may generate the request in response to determining that the second data contains information that is not contained within the first data. | 12-20-2012 |
20130103400 | Document Transcription System Training - A system is provided for training an acoustic model for use in speech recognition. In particular, such a system may be used to perform training based on a spoken audio stream and a non-literal transcript of the spoken audio stream. Such a system may identify text in the non-literal transcript which represents concepts having multiple spoken forms. The system may attempt to identify the actual spoken form in the audio stream which produced the corresponding text in the non-literal transcript, and thereby produce a revised transcript which more accurately represents the spoken audio stream. The revised, and more accurate, transcript may be used to train the acoustic model, thereby producing a better acoustic model than that which would be produced using conventional techniques, which perform training based directly on the original non-literal transcript. | 04-25-2013 |
20130304453 | Automated Extraction of Semantic Content and Generation of a Structured Document from Speech - Techniques are disclosed for automatically generating structured documents based on speech, including identification of relevant concepts and their interpretation. In one embodiment, a structured document generator uses an integrated process to generate a structured textual document (such as a structured textual medical report) based on a spoken audio stream. The spoken audio stream may be recognized using a language model which includes a plurality of sub-models arranged in a hierarchical structure. Each of the sub-models may correspond to a concept that is expected to appear in the spoken audio stream. Different portions of the spoken audio stream may be recognized using different sub-models. The resulting structured textual document may have a hierarchical structure that corresponds to the hierarchical structure of the language sub-models that were used to generate the structured textual document. | 11-14-2013 |
20130346074 | Verification of Extracted Data - Facts are extracted from speech and recorded in a document using codings. Each coding represents an extracted fact and includes a code and a datum. The code may represent a type of the extracted fact and the datum may represent a value of the extracted fact. The datum in a coding is rendered based on a specified feature of the coding. For example, the datum may be rendered as boldface text to indicate that the coding has been designated as an “allergy.” In this way, the specified feature of the coding (e.g., “allergy”-ness) is used to modify the manner in which the datum is rendered. A user inspects the rendering and provides, based on the rendering, an indication of whether the coding was accurately designated as having the specified feature. A record of the user's indication may be stored, such as within the coding itself. | 12-26-2013 |
20140039880 | Applying Service Levels to Transcripts - Speech is transcribed to produce a draft transcript of the speech. Portions of the transcript having a high priority are identified. For example, particular sections of the transcript may be identified as high-priority sections. As another example, portions of the transcript requiring human verification may be identified as high-priority sections. High-priority portions of the transcript are verified at a first time, without verifying other portions of the transcript. Such other portions may or may not be verified at a later time. Limiting verification, either initially or entirely, to high-priority portions of the transcript limits the time required to perform such verification, thereby making it feasible to verify the most important portions of the transcript at an early stage without introducing an undue delay into the transcription process. Verifying the other portions of the transcript later ensures that early verification of the high-priority portions does not sacrifice overall verification accuracy. | 02-06-2014 |
20140047375 | Maintaining a Discrete Data Representation that Corresponds to Information Contained in Free-Form Text - A system includes a data record (such as an Electronic Medical Record (EMR)) and a user interface for modifying (e.g., storing data in) the data record. The data record includes both free-form text elements and discrete data elements. The user interface includes user interface elements for receiving free-form text data. In response to receiving free-form text data via the free-form text user interface elements, a suggested action is identified, such as a suggested action to take in connection with one of the discrete data elements of the data record. Output is generated representing the suggested action. A user provides input indicating acceptance or rejection of the suggested action. The suggested action may be performed in response to the user input. | 02-13-2014 |
20140149114 | Automatic Decision Support - Speech is transcribed to produce a transcript. At least some of the text in the transcript is encoded as data. These codings may be verified for accuracy and corrected if inaccurate. The resulting transcript is provided to a decision support system to perform functions such as checking for drug-drug, drug-allergy, and drug-procedure interactions, and checking against clinical performance measures (such as recommended treatments). Alerts and other information output by the decision support system are associated with the transcript. The transcript and associated decision support output are provided to a physician to assist the physician in reviewing the transcript and in taking any appropriate action in response to the transcript. | 05-29-2014 |
20140163974 | Distributed Speech Recognition Using One Way Communication - A speech recognition client sends a speech stream and control stream in parallel to a server-side speech recognizer over a network. The network may be an unreliable, low-latency network. The server-side speech recognizer recognizes the speech stream continuously. The speech recognition client receives recognition results from the server-side recognizer in response to requests from the client. The client may remotely reconfigure the state of the server-side recognizer during recognition. | 06-12-2014 |
20140164197 | User Feedback in Semi-Automatic Question Answering Systems - A system applies rules to a set of documents to generate codes, such as billing codes for use in medical billing. A human operator provides input specifying whether the generated codes are correct. Based on the input from the human operator, the system attempts to identify which clause(s) in the rules which were relied on to generate the particular code are correct and which such clause(s) are incorrect. The system then assigns praise to components of the system responsible for generating codes in the correct clauses, and assigns blame to components of the system responsible for generating codes in the incorrect clauses. Such blame and praise may then be used to determine whether particular code-generating components are insufficiently reliable. The system may disable, or take other remedial action in response to, insufficiently reliable code-generating components. | 06-12-2014 |
20140249818 | Document Transcription System Training - A system is provided for training an acoustic model for use in speech recognition. In particular, such a system may be used to perform training based on a spoken audio stream and a non-literal transcript of the spoken audio stream. Such a system may identify text in the non-literal transcript which represents concepts having multiple spoken forms. The system may attempt to identify the actual spoken form in the audio stream which produced the corresponding text in the non-literal transcript, and thereby produce a revised transcript which more accurately represents the spoken audio stream. The revised, and more accurate, transcript may be used to train the acoustic model, thereby producing a better acoustic model than that which would be produced using conventional techniques, which perform training based directly on the original non-literal transcript. | 09-04-2014 |
20140309995 | Content-Based Audio Playback Emphasis - Techniques are disclosed for facilitating the process of proofreading draft transcripts of spoken audio streams. In general, proofreading of a draft transcript is facilitated by playing back the corresponding spoken audio stream with an emphasis on those regions in the audio stream that are highly relevant or likely to have been transcribed incorrectly. Regions may be emphasized by, for example, playing them back more slowly than regions that are of low relevance and likely to have been transcribed correctly. Emphasizing those regions of the audio stream that are most important to transcribe correctly and those regions that are most likely to have been transcribed incorrectly increases the likelihood that the proofreader will accurately correct any errors in those regions, thereby improving the overall accuracy of the transcript. | 10-16-2014 |
20140316772 | Verification of Extracted Data - Facts are extracted from speech and recorded in a document using codings. Each coding represents an extracted fact and includes a code and a datum. The code may represent a type of the extracted fact and the datum may represent a value of the extracted fact. The datum in a coding is rendered based on a specified feature of the coding. For example, the datum may be rendered as boldface text to indicate that the coding has been designated as an “allergy.” In this way, the specified feature of the coding (e.g., “allergy”-ness) is used to modify the manner in which the datum is rendered. A user inspects the rendering and provides, based on the rendering, an indication of whether the coding was accurately designated as having the specified feature. A record of the user's indication may be stored, such as within the coding itself. | 10-23-2014 |
20140324423 | Document Extension in Dictation-Based Document Generation Workflow - An automatic speech recognizer is used to produce a structured document representing the contents of human speech. A best practice is applied to the structured document to produce a conclusion, such as a conclusion that required information is missing from the structured document. Content is inserted into the structured document based on the conclusion, thereby producing a modified document. The inserted content may be obtained by prompting a human user for the content and receiving input representing the content from the human user. | 10-30-2014 |
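Several of the filings above (e.g. 20100057450, 20110238415, and 20120296644, "Hybrid Speech Recognition") describe an arbitration engine that produces final output from client-side and/or server-side recognition results. The abstracts do not specify the arbitration policy; the sketch below is a minimal hypothetical illustration assuming confidence-scored results, with a "prefer the higher-confidence transcript, fall back to whichever engine responded" rule chosen purely for illustration. The `RecognitionResult` type and `arbitrate` function are assumptions, not names from the patents.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RecognitionResult:
    text: str          # recognized transcript
    confidence: float  # recognizer's confidence score in [0.0, 1.0]


def arbitrate(client: Optional[RecognitionResult],
              server: Optional[RecognitionResult]) -> Optional[str]:
    """Pick a final transcript from client- and/or server-side results.

    Falls back to whichever result is available; when both are present,
    the higher-confidence result wins (an illustrative policy only).
    """
    if client is None and server is None:
        return None          # neither engine produced output
    if client is None:
        return server.text   # only the server-side engine responded
    if server is None:
        return client.text   # e.g. network outage: use the local engine
    # Both engines responded: keep the higher-confidence hypothesis.
    return client.text if client.confidence >= server.confidence else server.text
```

For example, `arbitrate(RecognitionResult("call mom", 0.62), RecognitionResult("call tom", 0.91))` would return the server-side hypothesis `"call tom"`; a real arbitration engine could instead combine hypotheses word-by-word or weight the engines by past accuracy.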