Patent application number | Description | Published |
20080306737 | SYSTEMS AND METHODS FOR CLASSIFYING AND REPRESENTING GESTURAL INPUTS - Gesture and handwriting recognition agents provide possible interpretations of electronic ink. Recognition is performed on both individual strokes and combinations of strokes in the input ink lattice. The interpretations of electronic ink are classified and encoded as symbol complexes where symbols convey specific attributes of the contents of the stroke. The use of symbol complexes to represent strokes in the input ink lattice facilitates reference to sets of entities of a specific type. | 12-11-2008 |
20090063375 | SYSTEM AND METHOD FOR COMPILING RULES CREATED BY MACHINE LEARNING PROGRAM - A system, a method, and a machine-readable medium are provided. A group of linear rules and associated weights are provided as a result of machine learning. Each one of the group of linear rules is partitioned into a respective one of a group of types of rules. A respective transducer for each of the linear rules is compiled. A combined finite state transducer is created from a union of the respective transducers compiled from the linear rules. | 03-05-2009 |
20090076795 | System And Method Of Generating Responses To Text-Based Messages - In accordance with one aspect of the present invention, an automated method of and system for generating a response to a text-based natural language message is disclosed. The method includes identifying a sentence in the text-based natural language message. Also, identifying an input clause in the sentence. Further, comparing the input clause to a previously received clause, where the previously received clause is correlated with a previously generated response message. Additionally, generating an output response message based on the previously generated response message. The system includes means for performing the method steps. | 03-19-2009 |
20090119102 | SYSTEM AND METHOD OF EXPLOITING PROSODIC FEATURES FOR DIALOG ACT TAGGING IN A DISCRIMINATIVE MODELING FRAMEWORK - Disclosed are a system and method for exploiting information in an utterance for dialog act tagging. An exemplary method includes receiving a user utterance, computing at periodic intervals at least one parameter in the user utterance, quantizing the at least one parameter at each periodic interval, approximating conditional probabilities using an n-gram over a sliding window over the periodic intervals and tagging the utterance as a dialog act based on the approximated conditional probabilities. | 05-07-2009 |
20090157827 | System and method for generating response email templates - Disclosed is a method and system for generating a response email template from previously transmitted response emails for use in responding more efficiently to a client email. The response email template is generated by identifying text strings common to the transmitted response emails and creating the template from at least some of the text strings. | 06-18-2009 |
20090190899 | SYSTEM AND METHOD FOR DIGITAL VIDEO RETRIEVAL INVOLVING SPEECH RECOGNITIION - Disclosed are systems, methods, and computer readable media for retrieving digital images. The method embodiment includes converting a descriptive audio stream of a digital video that is provided for the visually impaired to text and then aligning that text to the appropriate segment of the digital video. The system then indexes the converted text from the descriptive audio stream with the text's relationship to the digital video. The system enables queries using action words describing a desired scene from a digital video. | 07-30-2009 |
20090192781 | SYSTEM AND METHOD OF PROVIDING MACHINE TRANSLATION FROM A SOURCE LANGUAGE TO A TARGET LANGUAGE - A machine translation method, system for using the method, and computer readable media are disclosed. The method includes the steps of receiving a source language sentence, selecting a set of target language n-grams using a lexical classifier and based on the source language sentence. When selecting the set of target language n-grams, in at least one n-gram, n is greater than 1. The method continues by combining the selected set of target language n-grams as a finite state acceptor (FSA), weighting the FSA with data from the lexical classifier, and generating an n-best list of target sentences from the FSA. As an alternate to using the FSA, N strings may be generated from the n-grams and ranked using a language model. The N strings may be represented by an FSA for efficiency but it is not necessary. | 07-30-2009 |
20090192838 | SYSTEM AND METHOD FOR OPTIMIZING RESPONSE HANDLING TIME AND CUSTOMER SATISFACTION SCORES - A system and method disclosed for using and updating a database of template responses for a live agent in response to user communications. The method includes computing an average string distance between each response from a live agent and a template, use to generate the response, modifying the computed average string distance based on a customer satisfaction score associated with each response and selecting a response that minimizes the computed average string distance and maximizes customer satisfaction. Upon receiving a further communication on a certain issue, the system presents a prototype response that has been added to the template database to the live agent for use in generating a response to the further communication that reduces handling time and increases customer satisfaction. | 07-30-2009 |
20090292529 | SYSTEM AND METHOD OF PROVIDING A SPOKEN DIALOG INTERFACE TO A WEBSITE - Disclosed is a system and method for training a spoken dialog service component from website data. Spoken dialog service components typically include an automatic speech recognition module, a language understanding module, a dialog management module, a language generation module and a text-to-speech module. The method includes converting data from a structured database associated with a website to a structured text data set and a structured task knowledge base, extracting linguistic items from the structured database, and training a spoken dialog service component using at least one of the structured text data, the structured task knowledge base, or the linguistic items. The system includes modules configured to implement the method. | 11-26-2009 |
20100082326 | SYSTEM AND METHOD FOR ENRICHING SPOKEN LANGUAGE TRANSLATION WITH PROSODIC INFORMATION - Disclosed herein are systems, methods, and computer readable-media for enriching spoken language translation with prosodic information in a statistical speech translation framework. The method includes receiving speech for translation to a target language, generating pitch accent labels representing segments of the received speech which are prosodically prominent, and injecting pitch accent labels with word tokens within the translation engine to create enriched target language output text. A further step may be added of synthesizing speech in the target language based on the prosody enriched target language output text. An automatic prosody labeler can generate pitch accent labels. An automatic prosody labeler can exploit lexical, syntactic, and prosodic information of the speech. A maximum entropy model may be used to determine which segments of the speech are prosodically prominent. A pitch accent label can include an indication of certainty that a respective segment of the speech is prosodically prominent and/or an indication of prosodic prominence of a respective segment of speech. | 04-01-2010 |
20100088160 | Automatic Learning for Mapping Spoken/Text Descriptions of Products onto Available Products - A method, processing device, and machine-readable medium are provided. Costs of states of a state space are calculated. Each state represent one or more available product attributes having zero or more decided attribute values. The calculating is based, at least in part, on training data associated with previously requested and offered products. Determining a next state such that one or more products are available and a sum of values, including a cost of a next state and a cost of a perturbation of one of the one or more requested product attribute values to reach the next state is a minimum value. A value for a product attribute is mapped according to the minimum sum of values and product attribute values of available products. | 04-08-2010 |
20100098224 | Method and Apparatus for Automatically Building Conversational Systems - A system and method provides a natural language interface to world-wide web content. Either in advance or dynamically, webpage content is parsed using a parsing algorithm. A person using a telephone interface can provide speech information, which is converted to text and used to automatically fill in input fields on a webpage form. The form is then submitted to a database search and a response is generated. Information contained on the responsive webpage is extracted and converted to speech via a text-to-speech engine and communicated to the person. | 04-22-2010 |
20100100509 | Systems and Methods for Generating Markup-Language Based Expressions from Multi-Modal and Unimodal Inputs - When using finite-state devices to perform various functions, it is beneficial to use finite state devices representing regular grammars with terminals having markup-language-based semantics. By using markup-language-based symbols in the finite state devices, it is possible to generate valid markup-language expressions by concatenating the symbols representing the result of the performed function. The markup-language expression can be used by other applications and/or devices. Finite-state devices are used to convert strings of words and gestures into valid markup-language, for example, XML, expressions that can be used, for example, to provide an application program interface to underlying system applications. | 04-22-2010 |
20100100515 | TEXT EDIT TRACKER - A method for monitoring edits to a template for responding to an incoming communication includes categorizing the incoming communication into a category associated with the template for a response to the incoming communication. The method also includes determining distances between the template and each of a set of responses based on the template, at a predetermined level of granularity. The method also includes coding the template in accordance with the determined distances and displaying the coded template. A method for extracting a new template based on responses to an existing template includes selecting factors that affect quantitative measures for preparing a response to the incoming communication. The method includes using a mathematical model of the factors to cluster a set of responses created based on the existing template into two clusters. The method further includes restricting a first cluster centroid to be the existing template and searching for a second cluster centroid for a second cluster. | 04-22-2010 |
20100131260 | SYSTEM AND METHOD FOR ENRICHING SPOKEN LANGUAGE TRANSLATION WITH DIALOG ACTS - Disclosed herein are systems, computer-implemented methods, and tangible computer-readable media for enriching spoken language translation with dialog acts. The method includes receiving a source speech signal, tagging dialog acts associated with the received source speech signal using a classification model, dialog acts being domain independent descriptions of an intended action a speaker carries out by uttering the source speech signal, producing an enriched hypothesis of the source speech signal incorporating the dialog act tags, and outputting a natural language response of the enriched hypothesis in a target language. Tags can be grouped into sets such as statement, acknowledgement, abandoned, agreement, question, appreciation, and other. The step of producing an enriched translation of the source speech signal uses a dialog act specific translation model containing a phrase translation table. | 05-27-2010 |
20100131274 | SYSTEM AND METHOD FOR DIALOG MODELING - Disclosed herein are systems, computer-implemented methods, and computer-readable media for dialog modeling. The method includes receiving spoken dialogs annotated to indicate dialog acts and task/subtask information, parsing the spoken dialogs with a hierarchical, parse-based dialog model which operates incrementally from left to right and which only analyzes a preceding dialog context to generate parsed spoken dialogs, and constructing a functional task structure of the parsed spoken dialogs. The method can further either interpret user utterances with the functional task structure of the parsed spoken dialogs or plan system responses to user utterances with the functional task structure of the parsed spoken dialogs. The parse-based dialog model can be a shift-reduce model, a start-complete model, or a connection path model. | 05-27-2010 |
20100153105 | SYSTEM AND METHOD FOR REFERRING TO ENTITIES IN A DISCOURSE DOMAIN - Disclosed herein are systems, computer-implemented methods, and tangible computer-readable media for referring to entities. The method includes receiving domain-specific training data of sentences describing a target entity in a context, extracting a speaker history and a visual context from the training data, selecting attributes of the target entity based on at least one of the speaker history, the visual context, and speaker preferences, generating a text expression referring to the target entity based on at least one of the selected attributes, the speaker history, and the context, and outputting the generated text expression. The weighted finite-state automaton can represent partial orderings of word pairs in the domain-specific training data. The weighted finite-state automaton can be speaker specific or speaker independent. The weighted finite-state automaton can include a set of weighted partial orderings of the training data for each possible realization. | 06-17-2010 |
20100169244 | METHOD AND APPARATUS FOR USING A DISCRIMINATIVE CLASSIFIER FOR PROCESSING A QUERY - A method and apparatus for using a classifier for processing a query are disclosed. For example, the method receives a query from a user, and processes the query to locate one or more documents in accordance with a search engine having a discriminative classifier, wherein the discriminative classifier is trained with a plurality of artificial query examples. The method then presents a result of the processing to the user. | 07-01-2010 |
20100217580 | On-Demand Language Translation for Television Programs - A method, a system and a machine-readable medium are provided for an on demand translation service. A translation module including at least one language pair module for translating a source language to a target language may be made available for use by a subscriber. The subscriber may be charged a fee for use of the requested on demand translation service or may be provided use of the on demand translation service for free in exchange for displaying commercial messages to the subscriber. A video signal may be received including information in the source language, which may be obtained as text from the video signal and may be translated from the source language to the target language by use of the translation module. Translated information, based on the translated text, may be added into the received video signal. The video signal including the translated information in the target language may be sent to a display device. | 08-26-2010 |
20100281435 | SYSTEM AND METHOD FOR MULTIMODAL INTERACTION USING ROBUST GESTURE PROCESSING - Disclosed herein are systems, computer-implemented methods, and tangible computer-readable media for multimodal interaction. The method includes receiving a plurality of multimodal inputs associated with a query, the plurality of multimodal inputs including at least one gesture input, editing the at least one gesture input with a gesture edit machine. The method further includes responding to the query based on the edited gesture input and remaining multimodal inputs. The gesture inputs can be from a stylus, finger, mouse, and other pointing/gesture device. The gesture input can be unexpected or errorful. The gesture edit machine can perform actions such as deletion, substitution, insertion, and aggregation. The gesture edit machine can be modeled as a finite-state transducer. In one aspect, the method further includes generating a lattice for each input, generating an integrated lattice of combined meaning of the generated lattices, and responding to the query further based on the integrated lattice. | 11-04-2010 |
20110022379 | On-Demand Language Translation for Television Programs - In an embodiment, a method of providing an on demand translation service is provided. A subscriber may be charged a reduced fee or no fee for use of the on demand translation service in exchange for displaying commercial messages to the subscriber, the commercial messages being selected based on subscriber information. A multimedia signal including information in a source language may be received. The information may be obtained as text in the source language from the multimedia signal. The text may be translated from the source language to a target language. Translated information, based on the translated text, may be transmitted to a processing device for presentation to the subscriber. The received multimedia signal may be sent to a multimedia device for viewing. | 01-27-2011 |
20110078613 | Dynamic Generation of Soft Keyboards for Mobile Devices - Devices and methods are disclosed which relate to improving the efficiency of text input by dynamically generating a visually assistive virtual keyboard. Exemplary variations display a soft keyboard on a touchscreen of a text-entry device. The touchscreen works with the soft keyboard as a form of text input. Keyboard logic on the text-entry device is programmed to change the visual appearance of each key within the soft keyboard based on the prior entry. The keyboard logic assigns a prediction value to each key based on a statistical probability that the key will be entered next. The touchscreen displays a visually enhanced keyboard based on these prediction values. Enhancements include resizing keys relative to their prediction value, rearranging the keys a distance from the previous key entered inverse to its prediction value, etc. | 03-31-2011 |
20110099013 | SYSTEM AND METHOD FOR IMPROVING SPEECH RECOGNITION ACCURACY USING TEXTUAL CONTEXT - Disclosed herein are systems, methods, and computer-readable storage media for improving speech recognition accuracy using textual context. The method includes retrieving a recorded utterance, capturing text from a device display associated with the spoken dialog and viewed by one party to the recorded utterance, and identifying words in the captured text that are relevant to the recorded utterance. The method further includes adding the identified words to a dynamic language model, and recognizing the recorded utterance using the dynamic language model. The recorded utterance can be a spoken dialog. A time stamp can be assigned to each identified word. The method can include adding identified words to and/or removing identified words from the dynamic language model based on their respective time stamps. A screen scraper can capture text from the device display associated with the recorded utterance. The device display can contain customer service data. | 04-28-2011 |
20110144995 | SYSTEM AND METHOD FOR TIGHTLY COUPLING AUTOMATIC SPEECH RECOGNITION AND SEARCH - Disclosed herein are systems, methods, and computer-readable storage media for performing a search. A system configured to practice the method first receives from an automatic speech recognition (ASR) system a word lattice based on speech query and receives indexed documents from an information repository. The system composes, based on the word lattice and the indexed documents, at least one triple including a query word, selected indexed document, and weight. The system generates an N-best path through the word lattice based on the at least one triple and re-ranks ASR output based on the N-best path. The system aggregates each weight across the query words to generate N-best listings and returns search results to the speech query based on the re-ranked ASR output and the N-best listings. The lattice can be a confusion network, the arc density of which can be adjusted for a desired performance level. | 06-16-2011 |
20110145224 | SYSTEM AND METHOD FOR SPEECH-BASED INCREMENTAL SEARCH - Disclosed herein are systems, methods, and computer-readable storage media for receiving a user's spoken search query that the system will incrementally recognize and identify search terms. After the query has been incrementally recognized, the system will use the search terms to retrieve a portion of the search results that are based on usable identified search terms. As the results are found, the system will then output at least part at least part of the retrieved portion of search results on the display prior to the user concluding his or her search query. | 06-16-2011 |
20110184973 | METHOD AND APPARATUS FOR DETECTING AND EXTRACTING INFORMATION FROM DYNAMICALLY GENERATED WEB PAGES - A method and apparatus for automatically detecting and extracting information from dynamically generated web pages are disclosed. For example, the present method stores user provided information that is entered into a form interlace of a web page for a first query. Responsive to the first query, a first response web page is received and stored. The present method then automatically generates a second query to acquire a second response web page that is responsive to the second query. Finally, the present method compares the first response web page and the second response web page. In one embodiment, the present invention extracts information that is dissimilar between the first response web page and the second response web page. This extracted information is deemed to be the pertinent information requested by the user. | 07-28-2011 |
20110258531 | Method and Apparatus for Building Sales Tools by Mining Data from Websites - A website mining tool is disclosed that extracts information from, for example, a company's website and presents the extracted information in a graphical user interface (GUI). In one embodiment, web pages from a website are stored in, for example, computer memory and a structure of the web pages is identified. A plurality of blocks of information is then extracted as a function of this structure and a category is assigned to each block of information. The elements in the blocks of information are then displayed, for example to a salesperson, as a function of these categories. In another embodiment, Document Object Modeling parsing is used to identify the structure of the web pages. In yet another embodiment, a support vector machine is used to categorize each block of information. | 10-20-2011 |
20110321098 | System and Method for Automatic Identification of Key Phrases during a Multimedia Broadcast - An Internet Protocol television system includes a user profile agent, a keyword detection agent, and an information search agent. The user profile agent is in communication with a multimedia device, and generates a user profile based on information received from the multimedia device. The keyword detection agent is in communication with the user profile agent, and searches text associated with a multimedia video stream transmitted to the multimedia device for keywords associated with the user profile. The information search agent is in communication with the keyword detection agent, and connects to an information source associated with the keywords detected by the keyword detection agent, and provides additional information associated with the keywords to the multimedia device. | 12-29-2011 |
20120065963 | System And Method Of Generating Responses To Text-Based Messages - In accordance with one aspect of the present invention, an automated method of and system for generating a response to a text-based natural language message is disclosed. The method includes identifying a first selected input clause in a sentence in the text-based natural language message. Also, assigning a semantic tag to the first selected input clause and matching the semantic tag to a historical input tag. The historical input tag associated with a first previously generated response clause. Further; generating an output response message based on the historical response clause, the output response message derived from the historical input tag and a second previously generated response clause. The system includes means for performing the method steps. | 03-15-2012 |
20120072217 | SYSTEM AND METHOD FOR USING PROSODY FOR VOICE-ENABLED SEARCH - Disclosed herein are systems, methods, and non-transitory computer-readable storage media for approximating relevant responses to a user query with voice-enabled search. A system practicing the method receives a word lattice generated by an automatic speech recognizer based on a user speech and a prosodic analysis of the user speech, generates a reweighted word lattice based on the word lattice and the prosodic analysis, approximates based on the reweighted word lattice one or more relevant responses to the query, and presents to a user the responses to the query. The prosodic analysis examines metalinguistic information of the user speech and can identify the most salient subject matter of the speech, assess how confident a speaker is in the content of his or her speech, and identify the attitude, mood, emotion, sentiment, etc. of the speaker. Other information not described in the content of the speech can also be used. | 03-22-2012 |
20120072219 | SYSTEM AND METHOD FOR ENHANCING VOICE-ENABLED SEARCH BASED ON AUTOMATED DEMOGRAPHIC IDENTIFICATION - Disclosed herein are systems, methods, and non-transitory computer-readable storage media for approximating responses to a user speech query in voice-enabled search based on metadata that include demographic features of the speaker. A system practicing the method recognizes received speech from a speaker to generate recognized speech, identifies metadata about the speaker from the received speech, and feeds the recognized speech and the metadata to a question-answering engine. Identifying the metadata about the speaker is based on voice characteristics of the received speech. The demographic features can include age, gender, socio-economic group, nationality, and/or region. The metadata identified about the speaker from the received speech can be combined with or override self-reported speaker demographic information. | 03-22-2012 |
20120084086 | SYSTEM AND METHOD FOR OPEN SPEECH RECOGNITION - Disclosed herein are systems, methods and non-transitory computer-readable media for performing speech recognition across different applications or environments without model customization or prior knowledge of the domain of the received speech. The disclosure includes recognizing received speech with a collection of domain-specific speech recognizers, determining a speech recognition confidence for each of the speech recognition outputs, selecting speech recognition candidates based on a respective speech recognition confidence for each speech recognition output, and combining selected speech recognition candidates to generate text based on the combination. | 04-05-2012 |
20120089605 | USER PROFILE AND ITS LOCATION IN A CLUSTERED PROFILE LANDSCAPE - Delivering targeted content includes collecting, via at least one tangible processor, user activity data for users during a specified time period. questions asked by the users during the specified time period are extracted from the user activity data, via the at least one tangible processor, and stored in user profiles for the users. The user profiles are clustered, via the at least one tangible processor, based on the questions asked. Targeted content is delivered, via the at least one tangible processor, to a subset of the users based on the clustering. | 04-12-2012 |
20120095861 | PERSONAL CUSTOMER CARE AGENT - Aggregating information includes configuring, by at least one processor, a user profile that indicates user preferences for aggregated information. The at least one processor monitors information sources including the World Wide Web, business websites of interest, and online social media, based on the user preferences. Data obtained from the information sources is presented, based on the monitoring, by the at least one processor, in accordance with a presentation format, as the aggregated information, based on the user preferences. The at least one processor triggers updating of the presented aggregated information based on a change to the data at least one of the information sources and a change to the user profile. | 04-19-2012 |
20120150531 | SYSTEM AND METHOD FOR LEARNING LATENT REPRESENTATIONS FOR NATURAL LANGUAGE TASKS - Disclosed herein are systems, methods, and non-transitory computer-readable storage media for learning latent representations for natural language tasks. A system configured to practice the method analyzes, for a first natural language processing task, a first natural language corpus to generate a latent representation for words in the first corpus. Then the system analyzes, for a second natural language processing task, a second natural language corpus having a target word, and predicts a label for the target word based on the latent representation. In one variation, the target word is one or more word such as a rare word and/or a word not encountered in the first natural language corpus. The system can optionally assigning the label to the target word. The system can operate according to a connectionist model that includes a learnable linear mapping that maps each word in the first corpus to a low dimensional latent space. | 06-14-2012 |
20120221331 | Method and Apparatus for Automatically Building Conversational Systems - A system and method provides a natural language interface to world-wide web content. Either in advance or dynamically, webpage content is parsed using a parsing algorithm. A person using a telephone interface can provide speech information, which is converted to text and used to automatically fill in input fields on a webpage form. The form is then submitted to a database search and a response is generated. Information contained on the responsive webpage is extracted and converted to speech via a text-to-speech engine and communicated to the person. | 08-30-2012 |
20120221332 | SYSTEM AND METHOD FOR REFERRING TO ENTITIES IN A DISCOURSE DOMAIN - Systems, methods, and non-transitory computer-readable media for referring to entities. The method includes receiving domain-specific training data of sentences describing a target entity in a context, extracting a speaker history and a visual context from the training data, selecting attributes of the target entity based on at least one of the speaker history, the visual context, and speaker preferences, generating a text expression referring to the target entity based on at least one of the selected attributes, the speaker history, and the context, and outputting the generated text expression. The weighted finite-state automaton can represent partial orderings of word pairs in the domain-specific training data. The weighted finite-state automaton can be speaker specific or speaker independent. The weighted finite-state automaton can include a set of weighted partial orderings of the training data for each possible realization. | 08-30-2012 |
20120232885 | SYSTEM AND METHOD FOR BUILDING DIVERSE LANGUAGE MODELS - Disclosed herein are systems, methods, and non-transitory computer-readable storage media for collecting web data in order to create diverse language models. A system configured to practice the method first crawls, such as via a crawler operating on a computing device, a set of documents in a network of interconnected devices according to a visitation policy, wherein the visitation policy is configured to focus on novelty regions for a current language model built from previous crawling cycles by crawling documents whose vocabulary considered likely to fill gaps in the current language model. A language model from a previous cycle can be used to guide the creation of a language model in the following cycle. The novelty regions can include documents with high perplexity values over the current language model. | 09-13-2012 |
20120239383 | SYSTEM AND METHOD OF SPOKEN LANGUAGE UNDERSTANDING IN HUMAN COMPUTER DIALOGS - A system and method are disclosed that improve automatic speech recognition in a spoken dialog system. The method comprises partitioning speech recognizer output into self-contained clauses, identifying a dialog act in each of the self-contained clauses, qualifying dialog acts by identifying a current domain object and/or a current domain action, and determining whether further qualification is possible for the current domain object and/or current domain action. If further qualification is possible, then the method comprises identifying another domain action and/or another domain object associated with the current domain object and/or current domain action, reassigning the another domain action and/or another domain object as the current domain action and/or current domain object and then recursively qualifying the new current domain action and/or current object. This process continues until nothing is left to qualify. | 09-20-2012 |
20120253799 | SYSTEM AND METHOD FOR RAPID CUSTOMIZATION OF SPEECH RECOGNITION MODELS - Disclosed herein are systems, methods, and non-transitory computer-readable storage media for generating domain-specific speech recognition models for a domain of interest by combining and tuning existing speech recognition models when a speech recognizer does not have access to a speech recognition model for that domain of interest and when available domain-specific data is below a minimum desired threshold to create a new domain-specific speech recognition model. A system configured to practice the method identifies a speech recognition domain and combines a set of speech recognition models, each speech recognition model of the set of speech recognition models being from a respective speech recognition domain. The system receives an amount of data specific to the speech recognition domain, wherein the amount of data is less than a minimum threshold to create a new domain-specific model, and tunes the combined speech recognition model for the speech recognition domain based on the data. | 10-04-2012 |
20120271898 | SYSTEM AND METHOD FOR OPTIMIZING RESPONSE HANDLING TIME AND CUSTOMER SATISFACTION SCORES - A system and method disclosed for using and updating a database of template responses for a live agent in response to user communications. The method includes computing an average string distance between each response from a live agent and a template, use to generate the response, modifying the computed average string distance based on a customer satisfaction score associated with each response and selecting a response that minimizes the computed average string distance and maximizes customer satisfaction. Upon receiving a further communication on a certain issue, the system presents a prototype response that has been added to the template database to the live agent for use in generating a response to the further communication that reduces handling time and increases customer satisfaction. | 10-25-2012 |
20120316866 | SYSTEM AND METHOD OF PROVIDING A SPOKEN DIALOG INTERFACE TO A WEBSITE - Disclosed is a system and method for training a spoken dialog service component from website data. Spoken dialog service components typically include an automatic speech recognition module, a language understanding module, a dialog management module, a language generation module and a text-to-speech module. The method includes converting data from a structured database associated with a website to a structured text data set and a structured task knowledge base, extracting linguistic items from the structured database, and training a spoken dialog service component using at least one of the structured text data, the structured task knowledge base, or the linguistic items. The system includes modules configured to implement the method. | 12-13-2012 |
20130030788 | SYSTEM AND METHOD FOR LOCATING BILINGUAL WEB SITES - Disclosed herein are systems, methods, and non-transitory computer-readable storage media for bootstrapping a language translation system. A system configured to practice the method performs a bidirectional web crawl to identify a bilingual website. The system analyzes data on the bilingual website to make a classification decision about whether the root of the bilingual website is an entry point for the bilingual website. The bilingual site can contain pairs of parallel pages. Each pair can include a first website in a first language and a second website in a second language, and a first portion of the first web page corresponds to a second portion of the second web page. Then the system analyzes the first and second web pages to identify corresponding information pairs in the first and second languages, and extracts the corresponding information pairs from the first and second web pages for use in a language translation model. | 01-31-2013 |
20130035932 | SYSTEM AND METHOD OF GENERATING RESPONSES TO TEXT-BASED MESSAGES - A system to generate a response to a text-based natural language message includes a user interface, processing device, and a computer-readable storage medium storing executable instructions to generate the response to the text-based natural language message. The instructions and a method for generating the response include identifying a sentence in the text-based natural language message, identifying an input clause in the sentence, and parsing the input clause, thereby defining a relationship between words in the input clause. The instructions and method also include assigning a semantic tag to the parsed input clause, comparing the input clause to a previously received clause, the previously received clause being correlated with a previously generated response clause, and generating an output response message derived from the previously generated response clause. | 02-07-2013 |
20130066632 | SYSTEM AND METHOD FOR ENRICHING TEXT-TO-SPEECH SYNTHESIS WITH AUTOMATIC DIALOG ACT TAGS - Disclosed herein are systems, methods, and non-transitory computer-readable storage media for modifying the prosody of synthesized speech based on an associated speech act. A system configured according to the method embodiment (1) receives text, (2) performs an analysis of the text to determine and assign a speech act label to the text, and (3) converts the text to speech, where the speech prosody is based on the speech act label. The analysis performed compares the text to a corpus of previously tagged utterances to find a close match, determines a confidence score from a correlation of the text and the close match, and, if the confidence score is above a threshold value, retrieving the speech act label of the close match and assigning it to the text. | 03-14-2013 |
20130097264 | SYSTEM AND METHOD FOR OPTIMIZING RESPONSE HANDLING TIME AND CUSTOMER SATISFACTION SCORES - A system and method disclosed for using and updating a database of template responses for a live agent in response to user communications. The method includes computing an average string distance between each response from a live agent and a template, use to generate the response, modifying the computed average string distance based on a customer satisfaction score associated with each response and selecting a response that minimizes the computed average string distance and maximizes customer satisfaction. Upon receiving a further communication on a certain issue, the system presents a prototype response that has been added to the template database to the live agent for use in generating a response to the further communication that reduces handling time and increases customer satisfaction. | 04-18-2013 |
20130144594 | SYSTEM AND METHOD FOR COLLABORATIVE LANGUAGE TRANSLATION - Disclosed herein are systems, methods, and non-transitory computer-readable storage media for presenting a machine translation and alternative translations to a user, where a selection of any particular alternative translation results in the re-ranking of the remaining alternatives. The system then presents these re-ranked alternatives to the user, who can continue proofing the machine translation using the re-ranked alternatives or by typing an improved translation. This process continues until the user indicates that the current portion of the translation is complete, at which point the system moves to the next portion. | 06-06-2013 |
20130144616 | SYSTEM AND METHOD FOR MACHINE-MEDIATED HUMAN-HUMAN CONVERSATION - Disclosed herein are systems, methods, and non-transitory computer-readable storage media for processing speech. A system configured to practice the method monitors user utterances to generate a conversation context. Then the system receives a current user utterance independent of non-natural language input intended to trigger speech processing. The system compares the current user utterance to the conversation context to generate a context similarity score, and if the context similarity score is above a threshold, incorporates the current user utterance into the conversation context. If the context similarity score is below the threshold, the system discards the current user utterance. The system can compare the current user utterance to the conversation context based on an n-gram distribution, a perplexity score, and a perplexity threshold. Alternately, the system can use a task model to compare the current user utterance to the conversation context. | 06-06-2013 |
20130151232 | SYSTEM AND METHOD FOR ENRICHING SPOKEN LANGUAGE TRANSLATION WITH DIALOG ACTS - Disclosed herein are systems, computer-implemented methods, and tangible computer-readable media for enriching spoken language translation with dialog acts. The method includes receiving a source speech signal, tagging dialog acts associated with the received source speech signal using a classification model, dialog acts being domain independent descriptions of an intended action a speaker carries out by uttering the source speech signal, producing an enriched hypothesis of the source speech signal incorporating the dialog act tags, and outputting a natural language response of the enriched hypothesis in a target language. Tags can be grouped into sets such as statement, acknowledgement, abandoned, agreement, question, appreciation, and other. The step of producing an enriched translation of the source speech signal uses a dialog act specific translation model containing a phrase translation table. | 06-13-2013 |
20130159828 | Method and Apparatus for Building Sales Tools by Mining Data from Websites - A website mining tool is disclosed that extracts information from, for example, a company's website and presents the extracted information in a graphical user interface (GUI). In one embodiment, web pages from a website are stored in, for example, computer memory and a structure of the web pages is identified. A plurality of blocks of information is then extracted as a function of this structure and a category is assigned to each block of information. The elements in the blocks of information are then displayed, for example to a salesperson, as a function of these categories. In another embodiment, Document Object Modeling parsing is used to identify the structure of the web pages. In yet another embodiment, a support vector machine is used to categorize each block of information. | 06-20-2013 |
20130166284 | System and Method of Spoken Language Understanding in Human Computer Dialogs - A system and method are disclosed that improve automatic speech recognition in a spoken dialog system. The method comprises partitioning speech recognizer output into self-contained clauses, identifying a dialog act in each of the self-contained clauses, qualifying dialog acts by identifying a current domain object and/or a current domain action, and determining whether further qualification is possible for the current domain object and/or current domain action. If further qualification is possible, then the method comprises identifying another domain action and/or another domain object associated with the current domain object and/or current domain action, reassigning the another domain action and/or another domain object as the current domain action and/or current domain object and then recursively qualifying the new current domain action and/or current object. This process continues until nothing is left to qualify. | 06-27-2013 |
20130218561 | System and Method for Enhancing Voice-Enabled Search Based on Automated Demographic Identification - Disclosed herein are systems, methods, and non-transitory computer-readable storage media for approximating responses to a user speech query in voice-enabled search based on metadata that include demographic features of the speaker. A system practicing the method recognizes received speech from a speaker to generate recognized speech, identifies metadata about the speaker from the received speech, and feeds the recognized speech and the metadata to a question-answering engine. Identifying the metadata about the speaker is based on voice characteristics of the received speech. The demographic features can include age, gender, socio-economic group, nationality, and/or region. The metadata identified about the speaker from the received speech can be combined with or override self-reported speaker demographic information. | 08-22-2013 |
20130238320 | SYSTEMS AND METHODS FOR GENERATING MARKUP-LANGUAGE BASED EXPRESSIONS FROM MULTI-MODAL AND UNIMODAL INPUTS - When using finite-state devices to perform various functions, it is beneficial to use finite state devices representing regular grammars with terminals having markup-language-based semantics. By using markup-language-based symbols in the finite state devices, it is possible to generate valid markup-language expressions by concatenating the symbols representing the result of the performed function. The markup-language expression can be used by other applications and/or devices. Finite-state devices are used to convert strings of words and gestures into valid markup-language, for example, XML, expressions that can be used, for example, to provide an application program interface to underlying system applications. | 09-12-2013 |
20130246069 | System and Method of Providing a Spoken Dialog Interface to a Website - Disclosed is a method for training a spoken dialog service component from website data. Spoken dialog service components typically include an automatic speech recognition module, a language understanding module, a dialog management module, a language generation module and a text-to-speech module. The method includes selecting anchor texts within a website based on a term density, weighting those anchor texts based on a percent of salient words to total words, and incorporating the weighted anchor texts into a live spoken dialog interface, the weights determining a level of incorporation into the live spoken dialog interface. | 09-19-2013 |
20130275132 | Method and Apparatus for Automatically Building Conversational Systems - A system and method provides a natural language interface to world-wide web content. Either in advance or dynamically, webpage content is parsed using a parsing algorithm. A person using a telephone interface can provide speech information, which is converted to text and used to automatically fill in input fields on a webpage form. The form is then submitted to a database search and a response is generated. Information contained on the responsive webpage is extracted and converted to speech via a text-to-speech engine and communicated to the person. | 10-17-2013 |
20130300845 | System and Method for Digital Video Retrieval Involving Speech Recognition - Disclosed are systems, methods, and computer readable media for retrieving digital images. The method embodiment includes converting a descriptive audio stream of a digital video that is provided for the visually impaired to text and then aligning that text to the appropriate segment of the digital video. The system then indexes the converted text from the descriptive audio stream with the text's relationship to the digital video. The system enables queries using action words describing a desired scene from a digital video. | 11-14-2013 |
20130311170 | Methods and Systems for Natural Language Understanding Using Human Knowledge and Collected Data - Disclosed herein are systems and methods to incorporate human knowledge when developing and using statistical models for natural language understanding. The disclosed systems and methods embrace a data-driven approach to natural language understanding which progresses seamlessly along the continuum of availability of annotated collected data, from when there is no available annotated collected data to when there is any amount of annotated collected data. | 11-21-2013 |
20140053171 | On-Demand Language Translation for Television Programs - A method, a system and a machine-readable medium are provided for an on demand translation service. A translation module including at least one language pair module for translating a source language to a target language may be made available for use by a subscriber. The subscriber may be charged a fee for use of the requested on demand translation service or may be provided use of the on demand translation service for free in exchange for displaying commercial messages to the subscriber. A video signal may be received including information in the source language, which may be obtained as text from the video signal and may be translated from the source language to the target language by use of the translation module. Translated information, based on the translated text, may be added into the received video signal. | 02-20-2014 |
20140074477 | System and Method of Spoken Language Understanding in Human Computer Dialogs - A system and method are disclosed that improve automatic speech recognition in a spoken dialog system. The method comprises partitioning speech recognizer output into self-contained clauses, identifying a dialog act in each of the self-contained clauses, qualifying dialog acts by identifying a current domain object and/or a current domain action, and determining whether further qualification is possible for the current domain object and/or current domain action. If further qualification is possible, then the method comprises identifying another domain action and/or another domain object associated with the current domain object and/or current domain action, reassigning the another domain action and/or another domain object as the current domain action and/or current domain object and then recursively qualifying the new current domain action and/or current object. This process continues until nothing is left to qualify. | 03-13-2014 |
20140188480 | SYSTEM AND METHOD FOR GENERATING CUSTOMIZED TEXT-TO-SPEECH VOICES - A system and method are disclosed for generating customized text-to-speech voices for a particular application. The method comprises generating a custom text-to-speech voice by selecting a voice for generating a custom text-to-speech voice associated with a domain, collecting text data associated with the domain from a pre-existing text data source and using the collected text data, generating an in-domain inventory of synthesis speech units by selecting speech units appropriate to the domain via a search of a pre-existing inventory of synthesis speech units, or by recording the minimal inventory for a selected level of synthesis quality. The text-to-speech custom voice for the domain is generated utilizing the in-domain inventory of synthesis speech units. Active learning techniques may also be employed to identify problem phrases wherein only a few minutes of recorded data is necessary to deliver a high quality TTS custom voice. | 07-03-2014 |
20140207462 | System and Method of Providing a Spoken Dialog Interface to a Website - Disclosed is a method for training a spoken dialog service component from website data. Spoken dialog service components typically include an automatic speech recognition module, a language understanding module, a dialog management module, a language generation module and a text-to-speech module. The method includes selecting anchor texts within a website based on a term density, weighting those anchor texts based on a percent of salient words to total words, and incorporating the weighted anchor texts into a live spoken dialog interface, the weights determining a level of incorporation into the live spoken dialog interface. | 07-24-2014 |
20140330552 | Machine Translation Using Global Lexical Selection and Sentence Reconstruction - Disclosed are systems, methods, and computer-readable media for performing translations from a source language to a target language. The method comprises receiving a source phrase, generating a target bag of words based on a global lexical selection of words that loosely couples the source words/phrases and target words/phrases, and reconstructing a target phrase or sentence by considering all permutations of words with a conditional probability greater than a threshold. | 11-06-2014 |
20140330555 | Methods and Systems for Natural Language Understanding Using Human Knowledge and Collected Data - Disclosed herein are systems and methods to incorporate human knowledge when developing and using statistical models for natural language understanding. The disclosed systems and methods embrace a data-driven approach to natural language understanding which progresses seamlessly along the continuum of availability of annotated collected data, from when there is no available annotated collected data to when there is any amount of annotated collected data. | 11-06-2014 |
20140350915 | On-Demand Language Translation for Television Programs - In an embodiment, a method of providing an on demand translation service is provided. A subscriber may be charged a reduced fee or no fee for use of the on demand translation service in exchange for displaying commercial messages to the subscriber, the commercial messages being selected based on subscriber information. A multimedia signal including information in a source language may be received. The information may be obtained as text in the source language from the multimedia signal. The text may be translated from the source language to a target language. Translated information, based on the translated text, may be transmitted to a processing device for presentation to the subscriber. The received multimedia signal may be sent to a multimedia device for viewing. | 11-27-2014 |
20140358537 | System and Method for Combining Speech Recognition Outputs From a Plurality of Domain-Specific Speech Recognizers Via Machine Learning - Disclosed herein are systems, methods and non-transitory computer-readable media for performing speech recognition across different applications or environments without model customization or prior knowledge of the domain of the received speech. The disclosure includes recognizing received speech with a collection of domain-specific speech recognizers, determining a speech recognition confidence for each of the speech recognition outputs, selecting speech recognition candidates based on a respective speech recognition confidence for each speech recognition output, and combining selected speech recognition candidates to generate text based on the combination. | 12-04-2014 |
20140372404 | METHOD AND APPARATUS FOR USING A DISCRIMINATIVE CLASSIFIER FOR PROCESSING A QUERY - A method and apparatus for using a classifier for processing a query are disclosed. For example, the method receives a query from a user, and processes the query to locate one or more documents in accordance with a search engine having a discriminative classifier, wherein the discriminative classifier is trained with a plurality of artificial query examples. The method then presents a result of the processing to the user. | 12-18-2014 |
20140379349 | System and Method for Tightly Coupling Automatic Speech Recognition and Search - Disclosed herein are systems, methods, and computer-readable storage media for performing a search. A system configured to practice the method first receives from an automatic speech recognition (ASR) system a word lattice based on speech query and receives indexed documents from an information repository. The system composes, based on the word lattice and the indexed documents, at least one triple including a query word, selected indexed document, and weight. The system generates an N-best path through the word lattice based on the at least one triple and re-ranks ASR output based on the N-best path. The system aggregates each weight across the query words to generate N-best listings and returns search results to the speech query based on the re-ranked ASR output and the N-best listings. The lattice can be a confusion network, the arc density of which can be adjusted for a desired performance level. | 12-25-2014 |
20150019210 | SYSTEM AND METHOD OF EXTRACTING CLAUSES FOR SPOKEN LANGUAGE UNDERSTANDING - A clausifier and method of extracting clauses for spoken language understanding are disclosed. The method relates to generating a set of clauses from speech utterance text and comprises inserting at least one boundary tag in speech utterance text related to sentence boundaries, inserting at least one edit tag indicating a portion of the speech utterance text to remove, and inserting at least one conjunction tag within the speech utterance text. The result is a set of clauses that may be identified within the speech utterance text according to the inserted at least one boundary tag, at least one edit tag and at least one conjunction tag. The disclosed clausifier comprises a sentence boundary classifier, an edit detector classifier, and a conjunction detector classifier. The clausifier may comprise a single classifier or a plurality of classifiers to perform the steps of identifying sentence boundaries, editing text, and identifying conjunctions within the text. | 01-15-2015 |
20150025886 | SYSTEM AND METHOD OF EXTRACTING CLAUSES FOR SPOKEN LANGUAGE UNDERSTANDING - A clausifier for extracting clauses for spoken language understanding is disclosed. The method relates to generating a set of clauses from speech utterance text and comprises inserting at least one boundary tag in speech utterance text related to sentence boundaries, inserting at least one edit tag indicating a portion of the speech utterance text to remove, and inserting at least one conjunction tag within the speech utterance text. The result is a set of clauses that may be identified within the speech utterance text according to the inserted at least one boundary tag, at least one edit tag and at least one conjunction tag. The disclosed clausifier comprises a sentence boundary classifier, an edit detector classifier, and a conjunction detector classifier. The clausifier may comprise a single classifier or a plurality of classifiers to perform the steps of identifying sentence boundaries, editing text, and identifying conjunctions within the text. | 01-22-2015 |
20150073811 | Systems and Methods for Generating Markup-Language Based Expressions from Multi-Modal and Unimodal Inputs - When using finite-state devices to perform various functions, it is beneficial to use finite state devices representing regular grammars with terminals having markup-language-based semantics. By using markup-language-based symbols in the finite state devices, it is possible to generate valid markup-language expressions by concatenating the symbols representing the result of the performed function. The markup-language expression can be used by other applications and/or devices. Finite-state devices are used to convert strings of words and gestures into valid markup-language, for example, XML, expressions that can be used, for example, to provide an application program interface to underlying system applications. | 03-12-2015 |
20150074702 | SYSTEM AND METHOD FOR AUTOMATIC IDENTIFICATION OF KEY PHRASES DURING A MULTIMEDIA BROADCAST - Aspects of the subject disclosure may include, for example, a process that determines information from a first media stream including a number of keywords associated with viewing habits of a user. A second media stream of a media program is scanned for one of a word or phrase corresponding to a keyword of the number of keywords. A second keyword is determined based on the one of the word or phrase, and additional information associated with the second keyword is identified. Results associated with the additional information are provided to a multimedia device. Other embodiments are disclosed. | 03-12-2015 |
20150081678 | SYSTEM AND METHOD FOR SPEECH-BASED INCREMENTAL SEARCH - Disclosed herein are systems, methods, and computer-readable storage media for receiving a user's spoken search query that the system will incrementally recognize and identify search terms. After the query has been incrementally recognized, the system will use the search terms to retrieve a portion of the search results that are based on usable identified search terms. As the results are found, the system will then output at least part at least part of the retrieved portion of search results on the display prior to the user concluding his or her search query. | 03-19-2015 |