Speech assisted network

Subclass of:

704 - Data processing: speech signal processing, linguistics, language translation, and audio compression/decompression

704200000 - SPEECH SIGNAL PROCESSING

704270000 - Application

Patent class list (only non-empty classes are listed)

Deeper subclasses:

Document	Title	Date
20080208585	Ordering Recognition Results Produced By An Automatic Speech Recognition Engine For A Multimodal Application - Ordering recognition results produced by an automatic speech recognition (‘ASR’) engine for a multimodal application implemented with a grammar of the multimodal application in the ASR engine, with the multimodal application operating in a multimodal browser on a multimodal device supporting multiple modes of interaction including a voice mode and one or more non-voice modes, the multimodal application operatively coupled to the ASR engine through a VoiceXML interpreter, includes: receiving, in the VoiceXML interpreter from the multimodal application, a voice utterance; determining, by the VoiceXML interpreter using the ASR engine, a plurality of recognition results in dependence upon the voice utterance and the grammar; determining, by the VoiceXML interpreter according to semantic interpretation scripts of the grammar, a weight for each recognition result; and sorting, by the VoiceXML interpreter, the plurality of recognition results in dependence upon the weight for each recognition result.	08-28-2008
20080208586	Enabling Natural Language Understanding In An X+V Page Of A Multimodal Application - Enabling natural language understanding using an X+V page of a multimodal application implemented with a statistical language model (‘SLM’) grammar of the multimodal application in an automatic speech recognition (‘ASR’) engine, with the multimodal application operating in a multimodal browser on a multimodal device supporting multiple modes of interaction including a voice mode and one or more non-voice modes, the multimodal application operatively coupled to the ASR engine through a VoiceXML interpreter, including: receiving, in the ASR engine from the multimodal application, a voice utterance; generating, by the ASR engine according to the SLM grammar, at least one recognition result for the voice utterance; determining, by an action classifier for the VoiceXML interpreter, an action identifier in dependence upon the recognition result, the action identifier specifying an action to be performed by the multimodal application; and interpreting, by the VoiceXML interpreter, the multimodal application in dependence upon the action identifier.	08-28-2008
20080215331	ENABLING SPEECH WITHIN A MULTIMODAL PROGRAM USING MARKUP - A method for speech enabling an application can include the step of specifying a speech input within a speech-enabled markup. The speech-enabled markup can also specify an application operation that is to be executed responsive to the detection of the speech input. After the speech input has been defined within the speech-enabled markup, the application can be instantiated. The specified speech input can then be detected and the application operation can be responsively executed in accordance with the specified speech-enabled markup.	09-04-2008
20080221896	Grammar confusability metric for speech recognition - Architecture for testing an application grammar for the presence of confusable terms. A grammar confusability metric (GCM) is generated for describing a likelihood that a reference term will be confused by the speech recognizer with another term phrase currently allowed by active grammar rules. The GCM is used to flag processing of two phrases in the grammar that have different semantic meaning, but that the speech recognizer could have difficulty distinguishing reliably. A built-in acoustic model is analyzed and feature vectors generated that are close to the acoustic properties of the input term. The feature vectors are then sent for recognition. A statistically random sampling method is applied to explore the acoustic properties of feature vectors of the input term phrase spatially and temporally. The feature vectors are perturbed in the neighborhood of the time domain and the Gaussian mixture model to which the feature vectors belong.	09-11-2008
20080221897	MOBILE ENVIRONMENT SPEECH PROCESSING FACILITY - In embodiments of the present invention improved capabilities are described for a mobile environment speech processing facility. The present invention may provide for the entering of text into a software application resident on a mobile communication facility, where recorded speech may be presented by the user using the mobile communications facility's resident capture facility. Transmission of the recording may be provided through a wireless communication facility to a speech recognition facility, and may be accompanied by information related to the software application. Results may be generated utilizing the speech recognition facility that may be independent of structured grammar, and may be based at least in part on the information relating to the software application and the recording. The results may then be transmitted to the mobile communications facility, where they may be loaded into the software application.	09-11-2008
20080221898	Mobile navigation environment speech processing facility - In embodiments of the present invention improved capabilities are described for a mobile environment speech processing facility. The present invention may provide for the entering of text into a navigation software application resident on a mobile communication facility, where speech may be recorded using the mobile communications facility's resident capture facility. Transmission of the recording may be provided through a wireless communication facility to a speech recognition facility. Results may be generated utilizing the speech recognition facility that may be independent of structured grammar, and may be based at least in part on the information relating to the recording. The results may then be transmitted to the mobile communications facility, where they may be loaded into the navigation software application. In embodiments, the user may be allowed to alter the results that are received from the speech recognition facility. In addition, the speech recognition facility may be adapted based on usage.	09-11-2008
20080221899	MOBILE MESSAGING ENVIRONMENT SPEECH PROCESSING FACILITY - In embodiments of the present invention improved capabilities are described for a mobile environment speech processing facility. The present invention may provide for the entering of text into a messaging software application resident on a mobile communication facility, where speech may be recorded using the mobile communications facility's resident capture facility. Transmission of the recording may be provided through a wireless communication facility to a speech recognition facility. Results may be generated utilizing the speech recognition facility that may be independent of structured grammar, and may be based at least in part on the information relating to the recording. The results may then be transmitted to the mobile communications facility, where they may be loaded into the messaging software application. In embodiments, the user may be allowed to alter the results that are received from the speech recognition facility. In addition, the speech recognition facility may be adapted based on usage.	09-11-2008
20080221900	MOBILE LOCAL SEARCH ENVIRONMENT SPEECH PROCESSING FACILITY - In embodiments of the present invention improved capabilities are described for a mobile environment speech processing facility. The present invention may provide for the entering of text into a local search software application resident on a mobile communication facility, where speech may be recorded using the mobile communications facility's resident capture facility. Transmission of the recording may be provided through a wireless communication facility to a speech recognition facility. Results may be generated utilizing the speech recognition facility that may be independent of structured grammar, and may be based at least in part on the information relating to the recording. The results may then be transmitted to the mobile communications facility, where they may be loaded into the local search software application. In embodiments, the user may be allowed to alter the results that are received from the speech recognition facility. In addition, the speech recognition facility may be adapted based on usage.	09-11-2008
20080221901	MOBILE GENERAL SEARCH ENVIRONMENT SPEECH PROCESSING FACILITY - In embodiments of the present invention improved capabilities are described for a mobile environment speech processing facility. The present invention may provide for the entering of text into a search software application resident on a mobile communication facility, where speech may be recorded using the mobile communications facility's resident capture facility. Transmission of the recording may be provided through a wireless communication facility to a speech recognition facility. Results may be generated utilizing the speech recognition facility that may be independent of structured grammar, and may be based at least in part on the information relating to the recording. The results may then be transmitted to the mobile communications facility, where they may be loaded into the search software application. In embodiments, the user may be allowed to alter the results that are received from the speech recognition facility. In addition, the speech recognition facility may be adapted based on usage.	09-11-2008
20080221902	MOBILE BROWSER ENVIRONMENT SPEECH PROCESSING FACILITY - In embodiments of the present invention improved capabilities are described for a mobile environment speech processing facility. The present invention may provide for the entering of text into a browser software application resident on a mobile communication facility, where speech may be recorded using the mobile communications facility's resident capture facility. Transmission of the recording may be provided through a wireless communication facility to a speech recognition facility. Results may be generated utilizing the speech recognition facility that may be independent of structured grammar, and may be based at least in part on the information relating to the recording. The results may then be transmitted to the mobile communications facility, where they may be loaded into the browser software application. In embodiments, the user may be allowed to alter the results that are received from the speech recognition facility. In addition, the speech recognition facility may be adapted based on usage.	09-11-2008
20080228489	SYSTEM AND METHOD FOR ANALYZING AUTOMATIC SPEECH RECOGNITION PERFORMANCE DATA - In a disclosed method for interpreting automatic speech recognition (ASR) performance data, a data processing system may receive user input that selects a log file to be processed. The log file may contain log records produced by an ASR system as a result of verbal interaction between an individual and the ASR system. In response to receiving the user input, the data processing system may automatically interpret data in the log records and generate interpretation results. The interpretation results may include a duration for a system prompt communicated to the individual by the ASR system, a user response to the system prompt, and a duration for the user response. The user response may include a textual representation of a verbal response from the individual, obtained through ASR. The interpretation results may also include an overall duration for the telephone call.	09-18-2008
20080228490	METHOD AND APPARATUS FOR LINKING REPRESENTATION AND REALIZATION DATA - A method and apparatus for creating links between a representation (e.g. text data) and a realization (e.g. corresponding audio data) is provided. According to the invention, the realization is structured by combining a time-stamped version of the representation generated from the realization with structural information from the representation. So-called hyperlinks between representation and realization are thereby created. These hyperlinks are used for performing search operations in realization data equivalent to those which are possible in representation data, enabling improved access to the realization (e.g. via audio databases).	09-18-2008
20080235027	Supporting Multi-Lingual User Interaction With A Multimodal Application - Methods, apparatus, and products are disclosed for supporting multi-lingual user interaction with a multimodal application, the application including a plurality of VoiceXML dialogs, each dialog characterized by a particular language, supporting multi-lingual user interaction implemented with a plurality of speech engines, each speech engine having a grammar and characterized by a language corresponding to one of the dialogs, with the application operating on a multimodal device supporting multiple modes of interaction including a voice mode and one or more non-voice modes, the application operatively coupled to the speech engines through a VoiceXML interpreter, the VoiceXML interpreter: receiving a voice utterance from a user; determining in parallel, using the speech engines, recognition results for each dialog in dependence upon the voice utterance and the grammar for each speech engine; administering the recognition results for the dialogs; and selecting a language for user interaction in dependence upon the administered recognition results.	09-25-2008
20080235028	Creating A Voice Response Grammar From A Presentation Grammer - Methods, systems, and products are disclosed for creating a voice response grammar in a voice response server including identifying presentation documents for a presentation, each presentation document having a presentation grammar. Typical embodiments include storing each presentation grammar in a voice response grammar on a voice response server. In typical embodiments, identifying presentation documents for a presentation includes creating a data structure representing a presentation and listing at least one presentation document in the data structure representing a presentation. In typical embodiments listing the at least one presentation document includes storing a location of the presentation document in the data structure representing a presentation and storing each presentation grammar includes retrieving a presentation grammar of the presentation document in dependence upon the location of the presentation document.	09-25-2008
20080243515	SYSTEM AND METHOD FOR PROVIDING AN AUTOMATED CALL CENTER INLINE ARCHITECTURE - A system and method for providing an automated call center inline architecture is provided. A plurality of grammar references and prompts are maintained on a script engine. A call is received through a telephony interface. Audio data is collected using the prompts from the script engine, which are transmitted to the telephony interface via a message server. Distributed speech recognition is performed on a speech server. The grammar references are received from the script engine via the message server. Speech results are determined by applying the grammar references to the audio data. A new grammar is formed from the speech results. Speech recognition results are identified by applying the new grammar to the audio data. The speech recognition results are received as a display on an agent console.	10-02-2008
20080249781	Voice business client - The subject matter herein relates to computer software and client-server based applications and, more particularly, to a voice business client. Some embodiments include one or more device-agnostic application interaction models and one or more device-specific transformation services. Some such embodiments provide one or more of systems, methods, and software embodied at least in part in a device-specific transformation service to transform channel-agnostic application interaction models to and from device or device surrogate specific formats.	10-09-2008
20080255847	Meeting visualization system - Voice of plural participants during a meeting is obtained and dialogue situations of the participants that change every second are displayed in real time, so that it is possible to provide a meeting visualization system for triggering more positive discussions. Voice data collected from plural voice collecting units associated with plural participants is processed by a voice processing server to extract speech information. The speech information is sequentially input to an aggregation server. A query process is performed for the speech information by a stream data processing unit of the aggregation server, so that activity data such as the accumulation value of speeches of the participants in the meeting is generated. A display processing unit visualizes and displays dialogue situations of the participants by using the sizes of circles and the thicknesses of lines on the basis of the activity data.	10-16-2008
20080255848	Speech Recognition Method and System and Speech Recognition Server - A speech recognition method, system and server includes receiving speech information from at least one User Equipment; analyzing and recognizing the speech information and searching for a speech feature matching the speech information; and obtaining an instruction in accordance with the speech feature and executing the instruction. With the various embodiments of the disclosure, cost of the User Equipment may be reduced and accuracy of speech recognition may be improved.	10-16-2008
20080270142	Remote Interactive Information Delivery System - Disclosed herein is a method and system for providing a response to a user's request for information. The user calls into an intelligent information delivery system and requests the information. The information request is recorded as an audio file at the intelligent information delivery system. A structured text form of the audio file is refined into an optimized search query. The optimized search query is input to retrieve search results comprising information of interest from a data server. The search results are processed into an agent readability enhanced and context specific output and displayed to the agent. The agent selects context specific results from the displayed output. The selected context specific results are formatted to an optimized speech deliverable text form. Content of the optimized speech deliverable text form is converted into a voice stream. The voice stream is then communicated to the user.	10-30-2008
20080300884	Using voice commands from a mobile device to remotely access and control a computer - A method of using voice commands from a mobile device to remotely access and control a computer. The method includes receiving audio data from the mobile device at the computer. The audio data is decoded into a command. A software program that the command was provided for is determined. At least one process is executed at the computer in response to the command. Output data is generated at the computer in response to executing at least one process at the computer. The output data is transmitted to the mobile device.	12-04-2008
20080312933	INTERFACING AN APPLICATION SERVER TO REMOTE RESOURCES USING ENTERPRISE JAVA BEANS AS INTERFACE COMPONENTS - A method for interfacing an application server with a resource can include the step of associating a plurality of Enterprise Java Beans (EJBs) to a plurality of resources, where a one-to-one correspondence exists between EJB and resource. An application server can receive an application request and can determine a resource for handling the request. An EJB associated with the determined resource can interface the application server to the determined resource. The request can be handled with the determined resource.	12-18-2008
20080319757	SPEECH PROCESSING SYSTEM BASED UPON A REPRESENTATIONAL STATE TRANSFER (REST) ARCHITECTURE THAT USES WEB 2.0 CONCEPTS FOR SPEECH RESOURCE INTERFACES - A speech processing system can include a client, a speech for Web 2.0 system, and a speech processing system. The client can access a speech-enabled application using at least one Web 2.0 communication protocol. For example, a standard browser of the client can use a standard protocol to communicate with the speech-enabled application executing on the speech for Web 2.0 system. The speech for Web 2.0 system can access a data store within which user specific speech parameters are included, wherein a user of the client is able to configure the specific speech parameters of the data store. Suitable ones of these speech parameters are utilized whenever the user interacts with the Web 2.0 system. The speech processing system can include one or more speech processing engines. The speech processing system can interact with the speech for Web 2.0 system to handle speech processing tasks associated with the speech-enabled application.	12-25-2008
20080319758	SPEECH-ENABLED APPLICATION THAT USES WEB 2.0 CONCEPTS TO INTERFACE WITH SPEECH ENGINES - The present invention discloses a speech-enabled application that includes two or more linked markup documents that together form a speech-enabled application served by a Web 2.0 server. The linked markup documents can conform to an ATOM PUBLISHING PROTOCOL (APP) based protocol. Additionally, the linked markup documents can include an entry collection of documents and a resource collection of documents. The resource collection can include at least one speech resource associated with a speech engine disposed in a speech processing system remotely located from the Web 2.0 server. The speech resource can add a speech processing capability to the speech-enabled application. In one embodiment, end-users of the speech-enabled application can be permitted to introspect, customize, replace, add, re-order, and remove at least a portion of the linked markup documents.	12-25-2008
20080319759	INTEGRATING A VOICE BROWSER INTO A WEB 2.0 ENVIRONMENT - The present invention discloses a system and method for integrating a voice browser into a Web 2.0 environment. For example, a system is disclosed which includes at least a Web 2.0 server, a voice browser, and a server-side speech processing system. The Web 2.0 server can serve Web 2.0 content comprising at least one speech-enabled application. The served Web 2.0 content can include voice markup. The voice browser can render the Web 2.0 content received from the Web 2.0 server which includes rendering the voice markup. The server-side speech processing system can handle speech processing operations for the speech-enabled application. Communications with the server-side speech processing system occur via a set of RESTful commands, such as an HTTP GET command, an HTTP POST command, an HTTP PUT command, and an HTTP DELETE command.	12-25-2008
20080319760	CREATING AND EDITING WEB 2.0 ENTRIES INCLUDING VOICE ENABLED ONES USING A VOICE ONLY INTERFACE - The present invention discloses a method for creating Web 2.0 entries, such as WIKI entries. In the method, a voice communication channel can be established between a user and an automated response system. User speech input can be received over the voice communication channel. A Web 2.0 entry can be created based upon the speech input. The Web 2.0 entry can be saved in a data store accessible by a Web 2.0 server. The Web 2.0 server can serve the saved Web 2.0 entry to Web 2.0 clients. The Web 2.0 clients can include a graphical and/or a voice interface through which the Web 2.0 entry can be presented to users of the clients. The created Web 2.0 entries (e.g. Web 2.0 application) can be formatted in an ATOM PUBLISHING PROTOCOL compliant manner.	12-25-2008
20090055191	ESTABLISHING CALL-BASED AUDIO SOCKETS WITHIN A COMPONENTIZED VOICE SERVER - A method of interfacing a telephone application server and a speech engine can include the step of establishing one or more audio sockets in a media converting component of the telephone application server. The audio socket can remain available for approximately a duration of a call. A work unit that requires processing by a speech engine can be detected for the call. An identifier for the audio socket and a data for the work unit can be conveyed to a selected speech engine. Work unit results from the selected speech engine can be received by the media converting component via the previously established audio socket.	02-26-2009
20090076823	INTERACTIVE VOICE RESPONSE INTERFACE, SYSTEM, METHODS AND PROGRAM FOR CORRECTIONAL FACILITY COMMISSARY - An interactive voice response interface for a correctional facility commissary that detects violations of facility restrictions to orders for commissary goods at more than one point in time, and allows comprehensive review and editing of pending orders for commissary items.	03-19-2009
20090076824	REMOTE CONTROL SERVER PROTOCOL SYSTEM - A remote control server protocol system transports data to a client system. The client system communicates with the server application using a platform-independent communications protocol. The client system sends commands and audio data to the server application. The server application may respond by transmitting audio and other messages to the client system. The messages may be transmitted over a single communications channel.	03-19-2009
20090094036	SYSTEM AND METHOD OF HANDLING PROBLEMATIC INPUT DURING CONTEXT-SENSITIVE HELP FOR MULTI-MODAL DIALOG SYSTEMS - A method of presenting a multi-modal help dialog move to a user in a multi-modal dialog system is disclosed. The method comprises presenting an audio portion of the multi-modal help dialog move that explains available ways of user inquiry and presenting a corresponding graphical action performed on a user interface associated with the audio portion. The multi-modal help dialog move is context-sensitive and uses current display information and dialog contextual information to present a multi-modal help move that is currently related to the user. A user request or a problematic dialog detection module may trigger the multi-modal help move.	04-09-2009
20090106028	AUTOMATED TUNING OF SPEECH RECOGNITION PARAMETERS - A method for execution on a server for serving presence information, the method for providing dynamically loaded speech recognition parameters to a speech recognition engine, can be provided. The method can include storing at least one rule for selecting speech recognition parameters, wherein a rule comprises an if-portion including criteria and a then-portion specifying speech recognition parameters that must be used when the criteria is met. The method can further include receiving notice that a speech recognition session has been initiated between a user and the speech recognition engine. The method can further include selecting a first set of speech recognition parameters responsive to executing the at least one rule and providing to the speech recognition engine the first set of speech recognition parameters for performing speech recognition of the user.	04-23-2009
20090112600	SYSTEM AND METHOD FOR INCREASING ACCURACY OF SEARCHES BASED ON COMMUNITIES OF INTEREST - Disclosed are systems, methods and computer-readable media for using a local communication network to generate a speech model. The method includes retrieving for an individual a list of numbers in a calling history, identifying a local neighborhood associated with each number in the calling history, truncating the local neighborhood associated with each number based on the at least one parameter, retrieving a local communication network associated with each number in the calling history and each phone number in the local neighborhood, and creating a language model for the individual based on the retrieved local communication network. The generated language model may be used for improved automatic speech recognition for audible searches as well as other modules in a spoken dialog system.	04-30-2009
20090138269	SYSTEM AND METHOD FOR ENABLING VOICE DRIVEN INTERACTIONS AMONG MULTIPLE IVR'S, CONSTITUTING A VOICE WORKFLOW - A method for enabling voice driven interactions among multiple interactive voice response (IVR) systems begins by receiving a telephone call from a user of a first IVR system to begin a transaction; and, automatically contacting, by the first IVR system, at least one additional IVR system. Specifically, the contacting of the additional IVR system includes assigning tasks to the additional IVR system. The tasks require input from the user and the additional IVR system is secure and separate from the first IVR system. Moreover, the tasks can include a transfer of currency and a transfer of local information.	05-28-2009
20090144061	Systems and methods for generating verbal feedback messages in head-worn electronic devices - Systems and methods for generating and providing verbal feedback messages to wearers of man-machine interface (MMI)-enabled head-worn electronic devices. An exemplary head-worn electronic device includes an MMI and an acoustic signal generator configured to provide verbal acoustic messages to a wearer of the head-worn electronic device in response to the wearer's interaction with the MMI. The head-worn electronic device may be further configured to monitor device states and generate and provide verbal acoustic messages indicative of changes to the device states to the wearer. The verbal messages are digitally stored and accessed by a microprocessor configured to execute a verbal feedback generation program. Further, the verbal messages may be stored according to multiple different natural languages, thereby allowing a user to select a preferred natural language by which the verbal acoustic messages are fed back to the user.	06-04-2009
20090187410	SYSTEM AND METHOD OF PROVIDING SPEECH PROCESSING IN USER INTERFACE - Disclosed are systems, methods and computer-readable media for enabling speech processing in a user interface of a device. The method includes receiving an indication of a field and a user interface of a device, the indication also signaling that speech will follow, receiving the speech from the user at the device, the speech being associated with the field, transmitting the speech as a request to a public, common network node that receives and processes speech, processing the transmitted speech and returning text associated with the speech to the device, and inserting the text into the field. Upon a second indication from the user, the system processes the text in the field as programmed by the user interface. The present disclosure provides a speech mash-up application for a user interface of a mobile or desktop device that does not require expensive speech processing technologies.	07-23-2009
20090192800	Medical Ontology Based Data & Voice Command Processing System - A computerized integrated order entry and clinical documentation and voice recognition system enables voice responsive user entry of orders. The system includes a voice recognition unit for detecting spoken words and converting detected spoken words to data representing commands. A data processor, coupled to the voice recognition unit, processes the data representing commands provided by the voice recognition unit, to provide order and documentation related data and menu options for use by a user, by interpreting the data representing commands using an ordering and documentation application specific ontology and excluding use of other non-ordering or non-documentation application specific ontologies. The ordering application enables initiating an order for medication to be administered to a particular patient, or additional ordered services to be performed. A user interface processor, coupled to the data processor, provides data representing a display image. The display image includes the order related data and menu options provided by the data processor and supports a user in selecting an order for medication to be administered to a particular patient.	07-30-2009
20090204407	System and method for processing a spoken request from a user - A system and method are described for processing a spoken request from a user. In one embodiment, a method is disclosed for attempting to recognize a spoken request from a user with a speech recognition engine above a predetermined level of accuracy. If the spoken request is not recognized above the predetermined level of accuracy, the spoken request is provided to a level one agent. If the level one agent does not recognize the request, a voice connection is established between the user and a level two agent. In another embodiment, a method is disclosed for determining whether a silent response system recognizes a spoken request from a user above a predetermined level of accuracy. A response is provided to the user if the silent response system recognizes the spoken request. Otherwise, a voice connection is established between the user and a call center.	08-13-2009
20090276223	REMOTE ADMINISTRATION METHOD AND SYSTEM - An administration method and system. The method includes receiving by a computing system, a telephone call from an administrator. The computing system presents an audible menu associated with a plurality of computers to the administrator. The computing system receives from the administrator, an audible selection for a computer from the audible menu. The computing system receives from the administrator, an audible verbal command for performing a maintenance operation on the computer. The computing system executes the maintenance operation on the computer. The computing system receives from the computer, confirmation data indicating that the maintenance operation has been completed. The computing system converts the confirmation data into an audible verbal message. The computing system transmits the second audible verbal message to the administrator.	11-05-2009
20090326953	Method of accessing cultural resources or digital contents, such as text, video, audio and web pages by voice recognition with any type of programmable device without the use of the hands or any physical apparatus. - The use of voice as a means of communication with a computer or programmable device (	12-31-2009
20090326954	IMAGING APPARATUS, METHOD OF CONTROLLING SAME AND COMPUTER PROGRAM THEREFOR - An imaging apparatus is provided. The apparatus includes a sound collecting unit configured to collect speech in a monitored environment, a shooting unit configured to shoot video in the monitored environment, a detection unit configured to detect a change in a state of the monitored environment based upon a change in data acquired by the sound collecting unit, the shooting unit and a sensor for measuring the state of the monitored environment, a recognition unit configured to recognize the change in state with regard to speech data acquired by the sound collecting unit and video data acquired by the shooting unit, and a control unit configured to start up the recognition unit and select a recognition database, which is used by the recognition unit, based upon result of detection by the detection unit.	12-31-2009
20100010817	System and Method for Improving the Performance of Speech Analytics and Word-Spotting Systems - A System and Method for Improving the Performance of Speech Analytics and Word-Spotting Systems is provided wherein a digitized signal originates from an input client device belonging to a customer, the signal being then passed to a network which passes the signal to both of an output client device belonging to a customer service rep and a call recorder. The call recorder compresses the signal using CELP-based technology such as MASC® technology and then sends the compressed signal to a speech analytics engine before being processed with or without a signal processing filter. The speech analytics engine receives the signal and upon also receiving a query, the speech analytics engine operates on the signal in response to the query, thereby outputting one or more desired voice outputs to an application to include a query application.	01-14-2010
20100023332	SPEECH RECOGNITION INTERFACE FOR VOICE ACTUATION OF LEGACY SYSTEMS - Methods and apparatus are disclosed for a technician to access a systems interface to back-end legacy systems by voice input commands to a speech recognition module. Generally, a user logs a computer into a systems interface which permits access to back-end legacy systems. Preferably, the systems interface includes a first server with middleware for managing the protocol interface. Preferably, the systems interface includes a second server for receiving requests and generating legacy transactions. After the computer is logged-on, a request for voice input is made. A speech recognition module is launched or otherwise activated. The user inputs voice commands that are processed to convert them to commands and text that can be recognized by the client software. The client software formats the requests and forwards them to the systems interface in order to retrieve the requested information.	01-28-2010
20100042414	SYSTEM AND METHOD FOR IMPROVING NAME DIALER PERFORMANCE - Disclosed herein are systems, methods, and computer-readable media for improving name dialer performance. The method includes receiving a speech query for a name in a directory of names, retrieving matches to the query, if the matches are uniquely spelled homophones or near-homophones, identifying information that is unique to all retrieved matches, and presenting a spoken disambiguation statement to a user that incorporates the identified unique information. Identifying information can include multiple pieces of unique information if necessary to completely disambiguate the matches. A hierarchy can establish priority of multiple pieces of unique information for use in the spoken disambiguation statement.	02-18-2010
20100049525	METHODS, APPARATUSES, AND SYSTEMS FOR PROVIDING TIMELY USER CUES PERTAINING TO SPEECH RECOGNITION - A method is provided of providing cues from an electronic communication device to a user while capturing an utterance. A plurality of cues associated with the user utterance are provided by the device to the user in at least near real-time. For each of a plurality of portions of the utterance, data representative of the respective portion of the user utterance is communicated from the electronic communication device to a remote electronic device. In response to this communication, data, representative of at least one parameter associated with the respective portion of the user utterance, is received at the electronic communication device. The electronic communication device provides one or more cues to the user based on the at least one parameter. At least one of the cues is provided by the electronic communication device to the user prior to completion of the step of capturing the user utterance.	02-25-2010
20100088100	ELECTRONIC DEVICES WITH VOICE COMMAND AND CONTEXTUAL DATA PROCESSING CAPABILITIES - An electronic device may capture a voice command from a user. The electronic device may store contextual information about the state of the electronic device when the voice command is received. The electronic device may transmit the voice command and the contextual information to computing equipment such as a desktop computer or a remote server. The computing equipment may perform a speech recognition operation on the voice command and may process the contextual information. The computing equipment may respond to the voice command. The computing equipment may also transmit information to the electronic device that allows the electronic device to respond to the voice command.	04-08-2010
20100088101	SYSTEM AND METHOD FOR FACILITATING CALL ROUTING USING SPEECH RECOGNITION - A computer-implemented method is described for optimizing prompts for a speech-enabled application. The speech-enabled application is operable to receive communications from a number of users and communicate one or more prompts to each user to elicit a response from the user that indicates the purpose of the user's communication. The method includes determining a number of prompt alternatives (each including one or more prompts) to evaluate and determining an evaluation period for each prompt alternative. The method also includes automatically presenting each prompt alternative to users during the associated evaluation period and automatically recording the results of user responses to each prompt alternative. Furthermore, the method includes automatically analyzing the recorded results for each prompt alternative based on one or more performance criteria and automatically implementing one of the prompt alternatives based on the analysis of the recorded results.	04-08-2010
20100094635	System for Voice-Based Interaction on Web Pages - SYSTEM FOR VOICE-BASED INTERACTION ON WEB PAGES, of a type that permits the incorporation of voice-handling functions on a Web page, in which from a Terminal (	04-15-2010
20100106507	Ratio of Speech to Non-Speech Audio such as for Elderly or Hearing-Impaired Listeners - The invention relates to audio signal processing and speech enhancement. In accordance with one aspect, the invention combines a high-quality audio program that is a mix of speech and non-speech audio with a lower-quality copy of the speech components contained in the audio program for the purpose of generating a high-quality audio program with an increased ratio of speech to non-speech audio such as may benefit the elderly, hearing impaired or other listeners. Aspects of the invention are particularly useful for television and home theater sound, although they may be applicable to other audio and sound applications. The invention relates to methods, apparatus for performing such methods, and to software stored on a computer-readable medium for causing a computer to perform such methods.	04-29-2010
20100125460	TRAINING/COACHING SYSTEM FOR A VOICE-ENABLED WORK ENVIRONMENT - A voice assistant system is disclosed which directs the voice Prompts delivered to a first user of a voice assistant to also be communicated wirelessly to the voice assistant of a second user so that the second user can hear the voice Prompts as delivered to the first user.	05-20-2010
20100211396	System and Method for Speech Recognition System - A digital speech enabled middleware module is disclosed that facilitates interaction between a large number of client devices and network-based automatic speech recognition (ASR) resources. The module buffers feature vectors associated with speech received from the client devices when the number of client devices is greater than the available ASR resources. When an ASR decoder becomes available, the module transmits the feature vectors to the ASR decoder and a recognition result is returned.	08-19-2010
20100217603	Method, System, and Apparatus for Enabling Adaptive Natural Language Processing - An adaptive processing system includes one or more adaptive processing engines that are adapted to receive the one or more requests. The one or more adaptive processing engines are adapted to parse the one or more requests and to communicate one or more queries to the one or more communication devices based at least in part on the information of the request. In one embodiment, the parsing of the one or more queries includes analyzing the information from the user, determining one or more next steps to perform based at least in part on the information from the user, and generating one or more queries for additional information. The system also includes an application server adapted to communicate with the one or more adaptive processing engines in response to the one or more requests based at least in part on the information of the one or more requests.	08-26-2010
20100223059	METHOD AND APPARATUS FOR PLAYING DYNAMIC AUDIO AND VIDEO MENUS - A method and an apparatus for playing dynamic audio and video menus are provided herein to play two or more audio and video menu items dynamically. Specifically, the audio and video data in at least two obtained audio and video menu items are split into audio data and video data, respectively. After the splitting, the obtained video data is integrated into one video stream data and the audio data and the integrated video stream data are played. In this way, the video data of each menu item in the dynamic audio and video menus are played smoothly and the voice prompts can be spliced seamlessly. As such, the effect of the audio dynamic menus is the same as the effect of playing a single audio file, and the user can hear the voice menus smoothly.	09-02-2010
20100274563	METHOD AND MOBILE COMMUNICATION DEVICE FOR GENERATING DUAL-TONE MULTI-FREQUENCY (DTMF) COMMANDS ON A MOBILE COMMUNICATION DEVICE HAVING A TOUCHSCREEN - A method and mobile communication device for generating dual-tone multi-frequency (DTMF) commands on a mobile communication device having a touchscreen are provided. In accordance with one embodiment, there is provided a method for generating dual-tone multi-frequency (DTMF) commands on a mobile communication device having a touchscreen, comprising: detecting an automated attendant during a telephone call; activating speech recognition in respect of incoming voice data during the telephone call in response to detecting an automated attendant; translating spoken prompts in the incoming voice data into respective DTMF commands; displaying a menu having selectable menu options corresponding to the DTMF commands in a graphical user interface on the touchscreen; receiving input selecting one of the menu options; receiving input via the touchscreen activating a selected one of the menu options; and generating a DTMF command in accordance with the activated menu option.	10-28-2010
20100324910	TECHNIQUES TO PROVIDE A STANDARD INTERFACE TO A SPEECH RECOGNITION PLATFORM - Techniques and systems to provide speech recognition services over a network using a standard interface are described. In an embodiment, a technique includes accepting a speech recognition request that includes at least audio input, via an application program interface (API). The speech recognition request may also include additional parameters. The technique further includes performing speech recognition on the audio according to the request and any specified parameters; and returning a speech recognition result as a hypertext transfer protocol (HTTP) response. Other embodiments are described and claimed.	12-23-2010
20110040564	VOICE ASSISTANT SYSTEM FOR DETERMINING ACTIVITY INFORMATION - A system and method of assisting a care provider in the documentation of self-performance and support information for a resident or person includes a speech dialog with a care provider that uses the generation of speech to play to the care provider and the capture of speech spoken by a care provider. The speech dialog provides assistance to the care provider in providing care for a person according to a care plan for the person. The care plan includes one or more activities requiring a level of performance by the person. For the activity, speech inquiries are provided to the care provider, through the speech dialog, regarding performance of the activity by the person and regarding care provider assistance in the performance of the activity by the person. Speech input is captured from the care provider that is responsive to the speech inquiries. A code is then determined from the speech input and the code indicates the self-performance of the person and support information for a care provider for the activity.	02-17-2011
20110066439	DIMENSION MEASUREMENT SYSTEM - A dimension measurement system is provided. The dimension measurement system includes a speech I/O device fit in an ear canal of a worker, generating a voice signal from vibration in the air emitted from an eardrum of the worker and propagated inside the ear canal, and outputting the voice signal and an information processing device realizing a speech recognition function recognizing a measurement value of a dimension of an object from the voice signal that the speech I/O device output and a judgment function judging if the measurement value satisfies a reference value of the object.	03-17-2011
20110082698	Devices, Systems and Methods for Improving and Adjusting Communication - Devices, methods and systems for improving and adjusting voice volume and body movements during a performance are disclosed. Device embodiments may be configured with a processor, microphone, one or more movement sensors and at least a display or a speaker. The processor may include instructions configured to receive at least one of sound input from the microphone and movement data from the one or more accelerometers, generate one or more input levels corresponding to at least one of the sound input and movement data, compare the one or more generated input levels to one or more predefined input levels, associate the one or more predefined input levels with at least one of a color, text, graphic or audio file and present at least one of the color, text, graphic or audio file to a user of the device.	04-07-2011
20110144999	DIALOGUE SYSTEM AND DIALOGUE METHOD THEREOF - A dialogue system and a method for the same are disclosed. The dialogue system includes a multimodal input unit receiving speech and non-speech information of a user, a domain reasoner, which stores a plurality of pre-stored situations, each of which is formed by a combination of one or more speech and non-speech information, calculating each adaptability of the pre-stored situations on the basis of a generated situation based on the speech and the non-speech information received from the multimodal input unit, and determining a current domain according to the calculated adaptability, a dialogue manager to select a response corresponding to the current domain, and a multimodal output unit to output the response. The dialogue system performs domain reasoning using a situation including information combinations reflected in the domain reasoning process, current information, and a speech recognition result, and reduces the size of a dialogue search space while increasing domain reasoning accuracy.	06-16-2011
20110191108	Remote controller with position actuated voice transmission - A method of operation of a remote controller consistent with certain implementations involves determining a spatial orientation of the remote controller based upon an output signal from a position detector; and setting a voice mode of operation of the remote controller as active or inactive based upon the spatial orientation of the remote controller as determined by the position detector, where the voice mode determines whether or not the remote controller will accept and process voice information from a microphone. This abstract is not to be considered limiting, since other embodiments may deviate from the features described in this abstract.	08-04-2011
20110202350	REMOTE CONTROL OF A WEB BROWSER - A system for remotely and interactively controlling visual and multimedia content displayed on and rendered by a web browser using a telephony device. In particular, the system relates to receiving a voice input (e.g., dual tone multi-frequency DTMF input, spoken input, etc.) from a telephony device (e.g., a landline, a cellular telephone, or other system with telephone functionality, etc.) via a wide-area network to an intermediary computer that is configured to control the rendering of one or more web pages (or other web data) by a standard web browser.	08-18-2011
20110282671	METHODS FOR PERSONAL EMERGENCY INTERVENTION - A method according to an aspect of the present invention includes receiving a communication from a patient through an interactive voice response (IVR) system; providing a guided voice prompt from the interactive voice response system to the patient; receiving a response of the patient to the guided voice prompt through the interactive voice response system; analyzing the response of the patient to the guided voice prompt; determining, based on the response of the patient, whether a command should be transmitted; and transmitting a command to a device controlled by the patient after a determination that the command should be transmitted. This method can be practiced automatically to allow a medical device for a patient or other subject to be monitored without requiring the patient to manually enter information.	11-17-2011
20110282672	DISTRIBUTED VOICE BROWSER - The present invention can include a method of call processing using a distributed voice browser including allocating a plurality of service processors configured to interpret parsed voice markup language data and allocating a plurality of voice markup language parsers configured to retrieve and parse voice markup language data representing a telephony service. The plurality of service processors and the plurality of markup language parsers can be registered with one or more session managers. Accordingly, components of received telephony service requests can be distributed to the voice markup language parsers and the parsed voice markup language data can be distributed to the service processors.	11-17-2011
20110307259	SYSTEM AND METHOD FOR AUDIO CONTENT NAVIGATION - A system and method for communicating one or more audio files through a network. One or more original files of an original web site are converted into one or more audio files. An indication is provided to a user that the one or more original files are available as the one or more audio files in response to the user navigating the one or more original files. The one or more audio files are delivered to a computing device of the user through the network in response to a request to access the one or more audio files.	12-15-2011
20120022872	Automatically Adapting User Interfaces For Hands-Free Interaction - A user interface for a system such as a virtual assistant is automatically adapted for hands-free use. A hands-free context is detected via automatic or manual means, and the system adapts various stages of a complex interactive system to modify the user experience to reflect the particular limitations of such a context. The system of the present invention thus allows for a single implementation of a complex system such as a virtual assistant to dynamically offer user interface elements and alter user interface behavior to allow hands-free use without compromising the user experience of the same system for hands-on use.	01-26-2012
20120046950	RETRIEVAL AND PRESENTATION OF NETWORK SERVICE RESULTS FOR MOBILE DEVICE USING A MULTIMODAL BROWSER - A method of obtaining information using a mobile device can include receiving a request including speech data from the mobile device, and querying a network service using query information extracted from the speech data, whereby search results are received from the network service. The search results can be formatted for presentation on a display of the mobile device. The search results further can be sent, along with a voice grammar generated from the search results, to the mobile device. The mobile device then can render the search results.	02-23-2012
20120046951	NUMERIC WEIGHTING OF ERROR RECOVERY PROMPTS FOR TRANSFER TO A HUMAN AGENT FROM AN AUTOMATED SPEECH RESPONSE SYSTEM - A method for a speech response system to automatically transfer users to human agents. The method can establish an interactive dialog session between a user and an automated speech response system. An error score can be established when the interactive dialog session is initiated. During the interactive dialog session, responses to dialog prompts can be received. Error weights can be assigned to received responses determined to be non-valid responses. Different non-valid responses can be assigned different error weights. For each non-valid response, the assigned error weight can be added to the error score. When a value of the error score exceeds a previously established error threshold, a user can be automatically transferred from the automated speech response system to a human agent.	02-23-2012
20120065982	DYNAMICALLY GENERATING A VOCAL HELP PROMPT IN A MULTIMODAL APPLICATION - Dynamically generating a vocal help prompt in a multimodal application that includes detecting a help-triggering event for an input element of a VoiceXML dialog, where the detecting is implemented with a multimodal application operating on a multimodal device supporting multiple modes of interaction including a voice mode and one or more non-voice modes, the multimodal application is operatively coupled to a VoiceXML interpreter, and the multimodal application has no static help text. Dynamically generating a vocal help prompt in a multimodal application according to embodiments of the present invention typically also includes retrieving, by the VoiceXML interpreter from a source of help text, help text for an element of a speech recognition grammar, forming by the VoiceXML interpreter the help text into a vocal help prompt, and presenting by the multimodal application the vocal help prompt through a computer user interface to a user.	03-15-2012
20120078635	VOICE CONTROL SYSTEM - One embodiment of a voice control system includes a first electronic device communicatively coupled to a server and configured to receive a speech recognition file from the server. The speech recognition file may include a speech recognition algorithm for converting one or more voice commands into text and a database including one or more entries comprising one or more voice commands and one or more executable commands associated with the one or more voice commands.	03-29-2012
20120078636	EVIDENCE DIFFUSION AMONG CANDIDATE ANSWERS DURING QUESTION ANSWERING - Diffusing evidence among candidate answers during question answering may identify a relationship between a first candidate answer and a second candidate answer, wherein the candidate answers are generated by a question-answering computer process, the candidate answers have associated supporting evidence, and the candidate answers have associated confidence scores. All or some of the evidence may be transferred from the first candidate answer to the second candidate answer based on the identified relationship. A new confidence score may be computed for the second candidate answer based on the transferred evidence.	03-29-2012
20120078637	METHOD AND APPARATUS FOR PERFORMING AND CONTROLLING SPEECH RECOGNITION AND ENROLLMENT - A method and an apparatus for performing and controlling speech recognition and enrolment are provided. The method for performing speech recognition and enrolment includes: receiving a Speech Enrolment Start Request and a Speech Recognition Request sent from a media gateway controller (MGC); performing speech recognition and enrolment according to the Speech Enrolment Start Request and the Speech Recognition Request, and obtaining a recognition and enrolment result; and feeding back the recognition and enrolment result to the MGC.	03-29-2012
20120116775	BIOCHEMICAL ANALYZER HAVING MICROPROCESSING APPARATUS WITH EXPANDABLE VOICE CAPACITY - A biochemical analyzer having a microprocessing apparatus with expandable voice capacity is characterized in that a driving module is installed in a data processor and a voice carrier is replaceable. Thereby, increase or decrease of voice files can be easily done by replacing the current voice carrier with an alternative voice carrier storing desired voice files, without the need of replacing the driving module together with the voice carrier, thereby saving costs and reducing processing procedures.	05-10-2012
20120116776	System and method for client voice building - Provided is a system and method for building and managing a customized voice of an end-user, comprising the steps of designing a set of prompts for collection from the user, wherein the prompts are selected from both an analysis tool and by the user's own choosing to capture voice characteristics unique to the user. The prompts are delivered to the user over a network to allow the user to save a user recording on a server of a service provider. This recording is then retrieved and stored on the server and then set up on the server to build a voice database using text-to-speech synthesis tools. A graphical interface allows the user to continuously refine the data file to improve the voice and customize parameter and configuration settings, thereby forming a customized voice database which can be deployed or accessed.	05-10-2012
20120116777	Stateful, Double-Buffered Dynamic Navigation Voice Prompting - A navigation system written in J2ME MIDP for a client device includes a plurality of media players each respectively comprising a buffer. A navigation program manages the state of the plurality of media players. The plurality of media players are in either one of an acquiring resources state, and a playing and de-allocating state. The use of a plurality of media players each respectively comprising a buffer overcomes the prior art in which a navigation system can cut off a voice prompt because of the time-consuming tasks associated with playing a voice prompt.	05-10-2012
20120166201	INVOKING TAPERED PROMPTS IN A MULTIMODAL APPLICATION - Methods, apparatus, and computer program products are described for invoking tapered prompts in a multimodal application implemented with a multimodal browser and a multimodal application operating on a multimodal device supporting multiple modes of user interaction with the multimodal application, the modes of user interaction including a voice mode and one or more non-voice modes. Embodiments include identifying, by a multimodal browser, a prompt element in a multimodal application; identifying, by the multimodal browser, one or more attributes associated with the prompt element; and playing a speech prompt according to the one or more attributes associated with the prompt element.	06-28-2012
20120166202	SYSTEM AND METHOD FOR FUNNELING USER RESPONSES IN AN INTERNET VOICE PORTAL SYSTEM TO DETERMINE A DESIRED ITEM OR SERVICE - A method of funneling user responses in a voice portal system to determine a desired item or service includes (a) querying a user for an attribute value associated with a first particular attribute of the desired item or service; and (b) determining if the attribute value given by the user satisfies an end state. If the end state is not satisfied, steps (a) and (b) are performed with a new particular attribute.	06-28-2012
20120173243	Expert Conversation Builder - An expert conversation builder contains a knowledge database that includes a plurality of dialogues having nodes and edges arranged as directed acyclic graphs. Users and authors of the system interface with the knowledge database through a graphical interface to author dialogues and to create expert conversations as threads traversing the nodes in the dialogues.	07-05-2012
20120179471	CONFIGURABLE SPEECH RECOGNITION SYSTEM USING MULTIPLE RECOGNIZERS - Techniques for combining the results of multiple recognizers in a distributed speech recognition architecture. Speech data input to a client device is encoded and processed both locally and remotely by different recognizers configured to be proficient at different speech recognition tasks. The client/server architecture is configurable to enable network providers to specify a policy directed to a trade-off between reducing recognition latency perceived by a user and usage of network resources. The results of the local and remote speech recognition engines are combined based, at least in part, on logic stored by one or more components of the client/server architecture.	07-12-2012
20120203557	COMPREHENSIVE MULTIPLE FEATURE TELEMATICS SYSTEM - A comprehensive system and method for telematics including the following features individually or in sub-combinations: vehicle user interfaces, telecommunications, speech recognition, digital commerce and vehicle parking, digital signal processing, wireless transmission of digitized voice input, navigational assistance for motorists, data communication to vehicles, mobile client-server communication, extending coverage and bandwidth of wireless communication services, and noise reduction.	08-09-2012
20120209613	METHOD AND ARRANGEMENT FOR MANAGING GRAMMAR OPTIONS IN A GRAPHICAL CALLFLOW BUILDER	08-16-2012
20120215542	METHOD OF PROVIDING DYNAMIC SPEECH PROCESSING SERVICES DURING VARIABLE NETWORK CONNECTIVITY - A client device for providing dynamic speech processing services during variable network connectivity with a network server includes a connection monitor that monitors network connectivity between the client device and the network server. The device further includes a simplified speech processor that processes speech data and is initiated based on an assessment from the connection monitor that the network connectivity is impaired. The device further includes a speech data storage that stores processed speech data from the simplified speech processor and a transmitter that is configured to transmit the stored speech data to the network	08-23-2012
20120232906	Electronic Devices with Voice Command and Contextual Data Processing Capabilities - An electronic device may capture a voice command from a user. The electronic device may store contextual information about the state of the electronic device when the voice command is received. The electronic device may transmit the voice command and the contextual information to computing equipment such as a desktop computer or a remote server. The computing equipment may perform a speech recognition operation on the voice command and may process the contextual information. The computing equipment may respond to the voice command. The computing equipment may also transmit information to the electronic device that allows the electronic device to respond to the voice command.	09-13-2012
20120245943	TRANSFORMING A NATURAL LANGUAGE REQUEST FOR MODIFYING A SET OF SUBSCRIPTIONS FOR A PUBLISH/SUBSCRIBE TOPIC STRING - A natural language request for modifying a set of subscriptions for one or more topics in a publish/subscribe topic hierarchy is received at a processing device. The natural language request includes a predetermined natural language element. The natural language request is transformed into a publish/subscribe topic string and the predetermined natural language element is transformed into a publish/subscribe symbol. The symbol represents one or more topics in the topic hierarchy. One or more subscriptions to one or more topics is modified based on the transformed topic string.	09-27-2012
20120245944	Intelligent Automated Assistant - The intelligent automated assistant system engages with the user in an integrated, conversational manner using natural language dialog, and invokes external services when appropriate to obtain information or perform various actions. The system can be implemented using any of a number of different platforms, such as the web, email, smartphone, and the like, or any combination thereof. In one embodiment, the system is based on sets of interrelated domains and tasks, and employs additional functionally powered by external services with which the system can interact.	09-27-2012
20120253820	Mixed-mode interaction - A user of a wireless device, such as a mobile phone, can make purchases or obtain information via a network, such as the Internet, using both voice and non-verbal methods. Users can submit voice queries and receive non-verbal replies, submit non-verbal queries and receive voice replies, or perform similar operations that marry the voice and data capabilities of modern mobile communication devices. The user may provide notification criteria indicating under what conditions a notification should be sent to the user's wireless device. When purchasing opportunities matching the selected notification criteria become available, the user is notified. The user can respond to the notification, and immediately take advantage of the purchasing opportunity if he so desires. Mixed-mode interactions can also be used by sellers to more advantageously control the marketing of distressed, time sensitive, or other merchandise/services.	10-04-2012
20120253821	VEHICULAR DEVICE AND METHOD FOR COMMUNICATING THE SAME WITH INFORMATION CENTER - A communication unit is connected through an external network with an information center that is capable of receiving analog data. A transmission unit transmits predetermined vehicle information in the form of a voice prompt of analog data when the communication unit is in connection with the information center.	10-04-2012
20120253822	Systems and Methods for Managing Prompts for a Connected Vehicle - A method for providing audio prompts via a service-providing remote center includes receiving a list of requested data from an on-board navigation system of a vehicle, and, for each item in the list of requested data, determining whether an audio prompt is available and delivering an associated audio prompt from the service-providing remote center over a data channel. Also provided is a method for obtaining audio prompts using a minimal amount of text-to-speech ports including determining a plurality of known data items, generating audio prompts for the plurality of known data items with a single text-to-speech engine using batch mode processing, obtaining an associated audio prompt for each of the known data items, and storing each associated audio prompt in a recording database.	10-04-2012
20120253823	Hybrid Dialog Speech Recognition for In-Vehicle Automated Interaction and In-Vehicle Interfaces Requiring Minimal Driver Processing - A system and method for implementing a server-based speech recognition system for multi-modal automated interaction in a vehicle includes receiving, by a vehicle driver, audio prompts by an on-board human-to-machine interface and a response with speech to complete tasks such as creating and sending text messages, web browsing, navigation, etc. This service-oriented architecture is utilized to call upon specialized speech recognizers in an adaptive fashion. The human-to-machine interface enables completion of a text input task while driving a vehicle in a way that minimizes the frequency of the driver's visual and mechanical interactions with the interface, thereby eliminating unsafe distractions during driving conditions. After the initial prompting, the typing task is followed by a computerized verbalization of the text. Subsequent interface steps can be visual in nature, or involve only sound.	10-04-2012
20120284030	Stateful, Double-Buffered Dynamic Navigation Voice Prompting - A navigation system written in J	11-08-2012
20120303372	ENABLING SECURE TRANSACTIONS BETWEEN SPOKEN WEB SITES - Techniques for enabling a secure transaction with a remote site that uses voice interaction are provided. The techniques include authenticating a remote site to enable a secure transaction, wherein authenticating the remote site comprises using a dynamically generated audio signal.	11-29-2012
20120310652	Adaptive Human Computer Interface (AAHCI) - An Adaptive Human-Computer Interface (AAHCI) allows an electronic system to automatically monitor and learn from normal in-use behavior exhibited by a human user via responses generated by the supported input devices and to adjust output to the supported output devices accordingly. This Auto-Learning process is different than computer-directed training sessions and takes place as the user begins to use the device for the first time and with repeated use over time. The purpose of AAHCI is to provide a user experience that is tailored to the skills, preferences, deficiencies and other personal attributes of the user automatically via machine-learned processes. This in turn provides an improved user experience that is more productive and cost efficient and that can automatically optimize itself over time with repeated use.	12-06-2012
20130006641	PROVIDING ANSWERS TO QUESTIONS USING LOGICAL SYNTHESIS OF CANDIDATE ANSWERS - A method, system and computer program product for generating answers to questions. In one embodiment, the method comprises receiving an input query, decomposing the input query into a plurality of different subqueries, and conducting a search in one or more data sources to identify at least one candidate answer to each of the subqueries. A ranking function is applied to each of the candidate answers to determine a ranking for each of these candidate answers; and for each of the subqueries, one of the candidate answers to the subquery is selected based on this ranking. A logical synthesis component is applied to synthesize a candidate answer for the input query from the selected candidate answers to the subqueries. In one embodiment, the procedure applied by the logical synthesis component to synthesize the candidate answer for the input query is determined from the input query.	01-03-2013
20130013317	METHOD AND APPARATUS FOR NAVIGATION OF A DIALOGUE SYSTEM - In one embodiment, the present disclosure is a method and apparatus for navigation of a dialogue system. In one embodiment, a method for facilitating navigation of a menu of a dialogue system includes encoding data including information for navigating the menu in a machine-readable data structure and outputting the machine-readable data structure.	01-10-2013
20130046543	INTERACTIVE VOICE RESPONSE (IVR) SYSTEM FOR ERROR REDUCTION AND DOCUMENTATION OF MEDICAL PROCEDURES - Interactive voice response (IVR) systems and methods for delivery of healthcare services (e.g., by one or more medical professionals, such as, for example, in a hospital or clinic). In some embodiments, the present systems can be configured to: prompt one or more users for a plurality of voice inputs with information associated with at least one of a patient and a user; and determine whether each of the plurality of voice inputs is consistent with records related to the patient or the one or more users. In some embodiments, the present systems can be configured to: during performance of a procedure on a patient, prompt one or more users to provide a plurality of voice inputs with information related to progress of the procedure or characteristics of the patient; and/or prompt the user to perform each of a plurality of steps of the procedure.	02-21-2013
20130054245	System and Method to Search a Media Content Database Based on Voice Input Data - A method includes initiating a call from an interactive voice response (IVR) system to a first device associated with a user in response to a request. The method includes receiving voice input data at the IVR system via the call. The method also includes performing a search of a media content database based at least partially on the voice input data. The method further includes sending search results identifying media content items based on the search of the media content database to a second device associated with the user.	02-28-2013
20130066633	Providing Audio-Activated Resource Access for User Devices - Methods and computer systems for providing audio-activated resource access for user devices are provided. In at least one embodiment, a computer system may comprise a processor and a memory coupled to the processor. The memory may store instructions to cause the processor to perform operations comprising capturing audio at a user device. The operations may also comprise using a speech-to-text converter to convert speech transmitted over the audio into text and transmitting the text to a server system to determine a corresponding keyword or phrase. The operations may also comprise receiving a resource corresponding to the keyword or phrase.	03-14-2013
20130066634	Automated Conversation Assistance - Methods, apparatuses, systems, and computer-readable media for providing automated conversation assistance are presented. According to one or more aspects, a computing device may obtain user profile information associated with a user of the computing device, the user profile information including a list of one or more words that have previously been detected in one or more previously captured speeches associated with the user. Subsequently, the computing device may select, based on the user profile information, one or more words from a captured speech for inclusion in a search query. Then, the computing device may generate the search query based on the selected one or more words.	03-14-2013
20130066635	APPARATUS AND METHOD FOR CONTROLLING HOME NETWORK SERVICE IN PORTABLE TERMINAL - An apparatus and a method, which set a remote control command for controlling a home network service in a portable terminal are provided. The apparatus includes a memory for storing configuration types of a remote control command in a set order in a home network service; and a controller for setting the remote control command including the input configuration types of the remote control command and transmitting the remote control command, when the configuration types of the remote control command are input in the set order in the home network service.	03-14-2013
20130096923	System and Method of Dynamically Modifying a Spoken Dialog System to Reduce Hardware Requirements - A system and method for providing a scalable spoken dialog system are disclosed. The method comprises receiving information which may be internal to the system or external to the system and dynamically modifying at least one module within a spoken dialog system according to the received information. The modules may be one or more of an automatic speech recognition, natural language understanding, dialog management and text-to-speech module or engine. Dynamically modifying the module may improve hardware performance or improve a specific caller's speech processing accuracy, for example. The modification of the modules or hardware may also be based on an application or a task, or based on a current portion of a dialog.	04-18-2013
20130096924	Apparatus and Method for Processing Service Interactions - An interactive voice and data response system that directs input to a voice, text, and web-capable software-based router, which is able to intelligently respond to the input by drawing on a combination of human agents, advanced speech recognition and expert systems, connected to the router via a TCP/IP network. The digitized input is broken down into components so that the customer interaction is managed as a series of small tasks performed by a pool of human agents, rather than one ongoing conversation between the customer and a single agent. The router manages the interactions and keeps pace with a real-time conversation. The system utilizes both speech recognition and human intelligence for purposes of interpreting customer utterances or customer text, where the role of the human agent(s) is to input the intent of caller utterances, and where the computer system—not the human agent—determines which response to provide given the customer's stated intent (as interpreted/captured by the human agents). The system may use more than one human agent, or both human agents and speech recognition software, to interpret simultaneously the same component for error-checking and interpretation accuracy.	04-18-2013
20130103403	RESPONDING TO A CALL TO ACTION CONTAINED IN AN AUDIO SIGNAL - An audio signal is monitored to detect the presence of a call to action contained therein. Addressing information is automatically extracted from the call to action and stored on a storage medium. An electronic message responding to the call to action may be automatically prepared, or a contact field may be automatically populated for inclusion in a contact list. The audio signal may be digitized or obtained from a broadcast transmission, and the process may be performed by a mobile communication device, a central system, or a combination thereof.	04-25-2013
20130110513	Platform for Sharing Voice Content	05-02-2013
20130110514	INFORMATION PROCESSING APPARATUS AND INFORMATION PROCESSING METHOD	05-02-2013
20130110515	Disambiguation Based on Active Input Elicitation by Intelligent Automated Assistant	05-02-2013
20130132090	Voice Data Retrieval System and Program Product Therefor - A voice data retrieval system including an inputting device of inputting a keyword, a phoneme converting unit of converting the inputted keyword in a phoneme expression, a voice data retrieving unit of retrieving a portion of a voice data at which the keyword is spoken based on the keyword in the phoneme expression, a comparison keyword creating unit of creating a set of comparison keywords having a possibility of a confusion of a user in listening to the keyword based on a phoneme confusion matrix for each user, and a retrieval result presenting unit of presenting a retrieval result from the voice data retrieving unit and the comparison keyword from the comparison keyword creating unit to a user.	05-23-2013
20130138443	VOICE-SCREEN ARS SERVICE SYSTEM, METHOD FOR PROVIDING SAME, AND COMPUTER-READABLE RECORDING MEDIUM - A method for providing a voice-screen ARS service on a terminal, according to an embodiment of the present invention, uses an application installed on the terminal to connect to an IVR system of a client company via a voice call and connects a data call to a VARS service server. Menu information including a plurality of menu items related to a client is received through a data call and displayed on a screen, and voice information related to the menu is received through a voice call and output in audio. Accordingly, when a user uses the ARS, both voice and onscreen information services are provided simultaneously, which decreases the limitations and inaccuracies of the provided voice information and increases user convenience.	05-30-2013
20130144628	VOICE INTERFACE TO NFC APPLICATIONS - Technologies for transferring Near Field Communications information on a computing device include storing information corresponding to services in a database on the computing device, receiving a voice input corresponding to a name of a requested service, and retrieving the information corresponding to the requested service from the database. Such technologies may also include loading the retrieved information corresponding to the requested service into a Near Field Communications tag emulated by the computing device and transferring the retrieved information to a portable computing device in response to the Near Field Communications tag being touched by a Near Field Communications reader of the portable computing device. The information corresponding to the requested service stored in the database, retrieved from the database, loaded into the Near Field Communications tag, and/or transferred to the portable computing device may include a Universal Resource Identifier and content-specific keywords corresponding to the requested service.	06-06-2013
20130204626	Method and Apparatus for Setting Selected Recognition Parameters to Minimize an Application Cost Function - Methods and systems for setting selected automatic speech recognition parameters are described. A data set associated with operation of a speech recognition application is defined and includes: i. recognition states characterizing the semantic progression of a user interaction with the speech recognition application, and ii. recognition outcomes associated with each recognition state. For a selected user interaction with the speech recognition application, an application cost function is defined that characterizes an estimated cost of the user interaction for each recognition outcome. For one or more system performance parameters indirectly related to the user interaction, the parameters are set to values which optimize the cost of the user interaction over the recognition states.	08-08-2013
20130204627	Systems and Methods for Off-Board Voice-Automated Vehicle Navigation - A system for selecting music includes a mobile system for processing and transmitting through a wireless link a continuous voice stream spoken by a user of the mobile system, the continuous voice stream including a music request, and a data center for processing the continuous voice stream received through the wireless link into voice music information. The data center can perform automated voice recognition processing on the voice music information to recognize music components of the music request, confirm the recognized music components through interactive speech exchanges with the mobile system user through the wireless link and the mobile system, selectively allow human data center operator intervention to assist in identifying the selected recognized music components having a recognition confidence below a selected threshold value, and download music information pertaining to the music request for transmission to the mobile system derived from the confirmed recognized music components.	08-08-2013
20130211841	Multi-Dimensional Interactions and Recall - Methods for initiating actions based on analysis of multi-dimensional interactions are presented. Electronic devices can acquire sensor data representing interactions among multiple entities. Analysis engines can use the interaction data to create or otherwise manage interaction guide queues based on conceptual threads associated with the interactions. Interaction guides within the queue comprise instructions, possibly domain-specific instructions, for devices to participate in the interactions. Contemplated engines manage the queues as a function of attributes, for example priority, derived from the interactions.	08-15-2013
20130246069	System and Method of Providing a Spoken Dialog Interface to a Website - Disclosed is a method for training a spoken dialog service component from website data. Spoken dialog service components typically include an automatic speech recognition module, a language understanding module, a dialog management module, a language generation module and a text-to-speech module. The method includes selecting anchor texts within a website based on a term density, weighting those anchor texts based on a percent of salient words to total words, and incorporating the weighted anchor texts into a live spoken dialog interface, the weights determining a level of incorporation into the live spoken dialog interface.	09-19-2013
20130253936	MEMORY AID DEVICE - A device for aiding memory is provided having a series of actuators which are independently actuatable by a user. The device includes means for enabling a plurality of audible messages to be created by the user that are each assignable to one of the actuators. Each audible message is playable by actuation of the actuator to which it is assigned.	09-26-2013
20130262123	Computer-Readable Medium, System and Method of Providing Domain-Specific Information - A computer readable storage medium embodies instructions that, when executed by a processor, cause the processor to perform a method including receiving a natural language request corresponding to an audio input associated with a user. The computer-readable storage medium further embodies instructions that, when executed, cause the processor to retrieve account information associated with the user from a domain-specific data source through a network based on the natural language request using an application configurable to retrieve account data from selected ones of a plurality of domain-specific data sources, process the account information based on the natural language request to produce output information, and provide the output information to an output interface.	10-03-2013
20130262124	"AT LEAST" OPERATOR FOR COMBINING AUDIO SEARCH HITS - System and method to search audio data, including: receiving audio data representing speech; receiving a search query related to the audio data; compiling, by use of a processor, the search query into a hierarchy of scored speech recognition sub-searches; searching, by use of a processor, the audio data for speech identified by one or more of the sub-searches to produce hits; and combining, by use of a processor, the hits by use of at least one combination function to provide a composite search score of the audio data. The combination function may include an at-least-M-of-N function that produces a high score when at least M of N function inputs exceed a predetermined threshold value. The composite search score may employ a soft time window such as a spline function.	10-03-2013
20130262125	KNOWLEDGE REPOSITORY - A knowledge storage system is described. A specific embodiment is a computer system comprising a knowledge base of general knowledge in structured form which can be added to and queried by untrained users. Various embodiments include the facility for remote computers to access the knowledge stored in the system, natural language questions to be answered, profile screens giving general knowledge about an object in the system, and methods for distinguishing between reliable and unreliable facts.	10-03-2013
20130297317	METHOD FOR OFFERING SUGGESTION DURING CONVERSATION, ELECTRONIC DEVICE USING THE SAME, AND NON-TRANSITORY STORAGE MEDIUM - A method for offering suggestion during conversation, an electronic device using the same, and a non-transitory storage medium are provided. The method includes listening to a conversation on a first electronic device and a second electronic device, and determining whether the conversation satisfies a recommendation criterion. The method also includes determining whether at least one suggestion information exists in a database if the conversation satisfies the recommendation criterion. The method further includes displaying at least one suggestion option related to the at least one suggestion information on the first electronic device if the at least one suggestion information exists in the database.	11-07-2013
20130304477	Computer, Internet and Telecommunications Based Network - A method and apparatus for a computer and telecommunication network which can receive, send and manage information from or to a subscriber of the network, based on the subscriber's configuration. The network is made up of at least one cluster containing voice servers which allow for telephony, speech recognition, text-to-speech and conferencing functions, and is accessible by the subscriber through standard telephone connections or through internet connections. The network also utilizes a database and file server allowing the subscriber to maintain and manage certain contact lists and administrative information. A web server is also connected to the cluster thereby allowing access to all functions through internet connections.	11-14-2013
20130317826	APPARATUSES, METHODS AND SYSTEMS FOR A DIGITAL CONVERSATION MANAGEMENT PLATFORM - The APPARATUSES, METHODS AND SYSTEMS FOR A DIGITAL CONVERSATION MANAGEMENT PLATFORM (“DCM-Platform”) transforms digital dialogue from consumers, client demands, and Internet search inputs via DCM-Platform components into tradable digital assets and client-needs-based artificial intelligence campaign plan outputs. In one implementation, the DCM-Platform may capture and examine conversations between individuals and artificial intelligence conversation agents. These agents may be viewed as assets. One can measure the value and performance of these agents by assessing their performance and ability to generate revenue from prolonging conversations and/or ability to effect sales through conversations with individuals.	11-28-2013
20130332172	TRANSMITTING DATA FROM AN AUTOMATED ASSISTANT TO AN ACCESSORY - An accessory is configured to receive a request. The accessory transmits information associated with the request to a portable device. An automated assistant application executed by the portable device can interpret the request and provide a report. The portable device can transmit the report to the accessory. The report may include one or more results determined by the automated assistant.	12-12-2013
20130339024	PARKING LOT SYSTEM - A parking lot system includes a camera that photographs an image including a number plate of a vehicle, and generates a first image; an image analysis section that performs a number analysis processing to analyze the first image and acquire number information described on the number plate of the vehicle; a storage section that stores the number information and parked position information of the vehicle associated with one another; a speech acquisition section to acquire the voice of the user; a speech recognition section that performs a speech recognition processing on the voice; and an output section that performs an output processing to retrieve the number information from the storage section based on the result of the speech recognition processing and notifies the user of the parked position information.	12-19-2013
20130346083	Computer-Implemented System And Method For User-Controlled Processing Of Audio Signals - A computer-implemented system and method for user-controlled processing of audio signals is provided. An audio signal including a reference segment and a segment preceding the reference segment is obtained. A value q is received from a user. Audio buffers in the preceding segment are defined, each having a width of N samples and a starting point a unique number of samples away from the preceding segment's start, based on a division of N by q. One or more of the buffers are transformed into discrete Fourier transform (DFT) buffers. A signature of the signal is generated using at least a portion of the reference segment and at least one of the DFT buffers. A new audio signal is received and a DFT for the audio signal is generated. The new audio signal is determined to match the audio signal based on a comparison of the DFT to the signature.	12-26-2013
20140012585	DISPLAY APPARATUS, INTERACTIVE SYSTEM, AND RESPONSE INFORMATION PROVIDING METHOD - A display apparatus includes a voice collecting device which collects a user voice, a communication device which performs communication with an interactive server, and a control device which, when response information corresponding to the user voice sent to the interactive server is received from the interactive server, controls to perform a feature corresponding to the response information, and the control device controls the communication device to receive replacement response information, related to the user voice, through a web search and a social network service (SNS).	01-09-2014
20140032219	Parsimonious Protection of Sensitive Data in Enterprise Dialog Systems - In one embodiment, a method comprises classifying a representation of audio data of a dialog turn in a dialog system to a classification. The method may further comprise taking a security action on the classified representation of the audio data of the dialog turn as a function of the classification. The security action can be suppressing the representation of the audio data, encrypting the representation of the audio data, releasing the representation of the audio data, partially suppressing the representation of the audio data, partially encrypting the representation of the audio data, partially releasing the representation of the audio data, or a command.	01-30-2014
20140052449	ESTABLISHING A MULTIMODAL ADVERTISING PERSONALITY FOR A SPONSOR OF A MULTIMODAL APPLICATION - Establishing a multimodal advertising personality for a sponsor of a multimodal application, including associating one or more vocal demeanors with a sponsor of a multimodal application and presenting a speech portion of the multimodal application for the sponsor using at least one of the vocal demeanors associated with the sponsor.	02-20-2014
20140067402	DISPLAYING ADDITIONAL DATA ABOUT OUTPUTTED MEDIA DATA BY A DISPLAY DEVICE FOR A SPEECH SEARCH COMMAND - A speech search method performed by a display device, the method including outputting media data including audio data, receiving a speech search command for additional data about the outputted media data from a user, the speech search command including at least one query word, determining whether the at least one query word matches a query term that is full and searchable, when the at least one query word matches the query term that is full and searchable, performing a search for the additional data using the query term, and when the at least one query word does not match the query term that is full and searchable, determining the query term from a predetermined amount of the audio data prior to receiving the speech search command and performing the search for the additional data using the query term.	03-06-2014
20140088971	System And Method For Voice Operated Communication Assistance - A system and method for voice operated communication assistance. Embodiments of the invention may include detecting a predetermined command from a first user making a call on a communication device; redirecting the call to a hosted application; notifying the first user that the hosted application is ready to accept a command; receiving data from a second user; and converting the data from the second user to a voice signal and sending the voice signal to the first user. The commands may be voice commands.	03-27-2014
20140095167	SYSTEMS AND METHODS FOR PROVIDING A VOICE AGENT USER INTERFACE - Some embodiments relate to techniques performed at least in part by at least one voice agent executing on a computing device. The techniques comprise responsive to the at least one voice agent receiving input at least partially specifying a requested action to be performed at least partially by an application program, presenting visual feedback responsive to the input concurrently via a user interface of the at least one voice agent and a user interface of the application program.	04-03-2014
20140095168	SYSTEMS AND METHODS FOR PROVIDING A VOICE AGENT USER INTERFACE - Some embodiments provide techniques performed by at least one voice agent. The techniques include receiving voice input; accessing contextual information related to an application program that has focus of the computing device when the voice input is received; and using the contextual information to interpret the received voice input.	04-03-2014
20140100853	Interactive Voice Response System - An interactive voice response system, comprising: a processor configured to control the output of voice prompts for transmission to a user; an alphanumeric string generator controllable by the processor to generate a random or pseudo-random alphanumeric string for outputting by the processor to a user in natural language form; an input module for receiving a user response and configured to recognize alphanumeric characters in the user response and to output a recognized string of one or more alphanumeric characters recognized in the user response; and a validation module. The validation module is configured to receive the generated alphanumeric string from the alphanumeric string generator and the recognized string corresponding to the generated alphanumeric string from the input module, to compare the generated alphanumeric string with the recognized string, to determine whether the recognized string matches the generated alphanumeric string, and to output validation data in response to determining that the recognized string matches the generated alphanumeric string.	04-10-2014
20140108017	Multi-Tiered Voice Feedback in an Electronic Device - This invention is directed to providing voice feedback to a user of an electronic device. Because each electronic device display may include several speakable elements (i.e., elements for which voice feedback is provided), the elements may be ordered. To do so, the electronic device may associate a tier with the display of each speakable element. The electronic device may then provide voice feedback for displayed speakable elements based on the associated tier. To reduce the complexity in designing the voice feedback system, the voice feedback features may be integrated in a Model View Controller (MVC) design used for displaying content to a user. For example, the model and view of the MVC design may include additional variables associated with speakable properties. The electronic device may receive audio files for each speakable element using any suitable approach, including for example by providing a host device with a list of speakable elements and directing a text to speech engine of the host device to generate and provide the audio files.	04-17-2014
20140122083	CHATBOT SYSTEM AND METHOD WITH CONTEXTUAL INPUT AND OUTPUT MESSAGES - A chatbot system and method with contextual input/output messages. A chatbot includes a processor, an interactive dialog interface and a knowledge database. The system uses a script file to display input and output messages in a tree format. An initial input or output message is stored. An identifier is assigned to the initial input or output message that is then used as context for the subsequent input/output messages by associating and storing the identifier with the subsequent input/output messages. The relationship between the first input or output message and subsequent input/output messages define a parent-child relationship that is displayable via the script file.	05-01-2014
20140129231	AUTHENTICATION BASED ON SOUND PROXIMITY - A computer program product comprises computer usable program code for receiving data describing a proposed electronic transaction between first and second communications devices. Additional computer usable program code is provided for generating a first audio signal by sound detected by a first microphone of the first communications device, and for generating a second audio signal by sound detected by a second microphone that is part of the second communications device. Still further computer usable program code provides for authenticating that the first communications device and the second communications device are in the same proximity in response to determining that the first and second audio signals were produced by the same sound event, and for completing the proposed electronic transaction between the first and second communications device in response to authenticating that the first and second communications devices are in close proximity.	05-08-2014
20140142948	SYSTEMS AND METHODS FOR IN-VEHICLE CONTEXT FORMATION - Systems, methods, and computer program products directed to in-vehicle context formation are described. Data from one or more sources associated with a vehicle may be received. Context information may be identified, based upon, at least in part, the received data. Audio captured from the vehicle may be received. The context information may be processed based upon, at least in part, at least one of the data from the one or more sources or the received audio.	05-22-2014
20140149121	Method of Handling Frequently Asked Questions in a Natural Language Dialog Service - A voice-enabled help desk service is disclosed. The service comprises an automatic speech recognition module for recognizing speech from a user, a spoken language understanding module for understanding the output from the automatic speech recognition module, a dialog management module for generating a response to speech from the user, a natural voices text-to-speech synthesis module for synthesizing speech to generate the response to the user, and a frequently asked questions module. The frequently asked questions module handles frequently asked questions from the user by changing voices and providing predetermined prompts to answer frequently asked questions.	05-29-2014
20140195243	DISPLAY APPARATUS AND METHOD FOR CONTROLLING THE DISPLAY APPARATUS - An electronic apparatus is provided, which includes an output, a voice collector configured to collect a user voice, and a controller configured to control the output to output a system response corresponding to the user voice, in which the controller is further configured to control the output such that a voice command guide applicable to a current situation of the electronic apparatus is outputted.	07-10-2014
20140195244	DISPLAY APPARATUS AND METHOD OF CONTROLLING DISPLAY APPARATUS - An electronic apparatus includes: an output; a voice collector configured to collect a voice of a user; a first communicator configured to transmit the voice of the user to a first server and receive text information corresponding to the voice of the user from the first server; a second communicator configured to transmit the received text information to a second server; and a controller configured to, in response to response information corresponding to the text information being received from the second server, control the output to output a system response, differentiated according to an utterance intention included in the voice of the user, based on the response information. The utterance intention relates to a search for content or a recommendation of content.	07-10-2014
20140200896	IMAGE PROCESSING APPARATUS, CONTROL METHOD THEREOF, AND IMAGE PROCESSING SYSTEM - An image processing apparatus includes an image processor; an audio input to input a user's speech; a storage to store at least one simple sentence voice command and an operation corresponding to the simple sentence voice command; a communication device to communicate with a server that analyzes a descriptive sentence voice command and determine an operation corresponding to the descriptive sentence voice command; an audio processor to process a first voice command corresponding to the speech and conduct the operation corresponding to the simple sentence voice command if the first voice command is the simple sentence voice command, and to transmit the first voice command to the communication device if the first voice command is not the simple sentence voice command; and a controller configured to display a first guide image which recommends the simple sentence voice command stored in the storage if the corresponding operation for the first voice command determined by the server is identical to one of the at least one simple sentence voice command stored in the storage.	07-17-2014
20140207464	Systems and Techniques for Producing Spoken Voice Prompts - Methods and systems are described in which spoken voice prompts can be produced in a manner such that they will most likely have the desired effect, for example to indicate empathy, or produce a desired follow-up action from a call recipient. The prompts can be produced with specific optimized speech parameters, including duration, gender of speaker, and pitch, so as to encourage participation and promote comprehension among a wide range of patients or listeners. Upon hearing such voice prompts, patients/listeners can know immediately when they are being asked questions that they are expected to answer, and when they are being given information, as well as the information that is considered sensitive.	07-24-2014
20140214427	LANDMARK BASED POSITIONING WITH VERBAL INPUT - Disclosed are systems, apparatus, devices, methods, computer program products, and other implementations, including a method that includes determining at a mobile device whether verbal input from a user is required to determine position of location of the user. The method also includes, in response to a determination that the verbal input from the user is required to determine the position of the location of the user, obtaining at the mobile device verbal description data representative of one or more geographic features viewable by the user from the location of the user, identifying at the mobile device the one or more of the geographic features from the obtained verbal description data, and determining, at the mobile device, positioning information for the location of the user based, at least in part, on the one or more geographic features identified from the verbal description data.	07-31-2014
20140278434	Methods and Apparatus for Message Playback - A system for playback of messages. Context appropriate messages for an environment may be played back. Messages may be user behavior interactive and subject to user behavior initiated message playback conditions. User generated environment events may be automatically analyzed and user behavior interactive messages may be automatically coordinated. An automated themed message playback apparatus may have a self-contained housing within which a stored themed message, an in situ user generated environment event sensor, and an automated themed message playback device are housed. User generated environment events may be automatically sensed in situ.	09-18-2014
20140310000	SPOTTING AND FILTERING MULTIMEDIA - In an aspect, in general, a computer implemented method includes receiving a query phrase, receiving a first data representing a first audio signal including an interaction among a number of speakers and at least one segment of one or more known audio items, receiving a second data comprising temporal locations of the at least one segment of one or more known audio items in the first audio signal, and searching the first data to identify putative instances of the query phrase that are temporally excluded from the temporal locations of the at least one segment of one or more known audio items.	10-16-2014
20140310001	Using Intents to Analyze and Personalize a User's Dialog Experience with a Virtual Personal Assistant - A virtual personal assistant (VPA) application analyzes intents to, among other things, enhance or personalize a user's dialog experience with the VPA application. A set of intents, or multiple sets of intents, are maintained over the course of one or more user-specific dialog sessions with the VPA. Inferences may be derived from the set or sets of intents and incorporated into a current or future dialog session between the VPA and a user of the VPA application. In some embodiments, the inferences are only made available through the systemic understanding of natural language discourse by the VPA.	10-16-2014
20140310002	Providing Virtual Personal Assistance with Multiple VPA Applications - The activities of multiple virtual personal assistant (VPA) applications are coordinated. For example, different portions of a conversational natural language dialog involving a user and a computing device may be handled by different VPAs.	10-16-2014
20140310003	System and Method for Improving Name Dialer Performance - Disclosed herein are systems, methods, and computer readable-media for improving name dialer performance. The method includes receiving a speech query for a name in a directory of names, retrieving matches to the query, if the matches are uniquely spelled homophones or near-homophones, identifying information that is unique to all retrieved matches, and presenting a spoken disambiguation statement to a user that incorporates the identified unique information. Identifying information can include multiple pieces of unique information if necessary to completely disambiguate the matches. A hierarchy can establish priority of multiple pieces of unique information for use in the spoken disambiguation statement.	10-16-2014
20140343946	Storing State Information From Network-Based User Devices - Network-based services may be provided to a user through the use of a speech-based user device located within a user environment. The speech-based user device may accept speech commands from a user and may also interact with the user by means of generated speech. Operating state of the speech-based user device may be provided to the network-based service and stored by the service. Applications that provide services through the speech-based interface may request and obtain the stored state information.	11-20-2014
20140343947	METHODS AND SYSTEMS FOR MANAGING DIALOG OF SPEECH SYSTEMS - Methods and systems are provided for managing speech dialog of a speech system. In one embodiment, a method includes: receiving at least one first utterance from a user of the speech system; determining a user interaction style based on the at least one first utterance; and generating feedback to the user based on the interaction style.	11-20-2014
20140343948	SYSTEM AND METHOD FOR PROVIDING NETWORK COORDINATED CONVERSATIONAL SERVICES - A system and method for providing automatic and coordinated sharing of conversational resources, e.g., functions and arguments, between network-connected servers and devices and their corresponding applications. In one aspect, a system for providing automatic and coordinated sharing of conversational resources includes a network having a first and second network device, the first and second network device each comprising a set of conversational resources, a dialog manager for managing a conversation and executing calls requesting a conversational service, and a communication stack for communicating messages over the network using conversational protocols, wherein the conversational protocols establish coordinated network communication between the dialog managers of the first and second network device to automatically share the set of conversational resources of the first and second network device, when necessary, to perform their respective requested conversational service.	11-20-2014
20140358550	APPARATUS AND METHOD FOR PROVIDING AUGMENTED REALITY SERVICE USING SOUND - A method and an apparatus are provided for providing additional information service in a mobile communication terminal. A microphone receives a sound signal including an audible frequency band and an inaudible frequency band. Additional information related to the service is detected in the inaudible frequency band included in the sound signal. The detected additional information is extracted from the sound signal. Data for providing the service is acquired based on the extracted additional information. A service screen is displayed based on the acquired data.	12-04-2014
20140365223	Virtual Assistant Conversations - A virtual assistant may communicate with a user in a natural language that simulates a human. The virtual assistant may be associated with a human-configured knowledge base that simulates human responses. In some instances, a parent response may be provided by the virtual assistant and, thereafter, a child response that is associated with the parent response may be provided.	12-11-2014
20140372125	DATA COMMUNICATION NETWORK FOR PROCESSING DATA TRANSACTION - A data transaction processing system in which transaction data is entered by the user in response to prompts in a template which is tailored to each user application. The template and entered data are accumulated into data transactions which are immediately transmitted upon completion to an external database server for processing and storage. The data transactions are not locally stored for processing, and no conventional operating system is necessary. No local processing needs to be provided, and the only local storage is flash PROM which stores the control firmware, a flash memory which stores the data streams making up the forms and menus, and a small RAM which operates as an input/output transaction buffer for storing the data streams of the template and the user replies to the prompts during assembly of a data transaction. The data transaction is received via standard protocols at a database server which, depending upon the application, stores the entire data transaction, explodes the data transaction to produce ancillary records which are then stored, and/or forwards the data transaction or some or all of the ancillary records to other database servers for updating other databases associated with those database servers. Also, in response to requests from the transaction entry device, the database server may return data streams for use in completing the fields in the data transaction or in presenting a menu on the display which was read in from the database server or a remote phone mail system. The transaction entry device is integrated with a telephone and is accessed via a touch screen, an optional keyboard, a magnetic card reader, voice entry, a modem, and the like.	12-18-2014
20140372126	PRIVACY MODE FOR ALWAYS-ON VOICE-ACTIVATED INFORMATION ASSISTANT - A user device and method discriminately provides audible responses to a voice command received by a user device that supports voice activation. The method includes detecting a first pre-established, audible activation command that activates the user device. In response to detecting the first pre-established, audible activation command, the method includes producing a first audible acknowledgement within loudspeaker proximity of the user device and then monitoring for detection of at least one second, audible acknowledgement produced by another user device within a pre-set time interval, which detection would indicate that the other user device is also responding. The method includes processing and responding to a received audible command in response to not detecting. However, in response to detecting, the method includes triggering entry into a privacy mode of audible command input and producing a privacy mode announcement via at least one of a display and a sound producing component.	12-18-2014
20150012278	SYSTEM AND METHOD FOR TRANSFERRING DATA TO A CUSTOMER RELATIONSHIP MANAGEMENT PLATFORM - A system and method for transferring data to a customer relationship management platform. A system for transferring data to a customer relationship management platform includes a server computer which is adapted to connect to a network and further includes an application residing on the server. The application may be configured to include the customer relationship platform. The application may also be configured to communicate with a communication device, such as a mobile phone, for example, through the network. The application may detect a voice command from a user using the communication device, then connect the user to the customer relationship management platform. The application may then take an action with the customer relationship management platform based on the voice command, such as making a sales call, populating the customer relationship management platform database with sales data or prospective customer data, or taking some administrative action.	01-08-2015
20150019228	AUTOMATED CONFIRMATION AND DISAMBIGUATION MODULES IN VOICE APPLICATIONS - A method for providing a voice application includes executing control flow logic modeling a dialog flow with a user via a voice browser. The control flow logic produces a disambiguation requirement. A disambiguation module is initiated and a set of at least two candidates and partitioning criteria is sent from the control flow logic to the module. Attributes of the candidates are analyzed to determine a partitioning score for each attribute indicative of ability to distinguish between candidates based on the partitioning criteria. The attributes are sorted based on the partitioning scores. The user is queried based on a top-sorted attribute and results of the query are used to reduce the set of candidates. The steps of analyzing, sorting, and querying are repeated until the set of candidates is reduced to a single candidate. The single candidate is returned to the control flow logic for continued execution.	01-15-2015
20150046167	SYSTEM AND METHOD FOR FUNNELING USER RESPONSES IN AN INTERNET VOICE PORTAL SYSTEM TO DETERMINE A DESIRED ITEM OR SERVICE - A method of funneling user responses in a voice portal system to determine a desired item or service includes (a) querying a user for an attribute value associated with a first particular attribute of the desired item or service; and (b) determining if the attribute value given by the user satisfies an end state. If the end state is not satisfied, steps (a) and (b) are performed with a new particular attribute.	02-12-2015
20150120304	SPEAKING CONTROL METHOD, SERVER, SPEAKING DEVICE, SPEAKING SYSTEM, AND STORAGE MEDIUM - A speaking control method including a switching step of switching between answer options for an answer to a user in a case where a sound level of target audio data falls within a first predetermined sound-level range, the answer options being associated with a case where audio data content indicated by the target audio data is recognized and a case where the audio data content is not recognized, respectively.	04-30-2015
20150348565	DETERMINING DOMAIN SALIENCE RANKING FROM AMBIGUOUS WORDS IN NATURAL SPEECH - Systems and processes for identifying relevant domains for user inputs that include one or more ambiguous words are disclosed. The ambiguous words include words that may or may not refer to a named entity, such as a song, movie, book, etc. In one example, a textual representation of user speech can be received and processed to identify a candidate named entity. The possible parts of speech of the candidate named entity can be determined and compared to a predetermined set of parts of speech. In response to determining that the possible parts of speech of the candidate named entity do not include one or more of the predetermined set of parts of speech, a saliency score associated with the candidate named entity can be lowered. A domain for processing the textual representation of user speech can then be identified using the saliency score associated with the candidate named entity.	12-03-2015
20150364138	COMPUTER-GENERATED SPEECH DEVICE FOR SITE SURVEY AND MAINTENANCE - Computer-generated speech devices for site survey and maintenance, and methods of using the same are described herein. One computer-generated speech device includes a location engine to determine a location of the computer-generated speech device at a site, a solution engine to identify an action to perform associated with one of a plurality of nodes at the site using the location of the computer-generated speech device and input data associated with the plurality of nodes, and a speech engine to broadcast the identified action as computer-generated speech using a speaker component of the computer-generated speech device for a user to perform during a survey and/or maintenance of the site.	12-17-2015
20150371663	PERSONALITY-BASED INTELLIGENT PERSONAL ASSISTANT SYSTEM AND METHODS - The methods, apparatus, and systems described herein assist a user with a request. The methods include receiving at least one input from a user, entering the at least one input into an algorithm trained to output a personality type of the user, and tailoring an output based on the personality type.	12-24-2015
20150371664	REMOTE INVOCATION OF MOBILE DEVICE ACTIONS - Systems, methods and apparatus for invoking actions at a second user device from a first user device. A method includes determining that a first user device has an associated second user device; accessing specification data that specifies a set of user device actions that the second user device is configured to perform; receiving command inputs for the first user device; for each command input, determining whether the command input resolves to one of the user device actions; for each command input not determined to resolve to any of the user device actions, causing the command input to be processed at the first user device; and for each command input determined to resolve to one of the user device actions, causing the first user device to display in a user interface a dialog by which a user may either accept or deny invoking the user device action at the second user device.	12-24-2015
20160042737	VOICE ASSISTANT SYSTEM - Methods and apparatuses to assist a user in the performance of a plurality of tasks are provided. The invention includes storing at least one care plan for a resident, the care plan defining a plurality of tasks to be performed for providing care to the resident. The method includes capturing speech inputs from the user and providing speech outputs to the user to provide a speech dialog with the user reflective of the care plan. Information is captured with a contactless communication interface and is used for engaging the care plan.	02-11-2016
20160042749	SOUND OUTPUT DEVICE, NETWORK SYSTEM, AND SOUND OUTPUT METHOD - Provided herein is a sound output device	02-11-2016
20160071518	Service Oriented Speech Recognition for In-Vehicle Automated Interaction and In-Vehicle User Interfaces Requiring Minimal Cognitive Driver Processing for Same - A system and method for implementing a server-based speech recognition system for multimodal automated interaction in a vehicle includes receiving, by a vehicle driver, audio prompts by an on-board human-to-machine interface and a response with speech to complete tasks such as creating and sending text messages, web browsing, navigation, etc. This service-oriented architecture is utilized to call upon specialized speech recognizers in an adaptive fashion. The human-to-machine interface enables completion of a text input task while driving a vehicle in a way that minimizes the frequency of the driver's visual and mechanical interactions with the interface, thereby eliminating unsafe distractions during driving conditions. After the initial prompting, the typing task is followed by a computerized verbalization of the text. Subsequent interface steps can be visual in nature, or involve only sound.	03-10-2016
20160098521	Data Encoding and Retrieval System and Method - System platform, software and hardware equipment and components, and methodologies are provided for generating, organizing, storing and retrieving medical records using voice recognition in combination with unique codes assigned to data elements, and include microprocessor and memory, such as non-transient computer readable medium, having stored thereon a database including vocabulary terms. Speech recognition interface receives spoken language. Display generates an output according to vocabulary terms uniquely associated with the spoken language. Data stored in the database can include records organized into specific modules having specified vocabulary terms synced with each module and unique computer code to key vocabulary terms in the database. Using an associated unique code can cause specific data field to open on display when recognizing specific spoken word or phrase by the speech recognition interface.	04-07-2016
20160163312	DISAMBIGUATING HETERONYMS IN SPEECH SYNTHESIS - Systems and processes for disambiguating heteronyms in speech synthesis are provided. In one example process, a speech input containing a heteronym can be received from a user. The speech input can be processed using an automatic speech recognition system to determine a phonemic string corresponding to the heteronym as pronounced by the user in the speech input. A correct pronunciation of the heteronym can be determined based on at least one of the phonemic string or using an n-gram language model of the automatic speech recognition system. A dialogue response to the speech input can be generated where the dialogue response can include the heteronym. The dialogue response can be outputted as a speech output. The heteronym in the dialogue response can be pronounced in the speech output according to the correct pronunciation.	06-09-2016
20160179752	USING VOICE-BASED WEB NAVIGATION TO CONSERVE CELLULAR DATA	06-23-2016
20160180847	MOBILE TERMINAL PHOTOGRAPHING CONTROL METHOD AND SYSTEM BASED ON SMART WEARABLE DEVICE	06-23-2016
20160189714	METHOD AND APPARATUS FOR VOICE CONTROL OF A MOBILE DEVICE - A method and apparatus for voice control of a mobile device are provided. The method establishes a connection between the mobile device and a voice-control module. Responsive to establishing the connection, the mobile device enters into an intermediate mode; and the voice-control module monitors for verbal input comprising a verbal command from among a set of predetermined verbal commands. The voice-control module sends instructions to the mobile device related to the verbal command received; and the mobile device acts on the received instructions. An apparatus/voice control module (VCM) for voice control of a mobile device, wherein the VCM includes a connection module configured for establishing a connection between the VCM and the mobile device; a monitoring module configured for monitoring for a verbal command from among a set of predetermined verbal commands; and a communications module configured for sending instructions to the mobile device related to the verbal command received.	06-30-2016
20160379643	Group Status Determining Device and Group Status Determining Method - A group status determining device determining a status of a group made up of a plurality of speakers engaged in a conversation includes: a storage that stores determination criteria, based on conversation situational data with respect to a plurality of group types; and a processor configured to operate as: an acquisition module that acquires conversation situational data, which is data regarding a series of groups of utterances made by a plurality of speakers and estimated to be on a same conversation theme; and a determination module that acquires a type of the group made up of the plurality of speakers, based on the conversation situational data and the determination criteria as a group status of the group made up of the plurality of speakers.	12-29-2016
20180025724	NATURAL LANGUAGE VOICE ASSISTANT	01-25-2018
