Patent application number | Description | Published |
20110178798 | ADAPTIVE AMBIENT SOUND SUPPRESSION AND SPEECH TRACKING - A device for suppressing ambient sounds from speech received by a microphone array is provided. One embodiment of the device comprises a microphone array, a processor, an analog-to-digital converter, and memory comprising instructions stored therein that are executable by the processor. The instructions stored in the memory are configured to receive a plurality of digital sound signals, each digital sound signal based on an analog sound signal originating at the microphone array, receive a multi-channel speaker signal, generate a monophonic approximation signal of the multi-channel speaker signal, apply a linear acoustic echo canceller to suppress a first ambient sound portion of each digital sound signal, generate a combined directionally-adaptive sound signal from a combination of each digital sound signal by a combination of time-invariant and adaptive beamforming techniques, and apply one or more nonlinear noise suppression techniques to suppress a second ambient sound portion of the combined directionally-adaptive sound signal. | 07-21-2011 |
20110184735 | SPEECH RECOGNITION ANALYSIS VIA IDENTIFICATION INFORMATION - Embodiments are disclosed that relate to the use of identity information to help avoid the occurrence of false positive speech recognition events in a speech recognition system. One embodiment provides a method comprising receiving speech recognition data comprising a recognized speech segment, acoustic locational data related to a location of origin of the recognized speech segment as determined via signals from the microphone array, and confidence data comprising a recognition confidence value, and also receiving image data comprising visual locational information related to a location of each person in an image. The acoustic locational data is compared to the visual locational data to determine whether the recognized speech segment originated from a person in the field of view of the image sensor, and the confidence data is adjusted depending on this determination. | 07-28-2011 |
20120092328 | FUSING VIRTUAL CONTENT INTO REAL CONTENT - A system that includes a head mounted display device and a processing unit connected to the head mounted display device is used to fuse virtual content into real content. In one embodiment, the processing unit is in communication with a hub computing device. The system creates a volumetric model of a space, segments the model into objects, identifies one or more of the objects including a first object, and displays a virtual image over the first object on a display (of the head mounted display) that allows actual direct viewing of at least a portion of the space through the display. | 04-19-2012 |
20120093320 | SYSTEM AND METHOD FOR HIGH-PRECISION 3-DIMENSIONAL AUDIO FOR AUGMENTED REALITY - Techniques are provided for 3D audio, which may be used in augmented reality. A 3D audio signal may be generated based on sensor data collected from the actual room in which the listener is located and the actual position of the listener in the room. The 3D audio signal may include a number of components that are determined based on the collected sensor data and the listener's location. For example, a number of (virtual) sound paths between a virtual sound source and the listener may be determined. The sensor data may be used to estimate materials in the room, such that the effect those materials would have on sound as it travels along the paths can be determined. In some embodiments, sensor data may be used to collect physical characteristics of the listener such that a suitable HRTF may be determined from a library of HRTFs. | 04-19-2012 |
20120163520 | SYNCHRONIZING SENSOR DATA ACROSS DEVICES - Techniques are provided for synchronization of sensor signals between devices. One or more of the devices may collect sensor data. The device may create a sensor signal from the sensor data, which it may make available to other devices under a publisher/subscriber model. The other devices may subscribe to the sensor signals they choose. A device could be a provider or a consumer of the sensor signals. A device may have a layer of code between an operating system and software applications that processes the data for the applications. The processing may include such actions as synchronizing the data in a sensor signal to a local time clock, predicting future values for data in a sensor signal, and providing data samples for a sensor signal at a frequency that an application requests, among other actions. | 06-28-2012 |
20120165964 | INTERACTIVE CONTENT CREATION - An audio/visual system (e.g., an entertainment console or other computing device) plays a base audio track, such as a portion of a pre-recorded song or notes from one or more instruments. Using a depth camera or other sensor, the system automatically detects that a user (or a portion of the user) enters a first collision volume of a plurality of collision volumes. Each collision volume of the plurality of collision volumes is associated with a different audio stem. In one example, an audio stem is a sound from a subset of instruments playing a song, a portion of a vocal track for a song, or notes from one or more instruments. In response to automatically detecting that the user (or a portion of the user) entered the first collision volume, the appropriate audio stem associated with the first collision volume is added to the base audio track or removed from the base audio track. | 06-28-2012 |
20120245933 | ADAPTIVE AMBIENT SOUND SUPPRESSION AND SPEECH TRACKING - A device for suppressing ambient sounds from speech received by a microphone array is provided. One embodiment of the device comprises a microphone array, a processor, an analog-to-digital converter, and memory comprising instructions stored therein that are executable by the processor. The instructions stored in the memory are configured to receive a plurality of digital sound signals, each digital sound signal based on an analog sound signal originating at the microphone array, receive a multi-channel speaker signal, generate a monophonic approximation signal of the multi-channel speaker signal, apply a linear acoustic echo canceller to suppress a first ambient sound portion of each digital sound signal, generate a combined directionally-adaptive sound signal from a combination of each digital sound signal by a combination of time-invariant and adaptive beamforming techniques, and apply one or more nonlinear noise suppression techniques to suppress a second ambient sound portion of the combined directionally-adaptive sound signal. | 09-27-2012 |
20120306850 | DISTRIBUTED ASYNCHRONOUS LOCALIZATION AND MAPPING FOR AUGMENTED REALITY - A system and method for providing an augmented reality environment in which the environmental mapping process is decoupled from the localization processes performed by one or more mobile devices is described. In some embodiments, an augmented reality system includes a mapping system with independent sensing devices for mapping a particular real-world environment and one or more mobile devices. Each of the one or more mobile devices utilizes a separate asynchronous computing pipeline for localizing the mobile device and rendering virtual objects from a point of view of the mobile device. This distributed approach provides an efficient way for supporting mapping and localization processes for a large number of mobile devices, which are typically constrained by form factor and battery life limitations. | 12-06-2012 |
20130169626 | DISTRIBUTED ASYNCHRONOUS LOCALIZATION AND MAPPING FOR AUGMENTED REALITY - A system and method for providing an augmented reality environment in which the environmental mapping process is decoupled from the localization processes performed by one or more mobile devices is described. In some embodiments, an augmented reality system includes a mapping system with independent sensing devices for mapping a particular real-world environment and one or more mobile devices. Each of the one or more mobile devices utilizes a separate asynchronous computing pipeline for localizing the mobile device and rendering virtual objects from a point of view of the mobile device. This distributed approach provides an efficient way for supporting mapping and localization processes for a large number of mobile devices, which are typically constrained by form factor and battery life limitations. | 07-04-2013 |
20130208897 | SKELETAL MODELING FOR WORLD SPACE OBJECT SOUNDS - A method for providing three-dimensional audio includes determining a world space object position and a world space ear position of a human subject based on a modeled virtual skeleton. The method further includes providing three-dimensional audio output to the human subject via an acoustic transducer array including one or more acoustic transducers. The three-dimensional audio output is configured such that sounds appear to originate from the object. | 08-15-2013 |
20130208898 | THREE-DIMENSIONAL AUDIO SWEET SPOT FEEDBACK - A method for providing three-dimensional audio is provided. The method includes receiving a depth map imaging a scene from a depth camera and recognizing a human subject present in the scene. The human subject is modeled with a virtual skeleton comprising a plurality of joints defined with a three-dimensional position. A world space ear position of the human subject is determined based on the virtual skeleton. Furthermore, a target world space ear position of the human subject is determined. The target world space ear position is the world space position where a desired audio effect can be produced via an acoustic transducer array. The method further includes outputting a notification representing a spatial relationship between the world space ear position and the target world space ear position. | 08-15-2013 |
20130208899 | SKELETAL MODELING FOR POSITIONING VIRTUAL OBJECT SOUNDS - Providing three-dimensional audio includes determining a world space ear position of a human subject based on a modeled virtual skeleton. A world space sound source position is determined such that a spatial relationship between the world space sound source position and the world space ear position models a spatial relationship between a virtual space sound source position of a virtual space sound source and a virtual space listening position. Three-dimensional audio is output to the human subject via an acoustic transducer array including one or more acoustic transducers. The three-dimensional audio output is configured such that at the world space ear position a sound provided by a particular virtual space sound source appears to originate from a corresponding world space sound source position. | 08-15-2013 |
20130208900 | DEPTH CAMERA WITH INTEGRATED THREE-DIMENSIONAL AUDIO - A three-dimensional audio system includes a depth camera and one or more acoustic transducers in the same housing. Further, the same housing also houses logic for determining a world space ear position of a human subject observed by the depth camera. The logic also determines one or more audio-output transformations based on the world space ear position. The one or more audio-output transformations are configured to produce a three-dimensional audio output configured to provide a desired audio effect at the world space ear position. | 08-15-2013 |
20130208926 | SURROUND SOUND SIMULATION WITH VIRTUAL SKELETON MODELING - A method for providing three-dimensional audio includes determining a world space ear position of a human subject based on a modeled virtual skeleton. The method further includes providing three-dimensional audio output to the human subject via an acoustic transducer array including one or more acoustic transducers. The three-dimensional audio output is configured such that channel-specific sounds appear to originate from corresponding simulated world speaker positions. | 08-15-2013 |
20140270114 | SYSTEM AND METHOD FOR ANALYZING AND CLASSIFYING CALLS WITHOUT TRANSCRIPTION - A facility and method for analyzing and classifying calls without transcription. The facility analyzes individual frames of an audio recording to identify speech and to measure the amount of time spent in speech on each channel (e.g., the caller channel and the agent channel). Additional telephony metrics such as R-factor or MOS score, along with other metadata, may be factored in as audio-analysis inputs. The facility then analyzes the frames together as a whole and formulates a clustered-frame representation of the conversation to further identify dialogue patterns and characterize the call for classification. Based on the data in the clustered-frame representation, the facility can estimate a call's classification. The correlation of dialogue patterns to call classification may be used to develop targeted solutions for call-classification issues, target certain advertising channels over others, evaluate advertising placements at scale, score callers, and identify spammers. | 09-18-2014 |
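To make the frame-level analysis in application 20140270114 above concrete, here is a minimal Python sketch of its first stage only: splitting a two-channel call (caller and agent) into frames, flagging speech frames with a simple energy threshold, and summarizing talk time per channel. The frame length, threshold, and function names are illustrative assumptions; the clustered-frame dialogue modeling, R-factor/MOS inputs, and the classification step itself are not reproduced.

```python
import numpy as np

FRAME_MS = 30            # frame length in milliseconds (assumed)
ENERGY_THRESHOLD = 0.02  # illustrative RMS threshold for "speech" frames

def frame_signal(x, sample_rate, frame_ms=FRAME_MS):
    """Split a mono signal into non-overlapping frames."""
    frame_len = int(sample_rate * frame_ms / 1000)
    n_frames = len(x) // frame_len
    return x[:n_frames * frame_len].reshape(n_frames, frame_len)

def speech_frames(x, sample_rate, threshold=ENERGY_THRESHOLD):
    """Flag frames whose RMS energy exceeds a fixed threshold."""
    frames = frame_signal(x, sample_rate)
    rms = np.sqrt(np.mean(frames ** 2, axis=1))
    return rms > threshold

def talk_time_summary(caller, agent, sample_rate):
    """Fraction of frames containing speech on each channel."""
    c = speech_frames(caller, sample_rate)
    a = speech_frames(agent, sample_rate)
    n = min(len(c), len(a))
    return {
        "caller_talk_ratio": float(np.mean(c[:n])),
        "agent_talk_ratio": float(np.mean(a[:n])),
        "overlap_ratio": float(np.mean(c[:n] & a[:n])),  # both talking at once
    }

# Toy usage: two seconds of synthetic audio per channel.
if __name__ == "__main__":
    sr = 8000
    t = np.linspace(0, 2, 2 * sr, endpoint=False)
    caller = 0.1 * np.sin(2 * np.pi * 220 * t) * (t < 1.0)   # talks in first second
    agent = 0.1 * np.sin(2 * np.pi * 180 * t) * (t >= 1.0)   # talks in second second
    print(talk_time_summary(caller, agent, sr))
```

Per-frame ratios like these would feed the clustered-frame representation described in the abstract; a real system would use a more robust voice-activity detector than a fixed energy threshold.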
Patent application number | Description | Published |
20080320027 | Strongly typed tags - In one or more embodiments, a tag is provided and includes a property that associates a strongly typed variable with the tag. Strongly typed variables can include any suitable types. For example, in at least some embodiments, the strongly typed variable is a people type that allows the tag to be associated with an individual person or group of people by virtue of a unique identification that is associated with the person or group. Strongly typed tags can then serve as a foundation upon which various other types of information and services can be provided to enhance the user experience. | 12-25-2008 |
20110131254 | STRONGLY TYPED TAGS - In one or more embodiments, a tag is provided and includes a property that associates a strongly typed variable with the tag. Strongly typed variables can include any suitable types. For example, in at least some embodiments, the strongly typed variable is a people type that allows the tag to be associated with an individual person or group of people by virtue of a unique identification that is associated with the person or group. Strongly typed tags can then serve as a foundation upon which various other types of information and services can be provided to enhance the user experience. | 06-02-2011 |
20120271632 | Speaker Identification - Speaker identification techniques are described. In one or more implementations, sample data of one or more user utterances captured using a microphone is received at a computing device. The sample data is processed by the computing device to identify a speaker of the one or more user utterances. The processing involves use of a feature set that includes features obtained using a filterbank whose filters are spaced linearly at higher frequencies and logarithmically at lower frequencies, features that model the speaker's vocal tract transfer function, and features that indicate the vibration rate of the speaker's vocal folds. | 10-25-2012 |
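As a rough illustration of the kind of feature set described in application 20120271632 above, the sketch below computes per-frame features from a mono signal: coarse log band energies (a crude stand-in for the patent's filterbank, whose exact filter spacing is not reproduced here) plus an autocorrelation-based pitch estimate standing in for the vocal-fold vibration rate. Features that model the vocal tract transfer function (e.g., LPC or cepstral coefficients) are omitted, and all names and parameters are assumptions for illustration.

```python
import numpy as np

def frames(x, sr, frame_ms=25, hop_ms=10):
    """Hann-windowed analysis frames."""
    flen = int(sr * frame_ms / 1000)
    hop = int(sr * hop_ms / 1000)
    idx = np.arange(0, len(x) - flen, hop)
    win = np.hanning(flen)
    return np.stack([x[i:i + flen] * win for i in idx])

def log_band_energies(frame, sr, n_bands=8):
    """Log energy in n_bands equal-width bands (crude filterbank stand-in)."""
    spec = np.abs(np.fft.rfft(frame)) ** 2
    bands = np.array_split(spec, n_bands)
    return np.log(np.array([b.sum() for b in bands]) + 1e-10)

def f0_autocorr(frame, sr, fmin=60.0, fmax=400.0):
    """Rough pitch estimate (vocal-fold vibration rate) via autocorrelation."""
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lo, hi = int(sr / fmax), int(sr / fmin)
    lag = lo + int(np.argmax(ac[lo:hi]))
    return sr / lag

def speaker_features(x, sr):
    """Per-frame feature vectors: band energies plus an F0 estimate."""
    out = []
    for f in frames(x, sr):
        out.append(np.concatenate([log_band_energies(f, sr), [f0_autocorr(f, sr)]]))
    return np.array(out)

# Toy usage: a synthetic 150 Hz "voiced" signal.
if __name__ == "__main__":
    sr = 16000
    t = np.linspace(0, 1, sr, endpoint=False)
    voice = 0.3 * np.sin(2 * np.pi * 150 * t) + 0.1 * np.sin(2 * np.pi * 450 * t)
    feats = speaker_features(voice, sr)
    print(feats.shape, "mean F0 estimate:", feats[:, -1].mean())
```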
Patent application number | Description | Published |
20110313768 | COMPOUND GESTURE-SPEECH COMMANDS - A multimedia entertainment system combines both gestures and voice commands to provide an enhanced control scheme. A user's body position or motion may be recognized as a gesture, and may be used to provide context to recognize user generated sounds, such as speech input. Likewise, speech input may be recognized as a voice command, and may be used to provide context to recognize a body position or motion as a gesture. Weights may be assigned to the inputs to facilitate processing. When a gesture is recognized, a limited set of voice commands associated with the recognized gesture are loaded for use. Further, additional sets of voice commands may be structured in a hierarchical manner such that speaking a voice command from one set of voice commands leads to the system loading a next set of voice commands. | 12-22-2011 |
20120120218 | SEMI-PRIVATE COMMUNICATION IN OPEN ENVIRONMENTS - A system and method for providing a semi-private conversation, using an area microphone, between one local user in a group of local users and a remote user. The local and remote users may be in different physical environments, using devices coupled by a network. A conversational relationship is defined between a local user and a remote user. The local user's voice is isolated from other voices in the environment and transmitted to the remote user. Directional output technology may be used to direct the local user's utterances to the remote user in the remote environment. | 05-17-2012 |
20130027296 | COMPOUND GESTURE-SPEECH COMMANDS - A multimedia entertainment system combines both gestures and voice commands to provide an enhanced control scheme. A user's body position or motion may be recognized as a gesture, and may be used to provide context to recognize user generated sounds, such as speech input. Likewise, speech input may be recognized as a voice command, and may be used to provide context to recognize a body position or motion as a gesture. Weights may be assigned to the inputs to facilitate processing. When a gesture is recognized, a limited set of voice commands associated with the recognized gesture are loaded for use. Further, additional sets of voice commands may be structured in a hierarchical manner such that speaking a voice command from one set of voice commands leads to the system loading a next set of voice commands. | 01-31-2013 |
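The hierarchical gesture-then-speech loading described in applications 20110313768 and 20130027296 above can be sketched with a small data structure: a recognized gesture selects a limited set of voice commands, and speaking certain commands loads a nested set. The gesture names, command tree, and class below are hypothetical, and the patents' weighting of gesture and speech inputs is not modeled.

```python
# Hypothetical command hierarchy: a recognized gesture loads a limited set of
# voice commands, and some commands lead to a nested set (hierarchical loading).
COMMAND_TREE = {
    "raise_hand": {
        "pause": None,
        "volume": {"up": None, "down": None},   # "volume" loads a nested set
    },
    "swipe_left": {
        "previous": None,
        "menu": {"open": None, "close": None},
    },
}

class GestureSpeechController:
    """Tracks which limited voice-command set is active given the last gesture."""

    def __init__(self, tree):
        self.tree = tree
        self.active = None  # currently loaded set of voice commands

    def on_gesture(self, gesture):
        # A recognized gesture provides context: only its commands are loaded.
        self.active = self.tree.get(gesture)
        return sorted(self.active) if self.active else []

    def on_speech(self, utterance):
        if not self.active or utterance not in self.active:
            return "ignored"  # utterance not in the currently loaded set
        nested = self.active[utterance]
        if nested:
            self.active = nested  # speaking this command loads the next set
            return f"loaded nested commands: {sorted(nested)}"
        return f"executed: {utterance}"

# Toy usage
ctrl = GestureSpeechController(COMMAND_TREE)
print(ctrl.on_gesture("raise_hand"))   # ['pause', 'volume']
print(ctrl.on_speech("volume"))        # loaded nested commands: ['down', 'up']
print(ctrl.on_speech("up"))            # executed: up
```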
Patent application number | Description | Published |
20080313703 | Integrating Security by Obscurity with Access Control Lists - Aspects of the subject matter described herein relate to providing and restricting access to content. In aspects, information (e.g., a URL) that identifies content and a user is provided to a user. In conjunction with providing the information to a user, a data structure (e.g., an access control list) is updated to indicate that the user has access to the content. The user may use the information to access the content and/or may send this information to other users. The other users may use the information (e.g., by pasting it into a browser) to access the content and may be added to the data structure so that they may subsequently access the content without the use of the information. Access to the content via using the information may be subsequently revoked. | 12-18-2008 |
20110247083 | INTEGRATING SECURITY BY OBSCURITY WITH ACCESS CONTROL LISTS - Aspects of the subject matter described herein relate to providing and restricting access to content. In aspects, information (e.g., a URL) that identifies content and a user is provided to a user. In conjunction with providing the information to a user, a data structure (e.g., an access control list) is updated to indicate that the user has access to the content. The user may use the information to access the content and/or may send this information to other users. The other users may use the information (e.g., by pasting it into a browser) to access the content and may be added to the data structure so that they may subsequently access the content without the use of the information. Access to the content via using the information may be subsequently revoked. | 10-06-2011 |
20110307251 | Sound Source Separation Using Spatial Filtering and Regularization Phases - Described is a multiple-phase process/system that combines spatial filtering with regularization to separate sound from different sources, such as the speech of two different speakers. In a first phase, frequency-domain signals corresponding to the sensed sounds are processed into separated, spatially filtered signals, including by inputting the signals into a plurality of beamformers (which may include nullformers) followed by nonlinear spatial filters. In a regularization phase, the separated, spatially filtered signals are input into an independent component analysis mechanism that is configured with multi-tap filters, followed by secondary nonlinear spatial filters. Separated audio signals are then provided via an inverse transform. | 12-15-2011 |
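Application 20110307251 above combines beamformers/nullformers, nonlinear spatial filters, and a multi-tap ICA regularization phase. The toy sketch below shows only an ICA-style separation of an instantaneous two-channel mixture, using scikit-learn's FastICA as a stand-in; the spatial-filtering phases, multi-tap filters, and frequency-domain processing are not reproduced, and the mixing matrix and signals are synthetic.

```python
import numpy as np
from sklearn.decomposition import FastICA  # assumes scikit-learn is installed

# Two synthetic "speakers": a tone and a sawtooth-like signal.
sr = 8000
t = np.linspace(0, 1, sr, endpoint=False)
s1 = 0.5 * np.sin(2 * np.pi * 220 * t)
s2 = 0.5 * ((2 * (t * 7 % 1)) - 1)
sources = np.c_[s1, s2]

# Instantaneous 2x2 mixture standing in for what two microphones capture.
mixing = np.array([[0.8, 0.4],
                   [0.3, 0.9]])
mics = sources @ mixing.T

# ICA recovers the sources up to scaling and permutation.
ica = FastICA(n_components=2, random_state=0)
separated = ica.fit_transform(mics)

# Crude quality check: correlation between each estimate and each true source.
corr = np.corrcoef(np.c_[separated, sources], rowvar=False)[:2, 2:]
print(np.round(np.abs(corr), 2))
```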