Patent application number | Description | Published |
20110178798 | ADAPTIVE AMBIENT SOUND SUPPRESSION AND SPEECH TRACKING - A device for suppressing ambient sounds from speech received by a microphone array is provided. One embodiment of the device comprises a microphone array, a processor, an analog-to-digital converter, and memory comprising instructions stored therein that are executable by the processor. The instructions stored in the memory are configured to receive a plurality of digital sound signals, each digital sound signal based on an analog sound signal originating at the microphone array, receive a multi-channel speaker signal, generate a monophonic approximation signal of the multi-channel speaker signal, apply a linear acoustic echo canceller to suppress a first ambient sound portion of each digital sound signal, generate a combined directionally-adaptive sound signal from a combination of each digital sound signal by a combination of time-invariant and adaptive beamforming techniques, and apply one or more nonlinear noise suppression techniques to suppress a second ambient sound portion of the combined directionally-adaptive sound signal. | 07-21-2011 |
20110184735 | SPEECH RECOGNITION ANALYSIS VIA IDENTIFICATION INFORMATION - Embodiments are disclosed that relate to the use of identity information to help avoid the occurrence of false positive speech recognition events in a speech recognition system. One embodiment provides a method comprising receiving speech recognition data comprising a recognized speech segment, acoustic locational data related to a location of origin of the recognized speech segment as determined via signals from a microphone array, and confidence data comprising a recognition confidence value, and also receiving image data comprising visual locational data related to a location of each person in an image. The acoustic locational data is compared to the visual locational data to determine whether the recognized speech segment originated from a person in the field of view of an image sensor, and the confidence data is adjusted depending on this determination. | 07-28-2011 |
20120092328 | FUSING VIRTUAL CONTENT INTO REAL CONTENT - A system that includes a head mounted display device and a processing unit connected to the head mounted display device is used to fuse virtual content into real content. In one embodiment, the processing unit is in communication with a hub computing device. The system creates a volumetric model of a space, segments the model into objects, identifies one or more of the objects including a first object, and displays a virtual image over the first object on a display (of the head mounted display) that allows actual direct viewing of at least a portion of the space through the display. | 04-19-2012 |
20120093320 | SYSTEM AND METHOD FOR HIGH-PRECISION 3-DIMENSIONAL AUDIO FOR AUGMENTED REALITY - Techniques are provided for 3D audio, which may be used in augmented reality. A 3D audio signal may be generated based on sensor data collected from the actual room in which the listener is located and the actual position of the listener in the room. The 3D audio signal may include a number of components that are determined based on the collected sensor data and the listener's location. For example, a number of (virtual) sound paths between a virtual sound source and the listener may be determined. The sensor data may be used to estimate materials in the room, such that the effect that those materials would have on sound as it travels along the paths can be determined. In some embodiments, sensor data may be used to collect physical characteristics of the listener such that a suitable HRTF may be determined from a library of HRTFs. | 04-19-2012 |
20120163520 | SYNCHRONIZING SENSOR DATA ACROSS DEVICES - Techniques are provided for synchronization of sensor signals between devices. One or more of the devices may collect sensor data. The device may create a sensor signal from the sensor data, which it may make available to other devices under a publisher/subscriber model. The other devices may subscribe to the sensor signals they choose. A device could be a provider or a consumer of the sensor signals. A device may have a layer of code between an operating system and software applications that processes the data for the applications. The processing may include such actions as synchronizing the data in a sensor signal to a local time clock, predicting future values for data in a sensor signal, and providing data samples for a sensor signal at a frequency that an application requests, among other actions. | 06-28-2012 |
20120165964 | INTERACTIVE CONTENT CREATION - An audio/visual system (e.g., an entertainment console or other computing device) plays a base audio track, such as a portion of a pre-recorded song or notes from one or more instruments. Using a depth camera or other sensor, the system automatically detects that a user (or a portion of the user) enters a first collision volume of a plurality of collision volumes. Each collision volume of the plurality of collision volumes is associated with a different audio stem. In one example, an audio stem is a sound from a subset of instruments playing a song, a portion of a vocal track for a song, or notes from one or more instruments. In response to automatically detecting that the user (or a portion of the user) entered the first collision volume, the appropriate audio stem associated with the first collision volume is added to the base audio track or removed from the base audio track. | 06-28-2012 |
20120245933 | ADAPTIVE AMBIENT SOUND SUPPRESSION AND SPEECH TRACKING - A device for suppressing ambient sounds from speech received by a microphone array is provided. One embodiment of the device comprises a microphone array, a processor, an analog-to-digital converter, and memory comprising instructions stored therein that are executable by the processor. The instructions stored in the memory are configured to receive a plurality of digital sound signals, each digital sound signal based on an analog sound signal originating at the microphone array, receive a multi-channel speaker signal, generate a monophonic approximation signal of the multi-channel speaker signal, apply a linear acoustic echo canceller to suppress a first ambient sound portion of each digital sound signal, generate a combined directionally-adaptive sound signal from a combination of each digital sound signal by a combination of time-invariant and adaptive beamforming techniques, and apply one or more nonlinear noise suppression techniques to suppress a second ambient sound portion of the combined directionally-adaptive sound signal. | 09-27-2012 |
20120306850 | DISTRIBUTED ASYNCHRONOUS LOCALIZATION AND MAPPING FOR AUGMENTED REALITY - A system and method for providing an augmented reality environment in which the environmental mapping process is decoupled from the localization processes performed by one or more mobile devices is described. In some embodiments, an augmented reality system includes a mapping system with independent sensing devices for mapping a particular real-world environment and one or more mobile devices. Each of the one or more mobile devices utilizes a separate asynchronous computing pipeline for localizing the mobile device and rendering virtual objects from a point of view of the mobile device. This distributed approach provides an efficient way for supporting mapping and localization processes for a large number of mobile devices, which are typically constrained by form factor and battery life limitations. | 12-06-2012 |
20130169626 | DISTRIBUTED ASYNCHRONOUS LOCALIZATION AND MAPPING FOR AUGMENTED REALITY - A system and method for providing an augmented reality environment in which the environmental mapping process is decoupled from the localization processes performed by one or more mobile devices is described. In some embodiments, an augmented reality system includes a mapping system with independent sensing devices for mapping a particular real-world environment and one or more mobile devices. Each of the one or more mobile devices utilizes a separate asynchronous computing pipeline for localizing the mobile device and rendering virtual objects from a point of view of the mobile device. This distributed approach provides an efficient way for supporting mapping and localization processes for a large number of mobile devices, which are typically constrained by form factor and battery life limitations. | 07-04-2013 |
20130208897 | SKELETAL MODELING FOR WORLD SPACE OBJECT SOUNDS - A method for providing three-dimensional audio includes determining a world space object position and a world space ear position of a human subject based on a modeled virtual skeleton. The method further includes providing three-dimensional audio output to the human subject via an acoustic transducer array including one or more acoustic transducers. The three-dimensional audio output is configured such that sounds appear to originate from the object. | 08-15-2013 |
20130208898 | THREE-DIMENSIONAL AUDIO SWEET SPOT FEEDBACK - A method for providing three-dimensional audio is provided. The method includes receiving a depth map imaging a scene from a depth camera and recognizing a human subject present in the scene. The human subject is modeled with a virtual skeleton comprising a plurality of joints defined with a three-dimensional position. A world space ear position of the human subject is determined based on the virtual skeleton. Furthermore, a target world space ear position of the human subject is determined. The target world space ear position is the world space position where a desired audio effect can be produced via an acoustic transducer array. The method further includes outputting a notification representing a spatial relationship between the world space ear position and the target world space ear position. | 08-15-2013 |
20130208899 | SKELETAL MODELING FOR POSITIONING VIRTUAL OBJECT SOUNDS - Providing three-dimensional audio includes determining a world space ear position of a human subject based on a modeled virtual skeleton. A world space sound source position is determined such that a spatial relationship between the world space sound source position and the world space ear position models a spatial relationship between a virtual space sound source position of a virtual space sound source and a virtual space listening position. Three-dimensional audio is output to the human subject via an acoustic transducer array including one or more acoustic transducers. The three-dimensional audio output is configured such that at the world space ear position a sound provided by a particular virtual space sound source appears to originate from a corresponding world space sound source position. | 08-15-2013 |
20130208900 | DEPTH CAMERA WITH INTEGRATED THREE-DIMENSIONAL AUDIO - A three-dimensional audio system includes a depth camera and one or more acoustic transducers in the same housing. Further, the same housing also houses logic for determining a world space ear position of a human subject observed by the depth camera. The logic also determines one or more audio-output transformations based on the world space ear position. The one or more audio-output transformations are configured to produce a three-dimensional audio output configured to provide a desired audio effect at the world space ear position. | 08-15-2013 |
20130208926 | SURROUND SOUND SIMULATION WITH VIRTUAL SKELETON MODELING - A method for providing three-dimensional audio includes determining a world space ear position of a human subject based on a modeled virtual skeleton. The method further includes providing three-dimensional audio output to the human subject via an acoustic transducer array including one or more acoustic transducers. The three-dimensional audio output is configured such that channel-specific sounds appear to originate from corresponding simulated world speaker positions. | 08-15-2013 |
20140270114 | SYSTEM AND METHOD FOR ANALYZING AND CLASSIFYING CALLS WITHOUT TRANSCRIPTION - A facility and method for analyzing and classifying calls without transcription. The facility analyzes individual frames of an audio recording to identify speech and measure the amount of time spent in speech for each channel (e.g., caller channel, agent channel). Additional telephony metrics such as R-factor or MOS score and other metadata may be factored in as audio analysis inputs. The facility then analyzes the frames together as a whole and formulates a clustered-frame representation of a conversation to further identify dialogue patterns and characterize call classification. Based on the data in the clustered-frame representation, the facility is able to estimate call classification. The correlation of dialogue patterns to call classification may be utilized to develop targeted solutions for call classification issues, target certain advertising channels over others, evaluate advertising placements at scale, score callers, and identify spammers. | 09-18-2014 |
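Two of the abstracts above (20110178798 and 20120245933) describe applying a linear acoustic echo canceller to suppress the speaker signal picked up by each microphone. The patents do not disclose an algorithm; as a minimal illustrative sketch only, a normalized-LMS (NLMS) adaptive FIR filter is one common way such a canceller is built. The function name, filter length, and step size below are assumptions, not details from the applications:

```python
import numpy as np

def nlms_echo_cancel(mic, ref, taps=64, mu=0.5, eps=1e-8):
    """Suppress the portion of `mic` that is a linear echo of the
    reference (loudspeaker) signal `ref`, using an NLMS adaptive
    FIR filter of length `taps`."""
    w = np.zeros(taps)          # adaptive filter coefficients
    buf = np.zeros(taps)        # sliding window of recent ref samples
    out = np.zeros_like(mic)    # echo-suppressed output
    for n in range(len(mic)):
        buf = np.roll(buf, 1)
        buf[0] = ref[n]
        y = w @ buf             # current echo estimate
        e = mic[n] - y          # error = mic minus estimated echo
        # Normalized LMS update: step scaled by reference energy.
        w += mu * e * buf / (buf @ buf + eps)
        out[n] = e
    return out
```

The energy normalization in the update keeps the effective step size stable regardless of the loudspeaker signal's level, which is why NLMS is preferred over plain LMS for speech-band echo paths.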
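Several of the three-dimensional-audio abstracts (20120093320, and the 20130208897–20130208926 family) concern making a sound appear to originate from a world-space or virtual-space position, with 20120093320 explicitly selecting an HRTF from a library. In lieu of measured HRTFs, a toy interaural time difference (ITD) and level difference (ILD) model illustrates the underlying idea; the Woodworth ITD approximation, head radius, and gain law below are illustrative assumptions, not the patents' method:

```python
import numpy as np

SPEED_OF_SOUND = 343.0   # m/s
HEAD_RADIUS = 0.0875     # m, a common average-head assumption

def binaural_pan(mono, azimuth_deg, fs):
    """Render a mono source at `azimuth_deg` (0 = front, +90 = right)
    as a (2, n) stereo array using a simple ITD/ILD model -- a crude
    stand-in for a full HRTF."""
    theta = np.deg2rad(azimuth_deg)
    # Woodworth spherical-head approximation of interaural delay.
    itd = HEAD_RADIUS / SPEED_OF_SOUND * (abs(theta) + abs(np.sin(theta)))
    shift = int(round(itd * fs))
    near_gain = 0.5 * (1.0 + 0.6 * abs(np.sin(theta)))   # ad-hoc ILD law
    far_gain = 1.0 - near_gain
    near = mono * near_gain
    # Far ear hears the source later and quieter.
    far = np.concatenate([np.zeros(shift), mono[:len(mono) - shift]]) * far_gain
    if theta >= 0:            # source on the right: right ear is near
        return np.stack([far, near])
    return np.stack([near, far])
```

A real system would convolve the source with measured left/right HRTF impulse responses for the target direction, which is what an "HRTF library" in the 20120093320 abstract supplies.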