Patent application number | Description | Published |
20080209327 | Persistent spatial collaboration - Persistent, spatial collaboration on the web supports a free-form, user-intuitive approach to a variety of projects and activities. Users can place differing object types at any time, anywhere on a web page and/or the system can automatically, and with no user effort, affect object placement based on one or more metadata characteristics. A user can, in real-time, see changes made by another user to a web page, and, if desired, react accordingly, enabling true collaboration even if the various users are at remote locations. The flexibility of the methodology and system provides a platform for users to engage in projects and activities in a manner and environment suited to the users' mindsets, creativity, and natural proclivities. | 08-28-2008 |
20080215318 | EVENT RECOGNITION - Recognition of events can be performed by accessing an audio signal having static and dynamic features. A value for the audio signal can be calculated by utilizing different weights for the static and dynamic features such that a frame of the audio signal can be associated with a particular event. A filter can also be used to aid in determining the event for the frame. | 09-04-2008 |
20080234842 | MICROPHONES AS CONTACT SENSORS FOR DEVICE CONTROL - A device controller that controls a device in response to tapping or rubbing the surface of microphones on the device. It allows microphones to be used as both speech sensors (to capture speech signals, the original functionality) and a device controller (the new functionality). Tapping or rubbing the surface of microphones on the device produces complex yet distinctive signals. By detecting these events, the present device controller can generate appropriate commands to control the device. | 09-25-2008 |
20080267578 | INSERTION OF VIRTUAL VIDEO INTO LIVE VIDEO - The present virtual video muting technique seamlessly inserts a virtual video into a live video when the user does not want to reveal his/her actual activity. The virtual video is generated from real video frames captured earlier, which makes it appear to be real. | 10-30-2008 |
20080279423 | RECOVERING PARAMETERS FROM A SUB-OPTIMAL IMAGE - A subregion-based image parameter recovery system and method for recovering image parameters from a single image containing a face taken under sub-optimal illumination conditions. The recovered image parameters (including albedo, illumination, and face geometry) can be used to generate face images under a new lighting environment. The method includes dividing the face in the image into numerous smaller regions, generating an albedo morphable model for each region, and using a Markov Random Fields (MRF)-based framework to model the spatial dependence between neighboring regions. Different types of regions are defined, including saturated, shadow, regular, and occluded regions. Each pixel in the image is classified and assigned to a region based on intensity, and then weighted based on its classification. The method decouples the texture from the geometry and illumination models, and then generates an objective function that is iteratively solved using an energy minimization technique to recover the image parameters. | 11-13-2008 |
20080317371 | VIDEO NOISE REDUCTION - A video noise reduction technique is presented. Generally, the technique involves first decomposing each frame of the video into low-pass and high-pass frequency components. Then, for each frame of the video after the first frame, an estimate of a noise variance in the high pass component is obtained. The noise in the high pass component of each pixel of each frame is reduced using the noise variance estimate obtained for the frame under consideration, whenever there has been no substantial motion exhibited by the pixel since the previous frame. Evidence of motion is determined by analyzing the high and low pass components. | 12-25-2008 |
20090075634 | DATA BUDDY - Multi-modal, multi-lingual devices can be employed to consolidate numerous items including, but not limited to, keys, remote controls, image capture devices, audio recorders, cellular telephone functionalities, location/direction detectors, health monitors, calendars, gaming devices, smart home inputs, pens, optical pointing devices or the like. For example, a corner of a cellular telephone can be used as an electronic pen. Moreover, the device can be used to snap multiple pictures, stitching them together to create a panoramic image. A device can automate ignition of an automobile, initiate appliances, etc. based upon relative distance. The device can provide for near-to-eye capabilities for enhanced image viewing. Multiple cameras/sensors can be provided on a single device to provide for stereoscopic capabilities. The device can also provide assistance to the blind, privacy, etc. by consolidating services. | 03-19-2009 |
20090080632 | SPATIAL AUDIO CONFERENCING - Audio in an audio conference is spatialized using either virtual sound-source positioning or sound-field capture. A spatial audio conference is provided between local and remote parties using audio conferencing devices (ACDs) interconnected by a network. Each ACD captures spatial audio information from the local party, generates either one, or three or more, audio data streams which include the captured information, and transmits the generated stream(s) to each remote party. Each ACD also receives the generated audio data stream(s) transmitted from each of the remote parties, processes the received streams to generate a plurality of audio signals, and renders the signals to produce a sound-field that is perceived by the local party, where the sound-field includes the spatial audio information captured from the remote parties. A sound-field capture device is also provided which includes at least three directional microphones symmetrically configured about a center axis in a semicircular array. | 03-26-2009 |
20090177601 | STATUS-AWARE PERSONAL INFORMATION MANAGEMENT - Described is a technology by which personal information that comes into a computer system is intelligently managed according to current state data including user presence and/or user attention data. Incoming information is processed against the state data to determine whether corresponding data is to be output, and if so, what output modality or modalities to use. For example, if a user is present and busy, a notification may be blocked or deferred to avoid disturbing the user. Cost analysis may be used to determine the cost of outputting the data. In addition to user state data, the importance of the information, other state data, the cost of converting data to another format for output (e.g., text-to-speech), and/or user preference data, may factor into the decision. The output data may be modified (e.g., audio made louder) based on a current output environment as determined via the state data. | 07-09-2009 |
20090220165 | EFFICIENT IMAGE DISPLAYING - Efficient image display on a display screen (e.g., in terms of number, space, resolution, and/or distortion) is facilitated by implementing one or more specialized select and pack routines for images. That is, representative images are selected from an image database, based on desired resolution and distortion, then resized and packed into a display arrangement that enhances use of display screen space. This allows, for example, images to be sent to a user from an image database more quickly, with more desirable resolution, and less distortion than traditional display techniques. | 09-03-2009 |
20090249386 | FACILITATING ADVERTISEMENT PLACEMENT OVER VIDEO CONTENT - Systems, methods, computer-readable media, and graphical user interfaces for facilitating advertisement placement over video content are provided. Images within a video are partitioned into image regions. Upon partitioning images into image regions, an intrusiveness score is determined for each image region. Based on the intrusiveness scores, optimal placement of an advertisement within the video is determined. | 10-01-2009 |
20090263010 | ADAPTING A PARAMETERIZED CLASSIFIER TO AN ENVIRONMENT - A classifier is trained on a first set of examples, and the trained classifier is adapted to perform on a second set of examples. The classifier implements a parameterized labeling function. Initial training of the classifier optimizes the labeling function's parameters to minimize a cost function. The classifier and its parameters are provided to an environment in which it will operate, along with an approximation function that approximates the cost function using a compact representation of the first set of examples in place of the actual first set. A second set of examples is collected, and the parameters are modified to minimize a combined cost of labeling the first and second sets of examples. The part of the combined cost that represents the cost of the modified parameters applied to the first set is calculated using the approximation function. | 10-22-2009 |
20090310802 | VIRTUAL SOUND SOURCE POSITIONING - Systems and methods for determining a virtual sound source position by determining an output for loudspeakers based on the position of the loudspeakers in relation to a listener. The output of respective loudspeakers is generated using aural cues to give the listener knowledge of the virtual position of the virtual sound source. Both a gain in intensity and a delay are simulated. | 12-17-2009 |
20100027835 | RECOGNIZING ACTIONS OF ANIMATE OBJECTS IN VIDEO - A system that facilitates automatically determining an action of an animate object is described herein. The system includes a receiver component that receives video data that includes images of an animate object. The system additionally includes a determiner component that accesses a data store that includes an action graph and automatically determines an action undertaken by the animate object in the received video data based at least in part upon the action graph. The action graph comprises a plurality of nodes that are representative of multiple possible postures of the animate object. At least one node in the action graph is shared amongst multiple actions represented in the action graph. | 02-04-2010 |
20100074433 | Multichannel Acoustic Echo Cancellation - A multi-party spatial audio conferencing system is configured to receive far end signals from remote participants. The system comprises a speaker array that outputs spatialized sound signals and one or more microphones that capture and relay a sound signal comprising an echo of the spatialized sound signal to a multichannel acoustic echo cancellation (MC-AEC) unit having a plurality of echo cancellers. Respective echo cancellers perform cancellation of an echo associated with a far end signal from one of the multiple participants according to an algorithm based upon echo cancellation coefficients. The echo cancellation coefficients are determined from the input channel signals, the spatialization parameters associated with each input channel, and the audio signals captured by the microphones. This allows respective echo cancellation filters to be updated simultaneously even though the corresponding remote participant is not talking. | 03-25-2010 |
20100085416 | Multi-Device Capture and Spatial Browsing of Conferences - Multi-device capture and spatial browsing of conferences is described. In one implementation, a system detects cameras and microphones, such as the webcams on participants' notebook computers, in a conference room, group meeting, or table game, and enlists an ad-hoc array of available devices to capture each participant and the spatial relationships between participants. A video stream composited from the array is browsable by a user to navigate a 3-dimensional representation of the meeting. Each participant may be represented by a video pane, a foreground object, or a 3-D geometric model of the participant's face or body displayed in spatial relation to the other participants in a 3-dimensional arrangement analogous to the spatial arrangement of the meeting. The system may automatically re-orient the 3-dimensional representation as needed to best show the currently interesting event, such as the current speaker, or may extend navigation controls to a user for manually viewing selected participants or nuanced interactions between participants. | 04-08-2010 |
20100149310 | VISUAL FEEDBACK FOR NATURAL HEAD POSITIONING - A videoconferencing conferee may be provided with feedback on his or her location relative to a local video camera by altering how remote videoconference video is displayed on a local videoconference display viewed by the conferee. The conferee's location may be tracked and the displayed remote video may be altered in accordance with the changing location of the conferee. The remote video may appear to move in directions mirroring movement of the conferee. This effect may be achieved by modeling the remote video as offset and behind a virtual portal corresponding to the display. The remote video may be displayed according to a view of the remote video through the virtual portal. As the conferee's position changes, the view through the portal changes, and the remote video changes accordingly. | 06-17-2010 |
20100189310 | System and Method Providing Improved Head Motion Estimations for Animation - The computer-readable media provides improved procedures to estimate head motion between two images of a face. Locations of a number of distinct facial features are determined in two images. The locations are converted into a set of physical face parameters based on the symmetry of the identified distinct facial features. An estimation objective function is determined by: (a) estimating each of the set of physical parameters, (b) estimating a first head pose transform corresponding to the first image, and (c) estimating a second head pose transform corresponding to the second image. The motion is estimated between the two images based on the set of physical face parameters by multiplying each term of the estimation objective function by a weighted contribution factor based on the confidence of data corresponding to the estimation objective function. | 07-29-2010 |
20100195812 | AUDIO TRANSFORMS IN CONNECTION WITH MULTIPARTY COMMUNICATION - The claimed subject matter relates to an architecture that can preprocess audio portions of communications in order to enrich multiparty communication sessions or environments. In particular, the architecture can provide both a public channel for public communications that are received by substantially all connected parties and can further provide a private channel for private communications that are received by a selected subset of all connected parties. Most particularly, the architecture can apply an audio transform to communications that occur during the multiparty communication session based upon a target audience of the communication. By way of illustration, the architecture can apply a whisper transform to private communications, an emotion transform based upon relationships, an ambience or spatial transform based upon physical locations, or a pace transform based upon lack of presence. | 08-05-2010 |
20100198579 | UNIVERSAL TRANSLATOR - The claimed subject matter provides a system and/or a method that facilitates communication within a telepresence session. A telepresence session can be initiated within a communication framework that includes two or more virtually represented users that communicate therein. The telepresence session can include at least one virtually represented user that communicates in a first language, the communication is at least one of a portion of audio, a portion of video, a portion of graphic, a gesture, or a portion of text. An interpreter component can evaluate the communication to translate an identified first language into a second language within the telepresence session, the translation is automatically provided to at least one virtually represented user within the telepresence session. | 08-05-2010 |
20100228825 | SMART MEETING ROOM - The claimed subject matter provides a system and/or a method that facilitates enhancing the employment of a telepresence session. An automatic telepresence engine that can evaluate data associated with at least one of an attendee, a schedule for an attendee, or a portion of an electronic communication for an attendee. The automatic telepresence engine can identify at least one of the following for a telepresence session based upon the evaluated data: a participant to include for the telepresence session, a portion of data related to a presentation within the telepresence session, a portion of data related to a meeting topic within the telepresence session, a device utilized by an attendee to communicate within the telepresence session. The automatic telepresence engine can initiate the telepresence session within a communication framework that includes two or more virtually represented users that communicate therein. | 09-09-2010 |
20100245536 | AMBULATORY PRESENCE FEATURES - The claimed subject matter provides a system and/or a method that facilitates managing one or more devices utilized for communicating data within a telepresence session. A telepresence session can be initiated within a communication framework that includes two or more virtually represented users that communicate therein. A device can be utilized by at least one virtually represented user that enables communication within the telepresence session, the device includes at least one of an input to transmit a portion of a communication to the telepresence session or an output to receive a portion of a communication from the telepresence session. A detection component can adjust at least one of the input related to the device or the output related to the device based upon the identification of a cue, the cue is at least one of a movement detected, an event detected, or an ambient variation. | 09-30-2010 |
20100289904 | VIDEO CAPTURE DEVICE PROVIDING MULTIPLE RESOLUTION VIDEO FEEDS - Systems are disclosed that provide improved transfer speed of video data from a video capture device to a computing device using multiple video feeds respectively comprising different resolutions. A high-resolution image sensor is used to convert light images into a high-resolution video data stream. A down sampler converts the high-resolution video data stream to a low-resolution video data stream, so that both a low-resolution data stream and a high-resolution data stream are available. While the low-resolution data stream can be sent to the computing device, a digital signal processor (DSP) processes the high-resolution video data stream in accordance with an input control signal that is comprised of desired high-resolution video stream parameters derived from the low-resolution video data stream. | 11-18-2010 |
20100303266 | SPATIALIZED AUDIO OVER HEADPHONES - A spatial element is added to communications, including over telephone conference calls heard through headphones or a stereo speaker setup. Functions are created to modify signals from different callers to create the illusion that the callers are speaking from different parts of the room. | 12-02-2010 |
20100318399 | Adaptive Meeting Management - A template and/or knowledge associated with a synchronous meeting are obtained by a computing device. The computing device then adaptively manages the synchronous meeting based at least in part on the template and/or knowledge. | 12-16-2010 |
20100329517 | BOOSTED FACE VERIFICATION - Techniques for face verification are described. Local binary pattern (LBP) features and boosting classifiers are used to verify faces in images. A boosted multi-task learning algorithm is used for face verification in images. Finally, boosted face verification is used to verify faces in videos. | 12-30-2010 |
20110063403 | MULTI-CAMERA HEAD POSE TRACKING - Techniques and technologies for tracking a face with a plurality of cameras wherein a geometry between the cameras is initially unknown. One disclosed method includes detecting a head with two of the cameras and registering a head model with the image of the head (as detected by one of the cameras). The method also includes back-projecting the head image detected by the other camera onto the head model and determining a head pose from the back-projected image. Furthermore, the determined geometry is used to track the face with at least one of the cameras. | 03-17-2011 |
20110093820 | GESTURE PERSONALIZATION AND PROFILE ROAMING - A gesture-based system may have default or pre-packaged gesture information, where a gesture is derived from a user's position or motion in a physical space. In other words, no controllers or devices are necessary. Depending on how a user uses his or her gesture to accomplish the task, the system may refine the properties and the gesture may become personalized. The personalized gesture information may be stored in a gesture profile and can be further updated with the latest data. The gesture-based system may use the gesture profile information for gesture recognition techniques. Further, the gesture profile may be roaming such that the gesture profile is available in a second location without requiring the system to relearn gestures that have already been personalized on behalf of the user. | 04-21-2011 |
20110119210 | Multiple Category Learning for Training Classifiers - Described is multiple category learning to jointly train a plurality of classifiers in an iterative manner. Each training iteration associates an adaptive label with each training example, in which during the iterations, the adaptive label of any example is able to be changed by the subsequent reclassification. In this manner, any mislabeled training example is corrected by the classifiers during training. The training may use a probabilistic multiple category boosting algorithm that maintains probability data provided by the classifiers, or a winner-take-all multiple category boosting algorithm that selects the adaptive label based upon the highest probability classification. The multiple category boosting training system may be coupled to a multiple instance learning mechanism to obtain the training examples. The trained classifiers may be used as weak classifiers that provide a label used to select a deep classifier for further classification, e.g., to provide a multi-view object detector. | 05-19-2011 |
20110170739 | Automated Acquisition of Facial Images - Described is a technology by which medical patient facial images are acquired and maintained for associating with a patient's records and/or other items. A video camera may provide video frames, such as captured when a patient is being admitted to a hospital. Face detection may be employed to clip the facial part from the frame. Multiple images of a patient's face may be displayed on a user interface to allow selection of a representative image. Also described is obtaining the patient images by processing electronic documents (e.g., patient records) to look for a face pictured therein. | 07-14-2011 |
20110267419 | ACCELERATED INSTANT REPLAY FOR CO-PRESENT AND DISTRIBUTED MEETINGS - Techniques for recording and replay of a live conference while still attending the live conference are described. A conferencing system includes a user interface generator, a live conference processing module, and a replay processing module. The user interface generator is configured to generate a user interface that includes a replay control panel and one or more output panels. The live conference processing module is configured to extract information included in received conferencing data that is associated with one or more conferencing modalities, and to display the information in the one or more output panels in a live manner (e.g., as a live conference). The replay processing module is configured to enable information associated with the one or more conferencing modalities corresponding to a time of the conference session prior to the live point to be presented at a desired rate, possibly different from the real-time rate, if a replay mode is selected in the replay control panel. | 11-03-2011 |
20110295392 | DETECTING REACTIONS AND PROVIDING FEEDBACK TO AN INTERACTION - Reaction information of participants to an interaction may be sensed and analyzed to determine one or more reactions or dispositions of the participants. Feedback may be provided based on the determined reactions. The participants may be given an opportunity to opt in to having their reaction information collected, and may be provided complete control over how their reaction information is shared or used. | 12-01-2011 |
20110307260 | MULTI-MODAL GENDER RECOGNITION - Gender recognition is performed using two or more modalities. For example, depth image data and one or more types of data other than depth image data is received. The data pertains to a person. The different types of data are fused together to automatically determine gender of the person. A computing system can subsequently interact with the person based on the determination of gender. | 12-15-2011 |
20110311137 | HIERARCHICAL FILTERED MOTION FIELD FOR ACTION RECOGNITION - Described is a hierarchical filtered motion field technology such as for use in recognizing actions in videos with crowded backgrounds. Interest points are detected, e.g., as 2D Harris corners with recent motion, e.g., locations with high intensities in a motion history image (MHI). A global spatial motion smoothing filter is applied to the gradients of the MHI to eliminate low intensity corners that are likely isolated, unreliable or noisy motions. At each remaining interest point, a local motion field filter is applied to the smoothed gradients by computing a structure proximity between sets of pixels in the local region and the interest point. The motion at a pixel/pixel set is enhanced or weakened based on its structure proximity with the interest point (nearer pixels are enhanced). | 12-22-2011 |
20120105585 | IN-HOME DEPTH CAMERA CALIBRATION - A system and method are disclosed for calibrating a depth camera in a natural user interface. The system in general obtains an objective measurement of true distance between a capture device and one or more objects in a scene. The system then compares the true depth measurement to the depth measurement provided by the depth camera at one or more points and determines an error function describing an error in the depth camera measurement. The depth camera may then be recalibrated to correct for the error. The objective measurement of distance to one or more objects in a scene may be accomplished by a variety of systems and methods. | 05-03-2012 |
20120155680 | VIRTUAL AUDIO ENVIRONMENT FOR MULTIDIMENSIONAL CONFERENCING - The disclosed architecture employs signal processing techniques to provide audio perception only, or audio perception that matches the visual perception. This also provides spatial audio reproduction for multiparty teleconferencing such that the teleconferencing participants perceive themselves as if they were sitting in the same room. The solution is based on the premise that people perceive sounds as a reconstructed wavefront, and hence, the wavefronts are used to provide the spatial perceptual cues. The differences between the spatial perceptual cues derived from the reconstructed wavefront of sound waves and the ideal wavefront of sound waves form an objective metric for spatial perceptual quality, and provide the means of evaluating the overall system performance. Additionally, compensation filters are employed to improve the spatial perceptual quality of stereophonic systems by optimizing the objective metrics. | 06-21-2012 |
20120262536 | STEREOPHONIC TELECONFERENCING USING A MICROPHONE ARRAY - Stereophonic teleconferencing system embodiments are described which advantageously employ a microphone array at a remote conference site having multiple conferencees to produce a separate output channel from each microphone in the array. Audio data streams each representing one of the audio output channels from the microphone array are then sent to a local conference site where a local conferencee is in attendance. The voices of the aforementioned remote conferencees are spatialized within a sound-field of the local site using multiple loudspeakers. Generally, this involves receiving the monophonic audio data streams from the remote site, and processing them to generate an audio signal for each loudspeaker. Each of the generated audio signals is then played through its respective loudspeaker to produce a spatial audio sound-field which is audibly perceived by the local conferencee as having the voice of each of the remote conferencees coming from a different location. | 10-18-2012 |
20120268563 | AUGMENTED AUDITORY PERCEPTION FOR THE VISUALLY IMPAIRED - A person is provided with the ability to auditorily determine the spatial geometry of his current physical environment. A spatial map of the current physical environment of the person is generated. The spatial map is then used to generate a spatialized audio representation of the environment. The spatialized audio representation is then output to a stereo listening device which is being worn by the person. | 10-25-2012 |
20120280974 | PHOTO-REALISTIC SYNTHESIS OF THREE DIMENSIONAL ANIMATION WITH FACIAL FEATURES SYNCHRONIZED WITH SPEECH - Dynamic texture mapping is used to create a photorealistic three dimensional animation of an individual with facial features synchronized with desired speech. Audiovisual data of an individual reading a known script is obtained and stored in an audio library and an image library. The audiovisual data is processed to extract feature vectors used to train a statistical model. An input audio feature vector corresponding to desired speech with which the animation will be synchronized is provided. The statistical model is used to generate a trajectory of visual feature vectors that corresponds to the input audio feature vector. These visual feature vectors are used to identify a matching image sequence from the image library. The resulting sequence of images, concatenated from the image library, provides a photorealistic image sequence with facial features, such as lip movements, synchronized with the desired speech. This image sequence is applied to the three-dimensional model. | 11-08-2012 |
20120281059 | Immersive Remote Conferencing - The subject disclosure is directed towards an immersive conference, in which participants in separate locations are brought together into a common virtual environment (scene), such that they appear to each other to be in a common space, with geometry, appearance, and real-time natural interaction (e.g., gestures) preserved. In one aspect, depth data and video data are processed to place remote participants in the common scene from the first person point of view of a local participant. Sound data may be spatially controlled, and parallax computed to provide a realistic experience. The scene may be augmented with various data, videos and other effects/animations. | 11-08-2012 |
20120287223 | IMAGING THROUGH A DISPLAY SCREEN - The described implementations relate to enhancing images, such as in videoconferencing scenarios. One system includes a poriferous display screen having generally opposing front and back surfaces. This system also includes a camera positioned proximate to the back surface to capture an image through the poriferous display screen. | 11-15-2012 |
20120294510 | DEPTH RECONSTRUCTION USING PLURAL DEPTH CAPTURE UNITS - A depth construction module is described that receives depth images provided by two or more depth capture units. Each depth capture unit generates its depth image using a structured light technique, that is, by projecting a pattern onto an object and receiving a captured image in response thereto. The depth construction module then identifies at least one deficient portion in at least one depth image that has been received, which may be attributed to overlapping projected patterns that impinge the object. The depth construction module then uses a multi-view reconstruction technique, such as a plane sweeping technique, to supply depth information for the deficient portion. In another mode, a multi-view reconstruction technique can be used to produce an entire depth scene based on captured images received from the depth capture units, that is, without first identifying deficient portions in the depth images. | 11-22-2012 |
20120306995 | Ambulatory Presence Features - A system facilitates managing one or more devices utilized for communicating data within a telepresence session. A telepresence session can be initiated within a communication framework that includes a first user and one or more second users. In response to determining a temporary absence of the first user from the telepresence session, a recordation of the telepresence session is initialized to enable a playback of a portion or a summary of the telepresence session that the first user has missed. | 12-06-2012 |
20130010079 | CALIBRATION BETWEEN DEPTH AND COLOR SENSORS FOR DEPTH CAMERAS - A system described herein includes a receiver component that receives a first digital image from a color camera, wherein the first digital image comprises a planar object, and a second digital image from a depth sensor, wherein the second digital image comprises the planar object. The system also includes a calibrator component that jointly calibrates the color camera and the depth sensor based at least in part upon the first digital image and the second digital image. | 01-10-2013 |
20130121526 | COMPUTING 3D SHAPE PARAMETERS FOR FACE ANIMATION - A three-dimensional shape parameter computation system and method for computing three-dimensional human head shape parameters from two-dimensional facial feature points. A series of images containing a user's face is captured. Embodiments of the system and method deduce the 3D parameters of the user's head by examining a series of captured images of the user over time and in a variety of head poses and facial expressions, and then computing an average. An energy function is constructed over a batch of frames containing 2D face feature points obtained from the captured images, and the energy function is minimized to solve for the head shape parameters valid for the batch of frames. Head pose parameters and facial expression and animation parameters can vary over each captured image in the batch of frames. In some embodiments this minimization is performed using a modified Gauss-Newton minimization technique using a single iteration. | 05-16-2013 |
20130151244 | HARMONICITY-BASED SINGLE-CHANNEL SPEECH QUALITY ESTIMATION - Speech quality estimation technique embodiments are described which generally involve estimating the human speech quality of an audio frame in a single-channel audio signal. A representation of a harmonic component of the frame is synthesized and used to compute a non-harmonic component of the frame. The synthesized harmonic component representation and the non-harmonic component are then used to compute a harmonic to non-harmonic ratio (HnHR). This HnHR is indicative of the quality of a user's speech and is designated as an estimate of the speech quality of the frame. In one implementation, the HnHR is used to establish a minimum speech quality threshold below which the quality of the user's speech is considered unacceptable. Feedback to the user is then provided based on whether the HnHR falls below the threshold. | 06-13-2013 |
20130201291 | HEAD POSE TRACKING USING A DEPTH CAMERA - Head pose tracking technique embodiments are presented that use a group of sensors configured so as to be disposed on a user's head. This group of sensors includes a depth sensor apparatus used to identify the three-dimensional locations of features within a scene, and at least one other type of sensor. Data output by each sensor in the group is periodically input, and each time it is input it is used to compute a transformation matrix. This transformation matrix is then applied to a previously determined head pose location and orientation, established when the first sensor data was input, to identify the current head pose location and orientation. | 08-08-2013 |
20130232515 | ESTIMATING ENGAGEMENT OF CONSUMERS OF PRESENTED CONTENT - Technologies described herein relate to estimating engagement of a person with respect to content being presented to the person. A sensor outputs a stream of data relating to the person as the person is consuming the content. At least one feature is extracted from the stream of data, and a level of engagement of the person is estimated based at least in part upon the at least one feature. A computing function is performed based upon the estimated level of engagement of the person. | 09-05-2013 |
20130294710 | RECOVERING DIS-OCCLUDED AREAS USING TEMPORAL INFORMATION INTEGRATION - A temporal information integration dis-occlusion system and method for using historical data to reconstruct a virtual view containing an occluded area. Embodiments of the system and method use temporal information of the scene captured previously to obtain a total history. This total history is warped onto information captured by a camera at the current time to help reconstruct the dis-occluded areas, yielding warped history information. The historical data (or frames) from the total history match only a portion of the frames contained in the captured information. Warping is performed by using one of two embodiments to match points in an estimation of the current information to points in the captured information. Next, regions of the current information are split using a classifier. The warped history information and the captured information are then merged to obtain an estimate of the current information and the reconstructed virtual view. | 11-07-2013 |
20130321564 | PERSPECTIVE-CORRECT COMMUNICATION WINDOW WITH MOTION PARALLAX - A perspective-correct communication window system and method for communicating between participants in an online meeting, where the participants are not in the same physical locations. Embodiments of the system and method provide an in-person communications experience by changing the virtual viewpoint for each participant as they view the online meeting. The participant sees a different perspective displayed on a monitor based on the location of the participant's eyes. Embodiments of the system and method include a capture and creation component that is used to capture visual data about each participant and create a realistic geometric proxy from the data. A scene geometry component is used to create a virtual scene geometry that mimics the arrangement of an in-person meeting. A virtual viewpoint component displays the changing virtual viewpoint to the viewer and can add perceived depth using motion parallax. | 12-05-2013 |
20130336524 | Dynamic Hand Gesture Recognition Using Depth Data - The subject disclosure is directed towards a technology by which dynamic hand gestures are recognized by processing depth data, including in real-time. In an offline stage, a classifier is trained from feature values extracted from frames of depth data that are associated with intended hand gestures. In an online stage, a feature extractor extracts feature values from sensed depth data that corresponds to an unknown hand gesture. These feature values are input to the classifier as a feature vector to receive a recognition result of the unknown hand gesture. The technology may be used in real time, and may be robust to variations in lighting, hand orientation, and the user's gesturing speed and style. | 12-19-2013 |
20140009562 | MULTI-DEVICE CAPTURE AND SPATIAL BROWSING OF CONFERENCES - Multi-device capture and spatial browsing of conferences is described. In one implementation, a system detects cameras and microphones, such as the webcams on participants' notebook computers, in a conference room, group meeting, or table game, and enlists an ad-hoc array of available devices to capture each participant and the spatial relationships between participants. A video stream composited from the array is browsable by a user to navigate a 3-dimensional representation of the meeting. Each participant may be represented by a video pane, a foreground object, or a 3-D geometric model of the participant's face or body displayed in spatial relation to the other participants in a 3-dimensional arrangement analogous to the spatial arrangement of the meeting. The system may automatically re-orient the 3-dimensional representation as needed to best show a currently interesting event. | 01-09-2014 |
20140098183 | CONTROLLED THREE-DIMENSIONAL COMMUNICATION ENDPOINT - A controlled three-dimensional (3D) communication endpoint system and method for simulating an in-person communication between participants in an online meeting or conference and providing easy scaling of a virtual environment when additional participants join. This gives the participants the illusion that the other participants are in the same room and sitting around the same table with the viewer. The controlled communication endpoint includes a plurality of camera pods that capture video of a participant from 360 degrees around the participant. The controlled communication endpoint also includes a display device configuration containing display devices placed at least 180 degrees around the participant that display the virtual environment containing geometric proxies of the other participants. Scalability is achieved simply by placing the participants at a round virtual table and increasing the diameter of the table as additional participants are added. | 04-10-2014 |
20140168204 | MODEL BASED VIDEO PROJECTION - A method, system, and computer-readable storage media for model based video projection are provided herein. The method includes tracking an object within a video based on a three-dimensional parametric model via a computing device and projecting the video onto the three-dimensional parametric model. The method also includes updating a texture map corresponding to the object within the video and rendering a three-dimensional video of the object from any of a number of viewpoints by loosely coupling the three-dimensional parametric model and the updated texture map. | 06-19-2014 |
20140232816 | PROVIDING A TELE-IMMERSIVE EXPERIENCE USING A MIRROR METAPHOR - A tele-immersive environment is described that provides interaction among participants of a tele-immersive session. The environment includes two or more set-ups, each associated with a participant. Each set-up, in turn, includes mirror functionality for presenting a three-dimensional virtual space for viewing by a local participant. The virtual space shows at least some of the participants as if the participants were physically present at a same location and looking into a mirror. The mirror functionality can be implemented as a combination of a semi-transparent mirror and a display device, or just a display device acting alone. According to another feature, the environment may present a virtual object in a manner that allows any of the participants of the tele-immersive session to interact with the virtual object. | 08-21-2014 |
20140267225 | HAIR SURFACE RECONSTRUCTION FROM WIDE-BASELINE CAMERA ARRAYS - The subject disclosure is directed towards reconstructing an approximate hair surface using refinement of hair strands. Hair strands are first extracted from 2D images of a camera array, and projected onto a 3D visual hull. The 3D positions of these strands are refined by optimizing an objective function that takes into account orientation consistency, a visual hull constraint and/or smoothness constraints defined at the strand, wisp and/or global levels. | 09-18-2014 |
20140325393 | ACCELERATED INSTANT REPLAY FOR CO-PRESENT AND DISTRIBUTED MEETINGS - Techniques for recording and replay of a live conference while still attending the live conference are described. A conferencing system includes a user interface generator, a live conference processing module, and a replay processing module. The user interface generator is configured to generate a user interface that includes a replay control panel and one or more output panels. The live conference processing module is configured to extract information included in received conferencing data that is associated with one or more conferencing modalities, and to display the information in the one or more output panels in a live manner (e.g., as a live conference). The replay processing module is configured to enable information associated with the one or more conferencing modalities, corresponding to a time in the conference session prior to the live point, to be presented at a desired rate, possibly different from the real-time rate. | 10-30-2014 |
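Several of the abstracts above describe pose tracking in matrix terms; the head pose tracking entry (20130201291), for example, updates a previously determined head pose by applying a per-reading transformation matrix. The sketch below illustrates that general idea only: it represents a pose as a 4x4 homogeneous matrix and applies an update transform by matrix multiplication. The function names and the example rotation/translation values are illustrative assumptions, not details from the patent.

```python
import numpy as np

def make_pose(rotation: np.ndarray, translation: np.ndarray) -> np.ndarray:
    """Build a 4x4 homogeneous pose from a 3x3 rotation and a 3-vector translation."""
    pose = np.eye(4)
    pose[:3, :3] = rotation
    pose[:3, 3] = translation
    return pose

def update_pose(previous_pose: np.ndarray, transform: np.ndarray) -> np.ndarray:
    """Apply a per-reading transformation matrix to the previous head pose."""
    return transform @ previous_pose

# Hypothetical example: start at the origin, then apply a 90-degree yaw
# combined with a small translation, as a sensor-derived update might report.
yaw90 = np.array([[0.0, -1.0, 0.0],
                  [1.0,  0.0, 0.0],
                  [0.0,  0.0, 1.0]])
initial = make_pose(np.eye(3), np.zeros(3))
step = make_pose(yaw90, np.array([0.1, 0.0, 0.0]))
current = update_pose(initial, step)
```

Because each update is itself a rigid transform, repeated sensor readings compose by further multiplications, so the current pose is always the product of all updates applied to the initial pose.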