Inventors list

Assignees list

Classification tree browser

Top 100 Inventors

Top 100 Assignees

Gregory H. Bellows, Austin US

Gregory H. Bellows, Austin, TX US

Patent application number	Description	Published
20090157913	Method for Toggling Non-Adjacent Channel Identifiers During DMA Double Buffering Operations - Disclosed are a method, a system and a computer program product for managing direct memory access (DMA) operations in a double buffering system. During direct memory access operations in a computer system, data is transferred from a source memory location to a destination memory location with minimal use of the computer's processing unit. Double buffering utilizes two separate memory buffers to perform simultaneous DMA operations. Prior to processing a DMA request each buffer in a double buffering system is assigned a channel identification (ID), or tag. When reading, writing, or polling status of data in a buffer, the tag identifies the buffer. A toggle factor is utilized to conveniently switch between each buffer in the double buffering system. Utilizing a toggle factor decreases latencies in DMA operations.	06-18-2009
20100159875	Telephone Handset Contact List Synchronization - A method comprises a first telephone handset selecting a second telephone handset as an approved contact exchange partner. The first telephone handset and the second telephone handset comprise contact information organized in a database. The first telephone handset establishes a telephone call between the first telephone handset and the second telephone handset. The first telephone handset receives contact update information from the second telephone handset in a first protocol. The first telephone handset synchronizes the contact update information with the contact information of the first telephone handset.	06-24-2010
20110055531	Synchronizing Commands and Dependencies in an Asynchronous Command Queue - Provided are techniques for the managing of command queue dependencies and command queue synchronization. Incoming commands are actively tracked through their dependency relationships. Command dependencies may be tracked across multiple lists, including a submission list and a completion list. Each command on the submission list is prepared for processing and ultimately submitted to command processing logic. Command completion processing is performed on each command on the completion list, including by not limited to removing dependencies from pending commands and possibly queuing pending commands for submission to the command processing logic. Also provided as features of a command queue are a standby barrier, an active barrier and a marker. Standby and active barriers are employed to synchronize and track commands through the command queue. Markers are employed to track commands through the command queue.	03-03-2011
20110055839	Multi-Core/Thread Work-Group Computation Scheduler - Execution units process commands from one or more command queues. Once a command is available on the queue, each unit participating in the execution of the command atomically decrements the command's work groups remaining counter by the work group reservation size and processes a corresponding number of work groups within a work group range. Once all work groups within a range are processed, an execution unit increments a work group processed counter. The unit that increments the work group processed counter to the value stored in a work groups to be executed counter signals completion of the command. Each execution unit that access a command also marks a work group seen counter. Once the work groups processed counter equals the work groups to be executed counter and the work group seen counter equals the number of execution units, the command may be removed or overwritten on the command queue.	03-03-2011
20110161608	METHOD TO CUSTOMIZE FUNCTION BEHAVIOR BASED ON CACHE AND SCHEDULING PARAMETERS OF A MEMORY ARGUMENT - Disclosed are a method, a system and a computer program product of operating a data processing system that can include or be coupled to multiple processor cores. In one or more embodiments, each of multiple memory objects can be populated with work items and can be associated with attributes that can include information which can be used to describe data of each memory object and/or which can be used to process data of each memory object. The attributes can be used to indicate one or more of a cache policy, a cache size, and a cache line size, among others. In one or more embodiments, the attributes can be used as a history of how each memory object is used. The attributes can be used to indicate cache history statistics (e.g., a hit rate, a miss rate, etc.).	06-30-2011
20110161734	PROCESS INTEGRITY IN A MULTIPLE PROCESSOR SYSTEM - Disclosed are a method, a system and a computer program product of operating a data processing system that can include or be coupled to multiple processor cores. In one or more embodiments, an error can be determined while two or more processor cores are processing a first group of two or more work items, and the error can be signaled to an application. The application can determine a state of progress of processing the two or more work items and at least one dependency from the state of progress. In one or more embodiments, a second group of two or more work items that are scheduled for processing can be unscheduled, in response to determining the error. In one or more embodiments, the application can process at least one work item that caused the error, and the second group of two or more work items can be rescheduled for processing.	06-30-2011
20110161943	METHOD TO DYNAMICALLY DISTRIBUTE A MULTI-DIMENSIONAL WORK SET ACROSS A MULTI-CORE SYSTEM - A method provides efficient dispatch/completion of an N Dimensional (ND) Range command in a data processing system (DPS). The method comprises: a compiler generating one or more commands from received program instructions; ND Range work processing (WP) logic determining when a command generated by the compiler will be implemented over an ND configuration of operands, where N is greater than one (1); automatically decomposing the ND configuration of operands into a one (1) dimension (1D) work element comprising P sequentially ordered work items that each represent one of the operands; placing the 1D work element within a command queue of the DPS; enabling sequential dispatching of 1D work items in ordered sequence from to one or more processing units; and generating an ND Range output by mapping the 1D work output result to an ND position corresponding to an original location of the operand represented by the 1D work item.	06-30-2011
20110161970	METHOD TO REDUCE QUEUE SYNCHRONIZATION OF MULTIPLE WORK ITEMS IN A SYSTEM WITH HIGH MEMORY LATENCY BETWEEN COMPUTE NODES - Disclosed are a method, a system and a computer program product of operating a data processing system that can include or be coupled to multiple processor cores. The multiple processor cores can be coupled to a memory that can include multiple priority queues associated with multiple respective priorities and store multiple work items. Work items stored in the multiple priority queues can be associated with a bit mask which is associated with a respective priority queue and can be routed to respective groups of one or more processors based on the associated bit mask. In one or more embodiments, at least two groups of processor cores can include at least one processor core that is common to both of the at least two groups of processor cores.	06-30-2011
20110161975	REDUCING CROSS QUEUE SYNCHRONIZATION ON SYSTEMS WITH LOW MEMORY LATENCY ACROSS DISTRIBUTED PROCESSING NODES - A method for efficient dispatch/completion of a work element within a multi-node data processing system. The method comprises: selecting specific processing units from among the processing nodes to complete execution of a work element that has multiple individual work items that may be independently executed by different ones of the processing units; generating an allocated processor unit (APU) bit mask that identifies at least one of the processing units that has been selected; placing the work element in a first entry of a global command queue (GCQ); associating the APU mask with the work element in the GCQ; and responsive to receipt at the GCQ of work requests from each of the multiple processing nodes or the processing units, enabling only the selected specific ones of the processing nodes or the processing units to be able to retrieve work from the work element in the GCQ.	06-30-2011
20110161976	METHOD TO REDUCE QUEUE SYNCHRONIZATION OF MULTIPLE WORK ITEMS IN A SYSTEM WITH HIGH MEMORY LATENCY BETWEEN PROCESSING NODES - A method efficiently dispatches/completes a work element within a multi-node, data processing system that has a global command queue (GCQ) and at least one high latency node. The method comprises: at the high latency processor node, work scheduling logic establishing a local command/work queue (LCQ) in which multiple work items for execution by local processing units can be staged prior to execution; a first local processing unit retrieving via a work request a larger chunk size of work than can be completed in a normal work completion/execution cycle by the local processing unit; storing the larger chunk size of work retrieved in a local command/work queue (LCQ); enabling the first local processing unit to locally schedule and complete portions of the work stored within the LCQ; and transmitting a next work request to the GCQ only when all the work within the LCQ has been dispatched by the local processing units.	06-30-2011
20110191785	Terminating An Accelerator Application Program In A Hybrid Computing Environment - Terminating an accelerator application program in a hybrid computing environment that includes a host computer having a host computer architecture and an accelerator having an accelerator architecture, where the host computer and the accelerator are adapted to one another for data communications by a system level message passing module (‘SLMPM’), and terminating an accelerator application program in a hybrid computing environment includes receiving, by the SLMPM from a host application executing on the host computer, a request to terminate an accelerator application program executing on the accelerator; terminating, by the SLMPM, execution of the accelerator application program; returning, by the SLMPM to the host application, a signal indicating that execution of the accelerator application program was terminated; and performing, by the SLMPM, a cleanup of the execution environment associated with the terminated accelerator application program.	08-04-2011
20120221836	Synchronizing Commands and Dependencies in an Asynchronous Command Queue - Provided are techniques for the managing of command queue dependencies and command queue synchronization. Incoming commands are actively tracked through their dependency relationships. Command dependencies may be tracked across multiple lists, including a submission list and a completion list. Each command on the submission list is prepared for processing and ultimately submitted to command processing logic. Command completion processing is performed on each command on the completion list, including by not limited to removing dependencies from pending commands and possibly queuing pending commands for submission to the command processing logic. Also provided as features of a command queue are a standby barrier, an active barrier and a marker. Standby and active barriers are employed to synchronize and track commands through the command queue. Markers are employed to track commands through the command queue.	08-30-2012
20130013897	METHOD TO DYNAMICALLY DISTRIBUTE A MULTI-DIMENSIONAL WORK SET ACROSS A MULTI-CORE SYSTEM - A method provides efficient dispatch/completion of an N Dimensional (ND) Range command in a data processing system (DPS). The method comprises: a compiler generating one or more commands from received program instructions; ND Range work processing (WP) logic determining when a command generated by the compiler will be implemented over an ND configuration of operands, where N is greater than one (1); automatically decomposing the ND configuration of operands into a one (1) dimension (1D) work element comprising P sequentially ordered work items that each represent one of the operands; placing the 1D work element within a command queue of the DPS; enabling sequential dispatching of 1D work items in ordered sequence from to one or more processing units; and generating an ND Range output by mapping the 1D work output result to an ND position corresponding to an original location of the operand represented by the 1D work item.	01-10-2013
20130014124	REDUCING CROSS QUEUE SYNCHRONIZATION ON SYSTEMS WITH LOW MEMORY LATENCY ACROSS DISTRIBUTED PROCESSING NODES - A method for efficient dispatch/completion of a work element within a multi-node data processing system. The method comprises: selecting specific processing units from among the processing nodes to complete execution of a work element that has multiple individual work items that may be independently executed by different ones of the processing units; generating an allocated processor unit (APU) bit mask that identifies at least one of the processing units that has been selected; placing the work element in a first entry of a global command queue (GCQ); associating the APU mask with the work element in the GCQ; and responsive to receipt at the GCQ of work requests from each of the multiple processing nodes or the processing units, enabling only the selected specific ones of the processing nodes or the processing units to be able to retrieve work from the work element in the GCQ.	01-10-2013
20130254776	METHOD TO REDUCE QUEUE SYNCHRONIZATION OF MULTIPLE WORK ITEMS IN A SYSTEM WITH HIGH MEMORY LATENCY BETWEEN PROCESSING NODES - A method efficiently dispatches/completes a work element within a multi-node, data processing system that has a global command queue (GCQ) and at least one high latency node. The method comprises: at the high latency processor node, work scheduling logic establishing a local command/work queue (LCQ) in which multiple work items for execution by local processing units can be staged prior to execution; a first local processing unit retrieving via a work request a larger chunk size of work than can be completed in a normal work completion/execution cycle by the local processing unit; storing the larger chunk size of work retrieved in a local command/work queue (LCQ); enabling the first local processing unit to locally schedule and complete portions of the work stored within the LCQ; and transmitting a next work request to the GCQ only when all the work within the LCQ has been dispatched by the local processing units.	09-26-2013

Patent applications by Gregory H. Bellows, Austin, TX US