Patent application number | Description | Published |
20080225863 | DATA PROCESSING SYSTEM, METHOD AND INTERCONNECT FABRIC SUPPORTING MULTIPLE PLANES OF PROCESSING NODES - A data processing system includes a first plane including a first plurality of processing nodes, each including multiple processing units, and a second plane including a second plurality of processing nodes, each including multiple processing units. The data processing system also includes a plurality of point-to-point first tier links. Each of the first plurality and second plurality of processing nodes includes one or more first tier links among the plurality of first tier links, where the first tier link(s) within each processing node connect a pair of processing units in the same processing node for communication. The data processing system further includes a plurality of point-to-point second tier links. At least a first of the plurality of second tier links connects processing units in different ones of the first plurality of processing nodes, at least a second of the plurality of second tier links connects processing units in different ones of the second plurality of processing nodes, and at least a third of the plurality of second tier links connects a processing unit in the first plane to a processing unit in the second plane. | 09-18-2008 |
20080307182 | EFFICIENT AND FLEXIBLE MEMORY COPY OPERATION - A system, method, and computer program product for semi-synchronously copying data from a first portion of memory to a second portion of memory are disclosed. The method comprises receiving, in a processor, a call for a semi-synchronous memory copy operation. The semi-synchronous memory copy operation preserves temporal persistence of validity for a virtual source address corresponding to a source location in a memory and a virtual target address corresponding to a target location in the memory by setting a flag bit. The call includes at least the virtual source address, the virtual target address, and an indicator identifying a number of bytes to be copied. The memory copy operation is placed in a queue, coupled to the memory controller, for execution by the memory controller. Execution of at least one subsequent instruction continues as that instruction becomes available from the instruction pipeline (see the copy-queue sketch following this listing). | 12-11-2008 |
20090063443 | System and Method for Dynamically Supporting Indirect Routing Within a Multi-Tiered Full-Graph Interconnect Architecture - A method, computer program product, and system are provided for dynamically routing data through the data processing system. Data that is to be transmitted to a destination processor is received at a first processor. The data that is received includes address information. A lookup is performed in routing table data structures based on the address information to identify candidate paths through which the data is routed to the destination processor. A determination is made, based on a setting of at least one identifier, as to whether any of the candidate paths cannot be used to route the data to the destination processor. A path is selected from the identified candidate paths for routing the data, based on the setting of the at least one identifier. Then, the data is transmitted from the first processor along the selected path toward the destination processor. | 03-05-2009 |
20090063444 | System and Method for Providing Multiple Redundant Direct Routes Between Supernodes of a Multi-Tiered Full-Graph Interconnect Architecture - A method, computer program product, and system are provided for selecting, from a plurality of routes through the data processing system, a direct route for transmitting data. Data that includes address information and is to be transmitted to a destination processor is received at a first processor. Using routing table data structures, direct route entries are identified that correspond to direct routes for transmitting the data. A priority table data structure is accessed that comprises a priority entry for each entry in the routing table data structures; each priority entry specifies a priority of the corresponding routing table entry. Based on the specified priorities, a direct route entry corresponding to a direct route is selected from the routing table data structures. Then the data is transmitted from the first processor to the destination processor using a path corresponding to the selected direct route entry (see the route-selection sketch following this listing). | 03-05-2009 |
20090063445 | System and Method for Handling Indirect Routing of Information Between Supernodes of a Multi-Tiered Full-Graph Interconnect Architecture - A method, computer program product, and system are provided for selecting, from a plurality of routes through the data processing system, an indirect route for transmitting data. Data that includes address information and is to be transmitted to a destination processor is received at a first processor. Using routing table data structures, indirect route entries are identified that correspond to indirect routes for transmitting the data. A priority table data structure is accessed that comprises a priority entry for each entry in the routing table data structures; each priority entry specifies a priority of the corresponding routing table entry. Based on the specified priorities, an indirect route entry corresponding to an indirect route is selected from the routing table data structures. Then the data is transmitted from the first processor to the destination processor using a path corresponding to the selected indirect route entry. | 03-05-2009 |
20090063728 | System and Method for Direct/Indirect Transmission of Information Using a Multi-Tiered Full-Graph Interconnect Architecture - A method, computer program product, and system are provided for transmitting data in a data network. A first processor of the data network receives data to be transmitted to a second processor within the data network. A determination is made if the data has previously been routed through an indirect communication link from a source processor, the indirect communication link being a communication link that does not directly couple the source processor to a final destination processor which is to receive the data. A communication link is selected over which to transmit the data from the first processor to the second processor based on results of determining if the data has previously been routed through an indirect communication link. Finally, the data is transmitted from the first processor to the second processor using the selected communication link. | 03-05-2009 |
20090063811 | System for Data Processing Using a Multi-Tiered Full-Graph Interconnect Architecture - A system is provided for implementing a multi-tiered full-graph interconnect architecture. In order to implement a multi-tiered full-graph interconnect architecture, a plurality of processors are coupled to one another to create a plurality of processor books. The plurality of processor books are coupled together to create a plurality of supernodes. Then, the plurality of supernodes are coupled together to create the multi-tiered full-graph interconnect architecture. Data is then transmitted from one processor to another within the multi-tiered full-graph interconnect architecture based on an addressing scheme that specifies at least a supernode and a processor book associated with a target processor to which the data is to be transmitted. | 03-05-2009 |
20090063814 | System and Method for Routing Information Through a Data Processing System Implementing a Multi-Tiered Full-Graph Interconnect Architecture - A method, computer program product, and system are provided for routing information through the data processing system. Data is received at a source processor within a set of processors that is to be transmitted to a destination processor, where the data includes address information. A first determination is performed as to whether the destination processor is within a same processor book as the source processor based on the address information. A second determination is performed as to whether the destination processor is within a same supernode as the source processor based on the address information if the destination processor is not within the same processor book. A routing path is identified for the data based on results of the first determination, the second determination, and one or more routing table data structures. The data is then transmitted from the source processor along the identified routing path toward the destination processor. | 03-05-2009 |
20090063815 | System and Method for Providing Full Hardware Support of Collective Operations in a Multi-Tiered Full-Graph Interconnect Architecture - A method, computer program product, and system are provided for performing collective operations. In hardware of a parent processor in a first processor book, a number of other processors, in the same or a different processor book of the data processing system, needed to execute the collective operation is determined, thereby establishing a plurality of processors comprising the parent processor and the other processors. In hardware of the parent processor, the plurality of processors are logically arranged as a plurality of nodes in a hierarchical structure. The collective operation is transmitted to the plurality of processors based on the hierarchical structure. In hardware of the parent processor, results of executing the collective operation are received from the other processors, a final result of the collective operation is generated based on the received results, and the final result is output. | 03-05-2009 |
20090063816 | System and Method for Performing Collective Operations Using Software Setup and Partial Software Execution at Leaf Nodes in a Multi-Tiered Full-Graph Interconnect Architecture - A method, computer program product, and system are provided for performing collective operations. In software executing on a parent processor in a first processor book, a number of other processors, in the same or a different processor book of the data processing system, needed to execute the collective operation is determined, thereby establishing a plurality of processors comprising the parent processor and the other processors. In software executing on the parent processor, the plurality of processors are logically arranged as a plurality of nodes in a hierarchical structure. The collective operation is transmitted to the plurality of processors based on the hierarchical structure. In hardware of the parent processor, results of executing the collective operation are received from the other processors, a final result of the collective operation is generated based on the received results, and the final result is output. | 03-05-2009 |
20090063817 | System and Method for Packet Coalescing in Virtual Channels of a Data Processing System in a Multi-Tiered Full-Graph Interconnect Architecture - A method, computer program product, and system are provided for packet coalescing in virtual channels of a data processing system. A first processor bundles original data to be transmitted to a destination processor, the original data provided by a first source processor. The first processor transmits the bundle of data to a second processor along a path to the destination processor. The second processor determines if the second processor has additional data destined for the same destination processor, the additional data being provided by a second source processor that is different from the first source processor. Responsive to the second processor having additional data, the second processor unbundles the original data, adds the additional data to the original data, and rebundles the data along with the additional data. Then the second processor transmits the rebundled data to at least one other processor along the path to the destination processor. | 03-05-2009 |
20090063880 | System and Method for Providing a High-Speed Message Passing Interface for Barrier Operations in a Multi-Tiered Full-Graph Interconnect Architecture - A method, computer program product, and system are provided for performing a Message Passing Interface (MPI) job. A first processor chip receives a set of arrival signals from a set of processor chips executing tasks of the MPI job in the data processing system. The arrival signals identify when a processor chip executes a synchronization operation for synchronizing the tasks for the MPI job. Responsive to receiving the set of arrival signals from the set of processor chips, the first processor chip identifies a fastest processor chip of the set of processor chips whose arrival signal arrived first. An operation of the fastest processor chip is modified based on the identification of the fastest processor chip. The set of processor chips comprises processor chips that are in one of a same processor book or a different processor book of the data processing system. | 03-05-2009 |
20090063885 | System and Computer Program Product for Modifying an Operation of One or More Processors Executing Message Passing Interface Tasks - A system and computer program product for modifying an operation of one or more processors executing message passing interface (MPI) tasks are provided. Mechanisms for adjusting the balance of processing workloads of the processors are provided so as to minimize wait periods spent waiting for all of the processors to call a synchronization operation. Each processor has an associated hardware implemented MPI load balancing controller. The MPI load balancing controller maintains a history that provides a profile of the tasks with regard to their calls to synchronization operations. From this information, it can be determined which processors should have their processing loads lightened and which processors are able to handle additional processing loads without significantly negatively affecting the overall operation of the parallel execution system. As a result, operations may be performed to shift workloads from the slowest processor to one or more of the faster processors. | 03-05-2009 |
20090063886 | System for Providing a Cluster-Wide System Clock in a Multi-Tiered Full-Graph Interconnect Architecture - A system for providing a cluster-wide system clock in a multi-tiered full graph (MTFG) interconnect architecture is provided. Heartbeat signals transmitted by each of the processor chips in the computing cluster are synchronized. Internal system clock signals are generated in each of the processor chips based on the synchronized heartbeat signals. As a result, the internal system clock signals of each of the processor chips are synchronized since the heartbeat signals, which are the basis for the internal system clock signals, are themselves synchronized. Mechanisms are provided for performing such synchronization using direct couplings of processor chips within the same processor book, different processor books in the same supernode, and different processor books in different supernodes of the MTFG interconnect architecture. | 03-05-2009 |
20090063891 | System and Method for Providing Reliability of Communication Between Supernodes of a Multi-Tiered Full-Graph Interconnect Architecture - A method, computer program product, and system are provided for providing reliability of communication. A first processor of the data processing system determines a current state of the links coupled to its ports. Each port of the first processor comprises a plurality of links to a corresponding port on a second processor of the data processing system. The current state of the links indicates a level of error associated with each link. The first processor determines, for each link, if a level of error associated with the link exceeds a threshold. For each link whose level of error exceeds the threshold, the first processor tags the link with an error identifier in a switch associated with the ports of the first processor. The first processor reduces a level of usage for transmitting data on ports associated with links tagged with the error identifier. | 03-05-2009 |
20090064139 | Method for Data Processing Using a Multi-Tiered Full-Graph Interconnect Architecture - A method is provided for implementing a multi-tiered full-graph interconnect architecture. In order to implement a multi-tiered full-graph interconnect architecture, a plurality of processors are coupled to one another to create a plurality of processor books. The plurality of processor books are coupled together to create a plurality of supernodes. Then, the plurality of supernodes are coupled together to create the multi-tiered full-graph interconnect architecture. Data is then transmitted from one processor to another within the multi-tiered full-graph interconnect architecture based on an addressing scheme that specifies at least a supernode and a processor book associated with a target processor to which the data is to be transmitted. | 03-05-2009 |
20090064140 | System and Method for Providing a Fully Non-Blocking Switch in a Supernode of a Multi-Tiered Full-Graph Interconnect Architecture - A method, computer program product, and system are provided for transmitting data from a first processor of a data processing system to a second processor of the data processing system. In one or more switches, a set of virtual channels is created, the one or more switches comprising, for each processor, a corresponding switch in the one or more switches. The data is transmitted from the first processor to the second processor through a path comprising a subset of processors of a set of processors in the data processing system. In each processor of the subset of processors, the data is stored in a virtual channel of a corresponding switch before transmitting the data to a next processor. The virtual channel of the corresponding switch in which the data is stored corresponds to a position of the processor in the path through which the data is transmitted. | 03-05-2009 |
20090064165 | Method for Hardware Based Dynamic Load Balancing of Message Passing Interface Tasks - A method for providing hardware based dynamic load balancing of message passing interface (MPI) tasks is provided. Mechanisms for adjusting the balance of processing workloads of the processors executing tasks of an MPI job are provided so as to minimize wait periods spent waiting for all of the processors to call a synchronization operation. Each processor has an associated hardware implemented MPI load balancing controller. The MPI load balancing controller maintains a history that provides a profile of the tasks with regard to their calls to synchronization operations. From this information, it can be determined which processors should have their processing loads lightened and which processors are able to handle additional processing loads without significantly negatively affecting the overall operation of the parallel execution system. As a result, operations may be performed to shift workloads from the slowest processor to one or more of the faster processors (see the load-balancing sketch following this listing). | 03-05-2009 |
20090064166 | System and Method for Hardware Based Dynamic Load Balancing of Message Passing Interface Tasks - A system and method for providing hardware based dynamic load balancing of message passing interface (MPI) tasks are provided. Mechanisms for adjusting the balance of processing workloads of the processors executing tasks of an MPI job are provided so as to minimize wait periods spent waiting for all of the processors to call a synchronization operation. Each processor has an associated hardware implemented MPI load balancing controller. The MPI load balancing controller maintains a history that provides a profile of the tasks with regard to their calls to synchronization operations. From this information, it can be determined which processors should have their processing loads lightened and which processors are able to handle additional processing loads without significantly negatively affecting the overall operation of the parallel execution system. As a result, operations may be performed to shift workloads from the slowest processor to one or more of the faster processors. | 03-05-2009 |
20090064167 | System and Method for Performing Setup Operations for Receiving Different Amounts of Data While Processors are Performing Message Passing Interface Tasks - A system and method are provided for performing setup operations for receiving a different amount of data while processors are performing message passing interface (MPI) tasks. Mechanisms for adjusting the balance of processing workloads of the processors are provided so as to minimize wait periods spent waiting for all of the processors to call a synchronization operation. An MPI load balancing controller maintains a history that provides a profile of the tasks with regard to their calls to synchronization operations. From this information, it can be determined which processors should have their processing loads lightened and which processors are able to handle additional processing loads without significantly negatively affecting the overall operation of the parallel execution system. As a result, setup operations may be performed while processors are performing MPI tasks to prepare for receiving different sized portions of data in a subsequent computation cycle based on the history. | 03-05-2009 |
20090064168 | System and Method for Hardware Based Dynamic Load Balancing of Message Passing Interface Tasks By Modifying Tasks - A system and method are provided for providing hardware based dynamic load balancing of message passing interface (MPI) tasks by modifying tasks. Mechanisms for adjusting the balance of processing workloads of the processors executing tasks of an MPI job are provided so as to minimize wait periods spent waiting for all of the processors to call a synchronization operation. Each processor has an associated hardware implemented MPI load balancing controller. The MPI load balancing controller maintains a history that provides a profile of the tasks with regard to their calls to synchronization operations. From this information, it can be determined which processors should have their processing loads lightened and which processors are able to handle additional processing loads without significantly negatively affecting the overall operation of the parallel execution system. Thus, operations may be performed to shift workloads from the slowest processor to one or more of the faster processors. | 03-05-2009 |
20090070617 | Method for Providing a Cluster-Wide System Clock in a Multi-Tiered Full-Graph Interconnect Architecture - A method for providing a cluster-wide system clock in a multi-tiered full graph (MTFG) interconnect architecture is provided. Heartbeat signals transmitted by each of the processor chips in the computing cluster are synchronized. Internal system clock signals are generated in each of the processor chips based on the synchronized heartbeat signals. As a result, the internal system clock signals of each of the processor chips are synchronized since the heartbeat signals, which are the basis for the internal system clock signals, are themselves synchronized. Mechanisms are provided for performing such synchronization using direct couplings of processor chips within the same processor book, different processor books in the same supernode, and different processor books in different supernodes of the MTFG interconnect architecture. | 03-12-2009 |
20090138664 | CACHE INJECTION USING SEMI-SYNCHRONOUS MEMORY COPY OPERATION - A system, method, and computer readable medium for inserting data into a cache memory based on information in a semi-synchronous memory copy instruction are disclosed. The method comprises determining a start of a semi-synchronous memory copy operation. The semi-synchronous memory copy operation is checked for a given value in at least one cache injection bit. In response to the given value in the cache injection bit, a predefined number of lines of destination data is copied into at least one level of cache memory. | 05-28-2009 |
20090157996 | Method, System and Program Product for Allocating a Global Shared Memory - A method of operating a data processing system includes each of multiple tasks within a parallel job executing on multiple nodes of the data processing system issuing a system call to request allocation of backing storage in physical memory for global shared memory accessible to all of the multiple tasks within the parallel job, where the global shared memory is in a global address space defined by a range of effective addresses. Each task among the multiple tasks receives an indication that the allocation requested by the system call was successful only if the global address space for that task was previously reserved and backing storage for the global shared memory has not already been allocated. | 06-18-2009 |
20090182968 | VALIDITY OF ADDRESS RANGES USED IN SEMI-SYNCHRONOUS MEMORY COPY OPERATIONS - A system, method, and computer readable medium for protecting the content of a memory page are disclosed. The method includes determining a start of a semi-synchronous memory copy operation. A range of addresses where the semi-synchronous memory copy operation is being performed is determined. An issued instruction that removes a page table entry is detected. The method further includes determining whether the issued instruction is destined to remove a page table entry associated with at least one address in the range of addresses. In response to the issued instruction being destined to remove the page table entry, the execution of the issued instruction is stalled until the semi-synchronous memory copy operation is completed. | 07-16-2009 |
20090193290 | System and Method to Use Cache that is Embedded in a Memory Hub to Replace Failed Memory Cells in a Memory Subsystem - A memory system, data processing system, and method are provided for using cache that is embedded in a memory hub device to replace failed memory cells. A memory module comprises an integrated memory hub device. The memory hub device comprises an integrated memory device data interface that communicates with a set of memory devices coupled to the memory hub device and a cache integrated in the memory hub device. The memory hub device also comprises an integrated memory hub controller that controls the data that is read or written by the memory device data interface to the cache based on a determination whether one or more memory cells within the set of memory devices has failed. | 07-30-2009 |
20090198695 | Method and Apparatus for Supporting Distributed Computing Within a Multiprocessor System - A locking mechanism for supporting distributed computing within a multiprocessor system is disclosed. A lock control section and a stage control section are assigned to a data block within a system memory. In response to a request for accessing the data block by a processing unit, a determination is made by a memory controller whether or not the lock control section of the data block has been set. If the lock control section of the data block has been set, the access request is denied. Otherwise, if the lock control section of the data block has not been set, another determination is made whether or not a current processing stage of the requesting processing unit matches a processing stage indicated by the stage control section. If the current processing stage of the requesting processing unit does not match the processing stage indicated by the stage control section, the access request is denied; otherwise, the access request is allowed. | 08-06-2009 |
20090198837 | System and Method for Providing Remotely Coupled I/O Adapters - A heterogeneous processing element model is provided where I/O devices look and act like processors. In order to be treated like a processor, an I/O processing element, or other special purpose processing element, must follow some rules and have some characteristics of a processor, such as address translation, security, interrupt handling, and exception processing, for example. The heterogeneous processing element model abstracts an I/O device such that communication intended for the I/O device may be packetized and sent over a network. Thus, a virtualization platform may packetize communication intended for a remotely located I/O device and transmit the packetized communication over a distance, rather than having to make a call to a library, call a device driver, pin memory, and so forth. | 08-06-2009 |
20090198849 | Memory Lock Mechanism for a Multiprocessor System - A memory lock mechanism within a multiprocessor system is disclosed. A lock control section is initially assigned to a data block within a system memory of the multiprocessor system. In response to a request for accessing the data block by a processing unit within the multiprocessor system, a determination is made by a memory controller whether or not the lock control section of the data block has been set. If the lock control section of the data block has been set, the request for accessing the data block is denied. Otherwise, if the lock control section of the data block has not been set, the lock control section of the data block is set, and the request for accessing the data block is allowed (see the lock-control sketch following this listing). | 08-06-2009 |
20090198865 | DATA PROCESSING SYSTEM, PROCESSOR AND METHOD THAT PERFORM A PARTIAL CACHE LINE STORAGE-MODIFYING OPERATION BASED UPON A HINT - In at least one embodiment, a method of data processing in a data processing system having a memory hierarchy includes a processor core executing a storage-modifying memory access instruction to determine a memory address. The processor core transmits to a cache memory within the memory hierarchy a storage-modifying memory access request including the memory address, an indication of a memory access type, and, if present, a partial cache line hint signaling access to less than all granules of a target cache line of data associated with the memory address. In response to the storage-modifying memory access request, the cache memory performs a storage-modifying access to all granules of the target cache line of data if the partial cache line hint is not present and performs a storage-modifying access to less than all granules of the target cache line of data if the partial cache line hint is present. | 08-06-2009 |
20090198891 | Issuing Global Shared Memory Operations Via Direct Cache Injection to a Host Fabric Interface - A data processing system enables global shared memory (GSM) operations across multiple nodes with a distributed EA-to-RA mapping of physical memory. Each node has a host fabric interface (HFI), which includes HFI windows that are assigned to at most one locally-executing task of a parallel job. The tasks perform parallel job execution, but map only a portion of the effective addresses (EAs) of the global address space to the local, real memory of the task's respective node. The HFI window tags all outgoing GSM operations (of the local task) with the job ID, and embeds the target node and HFI window IDs of the node at which the EA is memory mapped. The HFI window also enables processing of received GSM operations with valid EAs that are homed to the local real memory of the receiving node, while preventing processing of other received operations without a valid EA-to-RA local mapping (see the GSM tagging sketch following this listing). | 08-06-2009 |
20090198897 | CACHE MANAGEMENT DURING ASYNCHRONOUS MEMORY MOVE OPERATIONS - A data processing system includes a mechanism for completing an asynchronous memory move (AMM) operation in which the processor receives an AMM ST instruction and processes a processor-level move of data in virtual address space and an asynchronous memory mover then completes a physical move of the data within the real address space (memory). A status/control field of the AMM ST instruction includes an indication of a requested treatment of the lower level cache(s) on completion of the AMM operation. When the status/control field indicates an update to at least one cache should be performed, the asynchronous memory mover automatically forwards a copy of the data from the data move to the lower level cache, and triggers an update of a coherency state for a cache line in which the copy of the data is placed. | 08-06-2009 |
20090198903 | DATA PROCESSING SYSTEM, PROCESSOR AND METHOD THAT VARY AN AMOUNT OF DATA RETRIEVED FROM MEMORY BASED UPON A HINT - In at least one embodiment, a processor detects during execution of program code whether a load instruction within the program code is associated with a hint. In response to detecting that the load instruction is not associated with a hint, the processor retrieves a full cache line of data from the memory hierarchy into the processor in response to the load instruction. In response to detecting that the load instruction is associated with a hint, a processor retrieves a partial cache line of data into the processor from the memory hierarchy in response to the load instruction. | 08-06-2009 |
20090198904 | Techniques for Data Prefetching Using Indirect Addressing with Offset - A technique for performing data prefetching using indirect addressing includes determining a first memory address of a pointer associated with a data prefetch instruction. Content, that is included in a first data block (e.g., a first cache line) of a memory, at the first memory address is then fetched. An offset is then added to the content of the memory at the first memory address to provide a first offset memory address. A second memory address is then determined based on the first offset memory address. A second data block (e.g., a second cache line) that includes data at the second memory address is then fetched (e.g., from the memory or another memory). A data prefetch instruction may be indicated by a unique operational code (opcode), a unique extended opcode, or a field (including one or more bits) in an instruction. | 08-06-2009 |
20090198905 | Techniques for Prediction-Based Indirect Data Prefetching - A technique for data prefetching using indirect addressing includes monitoring data pointer values, associated with an array, in an access stream to a memory. The technique determines whether a pattern exists in the data pointer values. A prefetch table is then populated with respective entries that correspond to respective array address/data pointer pairs based on a predicted pattern in the data pointer values. Respective data blocks (e.g., respective cache lines) are then prefetched (e.g., from the memory or another memory) based on the respective entries in the prefetch table. | 08-06-2009 |
20090198908 | METHOD FOR ENABLING DIRECT PREFETCHING OF DATA DURING ASYNCHRONOUS MEMORY MOVE OPERATION - While an AMM operation is ongoing, a prefetch request for data from the source effective address or the destination effective address triggers a cache injection by the AMM mover (or memory controller) of relevant data from the stream of data being moved in the physical memory. The memory controller forwards the first prefetched line to the prefetch engine and L1 cache. The memory controller also forwards the next cache lines in the sequence of data to the L2 cache and a subsequent set of cache lines to the L3 cache. The memory controller then forwards the remaining data to the destination memory location. Quick access to prefetch data is enabled by buffering the stream of data in the upper caches rather than placing all the moved data within the memory. Also, the memory controller does not overrun the upper caches, since it places moved data into only a subset of the available cache lines of the upper level cache. | 08-06-2009 |
20090198910 | DATA PROCESSING SYSTEM, PROCESSOR AND METHOD THAT SUPPORT A TOUCH OF A PARTIAL CACHE LINE OF DATA - According to method of data processing in a multiprocessor data processing system, in response to a processor touch request targeting a target granule of a cache line of data containing multiple granules, a processing unit originates on an interconnect of the multiprocessor data processing system a partial touch request that requests a copy of only the target granule for subsequent query access. In response to a combined response to the partial touch request indicating success, the combined response representing a system-wide response to the partial touch request, the processing unit receives the target granule of the target cache line and updates a coherency state of the target granule while retaining a coherency state of at least one other granule of the cache line. | 08-06-2009 |
20090198911 | DATA PROCESSING SYSTEM, PROCESSOR AND METHOD FOR CLAIMING COHERENCY OWNERSHIP OF A PARTIAL CACHE LINE OF DATA - According to method of data processing in a multiprocessor data processing system, in response to a processor request to modify a target granule of a target cache line of data containing multiple granules, a processing unit originates on an interconnect of the multiprocessor data processing system a data-claim-partial request that requests permission to promote only the target granule of the target cache line to a unique copy with an intent to modify the target granule. In response to a combined response to the data-claim-partial request indicating success (the combined response representing a system-wide response to the data-claim-partial-request), the processing unit promotes only the target granule of the target cache line to a unique copy by updating a coherency state of the target granule and retaining a coherency state of at least one other granule of the target cache line. | 08-06-2009 |
20090198912 | DATA PROCESSING SYSTEM, PROCESSOR AND METHOD FOR IMPLEMENTING CACHE MANAGEMENT FOR PARTIAL CACHE LINE OPERATIONS - A method of data processing in a cache memory includes caching a plurality of cache lines of data in a corresponding plurality of entries in a cache array, where each of the plurality of cache lines includes multiple data granules. For each of the plurality of cache entries, a plurality of line coherency state fields indicates an associated coherency state applicable to two or more data granules. For at least a particular cache line among the plurality of cache lines, a granule coherency state field indicates a coherency state for a particular granule of the multiple data granules in the particular cache line, where the coherency state indicated by the granule coherency state field differs from the coherency state indicated for the particular cache line by its line coherency state field (see the granule-mask sketch following this listing). | 08-06-2009 |
20090198914 | DATA PROCESSING SYSTEM, PROCESSOR AND METHOD IN WHICH AN INTERCONNECT OPERATION INDICATES ACCEPTABILITY OF PARTIAL DATA DELIVERY - According to at least one embodiment, a method of data processing in a multiprocessor data processing system includes a requesting processing unit initiating an interconnect operation including a memory access request that indicates an acceptability of a variable amount of data to service the interconnect request for data. In response to snooping the memory access request on an interconnect, a snooper selects an amount of data to supply to the requesting processing unit and transmits the selected amount of data to the requesting processing unit. The requesting processing unit receives the selected amount of data and utilizes at least some of the selected amount of data to service a processor request. | 08-06-2009 |
20090198915 | DATA PROCESSING SYSTEM, PROCESSOR AND METHOD THAT DYNAMICALLY SELECT A MEMORY ACCESS SIZE - A method of data processing in a processing unit supported by a memory hierarchy includes the processing unit performing a plurality of memory accesses to the memory hierarchy. The plurality of memory accesses includes one or more memory accesses targeting a full cache line of data. The processing unit monitors utilization of data accessed by the plurality of memory accesses, and based upon the utilization of the data, dynamically alters a memory access mode of operation so that a subsequent storage-modifying memory access targets less than a full cache line of data. | 08-06-2009 |
20090198916 | Method and Apparatus for Supporting Low-Overhead Memory Locks Within a Multiprocessor System - A method for supporting low-overhead memory locks within a multi-processor system is disclosed. A lock control section is initially assigned to a data block within a system memory of the multiprocessor system. In response to a request for accessing the data block by a processing unit within the multiprocessor system, a determination is made by a memory controller whether or not the lock control section of the data block has been set. If the lock control section of the data block has been set, the request for accessing the data block is ignored. Otherwise, if the lock control section of the data block has not been set, the lock control section of the data block is set, and the request for accessing the data block is allowed. | 08-06-2009 |
20090198917 | SPECIALIZED MEMORY MOVE BARRIER OPERATIONS - An instruction set architecture (ISA) includes an asynchronous memory move (AMM) synchronization (SYNC) instruction. When a processor of a data processing system executes the AMM SYNC instruction, the processor prevents an AMM operation generated by a subsequently received/executed AMM ST instruction from proceeding with the data move portion of the AMM operation within the memory subsystem until completion of all ongoing memory access operations within the memory subsystem and fabric. The AMM operation does not wait for a normal barrier operation. The processor forwards the information relevant to initiate the AMM operation to the asynchronous memory mover logic, and signals the logic to not proceed with the AMM operation until signaled of the completion of the AMM SYNC. | 08-06-2009 |
20090198918 | Host Fabric Interface (HFI) to Perform Global Shared Memory (GSM) Operations - A data processing system enables global shared memory (GSM) operations across multiple nodes with a distributed EA-to-RA mapping of physical memory. Each node has a host fabric interface (HFI), which includes HFI windows that are assigned to at most one locally-executing task of a parallel job. The tasks perform parallel job execution, but map only a portion of the effective addresses (EAs) of the global address space to the local, real memory of the task's respective node. The HFI window tags all outgoing GSM operations (of the local task) with the job ID, and embeds the target node and HFI window IDs of the node at which the EA is memory mapped. The HFI window also enables processing of received GSM operations with valid EAs that are homed to the local real memory of the receiving node, while preventing processing of other received operations without a valid EA-to-RA local mapping. | 08-06-2009 |
20090198920 | Processing Units Within a Multiprocessor System Adapted to Support Memory Locks - A processing unit within a multiprocessor system adapted to support memory locks is disclosed. In response to a request for accessing a data block being denied when a lock control section of the data block has been set by a memory controller, a timer countdown is started within a processing unit within a multiprocessor system. The requesting processing unit can relinquish the access request once the timer countdown has reached a timeout period. | 08-06-2009 |
20090198933 | Method and Apparatus for Handling Multiple Memory Requests Within a Multiprocessor System - A method for handling multiple memory requests within a multi-processor system is disclosed. A lock control section is initially assigned to a data block within a system memory. In response to a request for accessing the data block by a processing unit, a determination is made whether or not the lock control section of the data block has been set. If the lock control section has been set, another determination is made whether or not the requesting processing unit is located beyond a predetermined distance from a memory controller. If the requesting processing unit is located beyond a predetermined distance from the memory controller, the requesting processing unit is invited to perform other functions; otherwise, the number of the requesting processing unit is placed in a queue table. However, if the lock control section has not been set, the lock control section of the data block is set, and the access request is allowed. | 08-06-2009 |
20090198934 | FULLY ASYNCHRONOUS MEMORY MOVER - A data processing system has a processor, a memory coupled to the processor, and an asynchronous memory mover coupled to the processor. The asynchronous memory mover has registers for receiving a set of parameters from the processor, which parameters are associated with an asynchronous memory move (AMM) operation initiated by the processor in virtual address space, utilizing a source effective address and a destination effective address. The asynchronous memory mover performs the AMM operation to move the data from a first physical memory location having a source real address corresponding to the source effective address to a second physical memory location having a destination real address corresponding to the destination effective address. The asynchronous memory mover has an associated off-chip translation mechanism. The AMM operation thus occurs independent of the processor, and the processor continues processing other operations independent of the AMM operation. | 08-06-2009 |
20090198935 | METHOD AND SYSTEM FOR PERFORMING AN ASYNCHRONOUS MEMORY MOVE (AMM) VIA EXECUTION OF AMM STORE INSTRUCTION WITHIN INSTRUCTION SET ARCHITECTURE - A data processing system with a processor and memory includes an instruction set architecture (ISA) that provides an asynchronous memory move (AMM) store (ST) instruction. When the processor executes the AMM ST instruction, the processor performs a series of functions, which initiates an asynchronous memory move (AMM) operation. The AMM ST instruction moves data from a first memory location having a first real address to a second memory location having a second real address by (a) performing a move of the data in virtual address space utilizing a source effective address that is memory mapped to the first memory location and a destination effective address that is memory mapped to the second memory location, and (b) when the move is completed in the virtual address space, performing the physical move of the data to the second memory location outside the processor core, without processor involvement (see the AMM sketch following this listing). | 08-06-2009 |
20090198936 | REPORTING OF PARTIALLY PERFORMED MEMORY MOVE - A method performed in a data processing system initiates an asynchronous memory move (AMM) operation, whereby a processor performs a move of data in virtual address space from a first effective address to a second effective address and forwards parameters of the AMM operation to asynchronous memory mover logic for completion of the physical movement of data from a first memory location to a second memory location. The processor executes a second operation, which checks a status of the completion of the data move and returns a notification indicating the status. The notification indicates a status, which includes one of: data move in progress; data move totally done; data move partially done; data move cannot be performed; and occurrence of a translation look-aside buffer invalidate entry (TLBIE) operation. The processor initiates one or more actions in response to the notification received. | 08-06-2009 |
20090198937 | MECHANISMS FOR COMMUNICATING WITH AN ASYNCHRONOUS MEMORY MOVER TO PERFORM AMM OPERATIONS - A data processing system includes a set of architected registers within which the processor places state and other information to communicate with the asynchronous memory mover in order to initiate and control an AMM operation. The asynchronous memory mover performs an asynchronous memory move (AMM) operation in response to receiving a set of parameters within the architected registers, which parameters are associated with an AMM store instruction executed by the processor to initiate a move of data in virtual space before placing the information in the architected registers. The architected registers are processor architected registers, defined on a per thread basis by a compiler, or memory mapped architected registers allocated for communicating with the asynchronous memory mover during a bind and subsequent execution of an application. The architected registers are also utilized to store state information to enable a restore to a point before execution of the AMM operation. | 08-06-2009 |
20090198938 | HANDLING OF ADDRESS CONFLICTS DURING ASYNCHRONOUS MEMORY MOVE OPERATIONS - A method within a data processing system in which a processor handles conflicts, which occur during performance by an asynchronous memory mover of an asynchronous memory move (AMM) operation. The asynchronous memory mover performs an asynchronous memory move (AMM) operation by which the actual data is moved from a source to a destination memory location, independent of the processor. The memory mover sets a flag bit to indicate that the asynchronous memory mover is currently performing an AMM operation at the memory. When the processor receives a memory access operation, the processor checks the value of the flag bit before issuing the new memory access operation, and checks the associated address of the AMM operation to determine possible address conflicts. The processor then evaluates and responds to address conflicts to prevent corruption of data during an AMM operation. | 08-06-2009 |
20090198939 | LAUNCHING MULTIPLE CONCURRENT MEMORY MOVES VIA A FULLY ASYNCHRONOUS MEMORY MOVER - A data processing system has an asynchronous memory mover, which includes multiple sets of registers for storing addressing and control parameters utilized to generate one or more asynchronous memory move (AMM) operations. The memory mover detects a receipt of a first set of parameters in a first set of registers from the processor. The processor forwards the parameters after the processor initiates a data move in virtual address space, utilizing a source effective address and a destination effective address. The memory mover responds to receiving the first set of parameters by generating and launching a first asynchronous memory move (AMM) operation. When the memory mover receives a second set of parameters in a second set of registers before the first AMM operation completes, the memory mover generates and launches a second AMM operation concurrently with the first AMM operation if no address conflicts exist. | 08-06-2009 |
20090198948 | Techniques for Data Prefetching Using Indirect Addressing - A technique for performing indirect data prefetching includes determining a first memory address of a pointer associated with a data prefetch instruction. Content of a memory at the first memory address is then fetched. A second memory address is determined from the content of the memory at the first memory address. Finally, a data block (e.g., a cache line) including data at the second memory address is fetched (e.g., from the memory or another memory). An illustrative indirect-prefetch sketch follows this listing. | 08-06-2009 |
20090198950 | Techniques for Indirect Data Prefetching - A processor includes a first address translation engine, a second address translation engine, and a prefetch engine. The first address translation engine is configured to determine a first memory address of a pointer associated with a data prefetch instruction. The prefetch engine is coupled to the first translation engine and is configured to fetch content, included in a first data block (e.g., a first cache line) of a memory, at the first memory address. The second address translation engine is coupled to the prefetch engine and is configured to determine a second memory address based on the content of the memory at the first memory address. The prefetch engine is also configured to fetch (e.g., from the memory or another memory) a second data block (e.g., a second cache line) that includes data at the second memory address. | 08-06-2009 |
20090198951 | Full Virtualization of Resources Across an IP Interconnect - An addressing model is provided where all resources, including memory and devices, are addressed with internet protocol (IP) addresses. A task, such as an application, may be assigned a range of IP addresses rather than an effective address range. Thus, a processing element, such as an I/O adapter or even a printer, for example, may also be addressed using IP addresses without the need for library calls, device drivers, pinning memory, and so forth. This addressing model also provides full virtualization of resources across an IP interconnect, allowing a process to access an I/O device across a network. | 08-06-2009 |
20090198953 | Full Virtualization of Resources Across an IP Interconnect Using Page Frame Table - An addressing model is provided where devices, including I/O devices, are addressed with internet protocol (IP) addresses, which are considered part of the virtual address space. A task, such as an application, may be assigned an effective address range, which corresponds to addresses in the virtual address space. The virtual address space is expanded to include Internet protocol addresses. Thus, the page frame tables are also modified to include entries for IP addresses and additional properties for devices and I/O. Thus, a processing element, such as an I/O adapter or even a printer, for example, may also be addressed using IP addresses without the need for library calls, device drivers, pinning memory, and so forth. This addressing model also provides full virtualization of resources across an IP interconnect, allowing a process to access an I/O device across a network. | 08-06-2009 |
20090198955 | ASYNCHRONOUS MEMORY MOVE ACROSS PHYSICAL NODES (DUAL-SIDED COMMUNICATION FOR MEMORY MOVE) - A distributed data processing system includes: (1) a first node with a processor, a first memory, and asynchronous memory mover logic; (2) a second node having a second memory; and a connection mechanism that connects the two nodes. The processor includes processing logic for completing a cross-node asynchronous memory move (AMM) operation, wherein the processor performs a move of data in virtual address space from a first effective address to a second effective address, and the asynchronous memory mover logic completes a physical move of the data from a first memory location in the first memory having a first real address to a second memory location in the second memory having a second real address. The data is transmitted via the connection mechanism connecting the two nodes independent of the processor. | 08-06-2009 |
20090198956 | System and Method for Data Processing Using a Low-Cost Two-Tier Full-Graph Interconnect Architecture - A system and method are provided for implementing a two-tier full-graph interconnect architecture. In order to implement a two-tier full-graph interconnect architecture, a plurality of processors are coupled to one another to create a plurality of supernodes. Then, the plurality of supernodes are coupled together to create the two-tier full-graph interconnect architecture. Data is then transmitted from one processor to another within the two-tier full-graph interconnect architecture based on an addressing scheme that specifies at least a supernode and a processor chip identifier associated with a target processor to which the data is to be transmitted. | 08-06-2009 |
20090198957 | System and Method for Performing Dynamic Request Routing Based on Broadcast Queue Depths - A system and method for performing dynamic request routing based on broadcast queue depth information are provided. Each processor chip in the system may use a synchronized heartbeat signal it generates to provide queue depth information to each of the other processor chips in the system. The queue depth information identifies a number of requests or amount of data in each of the queues of a processor chip that originated the heartbeat signal. The queue depth information from each of the processor chips in the system may be used by the processor chips in determining optimal routing paths for data from a source processor chip to a destination processor chip. As a result, the congestion of data for processing at each of the processor chips along each possible routing path may be taken into account when selecting to which processor chip to forward data. | 08-06-2009 |
20090198958 | System and Method for Performing Dynamic Request Routing Based on Broadcast Source Request Information - A system and method for performing dynamic request routing based on broadcast source request information are provided. Each processor chip in the system may use a synchronized heartbeat signal it generates to provide source request information to each of the other processor chips in the system. The source request information identifies the number of active source requests sent by the processor chip that originated the heartbeat signal. The source request information from each of the processor chips in the system may be used by the processor chips in determining optimal routing paths for data from a source processor chip to a destination processor chip. As a result, the congestion of data for processing at each of the processor chips along each possible routing path may be taken into account when selecting to which processor chip to forward data. | 08-06-2009 |
20090198960 | DATA PROCESSING SYSTEM, PROCESSOR AND METHOD THAT SUPPORT PARTIAL CACHE LINE READS - According to a method of data processing in a multiprocessor data processing system, in response to a processor request to read a target granule of a target cache line of data containing multiple granules, a processing unit originates on an interconnect of the multiprocessor data processing system a partial read request that requests permission to read only the target granule of the target cache line. In response to a combined response to the partial read request indicating success, the combined response representing a system-wide response to the partial read request, the processing unit receives the target granule of the target cache line, supplies the target granule to a requesting processor core, and updates a coherency state of the target granule while retaining a coherency state of at least one other granule of the target cache line. | 08-06-2009 |
20090198963 | COMPLETION OF ASYNCHRONOUS MEMORY MOVE IN THE PRESENCE OF A BARRIER OPERATION - A method within a data processing system by which a processor executes an asynchronous memory move (AMM) store (ST) instruction to complete a corresponding AMM operation in parallel with an ongoing (not yet completed), previously issued barrier operation. The processor receives the AMM ST instruction after executing the barrier operation (or SYNC instruction) and before the completion of the barrier operation or SYNC on the system fabric. The processor continues executing the AMM ST instruction, which performs a move in virtual address space and then triggers the generation of the AMM operation. The AMM operation proceeds while the barrier operation continues, independent of the processor. The processor stops further execution of all other memory access requests, excluding AMM ST instructions that are received after the barrier operation, but before completion of the barrier operation. | 08-06-2009 |
20090198965 | METHOD AND SYSTEM FOR SOURCING DIFFERING AMOUNTS OF PREFETCH DATA IN RESPONSE TO DATA PREFETCH REQUESTS - According to a method of data processing, a memory controller receives a prefetch load request from a processor core of a data processing system. The prefetch load request specifies a requested line of data. In response to receipt of the prefetch load request, the memory controller determines by reference to a stream of demand requests how much data is to be supplied to the processor core in response to the prefetch load request. In response to the memory controller determining to provide less than all of the requested line of data, the memory controller provides less than all of the requested line of data to the processor core. | 08-06-2009 |
20090198971 | Heterogeneous Processing Elements - A heterogeneous processing element model is provided where I/O devices look and act like processors. In order to be treated like a processor, an I/O processing element, or other special purpose processing element, must follow some rules and have some characteristics of a processor, such as address translation, security, interrupt handling, and exception processing, for example. The heterogeneous processing element model puts special purpose processing elements on the same playing field as processors from a programming perspective, an operating system perspective, and a power perspective. The operating system can get work to a security engine, for example, in the same way it does to a processor. | 08-06-2009 |
20090198975 | TERMINATION OF IN-FLIGHT ASYNCHRONOUS MEMORY MOVE - A data processing system has a processor, a memory, and an instruction set architecture (ISA) that includes: (1) an asynchronous memory mover (AMM) store (ST) instruction that initiates an asynchronous memory move operation that moves data from a first memory location having a first real address to a second memory location having a second real address by: (a) first performing a move of the data in virtual address space utilizing a source effective address and a destination effective address; and (b) when the move is completed, completing a physical move of the data to the second memory location, independent of the processor. The ISA further provides (2) an AMM terminate ST instruction for stopping an ongoing AMM operation before completion of the AMM operation, and (3) a LD CMP instruction for checking a status of an AMM operation. | 08-06-2009 |
20090199028 | Wake-and-Go Mechanism with Data Exclusivity - Snoop response logic on a system bus is configured to detect on the system bus requests to access data at a target address with data exclusivity from at least one of a plurality of wake-and-go engines. The snoop response logic is further configured to determine a winning wake-and-go engine from the at least one wake-and-go engine that obtains a lock on the target address and generate a combined snoop response. The combined snoop response identifies the winning wake-and-go engine. The snoop response logic sends the combined snoop response to the at least one wake-and-go engine on the system bus. Each remaining wake-and-go engine within the at least one wake-and-go engine places an entry in its respective wake-and-go storage array to spin on a lock for the target address. | 08-06-2009 |
20090199029 | Wake-and-Go Mechanism with Data Monitoring - A wake-and-go mechanism is provided for a data processing system. The wake-and-go mechanism recognizes a programming idiom, specialized instruction, operating system call, or application programming interface call that indicates that a thread is waiting for an event. The wake-and-go mechanism updates a wake-and-go array with a target address, expected data value, and comparison type associated with the event. The thread then goes to sleep until the event occurs. The wake-and-go array may be a content addressable memory (CAM). When a transaction appears on the symmetric multiprocessing (SMP) fabric that modifies the value at a target address in the CAM, logic associated with the CAM performs a comparison based on the data value being written, expected data value, and comparison type. | 08-06-2009 |
20090199030 | Hardware Wake-and-Go Mechanism for a Data Processing System - A hardware wake-and-go mechanism is provided for a data processing system. The wake-and-go mechanism recognizes a programming idiom that indicates that a thread is waiting for an event. The wake-and-go mechanism updates a wake-and-go array with a target address associated with the event. The thread then goes to sleep until the event occurs. The wake-and-go array may be a content addressable memory (CAM). When a transaction appears on the symmetric multiprocessing (SMP) fabric that modifies the value at a target address in the CAM, the CAM returns a list of storage addresses at which the target address is stored. The wake-and-go mechanism associates these storage addresses with the threads waiting for an event at the target addresses, and may wake the one or more threads waiting for the event. | 08-06-2009 |
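The two wake-and-go abstracts above hinge on an array entry that pairs a watched address with an expected value and a comparison type, checked when a write to that address is snooped. The following C fragment is a minimal software sketch of that idea; the struct layout, field names, and helper functions are illustrative assumptions, not the patented hardware design.

```c
/* Hypothetical wake-and-go array entry and the comparison performed when a
 * write to a monitored address is observed on the fabric. */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

typedef enum { CMP_EQ, CMP_NE, CMP_GT, CMP_LT } cmp_type_t;

typedef struct {
    uint64_t   target_addr;   /* address the sleeping thread is watching   */
    uint64_t   expected;      /* data value that signals the awaited event */
    cmp_type_t cmp;           /* how the written value is compared         */
    int        thread_id;     /* thread to wake when the comparison holds  */
    bool       valid;
} wag_entry_t;

/* Called when a transaction writes 'value' to 'addr'; returns true if the
 * entry's thread should be woken. */
static bool wag_event_occurred(const wag_entry_t *e, uint64_t addr, uint64_t value)
{
    if (!e->valid || e->target_addr != addr)
        return false;
    switch (e->cmp) {
    case CMP_EQ: return value == e->expected;
    case CMP_NE: return value != e->expected;
    case CMP_GT: return value >  e->expected;
    case CMP_LT: return value <  e->expected;
    }
    return false;
}

int main(void)
{
    wag_entry_t e = { 0x1000, 42, CMP_EQ, 7, true };
    printf("wake thread %d? %s\n", e.thread_id,
           wag_event_occurred(&e, 0x1000, 42) ? "yes" : "no");
    return 0;
}
```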
20090199170 | Helper Thread for Pre-Fetching Data - A set of helper thread binaries is created to retrieve data used by a set of main thread binaries. If executing a portion of the set of helper thread binaries results in the retrieval of data needed by the set of main thread binaries, then that retrieved data is utilized by the set of main thread binaries. | 08-06-2009 |
20090199181 | Use of a Helper Thread to Asynchronously Compute Incoming Data - A set of helper thread binaries is created from a set of main thread binaries. The helper thread monitors software or hardware ports for incoming data events. When the helper thread detects an incoming event, the helper thread asynchronously executes instructions that calculate incoming data needed by the main thread. | 08-06-2009 |
20090199183 | Wake-and-Go Mechanism with Hardware Private Array - A wake-and-go mechanism is provided for a data processing system. When a thread is waiting for an event, rather than performing a series of get-and-compare sequences, the thread updates a wake-and-go array with a target address associated with the event. The wake-and-go mechanism may save the state of the thread in a hardware private array. The hardware private array may comprise a plurality of memory cells embodied within the processor or pervasive logic associated with the bus, for example. Alternatively, the hardware private array may be embodied within logic associated with the wake-and-go storage array. | 08-06-2009 |
20090199184 | Wake-and-Go Mechanism With Software Save of Thread State - A wake-and-go mechanism is provided for a data processing system. When a thread is waiting for an event, rather than performing a series of get-and-compare sequences, the thread updates a wake-and-go array with a target address associated with the event. Software may save the state of the thread. The thread is then put to sleep. When the wake-and-go array snoops a kill at a given target address, logic associated with the wake-and-go array may generate an exception, which may result in a switch to kernel mode, wherein the operating system performs some action before returning control to the originating process. In this case, the trap causes other software, such as the operating system or a background sleeper thread, for example, to reload the thread from thread state storage and to continue processing of the active threads on the processor. | 08-06-2009 |
20090199189 | Parallel Lock Spinning Using Wake-and-Go Mechanism - A wake-and-go mechanism is provided for a data processing system. The wake-and-go mechanism recognizes a programming idiom that indicates that a thread is spinning on a lock. The wake-and-go mechanism updates a wake-and-go array with a target address associated with the lock and sets a lock bit in the wake-and-go array. The thread then goes to sleep until the lock frees. The wake-and-go array may be a content addressable memory (CAM). When a transaction appears on the symmetric multiprocessing (SMP) fabric that modifies the value at a target address in the CAM, the CAM returns a list of storage addresses at which the target address is stored. The wake-and-go mechanism associates these storage addresses with the threads waiting for an event at the target addresses, and may wake the thread that is spinning on the lock. | 08-06-2009 |
20090199197 | Wake-and-Go Mechanism with Dynamic Allocation in Hardware Private Array - A wake-and-go mechanism is provided for a data processing system. When a thread is waiting for an event, rather than performing a series of get-and-compare sequences, the thread updates a wake-and-go array with a target address associated with the event. The wake-and-go mechanism may save the state of the thread in a hardware private array. The hardware private array may comprise a plurality of memory cells embodied within the processor or pervasive logic associated with the bus, for example. Alternatively, the hardware private array may be embodied within logic associated with the wake-and-go storage array. | 08-06-2009 |
20100005202 | DYNAMIC SEGMENT SPARING AND REPAIR IN A MEMORY SYSTEM - A communication interface device, system, method, and design structure for providing dynamic segment sparing and repair in a memory system. The communication interface device includes drive-side switching logic including driver multiplexers to select driver data for transmitting on link segments of a bus, and receive-side switching logic including receiver multiplexers to select received data from the link segments of the bus. The bus includes multiple data link segments, a clock link segment, and at least two spare link segments selectable by the drive-side switching logic and the receive-side switching logic to replace one or more of the data link segments and the clock link segment. | 01-07-2010 |
20100067193 | CONVERGENCE OF AIR WATER COOLING OF AN ELECTRONICS RACK AND A COMPUTER ROOM IN A SINGLE UNIT - Systems and methods are provided for cooling an electronics rack and a computer room from a single unit, which includes a heat-generating electronics subsystem across which air flows from an air inlet to an air outlet side of the rack. First and second modular cooling units (MCUs) are associated with the rack and configured to provide system coolant to the electronics subsystem for cooling thereof. System coolant supply and return manifolds are in fluid communication with the MCUs for facilitating providing of system coolant to the electronics subsystem, and to an air-to-liquid heat exchanger associated with the rack for exclusively cooling air passing through the rack, as well as conditioning the ambient air of the computer room. Such cooling is exclusive of an outside-of-rack conditioned air unit. | 03-18-2010 |
20100070710 | Techniques for Cache Injection in a Processor System - A technique for performing cache injection includes monitoring addresses on a bus. Ownership of input/output data on the bus is acquired by a cache when an address on the bus (that is associated with the input/output data) corresponds to an address of a data block stored in the cache. | 03-18-2010 |
20100070711 | Techniques for Cache Injection in a Processor System Using a Cache Injection Instruction - A technique for performing cache injection includes monitoring addresses on a bus in response to a cache injection instruction. Ownership of input/output data on the bus is acquired by a cache when an address on the bus (that is associated with the input/output data) corresponds to an address of a data block associated with the cache injection instruction. | 03-18-2010 |
20100070712 | Techniques for Cache Injection in a Processor System with Replacement Policy Position Modification - A technique for performing cache injection includes monitoring, at a cache, addresses on a bus. Ownership of input/output data on the bus is then acquired by the cache when an address on the bus (that is associated with the input/output data) corresponds to an address of a data block stored in the cache. A replacement policy position of the data block is then modified (to increase a probability that the data block is consumed prior to ejection from the cache). | 03-18-2010 |
20100070717 | Techniques for Cache Injection in a Processor System Responsive to a Specific Instruction Sequence - A technique for performing cache injection includes monitoring an instruction stream for a specific instruction sequence. Addresses on a bus are then monitored, at a cache, in response to detecting the specific instruction sequence a determined number of times. Ownership of input/output data on the bus is then acquired by the cache when an address on the bus (that is associated with the input/output data) corresponds to an address of a data block stored in the cache. | 03-18-2010 |
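The four cache-injection abstracts above share one decision: a cache snoops I/O write addresses on the bus and claims the data only when it already holds the target block. Below is a small software model of that policy; the cache structure, sizes, and function names are assumptions made purely for illustration.

```c
/* Toy model of bus-snooping cache injection: the cache takes ownership of
 * I/O data only when the bus address hits a line it already holds. */
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define CACHE_LINES 8u
#define LINE_BYTES  64u

typedef struct {
    bool     valid;
    uint64_t tag;               /* line-aligned address */
    uint8_t  data[LINE_BYTES];
} cache_line_t;

typedef struct { cache_line_t line[CACHE_LINES]; } cache_t;

static cache_line_t *lookup(cache_t *c, uint64_t addr)
{
    uint64_t tag = addr & ~(uint64_t)(LINE_BYTES - 1);
    for (size_t i = 0; i < CACHE_LINES; i++)
        if (c->line[i].valid && c->line[i].tag == tag)
            return &c->line[i];
    return NULL;
}

/* Snoop handler: returns true if the cache acquired ownership of the I/O
 * data instead of letting it flow to memory. */
static bool snoop_io_write(cache_t *c, uint64_t addr, const uint8_t *io_data, size_t len)
{
    cache_line_t *hit = lookup(c, addr);
    if (!hit)
        return false;                     /* no match: data goes to memory */
    size_t off = addr & (LINE_BYTES - 1);
    for (size_t i = 0; i < len && off + i < LINE_BYTES; i++)
        hit->data[off + i] = io_data[i];  /* inject directly into the line */
    return true;
}

int main(void)
{
    cache_t c = {0};
    c.line[0].valid = true;
    c.line[0].tag = 0x2000;
    uint8_t payload[4] = { 1, 2, 3, 4 };
    printf("injected? %s\n",
           snoop_io_write(&c, 0x2004, payload, sizeof payload) ? "yes" : "no");
    return 0;
}
```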
20100122011 | Method and Apparatus for Supporting Multiple High Bandwidth I/O Controllers on a Single Chip - An integrated processor design includes physical interface macros supporting heterogeneous electrical properties. The processor design comprises a plurality of processing cores and a plurality of physical interfaces to connect to a memory interface, a peripheral component interconnect express (PCI Express or PCIe) interface for input/output, an Ethernet interface for network communication, and/or a serial attached SCSI (SAS) interface for storage. Each physical interface may be programmatically connected to a selected interface controller, such as a memory controller, a PCI Express controller, or an Ethernet controller, for example. A plurality of such controllers may be connected to a switch within the processor design, with the switch also being connected to each physical interface macro. Thus, the physical interface macros may be programmatically connected to a subset of the plurality of controllers. | 05-13-2010 |
20100122107 | Physical Interface Macros (PHYS) Supporting Heterogeneous Electrical Properties - An integrated processor design includes physical interface macros supporting heterogeneous electrical properties. The processor design comprises a plurality of processing cores and a plurality of physical interfaces to connect to a memory interface, a peripheral component interconnect express (PCI Express or PCIe) interface for input/output, an Ethernet interface for network communication, and/or a serial attached SCSI (SAS) interface for storage. | 05-13-2010 |
20100153648 | Block Driven Computation Using A Caching Policy Specified In An Operand Data Structure - A processor has an associated memory hierarchy including a cache memory. The processor includes an instruction sequencing unit that fetches instructions for processing, an operand data structure including a plurality of entries corresponding to operands of operations to be performed by the processor, and a computation engine. A first entry among the plurality of entries in the operand data structure specifies a first caching policy for a first operand, and a second entry specifies a second caching policy for a second operand. The computation engine computes and stores operands in the memory hierarchy in accordance with the cache policies indicated within the operand data structure. | 06-17-2010 |
20100153681 | Block Driven Computation With An Address Generation Accelerator - A processor includes at least one execution unit that executes instructions, at least one register file, coupled to the at least one execution unit, that buffers operands for access by the at least one execution unit, an instruction sequencing unit that fetches instructions for execution by the at least one execution unit, and an address generation accelerator. The address generation accelerator, responsive to an initiation signal received from the instruction sequencing unit, computes and outputs first and second effective addresses of operands of an operation. | 06-17-2010 |
20100153683 | Specifying an Addressing Relationship In An Operand Data Structure - A processor includes at least one execution unit that executes instructions, at least one register file, coupled to the at least one execution unit, that buffers operands for access by the at least one execution unit, and an instruction sequencing unit that fetches instructions for execution by the execution unit. The processor further includes an operand data structure and an address generation accelerator. The operand data structure specifies a first relationship between addresses of sequential accesses within a first address region and a second relationship between addresses of sequential accesses within a second address region. The address generation accelerator computes a first address of a first memory access in the first address region by reference to the first relationship and a second address of a second memory access in the second address region by reference to the second relationship. | 06-17-2010 |
20100153931 | Operand Data Structure For Block Computation - In response to receiving pre-processed code, a compiler identifies a code section that is not a candidate for acceleration and a code block that is a candidate for acceleration. The code block specifies an iterated operation having a first operand and a second operand, where each of multiple first operands and each of multiple second operands for the iterated operation has a defined addressing relationship. In response to the identifying, the compiler generates post-processed code containing lower level instruction(s) corresponding to the identified code section and creates and outputs an operand data structure separate from the post-processed code. The operand data structure specifies the defined addressing relationship for the multiple first operands and for the multiple second operands. The compiler places a block computation command in the post-processed code that invokes processing of the operand data structure to compute operand addresses. | 06-17-2010 |
20100153938 | Computation Table For Block Computation - In response to receiving pre-processed code, a compiler identifies a code section that is not a candidate for acceleration and identifies a code block specifying an iterated operation that is a candidate for acceleration. In response to identifying the code section, the compiler generates post-processed code containing one or more lower level instructions corresponding to the identified code section, and in response to identifying the code block, the compiler creates and outputs an operation data structure separate from the post-processed code that identifies the iterated operation. The compiler places a block computation command in the post-processed code that invokes processing of the operation data structure to perform the iterated operation and outputs the post-processed code. | 06-17-2010 |
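The block-computation abstracts above describe an operand data structure that records, for each operand stream, where accesses start and how successive addresses relate, so an address generation accelerator can produce operand addresses without decoding instructions. The sketch below assumes a simple base-plus-stride encoding of that addressing relationship; all names are hypothetical.

```c
/* Illustrative operand descriptor and address-generation helper for an
 * iterated block operation. */
#include <stdint.h>
#include <stdio.h>

typedef struct {
    uint64_t base;      /* first effective address of the operand stream */
    int64_t  stride;    /* relationship between successive accesses      */
    uint32_t count;     /* number of iterations of the block operation   */
} operand_desc_t;

/* Compute the address of iteration i for one operand. */
static inline uint64_t operand_addr(const operand_desc_t *d, uint32_t i)
{
    return d->base + (int64_t)i * d->stride;
}

int main(void)
{
    /* Two operands of an iterated add: A walks by 8 bytes, B by 16 bytes. */
    operand_desc_t a = { 0x10000, 8, 4 };
    operand_desc_t b = { 0x20000, 16, 4 };
    for (uint32_t i = 0; i < a.count; i++)
        printf("iter %u: A @ 0x%llx, B @ 0x%llx\n", (unsigned)i,
               (unsigned long long)operand_addr(&a, i),
               (unsigned long long)operand_addr(&b, i));
    return 0;
}
```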
20100158048 | Reassembling Streaming Data Across Multiple Packetized Communication Channels - Mechanisms are provided for processing streaming data at high sustained data rates. These mechanisms receive a plurality of data elements over a plurality of non-sequential communication channels and write the plurality of data elements directly to the file system of the data processing system in an unassembled manner. The mechanisms further perform a data scrubbing operation to determine if there are any missing data elements that are not present in the plurality of data elements written to the file system and assemble the plurality of data elements into a plurality of data streams associated with the plurality of non-sequential communication channels in response to results of the data scrubbing indicating that there are no missing data elements. In addition, the mechanisms release the assembled plurality of data streams for access via the file system. | 06-24-2010 |
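The data scrubbing step above amounts to checking, before reassembly, whether every expected element has landed in the file system. The fragment below is a deliberately simplified version of that check, assuming the received elements can be tracked as a bitmap of sequence numbers; the representation and names are assumptions.

```c
/* Toy data-scrubbing pass: find the first missing element, if any, before
 * the streams are reassembled and released. */
#include <stdbool.h>
#include <stdio.h>

#define EXPECTED 16  /* total elements expected across all channels */

static bool scrub_for_missing(const bool received[EXPECTED], int *first_missing)
{
    for (int i = 0; i < EXPECTED; i++) {
        if (!received[i]) {
            *first_missing = i;
            return true;         /* hole found: hold off on reassembly  */
        }
    }
    return false;                /* no holes: safe to assemble streams  */
}

int main(void)
{
    bool received[EXPECTED];
    for (int i = 0; i < EXPECTED; i++)
        received[i] = true;
    received[9] = false;         /* simulate one element still in flight */

    int hole;
    if (scrub_for_missing(received, &hole))
        printf("element %d missing; reassembly deferred\n", hole);
    else
        printf("all elements present; releasing streams\n");
    return 0;
}
```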
20100161704 | Management of Process-to-Process Inter-Cluster Communication Requests - A mechanism is provided for managing a process-to-process inter-cluster communication request. A call from a first application is received in a first operating system in a first data processing system. The first operating system passes the call from the first operating system to a first host fabric interface controller in the first data processing system without processing the call. The first host fabric interface processes the call to determine a second data processing system in the plurality of data processing systems with which the call is associated, wherein the call is processed by the first host fabric interface without intervention by the first operating system. The first host fabric interface initiates an inter-cluster connection to a second host fabric interface in the second data processing system. The call is then transferred to the second host fabric interface in the second data processing system via the inter-cluster connection. | 06-24-2010 |
20100161705 | Management of Application to Application Communication Requests Between Data Processing Systems - A mechanism is provided for managing an application communication request. A first operating system passes a call from a first application in a first data processing system intended for a second application in a second data processing system to a first host fabric interface controller in the first data processing system without processing the call. The first host fabric interface processes the call using state information associated with the call to determine the second data processing system with which the call is associated. The first host fabric interface initiates a connection to a second host fabric interface in the second data processing system and transfers the call to a second operating system in the second data processing system via the connection to the second host fabric interface. The second data processing system then processes the call intended for the second application without assistance from the second application. | 06-24-2010 |
20100162271 | Management of Process-to-Process Intra-Cluster Communication Requests - A mechanism is provided for managing a process-to-process intra-cluster communication request. A call from a first application is received in a first operating system in a first data processing system. The first operating system passes the call from the first operating system to a first host fabric interface controller in the first data processing system without processing the call. The first host fabric interface controller processes the call without intervention by the first operating system to determine a second data processing system in the plurality of data processing systems with which the call is associated. The first host fabric interface controller initiates an intra-cluster connection to a second host fabric interface controller in the second data processing system. The first host fabric interface controller then transfers the call to the second host fabric interface controller in the second data processing system via the intra-cluster connection. | 06-24-2010 |
20100162272 | Management of Application to I/O Device Communication Requests Between Data Processing Systems - A mechanism is provided for managing an input/output device communication request. A first operating system passes a call from a first application intended for an input/output device in a second data processing system to a first host fabric interface controller in the first data processing system without processing the call. The first host fabric interface processes the call to determine the second data processing system with which the call is associated. The first host fabric interface initiates a connection to a second host fabric interface in the second data processing system and transfers the call to a second operating system associated with the input/output device in the second data processing system via the connection to the second host fabric interface. The second operating system then processes the call intended for the input/output device without assistance from any application running on the second data processing system. | 06-24-2010 |
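The four host fabric interface abstracts above all follow the same pattern: the operating system hands a communication call to the HFI controller without interpreting it, and the HFI uses per-call state to pick the destination node. The sketch below is a rough software analogue of that flow; the structures, the node-ID encoding, and the function names are hypothetical.

```c
/* OS passes the call through untouched; the HFI controller resolves the
 * destination node and "routes" the call. */
#include <stdint.h>
#include <stdio.h>

typedef struct {
    uint32_t call_id;
    uint32_t src_process;
    uint32_t dst_process;
    uint64_t payload_addr;
    uint32_t payload_len;
} hfi_call_t;

/* HFI-side state lookup: maps a destination process to a node id. */
static uint32_t hfi_lookup_node(uint32_t dst_process)
{
    return dst_process >> 16;          /* assumed encoding: high bits = node */
}

/* HFI controller processes the call and opens a connection to the peer node. */
static void hfi_process_call(const hfi_call_t *call)
{
    uint32_t node = hfi_lookup_node(call->dst_process);
    printf("HFI: routing call %u to node %u (no OS involvement)\n",
           (unsigned)call->call_id, (unsigned)node);
}

/* OS entry point: passes the call through without parsing or copying it. */
static void os_receive_call(const hfi_call_t *call)
{
    hfi_process_call(call);
}

int main(void)
{
    hfi_call_t call = { 1, 0x00010005u, 0x00030002u, 0xDEAD0000u, 256 };
    os_receive_call(&call);
    return 0;
}
```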
20100262578 | Consolidating File System Backend Operations with Access of Data - Mechanisms for performing a backend operation in a file system are provided. A backend operation on a portion of the file system is initiated. At least one indirect transition table data structure is created for performing the backend operation. Metadata corresponding to the portion of the file system is linked to the at least one indirect transition table data structure. The backend operation is performed on data in a sub-portion of the portion of the file system and the at least one indirect transition table data structure is updated with pointers to new locations of the data in the sub-portion as transitions of the data are completed. At least one data access operation is performed to the portion of the file system at substantially a same time as performing the backend operation on the data in the sub-portion of the portion of the file system. | 10-14-2010 |
20100262787 | TECHNIQUES FOR CACHE INJECTION IN A PROCESSOR SYSTEM BASED ON A SHARED STATE - A technique for performing cache injection includes monitoring, at a host fabric interface, snoop responses to an address on a bus. When the snoop responses indicate a data block associated with the address is in a shared state, input/output data associated with the address on the bus is directed to a cache that includes the data block in the shared state and is located physically closer to the host fabric interface than one or more other caches that include the data block associated with the address in the shared state. | 10-14-2010 |
20100262883 | Dynamic Monitoring of Ability to Reassemble Streaming Data Across Multiple Channels Based on History - Mechanisms are provided for processing streaming data at high sustained data rates. These mechanisms receive a plurality of data elements over a plurality of non-sequential communication channels and write the plurality of data elements directly to the file system of the data processing system in an unassembled manner. The mechanisms determine whether or not to perform a data scrubbing operation based on history information indicative of whether data elements in the plurality of data elements are being received in a substantially sequential manner. The mechanisms perform a data scrubbing operation, in response to a determination to perform data scrubbing, to identify any missing data elements in the plurality of data elements written to the file system and assemble the plurality of data elements into a plurality of data streams in response to results of the data scrubbing indicating that there are no missing data elements. | 10-14-2010 |
20100263855 | ENVIRONMENTAL CONTROL OF LIQUID COOLED ELECTRONICS - A method, system, and computer program product are provided for controlling liquid-cooled electronics, which includes measuring a first set point temperature, T | 10-21-2010 |
20100268788 | Remote Asynchronous Data Mover - A distributed data processing system executes multiple tasks within a parallel job, including a first local task on a local node and at least one task executing on a remote node, with a remote memory having real address (RA) locations mapped to one or more of the source effective addresses (EA) and destination EA of a data move operation initiated by a task executing on the local node. On initiation of the data move operation, remote asynchronous data move (RADM) logic identifies that the operation moves data to/from a first EA that is memory mapped to an RA of the remote memory. The local processor/RADM logic initiates a RADM operation that moves a copy of the data directly from/to the first remote memory by completing the RADM operation using the network interface cards (NICs) of the source and destination processing nodes, determined by accessing a data center for the node IDs of remote memory. | 10-21-2010 |
20100268790 | Complex Remote Update Programming Idiom Accelerator - A remote update programming idiom accelerator is configured to detect a complex remote update programming idiom in an instruction sequence of a thread. The complex remote update programming idiom includes a read operation for reading data from a storage location at a remote node, a sequence of instructions for performing an update operation on the data to form result data, and a write operation for writing the result data to the storage location at the remote node. The remote update programming idiom accelerator is configured to determine whether the sequence of instructions is longer than an instruction size threshold and responsive to a determination that the sequence of instructions is not longer than the instruction size threshold, transmit the complex remote update programming idiom to the remote node to perform the update operation on the data at the remote node. | 10-21-2010 |
20100268791 | Programming Idiom Accelerator for Remote Update - A programming idiom accelerator identifies a remote update programming idiom. The programming idiom accelerator sends the remote update to a remote node to perform an operation on data at the remote node. A programming idiom accelerator at the remote node receives the remote update and performs the update as a local operation. The programming idiom accelerator at the remote node may request processing resources from a virtualization layer to perform the update. The programming idiom accelerator may determine a remote update threshold that limits the instruction size of remote updates that may be sent to a remote node. The programming idiom accelerator may also determine a dedicated threshold that limits the instruction size of remote updates that may be performed by the programming idiom accelerator. | 10-21-2010 |
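The two remote-update abstracts above turn on a simple policy: ship the idiom's update sequence to the remote node only if it is short enough. The fragment below captures just that threshold decision; the threshold value, struct, and names are assumptions for illustration.

```c
/* Decide whether a complex remote update idiom (read, update sequence,
 * write) should be transmitted to the remote node or executed locally. */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define REMOTE_UPDATE_MAX_INSNS 8u   /* assumed instruction-size threshold */

typedef struct {
    uint64_t remote_addr;     /* storage location at the remote node       */
    uint32_t n_update_insns;  /* length of the update instruction sequence */
} remote_update_idiom_t;

static bool should_offload_to_remote(const remote_update_idiom_t *idiom)
{
    return idiom->n_update_insns <= REMOTE_UPDATE_MAX_INSNS;
}

int main(void)
{
    remote_update_idiom_t small = { 0x8000, 3 };
    remote_update_idiom_t large = { 0x9000, 40 };
    printf("small idiom -> %s\n",
           should_offload_to_remote(&small) ? "send to remote node" : "execute locally");
    printf("large idiom -> %s\n",
           should_offload_to_remote(&large) ? "send to remote node" : "execute locally");
    return 0;
}
```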
20100268896 | TECHNIQUES FOR CACHE INJECTION IN A PROCESSOR SYSTEM FROM A REMOTE NODE - A technique for performing cache injection in a processor system includes monitoring, by a cache, addresses on a bus. Input/output data associated with an address of a data block stored in the cache is then requested from a remote node, via a network controller. Ownership of the input/output data is acquired by the cache when an address on the bus that is associated with the input/output data corresponds to the address of the data block stored in the cache. | 10-21-2010 |
20100268915 | Remote Update Programming Idiom Accelerator with Allocated Processor Resources - A data processing system comprises at least one processing unit, a virtualization layer, and a remote update programming idiom accelerator. The remote update programming idiom accelerator is configured to receive a complex remote update programming idiom from a remote node. Responsive to a determination that the sequence of instructions in the complex remote update programming idiom is longer than a dedicated processor threshold, the remote update programming idiom accelerator is configured to request a processing unit from the virtualization layer in the data processing system, and receive an allocation of a processing unit from the virtualization layer. The allocated processing unit is configured to read the data from the storage location local to the data processing system, execute the sequence of instructions to perform the update operation on the data to form result data, and write the result data to the storage location local to the data processing system. | 10-21-2010 |
20100269027 | USER LEVEL MESSAGE BROADCAST MECHANISM IN DISTRIBUTED COMPUTING ENVIRONMENT - A data processing system is programmed to provide a method for enabling user-level one-to-all message/messaging (OTAM) broadcast within a distributed parallel computing environment in which multiple threads of a single job execute on different processing nodes across a network. The method comprises: generating one or more messages for transmission to at least one other processing node accessible via a network, where the messages are generated by/for a first thread executing at the data processing system (first processing node) and the other processing node executes one or more second threads of a same parallel job as the first thread. An OTAM broadcast is transmitted via a host fabric interface (HFI) of the data processing system as a one-to-all broadcast on the network, whereby the messages are transmitted to a cluster of processing nodes across the network that execute threads of the same parallel job as the first thread. | 10-21-2010 |
20100269115 | Managing Threads in a Wake-and-Go Engine - A wake-and-go mechanism is provided for a data processing system. The wake-and-go mechanism detects a thread running on a first processing unit within a plurality of processing units that is waiting for an event that modifies a data value associated with a target address. The wake-and-go mechanism creates a wake-and-go instance for the thread by populating a wake-and-go storage array with the target address. The operating system places the thread in a sleep state. Responsive to detecting the event that modifies the data value associated with the target address, the wake-and-go mechanism assigns the wake-and-go instance to a second processing unit within the plurality of processing units. The operating system on the second processing unit places the thread in a non-sleep state. | 10-21-2010 |
20100269118 | SPECULATIVE POPCOUNT DATA CREATION - A method and a data processing system by which population count (popcount) operations are efficiently performed without incurring the latency and loss of critical processing cycles and bandwidth of real time processing. The method comprises: identifying data to be stored to memory for which a popcount may need to be determined; speculatively performing a popcount operation on the data as a background process of the processor while the data is being stored to memory; storing the data to a first memory location; and storing a value of the popcount generated by the popcount operation within a second memory location. The method further comprises: determining a size of data; determining a granular level at which the popcount operation on the data will be performed; and reserving a size of said second memory location that is sufficiently large to hold the value of the popcount. | 10-21-2010 |
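The speculative popcount abstract above pairs a store of data with a background computation of its population count, kept in a second location so later queries need not rescan the data. The following toy version computes the count as a side effect of the copy; the helper names and byte-granularity choice are assumptions.

```c
/* Copy a buffer to its destination and record its population count in a
 * separate location as a by-product of the store. */
#include <stdint.h>
#include <stdio.h>
#include <string.h>

static unsigned popcount8(uint8_t b)
{
    unsigned n = 0;
    while (b) { n += b & 1u; b >>= 1; }
    return n;
}

static void store_with_popcount(uint8_t *dst, const uint8_t *src, size_t len,
                                uint64_t *count)
{
    uint64_t n = 0;
    for (size_t i = 0; i < len; i++)
        n += popcount8(src[i]);
    memcpy(dst, src, len);
    *count = n;                 /* second memory location holding the popcount */
}

int main(void)
{
    uint8_t src[4] = { 0xFF, 0x0F, 0x01, 0x00 };
    uint8_t dst[4];
    uint64_t popcnt;
    store_with_popcount(dst, src, sizeof src, &popcnt);
    printf("stored %zu bytes, popcount = %llu\n", sizeof src,
           (unsigned long long)popcnt);
    return 0;
}
```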
20100287341 | Wake-and-Go Mechanism with System Address Bus Transaction Master - A wake-and-go mechanism is provided for a data processing system. The wake-and-go mechanism is configured to issue a look-ahead load command on a system bus to read a data value from a target address and perform a comparison operation to determine whether the data value at the target address indicates that an event for which a thread is waiting has occurred. In response to the comparison resulting in a determination that the event has not occurred, the wake-and-go engine populates the wake-and-go storage array with the target address and snoops the target address on the system bus. | 11-11-2010 |
20100293339 | DATA PROCESSING SYSTEM, PROCESSOR AND METHOD FOR VARYING A DATA PREFETCH SIZE BASED UPON DATA USAGE - A method of data processing in a processor includes maintaining a usage history indicating demand usage of prefetched data retrieved into cache memory. An amount of data to prefetch by a data prefetch request is selected based upon the usage history. The data prefetch request is transmitted to a memory hierarchy to prefetch the selected amount of data into cache memory. | 11-18-2010 |
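The prefetch-sizing abstracts above (this one and the earlier memory-controller variant) select how much data to prefetch from a history of how much of previously prefetched data was actually used. The sketch below assumes a crude two-bucket policy over that history; the counters and the half-line fallback are illustrative choices, not the claimed method.

```c
/* Pick the next prefetch size from a usage history of prior prefetches. */
#include <stdint.h>
#include <stdio.h>

#define LINE_BYTES 128u

typedef struct {
    uint32_t prefetches;      /* lines prefetched so far                  */
    uint32_t fully_used;      /* of those, how many were consumed in full */
} prefetch_history_t;

static uint32_t next_prefetch_size(const prefetch_history_t *h)
{
    if (h->prefetches == 0)
        return LINE_BYTES;                     /* no history yet: full line */
    /* if most prefetched lines were fully used, keep fetching full lines;
     * otherwise fall back to half a line */
    return (2 * h->fully_used >= h->prefetches) ? LINE_BYTES : LINE_BYTES / 2;
}

int main(void)
{
    prefetch_history_t cold = { 10, 2 };
    prefetch_history_t hot  = { 10, 9 };
    printf("sparse usage -> prefetch %u bytes\n", (unsigned)next_prefetch_size(&cold));
    printf("dense usage  -> prefetch %u bytes\n", (unsigned)next_prefetch_size(&hot));
    return 0;
}
```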
20100293340 | Wake-and-Go Mechanism with System Bus Response - A wake-and-go mechanism is provided for a data processing system. The wake-and-go mechanism is configured to issue a look-ahead load command on a system bus to read a data value from a target address and perform a comparison operation to determine whether the data value at the target address indicates that an event for which a thread is waiting has occurred. In response to the comparison resulting in a determination that the event has not occurred, the wake-and-go engine populates a wake-and-go storage array with the target address and snoops the target address on the system bus without data exclusivity. In response to the comparison resulting in a determination that the event has occurred, the wake-and-go engine issues a load command on the system bus to read the data value from the target address with data exclusivity. | 11-18-2010 |
20100293341 | Wake-and-Go Mechanism with Exclusive System Bus Response - A wake-and-go mechanism is configured to issue a look-ahead load command on a system bus to read a data value from a target address and perform a comparison operation to determine whether the data value at the target address indicates that an event for which a thread is waiting has occurred. In response to the comparison resulting in a determination that the event has not occurred, the wake-and-go engine populates the wake-and-go storage array with the target address. In response to the comparison resulting in a determination that the event has occurred, the wake-and-go engine issues a load command on the system bus to read the data value from the target address with data exclusivity and determines whether the wake-and-go engine obtains a lock for the target address. Responsive to obtaining the lock for the target address, the wake-and-go engine holds the lock for the thread. | 11-18-2010 |
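The three abstracts above share the look-ahead polling flow: read the target without exclusivity, test the wake condition, and either arm the array for snooping or re-read with exclusivity to take the lock. The condensed model below reduces the bus behavior to plain loads and omits the exclusive re-read; structure and names are assumed.

```c
/* Look-ahead poll: proceed immediately if the event already happened,
 * otherwise arm a wake-and-go slot and let the thread sleep. */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

typedef struct {
    uint64_t target_addr;
    bool     armed;        /* address is being snooped for the sleeping thread */
} wag_array_slot_t;

static bool event_occurred(uint64_t value, uint64_t expected)
{
    return value == expected;
}

/* Returns true if the thread may proceed immediately. */
static bool lookahead_poll(wag_array_slot_t *slot, const uint64_t *target,
                           uint64_t expected)
{
    uint64_t value = *target;            /* look-ahead load, no exclusivity */
    if (event_occurred(value, expected)) {
        /* real hardware would re-issue the load with data exclusivity here
         * and hold the lock on behalf of the thread */
        return true;
    }
    slot->target_addr = (uint64_t)(uintptr_t)target;
    slot->armed = true;                  /* snoop this address; thread sleeps */
    return false;
}

int main(void)
{
    uint64_t shared = 0;
    wag_array_slot_t slot = {0};
    printf("proceed now? %s\n", lookahead_poll(&slot, &shared, 1) ? "yes" : "no");
    shared = 1;
    printf("proceed now? %s\n", lookahead_poll(&slot, &shared, 1) ? "yes" : "no");
    return 0;
}
```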
20100293359 | General Purpose Register Cloning - A clone set of General Purpose Registers (GPRs) is created to be used by a set of helper thread binaries, which is created from a set of main thread binaries. When the set of main thread binaries enters a wait state, the set of helper thread binaries uses the clone set of GPRs to continue using unused execution units within a processor core. The set of helper threads are thus able to warm up local cache memory with data that will be needed when execution of the set of main thread binaries resumes. | 11-18-2010 |
20100299496 | Thread Partitioning in a Multi-Core Environment - A set of helper thread binaries is created to retrieve data used by a set of main thread binaries. The set of helper thread binaries and the set of main thread binaries are partitioned according to common instruction boundaries. As a first partition in the set of main thread binaries executes within a first core, a second partition in the set of helper thread binaries executes within a second core, thus “warming up” the cache in the second core. When the first partition of the main thread completes execution, a second partition of the main thread moves to the second core and executes using the warmed-up cache in the second core. | 11-25-2010 |
20110173417 | Programming Idiom Accelerators - A wake-and-go mechanism may be a programming idiom accelerator. As a processor fetches instructions, the programming idiom accelerator may look ahead to determine whether a programming idiom is coming up in the instruction stream. If the programming idiom accelerator recognizes a programming idiom, the programming idiom accelerator may perform an action to accelerate execution of the programming idiom. In the case of a wake-and-go programming idiom, the programming idiom accelerator may record an entry in a wake-and-go array, for example. | 07-14-2011 |
20110173419 | Look-Ahead Wake-and-Go Engine With Speculative Execution - A wake-and-go mechanism is provided for a microprocessor. The wake-and-go mechanism looks ahead in the instruction stream of a thread for programming idioms that indicate that the thread is waiting for an event. If a look-ahead polling operation succeeds, the look-ahead wake-and-go engine may record an instruction address for the corresponding idiom so that the wake-and-go mechanism may have the thread perform speculative execution at a time when the thread is waiting for an event. During execution, when the wake-and-go mechanism recognizes a programming idiom, the wake-and-go mechanism may store the thread state in the thread state storage. Instead of putting the thread to sleep, the wake-and-go mechanism may perform speculative execution. | 07-14-2011 |
20110173423 | Look-Ahead Hardware Wake-and-Go Mechanism - A hardware wake-and-go mechanism is provided for a data processing system. The wake-and-go mechanism looks ahead in the instruction stream of a thread for programming idioms that indicate that the thread is waiting for an event. The wake-and-go mechanism updates a wake-and-go array with a target address associated with the event for each recognized programming idiom. When the thread reaches a programming idiom, the thread goes to sleep until the event occurs. The wake-and-go array may be a content addressable memory (CAM). When a transaction appears on the symmetric multiprocessing (SMP) fabric that modifies the value at a target address in the CAM, the CAM returns a list of storage addresses at which the target address is stored. The wake-and-go mechanism associates these storage addresses with the threads waiting for an event at the target addresses, and may wake the one or more threads waiting for the event. | 07-14-2011 |
20110173593 | Compiler Providing Idiom to Idiom Accelerator - A wake-and-go mechanism may be a programming idiom accelerator. As a processor fetches instructions, the programming idiom accelerator may look ahead to determine whether a programming idiom is coming up in the instruction stream. If the programming idiom accelerator recognizes a programming idiom, the programming idiom accelerator may perform an action to accelerate execution of the programming idiom. A compiler may recognize programming idioms and expose the programming idioms to the programming idiom accelerator within the resulting machine language instructions. | 07-14-2011 |
20110173625 | Wake-and-Go Mechanism with Prioritization of Threads - A hardware private array is a thread state storage that is embedded within the processor or within logic associated with a bus or wake-and-go logic. The hardware private array and/or wake-and-go array may have a limited storage area. Therefore, each thread may have an associated priority. If there is insufficient space in the hardware private array, then the wake-and-go mechanism may compare the priority of the thread to the priorities of the threads already stored in the hardware private array and wake-and-go array. If the thread has a higher priority than at least one thread already stored in the hardware private array and wake-and-go array, then the wake-and-go mechanism may remove a lowest priority thread, meaning the thread is removed from hardware private array and wake-and-go array and converted to a flee model. | 07-14-2011 |
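The prioritization abstract above admits a new thread into the fixed-size hardware private array only if it outranks the lowest-priority resident entry, which is then evicted. The small model below implements that rule in software; the array size, entry layout, and names are illustrative assumptions.

```c
/* Priority-based admission to a fixed-size thread state array. */
#include <stdbool.h>
#include <stdio.h>

#define ARRAY_SLOTS 4

typedef struct {
    int  thread_id;
    int  priority;    /* higher value = more important */
    bool valid;
} hpa_entry_t;

static bool hpa_admit(hpa_entry_t slots[ARRAY_SLOTS], int thread_id, int priority)
{
    int lowest = -1;
    for (int i = 0; i < ARRAY_SLOTS; i++) {
        if (!slots[i].valid) { lowest = i; break; }            /* free slot   */
        if (lowest < 0 || slots[i].priority < slots[lowest].priority)
            lowest = i;                                        /* weakest one */
    }
    if (slots[lowest].valid && slots[lowest].priority >= priority)
        return false;    /* new thread loses: falls back to software handling */
    if (slots[lowest].valid)
        printf("evicting thread %d (priority %d)\n",
               slots[lowest].thread_id, slots[lowest].priority);
    slots[lowest] = (hpa_entry_t){ thread_id, priority, true };
    return true;
}

int main(void)
{
    hpa_entry_t slots[ARRAY_SLOTS] = {0};
    for (int t = 0; t < 6; t++)
        printf("thread %d admitted? %s\n", t,
               hpa_admit(slots, t, t % 3) ? "yes" : "no");
    return 0;
}
```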
20110173630 | Central Repository for Wake-and-Go Mechanism - A wake-and-go mechanism is provided with a central repository wake-and-go array for a multiple processor data processing system. The wake-and-go mechanism recognizes a programming idiom that indicates that a thread running on a processor within the multiple processor data processing system is waiting for an event. The wake-and-go mechanism updates a central repository wake-and-go array with a target address associated with the event. Each entry in the central repository wake-and-go array may include a thread identification (ID), a central processing unit (CPU) ID, the target address, the expected data, a comparison type, a lock bit, a priority, and a thread state pointer, which is the address at which the thread state information is stored. | 07-14-2011 |
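The central repository abstract above lists the fields carried by each array entry. Transcribing that list into a C struct gives a concrete picture of one row; the field widths chosen here are assumptions, since the abstract does not specify them.

```c
/* One entry of a central repository wake-and-go array, as sketched from the
 * fields listed in the abstract above. */
#include <stdint.h>

typedef struct {
    uint32_t thread_id;        /* thread waiting on the event               */
    uint32_t cpu_id;           /* CPU on which the thread was running       */
    uint64_t target_addr;      /* address being watched                     */
    uint64_t expected_data;    /* value that signals the event              */
    uint8_t  comparison_type;  /* e.g. equal / not-equal / greater-than     */
    uint8_t  lock_bit;         /* set when the idiom is a lock spin         */
    uint8_t  priority;         /* relative importance of the waiting thread */
    uint64_t thread_state_ptr; /* address at which the thread state is kept */
} central_wag_entry_t;

int main(void)
{
    return sizeof(central_wag_entry_t) > 0 ? 0 : 1;
}
```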
20110173631 | Wake-and-Go Mechanism for a Data Processing System - A wake-and-go mechanism is provided for a data processing system. When a thread is waiting for an event, rather than performing a series of get-and-compare sequences, the thread updates a wake-and-go array with a target address associated with the event. The thread then goes to sleep until the event occurs. The wake-and-go array may be a content addressable memory (CAM). When a transaction appears on the symmetric multiprocessing (SMP) fabric that modifies the value at a target address in the CAM, the CAM returns a list of storage addresses at which the target address is stored. The operating system or a background sleeper thread associates these storage addresses with the threads waiting for an event at the target addresses, and may wake the one or more threads waiting for the event. | 07-14-2011 |
20110173632 | Hardware Wake-and-Go Mechanism with Look-Ahead Polling - A hardware wake-and-go mechanism is provided for a data processing system. The wake-and-go mechanism looks ahead in a thread for programming idioms that indicate that the thread is waiting for an event. The wake-and-go mechanism performs a look-ahead polling operation for each of the programming idioms. If each of the look-ahead polling operations fails, then the wake-and-go mechanism updates a wake-and-go array with a target address associated with the event for each recognized programming idiom. | 07-14-2011 |
20120159126 | Programming Language Exposing Idiom Calls - A programming language may include hint instructions that may notify a programming idiom accelerator that a programming idiom is coming. An idiom begin hint exposes the programming idiom to the programming idiom accelerator. Thus, the programming idiom accelerator need not perform pattern matching or other forms of analysis to recognize a sequence of instructions. Rather, the programmer may insert idiom hint instructions, such as an idiom begin hint, to expose the idiom to the programming idiom accelerator. Similarly, an idiom end hint may mark the end of the programming idiom. | 06-21-2012 |
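The hint-instruction abstract above proposes bracketing an idiom with begin/end markers so the accelerator never has to pattern-match the loop. Standard C has no such hint instructions, so the sketch below uses empty macros as stand-ins purely to show where the markers would sit around a polling idiom.

```c
/* Bracket a wait idiom with explicit begin/end markers; the macros are
 * placeholders for the proposed hint instructions, not real intrinsics. */
#include <stdatomic.h>
#include <stdio.h>

#define IDIOM_BEGIN(kind) ((void)0) /* stand-in for an idiom-begin hint */
#define IDIOM_END(kind)   ((void)0) /* stand-in for an idiom-end hint   */

static atomic_int flag;

int main(void)
{
    atomic_store(&flag, 1);            /* pretend another thread set the flag */

    IDIOM_BEGIN("wake-and-go poll");
    while (atomic_load(&flag) == 0) {
        /* spin: the bracketed region is what the accelerator would replace
         * with a sleep-until-write on &flag */
    }
    IDIOM_END("wake-and-go poll");

    puts("event observed");
    return 0;
}
```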
20120191674 | Dynamic Monitoring of Ability to Reassemble Streaming Data Across Multiple Channels Based on History - Mechanisms are provided for processing streaming data at high sustained data rates. These mechanisms receive a plurality of data elements over a plurality of non-sequential communication channels and write the plurality of data elements directly to the file system of the data processing system in an unassembled manner. The mechanisms determine whether or not to perform a data scrubbing operation based on history information indicative of whether data elements in the plurality of data elements are being received in a substantially sequential manner. The mechanisms perform a data scrubbing operation, in response to a determination to perform data scrubbing, to identify any missing data elements in the plurality of data elements written to the file system and assemble the plurality of data elements into a plurality of data streams in response to results of the data scrubbing indicating that there are no missing data elements. | 07-26-2012 |
20120265938 | PERFORMING A PARTIAL CACHE LINE STORAGE-MODIFYING OPERATION BASED UPON A HINT - Analyzing pre-processed code includes identifying at least one storage-modifying construct specifying a storage-modifying memory access to a memory hierarchy of a data processing system and determining if more than one granule of a cache line of data containing multiple granules that is targeted by the storage-modifying construct is subsequently referenced by said pre-processed code. Post-processed code including a storage-modifying instruction corresponding to the at least one storage-modifying construct in the pre-processed code is generated and stored. Generating the post-processed code includes marking the storage-modifying instruction with a partial cache line hint indicating that said storage-modifying instruction targets less than a full cache line of data within a memory hierarchy if the analyzing indicates only one granule of the target cache line will be accessed while the cache line is held in the cache memory and otherwise refraining from marking the storage-modifying instruction with the partial cache line hint. | 10-18-2012 |
20120266180 | Performing Setup Operations for Receiving Different Amounts of Data While Processors are Performing Message Passing Interface Tasks - A system and method are provided for performing setup operations for receiving a different amount of data while processors are performing message passing interface (MPI) tasks. Mechanisms for adjusting the balance of processing workloads of the processors are provided so as to minimize wait periods for waiting for all of the processors to call a synchronization operation. An MPI load balancing controller maintains a history that provides a profile of the tasks with regard to their calls to synchronization operations. From this information, it can be determined which processors should have their processing loads lightened and which processors are able to handle additional processing loads without significantly negatively affecting the overall operation of the parallel execution system. As a result, setup operations may be performed while processors are performing MPI tasks to prepare for receiving different sized portions of data in a subsequent computation cycle based on the history. | 10-18-2012 |
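The MPI load-balancing abstract above uses a history of synchronization behavior to decide which ranks should receive more or less data in the next computation cycle. The fragment below is one simplified way to turn that history into a split, shifting work toward ranks that waited longest at the barrier; the proportional rule itself is an assumption, since the abstract only says the history guides the setup.

```c
/* Derive next-cycle data shares from per-rank barrier wait times. */
#include <stdio.h>

#define RANKS 4

/* wait_ms[i]: average time rank i spent waiting at the last synchronization
 * calls (a large value suggests the rank is under-loaded). */
static void rebalance(const double wait_ms[RANKS], double share[RANKS])
{
    double total = 0.0;
    for (int i = 0; i < RANKS; i++)
        total += wait_ms[i];
    for (int i = 0; i < RANKS; i++)
        share[i] = (total > 0.0) ? wait_ms[i] / total : 1.0 / RANKS;
}

int main(void)
{
    double wait_ms[RANKS] = { 10.0, 40.0, 25.0, 25.0 };  /* from MPI history */
    double share[RANKS];
    rebalance(wait_ms, share);
    for (int i = 0; i < RANKS; i++)
        printf("rank %d gets %.0f%% of next cycle's data\n", i, 100.0 * share[i]);
    return 0;
}
```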
20120273185 | ENVIRONMENTAL CONTROL OF LIQUID COOLED ELECTRONICS - A method, system, and computer program product are provided for controlling liquid-cooled electronics, which includes measuring a first set point temperature, T | 11-01-2012 |
20120304201 | Management of Process-to-Process Communication Requests - A mechanism is provided for managing a process-to-process communication request. A call is received in an operating system from an application in the data processing system. The operating system passes the call to a host fabric interface controller in the data processing system without processing the call. The host fabric interface controller processes the call using state information associated with the call. The call is processed by the host fabric interface controller without intervention by the operating system. | 11-29-2012 |
20130179899 | MANAGEMENT OF PROCESS-TO-PROCESS COMMUNICATION REQUESTS - A mechanism is provided for managing a process-to-process communication request. A call is received in an operating system from an application in the data processing system. The operating system passes the call to a host fabric interface controller in the data processing system without processing the call. The host fabric interface controller processes the call using state information associated with the call. The call is processed by the host fabric interface controller without intervention by the operating system. | 07-11-2013 |