Entries |
Document | Title | Date |
20080201531 | STRUCTURE FOR ADMINISTERING AN ACCESS CONFLICT IN A COMPUTER MEMORY CACHE - A design structure embodied in a machine readable storage medium for designing, manufacturing, and/or testing a design is provided. The design structure includes an apparatus for administering an access conflict in a cache. The apparatus includes the cache, a cache controller, and a superscalar computer processor. The cache controller is capable of receiving a write address and write data from the superscalar computer processor's store memory instruction execution unit and a read address for read data from the superscalar computer processor's load memory instruction execution unit, for writing and reading data from a same cache line in the cache simultaneously on a current clock cycle; storing the write data in the same cache line on the current clock cycle; stalling, in the load memory instruction execution unit, a corresponding load microinstruction; and reading from the cache on a subsequent clock cycle read data from the read address. | 08-21-2008 |
20080209132 | Disk snapshot acquisition method - A disk snapshot acquisition method, which is applied in a server comprising a memory allocated with a kernel space and a hard disk, comprises the steps of allocating all chunks having data stored as a disk volume in said hard disk; allocating a first portion and a second portion in said hard disk; establishing a snapshot pointer in said kernel space, said snapshot pointer pointing to a starting address of said first portion in said hard disk; and when original data in one of said chunks of said disk volume is to be modified, duplicating said original data to a chunk in said second portion as backup data, then modifying said original data into modified data, and storing a piece of mapping information comprising an address of said modified data and an address of said backup data corresponding to said modified data to a copy-on-write table in said first portion. | 08-28-2008 |
20080215819 | METHOD, APPARATUS, AND COMPUTER PROGRAM PRODUCT FOR A CACHE COHERENCY PROTOCOL STATE THAT PREDICTS LOCATIONS OF SHARED MEMORY BLOCKS - A method, apparatus, and computer program product are disclosed for reducing the number of unnecessarily broadcast local requests to reduce the latency to access data from remote nodes in an SMP computer system. A shared invalid cache coherency protocol state is defined that predicts whether a memory read request to read data in a shared cache line can be satisfied within a local node. When a cache line is in the shared invalid state, a valid copy of the data is predicted to be located in the local node. When a cache line is in the invalid state and not in the shared invalid state, a valid copy of the data is predicted to be located in one of the remote nodes. | 09-04-2008 |
20080215820 | METHOD AND APPARATUS FOR FILTERING MEMORY WRITE SNOOP ACTIVITY IN A DISTRIBUTED SHARED MEMORY COMPUTER - A method and apparatus for filtering memory probe activity for writes in a distributed shared memory computer. In one embodiment, the method may include assigning an uncached directory state to a cache data block in response to evicting the cache data block. In another embodiment, the method may include assigning a remote directory state to a cache data block in response to evicting the cache data block and storing it in a remote cache. In a third embodiment, the method may include assigning a pairwise-shared directory state in response to a second processor node initiating a load operation to a cache data block in a modified cache state in a first processor node. In a fourth embodiment, the method may include assigning a migratory directory state in response to a processor node initiating a store operation to a cache data block in a pairwise-shared cache state. | 09-04-2008 |
20080229028 | UNIFORM EXTERNAL AND INTERNAL INTERFACES FOR DELINQUENT MEMORY OPERATIONS TO FACILITATE CACHE OPTIMIZATION - A computer implemented method, software infrastructure and computer usable program code for improving application performance. A delinquent memory operation instruction is identified. A delinquent memory operation instruction is an instruction associated with cache misses that exceeds a threshold number of cache misses. A directive is inserted in a code region associated with the delinquent memory operation to form annotated code. The directive indicates an address of the delinquent memory operation instruction and a number of memory latency cycles expected to be required for the delinquent memory operation instruction to execute. The information included in the annotated code is used to optimize execution of an application associated with the delinquent memory operation instruction. | 09-18-2008 |
20080244189 | Method, Apparatus, System and Program Product Supporting Directory-Assisted Speculative Snoop Probe With Concurrent Memory Access - A multiprocessor data processing system includes a memory controller controlling access to a memory subsystem, multiple processor buses coupled to the memory controller, and at least one of multiple processors coupled to each processor bus. In response to receiving a first read request of a first processor via a first processor bus, the memory controller initiates a speculative access to the memory subsystem and a lookup of the target address in a central coherence directory. In response to the central coherence directory indicating that a copy of the target memory block is cached by a second processor, the memory controller transmits a second read request for the target address on a second processor bus. In response to receiving a clean snoop response to the second read request, the memory controller provides to the first processor the target memory block retrieved from the memory subsystem by the speculative access. | 10-02-2008 |
20080244190 | Method, Apparatus, System and Program Product Supporting Efficient Eviction of an Entry From a Central Coherence Directory - In response to a memory access request missing in a central coherence directory of a data processing system, the central coherence directory issues a back-invalidate request and provides an indication of one or more processors possibly caching a copy of a victim memory block associated with a victim memory address. In response to the back-invalidate request, a memory controller initiates a lookup of coherency information for the victim memory address in the central coherence directory and, prior to receipt of the coherency information, speculatively issues a set of back-invalidate commands on one or more of multiple processor buses to invalidate any cached copy of the victim memory block. In response to receipt of the coherency information, the memory controller determines whether the set of speculatively issued back-invalidate commands was under-inclusive, and if not, removes a victim entry associated with the victim memory address from the central coherence directory. | 10-02-2008 |
20080250210 | COPYING DATA FROM A FIRST CLUSTER TO A SECOND CLUSTER TO REASSIGN STORAGE AREAS FROM THE FIRST CLUSTER TO THE SECOND CLUSTER - Provided are a method, system, and article of manufacture for copying data from a first cluster to a second cluster to reassign storage areas from the first cluster to the second cluster. An operation is initiated to reassign storage areas from a first cluster to a second cluster, wherein the first cluster includes a first cache and a first storage unit and the second cluster includes a second cache and a second storage unit. Data in the first cache for the storage areas to reassign to the second cluster is copied to the second cache. Data in the first storage unit for storage areas remaining assigned to the first cluster is copied to the second storage unit. | 10-09-2008 |
20080263284 | Methods and Arrangements to Manage On-Chip Memory to Reduce Memory Latency - Methods, systems, and media for reducing memory latency seen by processors by providing a measure of control over on-chip memory (OCM) management to software applications, implicitly and/or explicitly, via an operating system are contemplated. Many embodiments allow part of the OCM to be managed by software applications via an application program interface (API), and part managed by hardware. Thus, the software applications can provide guidance regarding address ranges to maintain close to the processor to reduce unnecessary latencies typically encountered when dependent upon cache controller policies. Several embodiments utilize a memory internal to the processor or on a processor node so the memory block used for this technique is referred to as OCM. | 10-23-2008 |
20080282039 | METHOD AND SYSTEM FOR PROACTIVELY MONITORING THE COHERENCY OF A DISTRIBUTED CACHE - A method of proactively monitoring the coherency of a distributed cache. A cache comparison utility selects a set of cache keys from a replica cache connected to a main cache via a network. The cache comparison utility selects a first cache key from the set of cache keys and fetches a first cache value from the replica cache that corresponds to the first cache key. The cache comparison utility generates a first checksum value corresponding to the first cache value and the first cache key and stores the first checksum value in a first checksum table. The cache comparison utility creates a first total checksum value that corresponds to the first checksum table and compares the first total checksum value with multiple total checksum values that correspond to the main cache and one or more additional replica caches, thereby identifying replica caches that are not identical to the main cache. | 11-13-2008 |
20080282040 | COMPUTER SYSTEM, METHOD, CACHE CONTROLLER AND COMPUTER PROGRAM FOR CACHING I/O REQUESTS - A computer system having a main unit and an expansion unit connected by an interface arrangement. The expansion unit includes at least one connector for receiving an input/output component, so that additional input/output components can be added to the computer system. The interface arrangement includes at least one cache controller and at least one cache memory for monitoring and predicting requests exchanged between the main unit and the expansion unit. A method of caching and processing input/output requests and a storage medium is also provided. | 11-13-2008 |
20080294848 | Control data modification within a cache memory - A data processing system is provided with at least one processor | 11-27-2008 |
20080301376 | Method, Apparatus, and System Supporting Improved DMA Writes - A memory controller receives a stream of DMA write operations and enqueues them in a queue enforcing a First-In First-Out (FIFO) order. Prior to processing a particular DMA write operation, the memory controller acquires coherency ownership of a target memory block and stores the result in a low latency array. In response to acquiring coherency ownership, this low latency array is updated to a coherency state signifying coherency ownership by the memory controller. In a pipelined array access, both the low latency array and the second array are accessed and if the lower latency second array indicates the particular coherency state with no collision indication, the memory controller signals that the particular DMA write operation can be performed, where the signaling occurs prior to results being obtained from the higher latency first array at the normal end of the array access pipeline. In response to the signaling, the memory controller performs an update to the memory subsystem indicated by the particular DMA write operation. | 12-04-2008 |
20080313409 | Separating device and separating method - In a separating device that separates a processor configured to perform process by using data recorded in a cache memory connected to the processor, a stopping unit, upon receiving a processor separation request, stops the processor from performing a new process; and a separation executing unit, upon completion of process being performed by the processor, separates the processor after invalidating the data recorded in the cache memory. | 12-18-2008 |
20080320230 | Avoiding Livelock Using A Cache Manager in Multiple Core Processors - Livelocks are prevented in multiple core processors by verifying that a data access request is still valid before sending messages to processor cores that may cause other data access requests to fail. A cache coherency manager receives data access requests from multiple processor cores. Upon receiving a data access request that may cause a livelock, the cache coherency manager first sends an intervention message back to the requesting processor core to confirm that this data access request will succeed. If the requesting processor core determines that the data access request is still valid, it directs the cache coherency manager to proceed with the data access request. The cache coherency manager may then send intervention messages to other processor cores to complete the data access request. If the requesting processor core determines that the data access request is invalid, it directs the cache coherency manager to abandon the data access request. | 12-25-2008 |
20080320231 | Avoiding Livelock Using Intervention Messages in Multiple Core Processors - Livelocks are prevented in multiple core processors by canceling data access requests upon determining that they conflict with other data access requests. A requesting processor core sends a data access request potentially causing livelock to a cache coherency manager. A cache coherency manager receives data access requests from multiple processor cores. The cache coherency manager sends intervention messages to all of the processor cores in response to all data access requests that may cause livelock. Upon receiving an intervention message from the cache coherency manager, the processor core determines if the intervention message corresponds with any of its own pending data access requests. If the intervention message is associated with a data access request conflicting with one of its own pending data access requests, the processor core responds to the intervention message by directing the cache coherency manager to cancel its own conflicting pending data access request. | 12-25-2008 |
20090006764 | INSERTION OF COHERENCE REQUESTS FOR DEBUGGING A MULTIPROCESSOR - A method and system are disclosed to insert coherence events in a multiprocessor computer system, and to present those coherence events to the processors of the multiprocessor computer system for analysis and debugging purposes. The coherence events are inserted in the computer system by adding one or more special insert registers. By writing into the insert registers, coherence events are inserted in the multiprocessor system as if they were generated by the normal coherence protocol. Once these coherence events are processed, the processing of coherence events can continue in the normal operation mode. | 01-01-2009 |
20090006765 | METHOD AND SYSTEM FOR REDUCING CACHE CONFLICTS - Disclosed is a system and method for storing a plurality of data packets in a plurality of memory buffers in a cache memory for reducing cache conflicts. The method includes determining size of each of a plurality of data packets; storing a first data packet of the plurality of data packets starting from a first address in a first memory buffer of the plurality of memory buffers; determining an offset based on the size of the first data packet; and storing a second data packet in a second buffer starting from a second address based on the offset. | 01-01-2009 |
20090006766 | DATA PROCESSING SYSTEM AND METHOD FOR PREDICTIVELY SELECTING A SCOPE OF BROADCAST OF AN OPERATION UTILIZING A HISTORY-BASED PREDICTION - According to a method of data processing, a predictor is maintained that indicates a historical scope of broadcast for one or more previous operations transmitted on an interconnect of a data processing system. A scope of broadcast of a subsequent operation is predictively selected by reference to the predictor. | 01-01-2009 |
20090019230 | DYNAMIC INITIAL CACHE LINE COHERENCY STATE ASSIGNMENT IN MULTI-PROCESSOR SYSTEMS - A method, system, and computer program product for providing lines of data from shared resources to caching agents are provided. The method, system, and computer program product provide for receiving a request from a caching agent for a line of data stored in a shared resource, assigning one of a plurality of coherency states as an initial coherency state for the line of data, each of the plurality of coherency states being assignable as the initial coherency state for the line of data, and providing the line of data to the caching agent in the initial coherency state assigned to the line of data. | 01-15-2009 |
20090019231 | Method and Apparatus for Implementing Virtual Transactional Memory Using Cache Line Marking - Embodiments of the present invention implement virtual transactional memory using cache line marking. The system starts by executing a starvation-avoiding transaction for a thread. While executing the starvation-avoiding transaction, the system places starvation-avoiding load-marks on cache lines which are loaded from and places starvation-avoiding store-marks on cache lines which are stored to. Next, while swapping a page out of a memory and to a disk during the starvation-avoiding transaction, the system determines if one or more cache lines in the page have a starvation-avoiding load-mark or a starvation-avoiding store-mark. If so, upon swapping the page into the memory from the disk, the system places a starvation-avoiding load-mark on each cache line that had a starvation-avoiding load-mark and places a starvation-avoiding store-mark on each cache line that had a starvation-avoiding store-mark. | 01-15-2009 |
20090019232 | SPECIFICATION OF COHERENCE DOMAIN DURING ADDRESS TRANSLATION - A processing system includes a plurality of coherency domains and a plurality of coherency agents. Each coherency agent is associated with at least one of the plurality of coherency domains. At a select coherency agent of the plurality of coherency agents, an address translation for a coherency message is performed using a first memory address to generate a second memory address. A select coherency domain of the plurality of coherency domains associated with the coherency message is determined at the select coherency agent based on the address translation. The coherency message and a coherency domain identifier of the select coherency domain are provided by the select coherency agent to a coherency interconnect for distribution to at least one of the plurality of coherency agents based on the coherency domain identifier. | 01-15-2009 |
20090019233 | STRUCTURE FOR DYNAMIC INITIAL CACHE LINE COHERENCY STATE ASSIGNMENT IN MULTI-PROCESSOR SYSTEMS - A design structure embodied in a machine readable storage medium for designing, manufacturing, and testing a system for providing lines of data from shared resources to caching agents is provided. The system provides for receiving a request from a caching agent for a line of data stored in a shared resource, assigning one of a plurality of coherency states as an initial coherency state for the line of data, each of the plurality of coherency states being assignable as the initial coherency state for the line of data, and providing the line of data to the caching agent in the initial coherency state assigned to the line of data. | 01-15-2009 |
20090019234 | CACHE MEMORY DEVICE AND DATA PROCESSING METHOD OF THE DEVICE - A cache memory device is provided. The cache memory device includes a memory including a first cache memory region and a second cache memory region, and a control block. The control block determines a type of data to be received. The control block also performs at least one of transmitting a head of received data to a first cache memory region, transmitting a body of the received data to a second cache memory region and transmitting a tail of the received data to the first cache memory region based on the type of the data to be received. | 01-15-2009 |
20090037665 | HIDING CONFLICT, COHERENCE COMPLETION AND TRANSACTION ID ELEMENTS OF A COHERENCE PROTOCOL - According to one embodiment of the invention, an apparatus having one or more cache agents and a protocol agent is disclosed. The protocol agent is coupled to the one or more cache agents to receive events corresponding to cache operations from the one or more cache agents to maintain ordering with respect to the cache operation events. The protocol agent includes a structure to handle conflict resolution. | 02-05-2009 |
20090055596 | MULTI-PROCESSOR SYSTEM HAVING AT LEAST ONE PROCESSOR THAT COMPRISES A DYNAMICALLY RECONFIGURABLE INSTRUCTION SET - A multi-processor system comprises at least one host processor, which may comprise a fixed instruction set, such as the well-known x86 instruction set. The system further comprises at least one co-processor, which comprises dynamically reconfigurable logic that enables the co-processor's instruction set to be dynamically reconfigured. In this manner, the at least one host processor and the at least one dynamically reconfigurable co-processor are heterogeneous processors having different instruction sets. Further, cache coherency is maintained between the heterogeneous host and co-processors. And, a single executable file may contain instructions that are processed by the multi-processor system, wherein a portion of the instructions are processed by the host processor and a portion of the instructions are processed by the co-processor. | 02-26-2009 |
20090063780 | DATA PROCESSING SYSTEM AND METHOD FOR MONITORING THE CACHE COHERENCE OF PROCESSING UNITS - The present invention relates to a data processing system with a plurality of processing units (PU), a shared memory (M) for storing data from said processing units (PU) and an interconnect means (IM) for coupling the memory (M) and the plurality of processing units (PU). At least one of the processing units (PU) comprises a cache memory (C). Furthermore, a transition buffer (STB) is provided for buffering at least some of the state transitions of the cache memories (C) of said at least one of said plurality of processing units (PU). A monitoring means (MM) is provided for monitoring the cache coherence of the caches (C) of said plurality of processing units (PU) based on the data of the transition buffer (STB), in order to determine any cache coherence violations. | 03-05-2009 |
20090063781 | CACHE ACCESS MECHANISM - Techniques for improving cache accesses in an object-relational mapping space are described herein. In one embodiment, in response to a first cache request received at a first cache API associated with a transaction for updating a data entry of the relational database, the updated data of the data entry is stored in a local cache, where the local cache is one of members of a cache cluster, and an invalidation message is sent to remaining members of the cache cluster to invalidate corresponding cache entries of the remaining members. In response to a second cache request received at a second cache API associated with a transaction for loading data from a data entry of the relational database, the loaded data is stored in the local cache without sending an invalidation message to the remaining members of the cache cluster. Other methods and apparatuses are also described. | 03-05-2009 |
20090077322 | System and Method for Getllar Hit Cache Line Data Forward Via Data-Only Transfer Protocol Through BEB Bus - A system and method for using a data-only transfer protocol to store atomic cache line data in a local storage area is presented. A processing engine includes an atomic cache and a local storage. When the processing engine encounters a request to transfer cache line data from the atomic cache to the local storage (e.g., GETLLAR command), the processing engine utilizes a data-only transfer protocol to pass cache line data through the external bus node and back to the processing engine. The data-only transfer protocol comprises a data phase and does not include a prior command phase or snoop phase due to the fact that the processing engine communicates to the bus node instead of an entire computer system when the processing engine sends a data request to transfer data to itself. | 03-19-2009 |
20090083493 | SUPPORT FOR MULTIPLE COHERENCE DOMAINS - A number of coherence domains are maintained among the multitude of processing cores disposed in a microprocessor. A cache coherency manager defines the coherency relationships such that coherence traffic flows only among the processing cores that are defined as having a coherency relationship. The data defining the coherency relationships between the processing cores is optionally stored in a programmable register. For each source of a coherent request, the processing core targets of the request are identified in the programmable register. In response to a coherent request, an intervention message is forwarded only to the cores that are defined to be in the same coherence domain as the requesting core. If a cache hit occurs in response to a coherent read request and the coherence state of the cache line resulting in the hit satisfies a condition, the requested data is made available to the requesting core from that cache line. | 03-26-2009 |
20090083494 | PROBABILISTIC TECHNIQUE FOR CONSISTENCY CHECKING CACHE ENTRIES - A facility for determining whether to consistency-check a cache entry is described. The facility randomly or pseudorandomly selects a value in a range. If the selected value satisfies a predetermined consistency-checking threshold within the range, the facility consistency-checks the entry, and may decide to propagate this knowledge to other cache managers. If, on the other hand, the selected value does not satisfy the consistency-checking threshold, the facility determines not to consistency-check the entry. | 03-26-2009 |
20090089510 | SPECULATIVE READ IN A CACHE COHERENT MICROPROCESSOR - A cache coherence manager, disposed in a multi-core microprocessor, includes a request unit, an intervention unit, a response unit and an interface unit. The request unit receives coherent requests and selectively issues speculative requests in response. The interface unit selectively forwards the speculative requests to a memory. The interface unit includes at least three tables. Each entry in the first table represents an index to the second table. Each entry in the second table represents an index to the third table. The entry in the first table is allocated when a response to an associated intervention message is stored in the first table but before the speculative request is received by the interface unit. The entry in the second table is allocated when the speculative request is stored in the interface unit. The entry in the third table is allocated when the speculative request is issued to the memory. | 04-02-2009 |
20090100232 | Processor, information processing device and cache control method of processor - A processor having a cache memory provided therein controls use of the cache memory based on operation mode information which changeably designates use/no-use of a cache memory and on designation of cache memory use in an access instruction word in a program at the time of an access to a main storage memory from the program in operation. | 04-16-2009 |
20090106500 | Method and Apparatus for Managing Buffers in a Data Processing System - Buffer management for a data processing system is provided. According to one embodiment, a method for managing buffers in a telephony device is provided. The method comprises providing a plurality of buffers stored in a memory, providing a cache having a pointer pointing to the buffer, scanning the cache to determine if the cache is full, and, when the scan determines the cache is not full, determining a free buffer from the plurality of buffers, generating a pointer for the free buffer, and placing the generated pointer into the cache. | 04-23-2009 |
20090119461 | MAINTAINING CACHE COHERENCE USING LOAD-MARK METADATA - Embodiments of the present invention provide a system that maintains load-marks on cache lines. The system includes: (1) a cache which accommodates a set of cache lines, wherein each cache line includes metadata for load-marking the cache line, and (2) a local cache controller for the cache. Upon determining that a remote cache controller has made a request for a cache line that would cause the local cache controller to invalidate a copy of the cache line in the cache, the local cache controller determines if there is a load-mark in the metadata for the copy of the cache line. If not, the local cache controller invalidates the copy of the cache line. Otherwise, the local cache controller signals a denial of the invalidation of the cache line and retains the copy of the cache line and the load-mark in the metadata for the copy of the cache line. | 05-07-2009 |
20090138662 | FLOATING POINT BYPASS RETRY - A system and method for increasing the throughput of a processor during cache misses. During the retrieval of the cache miss data, subsequent memory requests are generated and allowed to proceed to the cache. The data for the subsequent cache hits are stored in a bypass retry device. Also, the cache miss address and memory line data may be stored by the device when they are retrieved and they may be sent to the cache for a cache line replacement. The bypass retry device determines the priority of sending data to the processor. The priority allows the data for memory requests to be provided to the processor in the same order as they were generated from the processor without delaying subsequent memory requests after a cache miss. | 05-28-2009 |
20090157976 | Network on Chip That Maintains Cache Coherency With Invalidate Commands - A network on chip (‘NOC’) that maintains cache coherency with invalidate commands, the NOC comprising integrated processor (‘IP’) blocks, routers, memory communications controllers, and network interface controllers, each IP block adapted to a router through a memory communications controller and a network interface controller, the NOC also including a port on a router of the network through which is received an invalidate command, the invalidate command including an identification of a cache line, the invalidate command representing an instruction to invalidate the cache line, the router configured to send the invalidate command to an IP block served by the router; the router further configured to send the invalidate command horizontally and vertically to neighboring routers if the port is a vertical port; and the router further configured to send the invalidate command only horizontally to neighboring routers if the port is a horizontal port. | 06-18-2009 |
20090157977 | DATA TRANSFER TO MEMORY OVER AN INPUT/OUTPUT (I/O) INTERCONNECT - A method, system, and computer program product for data transfer to memory over an input/output (I/O) interconnect are provided. The method includes reading a mailbox stored on an I/O adapter in response to a request to initiate an I/O transaction. The mailbox stores a directive that defines a condition under which cache injection for data values in the I/O transaction will not be performed. The method also includes embedding a hint into the I/O transaction when the directive in the mailbox matches data received in the request, and executing the I/O transaction. The execution of the I/O transaction causes a system chipset or I/O hub for a processor receiving the I/O transaction, to directly store the data values from the I/O transaction into system memory and to suppress the cache injection of the data values into a cache memory upon presence of the hint in a header of the I/O transaction. | 06-18-2009 |
20090157978 | TARGET COMPUTER PROCESSOR UNIT (CPU) DETERMINATION DURING CACHE INJECTION USING INPUT/OUTPUT (I/O) ADAPTER RESOURCES - A method, system, and computer program product for target computer processor unit (CPU) determination during cache injection using input/output (I/O) adapter resources are provided. The method includes storing locations of cache lines for pinned or affinity scheduled processes in a table on an input/output (I/O) adapter. The method also includes setting a cache injection hint in an input/output (I/O) transaction when an address in the I/O transaction is found in the table. The cache injection hint is set for performing direct cache injection. The method further includes entering a central processing unit (CPU) identifier and cache type in the I/O transaction, and updating a cache by injecting data values of the I/O transaction into the cache as determined by the CPU identifier and the cache type associated with the address in the table. | 06-18-2009 |
20090157979 | TARGET COMPUTER PROCESSOR UNIT (CPU) DETERMINATION DURING CACHE INJECTION USING INPUT/OUTPUT (I/O) HUB/CHIPSET RESOURCES - A method, system, and computer program product for target computer processor unit (CPU) determination during cache injection using I/O hub/chipset resources are provided. The method includes creating a cache injection indirection table on the input/output (I/O) hub or chipset. The cache injection indirection table includes fields for address or address range, CPU identifier, and cache type. In response to receiving an input/output (I/O) transaction, the hub/chipset reads the address in an address field of the I/O transaction, looks up the address in the cache injection indirection table, and injects the address and data of the I/O transaction to a target cache associated with a CPU as identified in the CPU identifier field when, in response to the look up, the address is present in the address field of the cache injection indirection table. | 06-18-2009 |
20090157980 | Memory controller with write data cache and read data cache - A memory controller | 06-18-2009 |
20090157981 | COHERENT INSTRUCTION CACHE UTILIZING CACHE-OP EXECUTION RESOURCES - A multiprocessor system maintains cache coherence among processors in a coherent domain. Within the coherent domain, a first processor can receive a command to perform a cache maintenance operation. The first processor can determine whether the cache maintenance operation is a coherent operation. For coherent operations, the first processor sends a coherent request message for distribution to other processors in the coherent domain and can cancel execution of the cache maintenance operation pending receipt of intervention messages corresponding to the coherent request. The intervention messages can reflect a global ordering of coherence traffic in the multiprocessor system and can include instructions for maintaining a data cache and an instruction cache of the first processor. Cache maintenance operations that are determined to be non-coherent can be executed at the first processor without sending the coherent request. | 06-18-2009 |
20090164735 | System and Method for Cache Coherency In A Multiprocessor System - A method for maintaining cache coherency operates in a data processing system with a system memory and a plurality of processing units (PUs), each PU having a cache, and each PU coupled to at least another one of the plurality of PUs. A first PU receives a first data block for storage in a first cache of the first PU. The first PU stores the first data block in the first cache. The first PU assigns a first coherency state and a first tag to the first data block, wherein the first coherency state is one of a plurality of coherency states that indicate whether the first PU has accessed the first data block. The plurality of coherency states further indicate whether, in the event the first PU has not accessed the first data block, the first PU received the first data block from a neighboring PU. | 06-25-2009 |
20090164736 | System and Method for Cache Line Replacement Selection in a Multiprocessor Environment - A method for managing a cache operates in a data processing system with a system memory and a plurality of processing units (PUs). A first PU determines that one of a plurality of cache lines in a first cache of the first PU must be replaced with a first data block, and determines whether the first data block is a victim cache line from another one of the plurality of PUs. In the event the first data block is not a victim cache line from another one of the plurality of PUs, the first cache does not contain a cache line in coherency state invalid, and the first cache contains a cache line in coherency state moved, the first PU selects a cache line in coherency state moved, stores the first data block in the selected cache line and updates the coherency state of the first data block. | 06-25-2009 |
20090164737 | SYSTEM AND METHOD FOR PROCESSING POTENTIALLY SELF-INCONSISTENT MEMORY TRANSACTIONS - A processor provides memory request and a coherency state value for a coherency granule associated with a memory request. The processor further provides either a first indicator or a second indicator depending on whether the coherency state value represents a cumulative coherency state for a plurality of caches of the processor. The first indicator and the second indicator identify the coherency state value as representing a cumulative coherency state or a potentially non-cumulative coherency state, respectively. If the second indicator is provided, a transaction management module determines whether to request the cumulative coherency state for the coherency granule in response to receiving the second indicator. The transaction management module then provides an indicator of the request for the cumulative coherency state to the processor in response to determining to request the cumulative coherency state. Otherwise, the transaction management module processes the memory transaction without requesting the cumulative coherency state. | 06-25-2009 |
20090172294 | Method and apparatus for supporting scalable coherence on many-core products through restricted exposure - In one embodiment, a multi-core processor having cores each associated with a cache memory, can operate such that when a first core is to access data owned by a second core present in a cache line associated with the second core, responsive to a request from the first core, cache coherency state information associated with the cache line is not updated. A coherence engine associated with the processor may receive the data access request and determine that the data is of a memory page owned by the first core and convert the data access request to a non-cache coherent request. Other embodiments are described and claimed. | 07-02-2009 |
20090172295 | In-memory, in-page directory cache coherency scheme - In an embodiment, the method provides receiving a memory access request for a demanded cache line from a processor of a plurality of processors; accessing coherency information associated with the demanded cache line from a memory unit by bringing in from a memory page in which the demanded cache line is stored, the memory page also including a directory line having coherency information corresponding to the demanded cache line; reading data associated with the demanded cache line in accordance with the coherency information; and returning the data to the processor. | 07-02-2009 |
20090187716 | Network On Chip that Maintains Cache Coherency with Invalidate Commands - A network on chip (‘NOC’) that maintains cache coherency, the NOC including integrated processor (‘IP’) blocks, routers, memory communications controllers, and network interface controller, each IP block adapted to a router through a memory communications controller and a network interface controller, at least one memory communications controller further comprising a cache coherency controller, each memory communications controller controlling communication between an IP block and memory, and each network interface controller controlling inter-IP block communications through routers, wherein the memory communications controller is configured to execute a memory access instruction and configured to determine a state of a cache line addressed by the memory access instruction, the state of the cache line being one of shared, exclusive, or invalid; the memory communications controller configured to broadcast an invalidate command to a plurality of IP blocks of the NOC if the state of the cache line is shared; and the memory communications controller configured to transmit an invalidate command only to an IP block that controls a cache where the cache line is stored if the state of the cache line is exclusive. | 07-23-2009 |
20090187717 | Apparatus, circuit and method of controlling memory initialization - An apparatus includes a first memory which includes a plurality of memory regions, a second memory which stores initializing information indicating whether each of the memory regions is initialized, the second memory controlling a coherency between the first memory and a cache memory, and a control circuit which initializes a memory region based on the initializing information when accessing the memory region. | 07-23-2009 |
20090193197 | Selective coherency control - A data processing system | 07-30-2009 |
20090193198 | METHOD, SYSTEM AND COMPUTER PROGRAM PRODUCT FOR PREVENTING LOCKOUT AND STALLING CONDITIONS IN A MULTI-NODE SYSTEM WITH SPECULATIVE MEMORY FETCHING - A method of preventing lockout and stalling conditions in a multi-node system having a plurality of nodes which includes initiating a processor request to a shared level of cache in a requesting node, performing a fabric coherency establishment sequence on the plurality of nodes, issuing a speculative memory fetch request to a memory, detecting a conflict on one of the plurality of nodes and communicating the conflict back to the requesting node within the system, canceling the speculative memory fetch request issued, and repeating the fabric coherency establishment sequence in the system until the point of conflict is resolved, without issuing another speculative memory fetch request. The subsequent memory fetch request is only issued after determining the state of line within the system, after the successful completion of the multi-node fabric coherency establishment sequence. | 07-30-2009 |
20090198910 | DATA PROCESSING SYSTEM, PROCESSOR AND METHOD THAT SUPPORT A TOUCH OF A PARTIAL CACHE LINE OF DATA - According to a method of data processing in a multiprocessor data processing system, in response to a processor touch request targeting a target granule of a cache line of data containing multiple granules, a processing unit originates on an interconnect of the multiprocessor data processing system a partial touch request that requests a copy of only the target granule for subsequent query access. In response to a combined response to the partial touch request indicating success, the combined response representing a system-wide response to the partial touch request, the processing unit receives the target granule of the target cache line and updates a coherency state of the target granule while retaining a coherency state of at least one other granule of the cache line. | 08-06-2009 |
20090198911 | DATA PROCESSING SYSTEM, PROCESSOR AND METHOD FOR CLAIMING COHERENCY OWNERSHIP OF A PARTIAL CACHE LINE OF DATA - According to a method of data processing in a multiprocessor data processing system, in response to a processor request to modify a target granule of a target cache line of data containing multiple granules, a processing unit originates on an interconnect of the multiprocessor data processing system a data-claim-partial request that requests permission to promote only the target granule of the target cache line to a unique copy with an intent to modify the target granule. In response to a combined response to the data-claim-partial request indicating success (the combined response representing a system-wide response to the data-claim-partial request), the processing unit promotes only the target granule of the target cache line to a unique copy by updating a coherency state of the target granule and retaining a coherency state of at least one other granule of the target cache line. | 08-06-2009 |
20090198912 | DATA PROCESSING SYSTEM, PROCESSOR AND METHOD FOR IMPLEMENTING CACHE MANAGEMENT FOR PARTIAL CACHE LINE OPERATIONS - A method of data processing in a cache memory includes caching a plurality of cache lines of data in a corresponding plurality of entries in a cache array, where each of the plurality of cache lines includes multiple data granules. For each of the plurality of cache entries, a plurality of line coherency state fields indicates an associated coherency state applicable to two or more data granules. For at least a particular cache line among the plurality of cache lines, a granule coherency state field indicates a coherency state for a particular granule of the multiple data granules in the particular cache line, where the coherency state indicated by the granule coherency state field differs from that indicated for the particular cache line by its line coherency state field. | 08-06-2009 |
20090198913 | Two-Hop Source Snoop Based Messaging Protocol - A messaging protocol that facilitates a distributed cache coherency conflict resolution in a multi-node system that resolves conflicts at a home node. The protocol may perform a method including supporting at least three protocol classes for the messaging protocol, via at least three virtual channels provided by a link layer of a network fabric coupled to the caching agents, wherein the virtual channels include a first virtual channel to support a probe message class, a second virtual channel to support an acknowledgment message class, and a third virtual channel to support a response message class. | 08-06-2009 |
20090210631 | MOBILE APPLICATION CACHE SYSTEM - Providing a framework for developing, deploying and managing sophisticated mobile solutions, with a simple Web-like programming model that integrates with existing enterprise components. Mobile applications may consist of a data model definition, user interface templates, a client side controller, which includes scripts that define actions, and, on the server side, a collection of conduits, which describe how to mediate between the data model and the enterprise. In one embodiment, the occasionally-connected application server assumes that data used by mobile applications is persistently stored and managed by external systems. The occasionally-connected data model can be a metadata description of the mobile application's anticipated usage of this data, and be optimized to enable the efficient traversal and synchronization of this data between occasionally connected devices and external systems. | 08-20-2009 |
20090210632 | MICROPROCESSOR AND METHOD FOR DEFERRED STORE DATA FORWARDING FOR STORE BACKGROUND DATA IN A SYSTEM WITH NO MEMORY MODEL RESTRICTIONS - A pipelined processor includes circuitry adapted for store forwarding, including: for each store request, and while a write to one of a cache and a memory is pending; obtaining the most recent value for at least one block of data; merging store data from the store request with the block of data thus updating the block of data and forming a new most recent value and an updated complete block of data; and buffering the updated block of data into a store data queue; for each additional store request, where the additional store request requires at least one updated block of data: determining if store forwarding is appropriate for the additional store request on a block-by-block basis; if store forwarding is appropriate, selecting an appropriate block of data from the store data queue on a block-by-block basis; and forwarding the selected block of data to the additional store request. | 08-20-2009 |
20090210633 | Method and Apparatus for Eliminating Silent Store Invalidation Propagation in Shared Memory Cache Coherency Protocols - A method and circuit for eliminating silent store invalidation propagation in shared memory cache coherency protocols, and a design structure on which the subject circuit resides are provided. A received write data value is compared with a stored cache data value. When the received write data value matches the stored cache data value, a first squash signal is generated. A received write address is compared with a reservation address. When the received write address matches the reservation address, a reservation signal is generated. The first squash signal and the reservation signal are combined to selectively produce a silent store squash signal. The silent store squash signal cancels sending an invalidation signal. | 08-20-2009 |
20090210634 | Data transfer controller, data consistency determination method and storage controller - A data transfer controller of the present invention can determine whether or not data has been correctly stored in a cache memory even when the data is not transferred to the cache memory in sequential order. Data inputted from a host is transferred to and stored in a prescribed area of the cache memory. First check data is created and stored for each block. A data consistency determination module reads out the data from the cache memory subsequent to the end of a data write, and creates second check data anew. By comparing the second check data against the first check data, it can be determined whether or not the data has been stored normally in the cache memory. The data consistency determination module can also determine the consistency of the data on the basis of the data address written to the cache memory. | 08-20-2009 |
20090216957 | Managing the storage of data in coherent data stores - A data processing apparatus is disclosed that comprises: at least one processor; at least one data store for storing data processed by said at least one processor; a shared data store for storing data processed by said at least one processor and at least one further device; and coherency control circuitry responsive to a write request from said at least one further device to determine if data related to an address targeted by said write request is stored in said at least one data store, and if it is forcing an eviction of said stored data from said at least one data store to said shared data store prior to performing said write to said shared data store; wherein said data is stored in said at least one data store in conjunction with an indicator indicating if said stored data is consistent with data stored in a corresponding address in a further data store, and said stored data is evicted whether said stored data is indicated as being consistent or inconsistent. | 08-27-2009 |
20090248988 | MECHANISM FOR MAINTAINING CONSISTENCY OF DATA WRITTEN BY IO DEVICES - A multi-core microprocessor includes, in part, a cache coherence manager that maintains coherence among the multitude of microprocessor cores, and an I/O coherence unit that maintains coherent traffic between the I/O devices and the multitude of processing cores of the microprocessor. The I/O coherence unit stalls non-coherent I/O write requests until it receives acknowledgement that all pending coherent I/O write requests issued prior to the non-coherent I/O write requests have been made visible to the processing cores. The I/O coherence unit ensures that MMIO read responses are not delivered to the processing cores until after all previous I/O write requests are made visible to the processing cores. Deadlock conditions are prevented by limiting MMIO requests in such a way that they can never block I/O write requests from completing. | 10-01-2009 |
20090248989 | Multiprocessor computer system with reduced directory requirement - The invention has application in implementation of large Symmetric Multiprocessor Systems with a large number of nodes which include processing elements and associated cache memories. The illustrated embodiment of the invention provides for interconnection of a large number of multiprocessor nodes while reducing over the prior art the size of directories for tracking of memory coherency throughout the system. The embodiment incorporates within the memory controller of each node, directory information relating to the current locations of memory blocks which allows for elimination at a higher level in the node controllers of a larger volume of directory information relating to the location of memory blocks. This arrangement thus allows for more efficient implementation of very large multiprocessor computer systems. | 10-01-2009 |
20090254712 | ADAPTIVE CACHE ORGANIZATION FOR CHIP MULTIPROCESSORS - A method, chip multiprocessor tile, and a chip multiprocessor with amorphous caching are disclosed. An initial processing core | 10-08-2009 |
20090276578 | CACHE COHERENCY PROTOCOL IN A DATA PROCESSING SYSTEM - A data processing system includes a first master having a cache, a second master, a memory operably coupled to the first master and the second master via a system interconnect. The cache includes a cache controller which implements a set of cache coherency states for data units of the cache. The cache coherency states include an invalid state; an unmodified non-coherent state indicating that data in a data unit of the cache has not been modified and is not guaranteed to be coherent with data in at least one other storage device of the data processing system, and an unmodified coherent state indicating that the data of the data unit has not been modified and is coherent with data in the at least one other storage device of the data processing system. | 11-05-2009 |
20090276579 | CACHE COHERENCY PROTOCOL IN A DATA PROCESSING SYSTEM - A method includes detecting a bus transaction on a system interconnect of a data processing system having at least two masters; determining whether the bus transaction is one of a first type of bus transaction or a second type of bus transaction, where the determining is based upon a burst attribute of the bus transaction; performing a cache coherency operation for the bus transaction in response to the determining that the bus transaction is of the first type, where the performing the cache coherency operation includes searching at least one cache of the data processing system to determine whether the at least one cache contains data associated with a memory address of the bus transaction; and not performing cache coherency operations for the bus transaction in response to the determining that the bus transaction is of the second type. | 11-05-2009 |
20090292881 | DISTRIBUTED HOME-NODE HUB - A method and a system for processor nodes configurable to operate in various distributed shared memory topologies. The processor node may be coupled to a first local memory. The first processor node may include a first local arbiter, which may be configured to perform one or more of a memory node decode or a coherency check on the first local memory. The processor node may also include a switch coupled to the first local arbiter for enabling and/or disabling the first local arbiter. Thus one or more processor nodes may be coupled together in various distributed shared memory configurations, depending on the configuration of their respective switches. | 11-26-2009 |
20090292882 | STORAGE AREA NETWORK SERVER WITH PARALLEL PROCESSING CACHE AND ACCESS METHOD THEREOF - A storage area network (SAN) server with a parallel processing cache and an access method thereof are described, which are supplied for a plurality of requests to access data in a server through an SAN. The server includes physical storage devices, for storing data sent by the request and data transmitted to the request; copy managers, for managing the physical storage devices connected to the server, and each copy manager includes a cache memory unit, for temporarily storing the data accessed by the physical storage devices, and a data manager, for recording an index of the data in the cache memory unit, providing a cache copy stored in the cache memory unit to a corresponding request end, and confirming an access time for each virtual device manager to access the cache copy. | 11-26-2009 |
20090300291 | Implementing Cache Coherency and Reduced Latency Using Multiple Controllers for Memory System - A method and apparatus implement cache coherency and reduced latency using multiple controllers for a memory system, and a design structure is provided on which the subject circuit resides. A first memory controller uses a first memory as its primary address space, for storage and fetches. A second memory controller is also connected to the first memory. A second memory controller uses a second memory as its primary address space, for storage and fetches. The first memory controller is also connected to the second memory. The first memory controller and the second memory controller, for example, are connected together by a processor communications bus. A request and send sequence of the invention sends data directly to a requesting memory controller eliminating the need to re-route data back through a responding controller, and improving the latency of the data transfer. | 12-03-2009 |
20090300292 | Using criticality information to route cache coherency communications - In one embodiment, the present invention includes a method for receiving a cache coherency message in an interconnect router from a caching agent, mapping the message to a criticality level according to a predetermined mapping, and appending the criticality level to each flow control unit of the message, which can be transmitted from the interconnect router based at least in part on the criticality level. Other embodiments are described and claimed. | 12-03-2009 |
20090300293 | Dynamically Partitionable Cache - Methods and systems for dynamically partitioning a cache and maintaining cache coherency are provided. In an embodiment, a system for processing memory requests includes a cache and a cache controller configured to compare a memory address and a type of a received memory request to a memory address and a type, respectively, corresponding to a cache line of the cache to determine whether the memory request hits on the cache line. In another embodiment, a method for processing fetch memory requests includes receiving a memory request and determining if the memory request hits on a cache line of a cache by determining if a memory address and a type of the memory request match a memory address and a type, respectively, corresponding to a cache line of the cache. | 12-03-2009 |
20090313439 | MANAGING COHERENCE VIA PUT/GET WINDOWS - A method and apparatus for managing coherence between two processors of a two processor node of a multi-processor computer system. Generally the present invention relates to a software algorithm that simplifies and significantly speeds the management of cache coherence in a message passing parallel computer, and to hardware apparatus that assists this cache coherence algorithm. The software algorithm uses the opening and closing of put/get windows to coordinate the activities required to achieve cache coherence. The hardware apparatus may be an extension to the hardware address decode, that creates, in the physical memory address space of the node, an area of virtual memory that (a) does not actually exist, and (b) is therefore able to respond instantly to read and write requests from the processing elements. | 12-17-2009 |
20090319726 | Efficient Region Coherence Protocol for Clustered Shared-Memory Multiprocessor Systems - A system and method of a region coherence protocol for use in Region Coherence Arrays (RCAs) deployed in clustered shared-memory multiprocessor systems which optimize cache-to-cache transfers by allowing broadcast memory requests to be provided to only a portion of a clustered shared-memory multiprocessor system. Interconnect hierarchy levels can be devised for logical groups of processors, processors on the same chip, processors on chips aggregated into a multichip module, multichip modules on the same printed circuit board, and for processors on other printed circuit boards or in other cabinets. The present region coherence protocol includes, for example, one bit per level of interconnect hierarchy, such that the one bit has a value of “1” to indicate that there may be processors caching copies of lines from the region at that level of the interconnect hierarchy, and the one bit has a value of “0” to indicate that there are no cached copies of any lines from the region at that respective level of the interconnect hierarchy. | 12-24-2009 |
20090327612 | Access Speculation Predictor with Predictions Based on a Domain Indicator of a Cache Line - An access speculation predictor may predict whether to perform speculative retrieval of data for a data request from a main memory based on whether or not a domain indicator in the data request indicates that the cache line corresponding to the data has a special invalid state. In particular, a first address and a domain indicator are extracted from a first data request. The first address is used to select a finite state machine (FSM) of a memory controller based on memory regions associated with the FSMs of the memory controller. Speculative retrieval of data for the first data request from main memory is controlled based on whether the domain indicator identifies the special invalid state or not and, if the domain indicator identifies that the cache line does not have the special invalid state, based on information stored in registers associated with the selected FSM. | 12-31-2009 |
20090327613 | System and Method for a Software Managed Cache in a Multiprocessing Environment - A method for implementing a software-managed cache comprises determining an object identifier (ID) for each of a first set of objects of a plurality of objects resident in a local memory, to generate a first cache table, the first cache table comprising a plurality of entries. Each object comprises an object ID and an effective address. The method receives a request for an object, the request comprising an object ID. The method compares the received object ID with the entries in the first cache table. In the event the received object ID matches an entry in the first cache table, the method returns the matching entry in response to the request. In the event the received object ID does not match an entry in the first cache table, the method calculates an effective address in the local memory of the object associated with the object ID. | 12-31-2009 |
20100037027 | Transaction Manager And Cache For Processing Agent - A processing agent is used in a system that transfers data of a predetermined data line length during external transactions. The agent may include an internal cache having a plurality of cache entries. Each cache entry may store multiple data line lengths of data. The agent further may include a transaction queue system having queue entries that include a primary entry including an address portion and status portion, the status portion provided for a first external transaction of the agent, and a secondary entry including a status portion provided for a second external transaction. | 02-11-2010 |
20100057996 | CACHE LOGIC VERIFICATION APPARATUS AND CACHE LOGIC VERIFICATION METHOD - A cache logic verification apparatus is provided. The cache logic verification apparatus includes an acquisition unit that acquires an ongoing process in each stage of a stepped operation to judge whether data to be read in a cache memory holding a copy of contents of a part of a memory is held or not; and a comparator that compares the ongoing process in each stage acquired by the acquisition unit with a scheduled ongoing process predetermined in each stage of the stepped operation. | 03-04-2010 |
20100070717 | Techniques for Cache Injection in a Processor System Responsive to a Specific Instruction Sequence - A technique for performing cache injection includes monitoring an instruction stream for a specific instruction sequence. Addresses on a bus are then monitored, at a cache, in response to detecting the specific instruction sequence a determined number of times. Ownership of input/output data on the bus is then acquired by the cache when an address on the bus (that is associated with the input/output data) corresponds to an address of a data block stored in the cache. | 03-18-2010 |
20100070718 | Memory management in a shared memory system - Methods, systems and computer program products to maintain cache coherency, in a System On Chip (SOC) which is part of a distributed shared memory system are described. A local SOC unit that includes a local controller and an on-chip memory is provided. In response to receiving a request from a remote controller of a remote SOC to access a memory location, the local controller determines whether the local SOC has exclusive ownership of the requested memory location, sends data from the memory location if the local SOC has exclusive ownership of the memory location and stores an entry in the on-chip memory that identifies the remote SOC as having requested data from the memory location. The entry specifies whether the request from the remote SOC is for exclusive ownership of the memory location. The entry also includes a field that identifies the remote SOC as the requester. The requested memory location may be external or internal to the local SOC unit. | 03-18-2010 |
20100106912 | COHERENCE PROTOCOL WITH DYNAMIC PRIVATIZATION - Embodiments of the present invention provide a system that maintains coherence between cache lines in a computer system by using dynamic privatization. During operation, the system starts by receiving a request for a read-only copy of a cache line from a processor. The system then determines if the processor has privately requested the cache line a predetermined number of times. If so, the system provides a copy of the cache line to the processor in an exclusive state. Otherwise, the system provides a copy of the cache line to the processor in a shared state. | 04-29-2010 |
20100106913 | Cache memory control apparatus and cache memory control method - According to an aspect of the embodiment, an FP includes a plurality of entries which holds requests to be processed, and each of the plurality of entries includes a requested flag indicating that data transfer is once requested. An FP-TOQ holds information indicating an entry holding the oldest request. A data transfer request prevention determination circuit checks the requested flag of a request to be processed and the FP-TOQ, and when a transfer request of data as a target of the request to be processed has already been issued and the entry holding the request to be processed is not the entry indicated by the FP-TOQ, transmits a signal which prevents the transfer request of the data to a data transfer request control circuit. Even when a cache miss occurs in a primary cache RAM, the data transfer request control circuit does not issue a data transfer request when the signal which prevents the transfer request is received. | 04-29-2010 |
20100146215 | METHOD AND APPARATUS FOR FILTERING MEMORY WRITE SNOOP ACTIVITY IN A DISTRIBUTED SHARED MEMORY COMPUTER - A method and apparatus for filtering memory probe activity for writes in a distributed shared memory computer. In one embodiment, the method may include assigning an uncached directory state to a cache data block in response to evicting the cache data block. In another embodiment, the method may include assigning a remote directory state to a cache data block in response to evicting the cache data block and storing it in a remote cache. In a third embodiment, the method may include assigning a pairwise-shared directory state in response to a second processor node initiating a load operation to a cache data block in a modified cache state in a first processor node. In a fourth embodiment, the method may include assigning a migratory directory state in response to a processor node initiating a store operation to a cache data block in a pairwise-shared cache state. | 06-10-2010 |
20100146216 | METHOD AND APPARATUS FOR FILTERING MEMORY WRITE SNOOP ACTIVITY IN A DISTRIBUTED SHARED MEMORY COMPUTER - A method and apparatus for filtering memory probe activity for writes in a distributed shared memory computer. In one embodiment, the method may include assigning an uncached directory state to a cache data block in response to evicting the cache data block. In another embodiment, the method may include assigning a remote directory state to a cache data block in response to evicting the cache data block and storing it in a remote cache. In a third embodiment, the method may include assigning a pairwise-shared directory state in response to a second processor node initiating a load operation to a cache data block in a modified cache state in a first processor node. In a fourth embodiment, the method may include assigning a migratory directory state in response to a processor node initiating a store operation to a cache data block in a pairwise-shared cache state. | 06-10-2010 |
20100153656 | DATA PROCESSOR - A data processor includes a cache memory control section which includes: a hit/miss determination section which is supplied with a request for data processing to determine whether data to be processed is present in a cache memory and outputs a cache hit/miss determination result and, if having determined that the data is not present in the cache memory, feeds a read command to make an upper memory control section read the data from the upper memory; a FIFO storage which stores the cache hit/miss determination result and the in-block read position information according to a FIFO system; and a cache memory read/write section which reads the hit/miss determination result and the in-block read position information from the FIFO storage and reads the data from the cache memory, or writes the data from the upper memory control section into the cache memory and outputs the data. | 06-17-2010 |
20100169579 | READ AND WRITE MONITORING ATTRIBUTES IN TRANSACTIONAL MEMORY (TM) SYSTEMS - A method and apparatus for monitoring memory accesses in hardware to support transactional execution is herein described. Attributes monitor accesses to data items without regard for detection at physical storage structure granularity, instead ensuring monitoring at least at data item granularity. As an example, attributes are added to state bits of a cache to enable new cache coherency states. Upon a monitored memory access to a data item, which may be selectively determined, coherency states associated with the data item are updated to a monitored state. As a result, invalidating requests to the data item are detected through combination of the request type and the monitored coherency state of the data item. | 07-01-2010 |
20100169580 | MEMORY MODEL FOR HARDWARE ATTRIBUTES WITHIN A TRANSACTIONAL MEMORY SYSTEM - A method and apparatus for providing a memory model for hardware attributes to support transactional execution is herein described. Upon encountering a load of a hardware attribute, such as a test monitor operation to load a read monitor, write monitor, or buffering attribute, a fault is issued in response to a loss field indicating the hardware attribute has been lost. Furthermore, dependency actions, such as blocking and forwarding, are provided for the attribute access operations based on address dependency and access type dependency. As a result, different scenarios for attribute loss and testing thereof are allowed and restricted in a memory model. | 07-01-2010 |
20100169581 | EXTENDING CACHE COHERENCY PROTOCOLS TO SUPPORT LOCALLY BUFFERED DATA - A method and apparatus for extending cache coherency to hold buffered data to support transactional execution is herein described. A transactional store operation referencing an address associated with a data item is performed in a buffered manner. Here, the coherency state associated with cache lines to hold the data item are transitioned to a buffered state. In response to local requests for the buffered data item, the data item is provided to ensure internal transactional sequential ordering. However, in response to external access requests, a miss response is provided to ensure the transactionally updated data item is not made globally visible until commit. Upon commit, the buffered lines are transitioned to a modified state to make the data item globally visible. | 07-01-2010 |
20100191919 | APPEND-BASED SHARED PERSISTENT STORAGE - A shared storage system is described herein that is based on an append-only model of updating a storage device to allow multiple computers to access storage with lighter-weight synchronization than traditional systems and to reduce wear on flash-based storage devices. Appending data allows multiple computers to write to the same storage device without interference and without synchronization between the computers. Computers can also safely read a written page without using synchronization because the system limits how data can be changed once written. The system may record a log of append operations performed and ensure idempotence by storing a key specified by the caller in the log along with each log entry. The system also provides broadcasts about appended data to computers so that coordination between computers can occur without direct communication between the computers. | 07-29-2010 |
20100199046 | METHOD AND DEVICE FOR CONTROLLING A MEMORY ACCESS IN A COMPUTER SYSTEM HAVING AT LEAST TWO EXECUTION UNITS - A method and device for controlling memory access in a computer system having at least two execution units, a buffer area, in particular a cache memory area being provided for each execution unit, and furthermore a switchover device and a comparison device being provided, the system switching between a performance mode and a compare mode, wherein in the performance mode each execution unit accesses the buffer area assigned to it and in the compare mode both execution units access one buffer area that can be predefined, the buffer areas being configurable. | 08-05-2010 |
20100199047 | EXPIRING VIRTUAL CONTENT FROM A CACHE IN A VIRTUAL UNIVERSE - An invention that expires cached virtual content in a virtual universe is provided. In one embodiment, there is an expiration tool, including an identification component configured to identify virtual content associated with an avatar in the virtual universe; an analysis component configured to analyze a behavior of the avatar in a region of the virtual universe; and an expiration component configured to expire cached virtual content associated with the avatar based on the behavior of the avatar in the region of the virtual universe. | 08-05-2010 |
20100199048 | SPECULATIVE WRITESTREAM TRANSACTION - Embodiments of the present invention provide a system that performs a speculative writestream transaction. The system starts by receiving, at a home node, a writestream ordered (WSO) request to start a WSO transaction from a processing subsystem. The WSO request identifies a cache line to be written during the WSO transaction. The system then sends an acknowledge signal to the processing subsystem to enable the processing subsystem to proceed with the WSO transaction. During the WSO transaction, the system receives a second WSO request to start a WSO transaction. The second WSO request identifies the same cache line as to be written during the subsequent WSO transaction. In response to receiving the second WSO request, the system sends an abort signal to cause the processing subsystem to abort the WSO transaction. | 08-05-2010 |
20100199049 | PARAMETER COPYING METHOD AND PARAMETER COPYING DEVICE - A parameter copying method is applied to a duplex system in which MPU and a main memory are duplicated and duplex operations on a hot standby system are performed. The parameter copying method includes cache reading data in the main memory corresponding to one MPU, cache writing the data read in the cache reading step on an as-is basis, and writing the data into the main memory corresponding to the one MPU by a block write that is produced by a cache replace caused due to the cache writing step, and also writing the same data into the main memory corresponding to the other MPU by the block write on a basis of a mirrored write. | 08-05-2010 |
20100217938 | METHOD AND AN APPARATUS TO IMPROVE LOCALITY OF REFERENCES FOR OBJECTS - Some embodiments of a method and an apparatus to improve locality of references for objects have been presented. In one embodiment, an access counter is provided to each of a set of objects in a computing system. The access counter is incremented each time a respective object is accessed. In response to a request to organize the objects, the objects are sorted by their respective counts of access in the access counters. | 08-26-2010 |
20100217939 | DATA PROCESSING SYSTEM - A data processing system includes a plurality of nodes connected with each other, each of the nodes including a processor and a memory, each of the processors including a processing unit, a cache memory, a tag memory for storing tag information, the processor accessing data to be processed, in the tag memory in reference to the tag information, and a cache controller for controlling saving or evacuating of data in the cache memory, the cache controller checking if the data to be evacuated originated from the memory of its own node or from any other memory of any other node, and when the data to be evacuated originated from any other memory of any other node, storing the data into the memory of its own node at a particular address of the memory and storing information of the particular address in the tag memory as tag information. | 08-26-2010 |
20100241812 | DATA PROCESSING SYSTEM WITH A PLURALITY OF PROCESSORS, CACHE CIRCUITS AND A SHARED MEMORY - Data from a shared memory ( | 09-23-2010 |
20100250860 | Method and System for Managing Cache Invalidation - In one embodiment the present invention includes a method and system for managing cache invalidation. In one embodiment, connection information to a database is stored in an intermediate cache management module. If changes are made to objects in the database, the objects are invalidated in a local cache. The connection information is accessed and used to connect to the database by an invalidation listener. The invalidation listener may determine the changes so that the changes can be reflected in the cache. Embodiments of the present invention may be implemented across multiple nodes in a clustered environment for updating caches on different nodes in response to changes to data objects performed by other nodes. | 09-30-2010 |
20100250861 | FAIRNESS MECHANISM FOR STARVATION PREVENTION IN DIRECTORY-BASED CACHE COHERENCE PROTOCOLS - Methods and apparatus relating to a fairness mechanism for starvation prevention in directory-based cache coherence protocols are described. In one embodiment, negatively-acknowledged (nack'ed) requests from a home agent may be tracked (e.g., using distributed linked-lists). In turn, the tracked requests may be served in a fair order. Other embodiments are also disclosed. | 09-30-2010 |
20100262786 | Barriers Processing in a Multiprocessor System Having a Weakly Ordered Storage Architecture Without Broadcast of a Synchronizing Operation - A data processing system employing a weakly ordered storage architecture includes first and second sets of processing units coupled to each other and data storage by an interconnect fabric. Each processing unit has a processor core having an associated cache hierarchy including at least a level one, level two and level three cache memories. In response to a request to perform an update to a portion of a first image of memory contained in the level three cache memory of a first processing unit while at least one kill-type command is pending at the first processing unit, the cache hierarchy of the first processing unit permits the update to be exposed to any first processor core only after the at least one kill-type command is complete. | 10-14-2010 |
20100268895 | INFORMATION HANDLING SYSTEM WITH IMMEDIATE SCHEDULING OF LOAD OPERATIONS - An information handling system (IHS) includes a processor with a cache memory system. The processor includes a processor core with an L1 cache memory that couples to an L2 cache memory. The processor includes an arbitration mechanism that controls load and store requests to the L2 cache memory. The arbitration mechanism includes control logic that enables a load request to interrupt a store request that the L2 cache memory is currently servicing. When the L2 cache memory finishes servicing the interrupting load request, the L2 cache memory may return to servicing the interrupted store request at the point of interruption. | 10-21-2010 |
20100299485 | CIRCUIT AND METHOD WITH CACHE COHERENCE STRESS CONTROL - A circuit contains a shared memory ( | 11-25-2010 |
20100318746 | MEMORY CHANGE TRACK LOGGING - A method for tracking memory changes includes defining a change-track area of memory including at least one memory address range for which changes will be tracked. The method also includes allocating a protected log region of memory for storing a change-track log and selecting an operational mode for change tracking from among a plurality of modes, the selected operational mode having criteria for tracking memory changes. The method includes detecting memory transactions using a memory logging module and generating a transaction record for each memory transaction that occurs in the change-track area of memory and which meets the criteria. The transaction records can be stored in the change-track log. | 12-16-2010 |
20100332765 | HIERARCHICAL BLOOM FILTERS FOR FACILITATING CONCURRENCY CONTROL - Some embodiments provide a system that facilitates concurrency control in a computer system. During operation, the system generates a set of signatures associated with memory accesses in the computer system. To generate the signatures, the system creates a set of hierarchical Bloom filters (HBFs) corresponding to the signatures, and populates the HBFs using addresses associated with the memory accesses. Next, the system compares the HBFs to detect a potential conflict associated with the memory accesses. Finally, the system manages concurrent execution in the computer system based on the detected potential conflict. | 12-30-2010 |
20110016277 | Method for Performing Cache Coherency in a Computer System - In a computing system, cache coherency is performed by selecting one of a plurality of coherency protocols for a first memory transaction. Each of the plurality of coherency protocols has a unique set of cache states that may be applied to cached data for the first memory transaction. Cache coherency is performed on appropriate caches in the computing system by applying the set of cache states of the selected one of the plurality of coherency protocols. | 01-20-2011 |
20110029738 | LOW-COST CACHE COHERENCY FOR ACCELERATORS - Embodiments of the invention provide methods and systems for reducing the consumption of inter-node bandwidth by communications maintaining coherence between accelerators and CPUs. The CPUs and the accelerators may be clustered on separate nodes in a multiprocessing environment. Each node that contains a shared memory device may maintain a directory to track blocks of shared memory that may have been cached at other nodes. Therefore, commands and addresses may be transmitted to processors and accelerators at other nodes only if a memory location has been cached outside of a node. Additionally, because accelerators generally do not access the same data as CPUs, only initial read, write, and synchronization operations may be transmitted to other nodes. Intermediate accesses to data may be performed non-coherently. As a result, the inter-chip bandwidth consumed for maintaining coherence may be reduced. | 02-03-2011 |
20110047334 | Checkpointing in Speculative Versioning Caches - Mechanisms for generating checkpoints in a speculative versioning cache of a data processing system are provided. The mechanisms execute code within the data processing system, wherein the code accesses cache lines in the speculative versioning cache. The mechanisms further determine whether a first condition occurs indicating a need to generate a checkpoint in the speculative versioning cache. The checkpoint is a speculative cache line which is made non-speculative in response to a second condition occurring that requires a roll-back of changes to a cache line corresponding to the speculative cache line. The mechanisms also generate the checkpoint in the speculative versioning cache in response to a determination that the first condition has occurred. | 02-24-2011 |
20110055489 | Managing Counter Saturation In A Filter - Filters and methods for managing presence counter saturation are disclosed. The filters can be coupled to a collection of items and maintain information for determining a potential presence of an identified item in the collection of items. The filter includes a filter controller and one or more mapping functions. Each mapping function has a plurality of counters associated with the respective mapping function. When a membership status of an item in the collection of items changes, the filter receives a membership change notification including an identifier identifying the item. Each mapping function processes the identifier to identify a particular counter associated with the respective mapping function. If a particular counter has reached a predetermined value, a request including a reference to the particular counter is sent to the collection of items. The filter receives a response to the request and modifies the particular counter as a result of the response. | 03-03-2011 |
20110072219 | MANAGING COHERENCE VIA PUT/GET WINDOWS - A method and apparatus for managing coherence between two processors of a two processor node of a multi-processor computer system. Generally the present invention relates to a software algorithm that simplifies and significantly speeds the management of cache coherence in a message passing parallel computer, and to hardware apparatus that assists this cache coherence algorithm. The software algorithm uses the opening and closing of put/get windows to coordinate the activities required to achieve cache coherence. The hardware apparatus may be an extension to the hardware address decode, that creates, in the physical memory address space of the node, an area of virtual memory that (a) does not actually exist, and (b) is therefore able to respond instantly to read and write requests from the processing elements. | 03-24-2011 |
20110078383 | Cache Management for Increasing Performance of High-Availability Multi-Core Systems - An apparatus and method for improving performance in high-availability systems are disclosed. In accordance with the illustrative embodiment, pages of memory of a primary system that are to be shadowed are initially copied to a backup system's memory, as well as to a cache in the primary system. A duplication manager process maintains the cache in an intelligent manner that significantly reduces the overhead required to keep the backup system in sync with the primary system, as well as the cache size needed to achieve a given level of performance. Advantageously, the duplication manager is executed on a different processor core than the application process executing transactions, further improving performance. | 03-31-2011 |
20110093659 | DATA STORAGE DEVICE AND DATA STORING METHOD THEREOF - A data storage device and a data storing method thereof, including first main memories coupled to a plurality of channels, second main memories coupled to the plurality of channels in common, a buffer memory temporarily storing data to be programmed to the first and the second main memories; and a controller configured to program data of victim cache lines from the buffer memory to the second main memories while data of a first victim cache line from the buffer memory is being programmed to the first main memories. In the storing method, a victim cache line is selected based on cost-based page replacement. | 04-21-2011 |
20110093660 | MULTI-CORE PROCESSING SYSTEM - A system has a first plurality of cores in a first coherency group. Each core transfers data in packets. The cores are directly coupled serially to form a serial path. The data packets are transferred along the serial path. The serial path is coupled at one end to a packet switch. The packet switch is coupled to a memory. The first plurality of cores and the packet switch are on an integrated circuit. The memory may or may not be on the integrated circuit. In another aspect a second plurality of cores in a second coherency group is coupled to the packet switch. The cores of the first and second pluralities may be reconfigured to form or become part of coherency groups different from the first and second coherency groups. | 04-21-2011 |
20110099334 | CORE CLUSTER, ENERGY SCALABLE VECTOR PROCESSING APPARATUS AND METHOD OF VECTOR PROCESSING INCLUDING THE SAME - A core cluster includes a cache memory, a core, and a cluster cache controller. The cache memory stores and provides instructions and data. The core accesses the cache memory or a cache memory provided in an adjacent core cluster, and performs an operation. The cluster cache controller allows the core to access the cache memory when the core requests memory access. The cluster cache controller allows the core to access the cache memory provided in the adjacent core cluster when the core requests a clustering to the adjacent core cluster. The cluster cache controller allows a core provided in the adjacent core cluster to access the cache memory when the core receives a clustering request from the adjacent core cluster. | 04-28-2011 |
20110099335 | SYSTEM AND METHOD FOR HARDWARE ACCELERATION OF A SOFTWARE TRANSACTIONAL MEMORY - In a transactional memory technique, hardware serves simply to optimize the performance of transactions that are controlled fundamentally by software. The hardware support reduces the overhead of common TM tasks—conflict detection, validation, and data isolation—for common-case bounded transactions. Software control preserves policy flexibility and supports transactions unbounded in space and in time. The hardware includes 1) an alert-on-update mechanism for fast software-controlled conflict detection; and 2) programmable data isolation, allowing potentially conflicting readers and writers to proceed concurrently under software control. | 04-28-2011 |
20110119450 | MULTI-PROCESSOR AND APPARATUS AND METHOD FOR MANAGING CACHE COHERENCE OF THE SAME - A cache consistency management device according to example embodiments comprises a ping-pong monitoring unit monitoring a ping-pong migration sequence generated between a plurality of processors; a counting unit counting the number of successive generations of the ping-pong migration sequence in response to the monitoring result; and a request modifying unit modifying a migration request to a request of a non-migratory sharing method on the basis of the counting result. | 05-19-2011 |
20110131381 | CACHE SCRATCH-PAD AND METHOD THEREFOR - An address containing data to be accessed is determined in response to executing an instruction received at a processor core of a microprocessor. During a scratch-pad mode of operation, it is determined whether a set of cache lines of a data cache is accessible based upon the memory location from which the instruction was retrieved. The address space of the data cache during scratch-pad mode can be isolated from other address spaces. | 06-02-2011 |
20110145510 | REDUCING INTERPROCESSOR COMMUNICATIONS PURSUANT TO UPDATING OF A STORAGE KEY - Processing within a multiprocessor computer system is facilitated by: deciding by a processor, pursuant to processing of a request to update a previous storage key to a new storage key, whether to purge the previous storage key from, or update the previous storage key in, local processor cache of the multiprocessor computer system. The deciding includes comparing a bit value(s) of one or more required components of the previous storage key to respective predefined allowed stale value(s) for the required component(s), and leaving the previous storage key in local processor cache if the bit value(s) of the required component(s) in the previous storage key equals the respective predefined allowed stale value(s) for the required component(s). By selectively leaving the previous storage key in local processor cache, interprocessor communication pursuant to processing of the request to update the previous storage key to the new storage key is minimized. | 06-16-2011 |
20110145511 | PAGE INVALIDATION PROCESSING WITH SETTING OF STORAGE KEY TO PREDEFINED VALUE - Processing within a multiprocessor computer system is facilitated by: setting, in association with invalidate page table entry processing, a storage key at a matching location in central storage of a multiprocessor computer system to a predefined value; and subsequently executing a request to update the storage key to a new storage key, the subsequently executing including determining whether the predefined value is an allowed stale value, and if so, replacing in central storage the storage key of predefined value with the new storage key without requiring purging or updating of the storage key in any local processor cache of the multiprocessor computer system, thus minimizing interprocessor communication pursuant to processing of the request to update the storage key to the new storage key. | 06-16-2011 |
20110145512 | Mechanisms To Accelerate Transactions Using Buffered Stores - In one embodiment, the present invention includes a method for executing a transactional memory (TM) transaction in a first thread, buffering a block of data in a first buffer of a cache memory of a processor, and acquiring a write monitor on the block to obtain ownership of the block at an encounter time in which data at a location of the block in the first buffer is updated. Other embodiments are described and claimed. | 06-16-2011 |
20110145513 | System and method for reduced latency caching - A reduced latency memory system that prevents memory bank conflicts. The reduced latency memory system receives a read request and write request. The read request is then handled by simultaneously fetching data from a main memory and a cache memory. The address of the read request is compared with a cache tag value and if the cache tag value matches the address of the read request, the data from the cache memory is served. The write request is stored and handled in a subsequent memory cycle. | 06-16-2011 |
20110153954 | STORAGE SUBSYSTEM - Provided is a storage subsystem capable of speeding up the input/output processing for a cache memory. Microprocessor Packages manage information related to a VDEV ownership for controlling virtual devices and a cache segment ownership for controlling cache segments in units of Microprocessor Packages, and one Microprocessor among multiple Microprocessors belonging to the determined Microprocessor Package to perform input/output processing for the virtual devices searches cache control information stored in the Package Memory without searching the cache control information in the shared memory, and if data exists in the cache memory, accesses the cache memory, and if it does not, accesses the virtual devices. | 06-23-2011 |
20110161599 | Handling of a wait for event operation within a data processing apparatus - A data processing apparatus and method are provided for handling of a wait for event operation. The data processing apparatus forms a portion of a coherent cache system and has a master device for performing data processing operations, including a wait for event operation causing the master device to enter a power saving mode. A cache is coupled to the master device and arranged to store data values for access by the master device when performing the data processing operations. Cache coherency circuitry is responsive to a coherency request from another portion of the coherent cache system, to detect whether a data value identified by the coherency request is present in the cache, and if so to cause a coherency action to be taken in respect of that data value stored in the cache. Wake event circuitry is responsive to the cache coherency circuitry to issue a wake event to the master device if the coherency action is taken. The master device is then responsive to the wake event to exit the power saving mode. Such a mechanism provides a simple and effective technique for causing the master device to exit the power saving mode, which can be used in all hardware implementations of coherent cache systems irrespective of the type of master devices provided within the coherent cache system. | 06-30-2011 |
20110161600 | Arithmetic processing unit, information processing device, and cache memory control method - A processor holds, in a plurality of respective cache lines, part of data held in a main memory unit. The processor also holds, in the plurality of respective cache lines, a tag address used to search for the data held in the cache lines and a flag indicating the validity of the data held in the cache lines. The processor executes a cache line fill instruction on a cache line corresponding to a specified address. Upon execution of the cache line fill instruction, the processor registers predetermined data in the cache line of the cache memory unit which has a tag address corresponding to the specified address and validates a flag in the cache line having the tag address corresponding to the specified address. | 06-30-2011 |
20110179229 | STORE-OPERATE-COHERENCE-ON-VALUE - A system, method and computer program product for performing various store-operate instructions in a parallel computing environment that includes a plurality of processors and at least one cache memory device. A queue in the system receives, from a processor, a store-operate instruction that specifies under which condition a cache coherence operation is to be invoked. A hardware unit in the system runs the received store-operate instruction. The hardware unit evaluates whether a result of the running the received store-operate instruction satisfies the condition. The hardware unit invokes a cache coherence operation on a cache memory address associated with the received store-operate instruction if the result satisfies the condition. Otherwise, the hardware unit does not invoke the cache coherence operation on the cache memory device. | 07-21-2011 |
20110191545 | SYSTEM AND METHOD FOR PERFORMING MEMORY OPERATIONS IN A COMPUTING SYSTEM - A processor may operate in one of a plurality of operating states. In a Normal operating state, the processor is not involved with a memory transaction. Upon receipt of a transaction instruction to access a memory location, the processor transitions to a Transaction operating state. In the Transaction operating state, the processor performs changes to a cache line and data associated with the memory location. While in the Transaction operating state, any changes to the data and the cache line are not visible to other processors in the computing system. These changes become visible upon the processor entering a Commit operating state in response to receipt of a commit instruction. After changes become visible, the processor returns to the Normal operating state. If an abort event occurs prior to receipt of the commit instruction, the processor transitions to an Abort operating state where any changes to the data and cache line are discarded. | 08-04-2011 |
20110202728 | METHODS AND APPARATUS FOR MANAGING CACHE PERSISTENCE IN A STORAGE SYSTEM USING MULTIPLE VIRTUAL MACHINES - Methods and systems for assuring persistence of battery backed cache memory in a storage system comprising multiple virtual machines. In one exemplary embodiment, an additional process is added to the storage controller that senses the loss of power and copies the entire content of the cache memory including portions used by each of the multiple virtual machines to a nonvolatile persistent storage that does not rely on the battery capacity of the storage system. In another exemplary embodiment, the additional process calls a plug-in procedure associated with each of the virtual machines to permit the virtual machine to assure that the content of its portion of the cache memory is consistent before the additional process copies the cache memory to nonvolatile memory. The additional process may be integrated with the hypervisor or may be operable as a separate process in yet another virtual machine. | 08-18-2011 |
20110202729 | EXECUTING ATOMIC STORE DISJOINT INSTRUCTIONS - A disjoint instruction for accessing operands in memory while executing in a processor of a plurality of processors interrogates a state indicator settable by other processors to determine if the disjoint instruction accessed the operands without an intervening store operation from another processor to the operand. A condition code is set based on the state indicator. | 08-18-2011 |
20110202730 | INFORMATION PROCESSING APPARATUS, INFORMATION PROCESSING METHOD, AND COMPUTER-READABLE RECORDING MEDIUM - An information processing apparatus according to the present invention is arranged in a client terminal connected to a server storing data via a network, wherein the information processing apparatus receives requests from one or a plurality of applications in the client terminal and controls transmission and reception of information to/from the server. The information processing apparatus includes an authentication information storage unit for storing authentication information of a user for accessing the server, and a request transmission unit for attaching the authentication information of the user of the client terminal to a request based on the request given by the application of the client terminal, and transmits the request to the server. | 08-18-2011 |
20110202731 | CACHE WITHIN A CACHE - In a cache memory, energy and other efficiencies can be realized by saving a result of a cache directory lookup for sequential accesses to a same memory address. Where the cache is a point of coherence for speculative execution in a multiprocessor system, with directory lookups serving as the point of conflict detection, such saving becomes particularly advantageous. | 08-18-2011 |
20110208920 | FACILITATING SERVER RESPONSE OPTIMIZATION - A configuration of cached information stored within a cache is determined. One or more character omission rules are determined by: identifying the one or more optimizable characters based on the configuration, where the one or more optimizable characters are characters in the stored cached information that do not have an effect on an interpretation of the stored cached information by a requester computer; and determining, based on the configuration, one or more conditions under which omission of the one or more optimizable characters from the stored cached information produces a valid result in view of the configuration. One or more character omission rules are applied to the stored cached information by removing from the stored cached information the one or more optimizable characters that meet the one or more conditions. | 08-25-2011 |
20110225372 | CONCURRENT, COHERENT CACHE ACCESS FOR MULTIPLE THREADS IN A MULTI-CORE, MULTI-THREAD NETWORK PROCESSOR - Described embodiments provide a packet classifier of a network processor having a plurality of processing modules. A scheduler generates a thread of contexts for each task generated by the network processor corresponding to each received packet. The thread corresponds to an order of instructions applied to the corresponding packet. A multi-thread instruction engine processes the threads of instructions. A state engine operates on instructions received from the multi-thread instruction engine, the instruction including a cache access request to a local cache of the state engine. A cache line entry manager of the state engine translates between a logical index value of data corresponding to the cache access request and a physical address of data stored in the local cache. The cache line entry manager manages data coherency of the local cache and allows one or more concurrent cache access requests to a given cache data line for non-overlapping data units. | 09-15-2011 |
20110231614 | ACCELERATING MEMORY OPERATIONS USING VIRTUALIZATION INFORMATION - A method of accelerating memory operations using virtualization information includes executing a hypervisor on hardware resources of a computing system. A plurality of domains are created under the control of the hypervisor. Each domain is allocated memory resources that include accessible memory space that is exclusively accessible by that domain. Each domain is allocated one or more processor resources. The hypervisor identifies domain layout information that includes a boundary of accessible memory space of each domain. The hypervisor provides the domain layout information to each processor resource. Each processor resource is configured to implement, on a per domain basis, a restricted coherency protocol based on the domain layout information. The restricted coherency protocol bypasses, relative to the domain, downstream caches when a cache line falls within the accessible memory space of that domain. | 09-22-2011 |
20110238925 | CACHE CONTROLLER AND METHOD OF OPERATION - In one embodiment, there are described a sectored cache system and method of operation. A cache data block comprises separately updatable cache sectors. A common tag block contains metadata for the cache sectors of the data block and is writable as a whole. A pending allocation table (PAT) contains data representing pending writes to the tag block. When writing changed data to the tag block, the changed data is broadcast to the PAT to update data representing other pending writes to the tag block so that when the other pending writes are written to the tag block changed data from received broadcasts is included. | 09-29-2011 |
20110238926 | Method And Apparatus For Supporting Scalable Coherence On Many-Core Products Through Restricted Exposure - In one embodiment, a multi-core processor having cores each associated with a cache memory, can operate such that when a first core is to access data owned by a second core present in a cache line associated with the second core, responsive to a request from the first core, cache coherency state information associated with the cache line is not updated. A coherence engine associated with the processor may receive the data access request and determine that the data is of a memory page owned by the first core and convert the data access request to a non-cache coherent request. Other embodiments are described and claimed. | 09-29-2011 |
20110252202 | SYSTEM AND METHOD FOR PROVIDING L2 CACHE CONFLICT AVOIDANCE - A system provides a cache memory coherency mechanism within a multi-processor computing system utilizing a shared memory space across the multiple processors. The system possesses a store address list for storing cache line addresses corresponding to a cache line write request issued by one of the multiple processors, a fetch address list for storing cache line addresses corresponding to a cache line fetch request issued by one of the multiple processors, a priority and pipeline module, a request tracker module and a read/write address list. The store address list and the fetch address list are queues whose entries result in cache lookup requests being performed by the priority and pipeline module; and each entry in the store address list and the fetch address list possesses status bits which indicate the state of the request. | 10-13-2011 |
20110258394 | MERGING DATA IN AN L2 CACHE MEMORY - A method for merging data including receiving a request from an input/output device to merge a data, wherein a merge of the data includes a manipulation of the data, determining whether the data exists in a local cache memory that is in local communication with the input/output device, fetching the data to the local cache memory from a remote cache memory or a main memory if the data does not exist in the local cache memory, merging the data according to the request to obtain a merged data, and storing the merged data in the local cache, wherein the merging of the data is performed without using a memory controller within a control flow or a data flow of the merging of the data. | 10-20-2011 |
20110264865 | TECHNIQUES FOR DIRECTORY SERVER INTEGRATION - Techniques for directory server integration are disclosed. In one particular exemplary embodiment, the techniques may be realized as a method for directory server integration comprising setting one or more parameters determining a range of permissible expiration times for a plurality of cached directory entries, creating, in electronic storage, a cached directory entry from a directory server, assigning a creation time to the cached directory entry, and assigning at least one random value to the cached directory entry, the random value determining an expiration time for the cached directory entry within the range of permissible expiration times, wherein randomizing the expiration time for the cached directory entry among the range of permissible expiration times for a plurality of cached directory entries reduces an amount of synchronization required between cache memory and the directory server at a point in time. | 10-27-2011 |
20110302374 | LOCAL AND GLOBAL MEMORY REQUEST PREDICTOR - A method, circuit arrangement, and design structure utilize broadcast prediction data to determine whether to globally broadcast a memory request in a computing system of the type that includes a plurality of nodes, each node including a plurality of processing units. The method includes updating broadcast prediction data for a cache line associated with a first memory request within a hardware-based broadcast prediction data structure in turn associated with a first processing unit in response to the first memory request, the broadcast prediction data for the cache line including data associated with a history of ownership of the cache line. The method further comprises accessing the broadcast prediction data structure and determining whether to perform an early broadcast of a second memory request to a second node based on broadcast prediction data within the broadcast prediction data structure in response to that second memory request associated with the cache line. | 12-08-2011 |
20110314228 | Maintaining Cache Coherence In A Multi-Node, Symmetric Multiprocessing Computer - Maintaining cache coherence in a multi-node, symmetric multiprocessing computer, the computer composed of a plurality of compute nodes, including, broadcasting upon a cache miss by a first compute node a request for a cache line; transmitting from each of the other compute nodes to all other nodes the state of the cache line on that node, including transmitting from any compute node having a correct copy to the first node the correct copy of the cache line; and updating by each node the state of the cache line in each node, in dependence upon one or more of the states of the cache line in all the nodes. | 12-22-2011 |
20110320737 | Main Memory Operations In A Symmetric Multiprocessing Computer - Main memory operation in a symmetric multiprocessing computer, the computer comprising one or more processors operatively coupled through a cache controller to at least one cache of main memory, the main memory shared among the processors, the computer further comprising input/output (‘I/O’) resources, including receiving, in the cache controller from an issuing resource, a memory instruction for a memory address, the memory instruction requiring writing data to main memory; locking by the cache controller the memory address against further memory operations for the memory address; advising the issuing resource of completion of the memory instruction before the memory instruction completes in main memory; issuing by the cache controller the memory instruction to main memory; and unlocking the memory address only after completion of the memory instruction in main memory. | 12-29-2011 |
20110320738 | Maintaining Cache Coherence In A Multi-Node, Symmetric Multiprocessing Computer - Maintaining cache coherence in a multi-node, symmetric multiprocessing computer, the computer composed of a plurality of compute nodes, including, broadcasting upon a cache miss by the first compute node to other compute nodes a request for the cache line; if at least two of the compute nodes have a correct copy of the cache line, selecting which compute node is to transmit the correct copy of the cache line to the first node, and transmitting from the selected compute node to the first node the correct copy of the cache line; and updating by each node the state of the cache line in each node, in dependence upon one or more of the states of the cache line in all the nodes. | 12-29-2011 |
20110320739 | DISCOVERY OF NETWORK SERVICES - Discovery of network services consumable by a client executing on a first device. A request is received from the client for a list of services. There is a determination of whether a second device on the network which maintains a current list of services can or can not be located. Responsive to a determination that the second device can not be located, a local cached copy of a list of services is returned to the client. Responsive to a determination that the second device can be located, a request for the current list of services is sent to the second device, and a response containing the current list of services is received from the second device. The current list of services is returned to the client. | 12-29-2011 |
20120005432 | Reducing Cache Probe Traffic Resulting From False Data Sharing - Disclosed herein are a processing unit and a multi-processing unit system that implement a cache-coherency method. Such a multi-processing unit system includes a main memory, a first processing unit, and a second processing unit. The first processing unit and the second processing unit are coupled to the main memory. The first processing unit includes a cache and logic. The cache is configured to store data from the main memory. The logic is configured to maintain an entry in a directory of the cache. The entry indicates whether either of the first processing unit and the second processing unit accesses a data object of a cache line for which the first processing unit is a home node. | 01-05-2012 |
20120005433 | RESPONSE HEADER INVALIDATION - Systems, methods, and other embodiments associated with content invalidation are described. One example method includes providing an invalidation directive in a header of a response. | 01-05-2012 |
20120042132 | STORAGE SYSTEM WITH MIDDLE-WAY LOGICAL VOLUME - A storage system is disclosed including storage devices configured to store data, and a logical storage volume coupled to the storage devices and configured to store a subset of the data as segments. The storage system also includes a controller including a cache and memory. The memory is configured to include records such that each record corresponds with a segment in the logical storage volume and each record includes information regarding data stored in the corresponding segment. The controller is configured to access the records in response to a cache miss of the cache to determine if requested data from the cache miss is stored and ready for access in the logical storage volume. The controller is also configured to update the subset of data stored in the segments as a function of cache misses. | 02-16-2012 |
20120079207 | MASS STORAGE SYSTEM AND METHOD OF OPERATING THEREOF - There is provided a mass storage system and a method of operating thereof. The method comprises: a) generating one or more consistency checkpoints; b) associating each generated consistency checkpoint with a global number of snapshots generated in the storage system corresponding to time of generation of respective checkpoint; c) upon generating, placing each consistency checkpoint at the beginning of a sequence of dirty data portions which are handled in a cache memory with the help of a replacement technique with an access-based promotion; d) enabling within the sequence of dirty data portions an invariable order of consistency checkpoints and dirty data portions corresponding to volumes with generated snapshots; and e) responsive to destaging a certain consistency checkpoint, recording associated with the certain checkpoint global number of generated snapshots to a predefined storage location configured to be read during a recovery of the storage system. The invariable order can be provided by ceasing access-related promotion of all dirty data portions corresponding to all volumes with generated snapshots. | 03-29-2012 |
20120079208 | PROBE SPECULATIVE ADDRESS FILE - An apparatus to resolve cache coherency is presented. In one embodiment, the apparatus includes a microprocessor comprising one or more processing cores. The apparatus also includes a probe speculative address file unit, coupled to a cache memory, comprising a plurality of entries. Each entry includes a timer and a tag associated with a memory line. The apparatus further includes control logic to determine whether to service an incoming probe based at least in part on a timer value. | 03-29-2012 |
20120079209 | METHOD AND APPARATUS FOR IMPLEMENTING MULTI-PROCESSOR MEMORY COHERENCY - A method and an apparatus for implementing multi-processor memory coherency are disclosed. The method includes: a Level-2 (L2) cache of a first cluster receives a control signal of the first cluster for reading first data; the L2 cache of the first cluster reads the first data in a Level-1 (L1) cache of a second cluster through an Accelerator Coherency Port (ACP) of the L1 cache of the second cluster if the first data is currently maintained by the second cluster, where the L2 cache of the first cluster is connected to the ACP of the L1 cache of the second cluster; and the L2 cache of the first cluster provides the first data read to the first cluster for processing. The technical solution under the present invention implements memory coherency between clusters in the ARM Cortex-A9 architecture. | 03-29-2012 |
20120089785 | APPARATUS AND METHOD FOR DETECTING FALSE SHARING - A false sharing detecting apparatus for analyzing a multi-thread application, the false sharing detecting apparatus includes an operation set detecting unit configured to detect an operation set having a chance of causing performance degradation due to false sharing, and a probability calculation unit configured to calculate a first probability defined as a probability that the detected operation set is to be executed according to an execution pattern causing performance degradation due to false sharing, and calculate a second probability based on the calculated first probability. The second probability is defined as a probability that performance degradation due to false sharing occurs with respect to an operation included in the detected operation set. | 04-12-2012 |
20120089786 | DISTRIBUTED CACHE COHERENCY PROTOCOL - Systems, methods, and other embodiments associated with a distributed cache coherency protocol are described. According to one embodiment, a method includes receiving a request from a requester for access to one or more memory blocks in a block storage device that is shared by at least two physical computing machines and determining if a caching right to any of the one or more memory blocks has been granted to a different requester. If the caching right has not been granted to the different requester, access is granted to the one or more memory blocks to the requester. | 04-12-2012 |
20120089787 | TRANSACTION PROCESSING MULTIPLE PROTOCOL ENGINES IN SYSTEMS HAVING MULTIPLE MULTI-PROCESSOR CLUSTERS - A multi-processor computer system is described in which transaction processing in each cluster of processors is distributed among multiple protocol engines. Each cluster includes a plurality of local nodes and an interconnection controller interconnected by a local point-to-point architecture. The interconnection controller in each cluster comprises a plurality of protocol engines for processing transactions. Transactions are distributed among the protocol engines using destination information associated with the transactions. | 04-12-2012 |
20120096228 | System and Method for the Synchronization of a File in a Cache - The present invention provides a system and method for bi-directional synchronization of a cache. One embodiment of the system of this invention includes a software program stored on a computer readable medium. The software program can be executed by a computer processor to receive a database asset from a database; store the database asset as a cached file in a cache; determine if the cached file has been modified; and if the cached file has been modified, communicate the cached file directly to the database. The software program can poll a cached file to determine if the cached file has changed. Thus, bi-directional synchronization can occur. | 04-19-2012 |
20120102272 | EFFICIENT FILE MANAGEMENT THROUGH GRANULAR OPPORTUNISTIC LOCKING - Improved methods and systems for granular opportunistic locking mechanisms (oplocks) are provided for increasing file caching efficiency. Oplocks can be specified with a combination of three possible granular caching intentions: read, write, and/or handle. An oplock can be specified with an identifier that indicates a client/specific caller to avoid breaking the original oplock due to an incompatibility from other requests of the same client. An atomic oplock flag is added to create operations that allow callers to request an atomic open with an oplock with a given file. | 04-26-2012 |
20120117331 | NOTIFICATION PROTOCOL BASED ENDPOINT CACHING OF HOST MEMORY - An endpoint device ( | 05-10-2012 |
20120124297 | COHERENCE DOMAIN SUPPORT FOR MULTI-TENANT ENVIRONMENT - A method includes bypassing a global coherence operation that maintains global memory coherence between a plurality of local memories associated with a plurality of corresponding processors. The bypassing is in response to an address of a memory request being associated with a local memory coherence domain. The method includes accessing a memory location associated with the local memory coherence domain according to the memory request in response to the address being associated with the local memory coherence domain. | 05-17-2012 |
20120124298 | LOCAL SYNCHRONIZATION IN A MEMORY HIERARCHY - A method, system, and computer usable program product for local synchronization in a memory hierarchy in a multi-core data processing system are provided in the illustrative embodiments. A request to acquire a reservation for a reservation granule is received at a first core. The reservation is acquired in a first local cache associated with the first core in response to a cache line including the reservation granule being present and writable in the first local cache. A conditional store request to store at the reservation granule is received at the first core. A determination is made whether the reservation remains held at the first local cache. The store operation is performed at the first local cache responsive to the reservation remaining held at the first local cache. | 05-17-2012 |
20120131282 | Providing A Directory Cache For Peripheral Devices - In one embodiment, the present invention includes a processor having at least one core and uncore logic. The uncore logic can include a home agent to act as a guard to control access to a memory region. Either in the home agent or another portion of the uncore logic, a directory cache may be provided to store ownership information for a portion of the memory region owned by an agent coupled to the processor. In this way, when an access request for the memory region misses in the directory cache, a memory transaction can be avoided. Other embodiments are described and claimed. | 05-24-2012 |
20120137079 | CACHE COHERENCY CONTROL METHOD, SYSTEM, AND PROGRAM - In a system for controlling cache coherency of a multiprocessor system in which a plurality of processors share a system memory, each of the plurality of processors including a cache and a TLB, the processor includes a TLB controller including a TLB search unit that performs a TLB search and a coherency handler that performs TLB registration information processing when no hit occurs in the TLB search and a TLB interrupt occurs. The coherency handler includes a TLB replacement handler that searches a page table in the system memory and that replaces the TLB registration information, a TLB miss exception handling unit, and a storage exception handling unit. | 05-31-2012 |
20120144126 | APPARATUS, METHOD, AND SYSTEM FOR INSTANTANEOUS CACHE STATE RECOVERY FROM SPECULATIVE ABORT/COMMIT - An apparatus and method are described herein for providing instantaneous, efficient cache state recovery upon an end of speculative execution. Speculatively accessed entries of a cache memory are marked as speculative, which may be on a thread specific basis. Upon an end of speculation, the speculatively marked entries are transitioned in parallel by a speculative port to their appropriate, thread specific, non-speculative coherency state; these parallel transitions allow for instantaneous commit or recovery of speculative memory state. | 06-07-2012 |
20120159080 | NEIGHBOR CACHE DIRECTORY - A method and apparatus for utilizing a higher-level cache as a neighbor cache directory in a multi-processor system are provided. In the method and apparatus, when the data field of a portion or all of the cache is unused, a remaining portion of the cache is repurposed for usage as a neighbor cache directory. The neighbor cache provides a pointer to another cache in the multi-processor system storing memory data. The neighbor cache directory can be searched in the same manner as a data cache. | 06-21-2012 |
20120159081 | DEDUPLICATION-AWARE PAGE CACHE - An access request that includes a combination of a file identifier and an offset value is received. If the page cache does not contain the page indexed by the combination, then the file system is accessed and the offset value is mapped to a disk location. The file system can access a block map to identify the location. A table (e.g., a shared location table) that includes entries (e.g., locations) for pages that are shared by multiple files is accessed. If the aforementioned disk location is in the table, then the requested page is in the page cache and it is not necessary to add the page to the page cache. Otherwise, the page is added to the page cache. | 06-21-2012 |
20120159082 | Direct Access To Cache Memory - Methods and apparatuses are disclosed for direct access to cache memory. Embodiments include receiving, by a direct access manager that is coupled to a cache controller for a cache memory, a region scope zero command describing a region scope zero operation to be performed on the cache memory; in response to receiving the region scope zero command, generating a direct memory access region scope zero command, the direct memory access region scope zero command having an operation code and an identification of the physical addresses of the cache memory on which the operation is to be performed; sending the direct memory access region scope zero command to the cache controller for the cache memory; and performing, by the cache controller, the direct memory access region scope zero operation in dependence upon the operation code and the identification of the physical addresses of the cache memory. | 06-21-2012 |
20120159083 | Systems and Methods for Processing Memory Transactions - Systems and methods for performing memory transactions are described. In an embodiment, a system comprises a processor configured to perform an action in response to a transaction indicative of a request originated by a hardware subsystem. A logic circuit is configured to receive the transaction. In response to identifying a specific characteristic of the transaction, the logic circuit splits the transaction into two or more other transactions. The two or more other transactions enable the processor to satisfy the request without performing the action. The system also includes an interface circuit configured to receive the request originated by the hardware subsystem and provide the transaction to the logic circuit. In some embodiments, a system may be implemented as a system-on-a-chip (SoC). Devices suitable for using these systems include, for example, desktop and laptop computers, tablets, network appliances, mobile phones, personal digital assistants, e-book readers, televisions, and game consoles. | 06-21-2012 |
20120159084 | METHOD AND APPARATUS FOR REDUCING LIVELOCK IN A SHARED MEMORY SYSTEM - A method is provided for identifying a first portion of a computer program for speculative execution by a first processor element. At least one memory object is declared as being protected during the speculative execution. Thereafter, if a first signal is received indicating that the at least one protected memory object is to be accessed by a second processor element, then delivery of the first signal is delayed for a preselected duration of time to potentially allow the speculative execution to complete. The speculative execution of the first portion of the computer program may be aborted in response to receiving the delayed first signal before the speculative execution of the first portion of the computer program has been completed. | 06-21-2012 |
20120173823 | APPLICATION CACHE PROFILER - In an embodiment of the invention, a method for data profiling incorporating an enterprise service bus (ESB) coupling the target and source systems following an extraction, transformation, and loading (ETL) process for a target system and a source system is provided. The method includes receiving baseline data profiling results obtained during ETL from a source application to a target application, caching the updates, determining current data profiling results within the ESB for cached updates, and triggering an action if a threshold disparity is detected between the current data profiling results and the baseline data profiling results. | 07-05-2012 |
20120179876 | Cache-Based Speculation of Stores Following Synchronizing Operations - A method of processing store requests in a data processing system includes enqueuing a store request in a store queue of a cache memory of the data processing system. The store request identifies a target memory block by a target address and specifies store data. While the store request and a barrier request older than the store request are enqueued in the store queue, a read-claim machine of the cache memory is dispatched to acquire coherence ownership of the target memory block of the store request. After coherence ownership of the target memory block is acquired and the barrier request has been retired from the store queue, a cache array of the cache memory is updated with the store data. | 07-12-2012 |
20120179877 | MECHANISM TO SUPPORT FLEXIBLE DECOUPLED TRANSACTIONAL MEMORY - The present invention employs three decoupled hardware mechanisms: read and write signatures, which summarize per-thread access sets; per-thread conflict summary tables, which identify the threads with which conflicts have occurred; and a lazy versioning mechanism, which maintains the speculative updates in the local cache and employs a thread-private buffer (in virtual memory) only in the rare event of an overflow. The conflict summary tables allow lazy conflict management to occur locally, with no global arbitration (they also support eager management). All three mechanisms are kept software-accessible, to enable virtualization and to support transactions of arbitrary length. | 07-12-2012 |
20120210072 | CACHE-BASED SPECULATION OF STORES FOLLOWING SYNCHRONIZING OPERATIONS - A method of processing store requests in a data processing system includes enqueuing a store request in a store queue of a cache memory of the data processing system. The store request identifies a target memory block by a target address and specifies store data. While the store request and a barrier request older than the store request are enqueued in the store queue, a read-claim machine of the cache memory is dispatched to acquire coherence ownership of the target memory block of the store request. After coherence ownership of the target memory block is acquired and the barrier request has been retired from the store queue, a cache array of the cache memory is updated with the store data. | 08-16-2012 |
20120233410 | Shared-Variable-Based (SVB) Synchronization Approach for Multi-Core Simulation - The present invention discloses a shared-variable-based (SVB) approach for fast and accurate multi-core cache coherence simulation. While the intuitive, conventional approach, synchronizing at either every cycle or every memory access, gives accurate simulation results, it has poor performance due to huge simulation overhead. In the proposed shared-variable-based approach, timing synchronization is needed only before shared-variable accesses, which maintains accuracy while improving efficiency. | 09-13-2012 |
20120284463 | PREDICTING CACHE MISSES USING DATA ACCESS BEHAVIOR AND INSTRUCTION ADDRESS - In a decode stage of a hardware processor pipeline, one particular instruction of a plurality of instructions is decoded. It is determined that the particular instruction requires a memory access. Responsive to such determination, it is predicted whether the memory access will result in a cache miss. The predicting in turn includes accessing one of a plurality of entries in a pattern history table stored as a hardware table in the decode stage. The accessing is based, at least in part, upon at least a most recent entry in a global history buffer. The pattern history table stores a plurality of predictions. The global history buffer stores actual results of previous memory accesses as one of cache hits and cache misses. Additional steps include scheduling at least one additional one of the plurality of instructions in accordance with the predicting; and updating the pattern history table and the global history buffer subsequent to actual execution of the particular instruction in an execution stage of the hardware processor pipeline, to reflect whether the predicting was accurate. | 11-08-2012 |
20120297146 | FACILITATING DATA COHERENCY USING IN-MEMORY TAG BITS AND TAG TEST INSTRUCTIONS - A method is provided for fine-grained detection of data modification of original data by associating separate guard bits with granules of memory storing original data from which translated data has been obtained. The guard bits indicate whether the original data stored in the associated granule is protected for data coherency. The guard bits are set and cleared by special-purpose instructions. Responsive to attempting access to translated data obtained from the original data, the guard bit(s) associated with the original data is checked to determine whether the guard bit(s) fail to indicate coherency of the original data, and if so, discarding of the translated data is initiated to facilitate maintaining data coherency between the original data and the translated data. | 11-22-2012 |
20120311271 | Read Cache Device and Methods Thereof for Accelerating Access to Data in a Storage Area Network - A read cache device for accelerating execution of read commands in a storage area network (SAN) in a data path between frontend servers and a backend storage. The device includes a cache memory unit for maintaining portions of data that reside in the backend storage and are mapped to at least one accelerated virtual volume; a cache management unit for maintaining data consistency between the cache memory unit and the at least one accelerated virtual volume; a descriptor memory unit for maintaining a plurality of descriptors; and a processor for receiving each command and each command response that travels in the data path, and for serving each received read command directed to the at least one accelerated virtual volume by returning requested data stored in the cache memory unit and writing data to the cache memory unit according to a caching policy. | 12-06-2012 |
20120317365 | SYSTEM AND METHOD TO BUFFER DATA - A data storage device includes a controller, a non-volatile memory, and a buffer accessible to the controller. The buffer is configured to store data retrieved from the non-volatile memory to be accessible to a host device in response to receiving from the host device one or more requests for read access to the non-volatile memory while the data storage device is operatively coupled to the host device. The controller is configured to read an indicator of cached data in response to receiving a request for read access to the non-volatile memory. The request includes a data identifier. In response to the indicator of cached data not indicating that data corresponding to the data identifier is in the buffer, the controller is configured to retrieve data corresponding to the data identifier as well as additional data from the non-volatile memory and to write the data corresponding to the data identifier and the additional data to the buffer. The controller is configured to update the indicator of cached data in response to retrieved data from the non-volatile memory being written to the buffer. | 12-13-2012 |
20120317366 | COMPUTER SYSTEM MANAGEMENT APPARATUS AND MANAGEMENT METHOD - The present invention measures an actual utilization frequency of data and controls a location of this data in a storage apparatus in a case where a host computer makes joint use of a storage apparatus and a cache apparatus. A portion of data used by an application program | 12-13-2012 |
20120324173 | EFFICIENT DISCARD SCANS - Exemplary method, system, and computer program product embodiments for performing a discard scan operation are provided. In one embodiment, by way of example only, a plurality of tracks is examined for meeting criteria for a discard scan. In lieu of waiting for a completion of a track access operation, at least one of the plurality of tracks is marked for demotion. An additional discard scan may be subsequently performed for tracks not previously demoted. The discard and additional discard scans may proceed in two phases. Additional system and computer program product embodiments are disclosed and provide related advantages. | 12-20-2012 |
20130007375 | DEVICE AND METHOD FOR EXCHANGING DATA BETWEEN MEMORY CONTROLLERS - A device having a plurality of memory controllers and an interconnect for connecting the plurality of memory controllers. Each memory controller of the plurality of memory controllers is coupled to an allocated memory for storing data. Further, each memory controller of the plurality of memory controllers has one accelerator of a plurality of accelerators for mutually exchanging data over the interconnect. | 01-03-2013 |
20130031314 | Support for Multiple Coherence Domains - A number of coherence domains are maintained among the multitude of processing cores disposed in a microprocessor. A cache coherency manager defines the coherency relationships such that coherence traffic flows only among the processing cores that are defined as having a coherency relationship. The data defining the coherency relationships between the processing cores is optionally stored in a programmable register. For each source of a coherent request, the processing core targets of the request are identified in the programmable register. In response to a coherent request, an intervention message is forwarded only to the cores that are defined to be in the same coherence domain as the requesting core. If a cache hit occurs in response to a coherent read request and the coherence state of the cache line resulting in the hit satisfies a condition, the requested data is made available to the requesting core from that cache line. | 01-31-2013 |
20130054900 | Method and apparatus for increasing capacity of cache directory in multi-processor systems - A method and an apparatus for increasing capacity of cache directory in multi-processor systems, the apparatus comprising a plurality of processor nodes and a plurality of cache memory nodes and a plurality of main memory nodes. | 02-28-2013 |
20130061003 | COHERENCE SWITCH FOR I/O TRAFFIC - A system, apparatus, and method for routing traffic in a SoC from I/O devices to memory. A coherence switch routes coherent traffic through a coherency port on a processor complex to a real-time port of a memory controller. The coherence switch routes non-coherent traffic to a non-real time port of the memory controller. The coherence switch can also dynamically switch traffic between the two paths. The routing of traffic can be configured via a configuration register, and while software can initiate an update to the configuration register, the actual coherence switch hardware will implement the update. Software can write to a software-writeable copy of the configuration register to initiate an update to the flow path to memory for a transaction identifier. The coherence switch detects the update to the software-writeable copy, and then the coherence switch updates the working copy of the configuration register and implements the new routing. | 03-07-2013 |
20130067171 | DATA STORAGE SYSTEM INCLUDING BACKUP MEMORY AND MANAGING METHOD THEREOF - The invention discloses a data storage system and managing method thereof. The data storage system according to the invention includes N storage devices, a backup memory and a controller, where N is a natural number. Each storage device has a respective write cache. If the data storage system suffers a power failure, the backup memory still retains the data stored therein. The controller receives data transmitted from an application I/O request unit, executes a predetermined operation on the received data to generate data to be written, transmits the data to be written to the write caches of the storage devices, duplicates the data to be written into the backup memory, and labels the duplicated data in the backup memory as being valid in response to a write confirmation message sent from the storage devices. | 03-14-2013 |
20130073811 | REGION PRIVATIZATION IN DIRECTORY-BASED CACHE COHERENCE - A system and method for region privatization in a directory-based cache coherence system are disclosed. The system and method include receiving a request from a requesting node for at least one block in a region, allocating a new entry for the region based on the request for the block, requesting that the memory controller send the data for the region to the requesting node, receiving a subsequent request for a block within the region, determining that any blocks of the region that are cached are also cached at the requesting node, and privatizing the region at the requesting node. | 03-21-2013 |
20130073812 | CACHE MEMORY DEVICE, PROCESSOR, AND INFORMATION PROCESSING APPARATUS - According to an embodiment, a cache memory device caches data stored in or data to be stored in a memory device. The cache memory device includes a memory area that includes a plurality of cache lines; and a controller. When the number of dirty lines among the cache lines exceeds a predetermined number, the controller writes data of the dirty lines into the memory device, each of the dirty lines containing data that is not written in the memory device. | 03-21-2013 |
20130103910 | CACHE MANAGEMENT FOR INCREASING PERFORMANCE OF HIGH-AVAILABILITY MULTI-CORE SYSTEMS - An apparatus and method for improving performance in high-availability systems are disclosed. In accordance with the illustrative embodiment, pages of memory of a primary system that are to be shadowed are initially copied to a backup system's memory, as well as to a cache in the primary system. A duplication manager process maintains the cache in an intelligent manner that significantly reduces the overhead required to keep the backup system in sync with the primary system, as well as the cache size needed to achieve a given level of performance. Advantageously, the duplication manager is executed on a different processor core than the application process executing transactions, further improving performance. | 04-25-2013 |
20130111148 | THREE CHANNEL CACHE-COHERENCY SOCKET PROTOCOL | 05-02-2013 |
20130111149 | INTEGRATED CIRCUITS WITH CACHE-COHERENCY | 05-02-2013 |
20130117511 | DATA PROCESSING APPARATUS AND METHOD - A data processing apparatus has a cache having a normal mode and a retention mode in which the cache consumes less power than in the normal mode. An interconnect receives, from at least one other device, coherency access requests for data stored in the cache. In the normal mode, the data in the cache is accessible and the cache generates coherency responses in response to the coherency access requests, while in the retention mode the data is retained in the cache but inaccessible in response to the coherency access requests. A coherency controller is provided to monitor the coherency access requests and coherency responses. Switching of the cache from the normal mode to the retention mode is deferred until the coherency controller has detected coherency responses for all coherency access requests passed to said cache. | 05-09-2013 |
20130117512 | PROGRAM CONVERTING APPARATUS, PROGRAM CONVERTING METHOD, AND MEDIUM - According to one embodiment, a program converting device includes an access attribute determining unit, a non-sharing target classifying unit, and a converting unit. The access attribute determining unit calculates exclusive accesses from memory accesses by threads forming a source program and determines a memory access using a cache memory among the calculated exclusive accesses. The non-sharing target classifying unit determines an access data item that does not share a cache line with another access data item among the access data items that are accessed using the cache memories. The converting unit inserts a process that does not share the cache line into the source program based on the determination result of the non-sharing target classifying unit. | 05-09-2013 |
20130138893 | STORAGE DEVICE, COMPUTER-READABLE RECORDING MEDIUM, AND STORAGE CONTROL METHOD - A storage device being one of a plurality of storage devices storing data includes a memory and a processor coupled to the memory. The processor executes a process including determining, when having received a new request and new priority information during a preparation for an execution of another update processing, whether a new priority indicated by the new priority information is higher than a priority of the update processing in the preparation. The process includes canceling the update processing in the preparation when having determined at the determining that the new priority is higher than the priority of the update processing in the preparation. The process includes forwarding the new request and the new priority information to another storage device when having determined at the determining that the new priority is higher than the priority of the update processing in the preparation. | 05-30-2013 |
20130145104 | METHOD AND APPARATUS FOR CONTROLLING CACHE REFILLS - A method and apparatus are provided for controlling a cache. The cache includes a plurality of storage locations, each having a priority associated therewith, and wherein the cache evicts data from one or more of the storage locations based on the priority associated therewith. The method comprises: storing historical information regarding data being evicted from the cache; retrieving data from a secondary memory in response to a miss in the cache; assigning a priority to the retrieved data based on the historical information; and storing the retrieved data in the cache with an indication of the assigned priority. | 06-06-2013 |
20130151788 | DYNAMIC PRIORITIZATION OF CACHE ACCESS - Some embodiments of the inventive subject matter are directed to a cache comprising a tracking unit and cache state machines. In some embodiments, the tracking unit is configured to track an amount of cache resources used to service cache misses within a past period. In some embodiments, each of the cache state machines is configured to determine whether a memory access request results in a cache miss or cache hit, and in response to a cache miss for a memory access request, query the tracking unit for the amount of cache resources used to service cache misses within the past period. In some embodiments, each of the cache state machines is configured to service the memory access request based, at least in part, on the amount of cache resources used to service the cache misses within the past period according to the tracking unit. | 06-13-2013 |
20130151789 | MANAGING A REGION CACHE - A method, system or computer usable program product for managing a cache region including receiving a new region to be stored within the cache, the cache including multiple regions defined by one or more ranges having a starting index and an ending index, and storing the new region in the cache in accordance with a cache invariant, the cache invariant ensuring that regions in the cache are not overlapping and that the regions are stored in a specified order. | 06-13-2013 |
20130159631 | TRANSACTIONAL-CONSISTENT CACHE FOR DATABASE OBJECTS - A system and method for providing a transactional-consistent cache for database objects is disclosed. New data is received by a cache manager. The cache manager updates an entry of a cache with the new data received by the cache manager, by registering the updating of the entry with the new data with an invalidator. The registering includes a timestamp. An invalidation event is then generated by the invalidator. The invalidation event includes a notification about the updating of the entry of the cache with the new data received by the cache manager according to the timestamp. | 06-20-2013 |
20130185519 | MANAGING GLOBAL CACHE COHERENCY IN A DISTRIBUTED SHARED CACHING FOR CLUSTERED FILE SYSTEMS - Systems, methods, and computer program products are provided for managing global cache coherency in a distributed shared caching for a clustered file system (CFS). The CFS manages access permissions to an entire space of data segments by using the DSM module. In response to receiving a request to access one of the data segments, a calculation operation is performed for obtaining most recent contents of one of the data segments. The calculation operation performs one of providing the most recent contents via communication with a remote DSM module which obtains the one of the data segments from an associated external cache memory, instructing by the DSM module to read from storage the one of the data segments, and determining that any existing contents of the one of the data segments in the local external cache are the most recent contents. | 07-18-2013 |
20130205096 | FORWARD PROGRESS MECHANISM FOR STORES IN THE PRESENCE OF LOAD CONTENTION IN A SYSTEM FAVORING LOADS BY STATE ALTERATION - A multiprocessor data processing system includes a plurality of cache memories including a cache memory. The cache memory issues a read-type operation for a target cache line. While waiting for receipt of the target cache line, the cache memory monitors to detect a competing store-type operation for the target cache line. In response to receiving the target cache line, the cache memory installs the target cache line in the cache memory, and sets a coherency state of the target cache line installed in the cache memory based on whether the competing store-type operation is detected. | 08-08-2013 |
20130212336 | Method and Apparatus for Memory Write Performance Optimization in Architectures with Out-of-Order Read/Request-for-Ownership Response - A block of data may be transferred to memory through a plurality of write operations, where each write operation is preceded by a protocol request and a protocol response. A plurality of protocol requests issued in a first order may elicit a corresponding plurality of protocol responses in a second order, and the write operations may be performed in yet a third order. Chipsets implementing the data write methods are also described and claimed. | 08-15-2013 |
20130254492 | ACCESS REQUESTS WITH CACHE INTENTIONS - A lease system is described herein that allows clients to request a lease to a remote file, wherein the lease permits access to the file across multiple applications using multiple handles without extra round trips to a server. When multiple applications on the same client (or multiple components of the same application) request access to the same file, the client specifies the same lease identifier to the server for each open request or may handle the request from the cache based on the existing lease. Because the server identifies the client's cache at the client level rather than the individual file request level, the client receives fewer break notifications and is able to cache remote files in more circumstances. Thus, by providing the ability to cache data in more circumstances common with modern applications, the lease system reduces bandwidth, improves server scalability, and provides faster access to data. | 09-26-2013 |
20130268735 | SUPPORT FOR SPECULATIVE OWNERSHIP WITHOUT DATA - Techniques are described for providing an enhanced cache coherency protocol for a multi-core processor that includes a Speculative Request For Ownership Without Data (SRFOWD) for a portion of cache memory. With a SRFOWD, only an acknowledgement message may be provided as an answer to a requesting core. The contents of the affected cache line are not required to be a part of the answer. The enhanced cache coherency protocol may assure that a valid copy of the current cache line exists in case of misspeculation by the requesting core. Thus, an owner of the current copy of the cache line may maintain a copy of the old contents of the cache line. The old contents of the cache line may be discarded if speculation by the requesting core turns out to be correct. Otherwise, in case of misspeculation by the requesting core, the old contents of the cache line may be set back to a valid state. | 10-10-2013 |
20130282987 | Write-Only Dataless State for Maintaining Cache Coherency - Systems and methods for maintaining cache coherency in a multiprocessor system with shared memory, including a write-data-invalid (WDI) state configured to reduce stalls during write operations. The WDI state is a dataless state with guaranteed write permissions. When a first processor of the multiprocessor system makes a write request for a first cache entry of a first cache, the WDI state associated with the first cache entry includes write permissions for the write to directly proceed to one or more higher levels of memory in the shared memory, such that delays associated with obtaining write permissions are reduced at the first cache. The WDI state is treated as an invalid state for a read request to the first cache entry by the first processor. | 10-24-2013 |
20130282988 | Method for Performing Cache Coherency in a Computer System - In a computing system, cache coherency is performed by selecting one of a plurality of coherency protocols for a first memory transaction. Each of the plurality of coherency protocols has a unique set of cache states that may be applied to cached data for the first memory transaction. Cache coherency is performed on appropriate caches in the computing system by applying the set of cache states of the selected one of the plurality of coherency protocols. | 10-24-2013 |
20130290642 | Managing nodes in a storage system - Each node in a clustered array is the owner of a set of zero logical disks (LDs). Thinly-provisioned VVs (TPVVs) are partitioned so each is mapped to a group of zero LDs from different sets of zero LDs. When there is a change in ownership, the affected zero LDs are switched one at a time so only a group of the TPVVs is affected each time. | 10-31-2013 |
20130297888 | SCHEDULING METHOD AND MULTI-CORE PROCESSOR SYSTEM - A scheduling method of a scheduler that manages threads is executed by a computer. The scheduling method includes selecting a CPU with a relatively low load when a second thread is generated from a first thread to be processed; determining whether the second thread operates exclusively from the first thread; copying a first storage area accessed by the first thread onto a second storage area managed by the CPU, when the second thread operates exclusively; calculating, based on an address of the second storage area and a predetermined value, an offset for a second address for the second thread to access the first storage area; and notifying the CPU of the offset for the second address to convert a first address to a third address for accessing the second storage area. | 11-07-2013 |
20130326152 | Rapid Recovery From Loss Of Storage Device Cache - Dirty data in a storage device is made current through rapid re-silvering, which uses a mirrored and up-to-date version of the dirty data from another storage device to recover the data. Because under rapid re-silvering cache metadata in volatile memory survives the failure of the cache, the cache metadata is used to determine which subset of data from the other storage device needs to be copied to the storage device being re-silvered. During re-silvering, cache metadata is used to determine which I/O requests from clients are requests for data that is not stale. | 12-05-2013 |
20130326153 | MULTI-THREADED TRANSACTIONAL MEMORY COHERENCE - The disclosure provides systems and methods for maintaining cache coherency in a multi-threaded processing environment. For each location in a data cache, a global state is maintained specifying the coherency of the cache location relative to other data caches and/or to a shared memory resource backing the data cache. For each cache location, thread state information associated with a plurality of threads is maintained. The thread state information is specified separately and in addition to the global state, and is used to individually control read and write permissions for each thread for the cache location. The thread state information is also used, for example by a cache controller, to control whether uncommitted transactions of threads relating to the cache location are to be rolled back. | 12-05-2013 |
20130326154 | CACHE SYSTEM OPTIMIZED FOR CACHE MISS DETECTION - According to an embodiment of the invention, cache management comprises maintaining a cache comprising a hash table including rows of data items in the cache, wherein each row in the hash table is associated with a hash value representing a logical block address (LBA) of each data item in that row. Searching for a target data item in the cache includes calculating a hash value representing an LBA of the target data item, and using the hash value to index into a counting Bloom filter that indicates that the target data item is either not in the cache, indicating a cache miss, or that the target data item may be in the cache. If a cache miss is not indicated, the hash value is used to select a row in the hash table, and a cache miss is indicated if the target data item is not found in the selected row. | 12-05-2013 |
20140006720 | DIRECTORY CACHE CONTROL DEVICE, DIRECTORY CACHE CONTROL CIRCUIT, AND DIRECTORY CACHE CONTROL METHOD | 01-02-2014 |
20140032853 | Method for Peer to Peer Cache Forwarding - A home node for selecting a source node using a cache coherency protocol, comprising a logic unit cluster coupled to a directory, wherein the logic unit cluster is configured to receive a request for data from a requesting cache node, determine a plurality of nodes that hold a copy of the requested data using the directory, select one of the nodes using one or more selection parameters as the source node, and transmit a message to the source node to determine whether the source node stores a copy of the requested data, wherein the source node forwards the requested data to the requesting cache node when the requested data is found within the source node, and wherein some of the nodes are marked as a Shared state corresponding to the cache coherency protocol. | 01-30-2014 |
20140032854 | Coherence Management Using a Coherent Domain Table - A computer program product comprising computer executable instructions stored on a non-transitory medium that when executed by a processor cause the processor to perform the following: assign a first, second, third, and fourth coherence domain address to a cache data, wherein the first and second addresses provide the boundary for a first coherence domain, and wherein the third and fourth addresses provide the boundary for a second coherence domain, inform a first resource about the first coherence domain prior to the first resource executing a first task, and inform a second resource about the second coherence domain prior to the second resource executing a second task. | 01-30-2014 |
20140040562 | USING BROADCAST-BASED TLB SHARING TO REDUCE ADDRESS-TRANSLATION LATENCY IN A SHARED-MEMORY SYSTEM WITH ELECTRICAL INTERCONNECT - The disclosed embodiments provide a system that uses broadcast-based TLB-sharing techniques to reduce address-translation latency in a shared-memory multiprocessor system with two or more nodes that are connected by an electrical interconnect. During operation, a first node receives a memory operation that includes a virtual address. Upon determining that one or more TLB levels of the first node will miss for the virtual address, the first node uses the electrical interconnect to broadcast a TLB request to one or more additional nodes of the shared-memory multiprocessor in parallel with scheduling a speculative page-table walk for the virtual address. If the first node receives a TLB entry from another node of the shared-memory multiprocessor via the electrical interconnect in response to the TLB request, the first node cancels the speculative page-table walk. Otherwise, if no response is received, the first node instead waits for the completion of the page-table walk. | 02-06-2014 |
20140040563 | SHARED VIRTUAL MEMORY MANAGEMENT APPARATUS FOR PROVIDING CACHE-COHERENCE - A shared virtual memory management apparatus for ensuring cache coherence. When two or more cores request write permission to the same virtual memory page, the shared virtual memory management apparatus allocates a physical memory page for the cores to change data in the allocated physical memory page. Thereafter, changed data is updated in an original physical memory page, and accordingly it is feasible to achieve data coherence in a multi-core hardware environment that does not provide cache coherence. | 02-06-2014 |
20140040564 | System, Method, and Computer Program Product for Conditionally Sending a Request for Data to a Node Based on a Determination - A system, method, and computer program product are provided for conditionally sending a request for data to a node based on a determination. In operation, a first request for data is sent to a cache of a first node. Additionally, it is determined whether the first request can be satisfied within the first node, where the determining includes at least one of determining a type of the first request and determining a state of the data in the cache. Furthermore, a second request for the data is conditionally sent to a second node, based on the determination. | 02-06-2014 |
20140052929 | PROGRAMMABLE RESOURCES TO TRACK MULTIPLE BUSES - A system and method for efficiently monitoring traces of multiple components in an embedded system. A system-on-a-chip (SOC) includes a trace unit for collecting and storing trace history, bus event statistics, or both. The SOC may transfer cache coherent messages across multiple buses between a shared memory and a cache coherent controller. The trace unit includes multiple bus event filters. Programmable configuration registers are used to assign the bus event filters to selected buses for monitoring associated bus traffic and determining whether qualified bus events occur. If so, the bus event filters increment an associated count for each of the qualified bus events. The values used for determining qualified bus events may be set by programmable configuration registers. | 02-20-2014 |
20140052930 | EFFICIENT TRACE CAPTURE BUFFER MANAGEMENT - A system and method for efficiently storing traces of multiple components in an embedded system. A system-on-a-chip (SOC) includes a trace unit for collecting and storing trace history, bus event statistics, or both. The SOC may transfer cache coherent messages across multiple buses between a shared memory and a cache coherent controller. The trace unit includes a trace buffer with multiple physical partitions assigned to subsets of the multiple buses. The number of partitions is less than the number of multiple buses. One or more trace instructions may cause a trace history, trace bus event statistics, local time stamps and a global time-base value to be stored in a physical partition within the trace buffer. | 02-20-2014 |
20140068198 | STATISTICAL CACHE PROMOTION - Storing data in a cache is disclosed. It is determined that a data record is not stored in a cache. A random value is generated using a threshold value. It is determined whether to store the data record in the cache based at least in part on the generated random value. | 03-06-2014 |
20140082300 | APPARATUS AND METHOD FOR MAINTAINING CACHE COHERENCY, AND MULTIPROCESSOR APPARATUS USING THE METHOD - Provided are an apparatus and method for maintaining cache coherency, and a multiprocessor apparatus using the method. The multiprocessor apparatus includes a main memory, a plurality of processors, a plurality of cache memories that are connected to each of the plurality of processors, a memory bus that is connected to the plurality of cache memories and the main memory, and a coherency bus that is connected to the plurality of cache memories to transmit coherency related information between caches. Accordingly, a bandwidth shortage phenomenon may be reduced in an on-chip communication structure, which occurs when using a communication structure between a memory and a cache, and communication for coherency between caches may be simplified. | 03-20-2014 |
20140089600 | SYSTEM CACHE WITH DATA PENDING STATE - Methods and apparatuses for utilizing a data pending state for cache misses in a system cache. To reduce the size of a miss queue that is searched by subsequent misses, a cache line storage location is allocated in the system cache for a miss and the state of the cache line storage location is set to data pending. A subsequent request that hits to the cache line storage location will detect the data pending state and as a result, the subsequent request will be sent to a replay buffer. When the fill for the original miss comes back from external memory, the state of the cache line storage location is updated to a clean state. Then, the request stored in the replay buffer is reactivated and allowed to complete its access to the cache line storage location. | 03-27-2014 |
20140089601 | MANAGING A REGION CACHE - A method for managing a cache region including receiving a new region to be stored within the cache, the cache including multiple regions defined by one or more ranges having a starting index and an ending index, and storing the new region in the cache in accordance with a cache invariant, the cache invariant ensuring that regions in the cache are not overlapping and that the regions are stored in a specified order. | 03-27-2014 |
20140115266 | OPTIONAL ACKNOWLEDGEMENT FOR OUT-OF-ORDER COHERENCE TRANSACTION COMPLETION - To enable efficient tracking of transactions, an acknowledgement expected signal is used to give the cache coherent interconnect a hint for whether a transaction requires coherent ownership tracking. This signal informs the cache coherent interconnect to expect an ownership transfer acknowledgement signal from the initiating master upon read/write transfer completion. The cache coherent interconnect can therefore continue tracking the transaction at its point of coherency until it receives the acknowledgement from the initiating master only when necessary. | 04-24-2014 |
20140115267 | Hazard Detection and Elimination for Coherent Endpoint Allowing Out-of-Order Execution - A coherence maintenance address queue tracks each memory access from receipt until the memory reports the access complete. The address of each new access is compared against the address of all entries in the queue. This check is made when the access is ready to transmit to the memory. If there is no address match, then the current access does not conflict with any pending access. If there is an address match, the current access is stalled. The multi-core shared memory controller would then typically proceed to another access waiting for a slot to the endpoint memory. Stored addresses in the coherence maintenance address queue are retired when the endpoint memory reports completion of the operation. At this point the access is no longer a hazard to following operations. | 04-24-2014 |
20140122809 | CONTROL MECHANISM FOR FINE-TUNED CACHE TO BACKING-STORE SYNCHRONIZATION - One embodiment of the present invention sets forth a technique for processing commands received by an intermediary cache from one or more clients. The technique involves receiving a first write command from an arbiter unit, where the first write command specifies a first memory address, determining that a first cache line related to a set of cache lines included in the intermediary cache is associated with the first memory address, causing data associated with the first write command to be written into the first cache line, and marking the first cache line as dirty. The technique further involves determining whether a total number of cache lines marked as dirty in the set of cache lines is less than, equal to, or greater than a first threshold value, and: not transmitting a dirty data notification to the frame buffer logic when the total number is less than the threshold value, or transmitting a dirty data notification to the frame buffer logic when the total number is equal to or greater than the first threshold value. | 05-01-2014 |
20140129782 | Server Side Distributed Storage Caching - The invention provides a system with storage cache with high bandwidth and low latency to the server, and coherence for the contents of multiple memory caches, wherein locally managing a storage cache situated on a server is combined with a means for globally managing the coherency of storage caches of a number of servers. The local cache manager delivers very high performance and low latency for write transactions that hit the local cache in the Modified or Exclusive state and for read transactions that hit the local cache in the Modified, Exclusive or Shared states. The global coherency manager enables many servers connected via a network to share the contents of their local caches, providing application transparency by maintaining a directory with an entry for each storage block that indicates which servers have that block in the shared state or which server has that block in the modified state. | 05-08-2014 |
20140149681 | COHERENT PROXY FOR ATTACHED PROCESSOR - A coherent attached processor proxy (CAPP) of a primary coherent system receives a memory access request from an attached processor (AP) and an expected coherence state of a target address of the memory access request with respect to a cache memory of the AP. In response, the CAPP determines a coherence state of the target address and whether or not the expected state matches the determined coherence state. In response to determining that the expected state matches the determined coherence state, the CAPP issues a memory access request corresponding to that received from the AP on a system fabric of the primary coherent system. In response to determining that the expected state does not match the coherence state determined by the CAPP, the CAPP transmits a failure message to the AP without issuing on the system fabric a memory access request corresponding to that received from the AP. | 05-29-2014 |
20140149682 | PROGRAMMABLE COHERENT PROXY FOR ATTACHED PROCESSOR - A coherent attached processor proxy (CAPP) within a primary coherent system participates in an operation on a system fabric of the primary coherent system on behalf of an attached processor (AP) that is external to the primary coherent system and that is coupled to the CAPP. The operation includes multiple components communicated with the CAPP including a request and at least one coherence message. The CAPP determines one or more of the components of the operation by reference to at least one programmable data structure within the CAPP that can be reprogrammed. | 05-29-2014 |
20140149683 | PROGRAMMABLE COHERENT PROXY FOR ATTACHED PROCESSOR - A coherent attached processor proxy (CAPP) within a primary coherent system participates in an operation on a system fabric of the primary coherent system on behalf of an attached processor (AP) that is external to the primary coherent system and that is coupled to the CAPP. The operation includes multiple components communicated with the CAPP including a request and at least one coherence message. The CAPP determines one or more of the components of the operation by reference to at least one programmable data structure within the CAPP that can be reprogrammed. | 05-29-2014 |
20140149684 | APPARATUS AND METHOD OF CONTROLLING CACHE - An apparatus and method for controlling a cache may include a cache controller configured to collect a portion of data corresponding to a cache miss in the data at one time, and a data operation unit configured to perform an operation on the data based on the collected data. | 05-29-2014 |
20140164714 | SPECULATIVE READ IN A CACHE COHERENT MICROPROCESSOR - A cache coherence manager, disposed in a multi-core microprocessor, includes a request unit, an intervention unit, a response unit and an interface unit. The request unit receives coherent requests and selectively issues speculative requests in response. The interface unit selectively forwards the speculative requests to a memory. The interface unit includes at least three tables. Each entry in the first table represents an index to the second table. Each entry in the second table represents an index to the third table. The entry in the first table is allocated when a response to an associated intervention message is stored in the first table but before the speculative request is received by the interface unit. The entry in the second table is allocated when the speculative request is stored in the interface unit. The entry in the third table is allocated when the speculative request is issued to the memory. | 06-12-2014 |
20140173218 | CROSS DEPENDENCY CHECKING LOGIC - Systems and methods for maintaining an order of transactions in the coherence point. The coherence point stores attributes associated with received transactions in an input request queue (IRQ). When a new transaction is received by the coherence point, the IRQ is searched for other entries with the same request address or the same victim address as the new transaction. If one or more matches are found, the new transaction entry points to the entry storing the most recently received transaction with the same address. The new transaction is stalled until the transaction it points to has been completed in the coherence point. | 06-19-2014 |
20140173219 | LIGHTWEIGHT OBSERVABLE VALUES FOR MULTIPLE GRIDS - A method, computer program product, and computer system for updating observable values for multiple user-interface components. A computer system reads first values indexed by keys from a cache, in response to receiving a request from the multiple user-interface components. The computer system reads second values, which are indexed by the keys, from persistent storage. The computer system compares the first values and the second values based on the keys. The computer system writes the second values as new values of the first values in the cache. The computer system notifies one or more observers for respective ones of the first values, wherein the respective ones of the first values are changed. And, the computer system notifies the one or more observers for the first values that reading and writing operations are finished. | 06-19-2014 |
20140173220 | Using Logical Block Addresses with Generation Numbers as Data Fingerprints to Provide Cache Coherency - The technique introduced here involves using a block address and a corresponding generation number as a “fingerprint” to uniquely identify a sequence of data within a given storage domain. Each block address has an associated generation number which indicates the number of times that data at that block address has been modified. This technique can be employed, for example, to maintain cache coherency among multiple storage nodes. It can also be employed to avoid sending the data to a network node over a network if it already has the data. | 06-19-2014 |
20140181417 | CACHE COHERENCY USING DIE-STACKED MEMORY DEVICE WITH LOGIC DIE - A die-stacked memory device implements an integrated coherency manager to offload cache coherency protocol operations for the devices of a processing system. The die-stacked memory device includes a set of one or more stacked memory dies and a set of one or more logic dies. The one or more logic dies implement hardware logic providing a memory interface and the coherency manager. The memory interface operates to perform memory accesses in response to memory access requests from the coherency manager and the one or more external devices. The coherency manager comprises logic to perform coherency operations for shared data stored at the stacked memory dies. Due to the integration of the logic dies and the memory dies, the coherency manager can access shared data stored in the memory dies and perform related coherency operations with higher bandwidth and lower latency and power consumption compared to the external devices. | 06-26-2014 |
20140181418 | MANAGING GLOBAL CACHE COHERENCY IN A DISTRIBUTED SHARED CACHING FOR CLUSTERED FILE SYSTEMS - Systems, methods, and computer program products are provided for managing global cache coherency in distributed shared caching for a clustered file system (CFS). The CFS manages access permissions to an entire space of data segments by using the DSM module. In response to receiving a request to access one of the data segments, a calculation operation is performed for obtaining most recent contents of one of the data segments. The calculation operation performs one of providing the most recent contents via communication with a remote DSM module which obtains the one of the data segments from an associated external cache memory, instructing by the DSM module to read from storage the one of the data segments, and determining that any existing contents of the one of the data segments in the local external cache are the most recent contents. | 06-26-2014 |
20140189251 | UPDATE MASK FOR HANDLING INTERACTION BETWEEN FILLS AND UPDATES - A multi-core processor implements a cache coherency protocol in which probe messages are address-ordered on a probe channel while responses are un-ordered on a response channel. When a first core generates a read of an address that misses in the first core's cache, a line fill is initiated. If a second core is writing the same address, the second core generates an update on the address-ordered probe channel. The second core's update may arrive before or after the first core's line fill returns. If the update arrived before the fill returned, a mask is maintained to indicate which portions of the line were modified by the update so that the late arriving line fill only modifies portions of the line that were unaffected by the earlier-arriving update. | 07-03-2014 |
20140195740 | FLOW-ID DEPENDENCY CHECKING LOGIC - Systems and methods for maintaining an order of transactions in the coherence point. The coherence point stores attributes associated with received transactions in an input request queue (IRQ). When a new transaction is received with a device ordered attribute, the IRQ is searched for other entries with the same flow ID as the new transaction. If one or more matches are found, the new transaction entry points to the entry for the most recently received transaction with the same flow ID. The new transaction is prevented from exiting the coherence point until the transaction it points to has been sent to its destination. | 07-10-2014 |
20140201460 | DATA RECOVERY FOR COHERENT ATTACHED PROCESSOR PROXY - A coherent attached processor proxy (CAPP) that participates in coherence communication in a primary coherent system on behalf of an attached processor external to the primary coherent system tracks delivery of data to destinations in the primary coherent system via one or more entries in a data structure. Each of the one or more entries specifies with a destination tag a destination in the primary coherent system to which data is to be delivered from the attached processor. In response to initiation of recovery operations for the CAPP, the CAPP performs data recovery operations, including transmitting, to at least one destination indicated by the destination tag of one or more entries, an indication of a data error in data to be delivered to that destination from the attached processor. | 07-17-2014 |
20140201461 | Context Switching with Offload Processors - A method for context switching of multiple offload processors coupled to receive data for processing over a memory bus is disclosed. The method can include directing storage of a cache state, via a bulk read from a cache of at least one of a plurality of offload processors into a context memory, by operation of a scheduling circuit, with any virtual and physical memory locations of the cache state being aligned, and subsequently directing transfer of the cache state to at least one of the offload processors for processing, by operation of the scheduling circuit. | 07-17-2014 |
20140237194 | EFFICIENT VALIDATION OF COHERENCY BETWEEN PROCESSOR CORES AND ACCELERATORS IN COMPUTER SYSTEMS - A method of testing cache coherency in a computer system design allocates different portions of a single cache line for use by accelerators and processors. The different portions of the cache line can have different sizes, and the processors and accelerators can operate in the simulation at different frequencies. The verification system can control execution of the instructions to invoke different modes of the coherency mechanism such as direct memory access or cache intervention. The invention provides a further opportunity to test any accelerator having an original function and an inverse function by allocating cache lines to generate an original function output, allocating cache lines to generate an inverse function output based on the original function output, and verifying correctness of the original and inverse functions by comparing the inverse function output to the original function input. | 08-21-2014 |
20140244940 | AFFINITY GROUP ACCESS TO GLOBAL DATA - A method, system, and computer readable medium to share data on a global basis within a symmetric multiprocessor (SMP) computer system are disclosed. The method may include grouping a plurality of processor cores into a plurality of affinity groups. Global data may be copied into a plurality of group data structures. Each group data structure may correspond to an affinity group. The method may read a first group data structure by a thread executing on a processor core associated with a first affinity group. | 08-28-2014 |
20140244941 | AFFINITY GROUP ACCESS TO GLOBAL DATA - A method, system, and computer readable medium to share data on a global basis within a symmetric multiprocessor (SMP) computer system are disclosed. The method may include grouping a plurality of processor cores into a plurality of affinity groups. The method may include creating hints about the global data in the plurality of group data structures. Each group data structure may correspond to an affinity group. The method may read a first group data structure by a thread executing on a processor core associated with a first affinity group. | 08-28-2014 |
20140244942 | AFFINITY GROUP ACCESS TO GLOBAL DATA - A method, system, and computer readable medium to share data on a global basis within a symmetric multiprocessor (SMP) computer system are disclosed. The method may include grouping a plurality of processor cores into a plurality of affinity groups. Global data may be copied into a plurality of group data structures. Each group data structure may correspond to an affinity group. The method may read a first group data structure by a thread executing on a processor core associated with a first affinity group. | 08-28-2014 |
20140244943 | AFFINITY GROUP ACCESS TO GLOBAL DATA - A method, system, and computer readable medium to share data on a global basis within a symmetric multiprocessor (SMP) computer system are disclosed. The method may include grouping a plurality of processor cores into a plurality of affinity groups. The method may include creating hints about the global data in the plurality of group data structures. Each group data structure may correspond to an affinity group. The method may read a first group data structure by a thread executing on a processor core associated with a first affinity group. | 08-28-2014 |
20140250274 | MAPPING PERSISTENT STORAGE - A computer apparatus and related method to access storage is provided. In one aspect, a controller maps an address range of a data block of storage into an accessible memory address range of at least one of a plurality of processors. In a further aspect, the controller ensures that copies of the data block cached in a plurality of memories by a plurality of processors are consistent. | 09-04-2014 |
20140258642 | DYNAMIC PRIORITIZATION OF CACHE ACCESS - Some embodiments of the inventive subject matter are directed to operations that include determining that an access request to a computer memory results in a cache miss. In some examples, the operations further include determining an amount of cache resources used to service additional cache misses that occurred within a period prior to the cache miss. Furthermore, in some examples, the operations further include servicing the access request to the computer memory based, at least in part, on the amount of the cache resources used to service the additional cache misses within the period prior to the cache miss. | 09-11-2014 |
20140281266 | Maintaining Coherence When Removing Nodes From a Directory-Based Shared Memory System - A high performance computing system and methods are disclosed. The system includes logical partitions with physically removable nodes that each have at least one processor, and memory that can be shared with other nodes. Node hardware may be removed or allocated to another partition without a reboot or power cycle. Memory sharing is tracked using a memory directory. Cache coherence operations on the memory directory include a test to determine whether a given remote node has been removed. If the remote node is not present, system hardware simulates a valid response from the missing node. | 09-18-2014 |
20140281267 | Enabling Hardware Transactional Memory To Work More Efficiently With Readers That Can Tolerate Stale Data - A technique for enabling hardware transactional memory (HTM) to work more efficiently with readers that can tolerate stale data. In an embodiment, a pre-transaction load request is received from one of the readers, the pre-transaction load request signifying that the reader can tolerate pre-transaction data. A determination is made whether the pre-transaction load request comprises data that has been designated for update by a concurrent HTM transaction. If so, a cache line containing the data is marked as pre-transaction data. The concurrent HTM transaction proceeds without aborting notwithstanding the pre-transaction load request. | 09-18-2014 |
20140281268 | Enabling Hardware Transactional Memory To Work More Efficiently With Readers That Can Tolerate Stale Data - A technique for enabling hardware transactional memory (HTM) to work more efficiently with readers that can tolerate stale data. In an embodiment, a pre-transaction load request is received from one of the readers, the pre-transaction load request signifying that the reader can tolerate pre-transaction data. A determination is made whether the pre-transaction load request comprises data that has been designated for update by a concurrent HTM transaction. If so, a cache line containing the data is marked as pre-transaction data. The concurrent HTM transaction proceeds without aborting notwithstanding the pre-transaction load request. | 09-18-2014 |
20140297966 | OPERATION PROCESSING APPARATUS, INFORMATION PROCESSING APPARATUS AND METHOD OF CONTROLLING INFORMATION PROCESSING APPARATUS - An operation processing apparatus connected with another operation processing apparatus including an operation processing unit to perform an operation process using first data administered by the own operation processing apparatus and second data administered by and acquired from another operation processing apparatus, a main memory to store the first data, and a control unit to include a setting unit which sets the operation processing unit to an operating state or a non-operating state and a cache memory which holds the first and second data, wherein when the setting unit sets the operation processing unit to the non-operating state and receives a notification related to discarding of the first data from another operation processing apparatus, the control unit acquires the first data from the main memory and holds the acquired data in the cache memory. | 10-02-2014 |
20140317358 | GLOBAL MAINTENANCE COMMAND PROTOCOL IN A CACHE COHERENT SYSTEM - A system may include a command queue controller coupled to a number of clusters of cores, where each cluster includes a cache shared amongst the cores. An originating core of one of the clusters may detect a global maintenance command and send the global maintenance command to the command queue controller. The command queue controller may broadcast the global maintenance command to the clusters including the originating core's cluster. Each of the cores of the clusters may execute the global maintenance command. Each cluster may send an acknowledgement to the command queue controller upon completed execution of the global maintenance command by each core of the cluster. The command queue controller may also send, upon receiving an acknowledgement from each cluster, a final acknowledgement to the originating core's cluster. | 10-23-2014 |
20140337583 | INTELLIGENT CACHE WINDOW MANAGEMENT FOR STORAGE SYSTEMS - Methods and structure for intelligent cache window management are provided. The system comprises a memory and a cache manager. The memory stores entries of cache data for a logical volume. The cache manager is able to track usage of the logical volume by a host, and to identify logical block addresses of the logical volume to cache based on the tracked usage. The cache manager is further able to determine that one or more write operations are directed to the identified logical block addresses, to prevent caching for the identified logical block addresses until the write operations have completed, and to populate a new cache entry in the memory with data from the identified logical block addresses responsive to detecting completion of the write operations. | 11-13-2014 |
20140344526 | METADATA MANAGEMENT - In one embodiment, a copy relationship is established between a storage location at a first site and a storage location at a second site, in a manner which includes selectively either 1) synchronously writing a modified metadata track from a cache to data storage if the metadata track in cache is a mixture of ones and zeros, before staging from data storage into the cache, the next track of the sequence of tracks of metadata, or 2) instead of synchronously writing from cache the modified metadata track, entering a journal entry to protect the modified metadata track in cache if the metadata track in cache is one of all ones and all zeros, so that asynchronous writing of the modified metadata track from cache is substituted for synchronous writing from cache. Other aspects are described. | 11-20-2014 |
20140365733 | INTEGRATED CIRCUIT SYSTEM HAVING DECOUPLED LOGICAL AND PHYSICAL INTERFACES - An integrated circuit system including a first integrated circuit chip including first logic, a second integrated circuit chip, and second logic distributed across the first and second integrated circuit chips. The second logic includes a first unit integrated in the first integrated circuit chip and a second unit integrated in the second integrated circuit chip. The integrated circuit system further includes a physical communication link coupling the first unit in the first integrated circuit chip and the second unit in the second integrated circuit chip and a request interface between the first logic and first unit of the second logic. The request interface is implemented in the first integrated circuit such that communication via the request interface between the first logic and the first unit of the second logic has low latency and such that the request interface is decoupled from the physical communication link. | 12-11-2014 |
20150026416 | DYNAMIC MEMORY CACHE SIZE ADJUSTMENT IN A MEMORY DEVICE - Methods for dynamic memory cache size adjustment, enabling dynamic memory cache size adjustment, memory devices, and memory systems are disclosed. One such method for dynamic memory cache size adjustment determines available memory space in a memory array and adjusts a size of a memory cache in the memory array responsive to the available memory space. | 01-22-2015 |
20150032970 | PERFORMANCE OF ACCESSES FROM MULTIPLE PROCESSORS TO A SAME MEMORY LOCATION - A processing apparatus comprising: several processors for processing data; a hierarchical memory system comprising a memory accessible to all the processors, and several caches corresponding to each of the processors, each of the caches being accessible to the corresponding processor and comprising storage locations and corresponding indicators. There is also cache coherency control circuitry for maintaining coherency of data stored in the hierarchical memory system. The processors are configured to respond to receipt of a predefined request to perform an operation on a data item to determine if the cache corresponding to the processor receiving the request has a storage location allocated to the data item. If not, the processing apparatus is configured to: allocate a storage location within the cache to the data item, set the indicator corresponding to the storage location to indicate that the storage location is storing a delta value, set data in the allocated storage location to an initial value. The processor is configured in response to the predefined request to perform the operation on data within the storage location allocated to the data item. | 01-29-2015 |
20150032971 | System and Method for Predicting False Sharing - In one embodiment, a method for predicting false sharing includes running code on a plurality of cores and tracking potential false sharing in the code while running the code to produce tracked potential false sharing, where tracking the potential false sharing includes determining whether there is potential false sharing between a first cache line and a second cache line, and where the first cache line is adjacent to the second cache line. The method also includes reporting potential false sharing in accordance with the tracked potential false sharing to produce a false sharing report. | 01-29-2015 |
20150058579 | SYSTEMS AND METHODS FOR MEMORY UTILIZATION FOR OBJECT DETECTION - A method for memory utilization by an electronic device is described. The method includes transferring a first portion of a first decision tree and a second portion of a second decision tree from a first memory to a cache memory. The first portion and second portion of each decision tree are stored contiguously in the first memory. The first decision tree and second decision tree are each associated with a different feature of an object detection algorithm. The method also includes reducing cache misses by traversing the first portion of the first decision tree and the second portion of the second decision tree in the cache memory based on an order of execution of the object detection algorithm. | 02-26-2015 |
20150067267 | CONCURRENT INLINE CACHE OPTIMIZATION IN ACCESSING DYNAMICALLY TYPED OBJECTS - A method and an apparatus for concurrent accessing of dynamically typed objects based on inline cache code are described. Inline cache initialization in a single thread may be offloaded to an interpreter without incurring unnecessary synchronization overhead. A thread bias mechanism may be provided to detect whether a code block is executed in a single thread. Further, the number of inline cache initializations performed via a compiler, such as a baseline JIT compiler, can be reduced to improve processing performance. | 03-05-2015 |
20150067268 | OPTIMIZING MEMORY BANDWIDTH CONSUMPTION USING DATA SPLITTING WITH SOFTWARE CACHING - A computer processor collects information for a dominant data access loop and reference code patterns based on data reference pattern analysis, and for pointer aliasing and data shape based on pointer escape analysis. The computer processor selects a candidate array for data splitting wherein the candidate array is referenced by a dominant data access loop. The computer processor determines a data splitting mode by which to split the data of the candidate array, based on the reference code patterns, the pointer aliasing, and the data shape information, and splits the data into two or more split arrays. The computer processor creates a software cache that includes a portion of the data of the two or more split arrays in a transposed format, and maintains the portion of the transposed data within the software cache and consults the software cache during an access of the split arrays. | 03-05-2015 |
20150067269 | METHOD FOR BUILDING MULTI-PROCESSOR SYSTEM WITH NODES HAVING MULTIPLE CACHE COHERENCY DOMAINS - A method for building a multi-processor system with nodes having multiple cache coherency domains. In this system, a directory built in a node controller needs to include processor domain attribute information, and the information can be acquired by configuring cache coherency domain attributes of ports of the node controller connected to processors. In the disclosure herein, the node controller can support the multiple physical cache coherency domains in a node. | 03-05-2015 |
20150067270 | METADATA CACHE MANAGEMENT - Managing a cache includes determining from metadata of a received service request whether a cache data response may satisfy the request as a function of recognizing a cacheable method name specification within request metadata by a service provider associated with the request, and determining whether the request is an inquiry in order to decide if the request may be satisfied by the cached data. Aspects also include searching the cache for the data response if determined the data is cacheable and the request is an inquiry, and sending the request on to a service provider if the data response is not a cacheable response, or the request is an update request. | 03-05-2015 |
20150089151 | SURFACE RESOURCE VIEW HASH FOR COHERENT CACHE OPERATIONS IN TEXTURE PROCESSING HARDWARE - Techniques are disclosed for performing memory access operations. A texture unit receives a memory access operation that includes a tuple associated with a first view in a plurality of views. The texture unit retrieves a first hash value associated with a first texture header in a plurality of texture headers, where the first texture header is related to the first view. The texture unit retrieves a second hash value associated with a second texture header in the plurality of texture headers, where the second texture header is related to a second view. The texture unit determines whether the first view is potentially aliased with the second view, based on the first and second hash values. If so, then the texture unit invalidates a cache entry in a cache memory associated with the second texture header. Otherwise, the texture unit maintains the cache entry. | 03-26-2015 |
20150089152 | MANAGING HIGH-CONFLICT CACHE LINES IN TRANSACTIONAL MEMORY COMPUTING ENVIRONMENTS - Cache lines in a computing environment with transactional memory are configurable with a coherency mode. Cache lines in full-line coherency mode are operated or managed with full-line granularity. Cache lines in sub-line coherency mode are operated or managed as sub-cache line portions of a full cache line. When a transaction accessing a cache line in full-line coherency mode results in a transactional abort, the cache line may be placed in sub-line coherency mode if the cache line is a high-conflict cache line. The cache line may be associated with a counter in a conflict address detection table that is incremented whenever a transaction conflict is detected for the cache line. The cache line may be a high-conflict cache line when the counter satisfies a high-conflict criterion, such as reaching a threshold value. The cache line may be returned to full-line coherency mode when a reset criterion is satisfied. | 03-26-2015 |
20150089153 | IDENTIFYING HIGH-CONFLICT CACHE LINES IN TRANSACTIONAL MEMORY COMPUTING ENVIRONMENTS - Cache lines in a computing environment with transactional memory are configurable with a coherency mode and are associated with a high-conflict indicator. Cache lines in full-line coherency mode are operated or managed with full-line granularity. Cache lines in sub-line coherency mode are operated or managed as sub-cache line portions of a full cache line. A cache line is placed in sub-line coherency mode based on examining the high-conflict indicator. A transaction accessing a memory address in a cache line in sub-line coherency mode marks only the sub-cache line portion associated with the memory address as transactionally accessed. The high-conflict indicator may be included in a set of descriptive bits associated with the cache line. A copy of the high-conflict indicator for a cache line in a first cache may be updated with the high-conflict indicator for the cache line in a second cache. | 03-26-2015 |
20150089154 | MANAGING HIGH-COHERENCE-MISS CACHE LINES IN MULTI-PROCESSOR COMPUTING ENVIRONMENTS - Cache lines in a multi-processor computing environment are configurable with a coherency mode. Cache lines in full-line coherency mode are operated or managed with full-line granularity. Cache lines in sub-line coherency mode are operated or managed as sub-cache line portions of a full cache line. A high-coherence-miss cache line may be placed in sub-line coherency mode. A cache line may be associated with a counter in a coherence miss detection table that is incremented whenever an access of the cache line results in a coherence request. The cache line may be a high-coherence-miss cache line when the counter satisfies a high-coherence-miss criterion, such as reaching a threshold value. The cache line may be returned to full-line coherency mode when a reset criterion is satisfied. | 03-26-2015 |
20150089155 | CENTRALIZED MANAGEMENT OF HIGH-CONTENTION CACHE LINES IN MULTI-PROCESSOR COMPUTING ENVIRONMENTS - Cache lines in a multi-processor computing environment are configurable with a coherency mode. Cache lines in full-line coherency mode are operated or managed with full-line granularity. Cache lines in sub-line coherency mode are operated or managed as sub-cache line portions of a full cache line. Communications detected on a coherence interconnect may indicate that a cache line is associated with performance-reducing events. A high-contention cache line may be placed in sub-line coherency mode. Caches accessing the cache line are notified that the cache line is in sub-line coherency mode. The cache line may be associated with a counter in a centralized detection table that is incremented based on detecting the communications. The cache line may be a high-contention cache line when the counter satisfies a high-contention criterion, such as reaching a threshold value. The cache line may be returned to full-line coherency mode when a reset criterion is satisfied. | 03-26-2015 |
20150089156 | Atomic Memory Update Unit & Methods - In an aspect, an update unit can evaluate condition(s) in an update request and update one or more memory locations based on the condition evaluation. The update unit can operate atomically to determine whether to effect the update and to make the update. Updates can include one or more of incrementing and swapping values. An update request may specify one of a pre-determined set of update types. Some update types may be conditional and others unconditional. The update unit can be coupled to receive update requests from a plurality of computation units. The computation units may not have privileges to directly generate write requests to be effected on at least some of the locations in memory. The computation units can be fixed function circuitry operating on inputs received from programmable computation elements. The update unit may include a buffer to hold received update requests. | 03-26-2015 |
20150089157 | SPECULATIVE READ IN A CACHE COHERENT MICROPROCESSOR - A cache coherence manager, disposed in a multi-core microprocessor, includes a request unit, an intervention unit, a response unit and an interface unit. The request unit receives coherent requests and selectively issues speculative requests in response. The interface unit selectively forwards the speculative requests to a memory. The interface unit includes at least three tables. Each entry in the first table represents an index to the second table. Each entry in the second table represents an index to the third table. The entry in the first table is allocated when a response to an associated intervention message is stored in the first table but before the speculative request is received by the interface unit. The entry in the second table is allocated when the speculative request is stored in the interface unit. The entry in the third table is allocated when the speculative request is issued to the memory. | 03-26-2015 |
20150095589 | CACHE MEMORY SYSTEM AND OPERATING METHOD FOR THE SAME - A cache memory system includes a cache memory, which stores cache data corresponding to portions of main data stored in a main memory and priority data respectively corresponding to the cache data; a table storage unit, which stores a priority table including information regarding access frequencies with respect to the main data; and a controller, which, when at least one from among the main data is requested, determines whether cache data corresponding to the request is stored in the cache memory, deletes one from among the cache data based on the priority data, and updates the cache data set with new data, wherein the priority data is determined based on the information regarding access frequencies. | 04-02-2015 |
20150317250 | Read and Write Requests to Partially Cached Files - Aspects of the invention are provided to support partial file caching on a file system block boundary. All read requests are converted so that offset and count are aligned on a block boundary. Data associated with read requests is first satisfied from local cache, with cache misses supported with a call to a persistent or remote system. Similarly, for a write request, any partial blocks are aligned to the block boundary. Data associated with the write request is performed on local cache and placed in a queue for replay to the persistent or remote system. | 11-05-2015 |
20150324290 | HYBRID MEMORY CUBE SYSTEM INTERCONNECT DIRECTORY-BASED CACHE COHERENCE METHODOLOGY - A system includes a plurality of host processors and a plurality of hybrid memory cube (HMC) devices configured as a distributed shared memory for the host processors. An HMC device includes a plurality of integrated circuit memory die including at least a first memory die arranged on top of a second memory die, and at least a portion of the memory of the memory die is mapped to include at least a portion of a memory coherence directory; and a logic base die including at least one memory controller configured to manage three-dimensional (3D) access to memory of the plurality of memory die by at least one second device, and logic circuitry configured to implement a memory coherence protocol for data stored in the memory of the plurality of memory die. | 11-12-2015 |
20150331797 | MEMORY ACCESS TRACING METHOD - A method for identifying, in a system including two or more computing devices that are able to communicate with each other, with each computing device having a cache and connected to a corresponding memory, a computing device accessing one of the memories, includes monitoring memory access to any of the memories; monitoring cache coherency commands between computing devices; and identifying the computing device accessing one of the memories by using information related to the memory access and cache coherency commands. | 11-19-2015 |
20150363324 | SYSTEMS AND METHODS FOR A DE-DUPLICATION CACHE - A de-duplication cache is configured to cache data for access by a plurality of different storage clients, such as virtual machines. A virtual machine may comprise a virtual machine de-duplication module configured to identify data for admission into the de-duplication cache. Data admitted into the de-duplication cache may be accessible by two or more storage clients. Metadata pertaining to the contents of the de-duplication cache may be persisted and/or transferred with respective storage clients such that the storage clients may access the contents of the de-duplication cache after rebooting, being power cycled, and/or being transferred between hosts. | 12-17-2015 |
20150370709 | REDUCTION OF EVICTIONS IN CACHE MEMORY MANAGEMENT DIRECTORIES - A module of cache coherence management by directory, in which each datum stored in cache memory is associated with a state, at least one of which indicates data sharing among a plurality of processors, the module including a storage unit to store a directory containing a list of cache memory addresses, each address possibly associated with a state corresponding to the state of the datum available at this address, and a processing unit configured to update said list, said processing unit being configured so as not to list the address lines related to data associated with the first state. | 12-24-2015 |
20150378907 | DYNAMIC PREDICTOR FOR COALESCING MEMORY TRANSACTIONS - A transactional memory system predicts the outcome of coalescing outermost memory transactions, the coalescing causing committing of memory store data to memory for a first transaction to be done at transaction execution (TX) end of a second transaction. A processor of the transactional memory system determines whether a first plurality of outermost transactions from an associated program that were coalesced experienced an abort, the first plurality of outermost transactions including a first instance of a first transaction. The processor updates a history of the associated program to reflect the results of the determination. The processor coalesces a second plurality of outermost transactions from the associated program, based, at least in part, on the updated history. | 12-31-2015 |
20150378908 | ALLOCATING READ BLOCKS TO A THREAD IN A TRANSACTION USING USER SPECIFIED LOGICAL ADDRESSES - A processor in a multi-processor configuration is configured to execute an instruction that specifies a virtual address range to be monitored to protect reads in a transaction. The processor translates the virtual address range to a series of real pages. The real starting address and ending address pairs for each real page are stored for use later on to resolve a potential cross-interrogation (XI) conflict with a real address on the XI bus. | 12-31-2015 |
20150378909 | PERFORMING STAGING OR DESTAGING BASED ON THE NUMBER OF WAITING DISCARD SCANS - A controller receives a request to perform staging or destaging operations with respect to an area of a cache. A determination is made as to whether more than a threshold number of discard scans are waiting to be performed. The controller avoids satisfying the request to perform the staging or the destaging operations or a read hit with respect to the area of the cache, in response to determining that more than the threshold number of discard scans are waiting to be performed. | 12-31-2015 |
20150378925 | INVALIDATION DATA AREA FOR CACHE - The present disclosure relates to caches, methods, and systems for using an invalidation data area. The cache can include a journal configured for tracking data blocks, and an invalidation data area configured for tracking invalidated data blocks associated with the data blocks tracked in the journal. The invalidation data area can be on a separate cache region from the journal. A method for invalidating a cache block can include determining a journal block tracking a memory address associated with a received write operation. The method can also include determining a mapped journal block based on the journal block and on an invalidation record. The method can also include determining whether write operations are outstanding. If so, the method can include aggregating the outstanding write operations and performing a single write operation based on the aggregated write operations. | 12-31-2015 |
20150378926 | TRANSACTIONAL EXECUTION IN A MULTI-PROCESSOR ENVIRONMENT THAT MONITORS MEMORY CONFLICTS IN A SHARED CACHE - A higher level shared cache of a hierarchical cache of a multi-processor system utilizes transaction identifiers to manage memory conflicts in corresponding transactions. The higher level cache is shared with two or more processors. Transaction indicators are set in the higher level cache corresponding to the cache lines being accessed. The transaction aborts if a memory conflict with the transaction's cache lines from another transaction is detected. | 12-31-2015 |
20150378927 | ALLOWING NON-CACHEABLE LOADS WITHIN A TRANSACTION - A computer allows non-cacheable loads or stores in a hardware transactional memory environment. Transactional loads or stores, by a processor, are monitored in a cache for TX conflicts. The processor accepts a request to execute a transactional execution (TX) transaction. Based on processor execution of a cacheable load or store instruction for loading or storing first memory data of the transaction, the computer can perform a cache miss operation on the cache. Based on processor execution of a non-cacheable load instruction for loading second memory data of the transaction, the computer can not-perform the cache miss operation on the cache based on a cache line associated with the second memory data being not-cached, and load an address of the second memory data into a non-cache-monitor. The TX transaction can be aborted based on the non-cache monitor detecting a memory conflict from another processor. | 12-31-2015 |
20160011976 | THREE CHANNEL CACHE-COHERENCY SOCKET PROTOCOL | 01-14-2016 |
20160011977 | MEMORY SEQUENCING WITH COHERENT AND NON-COHERENT SUB-SYSTEMS | 01-14-2016 |
20160019150 | INFORMATION PROCESSING DEVICE, CONTROL METHOD OF INFORMATION PROCESSING DEVICE AND CONTROL PROGRAM OF INFORMATION PROCESSING DEVICE - An information processing device comprising a plurality of nodes, each node comprising an arithmetic operation device configured to execute an arithmetic process, and a main memory which stores data, wherein each of the arithmetic operation devices belonging to each of the plurality of nodes is configured to read target data, on which the arithmetic operation device executes the arithmetic operation, from a storage device other than the main memory, based on first address information indicating a storage position in the storage device, and write the target data into the main memory of its own node. | 01-21-2016 |
20160041792 | RECOVERING FROM UNEXPECTED FLASH DRIVE REMOVAL - Techniques for recovering from unexpected removal of (or other unexpected power loss) a flash memory device from a computer system. An interpolated device driver notes whenever the flash memory device is unexpectedly removed, or otherwise unexpectedly powers off or enters a locked state. If the flash memory device is reinserted, the interpolated device driver reinitializes the flash memory device, and satisfies any flash memory device security protocol, so the flash memory device and the computer system can be restored to their status just before unexpected removal. The interpolated device driver caches requests to the flash memory device, and when status is restored to just before removal, replays those requests to the flash memory device, so the flash memory device responds to those requests as if it had never been removed. The computer system does not notice any break in service by the flash memory device due to removal and reinsertion. | 02-11-2016 |
20160055084 | NON-BLOCKING WRITES TO FILE DATA - Techniques and systems are disclosed for implementing non-blocking writes to eliminate the fetch-before-write requirement by creating an in-memory patch for the updated page and unblocking the calling process. Non-blocking writes eliminate such blocking by buffering the written data elsewhere in memory and unblocking the writing process immediately. Subsequent reads to the updated page locations are also made non-blocking and, in some cases, can be eliminated when the read request can be serviced from in-memory patches. | 02-25-2016 |
20160077972 | Efficient and Consistent Para-Virtual I/O System - Embodiments of the invention relate to a para-virtual I/O system. A consistent para-virtual I/O system architecture is provided with a new virtual disk interface and a semantic journaling mechanism. The virtual disk interface is extended with two primitives for flushing and ordering I/O, both of the primitives being exported to para-virtual I/O drivers in a guest operating system. The ordering primitive guarantees ordering of preceding writes, and the flushing primitive enforces order and durability. The guest drivers selectively use both of these primitives based on semantics of the data being persisted from the para-virtual cache hierarchy to physical disk. The order of committed writes is enforced in order to enable a consistent state to be recovered after a crash. | 03-17-2016 |
20160092354 | HARDWARE APPARATUSES AND METHODS TO CONTROL CACHE LINE COHERENCY - Methods and apparatuses to control cache line coherency are described. A processor may include a first core having a cache to store a cache line, a second core to send a request for the cache line from the first core, moving logic to cause a move of the cache line between the first core and a memory and to update a tag directory of the move, and cache line coherency logic to create a chain home in the tag directory from the request to cause the cache line to be sent from the tag directory to the second core. A method to control cache line coherency may include creating a chain home in a tag directory from a request for a cache line in a first processor core from a second processor core to cause the cache line to be sent from the tag directory to the second processor core. | 03-31-2016 |
20160092358 | CACHE COHERENCY VERIFICATION USING ORDERED LISTS - Embodiments relate to cache coherency verification using ordered lists. An aspect includes maintaining a plurality of ordered lists, each ordered list corresponding to a respective thread that is executed by a processor, wherein each ordered list comprises a plurality of atoms, each atom corresponding to a respective operation performed in a cache by the respective thread that corresponds to the ordered list in which the atom is located, wherein the plurality of atoms in an ordered list are ordered based on program order. Another aspect includes determining a state of an atom in an ordered list of the plurality of ordered lists. Another aspect includes comparing the state of the atom in an ordered list to a state of an operation corresponding to the atom in the cache. Yet another aspect includes, based on the comparing, determining that there is a coherency violation in the cache. | 03-31-2016 |
20160092359 | MULTI-GRANULAR CACHE MANAGEMENT IN MULTI-PROCESSOR COMPUTING ENVIRONMENTS - Cache lines in a multi-processor computing environment are configurable with a coherency mode. Cache lines in full-line coherency mode are operated or managed with full-line granularity. Cache lines in sub-line coherency mode are operated or managed as sub-cache line portions of a full cache line. Each cache is associated with a directory having a number of directory entries and with a side table having a smaller number of entries. The directory entry for a cache line associates the cache line with a tag and a set of full-line descriptive bits. Creating a side table entry for the cache line places the cache line in sub-line coherency mode. The side table entry associates each of the sub-cache line portions of the cache line with a set of sub-line descriptive bits. Removing the side table entry may return the cache line to full-line coherency mode. | 03-31-2016 |
20160092368 | CACHE COHERENCY VERIFICATION USING ORDERED LISTS - Embodiments relate to cache coherency verification using ordered lists. An aspect includes maintaining a plurality of ordered lists, each ordered list corresponding to a respective thread that is executed by a processor, wherein each ordered list comprises a plurality of atoms, each atom corresponding to a respective operation performed in a cache by the respective thread that corresponds to the ordered list in which the atom is located, wherein the plurality of atoms in an ordered list are ordered based on program order. Another aspect includes determining a state of an atom in an ordered list of the plurality of ordered lists. Another aspect includes comparing the state of the atom in an ordered list to a state of an operation corresponding to the atom in the cache. Yet another aspect includes, based on the comparing, determining that there is a coherency violation in the cache. | 03-31-2016 |
20160110287 | GRANTING EXCLUSIVE CACHE ACCESS USING LOCALITY CACHE COHERENCY STATE - A cache coherency management facility to reduce latency in granting exclusive access to a cache in certain situations. A node requests exclusive access to a cache line of the cache. The node is in one region of nodes of a plurality of regions of nodes. The one region of nodes includes the node requesting exclusive access and another node of the computing environment, in which the node and the another node are local to one another as defined by predetermined criteria. The node requesting exclusive access checks a locality cache coherency state of the another node, the locality cache coherency state being specific to the another node and indicating whether the another node has access to the cache line. Based on the checking indicating that the another node has access to the cache line, a determination is made that the node requesting exclusive access is to be granted exclusive access to the cache line. The determining is independent of transmission of information relating to the cache line from one or more other nodes of the one or more other regions of nodes. | 04-21-2016 |
20160117247 | COHERENCY PROBE RESPONSE ACCUMULATION - A processor accumulating coherency probe responses, thereby reducing the impact of coherency messages on the bandwidth of the processor's communication fabric. A probe response accumulator is connected to a processing module of the processor, the processing module having multiple processor cores and associated caches. In response to a coherency probe, the processing module generates a different coherency probe response for each of the caches. The probe response accumulator combines the different coherency probe responses into a single coherency probe response and communicates the single coherency response over the communication fabric. | 04-28-2016 |
20160117248 | COHERENCY PROBE WITH LINK OR DOMAIN INDICATOR - A processor includes a set of processing modules, each of the processing modules including a cache and a coherency manager that keeps track of the memory addresses of data stored at the caches of other processing modules. In response to its local cache requesting access to a particular memory address or other triggering event, the coherency manager generates a coherency probe. In the event that the generated coherency probe is targeted to multiple processing modules, the coherency manager includes a set of multicast bits indicating the processing modules whose caches include copies of the data targeted by the multicast probe. A transport switch that connects the processing module to the fabric communicates the coherency probe only to the subset of processing modules indicated by the multicast bits. | 04-28-2016 |
20160139830 | MEMORY CONTROLLED OPERATIONS UNDER DYNAMIC RELOCATION OF STORAGE - A computing device is provided and includes a plurality of nodes. Each node includes multiple chips and a node controller at which the multiple chips are assignable to logical partitions. Each of the multiple chips includes processors and a memory unit configured to handle local memory operations originating from the processors. The node controller includes a dynamic memory relocation (DMR) mechanism configured to move data having a DMR storage increment address relative to a local one of the memory units without interrupting a processing of the data by at least one of the logical partitions. During movement of the data by the DMR mechanism, the memory units are disabled from handling the local memory operations matching the DMR storage increment address and the node controller handles the local memory operations matching the DMR storage increment address. | 05-19-2016 |
20160147658 | CONFIGURATION BASED CACHE COHERENCY PROTOCOL SELECTION - Topology of clusters of processors of a computer configuration, configured to support any of a plurality of cache coherency protocols, is discovered at initialization time to determine which one of the plurality of cache coherency protocols is to be used to handle coherency requests of the configuration. | 05-26-2016 |
20160147659 | NESTED CACHE COHERENCY PROTOCOL IN A TIERED MULTI-NODE COMPUTER SYSTEM - A computer system comprising multiple nodes, each node comprising a plurality of processors and a local cache hierarchy, suppresses local cache coherency operations of a node or global cache coherency operations between nodes based on the coherency request being a global or local request, and the state of the cache line at the node. | 05-26-2016 |
20160162414 | CACHING AND DEDUPLICATION OF DATA BLOCKS IN CACHE MEMORY - Techniques for deduplicating data in cache memory include determining that a first data block stored in the cache memory matches a second data block stored in the cache memory. It is further determined that a number of accesses associated with at least one of the first data block or the second data block is equal to or greater than a threshold number of accesses. In response to determining that the number of accesses is equal to or greater than the threshold number of accesses, the first data block is deduplicated in the cache memory. | 06-09-2016 |
20160170879 | SYSTEMS AND METHODS FOR MANAGING CACHE OF A DATA STORAGE DEVICE | 06-16-2016 |
20160179674 | HARDWARE APPARATUSES AND METHODS TO CONTROL CACHE LINE COHERENCE | 06-23-2016 |
20160188473 | COMPRESSION OF HARDWARE CACHE COHERENT ADDRESSES - Compression of address bits within a cache coherent subsystem of a chip is performed, enabling a cache coherent subsystem to avoid transmitting, storing, and operating upon unnecessary address information. Compression is performed according to any appropriate lossless algorithm, such as discarding of bits or code book lookup. The algorithm may be chosen according to constraints on logic delay and silicon area. An algorithm for minimum area would use a number of bits equal to the rounded up binary logarithm of the sum of all addresses of all memory regions. A configuration tool generates a logic description of the compression algorithm. The algorithm may be chosen automatically by the configuration tool. Decompression may be performed on addresses exiting the coherent subsystem. | 06-30-2016 |
20160253260 | STORAGE SYSTEM AND STORAGE CONTROL CIRCUIT | 09-01-2016 |
20160253261 | METHOD AND APPARATUS FOR CORRECTING CACHE PROFILING INFORMATION IN MULTI-PASS SIMULATOR | 09-01-2016 |
20170237677 | DATA CACHING METHOD AND DEVICE, AND STORAGE MEDIUM | 08-17-2017 |
20180024926 | APPARATUSES AND METHODS FOR TRANSFERRING DATA | 01-25-2018 |
20190146920 | TECHNIQUES FOR HANDLING REQUESTS FOR DATA AT A CACHE | 05-16-2019 |