Entries |
Document | Title | Date |
20080201533 | REDUCING NUMBER OF REJECTED SNOOP REQUESTS BY EXTENDING TIME TO RESPOND TO SNOOP REQUEST - A cache, system and method for reducing the number of rejected snoop requests. An incoming snoop request is entered in the first available latch in a pipeline of latches in a stall/reorder unit if the stall/reorder unit is not full. The entered snoop request is dispatched to a selector upon entering a bottom latch in the pipeline. The stall/reorder unit is not informed as to whether the dispatched snoop request is accepted by an arbitration mechanism for several clock cycles after the dispatch occurred. A copy of the dispatched snoop request is stored in a top latch in an overrun pipeline of latches in the stall/reorder unit upon dispatching the snoop request. By maintaining this information about the snoop request, the snoop request may be dispatched again to the selector in case the dispatched snoop request was rejected, thereby increasing the chance that the snoop request will ultimately be accepted. | 08-21-2008 |
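The retry path above is easy to model in software. Below is a minimal sketch in C, assuming a four-entry stall/reorder queue, a single overrun slot, and a delayed accept/reject verdict; all identifiers and sizes are illustrative assumptions, not taken from the application:

```c
/* Sketch of the retry mechanism in 20080201533: a dispatched snoop
 * request is copied into an "overrun" slot until the delayed verdict
 * arrives; on reject it is re-queued so it can be dispatched again. */
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define DEPTH 4
typedef struct { unsigned addr; bool valid; } Req;

static Req queue[DEPTH];   /* stall/reorder latches, [0] is the bottom latch */
static Req overrun;        /* copy of the request currently in flight        */

static bool enqueue(unsigned addr) {
    for (int i = 0; i < DEPTH; i++)          /* first available latch */
        if (!queue[i].valid) { queue[i] = (Req){ addr, true }; return true; }
    return false;                            /* unit full             */
}

static bool dispatch(Req *out) {             /* bottom latch -> selector */
    if (!queue[0].valid) return false;
    *out = overrun = queue[0];               /* keep a copy for a possible retry */
    memmove(queue, queue + 1, sizeof(Req) * (DEPTH - 1));
    queue[DEPTH - 1].valid = false;
    return true;
}

static void verdict(bool accepted) {         /* arrives several cycles later */
    if (!accepted && overrun.valid)
        enqueue(overrun.addr);               /* rejected: dispatch it again  */
    overrun.valid = false;
}

int main(void) {
    enqueue(0x100); enqueue(0x200);
    Req r;
    if (dispatch(&r)) verdict(false);        /* 0x100 rejected and re-queued */
    while (dispatch(&r)) { printf("accepted %#x\n", r.addr); verdict(true); }
    return 0;
}
```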
20080209133 | Managing cache coherency in a data processing apparatus - A data processing apparatus and method are provided for managing cache coherency. The data processing apparatus comprises a plurality of processing units, each having a cache associated therewith, and each cache having indication circuitry containing segment filtering data. The indication circuitry is responsive to an address portion of an address specified by an access request from an associated processing unit to reference the segment filtering data in order to provide, for each of at least a subset of the segments of the associated cache, an indication as to whether the data is either definitely not stored in that segment or is potentially stored in that segment. Further, in accordance with the present invention, cache coherency circuitry is provided which employs a cache coherency protocol to ensure data accessed by each processing unit is up-to-date. The cache coherency circuitry has snoop indication circuitry associated therewith whose content is derived from the segment filtering data of each indication circuitry. For certain access requests, the cache coherency circuitry initiates a coherency operation during which the snoop indication circuitry is referenced to determine whether any of the caches require subjecting to a snoop operation. For each cache for which it is determined a snoop operation should be performed, the cache coherency circuitry is arranged to issue a notification to that cache identifying the snoop operation to be performed. By taking advantage of information already provided in association with each cache in order to form the content of the snoop indication circuitry, significant hardware cost savings are achieved when compared with prior art techniques. Further, through use of such an approach, it is possible in embodiments of the present invention to identify the snoop operation not only on a cache-by-cache basis, but also for a particular cache to identify which segments of that cache should be subjected to the snoop operation. | 08-28-2008 |
20080209134 | Apparatus for Operating Cache-Inhibited Memory Mapped Commands to Access Registers - In a multiprocessor environment, by executing cache-inhibited reads or writes to registers, a scan communication is used to rapidly access registers inside and outside a chip originating the command. Cumbersome locking of the memory location may be thus avoided. Setting of busy latches at the outset virtually eliminates the chance of collisions, and status bits are set to inform the requesting core processor that a command is done and free of error, if that is the case. | 08-28-2008 |
20080209135 | DATA PROCESSING SYSTEM, METHOD AND INTERCONNECT FABRIC SUPPORTING DESTINATION DATA TAGGING - A data processing system includes a plurality of communication links and a plurality of processing units including a local master processing unit. The local master processing unit includes interconnect logic that couples the processing unit to one or more of the plurality of communication links and an originating master coupled to the interconnect logic. The originating master originates an operation by issuing a write-type request on at least one of the one or more communication links, receives from a snooper in the data processing system a destination tag identifying a route to the snooper, and, responsive to receipt of the combined response and the destination tag, initiates a data transfer including a data payload and a data tag identifying the route provided within the destination tag. | 08-28-2008 |
20080215822 | PCI Express Enhancements and Extensions - A method and apparatus for enhancing/extending a serial point-to-point interconnect architecture, such as Peripheral Component Interconnect Express (PCIe), is herein described. Temporal and locality caching hints and prefetching hints are provided to improve system-wide caching and prefetching. Message codes for atomic operations to arbitrate ownership between system devices/resources are included to allow efficient access/ownership of shared data. Loose transaction ordering is provided for, while maintaining corresponding transaction priority to memory locations to ensure data integrity and efficient memory access. Active power sub-states, and the setting thereof, are included to allow for more efficient power management. And caching of device local memory in a host address space, as well as caching of system memory in a device local memory address space, is provided for to improve bandwidth and latency for memory accesses. | 09-04-2008 |
20080215823 | DATA CONSISTENCY CONTROL SYSTEM AND DATA CONSISTENCY CONTROL METHOD - In a data consistency control system, a plurality of cache agents and at least one home agent are connected to one another by a plurality of networks. The home agent includes a unit issuing a snoop request when receiving an access request. Each of the cache agents includes a unit issuing a snoop response to the home agent when receiving the snoop request, and a unit issuing a snoop retry response. Each of the cache agents also includes a unit causing the snoop response and the snoop retry response to be communicated via different networks. The home agent also includes a unit managing the snoop retry response and a unit reissuing the snoop request under control of the managing unit. | 09-04-2008 |
20080215824 | CACHE MEMORY, PROCESSING UNIT, DATA PROCESSING SYSTEM AND METHOD FOR FILTERING SNOOPED OPERATIONS - A cache coherent data processing system includes at least a first cache memory supporting a first processing unit and a second cache memory supporting a second processing unit. The first cache memory includes a cache array and a cache directory of contents of the cache array. In response to the first cache memory detecting on an interconnect a broadcast operation that specifies a request address, the first cache memory determines from the operation a type of the operation and a coherency state associated with the request address. In response to determining the type and the coherency state, the first cache memory filters out the broadcast operation without accessing the cache directory. | 09-04-2008 |
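The filtering decision described here depends only on the snooped operation's type and the coherency state it implies, so it can be made before any directory access. A minimal sketch, assuming hypothetical MESI-style operation and state encodings (none of these names come from the application):

```c
/* Sketch of the directory-less filtering in 20080215824. */
#include <stdbool.h>
#include <stdio.h>

typedef enum { OP_READ, OP_RWITM } OpType;            /* RWITM: read with intent to modify  */
typedef enum { HINT_INVALID, HINT_SHARED } CohHint;   /* state associated with the request  */

/* true => drop the broadcast without touching the cache directory */
static bool filter_without_lookup(OpType op, CohHint hint) {
    if (hint == HINT_INVALID)                 /* nothing here can owe a response */
        return true;
    if (op == OP_READ && hint == HINT_SHARED) /* shared read requires no action  */
        return true;
    return false;
}

int main(void) {
    printf("read/shared filtered:  %d\n", filter_without_lookup(OP_READ, HINT_SHARED));
    printf("rwitm/shared filtered: %d\n", filter_without_lookup(OP_RWITM, HINT_SHARED));
    return 0;
}
```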
20080222364 | SNOOP FILTERING SYSTEM IN A MULTIPROCESSOR SYSTEM - A system and method for supporting cache coherency in a computing environment having multiple processing units, each unit having an associated cache memory system operatively coupled therewith. The system includes a plurality of interconnected snoop filter units, each snoop filter unit corresponding to and in communication with a respective processing unit, with each snoop filter unit comprising a plurality of devices for receiving asynchronous snoop requests from respective memory writing sources in the computing environment; a point-to-point interconnect comprising communication links for directly connecting memory writing sources to corresponding receiving devices; and a plurality of parallel operating filter devices coupled in one-to-one correspondence with each receiving device for processing snoop requests received thereat and either forwarding those requests to, or preventing forwarding of those requests to, its associated processing unit. Each of the plurality of parallel operating filter devices comprises parallel operating sub-filter elements, each simultaneously receiving an identical snoop request and implementing one or more different snoop filter algorithms for determining those snoop requests for data that are determined not to be cached locally at the associated processing unit, and preventing forwarding of those requests to the processor unit. In this manner, the number of snoop requests forwarded to a processing unit is reduced, thereby increasing performance of the computing environment. | 09-11-2008 |
20080235461 | Technique and apparatus for combining partial write transactions - A bridge includes a memory to establish a transaction table and write combining windows. Each write combining window is associated with a cache line and is subdivided into subwindows; and each of the subwindows is associated with a partial cache line. The bridge includes a controller to determine whether an incoming partial write transaction conflicts with a transaction stored in the transaction table. If a conflict occurs, the controller uses the write combining windows to combine the partial write transaction with another partial write transaction if one of the partial write combining windows is available. The controller issues a retry signal to a processor originating the partial write transaction if none of the partial write combining windows are available. | 09-25-2008 |
20080244193 | ADAPTIVE RANGE SNOOP FILTERING METHODS AND APPARATUSES - Snoop filtering methods and apparatuses for systems utilizing memory are contemplated. Method embodiments comprise receiving a request for contents of a memory line by a home agent, comparing an address of the memory line to a range in a set of adaptive ranges, and snooping an I/O agent for the contents upon a match of the address with the range. Apparatus embodiments comprise a range table, a table updater, a receiver module, and a range comparator. The range table allows for the tracking of memory addresses as I/O agents assert ownership of the addresses. | 10-02-2008 |
20080244194 | METHOD AND APPARATUS FOR FILTERING SNOOP REQUESTS USING STREAM REGISTERS - A method and apparatus for supporting cache coherency in a multiprocessor computing environment having multiple processing units, each processing unit having a local cache memory associated therewith. A snoop filter device is associated with each processing unit and includes at least one snoop filter primitive implementing a filtering method based on the usage of stream register sets and associated stream register comparison logic. Of the plurality of stream register sets, at least one stream register set is active, and at least one stream register set is labeled historic at any point in time. In addition, the snoop filter block is operatively coupled with cache wrap detection logic whereby the content of the active stream register set is switched into a historic stream register set upon detection of the cache wrap condition, and the content of at least one active stream register set is reset. Each filter primitive implements stream register comparison logic that determines whether a received snoop request is to be forwarded to the processor or discarded. | 10-02-2008 |
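A stream register is commonly described as a (base, mask) pair in which the mask marks address bits that are still significant. The sketch below assumes that form and a simple update rule; it illustrates the comparison logic only, and is not the claimed hardware:

```c
/* Sketch of stream-register filtering per 20080244194: a snooped
 * address is forwarded only if it matches some active or historic
 * register.  Sizes and the update rule are assumptions. */
#include <stdbool.h>
#include <stdio.h>

typedef struct { unsigned base, mask; } StreamReg;

static void sr_update(StreamReg *r, unsigned line_addr) {
    r->mask &= ~(r->base ^ line_addr);   /* differing bits become don't-care */
}

static bool sr_match(const StreamReg *r, unsigned line_addr) {
    return (line_addr & r->mask) == (r->base & r->mask);
}

int main(void) {
    StreamReg active = { .base = 0x1000, .mask = 0xFFFF };
    sr_update(&active, 0x1040);          /* cache fills widen the register   */
    printf("snoop 0x1040 forwarded: %d\n", sr_match(&active, 0x1040));
    printf("snoop 0x9000 forwarded: %d\n", sr_match(&active, 0x9000));
    return 0;
}
```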
20080270708 | System and Method for Achieving Cache Coherency Within Multiprocessor Computer System - A system and method are disclosed for achieving cache coherency in a multiprocessor computer system having a plurality of sockets with processing devices and memory controllers and a plurality of memory blocks. In at least some embodiments, the system includes a plurality of node controllers capable of being respectively coupled to the respective sockets of the multiprocessor computer, a plurality of caching devices respectively coupled to the respective node controllers, and a fabric coupling the respective node controllers, by which cache line request signals can be communicated between the respective node controllers. Cache coherency is achieved notwithstanding the cache line request signals communicated between the respective node controllers due at least in part to communications between the node controllers and the respective caching devices to which the node controllers are coupled. In at least some embodiments, the caching devices track remote cache line ownership for processor and/or input/output hub caches. | 10-30-2008 |
20080276047 | APPARATUS, SYSTEM, AND METHOD FOR EFFICIENTLY VERIFYING WRITES - An apparatus, system, and method are disclosed for efficiently verifying writes. A storage module stores a plurality of data sets in a storage controller memory. A write module writes the plurality of data sets through a first write channel to a hard disk drive. A verification module verifies whether or not a representative data set of the plurality of data sets is successfully written through the first write channel. A mitigation module rewrites the plurality of data sets in response to an unsuccessful write of the representative data set. | 11-06-2008 |
20080288724 | METHOD AND APPARATUS FOR CACHE TRANSACTIONS IN A DATA PROCESSING SYSTEM - A plurality of new snoop transaction types are described. Some include address information in the requests, and others include cache entry information in the requests. Some responses include tag address information, and some do not. Some provide tag address content on the data bus lines during the data portion of the transaction. These new snoop transaction types are very helpful during debug of a data processing system. | 11-20-2008 |
20080288725 | METHOD AND APPARATUS FOR CACHE TRANSACTIONS IN A DATA PROCESSING SYSTEM - A plurality of new snoop transaction types are described. Some include address information in the requests, and others include cache entry information in the requests. Some responses include tag address information, and some do not. Some provide tag address content on the data bus lines during the data portion of the transaction. These new snoop transaction types are very helpful during debug of a data processing system. | 11-20-2008 |
20080294850 | METHOD AND APPARATUS FOR FILTERING SNOOP REQUESTS USING A SCOREBOARD - An apparatus for implementing snooping cache coherence that locally reduces the number of snoop requests presented to each cache in a multiprocessor system. A snoop filter device associated with a single processor includes one or more "scoreboard" data structures that make snoop determinations, i.e., for each snoop request from another processor, to determine if the request is to be forwarded to the processor or discarded. At least one scoreboard is active, and at least one scoreboard is determined to be historic at any point in time. A snoop determination by the scoreboard indicates that an entry may be in the cache, but does not indicate its actual residence status. In addition, the snoop filter block implementing scoreboard data structures is operatively coupled with cache wrap detection logic whereby, upon detection of a cache wrap condition, the content of the active scoreboard is copied into a historic scoreboard and the content of at least one active scoreboard is reset. | 11-27-2008 |
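A scoreboard of this kind behaves like a conservative membership filter: a set bit means a line may be resident, while a clear bit in both the active and historic scoreboards means the snoop can be safely discarded. A minimal sketch, with a hypothetical hash function and sizes:

```c
/* Sketch of the active/historic scoreboard filter in 20080294850. */
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define SB_BITS 1024
typedef struct { unsigned char bits[SB_BITS / 8]; } Scoreboard;

static Scoreboard active, historic;

static unsigned sb_hash(unsigned line) { return (line ^ (line >> 10)) % SB_BITS; }
static void sb_set(Scoreboard *s, unsigned line)  { unsigned h = sb_hash(line); s->bits[h / 8] |= 1u << (h % 8); }
static bool sb_test(const Scoreboard *s, unsigned line) { unsigned h = sb_hash(line); return s->bits[h / 8] & (1u << (h % 8)); }

static void on_cache_fill(unsigned line) { sb_set(&active, line); }

static void on_cache_wrap(void) {          /* every line replaced since last wrap */
    historic = active;                     /* active snapshot becomes historic    */
    memset(&active, 0, sizeof active);
}

/* forward the snoop only if the line may still be resident */
static bool snoop_filter(unsigned line) {
    return sb_test(&active, line) || sb_test(&historic, line);
}

int main(void) {
    on_cache_fill(0x40);
    on_cache_wrap();                       /* 0x40 survives only in historic */
    printf("forward 0x40: %d\n", snoop_filter(0x40));
    printf("forward 0x99: %d\n", snoop_filter(0x99));
    return 0;
}
```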
20080301377 | DATA PROCESSING SYSTEM, CACHE SYSTEM AND METHOD FOR UPDATING AN INVALID COHERENCY STATE IN RESPONSE TO SNOOPING AN OPERATION - A cache coherent data processing system includes at least first and second coherency domains. In a first cache memory within the first coherency domain of the data processing system, a coherency state field associated with a storage location and an address tag is set to a first data-invalid coherency state that indicates that the address tag is valid and that the storage location does not contain valid data. In response to snooping an exclusive access operation, the exclusive access request specifying a target address matching the address tag and indicating a relative domain location of a requestor that initiated the exclusive access operation, the first cache memory updates the coherency state field from the first data-invalid coherency state to a second data-invalid coherency state that indicates that the address tag is valid, that the storage location does not contain valid data, and whether a target memory block associated with the address tag is cached within the first coherency domain upon successful completion of the exclusive access operation based upon the relative location of the requestor. | 12-04-2008 |
20080307168 | Latency Reduction for Cache Coherent Bus-Based Cache - In one embodiment, a system comprises a plurality of agents coupled to an interconnect and a cache coupled to the interconnect. The plurality of agents are configured to cache data. A first agent of the plurality of agents is configured to initiate a transaction on the interconnect by transmitting a memory request, and other agents of the plurality of agents are configured to snoop the memory request from the interconnect. The other agents provide a response in a response phase of the transaction on the interconnect. The cache is configured to detect a hit for the memory request and to provide data for the transaction to the first agent prior to the response phase and independent of the response. | 12-11-2008 |
20080307169 | Method, Apparatus, System and Program Product Supporting Improved Access Latency for a Sectored Directory - A data processing system includes a coherence directory having a prefetch sector cache and a memory directory array containing a plurality of sectored entries. According to one method, in response to receiving a first directory lookup request specifying a first target address, an entry associated with the target address is accessed in the memory directory array. In response to the access, the coherence directory returns, as a result of the first directory lookup request, contents of a first sector that is identified by the target address as a requested sector. The coherence directory also caches contents of a second sector of the multiple sectors that is a non-requested sector for the first directory lookup request in a prefetch sector cache. In response to receiving a subsequent second directory lookup request specifying a second target address that identifies the second sector as a requested sector, the coherence directory accesses the contents of the second sector in the sector prefetch cache and returns the contents of the second sector as a result of the second directory lookup request. | 12-11-2008 |
20080320236 | System having cache snoop interface independent of system bus interface - A system includes processor units, caches, memory shared by the processor units, a system bus interface, and a cache snoop interface. Each processor unit has one of the caches. The system bus interface communicatively connects the processor units to the memory via at least the caches, and is a non-cache-snoop system bus interface. The cache snoop interface communicatively connects the caches, and is independent of the system bus interface. Upon a given processor unit writing a new value to an address within the memory, such that the new value and the address are cached within the cache of the given processor unit, a write invalidation event is sent over the cache snoop interface to the caches of the processor units other than the given processor unit. This event invalidates the address as stored within any of the caches other than the cache of the given processor unit. | 12-25-2008 |
20080320237 | SYSTEM CONTROLLER AND CACHE CONTROL METHOD - A multiprocessor system comprises a plurality of system controllers, each of which performs snoop processing regarding the cache device in its charge. The system controllers adjust the number of steps of a snoop pipeline for the snoop processing according to the communication time with the other system controllers. The number-of-steps adjustment absorbs the differences in communication time in the snoop results for each scale of the multiprocessor system. When a retry is called for by an address conflict or the like in the snoop processing, each of the system controllers resubmits the access to be retried to the snoop pipeline after waiting until no other access that may cause an address conflict precedes it. The resubmission timing prevents infinite repetition of snoop-processing retries in the system controllers. | 12-25-2008 |
20080320238 | Snoop control method and information processing apparatus - One aspect of the embodiments utilizes a system controller which has a broadcast transmitting and receiving unit that receives a memory access request from each CPU and notifies the other system controllers, and a snoop control unit that, when the memory access request from any of the CPUs is received, judges whether the object data conflicts with object data requested by a prior access request received earlier than the memory access request and whether the object data is present in any of the cache memories in the CPUs, selects the status of the cache memory of the CPU, notifies the other system controllers of a snoop processing result in which the selected status and the cache memory are associated, and sets a final status as the status of the system controller based on the priority of each status of the other system controllers. | 12-25-2008 |
20090006769 | PROGRAMMABLE PARTITIONING FOR HIGH-PERFORMANCE COHERENCE DOMAINS IN A MULTIPROCESSOR SYSTEM - A multiprocessor computing system and a method of logically partitioning a multiprocessor computing system are disclosed. The multiprocessor computing system comprises a multitude of processing units, and a multitude of snoop units. Each of the processing units includes a local cache, and the snoop units are provided for supporting cache coherency in the multiprocessor system. Each of the snoop units is connected to a respective one of the processing units and to all of the other snoop units. The multiprocessor computing system further includes a partitioning system for using the snoop units to partition the multitude of processing units into a plurality of independent, memory-consistent, adjustable-size processing groups. Preferably, when the processor units are partitioned into these processing groups, the partitioning system also configures the snoop units to maintain cache coherency within each of said groups. | 01-01-2009 |
20090006770 | NOVEL SNOOP FILTER FOR FILTERING SNOOP REQUESTS - A method and apparatus for supporting cache coherency in a multiprocessor computing environment having multiple processing units, each processing unit having one or more local cache memories associated and operatively connected therewith. The method comprises providing a snoop filter device associated with each processing unit, each snoop filter device having a plurality of dedicated input ports for receiving snoop requests from dedicated memory writing sources in the multiprocessor computing environment. Each snoop filter device includes a plurality of parallel operating port snoop filters in correspondence with the plurality of dedicated input ports, each port snoop filter implementing one or more parallel operating sub-filter elements that are adapted to concurrently filter snoop requests received from respective dedicated memory writing sources and forward a subset of those requests to its associated processing unit. | 01-01-2009 |
20090031085 | Directory for Multi-Node Coherent Bus - A method for maintaining cache coherency for a multi-node system using a specialized bridge which allows for fewer forward progress dependencies. A look-up of a local node directory is performed if a request received at a multi-node bridge of the local node is a system request. If a directory entry indicates that data specified in the request has a local owner or local destination, the request is forwarded to the local node. If the local node determines that the request is a local request, a look-up of the local node directory is performed. If the directory entry indicates that data specified in the request has a local owner and local destination, the coherency of the data on the local node is resolved and a transfer of the request data is performed if required. Otherwise, the request is forwarded to all remote nodes in the multi-node system. | 01-29-2009 |
20090031086 | Directory For Multi-Node Coherent Bus - A method for maintaining cache coherency for a multi-node system using a specialized bridge which allows for fewer forward progress dependencies. A local node makes a determination whether a request is a local or system request. If the request is a local request, a look-up of a directory in the local node is performed. If an entry in the directory of the local node indicates that data in the request does not have a remote owner and that the request does not have a remote destination, the coherency of the data is resolved on the local node, and a transfer of the data specified in the request is performed if required and if the request is a local request. If the entry indicates that the data has a remote owner or that the request has a remote destination, the request is forwarded to all remote nodes in the multi-node system. | 01-29-2009 |
20090031087 | MASK USABLE FOR SNOOP REQUESTS - A system comprises a plurality of cache agents, a computing entity coupled to the cache agents, and a programmable mask accessible to the computing entity. The programmable mask is indicative of, for at least one memory address, those cache agents that can receive a snoop request associated with a memory address. Based on the mask, the computing entity transmits snoop requests, associated with the memory address, to only those cache agents identified by the mask as cache agents that can receive a snoop request associated with the memory address. | 01-29-2009 |
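The mask itself reduces to one bit per cache agent, selected by address region. A sketch of the fan-out logic, with an invented region-to-mask lookup (the granularity and agent count are assumptions):

```c
/* Sketch of the programmable snoop mask in 20090031087: snoops go
 * only to agents whose bit is set for the address's region. */
#include <stdio.h>

#define N_AGENTS 8

static unsigned snoop_mask_for(unsigned addr) {
    /* hypothetical lookup: the top address bits select a region mask */
    static const unsigned region_mask[4] = { 0x03, 0xFF, 0x10, 0x00 };
    return region_mask[(addr >> 30) & 3];
}

static void send_snoops(unsigned addr) {
    unsigned mask = snoop_mask_for(addr);
    for (int agent = 0; agent < N_AGENTS; agent++)
        if (mask & (1u << agent))
            printf("snoop addr %#x -> agent %d\n", addr, agent);
}

int main(void) {
    send_snoops(0x00001234);    /* region 0: agents 0 and 1 only */
    send_snoops(0x40005678);    /* region 1: all eight agents    */
    return 0;
}
```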
20090070534 | MEMORY ACCESS MONITORING APPARATUS AND RELATED METHOD - A memory access controlling apparatus, for monitoring an access of a memory to generate a target watch signal, includes: at least one monitoring circuit, a setting unit and an output circuit. The monitoring circuit corresponds to an address of the memory and holds an access setting value. The monitoring circuit monitors the access of the memory according to the access setting value to generate an initial watch signal. The setting unit holds a setting value for triggering an exception, which is related to a condition for triggering the exception while the memory is accessed. The output circuit is coupled to the monitoring circuit and the setting unit, and is used for generating the target watch signal according to the initial watch signal and the setting value. | 03-12-2009 |
20090089512 | Adaptive Snoop-and-Forward Mechanisms for Multiprocessor Systems - In a network-based cache-coherent multiprocessor system, when a node receives a cache request, the node can perform an intra-node cache snoop operation and forward the cache request to a subsequent node in the network. A snoop-and-forward prediction mechanism can be used to predict whether lazy forwarding or eager forwarding is used in processing the incoming cache request. With lazy forwarding, the node cannot forward the cache request to the subsequent node until the corresponding intra-node cache snoop operation is completed. With eager forwarding, the node can forward the cache request to the subsequent node immediately, before the corresponding intra-node cache snoop operation is completed. Furthermore, the snoop-and-forward prediction mechanism can be enhanced seamlessly with an appropriate snoop filter to avoid unnecessary intra-node cache snoop operations. | 04-02-2009 |
20090106502 | Translation lookaside buffer snooping within memory coherent system - A node of a multiple-node system includes a translation lookaside buffer (TLB), a cache, and a TLB snoop mechanism. The node shares memory with other nodes of the multiple-node systems, and is connected with the other nodes via a bus. The TLB snooping mechanism snoops inbound memory access requests and/or outbound memory access requests. Inbound requests are received from over the bus and are intended for the cache. However, the cache receives only the inbound requests that relate to memory addresses having associated entries within the TLB. Outbound requests are received from within the node and are intended for transmission over the bus. However, the bus coherently transmits only the outbound requests that relate to memory addresses that are part of memory pages having set shared-memory page memory flags. All other outbound memory access requests are sent over the bus non-coherently. | 04-23-2009 |
20090113139 | Avoiding snoop response dependency - In one embodiment, the present invention includes a method for receiving a request for data in a home agent of a system from a first agent, prefetching the data from a memory and accessing a directory entry to determine whether a copy of the data is cached in any system agent, and forwarding the data to the first agent without waiting for snoop responses from other system agents if the directory entry indicates that the data is not cached. Other embodiments are described and claimed. | 04-30-2009 |
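The fast path here is that the memory prefetch and the directory read proceed in parallel, and the data is forwarded immediately when the directory entry shows no cached copies. A toy model, with the directory and memory as plain arrays and all names invented:

```c
/* Sketch of the home-agent flow in 20090113139: reply without waiting
 * for snoop responses when the directory says no agent caches the line. */
#include <stdbool.h>
#include <stdio.h>

#define LINES 16
static unsigned memory[LINES] = { [3] = 0xDEAD };
static bool dir_cached[LINES];           /* true if some agent may cache the line */

/* returns true when data can be forwarded immediately */
static bool home_read(unsigned line, unsigned *data) {
    *data = memory[line];                /* prefetch in parallel with directory  */
    if (!dir_cached[line])
        return true;                     /* uncached anywhere: fast path         */
    /* otherwise: broadcast snoops and wait for responses (not modeled) */
    return false;
}

int main(void) {
    unsigned d;
    if (home_read(3, &d)) printf("fast path data %#x\n", d);
    return 0;
}
```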
20090113140 | Reducing latency in responding to a snoop request - In one embodiment, the present invention includes a method for receiving a snoop request, providing the snoop request to a coherency engine along a first path and providing the snoop request to a bypass logic along a bypass path, and generating a speculative invalid snoop response in the bypass logic and forwarding the speculative invalid snoop response to indicate that an address associated with the snoop response is not present in a cache memory. Other embodiments are described and claimed. | 04-30-2009 |
20090119462 | REPEATED CONFLICT ACKNOWLEDGEMENTS IN A CACHE COHERENCY PROTOCOL - In a cache coherency protocol multiple conflict phases may be utilized to resolve a data request conflict condition. The multiple conflict phases may avoid buffering or stalling conflict resolution, which may reduce system inefficiencies. | 05-07-2009 |
20090150619 | COHERENT CACHING OF LOCAL MEMORY DATA - A multi processor system | 06-11-2009 |
20090150620 | Controlling cleaning of data values within a hardware accelerator - A data processing apparatus | 06-11-2009 |
20090177845 | SNOOP REQUEST MANAGEMENT IN A DATA PROCESSING SYSTEM - Snoop requests are managed in a data processing system having a cache coupled to a processor that provides access addresses to the cache. Snoop queue circuitry provides snoop addresses to the cache via an arbiter. The snoop queue circuitry has a snoop request queue for storing a plurality of entries. Each entry of the snoop request queue that corresponds to a snoop request includes a snoop address and a corresponding status indicator. The corresponding status indicator indicates whether the snoop request has zero or more collapsed snoop requests having a common snoop address which have been merged to form the snoop request. The status indicator is used for debug and by fullness management logic to manage the capacity of the snoop request queue. A general collapsed status signal is generated to indicate whenever any snoop queue entry collapsing occurs. | 07-09-2009 |
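The collapsing rule is the interesting part: a new snoop whose address matches a queued entry merges into it rather than consuming another slot, and the entry's status records the merge. A sketch under those assumptions (depth and field names are invented):

```c
/* Sketch of same-address snoop collapsing per 20090177845, including
 * the general collapsed-status signal. */
#include <stdbool.h>
#include <stdio.h>

#define QDEPTH 8
typedef struct { unsigned addr; unsigned collapsed; bool valid; } Entry;

static Entry q[QDEPTH];
static bool any_collapse;                        /* general collapsed status signal */

static bool post_snoop(unsigned addr) {
    for (int i = 0; i < QDEPTH; i++)
        if (q[i].valid && q[i].addr == addr) {   /* merge with the queued request   */
            q[i].collapsed++;
            any_collapse = true;
            return true;
        }
    for (int i = 0; i < QDEPTH; i++)             /* otherwise take a free entry     */
        if (!q[i].valid) { q[i] = (Entry){ addr, 0, true }; return true; }
    return false;                                /* queue full                      */
}

int main(void) {
    post_snoop(0x80); post_snoop(0x80); post_snoop(0xC0);
    for (int i = 0; i < QDEPTH; i++)
        if (q[i].valid)
            printf("addr %#x, collapsed %u\n", q[i].addr, q[i].collapsed);
    printf("any collapse: %d\n", any_collapse);
    return 0;
}
```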
20090177846 | Retry Mechanism - An interface unit may comprise a buffer configured to store requests that are to be transmitted on an interconnect and a control unit coupled to the buffer. In one embodiment, the control unit is coupled to receive a retry response from the interconnect during a response phase of a first transaction for a first request stored in the buffer. The control unit is configured to record an identifier supplied on the interconnect with the retry response that identifies a second transaction that is in progress on the interconnect. The control unit is configured to inhibit reinitiation of the first transaction at least until detecting a second transmission of the identifier. In another embodiment, the control unit is configured to assert a retry response during a response phase of a first transaction responsive to a snoop hit of the first transaction on a first request stored in the buffer for which a second transaction is in progress on the interconnect. The control unit is further configured to provide an identifier of the second transaction with the retry response. | 07-09-2009 |
20090198914 | DATA PROCESSING SYSTEM, PROCESSOR AND METHOD IN WHICH AN INTERCONNECT OPERATION INDICATES ACCEPTABILITY OF PARTIAL DATA DELIVERY - According to at least one embodiment, a method of data processing in a multiprocessor data processing system includes a requesting processing unit initiating an interconnect operation including a memory access request that indicates an acceptability of a variable amount of data to service the interconnect request for data. In response to snooping the memory access request on an interconnect, a snooper selects an amount of data to supply to the requesting processing unit and transmits the selected amount of data to the requesting processing unit. The requesting processing unit receives the selected amount of data and utilizes at least some of the selected amount of data to service a processor request. | 08-06-2009 |
20090198915 | DATA PROCESSING SYSTEM, PROCESSOR AND METHOD THAT DYNAMICALLY SELECT A MEMORY ACCESS SIZE - A method of data processing in a processing unit supported by a memory hierarchy includes the processing unit performing a plurality of memory accesses to the memory hierarchy. The plurality of memory accesses includes one or more memory accesses targeting a full cache line of data. The processing unit monitors utilization of data accessed by the plurality of memory accesses, and based upon the utilization of the data, dynamically alters a memory access mode of operation so that a subsequent storage-modifying memory access targets less than a full cache line of data. | 08-06-2009 |
20090240892 | SELECTIVE INTERCONNECT TRANSACTION CONTROL FOR CACHE COHERENCY MAINTENANCE - A data processing system | 09-24-2009 |
20090240893 | Information processing device, memory control method, and memory control device - The present invention provides an information processing device, a memory control method, and a memory control device. In the information processing device, which includes nodes each having a main memory and a processor including a cache memory, the system controller of at least one of the nodes is designed to include a holding unit that holds specific information about primary data present in the main memory of its own node when the cache data corresponding to the primary data is not present in the cache memories of the nodes other than its own node. With this structure, the latency of each memory access is shortened, and the throughput of each snoop operation is improved. | 09-24-2009 |
20090240894 | METHOD AND APPARATUS FOR THE SYNCHRONIZATION OF DISTRIBUTED CACHES - A method and apparatus for the synchronization of distributed caches. More particularly, the present invention relates to cache memory systems, and more particularly to a hierarchical caching protocol suitable for use with distributed caches, including use within a caching input/output (I/O) hub. | 09-24-2009 |
20090276580 | SNOOP REQUEST MANAGEMENT IN A DATA PROCESSING SYSTEM - In a data processing system, a method includes a first master initiating a transaction via a system interconnect to a target device. After initiating the transaction, a snoop request corresponding to the transaction is provided to a cache of a second master. The transaction is completed. After completing the transaction, a snoop lookup operation corresponding to the snoop request in the cache of the second master is performed. The transaction may be completed prior to or after providing the snoop request. In response to performing the snoop lookup operation, a snoop response may be provided, where the snoop response is provided after completing the transaction. When the snoop response indicates an error, a snoop error may be provided to the first master. | 11-05-2009 |
20090276581 | METHOD, SYSTEM AND APPARATUS FOR REDUCING MEMORY TRAFFIC IN A DISTRIBUTED MEMORY SYSTEM - The present disclosure provides a method for reducing memory traffic in a distributed memory system. The method may include storing a presence vector in a directory of a memory slice, said presence vector indicating whether a line in local memory has been cached. The method may further include protecting said memory slice from cache coherency violations via a home agent configured to transmit and receive data from said memory slice, said home agent configured to store a copy of said presence vector. The method may also include receiving a request for a block of data from at least one processing node at said home agent and comparing said presence vector with said copy of said presence vector stored in said home agent. The method may additionally include eliminating a write update operation between said home agent and said directory if said presence vector and said copy are equivalent. Of course, many alternatives, variations and modifications are possible without departing from this embodiment. | 11-05-2009 |
20090319727 | Efficient Load Queue Snooping - In one embodiment, a processor comprises a data cache and a load/store unit (LSU). The LSU comprises a queue and a control unit, and each entry in the queue is assigned to a different load that has accessed the data cache but has not retired. The control unit is configured to update the data cache hit status of each load represented in the queue as a content of the data cache changes. The control unit is configured to detect a snoop hit on a first load in a first entry of the queue responsive to: the snoop index matching a load index stored in the first entry, the data cache hit status of the first load indicating hit, the data cache detecting a snoop hit for the snoop operation, and a load way stored in the first entry matching a first way of the data cache in which the snoop operation is a hit. | 12-24-2009 |
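The snoop-hit test reduces to a conjunction of four conditions on each queue entry. A sketch of that predicate, with hypothetical field names:

```c
/* Sketch of the load-queue snoop-hit condition in 20090319727. */
#include <stdbool.h>
#include <stdio.h>

typedef struct {
    unsigned index;     /* cache index the load accessed            */
    unsigned way;       /* way the load hit in                      */
    bool     was_hit;   /* kept up to date as cache contents change */
} LoadEntry;

static bool snoop_hits_load(const LoadEntry *ld, unsigned snoop_index,
                            bool snoop_hit_cache, unsigned snoop_hit_way) {
    return ld->was_hit                    /* load is tracked as a cache hit */
        && snoop_hit_cache                /* the snoop itself hit the cache */
        && ld->index == snoop_index       /* same set                       */
        && ld->way == snoop_hit_way;      /* same way                       */
}

int main(void) {
    LoadEntry ld = { .index = 0x2A, .way = 1, .was_hit = true };
    printf("%d\n", snoop_hits_load(&ld, 0x2A, true, 1));  /* 1: snoop hit */
    printf("%d\n", snoop_hits_load(&ld, 0x2A, true, 0));  /* 0: wrong way */
    return 0;
}
```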
20090327616 | SNOOP FILTERING MECHANISM - A system and method for selectively transmitting probe commands and reducing network traffic. Directory entries are maintained to filter probe command and response traffic for certain coherent transactions. Rather than storing directory entries in a dedicated directory storage, directory entries may be stored in designated locations of a shared cache memory subsystem, such as an L3 cache. Directory entries are stored within the shared cache memory subsystem to provide indications of lines (or blocks) that may be cached in exclusive-modified, owned, shared, shared-one, or invalid coherency states. The absence of a directory entry for a particular line may imply that the line is not cached anywhere in a computing system. | 12-31-2009 |
20100005246 | SATISFYING MEMORY ORDERING REQUIREMENTS BETWEEN PARTIAL READS AND NON-SNOOP ACCESSES - A method and apparatus for preserving memory ordering in a cache coherent link based interconnect in light of partial and non-coherent memory accesses is herein described. In one embodiment, a partial memory access, such as a partial read, is implemented utilizing a Read Invalidate and/or Snoop Invalidate message. When a peer node receives a Snoop Invalidate message referencing data from a requesting node, the peer node is to invalidate a cache line associated with the data and is not to directly forward the data to the requesting node. In one embodiment, when the peer node holds the referenced cache line in a Modified coherency state, in response to receiving the Snoop Invalidate message, the peer node is to write back the data to a home node associated with the data. | 01-07-2010 |
20100005247 | Method and Apparatus for Global Ordering to Insure Latency Independent Coherence - A method and apparatus is described for insuring coherency between memories in a multi-agent system where the agents are interconnected by one or more fabrics. A global arbiter is used to segment coherency into three phases: request; snoop; and response, and to apply global ordering to the requests. A bus interface having request, snoop, and response logic is provided for each agent. A bus interface having request, snoop and response logic is provided for the global arbiter, and a bus interface is provided to couple the global arbiter to each type of fabric it is responsible for. Global ordering and arbitration logic tags incoming requests from the multiple agents and insures that snoops are responded to according to the global order, without regard to latency differences in the fabrics. | 01-07-2010 |
20100005248 | PSEUDO LEAST RECENTLY USED REPLACEMENT/ALLOCATION SCHEME IN REQUEST AGENT AFFINITIVE SET-ASSOCIATIVE SNOOP FILTER - The storage locations of a snoop filter are segregated into a number of groups, and some groups are associated with some processors in a system. When new data enter a cache line of a processor, one of the storage locations associated with the processor is selected for further operations. | 01-07-2010 |
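Restricting replacement to the requesting agent's group means the pseudo-LRU state is kept per group and consulted only within it. The sketch below uses a conventional 4-way tree-PLRU as a stand-in policy; the group layout and widths are assumptions, not the claimed scheme:

```c
/* Sketch of group-restricted pseudo-LRU allocation per 20100005248. */
#include <stdio.h>

/* 4 ways per group; 3 PLRU bits: bit0 picks the pair, bit1/bit2 pick
 * the way within the left/right pair.  Pick the way the bits point
 * at, then flip the path bits so they point away from it. */
static int plru_pick(unsigned *s) {
    int v = (*s & 1) ? ((*s & 4) ? 3 : 2) : ((*s & 2) ? 1 : 0);
    *s ^= 1;                      /* flip root            */
    *s ^= (v < 2) ? 2 : 4;        /* flip the pair bit    */
    return v;
}

int main(void) {
    unsigned plru_state[2] = { 0, 0 };   /* one PLRU state per CPU group  */
    int cpu = 1;                         /* requesting processor          */
    int way_in_group = plru_pick(&plru_state[cpu]);
    printf("allocate way %d (CPU %d's group)\n", way_in_group + cpu * 4, cpu);
    return 0;
}
```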
20100011171 | CACHE CONSISTENCY IN A MULTIPROCESSOR SYSTEM WITH SHARED MEMORY - A process to make the cache memory of a processor consistent includes the processor processing a request to write data to an address in its memory marked as being in the shared state. The address is transmitted to the other processors, data are written into the processor's cache memory and the address changes to the modified state. An appended memory associated with the processor memorizes the address, the data and an associated marker in a first state. The processor then receives the address with an indicator. If the indicator indicates that the processor must perform the operation and if the associated marker is in the first state, the data are kept in the modified state. If the indicator does not indicate that the processor must perform the operation and if the processor receives an order to mark the data to be in the invalid state, the marker changes to a second state. | 01-14-2010 |
20100042787 | CACHE INJECTION DIRECTING TOOL - A method for directing cache injection based on actual system load may include providing a snooping-based fabric having two or more bus-coupled units. At least one of the bus-coupled units may be configured as an injection unit for directing cache injection. A snoop request may be transmitted from the injection unit to one or more destination units among the other bus-coupled units. The snoop request may include an identification value having a function identifier. The function identifier may identify a destination function for the cache injection, where the destination function is configured to run on the destination unit. A snoop response may be transmitted from the destination unit to the injection unit in response to the snoop request. The snoop response may include a function response value indicating whether the function identifier matches a function indication of a snoop register for the destination unit. | 02-18-2010 |
20100057997 | CACHE SNOOP LIMITING WITHIN A MULTIPLE MASTER DATA PROCESSING SYSTEM - In a data processing system, accesses to a cache in response to access requests from first processing circuitry and snoop requests resulting from a transaction performed by second processing circuitry are arbitrated. Accesses to the cache are monitored to determine if the first processing circuitry is prevented from accessing the cache for more than a threshold amount of time. A signal is generated to indicate when the first processing circuitry has been prevented from accessing the cache for more than the threshold amount of time. | 03-04-2010 |
20100057998 | SNOOP REQUEST ARBITRATION IN A DATA PROCESSING SYSTEM - A snoop look-up operation is performed in a system having a cache and a first processor. The processor generates requests to the cache for data. A snoop queue is loaded with snoop requests; the fullness of the snoop queue is a measure of how many snoop requests it holds. A snoop look-up operation is performed in the cache if the fullness of the snoop queue exceeds a threshold. The snoop look-up operation is based on a snoop request from the snoop queue corresponding to an entry in the snoop queue. If the fullness of the snoop queue does not exceed the threshold, the snoop look-up operation waits until an idle access request cycle from the processor to the cache occurs, and is then performed in the cache upon that idle cycle. | 03-04-2010 |
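The arbitration rule can be written as a single priority function: the snoop queue preempts the processor only when it is fuller than the threshold, and otherwise drains during idle cycles. A sketch with illustrative constants:

```c
/* Sketch of the fullness-threshold arbitration in 20100057998. */
#include <stdbool.h>
#include <stdio.h>

#define THRESHOLD 6

typedef enum { GRANT_PROCESSOR, GRANT_SNOOP, GRANT_NONE } Grant;

static Grant arbitrate(int snoop_queue_fullness, bool processor_requesting) {
    if (snoop_queue_fullness > THRESHOLD) return GRANT_SNOOP;   /* force the lookup */
    if (processor_requesting)             return GRANT_PROCESSOR;
    if (snoop_queue_fullness > 0)         return GRANT_SNOOP;   /* idle cycle       */
    return GRANT_NONE;
}

int main(void) {
    printf("%d\n", arbitrate(7, true));   /* queue above threshold: snoop wins */
    printf("%d\n", arbitrate(3, true));   /* processor wins                    */
    printf("%d\n", arbitrate(3, false));  /* idle cycle: snoop proceeds        */
    return 0;
}
```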
20100057999 | SYNCHRONIZATION MECHANISM FOR USE WITH A SNOOP QUEUE - In a data processing system each bus master of a plurality of bus masters communicates information via a system interconnect. A cache is associated with a predetermined bus master of the plurality of bus masters for storing information used by the predetermined bus master. A snoop queue is associated with the predetermined bus master for storing a plurality of snoop requests and selectively storing for each snoop request an indicator of a synchronization request that indicates a synchronization operation is to be performed by completing any previously issued snoop requests prior to or concurrently with completion of the synchronization operation. In one form the indicator is a synchronization request indicator flag for each entry in the snoop queue that indicates whether each entry participates in the synchronization operation associated with the synchronization request. | 03-04-2010 |
20100058000 | SNOOP REQUEST ARBITRATION IN A DATA PROCESSING SYSTEM - A snoop look-up operation is performed in a system having a first cache and a first processor. The first processor generates access requests to the first cache for data. Snoop look-up operations are performed in the cache. The snoop look-up operations are based on snoop requests from a snoop queue; the snoop requests correspond to entries in the snoop queue. An access request from the first processor is performed in response to a consecutive number of snoop look-up operations exceeding a first limit. This is useful for avoiding a situation in which no processor operations occur while snoop look-up operations are performed. Similarly, consecutive access requests can be counted, and if a second limit is exceeded, a snoop look-up operation can be performed. | 03-04-2010 |
20100064107 | MICROPROCESSOR CACHE LINE EVICT ARRAY - An apparatus for ensuring data coherency within a cache memory hierarchy of a microprocessor during an eviction of a cache line from a lower-level memory to a higher-level memory in the hierarchy includes an eviction engine and an array of storage elements. The eviction engine is configured to move the cache line from the lower-level memory to the higher-level memory. The array of storage elements are coupled to the eviction engine. Each storage element is configured to store an indication for a corresponding cache line stored in the lower-level memory. The indication indicates whether or not the eviction engine is currently moving the cache line from the lower-level memory to the higher-level memory. | 03-11-2010 |
20100064108 | Data processing apparatus and method for managing snoop operations - The present invention provides a data processing apparatus and method for managing snoop operations. The data processing apparatus has a plurality of processing units for performing data processing operations requiring access to data in shared memory, with at least two of the processing units having a cache associated therewith for storing a subset of the data for access by that processing unit. A snoop-based cache coherency protocol is employed to ensure data accessed by each processing unit is up-to-date, and when an access request is issued the cache coherency protocol is referenced in order to determine whether a snoop process is required. Snoop control storage is provided which defines a plurality of snoop schemes, each snoop scheme defining a series of snoop phases to be performed to implement the snoop process, and each snoop phase requiring a snoop operation to be performed on either a single cache or multiple caches. When a snoop process is required, a snoop unit is used to reference the snoop control storage in order to identify, having regard to one or more properties of the access request, the snoop scheme to be employed to perform the snoop process. Such an approach provides a great deal of flexibility with regards to how snoop processes are implemented, in particular allowing different snoop schemes to be used dependent on the properties of the access request in question. | 03-11-2010 |
20100106916 | Data Cache Block Zero Implementation - In one embodiment, a processor comprises a core configured to execute a data cache block write instruction and an interface unit coupled to the core and to an interconnect on which the processor is configured to communicate. The core is configured to transmit a request to the interface unit in response to the data cache block write instruction. If the request is speculative, the interface unit is configured to issue a first transaction on the interconnect. On the other hand, if the request is non-speculative, the interface unit is configured to issue a second transaction on the interconnect. The second transaction is different from the first transaction. For example, the second transaction may be an invalidate transaction and the first transaction may be a probe transaction. In some embodiments, the processor may be in a system including the interconnect and one or more caching agents. | 04-29-2010 |
20100131719 | Early Response Indication for data retrieval in a multi-processor computing system - A data processing system is described that reduces read latency of requested memory data, thereby resulting in improved system performance. An exemplary system includes a bus, a processor, and a controller associated with the processor. The controller is configured to send a request for data to a memory storage unit, receive, from the memory storage unit, an early response indicating that the controller will later receive the requested data, and upon receipt of the early response indicator, start a timer to wait a period of time. The controller is further configured to, after expiration of the timer but prior to receipt of the requested data, send an arbitration request to initiate a transaction on the bus to communicate the requested data from the controller to the processor when the requested data is later received by the controller. | 05-27-2010 |
20100138615 | HANDLING DIRECT MEMORY ACCESSES - Methods and systems for efficiently processing direct memory access requests coherently. An external agent requests data from the memory system of a computer system at a target address. A snoop cache determines if the target address is within an address range known to be safe for external access. If the snoop cache determines that the target address is safe, the external agent proceeds with the direct memory access. If the snoop cache does not determine if the target address is safe, then the snoop cache forwards the request on to the processor. After the processor resolves any coherency problems between itself and the memory system, the processor signals the external agent to proceed with the direct memory access. The snoop cache can determine safe address ranges from such processor activity. The snoop cache invalidates its safe address ranges by observing traffic between the processor and the memory system. | 06-03-2010 |
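The safe-address check amounts to a small range table consulted on every DMA request. A sketch, with invented ranges and table size; anything outside the table is forwarded to the processor for coherency resolution first:

```c
/* Sketch of the safe-range snoop cache in 20100138615. */
#include <stdbool.h>
#include <stdio.h>

#define N_RANGES 4
typedef struct { unsigned lo, hi; } Range;

static Range safe[N_RANGES] = { { 0x1000, 0x1FFF }, { 0x8000, 0xBFFF } };

static bool dma_may_proceed(unsigned addr) {
    for (int i = 0; i < N_RANGES; i++)
        if (addr >= safe[i].lo && addr <= safe[i].hi)
            return true;             /* known safe: no processor round trip */
    return false;                    /* forward to the processor to resolve */
}

int main(void) {
    printf("0x1800 direct: %d\n", dma_may_proceed(0x1800));
    printf("0x5000 direct: %d\n", dma_may_proceed(0x5000));
    return 0;
}
```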
20100146217 | MEMORY INTERFACE DEVICE AND METHODS THEREOF - A data processing device includes a load/store module to provide an interface between a processor device and a bus. In response to receiving a load or store instruction from the processor device, the load/store module determines a predicted coherency state of a cache line associated with the load or store instruction. Based on the predicted coherency state, the load/store module selects a bus transaction and communicates it to the bus. By selecting the bus transaction based on the predicted cache state, the load/store module does not have to wait for all pending bus transactions to be serviced, providing for greater predictability as to when bus transactions will be communicated to the bus, and allowing the bus behavior to be more easily simulated. | 06-10-2010 |
20100146218 | System And Method For Maintaining Cache Coherency Across A Serial Interface Bus - A method for executing processing operations using data stored in a memory. The method includes generating a snoop request configured to determine whether first data stored in a local memory is coherent relative to second data stored in a data cache, the snoop request including destination information that identifies the data cache on a bus, and a cache line address identifying where in the data cache the second data is located. The method further includes causing the snoop request to be transmitted over the bus to the second processor, extracting the cache line address from the snoop request, determining whether the second data is coherent, generating a complete message that includes completion information indicating that the first data is coherent with the second data, and causing the complete message to be transmitted over the bus to the first processor. | 06-10-2010 |
20100161907 | POSTING WEAKLY ORDERED TRANSACTIONS - A processor may comprise a core area, a control unit, and an uncore area. The core area may comprise multiple processing cores and line-fill buffers. A first processing core of the core area may store a first weakly ordered transaction in a first line-fill buffer. The first processing core may offload the first weakly ordered transaction to the extended buffer space provisioned in the uncore area after receiving a request from the uncore area. The first processing core may then de-allocate the first line-fill buffer after the first weakly ordered transaction is offloaded to the extended buffer space. The uncore area may then post the first weakly ordered transaction to a memory or a memory system. The control unit may track the first weakly ordered transaction to ensure that it is posted to the memory or the memory system. | 06-24-2010 |
20100169582 | Obtaining data for redundant multithreading (RMT) execution - In one embodiment, the present invention includes a method for providing a cache block in an exclusive state to a first cache and providing the same cache block in the exclusive state to a second cache when cores accessing the two caches are executing redundant threads. Other embodiments are described and claimed. | 07-01-2010 |
20100180085 | HANDLING OF MEMORY ACCESS REQUESTS TO SHARED MEMORY IN A DATA PROCESSING APPARATUS - A data processing apparatus and method are provided for handling memory access requests to shared memory. The data processing apparatus has a plurality of processing units, at least one of which is configured to be switchable between an active power state and a dormant power state. The processing units share a memory, and at least one local storage unit is configured to store a local copy of a data item stored in the memory for access by an associated processing unit. A snoop control unit is configured to monitor memory access requests issued by the processing units and to store in the snoop control unit indications of the local copies of data items stored in each local storage unit. When a memory access request for a requested data item is issued by one processing unit, if the snoop control unit has a stored indication that a local storage unit belonging to another, dormant processing unit has a local copy of that data item, and the cache coherency protocol requires that the local copy of the requested data item stored in the local storage unit associated with the other processing unit be invalidated, the snoop control unit stores a marker indicating that that other local copy should later be invalidated. This approach ensures that the correct behaviour according to a cache coherency protocol for a shared memory is carried out, without losing the benefits of being able to put one of the processors of the multi-processor system into a dormant power state and without the latency of repeated power state switching. | 07-15-2010 |
20100185821 | Local cache power control within a multiprocessor system - A data processing system including a plurality of processors | 07-22-2010 |
20100191920 | Providing Address Range Coherency Capability To A Device - In one embodiment, the present invention includes a method for receiving a memory request from a device coupled to an input/output (IO) interconnect, accessing a mapping table associated with the IO interconnect to determine if an address range including an address of the memory request is coherent, and if so, sending the memory request and a coherency indicator to indicate the coherent state of data at the address, otherwise sending the memory request and the coherency indicator to indicate a non-coherent state. Other embodiments are described and claimed. | 07-29-2010 |
20100199051 | CACHE COHERENCY IN A SHARED-MEMORY MULTIPROCESSOR SYSTEM - A method of making cache memories of a plurality of processors coherent with a shared memory includes one of the processors determining whether an external memory operation is needed for data that is to be maintained coherent. If so, the processor transmits a cache coherency request to a traffic-monitoring device. The traffic-monitoring device transmits memory operation information to the plurality of processors, which includes an address of the data. Each of the processors determines whether the data is in its cache memory and whether a memory operation is needed to make the data coherent. Each processor also transmits to the traffic-monitoring device a message that indicates a state of the data and the memory operation that it will perform on the data. The processors then perform the memory operations on the data. The traffic-monitoring device performs the transmitted memory operations in a fixed order that is based on the states of the data in the processors' cache memories. | 08-05-2010 |
20100205377 | DEBUG CONTROL FOR SNOOP OPERATIONS IN A MULTIPROCESSOR SYSTEM AND METHOD THEREOF - A data processing system has a cache which receives both non-debug snoop requests and debug snoop requests for processing. The non-debug snoop requests are generated in response to transactions snooped from a system interconnect. Debug control circuitry that is coupled to the cache provides the debug snoop requests to the cache for processing. The debug snoop requests are generated in response to debug snoop commands from a debugger and without the use of the system interconnect. In one form, snoop circuitry has a snoop request queue having a plurality of entries, each entry for storing a snoop request. A debug indicator corresponding to each snoop request indicates whether the snoop request is a debug snoop request or a non-debug snoop request. | 08-12-2010 |
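As a rough illustration of the snoop queue with per-entry debug indicators described in 20100205377 above, here is a minimal C sketch; the struct layout, queue depth, and function names are assumptions for illustration, not the patent's implementation.

```c
/* Hypothetical model of a snoop queue whose entries carry a debug
 * indicator; all names and the depth are illustrative. */
#include <stdbool.h>
#include <stdio.h>

#define QUEUE_DEPTH 8

typedef struct {
    unsigned addr;   /* cache line address to snoop    */
    bool     debug;  /* true: injected by the debugger */
    bool     valid;
} SnoopEntry;

static SnoopEntry queue[QUEUE_DEPTH];
static int tail;

/* Non-debug requests come from transactions snooped off the system
 * interconnect; debug requests come from debug control circuitry. */
static bool enqueue_snoop(unsigned addr, bool from_debugger)
{
    if (tail == QUEUE_DEPTH) return false;       /* queue full */
    queue[tail++] = (SnoopEntry){ addr, from_debugger, true };
    return true;
}

int main(void)
{
    enqueue_snoop(0x1000, false);  /* snooped from the interconnect */
    enqueue_snoop(0x2000, true);   /* injected by a debug command   */
    for (int i = 0; i < tail; i++)
        printf("addr 0x%x %s\n", queue[i].addr,
               queue[i].debug ? "(debug)" : "(non-debug)");
    return 0;
}
```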
20100205378 | METHOD FOR DEBUGGER INITIATED COHERENCY TRANSACTIONS USING A SHARED COHERENCY MANAGER - A data processing system includes a system interconnect, a first interconnect master coupled to the system interconnect, a second interconnect master coupled to the system interconnect, and a cache coherency manager coupled to the first and second interconnect masters. The first interconnect master includes a cache. The cache coherency manager provides debug cache coherency operations and non-debug cache coherency operations to the first interconnect master. The cache coherency manager generates the debug cache coherency operations in response to debug cache coherency commands from a debugger and generates the non-debug cache coherency operations in response to transactions performed by the second interconnect master on the system interconnect. | 08-12-2010 |
20100205379 | CACHE-LINE BASED NOTIFICATION - Embodiments of the invention provide a method, system, and computer program product for cache-line based notification. An embodiment of the method comprises injecting a cache-line including notification information into a cache of a processing unit, marking the cache-line as having the notification information, and using the notification information to notify a processing thread of the presence of the cache-line in the cache. In an embodiment, the cache-line identifies a thread affiliation. In an embodiment, a multitude of threads operate in the processing unit, and the using includes notifying a plurality of these threads of the presence of the cache-line in the cache, and analyzing the cache-line to identify this plurality of threads. The cache may include a plurality of cache-lines, each of which includes a notification, and the processing unit thread uses these notifications to form a linked list of at least some of the cache-lines. | 08-12-2010 |
20100205380 | Cache Coherent Switch Device - In one embodiment, the present invention includes a switch device to be coupled between a first semiconductor component and a processor node by interconnects of a communication protocol that provides for cache coherent transactions and non-cache coherent transactions. The switch device includes logic to handle cache coherent transactions from the first semiconductor component to the processor node, while the first semiconductor component does not include such logic. Other embodiments are described and claimed. | 08-12-2010 |
20100250862 | SYSTEM CONTROLLER, INFORMATION PROCESSING SYSTEM, AND ACCESS PROCESSING METHOD - A system controller includes an output unit which transfers an access request from an access source coupled to the system controller to another system controller; a local snoop control unit that determines whether a destination of the access request from the access source is a local memory unit coupled to the system controller, and locks the destination when the destination is the local memory unit; a receiving unit which receives the access request from the output unit and an access request from another system controller; a global snoop control unit which sends a response indicating whether the access request is executable or not, and controls locking of the destination of the access request when the destination is the local memory unit; and an access processing unit which unlocks the locking and accesses the memory unit when the access request from the access source becomes executable. | 09-30-2010 |
20100262787 | TECHNIQUES FOR CACHE INJECTION IN A PROCESSOR SYSTEM BASED ON A SHARED STATE - A technique for performing cache injection includes monitoring, at a host fabric interface, snoop responses to an address on a bus. When the snoop responses indicate a data block associated with the address is in a shared state, input/output data associated with the address on the bus is directed to a cache that includes the data block in the shared state and is located physically closer to the host fabric interface than one or more other caches that include the data block associated with the address in the shared state. | 10-14-2010 |
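The selection rule in 20100262787 above (inject I/O data into the sharing cache physically closest to the host fabric interface) can be sketched in a few lines of C; the hop-count distance metric and all identifiers are illustrative assumptions.

```c
#include <stdio.h>
#include <limits.h>
#include <stdbool.h>

typedef struct {
    int  cache_id;
    bool shared;    /* snoop response: line held in a shared state */
    int  hops;      /* illustrative distance from the host fabric interface */
} SnoopResponse;

/* Direct I/O data to the sharing cache physically closest to the
 * host fabric interface. Returns -1 if no cache holds the line. */
static int pick_injection_target(const SnoopResponse *r, int n)
{
    int best = -1, best_hops = INT_MAX;
    for (int i = 0; i < n; i++)
        if (r[i].shared && r[i].hops < best_hops) {
            best = r[i].cache_id;
            best_hops = r[i].hops;
        }
    return best;
}

int main(void)
{
    SnoopResponse resp[] = { {0, true, 3}, {1, false, 1}, {2, true, 2} };
    printf("inject into cache %d\n", pick_injection_target(resp, 3));
    return 0;
}
```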
20100262788 | PRE-COHERENCE CHANNEL - A cache architecture to increase communication throughput and reduce stalls due to coherence protocol dependencies. More particularly, embodiments of the invention include multiple cache agents that each communicate with the same protocol agent. In one embodiment, a pre-coherence channel couples the cache agents to the protocol agent to enable the protocol agent to receive events corresponding to cache operations from the cache agents to maintain ordering with respect to the cache operation events. | 10-14-2010 |
20100268896 | TECHNIQUES FOR CACHE INJECTION IN A PROCESSOR SYSTEM FROM A REMOTE NODE - A technique for performing cache injection in a processor system includes monitoring, by a cache, addresses on a bus. Input/output data associated with an address of a data block stored in the cache is then requested from a remote node, via a network controller. Ownership of the input/output data is acquired by the cache when an address on the bus that is associated with the input/output data corresponds to the address of the data block stored in the cache. | 10-21-2010 |
20100274975 | Forming Multiprocessor Systems Using Dual Processors - In one embodiment, link logic of a multi-chip processor (MCP) formed using multiple processors may interface with a first point-to-point (PtP) link coupled between the MCP and an off-package agent and another PtP link coupled between first and second processors of the MCP, where the on-package PtP link operates at a greater bandwidth than the first PtP link. Other embodiments are described and claimed. | 10-28-2010 |
20100287341 | Wake-and-Go Mechanism with System Address Bus Transaction Master - A wake-and-go mechanism is provided for a data processing system. The wake-and-go mechanism is configured to issue a look-ahead load command on a system bus to read a data value from a target address and perform a comparison operation to determine whether the data value at the target address indicates that an event for which a thread is waiting has occurred. In response to the comparison resulting in a determination that the event has not occurred, the wake-and-go engine populates the wake-and-go storage array with the target address and snoops the target address on the system bus. | 11-11-2010 |
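A hedged C sketch of the wake-and-go flow in 20100287341 above: a look-ahead load and compare, population of the storage array when the event has not yet occurred, and a wake on a matching snooped bus write. The array size, field names, and the exact-match wake condition are assumptions for illustration.

```c
#include <stdio.h>
#include <stdbool.h>

#define ARRAY_SLOTS 4

/* one entry of the wake-and-go storage array */
typedef struct { unsigned addr; int expected; bool in_use; } WakeEntry;
static WakeEntry wake_array[ARRAY_SLOTS];

/* Issue the look-ahead load and compare; if the event has not yet
 * occurred, record the address so bus snooping can watch it. */
static bool lookahead_check(unsigned addr, int value_at_addr, int expected)
{
    if (value_at_addr == expected)
        return true;                       /* event already occurred */
    for (int i = 0; i < ARRAY_SLOTS; i++)
        if (!wake_array[i].in_use) {       /* populate the array     */
            wake_array[i] = (WakeEntry){ addr, expected, true };
            break;
        }
    return false;                          /* thread should sleep    */
}

/* Called for every store observed on the system bus: wake the
 * thread when a snooped write makes the comparison succeed. */
static bool on_bus_write(unsigned addr, int new_value)
{
    for (int i = 0; i < ARRAY_SLOTS; i++)
        if (wake_array[i].in_use && wake_array[i].addr == addr &&
            wake_array[i].expected == new_value) {
            wake_array[i].in_use = false;
            return true;                   /* wake the waiting thread */
        }
    return false;
}

int main(void)
{
    if (!lookahead_check(0x40, /*value*/0, /*expected*/1))
        printf("event pending; snooping 0x40\n");
    if (on_bus_write(0x40, 1))
        printf("snoop matched; thread woken\n");
    return 0;
}
```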
20100287342 | PROCESSING OF COHERENT AND INCOHERENT ACCESSES AT A UNIFORM CACHE - Each cacheline of a unified cache storing information is marked as incoherent if the information was acquired incoherently or marked as coherent if the information was acquired coherently. A subsequent incoherent read access to a cacheline can result in a cache hit and a return of the cached information regardless of whether the cacheline is marked as coherent or incoherent. However, a subsequent coherent read access to a cacheline marked as incoherent will be returned as a cache miss regardless of whether the cacheline includes information sought by the coherent read access. In response to a cache miss for a coherent read access, a global snoop is initiated so as to query all other target components within the same coherency domain. In contrast, a cache miss resulting from an incoherent read access is processed using a non-global snoop to a limited set of one or a few target components in the coherency domain. | 11-11-2010 |
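The hit/miss rules in 20100287342 above reduce to a small predicate, sketched below in C under the assumption of a single direct-mapped line; identifiers are illustrative.

```c
#include <stdbool.h>
#include <stdio.h>

typedef struct { unsigned tag; bool valid; bool coherent; } Line;

/* Hit rules from the abstract: incoherent reads may hit either kind
 * of line; coherent reads miss on lines marked incoherent. */
static bool lookup(const Line *l, unsigned tag, bool coherent_read)
{
    if (!l->valid || l->tag != tag) return false;
    if (coherent_read && !l->coherent) return false; /* forced miss */
    return true;
}

int main(void)
{
    Line l = { .tag = 0x7, .valid = true, .coherent = false };
    printf("incoherent read: %s\n", lookup(&l, 0x7, false) ? "hit" : "miss");
    printf("coherent read:   %s\n", lookup(&l, 0x7, true)  ? "hit" : "miss");
    /* per the abstract, the coherent miss would then trigger a global
     * snoop of all target components in the coherency domain, while an
     * incoherent miss triggers only a non-global snoop of a few targets */
    return 0;
}
```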
20100332767 | Controllably Exiting An Unknown State Of A Cache Coherency Directory - In one embodiment, a method includes receiving a read request from a first caching agent and if a directory entry associated with the request is in an unknown state, an invalidating snoop message is sent to at least one other caching agent to invalidate information in a cache location of the other caching agent corresponding to the location of the read request, to enable setting of the directory entry into a known state. Other embodiments are described and claimed. | 12-30-2010 |
20100332768 | FLEXIBLE READ- AND WRITE-MONITORED AND BUFFERED MEMORY BLOCKS - A computing system includes a number of threads. The computing system is configured to allow for monitoring and testing memory blocks in a cache memory to determine effects on memory blocks by various agents. The system includes a processor. The processor includes a mechanism implementing an instruction set architecture including instructions accessible by software. The instructions are configured to: set per-hardware-thread, for a first thread, memory access monitoring indicators for a plurality of memory blocks, and test whether any monitoring indicator has been reset by the action of a conflicting memory access by another agent. The processor further includes a mechanism configured to: detect conflicting memory accesses by other agents to the monitored memory blocks, and upon such detection of a conflicting access, reset access monitoring indicators corresponding to memory blocks having conflicting memory accesses, and remember that at least one monitoring indicator has been so reset. | 12-30-2010 |
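A minimal C model of the per-thread monitoring indicators in 20100332768 above, assuming a 64-block bitmask and a sticky loss flag; these representations are illustrative, not the patent's encoding.

```c
#include <stdio.h>
#include <stdbool.h>
#include <stdint.h>

/* per-hardware-thread monitoring state for up to 64 memory blocks */
typedef struct {
    uint64_t monitored;   /* one bit per monitored memory block */
    bool     any_lost;    /* sticky: some indicator was reset   */
} MonitorState;

static void set_monitor(MonitorState *m, int block)
{
    m->monitored |= (uint64_t)1 << block;
}

/* called when another agent performs a conflicting access */
static void on_conflicting_access(MonitorState *m, int block)
{
    uint64_t bit = (uint64_t)1 << block;
    if (m->monitored & bit) {
        m->monitored &= ~bit;  /* reset the indicator  */
        m->any_lost = true;    /* remember it happened */
    }
}

int main(void)
{
    MonitorState m = {0};
    set_monitor(&m, 3);
    set_monitor(&m, 9);
    on_conflicting_access(&m, 9);   /* another agent wrote block 9 */
    printf("monitoring lost: %s\n", m.any_lost ? "yes" : "no");
    return 0;
}
```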
20110004731 | CACHE MEMORY DEVICE, CACHE MEMORY SYSTEM AND PROCESSOR SYSTEM - A cache memory device includes: a storage unit in which data and attribute information can be stored in association with each other; and a cache controller which (i) obtains, from a CPU, a request signal requesting access to data and an indication signal indicating whether or not the requested data is a synchronization primitive, and when the indication signal indicates that the data requested by the request signal is the synchronization primitive, (ii) stores in association, into the storage unit, the requested data and synchronization primitive attribute information indicating that the requested data is a valid synchronization primitive. The cache controller prohibits purge of the data stored in the storage unit in association with the synchronization primitive attribute information. | 01-06-2011 |
20110138132 | RESCINDING OWNERSHIP OF A CACHE LINE IN A COMPUTER SYSTEM - A method of rescinding ownership of a cache line in a computer system includes constructing a table of caching agent representations in which each caching agent representation is accompanied by a validity indicator. The method continues with receiving a cache line sharing list, with each entry of the cache line sharing list indicating the potential ownership of the cache line by one or more caching agent representations that correspond to an entry of the sharing list. The method also includes conveying a snoop packet to a caching agent when the logical conjunction of an entry of the cache line sharing list that corresponds to a caching agent representation and the accompanying validity indicator meets a predetermined Boolean condition. | 06-09-2011 |
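The Boolean condition in 20110138132 above (convey a snoop only where the sharing-list entry and the accompanying validity indicator are both set) is naturally a bitwise AND, sketched here in C with illustrative 32-agent masks.

```c
#include <stdio.h>
#include <stdint.h>

/* Send a snoop packet to caching agent i only when the sharing-list
 * bit AND the validity indicator are both set, the logical
 * conjunction the abstract describes. Names are illustrative. */
static void rescind_line(uint32_t sharing_list, uint32_t validity)
{
    uint32_t targets = sharing_list & validity;
    for (int i = 0; i < 32; i++)
        if (targets & (1u << i))
            printf("snoop -> caching agent %d\n", i);
}

int main(void)
{
    /* agents 0 and 2 may own the line, but agent 2's representation
     * is marked invalid, so only agent 0 is snooped */
    rescind_line(/*sharing_list*/0x5, /*validity*/0x1);
    return 0;
}
```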
20110153955 | SOFTWARE ASSISTED TRANSLATION LOOKASIDE BUFFER SEARCH MECHANISM - A computer implemented method searches a unified translation lookaside buffer. Responsive to a request to access the unified translation lookaside buffer, a first order code within a first entry of a search priority configuration register is identified. A unified translation lookaside buffer is then searched according to the first order code for a hashed page entry. If the hashed page entry is not found when searching a unified translation lookaside buffer according to the first order code, a second order code is identified within a second entry of the search priority configuration register. The unified translation lookaside buffer is then searched according to the second order code for the hashed page entry. | 06-23-2011 |
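A toy C model of the two-pass search in 20110153955 above: try the unified TLB under the first order code from the search priority configuration register, then retry under the second on a miss. The hash functions, table size, and field names are invented for illustration.

```c
#include <stdio.h>

#define TLB_WAYS 4

typedef struct { unsigned vpn; unsigned ppn; int valid; } TlbEntry;

/* toy hash functions selected by an order code; illustrative only */
static unsigned hash_by_code(unsigned vpn, int order_code)
{
    return (order_code == 0) ? (vpn % TLB_WAYS)
                             : ((vpn >> 2) % TLB_WAYS);
}

/* Search the unified TLB under the first order code; on a miss,
 * retry under the second, mirroring the two-pass search. */
static const TlbEntry *search(const TlbEntry *tlb, unsigned vpn,
                              const int *priority_cfg, int codes)
{
    for (int c = 0; c < codes; c++) {
        const TlbEntry *e = &tlb[hash_by_code(vpn, priority_cfg[c])];
        if (e->valid && e->vpn == vpn)
            return e;
    }
    return NULL;
}

int main(void)
{
    /* entry placed where only the second order code will find it */
    TlbEntry tlb[TLB_WAYS] = { [2] = { .vpn = 0x9, .ppn = 0x55, .valid = 1 } };
    int cfg[2] = { 0, 1 };   /* search priority configuration register */
    const TlbEntry *e = search(tlb, 0x9, cfg, 2);
    if (e) printf("hit under second order code: ppn 0x%x\n", e->ppn);
    else   printf("miss\n");
    return 0;
}
```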
20110153956 | Cache Coherent Switch Device - In one embodiment, the present invention includes a switch device to be coupled between a first semiconductor component and a processor node by interconnects of a communication protocol that provides for cache coherent transactions and non-cache coherent transactions. The switch device includes logic to handle cache coherent transactions from the first semiconductor component to the processor node, while the first semiconductor component does not include such logic. Other embodiments are described and claimed. | 06-23-2011 |
20110161601 | INTER-QUEUE ANTI-STARVATION MECHANISM WITH DYNAMIC DEADLOCK AVOIDANCE IN A RETRY BASED PIPELINE - Methods and apparatus relating to an inter-queue anti-starvation mechanism with dynamic deadlock avoidance in a retry based pipeline are described. In one embodiment, logic may arbitrate between two queues based on various rules. The queues may store data including local or remote requests, data responses, non-data responses, external interrupts, etc. Other embodiments are also disclosed. | 06-30-2011 |
20110185128 | Memory access method and information processing apparatus - To maintain data consistency in an information processing apparatus in which nodes are coupled, takeout information indicating that data of the node is taken out to a secondary memory of another node is stored in a directory of each node. When a cache miss occurs during a memory access to a secondary memory of one node, the one node judges whether a destination of the memory access is a main or the secondary memory thereof. If the destination of the memory access is the main or secondary memory of the one node, the directory is indexed and retrieved to judge whether a directory hit occurs, and if no directory hit occurs, a memory access is performed by the one node based on the memory access. | 07-28-2011 |
20110213934 | Data processing apparatus and method for switching a workload between first and second processing circuitry - A data processing apparatus and method are provided for switching performance of a workload between two processing circuits. The data processing apparatus has first processing circuitry which is architecturally compatible with second processing circuitry, but with the first processing circuitry being micro-architecturally different from the second processing circuitry. At any point in time, a workload consisting of at least one application and at least one operating system for running that application is performed by one of the first processing circuitry and the second processing circuitry. A switch controller is responsive to a transfer stimulus to perform a handover operation to transfer performance of the workload from source processing circuitry to destination processing circuitry, with the source processing circuitry being one of the first and second processing circuitry and the destination processing circuitry being the other of the first and second processing circuitry. During the handover operation, the switch controller causes the source processing circuitry to make its current architectural state available to the destination processing circuitry, the current architectural state being that state not available from shared memory at a time the handover operation is initiated, and that is necessary for the destination processing circuitry to successfully take over performance of the workload from the source processing circuitry. In addition, the switch controller masks predetermined processor specific configuration information from the at least one operating system such that the transfer of the workload is transparent to that operating system. Such an approach has been found to yield significant energy consumption benefits whilst avoiding complexities associated with providing operating systems with the capability for switching applications between processing circuits. | 09-01-2011 |
20110213935 | Data processing apparatus and method for switching a workload between first and second processing circuitry - A data processing apparatus and method are provided for switching performance of a workload between two processing circuits. The data processing apparatus has first processing circuitry which is architecturally compatible with second processing circuitry, but with the first processing circuitry being micro-architecturally different from the second processing circuitry. At any point in time, a workload consisting of at least one application and at least one operating system for running that application is performed by one of the first processing circuitry and the second processing circuitry. A switch controller is responsive to a transfer stimulus to perform a handover operation to transfer performance of the workload from source processing circuitry to destination processing circuitry, with the source processing circuitry being one of the first and second processing circuitry and the destination processing circuitry being the other of the first and second processing circuitry. The switch controller is arranged, during the handover operation, to cause the source processing circuitry to make its current architectural state available to the destination processing circuitry, the current architectural state being that state not available from shared memory shared between the first and second processing circuitry at a time the handover operation is initiated, and that is necessary for the destination processing circuitry to successfully take over performance of the workload from the source processing circuitry. Further, the source processing circuitry and second processing circuitry implement an accelerated mechanism to make the current architectural state available to the destination processing circuitry without routing of the current architectural state via the shared memory. Since the accelerated mechanism is quick and energy efficient, it increases the number of situations in which it is energy efficient to make the switch from one processing circuitry to the other. | 09-01-2011 |
20110296115 | Assigning Memory to On-Chip Coherence Domains - A mechanism is provided for assigning memory to on-chip cache coherence domains. The mechanism assigns caches within a processing unit to coherence domains. The mechanism then assigns chunks of memory to the coherence domains. The mechanism monitors applications running on cores within the processing unit to identify needs of the applications. The mechanism may then reassign memory chunks to the cache coherence domains based on the needs of the applications running in the coherence domains. When a memory controller receives the cache miss, the memory controller may look up the address in a lookup table that maps memory chunks to cache coherence domains. Snoop requests are sent to caches within the coherence domain. If a cache line is found in a cache within the coherence domain, the cache line is returned to the originating cache by the cache containing the cache line either directly or through the memory controller. If a cache line is not found within the coherence domain, the memory controller accesses the memory to retrieve the cache line. | 12-01-2011 |
20110296116 | System and Method for Aggregating Core-Cache Clusters in Order to Produce Multi-Core Processors - According to one embodiment of the invention, a processor comprises a memory, a plurality of core-cache clusters and a scalability agent unit that operates as an interface between an on-die interconnect and multiple core-cache clusters. The scalability agent operates in accordance with a protocol to ensure that the plurality of core-cache clusters appear as a single caching agent. | 12-01-2011 |
20120036328 | DYNAMIC CACHE REDUCTION UTILIZING VOLTAGE WARNING MECHANISM - An interface controller of a storage device configured to manage a write cache of the storage device responsive to changes in a voltage supply provided to the storage device. In one implementation, the interface controller reduces the size of the write cache responsive to the voltage supply dropping at or below a first threshold. The interface controller further disables write permissions to the write cache responsive to the voltage supply dropping at or below a second threshold, wherein the second threshold is lower in magnitude than the first threshold. The interface controller periodically receives the voltage supply responsive to transmitting sequential requests to a servo firmware of the storage device. | 02-09-2012 |
20120047333 | EXTENDING A CACHE COHERENCY SNOOP BROADCAST PROTOCOL WITH DIRECTORY INFORMATION - In one embodiment, a method includes receiving a read request from a first caching agent, determining whether a directory entry associated with the memory location indicates that the information is not present in a remote caching agent, and if so, transmitting the information from the memory location to the first caching agent before snoop processing with respect to the read request is completed. Other embodiments are described and claimed. | 02-23-2012 |
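A hedged C sketch of the fast path in 20120047333 above: when the directory entry shows the line is absent from all remote caching agents, memory data can be forwarded before snoop processing completes. The two-state directory encoding is an illustrative simplification.

```c
#include <stdio.h>

typedef enum { DIR_REMOTE_NONE, DIR_REMOTE_MAYBE } DirState;

/* On a read request, consult the directory entry for the line: if it
 * shows no remote caching agent holds the line, memory data can be
 * returned before snoop processing completes; otherwise the response
 * must wait for the snoop phase. Illustrative model only. */
static void handle_read(DirState dir, unsigned addr)
{
    /* snoops are still broadcast either way in a snoop-broadcast protocol */
    printf("broadcast snoops for 0x%x\n", addr);
    if (dir == DIR_REMOTE_NONE)
        printf("send memory data for 0x%x early (no remote copy)\n", addr);
    else
        printf("hold data for 0x%x until snoop responses arrive\n", addr);
}

int main(void)
{
    handle_read(DIR_REMOTE_NONE, 0x100);
    handle_read(DIR_REMOTE_MAYBE, 0x200);
    return 0;
}
```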
20120079214 | ALLOCATION AND WRITE POLICY FOR A GLUELESS AREA-EFFICIENT DIRECTORY CACHE FOR HOTLY CONTESTED CACHE LINES - Methods and apparatus relating to allocation and/or write policy for a glueless area-efficient directory cache for hotly contested cache lines are described. In one embodiment, a directory cache stores data corresponding to a caching status of a cache line. The caching status of the cache line is stored for each of a plurality of caching agents in the system. A write-on-allocate policy is used for the directory cache by using a special state (e.g., snoop-all state) that indicates one or more snoops are to be broadcasted to all agents in the system. Other embodiments are also disclosed. | 03-29-2012 |
20120079215 | Performing Mode Switching In An Unbounded Transactional Memory (UTM) System - In one embodiment, the present invention includes a method for selecting a first transaction execution mode to begin a first transaction in an unbounded transactional memory (UTM) system having a plurality of transaction execution modes. These transaction execution modes include hardware modes to execute within a cache memory of a processor, a hardware assisted mode to execute using transactional hardware of the processor and a software buffer, and a software transactional memory (STM) mode to execute without the transactional hardware. The first transaction execution mode can be selected to be a highest performant of the hardware modes if no pending transaction is executing in the STM mode, otherwise a lower performant mode can be selected. Other embodiments are described and claimed. | 03-29-2012 |
20120110270 | DATA PROCESSING SYSTEM HAVING SELECTIVE INVALIDATION OF SNOOP REQUESTS AND METHOD THEREFOR - A data processing system includes a system interconnect, a processor coupled to the system interconnect, and a cache coherency manager (CCM) coupled to the system interconnect. The processor includes a cache. A method includes generating, by the CCM, one or more snoop requests to the cache of the processor; storing the one or more snoop requests to the cache of the processor into a snoop queue; setting a cache enable indicator to indicate that the cache of the processor is to be disabled; in response to setting the cache enable indicator to indicate that the cache of the processor is to be disabled, selectively invalidating the one or more snoop requests to the cache of the processor, wherein the selectively invalidating is performed based on an invalidate snoop queue indicator of the processor; and disabling the cache. | 05-03-2012 |
20120117335 | LOAD ORDERING QUEUE - A method and apparatus are provided for utilizing a strong ordering scheme on memory operations in a processor to prevent performance degradation caused by out-of-order memory operations. Also provided is a computer readable storage device encoded with data for adapting a manufacturing facility to create an apparatus. The method includes storing information associated with a first load operation in a load queue, the first load operation being executed out-of-order with respect to one or more second load operations. The method also includes detecting a snoop hit on the first load operation. The method further includes re-executing the first load operation in response to detecting the snoop hit. | 05-10-2012 |
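The replay rule in 20120117335 above can be modeled in a few lines of C: a snoop that hits a load executed out of order marks it for re-execution. Queue depth and field names are assumptions.

```c
#include <stdio.h>
#include <stdbool.h>

#define LQ_DEPTH 4

typedef struct {
    unsigned addr;
    bool     valid;
    bool     replay;   /* set when a snoop hits this completed load */
} LoadQueueEntry;

static LoadQueueEntry lq[LQ_DEPTH];

static void track_load(int slot, unsigned addr)
{
    lq[slot] = (LoadQueueEntry){ addr, true, false };
}

/* An external store snooped to an address held by an out-of-order
 * load forces that load to re-execute, preserving strong ordering. */
static void on_snoop(unsigned addr)
{
    for (int i = 0; i < LQ_DEPTH; i++)
        if (lq[i].valid && lq[i].addr == addr) {
            lq[i].replay = true;
            printf("snoop hit: re-execute load of 0x%x\n", addr);
        }
}

int main(void)
{
    track_load(0, 0x80);   /* younger load executed early */
    track_load(1, 0x90);
    on_snoop(0x80);        /* remote store to 0x80 arrives */
    return 0;
}
```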
20120124299 | SYSTEM, METHOD AND COMPUTER PROGRAM PRODUCT FOR EXTENDING A CACHE USING PROCESSOR REGISTERS - According to one aspect of the present disclosure, a method and technique for using processor registers for extending a cache structure is disclosed. The method includes identifying a register of a processor, identifying a cache to extend, allocating the register as an extension of the cache, and setting an address of the register as corresponding to an address space in the cache. | 05-17-2012 |
20120131284 | MULTI-CORE ACTIVE MEMORY PROCESSOR SYSTEM - In general, the present invention relates to data cache processing. Specifically, the present invention relates to a system that provides reconfigurable dynamic cache which varies the operation strategy of cache memory based on the demand from the applications originating from different external general processor cores, along with functions of a virtualized hybrid core system. The system includes receiving a data request, selecting an operational mode based on the data request and a predefined selection algorithm, and processing the data request based on the selected operational mode. The system is further configured to delegate computational or memory resource needs to a plurality of sub-processing cores for processing to satisfy application demands. | 05-24-2012 |
20120151151 | SYSTEMS AND METHODS FOR MANAGING CACHE DESTAGE SCAN TIMES - Systems and methods for managing destage scan times in a cache are provided. One system includes a cache and a processor. The processor is configured to utilize a first thread to continually determine a desired scan time for scanning the plurality of storage tracks in the cache and utilize a second thread to continually control an actual scan time of the plurality of storage tracks in the cache based on the continually determined desired scan time. One method includes utilizing a first thread to continually determine a desired scan time for scanning the plurality of storage tracks in the cache and utilizing a second thread to continually control an actual scan time of the plurality of storage tracks in the cache based on the continually determined desired scan time. Physical computer storage mediums including a computer program product for performing the above method are also provided. | 06-14-2012 |
20120151152 | READING CORE DATA IN A RING BUS TYPE MULTICORE SYSTEM - The present invention provides a ring bus type multicore system including one memory; a main memory controller for connecting the memory to a ring bus; and multiple cores connected in the shape of the ring bus, wherein each of the cores further includes a cache interface and a cache controller for controlling or managing the interface, and the cache controller of each of the cores connected in the shape of the ring bus executes a step of snooping data on the request through the cache interface; and when the cache of the core holds the data, a step of controlling the core to receive the request and return the data to the requester core, or, when the cache of the core does not hold the data, the main memory controller executes a step of reading the data from the memory and sending the data to the requester core. | 06-14-2012 |
20120159087 | Ensuring Forward Progress of Token-Required Cache Operations In A Shared Cache - Ensuring forward progress of token-required cache operations in a shared cache, including: snooping an instruction to execute a token-required cache operation; determining if a snoop machine is available and if the snoop machine is set to a reservation state; if the snoop machine is available and the snoop machine is in the reservation state, determining whether the instruction to execute the token-required cache operation owns a token or is a joint instruction; if the instruction is a joint instruction, instructing the operation to retry; if the instruction to execute the token-required cache operation owns a token, dispatching a cache controller; determining whether all required cache controllers of relevant compute nodes are available to execute the instruction; executing the instruction if the required cache controllers are available; otherwise, not executing the instruction. | 06-21-2012 |
20120166735 | DATA STORAGE AND ACCESS IN MULTI-CORE PROCESSOR ARCHITECTURES - Technologies are generally described for a system for sending a data block stored in a cache. In some examples described herein, a system may comprise a first processor in a first tile. The first processor is effective to generate a request for a data block, the request including a destination identifier identifying a destination tile for the data block, the destination tile being distinct from the first tile. Some example systems may further comprise a second tile effective to receive the request, the second tile effective to determine a data tile including the data block, the second tile further effective to send the request to the data tile. Some example systems may still further comprise a data tile effective to receive the request from the second tile, the data tile effective to send the data block to the destination tile. | 06-28-2012 |
20120179878 | Forming Multiprocessor Systems Using Dual Processors - In one embodiment, link logic of a multi-chip processor (MCP) formed using multiple processors may interface with a first point-to-point (PtP) link coupled between the MCP and an off-package agent and another PtP link coupled between first and second processors of the MCP, where the on-package PtP link operates at a greater bandwidth than the first PtP link. Other embodiments are described and claimed. | 07-12-2012 |
20120210073 | WRITE-THROUGH CACHE OPTIMIZED FOR DEPENDENCE-FREE PARALLEL REGIONS - An apparatus, method and computer program product for improving performance of a parallel computing system. A first hardware local cache controller associated with a first local cache memory device of a first processor detects an occurrence of a false sharing of a first cache line by a second processor running the program code and allows the false sharing of the first cache line by the second processor. The false sharing of the first cache line occurs upon updating a first portion of the first cache line in the first local cache memory device by the first hardware local cache controller and subsequent updating a second portion of the first cache line in a second local cache memory device by a second hardware local cache controller. | 08-16-2012 |
20120265944 | Assigning Memory to On-Chip Coherence Domains - A mechanism for assigning memory to on-chip cache coherence domains assigns caches within a processing unit to coherence domains. The mechanism assigns chunks of memory to the coherence domains. The mechanism monitors applications running on cores within the processing unit to identify needs of the applications. The mechanism may then reassign memory chunks to the cache coherence domains based on the needs of the applications running in the coherence domains. When a memory controller receives the cache miss, the memory controller may look up the address in a lookup table that maps memory chunks to cache coherence domains. Snoop requests are sent to caches within the coherence domain. If a cache line is found in a cache within the coherence domain, the cache line is returned to the originating cache by the cache containing the cache line either directly or through the memory controller. | 10-18-2012 |
20120272011 | PROCESSOR CACHE TRACING - A method for refining multithread software executed on a processor chip of a computer system. The envisaged processor chip has at least one processor core and a memory cache coupled to the processor core and configured to cache at least some data read from memory. The method includes, in logic distinct from the processor core and coupled to the memory cache, observing a sequence of operations of the memory cache and encoding a sequenced data stream that traces the sequence of operations observed. | 10-25-2012 |
20120290796 | SYSTEM AND METHOD FOR MAINTAINING CACHE COHERENCY ACROSS A SERIAL INTERFACE BUS USING A SNOOP REQUEST AND COMPLETE MESSAGE - Techniques are disclosed for maintaining cache coherency across a serial interface bus such as a Peripheral Component Interconnect Express (PCIe) bus. The techniques include generating a snoop request (SNP) to determine whether first data stored in a local memory is coherent relative to second data stored in a data cache, the snoop request including destination information that identifies the data cache on the serial interface bus and causing the snoop request to be transmitted over the serial interface bus to a second processor. The techniques further include extracting a cache line address from the snoop request, determining whether the second data is coherent, generating a complete message (CPL) indicating that the first data is coherent with the second data, and causing the complete message to be transmitted over the bus to the first processor. The snoop request and complete messages may be vendor defined messages. | 11-15-2012 |
20120311272 | NOVEL SNOOP FILTER FOR FILTERING SNOOP REQUESTS - A method and apparatus for supporting cache coherency in a multiprocessor computing environment having multiple processing units, each processing unit having one or more local cache memories associated and operatively connected therewith. The method comprises providing a snoop filter device associated with each processing unit, each snoop filter device having a plurality of dedicated input ports for receiving snoop requests from dedicated memory writing sources in the multiprocessor computing environment. Each snoop filter device includes a plurality of parallel operating port snoop filters in correspondence with the plurality of dedicated input ports, each port snoop filter implementing one or more parallel operating sub-filter elements that are adapted to concurrently filter snoop requests received from respective dedicated memory writing sources and forward a subset of those requests to its associated processing unit. | 12-06-2012 |
20120317368 | Memory interface control - A memory interface apparatus | 12-13-2012 |
20120317369 | SATISFYING MEMORY ORDERING REQUIREMENTS BETWEEN PARTIAL READS AND NON-SNOOP ACCESSES - A method and apparatus for preserving memory ordering in a cache coherent link based interconnect in light of partial and non-coherent memory accesses is herein described. In one embodiment, partial memory accesses, such as a partial read, are implemented utilizing a Read Invalidate and/or Snoop Invalidate message. When a peer node receives a Snoop Invalidate message referencing data from a requesting node, the peer node is to invalidate a cache line associated with the data and is not to directly forward the data to the requesting node. In one embodiment, when the peer node holds the referenced cache line in a Modified coherency state, in response to receiving the Snoop Invalidate message, the peer node is to writeback the data to a home node associated with the data. | 12-13-2012 |
20120317370 | CACHE STATE MANAGEMENT ON A MOBILE DEVICE TO PRESERVE USER EXPERIENCE - Systems and methods for cache state management to preserve user experience with a mobile application on a mobile device while conserving resources in a wireless network are disclosed. In one embodiment, the method can include, for example, storing content from a content server as cached elements in a local cache on the mobile device and in response to receiving polling requests to contact the content server, retrieving the cached elements from the local cache to respond to the polling requests made at the mobile device, and/or using state information associated with the cached elements to provide the cached elements as responses to the polling requests such that user experience is preserved. | 12-13-2012 |
20130007376 | OPPORTUNISTIC SNOOP BROADCAST (OSB) IN DIRECTORY ENABLED HOME SNOOPY SYSTEMS - Methods and apparatus relating to Opportunistic Snoop Broadcast (OSB) in directory enabled home snoopy systems are described. In one embodiment, a plurality of snoops are broadcast to a plurality of caching agents in response to a request for data and based on a comparison of a bandwidth consumption of the link and a threshold value. Other embodiments are also disclosed. | 01-03-2013 |
20130024629 | Data processing apparatus and method for managing coherency of cached data - An interconnect having a plurality of interconnect nodes arranged to provide at least one ring, a plurality of caching nodes for caching data coupled into the interconnect via an associated one of said interconnect nodes, and at least one coherency management node for implementing a coherency protocol to manage coherency of the data cached by each of said caching nodes, each coherency management node being coupled into the interconnect via an associated one of said interconnect nodes. When each caching node produces a snoop response for said snoop request, the associated interconnect node is configured to output that snoop response in one of said at least one identified slots. Further, each interconnect node associated with a caching node has merging circuitry configured, when outputting the snoop response in an identified slot, to merge that snoop response with any current snoop response information held in that slot. | 01-24-2013 |
20130024630 | Terminating barriers in streams of access requests to a data store while maintaining data consistency - A memory controller for a slave memory that controls an order of data access requests is disclosed. There is a read and write channel having streams of requests with corresponding barrier transactions within the request streams indicating where reordering should not occur. The controller has barrier response generating circuitry located on the read and said write channels and being responsive to receipt of one of said barrier transactions: to issue a response to the received barrier transaction such that subsequent requests in said stream of requests are not blocked by the barrier transaction and can be received and to terminate the received barrier transaction and not transmit the received barrier transaction further; and to mark requests subsequent to the received barrier transaction in the stream of requests with a barrier context value identifying the received barrier transaction. The memory controller comprises a point of data consistency on the write channel prior to the memory; and the memory controller comprises comparison circuitry configured to compare the barrier context value of each write request to be issued to the memory with the barrier context values of at least some pending read requests, the pending read requests being requests received at the memory controller but not yet issued to the memory and: in response to detecting at least one of the pending read requests with an earlier barrier context value identifying a barrier transaction that has a corresponding barrier transaction in the stream of requests on the write channel that is earlier in the stream of requests than the write request, stalling the write request until the at least one pending read request has been performed; and in response to detecting no pending read requests with the earlier barrier context value, issuing the write request to the memory. | 01-24-2013 |
20130042077 | Data hazard handling for copending data access requests - A data processing system that manages data hazards at a coherency controller and not at an initiator device is disclosed. The data processing system processes write requests in a two-part form, such that a first part is transmitted and, when the coherency controller has space to accept data, it responds to the first part and the data and state of the data prior to the write are sent as a second part of the write request. When there are copending reads and writes to the same address, the writes are stalled by the coherency controller by not responding to the first part of the write, and the initiator device proceeds to process any snoop requests received to the address of the write regardless of the fact that the write is pending. When the pending read has completed, the coherency controller will respond to the first part of the write and the initiator device will complete the write by sending the data and an indicator of the state of the data following the snoop. The coherency controller can then avoid any potential data hazard using this information to update memory as required. | 02-14-2013 |
20130042078 | Snoop filter and non-inclusive shared cache memory - A data processing apparatus | 02-14-2013 |
20130086331 | INFORMATION PROCESSING SYSTEM AND A SYSTEM CONTROLLER - In a system including a plurality of CPU units each having a cache memory of a different capacity and a system controller that connects to the plurality of CPUs and controls cache synchronization, the system controller includes a cache synchronization unit which monitors an address contention between a preceding request and a subsequent request and a setting unit which sets a different monitoring range of the contention between the preceding request and the subsequent request for each capacity of the cache memory in each of the CPU units. | 04-04-2013 |
20130132683 | PCI EXPRESS ENHANCEMENTS AND EXTENSIONS - A method and apparatus for enhancing/extending a serial point-to-point interconnect architecture, such as Peripheral Component Interconnect Express (PCIe), is herein described. Temporal and locality caching hints and prefetching hints are provided to improve system wide caching and prefetching. Message codes for atomic operations to arbitrate ownership between system devices/resources are included to allow efficient access/ownership of shared data. Loose transaction ordering is provided for while maintaining corresponding transaction priority to memory locations to ensure data integrity and efficient memory access. Active power sub-states and the setting thereof are included to allow for more efficient power management. And, caching of device local memory in a host address space, as well as caching of system memory in a device local memory address space, is provided for to improve bandwidth and latency for memory accesses. | 05-23-2013 |
20130159633 | QOS MANAGEMENT IN THE L2 CACHE - Methods and apparatuses for assigning a QoS level to memory requests based on the number of currently outstanding memory requests. One or more processors of a processor complex issue memory requests to a L2 cache. The L2 cache controller assigns a QoS level to the memory request based on whether the number of outstanding memory requests is above or below a programmable threshold. If the number is above the threshold, then new requests typically do not impair processor performance since the processor is already waiting for a large number of previous memory requests, and so the new memory request is assigned a low priority level. If the number of outstanding memory requests is below the threshold, then the new memory request is assigned a high priority level. | 06-20-2013 |
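The threshold test in 20130159633 above is simple enough to state directly; this C sketch assumes a single programmable threshold and two QoS levels, as the abstract describes, with illustrative names.

```c
#include <stdio.h>

typedef enum { QOS_LOW, QOS_HIGH } QosLevel;

/* Assign QoS by comparing the count of outstanding L2 memory
 * requests against a programmable threshold. */
static QosLevel assign_qos(int outstanding, int threshold)
{
    /* many requests already in flight: the core is already waiting,
     * so one more low-priority request will not impair performance */
    return (outstanding > threshold) ? QOS_LOW : QOS_HIGH;
}

int main(void)
{
    int threshold = 8;     /* programmable */
    printf("3 outstanding  -> %s\n",
           assign_qos(3, threshold) == QOS_HIGH ? "high" : "low");
    printf("12 outstanding -> %s\n",
           assign_qos(12, threshold) == QOS_HIGH ? "high" : "low");
    return 0;
}
```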
20130185521 | MULTIPROCESSOR SYSTEM AND SCHEDULING METHOD - A multiprocessor system includes a master processor, at least one slave processor, and a synchronization unit. The master processor has a first flag indicating whether the master processor is in a task activation accepting state and a second flag reflective of a flag of a slave processor, iteratively updates the first flag at a frequency based on the volume of tasks processed by the master processor, and activates a task on the master processor or the slave processor based on the first flag and the second flag. Each slave processor has a third flag indicating whether the slave processor is in the task activation accepting state and iteratively updates the third flag at a frequency based on the volume of tasks processed by the slave processor. Tasks are allocated to the slave processor by the master processor. The synchronization unit synchronizes the third flag and the second flag. | 07-18-2013 |
20130185522 | ALLOCATION AND WRITE POLICY FOR A GLUELESS AREA-EFFICIENT DIRECTORY CACHE FOR HOTLY CONTESTED CACHE LINES - Methods and apparatus relating to allocation and/or write policy for a glueless area-efficient directory cache for hotly contested cache lines are described. In one embodiment, a directory cache stores data corresponding to a caching status of a cache line. The caching status of the cache line is stored for each of a plurality of caching agents in the system. A write-on-allocate policy is used for the directory cache by using a special state (e.g., snoop-all state) that indicates one or more snoops are to be broadcasted to all agents in the system. Other embodiments are also disclosed. | 07-18-2013 |
20130205098 | FORWARD PROGRESS MECHANISM FOR STORES IN THE PRESENCE OF LOAD CONTENTION IN A SYSTEM FAVORING LOADS - A multiprocessor data processing system includes a plurality of cache memories including a cache memory. In response to the cache memory detecting a storage-modifying operation specifying a same target address as that of a first read-type operation being processed by the cache memory, the cache memory provides a retry response to the storage-modifying operation. In response to completion of the read-type operation, the cache memory enters a referee mode. While in the referee mode, the cache memory temporarily dynamically increases priority of any storage-modifying operation targeting the target address in relation to any second read-type operation targeting the target address. | 08-08-2013 |
20130205099 | FORWARD PROGRESS MECHANISM FOR STORES IN THE PRESENCE OF LOAD CONTENTION IN A SYSTEM FAVORING LOADS BY STATE ALTERATION - A multiprocessor data processing system includes a plurality of cache memories including a cache memory. The cache memory issues a read-type operation for a target cache line. While waiting for receipt of the target cache line, the cache memory monitors to detect a competing store-type operation for the target cache line. In response to receiving the target cache line, the cache memory installs the target cache line in the cache memory, and sets a coherency state of the target cache line installed in the cache memory based on whether the competing store-type operation is detected. | 08-08-2013 |
20130219128 | METHODS AND APPARATUS FOR REUSING SNOOP RESPONSES AND DATA PHASE RESULTS IN A BUS CONTROLLER - Methods and apparatus are provided for reusing snoop responses and data phase results in a bus controller. A bus controller receives an incoming bus transaction BTR | 08-22-2013 |
20130219129 | METHODS AND APPARATUS FOR REUSING SNOOP RESPONSES AND DATA PHASE RESULTS IN A CACHE CONTROLLER - Methods and apparatus are provided for reusing snoop responses and data phase results in a cache controller. A cache controller receives a broadcast combined snoop response from a bus controller, wherein the broadcast combined snoop response corresponds to an incoming bus transaction BTR1 corresponding to a cache transaction CTR1 for an entry in at least one cache and wherein the combined snoop response is a combination of at least one snoop response from a plurality of cache controllers; receives broadcast cache line data from a source cache as instructed by the bus controller for the entry during a data phase; and processes a subsequent cache transaction CTR2 for the entry based on one or more of the broadcast combined snoop response and the broadcast cache line data. | 08-22-2013 |
20130275686 | MULTIPROCESSOR SYSTEM AND METHOD FOR MANAGING CACHE MEMORY THEREOF - A multiprocessor system includes a plurality of master devices, at least one slave device, and a system bus connecting the master devices to the at least one slave device. At least one of the master devices includes at least one cache memory, and the system bus processes a data write or data read request corresponding to a transaction issued to the slave device from at least one of the master devices prior to termination of a snooping operation on the master devices. | 10-17-2013 |
20130311725 | DATA PROCESSING APPARATUS AND METHOD FOR TRANSFERRING WORKLOAD BETWEEN SOURCE AND DESTINATION PROCESSING CIRCUITRY - In response to a transfer stimulus, performance of a processing workload is transferred from a source processing circuitry to a destination processing circuitry, in preparation for the source processing circuitry to be placed in a power saving condition following the transfer. To reduce the number of memory fetches required by the destination processing circuitry following the transfer, a cache of the source processing circuitry is maintained in a powered state for a snooping period. During the snooping period, cache snooping circuitry snoops data values in the source cache and retrieves the snooped data values for the destination processing circuitry. | 11-21-2013 |
20130318308 | SCALABLE CACHE COHERENCE FOR A NETWORK ON A CHIP - Maintaining cache coherence in a System-on-a-Chip with both multiple cache coherent master IP cores (CCMs) and non-cache coherent master IP cores (NCMs). A plug-in cache coherence manager (CM), coherence logic in agents, and an interconnect are used for the SoC to provide a scalable cache coherence scheme that scales with the number of CCMs in the SoC. The CCMs each include at least one processor operatively coupled through the CM to at least one cache that stores data for that CCM. The CM maintains cache coherence responsive to a cache miss of a cache line on a first cache of the caches, then broadcasts a request for an instance of the data corresponding to the cache miss of the cache line in the first cache. Each CCM maintains its own coherent cache and each NCM is configured to issue communication transactions into both coherent and non-coherent address spaces. | 11-28-2013 |
20140006723 | Adaptive Configuration of Cache | 01-02-2014 |
20140013060 | ENSURING CAUSALITY OF TRANSACTIONAL STORAGE ACCESSES INTERACTING WITH NON-TRANSACTIONAL STORAGE ACCESSES - A data processing system implements a weak consistency memory model for a distributed shared memory system. The data processing system concurrently executes, on a plurality of processor cores, one or more transactional memory instructions within a memory transaction and one or more non-transactional memory instructions. The one or more non-transactional memory instructions include a non-transactional store instruction. The data processing system commits the memory transaction to the distributed shared memory system only in response to enforcement of causality of the non-transactional store instruction with respect to the memory transaction. | 01-09-2014 |
20140032858 | METHODS AND APPARATUS FOR CACHE LINE SHARING AMONG CACHE CONTROLLERS - Methods and apparatus are provided for cache line sharing among cache controllers. A cache comprises a plurality of cache lines; and a cache controller for sharing at least one of the cache lines with one or more additional caches, wherein a given cache line shared by a plurality of caches corresponds to a given set of physical addresses in a main memory. The cache controller optionally maintains an ownership control signal indicating which portions of the at least one cache line are controlled by the cache and a validity control signal indicating whether each portion of the at least one cache line is valid. Each cache line can be in one of a plurality of cache coherence states, including a modified partial state and a shared partial state. | 01-30-2014 |
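One way to picture the per-portion ownership and validity control signals and the partial coherence states of 20140032858 above is with bitmasks over an 8-portion line, as in this illustrative C sketch; the state-naming logic is an assumption, not the patent's protocol table.

```c
#include <stdio.h>
#include <stdint.h>
#include <stdbool.h>

/* One cache's view of a line whose 8 portions map to a fixed set of
 * physical addresses and may be shared with other caches; the masks
 * model the ownership and validity control signals. */
typedef struct {
    uint8_t owned;   /* bit i set: this cache controls portion i    */
    uint8_t valid;   /* bit i set: portion i is valid in this cache */
    uint8_t dirty;   /* bit i set: portion i was locally modified   */
} SharedLineView;

/* a cache may write only the portions it controls */
static bool may_write(SharedLineView v, int portion)
{
    return (v.owned >> portion) & 1;
}

/* partial states from the abstract: only some portions are held */
static const char *coherence_state(SharedLineView v)
{
    bool partial = (v.valid != 0xFF);
    if (v.dirty) return partial ? "modified partial" : "modified";
    return partial ? "shared partial" : "shared";
}

int main(void)
{
    SharedLineView v = { .owned = 0x0F, .valid = 0x0F, .dirty = 0x01 };
    printf("write portion 1: %s\n", may_write(v, 1) ? "ok" : "denied");
    printf("state: %s\n", coherence_state(v));   /* modified partial */
    return 0;
}
```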
20140052933 | WRITE TRANSACTION MANAGEMENT WITHIN A MEMORY INTERCONNECT - A memory interconnect between transaction masters and a shared memory. A first snoop request is sent to other transaction masters to trigger them to invalidate any local copy of that data they may hold and for them to return any cached line of data corresponding to the write line of data that is dirty. A first write transaction is sent to the shared memory. When and if any cached line of data is received from the further transaction masters, then that data is used to form a second write transaction which is sent to the shared memory and writes the remaining portions of the cached line of data which were not written by the first write transaction into the shared memory. The serialisation circuitry stalls any transaction requests to the write line of data until the first write transaction has completed. | 02-20-2014 |
20140089603 | Techniques for Managing Power and Performance of Multi-Socket Processors - Examples are disclosed for managing power and performance of multi-socket processors. In some examples, a utilization rate of a first processor circuitry in a first processor socket may be determined. An active memory ratio of a cache for the first processor circuitry may be compared to a threshold ratio or a data traffic rate between the first processor circuitry and a second processor circuitry in a second processor socket may be compared to a threshold rate. According to some examples, a first power state of the first processor circuitry may be changed based on the determined utilization rate. The first power state may also be changed based on the comparison of the active memory ratio to the threshold ratio or the comparison of the data traffic rate to the threshold rate. | 03-27-2014 |
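A hedged C sketch combining the three inputs named in 20140089603 above (utilization rate, active memory ratio, and cross-socket data traffic rate) into a power-state decision; all thresholds and the specific decision rule are invented for illustration.

```c
#include <stdio.h>

typedef enum { P_HIGH, P_LOW } PState;

/* Illustrative decision: drop to a lower power state only when the
 * socket is idle and neither its cache footprint nor its traffic to
 * the peer socket is hot. */
static PState choose_pstate(double util, double active_mem_ratio,
                            double xsocket_rate)
{
    const double UTIL_T = 0.30;   /* utilization threshold        */
    const double MEM_T  = 0.50;   /* active memory ratio threshold */
    const double RATE_T = 1.0e6;  /* traffic rate threshold (B/s) */
    if (util < UTIL_T && active_mem_ratio < MEM_T && xsocket_rate < RATE_T)
        return P_LOW;
    return P_HIGH;
}

int main(void)
{
    printf("%s\n", choose_pstate(0.10, 0.20, 5e5) == P_LOW ? "P_LOW" : "P_HIGH");
    printf("%s\n", choose_pstate(0.10, 0.80, 5e5) == P_LOW ? "P_LOW" : "P_HIGH");
    return 0;
}
```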
20140095806 | CONFIGURABLE SNOOP FILTER ARCHITECTURE - Configurable snoop filters. A memory system is coupled with one or more processing cores. A coherent system fabric couples the memory system with the one or more processing cores. The coherent system fabric comprising at least a configurable snoop filter that is configured based on workload. The configurable snoop filter having a configurable snoop filter directory and a bloom filter. The configurable snoop filter and the bloom filter include runtime configuration parameters that are used to selectively limit snoop traffic. | 04-03-2014 |
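The bloom filter half of 20140095806 above rests on a standard property, sketched below in C: the filter may yield false positives (a harmless extra snoop) but never false negatives, which is what makes filtering snoop traffic through it safe. The hash functions and 64-bit filter size are illustrative.

```c
#include <stdio.h>
#include <stdint.h>
#include <stdbool.h>

#define BLOOM_BITS 64

static uint64_t bloom;   /* one word standing in for the filter array */

static unsigned h1(unsigned a) { return (a * 2654435761u) % BLOOM_BITS; }
static unsigned h2(unsigned a) { return (a ^ (a >> 7)) % BLOOM_BITS; }

/* record that a line address was cached somewhere behind the filter */
static void bloom_insert(unsigned addr)
{
    bloom |= (uint64_t)1 << h1(addr);
    bloom |= (uint64_t)1 << h2(addr);
}

/* May return a false positive but never a false negative, so a
 * "no" answer safely suppresses the snoop. */
static bool bloom_maybe_cached(unsigned addr)
{
    return ((bloom >> h1(addr)) & 1) && ((bloom >> h2(addr)) & 1);
}

int main(void)
{
    bloom_insert(0x1040);
    printf("0x1040: %s\n", bloom_maybe_cached(0x1040) ? "snoop" : "filter out");
    printf("0x9999: %s\n", bloom_maybe_cached(0x9999) ? "snoop" : "filter out");
    return 0;
}
```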
20140095807 | ADAPTIVE TUNING OF SNOOPS - A coherency controller, such as one used within a system-on-chip, is capable of issuing different types of snoops to coherent caches. The coherency controller chooses the type of snoop based on the type of request that caused the snoops or the state of the system or both. By so doing, coherent caches provide data when they have sufficient throughput, and are not required to provide data when they do not have sufficient throughput. | 04-03-2014 |
20140095808 | ADAPTIVE TUNING OF SNOOPS - A coherency controller, such as one used within a system-on-chip, is capable of issuing different types of snoops to coherent caches. The coherency controller chooses the type of snoop based on the type of request that caused the snoops or the state of the system or both. By so doing, coherent caches provide data when they have sufficient throughput, and are not required to provide data when they do not have insufficient throughput. | 04-03-2014 |
20140095809 | COHERENCY CONTROLLER WITH REDUCED DATA BUFFER - A coherency controller with a data buffer store that is smaller than the volume of pending read data requests. Data buffers are allocated only for requests that match the ID of another pending request. Buffers are deallocated if all snoops receive responses, none of which contain data. Buffers containing clean data have their data discarded and are reallocated to later requests. The discarded data is later read from the target. When all buffers are full of dirty data, requests with a pending order ID are shunted into request queues for later service. Dirty data may be foisted onto coherent agents to make buffers available for reallocation. Accordingly, the coherency controller can issue snoops and target requests for a volume of data that exceeds the number of buffers in the data store. | 04-03-2014 |
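The allocation rule in this entry (buffer only requests whose ID matches another pending request) can be made concrete. The sketch below uses assumed names and tracks only the rule itself; the patent does not publish an implementation.

```cpp
#include <cstdint>
#include <unordered_map>

// Hypothetical allocator: a data buffer is reserved only when a new read
// shares its ID with a request already in flight, since ordering then may
// force the controller to hold response data.
class BufferAllocator {
    std::unordered_map<uint32_t, int> pendingPerId_;  // ID -> outstanding count
    int freeBuffers_;

public:
    explicit BufferAllocator(int buffers) : freeBuffers_(buffers) {}

    // Returns true if a buffer was reserved for this request.
    bool onRequest(uint32_t id) {
        bool idCollision = pendingPerId_[id]++ > 0;
        if (idCollision && freeBuffers_ > 0) { --freeBuffers_; return true; }
        return false;  // unique ID, or all buffers busy (request must queue)
    }

    // Called when the request retires; a buffer is also freed early if all
    // snoops respond without data or clean data is discarded.
    void onComplete(uint32_t id, bool hadBuffer) {
        if (--pendingPerId_[id] == 0) pendingPerId_.erase(id);
        if (hadBuffer) ++freeBuffers_;
    }
};
```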
20140108744 | SIMPLIFIED CONTROLLER WITH PARTIAL COHERENCY - A simplified coherency controller supports multiple fully coherent agent interfaces, of which normally only one is active at a time, and any number of active I/O (partially) coherent agent interfaces. A state controller determines which fully coherent agent is active. Multiple fully coherent agents can be simultaneously active during the short period in which processing transitions from one processor to another. Multiple fully coherent agents can also be simultaneously active without a mutually consistent view of memory, which is practical in cases such as running multiple operating systems on different processors. | 04-17-2014 |
20140115268 | HIGH PERFORMANCE INTERCONNECT COHERENCE PROTOCOL - A coherence protocol message is sent corresponding to a particular cache line. A potential conflict involving the particular cache line is identified and a forward request is sent to a home agent to identify the potential conflict. A forward response can be received in response to the forward request from the home agent and a response to the conflict can be determined. | 04-24-2014 |
20140115269 | Multi Domain Bridge with Auto Snoop Response - An asynchronous dual domain bridge is implemented between the cache coherent master and the coherent system interconnect. The bridge has two halves, one in each clock/powerdown domain (master and interconnect). The powerdown mechanism is isolated to just the asynchronous bridge implemented between the master and the interconnect, with a basic request/acknowledge handshake between the master subsystem and the asynchronous bridge. | 04-24-2014 |
20140115270 | MULTI PROCESSOR BRIDGE WITH MIXED ENDIAN MODE SUPPORT - An asynchronous dual domain bridge is implemented between the cache coherent master and the coherent system interconnect. The bridge has two halves, one in each clock/powerdown domain (master and interconnect). The asynchronous bridge is aware of the endian view used by each individual processor within the attached subsystem, and can perform the appropriate endian conversion on each processor's transactions to adapt the transaction to/from the endian view used by the interconnect. | 04-24-2014 |
20140115271 | COHERENCE CONTROLLER SLOT ARCHITECTURE ALLOWING ZERO LATENCY WRITE COMMIT - This invention speeds operation for coherence writes to shared memory. This invention immediately commits coherence write data to the memory endpoint. Thus this data will be available earlier than if the memory controller stalled this write pending snoop responses. This invention computes write enable strobes for the coherence write data based upon the cache dirty tags. This invention initiates a snoop cycle based upon the address of the coherence write. The stored write enable strobes enable determination of which data to write to the endpoint memory upon a cached and dirty snoop response. | 04-24-2014 |
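The strobe computation described here is easy to picture: dirty tags become per-portion write enables at commit time, and a later cached-and-dirty snoop response fills only the portions the committed write did not cover. Portion granularity and widths in this sketch are assumptions.

```cpp
#include <cstdint>

// One strobe bit per 8-byte portion of a 64-byte line (assumed geometry).
// Portions marked dirty were written by the immediately committed coherence
// write and must not be overwritten by snoop-response data.
inline uint8_t strobesFromDirtyTags(uint8_t dirtyPortions) {
    return dirtyPortions;
}

// On a cached-and-dirty snoop response, merge only the portions whose
// strobe bit is clear, i.e. those the zero-latency commit did not write.
inline void mergeSnoopData(uint64_t line[8], const uint64_t snoop[8],
                           uint8_t strobes) {
    for (int p = 0; p < 8; ++p)
        if (!(strobes & (1u << p)))
            line[p] = snoop[p];
}
```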
20140115272 | Deadlock-Avoiding Coherent System On Chip Interconnect - This invention mitigates deadlocking issues by adding a separate non-blocking pipeline for snoop returns. This separate pipeline is not blocked behind coherent requests. This invention also repartitions the master-initiated traffic to move cache evictions (both with and without data) and non-coherent writes to the new non-blocking channel. This non-blocking pipeline removes the need for any coherent request to complete before the snoop request can reach the memory controller. Repartitioning cache-initiated evictions to the non-blocking pipeline prevents deadlock when a snoop and an eviction occur concurrently. The non-blocking channel of this invention combines snoop responses from memory-controller-initiated requests with master-initiated evictions/non-coherent writes. | 04-24-2014 |
20140115273 | DISTRIBUTED DATA RETURN BUFFER FOR COHERENCE SYSTEM WITH SPECULATIVE ADDRESS SUPPORT - The MSMC (Multicore Shared Memory Controller) described is a module designed to manage traffic between multiple processor cores, other mastering peripherals or DMA, and the EMIF (External Memory InterFace) in a multicore SoC. Each processor has an associated return buffer allowing out-of-order responses of memory read data and cache snoop responses to ensure maximum bandwidth at the endpoints, and all endpoints receive status messages to simplify the return queue. | 04-24-2014 |
20140115274 | EXTENDING A CACHE COHERENCY SNOOP BROADCAST PROTOCOL WITH DIRECTORY INFORMATION - In one embodiment, a method includes receiving a read request from a first caching agent, determining whether a directory entry associated with the memory location indicates that the information is not present in a remote caching agent, and if so, transmitting the information from the memory location to the first caching agent before snoop processing with respect to the read request is completed. Other embodiments are described and claimed. | 04-24-2014 |
20140115275 | SATISFYING MEMORY ORDERING REQUIREMENTS BETWEEN PARTIAL READS AND NON-SNOOP ACCESSES - A method and apparatus for preserving memory ordering in a cache coherent link based interconnect in light of partial and non-coherent memory accesses is herein described. In one embodiment, a partial memory access, such as a partial read, is implemented utilizing a Read Invalidate and/or Snoop Invalidate message. When a peer node receives a Snoop Invalidate message referencing data from a requesting node, the peer node is to invalidate a cache line associated with the data and is not to directly forward the data to the requesting node. In one embodiment, when the peer node holds the referenced cache line in a Modified coherency state, in response to receiving the Snoop Invalidate message, the peer node is to writeback the data to a home node associated with the data. | 04-24-2014 |
20140136797 | TECHNIQUE TO SHARE INFORMATION AMONG DIFFERENT CACHE COHERENCY DOMAINS - A technique to enable information sharing among agents within different cache coherency domains. In one embodiment, a graphics device may use one or more caches used by one or more processing cores to store or read information, which may be accessed by one or more processing cores in a manner that does not affect programming and coherency rules pertaining to the graphics device. | 05-15-2014 |
20140149686 | COHERENT ATTACHED PROCESSOR PROXY SUPPORTING MASTER PARKING - In response to receiving, at a coherent attached processor proxy (CAPP), a memory access request and expected coherence state from an attached processor (AP), the CAPP determines that a conflicting request is being serviced. In response to determining that the CAPP is servicing a conflicting request and that the expected state matches, a master machine of the CAPP is allocated in a Parked state to service the memory access request after completion of service of the conflicting request. The Parked state prevents servicing by the CAPP of a further conflicting request snooped on the system fabric. In response to completion of service of the conflicting request, the master machine transitions out of the Parked state and issues on the system fabric a memory access request corresponding to that received from the AP. | 05-29-2014 |
20140149687 | METHOD AND APPARATUS FOR SUPPORTING TARGET-SIDE SECURITY IN A CACHE COHERENT SYSTEM - A cache coherency controller, a system comprising such, and a method of its operation are disclosed. The coherency controller ensures that target-side security checking rules are not violated by the performance-improving processes commonly used in coherency controllers such as dropping, merging, invalidating, forwarding, and snooping. This is done by ensuring that requests marked for target-side security checking and any other requests to overlapping addresses are forwarded directly to the target-side security filter without modification or side effects. | 05-29-2014 |
20140149688 | COHERENT ATTACHED PROCESSOR PROXY SUPPORTING MASTER PARKING - In response to receiving, at a coherent attached processor proxy (CAPP), a memory access request and expected coherence state from an attached processor (AP), the CAPP determines that a conflicting request is being serviced. In response to determining that the CAPP is servicing a conflicting request and that the expected state matches, a master machine of the CAPP is allocated in a Parked state to service the memory access request after completion of service of the conflicting request. The Parked state prevents servicing by the CAPP of a further conflicting request snooped on the system fabric. In response to completion of service of the conflicting request, the master machine transitions out of the Parked state and issues on the system fabric a memory access request corresponding to that received from the AP. | 05-29-2014 |
20140149689 | COHERENT PROXY FOR ATTACHED PROCESSOR - A coherent attached processor proxy (CAPP) of a primary coherent system receives a memory access request from an attached processor (AP) and an expected coherence state of a target address of the memory access request with respect to a cache memory of the AP. In response, the CAPP determines a coherence state of the target address and whether or not the expected state matches the determined coherence state. In response to determining that the expected state matches the determined coherence state, the CAPP issues a memory access request corresponding to that received from the AP on a system fabric of the primary coherent system. In response to determining that the expected state does not match the coherence state determined by the CAPP, the CAPP transmits a failure message to the AP without issuing on the system fabric a memory access request corresponding to that received from the AP. | 05-29-2014 |
20140149690 | Multi-Processor, Multi-Domain, Multi-Protocol Cache Coherent Speculation Aware Shared Memory Controller and Interconnect - This invention combines a multicore shared memory controller and an asynchronous protocol converting bridge to create a very efficient heterogeneous multi-processor system. After traversing the protocol converting bridge, the commands travel through the regular processor port. This allows the interconnect to remain unchanged while having any combination of different processors connected. This invention tightly integrates all of the processors into the same memory controller/interconnect. | 05-29-2014 |
20140156950 | EMULATED MESSAGE SIGNALED INTERRUPTS IN MULTIPROCESSOR SYSTEMS - A processor with coherency-leveraged support for low latency message signaled interrupt handling includes multiple execution cores and their associated cache memories. A first cache memory associated with a first of the execution cores includes a plurality of cache lines. The first cache memory has a cache controller including hardware logic, microcode, or both to identify a first cache line as an interrupt reserved cache line and map the first cache line to a selected memory address associated with an I/O device. The selected memory address may be a portion of configuration data in persistent storage accessible to the processor. The controller may set a coherency state of the first cache line to shared and, in response to detecting an I/O transaction including I/O data from the I/O device and containing a reference to the selected memory address, emulate a first message signaled interrupt identifying the selected memory address. | 06-05-2014 |
20140156951 | MULTICORE, MULTIBANK, FULLY CONCURRENT COHERENCE CONTROLLER - This invention optimizes non-shared accesses and avoids dependencies across coherent endpoints to ensure bandwidth across the system even when data is shared. The coherence controller is distributed across all coherent endpoints. The coherence controller for each memory endpoint maintains state for each coherent access to ensure the proper ordering of events. The coherence controller of this invention uses First-In-First-Out allocation to ensure full utilization of the resources before stalling and simplicity of implementation. The coherence controller provides Snoop Command/Response ID Allocation per memory endpoint. | 06-05-2014 |
20140181419 | CREDIT LOOKAHEAD MECHANISM - Systems and methods for preventing excessive buffering of transactions in a coherence point. The coherence point uses a lookahead mechanism to determine if there are enough credits from the memory controller for forwarding the outstanding transactions stored in the IRQ. If there are not enough credits, then the coherence point prevents the switch fabric from forwarding additional transactions to the coherence point. By preventing excessive buffering in the IRQ, the QoS-based ordering of transactions performed by the switch fabric is preserved. | 06-26-2014 |
20140181420 | DISTRIBUTED CACHE COHERENCY DIRECTORY WITH FAILURE REDUNDANCY - A system includes a number of processors with each processor including a cache memory. The system also includes a number of directory controllers coupled to the processors. Each directory controller may be configured to administer a corresponding cache coherency directory. Each cache coherency directory may be configured to track a corresponding set of memory addresses. Each processor may be configured with information indicating the corresponding set of memory addresses tracked by each cache coherency directory. Directory redundancy operations in such a system may include identifying a failure of one of the cache coherency directories; reassigning the memory address set previously tracked by the failed cache coherency directory among the non-failed cache coherency directories; and reconfiguring each processor with information describing the reassignment of the memory address set among the non-failed cache coherency directories. | 06-26-2014 |
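One deterministic way to realize the reassignment this entry describes is a home function that skips failed directories, shown below. The modulo-plus-probe scheme is an invented example rather than the patented method, and it assumes at least one directory remains alive.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical home mapping: each line address hashes to a directory; if
// that directory has failed, its address set is absorbed by the next live
// one, and every processor is reconfigured with the same `alive` view.
int directoryFor(uint64_t lineAddr, const std::vector<bool>& alive) {
    int n = static_cast<int>(alive.size());
    int d = static_cast<int>(lineAddr % static_cast<uint64_t>(n));
    while (!alive[d]) d = (d + 1) % n;  // requires at least one live directory
    return d;
}
```

A production scheme would spread the failed directory's address set across all survivors rather than onto one neighbor, but the reconfiguration step is the same: recompute the map and push it to every processor.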
20140189253 | CACHE COHERENCY AND PROCESSOR CONSISTENCY - Responsive to execution of a computer instruction in a current translation window, state indicators associated with a cache line accessed for the execution may be modified. The state indicators may include: a first indicator to indicate whether the computer instruction is a load instruction moved from a subsequent translation window into the current translation window, a second indicator to indicate whether the cache line is modified in a cache responsive to the execution of the computer instruction, a third indicator to indicate whether the cache line is speculatively modified in the cache responsive to the execution of the computer instruction, a fourth indicator to indicate whether the cache line is speculatively loaded by the computer instruction, a fifth indicator to indicate whether a core executing the computer instruction exclusively owns the cache line, and a sixth indicator to indicate whether the cache line is invalid. | 07-03-2014 |
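The six indicators enumerate naturally as a packed flag set; the encoding below is a direct transcription of the abstract's list, while the example rollback predicate is a hedged interpretation layered on top.

```cpp
#include <cstdint>

// One flag per indicator named in the abstract (bit assignment assumed).
enum CacheLineFlags : uint8_t {
    kHoistedLoad      = 1 << 0,  // load moved from a later translation window
    kModified         = 1 << 1,  // line modified by the executed instruction
    kSpecModified     = 1 << 2,  // line speculatively modified
    kSpecLoaded       = 1 << 3,  // line speculatively loaded by the instruction
    kExclusivelyOwned = 1 << 4,  // executing core exclusively owns the line
    kInvalid          = 1 << 5,  // line is invalid
};

// Illustrative policy: a remote access hitting a line that was touched
// speculatively inside the current translation window forces a rollback.
inline bool snoopForcesRollback(uint8_t flags) {
    return (flags & (kSpecModified | kSpecLoaded)) && !(flags & kInvalid);
}
```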
20140189254 | Snoop Filter Having Centralized Translation Circuitry and Shadow Tag Array - A processor is described that includes a plurality of processing cores. The processor includes an interconnection network coupled to each of said processing cores. The processor includes snoop filter logic circuitry coupled to the interconnection network and associated with coherence plane logic circuitry of the processor. The snoop filter logic circuitry contains circuitry to hold information that identifies not only which of the processing cores are caching specific cache lines, but also where in the respective caches of the processing cores those cache lines are cached. | 07-03-2014 |
20140189255 | METHOD AND APPARATUS TO SHARE MODIFIED DATA WITHOUT WRITE-BACK IN A SHARED-MEMORY MANY-CORE SYSTEM - A cache-coherent device may include multiple caches and a cache coherency engine, which monitors whether more than one version of a cache line is stored in the caches and whether the version of the cache line in the caches is consistent with the version of the cache line stored in the memory. | 07-03-2014 |
20140201467 | EPOCH-BASED RECOVERY FOR COHERENT ATTACHED PROCESSOR PROXY - A coherent attached processor proxy (CAPP) participates in coherence communication in a primary coherent system on behalf of an attached processor external to the primary coherent system. The CAPP includes an epoch timer that advances at regular intervals to define epochs of operation of the CAPP. Each of one or more entries in a data structure in the CAPP is associated with a respective epoch. Recovery operations for the CAPP are initiated based on a comparison of an epoch indicated by the epoch timer and the epoch associated with one of the one or more entries in the data structure. | 07-17-2014 |
20140201468 | ACCELERATED RECOVERY FOR SNOOPED ADDRESSES IN A COHERENT ATTACHED PROCESSOR PROXY - A coherent attached processor proxy (CAPP) that participates in coherence communication in a primary coherent system on behalf of an external attached processor maintains, in each of a plurality of entries of a CAPP directory, information regarding a respective associated cache line of data from the primary coherent system cached by the attached processor. In response to initiation of recovery operations, the CAPP transmits, in a generally sequential order with respect to the CAPP directory, multiple memory access requests indicating an error for addresses indicated by the plurality of entries. In response to a snooped memory access request that targets a particular address hitting in the CAPP directory during the transmitting, the CAPP performs a coherence recovery operation for the particular address prior to a time indicated by the generally sequential order. | 07-17-2014 |
20140208041 | MEMORY CIRCUITRY INCLUDING COMPUTATIONAL CIRCUITRY FOR PERFORMING SUPPLEMENTAL FUNCTIONS - A computer system includes but is not limited to primary processing circuitry, a bus coupled to the primary processing circuitry, and memory circuitry coupled to the bus. The memory circuitry is physically separated from the primary processing circuitry. The memory circuitry includes at least one integrated memory circuit and computational circuitry. The at least one integrated memory circuit is configured to store and retrieve data and to provide to the bus, during accessing intervals, requested data for the primary processing circuitry. The computational circuitry, co-located with the at least one integrated memory circuit, can be configured to perform supplemental functions at least partially during time periods that are not accessing intervals. | 07-24-2014 |
20140223110 | ACTIVE MEMORY PROCESSOR SYSTEM - In general, the present invention relates to data cache processing. Specifically, the present invention relates to a system that provides reconfigurable dynamic cache which varies the operation strategy of cache memory based on the demand from the applications originating from different external general processor cores, along with functions of a virtualized hybrid core system. The system includes receiving a data request, selecting an operational mode based on the data request and a predefined selection algorithm, and processing the data request based on the selected operational mode. | 08-07-2014 |
20140250275 | SELECTION OF POST-REQUEST ACTION BASED ON COMBINED RESPONSE AND INPUT FROM THE REQUEST SOURCE - A data structure includes a plurality of entries each corresponding to a different systemwide combined response of a data processing system. A particular entry includes identifiers of multiple possible actions that can be taken in response to a systemwide combined response. Master logic issues a memory access request on a system fabric of the data processing system. The master logic, responsive to receiving the systemwide combined response and a selection of one of the multiple possible actions from a source of the memory access request prior to receipt of the systemwide combined response, selects the particular entry based on the systemwide combined response and selects one of the multiple possible actions identified in the particular entry based on the received selection. The master logic services the memory access request in accordance with the systemwide combined response by performing the selected one of the multiple possible actions. | 09-04-2014 |
20140250276 | SELECTION OF POST-REQUEST ACTION BASED ON COMBINED RESPONSE AND INPUT FROM THE REQUEST SOURCE - A data structure includes a plurality of entries each corresponding to a different systemwide combined response of a data processing system. A particular entry includes identifiers of multiple possible actions that can be taken in response to a systemwide combined response. Master logic issues a memory access request on a system fabric of the data processing system. The master logic, responsive to receiving the systemwide combined response and a selection of one of the multiple possible actions from a source of the memory access request prior to receipt of the systemwide combined response, selects the particular entry based on the systemwide combined response and selects one of the multiple possible actions identified in the particular entry based on the received selection. The master logic services the memory access request in accordance with the systemwide combined response by performing the selected one of the multiple possible actions. | 09-04-2014 |
20140281275 | BROADCAST MESSAGING AND ACKNOWLEDGMENT MESSAGING FOR POWER MANAGEMENT IN A MULTIPROCESSOR SYSTEM - Various aspects provide for implementing a cache coherence protocol. A system comprises at least one processing component and a centralized controller. The at least one processing component comprises a cache controller. The cache controller is configured to manage a cache memory associated with a processor. The centralized controller is configured to communicate with the cache controller based on a power state of the processor. | 09-18-2014 |
20140281276 | METHOD, APPARATUS, AND SYSTEM FOR LOW LATENCY COMMUNICATION - A method, apparatus, computer program product, and computer readable medium are disclosed to perform receipt of a snoop notification indicating a write to a memory address associated with a cache, determination that the snoop notification signifies receipt of a message based, at least in part, on the memory address, and performance of an operation based, at least in part, on the message. | 09-18-2014 |
20140297967 | INTER-QUEUE ANTI-STARVATION MECHANISM WITH DYNAMIC DEADLOCK AVOIDANCE IN A RETRY BASED PIPELINE - Methods and apparatus relating to an inter-queue anti-starvation mechanism with dynamic deadlock avoidance in a retry based pipeline are described. In one embodiment, logic may arbitrate between two queues based on various rules. The queues may store data including local or remote requests, data responses, non-data responses, external interrupts, etc. Other embodiments are also disclosed. | 10-02-2014 |
20140304480 | NEIGHBOR BASED AND DYNAMIC HOT THRESHOLD BASED HOT DATA IDENTIFICATION - An address is received. One or more neighbors associated with the received address is/are determined. One or more neighboring hot metrics is/are determined for the one or more neighbors associated with the received address. A hot metric for the received address is determined based at least in part on the neighboring hot metrics. | 10-09-2014 |
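A minimal sketch of the neighbor-based metric: an address's hotness blends its own access history with the metrics of its neighbors, so spatially clustered accesses cross a hot threshold together. The blend weights below are invented; the abstract only states that the metric is based at least in part on the neighboring metrics.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical blend: own access count plus half the neighborhood average.
uint32_t hotMetricFor(uint32_t ownAccessCount,
                      const std::vector<uint32_t>& neighborMetrics) {
    if (neighborMetrics.empty()) return ownAccessCount;
    uint64_t sum = 0;
    for (uint32_t m : neighborMetrics) sum += m;
    uint32_t neighborAvg =
        static_cast<uint32_t>(sum / neighborMetrics.size());
    return ownAccessCount + neighborAvg / 2;
}

// The address is classified hot when its metric exceeds a threshold that
// the abstract says is itself dynamic.
inline bool isHot(uint32_t metric, uint32_t dynamicThreshold) {
    return metric > dynamicThreshold;
}
```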
20140310480 | DATA PROCESSING APPARATUS AND METHOD FOR PERFORMING LOAD-EXCLUSIVE AND STORE-EXCLUSIVE OPERATIONS - A data processing apparatus is provided in which a processor unit accesses data values stored in a memory and a cache stores local copies of a subset of the data values. The cache maintains a status value for each local copy stored in the cache. When the processor unit executes a load-exclusive operation, a first data value is loaded from a specified memory location and an exclusive use monitor begins monitoring the specified memory location for accesses. When the processor unit executes a store-exclusive operation, a second data value is stored to the specified memory location if the exclusive use monitor indicates that the first data value has not been modified since the load-exclusive operation was executed. When a local copy of the first data value is stored in the cache and the status value for the local copy of the first data value indicates that the processor unit has exclusive usage of the first data value, the data processing apparatus is configured to prevent modification of the status value for a predetermined time period after the processor unit has executed the load-exclusive operation. | 10-16-2014 |
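The exclusive-use monitor here behaves like a classic load-linked/store-conditional pair. The sequential model below captures the protocol from the abstract (real hardware implements it in the cache and adds the status-value hold-off); all names are hypothetical.

```cpp
#include <cstdint>
#include <optional>

// Software model of an exclusive-use monitor for one processor unit.
class ExclusiveMonitor {
    std::optional<uintptr_t> watched_;  // address armed by load-exclusive

public:
    uint32_t loadExclusive(const uint32_t* p) {
        watched_ = reinterpret_cast<uintptr_t>(p);  // begin monitoring
        return *p;
    }

    // Any intervening write to the watched location disarms the monitor.
    void observeWrite(const uint32_t* p) {
        if (watched_ == reinterpret_cast<uintptr_t>(p)) watched_.reset();
    }

    // The store succeeds only if the monitor is still armed for p.
    bool storeExclusive(uint32_t* p, uint32_t v) {
        if (watched_ != reinterpret_cast<uintptr_t>(p)) return false;
        *p = v;
        watched_.reset();
        return true;
    }
};
```

An atomic increment built on this pair simply retries loadExclusive/storeExclusive until the store succeeds; the entry's contribution is holding the line's exclusive status stable for a predetermined period so such retry loops converge.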
20140359230 | PROTOCOL FOR CONFLICTING MEMORY TRANSACTIONS - Embodiments of the invention describe a cache coherency protocol that eliminates the need for ordering between message classes and also eliminates home tracker preallocation. Embodiments of the invention describe a less complex conflict detection and resolution mechanism (at the home agent) without any performance degradation in the form of bandwidth or latency compared to prior art solutions. | 12-04-2014 |
20140379997 | COHERENT ATTACHED PROCESSOR PROXY HAVING HYBRID DIRECTORY - A coherent attached processor proxy (CAPP) includes transport logic having a first interface configured to support communication with a system fabric of a primary coherent system and a second interface configured to support communication with an attached processor (AP) that is external to the primary coherent system and that includes a cache memory that holds copies of memory blocks belonging to a coherent address space of the primary coherent system. The CAPP further includes one or more master machines that initiate memory access requests on the system fabric of the primary coherent system on behalf of the AP, one or more snoop machines that service requests snooped on the system fabric, and a CAPP directory that includes a precise directory, having a plurality of entries each associated with a smaller data granule, and a coarse directory, having a plurality of entries each associated with a larger data granule. | 12-25-2014 |
20140379998 | DYNAMIC HOME TILE MAPPING - Technologies for dynamic home tile mapping are described. An address request can be received from a processing core, the processing core being associated with a home tile table, the home tile table including respective mappings of one or more directory addresses to one or more home tiles. A buffer can be scanned to identify a presence of the address within the buffer. Based on an identification of the presence of the address within the buffer, a home tile identifier corresponding to the address can be provided from the buffer. | 12-25-2014 |
20150012713 | DATA PROCESSING APPARATUS HAVING FIRST AND SECOND PROTOCOL DOMAINS, AND METHOD FOR THE DATA PROCESSING APPARATUS - A data processing apparatus having a first protocol domain and a second protocol domain, and a method for the data processing apparatus. | 01-08-2015 |
20150032974 | OBJECT CACHING FOR MOBILE DATA COMMUNICATION WITH MOBILITY MANAGEMENT - Method and system are provided for object caching with mobility management for mobile data communication. The method may include: intercepting and snooping data communications at a base station between a user equipment and a content server without terminating communications; implementing object caching at the base station using snooped data communications; implementing object caching at an object cache server in the network, wherein the object cache server proxies communications to the content server from the user equipment; and maintaining synchrony between an object cache at the base station and an object cache at the object cache server. | 01-29-2015 |
20150046660 | ACTIVE MEMORY PROCESSOR SYSTEM - In general, the present invention relates to data cache processing. Specifically, the present invention relates to a system that provides reconfigurable dynamic cache which varies the operation strategy of cache memory based on the demand from the applications originating from different external general processor cores, along with functions of a virtualized hybrid core system. The system includes receiving a data request, selecting an operational mode based on the data request and a predefined selection algorithm, and processing the data request based on the selected operational mode. | 02-12-2015 |
20150067272 | SYSTEM AND METHOD FOR PROVIDING STEALTH MEMORY - The described implementations relate to computer memory. One implementation provides a technique that can include providing stealth memory to an application. The stealth memory can have an associated physical address on a memory device. The technique can also include identifying a cache line of a cache that is mapped to the physical address associated with the stealth page, and locking one or more other physical addresses on the memory device that also map to the cache line. | 03-05-2015 |
20150074357 | DIRECT SNOOP INTERVENTION - A low latency cache intervention mechanism implements a snoop filter to dynamically select an intervener cache for a cache “hit” in a multiprocessor architecture of a computer system. The selection of the intervener is based on variables such as latency, topology, frequency, utilization, load, wear balance, and/or power state of the computer system. | 03-12-2015 |
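Intervener selection here amounts to a cost minimization over the caches the snoop filter reports as holding the line. The abstract only lists the input variables (latency, topology, frequency, utilization, load, wear balance, power state); the weighting below is an invented placeholder.

```cpp
#include <limits>
#include <vector>

// Assumed per-candidate inputs; a real filter would read these from
// interconnect telemetry and power-management state.
struct CacheCandidate {
    int id;              // caching agent identifier
    int latencyNs;       // estimated intervention latency to the requester
    int utilizationPct;  // current load on that cache's ports
    bool poweredUp;      // power state of the agent
};

// Pick the lowest-cost powered-up holder; -1 means fall back to memory.
int selectIntervener(const std::vector<CacheCandidate>& holders) {
    int best = -1;
    int bestCost = std::numeric_limits<int>::max();
    for (const auto& c : holders) {
        if (!c.poweredUp) continue;                    // skip sleeping caches
        int cost = c.latencyNs * 4 + c.utilizationPct; // invented weighting
        if (cost < bestCost) { bestCost = cost; best = c.id; }
    }
    return best;
}
```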
20150089160 | METHOD AND APPARATUS FOR COPYING DATA USING CACHE - A cache and a method for performing data copying are provided. The cache includes copy logic and may be connected to a processor through a first bus and to a memory controller through a second bus, which is different from the first bus. Moreover, the copy logic may perform data copying through the second bus based on a data copy command received from the processor. | 03-26-2015 |
20150089161 | HIGH PERFORMANCE INTERCONNECT COHERENCE PROTOCOL - A coherence protocol message is sent corresponding to a particular cache line. A potential conflict involving the particular cache line is identified and a forward request is sent to a home agent to identify the potential conflict. A forward response can be received in response to the forward request from the home agent and a response to the conflict can be determined. | 03-26-2015 |
20150095590 | METHOD AND APPARATUS FOR PAGE-LEVEL MONITORING - An apparatus and method for page level monitoring are described. For example, one embodiment of a method for monitoring memory pages comprises storing information related to each of a plurality of memory pages including an address identifying a location for a monitor variable for each of the plurality of memory pages in a data structure directly accessible only by a software layer operating at or above a first privilege level; detecting virtual-to-physical page mapping consistency changes or other page modifications to a particular memory page for which information is maintained in the data structure; responsively updating the monitor variable to reflect the consistency changes or page modifications; checking a first monitor variable associated with a first memory page prior to execution of first program code; and refraining from executing the first program code if the first monitor variable indicates consistency changes or page modifications to the first memory page. | 04-02-2015 |
20150095591 | METHOD AND SYSTEM FOR FILTERING THE STORES TO PREVENT ALL STORES FROM HAVING TO SNOOP CHECK AGAINST ALL WORDS OF A CACHE - In a processor, a method for filtering stores to prevent all stores from having to snoop check against all words of a cache. The method includes implementing a cache wherein stores snoop the caches for address matches to maintain coherency; marking a portion of a cache line if a given core out of a plurality of cores loads from that portion by using an access mask; checking the access mask upon execution of subsequent stores to the cache line; and causing a miss prediction when a subsequent store to the portion of the cache line sees a prior mark from a load in the access mask. | 04-02-2015 |
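The access mask is essentially one load bit per word of the line: a store then consults a small mask instead of snoop-checking every word. The 16-word line and names below are assumptions.

```cpp
#include <cstdint>

// Hypothetical per-line filter: bit w is set when some core loads word w.
struct AccessMaskedLine {
    uint16_t loadedWords = 0;

    void onLoad(unsigned word) {
        loadedWords |= static_cast<uint16_t>(1u << word);
    }

    // True => the store collides with a prior load recorded in the mask,
    // so the pipeline must raise the miss-prediction described above.
    bool storeConflicts(unsigned word) const {
        return (loadedWords >> word) & 1u;
    }
};
```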
20150100740 | Obstruction-Aware Cache Management - Processors and methods disclosed herein include a cache memory unit, n processor cores where n≧1, a controller connected to the cache memory unit and to each of the n processor cores, and n obstruction monitoring units, where each obstruction monitoring unit is connected to the controller and to a different one of the n processor cores, and where during operation of the electronic processor, each obstruction monitoring unit is configured to detect an obstruction corresponding to an operation from the processor core connected to the obstruction monitoring unit before the operation executes in the cache memory unit. | 04-09-2015 |
20150149734 | Combined Transparent/Non-Transparent Cache - In one embodiment, a memory that is delineated into transparent and non-transparent portions. The transparent portion may be controlled by a control unit coupled to the memory, along with a corresponding tag memory. The non-transparent portion may be software controlled by directly accessing the non-transparent portion via an input address. In an embodiment, the memory may include a decoder configured to decode the address and select a location in either the transparent or non-transparent portion. Each request may include a non-transparent attribute identifying the request as either transparent or non-transparent. In an embodiment, the size of the transparent portion may be programmable. Based on the non-transparent attribute indicating transparent, the decoder may selectively mask bits of the address based on the size to ensure that the decoder only selects a location in the transparent portion. | 05-28-2015 |
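The decoder's masking step is easy to sketch: a request tagged transparent has its index ANDed with (transparentSize - 1), so it can only select locations inside the programmable transparent partition, while a non-transparent request may address the whole array. Power-of-two geometry is assumed below.

```cpp
#include <cstdint>

// Hypothetical decode path for the combined memory.
struct CombinedCacheDecoder {
    uint32_t transparentSize;  // programmable, power of two, in locations
    uint32_t totalSize;        // total locations, power of two

    uint32_t decode(uint32_t index, bool nonTransparentAttr) const {
        if (nonTransparentAttr)
            return index % totalSize;          // software-managed direct access
        return index & (transparentSize - 1);  // confined to transparent part
    }
};
```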
20150317251 | MAINTAINING A SYSTEM STATE CACHE - Methods, apparatuses and computer software products implement embodiments of the present invention that include storing, to a module memory in each of a plurality of modules having multiple sub-modules, a record containing record entries corresponding respectively to the sub-modules. Upon detecting changes in respective states of the sub-modules of a given module, the corresponding record entries are set in response to the detected changes in the states of the sub-modules of the given module. A cache containing cache entries corresponding respectively to the sub-modules in the plurality of the modules is stored to a controller memory, and the record in each of the modules is polled. Upon detecting that a given record entry of the given module has been set, the current state information with respect to the given sub-module is requested and received, and a corresponding cache entry is updated in the cache with the current state information. | 11-05-2015 |
20150363315 | METHOD FOR TEMPORARILY STORING DATA AND STORAGE DEVICE - A method for temporarily storing data and a storage device are provided. The method for temporarily storing data is applied to the storage device, and the storage device includes a source agent and a target agent. The method includes: sending, by the source agent, a data obtaining request to the target agent; receiving, by the source agent, target data that corresponds to the data obtaining request and is returned by the target agent; determining, by the source agent, whether a snooping request that is for the target data and sent by the target agent is received after the data obtaining request is sent and before the target data is received, where the snooping request indicates that the target agent is simultaneously processing an obtaining request from another source agent for the target data; and if the snooping request is received, discarding, by the source agent, the target data. | 12-17-2015 |
20150370710 | OPTIONAL ACKNOWLEDGEMENT FOR OUT-OF-ORDER COHERENCE TRANSACTION COMPLETION - To enable efficient tracking of transactions, an acknowledgement expected signal is used to give the cache coherent interconnect a hint as to whether a transaction requires coherent ownership tracking. This signal informs the cache coherent interconnect to expect an ownership transfer acknowledgement signal from the initiating master upon read/write transfer completion. The cache coherent interconnect can therefore continue tracking the transaction at its point of coherency until it receives the acknowledgement from the initiating master, and it does so only when necessary. | 12-24-2015 |
20150378902 | CONDITIONAL INCLUSION OF DATA IN A TRANSACTIONAL MEMORY READ SET - Determining, by a processor having a cache, if data in the cache is to be monitored for cache coherency conflicts in a transactional memory (TM) environment. A processor executes a TM transaction that includes the following. Executing a memory data access instruction that accesses an operand at an operand memory address. Based on either a prefix instruction associated with the memory data access instruction, or an operand tag associated with the operand of the memory data access instruction, determining whether a cache entry having the operand is to be marked for monitoring for cache coherency conflicts while the processor is executing the transaction. Based on determining that the cache entry is to be marked for monitoring for cache coherency conflicts while the processor is executing the transaction, marking the cache entry for monitoring for conflicts. | 12-31-2015 |
20150378903 | CO-PROCESSOR MEMORY ACCESSES IN A TRANSACTIONAL MEMORY - Monitoring, by a processor having a cache, addresses accessed by a co-processor associated with the processor during transactional execution of a transaction by the processor. The processor executes a transactional memory (TM) transaction, including receiving, by the processor, a memory address range of data that a co-processor may access to perform a co-processor operation. The processor saves the memory address range. Based on receiving, by the processor, a cache coherency request that conflicts with the saved address range, the processor aborts the TM transaction. | 12-31-2015 |
20150378906 | CONDITIONAL INCLUSION OF DATA IN A TRANSACTIONAL MEMORY READ SET - Determining, by a processor having a cache, if data in the cache is to be monitored for cache coherency conflicts in a transactional memory (TM) environment. A processor executes a TM transaction that includes the following. Executing a memory data access instruction that accesses an operand at an operand memory address. Based on either a prefix instruction associated with the memory data access instruction, or an operand tag associated with the operand of the memory data access instruction, determining whether a cache entry having the operand is to be marked for monitoring for cache coherency conflicts while the processor is executing the transaction. Based on determining that the cache entry is to be marked for monitoring for cache coherency conflicts while the processor is executing the transaction, marking the cache entry for monitoring for conflicts. | 12-31-2015 |
20160055085 | ENFORCING ORDERING OF SNOOP TRANSACTIONS IN AN INTERCONNECT FOR AN INTEGRATED CIRCUIT - An interconnect has transaction tracking circuitry for enforcing ordering of a set of data access transactions so that they are issued to slave devices in the order in which they are received from master devices. The transaction tracking circuitry is reused to also enforce ordering of snoop transactions which are triggered by the set of data access transactions, for snooping master devices identified by a snoop filter as holding cached data for the target address of the transactions. | 02-25-2016 |
20160062889 | COHERENCY CHECKING OF INVALIDATE TRANSACTIONS CAUSED BY SNOOP FILTER EVICTION IN AN INTEGRATED CIRCUIT - An interconnect has coherency control circuitry for performing coherency control operations and a snoop filter for identifying which devices coupled to the interconnect have cached data from a given address. When an address is looked up in the snoop filter and misses, and there is no spare snoop filter entry available, the snoop filter selects a victim entry corresponding to a victim address and issues an invalidate transaction for invalidating locally cached copies of the data identified by the victim address. The coherency control circuitry for performing coherency checking operations for data access transactions is reused for performing coherency control operations for the invalidate transaction issued by the snoop filter. This greatly reduces the circuit complexity of the snoop filter. | 03-03-2016 |
20160062890 | COHERENCY CHECKING OF INVALIDATE TRANSACTIONS CAUSED BY SNOOP FILTER EVICTION IN AN INTEGRATED CIRCUIT - An interconnect has coherency control circuitry for performing coherency control operations and a snoop filter for identifying which devices coupled to the interconnect have cached data from a given address. When an address is looked up in the snoop filter and misses, and there is no spare snoop filter entry available, the snoop filter selects a victim entry corresponding to a victim address and issues an invalidate transaction for invalidating locally cached copies of the data identified by the victim address. The coherency control circuitry for performing coherency checking operations for data access transactions is reused for performing coherency control operations for the invalidate transaction issued by the snoop filter. This greatly reduces the circuit complexity of the snoop filter. | 03-03-2016 |
20160085446 | CONTROL DEVICE AND STORAGE SYSTEM - A control device includes a processor. The processor is configured to collect plural types of performance information regarding a first data unit. The processor is configured to determine, on the basis of the collected plural types of performance information, whether to transfer the first data unit from a first storage device which is under control of a first controller to a second storage device which is positioned higher than the first storage device. The processor is configured to transfer the first data unit from the first storage device to the second storage device depending on a result of the determination. | 03-24-2016 |
20160179673 | CROSS-DIE INTERFACE SNOOP OR GLOBAL OBSERVATION MESSAGE ORDERING | 06-23-2016 |
20160188470 | PROMOTION OF A CACHE LINE SHARER TO CACHE LINE OWNER - A system and method for performing coherent cache snoops whereby a single or limited number of sharing coherent agents are snooped for a data access. A directory may store information identifying which coherent agents have a shared copy of a cache line. If more than one agent might hold the line in a shared state, one is promoted to an owner state within the directory. Accesses to the shared cache line are responded to by a snoop to just one, or a number less than all, of the caching agents sharing the cache line. | 06-30-2016 |
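The promotion rule reduces a broadcast to a single targeted snoop. The directory sketch below makes the first sharer the owner; the patent leaves the promotion policy open, so that choice and all names are assumptions.

```cpp
#include <cstdint>
#include <unordered_map>

// Hypothetical directory with a sharer bit-vector plus a promoted owner.
struct DirEntry {
    uint64_t sharers = 0;  // bit per coherent agent holding the line
    int owner = -1;        // promoted sharer, or -1 if none
};

class SnoopDirectory {
    std::unordered_map<uint64_t, DirEntry> entries_;

public:
    void recordShare(uint64_t line, int agent) {
        DirEntry& e = entries_[line];
        e.sharers |= (1ull << agent);
        if (e.owner < 0) e.owner = agent;  // first sharer is promoted
    }

    // A data access snoops just the owner instead of every sharer.
    int snoopTarget(uint64_t line) const {
        auto it = entries_.find(line);
        return it == entries_.end() ? -1 : it->second.owner;
    }
};
```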
20160188472 | DISTRIBUTED IMPLEMENTATION FOR CACHE COHERENCE - A distributed implementation for cache coherence comprises distinct agent interface units, coherency controllers, and memory interface units. The distinct units are separated logically and physically. Units are interconnected, and communicate with each other, by a transport network. Different organizations of connectivity are possible and chosen based on system performance and physical floorplan design constraints. The cache coherence subsystem is designed using software that exports a description of the system in a hardware description language. | 06-30-2016 |