Patent application number | Description | Published |
20080215818 | STRUCTURE FOR SILENT INVALID STATE TRANSITION HANDLING IN AN SMP ENVIRONMENT - A design structure embodied in a machine readable storage medium for designing, manufacturing, and/or testing a design can be provided. The design structure includes a symmetric multiprocessing (SMP) system. The system includes a plurality of nodes. Each of the nodes includes a node controller and a plurality of processors cross-coupled to one another. The system also includes at least one cache directory coupled to each node controller, and, invalid state transition logic coupled to each node controller. The invalid state transition logic includes program code enabled to identify an invalid state transition for a cache line in a local node, to evict a corresponding cache directory entry for the cache line, and to forward an invalid state transition notification to a node controller for a home node for the cache line in order for the home node to evict a corresponding cache directory entry for the cache line. | 09-04-2008 |
20080244189 | Method, Apparatus, System and Program Product Supporting Directory-Assisted Speculative Snoop Probe With Concurrent Memory Access - A multiprocessor data processing system includes a memory controller controlling access to a memory subsystem, multiple processor buses coupled to the memory controller, and at least one of multiple processors coupled to each processor bus. In response to receiving a first read request of a first processor via a first processor bus, the memory controller initiates a speculative access to the memory subsystem and a lookup of the target address in a central coherence directory. In response to the central coherence directory indicating that a copy of the target memory block is cached by a second processor, the memory controller transmits a second read request for the target address on a second processor bus. In response to receiving a clean snoop response to the second read request, the memory controller provides to the first processor the target memory block retrieved from the memory subsystem by the speculative access. | 10-02-2008 |
20080244190 | Method, Apparatus, System and Program Product Supporting Efficient Eviction of an Entry From a Central Coherence Directory - In response to a memory access request missing in a central coherence directory of a data processing system, the central coherence directory issues a back-invalidate request and provides an indication of one or more processors possibly caching a copy of a victim memory block associated with a victim memory address. In response to the back-invalidate request, a memory controller initiates a lookup of coherency information for the victim memory address in the central coherence directory and, prior to receipt of the coherency information, speculatively issues a set of back-invalidate commands on one or more of multiple processor buses to invalidate any cached copy of the victim memory block. In response to receipt of the coherency information, the memory controller determines whether the set of speculatively issued back-invalidate commands was under-inclusive, and if not, removes a victim entry associated with the victim memory address from the central coherence directory. | 10-02-2008 |
20080301376 | Method, Apparatus, and System Supporting Improved DMA Writes - A memory controller receives a stream of DMA write operations and enqueues them in a queue enforcing a First-In First-Out (FIFO) order. Prior to processing a particular DMA write operation, the memory controller acquires coherency ownership of a target memory block and stores the result in a low latency array. In response to acquiring coherency ownership, this low latency array is updated to a coherency state signifying coherency ownership by the memory controller. In a pipelined array access, both the low latency array and the second array are accessed and if the lower latency second array indicates the particular coherency state with no collision indication, the memory controller signals that the particular DMA write operation can be performed, where the signaling occurs prior to results being obtained from the higher latency first array at the normal end of the array access pipeline. In response to the signaling, the memory controller performs an update to the memory subsystem indicated by the particular DMA write operation. | 12-04-2008 |
20080307169 | Method, Apparatus, System and Program Product Supporting Improved Access Latency for a Sectored Directory - A data processing system includes a coherence directory having a prefetch sector cache and a memory directory array containing a plurality of sectored entries. According to one method, in response to receiving a first directory lookup request specifying a first target address, an entry associated with the target address is accessed in the memory directory array. In response to the access, the coherence directory returns, as a result of the first directory lookup request, contents of a first sector that is identified by the target address as a requested sector. The coherence directory also caches contents of a second sector of the multiple sectors that is a non-requested sector for the first directory lookup request in a prefetch sector cache. In response to receiving a subsequent second directory lookup request specifying a second target address that identifies the second sector as a requested sector, the coherence directory accesses the contents of the second sector in the sector prefetch cache and returns the contents of the second sector as a result of the second directory lookup request. | 12-11-2008 |
20090019238 | Memory Controller Read Queue Dynamic Optimization of Command Selection - A memory controller receives read requests from a processor into a read queue. The memory controller dynamically modifies an order of servicing the requests based on how many pending requests are in the read queue. When the read queue is relatively empty, requests are serviced oldest first to minimize latency. When the read queue becomes fuller, requests are serviced in a manner that maximizes throughput on a memory bus to reduce the likelihood that the read queue will become full and further requests from the processor would have to be halted. | 01-15-2009 |
20090019239 | Memory Controller Granular Read Queue Dynamic Optimization of Command Selection - A memory controller receives read requests from a processor into a read queue. The memory controller dynamically modifies an order of servicing the requests based on how many pending requests are in the read queue. When the read queue is relatively empty, requests are serviced oldest first to minimize latency. When the read queue becomes progressively fuller, requests are progressively, using three or more memory access modes, serviced in a manner that increases throughput on a memory bus to reduce the likelihood that the read queue will become full and further requests from the processor would have to be halted. | 01-15-2009 |
20090089554 | METHOD FOR TUNING CHIPSET PARAMETERS TO ACHIEVE OPTIMAL PERFORMANCE UNDER VARYING WORKLOAD TYPES - A method, system, and computer program product for tuning a set of chipset parameters to achieve optimal chipset performance under varying workload characteristics. A set of workload characteristics of a current workload type is determined. An instruction stream is generated using weighted parameters derived from the set of workload characteristics of the current workload type. A set of chipset parameters is generated and integrated within the instruction stream. The instruction stream is loaded to one or more processors and executed to collect and analyze performance data relating to the chipset's performance. The analysis includes comparing the set of performance data of a plurality of different instruction streams having the same set of workload characteristics. Each executed instruction stream is executed with at least one different combination of chipset parameters. A determination is made regarding which combination of chipset parameters provides the best performance data for the current workload. | 04-02-2009 |
20090265534 | Fairness, Performance, and Livelock Assessment Using a Loop Manager With Comparative Parallel Looping - A method, apparatus, and computer program are provided for assessing fairness, performance, and livelock in a logic development process utilizing comparative parallel looping. Multiple loop macros are generated, the multiple loop macros respectively correspond to multiple processor threads, and the multiple loop macros are parallel comparative loop macros. The multiple processor threads for the multiple loop macros are executed in which a common resource is accessed. A forward performance of each of the multiple processor threads is verified. The forward performance of the multiple processor threads is compared with each other. It is determined whether any of the multiple processor threads fails to meet a minimum loop count or a minimum loop time. It is determined whether any of the multiple processor threads exceeds a maximum loop count or a maximum loop time. It is recognized whether fairness is maintained during the execution of the multiple processor threads. | 10-22-2009 |
20090268727 | Early header CRC in data response packets with variable gap count - A method is provided for processing a command issued by a processor over a bus. The method includes ( | 10-29-2009 |
20090268736 | Early header CRC in data response packets with variable gap count - A method is provided for processing commands issued by a processor over a bus. The method includes the steps of (1) transmitting the command to a remote node to obtain access to data required to complete the command; (2) receiving from the remote node a response packet including a header and a header CRC; (3) validating the response packet based on the header CRC; and (4) before receiving the data required to complete the command, arranging to return the data to the processor over the bus. | 10-29-2009 |
20090271532 | Early header CRC in data response packets with variable gap count - A method is provided for processing a command issued by a processor over a bus. The method includes ( | 10-29-2009 |
20090271578 | Reducing Memory Fetch Latency Using Next Fetch Hint - In one aspect, a processor is provided. The processor may include logic, coupled to the processor, and to issue a currently issued memory fetch over a processor bus. The currently issued memory fetch may include a next fetch hint that may include information about a next memory fetch. | 10-29-2009 |
20130242985 | MULTICAST BANDWIDTH MULTIPLICATION FOR A UNIFIED DISTRIBUTED SWITCH - The distributed switch may include a plurality of chips (i.e., sub-switches) on a switch module. These sub-switches may receive from a computing device connected to a Tx/Rx port a multicast data frame (e.g., an Ethernet frame) that designates a plurality of different destinations. Instead of simply using one egress connection interface to forward the copies of the data frame to each of the destinations sequentially, the sub-switch may use a plurality of a connection interfaces to transfer copies of the multicast data frame simultaneously. The port that receives the multicast data frame can borrow the connection interfaces (and associated hardware such as buffers) assigned to these other ports to transmit copies of the multicast data frame simultaneously. | 09-19-2013 |
20130242993 | MULTICAST BANDWIDTH MULTIPLICATION FOR A UNIFIED DISTRIBUTED SWITCH - The distributed switch may include a plurality of chips (i.e., sub-switches) on a switch module. These sub-switches may receive from a computing device connected to a Tx/Rx port a multicast data frame (e.g., an Ethernet frame) that designates a plurality of different destinations. Instead of simply using one egress connection interface to forward the copies of the data frame to each of the destinations sequentially, the sub-switch may use a plurality of a connection interfaces to transfer copies of the multicast data frame simultaneously. The port that receives the multicast data frame can borrow the connection interfaces (and associated hardware such as buffers) assigned to these other ports to transmit copies of the multicast data frame simultaneously. | 09-19-2013 |
20140122771 | WEIGHTAGE-BASED SCHEDULING FOR HIERARCHICAL SWITCHING FABRICS - Techniques are disclosed to implement a scheduling scheme for a crossbar scheduler that provides distributed request-grant-accept arbitration between input group arbiters and output group arbiters in a distributed switch. Input and output ports are grouped and assigned a respective arbiter. The input group arbiters communicate requests indicating a count of respective ports having data packets to be transmitted via one of the output ports. The output group arbiter attempts to accommodate the requests for each member of an input group before proceeding to a next input group. | 05-01-2014 |
20140359639 | INTEGRATED LINK-BASED DATA RECORDER FOR SEMICONDUCTOR CHIP - Programmable data recorders are provided in a network semiconductor chip to monitor and record all or portions of data from various interfaces, including the input and output interfaces of network links interfacing the chip. A data recorder is provided for each network link. The data recorders store captured data in storage arrays. Data may be compressed and associated with time stamps to conserve space in the storage arrays. A data recorder manager in the chip may start and stop the data recorders at approximately the same time. The programmable mode of the data recorders determines which interfaces are monitored, the portion of data captured, and the format of the data in the storage arrays. | 12-04-2014 |
20140359641 | INTEGRATED LINK-BASED DATA RECORDER FOR SEMICONDUCTOR CHIP - Programmable data recorders are provided in a network semiconductor chip to monitor and record all or portions of data from various interfaces, including the input and output interfaces of network links interfacing the chip. A data recorder is provided for each network link. The data recorders store captured data in storage arrays. Data may be compressed and associated with time stamps to conserve space in the storage arrays. A data recorder manager in the chip may start and stop the data recorders at approximately the same time. The programmable mode of the data recorders determines which interfaces are monitored, the portion of data captured, and the format of the data in the storage arrays. | 12-04-2014 |
20150016254 | QUEUE CREDIT MANAGEMENT - To prevent buffer overflow, a receiving entity may use credits to control the total amount of packets any single transmitting entity can forward. Once the assigned credits are spent, the transmitting entity cannot send data portions to the receiving entity until additional credits are provided. However, the logic in the receiving entity may be designed to manage a maximum number of credits that is less than the capacity of the buffer in the transmitting entity. For example, the receiving entity is designed to manage a maximum of eight credits but the buffer has room for twelve data portions. To use the buffer efficiently, the transmitting entity may identify when extra buffer storage is available and provide additional credits. In addition, the transmitting entity may control when the credits are provided such that the receiving entity is not allocated more credits that it was designed to manage. | 01-15-2015 |
20150063348 | IMPLEMENTING HIERARCHICAL HIGH RADIX SWITCH WITH TIMESLICED CROSSBAR - A method and system are provided for implementing a hierarchical high radix switch with a time-sliced crossbar. The hierarchical high radix switch includes a plurality of inputs and a plurality of outputs. Each input belongs to one input group; each input group sends consolidated requests to each output, by ORing the requests from the local input ports in that input group. Each output port belongs to one output group; each output port grants one of the requesting input groups using a rotating priority defined by a next-to-serve pointer. Each output group consolidates the output port grants and allows one grant to pass back to an input group. Each input port in an input group evaluates all incoming grants in an oldest packet first manner to form an accept. Each input group consolidates the input port accepts and selects one accept to send to the output port. | 03-05-2015 |