Patent application number | Description | Published |
20090113181 | Method and Apparatus for Executing Instructions - A method and apparatus for executing instructions in a processor are provided. In one embodiment of the invention, the method includes receiving a plurality of instructions. The plurality of instructions includes first instructions in a first thread and second instructions in a second thread. The method further includes forming a common issue group including an instruction of a first instruction type and an instruction of a second instruction type. The method also includes issuing the common issue group to a first execution unit and a second execution unit. The instruction of the first instruction type is issued to the first execution unit and the instruction of the second instruction type is issued to the second execution unit. | 04-30-2009 |
20090157976 | Network on Chip That Maintains Cache Coherency With Invalidate Commands - A network on chip (‘NOC’) that maintains cache coherency with invalidate commands, the NOC comprising integrated processor (‘IP’) blocks, routers, memory communications controllers, and network interface controller, each IP block adapted to a router through a memory communications controller and a network interface controller, the NOC also including a port on a router of the network through which is received an invalidate command, the invalidate command including an identification of a cache line, the invalidate command representing an instruction to invalidate the cache line, the router configured to send the invalidate command to an IP block served by the router; the router further configured to send the invalidate command horizontally and vertically to neighboring routers if the port is a vertical port; and the router further configured to send the invalidate command only horizontally to neighboring routers if the port is a horizontal port. | 06-18-2009 |
20090179902 | Dynamic Data Type Aligned Cache Optimized for Misaligned Packed Structures - A method and apparatus for processing vector data is provided. A processing core may have a data cache and a relatively smaller vector data cache. The vector data cache may be optimally sized to store vector data structures that are smaller than full data cache lines. | 07-16-2009 |
20090187716 | Network On Chip that Maintains Cache Coherency with Invalidate Commands - A network on chip (‘NOC’) that maintains cache coherency, the NOC including integrated processor (‘IP’) blocks, routers, memory communications controllers, and network interface controller, each IP block adapted to a router through a memory communications controller and a network interface controller, at least one memory communications controller further comprising a cache coherency controller each memory communications controller controlling communication between an IP block and memory, and each network interface controller controlling inter-IP block communications through routers, wherein the memory communications controller configured to execute a memory access instruction and configured to determine a state of a cache line addressed by the memory access instruction, the state of the cache line being one of shared, exclusive, or invalid; the memory communications controller configured to broadcast an invalidate command to a plurality of IP blocks of the NOC if the state of the cache line is shared; and the memory communications controller configured to transmit an invalidate command only to an IP block that controls a cache where the cache line is stored if the state of the cache line is exclusive. | 07-23-2009 |
20090245257 | Network On Chip - A network on chip (‘NOC’) that includes integrated processor (‘IP’) blocks, routers, memory communications controllers, and network interface controllers, with all communications including a route code specifying a route through the routers of the NOC from a source to a destination, each router including routing logic that directs a communication to one of four ports of the router, the one port identified by the first two bits in the route code. The routing logic in the router shifts the route code to discard the first two bits of the route code before transmitting the communication through the one port. | 10-01-2009 |
20090282197 | Network On Chip - A network on chip (‘NOC’) that includes integrated processor (‘IP’) blocks, routers, memory communications controllers, and network interface controllers; each IP block adapted to a router through a memory communications controller and a network interface controller; and at least one IP block also including a computer processor and an L1, write-through data cache comprising high speed local memory on the IP block, the cache controlled by a cache controller having a cache line replacement policy, the cache controller configured to lock segments of the cache, the computer processor configured to store thread-private data in main memory off the IP block, the computer processor further configured to store thread-private data on a segment of the L1 data cache, the segment locked against replacement upon cache misses under the cache controller's replacement policy, the segment further locked against write-through to main memory. | 11-12-2009 |
20090315908 | Anisotropic Texture Filtering with Texture Data Prefetching - A circuit arrangement and method utilize texture data prefetching to prefetch texture data used by an anisotropic filtering algorithm. In particular, stride-based prefetching may be used to prefetch texture data for use in anisotropic filtering, where the value of the stride, or difference between successive accesses, is based upon a distance in a memory address space between sample points taken along the line of anisotropy used in an anisotropic filtering algorithm. | 12-24-2009 |
20120144253 | HARD MEMORY ARRAY FAILURE RECOVERY UTILIZING LOCKING STRUCTURE - A technique for managing hard failures in a memory system employing a locking is disclosed. An error count is maintained for units of memory within the memory system. When the error count indicates a hard failure, the unit of memory is locked out from further use. An arbitrary set of error counters are assigned to record errors resulting from access to the units of memory. Embodiments of the present invention advantageously enable a system to continue reliable operation even after one or more internal hard memory failures. Other embodiments advantageously enable manufacturers to salvage partially failed devices and deploy the devices as having a lower-performance specification rather than discarding the devices, as would otherwise be indicated by conventional practice. | 06-07-2012 |
20120169755 | ANISOTROPIC TEXTURE FILTERING WITH TEXTURE DATA PREFETCHING - A circuit arrangement and method utilize texture data prefetching to prefetch texture data used by an anisotropic filtering algorithm. In particular, stride-based prefetching may be used to prefetch texture data for use in anisotropic filtering, where the value of the stride, or difference between successive accesses, is based upon a distance in a memory address space between sample points taken along the line of anisotropy used in an anisotropic filtering algorithm. | 07-05-2012 |
20120254548 | ALLOCATING CACHE FOR USE AS A DEDICATED LOCAL STORAGE - A method and apparatus dynamically allocates and deallocates a portion of a cache for use as a dedicated local storage. Cache lines may be dynamically allocated and deallocated for inclusion in the dedicated local storage. Cache entries that are included in the dedicated local storage may not be evicted or invalidated. Additionally, coherence is not maintained between the cache entries that are included in the dedicated local storage and the backing memory. A load instruction may be configured to allocate, e.g., lock, a portion of the data cache for inclusion in the dedicated local storage and load data into the dedicated local storage. A load instruction may be configured to read data from the dedicated local storage and to deallocate, e.g., unlock, a portion of the data cache that was included in the dedicated local storage. | 10-04-2012 |
20120254877 | TRANSFERRING ARCHITECTED STATE BETWEEN CORES - A method and apparatus for transferring architected state bypasses system memory by directly transmitting architected state between processor cores over a dedicated interconnect. The transfer may be performed by state transfer interface circuitry with or without software interaction. The architected state for a thread may be transferred from a first processing core to a second processing core when the state transfer interface circuitry detects an error that prevents proper execution of the thread corresponding to the architected state. A program instruction may be used to initiate the transfer of the architected state for the thread to one or more other threads in order to parallelize execution of the thread or perform load balancing between multiple processor cores by distributing processing of multiple threads. | 10-04-2012 |
20130086329 | ALLOCATING CACHE FOR USE AS A DEDICATED LOCAL STORAGE - A method and apparatus dynamically allocates and deallocates a portion of a cache for use as a dedicated local storage. Cache lines may be dynamically allocated and deallocated for inclusion in the dedicated local storage. Cache entries that are included in the dedicated local storage may not be evicted or invalidated. Additionally, coherence is not maintained between the cache entries that are included in the dedicated local storage and the backing memory. A load instruction may be configured to allocate, e.g., lock, a portion of the data cache for inclusion in the dedicated local storage and load data into the dedicated local storage. A load instruction may be configured to read data from the dedicated local storage and to deallocate, e.g., unlock, a portion of the data cache that was included in the dedicated local storage. | 04-04-2013 |
20130097411 | TRANSFERRING ARCHITECTED STATE BETWEEN CORES - A method and apparatus for transferring architected state bypasses system memory by directly transmitting architected state between processor cores over a dedicated interconnect. The transfer may be performed by state transfer interface circuitry with or without software interaction. The architected state for a thread may be transferred from a first processing core to a second processing core when the state transfer interface circuitry detects an error that prevents proper execution of the thread corresponding to the architected state. A program instruction may be used to initiate the transfer of the architected state for the thread to one or more other threads in order to parallelize execution of the thread or perform load balancing between multiple processor cores by distributing processing of multiple threads. | 04-18-2013 |
20130159669 | LOW LATENCY VARIABLE TRANSFER NETWORK FOR FINE GRAINED PARALLELISM OF VIRTUAL THREADS ACROSS MULTIPLE HARDWARE THREADS - A method and circuit arrangement utilize a low latency variable transfer network between the register files of multiple processing cores in a multi-core processor chip to support fine grained parallelism of virtual threads across multiple hardware threads. The communication of a variable over the variable transfer network may be initiated by a move from a local register in a register file of a source processing core to a variable register that is allocated to a destination hardware thread in a destination processing core, so that the destination hardware thread can then move the variable from the variable register to a local register in the destination processing core. | 06-20-2013 |
20130159799 | MULTI-CORE PROCESSOR WITH INTERNAL VOTING-BASED BUILT IN SELF TEST (BIST) - A method and circuit arrangement utilize scan logic disposed on a multi-core processor integrated circuit device or chip to perform internal voting-based built in self test (BIST) of the chip. Test patterns are generated internally on the chip and communicated to the scan chains within multiple processing cores on the chip. Test results output by the scan chains are compared with one another on the chip, and majority voting is used to identify outlier test results that are indicative of a faulty processing core. A bit position in a faulty test result may be used to identify a faulty latch in a scan chain and/or a faulty functional unit in the faulty processing core, and a faulty processing core and/or a faulty functional unit may be automatically disabled in response to the testing. | 06-20-2013 |
20130173860 | NEAR NEIGHBOR DATA CACHE SHARING - Parallel computing environments, where threads executing in neighboring processors may access the same set of data, may be designed and configured to share one or more levels of cache memory. Before a processor forwards a request for data to a higher level of cache memory following a cache miss, the processor may determine whether a neighboring processor has the data stored in a local cache memory. If so, the processor may forward the request to the neighboring processor to retrieve the data. Because access to the cache memories for the two processors is shared, the effective size of the memory is increased. This may advantageously decrease cache misses for each level of shared cache memory without increasing the individual size of the caches on the processor chip. | 07-04-2013 |
20130173861 | NEAR NEIGHBOR DATA CACHE SHARING - Parallel computing environments, where threads executing in neighboring processors may access the same set of data, may be designed and configured to share one or more levels of cache memory. Before a processor forwards a request for data to a higher level of cache memory following a cache miss, the processor may determine whether a neighboring processor has the data stored in a local cache memory. If so, the processor may forward the request to the neighboring processor to retrieve the data. Because access to the cache memories for the two processors is shared, the effective size of the memory is increased. This may advantageously decrease cache misses for each level of shared cache memory without increasing the individual size of the caches on the processor chip. | 07-04-2013 |
20130173967 | HARD MEMORY ARRAY FAILURE RECOVERY UTILIZING LOCKING STRUCTURE - A technique for managing hard failures in a memory system employing a locking is disclosed. An error count is maintained for units of memory within the memory system. When the error count indicates a hard failure, the unit of memory is locked out from further use. An arbitrary set of error counters are assigned to record errors resulting from access to the units of memory. Embodiments of the present invention advantageously enable a system to continue reliable operation even after one or more internal hard memory failures. Other embodiments advantageously enable manufacturers to salvage partially failed devices and deploy the devices as having a lower-performance specification rather than discarding the devices, as would otherwise be indicated by conventional practice. | 07-04-2013 |