Patent application number | Description | Published |
20090013133 | CACHE LINE MARKING WITH SHARED TIMESTAMPS - Embodiments of the present invention provide a system that marks cache lines using shared timestamps. During operation, the system starts a transaction for a thread, wherein starting the transaction involves recording the value of an active timestamp and incrementing a transaction or overflow counter (TO_counter) corresponding to the recorded value. The system then places load-marks on cache lines which are loaded during the transaction. While placing the load-marks, the system writes the recorded value into metadata corresponding to the cache lines. Upon completing the transaction for the thread, the system decrements the TO_counter corresponding to the recorded value and resumes non-transactional execution for the thread without removing the load-marks from cache lines which were load-marked during the transaction. | 01-08-2009 |
20090019231 | Method and Apparatus for Implementing Virtual Transactional Memory Using Cache Line Marking - Embodiments of the present invention implement virtual transactional memory using cache line marking. The system starts by executing a starvation-avoiding transaction for a thread. While executing the starvation-avoiding transaction, the system places starvation-avoiding load-marks on cache lines which are loaded from and places starvation-avoiding store-marks on cache lines which are stored to. Next, while swapping a page out of a memory and to a disk during the starvation-avoiding transaction, the system determines if one or more cache lines in the page have a starvation-avoiding load-mark or a starvation-avoiding store-mark. If so, upon swapping the page into the memory from the disk, the system places a starvation-avoiding load-mark on each cache line that had a starvation-avoiding load-mark and places a starvation-avoiding store-mark on each cache line that had a starvation-avoiding store-mark. | 01-15-2009 |
20090019272 | STORE QUEUE ARCHITECTURE FOR A PROCESSOR THAT SUPPORTS SPECULATIVE EXECUTION - Embodiments of the present invention provide a system that buffers stores on a processor that supports speculative execution. The system starts by buffering a store into an entry in the store queue during a speculative execution mode. If an entry for the store does not already exist in the store queue, the system writes the store into an available entry in the store queue and updates a byte mask for the entry. Otherwise, if an entry for the store already exists in the store queue, the system merges the store into the existing entry in the store queue and updates the byte mask for the entry to include information about the newly merged store. The system then forwards the data from the store queue to subsequent dependent loads. | 01-15-2009 |
20090113131 | METHOD AND APPARATUS FOR TRACKING LOAD-MARKS AND STORE-MARKS ON CACHE LINES - Embodiments of the present invention provide a system that handles load-marked and store-marked cache lines. Upon asserting a load-mark or a store-mark for a cache line during a given phase of operation, the system adds an entry to a private buffer and in doing so uses an address of the cache line as a key for the entry in the private buffer. The system also updates the entry in the private buffer with information about the load-mark or store-mark and uses pointers for the entry and for the last entry added to the private buffer to add the entry to a sequence of private buffer entries placed during the phase of operation. The system then uses the entries in the private buffer to remove the load-marks and store-marks from cache lines when the phase of operation is completed. | 04-30-2009 |
20090119461 | MAINTAINING CACHE COHERENCE USING LOAD-MARK METADATA - Embodiments of the present invention provide a system that maintains load-marks on cache lines. The system includes: (1) a cache which accommodates a set of cache lines, wherein each cache line includes metadata for load-marking the cache line, and (2) a local cache controller for the cache. Upon determining that a remote cache controller has made a request for a cache line that would cause the local cache controller to invalidate a copy of the cache line in the cache, the local cache controller determines if there is a load-mark in the metadata for the copy of the cache line. If not, the local cache controller invalidates the copy of the cache line. Otherwise, the local cache controller signals a denial of the invalidation of the cache line and retains the copy of the cache line and the load-mark in the metadata for the copy of the cache line. | 05-07-2009 |
20090187727 | INDEX GENERATION FOR CACHE MEMORIES - Embodiments of the present invention provide a system that generates an index for a cache memory. The system starts by receiving a request to access the cache memory, wherein the request includes address information. The system then obtains non-address information associated with the request. Next, the system generates the index using the address information and the non-address information. The system then uses the index to access the cache memory. | 07-23-2009 |
20090187906 | SEMI-ORDERED TRANSACTIONS - Embodiments of the present invention provide a system that facilitates transactional execution in a processor. The system starts by executing program code for a thread in a processor. Upon detecting a predetermined indicator, the system starts a transaction for a section of the program code for the thread. When starting the transaction, the system executes a checkpoint instruction. If the checkpoint instruction is a WEAK_CHECKPOINT instruction, the system executes a semi-ordered transaction. During the semi-ordered transaction, the system preserves code atomicity but not memory atomicity. Otherwise, the system executes a regular transaction. During the regular transaction, the system preserves both code atomicity and memory atomicity. | 07-23-2009 |
20090204761 | PSEUDO-LRU CACHE LINE REPLACEMENT FOR A HIGH-SPEED CACHE - Embodiments of the present invention provide a system that replaces an entry in a least-recently-used way in a skewed-associative cache. The system starts by receiving a cache line address. The system then generates two or more indices using the cache line address. Next, the system generates two or more intermediate indices using the two or more indices. The system then uses at least one of the two or more indices or the two or more intermediate indices to perform a lookup in one or more lookup tables, wherein the lookup returns a value which identifies a least-recently-used way. Next, the system replaces the entry in the least-recently-used way. | 08-13-2009 |
20090282225 | STORE QUEUE - Embodiments of the present invention provide a system which executes a load instruction or a store instruction. During operation, the system receives a load instruction. The system then determines if an unrestricted entry or a restricted entry in a store queue contains data that satisfies the load instruction. If not, the system retrieves data for the load instruction from a cache. If so, the system conditionally forwards data from the unrestricted entry or the restricted entry by: (1) forwarding data from an unrestricted entry that contains the youngest store that satisfies the load instruction when any number of unrestricted or restricted entries contain data that satisfies the load instruction; (2) forwarding data from a restricted entry when only one restricted entry and no unrestricted entries contain data that satisfies the load instruction; and (3) deferring the load instruction by placing the load instruction in a deferred queue when two or more restricted entries and no unrestricted entries contain data that satisfies the load instruction. | 11-12-2009 |
20090300338 | AGGRESSIVE STORE MERGING IN A PROCESSOR THAT SUPPORTS CHECKPOINTING - Embodiments of the present invention provide a processor that merges stores in an N-entry first-in-first-out (FIFO) store queue. In these embodiments, the processor starts by executing instructions before a checkpoint is generated. When executing instructions before the checkpoint is generated, the processor is configured to perform limited or no merging of stores into existing entries in the store queue. Then, upon detecting a predetermined condition, the processor is configured to generate a checkpoint. After generating the checkpoint, the processor is configured to continue to execute instructions. When executing instructions after the checkpoint is generated, the processor is configured to freely merge subsequent stores into post-checkpoint entries in the store queue. | 12-03-2009 |
20100023701 | CACHE LINE DUPLICATION IN RESPONSE TO A WAY PREDICTION CONFLICT - Embodiments of the present invention provide a system that handles way mispredictions in a multi-way cache. The system starts by receiving requests to access cache lines in the multi-way cache. For each request, the system makes a prediction of a way in which the cache line resides based on a corresponding entry in the way prediction table. The system then checks for the presence of the cache line in the predicted way. Upon determining that the cache line is not present in the predicted way, but is present in a different way, and hence the way was mispredicted, the system increments a corresponding record in a conflict detection table. Upon detecting that a record in the conflict detection table indicates that a number of mispredictions equals a predetermined value, the system copies the corresponding cache line from the way where the cache line actually resides into the predicted way. | 01-28-2010 |
20100031084 | CHECKPOINTING IN A PROCESSOR THAT SUPPORTS SIMULTANEOUS SPECULATIVE THREADING - Embodiments of the present invention provide a system for executing program code on a processor. In these embodiments, the processor is configured to start by using a primary strand to execute program code. Upon detecting a predetermined condition, the processor is configured to instantaneously checkpoint an architectural state of the primary strand and then use the subordinate strand to copy the checkpointed state to memory while using the primary strand to continue executing the program code without interruption. | 02-04-2010 |
20100049957 | RECOVERING A SUBORDINATE STRAND FROM A BRANCH MISPREDICTION USING STATE INFORMATION FROM A PRIMARY STRAND - Embodiments of the present invention provide a system that executes program code in a processor. The system starts by executing the program code in a normal mode using a primary strand while concurrently executing the program code ahead of the primary strand using a subordinate strand in a scout mode. Upon resolving a branch using the subordinate strand, the system records a resolution for the branch in a speculative branch resolution table. Upon subsequently encountering the branch using the primary strand, the system uses the recorded resolution from the speculative branch resolution table to predict a resolution for the branch for the primary strand. Upon determining that the resolution of the branch was mispredicted for the primary strand, the system determines that the subordinate strand mispredicted the branch. The system then recovers the subordinate strand to the branch and restarts the subordinate strand executing the program code. | 02-25-2010 |
20100125707 | DEADLOCK AVOIDANCE DURING STORE-MARK ACQUISITION - Some embodiments of the present invention provide a system that avoids deadlock while attempting to acquire store-marks on cache lines. During operation, the system keeps track of store-mark requests that arise during execution of a thread, wherein a store-mark on a cache line indicates that one or more associated store buffer entries are waiting to be committed to the cache line. In this system, store-mark requests are processed in a pipelined manner, which allows a store-mark request to be initiated before preceding store-mark requests for the same thread complete. Next, if a store-mark request fails, within a bounded amount of time, the system removes or prevents store-marks associated with younger store-mark requests for the same thread, thereby avoiding a potential deadlock that can arise when one or more other threads attempt to store-mark the same cache lines. | 05-20-2010 |
20100180103 | MECHANISM FOR INCREASING THE EFFECTIVE CAPACITY OF THE WORKING REGISTER FILE - A computer processor pipeline has both an architectural register file and a working register file. The lifetime of an entry in the working register file is determined by a predetermined number of instructions passing through a specified stage in the pipeline after the location in the working register file is allocated for an instruction. The size of the working register file is selected based upon performance characteristics. A working register file credit indicator is coupled to the front end pipeline portion and to the back end pipeline portion. The working register file credit indicator is monitored to prevent a working register file overflow. When a location in the architectural register file is read early, the location is monitored to determine whether the location is written to prior to issuance of the instruction associated with the early read. | 07-15-2010 |
20100191993 | LOGICAL POWER THROTTLING - A processor includes a device providing a throttling power output signal. The throttling power output signal is used to determine when to logically throttle the power consumed by the processor. At least one core in the processor includes a pipeline having a decode pipe; and a logical power throttling unit coupled to the device to receive the output signal, and coupled to the decode pipe. Following the logical power throttling unit receiving the power throttling output signal satisfying a predetermined criterion, the logical power throttling unit causes the decode pipe to reduce an average number of instructions decoded per processor cycle without physically changing the processor cycle or any processor supply voltages. | 07-29-2010 |
20100268919 | METHOD AND STRUCTURE FOR SOLVING THE EVIL-TWIN PROBLEM - A register file, in a processor, includes a first plurality of registers of a first size, n-bits. A decoder uses a mapping that divides the register file into a second plurality M of registers having a second size. Each of the registers having the second size is assigned a different name in a continuous name space. Each register of the second size includes a plurality N of registers of the first size, n-bits. Each register in the plurality N of registers is assigned the same name as the register of the second size that includes that plurality. State information is maintained in the register file for each n-bit register. The dependence of an instruction on other instructions is detected through the continuous name space. The state information allows the processor to determine when the information in any portion, or all, of a register is valid. | 10-21-2010 |
20100325374 | DYNAMICALLY CONFIGURING MEMORY INTERLEAVING FOR LOCALITY AND PERFORMANCE ISOLATION - Embodiments of the present invention provide a system that dynamically reconfigures memory. During operation, the system determines that a virtual memory page is to be reconfigured from an original virtual-address-to-physical-address mapping to a new virtual-address-to-physical-address mapping. The system then determines a new real address mapping for a set of virtual addresses in the virtual memory page by selecting a range of real addresses for the virtual addresses that are arranged according to the new virtual-address-to-physical-address mapping. Next, the system temporarily disables accesses to the virtual memory page. Then, the system copies data from real address locations indicated by the original virtual-address-to-physical-address mapping to real address locations indicated by the new virtual-address-to-physical-address mapping. Next, the system updates the real-address-to-physical-address mapping for the page, and re-enables accesses to the virtual memory page. | 12-23-2010 |
20110035561 | STORE QUEUE WITH TOKEN TO FACILITATE EFFICIENT THREAD SYNCHRONIZATION - Some embodiments of the present invention provide a system for operating a store queue, wherein the store queue buffers stores that are waiting to be committed to a memory system in a processor. During operation, the system examines an entry at the head of the store queue. If the entry contains a membar token, the system examines an unacknowledged counter that keeps track of the number of store operations that have been sent from the store queue to the memory system but have not been acknowledged as being committed to the memory system. If the unacknowledged counter is non-zero, the system waits until the unacknowledged counter equals zero, and then removes the membar token from the store queue. | 02-10-2011 |
20110119528 | HARDWARE TRANSACTIONAL MEMORY ACCELERATION THROUGH MULTIPLE FAILURE RECOVERY - The described embodiments provide a processor (e.g., processor | 05-19-2011 |
20110179254 | LIMITING SPECULATIVE INSTRUCTION FETCHING IN A PROCESSOR - The described embodiments relate to a processor that speculatively executes instructions. During operation, the processor often executes instructions in a speculative-execution mode. Upon detecting an impending pipe-clearing event while executing instructions in the speculative-execution mode, the processor stalls an instruction fetch unit to prevent the instruction fetch unit from fetching instructions. In some embodiments, the processor stalls the instruction fetch unit until a condition that originally caused the processor to operate in the speculative-execution mode is resolved. In alternative embodiments, the processor maintains the stall of the instruction fetch unit until the pipe-clearing event has been completed (i.e., has been handled in the processor). | 07-21-2011 |
20110179258 | PRECISE DATA RETURN HANDLING IN SPECULATIVE PROCESSORS - The described embodiments provide a system for executing instructions in a processor. In the described embodiments, upon detecting a return of input data for a deferred instruction while executing instructions in an execute-ahead mode, the processor determines whether a replay bit is set in a corresponding entry for the returned input data in a miss buffer. If the replay bit is set, the processor transitions to a deferred-execution mode to execute deferred instructions. Otherwise, the processor continues to execute instructions in the execute-ahead mode. | 07-21-2011 |
20110231612 | PRE-FETCHING FOR A SIBLING CACHE - One embodiment provides a system that pre-fetches into a sibling cache. During operation, a first thread executes in a first processor core associated with a first cache, while a second thread associated with the first thread simultaneously executes in a second processor core associated with a second cache. During execution, the second thread encounters an instruction that triggers a request to a lower-level cache which is shared by the first cache and the second cache. The system responds to this request by directing a load fill which returns from the lower-level cache in response to the request to the first cache, thereby reducing cache misses for the first thread. | 09-22-2011 |
20110264862 | REDUCING PIPELINE RESTART PENALTY - Techniques are disclosed relating to reducing the latency of restarting a pipeline in a processor that implements scouting. In one embodiment, the processor may reduce pipeline restart latency using two instruction fetch units that are configured to fetch and re-fetch instructions in parallel with one another. In some embodiments, the processor may reduce pipeline restart latency by initiating re-fetching instructions in response to determining that a commit operation is to be attempted with respect to one or more deferred instructions. In other embodiments, the processor may reduce pipeline restart latency by initiating re-fetching instructions in response to receiving an indication that a request for a set of data has been received by a cache, where the indication is sent by the cache before determining whether the data is present in the cache or not. | 10-27-2011 |
20110264898 | CHECKPOINT ALLOCATION IN A SPECULATIVE PROCESSOR - The embodiments described in the instant application provide a system for generating checkpoints. In the described embodiments, while speculatively executing instructions with one or more checkpoints in use, upon detecting an occurrence of a predetermined operating condition or encountering a predetermined type of instruction, the system is configured to determine whether an additional checkpoint is to be generated by computing a factor based on one or more operating conditions of the processor. When the factor is greater than a predetermined value, the processor is configured to generate the additional checkpoint. | 10-27-2011 |
20110276791 | HANDLING A STORE INSTRUCTION WITH AN UNKNOWN DESTINATION ADDRESS DURING SPECULATIVE EXECUTION - The described embodiments provide a system for executing instructions in a processor. While executing instructions in an execute-ahead mode, the processor encounters a store instruction for which a destination address is unknown. The processor then defers the store instruction. Upon encountering a load instruction while the store instruction with the unknown destination address is deferred, the processor determines if the load instruction is to continue executing. If not, the processor defers the load instruction. Otherwise, the processor continues executing the load instruction. | 11-10-2011 |
20120089819 | ISSUING INSTRUCTIONS WITH UNRESOLVED DATA DEPENDENCIES - The described embodiments include a processor that determines instructions that can be issued based on unresolved data dependencies. In an issue unit in the processor, the processor keeps a record of each instruction that is directly or indirectly dependent on a base instruction. Upon determining that the base instruction has been deferred, the processor monitors instructions that are being issued from an issue queue to an execution unit for execution. Upon determining that an instruction from the record has reached a head of the issue queue, the processor immediately issues the instruction from the issue queue. | 04-12-2012 |
20120166756 | INDEX GENERATION FOR CACHE MEMORIES - Embodiments of the present invention provide a system that generates an index for a cache memory. The system starts by receiving a request to access the cache memory, wherein the request includes address information. The system then obtains non-address information associated with the request. Next, the system generates the index using the address information and the non-address information. The system then uses the index to access the cache memory. | 06-28-2012 |
20120331314 | LOGICAL POWER THROTTLING - A processor includes a device providing a throttling power output signal. The throttling power output signal is used to determine when to logically throttle the power consumed by the processor. At least one core in the processor includes a pipeline having a decode pipe; and a logical power throttling unit coupled to the device to receive the output signal, and coupled to the decode pipe. Following the logical power throttling unit receiving the power throttling output signal satisfying a predetermined criterion, the logical power throttling unit causes the decode pipe to reduce an average number of instructions decoded per processor cycle without physically changing the processor cycle or any processor supply voltages. | 12-27-2012 |
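
As an illustration of one entry above, the store-merging behavior described for application 20090019272 (write a store into a free entry, or merge it into an existing entry and update that entry's byte mask, then forward buffered data to dependent loads) can be sketched in software. This is a minimal model under stated assumptions, not the patented hardware design: the names StoreQueue, Entry, store, forward and the 8-byte entry width ENTRY_BYTES are all hypothetical, and the rule that a load is forwarded only when every byte it needs is covered by the mask is an assumption made for the sketch.

```python
from dataclasses import dataclass, field

ENTRY_BYTES = 8  # hypothetical entry width; the abstract does not specify one


@dataclass
class Entry:
    base: int  # aligned address covered by this entry
    data: bytearray = field(default_factory=lambda: bytearray(ENTRY_BYTES))
    mask: int = 0  # bit i set => byte i holds buffered store data


class StoreQueue:
    """Toy model: one entry per aligned address, merged via a byte mask."""

    def __init__(self):
        self.entries = {}  # aligned base address -> Entry

    def store(self, addr, payload):
        base = addr - addr % ENTRY_BYTES
        off = addr - base
        if off + len(payload) > ENTRY_BYTES:
            raise ValueError("store crosses an entry boundary")
        entry = self.entries.get(base)
        if entry is None:
            # No entry for this address yet: take a fresh one.
            entry = Entry(base)
            self.entries[base] = entry
        # New or existing entry: write the bytes and extend the byte mask.
        entry.data[off:off + len(payload)] = payload
        for i in range(len(payload)):
            entry.mask |= 1 << (off + i)

    def forward(self, addr, size):
        """Return buffered bytes for a load, or None if not fully covered."""
        base = addr - addr % ENTRY_BYTES
        off = addr - base
        entry = self.entries.get(base)
        if entry is None or off + size > ENTRY_BYTES:
            return None
        need = ((1 << size) - 1) << off
        if entry.mask & need != need:
            return None  # assumption: partial coverage is not forwarded
        return bytes(entry.data[off:off + size])
```

In this sketch, two stores to adjacent bytes of the same aligned address share one entry, and a later load of those bytes is served from the queue, while a load that needs bytes the mask does not cover returns None and would have to be satisfied elsewhere.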