Patent application number | Description | Published |
20080263325 | SYSTEM AND STRUCTURE FOR SYNCHRONIZED THREAD PRIORITY SELECTION IN A DEEPLY PIPELINED MULTITHREADED MICROPROCESSOR - A microprocessor and system with improved performance and power in simultaneous multithreading (SMT) microprocessor architecture. The microprocessor and system includes a process wherein the processor has the ability to select instructions from one thread or another in any given processor clock cycle. Instructions from each, thread may be assigned selection priorities at multiple decision points in a processor in a given cycle dynamically. The thread priority is based on monitoring performance behavior and activities in the processor. In the exemplary embodiment, the present invention discloses a microprocessor and system for synchronizing thread priorities among multiple decision points throughout the micro-architecture of the microprocessor. This system and method for synchronizing thread priorities allows each thread priority to he in sync and aware of the status of other thread priorities at various decision points within the microprocessor. | 10-23-2008 |
20080307210 | System and Method for Optimizing Branch Logic for Handling Hard to Predict Indirect Branches - A system and method for optimizing the branch logic of a processor to improve handling of hard to predict indirect branches are provided. The system and method leverage the observation that there will generally be only one move to the count register (mtctr) instruction that will be executed while a branch on count register (bcctr) instruction has been fetched and not executed. With the mechanisms of the illustrative embodiments, fetch logic detects that it has encountered a bcctr instruction that is hard to predict and, in response to this detection, blocks the target fetch from entering the instruction buffer of the processor. At this point, the fetch logic has fetched all the instructions up to and including the bcctr instruction but no target instructions. When the next mtctr instruction is executed, the branch logic of the processor grabs the data and starts fetching using that target address. Since there are no other target instructions that were fetched, no flush is needed if that target address is the correct address, i.e. the branch prediction is correct. | 12-11-2008 |
20090049286 | DATA PROCESSING SYSTEM, PROCESSOR AND METHOD OF DATA PROCESSING HAVING IMPROVED BRANCH TARGET ADDRESS CACHE - A processor includes an execution unit and instruction sequencing logic that fetches instructions from a memory system for execution. The instruction sequencing logic includes branch logic that outputs predicted branch target addresses for use as instruction fetch addresses. The branch logic includes a level one branch target address cache (BTAC) and a level two BTAC each having a respective plurality of entries each associating at least a tag with a predicted branch target address. The branch logic accesses the level one and level two BTACs in parallel with a tag portion of a first instruction fetch address to obtain a first predicted branch target address from the level one BTAC for use as a second instruction fetch address in a first processor clock cycle and a second predicted branch target address from the level two BTAC for use as a third instruction fetch address in a later second processor clock cycle. | 02-19-2009 |
20090198962 | DATA PROCESSING SYSTEM, PROCESSOR AND METHOD OF DATA PROCESSING HAVING BRANCH TARGET ADDRESS CACHE INCLUDING ADDRESS TYPE TAG BIT - In at least one embodiment, a processor includes an execution unit and instruction sequencing logic that fetches instructions from a memory system for execution by the execution unit. The instruction sequencing logic includes branch logic that outputs predicted branch target addresses for use as instruction fetch addresses. The branch logic includes a branch target address prediction circuitry concurrently holding a first entry providing storage for a first branch target address prediction associating a first instruction fetch address with a first branch target address to be used as an instruction fetch address and a second entry providing storage for a second branch target address prediction associating the first instruction fetch address with a different second branch target address. The first entry indicates a first instruction address type for the first instruction fetch address, and the second entry indicates a second instruction address type for the first instruction fetch address. | 08-06-2009 |
20090198981 | DATA PROCESSING SYSTEM, PROCESSOR AND METHOD OF DATA PROCESSING HAVING BRANCH TARGET ADDRESS CACHE STORING DIRECT PREDICTIONS - In at least one embodiment, a processor includes at least one execution unit and instruction sequencing logic that fetches instructions for execution by the execution unit. The instruction sequencing logic includes branch logic that outputs predicted branch target addresses for use as instruction fetch addresses. The branch logic includes a branch target address cache (BTAC) having at least one direct entry providing storage for a direct branch target address prediction associating a first instruction fetch address with a branch target address to be used as a second instruction fetch address immediately after the first instruction fetch address and at least one indirect entry providing storage for an indirect branch target address prediction associating a third instruction fetch address with a branch target address to be used as a fourth instruction fetch address subsequent to both the third instruction fetch address and an intervening fifth instruction fetch address. | 08-06-2009 |
20090198982 | DATA PROCESSING SYSTEM, PROCESSOR AND METHOD OF DATA PROCESSING HAVING BRANCH TARGET ADDRESS CACHE SELECTIVELY APPLYING A DELAYED HIT - In at least one embodiment, a processor includes at least one execution unit that executes instructions and instruction sequencing logic, coupled to the at least one execution unit, that fetches instructions from a memory system for execution by the at least one execution unit. The instruction sequencing logic including branch target address prediction circuitry that stores a branch target address prediction associating a first instruction fetch address with a branch target address to be used as a second instruction fetch address. The branch target address prediction circuitry includes delay logic that, in response to at least a tag portion of a third instruction fetch address matching the first instruction fetch address, delays access to the memory system utilizing the second instruction fetch address if no branch target address prediction was made in an immediately previous cycle of operation. | 08-06-2009 |
20090198983 | GLOBAL HISTORY FOLDING TECHNIQUE OPTIMIZED FOR TIMING - A global history vector (GHV) mechanism maintains a folded GHV with higher order entries an an unfolded GHV with lower order entries. When a new entry arrives at the GHV, the GHV mechanism performs an XOR of the oldest unfolded entry in the unfolded GHV with the new entry. The XOR result is then shifted into the folded GHV as the newest folded entry. The oldest folded entry is discarded during the shift in of the newest folded entry. The GHV mechanism thus provides a resulting folded GHV that is current and can be utilized for XORing with an IFAR by performing an XOR operation. Only a single XOR logic is required to perform a single bit XOR operation between the oldest entry and the youngest entry, resulting in reducing the cycle time required to complete the folding operation on a GHV. | 08-06-2009 |
20090198985 | DATA PROCESSING SYSTEM, PROCESSOR AND METHOD OF DATA PROCESSING HAVING BRANCH TARGET ADDRESS CACHE WITH HASHED INDICES - In at least one embodiment, a processor includes at least one execution unit that executes instructions and instruction sequencing logic, coupled to the at least one execution unit, that fetches instructions from a memory system for execution by the at least one execution unit. The instruction sequencing logic including a branch target address cache (BTAC) including a plurality of entries for storing branch target address predictions. The BTAC includes index logic that selects an entry to access utilizing a BTAC index based upon at least a set of higher order bits of an instruction address and a set of lower order bits of the instruction address. | 08-06-2009 |
20100031011 | METHOD AND APPARATUS FOR OPTIMIZED METHOD OF BHT BANKING AND MULTIPLE UPDATES - The invention relates to a method and apparatus for controlling the instruction flow in a computer system and more particularly to the predicting of outcome of branch instructions using branch prediction arrays, such as BHTs. In an embodiment, the invention allows concurrent BHT read and write accesses without the need for a multi-ported BHT design, while still providing comparable performance to that of a multi-ported BHT design. | 02-04-2010 |
20100262806 | Tracking Effective Addresses in an Out-of-Order Processor - Mechanisms, in a data processing system, are provided for tracking effective addresses through a processor pipeline of the data processing system. The mechanisms comprise logic for fetching an instruction from an instruction cache and associating, by an effective address table logic in the data processing system, an entry in an effective address table (EAT) data structure with the fetched instruction. The mechanisms further comprise logic for associating an effective address tag (eatag) with the fetched instruction, the eatag comprising a base eatag that points to the entry in the EAT and an eatag offset. Moreover, the mechanisms comprise logic for processing the instruction through the processor pipeline by processing the eatag. | 10-14-2010 |
20110131438 | Saving Power by Powering Down an Instruction Fetch Array Based on Capacity History of Instruction Buffer - Mechanisms for saving power by powering down an instruction fetch array based on capacity history information of an instruction buffer are provided. The mechanisms operate to receive a current access to an instruction buffer of a processor. The current access is a fetch of a group of one or more instructions to be stored in the instruction buffer. A determination is made, for one or more prior accesses occurring prior to the current access, if a predetermined pattern of capacity availability of the instruction buffer indicates that the instruction buffer is likely to have available capacity to store the group of one or more instructions of the current access. An instruction fetch unit array is powered down in response to a determination that the instruction buffer is not likely to have available capacity to store the group of one or more instructions of the current access. | 06-02-2011 |
20110314259 | OPERATING A STACK OF INFORMATION IN AN INFORMATION HANDLING SYSTEM - A pointer is for pointing to a next-to-read location within a stack of information. For pushing information onto the stack: a value is saved of the pointer, which points to a first location within the stack as being the next-to-read location; the pointer is updated so that it points to a second location within the stack as being the next-to-read location; and the information is written for storage at the second location. For popping the information from the stack: in response to the pointer, the information is read from the second location as the next-to-read location; and the pointer is restored to equal the saved value so that it points to the first location as being the next-to-read location. | 12-22-2011 |
20120005462 | Hardware Assist for Optimizing Code During Processing - A method, data processing system, and computer program product for obtaining information about instructions. Instructions are processed. In response to processing a branch instruction in the instructions, a determination is made as to whether a result from processing the branch instruction follows a prediction of whether a branch is predicted to occur for the branch instruction. In response to the result following the prediction, the branch instruction is added to a current segment in a trace. In response to an absence of the result following the prediction, the branch instruction is added to the current segment in the trace and a first new segment and a second new segment are created. The first new segment includes a first branch instruction reached in the instructions from following the prediction. The second new segment includes a second branch instruction in the instructions reached from not following the prediction. | 01-05-2012 |
20120173821 | Predicting the Instruction Cache Way for High Power Applications - A mechanism for accessing a cache memory is provided. With the mechanism of the illustrative embodiments, a processor of the data processing system performs a first execution a portion of code. During the first execution of the portion of code, information identifying which cache lines in the cache memory are accessed during the execution of the portion of code is stored in a storage device of the data processing system. Subsequently, during a second execution of the portion of code, power to the cache memory is controlled such that only the cache lines that were accessed during the first execution of the portion of code are powered-up. | 07-05-2012 |
20120254837 | TASK SWITCH IMMUNIZED PERFORMANCE MONITORING - A performance monitoring technique provides task-switch immune operation without requiring storage and retrieval of the performance monitor state when a task switch occurs. When a hypervisor signals that a task is being resumed, it provides an indication, which starts a delay timer. The delay timer is resettable in case a predetermined time period has not elapsed when the next task switch occurs. After the delay timer expires, analysis of the performance monitor measurements is resumed, which prevents an initial state or a state remaining from a previous task from corrupting the performance monitoring results. The performance monitor may be or include an execution trace unit that collects taken branches in a current trace and may use branch prediction success to determine whether to collect a predicted and taken branch instruction in a current trace or to start a new segment when the branch resolves in a non-predicted direction. | 10-04-2012 |
20130055033 | HARDWARE-ASSISTED PROGRAM TRACE COLLECTION WITH SELECTABLE CALL-SIGNATURE CAPTURE - Hardware-assisted program tracing is facilitated by a processor that includes a root instruction address register, a program trace signature computation unit and a call signature register. When a program instruction having an address matching the root instruction address register is executed, a program trace signature is captured in the call signature register and capture of branch history is commenced. By accumulating different values of the call signature register, for example in response to an interrupt generated when the root instruction is executed, software that performs program tracing can obtain signatures of all of the multiple execution paths that lead to the root instruction, which is also specified by software in order to set different root instructions for program tracing. In an alternative implementation, a storage for multiple call signatures is provided in the processor and read at once by the software. | 02-28-2013 |
20130073833 | REDUCING STORE-HIT-LOADS IN AN OUT-OF-ORDER PROCESSOR - A technique for reducing store-hit-loads in an out-of-order processor includes storing a store address of a store instruction associated with a store-hit-load (SHL) pipeline flush in an SHL entry. In response to detecting another SHL pipeline flush for the store address, a current count associated with the SHL entry is updated. In response to the current count associated with the SHL entry reaching a first terminal count, a dependency for the store instruction is created such that execution of a younger load instruction with a load address that overlaps the store address stalls until the store instruction executes. | 03-21-2013 |
20140059523 | HARDWARE-ASSISTED PROGRAM TRACE COLLECTION WITH SELECTABLE CALL-SIGNATURE CAPTURE - Hardware-assisted program tracing is facilitated by a processor that includes a root instruction address register, a program trace signature computation unit and a call signature register. When a program instruction having an address matching the root instruction address register is executed, a program trace signature is captured in the call signature register and capture of branch history is commenced. By accumulating different values of the call signature register, for example in response to an interrupt generated when the root instruction is executed, software that performs program tracing can obtain signatures of all of the multiple execution paths that lead to the root instruction, which is also specified by software in order to set different root instructions for program tracing. In an alternative implementation, a storage for multiple call signatures is provided in the processor and read at once by the software. | 02-27-2014 |
20140298105 | IDENTIFYING AND TAGGING BREAKPOINT INSTRUCTIONS FOR FACILITATION OF SOFTWARE DEBUG - A processor stores an address of a first instruction of a first instruction set into a first register. The processor determines that a first instruction set location of the first instruction address matches a breakpoint instruction set location of a breakpoint instruction address stored in a second register, wherein the second register includes a state bit. The processor retrieves the first instruction. The processor determines that a breakpoint instruction offset of the breakpoint instruction address identifies the first instruction as the breakpoint. The processor sets the state bit of the second register. The processor removes the first instruction based on the state bit being set and then re-retrieves the first instruction. The processor tags the first instruction and generates an interrupt based on either the tagged first instruction being next to completion or the tagged first instruction being completed. | 10-02-2014 |
20140298106 | IDENTIFYING AND TAGGING BREAKPOINT INSTRUCTIONS FOR FACILITATION OF SOFTWARE DEBUG - A processor stores an address of a first instruction of a first instruction set into a first register. The processor determines that a first instruction set location of the first instruction address matches a breakpoint instruction set location of a breakpoint instruction address stored in a second register, wherein the second register includes a state bit. The processor retrieves the first instruction. The processor determines that a breakpoint instruction offset of the breakpoint instruction address identifies the first instruction as the breakpoint. The processor sets the state bit of the second register. The processor removes the first instruction based on the state bit being set and then re-retrieves the first instruction. The processor tags the first instruction and generates an interrupt based on either the tagged first instruction being next to completion or the tagged first instruction being completed. | 10-02-2014 |
20150032997 | TRACKING LONG GHV IN HIGH PERFORMANCE OUT-OF-ORDER SUPERSCALAR PROCESSORS - Tracking global history vector in high performance out of order superscalar processors, in one aspect, may comprise providing a shift register storing global history vector that stores branch predictions and outcomes. A counter is maintained to determine a number of bits to shift the shift register to recover branch history. In another aspect, the global history vector may be implemented with a circular buffer structure. Youngest and oldest pointers to the circular buffer are maintained and used in recovery. | 01-29-2015 |