Entries |
Document | Title | Date |
20080201556 | PROGRAM INSTRUCTION REARRANGEMENT METHODS IN COMPUTER - A program instruction rearrangement method calculates the dependency depth of each instruction of a program based on dependency between instructions, based on register access order, and rearranging instructions based on the dependency depth. Additionally, the dependency between instructions can be utilized to locate and remove redundant instructions. | 08-21-2008 |
20080229077 | COMPUTER PROCESSING SYSTEM EMPLOYING AN INSTRUCTION REORDER BUFFER - A method and a system for operating a plurality of processors that each includes an execution pipeline for processing dependence chains, the method comprising: configuring the plurality of processors to execute the dependence chains on execution pipelines; implementing a Super Re-Order Buffer (SuperROB) in which received instructions are re-ordered after out-of-order execution when at least one of the plurality of processors is in an Instruction Level Parallelism (ILP) mode and at least one of the plurality of processors has a Thread Level Parallelism (TLP) core; detecting an imbalance in a dispatch of instructions of a first dependence chain compared to a dispatch of instructions of a second dependence chain with respect to dependence chain priority; determining a source of the imbalance; and activating the ILP mode when the source of the imbalance has been determined. | 09-18-2008 |
20080244233 | MACHINE CLUSTER TOPOLOGY REPRESENTATION FOR AUTOMATED TESTING - Software (such as server products) operating in a complex networked environment often run on multi-machine installations that are known as machine clusters. A server product can be tested on a server machine type. The server product can be tested by tracking the constituent machines of a machine cluster, and configuring and recording the roles that each machine in the machine cluster plays. Scenarios targeting a single server machine-type can be seamlessly mapped from the single machine scenario to a machine cluster of any number of machines, while handling actions such as executing tests and gathering log files from all machines of a machine cluster as a unit. | 10-02-2008 |
20080256339 | Techniques for Tracing Processes in a Multi-Threaded Processor - A technique for tracing processes executing in a multi-threaded processor includes forming a trace message that includes a virtual core identification (VCID) that identifies an associated thread. The trace message, including the VCID, is then transmitted to a debug tool. | 10-16-2008 |
20080270762 | Method, System and Computer Program Product for Register Management in a Simulation Enviroment - A method for register management in a simulation environment including receiving an instruction from an instruction unit decode pipeline. An address generation interlock (AGI) function is executed in the simulation environment if the instruction is an AGI instruction. The executing an AGI function is responsive to a pool of registers controlled by a register manager and to the instruction. An early AGI function is executed in the simulation environment if the instruction is an early AGI instruction. The executing an early AGI function is responsive to the pool of registers and to the instruction. | 10-30-2008 |
20080276074 | SIMPLE LOAD AND STORE DISAMBIGUATION AND SCHEDULING AT PREDECODE - Embodiments of the invention provide a processor for executing instructions. In one embodiment, the processor includes circuitry to receive a load instruction and a store instruction to be executed in the processor and detect a conflict between the load instruction and the store instruction. Detecting the conflict includes determining if load-store conflict information indicates that the load instruction previously conflicted with the store instruction. The load-store conflict information is stored for both the load instruction and the store instruction. The processor further includes circuitry to schedule execution of the load instruction and the store instruction so that execution of the load instruction and the store instruction do not result in a conflict. | 11-06-2008 |
20080276075 | SIMPLE LOAD AND STORE DISAMBIGUATION AND SCHEDULING AT PREDECODE - Embodiments of the invention provide a method and processor for executing instructions. In one embodiment, the method includes receiving a load instruction and a store instruction to be executed in the processor and detecting a conflict between the load instruction and the store instruction. Detecting the conflict includes determining if load-store conflict information indicates that the load instruction previously conflicted with the store instruction. The load-store conflict information is stored for both the load instruction and the store instruction. The method further includes scheduling execution of the load instruction and the store instruction so that execution of the load instruction and the store instruction do not result in a conflict. | 11-06-2008 |
20080294877 | NUMERICAL CONTROLLER HAVING FUNCTION OF RESUMING LOOK-AHEAD OF BLOCK - A numerical controller which performs look-ahead control by suspending analysis of a read block of a machining program and resuming the analysis of the read block at a suspended stage when execution of a block immediately preceding the read block is completed. The numerical controller successively reads and analyzes blocks of the machining program in advance, stores the analyzed blocks into a buffer, and then executes the stored blocks, and comprises means for determining whether or not a read block contains a look-ahead stop code to suspend analysis of the block, means for suspending the analysis of the block when the look-ahead stop code is determined, means for determining whether or not execution of a block immediately preceding the suspended block is completed, and resuming means for resuming the analysis of the suspended block when the execution of the block immediately preceding the suspended block is completed. | 11-27-2008 |
20080313434 | Rendering Processing Apparatus, Parallel Processing Apparatus, and Exclusive Control Method | 12-18-2008 |
20090006817 | Mechanisms for Placing a Processor into a Gradual Slow Mode of Operation - Mechanisms for placing a processor into a gradual slow down mode of operation are provided. The gradual slow down mode of operation comprises a plurality of stages of slow down operation of an issue unit in a processor in which the issuance of instructions is slowed in accordance with a staging scheme. The gradual slow down of the processor allows the processor to break out of livelock conditions. Moreover, since the slow down is gradual, the processor may flexibly avoid various degrees of livelock conditions. The mechanisms of the illustrative embodiments impact the overall processor performance based on the severity of the livelock condition by taking a small performance impact on less severe livelock conditions and only increasing the processor performance impact when the livelock condition is more severe. | 01-01-2009 |
20090019264 | ADAPTIVE EXECUTION CYCLE CONTROL METHOD FOR ENHANCED INSTRUCTION THROUGHPUT - A method, system and processor for increasing the instruction throughput in a processor executing longer latency instructions within the instruction pipeline. Logic associated with specific stages of the execution pipeline, responsible for executing the particular type of instructions, determines when at least a threshold number of the particular-type instructions is scheduled to be executed. The logic then automatically changes an execution cycle frequency of the specific pipeline stages from a first cycle frequency to a second, pre-established higher cycle frequency, which enables more efficient execution and higher execution throughput of the particular-type instructions. The cycle frequency of only the one or more functional stages are switched to the higher cycle frequency independent of the cycle frequency of the other functional stages in the processor pipeline. The logic also automatically switches the execution cycle frequency of the specific pipeline stages back from the second, higher cycle frequency to the first cycle frequency, when the number of scheduled first-type instructions has completed execution. | 01-15-2009 |
20090019265 | ADAPTIVE EXECUTION FREQUENCY CONTROL METHOD FOR ENHANCED INSTRUCTION THROUGHPUT - A method, system and processor for adaptively and selectively controlling the instruction execution frequency of a data processor. Processing logic or a software compiler determines when a number of first-type instructions, requiring longer execution latency, are scheduled to be executed. The logic/compiler then triggers the CPM unit to automatically switch the execution frequency of the instruction processor from a first frequency that is optimal for processing regular-type instructions to a second, pre-established lower frequency that is optimal for processing the first-type instructions, to enable more efficient execution and higher execution throughput of the number of first-type operations within the processor. When the first-type instructions have completed execution, the processor's instruction execution frequency is returned to the first optimal frequency. | 01-15-2009 |
20090031115 | METHOD FOR HIGH INTEGRITY AND HIGH AVAILABILITY COMPUTER PROCESSING - A method of providing high integrity checking for an N-lane computer processing module (Module), N being an integer greater than equal to two. The method comprises the steps of: detecting, by a data Output Management unit (OM), when any of the N processing lanes sends different output data; configuring each Hosted Application as either normal or high integrity; for the Hosted Applications configured as high integrity, running an identical version of the software source code targeted for similar or dissimilar microprocessors on all N processing lanes, and activating a Time Management Unit, Critical Regions Management Unit, data Input Management Unit and data Output Management Unit for each of the N processing lanes; and for the Hosted Applications configured as normal integrity, running a copy of the software on one of the N processing lanes, and not activating the Time Management Unit, Critical Regions Management Unit, Input Management Unit and Output Management Unit for the one activated processing lane while that Hosted Application is running. | 01-29-2009 |
20090043991 | Scheduling Multithreaded Programming Instructions Based on Dependency Graph - A computer implemented method for scheduling multithreaded programming instructions based on the dependency graph wherein the dependency graph organizes the programming instruction logically based on blocks, nodes, and super blocks and wherein the programming instructions could be executed outside of a critical section may be executed outside of the critical section by inserting dependency relationship in the dependency graph. | 02-12-2009 |
20090043992 | Method And System For Data Speculation On Multicore Systems - The method and system for data speculation of multicore systems are disclosed. In one embodiment, a method includes dynamically determining whether a current speculative load instruction and an associated store instruction have same memory addresses in an application thread in compiled code running on a main core using a dynamic helper thread running on a idle core substantially before encountering the current speculative load instruction. The instruction sequence associated with the current speculative load instruction is then edited by the dynamic helper thread based on the outcome of the determination so that the current speculative load instruction becomes a non-speculative load instruction. | 02-12-2009 |
20090043993 | Monitoring Values of Signals within an Integrated Circuit - An integrated circuit, and method of reviewing values of one or more signals occurring within that integrated circuit, are provided. The integrated circuit comprises processing logic for executing a program, and monitoring logic for reviewing values of one or more signals occurring within the integrated circuit as a result of execution of the program. The monitoring logic stores configuration data, which can be software programmed in relation to the signals to be monitored. Further, the monitoring logic makes use of a Bloom filter which, for a value to be reviewed, performs a hash operation on that value in order to reference the configuration data to determine whether that value is either definitely not a value within the range or is potentially a value within the range of values. If the value is determined to be within the set of values, then a trigger signal is generated which can be used to trigger a further monitoring process. | 02-12-2009 |
20090043994 | RESOURCE FLOW COMPUTER - A scalable processing system includes a memory device having a plurality of executable program instructions, wherein each of the executable program instructions includes a timetag data field indicative of the nominal sequential order of the associated executable program instructions. The system also includes a plurality of processing elements, which are configured and arranged to receive executable program instructions from the memory device, wherein each of the processing elements executes executable instructions having the highest priority as indicated by the state of the timetag data field. | 02-12-2009 |
20090055630 | Program Processing Device, Parallel Processing Program, Program Processing Method, Parallel Processing Compiler, Recording Medium Containing The Parallel Processing Compiler, And Multi-Processor System - In a multi-processor system for performing a parallel processing, each of a plurality of processors includes a communication processing unit for performing control between the processors in a data flow machine-type data-driven control method; and a program processing unit for performing control in each processor in a Neumann-type program-driven control method. The communication processing unit performs a communication between the processors in synchronization with the program processing unit, and has a function of detecting a communication data hazard between the processors. The program processing unit performs a processing based on an execution code stored in a local memory, and has a function of executing or suspending the execution code, according to a result of detecting the data hazard. | 02-26-2009 |
20090055631 | Method And Apparatus For Register Renaming Using Multiple Physical Register Files And Avoiding Associative Search - A method for implementing a register renaming scheme for a digital data processor using a plurality of physical register files for supporting out-of-order execution of a plurality of instructions from one or more threads, the method comprising: using a DEF table to store the instruction dependencies between the plurality of instructions using the instruction tags, the DEF table being indexed by a logical register name and including one entry per logical register; using a rename USE table indexed by the instruction tags to store logical-to-physical register mapping information shared by multiple sets of different types of non-architected copies of logical registers used by multiple threads; using a last USE table to transfer data of the multiple sets of different types of non-architected copies of logical registers into the first set of architected registered files, the last USE table being indexed by a physical register name in the second set of rename registered files; and performing the register renaming scheme at the instruction dispatch or wake-up/issue time. | 02-26-2009 |
20090063823 | Method and System for Tracking Instruction Dependency in an Out-of-Order Processor - A method of tracking instruction dependency in a processor issuing instructions speculatively includes recording in an instruction dependency array (IDA) an entry for each instruction that indicates data dependencies, if any, upon other active instructions. An output vector read out from the IDA indicates data readiness based upon which instructions have previously been selected for issue. The output vector is used to select and read out issue-ready instructions from an instruction buffer. | 03-05-2009 |
20090063824 | Compound instructions in a multi-threaded processor - A multi-threaded processor determines which threads to execute, switches between execution of threads in dependence on the determination, each thread being coupled to a respective register for storing the state of the thread and used in executing instructions on the thread and includes a further register shared by all the threads. The executing threads use the further register to improve execution performance and prevents the switching of execution to another thread while the internal register is in use. | 03-05-2009 |
20090070560 | Method and Apparatus for Accelerating the Access of a Multi-Core System to Critical Resources - A method accelerates access of a multi-core system to its critical resources, which includes preparing to delete a critical node in a critical resource, separating the critical node from the critical resource, and deleting the critical node if the conditions for deleting the critical node are satisfied. An apparatus includes a confirmation module for the node to be deleted and a deletion module to accelerate access of a multi-core system to its critical resources. | 03-12-2009 |
20090089552 | Dependency Graph Parameter Scoping - A number of tasks are defined according to a dependency graph. Multiple parameter contexts are maintained, each associated with a different scope of the tasks. A parameter used in a first of the tasks is bound to a value. This binding includes identifying a first of the contexts according to the dependency graph and retrieving the value for the parameter from the identified context. | 04-02-2009 |
20090113182 | System and Method for Issuing Load-Dependent Instructions from an Issue Queue in a Processing Unit - A system and method for issuing load-dependent instructions from an issue queue in a processing unit in a data processing system. In response to a LSU determining that a load request from a load instruction missed a first level in a memory hierarchy, a LMQ allocates a load-miss queue entry corresponding to the load instruction. The LMQ associates at least one instruction dependent on the load request with the load-miss queue entry. Once data associated with the load request is retrieved, the LMQ selects at least one instruction dependent on the load request for execution on the next cycle. At least one instruction dependent on the load request is executed and a result is outputted. | 04-30-2009 |
20090138681 | SYNCHRONIZATION OF PARALLEL PROCESSES - A speculative execution capability of a processor is exposed to program control through at least one machine instruction. The at least one machine instruction may be two instructions designed to facilitate synchronization between parallel processes. According to an aspect, an instruction set architecture includes circuitry that handles a speculative execution instruction and a speculation termination instruction. The speculative execution instruction may be an instruction that takes first and second operands, causes the processor to speculatively execute additional instructions if a memory location contains a value, and causes the processor to start executing instructions from an address indicated by the second operand if a mis-speculation occurs, and the speculation termination instruction may be an instruction that causes the processor to begin retiring the additional instructions. | 05-28-2009 |
20090164759 | Execution of Single-Threaded Programs on a Multiprocessor Managed by an Operating System - A method and apparatus for speculatively executing a single threaded program within a multi-core processor which includes identifying an idle core within the multi-core processor, performing a look ahead operation on the single thread instructions to identify speculative instructions within the single thread instructions, and allocating the idle core to execute the speculative instructions. | 06-25-2009 |
20090164760 | METHOD AND SYSTEM FOR MODULE INITIALIZATION WITH AN ARBITRARY NUMBER OF PHASES - A method for initializing a module that includes identifying a first module for initialization, and performing a plurality of processing phases on the first module and all modules in a dependency graph of the first module. Performing the plurality of processing phases includes, for each module, executing a processing phase of the plurality of processing phases on the module, determining whether the processing phase has been executed on all modules in a dependency graph of the module, and when the processing phase has been executed for all modules in the dependency graph of the module, executing a subsequent processing phase of the plurality of processing phases on the module. | 06-25-2009 |
20090172360 | INFORMATION PROCESSING APPARATUS EQUIPPED WITH BRANCH PREDICTION MISS RECOVERY MECHANISM - The information processing apparatus comprises a cache miss detection unit detects a cache miss of a load instruction; an instruction issuance stop unit stops the issuance of an instruction subsequent to a conditional branch instruction if the branch direction of a conditional branch instruction subsequent to the load instruction for which a cache miss has been detected by the cache miss detection unit is not established at the timing of issuance, wherein a period of time cancels an issued instruction, the cancelation having been caused by a branch prediction miss, is deleted and thereby a penalty for the branch prediction miss is concealed under a wait time due to a cache miss. | 07-02-2009 |
20090182988 | Compare Relative Long Facility and Instructions Therefore - A method, system and program product for comparing two operands wherein one operand is obtained from memory wherein the address of the memory operand is based an offset of the program counter rather than an explicitly defined address location. The offset is defined by an immediate field of the instruction which is sign extended and is aligned as a halfword address when added to the value of the program counter. | 07-16-2009 |
20090193233 | Mechanism for Avoiding Check Stops in Speculative Accesses While Operating in Real Mode - A method and processor for avoiding check stops in speculative accesses. An execution unit, e.g., load/store unit, may be coupled to a queue configured to store instructions. A register, coupled to the execution unit, may be configured to store a value corresponding to an address in physical memory. When the processor is operating in real mode, the execution unit may retrieve the value stored in the register. Upon the execution unit receiving a speculative instruction, e.g., speculative load instruction, from the queue, a determination may be made as to whether the address of the speculative instruction is at or below the retrieved value. If the address of the speculative instruction is at or below this value, then the execution unit may safely speculatively execute this instruction while avoiding a check stop since all the addresses at or below this value are known to exist in physical memory. | 07-30-2009 |
20090198967 | METHOD AND STRUCTURE FOR LOW LATENCY LOAD-TAGGED POINTER INSTRUCTION FOR COMPUTER MICROARCHITECHTURE - A methodology and implementation of a load-tagged pointer instruction for RISC based microarchitecture is presented. A first lower latency, speculative implementation reduces overall throughput latency for a microprocessor system by estimating the results of a particular instruction and confirming the integrity of the estimate a little slower than the normal instruction execution latency. A second higher latency, non-speculative implementation that always produces correct results is invoked by the first when the first guesses incorrectly. The methodologies and structures disclosed herein are intended to be combined with predictive techniques for instruction processing to ultimately improve processing throughput. | 08-06-2009 |
20090198968 | Method, Apparatus and Software for Processing Software for Use in a Multithreaded Processing Environment - A method, apparatus and software for processing software code for use in a multithreaded processing environment in which lock verification mechanisms are automatically inserted in the software code and arranged to determine whether a respective shared storage element is locked prior to the use of the respective shared storage element by a given processing thread in a multithreaded processing environment. | 08-06-2009 |
20090198969 | Microprocessor systems - A microprocessor pipeline arrangement | 08-06-2009 |
20090210668 | System and Method for Optimization Within a Group Priority Issue Schema for a Cascaded Pipeline - The present invention provides system and method for a group priority issue schema for a cascaded pipeline. The system includes a cascaded delayed execution pipeline unit having a plurality of execution pipelines that execute instructions in a common issue group in a delayed manner relative to each other. The system further includes circuitry configured to: (1) receive an issue group of instructions; (2) determine if a plurality of load instructions are in the issue group, if so, schedule the plurality of load instructions in descending order of longest dependency chain depth to shortest dependency chain depth in a shortest to longest available execution pipelines; and (3) execute the issue group of instructions in the cascaded delayed execution pipeline unit. | 08-20-2009 |
20090210669 | System and Method for Prioritizing Floating-Point Instructions - The present invention provides a system and method for prioritizing floating-point instructions in a cascaded pipeline. The system includes a cascaded delayed execution pipeline unit having a plurality of execution pipelines that execute instructions in a common issue group in a delayed manner relative to each other. The system further includes circuitry configured to: ( | 08-20-2009 |
20090210670 | System and Method for Prioritizing Arithmetic Instructions - The present invention provides a system and method for prioritizing arithmetic instructions in a cascaded pipeline. The system includes a cascaded delayed execution pipeline unit having a plurality of execution pipelines that execute instructions in a common issue group in a delayed manner relative to each other. The system further includes circuitry configured to: (1) receive an issue group of instructions; (2) determine if at least one arithmetic instruction is in the issue group, if so scheduling the least one arithmetic instruction in a one of the plurality of execution pipelines based upon a first prioritization scheme; (3) determine if there is an issue conflict for one of the plurality of execution pipelines and resolving the issue conflict by scheduling the at least one arithmetic instruction in a different execution pipeline; (4) schedule execution of the issue group of instructions in the cascaded delayed execution pipeline unit. | 08-20-2009 |
20090210671 | System and Method for Prioritizing Store Instructions - The present invention provides a system and method for prioritizing store instructions in a cascaded pipeline. The system includes a cascaded delayed execution pipeline unit having a plurality of execution pipelines that execute instructions in a common issue group in a delayed manner relative to each other. The system further includes circuitry configured to: (1) receive an issue group of instructions; (2) determine if at least one store instruction is in the issue group, if so scheduling the least one store instruction in a one of the plurality of execution pipelines based upon a first prioritization scheme; (3) determine if there is an issue conflict for one of the plurality of execution pipelines and resolving the issue conflict by scheduling the at least one store instruction in a different execution pipeline; (4) schedule execution of the issue group of instructions in the cascaded delayed execution pipeline unit. | 08-20-2009 |
20090210672 | System and Method for Resolving Issue Conflicts of Load Instructions - The present invention provides system and method for a group priority issue schema for a cascaded pipeline. The system includes a cascaded delayed execution pipeline unit having a plurality of execution pipelines that execute instructions in a common issue group in a delayed manner relative to each other. The system further includes circuitry configured to: receive an issue group of instructions; determine if at least one load instruction is in the issue group, if so scheduling the least one load instruction in a one of the plurality of execution pipelines based upon a first prioritization scheme; determine if there is an issue conflict for one of the plurality of execution pipelines and resolving the issue conflict by determining which load involved in the issue conflict has the closest dependency, and schedule any load involved in the issue conflict not having the closest dependency in a delayed execution pipeline. | 08-20-2009 |
20090210673 | System and Method for Prioritizing Compare Instructions - The present invention provides a system and method for prioritizing compare instructions in a cascaded pipeline. The system includes a cascaded delayed execution pipeline unit having a plurality of execution pipelines that execute instructions in a common issue group in a delayed manner relative to each other. The system further includes circuitry configured to: (1) receive an issue group of instructions; (2) determine if at least one compare instruction is in the issue group, if so scheduling the least one compare instruction in a one of the plurality of execution pipelines based upon a first prioritization scheme; (3) determine if there is an issue conflict for one of the plurality of execution pipelines and resolving the issue conflict by scheduling the at least one compare instruction in a different execution pipeline; (4) schedule execution of the issue group of instructions in the cascaded delayed execution pipeline unit. | 08-20-2009 |
20090210674 | System and Method for Prioritizing Branch Instructions - The present invention provides a system and method for prioritizing branch instructions in a cascaded pipeline. The system includes a cascaded delayed execution pipeline unit having a plurality of execution pipelines that execute instructions in a common issue group in a delayed manner relative to each other. The system further includes circuitry configured to: (1) receive an issue group of instructions; (2) determine if at least one branch instruction is in the issue group, if so scheduling the least one branch instruction in a one of the plurality of execution pipelines based upon a first prioritization scheme; (3) determine if there is an issue conflict for one of the plurality of execution pipelines and resolving the issue conflict by scheduling the at least one branch instruction in a different execution pipeline; (4) schedule execution of the issue group of instructions in the cascaded delayed execution pipeline unit. | 08-20-2009 |
20090210675 | METHOD AND SYSTEM FOR EARLY INSTRUCTION TEXT BASED OPERAND STORE COMPARE REJECT AVOIDANCE - A method and system for early instruction text based operand store compare avoidance in a processor are provided. The system includes a processor pipeline for processing instruction text in an instruction stream, where the instruction text includes operand address information. The system also includes delay logic to monitor the instruction stream. The delay logic performs a method that includes detecting a load instruction following a store instruction in the instruction stream, comparing the operand address information of the store instruction with the load instruction. The method also includes delaying the load instruction in the processor pipeline in response to detecting a common field value between the operand address information of the store instruction and the load instruction. | 08-20-2009 |
20090259827 | SYSTEM, METHOD, AND COMPUTER PROGRAM PRODUCT FOR CREATING DEPENDENCIES AMONGST INSTRUCTIONS USING TAGS - A system, method, and computer program product are provided for creating dependencies amongst instructions using tags. In operation, tags are associated with a first instruction and a second instruction. Additionally, a dependency is created between the first instruction and the second instruction, utilizing the tags. Furthermore, the first instruction and the second instruction are executed in accordance with the dependency. | 10-15-2009 |
20090265527 | Multiport Execution Target Delay Queue Fifo Array - One embodiment provides a method of forwarding data in a processor. The method generally includes providing at least one cascaded delayed execution pipeline unit having at least a first pipeline and a second pipeline for executing first and second instructions in a common issue group, wherein the second pipeline executes the second instruction in a delayed manner relative to the execution of the first instruction in the first pipeline, storing results generated by an execution unit of the first pipeline in a first-in first-out (FIFO) storage target delay queue, determining if the target delay queue contains source data for executing the second instruction, and if the target delay queue contains source data for the second instruction, forwarding the source data for the second instruction from the target delay queue to an execution unit of the second pipeline. | 10-22-2009 |
20090276608 | MICRO PROCESSOR, METHOD FOR ENCODING BIT VECTOR, AND METHOD FOR GENERATING BIT VECTOR - In a microprocessor for pipeline processing instruction execution, dependency relationship information representing a dependency relationship of each of a plurality of instructions with all the preceding instructions is stored, and whether or not the instructions in stages after instruction issue depend on the instruction of a miss speculation is judged based on the dependency relationship information if the miss speculation occurs during the execution of the plurality of instructions in accordance with a set schedule. Thus, this microprocessor can perform a recovery processing for invalidating only the instructions in a dependency relationship at once in the case of a miss speculation in speculative scheduling. | 11-05-2009 |
20100017581 | LOW OVERHEAD ATOMIC MEMORY OPERATIONS - Embodiments that provide low-overhead restricted memory transactions are disclosed. In accordance with one embodiment, the method includes providing one or more references to processor-specific data that corresponds to a first processor. The method further includes detecting an interrupt to the first processor when the interrupt indicates modification of the one or more references to the processor-specific data during the execution of one or more instructions. The method also includes taking remedial action on the one or more instructions when the interrupt is detected. | 01-21-2010 |
20100023731 | Generation of parallelized program based on program dependence graph - A method of generating a parallelized program includes calculating an execution order of vertices of a degenerate program dependence graph, generating basic blocks by consolidating vertices including neither branching nor merging, generating procedures each corresponding to a respective one of the vertices, and generating a procedure control program by arranging an instruction to execute a first procedure after an instruction to wait for output data transfer from a second procedure for a dependence relation crossing a border between the basic blocks, generating an instruction to register a dependence relation that a third procedure has on output data transfer from a fourth procedure for a dependence relation within one of the basic blocks, and generating an instruction to perform a given data transfer directly from procedure to procedure for each of a data transfer within one of the basic blocks and a data transfer crossing a border between the basic blocks. | 01-28-2010 |
20100058033 | System and Method for Double-Issue Instructions Using a Dependency Matrix and a Side Issue Queue - A method receives a complex instruction comprising a first portion and a second portion. The method sets a single issue queue slot and allocates an execution unit for the complex instruction, and identifies dependencies in the first and second portions. The method sets a dependency matrix slot and a consumers table slot for the first and section portion. In the event the first portion dependencies have been satisfied, the method issues the first portion and then issues the second portion from the single issue queue slot. In the event the second portion dependencies have not been satisfied, the method places the second portion into a side issue queue. The method issues the second portion when the side issue queue indicates that the second portion is eligible for issue. | 03-04-2010 |
20100058034 | CREATING REGISTER DEPENDENCIES TO MODEL HAZARDOUS MEMORY DEPENDENCIES - A method of transforming low-level programming language code written for execution by a target processor includes receiving data comprising a plurality of low-level programming language instructions ordered for sequential execution by the target processor; detecting a pair of instructions in the plurality of low-level programming language instructions having a memory dependency therebetween; and inserting one or more instructions between the detected pair of instructions in the plurality of low-level programming language instructions having a memory dependency therebetween. The one or more instructions inserted between the detected pair of instructions create a true data dependency on a value stored in an architectural register of the target processor between the detected pair of instructions. | 03-04-2010 |
20100077182 | GENERATING STOP INDICATORS BASED ON CONDITIONAL DATA DEPENDENCY IN VECTOR PROCESSORS - Embodiments of a method for performing parallel operations in a computer system when one or more conditional dependencies may be present, where a given conditional dependency includes a dependency associated with at least two data elements based on a pair of conditions. During operation, a processor receives instructions for generating one or more stop indicators based on actual dependencies, where a given stop indicator indicates the position of a given actual dependency that can lead to different results when the data elements are processed in parallel than when the data elements are processed sequentially, and where the given actual dependency occurs when the pair of conditions matches one or more criteria. Then, the processor executes the instructions for generating the one or more stop indicators. | 03-25-2010 |
20100077183 | CONDITIONAL DATA-DEPENDENCY RESOLUTION IN VECTOR PROCESSORS - Embodiments of a method for performing parallel operations in a computer system when one or more conditional dependencies may be present, where a given conditional dependency includes a dependency associated with at least two data elements based on a pair of conditions. During operation, a processor receives instructions for generating a vector of tracked positions of actual dependencies, where a given tracked position indicates the position of a given actual dependency, and where the given actual dependency occurs when the pair of condition matches one or more criteria. Then, the processor executes the instructions for generating the vector of tracked positions. | 03-25-2010 |
20100122069 | Macroscalar Processor Architecture - A macroscalar processor architecture is described herein. In one embodiment, a processor receives instructions of a program loop having a vector block and a sequence block intended to be executed after the vector block, where the processor includes multiple slices and each of the slices is capable of executing an instruction of an iteration of the program loop substantially in parallel. For each iteration of the program loop, the processor executes an instruction of the sequence block using one of the slices while executing instructions of the vector block using a remainder of the slices substantially in parallel. Other methods and apparatuses are also described. | 05-13-2010 |
20100169616 | REDUCING INSTRUCTION COLLISIONS IN A PROCESSOR - An embodiment of a technique for selecting instructions for execution from an issue queue at multiple function units while reducing the chances of instruction collisions. Each function unit in a processor may include a selection logic circuit that selects a specific instruction from the issue queue for execution. In order to avoid instruction collision, a function unit may have a selection logic circuit that may select two instructions from an instruction queue: one according to a first selection technique and one according to a second selection technique. Then, by comparing the instruction selected by the first selection technique to the instruction selected by the selection logic circuit of another function unit, the instruction selected by the second technique may be used instead if there will be an instruction collision because the instruction selected by the first selection technique is the same as the instruction selected at a different function unit. | 07-01-2010 |
20100169617 | POWER EFFICIENT SYSTEM FOR RECOVERING AN ARCHITECTURE REGISTER MAPPING TABLE - A system for recovering an architecture register mapping table (ARMT). The system includes a first number of collection circuits and decode circuits, a second number of selection circuits, and an enable circuit. Information related to the mapping between each physical register and an appropriate architecture register is obtained from a physical register mapping table (PRMT) by one and only one collection circuit during only one of a fourth number of instruction cycles. Each decode circuit has its input coupled to the output of one different collection circuit and is capable of converting its input into a third number bit wide binary string selection code at its output. Each selection circuit is configured to receive from each selection code a bit from a bit position associated with that selection circuit. The enable circuit is configured to appropriately enable mapping of information from the PRMT to the ARMT. | 07-01-2010 |
20100205407 | PIPELINED MICROPROCESSOR WITH FAST NON-SELECTIVE CORRECT CONDITIONAL BRANCH INSTRUCTION RESOLUTION - A microprocessor includes a pipeline of stages for processing instructions and first and second types of conditional branch instruction includable by a program. The microprocessor makes a prediction of conditional branch instructions of the first type and flushes the pipeline of instructions if the prediction is subsequently determined to be incorrect, thereby incurring a branch misprediction penalty related to processing of conditional branch instructions of the first type. The microprocessor always correctly resolves conditional branch instructions of the second type without making a prediction of conditional branch instructions of the second type, thereby avoiding ever incurring a branch misprediction penalty related to processing of conditional branch instructions of the second type. | 08-12-2010 |
20100205408 | Speculative Region: Hardware Support for Selective Transactional Memory Access Annotation Using Instruction Prefix - A computer system and method is disclosed for executing selectively annotated transactional regions. The system is configured to determine whether an instruction within a plurality of instructions in a transactional region includes a given prefix. The prefix indicates that one or more memory operations performed by the processor to complete the instruction are to be executed as part of an atomic transaction. The atomic transaction can include one or more other memory operations performed by the processor to complete one or more others of the plurality of instructions in the transactional region. | 08-12-2010 |
20100217959 | INFORMATION PROCESSING APPARATUS, VIRTUAL STORAGE MANAGEMENT METHOD, AND STORAGE MEDIUM - A virtual storage management method that can increase the overall processing speed while preventing a processor from being overloaded. A request for acquisition of a memory area in a primary storage device is received from a process executed by a processor. It is determined whether or not the process that has made the acquisition request is a utility process executable in cooperation with another process. Control is provided so as to restrict swap-out of the utility process when it is determined that the process that has made the received acquisition request is a utility process executable in cooperation with another process, and a process cooperating with the utility process executed by the processor is a preferred process of which swap-out is restricted, and a processor utilization of the preferred process is greater than a predetermined value. | 08-26-2010 |
20100250902 | Tracking Deallocated Load Instructions Using a Dependence Matrix - A mechanism is provided for tracking deallocated load instructions. A processor detects whether a load instruction in a set of instructions in an issue queue has missed. Responsive to a miss of the load instruction, an instruction scheduler allocates the load instruction to a load miss queue and deallocates the load instruction from the issue queue. The instruction scheduler determines whether there is a dependence entry for the load instruction in an issue queue portion of a dependence matrix. Responsive to the existence of the dependence entry for the load instruction in the issue queue portion of the dependence matrix, the instruction scheduler reads data from the dependence entry of the issue queue portion of the dependence matrix that specifies a set of dependent instructions that are dependent on the load instruction and writes the data into a new entry in a load miss queue portion of the dependence matrix. | 09-30-2010 |
20100262809 | System and Method for Conflict Resolution During the Consolidation of Information Relating to a Data Service - A context aware mechanism including a system configured to receive presence data relating to at least one of a presentity and a watcher, and to resolve a conflict that arises during processing of the presence data according to a criteria. A method is also provided. | 10-14-2010 |
20100274992 | APPARATUS AND METHOD FOR HANDLING DEPENDENCY CONDITIONS - Techniques for handling dependency conditions, including evil twin conditions, are disclosed herein. An instruction may designate a source register comprising two portions. The source register may be a double-precision register and its two portions may be single-precision portions, each specified as destinations by two other single-precision instructions. Execution of these two single-precision instructions, especially on a register renaming machine, may result in the appropriate values for the two portions of the source register being stored in different physical locations, which can complicate execution of an instruction stream. In response to detecting a potential dependency, one or more instructions may be inserted in an instruction stream to enable the appropriate values to be stored within one physical double precision register, eliminating an actual or potential evil twin dependency. Embodiments including a compiler that inserts instructions in a generated instruction stream to eliminate dependency conditions are also contemplated. | 10-28-2010 |
20100274993 | LOGICAL MAP TABLE FOR DETECTING DEPENDENCY CONDITIONS - Techniques and structures are described which allow the detection of certain dependency conditions, including evil twin conditions, during the execution of computer instructions. Information used to detect dependencies may be stored in a logical map table, which may include a content-addressable memory. The logical map table may maintain a logical register to physical register mapping, including entries dedicated to physical registers available as rename registers. In one embodiment, each entry in the logical map table includes a first value usable to indicate whether only a portion of the physical register is valid and whether the physical register includes the most recent update to the logical register being renamed. Use of this first value may allow precise detection of dependency conditions, including evil twin conditions, upon an instruction reading from at least two portions of a logical register having an entry in the logical map table whose first value is set. | 10-28-2010 |
20100274994 | PROCESSOR OPERATING MODE FOR MITIGATING DEPENDENCY CONDITIONS - Various techniques for mitigating dependencies between groups of instructions are disclosed. In one embodiment, such dependencies include “evil twin” conditions, in which a first floating-point instruction has as a destination a first portion of a logical floating-point register (e.g., a single-precision write), and in which a second, subsequent floating-point instruction has as a source the first portion and a second portion of the same logical floating-point register (e.g., a double-precision read). The disclosed techniques may be applicable in a multithreaded processor implementing register renaming. In one embodiment, a processor may enter an operating mode in which detection of evil twin “producers” (e.g., single-precision writes) causes the instruction sequence to be modified to break potential dependencies. Modification of the instruction sequence may continue until one or more exit criteria are reached (e.g., committing a predetermined number of single-precision writes). This operating mode may be employed on a per-thread basis. | 10-28-2010 |
20100274995 | PROCESSOR AND METHOD OF CONTROLLING INSTRUCTION ISSUE IN PROCESSOR - One exemplary embodiment includes a processor including a plurality of execution units and an instruction unit. The instruction unit discriminates whether an instruction is a target instruction for which determination about availability of parallel issue based on dependency among instructions is to be made with respect to each instruction contained in an instruction stream. When a first instruction contained in the instruction stream is the target instruction, the instruction unit adjusts the number of instructions to be issued in parallel to the plurality of execution units based on a detection result of dependency among the first instruction and at least one subsequent instruction. Further, when the first instruction is not the target instruction, the instruction unit issues a group of a predetermined fixed number of instructions including the first instruction in parallel to the plurality of execution units unconditionally regardless of a detection result of dependency among the instruction group. | 10-28-2010 |
20100306506 | MICROPROCESSOR WITH SELECTIVE OUT-OF-ORDER BRANCH EXECUTION - A pipelined out-of-order execution in-order retire microprocessor includes a branch predictor that predicts a target address of a branch instruction, a fetch unit that fetches instructions at the predicted target address, and an execution unit that: resolves a target address of the branch instruction and detects that the predicted and resolved target addresses are different; determines whether there is an unretired instruction that must be corrected and that is older in program order than the branch instruction, in response to detecting that the predicted and resolved target addresses are different; execute the branch instruction by flushing instructions fetched at the predicted target address and causing the fetch unit to fetch from the resolved target address, if there is not an unretired instruction that must be corrected and that is older in program order than the branch instruction; and otherwise, refrain from executing the branch instruction. | 12-02-2010 |
20100306507 | OUT-OF-ORDER EXECUTION MICROPROCESSOR WITH REDUCED STORE COLLISION LOAD REPLAY REDUCTION - An out-of-order execution microprocessor for reducing the likelihood of having to replay a load instruction due to a store collision. The microprocessor includes a queue of entries, each entry configured to hold information that identifies sources of a store instruction used to compute its store address and to hold a dependency that identifies an instruction upon which the store instruction depends for its data. A register alias table (RAT), coupled to the queue of entries, is configured to encounter instructions in program order and to generate dependencies used to determine when the instructions may execute out of program order. In response to encountering a load instruction the RAT determines whether sources of the load instruction used to compute its load address match the sources of the store instruction in an entry of the queue, and if so, causes the load instruction to share the dependency of the matching store instruction. | 12-02-2010 |
20100306508 | OUT-OF-ORDER EXECUTION MICROPROCESSOR WITH REDUCED STORE COLLISION LOAD REPLAY REDUCTION - An out-of-order execution microprocessor for reducing load instruction replay likelihood due to store collisions. A register alias table (RAT) is coupled to first and second queues of entries and generates dependencies used to determine when instructions may execute out of order. The RAT allocates an entry of the first queue and populates the allocated entry with an instruction pointer of a load instruction, when it determines that the load instruction must be replayed. The RAT allocates an entry of the second queue when it encounters a store instruction and populates the allocated entry with a dependency that identifies an instruction upon which the store instruction depends for its data. The RAT causes a subsequent instance of the load instruction to share the dependency when it encounters the subsequent instance of the load instruction and determines that its instruction pointer matches the instruction pointer of an entry of the first queue. | 12-02-2010 |
20100325395 | DEPENDENCE PREDICTION IN A MEMORY SYSTEM - Techniques related to dependence prediction for a memory system are generally described. Various implementations may include a predictor storage storing a value corresponding to at least one prediction type associated with at least one load operation, and a state-machine having multiple states. For example, the state-machine may determine whether to execute the load operation based upon a prediction type associated with each of the states and a corresponding precedent to the load operation for the associated prediction type. The state-machine may further determine the prediction type for a subsequent load operation based on a result of the load operation. The states of the state machine may correspond to prediction types, which may be a conservative prediction type, an aggressive prediction type, or one or more N-store prediction types, for example. | 12-23-2010 |
20100332805 | Remapping source Registers to aid instruction scheduling within a processor - An out-of-order renaming processor is provided with a register file within which aliasing between registers of different sizes may occur. In this way a program instruction having a source register of a double precision size may alias with two single precision registers being used as destinations of one or more preceding program instructions. In order to track this data dependency the double precision register may be remapped into a micro-operation specifying two single precision registers as its source register. In this way, scheduling circuitry may use its existing hazard detection and management mechanisms to handle potential data hazards and dependencies. Not all program instructions having such data hazards between registers of different sizes are handled by this source register remapping. For these other program instructions a slower mechanism for dealing with the data dependency hazard is provided. This slower mechanism may, for example, be to drain all the preceding micro-operations from the execution pipelines before issuing the micro-operation having the data hazard. | 12-30-2010 |
20100332806 | DEPENDENCY MATRIX FOR THE DETERMINATION OF LOAD DEPENDENCIES - Systems and methods for identification of dependent instructions on speculative load operations in a processor. A processor allocates entries of a unified pick queue for decoded and renamed instructions. Each entry of a corresponding dependency matrix is configured to store a dependency bit for each other instruction in the pick queue. The processor speculates that loads will hit in the data cache, hit in the TLB and not have a read after write (RAW) hazard. For each unresolved load, the pick queue tracks dependent instructions via dependency vectors based upon the dependency matrix. If a load speculation is found to be incorrect, dependent instructions in the pick queue are reset to allow for subsequent picking, and dependent instructions in flight are canceled. On completion of a load miss, dependent operations are re-issued. On resolution of a TLB miss or RAW hazard, the original load is replayed and dependent operations are issued again from the pick queue. | 12-30-2010 |
20110035573 | OUT-OF-ORDER X86 MICROPROCESSOR WITH FAST SHIFT-BY-ZERO HANDLING - An out-of-order execution microprocessor includes a register alias table configured to generate a first indicator that indicates whether an instruction is dependent upon a condition code result of a shift instruction. The microprocessor also includes a first execution unit configured to execute the shift instruction and to generate a second indicator that indicates whether a shift amount of the shift instruction is zero. The microprocessor also includes a second execution unit configured to receive the first and second indicators and to generate a replay signal to cause the instruction to be replayed if the first indicator indicates the instruction is dependent upon the condition code result of the shift instruction and a second indicator indicates the shift amount of the shift instruction is zero. | 02-10-2011 |
20110078417 | COOPERATIVE THREAD ARRAY REDUCTION AND SCAN OPERATIONS - One embodiment of the present invention sets forth a technique for performing aggregation operations across multiple threads that execute independently. Aggregation is specified as part of a barrier synchronization or barrier arrival instruction, where in addition to performing the barrier synchronization or arrival, the instruction aggregates (using reduction or scan operations) values supplied by each thread. When a thread executes the barrier aggregation instruction the thread contributes to a scan or reduction result, and waits to execute any more instructions until after all of the threads have executed the barrier aggregation instruction. A reduction result is communicated to each thread after all of the threads have executed the barrier aggregation instruction and a scan result is communicated to each thread as the barrier aggregation instruction is executed by the thread. | 03-31-2011 |
20110093684 | TRANSPARENT CONCURRENT ATOMIC EXECUTION - Executing a block of code is disclosed. Executing includes receiving an indication that the block of code is to be executed using a synchronization mechanism and speculatively executing the block of code on a virtual machine. The block of code may include application code. The block of code does not necessarily indicate that the block of code should be speculatively executed. | 04-21-2011 |
20110119468 | MECHANISM OF SUPPORTING SUB-COMMUNICATOR COLLECTIVES WITH O(64) COUNTERS AS OPPOSED TO ONE COUNTER FOR EACH SUB-COMMUNICATOR - A system and method for enhancing barrier collective synchronization on a computer system comprises a computer system including a data storage device. The computer system includes a program stored in the data storage device and steps of the program being executed by a processor. The system includes providing a plurality of communicators for storing state information for a bather algorithm. Each communicator designates a master core in a multi-processor environment of the computer system. The system allocates or designates one counter for each of a plurality of threads. The system configures a table with a number of entries equal to the maximum number of threads. The system sets a table entry with an ID associated with a communicator when a process thread initiates a collective. The system determines an allocated or designated counter by searching entries in the table. | 05-19-2011 |
20110119469 | BALANCING WORKLOAD IN A MULTIPROCESSOR SYSTEM RESPONSIVE TO PROGRAMMABLE ADJUSTMENTS IN A SYNCRONIZATION INSTRUCTION - In a multiprocessor system with threads running in parallel, workload balancing is facilitated by recognizing a plurality of levels of sub-tasks of a memory synchronization instruction and selectively choosing for at least one thread to do less than all of levels of these sub-tasks in response to the memory synchronization instruction. Which thread waits to synchronize can be impacted by this choice. The programmer can cause a thread expected to be a bottleneck to wait less than other threads. Where one thread is a producer and another thread is a consumer, types of memory synchronization can be adapted to these roles. | 05-19-2011 |
20110145550 | NON-QUIESCING KEY SETTING FACILITY - A non-quiescing key setting facility is provided that enables manipulation of storage keys to be performed without quiescing operations of other processors of a multiprocessor system. With this facility, a storage key, which is accessible by a plurality of processors of the multiprocessor system, is updated absent a quiesce of operations of the plurality of processors. Since the storage key is updated absent quiescing of other operations, the storage key may be observed by a processor as having one value at the start of an operation performed by the processor and a second value at the end of the operation. A mechanism is provided to enable the operation to continue, avoiding a fatal exception. | 06-16-2011 |
20110167244 | EARLY INSTRUCTION TEXT BASED OPERAND STORE COMPARE REJECT AVOIDANCE - A method and system for early instruction text based operand store compare avoidance in a processor are provided. The system includes a processor pipeline for processing instruction text in an instruction stream, where the instruction text includes operand address information. The system also includes delay logic to monitor the instruction stream. The delay logic performs a method that includes detecting a load instruction following a store instruction in the instruction stream, comparing the operand address information of the store instruction with the load instruction. The method also includes delaying the load instruction in the processor pipeline in response to detecting a common field value between the operand address information of the store instruction and the load instruction. | 07-07-2011 |
20110185160 | MULTI-CORE PROCESSOR WITH EXTERNAL INSTRUCTION EXECUTION RATE HEARTBEAT - A method for debugging a multi-core microprocessor includes causing the microprocessor to perform an actual execution of instructions and obtaining from the microprocessor heartbeat information that specifies an actual execution sequence of the instructions by the plurality of cores relative to one another, commanding a corresponding plurality of instances of a software functional model of the cores to execute the instructions according to the actual execution sequence specified by the heartbeat information to generate simulated results of the execution of the instructions, and comparing the simulated results with actual results of the execution of the instructions to determine whether they match. Each core outputs an instruction execution indicator indicating the number of instructions executed by the core each core clock. A heartbeat generator generates a heartbeat indicator for each core on an external bus that indicates the number of instructions executed by each core during each external bus clock cycle. | 07-28-2011 |
20110219215 | ATOMICITY: A MULTI-PRONGED APPROACH - In a multiprocessor system with speculative execution, atomicity can be approached in several fashions. One approach is to have atomic instructions that achieve multiple functions and are guaranteed to complete. Another approach is to have blocks of code that are grouped to succeed or fail together. A system can incorporate more than one such approach. In implementing more than one approach, the system may prioritize one over another. When conflict detection is done through a directory lookup in cache memory, atomic instructions and atomicity related operations may be implemented in a cache data array access pipeline in that cache memory. This implementation may include feedback to the pipeline for implementing multiple functions within an atomic instruction and also for cascading atomic instructions. | 09-08-2011 |
20110258415 | APPARATUS AND METHOD FOR HANDLING DEPENDENCY CONDITIONS - Techniques for handling dependency conditions, including evil twin conditions, are disclosed herein. An instruction may designate a source register comprising two portions. The source register may be a double-precision register and its two portions may be single-precision portions, each specified as destinations by two other single-precision instructions. Execution of these two single-precision instructions, especially on a register renaming machine, may result in the appropriate values for the two portions of the source register being stored in different physical locations, which can complicate execution of an instruction stream. In response to detecting a potential dependency, one or more instructions may be inserted in an instruction stream to enable the appropriate values to be stored within one physical double precision register, eliminating an actual or potential evil twin dependency. Embodiments including a compiler that inserts instructions in a generated instruction stream to eliminate dependency conditions are also contemplated. | 10-20-2011 |
20110296144 | REDUCING DATA HAZARDS IN PIPELINED PROCESSORS TO PROVIDE HIGH PROCESSOR UTILIZATION - A pipelined computer processor is presented that reduces data hazards such that high processor utilization is attained. The processor restructures a set of instructions to operate concurrently on multiple pieces of data in multiple passes. One subset of instructions operates on one piece of data while different subsets of instructions operate concurrently on different pieces of data. A validity pipeline tracks the priming and draining of the pipeline processor to ensure that only valid data is written to registers or memory. Pass-dependent addressing is provided to correctly address registers and memory for different pieces of data. | 12-01-2011 |
20120005459 | PROCESSOR HAVING INCREASED PERFORMANCE AND ENERGY SAVING VIA MOVE ELIMINATION - Methods and apparatuses are provided for increasing processor performance and energy saving via eliminating physical data movement to accomplish a move instruction. The apparatus comprises a first plurality of available physical registers mapped to a second plurality of logical registers, including a source logical register and a destination logical register. A renaming unit remaps the destination logical register to the same physical register mapping as the source logical register in response to a move instruction. In this way, the move instruction is effectively executed without moving data between physical registers. A method is provided for increasing processor performance and energy saving via eliminating physical data movement to accomplish a move instruction. The method comprises determining a mapping of a logical source register and a logical destination register to physical registers of a processor and then remapping the logical destination register to the same physical register mapping as the logical source register to affect an equivalent of the move instruction with actual data movement between physical registers. | 01-05-2012 |
20120017069 | OUT-OF-ORDER COMMAND EXECUTION - Techniques are described for reordering commands to improve the speed at which at least one command stream may execute. Prior to distributing commands in the at least one command stream to multiple pipelines, a multimedia processor analyzes any inter-pipeline dependencies and determines the current execution state of the pipelines. The processor may, based on this information, reorder the at least one command stream by prioritizing commands that lack any current dependencies and therefore may be executed immediately by the appropriate pipeline. Such out of order execution of commands in the at least one command stream may increase the throughput of the multimedia processor by increasing the rate at which the command stream is executed. | 01-19-2012 |
20120042152 | ELIMINATION OF READ-AFTER-WRITE RESOURCE CONFLICTS IN A PIPELINE OF A PROCESSOR - An apparatus having a processor and a circuit is disclosed. The processor generally has a pipeline. The circuit may be configured to (i) detect a first write instruction in the pipeline that writes to a resource, (ii) stall a read instruction in the pipeline where (a) a first read-after-write conflict exists between the first write instruction and the read instruction and (b) no other write instruction to the resource is scheduled between the first write instruction and the read instruction and (iii) not stall the read instruction due to the first read-after-write conflict where a second write instruction to the resource is scheduled between the first write instruction and the read instruction. | 02-16-2012 |
20120124339 | PROCESSOR CORE SELECTION BASED AT LEAST IN PART UPON AT LEAST ONE INTER-DEPENDENCY - An embodiment may include at least one first process to be executed, at least in part, by circuitry. The at least one first process may select, at least in part, from a plurality of processor cores, one or more processor cores to execute, at least in part, at least one second process. The at least one first process may select, at least in part, the one or more processor cores based at least in part upon whether at least one inter-dependency exists, at least in part, between the at least one second process and at least one third process also to be executed by the one or more processor cores. Many alternatives, variations, and modifications are possible. | 05-17-2012 |
20120137109 | METHOD AND APPARATUS FOR PERFORMING STORE-TO-LOAD FORWARDING FROM AN INTERLOCKING STORE USING AN ENHANCED LOAD/STORE UNIT IN A PROCESSOR - A method and a processor load/store unit (LSU) are described for performing store-to-load forwarding (STLF) from an interlocking store. STLF is performed when a starting address of the store and the load do not match, or when a data size of the store is smaller than a data size of the load. The LSU detects a load that interlocks with a store, and determines whether all or only a portion of data bytes needed by the load can be provided by the interlocking store. If it is determined that only a portion of the data bytes needed by the load can be provided by the interlocking store, then that portion of the data bytes is provided by a store data buffer (SDB) and the remaining portion of the data bytes needed by the load is provided by a data cache (DC). Otherwise, the SDB provides all of the data bytes. | 05-31-2012 |
20120144167 | APPARATUS FOR EXECUTING PROGRAMS FOR A FIRST COMPUTER ARCHITECTURE ON A COMPUTER OF A SECOND ARCHITECTURE - A multi-instruction set architecture (ISA) computer system includes a computer program, a first processor, a second processor, a profiler, and a translator. The computer program includes instructions of a first ISA, the first ISA having a first complexity. The first processor is configured to execute instructions of the first ISA. The second processor is configured to execute instructions of a second ISA, the second ISA being different than the first ISA and having a second complexity, wherein the second complexity is less than the first complexity. The profiler is configured to select a block of the computer program for translation to instructions of the second ISA, wherein the block includes one or more instructions of the first ISA. The translator is configured to translate the block of the first ISA into instructions of the second ISA for execution by the second processor. | 06-07-2012 |
20120159130 | MECHANISM FOR CONFLICT DETECTION USING SIMD - A system and method are configured to detect conflicts when converting scalar processes to parallel processes (“SIMDifying”). Conflicts may be detected for an unordered single index, an ordered single index and/or ordered pairs of indices. Conflicts may be further detected for read-after-write dependencies. Conflict detection is configured to identify operations (i.e., iterations) in a sequence of iterations that may not be done in parallel. | 06-21-2012 |
20120166769 | PROCESSOR HAVING INCREASED PERFORMANCE VIA ELIMINATION OF SERIAL DEPENDENCIES - Methods and apparatuses are provided for achieving increased performance via elimination of serial dependencies in instructions or instruction sequences. The apparatus comprises an operational unit for determining whether an instruction will cause dependencies during completion in an execution unit. Responsive to that determination the instruction is replaced with an alternative instruction for completion in the execution unit. In this way, the alternative instruction is completed without causing dependencies in the execution unit. The method comprises determining that an instruction will cause dependencies during completion in a processor and replacing the instruction with an alternative instruction for completion in the processor. | 06-28-2012 |
20120166770 | Instruction Execution - A method of executing an instruction set including a first instruction and a second instruction, includes reading the first instruction; determining whether the first instruction is an instruction which is integral with the second instruction; reading the second instruction; if the first instruction is integral with the second instruction, interpreting the operand field of the second instruction to indicate at least one value to be used in conjunction with at least one bit of the first instruction; and if the first instruction is not integral with the second instruction, interpreting the operand field of the second instruction to indicate an entry of a look-up table. | 06-28-2012 |
20120185675 | APPARATUS AND METHOD FOR COMPRESSING TRACE DATA - An apparatus and method for compressing trace data is provided. The apparatus includes a detection unit configured to detect trace data corresponding to one or more function units performing a substantially significant operation in a reconfigurable processor as valid trace data, and a compression unit configured to compress the valid trace data. | 07-19-2012 |
20120185676 | Processing apparatus, trace unit and diagnostic apparatus - A processing circuit | 07-19-2012 |
20120191952 | PROCESSOR IMPLEMENTING SCALAR CODE OPTIMIZATION - Methods and apparatuses are provided for increased efficiency and enhanced power saving in a processor via scalar code optimization. The method comprises determining that an instruction comprises a scalar instruction and then processing the instruction using only a lower portion of an XMM register. The apparatus comprises an operational unit capable of determining whether an instruction comprises a scalar instruction and execution units responsive that determining for processing the scalar instruction using only a lower portion of an XMM register of the processor. By not processing the upper portion of the XMM register efficiency is increased and power saving is enhanced. | 07-26-2012 |
20120191953 | Parallel Execution Unit that Extracts Data Parallelism at Runtime - Mechanisms for extracting data dependencies during runtime are provided. With these mechanisms, a portion of code having a loop is executed. A first parallel execution group is generated for the loop, the group comprising a subset of iterations of the loop less than a total number of iterations of the loop. The first parallel execution group is executed by executing each iteration in parallel. Store data for iterations are stored in corresponding store caches of the processor, Dependency checking logic of the processor determines, for each iteration, whether the iteration has a data dependence. Only the store data for stores where there was no data dependence determined are committed to memory. | 07-26-2012 |
20120204010 | NON-QUIESCING KEY SETTING FACILITY - A non-quiescing key setting facility is provided that enables manipulation of storage keys to be performed without quiescing operations of other processors of a multiprocessor system. With this facility, a storage key, which is accessible by a plurality of processors of the multiprocessor system, is updated absent a quiesce of operations of the plurality of processors. Since the storage key is updated absent quiescing of other operations, the storage key may be observed by a processor as having one value at the start of an operation performed by the processor and a second value at the end of the operation. A mechanism is provided to enable the operation to continue, avoiding a fatal exception. | 08-09-2012 |
20120221836 | Synchronizing Commands and Dependencies in an Asynchronous Command Queue - Provided are techniques for the managing of command queue dependencies and command queue synchronization. Incoming commands are actively tracked through their dependency relationships. Command dependencies may be tracked across multiple lists, including a submission list and a completion list. Each command on the submission list is prepared for processing and ultimately submitted to command processing logic. Command completion processing is performed on each command on the completion list, including by not limited to removing dependencies from pending commands and possibly queuing pending commands for submission to the command processing logic. Also provided as features of a command queue are a standby barrier, an active barrier and a marker. Standby and active barriers are employed to synchronize and track commands through the command queue. Markers are employed to track commands through the command queue. | 08-30-2012 |
20120246449 | METHOD AND APPARATUS FOR EFFICIENT LOOP INSTRUCTION EXECUTION USING BIT VECTOR SCANNING - A method, apparatus and computer program product for performing efficient loop instruction execution using bit vector scanning is presented. A bit vector is scanned, each bit in the bit vector representing at least one of a feature and a conditional status. The presence of a bit of said bit vector set to a first state is detected. The bit is set to a second state. An instruction address for a routine corresponding to said bit set to a first state is looked up using a bit position of said bit that was set to a first state. The routine is executed. The scanning, said detecting, said setting and said using are repeated until there are no remaining bits of said bit vector set to said first state. | 09-27-2012 |
20120246450 | REGISTER FILE SEGMENTS FOR SUPPORTING CODE BLOCK EXECUTION BY USING VIRTUAL CORES INSTANTIATED BY PARTITIONABLE ENGINES - A system for executing instructions using a plurality of register file segments for a processor. The system includes a global front end scheduler for receiving an incoming instruction sequence, wherein the global front end scheduler partitions the incoming instruction sequence into a plurality of code blocks of instructions and generates a plurality of inheritance vectors describing interdependencies between instructions of the code blocks. The system further includes a plurality of virtual cores of the processor coupled to receive code blocks allocated by the global front end scheduler, wherein each virtual core comprises a respective subset of resources of a plurality of partitionable engines, wherein the code blocks are executed by using the partitionable engines in accordance with a virtual core mode and in accordance with the respective inheritance vectors. A plurality register file segments are coupled to the partitionable engines for providing data storage. | 09-27-2012 |
20120260071 | CONDITIONAL ALU INSTRUCTION CONDITION SATISFACTION PROPAGATION BETWEEN MICROINSTRUCTIONS IN READ-PORT LIMITED REGISTER FILE MICROPROCESSOR - An architectural instruction instructs a microprocessor to perform an operation on first and second source operands to generate a result and to write the result to a destination register only if architectural condition flags satisfy a condition specified in the architectural instruction. A hardware instruction translator translates the architectural instruction into first and second microinstructions. To execute the first microinstruction, an execution pipeline performs the operation on the source operands to generate the result, determines whether the architectural condition flags satisfy the condition, and updates a non-architectural indicator to indicate whether the architectural condition flags satisfy the condition. To execute the first microinstruction, if the non-architectural indicator updated by the first microinstruction indicates the architectural condition flags satisfy the condition, it updates the destination register with the result; otherwise, it updates the destination register with the current value of the destination register. | 10-11-2012 |
20120278594 | PERFORMANCE BOTTLENECK IDENTIFICATION TOOL - A computer program product for identifying bottlenecks includes a computer readable storage medium with stored computer readable program instructions. The computer readable program instructions, when executed, provide a data collector module, a mapper module, and an analyzer module that are collectively configured to read mapped data and configuration files, and identify, based upon the mapped data and the configuration files, an undesirable bottleneck condition that causes a computer program to run inefficiently. A method includes reading a configuration file that includes data regarding processor components, and collecting data from hardware activity counters based upon the configuration file. The method also includes mapping the collected data to corresponding sections of code of a computer program, reading the mapped data and the configuration file, and identifying, based upon the reading of the mapped data and the configuration file, an undesirable bottleneck condition that causes the processor to run the computer program inefficiently. | 11-01-2012 |
20120290818 | Split Scheduler - In an embodiment, a scheduler implements a first dependency array that tracks dependencies on instruction operations (ops) within a distance N of a given op and which are short execution latency ops. Other dependencies are tracked in a second dependency array. The first dependency array may evaluate quickly, to support back-to-back issuance of short execution latency ops and their dependent ops. The second array may evaluate more slowly than the first dependency array. | 11-15-2012 |
20130007418 | FLUSH OPERATIONS IN A PROCESSOR - Methods and apparatuses are provided for flush operations in a processor. The apparatus comprises an out-of-order execution unit for processing instructions issued in-order from an instruction decoder for first and second threads and being configured to identify an errored instruction in a first thread. A retire unit includes a retire queue for receiving completed instructions from the out-of-order execution unit, the retire unit being configured retire older in-order first thread instructions until the errored instruction would be the next instruction to be retired, and then flushing the errored instruction and all later in-order first thread instructions from the retire queue. The method comprises determining that an errored instruction is being processed by an out-of-order execution unit of a processor and continuing to process to completion instructions earlier in-order from the errored instruction until the completion of the errored instruction. Following completion of the errored instruction, it is flushed along with all instructions later in-order than the errored instruction to recover the processor to a pre-error state. | 01-03-2013 |
20130046961 | SPECULATIVE MEMORY WRITE IN A PIPELINED PROCESSOR - An apparatus generally having an interface circuit and a processor. The interface circuit may have a queue and a connection to a memory. The processor may have a pipeline. The processor is generally configured to (i) place an address in the queue in response to processing a first instruction in a first stage of the pipeline, (ii) generate a flag by processing a second instruction in a second stage of the pipeline, the second instruction may be processed in the second stage after the first instruction is processed in the first stage, and (iii) generate a signal based on the flag in a third stage of the pipeline. The third stage may be situated in the pipeline after the second stage. The interface circuit is generally configured to cancel the address from the queue without transferring the address to the memory in response to the signal having a disabled value. | 02-21-2013 |
20130067201 | MULTIPROCESSOR SYSTEM, EXECUTION CONTROL METHOD AND EXECUTION CONTROL PROGRAM - The multiprocessor system includes one or a plurality of main processors and a plurality of sub-processors, and an execution control circuit which conducts execution control of each the sub-processors, wherein the execution control circuit includes an execution control processor for execution control processing of each the sub-processors, a control bus output unit for activation of a command to each the sub-processors, a status bus input unit for status notification from each the sub-processors, a determination circuit which determines whether or not the status notification has one-to-one dependency with a processing command to be issued next on an operation sequence and is to be processed at a high speed, a status accelerator which issues a corresponding processing activation command when the status notification is to be processed at a high speed, and a status FIFO control unit which processes the status notification by using the execution control processor. | 03-14-2013 |
20130080743 | ROUND ROBIN PRIORITY SELECTOR - Method and structures for performing round robin priority selection receive an input vector into an input port. The methods and structures group the bits of the input vector into groups of bits and supply the groups of bits to round robin priority selectors. Then, the methods and structures simultaneously identify an individual group priority bit within each group of bits based on the starting bit location, using the round robin priority selectors. The methods and structures also choose, using the group selector, a round robin priority selector based on the starting bit location. The methods and structures then output, from the group selector to a multiplexor, the individual group priority bit of the selected round robin priority selector. Following this the method outputs, from the multiplexor, an output vector having a first value (e.g., 1) only in the individual group priority bit output by the group selector. | 03-28-2013 |
20130103930 | DATA PROCESSING DEVICE AND METHOD, AND PROCESSOR UNIT OF SAME - A processor unit ( | 04-25-2013 |
20130145129 | REGISTER RENAMING DATA PROCESSING APPARATUS AND METHOD FOR PERFORMING REGISTER RENAMING - A data processing apparatus and method are provided. A processor performs data processing operations in response to data processing instructions which reference logical registers. A set of physical registers stores data values which are subjected to the data processing operations. A tag storage stores for each physical register a tag value indicative of one of the logical registers. The processor references the tag storage to perform the data processing operations. A tag value exchanger performs a tag switch exchanging two tag values in the tag storage when the processor executes a predetermined instruction which references two logical registers and for which a choice of which two physical registers are mapped to which of the two logical registers will have no effect on an outcome of the data processing operations. The tag value exchanger performs the tag switch with respect to the tag values indicative of the two logical registers. | 06-06-2013 |
20130166886 | SYSTEMS, APPARATUSES, AND METHODS FOR A HARDWARE AND SOFTWARE SYSTEM TO AUTOMATICALLY DECOMPOSE A PROGRAM TO MULTIPLE PARALLEL THREADS - Systems, apparatuses, and methods for a hardware and software system to automatically decompose a program into multiple parallel threads are described. For example, a method according to one embodiment comprises: analyzing a single-threaded region of executing program code, the analysis including identifying dependencies within the single-threaded region; determining portions of the single-threaded region of executing program code which may be executed in parallel based on the analysis; assigning the portions to two or more parallel execution tracks; and executing the portions in parallel across the assigned execution tracks. | 06-27-2013 |
20130191617 | COMPUTER SYSTEM, COMPUTER SYSTEM CONTROL METHOD, COMPUTER SYSTEM CONTROL PROGRAM, AND INTEGRATED CIRCUIT - A computer system includes a memory having a secure area and a plurality of processors using the memory. When an access-allowed program unit executed by one of the processors starts an access to the secure area, the program unit subject to execution by the other processors is limited to the access-allowed program unit. | 07-25-2013 |
20130227251 | BRANCH MISPREDICATION BEHAVIOR SUPPRESSION ON ZERO PREDICATE BRANCH MISPREDICT - A method for suppressing branch misprediction behavior is contemplated in which a conditional branch instruction that would cause the flow of control to branch around instructions in response to a determination that a predicate vector is null is predicted not taken. However, in response to detecting that the prediction is incorrect, misprediction behavior is inhibited. | 08-29-2013 |
20130262832 | Instruction Scheduling for Reducing Register Usage - A method, computer program product, and system are provided for scheduling a plurality of instructions in a computing system. For example, the method can generate a plurality of instruction lineages, in which the plurality of instruction lineages is assigned to one or more registers. Each of the plurality of instruction lineages has at least one node representative of an instruction from the plurality of instructions. The method can also determine a node order based on respective priority values associated with each of the nodes. Further, the method can include scheduling the plurality of instructions based on the node order and the one or more registers assigned to the one or more registers. | 10-03-2013 |
20130262833 | PERFORMANCE OF VECTOR PARTITIONING LOOPS - A method for suppressing prediction of a backward branch instruction used in a vector partitioning loop includes detecting the first backward branch instruction that occurs after a predicate generating instruction. The predicate generating instruction generates a predicate vector that is dependent upon a dependency vector where each element of the dependency vector indicates whether a data dependency exists between elements of a vector instruction. The method also includes receiving an indication of a prediction accuracy of a prediction of the backward branch instruction. If the prediction accuracy does not satisfy a threshold value, the prediction of the backward branch instruction is suppressed until the dependency vector on which the predicate-generating instruction depends is available. | 10-03-2013 |
20130275724 | SYSTEMS, APPARATUSES, AND METHODS FOR GENERATING A DEPENDENCY VECTOR BASED ON TWO SOURCE WRITEMASK REGISTERS - Embodiments of systems, apparatuses, and methods of performing in a computer processor dependency index vector calculation in response to an instruction that includes a first and second source writemask register operands, a destination vector register operand, and an opcode are described. | 10-17-2013 |
20130311754 | FUSING CONDITIONAL WRITE INSTRUCTIONS HAVING OPPOSITE CONDITIONS IN INSTRUCTION PROCESSING CIRCUITS, AND RELATED PROCESSOR SYSTEMS, METHODS, AND COMPUTER-READABLE MEDIA - Fusing conditional write instructions having opposite conditions in instruction processing circuits and related processor systems, methods, and computer-readable media are disclosed. In one embodiment, a first conditional write instruction writing a first value to a target register based on evaluating a first condition is detected by an instruction processing circuit. The circuit also detects a second conditional write instruction writing a second value to the target register based on evaluating a second condition that is a logical opposite of the first condition. Either the first condition or the second condition is selected as a fused instruction condition, and corresponding values are selected as if-true and if-false values. A fused instruction is generated for selectively writing the if-true value to the target register if the fused instruction condition evaluates to true, and selectively writing the if-false value to the target register if the fused instruction condition evaluates to false. | 11-21-2013 |
20130326198 | LOAD-STORE DEPENDENCY PREDICTOR PC HASHING - Methods and processors for managing load-store dependencies in an out-of-order instruction pipeline. A load store dependency predictor includes a table for storing entries for load-store pairs that have been found to be dependent and execute out of order. Each entry in the table includes hashed values to identify load and store operations. When a load or store operation is detected, the PC and an architectural register number are used to create a hashed value that can be used to uniquely identify the operation. Then, the load store dependency predictor table is searched for any matching entries with the same hashed value. | 12-05-2013 |
20130339670 | REDUCING OPERAND STORE COMPARE PENALTIES - Embodiments relate to reducing operand store compare penalties by detecting potential unit of operation (UOP) dependencies. An aspect includes a computer system for reducing operation store compare penalties. The system includes memory and a processor. The system performs a method including cracking an instruction into units of operation, where each UOP includes instruction text and address determination fields. The method includes identifying a load UOP among the plurality of UOPs and comparing values of the address determination fields of the load UOP with values of address determination fields of one or more previously-decoded store UOPs. The method also includes forcing, prior to issuance of the instruction to an execution unit, a dependency between the load UOP and the one or more previously-decoded store UOPs based on the comparing. | 12-19-2013 |
20140019724 | COOPERATIVE THREAD ARRAY REDUCTION AND SCAN OPERATIONS - One embodiment of the present invention sets forth a technique for performing aggregation operations across multiple threads that execute independently. Aggregation is specified as part of a barrier synchronization or barrier arrival instruction, where in addition to performing the barrier synchronization or arrival, the instruction aggregates (using reduction or scan operations) values supplied by each thread. When a thread executes the barrier aggregation instruction the thread contributes to a scan or reduction result, and waits to execute any more instructions until after all of the threads have executed the barrier aggregation instruction. A reduction result is communicated to each thread after all of the threads have executed the barrier aggregation instruction and a scan result is communicated to each thread as the barrier aggregation instruction is executed by the thread. | 01-16-2014 |
20140047217 | SATISFIABILITY CHECKING - A satisfiability checking system may include a single instruction, multiple data (SIMD) machine configured to execute multiple threads in parallel. The multiple threads may be divided among multiple blocks. The SIMD machine may be further configured to perform satisfiability checking of a formula including multiple parts. The satisfiability checking may include assigning one or more of the parts to one or more threads of the multiple threads of a first block of the multiple blocks. The satisfiability checking may further include processing the assigned one or more parts in the first block such that first results are calculated based on a first proposition. The satisfiability checking may further include synchronizing the results among the one or more threads of the first block. | 02-13-2014 |
20140059327 | DETECTING CROSS-TALK ON PROCESSOR LINKS - A first of a plurality of data lanes of a first of a plurality of processor links is determined to have a weakest of base performance measurements for the plurality of data lanes. A switching data pattern is transmitted via a first set of the remainder processor links and a quiet data pattern is transmitted via a second set of the remainder processor links. If performance of the first data lane increases vis-à-vis the corresponding base performance measurement, the first set of remainder processor links is eliminated from the remainder processor links. If performance of the first data lanes decreases vis-à-vis the corresponding base performance measurement, the second set of remainder processor links is eliminated from the remainder processor links. The above operations are repeatedly executed until an aggressor processor link that is determined to decrease performance of the first of the plurality of data lanes is identified. | 02-27-2014 |
20140068230 | MICRO-ARCHITECTURE FOR ELIMINATING MOV OPERATIONS - A computer system and processor for elimination of move operations include circuits that obtain a computer instruction and bypass execution units in response to determining that the instruction includes a move operation that involves a transfer of data from a logical source register to a logical destination register. Instead of executing the move operation, the transfer of the data is performed by tracking changes in data dependencies of the source and the destination registers, and assigning a physical register associated with the source register to the destination register based on the dependencies. | 03-06-2014 |
20140075158 | IDENTIFYING LOAD-HIT-STORE CONFLICTS - A computing device identifies a load instruction and store instruction pair that causes a load-hit-store conflict. A processor tags a first load instruction that instructs the processor to load a first data set from memory. The processor stores an address at which the first load instruction is located in memory in a special purpose register. The processor determines where the first load instruction has a load-hit-store conflict with a first store instruction. If the processor determines the first load instruction has a load-hit store conflict with the first store instruction, the processor stores an address at which the first data set is located in memory in a second special purpose register, tags the first data set being stored by the first store instruction, stores an address at which the first store instruction is located in memory in a third special purpose register and increases a conflict counter. | 03-13-2014 |
20140095836 | CROSS-PIPE SERIALIZATION FOR MULTI-PIPELINE PROCESSOR - Embodiments relate to cross-pipe serialization for a multi-pipeline computer processor. An aspect includes receiving, by a processor, the processor comprising a first pipeline, the first pipeline comprising a serialization pipeline, and a second pipeline, the second pipeline comprising a non-serialization pipeline, a request comprising a first subrequest for the first pipeline and a second subrequest for the second pipeline. Another aspect includes completing the first subrequest by the first pipeline. Another aspect includes, based on completing the first subrequest by the first pipeline, sending cross-pipe unlock signal from the first pipeline to the second pipeline. Yet another aspect includes, based on receiving the cross-pipe unlock signal by the second pipeline, completing the second subrequest by the second pipeline. | 04-03-2014 |
20140108770 | IDENTIFYING LOAD-HIT-STORE CONFLICTS - A computing device identifies a load instruction and store instruction pair that causes a load-hit-store conflict. A processor tags a first load instruction that instructs the processor to load a first data set from memory. The processor stores an address at which the first load instruction is located in memory in a special purpose register. The processor determines where the first load instruction has a load-hit-store conflict with a first store instruction. If the processor determines the first load instruction has a load-hit store conflict with the first store instruction, the processor stores an address at which the first data set is located in memory in a second special purpose register, tags the first data set being stored by the first store instruction, stores an address at which the first store instruction is located in memory in a third special purpose register and increases a conflict counter. | 04-17-2014 |
20140181478 | DYNAMIC WRITE PORT RE-ARBITRATION - Within a processing pipeline | 06-26-2014 |
20140189314 | Real Time Instruction Trace Processors, Methods, and Systems - A method of an aspect includes generating real time instruction trace (RTIT) packets for a first logical processor of a processor. The RTIT packets indicate a flow of software executed by the first logical processor. The RTIT packets are stored in an RTIT queue corresponding to the first logical processor. The RTIT packets are transferred from the RTIT queue to memory predominantly with firmware of the processor. Other methods, apparatus, and systems are also disclosed. | 07-03-2014 |
20140189315 | Copy-On-Write Buffer For Restoring Program Code From A Speculative Region To A Non-Speculative Region - An apparatus is described having an out-of-order instruction execution pipeline. The out-of-order execution pipeline has a first circuit and a second circuit. The first circuit is to hold a pointer to physical storage space where information is kept that cannot yet be confirmed as being free of potential dependencies on the information. The second circuit is to hold the pointer if the pointer existed in the first circuit when a non speculative region of program code ended and upon retirement of a following speculative overwriter instruction originally coded to overwrite the information. | 07-03-2014 |
20140244977 | Deferred Saving of Registers in a Shared Register Pool for a Multithreaded Microprocessor - A method of sharing a plurality of registers in a register pool among a plurality of microprocessor threads begins by allocating a first set of registers in the register pool to a first thread, the first thread executing a first instruction using the first set of registers in the register pool. The first thread is descheduled without saving values stored in the first set of registers. A second thread is scheduled to execute a second instruction using registers allocated in the register pool. Finally, the first thread is rescheduled, the first thread reusing the allocated first set of registers. | 08-28-2014 |
20140258684 | System and Method to Increase Lockstep Core Availability - A system and method for increasing lockstep core availability provides for writing a state of a main CPU core to a state buffer, executing one or more instructions of a task by the main CPU core to generate a first output for each executed instruction, and executing the one or more instructions of the task by a checker CPU core to generate a second output for each executed instruction. The method further includes comparing the first output with the second output, and if the first output does not match the second output, generating one or more control signals, and based upon the generation of the one or more control signals, loading the state of the main CPU core from the state buffer to the main CPU core and the checker CPU core. | 09-11-2014 |
20140258685 | Using Reduced Instruction Set Cores - A processor may be built with cores that only execute some partial set of the instructions needed to be fully backwards compliant. Thus, in some embodiments power consumption may be reduced by providing partial cores that only execute certain instructions and not other instructions. The instructions not supported may be handled in other, more energy efficient ways, so that, the overall processor, including the partial core, may be fully backwards compliant. | 09-11-2014 |
20140258686 | INTEGRATED CIRCUIT, ELECTRONIC DEVICE AND INSTRUCTION SCHEDULING METHOD - An integrated circuit comprising a set of data processing units including a first data processing unit and at least one second data processing unit operable at variable frequencies is disclosed. The integrated circuit further includes an instruction scheduler adapted to evaluate data dependencies between individual instructions in a received plurality of instructions and assign the instructions to the first data processing unit and the at least one second data processing unit for parallel execution in accordance with said data dependencies. The integrated circuit is operable in a first power mode and a second power mode. The second power mode is a reduced power mode compared to the first power mode and is adapted to adjust the operating frequency of the first data processing unit and the at least one second data processing unit in the second power mode as a function of the evaluated data dependencies. | 09-11-2014 |
20140281404 | SYSTEM AND METHOD TO CLEAR AND REBUILD DEPENDENCIES - A data processing system and method of clearing and rebuilding dependencies, the data processing method including changing a counter associated with a first entry in response to selecting a second entry; comparing the counter with a threshold; and indicating that the first entry is ready to be selected in response to comparing the counter with the threshold; wherein the first entry is dependent on the second entry. | 09-18-2014 |
20140281405 | OPTIMIZING PERFORMANCE FOR CONTEXT-DEPENDENT INSTRUCTIONS - A processor includes a queue for storing instructions processed within the context of a current value of a register field, where for some embodiments the instruction is undefined or defined, depending upon the register field at time of processing. After a write instruction (an instruction that writes to the register field) executes, the queue is searched for any entries that contain instructions that depend upon the executed write instruction. Each such entry stores the value of the register field at the time the instruction in the entry was processed. If such an entry is found in the queue and its stored value of the register field does not match the value that the write instruction wrote to the register field, then the processor flushes the pipeline and restarts at a state so as to correctly execute the instruction. | 09-18-2014 |
20140281406 | Instruction For Performing An Overload Check - A processor is described having a functional unit within an instruction execution pipeline. The functional unit having circuitry to determine whether substantive data from a larger source data size will fit within a smaller data size that the substantive data is to flow to. | 09-18-2014 |
20140281407 | METHODS AND APPARATUS TO COMPILE INSTRUCTIONS FOR A VECTOR OF INSTRUCTION POINTERS PROCESSOR ARCHITECTURE - Methods, apparatus, systems, and articles of manufacture to compile instructions for a vector of instruction pointers (VIP) processor architecture are disclosed. An example method includes identifying a predicate dependency between a first compiled instruction and a second compiled instruction at a control flow join point, the second compiled instruction having different speculative assumptions corresponding to how the second compiled instruction will be executed based on an outcome of the first compiled instruction. A first strand is organized to execute a first instance of the second compiled instruction corresponding to a first one of the speculative assumptions, and a second strand to execute a second instance of the second compiled instruction corresponding to a second one of the speculative assumptions which is opposite to the first one of the speculative assumptions. The first instance of the second compiled instruction and the second instance of the second compiled instruction are executed in an asynchronous manner relative to each other and to the first compiled instruction. | 09-18-2014 |
20140281408 | METHOD AND APPARATUS FOR PREDICTING FORWARDING OF DATA FROM A STORE TO A LOAD - A method for gating a load operation based on entries of a prediction table is presented. The method comprises performing a look-up for the load operation in a prediction table to find a matching entry, wherein the matching entry corresponds to a prediction regarding a behavior of the load operation, and wherein the matching entry comprises: (a) a tag field operable to identify the matching entry; (b) a distance field operable to indicate a distance of the load operation to a prior aliasing store instruction; and (c) a confidence field operable to indicate a prediction strength generated by the prediction table. The method further comprises determining if the matching entry provides a valid prediction and, if valid, retrieving a location for the prior aliasing store instruction using the distance field. It finally comprises performing a gating operation on said load operation. | 09-18-2014 |
20140281409 | METHOD AND APPARATUS FOR NEAREST POTENTIAL STORE TAGGING - A method for performing memory disambiguation in an out-of-order microprocessor pipeline is disclosed. The method comprises storing a tag with a load operation, wherein the tag is an identification number representing a store instruction nearest to the load operation, wherein the store instruction is older with respect to the load operation and wherein the store has potential to result in a RAW violation in conjunction with the load operation. The method also comprises issuing the load operation from an instruction scheduling module. Further, the method comprises acquiring data for the load operation speculatively after the load operation has arrived at a load store queue module. Finally, the method comprises determining if an identification number associated with a last contiguous issued store with respect to the load operation is equal to or greater than the tag and gating a validation process for the load operation in response to the determination. | 09-18-2014 |
20140281410 | Method and Apparatus to Allow Early Dependency Resolution and Data Forwarding in a Microprocessor - A microprocessor implemented method for performing early dependency resolution and data forwarding is disclosed. The method comprises mapping a plurality of instructions in a guest address space into a corresponding plurality of instructions in a native address space. For each current guest branch instruction in the native address space fetched during execution, performing (a) determining a youngest prior guest branch target stored in a guest branch target register, wherein the guest branch register is operable to speculatively store a plurality of prior guest branch targets corresponding to prior guest branch instructions; (b) determining a current branch target for a respective current guest branch instruction by adding an offset value for the respective current guest branch instruction to the youngest prior guest branch target; and (c) creating an entry in the guest branch target register for the current branch target. | 09-18-2014 |
20140281411 | METHOD FOR DEPENDENCY BROADCASTING THROUGH A SOURCE ORGANIZED SOURCE VIEW DATA STRUCTURE - A method for dependency broadcasting through a source organized source view data structure. The method includes receiving an incoming instruction sequence using a global front end; grouping the instructions to form instruction blocks; using a plurality of register templates to track instruction destinations and instruction sources by populating the register template with block numbers corresponding to the instruction blocks, wherein the block numbers corresponding to the instruction blocks indicate interdependencies among the blocks of instructions; populating a source organized source view data structure, wherein the source view data structure stores sources corresponding to the instruction blocks as recorded by the plurality of register templates; upon dispatch of one block of the instruction blocks, broadcasting a number belonging to the one block to a row of the source view data structure that relates that block and marking the sources of the row accordingly; and updating the dependency information of remaining instruction blocks in accordance with the broadcast. | 09-18-2014 |
20140281412 | METHOD FOR POPULATING AND INSTRUCTION VIEW DATA STRUCTURE BY USING REGISTER TEMPLATE SNAPSHOTS - A method for populating an instruction view data structure by using register template snapshots. The method includes receiving an incoming instruction sequence using a global front end; grouping the instructions to form instruction blocks; using a plurality of register templates to track instruction destinations and instruction sources by populating the register template with block numbers corresponding to the instruction blocks, wherein the block numbers corresponding to the instruction blocks indicate interdependencies among the blocks of instructions; populating and instruction view data structure, wherein the instruction view data structure stores instructions corresponding to the instruction blocks as recorded by the plurality of register templates; and using the instruction view data structure to feed a plurality of stacked execution units of execution stage in accordance with the readiness of instruction sources of the instruction blocks. | 09-18-2014 |
20140317385 | TECHNIQUES FOR DETERMINING INSTRUCTION DEPENDENCIES - One embodiment sets forth a method for efficiently determining memory resource dependencies between instructions included in a software application. For each instruction, a dependency analyzer uses overlapping search techniques to identify one or more overlaps between the memory elements included in the current instruction and the memory elements included in previous instructions. The dependency analyzer then maps objects included in the instructions to a set of partition elements wherein each partition element represents a set of memory elements that are functionally equivalent for dependency analysis. Subsequently, the dependency analyzer uses the set of partition elements to determine memory dependencies between the instructions at the memory element level. Advantageously, the disclosed techniques enable the compiler to retain an acceptable compilation speed while tuning the instruction ordering at a fine-grained memory element level, thereby increasing the speed at which the processor may execute the software application. | 10-23-2014 |
20140317386 | TECHNIQUES FOR DETERMINING INSTRUCTION DEPENDENCIES - One embodiment sets forth a method for efficiently determining memory resource dependencies between instructions included in a software application. For each instruction, a dependency analyzer uses overlapping search techniques to identify one or more overlaps between the memory elements included in the current instruction and the memory elements included in previous instructions. The dependency analyzer then maps objects included in the instructions to a set of partition elements wherein each partition element represents a set of memory elements that are functionally equivalent for dependency analysis. Subsequently, the dependency analyzer uses the set of partition elements to determine memory dependencies between the instructions at the memory element level. Advantageously, the disclosed techniques enable the compiler to retain an acceptable compilation speed while tuning the instruction ordering at a fine-grained memory element level, thereby increasing the speed at which the processor may execute the software application. | 10-23-2014 |
20140380022 | STACK ACCESS TRACKING USING DEDICATED TABLE - A processor employs a prediction table at a front end of its instruction pipeline, whereby the prediction table stores address register and offset information for store instructions; and stack offset information for stack access instructions. The stack offset information for a corresponding instruction indicates the location of the data accessed by the instruction at the processor stack relative to a base location. The processor uses pattern matching to identify predicted dependencies between load/store instructions and predicted dependencies between stack access instructions. A scheduler unit of the instruction pipeline uses the predicted dependencies to perform store-to-load forwarding or other operations that increase efficiency and reduce power consumption at the processing system. | 12-25-2014 |
20140380023 | DEPENDENCE-BASED REPLAY SUPPRESSION - A method includes selecting for execution in a processor a load instruction having at least one dependent instruction. Responsive to selecting the load instruction, the at least one dependent instruction is selectively awakened based on a status of a store instruction associated with the load instruction to indicate that the at least one dependent instruction is eligible for execution. A processor includes an instruction pipeline having an execution unit to execute instructions, a scheduler, and a controller. The scheduler selects for execution in the execution unit a load instruction having at least one dependent instruction. The controller, responsive to the scheduler selecting the load instruction, selectively awakens the at least one dependent instruction based on a status of a store instruction associated with the load instruction to indicate that the at least one dependent instruction is eligible for execution by the execution unit. | 12-25-2014 |
20150012729 | Method and system of compiling program code into predicated instructions for excution on a processor without a program counter - A predicated instruction compilation system includes a control flow graph generation module to generate a control flow graph of a program code to be compiled into the predicated instructions to be executed on a processor that does not include any program counter. Each of the instructions includes a predicate guard and a predicate update. The compilation system also includes a control flow transformation module to automatically generate the predicate guard and an update to the predicate state on the processor. A computer-implemented method of compiling a program code into predicated instructions is also described. | 01-08-2015 |
20150026437 | METHOD AND APPARATUS FOR DIFFERENTIAL CHECKPOINTING - A processor core stores information that maps a physical register to an architectural register in response to an instruction modifying the architectural register. The processor recovers a checkpointed state of a set of architectural registers prior to modification of the architectural register by the instruction by modifying a reference mapping of physical registers to the set of architectural registers using the stored information. | 01-22-2015 |
20150039861 | ALLOCATION OF ALIAS REGISTERS IN A PIPELINED SCHEDULE - In an embodiment, a system includes a processor including one or more cores and a plurality of alias registers to store memory range information associated with a plurality of operations of a loop. The memory range information references one or more memory locations within a memory. The system also includes register assignment means for assigning each of the alias registers to a corresponding operation of the loop, where the assignments are made according to a rotation schedule, and one of the alias registers is assigned to a first operation in a first iteration of the loop and to a second operation in a subsequent iteration of the loop. The system also includes the memory coupled to the processor. Other embodiments are described and claimed. | 02-05-2015 |
20150046683 | METHOD FOR USING REGISTER TEMPLATES TO TRACK INTERDEPENDENCIES AMONG BLOCKS OF INSTRUCTIONS - A method for executing instructions using register templates to track interdependencies among blocks of instructions. The method includes receiving an incoming instruction sequence using a global front end; grouping the instructions to form instruction blocks; and using a register template to track instruction destinations and instruction sources by populating the register template with block numbers corresponding to the instruction blocks, wherein the block numbers corresponding to the instruction blocks indicate interdependencies among the blocks of instructions. | 02-12-2015 |
20150058601 | VERIFYING FORWARDING PATHS IN PIPELINES - A tool for formally verifying forwarding paths in an information pipeline. The tool creates two logic design copies of the pipeline to be verified. The tool retrieves a first and a second instruction, which have previously been proven to compute a mathematically correct result when executed separately. The tool defines driver input functions for issuing instructions to the two logic design copies. In accordance with the driver input functions, the tool issues instructions to the two logic design copies. The tool abstracts data flow of the two logic design copies to isolate forwarding paths for verification. The tool adjusts for latency differences between the first and second logic design copies. The tool checks a register for results, and when results from of two logic design copies become available in the register, the tool verifies the results to conclusively prove the correctness of all states of the information pipeline. | 02-26-2015 |
20150095619 | REQUEST CHANGE TRACKER - An example request change tracker may be used to create, modify, monitor, and report events occurring within a development and testing pipeline with respect to one or more computing applications. A request change tracker may include a pipeline event detector, a testing stage detector, a control module, and a reporting module. The pipeline event detector detects a pipeline event indicative of a status of a code module with respect to a pipeline. The testing stage detector determines the associated testing stage in the pipeline, based on the pipeline event. The control module initiates actions with respect to the pipeline, based on the determined testing stage. The reporting module updates a reporting log with information related to the state, progress and results of a testing stage in the pipeline. | 04-02-2015 |
20150100764 | DYNAMICALLY DETECTING UNIFORMITY AND ELIMINATING REDUNDANT COMPUTATIONS TO REDUCE POWER CONSUMPTION - One embodiment of the present invention includes techniques to decrease power consumption by reducing the number of redundant operations performed. In operation, a streamlining multiprocessor (SM) identifies uniform groups of threads that, when executed, apply the same deterministic operation to uniform sets of input operands. Within each uniform group of threads, the SM designates one thread as the anchor thread. The SM disables execution units assigned to all of the threads except the anchor thread. The anchor execution unit, assigned to the anchor thread, executes the operation on the uniform set of input operands. Subsequently, the SM sets the outputs of the non-anchor threads included in the uniform group of threads to equal the value of the anchor execution unit output. Advantageously, by exploiting the uniformity of data to reduce the number of execution units that execute, the SM dramatically reduces the power consumption compared to conventional SMs. | 04-09-2015 |
20150149745 | PARALLELIZATION WITH CONTROLLED DATA SHARING - In one general aspect, a method can include executing multiple functions calls in parallel, and receiving, by each function call, at least one data object passed as a parameter to the function call, the at least one data object associated with a data sharing mode that defines a restricted subset of allowed operations for performing by the function call on the at least one data object. The method can further include performing, by each function call, at least one operation on the at least one data object, the at least one operation included in the restricted subset of allowed operations, and performing at least one check to ensure that the at least one operation performed by the function call on the at least one data object received by the function call is included in the restricted subset of allowed operations. | 05-28-2015 |
20150309847 | TESTING OPERATION OF MULTI-THREADED PROCESSOR HAVING SHARED RESOURCES - A method of testing simultaneous multi-threaded operation of a shared execution resource in a processor includes running test patterns including irritator threads and non-irritator threads that try to simultaneously use the shared execution resource. Synchronizing the starts of the access of the irritator threads and the non-irritator threads to the shared execution resource includes the initial instructions of the irritator thread disabling execution of the irritator thread using a thread management register, and the initial instructions of the non-irritator thread enabling the irritator thread using the thread management register and starting execution of the non-irritator thread. Ending access to the shared execution resource includes the irritator thread communicating to the non-irritator thread an address of an end of the irritator thread loop, and the non-irritator thread moving the irritator thread out of the loop using thread restart. | 10-29-2015 |
20150355907 | RELIABLE TRANSACTIONAL EXECUTION USING DIGESTS - Performing a transaction in a transactional memory environment for performing transactional executions, the transactional memory environment including a digest-generating transaction to generate a computed digest and a digest-checking transaction to compare computed digests is provided. Included is identifying, by a computer system, a first indicator signaling a beginning instruction of a digest-generating transaction including a plurality of instructions; suppressing committing memory store data of the digest-generating transaction to memory; generating a computed digest based on the execution of at least one of the plurality of instructions; identifying a second indicator associated with the plurality of instructions signaling an ending instruction of the digest-generating transaction, the computed digest is replicable for an error-free execution of the plurality of instructions; and saving the computed digest, as a reliability digest, based on completing the digest-generating transaction and not save the first computed digest based on an abort of the digest-generating transaction. | 12-10-2015 |
20160004537 | COMMITTING HARDWARE TRANSACTIONS THAT ARE ABOUT TO RUN OUT OF RESOURCE - A transactional memory system determines whether a hardware transaction can be salvaged. A processor of the transactional memory system begins execution of a transaction in a transactional memory environment. Based on detection that an amount of available resource for transactional execution is below a predetermined threshold level, the processor determines whether the transaction can be salvaged. Based on determining that the transaction can not be salvaged, the processor aborts the transaction. Based on determining the transaction can be salvaged, the processor performs a salvage operation, wherein the salvage operation comprises one or more of: determining that the transaction can be brought to a stable state without exceeding the amount of available resource for transactional execution, and bringing the transaction to a stable state; and determining that a resource can be made available, and making the resource available. | 01-07-2016 |
20160092232 | PROPAGATING CONSTANT VALUES USING A COMPUTED CONSTANTS TABLE, AND RELATED APPARATUSES AND METHODS - Propagating constant values using a computed constants table, and related apparatuses and methods are disclosed. In one aspect, an apparatus comprises an instruction processing circuit configured to provide a computed constants table containing one or more entries. Each entry of the computed constants table comprises an attribute and a computed constant value. The instruction processing circuit is configured to detect a deterministic instruction in an instruction stream. Upon detecting the deterministic instruction, the instruction processing circuit determines whether an attribute of the deterministic instruction matches an entry of the computed constants table. If so, the instruction processing circuit provides the computed constant value stored in the entry to at least one dependent instruction. In this manner, a computed constant value may be propagated between instructions without requiring the deterministic instruction to be re-executed. | 03-31-2016 |
20160092236 | MECHANISM FOR ALLOWING SPECULATIVE EXECUTION OF LOADS BEYOND A WAIT FOR EVENT INSTRUCTION - A processor includes a mechanism that checks for and flushes only speculative loads and any respective dependent instructions that are younger than an executed wait for event (WEV) instruction, and which also match an address of a store instruction that has been determined to have been executed by a different processor prior to execution of the paired SEV instruction by the different processor. The mechanism may allow speculative loads that do not match the address of any store instruction that has been determined to have been executed by a different processor prior to execution of the paired SEV instruction by the different processor. | 03-31-2016 |
20160117171 | REAL TIME INSTRUCTION TRACE PROCESSORS, METHODS, AND SYSTEMS - A method of an aspect includes generating real time instruction trace (RTIT) packets for a first logical processor of a processor. The RTIT packets indicate a flow of software executed by the first logical processor. The RTIT packets are stored in an RTIT queue corresponding to the first logical processor. The RTIT packets are transferred from the RTIT queue to memory predominantly with firmware of the processor. Other methods, apparatus, and systems are also disclosed. | 04-28-2016 |
20160139925 | TECHNIQUES FOR IDENTIFYING INSTRUCTIONS FOR DECODE-TIME INSTRUCTION OPTIMIZATION GROUPING IN VIEW OF CACHE BOUNDARIES - A technique for processing instructions includes examining instructions in an instruction stream of a processor to determine properties of the instructions. The properties indicate whether the instructions may belong in an instruction sequence subject to decode-time instruction optimization (DTIO). Whether the properties of multiple ones of the instructions are compatible for inclusion within an instruction sequence of a same group is determined. The instructions with compatible ones of the properties are grouped into a first instruction group. The instructions of the first instruction group are decoded subsequent to formation of the first instruction group. Whether the first instruction group actually includes a DTIO sequence is verified based on the decoding. Based on the verifying, DTIO is performed on the instructions of the first instruction group or is not performed on the instructions of the first instruction group. | 05-19-2016 |
20160139926 | INSTRUCTION GROUP FORMATION TECHNIQUES FOR DECODE-TIME INSTRUCTION OPTIMIZATION BASED ON FEEDBACK - A technique of processing instructions for execution by a processor includes determining whether a first property of a first instruction and a second property of a second instruction are compatible. The first instruction and the second instruction are grouped in an instruction group in response to the first and second properties being compatible and a feedback value generated by a feedback function indicating the instruction group has been historically beneficial with respect to a benefit metric of the processor. Group formation for the first and second instructions is performed according to another criteria, in response to the first and second properties being incompatible or the feedback value indicating the grouping of the first and second instructions has not been historically beneficial. | 05-19-2016 |
20160147534 | METHOD FOR MIGRATING CPU STATE FROM AN INOPERABLE CORE TO A SPARE CORE - An apparatus is disclosed in which the apparatus may include a plurality of cores, including a first core, a second core and a third core, and circuitry coupled to the first core. The first core may be configured to process a plurality of instructions. The circuitry may be may be configured to detect that the first core stopped committing a subset of the plurality of instructions, and to send an indication to the second core that the first core stopped committing the subset. The second core may be configured to disable the first core from further processing instructions of the subset responsive to receiving the indication, and to copy data from the first core to a third core responsive to disabling the first core. The third core may be configured to resume processing the subset dependent upon the data. | 05-26-2016 |
20160162293 | ASYMMETRIC PROCESSOR WITH CORES THAT SUPPORT DIFFERENT ISA INSTRUCTION SUBSETS - An asymmetric multi-core processor uses at least two asymmetric cores to collectively support the instructions of an instruction set architecture (ISA). A general-feature core and a special feature core that support different instruction subsets of the ISA. A switch manager detects whether a thread includes an instruction that is not supported by the currently-executing core and, after detecting such an instruction, switches the thread to the other core. | 06-09-2016 |
20160179552 | INSTRUCTION AND LOGIC FOR A MATRIX SCHEDULER | 06-23-2016 |
20160253181 | History Buffer for Multiple-Field Registers | 09-01-2016 |