Entries |
Document | Title | Date |
20080209176 | Time stamping transactions to validate atomic operations in multiprocessor systems - A multi-core microprocessor has a plurality of processor cores which are coupled to a bridge element. The bridge element sends transactions to and/or receives transactions from the processor cores, where each transaction has one or more packets. The transactions include atomic transactions. The bridge element comprises a buffer unit storing a time stamp for each packet sent or received. Furthermore, a multi-core multi-node processor system is provided that has debug hardware to capture and time stamp intra-node and/or inter-node transaction packets. Atomic operations are, for example, atomic read-modify-write instructions. | 08-28-2008 |
20080209177 | Mechanism in a Multi-Threaded Microprocessor to Maintain Best Case Demand Instruction Redispatch - A method and system for maintaining a best-case demand redispatch of an instruction to allow for maximizing the time a rejected thread may execute in lookahead execution mode, while maintaining the smallest L1 cache miss penalty supported by the memory subsystem. In response to a demand miss, a load/store unit sends a fetch request to the next level cache. The cache line of the demand miss is examined to identify the critical sector. Once the critical sector is identified, a best-case data return time is determined based on the fastest time the next level cache is able to return the critical sector of the cache line. The load/store unit then sends a speculative warning to the dispatch unit to coincide with the best-case data return, wherein the speculative warning prepares the dispatch unit to resend the instruction for execution as soon as data is available to the processor core. | 08-28-2008 |
20080209178 | Method and Apparatus for Back to Back Issue of Dependent Instructions in an Out of Order Issue Queue - A method is provided for evaluating two or more instructions in an out of order issue queue during a particular cycle of the queue, to select an instruction for issue during the next following cycle. If an instruction was previously designated to issue during the particular cycle, one or more instructions in the queue are evaluated to determine if any of them are dependent on the designated instruction. For the evaluation, each instruction placed into the queue is accompanied by corresponding logic elements that provide destination to source compares for the instruction. In an embodiment comprising a method, the oldest ready instruction in the queue during a particular cycle is identified. When an instruction was previously designated to issue during the particular cycle, it is determined whether at least a first instruction in the queue complies with each condition in a set of conditions, the set including at least the conditions that the first instruction has a dependency on the designated instruction, and that the first instruction is older than the oldest ready instruction. The first instruction is selected for issue during the next following cycle only if the first instruction complies with each condition in the set. | 08-28-2008 |
20080215854 | System and Method for Adaptive Run-Time Reconfiguration for a Reconfigurable Instruction Set Co-Processor Architecture - A method for adaptive runtime reconfiguration of a co-processor instruction set, in a computer system with at least a main processor communicatively connected to at least one reconfigurable co-processor, includes the steps of configuring the co-processor to implement an instruction set comprising one or more co-processor instructions, issuing a co-processor instruction to the co-processor, and determining whether the instruction is implemented in the co-processor. For an instruction not implemented in the co-processor instruction set, raising a stall signal to delay the main processor, determining whether there is enough space in the co-processor for the non-implemented instruction, and if there is enough space for said instruction, reconfiguring the instruction set of the co-processor by adding the non-implemented instruction to the co-processor instruction set. The stall signal is cleared and the instruction is executed. | 09-04-2008 |
20080229076 | MACROSCALAR PROCESSOR ARCHITECTURE - A macroscalar processor architecture is described herein. In one embodiment, an exemplary processor includes one or more execution units to execute instructions and one or more iteration units coupled to the execution units. The one or more iteration units receive one or more primary instructions of a program loop that comprise a machine executable program. For each of the primary instructions received, at least one of the iteration units generates multiple secondary instructions that correspond to multiple loop iterations of the task of the respective primary instruction when executed by the one or more execution units. Other methods and apparatuses are also described. | 09-18-2008 |
20080282067 | Issue policy control within a multi-threaded in-order superscalar processor - A multi-threaded in-order superscalar processor | 11-13-2008 |
20080301411 | INVERTING DATA ON RESULT BUS TO PREPARE FOR INSTRUCTION IN THE NEXT CYCLE FOR HIGH FREQUENCY EXECUTION UNITS - A method of operating an arithmetic logic unit (ALU) by inverting a result of an operation to be executed during a current cycle in response to control signals from instruction decode logic which indicate that a later operation will require a complement of the result, wherein the result is inverted during the current cycle. The later operation may be a subtraction operation that immediately follows the first operation. The later instruction is decoded prior to the current cycle to control the inversion in the ALU. The ALU includes an adder, a rotator, and a data manipulation unit which invert the result during the current cycle in response to an invert control signal. The second operation subtracts the result during a subsequent cycle in which a carry control signal to the adder is enabled, and the rotator and the data manipulation unit are disabled. The ALU may be used in an execution unit of a microprocessor, such as a fixed-point unit. | 12-04-2008 |
20080313432 | BLOCK ALLOCATION TIMES IN A COMPUTER SYSTEM - A method and apparatus improves the block allocation time in a parallel computer system. A pre-load controller pre-loads blocks of hardware in a supercomputer cluster in anticipation of demand from a user application. In the preferred embodiments the pre-load controller determines when to pre-load the compute nodes and the block size to allocate the nodes based on pre-set parameters and previous use of the computer system. Further, in preferred embodiments each block of compute nodes in the parallel computer system has a stored hardware status to indicate whether the block is being pre-loaded, or already has been pre-loaded. In preferred embodiments, the hardware status is stored in a database connected to the computer's control system. In other embodiments, the compute nodes are remote computers in a distributed computer system. | 12-18-2008 |
20080320282 | Method And Systems For Providing Transaction Support For Executable Program Components - Methods and systems are described for providing transaction support for executable program components. In one embodiment, transaction information is associated with an instruction included in an executable addressable entity included in an executable program component generated from source code written in a programming language, wherein the transaction information is independent of the source code and the programming language. Further, an access to the instruction is detected for executing by a processor. A transaction operation to perform in association with the executing of the instruction is determined based on the transaction information associated with the instruction. The transaction operation is performed in association with the executing of the instruction, wherein the transaction operation is performed by a program component other than the executable program component including the executable addressable entity. | 12-25-2008 |
20090024837 | System and Method for Language Specification - Described are a system and method for language specification. The device may include (a) a processor running an operating system and (b) an image capturing device scanning an image. The operating system is configured to display a user interface. The image includes data that is transmitted to the operating system to execute a language installation protocol. The protocol indicates a display language for the user interface. | 01-22-2009 |
20090024838 | MECHANISM FOR SUPPRESSING INSTRUCTION REPLAY IN A PROCESSOR - A mechanism for suppressing instruction replay includes a processor having one or more execution units and a scheduler that issue instruction operations for execution by the one or more execution units. The scheduler may also cause instruction operations that are determined to be incorrectly executed to be replayed, or reissued. In addition, a prediction unit within the processor may predict whether a given instruction operation will replay and to provide an indication that the given instruction operation will replay. The processor also includes a decode unit that may decode instructions and in response to detecting the indication, may flag the given instruction operation. The scheduler may further inhibit issue of the flagged instruction operation until a status associated with the flagged instruction is good. | 01-22-2009 |
20090037697 | SYSTEM AND METHOD OF LOAD-STORE FORWARDING - A system and method for data forwarding from a store instruction to a load instruction during out-of-order execution, when the load instruction address matches against multiple older uncommitted store addresses or if the forwarding fails during the first pass due to any other reason. In a first pass, the youngest store instruction in program order of all store instructions older than a load instruction is found and an indication to the store buffer entry holding information of the youngest store instruction is recorded. In a second pass, the recorded indication is used to index the store buffer and the store bypass data is forwarded to the load instruction. Simultaneously, it is verified if no new store, younger than the previously identified store and older than the load has not been issued due to out-of-order execution. | 02-05-2009 |
20090070558 | MULTIPLEXING PER-PROBEPOINT INSTRUCTION SLOTS FOR OUT-OF-LINE EXECUTION - The present invention provides a probe system and method for multithreaded user-space programs. The system includes an instrumentation module that enables single stepping out of line processing for multithreaded programs, an establish probepoint module that divides up an area of the probed program's memory into a plurality of instruction slots, an ensure slot assigned module that ensures that an instruction slot is assigned to a probepoint, a slot acquisition module that acquires the instruction slot for the probepoint, stealing a slot from another probepoint as needed, and a free slot module that relinquishes the instruction slot owned by the probepoint when the probepoint is being unregistered. | 03-12-2009 |
20090119490 | PROCESSOR AND INSTRUCTION SCHEDULING METHOD - An instruction scheduling method and a processor using an instruction scheduling method are provided. The instruction scheduling method includes selecting a first instruction that has a highest priority from a plurality of instructions, and allocating the selected first instruction and a first time slot to one of the functional units, allocating a second instruction and a second time slot to one of the functional units, wherein the second instruction is dependent on the first instruction. | 05-07-2009 |
20090144526 | SYSTEM AND METHOD OF ACCESSING A DEVICE - A method of accessing a device is provided. A command is received from an agent, over a network, for executing at least one instruction for accessing the device. Information is sent to the agent, over the network, regarding the execution of the at least one instruction. | 06-04-2009 |
20090177867 | Processor architectures for enhanced computational capability - A digital signal processor includes a control block configured to issue instructions based on a stored program, and a compute array including two or more compute engines configured such that each of the issued instructions executes in successive compute engines of at least a subset of the compute engines at successive times. The digital signal processor may be utilized with a control processor or as a stand-alone processor. The compute array may be configured such that each of the issued instructions flows through successive compute engines of at least a subset of the compute engines at successive times. | 07-09-2009 |
20090182986 | Processing Unit Incorporating Issue Rate-Based Predictive Thermal Management - A circuit arrangement and method utilize an issue rate-based predictive thermal management technique in a microprocessor or other integrated circuit that tracks the rate in which instructions are issued to one or more execution units in the processing unit, and selectively delays the issuance of subsequent instructions to the execution unit(s) based upon the tracked issue rate to predictively control the thermal output of the integrated circuit. | 07-16-2009 |
20090210664 | System and Method for Issue Schema for a Cascaded Pipeline - The present invention provides system and method for a group priority issue schema for a cascaded pipeline. The system includes a cascaded delayed execution pipeline unit having four or more execution pipelines that execute instructions in a common issue group in a delayed manner relative to each other. The system further includes circuitry configured to: (1) receiving an issue group of instructions; (2) scheduling the instructions in program order received; and (3) executing the issue group of instructions in the cascaded delayed execution pipeline unit. The present invention can also be viewed as providing methods for providing a group priority issue schema for a cascaded pipeline. The method includes: (1) receiving an issue group of instructions; (2) scheduling the instructions in the program order received; and (3) executing the issue group of instructions in the cascaded delayed execution pipeline unit. | 08-20-2009 |
20090217006 | Heuristic backtracer - A heuristic backtracer is described. In one embodiment, a scanner scans a stack of an application for a pointer to a word of a machine code of the application. A preceding byte locator identifies one or more bytes immediately preceding the pointed-to machine code. A parser parses the one or more bytes immediately preceding the pointed-to machine code of the machine code for a call instruction. A return address identifier determines the pointed-to as a return address when the one or more bytes constitute the call instruction. | 08-27-2009 |
20090240919 | PROCESSOR AND METHOD FOR SYNCHRONOUS LOAD MULTIPLE FETCHING SEQUENCE AND PIPELINE STAGE RESULT TRACKING TO FACILITATE EARLY ADDRESS GENERATION INTERLOCK BYPASS - A pipelined processor including an architecture for address generation interlocking, the processor including: an instruction grouping unit to detect a read-after-write dependency and to resolve instruction interdependency; an instruction dispatch unit (IDU) including address generation interlock (AGI) and operand fetching logic for dispatching an instruction to at least one of a load store unit and an execution unit; wherein the load store unit is configured with access to a data cache and to return fetched data to the execution unit; wherein the execution unit is configured to write data into a general purpose register bank; and wherein the architecture provides support for bypassing of results of a load multiple instruction for address generation while such instruction is executing in the execution unit before the general purpose register bank is written. A method and a computer system are also provided. | 09-24-2009 |
20090249034 | PROCESSOR AND SIGNATURE GENERATION METHOD, AND MULTIPLE SYSTEM AND MULTIPLE EXECUTION VERIFICATION METHOD - A processor performs instruction execution regardless of a program order. An execution unit executes an instruction, and transmits end information of the instruction whose execution has ended. A retire unit receives the end information, rearranges a result of the instruction whose execution has ended in a program order to determine the instruction execution, and transmits completed instruction information which reports that the instruction execution has been determined. A signature generation unit receives the completed instruction information from the retire unit, and generates a signature using the completed instruction information. | 10-01-2009 |
20090271592 | Apparatus For Storing Instructions In A Multithreading Microprocessor - A circuit for selecting one of N requesters in a round-robin fashion is disclosed. The circuit 1-bit left rotatively increments a first addend by a second addend to generate a sum that is ANDed with the inverse of the first addend to generate a 1-hot vector indicating which of the requestors is selected next. The first addend is an N-bit vector where each bit is false if the corresponding requester is requesting access to a shared resource. The second addend is a 1-hot vector indicating the last selected requestor. A multithreading microprocessor dispatch scheduler employs the circuit for N concurrent threads each thread having one of P priorities. The dispatch scheduler generates P N-bit 1-hot round-robin bit vectors, and each thread's priority is used to select the appropriate round-robin bit from P vectors for combination with the thread's priority and an issuable bit to create a dispatch level used to select a thread for instruction dispatching. | 10-29-2009 |
20090282221 | Preferential Dispatching Of Computer Program Instructions - A computer processor that includes a plurality of execution pipelines, each execution pipeline including a configuration of one or more execution units of the processor, each execution pipeline characterized by an execution pipeline type, each execution pipeline type determined according to the types of computer program instructions executed in each execution pipeline; a plurality of hardware threads of execution, each hardware thread including computer program instructions characterized by instruction types, each hardware thread including sequences of instructions of a same instruction type, the sequences interspersed with instructions of other types; and an instruction dispatcher capable of dispatching instructions preferentially during a predefined preference period from a preferred hardware thread to a particular execution pipeline in dependence upon whether the preferred hardware thread presents a sequence of instructions, ready for execution from the preferred hardware thread, of a type for execution in the particular execution pipeline. | 11-12-2009 |
20090327660 | MEMORY THROUGHPUT INCREASE VIA FINE GRANULARITY OF PRECHARGE MANAGEMENT - Methods and apparatus to improve throughput in memory devices are described. In one embodiment, memory throughput is increased via fine granularity of precharge management. In an embodiment, three separate precharge timings may be used, e.g., optimized per memory bank, per memory bank group, and/or per a memory device. Other embodiments are also disclosed and claimed. | 12-31-2009 |
20100064120 | Replay Reduction for Power Saving - In one embodiment, a processor comprises a scheduler configured to issue a first instruction operation to be executed and an execution core coupled to the scheduler. Configured to execute the first instruction operation, the execution core comprises a plurality of replay sources configured to cause a replay of the first instruction operation responsive to detecting at least one of a plurality of replay cases. The scheduler is configured to inhibit issuance of the first instruction operation subsequent to the replay for a subset of the plurality of replay cases. The scheduler is coupled to receive an acknowledgement indication corresponding to each of the plurality of replay cases in the subset, and is configured to inhibit issuance of the first instruction operation until the acknowledgement indication is asserted that corresponds to an identified replay case of the subset. | 03-11-2010 |
20100077181 | System and Method for Issuing Load-Dependent Instructions in an Issue Queue in a Processing Unit of a Data Processing System - A system and method for issuing load-dependent instructions in an issue queue in a processing unit. A load miss queue is provided. The load miss queue comprises a physical address field, an issue queue position field, a valid identifier field, a source identifier field, and a data type field. A load instruction that misses a first level cache is dispatched, and both the physical address field and the data type field are set. A load-dependent instruction is identified. In response to indentifying the load-dependent instruction, each of the issue queue position field, valid identifier field, and source identifier field are set. If the issue queue position field refers to a flushed instruction, the valid identifier field is cleared. The load instruction is recycled, and a value of the valid identifier field is determined. The load-dependent instruction is then selected for issue on a next processing cycle independent of an age of the load-dependent instruction. | 03-25-2010 |
20100100712 | Multi-Execution Unit Processing Unit with Instruction Blocking Sequencer Logic - A processing unit includes multiple execution units and sequencer logic that is disposed downstream of instruction buffer logic, and that is responsive to a sequencer instruction present in an instruction stream. In response to such an instruction, the sequencer logic issues a plurality of instructions associated with a long latency operation to one execution unit, while blocking instructions from the instruction buffer logic from being issued to that execution unit. In addition, the blocking of instructions from being issued to the execution unit does not affect the issuance of instructions to any other execution unit, and as such, other instructions from the instruction buffer logic are still capable of being issued to and executed by other execution units even while the sequencer logic is issuing the plurality of instructions associated with the long latency operation. | 04-22-2010 |
20100228955 | METHOD AND APPARATUS FOR IMPROVED POWER MANAGEMENT OF MICROPROCESSORS BY INSTRUCTION GROUPING - A method of power gating a microprocessor having an instruction scheduling unit for receiving issued instructions from an instruction decode unit; an execution unit coupled to receive and send signals from and to the instruction scheduling unit; and a state machine located within the execution unit, the method comprises: obtaining a number of instructions per cycle being issued to the instruction scheduling unit; determining, subsequent to obtaining the number of instructions per cycle, if the number of instruction per cycle being issued to the instruction scheduling unit is less than a threshold level, and then determining if at least two of the instructions being issued to the instruction scheduling unit are independent of each other only when the instructions per cycle is less than the threshold level; determining when at least two of the instructions being issued to the instruction scheduling unit are independent of each other; and power gating the microprocessor to gate off power to idle macros with a signal from the state machine when the instructions are independent of each other without incurring significant loss of performance until an issue queue in the instruction scheduling unit is filled with instruction data. | 09-09-2010 |
20100250901 | Selecting Fixed-Point Instructions to Issue on Load-Store Unit - Issue logic identifies a simple fixed point instruction, included in a unified payload, which is ready to issue. The simple fixed point instruction is a type of instruction that is executable by both a fixed point execution unit and a load-store execution unit. In turn, the issue logic determines that the unified payload does not include a load-store instruction that is ready to issue. As a result, the issue logic issues the simple fixed point instruction to the load-store execution unit in response to determining that the simple fixed point instruction is ready to issue and determining that the unified payload does not include a load-store instruction that is ready to issue. | 09-30-2010 |
20100262808 | MANAGING INSTRUCTIONS FOR MORE EFFICIENT LOAD/STORE UNIT USAGE - The illustrative embodiments described herein provide a computer-implemented method, apparatus, and a system for managing instructions. A load/store unit receives a first instruction at a port. The load/store unit rejects the first instruction in response to determining that the first instruction has a first reject condition. Then, the instruction sequencing unit activates a first bit in response to the load/store unit rejection the first instruction. The instruction sequencing unit blocks the first instruction from reissue while the first bit is activated. The processor unit determines a class of rejection of the first instruction. The instruction sequencing unit starts a timer. The length of the timer is based on the class of rejection of the first instruction. The instruction sequencing unit resets the first bit in response to the timer expiring. The instruction sequencing unit allows the first instruction to become eligible for reissue in response to resetting the first bit. | 10-14-2010 |
20100299504 | MICROPROCESSOR WITH MICROINSTRUCTION-SPECIFIABLE NON-ARCHITECTURAL CONDITION CODE FLAG REGISTER - A microprocessor includes an architectural register and a non-architectural register, each having a plurality of condition code flags. A first instruction of the microarchitectural instruction set of the microprocessor instructs the microprocessor to update the plurality of condition code flags based on a result of the first instruction. The first instruction includes a field for indicating whether to update the plurality of condition code flags of the architectural or non-architectural register. A second instruction of the microarchitectural instruction set instructs the microprocessor to conditionally perform an operation based on one of the plurality of condition code flags. The second instruction includes a field for indicating whether to use the one of the plurality of condition code flags of the architectural or non-architectural register to determine whether to perform the operation. | 11-25-2010 |
20100306505 | Result path sharing between a plurality of execution units within a processor - A processor | 12-02-2010 |
20100312992 | MULTITHREAD EXECUTION DEVICE AND METHOD FOR EXECUTING MULTIPLE THREADS - A multithread execution device includes: a program memory in which a plurality of programs are stored; an instruction issue unit that issues an instruction retrieved from the program memory; an instruction execution unit that executes the instruction; a target execution speed information memory that stores target execution speed information of the instruction; an execution speed monitor that monitors an execution speed of the instruction; a feedback control unit that commands the instruction issue unit to issue the instruction such that the execution speed of the instruction approximately corresponds to the target execution speed information. | 12-09-2010 |
20100318769 | USING VECTOR ATOMIC MEMORY OPERATION TO HANDLE DATA OF DIFFERENT LENGTHS - A system and method of compiling program code, wherein the program code includes an operation on an array of data elements stored in memory of a computer system. The program code is scanned for an equation which operates on data of lengths other than the limited number of vector supported data lengths. The equation is then replaced with vectorized machine executable code, wherein the machine executable code comprises a nested loop and wherein the nested loop comprises an exterior loop and a virtual interior loop. The exterior loop decomposes the equation into a plurality of loops of length N, wherein N is an integer greater than one. The virtual interior loop executes vector operations corresponding to the N length loop to form a result vector of length N, wherein the virtual interior loop includes one or more vector atomic memory operation (AMO) instructions, used to resolve false conflicts. | 12-16-2010 |
20100332804 | UNIFIED HIGH-FREQUENCY OUT-OF-ORDER PICK QUEUE WITH SUPPORT FOR SPECULATIVE INSTRUCTIONS - Systems and methods for efficient picking of instructions for out-of-order issue and execution in a processor. In one embodiment, a processor comprises a unified pick queue that is dynamically allocated. Each entry is configured to store age and dependency information relative to other decoded instructions. Also, each entry stores a picked field, which when asserted indicates the decoded instruction has already been picked for out-of-order issue and execution. When asserted, a trigger field indicates a result of a corresponding decoded instruction will be available a predetermined number of clock cycles afterward. A younger instruction dependent on a result of an older instruction is ready to be picked before the result of the older instruction is available. In this case, the older instruction has asserted picked and trigger fields. | 12-30-2010 |
20110004743 | Pipe scheduling for pipelines based on destination register number - A data processing apparatus | 01-06-2011 |
20110055523 | EARLY BRANCH DETERMINATION - A method and apparatus for branch determination. The method includes a first command issuing within a computer processor, wherein execution of the first command by the computer processor includes evaluating one or more conditions to set one or more flags. The method further includes a second command issuing, subsequent to the first command issuing, within the computer processor, wherein execution of the second command by the computer processor includes causing the computer processor to wait until the one or more flags are set. Subsequent to the first and second commands issuing, the method includes a third command issuing within the computer processor, wherein execution of the third command by the computer processor includes performing a jump operation based on a value of at least one of the one or more flags set by the first command. | 03-03-2011 |
20110072243 | Unified Collector Structure for Multi-Bank Register File - One embodiment of the present invention sets forth a technique for collecting operands specified by an instruction. As a sequence of instructions is received the operands specified by the instructions are assigned to ports, so that each one of the operands specified by a single instruction is assigned to a different port. Reading of the operands from a multi-bank register file is scheduled by selecting an operand from each one of the different ports to produce an operand read request and ensuring that two or more of the selected operands are not stored in the same bank of the multi-bank register file. The operands specified by the operand read request are read from the multi-bank register file in a single clock cycle. Each instruction is then executed as the operands specified by the instruction are read from the multi-bank register file and collected over one or more clock cycles. | 03-24-2011 |
20110072244 | Credit-Based Streaming Multiprocessor Warp Scheduling - One embodiment of the present invention sets forth a technique for ensuring cache access instructions are scheduled for execution in a multi-threaded system to improve cache locality and system performance. A credit-based technique may be used to control instruction by instruction scheduling for each warp in a group so that the group of warps is processed uniformly. A credit is computed for each warp and the credit contributes to a weight for each warp. The weight is used to select instructions for the warps that are issued for execution. | 03-24-2011 |
20110078416 | APPARATUS AND METHOD FOR CONTROL PROCESSING IN DUAL PATH PROCESSOR - A computer processor comprises a decode unit and a processing channel. The decode unit decodes a stream of instruction packets from a memory, each instruction packet comprising a plurality of instructions. The processing channel comprises a plurality of functional units and operable to perform control processing operations. The decode unit is operable to receive and decode instruction packets of a bit length of 64 bits and to detect if the instruction packet defines three control instructions each having a length of 21 bits. The decode unit detects that the instruction packet comprises the three control instructions. The control instructions are supplied to the processing channel for execution in the order in which they appear in the instruction packet. The detection uses an identification bit in the instruction packet. | 03-31-2011 |
20110099354 | Information processing apparatus and instruction decoder for the information processing apparatus - An information processing apparatus includes an instruction supplying section that supplies a plurality of instructions as a single instruction group, an executing section that repetitively executes a plurality of execution processes corresponding to the plurality of instructions in parallel, an issue timing control section that controls an issue timing of each of the instructions to the executing section so that the plurality of execution processes are executed with a timing delayed in accordance with a predetermined latency, and an operand transforming section that transforms an operand register address of each of the instructions in accordance with a predetermined increment value upon every repetition of execution in the executing section. | 04-28-2011 |
20110138152 | INSTRUCTION CONTROL DEVICE - A processor which executes threads having different characteristics is provided with an instruction control device. In the instruction control device, a first instruction control unit issues an instruction included in a first instruction sequence to an instruction execution unit. In addition, a second instruction control unit issues an instruction included in a second instruction sequence to the instruction execution unit. Here, the delay time of the second instruction control unit is shorter than the delay time of the first instruction control unit. | 06-09-2011 |
20110153991 | DUAL ISSUING OF COMPLEX INSTRUCTION SET INSTRUCTIONS - A system and method for issuing a processor instruction to multiple processing sections arranged in an out-of-order processing pipeline architecture. The multiple processing sections include a first execution unit with a pipeline length and a second execution unit operating upon data produced by the first execution unit. An instruction issue unit accepts a complex instruction that is cracked into respective micro-ops for the first execution unit and the second execution unit. The instruction issue unit issues the first micro-op to the first execution unit to produce intermediate data. The instruction issue unit then delays for a time period corresponding to the processing pipeline length of the first execution unit. After the delay, a second micro-op is issued to the second execution unit. | 06-23-2011 |
20110208950 | PROCESSES, CIRCUITS, DEVICES, AND SYSTEMS FOR SCOREBOARD AND OTHER PROCESSOR IMPROVEMENTS - A method of instruction issue ( | 08-25-2011 |
20110225398 | ADVANCED PROCESSOR SCHEDULING IN A MULTITHREADED SYSTEM - An advanced processor comprises a plurality of multithreaded processor cores each having a data cache and instruction cache. A data switch interconnect is coupled to each of the processor cores and configured to pass information among the processor cores. A messaging network is coupled to each of the processor cores and a plurality of communication ports. In one aspect of an embodiment of the invention, the data switch interconnect is coupled to each of the processor cores by its respective data cache, and the messaging network is coupled to each of the processor cores by its respective message station. Advantages of the invention include the ability to provide high bandwidth communications between computer systems and memory in an efficient and cost-effective manner. | 09-15-2011 |
20110296143 | PIPELINE PROCESSOR AND AN EQUAL MODEL CONSERVATION METHOD - A pipeline processor which meets a latency restriction on an equal model is provided. The pipeline processor includes a pipeline processing unit to process an instruction at a plurality of stages and an equal model compensator to store the results of the processing of some or all of the instructions located in the pipeline processing unit and to write the results of the processing in a register file based on the latency of each instruction. | 12-01-2011 |
20120023314 | PAIRED EXECUTION SCHEDULING OF DEPENDENT MICRO-OPERATIONS - A method and mechanism for reducing latency of a multi-cycle scheduler within a processor. A processor comprises a front end pipeline that determines data dependencies between instructions prior to a scheduling pipe stage. For each data dependency, a distance value is determined based on a number of instructions a younger dependent instruction is located from a corresponding older (in program order) instruction. When the younger dependent instruction is allocated an entry in a multi-cycle scheduler, this distance value may be used to locate an entry storing the older instruction in the scheduler. When the older instruction is picked for issue, the younger dependent instruction is marked as pre-picked. In an immediately subsequent clock cycle, the younger dependent instruction may be picked for issue, thereby reducing the latency of the multi-cycle scheduler. | 01-26-2012 |
20120089819 | ISSUING INSTRUCTIONS WITH UNRESOLVED DATA DEPENDENCIES - The described embodiments include a processor that determines instructions that can be issued based on unresolved data dependencies. In an issue unit in the processor, the processor keeps a record of each instruction that is directly or indirectly dependent on a base instruction. Upon determining that the base instruction has been deferred, the processor monitors instructions that are being issued from an issue queue to an execution unit for execution. Upon determining that an instruction from the record has reached a head of the issue queue, the processor immediately issues the instruction from the issue queue. | 04-12-2012 |
20120191950 | PREDICTING A RESULT FOR A PREDICATE-GENERATING INSTRUCTION WHEN PROCESSING VECTOR INSTRUCTIONS - The described embodiments provide a processor that executes vector instructions. In the described embodiments, while dispatching instructions at runtime, the processor encounters a predicate-generating instruction. Upon determining that a result of the predicate-generating instruction is predictable, the processor dispatches a prediction micro-operation associated with the predicate-generating instruction, wherein the prediction micro-operation generates a predicted result vector for the predicate-generating instruction. The processor then executes the prediction micro-operation to generate the predicted result vector. In the described embodiments, when executing the prediction micro-operation to generate the predicted result vector, if the predicate vector is received, for each element of the predicted result vector for which the predicate vector is active, otherwise, for each element of the predicted result vector, generating the predicted result vector comprises setting the element of the predicted result vector to true. | 07-26-2012 |
20120191951 | MFENCE and LFENCE Micro-Architectural Implementation Method and System - A system and method for fencing memory accesses. Memory loads can be fenced, or all memory access can be fenced. The system receives a fencing instruction that separates memory access instructions into older accesses and newer accesses. A buffer within the memory ordering unit is allocated to the instruction. The access instructions newer than the fencing instruction are stalled. The older access instructions are gradually retired. When all older memory accesses are retired, the fencing instruction is dispatched from the buffer. | 07-26-2012 |
20120204009 | MULTI-LEVEL REGISTER FILE SUPPORTING MULTIPLE THREADS - A processor includes an instruction fetch unit, an issue queue coupled to the instruction fetch unit, an execution unit coupled to the issue queue, and a multi-level register file including a first level register file having lower access latency and a second level register file having higher access latency. Each of the first and second level register files includes a plurality of physical registers for holding operands that is concurrently shared by a plurality of threads. The processor further includes a mapper that, at dispatch of an instruction specifying a source logical register from the instruction fetch unit to the issue queue, initiates a swap of a first operand associated with the source logical register that is in the second level register file with a second operand held in the first level register file. The issue queue, following the swap, issues the instruction to the execution unit for execution. | 08-09-2012 |
20120246447 | Region-Weighted Accounting of Multi-Threaded Processor Core According to Dispatch State - According to one embodiment of the present disclosure, an approach is provided in which a thread is selected from multiple active threads, along with a corresponding weighting value. Computational logic determines whether one of the multiple threads is dispatching an instruction and, if so, computes a dispatch weighting value using the selected weighting value and a dispatch factor that indicates a weighting adjustment of the selected weighting value. In turn, a resource utilization value of the selected thread is computed using the dispatch weighting value. | 09-27-2012 |
20120246448 | MEMORY FRAGMENTS FOR SUPPORTING CODE BLOCK EXECUTION BY USING VIRTUAL CORES INSTANTIATED BY PARTITIONABLE ENGINES - A system for executing instructions using a plurality of memory fragments for a processor. The system includes a global front end scheduler for receiving an incoming instruction sequence, wherein the global front end scheduler partitions the incoming instruction sequence into a plurality of code blocks of instructions and generates a plurality of inheritance vectors describing interdependencies between instructions of the code blocks. The system further includes a plurality of virtual cores of the processor coupled to receive code blocks allocated by the global front end scheduler, wherein each virtual core comprises a respective subset of resources of a plurality of partitionable engines, wherein the code blocks are executed by using the partitionable engines in accordance with a virtual core mode and in accordance with the respective inheritance vectors. A plurality memory fragments are coupled to the partitionable engines for providing data storage. | 09-27-2012 |
20120272045 | CONTROL METHOD AND SYSTEM OF MULTIPROCESSOR - A control method and a system for dispatching the execution sequence of the processes in a multiprocessors system so as to dispatch an operation sequence for executing different operation programs by a monitoring processor and a plurality of target processors. The monitoring processor obtains operation status of other processors from a buffer; the monitoring processor selects at least one target processor according to the operation status; the monitoring processor assigns the target processor to execute a corresponding slave operation program, and modifies the operation status of the target processors in the buffer module; and the monitoring processor repeats the setting the operation status and assigning other target processors to execute corresponding operation programs, till a master operation program is completed. | 10-25-2012 |
20120303937 | COMPUTER SYSTEM AND CONTROL METHOD THEREOF - A computer system used to execute an application includes a motion sensing unit, a processor and an instruction transfer unit. The motion sensing unit senses a gesture of a human body and generates an input instruction based on the gesture. The processor executes the application (or a game). The instruction transfer unit is connected with the motion sensing unit and the processor and serves as a communication interface between the motion sensing unit and the application. The instruction transfer unit transfers the input instruction to a control command, and the processor controls and executes the application in accordance with the control command. | 11-29-2012 |
20120331273 | Reduced Instruction Set - A method of reducing a set of instructions for execution on a processor, the method comprising: extracting information from a first instruction of the set of instructions; identifying unencoded space in one or more further instructions of the set of instructions; replacing the unencoded space of the one or more further instructions with the extracted information of the first instruction so as to form one or more amalgamated instructions; and removing the first instruction from the set of instructions. | 12-27-2012 |
20130013897 | METHOD TO DYNAMICALLY DISTRIBUTE A MULTI-DIMENSIONAL WORK SET ACROSS A MULTI-CORE SYSTEM - A method provides efficient dispatch/completion of an N Dimensional (ND) Range command in a data processing system (DPS). The method comprises: a compiler generating one or more commands from received program instructions; ND Range work processing (WP) logic determining when a command generated by the compiler will be implemented over an ND configuration of operands, where N is greater than one (1); automatically decomposing the ND configuration of operands into a one (1) dimension (1D) work element comprising P sequentially ordered work items that each represent one of the operands; placing the 1D work element within a command queue of the DPS; enabling sequential dispatching of 1D work items in ordered sequence from to one or more processing units; and generating an ND Range output by mapping the 1D work output result to an ND position corresponding to an original location of the operand represented by the 1D work item. | 01-10-2013 |
20130024666 | METHOD OF SCHEDULING A PLURALITY OF INSTRUCTIONS FOR A PROCESSOR - A method of scheduling a plurality of instructions for a processor comprises the steps of: establishing a functional unit resource table comprising a plurality of columns, each of which corresponds to one of a plurality of operation cycles of the processor and comprises a plurality of fields, each of which indicates a functional unit of the processor; establishing a ping-pong resource table comprising a plurality of columns, each of which corresponds to one of the plurality of operation cycles of the processor and comprises a plurality of fields, each of which indicates a read port or a write port of a register bank of the processor; and allotting the plurality of instructions to the plurality of operation cycles of the processor and registering the functional units and the ports of the register banks corresponding to the allotted instructions on the functional unit resource table and the ping-pong resource table. | 01-24-2013 |
20130042090 | TEMPORAL SIMT EXECUTION OPTIMIZATION - One embodiment of the present invention sets forth a technique for optimizing parallel thread execution in a temporal single-instruction multiple thread (SIMT) architecture. When the threads in a parallel thread group execute temporally on a common processing pipeline rather than spatially on parallel processing pipelines, execution cycles may be reduced when some threads in the parallel thread group are inactive due to divergence. Similarly, an instruction can be dispatched for execution by only one thread in the parallel thread group when the threads in the parallel thread group are executing a scalar instruction. Reducing the number of threads that execute an instruction removes unnecessary or redundant operations for execution by the processing pipelines. Information about scalar operands and operations and divergence of the threads is used in the instruction dispatch logic to eliminate unnecessary or redundant activity in the processing pipelines. | 02-14-2013 |
20130111191 | PROCESSOR INSTRUCTION ISSUE THROTTLING | 05-02-2013 |
20130117541 | SPECULATIVE EXECUTION AND ROLLBACK - One embodiment of the present invention sets forth a technique for speculatively issuing instructions to allow a processing pipeline to continue to process some instructions during rollback of other instructions. A scheduler circuit issues instructions for execution assuming that, several cycles later, when the instructions reach multithreaded execution units, that dependencies between the instructions will be resolved, resources will be available, operand data will be available, and other conditions will not prevent execution of the instructions. When a rollback condition exists at the point of execution for an instruction for a particular thread group, the instruction is not dispatched to the multithreaded execution units. However, other instructions issued by the scheduler circuit for execution by different thread groups, and for which a rollback condition does not exist, are executed by the multithreaded execution units. The instruction incurring the rollback condition is reissued after the rollback condition no longer exists. | 05-09-2013 |
20130138925 | PROCESSING CORE WITH SPECULATIVE REGISTER PREPROCESSING - A method and circuit arrangement speculatively preprocess data stored in a register file during otherwise unused cycles in an execution unit, e.g., to prenormalize denormal floating point values stored in a floating point register file, to decompress compressed values stored in a register file, to decrypt encrypted values stored in a register file, or to otherwise preprocess data that is stored in an unprocessed form in a register file. | 05-30-2013 |
20130145127 | ZERO VALUE PREFIXES FOR OPERANDS OF DIFFERING BIT-WIDTHS - A data processing system is provided in which destination operands to be stored within architectural registers are constrained to have zero values added as prefixes in order that the architectural register value has a fixed bit width irrespective of the bit width of the destination operand being written thereto. Instead of adding these zero values everywhere in the data path, they are instead represented by zero flags in at least the physical registers utilised for register renaming operations and in the result queue prior to results being written to the architectural register file. This saves circuitry resources and reduces energy consumption. | 06-06-2013 |
20130166885 | METHOD AND APPARATUS FOR ON-CHIP TEMPERATURE - When an instruction is executed on an integrated circuit (IC), an activity level and temperature are measured. A relationship between the activity level and temperature is determined, to estimate the temperature from the activity level. The activity level is monitored and is input to a scheduler, which estimates the IC temperature based on the activity level. The scheduler distributes work taking into account the temperature of various IC regions and may include distributing work to the IC region that has a lowest estimated temperature or relatively lower estimated temperature (e.g., lower than the average IC or IC region temperature). When the utilization level of one or more IC regions is high, the scheduler is configured to reduce the clock speed or the voltage of the one or more IC regions, or flag the regions as being unavailable for additional workload. | 06-27-2013 |
20130185542 | EXTERNAL AUXILIARY EXECUTION UNIT INTERFACE TO OFF-CHIP AUXILIARY EXECUTION UNIT - An external Auxiliary Execution Unit (AXU) interface is provided between a processing core disposed in a first programmable chip and an off-chip AXU disposed in a second programmable chip to integrate the AXU with an issue unit, a fixed point execution unit, and optionally other functional units in the processing core. The external AXU interface enables the issue unit to issue instructions to the AXU in much the same manner as the issue unit would be able to issue instructions to an AXU that was disposed on the same chip. By doing so, the AXU on the second programmable chip can be designed, tested and verified independent of the processing core on the first programmable chip, thereby enabling a common processing core, which has been designed, tested, and verified, to be used in connection with multiple different AXU designs. | 07-18-2013 |
20130191616 | INSTRUCTION CONTROL CIRCUIT, PROCESSOR, AND INSTRUCTION CONTROL METHOD - In a vector processing device, a data dependence detecting unit detects a data dependence relation between a preceding instruction and a succeeding instruction which are inputted from an instruction buffer, and an instruction issuance control unit controls issuance of an instruction based on a detection result thereof. When there is a data dependence relation between the preceding instruction and the succeeding instruction, the instruction issuance control unit generates a new instruction equivalent to processing related to a vector register including the data dependence relation with the succeeding instruction in processing executed by the preceding instruction and issues the new instruction between the preceding instruction and the succeeding instruction, and thereby a data hazard can be avoided between the preceding instruction and the succeeding instruction without making a stall occur. | 07-25-2013 |
20130283013 | METHOD AND APPARATUS FOR AGENT INTERFACING WITH PIPELINE BACKBONE TO LOCALLY HANDLE TRANSACTIONS WHILE OBEYING ORDERING RULE - In accordance with embodiments disclosed herein, there are provided methods, systems, and apparatuses for enabling an agent interfacing with a pipelined backbone to locally handle transactions while obeying an ordering rule including, for example, receiving a transaction which requests access to a backbone; decoding routing destination information from the transaction received, in which the decoded routing destination information designates the transaction to be processed either locally or processed via the backbone; storing the decoded routing destination information and the transaction into a First-In-First-Out (FIFO) buffer; retrieving the decoded routing destination information and the transaction from the FIFO buffer; and processing the transaction locally or via the backbone based on the decoded routing destination information retrieved from the FIFO buffer with the transaction. | 10-24-2013 |
20130305021 | METHOD FOR CONVERGENCE ANALYSIS BASED ON THREAD VARIANCE ANALYSIS - Basic blocks within a thread program are characterized for convergence based on variance analysis or corresponding instructions. Each basic block is marked as divergent based on transitive control dependence on a block that is either divergent or comprising a variant branch condition. Convergent basic blocks that are defined by invariant instructions are advantageously identified as candidates for scalarization by a thread program compiler. | 11-14-2013 |
20130305022 | Speeding Up Younger Store Instruction Execution after a Sync Instruction - Mechanisms are provided, in a processor, for executing instructions that are younger than a previously dispatched synchronization (sync) instruction is provided. An instruction sequencer unit of the processor dispatches a sync instruction. The sync instruction is sent to a nest of one or more devices outside of the processor. The instruction sequencer unit dispatches a subsequent instruction after dispatching the sync instruction. The dispatching of the subsequent instruction after dispatching the sync instruction is performed prior to receiving a sync acknowledgement response from the nest. The instruction sequencer unit performs a completion of the subsequent instruction based on whether completion of the subsequent instruction is dependent upon receiving the sync acknowledgement from the nest and completion of the sync instruction. | 11-14-2013 |
20130326197 | ISSUING INSTRUCTIONS TO EXECUTION PIPELINES BASED ON REGISTER-ASSOCIATED PREFERENCES, AND RELATED INSTRUCTION PROCESSING CIRCUITS, PROCESSOR SYSTEMS, METHODS, AND COMPUTER-READABLE MEDIA - Issuing instructions to execution pipelines based on register-associated preferences and related instruction processing circuits, systems, methods, and computer-readable media are disclosed. In one embodiment, an instruction is detected in an instruction stream. Upon determining that the instruction specifies at least one source register, an execution pipeline preference(s) is determined based on at least one pipeline indicator associated with the at least one source register in a pipeline issuance table, and the instruction is issued to an execution pipeline based on the execution pipeline preference(s). Upon a determination that the instruction specifies at least one target register, at least one pipeline indicator associated with the at least one target register in the pipeline issuance table is updated based on the execution pipeline to which the instruction is issued. In this manner, optimal forwarding of instructions may be facilitated, thus improving processor performance. | 12-05-2013 |
20130339669 | NONTRANSACTIONAL STORE INSTRUCTION - A NONTRANSACTIONAL STORE instruction, executed in transactional execution mode, performs stores that are retained, even if a transaction associated with the instruction aborts. The stores include user-specified information that may facilitate debugging of an aborted transaction. | 12-19-2013 |
20130346729 | PIPELINING OUT-OF-ORDER INSTRUCTIONS - Systems, methods and computer program product provide for pipelining out-of-order instructions. Embodiments comprise an instruction reservation station for short instructions of a short latency type and long instructions of a long latency type, an issue queue containing at least two short instructions of a short latency type, which are to be chained to match a latency of a long instruction of a long latency type, a register file, at least one execution pipeline for instructions of a short latency type and at least one execution pipeline for instructions of a long latency type; wherein results of the at least one execution pipeline for instructions of the short latency type are written to the register file, preserved in an auxiliary buffer, or forwarded to inputs of said execution pipelines. Data of the auxiliary buffer are written to the register file. | 12-26-2013 |
20140052965 | DYNAMIC CPU GPU LOAD BALANCING USING POWER - Dynamic CPU GPU load balancing is described based on power. In one example, an instruction is received and power values are received for a central processing core (CPU) and a graphics processing core (GPU). The CPU or the GPU is selected based on the received power values and the instruction is sent to the selected core for processing. | 02-20-2014 |
20140082330 | ENHANCED INSTRUCTION SCHEDULING DURING COMPILATION OF HIGH LEVEL SOURCE CODE FOR IMPROVED EXECUTABLE CODE - Systems and methods for static code scheduling are disclosed. A method can include receiving an intermediate representation of source code, building a directed acyclic graph (DAG) for the intermediate representation, and creating chains of dependent instructions from the DAG for cluster formation. The chains are merged into clusters and each node in the DAG is marked with an identifier of a cluster it is part of to generate a marked instruction DAG. Instruction DAG scheduling is then performed using information about the clusters to generate an ordered intermediate representation of the source code. | 03-20-2014 |
20140129805 | EXECUTION PIPELINE POWER REDUCTION - Systems and methods for reducing power consumption by an execution pipeline are provided. In one example, a method includes stalling an operation from being executed in the execution pipeline based on inputs to the operation being unavailable in a register file and disabling access to read the register file in favor of controlling a bypass network based on the consumer characteristics of the operation and producer characteristics of other operations being executed in the execution pipeline to forward data produced at an execution stage in the execution pipeline to be used as one or more resources of the operation. | 05-08-2014 |
20140189312 | PROGRAMMABLE HARDWARE ACCELERATORS IN CPU - Embodiments of the present invention may include a data processing system comprising a processing execution block to execute instructions stored in an instruction queue, a programmable hardware accelerator, and a controller programmed to monitor the instruction queue to detect a first type of instructions stored in the instruction queue, reprogram the programmable hardware accelerator to execute the first type of instructions, and transmit the first type of instructions to the programmable hardware accelerator to be executed. | 07-03-2014 |
20140189313 | QUEUED INSTRUCTION RE-DISPATCH AFTER RUNAHEAD - Various embodiments of microprocessors and methods of operating a microprocessor during runahead operation are disclosed herein. One example method of operating a microprocessor includes identifying a runahead-triggering event associated with a runahead-triggering instruction and, responsive to identification of the runahead-triggering event, entering runahead operation and inserting the runahead-triggering instruction along with one or more additional instructions in a queue. The example method also includes resuming non-runahead operation of the microprocessor in response to resolution of the runahead-triggering event and re-dispatching the runahead-triggering instruction along with the one or more additional instructions from the queue to the execution logic. | 07-03-2014 |
20140201501 | DYNAMIC ACCESSING OF EXECUTION ELEMENTS THROUGH MODIFICATION OF ISSUE RULES - Embodiments of the invention relate to dynamically routing instructions to execution units based on detected errors in the execution units. An aspect of the invention includes a computer system including a processor having an instruction issue unit and a plurality of execution units. The processor is configured to detect an error in a first execution unit among the plurality of execution units and adjust instruction dispatch rules of the instruction issue unit based on detecting the error in the first execution unit to restrict access to the first execution unit while leaving un-restricted access to the remaining execution units of the plurality of execution units. | 07-17-2014 |
20140223143 | LOAD LATENCY SPECULATION IN AN OUT-OF-ORDER COMPUTER PROCESSOR - Load latency speculation in an out-of-order computer processor, including: issuing a load instruction for execution, wherein the load instruction has a predetermined expected execution latency; issuing a dependent instruction wakeup signal on an instruction wakeup bus, wherein the dependent instruction wakeup signal indicates that the load instruction will be completed upon the expiration of the expected execution latency; determining, upon the expiration of the expected execution latency, whether the load instruction has completed; and responsive to determining that the load instruction has not completed upon the expiration of the expected execution latency, issuing a negative dependent instruction wakeup signal on the instruction wakeup bus, wherein the negative dependent instruction wakeup signal indicates that the load instruction has not completed upon the expiration of the expected execution latency. | 08-07-2014 |
20140223144 | Load Latency Speculation In An Out-Of-Order Computer Processor - Load latency speculation in an out-of-order computer processor, including: issuing a load instruction for execution, wherein the load instruction has a predetermined expected execution latency; issuing a dependent instruction wakeup signal on an instruction wakeup bus, wherein the dependent instruction wakeup signal indicates that the load instruction will be completed upon the expiration of the expected execution latency; determining, upon the expiration of the expected execution latency, whether the load instruction has completed; and responsive to determining that the load instruction has not completed upon the expiration of the expected execution latency, issuing a negative dependent instruction wakeup signal on the instruction wakeup bus, wherein the negative dependent instruction wakeup signal indicates that the load instruction has not completed upon the expiration of the expected execution latency. | 08-07-2014 |
20140281402 | PROCESSOR WITH HYBRID PIPELINE CAPABLE OF OPERATING IN OUT-OF-ORDER AND IN-ORDER MODES - A method and circuit arrangement provide support for a hybrid pipeline that dynamically switches between out-of-order and in-order modes. The hybrid pipeline may selectively execute instructions from at least one instruction stream that require the high performance capabilities provided by out-of-order processing in the out-of-order mode. The hybrid pipeline may also execute instructions that have strict power requirements in the in-order mode where the in-order mode conserves more power compared to the out-of-order mode. Each stage in the hybrid pipeline may be activated and fully functional when the hybrid pipeline is in the out-of-order mode. However, stages in the hybrid pipeline not used for the in-order mode may be deactivated and bypassed by the instructions when the hybrid pipeline dynamically switches from the out-of-order mode to the in-order mode. The deactivated stages may then be reactivated when the hybrid pipeline dynamically switches from the in-order mode to the out-of-order mode. | 09-18-2014 |
20140281403 | CHAINING BETWEEN EXPOSED VECTOR PIPELINES - Embodiments include a method for chaining data in an exposed-pipeline processing element. The method includes separating a multiple instruction word into a first sub-instruction and a second sub-instruction, receiving the first sub-instruction and the second sub-instruction in the exposed-pipeline processing element. The method also includes issuing the first sub-instruction at a first time, issuing the second sub-instruction at a second time different than the first time, the second time being offset to account for a dependency of the second sub-instruction on a first result from the first sub-instruction, the first pipeline performing the first sub-instruction at a first clock cycle and communicating the first result from performing the first sub-instruction to a chaining bus coupled to the first pipeline and a second pipeline, the communicating at a second clock cycle subsequent to the first clock cycle that corresponds to a total number of latch pipeline stages in the first pipeline. | 09-18-2014 |
20140289500 | METHOD AND APPARATUS FOR PROVIDING AN INTERFACE BETWEEN A UICC AND A PROCESSOR IN AN ACCESS TERMINAL THAT SUPPORTS ASYNCHRONOUS COMMAND PROCESSING BY THE UICC - Techniques for providing an interface between a UICC and a processor, included in an access terminal, that supports asynchronous command processing by the UICC, are described. A first complex command, with a first processing time, may be received from the processor. An initial response to the first command, including a token, may be sent to the processor. The first command may be processed for the first processing time. At least one additional command, having a processing time shorter than the first processing time, may be received from the processor. Processing of the first command may be completed. Processing of a current one of the at least one additional command, which was being processed before, during, or after completion of the processing of the first command, may be completed. A response to the current one of the at least one additional command, including the token, may be sent to the processor. | 09-25-2014 |
20140310506 | ALLOCATING STORE QUEUE ENTRIES TO STORE INSTRUCTIONS FOR EARLY STORE-TO-LOAD FORWARDING - The present invention provides a method and apparatus for allocating store queue entries to store instructions for early store-to-load forwarding. Some embodiments of the method include allocating an entry in a store queue to a store instruction in response to the store instruction being dispatched and prior to receiving a translation of a virtual address to a physical address associated with the store instruction. The entry includes storage for data to be written to the physical address by the store instruction. | 10-16-2014 |
20140325187 | SINGLE-CYCLE INSTRUCTION PIPELINE SCHEDULING - A method includes allocating a first single-cycle instruction to a first pipeline that picks single-cycle instructions for execution in program order. The method further includes marking at least one source register of the first single-cycle instruction as ready for execution in the first pipeline in response to all older single-cycle instructions allocated to the first pipeline being ready and eligible to be picked for execution. An apparatus includes a decoder to decode a first single-cycle instruction and to allocate the first single-cycle instruction to a first pipeline. The apparatus further includes a scheduler to pick single-cycle instructions for execution by the first pipeline in program order and to mark at least one source register of the first single-cycle instruction as ready for execution in the first pipeline in response to determining that all older single-cycle instructions allocated to the first pipeline are ready and eligible. | 10-30-2014 |
20140351562 | TECHNIQUES FOR SCHEDULING OPERATIONS AT AN INSTRUCTION PIPELINE - A dispatch stage of a processor core dispatches designated operations (e.g. load/store operations) to a temporary queue when the resources to execute the designated operations are not available. Once the resources become available to execute an operation at the temporary queue, the operation is transferred to a scheduler queue where it can be picked for execution. By dispatching the designated operations to the temporary queue, other operations behind the designated operations in a program order are made available for dispatch to the scheduler queue, thereby improving instruction throughput at the processor core. | 11-27-2014 |
20140372732 | ACCELERATED REVERSAL OF SPECULATIVE STATE CHANGES AND RESOURCE RECOVERY - A method includes undoing, in reverse program order, changes in a state of a processing device caused by speculative instructions previously dispatched for execution in the processing device and concurrently deallocating resources previously allocated to the speculative instructions in response to interruption of dispatch of instructions due to a flush of the speculative instructions. A processor device comprises a retire queue to store entries for instructions that are awaiting retirement and a finite state machine. The finite state machine is to interrupt dispatch of instructions in response to a flush of speculative instructions previously dispatched for execution in the processing device and to undo, in reverse program order, changes in a state of the processing device caused by the speculative instructions while concurrently deallocating resources previously allocated to the speculative instructions. | 12-18-2014 |
20150026436 | HYBRID TAG SCHEDULER - The present invention provides a method and apparatus for scheduling based on tags of different types. Some embodiments of the method include broadcasting a first tag to entries in a queue of a scheduler. The first tag is broadcast in response to a first instruction associated with a first entry in the queue being picked for execution. The first tag includes information identifying the first entry and information indicating a type of the first tag. Some embodiments of the method also include marking at least one second entry in the queue is ready to be picked for execution in response to at least one second tag associated with at least one second entry in the queue matching the first tag. | 01-22-2015 |
20150074379 | System and Method for an Asynchronous Processor with Token-Based Very Long Instruction Word Architecture - Embodiments are provided for an asynchronous processor with token-based very long instruction word architecture. The asynchronous processor comprises a memory configured to cache a plurality of instructions, a feedback engine configured to receive the instructions in bundles of instructions at a time (referred to as very long instruction word) and to decode the instructions, and a crossbar bus configured to transfer calculation information and results of the asynchronous processor. The apparatus further comprises a plurality of sets of execution units (XUs) between the feedback engine and the crossbar bus. Each set of the sets of XUs comprises a plurality of XUs arranged in series and configured to process a bundle of instructions received at the each set from the feedback engine. | 03-12-2015 |
20150089198 | TECHNIQUE FOR REDUCING VOLTAGE DROOP BY THROTTLING INSTRUCTION ISSUE RATE - An issue control unit is configured to control the rate at which an instruction issue unit issues instructions to an execution pipeline in order to avoid spikes in power drawn by that execution pipeline. The issue control unit maintains a history buffer that reflects, for N previous cycles, the number of instructions issued during each of those N cycles. If the total number of instructions issued during the N previous cycles exceeds a threshold value, then the issue control unit throttles the instruction issue unit from issuing instructions during a subsequent cycle. In addition, the issue control unit increases the threshold value in proportion to the number of previously issued instructions and based on a variety of configurable parameters. Accordingly, the issue control unit maintains granular control over the rate with which the instruction issue unit “ramps up” to a maximum instruction issue rate. | 03-26-2015 |
20150095618 | VIRTUAL LOAD STORE QUEUE HAVING A DYNAMIC DISPATCH WINDOW WITH A UNIFIED STRUCTURE - An out of order processor. The processor includes a virtual load store queue for allocating a plurality of loads and a plurality of stores, wherein more loads and more stores can be accommodated beyond an actual physical size of the load store queue of the processor; wherein the processor allocates other instructions besides loads and stores beyond the actual physical size limitation of the load/store queue; and wherein the other instructions can be dispatched and executed even though intervening loads or stores do not have spaces in the load store queue. | 04-02-2015 |
20150113251 | Systems and Methods for Register Allocation - System and methods are provided for register allocation. An original code block and a target code block associated with a branch of an execution loop are determined. An original allocation of a plurality of physical registers to one or more original variables associated with the original code block is detected. A target allocation of the plurality of physical registers to one or more target variables associated with the target code block is determined. One or more temporary registers are selected from the plurality of physical registers based at least in part on the original allocation and the target allocation. The original allocation is changed to the target allocation using the selected temporary registers. Specifically, one or more instructions are generated to change the original allocation to the target allocation using the selected temporary registers. The instructions are executed using one or more processors. | 04-23-2015 |
20150301831 | SELECT LOGIC FOR THE INSTRUCTION SCHEDULER OF A MULTI STRAND OUT-OF-ORDER PROCESSOR BASED ON DELAYED RECONSTRUCTED PROGRAM ORDER - A processing device comprises select logic to schedule a plurality of instructions for execution. The select logic calculates a reconstructed program order (RPO) value for each of a plurality of instructions that are ready to be scheduled for execution. The select logic creates an ordered list of instructions based on the delayed RPO values, the delayed RPO values comprising the calculated RPO values from a previous execution cycle, and dispatches instructions for scheduling based on the ordered list. | 10-22-2015 |
20150363205 | IMPLEMENTING OUT OF ORDER PROCESSOR INSTRUCTION ISSUE QUEUE - A method and apparatus are provided for implementing an enhanced out of order processor instruction issue queue in a computer system. Instructions are selectively accepted into an instruction issue queue and ages are assigned to the accepted queue entry instructions using a queue counter. The queue entry instructions are issued based upon resources being ready and ages of the instructions. Ages of the queue entry instructions and the queue counter are selectively decremented, responsive to issuing instructions. | 12-17-2015 |
20150363206 | IMPLEMENTING OUT OF ORDER PROCESSOR INSTRUCTION ISSUE QUEUE - A method and apparatus are provided for implementing an enhanced out of order processor instruction issue queue in a computer system. Instructions are selectively accepted into an instruction issue queue and ages are assigned to the accepted queue entry instructions using a queue counter. The queue entry instructions are issued based upon resources being ready and ages of the instructions. Ages of the queue entry instructions and the queue counter are selectively decremented, responsive to issuing instructions. | 12-17-2015 |
20150370569 | INSTRUCTION PROCESSING SYSTEM AND METHOD - An instruction processing system is provided. The system includes a central processing unit (CPU), an m number of memory devices and an instruction control unit. The CPU is capable of being coupled to the m number of memory devices. Further, the CPU is configured to execute one or more instructions of the executable instructions. The m number of memory devices with different access speeds are configured to store the instructions, where m is a natural number greater than 1. The instruction control unit is configured to, based on a track address of a target instruction of a branch instruction stored in a track table, control a memory with a lower speed to provide the instruction for a memory with a higher speed. | 12-24-2015 |
20150370572 | Multi-User Processor System for Processing Information - This multi-user processor system for processing information, of the type including a data exchange engine ( | 12-24-2015 |
20160041828 | METHOD AND SYSTEM FOR GENERATING OBJECT CODE TO FACILITATE PREDICTIVE MEMORY RETRIEVAL - A method and system are described for generating reference tables in object code which specify the addresses of branches, routines called, and data references used by routines in the code. In a suitably equipped processing system, the reference tables can be passed to a memory management processor which can open the appropriate memory pages to expedite the retrieval of data referenced in the execution pipeline. The disclosed method and system create such reference tables at the beginning of each routine so that the table can be passed to the memory management processor in a suitably equipped processor. Resulting object code also allows processors lacking a suitable memory management processor to skip the reference table, preserving upward compatibility. | 02-11-2016 |
20160070574 | REGISTER FILES FOR STORING DATA OPERATED ON BY INSTRUCTIONS OF MULTIPLE WIDTHS - A processor core includes even and odd execution slices each having a register file. The slices are each configured to perform operations specified in a first set of instructions on data from its respective register file, and together configured to perform operations specified in a second set of instructions on data stored across both register files. During utilization, the processor receives a first instruction of the first set specifying an operation, a target register, and a source register. Next, a second instruction upon which content of the source register depends is identified as being of the second set. In response, the first instruction is dispatched to the even slice. In accordance with the operation specified in the first instruction, the even slice uses content of the source register in its register file to produce a result. Copies of the result are written to the target register in both register files. | 03-10-2016 |
20160070576 | SPECULATIVE REGISTER FILE READ SUPPRESSION - A single threaded out-of-order processor | 03-10-2016 |
20160092231 | INDEPENDENT MAPPING OF THREADS - Embodiments of the present invention provide systems and methods for mapping the architected state of one or more threads to a set of distributed physical register files to enable independent execution of one or more threads in a multiple slice processor. In one embodiment, a system is disclosed including a plurality of dispatch queues which receive instructions from one or more threads and an even number of parallel execution slices, each parallel execution slice containing a register file. A routing network directs an output from the dispatch queues to the parallel execution slices and the parallel execution slices independently execute the one or more threads. | 03-31-2016 |
20160154653 | MEMORY FRAGMENTS FOR SUPPORTING CODE BLOCK EXECUTION BY USING VIRTUAL CORES INSTANTIATED BY PARTITIONABLE ENGINES | 06-02-2016 |
20160202988 | PARALLEL SLICE PROCESSING METHOD USING A RECIRCULATING LOAD-STORE QUEUE FOR FAST DEALLOCATION OF ISSUE QUEUE ENTRIES | 07-14-2016 |
20160253179 | CONCURRENT EXECUTION OF HETEROGENEOUS VECTOR INSTRUCTIONS | 09-01-2016 |
20160378502 | AGE-BASED MANAGEMENT OF INSTRUCTION BLOCKS IN A PROCESSOR INSTRUCTION WINDOW - A processor core in an instruction block-based microarchitecture includes a control unit that explicitly tracks instruction block state including age or priority for current blocks that have been fetched from an instruction cache. Tracked instruction blocks are maintained in an age-ordered or priority-ordered list. When an instruction block is identified by the control unit for commitment, the list is checked for a match and a matching instruction block can be refreshed without re-fetching from the instruction cache. If a match is not found, an instruction block can be committed and replaced based on either age or priority. Such instruction state tracking typically consumes little overhead and enables instruction blocks to be reused and mispredicted instructions to be skipped to increase processor core efficiency. | 12-29-2016 |
20160378503 | TECHNIQUES TO WAKE-UP DEPENDENT INSTRUCTIONS FOR BACK-TO-BACK ISSUE IN A MICROPROCESSOR - Techniques are disclosed for back-to-back issue of instructions in a processor. A first instruction is stored in a queue position in an issue queue. The issue queue stores instructions in a corresponding queue position. The first instruction includes a target instruction tag and at least a source instruction tag. The target instruction tag is stored in a table storing a plurality of target instruction tags associated with a corresponding instruction. Each stored target instruction tag specifies a logical register that stores a target operand. Upon determining, based on the source instruction tag associated with the first instruction and the target instruction tag associated with a second instruction, that the first instruction is dependent on the second instruction, a pointer to the first instruction is associated with the second instruction. The pointer is used to wake up the first instruction upon issue of the second instruction. | 12-29-2016 |
20160378504 | TECHNIQUES TO WAKE-UP DEPENDENT INSTRUCTIONS FOR BACK-TO-BACK ISSUE IN A MICROPROCESSOR - Techniques are disclosed for back-to-back issue of instructions in a processor. A first instruction is stored in a queue position in an issue queue. The issue queue stores instructions in a corresponding queue position. The first instruction includes a target instruction tag and at least a source instruction tag. The target instruction tag is stored in a table storing a plurality of target instruction tags associated with a corresponding instruction. Each stored target instruction tag specifies a logical register that stores a target operand. Upon determining, based on the source instruction tag associated with the first instruction and the target instruction tag associated with a second instruction, that the first instruction is dependent on the second instruction, a pointer to the first instruction is associated with the second instruction. The pointer is used to wake up the first instruction upon issue of the second instruction. | 12-29-2016 |
20170235577 | OPERATION OF A MULTI-SLICE PROCESSOR IMPLEMENTING A MECHANISM TO OVERCOME A SYSTEM HANG | 08-17-2017 |
20170235578 | Method and Apparatus for Scheduling of Instructions in a Multi-Strand Out-Of-Order Processor | 08-17-2017 |
20170235581 | INSTRUCTIONS FOR MANAGING A PARALLEL CACHE HIERARCHY | 08-17-2017 |
20220137973 | Program Thread Selection Between a Plurality of Execution Pipelines - Techniques are disclosed relating to an apparatus that includes a plurality of execution pipelines including first and second execution pipelines, a shared circuit that is shared by the first and second execution pipelines, and a decode circuit. The first and second execution pipelines are configured to concurrently perform operations for respective instructions. The decode circuit is configured to assign a first program thread to the first execution pipeline and a second program thread to the second execution pipeline. In response to determining that respective instructions from the first and second program threads that utilize the shared circuit are concurrently available for dispatch, the decode circuit is further configured to select between the first program thread and the second program thread. | 05-05-2022 |