Patent application number | Description | Published |
20090210656 | METHOD AND SYSTEM FOR OVERLAPPING EXECUTION OF INSTRUCTIONS THROUGH NON-UNIFORM EXECUTION PIPELINES IN AN IN-ORDER PROCESSOR - A system and method for overlapping execution (OE) of instructions through non-uniform execution pipelines in an in-order processor are provided. The system includes a first execution unit to perform instruction execution in a first execution pipeline. The system also includes a second execution unit to perform instruction execution in a second execution pipeline, where the second execution pipeline includes a greater number of stages than the first execution pipeline. The system further includes an instruction dispatch unit (IDU), the IDU including OE registers and logic for dispatching an OE-capable instruction to the first execution unit such that the instruction completes execution prior to completing execution of a previously dispatched instruction to the second execution unit. The system additionally includes a latch to hold a result of the execution of the OE-capable instruction until after the second execution unit completes the execution of the previously dispatched instruction. | 08-20-2009 |
20090217002 | SYSTEM AND METHOD FOR PROVIDING ASYNCHRONOUS DYNAMIC MILLICODE ENTRY PREDICTION - A system and method for asynchronous dynamic millicode entry prediction in a processor are provided. The system includes a branch target buffer (BTB) to hold branch information. The branch information includes: a branch type indicating that the branch represents a millicode entry (mcentry) instruction targeting a millicode subroutine, and an instruction length code (ILC) associated with the mcentry instruction. The system also includes search logic to perform a method. The method includes locating a branch address in the BTB for the mcentry instruction targeting the millicode subroutine, and determining a return address to return from the millicode subroutine as a function of the an instruction address of the mcentry instruction and the ILC. The system further includes instruction fetch controls to fetch instructions of the millicode subroutine asynchronous to the search logic. The search logic may also operate asynchronous with respect to an instruction decode unit. | 08-27-2009 |
20090240914 | RECYCLING LONG MULTI-OPERAND INSTRUCTIONS - A pipelined microprocessor configured for long operand instructions is disclosed. The microprocessor includes a memory unit and a load-store unit. The load store unit is coupled to the memory unit and includes a data formatter receiving information from the memory unit and including an operand selector and a shift register portion. The microprocessor also includes an execution unit coupled to the load-store unit and receiving operand information there from. The execution unit includes output latches coupled to a storage location within the execution unit for storing output information from the execution unit. | 09-24-2009 |
20090240918 | METHOD, COMPUTER PROGRAM PRODUCT, AND HARDWARE PRODUCT FOR ELIMINATING OR REDUCING OPERAND LINE CROSSING PENALTY - Eliminating or reducing an operand line crossing penalty by performing an initial fetch for an operand from a data cache of a processor. The initial fetch is performed by allowing or permitting the initial fetch to occur unaligned with reference to a quadword boundary. A plurality of subsequent fetches for a corresponding plurality of operands from the data cache are performed wherein each of the plurality of subsequent fetches is aligned to any of a plurality of quadword boundaries to prevent each of a plurality of individual fetch requests from spanning a plurality of lines in the data cache. A steady stream of data is maintained by placing an operand buffer at an output of the data cache to store and merge data from the initial fetch and the plurality of subsequent fetches, and to return the stored and merged data to the processor. | 09-24-2009 |
20090240922 | METHOD, SYSTEM, COMPUTER PROGRAM PRODUCT, AND HARDWARE PRODUCT FOR IMPLEMENTING RESULT FORWARDING BETWEEN DIFFERENTLY SIZED OPERANDS IN A SUPERSCALAR PROCESSOR - Result and operand forwarding is provided between differently sized operands in a superscalar processor by grouping a first set of instructions for operand forwarding, and grouping a second set of instructions for result forwarding, the first set of instructions comprising a first source instruction having a first operand and a first dependent instruction having a second operand, the first dependent instruction depending from the first source instruction; the second set of instructions comprising a second source instruction having a third operand and a second dependent instruction having a fourth operand, the second dependent instruction depending from the second source instruction, performing operand forwarding by forwarding the first operand, either whole or in part, as it is being read to the first dependent instruction prior to execution; performing result forwarding by forwarding a result of the second source instruction, either whole or in part, to the second dependent instruction, after execution; wherein the operand forwarding is performed by executing the first source instruction together with the first dependent instruction; and wherein the result forwarding is performed by executing the second source instruction together with the second dependent instruction. | 09-24-2009 |
20090240929 | METHOD, SYSTEM AND COMPUTER PROGRAM PRODUCT FOR REDUCED OVERHEAD ADDRESS MODE CHANGE MANAGEMENT IN A PIPELINED, RECYLING MICROPROCESSOR - A method, system, and computer program product for reduced overhead address mode change management in a pipelined, recycling microprocessor are provided. The recycling microprocessor includes logic executing thereon. The microprocessor also includes an instruction fetch unit (IFU) supporting computation of address adds in selected address modes and reporting non-equal comparison of the computation to the logic. The microprocessor further includes a fixed point unit determining whether the mode has changed and reporting changes to the logic. Upon determining the comparison yields an equal result but the mode has changed, a recycle event is triggered to ensure subsequent ofetches are relaunched in the correct mode and that no execution writebacks occur from work performed in an incorrect mode. For comparisons yielding a non-equal result and a changed mode, the logic clears bits set in response to the determinations, and a serialization event is taken to reset a corresponding pipeline for operation in the correct mode. | 09-24-2009 |
20130339670 | REDUCING OPERAND STORE COMPARE PENALTIES - Embodiments relate to reducing operand store compare penalties by detecting potential unit of operation (UOP) dependencies. An aspect includes a computer system for reducing operation store compare penalties. The system includes memory and a processor. The system performs a method including cracking an instruction into units of operation, where each UOP includes instruction text and address determination fields. The method includes identifying a load UOP among the plurality of UOPs and comparing values of the address determination fields of the load UOP with values of address determination fields of one or more previously-decoded store UOPs. The method also includes forcing, prior to issuance of the instruction to an execution unit, a dependency between the load UOP and the one or more previously-decoded store UOPs based on the comparing. | 12-19-2013 |
20150261501 | CHECKSUM ADDER - Embodiments relate to a hardware circuit that is operable as a fixed point adder and a checksum adder. An aspect includes a driving of a multifunction compression tree disposed on a circuit path based on a control bit to execute one of first and second schemes of vector input addition and a driving of a multifunction adder disposed on the circuit path based on the control bit to perform the one of the first and second schemes of vector input addition. | 09-17-2015 |
20150261503 | VECTORIZED GALOIS FIELD MULTIPLICATION - Embodiments relate to vectorized Galois field multiplication. An aspect includes a subdivision of first and second input operands into vector elements of equal sizes with multiple modes defined such that a base mode has a size corresponding to a smallest vector element size, which is a factor of a size of the first and second input operands, and a higher mode has a size that is a multiple of the base mode size. The vector elements of the first input operand are modified with a bit mask based on a size of the vector elements. The modified vector elements of the first input operand and the vector elements of the second input operand are input into a single hardware tree configured for subdivision into staggered subtrees a size of each of which being based on the base mode size. | 09-17-2015 |
20150261504 | VECTORIZED GALOIS FIELD MULTIPLICATION - Embodiments relate to vectorized Galois field multiplication. An aspect includes an input of first and second input operands of equal sizes into a single hardware tree, a calculation of a predicted parity as a parity of the first input operand ANDed with a parity of the second input operand, a comparison of the predicted parity with a parity generated on a final result of a Galois field multiplication of the first and second operands and a raising of an error based on a mismatch between the predicted parity and the generated parity. | 09-17-2015 |