Patent application number | Description | Published |
20090031116 | THREE OPERAND INSTRUCTION EXTENSION FOR X86 ARCHITECTURE - A method and apparatus are contemplated for increasing the number of available instructions in an instruction set architecture. The new instructions extend the number of general-purpose registers and include three or more operands. A combination of an escape code field, an opcode field, an operation configuration field and an operation size field determines a unique new instruction operation. A source operand extension field includes bits to be combined with other fields in order to extend the number of source operand values for general-purpose registers. | 01-29-2009 |
20100281239 | RELIABLE EXECUTION USING COMPARE AND TRANSFER INSTRUCTION ON AN SMT MACHINE - A system and method for efficient reliable execution on a simultaneous multithreading machine. A processor is placed in a reliable execution mode (REM) to detect possible errors during execution of a mission critical software application. Only two threads may be configured to operate in this mode. Floating-point store and integer-transfer unary instructions may be converted to new binary instructions. Each new instruction has two source operands, each one corresponding to a different thread is specified by a same logical register number as a single source operand of the original unary instruction. All other instructions are replicated, wherein the original instruction and its twin are assigned to different threads. Simultaneous multi-threaded (SMT) floating-point logic may only be able to provide lockstep execution when it communicates using the new instruction with instantiated integer independent clusters. The new instruction cannot begin until both source operands are ready, which are subsequently compared to determine any mismatches or errors. | 11-04-2010 |
20100318771 | COMBINED BYTE-PERMUTE AND BIT SHIFT UNIT - A processor includes a decode unit and a byte permute unit. The byte permute unit receives an instruction from the decode unit. The byte permute unit determines whether the instruction corresponds to a shuffle instruction or a shift instruction. For a shuffle instruction, the byte permute unit uses a byte shuffler to perform a shuffle operation indicated by the instruction. For a shift instruction that indicates a shift magnitude, the byte permute unit uses the byte shuffler to byte-level shift a source operand corresponding to the instruction by an integer number of bytes. The byte permute unit also generates a sequence of output bits by bit-shifting the byte-level shifted source operand by a number of bits such that the sum of the number of bits and the integer number of bytes is equal to the shift magnitude. | 12-16-2010 |
20100318772 | SUPERSCALAR REGISTER-RENAMING FOR A STACK-ADDRESSED ARCHITECTURE - A system and method for increasing processor throughput by decreasing a loop critical path. In one embodiment, a table comprises multiple stack entries, each comprising an x87 floating-point (FP) stack specifier. The combinatorial logic for operand translation of N FP instructions per clock cycle may require N instantiated copies of a combinatorial logic block. Each instantiated copy may determine a new ordering of the stack entries. Control logic may receive necessary information from the corresponding N FP instructions and determine a corresponding combined computational effect, or stack reordering, on entries within the table based on two or more instructions. Resulting control signals are conveyed to the N instantiated copies. A resulting accumulative delay from an input of the first copy to the output of the Nth copy may be less than or equal to (N−1)*time_delay versus a longer N*time_delay. | 12-16-2010 |
20120005459 | PROCESSOR HAVING INCREASED PERFORMANCE AND ENERGY SAVING VIA MOVE ELIMINATION - Methods and apparatuses are provided for increasing processor performance and energy saving via eliminating physical data movement to accomplish a move instruction. The apparatus comprises a first plurality of available physical registers mapped to a second plurality of logical registers, including a source logical register and a destination logical register. A renaming unit remaps the destination logical register to the same physical register mapping as the source logical register in response to a move instruction. In this way, the move instruction is effectively executed without moving data between physical registers. A method is provided for increasing processor performance and energy saving via eliminating physical data movement to accomplish a move instruction. The method comprises determining a mapping of a logical source register and a logical destination register to physical registers of a processor and then remapping the logical destination register to the same physical register mapping as the logical source register to affect an equivalent of the move instruction with actual data movement between physical registers. | 01-05-2012 |
20120166769 | PROCESSOR HAVING INCREASED PERFORMANCE VIA ELIMINATION OF SERIAL DEPENDENCIES - Methods and apparatuses are provided for achieving increased performance via elimination of serial dependencies in instructions or instruction sequences. The apparatus comprises an operational unit for determining whether an instruction will cause dependencies during completion in an execution unit. Responsive to that determination the instruction is replaced with an alternative instruction for completion in the execution unit. In this way, the alternative instruction is completed without causing dependencies in the execution unit. The method comprises determining that an instruction will cause dependencies during completion in a processor and replacing the instruction with an alternative instruction for completion in the processor. | 06-28-2012 |
20120191952 | PROCESSOR IMPLEMENTING SCALAR CODE OPTIMIZATION - Methods and apparatuses are provided for increased efficiency and enhanced power saving in a processor via scalar code optimization. The method comprises determining that an instruction comprises a scalar instruction and then processing the instruction using only a lower portion of an XMM register. The apparatus comprises an operational unit capable of determining whether an instruction comprises a scalar instruction and execution units responsive that determining for processing the scalar instruction using only a lower portion of an XMM register of the processor. By not processing the upper portion of the XMM register efficiency is increased and power saving is enhanced. | 07-26-2012 |
20130132702 | Processor with Kernel Mode Access to User Space Virtual Addresses - A computer includes a memory and a processor connected to the memory. The processor includes memory segment configuration registers to store defined memory address segments and defined memory address segment attributes such that the processor operates in accordance with the defined memory address segments and defined memory address segment attributes to allow kernel mode access to user space virtual addresses for enhanced kernel mode memory capacity. | 05-23-2013 |
20140244933 | Way Lookahead - Methods and systems that identify and power up ways for future instructions are provided. A processor includes an n-way set associative cache and an instruction fetch unit. The n-way set associative cache is configured to store instructions. The instruction fetch unit is in communication with the n-way set associative cache and is configured to power up a first way, where a first indication is associated with an instruction and indicates the way where a future instruction is located and where the future instruction is two or more instructions ahead of the current instruction. | 08-28-2014 |
20140258624 | Apparatus and Method for Operating a Processor with an Operation Cache - A processor includes a computation engine to produce a computed value for a set of operands. A cache stores the set of operands and the computed value. The cache is configured to selectively identify a match and a miss for a new set of operands. In the event of a match the computed value is supplied by the cache and a computation engine operation is aborted. In the event of a miss a new computed value for the new set of operands is computed by the computation engine and is stored in the cache. | 09-11-2014 |
20140258667 | Apparatus and Method for Memory Operation Bonding - A processor is configured to evaluate memory operation bonding criteria to selectively identify memory operation bonding opportunities within a memory access plan. Memory operations are combined in response to the memory operation bonding opportunities to form a revised memory access plan with accelerated memory access. | 09-11-2014 |
20140258694 | Apparatus and Method for Branch Instruction Bonding - A processor is configured to identify a branch instruction immediately followed by an architectural delay slot. A single bonded instruction comprising the branch instruction immediately followed by the architectural delay slot is created. The single bonded instruction is loaded into an instruction buffer. | 09-11-2014 |
20140258697 | Apparatus and Method for Transitive Instruction Scheduling - A processor includes a multiple stage pipeline with a scheduler with a wakeup block and select logic. The wakeup block is configured to wake, in a first cycle, all instructions dependent upon a first selected instruction to form a wake instruction set. In a second cycle, the wakeup block wakes instructions dependent upon the wake instruction set to augment the wake instruction set. The select logic selects instructions from the wake instruction set based upon program order. | 09-11-2014 |
20140281413 | Superforwarding Processor - Methods and systems that allow the processor to effectively and efficiently reduce or eliminate the latency associated with instructions that copy the value of one register to another register. A processor includes a superforwarding table, a superforwarding logic block, and a computation engine. The superforwarding table stores an entry, wherein the entry has a valid bit, a key field, and a forward field. The superforwarding logic block determines which register contains the information needed for an instruction. The computation engine executes instructions. | 09-18-2014 |