Patent application number | Description | Published |
20120005459 | PROCESSOR HAVING INCREASED PERFORMANCE AND ENERGY SAVING VIA MOVE ELIMINATION - Methods and apparatuses are provided for increasing processor performance and energy saving via eliminating physical data movement to accomplish a move instruction. The apparatus comprises a first plurality of available physical registers mapped to a second plurality of logical registers, including a source logical register and a destination logical register. A renaming unit remaps the destination logical register to the same physical register mapping as the source logical register in response to a move instruction. In this way, the move instruction is effectively executed without moving data between physical registers. A method is provided for increasing processor performance and energy saving via eliminating physical data movement to accomplish a move instruction. The method comprises determining a mapping of a logical source register and a logical destination register to physical registers of a processor and then remapping the logical destination register to the same physical register mapping as the logical source register to affect an equivalent of the move instruction with actual data movement between physical registers. | 01-05-2012 |
20120110256 | LOW POWER CONTENT-ADDRESSABLE MEMORY AND METHOD - Content-Addressable Memory (CAM) arrays and related circuitry for integrated circuits and CAM array comparison methods are provided such that relatively low power is used in the operation of the CAM circuitry. A binary value pair is stored in a pair of CAM memory elements. A comparison signal is provided to comparator circuitry that uniquely represents the stored binary values. A match signal is input to the comparator circuitry that uniquely represents a binary value pair to be compared with the stored binary value pair. In one example, a transistor is operated to output a positive match result signal only on a condition that the comparison signal provided to the comparator circuitry and match signal input to the comparator circuitry represent the same binary value pair. In that example, no transistor of the comparator circuitry is operated when the comparison signal provided to the comparator circuitry and match signal input to the comparator circuitry represent different binary value pairs. | 05-03-2012 |
20120110594 | LOAD BALANCING WHEN ASSIGNING OPERATIONS IN A PROCESSOR - A method and apparatus for assigning operations in a processor are provided. An incoming instruction is received. The incoming instruction is capable of being processed: only by a first processing unit (PU), only by a second PU or by either first and second PUs. The processing of first and second PUs is load balanced by assigning the received instructions capable of being processed by either the first and the second PUs based on a metric representing differential loads placed on the first and the second PUs. | 05-03-2012 |
20120124435 | AVOIDING BIST AND MBIST INTRUSION LOGIC IN CRITICAL TIMING PATHS - Methods, systems, and apparatuses are presented that remove BIST intrusion logic from critical timing paths of a microcircuit design without significant impact on testing. In one embodiment, BIST data is multiplexed with scan test data and serially clocked in through scan test cells for BIST testing. In another embodiment, BIST data is injected into the feedback path of one or more data latches. In a third embodiment, BIST data is injected into the result data path of a multi-cycle ALU within an execution unit. In each embodiment, BIST circuitry is eliminated from critical timing paths. | 05-17-2012 |
20120137185 | METHOD AND APPARATUS FOR PERFORMING A MEMORY BUILT-IN SELF-TEST ON A PLURALITY OF MEMORY ELEMENT ARRAYS - A method and apparatus are described for performing a memory built-in self-test (MBIST) on a plurality of memory element arrays. Control packets are output over a first ring bus to respective ones of the arrays. Each of the arrays receives its respective control packet via the first ring bus, and reads commands residing in a plurality of fields within the respective control packet. Each of the arrays performs at least one self-test based on the commands, and outputs a respective result packet over a second ring bus. Each result packet indicates the results of the self-test performed on the array. Each control packet is transmitted in its own individual time slot to a respective one of the arrays. | 05-31-2012 |
20120143885 | HYBRID SOURCES PREREADY DETERMINATION - A method and apparatus for maintaining source ready information are disclosed. A first copy of the source ready information is stored in an Architectural Register Name (ARN)-indexed structure and a second copy of the source ready information is stored in a Physical Register Number (PRN)-indexed structure. As new instructions become available that require at least one source, the ARN-indexed structure is accessed. If at least one new source becomes available, the ARN-indexed structure and the PRN-indexed structure are updated to include information regarding the new sources. | 06-07-2012 |
20120144173 | UNIFIED SCHEDULER FOR A PROCESSOR MULTI-PIPELINE EXECUTION UNIT AND METHODS - A unified scheduler for a processor execution unit and methods are disclosed for providing faster throughput of micro-instruction/operation execution with respect to a multi-pipeline processor execution unit. In one example, an execution unit has a plurality of pipelines that operate at a predetermined clock rate, each pipeline configured to process a selected subset of microinstructions. The execution unit has a scheduler that includes a unified queue configured to queue microinstructions for all of the pipelines and a picker configured to direct a queued microinstruction to an appropriate pipeline for processing based on an indication of readiness for picking. Preferably, when all of the pipelines are ready to receive a microinstruction for processing and there is at least one microinstruction queued that is ready for picking for each pipeline, the picker picks and directs a queued microinstructions to each of the pipelines in a single clock cycle. | 06-07-2012 |
20120144174 | MULTIFLOW METHOD AND APPARATUS FOR OPERATION FUSION - A method and apparatus for utilizing scheduling resources in a processor are disclosed. A complex operation is assigned for execution as two micro-operations; a first micro-operation and a second micro-operation. The first micro-operation, which may be an address-generation operation, is executed using at least one of a first processing unit or a load and store unit and the second micro-operation, which may be an execution operation, is executed using a second processing unit, where at least one operand of the second micro-operation is an outcome of the first micro-operation. | 06-07-2012 |
20120144175 | METHOD AND APPARATUS FOR AN ENHANCED SPEED UNIFIED SCHEDULER UTILIZING OPTYPES FOR COMPACT LOGIC - An integrated circuit is disclosed wherein microinstructions are selectively queued for execution in an execution unit having multiple pipelines where each pipeline is configured to execute a selected subset of a set of supported microinstructions. The execution unit receives microinstruction data including an operation code OpCode and an operation type OpType. The OpType data being at least one bit less that a minimum binary size of an OpCode required to uniquely identify the microinstruction. The OpType data selected to indicate a category of microinstructions having common execution requirement characteristics. The microinstructions are selectively queued for pipeline processing by the execution unit pipelines based on the OpType without decoding the OpCode of the microinstruction. | 06-07-2012 |
20120144393 | MULTI-ISSUE UNIFIED INTEGER SCHEDULER - A method and apparatus for scheduling execution of instructions in a multi-issue processor. The apparatus includes post wake logic circuitry configured to track a plurality of entries corresponding to a plurality of instructions to be scheduled. Each instruction has at least one associated source address and a destination address. The post wake logic circuitry is configured to drive a ready input indicating an entry that is ready for execution based on a current match input. A picker circuitry is configured to pick an instruction for execution based the ready input. A compare circuit is configured to determine the destination address for the picked instruction, compare the destination address to the source address for all entries and drive the current match input. | 06-07-2012 |
20120159217 | METHOD AND APPARATUS FOR PROVIDING EARLY BYPASS DETECTION TO REDUCE POWER CONSUMPTION WHILE READING REGISTER FILES OF A PROCESSOR - A method and apparatus are described for reducing power consumption in a processor. A micro-operation is selected for execution, and a destination physical register tag of the selected micro-operation is compared to a plurality of source physical register tags of micro-operations dependent upon the selected micro-operation. If there is a match between the destination physical register tag and one of the source physical register tags, a corresponding physical register file (PRF) read operation is disabled. The comparison may be performed by a wakeup content-addressable memory (CAM) of a scheduler. The wakeup CAM may send a read control signal to the PRF to disable the read operation. Disabling the corresponding PRF read operation may include shutting off power in the PRF and related logic. | 06-21-2012 |
20120179895 | METHOD AND APPARATUS FOR FAST DECODING AND ENHANCING EXECUTION SPEED OF AN INSTRUCTION - Method and apparatus for fast decoding of microinstructions are disclosed. An integrated circuit is disclosed wherein microinstructions are queued for execution in an execution unit having multiple pipelines where each pipeline is configured to execute a set of supported microinstructions. The execution unit receives microinstruction data including an operation code (opcode) or a complex opcode. The execution unit executes the microinstruction multiple times wherein the microinstruction is executed at least once to get an address value and at least once to get a result of an operation. The execution unit processes complex opcodes by utilizing both a load/store support and a simple opcode support by splitting the complex opcode into load/store and simple opcode components and creating an internal source/destination between the two components. | 07-12-2012 |
20120291037 | METHOD AND APPARATUS FOR PRIORITIZING PROCESSOR SCHEDULER QUEUE OPERATIONS - A method and processor are described for implementing programmable priority encoding to track relative age order of operations in a scheduler queue. The processor may comprise a scheduler queue configured to maintain an ancestry table including a plurality of consecutively numbered row entries and a plurality of consecutively numbered columns. Each row entry includes one bit in each of the columns. Pickers are configured to pick an operation that is ready for execution based on the age of the operation as designated by the ancestry table. The column number of each bit having a select logic value indicates an operation that is older than the operation associated with the number of the row entry that the bit resides in. | 11-15-2012 |
20130039109 | SELECTABLE MULTI-WAY COMPARATOR - A method for comparing content addressable memory (CAM) elements is disclosed. Binary values are stored in a pair of CAM elements. A comparison value is provided to a group of comparators, the comparison value based on the binary value stored in the pair of CAM elements. A match value is provided to the group of comparators, the match value corresponding to a binary value pair to be compared with the binary value stored in the pair of CAM elements. A positive match result value is output from a selected group of comparators via an output line in response to the comparison value matching the match value. | 02-14-2013 |
20130117543 | LOW OVERHEAD OPERATION LATENCY AWARE SCHEDULER - A method and apparatus for processing multi-cycle instructions include picking a multi-cycle instruction and directing the picked multi-cycle instruction to a pipeline. The pipeline includes a pipeline control configured to detect a latency and a repeat rate of the picked multi-cycle instruction and to count clock cycles based on the detected latency and the detected repeat rate. The method and apparatus further include detecting the repeat rate and the latency of the picked multi-cycle instruction, and counting clock cycles based on the detected repeat rate and the latency of the picked multi-cycle instruction. | 05-09-2013 |
20140025933 | REPLAY REDUCTION BY WAKEUP SUPPRESSION USING EARLY MISS INDICATION - A method for reducing a number of operations replayed in a processor includes decoding an operation to determine a memory address and a command in the operation. If data is not in a way predictor based on the memory address, a suppress wakeup signal is sent to an operation scheduler, and the operation scheduler suppresses waking up other operations that are dependent on the data. | 01-23-2014 |
20140136819 | REGISTER FILE MANAGEMENT FOR OPERATIONS USING A SINGLE PHYSICAL REGISTER FOR BOTH SOURCE AND RESULT - A processor includes a physical register file having physical registers and an execution unit to perform an arithmetic operation to generate a result mapped to a physical register, wherein the processor delays a write of the result to the physical register file until the result is qualified as valid. A method includes mapping the same physical register both to store load data of a load-execute operation and to subsequently store a result of an arithmetic operation of the load-execute operation, and writing the load data into the physical register. The method further includes, in a first clock cycle, executing the arithmetic operation to generate the result, and, in a second clock cycle, providing the result as a source operand for a dependent operation. The method includes, in a third clock cycle, enabling a write of the result to the physical register file responsive to the result qualifying as valid. | 05-15-2014 |
20150026436 | HYBRID TAG SCHEDULER - The present invention provides a method and apparatus for scheduling based on tags of different types. Some embodiments of the method include broadcasting a first tag to entries in a queue of a scheduler. The first tag is broadcast in response to a first instruction associated with a first entry in the queue being picked for execution. The first tag includes information identifying the first entry and information indicating a type of the first tag. Some embodiments of the method also include marking at least one second entry in the queue is ready to be picked for execution in response to at least one second tag associated with at least one second entry in the queue matching the first tag. | 01-22-2015 |
20150026685 | DEPENDENT INSTRUCTION SUPPRESSION - A method includes suppressing execution of at least one dependent instruction of a first instruction by a processor responsive to an invalid status of an ancestor load instruction associated with the first instruction. A processor includes an instruction pipeline having an execution unit to execute instructions, a load store unit for retrieving data from a memory hierarchy, and a scheduler unit. The scheduler unit selects for execution in the execution unit a first load instruction having at least one dependent instruction linked to the first load instruction for data forwarding from the load store unit and suppresses execution of a second dependent instruction of the first dependent instruction responsive to an invalid status of the first load instruction. | 01-22-2015 |
20150026686 | DEPENDENT INSTRUCTION SUPPRESSION IN A LOAD-OPERATION INSTRUCTION - A method includes suppressing execution of an operation portion of a load-operation instruction in a processor responsive to an invalid status of a load portion of load-operation instruction. A processor includes an instruction pipeline including an execution unit operable to execute instructions and a scheduler unit. The scheduler unit includes a scheduler queue and is operable to store a load-operation in the scheduler queue. The load-operation instruction includes a load portion and an operation portion. The scheduler unit schedules the load portion for execution in the execution unit, marks the operation portion in the scheduler queue as eligible for execution responsive to scheduling the load portion, receives an indication of an invalid status of the load portion, and suppresses execution of the operation portion responsive to the indication of the invalid status. | 01-22-2015 |