Class / Patent application number | Description | Number of patent applications / Date published |
712023000 | Superscalar | 20 |
20080235491 | Techniques for Maintaining a Stack Pointer - A technique for reducing stack pointer adjustment operations when stack dependent operations, which correspond to stack dependent instructions, are encountered includes setting a stack pointer to an initial value for a stack. A number of bytes associated with the stack dependent operation is determined. A stack pointer delta is then modified based upon the number of bytes associated with the stack dependent operation. A current location in the stack is determined based on the stack pointer and the stack pointer delta. | 09-25-2008 |
20080244223 | BRANCH PRUNING IN ARCHITECTURES WITH SPECULATION SUPPORT - According to one example embodiment of the inventive subject matter, the method and apparatus described herein are used to generate an optimized speculative version of a static piece of code. The portion of code is optimized in the sense that the number of instructions executed will be smaller. However, since the applied optimization is speculative, the optimized version can be incorrect and some mechanism to recover from that situation is required. Thus, the quality of the produced code will be measured by taking into account both the final length of the code as well as the frequency of misspeculation. | 10-02-2008 |
20080244224 | Scheduling a direct dependent instruction - In one embodiment, the present invention includes an apparatus having an instruction selector to select an instruction, where the selector is to store a dependent indicator to indicate a direct dependent consumer instruction of a producer instruction, a decode logic coupled to the instruction selector to receive the dependent indicator when the producer instruction is selected and to generate a wakeup signal for the direct dependent consumer instruction, and wakeup logic to receive the wakeup signal and to indicate that the producer instruction has been selected. Other embodiments are described and claimed. | 10-02-2008 |
20080263318 | Timed ports - A processor has an interface portion and an internal environment. The interface portion comprises: at least one port arranged to receive a current time value; a first register associated with the port and arranged to store a trigger time value; and comparison logic configured to detect whether the current time value matches the trigger time value and, provided that said match is detected, to transfer data between the port and an external environment and alter a ready signal to indicate the transfer. The internal environment comprises: an execution unit for transferring data between the at least one port and the internal environment; and a thread scheduler for scheduling a plurality of threads for execution by the execution unit, each thread comprising a sequence of instructions. The scheduling includes scheduling one or more of said threads for execution in dependence on the ready signal. | 10-23-2008 |
20080270749 | Instruction issue control within a multi-threaded in-order superscalar processor - A multi-threaded in-order superscalar processor | 10-30-2008 |
20080313424 | METHOD AND APPARATUS FOR SPATIAL REGISTER PARTITIONING WITH A MULTI-BIT CELL REGISTER FILE - There is provided a multi-bit storage cell for a register file. The storage cell includes a first set of storage elements for a vector slice. Each storage element respectively corresponds to a particular one of a plurality of thread sets for the vector slice. The storage cell includes a second set of storage elements for a scalar slice. Each storage element in the second set respectively corresponds to a particular one of at least one thread set for the scalar slice. The storage cell includes at least one selection circuit for selecting, for an instruction issued by a thread, a particular one of the storage elements from any of the first set and the second set based upon the instruction being a vector instruction or a scalar instruction and based upon a corresponding set from among the pluralities of thread sets to which the thread belongs. | 12-18-2008 |
20080313425 | Enhanced Load Lookahead Prefetch in Single Threaded Mode for a Simultaneous Multithreaded Microprocessor - A method, system, and computer program product are provided for enhancing the execution of independent loads in a processing unit. A processing unit detects if a long-latency miss associated with a load instruction has been encountered. Responsive to a long-latency miss, the processing unit enters a load lookahead mode. Responsive to entering the load lookahead mode, the processing unit dispatches each instruction from a first set of instructions from a first buffer with an associated vector. The processing unit determines if the first set of instructions from the first buffer have completed execution. Responsive to completed execution of the first set of instructions from the first buffer, the processing unit copies the set of vectors from a first vector array to a second vector array. Then the processing unit dispatches a second set of instructions from a second buffer with an associated vector from the second vector array. | 12-18-2008 |
20080320274 | Age matrix for queue dispatch order - An apparatus for queue allocation. An embodiment of the apparatus includes a dispatch order data structure, a bit vector, and a queue controller. The dispatch order data structure corresponds to a queue. The dispatch order data structure stores a plurality of dispatch indicators associated with a plurality of pairs of entries of the queue to indicate a write order of the entries in the queue. The bit vector stores a plurality of mask values corresponding to the dispatch indicators of the dispatch order data structure. The queue controller interfaces with the queue and the dispatch order data structure. The queue controller excludes at least some of the entries from a queue operation based on the mask values of the bit vector. | 12-25-2008 |
20090019257 | Method and Apparatus for Length Decoding and Identifying Boundaries of Variable Length Instructions - A mechanism for superscalar decode of variable length instructions. A length decode unit may obtain a plurality of instruction bytes based on a scan window of a predetermined size. The instruction bytes may be associated with a plurality of variable length instructions, which are scheduled to be executed by a processing unit. The length decode unit may, for each instruction byte, estimate the start of a next variable length instruction following a current variable length instruction, and store a first pointer. A pre-pick unit may, for each instruction byte, use the first pointer to estimate the start of a subsequent variable length instruction following the next variable length instruction within the scan window, and store a second pointer. A pick unit may use a start pointer and related first and second pointers to determine the actual start of the variable length instructions within the scan window, and generate instruction pointers. | 01-15-2009 |
20090138674 | ELECTRONIC SYSTEM FOR CHANGING NUMBER OF PIPELINE STAGES OF A PIPELINE - An electronic system includes a pipeline having a first number of pipeline stages coupled in series, a pipeline control unit, and a logic engine, wherein each pipeline stage in the pipeline is for outputting data to a next pipeline stage at each cycle of a clock signal. The pipeline control unit is for changing the first number of pipeline stages in the pipeline to a second number of pipeline stages. The logic engine is for performing operations of the electronic system in a first mode by utilizing the pipeline having the first number of pipeline stages and for performing operations of the electronic system in a second mode by utilizing the pipeline having the second number of pipeline stages. A frequency control unit and a voltage control unit, coupled to the pipeline and the logic engine, respectively adjust the frequency and voltage of the electronic system accordingly. | 05-28-2009 |
20110035569 | MICROPROCESSOR WITH ALU INTEGRATED INTO LOAD UNIT - A superscalar pipelined microprocessor includes a register set defined by its instruction set architecture, a cache memory, execution units, and a load unit, coupled to the cache memory and distinct from the other execution units. The load unit comprises an ALU. The load unit receives an instruction that specifies a memory address of a source operand, an operation to be performed on the source operand to generate a result, and a destination register of the register set to which the result is to be stored. The load unit reads the source operand from the cache memory. The ALU performs the operation on the source operand to generate the result, rather than forwarding the source operand to any of the other execution units of the microprocessor to perform the operation on the source operand to generate the result. The load unit outputs the result for subsequent retirement to the destination register. | 02-10-2011 |
20110035570 | MICROPROCESSOR WITH ALU INTEGRATED INTO STORE UNIT - A superscalar pipelined microprocessor includes a register set defined by an instruction set architecture of the microprocessor, a cache memory, execution units, and a store unit, coupled to the cache memory and distinct from the other execution units of the microprocessor. The store unit comprises an ALU. The store unit receives an instruction that specifies a source register of the register set and an operation to be performed on a source operand to generate a result. The store unit reads the source operand from the source register. The ALU performs the operation on the source operand to generate the result, rather than forwarding the source operand to any of the other execution units of the microprocessor to perform the operation on the source operand to generate the result. The store unit operatively writes the result to the cache memory. | 02-10-2011 |
20110167241 | SUPERCONDUCTING CIRCUIT FOR HIGH-SPEED LOOKUP TABLE - A high-speed lookup table is designed using Rapid Single Flux Quantum (RSFQ) logic elements and fabricated using superconducting integrated circuits. The lookup table is composed of an address decoder and a programmable read-only memory array (PROM). The memory array has rapid parallel pipelined readout and slower serial reprogramming of memory contents. The memory cells are constructed using standard non-destructive reset-set flip-flops (RSN cells) and data flip-flops (DFF cells). An n-bit address decoder is implemented in the same technology and closely integrated with the memory array to achieve high-speed operation as a lookup table. The circuit architecture is scalable to large two-dimensional data arrays. | 07-07-2011 |
20120226891 | PROCESSOR WITH INCREASED EFFICIENCY VIA CONTROL WORD PREDICTION - Methods and apparatuses are provided for increased efficiency in a processor via control word prediction. The apparatus comprises an operational unit capable of determining whether an instruction will change a first control word to a second control word for processing dependent instructions. Execution units process the dependent instructions using a predicted control word and compare the second control word to the predicted control word. A scheduling unit causes the execution units to reprocess the dependent instructions when the predicted control word does not match the second control word. The method comprises determining that an instruction will change a first control word to a second control word and processing the dependent instructions using a predicted control word. The second control word is compared to the predicted control word and the dependent instructions are reprocessed using the second control word when the predicted control word does not match the second control word. | 09-06-2012 |
20140281375 | RUN-TIME INSTRUMENTATION HANDLING IN A SUPERSCALAR PROCESSOR - A method and a computer program product are provided for a processor that handles multiple instructions simultaneously. The method includes labeling an instruction that ends a relevant sample interval from among a plurality of such instructions. The method utilizes a buffer that stores N more entries than are actually required, where N is the number of run-time instrumentation (RI) instructions younger than the instruction ending a sample interval. The method also includes recording relevant instrumentation data corresponding to the sample interval and providing the instrumentation data in response to identification of the sample interval. | 09-18-2014 |
20150100759 | PIPELINED FINITE STATE MACHINE - A system and method for controlling operation of a pipeline. In one embodiment, a pipelined datapath includes a plurality of processing stages and a pipeline controller. Each of the processing stages is configured to further process data provided by a previous one of the processing stages. The pipeline controller is configured to control operation of the processing stages. The pipeline controller includes a pipelined finite state machine. The pipelined finite state machine includes a plurality of control stages. Each of the control stages is configured to control operation of a single one of the processing stages, and to receive a state value that defines a state of the control stage for controlling the single one of the processing stages from a previous control stage. | 04-09-2015 |
20160055004 | METHOD AND APPARATUS FOR NON-SPECULATIVE FETCH AND EXECUTION OF CONTROL-DEPENDENT BLOCKS - An apparatus and method are described for non-speculative execution of conditional instructions. For example, one embodiment of a processor comprises: a register set including a first register to store a set of one or more condition bits; non-speculative execution logic to execute a first instruction to identify a first target instruction strand in response to a first conditional value read from the set of condition bits, the first instruction to wait until the first conditional value becomes known before causing the first target instruction strand to be fetched and executed, the non-speculative execution logic to execute a second instruction to identify an end of the first target instruction strand and responsively identify a new current instruction pointer for instructions which follow the second instruction; and out-of-order execution logic to fetch and execute the instructions which follow the second instruction prior to the execution of the second instruction. | 02-25-2016 |
20160179550 | FAST VECTOR DYNAMIC MEMORY CONFLICT DETECTION | 06-23-2016 |
20160202990 | LINKABLE ISSUE QUEUE PARALLEL EXECUTION SLICE FOR A PROCESSOR | 07-14-2016 |
20160202992 | LINKABLE ISSUE QUEUE PARALLEL EXECUTION SLICE PROCESSING METHOD | 07-14-2016 |
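Several of the entries above describe dispatch-ordering structures; application 20080320274 (age matrix for queue dispatch order) is a compact example. The sketch below is a minimal software model of that general idea, written under stated assumptions rather than as the patented hardware design: a matrix of pairwise "older-than" bits records write order, a ready mask excludes entries from the pick, and the dispatcher selects the entry that is older than every other unmasked entry. The class name and method names (AgeMatrixQueue, allocate, pick_oldest_ready) are illustrative assumptions.

```python
# Minimal software model of an age-matrix dispatch picker, illustrating the
# idea summarized in application 20080320274. Names and details are
# illustrative assumptions, not the patented hardware design.

class AgeMatrixQueue:
    def __init__(self, size):
        self.size = size
        self.valid = [False] * size                     # which entries hold instructions
        self.ready = [False] * size                     # mask: entries eligible for dispatch
        # older[i][j] is True when entry i was written before entry j
        self.older = [[False] * size for _ in range(size)]

    def allocate(self, idx, ready=False):
        """Write a new entry; it is younger than every currently valid entry."""
        for j in range(self.size):
            self.older[idx][j] = False                  # the new entry is older than nothing
            if self.valid[j]:
                self.older[j][idx] = True               # every existing entry is older than it
        self.valid[idx] = True
        self.ready[idx] = ready

    def pick_oldest_ready(self):
        """Return the ready entry that is older than every other ready entry."""
        for i in range(self.size):
            if not (self.valid[i] and self.ready[i]):
                continue
            others = [j for j in range(self.size)
                      if j != i and self.valid[j] and self.ready[j]]
            if all(self.older[i][j] for j in others):
                self.valid[i] = False                   # dispatch (deallocate) the entry
                return i
        return None                                     # nothing is ready


q = AgeMatrixQueue(4)
q.allocate(2, ready=True)    # written first, so oldest
q.allocate(0, ready=True)
q.allocate(3, ready=False)   # masked out of dispatch
assert q.pick_oldest_ready() == 2
assert q.pick_oldest_ready() == 0
```

In hardware this selection is a single combinational pick across all entries rather than the sequential loop shown here; the loop simply makes the oldest-ready rule easy to read in software.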