Entries |
Document | Title | Date |
20080215853 | System and Method for Line Rate Frame Processing Engine Using a Generic Instruction Set - A system comprises a frame parser and lookup engine operable to receive an incoming data frame, extract control data from payload data in the data frame, and using the control data to access a memory to fetch a plurality of instructions, a destination and tag management module operable to receive the fetched instructions and execute the instructions to transform the data frame control data, and an assemble module operable to assemble the transformed control data and the payload data. | 09-04-2008 |
20080222392 | Method and arrangements for pipeline processing of instructions - In one embodiment a method for parallel processing in a processing pipeline is disclosed. The method can include determining that a jump instruction is loaded in a main path of a processing pipeline prior to the jump instruction being executed. The method can load a jump hit target instruction in a bypass path of the pipeline in response to determining that the jump instruction is loaded in the main path. The bypass path can bypass at least one stage of the processing pipeline and couple into the main path in a stage that is prior to the execute stage. The method can switch the jump hit target instruction into the main path in response to a successful jump-hit condition. The bypass path and the main path can operate concurrently and in parallel. | 09-11-2008 |
20080222393 | Method and arrangements for pipeline processing of instructions - In one embodiment a method for operating a processing pipeline is disclosed. The method can include fetching an instruction in a first clock cycle, decoding the instruction in a second clock cycle and fetching an instruction data associated with the instruction in the second clock cycle. The method can also include associating the instruction data with the instruction and feeding the instruction and the instruction data to a processing unit utilizing the association. The method can also include loading a register with instruction data wherein the number of bits of instruction data loaded per clock cycle varies based on the amount of instruction data required to execute at least one instruction in a clock cycle. | 09-11-2008 |
20080229067 | DATA POINTERS WITH FAST CONTEXT SWITCHING - An apparatus and method are disclosed for multiple data pointer registers and a means for quickly switching active context between the data pointer registers. | 09-18-2008 |
20080229068 | ADAPTIVE FETCH GATING IN MULTITHREADED PROCESSORS, FETCH CONTROL AND METHOD OF CONTROLLING FETCHES - A multithreaded processor, fetch control for a multithreaded processor and a method of fetching in the multithreaded processor. Processor event and use (EU) signals are monitored for downstream pipeline conditions indicating pipeline execution thread states. Instruction cache fetches are skipped for any thread that is incapable of receiving fetched cache contents, e.g., because the thread is full or stalled. Also, consecutive fetches may be selected for the same thread, e.g., on a branch mis-predict. Thus, the processor avoids wasting power on unnecessary or place keeper fetches. | 09-18-2008 |
20080244229 | Information processing apparatus - In an information processing apparatus, a fetch to a storage address of a first storage unit which stores a first instruction executed at first within a plurality of instructions that is included in a software and executed when a processor starts the software via the channel is detected. It is detected that the processor executed a specific instruction within the plurality of instructions via the channel. It is determined whether a predetermined time has passed since the detection of the fetch to the storage address until the detection of the execution of the specific instruction. When it is determined that the predetermined time has not passed, it is determined whether an interrupt to the processor is prohibited based on a result of the processor executing the specific instruction, and an access is released to the process according to a result of determination. | 10-02-2008 |
20080250228 | Integrated circuit with restricted data access - A semiconductor integrated circuit includes a hardware mechanism arranged to ensure that associations between instructions and data are enforced so that a processor cannot fetch data from an instruction that is not authorised to do so. A Memory Protection Unit stores entries comprising instructions and associated data memory ranges. A hardware arrangement impairs the operation of the circuit if the CPU attempts to make a data fetch from an instruction that is outside the range associated with data in a Memory Protection Unit. Such functioning may be by issuing a chip reset. The Memory Protection Unit may be implemented in a Memory Management Unit having an extension so as to store a validity flag. The validity flag may only be set by a secure process such as the CPU well entrusted code or by a separate trusted hardware source. In this way, an operating system may function as normal referring to the Memory Management Unit as necessary, but security may be enforced through hardware. | 10-09-2008 |
20080256333 | SYSTEM AND METHOD FOR IGNORING FETCH PROTECTION - A system, method, and program product is provided that receives an instruction to fetch data from a data page. The data page is associated with a storage key and a fetch protection bit, and the instruction is pointed to by the program status word (PSW) that includes a PSW key and an ignore fetch protection bit. The data is fetched from the data page when the PSW key is a non-zero value, the PSW key is different than the storage key, and both the fetch protection bit and the ignore fetch protection bit are set ON. However, the data is not fetched from the data page when the PSW key is a non-zero value, the PSW key is different than the storage key, the fetch protection bit is ON, and the ignore fetch protection bit is OFF. | 10-16-2008 |
20080256334 | Processing System and Method for Executing Instructions - A processing system for executing instructions comprises a first part ( | 10-16-2008 |
20080256335 | MICROPROCESSOR, MICROCOMPUTER, AND ELECTRONIC INSTRUMENT - A microprocessor includes a pipeline control section which controls a pipeline process. The pipeline control section decodes an instruction code of an interrupt instruction and causes an immediate generation section to generate a vector address used for referring to information relating to a branch destination address corresponding to the interrupt instruction stored in a vector table based on a decoding result in a first instruction execution stage of the interrupt instruction. The pipeline control section controls the pipeline process so that the vector address is set in a pipeline register | 10-16-2008 |
20080263326 | METHOD AND APPARATUS FOR AN EFFICIENT MULTI-PATH TRACE CACHE DESIGN - A novel trace cache design and organization to efficiently store and retrieve multi-path traces. A goal is to design a trace cache, which is capable of storing multi-path traces without significant duplication in the traces. Furthermore, the effective access latency of these traces is reduced. | 10-23-2008 |
20080270757 | Fetch and Dispatch Disassociation Apparatus for Multistreaming Processors - A dynamic multistreaming processor has instruction queues, each instruction queue corresponding to an instruction stream, and execution units. The dynamic multistreaming processor also has a dispatch stage to select at least one instruction from one of the instruction queues and to dispatch the selected at least one instruction to one of the execution units. Lastly the dynamic multistreaming processor has a queue counter, associated with each instruction queue, for indicating the number of instructions in each queue, and a fetch counter, associated with each instruction queue, for indicating an address from which to obtain instructions when the associated instruction queue is not full. The dynamic multistreaming processor might also have fetch counters for indicating a next instruction address from which to obtain at least one instruction when the associated instruction queue is not full. The dynamic multistreaming processor could also have a second counter for indicating a next instruction address. | 10-30-2008 |
20080276069 | METHOD AND APPARATUS FOR PREDICTIVE DECODING - Predictive decoding is achieved by fetching an instruction, accessing a predictor containing predictor information including prior instruction execution characteristics, obtaining predictor information for the fetched instruction from the predictor; and generating a selected one of a plurality of decode operation streams corresponding to the fetched instruction. The decode operation stream is selected based on the predictor information. | 11-06-2008 |
20080276070 | REDUCING THE FETCH TIME OF TARGET INSTRUCTIONS OF A PREDICTED TAKEN BRANCH INSTRUCTION - A method and processor for reducing the fetch time of target instructions of a predicted taken branch instruction. Each entry in a buffer, referred to herein as a “branch target buffer”, may store an address of a branch instruction predicted taken and the instructions beginning at the target address of the branch instruction predicted taken. When an instruction is fetched from the instruction cache, a particular entry in the branch target buffer is indexed using particular bits of the fetched instruction. The address of the branch instruction in the indexed entry is compared with the address of the instruction fetched from the instruction cache. If there is a match, then the instructions beginning at the target address of that branch instruction are dispatched directly behind the branch instruction. In this manner, the fetch time of target instructions of a predicted taken branch instruction is reduced. | 11-06-2008 |
20080276071 | REDUCING THE FETCH TIME OF TARGET INSTRUCTIONS OF A PREDICTED TAKEN BRANCH INSTRUCTION - A method and processor for reducing the fetch time of target instructions of a predicted taken branch instruction. Each entry in a buffer, referred to herein as a “branch target buffer”, may store an address of a branch instruction predicted taken and the instructions beginning at the target address of the branch instruction predicted taken. When an instruction is fetched from the instruction cache, a particular entry in the branch target buffer is indexed using particular bits of the fetched instruction. The address of the branch instruction in the indexed entry is compared with the address of the instruction fetched from the instruction cache. If there is a match, then the instructions beginning at the target address of that branch instruction are dispatched directly behind the branch instruction. In this manner, the fetch time of target instructions of a predicted taken branch instruction is reduced. | 11-06-2008 |
20080307201 | Method and Apparatus for Cooperative Software Multitasking In A Processor System with a Partitioned Register File - A processor system executes multiple applet programs within a software application program in an information handling system. The information handling system includes operating system software that manages processor system hardware and software in a multi-tasking environment. In particular, the operating system software manages partitioning of a register file in the processor system to achieve a cooperative relationship among multiple applet programs within respective partitions of the register file. In one embodiment, the operating system software manages unique applet ID's to modify register file partition sizes and locations during applet program instruction text execution. In one embodiment, applet ID masking hardware provides sharing of register file space among multiple copies of applet program code. | 12-11-2008 |
20080313431 | Method and System for Altering Processor Execution of a Group of Instructions - An embodiment of the invention is a processor for detecting one or more groups of instructions and initiating a processor action upon detecting one or more groups of instructions. The processor includes an instruction unit for fetching and decoding a group of instructions. An instruction register receives the group of instruction having at least one instruction opcode. A control register includes a control word including a control opcode and an action field defining a processor action. An execution unit includes compare logic for comparing the instruction opcode and the control opcode. The execution unit initiates the processor action upon the compare logic detecting a hit between the instruction opcode and the control opcode. | 12-18-2008 |
20080320281 | PROCESSING MODULE WITH MMW TRANSCEIVER INTERCONNECTION - A processing module includes a fetch and decode module, an instruction register, a data register, an execution module, and a MMW transceiver section. The fetch and decode module is operable to fetch and decode an instruction of a program and to identify data associated with the instruction. The execution module is operable to execute the instruction upon the data associated with the instruction. The MMW transceiver section is operable to wirelessly receive at least one of the instruction and the data associated with the instruction from memory. | 12-25-2008 |
20090006811 | Method and System for Expanding a Conditional Instruction into a Unconditional Instruction and a Select Instruction - A method of expanding a conditional instruction having a plurality of operands within a pipeline processor is disclosed. The method identifies the conditional instruction prior to an issue stage and determines if the plurality of operands exceeds a predetermined threshold. The method expands the conditional instruction into a non-conditional instruction and a select instruction. The method further executes the non-conditional instruction and the select instruction in separate pipelines. | 01-01-2009 |
20090006812 | Method and Apparatus for Accessing a Cache With an Effective Address - A method and apparatus for accessing a processor cache. The method includes executing an access instruction in a processor core of the processor. The access instruction provides an untranslated effective address of data to be accessed by the access instruction. The method also includes determining whether a level one cache for the processor core includes the data corresponding to the effective address of the access instruction. The effective address of the access instruction is used without address translation to determine whether the level one cache for the processor core includes the data corresponding to the effective address. If the level one cache includes the data corresponding to the effective address, the data for the access instruction is provided from the level one cache. | 01-01-2009 |
20090037695 | DATA FETCH CIRCUIT AND METHOD THEREOF - A data fetch circuit and a method thereof are provided. A multi-phase clock signal is generated according to an input clock, and an input data is over-sampled according to the multi-phase clock signal in order to detect transition points of the input data. One of reference phases of the multi-phase clock signal is selected according to the detected transition point for fetching the input data and obtaining enough setup/hold time margin. Accordingly, appropriate data fetch points is found without complicated negative feed-back mechanism. Besides, a periodical monitoring mechanism may be further adopted for improving the accuracy of data fetch. | 02-05-2009 |
20090037696 | PROCESSOR - A processor ( | 02-05-2009 |
20090063819 | Method and Apparatus for Dynamically Managing Instruction Buffer Depths for Non-Predicted Branches - A method and apparatus for dynamically managing instruction buffer depths for non-predicted branches reduces wasted energy and resources associated with low confidence branch prediction conditions. A portion of the instruction buffer for a instruction thread is allocated for storing predicted branch instruction streams and another portion, which may be zero-sized during high prediction confidence conditions, is allocated to the non-predicted branch instruction stream. The size of the buffers is adjusted dynamically in conformity with an on-going prediction confidence that provides a measure of how well branch prediction mechanisms are working for a given instruction thread. An alternate instruction fetch address table can be maintained and multiplexed with the main fetch address register for addressing the instruction cache, so that the instruction stream can be quickly shifted to the non-predicted path when a branch instruction is resolved to the non-predicted path. | 03-05-2009 |
20090070554 | Register File System and Method for Pipelined Processing - The present disclosure includes a multi-threaded processor that includes a first register file associated with a first thread and a second register file associated with a second thread. At least one hardware resource is shared by the first and second register files. In addition, the first thread may have a pipeline access position that is non-sequential to the second thread. A method of accessing a plurality of register files is also disclosed. The method includes reading data from a first register file while concurrently reading data from a second register file. The first register file is associated with a first instruction stream and the second register file is associated with a second instruction stream. The first instruction stream is sequential to the second instruction stream in an execution pipeline of a processor, and the first register file is in a non-adjacent location with respect to the second register file. | 03-12-2009 |
20090070555 | DEVICE AND METHOD FOR FINDING EXTREME VALUES IN A DATA BLOCK - A method for locating an extreme value data chunk within a data block, the method includes: fetching, by a processor, an instruction; fetching, in response to a content of the instruction, a data unit that comprises multiple data chunks; selectively masking the fetched data chunks in response to a value of a mask; comparing, by a hardware accelerator, between values of valid data chunks to provide a extreme value data chunk; wherein valid data chunks include un-masked data chunks that belong to the data block; updating the value of the mask and jumping to the stage of fetching a new data unit, until the whole data block is fetched. | 03-12-2009 |
20090089547 | SYSTEM AND METHOD FOR MONITORING DEBUG EVENTS - A system has a pipelined processor for executing a plurality of instructions by sequentially fetching, decoding, executing and writing results associated with execution of each instruction. Debug circuitry is coupled to the pipelined processor for monitoring execution of the instructions to determine when a debug event occurs. The debug circuitry generates a debug exception to interrupt instruction processing flow. The debug circuitry has control circuitry for indicating a number of instructions, if any, that complete instruction execution between an instruction that caused the debug event and a point in instruction execution when the exception is taken. | 04-02-2009 |
20090106533 | DATA PROCESSING APPARATUS - The data processing apparatus includes two or more execution resources, each enabling a predetermined process for executing an instruction. The execution resources enable a pipeline process. Each execution resource treats instructions according to an in-order system following the instructions' flow order in case that the execution resource is in charge of the instructions. Also, each execution resource treats instructions according to an out-of-order system regardless of the instructions' flow order in case that the instructions are treated by different execution resources. Thus, local processes in the execution resources can be simplified and materialized in a small-scale of hardware. Consequently, the need for the whole synchronization in processing across execution resources is eliminated, and the locality of processes and the efficiency of electric power are increased. | 04-23-2009 |
20090113177 | INTEGRATED CIRCUIT WITH DMA MODULE FOR LOADING PORTIONS OF CODE TO A CODE MEMORY FOR EXECUTION BY A HOST PROCESSOR THAT CONTROLS A VIDEO DECODER - A system, method, and apparatus for dynamically booting processor code memory with a wait instruction is presented herein. A wait instruction precedes the transfer of a new code portion to the code memory. The wait instruction causes the processor to temporarily cease using the code memory. When the processor ceases using the code memory, the processor signals a direct memory access (DMA) module to transfer a new code portion to the code memory. The DMA module transfers the new code portion to the code memory and transmits a signal to the processor when the transfer is completed. The signal causes the processor to resume. When the processor resumes, the processor begins executing the instructions at the next code address. | 04-30-2009 |
20090113178 | Microprocessor based on event-processing instruction set and event-processing method using the same - Provided are a microprocessor based on event-processing instruction set and an event-processing method using the same. The microprocessor includes an event register controlling an event according to an event-processing instruction set provided in an instruction set architecture (ISA) and an event controller transmitting externally generated events into the microprocessor. Therefore, the microprocessor may be useful to reduce its unnecessary power consumption by suspending the execution of its program when an instruction decoded to execute the program is an event-processing instruction, and also to cut off its unnecessary power consumption that is caused for an interrupt delay period since the program of the microprocessor may be executed again by immediately re-running the microprocessor with the operation of the event register and the event controller when external events are generated. | 04-30-2009 |
20090113179 | OPERATIONAL PROCESSING APPARATUS, PROCESSOR, PROGRAM CONVERTING APPARATUS AND PROGRAM - The present invention provides an operational processing apparatus which can guarantee a period for executing instructions in the shortest cycle when the operational processing apparatus synchronizes with a hardware accelerator. A processor in the present invention simultaneously issues and executes instructions including instruction groups having a simultaneously issueable instruction. The processor executes a program including a specific instruction. The specific instruction instructs to exclude an instruction subsequent to the specific instruction out of the instruction groups including the specific instruction, and to suspend issuing the instruction subsequent to the specific instruction only during a predetermined period immediately after the specific instruction is issued. | 04-30-2009 |
20090113180 | Fetch Director Employing Barrel-Incrementer-Based Round-Robin Apparatus For Use In Multithreading Microprocessor - A fetch director in a multithreaded microprocessor that concurrently executes instructions of N threads is disclosed. The N threads request to fetch instructions from an instruction cache. In a given selection cycle, some of the threads may not be requesting to fetch instructions. The fetch director includes a circuit for selecting one of threads in a round-robin fashion to provide its fetch address to the instruction cache. The circuit 1-bit left rotatively increments a first addend by a second addend to generate a sum that is ANDed with the inverse of the first addend to generate a 1-hot vector indicating which of the threads is selected next. The first addend is an N-bit vector where each bit is false if the corresponding thread is requesting to fetch instructions from the instruction cache. The second addend is a 1-hot vector indicating the last selected thread. In one embodiment threads with an empty instruction buffer are selected at highest priority; a last dispatched but not fetched thread at middle priority; all other threads at lowest priority. The threads are selected round-robin within the highest and lowest priorities. | 04-30-2009 |
20090119485 | Predecode Repair Cache For Instructions That Cross An Instruction Cache Line - A predecode repair cache is described in a processor capable of fetching and executing variable length instructions having instructions of at least two lengths which may be mixed in a program. An instruction cache is operable to store in an instruction cache line instructions having at least a first length and a second length, the second length longer than the first length. A predecoder is operable to predecode instructions fetched from the instruction cache that have invalid predecode information to form repaired predecode information. A predecode repair cache is operable to store the repaired predecode information associated with instructions of the second length that span across two cache lines in the instruction cache. Methods for filling the predecode repair cache and for executing an instruction that spans across two cache lines are also described. | 05-07-2009 |
20090119486 | Method and a System for Accelerating Procedure Return Sequences - A method for retrieving a return address from a link stack when returning from a procedure in a pipeline processor is disclosed. The method identifies a retrieve instruction operable to retrieve a return address from a software stack. The method further identifies a branch instruction operable to branch to the return address. The method retrieves the return address from the link stack, in response to both the instruction and the branch instruction being identified and fetches instructions using the return address. | 05-07-2009 |
20090119487 | ARITHMETIC PROCESSING APPARATUS FOR EXECUTING INSTRUCTION CODE FETCHED FROM INSTRUCTION CACHE MEMORY - An arithmetic processing apparatus includes a cache block which stores a plurality of instruction codes from a main memory, a central processing unit which fetch-accesses the cache block and sequentially loads and executes the plurality of instruction codes, and a repeat buffer which stores an instruction code group corresponding to a buffer size, the instruction code group ranging from a head instruction code to a terminal instruction code among the head instruction code to an end instruction code of a repeat block repeatedly executed in the processing program, in the plurality of instruction codes stored in the cache block. The arithmetic processing apparatus further includes an instruction cache control unit which performs control so that the instruction code group stored in the repeat buffer is selected and supplied to the central processing unit when the repeat block is repeatedly executed. | 05-07-2009 |
20090132790 | SYSTEM AND METHOD FOR PROCESSOR WITH PREDICTIVE MEMORY RETRIEVAL ASSIST - A system and method are described for a memory management processor which, using a table of reference addresses embedded in the object code, can open the appropriate memory pages to expedite the retrieval of information from memory referenced by instructions in the execution pipeline. A suitable compiler parses the source code and collects references to branch addresses, calls to other routines, or data references, and creates reference tables listing the addresses for these references at the beginning of each routine. These tables are received by the memory management processor as the instructions of the routine are beginning to be loaded into the execution pipeline, so that the memory management processor can begin opening memory pages where the referenced information is stored. Opening the memory pages where the referenced information is located before the instructions reach the instruction processor helps lessen memory latency delays which can greatly impede processing performance. | 05-21-2009 |
20090138678 | Multifunction Hexadecimal Instruction Form System and Program Product - A new zSeries floating-point unit has a fused multiply-add dataflow capable of supporting two architectures and fused MULTIPLY and ADD and Multiply and SUBTRACT in both RRF and RXF formats for the fused functions. Both binary and hexadecimal floating-point instructions are supported for a total of 6 formats. The floating-point unit is capable of performing a multiply-add instruction for hexadecimal or binary every cycle with a latency of 5 cycles. This supports two architectures with two internal formats with their own biases. This has eliminated format conversion cycles and has optimized the width of the dataflow. The unit is optimized for both hexadecimal and binary floating-point architecture supporting a multiply-add/subtract per cycle. | 05-28-2009 |
20090138679 | Enhanced Boolean Processor - A processor including a Boolean logic unit, wherein the Boolean logic unit is operated for performing the short-circuit evaluation of a Normal Form Boolean expression/operation, a plurality of input/output interfaces in communication with the Boolean logic unit, wherein the plurality of input/output interfaces are operated for receiving a plurality of compiled Boolean expressions/operations and transmitting a plurality of compiled results, and a plurality of registers coupled to the plurality of input/output interface circuits, wherein the plurality of multi-bit registers include an instruction register, a first address register and a second address register. | 05-28-2009 |
20090172355 | INSTRUCTIONS WITH FLOATING POINT CONTROL OVERRIDE - Methods and apparatus relating to instructions with floating point control override are described. In an embodiment, floating point operation settings indicated by a floating point control register may be overridden on a per instruction basis. Other embodiments are also described. | 07-02-2009 |
20090182983 | Compare and Branch Facility and Instruction Therefore - An atomic compare and branch instruction is executed that combines the function of a compare instruction having an option field with a conditional branch or jump instruction such that condition codes are preserved rather than setting condition codes to a value representative of the compare results. One comparand is obtained from any one of a memory location or an immediate field and the other comparand is obtained from a register field. | 07-16-2009 |
20090182984 | Execute Relative Long Facility and Instructions Therefore - A method, system and program product for an execute relative instruction, which when executed fetches and executes a target instruction at a relative address and then returns processing to the next instruction following the execute relative instruction. The relative address is formed by adding the value of the program counter to a sign extended immediate field. The fetched target instruction is optionally modified before execution by OR'ing bits into predetermined bits of the target instruction. | 07-16-2009 |
20090182985 | Move Facility and Instructions Therefore - A move instruction, having a signed immediate field, copies a sign extended signed immediate field value to an operand location in memory. The size of the operand is determined by the opcode of the instruction. Preferably, the address of the operand is determined by adding a displacement field of the instruction to a value associated with a register field of the instruction. | 07-16-2009 |
20090187740 | Reducing errors in pre-decode caches - In a data processing system, data representing program instructions is fetched from memory, each instruction being from one of a plurality of sets of instructions including at least first and second sets of instructions and each program instruction within the fetched data comprising one or more blocks to be pre-decoded, each block representing a portion of an instruction. Pre-decoding circuitry is configured to perform pre-decoding operations on the blocks. For at least one portion of an instruction from the first set of instructions and at least one portion of an instruction from the second set of instructions the pre-decoding operation performed on a block fetched from memory is independent of whether the block is identified as representing the at least one portion of an instruction from the first set of instructions or as the at least one portion of an instruction from the second set of instructions. | 07-23-2009 |
20090193231 | METHOD AND APPARATUS FOR THREAD PRIORITY CONTROL IN A MULTI-THREADED PROCESSOR OF AN INFORMATION HANDLING SYSTEM - An information handling system employs a processor that includes a thread priority controller. An issue unit in the processor sends branch issue information to the thread priority controller when a branch instruction of an instruction thread issues. In one embodiment, if the branch issue information indicates low confidence in a branch prediction for the branch instruction, the thread priority controller speculatively increases or boosts the priority of the instruction thread containing this low confidence branch instruction. In the manner, should a branch redirect actually occur due to a mispredict, a fetcher is ready to access a redirect address in a memory array sooner than would otherwise be possible. | 07-30-2009 |
20090193232 | Apparatus, processor and method of cache coherency control - An apparatus includes a plurality of processors each of which includes a cache memory, and a controller which suspends a request of at least one of the processors during a predetermined period when a processor fetches a data from a main memory to the cache memory, wherein the controller suspends the request of at least one of the processors except the processor which fetches the data from the main memory to the cache memory. | 07-30-2009 |
20090198962 | DATA PROCESSING SYSTEM, PROCESSOR AND METHOD OF DATA PROCESSING HAVING BRANCH TARGET ADDRESS CACHE INCLUDING ADDRESS TYPE TAG BIT - In at least one embodiment, a processor includes an execution unit and instruction sequencing logic that fetches instructions from a memory system for execution by the execution unit. The instruction sequencing logic includes branch logic that outputs predicted branch target addresses for use as instruction fetch addresses. The branch logic includes a branch target address prediction circuitry concurrently holding a first entry providing storage for a first branch target address prediction associating a first instruction fetch address with a first branch target address to be used as an instruction fetch address and a second entry providing storage for a second branch target address prediction associating the first instruction fetch address with a different second branch target address. The first entry indicates a first instruction address type for the first instruction fetch address, and the second entry indicates a second instruction address type for the first instruction fetch address. | 08-06-2009 |
20090198963 | COMPLETION OF ASYNCHRONOUS MEMORY MOVE IN THE PRESENCE OF A BARRIER OPERATION - A method within a data processing system by which a processor executes an asynchronous memory move (AMM) store (ST) instruction to complete a corresponding AMM operation in parallel with an ongoing (not yet completed), previously issued barrier operation. The processor receives the AMM ST instruction after executing the barrier operation (or SYNC instruction) and before the completion of the barrier operation or SYNC on the system fabric. The processor continues executing the AMM ST instruction, which performs a move in virtual address space and then triggers the generation of the AMM operation. The AMM operation proceeds while the barrier operation continues, independent of the processor. The processor stops further execution of all other memory access requests, excluding AMM ST instructions that are received after the barrier operation, but before completion of the barrier operation. | 08-06-2009 |
20090210659 | PROCESSOR AND METHOD FOR WORKAROUND TRIGGER ACTIVATED EXCEPTIONS - A processor includes a microarchitecture for working around a processing flaw, the microarchitecture including: at least one detector adapted for detecting a predetermined state associated with the processing flaw; and at least one mechanism to modify default processor processing behavior; and upon modification of processing behavior, the processing of an instruction involving the processing flaw can be completed by avoiding the processing flaw. | 08-20-2009 |
20090210660 | Prioritising of instruction fetching in microprocessor systems - A method and system are provided for prioritising the fetching of instructions for each of a plurality of executing instruction threads in a multi-threaded processor. Instructions come from at least one source of instructions. Each thread has a number of threads buffered for execution in an instruction buffer | 08-20-2009 |
20090217002 | SYSTEM AND METHOD FOR PROVIDING ASYNCHRONOUS DYNAMIC MILLICODE ENTRY PREDICTION - A system and method for asynchronous dynamic millicode entry prediction in a processor are provided. The system includes a branch target buffer (BTB) to hold branch information. The branch information includes: a branch type indicating that the branch represents a millicode entry (mcentry) instruction targeting a millicode subroutine, and an instruction length code (ILC) associated with the mcentry instruction. The system also includes search logic to perform a method. The method includes locating a branch address in the BTB for the mcentry instruction targeting the millicode subroutine, and determining a return address to return from the millicode subroutine as a function of the an instruction address of the mcentry instruction and the ILC. The system further includes instruction fetch controls to fetch instructions of the millicode subroutine asynchronous to the search logic. The search logic may also operate asynchronous with respect to an instruction decode unit. | 08-27-2009 |
20090217003 | METHOD, SYSTEM, AND COMPUTER PROGRAM PRODUCT FOR REDUCING CACHE MEMORY POLLUTION - A method for reducing cache memory pollution including fetching an instruction stream from a cache line, preventing a fetching for the instruction stream from a sequential cache line, searching for a next predicted taken branch instruction, determining whether a length of the instruction stream extends beyond a length of the cache line based on the next predicted taken branch instruction, continuing preventing the fetching for the instruction stream from the sequential cache line if the length of the instruction stream does not extend beyond the length of the cache line, and allowing the fetching for the instruction stream from the sequential cache line if the length of the instruction stream extends beyond the length of the cache line, whereby the fetching from the sequential cache line and a resulting polluting of a cache memory that stores the instruction stream is minimized. A corresponding system and computer program product. | 08-27-2009 |
20090222645 | METRIC FOR SELECTIVE BRANCH TARGET BUFFER (BTB) ALLOCATION - A method and data processing system allocates entries in a branch target buffer (BTB). Instructions are fetched from a plurality of instructions and one of the plurality of instructions is determined to be a branch instruction. A corresponding branch target address is determined. A determination is made whether the branch target address is stored in a branch target buffer (BTB). When the branch target address is not stored in the branch target buffer, an entry in the branch target buffer is identified for allocation to receive the branch target address based upon stored metrics such as data processing cycle saving information and branch prediction state. In one form the stored metrics are stored in predetermined fields of the entries of the BTB. | 09-03-2009 |
20090235051 | System and Method of Selectively Committing a Result of an Executed Instruction - In a particular embodiment, a method is disclosed that includes receiving an instruction packet including a first instruction and a second instruction that is dependent on the first instruction at a processor having a plurality of parallel execution pipelines, including a first execution pipeline and a second execution pipeline. The method further includes executing in parallel at least a portion of the first instruction and at least a portion of the second instruction. The method also includes selectively committing a second result of executing the at least a portion of the second instruction with the second execution pipeline based on a first result related to execution of the first instruction with the first execution pipeline. | 09-17-2009 |
20090235052 | Data Processing Device and Electronic Equipment - A data processing device is provided using pipeline architecture to reduce a time loss due to a branch without causing an increase in circuit scale. The data processing device uses pipeline control. The data processing device includes an instruction queue in which a plurality of instruction codes can be fetched, a fetch address operation circuit which calculates a fetch address, a fetch circuit which fetches an instruction code based on the fetch address, and a branch information setting circuit which decodes a branch setting instruction, stores a branch address in a branch address storage register, and stores a branch target address in a branch target address storage register. The fetch address operation circuit compares either a previous fetch address or an expected next fetch address with a value stored in the branch address storage register, and determines a next fetch address to be output, based on the comparison result. | 09-17-2009 |
20090240918 | METHOD, COMPUTER PROGRAM PRODUCT, AND HARDWARE PRODUCT FOR ELIMINATING OR REDUCING OPERAND LINE CROSSING PENALTY - Eliminating or reducing an operand line crossing penalty by performing an initial fetch for an operand from a data cache of a processor. The initial fetch is performed by allowing or permitting the initial fetch to occur unaligned with reference to a quadword boundary. A plurality of subsequent fetches for a corresponding plurality of operands from the data cache are performed wherein each of the plurality of subsequent fetches is aligned to any of a plurality of quadword boundaries to prevent each of a plurality of individual fetch requests from spanning a plurality of lines in the data cache. A steady stream of data is maintained by placing an operand buffer at an output of the data cache to store and merge data from the initial fetch and the plurality of subsequent fetches, and to return the stored and merged data to the processor. | 09-24-2009 |
20090249033 | Data processing apparatus and method for handling instructions to be executed by processing circuitry - A data processing apparatus and method are provided for handling instructions to be executed by processing circuitry. The processing circuitry has a plurality of processor states, each processor state having a different instruction set associated therewith. Pre-decoding circuitry receives the instructions fetched from the memory and performs a pre-decoding operation to generate corresponding pre-decoded instructions, with those pre-decoded instructions then being stored in a cache for access by the processing circuitry. The pre-decoding circuitry performs the pre-decoding operation assuming a speculative processor state, and the cache is arranged to store an indication of the speculative processor state in association with the pre-decoded instructions. The processing circuitry is then arranged only to execute an instruction in the sequence using the corresponding pre-decoded instruction from the cache if a current processor state of the processing circuitry matches the indication of the speculative processor state stored in the cache for that instruction. This provides a simple and effective mechanism for detecting instructions that have been corrupted by the pre-decoding operation due to an incorrect assumption of processor state. | 10-01-2009 |
20090254734 | Partial Load/Store Forward Prediction - In one embodiment, a processor comprises a prediction circuit and another circuit coupled to the prediction circuit. The prediction circuit is configured to predict whether or not a first load instruction will experience a partial store to load forward (PSTLF) event during execution. A PSTLF event occurs if a plurality of bytes, accessed responsive to the first load instruction during execution, include at least a first byte updated responsive to a previous uncommitted store operation and also include at least a second byte not updated responsive to the previous uncommitted store operation. Coupled to receive the first load instruction, the circuit is configured to generate one or more load operations responsive to the first load instruction. The load operations are to be executed in the processor to execute the first load instruction, and a number of the load operations is dependent on the prediction by the prediction circuit. | 10-08-2009 |
20090282220 | Microprocessor with Compact Instruction Set Architecture - A re-encoded instruction set architecture (ISA) provides smaller bit-width instructions or a combination of smaller and larger bit-width instructions to improve instruction execution efficiency and reduce code footprint. The ISA can be re-encoded from a legacy ISA having larger bit-width instructions and can be used to unify one or more ISA extensions such as application specific ASEs. The re-encoded ISA maintains assembly-level compatibility with the ISA from which it is derived. In addition, the re-encoded ISA can have new and different types of additional instructions. | 11-12-2009 |
20090287907 | System for providing trace data in a data processor having a pipelined architecture - The invention is a method and system for providing trace data in a pipelined data processor. Aspects of the invention include providing a trace pipeline in parallel to the execution pipeline, providing trace information on whether conditional instructions complete or not, providing trace information on the interrupt status of the processor, replacing instructions in the processor with functionally equivalent instructions that also produce trace information and modifying the scheduling of instructions in the processor based on the occupancy of a trace output buffer. | 11-19-2009 |
20090300329 | VOLTAGE DROOP MITIGATION THROUGH INSTRUCTION ISSUE THROTTLING - A system and method for providing a digital real-time voltage droop detection and subsequent voltage droop reduction. A scheduler within a reservation station may store a weight value for each instruction corresponding to node capacitance switching activity for the instruction derived from pre-silicon power modeling analysis. For instructions picked with available source data, the corresponding weight values are summed together to produce a local current consumption value and this value is summed with any existing global current consumption values from corresponding schedulers of other processor cores yielding an activity event. The activity event is stored. Hashing functions within the scheduler are used to determine both a recent and an old activity average using the calculated activity event and stored older activity events. Instruction issue throttling occurs if either a difference between the old activity average and the recent activity average exceed a first threshold or the recent activity average exceeds a second threshold. | 12-03-2009 |
20090300330 | DATA PROCESSING METHOD AND SYSTEM BASED ON PIPELINE - A data processing system and method are disclosed. The system comprises an instruction-fetch stage where an instruction is fetched and a specific instruction is input into decode stage; a decode stage where said specific instruction indicates that contents of a register in a register file are used as an index, and then, the register file pointed to by said index is accessed based on said index; an execution stage where an access result of said decode stage is received, and computations are implemented according to the access result of the decode stage. | 12-03-2009 |
20090327657 | GENERATING AND PERFORMING DEPENDENCY CONTROLLED FLOW COMPRISING MULTIPLE MICRO-OPERATIONS (uops) - A processor to perform an out-of-order (OOO) processing in which a reservation station (RS) may generate and process a dependency controlled flow comprising multiple micro-operations (uops) with specific clock based dispatch scheme. The RS may either combine two or more uops into a single RS entry or make a direct connection between two or more RS entries. The RS may allow more than two source values to be associated with a single RS by combining sources from the two or more uops. One or more execution units may be provisioned to perform the function defined by the uops. The execution units may receive more than two sources at a given time point and produce two or more results on different ports. | 12-31-2009 |
20090327658 | COMPARE, SWAP AND STORE FACILITY WITH NO EXTERNAL SERIALIZATION - A compare, swap and store facility is provided that does not require external serialization. A compare and swap operation is performed using an interlocked update operation. If the comparison indicates equality, a store operation is performed. The compare, swap and store operations are performed as a single unit of operation. | 12-31-2009 |
20100005276 | INFORMATION PROCESSING DEVICE AND METHOD OF CONTROLLING INSTRUCTION FETCH - An information processing device includes an instruction fetch unit, an instruction buffer, an instruction executing unit, and an instruction fetch control unit. The instruction fetch unit supplies a fetch address to an instruction memory. The instruction buffer stores an instruction read out from the instruction memory. The instruction executing unit decodes and executes the instruction supplied from the instruction buffer. The instruction fetch control unit stops supply of the fetch address to the instruction memory by the instruction fetch unit when the fetch address corresponds to a first address or an address after the first address while the instruction executing unit executes loop processing. The loop processing is repeatedly executed for a predetermined number of times in accordance with decoding of the loop instruction by the instruction executing unit. The first address is an address after an address of an end instruction included in the loop processing. | 01-07-2010 |
20100017580 | Pre-decode checking for pre-decoded instructions that cross cache line boundaries - A data processing and method are provided for pre-decoding instructions. The data processing apparatus has pre-decoding circuitry for receiving instructions fetched from a memory and for performing a pre-decoding operation to generate corresponding pre-decoded instructions, which are then stored in the cache for access by the processing circuitry. If a pre-decoded instruction crosses a cache line boundary, then checking circuitry in respect of selected types of pre-decoded instruction checks for consistency between the first portion of the pre-decoded instruction stored within a first cache line and a contiguous second portion of the pre-decoded instruction stored within a second cache line. If this consistency check is passed such that the two portions are self-consistent, then the pre-decoded instruction can be further decoded and issued. If the consistency check is failed, or the pre-decoded instruction is not of a type for which consistency checking is supported, then re-generation of the pre-decoded instruction is triggered. | 01-21-2010 |
20100037036 | METHOD TO IMPROVE BRANCH PREDICTION LATENCY - An apparatus to generate a branch prediction of an instruction based at least in part on the address of the previous branch instruction, wherein the previous instruction is prior to the instruction in a program order. The prediction can also based on a branch history value with respect to the previous branch instruction and one or more previous branch predictions. | 02-11-2010 |
20100037037 | METHOD FOR INSTRUCTION PIPELINING ON IRREGULAR REGISTER FILES - A method for pipelining instructions on a PAC processor includes determining a minimum initial interval, and grouping the instructions so that the operands of dependent instructions are assigned to the same local register file. The virtual registers of the instructions that have data dependency across the first functional unit and the second functional unit are assigned to a global register file. The instructions are then modulo scheduled based on a current value of initial interval. The virtual registers of the scheduled instructions are allocated to the corresponding register files. If the allocation fails, a set of virtual registers is transferred from the first or second register file to the global register file. | 02-11-2010 |
20100042811 | METHOD FOR MANAGING BRANCH INSTRUCTIONS AND A DEVICE HAVING BRANCH INSTRUCTION MANAGEMENT CAPABILITIES - A method for managing branch instructions, the method includes: providing, to pipeline stages of a processor, multiple variable length groups of instructions; wherein each pipeline stage executes a group of instruction during a single execution cycle; receiving, at a certain execution cycle, multiple instruction fetch requests from multiple pipeline stages, each pipeline stage that generates an instruction fetch request stores a variable length group of instructions that comprises a branch instruction; sending to the fetch unit an instruction fetch command that is responsive to a first in order branch instruction in the pipeline stages; wherein if the first in order fetch command is a conditional fetch command then the instruction fetch command comprises a resolved target address; wherein the sending of the instruction fetch command is restricted to a single instruction fetch command per a single execution cycle. | 02-18-2010 |
20100049946 | PROCESSOR, COMPUTER READABLE RECORDING MEDIUM, AND STORAGE DEVICE - A processor includes: a first storage part that stores instructions of a program including sets of instruction groups, which sets are hierarchically structured; a second storage part that stores an address value of the first storage part in which an instruction to be read next is stored; a third storage part that includes storage areas respectively corresponding to hierarchical levels of the program; and a control part that executes, when an instruction read from the first storage part is a call instruction that calls a different one of the sets of instruction groups, a control to store the address value in the second storage part in one of the storage areas of the third storage part that corresponds to one of the hierarchical levels with which the different one of the sets of instruction groups being executed is associated. | 02-25-2010 |
20100058032 | Effective Use of a BHT in Processor Having Variable Length Instruction Set Execution Modes - In a processor executing instructions in at least a first instruction set execution mode having a first minimum instruction length and a second instruction set execution mode having a smaller, second minimum instruction length, line and counter index addresses are formed that access every counter in a branch history table (BHT), and reduce the number of index address bits that are multiplexed based on the current instruction set execution mode. In one embodiment, counters within a BHT line are arranged and indexed in such a manner that half of the BHT can be powered down for each access in one instruction set execution mode. | 03-04-2010 |
20100077180 | GENERATING PREDICATE VALUES BASED ON CONDITIONAL DATA DEPENDENCY IN VECTOR PROCESSORS - Embodiments of a method for performing parallel operations in a computer system when one or more conditional dependencies may be present, where a given conditional dependency includes a dependency associated with at least two data elements based on a pair of conditions. During operation, a processor receives instructions for generating one or more predicate values based on actual dependencies, where a given predicate value indicates data elements that may be safely evaluated in parallel, and where the given actual dependency occurs when the pair of conditions matches one or more criteria. Then, the processor executes the instructions for generating the one or more predicate values. | 03-25-2010 |
20100082946 | Microcomputer and its instruction execution method - A microcomputer in accordance with an exemplary embodiment of the preset invention include an instruction decoder | 04-01-2010 |
20100082947 | VERY-LONG INSTRUCTION WORD ARCHITECTURE WITH MULTIPLE PROCESSING UNITS - A processor may include a plurality of processing units for processing instructions, where each processing unit is associated with a discrete instruction queue. Data is read from a data queue selected by each instruction, and a sequencer manages distribution of instructions to the plurality of discrete instruction queues. | 04-01-2010 |
20100100707 | DATA STRUCTURE FOR CONTROLLING AN ALGORITHM PERFORMED ON A UNIT OF WORK IN A HIGHLY THREADED NETWORK ON A CHIP - A computer-implemented method, system and computer program product for controlling an algorithm that is performed on a unit of work in a subsequent software pipeline stage in a Network On a Chip (NOC) is presented. In one embodiment, the method executes a first operation in a first node of the NOC. The first node generates payload, and then loads that payload into a message. The message with the payload is transmitted to a nanokernel that controls a second node in the NOC. The nanokernel calls an algorithm that is needed by a second operation in a second node in the NOC, which uses the algorithm to execute the second operation. | 04-22-2010 |
20100100708 | Processing device - A processing device which can execute a plurality of threads includes: an execution unit which executes a command; a supply unit which supplies a command to the execution unit; a buffer unit which holds the command supplied from the supply unit; and a control unit which manages the buffer unit. The buffer unit has a set of buffer elements. Each of the buffer elements has a data unit for storing a command and a pointer unit for defining a connection relationship between the buffer elements. The control unit has a thread allocation unit which allocates a sequence of buffer elements whose connection relationship has been defined by the pointer unit for respective threads executed by the processing device. | 04-22-2010 |
20100100709 | Instruction control apparatus and instruction control method - In a CPU having a SMT function of executing plural threads composed of a series of instructions representing processing, there are provided a decode section for decoding processing represented by instructions of plural threads, an instruction buffer for obtaining instructions from a thread and holding the instructions, and inputting the held instructions to the decode section in order in the thread, and an execution pipeline for executing processing of instructions decoded by the decode section. The decode section checks whether or not an executable condition is ready for an instruction when the instruction is decoded and requests that the instructions held in the instruction buffer and an instruction subsequent to an instruction that is not ready with an executable condition are inputted again to the decode section. | 04-22-2010 |
20100100710 | Information processing apparatus, cache memory controlling apparatus, and memory access order assuring method - According to an aspect of the embodiment, when data on a cache RAM is rewritten in a storage processing of one thread, an determination unit searches a fetch port which holds a request of another thread, checks whether a request exists whose processing is completed, whose instruction is a load type instruction, and whose target address corresponds to a target address in a storage processing. When the corresponding request is detected, the determination unit sets a re-execution request flag to all the entries of the fetch port from the next entry of the entry which holds the oldest request to the entry which holds the detected request. When the processing of the oldest request is executed, a re-execution request unit transfers a re-execution request of an instruction to an instruction control unit for the request held in the entry in which the re-execution request flag is set. | 04-22-2010 |
20100106943 | Processing device - It is possible to realize fetch of instructs constituting a loop by using a simple configuration without fixing a loop start point. Provided is a processing method performed by a processing device including: a instruction buffer; a instruction decoder; a pointer arranged to correspond to the instruction buffer and indicating a connection relationship between one instruction buffer from which a instruction stream is read out and other instruction buffer containing the next instruction stream to be read out, according to an identifier of other instruction buffer; a start point storage unit containing an identifier of the instruction buffer containing a instruction stream serving as a start point of repetition when performing a instruction fetch of such a predetermined instruction that processing of a instruction stream is repeated in a loop. When a instruction stream is read out from the instruction buffer and the predetermined instruction is detected from the instruction stream, the identifier stored in the start point storage unit is set as a pointer of the identifier of the next instruction buffer from which a instruction is to be read out. | 04-29-2010 |
20100115238 | STREAM PROCESSING SYSTEM HAVING A RECONFIGURABLE MEMORY MODULE - A stream processing system includes a stream processing module coupled to a memory module and operable so as to fetch stream elements from the memory module, to process the stream elements fetched thereby, and to store processed stream elements in the memory module. The stream processing module includes a number (N) of stream processing units, and the memory module is configured with a number (N) of memory bank units each corresponding to a respective one of the stream processing units. The memory module is reconfigurable based on a desired inter-level configuration so that each of the memory bank units is configured to have a memory size sufficient to meet processing requirement of the respective one of the stream processing units. | 05-06-2010 |
20100115239 | VARIABLE INSTRUCTION WIDTH DIGITAL SIGNAL PROCESSOR - A DSP architecture achieves high code density and performance by using 16 bit encoding/decoding of three-register instructions and including orthogonal 64 register selection fields within a 32-bit instruction. A 64 entry register file allows high performance, while the 16-bit instruction size provides excellent code density in control type applications. | 05-06-2010 |
20100122066 | INSTRUCTION METHOD FOR FACILITATING EFFICIENT CODING AND INSTRUCTION FETCH OF LOOP CONSTRUCT - Instruction set techniques have been developed to identify explicitly the beginning of a loop body and to code a conditional loop-end in ways that allow a processor implementation to efficiently manage an instruction fetch buffer and/or entries in an instruction cache. In particular, for some computations and processor implementations, a machine instruction is defined that identifies a loop start, stores a corresponding loop start address on a return stack (or in other suitable storage) and directs fetch logic to take advantage of the identification by retaining in a fetch buffer or instruction cache the instruction(s) beginning at the loop start address, thereby avoiding usual branch delays on subsequent iterations of the loop. A conditional loop-end instruction can be used in conjunction with the loop start instruction to discard (or simply mark as no longer needed) the loop start address and the loop body instructions retained in the fetch buffer or instruction cache. | 05-13-2010 |
20100122067 | ACROSS-THREAD OUT-OF-ORDER INSTRUCTION DISPATCH IN A MULTITHREADED MICROPROCESSOR - Instruction dispatch in a multithreaded microprocessor such as a graphics processor is not constrained by an order among the threads. Instructions for each thread are fetched, and a dispatch circuit determines which instructions in the buffer are ready to execute. The dispatch circuit may issue any ready instruction for execution, and an instruction from one thread may be issued prior to an instruction from another thread regardless of which instruction was fetched first. If multiple functional units are available, multiple instructions can be dispatched in parallel. | 05-13-2010 |
20100125719 | Instruction Target History Based Register Address Indexing - A circuit arrangement and method support instruction target history based register address indexing, whereby register addresses to be used by an instruction are decoded using a target history table of previous target register addresses, and an index into the target history table supplied by an index value in the instruction. An instruction may include at least one index value that identifies a previously used register address. During execution of the instruction, the index is retrieved from the instruction, and then a register address is retrieved from the target history table using the index. | 05-20-2010 |
20100125720 | INSTRUCTION MODE IDENTIFICATION APPARATUS AND METHOD - An instruction mode identification apparatus includes a program counter and a processor. The program counter stores an instruction address, which comprises a plurality of bits for indicating an address of an instruction currently executed or to be executed. At least one of the plurality of bits is a redundant bit. The processor identifies an instruction mode according to the redundant bit. The instruction mode represents an execution mode of the current instruction. An instruction mode identification method is also disclosed. | 05-20-2010 |
20100153688 | APPARATUS AND METHOD FOR DATA PROCESS - An exemplary aspect of the present invention is a data processing apparatus for processing a loop in a pipeline that includes an instruction memory and a fetch circuit that fetches an instruction stored in the instruction memory. The fetch circuit includes an instruction queue that stores an instruction to be output from the fetch circuit, an evacuation queue that stores an instruction fetched from the instruction memory, a selector that selects one of the instruction output from the instruction queue and the instruction output from the evacuation queue, and a loop queue that stores the instruction selected by the selector and outputs to the instruction queue. | 06-17-2010 |
20100161941 | METHOD AND SYSTEM FOR IMPROVED FLASH CONTROLLER COMMANDS SELECTION - A system for selecting a subset of issued flash storage commands to improve processing time for command execution. A plurality of ports stores a first plurality of command identifiers and are associated with the plurality of ports. Each of the first plurality of arbiters selects an oldest command identifier among command identifiers within each corresponding port resulting in a second plurality of command identifiers. A second arbiter makes a plurality of selections from the second plurality of command identifiers based on command identifier age and the priority of the port. A session identifier queue stores commands associated with the plurality of selections among other commands forming a third plurality of commands. A microcontroller selects an executable command from the third plurality of commands for execution based on an execution optimization heuristic. After execution of the command, the command identifier in the port is cleared. | 06-24-2010 |
20100161942 | INFORMATION HANDLING SYSTEM INCLUDING A PROCESSOR WITH A BIFURCATED ISSUE QUEUE - An information handling system includes a processor with a bifurcated unified issue queue that may perform unified issue queue VSU store instruction dependency operations. The bifurcated unified issue queue BUIQ maintains VSU store instructions in the form of internal operations data. The BUIQ includes a unified issue queue UIQ | 06-24-2010 |
20100161943 | PROCESSOR CAPABLE OF POWER CONSUMPTION SCALING - The present invention relates to a processor capable of power consumption scaling, and more particularly, to a technique that variably controls the energy consumption of a processor according to the energy capacity being supplied by providing a pipeline register with a bypass function so as to control the operating frequency of the processor. | 06-24-2010 |
20100169611 | BRANCH MISPREDICTION RECOVERY MECHANISM FOR MICROPROCESSORS - A system and method for reducing branch misprediction penalty. In response to detecting a mispredicted branch instruction, circuitry within a microprocessor identifies a predetermined condition prior to retirement of the branch instruction. Upon identifying this condition, the entire corresponding pipeline is flushed prior to retirement of the branch instruction, and instruction fetch is started at a corresponding address of an oldest instruction in the pipeline immediately prior to the flushing of the pipeline. The correct outcome is stored prior to the pipeline flush. In order to distinguish the mispredicted branch from other instructions, identification information may be stored alongside the correct outcome. One example of the predetermined condition being satisfied is in response to a timer reaching a predetermined threshold value, wherein the timer begins incrementing in response to the mispredicted branch detection and resets at retirement of the mispredicted branch. | 07-01-2010 |
20100169612 | Data-Processing Unit for Nested-Loop Instructions - A data-processing unit has a fetching circuitry ( | 07-01-2010 |
20100174888 | Memory System - A memory system includes a storage device storing a plurality of instructions and a central processing unit processing an instruction fetched from the storage device, wherein the central processing unit detects a change in the instruction fetched from the storage device while processing the instruction. | 07-08-2010 |
20100180102 | Enhancing processing efficiency in large instruction width processors - A processor includes one or more processing units, an execution pipeline and control circuitry. The execution pipeline includes at least first and second pipeline stages that are cascaded so that program instructions, specifying operations to be performed by the processing units in successive cycles of the pipeline, are fetched from a memory by the first pipeline stage and conveyed to the second pipeline stage, which causes the processing units to perform the specified operations. | 07-15-2010 |
20100180103 | MECHANISM FOR INCREASING THE EFFECTIVE CAPACITY OF THE WORKING REGISTER FILE - A computer processor pipeline has both an architectural register file and a working register file. The lifetime of an entry in the working register file is determined by a predetermined number of instructions passing through a specified stage in the pipeline after the location in the working register file is allocated for an instruction. The size of the working register file is selected based upon performance characteristics. A working register file creditor indicator is coupled to the front end pipeline portion and to the back end pipeline portion. The working register file credit indicator is monitored to prevent a working register file overflow. When the a location in the architectural register file is read early, the location is monitored to determine whether the location is written to prior to issuance of the instruction associated with the early read. | 07-15-2010 |
20100185834 | Data Storing Method and Processor Using the Same - A data storing method applied to a processor having a pipelined processing unit is provided. The pipelined processing unit includes stages. The stages include a source operand fetch stage and a write-back stage. The method includes the following steps. Firstly, a storing instruction is fetched and decoded. Next, the storing instruction is entered to the source operand fetch stage, and whether there is a late-done instruction in the pipelined processing unit is determined. The late-done instruction not lagged behind the storing instruction generates a late-coming result before entering the write-back stage. If it is determined that there is a late-done instruction in the pipelined processing unit, then the late-coming result is fetched before the storing instruction is entered to the write-back stage. Thereafter, the storing instruction is entered to the write-back stage, and the late-coming result is stored to a target memory which the storing instruction corresponds to. | 07-22-2010 |
20100205400 | EXECUTING ROUTINES BETWEEN AN EMULATED OPERATING SYSTEM AND A HOST OPERATING SYSTEM - Approaches for emulating an operating system. A method includes executing a first operating system (OS) on an instruction processor. The first OS includes instructions of a first instruction set that are native to the instruction processor. A second OS is emulated on the first OS and includes instructions of a second instruction set that are not native to the instruction processor. An emulated transfer-of-control instruction is determined during emulation of the second OS to target either instructions of the first set or the second set. In response to determining that instructions of the first set are targeted, control is transferred to the targeted instructions of the first set on the instruction processor. In response to determining that instructions of the second set are targeted, the targeted instructions of the second set are retrieved and emulated. | 08-12-2010 |
20100205401 | PIPELINED MICROPROCESSOR WITH FAST NON-SELECTIVE CORRECT CONDITIONAL BRANCH INSTRUCTION RESOLUTION - A microprocessor includes a register that stores a state and a fetch unit that fetches instructions of a program. The program includes a first instruction followed non-immediately by a second instruction. The first instruction instructs the microprocessor to update the state in the register. The second instruction is a conditional branch instruction that specifies a branch condition based on the register state. The fetch unit dispatches the first instruction for execution but refrains from dispatching the second instruction for execution. Execution units receive the first instruction from the fetch unit and responsively update the register state. The fetch unit non-selectively correctly resolves the conditional branch instruction based on the register state when the execution units have updated the register state. The fetch unit also non-selectively refrains from sending the conditional branch instruction to the execution units to be resolved regardless of whether the execution units have updated the register state. | 08-12-2010 |
20100205402 | PIPELINED MICROPROCESSOR WITH NORMAL AND FAST CONDITIONAL BRANCH INSTRUCTIONS - A microprocessor includes a first branch condition state and a second branch condition state. The microprocessor also includes a conditional branch instruction of a first type that instructs the microprocessor to wait to correctly resolve the conditional branch instruction of the first type based on the first branch condition state until other instructions within the microprocessor that update the first branch condition state and that are older than the conditional branch instruction of the first type have updated the first branch condition state. A conditional branch instruction of a second type instructs the microprocessor to correctly resolve the conditional branch instruction of the second type based on the second branch condition state without regard to whether other instructions within the microprocessor that update the second branch condition state and that are older than the conditional branch instruction of the second type have yet updated the second branch condition state. | 08-12-2010 |
20100205403 | PIPELINED MICROPROCESSOR WITH FAST CONDITIONAL BRANCH INSTRUCTIONS BASED ON STATIC EXCEPTION STATE - A microprocessor includes a memory that stores an exception handler to handle an exception condition. The exception handler is a non-user program private to the microprocessor and includes a conditional branch instruction. A first fetch unit fetches instructions of a user program that includes a user program instruction that causes the exception condition. An execution unit executes the user program instructions fetched by the first fetch unit and executes instructions of the exception handler. The execution unit also saves a state in response to detecting the exception condition caused by the user program instruction. A second fetch unit fetches the exception handler instructions from the memory and resolves the conditional branch instruction based on the saved state without sending the conditional branch instruction to the execution unit to resolve the conditional branch instruction. | 08-12-2010 |
20100205404 | PIPELINED MICROPROCESSOR WITH FAST CONDITIONAL BRANCH INSTRUCTIONS BASED ON STATIC MICROCODE-IMPLEMENTED INSTRUCTION STATE - A microprocessor includes a memory that stores instructions of a non-user program to implement a user program instruction of the user-visible instruction set of the microprocessor. The non-user program includes a conditional branch instruction. A first fetch unit fetches instructions of the user program that includes the instruction that is implemented by the non-user program. An instruction decoder decodes the user program instructions and saves a state in response to decoding the user program instruction that is implemented by the non-user program. An execution unit executes the user program instructions fetched by the first fetch unit and executes instructions of the non-user program other than the conditional branch instruction. A second fetch unit fetches the non-user program instructions from the memory and resolves the conditional branch instruction based on the saved state without sending the conditional branch instruction to the execution unit to resolve the conditional branch instruction. | 08-12-2010 |
20100205405 | STATIC BRANCH PREDICTION METHOD AND CODE EXECUTION METHOD FOR PIPELINE PROCESSOR, AND CODE COMPILING METHOD FOR STATIC BRANCH PREDICTION - A static branch prediction method and code execution method for a pipeline processor, and a code compiling method for static branch prediction, are provided herein. The static branch prediction method includes predicting a conditional branch code as taken or not-taken, adding the prediction information, converting the conditional branch code into a jump target address setting (JTS) code including target address information, branch time information, and a test code, and scheduling codes in a block. The code may be scheduled into a last slot of the block, and the JTS code may be scheduled into an empty slot after all the other codes in the block are scheduled. When the conditional branch code is predicted as taken in the prediction operation, a target address indicated by the target address information may be fetched at a cycle time indicated by the branch time information. | 08-12-2010 |
20100211761 | DIGITAL SIGNAL PROCESSOR (DSP) WITH VECTOR MATH INSTRUCTION - In accordance with at least some embodiments, a digital signal processor (DSP) includes an instruction fetch unit and an instruction decode unit in communication with the instruction fetch unit. The DSP also includes a register set and a plurality of work units in communication with the instruction decode unit. A vector math instruction decoded by the instruction decode unit causes input vectors and output vectors to be aligned with a maximum boundary of the register set and causes parallel operations by the work units. | 08-19-2010 |
20100211762 | Mechanism for Efficient Implementation of Software Pipelined Loops in VLIW Processors - A system to implement a zero overhead software pipelined (SFP) loop includes a Very Long Instruction Word (VLIW) processor having an N number of execution slots. The VLIW processor executes a plurality of instructions in parallel without any limitation of an instruction buffer size. A program memory receives a Program Memory address to fetch an instruction packet. The program memory is closely coupled with the instruction buffer size to implement the zero overhead software pipelined (SFP) loop. The size of the zero overhead software pipelined (SFP) loop can exceed the instruction buffer size. A CPU control register includes a block count and an iteration count. The block count is loaded into a block counter and counts the plurality of instructions executed in the SFP loop, and the iteration count is loaded into an iteration counter and counts a number of iterations of the SFP loop based on the block count. | 08-19-2010 |
20100228952 | APPARATUS AND METHOD FOR FAST CORRECT RESOLUTION OF CALL AND RETURN INSTRUCTIONS USING MULTIPLE CALL/RETURN STACKS IN THE PRESENCE OF SPECULATIVE CONDITIONAL INSTRUCTION EXECUTION IN A PIPELINED MICROPROCESSOR - A microprocessor having a plurality of call/return stacks (CRS) correctly resolves a call or return instruction rather than issuing the instruction to execution units of the microprocessor to be resolved. The microprocessor fetches a call or return instruction and determines whether the instruction is the first call or return instruction fetched after fetching a conditional branch instruction that has yet to be resolved. The microprocessor copies the contents of a current CRS to another CRS and designates the other CRS as the current CRS, if the state exists. The microprocessor pushes the address of the next sequential instruction following the call instruction onto the current CRS and fetches an instruction at the call instruction target address if the instruction is a call instruction. The microprocessor pops a second return address from the current CRS and fetches an instruction at the second return address, if the instruction is a return instruction. | 09-09-2010 |
20100228953 | REDUCING DATA HAZARDS IN PIPELINED PROCESSORS TO PROVIDE HIGH PROCESSOR UTILIZATION - A pipelined computer processor is presented that reduces data hazards such that high processor utilization is attained. The processor restructures a set of instructions to operate concurrently on multiple pieces of data in multiple passes. One subset of instructions operates on one piece of data while different subsets of instructions operate concurrently on different pieces of data. A validity pipeline tracks the priming and draining of the pipeline processor to ensure that only valid data is written to registers or memory. Pass-dependent addressing is provided to correctly address registers and memory for different pieces of data. | 09-09-2010 |
20100241832 | Instruction fetching following changes in program flow - This application is concerned with a device and method for fetching instructions from a data store for processing by a data processor. The device comprises: a register for storing an address of an instruction to be processed by said data processor; a fetch unit responsive to an address input to said fetch unit to fetch an instruction stored at said address; an adder for adding a predetermined amount to said address stored in said register prior to sending said address to said fetch unit, said predetermined amount determining a position in a program flow said fetched instruction has with respect to said instruction addressed in said register; said adder being responsive to detection of a change in program flow to reset said predetermined amount to an initial value, and to increase said predetermined amount for subsequent fetches by an amount equal to the separation between addresses such that consecutive addresses are fetched up to a maximum predetermined amount. | 09-23-2010 |
20100262806 | Tracking Effective Addresses in an Out-of-Order Processor - Mechanisms, in a data processing system, are provided for tracking effective addresses through a processor pipeline of the data processing system. The mechanisms comprise logic for fetching an instruction from an instruction cache and associating, by an effective address table logic in the data processing system, an entry in an effective address table (EAT) data structure with the fetched instruction. The mechanisms further comprise logic for associating an effective address tag (eatag) with the fetched instruction, the eatag comprising a base eatag that points to the entry in the EAT and an eatag offset. Moreover, the mechanisms comprise logic for processing the instruction through the processor pipeline by processing the eatag. | 10-14-2010 |
20100262807 | Partial Flush Handling with Multiple Branches Per Group - Mechanisms are provided for partial flush handling with multiple branches per instruction group. The instruction fetch unit sorts instructions into groups. A group may include a floating branch instruction and a boundary branch instruction. For each group of instructions, the instruction sequencing unit creates an entry in a global completion table (GCT), which may also be referred to herein as a group completion table. The instruction sequencing unit uses the GCT to manage completion of instructions within each outstanding group. Because each group may include up to two branches, the instruction sequencing unit may dispatch instructions beyond the first branch, i.e. the floating branch. Therefore, if the floating branch results in a misprediction, the processor performs a partial flush of that group, as well as a flush of every group younger than that group. | 10-14-2010 |
20100293357 | Method and apparatus for providing platform independent secure domain - A platform independent secure domain providing apparatus, which determines whether an execution environment is to be in a secure domain and a non-secure domain by a secure bit. The apparatus includes a secure monitor that is adapted to generate a branch instruction when a call to a secure code is sensed, turn on the secure bit when the branch instruction has been successfully executed, and turn off the secure bit when the execution of the secure code is finished, an instruction bypass read only memory (ROM) adapted to receive the branch instruction from the secure monitor, and a processor adapted to execute the branch instruction that is fetched from the instruction bypass ROM. | 11-18-2010 |
20100312991 | Microprocessor with Compact Instruction Set Architecture - A re-encoded instruction set architecture (ISA) provides smaller bit-width instructions or a combination of smaller and larger bit-width instructions to improve instruction execution efficiency and reduce code footprint. The ISA can be re-encoded from a legacy ISA having larger bit-width instructions, and the re-encoded ISA can maintain assembly-level compatibility with the ISA from which it is derived. In addition, the re-encoded ISA can have new and different types of additional instructions, including instructions with encoded arguments determined by statistical analysis and instructions that have the effect of combinations of instructions. | 12-09-2010 |
20110004742 | Variable-Cycle, Event-Driven Multi-Execution Flash Processor - A Multi-Execution Flash Processor core performs operations associated with accessing non-volatile semiconductor based memory units. Execution units included in the core can execute instructions requiring different numbers of clock cycles to complete by generating an event control signal in response to completing an instruction. The core can be used in a controller to access and control external memory units. Data memory access operations include using an instruction decoder to select one or more execution units to perform an operation associated with the instruction, and generating an event control signal upon completion of the operation. In some cases, executing the instruction includes selecting a second execution unit. | 01-06-2011 |
20110029758 | CENTRAL PROCESSING UNIT MEASUREMENT FACILITY - A measurement sampling facility takes snapshots of the central processing unit (CPU) on which it is executing at specified sampling intervals to collect data relating to tasks executing on the CPU. The collected data is stored in a buffer, and at selected times, an interrupt is provided to remove data from the buffer to enable reuse thereof. The interrupt is not taken after each sample, but in sufficient time to remove the data and minimize data loss. | 02-03-2011 |
20110066827 | MULTIPROCESSOR - A multiprocessor of a single processor, including a pipeline processing unit which successively fetches an instruction sequence to be independently processed on each of the multiprocessor with a shifted phase in one cycle. | 03-17-2011 |
20110072242 | CONFIGURABLE PROCESSING APPARATUS AND SYSTEM THEREOF - A configurable processing apparatus includes a plurality of processing units, at least an instruction synchronization control circuit, and at least a configuration memory. Each processing apparatus has a stall-output signal generating circuit to output a stall-output signal, wherein the stall-output signal indicates that an unexpected stall is occurred in the processing unit. The processing unit has a stall-in signal, and an external circuit of the processing unit can control whether the processing unit is stalled according to the stall-in signal. The instruction synchronization control circuit generates the stall-in signals to the processing units in response to a content stored in the configuration memory and the stall-output signals of the processing units, so as to determine operation modes and instruction synchronization of the processing units. | 03-24-2011 |
20110087862 | Multiprocessor resource optimization - Embodiments include a device and a method. In an embodiment, a method applies a first resource management strategy to a first resource associated with a first processor and executes an instruction block in a first processor. The method also applies a second resource management strategy to a second resource of a similar type as the first resource and executes the instruction block in a second processor. The method further selects a resource management strategy likely to provide a substantially optimum execution of the instruction group from the first resource management strategy and the second resource management strategy. | 04-14-2011 |
20110107062 | Interrupt Handling - Techniques for handling interrupts of multiple instruction threads within a multi-thread processing environment. The techniques include: interleavingly fetching and issuing instructions of (i) a first instruction execution thread and (ii) a second instruction thread for execution by an execution block of the multi-thread processing environment; providing a first interrupt signal via a first interrupt signal line within the multi-thread processing environment to interrupt fetching and issuing of instructions of the first instruction execution thread; and providing a second interrupt signal via a second interrupt signal line within the multi-thread processing environment to interrupt fetching and issuing of instructions of the second instruction execution thread. The first interrupt signal line and the second interrupt signal line are physically separate and distinct signal lines that are directly electrically coupled to one another. | 05-05-2011 |
20110131394 | APPARATUS AND METHOD FOR USING BRANCH PREDICTION HEURISTICS FOR DETERMINATION OF TRACE FORMATION READINESS - A single unified level one instruction(s) cache in which some lines may contain traces and other lines in the same congruence class may contain blocks of instruction(s) consistent with conventional cache lines. Formation of trace lines in the cache is delayed on initial operation of the system to assure quality of the trace lines stored. | 06-02-2011 |
20110153986 | PREDICTING AND AVOIDING OPERAND-STORE-COMPARE HAZARDS IN OUT-OF-ORDER MICROPROCESSORS - A method and information processing system manage load and store operations executed out-of-order. At least one of a load instruction and a store instruction is executed. A determination is made that an operand store compare hazard has been encountered. An entry within an operand store compare hazard prediction table is created based on the determination. The entry includes at least an instruction address of the instruction that has been executed and a hazard indicating flag associated with the instruction. The hazard indicating flag indicates that the instruction has encountered the operand store compare hazard. When a load instruction is associated with the hazard indicating flag the load instruction becomes dependent upon all store instructions associated with a substantially similar flag. | 06-23-2011 |
20110161630 | GENERAL PURPOSE HARDWARE TO REPLACE FAULTY CORE COMPONENTS THAT MAY ALSO PROVIDE ADDITIONAL PROCESSOR FUNCTIONALITY - An apparatus and method is described herein for replacing faulty core components. General purpose hardware is provided to replace core pipeline components, such as execution units. In the embodiment of execution unit replacement, a proxy unit is provided, such that mapping logic is able to map instruction/operations, which correspond to faulty execution units, to the proxy unit. As a result, the proxy unit is able to receive the operations, send them to general purpose hardware for execution, and subsequently write-back the execution results to a register file; it essentially replaces the defective execution unit allowing a processor with defective units to be sold or continue operation. | 06-30-2011 |
20110167242 | Multiple instruction execution mode resource-constrained device - A resource-constrained device comprises a processor configured to execute multiple instruction streams comprising multiple instructions having an opcode and zero or more operands. Each of the multiple instruction streams is associated with one of multiple instruction execution modes having an instruction set comprising multiple instruction implementations. At least one of the multiple instruction implementations is configured to change the processor from a first instruction execution mode to a second instruction execution mode. The processor comprises an instruction fetcher configured to fetch an instruction from one of the multiple instruction streams based at least in part upon a current instruction execution mode. | 07-07-2011 |
20110179254 | LIMITING SPECULATIVE INSTRUCTION FETCHING IN A PROCESSOR - The described embodiments relate to a processor that speculatively executes instructions. During operation, the processor often executes instructions in a speculative-execution mode. Upon detecting an impending pipe-clearing event while executing instructions in the speculative-execution mode, the processor stalls an instruction fetch unit to prevent the instruction fetch unit from fetching instructions. In some embodiments, the processor stalls the instruction fetch unit until a condition that originally caused the processor to operate in the speculative-execution mode is resolved. In alternative embodiments, the processor maintains the stall of the instruction fetch unit until the pipe-clearing event has been completed (i.e., has been handled in the processor). | 07-21-2011 |
20110202748 | LOAD PAIR DISJOINT FACILITY AND INSTRUCTION THEREFORE - A Load/Store Disjoint instruction, when executed by a CPU, accesses operands from two disjoint memory locations and sets condition code indicators to indicate whether or not the two operands appeared to be accessed atomically by means of block-concurrent interlocked fetch with no intervening stores to the operands from other CPUs. In a Load Pair Disjoint form of the instruction, the accesses are loads and the disjoint data is stored in general registers. | 08-18-2011 |
20110208949 | HARDWARE THREAD DISABLE WITH STATUS INDICATING SAFE SHARED RESOURCE CONDITION - A technique for indicating a safe shared resource condition with respect to a disabled thread provides a mechanism for providing a fast indication to other hardware threads that a temporarily disabled thread can no longer impact shared resources, such as shared special-purpose registers and translation look-aside buffers within the processor core. Signals from pipelines within the core indicates whether any of the instructions pending in the pipeline impact the shared resources and if not, then the thread disable status is presented to the other threads via a state change in a thread status register. Upon receiving an indication that a particular hardware thread is to be disabled, control logic halts the dispatch of instructions for the particular hardware thread, and then waits until any indication that a shared resource is impacted by an instruction has cleared. Then the control logic updates the thread status to indicate the thread is disabled. | 08-25-2011 |
20110225394 | INSTRUCTION BREAKPOINTS IN A MULTI-CORE, MULTI-THREAD NETWORK COMMUNICATIONS PROCESSOR ARCHITECTURE - Described embodiments provide a packet classifier for a network processor that generates tasks corresponding to each received packet. The packet classifier includes a scheduler to generate threads of contexts corresponding to tasks received by the packet classifier from a plurality of processing modules of the network processor. A multi-thread instruction engine processes instructions corresponding to threads received from the scheduler. The multi-thread instruction engine executes instructions by fetching an instruction of the thread from an instruction memory of the packet classifier and determining whether a breakpoint mode of the network processor is enabled. If the breakpoint mode is enabled, and breakpoint indicator of the fetched instruction is set, the packet classifier enters a breakpoint mode. Otherwise, if the breakpoint indicator of the fetched instruction is not set, the multi-thread instruction engine executes the fetched instruction. | 09-15-2011 |
20110225395 | DATA PROCESSING SYSTEM AND CONTROL METHOD THEREOF - In a data processing system which includes a processor performing a processing in correspondence with a fetched instruction and a DRC capable of dynamically reconfiguring a circuit configuration in correspondence with configuration data, when the processor fetches the instruction, a configuration data decoder identifies whether or not the instruction is a configuration data instruction. When the instruction is the configuration data instruction, the configuration data is read from the configuration data memory in which the configuration data is housed and supplied to the DRC based on address information included in the configuration data instruction, thereby to enable the configuration data to be supplied to the DRC at a timing the same as a timing at which the instruction is fetched, so that the configuration data can be supplied at a high speed. | 09-15-2011 |
20110231632 | SEMICONDUCTOR DEVICE - Receiving a request for canceling setting, a control circuit erases data stored in a corresponding block, changes a value of a protection flag, and cancels protection setting. When an overall protection is set for any block, the control circuit prohibits access to all blocks, except when it is an operation mode for activating a memory program contained in the microcomputer. Further, control circuit permits an access to a block M only when partial protection is set, CPU is in the mode for activating a memory program contained in the microcomputer and the access is for reading an instruction code in accordance with an instruction fetch. | 09-22-2011 |
20110238952 | INSTRUCTION FETCH APPARATUS, PROCESSOR AND PROGRAM COUNTER ADDITION CONTROL METHOD - An instruction fetch apparatus is disclosed which includes: a program counter configured to manage the address of an instruction targeted to be executed in a program in which instructions belonging to a plurality of instruction sequences are placed sequentially; a change designation register configured to designate a change of an increment value on the program counter; an increment value register configured to hold the changed increment value; and an addition control section configured such that if the change designation register designates the change of the increment value on the program counter, then the addition control section increments the program counter based on the changed increment value held in the increment value register, the addition control section further incrementing the program counter by an instruction word length if the change designation register does not designate any change of the increment value on the program counter. | 09-29-2011 |
20110252221 | MICROCOMPUTER AND INTERRUPT CONTROL METHOD - A microcomputer includes: a plurality of register lists having a plurality of register patterns, respectively, wherein each of plurality of register patterns designates registers, data of which are to be saved in a data memory; an instruction fetch control circuit configured to fetch instruction code from an instruction memory in response to an interrupt request issued based on occurrence of an interrupt factor; and a register data saving control circuit configured to acquire one register pattern from one of the plurality of register lists in response to the interrupt request, and issue a microinstruction based on the acquired register pattern in response to the interrupt request. An instruction executing section is configured to execute the microinstruction prior to the fetched instruction code, to save the data of registers designated based on the acquired register pattern in the data memory. | 10-13-2011 |
20110264892 | DATA PROCESSING DEVICE - Provided is a data processing device ( | 10-27-2011 |
20110264893 | DATA PROCESSOR AND IC CARD - The data processor includes: a memory device for storing a program compiled by a compiler; and CPU operable to fetch an instruction code included by a program stored in the memory device. Further, the data processor has a filter for judging an instruction code which the compiler never outputs to limit, in action, CPU in case that CPU fetches the instruction code, which limits, in action, CPU in the case where the program is rewritten by not only an undefined instruction, but also an instruction other than an undefined instruction. The level of security is increased by limiting, in action, CPU. | 10-27-2011 |
20110276784 | HIERARCHICAL MULTITHREADED PROCESSING - In one embodiment, a current candidate thread is selected from each of multiple first groups of threads using a low granularity selection scheme, where each of the first groups includes multiple threads and first groups are mutually exclusive. A second group of threads is formed comprising the current candidate thread selected from each of the first groups of threads. A current winning thread is selected from the second group of threads using a high granularity selection scheme. An instruction is fetched from a memory based on a fetch address for a next instruction of the current winning thread. The instruction is then dispatched to one of the execution units for execution, whereby execution stalls of the execution units are reduced by fetching instructions based on the low granularity and high granularity selection schemes. | 11-10-2011 |
20110276785 | BYTE CODE CONVERSION ACCELERATION DEVICE AND A METHOD FOR THE SAME - Provided is a bytecode conversion acceleration device and a method for the same: allowing a reduction in the size of a storage unit for a look-up table including a decoding table, a link table and a native code table; increasing the number of bytecodes that can be processed by hardware by using the look-up table to thereby enhance the overall performance of a virtual machine; and allowing an execution portion to immediately execute the first native code to thereby enhance performance of the virtual machine. | 11-10-2011 |
20110289299 | System and Method to Evaluate a Data Value as an Instruction - A system and method to evaluate a data value as an instruction is disclosed. For example, an apparatus configured to execute program code includes an execute unit configured to execute a first instruction associated with a location of a second instruction. The first instruction is identified by a program counter. The apparatus also includes a decode unit configured to receive the second instruction from the location and to decode the second instruction to generate a decoded second instruction without changing the program counter to point to the second instruction. | 11-24-2011 |
20110289300 | Indirect Branch Target Predictor that Prevents Speculation if Mispredict Is Expected - In one embodiment, a processor implements an indirect branch target predictor to predict target addresses of indirect branch instructions. The indirect branch target predictor may store target addresses generated during previous executions of indirect branches, and may use the stored target addresses as predictions for current indirect branches. The indirect branch target predictor may also store a validation tag corresponding to each stored target address. The validation tag may be compared to similar data corresponding to the current indirect branch being predicted. If the validation tag does not match, the indirect branch is presumed to be mispredicted (since the branch target address actually belongs to a different instruction). The indirect branch target predictor may inhibit speculative execution subsequent to the mispredicted indirect branch until the redirect is signalled for the mispredicted indirect branch. | 11-24-2011 |
20110296142 | PROCESSOR AND METHOD PROVIDING INSTRUCTION SUPPORT FOR INSTRUCTIONS THAT UTILIZE MULTIPLE REGISTER WINDOWS - A processor including instruction support for large-operand instructions that use multiple register windows may issue, for execution, programmer-selectable instructions from a defined instruction set architecture (ISA). The processor may also include an instruction execution unit that, during operation, receives instructions for execution from the instruction fetch unit and executes a large-operand instruction defined within the ISA, where execution of the large-operand instruction is dependent upon a plurality of registers arranged within a plurality of register windows. The processor may further include control circuitry (which may be included within the fetch unit, the execution unit, or elsewhere within the processor) that determines whether one or more of the register windows depended upon by the large-operand instruction are not present. In response to determining that one or more of these register windows are not present, the control circuitry causes them to be restored. | 12-01-2011 |
20110307686 | Method for Instructing a data processor to process data - A data processor which executes instructions described in first and second instruction formats. The first instruction format defines a register-addressing field of a predetermined size, while the second instruction format defines a register-addressing field of a size larger than that of the register-addressing field defined by the first instruction format. The data processor includes: instruction-type identifier, responsive to an instruction, for identifying the received instruction as being described in the first or second instruction format by the instruction itself; a first register file including a plurality of registers; and a second register file also including a plurality of registers, the number of the registers included in the second register file being larger than that of the registers included in the first register file. | 12-15-2011 |
20110314260 | HIGH-WORD FACILITY FOR EXTENDING THE NUMBER OF GENERAL PURPOSE REGISTERS AVAILABLE TO INSTRUCTIONS - A computer employs a set of General Purpose Registers (GPRs). Each GPR comprises a plurality of portions. Programs such as an Operating System and Applications operating in a Large GPR mode, access the full GPR, however programs such as Applications operating in Small GPR mode, only have access to a portion at a time. Instruction Opcodes, in Small GPR mode, may determine which portion is accessed. | 12-22-2011 |
20110320772 | CONTROLLING THE SELECTIVELY SETTING OF OPERATIONAL PARAMETERS FOR AN ADAPTER - An instruction is provided to establish various operational parameters for an adapter. These parameters include adapter interruption parameters, input/output address translation parameters, resetting error indications, setting measurement parameters, and setting an interception control, as examples. The instruction specifies a function information block, which is a program representation of a device table entry used by the adapter, to be used in certain situations in establishing the parameters. A store instruction is also provided that stores the current contents of the function information block. | 12-29-2011 |
20110320773 | FUNCTION VIRTUALIZATION FACILITY FOR BLOCKING INSTRUCTION FUNCTION OF A MULTI-FUNCTION INSTRUCTION OF A VIRTUAL PROCESSOR - In a processor supporting execution of a plurality of functions of an instruction, an instruction blocking value is set for blocking one or more of the plurality of functions, such that an attempt to execute one of the blocked functions, will result in a program exception and the instruction will not execute, however the same instruction will be able to execute any of the functions that are not blocked functions. | 12-29-2011 |
20110320774 | OPERAND FETCHING CONTROL AS A FUNCTION OF BRANCH CONFIDENCE - A system for data operand fetching control includes a computer processor that includes a control unit for determining memory access operations. The control unit is configured to perform a method. The method includes calculating a summation weight value for each instruction in a pipeline, the summation weight value calculated as a function of branch uncertainty and a pendency in which the instruction resides in the pipeline relative to other instructions in the pipeline. The method also includes mapping the summation weight value of a selected instruction that is attempting to access system memory to a memory access control, each memory access control specifying a manner of handling data fetching operations. The method further includes performing a memory access operation for the selected instruction based upon the mapping. | 12-29-2011 |
20110320775 | ACCELERATING EXECUTION OF COMPRESSED CODE - Methods and apparatus relating to accelerating execution of compressed code are described. In one embodiment, a two-level embedded code decompression scheme is utilized which eliminates bubbles, which may increase speed and/or reduce power consumption. Other embodiments are also described and claimed. | 12-29-2011 |
20120023311 | PROCESSOR APPARATUS AND MULTITHREAD PROCESSOR APPARATUS - A processor apparatus according to the present invention is a processor apparatus which shares hardware resources between a plurality of processors, and includes: a first determination unit which determines whether or not a register in each of the hardware resources holds extension context data of a program that is currently executed; a second determination unit which determines to which processor the extension context data in the hardware resource corresponds; a first transfer unit which saves and restores the extension context data between programs in the processor; and a second transfer unit which saves and restores the extension context data between programs between different processors. | 01-26-2012 |
20120023312 | SUBPROCESSOR, INTEGRATED CIRCUIT DEVICE, AND ELECTRONIC APPARATUS - A subprocessor, an integrated circuit device, and an electronic apparatus or the like capable of performing data processing efficiently are provided. A subprocessor is connected to a host processor through a bus controller. The subprocessor includes: a command fetch unit that fetches a command from a subprocessor program; a register unit; a command decoding unit that decodes the command; and an operation unit that performs command execution processing. The host processor sets a program counter value indicating a storage destination of the subprocessor program and a processing start command for, the processing of the subprocessor to the register unit. The command fetch unit fetches a command designated by the program counter value, the command decoding unit decodes the command, and the operation unit performs command execution processing. | 01-26-2012 |
20120066478 | METHOD FOR FAST PARALLEL INSTRUCTION LENGTH DETERMINATION - The present invention provides a method and apparatus that may be used for parallel instruction length decoding. One embodiment of the method includes concurrently determining a plurality of masks identifying bytes in a plurality of candidate instructions. Each mask uses a different byte in a first fetch window as a starting byte and the corresponding one of the plurality of candidate instructions includes the starting byte. This embodiment of the method also includes selecting one of the masks to identify one of the candidate instructions as a first instruction using information indicating an ending byte of a previous instruction. | 03-15-2012 |
20120066479 | METHODS AND APPARATUS FOR HANDLING SWITCHING AMONG THREADS WITHIN A MULTITHREAD PROCESSOR - A system, apparatus and method for handling switching among threads within a multithread processor are described herein. Embodiments of the present invention provide a method for multithread handling that includes fetching and issuing one or more instructions, corresponding to a first instruction execution thread, to an execution block for execution during a cycle count associated with the first instruction execution thread and when the instruction execution thread is in an active mode. The method further includes switching a second instruction execution thread to the active mode when the cycle count corresponding to the first instruction execution thread is complete, and fetching and issuing one or more instructions, corresponding to the second instruction execution thread, to the execution block for execution during a cycle count associated with the second instruction execution thread. The method additionally includes resetting the cycle counts when a master instruction execution thread is in the active mode. | 03-15-2012 |
20120072700 | MULTI-LEVEL REGISTER FILE SUPPORTING MULTIPLE THREADS - A processor includes an instruction fetch unit, an issue queue coupled to the instruction fetch unit, an execution unit coupled to the issue queue, and a multi-level register file including a first level register file having lower access latency and a second level register file having higher access latency. Each of the first and second level register files includes a plurality of physical registers for holding operands that is concurrently shared by a plurality of threads. The processor further includes a mapper that, at dispatch of an instruction specifying a source logical register from the instruction fetch unit to the issue queue, initiates a swap of a first operand associated with the source logical register that is in the second level register file with a second operand held in the first level register file. The issue queue, following the swap, issues the instruction to the execution unit for execution. | 03-22-2012 |
20120072701 | Method Macro Expander - One embodiment of the present invention sets forth a [TODO once claims are reviewed] | 03-22-2012 |
20120079241 | INSTRUCTION EXECUTION BASED ON OUTSTANDING LOAD OPERATIONS - One embodiment of the present invention sets forth a technique for scheduling thread execution in a multi-threaded processing environment. A two-level scheduler maintains a small set of active threads called strands to hide function unit pipeline latency and local memory access latency. The strands are a sub-set of a larger set of pending threads that is also maintained by the two-leveler scheduler. Pending threads are promoted to strands and strands are demoted to pending threads based on latency characteristics, such as whether outstanding load operations have been executed. The longer latency of the pending threads is hidden by selecting strands for execution. When the latency for a pending thread is expired, the pending thread may be promoted to a strand and begin (or resume) execution. When a strand encounters a latency event, the strand may be demoted to a pending thread while the latency is incurred. | 03-29-2012 |
20120084534 | SYSTEM AND METHOD FOR FAST BRANCHING USING A PROGRAMMABLE BRANCH TABLE - Methods and systems consistent with the present invention provide a programmable table which allows software to define a plurality of branching functions, each of which maps a vector of condition codes to a branch offset. This technique allows for a flexible multi-way branching functionality, using a conditional branch outcome table that can be specified by a programmer. Any instruction can specify the evaluation of arbitrary conditional expressions to compute the values for the condition codes, and can choose a particular branching function. When the processor executes the instruction, the processor's arithmetic/logical functional units evaluate the conditional expressions and then the processor performs the branch operation, according to the specified branching function. | 04-05-2012 |
20120089816 | QUERY SAMPLING INFORMATION INSTRUCTION - A measurement sampling facility takes snapshots of the central processing unit (CPU) on which it is executing at specified sampling intervals to collect data relating to tasks executing on the CPU. The collected data is stored in a buffer, and at selected times, an interrupt is provided to remove data from the buffer to enable reuse thereof. The interrupt is not taken after each sample, but in sufficient time to remove the data and minimize data loss. | 04-12-2012 |
20120096239 | Low Power Execution of a Multithreaded Program - Technologies for low power execution of one or more threads of a multithreaded program by one or more processing elements are generally disclosed. | 04-19-2012 |
20120124335 | System Core for Transferring Data Between an External Device and Memory - Details of a highly cost effective and efficient implementation of a manifold array (ManArray) architecture and instruction syntax for use therewith are described herein. Various aspects of this approach include the regularity of the syntax, the relative ease with which the instruction set can be represented in database form, the ready ability with which tools can be created, the ready generation of self-checking codes and parameterized test cases. Parameterizations can be fairly easily mapped and system maintenance is significantly simplified. | 05-17-2012 |
20120144163 | DATA PROCESSING METHOD AND SYSTEM BASED ON PIPELINE - A data processing system and method are disclosed. The system comprises an instruction-fetch stage where an instruction is fetched and a specific instruction is input into decode stage; a decode stage where said specific instruction indicates that contents of a register in a register file are used as an index, and then, the register file pointed to by said index is accessed based on said index; an execution stage where an access result of said decode stage is received, and computations are implemented according to the access result of the decode stage. | 06-07-2012 |
20120151185 | FINE-GRAINED PRIVILEGE ESCALATION - A processor and a method for privilege escalation in a processor are provided. The method may comprise fetching an instruction from a fetch address, where the instruction requires the processor to be in supervisor mode for execution, and determining whether the fetch address is within a predetermined address range. The instruction is filtered through an instruction mask and then it is determined whether the instruction, after being filtered through the mask, equals the value in an instruction value compare register. The processor privilege is raised to supervisor mode for execution of the instruction in response to the fetch address being within the predetermined address range and the filtered instruction equaling the value in the instruction value compare register, wherein the processor privilege is raised to supervisor mode without use of an interrupt. The processor privilege returns to its previous level after execution of the instruction. | 06-14-2012 |
20120151186 | CONTROLLING SIMULATION OF A MICROPROCESSOR INSTRUCTION FETCH UNIT THROUGH MANIPULATION OF INSTRUCTION ADDRESSES - Instruction fetch unit (IFU) verification is improved by dynamically monitoring the current state of the IFU model and detecting any predetermined states of interest. The instruction address sequence is automatically modified to force a selected address to be fetched next by the IFU model. The instruction address sequence may be modified by inserting one or more new instruction addresses, or by jumping to a non-sequential address in the instruction address sequence. In exemplary implementations, the selected address is a corresponding address for an existing instruction already loaded in the IFU cache, or differs only in a specific field from such an address. The instruction address control is preferably accomplished without violating any rules of the processor architecture by sending a flush signal to the IFU model and overwriting an address register corresponding to a next address to be fetched. | 06-14-2012 |
20120159124 | METHOD AND SYSTEM FOR COMPUTATIONAL ACCELERATION OF SEISMIC DATA PROCESSING - A computer-implemented method and a system for computational acceleration of seismic data processing are described. The method includes defining a specific non-uniform memory access (NUMA) scheduling for a plurality of cores in a processor according to data to be processed; and running two or more threads through each of the plurality of cores. | 06-21-2012 |
20120159125 | EFFICIENCY OF SHORT LOOP INSTRUCTION FETCH - A method, system and computer program product for instruction fetching within a processor instruction unit, utilizing a loop buffer, one or more virtual loop buffers, and/or an instruction buffer. During instruction fetch, modified instruction buffers coupled to an instruction cache (I-cache) temporarily store instructions from a single branch, backwards short loop. The modified instruction buffers may be a loop buffer, one or more virtual loop buffers, and/or an instruction buffer. Instructions are stored in the modified instruction buffers for the length of the loop cycle. The instruction fetch within the instruction unit of a processor retrieves the instructions for the short loop from the modified buffers during the loop cycle, rather than from the instruction cache. | 06-21-2012 |
20120166767 | SYSTEM, APPARATUS, AND METHOD FOR SEGMENT REGISTER READ AND WRITE REGARDLESS OF PRIVILEGE LEVEL - Embodiments of systems, apparatuses, and methods for performing privilege agnostic segment base register read or write instruction are described. An exemplary method may include fetching the privilege agnostic segment base register write instruction, wherein the privilege agnostic write instruction includes a 64-bit data source operand, decoding the fetched privilege agnostic segment base register write instruction, and executing the decoded privilege agnostic segment base register write instruction to write the 64-bit data of the source operand into the segment base register identified by the opcode of the privilege agnostic segment base register write instruction. | 06-28-2012 |
20120173848 | PIPELINE FLUSH FOR PROCESSOR THAT MAY EXECUTE INSTRUCTIONS OUT OF ORDER - An embodiment of an instruction pipeline includes first and second sections. The first section is operable to provide first and second ordered instructions, and the second section is operable, in response to the second instruction, to read first data from a data-storage location, is operable, in response to the first instruction, to write second data to the data-storage location after reading the first data, and is operable, in response to the writing the second data after reading the first data, to cause the flushing of a some, but not all, of the pipeline. Such an instruction pipeline may reduce the processing time lost and the energy expended due to a pipeline flush by flushing only a portion of the pipeline instead of flushing the entire pipeline. | 07-05-2012 |
20120173849 | Methods and Apparatus for Scalable Array Processor Interrupt Detection and Response - Hardware and software techniques for interrupt detection and response in a scalable pipelined array processor environment are described. Utilizing these techniques, a sequential program execution model with interrupts can be maintained in a highly parallel scalable pipelined array processing containing multiple processing elements and distributed memories and register files. When an interrupt occurs, interface signals are provided to all PEs to support independent interrupt operations in each PE dependent upon the local PE instruction sequence prior to the interrupt. Processing/element exception interrupts are supported and low latency interrupt processing is also provided for embedded systems where real time signal processing is to required. Further, a hierarchical interrupt structure is used allowing a generalized debug approach using debut interrupts and a dynamic debut monitor mechanism. | 07-05-2012 |
20120191947 | COMPUTER OPERATION CONTROL METHOD, PROGRAM AND SYSTEM - A computer implemented control method, article of manufacture, and computer implemented system for determining whether stack allocation is possible. The method includes: allocating an object created by a method frame to a stack. The allocation is performed in response to: calling a first and second instruction in the method frame; the first instruction causes an escape of the object, and the second instruction cancels the escape of the object; the object does not escape to a thread other than a thread to which the object has escaped, at the point in time when the escape is cancelled; the first instruction has been called before the second instruction is called; and the object does not escape in accordance with an instruction other than the first instruction in the method frame, regardless of whether the object escapes in accordance with the first instruction. | 07-26-2012 |
20120191948 | Nested Virtualization Performance In A Computer System - A virtualization architecture for improving the performance of nested virtualization in a computer system. A virtualization instruction reads or writes data in a control structure used by a virtual machine monitor (VMM) to maintain state on a virtual machine (VM) to support transitions between a root mode of operation of a CPU in which the VMM executes and a non-root mode of operation of the CPU in which the VM executes. A privileged data access is made to a primary control structure according to the virtualization instruction if the CPU is in the root mode. A non-privileged data access is made to a secondary control structure according to the virtualization instruction if the CPU is in the non-root mode and a secondary control structure field in the primary control structure is enabled. | 07-26-2012 |
20120204004 | Processor with a Hybrid Instruction Queue - A queuing apparatus having a hierarchy of queues, in one of a number of aspects, is configured to control backpressure between processors in a multiprocessor system. A fetch queue is coupled to an instruction cache and configured to store first instructions for a first processor and second instructions for a second processor in an order fetched from the instruction cache. An in-order queue is coupled to the fetch queue and configured to store the second instructions accepted from the fetch queue in response to a write indication. An out-of-order queue is coupled to the fetch queue and to the in-order queue and configured to store the second instructions accepted from the fetch queue in response to an indication that space is available in the out-of-order queue, wherein the second instructions may be accessed out-of-order with respect to other second instructions executing on different execution pipelines. | 08-09-2012 |
20120204005 | Processor with a Coprocessor having Early Access to Not-Yet Issued Instructions - Apparatus and methods provide early access of instructions. A fetch queue is coupled to an instruction cache and configured to store a mix of processor instructions for a first processor and coprocessor instructions for a second processor. A coprocessor instruction selector is coupled to the fetch queue and configured to copy coprocessor instructions from the fetch queue. A queue is coupled to the coprocessor instruction selector and from which coprocessor instructions are accessed for execution before the coprocessor instruction is issued to the first processor. Execution of the copied coprocessor instruction is started in the coprocessor before the coprocessor instruction is issued to a processor. The execution of the copied coprocessor instruction is completed based on information received from the processor after the coprocessor instruction has been issued to the processor. | 08-09-2012 |
20120216020 | INSTRUCTION SUPPORT FOR PERFORMING STREAM CIPHER - Techniques relating to a processor that provides instruction-level support for a stream cipher are disclosed. In one embodiment, the processor supports a first instruction executable to perform an alpha multiplication, an alpha division, and an exclusive-OR operation using a result of the alpha multiplication and a result of the alpha division. In one embodiment, the processor supports a second instruction executable to perform a modular addition of a value R1 and a value S, and to perform a first exclusive-OR operation on a result of the modular addition and a value R2. In one embodiment, the processor supports a third instruction executable to perform a substitution-box (S-Box) operation on a value R1 to produce a value R2′, and to perform a modular addition using a value R2 to produce a value R1'. | 08-23-2012 |
20120239908 | DUAL THREAD PROCESSOR - Pipeline processor architectures, processors, and methods are provided. A described processor includes thread allocation counters for corresponding processor threads. For example, a first counter is configured to store a first processor time allocation that controls first periods of processor time for a first processor thread, the first processor thread retaining control of the processor during each of the first periods of processor time. The processor causes data associated with the first processor thread to pass through the processor's pipeline during the first periods of processor time. A second counter is similarly configured. The processor can be configured to receive an input defining processor time to be allocated to one or more processor threads and to use the input to change one or more of the counters such that subsequent periods of processor times for the one or more processor threads are affected. | 09-20-2012 |
20120246445 | CENTRAL PROCESSING UNIT AND MICROCONTROLLER - A program data area | 09-27-2012 |
20120246446 | Dynamically Determining the Profitability of Direct Fetching in a Multicore Architecture - Technologies are generally described herein for determining a profitability of direct fetching in a multicore processor. The multicore processor may include a first and a second tile. The first tile may include a first core and a first cache. The second tile may include a second core, a second cache, and a fetch location pointer register (FLPR). The multicore processor may migrate a thread executing on the first core to the second core. The multicore processor may store a location of the first cache in the FLPR. The multicore processor may execute the thread on the second core. The multicore processor may identify a cache miss for a block in the second cache. The multicore processor may determine whether a profitability of direct fetching of the block indicates direct fetching or directory-based fetching. The multicore processor may perform direct fetching or directory-based fetching based on the determination. | 09-27-2012 |
20120254590 | SYSTEM AND TECHNIQUE FOR A PROCESSOR HAVING A STAGED EXECUTION PIPELINE - A technique includes receiving a request from a processor to retrieve a first instruction from a memory for a staged execution pipeline. The technique includes selectively retrieving the first instruction from the memory in response to the request based on a determination of whether the processor will execute the first instruction. | 10-04-2012 |
20120254591 | SYSTEMS, APPARATUSES, AND METHODS FOR STRIDE PATTERN GATHERING OF DATA ELEMENTS AND STRIDE PATTERN SCATTERING OF DATA ELEMENTS - Embodiments of systems, apparatuses, and methods for performing gather and scatter stride instruction in a computer processor are described. In some embodiments, the execution of a gather stride instruction causes a conditionally storage of strided data elements from memory into the destination register according to at least some of bit values of a writemask. | 10-04-2012 |
20120254592 | SYSTEMS, APPARATUSES, AND METHODS FOR EXPANDING A MEMORY SOURCE INTO A DESTINATION REGISTER AND COMPRESSING A SOURCE REGISTER INTO A DESTINATION MEMORY LOCATION - Embodiments of systems, apparatuses, and methods for performing an expand and/or compress instruction in a computer processor are described. In some embodiments, the execution of an expand instruction causes the selection of elements from a source that are to be sparsely stored in a destination based on values of the writemask and store each selected data element of the source as a sparse data element into a destination location, wherein the destination locations correspond to each writemask bit position that indicates that the corresponding data element of the source is to be stored. | 10-04-2012 |
20120254593 | SYSTEMS, APPARATUSES, AND METHODS FOR JUMPS USING A MASK REGISTER - Embodiments of systems, apparatuses, and methods for performing a jump instruction in a computer processor are described. In some embodiments, the execution of a blend instruction causes a conditional jump to an address of a target instruction when all of bits of a writemask are zero, wherein the address of the target instruction is calculated using an instruction pointer of the instruction and the relative offset. | 10-04-2012 |
20120254594 | Hardware Assist Thread for Increasing Code Parallelism - Mechanisms are provided for offloading a workload from a main thread to an assist thread. The mechanisms receive, in a fetch unit of a processor of the data processing system, a branch-to-assist-thread instruction of a main thread. The branch-to-assist-thread instruction informs hardware of the processor to look for an already spawned idle thread to be used as an assist thread. Hardware implemented pervasive thread control logic determines if one or more already spawned idle threads are available for use as an assist thread. The hardware implemented pervasive thread control logic selects an idle thread from the one or more already spawned idle threads if it is determined that one or more already spawned idle threads are available for use as an assist thread, to thereby provide the assist thread. In addition, the hardware implemented pervasive thread control logic offloads a portion of a workload of the main thread to the assist thread. | 10-04-2012 |
20120272043 | REQUEST COALESCING FOR INSTRUCTION STREAMS - Sequential fetch requests from a set of fetch requests are combined into longer coalesced requests that match the width of a system memory interface in order to improve memory access efficiency for reading the data specified by the fetch requests. The fetch requests may be of different classes and each data class is coalesced separately, even when intervening fetch requests are of a different class. Data read from memory is ordered according to the order of the set of fetch requests to produce an instruction stream that includes the fetch requests for the different classes. | 10-25-2012 |
20120272044 | PROCESSOR FOR EXECUTING HIGHLY EFFICIENT VLIW - A 32-bit instruction | 10-25-2012 |
20120284488 | Methods and Apparatus for Constant Extension in a Processor - Programs often require constants that cannot be encoded in a native instruction format, such as | 11-08-2012 |
20120290817 | BRANCH TARGET STORAGE AND RETRIEVAL IN AN OUT-OF-ORDER PROCESSOR - A processor configured to facilitate transfer and storage of predicted targets for control transfer instructions (CTIs). In certain embodiments, the processor may be multithreaded and support storage of predicted targets for multiple threads. In some embodiments, a CTI branch target may be stored by one element of a processor and a tag may indicate the location of the stored target. The tag may be associated with the CTI rather than associating the complete target address with the CTI. When the CTI reaches an execution stage of the processor, the tag may be used to retrieve the predicted target address. In some embodiments using a tag to retrieve a predicted target, CTI instructions from different processor threads may be interleaved without affecting retrieval of predicted targets. | 11-15-2012 |
20120303934 | METHOD AND APPARATUS FOR GENERATING AN ENHANCED PROCESSOR RESYNC INDICATOR SIGNAL USING HASH FUNCTIONS AND A LOAD TRACKING UNIT - A method and apparatus are described for generating a signal to resync a processor. In one embodiment, a particular load operation is picked from a load queue in the processor, and the particular load operation is completed out of order with respect to other load operations in the load queue. A load ordering block (LOB) in the processor receives a physical address of the completed load operation, and receives a probe data address that indicates an address of a requested data line. The LOB generates a signal to resync the processor when the physical address of the completed load operation matches the probe data address, (i.e., when bits, that have been set in a bit vector (e.g., Bloom filter) of the LOB by hashing the physical address of the completed load operation, match bits generated by hashing the probe data address). | 11-29-2012 |
20120311304 | PROCESSOR, COMPUTER PRODUCT, COMPRESSION APPARATUS, AND COMPRESSION METHOD - A processor accesses memory storing a compressed instruction sequence that includes compression information indicating that an instruction that with respect to the preceding instruction, has identical operation code and operand continuity is compressed. The processor includes a fetcher that fetches a bit string from the memory and determines whether the bit string is a non-compressed instruction, where if so, transfers the given bit string and if not, transfers the compression information; and a decoder that upon receiving the non-compressed instruction, holds in a buffer, instruction code and an operand pattern of the non-compressed instruction and executes processing to set to an initial value, the value of an instruction counter that indicates a count of consecutive instructions having identical operation code and operand continuity, and upon receiving the compression information, restores the instruction code based on the instruction code held in the buffer, the instruction counter value, and the operand pattern. | 12-06-2012 |
20130007415 | METHOD AND APPARATUS FOR SCHEDULING OF INSTRUCTIONS IN A MULTI-STRAND OUT-OF-ORDER PROCESSOR - In accordance with embodiments disclosed herein, there are provided methods, systems, and apparatuses for scheduling instructions in a multi-strand out-of-order processor. For example, an apparatus for scheduling instructions in a multi-strand out-of-order processor includes an out-of-order instruction fetch unit to retrieve a plurality of interdependent instructions for execution from a multi-strand representation of a sequential program listing; an instruction scheduling unit to schedule the execution of the plurality of interdependent instructions based at least in part on operand synchronization bits encoded within each of the plurality of interdependent instructions; and a plurality of execution units to execute at least a subset of the plurality of interdependent instructions in parallel. | 01-03-2013 |
20130007416 | METHOD AND APPARATUS FOR SHUFFLING DATA - Method, apparatus, and program means for shuffling data. The method of one embodiment comprises receiving a first operand having a set of L data elements and a second operand having a set of L control elements. For each control element, data from a first operand data element designated by the individual control element is shuffled to an associated resultant data element position if its flush to zero field is not set and a zero is placed into the associated resultant data element position if its flush to zero field is not set. | 01-03-2013 |
20130013895 | BYTE-ORIENTED MICROCONTROLLER HAVING WIDER PROGRAM MEMORY BUS SUPPORTING MACRO INSTRUCTION EXECUTION, ACCESSING RETURN ADDRESS IN ONE CLOCK CYCLE, STORAGE ACCESSING OPERATION VIA POINTER COMBINATION, AND INCREASED POINTER ADJUSTMENT AMOUNT - An exemplary byte-oriented microcontroller includes a program memory, a program memory bus, and a core circuit. The program memory bus has a bus width wider than one instruction byte, and the core circuit is coupled to the program memory through the program memory bus for executing at least one instruction by processing a plurality of instruction bytes fetched from the program memory. The core circuit includes a fetch unit, for fetching the instruction bytes through the program memory bus and re-ordering the fetched instruction bytes to form a complete instruction. | 01-10-2013 |
20130013896 | LOAD/MOVE AND DUPLICATE INSTRUCTIONS FOR A PROCESSOR - A method includes, in a processor, loading/moving a first portion of bits of a source into a first portion of a destination register and duplicate that first portion of bits in a subsequent portion of the destination register. | 01-10-2013 |
20130024661 | HARDWARE ACCELERATION COMPONENTS FOR TRANSLATING GUEST INSTRUCTIONS TO NATIVE INSTRUCTIONS - A hardware based translation accelerator. The hardware includes a guest fetch logic component for accessing guest instructions; a guest fetch buffer coupled to the guest fetch logic component and a branch prediction component for assembling guest instructions into a guest instruction block; and conversion tables coupled to the guest fetch buffer for translating the guest instruction block into a corresponding native conversion block. The hardware further includes a native cache coupled to the conversion tables for storing the corresponding native conversion block, and a conversion look aside buffer coupled to the native cache for storing a mapping of the guest instruction block to corresponding native conversion block, wherein upon a subsequent request for a guest instruction, the conversion look aside buffer is indexed to determine whether a hit occurred, wherein the mapping indicates the guest instruction has a corresponding converted native instruction in the native cache. | 01-24-2013 |
20130036294 | SYSTEM AND METHOD FOR INSTRUCTION SETS WITH RUN-TIME CONSISTENCY CHECK - A system and method includes modules for determining whether an instruction is a target of a non-sequential fetch operation with an expected numerical property value, and avoiding execution of the instruction if it is the target of the non-sequential fetch operation and does not have the expected numerical property. Other embodiments include encoding an instruction with a functionality that is a target of a non-sequential fetch operation with an expected numerical property value. Instructions with the same functionality that are not targets of non-sequential fetch operations can be encoded with a different numerical property value. More specific embodiments can include a numerical property of parity, determining whether the instruction is valid, and throwing an exception, setting status bits, sending an interrupt to a control processor, and a combination thereof to avoid execution. | 02-07-2013 |
20130042089 | WORD LINE LATE KILL IN SCHEDULER - A method for picking an instruction for execution by a processor includes providing a multiple-entry vector, each entry in the vector including an indication of whether a corresponding instruction is ready to be picked. The vector is partitioned into equal-sized groups, and each group is evaluated starting with a highest priority group. The evaluating includes logically canceling all other groups in the vector when a group is determined to include an indication that an instruction is ready to be picked, whereby the vector only includes a positive indication for the one instruction that is ready to be picked. | 02-14-2013 |
20130054940 | MECHANISM FOR INSTRUCTION SET BASED THREAD EXECUTION ON A PLURALITY OF INSTRUCTION SEQUENCERS - In an embodiment, a method is provided. The method includes managing user-level threads on a first instruction sequencer in response to executing user-level instructions on a second instruction sequencer that is under control of an application level program. A first user-level thread is run on the second instruction sequencer and contains one or more user level instructions. A first user level instruction has at least 1) a field that makes reference to one or more instruction sequencers or 2) implicitly references with a pointer to code that specifically addresses one or more instruction sequencers when the code is executed. | 02-28-2013 |
20130067200 | MFENCE AND LFENCE MICRO-ARCHITECTURAL IMPLEMENTATION METHOD AND SYSTEM - A system and method for fencing memory accesses. Memory loads can be fenced, or all memory access can be fenced. The system receives a fencing instruction that separates memory access instructions into older accesses and newer accesses. A buffer within the memory ordering unit is allocated to the instruction. The access instructions newer than the fencing instruction are stalled. The older access instructions are gradually retired. When all older memory accesses are retired, the fencing instruction is dispatched from the buffer. | 03-14-2013 |
20130073833 | REDUCING STORE-HIT-LOADS IN AN OUT-OF-ORDER PROCESSOR - A technique for reducing store-hit-loads in an out-of-order processor includes storing a store address of a store instruction associated with a store-hit-load (SHL) pipeline flush in an SHL entry. In response to detecting another SHL pipeline flush for the store address, a current count associated with the SHL entry is updated. In response to the current count associated with the SHL entry reaching a first terminal count, a dependency for the store instruction is created such that execution of a younger load instruction with a load address that overlaps the store address stalls until the store instruction executes. | 03-21-2013 |
20130073834 | MFENCE AND LFENCE MICRO-ARCHITECTURAL IMPLEMENTATION METHOD AND SYSTEM - A system and method for fencing memory accesses. Memory loads can be fenced, or all memory access can be fenced. The system receives a fencing instruction that separates memory access instructions into older accesses and newer accesses. A buffer within the memory ordering unit is allocated to the instruction. The access instructions newer than the fencing instruction are stalled. The older access instructions are gradually retired. When all older memory accesses are retired, the fencing instruction is dispatched from the buffer. | 03-21-2013 |
20130080740 | FAST CONDITION CODE GENERATION FOR ARITHMETIC LOGIC UNIT - In one embodiment, a microprocessor includes fetch logic for retrieving an instruction, decode logic configured to identify an arithmetic operation specified in the instruction, and execution logic configured to receive operands specified by the instruction. The execution logic includes a primary logic path configured to perform the arithmetic operation on such operands and a secondary parallel logic path configured to output metadata associated with the result of the arithmetic operation. | 03-28-2013 |
20130086359 | Processor Hardware Pipeline Configured for Single-Instruction Address Extraction and Memory Access Operation - Memory access instructions, such as load and store instructions, are processed in a processor-based system. Processor hardware pipeline configurations enable efficient performance of memory access instructions, such as a pipeline configuration that enables, for a memory access operation request by a register-operand based virtual machine, computation of the memory location corresponding to a virtual-machine register by extracting a bit-field from the virtual-machine instruction and accessing (load or store) the computed memory location that represents a virtual register of the virtual-machine, in a single pass through the pipeline. Thus this processor hardware pipeline configuration enables a virtual machine register read/write operation to be performed by a single hardware processor instruction through a single pass in the processor hardware pipeline, for a register-operand based virtual machine. | 04-04-2013 |
20130086360 | FIFO Load Instruction - An instruction identifies a register and a memory location. Upon execution of the instruction by a processor, an item is loaded from the memory location and a shift and insert operation is performed to shift data in the register and to insert the item into the register. | 04-04-2013 |
20130086361 | Scalable Decode-Time Instruction Sequence Optimization of Dependent Instructions - Producer-consumer instructions, comprising a first instruction and a second instruction in program order, are fetched requiring in-order execution, the second instruction is modified by the processor so that the first instruction and second instruction can be completed out-of-order, the modification comprising any one of extending an immediate field of the second instruction using immediate field information of the first instruction or providing a source location of the first instruction as an additional source location to source locations of the second instruction. | 04-04-2013 |
20130086362 | Managing a Register Cache Based on an Architected Computer Instruction Set Having Operand First-Use Information - A prefix instruction is executed and passes operands to a net instruction without storing the operands in an architected resource such that the execution of the next instruction uses the operands provided by the prefix instruction to perform an operation, the operands may be prefix instruction immediate field or a target register of the prefix instruction execution. | 04-04-2013 |
20130117534 | INSTRUCTION AND LOGIC FOR PROCESSING TEXT STRINGS - Method, apparatus, and program means for performing a string comparison operation. In one embodiment, an apparatus includes execution resources to execute a first instruction. In response to the first instruction, said execution resources store a result of a comparison between each data element of a first and second operand corresponding to a first and second text string, respectively. | 05-09-2013 |
20130124826 | Optimizing System Throughput By Automatically Altering Thread Co-Execution Based On Operating System Directives - A technique for optimizing program instruction execution throughput in a central processing unit core (CPU). The CPU implements a simultaneous multithreading (SMT) operational mode wherein program instructions associated with at least two software threads are executed in parallel as hardware threads while sharing one or more hardware resources used by the CPU, such as cache memory, translation lookaside buffers, functional execution units, etc. As part of the SMT mode, the CPU implements an autothread (AT) operational mode. During the AT operational mode, a determination is made whether there is a resource conflict between the hardware threads that undermines instruction execution throughput. If a resource conflict is detected, the CPU adjusts the relative instruction execution rates of the hardware threads based on relative priorities of the software threads. | 05-16-2013 |
20130124827 | INSTRUCTION AND LOGIC FOR PROCESSING TEXT STRINGS - Method, apparatus, and program means for performing a string comparison operation. In one embodiment, an apparatus includes execution resources to execute a first instruction. In response to the first instruction, said execution resources store a result of a comparison between each data element of a first and second operand corresponding to a first and second text string, respectively. | 05-16-2013 |
20130138924 | EFFICIENT MICROCODE INSTRUCTION DISPATCH - An apparatus and method for avoiding bubbles and maintaining a maximum instruction throughput rate when cracking microcode instructions. A lookahead pointer scans the newest entries of a dispatch queue for microcode instructions. A detected microcode instruction is conveyed to a microcode engine to be cracked into a sequence of micro-ops. Then, the sequence of micro-ops is placed in a queue, and when the original microcode instruction entry in the dispatch queue is selected for dispatch, the sequence of micro-ops is dispatched to the next stage of the processor pipeline. | 05-30-2013 |
20130145123 | Computing Core Application Access Utilizing Dispersed Storage - A computing core includes a processing module, main memory, and a memory controller. The memory controller receives a request to fetch an instruction from the processing module and determines whether the instruction is currently stored in the main memory. When the instruction is not currently stored in the main memory, the memory controller determines whether the instruction is stored in a distributed storage network (DSN) memory as one or more sets of encoded instruction slices; and, when it is, the memory controller addresses the DSN memory to retrieve the one or more sets of encoded instruction slices. When at least a threshold number of encoded instruction slices are retrieved for each of the one or more sets of encoded instruction slices, the one or more sets of encoded instruction slices are decoded using a dispersed storage error coding function to reconstruct the instruction, which is provided to the processing module. | 06-06-2013 |
20130166880 | PROCESSING DEVICE AND METHOD FOR CONTROLLING PROCESSING DEVICE - A processing device has an instruction buffer retaining one or more instructions obtained by an instruction fetch request, an instruction execution control unit decoding and executing an instruction, a branch prediction mechanism retaining one or more branch histories including a distance flag indicating a difference between a branch instruction address and a branch destination instruction address and performing a branch prediction of an instruction, and an instruction fetch control unit issuing the instruction fetch request. When a branch prediction result is a branch taken and it is judged based on the distance flag that the instruction fetch request for the branch destination instruction address is included in the instruction fetch requests in a sequential direction which are issued until the branch prediction result is outputted, the control unit causes to output an instruction retained in the instruction buffer without issuing an instruction fetch request for the branch destination instruction address. | 06-27-2013 |
20130173885 | Processor and Methods of Adjusting a Branch Misprediction Recovery Mode - A processor core includes a fetch control unit for fetching instructions and placing the instructions into an instruction queue and includes a branch predictor for controlling the fetch control unit to speculatively fetch at least one instruction subsequent to an unresolved branch instruction. The processor further includes a controller configured to dispatch instructions from the instruction queue and, in response to a branch misprediction of an unresolved control instruction, to apply a selected one of a checkpointing-based recovery mode and a commit-time-based recovery mode. | 07-04-2013 |
20130179661 | Performing A Multiply-Multiply-Accumulate Instruction - In one embodiment, the present invention includes a processor having multiple execution units, at least one of which includes a circuit having a multiply-accumulate (MAC) unit including multiple multipliers and adders, and to execute a user-level multiply-multiply-accumulate instruction to populate a destination storage with a plurality of elements each corresponding to an absolute value for a pixel of a pixel block. Other embodiments are described and claimed. | 07-11-2013 |
20130185541 | BITSTREAM BUFFER MANIPULATION WITH A SIMD MERGE INSTRUCTION - Method, apparatus, and program means for performing bitstream buffer manipulation with a SIMD merge instruction. The method of one embodiment comprises determining whether any unprocessed data bits for a partial variable length symbol exist in a first data block is made. A shift merge operation is performed to merge the unprocessed data bits from the first data block with a second data block. A merged data block is formed. A merged variable length symbol comprised of the unprocessed data bits and a plurality of data bits from the second data block is extracted from the merged data block. | 07-18-2013 |
20130198490 | SYSTEMS AND METHODS FOR REDUCING BRANCH MISPREDICTION PENALTY - In a processing system capable of single and multi-thread execution, a branch prediciton unit can be configured to detect hard to predict branches and loop instructions. In a dual-threading (simultaneous multi-threading) configuration, one instruction queues (IQ) is used for each thread and instructions are alternately sent from each IQ to decode units. In single thread mode, the second IQ can be used to store the “not predicted path” of the hard-to-predict branch or the “fall-through” path of the loop. On mis-prediction, the mis-prediction penalty is reduced by getting the instructions from IQ instead of instruction cache. | 08-01-2013 |
20130205116 | MULTI-THREADED PROCESSOR INSTRUCTION BALANCING THROUGH INSTRUCTION UNCERTAINTY - A computer-implemented method for instruction execution in a pipeline, includes fetching, in the pipeline, a plurality of instructions, wherein the plurality of instructions includes a plurality of branch instructions, for each of the plurality of branch instructions, assigning a branch uncertainty to each of the plurality of branch instructions, for each of the plurality of instructions, assigning an instruction uncertainty that is a summation of branch uncertainties of older unresolved branches, and balancing the instructions, based on a current summation of instruction uncertainty, in the pipeline. | 08-08-2013 |
20130205117 | MFENCE AND LFENCE MICRO-ARCHITECTURAL IMPLEMENTATION METHOD AND SYSTEM - A system and method for fencing memory accesses. Memory loads can be fenced, or all memory access can be fenced. The system receives a fencing instruction that separates memory access instructions into older accesses and newer accesses. A buffer within the memory ordering unit is allocated to the instruction. The access instructions newer than the fencing instruction are stalled. The older access instructions are gradually retired. When all older memory accesses are retired, the fencing instruction is dispatched from the buffer. | 08-08-2013 |
20130246741 | RUN-TIME INSTRUMENTATION DIRECTED SAMPLING - Embodiments of the invention relate to implementing run-time instrumentation directed sampling. An aspect of the invention includes fetching a run-time instrumentation next (RINEXT) instruction from an instruction stream. The instruction stream includes the RINEXT instruction followed by a next sequential instruction (NSI) in program order. The method further includes executing the RINEXT instruction by a processor. The executing includes determining whether a current run-time instrumentation state enables setting a sample point for reporting run-time instrumentation information during program execution. Based on the current run-time instrumentation state enabling setting the sample point, the NSI is a sample instruction for causing a run-time instrumentation event. Based on executing the NSI sample instruction, the run-time instrumentation event causes recording of run-time instrumentation information into a run-time instrumentation program buffer as a reporting group. | 09-19-2013 |
20130246742 | RUN-TIME-INSTRUMENTATION CONTROLS EMIT INSTRUCTION - Embodiments of the invention relate to executing a run-time-instrumentation EMIT (RIEMIT) instruction. A processor is configured to capture the run-time-instrumentation information of a stream of instructions. The RIEMIT instruction is fetched and executed. It is determined if the current run-time-instrumentation controls are configured to permit capturing and storing of run-time-instrumentation information in a run-time-instrumentation program buffer. If the controls are configured to store run-time-instrumentation instructions, then a RIEMIT instruction specified value is stored as an emit record of a reporting group in the run-time-instrumentation program buffer. | 09-19-2013 |
20130246743 | DETERMINING THE STATUS OF RUN-TIME-INSTRUMENTATION CONTROLS - The invention relates to determining the status of run-time-instrumentation controls. The status is determined by executing a test run-time-instrumentation controls (TRIC) instruction. The TRIC instruction executed in either a supervisor state or a lesser-privileged state. The TRIC instruction determines whether the run-time-instrumentation controls have changed. The run-time-instrumentation controls are set to an initial value using a privileged load run-time-instrumentation controls (LRIC) instruction. The TRIC instruction is fetched and executed. If the TRIC instruction is enabled, then it is determined if the initial value set by the run-time-instrumentation controls has been changed. If the initial value set by the run-time-instrumentation controls has been changed, then a condition code is set to a first value. | 09-19-2013 |
20130246744 | MODIFYING RUN-TIME-INSTRUMENTATION CONTROLS FROM A LESSER-PRIVILEGED STATE - Embodiments of the invention relate to modifying run-time-instrumentation controls (MRIC) from a lesser-privileged state. The MRIC instruction is fetched. The MRIC instruction includes the address of a run-time-instrumentation control block (RICCB). The RICCB is fetched based on the address included in the MRIC instruction. The RICCB includes values for modifying a subset of the processor's run-time-instrumentation controls. The subset of run-time-instrumentation controls includes a runtime instrumentation program buffer current address (RCA) of a runtime instrumentation program buffer (RIB) location. The RIB holds run-time-instrumentation information of the events recognized by the processor during program execution. The values of the RICCB are loaded into the run-time-instrumentation controls. Event information is provided to the RIB based on the values that were loaded in the run-time-instrumentation control. | 09-19-2013 |
20130246745 | VECTOR PROCESSOR AND VECTOR PROCESSOR PROCESSING METHOD - A vector processor includes an instruction fetching unit configured to acquire an instruction, a decoding/issuing unit configured to decode the instruction and issuing the instruction, an operation group configured to include a plurality of operation units and a register configured to store the element data column, wherein the plurality of operation units include a first operation unit processes a first type instruction and a second operation unit processes a second type instruction and the first type instruction; and when a plurality of divided instructions, for which the element data of an instruction to be issued has been divided, are processed by the second operation unit, in a case where the second type instruction is not present, the decoding/issuing unit issues the divided instructions, and in a case where the second type instruction is present, the decoding/issuing unit issues the instruction to be issued without performing division. | 09-19-2013 |
20130246746 | RUN-TIME INSTRUMENTATION DIRECTED SAMPLING - Embodiments of the invention relate to implementing run-time instrumentation directed sampling. An aspect of the invention includes a method for implementing run-time instrumentation directed sampling. The method includes fetching a run-time instrumentation next (RINEXT) instruction from an instruction stream. The instruction stream includes the RINEXT instruction followed by a next sequential instruction (NSI) in program order. The method further includes executing the RINEXT instruction by a processor. The executing includes determining whether a current run-time instrumentation state enables setting a sample point for reporting run-time instrumentation information during program execution. Based on the current run-time instrumentation state enabling setting the sample point, the NSI is a sample instruction for causing a run-time instrumentation event. Based on executing the NSI sample instruction, the run-time instrumentation event causes recording of run-time instrumentation information into a run-time instrumentation program buffer as a reporting group. | 09-19-2013 |
20130246747 | RUN-TIME-INSTRUMENTATION CONTROLS EMIT INSTRUCTION - Embodiments of the invention relate to executing a run-time-instrumentation EMIT (RIEMIT) instruction. A processor is configured to capture the run-time-instrumentation information of a stream of instructions. The RIEMIT instruction is fetched and executed. It is determined if the current run-time-instrumentation controls are configured to permit capturing and storing of run-time-instrumentation information in a run-time-instrumentation program buffer. If the controls are configured to store run-time-instrumentation instructions, then a RIEMIT instruction specified value is stored as an emit record of a reporting group in the run-time-instrumentation program buffer. | 09-19-2013 |
20130246748 | DETERMINING THE STATUS OF RUN-TIME-INSTRUMENTATION CONTROLS - The invention relates to determining the status of run-time-instrumentation controls. The status is determined by executing a test run-time-instrumentation controls (TRIC) instruction. The TRIC instruction is executed in either a supervisor state or a lesser-privileged state. The TRIC instruction determines whether the run-time-instrumentation controls have changed. The run-time-instrumentation controls are set to an initial value using a privileged load run-time-instrumentation controls (LRIC) instruction. The TRIC instruction is fetched and executed. If the TRIC instruction is enabled, then it is determined if the initial value set by the run-time-instrumentation controls has been changed. If the initial value set by the run-time-instrumentation controls has been changed, then a condition code is set to a first value. | 09-19-2013 |
20130262821 | PERFORMING PREDECODE-TIME OPTIMIZED INSTRUCTIONS IN CONJUNCTION WITH PREDECODE TIME OPTIMIZED INSTRUCTION SEQUENCE CACHING - A method for performing predecode-time optimized instructions in conjunction with predecode time optimized instruction sequence caching. The method includes receiving a first instruction of an instruction sequence and a second instruction of the instruction sequence and determining if the first instruction and the second instruction can be optimized. In response to the determining that the first instruction and second instruction can be optimized, the method includes, preforming a pre-decode optimization on the instruction sequence and generating a new second instruction, wherein the new second instruction is not dependent on a target operand of the first instruction and storing a pre-decoded first instruction and a pre-decoded new second instruction in an instruction cache. In response to determining that the first instruction and second instruction can not be optimized, the method includes, storing the pre-decoded first instruction and a pre-decoded second instruction in the instruction cache. | 10-03-2013 |
20130262822 | CACHING OPTIMIZED INTERNAL INSTRUCTIONS IN LOOP BUFFER - Embodiments of the invention relate to a computer system for storing an internal instruction loop in a loop buffer. The computer system includes a loop buffer and a processor. The computer system is configured to perform a method including fetching instructions from memory to generate an internal instruction to be executed, detecting a beginning of a first instruction loop in the instructions, determining that a first internal instruction loop corresponding to the first instruction loop is not stored in the loop buffer, fetching the first instruction loop, optimizing one or more instructions corresponding to the first instruction loop to generate a first optimized internal instruction loop, and storing the first optimized internal instruction loop in the loop buffer based on the determination that the first internal instruction loop is not stored in the loop buffer. | 10-03-2013 |
20130262823 | INSTRUCTION MERGING OPTIMIZATION - A computer system for optimizing instructions includes a processor including an instruction execution unit configured to execute instructions and an instruction optimization unit configured to optimize instructions and memory to store machine instructions to be executed by the instruction execution unit. The computer system is configured to perform a method including analyzing machine instructions from among a stream of instructions to be executed by the instruction execution unit, the machine instructions including a memory load instruction and a data processing instruction to perform a data processing function based on the memory load instruction, identifying the machine instructions as being eligible for optimization, merging the machine instructions into a single optimized internal instruction, and executing the single optimized internal instruction to perform a memory load function and a data processing function corresponding to the memory load instruction and the data processing instruction. | 10-03-2013 |
20130262824 | CODE GENERATION METHOD, AND INFORMATION PROCESSING APPARATUS - A computer-readable recording medium having stored therein a program for causing a computer to execute a digital signature process includes determining that a first specific instruction for executing parallel calculations of the same type, each calculation operating on a different piece of data, is generated by combining first and second instructions included in a first code, retrieving, from the first code, a third instruction for calculating data referenced by the first instruction and a fourth instruction for calculating data referenced by the second instruction, and selecting the third and fourth instructions as candidates of instructions to be combined with each other preferentially to generate a second specific instruction which is different from the first specific instruction. | 10-03-2013 |
20130275720 | ZERO CYCLE MOVE - A system and method for reducing the latency of data move operations. A register rename unit within a processor determines whether a decoded move instruction is eligible for a zero cycle move operation. If so, control logic assigns a physical register identifier associated with a source operand of the move instruction to the destination operand of the move instruction. Additionally, the register rename unit marks the given move instruction to prevent it from proceeding in the processor pipeline. Further maintenance of the particular physical register identifier may be done by the register rename unit during commit of the given move instruction. | 10-17-2013 |
20130283011 | Computer Program Instruction Analysis - Disclosed is a method of analysis of a computer program instruction for use in a central processing unit having a decoding unit. The method comprises receiving an address of an instruction to be analysed, fetching said instruction stored at said address, decoding by a decoding unit associated with the central processing unit, the fetched instruction; and returning the results of said decoding of said fetched instruction. The decoded results are returned as a data block stored in memory associated with the central processing unit or in one or more registers of the central processing unit. The decoded results include the type of the instruction and/or the instruction length. The method optionally further comprises analysing the decoded results to determine whether the instruction may be replaced with one of a trap or a break point. Also disclosed is a system and computer program for analysis of a computer program instruction for use in a central processing unit having a decoding unit. | 10-24-2013 |
20130283012 | Methods and Apparatus for Scalable Array Processor Interrupt Detection and Response - Hardware and software techniques for interrupt detection and response in a scalable pipelined array processor environment are described. Utilizing these techniques, a sequential program execution model with interrupts can be maintained in a highly parallel scalable pipelined array processing containing multiple processing elements and distributed memories and register files. When an interrupt occurs, interface signals are provided to all PEs to support independent interrupt operations in each PE dependent upon the local PE instruction sequence prior to the interrupt. Processing/element exception interrupts are supported and low latency interrupt processing is also provided for embedded systems where real time signal processing is required. Further, a hierarchical interrupt structure is used allowing a generalized debug approach using debut interrupts and a dynamic debut monitor mechanism. | 10-24-2013 |
20130290675 | MITIGATION OF THREAD HOGS ON A THREADED PROCESSOR - Systems and methods for efficient thread arbitration in a threaded processor with dynamic resource allocation. A processor includes a resource shared by multiple threads. The resource includes an array with multiple entries, each of which may be allocated for use by any thread. Control logic detects a load miss to memory, wherein the miss is associated with a latency greater than a given threshold. The load instruction or an immediately younger instruction is selected for replay for an associated thread. A pipeline flush and replay for the associated thread begins with the selected instruction. Instructions younger than the load instruction are held at a given pipeline stage until the load instruction completes. During replay, this hold prevents resources from being allocated to the associated thread while the load instruction is being serviced. | 10-31-2013 |
20130290676 | BRANCH PREDICTION POWER REDUCTION - In one embodiment, a microprocessor is provided. The microprocessor includes a branch prediction unit. The branch prediction unit is configured to track the presence of branches in instruction data that is fetched from an instruction memory after a redirection at a target of a predicted taken branch. The branch prediction unit is selectively powered up from a powered-down state when the fetched instruction data includes a branch instruction and is maintained in the powered-down state when the fetched instruction data does not include an instruction branch in order to reduce power consumption of the microprocessor during instruction fetch operations. | 10-31-2013 |
20130297910 | MITIGATION OF THREAD HOGS ON A THREADED PROCESSOR USING A GENERAL LOAD/STORE TIMEOUT COUNTER - Systems and methods for efficient thread arbitration in a threaded processor with dynamic resource allocation. A processor includes a resource shared by multiple threads. The resource includes entries which may be allocated for use by any thread. Control logic detects long latency instructions. Long latency instructions have a latency greater than a given threshold. One example is a load instruction that has a read-after-write (RAW) data dependency on a store instruction that misses a last-level data cache. The long latency instruction or an immediately younger instruction is selected for replay for an associated thread. A pipeline flush and replay for the associated thread begins with the selected instruction. Instructions younger than the long latency instruction are held at a given pipeline stage until the long latency instruction completes. During replay, this hold prevents resources from being allocated to the associated thread while the long latency instruction is being serviced. | 11-07-2013 |
20130297911 | CHECKPOINTED BUFFER FOR RE-ENTRY FROM RUNAHEAD - Embodiments related to re-dispatching an instruction selected for re-execution from a buffer upon a microprocessor re-entering a particular execution location after runahead are provided. In one example, a microprocessor is provided. The example microprocessor includes fetch logic, one or more execution mechanisms for executing a retrieved instruction provided by the fetch logic, and scheduler logic for scheduling the retrieved instruction for execution. The example scheduler logic includes a buffer for storing the retrieved instruction and one or more additional instructions, the scheduler logic being configured, upon the microprocessor re-entering at a particular execution location after runahead, to re-dispatch, from the buffer, an instruction that has been previously dispatched to one of the execution mechanisms. | 11-07-2013 |
20130305017 | COMPILED CONTROL CODE PARALLELIZATION BY HARDWARE TREATMENT OF DATA DEPENDENCY - An apparatus comprising a buffer and a processor. The buffer may be configured to store a plurality of fetch sets. The processor may be configured to perform a change of flow operation based upon at least one of (i) a comparison between addresses of two memory locations involved in each of two memory accessess, (ii) a first predefined prefix code, and (iii) a second predefined prefix code. | 11-14-2013 |
20130305018 | MFENCE and LFENCE Micro-Architectural Implementation Method and System - A system and method for fencing memory accesses. Memory loads can be fenced, or all memory access can be fenced. The system receives a fencing instruction that separates memory access instructions into older accesses and newer accesses. A buffer within the memory ordering unit is allocated to the instruction. The access instructions newer than the fencing instruction are stalled. The older access instructions are gradually retired. When all older memory accesses are retired, the fencing instruction is dispatched from the buffer. | 11-14-2013 |
20130305019 | Instruction and Logic to Control Transfer in a Partial Binary Translation System - A dynamic optimization of code for a processor-specific dynamic binary translation of hot code pages (e.g., frequently executed code pages) may be provided by a run-time translation layer. A method may be provided to use an instruction look-aside buffer (iTLB) to map original code pages and translated code pages. The method may comprise fetching an instruction from an original code page, determining whether the fetched instruction is a first instruction of a new code page and whether the original code page is deprecated. If both determinations return yes, the method may further comprise fetching a next instruction from a translated code page. If either determinations returns no, the method may further comprise decoding the instruction and fetching the next instruction from the original code page. | 11-14-2013 |
20130326192 | BROADCAST OPERATION ON MASK REGISTER - Embodiments of systems, apparatuses, and methods for performing a mask broadcast instruction in a computer processor are described. In some embodiments, the execution of a mask broadcast instruction causes a broadcast of a data element of the source operand to a destination register of the destination operand according to the broadcast size. | 12-05-2013 |
20130339664 | INSTRUCTION EXECUTION UNIT THAT BROADCASTS DATA VALUES AT DIFFERENT LEVELS OF GRANULARITY - An apparatus is described that includes an execution unit to execute a first instruction and a second instruction. The execution unit includes input register space to store a first data structure to be replicated when executing the first instruction and to store a second data structure to be replicated when executing the second instruction. The first and second data structures are both packed data structures. Data values of the first packed data structure are twice as large as data values of the second packed data structure. The first data structure is four times as large as the second data structure. The execution unit also includes replication logic circuitry to replicate the first data structure when executing the first instruction to create a first replication data structure, and, to replicate the second data structure when executing the second instruction to create a second replication data structure. | 12-19-2013 |
20130346727 | Methods and Apparatus to Extend Software Branch Target Hints - Apparatus and techniques for predicting a storage address based on contents of a first program accessible register (PAR) specified in a first instruction, wherein the first PAR correlates with a target address specified by a second PAR in a second instruction. Information is speculatively fetched at the predicted storage address prior to execution of the second instruction. The first instruction is an advance correlating notification (ADVCN) instruction, the second instruction is an indirect branch instruction, and the information is a plurality of instructions beginning at the predicted storage address. The predicted storage address is a branch target address for the indirect branch instruction from which instructions are speculatively fetched. The prediction is based on contents of the first PAR specified in the ADVCN instruction. The contents of the first PAR correlate with a taken evaluation of the branch instruction. | 12-26-2013 |
20140006752 | Qualifying Software Branch-Target Hints with Hardware-Based Predictions | 01-02-2014 |
20140013083 | CACHE COPROCESSING UNIT - A cache coprocessing unit in a computing system includes a cache array to store data, a hardware decode unit to decode instructions that are offloaded from being executed by an execution cluster of the computing system to reduce load and store operations between the execution cluster and the cache coprocessing unit, and a set of one or more operation units to perform operations on the cache array according to the decoded instructions. | 01-09-2014 |
20140025930 | MULTI-CORE PROCESSOR SHARING LI CACHE AND METHOD OF OPERATING SAME - A multi-core processor includes first processor core including a first instruction fetch unit and out-of-order execution data units, a second processor core including a second instruction fetch unit and in-order execution data units, and a shared-level 1 cache including a level 1-instruction cache shared between the first instruction fetch unit and the second instruction fetch unit and a level 1-data cache shared between the out-of-order execution data units and the in-order execution data. | 01-23-2014 |
20140040596 | PACKED LOAD/STORE WITH GATHER/SCATTER - Embodiments relate to packed loading and storing of data. An aspect includes a method for packed loading and storing of data distributed in a system that includes memory and a processing element. The method includes fetching and decoding an instruction for execution by the processing element. The processing element gathers a plurality of individually addressable data elements from non-contiguous locations in the memory which are narrower than a nominal width of register file elements in the processing element based on the instruction. The data elements are packed and loaded into register file elements of a register file entry by the processing element based on the instruction, such that at least two of the data elements gathered from the non-contiguous locations in the memory are packed and loaded into a single register file element of the register file entry. | 02-06-2014 |
20140040597 | PREDICATION IN A VECTOR PROCESSOR - Embodiments relate to vector processor predication in an active memory device. An aspect includes a system for vector processor predication in an active memory device. The system includes memory in the active memory device and a processing element in the active memory device. The processing element is configured to perform a method including decoding an instruction with a plurality of sub-instructions to execute in parallel. One or more mask bits are accessed from a vector mask register in the processing element. The one or more mask bits are applied by the processing element to predicate operation of a unit in the processing element associated with at least one of the sub-instructions. | 02-06-2014 |
20140040598 | VECTOR PROCESSING IN AN ACTIVE MEMORY DEVICE - Embodiments relate to vector processing in an active memory device. An aspect includes a system for vector processing in an active memory device. The system includes memory in the active memory device and a processing element in the active memory device. The processing element is configured to perform a method including decoding an instruction with a plurality of sub-instructions to execute in parallel. An iteration count to repeat execution of the sub-instructions in parallel is determined. Execution of the sub-instructions is repeated in parallel for multiple iterations, by the processing element, based on the iteration count. Multiple locations in the memory are accessed in parallel based on the execution of the sub-instructions. | 02-06-2014 |
20140040599 | PACKED LOAD/STORE WITH GATHER/SCATTER - Embodiments relate to packed loading and storing of data. An aspect includes a system for packed loading and storing of distributed data. The system includes memory and a processing element configured to communicate with the memory. The processing element is configured to perform a method including fetching and decoding an instruction for execution by the processing element. A plurality of individually addressable data elements is gathered from non-contiguous locations in the memory which are narrower than a nominal width of register file elements in the processing element based on the instruction. The processing element packs and loads the data elements into register file elements of a register file entry based on the instruction, such that at least two of the data elements gathered from the non-contiguous locations in the memory are packed and loaded into a single register file element of the register file entry. | 02-06-2014 |
20140040600 | DATA PROCESSOR - It is to provide a data processor which maintains compatibility with an existing instruction set such as a 16-bit fixed-length instruction set and in which an instruction code space is extended. | 02-06-2014 |
20140047214 | VECTOR REGISTER FILE - An aspect includes accessing a vector register in a vector register file. The vector register file includes a plurality of vector registers and each vector register includes a plurality of elements. A read command is received at a read port of the vector register file. The read command specifies a vector register address. The vector register address is decoded by an address decoder to determine a selected vector register of the vector register file. An element address is determined for one of the plurality of elements associated with the selected vector register based on a read element counter of the selected vector register. A word is selected in a memory array of the selected vector register as read data based on the element address. The read data is output from the selected vector register based on the decoding of the vector register address by the address decoder. | 02-13-2014 |
20140047215 | STALL REDUCING METHOD, DEVICE AND PROGRAM FOR PIPELINE OF PROCESSOR WITH SIMULTANEOUS MULTITHREADING FUNCTION - The disclosure provides a technique for suppressing the occurrence of stalling caused by data dependency other than register dependency in an out-of-order processor. A stall reducing program includes a handler for detecting a stall occurring during execution of execution code using a PMU, and identifying, based on dependencies, a second instruction dependent on data of a first instruction causing the stall based on this dependency; a profiler registering the second instruction as profile information; and an optimization module for inserting a thread yield instruction in the appropriate position inside the execution code or original code file based on the profile information, and outputting the optimized execution code. | 02-13-2014 |
20140047216 | Scalable Decode-Time Instruction Sequence Optimization of Dependent Instructions - Producer-consumer instructions, comprising a first instruction and a second instruction in program order, are fetched requiring in-order execution, the second instruction is modified by the processor so that the first instruction and second instruction can be completed out-of-order, the modification comprising any one of extending an immediate field of the second instruction using immediate field information of the first instruction or providing a source location of the first instruction as an additional source location to source locations of the second instruction. | 02-13-2014 |
20140068228 | INSTRUCTION FORWARDING BASED ON PREDICATION CRITERIA - Embodiments herein relate to forwarding an instruction based on predication criteria. A predicate state associated with a packet of data is to be compared to an instruction associated with the predication criteria. The instruction is to be forwarded to an execution unit if the predication criteria includes or matches the predicate state of the packet. | 03-06-2014 |
20140075156 | FETCH WIDTH PREDICTOR - Various techniques for predicting instruction fetch widths. In one embodiment, a fetch prediction unit in a processor is configured to generate a fetch width that specifies a number of bits to be retrieved in a subsequent fetch from an instruction cache. The fetch prediction unit may also generate a fetch prediction that includes the fetch width in response to a current fetch request. A number of bits corresponding to the fetch width may be fetched from the instruction cache. The fetch width may correspond to a location of a predicted-taken control transfer instruction. This fetch width prediction may lead to power savings in instruction cache accesses. | 03-13-2014 |
20140082327 | CONTINUOUS RUN-TIME VALIDATION OF PROGRAM EXECUTION: A PRACTICAL APPROACH - Trustworthy systems require that code be validated as genuine. Most systems implement this requirement prior to execution by matching a cryptographic hash of the binary file against a reference hash value, leaving the code vulnerable to run time compromises, such as code injection, return and jump-oriented programming, and illegal linking of the code to compromised library functions. The Run-time Execution Validator (REV) validates, as the program executes, the control flow path and instructions executed along the control flow path. REV uses a signature cache integrated into the processor pipeline to perform live validation of executions, at basic block boundaries, and ensures that changes to the program state are not made by the instructions within a basic block until the control flow path into the basic block and the instructions within the basic block are both validated. | 03-20-2014 |
20140089636 | CACHING OPTIMIZED INTERNAL INSTRUCTIONS IN LOOP BUFFER - Embodiments of the invention relate to a computer system for storing an internal instruction loop in a loop buffer. The computer system includes a loop buffer and a processor. The computer system is configured to perform a method including fetching instructions from memory to generate an internal instruction to be executed, detecting a beginning of a first instruction loop in the instructions, determining that a first internal instruction loop corresponding to the first instruction loop is not stored in the loop buffer, fetching the first instruction loop, optimizing one or more instructions corresponding to the first instruction loop to generate a first optimized internal instruction loop, and storing the first optimized internal instruction loop in the loop buffer based on the determination that the first internal instruction loop is not stored in the loop buffer. | 03-27-2014 |
20140089637 | Optimizing System Throughput By Automatically Altering Thread Co-Execution Based On Operating System Directives - A technique for optimizing program instruction execution throughput in a central processing unit core (CPU). The CPU implements a simultaneous multithreading (SMT) operational mode wherein program instructions associated with at least two software threads are executed in parallel as hardware threads while sharing one or more hardware resources used by the CPU, such as cache memory, translation lookaside buffers, functional execution units, etc. As part of the SMT mode, the CPU implements an autothread (AT) operational mode. During the AT operational mode, a determination is made whether there is a resource conflict between the hardware threads that undermines instruction execution throughput. If a resource conflict is detected, the CPU adjusts the relative instruction execution rates of the hardware threads based on relative priorities of the software threads. | 03-27-2014 |
20140095831 | APPARATUS AND METHOD FOR EFFICIENT GATHER AND SCATTER OPERATIONS - An apparatus and method are described for performing efficient gather operations in a pipelined processor. For example, a processor according to one embodiment of the invention comprises: gather setup logic to execute one or more gather setup operations in anticipation of one or more gather operations, the gather setup operations to determine one or more addresses of vector data elements to be gathered by the gather operations; and gather logic to execute the one or more gather operations to gather the vector data elements using the one or more addresses determined by the gather setup operations. | 04-03-2014 |
20140095832 | METHOD AND APPARATUS FOR PERFORMANCE EFFICIENT ISA VIRTUALIZATION USING DYNAMIC PARTIAL BINARY TRANSLATION - Methods, apparatus and systems for virtualization of a native instruction set are disclosed. Embodiments include a processor core executing the native instructions and a second core, or alternatively only the second processor core consuming less power while executing a second instruction set that excludes portions of the native instruction set. The second core's decoder detects invalid opcodes of the second instruction set. A microcode layer disassembler determines if opcodes should be translated. A translation runtime environment identifies an executable region containing an invalid opcode, other invalid opcodes and interjacent valid opcodes of the second instruction set. An analysis unit determines an initial machine state prior to execution of the invalid opcode. A partial translation of the executable region that includes encapsulations of the translations of invalid opcodes and state recoveries of the machine states is generated and saved to a translation cache memory. | 04-03-2014 |
20140095833 | Prefix Computer Instruction for Compatibly Extending Instruction Functionality - A prefix instruction is executed and passes operands to a next instruction without storing the operands in an architected resource such that the execution of the next instruction uses the operands provided by the prefix instruction to perform an operation, the operands may be prefix instruction immediate field or a target register of the prefix instruction execution. | 04-03-2014 |
20140101412 | SPECULATIVE PRIVILEGE ELEVATION - Systems and methods are provided for speculatively elevating a privilege level at which instructions are executed. In embodiment, this is accomplished b identification of a privilege elevation instruction (e.g., SYSCALL) at an early pipeline stage and speculatively executing subsequent instructions with elevated privileges. | 04-10-2014 |
20140108769 | MULTI-REGISTER SCATTER INSTRUCTION - A processor fetches a multi-register scatter instruction that includes a source operand and a destination operand. The source operand specifies a source vector register that includes multiple source data elements. The destination operand identifies multiple destination data elements that each specify a destination vector register and an index into that destination vector register. The instruction is decoded and executed, causing, for each of those identified destination data elements, the one of the source data elements that is in a position in the source vector register that corresponds with a position of that destination data element to be stored in the destination vector register at the index specified by that destination data element. | 04-17-2014 |
20140115300 | DATA PROCESSING SYSTEM WITH DATA CHARACTERISTIC BASED IDENTIFICATION OF CORRESPONDING INSTRUCTIONS - Some methods, computer program products, and data processing nodes identify a data unit in a data memory that is to be operated upon by a processor circuit, and uses a characteristic of the data unit to identify what instruction(s) within an instruction memory is be executed by the processor circuit to perform an operation upon the data unit. The data memory may be local to the processor circuit, and the instruction memory may be remotely accessible to the processor circuit through a data network. | 04-24-2014 |
20140122836 | CONFIDENCE-DRIVEN SELECTIVE PREDICATION OF PROCESSOR INSTRUCTIONS - An apparatus includes a network interface, memory, and a processor. The processor is coupled with the network interface and memory. The processor is configured to determine that an instruction instance is a branch instruction instance. Responsive to a determination that an instruction instance is a branch instruction instance, the processor is configured to obtain a branch prediction for the branch instruction instance and a confidence value of the branch prediction. The processor is further configured to determine that the confidence for the branch prediction is low based on the confidence value, and responsive to such a determination, generate predicated instruction instances based on the branch instruction instance. | 05-01-2014 |
20140143521 | INSTRUCTION SWAP FOR PATCHING PROBLEMATIC INSTRUCTIONS IN A MICROPROCESSOR - There is provided a method and system for replacing an instruction with another instruction. A match register stores an opcode that identifies an instruction to be replaced. A swap register stores an instruction that replaces the identified instruction. A multiplexer chooses the instruction stored in the swap register over the identified instruction if predecode bits of the identified instruction are set. | 05-22-2014 |
20140149717 | PERFORMING A MULTIPLY-MULTIPLY-ACCUMULATE INSTRUCTION - In one embodiment, the present invention includes a processor having multiple execution units, at least one of which includes a circuit having a multiply-accumulate (MAC) unit including multiple multipliers and adders, and to execute a user-level multiply-multiply-accumulate instruction to populate a destination storage with a plurality of elements each corresponding to an absolute value for a pixel of a pixel block. Other embodiments are described and claimed. | 05-29-2014 |
20140156972 | Control Transfer Termination Instructions Of An Instruction Set Architecture (ISA) - In an embodiment, the present invention includes a processor having an execution logic to execute instructions and a control transfer termination (CTT) logic coupled to the execution logic. This logic is to cause a CTT fault to be raised if a target instruction of a control transfer instruction is not a CTT instruction. Other embodiments are described and claimed. | 06-05-2014 |
20140156973 | PROCESSOR AND CONTROL METHOD OF PROCESSOR - A processor comprises an instruction fetch unit to acquire an instruction from an instruction address defined as an instruction storage source at a fetching stage of the processor repeating an instruction process including the fetching stage of fetching the instruction and an execution stage of executing the instruction, an associative relation storage unit to register an associative relation between a high-order bit field of the instruction address of the instruction undergoing the instruction process and high-order address information into which the high-order bit field of the instruction address is encoded, an encoding unit to encode the high-order bit field contained in the instruction address into the high-order address information on the basis of the associative relation and a decoding unit to decode the high-order bit field from the high-order address information and the associative relation. | 06-05-2014 |
20140164737 | EXECUTION EFFICIENCY IN A SINGLE-PROGRAM, MULTIPLE-DATA PROCESSOR - A method for executing instructions on a single-program, multiple-data processor system having a fixed number of execution lanes, including: scheduling a primary instruction for execution with a first wave of multiple data; assigning the first wave to a corresponding primary subset of the execution lanes; scheduling a secondary instruction having a second wave of multiple data, such that the second wave fits in lanes that are unused by the primary subset of lanes; assigning the second wave to a corresponding secondary subset of the lanes; fetching the primary and secondary instructions; configuring the execution lanes such that the primary subset is responsive to the primary instruction and the secondary subset is simultaneously responsive to the secondary instruction; and simultaneously executing the primary and secondary instructions in the execution lanes. | 06-12-2014 |
20140164738 | INSTRUCTION CATEGORIZATION FOR RUNAHEAD OPERATION - Embodiments related to methods and devices operative, in the event that execution of an instruction produces a runahead-triggering event, to cause a microprocessor to enter into and operate in a runahead without reissuing the instruction are provided. In one example, a microprocessor is provided. The example microprocessor includes fetch logic for retrieving an instruction, scheduling logic for issuing the instruction retrieved by the fetch logic for execution, and runahead control logic. The example runahead control logic is operative, in the event that execution of the instruction as scheduled by the scheduling logic produces a runahead-triggering event, to cause the microprocessor to enter into and operate in a runahead mode without reissuing the instruction, and carry out runahead policies while the microprocessor is in the runahead mode that governs operation of the microprocessor and cause the microprocessor to operate differently than when not in the runahead mode. | 06-12-2014 |
20140164739 | Modify and Execute Next Sequential Instruction Facility and Instructions Therefore - An modify next sequential instruction (MNSI) instruction, when executed, modifies a field of the fetched copy of the next sequential instruction (NSI) to enable a program to dynamically provide parameters to the NSI being executed. Thus the MNSI instruction is a non-disruptive prefix instruction to the NSI. The NSI may be modified to effectively extend the length of the NSI field, thus providing more registers or more range (in the case of a length field) than otherwise available to the NSI instruction according to the instruction set architecture (ISA) | 06-12-2014 |
20140164740 | Branch-Free Condition Evaluation - A compare instruction of an instruction set architecture (ISA), when executed tests one or more operands for an instruction defined condition. The result of the test is stored as an operand, with leading zeros, in a general register of the ISA. The general register is identified (explicitly or implicitly) by the compare instruction. Thus, the result of the test can be manipulated by standard register operations of the computer system. In a superscalar processor, no special “condition code” renaming is required, as the standard register renaming takes care of out-of-order processing of the conditions. | 06-12-2014 |
20140164741 | Modify and Execute Next Sequential Instruction Facility and Instructions Therefore - An modify next sequential instruction (MNSI) instruction, when executed, modifies a field of the fetched copy of the next sequential instruction (NSI) to enable a program to dynamically provide parameters to the NSI being executed. Thus the MNSI instruction is a non-disruptive prefix instruction to the NSI. The NSI may be modified to effectively extend the length of the NSI field, thus providing more registers or more range (in the case of a length field) than otherwise available to the NSI instruction according to the instruction set architecture (ISA) | 06-12-2014 |
20140173253 | Methods and Apparatus for Storing Expanded Width Instructions in a VLIW Memory for Deferred Execution - Techniques are described for decoupling fetching of an instruction stored in a main program memory from earliest execution of the instruction. An indirect execution method and program instructions to support such execution are addressed. In addition, an improved indirect deferred execution processor (DXP) VLIW architecture is described which supports a scalable array of memory centric processor elements that do not require local load and store units. | 06-19-2014 |
20140189305 | REDUNDANT EXECUTION FOR RELIABILITY IN A SUPER FMA ALU - A system, processor and method to increase computational reliability by using underutilized portions of a data path with a SuperFMA ALU. The method allows the reuse of underutilized hardware to implement spatial redundancy by using detection during the dispatch stage to determine if the operation may be executed by redundant hardware in the ALU. During execution, if determination is made that the correct conditions exists as determined by the redundant execution modes, the SuperFMA ALU performs the operation with redundant execution and compares the results for a match in order to generate a computational result. The method to increase computational reliability by using redundant execution is advantageous because the hardware cost of adding support for redundant execution is low and the complexity of implementation of the disclosed method is minimal due to the reuse of existing hardware. | 07-03-2014 |
20140195781 | STRUCTURED CONTROL INSTRUCTION FETCH UNIT - The structured control instruction fetch unit is a structured instruction stream controller that processes expand (XP), expand register indirect (XPR), loop (LOOP), and break (BRK) instructions for structured control. The fetch unit processes stop bits that mark the end of instruction blocks. Any instruction can be marked with a stop bit to indicate that it is the last one in an instruction block. All instructions are encoded with a predicate to reduce the use of control instructions and to simplify the control. A control stack guides instruction fetching by storing return addresses, loop block addresses, loop predicates, and loop counters. Control instructions and stop bits manage operation of the control stack. An instruction unit feeds execution units and includes a set-associative instruction cache, a control stack, an instruction buffer that decouples instruction fetching from execution, instruction decoders, and program counter (PC) control logic. | 07-10-2014 |
20140208073 | Arithmetic Branch Fusion - A processor and method for fusing together an arithmetic instruction and a branch instruction. The processor includes an instruction fetch unit configured to fetch instructions. The processor may also include an instruction decode unit that may be configured to decode the fetched instructions into micro-operations for execution by an execution unit. The decode unit may be configured to detect an occurrence of an arithmetic instruction followed by a branch instruction in program order, wherein the branch instruction, upon execution, changes a program flow of control dependent upon a result of execution of the arithmetic instruction. In addition, the processor may further be configured to fuse together the arithmetic instruction and the branch instruction such that a single micro-operation is formed. The single micro-operation includes execution information based upon both the arithmetic instruction and the branch instruction. | 07-24-2014 |
20140215185 | FETCHING INSTRUCTIONS OF A LOOP ROUTINE - In one aspect, a processor is configured to store instructions fetched from a program memory in an instruction queue, determine that an instruction to be decoded defines a beginning of a loop routine, and determine whether the instruction is stored in the instruction queue. In response to determining that the instruction is stored in the instruction queue, the processor disables fetching of instructions from the program memory, fetches instructions of the loop routine from the instruction queue, and stores the instructions of the loop routine in an instruction register. In response to determining that the instruction is not stored in the instruction queue, the processor fetches the instruction from the program memory, stores the instruction in the instruction queue, and stores the instruction in the instruction register. | 07-31-2014 |
20140215186 | SYSTEMS, APPARATUSES, AND METHODS FOR MAPPING A SOURCE OPERAND TO A DIFFERENT RANGE - Embodiments of systems, apparatuses, and methods for performing a range mapping instruction in a computer processor are described. In some embodiments, the execution of a range mapping instruction maps a data element having a source data range to a destination data element having a destination data range and storage of the of the destination data element. | 07-31-2014 |
20140237215 | Methods and Apparatus for Scalable Array Processor Interrupt Detection and Response - Hardware and software techniques for interrupt detection and response in a scalable pipelined array processor environment are described. Utilizing these techniques, a sequential program execution model with interrupts can be maintained in a highly parallel scalable pipelined array processing containing multiple processing elements and distributed memories and register files. When an interrupt occurs, interface signals are provided to all PEs to support independent interrupt operations in each PE dependent upon the local PE instruction sequence prior to the interrupt. Processing/element exception interrupts are supported and low latency interrupt processing is also provided for embedded systems where real time signal processing is required. Further, a hierarchical interrupt structure is used allowing a generalized debug approach using debut interrupts and a dynamic debut monitor mechanism. | 08-21-2014 |
20140258680 | PARALLEL DISPATCH OF COPROCESSOR INSTRUCTIONS IN A MULTI-THREAD PROCESSOR - Techniques are addressed for parallel dispatch of coprocessor and thread instructions to a coprocessor coupled to a threaded processor. A first packet of threaded processor instructions is accessed from an instruction fetch queue (IFQ) and a second packet of coprocessor instructions is accessed from the IFQ. The IFQ includes a plurality of thread queues that are each configured to store instructions associated with a specific thread of instructions. A dispatch circuit is configured to select the first packet of thread instructions from the IFQ and the second packet of coprocessor instructions from the IFQ and send the first packet to a threaded processor and the second packet to the coprocessor in parallel. A data port is configured to share data between the coprocessor and a register file in the threaded processor. Data port operations are accomplished without affecting operations on any thread executing on the threaded processor. | 09-11-2014 |
20140281387 | CONVERTING CONDITIONAL SHORT FORWARD BRANCHES TO COMPUTATIONALLY EQUIVALENT PREDICATED INSTRUCTIONS - A processor is operable to process conditional branches. The processor includes instruction fetch logic to fetch a conditional short forward branch. The conditional short forward branch is to include a conditional branch instruction and a set of one or more instructions that are to sequentially follow the conditional branch instruction in program order. The set of the one or more instructions are between the conditional branch instruction and a forward branch target instruction that is to be indicated by the conditional branch instruction. The processor also includes instruction conversion logic coupled with the instruction fetch logic. The instruction conversion logic is to convert the conditional short forward branch to a computationally equivalent set of one or more predicated instructions. Other processors are also disclosed, as are various methods and systems. | 09-18-2014 |
20140281388 | Method and Apparatus for Guest Return Address Stack Emulation Supporting Speculation - A microprocessor implemented method for maintaining a guest return address stack in an out-of-order microprocessor pipeline is disclosed. The method comprises mapping a plurality of instructions in a guest address space into a corresponding plurality of instructions in a native address space. For each function call instruction in the native address space fetched during execution, the method also comprises performing the following: (a) pushing a current entry into a guest return address stack (GRAS) responsive to a function call, wherein the GRAS is maintained at the fetch stage of the pipeline, and wherein the current entry comprises information regarding both a guest target return address and a corresponding native target return address associated with the function call; (b) popping the current entry from the GRAS in response to processing a return instruction; and (c) fetching instructions from the native target return address in the current entry after the popping from the GRAS. | 09-18-2014 |
20140317382 | DYNAMIC CONFIGURATION OF PROCESSING PIPELINE BASED ON DETERMINED TYPE OF FETCHED INSTRUCTION - Various embodiments relating to executing different types of instruction code in a micro-processing system are provided. In one embodiment, a micro-processing system includes a memory/storage subsystem configured to store non-native instruction set architecture (ISA) code and native ISA code in a common address space, fetch logic configured to retrieve the non-native ISA code and native ISA code from the common address space, instruction type determining logic configured to determine, at runtime, whether fetched instruction code is non-native ISA code or native ISA code, and processing logic configured to execute the fetched instruction code via a first pipeline configuration in response to the instruction type determining logic determining that the fetched instruction code is non-native ISA code, and via a second pipeline configuration which is different than the first pipeline configuration, in response to the instruction type determining logic determining that the fetched instruction code is native ISA code. | 10-23-2014 |
20140325186 | SUPPORTING CODE EXECUTION IN DUAL ADDRESS SPACES - A processing apparatus supports execution of executable computer program code, wherein non-instruction data is read from and written to a first address space, while executable instructions are fetched from a second address space. Preferably, the processing apparatus supports execution of a modified or enhanced computer program. The programs and user interfaces in the first address space see only the unmodified first program in the first address space and cannot detect the modified or enhanced program in the second address space. | 10-30-2014 |
20140331028 | IDENTIFICATION OF MISSING CALL AND RETURN INSTRUCTIONS FOR MANAGEMENT OF A RETURN ADDRESS STACK - A data processing apparatus and method of data processing are disclosed. A fetch unit retrieves program instructions comprising call instructions and return instructions from memory to be executed by an execution unit. A branch prediction unit generates a return address prediction for an identified return instruction with reference to a return address stack. The branch prediction unit performs a return address push onto said return address stack when the execution unit executes a call instruction and performs a return address pop from the return address stack when the execution unit executes a return instruction. An error detection unit identifies a missing call instruction or a missing return instruction in said program instructions by reference to the return address prediction, a resolved return address indicated by the execution unit when the return instruction is executed and the content of the return address stack. | 11-06-2014 |
20140337604 | MINIMIZING BANDWIDTH TO TRACK RETURN TARGETS BY AN INSTRUCTION TRACING SYSTEM - A processing device implementing minimizing bandwidth to track return targets by an instruction tracing system is disclosed. A processing device of the disclosure an instruction fetch unit comprising a return stack buffer (RSB) to predict a target address of a return (RET) instruction corresponding to a call (CALL) instruction. The processing device further includes a retirement unit comprising an instruction tracing module to initiate instruction tracing for instructions executed by the processing device, determine whether the target address of the RET instruction was mispredicted, determine a value of call depth counter (CDC) maintained by the instruction tracing module, and when the target address of the RET instruction was not mispredicted and when the value of the CDC is greater than zero, generate an indication that the RET instruction branches to a next linear instruction after the corresponding CALL instruction. | 11-13-2014 |
20140372729 | PROCESSOR WITH EXECUTION UNIT WAIT CONTROL - A processor includes a processor core. The processor core includes a first execution unit and a second execution unit. The first execution unit is configured to 1) execute a complex instruction that requires multiple instruction cycles to execute; 2) generate a wait signal that when asserted suspends execution of instructions by the second execution unit for at least a portion of the execution of the complex instruction; and 3) maintain information defining parameters of the wait signal generation across interruption of the complex instruction by execution of a different instruction in the first execution unit. | 12-18-2014 |
20150026434 | CONFIGURABLE LOGIC CONSTRUCTS IN A LOOP BUFFER - Techniques are described herein for using configurable logic constructs in a loop buffer. In an embodiment, a configurable hardware block is programmed based on one or more target functions within a loop. The configurable hardware block is associated with a plurality of registers, including a loopcount register and a first output register. For each iteration of the loop, a counter value in the loopcount register is updated and a target value in the first output register is updated using the programmed configurable hardware block. For each iteration of the loop, a set of one or more instructions may be fetched from the instruction buffer and executed based on the updated target value in the first output value. | 01-22-2015 |
20150032996 | EXECUTION-AWARE MEMORY PROTECTION - Execution-Aware Memory protection technologies are described. A processor includes an instruction fetch unit to fetch instructions of applications executing in a multitasking environment and an execution unit to execute the instructions. A memory protection unit (MPU) enforces memory access control of the applications by defining an instruction region (I-space) and a data region (D-space and linking the I-space to the D-space. When the MPU determining whether an instruction address is within the I-space and whether a data address of a data access operation is within the D-space. The MPU issues a memory protection fault for the data access operation when either the instruction address is not within the I-space or the data address is not within the D-space. | 01-29-2015 |
20150046682 | GLOBAL BRANCH PREDICTION USING BRANCH AND FETCH GROUP HISTORY - This disclosure includes a method for performing branch prediction by a processor having an instruction pipeline. The processor speculatively updates a global history register having fetch group history and branch history, fetches a fetch group of instructions, and assigns a global history vector to the instructions. The processor predicts any branches in the fetch group using the global history vector and a predictor, and evaluates whether the fetch group contains a predicted taken branch. If the fetch group contains a predicted taken branch, the processor flushes subsequently fetched instructions in the pipeline following the predicted taken branch, repairs the global history register to the global history vector, and updates the global history register based on branch prediction information. If the fetch group does not contain a predicted taken branch, the processor updates the global history register with a branch history value for each branch in the fetch group. | 02-12-2015 |
20150074377 | System and Method for an Asynchronous Processor with Pepelined Arithmetic and Logic Unit - Embodiments are provided for an asynchronous processor with pipelined arithmetic and logic unit. The asynchronous processor includes a non-transitory memory for storing instructions and a plurality of instruction execution units (XUs) arranged in a ring architecture for passing tokens. Each one of the XUs comprises a logic circuit configured to fetch a first instruction from the non-transitory memory, and execute the first instruction. The logic circuit is also configured to fetch a second instruction from the non-transitory memory, and execute the second instruction, regardless whether the one of the XUs holds a token for writing the first instruction. The logic circuit is further configured to write the first instruction to the non-transitory memory after fetching the second instruction. | 03-12-2015 |
20150089193 | PREDICTIVE FETCHING AND DECODING FOR SELECTED RETURN INSTRUCTIONS - Predictive fetching and decoding for selected instructions. A determination is made as to whether an instruction to be executed in a pipelined processor is a selected return instruction, the pipelined processor having a plurality of stages including an execute stage. Based on the instruction being the selected return instruction, obtaining from a data structure a predicted return address, the predicted return address being an address of an instruction to which it is predicted that processing is to be returned. Additionally, based on the instruction being the selected return instruction, operating state for the instruction at the predicted return address is predicted. The instruction is fetched at the predicted return address, prior to the selected return instruction reaching the execute stage, and decoding of the fetched instruction is initiated based on the predicted operating state. | 03-26-2015 |
20150089194 | PREDICTIVE FETCHING AND DECODING FOR SELECTED INSTRUCTIONS - Predictive fetching and decoding for selected instructions (e.g., operating system instructions, hypervisor instructions or other such instructions). A determination is made that a selected instruction, such as a system call instruction, an asynchronous interrupt, a return from system call instruction or return from asynchronous interrupt, is to be executed. Based on determining that such an instruction is to be executed, a predicted address is determined for the selected instruction, which is the address to which processing transfers in order to provide the requested services. Then, fetching of instructions beginning at the predicted address prior to execution of the selected instruction is commenced. Further, speculative state relating to a selected instruction, including, for instance, an indication of the privilege level of the selected instruction or instructions executed on behalf of the selected instruction, is predicted and maintained. | 03-26-2015 |
20150113248 | PROCESS AND METHOD FOR SAVING DESIGNATED REGISTERS IN INTERRUPT PROCESSING BASED ON AN INTERRUPT FACTOR - A microcomputer includes: a plurality of register lists having a plurality of register patterns, respectively, wherein each of plurality of register patterns designates registers, data of which are to be saved in a data memory; an instruction fetch control circuit configured to fetch instruction code from an instruction memory in response to an interrupt request issued based on occurrence of an interrupt factor; and a register data saving control circuit configured to acquire one register pattern from one of the plurality of register lists in response to the interrupt request, and issue a microinstruction based on the acquired register pattern in response to the interrupt request. An instruction executing section is configured to execute the microinstruction prior to the fetched instruction code, to save the data of registers designated based on the acquired register pattern in the data memory. | 04-23-2015 |
20150143083 | Techniques for Increasing Vector Processing Utilization and Efficiency Through Vector Lane Predication Prediction - Techniques for increasing vector processing utilization and efficiency through use of unmasked lanes of predicated vector instructions for executing non-conflicting instructions are provided. In one aspect, a method of vector lane predication for a processor is provided which includes the steps of: fetching predicated vector instructions from a memory; decoding the predicated vector instructions; determining if a mask value of the predicated vector instructions is available and, if the mask value of the predicated vector instructions is not available, predicting the mask value of the predicated vector instructions; and dispatching the predicated vector instructions to only masked vector lanes. | 05-21-2015 |
20150317160 | KICK-STARTED RUN-TO-COMPLETION PROCESSOR HAVING NO INSTRUCTION COUNTER - A pipelined run-to-completion processor includes no instruction counter and only fetches instructions either: as a result of being prompted from the outside by an input data value and/or an initial fetch information value, or as a result of execution of a fetch instruction. Initially the processor is not clocking. An incoming value kick-starts the processor to start clocking and to fetch a block of instructions from a section of code in a table. The input data value and/or the initial fetch information value determines the section and table from which the block is fetched. A LUT converts a table number in the initial fetch information value into a base address where the table is found. Fetch instructions at the ends of sections of code cause program execution to jump from section to section. A finished instruction causes an output data value to be output and stops clocking of the processor. | 11-05-2015 |
20150317162 | KICK-STARTED RUN-TO-COMPLETION PROCESSING METHOD THAT DOES NOT INVOLVE AN INSTRUCTION COUNTER - A pipelined run-to-completion processor includes no instruction counter and only fetches instructions either: as a result of being prompted from the outside by an input data value and/or an initial fetch information value, or as a result of execution of a fetch instruction. Initially the processor is not clocking. An incoming value kick-starts the processor to start clocking and to fetch a block of instructions from a section of code in a table. The input data value and/or the initial fetch information value determines the section and table from which the block is fetched. A LUT converts a table number in the initial fetch information value into a base address where the table is found. Fetch instructions at the ends of sections of code cause program execution to jump from section to section. A finished instruction causes an output data value to be output and stops clocking of the processor. | 11-05-2015 |
20150317163 | TABLE FETCH PROCESSOR INSTRUCTION USING TABLE NUMBER TO BASE ADDRESS TRANSLATION - A pipelined run-to-completion processor includes no instruction counter and only fetches instructions either: as a result of being prompted from the outside by an input data value and/or an initial fetch information value, or as a result of execution of a fetch instruction. Initially the processor is not clocking. An incoming value kick-starts the processor to start clocking and to fetch a block of instructions from a section of code in a table. The input data value and/or the initial fetch information value determines the section and table from which the block is fetched. A LUT converts a table number in the initial fetch information value into a base address where the table is found. Fetch instructions at the ends of sections of code cause program execution to jump from section to section. A finished instruction causes an output data value to be output and stops clocking of the processor. | 11-05-2015 |
20150370574 | REPLICATING LOGIC BLOCKS TO ENABLE INCREASED THROUGHPUT - A datapath pipeline which uses replicated logic blocks to increase the throughput of the pipeline is described. In an embodiment, the pipeline, or a part thereof, comprises a number of parallel logic paths each comprising the same logic. Input register stages at the start of each logic path are enabled in turn on successive clock cycles such that data is read into each logic path in turn and the logic in the different paths operates out of phase. The output of the logic paths is read into one or more output register stages and the logic paths are combined using a multiplexer which selects an output from one of the logic paths on any clock cycle. Various optimization techniques are described and in various examples, register retiming may also be used. In various examples, the datapath pipeline is within a processor. | 12-24-2015 |
20150378726 | IMPLEMENTATION FOR A HIGH PERFORMANCE BCD DIVIDER - Embodiments of an apparatus are disclosed for performing arithmetic operations on provided operands. The apparatus may include a fetch unit, and an arithmetic logic unit (ALU). The fetch unit may be configured to retrieve two operands responsive to receiving an instruction, wherein the operands include binary-coded decimal values. The ALU may be configured to scale a value of each of the operands, and then compress the scaled values of the operands. The compressed values of the operands may include fewer data bits than the corresponding scaled values. The ALU may be further configured to estimate a portion of a result of the operation dependent upon the compressed values of the operands. | 12-31-2015 |
20160004535 | METHOD OF OPERATING A MULTI-THREAD CAPABLE PROCESSOR SYSTEM, AN AUTOMOTIVE SYSTEM COMPRISING SUCH MULTI-THREAD CAPABLE PROCESSOR SYSTEM, AND A COMPUTER PROGRAM PRODUCT - A method of operating a multi-thread capable processor system comprising a plurality of processor pipelines is described. The method comprises fetching an instruction comprising an address and selecting an operation mode based on the address of the fetched instruction, the operation mode being selected from at least a lock-step mode and a multi-thread mode. If the operation mode is selected to be the lock-step mode, the method comprises letting at least two processor pipelines of the multi-thread capable processor system execute the instruction in lock-step mode to obtain respective lock-step results, comparing the respective lock-step results against a comparison criterion for determining whether the respective lock-step results match, and, if the respective lock-step results match, determine a matching result from the respective lock-step results, and writing back the matching results. | 01-07-2016 |
20160011870 | INSTRUCTION SET FOR ELIMINATING MISALIGNED MEMORY ACCESSES DURING PROCESSING OF AN ARRAY HAVING MISALIGNED DATA ROWS | 01-14-2016 |
20160011873 | INSTRUCTION FOR IMPLEMENTING VECTOR LOOPS OF ITERATIONS HAVING AN ITERATION DEPENDENT CONDITION | 01-14-2016 |
20160026498 | POWER MANAGEMENT SYSTEM, SYSTEM-ON-CHIP INCLUDING THE SAME AND MOBILE DEVICE INCLUDING THE SAME - A power management system controlling power for a plurality of functional blocks included in a system-on-chip includes a plurality of programmable nano controllers, an instruction memory and a signal map memory. The instruction memory is shared by the nano controllers and stores a plurality of instructions that are used by the nano controllers. The signal map memory is shared by the nano controllers and stores a plurality of signals that are provided to the functional blocks and are controlled by the nano controllers. A first nano controller among the plurality of nano controllers is programmed as a central sequencer. Second through n-th nano controllers among the plurality of nano controllers are programmed as first sub-sequencers that are dependent on the first nano controller. | 01-28-2016 |
20160034278 | PICOENGINE HAVING A HASH GENERATOR WITH REMAINDER INPUT S-BOX NONLINEARIZING - A processor includes a hash register and a hash generating circuit. The hash generating circuit includes a novel programmable nonlinearizing function circuit as well as a modulo-2 multiplier, a first modulo-2 summer, a modulor-2 divider, and a second modulo-2 summer. The nonlinearizing function circuit receives a hash value from the hash register and performs a programmable nonlinearizing function, thereby generating a modified version of the hash value. In one example, the nonlinearizing function circuit includes a plurality of separately enableable S-box circuits. The multiplier multiplies the input data by a programmable multiplier value, thereby generating a product value. The first summer sums a first portion of the product value with the modified hash value. The divider divides the resulting sum by a fixed divisor value, thereby generating a remainder value. The second summer sums the remainder value and the second portion of the input data, thereby generating a hash result. | 02-04-2016 |
20160055001 | LOW POWER INSTRUCTION BUFFER FOR HIGH PERFORMANCE PROCESSORS - A method for operating an instruction buffer is disclosed. A read pointer that includes a value indicative of a given bank of a plurality of banks is received. A subset of the of the plurality of banks may then be selected dependent upon the read pointer and one or more control bits associated with an instruction stored at a location specified by the read pointer. The subset of the plurality of banks may then be activated, and an instruction read from each activated bank to form a dispatch group. | 02-25-2016 |
20160085554 | ENERGY-FOCUSED COMPILER-ASSISTED BRANCH PREDICTION - A processing system to reduce energy consumption and improve performance in a processor, controlled by compiler inserted information ahead of a selected branch instruction, to statically expose and control how the prediction should be completed and which mechanism should be used to achieve energy and performance efficiency. | 03-24-2016 |
20160098278 | INSTRUCTION FORWARDING BASED ON PREDICATION CRITERIA - Embodiments herein relate to forwarding an instruction based on predication criteria. A predicate state associated with a packet of data is to be compared to an instruction associated with the predication criteria. The instruction is to be forwarded to an execution unit if the predication criteria includes or matches the predicate state of the packet. | 04-07-2016 |
20160124749 | COALESCING ADJACENT GATHER/SCATTER OPERATIONS - According to one embodiment, a processor includes an instruction decoder to decode a first instruction to gather data elements from memory, the first instruction having a first operand specifying a first storage location and a second operand specifying a first memory address storing a plurality of data elements. The processor further includes an execution unit coupled to the instruction decoder, in response to the first instruction, to read contiguous a first and a second of the data elements from a memory location based on the first memory address indicated by the second operand, and to store the first data element in a first entry of the first storage location and a second data element in a second entry of a second storage location corresponding to the first entry of the first storage location. | 05-05-2016 |
20160132333 | Method, Apparatus, And System For Speculative Abort Control Mechanisms - An apparatus and method is described herein for providing robust speculative code section abort control mechanisms. Hardware is able to track speculative code region abort events, conditions, and/or scenarios, such as an explicit abort instruction, a data conflict, a speculative timer expiration, a disallowed instruction attribute or type, etc. And hardware, firmware, software, or a combination thereof makes an abort determination based on the tracked abort events. As an example, hardware may make an initial abort determination based on one or more predefined events or choose to pass the event information up to a firmware or software handler to make such an abort determination. Upon determining an abort of a speculative code region is to be performed, hardware, firmware, software, or a combination thereof performs the abort, which may include following a fallback path specified by hardware or software. And to enable testing of such a fallback path, in one implementation, hardware provides software a mechanism to always abort speculative code regions. | 05-12-2016 |
20160139924 | Machine Level Instructions to Compute a 4D Z-Curve Index from 4D Coordinates - In one embodiment, a processor includes 32-bit and 64-bit machine level instructions to compute a 4D Z-curve Index. A processor decode unit is configured to decode a z-curve ordering instruction having three source operands, each operand associated with one of a first, second, or third coordinate and a processor execution unit is configured to execute the decoded instruction before outputting the 4D Z-curve index to a location specified by a destination operand. | 05-19-2016 |
20160154648 | Method, apparatus, and system for speculative abort control mechanisms | 06-02-2016 |
20160154651 | METHOD AND APPARATUS FOR PERFORMANCE EFFICIENT ISA VIRTUALIZATION USING DYNAMIC PARTIAL BINARY TRANSLATION | 06-02-2016 |
20160179545 | INSTRUCTION AND LOGIC FOR REGISTER BASED HARDWARE MEMORY RENAMING | 06-23-2016 |
20160179548 | INSTRUCTION AND LOGIC TO PERFORM AN INVERSE CENTRIFUGE OPERATION | 06-23-2016 |
20160188343 | SYSTEMS, APPARATUSES, AND METHODS FOR DATA SPECULATION EXECUTION - Systems, methods, and apparatuses for data speculation execution (DSX) are described. In some embodiments, a hardware apparatus for DSX comprises decoder hardware to decode a class of instructions to support data speculative execution (DSX) including an instruction to begin a DSX, end a DSX, and speculative instructions to execute during a DSX, and execution hardware to speculatively execute decoded instructions that support DSX including the speculative instructions and update speculative instruction tracking hardware. | 06-30-2016 |
20160202979 | Instruction And Logic To Test Transactional Execution Status | 07-14-2016 |
20160253176 | FORMING INSTRUCTION GROUPS BASED ON DECODE TIME INSTRUCTION OPTIMIZATION | 09-01-2016 |
20160253182 | FORMING INSTRUCTION GROUPS BASED ON DECODE TIME INSTRUCTION OPTIMIZATION | 09-01-2016 |
20160378465 | EFFICIENT SPARSE ARRAY HANDLING IN A PROCESSOR - In one embodiment, a processor includes at least one core to execute instructions and an accelerator coupled to the at least one core. The accelerator may include a plurality of walker logics, which may be adapted to fetch at least a portion of a first array block and at least a portion of a second array block, determine whether a first index of the first array block matches a second index of the second array block, and send a first value of the first array block associated with the first index and a second value of the second array block associated with the second index to an arithmetic unit, based at least in part on the determination. Other embodiments are described and claimed. | 12-29-2016 |
20160378495 | Locking Operand Values for Groups of Instructions Executed Atomically - A method including fetching a group of instructions, including a group header for the group of instructions, where the group of instructions is configured to execute by a processor, and where the group header includes a field including locking information for at least one operand is provided. The method further includes storing a value of the at least one operand in at least one operand buffer of the processor and based on the locking information, locking a value of the at least one operand in the at least one operand of the buffer such that the at least one operand is not cleared from the at least one operand buffer of the processor in response to completing the execution of the group of instructions. | 12-29-2016 |
20190146790 | HIGHLY INTEGRATED SCALABLE, FLEXIBLE DSP MEGAMODULE ARCHITECTURE | 05-16-2019 |
20190146800 | MIXED INFERENCE USING LOW AND HIGH PRECISION | 05-16-2019 |