Entries |
Document | Title | Date |
20080209174 | Processor And Its Instruction Issue Method - An instruction issue method for use in a pipelined processor, comprising the steps of: decoding an instruction to be processed to get a type of the instruction; computing the number of cycles to be occupied at execution stage for the instruction, according to the type of the instruction; marking a target operand of the instruction as acquirable in a predefined cycle before the instruction enters write-back stage, according to the number of cycles, so that subsequent instructions taking the target operand as their source operands perform subsequent operations according to the case that the target operand is acquirable. | 08-28-2008 |
20080209175 | Computer system and method of adapting a computer system to support a register window architecture - A target computing system | 08-28-2008 |
20080222394 | Systems and Methods for TDM Multithreading - Systems and methods for distributing thread instructions in the pipeline of a multi-threading digital processor are disclosed. More particularly, hardware and software are disclosed for successively selecting threads in an ordered sequence for execution in the processor pipeline. If a thread to be selected cannot execute, then a complementary thread is selected for execution. | 09-11-2008 |
20080229073 | Address calculation and select-and insert instructions within data processing systems - A data processing system | 09-18-2008 |
20080229074 | Design Structure for Localized Control Caching Resulting in Power Efficient Control Logic - A design structure for an integrated circuit (IC) including a decoder decoding instructions, shadow latches storing instructions as a localized loop, and a state machine controlling the decoder and the plurality of shadow latches. When the state machine identifies instructions that are the same as those stored in the localized loop, it deactivates the decoder and activates the plurality of shadow latches to retrieve and execute the localized loop in place of the instructions provided by the decoder. Additionally, a method of providing localized control caching operations in an IC to reduce power dissipation is provided. The method includes initializing a state machine to control the IC, providing a plurality of shadow latches, decoding a set of instructions, detecting a loop of decoded instructions, caching the loop of decoded instructions in the shadow latches as a localized loop, detecting a loop end signal for the loop and stopping the caching of the localized loop. | 09-18-2008 |
20080256336 | MICROPROCESSOR WITH PRIVATE MICROCODE RAM - A microprocessor includes a private RAM (PRAM), for use by microcode, which is non-user-accessible and within its own distinct address space from the system memory address space. The PRAM is denser and slower than user-accessible registers of the microprocessor macroarchitecture, thereby enabling it to provide significantly more storage for microcode. The microinstruction set includes a microinstruction for loading data from the PRAM into the user-accessible registers, and a microinstruction for storing data from user-accessible registers to the PRAM. The microcode may also use the two microinstructions to load/store between the PRAM and non-user-accessible registers of the microarchitecture. Examples of PRAM uses include: computational temporary storage area; storage of x86 VMX VMCS in response to VMREAD and VMWRITE macroinstructions; instantiation of non-user-accessible storage, such as the x86 SMBASE register; and instantiation of x86 MSRs that tolerate the additional access latency of the PRAM, such as the IA32_SYSENTER_CS MSR. | 10-16-2008 |
20080256337 | Method of Decoding A Bit Sequence, Network Element Apparatus And PDU Specification Tool Kit - In the field of data communications, it is desirable to track bits of a bit sequence remaining to be decoded by a decoder. A method of decoding the bit sequence that corresponds to a PDU comprises reading-in a bit sequence and processing the bit sequence. In order to maintain a record of the bits reaming to be processed, a data stack is used during decoding of the bit sequence. | 10-16-2008 |
20080270759 | Computer Having Dynamically-Changeable Instruction Set in Real Time - A computer allows dynamic change of an instruction set during a real-time execution. The computer includes a CPU (Central Processing Unit) having an instruction fetch unit for fetching an instruction from a memory, an instruction decoding unit for generating a predetermined control code corresponding to the instruction fetched by the instruction fetch unit, and an arithmetic logic unit operated by the control code. The instruction decoding unit includes a basic instruction decoding unit for generating a control code for a basic instruction set; and a dynamic instruction decoding unit for generating another control code different from the control code corresponding to an instruction of the basic instruction set, or generating a control code corresponding to an instruction not existing in the basic instruction set. An instruction stored in the dynamic instruction decoding unit or a corresponding control code is configured to be changeable during execution in real time. | 10-30-2008 |
20080270760 | DEBUG SUPPORT METHOD AND APPARATUS - According to the present invention, there is provided a debug support apparatus having, a decoder configured to receive an instruction output from a compiler which receives a source code, decode the instruction, and output a decoding result; and a display unit configured to receive debug information output from the compiler and the decoding result output from the decoder and display at least a correspondence between each decoded instruction and a position in the source code. | 10-30-2008 |
20080282065 | Image forming apparatus and control method for the same - An image forming apparatus includes a job-history managing unit that creates and manages a job history, registers a job history, as a job history instance, containing an operation code that instructs a series of job operations to be executed and also containing an operating condition of the job operation, and that registers a macro instance which can be interpreted by other image forming apparatuses and is encoded to an executable shared code. The image forming apparatus also includes an interpreter unit that reads the job history instance, encodes the operation code to the shared code, and registers the shared code as the macro instance. | 11-13-2008 |
20080288752 | DESIGN STRUCTURE FOR FORWARDING STORE DATA TO LOADS IN A PIPELINED PROCESSOR - A design structure embodied in a machine readable storage medium for designing, manufacturing, and/or testing a design for forwarding store data to loads in a pipelined processor is provided. In one implementation, a processor is provided that includes a decoder operable to decode an instruction, and a plurality of execution units operable to respectively execute a decoded instruction from the decoder. The plurality of execution units include a load/store execution unit operable to execute decoded load instructions and decoded store instructions and generate corresponding load memory operations and store memory operations. The store queue is operable to buffer one or more store memory operations prior to the one or more memory operations being completed, and the store queue is operable to forward store data of the one or more store memory operations buffered in the store queue to a load memory operation on a byte-by-byte basis. | 11-20-2008 |
20080301410 | Processor Configured for Operation with Multiple Operation Codes Per Instruction - A processor configured to operate with multiple operation codes for each of a plurality of instructions comprises memory circuitry and processing circuitry coupled to the memory circuitry. The processing circuitry is configured to decode a first operation code to produce a given one of the instructions and to decode a second operation code different than the first operation code to also produce the given instruction. Thus, the same instruction is produced for execution by the processing circuitry regardless of whether the first operation code or the second operation code is decoded. The assignment of multiple operation codes to a given instruction may occur in conjunction with the design of the processor, and dynamic selection of a particular one of those operation codes may be performed in conjunction with assembly of code for execution by the processor. | 12-04-2008 |
20090006814 | Immediate and Displacement Extraction and Decode Mechanism - An extraction and decode mechanism for acquiring and processing instructions and the corresponding constant(s) embedded within the instructions. The extraction and decode mechanism may be included within a processing unit, and may comprise an instruction decode unit and at least one constant steer network. During operation, the instruction decode unit may obtain and decode instructions which are to be executed by the processing unit. For each instruction, the instruction decode unit may also determine the location of one or more constants embedded within the instruction. The constant steer network may receive the location information from the instruction decode unit. While the instruction decode unit decodes the instruction, the constant steer network may obtain the constant(s) embedded within the instruction based on the location information and store the constant(s). The constant(s) embedded within the instruction may be immediate or displacement (imm/disp) constant(s). | 01-01-2009 |
20090006815 | SYSTEM AND METHOD FOR INTERFACING DEVICES - A system in one embodiment includes (i) an interface module to control and monitor the system; a plurality of power cells which act as a point of power delivery and monitor environmental variables that effect function and reliability, (ii) a radio frequency transmitter and receiver to manage nodes distributed across the plurality of power cells, (iii) a maintenance module presenting information requests to be forwarded to the plurality of power cells, and (iv) a communication bus for distribution of data throughout the system. | 01-01-2009 |
20090019262 | PROCESSOR MICRO-ARCHITECTURE FOR COMPUTE, SAVE OR RESTORE MULTIPLE REGISTERS, DEVICES, SYSTEMS, METHODS AND PROCESSES OF MANUFACTURE - An electronic circuit ( | 01-15-2009 |
20090031113 | Processor Array, Processor Element Complex, Microinstruction Control Appraratus, and Microinstruction Control Method - A processor array including area-saving microprogram memories is provided. In the processor array, microprogram memories of a plurality of adjacent processor arrays are shared. Effective data and position information | 01-29-2009 |
20090049278 | EFFICIENT MEMORY UPDATE PROCESS FOR ON-THE-FLY INSTRUCTION TRANSLATION FOR WELL BEHAVED APPLICATIONS EXECUTING ON A WEAKLY-ORDERED PROCESSOR - A multiprocessor data processing system (MDPS) with a weakly-ordered architecture providing processing logic for substantially eliminating issuing sync instructions after every store instruction of a well-behaved application. Instructions of a well-behaved application are translated and executed by a weakly-ordered processor. The processing logic includes a lock address tracking utility (LATU), which provides an algorithm and a table of lock addresses, within which each lock address is stored when the lock is acquired by the weakly-ordered processor. When a store instruction is encountered in the instruction stream, the LATU compares the target address of the store instruction against the table of lock addresses. If the target address matches one of the lock addresses, indicating that the store instruction is the corresponding unlock instruction (or lock release instruction), a sync instruction is issued ahead of the store operation. The sync causes all values updated by the intermediate store operations to be flushed out to the point of coherency and be visible to all processors. | 02-19-2009 |
20090063820 | Application Specific Instruction Set Processor for Digital Radio Processor Receiving Chain Signal Processing - This invention is an application specific integrated processor to implement the complete fixed-rate DRX signal processing paths (FDRX) for a reconfigurable processor-based multi-mode 3G wireless application. This architecture is based on the baseline 16-bit RISC architecture with addition functional blocks (ADU) tightly coupled with the based processor's data path. Each ADU accelerates a computation-intensive tasks in FDRX signal path, such as multi-tap FIRs, IIRs, complex domain and vectored data processing. The ADUs are controlled through custom instructions based on the load/store architecture. The whole FDRX data path can be easily implemented by the software employing these custom instructions. | 03-05-2009 |
20090063821 | PROCESSOR APPARATUS INCLUDING OPERATION CONTROLLER PROVIDED BETWEEN DECODE STAGE AND EXECUTE STAGE - A processor apparatus includes a sequence controller that decodes the instruction code stored in an instruction memory, an operation array that executes operation of the decoded instruction code, and an asynchronous FIFO. The asynchronous FIFO is provided between a decode stage for decoding the instruction code into at least one instruction by the sequence controller and an execute stage for executing the decoded instruction by the operation array. The asynchronous FIFO executes control, so that the read timing and the execute timing of the decoded instruction are different from each other, and the decoded instruction is continuously executed by the operation array. | 03-05-2009 |
20090063822 | MICROPROCESSOR - A microprocessor includes: a processor core that performs pipeline processing; an instruction analyzing section that analyzes an instruction to be processed by the processor core and outputs analysis information indicating whether the instruction matches with a specific instruction; and a memory that temporary stores the instruction with the analysis information, wherein the processor core includes: an instruction fetch unit that fetches the instruction stored in the memory; an instruction decode unit that decodes the instruction fetched by the instruction fetch unit; an instruction execute unit that executes the instruction decoded by the instruction decode unit; and a specific instruction execute controller that reads out the analysis information stored in the memory and controls operation of at least one of the instruction fetch unit and the instruction decode unit when the analysis instruction indicates that the instruction matches with the specific instruction. | 03-05-2009 |
20090089549 | H.264 Video Decoder CABAC Core Optimization Techniques - A device employing techniques to optimize Context-based Adaptive Binary Arithmetic Coding (CABAC) for the H.264 video decoding is provided. The device includes a processing circuit operative to implement a set of instructions to decode multiple bins simultaneously and renormalize an offset register and a range register after the multiple bins are decoded. The range register and offset registers may be 32 or 64 bits. The use of a larger range register allows renormalization to be skipped when enough bits are still in the range register. | 04-02-2009 |
20090089550 | JEK DYNAMIC INSTRUMENTATION - A method and system for performing dynamic instrumentation. At least some of the illustrative embodiments are methods comprising setting at least one monitor value (wherein the at least one monitor value is associated with a software monitoring handler), detecting a value within a register equal to the at least one monitor value, and executing the software monitoring handler based on the detecting. | 04-02-2009 |
20090119489 | Methods and Apparatus for Transforming, Loading, and Executing Super-Set Instructions - Techniques are described for loading decoded instructions and super-set instructions in a memory for later access. For loading a decoded instruction, the decoded instruction is a transformed form of an original instruction that was stored in the program memory. The transformation is from an encoded assembly level format to a binary machine level format. In one technique, the transformation mechanism is invoked by a transform and load instruction that causes an instruction retrieved from program memory to be transformed into a new language format and then loaded into a transformed instruction memory. The format of the transformed instruction may be optimized to the implementation requirements, such as improving critical path timing. The transformation of instructions may extend to other needs beyond timing path improvement, for example, requiring super-set instructions for increased functionality and improvements to instruction level parallelism. Techniques for transforming, loading, and executing super-set instructions are described. | 05-07-2009 |
20090138680 | VECTOR ATOMIC MEMORY OPERATIONS - A processor is operable to execute one or more vector atomic memory operations. A further embodiment provides support for atomic memory operations in a memory manger, which is operable to process atomic memory operations and to return a completion notification or a result. | 05-28-2009 |
20090172356 | COMPRESSED INSTRUCTION FORMAT - A technique for decoding an instruction in a variable-length instruction set. In one embodiment, an instruction encoding is described, in which legacy, present, and future instruction set extensions are supported, and increased functionality is provided, without expanding the code size and, in some cases, reducing the code size. | 07-02-2009 |
20090172357 | USING A PROCESSOR IDENTIFICATION INSTRUCTION TO PROVIDE MULTI-LEVEL PROCESSOR TOPOLOGY INFORMATION - Embodiments of an invention for using a processor identification instruction to provide multi-level processor topology information are disclosed. In one embodiment, a processor includes decode logic and control logic. The decode logic is to receive an identification instruction having an associated topological level value. The control logic is to provide, in response to the decode logic receiving the identification instruction, processor identification information corresponding to the associated topological level value. | 07-02-2009 |
20090172358 | IN-LANE VECTOR SHUFFLE INSTRUCTIONS - In-lane vector shuffle operations are described. In one embodiment a shuffle instruction specifies a field of per-lane control bits, a source operand and a destination operand, these operands having corresponding lanes, each lane divided into corresponding portions of multiple data elements. Sets of data elements are selected from corresponding portions of every lane of the source operand according to per-lane control bits. Elements of these sets are copied to specified fields in corresponding portions of every lane of the destination operand. Another embodiment of the shuffle instruction also specifies a second source operand, all operands having corresponding lanes divided into multiple data elements. A set selected according to per-lane control bits contains data elements from every lane portion of a first source operand and data elements from every corresponding lane portion of the second source operand. Set elements are copied to specified fields in every lane of the destination operand. | 07-02-2009 |
20090198966 | Multi-Addressable Register File - A single register file may be addressed using both scalar and SIMD instructions. That is, subsets of registers within a multi-addressable register file according to the illustrative embodiments, are addressable with different instruction forms, e.g., scalar instructions, SIMD instructions, etc., while the entire set of registers may be addressed with yet another form of instructions, referred to herein as Vector-Scalar Extension (VSX) instructions. The operation set that may be performed on the entire set of registers using the VSX instruction form is substantially similar to that of the operation sets of the subsets of registers. Such an arrangement allows legacy instructions to access subsets of registers within the multi-addressable register file while new instructions, i.e. the VSX instructions, may access the entire range of registers within the multi-addressable register file. | 08-06-2009 |
20090217005 | METHOD, SYSTEM, AND COMPUTER PROGRAM PRODUCT FOR SELECTIVELY ACCELERATING EARLY INSTRUCTION PROCESSING - A method for selectively accelerating early instruction processing including receiving an instruction data that is normally processed in an execution stage of a processor pipeline, wherein a configuration of the instruction data allows a processing of the instruction data to be accelerated from the execution stage to an address generation stage that occurs earlier in the processor pipeline than the execution stage, determining whether the instruction data can be dispatched to the address generation stage to be processed without being delayed due to an unavailability of a processing resource needed for the processing of the instruction data in the address generation stage, dispatching the instruction data to be processed in the address generation stage if it can be dispatched without being delayed due to the unavailability of the processing resource, and dispatching the instruction data to be processed in the execution stage if it can not be dispatched without being delayed due to the unavailability of the processing resource, wherein the processing of the instruction data is selectively accelerated using an address generation interlock scheme. A corresponding system and computer program product. | 08-27-2009 |
20090235054 | DISASSEMBLING AN EXECUTABLE BINARY - A method for disassembling an executable binary (binary). In one implementation, a plurality of potential address references may be identified based on the binary and a plurality of storage addresses containing the binary. A plurality of assembler source code instructions (instructions) may be generated by disassembling the binary. The binary may be disassembled at one or more sequential addresses starting at each of the plurality of potential address references. | 09-17-2009 |
20090235055 | SCHEDULING APPARATUS AND SCHEDULING METHOD - A scheduling apparatus includes a state information acquisition unit that acquires, from a decoding unit, state information that serves as an indicator for computing a processing time actually required for a decoding unit to perform a decoding process on first processing target data, and a schedule generating unit that determines a second scheduled processing time for which the decoding unit is able to perform the decoding process on second processing target data, based on the processing time computed based on the state information. The schedule generating unit further generates a second schedule for the decoding unit to perform the decoding process on the second processing target data, by allocating the second scheduled processing time to the decoding unit. | 09-17-2009 |
20090254735 | MERGE MICROINSTRUCTION FOR MINIMIZING SOURCE DEPENDENCIES IN OUT-OF-ORDER EXECUTION MICROPROCESSOR WITH VARIABLE DATA SIZE MACROARCHITECTURE - A microprocessor processes a macroinstruction that instructs the microprocessor to write an 8-bit result into only a lower 8 bits of an N-bit architected general purpose register. An instruction translator translates the macroinstruction into a merge microinstruction that specifies an N-bit first source register, an 8-bit second source register, and an N-bit destination register to receive an N-bit result. The N-bit first source register and the N-bit destination register are the N-bit architected general purpose register. An execution unit receives the merge microinstruction and responsively generates the N-bit result to be subsequently written to the N-bit architected general purpose register even though the macroinstruction only instructs the microprocessor to write the 8-bit result into the lower 8 bits of the N-bit architected general purpose register. Specifically, the execution unit directs the 8-bit result into the lower 8 bits of the N-bit result and directs the upper N-8 bits of the N-bit first source register into corresponding upper N-8 bits of the N-bit result. | 10-08-2009 |
20090327659 | TRAP-BASED MECHANISM FOR TRACKING MEMORY ACCESSES - In general, the invention relates to a method. The method includes receiving notification, which includes context information, of a trap. The method further includes accessing, based at least partially upon the context information, a particular instruction that caused the trap, determining, based at least partially upon the context information, a particular address that is to be accessed by the particular instruction, updating a set of log information to indicate accessing of the particular address, causing subsequent accesses of the particular address to not give rise to a trap, after causing subsequent accesses of the particular address to not give rise to a trap, accessing the particular address, after accessing the particular address, causing subsequent accesses of the particular address to give rise to a trap, and causing the particular instruction to not be executed. | 12-31-2009 |
20100011190 | DECODING MULTITHREADED INSTRUCTIONS - A microprocessor capable of decoding a plurality of instructions associated with a plurality of threads is disclosed. The microprocessor may comprise a first array comprising a first plurality of microcode operations associated with an instruction from within the plurality, the first array capable of delivering a first predetermined number of microcode operations from the first plurality of microcode operations. The microprocessor may further comprise a second array comprising a second plurality of microcode operations, the second array capable of providing one or more of the second plurality of microcode operations in the event that the instruction decodes into more than the first predetermined number of microcode operations. The microprocessor may further comprise an arbiter coupled between the first and second arrays, where the arbiter may determine which thread from the plurality of threads accesses the second array. | 01-14-2010 |
20100042812 | Data Dependent Instruction Decode - A circuit arrangement and method support data dependent instruction decoding, whereby instructions are decoded, in part, using decode data that is stored in operand registers identified by such instructions. An instruction may include an opcode and at least one operand that identifies a register. During execution of the instruction, the instruction is first decoded using the opcode, and then decode data stored in the operand register is retrieved and used to further decode the instruction, e.g., to select from among a plurality of operations or instruction types associated with the same opcode. | 02-18-2010 |
20100049948 | Serial flash semiconductor memory - A serial flash memory is provided with multiple configurable pins, at least one of which is selectively configurable for use in either single-bit serial data transfers or multiple-bit serial data transfers. In single-bit serial mode, data transfer is bit-by-bit through a pin. In multiple-bit serial mode, a number of sequential bits are transferred at a time through respective pins. The serial flash memory may have 16 or fewer pins, and even 8 or fewer pins, so that low pin count packaging such as the 8-pin or 16-pin SOIC package and the 8-contact MLP/QFN/SON package may be used. The availability of the single-bit serial type protocol enables compatibility with a number of existing systems, while the availability of the multiple-bit serial type protocol enables the serial flash memory to provide data transfer rates, in systems that can support them, that are significantly faster than available with standard serial flash memories. | 02-25-2010 |
20100064119 | DATA PROCESSOR - The present invention is directed to realize efficient issue of a superscalar instruction in an instruction set including an instruction with a prefix. A circuit is employed which retrieves an instruction of each instruction code type other than a prefix on the basis of a determination result of decoders for determining an instruction code type, adds the immediately preceding instruction to the retrieved instruction, and outputs the resultant to instruction executing means. When an instruction of a target instruction code type is detected in a plurality of instruction units to be searched, the circuit outputs the detected instruction code and the immediately preceding instruction other than the target instruction code type as prefix code candidates. When an instruction of a target instruction code type cannot be detected at the rear end of the instruction units to be searched, the circuit outputs the instruction at the rear end as a prefix code candidate. When an instruction of a target instruction code type is detected at the head in the instruction code search, the circuit outputs the instruction code at the head. | 03-11-2010 |
20100095091 | Processor, Method and Computer Program - To accelerate processing speed of a processor while keeping increased complexity in the processor's circuitry to a minimum. A processor is offered, comprising a decoder which sequentially acquires and decodes an instruction from a program, including an instruction of a first type and a second type, which are classified according to a property of data upon which the instruction is to operate; a first operation unit which sequentially receives from the decoder, and executes, the instruction of the first type; an operand processing circuit which substitutes a variable value, which is set into a register that is associated with the first operation unit, and which is included within an operand of the instruction of the second type, with a constant; a buffer which queues the instruction of the second type that has been decoded by the decoder, and the operand thereof has been substituted by the operand processing circuit; and a second operation unit which sequentially receives from the buffer, and executes, the instruction of the second type. Methods and computer program for implementing the methods are also disclosed. | 04-15-2010 |
20100095092 | Instruction execution control device and instruction execution control method - An instruction execution control device operates a plurality of threads in a simultaneous multi-thread system. And the instruction execution control device has a thread selection circuit ( | 04-15-2010 |
20100095093 | INFORMATION PROCESSING APPARATUS AND METHOD OF CONTROLLING REGISTER - An information processing apparatus and a method of controlling the same that employs a register window system and a Simultaneous Multithreading method for reducing circuit areas by sharing a data transfer bus between threads, said bus connecting a master register and a work register provided for each thread and for avoiding interference in instruction execution with other threads caused by a conflict between accesses to a register between threads. An information processing apparatus and a method of controlling the information processing apparatus employing a register window system for register reading, in which a master register and a work register are held for each thread and a bus for transferring data from the master to the work register is shared by threads in order to realize Simultaneous Multithreading. | 04-15-2010 |
20100095094 | METHOD FOR PROCESSING DATA - A method and device for translating a program to a system including at least one first processor and a reconfigurable unit. Code portions of the program which are suitable for the reconfigurable unit are determined. The remaining code of the program is extracted and/or separated for processing by the first processor. | 04-15-2010 |
20100095095 | Instruction processing apparatus - An instruction processing apparatus includes a thread execution processing section executing threads each including plural instructions, a register file including a register window having plural registers, a current window pointer indicating a position of the register where the register window is possible to be inputted and outputted, a current register reading data held by the register window designated by the current window pointer to hold the data and a replacement buffer holding data transferred from the register file to the current register, a first transfer path transferring data in a register file to one of the replacement buffer, a second data transfer transferring data in a replacement buffer to one of the current registers, a calculation section executing a switching instruction of the register window, and a control section controlling, if the calculation section executes the switching instruction, the first data transfer path and the second data transfer path. | 04-15-2010 |
20100100711 | DATA PROCESSOR DEVICE AND METHODS THEREOF - A microsequencer is disclosed that controls the order in which microcode instructions are fetched from a microcode ROM. Each microcode instruction includes an execution command for execution by one or more execution units. Each microcode instruction also includes a microsequencer command to indicate the location of another microcode instruction at the microcode ROM. The microcode instruction can also include a delay field, indicating a selectable time when the associated microcode instruction is to be decoded. The delay field thereby provides more flexible control of the sequencing of microcode instructions. | 04-22-2010 |
20100106944 | Data processing apparatus and method for performing rearrangement operations - A data processing apparatus and method are provided for performing rearrangement operations. The data processing apparatus has a register data store with a plurality of registers, each register storing a plurality of data elements. Processing circuitry is responsive to control signals to perform processing operations on the data elements. An instruction decoder is responsive to at least one but no more than N rearrangement instructions, where N is an odd plural number, to generate control signals to control the processing circuitry to perform a rearrangement process at least equivalent to: obtaining as source data elements the data elements stored in N registers of said register data store as identified by the at least one re-arrangement instruction; performing a rearrangement operation to rearrange the source data elements between a regular N-way interleaved order and a de-interleaved order in order to produce a sequence of result data elements; and outputting the sequence of result data elements for storing in the register data store. This provides a particularly efficient technique for performing N-way interleave and de-interleave operations, where N is an odd number, resulting in high performance, low energy consumption, and reduced register use when compared with known prior art techniques. | 04-29-2010 |
20100122068 | MULTITHREADED PROCESSOR WITH MULTIPLE CONCURRENT PIPELINES PER THREAD - A multithreaded processor comprises a plurality of hardware thread units, an instruction decoder coupled to the thread units for decoding instructions received therefrom, and a plurality of execution units for executing the decoded instructions. The multithreaded processor is configured for controlling an instruction issuance sequence for threads associated with respective ones of the hardware thread units. On a given processor clock cycle, only a designated one of the threads is permitted to issue one or more instructions, but the designated thread that is permitted to issue instructions varies over a plurality of clock cycles in accordance with the instruction issuance sequence. The instructions are pipelined in a manner which permits at least a given one of the threads to support multiple concurrent instruction pipelines. | 05-13-2010 |
20100161944 | Processor and instruction control method - An original first instruction word (I | 06-24-2010 |
20100169613 | TRANSLATING INSTRUCTIONS IN A SPECULATIVE PROCESSOR - A method for use by a host microprocessor which translates sequences of instructions from a target instruction set for a target processor to sequences of instructions for the host microprocessor including the steps of beginning execution of a speculative sequence of target instructions by committing state of the target processor and storing memory stores previously generated by execution at a point in the execution of instructions at which state of the target processor is known, executing the speculative sequence of host instructions until another point in the execution of target instructions at which state of the target processor is known, rolling back to last committed state of the target processor and discarding the memory stores generated by the speculative sequence of host instructions if execution fails, and beginning execution of a next sequence of target instructions if execution succeeds. | 07-01-2010 |
20100174889 | GUEST-SPECIFIC MICROCODE - Embodiments of apparatuses, methods, and systems for modifying the behavior of a guest installed to run within a VM are disclosed. In one embodiment, an apparatus includes virtualization logic, first storage, second storage, decode logic, and multiplexing logic. The virtualization logic is to provide a mode in which to operate a virtual machine. The first storage is to store a first plurality of micro-instructions to control the apparatus. The second storage is to store a second plurality of micro-instructions to control the apparatus. The decode logic is to decode a macro-instruction into one of a first plurality and a second plurality of micro-instructions. The multiplexing logic is to cause the macro-instruction to be decoded into the second plurality of micro-instructions instead of the first plurality of micro-instructions only when issued from the virtual machine. | 07-08-2010 |
20100180104 | APPARATUS AND METHOD FOR PATCHING MICROCODE IN A MICROPROCESSOR USING PRIVATE RAM OF THE MICROPROCESSOR - A microprocessor has a microcode memory for storing original microcode instructions to implement user program instructions, and an interface to an external memory for storing a microcode patch. The microcode patch includes substitute microcode instructions and validation information. The microprocessor includes a private random access memory (PRAM), addressable by the original and substitute microcode instructions but not addressable by user program instructions. The microprocessor also includes patch hardware, which conditionally receives the substitute microcode instructions. The microprocessor executes the substitute microcode instructions when applied to the patch hardware instead of corresponding original microcode instructions. The microprocessor is configured to load the microcode patch from external memory into PRAM, determine whether the microcode patch is valid, apply substitute microcode instructions from PRAM to the patch hardware if the microcode patch is valid, and refrain from applying the substitute microcode instructions to the patch hardware, if the microcode patch is invalid. | 07-15-2010 |
20100185835 | Data Processing Circuit With A Plurality Of Instruction Modes, Method Of Operating Such A Data Circuit And Scheduling Method For Such A Data Circuit - A data processing circuit has an execution circuit ( | 07-22-2010 |
20100191937 | Implied Storage Operation Decode Using Redundant Target Address Detection - A logic arrangement and method to support implied storage operation decode uses redundant target address detection, whereby target addresses of previous instructions are compared with the target address of the current instruction, and if equal, and the target addresses of previous instructions are not used as sources, the current instruction is decoded as a store instruction. This allows a redundant operation in an instruction set architecture to be redefined as a store instruction, freeing up opcodes normally used for store instructions to be used for other instructions. | 07-29-2010 |
20100191938 | INFORMATION PROCESSING DEVICE, ARITHMETIC PROCESSING METHOD, ELECTRONIC APPARATUS AND PROJECTOR - An information processing device including: a first arithmetic processing unit performing first arithmetic processing; a second arithmetic processing unit performing second arithmetic processing; input registers adapted to include a first input register allocated to the first arithmetic processing unit, and a second input register allocated to the second arithmetic processing unit; and output registers storing a processing results of the first arithmetic processing unit and a processing results of the second arithmetic processing unit, in each of given execution cycles, the first arithmetic processing unit performs the first arithmetic processing using stored data of the first input register and stores a processing result of the first arithmetic processing in the output registers and the second arithmetic processing unit performs the second arithmetic processing using stored data of the second input register and stores a processing result of the second arithmetic processing in the output registers. | 07-29-2010 |
20100199072 | Register file - A register file comprising a plurality of register entries for storing data values for use in the execution of data processing instructions is provided, and comprises at least one write port and at least one read port, and circuitry responsive to a write request received at said at least one write port to update one of said plurality of register entries identified by an address specified by said write request with a data value specified by said write request. The register file also comprises further circuitry responsive to a received control signal to set at least a portion of a predetermined register entry to a predetermined value. In this way, certain register file updating instructions can be executed in parallel with other instructions without the need for additional full write-ports as would be required for typical dual-issue, thereby reducing area and routing complexity and cost compared with the use of an additional write-port due to the lower gate count required by the proposed further circuitry. | 08-05-2010 |
20100205406 | OUT-OF-ORDER EXECUTION MICROPROCESSOR THAT SPECULATIVELY EXECUTES DEPENDENT MEMORY ACCESS INSTRUCTIONS BY PREDICTING NO VALUE CHANGE BY OLDER INSTRUCTIONS THAT LOAD A SEGMENT REGISTER - An out-of-order execution microprocessor executes an architectural segment register-loading instruction that instructs the microprocessor to load a new value into an architectural segment register of the microprocessor. A comparator compares the new value specified by the architectural segment register-loading instruction with a current contents of the architectural segment register. A control unit causes to be re-executed using the new value all instructions in the microprocessor that used the current architectural segment register contents as a source operand and that are newer in program order than the architectural segment register-loading instruction whenever the comparator indicates the new value does not equal the current contents. An instruction scheduler retrieves the current contents and issues for execution instructions that use the retrieved current contents, even though the instructions are newer in program order than the register-loading instruction and the register-loading instruction has not yet written the new value to the architectural segment register. | 08-12-2010 |
20100217958 | Address calculation and select-and-insert instructions within data processing systems - A data processing system | 08-26-2010 |
20100250900 | DEPENDENCY TRACKING FOR ENABLING SUCCESSIVE PROCESSOR INSTRUCTIONS TO ISSUE - An information handling system includes a processor with an issue unit (IU) that may perform instruction dependency tracking for successive instruction issue operations. The IU maintains non-shifting issue queue (NSIQ) and shifting issue queue (SIQ) instructions along with relative instruction to instruction dependency information. A mapper maps queue position data for instructions that dispatch to issue queue locations within the IU. The IU may test an issuing producer instruction against consumer instructions in the IU for queue position (QPOS) and register tag (RTAG) matches. A matching consumer instruction may issue in a successive manner in the case of a queue position match or in a next processor cycle in the case of a register tag match. | 09-30-2010 |
20100268917 | Systems and Methods for Ramped Power State Control in a Semiconductor Device - Various embodiments of the present invention provide systems and methods for ramping current usage in a semiconductor device. For example, various embodiments of the present invention provide semiconductor devices that include at least a first function circuit and a second function circuit, and a power state change control circuit. The power state change control circuit is operable to transition the power state of the first function circuit from a reduced power state to an operative power state, and to transition the second function circuit from a reduced power state to an operative power state. Transition of the power state of at least one of the first function circuit and the second function circuit is done in at least a first stage at a first time and a second stage at a second time, with the second time being after the first time. | 10-21-2010 |
20100268918 | ASIP ARCHITECTURE FOR EXECUTING AT LEAST TWO DECODING METHODS - A system for execution of a decoding method is disclosed. The system is capable of executing at least two data decoding methods which are different in underlying coding principle, wherein at least one of the data decoding methods requires data shuffling operations on the data. In one aspect, the system includes at least one application specific processor having an instruction set having arithmetic operators excluding multiplication, division and power. The processor is selected for execution of approximations of each of the at least two data decoding methods. The system also includes at least a first memory unit, e.g. background memory, for storing data. The system also includes a transfer unit for transferring data from the first memory unit towards the at least one programmable processor. The transfer unit includes a data shuffler. The system may also include a controller for controlling the data shuffler independent from the processor. | 10-21-2010 |
20100287358 | Branch Prediction Path Instruction - A method for branch prediction, the method comprising, receiving a branch wrong guess instruction having a branch wrong guess instruction address and data including an opcode and a branch target address, determining whether the branch wrong guess instruction was predicted by a branch prediction mechanism, sending the branch wrong guess instruction to an execution unit responsive to determining that the branch wrong guess instruction was predicted by the branch prediction mechanism, and receiving and decoding instructions at the branch target address. | 11-11-2010 |
20100299500 | PREFIX ACCUMULATION FOR EFFICIENT PROCESSING OF INSTRUCTIONS WITH MULTIPLE PREFIX BYTES - In a microprocessor that has an instruction set architecture in which the instructions may include a variable number of prefix bytes, an apparatus for efficiently extracting instructions from a stream of undifferentiated instruction bytes. Decode logic determines which byte is an opcode byte for each instruction of a plurality of instructions within the stream of undifferentiated instruction bytes. The opcode byte is the first non-prefix byte of the instruction. The decode logic accumulates prefix information onto the opcode byte of the instruction for each instruction of the plurality of instructions. A queue holds the stream of undifferentiated instruction bytes and the accumulated prefix information. Extraction logic extracts the plurality of instructions from the queue in one clock cycle independent of the number of prefix bytes included in each of the plurality of instructions. | 11-25-2010 |
20100299501 | INSTRUCTION EXTRACTION THROUGH PREFIX ACCUMULATION - An apparatus has a queue, each entry stores a different line of a stream of instruction bytes and accumulated prefix information associated with each instruction byte. Control logic: (a) detects a condition where an initial portion of an instruction partially within a first line stored in the bottom entry (BE) of the queue remains unextracted from the queue, wherein the initial portion instruction bytes are all prefix bytes; (b) saves away the initial portion length, shifts the first line in the BE out of the queue, and shifts a second line into the BE, in response to detecting the condition; (c) extracts instruction bytes of the unextracted instruction from the second line in the BE and extracts accumulated prefix information from the second line of the BE in place of the already shifted out initial portion prefix bytes; (d) calculates the unextracted instruction length using the saved length; and (e) extracts an instruction other than the unextracted instruction from the second line in the BE using the calculated length. | 11-25-2010 |
20100299502 | BAD BRANCH PREDICTION DETECTION, MARKING, AND ACCUMULATION FOR FASTER INSTRUCTION STREAM PROCESSING - An apparatus for extracting instructions from a stream of undifferentiated instruction bytes in a microprocessor having an instruction set architecture in which the instructions are variable length. Decode logic decodes the instruction bytes of the stream to generate for each a corresponding opcode byte indictor and end byte indicator and receives a corresponding taken indicator for each of the instruction bytes. The taken indicator is true if a branch predictor predicted the instruction byte is the opcode byte of a taken branch instruction. The decode logic generates a corresponding bad prediction indicator for each of the instruction bytes. The bad prediction indicator is true if the corresponding taken indicator is true and the corresponding opcode byte indicator is false. The decode logic sets to true the bad prediction indicator for each remaining byte of an instruction whose opcode byte has a true bad prediction indicator. Control logic extracts instructions from the stream and sends the extracted instructions for further processing by the microprocessor. The control logic foregoes sending an instruction having both a true end byte indicator and a true bad prediction indicator. | 11-25-2010 |
20100306504 | Controlling issue and execution of instructions having multiple outcomes - At least one instruction of a sequence of program instructions has a plurality of alternative outcomes including at least a first outcome that is independent of at least one operand and a second outcome that is dependent on the at least one operand. The at least one operand is a value generated by a preceding instruction in the sequence. The instruction is issued for execution independently of when the at least one operand is generated by the preceding instruction. Recovery circuitry is provided to perform a recovery operation in the event that the second outcome is executed for the at least one instruction and the at least one operand has not been generated by the preceding instruction when the at least one instruction is to be executed by said instruction execution circuitry. | 12-02-2010 |
20100325394 | System and Method for Balancing Instruction Loads Between Multiple Execution Units Using Assignment History - A system and method for balancing instruction loads between multiple execution units are disclosed. One or more execution units may be represented by a slot configured to accept instructions on behalf of the execution unit(s). A decode unit may assign instructions to a particular slot for subsequent scheduling for execution. Slot assignments may be made based on an instruction's type and/or on a history of previous slot assignments. A cumulative slot assignment history may be maintained in a bias counter, the value of which reflects the bias of previous slot assignments. Slot assignments may be determined based on the value of the bias counter, in order to balance the instruction load across all slots, and all execution units. The bias counter may reflect slot assignments made only within a desired historical window. A separate data structure may store data reflecting the actual slot assignments made during the desired historical window. | 12-23-2010 |
20100332801 | Adaptively Handling Remote Atomic Execution - In one embodiment, a method includes receiving an instruction for decoding in a processor core and dynamically handling the instruction with one of multiple behaviors based on whether contention is predicted. If no contention is predicted, the instruction is executed in the core, and if contention is predicted data associated with the instruction is marshaled and sent to a selected remote agent for execution. Other embodiments are described and claimed. | 12-30-2010 |
20100332802 | Priority circuit, processor, and processing method - A priority circuit is connected to a reservation station and a plurality of arithmetic units that processes different operations and dispatches, when it is determined that an executable flag indicating that an instruction can be executed by only a specific arithmetic unit is on, an instruction to an arithmetic unit that is different from the specific arithmetic unit and of which a queue is vacant in accordance with the input performed by an instruction decoder and the reservation station. | 12-30-2010 |
20110016292 | OUT-OF-ORDER EXECUTION IN-ORDER RETIRE MICROPROCESSOR WITH BRANCH INFORMATION TABLE TO ENJOY REDUCED REORDER BUFFER SIZE - An out-of-order execution in-order retire microprocessor includes a branch information table comprising N entries. Each of the N entries stores information associated with a branch instruction. The microprocessor also includes a reorder buffer comprising M entries. Each of the M entries stores information associated with an unretired instruction within the microprocessor. Each of the M entries includes a field that indicates whether the unretired instruction is a branch instruction and, if so, a tag identifying one of the N entries in the branch information table storing information associated with the branch instruction. N is significantly less than M such that the overall die space and power consumption is reduced over a processor in which each reorder buffer entry stores the branch information. | 01-20-2011 |
20110022823 | INFORMATION PROCESSING SYSTEM AND INFORMATION PROCESSING METHOD THEREOF - An information processing system includes an execution unit and a decoder. The execution unit includes a plurality of arithmetic units each having a first operation circuit that performs a first operation on a first input value and a second input value, a second operation circuit that performs a second operation on the first input value and the second input value, and a selector that selects and outputs either a first output value output from the first operation circuit or a second output value output from the second operation circuit based on a selection signal. The decoder decodes an operation instruction and determines each value of the selection signal of each arithmetic unit. The decoder determines the value of the selection signal corresponding to the operation instruction with respect to each program. | 01-27-2011 |
20110029759 | METHOD AND APPARATUS FOR SHUFFLING DATA - Method, apparatus, and program means for shuffling data. The method of one embodiment comprises receiving a first operand having a set of L data elements and a second operand having a set of L control elements. For each control element, data from a first operand data element designated by the individual control element is shuffled to an associated resultant data element position if its flush to zero field is not set and a zero is placed into the associated resultant data element position if its flush to zero field is not set. | 02-03-2011 |
20110035572 | Computing device, information processing apparatus, and method of controlling computing device - Multiple data processing instructions instruct a computing device to process multiple data including first data and second data. When a multiple data processing instruction is decoded, two allocatable registers are selected. One is used to store the result of a processing operation performed on first data by one processing unit, and the other is used to store the result of a processing operation performed on second data by another processing unit. Those stored processing results are then transferred to result registers. Normal data processing instructions, on the other hand, instruct a processing operation on third data. When a normal data processing instruction is decoded, one allocatable register is selected and used to store the result of processing that a processing unit performs on the third data. The stored processing result is then transferred to a result register. | 02-10-2011 |
20110040953 | MICROPROCESSOR WITH MICROTRANSLATOR AND TAIL MICROCODE INSTRUCTION FOR FAST EXECUTION OF COMPLEX MACROINSTRUCTIONS HAVING BOTH MEMORY AND REGISTER FORMS - A microprocessor includes a first instruction translator that translates an instruction of an instruction set architecture of a microprocessor. The instruction may specify a first form that writes its result to a destination register or a second form that writes its result to memory. The first instruction translator generates, in response to encountering an instance of the instruction, an indication of whether the instance is of the first form or the second form. A microcode memory stores a tail instruction as part of a microcode routine invoked in response to encountering the instance of the instruction. A second instruction translator receives the tail instruction from the microcode memory and the indication and responsively generates a first micro-operation that writes the result to the destination register if the indication specifies the first form or a second micro-operation that completes a write of the result to memory if the indication specifies the second form. | 02-17-2011 |
20110040954 | DATA PROCESSING SYSTEM - The present invention provides a data processor or a data processing system which can be used in compatible modes among which the number of bits of an address specifying a logical address space varies at the time of referring to a branch address table by extension of displacement of a branch instruction. At the time of generating a branch address of a first branch instruction, the data processor or the data processing system optimizes a multiple with which a displacement is multiplied in accordance with the number of bits of an address specifying a logical address space, adds extended address information to the value of a register, and refers to a branch address table with address information obtained by the addition. The referred information is used as a branch address. To be adapted to a compatible mode using different number of bits of an address specifying a logical address space, it is sufficient to change a multiple with which the displacement is multiplied in accordance with the mode. | 02-17-2011 |
20110047355 | Offset Based Register Address Indexing - A circuit arrangement and method support offset based register address indexing, wherein register addresses to be used by an instruction are calculated using offsets to the full target register address, and the offsets are contained in the instruction and occupy less instruction space than the full address widths. An instruction may include at least one offset value that identifies a register address. During decoding of the instruction, an offset and a full target address are retrieved from the instruction, and then a register address is calculated by addition of the offset to the full target address. | 02-24-2011 |
20110078414 | MULTIPORTED REGISTER FILE FOR MULTITHREADED PROCESSORS AND PROCESSORS EMPLOYING REGISTER WINDOWS - A processor includes an instruction fetch unit configured to issue instructions for execution, where the instructions are selected from a number of threads, where each given instruction has a corresponding thread identifier, and where at least some of the instructions specify operand(s) via register identifiers. A register file stores operands usable by the instructions, and may include several banks, each corresponding to a register identifiers and including several entries corresponding to the several threads, wherein the entries are configured to store data values. In response to receiving a request to read a particular register identifier for a given thread identifier, the register file may be configured to decode the given thread identifier to retrieve entries from the banks that correspond to the given thread identifier. The register file may further select, from among the retrieved entries, a data value corresponding to the particular register identifier to be output. | 03-31-2011 |
20110078415 | Efficient Predicated Execution For Parallel Processors - The invention set forth herein describes a mechanism for predicated execution of instructions within a parallel processor executing multiple threads or data lanes. Each thread or data lane executing within the parallel processor is associated with a predicate register that stores a set of 1-bit predicates. Each of these predicates can be set using different types of predicate-setting instructions, where each predicate setting instruction specifies one or more source operands, at least one operation to be performed on the source operands, and one or more destination predicates for storing the result of the operation. An instruction can be guarded by a predicate that may influence whether the instruction is executed for a particular thread or data lane or how the instruction is executed for a particular thread or data lane. | 03-31-2011 |
20110093683 | Program flow control - A data processing apparatus includes a data engine | 04-21-2011 |
20110125987 | Dedicated Arithmetic Decoding Instruction - A dedicated arithmetic decoding instruction is disclosed. In a particular embodiment, an apparatus includes a memory and a processor coupled to the memory. The processor is configured to execute general purpose instructions and to execute a dedicated arithmetic decoding instruction retrieved from the memory. | 05-26-2011 |
20110145549 | PIPELINED DECODING APPARATUS AND METHOD BASED ON PARALLEL PROCESSING - An apparatus and method for decoding moving images based on parallel processing are provided. The apparatus for decoding images based on parallel processing can improve operational performance by pipelining massive-data transmission between processors while performing context-adaptive variable length decoding (CAVLD), inverse quantization (IQ), inverse transformation (IT), motion compensation (MC), intra prediction (IP) and deblocking filter (DF) operations in parallel in units of pluralities of macroblocks (MBs). | 06-16-2011 |
20110153989 | SYNCHRONIZING SIMD VECTORS - A vector compare-and-exchange operation is performed by: decoding by a decoder in a processing device, a single instruction specifying a vector compare-and-exchange operation for a plurality of data elements between a first storage location, a second storage location, and a third storage location; issuing the single instruction for execution by an execution unit in the processing device; and responsive to the execution of the single instruction, comparing data elements from the first storage location to corresponding data elements in the second storage location; and responsive to determining a match exists, replacing the data elements from the first storage location with corresponding data elements from the third storage location. | 06-23-2011 |
20110153990 | SYSTEM, APPARATUS, AND METHOD FOR SUPPORTING CONDITION CODES - An apparatus is described having decode circuitry to decode a first instruction, wherein the first instruction indicates that a copy of a plurality of condition codes bits is to be copied from a first register to a second register. The apparatus also has first execution circuitry to copy a plurality of condition code bits from a first register to a second register. | 06-23-2011 |
20110161632 | COMPILER ASSISTED LOW POWER AND HIGH PERFORMANCE LOAD HANDLING - A method and apparatus for handling low power and high performance loads is herein described. Software, such as a compiler, is utilized to identify producer loads, consumer reuse loads, and consumer forwarded loads. Based on the identification by software, hardware is able to direct performance of the load directly to a load value buffer, a store buffer, or a data cache. As a result, accesses to cache are reduced, through direct loading from load and store buffers, without sacrificing load performance. | 06-30-2011 |
20110161633 | Systems and Methods for Monitoring Out of Order Data Decoding - Various embodiments of the present invention provide systems and methods for monitoring out of order data decoding. For example, a method for monitoring out of order data processing is provided that includes receiving a plurality of data sets that is associated with a plurality of identifiers with each of the plurality of identifiers indicates a respective one of the plurality of data sets; storing each of the plurality of identifiers in a FIFO memory in an order that the corresponding data sets of the plurality of data sets was received; processing the plurality of data sets such that at least one of the plurality of data sets is provided as an output data set; accessing the next available identifier from the FIFO memory; and asserting an out of order signal when the next available identifier is not the same as the identifier associated with the output data set. | 06-30-2011 |
20110161634 | Processor, co-processor, information processing system, and method for controlling processor, co-processor, and information processing system - A processor includes a buffer that separates a sequence of instructions having no operand into segments and stores the segments, a data holder that holds data to be processed, a decoder that references the data and sequentially decodes at least one of the instructions from the top of the sequence, an instruction execution unit that executes the instruction, and an instruction sequence control unit that controls updating of the instruction sequence in accordance with the decoding result. When the decoded top instruction is a branch instruction and if a branch is taken, the instruction sequence control unit updates the sequence so that the top instruction of one of the segments is located at the top of the sequence. If a branch is not taken, the instruction sequence control unit updates the sequence so that an instruction immediately next to the branch instruction is located at the top of the sequence. | 06-30-2011 |
20110185159 | PROCESSOR INCLUDING AGE TRACKING OF ISSUE QUEUE INSTRUCTIONS - An information handling system includes a processor with an instruction issue queue (IQ) that may perform age tracking operations. The issue queue IQ maintains or stores instructions that may issue out-of-order in an internal data store IDS. The IDS organizes instructions in a queue position (QPOS) addressing arrangement. An age matrix of the IQ maintains a record of relative instruction aging for those instructions within the IDS. The age matrix updates latches or other memory cell data to reflect the changes in IDS instruction ages during a dispatch operation into the IQ. During dispatch of one or more instructions, the age matrix may update only those latches that require data change to reflect changing IDS instruction ages. The age matrix employs row and column data and clock controls to individually update those latches requiring update. The issue queue may selectively clock a row and a column of cells of the age matrix that correspond to a dispatched instruction's queue position while leaving other cells unclocked to conserve power. | 07-28-2011 |
20110219212 | System and Method of Processing Hierarchical Very Long Instruction Packets - A system and method of processing a hierarchical very long instruction word (VLIW) packet is disclosed. In a particular embodiment, a method of processing instructions is disclosed. The method includes receiving a hierarchical VLIW packet of instructions and decoding an instruction from the packet to determine whether the instruction is a single instruction or whether the instruction includes a subpacket that includes a plurality of sub-instructions. The method also includes, in response to determining that the instruction includes the subpacket, executing each of the sub-instructions. | 09-08-2011 |
20110219213 | INSTRUCTION CRACKING BASED ON MACHINE STATE - A method, information processing system, and computer program product manage instruction execution based on machine state. At least one instruction is received. The at least one instruction is decoded. A current machine state is determined in response to the decoding. The at least one instruction is organized into a set of unit of operations based on the current machine state that has been determined. The set of unit of operations is executed. | 09-08-2011 |
20110225397 | Mapping between registers used by multiple instruction sets - A processor | 09-15-2011 |
20110231633 | Operand size control - A data processing system | 09-22-2011 |
20110264895 | METHOD AND APPARATUS FOR PERFORMING MULTIPLY-ADD OPERATIONS ON PACKED DATA - A method and apparatus for including in a processor instructions for performing multiply-add operations on packed data. In one embodiment, a processor is coupled to a memory. The memory has stored therein a first packed data and a second packed data. The processor performs operations on data elements in said first packed data and said second packed data to generate a third packed data in response to receiving an instruction. At least two of the data elements in this third packed data storing the result of performing multiply-add operations on data elements in the first and second packed data. | 10-27-2011 |
20110271082 | PERFORMING ACTIONS ON FRAME ENTRIES IN RESPONSE TO RECEIVING BULK INSTRUCTION - Various example embodiments are disclosed. According to an example embodiment, a switch may comprise an instruction decode stage and a lookup stage. The instruction decode stage may be configured to receive a bulk instruction identifying an action to perform on frame entries of the lookup stage, and in response to receiving the bulk instruction, send, to the lookup stage, at least first and second frame entry instructions, each of the first and second frame entry instructions identifying the action and identifying a unique frame entry in the lookup stage upon which to perform the action. The lookup stage may be configured to receive the first and second frame entry instructions from the instruction decode stage, and in response to receiving each of the first and second frame entry instructions, perform the identified action on the frame entry identified by the respective frame entry instruction. | 11-03-2011 |
20110271083 | MICROPROCESSOR ARCHITECTURE AND METHOD OF INSTRUCTION DECODING - A microprocessor architecture comprises an instruction decoding network for decoding in a first mode partially suppressed opcodes of a sequence of instructions, the opcodes comprising a first part containing parameters being invariant for each opcode of the sequence and a second part comprising a flag indicating an end of the sequence, the first part being suppressed for all opcodes of the sequence except a first opcode of the sequence. Further, a method of instruction decoding in a microprocessor architecture comprising an instruction decoding network for decoding in a first mode partially suppressed opcodes of a sequence of instructions, and in a second mode uncompressed instructions comprises decoding an opcode of an instruction in the second mode when the instruction is not compressible; and decoding an opcode of an instruction in the first mode when the instruction is compressible. | 11-03-2011 |
20110283090 | Instruction Addressing Using Register Address Sequence Detection - A circuit arrangement and method support efficient indexing into large register files by utilizing register address sequence detection, wherein register addresses to be used by an instruction are produced by concatenating a portion of the address that is contained in the instruction with another portion that is speculatively produced by sequence detection logic. The portion of the correct full address that is not contained in the instruction is stored in a software accessible special purpose register. If the end of a particular sequence of addresses is detected by the sequence detection logic, the invention speculatively assumes that the next address in the sequence will be used. Since only a portion of the full addresses are stored in the instruction, they occupy less instruction space than the full address widths. An instruction may include at least one address portion that identifies a register address. | 11-17-2011 |
20110307687 | IN-LANE VECTOR SHUFFLE INSTRUCTIONS - In-lane vector shuffle operations are described. In one embodiment a shuffle instruction specifies a field of per-lane control bits, a source operand and a destination operand, these operands having corresponding lanes, each lane divided into corresponding portions of multiple data elements. Sets of data elements are selected from corresponding portions of every lane of the source operand according to per-lane control bits. Elements of these sets are copied to specified fields in corresponding portions of every lane of the destination operand. Another embodiment of the shuffle instruction also specifies a second source operand, all operands having corresponding lanes divided into multiple data elements. A set selected according to per-lane control bits contains data elements from every lane portion of a first source operand and data elements from every corresponding lane portion of the second source operand. Set elements are copied to specified fields in every lane of the destination operand. | 12-15-2011 |
20120036337 | Processor on an Electronic Microchip Comprising a Hardware Real-Time Monitor - A processor on an electronic microchip is capable of executing mathematical processes, each of said processes being associated with a priority, and includes means for the management of the processes, the means for the management of the processes taking the form of hardware, the management of the processes comprising the activation and the suspension of the processes and the management of the execution of the processes according to their priorities. | 02-09-2012 |
20120047351 | DATA PROCESSING SYSTEM HAVING SELECTIVE REDUNDANCY AND METHOD THEREFOR - A method includes: decoding an instruction a first time to obtain a first decoded instruction; decoding the instruction a second time to obtain a second decoded instruction; comparing at least a portion of the first decoded instruction to at least a portion of the second decoded instruction; and when the at least a portion of the first decoded instruction matches the at least a portion of the second decoded instruction, executing the instruction. | 02-23-2012 |
20120079244 | METHOD AND APPARATUS FOR UNIVERSAL LOGICAL OPERATIONS - An apparatus and method are described for performing arbitrary logical operations specified by a table. For example, one embodiment of a method for performing a logical operation on a computer processor comprises: reading data from each of two or more source operands; combining the data read from the source operands to generate an index value, the index value identifying a subset of bits within an immediate value transmitted with an instruction; reading the bits from the immediate value; and storing the bits read from the immediate value within a destination register to generate a result of the instruction. | 03-29-2012 |
20120079245 | DYNAMIC OPTIMIZATION FOR CONDITIONAL COMMIT - An apparatus and method is described herein for conditionally committing and/or speculative checkpointing transactions, which potentially results in dynamic resizing of transactions. During dynamic optimization of binary code, transactions are inserted to provide memory ordering safeguards, which enables a dynamic optimizer to more aggressively optimize code. And the conditional commit enables efficient execution of the dynamic optimization code, while attempting to prevent transactions from running out of hardware resources. While the speculative checkpoints enable quick and efficient recovery upon abort of a transaction. Processor hardware is adapted to support dynamic resizing of the transactions, such as including decoders that recognize a conditional commit instruction, a speculative checkpoint instruction, or both. And processor hardware is further adapted to perform operations to support conditional commit or speculative checkpointing in response to decoding such instructions. | 03-29-2012 |
20120079246 | APPARATUS, METHOD, AND SYSTEM FOR PROVIDING A DECISION MECHANISM FOR CONDITIONAL COMMITS IN AN ATOMIC REGION - An apparatus and method is described herein for conditionally committing /andor speculative checkpointing transactions, which potentially results in dynamic resizing of transactions. During dynamic optimization of binary code, transactions are inserted to provide memory ordering safeguards, which enables a dynamic optimizer to more aggressively optimize code. And the conditional commit enables efficient execution of the dynamic optimization code, while attempting to prevent transactions from running out of hardware resources. While the speculative checkpoints enable quick and efficient recovery upon abort of a transaction. Processor hardware is adapted to support dynamic resizing of the transactions, such as including decoders that recognize a conditional commit instruction, a speculative checkpoint instruction, or both. And processor hardware is further adapted to perform operations to support conditional commit or speculative checkpointing in response to decoding such instructions. | 03-29-2012 |
20120084535 | Opcode Space Minimizing Architecture Utilizing Instruction Address to Indicate Upper Address Bits - Due to the ever expanding number of registers and new instructions in modern microprocessor cores, the address widths present in the instruction encoding continue to widen, and fewer instruction opcodes are available, making it more difficult to add new instructions to existing architectures without resorting to inelegant tricks that have drawbacks such as source destructive operations. The disclosed invention utilizes specialized decode and address calculation hardware that concatenates a fixed number of least significant bits of the instruction address onto the upper address bits of each register address portion contained in the instruction, yielding the full register address, instead of providing the full register address widths for every register used in the instruction. This frees up valuable opcode space for other instructions and avoids compiler complexity. This aligns nicely with how most loops are unrolled in assembly language, where independent operations are near each other in memory. | 04-05-2012 |
20120084536 | PRIMITIVES TO ENHANCE THREAD-LEVEL SPECULATION - A processor may include an address monitor table and an atomic update table to support speculative threading. The processor may also include one or more registers to maintain state associated with execution of speculative threads. The processor may support one or more of the following primitives: an instruction to write to a register of the state, an instruction to trigger the committing of buffered memory updates, an instruction to read the a status register of the state, and/or an instruction to clear one of the state bits associated with trap/exception/interrupt handling. Other embodiments are also described and claimed. | 04-05-2012 |
20120089817 | Conditional selection of data elements - A data processing apparatus, method and computer program that perform an operation on one data element such as a register and then conditionally select either that register or a further register on which no operation has been performed. The apparatus comprises an instruction decoder configured to decode at least one conditional select instruction, said at least one conditional select instruction specifying a primary source register, a secondary source register, a destination register, a condition, and an operation to be performed on a data element from the secondary source register; a data processor configured to perform data processing operations controlled by the instruction decoder wherein: the data processor is responsive to the decoded at least one conditional select instruction and the condition having a predetermined outcome to perform the operation on the data element from the secondary source register to form a resultant data element and to store the resultant data element in the destination register; and the data processor is responsive to the decoded at least one conditional select instruction and the condition not having the predetermined outcome to form the resultant data element from the data element from the primary register and to store the resultant data element in the destination register. | 04-12-2012 |
20120096242 | Method and Apparatus for Performing Control of Flow in a Graphics Processor Architecture - Methods and systems for performing control of flow in a graphics processor architecture are provided. For example, in at least one embodiment, a computing system includes a memory storing a plurality of instructions and a graphics processing unit. The graphics processing unit is configured to process the instructions according to a multi-stage scalar pipeline and store condition code values in the branch control stack. The graphics processing unit is further configured to process branch instructions using condition code values stored in the condition register at the top of the branch control stack. | 04-19-2012 |
20120096243 | MULTITHREADED PROCESSOR WITH MULTIPLE CONCURRENT PIPELINES PER THREAD - A multithreaded processor comprises a plurality of hardware thread units, an instruction decoder coupled to the thread units for decoding instructions received therefrom, and a plurality of execution units for executing the decoded instructions. The multithreaded processor is configured for controlling an instruction issuance sequence for threads associated with respective ones of the hardware thread units. On a given processor clock cycle, only a designated one of the threads is permitted to issue one or more instructions, but the designated thread that is permitted to issue instructions varies over a plurality of clock cycles in accordance with the instruction issuance sequence. The instructions are pipelined in a manner which permits at least a given one of the threads to support multiple concurrent instruction pipelines. | 04-19-2012 |
20120117359 | NO-DELAY MICROSEQUENCER - An apparatus generally including a memory and a circuit is disclosed. The memory may be configured to store a plurality of instructions. Each of the instructions generally includes a corresponding command and a corresponding command repeat count. At least one of the instructions may include a subprocedure call. The circuit may be configured to (i) decode the instructions one at a time and (ii) present a sequence of the commands at an interface. The sequence (i) may be based on the decoding and (ii) may have no delays between consecutive the commands at the interface. | 05-10-2012 |
20120124337 | Size mis-match hazard detection - An out-of-order processor | 05-17-2012 |
20120131312 | Data processing apparatus and method - A data processing apparatus | 05-24-2012 |
20120144165 | SIDEBAND PAYLOADS IN PSEUDO NO-OPERATION INSTRUCTIONS - A pseudo no-op instruction in an instruction stream is detected, and the pseudo no-op instruction is decoded as being an opcode, wherein a parameter of the pseudo no-op instruction uniquely identifies the opcode. The method makes use of a pseudo no-op instruction and provides the pseudo no-op instruction with additional semantics outside of the instruction stream execution. New or enhanced functionality can be implemented in application software in a fashion that fully preserves backward compatibility to software and processors that do not support the new or enhanced functionality. If these functionalities are not supported, then the legacy software or processor will merely see and execute the pseudo no-op instruction, which will effectively do nothing at all. | 06-07-2012 |
20120144166 | CONTROL SIGNAL MEMOIZATION IN A MULTIPLE INSTRUCTION ISSUE MICROPROCESSOR - A dynamic predictive and/or exact caching mechanism is provided in various stages of a microprocessor pipeline so that various control signals can be stored and memorized in the course of program execution. Exact control signal vector caching may be done. Whenever an issue group is formed following instruction decode, register renaming, and dependency checking, an encoded copy of the issue group information can be cached under the tag of the leading instruction. The resulting dependency cache or control vector cache can be accessed right at the beginning of the instruction issue logic stage of the microprocessor pipeline the next time the corresponding group of instructions come up for re-execution. Since the encoded issue group bit pattern may be accessed in a single cycle out of the cache, the resulting microprocessor pipeline with this embodiment can be seen as two parallel pipes, where the shorter pipe is followed if there is a dependency cache or control vector cache hit. | 06-07-2012 |
20120159127 | SECURITY SANDBOX - Different instruction sets are provided for different units of execution such as threads, processes, and execution contexts. Execution units may be associated with instruction sets. The instruction sets may have mutually exclusive opcodes, meaning an opcode in one instruction set is not included in any other instruction set. When executing a given execution unit, the processor only allows execution of instructions in the instruction set that corresponds to the current execution unit. A failure occurs if the execution unit attempts to directly execute an instruction in another instruction set. | 06-21-2012 |
20120173851 | MECHANISM FOR MAINTAINING DYNAMIC REGISTER-LEVEL MEMORY-MODE FLAGS IN A VIRTUAL MACHINE SYSTEM - A method for maintaining dynamic register-level memory-mode flags in a virtual machine includes parsing a machine instruction of a live memory analysis command in a virtual machine (VM). The machine instruction can include an instruction opcode, a source address referring to a first type of memory and a destination address referring to a second type of memory. A register bitmap can be stored as a register-level memory-mode flag array. Thereafter, it can be determined whether or not the instruction opcode maps to an inheritance class. Finally, in response to a bit in the register-level memory mode flag array referencing virtual memory and the instruction opcode being mapped to an inheritance class, the register bitmap can be replaced with new bit values that represent redefined memory types for each register represented in the register bitmap. Subsequently, the new register bitmap can be used in simulation of a next machine instruction of a live memory analysis command executing in the virtual machine. | 07-05-2012 |
20120191949 | PREDICTING A RESULT OF A DEPENDENCY-CHECKING INSTRUCTION WHEN PROCESSING VECTOR INSTRUCTIONS - The described embodiments include a processor that executes a vector instruction. In the described embodiments, while dispatching instructions at runtime, the processor encounters a dependency-checking instruction. Upon determining that a result of the dependency-checking instruction is predictable, the processor dispatches a prediction micro-operation associated with the dependency-checking instruction, wherein the prediction micro-operation generates a predicted result vector for the dependency-checking instruction. The processor then executes the prediction micro-operation to generate the predicted result vector. In the described embodiments, when executing the prediction micro-operation to generate the predicted result vector, if a predicate vector is received, for each element of the predicted result vector for which the predicate vector is active, otherwise, for each element of the predicted result vector, the processor sets the element to zero. | 07-26-2012 |
20120198210 | Microprocessor Having Novel Operations - A processor. The processor includes a first register for storing a first packed data, a decoder, and a functional unit. The decoder has a control signal input. The control signal input is for receiving a first control signal and a second control signal. The first control signal is for indicating a pack operation. The second control signal is for indicating an unpack operation. The functional unit is coupled to the decoder and the register. The functional unit is for performing the pack operation and the unpack operation using the first packed data. The processor also supports a move operation. | 08-02-2012 |
20120204006 | Embedded opcode within an intermediate value passed between instructions - A data processing system | 08-09-2012 |
20120204007 | Controlling the execution of adjacent instructions that are dependent upon a same data condition - A data processing apparatus is disclosed, having: an instruction decoder configured to decode a stream of instructions, a data processor configured to process the decoded stream of instructions; wherein in response to a plurality of adjacent instructions within the stream of instructions execution of which is dependent upon a data condition being met and whose execution when said data condition is not met does not change a state of said processing apparatus, the processor is configured to: commence determining whether the data condition is met or not; and commence processing said plurality of adjacent instructions; and in response to determining that said data condition is not met; skip to a next instruction to be executed after said plurality of adjacent instructions without executing any intermediate ones of said plurality of adjacent instructions not yet executed and continue execution at the next instruction. | 08-09-2012 |
20120204008 | Processor with a Hybrid Instruction Queue with Instruction Elaboration Between Sections - Methods and apparatus for processing instructions by elaboration of instructions prior to issuing the instructions for execution are described. An instruction is received at a hybrid instruction queue comprised of a first queue and a second queue. When the second queue has available space, the instruction is elaborated to expand one or more bit fields to reduce decoding complexity when the elaborated instruction is issued, wherein the elaborated instruction is stored in the second queue. When the second queue does not have available space, the instruction is stored in an unelaborated form in a first queue. The first queue is configured as an exemplary in-order queue and the second queue is configured as an exemplary out-of-order queue. | 08-09-2012 |
20120221835 | MICROPROCESSOR SYSTEMS AND METHODS FOR LATENCY TOLERANCE EXECUTION - An instruction unit provides instructions for execution by a processor. A decode unit decodes instructions received from the instruction unit. Queues are coupled to receive instructions from the decode unit. Each instruction in a same queue is executed in order by a corresponding execution unit. An arbiter is coupled to each queue and to the execution unit that executes instructions of a first instruction type. The arbiter selects a next instruction of the first instruction type from a bottom entry of the queue for execution by the first execution unit. | 08-30-2012 |
20120233443 | PROCESSOR TO EXECUTE SHIFT RIGHT MERGE INSTRUCTIONS - Method, apparatus, and program means for performing bitstream buffer manipulation with a SIMD merge instruction. The method of one embodiment comprises determining whether any unprocessed data bits for a partial variable length symbol exist in a first data block is made. A shift merge operation is performed to merge the unprocessed data bits from the first data block with a second data block. A merged data block is formed. A merged variable length symbol comprised of the unprocessed data bits and a plurality of data bits from the second data block is extracted from the merged data block. | 09-13-2012 |
20120254595 | PROCESSOR, INFORMATION PROCESSING APPARATUS AND CONTROL METHOD THEREOF - Instructions decoded by the instruction decoding part are issued to an arithmetic and logic part, a consumption current value that has been consumed by the arithmetic and logic part with instructions issued during a first predetermined period and a consumption current estimated value for a current that is consumed by the arithmetic and logic part with instructions issuable during a second predetermined period of the decoded instructions are calculated, and issuance of some instructions of the decoded instructions is inhibited in the second predetermined period in a case where a change amount of the consumption current estimated value with respect to the consumption current value exceeds a predetermined limit value. | 10-04-2012 |
20120260069 | Processor Including Age Tracking of Issue Queue Instructions - An information handling system includes a processor with an instruction issue queue (IQ) that may perform age tracking operations. The issue queue IQ maintains or stores instructions that may issue out-of-order in an internal data store (IDS). The IDS organizes instructions in a queue position (QPOS) addressing arrangement. An age matrix of the IQ maintains a record of relative instruction aging for those instructions within the IDS. The age matrix updates latches or other memory cell data to reflect the changes in IDS instruction ages during a dispatch operation into the IQ. During dispatch of one or more instructions, the age matrix may update only those latches that require data change to reflect changing IDS instruction ages. The age matrix employs row and column data and clock controls to individually update those latches requiring update. | 10-11-2012 |
20120265966 | PROCESSOR WITH INCREASED EFFICIENCY VIA EARLY INSTRUCTION COMPLETION - Methods and apparatuses are provided for increased efficiency in a processor via early instruction completion. An apparatus is provided for increased efficiency in a processor via early instruction completion. The apparatus comprises an execution unit for processing instructions and determining whether a later issued instruction is ready for completion or an earlier issued instruction is ready for completion and a retire unit for retiring the later issued instruction when the later instruction is ready for completion or to retire the earlier instruction when later instruction is not ready for completion and the earlier issued instruction has a known good completion status. A method is provided for increased efficiency in a processor via early instruction completion. The method comprises completing an earlier issued instruction having a known good completion status ahead of a later issued instruction when the later issued instruction is not ready for completion. | 10-18-2012 |
20120278591 | CROSSBAR SWITCH MODULE HAVING DATA MOVEMENT INSTRUCTION PROCESSOR MODULE AND METHODS FOR IMPLEMENTING THE SAME - A microprocessor is provided that has a datapath that is split into upper and lower portions. The microprocessor includes a centralized crossbar switch module having a single data movement module. The data movement module is capable of processing instructions that require operands to be exchanged between upper and lower 64-bit halves of the split architecture. The data movement module can access and process all instructions that require simultaneous access to the entire register contents of the upper and lower portions. The data movement module is configured to execute any one of a number of different instructions to perform data manipulation with respect to one or more “split-operands” (also referred to simply as “operands” herein). The data movement module can exchange data (bytes and/or bits) of operands for the upper and lower 64-bit halves so that bytes and/or bits of operands can be moved or rearranged to other positions during execution of a particular instruction. The data movement module can allow for various types of operand data movement/manipulation that may be required to implement instruction processing that may be required per various instructions, such as permute, pack, shuffle, vectored conditional move, extract, shift, rotate instructions, any other instruction in which operand data is manipulated, shifted, moved, re-ordered, shuffled or scrambled. | 11-01-2012 |
20120278592 | MICROPROCESSOR SYSTEMS AND METHODS FOR REGISTER FILE CHECKPOINTING - In a processor, a decode unit identifies instructions needing a checkpoint and enables selected checkpoints, and a register file unit includes a plurality of architectural registers; a first set of checkpoint registers corresponding to a first checkpoint, wherein each checkpoint register of the first set corresponds to a corresponding architectural register of the plurality of architectural registers; a first set of indicators corresponding to the first set of checkpoint registers which, for each checkpoint register in the first set of checkpoint registers, indicates whether the corresponding architectural register has been modified or is intended to be modified prior to enabling of the first checkpoint; and a second set of indicators corresponding to the first set of checkpoint registers which, for each checkpoint register in the first set of checkpoint registers, indicates whether the corresponding architectural register has been modified or is intended to be modified after enabling of the first checkpoint. | 11-01-2012 |
20120303935 | MICROPROCESSOR SYSTEMS AND METHODS FOR HANDLING INSTRUCTIONS WITH MULTIPLE DEPENDENCIES - A processor includes an instruction unit which provides instructions for execution by the processor, a decode/issue unit which decodes instructions received from the instruction unit and issues the instructions, and a plurality of execution queues coupled to the decode/issue unit. Each issued instruction from the decode/issue unit is stored into an entry of at least one queue of the plurality of execution queues, wherein each entry of the plurality of execution queues is configured to store an issued instruction and a duplicate indicator corresponding to the issued instruction which indicates whether or not a duplicate instruction of the issued instruction is also stored in an entry of another queue of the plurality of execution queues. | 11-29-2012 |
20120303936 | DATA PROCESSING SYSTEM WITH LATENCY TOLERANCE EXECUTION - In a processor having an instruction unit, a decode/issue unit, and execution queues configured to provide instructions to correspondingly different types execution units, a method comprises maintaining a duplicate free list for the execution queues. The duplicate free list includes a plurality of duplicate dependent instruction indicators that indicate when a duplicate instruction for a dependent instruction is stored in at least one of the execution queues. One of the duplicate dependent instruction indicators is assigned to an execution queue for a dependent instruction. The dependent instruction is executed only when the one of the duplicate dependent instruction indicators is reset. | 11-29-2012 |
20120317400 | SYSTEM AND APPARATUS FOR GROUP FLOATING-POINT INFLATE AND DEFLATE OPERATIONS - Systems and apparatuses are presented relating a programmable processor comprising an execution unit that is operable to decode and execute instructions received from an instruction path and partition data stored in registers in the register file into multiple data elements, the execution unit capable of executing group data handling operations that re-arrange data elements in different ways in response to data handling instructions, the execution unit further capable of executing a plurality of different group floating-point and group integer arithmetic operations that each arithmetically operates on the multiple data elements stored in registers in the register file to produce a catenated result that is returned to a register in the register file, wherein the catenated result comprises a plurality of individual results. | 12-13-2012 |
20130007417 | PROCESSOR TO EXECUTE SHIFT RIGHT MERGE INSTRUCTIONS - Method, apparatus, and program means for performing bitstream buffer manipulation with a SIMD merge instruction. The method of one embodiment comprises determining whether any unprocessed data bits for a partial variable length symbol exist in a first data block is made. A shift merge operation is performed to merge the unprocessed data bits from the first data block with a second data block. A merged data block is formed. A merged variable length symbol comprised of the unprocessed data bits and a plurality of data bits from the second data block is extracted from the merged data block. | 01-03-2013 |
20130024662 | RELAXATION OF SYNCHRONIZATION FOR ITERATIVE CONVERGENT COMPUTATIONS - Systems and methods are disclosed that allow atomic updates to global data to be at least partially eliminated to reduce synchronization overhead in parallel computing. A compiler analyzes the data to be processed to selectively permit unsynchronized data transfer for at least one type of data. A programmer may provide a hint to expressly identify the type of data that are candidates for unsynchronized data transfer. In one embodiment, the synchronization overhead is reducible by generating an application program that selectively substitutes codes for unsynchronized data transfer for a subset of codes for synchronized data transfer. In another embodiment, the synchronization overhead is reducible by employing a combination of software and hardware by using relaxation data registers and decoders that collectively convert a subset of commands for synchronized data transfer into commands for unsynchronized data transfer. | 01-24-2013 |
20130024663 | Table Call Instruction for Frequently Called Functions - An apparatus includes a memory that stores an instruction including an opcode and an operand. The operand specifies an immediate value or a register indicator of a register storing the immediate value. The immediate value is usable to identify a function call address. The function call address is selectable from a plurality of function call addresses. | 01-24-2013 |
20130024664 | METHOD, APPARATUS AND INSTRUCTIONS FOR PARALLEL DATA CONVERSIONS - Method, apparatus, and program means for performing a conversion. In one embodiment, a disclosed apparatus includes a destination storage location corresponding to a first architectural register. A functional unit operates responsive to a control signal, to convert a first packed first format value selected from a set of packed first format values into a plurality of second format values. Each of the first format values has a plurality of sub elements having a first number of bits The second format values have a greater number of bits. The functional unit stores the plurality of second format values into an architectural register. | 01-24-2013 |
20130024665 | METHOD, APPARATUS AND INSTRUCTIONS FOR PARALLEL DATA CONVERSIONS - Method, apparatus, and program means for performing a conversion. In one embodiment, a disclosed apparatus includes a destination storage location corresponding to a first architectural register. A functional unit operates responsive to a control signal, to convert a first packed first format value selected from a set of packed first format values into a plurality of second format values. Each of the first format values has a plurality of sub elements having a first number of bits The second format values have a greater number of bits. The functional unit stores the plurality of second format values into an architectural register. | 01-24-2013 |
20130046956 | SYSTEMS AND METHODS FOR HANDLING INSTRUCTIONS OF IN-ORDER AND OUT-OF-ORDER EXECUTION QUEUES - Systems and methods are disclosed that can include a processor having an instruction unit, a decode/issue unit, a first execution queue configured to provide instructions of a first instruction type to a first execution unit, and a second execution queue configured to provide instructions of a second instruction type to a second execution unit. A first instruction (IMUL) of the second instruction type is received. The first instruction is decoded by the decode/issue unit to determine operands of the first instruction. The operands of the first instruction are determined to include a dependency on a second instruction (Id) of the first instruction type stored in a first entry of the first execution queue. The first instruction is stored in a first entry of the second execution queue. In response to determining that the operands of the first instruction include the dependency on the second instruction: a synchronization indicator corresponding to the first instruction in a second entry of the first execution queue is set immediately adjacent the first entry of the first execution queue, which indicates that the first instruction is stored in another execution queue. A synchronization pending indicator is set in the first entry of the second execution queue to indicate that the first instruction has a corresponding synchronization indicator stored in another execution queue. | 02-21-2013 |
20130046957 | SYSTEMS AND METHODS FOR HANDLING INSTRUCTIONS OF IN-ORDER AND OUT-OF-ORDER EXECUTION QUEUES - Processing systems and methods are disclosed that can include an instruction unit which provides instructions for execution by the processor; a decode/issue unit which decodes instructions received from the instruction unit and issues the instructions; and a plurality of execution queues coupled to the decode/issue unit, wherein each issued instruction from the decode/issue unit can be stored into an entry of at least one queue of the plurality of execution queues. The plurality of queues can comprise an independent execution queue, a dependent execution queue, and a plurality of execution units coupled to receive instructions for execution from the plurality of execution queues. The plurality of execution units can comprise a first execution unit, coupled to receive instructions from the dependent execution queue and the independent execution queue which have been selected for execution. When a multi-cycle instruction at a bottom entry of the dependent execution queue is selected for execution, it may not be removed from the dependent execution queue until a result is received from the first execution unit. When a multi-cycle instruction at a bottom entry of the independent execution queue is selected for execution, it can be removed from the independent execution queue without waiting to receive a result from the first execution unit. | 02-21-2013 |
20130046958 | Systems and Methods for Local Iteration Adjustment - Various embodiments of the present invention provide systems and methods for data processing. As an example, a data processing circuit is disclosed that includes: a data decoder circuit and a local iteration adjustment circuit. The data decoder circuit is operable to perform a number of local iterations on a decoder input to yield a data output. The local iteration adjustment circuit is operable to generate a limit on the number of local iterations performed by the data decoder circuit | 02-21-2013 |
20130046959 | METHOD AND APPARATUS FOR PERFORMING LOGICAL COMPARE OPERATION - A method and apparatus for including in a processor instructions for performing logical-comparison and branch support operations on packed or unpacked data. In one embodiment, instruction decode logic decodes instructions for an execution unit to operate on packed data elements including logical comparisons. A register file including 128-bit packed data registers stores packed single-precision floating point (SPFP) and packed integer data elements. The logical comparisons may include comparison of SPFP data elements and comparison of integer data elements and setting at least one bit to indicate the results. Based on these comparisons, branch support actions are taken. Such branch support actions may include setting the at least one bit, which in turn may be utilized by a branching unit in response to a branch instruction. Alternatively, the branch support actions may include branching to an indicated target code location. | 02-21-2013 |
20130046960 | METHOD AND APPARATUS FOR PERFORMING LOGICAL COMPARE OPERATION - A method and apparatus for including in a processor instructions for performing logical-comparison and branch support operations on packed or unpacked data. In one embodiment, instruction decode logic decodes instructions for an execution unit to operate on packed data elements including logical comparisons. A register file including 128-bit packed data registers stores packed single-precision floating point (SPFP) and packed integer data elements. The logical comparisons may include comparison of SPFP data elements and comparison of integer data elements and setting at least one bit to indicate the results. Based on these comparisons, branch support actions are taken. Such branch support actions may include setting the at least one bit, which in turn may be utilized by a branching unit in response to a branch instruction. Alternatively, the branch support actions may include branching to an indicated target code location. | 02-21-2013 |
20130054941 | CLOCK DATA RECOVERY CIRCUIT AND CLOCK DATA RECOVERY METHOD - A processor includes: an arithmetic unit configured to execute instructions; an instruction decode part configured to decode the instructions executed in the arithmetic unit and to output opcodes; and an interrupt register configured to receive interrupt signals, wherein the instruction decode part includes an instruction code map that stores the opcodes in correspondence to instructions and outputs the opcodes in accordance with the instructions inputted, and the instruction code map stores a plurality of sets of opcodes to be output as switch opcodes corresponding to additional instructions, the additional instructions are a part of the instructions, and switches the sets of the switch opcodes in accordance with the interrupt signal. | 02-28-2013 |
20130073835 | PRIMITIVES TO ENHANCE THREAD-LEVEL SPECULATION - A processor may include an address monitor table and an atomic update table to support speculative threading. The processor may also include one or more registers to maintain state associated with execution of speculative threads. The processor may support one or more of the following primitives: an instruction to write to a register of the state, an instruction to trigger the committing of buffered memory updates, an instruction to read the a status register of the state, and/or an instruction to clear one of the state bits associated with trap/exception/interrupt handling. Other embodiments are also described and claimed. | 03-21-2013 |
20130080741 | HARDWARE CONTROL OF INSTRUCTION OPERANDS IN A PROCESSOR - An apparatus generally having a first circuit, a second circuit and a third circuit is disclosed. The first circuit may have a counter and may be configured to adjust at least one control signal in response to a current value of the counter. The first circuit may be implemented only in hardware. The counter generally counts a number of loops in which a plurality of instructions are executed. The second circuit may be configured to set the counter to an initial value. The third circuit may be configured to execute the instructions using a plurality of data items as a plurality of operands such that at least two of the instructions use different ones of the operands. The data items may be routed to the third circuit in response to the control signal. The apparatus generally forms a processor. | 03-28-2013 |
20130080742 | METHOD, APPARATUS AND INSTRUCTIONS FOR PARALLEL DATA CONVERSIONS - Method, apparatus, and program means for performing a conversion. In one embodiment, a disclosed apparatus includes a destination storage location corresponding to a first architectural register. A functional unit operates responsive to a control signal, to convert a first packed first format value selected from a set of packed first format values into a plurality of second format values. Each of the first format values has a plurality of sub elements having a first number of bits The second format values have a greater number of bits. The functional unit stores the plurality of second format values into an architectural register. | 03-28-2013 |
20130097408 | CONDITIONAL COMPARE INSTRUCTION - An instruction decoder ( | 04-18-2013 |
20130111190 | OPERATIONAL CODE EXPANSION IN RESPONSE TO SUCCESSIVE TARGET ADDRESS DETECTION | 05-02-2013 |
20130117537 | Method and Apparatus for Unpacking Packed Data - An apparatus includes an instruction decoder, first and second source registers and a circuit coupled to the decoder to receive packed data from the source registers and to unpack the packed data responsive to an unpack instruction received by the decoder. A first packed data element and a third packed data element are received from the first source register. A second packed data element and a fourth packed data element are received from the second source register. The circuit copies the packed data elements into a destination register resulting with the second packed data element adjacent to the first packed data element, the third packed data element adjacent to the second packed data element, and the fourth packed data element adjacent to the third packed data element. | 05-09-2013 |
20130117538 | Method and Apparatus for Unpacking Packed Data - An apparatus includes an instruction decoder, first and second source registers and a circuit coupled to the decoder to receive packed data from the source registers and to unpack the packed data responsive to an unpack instruction received by the decoder. A first packed data element and a third packed data element are received from the first source register. A second packed data element and a fourth packed data element are received from the second source register. The circuit copies the packed data elements into a destination register resulting with the second packed data element adjacent to the first packed data element, the third packed data element adjacent to the second packed data element, and the fourth packed data element adjacent to the third packed data element. | 05-09-2013 |
20130124830 | Method and Apparatus for Unpacking Packed Data - An apparatus includes an instruction decoder, first and second source registers and a circuit coupled to the decoder to receive packed data from the source registers and to unpack the packed data responsive to an unpack instruction received by the decoder. A first packed data element and a third packed data element are received from the first source register. A second packed data element and a fourth packed data element are received from the second source register. The circuit copies the packed data elements into a destination register resulting with the second packed data element adjacent to the first packed data element, the third packed data element adjacent to the second packed data element, and the fourth packed data element adjacent to the third packed data element. | 05-16-2013 |
20130124831 | Method and Apparatus for Packing Packed Data - An apparatus includes an instruction decoder, first and second source registers and a circuit coupled to the decoder to receive packed data from the source registers and to unpack the packed data responsive to an unpack instruction received by the decoder. A first packed data element and a third packed data element are received from the first source register. A second packed data element and a fourth packed data element are received from the second source register. The circuit copies the packed data elements into a destination register resulting with the second packed data element adjacent to the first packed data element, the third packed data element adjacent to the second packed data element, and the fourth packed data element adjacent to the third packed data element. | 05-16-2013 |
20130124832 | Method and Apparatus for Unpacking Packed Data - An apparatus includes an instruction decoder, first and second source registers and a circuit coupled to the decoder to receive packed data from the source registers and to unpack the packed data responsive to an unpack instruction received by the decoder. A first packed data element and a third packed data element are received from the first source register. A second packed data element and a fourth packed data element are received from the second source register. The circuit copies the packed data elements into a destination register resulting with the second packed data element adjacent to the first packed data element, the third packed data element adjacent to the second packed data element, and the fourth packed data element adjacent to the third packed data element. | 05-16-2013 |
20130124833 | Method and Apparatus for Unpacking Packed Data - An apparatus includes an instruction decoder, first and second source registers and a circuit coupled to the decoder to receive packed data from the source registers and to unpack the packed data responsive to an unpack instruction received by the decoder. A first packed data element and a third packed data element are received from the first source register. A second packed data element and a fourth packed data element are received from the second source register. The circuit copies the packed data elements into a destination register resulting with the second packed data element adjacent to the first packed data element, the third packed data element adjacent to the second packed data element, and the fourth packed data element adjacent to the third packed data element. | 05-16-2013 |
20130124834 | Method and Apparatus for Unpacking Packed Data - An apparatus includes an instruction decoder, first and second source registers and a circuit coupled to the decoder to receive packed data from the source registers and to unpack the packed data responsive to an unpack instruction received by the decoder. A first packed data element and a third packed data element are received from the first source register. A second packed data element and a fourth packed data element are received from the second source register. The circuit copies the packed data elements into a destination register resulting with the second packed data element adjacent to the first packed data element, the third packed data element adjacent to the second packed data element, and the fourth packed data element adjacent to the third packed data element. | 05-16-2013 |
20130124835 | Method and Apparatus for Packing Packed Data - An apparatus includes an instruction decoder, first and second source registers and a circuit coupled to the decoder to receive packed data from the source registers and to unpack the packed data responsive to an unpack instruction received by the decoder. A first packed data element and a third packed data element are received from the first source register. A second packed data element and a fourth packed data element are received from the second source register. The circuit copies the packed data elements into a destination register resulting with the second packed data element adjacent to the first packed data element, the third packed data element adjacent to the second packed data element, and the fourth packed data element adjacent to the third packed data element. | 05-16-2013 |
20130145124 | SYSTEM AND METHOD FOR PERFORMING SHAPED MEMORY ACCESS OPERATIONS - One embodiment of the present invention sets forth a technique that provides an efficient way to retrieve operands from a register file. Specifically, the instruction dispatch unit receives one or more instructions, each of which includes one or more operands. Collectively, the operands are organized into one or more operand groups from which a shaped access may be formed. The operands are retrieved from the register file and stored in a collector. Once all operands are read and collected in the collector, the instruction dispatch unit transmits the instructions and corresponding operands to functional units within the streaming multiprocessor for execution. One advantage of the present invention is that multiple operands are retrieved from the register file in a single register access operation without resource conflict. Performance in retrieving operands from the register file is improved by forming shaped accesses that efficiently retrieve operands exhibiting recognized memory access patterns. | 06-06-2013 |
20130145125 | Bitstream Buffer Manipulation With A SIMD Merge Instruction - Method, apparatus, and program means for performing bitstream buffer manipulation with a SIMD merge instruction. The method of one embodiment comprises determining whether any unprocessed data bits for a partial variable length symbol exist in a first data block is made. A shift merge operation is performed to merge the unprocessed data bits from the first data block with a second data block. A merged data block is formed. A merged variable length symbol comprised of the unprocessed data bits and a plurality of data bits from the second data block is extracted from the merged data block. | 06-06-2013 |
20130151817 | METHOD, APPARATUS, AND COMPUTER PROGRAM PRODUCT FOR PARALLEL FUNCTIONAL UNITS IN MULTICORE PROCESSORS - Method, apparatus, and computer program product embodiments of the invention maximize the use of functional processing units in a multicore processor integrated circuit architecture. Example embodiments of the invention determine that instructions to be executed in a functional processor of a local processor core of a multicore processor, are capable of execution in a functional processor of a neighbor processor core of the multicore processor. A compute request is sent to the neighbor processor core to initiate execution of the instructions in the functional processor. A compute response is received from the neighbor processor core, if the functional processor has been able to execute the instructions. | 06-13-2013 |
20130151818 | MICRO ARCHITECTURE FOR INDIRECT ACCESS TO A REGISTER FILE IN A PROCESSOR - A method and system for improving performance and latency of instruction execution within an execution pipeline in a processor. The method includes finding, while decoding an instruction, a pointer register used by the instruction; reading the pointer register; validating a pointer register entry; reading, if the pointer register entry is valid, a register file entry; validating a register file entry; validating, if the register file entry is invalid, a valid register file entry wherein the valid register file entry is in the register file's future file; bypassing, if the valid register file entry is valid, a valid register file value from the register file's future file to the execution pipeline wherein the valid register file value is in the valid register file entry; and executing the instruction using the valid register file value; wherein at least one of the steps is carried out using a computer device. | 06-13-2013 |
20130159673 | PROVIDING CAPACITY GUARANTEES FOR HARDWARE TRANSACTIONAL MEMORY SYSTEMS USING FENCES - A method is provided that includes determining a number of outstanding out-of-order instructions in an instruction stream. The method includes determining a number of available hardware resources for executing out-of-order instructions and inserting fencing instructions into the instruction stream if the number of outstanding out-of-order instructions exceeds the determined number of available hardware resources. A second method is provided for compiling source code that includes determining a speculative region. The second method includes generating machine-level instructions and inserting fencing instructions into the machine-level instructions in response to determining the speculative region. A processing device is provided that includes cache memory and a processing unit to execute processing device instructions in an instruction stream. The processing device includes an out-of-order speculation supervisor unit to determine hardware resource availability and generate an indication to insert fencing instructions in response to the availability. Computer readable storage media are also provided. | 06-20-2013 |
20130159674 | INSTRUCTION PREDICATION USING INSTRUCTION FILTERING - A method and circuit arrangement for selectively predicating instructions in an instruction stream based upon a predication filter criteria defined by a predication filter, which describes types or patterns of instructions that should be predicated. Predication logic compares a respective instruction of an instruction stream to predication filter criteria to determine whether the respective instruction matches the predication filter criteria, and the respective instruction is selectively predicated based on whether the respective instruction matches the predication filter criteria. | 06-20-2013 |
20130159675 | INSTRUCTION PREDICATION USING UNUSED DATAPATH FACILITIES - A method and circuit arrangement for selectively predicating an instruction in an instruction stream based upon a value corresponding to a predication register address indicated by a portion of an operand associated with the instruction. A first compare instruction in an instruction stream stores a compare result in at a register address of a predication register. The register address of the predication register is stored in a portion of an operand associated with a second instruction, and during decoding the second instruction, the predication register is accessed to determine a value stored at the register address of the predication register, and the second instruction is selectively predicated based on the value stored at the register address of the predication register. | 06-20-2013 |
20130159676 | INSTRUCTION SET ARCHITECTURE WITH EXTENDED REGISTER ADDRESSING - A method and circuit arrangement selectively repurpose bits from a primary opcode portion of an instruction for use in decoding one or more operands for the instruction. Decode logic of a processor, for example, may be placed in a predetermined mode that decodes a primary opcode for an instruction that is different from that specified in the primary opcode portion of the instruction, and then utilize one or more bits in the primary opcode portion to decode one or more operands for the instruction. By doing so, additional space is freed up in the instruction to support a larger register file and/or additional instruction types, e.g., as specified by a secondary or extended opcode. | 06-20-2013 |
20130166883 | METHOD AND APPARATUS FOR PERFORMING LOGICAL COMPARE OPERATIONS - A method and apparatus for including in a processor instructions for performing logical-comparison and branch support operations on packed or unpacked data. In one embodiment, instruction decode logic decodes instructions for an execution unit to operate on packed data elements including logical comparisons. A register file including 128-bit packed data registers stores packed single-precision floating point (SPFP) and packed integer data elements. The logical comparisons may include comparison of SPFP data elements and comparison of integer data elements and setting at least one bit to indicate the results. Based on these comparisons, branch support actions are taken. Such branch support actions may include setting the at least one bit, which in turn may be utilized by a branching unit in response to a branch instruction. Alternatively, the branch support actions may include branching to an indicated target code location. | 06-27-2013 |
20130166884 | METHOD AND APPARATUS FOR PERFORMING LOGICAL COMPARE OPERATIONS - A method and apparatus for including in a processor instructions for performing logical-comparison and branch support operations on packed or unpacked data. In one embodiment, instruction decode logic decodes instructions for an execution unit to operate on packed data elements including logical comparisons. A register file including 128-bit packed data registers stores packed single-precision floating point (SPFP) and packed integer data elements. The logical comparisons may include comparison of SPFP data elements and comparison of integer data elements and setting at least one bit to indicate the results. Based on these comparisons, branch support actions are taken. Such branch support actions may include setting the at least one bit, which in turn may be utilized by a branching unit in response to a branch instruction. Alternatively, the branch support actions may include branching to an indicated target code location. | 06-27-2013 |
20130191615 | INSTRUCTIONS AND LOGIC TO PERFORM MASK LOAD AND STORE OPERATIONS - In one embodiment, logic is provided to receive and execute a mask move instruction to transfer a vector data element including a plurality of packed data elements from a source location to a destination location, subject to mask information for the instruction. Other embodiments are described and claimed. | 07-25-2013 |
20130198491 | MAJOR BRANCH INSTRUCTIONS WITH TRANSACTIONAL MEMORY - Major branch instructions are provided that enable execution of a computer program to branch from one segment of code to another segment of code. These instructions also create a new stream of processing at the other segment of code enabling execution of the other segment of code to be performed in parallel with the segment of code from which the branch was taken. In one example, the other stream of processing starts a transaction for processing instructions of the other stream of processing. | 08-01-2013 |
20130198492 | MAJOR BRANCH INSTRUCTIONS - Major branch instructions are provided that enable execution of a computer program to branch from one segment of code to another segment of code. These instructions also create a new stream of processing at the other segment of code enabling execution of the other segment of code to be performed in parallel with the segment of code from which the branch was taken. In one example, the other stream of processing starts a transaction for processing instructions of the other stream of processing. | 08-01-2013 |
20130205119 | INSTRUCTION AND LOGIC TO TEST TRANSACTIONAL EXECUTION STATUS - Novel instructions, logic, methods and apparatus are disclosed to test transactional execution status. Embodiments include decoding a first instruction to start a transactional region. Responsive to the first instruction, a checkpoint for a set of architecture state registers is generated and memory accesses from a processing element in the transactional region associated with the first instruction are tracked. A second instruction to detect transactional execution of the transactional region is then decoded. An operation is executed, responsive to decoding the second instruction, to determine if an execution context of the second instruction is within the transactional region. Then responsive to the second instruction, a first flag is updated. In some embodiments, a register may optionally be updated and/or a second flag may optionally be updated responsive to the second instruction. | 08-08-2013 |
20130212357 | Floating Point Constant Generation Instruction - Systems and methods for generating a floating point constant value from an instruction are disclosed. A first field of the instruction is decoded as a sign bit of the floating point constant value. A second field of the instruction is decoded to correspond to an exponent value of the floating point constant value. A third field of the instruction is decoded to correspond to the significand of the floating point constant value. The first field, the second field, and the third field are combined to form the floating point constant value. The exponent value may include a bias, and a bias constant may be added to the exponent value to compensate for the bias. The third field may comprise the most significant bits of the significand. Optionally, the second field and the third field may be shifted by first and second shift values respectively before they are combined to form the floating point constant value. | 08-15-2013 |
20130212358 | DATA PROCESSING SYSTEM WITH LATENCY TOLERANCE EXECUTION - A data processing system comprises a processor unit that includes an instruction decode/issue unit including a re-order buffer having entries that include an execution queue tag that indicates an execution queue location of an instruction to which a re-order buffer entry is assigned, a result valid indicator to indicate that a corresponding instruction has executed with a status bit valid result, and a forward indicator to indicate that the status bit can be forwarded to an execution queue of an instruction pointed to that is waiting to receive the status bit. | 08-15-2013 |
20130212359 | Processor to Execute Shift Right Merge Instructions - Method, apparatus, and program means for performing bitstream buffer manipulation with a SIMD merge instruction. The method of one embodiment comprises determining whether any unprocessed data bits for a partial variable length symbol exist in a first data block is made. A shift merge operation is performed to merge the unprocessed data bits from the first data block with a second data block. A merged data block is formed. A merged variable length symbol comprised of the unprocessed data bits and a plurality of data bits from the second data block is extracted from the merged data block. | 08-15-2013 |
20130219151 | PROCESSOR FOR PERFORMING MULTIPLY-ADD OPERATIONS ON PACKED DATA - A method and apparatus for including in a processor instructions for performing multiply-add operations on packed data. In one embodiment, a processor is coupled to a memory. The memory has stored therein a first packed data and a second packed data. The processor performs operations on data elements in said first packed data and said second packed data to generate a third packed data in response to receiving an instruction. At least two of the data elements in this third packed data storing the result of performing multiply-add operations on data elements in the first and second packed data. | 08-22-2013 |
20130232321 | Unpacking Packed Data In Multiple Lanes - Receiving an instruction indicating first and second operands. Each of the operands having packed data elements that correspond in respective positions. A first subset of the data elements of the first operand and a first subset of the data elements of the second operand each corresponding to a first lane. A second subset of the data elements of the first operand and a second subset of the data elements of the second operand each corresponding to a second lane. Storing result, in response to instruction, including: (1) in first lane, only lowest order data elements from first subset of first operand interleaved with corresponding lowest order data elements from first subset of second operand; and (2) in second lane, only highest order data elements from second subset of first operand interleaved with corresponding highest order data elements from second subset of second operand. | 09-05-2013 |
20130238879 | METHOD, APPARATUS AND INSTRUCTIONS FOR PARALLEL DATA CONVERSIONS - Method, apparatus, and program means for performing a conversion. In one embodiment, a disclosed apparatus includes a destination storage location corresponding to a first architectural register. A functional unit operates responsive to a control signal, to convert a first packed first format value selected from a set of packed first format values into a plurality of second format values. Each of the first format values has a plurality of sub elements having a first number of bits. The second format values have a greater number of bits. The functional unit stores the plurality of second format values into an architectural register. | 09-12-2013 |
20130262827 | APPARATUS AND METHOD USING HYBRID LENGTH INSTRUCTION WORD - A parallel processing computer architecture utilizes hybrid length instruction words that mix vectors and Very Long Instruction Word (VLIW) or other parallel processing instructions to enable data-parallel and task-parallel instructions to be run simultaneously with reduced redundancy and code size inefficiency. | 10-03-2013 |
20130262828 | PROCESSOR AND METHOD FOR DRIVING THE SAME - A low-power processor that does not easily malfunction is provided. Alternatively, a low-power processor having high processing speed is provided. Alternatively, a method for driving the processor is provided. In power gating, the processor performs part of data backup in parallel with arithmetic processing and performs part of data recovery in parallel with arithmetic processing. Such a driving method prevents a sharp increase in power consumption in a data backup period and a data recovery period and generation of instantaneous voltage drops and inhibits increases of the data backup period and the data recovery period. | 10-03-2013 |
20130275721 | RECONFIGURABLE INSTRUCTION ENCODING METHOD, EXECUTION METHOD, AND ELECTRONIC APPARATUS - Reconfigurable instruction encoding method, and execution method and electronic apparatus are provided. In the reconfigurable instruction encoding method, in an embodiment, instruction pairs of an application are encoded and re-encoded according to the number of times that the instruction pairs are utilized in the application to generate an instruction encoding table and an instruction mapping table. The reconfigurable instruction execution method includes: loading an instruction mapping table to a processing unit having an instruction mapping module, an instruction decoding module, and an execution module; converting a first instruction of an application to a target instruction by the instruction mapping module according to the instruction mapping table; and decoding the target instruction and executing the decoded target instruction by the decoding module and the execution module, respectively. | 10-17-2013 |
20130275722 | METHOD AND APPARATUS TO PROCESS KECCAK SECURE HASHING ALGORITHM - A processor includes a plurality of registers, an instruction decoder to receive an instruction to process a KECCAK state cube of data representing a KECCAK state of a KECCAK hash algorithm, to partition the KECCAK state cube into a plurality of subcubes, and to store the subcubes in the plurality of registers, respectively, and an execution unit coupled to the instruction decoder to perform the KECCAK hash algorithm on the plurality of subcubes respectively stored in the plurality of registers in a vector manner. | 10-17-2013 |
20130275723 | CONDITIONAL EXECUTION SUPPORT FOR ISA INSTRUCTIONS USING PREFIXES - In one embodiment, a processor includes an instruction decoder to receive a first instruction having a prefix and an opcode and to generate, by an instruction decoder of the processor, a second instruction executable based on a condition determined based on the prefix, and an execution unit to conditionally execute the second instruction based on the condition determined based on the prefix. | 10-17-2013 |
20130290679 | NEXT BRANCH TABLE FOR USE WITH A BRANCH PREDICTOR - A data processing system | 10-31-2013 |
20130290680 | OPTIMIZING REGISTER INITIALIZATION OPERATIONS - A system and method for efficiently reducing the latency of initializing registers. A register rename unit within a processor determines whether prior to an execution pipeline stage it is known a decoded given instruction writes a particular numerical value in a destination operand. An example is a move immediate instruction that writes a value of 0 in its destination operand. Other examples may also qualify. If the determination is made, a given physical register identifier is assigned to the destination operand, wherein the given physical register identifier is associated with the particular numerical value, but it is not associated with an actual physical register in a physical register file. The given instruction is marked to prevent it from proceeding to an execution pipeline stage. When the given physical register identifier is used to read the physical register file, no actual physical register is accessed. | 10-31-2013 |
20130297912 | APPARATUS AND METHOD FOR DYNAMIC ALLOCATION OF EXECUTION QUEUES - A processor reduces the likelihood of stalls at an instruction pipeline by dynamically extending the size of a full execution queue. To extend the full execution queue, the processor temporarily repurposes another execution queue to store instructions on behalf of the full execution queue. The execution queue to be repurposed can be selected based on a number of factors, including the type of instructions it is generally designated to store, whether it is empty of other instruction types, and the rate of cache hits at the processor. By selecting the repurposed queue based on dynamic factors such as the cache hit rate, the likelihood of stalls at the dispatch stage is reduced for different types of program flows, improving overall efficiency of the processor. | 11-07-2013 |
20130305020 | VECTOR FRIENDLY INSTRUCTION FORMAT AND EXECUTION THEREOF - A vector friendly instruction format and execution thereof. According to one embodiment of the invention, a processor is configured to execute an instruction set. The instruction set includes a vector friendly instruction format. The vector friendly instruction format has a plurality of fields including a base operation field, a modifier field, an augmentation operation field, and a data element width field, wherein the first instruction format supports different versions of base operations and different augmentation operations through placement of different values in the base operation field, the modifier field, the alpha field, the beta field, and the data element width field, and wherein only one of the different values may be placed in each of the base operation field, the modifier field, the alpha field, the beta field, and the data element width field on each occurrence of an instruction in the first instruction format in instruction streams. | 11-14-2013 |
20130311752 | INSTRUCTION-OPTIMIZING PROCESSOR WITH BRANCH-COUNT TABLE IN HARDWARE - A processing system comprising a microprocessor core and a translator. Within the microprocessor core is arranged a hardware decoder configured to selectively decode instructions for execution in the microprocessor core, and, a logic structure configured to track usage of the hardware decoder. The translator is operatively coupled to the logic structure and configured to selectively translate the instructions for execution in the microprocessor core, based on the usage of the hardware decoder as determined by the logic structure. | 11-21-2013 |
20130311753 | METHOD AND DEVICE (UNIVERSAL MULTIFUNCTION ACCELERATOR) FOR ACCELERATING COMPUTATIONS BY PARALLEL COMPUTATIONS OF MIDDLE STRATUM OPERATIONS - This invention constitutes a method and apparatus for enabling parallel computations of intermediate operations which are generic in many algorithms in given applications and also contain most of the computationally intensive operations. The method includes designing a set of intermediate level functions suitable for predefined application, obtaining instructions corresponding to intermediate level operations from a processor, computing the addresses of the operands and the results, performing computations involved in multiple intermediate level operations. In an exemplary embodiment the apparatus consists of a local data address generator that computes the addresses of a plurality of operands and results, a programmable computational unit that performs parallels computations of the intermediate level operations and a local memory interface that is interfaced to local memory organized in multiple blocks. The local data address generator and programmable computational unit are configurable to cover any field requiring large computations. | 11-21-2013 |
20130326194 | METHOD, APPARATUS AND INSTRUCTIONS FOR PARALLEL DATA CONVERSIONS - Method, apparatus, and program means for performing a conversion. In one embodiment, a disclosed apparatus includes a destination storage location corresponding to a first architectural register. A functional unit operates responsive to a control signal, to convert a first packed first format value selected from a set of packed first format values into a plurality of second format values. Each of the first format values has a plurality of sub elements having a first number of bits The second format values have a greater number of bits. The functional unit stores the plurality of second format values into an architectural register. | 12-05-2013 |
20130326195 | PREVENTING EXECUTION OF PARITY-ERROR-INDUCED UNPREDICTABLE INSTRUCTIONS, AND RELATED PROCESSOR SYSTEMS, METHODS, AND COMPUTER-READABLE MEDIA - Preventing execution of parity-error-induced unpredictable instructions, and related processor systems, methods, and computer-readable media are disclosed. In this regard, a method for processing instructions in a central processing unit (CPU) is provided. The method comprises decoding an instruction comprising a plurality of bits, and generating a parity error indicator indicating whether a parity error exists in the plurality of bits prior to execution of the instruction. If the parity error indicator indicates that the parity error exists in the plurality of bits, one or more of the plurality of bits are modified to indicate a no execution operation (NOP), without effecting a roll back of a program counter of the CPU and without re-decoding the instruction. In this manner, the possibility of the parity error causing an inadvertent execution of an unpredictable instruction is reduced. | 12-05-2013 |
20130326196 | SYSTEMS, APPARATUSES, AND METHODS FOR PERFORMING VECTOR PACKED UNARY DECODING USING MASKS - Embodiments of systems, apparatuses, and methods for performing in a computer processor vector packed unary value decoding using masks in response to a single vector packed unary decoding using masks instruction that includes a destination vector register operand, a source writemask register operand, and an opcode are described. | 12-05-2013 |
20130339666 | SPECIAL CASE REGISTER UPDATE WITHOUT EXECUTION - A method of changing a value of associated with a logical address in a computing device. The method includes: receiving an instruction at an instruction decoder, the instruction including a target register expressed as a logical value; determining at an instruction decoder that a result of the instruction is to set the target register to a constant value, the target register being in a physical register file associated with an execution unit; and mapping, in a register mapper, the logical address to a location represented by a special register tag. | 12-19-2013 |
20130339667 | SPECIAL CASE REGISTER UPDATE WITHOUT EXECUTION - A method of changing a value of associated with a logical address in a computing device. The method includes: receiving an instruction at an instruction decoder, the instruction including a target register expressed as a logical value; determining at an instruction decoder that a result of the instruction is to set the target register to a constant value, the target register being in a physical register file associated with an execution unit; and mapping, in a register mapper, the logical address to a location represented by a special register tag. | 12-19-2013 |
20130339668 | SYSTEMS, APPARATUSES, AND METHODS FOR PERFORMING DELTA DECODING ON PACKED DATA ELEMENTS - Embodiments of systems, apparatuses, and methods for performing delta decoding on packed data elements of a source and storing the results in packed data elements of a destination using a single vector packed delta decode instruction are described. | 12-19-2013 |
20130346728 | Optimizing Performance Of Instructions Based On Sequence Detection Or Information Associated With The Instructions - In one embodiment, the present invention includes an instruction decoder that can receive an incoming instruction and a path select signal and decode the incoming instruction into a first instruction code or a second instruction code responsive to the path select signal. The two different instruction codes, both representing the same incoming instruction may be used by an execution unit to perform an operation optimized for different data lengths. Other embodiments are described and claimed. | 12-26-2013 |
20140013084 | DEVICE AND METHOD FOR IMPLEMENTING ADDRESS BUFFER MANAGEMENT OF PROCESSOR - The disclosure provides a device for implementing address buffer management of a processor, including: an assembler configured to perform operations to obtain intermediate values when the assembler encodes a set instruction for an address automatic-increment value and boundary values, and to encapsulate the intermediate values into the set instruction for the address automatic-increment value and boundary values; and a processor configured to determine, according to the intermediate values, whether to perform the address automatic-increment operation or the address automatic-decrement operation, so as to achieve the address buffer management. The disclosure also provides a method for implementing address buffer management of a processor, including: a processor decodes a set instruction for an address automatic-increment value and boundary values to obtain intermediate values, and determines, according to the intermediate values, whether to perform the address automatic-increment operation or the address automatic-decrement operation when the processor performs a load or store instruction, so as to realize the address buffer management. Through the device and the method of the disclosure, the hardware costs of the processor are reduced and design requirements of the processor's time sequence and energy efficiency are met. | 01-09-2014 |
20140019723 | BINARY TRANSLATION IN ASYMMETRIC MULTIPROCESSOR SYSTEM - An asymmetric multiprocessor system (ASMP) may comprise computational cores implementing different instruction set architectures and having different power requirements. Program code for execution on the ASMP is analyzed and a determination is made as to whether to allow the program code, or a code segment thereof to execute on a first core natively or to use binary translation on the code and execute the translated code on a second core which consumes less power than the first core during execution. | 01-16-2014 |
20140025933 | REPLAY REDUCTION BY WAKEUP SUPPRESSION USING EARLY MISS INDICATION - A method for reducing a number of operations replayed in a processor includes decoding an operation to determine a memory address and a command in the operation. If data is not in a way predictor based on the memory address, a suppress wakeup signal is sent to an operation scheduler, and the operation scheduler suppresses waking up other operations that are dependent on the data. | 01-23-2014 |
20140040601 | PREDICATION IN A VECTOR PROCESSOR - Embodiments relate to vector processor predication in an active memory device. An aspect includes a method for vector processor predication in an active memory device that includes memory and a processing element. The method includes decoding, in the processing element, an instruction including a plurality of sub-instructions to execute in parallel. One or more mask bits are accessed from a vector mask register in the processing element. The one or more mask bits are applied by the processing element to predicate operation of a unit in the processing element associated with at least one of the sub-instructions. | 02-06-2014 |
20140040602 | Storage Method, Memory, and Storing System with Accumulated Write Feature - A storage method, a memory and a storage system that have an accumulated write feature are provided in which the OR and AND operation are shifted from CPU/ALU (controller) to the memory, and the frequency for switching data transmission lines between read and write instructions can be reduced. In the memory, the interface unit includes a write arithmetic instruction interface, a write instruction interface, and an address instruction interface; the instruction/address decoder is configured to decode a write arithmetic instruction, a write instruction and an address instruction; and the pFET has a higher driving capability than the data switches, and the nFET has a lower driving capability than the data switches. The storage method, memory and storage system can reduce work load of CPU/ALU, and enable continuous data writing to the memory. | 02-06-2014 |
20140052962 | CUSTOM CHAINING STUBS FOR INSTRUCTION CODE TRANSLATION - A processing system includes a microprocessor, a hardware decoder arranged within the microprocessor, and a translator operatively coupled to the microprocessor. The hardware decoder is configured to decode instruction code non-native to the microprocessor for execution in the microprocessor. The translator is configured to form a translation of the instruction code in an instruction set native to the microprocessor and to connect a branch instruction in the translation to a chaining stub. The chaining stub is configured to selectively cause additional instruction code at a target address of the branch instruction to be received in the hardware decoder without causing the processing system to search for a translation of additional instruction code at the target address. | 02-20-2014 |
20140052963 | TECHNIQUE TO PERFORM THREE-SOURCE OPERATIONS - A technique to perform three-source instructions. At least one embodiment of the invention relates to converting a three-source instruction into at least two instructions identifying no more than two source values. | 02-20-2014 |
20140059326 | CALCULATION PROCESSING DEVICE AND CALCULATION PROCESSING DEVICE CONTROLLING METHOD - A calculation-processing-device includes: a decoder unit including, a first-counter to increment a first-count-value and to decrement the-first-count-value, and a second-counter configured to increment a second-count-value and to decrement the second-count-value; a first-instruction-executing-unit to execute an instruction of the first-class; a second-instruction-executing-unit to execute an instruction of the-second class; a first-instruction holding unit including a plurality of first-entries, to input the instruction of the first-class held in one of the plurality of first-entries into the first-instruction-executing-unit; a second-instruction-holding-unit including a plurality of second-entries, to input the instruction of the second-class held in one of the plurality of second-entries into the second-instruction-executing-unit; and first-control-unit to output the second-release-notification, and to change the output timing of the second-release-notification when a predetermined relationship is established between the first-timing and the second-timing, and the register is used by the subsequent instruction of the second-class. | 02-27-2014 |
20140068229 | INSTRUCTION ADDRESS ENCODING AND DECODING BASED ON PROGRAM CONSTRUCT GROUPS - Coding circuitry comprises at least an encoder configured to encode an instruction address for transmission to a decoder. The encoder is operative to identify the instruction address as belonging to a particular one of a plurality of groups of instruction addresses associated with respective distinct program constructs, and to encode the instruction address based on the identified group. The decoder is operative to identify the encoded instruction address as belonging to the particular one of a plurality of groups of instruction addresses associated with respective distinct program constructs, and to decode the encoded instruction address based on the identified group. The coding circuitry may be implemented as part of an integrated circuit or other processing device that includes associated processor and memory elements. In such an arrangement, the processor may generate the instruction address for delivery over a bus to the memory. | 03-06-2014 |
20140082328 | METHOD AND APPARATUS TO PROCESS 4-OPERAND SIMD INTEGER MULTIPLY-ACCUMULATE INSTRUCTION - According to one embodiment, a processor includes an instruction decoder to receive an instruction to process a multiply-accumulate operation, the instruction having a first operand, a second operand, a third operand, and a fourth operand. The first operand is to specify a first storage location to store an accumulated value; the second operand is to specify a second storage location to store a first value and a second value; and the third operand is to specify a third storage location to store a third value. The processor further includes an execution unit coupled to the instruction decoder to perform the multiply-accumulate operation to multiply the first value with the second value to generate a multiply result and to accumulate the multiply result and at least a portion of a third value to an accumulated value based on the fourth operand. | 03-20-2014 |
20140082329 | CONTINUOUS RUN-TIME VALIDATION OF PROGRAM EXECUTION: A PRACTICAL APPROACH - Trustworthy systems require that code be validated as genuine. Most systems implement this requirement prior to execution by matching a cryptographic hash of the binary file against a reference hash value, leaving the code vulnerable to run time compromises, such as code injection, return and jump-oriented programming, and illegal linking of the code to compromised library functions. The Run-time Execution Validator (REV) validates, as the program executes, the control flow path and instructions executed along the control flow path. REV uses a signature cache integrated into the processor pipeline to perform live validation of executions, at basic block boundaries, and ensures that changes to the program state are not made by the instructions within a basic block until the control flow path into the basic block and the instructions within the basic block are both validated. | 03-20-2014 |
20140089638 | Multi-Destination Instruction Handling - Various techniques for processing instructions that specify multiple destinations. A first portion of a processor pipeline is configured to split a multi-destination instruction into a plurality of single-destination operations. A second portion of the pipeline is configured to process the plurality of single-destination operations. A third portion of the pipeline is configured to merge the plurality of single-destination operations into one or more multi-destination operations. The one or more multi-destination operations may be performed. The first portion of the pipeline may include a decode unit. The second portion of the pipeline may include a map unit, which may in turn include circuitry configured to maintain a list of free architectural registers and a mapping table that maps physical registers to architectural registers. The third portion of the pipeline may comprise a dispatch unit. In some embodiments, this may provide certain advantages such as reduced area and/or power consumption. | 03-27-2014 |
20140095834 | INSTRUCTION AND LOGIC FOR BOYER-MOORE SEARCH OF TEXT STRINGS - Instructions and logic provide extended vector suffix comparisons for Boyer-Moore searches. Some embodiments, responsive to an instruction specifying: a pattern source operand and a target source operand, compare each of m data elements of the pattern operand with each data element of the target operand. A first and second equal ordered aggregation operation are performed from the comparisons according to the m data elements of the pattern source operand. A result of the first and second aggregation operations indicating whether or not a possible match exists between the m data elements of the pattern source operand and d data element positions relative to data elements of the target source operand is stored. Ordering of the data elements of the pattern and the target operands may be reversed for the second aggregation operation, and d may be a sum of m−1 and the quantity of target operand elements in some embodiments. | 04-03-2014 |
20140101414 | TECHNIQUE FOR TRANSLATING DEPENDENT INSTRUCTIONS - In response to determining an operation is a dependent operation, a mapper of a processor determines the source registers of the operation from which the dependent operation depends. The mapper translates the dependent operation to a new operation that uses as its source operands at least one of the determined source registers and a source register of the dependent operation. The new operation is independent of other pending operations and therefore can be executed without waiting for execution of other operations, thus reducing execution latency. | 04-10-2014 |
20140149718 | INSTRUCTION AND LOGIC TO PROVIDE PUSHING BUFFER COPY AND STORE FUNCTIONALITY - Instructions and logic provide pushing buffer copy and store functionality. Some embodiments include a first hardware thread or processing core, and a second hardware thread or processing core, a cache to store cache coherent data in a cache line for a shared memory address accessible by the second hardware thread or processing core. Responsive to decoding an instruction specifying a source data operand, said shared memory address as a destination operand, and one or more owner of said shared memory address, one or more execution units copy data from the source data operand to the cache coherent data in the cache line for said shared memory address accessible by said second hardware thread or processing core in the cache when said one or more owner includes said second hardware thread or processing core. | 05-29-2014 |
20140173255 | INSTRUCTION SET FOR SUPPORTING WIDE SCALAR PATTERN MATCHES - A processor includes an instruction decoder to receive an instruction having a first operand, a second operand, and a third operand, and an execution unit coupled to the instruction decoder to execute the instruction, the execution unit to individually perform a shift operation by at least one bit for each of a plurality of data elements stored in a storage location indicated by the second operand, for each of the data elements that has an overflow in response to the shift-left operation, to carry over the overflow into an adjacent data element based on a first bitmask obtained from the third operand, generating a final result, and to store the final result in a storage location indicated by the first operand. | 06-19-2014 |
20140173256 | PROCESSOR CONFIGURED FOR OPERATION WITH MULTIPLE OPERATION CODES PER INSTRUCTION - A method of associating operation codes with instructions for execution in a processor includes the steps of assigning the operation codes to the instructions in a manner that allows a given instruction to have multiple assigned operation codes and selecting a particular one of the multiple assigned operation codes for use in executing a program containing the given instruction. The assigning step may be implemented in conjunction with design of the processor, and may further comprise the steps of determining frequency of occurrence of adjacent pairs of instructions in one or more programs likely to be run on the processor, and assigning the operation codes to the instructions based at least in part on the determined frequency of occurrence of the adjacent pairs of instructions. The selecting step may be implemented in conjunction with code generation for the program containing the given instruction, for example, in a code assembler. | 06-19-2014 |
20140181476 | Scheduler Implementing Dependency Matrix Having Restricted Entries - A scheduler implementing a dependency matrix having restricted entries is disclosed. A processing device of the disclosure includes a decode unit to decode an instruction and a scheduler communicably coupled to the decode unit. In one embodiment, the scheduler is configured to receive the decoded instruction, determine that the decoded instruction qualifies for allocation as a restricted reservation station (RS) entry type in a dependency matrix maintained by the scheduler, identify RS entries in the dependency matrix that are free for allocation, allocate one of the identified free RS entries with information of the decoded instruction in the dependency matrix, and update a row of the dependency matrix corresponding to the claimed RS entry with source dependency information of the decoded instruction. | 06-26-2014 |
20140181477 | Compressing Execution Cycles For Divergent Execution In A Single Instruction Multiple Data (SIMD) Processor - In one embodiment, the present invention includes a processor with a vector execution unit to execute a vector instruction on a vector having a plurality of individual data elements, where the vector instruction is of a first width and the vector execution unit is of a smaller width. The processor further includes a control logic coupled to the vector execution unit to compress a number of execution cycles consumed in execution of the vector instruction when at least some of the individual data elements are not to be operated on by the vector instruction. Other embodiments are described and claimed. | 06-26-2014 |
20140189306 | ENHANCED LOOP STREAMING DETECTOR TO DRIVE LOGIC OPTIMIZATION - An enhanced loop streaming detection mechanism is provided in a processor to reduce power consumption. The processor includes a decoder to decode instructions in a loop into micro-operations, and a loop streaming detector to detect the presence of the loop in the micro-operations. The processor also includes a loop characteristic tracker unit to identify hardware components downstream from the decoder that are not to be used by the micro-operations in the loop, and to disable the identified hardware components. The processor also includes execution circuitry to execute the micro-operations in the loop with the identified hardware components disabled. | 07-03-2014 |
20140189307 | METHODS, APPARATUS, INSTRUCTIONS, AND LOGIC TO PROVIDE VECTOR ADDRESS CONFLICT RESOLUTION WITH VECTOR POPULATION COUNT FUNCTIONALITY - Instructions and logic provide SIMD address conflict resolution with vector population count functionality. Some embodiments include processors with a register with a variable plurality of data fields, each of the data fields to store a variable second plurality of bits. A destination register has corresponding data fields, each of these data fields to store a count of the number of bits set to one for corresponding data fields. Responsive to decoding a vector population count instruction, execution units count the number of bits set to one for each of data fields in the register, and store the counts in corresponding data fields of the first destination register. Vector population count instructions can be used with variable sized elements and conflict masks to generate iteration counts and completion masks to be used each iteration to resolve dependencies in gather-modify-scatter SIMD operations. | 07-03-2014 |
20140189308 | METHODS, APPARATUS, INSTRUCTIONS, AND LOGIC TO PROVIDE VECTOR ADDRESS CONFLICT DETECTION FUNCTIONALITY - Instructions and logic provide SIMD address conflict detection functionality. Some embodiments include processors with a register with a variable plurality of data fields, each of the data fields to store an offset for a data element in a memory. A destination register has corresponding data fields, each of these data fields to store a variable second plurality of bits to store a conflict mask having a mask bit for each offset. Responsive to decoding a vector conflict instruction, execution units compare the offset in each data field with every less significant data field to determine if they hold a matching offset, and in corresponding conflict masks in the destination register, set any mask bits corresponding to a less significant data field with a matching offset. Vector address conflict detection can be used with variable sized elements and to generate conflict masks to resolve dependencies in gather-modify-scatter SIMD operations. | 07-03-2014 |
20140189309 | METHODS, APPARATUS, INSTRUCTIONS, AND LOGIC TO PROVIDE PERMUTE CONTROLS WITH LEADING ZERO COUNT FUNCTIONALITY - Instructions and logic provide SIMD permute controls with leading zero count functionality. Some embodiments include processors with a register with a plurality of data fields, each of the data fields to store a second plurality of bits. A destination register has corresponding data fields, each of these data fields to store a count of the number of most significant contiguous bits set to zero for corresponding data fields. Responsive to decoding a vector leading zero count instruction, execution units count the number of most significant contiguous bits set to zero for each of data fields in the register, and store the counts in corresponding data fields of the first destination register. Vector leading zero count instructions can be used to generate permute controls and completion masks to be used along with the set of permute controls, to resolve dependencies in gather-modify-scatter SIMD operations. | 07-03-2014 |
20140195782 | METHOD AND APPARATUS TO PROCESS SHA-2 SECURE HASHING ALGORITHM - A processor includes an instruction decoder to receive a first instruction to process a secure hash algorithm 2 (SHA-2) hash algorithm, the first instruction having a first operand associated with a first storage location to store a SHA-2 state and a second operand associated with a second storage location to store a plurality of messages and round constants. The processor further includes an execution unit coupled to the instruction decoder to perform one or more iterations of the SHA-2 hash algorithm on the SHA-2 state specified by the first operand and the plurality of messages and round constants specified by the second operand, in response to the first instruction. | 07-10-2014 |
20140215188 | Multi-Level Dispatch for a Superscalar Processor - In an embodiment, a processor includes a multi-level dispatch circuit configured to supply operations for execution by multiple parallel execution pipelines. The multi-level dispatch circuit may include multiple dispatch buffers, each of which is coupled to multiple reservation stations. Each reservation station may be coupled to a respective execution pipeline and may be configured to schedule instruction operations (ops) for execution in the respective execution pipeline. The sets of reservation stations coupled to each dispatch buffer may be non-overlapping. Thus, if a given op is to be executed in a given execution pipeline, the op may be sent to the dispatch buffer which is coupled to the reservation station that provides ops to the given execution pipeline. | 07-31-2014 |
20140215189 | DATA PROCESSING APPARATUS AND METHOD FOR CONTROLLING USE OF AN ISSUE QUEUE - An apparatus and method includes execution circuitry including a wide operand execution unit configured to allow up to N bits of operand data to be processed during execution of a single instruction. Decoder circuitry decodes and generates, for each instruction, at least one control data block identifying an operation to be performed by the execution circuitry and at least two re-combineable control data blocks for the instruction. Issue queue control circuitry then allocates a slot in the issue queue for each of the at least two data blocks and up to M bits of associated operand data, and marks those allocated slots to identify that they contain re-combineable control data blocks. The issue queue control circuitry issues the combined block to said wide operand execution unit along with the operand data contained in each of the allocated slots for said at least two control data blocks. | 07-31-2014 |
20140223142 | PROCESSOR AND COMPILER - A Very Long Instruction Word (VLIW) processor having an instruction set with a reduced size resulting in a small number of bits being necessary to specify registers. The VLIW processor includes a register file, and first through third operation units, and executes a very long instruction word. Further, the very long instruction word includes a register specifying field which specifies a least one of the registers in the register file and a plurality of instructions. The operand of each instruction includes bits src | 08-07-2014 |
20140258683 | INSTRUCTION AND LOGIC TO PROVIDE VECTOR HORIZONTAL COMPARE FUNCTIONALITY - Instructions and logic provide vector horizontal compare functionality. Some embodiments, responsive to an instruction specifying: a destination operand, a size of the vector elements, a source operand, and a mask corresponding to a portion of the vector element data fields in the source operand; read values from data fields of the specified size in the source operand, corresponding to the mask and compare the values for equality. In some embodiments, responsive to a detection of inequality, a trap may be taken. In some alternative embodiments, a flag may be set. In other alternative embodiments, a mask field may be set to a masked state for the corresponding unequal value(s). In some embodiments, responsive to all unmasked data fields of the source operand being equal to a particular value, that value may be broadcast to all data fields of the specified size in the destination operand. | 09-11-2014 |
20140281391 | METHOD AND APPARATUS FOR FORWARDING LITERAL GENERATED DATA TO DEPENDENT INSTRUCTIONS MORE EFFICIENTLY USING A CONSTANT CACHE - A processor to a store constant value (immediate or literal) in a cache upon decoding a move immediate instruction in which the immediate is to be moved (copied or written) to an architected register. The constant value is stored in an entry in the cache. Each entry in the cache includes a field to indicate whether its stored constant value is valid, and a field to associate the entry with an architected register. Once a constant value is stored in the cache, it is immediately available for forwarding to a processor pipeline where a decoded instruction may need the constant value as an operand. | 09-18-2014 |
20140281392 | PROFILING CODE PORTIONS TO GENERATE TRANSLATIONS - The disclosure provides a micro-processing system operable in a hardware decoder mode and in a translation mode. In the hardware decoder mode, the hardware decoder receives and decodes non-native ISA instructions into native instructions for execution in a processing pipeline. In the translation mode, native translations of non-native ISA instructions are executed in the processing pipeline without using the hardware decoder. The system includes a code portion profile stored in hardware that changes dynamically in response to use of the hardware decoder to execute portions of non-native ISA code. The code portion profile is then used to dynamically form new native translations executable in the translation mode. | 09-18-2014 |
20140281393 | REORDER-BUFFER-BASED STATIC CHECKPOINTING FOR RENAME TABLE REBUILDING - Out-of-order CPUs, devices and methods diminish the time penalty from stalling the pipe to rebuild a rename table, such as due to a misprediction. A microprocessor can include a pipe that has a decoder, a dispatcher, and at least one execution unit. A rename table stores rename data, and a check-point table (“CPT”) stores rename data received from the dispatcher. A Re-Order Buffer (“ROB”) stores ROB data, and has a static mapping relationship with the CPT. If the rename table is flushed, such as due to a misprediction, the rename table is rebuilt at least in part by concurrent copying of rename data stored in the CPT, in coordination with walking the ROB. | 09-18-2014 |
20140281394 | METHOD TO IMPROVE SPEED OF EXECUTING RETURN BRANCH INSTRUCTIONS IN A PROCESSOR - An apparatus and method for executing call branch and return branch instructions in a processor by utilizing a link register stack. The processor includes a branch counter that is initialized to zero, and is set to zero each time the processor decodes a link register manipulating instruction other than a call branch instruction. The branch counter is incremented by one each time a call branch instruction is decoded and an address is pushed onto the link register stack. In response to decoding a return branch instruction and provided the branch counter is not zero, a target address for the decoded return branch instruction is popped off the link register stack, the branch counter is decremented, and there is no need to check the target address for correctness. | 09-18-2014 |
20140281395 | Systems, Apparatuses, and Methods for Reducing the Number of Short Integer Multiplications - Systems, methods, and apparatuses for calculating a square of a data value of a first source operand, a square of a data value of a second source operand, and a multiplication of the data of the first and second operands only using one multiplication are described. | 09-18-2014 |
20140281396 | PROCESSORS, METHODS, SYSTEMS, AND INSTRUCTIONS TO CONSOLIDATE UNMASKED ELEMENTS OF OPERATION MASKS - An instruction processing apparatus of an aspect includes a plurality of operation mask registers. The apparatus also includes a decode unit to receive an operation mask consolidation instruction. The operation mask consolidation instruction is to indicate a source operation mask register, of the plurality of operation mask registers, and a destination storage location. The source operation mask register is to include a source operation mask that is to include a plurality of masked elements that are to be disposed within a plurality of unmasked elements. An execution unit is coupled with the decode unit. The execution unit, in response to the operation mask consolidation instruction, is to store a consolidated operation mask in the destination storage location. The consolidated operation mask is to include the unmasked elements from the source operation mask consolidated together. Other apparatus, methods, systems, and instructions are also disclosed. | 09-18-2014 |
20140281397 | FUSIBLE INSTRUCTIONS AND LOGIC TO PROVIDE OR-TEST AND AND-TEST FUNCTIONALITY USING MULTIPLE TEST SOURCES - Fusible instructions and logic provide OR-test and AND-test functionality on multiple test sources. Some embodiments include a processor decode stage to decode a test instruction for execution, the instruction specifying first, second and third source data operands, and an operation type. Execution units, responsive to the decoded test instruction, perform one logical operation, according to the specified operation type, between data from the first and second source data operands, and perform a second logical operation between the data from the third source data operand and the result of the first logical operation to set a condition flag. Some embodiments generate the test instruction dynamically by fusing one logical instruction with a prior-art test instruction. Other embodiments generate the test instruction through a just-in-time compiler. Some embodiments also fuse the test instruction with a subsequent conditional branch instruction, and perform a branch according to how the condition flag is set. | 09-18-2014 |
20140281398 | INSTRUCTION EMULATION PROCESSORS, METHODS, AND SYSTEMS - A processor of an aspect includes decode logic to receive a first instruction and to determine that the first instruction is to be emulated. The processor also includes emulation mode aware post-decode instruction processor logic coupled with the decode logic. The emulation mode aware post-decode instruction processor logic is to process one or more control signals decoded from an instruction. The instruction is one of a set of one or more instructions used to emulate the first instruction. The one or more control signals are to be processed differently by the emulation mode aware post-decode instruction processor logic when in an emulation mode than when not in the emulation mode. Other apparatus are also disclosed as well as methods and systems. | 09-18-2014 |
20140289499 | SWITCHING BETWEEN DEDICATED FUNCTION HARDWARE AND USE OF A SOFTWARE ROUTINE TO GENERATE RESULT DATA - An apparatus for processing data 2 is provided including processing circuitry 24 controlled by an instruction decoder 20 in response to a stream of program instructions. There is also provided dedicated function hardware 12 configured to receive output data from the processing circuitry and to perform a dedicated processing operation. The instruction decoder 20 is responsive to an end instruction 54 and a software processing flag (blend_shade_enabled) to control the processing circuitry to end a current software routine, to generate output data and in dependence upon the software processing flag either trigger processing of the output data by the dedicated function hardware or trigger the processing circuitry to perform a further software routine upon the output data to generate software generated result data instead of hardware generated result data as generated by the dedicated hardware circuitry. | 09-25-2014 |
20140331029 | SYNCHRONISATION OF EXECUTION THREADS ON A MULTI-THREADED PROCESSOR - Method and apparatus are provided for synchronising execution of a plurality of threads on a multi-threaded processor. A program executed by a thread can have a number of synchronisation points corresponding to points where execution is to be synchronised with another thread. Execution of a thread is paused when it reaches a synchronisation point until at least one other thread with which it is intended to be synchronised reaches a corresponding synchronisation point. Execution is subsequently resumed. A control core maintains status data for threads and can cause a thread that is ready to run to use execution resources that were occupied by a thread that is waiting for a synchronization event. | 11-06-2014 |
20140344552 | PROVIDING STATUS OF A PROCESSING DEVICE WITH PERIODIC SYNCHRONIZATION POINT IN INSTRUCTION TRACING SYSTEM - In accordance with embodiments disclosed herein, there is provided systems and methods for providing status of a processing device with a periodic synchronization point in an instruction tracing system. For example, the method may include generating a boundary packet based on a unique byte pattern in a packet log. The boundary packet provides a starting point for packet decode. The method may also include generating a plurality of state packets based on status information of the processor. The plurality of state packets follows the boundary packet when outputted into the packet log. | 11-20-2014 |
20140344553 | Gathering and Scattering Multiple Data Elements - According to a first aspect, efficient data transfer operations can be achieved by: decoding by a processor device, a single instruction specifying a transfer operation for a plurality of data elements between a first storage location and a second storage location; issuing the single instruction for execution by an execution unit in the processor; detecting an occurrence of an exception during execution of the single instruction; and in response to the exception, delivering pending traps or interrupts to an exception handler prior to delivering the exception. | 11-20-2014 |
20140380021 | PROCESSOR, MULTIPROCESSOR SYSTEM, COMPILER, SOFTWARE SYSTEM, MEMORY CONTROL SYSTEM, AND COMPUTER SYSTEM - A processor includes: a first instruction processing unit that, in a first mode, receives a first input including instructions included in a first instruction set; a second instruction processing unit that, in a second mode, receives the first input, the second instruction processing unit having a simpler configuration than the first instruction processing unit; a third instruction processing unit that, in a third mode, receives a second input including instructions included in a second instruction set, the second instruction set including part of the instructions included in the first instruction set, the third instruction processing unit having a simpler configuration than the first instruction processing unit and the second instruction processing unit; a selection unit that selects, according to a mode, a result of decoding by one of the instruction processing units; and an instruction execution unit that executes an instruction according to the selected result of decoding. | 12-25-2014 |
20150012726 | LOOP STREAMING DETECTOR FOR STANDARD AND COMPLEX INSTRUCTION TYPES - A processor includes a microcode storage comprising a plurality of microcode flows and a decode logic coupled to the microcode storage. The decode logic is configured to receive a first instruction, decode the first instruction into an entry point vector to a first microcode flow in the microcode storage, the entry point vector comprising a first indicator specifying a number of clock cycles associated with the first microcode flow, initiate the microcode storage, wherein the microcode storage inserts microinstructions of the first microcode flow into an instruction queue, count clock cycles after initiating the microcode storage, and decode a second instruction without first receiving a return from the microcode storage, wherein the second instruction is decoded at a particular clock cycle based on the number of clock cycles associated with the first microcode flow. | 01-08-2015 |
20150012727 | PROCESSING DEVICE AND CONTROL METHOD OF PROCESSING DEVICE - A processing device has: a plurality of registers configured to correspond to a plurality of accessible register windows; and an instruction decoder configured to inhibit, when, during an execution of a first instruction of changing a number of a current register window by one, a second instruction of changing the number of the current register window by one in a direction same as a direction of the first instruction is input, a decode of the second instruction until when the execution of the first instruction is completed, and to perform, when, during the execution of the first instruction of changing the number of the current register window by one, a second instruction of changing the number of the current register window by one in a direction opposite to the direction of the first instruction is input, a decode of the second instruction during the execution of the first instruction. | 01-08-2015 |
20150019842 | Highly Efficient Different Precision Complex Multiply Accumulate to Enhance Chip Rate Functionality in DSSS Cellular Systems - This invention is a digital signal processor capable of performing correlation of data with pseudo noise for code division multiple access (CDMA) decoding using clusters. Each cluster includes plural multipliers. The multipliers multiply real and imaginary parts of packed data by corresponding pseudo noise data. Within a cluster the real parts and the imaginary parts of the products are summed separately. This forms plural complex number outputs equal in number to the number of clusters. The pseudo noise data is offset relative to the data input differing amounts for different clusters. The clusters are divided into first half clusters receiving data from even numbered slots and second half clusters receiving data from odd numbered slots. The correlation unit includes a mask input to selectively zero a multiplier product. | 01-15-2015 |
20150026435 | INSTRUCTION SET ARCHITECTURE WITH EXTENSIBLE REGISTER ADDRESSING - A method and circuit arrangement selectively source and/or write data from/to extended registers of an extended register file based in part on whether an operand address of an instruction references a primary register of primary register file configured to store a pointer to the extended register. Control logic connected to the primary register file and the extended register file determines whether the operand address references a primary register configured to store a pointer, and responsive to the determination, the control logic causes execution logic to selectively source and/or write data from/to the extended register pointed to by the pointer stored in the referenced primary register. | 01-22-2015 |
20150032998 | METHOD, APPARATUS, AND SYSTEM FOR TRANSACTIONAL SPECULATION CONTROL INSTRUCTIONS - An apparatus and method is described herein for providing speculation control instructions. An xAcquire and xRelease instruction are provided to define a critical section. In one embodiment, the xAcquire instruction includes a lock instruction with an elision prefix and the xRelease instruction includes a lock release instruction with an elision prefix. As a result, a processor is able to elide locks and transactionally execute a critical section defined in software by xAcquire and xRelease. But by adding only prefix hints, legacy processor are able to execute the same code by just ignoring the hints and executing the critical section traditionally with locks to guarantee mutual exclusion. Moreover, xBegin and xEnd are similarly provided for in an Instruction Set Architecture (ISA) to define a transactional code region. In addition, other control speculation instructions, such as xAbort to enable explicit abort of a critical or transactional code section and xTest to test a state of speculative execution is also provided in the ISA. | 01-29-2015 |
20150039860 | RDA CHECKPOINT OPTIMIZATION - A system and method for efficiently performing microarchitectural checkpointing. A register rename unit within a processor determines whether a physical register number qualifies to have duplicate mappings. Information for maintenance of the duplicate mappings is stored in a register duplicate array (RDA). To reduce the penalty for misspeculation or exception recovery, control logic in the processor supports multiple checkpoints. The RDA is one of multiple data structures to have checkpoint copies of state. The RDA utilizes a content addressable memory (CAM) to store physical register numbers. The duplicate counts for both the current state and the checkpoint copies for a given physical register number are updated when instructions utilizing the given physical register number are retired. To reduce on-die real estate and power consumption, a single CAM entry is stores the physical register number and the other fields are stored in separate storage elements. | 02-05-2015 |
20150052333 | Systems, Apparatuses, and Methods for Stride Pattern Gathering of Data Elements and Stride Pattern Scattering of Data Elements - Embodiments of systems, apparatuses, and methods for performing gather and scatter stride instruction in a computer processor are described. In some embodiments, the execution of a gather stride instruction causes a conditionally storage of strided data elements from memory into the destination register according to at least some of bit values of a writemask. | 02-19-2015 |
20150074378 | System and Method for an Asynchronous Processor with Heterogeneous Processors - Embodiments are provided for an asynchronous processor with heterogeneous processors. In an embodiment, the apparatus for an asynchronous processor comprises a memory configured to cache instructions, and a first unit (XU) configured to processing a first instruction of the instructions. The apparatus also comprises a second XU having less restricted access than the first XU to a resource of the asynchronous processor and configured to process a second instruction of the instructions. The second instruction requires access to the resource. The apparatus further comprises a feedback engine configured to decode the first instruction and the second instruction, and issue the first instruction to the first XU, and a scheduler configured to send the second instruction to the second XU. | 03-12-2015 |
20150082004 | Faster and More Efficient Different Precision Sum of Absolute Differences for Dynamically Configurable Block Searches for Motion Estimation - This invention is a digital signal processor form plural sums of absolute values (SAD) in a single operation. An operational unit performing a sum of absolute value operation comprising two sets of a plurality of rows, each row producing a SAD output. Plural absolute value difference units receive corresponding packed candidate pixel data and packed reference pixel data. A row summer sums the output of the absolute value difference units in the row. The candidate pixels are offset relative to the reference pixels by one pixel for each succeeding row in a set of rows. The two sets of rows operate on opposite halves of the candidate pixels packed within an instruction specified operand. The SAD operations can be performed on differing data widths employing carry chain control in the absolute difference unit and the row summers. | 03-19-2015 |
20150082005 | PROCESSING SYSTEM AND METHOD OF INSTRUCTION SET ENCODING SPACE UTILIZATION - A processing system comprises a processing device; a first instruction set encoded in a first encoding space and comprising one or more first instructions; a second instruction set encoded in a second encoding space different from the first encoding space and comprising two or more orthogonal second instructions; and an instruction encoder arranged to encode and encapsulate subsets of the second instructions in instruction containers, each instruction container sized to comprise a plurality of the second instructions. | 03-19-2015 |
20150082006 | System and Method for an Asynchronous Processor with Asynchronous Instruction Fetch, Decode, and Issue - Embodiments are provided for an asynchronous processor with an asynchronous Instruction fetch, decode, and issue unit. The asynchronous processor comprises an execution unit for asynchronous execution of a plurality of instructions, and a fetch, decode and issue unit configured for asynchronous decoding of the instructions. The fetch, decode and issue unit comprises a plurality of resources supporting functions of the fetch, decode and issue unit, and a plurality of decoders arranged in a predefined order for passing a plurality of tokens. The tokens control access of the decoders to the resources and allow the decoders exclusive access to the resources. The fetch, decode and issue unit also comprises an issuer unit for issuing the instructions from the decoders to the execution unit | 03-19-2015 |
20150082007 | REGISTER MAPPING WITH MULTIPLE INSTRUCTION SETS - A processor core supports execution of program instruction from both a first instruction set and a second instruction set. An architectural register file | 03-19-2015 |
20150082008 | FORMING INSTRUCTION GROUPS BASED ON DECODE TIME INSTRUCTION OPTIMIZATION - Instructions are grouped into instruction groups based on optimizations that may be performed. An instruction is obtained, and a determination is made as to whether the instruction is to be included in a current instruction group or another instruction group. This determination is made based on whether the instruction is a candidate for optimization, such as decode time instruction optimization. If it is determined that the instruction is to be included in another group, then the other group is formed to include the instruction. | 03-19-2015 |
20150095617 | USING SOFTWARE HAVING CONTROL TRANSFER TERMINATION INSTRUCTIONS WITH SOFTWARE NOT HAVING CONTROL TRANSFER TERMINATION INSTRUCTIONS - In an embodiment, the present invention includes a processor having a decode unit, an execution unit, and a retirement unit. The decode unit is to decode control transfer instructions and the execution unit is to execute control transfer instructions. The retirement unit is to retire a first control transfer instruction, and to raise a fault if a next instruction to be retired after the first control transfer instruction is not a second control transfer instruction and a target instruction of the first control transfer instruction is in code using the control transfer instructions. | 04-02-2015 |
20150106591 | INSTRUCTION AND LOGIC FOR PROCESSING TEXT STRINGS - Method, apparatus, and program means for performing a string comparison operation. In one embodiment, an apparatus includes execution resources to execute a first instruction. In response to the first instruction, said execution resources store a result of a comparison between each data element of a first and second operand corresponding to a first and second text string, respectively. | 04-16-2015 |
20150106592 | INSTRUCTION AND LOGIC FOR PROCESSING TEXT STRINGS - Method, apparatus, and program means for performing a string comparison operation. In one embodiment, an apparatus includes execution resources to execute a first instruction. In response to the first instruction, said execution resources store a result of a comparison between each data element of a first and second operand corresponding to a first and second text string, respectively. | 04-16-2015 |
20150106593 | INSTRUCTION AND LOGIC FOR PROCESSING TEXT STRINGS - Method, apparatus, and program means for performing a string comparison operation. In one embodiment, an apparatus includes execution resources to execute a first instruction. In response to the first instruction, said execution resources store a result of a comparison between each data element of a first and second operand corresponding to a first and second text string, respectively. | 04-16-2015 |
20150106594 | INSTRUCTION AND LOGIC FOR PROCESSING TEXT STRINGS - Method, apparatus, and program means for performing a string comparison operation. In one embodiment, an apparatus includes execution resources to execute a first instruction. In response to the first instruction, said execution resources store a result of a comparison between each data element of a first and second operand corresponding to a first and second text string, respectively. | 04-16-2015 |
20150113250 | MICROPROCESSOR WITH COMPRESSED AND UNCOMPRESSED MICROCODE MEMORIES - A microprocessor includes a plurality of memories each configured to hold microcode instructions. At least a first of the plurality of memories is configured to provide M-bit wide words of compressed microcode instructions, and at least a second of the plurality of memories is configured to provide N-bit wide words of uncompressed microcode instructions. M and N are integers greater than zero and N is greater than M. The microprocessor also includes a decompression unit configured to decompress the compressed microcode instructions after being fetched from the at least a first of the plurality of memories and before being executed. | 04-23-2015 |
20150324199 | COMPUTER PROCESSOR AND SYSTEM WITHOUT AN ARITHMETIC AND LOGIC UNIT - A computer system comprising a processor and a memory, the processor comprising an instruction cycle circuit configured to repeatedly obtain a next instruction of a computer program, an instruction decoder configured to decode and execute the instruction obtained by the instruction cycle circuit, the computer system supporting multiple arithmetic and/or logic operations under control of one or more of the instructions, wherein the memory stores multiple tables, each specific one of the multiple arithmetic and/or logic operations being supported by a specific table stored in the memory, each specific table comprising the result of the specific arithmetic operations for a range of inputs. | 11-12-2015 |
20150347144 | Decoding Instructions That Are Modified By One Or More Other Instructions - Methods and apparatus are provided for decoding instructions in a computer program wherein the instructions include one or more base instructions that are subject to modification by one or more other instructions. A decoder determines whether a first received instruction was arrived at by a non-incremental change to a program counter (i.e. a jump in the program). If the first instruction was arrived at by a non-incremental change to the program counter the decoder decodes the immediately preceding instruction to determine if the original instruction is a base instruction subject to modification by one or more other instructions. If the preceding instruction indicates that the original instruction is a base instruction an error has occurred and exception handling code is invoked. | 12-03-2015 |
20150370563 | MULTI-PROCESSOR SYSTEM HAVING TRIPWIRE DATA MERGING AND COLLISION DETECTION - An integrated circuit includes a pool of processors and a Tripwire Data Merging and Collision Detection Circuit (TDMCDC). Each processor has a special tripwire bus port. Execution of a novel tripwire instruction causes the processor to output a tripwire value onto its tripwire bus port. Each respective tripwire bus port is coupled to a corresponding respective one of a plurality of tripwire bus inputs of the TDMCDC. The TDMCDC receives tripwire values from the processors and communicates them onto a consolidated tripwire bus. From the consolidated bus the values are communicated out of the integrated circuit and to a debug station. If more than one processor outputs a valid tripwire value at a given time, then the TDMCDC asserts a collision bit signal that is communicated along with the tripwire value. Receiving tripwire values onto the debug station facilitates use of the debug station in monitoring and debugging processor code. | 12-24-2015 |
20150378736 | INSTRUCTIONS AND LOGIC TO PROVIDE GENERAL PURPOSE GF(256) SIMD CRYPTOGRAPHIC ARITHMETIC FUNCTIONALITY - Instructions and logic provide general purpose GF(2 | 12-31-2015 |
20160011874 | SILENT MEMORY INSTRUCTIONS AND MISS-RATE TRACKING TO OPTIMIZE SWITCHING POLICY ON THREADS IN A PROCESSING DEVICE | 01-14-2016 |
20160011877 | MANAGING INSTRUCTION ORDER IN A PROCESSOR PIPELINE | 01-14-2016 |
20160011982 | VARIABLE HANDLES | 01-14-2016 |
20160026463 | ZERO CYCLE MOVE USING FREE LIST COUNTS - A system and method for reducing the latency of data move operations. A register rename unit within a processor determines whether a decoded move instruction qualifies for a zero cycle move operation. If so, control logic assigns a physical register identifier associated with a source operand of the move instruction to the destination operand of the move instruction. Additionally, the register rename unit marks the given move instruction to prevent it from proceeding in the processor pipeline. Further maintenance of the particular physical register identifier may be done by the register rename unit during commit of the given move instruction. | 01-28-2016 |
20160070575 | PROCESSOR TO EXECUTE SHIFT RIGHT MERGE INSTRUCTIONS - Method, apparatus, and program means for performing bitstream buffer manipulation with a SIMD merge instruction. The method of one embodiment comprises determining whether any unprocessed data bits for a partial variable length symbol exist in a first data block is made. A shift merge operation is performed to merge the unprocessed data bits from the first data block with a second data block. A merged data block is formed. A merged variable length symbol comprised of the unprocessed data bits and a plurality of data bits from the second data block is extracted from the merged data block. | 03-10-2016 |
20160085547 | DATA ELEMENT SELECTION AND CONSOLIDATION PROCESSORS, METHODS, SYSTEMS, AND INSTRUCTIONS - A processor includes packed data registers, and a decode unit to decode a data element selection and consolidation instruction. The instruction is to have a first source packed data operand that is to have a plurality of data elements, and a second source operand that is to have a plurality of mask elements. Each mask element corresponds to a different data element in the same relative position. An execution unit is coupled with the decode unit. The execution unit, in response to the instruction, is to store a result packed data operand in a destination storage location that is to be indicated by the instruction. The result packed data operand is to include all data elements of the first source packed data operand, which correspond to unmasked mask elements of the second source operand, consolidated together in a portion of the result packed data operand. | 03-24-2016 |
20160092223 | PERSISTENT STORE FENCE PROCESSORS, METHODS, SYSTEMS, AND INSTRUCTIONS - A processor of an aspect includes a decode unit to decode a persistent store fence instruction. The processor also includes a memory subsystem module coupled with the decode unit. The memory subsystem module, in response to the persistent store fence instruction, is to ensure that a given data corresponding to the persistent store fence instruction is stored persistently in a persistent storage before data of all subsequent store instructions is stored persistently in the persistent storage. The subsequent store instructions occur after the persistent store fence instruction in original program order. Other processors, methods, systems, and articles of manufacture are also disclosed. | 03-31-2016 |
20160092226 | Systems, Apparatuses, and Methods for Zeroing of Bits in a Data Element - Embodiments of systems, methods and apparatuses for execution a NAME instruction are described. The execution of a VPBZHI causes, on a per data element basis of a second source, a zeroing of bits higher (more significant) than a starting point in the data element. The starting point is defined by the contents of a data element in a first source. The resultant data elements are stored in a corresponding data element position of a destination. | 03-31-2016 |
20160092227 | Robust and High Performance Instructions for System Call - Robust system call and system return instructions are executed by a processor to transfer control between a requester and an operating system kernel. The processor includes execution circuitry and registers that store pointers to data structures in memory. The execution circuitry receives a system call instruction from a requester to transfer control from a first privilege level of the requester to a second privilege level of an operating system kernel. In response, the execution circuitry swaps the data structures that are pointed to by the registers between the requester and the operating system kernel in one atomic transition. | 03-31-2016 |
20160103685 | SYSTEM REGISTER ACCESS - A data processing apparatus for accessing several system registers using a single command includes system registers and command generation circuitry capable of analysing a plurality of decoded system register access instructions, each specifying a system register identifier. In response to a predetermined condition, the command generation circuitry generates a single command to represent the plurality of decoded system register access instructions. The predetermined condition comprises a requirement that a total width of the system registers specified by the plurality of decoded system register access instructions is less than or equal to a predefined data processing width. | 04-14-2016 |
20160132330 | INSTRUCTION AND LOGIC FOR BOYER-MOORE SEARCH OF TEXT STRINGS - Instructions and logic provide extended vector suffix comparisons for Boyer-Moore searches. Some embodiments, responsive to an instruction specifying: a pattern source operand and a target source operand, compare each of m data elements of the pattern operand with each data element of the target operand. A first and second equal ordered aggregation operation are performed from the comparisons according to the m data elements of the pattern source operand. A result of the first and second aggregation operations indicating whether or not a possible match exists between the m data elements of the pattern source operand and d data element positions relative to data elements of the target source operand is stored. Ordering of the data elements of the pattern and the target operands may be reversed for the second aggregation operation, and d may be a sum of m−1 and the quantity of target operand elements in some embodiments. | 05-12-2016 |
20160132332 | SIGNAL PROCESSING DEVICE AND METHOD OF PERFORMING A BIT-EXPAND OPERATION - A signal processing device comprising at least one control unit arranged to receive at least one bit-expand instruction, decode the received at least one bit-expand instruction, and output at least one control signal in accordance with the received at least one bit-expand instruction. The signal processing device further includes at least one execution unit component arranged to receive at least one source register value comprising at least one data bit to be expanded, extract at least one data bit from the at least one source register value located at an offset position according to the at least one control signal, expand the at least one extracted data bit into at least one multi-bit data type, and output the at least one multi-bit data type to at least one destination register. | 05-12-2016 |
20160132334 | Method, apparaturs, and system for speculative abort control mechanisms - An apparatus and method is described herein for providing robust speculative code section abort control mechanisms. Hardware is able to track speculative code region abort events, conditions, and/or scenarios, such as an explicit abort instruction, a data conflict, a speculative timer expiration, a disallowed instruction attribute or type, etc. And hardware, firmware, software, or a combination thereof makes an abort determination based on the tracked abort events. As an example, hardware may make an initial abort determination based on one or more predefined events or choose to pass the event information up to a firmware or software handler to make such an abort determination. Upon determining an abort of a speculative code region is to be performed, hardware, firmware, software, or a combination thereof performs the abort, which may include following a fallback path specified by hardware or software. And to enable testing of such a fallback path, in one implementation, hardware provides software a mechanism to always abort speculative code regions. | 05-12-2016 |
20160132335 | Method, apparatus, and system for speculative abort control mechanisms - An apparatus and method is described herein for providing robust speculative code section abort control mechanisms. Hardware is able to track speculative code region abort events, conditions, and/or scenarios, such as an explicit abort instruction, a data conflict, a speculative timer expiration, a disallowed instruction attribute or type, etc. And hardware, firmware, software, or a combination thereof makes an abort determination based on the tracked abort events. As an example, hardware may make an initial abort determination based on one or more predefined events or choose to pass the event information up to a firmware or software handler to make such an abort determination. Upon determining an abort of a speculative code region is to be performed, hardware, firmware, software, or a combination thereof performs the abort, which may include following a fallback path specified by hardware or software. And to enable testing of such a fallback path, in one implementation, hardware provides software a mechanism to always abort speculative code regions. | 05-12-2016 |
20160132336 | Method, apparatus, and system for speculative abort control mechanisms - An apparatus and method is described herein for providing robust speculative code section abort control mechanisms. Hardware is able to track speculative code region abort events, conditions, and/or scenarios, such as an explicit abort instruction, a data conflict, a speculative timer expiration, a disallowed instruction attribute or type, etc. And hardware, firmware, software, or a combination thereof makes an abort determination based on the tracked abort events. As an example, hardware may make an initial abort determination based on one or more predefined events or choose to pass the event information up to a firmware or software handler to make such an abort determination. Upon determining an abort of a speculative code region is to be performed, hardware, firmware, software, or a combination thereof performs the abort, which may include following a fallback path specified by hardware or software. And to enable testing of such a fallback path, in one implementation, hardware provides software a mechanism to always abort speculative code regions. | 05-12-2016 |
20160132337 | Method, apparatus, and system for speculative abort control mechanisms - An apparatus and method is described herein for providing robust speculative code section abort control mechanisms. Hardware is able to track speculative code region abort events, conditions, and/or scenarios, such as an explicit abort instruction, a data conflict, a speculative timer expiration, a disallowed instruction attribute or type, etc. And hardware, firmware, software, or a combination thereof makes an abort determination based on the tracked abort events. As an example, hardware may make an initial abort determination based on one or more predefined events or choose to pass the event information up to a firmware or software handler to make such an abort determination. Upon determining an abort of a speculative code region is to be performed, hardware, firmware, software, or a combination thereof performs the abort, which may include following a fallback path specified by hardware or software. And to enable testing of such a fallback path, in one implementation, hardware provides software a mechanism to always abort speculative code regions. | 05-12-2016 |
20160139929 | THREE-DIMENSIONAL MORTON COORDINATE CONVERSION PROCESSORS, METHODS, SYSTEMS, AND INSTRUCTIONS - A processor includes a plurality of packed data registers, a decode unit, and an execution unit. The decode unit is to decode a three-dimensional (3D) Morton coordinate conversion instruction. The 3D Morton coordinate conversion instruction to indicate a source packed data operand that is to include a plurality of 3D Morton coordinates, and to indicate one or more destination storage locations. The execution unit is coupled with the packed data registers and the decode unit. The execution unit, in response to the decode unit decoding the 3D Morton coordinate conversion instruction, is to store one or more result packed data operands in the one or more destination storage locations. The one or more result packed data operands are to include a plurality of sets of three 3D coordinates. Each of the sets of the three 3D coordinates is to correspond to a different one of the 3D Morton coordinates. | 05-19-2016 |
20160139930 | FOUR-DIMENSIONAL MORTON COORDINATE CONVERSION PROCESSORS, METHODS, SYSTEMS, AND INSTRUCTIONS - A processor includes packed data registers, a decode unit, and an execution unit. The decode unit is to decode a four-dimensional (4D) Morton coordinate conversion instruction. The 4D Morton coordinate conversion instruction is to indicate a source packed data operand that is to include a plurality of 4D Morton coordinates, and is to indicate one or more destination storage locations. The execution unit is coupled with the packed data registers and the decode unit. The execution unit, in response to the decode unit decoding the 4D Morton coordinate conversion instruction, is to store one or more result packed data operands in the one or more destination storage locations. The one or more result packed data operands are to include a plurality of sets of four 4D coordinates. Each of the sets of the four 4D coordinates is to correspond to a different one of the 4D Morton coordinates. | 05-19-2016 |
20160139931 | MORTON COORDINATE ADJUSTMENT PROCESSORS, METHODS, SYSTEMS, AND INSTRUCTIONS - A processor includes a decode unit to decode an instruction that is to indicate a source packed data operand to include Morton coordinates, a dimensionality of a multi-dimensional space having points that the Morton coordinates are to be mapped to, a given dimension of the multi-dimensional space, and a destination. The execution unit is coupled with the decode unit. The execution unit, in response to the decode unit decoding the instruction, stores a result packed data operand in the destination. The result operand is to include Morton coordinates that are each to correspond to a different one of the Morton coordinates of the source operand. The Morton coordinates of the result operand are to be mapped to points in the multi-dimensional space that differ from the points that the corresponding Morton coordinates of the source operand are to be mapped to by a fixed change in the given dimension. | 05-19-2016 |
20160154652 | PACKED DATA OPERATION MASK COMPARISON PROCESSORS, METHODS, SYSTEMS, AND INSTRUCTIONS | 06-02-2016 |
20160179539 | INSTRUCTION AND LOGIC TO PERFORM A CENTRIFUGE OPERATION | 06-23-2016 |
20160179541 | STATELESS CAPTURE OF DATA LINEAR ADDRESSES DURING PRECISE EVENT BASED SAMPLING | 06-23-2016 |
20160179542 | INSTRUCTION AND LOGIC TO PERFORM A FUSED SINGLE CYCLE INCREMENT-COMPARE-JUMP | 06-23-2016 |
20160179543 | INSTRUCTION SET ARCHITECTURE WITH OPCODE LOOKUP USING MEMORY ATTRIBUTE | 06-23-2016 |
20160179549 | Instruction and Logic for Loop Stream Detection | 06-23-2016 |
20160188328 | SYSTEMS, APPARATUSES, AND METHODS FOR DATA SPECULATION EXECUTION - Systems, methods, and apparatuses for data speculation execution (DSX) are described. In some embodiments, a hardware apparatus for performing DSX comprises a hardware decoder to decode an instruction, the instruction to include an opcode, and execution hardware to execute the decoded instruction to reset data speculative execution (DSX) tracking hardware to track speculative memory accesses, clear a DSX status indication in a DSX status register, and commit all speculatively executed stores of the DSX region and thereby end a DSX region. | 06-30-2016 |
20160188329 | SYSTEMS, APPARATUSES, AND METHODS FOR DATA SPECULATION EXECUTION - Systems, methods, and apparatuses for data speculation execution (DSX) are described. In some embodiments, a hardware apparatus for performing DSX comprises a hardware decoder to decode an instruction, the instruction to include an opcode, and execution hardware to execute the decoded instruction to continue a data speculative execution (DSX) and to determine that a DSX loop iteration is to be committed, commit speculative stores associated with the DSX loop iteration, and start a new DSX loop iteration. | 06-30-2016 |
20160188330 | SYSTEMS, APPARATUSES, AND METHODS FOR DATA SPECULATION EXECUTION - Systems, methods, and apparatuses for data speculation execution (DSX) are described. In some embodiments, a hardware apparatus for performing DSX comprises a hardware decoder to decode an instruction, the instruction to include an opcode and an operand to store a portion of a fallback address and an operand to store a stride value, execution hardware to execute the decoded instruction to initiate a data speculative execution (DSX) region by activating DSX tracking hardware to track speculative memory accesses and detect ordering violations in the DSX region, and storing the fallback address. | 06-30-2016 |
20160188331 | SIGNAL PROCESSING DEVICE AND METHOD OF PERFORMING A PACK-INSERT OPERATION - A signal processing device comprising at least one control unit arranged to receive at least one pack-insert instruction, decode the received at least one pack-insert instruction, and output at least one pack-insert control signal in accordance with the received pack-insert instruction. The signal processing device further comprising at least one pack-insert component arranged to receive at least a first data block to be inserted into a sequence of data blocks to be output to at least one destination register, receive a plurality of further data blocks to be packed within the sequence of data blocks to be output to the at least one destination register, arrange the at least first data block and the plurality of further data blocks into a sequence of data blocks based at least partly on the at least one pack-insert control signal, and output the sequence of data blocks. | 06-30-2016 |
20160188332 | SIMD SIGN OPERATION - Method, apparatus, and program means for nonlinear filtering and deblocking applications utilizing SIMD sign and absolute value operations. The method of one embodiment comprises receiving first data for a first block and second data for a second block. The first data and said second data are comprised of a plurality of rows and columns of pixel data. A block boundary between the first block and the second block is characterized. A correction factor for a deblocking algorithm is calculated with a first instruction for a sign operation that multiplies and with a second instruction for an absolute value operation. Data for pixels located along said block boundary between the first and second block are corrected. | 06-30-2016 |
20160188341 | APPARATUS AND METHOD FOR FUSED ADD-ADD INSTRUCTIONS - In one embodiment of the invention, a processor including a storage location configured to store a set of source packed-data operands, each of the operands having a plurality of packed-data elements that are positive or negative according to an immediate bit value within one of the operands. The processor also including: a decoder to decode an instruction requiring an input of a plurality of source operands, and an execution unit to receive the decoded instructions and to generate a result that is a sum of the source operands. In one embodiment, the result is stored back into one of the source operands or the result is stored into an operand that is independent of the source operands. | 06-30-2016 |
20160188342 | SYSTEMS, APPARATUSES, AND METHODS FOR DATA SPECULATION EXECUTION - Systems, methods, and apparatuses for data speculation execution (DSX) are described. In some embodiments, a hardware apparatus for performing DSX comprises a hardware decoder to decode an instruction, the instruction to include an opcode, and execution hardware to execute the decoded instruction inside a speculative execution (DSX) and rollback execution to a stored address and clear a DSX status indication in a DSX status register, and thereby abort the DSX. | 06-30-2016 |
20160196139 | SYSTEMS, APPARATUSES, AND METHODS FOR PERFORMING A SHUFFLE AND OPERATION (SHUFFLE-OP) | 07-07-2016 |
20160202975 | Method and Apparatus to Process 4-Operand SIMD Integer Multiply-Accumulate Instruction | 07-14-2016 |
20160202982 | INDIRECT INSTRUCTION PREDICATION | 07-14-2016 |
20160378467 | PERSISTENT COMMIT PROCESSORS, METHODS, SYSTEMS, AND INSTRUCTIONS - A processor includes at least one memory controller, and a decode unit to decode a persistent commit demarcate instruction. The persistent commit demarcate instruction is to indicate a destination storage location. The processor also includes an execution unit coupled with the decode unit and the at least one memory controller. The execution unit, in response to the persistent commit demarcate instruction, is to store a demarcation value in the destination storage location. The demarcation value may demarcate at least all first store to persistent memory operations that are to have been accepted to memory when the persistent commit demarcate instruction is performed, but which are not necessarily to have been stored persistently, from at least all second store to persistent memory operations that are not yet to have been accepted to memory when the persistent commit demarcate instruction is performed. | 12-29-2016 |
20160378470 | INSTRUCTION AND LOGIC FOR TRACKING FETCH PERFORMANCE BOTTLENECKS - A processor includes a front end, an execution unit, a retirement stage, a counter, and a performance monitoring unit. The front end includes logic to receive an event instruction to enable supervision of a front end event that will delay execution of instructions. The execution unit includes logic to set a register with parameters for supervision of the front end event. The front end further includes logic to receive a candidate instruction and match the candidate instruction to the front end event. The counter includes logic to generate the front end event upon retirement of the candidate instruction. | 12-29-2016 |
20160378473 | INSTRUCTION AND LOGIC FOR CHARACTERIZATION OF DATA ACCESS - A processor includes a front end to receive an instruction, a decoder to decode the instruction, a core to execute the first instruction, and a retirement unit to retire the first instruction. The core includes logic to execute the first instruction, including logic to repeatedly record a translation lookaside buffer (TLB) until a designated number of records are determined, and flush the TLB after a flush interval. | 12-29-2016 |
20160378487 | EFFICIENT INSTRUCTION FUSION BY FUSING INSTRUCTIONS THAT FALL WITHIN A COUNTER-TRACKED AMOUNT OF CYCLES APART - A technique to enable efficient instruction fusion within a computer system. In one embodiment, a processor logic delays the processing of a second instruction for a threshold amount of time if a first instruction within an instruction queue is fusible with the second instruction. | 12-29-2016 |
20170235571 | Method and Apparatus to Process 4-Operand SIMD Integer Multiply-Accumulate Intruction | 08-17-2017 |
20170235572 | PROVIDING VECTOR HORIZONTAL COMPARE FUNCTIONALITY WITHIN A VECTOR REGISTER | 08-17-2017 |
20170235576 | METHOD AND APPARATUS FOR SPECULATIVE DECOMPRESSION | 08-17-2017 |
20190146791 | METHODS, APPARATUS, INSTRUCTIONS AND LOGIC TO PROVIDE POPULATION COUNT FUNCTIONALITY FOR GENOME SEQUENCING AND ALIGNMENT | 05-16-2019 |
20190146792 | APPARATUSES, METHODS, AND SYSTEMS FOR ELEMENT SORTING OF VECTORS | 05-16-2019 |
20220137971 | INSTRUCTION LENGTH BASED PARALLEL INSTRUCTION DEMARCATOR - Instruction length based parallel instruction demarcators and methods for parallel instruction demarcation are included, wherein an instruction sequence is received at an instruction buffer, the instruction sequence comprising a plurality of instruction syllables, and the instruction sequence is stored at the instruction buffer. It is determined, using one or more logic blocks arranged in a sequence, a length of instructions and at least one boundary. Additionally, using a controlling logic block, the sequence is demarcated into individual instructions. | 05-05-2022 |