Entries |
Document | Title | Date |
20080201560 | VERY LONG INSTRUCTION WORD (VLIW) COMPUTER HAVING EFFICIENT INSTRUCTION CODE FORMAT - A Very Long Instruction Word (VLIW) processor having an instruction set with a reduced size resulting in a small number of bits being necessary to specify registers. The VLIW processor includes a register file, and first through third operation units, and executes a very long instruction word. Further, the very long instruction word includes a register specifying field which specifies a least one of the registers in the register file and a plurality of instructions. The operand of each instruction includes bits src | 08-21-2008 |
20080209187 | Storing and processing SIMD saturation history flags and data size - A method and apparatus for calculation and storage of Single-Instruction-Multiple-Data (SIMD) saturation history information. A first coprocessor instruction has a first format identifying a saturation operation, a first source having packed data elements and a second source having packed data elements. The saturating operation is executed on the packed data elements of the first and second sources. Saturation flags are stored in the Wireless Coprocessor Saturation Status Flag (wCSSF) register to indicate if a result of the saturating operation saturated. A second coprocessor instruction has a second format identifying a saturation history processing operation and a saturation data size. An operand for the processing operation is determined based on the saturation data size, and the processing operation is executed on the saturation flags and the operand for the saturation data size. Condition code flags are stored in a status register to indicate the result of processing operation. | 08-28-2008 |
20080215860 | Software Protection Using Code Overlapping - Apparatus and methods for implementing software protection using code overlapping are disclosed. In one implementation, a combination block comprising a first sub-block of instructions with one or more interspersed obfuscation instructions is received. The obfuscation instructions interspersed among sequentially executable instructions of the first sub-block of instructions can include instructions from other sub-blocks as well as control instructions configured to guide a processor to execute all of the instructions in first sub-block of instructions in sequence. The obfuscation instructions are replaced with one or more replacement instructions. The replacement instructions can be of a same bit-length as the replaced obfuscation instructions. Moreover, the replacement instructions can include integrity checks configured to check for tampering with instructions and/or runtime program state in the first sub-block and/or the combination block. | 09-04-2008 |
20080215861 | METHOD AND APPARATUS FOR EFFICIENT RESOURCE UTILIZATION FOR PRESCIENT INSTRUCTION PREFETCH - Embodiments of an apparatus, system and method enhance the efficiency of processor resource utilization during instruction prefetching via one or more speculative threads. Renamer logic and a map table are utilized to perform filtering of instructions in a speculative thread instruction stream. The map table includes a yes-a-thing bit to indicate whether the associated physical register's content reflects the value that would be computed by the main thread. A thread progress beacon table is utilized to track relative progress of a main thread and a speculative helper thread. Based upon information in the thread progress beacon table, the main thread may effect termination of a helper thread that is not likely to provide a performance benefit for the main thread. | 09-04-2008 |
20080229082 | CONTROL SUB-UNIT AND CONTROL MAIN UNIT - A sub-unit judges whether an instruction received from an external unit is executable. If the instruction is judged to be executable, the sub-unit executes it. If, on the other hand, the instruction is judged to be unexecutable, the sub-unit notifies the external unit of an executable plan. | 09-18-2008 |
20080244244 | PARALLEL INSTRUCTION PROCESSING AND OPERAND INTEGRITY VERIFICATION - A method includes accessing, at a processing device, operand data associated with an instruction operation from a data cache and executing, at the processing device, the instruction operation using the operand data prior to determining the validity of the operand data. The method further includes retiring, at the processing device, the instruction operation in response to determining the operand data is valid. A processing device includes a data cache and an instruction pipeline. The instruction pipeline includes an execution stage configured to execute an instruction operation using operand data access from the data cache prior to determining the validity of the operand data and a retire stage configured to retire the instruction operation in response to determining the operand data is valid. | 10-02-2008 |
20080256344 | Scan Configuration of Field Programmable Gate Arrays - Embodiments herein include a method, service, apparatus, etc., that sets at least one integrated circuit board (that has programmable elements) to a programming state. When such programmable elements are set to the programming state, they are capable of being changed. Once the programmable elements are set to be changed, at least one printed sheet is scanned. The scanning can be preformed using any scanner that is operatively connected to the integrated circuit board through, for example, a processor. Again, the printed data (e.g., the barcodes or glyphs or other computer-only readable markings) on the printed sheet comprises the reprogramming data. The processor reads the barcodes or glyphs from the bitmap generated by the scanner, and executes the reprogramming data to change the logical instructions and data maintained within the programmable elements. | 10-16-2008 |
20080270769 | PROCESS FOR RUNNING PROGRAMS ON PROCESSORS AND CORRESPONDING PROCESSOR SYSTEM - Programs having a given instruction-set architecture are executed on a multiprocessor system comprising a plurality of processors, for example of a VLIW type, each of said processors being able to execute, at each processing cycle, a respective maximum number of instructions. The instructions are compiled as instruction words of given length executable on a first processor. At least some of the instruction words of given length are converted into modified-instruction words executable on a second processor. The operation of modifying comprises in turn at least one operation chosen in the group consisting of: splitting the instruction words into modified-instruction words; and entering no-operation instructions in the modified-instruction words. | 10-30-2008 |
20080294880 | CUSTOMIZATION OF A MICROPROCESSOR AND DATA PROTECTION METHOD - An electronic circuit containing a processing unit for executing program instructions, including at least one unit for recognizing at least one first instruction operator in the program and for converting this first operator into another instruction operator, both operators being interpretable by the processing unit. A method for controlling the access to data by such a circuit. | 11-27-2008 |
20080320286 | DYNAMIC OBJECT-LEVEL CODE TRANSLATION FOR IMPROVED PERFORMANCE OF A COMPUTER PROCESSOR - A system and method for improving the efficiency of an object-level instruction stream in a computer processor. Translation logic for generating translated instructions from an object-level instruction stream in a RISC-architected computer processor, and an execution unit which executes the translated instructions, are integrated into the processor. The translation logic combines the functions of a plurality of the object-level instructions into a single translated instruction which can be dispatched to a single execution unit as compared with the untranslated instructions, which would otherwise be serially dispatched to separate execution units. Processor throughput is thereby increased since the number of instructions which can be dispatched per cycle is extended. | 12-25-2008 |
20090013160 | DYNAMICALLY COMPOSING PROCESSOR CORES TO FORM LOGICAL PROCESSORS - A method, system and computer program product for dynamically composing processor cores to form logical processors. Processor cores are composable in that the processor cores are dynamically allocated to form a logical processor to handle a change in the operating status. Once a change in the operating status is detected, a mechanism may be triggered to recompose one or more processor cores into a logical processor to handle the change in the operating status. An analysis may be performed as to how one or more processor cores should be recomposed to handle the change in the operating status. After the analysis, the one or more processor cores are recomposed into the logical processor to handle the change in the operating status. By dynamically allocating the processor cores to handle the change in the operating status, performance and power efficiency is improved. | 01-08-2009 |
20090013161 | PROCESSOR FOR MAKING MORE EFFICIENT USE OF IDLING COMPONENTS AND PROGRAM CONVERSION APPARATUS FOR THE SAME - A processor that has a plurality of instruction slots each of which stores an instruction to be executed in parallel. One of the plurality of instruction slots is a first instruction slot and another a second instruction slot. A special instruction stored in the first instruction slot is executed by a first functional unit that executes instructions stored in the first instruction slot, and a second functional unit that executes instructions stored in the second instruction slot. An instruction stored in the second instruction slot is executed in parallel by a third functional unit that executes instructions stored in the second instruction slot. | 01-08-2009 |
20090024839 | Variable instruction set microprocessor - A new Variable instruction set microprocessor consisting of a fixed instruction set part, together with a programmable logic part, where custom instructions can be dynamically added at any time of the work of the said. | 01-22-2009 |
20090024840 | INSTRUCTION CODE COMPRESSION METHOD AND INSTRUCTION FETCH CIRCUIT - An instruction code compression method and an instruction fetch circuit which are capable of reducing both the number of fetches and program codes. A reuse flag is provided in an upper bit group including operational codes, and a lower bit group including operands and having the same number of bits as the upper bit group. When 2N+1 (N is an integer of 1 or more) instruction codes having the same upper bit group continues in a series of instruction codes, respective reuse flags of the lower bit group of a 2n-th (n is an integer of 1 or more and N or less) instruction code and a (2n+1)-th instruction code in the series of instruction codes are set to “1”, and the lower bit groups of the 2n-th and (2n+1)-th instruction codes are integreted into one compressed instruction code. | 01-22-2009 |
20090031120 | Method and Apparatus for Dynamically Fusing Instructions at Execution Time in a Processor of an Information Handling System - One embodiment of a processor includes a fetch stage, decoder stage, execution stage and completion stage. The execution stage includes a primary execution stage for handling low latency instructions and a secondary execution stage for handling higher latency instructions. A detector determines if an instruction is a high latency instruction or a low latency instruction. If the detector also finds that a particular low latency instruction is dependent on, and destructive of, a corresponding high latency instruction, then the secondary execution stage dynamically fuses the execution of the low latency instruction together with the execution of the high latency instruction. Otherwise, the primary execution stage handles the execution of the low latency instruction. | 01-29-2009 |
20090031121 | APPARATUS AND METHOD FOR REAL-TIME MICROCODE PATCH - An apparatus for performing microcode patches that is both fast and flexible. In one embodiment, an apparatus for performing a real-time microcode patch is provided. The apparatus includes a patch array and a mux. The patch array receives a microcode ROM address and determines that the microcode ROM address matches one of a plurality of entries within the patch array. When the microcode ROM address matches, the patch array outputs a corresponding patch instruction and to assert a hit signal. The mux receives the patch instruction from the patch array and a micro instruction corresponding to the microcode ROM address from a microcode ROM. The mux provides the micro instruction or the corresponding patch instruction to an instruction register based upon the state of the hit signal. | 01-29-2009 |
20090070564 | METHOD AND SYSTEM OF ALIGNING EXECUTION POINT OF DUPLICATE COPIES OF A USER PROGRAM BY EXCHANGING INFORMATION ABOUT INSTRUCTIONS EXECUTED - Aligning execution point of duplicate copies of a user program by exchanging information about instructions executed. At least some of the exemplary embodiments may be a method of operating duplicate copies of a user program in a first and second processor, allowing at least one of the user programs to execute until retired instruction counter values in each processor are substantially the same, and then executing a number of instructions of each user program. Of the instructions executed, at least some of the instructions are decoded and the inputs of each decoded instruction determined (the decoding substantially simultaneously with executing in each processor). The method further may include exchanging among the processors addresses of decoded instructions and values indicative of inputs of the decoded instructions, determining that an execution point of the user program in the first processor lags with respect to an execution point of the user program in the second processor using at least the addresses of the decoded instructions, and advancing the first processor until the execution point within each user program is substantially aligned. | 03-12-2009 |
20090077355 | INSTRUCTION EXPLOITATION THROUGH LOADER LATE FIX UP - A method, computer program product, and data processing system for substituting a candidate instruction in application code being loaded during load time. Responsive to identifying the candidate instruction, a determination is made whether a hardware facility of the data processing system is present to execute the candidate instruction. If the hardware facility is absent from the data processing system, the candidate instruction is substituted with a second set of instructions. | 03-19-2009 |
20090077356 | LOAD TIME INSTRUCTION SUBSTITUTION - A method, computer program product, and data processing system for substituting a candidate instruction in application code being loaded during load time. Responsive to identifying the candidate instruction, a determination is made whether a hardware facility of the data processing system is present to execute the candidate instruction. If the hardware facility is absent from the data processing system, the candidate instruction is substituted with a second set of instructions. | 03-19-2009 |
20090083525 | SYSTEMS AND METHODS THAT FACILITATE MANAGEMENT OF ADD-ON INSTRUCTION GENERATION, SELECTION, AND/OR MONITORING DURING EXECUTION - The subject invention relates to systems and methods that facilitate display, selection, and management of context associated with execution of add-on instructions. The systems and methods track add-on instruction calls provide a user with call and data context, wherein the user can select a particular add-on instruction context from a plurality of contexts in order to observe values and/or edit parameters associated with the add-on instruction. The add-on instruction context can include information such as instances of data for particular lines of execution, the add-on instruction called, a caller of the instruction, a location of the instruction call, references to complex data types and objects, etc. The systems and methods further provide a technique for automatic routine selection based on the add-on instruction state information such that the add-on instruction executed corresponds to a current state. | 03-26-2009 |
20090089560 | INFRASTRUCTURE FOR PARALLEL PROGRAMMING OF CLUSTERS OF MACHINES - GridBatch provides an infrastructure framework that hides the complexities and burdens of developing logic and programming application that implement detail parallelized computations from programmers. A programmer may use GridBatch to implement parallelized computational operations that minimize network bandwidth requirements, and efficiently partition and coordinate computational processing in a multiprocessor configuration. GridBatch provides an effective and lightweight approach to rapidly build parallelized applications using economically viable multiprocessor configurations that achieve the highest performance results. | 04-02-2009 |
20090094443 | INFORMATION PROCESSING APPARATUS AND METHOD THEREOF, PROGRAM, AND STORAGE MEDIUM - An information processing apparatus includes a determining unit adapted to determine a target instruction to be modified to a camouflaged instruction among instructions contained in a processing target program, a camouflaged instruction generating unit adapted to generate the camouflaged instruction corresponding to the target instruction, a restore command generating unit adapted to generate a restore command for restoring the generated camouflaged instruction to the corresponding target instruction, and a unit adapted to modify the target instruction contained in the processing target program with the generated camouflaged instruction and add the restore command to the program, wherein the restore command performs the restoration by referencing a memory storing an output value of a processing command contained in the processing target program and identifying the position of the target instruction in the program or the target instruction based on the referenced value. | 04-09-2009 |
20090106538 | System and Method for Implementing a Hardware-Supported Thread Assist Under Load Lookahead Mechanism for a Microprocessor - The present invention includes a system and method for implementing a hardware-supported thread assist under load lookahead mechanism for a microprocessor. According to an embodiment of the present invention, hardware thread-assist mode can be activated when one thread of the microprocessor is in a sleep mode. When load lookahead mode is activated, the fixed point unit copies the content of one or more architected facilities from an active thread to corresponding architected facilities in the first inactive thread. The load-store unit performs at least one speculative load in load lookahead mode and writes the results of the at least one speculative load to a duplicated architected facility in the first inactive thread. | 04-23-2009 |
20090113189 | Method and System for Hiding Information in the Instruction Processing Pipeline - A system, article of manufacture and method is provided for transferring secret information from a first location to a second location. The secret information is encoded and stalls in executable code are located. The executable code is configured to perform a predetermined function when executed on a pipeline processor. The encoded information is inserted into a plurality of instructions and the instructions are inserted into the executable code at the stalls. There is no net effect of all of the inserted instructions on the predetermined function of the executable code. The executable code is transferred to the second location. The location of the stalls in the transferred code is identified. The encoded information is extracted from the instructions located at the stalls. The encoded information may then be decoding information to generate the information at the second location. | 04-30-2009 |
20090132795 | Processor with excludable instructions and registers and changeable instruction coding for antivirus protection - Digital processor architecture is characterized by processor's instruction set and registers. If architecture is fixed and known to software developers the viruses may be created to harm computers. Invented processor architecture protects against viruses by modifying of association between instruction set coding and processor's functions. Additionally, invented architecture allows to exclude processor's parts associated with unused by program instructions and exclude registers. Exclusion of processor's parts unused by program makes processor smaller and faster in comparison with processor containing all blocks. Developed architecture also allows to exclude unused portions of instructions from instruction's format resulting in smaller memory size required for the same program. | 05-21-2009 |
20090158015 | Uses of Known Good Code for Implementing Processor Architectural Modifications - In one embodiment, a processor comprises a programmable map and a circuit. The programmable map is configured to store data that identifies at least one instruction for which an architectural modification of an instruction set architecture implemented by the processor has been defined, wherein the processor does not implement the modification. The circuitry is configured to detect the instruction or its memory operands and cause a transition to Known Good Code (KGC), wherein the KGC is protected from unauthorized modification and is provided from an authenticated entity. The KGC comprises code that, when executed, emulates the modification. In another embodiment, an integrated circuit comprises at least one processor core; at least one other circuit; and a KGC source configured to supply KGC to the processor core for execution. The KGC comprises interface code for the other circuit whereby an application executing on the processor core interfaces to the other circuit through the KGC. | 06-18-2009 |
20090164764 | PROCESSOR AND DEBUGGING DEVICE - A processor according to the present invention is capable of executing instructions in parallel, the processor further executing a string of instructions consisting of a plurality of instructions allocated at continuous addresses as an execution unit, comprising an instruction analyzer, an instruction executor and an instruction canceling unit. The instruction analyzer comprising debug instruction detectors for detecting a debug instruction which generates debug interruption, the instruction detectors of the same number as the instructions executable in parallel by the processor is provided. The instruction executor for executing a group of instructions comprising at least one of the instructions that is included in the same string of the instructions as the detected debug instruction and allocated at an address of a lower-order position than the detected debug instruction when the debug instruction is detected by the instruction analyzer, and an instruction canceller for canceling execution of a group of instructions comprising at least one of the instructions that is included in the same string of the instructions as the detected debug instruction and allocated at an address of a higher-order position than the detected debug instruction when the debug instruction is detected by the instruction analyzer. | 06-25-2009 |
20090177871 | ARCHITECTURAL SUPPORT FOR SOFTWARE THREAD-LEVEL SPECULATION - A system for thread-level speculation includes a memory system for storing a program code, a plurality of registers corresponding to one or more execution contexts, for storing sets of memory addresses that are accessed speculatively, and a plurality of processors, each providing the one or more execution contexts, in communication with the memory system, wherein a processor of the plurality of processors executes the program code to implement method steps of dividing a program into a plurality of epochs to be executed in parallel by the system, wherein one of the epochs is executed non-speculatively and the other epochs are executed speculatively, determining a current epoch to be executed on an execution context, encoding addresses read during execution of the current epoch, encoding addresses written during execution of predecessor epochs of the current epoch, and encoding addresses written during execution of successor epochs of the current epoch. | 07-09-2009 |
20090198977 | Sharing Data in Internal and Memory Representations with Dynamic Data-Driven Conversion - Illustrative embodiments determine the data type of the operand being accessed as well as analyze the data value subrange of the input operand data type. If the operand's data type does not match the required format of the instruction being processed, a determination is made as to whether a subrange of data values of the data type of the input operand is supported natively. If the subrange of data values of the input operand is not supported natively, then a format conversion is performed on the data and the instruction may then operate on the data. Otherwise, the data may be operated on directly by the instruction without a format conversion operation and thus, the conversion is not performed. | 08-06-2009 |
20090198978 | Data processing apparatus and method for identifying sequences of instructions - A data processing apparatus is provided comprising a processing unit for executing instructions, a cache structure for storing instructions retrieved from memory for access by the processing unit, and profiling logic for identifying a sequence of instructions that is functionally equivalent to an accelerator instruction. When such a sequence of instructions is identified, the equivalent accelerator instruction is stored in the cache structure as a replacement for the first instruction of the sequence, with the remaining instructions in the sequence of instructions being stored unchanged. The accelerator instruction includes an indication to cause the processing unit to skip the remainder of the sequence when executing the accelerator instruction. | 08-06-2009 |
20090217009 | SYSTEM, METHOD AND COMPUTER PROGRAM PRODUCT FOR TRANSLATING STORAGE ELEMENTS - A system, method and computer program product for translations in a computer system. The system includes a general purpose register containing a base address of an address translation table. The system also includes a millicode accessible special displacement register configured to receive a plurality of elements to be translated. The system further includes a multiplexer for selecting a particular one of the plurality of elements from the millicode accessible special displacement register and for generating a displacement or offset value. The system further includes an address generator for creating a combined address containing the base address from the general purpose register and the generated displacement or offset value. | 08-27-2009 |
20090240928 | CHANGE IN INSTRUCTION BEHAVIOR WITHIN CODE BLOCK BASED ON PROGRAM ACTION EXTERNAL THERETO - Extended, alternate and/or modified instruction behavior can be established using a program construct that appears outside a bounded block of program code in such a way that the behavioral changes are limited to the bounded block and coincide with a particular point in the execution thereof. These extensions, alternations and/or modifications are supported in some processor embodiments in ways that add neither additional code space nor additional execution cycles to the bounded block. In general, the particular point in execution of the bounded block may be specified in a variety of ways, including positionally or temporally. Techniques described herein have broad applicability, but will be understood by persons of ordinary skill in the art in the context of certain illustrative code blocks, including zero- (or low-) overhead loops, lightweight procedures and very long instruction word (VLIW) type instruction packets, and processors that support them. | 09-24-2009 |
20090249043 | Apparatus, method and computer program for processing instruction - A plurality of instructions to be executed in an order of being issued without an appointment of a waiting time or a starting moment are designed to be executed after a certain waiting time; instructions to be executed in an order of being issued without designation of starting moment or waiting time are provided with starting moment or waiting time information so that the instructions can be executed in an order designated by the time information. | 10-01-2009 |
20090254738 | OBFUSCATION DEVICE, PROCESSING DEVICE, METHOD, PROGRAM, AND INTEGRATED CIRCUIT THEREOF - It is an object of the present invention to provide an obfuscation device that can achieve both sufficient obfuscation and the appropriate instruction block to be executed. In the obfuscation device, a first instruction generating unit, for each of the first process and the second process, generates an initialization instruction for securing a management area for managing the identification information indicating an instruction block that should be executed next so as to proceed with the process, and to store the initialization instruction in said storage unit. Further, a second instruction generating unit generates a selection instruction (i) to make a first selection selecting a process that should be proceeded out of the first process and the second process, (ii) to make a second selection selecting an instruction block indicated by the identification information managed in the management area as an instruction block that should be executed for proceeding with the process selected by the first selection, and (iii) to cause the execution device to execute the instruction block selected by the second selection, and stores the selection instruction in said storage unit. Furthermore, a third instruction generating unit generates an updating instruction for updating, when the second process is selected by the first selection, and when the loop instruction included in the second process is executed, the identification information regarding the first process managed in the management area to identification information indicating an instruction block to be executed next in the first process which is subsequently selected by the first selection, and to store the updating instruction in said storage unit. | 10-08-2009 |
20090271593 | PATCHING DEVICE FOR PATCHING ROM CODE, METHOD FOR PATCHING ROM CODE, AND ELECTRONIC DEVICE UTILIZING THE SAME - An electronic device comprising a ROM, a reprogrammable memory, a processor, and a patching device. The ROM stores a first function starting from a first address, the reprogrammable memory stores a second function starting from a second address, the patching device couples to the ROM and the reprogrammable memory, and the processor couples to the patching device. The patching device receives directive information from the processor and determines whether the processor is going to fetch the first function, and generates and returns a branch instruction to the processor when the processor is going to fetch the first function. After receiving the branch instruction, the processor executes the branch instruction to cause an unconditional jump to the second address and subsequently fetches the second function. | 10-29-2009 |
20090292906 | Multi-Mode Register File For Use In Branch Prediction - A multi-mode register file is described. In one embodiment, the multi-mode register file includes an operand in a first mode. The multi-mode register file further includes auxiliary information which replaces the operand in a second mode. | 11-26-2009 |
20090319761 | HARDWARE CONSTRAINED SOFTWARE EXECUTION - Restricting execution by a computing device of instructions within an application program. The application program is modified such that execution of the selected instructions is dependent upon a corresponding expected state of one or more hardware components in the computing device. In an embodiment, the application program is modified to place the hardware components in the expected states prior to execution of the corresponding selected instructions. Creating the dependency on the hardware components prevents the unintended or malicious execution of the selected instructions. | 12-24-2009 |
20090327669 | INFORMATION PROCESSING APPARATUS, PROGRAM EXECUTION METHOD, AND STORAGE MEDIUM - According to one embodiment, an information processing apparatus comprises a storage storing program modules and parallel execution control description describing relationships of the program modules, a conversion module extracting a part relating to the program module from the parallel execution control description, and creating graph data structure creation information including preceding and succeeding information of the program module, an adding module extracting graph data structure creation information to which the input data is given, creating a node, and adding the created node to a formerly created graph data structure, and an execution module subjecting the graph data structure to at least one of depth-first search and breadth-first search with a restricted breadth, selecting one node from nodes stored in the node memory, and executing a program module corresponding to the selected node. | 12-31-2009 |
20100037039 | INSTRUCTION OPERATION CODE GENERATION SYSTEM - It is possible to increase the processor instruction set design job efficiency and reduce workload on designers in investigation of an instruction set. An instruction operation code generation system includes an operation code bit width decision means, an instruction sorting means, and an operation code value decision means. The operation code bit width decision means decides a bit width that can be assigned for an operation code of each instruction according to specification data associated with a processor instruction set. The instruction sorting means sorts the instructions according to the operation code bit width. The operation code value decision means decides the value of the operation code of each instruction. | 02-11-2010 |
20100049954 | METHOD FOR SPECULATIVE EXECUTION OF INSTRUCTIONS AND A DEVICE HAVING SPECULATIVE EXECUTION CAPABILITIES - A method for speculative execution of instructions, the method includes: decoding a compare instruction; speculatively executing, in a continuous manner, conditional instructions that are conditioned by a condition that is related to a resolution of the compare instruction and are decoded during a speculation window that starts at the decoding of the compare instruction and ends when the compare instruction is resolved; and stalling an execution of a non-conditional instruction that is dependent upon an outcome of at least one of the conditional instructions, until the speculation window ends. | 02-25-2010 |
20100064122 | FAST STRING MOVES - A microprocessor REP MOVS macroinstruction specifies the word length of the string in the IA-32 ECX register. The microprocessor includes a memory, configured to store a first and second sequence of microinstructions. The first sequence conditionally transfers control to a microinstruction within the first sequence based on the ECX register. The second sequence does not conditionally transfer control based on the ECX register. The microprocessor includes an instruction translator, coupled to the memory. In response to a macroinstruction that moves an immediate value into the ECX register, the instruction translator sets a flag and saves the immediate value. In response to a macroinstruction that modifies the ECX register in a different manner, the translator clears the flag. In response to a REP MOVS macroinstruction, the instruction translator transfers control to the first sequence if the flag is clear; and transfers control to the second sequence if the flag is set. | 03-11-2010 |
20100106949 | SOURCE CODE PROCESSING METHOD, SYSTEM AND PROGRAM - A method, system, and computer readable article of manufacture to enable parallel execution of a divided source code in a multiprocessor system. The method includes the steps of: inputting an original source code by an input device into the computing apparatus; finding a critical path in the original source code by a critical path cut module; cutting the critical path in the original source code into a plurality of process block groups by the critical path cut module; and dividing the plurality of process block groups among a plurality of processors in the multiprocessor system by a CPU assignment code generation module to produce the divided source code. The system includes an input device; a critical path cut module; and a CPU assignment code generation unit to produce the divided source code. The computer readable article of manufacture includes instructions to implement the method. | 04-29-2010 |
20100115247 | REPLACEMENT POLICY FOR HOT CODE DETECTION - Methods and apparatus relating to a replacement policy for hot code detection are described. In some embodiments, it may be determined which entry amongst a plurality of entries stored in storage unit is to be replaced next. The entries may correspond to hot code and may store age and execution frequency information corresponding to the hot code. Other embodiments are also described and claimed. | 05-06-2010 |
20100115248 | Technique for promoting efficient instruction fusion - A technique to enable efficient instruction fusion within a computer system. In one embodiment, a processor logic delays the processing of a second instruction for a threshold amount of time if a first instruction within an instruction queue is fusible with the second instruction. | 05-06-2010 |
20100138638 | SYSTEM AND METHOD OF INSTRUCTION MODIFICATION - A method and system of instruction modification. A first machine language instruction, which may comprise a plurality of discrete instructions, is fetched. Responsive to a trigger pattern in the first machine language instruction, a segment of the first machine language instruction is modified. Information can be substituted into the segment based on specifics outlined in the trigger pattern. Alternatively, information can be combined with the segment via logical and/or arithmetic operations. Modification of the segment produces a second machine language instruction that is executed by units of the processor. In one embodiment, information may be taken from a queue and used to replace data from the segment. How information is taken from the queue and how the information so taken is used to replace fields of the segment are defined by the trigger pattern. | 06-03-2010 |
20100146247 | Method and System for Managing Out of Order Dispatched Instruction Queue with Partially De-Coded Instruction Stream - A computer-implemented method and apparatus for managing an out of order dispatched instruction queue in a microprocessor. In one embodiment, the method and apparatus include assigning a group identification number and a target identification number to an instruction in an instruction stream. The group identification number and the target identification number are labeled after a pre-decoding stage inside an instruction fetcher unit. The group identification number and the target identification number are pre-decoded. The instruction is sent to an instruction queue. The instruction is re-ordered in the instruction stream after executing the instruction utilizing information from the pre-decoding of the group identification number and the target identification number. | 06-10-2010 |
20100169620 | SIGNAL PROCESSING DEVICE, SIGNAL PROCESSING METHOD, AND PROGRAM - There is provided a signal processing device which is capable of suppressing the influence of a digital data process on an analog signal process without completely stopping a digital data processing circuit. A signal processing device includes an analog signal processing circuit, a digital data processing circuit, a determination section configured to determine an influence of the digital data processing circuit on the analog signal processing circuit, and a control section configured to stop a partial circuit of the digital data processing circuit or lower processing capability thereof in response to a determination result of the determination section. | 07-01-2010 |
20100180105 | MODIFYING COMMANDS - The present disclosure includes methods, devices, modules, and systems for modifying commands. One device embodiment includes a memory controller including a channel, wherein the channel includes a command queue configured to hold commands, and circuitry configured to modify at least a number of commands in the queue and execute the modified commands. | 07-15-2010 |
20100191940 | SINGLE STEP MODE IN A SOFTWARE PIPELINE WITHIN A HIGHLY THREADED NETWORK ON A CHIP MICROPROCESSOR - A hardware thread is selectively forced to single step the execution of software instructions from a work packet granule. A “single step” packet is associated with a work packet granule. The work packet granule, with the associated “single step” packet, is dispatched as an appended work packet granule to a preselected hardware thread in a processor core, which, in one embodiment, is located at a node in a Network On a Chip (NOC). The work packet granule then executes in a single step mode until completion. | 07-29-2010 |
20100205413 | TRANSLATED MEMORY PROTECTION - A method of responding to an attempt to write a memory address including a target instruction which has been translated to a host instruction for execution by a host processor including the steps of marking a memory address including a target instruction which has been translated to a host instruction, detecting a memory address which has been marked when an attempt is made to write to the memory address, and responding to the detection of a memory address which has been marked by protecting a target instruction at the memory address until it has been assured that translations associated with the memory address will not be utilized before being updated. | 08-12-2010 |
20100268922 | Processor that utilizes re-configurable logic to construct persistent finite state machines from machine instruction sequences - A processor, integrated with re-configurable logic and memory elements, is disclosed which is to be used as part of a shared memory, multiprocessor computer system. The invention utilizes the re-configurable elements to construct persistent finite state machines based on information decoded by the invention from sequences of CISC or RISC type processor machine instructions residing in memory. The invention implements the same algorithm represented by the sequence of encoded instructions, but executes the algorithm consuming significantly fewer clock cycles than would be consumed by the processor originally targeted to execute the sequence of encoded instructions. | 10-21-2010 |
20110047362 | Version Pressure Feedback Mechanisms for Speculative Versioning Caches - Mechanisms are provided for controlling version pressure on a speculative versioning cache. Raw version pressure data is collected based on one or more threads accessing cache lines of the speculative versioning cache. One or more statistical measures of version pressure are generated based on the collected raw version pressure data. A determination is made as to whether one or more modifications to an operation of a data processing system are to be performed based on the one or more statistical measures of version pressure, the one or more modifications affecting version pressure exerted on the speculative versioning cache. An operation of the data processing system is modified based on the one or more determined modifications, in response to a determination that one or more modifications to the operation of the data processing system are to be performed, to affect the version pressure exerted on the speculative versioning cache. | 02-24-2011 |
20110055527 | METHOD AND SYSTEM FOR GENERATING OBJECT CODE TO FACILITATE PREDICTIVE MEMORY RETRIEVAL - A method and system are described for generating reference tables in object code which specify the addresses of branches, routines called, and data references used by routines in the code. In a suitably equipped processing system, the reference tables can be passed to a memory management processor which can open the appropriate memory pages to expedite the retrieval of data referenced in the execution pipeline. The disclosed method and system create such reference tables at the beginning of each routine so that the table can be passed to the memory management processor in a suitably equipped processor. Resulting object code also allows processors lacking a suitable memory management processor to skip the reference table, preserving upward compatibility. | 03-03-2011 |
20110066829 | Selecting Regions Of Hot Code In A Dynamic Binary Rewriter - An approach to region selection which extends beyond traces and selects super-regions. A super-region (SR) contains arbitrary control flow, such as interprocedural nested loops, that provides a larger scope for transformation (e.g. optimization) than traces. Hardware samples are used to identify SRs that contain the hot code of a client process without requiring any static program information. | 03-17-2011 |
20110107070 | PATCHING OF A READ-ONLY MEMORY - An information processing device comprises a memory carrying a program, and a processor capable of executing the program. The program instructs the processor to determine whether a selected identifier is contained in a set of identifiers, and, if the processor has determined that the identifier is contained in the set of identifiers, to execute a patch instruction identified by the identifier, and, if the processor has determined that the identifier is not contained in the set of identifiers, to execute an original instruction identified by the identifier. The memory may comprise a read-only memory (ROM) and a random access memory (RAM), the ROM carrying the original instruction and the RAM carrying the patch instruction and the set of identifiers. A method of executing a program on an information processing device and a method of modifying a program on an information processing device are also disclosed. | 05-05-2011 |
20110202749 | INSTRUCTION COMPRESSING APPARATUS AND METHOD - An instruction compressing apparatus and method for a parallel processing computer such as a very long instruction word (VLIW) computer, are provided. The instruction compressing apparatus includes a bundle code generating unit, an instruction compressing unit, and an instruction converting unit. The bundle code generating unit may generate a bundle code in response to an input of instructions to be compressed. The bundle code may indicate whether a current instruction group is terminated, and also whether an instruction group following the current instruction group is a no-operation (NOP) instruction group. The instruction compressing unit may remove a NOP instruction and/or a NOP instruction group from the input instructions according to the generated bundle code. The instruction converting unit may include the generated bundle code in the remaining instructions which have not been removed by the instruction compressing unit. | 08-18-2011 |
20110213952 | Methods and Apparatus for Dynamic Instruction Controlled Reconfigurable Register File - A scalable reconfigurable register file (SRRF) containing multiple register files, read and write multiplexer complexes, and a control unit operating in response to instructions is described. Multiple address configurations of the register files are supported by each instruction and different configurations are operable simultaneously during a single instruction execution. For example, with separate files of the size 32×32 supported configurations of 128×32 bit s, 64×64 bit s and 32×128 bit s can be in operation each cycle. Single width, double width, quad width operands are optimally supported without increasing the register file size and without increasing the number of register file read or write ports. | 09-01-2011 |
20110238961 | SYSTEM AND METHOD OF INSTRUCTION MODIFICATION - A method and system of instruction modification. A first machine language instruction, which may comprise a plurality of discrete instructions, is fetched. Responsive to a trigger pattern in the first machine language instruction, a segment of the first machine language instruction is modified. Information can be substituted into the segment based on specifics outlined in the trigger pattern. Alternatively, information can be combined with the segment via logical and/or arithmetic operations. Modification of the segment produces a second machine language instruction that is executed by units of the processor. In one embodiment, information may be taken from a queue and used to replace data from the segment. How information is taken from the queue and how the information so taken is used to replace fields of the segment are defined by the trigger pattern. | 09-29-2011 |
20110264897 | MICROPROCESSOR THAT FUSES LOAD-ALU-STORE AND JCC MACROINSTRUCTIONS - A microprocessor receives first and second program-adjacent macroinstructions of the microprocessor instruction set architecture. The first macroinstruction loads an operand from a location in memory, performs an arithmetic/logic operation using the loaded operand to generate a result, and stores the result back to the memory location. The second macroinstruction jumps to a target address if condition codes satisfy a specified condition and otherwise executes the next sequential instruction. An instruction translator simultaneously translates the first and second program-adjacent macroinstructions into first, second, and third micro-operations for execution by execution units. The first micro-operation calculates the memory location address and loads the operand therefrom. The second micro-operation performs the arithmetic/logic operation using the loaded operand to generate the result, updates the condition codes based on the result, and jumps to the target address if the updated condition codes satisfy the condition. The third micro-operation stores the result to the memory location. | 10-27-2011 |
20110283093 | MINIMIZING PROGRAM EXECUTION TIME FOR PARALLEL PROCESSING - According to one embodiment, an information processing apparatus comprises a storage storing program modules and parallel execution control description describing relationships of the program modules, a conversion module extracting a part relating to the program module from the parallel execution control description, and creating graph data structure creation information including preceding and succeeding information of the program module, an adding module extracting graph data structure creation information to which the input data is given, creating a node, and adding the created node to a formerly created graph data structure, and an execution module subjecting the graph data structure to at least one of first search and second search with a restricted breadth, selecting one node from nodes stored in the node memory, and executing a program module corresponding to the selected node. | 11-17-2011 |
20110302395 | HARDWARE ASSIST THREAD FOR DYNAMIC PERFORMANCE PROFILING - A method and data processing system for managing running of instructions in a program. A processor of the data processing system receives a monitoring instruction of a monitoring unit. The processor determines if at least one secondary thread of a set of secondary threads is available for use as an assist thread. The processor selects the at least one secondary thread from the set of secondary threads to become the assist thread in response to a determination that the at least one secondary thread of the set of secondary threads is available for use as an assist thread. The processor changes profiling of running of instructions in the program from the main thread to the assist thread. | 12-08-2011 |
20110302396 | IMAGE PROCESSING APPARATUS THAT PERFORMS PROCESSING ACCORDING TO INSTRUCTION DEFINING THE PROCESSING, CONTROL METHOD FOR THE APPARATUS, AND STORAGE MEDIUM - An image processing apparatus which makes it possible to select a plurality of instructions at a time, and connect a plurality of documents together so that they can be processed as one document. The image processing apparatus has a reading unit, which reads an image on an original to generate image data, and performs processing according to an instruction defining reading processing to be performed, as well as processing on the generated image data. The selected plurality of instructions are analyzed, and based on the analysis result, the selected plurality of instructions are connected together to create a new instruction. | 12-08-2011 |
20110320782 | PROGRAM STATUS WORD DEPENDENCY HANDLING IN AN OUT OF ORDER MICROPROCESSOR DESIGN - A computer implemented method of processing instructions of a computer program. The method comprises providing at least two copies of program status data; identifying a first update instruction of the instructions that writes to at least one field of the program status data; and associating the first update instruction with a first copy of the at least two copies of program status data. | 12-29-2011 |
20120005460 | INSTRUCTION EXECUTION APPARATUS, INSTRUCTION EXECUTION METHOD, AND INSTRUCTION EXECUTION PROGRAM - An apparatus, method, and program product for monitoring execution of a program, reducing overhead and not changing the behavior of the program. The apparatus performs additional processing which requires a memory area upon execution of a specific instruction to be executed by a predetermined execution system on a computer. The system includes a memory reservation unit reserving the memory area for the additional processing, an instruction replacement unit copying the specific instruction to the reserved memory area and replacing the specific instruction with a special-purpose instruction, an additional processing execution unit acquiring the memory area, and a replaced instruction execution unit performing the same processing as that performed by the specific instruction. | 01-05-2012 |
20120066482 | MACROSCALAR PROCESSOR ARCHITECTURE - A macroscalar processor architecture is described herein. In one embodiment, a processor receives instructions of a program loop having a vector block and a sequence block intended to be executed after the vector block, where the processor includes multiple slices and each of the slices is capable of executing an instruction of an iteration of the program loop substantially in parallel. For each iteration of the program loop, the processor executes an instruction of the sequence block using one of the slices while executing instructions of the vector block using a remainder of the slices substantially in parallel. Other methods and apparatuses are also described. | 03-15-2012 |
20120117360 | DEDICATED INSTRUCTIONS FOR VARIABLE LENGTH CODE INSERTION BY A DIGITAL SIGNAL PROCESSOR (DSP) - In accordance with at least some embodiments, a digital signal processor (DSP) includes an instruction fetch unit and an instruction decode unit in communication with the instruction fetch unit. The DSP also includes a register set and a plurality of work units in communication with the instruction decode unit. The DSP selectively uses a dedicated insert instruction to insert a variable number of bits into a register. | 05-10-2012 |
20120124342 | CONCURRENT CORE AFFINITY FOR WEAK COOPERATIVE MULTITHREADING SYSTEMS - A data structure is stored. Further, a plurality of operations performed on the data structure is modified to be per core instead of per thread so that a subset of the plurality of threads safely share the data structure. In addition, interruption of each of the plurality of threads in the subset is prevented unless one or more of each of the plurality of threads in the subset allows the interruption. | 05-17-2012 |
20120124343 | APPARATUS AND METHOD FOR MODIFYING INSTRUCTION OPERAND - Provided are an apparatus and method for modifying an instruction operand. The apparatus includes a first selector configured to receive first instruction operands and a second selector configured to receive second instruction operands. The apparatus also includes a modification unit configured to select a first instruction operand and a second instruction operand, and to modify the selected first instruction operand and the selected second instruction operand to reduce the operand instructions that are input to the first selector and the second selector. | 05-17-2012 |
20120151192 | Very Long Instruction Word (VLIW) Processor with Power Management, and Apparatus and Method of Power Management Therefor - A very long instruction word (VLIW) processor and an apparatus with power management and a method of power management therefor are provided in consistent with the exemplary embodiments of the disclosure. The power management method includes the following steps. Valid instruction(s) and no operation (NOP) instruction(s) of an input instruction package are rearranged to output a transcoded instruction package, wherein the transcoded instruction package by the rearrangement has its NOP instruction(s) corresponding to at least one execution unit, which is to be placed in power reduction state, of a VLIW processor. Power reduction control is selectively performed on at least one execution unit corresponding to at least one NOP instruction of the transcoded instruction package according to the transcoded instruction package. | 06-14-2012 |
20120179897 | TECHNIQUES FOR MODIFYING A PROCESSOR CODE SEQUENCE - A technique of modifying a code sequence for a processor includes identifying a set of one or more target instructions in the code sequence. A replacement instruction is selected that includes a set of replacement instruction parts. A length of each of the replacement instruction parts corresponds to a minimum instruction length for an instruction set of the processor. The replacement instruction parts include a first instruction type and one or more second instruction types that are each configured as exception instructions if processed in isolation from the first instruction type. The replacement instruction is then substituted for the set of one or more target instructions in the code sequence for processing by the processor. | 07-12-2012 |
20120198215 | INSTRUCTION EXPLOITATION THROUGH LOADER LATE FIX-UP - A method, computer program product, and data processing system for substituting a candidate instruction in application code being loaded during load time. Responsive to identifying the candidate instruction, a determination is made whether a hardware facility of the data processing system is present to execute the candidate instruction. If the hardware facility is absent from the data processing system, the candidate instruction is substituted with a second set of instructions. | 08-02-2012 |
20120204016 | Rewriting Branch Instructions Using Branch Stubs - Mechanisms are provided for rewriting branch instructions in a portion of code. The mechanisms receive a portion of source code having an original branch instruction. The mechanisms generate a branch stub for the original branch instruction. The branch stub stores information about the original branch instruction including an original target address of the original branch instruction. Moreover, the mechanisms rewrite the original branch instruction so that a target of the rewritten branch instruction references the branch stub. In addition, the mechanisms output compiled code including the rewritten branch instruction and the branch stub for execution by a computing device. The branch stub is utilized by the computing device at runtime to determine if execution of the rewritten branch instruction can be redirected directly to a target instruction corresponding to the original target address in an instruction cache of the computing device without intervention by an instruction cache runtime system. | 08-09-2012 |
20120216022 | CONTROLLING THE SELECTIVELY SETTING OF OPERATIONAL PARAMETERS FOR AN ADAPTER - An instruction is provided to establish various operational parameters for an adapter. These parameters include adapter interruption parameters, input/output address translation parameters, resetting error indications, setting measurement parameters, and setting an interception control, as examples. The instruction specifies a function information block, which is a program representation of a device table entry used by the adapter, to be used in certain situations in establishing the parameters. A store instruction is also provided that stores the current contents of the function information block. | 08-23-2012 |
20120239912 | INSTRUCTION PROCESSING METHOD, INSTRUCTION PROCESSING APPARATUS, AND INSTRUCTION PROCESSING PROGRAM - An instruction processing method includes generating a translated code block for an instruction, among instructions included in a target program to be executed and for which a number of executions through sequential interpretation is greater than or equal to a threshold, and storing the generated translated code block in a first storage unit and removing part or all of the translated code block from the first storage unit at a given timing, wherein the generating reduces the threshold with respect to the number of executions over a given period of time after the part or all of the translated code block is removed. | 09-20-2012 |
20120246452 | Signature Update by Code Transformation - Embodiments described herein provide an apparatus, computer readable digital storage medium and method for producing an instruction sequence for a computation unit which can be controlled by a program which includes at least the instruction sequence. | 09-27-2012 |
20120265972 | METHOD AND APPARATUS AND RECORD CARRIER - Method of generating respective instruction compaction schemes for subsets of instructions to be processed by a programmable processor, comprising the steps of a) receiving at least one input code sample representative for software to be executed on the programmable processor, the input code comprising a plurality of instructions defining a first set of instructions (S | 10-18-2012 |
20120284489 | Methods and Apparatus for Constant Extension in a Processor - Programs often require constants that cannot be encoded in a native instruction format, such as 32-bits. To provide an extended constant, an instruction packet is formed with constant extender information and a target instruction. The constant extender information encoded as a constant extender instruction provides a first set of constant bits, such as 26-bits for example, and the target instruction provides a second set of constant bits, such as 6-bits. The first set of constant bits are combined with the second set of constant bits to generate an extended constant for execution of the target instruction. The extended constant may be used as an extended source operand, an extended address for memory access instructions, an extended address for branch type of instructions, and the like. Multiple constant extender instructions may be used together to provide larger constants than can be provided by a single extension instruction. | 11-08-2012 |
20120324207 | Encapsulated Instruction Set - A method of encapsulating a long instruction in a set of short instructions for execution on a processor, the long instruction having k bits and each short instruction having l bits where l12-20-2012 | |
20130013902 | DYNAMICALLY RECONFIGURABLE PROCESSOR AND METHOD OF OPERATING THE SAME - A dynamically reconfigurable processor which executes a series of processes on an instruction basis for respective instructions, comprises: a dynamically configurable computing unit; and a clock generating circuit, wherein start timing for processes in the series of processes is determined based on the main clock except for an instruction execution process of executing the instruction with the dynamically configurable computing unit, the instruction execution process of executing the instruction with the dynamically configurable computing unit includes a computing element generating sub-process of dynamically configuring, with dynamically configurable computing unit, a computing element corresponding to the instruction, and an operation sub-process of performing an operation according to the instruction with the computing element configured in the computing element generating sub-process, start timing for the computing element generating sub-process and the operation sub-process is determined based on the sub-clock, and the sub-clock is generated such that the computing element generating sub-process and the operation sub-process are completed before the start timing for a process which is to be executed immediately after the instruction execution process. | 01-10-2013 |
20130031337 | Methods and Apparatus for Storage and Translation of an Entropy Encoded Instruction Sequence to Executable Form - A method of compressing a sequence of program instructions begins by examining a program instruction stream to identify a sequence of two or more instructions that meet a parameter. The identified sequence of two or more instructions is replaced by a selected type of layout instruction which is then compressed. A method of decompressing accesses an X-index and a Y-index together as a compressed value. The compressed value is decompressed to a selected type of layout instruction which is decoded and replaced with a sequence of two or more instructions. An apparatus for decompressing includes a storage subsystem configured for storing compressed instructions, wherein a compressed instruction comprises an X-index and a Y-index. A decompressor is configured for translating an X-index and Y-index accessed from the storage subsystem to a selected type of layout instruction which is decoded and replaced with a sequence of two or more instructions. | 01-31-2013 |
20130067207 | APPARATUS AND METHOD FOR COMPRESSING INSTRUCTIONS AND A COMPUTER-READABLE STORAGE MEDIA THEREFOR - Provided is a technique that is capable of efficiently compressing instructions by inserting instruction compression bits into valid instruction bundles and deleting no operation (NOP) instruction bundles. Accordingly, the number of instructions that can be parallel-processed in a processor may be increased. | 03-14-2013 |
20130073838 | MULTI-ADDRESSABLE REGISTER FILES AND FORMAT CONVERSIONS ASSOCIATED THEREWITH - A multi-addressable register file is addressed by a plurality of types of instructions, including scalar, vector and vector-scalar extension instructions. It may be determined that data is to be translated from one format to another format. If so determined, a convert machine instruction is executed that obtains a single precision datum in a first representation in a first format from a first register; converts the single precision datum of the first representation in the first format to a converted single precision datum of a second representation in a second format; and places the converted single precision datum in a second register. | 03-21-2013 |
20130086368 | Using Register Last Use Infomation to Perform Decode-Time Computer Instruction Optimization - Two computer machine instructions are fetched for execution, but replaced by a single optimized instruction to be executed, wherein a temporary register used by the two instructions is identified as a last-use register, where a last-use register has a value that is not to be accessed by later instructions, whereby the two computer machine instructions are replaced by a single optimized internal instruction for execution, the single optimized instruction not including the last-use register. | 04-04-2013 |
20130086369 | COMPILING CODE FOR AN ENHANCED APPLICATION BINARY INTERFACE (ABI) WITH DECODE TIME INSTRUCTION OPTIMIZATION - Compiling code for an enhanced application binary interface (ABI) including identifying, by a computer, a code sequence configured to perform a variable address reference table function including an access to a variable at an offset outside of a location in a variable address reference table. The code sequence includes an internal representation (IR) of a first instruction and an IR of a second instruction. The second instruction is dependent on the first instruction. A scheduler cost function associated with at least one of the IR of the first instruction and the IR of the second instruction is modified. The modifying includes generating a modified scheduler cost function that is configured to place the first instruction next to the second instruction. An object file is generated responsive to the modified scheduler cost function. The object file includes the first instruction placed next to the second instruction. The object file is emitted. | 04-04-2013 |
20130111193 | RUNNING SHIFT FOR DIVIDE INSTRUCTIONS FOR PROCESSING VECTORS | 05-02-2013 |
20130117548 | ALGORITHM FOR VECTORIZATION AND MEMORY COALESCING DURING COMPILING - One embodiment of the present invention sets forth a technique for reducing the number of assembly instructions included in a computer program. The technique involves receiving a directed acyclic graph (DAG) that includes a plurality of nodes, where each node includes an assembly instruction of the computer program, hierarchically parsing the plurality of nodes to identify at least two assembly instructions that are vectorizable and can be replaced by a single vectorized assembly instruction, and replacing the at least two assembly instructions with the single vectorized assembly instruction. | 05-09-2013 |
20130132710 | METHOD OF COMPRESSING AND DECOMPRESSING AN EXECUTABLE OR INTERPRETABLE PROGRAM - The method of compressing and decompressing an executable program, can be executed by a microprocessor or interpreted by an interpreter of an integrated circuit device: instructions are reformatted into the format of an initial set of instructions of said program for obtaining instructions in the format of an intermediate set of instructions; repetition templates in the program are determined and, for each repetition template, a pair is defined, formed of said repetition template and of an instruction in the format of a set of instructions; intermediate instructions are replaced by compressed instructions and the links of the compressed program are modified; the compressed program is stored in a memory of the device; and the compressed program is decompressed and the initial instructions are executed by said microprocessor or interpreted by said interpreter. The invention applies, in particular, to the integrated circuits of embedded devices. | 05-23-2013 |
20130145134 | SYSTEM AND METHOD FOR PERFORMING A BRANCH OBJECT CONVERSION TO PROGRAM CONFIGURABLE LOGIC CIRCUITRY - A method and system are provided for deriving a resultant software code from an originating ordered list of instructions that does not include overlapping branch logic. The method may include deriving a plurality of unordered software constructs from a sequence of processor instructions; associating software constructs in accordance with an original logic of the sequence of processor instructions; determining and resolving memory precedence conflicts within the associated plurality of software constructs; resolving forward branch logic structures into conditional logic constructs; resolving back branch logic structures into loop logic constructs; and/or applying the plurality of unordered software constructs in a programming operation by a parallel execution logic circuitry. The resultant plurality of unordered software constructs may be converted into programming reconfigurable logic, computers or processors, and also by means of a computer network or an electronics communications network. | 06-06-2013 |
20130166891 | HANDLING INSTRUCTION RECEIVED FROM A SANDBOXED THREAD OF EXECUTION - Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for enveloping a thread of execution within an IDT-based secure sandbox. In one aspect, embodiments of the invention provide a method performed in a computer system, the method receiving an instruction from an execution thread where the computer system can be configured for redirection of instructions from the execution thread. The method can determine whether the instruction includes at least one of an interrupt instruction, a system call instruction and a system enter instruction. In response to determining that the instruction includes at least one of the interrupt instruction, the system call instruction and the system enter instruction, the method can further: (i) eliminate the redirection, (ii) modify a stack to specify return of control, and (iii) thereafter, pass the control to an operating system kernel. | 06-27-2013 |
20130173893 | HARDWARE COMPILATION AND/OR TRANSLATION WITH FAULT DETECTION AND ROLL BACK FUNCTIONALITY - Hardware compilation and/or translation with fault detection and roll back functionality are disclosed. Compilation and/or translation logic receives programs encoded in one language, and encodes the programs into a second language including instructions to support processor features not encoded into the original language encoding of the programs. In one embodiment, an execution unit executes instructions of the second language including an operation-check instruction to perform a first operation and record the first operation result for a comparison, and an operation-test instruction to perform a second operation and a fault detection operation by comparing the second operation result to the recorded first operation result. In some embodiments, an execution unit executes instructions of the second language including commit instructions to record execution checkpoint states of registers mapped to architectural registers, and roll-back instructions to restore the registers mapped to architectural registers to previously recorded execution checkpoint states. | 07-04-2013 |
20130198494 | METHOD FOR COMPILING A PARALLEL THREAD EXECUTION PROGRAM FOR GENERAL EXECUTION - A technique is disclosed for executing a compiled parallel application on a general purpose processor. The compiled parallel application comprises parallel thread execution code, which includes single-instruction multiple-data (SIMD) constructs, as well as references to intrinsic functions conventionally available in a graphics processing unit. The parallel thread execution code is transformed into an intermediate representation, which includes vector instruction constructs. The SIMD constructs are mapped to vector instructions available within the intermediate representation. Intrinsic functions are mapped to corresponding emulated runtime implementations. The technique advantageously enables parallel applications compiled for execution on a graphics processing unit to be executed on a general purpose central processing unit configured to support vector instructions. | 08-01-2013 |
20130198495 | Method and Apparatus For Register Spill Minimization - The aspects enable a computing device to allocate memory space to variables during runtime compilation of a software application. A compiler may be modified to identify operations that can be performed on either a main pipe or an alternative pipe, identify chains of related operations that can be performed on either the main pipe or the alternative pipe, identify points in the execution of code at which the number of live values will exceed the number of registers, and choosing a chain of operations as a candidate to be moved to the alternative pipe in order to reduce the number of live values at identified points in the execution of code. The entire chosen chain of operations may be moved to the alternative pipe. The alternative pipe may perform the computations and return the results to the main pipe for execution. | 08-01-2013 |
20130246765 | DATA PROCESSOR - For efficient issue of a superscalar instruction a circuit is employed which retrieves an instruction of each instruction code type other than a prefix based on a determination result of decoders for determining instruction code type, adds the immediately preceding instruction to the retrieved instruction, and outputs the resultant. When an instruction of a target code type is detected in a plurality of instruction units to be searched, the circuit outputs the detected instruction code and the immediately preceding instruction other than the target code type as prefix code candidates. When an instruction of a target code type cannot be detected at the rear end of the instruction units, the circuit outputs the instruction at the rear end as a prefix code candidate. When an instruction of a target code type is detected at the head in the instruction code search, the circuit outputs the instruction code at the head. | 09-19-2013 |
20130246766 | TRANSFORMING NON-CONTIGUOUS INSTRUCTION SPECIFIERS TO CONTIGUOUS INSTRUCTION SPECIFIERS - Emulation of instructions that include non-contiguous specifiers is facilitated. A non-contiguous specifier specifies a resource of an instruction, such as a register, using multiple fields of the instruction. For example, multiple fields of the instruction (e.g., two fields) include bits that together designate a particular register to be used by the instruction. Non-contiguous specifiers of instructions defined in one computer system architecture are transformed to contiguous specifiers usable by instructions defined in another computer system architecture. The instructions defined in the another computer system architecture emulate the instructions defined for the one computer system architecture. | 09-19-2013 |
20130262839 | INSTRUCTION MERGING OPTIMIZATION - A computer system for optimizing instructions is configured to identify two or more machine instructions as being eligible for optimization, to merge the two or more machine instructions into a single optimized internal instruction that is configured to perform functions of the two or more machine instructions, and to execute the single optimized internal instruction to perform the functions of the two or more machine instructions. Being eligible includes determining that the two or more machine instructions include a first instruction specifying a first target register and a second instruction specifying the first target register as a source register and a target register. The second instruction is a next sequential instruction of the first instruction in program order, wherein the first instruction specifies a first function to be performed, and the second instruction specifies a second function to be performed. | 10-03-2013 |
20130262840 | INSTRUCTION MERGING OPTIMIZATION - A computer-implemented method includes determining that two or more instructions of an instruction stream are eligible for optimization. Eligibility is based on a first instruction specifying a first target register and a second instruction specifying the first target register as a source register and a target register. The method includes merging the two or more machine instructions into a single optimized internal instruction that is configured to perform first and second functions of two or more machine instructions employing operands specified by the two or more machine instructions. The single optimized internal instruction specifies the first target register only as a single target register and the single optimized internal instruction specifies the first and second functions to be performed. The method includes executing the single optimized internal instruction to perform the first and second functions of the two or more instructions. | 10-03-2013 |
20130262841 | INSTRUCTION MERGING OPTIMIZATION - A computer-implemented method includes determining that two or more instructions of an instruction stream are eligible for optimization, where the two or more instructions include a memory load instruction and a data processing instruction to process data based on the memory load instruction. The method includes merging, by a processor, the two or more instructions into a single optimized internal instruction and executing the single optimized internal instruction to perform a memory load function and a data processing function corresponding to the memory load instruction and the data processing instruction. | 10-03-2013 |
20130262842 | CODE GENERATION METHOD AND INFORMATION PROCESSING APPARATUS - An information processing apparatus generates first and second trees representing a dependency relationship among instructions from first code. The information processing apparatus then adjusts the height of the shorter one of the first and second trees by inserting pseudo instructions that do not cause any difference in data before and after operation in the shorter tree, and also shuffles the order of instructions existing at the same depth from the root, according to operation types in at least one of the first and second trees. The information processing apparatus compares the first and second trees subjected to the height adjustment and the order shuffling with each other to determine combinations of an instruction of the first tree and an instruction of the second tree. | 10-03-2013 |
20130268742 | CORE SWITCHING ACCELERATION IN ASYMMETRIC MULTIPROCESSOR SYSTEM - An asymmetric multiprocessor system (ASMP) may comprise computational cores implementing different instruction set architectures and having different power requirements. Program code executing on the ASMP is analyzed by a binary analysis unit to determine what functions are called by the program code and select which of the cores are to execute the program code, or a code segment thereof. Selection may be made to provide for native execution of the program code, to minimize power consumption, and so forth. Control operations based on this selection may then be inserted into the program code, forming instrumented program code. The instrumented program code is then executed by the ASMP. | 10-10-2013 |
20130275734 | PACKED DATA OPERATION MASK CONCATENATION PROCESSORS, METHODS, SYSTEMS AND INSTRUCTIONS - A method of an aspect includes receiving a packed data operation mask concatenation instruction. The packed data operation mask concatenation instruction indicates a first source having a first packed data operation mask, indicates a second source having a second packed data operation mask, and indicates a destination. A result is stored in the destination in response to the packed data operation mask concatenation instruction. The result includes the first packed data operation mask concatenated with the second packed data operation mask. Other methods, apparatus, systems, and instructions are disclosed. | 10-17-2013 |
20130283020 | PREDICATE TRACE COMPRESSION - A program trace data compression mechanism in which execution of a variable length execution set (VLES) including multiple non-branch conditional instructions are traced in real-time in a manner that allows the instruction execution to be reconstructed completely by correlating the trace data with the traced binary code. | 10-24-2013 |
20130283021 | APPARATUS AND METHOD OF IMPROVED INSERT INSTRUCTIONS - An apparatus is described having instruction execution logic circuitry to execute first, second, third and fourth instruction. Both the first instruction and the second instruction insert a first group of input vector elements to one of multiple first non overlapping sections of respective first and second resultant vectors. The first group has a first bit width. Each of the multiple first non overlapping sections have a same bit width as the first group. Both the third instruction and the fourth instruction insert a second group of input vector elements to one of multiple second non overlapping sections of respective third and fourth resultant vectors. The second group has a second bit width that is larger than said first bit width. Each of the multiple second non overlapping sections have a same bit width as the second group. The apparatus also includes masking layer circuitry to mask the first and third instructions at a first resultant vector granularity, and, mask the second and fourth instructions at a second resultant vector granularity. | 10-24-2013 |
20130283022 | SYSTEM, APPARATUS AND METHOD FOR TRANSLATING VECTOR INSTRUCTIONS - Vector translation instructions are used to demarcate the beginning and the end of a code region to be translated. The code region includes a first set of vector instructions defined in an instruction set of a source processor. A processor receives the vector translation instructions and the demarcated code region, and translates the code region into translated code. The translated code includes a second set of vector instructions defined in an instruction set of a target processor. The translated code is executed by the target processor to produce a result value, the result value being the same as an original result value produced by the source processor executing the code region. The target processor stores the result value at a location that is not a vector register, the location being the same as an original location used by the source processor to store the original result value. | 10-24-2013 |
20130311755 | RUNNING STATE POWER SAVING VIA REDUCED INSTRUCTIONS PER CLOCK OPERATION - A microprocessor includes functional units and control registers writeable to cause the functional units to institute actions that reduce the instructions-per-clock rate of the microprocessor to reduce power consumption when the microprocessor is operating in its lowest performance running state. Examples of the actions include in-order vs. out-of-order execution, serial vs. parallel cache access and single vs. multiple instruction issue, retire, translation and/or formatting per clock cycle. The actions may be instituted only if additional conditions exist, such as residing in the lowest performance running state for a minimum time, not running in a higher performance state for more than a maximum time, a user did not disable the feature, the microprocessor supports multiple running states and the operating system supports multiple running states. | 11-21-2013 |
20130311756 | ROTATE INSTRUCTIONS THAT COMPLETE EXECUTION WITHOUT READING CARRY FLAG - A method of one aspect may include receiving a rotate instruction. The rotate instruction may indicate a source operand and a rotate amount. A result may be stored in a destination operand indicated by the rotate instruction. The result may have the source operand rotated by the rotate amount. Execution of the rotate instruction may complete without reading a carry flag. | 11-21-2013 |
20130332710 | MODULATING DYNAMIC OPTIMAIZATIONS OF A COMPUTER PROGRAM - Technologies and implementations for modulating dynamic optimizations of a computer program during execution are generally disclosed. | 12-12-2013 |
20130339682 | METHODS TO OPTIMIZE A PROGRAM LOOP VIA VECTOR INSTRUCTIONS USING A SHUFFLE TABLE AND A MASK STORE TABLE - According to one embodiment, a code optimizer is configured to receive first code having a program loop implemented with scalar instructions to store values of a first array to a second array based on values of a third array. The code optimizer is configured to generate second code representing the program loop with vector instructions including a shuffle instruction and a store instruction, the store instruction to shuffle using a shuffle table elements of the first array based on the second array in a vector manner, the store instruction to store using a mask store table the shuffled elements in the third array in a vector manner. | 12-19-2013 |
20140006757 | Method for Thread Reduction in a Multi-Thread Packet Processor | 01-02-2014 |
20140013088 | INTEGRATED CIRCUIT DEVICE AND METHODS OF PERFORMING BIT MANIPULATION THEREFOR - An integrated circuit device comprising at least one instruction processing module arranged to receive a bit-manipulation instruction, and in response to receiving the bit-manipulation instruction to select at least one bit from at least one source data register in accordance with a value of at least one control bit, select from candidate values a manipulation value for the at least one selected bit in accordance with a value of at least one further control bit, and store the selected manipulation value for the at least one selected bit in at least one output data register. | 01-09-2014 |
20140013089 | CONDITIONAL LOAD INSTRUCTIONS IN AN OUT-OF-ORDER EXECUTION MICROPROCESSOR - A microprocessor instruction translator translates a conditional load instruction into at least two microinstructions. An out-of-order execution pipeline executes the microinstructions. To execute a first microinstruction, an execution unit receives source operands from the source registers of a register file and responsively generates a first result using the source operands. To execute a second the microinstruction, an execution unit receives a previous value of the destination register and the first result and responsively reads data from a memory location specified by the first result and provides a second result that is the data if a condition is satisfied and that is the previous destination register value if not. The previous value of the destination register comprises a result produced by execution of a microinstruction that is the most recent in-order previous writer of the destination register with respect to the second microinstruction. | 01-09-2014 |
20140019731 | GENERALIZED BIT MANIPULATION INSTRUCTIONS FOR A COMPUTER PROCESSOR - Methods of bit manipulation within a computer processor are disclosed. Improved flexibility in bit manipulation proves helpful in computing elementary functions critical to the performance of many programs and for other applications. In one embodiment, a unit of input data is shifted/rotated and multiple non-contiguous bit fields from the unit of input data are inserted in an output register. In another embodiment, one of two units of input data is optionally shifted or rotated, the two units of input data are partitioned into a plurality of bit fields, bitwise operations are performed on each bit field, and pairs of bit fields are combined with either an AND or an OR bitwise operation. Embodiments are also disclosed to simultaneously perform these processes on multiple units and pairs of units of input data in a Single Input, Multiple Data processing environment capable of performing logical operations on floating point data. | 01-16-2014 |
20140019732 | SYSTEMS, APPARATUSES, AND METHODS FOR PERFORMING MASK BIT COMPRESSION - Embodiments of systems, apparatuses, and methods for performing in a computer processor mask bit compression in response to a single mask bit compression instruction that includes a source writemask register operand, a destination writemask register operand, and an opcode are described. | 01-16-2014 |
20140032882 | MODIFICATION OF FUNCTIONALITY IN EXECUTABLE CODE - In an example embodiment, an instruction set is accessed. An instruction modifier is associated with the instruction set. Thereafter, the instruction set is transformed into a modified instruction set based on the instruction modifier. After the transformation, the modified instruction set is executed. | 01-30-2014 |
20140047221 | FUSING FLAG-PRODUCING AND FLAG-CONSUMING INSTRUCTIONS IN INSTRUCTION PROCESSING CIRCUITS, AND RELATED PROCESSOR SYSTEMS, METHODS, AND COMPUTER-READABLE MEDIA - Fusing flag-producing and flag-consuming instructions in instruction processing circuits and related processor systems, methods, and computer-readable media are disclosed. In one embodiment, a flag-producing instruction indicating a first operation generating a first flag result is detected in an instruction stream by an instruction processing circuit. The instruction processing circuit also detects a flag-consuming instruction in the instruction stream indicating a second operation consuming the first flag result as an input. The instruction processing circuit generates a fused instruction indicating the first operation generating the first flag result and indicating the second operation consuming the first flag result as the input. In this manner, as a non-limiting example, the fused instruction eliminates a potential for a read-after-write hazard between the flag-producing instruction and the flag-consuming instruction. | 02-13-2014 |
20140047222 | METHOD AND DEVICE FOR RECOMBINING RUNTIME INSTRUCTION - A method for recombining runtime instruction comprising: an instruction running environment is buffered; the machine instruction segment to be scheduled is obtained; the second jump instruction which directs an entry address of an instruction recombining platform is inserted before the last instruction of the obtained machine instruction segment to generate the recombined instruction segment comprising the address A″; the value A of the address register of the buffered instruction running environment is modified to the address A″; the instruction running environment is recovered. A device for recombining the runtime instruction comprising: an instruction running environment buffering and recovering unit suitable for buffering and recovering the instruction running environment; an instruction obtaining unit suitable for obtaining the machine instruction segment to be scheduled; an instruction recombining unit suitable for generating the recombined instruction segment comprised the address A″; and an instruction replacing unit suitable for modifying the value of the address register of the buffered instruction running environment to the address of the recombined instruction segment. The monitoring and control of the runtime instruction of the computing device is completed. | 02-13-2014 |
20140082334 | Encoding to Increase Instruction Set Density - A conventional instruction set architecture such, as the x86 instruction set architecture, may be reencoded to reduce the amount of memory used by the instructions. This may be particularly useful in applications that are memory sized limited, as is the case with microcontrollers. With a reencoded instruction set that is more dense, more functions can be implemented or a smaller memory size may be used. The encoded instructions are then naturally decoded at run time in the predecoder and decoder of the core pipeline. | 03-20-2014 |
20140108771 | Using Register Last Use Information to Perform Decode Time Computer Instruction Optimization - Two computer machine instructions are fetched for execution, but replaced by a single optimized instruction to be executed, wherein a temporary register used by the two instructions is identified as a last-use register, where a last-use register has a value that is not to be accessed by later instructions, whereby the two computer machine instructions are replaced by a single optimized internal instruction for execution, the single optimized instruction not including the last-use register. | 04-17-2014 |
20140108772 | Exploiting an Architected Last-Use Operand Indication in a System Operand Resource Pool - A pool of available physical registers are provided for architected registers, wherein operations are performed that activate and deactivate selected architected registers, such that the deactivated selected architected registers need not retain values, and physical registers can be deallocated to the pool, wherein deallocation of physical registers is performed after a last-use by a designated last-use instruction, wherein the last-use information is provided either by the last-use instruction or a prefix instruction, wherein reads to deallocated architecture registers return an archtiected default value. | 04-17-2014 |
20140115304 | COMPRESSED INSTRUCTION CODE STORAGE - Computer implemented techniques are disclosed for identification of repeated binary strings and for storing those binary strings in order to compress code. The binary strings can be longer instructions, data, or addresses. A table of binary strings is generated based on repeated occurrences, and a reference index is provided for accessing specific entries within the table. An opcode uses a shorter string as an index through which to access the table. The longer string is executed when the longer string is an instruction. When the longer string is an address or data, the appropriate address or data are accessed. | 04-24-2014 |
20140129809 | BYTE SELECTION AND STEERING LOGIC FOR COMBINED BYTE SHIFT AND BYTE PERMUTE VECTOR UNIT - Exemplary embodiments of the present invention disclose a method and system for executing data permute and data shift instructions. In a step, an exemplary embodiment encodes a control index value using the recoding logic into a 1-hot-of-n control for at least one of a plurality of datum positions in the one or more target registers. In another step, an exemplary embodiment conditions the 1-hot-of-n control by a gate-free logic configured for at least one of the plurality of datum positions in the one or more target registers for each of the data permute instructions and the at least one data shift instruction. In another step, an exemplary embodiment selects the 1-hot-of-n control or the conditioned 1-hot-of-n control based on a current instruction mode. In another step, an exemplary embodiment transforms the selected 1-hot-of-n control into a format applicable for the crossbar switch. | 05-08-2014 |
20140149722 | Fusing Immediate Value, Write-Based Instructions in Instruction Processing Circuits, and Related Processor Systems, Methods, and Computer-Readable Media - Fusing immediate value, write-based instructions in instruction processing circuits, and related processor systems, methods, and computer-readable media are disclosed. In one embodiment, a first instruction indicating an operation writing an immediate value to a register is detected by an instruction processing circuit. The circuit also detects at least one subsequent instruction indicating an operation that overwrites at least one first portion of the register while maintaining a value of a second portion of the register. The at least one subsequent instruction is converted (or replaced) with a fused instruction(s), which indicates an operation writing the at least one first portion and the second portion of the register. In this manner, conversion of multiple instructions for generating a constant into the fused instruction(s) removes the potential for a read-after-write hazard and associated consequences caused by dependencies between certain instructions, while reducing a number of clock cycles required to process the instructions. | 05-29-2014 |
20140149723 | DYNAMIC EVALUATION AND ADAPTION OF HARDWARE HASH FUNCTIONS - Creating hash values based on bit values of an input vector. An apparatus includes a first and a second hash table, a first and second hash function generator adapted to configure a respective hash function for a creation of a first and second hash value based on the bit values of the input vector. The hash values are stored in the respective hash tables. An evaluation unit includes a comparison unit to compare a respective effectiveness of the first hash function and the second hash function, and an exchanging unit responsive to the comparison unit adapted to replace the first hash function by the second hash function. | 05-29-2014 |
20140149724 | VECTOR FRIENDLY INSTRUCTION FORMAT AND EXECUTION THEREOF - A vector friendly instruction format and execution thereof. According to one embodiment of the invention, a processor is configured to execute an instruction set. The instruction set includes a vector friendly instruction format. The vector friendly instruction format has a plurality of fields including a base operation field, a modifier field, an augmentation operation field, and a data element width field, wherein the first instruction format supports different versions of base operations and different augmentation operations through placement of different values in the base operation field, the modifier field, the alpha field, the beta field, and the data element width field, and wherein only one of the different values may be placed in each of the base operation field, the modifier field, the alpha field, the beta field, and the data element width field on each occurrence of an instruction in the first instruction format in instruction streams. | 05-29-2014 |
20140164745 | REGISTER ALLOCATION FOR CLUSTERED MULTI-LEVEL REGISTER FILES - A method for allocating registers within a processing unit. A compiler assigns a plurality of instructions to a plurality of processing clusters. Each instruction is configured to access a first virtual register within a live range. The compiler determines which processing cluster in the plurality of processing clusters is an owner cluster for the first virtual register within the live range. The compiler configures a first instruction included in the plurality of instructions to access a first global virtual register. | 06-12-2014 |
20140164746 | Tracking Multiple Conditions in a General Purpose Register and Instruction Therefor - An operate-and-insert instruction of a program, when executed performs an operation based on one or more operands, results of an instruction specified test of the operation performed are stored in an instruction specified location of an instruction specified general register. The instruction specified general register is therefore able to hold results of many operate-and-insert instructions. The program can then use non-branch type instructions to evaluate conditions saved in the register, thus avoiding the performance penalty of branch instructions. | 06-12-2014 |
20140201505 | PREDICTION-BASED THREAD SELECTION IN A MULTITHREADING PROCESSOR - A processor includes one or more execution units to execute instructions of a plurality of threads and thread control logic coupled to the execution units to predict whether a first of the plurality of threads is ready for selection in a current cycle based on readiness of instructions of the first thread in one or more previous cycles, to predict whether a second of the plurality of threads is ready for selection in the current cycle based on readiness of instructions of the second thread in the one or more previous cycles, and to select one of the first and second threads in the current cycle based on the predictions. | 07-17-2014 |
20140208080 | APPARATUS AND METHOD FOR DOWN CONVERSION OF DATA TYPES - An apparatus and method are described for down-converting from a source operand to a destination operand with masking. For example, a method according to one embodiment includes the following operations: reading a source operand value to be down-converted from a first value to a down-converted value and stored in a destination location; reading each mask register bit stored in a mask register, the mask register bit(s) indicating whether to perform a masking operation or a conversion operation on the source operand value; if the mask register bit(s) indicates that a masking operation is to be performed, then performing a specified masking operation and storing the results of the masking operation in the destination location; and if the mask register bit(s) indicates that a masking operation is not to be performed, then down-converting the source operand value and storing the down-converted value in the specified destination location. | 07-24-2014 |
20140258692 | DATA PROCESSOR - A RISC data processor in which the number of flags generated by each instruction is increased so that a decrease of flag-generating instructions exceeds an increase of flag-using instructions in quantity, thereby achieving the decrease in instructions. An instruction for generating flags according to operands' data sizes is defined, and an instruction set handled by the RISC data processor includes an instruction capable of executing an operation on operands in more than one data size. An identical operation process is conducted on the small-size operand and on low-order bits of the large-size operand, and flags are generated capable of coping with the respective data sizes regardless of the data size of each operand subjected to the operation. Thus, a reduction in instruction code space of the RISC data processor can be achieved. | 09-11-2014 |
20140281429 | ELIMINATING REDUNDANT SYNCHRONIZATION BARRIERS IN INSTRUCTION PROCESSING CIRCUITS, AND RELATED PROCESSOR SYSTEMS, METHODS, AND COMPUTER-READABLE MEDIA - Embodiments disclosed herein include eliminating redundant synchronization barriers from execution pipelines in instruction processing circuits. Related processor systems, methods, and computer-readable media are also disclosed. By tracking the occurrence of synchronization events, unnecessary software synchronization operations may be identified and eliminated, thus improving performance of a central processing unit (CPU). In one embodiment, a method for eliminating redundant synchronization barriers in an instruction stream is provided. The method comprises determining whether a next instruction comprises a synchronization barrier of a type corresponding to a first synchronization event. The method also comprises eliminating the next instruction from the instruction stream, responsive to determining that the next instruction comprises a synchronization barrier of a type corresponding to the first synchronization event. In this manner, the average number of instructions executed during each CPU clock cycle may be increased by avoiding unnecessary synchronization operations. | 09-18-2014 |
20140281430 | EXECUTION OF CONDITION-BASED INSTRUCTIONS - Execution of condition-based instructions is facilitated. A condition-based instruction is obtained, as well as a confidence level associated with the instruction. The confidence level is checked, and based on the confidence level being a first value, a predicted operation of the instruction, which is based on a predictor, is unconditionally performed. Further, based on the confidence level being a second value, a specified operation of the instruction, which is based on a determined condition, is conditionally performed. | 09-18-2014 |
20140281431 | EFFICIENT WAY TO CANCEL SPECULATIVE 'SOURCE READY' IN SCHEDULER FOR DIRECT AND NESTED DEPENDENT INSTRUCTIONS - A method and apparatus for simultaneously canceling a dependent instruction and a nested dependent instruction when a cancel timer of a source of the dependent instruction and a cancel timer of a source of the nested dependent instruction expire and a producer instruction speculatively waking up the dependent instruction is canceled. | 09-18-2014 |
20140281432 | Systems and Methods for Move Elimination with Bypass Multiple Instantiation Table - Systems and methods for move operation elimination with bypass Multiple Instantiation Table (MIT) logic. An example processing system may comprise a first data structure configured to store a plurality of physical register values; a second data structure configured to store a plurality of pointers, each pointer referencing an element of the first data structure; a third data structure including a plurality of move elimination sets, each move elimination set comprising a plurality of bits representing a plurality of logical registers; and a logic configured to perform a data manipulation operation by causing an element of the second data structure to reference an element of the first data structure, the logic further configured to reflect results of two or more data manipulation operations by performing a single update of the third data structure. | 09-18-2014 |
20140289503 | PACKED DATA OPERATION MASK COMPARISON PROCESSORS, METHODS, SYSTEMS, AND INSTRUCTIONS - Receive packed data operation mask comparison instruction indicating first packed data operation mask having first packed data operation mask bits and second packed data operation mask having second packed data operation mask bits. Each packed data operation mask bit of first mask corresponds to a packed data operation mask bit of second mask in corresponding position. Modify first flag to first value if bitwise AND of each packed data operation mask bit of first mask with each corresponding packed data operation mask bit of second mask is zero. Otherwise modify first flag to second value. Modify second flag to third value if bitwise AND of each packed data operation mask bit of first mask with bitwise NOT of each corresponding packed data operation mask bit of second mask is zero. Otherwise modify second flag to fourth value. | 09-25-2014 |
20140304493 | METHODS AND SYSTEMS FOR PERFORMING A BINARY TRANSLATION - Systems and methods are provided in example embodiments for performing binary translation. A binary translation system converts, by a translator module, source instructions to target instructions. The binary translation system identifies a condition code block in the source instructions, where the condition code block includes a plurality of condition bits. In response to identifying the condition code block, the binary translation system provides an optimizer module to convert the condition code block. Then, the binary translation system performs a pre-execution on the condition code block to resolve the plurality of condition bits in the condition code block. | 10-09-2014 |
20140372733 | PROCESSOR WITH INTER-EXECUTION UNIT INSTRUCTION ISSUE - A processor includes an instruction storage memory, a processor core, and an instruction merge unit. The processor core includes a plurality of execution units coupled to the instruction storage memory. A first of the execution units is configured to execute instructions provided from the instruction storage memory via a first instruction path, and to execute instructions provided by a second of the execution units via a second instruction path. The second of the execution units is configured to execute instructions provided from the instruction storage memory, and to provide instructions for execution to the first of the execution units via the second instruction path. The instruction merge unit is configured to merge the instructions provided via the first and second instruction paths into a stream of instructions to be executed by the first execution unit. | 12-18-2014 |
20150012733 | SYSTEM AND METHOD FOR PERFORMING DETERMINISTIC PROCESSING - A system and method is provided for performing deterministic processing on a non-deterministic computer system. In one example, the system forces execution of one or more computer instructions to execute within a constant execution time. A deterministic engine, if necessary, waits a variable amount of time to ensure that the execution of the computer instructions is performed over the constant execution time. Because the execution time is constant, the execution is deterministic and therefore may be used in applications requiring deterministic behavior. For example, such a deterministic engine may be used in automated test equipment (ATE) applications. | 01-08-2015 |
20150026439 | APPARATUS AND METHOD FOR PERFORMING PERMUTE OPERATIONS - An apparatus and method are described for permuting data elements with masking. For example, a method according to one embodiment includes the following operations: reading values from a mask data structure to determine whether masking is implemented for each data element of a destination operand; if masking not implemented for a particular data element, then selecting data elements from a first source operand and a second source operand based on index values stored in destination operand to be copied to data element positions within the destination operand, wherein any one of the data elements from either the first source operand and the second source operand may be copied to any one of the data element positions within the destination operand; and if masking is implemented for a particular data element of the destination operand, then performing a designated masking operation with respect to that particular data element. | 01-22-2015 |
20150026440 | APPARATUS AND METHOD FOR PERFORMING A PERMUTE OPERATION - An apparatus and method are described for permuting data elements with masking. For example, a method according to one embodiment includes the following operations: reading values from a mask data structure to determine whether masking is implemented for each data element of a destination operand; if masking not implemented for a particular data element, then selecting data elements from the destination operand and a second source operand based on index values stored in a first source operand to be copied to data element positions within the destination operand, wherein any one of the data elements from either the destination operand and the second source operand may be copied to any one of the data element positions within the destination operand; if masking is implemented for a particular data element of the destination operand, then performing a designated masking operation with respect to that particular data element. | 01-22-2015 |
20150052337 | SELECTIVELY CONTROLLING INSTRUCTION EXECUTION IN TRANSACTIONAL PROCESSING - Execution of instructions in a transactional environment is selectively controlled. A TRANSACTION BEGIN instruction initiates a transaction and includes controls that selectively indicate whether certain types of instructions are permitted to execute within the transaction. The controls include one or more of an allow access register modification control and an allow floating point operation control. | 02-19-2015 |
20150058603 | Permute Operations With Flexible Zero Control - In one embodiment, the present invention includes logic to receive a permute instruction, first and second source operands, and control values, and to perform a permute operation based on an operation between at least two of the control values so that selected portions of the first and second source operands or a predetermined value can be stored into elements of a destination. Multiple permute instructions may be combined to perform efficient table lookups. Other embodiments are described and claimed. | 02-26-2015 |
20150089208 | PREDICTOR DATA STRUCTURE FOR USE IN PIPELINED PROCESSING - A predictor data structure is used for pipelined processing by a pipelined processor. The predictor data structure includes a predicted address to be used in return from execution of a selected instruction, and a predicted operating state associated with the predicted address. Based on determining a selected return instruction is to be executed, the predicted address to which processing is to be returned is obtained from the predictor data structure. Further, based on determining the selected return instruction is to be executed, a transitional operating state to be entered based on the predicted operating state stored in the predictor data structure is predicted, wherein at least one of the predicted address and the predicted transitional operating state are to be used to validate execution of the selected return instruction. | 03-26-2015 |
20150095625 | OPTIMIZATION OF INSTRUCTIONS TO REDUCE MEMORY ACCESS VIOLATIONS - Mechanisms for reducing memory access violations are disclosed. Sets of instructions may be identified and the identified sets of instructions may be re-translated or optimized to generate other sets of instructions. Execution of the other sets of instructions is analyzed to determine whether additional memory access violations occur. When additional memory access violations occur, further sets of instructions may be generated or re-translation/optimization of instructions may be disabled. | 04-02-2015 |
20150100767 | SELF-TIMED USER-EXTENSION INSTRUCTIONS FOR A PROCESSING DEVICE - A processor for executing configurable instructions and a method of configuring the processor are disclosed. In one embodiment, the processor includes (i) a processor core to execute preconfigured instructions and (ii) a processor core extension to execute user-defined extension instructions that are configurable instructions. The user-defined extension instructions may include an autonomous instruction with varying execution cycles based on source data and an operation performed. The processor core extension employs extension interface signals as a handshake protocol to operate together with the processor core without knowing any priori knowledge of how many processor clock cycles that the autonomous instruction will take to complete. | 04-09-2015 |
20150106600 | EXECUTION OF CONDITION-BASED INSTRUCTIONS - Execution of condition-based instructions is facilitated. A condition-based instruction is obtained, as well as a confidence level associated with the instruction. The confidence level is checked, and based on the confidence level being a first value, a predicted operation of the instruction, which is based on a predictor, is unconditionally performed. Further, based on the confidence level being a second value, a specified operation of the instruction, which is based on a determined condition, is conditionally performed. | 04-16-2015 |
20150121047 | READING A REGISTER PAIR BY WRITING A WIDE REGISTER - A read operation is initiated to obtain a wide input operand. Based on the initiating, a determination is made as to whether the wide input operand is available in a wide register or in two narrow registers. Based on determining the wide input operand is not available in the wide register, merging at least a portion of contents of the two narrow registers to obtain merged contents, writing the merged contents into the wide register, and continuing the read operation to obtain the wide input operand. Based on determining the wide input operand is available in the wide register, obtaining the wide input operand from the wide register. | 04-30-2015 |
20150143085 | VECTOR PROCESSING ENGINES (VPEs) EMPLOYING REORDERING CIRCUITRY IN DATA FLOW PATHS BETWEEN EXECUTION UNITS AND VECTOR DATA MEMORY TO PROVIDE IN-FLIGHT REORDERING OF OUTPUT VECTOR DATA STORED TO VECTOR DATA MEMORY, AND RELATED VECTOR PROCESSOR SYSTEMS AND METHODS - Vector processing engines (VPEs) employing reordering circuitry in data flow paths between execution units and vector data memory to provide in-flight reordering of output vector data stored to vector data memory are disclosed. Related vector processor systems and methods are also disclosed. Reordering circuitry is provided in data flow paths between execution units and vector data memory in the VPE. The reordering circuitry is configured to reorder output vector data sample sets from execution units as a result of performing vector processing operations in-flight while the output vector data sample sets are being provided over the data flow paths from the execution units to the vector data memory to be stored. In this manner, the output vector data sample sets are stored in the reordered format in the vector data memory without requiring additional post-processing steps, which may delay subsequent vector processing operations to be performed in the execution units. | 05-21-2015 |
20150143086 | VECTOR PROCESSING ENGINES (VPEs) EMPLOYING FORMAT CONVERSION CIRCUITRY IN DATA FLOW PATHS BETWEEN VECTOR DATA MEMORY AND EXECUTION UNITS TO PROVIDE IN-FLIGHT FORMAT-CONVERTING OF INPUT VECTOR DATA TO EXECUTION UNITS FOR VECTOR PROCESSING OPERATIONS, AND RELATED VECTOR PROCESSOR SYSTEMS AND METHODS - Vector processing engines (VPEs) employing format conversion circuitry in data flow paths between vector data memory and execution units to provide in-flight format-converting of input vector data to execution units for vector processing operations are disclosed. Related vector processor systems and methods are also disclosed. Format conversion circuitry is provided in data flow paths between vector data memory and execution units in the VPE. The format conversion circuitry is configured to convert input vector data sample sets fetched from vector data memory in-flight while the input vector data sample sets are being provided over the data flow paths to the execution units to be processed. In this manner, format conversion of the input vector data sample sets does not require pre-processing, storage, and re-fetching from vector data memory, thereby reducing power consumption and not limiting efficiency of the data flow paths by format conversion pre-processing delays. | 05-21-2015 |
20150143087 | SERVICE SYSTEM AND METHOD - A system includes provision of a first set of instructions associated with a product to a user of the product, the user having one or more associated characteristics, reception of a revision to the first set of instructions from the user, determination of whether to modify the first set of instructions based the revision and on the characteristics, and modification, if a determination is made to modify the first set of instructions, of the first set of instructions based at least in part on the revision to generate a second set of instructions. | 05-21-2015 |
20150143088 | VECTOR ELEMENT ROTATE AND INSERT UNDER MASK INSTRUCTION - A Vector Element Rotate and Insert Under Mask instruction. Each element of a second operand of the instruction is rotated in a specified direction by a specified number of bits. For each bit in a third operand of the instruction that is set to one, the corresponding bit of the rotated elements in the second operand replaces the corresponding bit in a first operand of the instruction. | 05-21-2015 |
20150293768 | COMPILING METHOD AND COMPILING APPARATUS - A compiling apparatus detects a plurality of branch instructions, each of which specifies execution of branch processing on the basis of a result of a comparison operation between integers and indicates the same jump destination, in a first code. The compiling apparatus converts the plurality of branch instructions into an instruction group having fewer branch instructions than the plurality of branch instructions by using logical and arithmetic instructions. The compiling apparatus generates a second code using the converted instruction group when the number of cycles of processing based on the converted instruction group is smaller than that based on the plurality of branch instructions. | 10-15-2015 |
20150301826 | SHARING PROCESSING RESULTS BETWEEN DIFFERENT PROCESSING LANES OF A DATA PROCESSING APPARATUS - A data processing apparatus has control circuitry for detecting whether a first micro-operation to be processed by a first processing lane would give the same result as a second micro-operation processed by a second processing lane. if they would give the same result, then the first micro-operation is prevented from being processed by the first processing lane and the result of the second micro-operation is output as the result of the first micro-operation. This avoids duplication of processing, to save energy for example. | 10-22-2015 |
20150301827 | REUSE OF RESULTS OF BACK-TO-BACK MICRO-OPERATIONS - A data processing apparatus has control circuitry for detecting whether a current micro-operation to be processed by processing circuitry is for the same data processing operation and specifies the same at least one operand as the last valid micro-operation processed by the processing circuitry. If so, then the control circuitry prevents the processing circuitry processing the current micro-operation so that an output register is not updated in response to the current micro-operation, and outputs the current value stored in the output register as the result of the current micro-operation. This allows power consumption to be reduced or performance to be improved by not repeating the same computation. | 10-22-2015 |
20150309798 | REGISTER RESOURCE LOCKING IN A VLIW PROCESSOR - An apparatus including a queue configured to store a plurality of instructions and state information indicating whether each instruction of the plurality of instructions can be performed independently of older pending instructions; and a state-selection circuit configured to set a state information of each instruction of the plurality of instructions ifs view of an older pending instruction in the queue. | 10-29-2015 |
20150309799 | STUNT BOX - A processor including a stunt box with an intermediate storage, including a plurality of registers, configured to store a plurality of execution pipe results as a plurality of intermediate results; a storage, communicatively coupled to the intermediate storage, configured to store a plurality of storage results which may include one or more of the plurality of intermediate results; and an arbiter, communicatively coupled to the intermediate storage and the storage, configured to receive the plurality of execution pipe results, the plurality of intermediate results, and the plurality of storage results and to select an output to retire from of the plurality of results, the plurality of intermediate results, and the plurality of storage results. | 10-29-2015 |
20150324200 | METHODS AND APPARATUS TO COMPILE INSTRUCTIONS FOR A VECTOR OF INSTRUCTION POINTERS PROCESSOR ARCHITECTURE - Methods, apparatus, systems, and articles of manufacture to compile instructions for a vector of instruction pointers (VIP) processor architecture are disclosed. An example method includes identifying a strand including a fork instruction introducing a first speculative assumption. A basing instruction to initialize a basing value of the strand before execution of a first instruction under the first speculative assumption. A determination of whether a second instruction under a second speculative assumption modifies a first memory address that is also modified by the first instruction under the first speculative assumption is made. The second instruction is not modified when the second instruction does not modify the first memory address. The second instruction is modified based on the basing value when the second instruction modifies the first memory address, the basing value to cause the second instruction to modify a second memory address different from the first memory address. | 11-12-2015 |
20150324201 | INFORMATION PROCESSING APPARATUS AND NON-TRANSITORY COMPUTER READABLE MEDIUM - An information processing apparatus includes a first controller that is controlled by a first operating system and a second controller that is controlled by a second operating system different from the first operating system. The first controller outputs a waiting instruction waiting persistently until a response is transmitted from a control target. The second controller converts the waiting instruction outputted from the first controller into a periodic instruction waiting periodically until a response is transmitted from the control target, and outputs the periodic instruction to the control target. | 11-12-2015 |
20150324228 | TRACE-BASED INSTRUCTION EXECUTION PROCESSING - A method for executing instructions in a thread processing environment includes determining a multiple requirements that must be satisfied and resources that must be available for executing multiple instructions. The multiple instructions are encapsulated into a schedulable structure. A header is configured for the schedulable structure with information including the determined multiple requirements and resources. The schedulable structure is schedule for executing each of the multiple instructions using the information. | 11-12-2015 |
20150339122 | PROCESSOR WITH CONDITIONAL INSTRUCTIONS - A computer implemented method for processing machine instructions by a physical processor, includes receiving a machine instruction, stored in a memory, to execute, the machine instruction including an identification of at least one first operation to execute and a conditional prefix representing a condition to verify to execute the at least one first operation; evaluating, using a management module, the prefix, and executing, using a processing unit, the at least one first operation identified in the machine instruction, according to whether the condition is verified or not. | 11-26-2015 |
20150370567 | METHOD AND APPARATUS FOR PERFORMANCE EFFICIENT ISA VIRTUALIZATION USING DYNAMIC PARTIAL BINARY TRANSLATION - Methods, apparatus and systems for virtualization of a native instruction set are disclosed. Embodiments include a processor core executing the native instructions and a second core, or alternatively only the second processor core consuming less power while executing a second instruction set that excludes portions of the native instruction set. The second core's decoder detects invalid opcodes of the second instruction set. A microcode layer disassembler determines if opcodes should be translated. A translation runtime environment identifies an executable region containing an invalid opcode, other invalid opcodes and interjacent valid opcodes of the second instruction set. An analysis unit determines an initial machine state prior to execution of the invalid opcode. A partial translation of the executable region that includes encapsulations of the translations of invalid opcodes and state recoveries of the machine states is generated and saved to a translation cache memory. | 12-24-2015 |
20150378732 | LATENT MODIFICATION INSTRUCTION FOR TRANSACTIONAL EXECUTION - An instruction stream includes a transactional code region. The transactional code region includes a latent modification instruction (LMI), a next sequential instruction (NSI) following the LMI, and a set of target instructions following the NSI in program order. Each target instruction has an associated function, and the LMI at least partially specifies a substitute function for the associated function. A processor executes the LMI, the NSI, and at least one of the target instructions, employing the substitute function at least partially specified by the LMI. The LMI, the NSI, and the target instructions may be executed by the processor in sequential program order or out of order. | 12-31-2015 |
20150378733 | REDUNDANCY ELIMINATION IN SINGLE INSTRUCTION MULTIPLE DATA/THREAD (SIMD/T) EXECUTION PROCESSING - A method for reducing execution of redundant threads in a processing environment. The method includes detecting threads that include redundant work among many different threads. Multiple threads from the detected threads are grouped into one or more thread clusters based on determining same thread computation results. Execution of all but a particular one thread in each of the one or more thread clusters is suppressed. The particular one thread in each of the one or more thread clusters is executed. Results determined from execution of the particular one thread in each of the one or more thread clusters are broadcasted to other threads in each of the one or more thread clusters. | 12-31-2015 |
20150378734 | SYSTEM AND METHODS FOR EXPANDABLY WIDE OPERAND INSTRUCTIONS - Expandably wide operations are disclosed in which operands wider than the data path between a processor and memory are used in executing instructions. The expandably wide operands reduce the influence of the characteristics of the associated processor in the design of functional units performing calculations, including the width of the register file, the processor clock rate, the exception subsystem of the processor, and the sequence of operations in loading and use of the operand in a wide cache memory. | 12-31-2015 |
20150378735 | LATENT MODIFICATION INSTRUCTION FOR TRANSACTIONAL EXECUTION - An instruction stream includes a transactional code region. The transactional code region includes a latent modification instruction (LMI), a next sequential instruction (NSI) following the LMI, and a set of target instructions following the NSI in program order. Each target instruction has an associated function, and the LMI at least partially specifies a substitute function for the associated function. A processor executes the LMI, the NSI, and at least one of the target instructions, employing the substitute function at least partially specified by the LMI. The LMI, the NSI, and the target instructions may be executed by the processor in sequential program order or out of order. | 12-31-2015 |
20150378741 | ARCHITECTURE AND EXECUTION FOR EFFICIENT MIXED PRECISION COMPUTATIONS IN SINGLE INSTRUCTION MULTIPLE DATA/THREAD (SIMD/T) DEVICES - A method for improving power, performance, area (PPA) for mixed precision computations in a processing environment. The method includes determining a braiding factor as a number of units of work encoded into a physical thread. A value of the braiding factor is determined based on a mix of precision requirements presented for individual units of work. Units of work are classified as instructions for applied code transformation based on associated precision requirements for the processing environment. Instruction inputs from specified registers are packed together into a destination register according to the determined value of the braiding factor. The packed instructions presented in vector form are executed with an instruction set architecture configured for executing packed instructions of different precisions. | 12-31-2015 |
20160011872 | APPARATUSES AND METHODS FOR GENERATING A SUPPRESSED ADDRESS TRACE | 01-14-2016 |
20160019062 | INSTRUCTION AND LOGIC FOR ADAPTIVE EVENT-BASED SAMPLING - A processor includes a core and an event-based sampler. The core includes logic to execute and retire an instruction. The event-based sampler includes logic determine a subset of a plurality of execution data of the processor from a register. The register includes bits specifying a subset of execution data. The event-based sampler further includes logic to selectively collect the determined subset of execution data upon retirement of the instruction and to store the selectively collected execution data. | 01-21-2016 |
20160019064 | Run-Length Encoding Decompression - Approaches are described to improve database performance by implementing a RLE decompression function at a low level within a general-purpose processor or an external block. Specifically, embodiments of a hardware implementation of an instruction for RLE decompression are disclosed. The described approaches improve performance by supporting the RLE decompression function within a processor and/or external block. Specifically, a RLE decompression hardware implementation is disclosed that produces a 64-bit RLE decompression result, with an example embodiment performing the task in two pipelined execution stages with a throughput of one per cycle. According to embodiments, hardware organization of narrow-width shifters operating in parallel, controlled by computed shift counts, is used to perform the decompression. Because of the decreased time required to perform RLE decompression according to embodiments, the performance of tasks that use embodiments described herein for decompression of run-length encoded data is made more efficient. | 01-21-2016 |
20160077835 | METHODS AND APPARATUS FOR STORAGE AND TRANSLATION OF ENTROPY ENCODED SOFTWARE EMBEDDED WITHIN A MEMORY HIERARCHY - A system for translating compressed instructions to instructions in an executable format is described. A translation unit is configured to decompress compressed instructions into a native instruction format using X and Y indices accessed from a memory, a translation memory, and a program specified mix mask. A level 1 cache is configured to store the native instruction format for each compressed instruction. The memory may be configured as a paged instruction cache to store pages of compressed instructions intermixed with pages of uncompressed instructions. Methods of determining a mix mask for efficiently translating compressed instructions is also described. A genetic method uses pairs of mix masks as genes from a seed population of mix masks that are bred and may be mutated to produce pairs of offspring mix masks to update the seed population. A mix mask for efficiently translating compressed instructions is determined from the updated seed population. | 03-17-2016 |
20160162296 | CONDITIONAL BRANCH INSTRUCTION COMPACTION FOR REGIONAL CODE SIZE REDUCTION - In an approach for decreasing an execution time of a computer code, one or more processors identify a long-form conditional branch that is included in a first region of a computer code. The one or more processors generate a long-form unconditional branch with a target that is a target of a long-form conditional branch. The one or more processors modify the long-form conditional branch to be a short-form conditional branch. The one or more processors insert the long-form unconditional branch into the computer code within a branch distance of the short-form conditional branch. The one or more processors modify a target of the short-form conditional branch to be a location of the long-form unconditional branch in the computer code. | 06-09-2016 |
20160202984 | INSTRUCTION FOR PERFORMING A PSEUDORANDOM NUMBER GENERATE OPERATION | 07-14-2016 |
20160378482 | EFFICIENT QUANTIZATION OF COMPARE RESULTS - A set machine instruction is provided that has associated therewith a result location to be used with a set operation. The set machine instruction is executed, which includes checking contents of a selected field, and determining, based on the checking, whether the contents of the selected field indicate a first condition, a second condition or a third condition represented in one data type. The result location is set to a value based on the determining, wherein the value, based on the setting, is of a data type different from the one data type and represents a result of a previously executed instruction. The result of the previously executed instruction being one of the first condition, the second condition or the third condition. | 12-29-2016 |
20160378484 | MAPPING INSTRUCTION BLOCKS BASED ON BLOCK SIZE - A processor core in an instruction block-based microarchitecture utilizes instruction blocks having headers that include an index to a size table that may be expressed using one of memory, register, logic, or code stream. A control unit in the processor core determines how many instructions to fetch for a current instruction block for mapping into an instruction window based on the block size that is indicated from the size table. As instruction block sizes are often unevenly distributed for a given program, utilization of the size table enables more flexibility in matching instruction blocks to the sizes of available slots in the instruction window as compared to arrangements in which instruction blocks have a fixed sized or are sized with less granularity. Such flexibility may enable denser instruction packing which increases overall processing efficiency by reducing the number of nops (no operations, such as null functions) in a given instruction block. | 12-29-2016 |
20160378485 | EFFICIENT QUANTIZATION OF COMPARE RESULTS - A set machine instruction is provided that has associated therewith a result location to be used with a set operation. The set machine instruction is executed, which includes checking contents of a selected field, and determining, based on the checking, whether the contents of the selected field indicate a first condition, a second condition or a third condition represented in one data type. The result location is set to a value based on the determining, wherein the value, based on the setting, is of a data type different from the one data type and represents a result of a previously executed instruction. The result of the previously executed instruction being one of the first condition, the second condition or the third condition. | 12-29-2016 |
20180024837 | CONTROLLING THE OPERATING SPEED OF STAGES OF AN ASYNCHRONOUS PIPELINE | 01-25-2018 |
20190146797 | SUPPLYING CONSTANT VALUES | 05-16-2019 |