Class / Patent application number | Description | Number of patent applications / Date published |
712223000 | Logic operation instruction processing | 66 |
20080222399 | METHOD FOR THE HANDLING OF MODE-SETTING INSTRUCTIONS IN A MULTITHREADED COMPUTING ENVIRONMENT - The present invention relates to the provisioning of mode-setting instruction as they relate to requisite hardware within a processing system. As such, the processing system allows for multiple programs, or processing threads of execution, to independently specify Modes, wherein modes are program specified assertions in regard to the processing system hardware environment (e.g., the temperature, voltage, frequency, gating functions, etc.). Thus, the objectives of the present invention are to facilitate a mutually acceptable environment for all of the processing threads that are being executed within a processing system; this objecting being subject to the respective processing requirements as requested by Mode-setting instructions that are specified by each executed processing thread. | 09-11-2008 |
20080256343 | Convergence determination and scaling factor estimation based on sensed switching activity or measured power consumption - A method and system for determining convergence of iterative processes and estimating a scaling factor in decoding processes based on switching activity of the logic circuitry are provided. During execution of an iterative process using logic circuitry comprising logic gates switching activity of a plurality of the logic gates is sensed to determine switching data indicative of a total switching activity of the plurality of the logic gates. The iterative process is iterated using the logic circuitry until convergence is indicated by the switching data. Similarly, a scaling factor for use in decoding processes is determined based on the switching data. | 10-16-2008 |
20080276078 | Method and Apparatus for Context Address Generation for Motion Vectors and Coefficients - A method for high/low usage is provided. The method receives a macroblock data structure and a syntax element at a digital signal processing engine. Further, the method classifies the syntax element as high use or low use. In addition, the method sends the syntax element from the digital signal processing engine to a logic unit, distinct from the digital processing engine, for binarization if the syntax element is high use. | 11-06-2008 |
20080307206 | METHOD AND APPARATUS TO EFFICIENTLY EVALUATE MONOTONICITY - A method and processor to evaluate a monotonicity of a set of input values is disclosed. The processor achieves high processing power by means of an arbitrary number of identical parallel processing elements. Each processing element allows instruction dependent data paths and makes use of ALU factories which consist of a number of separate arithmetic logical units (ALUs) are arranged in a special kind of matrix. The processor allows parallel evaluation and analysis of the monotonicity of a multitude of sets of values. A threshold value can be freely configured to allow an uncertainty of nearly equal values which is of high importance in digital signal processing. | 12-11-2008 |
20090089556 | High-Speed Add-Compare-Select (ACS) Circuit - A high speed add-compare-select (ACS) circuit for a Viterbi decoder or a turbo decoder has a lower critical path delay than that achievable using a traditional ACS circuit. According to one embodiment of the invention, the path and branch metrics are split into most-significant and least-significant portions, such portions separately added in order to reduce the propagation delay. | 04-02-2009 |
20090100253 | METHODS FOR PERFORMING EXTENDED TABLE LOOKUPS - A permutation instruction generates vector elements for a destination register using identified source and destination registers. A plurality of partial table lookups corresponding to an extended table produces a plurality of intermediate results. At least one source register stores a plurality of index values corresponding to the extended table. Out-of-range index values are values that are not contained in at least one additional source register and result in a predetermined constant value being stored into a predetermined vector element of the destination register. The index values are adjusted between the partial table lookups. A final result is formed by performing a logic function with the plurality of intermediate results. The final result is thereby formed without a full table lookup of each element of the final result. | 04-16-2009 |
20090164762 | OPTIMIZING XOR-BASED CODES - A “code optimizer” provides various techniques for optimizing arbitrary XOR-based codes for encoding and/or decoding of data. Further, the optimization techniques enabled by the code optimizer do not depend on any underlining code structure. Therefore, the optimization techniques provided by the code optimizer are applicable to arbitrary codes with arbitrary redundancy. As such, the optimized XOR-based codes generated by the code optimizer are more flexible than specially designed codes, and allow for any desired level of fault tolerance. Typical uses of XOR-based codes include, for example, encoding and/or decoding data using redundant data packets for data transmission real-time communications systems, encoding and/or decoding operations for storage systems such as RAID arrays, etc. | 06-25-2009 |
20090177870 | Method and System for a Wiring-Efficient Permute Unit - A method of providing wiring efficiency in a permute unit. Multiple selectors receive input data and shared control signals from multiple register files. The permute unit includes multiple multiplexors (MUXs) coupled to multiple logical AND gates. The multiple logical AND gates are coupled to multiple logical OR gates. The logical AND gates are physically separated from the logical OR gates. The logical AND gates receive input from one or more output data signals from the selectors. The logical OR gates combine the one or more output signals from the logical AND gates and provide output data from the permute unit. | 07-09-2009 |
20090249041 | SYSTEM AND METHOD FOR REDUCING POWER CONSUMPTION IN A DEVICE USING REGISTER FILES - A device and method for reducing the power consumption of an electronic device using register file with bypass mechanism. The width of a pulse controlling the word write operation may be extended twice as long so that the extended portion substantially overlaps a following word read pulse. The extension of the pulse width of the read operation may enable lowering the Vcc Min value for the electronic device and thus may lower the power consumption of the device. | 10-01-2009 |
20090292904 | APPARATUS AND METHOD FOR DISABLING A MICROPROCESSOR THAT PROVIDES FOR A SECURE EXECUTION MODE - An apparatus providing for a secure execution environment including a microprocessor and a secure non-volatile memory. The microprocessor executes non-secure application programs and a secure application program, where the non-secure application programs are accessed from a system memory via a system bus, and where the secure application program is executed in a secure execution mode. The microprocessor has secure watchdog logic that monitors environmental attributes corresponding to the microprocessor and to the secure application program, and that is configured to transfer program control to one of a plurality of event handlers within the secure application program. The secure non-volatile memory is coupled to the microprocessor via a private bus. The secure non-volatile memory is configured to store the secure application program, where transactions over the private bus between the microprocessor and the secure non-volatile memory are isolated from the system bus and corresponding system bus resources within the microprocessor. | 11-26-2009 |
20090319759 | SEAMLESS FREQUENCY SEQUESTERING - A method and apparatus for seamless frequency sequestering is herein described. In response to a frequency throttle event, controlling software, such as an OS, is provided access to a throttled amount of frequency associated with the frequency throttle event, while another amount of frequency is transparently sequestered for performance of non-controlling software tasks. | 12-24-2009 |
20100049952 | MICROPROCESSOR THAT PERFORMS STORE FORWARDING BASED ON COMPARISON OF HASHED ADDRESS BITS - An apparatus for decreasing the likelihood of incorrectly forwarding store data includes a hash generator, which hashes J address bits to K hashed bits. The J address bits are a memory address specified by a load/store instruction, where K is an integer greater than zero and J is an integer greater than K. The apparatus also includes a comparator, which outputs a first value if L address bits specified by the load instruction match L address bits specified by the store instruction and K hashed bits of the load instruction match corresponding K hashed bits of the store instruction, and otherwise to output a second value, where L is greater than zero. The apparatus also includes forwarding logic, which forwards data from the store instruction to the load instruction if the comparator outputs the first value and foregoes forwarding the data when the comparator outputs the second value. | 02-25-2010 |
20100077187 | System and Method to Execute a Linear Feedback-Shift Instruction - A system and method to execute a linear feedback-shift instruction is disclosed. In a particular embodiment the method includes executing an instruction at a processor by receiving source data and executing a bitwise logical operation on the source data and on reference data to generate intermediate data. The method further includes determining a parity value of the intermediate data, shifting the source data, and entering the parity value of the intermediate data into a data field of the shifted source data to produce resultant data. | 03-25-2010 |
20100100714 | System and Method of Indirect Register Access - Systems and methods are provided for managing access to registers. A system may include a set of direct registers and a set of indirect registers. The indirect registers may be accessed through the direct registers, and the direct registers may provide various features to provide faster access to the indirect registers. One of the direct registers may indicate access modes for accessing the indirect registers. The access modes may include auto-increment, auto-decrement, auto-reset, and no change modes. Based on the access mode, the currently accessed address may be automatically modified after accessing the indirect register at the address. | 04-22-2010 |
20100185837 | Reconfigurable Logic Automata - A family of reconfigurable asynchronous logic elements that interact with their nearest neighbors permits reconfigurable implementation of circuits that are asynchronous at the bit level, rather than at the level of functional blocks. These elements pass information by means of tokens. Each cell is self-timed, and cells that are configured as interconnect perform at propagation delay speeds, so no hardware non-local connections are needed. A reconfigurable asynchronous logic element comprises a set of edges for communication with at least one neighboring cell, each edge having an input for receiving tokens from neighboring cells and an output for transferring tokens to at least one neighboring cell, circuitry configured to perform a logic operation utilizing received tokens as inputs and to produce an output token reflecting the result of the logic operation, and circuitry. A reconfigurable lattice of asynchronous logic automata comprises a plurality of reconfigurable logic automata that compute by locally passing state tokens and are reconfigured by the directed shifting of programming instructions through neighboring logic elements. | 07-22-2010 |
20100217960 | Method of Performing Serial Functions in Parallel - A method for performing serial functions in parallel, where a datapath is divided into several independent stages, or pipeline stages, so that logical functions can be implemented in each pipeline stage concurrently. In an illustrative embodiment of the invention, a pipelined logic tree is described. This method allows for n-bits to be input to the system and n-bits to output from the system concurrently. | 08-26-2010 |
20100293358 | DYNAMIC PROCESSOR-SET MANAGEMENT - A dynamic processor-set management method provides for transferring a process from a shared processor set to a dedicated processor set when that process meets a first utilization-related criterion. The method also provides for transferring a process between from a dedicated processor set to a shared processor set when that process meets a second utilization-related criterion. The processor sets are mapped to processor cores that execute the processes. | 11-18-2010 |
20100299506 | ROTATE THEN OPERATE ON SELECTED BITS FACILITY AND INSTRUCTIONS THEREFORE - A rotate then operate instruction having a T bit is fetched and executed wherein a first operand in a first register is rotated by an amount and a Boolean operation is performed on a selected portion of the rotated first operand and a second operand in of a second register. If the T bit is ‘0’ the selected portion of the result of the Boolean operation is inserted into corresponding bits of a second operand of a second register. If the T bit is ‘1’, in addition to the inserted bits, the bits other than the selected portion of the rotated first operand are saved in the second register. | 11-25-2010 |
20100318773 | INCLUSIVE "OR" BIT MATRIX COMPARE RESOLUTION OF VECTOR UPDATE CONFLICT MASKS - A computer system is operable to identify index elements in a vector index array that cannot be processed in parallel by calculating a complement modified bit matrix compare function between a first matrix filled with elements from the vector index array and a second matrix filled with the same elements from the vector index array. | 12-16-2010 |
20110138156 | METHOD AND APPARATUS FOR EVALUATING A LOGICAL EXPRESSION AND PROCESSOR MAKING USE OF SAME - A method and associated processor suitable for executing machine instructions for evaluating a logical expression are provided. The approach suggested makes use of a memory and an extended set of instructions. The memory, which can be embodied in a general purpose register for example, is for storing information related to an intermediate results obtained in evaluating the logical expression as well as a nesting level of sub-expressions in the logical expression being evaluated. The extended set of instruction allows for initializing and updating the information in that memory. A processor for executing the extended set of instruction is also provided along with a process for generating machine code making use of this extended set of instructions for evaluating a logical expression. | 06-09-2011 |
20110153997 | Bit Range Isolation Instructions, Methods, and Apparatus - Receiving an instruction indicating a source operand and a destination operand. Storing a result in the destination operand in response to the instruction. The result operand may have: (1) first range of bits having a first end explicitly specified by the instruction in which each bit is identical in value to a bit of the source operand in a corresponding position; and (2) second range of bits that all have a same value regardless of values of bits of the source operand in corresponding positions. Execution of instruction may complete without moving the first range of the result relative to the bits of identical value in the corresponding positions of the source operand, regardless of the location of the first range of bits in the result. Execution units to execute such instructions, computer systems having processors to execute such instructions, and machine-readable medium storing such an instruction are also disclosed. | 06-23-2011 |
20110191570 | Method And Apparatus For Performing Logical Compare Operations - A method and apparatus for including in a processor instructions for performing logical-comparison and branch support operations on packed or unpacked data. In one embodiment, a processor is coupled to a memory. The memory has stored therein a first data and a second data. The processor performs logical comparisons on the first and second data. The logical comparisons may be performed on each bit of the first and second data, or may be performed only on certain bits. For at least one embodiment, at least the first data includes packed data elements, and the logical comparisons are performed on the most significant bits of the packed data elements. The logical comparisons may include comparison of the same respective bits of the first and second data, and also includes logical comparisons of bits of the first data with the complement of the corresponding bits of the second data. Based on these comparisons, branch support actions are taken. Such branch support actions may include setting one or more flags, which in turn may be utilized by a branching unit. Alternatively, the branch support actions may include branching to an indicated target code location. | 08-04-2011 |
20110258419 | Attaching And Virtualizing Reconfigurable Logic Units To A Processor - In one embodiment, the present invention includes a pipeline to execute instructions out-of-order, where the pipeline has front-end stages, execution units, and back-end stages, and the execution units are coupled between dispatch ports of the front-end stages and writeback ports of the back-end stages. Further, a reconfigurable logic is coupled between one of the dispatch ports and one of the writeback ports. Other embodiments are described and claimed. | 10-20-2011 |
20120030451 | PARALLEL AND LONG ADAPTIVE INSTRUCTION SET ARCHITECTURE - An Parallel and Long Adaptive Instruction Set Architecture (PALADIN) is provided to optimize packet processing. The Instruction Set Architecture (ISA) includes instructions such as aggregate comparison, comparison OR, comparison AND and bitwise instructions. The ISA also includes dedicated packet processing instructions such as hash, predicate, select, checksum and time to live adjust, move header left, post, move header left/right and load/store header/status. | 02-02-2012 |
20120047355 | Information Processing Apparatus Performing Various Bit Operation and Information Processing Method Thereof - An information processing apparatus operates data stored in an input register for each bit and stores a result thereof in an output register. A selector circuit selects output data of a bit from input data of 128 bits in the input register. An AND circuit outputs, only when data from a corresponding selector circuit is valid, the data to a corresponding bit of the output register. A control signal generator inputs a select signal indicating the number of a bit to be selected to each selector circuit, and also inputs a signal indicating whether data input from the selector circuit is valid or invalid to each AND circuit. | 02-23-2012 |
20120072704 | "OR" BIT MATRIX MULTIPLY VECTOR INSTRUCTION - A processor is operable to execute a bit matrix multiply instruction. In further examples, the processor is operable to perform a vector bit matrix multiply instruction, and is a part of a computerized system. | 03-22-2012 |
20120144170 | DYNAMICALLY SCALABLE PER-CPU COUNTERS - Embodiments include a reference counting system and method for a multiprocessor system including distributed per-CPU counters having a dynamically variable batch size. A global counter is dynamically updated as each per-CPU counter reaches its associated batch size. An initial batch size provides a desired scalability. The batch size is automatically reduced as the global count approaches a predefined target, to increase the accuracy of the global count. Counting can be performed atomically using architecturally supported atomic operations. Using synchronized counters, counting can be done with a lock held by each processor to provide the necessary mutual exclusion for performing the atomic operations. | 06-07-2012 |
20120204014 | Systems and Methods for Improving Divergent Conditional Branches - Embodiments of the present invention provide systems, methods, and computer program products for improving divergent conditional branches in code being executed by a processor. For example, in an embodiment, a method comprises detecting a conditional statement of a program being simultaneously executed by a plurality of threads, determining which threads evaluate a condition of the conditional statement as true and which threads evaluate the condition as false, pushing an identifier associated with the larger set of the threads onto a stack, executing code associated with a smaller set of the threads, and executing code associated with the larger set of the threads. | 08-09-2012 |
20120216021 | Performing An All-To-All Data Exchange On A Plurality Of Data Buffers By Performing Swap Operations - Methods, apparatus, and products are disclosed for performing an all-to-all exchange on n number of data buffers using XOR swap operations. Each data buffer has n number of data elements. Performing an all-to-all exchange on n number of data buffers using XOR swap operations includes for each rank value of i and j where i is greater than j and where i is less than or equal to n: selecting data element i in data buffer j; selecting data element j in data buffer i; and exchanging contents of data element i in data buffer j with contents of data element j in data buffer i using an XOR swap operation. | 08-23-2012 |
20130067205 | INSTRUCTION PACKET INCLUDING MULTIPLE INSTRUCTIONS HAVING A COMMON DESTINATION - An apparatus includes a processor and a memory coupled to the processor. The memory stores an instruction packet (e.g., a VLIW instruction packet) including a first predicate independent instruction and a second predicate independent instruction. Each of the predicate independent instructions has the same destination. | 03-14-2013 |
20130138928 | VLIW PROCESSOR, INSTRUCTION STRUCTURE, AND INSTRUCTION EXECUTION METHOD - A first operation unit | 05-30-2013 |
20130159683 | INSTRUCTION PREDICATION USING INSTRUCTION ADDRESS PATTERN MATCHING - A particular method includes receiving, at a processor, an instruction and an address of the instruction. The method also includes preventing execution of the instruction based at least in part on determining that the address is within a range of addresses. | 06-20-2013 |
20130212363 | MACHINE TRANSPORT AND EXECUTION OF LOGIC SIMULATION - Technologies related to machine transport and execution of logic simulation. In some examples, logic simulation systems may cyclically calculate logic state vectors based on the current state and inputs into the system. The state vector is a state of a logic storage element in a model. State vectors may be distributed from a core of common memory to one or more arrays of processors to compute the next state vector. The one or more arrays of processors are connected with arrays of logic processors and memory for efficiency and speed. | 08-15-2013 |
20130227253 | METHOD AND APPARATUS FOR PERFORMING LOGICAL COMPARE OPERATIONS - A method and apparatus for including in a processor instructions for performing logical-comparison and branch support operations on packed or unpacked data. In one embodiment, instruction decode logic decodes instructions for an execution unit to operate on packed data elements including logical comparisons. A register file including 128-bit packed data registers stores packed single-precision floating point (SPFP) and packed integer data elements. The logical comparisons may include comparison of SPFP data elements and comparison of integer data elements and setting at least one bit to indicate the results. Based on these comparisons, branch support actions are taken. Such branch support actions may include setting the at least one bit, which in turn may be utilized by a branching unit in response to a branch instruction. Alternatively, the branch support actions may include branching to an indicated target code location. | 08-29-2013 |
20130254517 | APPARATUS AND METHOD FOR PROCESSING INVALID OPERATION IN PROLOGUE OR EPILOGUE OF LOOP - An apparatus for processing an invalid operation in a prologue and/or an epilogue of a loop includes a register file including a first region for storing a data validity value indicating whether data is valid or invalid, and a second region for storing the data; and a functional unit configured to determine whether an operation is valid or invalid based on a value of a first region of each of one or more input sources received from the register file, and output a destination including a value based on the value of the first region of each of the input sources | 09-26-2013 |
20140095844 | Systems, Apparatuses, and Methods for Performing Rotate and XOR in Response to a Single Instruction - Disclosed herein are systems, apparatuses, and methods performing in a computer processor of performing a rotate and XOR in response to a single XOR and rotate instruction, wherein the rotate and XOR instruction includes a first and second source operand, a destination operand, and an immediate value. | 04-03-2014 |
20140095845 | APPARATUS AND METHOD FOR EFFICIENTLY EXECUTING BOOLEAN FUNCTIONS - An apparatus and method are described for performing efficient Boolean operations in a pipelined processor which, in one embodiment, does not natively support three operand instructions. For example, a processor according to one embodiment of the invention comprises: a set of registers for storing packed operands; Boolean operation logic to execute a single instruction which uses three or more source operands packed in the set of registers, the Boolean operation logic to read at least three source operands and an immediate value to perform a Boolean operation on the three source operands, wherein the Boolean operation comprises: combining a bit read from each of the three operands to form an index to the immediate value, the index identifying a bit position within the immediate value; reading the bit from the identified bit position of the immediate value; and storing the bit from the identified bit position of the immediate value in a destination register. | 04-03-2014 |
20140122839 | APPARATUS AND METHOD OF EXECUTION UNIT FOR CALCULATING MULTIPLE ROUNDS OF A SKEIN HASHING ALGORITHM - An apparatus is described that includes an execution unit within an instruction pipeline. The execution unit has multiple stages of a circuit that includes a) and b) as follows. a) a first logic circuitry section having multiple mix logic sections each having: i) a first input to receive a first quad word and a second input to receive a second quad word; ii) an adder having a pair of inputs that are respectively coupled to the first and second inputs; iii) a rotator having a respective input coupled to the second input; iv) an XOR gate having a first input coupled to an output of the adder and a second input coupled to an output of the rotator. b) permute logic circuitry having inputs coupled to the respective adder and XOR gate outputs of the multiple mix logic sections. | 05-01-2014 |
20140149721 | METHOD, COMPUTER PROGRAM PRODUCT, AND SYSTEM FOR A MULTI-INPUT BITWISE LOGICAL OPERATION - A method, computer program product, and system are provided for multi-input bitwise logical operations. The method includes the steps of receiving a multi-input bitwise logical operation instruction that specifies two or more input operands and a function operand, where a first input operand of the two or more input operands comprises a number of bits, each bit having a corresponding bit in each of the additional input operands in the two or more input operands. The function operand is written to a lookup table. Then, the lookup table is accessed for each set of corresponding input operand bits in the two or more input operands to generate an output for the multi-input bitwise logical operation instruction. | 05-29-2014 |
20140189322 | Systems, Apparatuses, and Methods for Masking Usage Counting - Embodiments of systems, apparatuses, and methods for counting instructions of a particular type are described herein. In some embodiments, a processor includes a plurality of write mask registers, logic to determine write mask register usage of an instruction in a particular manner and a counter to count a number of instances of instructions that have been determined to use a write mask register in the particular manner. | 07-03-2014 |
20140223147 | SYSTEMS AND METHODS FOR VIRTUAL PARALLEL COMPUTING USING MATRIX PRODUCT STATES - A virtual parallel computing system and method represents bits with matrices and computes over all input states in parallel through a sequence of matrix operations. The matrix operations relate to logic gate operators to carry out a function implementation that represents a problem to be solved. Initial matrices are prepared to encode the weights of all input states, which can be binary states. Intermediate results can be simplified to decrease computational complexity while maintaining useful approximation results. The final matrices can encode the answer(s) to the problem represented by the function implementation. The system and method are particularly useful in speeding up database searches and in counting solutions of satisfiability problems. | 08-07-2014 |
20140250288 | SYSTEMS AND METHODS THAT FORMULATE EMBEDDINGS OF PROBLEMS FOR SOLVING BY A QUANTUM PROCESSOR - Systems and methods allow formulation of embeddings of problems via targeted hardware (e.g., particular quantum processor). In a first stage, sets of connected subgraphs are successively generated, each set including a respective subgraph for each decision variable in the problem graph, adjacent decisions variables in the problem graph mapped to respective vertices in the hardware graph, the respective vertices which are connected by at least one respective edge in the hardware graph. In a second stage, the connected subgraphs are refined such that no vertex represents more than a single decision variable. | 09-04-2014 |
20140281422 | Method and Apparatus for Sorting Elements in Hardware Structures - A method for sorting elements in hardware structures is disclosed. The method comprises selecting a plurality of elements to order from an unordered input queue (UIQ) within a predetermined range in response to finding a match between at least one most significant bit of the predetermined range and corresponding bits of a respective identifier associated with each of the plurality of elements. The method further comprises presenting each of the plurality of elements to a respective multiplexer. Further the method comprises generating a select signal for an enabled multiplexer in response to finding a match between at least one least significant bit of a respective identifier associated with each of the plurality of elements and a port number of the ordered queue. Finally, the method comprises forwarding a packet associated with a selected element identifier to a matching port number of the ordered queue from the enabled multiplexer. | 09-18-2014 |
20140365750 | VOLTAGE DROOP REDUCTION BY DELAYED BACK-PROPAGATION OF PIPELINE READY SIGNAL - A system, method, and computer program product for generating flow-control signals for a processing pipeline is disclosed. The method includes the steps of generating, by a first pipeline stage, a delayed ready signal based on a downstream ready signal received from a second pipeline stage and a throttle disable signal. A downstream valid signal is generated by the first pipeline stage based on an upstream valid signal and the delayed ready signal. An upstream ready signal is generated by the first pipeline stage based on the delayed ready signal and the downstream valid signal. | 12-11-2014 |
20160085556 | INSTRUCTION AND LOGIC FOR SCHEDULING INSTRUCTIONS - A processor includes a front end and a scheduler. The front end includes logic to determine whether to apply an acyclical or cyclical thread assignment scheme to code received at the processor, and to, based upon a determined thread assignment scheme, assign code to a static logical thread and to a rotating logical thread. The scheduler includes logic to assign the static logical thread to the same physical thread upon a subsequent control flow execution of the static logical thread, and to assign the rotating logical thread to different physical threads upon different executions of instructions in the rotating logical thread. | 03-24-2016 |
20160117168 | VLIW PROCESSOR, INSTRUCTION STRUCTURE, AND INSTRUCTION EXECUTION METHOD - A processor, includes a first comparison operation unit; a second comparison operation unit; a first operation unit; a second operation unit; a third operation unit; and a register, wherein the first comparison operation unit receives a first comparison operation signal, a first input signal, and a second input signal, performs a comparison operation indicated by the first comparison operation signal on the first input signal and the second input signal, and outputs a result of the comparison operation, the second comparison operation unit receives a second comparison operation signal, a third input signal, and a fourth input signal, performs a comparison operation indicated by the second comparison operation signal on the third input signal and the fourth input signal, and outputs a result of the comparison operation, the first operation unit receives the comparison result of the first comparison operation unit. | 04-28-2016 |
20160378469 | INSTRUCTION TO PERFORM A LOGICAL OPERATION ON CONDITIONS AND TO QUANTIZE THE BOOLEAN RESULT OF THAT OPERATION - A machine instruction is provided that has associated therewith a result location to be used for a set operation, a first source, a second source, and an operation select field configured to specify a plurality of selectable operations. The machine instruction is executed, which includes obtaining the first source, the second source, and a selected operation, and performing the selected operation on the first source and the second source to obtain a result in one data type. That result is quantized to a value in a different data type, and the value is placed in the result location. | 12-29-2016 |
20160378471 | INSTRUCTION AND LOGIC FOR EXECUTION CONTEXT GROUPS FOR PARALLEL PROCESSING - A processor includes cores and a context management circuit. The circuit includes logic to determine an execution context group (ECG) to be migrated between cores. The ECG is to include application threads. The circuit also includes logic to halt all execution contexts in the ECG before migrating the ECG, reassign processor affinity to designate the target core, and restart execution of the ECG. | 12-29-2016 |
20160378472 | Instruction and Logic for Predication and Implicit Destination - A processor includes a front end to receive an instruction. The processor also includes a core to execute the instruction. The core includes logic to execute a base function of the instruction to yield a result, generate a predicate value of a comparison of the result based upon a predication setting in the instruction, and set the predicate value in a register. The processor also includes a retirement unit to retire the instruction. | 12-29-2016 |
20160378475 | INSTRUCTION TO PERFORM A LOGICAL OPERATION ON CONDITIONS AND TO QUANTIZE THE BOOLEAN RESULT OF THAT OPERATION - A machine instruction is provided that has associated therewith a result location to be used for a set operation, a first source, a second source, and an operation select field configured to specify a plurality of selectable operations. The machine instruction is executed, which includes obtaining the first source, the second source, and a selected operation, and performing the selected operation on the first source and the second source to obtain a result in one data type. That result is quantized to a value in a different data type, and the value is placed in the result location. | 12-29-2016 |
712224000 | Masking | 16 |
20090049283 | INFORMATION PROCESSING DEVICE AND INSTRUCTION EXECUTING METHOD - An information processing device including registers ( | 02-19-2009 |
20090077354 | Techniques for Predicated Execution in an Out-of-Order Processor - A technique for handling predicated code in an out-of-order processor includes detecting a predicate defining instruction associated with a predicated code region. Renaming of predicated instructions, within the predicated code region, is then stalled until a predicate of the predicate defining instruction is resolved. | 03-19-2009 |
20090089557 | UTILIZING MASKED DATA BITS DURING ACCESSES TO A MEMORY - Embodiments of an apparatus that uses unused masked data bits during an access to a memory are described. This apparatus includes a selection circuit, which selects data bits to be driven on data lines during the access to the memory. This selection circuit includes a control input that receives a data mask signal, which indicates whether a set of data bits is to be masked during the access to the memory. During the access to the memory, the selection circuit selects either the set of data bits to be driven when the data mask signal is not asserted, or an alternative set of values to be driven when the data mask signal is asserted. | 04-02-2009 |
20090150655 | METHOD OF UPDATING REGISTER, AND REGISTER AND COMPUTER SYSTEM TO WHICH THE METHOD CAN BE APPLIED - A register updating method includes generating third information including first information and second information, wherein the first information indicates whether updating of regions of a register block is allowed and the second information includes information that is to be updated in the register block, transmitting the third information to an address of the register block that is to be updated, and selecting a part of the second information in a unit of the regions and writing the selected second information to the register block, based on the first information included in the received third information. | 06-11-2009 |
20100023734 | SYSTEM, METHOD AND COMPUTER PROGRAM PRODUCT FOR EXECUTING A HIGH LEVEL PROGRAMMING LANGUAGE CONDITIONAL STATEMENT - A method for executing an instruction, the method includes: executing a compare and configure mask instruction, wherein the executing comprises: performing a comparison to provide a comparison result; and configuring, in response to the comparison result, a multiple bit mask that is stored in a multiple-purpose register; wherein all bits of the multiple bit mask are configured to have the same value; and applying an algorithmic operation on the multiple bit mask to provide an algorithmic operation result; wherein the algorithmic operation result represents an outcome of a high level programming language conditional statement. | 01-28-2010 |
20100169619 | Efficient Encoding for Detecting Load Dependency on Store with Misalignment - In one embodiment, an apparatus comprises a queue comprising a plurality of entries and a control unit coupled to the queue. The control unit is configured to allocate a first queue entry to a store memory operation, and is configured to write a first even offset, a first even mask, a first odd offset, and a first odd mask corresponding to the store memory operation to the first entry. A group of contiguous memory locations are logically divided into alternately-addressed even and odd byte ranges. A given store memory operation writes at most one even byte range and one adjacent odd byte range. The first even offset identifies a first even byte range that is potentially written by the store memory operation, and the first odd offset identifies a first odd byte range that is potentially written by the store memory operation. The first even mask identifies bytes within the first even byte range that are written by the store memory operation, and wherein the first odd mask identifies bytes within the first odd byte range that are written by the store memory operation. | 07-01-2010 |
20130246760 | CONDITIONAL IMMEDIATE VALUE LOADING INSTRUCTIONS - A data processor comprising a plurality of registers, and instruction execution circuitry having an associated instruction set, wherein the instruction set includes an instruction specifying at least a mask operand, a register operand and an immediate value operand, and the instruction execution circuitry, in response to an instance of the instruction, determines a Boolean value based on the mask operand and sets a respective one of a plurality of registers specified by the register operand of the instance to a value of the immediate value operand if the Boolean value is true. The instruction execution circuitry, in response to the instance of the instruction, may set the respective one of the plurality of registers specified by the register operand of the instance to zero if the Boolean value is false. | 09-19-2013 |
20130290687 | APPARATUS AND METHOD OF IMPROVED PERMUTE INSTRUCTIONS - An apparatus is described having instruction execution logic circuitry. The instruction execution logic circuitry has input vector element routing circuitry to perform the following for each of three different instructions: for each of a plurality of output vector element locations, route into an output vector element location an input vector element from one of a plurality of input vector element locations that are available to source the output vector element. The output vector element and each of the input vector element locations are one of three available bit widths for the three different instructions. The apparatus further includes masking layer circuitry coupled to the input vector element routing circuitry to mask a data structure created by the input vector routing element circuitry. The masking layer circuitry is designed to mask at three different levels of granularity that correspond to the three available bit widths. | 10-31-2013 |
20140189323 | APPARATUS AND METHOD FOR PROPAGATING CONDITIONALLY EVALUATED VALUES IN SIMD/VECTOR EXECUTION - An apparatus and method for propagating conditionally evaluated values. For example, a method according to one embodiment comprises: reading each value contained in an input mask register, each value being a true value or a false value and having a bit position associated therewith; for each true value read from the input mask register, generating a first result containing the bit position of the true value; for each false value read from the input mask register following the first true value, adding the vector length of the input mask register to a bit position of the last true value read from the input mask register to generate a second result; and storing each of the first results and second results in bit positions of an output register corresponding to the bit positions read from the input mask register. | 07-03-2014 |
20140365751 | OPERAND GENERATION IN AT LEAST ONE PROCESSING PIPELINE - A data processing apparatus has at least one processing pipeline having first, second and third pipeline stages. The first pipeline stage detects whether a stream of instructions to be processed includes a predetermined instruction sequence comprising first and second instructions for performing first and second operand generation operations, where the second operand generation operation is dependent on an outcome of the first. In response to detecting this instruction sequence, the first pipeline stage generates a modified stream of instructions in which at least the second instruction is replaced with a third instruction for performing a combined operand generation operation having the same effect as the first and second operand generation operations. As the third instruction can be scheduled independently of the first instruction, processing performance of the pipeline can be improved. | 12-11-2014 |
20160092212 | DYNAMIC ISSUE MASKS FOR PROCESSOR HANG PREVENTION - Embodiments include issuing dynamic issue masks for processor hang prevention. Aspects include storing an instruction in an issue queue for execution by an execution unit, the instruction including a default issue mask. Aspects further include determining whether the instruction in the issue queue is likely to be rescinded by the execution unit. Based on determining that the instruction is not likely to be rescinded by the execution unit, aspects include issuing the instruction to the execution unit with the default issue mask. Based on determining that the instruction is likely to be rescinded by the execution unit, aspects include issuing the instruction to the execution unit with a likely to be rescinded issue mask. | 03-31-2016 |
20160139920 | CARRY CHAIN FOR SIMD OPERATIONS - Examples of a carry chain for performing an operation on operands each including elements of a selectable size is provided. Advantageously, the carry chain adapts to elements of different sizes. The carry chain determines a mask based on a selected size of an element. The carry chain selects, based on the mask, whether to carry a partial result of an operation performed on corresponding first portions of a first operand and a second operand into a next operation. The next operation is performed on corresponding second portions of the first operand and the second operand, and, based on the selection, the partial result of the operation. The carry chain stores, in a memory, a result formed from outputs of the operation and the next operation. | 05-19-2016 |
20160139923 | LOAD REGISTER ON CONDITION IMMEDIATE OR IMMEDIATE INSTRUCTION - A data processor comprising a plurality of registers, and instruction execution circuitry having an associated instruction set, wherein the instruction set includes an instruction specifying at least a mask operand, a register operand and an immediate value operand, and the instruction execution circuitry, in response to an instance of the instruction, determines a Boolean value based on the mask operand and sets a respective one of a plurality of registers specified by the register operand of the instance to a value of the immediate value operand if the Boolean value is true. The instruction execution circuitry, in response to the instance of the instruction, may set the respective one of the plurality of registers specified by the register operand of the instance to zero if the Boolean value is false. | 05-19-2016 |
20160378468 | CONVERSION OF BOOLEAN CONDITIONS - A Set Boolean machine instruction is provided that has associated therewith a result location to be used for a set Boolean operation and a mask. The mask is configured to test a plurality of types of conditions, including simple conditions and composite conditions. The machine instruction is executed, and the executing includes performing a first logical operation between the mask and contents of a selected field to obtain an output. The mask indicates a condition to be tested, and the condition is one type of condition of the plurality of types of conditions. The executing further includes performing a second logical operation on the output to obtain a first value represented as one data type, and placing a result in the result location based on the first value. The result including a second a value of another data type, the other data type being different from the one data type. | 12-29-2016 |
20160378474 | CONVERSION OF BOOLEAN CONDITIONS - A Set Boolean machine instruction is provided that has associated therewith a result location to be used for a set Boolean operation and a mask. The mask is configured to test a plurality of types of conditions, including simple conditions and composite conditions. The machine instruction is executed, and the executing includes performing a first logical operation between the mask and contents of a selected field to obtain an output. The mask indicates a condition to be tested, and the condition is one type of condition of the plurality of types of conditions. The executing further includes performing a second logical operation on the output to obtain a first value represented as one data type, and placing a result in the result location based on the first value. The result including a second a value of another data type, the other data type being different from the one data type. | 12-29-2016 |
20220137929 | DATA PROCESSING DEVICE AND METHOD FOR THE CRYPTOGRAPHIC PROCESSING OF DATA - According to one example embodiment, a data processing device is described, having a processing circuit which processes a data block cryptographically iteratively, starting from the received version of the data block, via a plurality of processed versions of the data block through to an output version of the data block in a plurality of successive rounds by means of an S-box. The S-box has a plurality of layers, in each case having a majority gate and an Exclusive-OR gate. | 05-05-2022 |