Class / Patent application number | Description | Number of patent applications / Date published |
712204000 | INSTRUCTION ALIGNMENT | 34 |
20080222391 | Apparatus and Method for Optimizing Scalar Code Executed on a SIMD Engine by Alignment of SIMD Slots - An apparatus and method for optimizing scalar code executed on a single instruction multiple data (SIMD) engine is provided that aligns the slots of SIMD registers. With the apparatus and method, a compiler is provided that parses source code and, for each statement in the program, generates an expression tree. The compiler inspects all storage inputs to scalar operations in the expression tree to determine their alignment in the SIMD registers. This alignment is propagated up the expression tree from the leaves. When the alignments of two operands in the expression tree are the same, the resulting alignment is the shared value. When the alignments of two operands in the expression tree are different, one operand is shifted. For shifted operands, a shift operation is inserted in the expression tree. The executable code is then generated for the expression tree and shifts are inserted where indicated. | 09-11-2008 |
20080229066 | System and Method for Compiling Scalar Code for a Single Instruction Multiple Data (SIMD) Execution Engine - A system, method, and computer program product are provided for performing scalar operations using a SIMD data parallel execution unit. With the mechanisms of the illustrative embodiments, scalar operations in application code are identified that may be executed using vector operations in a SIMD data parallel execution unit. The scalar operations are converted, such as by a static or dynamic compiler, into one or more vector load instructions and one or more vector computation instructions. In addition, control words may be generated to adjust the alignment of the scalar values for the scalar operation within the vector registers to which these scalar values are loaded using the vector load instructions. The alignment amounts for adjusting the scalar values within the vector registers may be statically or dynamically determined. | 09-18-2008 |
20080270756 | SHIFT SIGNIFICAND OF DECIMAL FLOATING POINT DATA - A decimal floating point finite number in a decimal floating point format is composed from the number in a different format. A decimal floating point format includes fields to hold information relating to the sign, exponent and significand of the decimal floating point finite number. Other decimal floating point data, including infinities and NaNs (not a number), are also composed. Decimal floating point data are also decomposed from the decimal floating point format to a different format. For composition and decomposition, one or more instructions may be employed, including a shift significand instruction. | 10-30-2008 |
20090037694 | Load Misaligned Vector with Permute and Mask Insert - Embodiments of the invention provide logic within the store data path between a processor and a memory array. The logic may be configured to misalign vector data as it is stored to memory. By misaligning vector data as it is stored to memory, memory bandwidth may be maximized while processing bandwidth required to store vector data misaligned is minimized. Furthermore, embodiments of the invention provide logic within the load data path which allows vector data which is stored misaligned to be aligned as it is loaded into a vector register. By aligning misaligned vector data as it is loaded into a vector register, memory bandwidth may be maximized while processing bandwidth required to align misaligned vector data may be minimized. | 02-05-2009 |
20090063818 | Alignment of Cache Fetch Return Data Relative to a Thread - A method of obtaining data, comprising at least one sector, for use by at least a first thread wherein each processor cycle is allocated to at least one thread, includes the steps of: requesting data for at least a first thread; upon receipt of at least a first sector of the data, determining whether the at least first sector is aligned with the at least first thread, wherein a given sector is aligned with a given thread when a processor cycle in which the given sector will be written is allocated to the given thread; responsive to a determination that the at least first sector is aligned with the at least first thread, bypassing the at least first sector, wherein bypassing a sector comprises reading the sector while it is being written; and responsive to a determination that the at least first sector is not aligned with the at least first thread, delaying the writing of the at least first sector until the occurrence of a processor cycle allocated to the at least first thread by retaining the at least first sector in at least one alignment register, thereby permitting the at least first sector to be bypassed. | 03-05-2009 |
20090187739 | Method and Apparatus for Improved Computer Load and Store Operations - Load and store operations in computer systems are extended to provide for Stream Load and Store and Masked Load and Store. In Stream operations, a CPU executes a Stream instruction that indicates, by appropriate arguments, a first address in memory or a first register in a register file from whence to begin reading data entities, and a first address or register from whence to begin storing the entities, and a number of entities to be read and written. In Masked Load and Masked Store operations stored masks are used to indicate patterns relative to first addresses and registers for loading and storing. Bit-string vector methods are taught for masks. | 07-23-2009 |
20090217001 | System and Method for Handling Load and/or Store Operations in a Superscalar Microprocessor - The present invention provides a system and method for managing load and store operations necessary for reading from and writing to memory or I/O in a superscalar RISC architecture environment. To perform this task, a load store unit is provided whose main purpose is to make load requests out of order whenever possible to get the load data back for use by an instruction execution unit as quickly as possible. A load operation can only be performed out of order if there are no address collisions and no write pendings. An address collision occurs when a read is requested at a memory location where an older instruction will be writing. Write pending refers to the case where an older instruction requests a store operation, but the store address has not yet been calculated. The data cache unit returns 8 bytes of unaligned data. The load/store unit aligns this data properly before it is returned to the instruction execution unit. Thus, the three main tasks of the load store unit are: (1) handling out of order cache requests; (2) detecting address collisions; and (3) alignment of data. | 08-27-2009 |
20090249032 | INFORMATION APPARATUS - An information apparatus comprises: a barrel shifter composed of a bidirectional 1-bit shifter, . . . , and a bidirectional 24-bit shifter which are connected in series; a control unit for outputting an endian conversion control signal SE indicating one of a shift operation and endian conversion; an endian conversion unit for generating data by endian conversion using data obtained by performing a shift operation in the bidirectional 8-bit shifter and the bidirectional 24-bit shifter; and a selector for selecting, when the endian conversion control signal SE indicates a shift operation, data outputted from the bidirectional 24-bit shifter, and selecting, when the endian conversion control signal SE indicates endian conversion, the data outputted from the endian conversion unit. | 10-01-2009 |
20090300328 | Aligning Protocol Data Units - An apparatus for receiving one or more protocol data units (PDUs) from a word aligned queue including a media access control (MAC) physical-layer (PHY) coprocessor (MPC) logically residing between a physical-layer controller and a media access controller (MAC) processor. The MPC is configured to access a reception physical-layer queue storing a burst, such that the reception physical-layer queue includes a plurality of word lines. The burst includes one or more PDUs that each occupy one or more word lines of the reception physical-layer queue, such that a particular word line stores a portion of a first PDU and a portion of second PDU. The MPC is also configured to receive from the reception physical-layer queue the first PDU including the portion of the first PDU stored in the selected word line. | 12-03-2009 |
20100138635 | Systems and Methods for Managing Endian Mode of a Device - Systems, methods, and devices for managing endian-ness are disclosed. In one embodiment, a device is configured to selectively operate in one of a big-endian operating mode or a little-endian operating mode. The device may include a register in which the current endian mode of the device is indicated in at least two different bit positions within the register. The at least two different bit positions may be chosen such that a data bit in one of the bit positions would be read by a system if the device and system operate in the same endian mode, while a data bit in another of the chosen bit positions would be read by the system if the device and system are operating in different endian modes from one another. In some embodiments, the endian mode of the device may be controlled by a hardware input or a software input. | 06-03-2010 |
20100211760 | APPARATUS AND METHOD FOR PROVIDING INSTRUCTION FOR HETEROGENEOUS PROCESSOR - Provided are an apparatus and method for providing instructions for a heterogeneous processor having heterogeneous components supporting different data widths. Respective data widths of operands and connections in a data flow graph are determined by using type information of operands. Instructions, to be executed by the heterogeneous processor, are provided based on the determined data widths. | 08-19-2010 |
20110185158 | HISTORY AND ALIGNMENT BASED CRACKING FOR STORE MULTIPLE INSTRUCTIONS FOR OPTIMIZING OPERAND STORE COMPARE PENALTIES - Store multiple instructions are managed based on previous execution history and their alignment. At least one store multiple instruction is detected. A flag is determined to be associated with the at least one store multiple instruction. The flag indicates that the at least one store multiple instruction has previously encountered an operand store compare hazard. The at least one store multiple instruction is organized into a set of unit of operations. The set of unit of operations is executed. The executing avoids the operand store compare hazard previously encountered by the at least one store multiple instruction. | 07-28-2011 |
20110191569 | DATA PROCESSING DEVICE AND SEMICONDUCTOR INTEGRATED CIRCUIT DEVICE - The present invention provides a data processing device where a program can be commonly used and a vector table can be shared regardless of types of endian in a bi-endian system. An instruction is fixed to little endian, and endian of data to be used for executing the instruction is variable. A size of the respective vector addresses in the vector table is 32 bits, and the number of bits at the time of a data access is maximally 32 bits. A CPU fetches an instruction, and before the fetched instruction is executed, the CPU accesses to, for example, for 32-bit data in a memory. At this time, the CPU controls the aligner so that addresses and alignments of data to be stored in each address in byte unit in a data register are identical to addresses and alignments of data determined by little endian of an instruction without depending on types of the endian of the data. | 08-04-2011 |
20110202747 | INSTRUCTION LENGTH BASED CRACKING FOR INSTRUCTION OF VARIABLE LENGTH STORAGE OPERANDS - A method, information processing system, and computer program product manage variable operand length instructions. At least one variable operand length instruction is received. The at least one variable operand length instruction is analyzed. A length of at least one operand in the variable operand length instruction is identified based on the analyzing. The at least one variable operand length instruction is organized into a set of unit of operations. The set of unit of operations are executed. The executing increases one or more performance metrics of the at least one variable operand length instruction. | 08-18-2011 |
20110252220 | INSTRUCTION CRACKING AND ISSUE SHORTENING BASED ON INSTRUCTION BASE FIELDS, INDEX FIELDS, OPERAND FIELDS, AND VARIOUS OTHER INSTRUCTION TEXT BITS - A method, information processing system, and computer program product crack and/or shorten computer executable instructions. At least one instruction is received. The at least on instruction is analyzed. An instruction type associated with the at least one instruction is identified. At least one of a base field, an index field, one or more operands, and a mask field of the instruction are analyzed. At least one of the following is then performed: the at least one instruction is organized into a set of unit of operation; and the at least one instruction is shortened. The set of unit of operations is then executed. | 10-13-2011 |
20120079240 | Reduced-Level Shift Overflow Detection - A processor includes a shift overflow detector for rapidly detecting overflows that may result during execution of a shift instruction. Shift indication signals are generated in response to changes in logic state between adjacent pairs of bits of a received shift data word. A received shift amount is decoded to produce decoded shift signals that indicate an amount of shifting for the received shift data word. An overflow condition is detected in response to the generated shift indication signals and the decoded shift signals and an indication of the detected overflow condition is provided. | 03-29-2012 |
20120254588 | SYSTEMS, APPARATUSES, AND METHODS FOR BLENDING TWO SOURCE OPERANDS INTO A SINGLE DESTINATION USING A WRITEMASK - Embodiments of systems, apparatuses, and methods for performing a blend instruction in a computer processor are described. In some embodiments, the execution of a blend instruction causes a data element-by-element selection of data elements of first and second source operands using the corresponding bit positions of a writemask as a selector between the first and second operands and storage of the selected data elements into the destination at the corresponding position in the destination. | 10-04-2012 |
20120254589 | SYSTEM, APPARATUS, AND METHOD FOR ALIGNING REGISTERS - Embodiments of systems, apparatuses, and methods for performing an align instruction in a computer processor are described. In some embodiments, the execution of an align instruction causes the selective storage of data elements of two concatenated sources to be stored in a destination. | 10-04-2012 |
20130159672 | Bitstream Buffer Manipulation With A SIMD Merge Instruction - Method, apparatus, and program means for performing bitstream buffer manipulation with a SIMD merge instruction. The method of one embodiment comprises determining whether any unprocessed data bits for a partial variable length symbol exist in a first data block is made. A shift merge operation is performed to merge the unprocessed data bits from the first data block with a second data block. A merged data block is formed. A merged variable length symbol comprised of the unprocessed data bits and a plurality of data bits from the second data block is extracted from the merged data block. | 06-20-2013 |
20130205115 | USING THE LEAST SIGNIFICANT BITS OF A CALLED FUNCTION'S ADDRESS TO SWITCH PROCESSOR MODES - Systems and methods for tracking and switching between execution modes in processing systems. A processing system is configured to execute instructions in at least two instruction execution triodes including a first and second execution mode chosen from a classic/aligned mode and a compressed/unaligned mode. Target addresses of selected instructions such as calls and returns are forcibly misaligned in the compressed mode, such one or more bits, such as, the least significant bits (alignment bits) of the target address in the compressed mode are different from the corresponding alignment bits in the classic mode. When the selected instructions are encountered during execution in the first mode, a decision to switch operation to the second mode is based on analyzing the alignment bits of the target address of the selected instruction. | 08-08-2013 |
20130246738 | INSTRUCTION TO LOAD DATA UP TO A SPECIFIED MEMORY BOUNDARY INDICATED BY THE INSTRUCTION - A Load to Block Boundary instruction is provided that loads a variable number of bytes of data into a register while ensuring that a specified memory boundary is not crossed. The boundary may be specified a number of ways, including, but not limited to, a variable value in the instruction text, a fixed instruction text value encoded in the opcode, or a register based boundary. | 09-19-2013 |
20130246739 | COPYING CHARACTER DATA HAVING A TERMINATION CHARACTER FROM ONE MEMORY LOCATION TO ANOTHER - Copying characters of a set of terminated character data from one memory location to another memory location using parallel processing and without causing unwarranted exceptions. The character data to be copied is loaded within one or more vector registers. In particular, in one embodiment, an instruction (e.g., a Vector Load to block Boundary instruction) is used that loads data in parallel in a vector register to a specified boundary, and provides a way to determine the number of characters loaded. To determine the number of characters loaded (a count), another instruction (e.g., a Load Count to Block Boundary instruction) is used. Further, an instruction (e.g., a Vector Find Element Not Equal instruction) is used to find the index of the first delimiter character, i.e., the first termination character, such as a zero or null character within the character data. This instruction checks a plurality of bytes of data in parallel. | 09-19-2013 |
20130246740 | INSTRUCTION TO LOAD DATA UP TO A DYNAMICALLY DETERMINED MEMORY BOUNDARY - A Load to Block Boundary instruction is provided that loads a variable number of bytes of data into a register while ensuring that a specified memory boundary is not crossed. The boundary is dynamically determined based on a specified type of boundary and one or more characteristics of the processor executing the instruction, such as cache line size or page size used by the processor. | 09-19-2013 |
20130275718 | SHUFFLE PATTERN GENERATING CIRCUIT, PROCESSOR, SHUFFLE PATTERN GENERATING METHOD, AND INSTRUCTION SEQUENCE - Based on an input index sequence ( | 10-17-2013 |
20130275719 | PACKED DATA OPERATION MASK SHIFT PROCESSORS, METHODS, SYSTEMS, AND INSTRUCTIONS - A method of an aspect includes receiving a packed data operation mask shift instruction. The packed data operation mask shift instruction indicates a source having a packed data operation mask, indicates a shift count number of bits, and indicates a destination. The method further includes storing a result in the destination in response to the packed data operation mask shift instruction. The result includes a sequence of bits of the packed data operation mask that have been shifted by the shift count number of bits. Other methods, apparatus, systems, and instructions are disclosed. | 10-17-2013 |
20130305015 | PERFORMING A CYCLIC REDUNDANCY CHECKSUM OPERATION RESPONSIVE TO A USER-LEVEL INSTRUCTION - In one embodiment, the present invention includes a method for receiving incoming data in a processor and performing a checksum operation on the incoming data in the processor pursuant to a user-level instruction for the checksum operation. For example, a cyclic redundancy checksum may be computed in the processor itself responsive to the user-level instruction. Other embodiments are described and claimed. | 11-14-2013 |
20130305016 | PERFORMING A CYCLIC REDUNDANCY CHECKSUM OPERATION RESPONSIVE TO A USER-LEVEL INSTRUCTION - In one embodiment, the present invention includes a method for receiving incoming data in a processor and performing a checksum operation on the incoming data in the processor pursuant to a user-level instruction for the checksum operation. For example, a cyclic redundancy checksum may be computed in the processor itself responsive to the user-level instruction. Other embodiments are described and claimed. | 11-14-2013 |
20140013082 | RECONFIGURABLE DEVICE FOR REPOSITIONING DATA WITHIN A DATA WORD - Disclosed is a system and device and related methods for data manipulation, especially for SIMD operations such as permute, shift, and rotate. An apparatus includes a permute section that repositions data on sub-word boundaries and a shift section that repositions the data distances smaller than the sub-word width. The sub-word width is configurable and selectable, and the permute section and shift section may operate on different boundary widths. In a first stage, the permute section repositions the data at the nearest sub-word boundary and, in a second stage, the shift section repositions the data to its final desired position. The shift section includes multi-stages set in a logarithmic cascade relationship. Additionally, each shifter within each of the multi-stages is highly connected, allowing fast and precise data movements. | 01-09-2014 |
20140025927 | REDUCING REGISTER READ PORTS FOR REGISTER PAIRS - Embodiments relate to reducing a number of read ports for register pairs. An aspect includes executing an instruction. The instruction identifies a pair of registers as containing a wide operand which spans the pair of registers. It is determined if a pairing indicator associated with the pair of registers has a first value or a second value. The first value indicates that the wide operand is stored in a wide register, and the second value indicates that the wide operand is not stored in the wide register. Based on the pairing indicator having the first value, the wide operand is read from the wide register. Based on the pairing indicator having the second value, the wide operand is read from the pair of registers. An operation is performed using the wide operand. | 01-23-2014 |
20140025928 | PREDICTING REGISTER PAIRS - Embodiments relate to reducing a number of read ports for register pairs. An aspect includes executing an instruction. The instruction identifies a pair of registers as containing a wide operand which spans the pair of registers. The executing of the instruction includes determining whether a pairing indicator associated with the pair of registers has a first value, a second value or a third value. Based on the pairing indicator having the first value, the wide operand is read from the wide register. Based on the pairing indicator having the second value the wide operand is read from the pair of registers. Based on the pairing indicator having the third value, the wide operand is speculatively read from a predetermined register. The predetermined register consists of the wide register or the pair of registers. | 01-23-2014 |
20140025929 | MANAGING REGISTER PAIRING - Embodiments relate to reducing a number of read ports for register pairs. An aspect includes maintaining an active pairing indicator that is configured to have a first value or a second value. The first value indicates that the wide operand is stored in a wide register. The second value indicates that the wide operand is not stored in the wide register. The operand is read from either the wide register or a pair of registers based on the active pairing indicator. The active pairing indicator and the values of the set of wide registers are stored to a storage based on a request to store a register pairing status. A saved pairing indicator and saved values of the set of wide registers is loaded from the storage respectively into an active pairing register and wide registers. | 01-23-2014 |
20140095830 | INSTRUCTION FOR SHIFTING BITS LEFT WITH PULLING ONES INTO LESS SIGNIFICANT BITS - A mask generating instruction is executed by a processor to improve efficiency of vector operations on an array of data elements. The processor includes vector registers, one of which stores data elements of an array. The processor further includes execution circuitry to receive a mask generating instruction that specifies at least a first operand and a second operand. Responsive to the mask generating instruction, the execution circuitry is to shift bits of the first operand to the left by a number of times defined in the second operand, and pull in a bit of one from the right each time a most significant bit of the first operand is shifted out from the left to generate a result. Each bit in the result corresponds to one of the data elements of the array. | 04-03-2014 |
20150032995 | PROCESSORS OPERABLE TO ALLOW FLEXIBLE INSTRUCTION ALIGNMENT - Methods and apparatus are provided for optimizing a processor core. Common processor subcircuitry is used to perform calculations for various types of instructions, including branch and non-branch instructions. Increasing the commonality of calculations across different instruction types allows branch instructions to jump to byte aligned memory address even if supported instructions are multi-byte or word aligned. | 01-29-2015 |
20150039858 | REDUCING REGISTER READ PORTS FOR REGISTER PAIRS - Embodiments relate to reducing a number of read ports for register pairs. An aspect includes executing an instruction. The instruction identifies a pair of registers as containing a wide operand which spans the pair of registers. It is determined if a pairing indicator associated with the pair of registers has a first value or a second value. The first value indicates that the wide operand is stored in a wide register, and the second value indicates that the wide operand is not stored in the wide register. Based on the pairing indicator having the first value, the wide operand is read from the wide register. Based on the pairing indicator having the second value, the wide operand is read from the pair of registers. An operation is performed using the wide operand. | 02-05-2015 |