Class / Patent application number | Description | Number of patent applications / Date published |
712005000 | Masking to control an access to data in vector register | 33 |
20080313422 | Enhanced Single Threaded Execution in a Simultaneous Multithreaded Microprocessor - A method, system, and computer program product are provided for enhancing the execution of independent loads in a processing unit. The processing unit dispatches a first set of instructions in order from a first buffer for execution. The processing unit receives updated results from the execution of the first set of instructions. The processing unit updates, in a first register, at least one register entry associated with each instruction in the first set of instructions, with the updated results. The processing unit determines if the first set of instructions from the first buffer have completed execution. Responsive to the completed execution of the first set of instructions from the first buffer, the processing unit copies the set of entries from the first register to a second register. | 12-18-2008 |
20090249027 | METHOD AND APPARATUS FOR SCRAMBLING SEQUENCE GENERATION IN A COMMUNICATION SYSTEM - A wireless communications method is provided. The method includes employing a processor executing computer executable instructions stored on a computer readable storage medium to implement various acts. The method also includes generating cyclic shifts for a sequence generator by masking shift register output values with one or more vectors. The method includes forwarding the sequence generator to a future state based in part on the output values and the vectors. | 10-01-2009 |
20100042806 | DETERMINING INDEX VALUES FOR BITS OF A BINARY VECTOR - In one embodiment, the present invention determines index values corresponding to bits of a binary vector that have a value of 1. During each clock cycle, a masking technique is applied to M sub-vector index values, where each sub-vector index value corresponds to a different bit of a sub-vector of the binary vector. The masking technique is applied such that (i) the sub-vector index values that correspond to bits having a value of 0 are zeroed out and (ii) the sub-vector index values that correspond to the bits having a value of 1 are left unchanged. The masked sub-vector index values are sorted, and index values are calculated based on the masked sub-vector index values. The index values generated are then distributed uniformly to a number M of index memories such that the M index memories store substantially the same number of index values. | 02-18-2010 |
20100199067 | Split Vector Loads and Stores with Stride Separated Words - A method, system and computer program product are presented for causing a parallel load/store of stride-separated words from a data vector using different memory chips in a computer. | 08-05-2010 |
20100274988 | Flexible vector modes of operation for SIMD processor - In addition to the usual modes of SIMD processor operation, where corresponding elements of two source vector registers are used as input pairs to be operated upon by the execution unit, or where one element of a source vector register is broadcast for use across the elements of another source vector register, the new system provides several other modes of operation for the elements of one or two source vector registers. Improving upon the time-costly moving of elements for an operation such as DCT, the present invention defines a more general set of modes of vector operations. In one embodiment, these new modes of operation use a third vector register to define how each element of one or both source vector registers is mapped, in order to pair these mapped elements as inputs to a vector execution unit. Furthermore, the decision to write an individual vector element result to a destination vector register, for each individual element produced by the vector execution unit, may be selectively disabled, enabled, or made to depend upon a selectable condition flag or a mask bit. | 10-28-2010 |
20110035567 | ACTUAL INSTRUCTION AND ACTUAL-FAULT INSTRUCTIONS FOR PROCESSING VECTORS - The described embodiments include a processor that executes a vector instruction. The processor starts by receiving a vector instruction that optionally receives a predicate vector (which has N elements) as an input. The processor then executes the vector instruction. In the described embodiments, executing the vector instruction causes the processor to generate a result vector. When generating the result vector, if the predicate vector is received, for each element in the result vector for which a corresponding element of the predicate vector is active, otherwise, for each element of the result vector, the processor determines element positions for which a fault was masked during a prior operation. The processor then updates elements in the result vector to identify a leftmost element for which a fault was masked. | 02-10-2011 |
20110093681 | REMAINING INSTRUCTION FOR PROCESSING VECTORS - The described embodiments include a processor that executes a vector instruction. The processor starts by receiving an input vector and optionally receiving a predicate vector as inputs. The processor then executes the vector instruction, which causes the processor to determine a key element position in the input vector and generate a result vector. When generating the result vector, if the predicate vector is received, for each element in the result vector for which a corresponding element of the predicate vector is active, otherwise, for each element of the result vector, the processor sets each element of the result vector to the right of the key element to a first predetermined value and sets each element of the result vector at or to the left of the key element to a second predetermined value. The processor then sets one or more processor status flags based on the values in the result vector. | 04-21-2011 |
20120151182 | Performing Function Calls Using Single Instruction Multiple Data (SIMD) Registers - In one embodiment, a processor can perform a function call from a main program to a function that is to operate on at least one vector-type operand, in which only scalar values are passed to the function, and input values to the function including the at least one vector-type operand are to be renamed from virtual registers identified in the function to physical registers of a vector register file, and output values from the function including the at least one vector-type operand are to be renamed from virtual registers identified in the function to physical registers of the vector register file. Other embodiments are described and claimed. | 06-14-2012 |
20130124823 | METHODS, APPARATUS, AND INSTRUCTIONS FOR PROCESSING VECTOR DATA - A computer processor includes control logic for executing LoadUnpack and PackStore instructions. In one embodiment, the processor includes a vector register and a mask register. In response to a PackStore instruction with an argument specifying a memory location, a circuit in the processor copies unmasked vector elements from the vector register to consecutive memory locations, starting at the specified memory location, without copying masked vector elements. In response to a LoadUnpack instruction, the circuit copies data items from consecutive memory locations, starting at an identified memory location, into unmasked vector elements of the vector register, without copying data to masked vector elements. Other embodiments are described and claimed. | 05-16-2013 |
20130212355 | Conditional vector mapping in a SIMD processor - The present invention provides a method for mapping input vector register elements to output vector register elements in one step in relation to a control vector register controlling vector-to-vector mapping and condition code values. The method includes storing an input vector having N-elements of input data in a vector register and storing a control vector having N-elements in a vector register, and providing for enabling vector-to-vector mapping where the mask bit is not set to selectively disable. The masking of certain elements is useful to partition large mappings of vectors or matrices into sizes that fit the number of elements of a given SIMD, and merging of multiple mapped results together. This method and system provide a highly efficient mechanism of mapping vector register elements in parallel based on a user-defined mapping and prior calculated condition codes, and merging these mapped vector elements with another vector using a mask. | 08-15-2013 |
20140019715 | SYSTEMS, APPARATUSES, AND METHODS FOR PERFORMING A CONVERSION OF A WRITEMASK REGISTER TO A LIST OF INDEX VALUES IN A VECTOR REGISTER - Embodiments of systems, apparatuses, and methods for performing in a computer processor conversion of a mask register into a list of index values in response to a single vector packed convert a mask register into a list of index values instruction that includes a destination vector register operand, a source writemask register operand, and an opcode are described. | 01-16-2014 |
20140059322 | APPARATUS AND METHOD FOR BROADCASTING FROM A GENERAL PURPOSE REGISTER TO A VECTOR REGISTER - An apparatus and method are described for broadcasting from a general purpose source register to a destination vector register. For example, a method according to one embodiment includes the following operations: selecting data element position N within the destination vector register to be updated; broadcasting a set of data from the general purpose source register to data element position N within the destination vector register if a mask indicator is set to a first indication; and either copying zeroes to data element position N within the destination vector register or maintaining existing values stored within data element position N within the destination vector register if the mask indicator is set to a second indication. | 02-27-2014 |
20140089634 | APPARATUS AND METHOD FOR DETECTING IDENTICAL ELEMENTS WITHIN A VECTOR REGISTER - An apparatus, system and method are described for identifying identical elements in a vector register. For example, a computer implemented method according to one embodiment comprises the operations of: reading each active element from a first vector register, each active element having a defined bit position within the first vector register; reading each element from a second vector register, each element having a defined bit position within the second vector register corresponding to a bit position of a current active element in the first vector register; reading an input mask register, the input mask register identifying active bit positions in the second vector register for which comparisons are to be made with values in the first vector register, the comparison operations comprising: comparing each active element in the second vector register with elements in the first vector register having bit positions preceding the bit position of the current active element in the second vector register; and setting a bit position in an output mask register equal to a true value if all of the preceding bit positions in the first vector register are equal to the bit in the current active bit position in the second vector register. | 03-27-2014 |
20140095828 | VECTOR MOVE INSTRUCTION CONTROLLED BY READ AND WRITE MASKS - A processor executes a vector move instruction to move data elements from a second vector register to a first vector register under the control of a first mask register and a second mask register. A register file within the processor includes the first vector register, the second vector register, the first mask register and the second mask register. In response to the vector move instruction, execution circuitry in the processor is to replace a given number of target data elements in the first vector register with the given number of source data elements in the second vector register. Each source data element corresponds to a mask bit in the second mask register having a second bit value, and wherein each target data element corresponds to a mask bit in the first mask register having a first bit value. | 04-03-2014 |
20140129802 | METHODS, APPARATUS, AND INSTRUCTIONS FOR PROCESSING VECTOR DATA - A computer processor includes control logic for executing LoadUnpack and PackStore instructions. In one embodiment, the processor includes a vector register and a mask register. In response to a PackStore instruction with an argument specifying a memory location, a circuit in the processor copies unmasked vector elements from the vector register to consecutive memory locations, starting at the specified memory location, without copying masked vector elements. In response to a LoadUnpack instruction, the circuit copies data items from consecutive memory locations, starting at an identified memory location, into unmasked vector elements of the vector register, without copying data to masked vector elements. Other embodiments are described and claimed. | 05-08-2014 |
20140189288 | INSTRUCTION TO REDUCE ELEMENTS IN A VECTOR REGISTER WITH STRIDED ACCESS PATTERN - A vector reduction instruction with non-unit strided access pattern is received and executed by the execution circuitry of a processor. In response to the instruction, the execution circuitry performs an associative reduction operation on data elements of a first vector register. Based on values of the mask register and a current element position being processed, the execution circuitry sequentially sets one or more data elements of the first vector register to a result, which is generated by the associative reduction operation applied to both a previous data element of the first vector register and a data element of a third vector register. The previous data element is located more than one element position away from the current element position. | 07-03-2014 |
20140195775 | INSTRUCTION AND LOGIC TO PROVIDE VECTOR LOADS AND STORES WITH STRIDES AND MASKING FUNCTIONALITY - Instructions and logic provide vector loads and/or stores with stride and mask functionality. Some embodiments, responsive to an instruction specifying: a set of loads, destination register, mask register, memory address, and stride length; execution units read values in the mask register, wherein fields in the mask register correspond to stride-length multiples from the memory address to data elements in memory. A first mask value indicates the element has not been loaded from memory and a second value indicates that the element does not need to be, or has already been loaded. For each having the first value, the corresponding multiple of said stride length is generated according to the data field's position in the mask register to load the data element from memory into the corresponding destination register location, and the corresponding value in the mask register is changed to the second value. These instructions can restart after faults. | 07-10-2014 |
20140208065 | APPARATUS AND METHOD FOR MASK REGISTER EXPAND OPERATION - An apparatus and method are described for expanding bits from a mask register in a processor and computing system with vector registers and vector data elements. For example, a method according to one embodiment includes the following operations: reading each mask register bit stored in a mask register, the mask register containing mask values used for performing operations on vector values stored in a set of vector registers; and replicating each mask register bit N times into a destination register, where N is the number of vector elements stored in each vector register. | 07-24-2014 |
20140223139 | SYSTEMS, APPARATUSES, AND METHODS FOR SETTING AN OUTPUT MASK IN A DESTINATION WRITEMASK REGISTER FROM A SOURCE WRITE MASK REGISTER USING AN INPUT WRITEMASK AND IMMEDIATE - Embodiments of systems, apparatuses, and methods for performing in a computer processor generation of a predicate mask based on vector comparison in response to a single instruction are described. | 08-07-2014 |
20140223140 | SYSTEMS, APPARATUSES, AND METHODS FOR PERFORMING VECTOR PACKED UNARY ENCODING USING MASKS - Embodiments of systems, apparatuses, and methods for performing in a computer processor vector packed unary encoding using masks in response to a single vector packed unary encoding using masks instruction that includes a source vector register operand, a destination writemask register operand, and an opcode are described. | 08-07-2014 |
20140289494 | INSTRUCTION AND LOGIC TO PROVIDE VECTOR HORIZONTAL MAJORITY VOTING FUNCTIONALITY - Instructions and logic provide vector horizontal majority voting functionality. Some embodiments, responsive to an instruction specifying: a destination operand, a size of the vector elements, a source operand, and a mask corresponding to a portion of the vector element data fields in the source operand; read a number of values from data fields of the specified size in the source operand, corresponding to the mask specified by the instruction and store a result value to that number of corresponding data fields in the destination operand, the result value computed from the majority of values read from the number of data fields of the source operand. | 09-25-2014 |
20140372727 | INSTRUCTION AND LOGIC TO PROVIDE VECTOR BLEND AND PERMUTE FUNCTIONALITY - Vector blend and permute functionality are provided, responsive to instructions specifying: a destination vector register comprising fields to store vector elements, a first vector register, a vector element size, a second vector register, and a third operand. Indices are read from fields in the second register. Each index has a first selector portion and a second selector portion. Corresponding unmasked vector elements are stored to fields of the destination register, wherein each vector element, responsive to the respective first selector portion having a first value, is copied to an intermediate vector from a corresponding data field of the first register, and responsive to the respective first selector portion having a second value, is copied to the intermediate vector from a corresponding data field of the third operand. Then unmasked data fields of the destination are replaced by data fields in the intermediate vector indexed by the corresponding second selector portions. | 12-18-2014 |
20150143075 | VECTOR GENERATE MASK INSTRUCTION - A Vector Generate Mask instruction. For each element in the first operand, a bit mask is generated. The mask includes bits set to a selected value starting at a position specified by a first field of the instruction and ending at a position specified by a second field of the instruction. | 05-21-2015 |
20160041827 | INSTRUCTIONS FOR MERGING MASK PATTERNS - A method is described that includes fetching an instruction and decoding the instruction. The method further includes fetching a first mask vector from a first mask register space location identified by the instruction. The method further includes fetching a second mask vector from a second mask register space location identified by the instruction. The method also includes executing the instruction by merging the first and second mask vectors into a single data structure and causing the single data structure to be written into a memory location identified by the instruction. | 02-11-2016 |
20160179520 | METHOD AND APPARATUS FOR VARIABLY EXPANDING BETWEEN MASK AND VECTOR REGISTERS | 06-23-2016 |
20160179521 | METHOD AND APPARATUS FOR EXPANDING A MASK TO A VECTOR OF MASK VALUES | 06-23-2016 |
20160179522 | METHOD AND APPARATUS FOR PERFORMING A VECTOR BIT REVERSAL | 06-23-2016 |
20160179526 | METHOD AND APPARATUS FOR VECTOR INDEX LOAD AND STORE | 06-23-2016 |
20160179527 | METHOD AND APPARATUS FOR EFFICIENTLY MANAGING ARCHITECTURAL REGISTER STATE OF A PROCESSOR | 06-23-2016 |
20160179528 | METHOD AND APPARATUS FOR PERFORMING CONFLICT DETECTION | 06-23-2016 |
20160188333 | METHOD AND APPARATUS FOR COMPRESSING A MASK VALUE - An apparatus and method for mask compression. For example, one embodiment of a processor comprises: a source mask register to store a plurality of mask bits including a plurality of set bits and a plurality of bits that are not set; a destination mask register to store set bits read from the source mask register; and mask compression logic to read each of the set bits from the source mask register and to store the set bits in contiguous bit locations on one side of the destination mask register. | 06-30-2016 |
20160188336 | METHODS, APPARATUS, INSTRUCTIONS AND LOGIC TO PROVIDE VECTOR PACKED TUPLE CROSS-COMPARISON FUNCTIONALITY - Instructions and logic provide SIMD vector packed tuple cross-comparison functionality. Some processor embodiments include first and second registers with a variable plurality of data fields, each of the data fields to store an element of a first data type. The processor executes a SIMD instruction for vector packed tuple cross-comparison in some embodiments, which for each data field of a portion of data fields in a tuple of the first register, compares its corresponding element with every element of a corresponding portion of data fields in a tuple of the second register and sets a mask bit corresponding to each element of the second register portion, in a bit-mask corresponding to each unmasked element of the corresponding first register portion, according to the corresponding comparison. In some embodiments bit-masks are shifted by corresponding elements in data fields of a third register. The comparison type is indicated by an immediate operand. | 06-30-2016 |
20160188532 | METHOD AND APPARATUS FOR PERFORMING A VECTOR BIT SHUFFLE - An apparatus and method for performing a vector bit shuffle. For example, one embodiment of a processor comprises: a first vector register to store a plurality of source data elements; a second vector register to store a plurality of control elements, each of the control elements comprising a plurality of bit fields, each bit field to be associated with a corresponding bit position in a destination mask register and to identify a bit from each of the source data elements to be copied to each of the particular bit positions; and vector bit shuffle logic to read each bit field from the second vector register to identify a bit from each of the source data elements and to responsively copy the bit from each of the source data elements to each of the corresponding bit positions in the destination mask register. | 06-30-2016 |
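
The PackStore and LoadUnpack behavior summarized in entries 20130124823 and 20140129802 can be illustrated with a small software model. This is only a sketch under stated assumptions, not the patented circuit: memory is modeled as a Python list, a mask value of 1 is assumed to mean "unmasked" (the abstracts do not fix a polarity), and the function names `pack_store` and `load_unpack` are hypothetical.

```python
def pack_store(memory, addr, vector, mask):
    """Copy unmasked elements of `vector` to consecutive slots of `memory`,
    starting at index `addr`. Masked elements are skipped entirely.
    Returns the number of elements written.
    Assumption: mask value 1 means the element is unmasked (active)."""
    out = addr
    for elem, active in zip(vector, mask):
        if active:
            memory[out] = elem
            out += 1
    return out - addr


def load_unpack(memory, addr, vector, mask):
    """Copy consecutive items from `memory`, starting at index `addr`, into
    the unmasked positions of a copy of `vector`. Masked positions keep
    their previous values. Returns the updated vector."""
    src = addr
    result = list(vector)
    for i, active in enumerate(mask):
        if active:
            result[i] = memory[src]
            src += 1
    return result


# Illustrative data (all values are made up for the example).
memory = [0] * 8
vector = [10, 11, 12, 13]
mask = [1, 0, 1, 1]  # element 1 is masked

count = pack_store(memory, 2, vector, mask)
restored = load_unpack(memory, 2, [-1, -1, -1, -1], mask)
```

In this model the two operations are inverses on the unmasked positions: packing with a mask and then unpacking with the same mask restores every active element to its original position and leaves the masked position untouched.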